Data warehouse appliances

Analysis of data warehouse appliances – i.e., of hardware/software bundles optimized for fast query and analysis of large volumes of (usually) relational data. Related subjects include:

Data warehousing
Parallelization
Netezza
DATAllegro
Teradata
Kickfire
(in The Monash Report) Computing appliances in multiple domains

September 24, 2006

Data warehouse and mart uses – a tentative taxonomy

I’ve been posting a lot recently about the diverse database technologies used to support data warehousing. With the marketplace supporting such a broad range of architectures, it seems clear that a lot of those architectures actually deserve to thrive, presumably each in a different kind of usage scenario. So in this post I’ll take a pass at dividing up use cases for data warehouses, and suggesting which kinds of data warehouse management technologies might do the best job of supporting them. To start with, I’ve divided things into a number of buckets:

Pinpoint data lookup
Constrained query and reporting
Cube-filling calculations
Hardcore tabular data crunching
Text and media search
Specialty areas, such as relationship analytics

Categories: Data warehouse appliances, Data warehousing, DATAllegro, IBM and DB2, MOLAP, Netezza, Teradata

1 Comment

September 22, 2006

Competitive issues in data warehouse ease of administration

The last person I spoke with at the Netezza conference on Tuesday was a customer/presenter that the company had picked out for me. One thing he said baffled me — he claimed that Netezza was a real appliance vendor, but DATallegro wasn’t, presumably due to administrability issues. Now, it wasn’t clear to me that he’d ever evaluated DATallegro, so I didn’t take this too seriously, but still the exchange brought into focus the great differences between data warehouse products in the area of administration. For example:

Netezza has no indices at all. And no caches. And the hardware is preconfigured. This all makes administration pretty simple.
DATallegro has almost no indices, and also has preconfigured hardware. But it has some partitioning, optionally.
Teradata also has preconfigured hardware. It does have indices, but rather simple ones. Plus it has join indices. And it has a few more configuration options in other areas (e.g., block size) than the other appliance vendors. (Yes, I count Teradata among the appliances.)
If you go through all the fuss of installing SAP’s applications and BI technology anyway, the incremental administration of just SAP BI Accelerator is pretty light.
Oracle and IBM have mammothly complex indexing options, but have put large amounts of work into tools to lessen the resulting administrative burden.
IBM offers preconfigured hardware units to simplify some installation issues.
Come to think of it, I don’t really know how hard it is to administer columnar systems (e.g., Sybase IQ).

Categories: Data warehouse appliances, Data warehousing, DATAllegro, Greenplum, IBM and DB2, Netezza, Oracle, SAP AG, Teradata

3 Comments

September 20, 2006

SAP’s BI Accelerator

I wrote about SAP’s BI Accelerator quite a bit in my white paper on memory-centric data management, but otherwise I seem not to have posted much about it here. In essence, it’s a product that’s all RAM-based, and generally geared for multi-hundred-gigabyte data marts. The basic design is a compression-heavy column-based architecture, evolved from SAP’s text-indexing technology TREX. Like data warehouse appliances, it eschews indexing, relying instead on blazingly fast table scans.
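To make the "compressed columns plus fast scans, no indexes" idea concrete, here is a minimal sketch in Python. It is not SAP's implementation; dictionary encoding is assumed here as one common form of column compression, and the names `dictionary_encode` and `scan_equals` and the sample data are invented for illustration.

```python
def dictionary_encode(values):
    """Compress a column: store each distinct value once, plus small integer codes."""
    dictionary = []   # distinct values, in order of first appearance
    code_of = {}      # value -> integer code
    codes = []        # one code per row
    for v in values:
        if v not in code_of:
            code_of[v] = len(dictionary)
            dictionary.append(v)
        codes.append(code_of[v])
    return dictionary, codes


def scan_equals(dictionary, codes, target):
    """Answer 'column = target' by scanning every code; no index is consulted."""
    if target not in dictionary:
        return []
    target_code = dictionary.index(target)
    return [row for row, code in enumerate(codes) if code == target_code]


# Hypothetical two-column table, stored column by column.
region = ["EMEA", "APAC", "EMEA", "AMER", "APAC", "EMEA"]
revenue = [120, 80, 200, 150, 60, 90]

region_dict, region_codes = dictionary_encode(region)
rows = scan_equals(region_dict, region_codes, "EMEA")
emea_revenue = sum(revenue[r] for r in rows)
```

The scan compares small integers rather than strings, and it touches only the one column the predicate needs, which is roughly why a scan-everything design can be fast when the data sits in RAM.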

I asked Lothar Schubert of SAP how BIA was doing in the market in its early going. This was his response:

Categories: Analytic technologies, Business intelligence, Data warehouse appliances, Data warehousing, Database compression, Memory-centric data management, SAP AG

8 Comments

September 20, 2006

Myths about DATallegro, Ingres, open source, etc.

Sometimes, when one talks to a company about a close competitor, what one hears may not be 100% strictly accurate. Yesterday, I more than once heard claims that sounded oddly like “DATallegro has to open source whatever software it develops.” Today, DATallegro CEO Stuart Frost clarified as follows:

• DATallegro has no (little?) legal obligation to open source anything. Even the version of Ingres they use is not the GPL one.
• They do give a few enhancements back to Ingres (via open source?) rather than maintain them themselves.
• The whole MPP technology is proprietary, in every sense of “proprietary.” (For example, they use a whole different optimizer than Ingres’s. I’ve forgotten whether the Ingres optimizer is also left in place.)

Categories: Actian and Ingres, Data warehouse appliances, DATAllegro, Memory-centric data management, Open source

1 Comment

September 20, 2006

Teradata vs. the new appliance vendors, technically

Todd Walter and Randy Lea of Teradata gave generously of their time today, ducking out of their user conference, and shared their take on issues we’ve been discussing here recently. Overall, Teradata’s response to the data warehouse appliance guys is essentially: “Well, those may be fine for specific queries, or for data marts, but in true blended enterprise data warehouse workloads we’re superior, including in performance.”

Specific takeaways included:

Categories: Data warehouse appliances, DATAllegro, Netezza, Teradata

4 Comments

September 20, 2006

No locks, no logs — no problem?

There’s another cool-sounding part to the Netezza story, which straddles their chips and their software: The FPGA takes over the work of assuring database consistency. If the system attempts to read and write a record at the same time, the FPGA keeps things straight. This eliminates the need for locks, at least if you don’t care about transactional integrity, and some of the reason for logs. (I guess that in lieu of any kind of rollback/rollforward they just rely on failover to mirrored disks.)

This isn’t exactly the way one would want to do OLTP, and in general my head is shaking as I write this — but it sure seems to suffice for some rather demanding data warehouse users.

Categories: Data warehouse appliances, Netezza, Theory and architecture

2 Comments

September 20, 2006

Netezza’s chip story

In addition to its software story, Netezza of course has a rather unique chip story. Where other vendors might have standard disk controllers and high-powered microprocessors, Netezza has, respectively, an FPGA (Field-Programmable Gate Array) and a lesser microprocessor (PowerPC). Netezza claims that two major advantages of these choices are:

5X throughput/performance improvement
Much lower heat and power consumption.

The main function of the FPGA, other than generically getting data on and off disk, is to restrict and project tables (i.e., execute single-table WHERE clauses and pass along only the columns a query needs). Netezza claims that their FPGAs can perform these operations on the streaming data at least as quickly as an expensive, hot, power-hungry top-end microprocessor would, and indeed faster. The key word is “streaming”, which they contrast to the microprocessor’s need to get the data in and then back out of RAM (cache or otherwise).
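As a rough software analogy for restricting and projecting on streaming data, the sketch below chains Python generators so that each row is filtered and trimmed as it passes through, without first collecting the table in memory. This only illustrates the dataflow; it says nothing about how Netezza's FPGA is actually programmed, and `read_rows`, `restrict`, `project`, and the `orders` data are all hypothetical.

```python
def read_rows(table):
    """Stand-in for rows arriving one at a time from disk."""
    for row in table:
        yield row


def restrict(rows, predicate):
    """Single-table WHERE clause: pass along only the rows that qualify."""
    for row in rows:
        if predicate(row):
            yield row


def project(rows, columns):
    """Keep only the requested columns of each surviving row."""
    for row in rows:
        yield {c: row[c] for c in columns}


# Hypothetical table contents.
orders = [
    {"order_id": 1, "region": "EMEA", "amount": 120},
    {"order_id": 2, "region": "APAC", "amount": 80},
    {"order_id": 3, "region": "AMER", "amount": 150},
]

# Roughly: SELECT order_id, amount FROM orders WHERE amount > 100
pipeline = project(
    restrict(read_rows(orders), lambda r: r["amount"] > 100),
    ["order_id", "amount"],
)
result = list(pipeline)
```

Nothing downstream of the pipeline ever sees the rows or columns that were filtered out, which is the point of doing this work as close to the disk as possible.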

I’ll be interested to see whether somebody can muster a ringing refutation to Netezza’s claims.

Categories: Data warehouse appliances, Netezza

12 Comments

September 20, 2006

Netezza vs. conventional data warehousing RDBMS

For various reasons, I’m not going to try to give a comprehensive overview of the Netezza story. But I’d like to highlight four points that illustrate a lot of the difference between Netezza’s architecture and that of more conventional data warehousing DBMS.

Categories: Data warehouse appliances, Data warehousing, DATAllegro, Netezza

6 Comments

September 20, 2006

Dealing with Netezza has not been easy

Over the past year, Netezza has exhibited the squirreliest question-dodging behavior I’ve seen from a DBMS vendor since – actually, since Sybase tried to conceal the System 10 fiasco in 1993-5. To its credit, however, Netezza finally decided to open the kimono. Specifically, they invited me to their user conference, which I attended today, and indeed were quite helpful in FINALLY getting my questions addressed, and in offering more access as needed.

Categories: Data warehouse appliances, Netezza

2 Comments

September 19, 2006

Is data warehousing now all about sequential access?

A lot of evidence is pointing to a major paradigm shift in data warehouse RDBMS, along the lines of:

Old way: Assume I/O is random; lower total execution time by improving selectivity and thus lowering the amount of I/O.

New way: Drive the amount of random I/O to near zero, and do as much sequential I/O as necessary to achieve this goal.

Examples include:

Data warehouse appliances (see especially this discussion of DATallegro’s architecture)
Columnar systems (see Nathan Myers’s first comment in this discussion of the much-hyped Required Technologies prototype)
Memory-centric systems, notably SAP’s BI Accelerator
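The old-way/new-way tradeoff can be sketched with a back-of-the-envelope cost model. Every number below is an assumption chosen for illustration (8 ms per random seek, 60 MB/s sequential transfer, 100-byte rows, and one seek per fetched row), not a measurement of any product, and the function names are invented.

```python
def random_io_seconds(rows_fetched, seek_ms=8.0):
    """Old way: an index finds the rows, but each fetch costs one random seek."""
    return rows_fetched * seek_ms / 1000.0


def sequential_scan_seconds(table_bytes, mb_per_second=60.0):
    """New way: read the whole table sequentially, whatever the query selects."""
    return table_bytes / (mb_per_second * 1_000_000)


def break_even_selectivity(total_rows, row_bytes, seek_ms=8.0, mb_per_second=60.0):
    """Fraction of rows at which random fetches cost as much as a full scan."""
    scan = sequential_scan_seconds(total_rows * row_bytes, mb_per_second)
    return scan / (total_rows * seek_ms / 1000.0)


# Hypothetical table: 100 million rows of 100 bytes each (10 GB).
total_rows = 100_000_000
row_bytes = 100

one_percent_random = random_io_seconds(total_rows // 100)
full_scan = sequential_scan_seconds(total_rows * row_bytes)
crossover = break_even_selectivity(total_rows, row_bytes)
```

Under these assumed figures, fetching 1% of the rows by random I/O takes about 8,000 seconds, while scanning the entire table takes about 167 seconds, so the full scan wins unless the query touches less than roughly 0.02% of the rows. Real systems cluster data, cache, and spread I/O across many disks, so the exact crossover will differ, but the gap gives a sense of why architectures built around sequential I/O look attractive.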

Categories: Data warehouse appliances, DATAllegro, Memory-centric data management, SAP AG, Theory and architecture, TransRelational

4 Comments
