Data warehouse appliances

Analysis of data warehouse appliances – i.e., of hardware/software bundles optimized for fast query and analysis of large volumes of (usually) relational data. Related subjects include:

Data warehousing
Parallelization
Netezza
DATAllegro
Teradata
Kickfire
(in The Monash Report) Computing appliances in multiple domains

January 23, 2007

Arguments AGAINST data warehouse appliances

Data warehouse appliance opponents like to argue that history is conclusively on their side. Database machine maker Britton-Lee, eventually bought by Teradata, fizzled. LISP machines were a spectacular failure. Rational Software’s origins as a special-purpose Ada machine maker had to be renounced before the company could succeed.

But the true story is more mixed. Teradata continues to this day as a major data warehouse technology player, and as far as I’m concerned Teradata indeed makes appliances. If we look further than the applications stack, we find that appliances actually occupy a large and growing share of the computing market. So a persuasive anti-appliance argument has to do more than just invoke the names of Britton-Lee and Symbolics.

I just ran across an article by MIT professor Samuel Madden that attempts to make such a case. And his MIT colleague Mike Stonebraker made similar arguments to me a few days ago. They are not wholly unbiased; indeed, both are involved in Vertica Systems. With that caveat, they have an interesting three-part argument:

Categories: Data warehouse appliances, Data warehousing, Michael Stonebraker

1 Comment

October 5, 2006

Introduction to Kognitio WX-2

Kognitio called me for a briefing this morning on their WX-2 product. Technical highlights included:

Their core technology is MPP/shared-nothing data warehousing.
Unlike most other vendors (but like Greenplum), they are available software-only.
Like DATAllegro and Netezza, they have no global indexing.
Unlike the other MPP players, they don’t hash partition the data and lead with hash joins. Rather, they have local compressed bitmap indices on every node.
Similarly, they distribute data utterly randomly and have no concept of range partitioning whatsoever.
Probably for that reason, WX-2 reads data in small 32K blocks. This forfeits the benefit of sequential reads, unless David Aldridge is correct that Linux can take care of that on its own.
They seem more chip-heavy than DATAllegro and Netezza. A dual-core Opteron blade with 16 or 32 gigabytes of RAM talks to 144, 288, or in some cases 600 gigabytes of disk (before mirroring).
They position themselves somewhat as being a memory-centric product supplier. While I suspect this is exaggerated, it probably indicates that they’ve put some work into managing RAM as well as disk.

Much like the other “new” MPP data warehouse vendors, Kognitio claims to never have knowingly been outbenchmarked, whether on performance or on TCO factors such as ease of installation.
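
To make the data distribution point concrete, here is a minimal sketch contrasting hash distribution with random distribution across nodes. It is not Kognitio's or any other vendor's actual code; the function names, the `customer_id` column, and the use of CRC32 as the hash function are all assumptions made for illustration.

```python
import random
import zlib
from collections import defaultdict


def hash_distribute(rows, key, n_nodes):
    """Place each row on a node chosen by hashing its distribution key."""
    nodes = defaultdict(list)
    for row in rows:
        node = zlib.crc32(str(row[key]).encode()) % n_nodes
        nodes[node].append(row)
    return nodes


def random_distribute(rows, n_nodes, seed=0):
    """Place each row on a randomly chosen node, ignoring its contents."""
    rng = random.Random(seed)  # fixed seed only so the demo is repeatable
    nodes = defaultdict(list)
    for row in rows:
        nodes[rng.randrange(n_nodes)].append(row)
    return nodes


def nodes_holding(nodes, key, value):
    """List the nodes that hold at least one row with the given key value."""
    return sorted(
        n for n, rows in nodes.items() if any(r[key] == value for r in rows)
    )


# Hypothetical data: 1,000 rows spread over 10 customers and 4 nodes.
rows = [{"customer_id": i % 10, "amount": i} for i in range(1000)]
hashed = hash_distribute(rows, "customer_id", 4)
scattered = random_distribute(rows, 4)

print(nodes_holding(hashed, "customer_id", 7))     # a single node
print(nodes_holding(scattered, "customer_id", 7))  # several nodes
```

With hash distribution, all rows for a given key land on one node, so a join on that key can be done node-locally. With random distribution the same rows are scattered, so a system built that way has to get its speed from something else, such as the local indices mentioned above.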

Categories: Data warehouse appliances, Data warehousing, Greenplum, Kognitio, Memory-centric data management

11 Comments

October 4, 2006

Data mining is driving much of data warehousing

Until I did all this recent research on data warehousing, I didn’t realize just how big a role data mining plays in driving the whole thing. Basically, there are three things you can do with a data warehouse – classical BI, “operational” BI, and data mining. If we’re talking about long-running queries, that’s not operational BI, and it’s not all of classical BI either. The rest is data mining. Indeed, if you think back to what you know of the customer bases at data warehouse appliance vendors Netezza and DATAllegro, there are a lot of credit-reporting-data types of users – i.e., data miners. And it’s hard to talk about uses for those appliances very long without SAS extracts and the like coming up.

Categories: Data warehouse appliances, Data warehousing, DATAllegro, Netezza, Oracle, Predictive modeling and advanced analytics

8 Comments

October 4, 2006

Philip Howard on Netezza

Philip Howard has published a write-up based on Netezza’s user conference, entertainingly mixing fantasy and reality in his usual manner. Notably, he confuses Netezza’s zone maps, which are basically a very limited form of range partitioning, with something that can substitute for real indices. And the mind boggles at his implication that Netezza has neglected the FPGA in its overall market messaging. More understandable is his regurgitation of Netezza’s claims about heat and power, but although I must confess to not having checked either side’s arithmetic, I find Stuart Frost’s rebuttal in the comments to this thread pretty interesting.

But little nits like that aside — yeah, he went to the same conference I did. 😉
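
For readers unfamiliar with the zone map idea, the following is a rough sketch of the general technique, assuming a zone map is simply a per-block record of the minimum and maximum values of a column. The `Block` class, the block size, and the function names are hypothetical, and nothing here is taken from Netezza's implementation.

```python
from dataclasses import dataclass


@dataclass
class Block:
    rows: list  # the column values stored in this block
    lo: int     # smallest value in the block (the "zone map" entry)
    hi: int     # largest value in the block


def build_blocks(values, block_size):
    """Cut a column into fixed-size blocks and record each block's min/max."""
    blocks = []
    for i in range(0, len(values), block_size):
        chunk = values[i:i + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def scan_range(blocks, lo, hi):
    """Return (matching values, number of blocks actually read)."""
    hits, blocks_read = [], 0
    for block in blocks:
        if block.hi < lo or block.lo > hi:
            continue  # min/max rules the block out, so skip reading it
        blocks_read += 1
        hits.extend(v for v in block.rows if lo <= v <= hi)
    return hits, blocks_read


# Data loaded in sorted order: only the relevant blocks get read.
clustered = build_blocks(list(range(100)), 10)
# The same values in a scrambled order: every block's range is wide.
scrambled = build_blocks([(i * 37) % 100 for i in range(100)], 10)

print(scan_range(clustered, 25, 34)[1])  # 2
print(scan_range(scrambled, 25, 34)[1])  # 10
```

The skipping only pays off when the data happens to be physically clustered on the filtered column, which is why this behaves like a limited form of range partitioning and not like an index that can locate arbitrary rows.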

Categories: Data warehouse appliances, Data warehousing, Netezza

Vendor segmentation for data warehouse DBMS

February, 2011 edit: I’ve now commented on Gartner’s 2010 Data Warehouse Database Management System Magic Quadrant as well.

Several vendors are offering links to Gartner’s new Magic Quadrant report on data warehouse DBMS. (Edit: This is now a much better link to the 2006 MQ.) Somewhat atypically for Gartner, there’s a strict hierarchy among most of the vendors, with Teradata > IBM > Oracle > Microsoft > Sybase > Kognitio > MySQL > Sand, in each case on both axes of the matrix. The only two exceptions are Netezza and DATAllegro, which are depicted as outvisioning Microsoft somewhat even as they trail both Microsoft and Sybase in execution.

Gartner Magic Quadrants tend to annoy me, and I’m not going to critique the rankings in detail. But I do think this particular MQ is helpful in framing a vendor segmentation, namely:

Big full-spectrum MPP/shared-nothing vendors: Teradata and IBM.
MPP/shared-nothing appliance upstarts: Netezza and DATAllegro
Big SMP/shared-everything vendors who also are apt to be your OLTP incumbent, and who want to integrate your software stack soup-to-nuts: Oracle and Microsoft
Niche vendors: Pretty much everybody else

Categories: Data warehouse appliances, Data warehousing, DATAllegro, IBM and DB2, Microsoft and SQL*Server, Netezza, Oracle, Parallelization, Teradata

6 Comments

October 3, 2006

IBM and Teradata too

If I had to name one company with the broadest possible overview of the data warehouse engine market, it would have to be IBM. IBM offers software and hardware, services-heavy deals and quasi-appliances, OLTP and ROLAP, shared-everything and shared-nothing, integrated-(almost)-everything and best-of-breed. So their ROLAP recommendations, while still rather self-serving (just as any other vendor’s would be), are at least somewhat more than just a case of “Where you stand depends upon where you sit.”

At its core, the current IBM ROLAP story is:

Shared nothing MPP.
Flexible indexing, lightly applied.
Normalized data models.
Thoroughly mixed workloads.
Preconfigured hardware.

Here’s some more detail, about IBM and other vendors alike.

Categories: Data warehouse appliances, Data warehousing, DATAllegro, IBM and DB2, Netezza, Teradata

2 Comments

September 28, 2006

Relational data warehouse Expansion (or Explosion) Ratios

One of the least understood aspects of data warehouse technology is what may be called the

Expansion Ratio = (Total disk space used, except for mirroring) / (Size of the base database).

This is similar to the explosion ratio discussed in the OLAP Report’s justly famous discussion of database explosion, but I’m going with my own terminology because I don’t want to be tied to their precise terminology, nor to their technical focus. Expansion Ratios are hotly debated, with some figures being:

Teradata claims an Expansion Ratio of 8-9X for Oracle, 6X for DB2 (open system version), and 2.5X for Teradata. The underlying source is data warehouses they’ve replaced, so there may be a bias toward out-of-control warehouses on the part of their competitors.
An anonymous appliance vendor exec said to me off the top of his head that Oracle has 6-8X Expansion Ratios.
Oracle’s TPC-H submissions in the largest size range (10 terabytes) have 9.7-10.5X Expansion Ratios, if I’m reading the TPCs correctly.
Oracle cites a survey of 8 customers with 10-60 TB database size in which the Expansion Ratio works out to 1.6X. (More on this anomalous result below.)

I don’t have actual figures from Netezza and DATAllegro, but I imagine they’d come out lower than 2X, possibly well below.
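
As a quick worked version of the definition above, here is a small sketch of the arithmetic. The function names and the `mirror_copies` parameter are assumptions for illustration, and the sample disk and database sizes are made up. Only the 2.5X and 8X ratios come from the claims quoted above.

```python
def expansion_ratio(total_disk_tb, base_data_tb, mirror_copies=1):
    """Disk space used, with mirroring factored out, over base database size.

    mirror_copies is the number of full copies kept on disk
    (1 = no mirroring, 2 = everything stored twice).
    """
    if base_data_tb <= 0 or mirror_copies < 1:
        raise ValueError("base size must be positive and mirror_copies >= 1")
    return (total_disk_tb / mirror_copies) / base_data_tb


def disk_needed_tb(base_data_tb, ratio, mirror_copies=1):
    """Raw disk implied by a given Expansion Ratio and mirroring level."""
    return base_data_tb * ratio * mirror_copies


# Hypothetical warehouse: 10 TB of base data on 50 TB of 2-way mirrored disk.
print(expansion_ratio(50, 10, mirror_copies=2))  # 2.5

# The same 10 TB of base data at two of the ratios quoted above, unmirrored.
print(disk_needed_tb(10, 2.5))  # 25.0
print(disk_needed_tb(10, 8.0))  # 80.0
```

The gap between 25 and 80 terabytes for the same base data is the kind of difference that makes these ratios worth arguing about.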

Categories: Data warehouse appliances, Data warehousing, Database compression, DATAllegro, IBM and DB2, Netezza, Oracle, Teradata

9 Comments

September 27, 2006

Logless, lockless Netezza more carefully explained

I talked at length with Bill Blake and Doug Johnson of Netezza today. (Bill is exactly the person I previously complained about having lost access to.) One takeaway was a clarification of their approach to transactions, which sounds even cooler than I first thought. It’s actually not a new idea; they just timestamp rows with CreateIDs and DeleteIDs, then exploit those to the hilt. Actually, it seems like this approach would be interesting in OLTP as well, although I’m not aware of it being used in any of the more successful OLTP DBMSs. (Yes, this is an open invitation to fans of less-established DBMS products to tell me of their virtues, preferably in a flame-free manner.)
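
To illustrate the general idea of stamping rows with create and delete IDs, here is a deliberately simplified sketch. It is a guess at the flavor of the technique, not a description of Netezza's implementation: the `Table` and `RowVersion` classes, the single global transaction counter, and the visibility rule are all assumptions, and the sketch ignores aborted or in-flight transactions entirely.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RowVersion:
    key: str
    value: int
    create_id: int                   # transaction that created this version
    delete_id: Optional[int] = None  # transaction that deleted it, if any


class Table:
    def __init__(self):
        self.versions = []
        self.next_txn_id = 1

    def _new_txn(self):
        txn = self.next_txn_id
        self.next_txn_id += 1
        return txn

    def _mark_deleted(self, key, txn):
        # Old versions are never overwritten, only stamped as deleted.
        for v in self.versions:
            if v.key == key and v.delete_id is None:
                v.delete_id = txn

    def insert(self, key, value):
        txn = self._new_txn()
        self.versions.append(RowVersion(key, value, txn))
        return txn

    def update(self, key, value):
        # An update is a delete of the old version plus an insert of a new one.
        txn = self._new_txn()
        self._mark_deleted(key, txn)
        self.versions.append(RowVersion(key, value, txn))
        return txn

    def delete(self, key):
        txn = self._new_txn()
        self._mark_deleted(key, txn)
        return txn

    def read(self, as_of):
        """Rows visible to a reader whose snapshot is transaction `as_of`."""
        return {
            v.key: v.value
            for v in self.versions
            if v.create_id <= as_of
            and (v.delete_id is None or v.delete_id > as_of)
        }


t = Table()
t1 = t.insert("a", 1)
t2 = t.update("a", 2)
t3 = t.delete("a")
print(t.read(t1), t.read(t2), t.read(t3))  # {'a': 1} {'a': 2} {}
```

Because a reader only compares IDs against its own snapshot and no row is ever changed in place, readers in this sketch never need to lock anything, and old versions remain available for as long as they are kept around.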

Categories: Data warehouse appliances, Netezza

5 Comments

September 27, 2006

Oracle and Microsoft in data warehousing

Most of my recent data warehouse engine research has been with the specialists. But over the past couple of days I caught up with Oracle and Microsoft (IBM is scheduled for Friday). In at least three ways, it makes sense to lump those vendors together, and contrast them with the newer data warehouse appliance startups:

Shared-everything architecture
End-to-end solution story
OLTP industrial-strengthness carried over to data warehousing

In other ways, of course, their positions are greatly different. Oracle may have a full order-of-magnitude lead on Microsoft in warehouse sizes, for example, and has a broad range of advanced features that Microsoft either hasn’t matched yet, or else just released in SQL Server 2005. Microsoft was earlier in pushing DBA ease as a major product design emphasis, although Oracle has played vigorous catch-up in Oracle 10g.

Categories: Data warehouse appliances, DATAllegro, EAI, EII, ETL, ELT, ETLT, IBM and DB2, Microsoft and SQL*Server, Netezza, Oracle, Parallelization, Teradata

1 Comment

September 24, 2006

More on data warehouse architecture choices

The very name of this blog comes from the kind of “horses for courses” data store strategy implied by my recent post on different kinds of data warehouse uses. A number of other commentators have recently made similar points, although they may not agree with every detail. For example, William McKnight pretty much makes the pure DBMS2 argument, pointing out that a partially virtual warehouse is often superior to a fully centralized physical one. And Andy Hayler of Kalido says pretty much the same thing, although he strongly calls out his difference in emphasis from William’s view.

A tip of the hat to Mark Rittman for pointing me to those two and others.

Categories: Data warehouse appliances, EAI, EII, ETL, ELT, ETLT, Theory and architecture
