A quick survey of data warehouse management technology
There are at least 16 different vendors offering appliances and/or software that do database management primarily for analytic purposes.* That’s a lot to keep up with. So I’ve thrown together a little overview of the analytic data management landscape, liberally salted with links to information about specific vendors, products, or technical issues. In some ways, this is a companion piece to my prior post about data warehouse appliance myths and realities.
*And that’s just the tabular/alphanumeric guys. Add in text search and you run the total a lot higher.
Numerous data warehouse specialists offer traditional row-based relational DBMS architectures, but optimize them for analytic workloads. These include Teradata, Netezza, DATAllegro, Greenplum, Dataupia, and SAS. All of those except SAS are wholly or primarily vendors of MPP/shared-nothing data warehouse appliances. EDIT: See the comment thread for a correction re Kognitio.
Numerous data warehouse specialists offer column-based relational DBMS architectures. These include Sybase (with the Sybase IQ product, originally from Expressway), Vertica, ParAccel, Infobright, Kognitio (formerly White Cross), and Sand. Their products are generally available in software-only formats, although Vertica and ParAccel package their offerings as appliances too.
There are some array-based MOLAP (Multidimensional OnLine Analytical Processing) systems left. But the major ones are all now at Oracle, Microsoft, and IBM. Essbase wound up at Oracle, via the Hyperion acquisition. Express went to Oracle long ago, and got tightly integrated into the Oracle DBMS. Microsoft Analysis Services contains a MOLAP engine federated to Microsoft SQL Server. Applix’s memory-centric TM1 went to Cognos, which had a couple of other MOLAP engines as well; Cognos is being bought by IBM.
There aren’t any star-schema specialists of note left. Most of them (actually just two, namely Red Brick and Stanford) merged into Informix a decade ago. Informix was later bought (in two stages) by IBM. Star schemas are now just a feature of general-purpose systems.
Of course, every general-purpose relational database management system can be used for a lot of analytic purposes. That’s the whole reason Codd introduced the relational model. What’s more, the leading SMP/shared-everything DBMS – Oracle, DB2 mainframe, and to a lesser extent Microsoft SQL Server – can be used even for very large databases, if you partition carefully and write your SQL code accordingly.
That’s 14 vendors already, without mentioning Calpont (hasn’t briefed me recently enough), HP (ditto, and partly working through Vertica), Sun (working through Greenplum and ParAccel), Attivio, the memory-centric engines of BI vendors such as QlikTech and SAP (not exactly database management), or the complex event/stream processing vendors such as Coral8, StreamBase, or Progress Apama (ditto). Methinks there’s some consolidation ahead.
Yet more links:
- Why Oracle and Microsoft are losing in VLDB data warehousing
- Three ways Oracle and Microsoft could catch up in MPP data warehousing
- IBM is oddly weak in the data warehouse market
- Some very big Teradata sites
- Extensive and overlapping coverage of Netezza, Vertica, database compression, and column-oriented database architectures.
- DATAllegro as an exemplar of non-proprietary index-light MPP data warehouse appliances
- An old article on Oracle’s integration of Express.
Comments
11 Responses to “A quick survey of data warehouse management technology”
Curt, Kognitio WX2 (formerly WhiteCross) is a standard row-based relational database similar to Netezza, DATAllegro, Teradata, etc. The differentiation from these other vendors is that WX2 is a software-based product that will run on commodity x86 servers running Linux, i.e., a virtual data warehouse appliance. WX2 has always been row-based for simplicity (in an MPP architecture) and for high-performance scalable loading.
Paul Groom
Director, Business Intelligence
Kognitio
Oh, crumb. I’m sorry about that.
I should have checked my own post from last year. http://www.dbms2.com/2006/10/05/introduction-to-kognitio-wx-2/
CAM
No DB2 on AIX! Are you serious? Gartner continues to put DB2/AIX in the top right-hand corner of their quadrants.
Fair enough. I didn’t say what I could or should about DB2. DB2 mainframe is another shared-everything system. DB2 on open systems — in practice, that means AIX — is in theory a solid MPP/shared-nothing system, with the BCUs playing a somewhat appliance-like role.
As I said in http://www.dbms2.com/2007/10/05/the-four-horsemen-of-data-warehousing/ and http://www.dbms2.com/2007/10/09/another-firm-that-never-sees-db2-in-data-warehousing/ , it’s pretty surprising how little data warehouse traction DB2 has, given DB2’s architecture as per http://www.dbms2.com/2006/10/03/ibm-and-teradata-too/ .
My comments about the Gartner MQ are summed up in http://www.dbms2.com/2007/10/19/gartner-2007-magic-quadrant-for-data-warehouse-database-management-systems/ (2007) and http://www.dbms2.com/2006/10/03/vendor-segmentation-for-data-warehouse-dbms/ (2006).
CAM
Wait a moment — was I also wrong when I wrote that Kognitio “relies on compressed bitmaps” for data access?
CAM
Curt, with regard to “compressed bitmaps,” and not knowing much about Kognitio, two things:
– Why bother to compress a bitmap? It’s already compact, and I’d think that the overhead in compression/decompression wouldn’t be worth the space savings.
– I believe that column stores typically don’t rely on indexes. That’s one reason they have fast load times.
1. My confusion was to remember the bitmaps and think that Kognitio was actually a column store. In part, it’s a distinction without a difference. Bitmaps have the same updating issues column stores do.
2. “Compression” in bitmaps comes into play in at least two ways. One is sparsity. In a naive bitmap, if the cardinality of a column is N, then 1/N of the entries will be 1 and (N-1)/N of them will be 0. That’s food for sparsity compression.
Second, that’s not how bitmaps really are implemented. If cardinality is 1024, there aren’t 1024 columns of bits implemented. Rather, numbers are assigned from 0 to 1023, and those are represented in 10 columns of bits. I.e., bitmaps and dictionary/tokenized compression are pretty much the same thing these days, with “bitmap” being a somewhat antiquated term.
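To make the two points above concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions, not a description of how Kognitio or any other vendor actually stores data. The function names (`naive_bitmaps`, `run_length_encode`, `bit_sliced`, `rows_equal_to`) are hypothetical, and run-length encoding stands in for whatever sparsity compression a real product might use.

```python
def naive_bitmaps(values):
    """One bitmap per distinct value: bit i is 1 where row i holds that value."""
    return {v: [1 if x == v else 0 for x in values] for v in sorted(set(values))}


def run_length_encode(bits):
    """Collapse a list of bits into (bit, run_length) pairs.

    A sparse bitmap is mostly long runs of 0s, so it shrinks a lot.
    """
    runs = []
    for b in bits:
        if runs and runs[-1][0] == b:
            runs[-1] = (b, runs[-1][1] + 1)
        else:
            runs.append((b, 1))
    return runs


def bit_sliced(values):
    """Assign each distinct value an integer token, then store one bit column
    per binary digit of the token (dictionary/tokenized encoding)."""
    dictionary = {v: i for i, v in enumerate(sorted(set(values)))}
    # Number of bits needed to represent tokens 0 .. cardinality-1.
    width = max(1, (len(dictionary) - 1).bit_length())
    tokens = [dictionary[v] for v in values]
    slices = [[(t >> k) & 1 for t in tokens] for k in range(width)]
    return dictionary, slices


def rows_equal_to(value, dictionary, slices):
    """Return the row numbers whose token matches `value` in every bit column."""
    token = dictionary[value]
    return [
        i
        for i in range(len(slices[0]))
        if all(col[i] == (token >> k) & 1 for k, col in enumerate(slices))
    ]


# A column with cardinality 1024, spread evenly over 4096 rows.
column = [i % 1024 for i in range(4096)]

bitmaps = naive_bitmaps(column)
print(len(bitmaps))                         # 1024 bitmaps, each 4096 bits long
print(sum(bitmaps[0]) / len(bitmaps[0]))    # fraction of 1s is 1/1024
print(len(run_length_encode(bitmaps[0])))   # only a handful of runs

dictionary, slices = bit_sliced(column)
print(len(slices))                          # 10 bit columns instead of 1024
print(rows_equal_to(5, dictionary, slices))
```

In the naive form, each of the 1024 bitmaps is almost entirely zeros, which is why run-length encoding collapses it to a few runs. In the bit-sliced form, the same column needs only 10 bit columns, and an equality lookup becomes a comparison across those 10 columns.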
CAM
Hi:
I have worked pretty extensively with DB2 ESE/EEE, and somewhat with both Greenplum and WX2. The latter two are marketing themselves as high-end data warehouse MPP databases. Can any of you please explain to me why they are superior to DB2? I do not see any difference in their architecture! All three are MPPs. One of them is very proven software, another runs on Postgres, and the third is homemade.
Performance-wise, no one would be able to beat DB2, as it runs on P series hardware with POWER6 CPUs running at 4 GHz.
I am wondering why IBM cannot position DB2 as a competitor of Teradata and why they are going after Oracle.
Thanks
DB2 absolutely sounds like it has a good architecture. I don’t know what their problem is either. Perhaps, as in the special case of Viper, the practical implementation doesn’t live up to theory?
I would take issue with you on one thing — DB2 rightfully competes BOTH with Teradata and Oracle.
CAM