IBM and DB2
Analysis of IBM and various of its product lines in database management, analytics, and data integration.
- Cognos
- solidDB
- (in The Monash Report) Operational and strategic issues for IBM
- (in Text Technologies) IBM in the text analytics market
- (in Software Memories) Historical notes on IBM
- (in Software Memories) Historical notes on Informix
Vendor segmentation for data warehouse DBMS
February, 2011 edit: I’ve now commented on Gartner’s 2010 Data Warehouse Database Management System Magic Quadrant as well.
Several vendors are offering links to Gartner’s new Magic Quadrant report on data warehouse DBMS. (Edit: This is now a much better link to the 2006 MQ.) Somewhat atypically for Gartner, there’s a strict hierarchy among most of the vendors, with Teradata > IBM > Oracle > Microsoft > Sybase > Kognitio > MySQL > Sand, in each case on both axes of the matrix. The only two exceptions are Netezza and DATAllegro, which are depicted as outvisioning Microsoft somewhat even as they trail both Microsoft and Sybase in execution.
Gartner Magic Quadrants tend to annoy me, and I’m not going to critique the rankings in detail. But I do think this particular MQ is helpful in framing a vendor segmentation, namely:
- Big full-spectrum MPP/shared-nothing vendors: Teradata and IBM.
- MPP/shared-nothing appliance upstarts: Netezza and DATAllegro.
- Big SMP/shared-everything vendors who also are apt to be your OLTP incumbent, and who want to integrate your software stack soup-to-nuts: Oracle and Microsoft.
- Niche vendors: Pretty much everybody else.
IBM and Teradata too
If I had to name one company with the broadest possible overview of the data warehouse engine market, it would have to be IBM. IBM offers software and hardware, services-heavy deals and quasi-appliances, OLTP and ROLAP, shared-everything and shared-nothing, integrated-(almost)-everything and best-of-breed. So their ROLAP recommendations, while still rather self-serving (just as any other vendor’s would be), are at least somewhat more than just a case of “Where you stand depends upon where you sit.”
At its core, the current IBM ROLAP story is:
- Shared nothing MPP.
- Flexible indexing, lightly applied.
- Normalized data models.
- Thoroughly mixed workloads.
- Preconfigured hardware.
Here’s some more detail, about IBM and other vendors alike.
Relational data warehouse Expansion (or Explosion) Ratios
One of the least understood aspects of data warehouse technology is what may be called the
Expansion Ratio = (Total disk space used, except for mirroring) / (Size of the base database).
This is similar to the explosion ratio discussed in the OLAP Report’s justly famous discussion of database explosion, but I’m going with my own terminology because I don’t want to be tied to their precise terminology, nor to their technical focus. Expansion Ratios are hotly debated, with some figures being:
- Teradata claims an Expansion Ratio of 8-9X for Oracle, 6X for DB2 (open system version), and 2.5X for Teradata. The underlying source is data warehouses they’ve replaced, so there may be a bias toward out-of-control warehouses on the part of their competitors.
- An anonymous appliance vendor exec said to me off the top of his head that Oracle has 6-8X Expansion Ratios.
- Oracle’s TPC-H submissions in the largest size range (10 terabytes) have 9.7-10.5X Expansion Ratios, if I’m reading the TPCs correctly.
- Oracle cites a survey of 8 customers with database sizes of 10-60 TB, in which the Expansion Ratio works out to 1.6X. (More on this anomalous result below.)
I don’t have actual figures from Netezza and DATAllegro, but I imagine they’d come out lower than 2X, possibly well below.
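To make the definition concrete, here is a minimal Python sketch of the arithmetic. The function names, the 10 TB base size, and the choice of range midpoints are illustrative assumptions, not anything the vendors published; the ratios themselves are the claims quoted above. Mirroring is left out of the ratio, as in the definition, and applied separately as an assumed simple 2x factor.

```python
def expansion_ratio(total_disk_bytes: float, base_data_bytes: float) -> float:
    """Total disk used (excluding mirroring) divided by the size of the base database."""
    if base_data_bytes <= 0:
        raise ValueError("base database size must be positive")
    return total_disk_bytes / base_data_bytes


def disk_needed(base_data_bytes: float, ratio: float, mirrored: bool = False) -> float:
    """Disk required for a given base size and Expansion Ratio.

    Mirroring is excluded from the ratio by definition, so it is applied
    afterwards as a separate 2x factor (assumes simple two-way mirroring).
    """
    unmirrored = base_data_bytes * ratio
    return unmirrored * 2 if mirrored else unmirrored


TB = 10 ** 12

# Ratios as claimed in the post. Where a range was quoted, the midpoint is
# used here purely for illustration.
claimed_ratios = {
    "Oracle (Teradata's claim)": 8.5,
    "DB2 (Teradata's claim)": 6.0,
    "Teradata (own claim)": 2.5,
    "Oracle (customer survey)": 1.6,
}

base = 10 * TB  # hypothetical base database size
for label, ratio in claimed_ratios.items():
    needed_tb = disk_needed(base, ratio) / TB
    print(f"{label}: {needed_tb:.1f} TB of disk for 10 TB of base data")
```

The point of running the numbers is only that the spread between the claimed ratios translates directly into a several-fold spread in disk purchased for the same base data.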
Oracle and Microsoft in data warehousing
Most of my recent data warehouse engine research has been with the specialists. But over the past couple of days I caught up with Oracle and Microsoft (IBM is scheduled for Friday). In at least three ways, it makes sense to lump those vendors together, and contrast them with the newer data warehouse appliance startups:
- Shared-everything architecture
- End-to-end solution story
- OLTP industrial-strengthness carried over to data warehousing
In other ways, of course, their positions are greatly different. Oracle may have a full order-of-magnitude lead on Microsoft in warehouse sizes, for example, and has a broad range of advanced features that Microsoft either hasn’t matched yet, or else just released in SQL Server 2005. Microsoft was earlier in pushing DBA ease as a major product design emphasis, although Oracle has played vigorous catch-up in Oracle10g.
Data warehouse and mart uses – a tentative taxonomy
I’ve been posting a lot recently about the diverse database technologies used to support data warehousing. With the marketplace supporting such a broad range of architectures, it seems clear that a lot of those architectures actually deserve to thrive, presumably each in a different kind of usage scenario. So in this post I’ll take a pass at dividing up use cases for data warehouses, and suggesting which kinds of data warehouse management technologies might do the best job of supporting them. To start with, I’ve divided things into a number of buckets:
- Pinpoint data lookup
- Constrained query and reporting
- Cube-filling calculations
- Hardcore tabular data crunching
- Text and media search
- Specialty areas, such as relationship analytics
Competitive issues in data warehouse ease of administration
The last person I spoke with at the Netezza conference on Tuesday was a customer/presenter that the company had picked out for me. One thing he said baffled me: he claimed that Netezza was a real appliance vendor, but DATAllegro wasn’t, presumably due to administrability issues. Now, it wasn’t clear to me that he’d ever evaluated DATAllegro, so I didn’t take this too seriously, but still the exchange brought into focus the great differences between data warehouse products in the area of administration. For example:
- Netezza has no indices at all. And no caches. And the hardware is preconfigured. This all makes administration pretty simple.
- DATAllegro has almost no indices, and also has preconfigured hardware. But it has some partitioning, optionally.
- Teradata also has preconfigured hardware. It does have indices, but rather simple ones. Plus it has join indices. And it has a few more configuration options in other areas (e.g., block size) than the other appliance vendors. (Yes, I count Teradata among the appliances.)
- If you go through all the fuss of installing SAP’s applications and BI technology anyway, the incremental administration of just SAP BI Accelerator is pretty light.
- Oracle and IBM have mammothly complex indexing options, but have put large amounts of work into tools to lessen the resulting administrative burden.
- IBM offers preconfigured hardware units to simplify some installation issues.
- Come to think of it, I don’t really know how hard it is to administer columnar systems (e.g., Sybase IQ).
Data warehouse appliances
If we define a “data warehouse appliance” as “a special-purpose computer system, with appliance administrability, that manages a data warehouse,” then there are two major contenders: Netezza and DATAllegro, both startups, both with a small number of disclosed customers. Past contenders would include Teradata and White Cross (which seems to have just merged into Kognitio), but neither would admit to being in that market today. (I suspect this is a mistake on Teradata’s part, but so be it.) IBM with DB2 on the z-Series wouldn’t be properly regarded as an appliance player either, although IBM is certainly conscious of appliance competition. And SAP’s BI Accelerator does not persist data at this time.
In principle, the Netezza and DATAllegro stories are similar — take an established open source RDBMS*, build optimized hardware to run it, and optimize the software configuration as well. Much of the optimization is focused on getting data on and off disk sequentially, minimizing any random accesses. This is why I often refer to data warehouse appliances as being the best alternative to memory-centric data management. Beyond that, the optimizations by the two vendors differ considerably.
*Netezza uses PostgreSQL; DATAllegro uses Ingres.
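A back-of-envelope Python sketch shows why sequential disk access matters so much for this kind of workload. Every number in it is a hypothetical single-disk parameter chosen only to show the shape of the trade-off, not a measurement of either vendor’s hardware, and the function names are mine.

```python
def sequential_scan_seconds(total_bytes: float, throughput_bytes_per_s: float) -> float:
    """Time to stream the data off disk in one sequential pass."""
    return total_bytes / throughput_bytes_per_s


def random_read_seconds(total_bytes: float, page_bytes: float,
                        seek_seconds: float, throughput_bytes_per_s: float) -> float:
    """Time to read the same data one page at a time, paying one seek per page."""
    pages = total_bytes / page_bytes
    return pages * (seek_seconds + page_bytes / throughput_bytes_per_s)


# Hypothetical single-disk parameters (assumptions, not vendor figures).
GB = 10 ** 9
THROUGHPUT = 60 * 10 ** 6   # assumed 60 MB/s sequential transfer rate
SEEK = 0.008                # assumed 8 ms average seek plus rotational delay
PAGE = 8 * 1024             # assumed 8 KB page

seq = sequential_scan_seconds(100 * GB, THROUGHPUT)
rnd = random_read_seconds(100 * GB, PAGE, SEEK, THROUGHPUT)
print(f"sequential: {seq / 60:.0f} min, random: {rnd / 3600:.0f} h, "
      f"slowdown: {rnd / seq:.0f}x")
```

Under these assumed parameters the page-at-a-time read comes out tens of times slower than the sequential scan, because the slowdown is roughly one plus seek time times throughput divided by page size. Different disks change the numbers but not the direction.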
Hmm. I don’t feel like writing more on this subject at this very moment, yet I want to post something urgently because there’s an IOU in my Computerworld column today for it. OK. More later.
Philip Howard likes Viper
Philip Howard likes DB2’s Viper release. Truth be told, Philip Howard seems to like most products, whether they deserve it or not. But in this case, I think his analysis is spot-on.
DBMS2 at IBM
I had a chat a couple of weeks ago with Bob Picciano, who runs servers (i.e., DBMS) for IBM. I came away feeling that, while they don’t use that name, they’re well down the DBMS2 path. By no means is this SAP’s level of commitment; after all, they have to cater to traditional technology strategies as well. But they definitely seem to be getting there.
Why do I say that? Well, in no particular order:
- They have a huge commitment to a data integration business, with an increasing XML focus.
- Their favorite buzzword these days is “information-intensive,” which seems to amount to semi-composite apps that may talk in part to unstructured/semi-structured data.
- They’re serious about their XML data server.
- Unprompted – well, OK, he’s clearly read my stuff, but other than that it was unprompted – Bob referred to one of the key benefits (real and perceived) of XML storage as being “schema flexibility.”
- By accident or design, IBM has a multi-server, horses-for-courses DBMS strategy: DB2 in two important flavors, XML server, Multivalue/Pick (that’s growing, by the way), and so on.
The big piece of a DBMS2 strategy that IBM seems to be lacking is a data-oriented services repository. IBM has had disasters in the past with over-grand repository plans, so they’re treading cautiously this time around. There also might be an organizational issue; DBMS and integration technology sit in separate divisions, and I doubt it’s yet appreciated throughout IBM how central data is to an SOA strategy.
But that not-so-minor detail aside, IBM definitely seems to be developing a DBMS2-like technology vision.
IBM’s definition of native XML
IBM’s recent press release on Viper says:
Viper is expected to be the only database product able to seamlessly manage both conventional relational data and pure XML data without requiring the XML data to be reformatted or placed into a large object within the database.
That, so far as I know, is true, at least among major products.
I’m willing to apply the “native” label to Microsoft’s implementation anyway, because conceptually there’s little or no necessary performance difference between their approach and IBM’s. (Dang. I thought I posted more details on that months ago. I need to remedy the lack soon.)
As for Oracle — well, right now Oracle has a bit of a competitive problem …