Database compression
Analysis of technology that compresses data within a database management system. Related subjects include:
Mike Stonebraker Blasts “One Size Fits All”
When it comes to DBMS inventors, Mike Stonebraker is the next closest thing to Codd. And he’s become a huge non-believer in the idea that one DBMS architecture meets all needs.
Frankly, there isn’t much in that paper that hasn’t already been said in this blog, except for the part that is specifically relevant to one of his startups, StreamBase. Still, it’s nice to have the high-powered agreement.
More recently, the argument in that paper has been extended with a benchmark-filled follow-up based on another Stonebraker startup, Vertica.
Categories: Columnar database management, Database compression, StreamBase, Theory and architecture, Vertica Systems
Relational data warehouse Expansion (or Explosion) Ratios
One of the least understood aspects of data warehouse technology is what may be called the Expansion Ratio:
Expansion Ratio = (Total disk space used, excluding mirroring) / (Size of the base database).
This is similar to the explosion ratio discussed in the OLAP Report’s justly famous discussion of database explosion, but I’m using my own term because I don’t want to be tied to their precise terminology or to their technical focus. Expansion Ratios are hotly debated, with some figures being:
- Teradata claims an Expansion Ratio of 8-9X for Oracle, 6X for DB2 (open system version), and 2.5X for Teradata. The underlying source is data warehouses they’ve replaced, so there may be a bias toward out-of-control warehouses on the part of their competitors.
- An anonymous appliance vendor exec said to me off the top of his head that Oracle has 6-8X Expansion Ratios.
- Oracle’s TPC-H submissions in the largest size range (10 terabytes) have 9.7-10.5X Expansion Ratios, if I’m reading the TPCs correctly.
- Oracle cites a survey of 8 customers with database sizes of 10-60 TB in which the Expansion Ratio works out to 1.6X. (More on this anomalous result below.)
I don’t have actual figures from Netezza and DATAllegro, but I imagine they’d come out lower than 2X, possibly well below.
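To make the arithmetic concrete, here is a minimal Python sketch of the Expansion Ratio as defined above. The function names and the 10 TB base size are assumptions made for illustration; only the 2.5X, 6X, and 9X ratios come from the claims quoted in the list.

```python
def expansion_ratio(total_disk_tb: float, base_data_tb: float) -> float:
    """Total disk space used (mirroring excluded) divided by base database size."""
    if base_data_tb <= 0:
        raise ValueError("base database size must be positive")
    return total_disk_tb / base_data_tb


def disk_needed_tb(base_data_tb: float, ratio: float) -> float:
    """Disk space (before mirroring) implied by a given Expansion Ratio."""
    return base_data_tb * ratio


# Hypothetical 10 TB base database; the ratios are the claimed figures above.
base_tb = 10.0
for label, ratio in [("2.5X", 2.5), ("6X", 6.0), ("9X", 9.0)]:
    print(f"{label}: {disk_needed_tb(base_tb, ratio):.0f} TB of disk")
```

On those assumptions, the same base data would occupy anywhere from 25 TB to 90 TB of unmirrored disk, which is why the ratio matters so much when sizing hardware.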
Categories: Data warehouse appliances, Data warehousing, Database compression, DATAllegro, IBM and DB2, Netezza, Oracle, Teradata
SAP’s BI Accelerator
I wrote about SAP’s BI Accelerator quite a bit in my white paper on memory-centric data management, but otherwise I seem not to have posted much about it here. In essence, it’s a product that’s all RAM-based, and generally geared for multi-hundred-gigabyte data marts. The basic design is a compression-heavy column-based architecture, evolved from SAP’s text-indexing technology TREX. Like data warehouse appliances, it eschews indexing, relying instead on blazingly fast table scans.
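As a rough illustration of the general idea of a compressed, in-memory column that is scanned rather than indexed, here is a small Python sketch. It is not SAP's implementation: dictionary encoding is assumed purely as a common example of column compression, and the function names and sample data are hypothetical.

```python
from array import array


def dictionary_encode(values):
    """Replace each value with a small integer code; return (dictionary, codes)."""
    dictionary = sorted(set(values))
    code_of = {value: i for i, value in enumerate(dictionary)}
    # Assumption: compact unsigned-short codes stand in for the compressed column.
    codes = array("H", (code_of[value] for value in values))
    return dictionary, codes


def scan_count(dictionary, codes, target):
    """Count matching rows with a full scan over the codes; no index is built."""
    try:
        target_code = dictionary.index(target)
    except ValueError:
        return 0  # value never occurs in the column
    return sum(1 for code in codes if code == target_code)


# Hypothetical sample column.
regions = ["EMEA", "APAC", "EMEA", "AMER", "EMEA", "APAC"]
dictionary, codes = dictionary_encode(regions)
print(scan_count(dictionary, codes, "EMEA"))
```

The scan compares small integers instead of strings, which is one reason a compressed column held in RAM can be scanned quickly without an index.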
I asked Lothar Schubert of SAP how BIA was doing in the market in its early going. This was his response: