White paper — Index-Light MPP Data Warehousing
Many of my thoughts on data warehouse DBMS and appliances have been collected in a white paper, sponsored by DATAllegro. As in a couple of other white papers — collected here — I coined a phrase to describe the core concept: Index-light. MPP row-oriented data warehouse DBMSs certainly have indices, which are occasionally even used. But the approaches to database design that are supported or make sense to use are simply different for DATAllegro, Netezza (the most extreme example of all) or Teradata than for Oracle or Microsoft. And the differences are all in the direction of less indexing.
Here’s an excerpt from the paper. Please pardon the formatting; it reads better in the actual PDF.
Different DBMS are best at different tasks.
A single relational database management system (RDBMS) can perform a broad variety of duties. It may even do them all pretty well. But for some uses, a special-purpose product can greatly outperform general-purpose systems. Complex data warehousing is such a task.
Index-light MPP appliances excel at data warehousing.
For most data warehouses, market-leading general-purpose RDBMS are good enough. But for complex queries against multi-terabyte data warehouses, index-light MPP data warehouse appliances are a much more efficient option. Offered by DATAllegro, Netezza, Teradata (if you use the term “appliance” a bit loosely), and IBM (if you use the term “appliance” very loosely), these systems beat their index-heavy SMP counterparts on several major criteria:
Much of this superiority stems from three factors.
The index-light MPP (Massively Parallel Processing) appliance story hinges on three technical factors:
1. Shared-nothing MPP. Loosely coupled systems are significantly cheaper than tightly coupled ones, for the same level of raw component performance.
2. Reduced use of indices. By minimizing redundant references to information, index-light systems can store up to 7X less data than index-heavy ones. This produces enormous savings both in hardware and in administrative costs.
3. Avoidance of random disk reads. Disk rotation speeds have only improved 12.5-fold in the past 50 years, making random disk lookup the greatest constraint on conventional RDBMS performance. Index-light systems largely evade this bottleneck.
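As a rough illustration of the third factor, the arithmetic can be sketched in Python. The 1,200 and 15,000 RPM figures are assumptions chosen to reproduce the 12.5-fold ratio quoted above; the seek time, scan rate, and data volumes are likewise illustrative guesses, not measurements from the paper.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average wait for the target sector: half of one revolution, in ms."""
    return 0.5 * 60_000.0 / rpm


def random_read_seconds(n_reads: int, rpm: float, seek_ms: float) -> float:
    """Total time for n independent random reads (seek + rotational wait each)."""
    return n_reads * (seek_ms + avg_rotational_latency_ms(rpm)) / 1000.0


def sequential_scan_seconds(total_mb: float, mb_per_sec: float) -> float:
    """Total time to stream total_mb off disk at a sustained transfer rate."""
    return total_mb / mb_per_sec


# Assumed rotation speeds: an early drive vs. a fast modern one.
OLD_RPM = 1_200
NEW_RPM = 15_000
rotation_speedup = NEW_RPM / OLD_RPM  # the 12.5-fold figure

# Assumed workload: one million index-driven random reads vs. a full
# sequential scan of a 100 GB table at 100 MB/s (all hypothetical).
random_cost = random_read_seconds(1_000_000, NEW_RPM, seek_ms=3.5)
scan_cost = sequential_scan_seconds(100_000, mb_per_sec=100)

print(f"rotation speedup: {rotation_speedup}x")
print(f"random reads: {random_cost:.0f} s, sequential scan: {scan_cost:.0f} s")
```

Under these assumed numbers, the million random reads cost about 5,500 seconds, while scanning the whole 100 GB sequentially takes about 1,000 seconds. The exact crossover depends entirely on the hardware and on query selectivity.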
DATAllegro offers a prime example.
DATAllegro offers what may be the archetype of the index-light MPP appliance strategy. A typical system contains multiple standard servers, each responsible for 6-12 standard disk drives, for a total installation in the tens of terabytes. (Indeed, as of DATAllegro V3, the servers and storage units are just standard Dell and EMC products respectively.) Data generally comes off the disks in full table or partition scans, in 24-megabyte blocks, but you can use the functionality of Ingres if you want to. And the whole thing is a lot faster and cheaper than conventional index-heavy alternatives.
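A minimal sketch of what scanning in large blocks looks like, in Python. Only the 24-megabyte block size comes from the text; the fixed-width row layout and the function names are hypothetical and are not DATAllegro's implementation.

```python
import io
from typing import BinaryIO, Callable, Iterator

BLOCK_BYTES = 24 * 1024 * 1024  # 24 MB block size quoted in the text


def scan_blocks(stream: BinaryIO, block_bytes: int = BLOCK_BYTES) -> Iterator[bytes]:
    """Yield consecutive blocks front to back, with no seeks in between."""
    while True:
        block = stream.read(block_bytes)
        if not block:
            return
        yield block


def count_matching_rows(
    stream: BinaryIO,
    predicate: Callable[[bytes], bool],
    row_bytes: int,
    block_bytes: int = BLOCK_BYTES,
) -> int:
    """Full scan without an index: test every fixed-width row in every block."""
    if block_bytes % row_bytes != 0:
        raise ValueError("block_bytes must be a multiple of row_bytes")
    matches = 0
    for block in scan_blocks(stream, block_bytes):
        for offset in range(0, len(block), row_bytes):
            if predicate(block[offset:offset + row_bytes]):
                matches += 1
    return matches


# Tiny in-memory "table": 10 rows of 4 bytes each (hypothetical layout),
# scanned in 16-byte blocks so the example runs instantly.
table = io.BytesIO(b"".join(i.to_bytes(4, "big") for i in range(10)))
even_rows = count_matching_rows(
    table,
    lambda row: int.from_bytes(row, "big") % 2 == 0,
    row_bytes=4,
    block_bytes=16,
)
print(even_rows)
```

The point of the design is that every read is the next block on disk, so the drive streams rather than seeks; the price is that every row gets examined, which only pays off when queries touch a large share of the table.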
Comments
4 Responses to “White paper — Index-Light MPP Data Warehousing”
[…] up on an earlier piece, DATAllegro has sponsored a second white paper on MPP data warehouse appliances. This one focuses […]
[…] Yes, Netezza streams data off of disk rather than doing a lot of random seeks. But DATAllegro does the same thing, without recourse to FPGAs. That doesn’t really have much to do with complex event processing […]
[…] that the data warehouse appliance vendors have ALREADY disrupted the market he’s focusing on. Index-light row-based and columnar systems are both super fast at data mining […]
[…] been referring to the disk (rotation) speed bottleneck for years, but I don’t really have a clean link for it. Let me fix that right […]