March 26, 2007

White paper — Index-Light MPP Data Warehousing

Many of my thoughts on data warehouse DBMS and appliances have been collected in a white paper, sponsored by DATAllegro. As in a couple of my other white papers, I coined a phrase to describe the core concept: index-light. MPP row-oriented data warehouse DBMSs certainly have indices, which are occasionally even used. But the database design approaches they support, and that make sense to use on them, are simply different for DATAllegro, Netezza (the most extreme example of all) or Teradata than for Oracle or Microsoft SQL Server. And the differences all run in the direction of less indexing.

Here’s an excerpt from the paper. Please pardon the formatting; it reads better in the actual PDF.

Different DBMS are best at different tasks.

A single relational database management system (RDBMS) can perform a broad variety of duties. It may even do them all pretty well. But for some uses, a special-purpose product can greatly outperform general-purpose systems. Complex data warehousing is such a task.
Index-light MPP appliances excel at data warehousing.

For most data warehouses, market-leading general-purpose RDBMS are good enough. But for complex queries against multi-terabyte data warehouses, index-light MPP data warehouse appliances are a much more efficient option. Offered by DATAllegro, Netezza, Teradata (if you use the term “appliance” a bit loosely), and IBM (if you use the term “appliance” very loosely), these systems beat their index-heavy SMP counterparts on several major criteria:
- Performance
- Price/performance
- Consistency of performance
- Administration costs
Much of this superiority stems from three factors.

The index-light MPP (Massively Parallel Processing) appliance story hinges on three technical factors:
1. Shared-nothing MPP. Loosely coupled systems are significantly cheaper than tightly coupled ones for the same level of raw component performance.
2. Reduced use of indices. By minimizing redundant references to information, index-light systems can store as little as one-seventh as much data as index-heavy ones. This produces enormous savings in both hardware and administrative costs.
3. Avoidance of random disk reads. Disk rotation speeds have improved only 12.5-fold in the past 50 years, making random disk lookup the greatest constraint on conventional RDBMS performance. Index-light systems largely evade this bottleneck.
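To make the random-versus-sequential point concrete, here is a back-of-the-envelope sketch in Python. The 12.5-fold figure presumably compares IBM's 1956 RAMAC drive at 1,200 RPM with a 15,000 RPM enterprise drive. The seek time and transfer rate below are assumptions chosen only for illustration, not measurements of any product, and the function names are hypothetical. Each block read is charged one seek plus half a revolution of rotational latency, then the block's own transfer time; 24 MB is the scan block size the paper cites for DATAllegro.

```python
# Back-of-the-envelope disk arithmetic. SEEK_MS and MEDIA_MB_PER_S are
# illustrative assumptions, not measurements of any particular drive.
SEEK_MS = 3.5            # assumed average seek time
MEDIA_MB_PER_S = 80.0    # assumed sustained sequential transfer rate


def avg_rotational_latency_ms(rpm: float) -> float:
    """On average the head waits half a revolution for the data to arrive."""
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution / 2.0


def effective_mb_per_s(block_kb: float, rpm: float = 15_000,
                       seek_ms: float = SEEK_MS,
                       media_mb_per_s: float = MEDIA_MB_PER_S) -> float:
    """Throughput when every block read pays one seek plus rotational latency."""
    block_mb = block_kb / 1024.0
    positioning_ms = seek_ms + avg_rotational_latency_ms(rpm)
    transfer_ms = block_mb / media_mb_per_s * 1000.0
    return block_mb / ((positioning_ms + transfer_ms) / 1000.0)


if __name__ == "__main__":
    # 1,200 RPM vs 15,000 RPM: the 12.5-fold improvement in rotational latency.
    print(avg_rotational_latency_ms(1_200) / avg_rotational_latency_ms(15_000))
    # Index-style random reads of small pages vs large sequential blocks.
    print(f"8 KB random reads: {effective_mb_per_s(8):.1f} MB/s")
    print(f"24 MB blocks:      {effective_mb_per_s(24 * 1024):.1f} MB/s")
```

Under these assumed parameters, 8 KB random reads deliver about 1.4 MB/s per drive while 24 MB sequential blocks deliver about 78.6 MB/s, a gap of more than 50x. Positioning time, not transfer time, dominates small reads, which is why index-light systems prefer to scan.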
DATAllegro offers a prime example.

DATAllegro offers what may be the archetype of the index-light MPP appliance strategy. A typical system contains multiple standard servers, each responsible for 6 to 12 standard disk drives, for a total installation in the tens of terabytes. (Indeed, as of DATAllegro V3, the servers and storage units are just standard Dell and EMC products, respectively.) Data generally comes off the disks in full table or partition scans, in 24-megabyte blocks, though you can still use the underlying Ingres functionality if you want to. And the whole thing is a lot faster and cheaper than conventional index-heavy alternatives.
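The shared-nothing, scan-everything pattern can be sketched as a toy Python program. This is not DATAllegro's implementation: threads stand in for the servers, an in-memory list stands in for each server's disks, and all function names and sample rows are hypothetical. Rows are hash-partitioned across nodes; each node scans its entire shard with no index lookups and computes a partial aggregate; a coordinator then merges the partials.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32


def node_for(key: str, num_nodes: int) -> int:
    # crc32 gives a stable hash; Python's built-in hash() is salted per process.
    return crc32(key.encode("utf-8")) % num_nodes


def partition(rows, num_nodes):
    """Distribute rows across nodes by hashing the distribution key."""
    shards = [[] for _ in range(num_nodes)]
    for row in rows:
        shards[node_for(row["customer"], num_nodes)].append(row)
    return shards


def scan_and_aggregate(shard, min_amount):
    """One node: full scan of its local shard, filter, then partial SUM by region."""
    partial = defaultdict(int)
    for row in shard:
        if row["amount"] >= min_amount:
            partial[row["region"]] += row["amount"]
    return partial


def parallel_sum_by_region(shards, min_amount):
    """Coordinator: run every node's scan concurrently, then merge the partials."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda shard: scan_and_aggregate(shard, min_amount), shards))
    totals = defaultdict(int)
    for partial in partials:
        for region, amount in partial.items():
            totals[region] += amount
    return dict(totals)


# Hypothetical sample data.
SAMPLE_ROWS = [
    {"customer": "c1", "region": "east", "amount": 120},
    {"customer": "c2", "region": "west", "amount": 40},
    {"customer": "c3", "region": "east", "amount": 75},
    {"customer": "c4", "region": "west", "amount": 300},
    {"customer": "c5", "region": "north", "amount": 10},
]

if __name__ == "__main__":
    shards = partition(SAMPLE_ROWS, num_nodes=3)
    print(parallel_sum_by_region(shards, min_amount=50))
```

The result (east 195, west 300) is the same however many nodes the rows are spread over. Because each node reads only its own disks, adding servers adds scan bandwidth; the trade-off is that queries read whole tables or partitions rather than looking up individual rows.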

Categories: Data warehouse appliances, Data warehousing, DATAllegro, EMC, Theory and architecture


Comments

4 Responses to “White paper — Index-Light MPP Data Warehousing”

Another short white paper on MPP data warehouse appliances (DBMS2), May 10th, 2007 12:34 pm

[…] up on an earlier piece, DATAllegro has sponsored a second white paper on MPP data warehouse appliances. This one focuses […]

Notes from the Netezza user conference (DBMS2), April 25th, 2008 12:07 am

[…] Yes, Netezza streams data off of disk rather than doing a lot of random seeks. But DATAllegro does the same thing, without recourse to FPGAs. That doesn’t really have much to do with complex event processing […]

MapReduce for data mining? Maybe for variable-schema analytics. (DBMS2), August 25th, 2008 3:52 am

[…] that the data warehouse appliance vendors have ALREADY disrupted the market he’s focusing on. Index-light row-based and columnar systems are both super fast at data mining […]

The disk rotation speed bottleneck (DBMS2), January 31st, 2010 6:02 pm

[…] been referring to the disk (rotation) speed bottleneck for years, but I don’t really have a clean link for it. Let me fix that right […]
