Dividing the data warehousing work among MPP nodes
I talk with lots of vendors of MPP data warehouse DBMS. I’ve now heard enough different approaches to MPP architecture that I think it might be interesting to contrast some of the alternatives.
The base-case MPP DBMS architecture is one in which there are two kinds of nodes:
- A boss node, whose jobs include:
  - Receiving and parsing queries
  - Optimizing queries, determining execution plans, and sending execution plans to the worker nodes
  - Receiving result sets and sending them back to the querier
- Worker nodes, which do their share of the query execution and eventually ship data back to the boss ("head") node
In primitive forms of this architecture, there’s a “fat head” that does altogether too much aggregation and query resolution. In more mature versions, data is shipped intelligently from worker nodes to their peers, reducing or eliminating “fat head” bottlenecks.
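To make the boss/worker division concrete, here is a minimal, hypothetical sketch in Python (not any vendor's actual code), assuming a single SUM query whose partial results can be combined cheaply: the boss broadcasts a plan fragment, each worker aggregates over only its own data partition, and the boss merges small partial results rather than raw rows, which is what keeps the head from getting "fat".

```python
# Hypothetical sketch (not any vendor's actual code) of the base-case
# boss/worker split, assuming a simple SUM query whose partial results
# can be combined at the boss.

from concurrent.futures import ThreadPoolExecutor

class WorkerNode:
    def __init__(self, partition):
        self.partition = partition          # this node's slice of the fact data

    def execute(self, plan):
        # Each worker runs the plan fragment against only its own partition
        # and returns a small partial result, so the boss does not become a
        # "fat head" that re-aggregates raw rows.
        return sum(row[plan["column"]] for row in self.partition)

class BossNode:
    def __init__(self, workers):
        self.workers = workers

    def query(self, plan):
        # Parsing/optimizing would happen here; then the same plan fragment
        # is broadcast to every worker and the partial results are merged.
        with ThreadPoolExecutor() as pool:
            partials = list(pool.map(lambda w: w.execute(plan), self.workers))
        return sum(partials)                # final merge is cheap

# Example: total revenue, with the fact rows distributed across 3 workers.
workers = [WorkerNode([{"revenue": i} for i in range(n, 300, 3)]) for n in range(3)]
boss = BossNode(workers)
print(boss.query({"column": "revenue"}))   # same answer as summing one big table
```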
Exceptions to the base case include Vertica and Exasol. In their systems, all nodes run identical software. At the other extreme, some vendors use dedicated nodes for particular purposes. For example, Aster Data famously has special nodes for bulk data loading and export. Greenplum has a logical split between nodes that execute queries and nodes that talk to storage, and is considering offering the option of physically separating them in a future release.
The basic tradeoffs between these schemes go something like this:
- If there are more kinds of dedicated nodes, real-time load-balancing is harder; you’re more likely to have idle capacity.
- If there are more kinds of dedicated nodes, you can optimize hardware better by using different kinds of hardware for different kinds of nodes. Potentially, this is a bigger factor if some kinds of nodes have dedicated disks attached and some don't.
Calpont, which hasn't actually shipped a DBMS yet, has an interesting twist. They're building a columnar DBMS in which the work is split between worker nodes, which do the query processing, and storage nodes, which talk to disk. These nodes are not in any kind of one-to-one correspondence; any worker node can talk with any storage node. Calpont believes that in the future some of the storage node logic can migrate into storage systems themselves, in almost a Netezza-like strategy, but on more standard equipment.
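As a rough illustration of that worker/storage decoupling, here's a hypothetical sketch; the class names and interfaces are my own assumptions, not Calpont's. Query nodes hold no data of their own, a catalog maps column extents to whichever storage node owns them, and any query node can read from any storage node.

```python
# Hypothetical sketch of a Calpont-style split: query ("worker") nodes are
# decoupled from storage nodes, and any worker may read column blocks from
# any storage node. Names and APIs are illustrative assumptions only.

class StorageNode:
    """Owns some column extents and serves raw blocks; no query logic."""
    def __init__(self, extents):
        self.extents = extents              # {(column, extent_id): [values]}

    def read(self, column, extent_id):
        return self.extents[(column, extent_id)]

class QueryNode:
    """Does the actual query processing; has no disks of its own."""
    def __init__(self, catalog, storage_nodes):
        self.catalog = catalog              # extent_id -> index of owning storage node
        self.storage_nodes = storage_nodes

    def scan_column(self, column, extent_ids):
        values = []
        for eid in extent_ids:
            node = self.storage_nodes[self.catalog[eid]]   # any node, no fixed pairing
            values.extend(node.read(column, eid))
        return values

# Two storage nodes, one query node that reads extents from both.
s0 = StorageNode({("price", 0): [10, 20]})
s1 = StorageNode({("price", 1): [30, 40]})
qn = QueryNode(catalog={0: 0, 1: 1}, storage_nodes=[s0, s1])
print(sum(qn.scan_column("price", [0, 1])))   # 100
```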
The Calpont story may actually make more sense in a shared-disk storage-area-network implementation than for a fully shared-nothing MPP, but that’s a subject for a different post.
Comments
The biggest challenge with specialized nodes is managing failover. While a five-node / 10TB Vertica cluster stays up when losing any two nodes, a comparably configured heterogeneous system needs at least three of each node type (head, worker, loader, chauffeur…) to have the same kind of fail-safety built in. That kind of hardware cost starts to add up fast.
Vertica’s “Fault Tolerance” requires that you put all nodes on battery power UPS because they can’t take a power hit without corrupting data. In other words, their replication scheme is broken for power failure cases, and the workaround they provide is to put all nodes on battery backup.
BTW – the comment is wrong about the number of nodes that can fail – logical functions are not physical nodes.
I’ve been told by Vertica that I have to use a star schema. I did some research and found out that other products like Teradata, Netezza and Greenplum don’t require a star schema.
Is this Vertica limitation there because they implemented parallelism differently?
Probably makes sense to clarify a few areas around Vertica's HA capabilities. When a Vertica node goes away for any reason (network issues, h/w issues, etc.), the rest of the cluster gracefully ignores it and continues humming away. When the node returns, whether reconnected, rebooted or replaced, it recovers by validating itself against the rest of the cluster. There is no chance of data corruption and no requirement for a UPS.
Schema design requirements are not an intrinsic function of how HA works. Star and snowflake schemas are popularly applied within analytic database applications, so that’s what we designed early versions of Vertica to support. The next version of Vertica (in beta now) does not require star or snowflake schemas.
I was at Vertica Wednesday, and they said that the Release 1 schema limitation was just to make it easier to write a “great” optimizer.
Leaving aside the marketing fluff of “great”, that makes sense.
And by the way — while I don’t mind anonymous snarky comments, I wish you wouldn’t go so far as to pretend to be somebody different than who you are.
Thanks,
CAM
Let me get this straight:
– Vertica release 1 (the current version) can't handle anything other than a star schema because they implemented trivial parallelism
– Authoritative-sounding documents and lectures from Vertica push star schema ‘because it’s better for DW’
– when the product is re-architected, they will change their opinion on schema to fit the product
So, which is it? Is strict star schema really better? Let’s get Vertica on record here.
I was over at Vertica on Wednesday and they insisted that, despite my skepticism, Mike Stonebraker seems right — the vast majority of data warehouse users are content with single fact tables.
Of course, when all you sell is a hammer, you generally only get into a sales cycle with prospects who have lots of nails.
Anyhow, would you care to elaborate on the “trivial parallelism” comment about Vertica? Nobody ever seems to spell it out credibly.
CAM
If you are in a hurry and need to parallelize SQL, then you can build a parallel DB based on the following rules:
1 – Fact table that is distributed across disks and CPUs
2 – Dimension tables that are replicated on all CPUs
3 – Joins only occur between fact table and replicated dimension tables
4 – Aggregation (GROUP BY) only occurs on the column or columns that are used to distribute the fact table
These rules imply that dimension tables must be very small; otherwise, rule 2 means a huge duplication of data as the CPU count grows.
Provided you follow these rules, you don’t have to do anything special to parallelize SQL:
– You don’t have to write an interconnect
– No optimizer changes needed except to enforce rules 1-4
This approach is sometimes referred to as a UNION ALL approach to parallelism. It’s so simple, you can implement it in a couple of months’ work. This has been done by tools like EnterpriseDB GridSQL (formerly ExtenDB), C-JDBC, Datallegro and now Vertica.
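To illustrate, here is a minimal sketch of the UNION ALL approach, using in-memory SQLite databases to stand in for the per-CPU slices; the table names, data, and two-node hash distribution are illustrative assumptions, not any vendor's implementation. The same SQL text runs unchanged on every node, and because the GROUP BY key is the distribution key, no group spans two nodes, so the coordinator can simply concatenate the per-node result sets.

```python
# Hypothetical sketch of "UNION ALL" parallelism. The fact table is
# distributed on customer_id, the dimension is replicated, and the GROUP BY
# is on the distribution column, so each node's result rows are disjoint.

import sqlite3

# The same SQL text runs unchanged on every node: rule 3 (join only to the
# replicated dimension) and rule 4 (GROUP BY the distribution column,
# customer_id) guarantee that no group ever spans two nodes.
NODE_SQL = """
    SELECT f.customer_id, SUM(f.amount)
    FROM   fact f JOIN dim d ON f.customer_id = d.customer_id
    WHERE  d.region = 'EMEA'
    GROUP BY f.customer_id
"""

FACT = [(1, 100), (2, 200), (3, 300), (4, 400)]      # (customer_id, amount)
DIM  = [(1, 'EMEA'), (2, 'EMEA'), (3, 'APAC'), (4, 'EMEA')]

def make_node(fact_slice):
    """One 'CPU': its slice of the fact table plus a full copy of the dimension."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE fact (customer_id INT, amount INT)")
    db.execute("CREATE TABLE dim  (customer_id INT, region TEXT)")
    db.executemany("INSERT INTO fact VALUES (?, ?)", fact_slice)   # rule 1: distributed
    db.executemany("INSERT INTO dim VALUES (?, ?)", DIM)           # rule 2: replicated
    return db

# Distribute the fact table across two nodes by hashing customer_id.
nodes = [make_node([r for r in FACT if r[0] % 2 == n]) for n in range(2)]

# "Parallel" execution is just running the same query everywhere and
# concatenating the results (a UNION ALL); no interconnect is needed.
result = [row for db in nodes for row in db.execute(NODE_SQL)]
print(sorted(result))    # [(1, 100), (2, 200), (4, 400)]
```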
It has been amply demonstrated that this approach is extremely limited for data warehousing.
Datallegro found that they weren't able to sell their product to anyone other than their investor, TEOCO, until they re-architected and implemented a "repartition on the fly" approach to fix this problem of limited schema support. It took them two years to implement, and when they did, the result was still so bad that in the end they got only 3 customers in total, limping along.
It turns out that customers doing significant data warehouse work can’t live with rules 1-4. The choice of schema needs to be flexible enough to accommodate the needs of modeling the business. It’s far too hard to bend the schema into a pure star in all but the most trivial of applications. Further, the exceptions to rules 1-4 end up causing the product to fail while trying to aggregate data on the “boss node”.
As a consequence, the much more difficult version of parallelism is required, one that allows for arbitrary choices of schema, aggregation, and SQL. This is a non-trivial problem, and it requires a top-notch architecture to begin with, excellent developers, and a lot of time spent on testing and proof points.
A column store is trivial to build; parallelism for data warehousing SQL is not.
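For contrast, here is a hypothetical sketch of the "repartition on the fly" data movement mentioned above; the function names and the two-table example are my own assumptions. When two distributed tables are joined on a column that is not the distribution key, each node re-hashes its local rows on the join key and ships them to the node that owns the corresponding hash bucket, after which the join runs locally on every node.

```python
# Hypothetical sketch of "repartition on the fly": rows are re-hashed on the
# join key and shipped to the owning node, so an arbitrary join can run
# locally even though neither table was distributed on that key.

def repartition(local_chunks, key_index, num_nodes):
    """Re-hash every node's local rows on the join key (this is the interconnect step)."""
    buckets = [[] for _ in range(num_nodes)]
    for chunk in local_chunks:
        for row in chunk:
            buckets[hash(row[key_index]) % num_nodes].append(row)
    return buckets

def local_hash_join(left_rows, right_rows, l_key, r_key):
    """Ordinary single-node hash join, run independently on each node."""
    index = {}
    for row in left_rows:
        index.setdefault(row[l_key], []).append(row)
    return [l + r for r in right_rows for l in index.get(r[r_key], [])]

NUM_NODES = 2
# orders is distributed on order_id, line_items on line_id, so neither table
# is distributed on the join key (the order_id in column 1 of line_items).
orders     = [[(1, 'alice'), (3, 'carol')], [(2, 'bob')]]
line_items = [[(10, 1, 5.0)], [(11, 2, 7.5), (12, 1, 2.5)]]

orders_parts = repartition(orders, key_index=0, num_nodes=NUM_NODES)
items_parts  = repartition(line_items, key_index=1, num_nodes=NUM_NODES)

result = []
for n in range(NUM_NODES):   # each node joins only the rows it now owns
    result.extend(local_hash_join(orders_parts[n], items_parts[n], l_key=0, r_key=1))
print(result)   # three joined rows: both of alice's line items plus bob's one
```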
Why then do you suppose Vertica has as many paying customers as, say, Greenplum?
CAM
I guess we’ll see won’t we. When does the Gartner Magic Quadrant report come out?
I don’t put much credence in the Gartner MQ. http://www.dbms2.com/2007/10/19/gartner-2007-magic-quadrant-for-data-warehouse-database-management-systems/
CAM
Where else can we get information that digs into what “paying customer” really means for these vendors?
For instance:
– Did they pay $10 or $10,000,000?
– Did they get what they paid for?
– Are they in production?
– Are they buying more?
– How many failed?
At least Gartner digs into these details for the Magic Quadrant though it’s behind the scenes and you have to pay them to get the results.
Are there other reports that do this?
I think you’re abusing anonymity a bit. If you want to shill for one of my competitors, please do so in a different venue.
While I make the occasional mistake, I’m proud of the care I put into my research. I’m sure the Gartner folks feel the same way about theirs. I agree with some of their conclusions, but disagree with others; necessarily, the same is true in reverse.
Let’s leave it at that.
CAM
To answer your questions more directly about Vertica, I'm aware, by name (largely NDAed, of course, as most customer names I get from vendors are), of as many repeat customers at Vertica as at any of the other analytic DBMS startups (except, of course, Netezza). Since they've been selling product for a much shorter time than, say, Greenplum (a vendor with a comparable number of overall sales), that's a nice proof point.
Let me hasten to point out, however, that I don’t take the time to review every customer for every vendor in detail. Not even close.
CAM
Didn’t mean to shill, didn’t consider the competitive nature, apologies.
However – I think we’d all like a bit of auditing of the customers reported by all of these vendors. Easy enough for public companies like Teradata and Netezza, not so easy for these others.
Two approaches to the market – you can be like MySQL and get tens of thousands of customers for $1,000 each, or you can get tens of customers for $1,000,000 each. Netezza fits into the latter; I/we don't know where these others fit.