DATAllegro vs. Vertica and other columnar systems
Stuart Frost of DATAllegro offered an interesting counter today to columnar DBMS architectures — vertical partitioning. In particular, he told me of a 120 terabyte (growing soon to 250 terabytes) call data record database, in which a few key columns were separated out.
I’m hard pressed to see why, for some applications, this wouldn’t have all the benefits of the full columnar architectures of, say, Vertica or Kognitio. That said, I can also envision other applications in which Vertica would offer large performance benefits by allowing redundant storage with a variety of sort orders.
Stuart’s other anti-columnar objection was predictable FUD about load times. Certainly the burden of proof lies with the columnar vendors to show they’ve overcome the historical bottleneck for their design. But while I confess to not being sure yet, I’m actually inclined to believe that they’ve pulled it off.
Comments
The problem with vertical partitioning in a row store is that when an ad hoc query comes along, the vertical partitioning scheme used may not be helpful. In a column store, you can take 100% advantage of column access patterns without considering what columns you’ll need in the future. In a row store, there is a tradeoff between adding columns to the partition, and query speed on the partition. Or you can make more partitions, but that hurts load time and space usage.
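Here is a toy model of that tradeoff (made-up table shape and sizes, not any vendor's numbers): with a fixed vertical partition, a query that strays outside the "hot" columns pays for the wide remainder as well, while a column store only ever reads the columns it touches.

    # Toy model (hypothetical, not any vendor's implementation): estimate how many
    # bytes a query must scan under (a) a fixed vertical partition of a row store
    # and (b) a column store, assuming uniform 8-byte values and no compression.

    ROW_COUNT = 1_000_000_000
    BYTES_PER_VALUE = 8
    ALL_COLUMNS = {f"c{i}" for i in range(50)}   # 50-column fact table
    PARTITION = {"c0", "c1", "c2"}               # "hot" columns split out up front

    def bytes_scanned_row_store(query_columns: set) -> int:
        """If the query fits the hot partition, scan only it; otherwise the wide
        remainder of the table has to be scanned as well (whole rows)."""
        cost = len(PARTITION) * BYTES_PER_VALUE * ROW_COUNT
        if not query_columns <= PARTITION:
            cost += (len(ALL_COLUMNS) - len(PARTITION)) * BYTES_PER_VALUE * ROW_COUNT
        return cost

    def bytes_scanned_column_store(query_columns: set) -> int:
        """A column store reads only the columns the query actually touches."""
        return len(query_columns) * BYTES_PER_VALUE * ROW_COUNT

    for q in ({"c0", "c1"}, {"c0", "c7"}):       # one query that fits the partition, one that doesn't
        print(q, bytes_scanned_row_store(q), bytes_scanned_column_store(q))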
In practical terms, when we run against commercial row stores tuned by professionals who make their living doing this, we find we run way faster, even when the partition or MV scheme has exactly the right columns for the query. Reasons include columnar compression beating row store compression by 3:1 or more in these cases, and better data organization. In fact, with only a single copy of the data in Vertica, we are able to beat a “1-MV-per-query” row store scheme in many of our tests.
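As a minimal illustration of why column-ordered data compresses so well (an assumed run-length encoding, not Vertica's actual on-disk format), a sorted, low-cardinality column collapses into a handful of runs, whereas the same values interleaved with other fields in row storage offer no such runs.

    # Minimal illustration (assumed encoding): a sorted, low-cardinality column
    # collapses into (value, run_length) pairs.

    from itertools import groupby

    def rle_encode(column):
        """Run-length encode a column: [(value, count), ...]."""
        return [(value, sum(1 for _ in run)) for value, run in groupby(column)]

    # A "product_category" column over 1,000,000 rows, kept in sorted column order.
    column = ["books"] * 400_000 + ["garden"] * 350_000 + ["toys"] * 250_000
    encoded = rle_encode(column)

    print(len(column), "values ->", len(encoded), "runs:", encoded)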
With regard to load time, there are no theoretical reasons why a column store should be slower (or faster) to load. Vertica’s goal is to never take longer to load than the competition, and we’ve achieved that. In a recent test on identical 4-node clusters, we were able to load data in 60% of the time of a popular row store; both systems used 4 parallel load streams and similar physical schemas. Also, on a schema with over 300 columns, we loaded significantly faster than popular commercial row stores. Vertica will continue to focus on load speed as well as query performance, so we expect to make loads even faster this year.
Note – I work for Vertica.
Vertical partitioning, to me, means splitting your fact table into two or more narrower fact tables, keeping each one as 'skinny' as possible for I/O throughput. I'm not clear that this is what the DATAllegro guy means.
If this is his position, then IMHO the reason columnar systems should beat even heavily optimized row-store DBs is the fact that processor speeds have increased by orders of magnitude whilst I/O throughput has increased fairly slowly.
Every big data warehouse I have ever seen maxes out I/O. For most ad hoc queries the processors are doing nothing and the I/O is at 100%. This is particularly true of the SMP guys (Oracle, please stand up), but largely it is the biggest issue for serious analytical queries irrespective of the architecture. Whilst one might argue that many data warehouses have sub-optimal hardware implementations (these days pulling data off a SAN rather than local, physically optimized, non-contentious disk), the reality is that CPU speeds have increased exponentially whilst I/O throughput has merely crept up over the last 10 years. If you can pull heavily compressed data off the disk and decompress it quickly in memory, you take advantage of that spare CPU capacity.
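A back-of-the-envelope model of that point, with purely illustrative numbers: a scan finishes when the slower of disk reads and decompression finishes, so compression converts spare CPU into effective I/O bandwidth.

    # Back-of-the-envelope model (illustrative numbers, not benchmarks): a table
    # scan is gated by whichever of disk bandwidth or decompression CPU is slower,
    # so compressing the data trades abundant CPU for scarce I/O throughput.

    def scan_seconds(table_gb, compression_ratio, disk_gb_per_s, decompress_gb_per_s):
        on_disk_gb = table_gb / compression_ratio
        io_time = on_disk_gb / disk_gb_per_s       # time to read the compressed pages
        cpu_time = table_gb / decompress_gb_per_s  # time to expand them in memory
        return max(io_time, cpu_time)              # the two overlap; the slower one wins

    TABLE_GB = 1_000
    print("uncompressed scan:", scan_seconds(TABLE_GB, 1, 0.5, 10), "s")
    print("4:1 compressed scan:", scan_seconds(TABLE_GB, 4, 0.5, 10), "s")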
Bitmap indexes are supposed to provide the advantages of columnar data stores for row-store databases. The problem is that once the optimizer decides to abandon the index and do a table scan, all that advantage is lost.
Here’s the problem with the notion that vertical partitioning is the answer.
A lot of ad hoc BI queries look like this: 'show me the aggregation of all products to a category roll-up for the last week'. The 80/20 rule is in play. The 'all products' aggregation isn't selective and, depending on the insanity of the optimizer, you don't hit the bitmaps. Vertical partitioning of a fact table won't help at all, because you cannot split the time and product foreign key columns out of the core fact table; nearly every query uses those key columns. The optimizer degenerates to a table scan. Then it's all about I/O. The columnar DB wins.
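A rough sketch of that optimizer decision (the 1% threshold is hypothetical, not any particular product's cost model): once a predicate is unselective enough, index probes cost more than a straight scan, and the plan degenerates to a table scan whose cost is pure I/O on the fact table.

    # Rough sketch of the decision described above (hypothetical 1% threshold).

    def choose_plan(predicate_selectivity, index_available, scan_threshold=0.01):
        if index_available and predicate_selectivity < scan_threshold:
            return "bitmap index lookup"
        return "full table scan"        # performance is now pure I/O on the fact table

    print(choose_plan(0.0001, True))    # 'last week for one product' -> index lookup
    print(choose_plan(0.9, True))       # 'all products, roll up by category' -> full table scan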
Chuck,
You’re citing roughly a 2x improvement for a 4-node cluster. It seems to me that this isn’t really enough data to claim that “load times aren’t an issue”. 4 nodes is pretty small, and we also don’t know how much data you are talking about.
What happens in a 64 or 128 node cluster, with varying data sizes?
I don’t doubt that columnar DBs will have an advantage for compression – but there’s no such thing as a free lunch.
DK
[…] We have lively discussions going on columnar data stores vs. vertically partitioned row stores. Part is visible in the comment thread to a recent post. Other parts come in private comments from Stuart Frost of DATAllegro and Mike Stonebraker of Vertica et al. […]
Agreed that 4 nodes is pretty small, but considering that Vertica is shared nothing and all load steps are local to the machine except data segmentation, load speed scales with number of nodes assisting in the load as long as each node has a data stream and you have a good switch.
We did see around 4x load speed compared to 1 node in this test, which is far more than we can say for a competing row store that uses a shared disk architecture and saw 2.5x. Likewise, a competing shared-nothing row store without a parallel load feature didn’t get anywhere near 4x on 4 nodes, as load saw 1x while index and MV build saw 4x.
Of course other products out there do it the same way we do and don’t suffer these limitations, but I repeat my claim that there’s no theoretical reason why a column store would be beaten on load performance. Nor do we ever get beaten (on apples-to-apples hardware of course).
Stay tuned for more complete presentations on our numbers in an upcoming paper, as well as bigger cluster and data sizes.
Chuck,
Thanks for the elaboration. Here’s why I ask…
When I think about the problem from an architectural standpoint (computer architecture that is), it’s exactly analogous to an update cache coherency policy.
The E/T steps are local, and those probably scale reasonably well. What isn’t going to scale is the 1:N communication of distributing the data out to every node, and the resulting acknowledgements. You have to update all N nodes, and you can’t really get around that.
IIRC, you guys use timestamping, so you don’t have to redo any in-flight transactions because of an update.
As I said, what would be interesting would be a 3D graph of:
data load size (X GB), data load frequency (Y loads/hour), and performance (Z seconds).
DK
David,
I don’t think I follow the analogy. There’s a big distinction between loads and updates. In inserts/loads, there’s no “coherency” needed beyond knowing which data was part of what transaction (timestamping, as you say). From the perspective of a node processing a load stream, each row loaded “belongs” on one of the nodes in the cluster, so it can be sent to that node directly and nobody else needs to know about it. (If a query requests that row, the node processing the query knows where to look.)
So from the perspective of a node doing a load in a cluster of N nodes, it processes a stream of data and sends it to N-1 places (keeping 1/Nth for itself). The node will also receive data from N-1 other nodes if they are processing loads. So if all nodes are loading data at the same time, each node would expect to receive as much data as it sends. So load speed scales linearly with the number of nodes as long as:
1. A node can handle N incoming load streams. (This translates to a memory requirement, mostly.)
2. The interconnect can distribute all the data, full duplex, with all nodes talking to all others at once on the backplane. (Currently this is true of relatively cheap GigE switches with 32 or 64 ports.)
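A sketch of that distribution scheme (hash placement is an assumption here, not necessarily Vertica's exact segmentation rule): each loading node keeps roughly 1/Nth of its stream and forwards the rest to the owning nodes, so every node sends about as much data as it receives.

    # Sketch of the load distribution described above (hash segmentation is an
    # assumed placement rule): a node processing a load stream keeps roughly 1/N
    # of the rows and ships the rest to their owning nodes.

    from collections import Counter

    N_NODES = 4

    def owning_node(row_key: str) -> int:
        return hash(row_key) % N_NODES

    def segment_load_stream(rows, local_node: int):
        """Split one node's load stream into a local batch plus one batch per peer."""
        batches = {node: [] for node in range(N_NODES)}
        for row in rows:
            batches[owning_node(row["key"])].append(row)
        local = batches.pop(local_node)
        return local, batches           # 'batches' would be shipped to the other N-1 nodes

    rows = [{"key": f"call-{i}", "duration": i % 600} for i in range(100_000)]
    local, remote = segment_load_stream(rows, local_node=0)
    print("kept locally:", len(local),
          "| sent to peers:", Counter({n: len(b) for n, b in remote.items()}))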
I’m sorry, but I don’t understand the concept behind the graph. It seems to me that for a given load size and frequency, either the system will keep up or it won’t. Wouldn’t it be more interesting to know what the maximum load rate is (GB/min) for combinations of parameters like the number of nodes in the cluster and the number of GB in each load?
Chuck,
Thanks for the clarification – that makes much more sense now.
I believe that in the scenario you describe (loading), the graph wouldn’t have much value, whereas with updates (where you must enforce consistency and coherency) it would be interesting.
DK
[…] *For example, were Vertica’s competitors set up with vertical partitioning? […]
[…] DBMS for efficient data warehousing, it isn’t necessarily dispositive for a comparison of columnar systems to data-warehouse-specialist row-based systems. The three reasons suggested for the poor performance of vertically-partitioned row stores […]
Have you had a chance to examine VectorStar? I believe these Mexican developers have made great progress with their columnar data engine, which is almost unknown in the United States. http://www.vectornova.com
Never heard of VectorStar.
Looking at VectorNova’s site, however, it sounds like VectorStar is based on arrays rather than columns, perhaps like http://www.dbms2.com/2006/10/04/sas-intelligence-storage/
VectorNova is completely columnar; it’s amazing!