Vertica finally spells out its compression claims
Omer Trajman of Vertica put up a must-read blog post spelling out detailed compression numbers, based on actual field experience (which I’d guess is from a combination of production systems and POCs):
- CDR – 8:1 (87%)
- Consumer Data – 30:1 (96%)
- Marketing Analytics – 20:1 (95%)
- Network logging – 60:1 (98%)
- Switch Level SNMP – 20:1 (95%)
- Trade and Quote Exchange – 5:1 (80%)
- Trade Execution Auditing Trails – 10:1 (90%)
- Weblog and Click-stream – 10:1 (90%)
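The percentage next to each ratio is the share of space saved, which for an N:1 ratio is (N - 1) / N. Here is a minimal sketch of that arithmetic. The function names are my own, and truncating to a whole percent is an assumption: the post doesn't say how the figures were rounded, but truncation happens to reproduce all eight of them.

```python
def space_saved_percent(ratio: int) -> float:
    """Percent of space saved by an N:1 compression ratio."""
    if ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return 100 * (ratio - 1) / ratio


def space_saved_truncated(ratio: int) -> int:
    """Same figure, truncated to a whole percent (assumed rounding rule)."""
    if ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return (100 * (ratio - 1)) // ratio


# Ratios as listed in the post (N in "N:1").
claimed_ratios = {
    "CDR": 8,
    "Consumer Data": 30,
    "Marketing Analytics": 20,
    "Network logging": 60,
    "Switch Level SNMP": 20,
    "Trade and Quote Exchange": 5,
    "Trade Execution Auditing Trails": 10,
    "Weblog and Click-stream": 10,
}

for category, ratio in claimed_ratios.items():
    print(f"{category}: {ratio}:1 saves {space_saved_truncated(ratio)}%")
```

Note how quickly the percentages flatten out: going from 30:1 to 60:1 halves the stored data, yet moves the percentage by less than two points. That is one reason the ratios are the more informative figures to compare.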
It’s clear what Omer means by most of those categories from reading the post, but I’m a little fuzzy on what “Consumer Data” or “Marketing Analytics” comprise in his taxonomy. Anyhow, Omer’s post is a huge improvement over my recent one — based on a conversation with Omer 🙂 — which featured some far less accurate or complete compression numbers.
Omer goes on to claim that trickle-feed data is harder for rival systems to compress than it is for Vertica and, more generally, that Vertica’s compression is typically severalfold better than that of competing row-based systems.
Comments
5 Responses to “Vertica finally spells out its compression claims”
[…] whose details I forget for now — they could only do 2.5X. Edit: Vertica has now posted much more accurate versions of those numbers. Infobright’s 30X compression reference at TradeDoubler seems to be for a clickstream-type […]
[…] Each of the customers cited below received “half” an Oracle Database Machine. As I previously noted, an Oracle Database Machine holds either 14.0 or 46.2 terabytes of uncompressed data. This suggests the 220 TB customer listed below — LGR Telecommunications — got compression of a little under 10:1 for a CDR (Call Detail Record) database. By comparison, Vertica claims 8:1 compression on CDRs. […]
[…] work underway that’s getting 20X compression in call detail records, versus the 8X that Vertica claims. I’ll post more about that […]
[…] *Greg Rahn of Oracle tweeted me yesterday that one customer is getting 12-17X compression on “dimensional model” data. That sounds comparable to Vertica’s claim of 20X on “marketing analytics” and 30X on “consumer data…. […]
[…] the highest level in the database industry. Database analyst Curt Monash pointed out in an online post this week that analytic database start-up Vertica Inc. claims compression ratios from 5:1 to as […]