Oracle Exadata hybrid columnar compression
Oracle Database 11g Release 2 is out, and as usual I wasn’t briefed, perhaps because Oracle is more scared than its competitors are of hard questions, perhaps for some other reason entirely.* Anyhow, Oracle Database 11g Release 2 contains an Exadata-only feature called hybrid columnar compression. The Oracle Database 11g Release 2 white paper says “data is grouped, ordered, and stored one column at a time.” But Kevin Closson clarifies:
The word hybrid is important.
Rows are still used. They are stored in an object called a Compression Unit. Compression Units can span multiple blocks. Like values are stored in the compression unit with metadata that maps back to the rows.
So, “hybrid” is the word. But, none of that matters as much as the effectiveness. This form of compression is extremely effective.
That sounds a whole lot like PAX. Specifically, in Oracle’s case I would guess “hybrid columnar compression” provides the compression benefits of column stores, but not column stores’ I/O benefits, and also not any kind of in-memory compression.
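For readers who haven't met PAX, here is a minimal Python sketch of the general idea: rows are grouped into fixed-size units, and within each unit the values are stored column by column, with position serving as the map back to rows. The schema, the names `build_units` and `fetch_row`, and the unit size are all hypothetical, chosen for illustration. This is not Oracle's on-disk format, which I haven't seen.

```python
from typing import Dict, List, Tuple

# Hypothetical schema for illustration: (order_id, region, status)
Row = Tuple[int, str, str]
COLUMNS = ("order_id", "region", "status")


def build_units(rows: List[Row], rows_per_unit: int) -> List[Dict[str, list]]:
    """Split rows into fixed-size groups and store each group column by column."""
    units = []
    for start in range(0, len(rows), rows_per_unit):
        group = rows[start:start + rows_per_unit]
        # zip(*group) transposes the group: one tuple of values per column.
        units.append(
            {name: list(values) for name, values in zip(COLUMNS, zip(*group))}
        )
    return units


def fetch_row(units: List[Dict[str, list]], rows_per_unit: int, row_number: int) -> Row:
    """Rebuild one row: locate its unit, then read the same offset in every column."""
    unit = units[row_number // rows_per_unit]
    offset = row_number % rows_per_unit
    return tuple(unit[name][offset] for name in COLUMNS)


rows = [
    (1, "EU", "open"),
    (2, "EU", "open"),
    (3, "US", "closed"),
    (4, "US", "open"),
    (5, "EU", "closed"),
]
units = build_units(rows, rows_per_unit=2)
```

Like values sit next to each other inside a unit, which is what helps compression. But a whole unit still has to be read to answer a query on one column, which is why I'd expect the I/O benefit of a true column store to be missing.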
*Actually, Oracle has indicated to me multiple times that the reason is I won’t let Oracle review what I write before I publish it. My stance is that such “review” is an extremely time-wasting courtesy, in which one spends a lot of time diplomatically explaining to a vendor that, contrary to what it hopes, one really does know the difference between marketing puffery and sober fact. I rarely do white paper projects any more, notwithstanding that my fee for those now exceeds $2,000/page. I’m not about to go through the “review” hassle for something I write for free, about a vendor who isn’t otherwise a paying client.
Comments
20 Responses to “Oracle Exadata hybrid columnar compression”
I think it is an excellent guess, as there are not many alternatives between the decomposition storage model used by column stores (which, btw, dates back to at least 1979) and the classical row store model.
Of course, in such a case, small differences in implementation can be a big deal. You can be a little closer to the row store, or a little closer to the column store. It would depend on the design philosophy. If you read the original PAX paper, it is not very specific on the implementation. This is not a fault, but it means there can be many PAX-like implementations. Oracle being what it is, we can guess that they stuck close to the row store philosophy. That is, referencing individual rows is probably nearly as easy as in a row store… maybe at the expense of compression.
Yet, I’m not certain that a PAX-based architecture would have to compromise on compression. Indeed, you can use all the same tricks as in a column store including run-length encoding and working from sorted tables. There is a small penalty to pay, but with enough cleverness, it can be made negligible. At least in theory, you can.
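To make the run-length point concrete, here is a minimal sketch in Python, assuming a single low-cardinality column inside one block-sized group of rows. The function names and sample values are made up for illustration. A real engine would sort whole rows so that the other columns stay aligned, which the sketch skips.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, length in runs for _ in range(length)]


# One column from a hypothetical group of six rows, in arrival order.
region = ["US", "EU", "US", "EU", "EU", "US"]

unsorted_runs = rle_encode(region)        # many short runs
sorted_runs = rle_encode(sorted(region))  # few long runs
```

On this toy column the arrival order gives five runs and the sorted order gives two. Nothing about that requires the column to live in its own file. It only requires the values of one column to be contiguous within the group.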
In fact, because the C-Store way introduces redundancies (through multiple projections on different sets of columns), it seems likely that a PAX-based architecture would use less storage overall.
My own guess is that Oracle probably decided to close the performance gap with the likes of Vertica. They don’t need to be as fast, so long as they can convince their users that they are not 50x slower than Vertica. Their sales pitch might end up being that they are not quite as fast as a true column store on all queries, but then, they offer more balanced performance (referencing an individual row might be faster than with a column store).
Disclaimer: I don’t speak with any authority. Just guessing. I do claim to know the science behind it a little bit though.
One thing is for certain: when using hybrid columnar compression, the price per TB drops significantly, given 10x compression for non-archive data and 40x for archive/historical data. This new technology seems to put Oracle’s compression portfolio well ahead of Netezza’s.
If you’re really getting close-to-columnar levels of compression, that’s indeed better than Netezza’s rates.
Oracle’s is the third hybrid row/column storage scheme announced in the last month. It’s time for a taxonomy: http://dbmsmusings.blogspot.com/2009/09/tour-through-hybrid-columnrow-oriented.html
[…] hybrids. For example, text search sometimes require full-text indexes such as suffix arrays, Oracle recently announced a row/column hybrid, and so on. Take away message: If you are stuck, try to rotate your data model. If neither the […]
If I understand this correctly, then I would not precisely say that they get the compression benefits of column stores. I’d say that they get compression benefits, which column stores also get. But column stores may be able to do more effective compression because they are compressing a sequence of values drawn from the same domain of data. I don’t know how big this effect is, but it might be significant for all I know.
Dan,
The idea of a PAX-like scheme is that you take a certain subset of the rows (i.e., enough to fill a block, or in the case of Oracle evidently enough to fill a number of blocks) and store them VERY much as you would in a column store. Now, do you get a better compression ratio on 10s of terabytes of data than you do on megabytes or 10s of megs? Yes. But it’s my understanding that for many data sets, the difference isn’t really very much.
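Here is a back-of-the-envelope sketch in Python of why the difference can be small. It models dictionary encoding only, on synthetic data with four distinct values, and assumes 64 bits per dictionary entry. All of those are assumptions for illustration, not measurements of any product.

```python
import math


def dictionary_cost_bits(values, value_bits=64):
    """Bits to store values as one dictionary plus fixed-width codes."""
    if not values:
        return 0
    distinct = len(set(values))
    # Each code needs enough bits to address every dictionary entry.
    code_bits = max(1, math.ceil(math.log2(distinct)))
    return distinct * value_bits + len(values) * code_bits


def chunked_cost_bits(values, rows_per_unit, value_bits=64):
    """Same encoding, but with a separate dictionary per group of rows."""
    return sum(
        dictionary_cost_bits(values[start:start + rows_per_unit], value_bits)
        for start in range(0, len(values), rows_per_unit)
    )


# Synthetic column: 100,000 rows cycling through 4 distinct values.
column = [i % 4 for i in range(100_000)]

whole_column = dictionary_cost_bits(column)
per_unit = chunked_cost_bits(column, rows_per_unit=10_000)
overhead = per_unit / whole_column
```

With these assumptions the per-unit version costs roughly 1% more than encoding the whole column at once, because each dictionary is tiny next to the codes. A column with many distinct values per unit would pay more, so treat this as one favorable case.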
>> in Oracle’s case I would guess “hybrid columnar
>> compression” provides the compression benefits
>> of column stores, but not column stores’ I/O
>> benefits
I was wondering if we consider a table with 10 columns; and a query Q is interested in only 2 columns – with architectures like Netezza and Exadata, if the unnecessary 8 columns are filtered and are never sent to the DB by storage servers – (network) IO perceived by the DB should be somewhat similar to what column stores see, no?
Ankush,
Correct. But in the case of Netezza and Exadata you’re paying for the equipment (and electricity) that does the initial I/O and filtering.
Netezza’s argument is “Yeah, but that’s not so bad because FPGAs are cheap.” Even so, it’s not free.
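The 10-column, 2-column example can be put in rough numbers. This Python sketch assumes equally sized, uncompressed columns and a hypothetical helper `io_profile`. It ignores predicates, indexes, and caching, so it only shows where the bytes move, not what any real system does.

```python
def io_profile(total_columns, needed_columns, bytes_per_column,
               columnar_on_disk, filter_in_storage):
    """Return bytes read from disk and bytes shipped to the database tier."""
    needed_bytes = needed_columns * bytes_per_column
    all_bytes = total_columns * bytes_per_column
    # A column store only touches the columns the query asks for.
    disk = needed_bytes if columnar_on_disk else all_bytes
    # Storage-side filtering drops unneeded columns before the network hop.
    network = needed_bytes if (columnar_on_disk or filter_in_storage) else all_bytes
    return {"disk_bytes": disk, "network_bytes": network}


GB = 10**9

column_store = io_profile(10, 2, GB, columnar_on_disk=True, filter_in_storage=False)
smart_storage = io_profile(10, 2, GB, columnar_on_disk=False, filter_in_storage=True)
plain_row_store = io_profile(10, 2, GB, columnar_on_disk=False, filter_in_storage=False)
```

Under these assumptions the database tier sees 2 GB in both the column store and the filtering-storage case, but the storage tier in the second case still reads all 10 GB. That is the part you pay for in equipment and electricity.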
[…] frequently catching up to specialized engines. In particular, they are not limited to row stores. Curt Monash’s blog post on Oracle’s hybrid columnar approach makes this obvious. Nicolas Bruno, in Teaching an Old […]
[…] Oracle Database 11g Release 2 white paper I cited a couple of weeks ago has evidently been edited, given that a phrase I quoted last month is no longer to be found. Anyhow, […]
[…] These figures are highly sensitive to assumptions about Oracle’s hybrid columnar compression. […]
[…] Exadata supports hybrid columnar compressions since 2009. This was (deliberately?) omitted by Stonebraker. It would be interesting to hear him compare […]