IBM BLU
I had a good chat with IBM about IBM BLU, aka BLU Accelerator or Acceleration. BLU basics start:
- BLU is a part of DB2.
- BLU works like a columnar analytic DBMS.
- If you want to do a join combining BLU and non-BLU tables, all the BLU tables are joined first, and the result set is joined to the other tables by the rest of DB2.
And yes — that means Oracle is now the only major relational DBMS vendor left without a true columnar story.
BLU’s maturity and scalability basics start:
- BLU is coming out in IBM DB2 10.5, this quarter.
- BLU will initially be single-server, but …
- … IBM claims “near-linear” scalability up to 64 cores, and further says that …
- … scale-out for BLU is coming “soon”.
- IBM already thinks all your analytically-oriented DB2 tables should be in BLU.
- IBM describes the first version of BLU as being optimized for 10 TB databases, but capable of handling 20 TB.
BLU technical highlights include:
- “Complete” pipelining of queries, so that scans can be shared.
- Data skipping, to reduce I/O.
- Vectorization based on SIMD (Single Instruction Multiple Data). It turns out that SIMD is available on Intel and PowerPC chips alike, and allows you to operate at once on as much data as fits into a register. At load time, BLU packs data into vector lengths appropriate for the available silicon. (Data skipping and this kind of packed evaluation are sketched in the code just after this list.)
- Probabilistic caching rather than pure LRU (Least Recently Used). Blocks are more likely to stick in memory if they’ve been referenced more often before. There’s some extra randomization beyond that, for reasons I didn’t wholly grasp.
- “Automatic workload management”, an option that puts an automagically selected cap on the number of simultaneous queries. This cap will rarely be >1/core; not coincidentally, IBM believes that contention for resources among queries can be very wasteful. Otherwise, BLU’s “concurrency” model is similar to regular DB2’s.
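To make the data skipping and SIMD points a bit more concrete, here is a toy Python sketch. It is emphatically not IBM's code; the block size, the 8-bit codes, and the use of plain 64-bit words in place of real SIMD registers are all illustrative assumptions. But it shows the two ideas: skip blocks whose min/max synopsis rules them out, then test several packed values with a handful of word-level operations.

```python
WIDTH = 8                     # bits per packed code (toy choice; values must fit in 7 bits)
PER_WORD = 64 // WIDTH        # 8 codes per 64-bit "register"
MASK64 = (1 << 64) - 1
HIGH = sum(1 << (i * WIDTH + WIDTH - 1) for i in range(PER_WORD))  # top bit of each lane

def pack(values):
    """Pack PER_WORD small integers (each 0..127, top bit clear) into one 64-bit word."""
    word = 0
    for i, v in enumerate(values):
        word |= v << (i * WIDTH)
    return word

def lanes_less_than(word, c):
    """Return a mask with the top bit set in every lane whose value is < c."""
    c_broadcast = sum(c << (i * WIDTH) for i in range(PER_WORD))
    # Setting the top bit of every lane before subtracting keeps borrows from
    # crossing lane boundaries; the top bit of each lane of diff is then clear
    # exactly when that lane's value is < c.
    diff = ((word | HIGH) - c_broadcast) & MASK64
    return (~diff) & HIGH

def count_less_than(blocks, c):
    """Count values < c, skipping any block whose synopsis rules it out."""
    hits = 0
    for block in blocks:
        if block["min"] >= c:                 # data skipping: nothing here can qualify
            continue
        for word in block["words"]:
            hits += bin(lanes_less_than(word, c)).count("1")
    return hits

values = [3, 9, 14, 2, 77, 5, 6, 120]
block = {"min": min(values), "max": max(values), "words": [pack(values)]}
print(count_less_than([block], 10))           # -> 5
```

Real hardware does the per-lane comparison with SIMD compare instructions rather than the subtraction trick above, but the payoff is the same: the work per value shrinks as more values fit into a register.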
Like any other columnar RDBMS or RDBMS option, BLU has a compression story:
- BLU compression options include approximate Huffman coding (which I gather is a form of tokenization), prefix encoding, and something to do with offsets (I don’t know whether that’s straightforward delta compression).
- BLU compression strategies are automagically chosen, one column segment at a time.
- BLU operates on compressed data.
IBM said all the compression algorithms were order-preserving, and hence range predicates can be executed on compressed data. Unfortunately, I neglected to ask how “approximate Huffman coding” could preserve order.
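I can at least illustrate the general principle with a sketch that has nothing BLU-specific about it: a plain order-preserving dictionary, where codes are assigned in sorted value order, so a range predicate can be evaluated against the codes without decompressing anything. The function names and the bisect-based bound lookup below are my own inventions, not IBM's algorithm.

```python
import bisect

def build_dictionary(values):
    """Map each distinct value to an integer code, assigned in ascending value order."""
    distinct = sorted(set(values))
    return distinct, {v: code for code, v in enumerate(distinct)}

def encode(values, value_to_code):
    return [value_to_code[v] for v in values]

def range_filter(codes, distinct, low, high):
    """Row positions where low <= value <= high, computed on codes only."""
    # Encode the predicate bounds once; bisect handles constants that are not
    # themselves in the dictionary.
    low_code = bisect.bisect_left(distinct, low)
    high_code = bisect.bisect_right(distinct, high) - 1
    return [i for i, c in enumerate(codes) if low_code <= c <= high_code]

column = ["cherry", "apple", "fig", "banana", "apple", "date"]
distinct, value_to_code = build_dictionary(column)
codes = encode(column, value_to_code)
print(range_filter(codes, distinct, "apple", "cherry"))   # -> [0, 1, 3, 4]
```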
As with any columnar system, one has to wonder about how BLU writes data. IBM observed:
- It is recommended you have BLU commit 10,000 rows at a time.
- Many queries can read uncommitted data.
And so IBM doesn’t think load latency is much of a problem for BLU.
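As a practical matter, following that advice from an application just means committing every N rows rather than every row. Here is a minimal Python sketch against a generic DB-API connection; the table, columns, and row source are invented for illustration, and nothing about the code is DB2-specific.

```python
BATCH_SIZE = 10_000

def load_in_batches(conn, rows):
    """Insert rows through any Python DB-API connection, committing once per batch."""
    insert_sql = "INSERT INTO sales_fact (sale_date, store_id, amount) VALUES (?, ?, ?)"
    cursor = conn.cursor()
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == BATCH_SIZE:
            cursor.executemany(insert_sql, batch)
            conn.commit()                     # one commit per 10,000 rows, not per row
            batch.clear()
    if batch:                                 # final partial batch
        cursor.executemany(insert_sql, batch)
        conn.commit()
```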
But that’s about all I figured out about how BLU writes data. So IBM kindly followed up with a lengthy email, plus permission to copy it — lightly edited — below:
Getting new data into/out of the database (in memory and on disk) is one of BLU’s strengths. The methods we support include:
We support LOAD (replace or append), SQL INSERT, UPDATE and DELETE, as well as three SQL-based utilities: INGEST, IMPORT and EXPORT (not to mention BACKUP and RESTORE, which are also ways of getting data in and out).
As in previous versions of DB2, the SQL-based operations (INSERT, UPDATE, DELETE, INGEST, IMPORT) are in-memory first and hardened to disk asynchronously. LOAD operations are bulk operations to disk (though they tend to be CPU bound due to the cost of parsing and formatting data during load). As you suspected, the UPDATE operations are not in place.
Because the syntax and semantics of getting data into the database remain unchanged (an ongoing theme for us), it means that ELT tools like IBM Data Stage and partner products work without modification.
Distinguishing characteristics:
- Other columnar vendors have tried to achieve performance when inserting new data by using delta areas of the table: doing a fast insert into a delta area (usually a delta tree or a row-based staging table) and then asynchronously moving the row or delta data into the columnar area. We consider that a weak strategy, since it means the data has to be loaded twice (once into the staging area and then again into the mainline table). Our approach is to add data directly to the main table and use bulk transformation (i.e. thousands of records at a time) to amortize any latency that would normally come from columnar processing. We believe that by bulk-processing the data we largely eliminate the overhead inherent in columnar processing, and also entirely avoid the dual processing that other companies suffer from.
- We’ve invented a new columnar-specific logging method for BLU Acceleration. Externally it looks exactly like DB2’s traditional log-based transaction recovery (for crash and rollforward recovery), but the format of the bytes within the log is organized in columnar fashion, so we log buffers of column data rather than logging rows. This, in combination with XOR logging (only logging the XOR delta bits), results in great log space reduction. As a result the recovery logging is actually much smaller than row-based logging in many cases. Heck, even we were happily surprised by it. Again, like our storage story where BLU Accelerated tables can co-exist in the same storage and buffer pool as traditional row tables, the BLU Accelerated data is logged to the same log files as row table operations. The integration is completely seamless.
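The XOR part of that is easy to illustrate outside of any DBMS. The sketch below is mine, not IBM's log format: log old XOR new for a column buffer, which is mostly zero bits when little changed and hence compresses well, and recovery gets the after-image back with one more XOR. zlib here is just a stand-in for whatever compression is actually applied to the delta bits.

```python
import zlib

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def log_record(before, after):
    """Build a compressed XOR-delta log record for one column-buffer change."""
    return zlib.compress(xor_bytes(before, after))

def redo(before, record):
    """Recovery: before XOR delta gives back the after-image."""
    return xor_bytes(before, zlib.decompress(record))

before = bytes(4096)                      # a 4 KB column buffer, all zeros
after = bytearray(before)
after[100:104] = b"\x2a\x00\x00\x07"      # a small change somewhere in the buffer
after = bytes(after)

record = log_record(before, after)
print(len(record), "log bytes vs a", len(after), "byte after-image")
assert redo(before, record) == after
```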
Finally, last and probably also least:
- “BLU” doesn’t stand for anything, except insofar as it’s a reference to IBM’s favorite color. It’s just a code name that stuck.
Comments
Is Vertica effective at smaller insert batches when you are willing to pay the price of writing twice, once to the row store and once to the column store?
What is the overhead for updates? If they are not done in place, what is done to remember that the old version has been replaced? I assume this imposes a cost on queries.
It took IBM quite a while to come up with SIMD-based processing, considering that VectorWise used it as far back as 2003.
Perhaps in a few years we'll see GPU-based processing in an IBM database.
It would be interesting to see more details about the compression used in BLU, and also about operating on compressed data.
Mark,
Usually, the main cost of append-only (as opposed to update-in-place) is a background compaction process. Advantages of append-only can include snapshot isolation or time travel. Cf. MVCC.
I screwed up by not questioning IBM about all this in detail. No real excuse; most of the factors I complain about in http://www.dbms2.com/2012/10/06/analyzing-big-companies-is-hard/ weren’t really present in this case.
Curt,
The paper Constant-Time Query Processing by Raman et al. (http://bit.ly/16nc6cC) has details on how the order is preserved and how equality and range predicates are evaluated on compressed data in the Blink engine. IBM Smart Analytics Optimizer (ISAO) and the Informix Warehouse Accelerator, precursors to DB2 BLU, used this technique. The data is partitioned via a frequency-partitioning technique and encoded with Huffman coding. In each cell, each column uses fixed-length codes. During encoding, the bit codes are assigned in the same order as the original values, which preserves the order.
I'm sure detailed DB2 BLU papers will be forthcoming. Previous work can be found in papers at http://www.informatik.uni-trier.de/~ley/pers/hd/l/Lohman:Guy_M=.html and http://www.informatik.uni-trier.de/~ley/pers/hd/r/Raman:Vijayshankar.html
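Very roughly, and leaving out how the real algorithm picks partitions to minimize total code length, the idea looks like the sketch below; the cell sizes and code widths are made up purely for illustration.

```python
from collections import Counter
import math

def frequency_partition(values, hot_cell_size=2):
    """Toy frequency partitioning: the most frequent values get a cell with short codes."""
    counts = Counter(values)
    hot = {v for v, _ in counts.most_common(hot_cell_size)}
    cells = []
    for members in (hot, set(counts) - hot):
        if not members:
            continue
        ordered = sorted(members)       # codes assigned in value order => order-preserving
        cells.append({
            "bits": max(1, math.ceil(math.log2(len(ordered)))),   # fixed code length per cell
            "code": {v: i for i, v in enumerate(ordered)},
        })
    return cells

column = ["CA"] * 5 + ["NY"] * 4 + ["TX", "WY", "AK", "NV", "OR", "WA"]
for cell in frequency_partition(column):
    print(cell["bits"], "bit codes:", cell["code"])
# The two most frequent values get 1-bit codes; the rest get 3-bit codes.
# Within each cell, code order matches value order, so comparisons work on the codes.
```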
Thanks, Keshav — I’ll take a look!
Can you clarify the last remark? Column DB inserts or updates are logical shifts of bit vectors (though vendors optimize by breaking the vectors into chunks, usually in lists or tree structures, and padding the chunks to allow some changes).
This means that column DB inserts (or updates, which are generally handled as delete + insert) asymptotically approach the cost of touching the entire column data structures very quickly, even for small batches. So multiple singleton or small batches are slow in most column DBs, and many vendors try to hide the pain by staging the data in a front-end store (usually in-memory or a row store) and then merging periodically. Is this what you mean by append-only?
I’m simply drawing the distinction between append-only and update-in-place.
Of course, for many analytic DBMS use cases, there aren’t a lot of updates anyway, just inserts. For those, the distinction is somewhat academic. Still, in practice there’s a correlation between append-only and other characteristics that are still interesting in those use cases. (Think MVCC.)
I’m clearly missing the point here. There are two common strategies for MVCC: marking changes in place (e.g., DB2 Analytics Accelerator) and keeping a change vector (e.g., Oracle).
The first is awkward in a column DB because you need to add lots of detail about when the change occurred, so the latter seems the cheaper approach for a column DB (keep new stuff to the side and merge it in periodically in bigger batches).
I’m not clear on what append means in a column DB, since the data has to be kept in row order to match the fields of a row together. I can’t see how append is distinct from insert (or, for that matter, delete) for small changes to a column DB.
My guess is that append was used in presentation-speak to mean keeping changes on the side, like other column DB vendors do.
The way I use “append-only” is that you just keep adding records. If there’s an update, it changes some sort of flag, but the old version of the record isn’t overwritten until some kind of compaction process comes into play.
A large fraction of newer database managers are append-only, analytic and short-request alike. MongoDB is a notable exception.
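A toy sketch of that pattern, not tied to any particular product (all the names here are made up):

```python
class AppendOnlyStore:
    """Toy append-only store: writes only ever add versions; compaction cleans up."""

    def __init__(self):
        self.records = []                 # (key, value, deleted) versions, in arrival order

    def put(self, key, value):
        self.records.append((key, value, False))

    def delete(self, key):
        self.records.append((key, None, True))    # a tombstone; old versions stay put

    def get(self, key):
        """The latest version wins; superseded versions are dead but still stored."""
        for k, value, deleted in reversed(self.records):
            if k == key:
                return None if deleted else value
        return None

    def compact(self):
        """Rewrite the store, keeping only the live latest version of each key."""
        latest = {}
        for k, value, deleted in self.records:
            latest[k] = (k, value, deleted)
        self.records = [rec for rec in latest.values() if not rec[2]]

store = AppendOnlyStore()
store.put("k1", "v1")
store.put("k2", "x")
store.put("k1", "v2")                     # an "update" just appends a newer version
store.delete("k2")
print(len(store.records), store.get("k1"), store.get("k2"))   # -> 4 v2 None
store.compact()
print(len(store.records), store.get("k1"))                    # -> 1 v2
```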
Thanks – that was the source of the confusion. I can’t reconcile column DBMS and append-only, because of the cognitive dissonance. Append-only seems like a row-store (or record-store) construct, since column stores need ordering.
Are there any append-only column stores you know of?
> “BLU” doesn’t stand for anything, except insofar as it’s a reference to IBM’s favorite color. It’s just a code name that stuck.
IIRC, it stands for “BLink Ultra”, an improved version of BLink, yet another internal codename.
see http://wwwdb.inf.tu-dresden.de/birte2011/files/keynote.pdf
RE aaron on May 29th, 2013 12:45 pm
Vertica is an append-only column store with respect to disk files. Inserts or updates can be directed to disk or default to filling memory first. Tuples in memory are periodically flushed to new disk files in column-oriented structures. These are not “on the side”; they are the same format as all other disk files.
Re “col stores need ordering”, this is implementation dependent. The relational model does not assume ordering of tuples. In Vertica’s case, each disk file has its own ordering; there is no total ordering of a table.
To keep the number of files from becoming too large, Vertica has a background “mergeout” process that creates new, larger files. Then the older files are deleted. So there is a bit of a double-write penalty, but it does not delay the insert processes.
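In generic terms, the buffer-then-merge pattern looks like the sketch below. This is not Vertica code; the class, the buffer limit, and the representation of a "file" as a sorted list are all invented for illustration.

```python
import heapq

class ColumnStore:
    """Toy buffer-then-merge store: flushes make new sorted files, mergeout combines them."""

    def __init__(self, buffer_limit=4):
        self.buffer_limit = buffer_limit
        self.memory = []              # unsorted recent inserts
        self.files = []               # each "file" is a sorted list of values

    def insert(self, value):
        self.memory.append(value)
        if len(self.memory) >= self.buffer_limit:
            self.flush()

    def flush(self):
        """Write the memory buffer out as one new sorted file."""
        self.files.append(sorted(self.memory))
        self.memory = []

    def mergeout(self):
        """Background-style merge: replace many small files with one big one."""
        merged = list(heapq.merge(*self.files))
        self.files = [merged]         # the old, smaller files would then be deleted

store = ColumnStore()
for v in [9, 3, 7, 1, 8, 2, 6, 4]:
    store.insert(v)
print(len(store.files))               # -> 2 small sorted files
store.mergeout()
print(store.files)                    # -> [[1, 2, 3, 4, 6, 7, 8, 9]]
```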
Curt,
You stated that “…Oracle is now the only major relational DBMS vendor left without a true columnar story.” What functionality does SQL Server offer, even now, that one might classify as a ‘true columnar story’?
At the time I wrote that, SQL Server’s columnar-option future had already been disclosed — I think in more detail than Oracle’s has been even now.
Thanks, Curt. I know it has been a while since you wrote this, and you’d probably rather attend to the newer posts :), so I appreciate that you did respond.
I’m still struggling to find a SQL Server functionality that one might term ‘true columnar’. The best I can do is their columnstore indexes. I want to be sure to include SQL Server in the POCs we are doing for columnar technology, if they belong there, but I can’t find a reason to include them at this point.
Thanks for doing DBMS2. I’ve been following for about 5 years.