August 25, 2009
Sybase IQ technical highlights
General highlights of the Sybase IQ technical story include:
- Sybase IQ is an analytic DBMS with a columnar/column-store architecture
- Unlike most analytic DBMS, Sybase IQ has a shared-disk architecture.
- The Sybase IQ indexing story is a bit complicated, with a bunch of different index kinds. Most are focused on columns with low cardinality, and at least in some cases are a lot like bitmaps; a toy sketch of that idea appears after this list. (Sybase IQ when first introduced was a pure bitmap index product, with a single index type, “Fast Project”.) But one index kind, “High Group”, designed for columns with high cardinality, is an exception to most generalities about the other Sybase IQ index kinds, and instead is more akin to a b-tree.
- Unlike Vertica, Sybase stores each column of data only once. I don’t see how it would make sense to have multiple indexes on the same column, but I didn’t actually ask whether doing so is possible or common.
- Sybase estimates that Sybase IQ requires ¼ the DBA effort of, say, Oracle. (Frankly, that’s not a particularly good figure.) Obviously, this is just a broad-brush average.
- Sybase recently repurposed an acquired ETL tool to be focused on Sybase IQ. IQ of course also works with various third-party tools, certified or otherwise.
- Sybase’s PowerDesigner CASE (Computer-Aided Software Engineering)/database design tool works with Sybase IQ.
- Sybase is proud of Sybase IQ’s new in-database analytics capabilities, but I haven’t yet grasped what, if anything, is differentiated about them.
- Sybase has an ILM (Information Lifecycle Management) story built around the point that different columns can be stored on different kinds of media.
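To make the low-cardinality indexing point concrete, here is a minimal bitmap-index sketch in Python. The names and structure are my own illustration, not Sybase IQ's actual implementation; the point is just that one bit vector per distinct value turns equality and IN predicates into cheap bitwise operations.

```python
# Minimal bitmap-index sketch for low-cardinality columns.
# Illustrative only; names and structure are invented, not Sybase IQ's code.

class BitmapIndex:
    def __init__(self, values):
        # One Python integer per distinct value, used as a bit vector:
        # bit i is set iff row i holds that value.
        self.bitmaps = {}
        for row, v in enumerate(values):
            self.bitmaps[v] = self.bitmaps.get(v, 0) | (1 << row)

    def equals(self, v):
        """Bit vector of the rows where the column equals v."""
        return self.bitmaps.get(v, 0)

    def isin(self, vs):
        """OR together per-value bitmaps to evaluate an IN-list predicate."""
        result = 0
        for v in vs:
            result |= self.bitmaps.get(v, 0)
        return result


# Example: two low-cardinality columns over the same six rows.
state = BitmapIndex(["CA", "NY", "CA", "TX", "NY", "CA"])
gender = BitmapIndex(["F", "M", "F", "F", "M", "M"])

# WHERE state IN ('CA', 'NY') AND gender = 'F'  ->  a single bitwise AND.
hits = state.isin(["CA", "NY"]) & gender.equals("F")
print([row for row in range(6) if hits >> row & 1])   # [0, 2]
```

A structure like this is cheap to scan, but a high-cardinality column would need one bitmap per distinct value, which is presumably part of why such columns get a different, more b-tree-like index kind.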
Highlights of the Sybase IQ compression story include:
- Sybase IQ applies compression to both columns and pages
- A (the?) major kind of column compression is called “projection” (why?), but boils down to token/dictionary compression. Tokens can be 1, 2, or 3 bytes in length, whichever is the best fit for the column’s cardinality. (A small sketch of this kind of encoding appears after this list.)
- I don’t have details about the other kinds of compression.
- Data is kept compressed in memory “until the latest point possible.”
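For illustration, here is a minimal sketch of token/dictionary compression with 1-, 2-, or 3-byte tokens. It is my own toy Python, not Sybase IQ's scheme; in particular, the exact rule for choosing the token width is an assumption.

```python
# Toy token/dictionary ("projection"-style) compression sketch.
# The width-selection rule below is an assumption, not Sybase IQ's exact scheme.

def token_width(cardinality):
    """Pick 1, 2, or 3 bytes per token based on the number of distinct values."""
    if cardinality <= 2**8:
        return 1
    if cardinality <= 2**16:
        return 2
    return 3  # enough for up to 2**24 distinct values

def encode_column(values):
    dictionary = {}                          # value -> integer token
    for v in values:
        dictionary.setdefault(v, len(dictionary))
    width = token_width(len(dictionary))
    encoded = b"".join(dictionary[v].to_bytes(width, "little") for v in values)
    return dictionary, width, encoded

# A column of city names shrinks to one small token per row,
# plus a dictionary that is stored only once.
cities = ["Boston", "Austin", "Boston", "Boston", "Denver", "Austin"]
dictionary, width, encoded = encode_column(cities)
print(width, len(encoded))                   # 1 byte per token, 6 bytes in all
```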
Highlights of the Sybase IQ update and load story include:
- Sybase claims that only the “High Group” index is costly to update. Specifically, “High Group” costs about as much to update as the database itself. Other indexes are fairly trivial to update. (Upon reflection, I don’t immediately see why that makes sense.)
- There’s pipelining of some sort when a High Group index is updated.
- Sybase claims that bulk loads of Sybase IQ are very fast.
- Loading Sybase IQ doesn’t block queries. Rather, Sybase IQ has some kind of versioning system in which a query just executes against older data. (A toy sketch of that idea appears after this list.)
- Sybase IQ updating is done in parallel. (That would be parallel among servers, of course, since Sybase IQ is shared-disk.)
- Trickle feed loading of Sybase IQ is slow. When you need to do microbatch loading with latency in the 2-15 minute range, Sybase recommends staging via an OLTP DBMS, whether from Sybase or otherwise. Sybase PowerDesigner generates scripts for this, and Sybase Replication Server helps with the execution.
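I don't know the details of Sybase IQ's versioning, but the general shape of non-blocking loads can be sketched as follows. This is purely illustrative, with invented names: a load builds the next version of the table off to the side, and a query keeps reading whatever version was current when it started.

```python
# Toy snapshot-versioning sketch: a bulk load publishes a new table version,
# while in-flight queries keep reading the version pinned at their start.
# Purely illustrative; not Sybase IQ's actual mechanism.

class VersionedTable:
    def __init__(self, rows):
        self.versions = {0: list(rows)}    # version number -> row snapshot
        self.current = 0

    def begin_query(self):
        """A query pins whatever version is current when it starts."""
        return self.current

    def read(self, pinned_version):
        return self.versions[pinned_version]

    def bulk_load(self, new_rows):
        """Build the next version off to the side, then publish it."""
        next_version = self.current + 1
        self.versions[next_version] = self.versions[self.current] + list(new_rows)
        self.current = next_version


table = VersionedTable(["row1", "row2"])
pinned = table.begin_query()             # query starts before the load
table.bulk_load(["row3", "row4"])        # load publishes version 1
print(table.read(pinned))                # the old query still sees ["row1", "row2"]
print(table.read(table.current))         # new queries see all four rows
```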
Highlights of the Sybase IQ concurrency, scalability, and workload management story include:
- Sybase points out that, because of Sybase IQ’s shared-disk architecture, queries can execute on a single server in the “grid.” Thus, if you have enough cores, it can be possible to isolate long-running queries from shorter ones. (A toy routing sketch appears after this list.)
- Similarly, Sybase notes that you can meet different SLAs by putting different users’ queries on more- or less-crowded Sybase IQ servers.
- Sybase further observes that not having to move data among nodes saves Sybase IQ from a lot of overhead true MPP systems endure.
- Sybase makes the usual claim that, because Sybase IQ is so efficient, queries finish quickly, and hence there’s less stress on concurrency than one might otherwise think.
- I don’t get the sense that Sybase IQ actually boasts a lot of direct workload management features. However, there are such features in Sybase’s flagship ASE product, so hopefully something similar is in Sybase IQ’s product future.
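As a toy illustration of the workload-isolation point, here is a hypothetical query router in front of a shared-disk grid; it is my sketch, not an actual Sybase IQ feature. Long-running queries go to their own pool of servers, so they do not contend with short interactive ones.

```python
# Hypothetical routing sketch for a shared-disk grid; not a Sybase IQ feature.
# Server names and the 60-second threshold are made up for illustration.
import itertools

SERVER_POOLS = {
    "interactive": itertools.cycle(["iq-node-1", "iq-node-2"]),  # short queries
    "batch":       itertools.cycle(["iq-node-3"]),               # long reports
}

def route(expected_seconds):
    """Send queries expected to run long to the 'batch' servers."""
    pool = "batch" if expected_seconds > 60 else "interactive"
    return next(SERVER_POOLS[pool])

print(route(expected_seconds=5))     # iq-node-1 (interactive pool)
print(route(expected_seconds=900))   # iq-node-3 (batch pool)
```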
Comments
11 Responses to “Sybase IQ technical highlights”
Thoughts: I’m guessing that by “projection” they mean bit packing. In any case, it would be really useful if people in this industry agreed to some common definitions, or, minimally, to share their definitions. Reinventing the whole database language for each and every company is painful.
Bitmap indexes in Oracle support high-cardinality columns and their effectiveness in that case is determined by the number of rows per distinct index value, not by the index cardinality. The cost to update them is not as high as you mention for ‘High Group’ indexes in Sybase IQ. Is Sybase IQ storing one bitmap per index value and replacing the entire bitmap on update?
I’m under the impression that Sybase IQ supports compressed ‘binned’ bitmap indexes. These would require a lot of work to update, though I don’t think they would have to recalculate the entire bitmap.
A couple of years ago there was a TechWave conference where they talked about their ability to do comparison operations on compressed data without decompressing it. I thought that was a cool idea.
Jon,
That is possible. An example of such a bitmap index is FastBit, an LGPL project.
It implements a type of bitmap compression which supports logical operations between bitmaps without decompressing them.
You can find it here:
https://sdm.lbl.gov/fastbit/
Jon, Justin,
Those capabilities are quite straightforward if the database uses dictionary compression and compresses to a fixed number of bits per value. LucidDB does the same in its compressed column stores.
http://www.luciddb.org
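For illustration, here is a minimal sketch of that idea (invented names, not LucidDB's or Sybase IQ's actual code): encode each value to a fixed-width code once, then evaluate an equality predicate against the codes directly, without decompressing the column. Range predicates work the same way if the dictionary is order-preserving.

```python
# Sketch of evaluating a predicate against dictionary-compressed data
# without decompressing it. Illustrative only.

values = ["red", "green", "red", "blue", "green", "red"]

# Encode once: value -> fixed-width code; the column is stored as codes.
dictionary = {}
for v in values:
    dictionary.setdefault(v, len(dictionary))
codes = [dictionary[v] for v in values]

# WHERE color = 'green': translate the constant to its code once,
# then scan the still-compressed codes.
target = dictionary.get("green")
matches = [i for i, c in enumerate(codes) if c == target]
print(matches)   # rows [1, 4]
```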
I haven’t drilled down to the level of detail I’d like on the various Sybase index kinds, so I’ll duck questions like Mark’s.
CAM
Fastbit doesn’t use dictionaries. It uses Word-Aligned Hybrid compression (for which they have established an “open” patent) and is available under the LGPL.
I used it briefly as part of a POC at AdBrite and was VERY impressed. I’ve thought about integrating it as a storage engine for MySQL, but MySQL is row-engine oriented, which sucks for a column store like FastBit.
Eigenbase is pretty cool, and would make a nice frontend to add support for joins and other such SQL constructs which Fastbit currently doesn’t support.