Metamarkets Druid overview
This is part of a three-post series:
- Introduction to Metamarkets and Druid
- Druid overview (this post)
- Metamarkets’ back-end technology
My clients at Metamarkets are planning to open source part of their technology, called Druid, which is described in the Druid section of Metamarkets’ blog. The timing of when this will happen is a bit unclear; I know the target date under NDA, but it’s not set in stone. But if you care, you can probably contact the company to get involved earlier than the official unveiling.
I imagine that open-source Druid will be pretty bare-bones in its early days. Code was first checked in early in 2011, and Druid seems to have averaged around 1 full-time developer since then. What’s more, it’s not obvious that all the features I’m citing here will be open-sourced; indeed, some of the ones I’m describing probably won’t be.
In essence, Druid is a distributed analytic DBMS. Druid’s design choices are best understood when you recall that it was invented to support Metamarkets’ large-scale, RAM-speed, internet marketing/personalization SaaS (Software as a Service) offering. In particular:
- Druid tries to use RAM well.
- Druid tries to stay up all the time.
- Druid has multi-valued fields. (Numeric, but of course you can use encoding tricks to be effectively more general.)
- Druid’s big limitation is that it assumes there’s literally only one (denormalized) table per query; you can’t even join to dimension tables.
- SQL is a bit of an afterthought; I would expect Druid’s SQL functionality to be pretty stripped-down out of the gate.
Interestingly, the single-table/multi-valued choice is echoed at WibiData, which deals with similar data sets. However, WibiData’s use cases are different from Metamarkets’, and in most respects the WibiData architecture is quite different from that of Metamarkets/Druid.
As with many DBMS, much of what’s interesting about Druid is how it organizes and chunks data. Most important, Druid has MVCC (Multi-Version Concurrency Control) on a segment-by-segment basis. That is, an update requires a new version of the whole segment to be written; while that happens, reads can continue on the old version unabated.
Obviously, this is more suited for streaming or batch-load scenarios than for ones with many single-row updates.
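To make the segment-level MVCC idea concrete, here is a minimal sketch in Python. It is not Druid code; the `SegmentStore` class, its method names, and the sample rows are all hypothetical, and the only point is to show that an update writes a whole new segment version while a reader holding the old version is unaffected.

```python
import threading


class SegmentStore:
    """Toy segment-level MVCC (hypothetical, not Druid's implementation)."""

    def __init__(self):
        self._lock = threading.Lock()
        # segment_id -> (version number, immutable tuple of rows)
        self._current = {}

    def read(self, segment_id):
        # A reader gets a reference to whichever version is current.
        # The rows are an immutable tuple, so a later publish cannot alter them.
        return self._current[segment_id]

    def publish(self, segment_id, rows):
        # An "update" rewrites the entire segment, then swaps the pointer.
        new_rows = tuple(rows)
        with self._lock:
            old_version = self._current.get(segment_id, (0, ()))[0]
            self._current[segment_id] = (old_version + 1, new_rows)
            return old_version + 1


store = SegmentStore()
store.publish("2012-06-16T00", [("ad_1", 3), ("ad_2", 5)])

snapshot = store.read("2012-06-16T00")   # a reader picks up version 1
store.publish("2012-06-16T00", [("ad_1", 4), ("ad_2", 5)])  # whole segment rewritten
latest = store.read("2012-06-16T00")     # new readers see version 2
```

The reader that took `snapshot` keeps seeing version 1 even after version 2 is published. Changing a single row costs a full segment rewrite, which is why the design favors bulk loads.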
Other Druid specifics include:
- A Druid table must have a timestamp column.
- Druid data is stored in columns, in timestamp order.
- Druid data is commonly chunked into segments of 5-10 million rows. Data is partitioned by time and then perhaps also by some other dimension.
- There can be two sets of data storage servers, one for data that has arrived recently, the other for older data (e.g. >1 hour old). In that case, data is first persisted on one set of servers, then flushed to the other.
- Druid data is structured the same way in memory as on disk (memory mapping). More precisely, there seems to be memory mapping between generic persistent storage and virtual memory, with the operating system taking care of figuring out which parts of virtual memory need to be in actual RAM.
- Druid keeps compressed bitmap indexes on the various dimensions, on a segment-by-segment basis.
- Druid uses dictionary/token compression, with a separate dictionary for each segment. Token length is dynamic, based on column cardinality. Max length is 31 bits, which rarely is a problem, since a segment doesn’t usually hold a lot more than 2^23 rows.
- You can have different replication factors for different segments. You can read from all replicas.
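The per-segment dictionary and bitmap-index points above can be illustrated with a small Python sketch. This is an assumption-laden toy, not Druid's actual format: the function names are hypothetical, the bitmaps are plain Python integers rather than compressed structures, and the token-width rule (just enough bits to distinguish the distinct values) is my reading of "dynamic, based on column cardinality".

```python
def token_bits(cardinality):
    """Bits needed to give each of `cardinality` distinct values its own token."""
    return max(1, (cardinality - 1).bit_length())


def encode_segment_column(values):
    """Build a per-segment dictionary, token list, and one bitmap per value."""
    # Dictionary is local to this segment: distinct value -> small integer token.
    dictionary = {v: i for i, v in enumerate(sorted(set(values)))}
    tokens = [dictionary[v] for v in values]
    # One bitmap per dictionary entry; bit `row` is set if that row holds the value.
    # (Uncompressed here; a real system would compress these.)
    bitmaps = [0] * len(dictionary)
    for row, token in enumerate(tokens):
        bitmaps[token] |= 1 << row
    return dictionary, tokens, bitmaps


def rows_matching(bitmap):
    """Expand a bitmap back into the list of row numbers it selects."""
    return [row for row in range(bitmap.bit_length()) if (bitmap >> row) & 1]


column = ["us", "de", "us", "fr", "de", "us"]
dictionary, tokens, bitmaps = encode_segment_column(column)

# Filter "country is us OR fr" by OR-ing two bitmaps, without scanning values.
us_or_fr = bitmaps[dictionary["us"]] | bitmaps[dictionary["fr"]]
print(token_bits(len(dictionary)), tokens, rows_matching(us_or_fr))
```

With three distinct values, each token needs only 2 bits. Even a column whose every row is distinct in a 10-million-row segment would need 24 bits under this rule, since `token_bits(10_000_000)` is 24.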
For more on Druid, please see my post on Metamarkets’ back-end technology.
Comments
12 Responses to “Metamarkets Druid overview”
[…] Druid overview […]
[…] Metamarkets runs Druid on an 800-core system running on Amazon EC2. Others have done a decent job explaining what Druid seems good for and where the tradeoffs might […]
This is the best explanation of Druid that exists anywhere – inclusive of their Marketing material, the Strata talk, and the documentation in the code. Thanks!
Thanks for the kind words!
I put a lot of effort into it, but was still frustrated by the results (mainly around the in-memory part, not Druid itself).
[…] Metamarkets’ Druid was open-sourced. Numerous other product introductions and so on that I’ve hinted at have […]
[…] HANA but cringe at the licensing costs? One option is to look into open source alternatives like Druid which was created by the vendor MetaMarkets. Druid claims to provide real-time analytics using […]
[…] “I would encourage you to keep an eye on Metamarkets’ Druid, which Curt Monash recently covered: http://www.dbms2.com/2012/06/16/metamarkets-druid-overview/ […]