July 28, 2010
dbShards — a lot like an MPP OLTP DBMS based on MySQL or PostgreSQL
I talked yesterday w/ Cory Isaacson, who runs CodeFutures, makers of dbShards. dbShards is a software layer that turns an ordinary DBMS (currently MySQL or PostgreSQL) into an MPP shared-nothing ACID-compliant OLTP DBMS. Technical highlights included:
- Despite heavy emphasis on the word “sharding,” dbShards’ scale-out is transparent to the application programmer. E.g., in dbShards + MySQL, the APIs are more or less the same ones you’d expect for MySQL (JDBC, etc.).
- If the DBMS underneath is ACID-compliant (e.g., MySQL + InnoDB), then the dbShards version is ACID-compliant too.
- Beyond those basics, I forgot to check the fine details of dbShards’ MySQL (or PostgreSQL) syntax support. Todd Hoff, however, did not forget.
- dbShards keeps copies of each shard on two different servers, via asynchronous log-shipping. This allows for failover in both planned and unplanned outages.
- dbShards wants you to distribute big tables among shards via a “shard key,” which is a lot like the distribution key in MPP analytic DBMS. You’re encouraged to replicate small, low-update-volume tables across each shard.
- Cory says that dbShards has good join performance when – you guessed it! – everything being joined is co-located shard-by-shard, because the tables were distributed on the same shard key and/or replicated across each shard. Cory can’t imagine why you’d want to do an inner join under any other circumstances.
- The basic dbShards query execution model is: A query comes in; it’s parsed; a shard key is automagically detected (one hopes); the “global configuration file” is checked to see which shard to ship the work off to. I forgot to ask whether lookup was done via a hash table (the obvious guess) or something else. The programmer can put hints in the code comments to direct the sharding, but Cory asserts those aren’t needed very often.
- Cory says that insert performance with dbShards + MySQL + InnoDB is 1500-3000 inserts per shard per second, scaling almost linearly with the number of shards. I forgot to ask how many shards this had been tested for.
- If you want blazing dbShards performance, Cory’s base-case figure is 25 gigabytes of data per node, so that the most commonly used indexes can camp out in memory. (I forgot to ask what kind of hardware he was assuming per node.) This is if you’re going to be doing joins or aggregations. If it’s just single-row inserts and updates, or if your performance requirements are lower, you can go with 10X that figure.
- Cory tells stories wherein going from an unsharded database to 4 or so shards took database re-indexing time down 50X or more. Apparently, such tasks can be exponential or even super-exponential with database size over InnoDB. (That said, I’d be surprised if all large InnoDB users suffered from that problem to the same degree.)
- dbShards’ customer workloads are all >= 50% reads. This is reflective of dbShards’ design priorities.
- As long as it can be in charge, dbShards is happy to interface to whatever kind of database backup software you want to use on a node-by-node basis. (dbShards wants to drive your backup software for you so that it can be sure the replicas are handled properly.)
- It’s “fairly common” for dbShards to be paired with memcached. I forgot to ask whether memcached typically lived on its own pool of servers, or on the same pool that runs dbShards.
- Future DBMS options under consideration for dbShards include Oracle and (unspecified) in-memory.
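To make the shard-key routing and co-location points above concrete, here is a minimal Python sketch. It is emphatically not dbShards code or its API. It assumes hash-based lookup (the "obvious guess" I never confirmed), and every name in it (`SHARD_COUNT`, `SHARD_KEYS`, `REPLICATED_TABLES`, `route`, `can_join_locally`, and the example tables) is hypothetical.

```python
import hashlib

# Hypothetical stand-in for the "global configuration file".
SHARD_COUNT = 4
SHARD_KEYS = {              # big tables, each distributed on a shard key
    "customers": "customer_id",
    "orders": "customer_id",
    "page_views": "session_id",
}
REPLICATED_TABLES = {"countries"}   # small, low-update tables copied to every shard


def shard_for(key_value, shard_count=SHARD_COUNT):
    """Map a shard-key value to a shard number.

    Assumption: a stable hash modulo the shard count. The real lookup
    mechanism was not confirmed in the conversation.
    """
    digest = hashlib.sha256(str(key_value).encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def route(table, row):
    """Return the shard numbers that should receive a write for this row."""
    if table in REPLICATED_TABLES:
        # Replicated tables are written to every shard.
        return list(range(SHARD_COUNT))
    key_column = SHARD_KEYS[table]
    return [shard_for(row[key_column])]


def can_join_locally(left, right):
    """True if a join can be answered shard-by-shard with no data movement.

    Simplification: assumes the join predicate is on the shared shard key.
    """
    if left in REPLICATED_TABLES or right in REPLICATED_TABLES:
        return True
    return SHARD_KEYS[left] == SHARD_KEYS[right]
```

In this sketch, a customer row and that customer's orders always land on the same shard, so joining them on `customer_id` never crosses shards. `orders` joined to `page_views` is the case Cory can't imagine wanting.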
Business highlights for CodeFutures and dbShards include:
- dbShards’ price is $5000/server/year, including support and OEMed MySQL, with stated quantity discounts up to 40%.
- dbShards cloud pricing is different (on a usage basis).
- dbShards has 6 or so customers, half each on-premises and in the cloud. One of them is Facebook. (Those “100s” of customers mentioned on the dbShards website are for a fairly unrelated product.)
- CodeFutures has been at this 2 ½ years or so. There is no venture capital in the company.
- Early dbShards deals have evidently involved a fair amount of professional services.
- Counting contractors, CodeFutures has 10-12 people; headcount has been as high as 15.
- Target dbShards customers are as you’d expect. Cory says he’s actually been more successful getting early-adopter money out of Web companies than Wall Street firms.
- There are a couple of dbShards PostgreSQL customers for greenfield applications. Most dbShards customers and prospects, however, are looking to scale out existing apps.
- Despite its connection to open source DBMS, there’s nothing open source about dbShards itself.
Categories: dbShards and CodeFutures, Facebook, MySQL, OLTP, Parallelization, PostgreSQL
Comments
3 Responses to “dbShards — a lot like an MPP OLTP DBMS based on MySQL or PostgreSQL”
Smaller companies who have similar needs (scale out OLTP using MySQL), and are comfortable working without commercial support, should also look into the Spider storage engine for MySQL. http://spiderformysql.com/ Spider provides a simpler version of much of the same functionality.
dbShards is an interesting looking product. If I’m not mistaken, the failover strategy actually uses semi-synchronous replication, which is almost as fast as asynchronous while providing consistency rivaling synchronous approaches. That’s pretty cool.
I would be interested in knowing:
1. How dbShards handles partition rebalancing. Does a partition need to be taken offline and rebalanced manually? How are in-flight transactions handled during a rebalancing process? Is there a management console that gives the DBA/sysadmin partition usage info?
2. What level of integration exists between dbShards and memcached, if any. Vanilla memcached is visible to the application tier and requires things like cache invalidation to be explicitly managed. Does dbShards provide any transparency or other added value for memcached infrastructures?
3. Why would MySQL users choose dbShards over MySQL Cluster?
[…] dbShards has around 6 customers, including Facebook. (Facebook may outpace even Twitter and Zynga in using the most products mentioned in this post.) […]