Transparent sharding
Discussion of transparent sharding vendors and products.
Clarification on dbShards’ shard replication
After I posted recently about dbShards, a Very Smart Commenter emailed me with the challenge “but each individual shard is still replicated via two-phase commit, and everybody knows two-phase commit is fundamentally slow.” I replied that no, it wasn’t exactly two-phase commit, but fumbled the explanation of why, so I decided to escalate straight to dbShards honcho Cory Isaacson.
ScaleBase, another MPP OLTP quasi-DBMS
Liran Zelkha of ScaleBase raised his hand on Twitter. It turns out ScaleBase has a story rather similar to that of CodeFutures/dbShards. That is:
- Like dbShards, ScaleBase is a proxy that looks to the application like a scale-out DBMS, but routes work to multiple servers running MySQL against different shards of the database. Other DBMS beyond MySQL are planned, but PostgreSQL — which dbShards supports — did not get mentioned.
- Sharding is done at configuration time, and is transparent to the application. You want to shard the big tables and replicate the small ones, because if you join two sharded tables, performance can be slow. ScaleBase may have more of a configuration-advisor wizard than dbShards does.
- Each shard is replicated to a mirror, in a high-availability way.
- You can use ScaleBase across multiple data centers, but there’s little or no magic to overcome the performance issues that would arise in many use cases.
- Much like dbShards, ScaleBase supports three kinds of sharding — hash, list, and range.
- ScaleBase currently has no support whatsoever for stored procedures, which is slightly less than dbShards has.
- Liran stresses that ScaleBase looks even to management tools — e.g. TOAD — like a single DBMS.
- ScaleBase runs on EC2 and private cloud.
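To make the three kinds of sharding concrete, here is a minimal Python sketch of how a proxy might pick a shard under each scheme. It is not ScaleBase's or dbShards' actual code; the function names, the CRC32 hash, and the example boundaries are all assumptions chosen for illustration.

```python
import bisect
import zlib


def hash_shard(key, num_shards):
    """Hash sharding: a stable hash of the key, modulo the shard count.

    CRC32 is an arbitrary choice here (an assumption); any stable hash works.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_shards


def list_shard(key, mapping):
    """List sharding: an explicit key -> shard lookup table."""
    return mapping[key]


def range_shard(key, boundaries):
    """Range sharding: boundaries is a sorted list of split points.

    Shard 0 holds keys below boundaries[0], shard i holds keys from
    boundaries[i-1] up to (but not including) boundaries[i], and the
    last shard holds everything at or above the final boundary.
    """
    return bisect.bisect_right(boundaries, key)


if __name__ == "__main__":
    print(hash_shard("customer-42", 4))            # some shard in 0..3
    print(list_shard("EU", {"US": 0, "EU": 1}))    # shard chosen by lookup
    print(range_shard(1500, [1000, 2000]))         # shard chosen by range
```

Whichever scheme is used, the point is the same: the application sends ordinary SQL, and the proxy decides which MySQL server should see it.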
Our talk didn’t get deeply technical, and I don’t know exactly how ScaleBase’s replication works. But a website reference to a small transaction log in a distributed cache does sound, while not identical to the dbShards approach, at least directionally similar.
ScaleBase is a year or so old, with about 6 people, based in the Boston area despite strong Israeli roots. ScaleBase has raised a round of venture capital; I didn’t ask for details.
Liran says that ScaleBase is in closed beta, with some production users, at least one of whom has over 100 database servers.
Categories: Clustering, dbShards and CodeFutures, MySQL, OLTP, Parallelization, ScaleBase, Transparent sharding | 10 Comments |
dbShards update
I talked yesterday with Cory Isaacson of CodeFutures, and hence can follow up on my previous post about dbShards. dbShards basics include:
- dbShards gives you, in effect, an MPP DBMS based on MySQL or PostgreSQL, meant for OLTP (OnLine Transaction Processing). dbShards always did distributed queries, and now does distributed transactions as well.
- dbShards works by sharding the database and automagically sending work to the correct shard.
- For safety, dbShards of course replicates each shard. Contrary to what I said in the previous post, the replication method is not log-shipping.
- At this time, dbShards only works in a single data center.
- dbShards can handle any SQL that would work through, say, a JDBC driver, and is not particularly sensitive to data type. However, dbShards’ stored procedure support is iffy — if a procedure touches data in more than one shard, it simply fails.
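The stored-procedure limitation can be pictured with a small Python sketch. This is only my guess at the shape of the check, not dbShards' implementation: `shard_for`, `route_procedure`, and `CrossShardError` are hypothetical names, and hashing by CRC32 is an assumption.

```python
import zlib


class CrossShardError(Exception):
    """Raised when one procedure call would touch more than one shard."""


def shard_for(key, num_shards):
    # Hypothetical routing rule: stable hash of the shard key, modulo shard count.
    return zlib.crc32(str(key).encode("utf-8")) % num_shards


def route_procedure(shard_keys, num_shards):
    """Return the single shard a procedure call can run on.

    shard_keys are the shard-key values the call touches. If they do not
    all map to exactly one shard (including the case of no keys at all),
    the call is rejected rather than executed.
    """
    shards = {shard_for(k, num_shards) for k in shard_keys}
    if len(shards) != 1:
        raise CrossShardError(
            f"procedure touches {len(shards)} shards: {sorted(shards)}"
        )
    return shards.pop()
```

A call whose keys all land on one shard is simply forwarded there; anything else fails outright, which matches the "it simply fails" behavior described above.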
One dbShards customer writes half a billion rows on a busy day, and serves 3,000-4,000 pages per second, naturally with multiple queries per page. This is on a 32-node cluster, with uninspiring hardware, in the cloud. The database has 16 shards, aggregating 128 virtual shards. I forgot to ask how big the database actually is. Overall, dbShards is up to a dozen or so signed customers, half of whom are in production or soon will be.
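For readers wondering what "16 shards, aggregating 128 virtual shards" might look like, here is a rough Python sketch. The two counts come from the customer example; everything else (the CRC32 hash, the round-robin assignment, the names) is an assumption for illustration, not a description of how dbShards actually lays things out.

```python
import zlib
from collections import Counter

NUM_VIRTUAL = 128   # virtual shards, from the customer example
NUM_PHYSICAL = 16   # physical shards, from the customer example

# Hypothetical assignment table: virtual shard -> physical shard.
# Round-robin is an assumption; the real placement is not described.
assignment = {v: v % NUM_PHYSICAL for v in range(NUM_VIRTUAL)}


def virtual_shard(key):
    """Keys hash to a fixed virtual shard; that mapping never changes."""
    return zlib.crc32(str(key).encode("utf-8")) % NUM_VIRTUAL


def physical_shard(key):
    """The physical location is looked up via the assignment table."""
    return assignment[virtual_shard(key)]


# How many virtual shards each physical shard carries under this assignment.
counts = Counter(assignment.values())

if __name__ == "__main__":
    print(sorted(counts.items()))
    print(physical_shard("customer-42"))
```

The usual appeal of this kind of indirection is that adding servers means editing the assignment table and moving whole virtual shards, rather than rehashing every key.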
dbShards’ replication scheme works like this: Read more
Categories: Clustering, dbShards and CodeFutures, MySQL, OLTP, Parallelization, Transparent sharding | 9 Comments |