dbShards and CodeFutures
Analysis of Code Futures and its flagship product dbShards. Related subjects include:
Clarification on dbShards’ shard replication
After I posted recently about dbShards, a Very Smart Commenter emailed me with the challenge “but each individual shard is still replicated via two-phase commit, and everybody knows two-phase commit is fundamentally slow.” I replied that no, it wasn’t exactly two-phase commit, but fumbled the explanation of why — so I decided to escalate straight to dbShards honcho Cory Isaacson. Read more
The Continuent Tungsten MySQL replication story
To the consternation of its then-CEO, I wrote very little about my then-client Continuent. However, when I knew Schooner’s recent announcement was coming, I reached out to other MySQL scale-out vendors too. I’ve already posted accordingly about CodeFutures (the dbShards guys) and ScaleBase. Now it’s late-responding Continuent’s turn.
Actually, what I’m mainly going to do is quote a very long email that Continuent’s current CEO/former CTO Robert Hodges sent me, and which I lightly edited. Read more
Categories: Continuent, Data integration and middleware, dbShards and CodeFutures, MySQL, Open source, Parallelization, Schooner Information Technology | 9 Comments |
ScaleBase, another MPP OLTP quasi-DBMS
Liran Zelkha of ScaleBase raised his hand on Twitter. It turns out ScaleBase has a story rather similar to that of CodeFutures/dbShards. That is:
- Like dbShards, ScaleBase is a proxy that looks to the application like a scale-out DBMS, but routes work to multiple servers running MySQL against different shards of the database. Other DBMS beyond MySQL are planned, but PostgreSQL — which dbShards supports — did not get mentioned.
- Sharding is done at configuration time, and is transparent to the application. You want to shard the big tables and replicate the small ones, because if you join two sharded tables, performance can be slow. ScaleBase may have more of a configuration-advisor wizard than dbShards does.
- Each shard is replicated to a mirror, in a high-availability way.
- You can use ScaleBase across multiple data centers, but there’s little or no magic to overcome the performance issues that would arise in many use cases.
- Much like dbShards, ScaleBase supports three kinds of sharding — hash, list, and range.
- ScaleBase currently has no support whatsoever for stored procedures, which is slightly less than dbShards has.
- Liran stresses that ScaleBase looks even to management tools — e.g. TOAD — like a single DBMS.
- ScaleBase runs on EC2 and private cloud.
Our talk didn’t get deeply technical, and I don’t know exactly how ScaleBase’s replication works. But a website reference to a small transaction log in a distributed cache does sound, while not identical to the dbShards approach, at least directionally similar.
ScaleBase is a year or so old, with about 6 people, based in the Boston area despite strong Israeli roots. ScaleBase has raised a round of venture capital; I didn’t ask for details.
Liran says that ScaleBase is in closed beta, with some production users, at least one of whom has over 100 database servers.
Categories: Clustering, dbShards and CodeFutures, MySQL, OLTP, Parallelization, ScaleBase, Transparent sharding | 10 Comments |
dbShards update
I talked yesterday with Cory Isaacson of CodeFutures, and hence can follow up on my previous post about dbShards. dbShards basics include:
- dbShards gives you, in effect, an MPP DBMS based on MySQL or PostgreSQL, meant for OLTP (OnLine Transaction Processing). dbShards always did distributed queries, and now does distributed transactions as well.
- dbShards works by sharding the database and automagically sending work to the correct shard.
- For safety, dbShards of course replicates each shard. Contrary to what I said in the previous post, the replication method is not log-shipping.
- At this time, dbShards only works in a single data center.
- dbShards can handle any SQL that would work through, say, a JDBC driver, and is not particularly sensitive to data type. However, dbShards’ stored procedure support is iffy — if a procedure touches data in more than one shard, it simply fails.
One dbShards customer writes 1/2 billion rows on a busy day, and serves 3-4,000 pages per second, naturally with multiple queries per page. This is on a 32-node cluster, with uninspiring hardware, in the cloud. The database has 16 shards, aggregating 128 virtual shards. I forgot to ask how big the database actually is. Overall, dbShards is up to a dozen or so signed customers, half of whom are in production or soon will be.
dbShards’ replication scheme works like this: Read more
Categories: Clustering, dbShards and CodeFutures, MySQL, OLTP, Parallelization, Transparent sharding | 9 Comments |
I’m collecting data points on NoSQL and HVSP adoption
I was asked to do a magazine article on NoSQL, where by “NoSQL” is meant “whatever they talk about at NoSQL conferences.” By now the number of publications planning to run the article is up to 2, the deadline is next week and, crucially, it has been agreed that I may talk about HVSP in general, NoSQL and SQL alike.
It also is understood that, realistically, I can’t be expected to know and mention the very latest news for all the many products in the categories. Even so, I think this would be fine time to check just where NoSQL and HVSP adoption stand. Here is most of what I know, or links to same; it would be great if you guys would contribute additional data in the comment thread.
In the NoSQL area: Read more
dbShards — a lot like an MPP OLTP DBMS based on MySQL or PostgreSQL
I talked yesterday w/ Cory Isaacson, who runs CodeFutures, makers of dbShards. dbShards is a software layer that turns an ordinary DBMS (currently MySQL or PostgreSQL) into an MPP shared-nothing ACID-compliant OLTP DBMS. Technical highlights included: Read more
Categories: dbShards and CodeFutures, Facebook, MySQL, OLTP, Parallelization, PostgreSQL | 3 Comments |