Notes on short-request scale-out MySQL
A press person recently asked about:
… start-ups that are building technologies to enable MySQL and other SQL databases to get over some of the problems they have in scaling past a certain size. … I’d like to get a sense as to whether or not the problems are as severe and wide spread as these companies are telling me? If so, why wouldn’t a customer just move to a new database?
While that sounds as if he was asking about scale-out relational DBMS in general, MySQL or otherwise, short-request or analytic, it turned out that he was asking just about short-request scale-out MySQL. My thoughts and comments on that narrower subject include(d) but are not limited to:
- The biggest web companies had to go to non-transparently sharded MySQL years ago. The NoSQL movement is, in no small part, an attempt to improve upon that. Ditto for scale-out short-request MySQL.
- Some overlapping categories of companies or projects who need scale-out short-request database processing are:
  - The aforementioned big companies who have other applications they haven’t hand-sharded yet.
  - Other web companies whose applications are getting that big.
  - Conventional enterprises whose web efforts happen to be very big.
  - Sensor networks and other massive sources of machine-generated data.
  - Certain specialized areas (e.g., financial trading).
- Relatively few of these applications are totally impossible to do in Oracle. But the Oracle approach might be very expensive.
  - In particular, there’s a break point when companies (often SaaS vendors) outgrow Oracle Standard Edition.
- Yes, the alternatives usually are one of MySQL or Oracle.
- InnoDB isn’t an alternative to these newer technologies; it’s just a piece of the puzzle and indeed of default MySQL now. Several of them — e.g. dbShards — are meant to be used in conjunction with InnoDB.
- Merging his list and mine, the high-performance/scale-out MySQL alternatives look like dbShards, Schooner, ScaleBase, ScaleDB, Tokutek, Akiban, Xeround, and Clustrix. The first two are to my knowledge more proven than the rest.
- Proprietary hardware and the associated hardware/appliance pricing aren’t very appealing for these applications. That speaks against Oracle Exadata and Clustrix, and is the reason Schooner switched to a software-only strategy despite some initial appliance sales.
- However, hardware band-aids such as solid-state drives or even RAM-based solid-state storage could make more sense:
  - If, for performance, you’re scaling out your database so that it fits in RAM on each box, you don’t really have a disk-based architecture anyway, now do you?
  - Even if you’re not doing that yet, if your problem is throughput rather than storage capacity, silicon-based storage could be a big help.
  - In principle, devices of that kind can be moved from one application to another, after the first one is rearchitected not to need them. (In practice, however, I don’t know of anybody who is doing that. I also don’t believe that Kaminario et al. are marketing that kind of idea, more’s the pity.)
- My notes on all this from April, 2010 are already badly outdated, but may be interesting anyway.
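For readers who haven’t seen it, here is a minimal sketch of what “non-transparent” sharding means in practice: the application, not the DBMS, decides which MySQL instance holds a given row. Everything in it is a hypothetical illustration (the shard URLs, the `shard_for` and `group_by_shard` helpers, the choice of hashing a user ID); it is not how any company or product mentioned above actually does it.

```python
import hashlib

# Hypothetical shard map. The hosts and database names are made up for illustration.
SHARDS = [
    "mysql://db0.example.internal/app",
    "mysql://db1.example.internal/app",
    "mysql://db2.example.internal/app",
    "mysql://db3.example.internal/app",
]


def shard_for(user_id, shards=SHARDS):
    """Pick the shard for a key by hashing it and taking the result modulo the shard count."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


def group_by_shard(user_ids, shards=SHARDS):
    """Group keys by shard so the application can issue one query per shard."""
    groups = {}
    for user_id in user_ids:
        groups.setdefault(shard_for(user_id, shards), []).append(user_id)
    return groups


if __name__ == "__main__":
    # The application has to do this routing itself before it can run any SQL.
    for shard, ids in sorted(group_by_shard(range(10)).items()):
        print(shard, ids)
```

The catch with this simple modulo scheme is that changing the number of shards remaps most keys, and anything that spans shards (joins, transactions, rebalancing) becomes the application’s problem. That is roughly the work the scale-out products try to take off the developer’s hands.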
Comments
5 Responses to “Notes on short-request scale-out MySQL”
Hi,
Why doesn’t MySQL Cluster fit your list?
IMO we’re very proven in short-request scale-out … and are working hard on better query performance.
/Jonas, MySQL Cluster developer
Jonas,
I don’t hear a lot of good things about MySQL Cluster, although I’m sure there are some use cases where it works just fine.
Hi Curt
Disclaimer first – I’m part of the MySQL Cluster product management team
We do see growing adoption of MySQL Cluster – now around 1,000 downloads per day and an expanding number of case studies from web and telecoms workloads:
http://www.mysql.com/customers/cluster/
MySQL Cluster was originally designed for in-network telecoms applications which needed ultra-low latency, high write performance and 99.999% availability. These sorts of applications typically had simple access patterns and limited data set sizes, so the use cases were pretty specific.
Over the past couple of years, MySQL Cluster adoption has been most significant in web use cases, and as such the product has continued to evolve to meet a broader set of use requirements.
Building upon auto-sharding of the database with multi-master replication to support write-intensive workloads, the ability to scale the cluster on-line, modify the schema without downtime and a variety of Non-SQL interfaces, the latest development release announced last week adds a range of capabilities to enable MySQL Cluster to meet a broader set of use-cases:
– Adaptive Query Localization which pushes JOIN operations down to the data nodes. We’ve seen 20-40x speed-up in JOIN operations as a result on real-world queries
– A new memcached API to the cluster, which bypasses SQL, giving web developers easy access and extending memcached with a persistent, scalable, HA back-end
More about the Development release is here:
http://dev.mysql.com/tech-resources/articles/mysql-cluster-labs-dev-milestone-release.html
There has been a lot of work also to enhance ease of deployment and management – enabling Devs and DBAs to get up and running much faster, and with a lower learning curve
Would welcome the opportunity to brief you more on the latest status of Cluster adoption and development – certainly MySQL Cluster has come a long way from its telecoms niche 3+ years ago
Thanks for posting. I’d love to be briefed.
Is this something that should happen while I’m in Redwood Shores next week, or are the relevant people elsewhere?
[…] an April 19 DBMS2 blog entry Curt Monash talks about just such a strategy for addressing a challenge faced by many Web […]