MemSQL scales out
The third of my three MySQL-oriented clients I alluded to yesterday is MemSQL. When I wrote about MemSQL last June, the product was an in-memory single-server MySQL workalike. Now scale-out has been added, with general availability today.
MemSQL’s flagship reference is Zynga, across hundreds of servers. Beyond that, the company claims (to quote a late draft of the press release):
Enterprises are already using distributed MemSQL in production for operational analytics, network security, real-time recommendations, and risk management.
All four of those use cases fit MemSQL’s positioning in “real-time analytics”. Besides Zynga, MemSQL cites penetration into traditional low-latency markets — financial services (various subsectors) and ad-tech.
Highlights of MemSQL’s new distributed architecture start with:
- There are two kinds of MemSQL node — “aggregator” and “leaf”.
- Aggregators are a kind of head node. You can have a bunch of them.
- Leafs run full single-server MemSQL. You can have a bunch of them too.
- MemSQL has two kinds of query optimizer. One runs on the aggregator nodes and thinks about the whole cluster. The other runs on each leaf and thinks only about that node.
- Much of the join and aggregation work is done on the aggregator nodes, but I didn’t pursue that issue in much detail.
- It is good policy — and supported — to replicate small dimension/reference tables across the cluster. These are replicated to aggregator and leaf nodes alike. (This tells us that some joins are indeed done on the leafs. ;))
- MemSQL replication can be synchronous or asynchronous. It can be used for high availability.
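To make the aggregator/leaf split above concrete, here is a toy sketch in Python. It is not MemSQL code, and every name in it (LEAF_SHARDS, REGION_NAMES, leaf_partial_sum, aggregator_sum) is made up for illustration. It assumes a fact table sharded across three leafs, plus a small reference table copied to every leaf so the join can happen locally before the aggregator merges partial results.

```python
from collections import defaultdict

# Hypothetical fact-table shards, one per leaf: (region_code, amount) rows.
LEAF_SHARDS = [
    [("E", 10), ("W", 5)],
    [("E", 7), ("N", 3)],
    [("W", 1), ("N", 4)],
]

# Hypothetical small reference table, replicated to every node.
REGION_NAMES = {"E": "east", "W": "west", "N": "north"}


def leaf_partial_sum(rows, region_names):
    """Runs on one leaf: join local rows to the replicated table, then pre-aggregate."""
    partial = defaultdict(int)
    for region_code, amount in rows:
        partial[region_names[region_code]] += amount
    return dict(partial)


def aggregator_sum(shards, region_names):
    """Runs on an aggregator: fan the query out, then merge the leafs' partial sums."""
    merged = defaultdict(int)
    for shard in shards:
        for region, subtotal in leaf_partial_sum(shard, region_names).items():
            merged[region] += subtotal
    return dict(merged)


totals = aggregator_sum(LEAF_SHARDS, REGION_NAMES)
```

In this sketch the leafs never talk to each other. The replicated reference table is what lets each one finish its join on its own, which is the point of the replication bullet above.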
Also:
- MemSQL writes (whether primary or replicated) go to a buffer. The buffer size can be 0 or positive, in a tradeoff of durability vs. the likelihood of a disk I/O bottleneck.
- MemSQL has many virtual nodes on each physical (leaf) node. (This is pretty much an industry-standard best practice, as it helps with elasticity, recovery from node failure, and so on.)
- Compression is still a future feature.
- So is online schema change.
- Leaf nodes have cost-based optimizers.
- MemSQL’s aggregator (cluster-wide) optimizer is mainly heuristic, but is supposed to get more cost-based in future releases.
- In some releases it will be possible to keep MemSQL running while upgrading the software. But that’s not a promise for releases that change how replication works.
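The virtual-node point above can be sketched in a few lines of Python. This is a generic illustration of the technique rather than a description of MemSQL internals; the partition count, the leaf names and the helper functions are all assumptions. Keys hash to a fixed set of virtual nodes, and only the mapping from virtual node to physical leaf changes when a leaf is added.

```python
import zlib

NUM_PARTITIONS = 16  # hypothetical number of virtual nodes, fixed up front


def partition_for(key: str) -> int:
    """Hash a shard key to a virtual node. This never changes as leafs come and go."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def assign(partitions, leaves):
    """Spread virtual nodes across physical leafs round-robin."""
    return {p: leaves[p % len(leaves)] for p in partitions}


def add_leaf(assignment, new_leaf):
    """Move just enough virtual nodes to the new leaf to even out the load."""
    leaves = sorted(set(assignment.values())) + [new_leaf]
    target = len(assignment) // len(leaves)
    result = dict(assignment)
    counts = {leaf: 0 for leaf in leaves}
    for leaf in result.values():
        counts[leaf] += 1
    for p in sorted(result):
        if counts[new_leaf] >= target:
            break
        donor = result[p]
        if counts[donor] > target:  # only take from leafs holding more than their share
            result[p] = new_leaf
            counts[donor] -= 1
            counts[new_leaf] += 1
    return result


before = assign(range(NUM_PARTITIONS), ["leaf-a", "leaf-b", "leaf-c"])
after = add_leaf(before, "leaf-d")
moved = [p for p in before if before[p] != after[p]]
```

Going from three leafs to four here relocates a quarter of the virtual nodes and leaves the rest where they were. With one partition per physical node, the same change would mean rehashing most of the data.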
And which not-easily-parallelized aggregate did MemSQL implement first? The same one Platfora did — COUNT DISTINCT.
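For anyone wondering why COUNT DISTINCT is the awkward one, a small Python sketch shows the problem. The data and names are invented for illustration, and this says nothing about how MemSQL actually implements it. Row counts from each leaf simply add up, but distinct counts do not, because the same value can live on more than one leaf.

```python
# Hypothetical shards: user IDs stored on each of three leafs.
LEAF_USER_IDS = [
    [1, 2, 2, 3],
    [3, 4, 4],
    [1, 5],
]

# COUNT(*) parallelizes trivially: per-leaf counts add up.
total_rows = sum(len(shard) for shard in LEAF_USER_IDS)

# Summing per-leaf COUNT(DISTINCT ...) overcounts values present on several leafs.
naive_distinct = sum(len(set(shard)) for shard in LEAF_USER_IDS)

# An exact answer needs the distinct values themselves merged in one place.
exact_distinct = len(set().union(*(set(shard) for shard in LEAF_USER_IDS)))
```

The naive sum only happens to be right when the table is sharded on the column being counted, since each value then lives on exactly one leaf.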
Comments
6 Responses to “MemSQL scales out”
[…] data” included as well. I hear variants of that positioning from NewSQL vendors (e.g. MemSQL), NoSQL vendors (e.g. AeroSpike), BI stack vendors (e.g. Platfora), application-stack vendors (e.g. […]
I looked into memsql in part based on your post.
I do appreciate hearing about new options or improvements such as you mentioned; MPP was of interest. I have a strong interest in MPP SQL databases of the open source or closed-but-MySQL-compatible variety for scaling out data warehouse ETLs.
But I was a bit disillusioned that they didn’t support more than a 2-way join. (See http://developers.memsql.com/docs/1b/sql/join.html .) For a new project that can be worked around, but for migrating existing work, that’s a pretty big non-starter or important caveat I thought worth mentioning to you or your readers.
Hi Greg,
Sorry you found an outdated version of our documentation. If you go here – http://developers.memsql.com/docs/2.0/sql/join.html – you should find what you need. If you’re interested in banging away on MemSQL, you can download a free trial at http://www.memsql.com/download.
Thanks,
Ryan
Thanks Ryan.
[…] sponsor is MemSQL, one of my numerous clients to have recently adopted some version of a “real-time […]
[…] MemSQL has historically been an in-memory row store, which as of last year scales out. […]