Introduction to MemSQL
I talked with MemSQL shortly before today’s launch. MemSQL technology basics are:
- In-memory relational DBMS.
- Being released single-box only; transparent sharding is under development for release in the fall, as is basic replication.
- Subset of SQL-92.
- MySQL wire-compatible (SQL coverage issues excepted).
MemSQL’s performance claims include:
- Read performance 10% or so worse than memcached.
- Write performance 20% or so better than memcached.
- 1.2 million inserts/second on a 64-core machine with 1/2 TB of RAM.
- Similarly, 1/2 billion records loaded in under 20 minutes (which works out to over 400,000 inserts/second sustained).
MemSQL company basics include:
- 15 employees.
- Some production customers and revenue.
- Going GA on Monday with two versions:
  - Free, limited to 10 gigabytes of RAM but otherwise unrestricted.
  - Enterprise; pricing is based on database size but otherwise remains a mystery.
As you might think, MemSQL’s technical story is focused on performance. Basics there include:
- Data organized via hash tables and skip lists. (I didn’t ask why there were two data structures.) MemSQL thinks skip lists scale very well across multiple cores.
- Query patterns compiled into C++ and stored for re-use.
- MVCC (Multi-Version Concurrency Control), so reads take no locks.
- Lightweight/fairly optimistic write locks.
- Tunable durability — you can run MemSQL fully durable, or you can set a transaction buffer size that caps how much transaction data may be acknowledged before it's persisted to disk.
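To illustrate why skip lists are attractive for an in-memory ordered index, here is a minimal single-threaded skip-list sketch in Python. The class and method names are my own, purely illustrative — this is not MemSQL's code, just the textbook data structure the bullet above refers to:

```python
import random

class Node:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * (level + 1)  # one next-pointer per level

class SkipList:
    MAX_LEVEL = 16
    P = 0.5  # probability of promoting a node to the next level

    def __init__(self):
        self.head = Node(None, self.MAX_LEVEL)
        self.level = 0

    def _random_level(self):
        lvl = 0
        while random.random() < self.P and lvl < self.MAX_LEVEL:
            lvl += 1
        return lvl

    def insert(self, key):
        # Record, at each level, the last node before the insertion point.
        update = [self.head] * (self.MAX_LEVEL + 1)
        node = self.head
        for i in range(self.level, -1, -1):
            while node.forward[i] and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        lvl = self._random_level()
        self.level = max(self.level, lvl)
        new = Node(key, lvl)
        for i in range(lvl + 1):
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new

    def search(self, key):
        node = self.head
        for i in range(self.level, -1, -1):
            while node.forward[i] and node.forward[i].key < key:
                node = node.forward[i]
        node = node.forward[0]
        return node is not None and node.key == key
```

Unlike a balanced tree, inserts touch only a handful of localized pointers and require no rebalancing, which is one commonly cited reason skip lists scale well across cores.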
The idea of compiled queries is hardly a new one, having been raised before by (for example) QlikTech, StreamBase, and ParAccel. Highlights of MemSQL’s version include:
- C++, which you can read if you want to.
- Parameter-aware; if a query comes in with one set of parameters, the compiled plan is stored and re-run with other parameters in the future.
- Persistent — if the machine crashes, the stored compiled queries are still there when it comes back up.
Compiled queries only take up a couple of kilobytes each, and in the early going MemSQL hasn't seen customers with more than a few thousand stored patterns. Hence the MemSQL guys are highly optimistic that storing every query pattern a system has ever compiled will not create any material space crunch, and they use just a simple LRU (Least Recently Used) approach to freeing space in that unlikely eventuality.
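The parameter-aware caching described above can be sketched as follows. This is my own toy model, not MemSQL's implementation: literals are replaced with placeholders so that queries differing only in constants share one "compiled" plan, and an OrderedDict provides the LRU eviction:

```python
import re
from collections import OrderedDict

class QueryPlanCache:
    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.plans = OrderedDict()  # pattern -> compiled plan

    @staticmethod
    def pattern_of(sql):
        # Replace numeric and string literals with a placeholder so that
        # queries differing only in parameters map to one pattern.
        return re.sub(r"\b\d+\b|'[^']*'", "?", sql)

    def get_plan(self, sql):
        pat = self.pattern_of(sql)
        if pat in self.plans:
            self.plans.move_to_end(pat)   # mark as most recently used
            return self.plans[pat], True  # cache hit
        plan = f"compiled<{pat}>"         # stand-in for actual codegen
        self.plans[pat] = plan
        if len(self.plans) > self.capacity:
            self.plans.popitem(last=False)  # evict least recently used
        return plan, False
```

With a few thousand patterns at a couple of kilobytes each, the whole cache stays in the single-digit megabytes, which is why eviction is expected to be rare.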
MemSQL’s persistence story is that you flush write-ahead logs to one set of disks (spinning or solid-state), while sending snapshots (perhaps continually) to other disks. Persistence is designed to be sequential, although I’m not sure whether that remains true in the full-durability case.
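A minimal sketch of the tunable-durability idea — a write-ahead log whose flush behavior is governed by a transaction buffer size — might look like this in Python. The class and parameter names are hypothetical; with `buffer_limit=0` every commit is forced to disk before acknowledgment (the fully durable case), while a larger limit lets that many bytes of acknowledged transactions await the disk:

```python
import os

class WriteAheadLog:
    def __init__(self, path, buffer_limit=0):
        self.f = open(path, "ab")       # sequential append-only log
        self.buffer_limit = buffer_limit
        self.unflushed = 0              # bytes acknowledged but not yet fsync'd

    def commit(self, record: bytes):
        self.f.write(record + b"\n")
        self.unflushed += len(record) + 1
        if self.unflushed > self.buffer_limit:
            self.f.flush()
            os.fsync(self.f.fileno())   # now safely on disk
            self.unflushed = 0
        # acknowledge to the client here -- possibly before the fsync,
        # depending on buffer_limit
```

Because the log is append-only, writes stay sequential either way; the tunable part is only how often the fsync barrier is paid.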
Other random technical notes include:
- MemSQL is working on a multi-temperature option, driven manually by DDL (Data Definition Language). That is, recent data might be in MemSQL, while older data might be in MySQL.
- The MemSQL guys like the idea of what I regard as a best practice, namely committing writes as soon as they’ve been acknowledged in the RAM of two (or suitably many) different machines. But it’s not technology they are developing at this time.
- They’re proud of what they regard as a very fast parallel Group By.
- MemSQL doesn’t compress data, and sees compression as being more important in analytic use cases. (Perhaps not coincidentally, they say that the hardware on which they can do 1.2 million inserts/second is available in hosting for a couple thousand dollars/month.)
- Simple MemSQL inserts take a few hundred CPU instructions each.
- Median insert latency can be below 200 microseconds. Max insert latency can be below 1 millisecond. (Yes, they hope to sell MemSQL to high-frequency traders.)
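The "commit once the write is in the RAM of two machines" best practice mentioned above can be sketched as a simple quorum acknowledgment. This is purely illustrative — the post notes MemSQL was not building this at the time — and all names here are mine:

```python
class Replica:
    def __init__(self):
        self.mem = []  # in-RAM copy; persisted to disk later, asynchronously

    def apply(self, record):
        self.mem.append(record)
        return True  # acknowledge as soon as the record is in RAM

def replicated_commit(record, replicas, quorum=2):
    """Return True once at least `quorum` machines hold the record in RAM.

    The durability argument: losing the write now requires `quorum`
    simultaneous machine failures, without waiting for any disk.
    """
    acks = sum(1 for r in replicas if r.apply(record))
    return acks >= quorum
```

The appeal is latency: RAM-to-RAM replication over a fast network can acknowledge in well under the time of a synchronous disk flush.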
Comments
For an HFT application 1 ms is an eternity :). When someone speaks about RDBMS “insert performance,” one detail which is almost always missing is the presence or absence of table indexes. But I suppose they do not do any indexing except by some primary key (via the skip lists).
Whenever new technologies come about in the data space, thousands of people scream “this will be the death of SQL.” Then SQL vendors evolve and things go back to normal. Mark this as the beginning of the end for the NoSQL movement.
What’s the avg. record size and # of fields/record in the benchmarks mentioned above?
1.2M inserts/sec is not very impressive if records are 10 bytes – and very impressive if they’re (let’s say) 500 bytes. So what’s the story?