Introduction to Tokutek
Tokutek has a paradoxical pitch: Tokutek writes data particularly quickly, and therefore you’re supposed to buy Tokutek for query-oriented uses. Highlights of the Tokutek story include:
- Tokutek is a MySQL storage engine.
- MySQL/Tokutek writes indexed data a lot faster than B-tree-based alternatives. (The claim is 10s of 1000s of rows per second on a single server.)
- MySQL/Tokutek reads data at B-tree speeds. (But not, I presume, at the speed of specialized analytic DBMS.)
- Tokutek is not yet ACID-compliant. They’re working on that, but we don’t know what the performance implications will be when they achieve it. ACID compliance won’t come as soon as the May release (Tokutek Version 2.0).
- Tokutek has made one sale. Others are in the pipeline.
Tokutek’s initial target market is the usual combination of clickstream/personalization/other network management. The idea is that many data warehouse technologies have trouble getting latency below, say, 15 seconds to 5 minutes, at least at very high update volumes. So if immediacy is more important than raw complex query performance, Tokutek’s performance profile could be attractive.
Tokutek’s core technology is something it’s named “fractal tree” indexing. The “fractal” part of the name is based on a kind of “self-similar at different size ranges” flavor to the scheme. But if I understood correctly, it doesn’t sound like much of a tree, in that it always goes root-branch-leaf, rather than ever going root-branch-branch-branch-leaf or whatever.
The basic idea is that (to a first approximation) there’s one index block each of length 1, 2, 4, 8, and so on. With each insert, blocks get emptied and filled, so that each block is either completely empty or completely full. (And each block is in a sorted state as soon as it’s filled.) This results in a huge amount of shuffling in RAM — where the shorter blocks live — but disk only gets touched for occasional orderly pre-sorted merges. Tricks are obviously needed to:
- Make Tokutek lookup speed be order (log(N)) rather than order (log^2(N)) — that’s done by salting each block with information about where values lie in the next one.
- Make Tokutek disk merges operate in background.
- Dump data quickly when it’s time to delete it — I didn’t ask how Tokutek does that.
I have a plane to catch, so I’ll just post this much and stop. More later.
Comments
14 Responses to “Introduction to Tokutek”
Leave a Reply
Curt,
Thanks for the post and also thanks for the quote for our TokuDB press release – it was over the phone so mea culpa if we messed up the wording.
Best,
Tom
Curt, may I rephrase your description slightly? The way I’d put it is: Tokutek’s technology is based on a fundamentally new index structure/algorithm. Many database experts thought that the B-tree was the ultimate best answer, but this new one has a huge advantage over ordinary B-trees: inserts are much, much faster. This is an impressive and fundamental new concept, widely applicable.
In a conventional relational database system, you have a tradeoff. Adding more indexes makes queries (that use whatever columns are indexed) faster, but it makes updates slower because you have to update all those indexes (and write back the pages they’re stored on). Tokutek’s basic idea is that now you’re free to put in as many indexes as you want (hence the reason it’s good for query-orinted uses), because the indexes do not slow down writes the way ordinary B-tree indexes would.
The “fractal tree” algorithm has been published. It’s based on the “Cache Oblivious” concept; see Wikipedia, which in turn points you to a nice explanation by Prof. Eric Demaine of MIT. The “fractal tree” index papers are pointed to from http://supertech.csail.mit.edu/cacheObliviousBTree.html. Note that the name “fractal tree” was introduced by Tokutek and is not used in these papers. They use names like “cache-oblivious search trees”, just so you know that that’t the thing we’re talking about. “Fractal Tree”, in my opinion, is a reasonable name from a technical point of view and better from a marketing point of view. Naturally, Tokutek and the inventors have patents pending. Bradley Kuszmaul of MIT, Michael Bender of Stony Brook U., and Martin Farach-Colton of Rutgers are the inventors.
It seems to me that doing ACID transactions will still incur costs for committing the index changes to disk, if you implement it the most obvious way. Presumably they will implement it more cleverly than the obvious one. This is not discussed in the academic papers. I speculate that this is why they have not yet implemented ACID — it’s not as easy as just following the published methods. But that’s just a speculation.
Dan Weinreb, you’re restating several years worth of justification for non-b-tree approaches to management of data for for analytical use put forward by various academics and software companies, in particular column-store backers. No problem there. I just wanted to point it out.
Is reliance on a “cache-oblivious dynamic search tree as an alternative to the ubiquitious [sic] B-tree” (quote from the paper you cite) meant to appeal to analytics users? B-trees are good for random-access retrieval of relatively small numbers of records and the implication of the term “search tree” is that it would be similarly suited.
If the goal is fast data availability, why not look at a streaming-data engine that caches good volumes of recent data in memory? Or is the advantage here that this software is a MySQL engine?
Curt, are they a client of yours?
Thanks.
Jerome,
Not at this time. But I took pity on them and did a favor anyway.
Seth,
Nothing that comes to mind is faster than a streaming engine integrated with a DBMS. But for use cases where Tokutek does the job, it could be simpler/cheaper than that approach.
Ouch that hurts 🙂
@Seth: Sure, a lot of what I wrote is just background, in order to provide the context to explain about Tokutek’s index. I intended it for readers with less experience than you.
I don’t know whether it’s aimed at analytics users. I have a tendency to think in terms of OLTP, since I am working on an airline reservation system.
As you say, the fact that it has a MySQL front end may mean that they have in mind LAMP-stack web sites, Drupal sites, etc. I don’t know. And also as you say, streaming technology is good for caching recenbt data in memory. I don’t know whether they are aiming at applications with that property.
As Pror. Stonebraker says these days, one size does not fit all. There are many useful architectures now, and for high performance you have to be careful to match the architecture to the use cases you care about.
ObjectStore, which I co-architected, was aimed at C++ programmer who wanted language transparency, sharing, ability to take advantage of powerful client-side computers, operating on data with good spatial and temporal locality, and not doing complex queries. (There were plenty of customers who were happy with that and Object Design was very successful.) That’s another example of a different (in some ways) DBMS architecture.
[…] Calpont plans to offer a MySQL storage engine for analytic database processing. Thus, Calpont will compete with Infobright, Kickfire, and perhaps Tokutek. […]
[…] hand, he praised or at least expressed hope for a variety of MySQL-related technologies, including Tokutek’s TokuDB and Continuent’s […]
[…] same site says Tokutek finally was able to raise some VC. […]
[…] Looking back for a couple of examples even if you’ve never heard of him before, I see that Dan ‘s 2009 comment on Tokutek is still interesting today, and so is a post on his own blog disagreeing with some of my choices in […]
[…] vendors I’ve written about in the past include Akiban, Tokutek, CodeFutures (dbShards), Clustrix, Schooner (Membrain), VoltDB, ScaleBase, and ScaleDB, with […]
[…] all been true since I first wrote about Tokutek and TokuDB in 2009. However, TokuDB’s technical details have changed. In particular, Tokutek has deemphasized […]