Tokutek update
Alternate title: TokuDB updates 🙂
Now that I’ve addressed some new NewSQL entrants, namely NuoDB and GenieDB, it’s time to circle back to some more established ones. First up are my clients at Tokutek, about whom I recently wrote:
Tokutek turns a performance argument into a functionality one. In particular, Tokutek claims that TokuDB does a much better job than alternatives of making it practical for you to update indexes at OLTP speeds. Hence, it claims to do a much better job than alternatives of making it practical for you to write and execute queries that only make sense when indexes (or other analytic performance boosts) are in place.
That’s all been true since I first wrote about Tokutek and TokuDB in 2009. However, TokuDB’s technical details have changed. In particular, Tokutek has deemphasized the ideas that:
- Vaguely justified the “fractal” metaphor, namely …
- … the stuff in that post about having one block of each power-of-2 size, …
- … which seems to be a form of what is more ordinarily called “cache-oblivious” technology.
Rather, Tokutek’s new focus for getting the same benefits is to provide a separate buffer for each node of a b-tree. In essence, Tokutek is taking the usual “big blocks are better” story and extending it to indexes. TokuDB also uses block-level compression. Notes on that include:
- It’s LZMA.
- It’s expensive to write, cheap to read.
- 5X compression is common, 9X happens, and higher figures yet happen in a few edge cases.
- LZMA detects and compresses repeated values, so it has some of the benefits of tokenization.
- However, TokuDB has to decompress data before operating on it.
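Those compression claims are easy to sanity-check with a generic LZMA library. The sketch below (Python's standard `lzma` module, not TokuDB's actual codec or block layout — the row format is made up for illustration) compresses a hypothetical index block full of repeated values, and shows both the tokenization-like benefit and the fact that you must decompress before operating on the data:

```python
import lzma

# Hypothetical index block: many rows sharing heavily repeated values,
# as is common in real index and fact-table data.
rows = [f"status=active,region=us-east,id={i}" for i in range(4096)]
block = "\n".join(rows).encode("utf-8")

compressed = lzma.compress(block)       # the expensive write-side step
restored = lzma.decompress(compressed)  # the cheaper read-side step

# You can only operate on the restored bytes, not the compressed ones.
ratio = len(block) / len(compressed)
print(f"{ratio:.1f}X compression on repetitive data")
```

On data this repetitive the ratio lands well above the 5X commonly cited, which is the point: LZMA's dictionary matching picks up repeated values much the way explicit tokenization would.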
Somewhat like NuoDB, Tokutek talks in terms of sending messages to blocks. The TokuDB durability story involves streaming messages to disk, and also checkpointing all dirty blocks to disk every minute or so. Further, TokuDB has an online schema change approach based on broadcasting messages about various column operations (delete, add with a default value, etc.).
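The buffer-per-node idea can be sketched in a few lines. This is a toy two-level illustration, not TokuDB's actual data structure — class and parameter names are invented: insert messages accumulate cheaply in a node's buffer, and only a full buffer triggers the expensive batched write down to the leaves.

```python
# Toy sketch of a buffered tree node (illustrative only, not TokuDB's code):
# messages accumulate in an in-memory buffer and are flushed to leaves
# in large batches, turning many small random writes into a few big ones.
class BufferedNode:
    def __init__(self, buffer_size, leaves):
        self.buffer = []                # pending insert messages (cheap appends)
        self.buffer_size = buffer_size
        self.leaves = leaves            # leaves[i] holds keys with key % len(leaves) == i
        self.flushes = 0                # number of expensive leaf-write batches

    def insert(self, key, value):
        self.buffer.append((key, value))
        if len(self.buffer) >= self.buffer_size:
            self.flush()

    def flush(self):
        # One batched write pass per full buffer, instead of one write per message.
        for key, value in self.buffer:
            self.leaves[key % len(self.leaves)][key] = value
        self.flushes += 1
        self.buffer.clear()

node = BufferedNode(buffer_size=64, leaves=[{} for _ in range(8)])
for k in range(256):
    node.insert(k, str(k))
# 256 inserts cause only 4 batched flushes rather than 256 individual leaf writes.
```

That amortization is the whole "big blocks are better, extended to indexes" story: index maintenance rides along in buffered messages instead of forcing a random I/O per row.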
Beyond that:
- Like most other RDBMS vendors I talk with, Tokutek goes for MVCC (Multi-Version Concurrency Control), if for no other reason than to obviate a need for read locks.
- TokuDB doesn’t have much in the way of a scale-out story. But as for any other NewSQL vendor of which that’s true — e.g. Akiban — expect that to change. And even if it doesn’t, one could use TokuDB in conjunction with a transparent sharding tool such as dbShards.
- For more technical detail, Tokutek offers a web page with several detailed slide decks and so on.
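The read-lock-avoidance point about MVCC is worth a quick illustration. The sketch below is a minimal, generic MVCC store — not TokuDB's implementation, and the names are invented: each write appends a new version stamped with a transaction id, and a reader simply sees the newest version at or before its snapshot, so it never blocks or is blocked by writers.

```python
# Minimal generic MVCC sketch (illustrative; not TokuDB's implementation):
# writes append versions, reads consult a snapshot -- no read locks needed.
class MVCCStore:
    def __init__(self):
        self.versions = {}     # key -> list of (txn_id, value), oldest first
        self.txn_counter = 0

    def write(self, key, value):
        self.txn_counter += 1
        self.versions.setdefault(key, []).append((self.txn_counter, value))
        return self.txn_counter

    def read(self, key, snapshot_txn):
        # Return the newest value visible as of snapshot_txn, if any.
        visible = [v for t, v in self.versions.get(key, []) if t <= snapshot_txn]
        return visible[-1] if visible else None

store = MVCCStore()
store.write("x", "old")
snapshot = store.txn_counter   # a reader's snapshot taken before the next write
store.write("x", "new")
# The old reader still sees "old"; a fresh reader sees "new".
```

The price, of course, is keeping multiple versions around and eventually garbage-collecting them — the usual MVCC trade-off.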
And finally, Tokutek company basics include:
- 15-16 employees.
- A few more paying customers than those logoed on its website.
- Free customers beyond that. (TokuDB is free under 50 GB.)
- Notwithstanding the meaninglessness of the phrase, “Fractal Tree indexing” is Tokutek’s story and it’s sticking to it.
Comments
7 Responses to “Tokutek update”
It’s also worth noting that TokuDB has horrendous bugs in their mutex code. On a server handling 1,100 clients simultaneously (each doing small batches of upserts), we were seeing MySQL fall over every 90 seconds or so — always in the TokuDB mutex code.
Even with only two connections (one doing reads from one table and upserts into another table, the other loading a mysqldump), we’ve seen the mutex code kill mysqld.
As promising as their performance characteristics are, they’re just way too unstable to rely on right now.
Also, they’ve ditched the free-under-50GiB model now. There’s a brief trial period and then you have to pay, regardless of data volume.
Dear Jon,
I’m sorry that you seem to have hit a bug that affects your ability to use thousands of concurrent connections. Thanks for sharing your feedback so we can resolve any open issues.
In terms of our ability to handle large thread counts, we run TPC-C and Sysbench with up to 1024 connections as part of our ongoing development process, run some tests with 2048 connections, and have not found this particular problem. Our published Sysbench benchmark goes to 1024 connections, and the results can be found on our benchmarks page (http://www.tokutek.com/resources/benchmark-results/benchmarks-vs-innodb-hdds/). Towards the end of the page you’ll find the command line we used, which would allow anyone to reproduce the tests in their own environments.
If you could provide us with a reproducible case that we can use to debug our software, we’ll get to work on it.
In regards to the pricing issue, the shift from capacity based pricing to server based pricing was a business decision we took to better align ourselves with pricing in the MySQL ecosystem. TokuDB continues to be free for development and so far the reaction to the new server based pricing has been great.
Thank you very much for your interest in TokuDB.