March 4, 2015
Quick update on Tachyon
I’m on record as believing that:
- Hadoop needs a memory-centric storage grid.
- Tachyon is a strong candidate to fill the role.
That said:
- It’s an open secret that there will be a Tachyon company. However, …
- … no details have been publicized. Indeed, the open secret itself is still officially secret.
- Tachyon technology, which just hit 0.6 a couple of days ago, still lacks many features I regard as essential.
- As a practical matter, most Tachyon interest to date has been associated with Spark. This makes perfect sense given Tachyon’s origin and initial technical focus.
- Tachyon was in 50 or more sites last year. Most of these sites were probably just experimenting with it. However …
- … there are production Tachyon clusters with >100 nodes.
As a reminder of Tachyon basics:
- You do I/O with Tachyon in memory.
- Tachyon data can optionally be persisted.
- That “tiered storage” capability — including SSDs — was just introduced in 0.6. So in particular …
- … it’s very primitive and limited at the moment.
- I’ve heard it said that Intel was a big contributor to tiered storage/SSD (solid-state drive) support.
- Tachyon has some ability to understand “lineage” in the Spark sense of the term. (In essence, that amounts to knowing what operations created a set of data, and potentially replaying them.)
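To make the lineage idea concrete, here is a minimal sketch in Python. It is not Tachyon's or Spark's API; the class `LineageStore` and its methods are hypothetical names invented for illustration. It records which operation and inputs produced each dataset, and replays them when the in-memory copy is gone.

```python
from typing import Any, Callable, Dict, List, Tuple


class LineageStore:
    """Toy illustration of lineage-based recovery (hypothetical, not Tachyon's API)."""

    def __init__(self) -> None:
        self._durable: Dict[str, Any] = {}  # stand-in for persistent storage
        self._memory: Dict[str, Any] = {}   # volatile in-memory copies
        # dataset name -> (operation, names of the inputs it was applied to)
        self._lineage: Dict[str, Tuple[Callable[..., Any], List[str]]] = {}

    def put_source(self, name: str, value: Any) -> None:
        """Register source data that is assumed to survive memory loss."""
        self._durable[name] = value

    def derive(self, name: str, fn: Callable[..., Any], inputs: List[str]) -> None:
        """Compute a dataset in memory and remember how it was made."""
        self._lineage[name] = (fn, list(inputs))
        self._memory[name] = fn(*[self.get(i) for i in inputs])

    def evict(self, name: str) -> None:
        """Simulate losing the in-memory copy of a derived dataset."""
        self._memory.pop(name, None)

    def get(self, name: str) -> Any:
        """Return a dataset, replaying its lineage if the copy was lost."""
        if name in self._durable:
            return self._durable[name]
        if name not in self._memory:
            fn, inputs = self._lineage[name]
            self._memory[name] = fn(*[self.get(i) for i in inputs])
        return self._memory[name]


store = LineageStore()
store.put_source("raw", [1, 2, 3, 4])
store.derive("squares", lambda xs: [x * x for x in xs], ["raw"])
store.derive("total", lambda xs: sum(xs), ["squares"])

# Lose both derived datasets, then ask for the final one again.
store.evict("squares")
store.evict("total")
recovered = store.get("total")  # recomputed from "raw" via the recorded operations
```

The design choice this illustrates is that only the source data needs to be durable; everything derived from it can be rebuilt on demand, at the cost of recomputation time.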
Beyond that, my impressions are:
- Synchronous write-through from Tachyon to persistent storage is extremely primitive right now — but even so I am told it is being used in production by multiple companies already.
- Asynchronous write-through, relying on lineage tracking to recreate any data that gets lost, is slightly further along.
- One benefit of adding Tachyon to your Spark installation is a reduction in garbage collection issues.
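The difference between the two write-through modes can be sketched as follows. This is a toy model under stated assumptions, not Tachyon's implementation: `WriteThroughCache`, its `sync` flag, and the explicit `flush()` and `crash()` methods are all hypothetical, and a real system would flush in the background rather than on request.

```python
from collections import deque
from typing import Any, Deque, Dict


class WriteThroughCache:
    """Toy model of synchronous vs. asynchronous write-through (hypothetical)."""

    def __init__(self) -> None:
        self.memory: Dict[str, Any] = {}   # fast, volatile tier
        self.backing: Dict[str, Any] = {}  # stand-in for persistent storage
        self._pending: Deque[str] = deque()  # keys not yet persisted

    def write(self, key: str, value: Any, sync: bool = True) -> None:
        """Write to memory; persist immediately (sync) or defer (async)."""
        self.memory[key] = value
        if sync:
            self.backing[key] = value
        else:
            self._pending.append(key)

    def flush(self) -> None:
        """Persist every deferred write that is still held in memory."""
        while self._pending:
            key = self._pending.popleft()
            if key in self.memory:
                self.backing[key] = self.memory[key]

    def crash(self) -> None:
        """Simulate losing the volatile tier before deferred writes land."""
        self.memory.clear()
        self._pending.clear()


cache = WriteThroughCache()
cache.write("a", 1)              # synchronous: durable as soon as write() returns
cache.write("b", 2, sync=False)  # asynchronous: durable only after flush()
cache.crash()                    # "b" never reached the backing store
```

After the simulated crash, `"a"` is still in `cache.backing` while `"b"` is gone from both tiers. That gap is what lineage tracking is meant to cover in the asynchronous case: the lost data would be recreated by replaying the operations that produced it.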
And with that I have little more to say than my bottom lines:
- If you’re writing your own caching layer for some project you should seriously consider adapting Tachyon instead.
- If you’re using Spark you should seriously consider using Tachyon as well.
- I think Tachyon will be a big deal, but it’s far too early to be sure.
Categories: Clustering, Databricks, Spark and BDAS, Hadoop, Memory-centric data management
Comments
3 Responses to “Quick update on Tachyon”
If we look at Tachyon as a caching layer that shares data in memory and then checkpoints it if and when needed, we can assume it will be most valuable when the underlying permanent storage is slow. Assuming this, I would expect immediate value in using Tachyon over cloud storage like S3, which today is much slower than HDFS.
Very nice post.
I think you will see Tachyon, and other companies based on Tachyon, emerge to help cloud/virtualized Hadoop and Spark deployments. BlueData, for one, has already incorporated Tachyon and announced a tech preview.
Check out the video and blog post at
http://www.bluedata.com/2015/02/bluedata-ioboost-tachyon-2/
Could you name some other possible contenders?