May 12, 2010
The Clustrix story
After my recent post, the Clustrix guys raised their hands and briefed me. Takeaways included:
- Nothing in my original short post about Clustrix was actually incorrect.
- Clustrix plans to reveal actual production “name-brand” customers soon.
- The name of Clustrix’s software, or at least the guts thereof, is Sierra.
- Clustrix’s products have actually been in general availability since last quarter, with some versions at customer sites for 2 years. Development started 3½ years ago.
- Clustrix says its technology is for OLTP systems, which it calls “non-batch/non-analytic,” with mixed read/write workloads. All Clustrix’s example target markets are “internet verticals,” such as photo sharing, gaming, social media, e-commerce, etc.
- Clustrix’s heart is in SQL, as is most of its customer base. Clustrix Sierra’s key-value-store option has little or no performance advantage over Clustrix Sierra’s SQL option, nor any other advantage over SQL that came up in discussion.
- Clustrix Sierra is “wire-compatible” with MySQL, but doesn’t use MySQL code; Clustrix wrote all the code itself.
- Clustrix asserts that Clustrix Sierra supports the “vast majority” of MySQL features. Examples of MySQL features Clustrix doesn’t support at this time are full-text search and geospatial indexing.
- Indeed, Clustrix claims Clustrix Sierra can be used to replace MySQL with few or zero changes to existing applications.
- I specifically asked about referential integrity, which has a poor performance reputation in MySQL. Besides saying they supported it, Clustrix said that some customers actually use referential integrity in some of their less active tables.
- Clustrix Sierra is fully ACID-compliant, with no eventual consistency or RYW (read-your-writes) consistency story. The default number of copies of each datum is two, and they’re kept consistent via two-phase commit.
- Clustrix Sierra is fully parallel, with no “head” node. I forgot to ask how it was determined which queries would be addressed to and/or controlled by which nodes, but I presume there’s some sort of a load-balancing scheme.
- Clustrix says that because Clustrix Sierra uses MVCC (Multi-Version Concurrency Control), and thus reads and writes don’t block each other, global locks aren’t a major issue. (They’re rare or short or something – I have trouble seeing why they would be non-existent.)
- Clustrix says there’s a second class of locks and latches that are purely local and short-lived, for B-tree indexes and the like. (I didn’t drill down into those either.) I guess this means Clustrix Sierra is B-tree-centric, which makes sense for an OLTP-oriented system.
- Clustrix Sierra distributes data among nodes via consistent hashing (default), range partitioning, or “full distribution” (i.e., copying a table, presumably a small one, to each node). The choice of distribution plans is manual now; more automation is a future feature.
- Clustrix Sierra’s CBO (Cost-Based Optimizer) is, as one would hope, distribution-aware.
- Clustrix Sierra compiles query fragments and ships them off to the relevant nodes. A fragment might contain both instructions for SQL to be executed locally and for where data is to be sent next.
- Clustrix says that Clustrix Sierra does data migration and redistribution (e.g., when you add a node) transparently online, and further says that in practice this doesn’t cause a performance hit.
- As for Clustrix hardware:
  - Clustrix makes Type I appliances.
  - A Clustrix node contains two quad-core chips, 32 gigs of RAM, and seven 160 GB solid-state drives.
  - Specifically, Clustrix is using Intel SSDs, with a SAS interface.
  - Clustrix says solid-state memory isn’t really essential to the product design; it’s just cheap in terms of $/IOPS (I/O operations per second).
- A minimum Clustrix configuration is 3 nodes, for redundancy. After that you can add nodes one at a time. Clustrix says it built a 20-node system in-house, leading me to suspect that customers don’t have anything bigger than 20 nodes either.
- That 20-node Clustrix system was tested to show near-linear scalability. (In discussing this, Clustrix tends to forget to use the word “near”.)
- Clustrix has partnered with somebody to provide global 4-hour-response support. As of now Clustrix seems to be active mainly in North America and Europe.
- Clustrix is formed from the combination of two startups, which I’ve heard elsewhere were called Clustrix and Sprout. Exactly when the combination happened sounds a little different depending on who’s telling the story (one version has the predecessors still being separate well into 2008, but Clustrix implies the combination happened pretty much on Day 1).
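To make the data-distribution point above concrete, here is a minimal Python sketch of consistent hashing with two copies of each key. It is not Clustrix's code, and the briefing did not describe Sierra's actual hash function or replica placement. `HashRing`, `vnodes`, `copies`, and `owners` are names invented for illustration, and SHA-256 is an arbitrary choice of hash.

```python
import bisect
import hashlib


def _point(label: str) -> int:
    """Map a string to a position on the ring (hash choice is an assumption)."""
    return int.from_bytes(hashlib.sha256(label.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring; not Clustrix's implementation."""

    def __init__(self, nodes, vnodes=64, copies=2):
        self.vnodes = vnodes      # ring points per node, to even out the load
        self.copies = copies      # the post says the default is two copies
        self._nodes = set()
        self._ring = []           # sorted list of (point, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        """Place the node's virtual points on the ring."""
        self._nodes.add(node)
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_point(f"{node}#{i}"), node))

    def owners(self, key):
        """Return the distinct nodes holding the key, primary first."""
        want = min(self.copies, len(self._nodes))
        i = bisect.bisect_left(self._ring, (_point(key), ""))
        found = []
        while len(found) < want:
            node = self._ring[i % len(self._ring)][1]  # walk clockwise, wrapping
            if node not in found:
                found.append(node)
            i += 1
        return found


if __name__ == "__main__":
    ring = HashRing(["node1", "node2", "node3"])
    print(ring.owners("user:42"))   # two distinct nodes
    ring.add_node("node4")
    print(ring.owners("user:42"))   # primary is unchanged or is now node4
```

The property that matters for adding nodes one at a time is that a new node only takes over the keys that now hash to its own ring points. In this sketch, a key's primary copy either stays where it was or moves to the new node, so most data stays put.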
Categories: Application areas, Clustrix, Emulation, transparency, portability, Games and virtual worlds, MySQL, NoSQL, OLTP, Parallelization, Solid-state memory
Comments
8 Responses to “The Clustrix story”
Hi Curt,
Thanks for taking the time for a quick briefing about Clustrix’s Clustered Database System.
Overall you did a very good job capturing aspects of our solution. I recognize it was not much time, and we appreciate your attention.
Here is clarification and/or additional information on a few of the points from your latest post about Clustrix:
1) “Clustrix’s heart is in SQL…”
• These are not separate options, since the heart of our system handles both in the same fashion. You get all the performance and scalability of key-value store without giving up full SQL relational and ACID transactional functionality.
2) “Clustrix Sierra is fully parallel, with no “head” node…”
• All nodes are peers; they all take reads and writes, and they are all seen as a single database.
• Access to the cluster can be through our virtual IP address (VIP), an external load balancer, or directly to any one of the nodes. If our VIP is used, it will round-robin the connections to the cluster and will intelligently route around down nodes.
3) “Clustrix is formed from the…”
• Sergei and Paul met in August of 2006, joined forces, and formally formed this company in October of 2006. They closed a major funding round in December 2006. Prior to this, they had been exploring related but separate companies.
I hope that you find this helpful. We are happy to chat any time.
Regards,
-Dan Liddle
Thanks, Dan! You did exactly what I want vendors to do after I post about them. 🙂
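For readers wondering what the VIP behavior Dan describes might look like (round-robin connections that route around down nodes), here is a minimal Python sketch. It is an illustration only, not Clustrix's code. `RoundRobinRouter`, `mark_down`, `mark_up`, and `pick` are invented names, and how a real cluster detects a down node is not covered in the briefing or the comment.

```python
class RoundRobinRouter:
    """Hypothetical connection router: round-robin over peers, skipping down nodes."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.down = set()     # nodes currently considered unavailable
        self._next = 0        # index of the next node to try

    def mark_down(self, node):
        self.down.add(node)

    def mark_up(self, node):
        self.down.discard(node)

    def pick(self):
        """Return the next healthy node, trying each node at most once."""
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next]
            self._next = (self._next + 1) % len(self.nodes)
            if node not in self.down:
                return node
        raise RuntimeError("no nodes available")


if __name__ == "__main__":
    router = RoundRobinRouter(["node1", "node2", "node3"])
    router.mark_down("node2")
    print([router.pick() for _ in range(4)])  # node2 never appears
```

Because every node is a peer that accepts reads and writes, the router needs no notion of a primary; any healthy node is a valid target for a new connection.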
Curt,
As Yogi Berra would put it, “Deja vu all over again”. Clustrix is merely Clustra Systems Redux. Even a quick scan of your report and the web site suggests the technology is amazingly similar to what we developed over 10 years ago. See http://www.theregister.co.uk/2002/03/19/sun_saves_clustra_from_enemy/. I hope the market is kinder to Clustrix. The challenge is not technology but the market demand. We discovered that the vast majority (in excess of 95% is my “guesstimate”) of SQL database requirements can be met by bog-standard RDBMS products. Very few need the kind of dynamic scalability and real-time repair that we delivered with the 5-9’s Clustra database. And now Oracle owns the code base for a production-quality, fault-tolerant, clustered SQL database. Interesting world. Timing is everything…
Regards,
Gary Ebersole (former CEO, Clustra Systems Inc.)
CEO, veloGraf Systems Inc.
[…] Clustrix cites a figure close to that. […]
Hi, I’ll go Clustra one better.
Tandem Computers had distributed, ACID, transactional SQL in the eighties already, drawing on talent like Jim Gray, the late Turing Award winner out of UC Berkeley. SQL users on Tandem have benefitted from pushing distributed queries down to the data for over a quarter century now, which is why Tandem is used as a data warehousing machine although it was originally designed for massive, nonstop, fault-tolerant ACID online transaction processing.
None of which detracts from the virtues of Clustra and Clustrix, however. I just wanted to provide some background, remembering my salad days when I thought the Rolling Stones were so incredibly brilliant simply because I was unaware of the great American Blues tradition they drew from.
[…] Clustrix says it has a few production users, some big-name, but is not disclosing them yet. […]
Tandem’s SQL product is now sold by Hewlett Packard. It is called NonStop SQL/MX and runs on HP’s proprietary NonStop servers. HP acquired this via its acquisition of Compaq. The only differences I see between SQL/MX and Clustrix are the use of MVCC for concurrency and the machine-level code in query plan fragments.
HP’s product page for SQL/MX is here:
http://h20223.www2.hp.com/nonstopcomputing/cache/81318-0-0-0-121.html
[…] feels like time to write about Clustrix, which I last covered in detail in May, 2010, and which is releasing Clustrix 4.0 today. Clustrix and Clustrix 4.0 basics […]