Project Cassandra — Facebook’s open sourced quasi-DBMS
Edit: I posted much fresher information about Cassandra in July, 2010.
Facebook has open-sourced Project Cassandra, an imitation of Google’s BigTable. Actual public information about Facebook’s Cassandra seems to reside in a few links that may be found on the Cassandra Project’s Google Code page. All the discussion I’ve seen seems to be based solely on some slides from a SIGMOD presentation. In particular, Dare Obasanjo offers an excellent overview of Cassandra. To wit:
The entire system is a giant table with lots of rows. Each row is identified by a unique key. Each row has a column family, which can be thought of as the schema for the row. A column family can contain thousands of columns which are a tuple of {name, value, timestamp} and/or super columns which are a tuple of {name, column+} where column+ means one or more columns. …
… Cassandra has several optimizations to make writes cheaper. When a write operation occurs, it doesn’t immediately cause a write to the disk. Instead the record is updated in memory and the write operation is added to the commit log. … Additionally, the writes are sorted so that the disk is written to sequentially thus significantly improving seek time on the hard drive and reducing the impact of random writes to the system. …
Cassandra is described as “always writable” which means that a write operation always returns success even if it fails internally to the system. This is similar to the model exposed by Amazon’s Dynamo which has an eventual consistency model. From what I’ve read, it isn’t clear how write operations that occur during an internal failure are reconciled and exposed to users of the system.
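To make the description above a bit more concrete, here is a minimal Python sketch of the data model and write path as Dare describes them. It is an illustration only, not Cassandra's actual code or API: the names Column, SuperColumn and ToyNode are hypothetical, the "commit log" is just an in-memory list standing in for an on-disk file, and the last-write-wins rule based on timestamps is my assumption, since the quoted material doesn't say how conflicting writes are reconciled.

```python
import time
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Column:
    """A column as described above: a {name, value, timestamp} tuple."""
    name: str
    value: bytes
    timestamp: float


@dataclass
class SuperColumn:
    """A super column: a name plus one or more columns."""
    name: str
    columns: Dict[str, Column] = field(default_factory=dict)


class ToyNode:
    """Hypothetical single node: commit log, in-memory table, sorted flush."""

    def __init__(self):
        self.commit_log = []  # append-only list (stand-in for an on-disk log)
        self.memtable = {}    # (row_key, column_family, column_name) -> Column
        self.flushed = []     # each flush produces one sorted batch

    def write(self, row_key, family, name, value, timestamp=None):
        ts = time.time() if timestamp is None else timestamp
        col = Column(name, value, ts)
        # Record the operation in the log first; no random disk write happens.
        self.commit_log.append((row_key, family, col))
        key = (row_key, family, name)
        current = self.memtable.get(key)
        # Assumption: the newest timestamp wins when two writes collide.
        if current is None or col.timestamp >= current.timestamp:
            self.memtable[key] = col

    def flush(self):
        # Sort before "writing" so the batch could go to disk sequentially.
        batch = sorted(self.memtable.items(), key=lambda kv: kv[0])
        self.flushed.append(batch)
        self.memtable = {}
        self.commit_log.clear()  # assumption: the log is no longer needed
        return batch


node = ToyNode()
node.write("user42", "inbox", "msg2", b"hello", timestamp=2.0)
node.write("user42", "inbox", "msg1", b"hi", timestamp=1.0)
node.write("user42", "inbox", "msg2", b"stale", timestamp=1.5)  # older, ignored
batch = node.flush()  # entries come back ordered by (row, family, column)
```

The point of the sketch is only the shape of the tradeoff: a write costs an append plus a dictionary update, and the expensive disk work is deferred to a single sorted pass.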
Ocstatic has a shorter post about Cassandra, the meat of which is:
Relational database purists may feel queasy at some of the tradeoffs that this design involves – such as the loss of atomicity and the fact that consistency between cluster members is statistical rather than deterministic. But it’s hard to argue with success: Facebook has used Cassandra to scale out a tremendous amount of data without apparent major issues.
To a first approximation, it’s pretty obvious what’s going on: the usual tradeoff of achieving web megascalability at the expense of traditional RDBMS’ flexibility and guaranteed data integrity. But beyond that, I’m confused. For example, Slide 17 offers some performance benchmarks, and the queries are text search. Huh? Unless I’m missing something, that doesn’t seem like a natural fit for this data model. And Slide 14 looks to me as if the “for any” in the fourth bullet point is a typo for “there exists a.”
Maybe things would be clearer if one read either the Google Groups linked on the project page, or the actual code. But I’ve done neither …
Comments
11 Responses to “Project Cassandra — Facebook’s open sourced quasi-DBMS”
[…] in 1987 by Jim Gray. And finally, before we dive into the specific server news, here is a post on Facebook’s project to build a distributed database similar to Google’s […]
[…] Open source projects have already formed around this topic: Cassandra from Facebook, Apache HBase, CouchDB, Hadoop, Memcached, Tokyo Cabinet, MongoDB, and LinkedIn hosts […]
[…] however, it’s true that Cassandra inventor Facebook has stopped working on Cassandra, and Facebook’s core Cassandra developers have shifted over […]
[…] Cassandra was originally developed and revealed at Facebook, to much early NoSQL fanfare. Facebook later backed away from Cassandra use. […]
[…] revolve around geo-distribution. Netflix, probably the flagship Cassandra user — since Cassandra inventor Facebook adopted HBase instead — actually hasn’t been using the geo-distribution feature. […]
[…] that kind of thing is not necessarily a death knell. Cassandra inventor Facebook soon replaced Cassandra with HBase, yet Cassandra is doing just […]