DataStax/Cassandra update
Cassandra’s reputation in many quarters is:
- World-leading in the geo-distribution feature.
- Impressively scalable.
- Hard to use.
This has led competitors to use, and get away with, sales claims along the lines of “Well, if you really need geo-distribution and can’t wait for us to catch up — which we soon will! — you should use Cassandra. But otherwise, there are better choices.”
My friends at DataStax, naturally, don’t think that’s quite fair. And so I invited them — specifically Billy Bosworth and Patrick McFadin — to educate me. Here are some highlights of that exercise.
DataStax and Cassandra have some very impressive accounts, which don’t necessarily revolve around geo-distribution. Netflix, probably the flagship Cassandra user — since Cassandra inventor Facebook adopted HBase instead — actually hasn’t been using the geo-distribution feature. Confidential accounts include:
- A petabyte or so of data at a very prominent company, geo-distributed, with 800+ nodes, in a kind of block storage use case.
- A messaging application at a very prominent company, anticipated to grow to multiple data centers and a petabyte or so of data, across 1000s of nodes.
- A 300 terabyte single-data-center telecom account (which I can’t find on DataStax’s extensive customer list).
- A huge health records deal.
- A Fortune 10 company.
DataStax and Cassandra won’t necessarily win customer-brag wars versus MongoDB, Couchbase, or even HBase, but at least they’re strongly in the competition.
DataStax claims that simplicity is now a strength. There are two main parts to that surprising assertion.
- DataStax claims that operation is simple, that operators are “bored”, that large users appreciate the ease of operation, and so on. These claims become a lot more plausible if you recall:
  - Cassandra isn’t used for databases that resemble relational schemas with 1000s of tables, lots of foreign keys, and so on.
  - Performance and capacity problems in Cassandra don’t necessarily require sophisticated operational solutions; you can throw hardware at them instead.
- DataStax claims that CQL (Cassandra Query Language) makes Cassandra programming and data modeling much easier than they were before. More on that below.
DataStax claims that Cassandra excels at time series use cases, where “time series” seem to equate to collections of short records with timestamps. This seems borne out by, for example, the first three use cases on my bulleted list above. Actually, it’s not just timestamps, but rather any data that is naturally ordered by a sequential field, such as packet IDs from a packet-switching network.
Finally, DataStax claims that Cassandra is good for high-velocity applications in general. A generic example that DataStax supported with some Very Big Names — whether those were of customers or prospects wasn’t entirely clear — was in retailing, to actually serve accurate information as to whether inventory is in stock, something Walmart failed at as recently as last year.
Now let’s talk a bit about Cassandra technology. I’ll start with an example. Imagine a “phone-home” use case in which many devices emit many records each in the form of (DeviceID, TimeStamp, MeterReading) triples.
- A relational database would store that as a bunch of rows, 3 columns wide.
- A Cassandra database, however, would have a single row for each DeviceID; each row would contain two columns for each (TimeStamp, MeterReading) pair.
- The column names are composite, in a way that shows the different column pairs are each recording the same kind of thing.
- Cassandra Query Language (CQL) lets you query (or insert) as if the data were in the relational-table logical format. But of course you can also reference Cassandra in a way that takes its actual (row, column) structure at face value.
So in essence, you have schemas that at once are dynamic and tabular. The big downside vs. a relational DBMS is that — duh! — you can’t have the benefits of normalization.
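To make this concrete, here is a minimal sketch of the phone-home model in CQL, issued through the DataStax Python driver. The keyspace, table, and column names (telemetry, readings, device_id, ts, meter_reading) are illustrative stand-ins I made up, not anything DataStax showed me, and the snippet assumes a local Cassandra 2.x node.

```python
# Minimal sketch of the phone-home model in CQL via the DataStax Python
# driver (pip install cassandra-driver). All names here are illustrative.
from datetime import datetime

from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])   # assumes a local Cassandra node
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS telemetry
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.set_keyspace('telemetry')

# Logically a 3-column relational table; physically, one wide row per
# device_id, with the clustering column ts forming the composite column
# names described above.
session.execute("""
    CREATE TABLE IF NOT EXISTS readings (
        device_id     text,
        ts            timestamp,
        meter_reading double,
        PRIMARY KEY (device_id, ts)
    ) WITH CLUSTERING ORDER BY (ts DESC)
""")

# Inserts and queries are phrased against the relational-looking schema.
session.execute(
    "INSERT INTO readings (device_id, ts, meter_reading) VALUES (%s, %s, %s)",
    ('device-42', datetime(2013, 12, 1, 0, 0), 17.3))

# "Last 10 events" for one device is a cheap slice of a single sorted row.
for row in session.execute(
        "SELECT ts, meter_reading FROM readings WHERE device_id = %s LIMIT 10",
        ('device-42',)):
    print(row.ts, row.meter_reading)
```

The first component of the PRIMARY KEY (device_id) is the partition key, which maps to the single wide row per device; ts is the clustering column whose values become the composite column names, and CLUSTERING ORDER BY controls the ascending-or-descending sort discussed next.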
For clarity, I should note that much of Cassandra’s logical architecture is shared by fellow BigTable-architecture data store HBase; it’s not a coincidence that Facebook invented Cassandra to support messaging, nor that when Facebook changed its mind about that, it adopted HBase as the alternative. Accumulo has similar characteristics as well.
Physically, what’s going on in Cassandra is something like this:
- Each Cassandra row is maintained in memory, and in most cases sorted on timestamp (or some other comparator), in ascending or descending order. This is the basis for the claims of great Cassandra performance and general suitability specifically in time series use cases. (E.g., “Last 10 events” kinds of reads are very easy.)
- Once rows are flushed to disk, they are immutable … except that of course they eventually are compacted, typically via a merge sort. (When you do need to do a database update, last write wins.)
- Rows are organized into files on disk. There’s a “key cache” that in many cases will tell you exactly which file contains the row you’re looking for. If you have a cache miss …
- … each file has a Bloom filter predicting which keys it contains, and you interrogate those. Those Bloom filters are also maintained in memory (and copied to disk just for the sake of persistence). A toy sketch of this lookup path follows below.
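To make the read path a bit more concrete, here is a toy Python sketch of the key-cache-then-Bloom-filter lookup described above. It illustrates the general idea only; it is not Cassandra’s actual code, and all class and variable names are invented for the example.

```python
# Toy illustration of the read path sketched above: check a key cache first,
# then consult per-file Bloom filters. Invented names; not Cassandra's code.
import hashlib


class BloomFilter:
    """A minimal Bloom filter: no false negatives, occasional false positives."""
    def __init__(self, size_bits=8192, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.md5(("%d:%s" % (i, key)).encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def find_row(key, key_cache, files):
    """key_cache maps key -> file index; files is a list of
    (bloom_filter, rows_dict) pairs standing in for on-disk files."""
    # 1. A key-cache hit tells us exactly which file to read.
    if key in key_cache:
        _, rows = files[key_cache[key]]
        return rows.get(key)
    # 2. Otherwise interrogate each file's Bloom filter; only read files
    #    whose filter says the key *might* be there.
    for bloom, rows in files:
        if bloom.might_contain(key) and key in rows:
            return rows[key]
    return None
```

The point is simply that a key-cache hit skips the Bloom filters entirely, and the Bloom filters keep you from reading files that cannot possibly contain the key.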
Cassandra has few indexes, and no physical concept of datatype.
The benefits I see to this physical architecture are mainly:
- Plays nicely with Cassandra’s logical architecture.
- Plays nicely with scale-out.
- Seems to have been designed RAM-first, which matches how databases are actually used.
- Is fast for range queries on the comparator (e.g. timestamp).
- Doesn’t have a lot of knobs to twiddle, which makes it plausible that a relatively immature product can be easy to administer.
For some use cases, that’s not a bad list of advantages. Not bad at all.
Related links
- I covered some real basics in a Cassandra technical overview 3 1/2 years ago.
- WibiData Kiji’s most fundamental goal — there are others too — is to tame HBase data modeling much as CQL tames Cassandra’s.
Comments
Hi Curt,
I just wanted to provide some clarification on the geo-replication feature usage at Netflix. They have been replicating, just not in the active-active configuration that Cassandra makes possible. They have just recently announced this project, which you can read more about here: http://techblog.netflix.com/2013/12/active-active-for-multi-regional.html
Active-active architecture isn’t an easy thing to pull off and the engineers there have done an amazing job of creating the tools and services to make it work. The harder part of the infrastructure is data replication, which Cassandra makes much easier, and has allowed them to focus on other key areas.
Thanks!
Patrick
Not only does CQL improve usability, it highlights the simplicity of the protocol used by RDBMSs: string-based request and tabular response. The server is free to add language features without client-library compatibility issues with each release.
A nice feature of Cassandra over HBase is that it is simpler to start small and grow. Cassandra runs in a single JVM (per node of course) and is easy to stand up. HBase depends on the Hadoop file system, which is a bit heavier to get going with and can make issues harder to diagnose.
Hi,
“DataStax claims that Cassandra is good for high-velocity applications”
What do you (or they) mean by “high-velocity”? The sentence following the quote could imply fast in terms of performance, or fast in terms of reaching eventual consistency. The term itself could also mean that Cassandra fits well in agile environments.
Which is true?
Thanks 🙂
Thomas
Thomas,
I try to use the term “velocity” to refer to updates per unit of time. I’m vague as to the metric (total data volume, number of operations, whatever).
“A generic example that DataStax supported with some Very Big Names — whether those were of customers or prospects wasn’t entirely clear — was in retailing, to actually serve accurate information as to whether inventory is in stock, something Walmart failed at as recently as last year.”
To actually serve accurate information, writes must be consistent across the cluster (in Cassandra lingo, consistency level ALL). It would be interesting to compare Cassandra with consistency level ALL and HBase head to head. I bet HBase would have a substantial advantage.
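For concreteness, here is a minimal sketch of how a per-request consistency level is set with the DataStax Python driver; the keyspace, table, and column names are invented for the illustration.

```python
# Minimal sketch of setting a per-request consistency level with the
# DataStax Python driver; "retail", "inventory", etc. are made-up names.
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('retail')   # assumes a keyspace named "retail"

# Every replica must acknowledge the write before it is considered successful.
write = SimpleStatement(
    "UPDATE inventory SET in_stock = %s WHERE sku = %s",
    consistency_level=ConsistencyLevel.ALL)
session.execute(write, (False, 'SKU-12345'))
```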
One more comment:
I think it’s HBase Phoenix, not WibiData Kiji, that can be compared to Cassandra’s CQL. Phoenix is a SQL front end to HBase.
https://github.com/forcedotcom/phoenix
As for traditional Cassandra vs. HBase, HBase is more feature-rich (coprocessors and filters, for example). For some applications this can be critical (cell-level security, custom aggregates). Application logic can be pushed down to region servers (RS) to reduce network usage and query latencies.
Is Phoenix a salesforce.com project?
Hi Curt,
regarding “phone-home” devices, you’re right about the pattern: each reading takes the form (timestamp, value1, value2, …). This can be stored efficiently if you store it as an array. The number of rows remains static and the table grows in the third dimension.
If you take the next step, for devices that are expected to phone home at a regular interval (every 30 mins or hour, etc.), you can remove the timestamp altogether and simply store the value to save additional space. Given a time, you can get its value simply by its logical offset. This is common in smart meters. Databases do have to deal with the interpolation issue of filling in missing values. When the “phone home” timestamp is non-regular, it is more efficient to store the timestamp with the data.
Once you store this as an “array”, the question of efficient traversal, insertion, etc. comes into the picture. The data does not always get loaded in increasing timestamp order.
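To illustrate the logical-offset idea, here is a toy Python sketch; the interval, start time, and values are invented for the example.

```python
# Toy illustration of addressing regular-interval readings by logical offset
# instead of storing a timestamp per value. Names and values are invented.
from datetime import datetime, timedelta

SERIES_START = datetime(2013, 1, 1)      # assumed origin of the series
INTERVAL = timedelta(minutes=30)         # assumed reporting interval

readings = [17.3, 17.1, None, 16.8]      # None marks a missed report


def reading_at(ts):
    """Look up a value by time; the timestamp is implicit in the position."""
    offset = int((ts - SERIES_START) // INTERVAL)
    value = readings[offset]
    if value is None:
        # Missing value: a real system would interpolate or carry forward here.
        return None
    return value


print(reading_at(datetime(2013, 1, 1, 1, 0)))   # offset 2 -> missing (None)
print(reading_at(datetime(2013, 1, 1, 1, 30)))  # offset 3 -> 16.8
```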
For the curious, Informix has a specialized timeseries type, access method, language and fully relational exposure to the data. See some impressive benchmarks at: http://www-01.ibm.com/software/data/informix/smart-meter/
Interestingly, timeseries isn’t just applicable to high volume/velocity situations… It’s also important for handling sensor data at the edge of the IoT. Shaspa is using a timeseries approach to sensor data management: http://www.tatung.com/en/news2013_09_09.asp
Yep, it’s salesforce.com.
Expedia moved its main travel sites to Cassandra
http://www.slideshare.net/clibou/seattle-scalability-meetup-10505322