August 7, 2016

Notes on DataStax and Cassandra

I visited DataStax on my recent trip. That was a tipping point leading to my recent discussions of NoSQL DBAs and misplaced fear of vendor lock-in. But of course I also learned some things about DataStax and Cassandra themselves.

On the customer side:

DataStax customers still overwhelmingly use Cassandra for internet back-ends — web, mobile or otherwise as the case might be.
This includes — and “includes” might be understating the point — traditional enterprises worried about competition from internet-only ventures.

Customers in large numbers want cloud capabilities, as a potential future if not a current need.

One customer example was a large retailer, which in the past was awful at providing accurate inventory information online, but now uses Cassandra for that. DataStax brags that the retailer's queries come back in 20 milliseconds, but that strikes me as a bit beside the point; what really matters is that data accuracy has gone from “batch” to some version of real-time. Also, Microsoft is a DataStax customer, using Cassandra (and Spark) for the Office 365 backend, or at least for the associated analytics.

Per Patrick McFadin, the four biggest things in DataStax Enterprise 5 are:

Graph capabilities.
Cassandra 3.0, which includes a complete storage engine rewrite.
Tiered storage/ILM (Information Lifecycle Management).
Policy-based replication.

Some of that terminology is mine, but perhaps my clients at DataStax will adopt it too. 🙂

We didn’t go into as much technical detail as I ordinarily might, but a few notes on that tiered storage/ILM bit are:

It’s a way to have some storage that’s more expensive (e.g. flash) and some that’s cheaper (e.g. spinning disk). Duh.
Since Cassandra has a strong time-series orientation, it’s easy to imagine how those policies might be specified.
Technologically, this is tightly integrated with Cassandra’s compaction strategy.
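
To make the time-series point concrete, here is a minimal sketch of what an age-based tiering policy could look like. It is purely illustrative: the tier names, the 7-day cutoff and the `pick_tier` function are my own assumptions, not DataStax's configuration syntax or API, and the sketch ignores the compaction integration entirely.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Tier:
    name: str            # hypothetical tier label, e.g. "flash"
    max_age: timedelta   # data at most this old may live on the tier


# Hypothetical policy, ordered from most expensive to cheapest storage.
POLICY = [
    Tier("flash", timedelta(days=7)),
    Tier("spinning_disk", timedelta.max),
]


def pick_tier(written_at: datetime, now: datetime, policy=POLICY) -> str:
    """Return the first tier whose age limit covers the data's age."""
    age = now - written_at
    for tier in policy:
        if age <= tier.max_age:
            return tier.name
    # Fall back to the cheapest tier if nothing matched.
    return policy[-1].name


now = datetime(2016, 8, 7, tzinfo=timezone.utc)
print(pick_tier(now - timedelta(days=1), now))
print(pick_tier(now - timedelta(days=30), now))
```

With time-stamped data the policy reduces to a single comparison per piece of data, which is presumably why it fits naturally into a step that already rewrites data files, such as compaction.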

DataStax Enterprise 5 also introduced policy-based replication features, not all of which are in open source Cassandra. Data sovereignty/geo-compliance is improved, which is of particular importance in financial services. There’s also hub/spoke replication now, which seems to be of particular value in intermittently-connected use cases. DataStax said the motivating use case in that area was oilfield operations, where presumably there are Cassandra-capable servers at all ends of the wide-area network.
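
For context on the data sovereignty point, open source Cassandra already lets you pin a keyspace's replicas to named data centers via `NetworkTopologyStrategy`. The sketch below only builds that baseline CQL statement; it is not the DataStax Enterprise 5 policy-based replication feature, and the keyspace name, data center names and the `keyspace_cql` helper are hypothetical.

```python
def keyspace_cql(keyspace: str, replicas_per_dc: dict) -> str:
    """Build a CREATE KEYSPACE statement that places replicas only in
    the listed data centers (names here are illustrative assumptions)."""
    if not replicas_per_dc:
        raise ValueError("at least one data center is required")
    # Sort for a stable, reproducible statement.
    opts = ", ".join(
        f"'{dc}': {count}" for dc, count in sorted(replicas_per_dc.items())
    )
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {opts}}};"
    )


# Hypothetical example: keep EU account data in two EU data centers only.
print(keyspace_cql("eu_accounts", {"dc_frankfurt": 3, "dc_dublin": 2}))
```

Data centers that are not listed get no replicas of the keyspace, which is the basic mechanism a geo-compliance policy would build on.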

Related links

I wrote in detail about Cassandra architecture in December, 2013.
I wrote about intermittently-connected data management via the relational gold standard SQL Anywhere in July, 2010.

Categories: Cassandra, DataStax, Microsoft and SQL*Server, NoSQL, Specific users


Comments

Crayon Shin-chan on August 15th, 2016 6:28 pm

“Also, Microsoft is a DataStax customer, ”

I remember Microsoft used to run their SAP on Oracle, I wonder if they still do.
