Introduction to Cloudant
Cloudant is one of the few NoSQL companies with >100 paying subscription customers. For starters:
- Cloudant’s core software is a fork of CouchDB.
- Cloudant only sells you software as a service.
- More precisely, whether Cloudant offers DBaaS (DataBase as a Service) or PaaS (Platform as a Service) or a “data layer” (Cloudant’s preferred terminology) depends on your taste in buzzwords.
- I gather that Cloudant (the company) wants to handle pretty much all your data management needs. But Cloudant (the product) isn’t there yet, especially on the analytic side.
- Before CouchOne (the CouchDB company) and Membase merged to form Couchbase, Cloudant was positioned as the big(ger) data version of CouchDB.
Company demographics include:
- Cloudant is based in Boston.
- Cloudant started out as a Y Combinator company in 2008, and “got serious” in 2009.
- Cloudant now has ~20 employees.
- Management hires include a couple of former Vertica guys.
The Cloudant guys gave me some customer counts in May that weren’t much higher than those they gave me in February, and seem to have forgotten to correct the discrepancy. Oh well. The latter (probably understated) figures included ~160 paying customers, of which:
- ~100 were multitenant.
- ~60 were single tenant.
- 1 was on-premises (but still managed by Cloudant) because of privacy concerns.
The largest Cloudant deployments seem to be in the 10s of terabytes, across a very low double digit number of servers.
The difference between single- and multi-tenant Cloudant is:
- Just as you would think, single-tenant customers get their own sets of servers, while multi-tenant customers share servers with others.
- There’s a fixed pricing scheme for multi-tenant customers, while single-tenant pricing is “let’s make a deal”.
Monthly costs (in dollars) for multi-tenant customers are typically 1-3 digits; for single-tenant they’re typically 4-5.
Despite only being available as a service, Cloudant has a free option too. It has >7000 total sign-ups. 2/3 of sign-ups wind up at least creating a database. But Cloudant doesn’t have figures available for production (as opposed to development-only) use on the free side.
Cloudant has some big-name customers, both among traditional enterprises and internet companies. Two of the flashier ones are:
- OMGPOP used Cloudant to build a new subsystem rather than continuing entirely with Membase/Couchbase. However, OMGPOP was acquired by flagship Membase user Zynga, so that relationship is expiring, leaving behind a glowing quote to remember it by.
- Monsanto is using Cloudant to manage genomic data (and hence is a non-internet user).
Cloudant says that CouchDB users used to constitute 100% of its pipeline, and still make up a (shrinking) majority.
There’s been some recent drama in the CouchDB world. Couchbase (the company) ran into delays merging CouchDB into Couchbase — often because of performance challenges — and no longer positions Couchbase as a straightforward scale-out enhancement to CouchDB. Realistically, if you like CouchDB but just wish it would scale out, you should still talk to both Couchbase and Cloudant, but it’s no longer the case that Couchbase is the obvious leader of the CouchDB community.
So how do you get at data in Cloudant? The basics seem to be:
- CouchDB and Cloudant are JSON-based document-oriented NoSQL stores.
- Cloudant’s core indexing system is an append-only b-tree. Supplementary approaches are being researched.
- Actually, there are at least two b-trees, one for document_ID and one for time of last update (not original document creation). The latter index is to support a kind of incremental MapReduce, which is used to, among other things:
  - Create secondary indexes (and do so without blocking writes).
  - Build simple aggregates.
- There’s full-text search based on Lucene libraries (but not the Lucene indexer).
- You can replicate from Cloudant to CouchDB, which seems to be the main way to replicate to the outside world.
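To make the replication point concrete, here is a minimal sketch of the request that asks a CouchDB server to pull a database from a Cloudant account. It assumes the standard CouchDB `_replicate` endpoint with its `source`, `target` and `continuous` fields. The helper name and both URLs are hypothetical placeholders, and nothing here says this is how Cloudant recommends setting it up.

```python
import json


def replication_request(source_url, target_url, continuous=False):
    """Build the path and JSON body for a CouchDB replication request.

    The caller would POST the body to <couchdb-server>/_replicate with a
    Content-Type of application/json. No network call is made here.
    """
    body = {"source": source_url, "target": target_url}
    if continuous:
        # Keep replicating as new changes arrive instead of stopping
        # once the target has caught up.
        body["continuous"] = True
    return "/_replicate", json.dumps(body)


# Hypothetical URLs, for illustration only.
path, payload = replication_request(
    "https://cloudant-account.example.com/orders",
    "http://couchdb.example.com:5984/orders",
    continuous=True,
)
print(path)
print(payload)
```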
The essence of Cloudant’s incremental MapReduce seems to be that data is selected only if it’s been updated since the last run. Obviously, this only works for MapReduce algorithms that can be run on different subsets of the target data set, with the partial outputs then aggregated in a simple way.
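The idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Cloudant's implementation: plain dictionaries stand in for the two b-trees (one keyed by document ID, one keyed by update sequence), and the aggregate is a simple count per key, chosen because it can be corrected by retracting a document's old contribution. The class and method names are made up for illustration.

```python
from collections import defaultdict


class IncrementalView:
    """Toy store with a by-ID index, a by-update-sequence index, and one
    incrementally maintained aggregate (a count of documents per key)."""

    def __init__(self, map_fn):
        self.map_fn = map_fn      # doc -> key, or None to skip the doc
        self.docs = {}            # doc_id -> (seq, doc): the by-ID index
        self.by_seq = {}          # seq -> doc_id: the by-update index
        self.seq = 0              # sequence number of the latest write
        self.indexed_seq = 0      # highest sequence already folded in
        self.emitted = {}         # doc_id -> key it last contributed to
        self.counts = defaultdict(int)

    def put(self, doc_id, doc):
        # An update moves the document to a new slot in the by-seq index,
        # so each document appears there once, at its last update.
        if doc_id in self.docs:
            del self.by_seq[self.docs[doc_id][0]]
        self.seq += 1
        self.docs[doc_id] = (self.seq, doc)
        self.by_seq[self.seq] = doc_id

    def refresh(self):
        """Fold in only the documents written since the last refresh.
        Returns how many documents were processed."""
        changed = sorted(s for s in self.by_seq if s > self.indexed_seq)
        for s in changed:
            doc_id = self.by_seq[s]
            old_key = self.emitted.pop(doc_id, None)
            if old_key is not None:
                self.counts[old_key] -= 1  # retract the stale contribution
            new_key = self.map_fn(self.docs[doc_id][1])
            if new_key is not None:
                self.counts[new_key] += 1
                self.emitted[doc_id] = new_key
        self.indexed_seq = self.seq
        return len(changed)


view = IncrementalView(lambda doc: doc.get("type"))
view.put("a", {"type": "order"})
view.put("b", {"type": "order"})
view.put("c", {"type": "user"})
first_pass = view.refresh()    # all three documents are new
view.put("b", {"type": "user"})
second_pass = view.refresh()   # only "b" has changed since the last run
```

The second `refresh` touches a single document even though the store holds three, which is the whole point. An aggregate that cannot be patched this way (a median, say) would need a full rerun.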
Finally, some other technical notes on Cloudant include:
- Cloudant’s clustering scheme is much as you’d expect:
  - Consistent hashing.
  - RYW (read-your-writes) quorum consistency, with a default of 3 copies (across 2 data centers), 2 reads, and 2 writes.
- Cloudant has rewritten various components of CouchDB for performance or performance predictability, often in C (vs. the Erlang that the rest of CouchDB is written in). These include:
  - JSON handling.
  - I/O prioritization/(mixed) workload management.
  - Compaction (which necessitated some changes to the core storage model).
- Cloudant has generally preserved CouchDB’s goodness in terms of synchronization and so on, which I gather is based on maintaining a sequence of updates and surfacing cases where multi-master edits cause conflicts.
- Multi-tenant servers still use disks (as opposed to solid-state storage). Single-tenant customers can choose among various different configurations.
- No doubt Cloudant has written various management and administrative aids, but we didn’t talk about those much. Those are things Cloudant uses, much more than it exposes them to its customers.
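For readers who want the clustering scheme above spelled out, here is a rough Python sketch of consistent hashing with 3 copies per document, plus the arithmetic behind the 2-read/2-write default. It is a generic textbook version, not Cloudant's code: the `Ring` class, the virtual-node count and the node names are all assumptions, and it ignores the spreading of copies across data centers.

```python
import bisect
import hashlib


def _hash(value):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Consistent-hash ring with virtual nodes (hypothetical sketch)."""

    def __init__(self, nodes, vnodes=64):
        # Each physical node owns several points on the ring, which
        # evens out how much of the key space each one receives.
        self._points = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._points]
        self._node_count = len(set(nodes))

    def replicas(self, doc_id, n=3):
        """Walk clockwise from the document's hash and return the first
        n distinct nodes, which hold the n copies of the document."""
        if n > self._node_count:
            raise ValueError("not enough nodes for the requested copies")
        start = bisect.bisect_right(self._keys, _hash(doc_id))
        chosen = []
        for offset in range(len(self._points)):
            node = self._points[(start + offset) % len(self._points)][1]
            if node not in chosen:
                chosen.append(node)
                if len(chosen) == n:
                    break
        return chosen


def quorums_overlap(n, r, w):
    """True when any r copies read must include at least one of the
    w copies that acknowledged the latest write (r + w > n)."""
    return r + w > n


ring = Ring(["node-a", "node-b", "node-c", "node-d", "node-e"])
owners = ring.replicas("doc-123", n=3)
default_is_ryw = quorums_overlap(n=3, r=2, w=2)
```

With 3 copies, 2 reads and 2 writes, 2 + 2 > 3, so every read quorum shares a copy with every write quorum. Dropping to 1 read and 1 write would lose that guarantee.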