NoSQL notes
Last week I visited with James Phillips of Couchbase, Max Schireson and Eliot Horowitz of 10gen, and Todd Lipcon, Eric Sammer, and Omer Trajman of Cloudera. I guess it’s time for a round-up NoSQL post. 🙂
Views of the NoSQL market horse race are reasonably consistent, with perhaps some elements of “Where you stand depends upon where you sit.”
- As James tells it, NoSQL is simply a three-horse race between Couchbase, MongoDB, and Cassandra.
- Max would include HBase on the list.
- Further, Max pointed out that metrics such as job listings suggest MongoDB has the most development activity, and Couchbase/Membase/CouchDB perhaps have less.
- The Cloudera guys remarked on some serious HBase adopters.*
- Everybody I spoke with agreed that Riak had little current market presence, although some Basho guys could surely be found who’d disagree.
*I hope to do a separate post on HBase adoption soon. In connection with that, any info on HBase adoption by Facebook (said to be very heavy), Twitter, et al. would be much appreciated.
The reasons for using NoSQL of course are, in some order, dynamic schemas, scale-out, and open source. I find the scale-out argument somewhat bogus,* but the data model one is very real. Depending on whom you talk with, the most important point about dynamic schemas may actually be that they’re changeable, or it may just be that you don’t have to specify a schema at the time of initial application design. MongoDB gets particular praise as a good platform on which to throw something together quickly, although predictions as to how far the application will then scale may differ depending on whether you’re talking with, say, Max or Todd.
*It’s fair to say that NoSQL systems are more proven in scale-out than most relational DBMS. Even so, I would cringe at any line of reasoning that concluded one should adopt NoSQL because it is more mature than relational alternatives.
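To make the two readings of "dynamic schema" concrete, here is a minimal sketch using plain Python dictionaries as stand-ins for documents. The `posts` list and the `tags_of` helper are hypothetical names for illustration, not any product's API.

```python
# Hypothetical in-memory "collection"; a real document store would persist these.
posts = []

# Reading 1: no schema is declared up front; the application just inserts what it has.
posts.append({"title": "NoSQL notes", "status": "published"})

# Reading 2: the shape changes later; a new field appears without migrating old documents.
posts.append({"title": "draft", "status": "unpublished", "tags": ["hbase"]})


def tags_of(doc):
    """Return a document's tags, tolerating documents written before the field existed."""
    return doc.get("tags", [])


all_tags = [tags_of(p) for p in posts]
```

The cost of either reading shows up in `tags_of`: with no schema enforced by the store, the code that reads the data has to cope with every shape ever written.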
Finally, I was perhaps too extreme when I suggested there was no good reason for Oracle to have adopted the major key/minor key approach it took in its NoSQL offering. Todd offered a reason why that approach – which he characterized as similar to Project Voldemort’s – could make sense:
- If you have some kind of global secondary index, it’s hard to maintain that index consistently without what amounts to distributed transactions.
- If you want to avoid the overhead of those, one alternative is a column-group system such as HBase or Cassandra. Those have no indexes at all, except in the sense that a column is its own index.
- Another alternative is to load as much indexing information as you can into the key of a key-value store.
I’d be interested to learn about the Couchbase and MongoDB answers to that challenge.
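The last alternative in Todd's list can be sketched roughly as follows. This is a toy in-memory model under my own assumptions, not Oracle's or Voldemort's actual design: the major key alone picks the partition, and records are kept sorted by minor key inside it, so a lookup by minor-key prefix touches one partition and needs no distributed transaction. The class and method names are hypothetical.

```python
import bisect
import hashlib


class MajorMinorStore:
    """Toy key-value store: the major key picks the partition, the minor key orders records in it."""

    def __init__(self, num_partitions=4):
        # Each partition maps a major key to a list of (minor_key, value) pairs sorted by minor key.
        self.partitions = [dict() for _ in range(num_partitions)]

    def _partition_for(self, major):
        # Hash only the major key, so every record sharing it lands on the same partition.
        digest = hashlib.md5(major.encode("utf-8")).hexdigest()
        return self.partitions[int(digest, 16) % len(self.partitions)]

    def put(self, major, minor, value):
        rows = self._partition_for(major).setdefault(major, [])
        keys = [m for m, _ in rows]
        i = bisect.bisect_left(keys, minor)
        if i < len(rows) and rows[i][0] == minor:
            rows[i] = (minor, value)  # overwrite an existing record
        else:
            rows.insert(i, (minor, value))  # keep the list sorted by minor key

    def scan(self, major, minor_prefix=()):
        # The "index" is just the key: filter one major key's records by minor-key prefix.
        rows = self._partition_for(major).get(major, [])
        return [(m, v) for m, v in rows if m[: len(minor_prefix)] == minor_prefix]


store = MajorMinorStore()
store.put("user:42", ("order", "2011-10-01"), {"total": 30})
store.put("user:42", ("order", "2011-09-15"), {"total": 12})
store.put("user:42", ("profile",), {"name": "Ann"})
orders = store.scan("user:42", ("order",))
```

Here `orders` holds the two order records in date order, because the date is part of the minor key. The limitation is equally visible: a query that does not start from the major key (say, all orders on a given date across users) has nothing to work with, which is exactly the global secondary index problem.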
Comments
12 Responses to “NoSQL notes”
MongoDB and Couchbase are single-rack solutions right now, afaik. Although I am not a big fan of HBase, it can potentially scale to hundreds or thousands of servers (Facebook will probably prove it soon).
Re HBase adoption at FB:
This preso gives some good numbers: http://www.slideshare.net/brizzzdotcom/facebook-messages-hbase
The FB guys also maintain an HBase@FB group: https://www.facebook.com/UsingHbase
The latest message there reads: “Fun fact Facebook scaling fact of the day: the HBase clusters supporting the messages product have over 1 Petabyte of online capacity”
For Twitter, AFAIK their HBase setup is fairly limited and they don’t disclose a lot of information about it.
I’d also like to note that Huawei (http://en.wikipedia.org/wiki/Huawei) has a growing presence within the Hadoop and HBase dev communities.
(Disclaimer: I’m an HBase committer)
Thanks!!
Please hit me with any other examples you think I should be aware of. 🙂
I like this post on NoSQL, but I think it’s honestly extremely influenced by the opinions of a very few people. You may want to approach NoSQL from three different standpoints: (a) origins/inspiration (Google Bigtable, Amazon Dynamo, or any other); (b) use cases (not all NoSQL databases are made equal or try to solve the same problem); (c) features such as the ability to scale out, a less stringent schema or no schema, map-reduce style parallel processing of large data sets, etc. I wrote a book explaining some of these. Look at Professional NoSQL (Wiley, 2011), and feel free to get in touch; I would be happy to explain some of these in greater detail.
@Shashank,
Have you seen some of my other posts that address most of the issues you cite?
Trend Micro has about 100 HBase nodes running in secure configuration, a modest data point by size but interesting for two reasons:
1) “Non-traditional” adopter, if that term makes any sense for nonrelational database technology; Trend is not a Facebook or a Twitter
2) This is a secure variant of HBase, integrated with Hadoop’s Kerberos authentication and supporting a familiar permissions grant model on the column family and table levels (ADMIN, CREATE, READ, WRITE, etc.). The enabling features are going into the 0.92 release, I believe.
(Disclaimer: I’m an HBase committer.)
Also, we recently welcomed a committer from Salesforce.com to the project, so I think we shall see HBase address enterprise-y concerns in more ways: further attention to multitenancy (user isolation), and inclusion of constraints checking and transactional semantics as they make sense applied to the BigTable system model.
Thanks, guys! Keep ’em coming!
James *would* say that wouldn’t he? 🙂
Seriously though, CouchDB (base?) will be pretty damn cool once it goes live (‘specially if/when they can merge in the cloudant/dynamo stuff), but for now Redis and HBase are definitely in the mix.
I don’t disagree with the conclusion, but it’s disappointing that Voldemort and Riak don’t seem to be getting the same mindshare as these three. I hope someone from one of those teams will stop by and tell us something about their larger deployments, because I think they’re both fine projects that people should seriously consider alongside those mentioned.
@Obdurodon: In this context, Voldemort’s uptake is probably limited because of the lack of commercial support available. There are commercial entities you can buy support from for all the other offerings mentioned. A lot of application owners probably feel nervous about adopting something that doesn’t offer them somewhere to turn to if they have problems.
(Disclaimer: I work for 10gen.)