I’m collecting data points on NoSQL and HVSP adoption
I was asked to do a magazine article on NoSQL, where by “NoSQL” is meant “whatever they talk about at NoSQL conferences.” By now the number of publications planning to run the article is up to 2, the deadline is next week and, crucially, it has been agreed that I may talk about HVSP in general, NoSQL and SQL alike.
It is also understood that, realistically, I can’t be expected to know and mention the very latest news for all the many products in these categories. Even so, I think this is a fine time to check just where NoSQL and HVSP adoption stand. Here is most of what I know, or links to same; it would be great if you guys would contribute additional data in the comment thread.
In the NoSQL area:
- Back in April, the VoltDB guys told me they thought Cassandra and HBase were the two NoSQL systems with the most momentum.
- I know distressingly little about HBase adoption, but a source who may or may not wish to remain anonymous was kind enough to alert me that Twitter and StumbleUpon each have ~30 node deployments, for analytics and analytics/HVSP respectively.
- I wrote in detail on Cassandra adoption last month. News since then includes:
  - Facebook is rumored to have dropped Cassandra completely.
  - Twitter clarified that it may not be quite as lovestruck by Cassandra as before, but they’re still very close friends.
  - It’s not obvious that the Cassandra Summit unveiled a lot of new adoption stories.
- Northscale’s Membase is still in its early days. Zynga has bought in, however, as has something called NHN Korea. (Edit: I subsequently saw NHN Korea on a prominent SEO expert’s list of the top half dozen or so search engines in the world. Who knew?)
- Basho has listed a few Riak customers. If memory serves (I haven’t spoken with Basho for a while, and some of my notes are misplaced due to some computer sloppiness), Basho has a few dozen customers in total.
  - Mozilla has a 4-machine, 64-core Riak cluster in production.
- Hypertable has a few users/project sponsors, Baidu being the biggest name among them.
- I don’t really know how the MongoDB/10gen guys are doing. I think this is at least as much my fault as theirs. Anyhow, they seem to have links to a couple of folks who have written about MongoDB usage.
- NimbusDB is still in stealth mode. I’d be surprised if they had users for a while yet, since in January they didn’t yet sound as if development was very far underway. (Actually, I forget whether NimbusDB is supposed to be SQL-based or not.)
Among the SQL or SQL-friendly guys:
- Clustrix says it has a few production users, some big-name, but is not disclosing them yet.
- dbShards has around 6 customers, including Facebook. (Facebook may outpace even Twitter and Zynga in using the most products mentioned in this post.)
- As of May, VoltDB had one paying customer, plus 150 beta customers who weren’t in production yet.
- Akiban says they’ll get me up to speed on Thursday. 🙂
- ScaleDB seems to be pedaling along in perennial beta. Whether ScaleDB has any actual beta users is less clear. On the plus side, checking that out uncovered a pretty funny April Fool blog post.
- Groovy Corporation seems to have disappeared, or morphed into something called uCirrus, or something like that.
Comments
18 Responses to “I’m collecting data points on NoSQL and HVSP adoption”
John Kreisa of Cloudera sent over a bunch of links re HBase, of which the last one may be most to the point:
http://blog.mozilla.com/webdev/2010/07/26/moving-socorro-to-hbase/
http://highscalability.com/blog/2010/3/16/1-billion-reasons-why-adobe-chose-hbase.html
https://hbase.s3.amazonaws.com/hbase/HBase-Trend-HUG10.pdf
http://wiki.apache.org/hadoop/Hbase/PoweredBy
Mozilla has a few other clusters too.
Besides our 4-node Vertica cluster serving our data warehouse, we have a 5-node, 80-core staging Hadoop+HBase cluster, a 20-node, 296-core production Hadoop+HBase cluster and a new 8-node, 128-core test cluster that we are putting the latest versions of Cloudera’s CDH3 and HBase 0.90 on.
Guess we could also toss in a 7 node Mac Mini cluster for prototyping and our two big ETL servers that run Pentaho Data Integration processes.
In the NoSQL portion of your post, you neglect to mention Apache CouchDB. I know you’ve been critical of CouchDB in the past, but it did recently release its 1.0 version (followed quickly by 1.0.1 to fix a critical bug). There are at least two companies that do commercial support and services for CouchDB: Cloudant (full disclosure, I am a cofounder of Cloudant) and Couch.io.(1)
There are a great number of companies using CouchDB in production. The BBC, for instance, runs its website off of CouchDB. I know of a large retail chain (I’m not at liberty to disclose its name) that uses a CouchDB-based system to connect its many points of sale to its main data center. At Cloudant, we’ve built Dynamo-style clustering into CouchDB to provide horizontal scalability. We have customers in real-time search, advertising, and data analytics, some of whom have many TB of data and up to 1 billion documents in a single db.
CouchDB does not really fall into the HVSP category, but it still warrants mentioning as a NoSQL option that is gaining traction with production users.
(1):
http://cloudant.com
http://couch.io
Daniel,
Thanks for all the info!
What are the specs on those 16-core machines, and why so many cores?
MongoDB is doing great! Regarding MongoDB usage in production:
A public, partial list of people who are using MongoDB in production can be found here:
http://www.mongodb.org/display/DOCS/Production+Deployments
We’ve been seeing a significant number of people moving from development into production, so that list is growing. In terms of downloads, we’re seeing more than 50k database server downloads/month, and that number is also growing rapidly.
Thanks,
-Roger
10Gen.com / mongodb.org
Curt,
If you are truly interested in what’s going on in the NoSQL market, I think you should regularly check the myNoSQL blog: http://altdbase.com which is focused exactly on this.
PS: yes, I am biased as I’m the main maintainer of the myNoSQL blog
Alex,
I scrolled through the first two pages, and I saw almost nothing that addressed the question in this blog post. How much further back should I go?
Or if you were just plugging your blog because it is indeed active in providing other kinds of NoSQL news — well, consider it plugged! 🙂
Curt,
The blog covers 9 months of activity in the NoSQL market, so it will be kind of difficult to get a detailed answer to your question directly on the home page :-). The tagging system should allow you to look for the status of the products you are missing details about.
Hi Curt — There is a significant base of NoSQL implementations if you include the ~40-year-old data models of PICK and MUMPS, which are now known as MultiValue databases (from many vendors) and M or InterSystems Caché, respectively. The latter has both of these NoSQL data models. Most of these vendors have worked hard over the years to project their models for a SQL implementation as well, so they would not all claim to be NoSQL. In the case of InterSystems, for example, their SQL implementation is the fastest I have experienced, but it need not be used in an application.
Some folks defining the NoSQL label want to limit this tag to new databases or specific data models or architectures, but I am convinced it should include these older implementations too. It might not be proof positive, but when I made the no sql graphic on this blog entry http://www.tincat-group.com/mewsings/2007/01/otlt-metadata-piece-not-apartheid.html it was after a colleague acquired the nosql.com and .org domains, when we were planning to use those to showcase MultiValue databases and applications. We changed directions but he still has those domains. That and the fact that all MultiValue databases can be accessed without SQL (some also with SQL) should be enough to give these a seat at the table.
Including the pre-relational data models (again, not the way these are positioned by marketing teams) gives a significant installed base from vendors such as InterSystems, Rocket Software, Tiger Logic, Revelation, jBASE, Ladybridge, and Northgate. I think these no sql databases and their logical NF2 data models should be mentioned in any treatment regarding real implementations of NoSQL databases. Thanks for your consideration and cheers!
Hi Dawn!
Personally, I don’t think the term “NoSQL” is meaningful if it includes classic DBMS that happen to have a different approach to data organization or DML. And your examples are part of the reason why.
But I do think of you every time I hear of multi-value as an exciting new feature. 😉
Best,
CAM
Hi Curt, I’m glad you enjoyed my April Fools post. I figured someone out there might appreciate it. Regarding our perennial beta, it has really been more of a deciduous beta. We started with one locking architecture, only to find people wanted more nodal scalability, so we switched to another (done). Then we found that sharing data via disk doesn’t provide sufficient performance (rather obvious, but we thought we could squeeze by on that one for a while), so we developed an alternative to Oracle’s cache fusion based on a cache tier (analogous to Memcached, but between the DB and storage devices) that also provides more storage flexibility (see various blog posts here: http://scaledb.blogspot.com/). This cache tier is in the final stages of debug/tuning work. It seems that we climb a mountain only to see another hidden behind it; fitting, given our name and logo, I guess. We believe the cache tier addresses the last big mountain, and we will re-enter beta within weeks.
People do talk about durability when it comes to MongoDB and whether this issue is an issue or an ‘issue’. See http://nosql.mypopescu.com/post/392868405/mongodb-durability-a-tradeoff-to-be-aware-of
I think that one of the reasons why you see so many downloads of MongoDB is that they often come with a new version. I for instance have a folder called “c:\nosql\mongodb” on my laptop that contains sub folders mongodb124, mongodb140, mongodb141, mongo142, mongodb151, mongodb152, mongodb155 and mongodb160. So I download a new version quite often.
This however also shows that MongoDB is very easy to install.
I posted a paper comparing some of the NoSQL and SQL HVSP systems on my website, if that is of any help to you or anyone else:
http://cattell.net/datastores/
I plan to do an update to that paper in September, if you have any input.
Looks good, Rick, with a lot of detail I’m unlikely to ever post here. 🙂
[…] posting last Wednesday morning that I’m looking into NoSQL and HVSP, I’ve had a lot of conversations, including with (among […]
Adoption in the Cassandra community has continued to increase since this blog post. And today another important step was taken to facilitate further adoption. For the first time, documentation is available that covers everything from installation and configuration to data modeling. A reference guide for the API is also available.
The documentation is hosted on Riptano’s website. It can be viewed here: http://www.riptano.com/docs/0.6.5
[…] particularly interesting. First, those 5 TB/day are going straight into Vertica (from, I presume, memcached/Membase/Couchbase), as Zynga decided that sending the data to some kind of log first was more trouble than it’s […]