MongoDB update
One pleasure in talking with my clients at MongoDB is that few things are under NDA. So let’s start with some numbers:
- >2,000 named customers, the vast majority of which are unique organizations who do business with MongoDB directly.
- ~75,000 users of MongoDB Cloud Manager.
- An estimated ~1/4 million production users of MongoDB in total.
Also >530 staff, and I think that number is a little out of date.
MongoDB lacks many capabilities RDBMS users take for granted. MongoDB 3.2, which I gather is slated for early November, narrows that gap, but only by a little. Features include:
- Some JOIN capabilities.
- Specifically, these are left outer joins, so they’re for lookup but not for filtering.
- JOINs are not restricted to specific shards of data …
- … but do benefit from data co-location when it occurs.
- A BI connector. Think of this as a MongoDB-to-SQL translator. Using it does require somebody to go in and map JSON schemas and relational tables to each other. Once that’s done, the flow is:
- Basic SQL comes in.
- Filters and GroupBys are pushed down to MongoDB. A result set … well, it results. 🙂
- The result set is formatted into a table and returned to the system — for example a business intelligence tool — that sent the SQL.
- Database-side document validation, in the form of field-specific rules that combine into a single expression against which to check a document.
- This is fairly simple stuff — no dependencies among fields in the same document, let alone foreign key relationships.
- MongoDB argues, persuasively, that this simplicity makes it unlikely to recreate the spaghetti code maintenance nightmare that was 1990s stored procedures.
- MongoDB concedes that, for performance, it will ordinarily be a good idea to still do your validation on the client side.
- MongoDB points out that enforcement can be either strict (throw errors) or relaxed (just note invalid documents to a log). The latter option is what makes it possible to install this feature without breaking your running system.
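To make "for lookup but not for filtering" concrete, here is a minimal Python sketch of left-outer-join semantics over in-memory lists of documents. It is not MongoDB’s implementation; the `left_outer_join` helper and the sample field names are hypothetical. The output shape loosely mirrors the 3.2 aggregation stage `{"$lookup": {"from": "inventory", "localField": "item", "foreignField": "sku", "as": "stock"}}`: every input document survives, with its matches attached as an array that is empty when nothing matches.

```python
from collections import defaultdict
from typing import Any

Doc = dict[str, Any]


def left_outer_join(
    local: list[Doc],
    foreign: list[Doc],
    local_field: str,
    foreign_field: str,
    as_field: str,
) -> list[Doc]:
    """Attach matching foreign documents to each local document.

    Every local document is kept; those without a match get an empty list,
    so the join by itself never removes anything from the result.
    """
    # Index the foreign side by join key so each lookup is a dict access.
    index: dict[Any, list[Doc]] = defaultdict(list)
    for doc in foreign:
        index[doc.get(foreign_field)].append(doc)

    joined: list[Doc] = []
    for doc in local:
        out = dict(doc)  # shallow copy; the input documents are not mutated
        out[as_field] = list(index.get(doc.get(local_field), []))
        joined.append(out)
    return joined


if __name__ == "__main__":
    orders = [{"_id": 1, "item": "abc"}, {"_id": 2, "item": "xyz"}]
    inventory = [{"sku": "abc", "qty": 120}]
    for row in left_outer_join(orders, inventory, "item", "sku", "stock"):
        print(row)
```

Order 2 comes back with an empty `stock` array rather than disappearing, which is why a left outer join serves lookup rather than filtering; narrowing the result set takes a separate step.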
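The BI connector flow can be sketched in the same spirit. Assume, hypothetically, that the SQL `SELECT region, COUNT(*) FROM orders WHERE status = 'shipped' GROUP BY region` has already been parsed, and that the schema mapping says `orders` is a collection with `region` and `status` fields. The filter and the GROUP BY are pushed down as MongoDB `$match` and `$group` aggregation stages, and the grouped result documents are then flattened back into rows. The function names are made up, SQL parsing is skipped entirely, and this is not how the actual connector is built.

```python
from typing import Any

Doc = dict[str, Any]


def to_pipeline(filters: Doc, group_by: str) -> list[Doc]:
    """Push an equality-only WHERE and a GROUP BY ... COUNT(*) down to MongoDB
    as aggregation stages, so filtering and grouping happen in the database."""
    pipeline: list[Doc] = []
    if filters:
        pipeline.append({"$match": dict(filters)})
    pipeline.append({"$group": {"_id": f"${group_by}", "count": {"$sum": 1}}})
    return pipeline


def to_rows(group_by: str, results: list[Doc]) -> tuple[list[str], list[tuple[Any, ...]]]:
    """Turn $group output documents back into a header plus tabular rows
    for the BI tool that sent the SQL."""
    header = [group_by, "count"]
    rows = [(doc["_id"], doc["count"]) for doc in results]
    return header, rows


if __name__ == "__main__":
    # SELECT region, COUNT(*) FROM orders WHERE status = 'shipped' GROUP BY region
    print(to_pipeline({"status": "shipped"}, "region"))
    # Pretend MongoDB returned these grouped documents (illustrative values only).
    print(to_rows("region", [{"_id": "east", "count": 3}, {"_id": "west", "count": 5}]))
```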
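Here is a rough sketch of database-side validation as described above: per-field rules ANDed into one check, with a strict mode that rejects and a relaxed mode that only logs. The helper names are hypothetical. In MongoDB 3.2 itself this is configured per collection through the `validator` option, with `validationAction` set to `"error"` or `"warn"`; the Python below merely imitates that behavior in memory.

```python
import logging
from typing import Any, Callable

Doc = dict[str, Any]
Rule = Callable[[Any], bool]  # checks one field's value in isolation

logger = logging.getLogger("validation")


def make_validator(rules: dict[str, Rule], action: str = "error") -> Callable[[Doc], bool]:
    """Combine per-field rules into one check (a logical AND of all rules).

    action="error" rejects invalid documents by raising; action="warn"
    only logs them, so existing writers keep working.
    """
    if action not in ("error", "warn"):
        raise ValueError(f"unknown action: {action!r}")

    def validate(doc: Doc) -> bool:
        failed = [field for field, rule in rules.items() if not rule(doc.get(field))]
        if not failed:
            return True
        if action == "error":
            raise ValueError(f"document failed validation on fields {failed}")
        logger.warning("invalid document %r (fields %s)", doc, failed)
        return False

    return validate


def insert(collection: list[Doc], doc: Doc, validate: Callable[[Doc], bool]) -> None:
    validate(doc)  # raises under "error"; under "warn" it logs and we still write
    collection.append(doc)


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    rules = {"age": lambda v: isinstance(v, int) and v >= 0}
    people: list[Doc] = []
    insert(people, {"name": "a", "age": -1}, make_validator(rules, action="warn"))
    print(people)  # the invalid document was written, and a warning was logged
```

Switching a running system from `"warn"` to `"error"` only after the log stays quiet is the non-breaking rollout path described above.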
There’s also a closed-source database introspection tool coming, currently codenamed MongoDB Scout.
- The name will change, in part because if you try to search on that name you’ll probably find an unrelated Scout. 🙂
- Scout samples data, runs stats, and all that stuff.
- Scout is referred to as a “schema introspection” tool, but I’m not sure why; schema introspection sounds more like a feature or architectural necessity than an actual product.
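Since Scout was described only as sampling data and running stats, what follows is a guess at the general idea, not a description of Scout itself: take a random sample of documents and report, for each top-level field, how often it appears and which value types it holds. For a schemaless store, that is roughly what "schema introspection" has to mean. The `profile` function and its parameters are hypothetical.

```python
import random
from collections import Counter, defaultdict
from typing import Any

Doc = dict[str, Any]


def profile(docs: list[Doc], sample_size: int = 1000, seed: int = 0) -> dict[str, dict[str, Any]]:
    """Sample documents and report, per top-level field, the share of sampled
    documents containing it and a count of the value types seen."""
    rng = random.Random(seed)
    sample = docs if len(docs) <= sample_size else rng.sample(docs, sample_size)
    presence: Counter = Counter()
    types: defaultdict = defaultdict(Counter)
    for doc in sample:
        for field, value in doc.items():
            presence[field] += 1
            types[field][type(value).__name__] += 1
    n = len(sample) or 1  # avoid dividing by zero on an empty collection
    return {
        field: {"presence": presence[field] / n, "types": dict(types[field])}
        for field in presence
    }


if __name__ == "__main__":
    docs = [{"name": "a", "age": 1}, {"name": "b"}, {"name": "c", "age": "x"}]
    for field, stats in profile(docs).items():
        print(field, stats)
```

A field with presence below 1.0, or with more than one value type, is exactly the kind of thing such a tool would want to surface.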
As for storage engines:
- WiredTiger, which was the biggest deal in MongoDB 3.0, will become the default in 3.2. I continue to think analogies to InnoDB are reasonably appropriate.
- An in-memory storage engine option was also announced with MongoDB 3.0. Now there’s a totally different in-memory option. However, details were not available at posting time. Stay tuned.
- Yet another MongoDB storage engine, based on or akin to WiredTiger, will do encryption. Presumably, overhead will be acceptably low. Key management and all that will be handled by usual-suspect third parties.
Finally — most data management vendors brag to me about how important their text search option is, although I’m not necessarily persuaded. 🙂 MongoDB does have built-in text search, of course, of which I can say:
- It’s a good old-fashioned TF/IDF algorithm. (Term Frequency/Inverse Document Frequency.)
- About the fanciest stuff they do is tokenization and stemming. (In a text search context, tokenization amounts to the identification of word boundaries and the like. Stemming is noticing that alternate forms of the same word really are the same thing.)
This level of technology was easy to get in the 1990s. One thing that’s changed in the intervening decades, however, is that text search commonly supports more languages. MongoDB offers stemming in 8 or 9 languages for free, plus a paid option via Basis for still more languages.
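For readers who want the mechanics, here is textbook TF/IDF with tokenization and stemming as a minimal Python sketch. The stemmer is a deliberately crude suffix-stripper (an assumption made for brevity; real engines use far more careful stemmers, such as the Snowball family), and the scoring is the plain tf × idf formula, with tf = term count / document length and idf = log(N / document frequency), not MongoDB’s exact scoring.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Tokenization: lowercase, then split on runs of non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def stem(token: str) -> str:
    """Toy stemmer: strip one common English suffix, keeping at least 3 letters,
    so "searching", "searches" and "search" all become "search"."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text: str) -> list[str]:
    return [stem(t) for t in tokenize(text)]


def tf_idf_scores(query: str, docs: list[str]) -> list[float]:
    """Score each document against the query: sum over query terms of
    tf (term count / document length) * idf (log(N / document frequency))."""
    analyzed = [analyze(d) for d in docs]
    df: Counter = Counter()
    for terms in analyzed:
        df.update(set(terms))  # each document counts once per term
    query_terms = set(analyze(query))
    scores = []
    for terms in analyzed:
        counts = Counter(terms)
        score = 0.0
        for term in query_terms:
            if counts[term]:  # a nonzero count implies df[term] >= 1
                score += counts[term] / len(terms) * math.log(len(docs) / df[term])
        scores.append(score)
    return scores


if __name__ == "__main__":
    docs = ["Searching text indexes", "The index searches quickly", "Nothing relevant here"]
    print(tf_idf_scores("search", docs))
```

In that example, stemming lets the query "search" match both "Searching" and "searches"; the shorter first document outscores the second because the same single match makes up a larger fraction of it, and the third document scores zero.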
Related links
- BI for NoSQL (March, 2015)
- Uninterrupted DBMS operation (September, 2012)
Comments
6 Responses to “MongoDB update”
MongoDB lacks some features more mature SQL solutions include, and many applications require. More mature SQL solutions lack features to make sharded replica sets easy to manage. Who gets there first? This will be an interesting race.
Group By, Join, and BI connectors sound relational. It means that to benefit from all of them, developers should restrict themselves to a specific schema… On the one hand, developers are stripped of “schemaless” benefits. On the other hand, data that do not match the schema will be missing from reports…
Hi Curt, thanks for the post.
One small clarification – your point about the performance of validation. For any DBMS I think it is a good idea and common practice to do some amount of validation client side – you shouldn’t round-trip to the DBMS if you don’t have to.
Fair enough, Kelly. But it does somewhat undermine the importance of the feature. 🙂
[…] MongoDB isn’t the only company I reached out to recently for an update. Another is DataStax. I chatted mainly with Patrick McFadin, somebody with whom I’ve had strong consulting relationships at a user and vendor both. But Rachel Pedreschi contributed the marvelous phrase “twinkling dashboard”. […]
It’s reassuring that the architects of WiredTiger, the MongoDB 3.2 default storage engine, were also architects of Berkeley DB: Messrs Keith Bostic and Michael Cahill …