Defining NoSQL
A reporter tweeted: “Is there a simple plain English definition for NoSQL?” After reminding him of my cynical yet accurate Third Law of Commercial Semantics, I gave it a serious try, and came up with the following. More precisely, I tweeted the bolded parts of what’s below; the rest is commentary added for this post.
NoSQL is most easily defined by what it excludes: SQL, joins, strong analytic alternatives to those, and some forms of database integrity. If you leave all four out, and you have a strong scale-out story, you’re in the NoSQL mainstream.
- Thus, I’d say Cassandra, HBase, MongoDB, and Couchbase are prime examples, in no particular order. Riak as well.
- I might have phrased that better if I’d used a different word than simply “strong” — but hey, there was a 140-character limit, and he was on deadline.
Using NoSQL can make sense when at least one of two things is paramount: low-cost scale-out or dynamic schemas.
- There are some seriously sensible use cases for dynamic schemas. (A sketch follows this list.)
- “Low-cost” generally boils down to:
- Performance.
- Open source (free like beer).
- Not a lot of database administration.
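To make the dynamic-schemas point concrete, here’s a minimal sketch (mine, not from the post), using Python’s standard sqlite3 and json modules to stand in for a document store. The events table and its fields are made up for illustration; the point is that differently shaped records coexist with no ALTER TABLE step:

```python
import json
import sqlite3

# A stand-in for a document store: each record is a self-describing JSON
# blob in one TEXT column, so new fields never require an ALTER TABLE.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (doc TEXT)")

# Two differently shaped records coexist in the same "collection".
conn.execute("INSERT INTO events VALUES (?)",
             (json.dumps({"type": "click", "url": "/home"}),))
conn.execute("INSERT INTO events VALUES (?)",
             (json.dumps({"type": "purchase", "sku": "A-17", "cents": 1999}),))

for (doc,) in conn.execute("SELECT doc FROM events"):
    record = json.loads(doc)
    print(record["type"], record)
```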
I’ve generally given object-oriented DBMS vendors, and also MarkLogic, a hard time whenever they consider saying they’re “NoSQL”. Reasons include:
- Closed source.
- Database administration overhead (even if you get good stuff for incurring that overhead, like MarkLogic’s comprehensive indexing).
Also, NoSQL started out being ACID-unfriendly.
What you give up are the query flexibility and the easy, automatic data integrity of SQL-based systems. I should have added something about a mature ecosystem.
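As a quick illustration of the automatic integrity you give up, here’s a minimal sqlite3 sketch of mine (the customers/orders tables are hypothetical): a declared foreign key lets the engine itself reject dangling references, with no application code involved.

```python
import sqlite3

# Declarative integrity: the engine, not application code, rejects an
# order row that references a nonexistent customer.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only on request
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY)")
conn.execute("""CREATE TABLE orders (
                    id INTEGER PRIMARY KEY,
                    customer_id INTEGER REFERENCES customers(id))""")

conn.execute("INSERT INTO customers VALUES (1)")
conn.execute("INSERT INTO orders VALUES (100, 1)")       # valid reference
try:
    conn.execute("INSERT INTO orders VALUES (101, 42)")  # no customer 42
except sqlite3.IntegrityError as e:
    print("rejected by the database:", e)
```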
In the most recent live example, I influenced a client away from Cassandra and toward scale-out MySQL (dbShards and/or Schooner flavors, most likely). Part of the reason was the ability to do joins, which are useful in their application. Another part was that their development practices obviated any significant benefit from dynamic schemas. But perhaps the most important — or at least resonant — reason of all was that they really, really cared about .NET support.
Comments
Would it still be NoSQL if it left out joins but included product, union, restriction, intersect, minus and projection?
Joins are pretty essential to SQL.
Well, SQL Server didn’t have the join operator until version 7, and Oracle not until 9i.
Does this mean that earlier versions of these database server products were NoSQL? So presumably Oracle can remarket Version 8 as a NoSQL product.
Will,
Frankly, the join comment is rubbish. Oracle has had joins since v2. Where did you get the idea that it was introduced in 9i?
@anon
You should read my post more carefully: I only said that Oracle didn’t support the join operator until version 9i. Prior to that, joins were performed by combining two other relational operators (it should be obvious which two). So my statement is correct.
You could thus argue that join is not a fundamental operator in the relational model (or in SQL), since you can express all join operations in terms of other operators.
So there you are; SQL is NoSQL.
My point being that the arguments of the NoSQL brigade don’t seem to be based on much understanding of SQL, still less of the relational model.
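For what it’s worth, the two operators the commenter alludes to are (Cartesian) product and restriction. Here’s a minimal sqlite3 sketch of mine, with made-up emp/dept tables, showing that an explicit JOIN and a product restricted in the WHERE clause return the same rows:

```python
import sqlite3

# An explicit JOIN versus a Cartesian product restricted in the WHERE
# clause: both queries return the same rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (name TEXT, dept_id INTEGER);
    CREATE TABLE dept (id INTEGER, dept_name TEXT);
    INSERT INTO emp VALUES ('alice', 1), ('bob', 2);
    INSERT INTO dept VALUES (1, 'eng'), (2, 'sales');
""")

with_join = conn.execute(
    "SELECT name, dept_name FROM emp JOIN dept ON emp.dept_id = dept.id"
).fetchall()

product_then_restrict = conn.execute(
    "SELECT name, dept_name FROM emp, dept WHERE emp.dept_id = dept.id"
).fetchall()

print(sorted(with_join) == sorted(product_then_restrict))  # True
```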
I consider the JOIN operator to be “syntactic sugar”. You could always perform a JOIN operation within the WHERE clause. And in the case of correlated sub-queries (i.e., the EXISTS clause) it’s still done that way.
The notion of JOINing data has been intrinsic to all SQL databases pretty much since day one.
That notwithstanding, RDBMS vendors have for some time lagged behind Codd’s 12 Rules, which is why he published them to begin with.
The important thing to understand is that the relational model is more of an ideal than a reality. Most RDBMSs are closer to this ideal than any NoSQL database; in fact, none of them meets all 12 (or 13) rules.
If you truly grok what Codd was trying to do, you will begin to see NoSQL databases as stop-gap solutions. Yes, traditional RDBMSs are more costly to scale, but that’s what it really boils down to: the hardware cost of scaling.
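To illustrate the EXISTS point from the comment above, here’s a minimal sqlite3 sketch of mine (the customers/orders tables are made up): a correlated subquery relates the two tables entirely in the WHERE clause, with no JOIN keyword.

```python
import sqlite3

# A semi-join written entirely in the WHERE clause: the correlated
# EXISTS subquery relates the two tables with no JOIN keyword.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO orders VALUES (1);
""")

rows = conn.execute("""
    SELECT name FROM customers c
    WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)
""").fetchall()
print(rows)  # [('alice',)] -- only customers with at least one order
```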