Multi-model database managers
I’d say:
- Multi-model database management has been around for decades. Marketers who say otherwise are being ridiculous.
- Thus, “multi-model”-centric marketing is the last refuge of the incompetent. Vendors who say “We have a great DBMS, and by the way it’s multi-model (now/too)” are being smart. Vendors who say “You need a multi-model DBMS, and that’s the reason you should buy from us” are being pathetic.
- Multi-logical-model data management and multi-latency-assumption data management are greatly intertwined.
Before supporting my claims directly, let me note that this is one of those posts that grew out of a Twitter conversation. The first round went:
Merv Adrian: 2 kinds of multimodel from DBMS vendors: multi-model DBMSs and multimodel portfolios. The latter create more complexity, not less.
Me: “Owned by the same vendor” does not imply “well integrated”. Indeed, not a single example is coming to mind.
Merv: We are clearly in violent agreement on that one.
Around the same time I suggested that InterSystems Caché was the last significant object-oriented DBMS, only to get the pushback that they were “multi-model” as well. That led to some reasonable-sounding justification — although the buzzwords of course aren’t from me — namely:
Caché supports #SQL, #NoSQL. Interchange across tables, hierarchical, document storage.
Along the way, I was reminded that some of the marketing claims around “multi-model” are absurd. For example, at the time I am writing this, the Wikipedia article on “multi-model database” claims that “The first multi-model database was OrientDB, created in 2010…” In fact, however, by the definitions used in that article, multi-model DBMSs date back to the 1980s, when relational functionality was grafted onto pre-relational systems such as TOTAL and IDMS.
What’s more, since the 1990s, multi-model functionality has been downright common, specifically in major products such as Oracle, DB2 and Informix, not to mention PostgreSQL. (But not so much Microsoft or Sybase.) Indeed, there was significant SQL standards work done around datatype extensions, especially in the contexts of SQL/MM and SQL3.
I tackled this all in 2013, when I argued:
- “One database to rule them all” systems aren’t very realistic, but even so, …
- … single-model systems will become increasingly obsolete.
Developments since then have been in line with my thoughts. For example, Spark added DataFrames, which promise substantial data model flexibility for Spark use cases, but more mature products have progressed in a more deliberate way.
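To make that flexibility concrete, here is a minimal PySpark sketch. It is only an illustration under assumptions of mine: a local Spark installation, and a hypothetical newline-delimited JSON file “people.json” with name and age fields. The point is simply that one engine accepts document-shaped input and serves table-shaped analysis.

```python
# Minimal PySpark sketch: the same semi-structured data, viewed two ways.
# Assumes a local PySpark installation; "people.json" is a hypothetical file
# of newline-delimited JSON records with "name" and "age" fields.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dataframe-flexibility").getOrCreate()

# Read schemaless JSON documents; Spark infers a relational-looking schema.
df = spark.read.json("people.json")
df.printSchema()

# The same DataFrame can be queried relationally with SQL...
df.createOrReplaceTempView("people")
spark.sql("SELECT name, COUNT(*) AS n FROM people GROUP BY name").show()

# ...or manipulated collection-style, without leaving the engine.
df.filter(df.age > 30).select("name", "age").show()

spark.stop()
```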
What’s new in all this is a growing desire to re-integrate short-request and analytic processing — hence Gartner’s new-ish buzzword of HTAP (Hybrid Transactional/Analytic Processing). The more sensible reasons for this trend are:
- Operational applications have always needed to accept immediate writes. (Losing data is bad.)
- Operational applications have always needed to serve small query result sets based on the freshest data. (If you write something into a database, you might need to immediately retrieve it to finish the business operation.)
- It is increasingly common for predictive decisions to be made at similar speeds. (That’s what recommenders and personalizers do.) Ideally, such decisions can be based on fresh and historical data alike.
- The long-standing desire for business intelligence to operate on super-fresh data is, increasingly, making sense, as we get ever more stuff to monitor. However …
- … most such analysis should look at historical data as well.
- Streaming technology is supplying ever more fresh data.
But here’s the catch — the best models for writing data are the worst for reading it, and vice-versa, because you want to write data as a lightly-structured document or log, but read it from a Ted-Codd-approved RDBMS or MOLAP system. And if you don’t have the time to move data among multiple stores, then you want one store to do a decent job of imitating both kinds of architecture. The interesting new developments in multi-model data management will largely be focused on that need.
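As a toy illustration of that “one store imitating both” idea, here is a minimal sketch using only the Python standard library. It assumes the bundled SQLite exposes its JSON functions (true of most recent builds), and the event fields are made up. The write path appends lightly structured documents as-is; the read path treats the same rows as a Codd-style table.

```python
# A minimal sketch of "write as a document, read relationally" in one store.
# Uses only the Python standard library; assumes the bundled SQLite was built
# with the JSON1 functions (the default in most recent Python builds).
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Write path: append lightly structured events as-is, no up-front schema.
conn.execute("CREATE TABLE events (doc TEXT)")
events = [
    {"user": "alice", "action": "purchase", "amount": 30},
    {"user": "bob", "action": "view"},
    {"user": "alice", "action": "purchase", "amount": 12},
]
conn.executemany("INSERT INTO events VALUES (?)",
                 [(json.dumps(e),) for e in events])

# Read path: the same rows, queried as a relational table for analysis.
rows = conn.execute("""
    SELECT json_extract(doc, '$.user')        AS user,
           SUM(json_extract(doc, '$.amount')) AS total_spent
    FROM events
    WHERE json_extract(doc, '$.action') = 'purchase'
    GROUP BY user
""").fetchall()
print(rows)  # [('alice', 42)]
```

Real systems aiming at this need far more (indexing, columnar read copies, workload isolation), but the tension between the write-friendly and read-friendly representations is the same.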
Related links
- The two-policemen joke seems ever more relevant.
- My April, 2015 post on indexing technology reminds us that one DBMS can do multiple things.
- Back in 2009 integrating OLTP and data warehousing was clearly a bad idea.
Comments
Multi-model seems to be shifting from Caché-class #SQL/#NoSQL support to support for management of different storage engines in the same cluster, e.g. MongoDB’s WiredTiger.
Perhaps the real breakthrough is eliminating the “impedance mismatch” between application programming and frugal multitenancy. http://www.vldb.org/pvldb/vol8/p1668-shukla.pdf
I am currently working on a new general-purpose data management system that supports several different data models. It was originally designed as a file system replacement that uses a new kind of data object I invented (a Didget) to manage unstructured data with extensive structured meta-data extensions (i.e. tags). It essentially became a new kind of file system that had a lot of database features.
The database features worked so well that I tried to implement a traditional RDBMS on top of it. Tables are created using a column-based set of key/value stores. So far, I am able to outperform traditional database managers (MySQL and PostgreSQL) at query operations. My system does not use indexes, yet my queries come back faster (and require fewer disk reads and a smaller memory footprint) than fully indexed tables on those other systems.
The system is still under development but all the tests so far are looking very promising. I want this product to be a platform upon which relational databases, graph databases, document databases, and key/value stores can be built. It is versatile enough to meet the needs of applications that currently need file systems, SQL stores, and NoSQL stores to manage large sets of both structured and unstructured data.
A simple video demonstration can be seen at:
https://www.youtube.com/watch?v=2uUvGMUyFhY
Update – Using some multi-threading techniques, I have a version that is now even faster. See how fast at https://youtu.be/0X02xpy8ygc