Mike Stonebraker’s DBMS taxonomy
In response to my recent five-part series on DBMS diversity, Mike Stonebraker has proposed his own taxonomy of data management technologies over on Vertica’s Database Column blog. (Edit: Some good stuff disappeared when Vertica nuked that blog.)
- OLTP DBMSs focused on fast, reliable transaction processing
- Analytic/Data Warehouse DBMSs focused on efficient load and ad-hoc query performance
- Science DBMSs — after all MatLab does not scale to disk-sized arrays
- RDF stores focused on efficiently storing semi-structured data in this format
- XML stores focused on semi-structured data in this format
- Search engines — the big players all use proprietary engines in this area
- Stream Processing Engines focused on real-time StreamSQL
- “Lean and Mean,” less-than-a-database engines focused on doing a small number of things very well (embedded databases are probably in this category)
- MapReduce and Hadoop — after all Google has enough “throw weight” to define a category
He goes on to say that each will be architected differently, except that — as he already convinced me back in July — RDF will be well-managed by specialty data warehouse DBMS.
I must confess that I didn’t explicitly mention array-based data stores, whether scientific ones or the remaining native MOLAP (Multi-Dimensional OnLine Analytic Processing) engines, nor the sui generis SAS Intelligence Storage relational data warehouse product. So great catch there. On the not-so-great side, I think Mike’s definitions of his eighth and ninth categories (the “Lean and Mean” engines and MapReduce/Hadoop) are a bit fuzzy: embedded DBMS tend to be full DBMS, but MapReduce is less than a DBMS. And of course any finite list like his will make over-general assumptions (e.g., it’s not obvious the StreamSQL-based CEP vendors will blow away rule-oriented Apama) and omit edge cases.
But there’s really only one point on which we have meaningful disagreement — Mike dumps all OLTP and general-purpose relational DBMS into a single bucket. Considering that such products currently represent a large majority of the multi-billion dollar DBMS market, I think some finer distinctions are in order. At a minimum, let’s break them into two categories — high-end vs. mid-range. High-end systems have maximum robustness, whether because there’s a real application need or because it just makes their owners feel good. Mid-range systems do everything high-end systems did in the 1990s, and are a cheaper/better alternative for ever more database management tasks.
The series on database diversity (more links at the bottom of Part 1):
- Part 1: Database management system choices – overview
- Part 2: Database management system choices – 4 categories of relational
- Part 3: Database management system choices – relational data warehouse
- Part 4: Database management system choices – mid-range-relational
- Part 5: Database management system choices – beyond relational
Comments
6 Responses to “Mike Stonebraker’s DBMS taxonomy”
[…] a recent webcast, I presented an 11-node data management software taxonomy, updating a post commenting on Mike Stonebraker’s. It […]
[…] Earlier I thought Mike was forgetting about the distinction between high-end and mid-range RDBMS. Naturally, that didn’t last long. He’s actually calling the mid-range systems “open source”, but that’s a decent first approximation to a hard-to-define category. […]
[…] build a little taxonomy for the variety in database technology. One effort was 4 1/2 years ago, in a pre-planned exchange with Mike Stonebraker (his side, alas, has since been taken down). A year ago I spelled out eight kinds of analytic […]
[…] DBMS attempts with Postgres and Illustra/Informix, then more recently suggesting the world needs 9 or so kinds of database technology. As for me — well, I agreed with Mike both […]