Database diversity
Discussion of choices and variety in database management system architecture.
Three bold assertions by Mike Stonebraker
In the first “meat” — i.e., other than housekeeping — post on the new Database Column blog, Mike Stonebraker makes three core claims:
1. Different DBMS should be used for different purposes. I am in violent agreement with that point, which is indeed a major theme of this blog.
2. Vertica’s software is 50X faster than anything non-columnar and 10X faster than anything columnar. Now, some of these stats surely come from the syndrome of comparing the future release of your product, as tuned by the world’s greatest experts on it who also hope to get rich on their stock options in your company, vs. some well-established production release of your competitors’ products, tuned to an unknown level of excellence,* with the whole thing running test queries that you, in your impartial wisdom, deem representative of user needs. Or something like that …
Categories: Benchmarks and POCs, Columnar database management, Data warehousing, Database diversity, Michael Stonebraker, OLTP, Theory and architecture, TransRelational
Oracle, Tangosol, objects, caching, and disruption
Oracle made a slick move in picking up Tangosol, a leader in object/data caching for all sorts of major OLTP apps. Tangosol’s software is used in financial trading, telecom operations, big web sites (FedEx, GEICO), and other good stuff. This is a reminder that the list of important memory-centric data handling technologies is getting fairly long, including:
- Object caching (e.g., Tangosol, Progress ObjectStore)
- In-memory RDBMS (e.g., Oracle TimesTen, Solid BoostEngine, McObject eXtremeDB)
- Stream processing (e.g., Progress Apama, Streambase)
And that’s just for OLTP; there’s a whole other set of memory-centric technologies for analytics as well.
When one connects the dots, I think three major points jump out:
- There’s a lot more to high-end OLTP than relational database management.
- Oracle is determined to be the leader in as many of those areas as possible.
- This all fits the market disruption narrative.
I write about Point #1 all the time. So this time around let me expand a little more on #2 and #3.
DBMS market competitive overview (Part 1)
Monash Advantage members just received an exclusive nine-page Monash Letter with a competitive overview of the DBMS industry. The full analysis is exclusive to them, but I’ll give some highlights here.
1. As per my recent “deck-clearing” posts, there’s a lot more competitive opportunity in the DBMS industry than many observers recognize.
2. One reason is the considerable number of separate niches in the DBMS space.
3. Oracle is a classical Geoffrey Moore “gorilla” only in the market for high-end OLTP and mixed-use DBMS. Everything else is up for grabs.
4. As discussed here extensively, simpler appliance-like architectures are beating the overly complex general-purpose DBMS vendors’ solutions for VLDB data warehousing.
5. MPP/shared-nothing architectures are deservedly beating SMP/shared-everything approaches for VLDB data warehousing.
That’s not the only Monash Letter recently released; another one covered online marketing strategy and tactics.
Categories: Data warehouse appliances, Data warehousing, Database diversity, Oracle, Theory and architecture
Amazon’s version of DBMS2
Last year, I pointed out that Amazon has a highly diversified DBMS strategy. Now Mike Vizard has a great interview with Werner Vogels, Amazon’s CTO, where he unearths a lot more detail. And it turns out that Amazon has been a hardcore adopter of DBMS2, since long before DBMS2 was named.
Categories: Amazon and its cloud, Database diversity, NoSQL, Specific users, Theory and architecture
Relational DBMS versus text data
There seems to be tremendous confusion about “search,” “meaning,” “semantics,” the suitability of relational DBMS to manage text data, and similar subjects. Here are some observations that may help sort some of that out.
1. Relational database theorists like to talk about the “meaning” or “semantics” of data as being in the database (specifically its metadata, and more specifically its constraints). This is at best a very limited use of the words “meaning” or “semantics,” and has little to do with understanding the meaning of plain English (or other language) phrases, sentences, paragraphs, etc. that may be stored in the database. Hugh Darwen is right and his fellow relational theorists are confused.
2. The standard way to manage text is via a full-text index, designed like this: For hundreds of thousands of words, the index maintains a list of which documents each word appears in, and at what positions in each document it appears. This is a columnar, memory-centric approach that doesn’t work well with the architecture of mainstream relational products. Oracle pulled off a decent single-server integration nonetheless, although performance concerns linger to this day. Others, like Sybase, which attempted a Verity integration, couldn’t make it work reasonably at all. Microsoft, which started from the Sybase architecture, didn’t even try, or if they tried it wasn’t for long; Microsoft’s text search strategy has been multi-server more or less from the get-go.
3. Notwithstanding point #2, Oracle, IBM, Microsoft, and others have extended their SQL DBMS to handle text via the SQL3 (or SQL/MM) standard. (Truth be told, I get the names and sequencing of the SQL standard versions mixed up.) From this standpoint, the full text of a document is in a single column, and one can write WHERE clauses on that column using a rich set of text search operators.
But while such SQL statements formally fit into the relational predicate logic model, the fit is pretty awkward. Text search functions aren’t two-valued binary yes/no types of things; rather, they give scores, e.g. with 101 possible values (the integers from 0 – 100). Compounding them into a two-valued function typically throws away information, especially since that compounding isn’t well understood (which is why it’s so hard to usefully federate text searches across different corpuses).
4. Something even trickier is going on. Text search can be carried out against many different kinds of things. One increasingly useful target is the tables of a relational database. Where a standard SQL query might have trouble finding all the references in a whole database to a particular customer organization or product line or whatever, a text search can do a better job. This kind of use is becoming increasingly frequent. And while it works OK against relational products, it doesn’t fit into the formal relational model at all (at least not without a tremendous amount of contortion).
5. Relational DBMS typically manage the data they index. Text search systems often don’t. But that difference is a fairly small one compared with some of the others mentioned above, especially since it’s a checkmark item for leading RDBMS to have some sort of formal federation capability.
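To make points 2 and 3 concrete, here is a minimal Python sketch of a positional full-text index, plus a toy 0 to 100 relevance score that then gets squeezed into a yes/no answer. Everything in it is an assumption for illustration: the function names, the sample documents, the whitespace tokenizer, and the scoring rule (percentage of query words present) are made up, and no real product's index or ranking works this simply.

```python
from collections import defaultdict

def build_index(docs):
    """Map each word to {doc_id: [positions]}: a positional inverted index."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word].setdefault(doc_id, []).append(pos)
    return index

def phrase_match(index, phrase):
    """Return ids of documents in which the words of `phrase` appear consecutively."""
    words = phrase.lower().split()
    if not words or any(w not in index for w in words):
        return set()
    hits = set()
    for doc_id, starts in index[words[0]].items():
        for start in starts:
            # Each later word must sit exactly i positions after the first one.
            if all(start + i in index[w].get(doc_id, []) for i, w in enumerate(words)):
                hits.add(doc_id)
                break
    return hits

def score(index, doc_id, query):
    """Toy relevance score from 0 to 100: percentage of query words found in the document."""
    words = query.lower().split()
    if not words:
        return 0
    found = sum(1 for w in words if doc_id in index.get(w, {}))
    return round(100 * found / len(words))

# Hypothetical sample corpus.
docs = {
    "d1": "Oracle bought a caching vendor",
    "d2": "the caching vendor sells to Oracle shops",
    "d3": "stream processing for trading",
}
index = build_index(docs)

phrase_hits = phrase_match(index, "caching vendor")

# Graded scores, then the same scores forced into a two-valued predicate.
scores = {d: score(index, d, "oracle caching bought") for d in docs}
matches = {d: s >= 50 for d, s in scores.items()}
```

With these sample documents, d1 scores higher than d2, yet both come out simply as matches once the score is thresholded. That lost distinction is the information a two-valued WHERE clause throws away.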
Categories: Data types, Database diversity, Memory-centric data management, Text, Theory and architecture
Gartner on “The Death of the Database”
Gartner had a recent conference session on “The Death of the Database,” as described in David Berlind’s and Kathy Somebodyorother’s blogs. The core idea was that data in the future might be stored closest to where it would need to be used, which might not be in a traditional DBMS.
Before getting to the real meat of that, let me push back at some of the extremist boobirds. First, I doubt the analysts really talked about “the intersection of a row and a tuple”; it’s much more likely that that is a misquote due to reporting error. Second, their claim that BI will switch from being an “application” to a “service” is not at all unreasonable. BI should never have been viewed as an application; it’s much more a collection of application-enabling technologies. And the analysts explicitly said that DBMS will continue to be useful for analytics. As for their claim that some data needs to be only briefly persistent — they’re absolutely right, but let me defer that point to a separate post on memory-centric OLTP.
All that said, while a lot of their points ring true, it sounds as if they overstated their case in one important area. They’re making it sound as if some of today’s OLTP databases will no longer be needed, and as if tomorrow’s new kinds of OLTP data won’t need to be at least partly persisted to conventional DBMS. Wrong and wrong. Every important transaction needs to wind up in a DBMS. Those DBMS may not be as centralized as previously thought. The data may be copied to non-DBMS data stores (or, more likely, kept in a lightweight local DBMS and copied from there to a serious OLTP database). These DBMS may use native XML rather than traditional tabular data structures. But at the end of the day, transactional databases will continue to be needed for all the reasons they’ve been necessary in the past.
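The "lightweight local DBMS, copied from there to a serious OLTP database" pattern can be sketched in a few lines of Python. This is only an illustration under loud assumptions: two in-memory SQLite databases stand in for the local store and the central system, and the `orders` table, the `forwarded` flag, and the function names are all invented here. A real deployment would need to handle failures between the two commits.

```python
import sqlite3

SCHEMA = (
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, item TEXT, qty INTEGER, forwarded INTEGER DEFAULT 0)"
)

def open_store():
    """Open an in-memory SQLite database with the hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn

def record_locally(local, order_id, item, qty):
    """Persist a transaction close to where it happens."""
    with local:  # commits on success, rolls back on error
        local.execute(
            "INSERT INTO orders (id, item, qty) VALUES (?, ?, ?)",
            (order_id, item, qty),
        )

def forward(local, central):
    """Copy not-yet-forwarded rows to the central database; return how many were sent."""
    rows = local.execute(
        "SELECT id, item, qty FROM orders WHERE forwarded = 0"
    ).fetchall()
    with central:
        # INSERT OR IGNORE keeps a retry from duplicating rows already copied.
        central.executemany(
            "INSERT OR IGNORE INTO orders (id, item, qty, forwarded) VALUES (?, ?, ?, 1)",
            rows,
        )
    with local:
        local.executemany(
            "UPDATE orders SET forwarded = 1 WHERE id = ?",
            [(row[0],) for row in rows],
        )
    return len(rows)

local, central = open_store(), open_store()
record_locally(local, 1, "book", 2)
record_locally(local, 2, "dvd", 1)
first_batch = forward(local, central)
second_batch = forward(local, central)
central_count = central.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

The point of the sketch is that the transaction still winds up in a conventional DBMS; only the route it takes to get there is less centralized.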
Categories: Business intelligence, Database diversity, Structured documents, Theory and architecture
It’s not about a single database
Critics of the DBMS2 idea generally are focused on the design of a single database. That’s somewhat missing the point.
Here are some excerpts and paraphrases from a discussion over on TDAN.
- “DBMS2” is NOT primarily a blueprint for how to design a single database or a single DBMS. That said, it does give guidance as to what kinds of DBMS and data architecture choices you should consider and favor for each cluster of applications.
- Text, presence, authentication, customer profiling – I don’t think any of these will wind up being handled relationally, although at least in the case of customer profiling that’s currently a minority viewpoint.
- In particular, text processing is poised to explode as a fraction of the overall IT burden.
- XML is pretty clearly going to be the basis of text data management.
- Unless all your apps are built by the same company, and perhaps not even then, there’s no way you’ll have a single integrated database. The way your different databases will talk to each other is XML.
- It’s clear that a typical large enterprise’s data structure will evolve to part relational and part XML, and possibly some other data models as well, all tied together by XML. None of the relational-über-alles arguments can or should change that. At best, they are reasons to make the relational piece bigger and more tightly integrated.
Categories: Database diversity, Theory and architecture
The Amazon.com bookstore is a huge, modern OLTP app. So is it relational?
I don’t know for a fact that the Amazon.com bookstore is the world’s biggest OLTP application — but if it isn’t, it’s close.
And the thing is — that’s never been an entirely relational application. Oh, the ordering part surely is. But the inventory lookup is currently driven by an OODBMS (from Progress). The personalization used to be done in Red Brick (I knew which software replaced it, but I’m forgetting at the moment — it may even be one of the relational warehouse appliance vendors). And of course the full-text search is a custom in-house system.
Categories: Amazon and its cloud, Cache, Data types, Database diversity, Memory-centric data management, Object, OLTP, Progress, Apama, and DataDirect, Specific users
Or to put the core idea another way
Break the data management problem into pieces, and stitch the pieces together.
Some of the pieces are best managed relationally, for all the well-known reasons; some, especially but not only the document-oriented ones, are not. XML-based SOA, or a successor, is the right framework for most of the stitching.
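As a rough illustration of the stitching, the Python sketch below keeps order facts in a relational table and a customer review as an XML document, and uses XML as the interchange format between the two. It is a toy under stated assumptions: SQLite stands in for the relational piece, and the table, the element names, and both helper functions are hypothetical rather than any particular SOA standard.

```python
import sqlite3
import xml.etree.ElementTree as ET

def order_to_xml(conn, order_id):
    """Export one relational row as a small XML message."""
    row = conn.execute(
        "SELECT id, customer, total_cents FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    if row is None:
        raise KeyError(order_id)
    order = ET.Element("order", id=str(row[0]))
    ET.SubElement(order, "customer").text = row[1]
    ET.SubElement(order, "total_cents").text = str(row[2])
    return ET.tostring(order, encoding="unicode")

def attach_order(review_xml, order_xml):
    """Stitch the relational export into a document managed elsewhere."""
    review = ET.fromstring(review_xml)
    review.append(ET.fromstring(order_xml))
    return ET.tostring(review, encoding="unicode")

# Relational piece: facts about an order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total_cents INTEGER)")
conn.execute("INSERT INTO orders VALUES (1, 'Acme', 4250)")

# Document piece: free text that does not fit a table well.
review_xml = "<review><text>Arrived late, but the book was great.</text></review>"

combined = ET.fromstring(attach_order(review_xml, order_to_xml(conn, 1)))
```

Neither side needs to know how the other stores its data; each only has to read and write the agreed XML.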
Categories: Database diversity, Theory and architecture
The core idea(s) of DBMS2
My introduction of the DBMS2 concept in an August Computerworld column has excited some heated discussion, little of it focused on what I regard as the core issues. But I must concede that in a short series of monthly 750-word columns (two published so far), with a target audience of senior IT managers, I have not necessarily made a clear statement of whether or why database designers should agree or care.
So here’s a little more of that story.
1. Everybody knows that large enterprises do not have single enterprise-wide data models, nor do they have single integrated enterprise databases, managed by a single DBMS.
2. The situation described in Point #1 is inevitable. Deal with it. Stop your futile efforts to change it.
3. What’s more, the obvious disadvantages to the situation in Point #1 are outweighed by other strong TCO advantages. Different kinds of data models, and different kinds of DBMS (or DBMS-substitutes) are appropriate for different applications and data sets.
That’s it.
Some supporting arguments may be found in my column appearing today (see other post). Most of the ones I had room for boil down to this:
Relational databases are ideally suited to manage facts. Most interesting new applications don’t deal (primarily) with the management of facts.
Categories: Database diversity, Theory and architecture