Database diversity
Discussion of choices and variety in database management system architecture. Related subjects include:
Some stuff that’s always on my mind
I have a LOT of partially-written blog posts, but am struggling to get any of them finished (obviously). Much of the problem is that they have so many dependencies on each other. Clearly, then, I should consider refactoring my writing plans. 🙂
So let’s start with this. Here, in no particular order, is a list of some things that I’ve said in the past, and which I still think are or should be of interest today. It’s meant to be background for numerous posts I write in the near future, and indeed a few hooks for such posts are included below.
1. Data(base) management technology is progressing pretty much as I expected.
- Vendors generally recognize that maturing a data store is an important, many-years-long process.
- Multiple kinds of data model are viable …
- … but it’s usually helpful to be able to do some kind of JOIN.
- To deal with the variety of hardware/network/storage arrangements out there, layering/tiering is on the rise. (An amazing number of vendors each seem to think they invented the idea.)
2. Rightly or wrongly, enterprises are often quite sloppy about analytic accuracy.
- My two central examples have long been inaccurate metrics and false-positive alerts.
- In predictive analytics, it’s straightforward to quantify how much additional value you’re leaving on the table with your imperfect accuracy.
- Enterprise search and other text technologies are still often terrible.
- After years of “real-time” overhype, organizations have seemingly swung to under-valuing real-time analytics.
Categories: Data models and architecture, Database diversity, Predictive modeling and advanced analytics, Public policy, Theory and architecture | 5 Comments |
Analytics on the edge?
There’s a theory going around to the effect that:
- Compute power is and will be everywhere, for example in cars, robots, medical devices or microwave ovens. Let’s refer to these platforms collectively as “real-world appliances”.
- Much more data will be created on these platforms than can reasonably be sent back to centralized/cloudy servers.
- Therefore, cloud-centric architectures will soon be obsolete, perhaps before they’re ever dominant in the first place.
There’s enough truth to all that to make it worth discussing. But the strong forms of the claims seem overblown.
1. This story doesn’t even make sense except for certain new classes of application. Traditional business applications run all over the world, in dedicated or SaaSy modes as the case may be. E-commerce is huge. So is content delivery. Architectures for all those things will continue to evolve, but what we have now basically works.
2. When it comes to real-world appliances, this story is partially accurate. An automobile is a rolling network of custom Linux systems, each running hand-crafted real-time apps, a few of which also have minor requirements for remote connectivity. That’s OK as far as it goes, but there could be better support for real-time operational analytics. If something as flexible as Spark were capable of unattended operation, I think many engineers of real-world appliances would find great ways to use it.
3. There’s a case to be made for something better yet. I think the argument is premature, but it’s worth at least a little consideration. Read more
MongoDB 3.4 and “multimodel” query
“Multimodel” database management is a hot new concept these days, notwithstanding that it’s been around since at least the 1990s. My clients at MongoDB of course had to join the train as well, but they’ve taken a clear and interesting stance:
- A query layer with multiple ways to query and analyze data.
- A separate data storage layer in which you have a choice of data storage engines …
- … each of which has the same logical (JSON-based) data structure.
When I pointed out that it would make sense to call this “multimodel query” — because the storage isn’t “multimodel” at all — they quickly agreed.
To be clear: While there are multiple ways to read data in MongoDB, there’s still only one way to write it. Letting that sink in helps clear up confusion as to what about MongoDB is or isn’t “multimodel”. To spell that out a bit further: Read more
Categories: Database diversity, Emulation, transparency, portability, MongoDB, MySQL, NoSQL, Open source, RDF and graphs, Structured documents, Text | 4 Comments |
Notes on the transition to the cloud
1. The cloud is super-hot. Duh. And so, like any hot buzzword, “cloud” means different things to different marketers. Four of the biggest things that have been called “cloud” are:
- The Amazon cloud, Microsoft Azure, and their competitors, aka public cloud.
- Software as a service, aka SaaS.
- Co-location in off-premises data centers, aka colo.
- On-premises clusters (truly on-prem or colo as the case may be) designed to run a broad variety of applications, aka private cloud.
Further, there’s always the idea of hybrid cloud, in which a vendor peddles private cloud systems (usually appliances) running similar technology stacks to what they run in their proprietary public clouds. A number of vendors have backed away from such stories, but a few are still pushing it, including Oracle and Microsoft.
This is a good example of Monash’s Laws of Commercial Semantics.
2. Due to economies of scale, only a few companies should operate their own data centers, aka true on-prem(ises). The rest should use some combination of colo, SaaS, and public cloud.
This fact now seems to be widely understood.
Oracle as the new IBM — has a long decline started?
When I find myself making the same observation fairly frequently, that’s a good impetus to write a post based on it. And so this post is based on the thought that there are many analogies between:
- Oracle and the Oracle DBMS.
- IBM and the IBM mainframe.
And when you look at things that way, Oracle seems to be swimming against the tide.
Drilling down, there are basically three things that can seriously threaten Oracle’s market position:
- Growth in apps of the sort for which Oracle’s RDBMS is not well-suited. Much of “Big Data” fits that description.
- Outright, widespread replacement of Oracle’s application suites. This is the least of Oracle’s concerns at the moment, but could of course be a disaster in the long term.
- Transition to “the cloud”. This trend amplifies the other two.
Oracle’s decline, if any, will be slow — but I think it has begun.
Oracle/IBM analogies
There’s a clear market lead in the core product category. IBM was dominant in mainframe computing. While not as dominant, Oracle is definitely a strong leader in high-end OTLP/mixed-use (OnLine Transaction Processing) RDBMS.
That market lead is even greater than it looks, because some of the strongest competitors deserve asterisks. Many of IBM’s mainframe competitors were “national champions” — Fujitsu and Hitachi in Japan, Bull in France and so on. Those were probably stronger competitors to IBM than the classic BUNCH companies (Burroughs, Univac, NCR, Control Data, Honeywell).
Similarly, Oracle’s strongest direct competitors are IBM DB2 and Microsoft SQL Server, each of which is sold primarily to customers loyal to the respective vendors’ full stacks. SAP is now trying to play a similar game.
The core product is stable, secure, richly featured, and generally very mature. Duh.
The core product is complicated to administer — which provides great job security for administrators. IBM had JCL (Job Control Language). Oracle has a whole lot of manual work overseeing indexes. In each case, there are many further examples of the point. Edit: A Twitter discussion suggests the specific issue with indexes has been long fixed.
Niche products can actually be more reliable than the big, super-complicated leader. Tandem Nonstop computers were super-reliable. Simple, “embeddable” RDBMS — e.g. Progress or SQL Anywhere — in many cases just work. Still, if you want one system to run most of your workload 24×7, it’s natural to choose the category leader. Read more
Categories: Cloud computing, Database diversity, Exadata, IBM and DB2, Market share and customer counts, Microsoft and SQL*Server, NoSQL, Oracle, Software as a Service (SaaS) | 28 Comments |
Readings in Database Systems
Mike Stonebraker and Larry Ellison have numerous things in common. If nothing else:
- They’re both titanic figures in the database industry.
- They both gave me testimonials on the home page of my business website.
- They both have been known to use the present tense when the future tense would be more accurate. 🙂
I mention the latter because there’s a new edition of Readings in Database Systems, aka the Red Book, available online, courtesy of Mike, Joe Hellerstein and Peter Bailis. Besides the recommended-reading academic papers themselves, there are 12 survey articles by the editors, and an occasional response where, for example, editors disagree. Whether or not one chooses to tackle the papers themselves — and I in fact have not dived into them — the commentary is of great interest.
But I would not take every word as the gospel truth, especially when academics describe what they see as commercial market realities. In particular, as per my quip in the first paragraph, the data warehouse market has not yet gone to the extremes that Mike suggests,* if indeed it ever will. And while Joe is close to correct when he says that the company Essbase was acquired by Oracle, what actually happened is that Arbor Software, which made Essbase, merged with Hyperion Software, and the latter was eventually indeed bought by the giant of Redwood Shores.**
*When it comes to data warehouse market assessment, Mike seems to often be ahead of the trend.
**Let me interrupt my tweaking of very smart people to confess that my own commentary on the Oracle/Hyperion deal was not, in retrospect, especially prescient.
Mike pretty much opened the discussion with a blistering attack against hierarchical data models such as JSON or XML. To a first approximation, his views might be summarized as: Read more
Differentiation in data management
In the previous post I broke product differentiation into 6-8 overlapping categories, which may be abbreviated as:
- Scope
- Accuracy
- (Other) trustworthiness
- Speed
- User experience
- Cost
and sometimes also issues in adoption and administration.
Now let’s use this framework to examine two market categories I cover — data management and, in separate post, business intelligence.
Applying this taxonomy to data management:
Read more
Categories: Buying processes, Clustering, Data warehousing, Database diversity, Microsoft and SQL*Server, Predictive modeling and advanced analytics, Pricing | 2 Comments |
Notes on packaged applications (including SaaS)
1. The rise of SAP (and later Siebel Systems) was greatly helped by Anderson Consulting, even before it was split off from the accounting firm and renamed as Accenture. My main contact in that group was Rob Kelley, but it’s possible that Brian Sommer was even more central to the industry-watching part of the operation. Brian is still around, and he just leveled a blast at the ERP* industry, which I encourage you to read. I agree with most of it.
*Enterprise Resource Planning
Brian’s argument, as I interpret it, boils down mainly to two points:
- Big ERP companies selling big ERP systems are pathetically slow at adding new functionality. He’s right. My favorite example is the multi-decade slog to integrate useful analytics into operational apps.
- The world of “Big Data” is fundamentally antithetical to the design of current-generation ERP systems. I think he’s right in that as well.
I’d add that SaaS (Software As A Service)/on-premises tensions aren’t helping incumbent vendors either.
But no article addresses all the subjects it ideally should, and I’d like to call out two omissions. First, what Brian said is in many cases applicable just to large and/or internet-first companies. Plenty of smaller, more traditional businesses could get by just fine with no more functionality than is in “Big ERP” today, if we stipulate that it should be:
- Delivered via SaaS.
- Much easier to adopt and use.
Categories: Database diversity, SAP AG, Software as a Service (SaaS) | 8 Comments |
Multi-model database managers
I’d say:
- Multi-model database management has been around for decades. Marketers who say otherwise are being ridiculous.
- Thus, “multi-model”-centric marketing is the last refuge of the incompetent. Vendors who say “We have a great DBMS, and by the way it’s multi-model (now/too)” are being smart. Vendors who say “You need a multi-model DBMS, and that’s the reason you should buy from us” are being pathetic.
- Multi-logical-model data management and multi-latency-assumption data management are greatly intertwined.
Before supporting my claims directly, let me note that this is one of those posts that grew out of a Twitter conversation. The first round went:
Merv Adrian: 2 kinds of multimodel from DBMS vendors: multi-model DBMSs and multimodel portfolios. The latter create more complexity, not less.
Me: “Owned by the same vendor” does not imply “well integrated”. Indeed, not a single example is coming to mind.
Merv: We are clearly in violent agreement on that one.
Around the same time I suggested that Intersystems Cache’ was the last significant object-oriented DBMS, only to get the pushback that they were “multi-model” as well. That led to some reasonable-sounding justification — although the buzzwords of course aren’t from me — namely: Read more
Categories: Data models and architecture, Database diversity, Databricks, Spark and BDAS, Intersystems and Cache', MOLAP, Object, Streaming and complex event processing (CEP) | 3 Comments |
Notes on HBase
I talked with a couple of Cloudera folks about HBase last week. Let me frame things by saying:
- The closest thing to an HBase company, ala MongoDB/MongoDB or DataStax/Cassandra, is Cloudera.
- Cloudera still uses a figure of 20% of its customers being HBase-centric.
- HBaseCon and so on notwithstanding, that figure isn’t really reflected in Cloudera’s marketing efforts. Cloudera’s marketing commitment to HBase has never risen to nearly the level of MongoDB’s or DataStax’s push behind their respective core products.
- With Cloudera’s move to “zero/one/many” pricing, Cloudera salespeople have little incentive to push HBase hard to accounts other than HBase-first buyers.
Also:
- Cloudera no longer dominates HBase development, if it ever did.
- Cloudera is the single biggest contributor to HBase, by its count, but doesn’t make a majority of the contributions on its own.
- Cloudera sees Hortonworks as having become a strong HBase contributor.
- Intel is also a strong contributor, as are end user organizations such as Chinese telcos. Not coincidentally, Intel was a major Hadoop provider in China before the Intel/Cloudera deal.
- As far as Cloudera is concerned, HBase is just one data storage technology of several, focused on high-volume, high-concurrency, low-latency short-request processing. Cloudera thinks this is OK because of HBase’s strong integration with the rest of the Hadoop stack.
- Others who may be inclined to disagree are in several cases doing projects on top of HBase to extend its reach. (In particular, please see the discussion below about Apache Phoenix and Trafodion, both of which want to offer relational-like functionality.)