MOLAP
Analysis of MOLAP (Multidimensional OnLine Analytic Processing) products and vendors. Related subjects include:
- Cognos (which now owns TM1)
- Oracle (which now owns Essbase)
- Microsoft (which offers Microsoft Analysis Services)
- Data warehousing
Readings in Database Systems
Mike Stonebraker and Larry Ellison have numerous things in common. If nothing else:
- They’re both titanic figures in the database industry.
- They both gave me testimonials on the home page of my business website.
- They both have been known to use the present tense when the future tense would be more accurate. 🙂
I mention the latter because there’s a new edition of Readings in Database Systems, aka the Red Book, available online, courtesy of Mike, Joe Hellerstein and Peter Bailis. Besides the recommended-reading academic papers themselves, there are 12 survey articles by the editors, and an occasional response where, for example, editors disagree. Whether or not one chooses to tackle the papers themselves — and I in fact have not dived into them — the commentary is of great interest.
But I would not take every word as the gospel truth, especially when academics describe what they see as commercial market realities. In particular, as per my quip in the first paragraph, the data warehouse market has not yet gone to the extremes that Mike suggests,* if indeed it ever will. And while Joe is close to correct when he says that the company Essbase was acquired by Oracle, what actually happened is that Arbor Software, which made Essbase, merged with Hyperion Software, and the latter was eventually indeed bought by the giant of Redwood Shores.**
*When it comes to data warehouse market assessment, Mike seems to often be ahead of the trend.
**Let me interrupt my tweaking of very smart people to confess that my own commentary on the Oracle/Hyperion deal was not, in retrospect, especially prescient.
Mike pretty much opened the discussion with a blistering attack against hierarchical data models such as JSON or XML. To a first approximation, his views might be summarized as: Read more
Multi-model database managers
I’d say:
- Multi-model database management has been around for decades. Marketers who say otherwise are being ridiculous.
- Thus, “multi-model”-centric marketing is the last refuge of the incompetent. Vendors who say “We have a great DBMS, and by the way it’s multi-model (now/too)” are being smart. Vendors who say “You need a multi-model DBMS, and that’s the reason you should buy from us” are being pathetic.
- Multi-logical-model data management and multi-latency-assumption data management are greatly intertwined.
Before supporting my claims directly, let me note that this is one of those posts that grew out of a Twitter conversation. The first round went:
Merv Adrian: 2 kinds of multimodel from DBMS vendors: multi-model DBMSs and multimodel portfolios. The latter create more complexity, not less.
Me: “Owned by the same vendor” does not imply “well integrated”. Indeed, not a single example is coming to mind.
Merv: We are clearly in violent agreement on that one.
Around the same time I suggested that Intersystems Cache’ was the last significant object-oriented DBMS, only to get the pushback that they were “multi-model” as well. That led to some reasonable-sounding justification — although the buzzwords of course aren’t from me — namely: Read more
Categories: Data models and architecture, Database diversity, Databricks, Spark and BDAS, Intersystems and Cache', MOLAP, Object, Streaming and complex event processing (CEP) | 3 Comments |
A new logical data layer?
I’m skeptical of data federation. I’m skeptical of all-things-to-all-people claims about logical data layers, and in particular of Gartner’s years-premature “Logical Data Warehouse” buzzphrase. Still, a reasonable number of my clients are stealthily trying to do some kind of data layer middleware, as are other vendors more openly, and I don’t think they’re all crazy.
Here are some thoughts as to why, and also as to challenges that need to be overcome.
There are many things a logical data layer might be trying to facilitate — writing, querying, batch data integration, real-time data integration and more. That said:
- When you’re writing data, you want it to be banged into a sufficiently-durable-to-acknowledge condition fast. If acknowledgements are slow, performance nightmares can ensue. So writing is the last place you want an extra layer, perhaps unless you’re content with the durability provided by an in-memory data grid.
- Queries are important. Also, they formally are present in other tasks, such as data transformation and movement. That’s why data manipulation packages (originally Pig, now Hive and fuller SQL) are so central to Hadoop.
Notes and comments, May 6, 2014
After visiting California recently, I made a flurry of posts, several of which generated considerable discussion.
- My claim that Spark will replace Hadoop MapReduce got much Twitter attention — including some high-profile endorsements — and also some responses here.
- My MemSQL post led to a vigorous comparison of MemSQL vs. VoltDB.
- My post on hardware and storage spawned a lively discussion of Hadoop hardware pricing; even Cloudera wound up disagreeing with what I reported Cloudera as having said. 🙂 Sadly, there was less response to the part about the partial (!) end of Moore’s Law.
- My Cloudera/SQL/Impala/Hive apparently was well-balanced, in that it got attacked from multiple sides via Twitter & email. Apparently, I was too hard on Impala, I was too hard on Hive, and I was too hard on boxes full of cardboard file cards as well.
- My post on the Intel/Cloudera deal garnered a comment reminding us Dell had pushed the Intel distro.
- My CitusDB post picked up a few clarifying comments.
Here is a catch-all post to complete the set. Read more
The two sides of BI
As is the case for most important categories of technology, discussions of BI can get confused. I’ve remarked in the past that there are numerous kinds of BI, and that the very origin of the term “business intelligence” can’t even be pinned down to the nearest century. But the most fundamental confusion of all is that business intelligence technology really is two different things, which in simplest terms may be categorized as user interface (UI) and platform* technology. And so:
- The UI aspect is why BI tends to be sold to business departments; the platform aspect is why it also makes sense to sell BI to IT shops attempting to establish enterprise standards.
- The UI aspect is why it makes sense to sell and market BI much as one would applications; the platform aspect is why it makes sense to sell and market BI much as one would database technology.
- The UI aspect is why vendors want to integrate BI with transaction-processing applications; the platform aspect is, I suppose, why they have so much trouble making the integration work.
- The UI aspect is why BI is judged on … well, on snazzy UIs and demos. The platform aspect is a big reason why the snazziest UI doesn’t always win.
*I wanted to say “server” or “server-side” instead of “platform”, as I dislike the latter word. But it’s too inaccurate, for example in the case of the original Cognos PowerPlay, and also in various thin-client scenarios.
Key aspects of BI platform technology can include:
- Query and data management. That’s the area I most commonly write about, for example in the cases of Platfora, QlikView, or Metamarkets. It goes back to the 1990s — notably the Business Objects semantic layer and Cognos PowerPlay MOLAP (MultiDimensional OnLine Analytic Processing) engine — and indeed before that to the report writers and fourth-generation languages of the 1970s. This overlaps somewhat with …
- … data integration and metadata management. Business Objects, Qlik, and other BI vendors have bought data integration vendors. Arguably, there was a period when Information Builders’ main business was data connectivity and integration. And sometimes the main value proposition for a BI deal is “We need some way to get at all that data and bring it together.”
- Security and access control — authentication, authorization, and all the additional As.
- Scheduling and delivery. When 10s of 1000s of desktops are being served, these aren’t entirely trivial. Ditto when dealing with occasionally-connected mobile devices.
It’s hard to make data easy to analyze
It’s hard to make data easy to analyze. While everybody seems to realize this — a few marketeers perhaps aside — some remarks might be useful even so.
Many different technologies purport to make data easy, or easier, to an analyze; so many, in fact, that cataloguing them all is forbiddingly hard. Major claims, and some technologies that make them, include:
- “We get data into a form in which it can be analyzed.” This is the story behind, among others:
- Most of the data integration and ETL (Extract/Transform/Load) industries, software vendors and consulting firms alike.
- Many things that purport to be “analytic applications” or data warehouse “quick starts”.
- “Data reduction” use cases in event processing.*
- Text analytics tools.
- Splunk.
- “Forget all that transformation foofarah — just load (or write) data into our thing and start analyzing it immediately.” This at various times has been much of the story behind:
- Relational DBMS, according to their inventor E. F. Codd.
- MOLAP (Multidimensional OnLine Analytic Processing), also according to RDBMS inventor E. F. Codd.
- Any kind of analytic DBMS, or general purpose DBMS used for data warehousing.
- Newer kinds of analytic DBMS that are faster than older kinds.
- The “data mart spin-out” feature of certain analytic DBMS.
- In-memory analytic data stores.
- Hadoop.
- NoSQL DBMS that have a few analytic features.
- TokuDB, similarly.
- Electronic spreadsheets, from VisiCalc to Datameer.
- Splunk.
- “Our tools help you with specific kinds of analyses or analytic displays.” This is the story underlying, among others:
- The business intelligence industry.
- The predictive analytics industry.
- Algorithmic trading use cases in complex event processing.*
- Some analytic applications.
- Splunk.
*Complex event/stream processing terminology is always problematic.
My thoughts on all this start: Read more
Do you need an analytic RDBMS?
I can think of seven major reasons not to use an analytic RDBMS. One is good; but the other six seem pretty questionable, niche circumstances excepted, especially at this time.
The good reason to not have an analytic RDBMS is that most organizations can run perfectly well on some combination of:
- SaaS (Software as a Service).
- A low-volume static website.
- A network focused on office software.
- A single cheap server, likely running a single instance of a general-purpose RDBMS.
Those enterprises, however, are generally not who I write for or about.
The six bad reasons to not have an analytic RDBMS all take the form “Can’t some other technology do the job better?”, namely:
- A data warehouse that’s just another instance of your OLTP (OnLine Transaction Processing) RDBMS. If your problem is that big, it’s likely that a specialized analytic RDBMS will be more cost-effective and generally easier to deal with.
- MOLAP (Multi-Dimensional OnLine Analytic Processing). That ship has sailed … and foundered … and been towed to drydock.
- In-memory BI. QlikView, SAP HANA, Oracle Exalytics, and Platfora are just four examples of many. But few enterprises will want to confine their analytics to such data as fits affordably in RAM.
- Non-tabular* approaches to investigative analytics. There are many examples in the Hadoop world — including the recent wave of SQL add-ons to Hadoop — and some in the graph area as well. But those choices will rarely suffice for the whole job, as most enterprises will want better analytic SQL performance for (big) parts of their workloads.
- Tighter integration of analytics and OLTP (OnLine Transaction Processing). Workday worklets illustrate that business intelligence/OLTP integration is a really good idea. And it’s an idea that Oracle and SAP can be expected to push heavily, when they finally get their product acts together. But again, that’s hardly all the analytics you’re going to want to do.
- Tighter integration of analytics and other short-request processing. An example would be maintaining a casual game’s leaderboard via a NoSQL write-optimized database. Yet again, that’s hardly all the analytics a typical enterprise will want to do.
Categories: Business intelligence, Data warehousing, Games and virtual worlds, Hadoop, In-memory DBMS, MOLAP | 12 Comments |
The Ted Codd guarantee
I write a lot about whether or not to use relational DBMS. For example:
- In May I surveyed relational vs. non-relational pros and cons at some length.
- Last November I mused about when it might be OK to do without joins.
- The question is implicit in a variety of posts about, say, document-oriented or object-oriented DBMS.
Before going further in that vein, I’d like to do a quick review of what E. F. “Ted” Codd was getting at with the relational model in the first place. Read more
Categories: Data models and architecture, IBM and DB2, MOLAP, NoSQL | 3 Comments |
Eight kinds of analytic database (Part 2)
In Part 1 of this two-part series, I outlined four variants on the traditional enterprise data warehouse/data mart dichotomy, and suggested what kinds of DBMS products you might use for each. In Part 2 I’ll cover four more kinds of analytic database — even newer, for the most part, with a use case/product short list match that is even less clear. Read more
Eight kinds of analytic database (Part 1)
Analytic data management technology has blossomed, leading to many questions along the lines of “So which products should I use for which category of problem?” The old EDW/data mart dichotomy is hopelessly outdated for that purpose, and adding a third category for “big data” is little help.
Let’s try eight categories instead. While no categorization is ever perfect, these each have at least some degree of technical homogeneity. Figuring out which types of analytic database you have or need — and in most cases you’ll need several — is a great early step in your analytic technology planning. Read more