Do you need an analytic RDBMS?
I can think of seven major reasons not to use an analytic RDBMS. One is good; but the other six seem pretty questionable, niche circumstances excepted, especially at this time.
The good reason to not have an analytic RDBMS is that most organizations can run perfectly well on some combination of:
- SaaS (Software as a Service).
- A low-volume static website.
- A network focused on office software.
- A single cheap server, likely running a single instance of a general-purpose RDBMS.
Those enterprises, however, are generally not who I write for or about.
The six bad reasons to not have an analytic RDBMS all take the form “Can’t some other technology do the job better?”, namely:
- A data warehouse that’s just another instance of your OLTP (OnLine Transaction Processing) RDBMS. If your problem is that big, it’s likely that a specialized analytic RDBMS will be more cost-effective and generally easier to deal with.
- MOLAP (Multi-Dimensional OnLine Analytic Processing). That ship has sailed … and foundered … and been towed to drydock.
- In-memory BI. QlikView, SAP HANA, Oracle Exalytics, and Platfora are just four examples of many. But few enterprises will want to confine their analytics to such data as fits affordably in RAM.
- Non-tabular* approaches to investigative analytics. There are many examples in the Hadoop world — including the recent wave of SQL add-ons to Hadoop — and some in the graph area as well. But those choices will rarely suffice for the whole job, as most enterprises will want better analytic SQL performance for (big) parts of their workloads.
- Tighter integration of analytics and OLTP (OnLine Transaction Processing). Workday worklets illustrate that business intelligence/OLTP integration is a really good idea. And it’s an idea that Oracle and SAP can be expected to push heavily, when they finally get their product acts together. But again, that’s hardly all the analytics you’re going to want to do.
- Tighter integration of analytics and other short-request processing. An example would be maintaining a casual game’s leaderboard via a NoSQL write-optimized database. Yet again, that’s hardly all the analytics a typical enterprise will want to do.
*I’ve long used “tabular” to cover both relational and MOLAP structures, the point being that in both cases you have a neat and regular schema, well-represented as a set of arrays.
What could change this picture would be a future in which:
- All your tabular business data fits into RAM.
- Also, the OLTP/analytic DBMS distinction becomes less important.
In that case, it might be reasonable to get by with:
- A single in-memory relational DBMS, handling OLTP and some analytics alike.
- Whichever additional short-request systems you need (mainly for internet-heavy uses).
- A Hadoop-based analytic data store.
I’m on record as suggesting that traditional databases will indeed wind up in RAM. But I’m more doubtful that a single in-memory DBMS will suffice for OTLP and analytics alike.
What are some key aspects of a specialized analytic RDMS? My partial and overlapping list starts:
- Fast, high-volume analytic I/O.
- Smart query planning.
- Smart, high-volume internal data movement.
- Smart workload management.
- Good data compression, including in cache and during query execution.
- Strong analytic platform capabilities.
- Fast execution of analytic requests — standard SQL, advanced SQL, or other.
An analytic RDBMS typically:
- Is optimized for reads, which are often large, and perhaps temporary large writes as well.
- Reduces I/O bottlenecks via, for example, compression, columnar storage, and/or scale-out.
If all the data is in RAM, these problems are indeed lessened. Also, Oracle Exadata is dedicated to the premise that, even using conventional computer parts, I/O bottlenecks can be reduced with enough hardware — and price aside, it seems to work. Still, if you talk with analytic RDBMS designers, you repeatedly hear that it’s not that simple — even getting data efficiently out of RAM is different in the analytic and OLTP cases.
Query planning/execution, data movement, and workload management go together — they’re all about getting the most work done with the least machine effort, and they all depend on determining which specific execution choices might be synergistic or anti-synergistic with each other. Taken together, they form a very tough optimization challenge, which is different in the OLTP and analytic cases. Adding in analytic platform capabilities adds yet more difficulty to the optimization problem. And so:
A fast analytic database manager is a hard thing to build; expecting it to be fast at OLTP as well may be too much to ask for.
Given that, the discussion pivots to:
OK, but can we overprovision the RAM by so much that suboptimal performance doesn’t matter?
My guess is “Not any time soon” — because efficiency is always a good thing, databases will always grow, and RAM will never be free.
Bottom line: Analytic RDBMS will likely be needed for a long time.
Related link
- Integrating short-request and analytic processing (March, 2011)
Comments
12 Responses to “Do you need an analytic RDBMS?”
Leave a Reply
Analytic databases will always be with us in part due to organizational conflict between the needs of business stakeholders and the IT organization. Analysts whose careers depend on speed and agility are often at odds with IT, which tends to be control-oriented and focused on cost.
There are entire industries that depend on this divide. In 1995, Gartner predicted that improvements to enterprise data warehouses would put Marketing Service Organizations out of business. MSOs today continue to grow and expand.
from the point of the start-up that has all the computing resources on Amazon cloud, I’d like to make a couple of points:
– Access to elastic map reduce resources (Hadoop and Amazon map reduce) is very important. However, map reduce coding (which needs to be flexible, quick and “versioning friendly”) is not trivial, and the deployment of the environment takes time.
– Availability of the large analytical DMBS with advanced SQL capabilities can accomplish most of the required (and quite advanced) processing leaving a portion of only what’s absolutely required for Hadoop and other non-db and non-mapreduce processing.
We use Greenplum CE (on the largest single cluster instance in EC2 deployed with multiple EBS volumes) and it does the job beautifully even with the bad I/O typical for EC2. It is also very easily integrated with S3 storage and non-database processing.
Data partitioning, compression, and advanced querying capabilities all what PostgreSQL has to offer with parallel processing is very hard to beat.
This setup allows for a very flexible, very capable, and very cost effective infrastructure that is very hard to achieve on any single platform.
My company runs Database with a data base company- We also use Greenplum CE and it does a wonderful job as well… It is very flexible.. Our IT dept has been very appreciative of the new set up. Thanks for the great post!
> MOLAP. That ship has sailed …
So pre-aggregated cubes are no longer a thing now?
Replaced by ROLAP, or is it now NoLAP?
Pretty much. Performance can be gotten without MOLAP rigidity.
[…] http://www.dbms2.com/2012/11/05/do-you-need-an-analytic-rdbms/ […]
The MOLAP ship is alive and well – but perhaps has found its place in the spectrum of BI solutions: Essentially batch-based and conceived around pre-aggregation as the strategy for increasing performance, it’s retreating into the mid-data/complex structured data arena leaving the lower latency, continuous reporting of very big data to the big appliance vendors.
In particular, the rich (but complex) language that can define sets and calculated members is very useful in encapsulating complex logic within the cube structure for use in slice-and-dice interfaces. Additionally the MDX language being devoid of JOIN syntax is far more suited to reporting queries than SQL ever is.
> Performance can be gotten without MOLAP rigidity.
Is this perhaps what is driving the Microsoft BISM / Tabular mode and DAX?
[…] can’t yet comment about is MOLAP/ROLAP, which is a pity; if anybody can refute my claim that ROLAP trumps MOLAP, it’s either Microsoft or […]
[…] certain merits of this approach, I don’t believe in complete alternatives to analytic RDBMS. The rise of analytic DBMS oriented toward multi-structured data just strengthens that […]
top.moncs.su
Do you need an analytic RDBMS? | DBMS 2 : DataBase Management System Services
run your own web server
Do you need an analytic RDBMS? | DBMS 2 : DataBase Management System Services