Are analytic RDBMS and data warehouse appliances obsolete?
I used to spend most of my time — blogging and consulting alike — on data warehouse appliances and analytic DBMS. Now I’m barely involved with them. The most obvious reason is that there have been drastic changes in industry structure:
- Many of the independent vendors were scooped up by acquisition.
- None of those acquisitions was a big success.
- Microsoft did little with DATAllegro.
- Netezza struggled with R&D after being bought by IBM. An IBMer recently told me that their main analytic RDBMS engine was BLU.
- I hear about Vertica more as a technology to be replaced than as a significant ongoing market player.
- Pivotal open-sourced Greenplum. I have detected few people who care.
- Ditto for Actian’s offerings.
- Teradata claimed a few large Aster accounts, but I never hear of Aster as something to compete or partner with.
- Smaller vendors fizzled too. Hadapt and Kickfire went to Teradata as more-or-less acquihires. InfiniDB folded. Etc.
- Impala and other Hadoop-based alternatives are technology options.
- Oracle, Microsoft, IBM and to some extent SAP/Sybase are still pedaling along … but I rarely talk with companies that big. 🙂
Simply reciting all that, however, raises the question of whether one should still care about analytic RDBMS at all.
My answer, in a nutshell, is:
Analytic RDBMS — whether on premises in software, in the form of data warehouse appliances, or in the cloud — are still great for hard-core business intelligence, where “hard-core” can refer to ad-hoc query complexity, reporting/dashboard concurrency, or both. But they aren’t good for much else.
To see why, let’s start by asking: “With what do you want to integrate your analytic SQL processing?”
- If you want to integrate with relational OLTP (OnLine Transaction Processing), your OLTP RDBMS vendor surely has a story worth listening to. Memory-centric offerings MemSQL and SAP HANA are also pitched that way.
- If you want to integrate with your SAP apps in particular, HANA is the obvious choice.
- If you want to integrate with other work you do in the Amazon cloud, Redshift is worth a look.
Beyond those cases, a big issue is integration with … well, with data integration. Analytic RDBMS got a lot of their workloads from ELT or ETLT, which stand for Extract/(Transform)/Load/Transform. I.e., you’d load data into an efficient analytic RDBMS and then do your transformations, vs. the “traditional” (for about 10-15 years of tradition) approach of doing your transformations in your ETL (Extract/Transform/Load) engine. But in bigger installations, Hadoop often snatches away that part of the workload, even if the rest of the processing remains on a dedicated analytic RDBMS platform such as Teradata’s.
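To make the ETL vs. ELT distinction concrete, here is a minimal sketch in Python. It uses the standard-library `sqlite3` module purely as a stand-in for an analytic RDBMS; the sample rows, table names and function names are all hypothetical. The only point is where the transformation runs: in the integration code before loading (ETL), or as SQL inside the database after loading (ELT).

```python
import sqlite3

# Hypothetical raw records, as they might arrive from a source system.
RAW_ORDERS = [
    ("2016-08-01", " widget ", "19.99"),
    ("2016-08-01", "Widget", "5.01"),
    ("2016-08-02", "GADGET", "7.50"),
]


def etl(conn):
    """ETL: clean the rows outside the database, then load the finished result."""
    cleaned = [
        (day, product.strip().lower(), float(amount))
        for day, product, amount in RAW_ORDERS
    ]
    conn.execute("CREATE TABLE orders_etl (day TEXT, product TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?, ?)", cleaned)


def elt(conn):
    """ELT: load the rows untouched, then transform them with SQL in the database."""
    conn.execute("CREATE TABLE orders_raw (day TEXT, product TEXT, amount TEXT)")
    conn.executemany("INSERT INTO orders_raw VALUES (?, ?, ?)", RAW_ORDERS)
    conn.execute(
        """
        CREATE TABLE orders_elt AS
        SELECT day,
               lower(trim(product)) AS product,
               CAST(amount AS REAL) AS amount
        FROM orders_raw
        """
    )


def totals(conn, table):
    """Revenue per product, rounded to cents, from one of the tables above."""
    query = (
        f"SELECT product, ROUND(SUM(amount), 2) FROM {table} "
        "GROUP BY product ORDER BY product"
    )
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
etl(conn)
elt(conn)
print(totals(conn, "orders_etl"))
print(totals(conn, "orders_elt"))
```

Both routes end with the same cleaned table. In the ELT route the heavy lifting is a SQL statement executed by the database engine, which is the part of the workload that Hadoop often takes over in bigger installations.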
And suppose you want to integrate with more advanced analytics — e.g. statistics, other predictive modeling/machine learning, or graph analytics? Well — and this both surprised and disappointed me — analytic platforms in the RDBMS sense didn’t work out very well. Early Hadoop had its own problems too. But Spark is doing just fine, and seems poised to win.
My technical observations around these trends include:
- Advanced analytics commonly require flexible, iterative processing.
- Spark is much better at such processing than earlier Hadoop …
- … which in turn is better than anything that’s been built into an analytic RDBMS.
- Open source/open standards and the associated skill sets come into play too. Highly vendor-proprietary DBMS-tied analytic stacks don’t have enough advantages over open ones.
- Notwithstanding the foregoing, RDBMS-based platforms can still win if a big part of the task lies in fancy SQL.
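As an illustration of what "iterative processing" means here, the following is a toy PageRank in plain Python. It is a sketch, not anything taken from Spark or from a particular DBMS; the graph, function name and parameter defaults are assumptions made for the example. What matters is the shape of the computation: each pass consumes the previous pass's output, and the loop stops only when the result settles.

```python
def pagerank(links, damping=0.85, tol=1e-9, max_iter=200):
    """Power iteration over a dict mapping each node to the nodes it links to.

    Assumes every node has at least one outgoing link (true for the toy graph).
    Returns the final ranks and the number of passes that were run.
    """
    nodes = list(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    passes = 0
    for passes in range(1, max_iter + 1):
        # Start each pass from the "random jump" share, then add link shares.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in links.items():
            share = damping * rank[node] / len(targets)
            for target in targets:
                new_rank[target] += share
        change = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank, passes


# Hypothetical three-page graph: a links to b and c, b to c, c back to a.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks, passes = pagerank(graph)
print(passes, {node: round(score, 4) for node, score in ranks.items()})
```

The number of passes is not known in advance, and intermediate results are reused from one pass to the next. Engines that keep such working data in memory between passes have an easier time with this pattern than ones that write every step back to disk.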
And finally, if a task is “partly relational”, then Hadoop or Spark often fit both parts.
- They don’t force you into using SQL for everything, nor into putting all your data into relational schemas, and that flexibility can be a huge relief.
- Even so, almost everybody who uses those uses some SQL, at least for initial data extraction. Those systems are also plenty good enough at SQL for joining data to reference tables, and all that other SQL stuff you’d never want to give up.
But suppose you just want to do business intelligence, which is still almost always done over relational data structures? Analytic RDBMS offer these trade-offs:
- They generally still provide the best performance or performance/concurrency combination, for the cost, although YMMV (Your Mileage May Vary).
- One has to load the data in and immediately structure it relationally, which can be an annoying contrast to Hadoop alternatives (database administration can be just-in-time) or to OLTP integration (less or no re-loading).
- Other integrations, as noted above, can also be weak.
Suppose all that is a good match for your situation. Then you should surely continue using an analytic RDBMS, if you already have one, and perhaps even acquire one if you don’t. But for many other use cases, analytic RDBMS are no longer the best way to go.
Finally, how does the cloud affect all this? Mainly, it brings one more analytic RDBMS competitor into the mix, namely Amazon Redshift. Redshift is a simple system for doing analytic SQL over data that was in or headed to the Amazon cloud anyway. It seems to be quite successful.
Bottom line: Analytic RDBMS are no longer in their youthful prime, but they are healthy contributors in middle age. Mainly, they’re still best-of-breed for supporting demanding BI.
Comments
29 Responses to “Are analytic RDBMS and data warehouse appliances obsolete?”
We use Vertica on AWS and we are pretty happy with it. We load raw and use parallel SQL to transform. We prefer it over Redshift for several reasons – partitioning, multiple projections per table, extensibility (UDF framework), and transactions actually work as expected.
“I hear about Vertica more as a technology to be replaced” – I’m curious what it is usually being replaced by?
Curt,
Around 2009 is when I began reading your posts. I was re-architecting a data warehouse. You left one out of your list that is my favorite, and which came to my attention via your posts of that time: Exasol. With GoldenGate Change Data Capture to deliver our database updates in near-real-time, and a Data Vault Integration Hub on Exasol, we will virtualize the delivery layer for BI, and do analytics on the Data Vault. How does that strike you?
Curt, others have mentioned Exasol and Vertica in the cloud, and there’s Snowflake, Azure SQL Data Warehouse, Teradata and Google BigQuery also making big strides in this space.
I agree that the on-premise DW appliance is on life support, but cloud-based solutions offer a great alternative.
I’d also add Google BigQuery as an interesting technology, as well as Presto, the open source DB from FB that is similar to Impala. I’ve seen Presto start to eat away workloads previously handled by Vertica.
I think the situation is quite different between cloud and on-premises. I believe that in the cloud, elastic allocation of resources is the right way to do analytics over big data…
On-premises, where compute is actually a scarce and non-elastic resource, the efficiency of MPP databases per pound of hardware is an important factor.
Actian is axing their analytical databases; they already fired their dev teams.
If Vertica is replaced, it will be because either:
— The user thinks a Hadoop- and/or Spark-oriented stack is cheaper, more functional in analytics, and/or better at dealing with multi-structured data.
— The user wants to do analytics on data that’s being updated at OLTP speeds.
Actian’s website seems to be in a bit of disarray. I was tipped off last night that it’s hard to find material about ParAccel or VectorWise there. However, what used to be Pervasive DataRush and related technologies still seem to be being pushed.
But then, I can’t find any Ingres references either, and Actian is surely still in the Ingres business, so we’ll have to wait and see.
I’d be interested in hearing from Actian as to what’s up. I’ve blacklisted them for years due to some lies they told publicly about previous conversations with me, and I still wouldn’t want a product-feature briefing for that reason. But if they want to tell me what businesses they basically are or aren’t still in, I’m curious.
If you look back, I covered Exasol years ago. It seemed like a sensibly-architected technology, but without the maturity to be a top-tier competitor, and without the momentum (Germany excepted) to change that.
The comment above is the first time I’ve heard about them in ages.
Ron et al.,
I agree that cloud/SaaS is a viable deployment option for ever more analytic RDBMS. Vertica in particular got to the cloud early for test/dev.
But to what extent are new cloud adopters adopting “traditional” systems over the less functional but usually cheaper Redshift?
An informant in whom I have great confidence tells me that Actian was shopping Vector/Vectorwise and Matrix/ParAccel for quite a while, unsuccessfully. The informant also supplied revenue numbers that were amazingly low.
Meanwhile, Infobright suddenly canned its community edition, but is moving forward otherwise.
https://infobright.com/blog/the-final-frontiers-of-ice/
And Exasol is putting out minor press releases like a company that is still chugging along. Ditto Kognitio. While I may be wrong about this, I get the impression that each of Infobright, Kognitio and Exasol is telling a “query accelerator” story as much as it is telling an analytic RDBMS one.
[…] a recent blog post that ponders the future of the analytical RDBMS, Monash said the systems still excelled at key business intelligent jobs such as complex ad hoc […]
Actually Ingres is mentioned at Actian site under Data Management section:
http://www.actian.com/products/operational-databases/ingres/
Apparently HP are offloading Vertica as well http://www.reuters.com/article/us-hpe-software-thomabravo-idUSKCN1175RV
What will that mean for Vertica and for products like it moving forward? Are the big players all going to follow and abandon them, consolidating the DBMS space again to what it looked like in the nineties (Oracle, Teradata…) and some niche players?
Interesting times.
As long as the Hadoop and NoSQL vendors aren’t acquired, we’re not consolidated the way we were before. Amazon also seems here to stay.
[…] Business intelligence should occur at interactive speeds, which is a major reason that there’s a market for high-performance analytic RDBMS. […]
Curt-
We liked your article and certainly noted the absence of Infobright. Here is what we have been up to.
https://infobright.com/blog/8650/
All the best,
Tim
Tim,
That’s a perfect way to use the comments on this blog. Too marketing-spun for me to ever endorse myself, but with enough meat that I’m happy to see my readers’ attention drawn to it by you. 🙂
Trafodion seems to be getting stable and may not be a bad choice for an analytic RDBMS. Well, almost an RDBMS! We’re doing some benchmarks on it in our data sciences lab, so I can comment better once this exercise is over. I’m curious as to how their MVCC-based scheme scales.
I believe we will have RDBMS for a while, but their footprint will definitely shrink. In the Hadoop ecosystem we have yet to provide the fast connection that exists in RDBMS for supporting dashboards and customer-facing insights.
What I see working right now is to offload data from the data warehouse via ETL, experiment and find new insights, then push the completed data model back out to the data warehouse.
Hi Curt,
I started as Actian’s SVP of Marketing on Monday, Oct 24th. I am sorry that you had a bad experience with Actian previously. I would love to talk one on one, so please reach out to me and suggest some days/times that work for you. lennard.fischer@actian.com
Did you see our Press Release this week? http://www.actian.com/company/news-and-events/press-releases/actian-announces-executive-leadership-changes/
Cheers,
Len
Redshift is nothing but ParAccel!!
Chris,
Actually, Redshift started as a stripped-down subset of ParAccel.
Have used both and there isn’t any difference.
The only difference is UDF support; even the error messages are the same :))
Besides UDFs, I’m pretty sure Amazon didn’t use all of ParAccel’s scale-out technology. But if you’re happy with Redshift performance and scaling, you don’t need to care about that part.
Other than performance and functionality, I agree they’re the same thing! 🙂
[…] mature alternative to MemSQL. The opportunity for MemSQL and CrateDB alike exists in part because analytic RDBMS vendors didn’t close it […]