Comments on the 2011 Forrester Wave for Enterprise Data Warehouse Platforms
The Forrester Wave: Enterprise Data Warehouse Platforms, Q1 2011 is now out,* hot on the heels of the Gartner Magic Quadrant. Unfortunately, this particular Forrester Wave is riddled with inaccuracy.
*At the time of this writing, I don’t have a link to a free version of the full report, although the 2011 Forrester Wave for Enterprise Data Warehouse Platforms graphic can be found here.
One example of the confusion pervading the 2011 Forrester Wave for Enterprise Data Warehouse Platforms lies in a list of three supposed trends.
- The Forrester Wave somehow conflates SaaS and MPP processing, tying them both to the term “cloud.” (In reality, the SaaS/cloud and MPP/cloud equations depend on two rather different word-senses for “cloud”.)
- The Forrester Wave then conflates EDWs, analytic computing systems, and application servers, the latter perhaps because of the “data-application server” product category name Aster Data floated. The Forrester Wave also conflates investigative analytics with low-latency operational processes that exploit investigative analytics’ results.
- The Forrester Wave then conflates social media, “unstructured data” (by which it seems at one point to mean text and at another point to also mean logs), solid-state drives, and a whole bunch of other technologies (especially but not only low-latency ones) into another supposed single trend.
Some of the sillier specific claims in the Forrester Wave for Enterprise Data Warehouse Platforms include:
- According to the 2011 Forrester Wave for Enterprise Data Warehouse Platforms, Netezza has hybrid row/columnar persistence, while most other vendors cited don’t. To recycle an old Larry Ellison joke, somebody obviously has a better pharmacist than I do. It’s tough to imagine how anybody who understands columnar storage could at all believe Netezza currently offers it.
- According to the 2011 Forrester Wave for Enterprise Data Warehouse Platforms, EMC/Greenplum is limited in the hardware it supports. Actually, Greenplum runs on pretty much any commodity Intel hardware, just like any other software-only DBMS does.
- According to the 2011 Forrester Wave for Enterprise Data Warehouse Platforms, Teradata, Sybase, and others are differentiated in their Hadoop support. Actually, Hadoop support of various forms is a checkmark item for analytic DBMS vendors.
- According to the 2011 Forrester Wave for Enterprise Data Warehouse Platforms, Oracle, Teradata, and others are differentiated in their cloud/SaaS support. Actually, having some kind of public cloud offering is a checkmark item; use of same is quite a different matter.
- The 2011 Forrester Wave for Enterprise Data Warehouse Platforms calls out EMC Greenplum for special praise in mixed workload management. Greenplum will probably be fine in concurrency and workload management, but calling it a leader is an overstatement.
- According to the 2011 Forrester Wave for Enterprise Data Warehouse Platforms, Vertica has not made a significant investment in real-time technologies (despite doing a lot of work with StreamBase and selling a lot into the algorithmic trading market). I disagree.
- Also according to the 2011 Forrester Wave for Enterprise Data Warehouse Platforms, Vertica has not made a significant investment in in-memory technology, despite the fact that all its updates pass through Vertica’s in-memory, query-responsive “Write-Optimized Store.” I disagree.
Even leaving aside the errors that obviously riddled the Forrester Wave for Enterprise Data Warehouse Platforms’ underlying 56-row matrix, I dispute the whole premise of the exercise. I’m not a big fan of overarching scorecard-based rankings, because the right choice of product varies so much by use case. For example:
- If you’re a smallish enterprise that can realistically do OLTP and data warehousing on the same instance of your DBMS, Oracle and Microsoft blow away everybody else mentioned.
- If columnar compression methods work really well for your use case, Vertica or maybe Oracle Exadata might shine.
- If you typically only retrieve a few columns from a wide table, so that columnar I/O is what you care most about, Vertica, Sybase, or even EMC Greenplum might shine. (The decidedly non-columnar Netezza and Oracle Exadata approaches to predicate pushdown might or might not excel as well.)
- If your database is above a certain size, some of the alternatives (such as Sybase IQ or non-Exadata Oracle) should be taken off the table.
- If you have a highly concurrent mixed workload, nobody else is as proven as Teradata.
- If you don’t want to invest much in database administration, Oracle is about the last vendor you should consider, and Netezza might be the first.
More excusable is some terminological confusion in the Forrester Wave for Enterprise Data Warehouse Platforms, the essence of which is this:
Notwithstanding its name, the Forrester Wave for Enterprise Data Warehouse Platforms isn’t just talking about what are called enterprise data warehouses (EDWs), but rather a broader range of analytic database management systems and use cases. These include:
- What are classically called operational data stores (the focus on “Next-Best Actions” suggests those are included).
- Analytic platforms/analytic computing systems (the high-level mentions of MapReduce, predictive modeling integration, and so on suggest they’re in too).
- Reporting data marts (some of the vendors cited might not make the minimum count threshold unless those are included too).
Indeed, the definition provided of “EDW” basically boils down to “runs SQL, is tuned in some way for analytics, has a cost-based or other query optimizer, and isn’t tied to a specific application.”
Frankly, I think classical EDWs have their problems, and are not necessarily the best way to address the numerous use cases for analytic DBMS technology. And product category names are commonly problematic anyhow. So I don’t much mind this overloading of the EDW term. But in one respect I think the Forrester Wave overdoes its inclusiveness — it includes things that aren’t actually DBMS, and then marks down just about every product cited for being a real DBMS rather than some sort of above-DBMS layer, at least when those things are sold by SAP. I’ve never agreed with the idea that SAP’s BW/BWA products should be included in a comparison with the other products cited in the Forrester Wave at all, and SAP HANA doesn’t change my mind.
One last thing — I’m suspicious of the Forrester Wave for Enterprise Data Warehouse Platforms’ comments on data warehouse appliance prices. However, they are hard to judge without knowing whether Forrester was using the term “raw data” in its usual sense, or actually means “user data”, and also without knowing whether Forrester is talking about list or “street” pricing.
Comments
8 Responses to “Comments on the 2011 Forrester Wave for Enterprise Data Warehouse Platforms”
Curt,
I agree that the term “EDW” is blurry. Somewhere, I’ve read “EDW = RDBMS + X” whereby X constitutes all the code like extraction programs, scheduling scripts, SQL generated by modeling tools, ETL tool etc. This sounds to me a valid perspective. Gartner, Forrester and yourself seem to focus heavily on the RDBMS piece in that equation, basically “EDW = RDBMS”, which then leads to the question “What the heck? They are only discussing databases and SQL. So what?”. While an RDBMS plays a significant role in an EDW context I argue that an RDBMS is commodity and thus replaceable. It is the “X portion” that is pretty decisive. But that’s too proprietary and not generic enough for analysts. That’s why it gets discarded. But the real-world is different.
Alistair
Alistair,
That sounds like you’re saying RDBMS work well and hence are commodities, while ETL doesn’t work well and hence is the interesting part. There’s certainly some truth in that direction.
But while RDBMS for handling traditional business-transaction data may be approaching commodity status, there’s a whole lot else RDBMS can do too, and that turns out not to have been commoditized yet at all.
And even the parts that you might reasonably call “commodity”? As long as one pays Oracle prices for it, that’s not a commodity at all.
Curt,
As usual, in awe of your ability to distill a ton of complicated concepts and keep the other analysts honest. In defense of Forrester, it’s a hell of a hard job to produce all of this in a market that is so murky. Well done and thanks for the insights, I’ll now go and make money on them! LOL
-NR
[…] Forrester Waves always seem to have weird implicit definitions of “data warehousing”. This one is no exception. […]
[…] Whatever the problems may be with Gartner’s approach, the whole thing comes out better than do Forrester’s failed imitations. […]
Hi
I would question some of what you are saying. Whilst Teradata is widely used and proven, I would say that one problem, as with all MPP systems, is that the front-end node becomes a bottleneck, especially when a query that does not match the partitioning is run. This happens on Netezza and Greenplum also.
Also from a support point of view I would say Sybase IQ is by far the lowest and I’ve also used the likes of Greenplum and Netezza to compare.
I would question that Sybase IQ is not scalable. In my experience it is highly scalable and can grow to petabytes of data and has massive performance scalability.
Hi, John!
Please do tell more about multi-petabyte Sybase IQ installations! The company itself has never mentioned a single one to me.
Anyhow, I don’t hear a lot about head-node-style bottlenecks for Teradata query processing.
[…] for bloggers when they report on the EDW space (see Curt Monash’s review of their last report here). They have a 2013 report out now that is quite mysterious (see […]