Hadoop confusion from Forrester Research
Jim Kobielus started a recent post
Most Hadoop-related inquiries from Forrester customers come to me. These have moved well beyond the “what exactly is Hadoop?” phase to the stage where the dominant query is “which vendors offer robust Hadoop solutions?”
What I tell Forrester customers is that, yes, Hadoop is real, but that it’s still quite immature.
So far, so good. But I disagree with almost everything Jim wrote after that.
Jim’s thesis seems to be that Hadoop will only be mature when a significant fraction of analytic DBMS vendors have own-branded versions of Hadoop alongside their DBMS, possibly via acquisition. Based on this, he calls for a formal, presumably vendor-driven Hadoop standardization effort, evidently for the whole Hadoop stack. He also says that
Hadoop is the nucleus of the next-generation cloud EDW, but that promise is still 3-5 years from fruition
where by “cloud” I presume Jim means first and foremost “private cloud.”
I don’t think any of that matches Hadoop’s actual strengths and weaknesses, whether now or in the 3-7 year future. My reasoning starts:
- Hadoop is well on its way to being a surviving data-storage-plus-processing system — like an analytic DBMS or DBMS-imitating data integration tool …
- … but Hadoop is best-suited for somewhat different use cases than those technologies are, and the gap won’t close as long as the others remain a moving target.
- I don’t think MapReduce is going to fail altogether; it’s too well-suited for too many use cases.
- Hadoop (as opposed to MapReduce in general) has too much momentum to fizzle, unless perhaps it is supplanted by one or more embrace-and-extend MapReduce-plus systems that do a lot more than it does.
- The way for Hadoop to avoid being a MapReduce afterthought is to evolve sufficiently quickly itself; ponderous standardization efforts are quite beside the point.
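For readers who haven't seen the paradigm up close, the canonical MapReduce example is word count. A minimal sketch in plain Python (emphatically not the actual Hadoop API, just the shape of the map/shuffle/reduce pattern the framework distributes across a cluster):

```python
from collections import defaultdict

def map_phase(documents):
    """Map step: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def shuffle(pairs):
    """Shuffle step: group emitted values by key, as the framework
    does between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce step: aggregate (here, sum) the values for each key."""
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["hadoop stores data", "hadoop processes data"]
counts = reduce_phase(shuffle(map_phase(docs)))
# counts == {"hadoop": 2, "stores": 1, "data": 2, "processes": 1}
```

The appeal for analytics is that the map and reduce functions are embarrassingly parallel: each map task and each reduce task can run on a different node against a different slice of the data, which is what makes the pattern suit so many large-scale use cases.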
As for the rest of Jim’s claim — I see three main candidates for the “nucleus of the next-generation enterprise data warehouse,” each with better claims than Hadoop:
- Relational DBMS, much like today. (E.g., Teradata, DB2, Exadata or their successors.) This is the case in which robustness of the central data store matters most.
- Grand cosmic data integration tools. (The descendants of Informatica PowerCenter, et al.) This is the case in which the logic of data relationships can safely be separated from physical storage.
- Nothing. (The architecture could have several strong members, none of which is truly the “nucleus.”) This is the case in which new ways keep being invented to extract high value from data, outrunning what grandly centralized solutions can adapt to. I think this is the most likely case of all.
Comments
9 Responses to “Hadoop confusion from Forrester Research”
I agree about the standardization rathole; we were discussing this in a recent Hadoop user group.
To me, the key is for the Hadoop development community to continue to aggressively improve the product and lower the barrier to entry. The last thing Hadoop needs is a standardization distraction.
As far as the core stack goes, CDH3 seems to be the de facto choice for all intents and purposes.
We’ve just put our Hadoop strategy through a major left turn because of the alleged patent infringement suit brought by Parallel Iron against AOL, with IBM and others as co-defendants. Parallel Iron claims that HDFS infringes upon a patent that they filed back in the early 2000s. Our Hadoop strategy is now to look for an edition of Hadoop that is not based on HDFS. There are a couple out there. Can you shed some light on this mess? Regardless of who wins in (or out of) court, we don’t want to be anywhere near it from a legal perspective. I suspect many others will follow.
Alan,
It depends on your use case. DataStax/Brisk and Hadapt both make strong arguments, optimized for rather different use cases. The HDFS fans can say, rightly, “We have huge production user examples and those other guys don’t.” But if you don’t want to use HDFS, that’s irrelevant. 🙂
Or you can leave Hadoop and go with Aster Data’s SQL/MR (SQL-MapReduce), which is very cool in theory and also has nice references.
[…] File System? I was surprised to see the comment on Curt Monash's blog post from Alan Scott: http://www.dbms2.com/2011/06/05/… As a developer in related areas, I have learnt not to read others' patents. Can someone who […]
Hi again, Alan,
Upon review, that legal proceeding really doesn’t make any sense.
http://www.dbms2.com/2011/06/10/patent-nonsense-parallel-ironhdfs-edition/
[…] them guardians of all things analytical. In this respect, I agree with Curt Monash’s response to Mr. Kobielus’ post: Hadoop and MapReduce do not require EDW support in order to gain legitimacy in the enterprise. […]
[…] all started when I disputed James Kobielus’ blogged claim that Hadoop is the nucleus of the next-generation cloud EDW. Jim posted again to reiterate the claim, only this time he wrote that all EDW vendors [will soon] […]
[…] analyst Curt Monash isn’t so sure about Hadoop as the foundation of data warehouses, but he did offer this assessment of Hadoop’s […]