Hadoop confusion from Forrester Research
Jim Kobielus started a recent post
Most Hadoop-related inquiries from Forrester customers come to me. These have moved well beyond the “what exactly is Hadoop?” phase to the stage where the dominant query is “which vendors offer robust Hadoop solutions?”
What I tell Forrester customers is that, yes, Hadoop is real, but that it’s still quite immature.
So far, so good. But I disagree with almost everything Jim wrote after that.
Jim’s thesis seems to be that Hadoop will only be mature when a significant fraction of analytic DBMS vendors have own-branded versions of Hadoop alongside their DBMS, possibly via acquisition. Based on this, he calls for a formal, presumably vendor-driven Hadoop standardization effort, evidently for the whole Hadoop stack. He also says that
Hadoop is the nucleus of the next-generation cloud EDW, but that promise is still 3-5 years from fruition
where by “cloud” I presume Jim means first and foremost “private cloud.”
I don’t think any of that matches Hadoop’s actual strengths and weaknesses, whether now or in the 3-7 year future. My reasoning starts:
- Hadoop is well on its way to being a surviving data-storage-plus-processing system — like an analytic DBMS or DBMS-imitating data integration tool …
- … but Hadoop is best-suited for somewhat different use cases than those technologies are, and the gap won’t close as long as the others remain a moving target.
- I don’t think MapReduce is going to fail altogether; it’s too well-suited for too many use cases.
- Hadoop (as opposed to MapReduce in general) has too much momentum to fizzle, unless perhaps it is supplanted by one or more embrace-and-extend MapReduce-plus systems that do a lot more than it does.
- The way for Hadoop to avoid being a MapReduce afterthought is to evolve sufficiently quickly itself; ponderous standardization efforts are quite beside the point.
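For readers who haven't seen the paradigm up close, the canonical MapReduce example is word count. A minimal sketch in plain Python (emphatically not the actual Hadoop API, just the shape of the map/shuffle/reduce pattern the framework distributes across a cluster):

```python
from collections import defaultdict

def map_phase(documents):
    """Map step: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def shuffle(pairs):
    """Shuffle step: group emitted values by key, as the framework
    does between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce step: aggregate (here, sum) the values for each key."""
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["hadoop stores data", "hadoop processes data"]
counts = reduce_phase(shuffle(map_phase(docs)))
# counts == {"hadoop": 2, "stores": 1, "data": 2, "processes": 1}
```

The appeal for analytics is that the map and reduce functions are embarrassingly parallel: each map task and each reduce task can run on a different node against a different slice of the data, which is what makes the pattern suit so many large-scale use cases.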
As for the rest of Jim’s claim — I see three main candidates for the “nucleus of the next-generation enterprise data warehouse,” each with better claims than Hadoop:
- Relational DBMS, much like today. (E.g., Teradata, DB2, Exadata or their successors.) This is the case in which robustness of the central data store matters most.
- Grand cosmic data integration tools. (The descendants of Informatica PowerCenter, et al.) This is the case in which the logic of data relationships can safely be separated from physical storage.
- Nothing. (The architecture could have several strong members, none of which is truly the “nucleus.”) This is the case in which new ways keep being invented to extract high value from data, outrunning what grandly centralized solutions can adapt to. I think this is the most likely case of all.
Comments
9 Responses to “Hadoop confusion from Forrester Research”
I agree about the standardization rathole; we were discussing this in a recent Hadoop user group.
To me, the key is for the Hadoop development community to continue to aggressively improve the product and lower the barrier to entry. The last thing Hadoop needs is a standardization distraction.
As far as the core stack goes, CDH3 seems to be the de facto choice for all intents and purposes.
We’ve just put our Hadoop strategy through a major left turn because of the alleged patent infringement suit brought by Parallel Iron against AOL, with IBM and others as co-defendants. Parallel Iron claims that HDFS infringes upon a patent that they filed back in the early 2000s. Our Hadoop strategy is now to look for an edition of Hadoop that is not based on HDFS. There are a couple out there. Can you shed some light on this mess? Regardless of who wins in (or out of) court, we don’t want to be anywhere near it from a legal perspective. I suspect many others will follow.
Alan,
It depends on your use case. DataStax/Brisk and Hadapt both make strong arguments, optimized for rather different use cases. The HDFS fans can say, rightly, “We have huge production user examples and those other guys don’t.” But if you don’t want to use HDFS, that’s irrelevant. 🙂
Or you can leave Hadoop and go with Aster Data’s SQL/MR (SQL-MapReduce), which is very cool in theory and also has nice references.
[…] File System? I was surprised to see the comment on Curt Monash's blog post from Alan Scott: http://www.dbms2.com/2011/06/05/… As a developer in related areas, I have learnt not to read others' patents. Can someone who […]
Hi again, Alan,
Upon review, that legal proceeding really doesn’t make any sense.
http://www.dbms2.com/2011/06/10/patent-nonsense-parallel-ironhdfs-edition/
[…] them guardians of all things analytical. In this respect, I agree with Curt Monash’s response to Mr. Kobielus’ post: Hadoop and MapReduce do not require EDW support in order to gain legitimacy in the enterprise. […]
[…] all started when I disputed James Kobielus’ blogged claim that Hadoop is the nucleus of the next-generation cloud EDW. Jim posted again to reiterate the claim, only this time he wrote that all EDW vendors [will soon] […]
[…] analyst Curt Monash isn’t so sure about Hadoop as the foundation of data warehouses, but he did offer this assessment of Hadoop’s […]