Hadoop notes
I visited California recently, and chatted with numerous companies involved in Hadoop — Cloudera, Hortonworks, MapR, DataStax, Datameer, and more. I’ll defer further Hadoop technical discussions for now — my target to restart them is later this month — but that still leaves some other issues to discuss, namely adoption and partnering.
The total number of enterprises in the world paying subscription and license fees that they would regard as being for “Hadoop or something Hadoop-related” probably is not much over 100 right now, but I’d expect to see pretty rapid growth. Beyond that, let’s divide customers into three groups:
- Internet businesses.
- Traditional enterprises’ internet operations.
- Traditional enterprises’ other operations.
Hadoop vendors, in different mixes, claim to be doing well in all three segments. Even so, almost all use cases involve some kind of machine-generated data, with one exception being a credit card vendor crunching a large database of transaction details. Multiple kinds of machine-generated data come into play — web/network/mobile device logs, financial trade data, scientific/experimental data, and more. In particular, pharmaceutical research got some mentions, which makes sense, in that it’s one area of scientific research that actually enjoys fat for-profit research budgets.
On the partnering side, I heard things about a Hortonworks conference call that do not seem to have been contradicted by my visit to Hortonworks. Namely, Hortonworks promised prospective partners, such as analytic DBMS vendors, hardware vendors, or large system integrators, that it wouldn’t compete with them, pledging not to introduce its own products for at least two years. This is presumably targeted most directly at Cloudera, which has lots of partners, but also some proprietary code of its own. MapR, I’d think, would be the #2 target, but that’s just speculation.
The other big part of Hortonworks’ story is the claim that it holds the axe in Apache Hadoop development. Nobody doubts that a large fraction of the work on Hadoop’s core projects was done by Yahoo employees. Many of those indeed moved to Hortonworks; others left Yahoo earlier; Hadoop creator Doug Cutting is actually at Cloudera. So just how dominant Hortonworks really is in core Hadoop development is a bit unclear. Meanwhile, Cloudera people seem to be leading a number of Hadoop companion or sub-projects, including the first two I can think of that relate to Hadoop integration or connectivity, namely Sqoop and Flume. So I’m not persuaded that the “we know this stuff better” part of the Hortonworks partnering story really holds up.
What I am persuaded of is that the Hadoop platform competition is a good thing. Whichever vendors and projects win will be healthier from having had to outcompete worthy alternatives.
Comments
5 Responses to “Hadoop notes”
Yes, but what about the most recent round of suits by Parallel Iron for simply using Hadoop? Hulu, Twitter, two Amazon companies, EMC, and a bunch of small players are defendants in Parallel Iron’s latest suit.
Under Cloudera’s terms & conditions YOU indemnify THEM for using their software.
With a patent troll out there actively persecuting & prosecuting Hadoop users, people will be frightened away from Hadoop or at least apprehensive about publicizing their usage of Hadoop.
The next Parallel Iron suit may reference this blog post & list “Cloudera, Hortonworks, MapR, DataStax, Datameer, and more” as defendants. The defendants in the latest suit were gleaned mostly from an Information Week article about Hadoop…
If the pain becomes significant, I would hope and trust a joint legal effort/defense fund would emerge.
But as per http://www.dbms2.com/2011/06/10/patent-nonsense-parallel-ironhdfs-edition/ , it’s a remarkably bogus patent claim anyhow.
“So I’m not persuaded that the “we know this stuff better” part of the Hortonworks partnering story really holds up.”
One of the beautiful things about open projects like Apache Hadoop is that the contributions are transparent, so it’s easy for everyone to see who is doing what, and how that may or may not change over time. It’s in the open, and all can see for themselves.
http://www.hortonworks.com/the-yahoo-effect/
Cheers,
E14