Cloudera and Hortonworks
My clients at Cloudera have been around for a while, in effect positioned as “the Hadoop company.” Their business, in a nutshell, consists of:
- Packaging up a Cloudera distribution of Apache Hadoop. This distribution doesn’t have proprietary code; it’s just packaged by Cloudera from Apache projects (with a decent minority of the code happening to have been contributed by Cloudera engineers).
- Paid subscription support for Apache Hadoop and, in connection with that …
- … proprietary software that all support customers automatically get. There are two points to this proprietary software:
  - It adds value for the customer.
  - It makes Cloudera’s support job easier.
- Professional services around Hadoop.
- Training and conferences around Hadoop, which probably don’t generate all that much money, but are great marketing in terms of visibility, thought leadership, and lead generation.
Hortonworks spun out of Yahoo last week, with parts of the Cloudera business model, namely Hadoop support, training, and I guess conferences. Hortonworks emphatically rules out professional services, and says that it will contribute all code back to Apache Hadoop. Hortonworks does grudgingly admit that it might get into the proprietary software business at some point — but evidently hopes that day will never actually come.
Hortonworks’ two main initial marketing messages — and there’s some synergy between these — boil down to:
- Open source purism
- “We have most of the Hadoop developers, so we’re better”*
Frankly, the open source purism part sounds like doubletalk to me, in that Hortonworks has trouble articulating what supposedly-less-pure Cloudera does wrong that Hortonworks will do better. However, I’ve been hearing for a long time that Yahoo’s MapReduce developers feel very strongly about open source, so perhaps this is in part an emotional issue for them. More substantively, it fits well with the pro-Hortonworks story I’ve outlined below.
*“We have most of the Hadoop developers” seems fairly defensible, give or take dueling definitions of “committer,” “core developer,” “patch” or for that matter “Hadoop.”
The other branch of the Hortonworks marketing message can be lampooned as “We’re the right folks to identify your bugs, since we’re probably the ones who put them there in the first place.” More darkly, that pitch could be “If you want the bugs fixed that bother you, we’re the ones who have control over whether or not that happens.” Well, maybe. But I also see Cloudera having a couple of years of experience supporting Hadoop, as well as shipping some code that perhaps makes Hadoop more supportable.
That’s the skeptical view. A more favorable view of Hortonworks’ prospects would go something like this:
- One version of Apache Hadoop is plenty.
- Cloudera (and arguably other Hadoop platform software vendors) sell capabilities that will soon be eclipsed by core Apache Hadoop. Folks should just please wait.
- Now that Hortonworks is an independent company focused on the task, it will speedily solve the packaging problems that have made Cloudera’s Hadoop distribution (perceived to be) necessary.
- Yahoo and IBM both back Hortonworks’ approach. That’s got to count for something.
- Apache Hadoop will be quickly enhanced, and Hortonworks will be driving the enhancements. Hortonworks simply is the top Hadoop authority.
We’ll see. Cloudera’s been around for a couple years, has smart people, and by definition has no technical inferiority to Hortonworks (since it has access to all Hortonworks’ code). What’s more, it will be a long time before Hadoop technology is so mature that there’s nothing left to do; add-on software should long prove to be useful. As for “We’re purer about open source than the other guys” — well, I’m dubious that that will turn out to be a great marketing message.
And so I think Cloudera is the early favorite in the competition. But perhaps Hadoop users will be able to play Cloudera and Hortonworks off against each other in price negotiations. Perhaps, notwithstanding my skepticism about Hadoop appliances, some hardware vendors will play them against each other for appliance partnerships.
Meanwhile, whatever else happens, I’m pretty psyched about some enhancements the Hortonworks folks plan to lead for Hadoop.
Related links
- A Hortonworks/Apache Hadoop slide deck Hortonworks graciously allowed me to post
- Cloudera’s post about its recent 3.5 release of Cloudera Enterprise
- Pros and cons of professional services efforts at young software companies
Comments
9 Responses to “Cloudera and Hortonworks”
[…] a new Hadoop company spun out of Yahoo, graciously permitted me to post a slide deck outlining an Apache Hadoop roadmap. Phase 1 refers to […]
I very much hope that there isn’t a fork in the future. These parties need to work together, on the same branch of code, which means sharing each others’ contributions.
Those who care about performance and reliability will go with MapR. It’ll probably be years before Apache Hadoop catches up with them in terms of performance and features.
Dan,
Cloudera vs. Hortonworks isn’t a matter of a fork, so far as I can see, even though Hortonworks talks a fair amount about that concern.
MapR, Brisk, et al. are indeed forks, if not pitchforks.
Hi Curt,
Thanks for covering Hortonworks. In particular, I appreciate you pointing to the slides we shared with you. I’d also like to point out our Hadoop Summit slides to anyone interested in learning more about Hortonworks’ plans for improving Apache Hadoop: http://www.hortonworks.com/hadoop-summit-presentations/.
I’d like to respond to a few of the points that you made in your post. First, I respectfully disagree with your assertion that our marketing message is limited to open source purism and “we have most of the Hadoop developers”. I’ve summarized points we covered in our conversation for your readers. Our objectives at Hortonworks are to:
1. Make Apache Hadoop projects easier to install, manage and use. We believe that anyone should be able to easily deploy Hadoop projects downloaded directly from Apache.
2. Make Apache Hadoop more robust. Much of this is spelled out in the slides referenced above. We plan to improve Hadoop performance, add high availability and improve administration and monitoring.
3. Make Apache Hadoop easier to integrate and extend. We want to work with technology vendors and other community members to create or improve open APIs that will make it easier to extend and experiment with Apache Hadoop.
This is not about focusing our energies on competing with any other vendor. We want to make Apache Hadoop better for everyone. There will still be value in having third parties package Apache Hadoop and add incremental functionality on top of it. These vendors will benefit from our work on core Apache Hadoop just like we will benefit from their contributions back to core Apache Hadoop. That’s one of the great things about open source. We firmly believe that we are in the early stages of a fundamental shift in how organizations store, manage and analyze the ever-increasing volume of data created inside and outside of their company’s walls. We believe that by focusing our efforts on making Apache Hadoop better, Apache Hadoop will become the de facto big data platform, which will create a huge business opportunity not only for Hortonworks but other vendors as well.
I know that a Hortonworks vs. Cloudera battle is a compelling story, but it’s clearly not our focus and I highly doubt it’s Cloudera’s focus either. Both companies can benefit from jointly working to improve Apache Hadoop. Our futures are much brighter because we are both going to be out there helping enterprises and technology vendors adopt Apache Hadoop.
Eric Baldeschwieler (a.k.a. Eric14), Hortonworks
Twitter @jeric14, @hortonworks
Eric,
Fair enough that your marketing messaging talks about a bunch of things other than head-to-head vs. Cloudera. That’s why, for example, I had a whole other blog post about the general Apache Hadoop roadmap as laid out by you.
But for now, the default prospect view has to be “Coolness! Hortonworks is going to help make Apache Hadoop better. All the more reason to install Hadoop and have it be supported by the company with a track record of supporting it, Cloudera.” And I do tend to focus on those aspects of a company’s marketing message that are relevant to its closest head-to-head competition.
‘But for now, the default prospect view has to be “Coolness! Hortonworks is going to help make Apache Hadoop better. All the more reason to install Hadoop and have it be supported by the company with a track record of supporting it, Cloudera.”’
As a spectator, that last comment sounded pretty biased. I think Cloudera has some great things for the community. However, give Hortonworks a chance. They’ve outlined plans and have been in operation for less than two weeks. Take Eric and his team at their word for now. Cloudera has a great team, but so does Hortonworks.
Jeremy,
As I said — “for now”. Cloudera has the lead, and Hortonworks could surely at some point overtake them. Both companies are still very, very young. But if I had a choice of getting support from an organization that has ~100 support clients (or whatever it is), with a >2 year history of serving some of them, vs. an organization with 1 client and no history, I know who I’d be more inclined to rely on in the short term.
[…] Cloudera and Hortonworks – dbms2 – July 2011 And then there is the animated discussion on who contributes more to the Hadoop source repo – e.g. number of patches vs. lines of code! Very entertaining stuff. […]