Hadoop: And then there were three
Hortonworks, IBM, EMC Pivotal and others have announced a project called “Open Data Platform” to do … well, I’m not exactly sure what. Mainly, it sounds like:
- An attempt to minimize the importance of any technical advantages Cloudera or MapR might have.
- A face-saving way to admit that IBM’s and Pivotal’s insistence on having their own Hadoop distributions has been silly.
- An excuse for press releases.
- A source of an extra logo graphic to put on marketing slides.
Edit: Now there’s a press report saying explicitly that Hortonworks is taking over Pivotal’s Hadoop distro customers (which basically would mean taking over the support contracts and then working to migrate them to Hortonworks’ distro).
The claim is being made that this announcement solves some kind of problem about developing to multiple versions of the Hadoop platform, but to my knowledge that’s a problem rarely encountered in real life. When you already have a multi-enterprise open source community agreeing on APIs (Application Programming Interfaces), what API inconsistency remains for a vendor consortium to painstakingly resolve?
Anyhow, it now seems clear that if you want to use a Hadoop distribution, there are three main choices:
- Cloudera’s flavor, whether as software (from Cloudera) or in an appliance (e.g. from Oracle).
- MapR’s flavor, as software from MapR.
- Hortonworks’ flavor, from a number of vendors, including Hortonworks, IBM, Pivotal, Teradata et al.
In saying that, I’m glossing over a few points, such as:
- There are various remote services that run Hadoop, most famously Amazon’s Elastic MapReduce.
- You could get Apache Hadoop directly, rather than using the free or paid versions of a vendor distro. But why would you make that choice, unless you’re an internet bad-ass on the level of Facebook, or at least think that you are?
- There will surely always be some proprietary stuff mixed into, for example, IBM’s BigInsights, so as to preserve at least the perception of all-important vendor lock-in.
But the main point stands — big computer companies, such as IBM, EMC (Pivotal) and previously Intel, are figuring out that they can’t bigfoot something that started out as an elephant — stuffed or otherwise — in the first place.
If you think I’m not taking this whole ODP thing very seriously, you’re right.
Related links
- It’s a bit eyebrow-raising to see Mike Olson take a “more open source than thou” stance about something, but basically his post about this news is spot-on.
- My take on Hadoop distributions two years ago might offer context. Trivia question: What’s the connection between the song that begins that post and the joke that ends it?
Comments
11 Responses to “Hadoop: And then there were three”
[…] I don’t find the Open Data Platform thing very significant, an associated piece of news seems cooler — Pivotal is open sourcing a […]
[…] ODP is a sign of weakness for the sponsoring members. Analyst Curt Monash described it[3] as “A face-saving way to admit that IBM’s and Pivotal’s insistence on having […]
Lord Ganesha says Google.
Elephants grant blessings of prosperity and wisdom.
https://en.wikipedia.org/wiki/Ganesha
Coda. Lord Ganesha made Google!
http://blog.pivotal.io/pivotal/p-o-v/open-data-platform-initiative-putting-an-end-to-faux-pen-source-apache-hadoop-distributions sheds a new light on ODP.
There are actually quite a few API inconsistencies to resolve.
But these inconsistencies are not found by looking at each project individually. Distributions operate at a layer above that, where all these projects interact with each other.
They have to answer questions such as “If I upgrade Apache Hive on a Kerberos-enabled cluster, will it break Apache HBase integration?”
And while Apache projects have great test coverage, and a lot of effort goes into testing them individually, this sort of question is very difficult to answer without testing.
Which is also why these sorts of tests are rarely contributed back to the Apache community.
I am curious how this effort will pan out, but if it enables the various actors to contribute scenarios or test cases they care about, it will be a huge win for everyone.
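The cross-component concern the commenter raises can be sketched with a toy model: each project may pass its own tests, but only certain version combinations have been tested together, and an upgrade is only as safe as the least-tested pairing it creates. (All component names and version pairs below are hypothetical illustrations, not a real compatibility matrix.)

```python
# Toy sketch of the problem a distribution solves at the layer above
# individual projects: tracking which version combinations have actually
# been integration-tested together. Versions here are made up.

# Pairs of (component, version) known to interoperate, e.g. from
# whole-stack integration testing rather than per-project test suites.
TESTED_COMBINATIONS = {
    (("hbase", "0.98"), ("hive", "0.13")),
    (("hbase", "0.98"), ("hive", "0.14")),
    (("hbase", "1.0"), ("hive", "1.1")),
}

def upgrade_is_safe(cluster, component, new_version):
    """Check a proposed upgrade against every other installed component."""
    for other, other_version in cluster.items():
        if other == component:
            continue
        # Canonical ordering so (hbase, hive) and (hive, hbase) match.
        pair = tuple(sorted([(component, new_version), (other, other_version)]))
        if pair not in TESTED_COMBINATIONS:
            return False  # untested combination: integration may break
    return True

cluster = {"hbase": "0.98", "hive": "0.13"}
print(upgrade_is_safe(cluster, "hive", "0.14"))  # True: tested with hbase 0.98
print(upgrade_is_safe(cluster, "hive", "1.1"))   # False: untested with hbase 0.98
```

The point of the sketch is that the check lives outside any single project: no amount of Hive-only or HBase-only testing populates this matrix, which is why such knowledge tends to stay inside distribution vendors rather than flowing back upstream.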
So according to that blog Pivotal has been planning this move to open source since the first half of 2014 (or longer)?
ODP reminiscent of OSF?
http://en.m.wikipedia.org/wiki/Open_Software_Foundation
[…] opinion was shared by analyst Curt Monash, who dismissed the effort as “a face-saving way to admit that IBM’s and Pivotal’s insistence on having […]
[…] API works against a test harness. Speaking of certification, Ion basically agrees with my views on ODP, although like many — most? — people he expresses himself more politely than I […]
>> You could get Apache Hadoop directly, rather than using the free or paid versions of a vendor distro. But why would you make that choice, unless you’re an internet bad-ass on the level of Facebook, or at least think that you are?
I disagree with the point that you need to be an internet bad-ass to use the Apache Hadoop distro. I have first-hand experience of using Apache Hadoop in production for over 3 years now. Never had any issues. In fact, the Apache Hadoop distro can run purely on an unprivileged account (no root access/sudo required). There has been the occasional need to patch Hive, mainly because we use a rarely-used database for the metastore.
Deenar
[…] in San Jose Tuesday afternoon, Cloudera’s Doug Cutting had responded as did Curt Monash the next day. The level of activities in this market makes it difficult to judge the merit of such broad […]