Hadoop: And then there were three
Hortonworks, IBM, EMC Pivotal and others have announced a project called “Open Data Platform” to do … well, I’m not exactly sure what. Mainly, it sounds like:
- An attempt to minimize the importance of any technical advantages Cloudera or MapR might have.
- A face-saving way to admit that IBM’s and Pivotal’s insistence on having their own Hadoop distributions has been silly.
- An excuse for press releases.
- A source of an extra logo graphic to put on marketing slides.
Edit: Now there’s a press report saying explicitly that Hortonworks is taking over Pivotal’s Hadoop distro customers (which basically would mean taking over the support contracts and then working to migrate them to Hortonworks’ distro).
The claim is being made that this announcement solves some kind of problem about developing to multiple versions of the Hadoop platform, but to my knowledge that’s a problem rarely encountered in real life. When you already have a multi-enterprise open source community agreeing on APIs (Application Programming Interfaces), what API inconsistency remains for a vendor consortium to painstakingly resolve?
Anyhow, it now seems clear that if you want to use a Hadoop distribution, there are three main choices:
- Cloudera’s flavor, whether as software (from Cloudera) or in an appliance (e.g. from Oracle).
- MapR’s flavor, as software from MapR.
- Hortonworks’ flavor, from a number of vendors, including Hortonworks, IBM, Pivotal, Teradata et al.
In saying that, I’m glossing over a few points, such as:
- There are various remote services that run Hadoop, most famously Amazon’s Elastic MapReduce.
- You could get Apache Hadoop directly, rather than using the free or paid versions of a vendor distro. But why would you make that choice, unless you’re an internet bad-ass on the level of Facebook, or at least think that you are?
- There will surely always be some proprietary stuff mixed into, for example, IBM’s BigInsights, so as to preserve at least the perception of all-important vendor lock-in.
But the main point stands — big computer companies, such as IBM, EMC (Pivotal) and previously Intel, are figuring out that they can’t bigfoot something that started out as an elephant — stuffed or otherwise — in the first place.
If you think I’m not taking this whole ODP thing very seriously, you’re right.
Related links
- It’s a bit eyebrow-raising to see Mike Olson take a “more open source than thou” stance about something, but basically his post about this news is spot-on.
- My take on Hadoop distributions two years ago might offer context. Trivia question: What’s the connection between the song that begins that post and the joke that ends it?
Comments
11 Responses to “Hadoop: And then there were three”
[…] I don’t find the Open Data Platform thing very significant, an associated piece of news seems cooler — Pivotal is open sourcing a […]
[…] ODP is a sign of weakness for the sponsoring members. Analyst Curt Monash described it[3] as “A face-saving way to admit that IBM’s and Pivotal’s insistence on having […]
Lord Ganesha says Google.
Elephants grant blessings of prosperity and wisdom.
https://en.wikipedia.org/wiki/Ganesha
Coda. Lord Ganesha made Google!
http://blog.pivotal.io/pivotal/p-o-v/open-data-platform-initiative-putting-an-end-to-faux-pen-source-apache-hadoop-distributions sheds a new light on ODP.
There are actually quite a few API inconsistencies to resolve.
But these inconsistencies are not found by looking at each project individually. Distributions operate at a layer above that, where all these projects interact with each other.
They have to answer questions such as “If I upgrade Apache Hive on a Kerberos-enabled cluster, will it break Apache HBase integration?”
And while Apache projects have great test coverage, and a lot of effort goes into testing them individually, this sort of question is very difficult to answer without testing.
Which is also why these sorts of tests are rarely contributed back to the Apache community.
I am curious how this effort will pan out, but if it enables the various actors to contribute scenarios or test cases they care about, it will be a huge win for everyone.
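The cross-component concern the commenter raises can be sketched with a toy model: each project may pass its own tests, but only certain version combinations have been tested together, and an upgrade is only as safe as the least-tested pairing it creates. (All component names and version pairs below are hypothetical illustrations, not a real compatibility matrix.)

```python
# Toy sketch of the problem a distribution solves at the layer above
# individual projects: tracking which version combinations have actually
# been integration-tested together. Versions here are made up.

# Pairs of (component, version) known to interoperate, e.g. from
# whole-stack integration testing rather than per-project test suites.
TESTED_COMBINATIONS = {
    (("hbase", "0.98"), ("hive", "0.13")),
    (("hbase", "0.98"), ("hive", "0.14")),
    (("hbase", "1.0"), ("hive", "1.1")),
}

def upgrade_is_safe(cluster, component, new_version):
    """Check a proposed upgrade against every other installed component."""
    for other, other_version in cluster.items():
        if other == component:
            continue
        # Canonical ordering so (hbase, hive) and (hive, hbase) match.
        pair = tuple(sorted([(component, new_version), (other, other_version)]))
        if pair not in TESTED_COMBINATIONS:
            return False  # untested combination: integration may break
    return True

cluster = {"hbase": "0.98", "hive": "0.13"}
print(upgrade_is_safe(cluster, "hive", "0.14"))  # True: tested with hbase 0.98
print(upgrade_is_safe(cluster, "hive", "1.1"))   # False: untested with hbase 0.98
```

The point of the sketch is that the check lives outside any single project: no amount of Hive-only or HBase-only testing populates this matrix, which is why such knowledge tends to stay inside distribution vendors rather than flowing back upstream.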
So according to that blog Pivotal has been planning this move to open source since the first half of 2014 (or longer)?
ODP reminiscent of OSF?
http://en.m.wikipedia.org/wiki/Open_Software_Foundation
[…] opinion was shared by analyst Curt Monash, who dismissed the effort as “a face-saving way to admit that IBM’s and Pivotal’s insistence on having […]
[…] API works against a test harness. Speaking of certification, Ion basically agrees with my views on ODP, although like many — most? — people he expresses himself more politely than I […]
>> You could get Apache Hadoop directly, rather than using the free or paid versions of a vendor distro. But why would you make that choice, unless you’re an internet bad-ass on the level of Facebook, or at least think that you are?
I disagree with the point that you need to be an internet bad-ass to use the Apache Hadoop distro. I have first-hand experience of using Apache Hadoop in production for over 3 years now. Never had any issues. In fact, the Apache Hadoop distro can run purely on an unprivileged account (no root access/sudo required). There has been the occasional need to patch Hive, mainly because we use a rarely-used database for the metastore.
Deenar
[…] in San Jose Tuesday afternoon, Cloudera’s Doug Cutting had responded as did Curt Monash the next day. The level of activities in this market makes it difficult to judge the merit of such broad […]