Context for Cloudera
Hadoop World/Strata is this week, so of course my clients at Cloudera will have a bunch of announcements. Without front-running those, I think it might be interesting to review the current state of the Cloudera product line. Details may be found on the Cloudera product comparison page. Examining those details helps, I think, with understanding where Cloudera does and doesn’t place sales and marketing focus, which given Cloudera’s Hadoop market stature is in my opinion an interesting thing to analyze.
So far as I can tell (and there may be some errors in this, as Cloudera is not always accurate in explaining the fine details):
- CDH (Cloudera Distribution … Hadoop) contains a lot of Apache open source code.
- Cloudera has a much longer list of Apache projects that it thinks comprise “Core Hadoop” than, say, Hortonworks does.
- Specifically, that list currently is: Hadoop, Flume, HCatalog, Hive, Hue, Mahout, Oozie, Pig, Sentry, Sqoop, Whirr, ZooKeeper.
- In addition to those projects, CDH also includes HBase, Impala, Spark and Cloudera Search.
- Cloudera Manager is closed-source code, much of which is free to use. (I.e., “free like beer” but not “free like speech”.)
- Cloudera Navigator is closed-source code that you have to pay for (free trials and the like excepted).
- Cloudera Express is Cloudera’s favorite free subscription offering. It combines CDH with the free part of Cloudera Manager. Note: Cloudera Express was previously called Cloudera Standard, and that terminology is still reflected in parts of Cloudera’s website.
- Cloudera Enterprise is the umbrella name for Cloudera’s three favorite paid offerings.
- Cloudera Enterprise Basic Edition contains:
- All the code in CDH and Cloudera Manager, and I guess Accumulo code as well.
- Commercial licenses for all that code.
- A license key to use the entirety of Cloudera Manager, not just the free part.
- Support for the “Core Hadoop” part of CDH.
- Support for Cloudera Manager. Note: Cloudera is lazy about saying this explicitly, but it seems obvious.
- The code for Cloudera Navigator, but that’s moot, as the corresponding license key for Cloudera Navigator is not part of the package.
- Cloudera Enterprise Data Hub Edition contains:
- Everything in Cloudera Basic Edition.
- A license key for Cloudera Navigator.
- Support for all of HBase, Accumulo, Impala, Spark, Cloudera Search and Cloudera Navigator.
- Cloudera Enterprise Flex Edition contains everything in Cloudera Basic Edition, plus support for one of the extras in Data Hub Edition.
In analyzing all this, I’m focused on two particular aspects:
- The “zero, one, many” system for defining the editions of Cloudera Enterprise.
- The use of “Data Hub” as a general marketing term.
Given its role as a highly influential yet still small “platform” vendor in a competitive open source market, Cloudera even more than most vendors faces the dilemma:
- Cloudera wants customers to adopt its views as to which Hadoop-related technologies they should use.
- However, Cloudera doesn’t want to be in the position of trying to ram some particular unwanted package down a customer’s throat.
The Flex/Data Hub packaging fits great with that juggling act, because Cloudera — and hence also Cloudera salespeople — get paid exactly as much when customers pick 2 Flex options as when they use all 5-6. If you prefer Cassandra or MongoDB to HBase, Cloudera is fine with that. Ditto if you prefer CitusDB or Vertica or Teradata Hadapt to Impala. Thus Cloudera can avoid a lot of religious wars, even if it can’t entirely escape Hortonworks’ “More open source than thou” positioning.
Meanwhile, so far as I can tell, Cloudera currently bets on the “Enterprise Data Hub” as its core proposition, as evidenced by that term being baked into the name of Cloudera’s most comprehensive and expensive offering. Notes on the EDH start:
- Cloudera also portrays “enterprise data hub” as an architectural/reference architecture concept.
- “Enterprise data hub” doesn’t really mean anything very different from “data lake” + “data refinery”; Cloudera just thinks it sounds more important. Indeed, Cloudera claims that the other terms are dismissive or disparaging, at least in some usages.
Cloudera’s long-term dream is clearly to make Hadoop the central data platform for an enterprise, while RDBMS fill more niche (or of course also legacy) roles. I don’t think that will ever happen, because I don’t think there really will be one central data platform in the future, any more than there has been in the past. As I wrote last year on appliances, clusters and clouds,
Ceteris paribus, fewer clusters are better than more of them. But all things are not equal, and it’s not reasonable to try to reduce your clusters to one — not even if that one is administered with splendid efficiency by low-cost workers, in a low-cost building, drawing low-cost electric power, in a low-cost part of the world.
and earlier in the same post
… these are not persuasive reasons to put everything on a SINGLE cluster or cloud. They could as easily lead you to have your VMware cluster and your Exadata rack and your Hadoop cluster and your NoSQL cluster and your object storage OpenStack cluster — among others — all while participating in several different public clouds as well.
One system is not going to be optimal for all computing purposes.
Comments
2 Responses to “Context for Cloudera”
Leave a Reply
[…] Hadoop World, Cloudera naturally put out a flurry of press releases. In anticipation, I put out a context-setting post last weekend. That said, the gist of the news seems to […]
[…] Cloudera’s move to “zero/one/many” pricing, Cloudera salespeople have little incentive to push HBase hard to accounts other than […]