Hadoop news and rumors, June 23, 2013
Cloudera
- Cloudera changed CEOs last week. Tom Reilly, late of ArcSight, is the new guy (I don’t know him), while Mike Olson’s titles become Chairman and Chief Strategy Officer. Mike told me Friday that Reilly had secretly been working with him for months.
- Mike shared good-sounding numbers with me. But little is for public disclosure except the stat >400 employees.
- There are always rumors of infighting at Cloudera, perhaps because from earliest days Cloudera was a place where tempers are worn on sleeves. That said, Mike denied stories of problems between him and COO Kirk Dunn, and greatly praised Kirk’s successes at large-account sales.
- Cloudera now self-identifies pretty clearly as an analytic data management company. The vision is multiple execution engines – MapReduce, Impala, something more memory-centric, etc. – talking to any of a variety of HDFS file formats. While some formats may be optimized for specific engines – e.g. Parquet for Impala – anything can work with more or less anything.*
- Mike told me that Cloudera didn’t have any YARN users in production, but thought there would be some by year-end. Even so, he thinks it’s fair to say that Cloudera users have substantial portions of Hadoop 2 in production, for example NameNode failover and HDFS (Hadoop Distributed File System) performance enhancements. Ditto HCatalog.
*Of course, there will always be exceptions. E.g., some formats can be updated on a short-request basis, while others can only be written to via batch conversions.
Everybody else
- There’s a widespread belief that Hortonworks is being shopped. Numerous folks – including me — believe the rumor of an Intel offer for $700 million. Higher figures and alternate buyers aren’t as widely believed.
- Views of MapR market traction, never high, are again on the downswing.
- IBM Big Insights seems to have some traction.
- In case there was any remaining doubt — DBMS vendors are pretty unanimous in agreeing that it makes sense to have Hadoop too. To my knowledge SAP hasn’t been as clear about showing a markitecture incorporating Hadoop as most of the others have … but then, SAP’s markitecture is generally less clear than other vendors’.
- Folks I talk with are generally wondering where and why Datameer lost its way. That still leaves Datameer ahead of other first-generation Hadoop add-on vendors (Karmasphere, Zettaset, et al.), in that I rarely hear them mentioned at all.
- I visited with my client Platfora. Things seem to be going very well.
- My former client Revelytix seems to have racked up some nice partnerships. (I had something to do with that. :))
Categories: Cloudera, Data warehousing, Datameer, Hadoop, Hortonworks, IBM and DB2, Intel, MapR, Market share and customer counts, Platfora, SAP AG, Zettaset | 11 Comments |
Introduction to Zettaset
Zettaset is confusing, but as best I understand:
- Zettaset sells Hadoop add-on/enhancement software, with what might be called an enterprise-friendly Hadoop management focus.
- “Business intelligence” gets mentioned prominently in Zettaset’s marketing, but not in what executives now say. Apparently the BI focus is old news, predating a hard pivot.
- Zettaset’s marketing also mentions NoSQL, for little reason that I can discern, except insofar as Zettaset relies on HBase.
- CEO Brian Christian told me that Zettaset has been around since December, 2007; on the other hand, Zettaset press release boilerplate says Zettaset was founded in 2009. Apparently, the distinction is that Zettaset was founded in 2007 as a consultancy, but turned its efforts to software development in 2009.
- Zettaset has fewer than 20 people but is “hiring like mad.”
- Zettaset just did a $3 million Series A round — or maybe is just announcing it now; the latter interpretation might explain how those 20ish people are getting paid.
- Zettaset’s product was just launched and made generally available, notwithstanding that Version 2 of Zettaset’s product was shipped last year to fanfare on Zettaset’s blog.
- Zettaset’s pricing is based on how many terabytes of compressed data it is being used to manage.
- Until very recently, Zettaset was called GOTO Metrics; I imagine the name change is connected to the strategy pivot.
- Zettaset told me of one big customer — with an almost-petabyte Hadoop cluster before compression — namely Zions Bancorporation.
- Zettaset has “a number of paying customers” overall.
Categories: Hadoop, HBase, MapReduce, Market share and customer counts, Specific users, Zettaset | 4 Comments |
Hadoop futures and enhancements
Hadoop is immature technology. As such, it naturally offers much room for improvement in both industrial-strengthness and performance. And since Hadoop is booming, multiple efforts are underway to fill those gaps. For example:
- Cloudera’s proprietary code is focused on management, set-up, etc.
- The “Phase 1” plans Hortonworks shared with me for Apache Hadoop are focused on industrial-strengthness, as are significant parts of “Phase 2”.*
- MapR tells a performance story versus generic Apache Hadoop HDFS and MapReduce. (One aspect of same is just C++ vs. Java.)
- So does Hadapt, but mainly vs. Hive.
- Cloudera also tells me there’s a potential 4-5X performance improvement in Hive coming down the pike from what amounts to an optimizer rewrite.
(Zettaset belongs in the discussion too, but made an unfortunate choice of embargo date.)
Categories: Cloudera, Greenplum, Hadapt, Hadoop, HBase, MapR, MapReduce, Parallelization, Zettaset | 20 Comments |
Hadoop hardware and compression
A month ago, I posted about typical Hadoop hardware. After talking today with Eric Baldeschwieler of Hortonworks, I have an update. I also learned some things from Eric and from Brian Christian of Zettaset about Hadoop compression.
First the compression part. Eric thinks 6-10X compression is common for “curated” Hadoop data — i.e., the data that actually gets used a lot. Brian used an overall figure of 6-8X, and told of a specific customer who had 6X or a little more. By way of comparison, it sounds as if the kinds of data involved are like what Vertica claimed 10-60X compression for almost three years ago.
Eric also made an excellent point about low-value machine-generated data. I was suggesting that as Moore’s Law made sensor networks ever more affordable: Read more
Categories: Cloudera, Database compression, Hadoop, Hortonworks, Storage, Vertica Systems, Zettaset | 10 Comments |