Introduction to Zettaset
Zettaset is confusing, but as best I understand it:
- Zettaset sells Hadoop add-on/enhancement software, with what might be called an enterprise-friendly Hadoop management focus.
- “Business intelligence” gets mentioned prominently in Zettaset’s marketing, but not in what executives now say. Apparently the BI focus is old news, predating a hard pivot.
- Zettaset’s marketing also mentions NoSQL, for little reason that I can discern, except insofar as Zettaset relies on HBase.
- CEO Brian Christian told me that Zettaset has been around since December 2007; on the other hand, Zettaset’s press release boilerplate says the company was founded in 2009. Apparently, the distinction is that Zettaset was founded in 2007 as a consultancy, but turned its efforts to software development in 2009.
- Zettaset has fewer than 20 people but is “hiring like mad.”
- Zettaset just did a $3 million Series A round — or maybe is just announcing it now; the latter interpretation might explain how those 20ish people are getting paid.
- Zettaset’s product was just launched and made generally available, notwithstanding that Version 2 of Zettaset’s product was shipped last year to fanfare on Zettaset’s blog.
- Zettaset’s pricing is based on how many terabytes of compressed data it is being used to manage.
- Until very recently, Zettaset was called GOTO Metrics; I imagine the name change is connected to the strategy pivot.
- Zettaset told me of one big customer, namely Zions Bancorporation, with a Hadoop cluster holding almost a petabyte of data before compression.
- Zettaset has “a number of paying customers” overall.
Zettaset’s basic product pitch is that it gives you a management console that not only observes Hadoop services, but actually directs them. So your administrators are saved from having to know Hadoop; they only have to know Zettaset instead. What’s more, Zettaset solves some of Hadoop’s issues for you, such as the NameNode single point of failure. Automated backup got mentioned in my discussions with Zettaset too, as did taking specific nodes in and out of service.
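For concreteness, here is roughly what “taking a node out of service” looks like in plain Apache Hadoop — i.e., the kind of manual chore a console like Zettaset’s presumably automates. This is a minimal sketch, assuming dfs.hosts.exclude already points at /etc/hadoop/conf/dfs.exclude; the file path is my invention for illustration, not anything Zettaset described:

```java
// Sketch: decommissioning a DataNode with stock Hadoop tooling.
// Assumes dfs.hosts.exclude is configured to /etc/hadoop/conf/dfs.exclude
// (an assumed path, for illustration only).
import java.io.FileWriter;

import org.apache.hadoop.hdfs.tools.DFSAdmin;
import org.apache.hadoop.util.ToolRunner;

public class DecommissionNode {
    public static void main(String[] args) throws Exception {
        String hostname = args[0]; // e.g. "datanode17.example.com"

        // Step 1: add the node to the exclude file the NameNode reads.
        FileWriter excludes = new FileWriter("/etc/hadoop/conf/dfs.exclude", true);
        excludes.write(hostname + "\n");
        excludes.close();

        // Step 2: tell the NameNode to re-read its include/exclude lists.
        // It then re-replicates the node's blocks before marking it decommissioned.
        ToolRunner.run(new DFSAdmin(), new String[] { "-refreshNodes" });
    }
}
```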
Also, the Zettaset folks largely come from a security background. Not coincidentally, encryption, retention assurance, and so on are in near-term (by year-end or so) product plans, with a compliance orientation. I like that idea, because of how well it fits with a subset of the big bit bucket use case.
While I was trying to sort out various uncertainties, Zettaset CEO Brian Christian sent over the comment:
Our ‘Hadoop’ story relates to traditional business requirements – monitoring, alerting, back-up and recovery, security and continuous integration. The Hadoop stack and subsequent packages have over 30 processes which are currently stand alone products without automated safeguards. Zettaset productizes Apache Hadoop by automating those safeguards so IT can securely leverage Hadoop without a Professional Service commitment, or building/maintaining those 30 processes with in-house Hadoop expertise.
Presumably the “currently” in that refers to vanilla Apache Hadoop, not to Cloudera Enterprise.
The coolest part of the Zettaset story is probably what Zettaset does with HBase to get around the HDFS small-files/file-count limit problem (a code sketch follows this list):
- Rather than store file metadata in RAM on a Hadoop NameNode, Zettaset stores it in HBase, so that it can be striped across the whole cluster just like any other HBase data is.
- Further, some files are entirely stored — I presume as BLOBs — in the same HBase table that holds the metadata for all files.
- There is a user-configurable file size threshold that determines whether a file is stored:
  - in HBase as described above, or
  - directly in HDFS in the usual way.
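To make that concrete, here is a minimal sketch of the tiering logic, in roughly the HBase client API of the day. The table name, column families, and 1 MB threshold are all assumptions of mine for illustration; Zettaset didn’t describe its actual schema to me:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class TieredFileStore {
    // Hypothetical user-configurable threshold: 1 MB.
    private static final int SMALL_FILE_THRESHOLD = 1 << 20;

    private final HTable metaTable; // row key = file path; invented schema
    private final FileSystem hdfs;

    public TieredFileStore() throws Exception {
        Configuration conf = HBaseConfiguration.create();
        this.metaTable = new HTable(conf, "file_meta");
        this.hdfs = FileSystem.get(conf);
    }

    /** Every file gets a metadata row in HBase; small files are inlined as BLOBs. */
    public void store(String path, byte[] contents) throws Exception {
        Put put = new Put(Bytes.toBytes(path));
        put.add(Bytes.toBytes("meta"), Bytes.toBytes("size"),
                Bytes.toBytes(contents.length));

        if (contents.length <= SMALL_FILE_THRESHOLD) {
            // Small file: keep the bytes in the same row as the metadata,
            // so both get striped across the cluster by HBase.
            put.add(Bytes.toBytes("data"), Bytes.toBytes("blob"), contents);
        } else {
            // Large file: write to HDFS in the usual way; the HBase row
            // just records where the file lives.
            FSDataOutputStream out = hdfs.create(new Path(path));
            out.write(contents);
            out.close();
            put.add(Bytes.toBytes("meta"), Bytes.toBytes("hdfs_path"),
                    Bytes.toBytes(path));
        }
        metaTable.put(put);
    }
}
```

Either way, every lookup goes through the HBase row rather than NameNode RAM, which is the trade-off the comments below debate.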
4 Responses to “Introduction to Zettaset”
Keeping NN metadata in a distributed K/V store will result in significantly degraded overall performance. Fetching data from RAM versus from a remote host’s disk? How can you compare the two?
DOA.
But will that cost be large when compared with the cost of actually retrieving the file, which is kind of the point of a NameNode lookup?
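Back-of-envelope, with round numbers that are assumptions rather than measurements: a RAM lookup is well under a microsecond, a networked HBase read is on the order of a few milliseconds, and streaming a single 64 MB HDFS block at 50 MB/second takes over a second. If those figures are even roughly right, the metadata lookup is noise compared to the read it precedes.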
Per the Google GFS II references, the NameNode-equivalent metadata is stored in Bigtable, which keeps it in memory rather than on disk. So I think NN metadata in a distributed K/V store may resolve this problem if used properly.
[…] a follow up to the MongoDB positioning itself as Big Data and development agile environment, I’ve found this bit of data on Curt Monash’s […]