June 26, 2012

Is salesforce.com going to stick with Oracle?

Surprisingly often, I’m asked “Is salesforce.com going to stick with Oracle?” So let me refer to and expand upon my previous post about salesforce.com’s database architecture by saying:

Some day, Marc Benioff will probably say “We turned off Oracle across most of our applications a while ago, and nobody outside the company even noticed.”


Note: This blog post is less readable than it would be if I’d found a better workaround to WordPress’ bugs in the area of nested bullet points. I’m sorry.

June 26, 2012

Teradata SQL-H, using HCatalog

When I grumbled about the conference-related rush of Hadoop announcements, one example of many was Teradata Aster’s SQL-H. Still, it’s an interesting idea, and a good hook for my first shot at writing about HCatalog. Indeed, other than the Talend integration bundled into Hortonworks’ HDP 1, Teradata SQL-H is the first real use of HCatalog I’m aware of.

The Teradata SQL-H idea is:

At least in theory, Teradata SQL-H lets you use a full set of analytic tools against your Hadoop data, with little limitation except price and/or performance. Teradata thinks the performance of all this can be much better than if you just use Hadoop (35X was mentioned in one particularly favorable example), but perhaps much worse than if you just copy/extract the data to an Aster cluster in the first place.
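For context on what HCatalog contributes here: it publishes Hive-metastore table definitions so that other tools can read Hadoop data by table name rather than by file path and format. Below is a minimal sketch of a Hadoop MapReduce job consuming an HCatalog-registered table, assuming HCatalog 0.4-era APIs (package names moved in later releases); the database and table names are hypothetical. SQL-H presumably taps the same metadata from the Aster side.

```java
// Minimal sketch: reading an HCatalog-registered table from Java MapReduce.
// Assumes HCatalog 0.4-era APIs; "default" and "weblogs" are hypothetical.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hcatalog.mapreduce.HCatInputFormat;
import org.apache.hcatalog.mapreduce.InputJobInfo;

public class HCatReadSketch {
    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "hcat-read-sketch");
        job.setJarByClass(HCatReadSketch.class);

        // HCatalog supplies the table's schema, file format, and location,
        // so the job needs no knowledge of HDFS paths or SerDes.
        HCatInputFormat.setInput(job,
            InputJobInfo.create("default", "weblogs", null /* no partition filter */));
        job.setInputFormatClass(HCatInputFormat.class);

        // ... configure mapper/reducer and output as usual; map input
        // values arrive as HCatRecord objects ...
    }
}
```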

So what might the use cases be for something like SQL-H? Offhand, I’d say:

By way of contrast, the whole thing makes less sense for dashboarding kinds of uses, unless the dashboard users are very patient when they want to drill down.

June 25, 2012

Why I’m so forward-leaning about Hadoop features

In my recent series of Hadoop posts, there were several cases where I had to choose between recommending that enterprises:

I favored the more advanced features each time. Here’s why.

To a first approximation, I divide Hadoop use cases into two major buckets, only one of which I was addressing with my comments:

1. Analytic data management.* Here I favored features over reliability because they are more important, for Hadoop as for analytic RDBMS before it. When somebody complains about an analytic data store not being ready for prime time, never really working, or causing them to tear their hair out, what they usually mean is that:

Those complaints are much, much more frequent than “It crashed”. So it was for Netezza, DATAllegro, Greenplum, Aster Data, Vertica, Infobright, et al. So it also is for Hadoop. And how does one address those complaints? By performance and feature enhancements, of the kind that the Hadoop community is introducing at high speed. Read more

June 19, 2012

Notes on HBase 0.92

This is part of a four-post series, covering:

As part of my recent round of Hadoop research, I talked with Cloudera’s Todd Lipcon. Naturally, one of the subjects was HBase, and specifically HBase 0.92. I gather that the major themes of HBase 0.92 are:

HBase coprocessors are Java code that links straight into HBase. As with other DBMS extensions of the “links straight into the DBMS code” kind,* HBase coprocessors seem best suited for very sophisticated users and third parties.** Evidently, coprocessors have already been used to make HBase security more granular — role-based, per-column-family/per-table, etc. Further, Todd thinks coprocessors could serve as a good basis for future HBase enhancements in areas such as aggregation or secondary indexing. Read more
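To make the “links straight into the DBMS code” point concrete, here is a hedged sketch of a region observer, assuming the 0.92-era coprocessor API (signatures changed in later HBase releases); the column family and the access rule are hypothetical, not taken from any shipping security implementation.

```java
// Hypothetical sketch of an HBase 0.92-style RegionObserver coprocessor
// that refuses reads against one column family. The hook runs inside the
// region server process, which is exactly the "links straight into the
// DBMS" property discussed above.
import java.io.IOException;
import java.util.List;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.util.Bytes;

public class RestrictedFamilyObserver extends BaseRegionObserver {
    private static final byte[] RESTRICTED = Bytes.toBytes("secret");

    // Invoked before every Get that the region serves.
    @Override
    public void preGet(ObserverContext<RegionCoprocessorEnvironment> e,
                       Get get, List<KeyValue> results) throws IOException {
        if (get.getFamilyMap().containsKey(RESTRICTED)) {
            throw new IOException("reads of column family 'secret' are blocked");
        }
    }
}
```

A production access-control coprocessor would of course check the caller’s identity rather than reject everyone; the point is only where the code runs.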

June 19, 2012

“Enterprise-ready Hadoop”

This is part of a four-post series, covering:

The posts depend on each other in various ways.

Cloudera, Hortonworks, and MapR all claim, in effect, “Our version of Hadoop is enterprise-ready, unlike those other guys’.” I’m dubious.

That said, “enterprise-ready Hadoop” really is an important topic.

So what does it mean for something to be “enterprise-ready”, in whole or in part? Common themes in distinguishing between “enterprise-class” and other software include:

For Hadoop, as for most things, these concepts overlap in many ways. Read more

June 19, 2012

Hadoop distributions: CDH 4, HDP 1, Hadoop 2.0, Hadoop 1.0 and all that

This is part of a four-post series, covering:

The posts depend on each other in various ways.

My clients at Cloudera and Hortonworks have somewhat different views as to the maturity of various pieces of Hadoop technology. In particular:

*”CDH” stands, due to some trademarking weirdness, for “Cloudera’s Distribution including Apache Hadoop”. “HDP” stands for “Hortonworks Data Platform”.

Read more

June 19, 2012

Hadoop marketing themes that deserve to be ignored

This is part of a four-post series, covering:

The posts depend on each other in various ways.

I am subjected to much Hadoop marketing. Indeed, I even help various clients inflict Hadoop marketing upon the world. But a guy’s got to draw a line somewhere, and there are certain Hadoop marketing themes that I just refuse to take seriously.

1. Big data. I think the term “big data” long ago jumped the shark. If a firm uses the term “big data”, I teeth-grittingly let it pass. But if they send me PR email offering to “explain” the benefits or “real meaning” of “big data”, my response is apt to be unkind.

2. Conference-timed news. I’ve never liked the custom of multiple vendors piling announcements into the same conference week. It seems like a calculated strategy to ensure getting the least possible mind share and attention — unless, of course, your announcement is so lame that brief mentions in conference-week roundups are the most visibility you can hope to get. Even so, many vendors make the marketing choice to pile on. Fine. But I’ll write in response if and when I feel like it.

3. Contribution Olympics. The Urinary Olympics as to who contributed more lines of code, patches, whatever to various Hadoop sub-projects got pretty silly; and although it peaked last year, elements of it are with us still. I do see two scenarios where the whole discussion might have genuine value, namely:

Otherwise, however, I pay little attention to claims like “We thought this scheme up 2 years ago, and hence we’re the experts on whether it’s now ready for production.”

June 18, 2012

Introduction to MemSQL

I talked with MemSQL shortly before today’s launch. MemSQL technology basics are:

MemSQL’s performance claims include:

MemSQL company basics include: Read more
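Since MemSQL speaks the MySQL wire protocol, any MySQL client stack should be able to talk to it. Here is a hedged sketch using plain JDBC and the stock MySQL driver; the host, credentials, and table are hypothetical.

```java
// Hypothetical sketch: querying MemSQL through the ordinary MySQL JDBC
// driver, relying on MemSQL's MySQL wire-protocol compatibility.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class MemSQLSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://memsql-host:3306/test", "root", "")) {
            // MemSQL compiles each distinct statement shape to native code,
            // so repeated parameterized executions reuse the compiled plan.
            PreparedStatement ps = conn.prepareStatement(
                "SELECT id, ts FROM events WHERE id = ?");
            ps.setInt(1, 42);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getInt("id") + " " + rs.getTimestamp("ts"));
                }
            }
        }
    }
}
```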

June 16, 2012

Metamarkets’ back-end technology

This is part of a three-post series:

The canonical Metamarkets batch ingest pipeline is a bit complicated.

By “get data ready to be put into Druid” I mean:

That metadata is what goes into the MySQL database, which also retains data about shards that have been invalidated. (That part is needed because of the MVCC.)
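To illustrate (not document) what such a metadata store might hold, here is a hypothetical sketch of a segment table created over JDBC; the schema is guessed from the description above, with one row per shard version and a validity flag for MVCC supersession, and is not Metamarkets’ actual schema.

```java
// Hypothetical sketch of a Druid-style segment-metadata table in MySQL.
// Column names and types are guesses based on the surrounding prose.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class SegmentMetadataSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:mysql://mysql-host:3306/druid_meta", "druid", "");
             Statement st = conn.createStatement()) {
            st.executeUpdate(
                "CREATE TABLE IF NOT EXISTS segments (" +
                "  segment_id  VARCHAR(255) PRIMARY KEY," +
                "  data_source VARCHAR(255) NOT NULL," +
                "  start_time  DATETIME NOT NULL," +     // time range the shard covers
                "  end_time    DATETIME NOT NULL," +
                "  version     VARCHAR(64) NOT NULL," +  // MVCC: newer versions supersede older ones
                "  is_valid    BOOLEAN NOT NULL)");      // flipped off when a shard is invalidated
        }
    }
}
```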

By “build the data segments” I mean:

When things are being done that way, Druid may be regarded as comprising three kinds of servers: Read more

June 16, 2012

Metamarkets Druid overview

This is part of a three-post series:

My clients at Metamarkets are planning to open source part of their technology, called Druid, which is described in the Druid section of Metamarkets’ blog. The timing is a bit unclear; I know the target date under NDA, but it’s not set in stone. But if you care, you can probably contact the company to get involved earlier than the official unveiling.

I imagine that open-source Druid will be pretty bare-bones in its early days. Code was first checked in early in 2011, and Druid seems to have averaged around 1 full-time developer since then. What’s more, it’s not obvious that all the features I’m citing here will be open-sourced; indeed, some of the ones I’m describing probably won’t be.

In essence, Druid is a distributed analytic DBMS. Druid’s design choices are best understood when you recall that it was invented to support Metamarkets’ large-scale, RAM-speed, internet marketing/personalization SaaS (Software as a Service) offering. In particular:

Interestingly, the single-table/multi-valued choice is echoed at WibiData, which deals with similar data sets. However, WibiData’s use cases are different from Metamarkets’, and in most respects the WibiData architecture is quite different from that of Metamarkets/Druid.

Read more
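For a flavor of how clients might interact with a distributed analytic DBMS like this once it is open-sourced, here is a hedged sketch of posting a JSON timeseries query to a Druid broker over HTTP; the endpoint path, port, data source, interval, and metric are all hypothetical.

```java
// Hypothetical sketch: issuing a Druid-style timeseries aggregation as
// JSON over HTTP. All names, the port, and the endpoint path are guesses
// for illustration only.
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Scanner;

public class DruidQuerySketch {
    public static void main(String[] args) throws Exception {
        String query =
            "{\"queryType\":\"timeseries\"," +
            " \"dataSource\":\"ad_events\"," +
            " \"granularity\":\"hour\"," +
            " \"intervals\":[\"2012-06-01/2012-06-02\"]," +
            " \"aggregations\":[{\"type\":\"longSum\"," +
            "   \"name\":\"impressions\",\"fieldName\":\"impressions\"}]}";

        // The broker would fan the query out to compute nodes and merge
        // their per-segment results before responding.
        HttpURLConnection conn = (HttpURLConnection)
            new URL("http://broker-host:8080/druid/v2/").openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(query.getBytes("UTF-8"));
        }
        try (Scanner sc = new Scanner(conn.getInputStream(), "UTF-8")) {
            while (sc.hasNextLine()) {
                System.out.println(sc.nextLine());
            }
        }
    }
}
```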
