MapReduce

Analysis of implementations of and issues associated with the parallel programming framework MapReduce.

October 10, 2010

Partnering with Cloudera

After I criticized the marketing of the Aster/Cloudera partnership, my clients at Aster Data and Cloudera ganged up on me and tried to persuade me I was wrong. Be that as it may, that conversation and others were helpful to me in understanding the core thesis:

October 10, 2010

EMC/Greenplum notes

I dropped by the former Greenplum for my quarterly consulting visit (scheduled for the first week of Q4 for a couple of reasons, one of them XLDB4). Much of what we discussed was purely advisory and/or confidential — duh! — but there were real, nonconfidential takeaways in two areas.

First, feelings about the EMC acquisition are still very positive.


August 21, 2010

The substance of Pentaho’s Hadoop strategy

Pentaho has been talking about a Hadoop-related strategy. Unfortunately, in support of its Hadoop efforts, Pentaho has been — quite insistently — saying things that don’t make a lot of sense to people who know anything about Hadoop.

That said, I think I found four sensible points in Pentaho’s Hadoop strategy, namely:

  1. If you use an ETL tool like Pentaho’s to move things in and out of HDFS, you may be able to orchestrate two more steps in the ETL process than if you used Hadoop’s native orchestration tools.
  2. A lot of what you want to do in MapReduce can be specified graphically in an ETL tool like Pentaho’s. (That would include tokenization or regex matching.)
  3. If you have some really lightweight BI requirements (ad hoc, reporting, or whatever) against HDFS data, you might be content to do it straight against HDFS, rather than moving the data into a real DBMS. If so, BI tools like Pentaho’s might be useful.
  4. Somebody might want to use a screwy version of MapReduce, where by “screwy” I mean anything that isn’t Cloudera Enterprise, Aster Data SQL/MapReduce, or some other implementation/distribution with a lot of supporting tools. In that case, they might need all the tools they can get.

The first of those points is, in the grand scheme of things, pretty trivial.

The third one makes sense. While Hadoop’s Hive client means you could roll your own integration with your favorite BI tool in any case, having a vendor certify the integration for you could be nice. So if Pentaho ships something that works before other vendors do, good on them. (Target date seems to be October.)

The fourth one is kind of sad.

But if there’s any shovel-meet-pony aspect to all this, or indeed a reason for writing this blog post, it would be the second point. If one understands data management, but is in the “Oh no! Hadoop wants me to PROGRAM!” crowd, then being able to specify one’s MapReduce graphically might be a really nice alternative to actually coding it.
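For a sense of what “actually coding it” means, here is a minimal sketch of a tokenize-and-count job of the kind the second point has in mind. It is plain Python that imitates the map, shuffle, and reduce phases in a single process; it is not Hadoop’s API and not anything Pentaho ships. The token pattern and the names map_tokens, shuffle, reduce_counts, and run_job are all hypothetical, chosen only for illustration.

```python
import re
from collections import defaultdict

# Hypothetical token pattern: runs of lowercase letters, digits, and apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def map_tokens(line):
    """Map step: emit a (token, 1) pair for every regex match in one line."""
    for token in TOKEN_RE.findall(line.lower()):
        yield token, 1


def shuffle(pairs):
    """Group values by key, standing in for what a framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce step: sum the counts collected for one token."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce in-process over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_tokens(line))
    return dict(reduce_counts(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    sample = ["Hadoop wants me to program", "Program the map, then the reduce"]
    print(run_job(sample))
```

Even a job this small takes a regex, two functions, and some glue, which is roughly the work a graphical ETL step would hide from the non-programming crowd.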

August 11, 2010

Big Data is Watching You!

There’s a boom in large-scale analytics. The subjects of this analysis may be categorized as:

The most varied, interesting, and valuable of those four categories is the first one.


July 23, 2010

Some interesting links

In no particular order:

June 30, 2010

Cloudera Enterprise and Hadoop evolution

I talked with Cloudera a couple of weeks ago in connection with the impending release of Cloudera Enterprise. I’d say:

May 7, 2010

Clarifying the state of MPP in-database SAS

I am routinely briefed well in advance of product introductions. For that reason and others, it can be hard for me to keep straight what’s been officially announced, introduced for test, introduced for general availability, vaguely planned for the indefinite future, and so on. Perhaps nothing has confused me more in that regard than the SAS Institute’s multi-year effort to get SAS integrated into various MPP DBMS, specifically Teradata, Netezza Twinfin(i), and Aster Data nCluster.

However, I chatted briefly Thursday with Michelle Wilkie, who is the SAS product manager overseeing all this (and also some other stuff, like SAS running on grids without being integrated into a DBMS). As best I understood, the story is:

April 18, 2010

Aster Data’s mapreduce.org site

Aster Data has started a site mapreduce.org, which purports to compile “the best information about MapReduce.” At the moment, mapreduce.org highlights include:

*I did a couple of MapReduce-related webinars for Aster late last year. 🙂 But seriously — Aster does a good job of writing clear and informative collateral.

April 16, 2010

Introduction to Datameer

Elder care issues have flared up with a vengeance, so I’m not going to be blogging much for a while, and surely not at any length. That said, my first post about Datameer was never going to be very long, so let’s get right to it:

March 13, 2010

The Naming of the Foo

Let’s start from some reasonable premises.
