MapReduce

Analysis of implementations of and issues associated with the parallel programming framework MapReduce. Related subjects include:

February 7, 2008

Why the Great MapReduce Debate broke out

While chatting with Mike Stonebraker today, I finally understood why he and Dave DeWitt launched the Great MapReduce Debate:

It was all about academia.

DeWitt noticed cases where study of MapReduce replaced study of real database management in the computer science curriculum. And he thought some MapReduce-related research papers were at best misleading. So DeWitt and Stonebraker decided to set the record straight.

Fireworks ensued.

Categories: MapReduce, Michael Stonebraker

5 Comments

January 24, 2008

Is MapReduce a good underpinning for next-gen scientific DBMS?

Back in November, Mike Stonebraker suggested that there’s a need for database management advances to serve “big science”. He said:

Obviously, the best solution to these … problems would be to put everything in a next-generation DBMS — one capable of keeping track of data, metadata, and lineage. Supporting the latter would require all operations on the data to be done inside the DBMS with user-defined functions — Postgres-style.

Categories: Data types, MapReduce, Scientific research

A passionate defense of MapReduce

Mark Chu-Carroll has weighed in with a passionate defense of MapReduce. I only see one thing he got wrong, which was to overlook the great shared-nothing parallelism of today’s data warehouse appliances and specialty data warehouse DBMS. But that doesn’t detract from his overall point, which is that MapReduce is designed to help with parallel computing in general, not database querying in particular.

He also has the best version I know of an old observation, namely:

… [relational database] people have found the most beautiful, wonderful, perfect hammer in the whole world. It’s perfectly balanced – not too heavy, not too light, and swings just right to pound in a nail just right every time. The grip is custom-made, fitted to the shape of the owners hand, so that they can use it all day without getting any blisters. It’s also beautifully decorated – encrusted with gemstones and gold filigree – but only in places that won’t detract from how well it works as a hammer. It really is the greatest hammer ever. Relational database guys love their hammer. It’s just such a wonderful tool! And when they make something with it, it really comes out great. In fact, they like it so much that they think it’s the only tool they need. If you give them a screw, they’ll just pound it in like it’s a nail. And when you point out to them that dammit, it’s a screw, not a nail, they’ll say “I know that. But you can’t expect me to use a crappy little screwdriver when I have a magnificent hammer like this!”

Categories: Data models and architecture, Database diversity, MapReduce, Parallelization

MapReduce for data mining? Maybe for variable-schema analytics.

Rich Skrenta is quite a successful entrepreneur, so it’s likely that he doesn’t really mean the more ridiculous parts of this rant on the MapReduce debate. E.g., he cheerfully disregards the fact that the data warehouse appliance vendors have ALREADY disrupted the market he’s focusing on. Index-light row-based and columnar systems are both super fast at data mining extracts.

But let’s go straight to the one interesting thing he said, Read more

Categories: Analytic technologies, MapReduce, Parallelization, SAS Institute

2 Comments

January 18, 2008

The Great MapReduce Debate

Google’s highly parallel file manipulator MapReduce has gotten great attention recently, after a research paper revealed:

MapReduce is running the core Google search engine, plus much of Google Analytics and other applications.
MapReduce is processing 400+ petabytes of data per month.

(Niall Kennedy popularized the paper and surveyed its results.)

David DeWitt and Mike Stonebraker then launched a blistering attack on MapReduce, accusing it of disregarding almost all the lessons of database management system theory and practice. A vigorous comment thread has ensued, pointing out that MapReduce is not a DBMS and asserting it therefore shouldn’t be judged as one.

While correct, that defense begs the question – what is MapReduce good for? Proponents of MapReduce highlight two advantages:

MapReduce makes it very easy to program data transformations, including ones to which relational structures are of little relevance.
MapReduce runs in massively parallel mode “for free,” without extra programming.

Based on those advantages, MapReduce would indeed seem to have significant uses, including: Read more

Categories: Cloud computing, MapReduce, Michael Stonebraker

10 Comments

← Previous Page

Search our blogs and white papers

Monash Research blogs

DBMS 2 covers database management, analytics, and related technologies.
Text Technologies covers text mining, search, and social software.
Strategic Messaging analyzes marketing and messaging strategy.
The Monash Report examines technology and public policy issues.
Software Memories recounts the history of the software industry.

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.

Links
- Monash Research
- White Papers
Admin
- Log in