August 25, 2008

MapReduce sound bites

Last Thursday, Greenplum and Aster Data (the two most recent of my numerous data warehouse specialist customers) both told me of the same major innovation. Each was rushing to announce it first. This led to considerable tap dancing, with the upshot that both are releasing the information tonight or tomorrow morning.

What’s going on is that Aster Data and Greenplum have both integrated MapReduce into their respective MPP shared-nothing data warehouse DBMSs. I’ll write about that at length very shortly, but for now let me offer some sound bites ahead of the more detailed analysis:

MPP shared-nothing database managers like Greenplum or Aster Data give great performance. But sometimes you need to do even better. That’s where MapReduce comes in.
On its own, MapReduce can do a lot of important work in data manipulation and analysis. Integrating it with SQL should just increase its applicability and power.
Google’s internal use of MapReduce is impressive. So is Hadoop’s success. Now commercial implementations of MapReduce are getting their shots too.
At its core, most data analysis is really pretty simple – it boils down to arithmetic, Boolean logic, sorting, and not a lot else. MapReduce can handle a significant fraction of that.
The hardest part of data analysis is often the recognition of entities or semantic equivalences. The rest is arithmetic, Boolean logic, sorting, and so forth. MapReduce is already proven in use cases encompassing all of those areas.
MapReduce isn’t about data management, at least not primarily. It’s about parallelism.
MapReduce offers dramatic performance gains in analytic application areas that still need major speed-ups.
MapReduce isn’t needed for tabular data management. That’s been efficiently parallelized in other ways. But if you want to build non-tabular structures such as text indexes or graphs, MapReduce turns out to be a big help.
In principle, any alphanumeric data at all can be stuffed into tables. But in high-dimensional scenarios, those tables are super-sparse. That’s when MapReduce can offer big advantages by bypassing relational databases. Examples of such scenarios are found in CRM and relationship analytics.
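
To make the points about parallelism and non-tabular structures a little more concrete, here is a minimal Python sketch of the MapReduce pattern building a tiny text index. It is an illustration only, not the Aster Data, Greenplum, Google, or Hadoop API: the function names (map_document, shuffle, reduce_postings, build_index) are hypothetical, and a local thread pool stands in for the cluster that a real system would spread the work across.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_document(doc):
    """Map step: emit one (word, doc_id) pair per word in a document."""
    doc_id, text = doc
    return [(word.lower(), doc_id) for word in text.split()]


def shuffle(mapped_outputs):
    """Shuffle step: group every emitted value under its key."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for word, doc_id in pairs:
            groups[word].append(doc_id)
    return groups


def reduce_postings(item):
    """Reduce step: turn one word's doc ids into a sorted, de-duplicated list."""
    word, doc_ids = item
    return word, sorted(set(doc_ids))


def build_index(docs, workers=2):
    """Run map and reduce in parallel; the thread pool is a stand-in (assumption)
    for the worker nodes of a real MapReduce deployment."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_document, docs))
        groups = shuffle(mapped)
        reduced = list(pool.map(reduce_postings, groups.items()))
    return dict(reduced)


if __name__ == "__main__":
    sample_docs = [
        (1, "MapReduce is about parallelism"),
        (2, "Parallelism is not data management"),
    ]
    for word, postings in sorted(build_index(sample_docs).items()):
        print(word, postings)
```

The programmer only writes the map and reduce functions, each of which sees one document or one key at a time. That independence is what lets a framework run them on as many workers as it likes.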

Categories: Analytic technologies, Aster Data, Greenplum, MapReduce, Parallelization

Comments

11 Responses to “MapReduce sound bites”

Steve Wooledge on August 25th, 2008 8:21 pm

Hi Curt,

Uniting MapReduce with RDBMS/SQL is sure to be game changing. Looking forward to seeing your detailed analysis in further posts. Here is a link to the Aster In-Database MapReduce implementation and technical whitepaper. http://www.asterdata.com/product/mapreduce.html

Snarky Baloogle on August 26th, 2008 2:23 am

Yuck! When I look at the AsterData whitepaper I have to think these people don’t know anything about DBMS! Do they think they’ve invented UDFs?!

For some supposedly smart people they sure don’t seem to do their homework.

BRAND NEW! ROUND CIRCULAR PART FOR ROLLING CARTS! We call it a “weheeell” or weel for short, it’s much better than those square parts with less bumps when you roll.

We provide a normal square weheeell, a chisel and an axe with instructions for creating our special new ROUND weheeell…

Database vendors add Google’s MapReduce | about ICT on August 26th, 2008 3:38 pm

[…] "On its own, MapReduce can do a lot of important work in data manipulation and analysis. Integrating it with SQL should just increase its applicability and power," wrote Curt Monash of Monash Research, on the DBMS2 blog. […]

Steve Wooledge on August 26th, 2008 8:49 pm

Snarky – appreciate the discussion. On the surface, it’s easy to miscategorize In-Database MapReduce functions as UDFs. We’ve posted some thoughts on how they are in fact different. This is not just an attempt to revive an old brand (i.e., reinvent the wheel): http://www.asterdata.com/blog/index.php/2008/08/26/in-database-mapreduce-functions-not-your-granddaddys-udf/

Three approaches to parallelizing data transformation | DBMS2 -- DataBase Management System Services on August 27th, 2008 5:21 am

[…] Sound bites about MapReduce […]

Database vendors add Google’s MapReduce | TechHairBall.com on August 28th, 2008 11:08 am

[…] "On its own, MapReduce can do a lot of important work in data manipulation and analysis. Integrating it with SQL should just increase its applicability and power," wrote Curt Monash of Monash Research, on the DBMS2 blog. […]

Luke Lonergan on August 28th, 2008 1:11 pm

There is a coding tutorial available at this link in the middle of the page: http://www.greenplum.com/resources/mapreduce/

Key things to note about Greenplum’s MR implementation:
– It’s very similar in form and expression to Google and Hadoop
– Extensions for Joins and Pipelined task execution
– Native parallel file access
– Parallelism is full and transparent to the programmer

In summary: we have implemented MapReduce so that you can write programs in SQL, Perl, Python, and many more languages. It is straightforward to take MR programs written for Hadoop or Google and port them to Greenplum.

Steve Wooledge on August 29th, 2008 6:18 pm

Just a couple points on Aster’s implementation of MapReduce:
+ Developers can use Java, Python, C, Perl, and more
+ Aster’s In-Database MapReduce framework is a superset of MapReduce
+ Aster has a process management framework to guarantee transparency and availability

More in our whitepaper here:
http://www.asterdata.com/product/whitepaper_mapreduce.html

MapReduce links | DBMS2 -- DataBase Management System Services on September 5th, 2008 4:09 pm

[…] Sound bites about MapReduce […]

Infology.Ru » Blog Archive » Several theses on MapReduce on November 3rd, 2008 2:30 pm

[…] Author: Curt Monash. Original publication date: 2008-08-25. Translation: Олег Кузьменко. Source: Curt Monash’s blog […]

Winning with Data: Aster Data Systems Blog » Blog Archive » TDWI MapReduce Nightschool Recap on November 14th, 2008 8:06 am

[…] and technology community, with recent coverage in the NY Times and by influential blogs like DBMS2, Beyond Search, and Cloud N, just to name a […]
