MapReduce sound bites
Last Thursday, Greenplum and Aster Data, the two most recent of my numerous data warehouse specialist customers, both told me about the same major innovation. Each was rushing to announce it before anybody else did. This led to considerable tap dancing, and the upshot is that both are releasing the news tonight or tomorrow morning.
What’s going on is that Aster Data and Greenplum have both integrated MapReduce into their respective MPP shared-nothing data warehouse DBMS. I’ll write about that at length very shortly, but for now let me throw up some sound bites ahead of the more detailed analysis:
- MPP shared-nothing database managers like Greenplum or Aster Data give great performance. But sometimes you need to do even better. That’s where MapReduce comes in.
- On its own, MapReduce can do a lot of important work in data manipulation and analysis. Integrating it with SQL should just increase its applicability and power.
- Google’s internal use of MapReduce is impressive. So is Hadoop’s success. Now commercial implementations of MapReduce are getting their shots too.
- At its core, most data analysis is really pretty simple – it boils down to arithmetic, Boolean logic, sorting, and not a lot else. MapReduce can handle a significant fraction of that.
- The hardest part of data analysis is often the recognition of entities or semantic equivalences. The rest is arithmetic, Boolean logic, sorting, and so forth. MapReduce is already proven in use cases encompassing all of those areas.
- MapReduce isn’t about data management, at least not primarily. It’s about parallelism.
- MapReduce offers dramatic performance gains in analytic application areas that still badly need speed-ups.
- MapReduce isn’t needed for tabular data management. That’s been efficiently parallelized in other ways. But if you want to build non-tabular structures such as text indexes or graphs, MapReduce turns out to be a big help.
- In principle, any alphanumeric data at all can be stuffed into tables. But in high-dimensional scenarios, those tables are super-sparse. That’s when MapReduce can offer big advantages by bypassing relational databases. Examples of such scenarios are found in CRM and relationship analytics.
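To make the text-index point concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases building an inverted index. It is a toy illustration under stated assumptions: the function names (`map_doc`, `shuffle`, `reduce_postings`, `build_index`) and the sample documents are hypothetical, and nothing here reflects the actual Greenplum or Aster Data interfaces. In a real shared-nothing deployment the map and reduce calls would run in parallel on different nodes, and the shuffle would move data across the network.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_doc(doc_id: str, text: str) -> Iterable[Tuple[str, str]]:
    """Map phase: emit one (term, doc_id) pair per distinct term in a document."""
    for term in set(text.lower().split()):
        yield term, doc_id


def shuffle(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Shuffle phase: group every emitted value by its key (here, the term)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for term, doc_id in pairs:
        groups[term].append(doc_id)
    return groups


def reduce_postings(term: str, doc_ids: List[str]) -> Tuple[str, List[str]]:
    """Reduce phase: turn one term's doc ids into a sorted, de-duplicated postings list."""
    return term, sorted(set(doc_ids))


def build_index(docs: Dict[str, str]) -> Dict[str, List[str]]:
    # Each map call looks only at its own document, so on a cluster
    # these calls could run on different nodes at the same time.
    mapped = (pair for doc_id, text in docs.items() for pair in map_doc(doc_id, text))
    grouped = shuffle(mapped)
    # Each reduce call looks only at one term's group, so it parallelizes too.
    return dict(reduce_postings(term, ids) for term, ids in grouped.items())


if __name__ == "__main__":
    docs = {
        "d1": "MapReduce is about parallelism",
        "d2": "SQL is about data management",
    }
    index = build_index(docs)
    print(index["about"])  # ['d1', 'd2']
    print(index["sql"])    # ['d2']
```

The point of the sketch is that neither the map function nor the reduce function shares state with other calls, which is what lets a framework spread the work across a shared-nothing cluster. The output is a postings structure rather than a table, which is the kind of non-tabular result described above.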
Some of our recent links about MapReduce
- The integration of MapReduce with SQL data warehousing
- Three major applications of MapReduce
- Another application of MapReduce
- Sound bites about MapReduce
- Other links about MapReduce
Comments
11 Responses to “MapReduce sound bites”
Hi Curt,
Uniting MapReduce with RDBMS/SQL is sure to be game changing. Looking forward to seeing your detailed analysis in further posts. Here is a link to the Aster In-Database MapReduce implementation and technical whitepaper. http://www.asterdata.com/product/mapreduce.html
Yuck! When I look at the AsterData whitepaper I have to think these people don’t know anything about DBMS! Do they think they’ve invented UDFs?!
For some supposedly smart people they sure don’t seem to do their homework.
BRAND NEW! ROUND CIRCULAR PART FOR ROLLING CARTS! We call it a “weheeell”, or weel for short. It’s much better than those square parts, with fewer bumps when you roll.
We provide a normal square weheeell, a chisel and an axe with instructions for creating our special new ROUND weheeell…
[…] "On its own, MapReduce can do a lot of important work in data manipulation and analysis. Integrating it with SQL should just increase its applicability and power," wrote Curt Monash of Monash Research, on the DBMS2 blog. […]
Snarky – appreciate the discussion. On the surface, it’s easy to miscategorize In-Database MapReduce functions as UDFs. We’ve posted some thoughts on how they are in fact different. This is not just an attempt to revive an old brand (i.e., reinvent the wheel): http://www.asterdata.com/blog/index.php/2008/08/26/in-database-mapreduce-functions-not-your-granddaddys-udf/
[…] Sound bites about MapReduce […]
There is a coding tutorial available at this link in the middle of the page: http://www.greenplum.com/resources/mapreduce/
Key things to note about Greenplum’s MR implementation:
– It’s very similar in form and expression to Google and Hadoop
– Extensions for Joins and Pipelined task execution
– Native parallel file access
– Parallelism is full and transparent to the programmer
In summary: we have implemented MapReduce so that you can write programs in SQL, Perl, Python and many other languages. It is straightforward to take MR programs written for Hadoop or Google and port them to Greenplum.
Just a couple points on Aster’s implementation of MapReduce:
+ Developers can use Java, Python, C, Perl, and more
+ Aster’s In-Database MapReduce framework is a superset of MapReduce
+ Aster has a process management framework to guarantee transparency and availability
More in our whitepaper here:
http://www.asterdata.com/product/whitepaper_mapreduce.html
[…] Sound bites about MapReduce […]
[…] Author: Curt Monash. Original publication date: 2008-08-25. Translation: Олег Кузьменко. Source: Curt Monash's blog […]
[…] and technology community, with recent coverage in the NY Times and by influential blogs like DBMS2, Beyond Search, and Cloud N, just to name a […]