January 24, 2008

Is MapReduce a good underpinning for next-gen scientific DBMS?

Back in November, Mike Stonebraker suggested that there’s a need for database management advances to serve “big science”. He said:

Obviously, the best solution to these … problems would be to put everything in a next-generation DBMS — one capable of keeping track of data, metadata, and lineage. Supporting the latter would require all operations on the data to be done inside the DBMS with user-defined functions — Postgres-style.

He then went on to cite failings of a prior effort, including that it didn't support the right computations and data transformations.

Meanwhile, Google has started a program to host terabyte-scale scientific databases for free.

If the issue is that different scientific projects need different kinds of specialized indexing, it sure seems as if MapReduce would be a good way to populate those indexes in the first place. Banging data into indexes is what MapReduce was designed for, and indeed seems to be the core of what MapReduce does in production use at Google today. That said, getting data into indexes is the beginning of DBMS design and operation, not the end.
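To make the "banging data into indexes" point concrete, here is a minimal single-process sketch of the canonical MapReduce inverted-index job: a map step emits (term, document) pairs, and a reduce step collapses each term's pairs into a posting list. The toy corpus, the function names, and the in-memory shuffle are illustrative stand-ins, not Google's actual implementation.

```python
from collections import defaultdict

# Hypothetical corpus: (document_id, text) pairs standing in for raw records.
DOCUMENTS = [
    ("doc1", "galaxy spectra redshift survey"),
    ("doc2", "redshift survey of quasar spectra"),
    ("doc3", "galaxy cluster lensing survey"),
]

def map_phase(doc_id, text):
    """Map: emit a (term, doc_id) pair for every term in a document."""
    for term in text.split():
        yield term, doc_id

def shuffle(pairs):
    """Shuffle: group intermediate values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(term, doc_ids):
    """Reduce: emit one posting list per term, i.e. one index entry."""
    return term, sorted(set(doc_ids))

# Run the pipeline: map over every document, shuffle, then reduce per key.
intermediate = [pair for doc_id, text in DOCUMENTS
                for pair in map_phase(doc_id, text)]
index = dict(reduce_phase(term, ids)
             for term, ids in shuffle(intermediate).items())

print(index["survey"])  # ['doc1', 'doc2', 'doc3']
print(index["galaxy"])  # ['doc1', 'doc3']
```

In a real deployment the map and reduce tasks run in parallel across many machines and the shuffle moves data over the network, but the output is the same kind of artifact: an index that some query engine must then serve, optimize, and keep consistent, which is exactly the part of the DBMS problem MapReduce doesn't address.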
