Is MapReduce a good underpinning for next-gen scientific DBMS?
Back in November, Mike Stonebraker suggested that there’s a need for database management advances to serve “big science”. He said:
Obviously, the best solution to these … problems would be to put everything in a next-generation DBMS — one capable of keeping track of data, metadata, and lineage. Supporting the latter would require all operations on the data to be done inside the DBMS with user-defined functions — Postgres-style.
Then he went on to give examples of failings in a prior effort, including that it didn't support the right computations and data transformations.
Meanwhile, Google has started a program to host terabyte-scale scientific databases for free.
If the issue is that different scientific projects need different kinds of specialized indexing, it sure seems as if MapReduce would be a good way to populate those indexes in the first place. Banging data into indexes is what MapReduce was designed for, and indeed seems to be the core of what MapReduce does in production at Google today. That said, getting data into indexes is the beginning of DBMS design and operation, not the end.
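To make the "banging data into indexes" point concrete, here is a rough, purely illustrative sketch of the idea, not Google's actual code or API: a toy map step that emits (term, document) pairs, a shuffle that groups them by term, and a reduce step that collapses each group into a postings list. The tiny corpus, tokenizer, and function names are all made up for the example.

```python
from collections import defaultdict
import re

# Toy corpus standing in for scientific records; purely illustrative.
DOCUMENTS = {
    "doc1": "galaxy survey data with redshift measurements",
    "doc2": "redshift catalog from the galaxy survey",
}

def map_phase(doc_id, text):
    """Map step: emit a (term, doc_id) pair for every token in a document."""
    for term in re.findall(r"\w+", text.lower()):
        yield term, doc_id

def shuffle(pairs):
    """Group intermediate pairs by key, as the MapReduce framework would between phases."""
    grouped = defaultdict(list)
    for term, doc_id in pairs:
        grouped[term].append(doc_id)
    return grouped

def reduce_phase(term, doc_ids):
    """Reduce step: collapse each term's postings into a sorted, deduplicated list."""
    return term, sorted(set(doc_ids))

# Run the toy pipeline: map every document, shuffle, then reduce per term.
intermediate = [pair for doc_id, text in DOCUMENTS.items()
                for pair in map_phase(doc_id, text)]
index = dict(reduce_phase(term, ids) for term, ids in shuffle(intermediate).items())

print(index["redshift"])   # -> ['doc1', 'doc2']
```

The point of the sketch is simply that index construction is an embarrassingly parallel batch job, which is exactly the shape of work MapReduce handles well; everything a DBMS does after the index exists (querying, metadata, lineage) is another matter.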