HadoopDB
Despite a thoughtful heads-up from Daniel Abadi when he originally posted about HadoopDB, I'm only now getting around to writing about it. HadoopDB is a research project carried out by a couple of Abadi's students, and further research is definitely planned. But it is too early to say whether HadoopDB will ever get past the "research and oh by the way the code is open sourced" stage and become a real code line, whether commercialized, open source, or both.
The basic idea of HadoopDB is to put a copy of a DBMS at each node of a grid and use Hadoop to parcel out work among them. Compared with massively parallel DBMS, the claimed major benefits are:
- Open/cheap/free
- Query fault-tolerance
- Tolerance of node degradation that falls short of outright node failure, a related concept
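To make the architecture and the query fault-tolerance claim a bit more concrete, here is a minimal sketch in Python. It is not HadoopDB's code. It assumes a toy setup in which each "node" is an in-memory SQLite database standing in for the per-node DBMS (HadoopDB itself used PostgreSQL), a hypothetical `sales` table, and a hypothetical `run_query` coordinator playing Hadoop's role. The same SQL fragment is pushed down to every partition, the partial results are merged, and if a node dies only that partition's task is rerun on a replica rather than the whole query being restarted.

```python
import sqlite3
from collections import defaultdict

# Hypothetical per-node DBMS: an in-memory SQLite database standing in for
# the local database (PostgreSQL in HadoopDB's prototype).
def make_local_db(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn

class Node:
    def __init__(self, name, rows, healthy=True):
        self.name = name
        self.conn = make_local_db(rows)
        self.healthy = healthy

    def run(self, sql):
        # A failed node raises instead of answering, like a crashed worker.
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return self.conn.execute(sql).fetchall()

# The partial query pushed down to every node (the "map" side).
LOCAL_SQL = "SELECT region, SUM(amount) FROM sales GROUP BY region"

def run_query(partitions):
    """partitions: a list where each entry is a list of replica Nodes
    holding the same data. A node failure only reschedules that one
    partition's task on another replica; the rest of the query is kept."""
    totals = defaultdict(int)
    for replicas in partitions:
        for node in replicas:
            try:
                partial = node.run(LOCAL_SQL)
                break
            except ConnectionError:
                continue  # retry this partition's task on the next replica
        else:
            raise RuntimeError("all replicas of a partition failed")
        # Combine step (the "reduce" side): merge per-node partial sums.
        for region, subtotal in partial:
            totals[region] += subtotal
    return dict(totals)

if __name__ == "__main__":
    p1 = [("east", 10), ("west", 5)]
    p2 = [("east", 7), ("north", 3)]
    partitions = [
        [Node("n1", p1, healthy=False), Node("n1-replica", p1)],
        [Node("n2", p2)],
    ]
    print(sorted(run_query(partitions).items()))
```

The sketch runs tasks sequentially for clarity. Real Hadoop schedules them in parallel, and its speculative execution (launching a duplicate of a slow task elsewhere) is the mechanism behind the node-degradation point, which this toy does not model.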
HadoopDB has actually been built with PostgreSQL. That version achieved performance well below that of a commercial DBMS “DBX”, where X=2. Column-store guru Abadi has repeatedly signaled his intention to try out HadoopDB with VectorWise at the nodes instead. (Recall that VectorWise is shared-everything.) It will be interesting to see how that configuration performs.
In my opinion, however, HadoopDB's real opportunity may lie elsewhere. Rather than trying to compete with parallel relational DBMS, HadoopDB might do more good by parallelizing more specialized kinds of database engines. How about, for example, a massively parallel XML manager to compete with MarkLogic? Or a massively parallel array processor other than the still-nascent SciDB? Or, even more to the point, something that parallelizes a yet-more-specialized scientific data management engine? That kind of area is where I suspect HadoopDB's real potential lies.
Comments
5 Responses to “HadoopDB”
[…] keep going? For example, Daniel Abadi et al. trumpet query fault-tolerance as one of the virtues of HadoopDB. Some of the scientists at XLDB spoke of query fault-tolerance as being a good reason to leave 100s […]
Well I believe this recent interview is relevant here. That’s traction IMHO.
http://news.cnet.com/8301-13505_3-10355679-16.html
[…] still tentative — are afoot to integrate VectorWise with MapReduce in Daniel Abadi’s HadoopDB […]
[…] that HadoopDB has never been used for a production application, large-scale or otherwise. Unsurprisingly, Daniel […]
[…] HadoopDB project was started by Dan Abadi and two grad […]