Introduction to the XLDB and SciDB projects
Before I write anything else about the overlapping efforts known as XLDB and SciDB, I should probably explain and disambiguate them as best I can. XLDB was organized, and is still run, by people who want to solve scientific problems in eXtremely Large DataBase management, most notably Jacek Becla of SLAC (the organization previously known as Stanford Linear Accelerator Center). Becla’s original motivation was that he needed a DBMS to manage what will be 55 petabytes of raw image data, and 100 petabytes of astronomical data in total, for LSST (Large Synoptic Survey Telescope).
XLDB more or less comprises:
- A series of what have now been three workshops: XLDB1 in 2007, XLDB2 in 2008, and XLDB3 in 2009 (the closest thing to a master link is probably the XLDB3 site’s related link page). Participants have included, among others:
  - A lot of big-name database-oriented computer science researchers: Mike Stonebraker, Dave DeWitt, Martin Kersten, and numerous others
  - Academics responsible for scientific database management, especially but not only in the astronomy area
  - Some vendors (although vendor participation was cut back after XLDB1). At XLDB3, which is the one I went to, the three vendor folks who actually talked were Stephen Brobst of Teradata, Luke Lonergan of Greenplum (who worked in scientific high performance computing earlier in his career), and Jeff Hammerbacher of Cloudera.
  - eBay and to some extent other large web companies
  - A European Union funding bureaucrat
  - Me
- An attempt to kick start a broader movement, perhaps comprising (it’s not totally clear yet):
  - Computer science researchers interested in database issues
  - Database technology vendors
  - Scientific researchers (academic) who have very large or otherwise difficult database management problems
  - Scientific researchers (commercial) who have very large or otherwise difficult database management problems
  - Other commercial users who have very large database management problems
The first result or spin-out from the XLDB effort seems to have been the SciDB project. This is an effort to build an open source DBMS called SciDB that will address some of the needs the XLDB effort is uncovering. (More on that in other posts.) Somewhat confusingly, all the use cases the XLDB group is collecting are currently being posted on SciDB’s website, apparently because it’s glitzier and healthier than, say, the excessively sparse XLDB wiki. Some SciDB development has happened, but no large sugar daddy has yet been found. (It’s a fairly open secret that eBay looked seriously and favorably at funding SciDB before the economic downturn hit.)
Numerous big-name computer scientists are associated with SciDB, indeed more closely (it would seem) than with XLDB. That said, I’m guessing Dave DeWitt’s involvement in the open-source SciDB isn’t what it would be if he hadn’t gone to Microsoft. DeWitt actually skipped XLDB3, although he was in town for VLDB. (XLDB3 was back-to-back with VLDB 2009 in Lyon, France in late August.) Stonebraker just didn’t make the flight for either conference, due to the double-knee “upgrade” he had back in March.
There’s a lot more to be said about the cross-discipline or science-specific requirements that researchers place on data management, but I’ll leave that for later and just get this posted as a start — assuming, of course, that blog outages permit. 🙁
Related links
- Paper laying out the SciDB project
- One version of a SciDB overview page, with links to academic papers