September 12, 2009

Introduction to the XLDB and SciDB projects

Before I write anything else about the overlapping efforts known as XLDB and SciDB, I probably should explain and disambiguate what they are as best I can. XLDB was organized and still is run by guys who want to solve a scientific problem in eXtremely Large DataBase Management, most especially Jacek Becla of SLAC (the organization previously known as Stanford Linear Accelerator Center). Becla’s original motivation was that he needs a DBMS to manage what will be 55 petabytes of raw image data and 100 petabytes of astronomical data total for LSST (Large Synoptic Survey Telescope).

XLDB more or less comprises:

- A series of what have now been three workshops: XLDB1 in 2007, XLDB2 in 2008, and XLDB3 in 2009 (the closest thing to a master link is probably the XLDB3 site’s related link page). Participants have included, among others:
  - A lot of big-name database-oriented computer science researchers, including Mike Stonebraker, Dave DeWitt, Martin Kersten, and numerous others
  - Academics responsible for scientific database management, especially but not only in the astronomy area
  - Some vendors (although vendor participation was cut back after XLDB1). At XLDB3, which is the one I went to, the three vendor folks who actually talked were Stephen Brobst of Teradata, Luke Lonergan of Greenplum (who worked in scientific high performance computing earlier in his career), and Jeff Hammerbacher of Cloudera.
  - eBay and to some extent other large web companies
  - A European Union funding bureaucrat
  - Me
- An attempt to kick-start a broader movement, perhaps comprising (it’s not totally clear yet):
  - Computer science researchers interested in database issues
  - Database technology vendors
  - Scientific researchers (academic) who have very large or otherwise difficult database management problems
  - Scientific researchers (commercial) who have very large or otherwise difficult database management problems
  - Other commercial users who have very large database management problems

The first result or spin-out from the XLDB effort seems to have been the SciDB project. This is an effort to build an open source DBMS called SciDB that will address some of the needs the XLDB effort is uncovering. (More on that in other posts.) Somewhat confusingly, all the use cases the XLDB group is collecting are currently being posted on SciDB’s website, apparently because it’s glitzier and healthier than, say, the excessively sparse XLDB wiki. Some SciDB development has happened, but no large sugar daddy has yet been found. (It’s a fairly open secret that eBay looked seriously and favorably at funding SciDB before the economic downturn hit.)

Numerous big-name computer scientists are associated with SciDB, indeed more closely (it would seem) than with XLDB. That said, I’m guessing Dave DeWitt’s involvement in the open-source SciDB isn’t what it would be if he hadn’t gone to Microsoft. DeWitt actually skipped XLDB3, although he was in town for VLDB. (XLDB3 was back-to-back with VLDB 2009 in Lyon, France in late August.) Stonebraker just didn’t make the flight for either conference, due to the double-knee “upgrade” he had back in March.

There’s a lot more to be said about the cross-discipline or science-specific requirements that researchers place on data management, but I’ll leave that for later and just get this posted as a start — assuming, of course, that blog outages permit. 🙁

Related links

Paper laying out the SciDB project
One version of a SciDB overview page, with links to academic papers

Categories: Data models and architecture, Database diversity, eBay, Michael Stonebraker, Open source, Petabyte-scale data management, Scientific research, Theory and architecture

Comments

2 Responses to “Introduction to the XLDB and SciDB projects”

Fault-tolerant queries | DBMS2 -- DataBase Management System Services on September 13th, 2009 12:36 am

[…] et al. trumpet query fault-tolerance as one of the virtues of HadoopDB. Some of the scientists at XLDB spoke of query fault-tolerance as being a good reason to leave 100s or 1000s of terabytes of data […]
Why you should go to XLDB4 | DBMS2 -- DataBase Management System Services on July 1st, 2010 12:23 am

[…] when Jacek Becla started the XLDB conferences on the premise that scientific and big data analytic challenges have a lot in common, […]
