Scientific research

Discussion of how database and related technologies are used to support scientific research. Related subjects include:

October 11, 2011

IBM is buying parallelization expert Platform Computing

IBM is acquiring Platform Computing, a company with which I had one briefing, last August. Quick background includes:  Read more

September 20, 2011

XLDB: The one conference I like to attend

I’m not a big fan of conferences, but I really like XLDB. Last year I got a lot out of XLDB, even though I couldn’t stay long (my elder care issues were in full swing). The year before I attended the whole thing — in Lyon, France, no less — and learned a lot more. This year’s XLDB conference is at SLAC — the organization formerly known as the Stanford Linear Accelerator Center — on Sand Hill Road in Menlo Park, October 18-19. As of right now, I plan to be there, at least on the first day. XLDB’s agenda and registration details (inexpensive) can be found on the XLDB conference website.

The only reason I wouldn’t go is if that turned out to be a lousy week for me to travel to California.

The people who go XLDB tend to be really smart — either research scientists, hardcore database technologists, or others who can hold their own with those folks. Audience participation can be intense; the most talkative members I can recall were Mike Stonebraker, Martin Kersten, Michael McIntire, and myself. Even the vendor folks tend to the smart — past examples include Stephen Brobst, Jeff Hammerbacher, Luke Lonergan, and IBM Fellow Laura Haas. When we had a datageek bash on my last trip to the SF area, several guys said they were planning to attend XLDB as well.

XLDB stands for eXtremely Large DataBases, and those are indeed what gets talked about there. Read more

September 12, 2011

Hadoop notes

I visited California recently, and chatted with numerous companies involved in Hadoop — Cloudera, Hortonworks, MapR, DataStax, Datameer, and more. I’ll defer further Hadoop technical discussions for now — my target to restart them is later this month — but that still leaves some other issues to discuss, namely adoption and partnering.

The total number of enterprises in the world paying subscription and license fees that they would regard as being for “Hadoop or something Hadoop-related” probably is not much over 100 right now, but I’d expect to see pretty rapid growth. Beyond that, let’s divide customers into three groups:

Hadoop vendors, in different mixes, claim to be doing well in all three segments. Even so, almost all use cases involve some kind of machine-generated data, with one exception being a credit card vendor crunching a large database of transaction details. Multiple kinds of machine-generated data come into play — web/network/mobile device logs, financial trade data, scientific/experimental data, and more. In particular, pharmaceutical research got some mentions, which makes sense, in that it’s one area of scientific research that actually enjoys fat for-profit research budgets.

Read more

July 6, 2011

Petabyte-scale Hadoop clusters (dozens of them)

I recently learned that there are 7 Vertica clusters with a petabyte (or more) each of user data. So I asked around about other petabyte-scale clusters. It turns out that there are several dozen such clusters (at least) running Hadoop.

Cloudera can identify 22 CDH (Cloudera Distribution [of] Hadoop) clusters holding one petabyte or more of user data each, at 16 different organizations. This does not count Facebook or Yahoo, who are huge Hadoop users but not, I gather, running CDH. Meanwhile, Eric Baldeschwieler of Hortonworks tells me that Yahoo’s latest stated figures are:

Read more

July 5, 2011

Eight kinds of analytic database (Part 2)

In Part 1 of this two-part series, I outlined four variants on the traditional enterprise data warehouse/data mart dichotomy, and suggested what kinds of DBMS products you might use for each. In Part 2 I’ll cover four more kinds of analytic database — even newer, for the most part, with a use case/product short list match that is even less clear.  Read more

July 5, 2011

Eight kinds of analytic database (Part 1)

Analytic data management technology has blossomed, leading to many questions along the lines of “So which products should I use for which category of problem?” The old EDW/data mart dichotomy is hopelessly outdated for that purpose, and adding a third category for “big data” is little help.

Let’s try eight categories instead. While no categorization is ever perfect, these each have at least some degree of technical homogeneity. Figuring out which types of analytic database you have or need — and in most cases you’ll need several — is a great early step in your analytic technology planning.  Read more

October 10, 2010

A few notes from XLDB 4

As much as I believe in the XLDB conferences, I only found time to go to (a big) part of one day of XLDB 4 myself. In general:  Read more

July 31, 2010

Nested data structures keep coming up, especially for log files

Nested data structures have come up several times now, almost always in the context of log files.

I don’t have a grasp yet on what exactly is happening here, but it’s something.

July 1, 2010

Why you should go to XLDB4

Scientific data commonly:

In those respects, it is akin to some of the hottest areas for big data analytics, including:

So when Jacek Becla started the XLDB conferences on the premise that scientific and big data analytic challenges have a lot in common, he had a point. There are several tough database problems that the science-focused folks have taken the leading in thinking about, but which are soon going to matter to the commercial world as well. And that’s one of two big reasons why you should consider participating in XLDB4, October 6-7, at the SLAC facility in Menlo Park, CA, as an attendee, sponsor, or both.

The other big reason is that it is important for the world that XLDB succeed. Read more

May 22, 2010

Notes on SciDB and scientific data management

I firmly believe that, as a community, we should look for ways to support scientific data management and related analytics. That’s why, for example, I went to XLDB3 in Lyon, France at my own expense. Eight months ago, I wrote about issues in scientific data management. Here’s some of what has transpired since then.

The main new activity I know of has been in the open source SciDB project.   Read more

← Previous PageNext Page →

Feed: DBMS (database management system), DW (data warehousing), BI (business intelligence), and analytics technology Subscribe to the Monash Research feed via RSS or email:

Login

Search our blogs and white papers

Monash Research blogs

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.