October 1, 2009

Yahoo wants to do decapetabyte-scale data warehousing in Hadoop

My old client Mark Tsimelzon moved over to Yahoo after Coral8 was acquired, and I caught up with him last month. He turns out to be running development for a significant portion of Yahoo’s Hadoop effort — everything other than HDFS (Hadoop Distributed File System). Yahoo evidently plans to, within a year or so, get Hadoop to the point that it is managing 10s of petabytes of data for Yahoo, with reasonable data warehousing functionality.

Highlights of our visit included:

Read more

September 30, 2009

Facts and rumors

September 29, 2009

What Nielsen really uses in data warehousing DBMS

In its latest earnings call, Oracle made a reference to The Nielsen Company that was — to put it politely — rather confusing. I just plopped down in a chair next to Greg Goff, who evidently runs data warehousing at Nielsen, and had a quick chat. Here’s the real story.

September 19, 2009

Oracle gives a few customer database size examples

In its recent quarterly conference call, Oracle said (as per the Seeking Alpha transcript):

AC Neilsen, for instance, we deployed a 45-terabyte data [mart], they called it; Adidas, 13 terabytes; Australian Bureau of Statistics, 250 terabytes; and of course, some of our high-end ones that you have probably heard of in the past, AT&T, 250 terabytes; Yahoo!, 700 terabytes — just gives you an idea of the size of the databases that are out there and how they are growing, and that’s driving the need for greater throughput.

I don’t know what’s being counted there, but I wouldn’t be surprised if those were legit user-data figures.

Some other notes:

September 12, 2009

Introduction to the XLDB and SciDB projects

Before I write anything else about the overlapping efforts known as XLDB and SciDB, I probably should explain and disambiguate what they are as best I can. XLDB was organized and still is run by guys who want to solve a scientific problem in eXtremely Large DataBase Management, most especially Jacek Becla of SLAC (the organization previously known as Stanford Linear Accelerator Center). Becla’s original motivation was that he needs a DBMS to manage what will be 55 petabytes of raw image data and 100 petabytes of astronomical data total for LSST (Large Synoptic Survey Telescope). Read more

July 6, 2009

Yahoo is up to 10 petabytes now?

According to somebody (I forget who) who attended Yahoo’s SIGMOD presentation last week, the big Yahoo database is now up to 10 petabytes in size, in line with Yahoo’s predictions last year.  Apparently, Yahoo also gave more details of how the technology works.

July 1, 2009

NoSQL?

Eric Lai emailed today to ask what I thought about the NoSQL folks, and especially whether I thought their ideas were useful for enterprises in general, as opposed to just Web 2.0 companies. That was the first I heard of NoSQL, which seems to be a community discussing SQL alternatives popular among the cloud/big-web-company set, such as BigTable, Hadoop, Cassandra and so on. My short answers are:

As for the longer form, let me start by noting that there are two main kinds of reason for not liking SQL. Read more

June 15, 2009

An example of what’s wrong with big vendors’ approaches to BI (SAP in this case)

I just found Chris Kanaracus’ article about SAP’s rollout last month of its “clear enterprises” strategy. The money quote comes from Sara Lee, the user SAP seems to have trotted out:

But Sara Lee has not yet decided to purchase the software, and there are substantial underlying tasks to perform as well, he added.

“This is giving us the horsepower [to analyze data] but we need to have harmonized and structured data underneath it.”

This is from the leading test user of the product?

Business intelligence and the associated data management processes need to be reimagined, and I’m increasingly coming to suspect that the big BI conglomerates aren’t up to the task.

June 14, 2009

MMO games are still screwed up in their database technology

Two years ago I wrote about the database technology of Guild Wars. Not coincidentally, Guild Wars was the MMO RPG (Massively Multiplayer Online Role-Playing Game) I then played. I had the chance to interview Guild Wars’ lead developers. While much else they had to say was impressive, Guild Wars’ database architecture was — er, it was rather mind-boggling.

Since then, Linda and I have taken to playing Lord of the Rings Online, commonly known as LOTRO, developed by Turbine, Inc.. I haven’t had the chance to interview any Turbine folks, despite repeated requests. But from afar, it would seem that Turbine’s technology choices leave quite a bit to be desired, in enterprise-like IT areas such as authentication, database management, and storage.

LOTRO and other Turbine games commonly are down, for scheduled maintenance or in some cases otherwise. There is scheduled multi-hour downtime to start many weeks. There are fairly frequent server restarts in addition to that. Lag and congestion are frequent. And so on and so forth. By way of contrast, Guild Wars very rarely goes down, and other technical difficulties are less common as well. Reliability is a key design goal for Guild Wars’ developers, and in my opinion they’ve achieved it.

Some of the reasons for Turbine’s difficulties seem related to the stresses of MMOs — e.g., they’re probably due to the problems caused by having many fictional characters moving through the same fictional space at once, with graphical detail much richer than Guild Wars’. But a couple of head-scratchers make me really wonder about how Turbine manages data. Read more

June 9, 2009

Aster Data sticks by its SQL/MapReduce guns

Aster Data continues to think that MapReduce, integrated with SQL, is an important technology. For example:

I was a big fan of SQL/MapReduce when it was first announced last August. Notwithstanding persuasive examples favoring pure DBMS or pure MapReduce over DBMS/MapReduce integration, I continue to think the SQL/MapReduce idea has great potential.  But I do wish more successful production examples would become visible …

← Previous PageNext Page →

Feed: DBMS (database management system), DW (data warehousing), BI (business intelligence), and analytics technology Subscribe to the Monash Research feed via RSS or email:

Login

Search our blogs and white papers

Monash Research blogs

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.