March 31, 2009

Twitter is considering using MapReduce

From a Twitter job listing (formatting mine).  The most interesting section is “Additional preferred experience.” Read more

March 20, 2009

More on Greenplum, Fox/MySpace, and load speeds

Eric Lai offers more facts, figures, explanation, and competitive insight than I did on Greenplum’s loading of the Fox/MySpace database, including that Greenplum is being loaded with data at the 4 TB/hour rate only for half an hour at a time.

Also, Eric cites the Greenplum Fox Interactive Media database as being only 200 TB in size.  Surely there is some confusion somewhere, since Greenplum described it as being 400 TB back in August.

March 20, 2009

Notes from the Oracle conference call

Chris Karnacus reports two tidbits from the Oracle conference call:

Seeking Alpha, as usual, has a full transcript, some typos aside.  There were plenty of comments on other sales, just not Exadata ones. On the other hand, Oracle execs did repeat several times how wonderful they think Exadata is.

One question about the transcript — it sort of reads like there was a big text-oriented deal at Bank of America, but there’s clearly a typo in the reference.  Does anybody who actually listened to the call know for sure whether that’s what was said? (Edit: Answered in the comments below.)

March 20, 2009

Greenplum claims very fast load speeds, and Fox still throws away most of its MySpace data

Data warehouse load speeds are a contentious issue.  Vertica contrived a benchmark with a 5 1/2 terabyte/hour load rate.  Oracle has gotten dinged for very low load speeds, which then are hotly debated.  I was told recently of a Greenplum partner’s salesman steering a prospect who needed rapid load speeds away from Greenplum, which seemed odd to me.

Now Greenplum has come out swinging, claiming “consistent” load speeds of 4 terabytes/hour at its Fox Interactive Media account, and armed with a customer quote saying just that.  Note however that load speeds tend to be proportional to the number of disks, and there are a LOT of disks at that installation.

One way to think about load speeds is — how long would it take to load the entire database? It seems as if the Fox database could be loaded, perhaps not in one week, but certainly in less than two. Flipping that around, the Fox site only has enough capacity to hold less than 2 weeks of detailed data. (This is not uncommon in network event kinds of databases.) And a corollary of that is — worldwide storage sales are still constrained by cost, not by absolute limits on the amounts of data enterprises would like to store.
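As a rough sanity check on that arithmetic, here is a minimal sketch. The 4 TB/hour rate and the 200 TB and 400 TB database sizes are figures quoted in these posts; the `duty_cycle` parameter (the fraction of wall-clock time actually spent loading) and the function name are assumptions introduced purely for illustration, since the post does not spell out how it arrived at "less than two weeks."

```python
def hours_to_load(db_size_tb: float, rate_tb_per_hour: float, duty_cycle: float = 1.0) -> float:
    """Wall-clock hours to load db_size_tb terabytes.

    duty_cycle is a hypothetical knob: the fraction of time the system
    is actually loading at the quoted rate (1.0 means nonstop loading).
    """
    if rate_tb_per_hour <= 0 or not 0 < duty_cycle <= 1:
        raise ValueError("rate must be positive and duty_cycle in (0, 1]")
    return db_size_tb / (rate_tb_per_hour * duty_cycle)


RATE_TB_PER_HOUR = 4.0  # Greenplum's claimed load rate at Fox Interactive Media

for size_tb in (200.0, 400.0):        # the two database sizes cited in these posts
    for duty in (1.0, 0.5, 0.25):     # assumed duty cycles, not from the source
        days = hours_to_load(size_tb, RATE_TB_PER_HOUR, duty) / 24
        print(f"{size_tb:.0f} TB at duty cycle {duty:.2f}: {days:.1f} days")
```

Under these assumptions, a 400 TB database loaded nonstop takes 100 hours (a bit over four days), and the same database loaded half the time takes a bit over eight days, which fits the "not in one week, but certainly in less than two" estimate.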

March 7, 2009

Three Greenplum customers’ applications of MapReduce

Greenplum (and Truviso) advisor Joseph Hellerstein offers a few examples of MapReduce applications (specifically Greenplum MapReduce), namely:

The big aha moment occurred for me during our panel discussion, which included Luke Lonergan from Greenplum, Roger Magoulas from O’Reilly, and Brian Dolan from Fox Interactive Media (which runs MySpace among other web properties).

Roger talked about using MapReduce to extract structured entities from text for doing tech trend analyses from billions of rows of online job postings.  Brian (who is a mathematician by training) was talking about implementing conjugate gradient and Support Vector Machines in parallel SQL to support “hypertargeting” for advertisers.  I mentioned how Jonathan Goldman at LinkedIn was using SQL and MapReduce to do graph algorithms for social network analysis.

Incidentally: While it’s been some months since I asked, my sense is that the O’Reilly text extraction is home-grown, and primitive compared to what one could do via commercial products. That said, if the specific application is examining job postings, I’m not sure how much value more sophisticated products would add. After all, tech job listings are generally written in a style explicitly designed to ensure that most or all of their meaning is conveyed simply by a bag of keywords. And by the way, this effort has been underway for quite some time.

March 5, 2009

Fox Interactive Media’s multi-hundred terabyte database running on Greenplum

Greenplum’s largest named account is Fox Interactive Media — the parent organization of MySpace — which has a multi-hundred terabyte database that it uses for hardcore data mining/analytics. Greenplum has been engaging in regrettable business practices, claiming that it is in the process of supplanting Aster Data at Fox/MySpace. In fact, MySpace’s use of Aster is more mission-critical than Fox’s use of Greenplum, and is increasing significantly.

Still, as Greenplum’s gushing customer video with Fox Interactive Media* illustrates, the Fox/Greenplum database is impressive on its own merits. Read more

March 5, 2009

MySpace’s multi-hundred terabyte database running on Aster Data

Aster Data has put up a blog post embedding and summarizing a video about its MySpace account. Basic metrics include:

The combined Aster deployment now has 200+ commodity hardware servers working together to manage 200+ TB of data that is growing at 2-3TB per day by collecting 7-10B events that happen on one of the world.

I’m pretty sure that’s counting correctly (i.e., user data).* Read more

March 2, 2009

Closing the book on the DATAllegro customer base

I’m prepared to call an end to the “Guess DATAllegro’s customers” game.  Bottom line is that there are three in all, two of which are TEOCO and Dell, and the third of which is a semi-open secret.  I wrote last week:

The number of DATAllegro production references is expected to double imminently, from one to two. Few will be surprised at the identity of the second reference. I imagine the number will then stay at two, as DATAllegro technology is no longer being sold, and the third known production user has never been reputed to be particularly pleased with it.

Dell did indeed disclose at TDWI that it was a large DATAllegro user, notwithstanding that Dell is a huge Teradata user as well.  No doubt, Dell is gearing up to be a big user of Madison too.

Also at TDWI, I talked with some former DATAllegro employees who now work for rival vendors. None thinks DATAllegro has more than three customers.  Neither do I.

Edit: Subsequently, the DATAllegro customer count declined to 1.

March 2, 2009

Named customer silliness

Neither Greenplum nor eBay will say for the record that eBay is a Greenplum customer. Indeed, saying that is quite verboten. On the other hand, Greenplum’s press release boilerplate says that Skype is a Greenplum customer, and Skype is of course a subsidiary of eBay.  (Edit: Speaking of silliness, fixed a typo there.)

The point of such distinctions is sometimes lost on me.

In related news, of Greenplum’s two customers who back in August were supposedly heading into production soon with petabyte-plus databases, one hasn’t yet made it to that size. (“As we speak” turned out to be a longer conversation than I might have anticipated ….) The other (of course unnamed) customer has, Greenplum assures me, made it that high.  But upon checking with that (unnamed, in case I forgot to mention the point) customer, I don’t detect a whole lot of enthusiasm about Greenplum.

February 26, 2009

Data warehousing business trends

I’ve talked with a whole lot of vendors recently, some here at TDWI, as well as users, fellow analysts, and so on. Repeated themes include: Read more
