Greenplum

Analysis of data warehouse DBMS vendor Greenplum and its successor, EMC’s Data Computing division.

January 12, 2009

Gartner’s 2008 data warehouse database management system Magic Quadrant is out

February 2011 edit: I’ve now commented on Gartner’s 2010 Data Warehouse Database Management System Magic Quadrant as well.

Gartner’s annual Magic Quadrant for data warehouse DBMS is out.  Thankfully, vendors don’t seem to be taking it as seriously as usual, so I didn’t immediately hear about it.  (I finally noticed it in a Greenplum pay-per-click ad.)  Links to Gartner MQs tend to come and go, but as of now here are two working links to the 2008 Gartner Data Warehouse Database Management System MQ.  My posts on the 2007 and 2006 MQs have also been updated with working links. Read more

November 15, 2008

High-performance analytics

For the past few months, I’ve collected a lot of data points to the effect that high-performance analytics – i.e., beyond straightforward query – is becoming increasingly important. And I’ve written about some of them at length. For example:

Ack. I can’t decide whether “analytics” should be a singular or plural noun. Thoughts?

Another area that’s come up, but which I haven’t blogged about as much, is data mining in the database. Data mining accounts for a large part of data warehouse use. The traditional way to do data mining is to extract data from the database and dump it into SAS. But there are problems with this scenario, including: Read more
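To make the issue concrete, here’s a toy Python sketch using the standard library’s sqlite3 as a stand-in for any SQL DBMS (the table and column names are made up for the example). It contrasts hauling every row out for external analysis with pushing the aggregation into the database, which is the basic appeal of in-database data mining:

```python
# Toy illustration (not Greenplum-specific): contrast pulling raw rows out of a
# database for external analysis with pushing the computation into the database.
# The table and column names here are hypothetical.
import sqlite3
import random

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [(random.randint(1, 100), random.uniform(1, 500)) for _ in range(10_000)],
)

# "Extract and analyze elsewhere": every row crosses the database boundary.
rows = conn.execute("SELECT customer_id, amount FROM transactions").fetchall()
totals = {}
for cid, amt in rows:
    totals[cid] = totals.get(cid, 0.0) + amt

# "In-database": only the much smaller aggregated result leaves the database.
in_db = dict(
    conn.execute("SELECT customer_id, SUM(amount) FROM transactions GROUP BY customer_id")
)

print(len(rows), "rows moved vs.", len(in_db), "aggregate rows moved")
```

The point of the contrast is data movement: the first approach ships every detail row across the database boundary, while the second ships only one row per customer.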

November 7, 2008

Big scientific databases need to be stored somehow

A year ago, Mike Stonebraker observed that conventional DBMS don’t necessarily do a great job on scientific data, and further pointed out that different kinds of science might call for different data access methods. Even so, some of the largest databases around are scientific ones, and they have to be managed somehow. For example:

Long-term, I imagine that the most suitable DBMS for these purposes will be MPP systems with strong datatype extensibility — e.g., DB2, PostgreSQL-based Greenplum, PostgreSQL-based Aster nCluster, or maybe Oracle.

October 1, 2008

Greenplum pricing

Edit: Actually, this post is completely incorrect. The $20K/terabyte is for software only. So far, my attempts to get Greenplum to estimate hardware costs have been unsuccessful.

Greenplum’s Scott Yara was recently quoted citing a $20K/terabyte figure for Greenplum pricing. That naturally raises the question:

Greenplum charges around $20K/terabyte of what?

Read more
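To see how much the answer matters, here’s a back-of-the-envelope calculation in Python, using purely hypothetical figures rather than anything Greenplum actually quoted:

```python
# Purely hypothetical figures -- not Greenplum's actual terms -- to show why
# "per terabyte of what?" matters so much.
price_per_tb = 20_000      # quoted $ per terabyte

user_data_tb = 10          # terabytes of raw user data loaded
compression_ratio = 3      # assumed compression once loaded
raw_disk_tb = 40           # total disk, counting mirroring and temp/work space

print("If per TB of user data:       $", price_per_tb * user_data_tb)
print("If per TB of compressed data: $", round(price_per_tb * user_data_tb / compression_ratio))
print("If per TB of raw disk:        $", price_per_tb * raw_disk_tb)
```

Under these assumed numbers, the same quoted rate implies total prices that differ by more than a factor of ten depending on which “terabyte” is meant.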

September 29, 2008

Eric Lai on Oracle Exadata, and some addenda

Eric Lai offers a detailed FAQ on Oracle Exadata, including a good selection of links and quotes. I’d like to offer a few comments in response: Read more

September 22, 2008

Database compression is heavily affected by the kind of data

I’ve written often of how different kinds or brands of data warehouse DBMS get very different compression figures. But I haven’t focused enough on how much compression figures can vary among different kinds of data. This was really brought home to me when Vertica told me that web analytics/clickstream data can often be compressed 60X in Vertica, while at the other extreme — some kind of floating point data, whose details I forget for now — they could only do 2.5X. Edit: Vertica has now posted much more accurate versions of those numbers. Infobright’s 30X compression reference at TradeDoubler seems to be for a clickstream-type app. Greenplum’s customer getting 7.5X — high for a row-based system — is managing clickstream data and related stuff. Bottom line:

When evaluating compression ratios — especially large ones — it is wise to inquire about the nature of the data.
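As a crude illustration of the underlying point (Python’s zlib on synthetic data, which is nothing like a real DBMS’s compression engine, so only the relative difference means anything):

```python
# Toy demonstration that repetitive, low-cardinality data (clickstream-like)
# compresses far better than noisy floating-point data. zlib is a crude
# stand-in for a DBMS's compression, so only the relative gap is meaningful.
import random
import struct
import zlib

random.seed(0)

# Clickstream-ish: a handful of URLs and status codes, repeated many times.
urls = [f"/product/{i}" for i in range(50)]
clickstream = "\n".join(
    f"{random.choice(urls)},{random.choice([200, 302, 404])}" for _ in range(100_000)
).encode()

# Floating-point measurements: essentially incompressible noise.
floats = b"".join(struct.pack("d", random.random()) for _ in range(100_000))

for name, payload in [("clickstream", clickstream), ("floats", floats)]:
    ratio = len(payload) / len(zlib.compress(payload, 9))
    print(f"{name}: {ratio:.1f}x")
```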

September 22, 2008

Web analytics — clickstream and network event data

It should surprise nobody that web analytics – and specifically clickstream data – is one of the biggest areas for high-end data warehousing. For example:

Read more

September 17, 2008

Netezza overseas

22% of Netezza’s revenue comes from outside the US, at least if we use last quarter’s figures as a guide.  At first blush, that doesn’t sound like much.  Indeed, percentage-wise it surely lags behind Teradata, Greenplum (which has sold a lot in Asia/Pacific under Netezza’s former head of that region), and a few smaller competitors headquartered outside the US.  But a few conversations I had today suggest a rosier view.  Read more

September 5, 2008

Dividing the data warehousing work among MPP nodes

I talk with lots of vendors of MPP data warehouse DBMS. I’ve now heard enough different approaches to MPP architecture that I think it might be interesting to contrast some of the alternatives.

Read more
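As generic background (not any particular vendor’s design), here’s a toy Python sketch of the most common starting point: hash-distributing rows across nodes by a key, so that each node can scan, filter, and partially aggregate its own slice in parallel:

```python
# Generic illustration, not any specific vendor's approach: hash-distribute rows
# across nodes by a key; each node aggregates its slice, and a coordinator would
# then merge the partial results.
import hashlib

NUM_NODES = 4

def node_for(key: str) -> int:
    # Stable hash so the same key always lands on the same node.
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % NUM_NODES

rows = [("cust-17", 99.50), ("cust-42", 12.00), ("cust-17", 3.25), ("cust-08", 60.00)]

slices = {n: [] for n in range(NUM_NODES)}
for customer_id, amount in rows:
    slices[node_for(customer_id)].append((customer_id, amount))

for node, local_rows in slices.items():
    partial = {}
    for cid, amt in local_rows:
        partial[cid] = partial.get(cid, 0.0) + amt
    print(f"node {node}: {partial}")
```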

September 5, 2008

Three different implementations of MapReduce

So far as I can see, there are three implementations of MapReduce that matter for enterprise analytic use – Hadoop, Greenplum’s, and Aster Data’s.* Hadoop has of course been available for a while, and used for a number of different things, while Greenplum’s and Aster Data’s versions of MapReduce – both in late-stage beta – have far fewer users.

*Perhaps Nokia’s Disco or another implementation will at some point join the list.

Earlier this evening I posted some Mike Stonebraker criticisms of MapReduce. It turns out that they aren’t all accurate across all MapReduce implementations. So this seems like a good time for me to stop stalling and put up a few notes about specific features of different MapReduce implementations. Here goes. Read more
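For readers who haven’t looked at the programming model itself, here’s a minimal single-process Python sketch of map, shuffle, and reduce (the canonical word count). It says nothing about how Hadoop, Greenplum, or Aster Data actually distribute the work across nodes:

```python
# Minimal single-process sketch of the MapReduce programming model (word count).
# Real implementations run the map and reduce phases in parallel on many nodes;
# this only shows the shape of the two user-supplied functions.
from collections import defaultdict

def map_phase(document):
    # Emit (key, value) pairs -- here, (word, 1) for every word.
    for word in document.split():
        yield word.lower(), 1

def reduce_phase(key, values):
    # Combine all values emitted for one key.
    return key, sum(values)

documents = ["the quick brown fox", "the lazy dog", "the fox jumps"]

# Shuffle: group intermediate pairs by key.
grouped = defaultdict(list)
for doc in documents:
    for key, value in map_phase(doc):
        grouped[key].append(value)

counts = dict(reduce_phase(k, v) for k, v in grouped.items())
print(counts)  # e.g. {'the': 3, 'fox': 2, ...}
```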
