Data warehouse appliances

Analysis of data warehouse appliances – i.e., of hardware/software bundles optimized for fast query and analysis of large volumes of (usually) relational data.

October 25, 2007

DATAllegro discloses a few numbers

Privately held DATAllegro just announced a few tidbits about financial results and suchlike for the fiscal year ended June, 2007. I sent over a few clarifying questions yesterday. Responses included:

All told, it sounds as if DATAllegro is more than 1/3 the size of Netezza, although given its higher system size and price points I’d guess it has well under 1/3 as many customers.

Here’s a link. I’ll likely edit that to something more permanent-seeming later, and generally spruce this up when I’m not so rushed.

October 19, 2007

One Greenplum customer — 35 terabytes and growing fast

I was at the Business Objects conference this week, and as usual went to very few sessions. But one I did stroll into was on “Managing Rapid Growth With the Right BI Strategy.” This was by Reliance Telecommunications, an outfit in India that is adding telecom subscribers very quickly, and consequently banging 100-150 gigs of data per day into a 35 terabyte warehouse.

The beginning of the talk astonished me, as the presenter seemed to be saying they were doing all this on Oracle. Hah. Oracle is what they moved away from; instead, they got Greenplum. I couldn’t get details; indeed, as a BI guy he was far enough away from DBMS to misspeak and say that Greenplum was brought in by ‘HP’, before quickly correcting himself when prompted. Read more

October 19, 2007

Gartner 2007 Magic Quadrant for Data Warehouse Database Management Systems

February, 2011 edit: I’ve now commented on Gartner’s 2010 Data Warehouse Database Management System Magic Quadrant as well.

It’s early autumn, the leaves are turning in New England, and Gartner has issued another Magic Quadrant for data warehouse DBMS. (Edit: As of January, 2009, that link is dead but this one works.) The big winners vs. last year are Greenplum and, secondarily, Sybase. Teradata continues to lead. Oracle has also leapfrogged IBM, and there are various other minor adjustments as well, among repeat mentionees Netezza, DATAllegro, Sand, Kognitio, and MySQL. HP isn’t on the radar yet; ditto Vertica. Read more

October 12, 2007

Three ways Oracle or Microsoft could go MPP

I’ve been arguing for a while that Oracle and Microsoft are screwed in high-end data warehousing. The reason is that they’re stuck with SMP (Symmetric Multi-Processing) architectures, while Teradata, Netezza, DATAllegro, and many others enjoy the benefits of MPP (Massively Parallel Processing). Thus, Teradata and DATAllegro boast installations in the hundreds of terabytes each, while Oracle and Microsoft users usually have to perform unnatural acts of hard-coded partitioning even to reach the 10 terabyte level.

That said, there are at least three ways Oracle and/or Microsoft could get out of this technical box:

1. They could buy or just partner with MPP vendors such as Dataupia, which offer plug-compatibility with their respective main DBMS.

2. They could buy whoever they want, plug-compatibility be damned. Presumably, they’d quickly add a lightweight data federation front-end to give the appearance of integration, then merge the products more closely over time.

3. They could develop or buy technology like DATAllegro’s, which essentially federates instances of an ordinary SMP DBMS across nodes of an MPP grid (Greenplum does something similar). I imagine that, for example, ripping Ingres out of DATAllegro and slotting in Oracle instead would be a pretty straightforward exercise; even without dramatic change to any of the optimizations, the resulting port would be something that ran pretty quickly on Day 1.
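As a thought experiment on point 3, here is a minimal Python sketch of the federation idea: independent single-node databases behind a thin coordinator that partitions rows across them and merges partial query results. SQLite stands in for the per-node SMP DBMS, and the hash routing and aggregate merge are my own illustrative choices, not a description of DATAllegro’s or Greenplum’s actual internals.

```python
# Toy federation of single-node databases across "grid nodes".
# SQLite stands in for the per-node SMP DBMS; a real system would
# coordinate full Ingres/Oracle/SQL Server instances over a network.
import sqlite3

NUM_NODES = 4

# One independent single-node database per node.
nodes = [sqlite3.connect(":memory:") for _ in range(NUM_NODES)]
for conn in nodes:
    conn.execute("CREATE TABLE sales (customer_id INTEGER, amount REAL)")

def route(customer_id: int) -> int:
    """Hash-partition rows across nodes, as an MPP loader would."""
    return customer_id % NUM_NODES

def load(rows):
    for customer_id, amount in rows:
        nodes[route(customer_id)].execute(
            "INSERT INTO sales VALUES (?, ?)", (customer_id, amount)
        )

def total_sales() -> float:
    """Push the aggregate down to every node, then combine the partial
    results -- the federation front-end's job; each node scans only its slice."""
    partials = [
        conn.execute("SELECT COALESCE(SUM(amount), 0) FROM sales").fetchone()[0]
        for conn in nodes
    ]
    return sum(partials)

if __name__ == "__main__":
    load([(1, 10.0), (2, 20.0), (3, 30.0), (42, 5.0)])
    print(total_sales())  # 65.0
```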

Bottom line: Oracle and Microsoft are hemorrhaging at the data warehouse high end now. But there are ways they could stanch the bleeding.

October 10, 2007

SAS goes MPP on Teradata first

After a hurried discussion with SAS CTO Keith Collins and a followup with Teradata CTO Todd Walter, I think I’ve figured out the essence of the SAS port to Teradata. (Subtle nuances, however, have to await further research.) Here’s what I think is going on:

1. SAS is porting or creating two different products or modules, with two different names (and I don’t know exactly what those names are). The two different things they are porting amount to modeling (i.e., analysis) and scoring (i.e., using the results of the model for automated decision-making).

2. Both products are slated for delivery at or near the time of SAS 9.2, which is slated for GA at or near the middle of next year. (Maybe somebody from SAS could send me the official word, as well as product names and so on?)

3. The essence of the modeling port is a library of static UDFs (User Defined Functions).

4. The essence of the SAS scoring port is the ability to easily generate a single “dynamic” UDF to score according to a particular model. This would seem to leverage Teradata scoring-related enhancements much more than it would compete or conflict with them.

5. There are two different kinds of benefits SAS gets from integrating with an MPP (Massively Parallel Processing) DBMS. One is actual parallel processing of operations, shortening absolute calculation time dramatically, and also leveraging Moore’s Law without painful SMP (Symmetric MultiProcessing) overhead. The other is a radical reduction in data movement costs for the handoff between the database and the SAS software. Interestingly, SAS reports huge performance gains even from putting its software on a single node inside the Teradata grid. That is, changing how data movement is done is already a huge win, even when there’s no reduction in the overall amount moved. But of course, in the complete implementation, where database and SAS processing are done on the same nodes, there’s also a huge reduction in actual data movement effort required.
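To make point 4 a bit more concrete, here is a rough Python sketch of the “dynamic UDF” idea: compile one particular fitted model into a per-row scoring function and register it inside the database, so scoring runs next to the data rather than shipping rows out to an analytics server. The logistic-regression form, the function names, and the SQLite stand-in are all illustrative assumptions on my part, not SAS’s or Teradata’s actual interfaces.

```python
# Sketch of in-database scoring via a generated, model-specific UDF.
# All names and the model form are hypothetical illustrations.
import math
import sqlite3

def make_scoring_udf(intercept, coefficients):
    """Generate a scoring function for one particular fitted model --
    the analogue of generating a single 'dynamic' UDF per model."""
    def score(*features):
        z = intercept + sum(c * x for c, x in zip(coefficients, features))
        return 1.0 / (1.0 + math.exp(-z))  # probability of the modeled event
    return score

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, tenure REAL, balance REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [(1, 2.0, 500.0), (2, 8.0, 50.0)])

# Register the model-specific UDF; an MPP DBMS would run it on every node,
# scoring each node's rows in place.
conn.create_function("churn_score", 2, make_scoring_udf(-1.0, [0.3, -0.001]))

for row in conn.execute(
        "SELECT customer_id, churn_score(tenure, balance) FROM customers"):
    print(row)
```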

One obvious question would be: How hard would it be for SAS to replicate this work on other MPP DBMS? Well, at its core this work involves implementing a variety of elementary arithmetic and data manipulation functions. So a first-best guess is that a fairly efficient port would be easy (given that this one has already been performed), but that the last 20% or whatever of the performance optimizations require a lot more work. As to whether or not this is more than a theoretical question — well, both SAS and SPSS are disclosed members of the Netezza Developers Network. As for SMP DBMS — well, some of the work certainly could be replicated, but other important parts don’t even make sense on Oracle or Microsoft the way they do on Teradata, Netezza, DATAllegro, et al. Read more

October 9, 2007

Marketing versus reality on the one-petabyte barrier

Usually, I don’t engage in the kind of high-speed quick-response blogging I have over the past couple of days from the Teradata Partners conference (and more generally have for the past week or so). And I’m not sure it’s working out so well.

For example, the claim that Teradata has surpassed the one-petabyte mark comes as quite a surprise to a variety of Teradata folks, not to mention at least one reliable outside anonymous correspondent. That claim may indeed be true about raw disk space on systems sold. But the real current upper limit, according to CTO Todd Walter,* is 500-700 terabytes of user data. He thinks half a dozen or so customers are in that range. I’d guess quite strongly that three of those are Wal-Mart, eBay, and an unspecified US intelligence agency.

*Teradata seems to have quite a few CTOs. But I’ve seen things much sillier than that in the titles department, and accordingly shan’t scoff further — at least on that particular subject. 😉

On the other hand, if anybody did want to buy a 10 petabyte system, Teradata could ship them one. And by the way, the Teradata people insist Sybase’s claims in the petabyte area are quite bogus. Teradata claims to have had bigger internal systems tested earlier than the one Sybase writes about.

October 9, 2007

Yet more on petabyte-scale Teradata databases

I managed to buttonhole Teradata’s Darryl MacDonald again, to follow up on yesterday’s brief chat. He confirmed that there is more than one petabyte+ Teradata database out there, of which at least one is commercial rather than government/classified. Without saying who any of them were, he dropped a hint suggestive of Wal-Mart. That makes sense, given that a 423 terabyte figure for Wal-Mart is now three years old, and Wal-Mart is in the news for its 4 petabyte futures. Yes, that news has tended to mention HP NeoView recently more than Teradata. But it seems very implausible that a NeoView replacement of Teradata has already happened, even if such a thing is a possibility for the future. So right now, however much data Wal-Mart has on its path from 423 terabytes to 4 petabytes and beyond is probably collected mainly on Teradata machines.

October 8, 2007

Hot buzzword — multidimensional partitioning

Teradata finally announced multidimensional range partitioning in Version 12, not that they kept their plans in that regard a big secret. DATAllegro has also shipped multidimensional partitioning to at least one customer. Other vendors — well, I’ll stop there, given my ongoing attitude problems about vendors’ self-defeating NDAs.

Whether or not multidimensional partitioning is a big improvement over single-dimensional will of course depend a great deal on the details of a particular database. Teradata used a figure of 30% performance improvement, but that’s surely just an example. Certainly in some extreme cases one could have a rather large reduction in the amount of data retrieved, and correspondingly a many-times-X improvement in the performance of certain important queries. Read more
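To illustrate why the gain can be much bigger than 30% in favorable cases, here is a back-of-envelope Python sketch of partition elimination with one versus two partitioning dimensions. The partition counts and the query shape are made-up numbers, purely for illustration.

```python
# Back-of-envelope look at partition elimination: one partitioning
# dimension vs. two. All counts are illustrative, not vendor figures.

MONTHS = 36    # partitions on the date dimension
REGIONS = 50   # partitions on the region dimension

def fraction_scanned(months_hit, regions_hit, multidimensional):
    """Fraction of the table's partitions a query has to read."""
    if multidimensional:
        # Partitioned on (month, region): both predicates prune partitions.
        return (months_hit * regions_hit) / (MONTHS * REGIONS)
    # Partitioned on month only: the region predicate prunes nothing.
    return months_hit / MONTHS

# A "one month, one region" query:
print(fraction_scanned(1, 1, multidimensional=False))  # ~0.028 of the table
print(fraction_scanned(1, 1, multidimensional=True))   # ~0.00056 -- 50x less
```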

October 8, 2007

Teradata apparently has crossed the petabyte barrier

According to a hurried conversation I had with Chief Marketing Officer Darryl MacDonald, Teradata has customers with over 1 petabyte of user data in a single instance. He wouldn’t disclose any names, but I’d guess one is eBay, who he did confirm is a customer. The intelligence area is another one where I’d speculate there are Very Large Databases.

However, since Darryl mentioned testing systems internally up to 4 petabytes, I’d guess the upper limit of Teradata deployments is in the 1-2 petabyte range.

EDIT: I’m now guessing that Teradata’s largest classified database — which previously was the largest overall — isn’t much over a petabyte in size. And there’s a strong chance this is larger than any unclassified one.

Update: That wasn’t really 1+ petabyte of user data.


October 8, 2007

SAS gets close to the database

One of the big announcements at the Teradata user conference this week (confusingly named “Partners”) is SAS integration. Now, SAS is integrating with other MPP data warehouse appliance vendors as well, but it’s likely that the Teradata integration is indeed the most advanced. For example, one customer proof point offered was an insurer that used this capability to reevaluate its risk profile at high speed after Hurricane Katrina. I doubt any of the other SAS/DBMS integrations I know of were in customer hands a year ago.

Three still-open questions I hope to address over the next couple of days are: Read more
