Data warehouse appliances

Analysis of data warehouse appliances – i.e., of hardware/software bundles optimized for fast query and analysis of large volumes of (usually) relational data.

August 12, 2006

Introduction to Greenplum and some compare/contrast

Netezza relies on FPGAs. DATallegro essentially uses standard components, but those include Infiniband cards (and there’s a little FPGA action when they do encryption). Greenplum, however, claims to offer a highly competitive data warehouse solution that’s so software-only you can download it from their web site. That said, their main sales mode also seems to be through appliances, specifically ones branded and sold by Sun, combining Greenplum and open source software on a “Thumper” box. And the whole thing supposedly scales even higher than DATallegro and Netezza, because you can manage over a petabyte if you chain together a dozen of the 100 terabyte racks.

July 9, 2006

OS-DBMS integration

A Slashdot thread tonight on the possibility of Oracle directly supporting Linux got me thinking – integration of DBMS and OS is much more common than one might at first realize, especially in high-end data warehousing.

Think about it.

This trend isn’t quite universal, of course. Open systems DB2 and Sybase and Progress and MySQL and so on are quite OS-independent, and of course you could dispute my characterization of Oracle as being “integrated” with the underlying OS. But in performance-critical environments, DBMS are often intensely OS-aware.

And of course this dovetails with a point I noted in another thread – DBMS are (or need to become) increasingly aware of chip architecture details as well.

July 3, 2006

DATallegro’s technical strategy

Few areas of technology boast more architectural diversity than data warehousing. Mainframe DB2 is different from Teradata, which is different from the leading full-spectrum RDBMS, which are different from disk-based appliances, which are different from memory-centric solutions, which are different from disk-based MOLAP systems, and so on. What’s more, no two members of the same group are architected the same way; even the market-leading general purpose DBMS have important differences in their data warehousing features.

The hot new vendor on the block is DATallegro, which is stealing much of the limelight formerly enjoyed by data warehouse appliance pioneer Netezza. (After some good early discussions, Netezza a year ago abruptly reneged on a promise to explain more about the workings of its technology to me, and I’ve hardly heard from them since. Yes, they’re still much bigger than DATallegro, but I suspect they’ve hit some technical roadblocks, and their star is fading.)


June 28, 2006

Good DATallegro/Intel white paper

I really like this short white paper, which carries the personal byline of Stuart Frost. Stuart is DATallegro’s CEO, and also the guy who does analyst relations for them (at least in my case). Part of it just does a concise job of spelling out some of the DATallegro story. But the rest is about the comparison between Intel’s new dual-core “Woodcrest” Xeons and their single-core predecessors. Not only does it give credible statistics, it explains the reasons behind them.


May 22, 2006

Data warehouse appliances

If we define a “data warehouse appliance” as “a special-purpose computer system, with appliance administrability, that manages a data warehouse,” then there are two major contenders: Netezza and DATAllegro, both startups, both with a small number of disclosed customers. Past contenders would include Teradata and White Cross (which seems to have just merged into Kognitio), but neither would admit to being in that market today. (I suspect this is a mistake on Teradata’s part, but so be it.) IBM with DB2 on the z-Series wouldn’t be properly regarded as an appliance player either, although IBM is certainly conscious of appliance competition. And SAP’s BI Accelerator does not persist data at this time.

In principle, the Netezza and DATAllegro stories are similar — take an established open source RDBMS*, build optimized hardware to run it, and optimize the software configuration as well. Much of the optimization is focused on getting data on and off disk sequentially, minimizing any random accesses. This is why I often refer to data warehouse appliances as being the best alternative to memory-centric data management. Beyond that, the optimizations by the two vendors differ considerably.
*Netezza uses PostgreSQL; DATAllegro uses Ingres.
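
As a rough illustration of what sequential streaming buys, here’s a tiny Python sketch of full-table-scan time as disks are added in parallel. The table size and the per-disk transfer rate are assumed round numbers for illustration, not vendor figures.

# Time to stream a table sequentially off disk; all numbers are assumed.

def scan_minutes(table_gb, mb_per_sec_per_disk, disks):
    """Scan time if the table is striped evenly across `disks` drives,
    each streaming at `mb_per_sec_per_disk` with no random seeks."""
    total_mb = table_gb * 1024.0
    return total_mb / (mb_per_sec_per_disk * disks) / 60.0

# A 1 TB table at an assumed 60 MB/second sustained per drive:
for disks in (1, 10, 100):
    print(f"{disks:>3} disks: {scan_minutes(1024, 60, disks):6.1f} minutes")

# Prints:
#   1 disks:  291.3 minutes
#  10 disks:   29.1 minutes
# 100 disks:    2.9 minutes

Parallel streaming across many drives, while avoiding random seeks, is the crux of both vendors’ optimizations.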

Hmm. I don’t feel like writing more on this subject at this very moment, yet I want to post something urgently because there’s an IOU in my Computerworld column today for it. OK. More later.

November 14, 2005

So how robust is Ingres?

CA is spinning off Ingres, more or less, to an investment fund led by Terry Garnett, who will also be interim CEO. Now, I’ve given Terry a lot of grief over the decades. It started by accident, when I bashed his presentation of Lightyear at a 1984 party in Rosann Stach’s house (where we also used Jerry Kaplan as a subject for the Mindprober psychological analysis product — those were the days of goofy software!). Years later, I didn’t even recall that had been Terry until I was reminded. But in the early 1990s, when Terry and Jerry Baker were dueling at Oracle, I was firmly in the Jerry Baker camp, and to this day I believe I was right. Still — be all that as it may, Terry knows DBMS and knows promotion, and if the company falls flat it won’t be because he screwed it up. He’s no dunce, and he’s been around DBMS a loooong time.

But how stands the product? Let’s flash back a decade, to when CA bought it. Ingres was a solid general-purpose RDBMS. But it was beginning to fall behind the technology power curve, especially on the data warehousing side. (For more detail, see my Ingres history post over in the Software Memories blog.) And then product development slowed to a crawl. Tony Gaughan, who ran the product for CA before the latest move, claims that they’ve actually done a good job of advancing the product on the OLTP side, perhaps to the point of comparability with Oracle9i, and certainly ahead of MySQL 5.0. I’m inclined to believe him, after applying some reasonable discount factor for expected puffery, in part because this wasn’t a high hurdle to cross. Over the past decade, the main action in high-end DBMS product enhancement has been in data warehousing and nontabular datatypes, not in OLTP.

Where Ingres definitely seems to lag is in data warehousing. E.g., there are no materialized views, and I bet that even if they have some of the index types such as bitmaps, star schemas, etc., the implementation, optimizer support, administrative support, and so on lag far behind those of Oracle and IBM. So again, the proper comparison for Ingres isn’t Oracle and IBM; it’s fellow open source vendor MySQL. Only — deserved or not, MySQL has a ton of momentum for such a small company, including an attractive product plan partially fueled by SAP.

Appliance vendor DATallegro makes a plausibility argument that Ingres can be adapted for nontrivial data warehouse uses as well. But while that’s cool, and might even become persuasive once DATallegro has some happy, disclosed customers, it’s not the same as saying you want to put a big data warehouse into off-the-shelf Ingres.

So basically, I’m afraid that Ingres is going to appeal mainly to users who either already are making major use of it, or else have a huge problem with paying the license fees demanded by other vendors. I wish them well, and hope they kindle a spark somehow; but right now I don’t see where it would be coming from.

November 13, 2005

Breaking through the disk speed barrier

Most aspects of computer performance and capacity grow at Moore’s Law kinds of speeds. Doubling times may be anywhere from 9 months to 2 years, but in any case speeds and storage capacities grow exponentially quickly. Not so, however, with disk rotation speeds. The very first disk drives, over 50 years ago, rotated 1,200 times per minute. Today’s top disk rotation speed is around 15,000 RPM. Indeed, while I recall seeing a reference to one at 15,600 RPM, I can’t now go back and find it. Yes, folks; disk rotational speed in the entire history of computing has increased just by a measly factor of 13.

Why does this matter to DBMS design? Simply put, disk rotation speed is an absolute limit to the speed of random disk-based data access. Today’s fastest disks take 4 milliseconds to rotate once. Thus, multiple heads aside, getting something from a known but random location on the disk will take, on average, at least 2 milliseconds: half a rotation of pure rotational latency, before any seek time is counted. And a naive data management algorithm will, for a single query, result in dozens or even hundreds of random accesses.
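
To make the rotational arithmetic concrete, here’s a minimal back-of-the-envelope sketch in Python. The only inputs are the rotation speeds already cited in this post (1,200 RPM then, 15,000 RPM now); nothing in it is measured from a real drive.

# Rotational latency from rotation speed; RPM figures are the ones cited above.

def rotational_latency_ms(rpm):
    """Return (time per full rotation, average rotational latency) in ms.

    A random target sector is, on average, half a revolution away from
    the read head, so average rotational latency is half the rotation time.
    """
    ms_per_rotation = 60_000.0 / rpm  # 60,000 ms in a minute
    return ms_per_rotation, ms_per_rotation / 2.0

for rpm in (1_200, 15_000):
    full, avg = rotational_latency_ms(rpm)
    print(f"{rpm:>6,} RPM: {full:5.1f} ms per rotation, {avg:4.1f} ms average latency")

# Prints:
#  1,200 RPM:  50.0 ms per rotation, 25.0 ms average latency
# 15,000 RPM:   4.0 ms per rotation,  2.0 ms average latency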

Thus, for a DBMS to run at acceptable speed, it needs to get data off disk not randomly, but rather a page at a time (i.e., in large blocks of predetermined size) or better yet sequentially (i.e., in continuous streams of indeterminate size). The indexes needed to assure these goals had best be sized to fit entirely in RAM. Clustering also plays an increasingly large role, so that data needed at the same time is likely to be on the same page, or at least in the same part of the disk.
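
To see what’s at stake, here’s a rough sketch comparing many random page reads against a single sequential scan of the same data. The seek time, transfer rate, and page size are round-number assumptions chosen for illustration, not the specs of any particular drive or DBMS.

# Random page reads vs. one sequential scan. All constants are assumed
# round numbers for illustration, not measurements.

AVG_SEEK_MS = 4.0            # assumed average seek time
AVG_ROTATION_MS = 2.0        # average rotational latency at 15,000 RPM (see above)
TRANSFER_MB_PER_SEC = 80.0   # assumed sustained sequential transfer rate
PAGE_KB = 8.0                # a common DBMS page size

def random_reads_ms(pages):
    """Each random page read pays a full seek plus rotational latency."""
    per_page_ms = (AVG_SEEK_MS + AVG_ROTATION_MS
                   + (PAGE_KB / 1024.0) / TRANSFER_MB_PER_SEC * 1000.0)
    return pages * per_page_ms

def sequential_scan_ms(megabytes):
    """One seek, then the disk streams at its full transfer rate."""
    return (AVG_SEEK_MS + AVG_ROTATION_MS
            + megabytes / TRANSFER_MB_PER_SEC * 1000.0)

pages = 10_000                   # 10,000 random 8 KB page reads...
mb = pages * PAGE_KB / 1024.0    # ...cover about 78 MB of data
print(f"random page reads: {random_reads_ms(pages) / 1000.0:5.1f} seconds")  # ~61.0
print(f"sequential scan:   {sequential_scan_ms(mb) / 1000.0:5.1f} seconds")  # ~1.0

Under these assumed numbers the sequential scan wins by a factor of roughly 60, which is exactly the gap that the page-at-a-time and streaming strategies above are chasing.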

Right there I’ve described some of the toughest ongoing challenges facing DBMS engineers. The big vendors all do a great job at meeting them (if they didn’t, they’d be out of business). Even so, some small companies find themselves able to beat the big guys, by some egregious cheating.

Data warehouse appliance vendors such as Netezza and especially DATallegro optimize their systems to stream data sequentially off of disk. In doing so, they go deeper into the operating systems, hardware, etc. than Oracle could ever allow itself to do. And the results seem pretty good. But I’ll write about that another time. Instead, I’m focusing right now on memory-centric data management; please see my other posts in that topic category.

August 8, 2005

Down with database consolidation!

As with all changes in information technology, the move to DBMS2 will largely be one of evolution. But it does have a couple of revolutionary aspects.

Short-term, the biggest change is a renunciation of database and DBMS vendor consolidation. Consolidation never has worked; it never will work; and as data integration technologies keep improving, it’s not that important anyway.

IBM and Oracle offer really great, brilliantly complex data warehousing technology. But if you want the most bang for the buck, forget about them, and go instead with a specialty vendor. Depending on the specifics of your situation, Teradata, Netezza, DATallegro, White Cross, or SAP may offer the best choice, and that list could be even longer.

Similarly, for generic OLTP data management, cheap and/or open source options are getting ever more attractive. Microsoft is a serious contender for applications that previously only Oracle and IBM could handle, while MySQL and maybe Ingres are moving up the food chain right behind.

In many cases, these alternative technologies are lower-cost across the board: Lower purchase price, lower ongoing maintenance fees, and lower administrative costs.

So what, again, is the case for consolidation?
