DATAllegro
Analysis of data warehouse appliance vendor DATAllegro and its products. Related subjects include:
- Microsoft, which is buying DATAllegro
- Data warehouse appliances
- Data warehousing
DATAllegro vs. Vertica and other columnar systems
Stuart Frost of DATAllegro offered an interesting counter today to columnar DBMS architectures — vertical partitioning. In particular, he told me of a 120 terabyte (growing soon to 250 terabytes) call data record database, in which a few key columns were separated out.
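A minimal sketch of why separating out a few key columns helps. It assumes a hypothetical call data record table with made-up column names and fixed byte widths. The frequently queried columns go into their own narrow partition, and both partitions keep a shared row id so rows can be rejoined. This illustrates vertical partitioning in general, not DATAllegro's actual implementation.

```python
from dataclasses import dataclass

# Hypothetical per-column byte widths for a wide call data record (CDR) row.
COLUMN_WIDTHS = {
    "row_id": 8,
    "caller": 8,
    "callee": 8,
    "start_time": 8,
    "duration_s": 4,
    "cell_tower": 4,
    "billing_code": 4,
    "notes": 200,  # wide, rarely queried payload
}

# Assumed "key columns" that most queries touch.
HOT_COLUMNS = ["row_id", "caller", "start_time", "duration_s"]


@dataclass
class Partitioning:
    hot: list
    cold: list


def vertically_partition(columns, hot):
    """Split columns into a narrow hot partition and a wide cold partition.

    Both partitions keep row_id so rows can be rejoined when needed.
    """
    cold = ["row_id"] + [c for c in columns if c not in hot]
    return Partitioning(hot=list(hot), cold=cold)


def bytes_scanned(num_rows, table_columns):
    """Bytes a full scan must read for these columns (uncompressed, no overhead)."""
    return num_rows * sum(COLUMN_WIDTHS[c] for c in table_columns)


if __name__ == "__main__":
    rows = 1_000_000
    parts = vertically_partition(COLUMN_WIDTHS, HOT_COLUMNS)
    full = bytes_scanned(rows, COLUMN_WIDTHS)
    hot_only = bytes_scanned(rows, parts.hot)
    print(f"full-row scan: {full:,} bytes")
    print(f"hot-partition scan: {hot_only:,} bytes ({full / hot_only:.1f}x less)")
```

A query that touches only the key columns scans the narrow partition, so it gets much of the I/O benefit a columnar store would offer while the rest of the row stays in conventional row format.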
Categories: Columnar database management, Data warehouse appliances, Data warehousing, DATAllegro, Kognitio, Vertica Systems
Word of the day: “Compression”
IBM sent over a bunch of success stories recently, with DB2’s new aggressive compression prominently mentioned. Mike Stonebraker made a big point of Vertica’s compression when last we talked; other column-oriented data warehouse/mart software vendors (e.g. Kognitio, SAP, Sybase) get strong compression benefits as well. Other data warehouse/mart specialists are doing a lot with compression too, although some of that is governed by please-don’t-say-anything-good-about-us NDAs.
Compression is important for at least three reasons:
- It saves disk space, which is a major cost issue in data warehousing.
- It saves I/O, which is the major performance issue in data warehousing.
- In well-designed systems, it can actually make on-chip execution faster, because the savings in memory bandwidth and data movement can exceed the cost of actually packing/unpacking the data. (Or so I’m told; I haven’t aggressively investigated that claim.)
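To make the disk-space and I/O points concrete, here is a minimal sketch of dictionary encoding, one common compression technique for low-cardinality columns. It is not any particular vendor's scheme, and the size estimate is a rough byte count that ignores page headers and other storage overhead.

```python
from typing import Dict, List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Replace each value with a small integer code pointing into a list of distinct values."""
    dictionary: List[str] = []
    index: Dict[str, int] = {}
    codes: List[int] = []
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return dictionary, codes


def dictionary_decode(dictionary: List[str], codes: List[int]) -> List[str]:
    """Recover the original column from the dictionary and codes."""
    return [dictionary[c] for c in codes]


def encoded_size_bytes(dictionary: List[str], codes: List[int]) -> int:
    """Rough size: dictionary strings (UTF-8) plus the narrowest fixed-width codes that fit."""
    n = len(dictionary)
    code_width = 1 if n <= 2**8 else 2 if n <= 2**16 else 4
    return sum(len(s.encode("utf-8")) for s in dictionary) + code_width * len(codes)


if __name__ == "__main__":
    column = ["California", "Texas", "New York"] * 1000
    raw = sum(len(s.encode("utf-8")) for s in column)
    d, c = dictionary_encode(column)
    assert dictionary_decode(d, c) == column  # lossless
    print(f"raw {raw} bytes, encoded {encoded_size_bytes(d, c)} bytes")
```

Fewer bytes on disk means fewer bytes to read per scan, which is where the I/O savings come from; decoding is a cheap array lookup.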
When evaluating data warehouse/mart software, take a look at the vendor’s compression story. It’s important stuff.
EDIT: DATAllegro claims in a note to me that they get 3-4x storage savings via compression. They also make the observation that fewer disks ==> fewer disk failures, and spin that — as it were 🙂 — into a claim of greater reliability.
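The reliability argument is simple arithmetic. If disk failures are roughly independent at a fixed annual rate per disk, expected failures scale with the number of disks, and compression reduces the disk count. A minimal sketch; the data volume, disk capacity, and failure rate are made-up illustrative numbers, not DATAllegro figures, and RAID or mirroring overhead is ignored.

```python
import math


def disks_needed(raw_tb: float, compression_ratio: float, disk_tb: float) -> int:
    """Disks required to hold the data after compression (ignores RAID/mirroring)."""
    return math.ceil(raw_tb / compression_ratio / disk_tb)


def expected_failures_per_year(num_disks: int, annual_failure_rate: float) -> float:
    """Expected disk failures per year, assuming independent failures at a fixed rate."""
    return num_disks * annual_failure_rate


if __name__ == "__main__":
    RAW_TB = 100.0  # hypothetical raw data volume
    DISK_TB = 0.5   # hypothetical usable capacity per disk
    AFR = 0.03      # hypothetical annualized failure rate per disk
    for ratio in (1.0, 3.0, 4.0):
        n = disks_needed(RAW_TB, ratio, DISK_TB)
        print(f"{ratio:.0f}x compression: {n} disks, "
              f"~{expected_failures_per_year(n, AFR):.1f} failures/year")
```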
Categories: Data warehouse appliances, Data warehousing, Database compression, DATAllegro, IBM and DB2, SAP AG, Vertica Systems
Ingres tries to become relevant again
Ingres has non-trivial resources – 300 employees, 10,000 “real” customers, and some additional large number of installations embedded in CA products. It has a fairly pure support-only open source revenue model, although there may be exceptions to that in cases such as the DATAllegro relationship.
Should anybody care?
Yes and no. To compete effectively in the mid-range OLTP relational database management system market, you need a product that’s much easier to administer than Oracle, and preferably easier even than Microsoft SQL*Server. Ingres doesn’t meet that standard. Until it does, it probably won’t have much of a market outside its current installed base. But some of Ingres’s strategies and directions are pretty clever, and may be interesting to people who’d never actually consider using Ingres technology. Specifically, Ingres has plans in the areas of appliances and database services, two subjects that are close to my heart.
Categories: Actian and Ingres, DATAllegro
Why Oracle and Microsoft will lose in VLDB data warehousing
I haven’t been as clear as I could have been in explaining why I think MPP/shared-nothing beats SMP/shared-everything. The answer is in a short white paper, currently bottlenecked at the sponsor’s end of the process. Here’s an excerpt from the latest draft:
There are two ways to make more powerful computers:
1. Use more powerful parts – processors, disk drives, etc.
2. Just use more parts of the same power.
Of the two, the more-parts strategy is much more cost-effective. Smaller* parts are much more economical, since the bigger the part, the harder and more costly it is to avoid defects, in manufacturing and initial design alike. Consequently, all high-end computers rely on some kind of parallel processing.
*As measured in terms of capacity, transistor count, etc., not physical size.
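The defect argument can be made concrete with the textbook Poisson yield model, in which the probability that a chip has zero defects falls exponentially with its area. This is a standard simplification chosen for illustration, not the white paper's own analysis; the defect density, wafer cost, and wafer area are made-up numbers. The sketch compares four small parts against one part with the same total area.

```python
import math


def poisson_yield(die_area_cm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies with zero defects under a simple Poisson defect model."""
    return math.exp(-defects_per_cm2 * die_area_cm2)


def cost_per_good_die(die_area_cm2: float, defects_per_cm2: float,
                      wafer_cost: float, wafer_area_cm2: float) -> float:
    """Wafer cost spread over the good dies (ignores edge loss, test, packaging)."""
    dies_per_wafer = wafer_area_cm2 / die_area_cm2
    good_dies = dies_per_wafer * poisson_yield(die_area_cm2, defects_per_cm2)
    return wafer_cost / good_dies


if __name__ == "__main__":
    D = 0.5             # hypothetical defects per cm^2
    WAFER_COST = 5000.0  # hypothetical cost per wafer
    WAFER_AREA = 700.0   # hypothetical usable wafer area, cm^2
    small = cost_per_good_die(1.0, D, WAFER_COST, WAFER_AREA)
    big = cost_per_good_die(4.0, D, WAFER_COST, WAFER_AREA)
    print(f"four 1 cm^2 parts: {4 * small:.2f}")
    print(f"one 4 cm^2 part:   {big:.2f}")
    print(f"ratio: {big / (4 * small):.2f}x")
```

Under this model the big part costs a factor of exp(D times the extra area) more for the same amount of silicon, which is the economic pressure toward many smaller parts.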
Categories: Data warehouse appliances, Data warehousing, DATAllegro, Microsoft and SQL*Server, Netezza, Oracle, Parallelization, Teradata, Theory and architecture, Vertica Systems
Really big databases
Business Intelligence Lowdown has a well-dugg post listing what it claims are the 10 largest databases in the world. The accuracy leaves much to be desired, as is illustrated by the fact that #10 on the list is only 20 terabytes, while entirely unmentioned is eBay’s 2-petabyte database (mentioned here, and also here).
Categories: Data warehouse appliances, Data warehousing, DATAllegro, Greenplum, IBM and DB2, Netezza, Oracle, SAS Institute, Teradata, Theory and architecture
Data warehouse appliance hardware strategies
Recently, I’ve done extensive research into the hardware strategies of computing appliance vendors, across multiple functional areas. Data warehousing, firewall/unified threat management, antispam, data integration – you name it, I talked to them. Of course, each vendor has a unique twist. But some architectural groupings definitely emerged.
The most common approaches seem to be:
Type 1: Custom assembly from off-the-shelf parts. In this model, the only unusual (but still off-the-shelf) parts are usually in the area of network acceleration (or occasionally encryption). Also, the box may be balanced differently than standard systems, in terms of compute power and/or reliability.
Type 2 (Virtual): We don’t need no stinkin’ custom hardware. In this model, the only “appliancy” features are in the areas of easy deployment, custom operating systems, and/or preconfigured hardware.
And of course there are also appliances of Type 0: Custom hardware including proprietary ASICs or FPGAs.
Different markets had different emphases; e.g., firewall appliances are typically Type 1, while antispam devices cluster in Type 2. But the data warehouse appliance market is highly diverse, which maybe shouldn’t be a surprise. After all, the revenue market leader is non-appliance software vendor Oracle, while noisy upstart Netezza is famous for its FPGA.
Categories: Data warehouse appliances, Data warehousing, DATAllegro, Greenplum, IBM and DB2, Kognitio, Netezza, Teradata
And then there were two: DATAllegro seems to be going with standard hardware
A while ago (for example, in a comment dated July 9, 2006), CEO Stuart Frost of DATAllegro hinted that the company might port its software to commodity hardware before long. If this user story is to be believed, that has now happened. (Specific quote: “the Datallegro system is based on Dell and EMC hardware …”) Officially, the company is doing a Sgt. Schultz on the subject. But the evidence is pretty clear.
Categories: Data warehouse appliances, Data warehousing, DATAllegro
Data mining is driving much of data warehousing
Until I did all this recent research on data warehousing, I didn’t realize just how big a role data mining plays in driving the whole thing. Basically, there are three things you can do with a data warehouse – classical BI, “operational” BI, and data mining. If we’re talking about long-running queries, that’s not operational BI, and it’s not all of classical BI either. The rest is data mining. Indeed, if you think back to what you know of the customer bases at data warehouse appliance vendors Netezza and DATAllegro, there are a lot of credit-reporting-data types of users – i.e., data miners. And it’s hard to talk about uses for those appliances very long without SAS extracts and the like coming up.
Categories: Data warehouse appliances, Data warehousing, DATAllegro, Netezza, Oracle, Predictive modeling and advanced analytics
Vendor segmentation for data warehouse DBMS
February, 2011 edit: I’ve now commented on Gartner’s 2010 Data Warehouse Database Management System Magic Quadrant as well.
Several vendors are offering links to Gartner’s new Magic Quadrant report on data warehouse DBMS. (Edit: This is now a much better link to the 2006 MQ.) Somewhat atypically for Gartner, there’s a strict hierarchy among most of the vendors, with Teradata > IBM > Oracle > Microsoft > Sybase > Kognitio > MySQL > Sand, in each case on both axes of the matrix. The only two exceptions are Netezza and DATAllegro, which are depicted as outvisioning Microsoft somewhat even as they trail both Microsoft and Sybase in execution.
Gartner Magic Quadrants tend to annoy me, and I’m not going to critique the rankings in detail. But I do think this particular MQ is helpful in framing a vendor segmentation, namely:
- Big full-spectrum MPP/shared-nothing vendors: Teradata and IBM.
- MPP/shared-nothing appliance upstarts: Netezza and DATAllegro
- Big SMP/shared-everything vendors who also are apt to be your OLTP incumbent, and who want to integrate your software stack soup-to-nuts: Oracle and Microsoft
- Niche vendors: Pretty much everybody else
Categories: Data warehouse appliances, Data warehousing, DATAllegro, IBM and DB2, Microsoft and SQL*Server, Netezza, Oracle, Parallelization, Teradata
IBM and Teradata too
If I had to name one company with the broadest possible overview of the data warehouse engine market, it would have to be IBM. IBM offers software and hardware, services-heavy deals and quasi-appliances, OLTP and ROLAP, shared-everything and shared-nothing, integrated-(almost)-everything and best-of-breed. So their ROLAP recommendations, while still rather self-serving (just as any other vendor’s would be), are at least somewhat more than just a case of “Where you stand depends upon where you sit.”
At its core, the current IBM ROLAP story is:
- Shared-nothing MPP.
- Flexible indexing, lightly applied.
- Normalized data models.
- Thoroughly mixed workloads.
- Preconfigured hardware.
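Shared-nothing MPP is the architectural core of that list. A minimal sketch of the idea: rows are hash-distributed across nodes, each node aggregates only the slice on its own disks, and a coordinator merges the small partial results. The two-column schema and the use of crc32 as the distribution hash are assumptions for illustration, not IBM's actual scheme, and the "nodes" here run sequentially rather than in parallel.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, float]  # (customer_id, amount): hypothetical schema


def node_for(key: str, num_nodes: int) -> int:
    """Deterministically map a key to a node (crc32 is stable across runs, unlike hash())."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def partition(rows: Iterable[Row], num_nodes: int) -> List[List[Row]]:
    """Distribute rows so each node owns a disjoint slice of the data."""
    nodes: List[List[Row]] = [[] for _ in range(num_nodes)]
    for key, amount in rows:
        nodes[node_for(key, num_nodes)].append((key, amount))
    return nodes


def local_aggregate(node_rows: List[Row]) -> Dict[str, float]:
    """Each node sums only its own rows; nothing is shared with other nodes."""
    totals: Dict[str, float] = defaultdict(float)
    for key, amount in node_rows:
        totals[key] += amount
    return dict(totals)


def merge(partials: List[Dict[str, float]]) -> Dict[str, float]:
    """Coordinator combines per-node partial sums into the final answer."""
    result: Dict[str, float] = defaultdict(float)
    for partial in partials:
        for key, total in partial.items():
            result[key] += total
    return dict(result)


if __name__ == "__main__":
    rows = [("alice", 10.0), ("bob", 5.0), ("alice", 2.5), ("carol", 7.0)]
    nodes = partition(rows, num_nodes=3)
    print(merge([local_aggregate(n) for n in nodes]))
```

Because rows are distributed on the grouping key here, each key lives on exactly one node; the merge step is written generally so it also works when rows for a key are spread across nodes.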
Here’s some more detail, about IBM and other vendors alike.
Categories: Data warehouse appliances, Data warehousing, DATAllegro, IBM and DB2, Netezza, Teradata