Data warehouse appliances
Analysis of data warehouse appliances – i.e., of hardware/software bundles optimized for fast query and analysis of large volumes of (usually) relational data. Related subjects include:
- Data warehousing
- Parallelization
- Netezza
- DATAllegro
- Teradata
- Kickfire
- (in The Monash Report) Computing appliances in multiple domains
DATAllegro vs. Vertica and other columnar systems
Stuart Frost of DATAllegro offered an interesting counter today to columnar DBMS architectures — vertical partitioning. In particular, he told me of a 120 terabyte (growing soon to 250 terabytes) call data record database, in which a few key columns were separated out. Read more
Categories: Columnar database management, Data warehouse appliances, Data warehousing, DATAllegro, Kognitio, Vertica Systems | 13 Comments |
Netezza under fire
I talk to a lot of data warehouse software and/or appliance start-ups. Naturally, they’re all gunning for Netezza, and regale me with stories about competitive replacements, competitive wins, benchmark wins, and the like. And there have been a couple of personnel departures too, notably development chief Bill Blake. Netezza insists this is because he got a CEO offer he couldn’t refuse, he’s still friendly with the company, development plans are entirely on track, and news of some sort is coming out in a few weeks. Also, Greenplum brags that its Asia/Pacific manager was snagged from Netezza.
On the other hand, Netezza claims lots of sales momentum, and that’s certainly consistent with what I hear from its competitors. Read more
Categories: Business Objects, Data warehouse appliances, Data warehousing, Greenplum, Netezza | Leave a Comment |
Word of the day: “Compression”
IBM sent over a bunch of success stories recently, with DB2’s new aggressive compression prominently mentioned. Mike Stonebraker made a big point of Vertica’s compression when last we talked; other column-oriented data warehouse/mart software vendors (e.g. Kognitio, SAP, Sybase) get strong compression benefits as well. Other data warehouse/mart specialists are doing a lot with compression too, although some of that is governed by please-don’t-say-anything-good-about-us NDA agreements.
Compression is important for at least three reasons:
- It saves disk space, which is a major cost issue in data warehousing.
- It saves I/O, which is the major performance issue in data warehousing.
- In well-designed systems, it can actually make on-chip execution faster, because the gains in memory speed and movement can exceed the cost of actually packing/unpacking the data. (Or so I’m told; I haven’t aggressively investigated that claim.)
When evaluating data warehouse/mart software, take a look at the vendor’s compression story. It’s important stuff.
EDIT: DATAllegro claims in a note to me that they get 3-4x storage savings via compression. They also make the observation that fewer disks ==> fewer disk failures, and spin that — as it were 🙂 — into a claim of greater reliability.
Categories: Data warehouse appliances, Data warehousing, Database compression, DATAllegro, IBM and DB2, SAP AG, Vertica Systems | 3 Comments |
Greenplum’s strategy
I talked with Greenplum honchos Bill Cook and Scott Yara yesterday. Bill is the new CEO, formerly head of Sun’s field operations. Scott is president, and in effect the marketing-guy co-founder. I still don’t know whether I really believe their technical story. But I do think I have a feel for what they’re trying to do. Key aspects of the Greenplum strategy include:
- Greenplum rewrote a lot of PostgreSQL to parallelize it, in the correct belief that MPP is the best way to go for high-end data warehousing.
- Indeed, Greenplum claims to have a general solution to DBMS parallelization. Unlike Netezza, DATallegro, Vertica, and Kognitio, Greenplum offers a row-oriented data store with a fairly full set of indexing techniques. You want star indices or bitmaps? They have them. (They even claimed to be used for some text management when last we talked, although that was for O’Reilly and Mark Logic seems to be O’Reilly’s main text-indexing vendor.)
- Greenplum’s main sales strategy is to be part of Sun’s product line, bundled into Thumper boxes as single-part-number Sun offerings. They certainly could add other hardware OEMs, just like Checkpoint sells firewalls through multiple appliance vendors. But at least for now it’s all about Sun.
Categories: Data warehouse appliances, Data warehousing, Greenplum, Open source, PostgreSQL | 5 Comments |
DBMS market competitive overview (Part 1)
Monash Advantage members just received an exclusive nine-page Monash Letter with a competitive overview of the DBMS industry. The full analysis is exclusive to them, but I’ll give some highlights here.
1. As per my recent “deck-clearing” posts, there’s a lot more competitive opportunity in the DBMS industry than many observers recognize.
2. One reason is the considerable number of separate niches in the DBMS space.
3. Oracle is a classical Geoffrey Moore “gorilla” only in the market for high-end OLTP and mixed-used DBMS. Everything else is up for grabs.
4. As discussed here extensively, simpler appliance-like architectures are beating the overly complex general-purpose DBMS vendors’ solutions for VLDB data warehousing.
5. MPP/shared-nothing architectures are deservedly beating SMP/shared-everything approaches for VLDB data warehousing.
That’s not the only Monash Letter recently released; another one covered online marketing strategy and tactics.
Categories: Data warehouse appliances, Data warehousing, Database diversity, Oracle, Theory and architecture | Leave a Comment |
Why Oracle and Microsoft will lose in VLDB data warehousing
I haven’t been as clear as I could have been in explaining why I think MPP/shared-nothing beats SMP/shared-everything. The answer is in a short white paper, currently bottlenecked at the sponsor’s end of the process. Here’s an excerpt from the latest draft:
There are two ways to make more powerful computers:
1. Use more powerful parts – processors, disk drives, etc.
2. Just use more parts of the same power.
Of the two, the more-parts strategy much more cost-effective. Smaller* parts are much more economical, since the bigger the part, the harder and more costly it is to avoid defects, in manufacturing and initial design alike. Consequently, all high-end computers rely on some kind of parallel processing.
*As measured in terms of capacity, transistor count, etc., not physical size. Read more
Categories: Data warehouse appliances, Data warehousing, DATAllegro, Microsoft and SQL*Server, Netezza, Oracle, Parallelization, Teradata, Theory and architecture, Vertica Systems | 7 Comments |
Really big databases
Business Intelligence Lowdown has a well-dugg post listing what it claims are the 10 largest databases in the world. The accuracy leaves much to be desired, as is illustrated by the fact that #10 on the list is only 20 terabytes, while entirely unmentioned is eBay’s 2-petabyte database (mentioned here, and also here). Read more
Categories: Data warehouse appliances, Data warehousing, DATAllegro, Greenplum, IBM and DB2, Netezza, Oracle, SAS Institute, Teradata, Theory and architecture | 4 Comments |
If you can’t trust the storage vendors …
… isn’t that another reason to go with massively parallel systems?
StorageMojo has a great post on storage myth and reality.
Want to continue getting great research about DBMS, analytics, and other technologies related to data management? Then subscribe to our feed, by RSS/Atom or e-mail! We recommend taking the integrated feed for all our blogs, but blog-specific ones are also easily available.
Data warehouse appliance hardware strategies
Recently, I’ve done extensive research into the hardware strategies of computing appliance vendors, across multiple functional areas. Data warehousing, firewall/unified threat management, antispam, data integration – you name it, I talked to them. Of course, each vendor has a unique twist. But some architectural groupings definitely emerged.
The most common approaches seem to be:
Type 1: Custom assembly from off-the-shelf parts. In this model, the only unusual (but still off-the-shelf) parts are usually in the area of network acceleration (or occasionally encryption). Also, the box may be balanced differently than standard systems, in terms of compute power and/or reliability.
Type 2 (Virtual): We don’t need no stinkin’ custom hardware. In this model, the only “appliancy” features are in the areas of easy deployment, custom operating systems, and/or preconfigured hardware.
And of course there are also appliances of Type 0: Custom hardware including proprietary ASICs or FPGAs.
Different markets had different emphases; e.g., firewall appliances are typically Type 1, while antispam devices cluster in Type 2. But the data warehouse appliance market is highly diverse, which maybe shouldn’t be a surprise. After all, the revenue market leader is non-appliance software vendor Oracle, while noisy upstart Netezza is famous for its FPGA. Read more
Categories: Data warehouse appliances, Data warehousing, DATAllegro, Greenplum, IBM and DB2, Kognitio, Netezza, Teradata | 8 Comments |
And then there were two: DATAllegro seems to be going with standard hardware
A while ago – for example, in a comment dated July 9, 2006 — CEO Stuart Frost of DATAllegro hinted that the company might port its software to commodity hardware before long. If this user story is to be believed, that has now happened. (Specific quote: “the Datallegro system is based on Dell and EMC hardware …”) Officially, the company is doing a Sgt. Schultz on the subject. But the evidence is pretty clear. Read more
Categories: Data warehouse appliances, Data warehousing, DATAllegro | 3 Comments |