Data warehouse appliances
Analysis of data warehouse appliances – i.e., of hardware/software bundles optimized for fast query and analysis of large volumes of (usually) relational data. Related subjects include:
- Data warehousing
- Parallelization
- Netezza
- DATAllegro
- Teradata
- Kickfire
- (in The Monash Report) Computing appliances in multiple domains
Three happy 100 terabyte-plus customers for DATAllegro
Over on my Network World blog, I asked the question “So who are DATAllegro’s actual current customers?” As regular readers know, that’s a fairly hard question to answer. TEOCO is widely known as DATAllegro’s flagship reference, but after that the list gets thin in a hurry.
As a by-the-by to other discussions, DATAllegro CEO Stuart Frost undertook to respond in part himself. Specifically, he gave me the names of two other happy customers that are or imminently will be running DATAllegro against 100+ terabytes of user data. Read more
Categories: Data warehouse appliances, Data warehousing, DATAllegro, DBMS product categories
Netezza update
In my usual dual role, I called Phil Francisco of Netezza to lay some post-Microsoft/DATAllegro consulting on him late on a Friday night, and then took the opportunity of being on the phone with him to get a general Netezza update. Netezza’s July quarter just ended, so they’re still in their quiet period, and I didn’t press him for a lot of numerical detail. More generally, I didn’t find out a lot that wasn’t already covered in my May Netezza update. But notwithstanding all those disclaimers, it was still a pretty interesting chat. Read more
Categories: Data warehouse appliances, Data warehousing, Greenplum, Netezza, Sybase | 3 Comments |
How will Oracle save its data warehouse business?
By acquiring DATAllegro, Microsoft has seriously leapfrogged Oracle in data warehouse technology. All doubts about maturity and versatility notwithstanding, DATAllegro has a 10X or better advantage (actually, I think it’s more like 20-40X) versus Oracle in the size of warehouse its technology can straightforwardly handle. Oracle cannot afford to let this move go unanswered.
It’s of course possible that Oracle has been successfully developing comparable data warehouse technology internally. But it’s unlikely. Oracle hasn’t done anything that radical, internally and successfully, for about 15 years, RAC (Real Application Clusters) excepted. (I.e., since the object/relational extensibility framework started in Release 7.) So in all likelihood, the answer will come via acquisition. I think there are four candidates that make the most sense: Teradata, Vertica, ParAccel, and Greenplum. Kognitio (controlled by former Oracle honcho Geoff Squire) might be in the mix as well. Netezza is probably a non-starter because of its hardware-centric strategy.
Here’s why I’m emphasizing Teradata, Vertica, ParAccel, and Greenplum: Read more
Categories: Analytic technologies, Data warehouse appliances, Data warehousing, DATAllegro, Greenplum, Microsoft and SQL*Server, Oracle, ParAccel, Teradata, Vertica Systems | 15 Comments |
Microsoft is buying DATAllegro
I’ve long argued that:
- Oracle and Microsoft are doomed in the data warehouse market unless they acquire MPP/shared-nothing data warehouse DBMS and/or data warehouse appliances.
- DATAllegro is the ideal acquisition for either of them.
Microsoft has now validated my claim by agreeing to buy DATAllegro. As you probably know, we’ve been covering DATAllegro extensively, as per the links listed below.
Basic deal highlights include: Read more
Declaration of Data Independence (humor)
The data warehouse appliance industry has a well-developed funny bone. Dataupia’s contribution is a Declaration of Data Independence, which begins:
When in the Course of an increasingly competitive global economy it becomes necessary for one data set to dissolve its connections to a constraining environment, the separate but inherently unequal station to which the Laws of Whose budget is larger prevails.
Related links:
- Cartoons from DATAllegro
- April Fool press release from Netezza
Oracle Optimized Warehouse Initiative
Oracle’s response to data warehouse appliances — and to IBM’s BCUs (Balanced Configuration Units) — so far is the Oracle Optimized Warehouse Initiative (OOW, not to be confused with Oracle Open World). A small amount of information about Oracle Optimized Warehouse can be found on Oracle’s website. Another small amount can be found in this recent long and breathless TDWI article, full of such brilliancies as attributing to the data warehouse appliance vendors the “claim that relational databases simply aren’t cut out for analytic workloads.” (Uh, what does he think they’re running — CODASYL DBMS?)
So far as I can tell, what Oracle Optimized Warehouse — much like IBM’s BCU — boils down to is the same old Oracle DBMS, but with recommended hardware configuration and tuning parameters. Thus, a lot of the hassle is taken out of ordering and installing an Oracle data warehouse, which is surely a good thing. But I doubt it does much to solve Oracle’s problems with price, price/performance, or the inevitable DBA hassles derived from a poorly-performing DBMS.
Categories: Data warehouse appliances, Data warehousing, Oracle | 3 Comments |
DATAllegro on compression
DATAllegro CEO Stuart Frost has been blogging quite a bit recently (and not before time!). A couple of his posts have touched on compression. In one he gave actual numbers for compression, namely:
DATAllegro compresses between 2:1 and 6:1 depending on the content of the rows, whereas column-oriented systems claim 4:1 to 10:1.
In another recent post, Stuart touched on architecture, saying:
Due to the way our compression code works, DATAllegro’s current products are optimized for performance under heavy concurrency. The end result is that we don’t use the full power of the platform when running one query at a time.
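To put the compression ratios quoted above in storage terms, here is a small arithmetic sketch. The 100 TB raw size is a hypothetical figure chosen for illustration, not a number from Stuart's posts, and the function name is likewise made up.

```python
def compressed_size_tb(raw_tb, ratio):
    """Storage needed for raw_tb of data at a ratio:1 compression ratio."""
    return raw_tb / ratio

RAW_TB = 100  # hypothetical raw data volume, for illustration only

# Ratios quoted in the post: 2:1 to 6:1 for DATAllegro,
# 4:1 to 10:1 claimed by column-oriented systems.
row_based = [compressed_size_tb(RAW_TB, r) for r in (2, 6)]
columnar = [compressed_size_tb(RAW_TB, r) for r in (4, 10)]

print(f"DATAllegro, 2:1 to 6:1: {row_based[1]:.1f} to {row_based[0]:.1f} TB")
print(f"Columnar claims, 4:1 to 10:1: {columnar[1]:.1f} to {columnar[0]:.1f} TB")
```

The two ranges overlap, so which approach needs less disk for a given warehouse depends on where in its range each one actually lands.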
Categories: Analytic technologies, Data warehouse appliances, Data warehousing, Database compression, DATAllegro
Data warehouse appliance power user TEOCO
If you had to name super-high-end users of data warehouse technology, your list might start with a few retailers, credit data processors, and telcos, plus the US intelligence establishment. Well, it turns out that TEOCO runs outsourced data warehouses for several of the top US telcos, making it one of the top data warehouse technology users around.
A few weeks ago, I had a fascinating chat with John Devolites of TEOCO. Highlights included:
- TEOCO runs a >200 TB DATAllegro warehouse for a major US telco. (When we hear about a big DATAllegro telco site that’s been in production for a while, that’s surely the one they’re talking about.)
- TEOCO runs around 450 TB total of DATAllegro databases across its various customers. (When Stuart Frost blogs of >400 TB “systems,” that may be what he’s talking about.)
- TEOCO likes DATAllegro better than Netezza, although the margin is now small. This is mainly for financial reasons, specifically price-per-terabyte. When TEOCO spends its own money without customer direction as to appliance brand, it buys DATAllegro.
- TEOCO runs at least one 50 TB Netezza system — originally due to an acquisition of a Netezza user — with more coming. There also is more DATAllegro coming.
- TEOCO feels 15-30 concurrent users is the current practical limit for both DATAllegro and Netezza. That’s greater than it used to be.
- Netezza is a little faster than DATAllegro on a few esoteric queries, but the difference is not important to TEOCO’s business.
- Official price lists notwithstanding, TEOCO sees prices as being in the $10K/TB range. DATAllegro’s price advantage has shrunk greatly, as others have come down to more or less match. However, since John stated his price preference for DATAllegro as being in the present tense, I presume the price match isn’t perfect.
- Teradata was never a serious consideration, for price reasons.
- In the original POC a few years ago, the incumbent Oracle — even after extensive engineering — couldn’t get an important query down under 8 hours of running time. DATAllegro and Netezza both handled it in 2-3 minutes. Similarly, Oracle couldn’t get the load time for 100 million call detail records (CDRs) below 24 hours.
- Applications sound pretty standard for telecom: Lots of CDR processing — 550 million/day on the big DATAllegro system cited above. Pricing and fraud checking. Some data staging for legal reasons (giving the NSA what it subpoenas and no more).
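For a rough sense of scale, the figures above can be turned into rates and ratios. The sketch below is plain arithmetic on the quoted numbers. It assumes the 550 million CDRs/day are spread evenly over 24 hours, which is a simplification, and the function names are illustrative only. Since Oracle never got below 8 hours on the query or 24 hours on the load, the Oracle-side results are bounds, not measurements.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def rows_per_second(rows, seconds):
    """Average throughput, assuming the work is spread evenly over the period."""
    return rows / seconds

def speedup(slow_minutes, fast_minutes):
    """How many times faster the fast system finished."""
    return slow_minutes / fast_minutes

# 550 million CDRs/day on the big DATAllegro system.
datallegro_rate = rows_per_second(550_000_000, SECONDS_PER_DAY)

# Oracle could not load 100 million CDRs in under 24 hours,
# so this is an upper bound on its load rate in the POC.
oracle_rate_ceiling = rows_per_second(100_000_000, SECONDS_PER_DAY)

# Query: over 8 hours on Oracle vs. 2-3 minutes on the appliances,
# so these are lower bounds on the speedup.
min_speedup = speedup(8 * 60, 3)
max_speedup = speedup(8 * 60, 2)

print(f"DATAllegro average: about {datallegro_rate:,.0f} CDRs/second")
print(f"Oracle load ceiling: under {oracle_rate_ceiling:,.0f} CDRs/second")
print(f"Query speedup: at least {min_speedup:.0f}x to {max_speedup:.0f}x")
```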
Categories: Analytic technologies, Data mart outsourcing, Data warehouse appliances, Data warehousing, DATAllegro, Netezza, Pricing, Specific users, Telecommunications, TEOCO | 7 Comments |
Netezza on compression
Phil Francisco put up a nice post on Netezza’s company blog about a month ago, explaining the Netezza compression story. Highlights include:
- Like other row-based vendors, Netezza compresses data on a column-by-column basis, then stores the results in rows. This is obviously something of a limitation — no run-length encoding for them — but can surely accommodate several major compression techniques.
- The Netezza “Compress Engine” compresses data on a block-by-block basis. This is a disadvantage for row-based systems vs. columnar ones in the area of compression, because columnar systems have more values per block to play with, and that yields higher degrees of compression. And among row-based systems, typical block size is an indicator of compression success. Thus, DATAllegro probably does a little better at compression than Netezza, and Netezza does a lot better at compression than Teradata.
- Netezza calls its compression “compilation.” The blog post doesn’t make the reason clear. And the one reason I can recall confuses me. Netezza once said the compression extends at least somewhat to columns with calculated values. But that seems odd, as Netezza only has a very limited capability for materialized views.
- Netezza pays the processing cost of compression in the FPGA, not the microprocessor. And so Netezza spins the overhead of the Compress Engine as being zero or free. That’s actually not ridiculous, since Netezza seems to have still-unused real estate on the FPGA for new features like compression. Read more
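The points above about run-length encoding and block size can be illustrated with a toy example. This is not Netezza's Compress Engine or any vendor's actual algorithm. It is a minimal sketch with a made-up table (a clustered state column plus a unique call id) and hypothetical function names, showing why run-length encoding wants a column's values stored contiguously, and why more values per block help.

```python
from itertools import groupby

def run_length_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]

def encoded_pairs_per_block(values, block_size):
    """Run-length encode each fixed-size block independently.

    Returns the total number of (value, run_length) pairs emitted.
    """
    total = 0
    for start in range(0, len(values), block_size):
        total += len(run_length_encode(values[start:start + block_size]))
    return total

# Hypothetical table: 1,200 rows clustered by state, each with a unique call id.
states = ["CA"] * 400 + ["NY"] * 400 + ["TX"] * 400
call_ids = list(range(1200))

# Row layout: the fields of each row sit next to each other in storage.
row_stream = [field for row in zip(states, call_ids) for field in row]

# Columnar layout: the state column is stored contiguously.
columnar_pairs = len(run_length_encode(states))
row_pairs = len(run_length_encode(row_stream))

# Block size effect: every block boundary cuts a run short.
small_block_pairs = encoded_pairs_per_block(states, 50)
large_block_pairs = encoded_pairs_per_block(states, 400)

print(columnar_pairs, row_pairs, small_block_pairs, large_block_pairs)
```

In this toy case the contiguous state column collapses to 3 pairs, while the interleaved row stream has no runs at all and needs 2,400. Encoding the same column in blocks of 50 values takes 24 pairs versus 3 with blocks of 400, which is the "more values per block" effect in miniature.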
Categories: Analytic technologies, Columnar database management, Data warehouse appliances, Data warehousing, Database compression, Netezza, Theory and architecture | 2 Comments |
Netezza has an EMC deal too
Netezza has an EMC deal too. As befits a hardware vendor, Netezza has an actual OEM relationship with EMC, in which it is offering CLARiiONs built straight into NPS appliances. 5 TB of CLARiiON will be free in any Netezza system from 2 racks on upward. (A rack holds about 12.5 TB.) In addition, you’ll be able to buy 10 TB more of CLARiiON in every Netezza rack, if you want. The whole thing is supposed to ship before year-end. Read more
Categories: Analytic technologies, Data warehouse appliances, Data warehousing, EMC, Netezza | 5 Comments |