Data warehouse appliances
Analysis of data warehouse appliances – i.e., of hardware/software bundles optimized for fast query and analysis of large volumes of (usually) relational data. Related subjects include:
- Data warehousing
- Parallelization
- Netezza
- DATAllegro
- Teradata
- Kickfire
- (in The Monash Report) Computing appliances in multiple domains
Netezza’s April Fool press release
Short and cute. Even makes a genuine marketing point (low power consumption), and ties into past marketing gimmicks (they’ve played Pimp My SPU in the past, with dramatic paint jobs).
Netezza Corporation (NYSE Arca: NZ), the global leader in data warehouse and analytic appliances, today introduced a limited-edition range of its award-winning Netezza system. Expected to become an instant industry collectible, the systems can now be purchased in a variety of color finishes – pink, blue, red or silver. The standard gun-metal gray unit will continue to be the default option for orders requiring eight or more units, to ensure availability.
Affectionately known as ‘the Netezza’ by customers and partners, the systems not only offer unparalleled processing performance, but the secret sauce of its innovative design is also leading the way in effective power and cooling management – making it a truly green option for any data center.
Not earth-shaking – even if it purports to be earth-saving – but unless I’ve overlooked a biggie, there isn’t much competition in this rather lame April Fool’s year.
Categories: Data warehouse appliances, Data warehousing, Humor, Netezza | 5 Comments |
Disruption versus chasm crossing in the database market
The 451 Group just released a report on open source DBMS adoption. In a blog post announcing same, Matthew Aslett wrote (emphasis mine):
you only have to look at the comparative revenues of the open source and proprietary vendors to see that there is a vast chasm to be crossed.
“Chasm” memes were introduced by Geoffrey Moore, founder of the Chasm Group and author of Crossing the Chasm. His defining example was Oracle, and the database market in general. The core insight was that platform markets get to tipping points, after which the leaders have tremendous advantages that make them tend to remain leaders for a good long time.
The sequel to “chasm” theory is Clayton Christensen’s “disruption” rubric, popularized in The Innovator’s Dilemma. I’ve argued previously that the DBMS market is being disrupted, in both the ways that Christensen records: Read more
Categories: Data warehouse appliances, Open source | 1 Comment |
Data warehousing with paper clips and duct tape
An interesting part of my conversation with Dataupia’s CTO John O’Brien came when we talked about data warehousing in general. On the one hand, he endorsed the view that using Oracle probably isn’t a good idea for data warehouses larger than 10 terabytes, with SQL Server’s limit being well below that. On the other hand, he said he’d helped build 50-60 terabyte warehouses in Oracle years ago.
The point is that to build warehouses that big in Oracle or other traditional DBMS, you have to pull out a large bag of tricks. Read more
Categories: Analytic technologies, Data warehouse appliances, Data warehousing, Microsoft and SQL*Server, Oracle | 17 Comments |
Dataupia catch-up
I had a catch-up phone meeting with Dataupia, since I hadn’t spoken with the company since the middle of last year. Like several other companies in the data warehouse specialist market, Dataupia can be annoyingly secretive. On the plus side – and this is very refreshing – Dataupia doesn’t seem to expect credit for accomplishments beyond those they’re willing to provide actual evidence for.
What I’ve gleaned about Dataupia’s customer activity to date amounts to: Read more
Categories: Analytic technologies, Data warehouse appliances, Data warehousing, Dataupia, Emulation, transparency, portability | 1 Comment |
The biggest eBay database
There’s been some confusion over my post about eBay’s multiple petabytes of data. So to clarify, let me say:
- eBay’s figure of >1.4 petabytes of data — for its largest single analytic database — counts disks or something, not raw user data.
- I previously published a strong conjecture that the database vendor in question was Teradata, which is definitely an eBay supplier. In particular, it is definitely not an Oracle data warehouse.
- While eBay isn’t saying who it is either — not even off-the-record — the 50%ish compression figures they experience just happen to map well to Teradata’s usual range.
- Edit: Just to be clear (not that there was any doubt), I have reconfirmed that eBay is a Teradata user, in or including eBay’s PayPal division.
Categories: Analytic technologies, Data warehouse appliances, Data warehousing, eBay, Specific users, Teradata | 1 Comment |
Database management system choices – relational data warehouse
This is the third of a five-part series on database management system choices. For the first post in the series, please click here.
High-end OLTP relational database management system vendors try to offer one-stop shopping for almost all data management needs. But as I noted in my prior post, their product category is facing two major competitive threats. One comes from specialty data warehouse database management system products. I’ve covered those extensively in this blog, with key takeaways including:
- Specialty data warehouse products offer huge cost advantages versus less targeted DBMS. This applies to purchase/maintenance and administrative costs alike. And it’s true even when the general-purpose DBMS boast data warehousing features such as star indexes, bitmap indexes, or sophisticated optimizers.
- The larger the database, the bigger the difference. It’s almost inconceivable to use Oracle for a 100+ terabyte data warehouse. But if you only have 5 terabytes, Oracle is a perfectly viable – albeit annoying and costly – alternative.
- Most specialty data warehouse products have a shared-nothing architecture. Smaller parts are cheaper per unit of capacity. Hence shared nothing/grid architectures are inherently cheaper, at least in theory. In data warehousing, that theoretical possibility has long been made practical.
- Specialty data warehouse products with row-based architectures are commonly sold in appliance formats. In particular, this is true of Teradata, Netezza, DATAllegro, and Greenplum. One reason is that they’re optimized to stream data off of disk fairly sequentially, as opposed to relying on random seeks.
- Specialty data warehouse products with columnar architectures are commonly available in software-only formats. Even so, Vertica and ParAccel also boast appliance deals, with HP and Sun respectively.
- There is tremendous technical diversity and differentiation in the specialty data warehouse system market.
Let me expand on that last point. Different features may or may not be important to you, depending on whether your precise application needs include: Read more
Categories: Analytic technologies, Data warehouse appliances, Data warehousing, Database diversity, Theory and architecture | 20 Comments |
Kognitio WX2 overview
I had a call today with Kognitio execs Paul Groom and John Thompson. Hopefully I can now clear up some confusion that was created in this comment thread. (Most of what I wrote about Kognitio in October, 2006 still applies.) Here are some highlights. Read more
Categories: Analytic technologies, Data warehouse appliances, Data warehousing, Kognitio | 12 Comments |
Is Teradata bringing out a low-end data warehouse appliance?
Edit: This post is superseded by our analysis of the new Teradata 2500 data warehouse appliance.
One of Teradata’s competitors believes they got an accurate leak about a new low-end Teradata appliance. Teradata is neither confirming nor denying. I believe the leak.
I’m not going to give product or pricing details, which in any case could be subject to change before a final product release. But the general idea is:
- Commodity Dell servers.
- Some of the higher-end software stripped out.
- Limit on the number of nodes, leading to a database size limit somewhere in the tens of terabytes.
It will be interesting to see whether Teradata can come out with something that’s closely competitive in price, performance, and administrative ease to what the newer data warehouse appliance vendors offer, yet upgrades cleanly to full-sophistication Teradata systems for those who choose to pursue that path.
Categories: Data warehouse appliances, Data warehousing, Teradata | 1 Comment |
Flash-based data warehousing is getting ever closer
EMC is rolling out solid-state drives later this quarter. The press release mentions the word “terabyte”, so this is for non-trivial systems. And by the way, 100,000 write/erase cycles before something wears out works out to several cycles per hour, sustained around the clock over a service life of a few years, so that’s a non-problem for data warehousing.
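For anyone who wants to check the wear arithmetic, here is a minimal sketch. The 100,000-cycle figure is the one quoted above; the service lifetimes of 1, 3, and 5 years are my own illustrative assumptions, not anything EMC has stated, and the function name is hypothetical. It also assumes the worst case of the same cells being rewritten continuously.

```python
HOURS_PER_YEAR = 24 * 365  # ignoring leap years


def cycles_per_hour(rated_cycles: int, service_years: float) -> float:
    """Sustained write/erase rate that would use up rated_cycles
    in exactly service_years of round-the-clock operation."""
    return rated_cycles / (service_years * HOURS_PER_YEAR)


if __name__ == "__main__":
    # Assumed service lifetimes; not from the press release.
    for years in (1, 3, 5):
        rate = cycles_per_hour(100_000, years)
        print(f"{years} year(s): {rate:.1f} write/erase cycles per hour")
```

A data warehouse that is loaded in batches and then mostly read would rewrite any given block far less often than that.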
ParAccel and SAP already offer RAM-based appliances. I suspect we’ll see appliances based on solid-state drives before long. I also wouldn’t be shocked if a non-appliance vendor such as Oracle suddenly jumped into this area, trying to use it as a way to leapfrog the appliance vendors.
Categories: Data warehouse appliances, Data warehousing | 1 Comment |
Netezza targets 1 petabyte
Netezza is promising petabyte-scale appliances later this year, up from 100 terabytes. That’s user data (I checked), and assumes 2-3X compression, or a little less than they think is actually likely. I.e., they’re describing their capacity in the same kinds of terms other responsible vendors do. They haven’t actually built and tested any 1 petabyte systems internally yet, but they’ve gone over 100 terabytes.
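To make the user-data-versus-compression relationship concrete, here is a rough sketch of the implied on-disk footprint. It assumes 1 petabyte = 1,000 terabytes, and it deliberately ignores mirroring, temp space, and other overhead, none of which Netezza quantified here. The function name is hypothetical.

```python
def stored_tb(user_data_tb: float, compression_ratio: float) -> float:
    """Compressed footprint in terabytes for a given amount of user data.
    Ignores mirroring, temp space, and other overhead (assumption)."""
    return user_data_tb / compression_ratio


if __name__ == "__main__":
    user_data_tb = 1000.0  # 1 petabyte of user data, taking 1 PB = 1,000 TB
    for ratio in (2.0, 3.0):  # the 2-3X compression range quoted above
        footprint = stored_tb(user_data_tb, ratio)
        print(f"{ratio:.0f}X compression: about {footprint:.0f} TB stored")
```

In other words, a petabyte of user data at 2-3X compression implies roughly a third to a half of a petabyte of compressed data actually stored.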
Basically, this leaves Netezza’s high-end capability about 10X below Teradata’s. On the other hand, it should leave them capable of handling pretty much every Teradata database in existence. Read more