Data warehouse appliances
Analysis of data warehouse appliances – i.e., of hardware/software bundles optimized for fast query and analysis of large volumes of (usually) relational data. Related subjects include:
- Data warehousing
- Parallelization
- Netezza
- DATAllegro
- Teradata
- Kickfire
- (in The Monash Report) Computing appliances in multiple domains
Kickfire capacity and pricing
Kickfire’s marketing communication efforts are still a work in progress. Kickfire did finally relax its secrecy about FPGA-vs.-custom-silicon – not coincidentally during Netezza’s recent publicity cycle. That wise choice helped Kickfire get some favorable attention recently for its technical and market strategy, e.g. from Daniel Abadi, Merv Adrian and, kicking things off — as it were — me. Weeks after a recent Kickfire product release, there’s finally a fairly accurate data sheet up, although there’s still one self-defeatingly misleading line I’ll comment on below. Pricing is a whole other area of confusion, although it seems that current list prices have been inadvertently* leaked in Merv’s post linked above, with only one inaccuracy that I can detect.**
*I gather from the company that they forgot to tell Merv pricing was NDA.
** Merv cited a price as “starting” that I believe to be top-of-the-line. No criticism of Merv is implied in that; Kickfire has not been very clear in communicating hard numbers.
All that said, if one takes Kickfire’s marketing statements literally, Kickfire list pricing is around $20-50K per terabyte for a few small, fixed, high-performance configurations. That’s all-in, for plug-and-play appliances. What’s more, that range is based on the actual published user data capacity numbers for various Kickfire models, which I think are low for several reasons:
- Kickfire doesn’t officially admit that its model with 14.4 terabytes of disk can manage more than 6 terabytes of data, even though it clearly can.
- Actually, those 14.4 terabytes of disk can be increased or decreased as you choose.
- The basic compression figures implied in those calculations seem conservative.
- Compression figures are a lot more conservative yet, in that Kickfire assumes you’ll have a lot of actual indexes on your data. I’m not sure that’s necessary for most workloads.
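To make the per-terabyte arithmetic concrete, here is a minimal sketch of how the headline figure moves with the capacity assumption. The list price in it is a made-up placeholder, not a Kickfire number, and only the 6 terabyte figure comes from the published capacity; the larger capacities are illustrative guesses.

```python
def price_per_user_tb(list_price_usd: float, user_data_tb: float) -> float:
    """List price divided by the user data the appliance is assumed to hold."""
    if user_data_tb <= 0:
        raise ValueError("user_data_tb must be positive")
    return list_price_usd / user_data_tb


# HYPOTHETICAL list price, chosen only so the arithmetic is easy to follow.
ASSUMED_LIST_PRICE_USD = 240_000

# 6 TB is the officially stated capacity of the model with 14.4 TB of disk.
# 8 TB and 10 TB are illustrative assumptions about what it might really hold.
scenarios = {tb: price_per_user_tb(ASSUMED_LIST_PRICE_USD, tb) for tb in (6, 8, 10)}

for tb, price in scenarios.items():
    print(f"{tb:>2} TB of user data -> ${price:,.0f} per TB")
```

The point is only that price per terabyte is inversely proportional to assumed capacity, so any understatement of capacity overstates the per-terabyte price by the same factor.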
Categories: Columnar database management, Data warehouse appliances, Data warehousing, Database compression, Kickfire, Pricing | 3 Comments |
Oracle Exadata 2 capacity pricing
Summary of Oracle Exadata 2 capacity pricing
Analyzing Oracle Exadata pricing is always harder than one would first think. But I’ve finally gotten around to doing an Oracle Exadata 2 pricing spreadsheet. The main takeaways are:
- If we believe Oracle’s claims of 10X compression, Exadata 2 costs more per terabyte of user data than Netezza TwinFin — $22-26K/TB vs. TwinFin’s <$20K — but less than the Teradata 2550.
- These figures are highly sensitive to assumptions about Oracle’s hybrid columnar compression.
- Similarly, if Netezza or Teradata were to significantly upgrade their own compression, the price comparison would look quite different.
- Options such as Data Mining or Oracle Spatial add 12% or so each to Exadata’s total system price.
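The sensitivity to compression can be sketched in a few lines. The system price and uncompressed capacity below are placeholders, picked only so that the 10X case lands inside the $22-26K/TB range quoted above; they are not taken from the spreadsheet. Treating each option as a flat, additive 12% uplift is also an assumption.

```python
def cost_per_user_tb(system_price_usd, uncompressed_tb, compression_ratio,
                     n_options=0, option_uplift=0.12):
    """Cost per terabyte of user data under a given compression assumption.

    Each option is modeled as adding `option_uplift` (12% by default) of the
    base system price; the additive treatment is an assumption.
    """
    if uncompressed_tb <= 0 or compression_ratio <= 0:
        raise ValueError("capacity and compression ratio must be positive")
    total_price = system_price_usd * (1 + option_uplift * n_options)
    return total_price / (uncompressed_tb * compression_ratio)


# HYPOTHETICAL inputs, not Oracle list prices or rated capacities.
ASSUMED_SYSTEM_PRICE_USD = 2_400_000
ASSUMED_UNCOMPRESSED_TB = 10

at_10x = cost_per_user_tb(ASSUMED_SYSTEM_PRICE_USD, ASSUMED_UNCOMPRESSED_TB, 10)
at_5x = cost_per_user_tb(ASSUMED_SYSTEM_PRICE_USD, ASSUMED_UNCOMPRESSED_TB, 5)
at_10x_two_options = cost_per_user_tb(
    ASSUMED_SYSTEM_PRICE_USD, ASSUMED_UNCOMPRESSED_TB, 10, n_options=2
)

print(f"10X compression:            ${at_10x:,.0f} per TB")
print(f" 5X compression:            ${at_5x:,.0f} per TB")
print(f"10X compression, 2 options: ${at_10x_two_options:,.0f} per TB")
```

Halving the assumed compression ratio doubles the cost per terabyte of user data, which is why the comparison with Netezza and Teradata hinges so heavily on whether the 10X claim holds.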
Longer version
When Oracle introduced Exadata last year it was, well, expensive. Exadata 2 has now been announced, and it is significantly cheaper than Exadata 1 per terabyte of user data, based on:
- Similar overall pricing
- Twice the disk capacity
- Better compression
Categories: Analytic technologies, Columnar database management, Data warehouse appliances, Data warehousing, Database compression, Exadata, Netezza, Oracle, Pricing, Teradata | 13 Comments |
Facts and rumors
- Vertica is putting out a press release today touting its 100th customer, and talking of triple digit growth last year.
- Multiple sources have told me that the DATAllegro system is being thrown out of Dell, so evidently Dell is telling this to one and all. If that goes through, this would presumably leave TEOCO as DATAllegro’s single happy customer. (I haven’t checked with Microsoft for its view.)
- A rumor has it that InfiniBand technology vendor Voltaire, Ltd. privately claims triple-digit sales of switches for Exadata 1. (I think that would be one switch per Exadata installation, not per rack.) Based just on a quick glance, this is far from confirmed by Voltaire’s earnings conference call transcripts or SEC filings. However, the most recent transcript does seem to indicate Voltaire got multiple Exadata deals in the telecommunications sector, and suggests some Exadata penetration in other sectors as well.
- I was told of a classified-agency user that has >1 petabyte of data on Exadata 1 and 600 terabytes or so on Netezza. My not-obviously-biased source says the agency is distinctly happier with Netezza than Exadata.
- Like ParAccel, Oracle just got dinged for TPC-related misbehavior.
- Rumor has it that Sun has no intention of helping ParAccel rerun its withdrawn TPC-H benchmark.
- ParAccel has withdrawn from its home page the claim to be the “CERTIFIED” price-performance leader. This seems to confirm that the claim was a reference to the TPC-H. In my opinion, that was a gross misrepresentation of what the TPC-H shows.
What Nielsen really uses in data warehousing DBMS
In its latest earnings call, Oracle made a reference to The Nielsen Company that was — to put it politely — rather confusing. I just plopped down in a chair next to Greg Goff, who evidently runs data warehousing at Nielsen, and had a quick chat. Here’s the real story.
- The Nielsen Company has over half a petabyte of data on Netezza in the US. This installation is growing.
- The Nielsen Company indeed has 45 terabytes or whatever of data on Oracle in its European (Customer) Information Factory. This is not particularly growing. Nielsen’s Oracle data warehouse has been built up over the past 9 years. It’s not new. It’s certainly not on Exadata, nor planned to move to Exadata.
- These are not single-instance databases. Nielsen’s biggest single Netezza database is 20 terabytes or so of user data, and its biggest single Oracle database is 10 terabytes or so.
- Much (most?) of the rest of the installations are customer data marts and the like, based in each case on the “big” central database. (That’s actually a classic data mart use case.) Greg said that Netezza’s capabilities to spin out those databases seemed pretty good.
- That 10 terabyte Oracle data warehouse instance requires a lot of partitioning effort and so on in the usual way.
- Nielsen has no immediate plans to replace Oracle with Netezza.
- Nielsen actually has 800 terabytes or so of Netezza equipment. Some of that is kept more lightly loaded, for performance.
Categories: Analytic technologies, Data mart outsourcing, Data warehouse appliances, Data warehousing, Netezza, Oracle, Specific users | 6 Comments |
Thoughts on the integration of OLTP and data warehousing, especially in Exadata 2
Oracle is pushing Exadata 2 as being a great system for any of OLTP (OnLine Transaction Processing), data warehousing or, presumably, the integration of same. This claim rests on a few premises, namely: Read more
Categories: Analytic technologies, Data warehouse appliances, Data warehousing, Exadata, OLTP, Oracle, Solid-state memory, Theory and architecture | 36 Comments |
The hunt for Oracle Exadata production references
Over the past four weeks, I’ve given speeches in Boston, DC, Milan, London, and SF,* attended a conference in Lyon, done a fair amount of consulting, and taken a few non-client briefings as well. That’s why I haven’t had much of a chance to sit down, analyze the tea leaves, and write about Exadata 2. (Small exception: Highlights from and remarks on the Oracle Database 11g Release 2 white paper.) I hope to do that soon.
*I’ll bop over to Chicago for the last of the series early next week.
But first — can anybody identify much in the way of Exadata production references? Oracle recently talked of a few flagship data warehouse customers, but those don’t seem to be running Exadata. I talked recently with an Oracle prospect from the US, who only got one reference from Oracle — in Eastern Europe. (Well, two references, if you also count the system integrator on the same deal.)
So far as I can tell, Oracle Exadata production sites are pretty thin on the ground. What, if anything, am I missing?
Categories: Analytic technologies, Data warehouse appliances, Data warehousing, Exadata, Market share and customer counts, Oracle | 17 Comments |
Notes on the Oracle Database 11g Release 2 white paper
The Oracle Database 11g Release 2 white paper I cited a couple of weeks ago has evidently been edited, given that a phrase I quoted last month is no longer to be found. Anyhow, here are some quotes from and comments on what evidently is the latest version. Read more
Teradata really means that those 100+ appliances are in PRODUCTION
I was misremembering. It turns out that when Teradata said it had over 100 appliances “in production”, it meant that >100 hardware-based appliances are actually in production. If you add in the software-only “appliances,” and count test/development as well as true production, the total rises to >200.
I tried to get a finer breakdown out of Teradata on a disclosable basis, but failed. The ostensible reason is that public companies often don’t do that sort of thing without permission from the investor relations department, and Teradata’s marketers evidently haven’t felt a sense of urgency about getting permission to, for example, communicate how well just the 25xx series is doing.
Categories: Data warehouse appliances, Data warehousing, Market share and customer counts, Teradata | 1 Comment |
SAS on Netezza and other Netezza extensibility
I chatted with SAS CTO Keith Collins yesterday about the new SAS/Netezza in-database parallel data mining scoring offering. My impression is that this is very similar to SAS’ current Teradata support, notwithstanding SAS’ and Teradata’s apparent original intention of offering in-database modeling by now as well.
I gather this is a big performance-enhancing deal, just as it is for SPSS or Oracle’s own data mining over Oracle. However, I must confess to not yet understanding why. That is, I don’t know what’s so complicated about data mining scoring algorithms that makes hand-coding them in SQL particularly forbidding. My naive view of data mining is that you do a big regression to get a bunch of weights, and the resulting scoring algorithm is a linear combination of a few dozen variables. Evidently, that’s not quite right.
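For what it's worth, the naive view described above is easy to write down. The sketch below generates a SQL scoring query from a set of regression weights and runs it against an in-memory SQLite table. The table, column names and weights are all hypothetical, and SQLite merely stands in for a real warehouse; this illustrates the linear-combination case only, not what SAS actually ships.

```python
import sqlite3


def linear_score_sql(table, key, intercept, weights):
    """Build a SELECT that scores each row as intercept + sum(weight * column).

    Identifiers are interpolated directly into the SQL text, so this is only
    safe with trusted, hard-coded table and column names.
    """
    terms = [f"({w!r} * {col})" for col, w in weights.items()]
    expr = " + ".join([repr(intercept)] + terms)
    return f"SELECT {key}, {expr} AS score FROM {table}"


# HYPOTHETICAL model: two variables with made-up weights.
weights = {"age": 0.5, "income": 0.25}
sql = linear_score_sql("customers", "id", 1.0, weights)

# Tiny in-memory table, used only to show that the generated SQL runs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, age REAL, income REAL)")
conn.execute("INSERT INTO customers VALUES (1, 40, 100)")
rows = conn.execute(sql).fetchall()
conn.close()

print(sql)
print(rows)
```

If scoring were always this simple, pushing it into the database would be a one-line expression per model, which is exactly why the size of the claimed benefit suggests real-world scoring involves more than a weighted sum.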
Anyhow, it turns out that SAS held off on this work until it could be done for TwinFin. That’s largely because TwinFin lets partners write code on Intel CPUs, while previously they had to write in C for Netezza’s FPGAs. I got a similar sense from at least one other Netezza partner as well.
Categories: Data warehouse appliances, Data warehousing, Netezza, Predictive modeling and advanced analytics, SAS Institute | 5 Comments |
Teradata has over 100 appliances in production
I recently wrote that Teradata had gotten serious about appliance product lines, and had non-trivial sales figures for them. In a press release today, Teradata is now explicitly saying (emphasis mine):
Teradata now has more than 100 appliances in production, including the Data Mart Appliance 551, the Data Warehouse Appliance 2550, and the Extreme Data Appliance 1550, which complement the core platform, the Teradata Active Enterprise Data Warehouse 5550.
The breakdowns on that are NDA, and anyhow I can’t find them immediately in my notes.* But if memory serves — while a lot of those appliances are used for test and development, a whole other lot of them are used to do actual production query-answering work. (Edit: Memory turned out to be wrong.) Read more