Data warehouse appliances
Analysis of data warehouse appliances – i.e., of hardware/software bundles optimized for fast query and analysis of large volumes of (usually) relational data. Related subjects include:
- Data warehousing
- Parallelization
- Netezza
- DATAllegro
- Teradata
- Kickfire
- (in The Monash Report) Computing appliances in multiple domains
Kickfire’s FPGA-based technical strategy
Kickfire’s basic value proposition is that, if you have a data warehouse in the 100s of gigabytes, they’ll sell you – for $32,000 – a tiny box that solves all your query performance problems, as per the Kickfire spec sheet. And Kickfire backs that up with a pretty cool product design. However, thanks in no small part to what was heretofore Kickfire’s penchant for self-defeating secrecy, the Kickfire story is not widely appreciated.
Fortunately, Kickfire is getting over its secrecy kick. And so, here are some Kickfire technical basics.
- Kickfire is MySQL-based, with all the SQL functionality and lack of functionality that entails.
- The Kickfire/MySQL DBMS is columnar, with the usual benefits in compression and I/O reduction.
- Kickfire is based on FPGAs (Field-Programmable Gate Arrays).
- The Kickfire DBMS is ACID-compliant.
- Kickfire runs only as a single-box appliance.
- While Kickfire earlier estimated that, at least for data sets that compressed well, a Kickfire box could hold 3-10 terabytes of user data, more recent figures I’ve heard from Kickfire have been in the 1 1/2 terabyte range. (Edit: Karl Van Der Bergh subsequently wrote in to say that the 1 1/2 TB is a raw disk figure, not user data.)
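To make the "usual benefits" of a columnar layout concrete, here is a minimal sketch of the general idea. It is not Kickfire's actual storage format. The table, the column names, and the choice of run-length encoding are all assumptions made for illustration.

```python
from itertools import groupby

# Hypothetical toy table in row-oriented form: (order_id, region, status).
rows = [
    (1, "east", "shipped"),
    (2, "east", "shipped"),
    (3, "east", "pending"),
    (4, "west", "pending"),
    (5, "west", "pending"),
    (6, "west", "shipped"),
]


def to_columns(rows, names):
    """Pivot row tuples into one list per column (a columnar layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def rle_encode(values):
    """Run-length encode a column as (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


def count_where(pairs, wanted):
    """Count matching rows directly on the compressed column."""
    return sum(count for value, count in pairs if value == wanted)


columns = to_columns(rows, ["order_id", "region", "status"])

# Values of one column sit together, so repeated values form long runs.
encoded_region = rle_encode(columns["region"])

# A query that filters on region reads only this one small encoded column,
# never order_id or status. That is the I/O reduction.
west_rows = count_where(encoded_region, "west")
```

Real columnar engines use a mix of encodings (dictionary, delta, bit-packing and so on), but the principle is the same: similar values stored together compress well, and a query reads only the columns it needs.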
The new information there is that Kickfire relies on an FPGA; Read more
Categories: Analytic technologies, Columnar database management, Data warehouse appliances, Data warehousing, Database compression, Kickfire, MySQL, Theory and architecture | 16 Comments |
What does Netezza do in the FPGAs anyway, and other questions
The news of Netezza’s new TwinFin product family has generated a lot of comments and questions, some pretty reasonable, some quite silly. E.g., I’ve seen it suggested, privately or publicly, that:
- Netezza’s older products only handle one query at a time (nonsense, and I’m going to loyally protect the identity of the person who emailed that odd suggestion to me)
- A Netezza node can be a single point of failure (also nonsense, although performance degradation from a node failure might be considerable)
- Netezza has a cache consistency problem (also hardly true, except insofar as it’s an issue to overcome in future development as Netezza moves toward parallelizing bulk loads, transactional updates, and/or trickle feeds).
Netezza’s Phil Francisco addressed some points of this nature in a recent blog post.
More reasonable is the question:
Now that Netezza has changed its architecture, what are all those FPGAs (Field-Programmable Gate Arrays) being used for anyway?
The short answer is: Read more
Categories: Data warehouse appliances, Data warehousing, Netezza, Theory and architecture | 6 Comments |
Dataupia is officially for sale
Dataupia marketing VP Samantha Stone — who by the way has been one heck of a trooper through Dataupia’s troubles — is joining the exodus from the company. General graciousness aside, the heart of Samantha’s farewell email reads:
Unfortunately, we have had to reduce our burn rate as we seek an acquirer for our technology.
We have a group of loyal employees remaining on staff focused on current production customers and the acquisition efforts.
As part of the most recent staff reductions I will be leaving Dataupia.
Two years ago I wrote:
[Dataupia would] make a great acquisition for a BI company or DBMS vendor who could then say “Oh, no, this isn’t a DBMS appliance – it’s merely a data warehouse accelerator.” When you look at it that way, their chances of prospering look distinctly higher.
But at this point I think there probably would be more appealing ways for those vendors to meet the same needs.
Categories: Data warehouse appliances, Data warehousing, Dataupia, Emulation, transparency, portability | 14 Comments |
Teradata 13 focuses on advanced analytic performance
Last October I wrote about the Teradata 13 release of Teradata’s database management software. Teradata 13, which will be used across the various Teradata product lines, has now been announced for GCA (General Customer Availability). So far as I can tell, there were two main points of emphasis for Teradata 13:
- Performance (of course, performance is a point of emphasis for almost any release of any analytic DBMS product), especially but not only in the areas of aggregates, ETL (Extract/Transform/Load), and UDFs (User Defined Functions).
- UDFs, especially but not only in the areas of data mining and geospatial analysis.
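For readers less familiar with UDFs, here is a small sketch of the concept, with a geospatial flavor. It uses Python's built-in `sqlite3` module, not Teradata, and says nothing about how Teradata 13 implements or accelerates UDFs. The `stores` table, its contents, and the `haversine_km` function are all hypothetical.

```python
import math
import sqlite3


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


conn = sqlite3.connect(":memory:")

# Register the Python function so that SQL statements can call it by name.
conn.create_function("haversine_km", 4, haversine_km)

conn.execute("CREATE TABLE stores (name TEXT, lat REAL, lon REAL)")
conn.executemany(
    "INSERT INTO stores VALUES (?, ?, ?)",
    [("Boston", 42.36, -71.06), ("Chicago", 41.88, -87.63)],
)

# The UDF runs inside the query, once per row, as part of the WHERE clause.
nearby = conn.execute(
    "SELECT name FROM stores "
    "WHERE haversine_km(lat, lon, ?, ?) < 100 ORDER BY name",
    (42.37, -71.11),
).fetchall()
```

Because the function is evaluated for every candidate row, UDF execution speed matters a great deal at data warehouse scale. That is why UDFs show up on a list about performance.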
To put it even more concisely, the focus of Teradata 13 is on advanced analytic performance, although there of course are some enhancements in simple query performance and in analytic functionality as well. Read more
“The Netezza price point”
Over the past couple of years, quite a few data warehouse appliance or DBMS vendors have talked to me directly in terms of “Netezza’s price point,” or some similar phrase. Some have indicated that they’re right around the Netezza price point, but think their products are superior to Netezza’s. Others have stressed the large gap between their price and Netezza’s. But one way or the other, “Netezza’s price” has been an industry metric.
One reason everybody talks about the “Netezza (list) price” is that it hasn’t been changing much, seemingly staying stable at $50-60K/terabyte for a long time. And thus Teradata’s 2550 and Oracle’s larger-disk Exadata configuration — both priced more or less in the same range — have clearly been price-competitive with Netezza since their respective introductions.
That just changed. Netezza is cutting its pricing to the $20K/terabyte range imminently, with further cuts to come. So where does that leave competitors?
- The Teradata 1550 is in the Netezza price range (still a little below, actually).
- Oracle basically has nothing price-competitive with Netezza.
- Microsoft has stated it plans to introduce Madison below the old DATAllegro price points. Conceivably, that could be competitive with Netezza’s new pricing, although I haven’t checked how much it now costs simply to buy a lot of SQL Server licenses. (That presumably would be a Madison lower bound, and except for hardware might be the whole thing, since Microsoft likes to create large product bundles.)
- XtremeData just launched in the new Netezza price range.
- Troubled Dataupia is hard to judge. While on the surface Dataupia’s prices sound very low, you can’t use a Dataupia box unless you also have a brand-name DBMS (license and hardware) alongside it. That obviously affects total cost significantly.
- Kickfire seems unaffected, as it doesn’t and most likely won’t compete with Netezza (different database size ranges).
- For the most part, software-only vendors are free to adapt or not as they choose. Hardware prices generally don’t need to be over $10K/terabyte, and in some cases could be a lot less. So the question is how far they’re willing to discount their software.
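The per-terabyte figures above are easier to compare when applied to a concrete system size. The sketch below uses only the list-price ranges quoted in this post. The 10 TB example size and the function names are assumptions for illustration, and real deals involve discounts this ignores.

```python
# Per-terabyte list prices quoted in the post, in US dollars.
OLD_NETEZZA_PER_TB = (50_000, 60_000)   # long-standing range
NEW_NETEZZA_PER_TB = 20_000             # new "range", treated as a point value
HARDWARE_CEILING_PER_TB = 10_000        # "don't need to be over $10K/terabyte"


def system_price(user_data_tb, price_per_tb):
    """List price of a system sized for the given terabytes of user data."""
    return user_data_tb * price_per_tb


def software_headroom_per_tb(target_per_tb, hardware_per_tb):
    """What a software-only vendor can charge per terabyte and still match
    a target all-in price, given the customer's hardware cost."""
    return max(target_per_tb - hardware_per_tb, 0)


example_tb = 10  # hypothetical warehouse size

old_low = system_price(example_tb, OLD_NETEZZA_PER_TB[0])
old_high = system_price(example_tb, OLD_NETEZZA_PER_TB[1])
new_price = system_price(example_tb, NEW_NETEZZA_PER_TB)

# With hardware at the $10K/TB ceiling, matching the new Netezza price
# leaves about $10K/TB for software. Cheaper hardware leaves more.
headroom = software_headroom_per_tb(NEW_NETEZZA_PER_TB, HARDWARE_CEILING_PER_TB)
```

On these assumptions, a 10 TB system drops from $500-600K to about $200K at list, and a software-only vendor that wants to match has roughly $10K/terabyte of room for its license once hardware is paid for.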
Categories: Analytic technologies, Data warehouse appliances, Data warehousing, Dataupia, Exadata, Kickfire, Oracle, Pricing, Teradata, XtremeData | 14 Comments |
Netezza’s worldwide show-and-tell
In this economy, conference attendance is way down. Accordingly, a number of vendors have reevaluated whether it makes sense to have a traditional big-bang user conference, or whether it might make more sense to do a tour, bringing their message to multiple geographical areas. Netezza has opted for the latter course, something I’ve been well aware of for two reasons:
- Planning for the conferences and for Netezza’s product roll-out is of course coordinated, and product roll-out is something I advise my clients on.
- Netezza engaged me to speak at six different versions of the event (i.e., America and Europe, but not the Far East). There’s still time to contribute suggestions about my talk here.
Apparently, I’ll be talking late morning each time. My dates are:
- September 2, Boston
- September 9, Washington, DC
- September 15, Milan
- September 17, London
- September 24, San Francisco
- September 29, Chicago
The brand name of the events is Enzee Universe. Locations, registration information, and other particulars may be found on the Enzee Universe website.
Categories: Analytic technologies, Data warehouse appliances, Data warehousing, Netezza, Presentations | 2 Comments |
Netezza is changing its hardware architecture and slashing prices accordingly
Netezza is about to make its biggest product announcement in years. In particular:
- Netezza is cutting prices to under $20K/terabyte of user data, with even lower numbers promised for the near future.
- Netezza is replacing its PowerPC chips with Intel-based IBM blades.
- There will be substantial changes in how data flows between the various parts of a Netezza node.
- Netezza claims this will all produce an immediate 10-15X increase in price-performance, based on a 3X cut in price/terabyte and a 3-5X improvement in mixed workload performance. (Edit: Netezza now agrees that it shouldn’t have phrased things that way.)
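The arithmetic behind that last claim is simple enough to check. The sketch below assumes that "price-performance" means performance per dollar and that the two quoted factors simply multiply. Both are assumptions about how the figure was derived, not something Netezza has spelled out here.

```python
def price_performance_gain(price_cut_factor, performance_factor):
    """Combined gain in performance per dollar, assuming the price cut and
    the performance improvement are independent and multiply."""
    return price_cut_factor * performance_factor


PRICE_CUT = 3                 # "3X cut in price/terabyte"
PERFORMANCE_RANGE = (3, 5)    # "3-5X improvement in mixed workload performance"

low_gain = price_performance_gain(PRICE_CUT, PERFORMANCE_RANGE[0])
high_gain = price_performance_gain(PRICE_CUT, PERFORMANCE_RANGE[1])
```

Under those assumptions the product works out to 9-15X, so the quoted 10-15X is roughly that range with the low end rounded up.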
Allow me to explain. Read more
Categories: Analytic technologies, Data warehouse appliances, Data warehousing, Netezza, Pricing, Theory and architecture | 35 Comments |
XtremeData announces its DBx data warehouse appliance
XtremeData is announcing its DBx data warehouse appliance today. Highlights include: Read more
Categories: Benchmarks and POCs, Data warehouse appliances, Data warehousing, Pricing, XtremeData | 34 Comments |
Netezza on concurrency and workload management
I visited Netezza Friday for what was mainly an NDA meeting. But while I was there I asked where Netezza stood on concurrency, workload management, and rapid data mart spin-out. Netezza’s claims in those regards turned out to be surprisingly strong.
In the biggest surprise, Netezza claimed at least one customer had >5,000 simultaneous users, and a second had >4,000. Both are household names. Other unspecified Netezza customers apparently also have >1,000 simultaneous users. Read more
Categories: Data warehouse appliances, Data warehousing, Netezza, Teradata, Theory and architecture, Workload management | 13 Comments |
Update on Microsoft’s Madison and Fast Track data warehouse products
I chatted with Stuart Frost of Microsoft yesterday. Stuart remains GM of Microsoft’s data warehouse product unit, covering $1 billion or so of revenue. While rumors of Stuart’s departure from Microsoft are clearly exaggerated, it does seem that his role is more one of coordination than actual management.
Microsoft Madison availability remains scheduled for H1 2010. Nothing new there. Tangible progress includes a few customer commitments of various sorts, including one outright planned purchase (due to some internal customer considerations around using up a budget). At the moment various Microsoft Madison technology “previews” are going on, which seem to amount to proofs-of-concept, that:
- Start with actual customer data (some from Microsoft, some from outside)
- Generate larger synthesized data sets based on those (database size seems to be 10-100 TB)
- Run in Microsoft data centers or “technology centers”, rather than on customer premises.
The basic Microsoft Madison product distribution strategy seems to be: Read more