Data warehouse appliances
Analysis of data warehouse appliances – i.e., of hardware/software bundles optimized for fast query and analysis of large volumes of (usually) relational data. Related subjects include:
- Data warehousing
- Parallelization
- Netezza
- DATAllegro
- Teradata
- Kickfire
- (in The Monash Report) Computing appliances in multiple domains
TwinFin(i) – Netezza’s version of a parallel analytic platform
Much like Aster Data did in Aster 4.0 and now Aster 4.5, Netezza is announcing a general parallel big data analytic platform strategy. It is called Netezza TwinFin(i), it is a chargeable option for the Netezza TwinFin appliance, and many announced details are on the vague side, with Netezza promising more clarity at or before its Enzee Universe conference in June. At a high level, the Aster and Netezza approaches compare/contrast as follows: Read more
Categories: Aster Data, Data warehouse appliances, Data warehousing, Hadoop, MapReduce, Netezza, Predictive modeling and advanced analytics, SAS Institute, Teradata | 10 Comments |
Comments on the Gartner 2009/2010 Data Warehouse Database Management System Magic Quadrant
February, 2011 edit: I’ve now commented on Gartner’s 2010 Data Warehouse Database Management System Magic Quadrant as well.
At intervals of a little over a year, Gartner Group publishes a Data Warehouse Database Management System Magic Quadrant. Gartner’s 2009 data warehouse DBMS Magic Quadrant — actually, January 2010 — is now out.* For many reasons, including those I noted in my comments on Gartner’s 2008 Data Warehouse DBMS Magic Quadrant, the Gartner quadrant pictures are a bad use of good research. Rather than rehash that this year, I’ll merely call out some points in the surrounding commentary that I find interesting or just plain strange. Read more
Netezza Skimmer
As I previously complained, last week wasn’t a very convenient time for me to have briefings. So when Netezza emailed to say it would release its new entry-level Skimmer appliance this morning, while I asked for and got a Friday afternoon briefing, I kept it quick and basic.
That said, highlights of my Netezza Skimmer briefing included:
- In essence, Netezza Skimmer is 1/3 of Netezza’s previously smallest appliance, for 1/3 the price.
- I.e., Netezza Skimmer has 1 S-blade and 9 disks, vs. 3 S-blades and 24 disks on the Netezza TwinFin 3.
- With 1 disk reserved as a hot spare, that boils down to a 1:1:1 ratio among CPU cores, FPGA cores, and 1-terabyte disks on Netezza Skimmer. The same could pretty much be said of Netezza TwinFin, the occasional hot-spare disk notwithstanding.
- Netezza Skimmer costs $125K.
- With 2.8 or so TB of space for user data before compression, that’s right in line with the Netezza price point of slightly <$20K/terabyte of user data. (The arithmetic is sketched after this list.)
- That assumes Netezza’s usual 2.25X compression. I forgot to ask when 4X compression was actually being shipped.
- I forgot to ask, but it seems obvious that Netezza Skimmer uses identical or substantially similar components to Netezza TwinFin’s.
- Netezza Skimmer is 7 rack units high.
- In place of the SMP hosts on TwinFin Systems, Netezza Skimmer has a host blade.
- Netezza (specifically Phil Francisco) mentioned that when Kalido uses Netezza Skimmer for its appliance, there will be an additional host computer, but when it uses TwinFin for the same software, the built-in host will suffice. (Even so, I suspect it might be too strong to say that Skimmer’s built-in host computer is underpowered.)
- Netezza also suggested that more appliance OEMs are coming down the pike specifically focused on the affordable Skimmer.
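If you want to check the price-per-terabyte claim above, the arithmetic is simple. Here is a minimal sketch using only figures from the briefing notes (9 disks, 1 hot spare, $125K, ~2.8 TB of user-data space before compression, 2.25X typical compression); it is my back-of-envelope math, not an official Netezza sizing formula.

```python
# Back-of-envelope check of the Netezza Skimmer price point, using only the
# figures quoted in the briefing notes above. Not an official sizing formula.

price = 125_000                               # list price, dollars
disks, hot_spares, tb_per_disk = 9, 1, 1.0

raw_tb = (disks - hot_spares) * tb_per_disk   # 8 TB of raw disk behind the S-blade
user_space_tb = 2.8                           # space for user data, pre-compression
compression = 2.25                            # Netezza's usual compression assumption

effective_user_tb = user_space_tb * compression   # ~6.3 TB of user data
price_per_tb = price / effective_user_tb          # ~$19,800 per TB of user data

print(f"{raw_tb:.0f} TB raw, ~{effective_user_tb:.1f} TB of user data")
print(f"~${price_per_tb:,.0f} per terabyte of user data")   # slightly under $20K/TB
```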
Categories: Data mart outsourcing, Data warehouse appliances, Data warehousing, Netezza, Pricing | 2 Comments |
Two cornerstones of Oracle’s database hardware strategy
After several months of careful optimization, Oracle managed to pick the most inconvenient* day possible for me to get an Exadata update from Juan Loaiza. But the call itself was long and fascinating, with the two main takeaways being:
- Oracle thinks flash memory is the most important hardware technology of the decade, one that could lead to Oracle being “bumped off” if they don’t get it right.
- Juan believes the “bulk” of Oracle’s business will move over to Exadata-like technology over the next 5-10 years. Numbers-wise, this seems to be based more on Exadata being a platform for consolidating an enterprise’s many Oracle databases than it is on Exadata running a few Especially Big Honking Database management tasks.
And by the way, Oracle doesn’t make its storage-tier software available to run on anything other than Oracle-designed boxes. At the moment, that means Exadata Versions 1 and 2. Since Exadata is by far Oracle’s best DBMS offering (at least in theory), that means Oracle’s best database offering only runs on specific Oracle-sold hardware platforms. Read more
Comments on a fabricated press release quote
My clients at Kickfire put out a press release last week quoting me as saying things I neither said nor believe. The press release is about a “Queen For A Day” kind of contest announced way back in April, in which users were invited to submit stories of their data warehouse problems, with the biggest sob stories winning free Kickfire appliances. The fabricated “quote” reads: Read more
Categories: About this blog, Data warehouse appliances, Data warehousing, Kickfire, Market share and customer counts, Sybase | 3 Comments |
Boston Big Data Summit keynote outline
Last month, Bob Zurek asked me to give a talk on “Big Data”, where “big” is anything from a few terabytes on up, then moderate a panel on cloud computing. We agreed that I could talk just from notes, without slides. So, since I have them typed up, I’m posting them below.
Teradata hardware strategy and tactics
In my opinion, the most important takeaways about Teradata’s hardware strategy from the Teradata Partners conference last week are:
- Teradata’s future lies in solid-state memory. That’s in line with what Carson Schmidt told me six months ago.
- To Teradata’s surprise, the solid-state future is imminent. Teradata is 6-9 months further along with solid-state drives (SSD) than it thought a year ago it would be at this point.
- Short-term, Teradata is going to increase the number of appliance kinds it sells. I didn’t actually get details on anything but the new SSD-based Blurr, but it seems there will be others as well.
- Teradata’s eventual future is to mix and match parts (especially different kinds of storage) in a more modular product line. Teradata Virtual Storage is of pretty limited value otherwise. I probably believe in that modular future more emphatically than Teradata itself does, because I think doing so will meet users’ needs more effectively than relying strictly on fixed appliance configurations would.
In addition, some non-SSD componentry tidbits from Carson Schmidt include:
- Teradata really likes Intel’s Nehalem CPUs, with special reference to multi-threading, QuickPath interconnect, and integrated memory controller. Obviously, Nehalem-based Teradata boxes should be expected in the not too distant future.
- Teradata really likes Nehalem’s successor Westmere too, and expects to be pretty fast to market with it (faster than with Nehalem) because Nehalem and Westmere are plug-compatible in motherboards.
- Teradata will go to 10-gigabit Ethernet for external connectivity on all its equipment, which should improve load performance.
- Teradata will also go to 10-gigabit Ethernet to play the Bynet role on appliances. Tests are indicating this improves query performance.
- What’s more, Teradata believes there will be no practical scale-out limitations with 10-gigabit Ethernet.
- Teradata hasn’t decided yet what to do about 2.5” SFF (Small Form Factor) disk drives, but is leaning favorably toward them. Benefits would include lower power consumption and smaller cabinets.
- Also on Carson’s list of “exciting” future technologies is SAS 2.0, which at 6 gigabits/second doubles the I/O bandwidth of SAS 1.0.
- Carson is even excited about removing universal power supplies from the cabinets, increasing space for other components.
- Teradata picked Intel’s Host Bus Adapters for 10-gigabit Ethernet. The switch supplier hasn’t been determined yet.
Let’s get back now to SSDs, because over the next few years they’re the potential game-changer. Read more
Categories: Data warehouse appliances, Data warehousing, Solid-state memory, Storage, Teradata | 13 Comments |
Reports of perfectly-balanced hardware configurations are greatly exaggerated
Data warehouse appliance and software appliance vendors like to claim that they’ve worked out just the right hardware configuration(s), and that a single configuration is correct for a fairly broad range of workloads. But there are a lot of reasons to be dubious about that. Specific vendor evidence includes:
- Teradata ascribes considerable importance to a Virtual Storage technology whose main purpose is to allow mixing of heterogeneous storage devices in a single system. And the discussion rarely suggests that these parts will be in a rigid fixed relationship.
- Netezza — as Teradata keeps reminding me — often sells boxes with the expectation that they won’t be filled with data, so as to increase spindle count and hence performance.
- Oracle/Sun have dropped some comments about Exadata being more flexibly configured going forward.
- Kickfire’s new “high-end” appliance lets you attach fairly arbitrary amounts of external storage.
- And of course, software-only analytic DBMS vendors run their software in all sorts of hardware and storage environments.
What’s more, the claim never made a lot of sense anyway. With the rarest of exceptions, even a single data warehouse’s workload will contain different queries that strain different parts of the system in different ratios. Calculating the “ideal” hardware configuration for that single workload would be forbiddingly difficult. And even if one could calculate it, it almost surely would be different than another user’s “ideal” configuration. How a single hardware configuration can be “ideally balanced” for a broad class of use cases boggles the imagination.
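To make that point concrete, here is a toy model of why one fixed configuration can’t be “ideally balanced” for everybody. The query types, resource-demand numbers, and workload mixes below are all invented for illustration; no vendor sizes systems this way.

```python
# Toy illustration: different queries strain CPU, disk, and network in
# different ratios, so the "ideal" hardware balance shifts with the workload
# mix. All numbers are invented for illustration.

# Relative resource demand per query type: (cpu, disk_io, network)
query_profiles = {
    "big_table_scan": (1, 10, 2),
    "cpu_heavy_join": (8, 3, 4),
    "cross_node_agg": (3, 2, 9),
}

def ideal_balance(workload_mix):
    """Return the CPU : disk : network demand ratio for a given query mix."""
    totals = [0.0, 0.0, 0.0]
    for query, share in workload_mix.items():
        for i, demand in enumerate(query_profiles[query]):
            totals[i] += share * demand
    scale = min(totals)
    return tuple(round(t / scale, 2) for t in totals)

# Two plausible-looking workloads at the same site:
reporting = {"big_table_scan": 0.7, "cpu_heavy_join": 0.2, "cross_node_agg": 0.1}
analytics = {"big_table_scan": 0.2, "cpu_heavy_join": 0.5, "cross_node_agg": 0.3}

print("reporting-heavy mix wants CPU:disk:net ~", ideal_balance(reporting))
print("analytics-heavy mix wants CPU:disk:net ~", ideal_balance(analytics))
# The two "ideal" ratios differ -- a single fixed configuration can't be
# perfectly balanced for both, let alone for every customer's workload.
```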
Categories: Data warehouse appliances, Data warehousing, Exadata, Kickfire, Netezza, Oracle, Teradata | 6 Comments |
This week at the Teradata Partners user conference
Teradata tells me that its press embargoes are ending at 9:00 this morning. Here are some highlights of what’s going on, although names, dates, and details will have to await conversations and press releases this week.
- Teradata is productizing “private cloud,” under names including “Teradata Enterprise Analytics Cloud,” “Teradata Agile Analytics Cloud,” and “Teradata Elastic Mart Builder.” I.e., Teradata hopes to leapfrog Greenplum in its “Enterprise Data Cloud” strategy. This is only fair, in that Greenplum lifted the idea from Teradata and eBay in the first place. It also provides major support for what I think is an extremely sensible trend. Give or take issues of who announces and ships what a couple months before or after a competitor, my early thinking is that the main differences between Greenplum and Teradata in this regard will be:
- Virtual as opposed to just physical data marts, based on robust workload management software. (Advantage: Teradata)
- Pricing, deployment options. (Advantage: Greenplum)
- Features that don’t directly relate to enterprise/private cloud. (Advantage: Either, often Teradata.)
- Teradata is generally strengthening its data movement technology, e.g. for making various appliances work in sync. I’m not too clear yet on the details of that. I think this is what Teradata’s phrase “ecosystem management” refers to.
- Teradata is (pre-)announcing – at least as a statement of direction — an appliance based on solid-state drives (SSDs). I’ve thought for a while that Teradata was a leader in thinking through the issues around solid-state memory in data warehousing, so it makes sense that they’re among the leaders in actually coming to market as well. I plan to say more after meeting with, e.g., Carson Schmidt.
- Teradata has achieved a 300%ish speed-up in geospatial processing. I gather this is largely a byproduct of the parallel analytics work Teradata did around strengthening its SAS integration. However, there don’t seem to be a lot of Teradata geospatial users yet.
- Teradata Express, Teradata’s free Windows-based crippleware, is being ported to Amazon EC2 and VMware as well. Presumably to avoid cannibalizing Teradata product sales, there are quite a few limitations on Teradata Express, including system capacity, database size, and “no production use.”
- Teradata continues to extend its optimizations to handle queries issued by business intelligence tools. Previously, the focus of what Teradata discussed in this regard was query rewrite. But soon automatic recommendation and creation of Aggregate Join Indexes – i.e., materialized views – will be included as well. (A toy sketch of the rewrite idea follows this list.)
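For readers unfamiliar with materialized-view query rewrite: the optimizer answers a query from a smaller pre-aggregated table whenever that table provably contains the needed data. Below is a toy sketch of that idea, with made-up table and column names; it is conceptual only, not Teradata’s Aggregate Join Index machinery or syntax.

```python
# Toy illustration of query rewrite against an aggregate "materialized view":
# if SUM(sales) by (store, month) is already materialized, a query asking for
# SUM(sales) by store can be answered by rolling up the (much smaller)
# aggregate instead of rescanning the detail table.

from collections import defaultdict

detail = [  # (store, month, sales) -- hypothetical fact-table rows
    ("boston", 1, 100), ("boston", 1, 50), ("boston", 2, 70),
    ("nyc", 1, 200), ("nyc", 2, 90),
]

# The "materialized view": pre-aggregated by (store, month)
mv = defaultdict(float)
for store, month, sales in detail:
    mv[(store, month)] += sales

def sales_by_store_from_detail():
    out = defaultdict(float)
    for store, _, sales in detail:
        out[store] += sales
    return dict(out)

def sales_by_store_from_mv():
    # The "rewritten" query: roll up the aggregate instead of the detail rows
    out = defaultdict(float)
    for (store, _), sales in mv.items():
        out[store] += sales
    return dict(out)

assert sales_by_store_from_detail() == sales_by_store_from_mv()
print(sales_by_store_from_mv())   # {'boston': 220.0, 'nyc': 290.0}
```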
Greenplum customer notes
In a briefing about a forthcoming product announcement, Greenplum threw in a slide saying:
- Greenplum is getting 12-15 new (paying) customers per quarter, all of whom it fondly refers to as “Tier 1” enterprises.
- Greenplum will hit the 100+ customer mark this quarter (thus joining Vertica and Infobright).
- <10% of Greenplum business is now “influenced” by Sun hardware.
I asked Ben Werther to unpack that last claim for me. He quickly noted that it wasn’t his slide, but rather had been put together by colleagues. That said:
- As of the past quarter or two, <10% of Greenplum’s sales activity is on Sun, which works out to maybe one sale per quarter and at most a small number of sales cycles. (That’s down from 50%+ not that long ago.)
- Most Greenplum business is now on HP or Dell equipment. Some is on IBM. There are some interesting sales cycles on Cisco’s new UCS (Unified Computing System) blades, but no closed deals yet. EMC seems to be part of the Cisco story.
No doubt part of the reason for the move away from Sun equipment is the impending Oracle acquisition. Another may be that the Greenplum/Sun appliance is somewhat underpowered. E.g., without particularly high levels of compression, eBay puts over 60 terabytes of data on each Greenplum node, which probably isn’t ideal from the standpoint of query performance.
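For a sense of why 60 terabytes per node raises query-performance concerns, here is a rough scan-time calculation. The 60 TB figure is from the eBay deployment described above; the per-node disk bandwidth is purely my assumption for illustration, not a Greenplum or eBay number.

```python
# Rough scan-time arithmetic behind the "probably isn't ideal" remark above.
# The 60 TB/node figure is from the post; the aggregate disk bandwidth per
# node is an assumed value for illustration only.

data_per_node_tb = 60
assumed_node_bandwidth_gb_per_sec = 2.0   # assumed aggregate sequential read rate

seconds = (data_per_node_tb * 1024) / assumed_node_bandwidth_gb_per_sec
hours = seconds / 3600
print(f"Full scan of one node: ~{hours:.1f} hours")
# ~8.5 hours per full scan at that assumed rate -- which is why lower data
# density per node (more spindles per terabyte) tends to buy query performance.
```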
Greenplum also says that 50% or so of sales are subscription-priced, rather than perpetual-licensed. I don’t have a sense for how long that’s been going on. (Edit: Ben Werther tells me this has been true for over a year.)