Data warehouse appliances
Analysis of data warehouse appliances – i.e., of hardware/software bundles optimized for fast query and analysis of large volumes of (usually) relational data. Related subjects include:
- Data warehousing
- Parallelization
- Netezza
- DATAllegro
- Teradata
- Kickfire
- (in The Monash Report) Computing appliances in multiple domains
Why analytic DBMS increasingly need to be storage-aware
In my quick reactions to the EMC/Greenplum announcement, I opined
“I think that even software-only analytic DBMS vendors should design their systems in an increasingly storage-aware manner”
promising to explain what I meant later on. So here goes. Read more
Categories: Data warehouse appliances, Data warehousing, Solid-state memory, Storage, Theory and architecture | 6 Comments |
EMC is buying Greenplum
EMC is buying Greenplum. Most of the press release is a general recapitulation of Greenplum’s marketing messages, the main exceptions being (emphasis mine):
The acquisition of Greenplum will be an all-cash transaction and is expected to be completed in the third quarter of 2010, subject to customary closing conditions and regulatory approvals. The acquisition is not expected to have a material impact to EMC GAAP and non-GAAP EPS for the full 2010 fiscal year. Upon close, Bill Cook will lead the new data computing product division and report to Pat Gelsinger. EMC will continue to offer Greenplum’s full product portfolio to customers and plans to deliver new EMC Proven reference architectures as well as an integrated hardware and software offering designed to improve performance and drive down implementation costs.
Greenplum is one of my biggest vendor clients, and EMC is just becoming one, but of course neither side gave me a heads-up before the deal happened, nor have I yet been briefed subsequently. With those disclaimers out of the way, some of my early thoughts include:
- I wish my clients would never buy each other, but it’s inevitable.
- I don’t think anybody evaluating Greenplum should be much influenced by this deal one way or the other. (Whether they will be is of course a different matter.)
- EMC tends to run its bigger software acquisitions in a fairly hands-off manner. There’s no particular FUD (Fear/Uncertainty/Doubt) reason why this deal should stop anybody from buying Greenplum software.
- I also don’t think adding a rich parent adds much of a reason to buy from Greenplum. But if you’re the type who’s nervous about smaller vendors — well, Greenplum now isn’t so small.
- Greenplum Chorus could, in principle, work with non-Greenplum DBMS. That possibility suddenly looks a lot more realistic.
- The list of analytic DBMS vendors with an appliance orientation is pretty impressive, including:
- Oracle, with Exadata
- Microsoft, partially
- Teradata
- Netezza
- Now EMC/Greenplum, at least partially
- Weaker players such as:
- The ailing Kickfire, which a client (not Kickfire itself) tells me is being shopped around
- The reeling HP Neoview
- XtremeData, but I’m still waiting to hear of XtremeData’s first real sale
- Greenplum is something of a specialist in large databases. EMC has to love that.
- Greenplum’s weakness is concurrency.
- Greenplum’s “polymorphic storage” is a good fit for a storage vendor with appliance-y ideas.
- And finally — I think that even software-only analytic DBMS vendors should design their systems in an increasingly storage-aware manner, and have been advising my vendor clients of same. I’ll blog that line of reasoning separately when I get a chance, and edit in a link here after I do.
Related links (edit)
- Here’s the promised post as to why analytic DBMS need to be ever more storage-aware.
- Dave Kellogg crunched the EMC/Greenplum numbers, coming up with an estimated valuation range of $300-400 million, the high end of which is rumored to be correct.
- Merv Adrian suggests the big EMC/Greenplum loser is ParAccel, a viewpoint which presumably presupposes that the EMC/ParAccel partnership was significant in the first place.
- I talked with Ben Werther and posted more about Greenplum and EMC.
Categories: Data warehouse appliances, EMC, Greenplum, Storage | 13 Comments |
Netezza’s silicon balance
As I’ve mentioned in a couple of other posts, Netezza is stressing that the most recent wave of its technology is software-only, with no hardware upgrades made or needed. In other words, Netezza boxes already have all the silicon they need. But of course, there are really at least three major aspects to the Netezza silicon story – FPGA (Field-Programmable Gate Array), CPU, and RAM.
- Netezza planned to be “generous” in its original TwinFin FPGA capacity, anticipating software upgrades like the ones it’s introducing now. It is satisfied that this strategy worked. More on this below.
- The same surely applies to CPU.
- What’s more, I get the sense that the CPU turned out in practice to be even more over-provisioned than they anticipated …
- … at least when one just considers Netezza’s base NPS software.
- However, I suspect that if the advanced analytics capability takes off, Netezza will determine that more CPU is always better.
- And by the way, NEC is making versions of Netezza appliances with more advanced chips than Netezza is. So if anybody should really, really need more CPU in their Netezza boxes, there’s a very straightforward way to make that happen. (And if there were nontrivial demand for that, appropriate support plans could surely be structured.)
- Everybody needs to be careful about RAM. Netezza is surely no exception.
The major parts of Netezza’s FPGA software are:
- Compress Engine 2. This is Netezza’s new way of doing compression.
- Compress Engine 1. This is Netezza’s old way of doing compression. It is being kept around so that existing Netezza tables don’t suddenly have to be changed or reloaded.
- Project Engine. Guess what this does.
- Restrict Engine. Ditto.
- Visibility Engine. This enforces ACID and handles row-level security. It is “sort of a corner of” the Restrict Engine. (Actually, Netezza seems to waver as to whether to describe “Restrict” and “Visibility” as being two engines or one.)
- Miscellaneous plumbing.
If I understood correctly, each Netezza FPGA runs two copies of each engine in parallel.
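To make the division of labor among those engines a bit more concrete, here is a minimal Python sketch of a streaming scan that decompresses blocks, restricts rows, checks visibility, and projects columns. It is purely illustrative and is not Netezza's implementation. The zlib-plus-JSON block format, the `created_xid`/`deleted_xid` fields, the ordering of the stages, and every function name are assumptions invented for the example, and the visibility check covers only a simplified transaction rule, not row-level security.

```python
import json
import zlib
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


def compress_block(rows: List[Row]) -> bytes:
    """Pack rows into a block (hypothetical format: zlib-compressed JSON)."""
    return zlib.compress(json.dumps(rows).encode("utf-8"))


def decompress_engine(blocks: Iterable[bytes]) -> Iterator[Row]:
    """Stand-in for a compress engine: expand each block back into rows."""
    for block in blocks:
        yield from json.loads(zlib.decompress(block).decode("utf-8"))


def restrict_engine(rows: Iterable[Row],
                    predicate: Callable[[Row], bool]) -> Iterator[Row]:
    """Drop rows that fail the WHERE-style predicate."""
    return (row for row in rows if predicate(row))


def visibility_engine(rows: Iterable[Row], reader_xid: int) -> Iterator[Row]:
    """Keep rows created at or before reader_xid and not deleted by then.

    Simplified assumption: transaction IDs are integers that grow over time.
    """
    for row in rows:
        created = row["created_xid"]
        deleted = row["deleted_xid"]
        if created <= reader_xid and (deleted is None or deleted > reader_xid):
            yield row


def project_engine(rows: Iterable[Row], columns: List[str]) -> Iterator[Row]:
    """Keep only the requested columns."""
    for row in rows:
        yield {column: row[column] for column in columns}


def scan(blocks: Iterable[bytes], predicate: Callable[[Row], bool],
         reader_xid: int, columns: List[str]) -> List[Row]:
    """Chain the stages so rows stream through without being materialized."""
    rows = decompress_engine(blocks)
    rows = restrict_engine(rows, predicate)
    rows = visibility_engine(rows, reader_xid)
    return list(project_engine(rows, columns))
```

The point of the sketch is only that each stage shrinks the data before it moves on, which is the general argument for filtering close to the disk rather than in the main CPU.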
Related link
- An August, 2009 post on what Netezza does in its FPGA
Categories: Data warehouse appliances, Data warehousing, Database compression, Netezza, Theory and architecture | Leave a Comment |
A partial overview of Netezza database software technology
Netezza is having its user conference Enzee Universe in Boston Monday–Wednesday, June 21-23, and naturally will be announcing new products there, and otherwise providing hooks and inducements to get itself written about. (The preliminary count is seven press releases in all.) To get a head start, I stopped by Netezza Thursday for meetings that included a 3 ½ hour session with 10 or so senior engineers, and have exchanged some clarifying emails since. Read more
Categories: Data warehouse appliances, Data warehousing, Netezza, Theory and architecture, Workload management | 15 Comments |
Notes on a spate of Netezza-related blog posts
Fearing that last year’s tight travel budgets would hamper attendance, Netezza – like a number of other vendors – decided to forgo a traditional user conference. Instead, it took its Enzee Universe show on the road, essentially spreading the conference across eight cities. I was asked to keynote six of the installments.
After the first one, Netezza Marketing VP Tim Young took me aside for two pieces of constructive criticism. The surprising one* was that he felt I had been INSUFFICIENTLY critical of Netezza. Since then, every other conversation we’ve had about content creation has also featured ringing reassurances that Tim truly wants independent, non-pandering work.
*The unsurprising one was that I’d rushed. Well, duh. After months of telling me I had a 1 hour slot, Netezza cut me to ½ hour a few days beforehand. And my talk had been designed to be high-speed even in the longer time slot …
As a result, I accepted a subsequent gig from Netezza that I would barely consider from most other vendors. Namely, for this year’s Enzee Universe – June 21-23, aka Monday-Wednesday of this week, at the Westin Waterfront Hotel in Boston – I would do some contemporaneous blogging. The parameters we agreed on included: Read more
Categories: Data warehouse appliances, Data warehousing, Netezza, Presentations | 3 Comments |
Kickfire update
A Kickfire competitor tipped me off that he got 3 Kickfire salesmen’s resumes in 24 hours. I ran this by Kickfire CEO Bruce Armstrong, who confirmed that Kickfire has had a layoff, but gave me no further details.
Bruce also told me that Kickfire is now up to 10 paying customers, and that there are repeat deals.
Categories: Data warehouse appliances, Data warehousing, Kickfire, Market share and customer counts | 3 Comments |
Clarifying the state of MPP in-database SAS
I routinely am briefed way in advance of products’ introductions. For that reason and others, it can be hard for me to keep straight what’s been officially announced, introduced for test, introduced for general availability, vaguely planned for the indefinite future, and so on. Perhaps nothing has confused me more in that regard than the SAS Institute’s multi-year effort to get SAS integrated into various MPP DBMS, specifically Teradata, Netezza TwinFin(i), and Aster Data nCluster.
However, I chatted briefly Thursday with Michelle Wilkie, who is the SAS product manager overseeing all this (and also some other stuff, like SAS running on grids without being integrated into a DBMS). As best I understood, the story is: Read more
Categories: Aster Data, Data warehouse appliances, MapReduce, Netezza, Parallelization, Predictive modeling and advanced analytics, SAS Institute, Specific users, Teradata | 11 Comments |
Clustrix may be doing something interesting
Clustrix launched without briefing me or, at least so far as I can tell, anybody else who knows much about database technology. But Clustrix did post a somewhat crunchy, no-registration-required, white paper. Based on that, I get the impression:
- Clustrix is making an OLTP DBMS.
- The core problem Clustrix tries to solve is scale-out, without necessarily giving up SQL. (I couldn’t immediately tell whether Clustrix supports NoSQL-style key-value interfaces enthusiastically, grudgingly, or not at all.)
- Unlike Akiban or VoltDB, Clustrix makes database appliances. The Clustrix software seems to assume a Clustrix appliance.
- A key feature of Clustrix’s database appliances is that they rely on solid-state memory. I’m guessing that Clustrix appliances don’t even have disks, or that if they do the disks store some software or something, not actual data. (As previously noted, I agree with Oracle in thinking that much of the progress in database technology this decade will come from proper design for solid-state memory.)
- Clustrix talks of things that sound like compiled queries and attempts to avoid locks. However, it doesn’t sound as extreme in these regards as VoltDB.
- Clustrix also talks of things that sound like consistent hashing.
- The brand name “Sierra” also shows up along with the brand name “Clustrix.”
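For readers who haven't run into consistent hashing, here is a minimal Python sketch of the general technique. It is not taken from the Clustrix white paper and says nothing about how Clustrix actually places data. The class name, the MD5-based hash, the node names, and the virtual-node count are all assumptions chosen for illustration.

```python
import bisect
import hashlib
from typing import Dict, Iterable, List


def _hash(key: str) -> int:
    """Map a string to a position on the ring (MD5 is an arbitrary choice)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Hypothetical ring that assigns keys to nodes via virtual nodes."""

    def __init__(self, nodes: Iterable[str], vnodes: int = 64) -> None:
        self._vnodes = vnodes
        self._ring: List[int] = []         # sorted ring positions
        self._owner: Dict[int, str] = {}   # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        """Place vnodes positions for the node on the ring."""
        for i in range(self._vnodes):
            position = _hash(f"{node}#{i}")
            bisect.insort(self._ring, position)
            self._owner[position] = node

    def remove_node(self, node: str) -> None:
        """Remove every ring position owned by the node."""
        self._ring = [p for p in self._ring if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def node_for(self, key: str) -> str:
        """Return the node owning the first position clockwise from the key."""
        if not self._ring:
            raise ValueError("ring has no nodes")
        index = bisect.bisect_right(self._ring, _hash(key)) % len(self._ring)
        return self._owner[self._ring[index]]
```

The property that matters for scale-out is that adding a node moves only the keys that land on the new node, and everything else stays put. Plain `hash(key) % node_count` placement, by contrast, reshuffles most keys whenever the node count changes.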
Categories: Clustrix, Data warehouse appliances, DBMS product categories, NoSQL, Parallelization, Solid-state memory, Storage, Theory and architecture | 2 Comments |
Netezza nails April Fool’s Day
Netezza has nailed April Fool’s Day this year. 🙂 (Their site will revert to normal after April 1, so I may later edit this post accordingly.)
Categories: Data warehouse appliances, Data warehousing, Fun stuff, Humor, Netezza | Leave a Comment |
XtremeData update
I talked with Geno Valente of XtremeData tonight. Highlights included:
- XtremeData still hasn’t sold any dbX stuff (they’ve had a side business in generic FPGA-based boards paying the bills for years). Well, there may have been some paid POCs (proofs of concept) or something, but real sales haven’t come through yet.
- XtremeData does have three prospects who have said “Yes”, and expects one order to come through this month.
- XtremeData continues to believe it shines when:
- Data models are complex
- In particular, there are complex joins
- In particular, two large tables have to be joined with each other, under circumstances where no product can avoid doing vast data redistribution
- XtremeData insists that none of the nice things Bill Inmon has said about it, including in webinars, were said for pay or other similar business compensation. That’s quite unusual.
- XtremeData is coming out with a new product, codenamed the Personal Data Warehouse (PDW), which:
- Is ready to go into beta test
- Should be launched in a month and a half or so
- Will have a different name when it is launched
Naming aside, Read more