Data warehouse appliances
Analysis of data warehouse appliances – i.e., of hardware/software bundles optimized for fast query and analysis of large volumes of (usually) relational data. Related subjects include:
- Data warehousing
- Parallelization
- Netezza
- DATAllegro
- Teradata
- Kickfire
- (in The Monash Report) Computing appliances in multiple domains
eBay’s two enormous data warehouses
A few weeks ago, I had the chance to visit eBay, meet briefly with Oliver Ratzesberger and his team, and then catch up later with Oliver for dinner. I’ve already alluded to those discussions in a couple of posts, specifically on MapReduce (which eBay doesn’t like) and the astonishingly large difference between high-end and low-end disk drives (to which eBay clued me in). Now I’m finally getting around to writing about the core of what we discussed, which is two of the very largest data warehouses in the world.
Metrics on eBay’s main Teradata data warehouse include:
- >2 petabytes of user data
- 10s of 1000s of users
- Millions of queries per day
- 72 nodes
- >140 GB/sec of I/O, i.e. about 2 GB/node/sec, though that may be a peak figure when the workload is scan-heavy (see the quick arithmetic after this list)
- 100s of production databases being fed in
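For those who like to check the arithmetic, here’s a quick back-of-the-envelope confirmation of the per-node figure. This is just a sketch over the rough numbers quoted above, not anything eBay supplied:

```python
# Back-of-the-envelope check on the Teradata figures above.
# Inputs are the approximate numbers quoted to me, not official specs.
total_io_gb_per_sec = 140   # >140 GB/sec of aggregate I/O
nodes = 72

per_node = total_io_gb_per_sec / nodes
print(f"I/O per node: {per_node:.2f} GB/sec")  # ~1.94, i.e. the "2 GB/node/sec"
```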
Metrics on eBay’s Greenplum data warehouse (or, if you like, data mart) include:
- 6 1/2 petabytes of user data
- 17 trillion records
- 150 billion new records/day, which seems to suggest an ingest rate well over 50 terabytes/day (see the arithmetic after this list)
- 96 nodes
- 200 MB/node/sec of I/O (that’s the order of magnitude difference that triggered my post on disk drives)
- 4.5 petabytes of storage
- 70% compression
- A small number of concurrent users
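How those figures hang together is worth a moment of arithmetic. A minimal sketch, using decimal petabytes and two possible readings of the compression figure; the derivation is mine, not eBay’s:

```python
# Sanity-checking the Greenplum figures above. Rough numbers throughout;
# decimal units (1 PB = 10**15 bytes) are assumed.
PB = 10**15
TB = 10**12

user_data = 6.5 * PB
records = 17 * 10**12
new_records_per_day = 150 * 10**9

avg_record = user_data / records
print(f"average record size: ~{avg_record:.0f} bytes")  # ~382 bytes

ingest = new_records_per_day * avg_record / TB
print(f"implied ingest: ~{ingest:.0f} TB/day")  # ~57 TB/day, i.e. "well over 50"

# "70% compression" is ambiguous. One reading: stored bytes are ~30% of
# user data. Another: the 4.5 PB of storage vs. 6.5 PB of user data is
# itself the ~70% figure.
print(f"reading 1: {user_data * (1 - 0.70) / PB:.2f} PB stored")  # ~1.95 PB
print(f"reading 2: {4.5 * PB / user_data:.0%} of user data")      # ~69%
```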
Data warehouse storage options — cheap, expensive, or solid-state disk drives
This is a long post, so I’m going to recap the highlights up front. In the opinion of somebody I have high regard for, namely Carson Schmidt of Teradata:
- There’s currently a huge, one-order-of-magnitude performance difference between cheap and expensive disks for data warehousing workloads (a worked example follows this list).
- New disk generations coming soon will have best-of-both-worlds aspects, combining high-end performance with lower-end cost and power consumption.
- Solid-state drives will likely add one or two orders of magnitude to performance a few years down the road. Echoing the most famous logjam in VC history — namely the 60+ hard disk companies that got venture funding in the 1980s — 20+ companies are vying to cash in.
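To make the order-of-magnitude point concrete, here’s an illustrative scan-time comparison. The per-node bandwidth figures are the 2 GB/sec and 200 MB/sec numbers from the eBay post above; the 1 TB-per-node table slice is hypothetical:

```python
# Illustration of what a 10x per-node disk bandwidth gap means for a
# scan-heavy warehouse workload. Inputs are illustrative only.
TB = 10**12
GB = 10**9

data_per_node = 1 * TB  # hypothetical slice of a fact table on one node

for label, bandwidth in [("high-end disks", 2 * GB), ("cheap disks", 0.2 * GB)]:
    minutes = data_per_node / bandwidth / 60
    print(f"{label}: full scan in ~{minutes:.0f} minutes")
# high-end disks: ~8 minutes; cheap disks: ~83 minutes
```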
In other news, Carson likes 10 Gigabit Ethernet, dislikes Infiniband, and is “ecstatic” about Intel’s Nehalem, which will be the basis for Teradata’s next generation of servers.
Kickfire update
I talked recently with my clients at Kickfire, especially newish CEO Bruce Armstrong. I also visited the Kickfire blog, which among other virtues features a fairly clear overview of Kickfire technology. (I did my own Kickfire overview in October.) Highlights of the current Kickfire story include:
- Kickfire is initially focused on three heavily overlapping markets — network event analysis, the general Web 2.0/clickstream/online marketing analytics area, and MySQL/LAMP data warehousing.
- Kickfire has blogged about a few sales to unnamed customers in those markets.
- I think network management is a market that’s potentially friendly to five-figure-cost appliances. After all, networking equipment is generally sold in appliance form. Kickfire doesn’t dispute this analysis.
- Kickfire’s sales so far have been for databases in the sub-terabyte range, although both Kickfire and its customers intend to run bigger databases soon. (Kickfire describes the range as 300 GB – 1 TB.) Not coincidentally, Kickfire believes that MySQL doesn’t scale very well past 100 GB without a lot of partitioning effort (in the case of data warehouses) or sharding (in the case of OLTP); a sketch of what such sharding entails follows this list.
- When Bruce became CEO, he let go some sales, marketing, and/or business development folks. He prefers to call this a restructuring of Kickfire rather than a reduction in force, but either way, that’s what happened. There are now about 50 employees, and Kickfire still has most of the $20 million it raised last August in the bank. Edit: The company clarifies that it actually wound up with more sales and marketing people than before.
- Kickfire has thankfully deemphasized various marketing themes I found annoying, such as ascribing great weight to TPC-H benchmarks or explaining why John von Neumann originally made bad choices in his principles of computer design.
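For readers who haven’t lived through it, the sharding Kickfire alludes to usually means application-level shard routing. A minimal, entirely hypothetical sketch of the pattern (names and shard count are illustrative; Kickfire’s pitch is precisely that an appliance spares you this):

```python
# Hypothetical sketch of application-level sharding, the sort of thing
# MySQL OLTP shops resort to once a single instance stops scaling.
import hashlib

# Illustrative shard list; real deployments manage this mapping carefully.
SHARDS = [
    "mysql://db0.example.com/orders",
    "mysql://db1.example.com/orders",
    "mysql://db2.example.com/orders",
    "mysql://db3.example.com/orders",
]

def shard_for(customer_id: str) -> str:
    """Route a customer's rows to one shard by hashing the key."""
    digest = hashlib.md5(customer_id.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

print(shard_for("customer-12345"))  # all queries for this customer hit one shard
```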
Oracle introduces a half-rack version of Exadata
Oracle has introduced what amounts to a half-rack Exadata machine. My thoughts on this basically boil down to “makes sense” and “no big deal.” Specifically:
- The new Baby Exadata still holds 10 terabytes or more.
- Most specialty analytic DBMS purchases are still for databases of 10 terabytes or smaller.
- Large enterprise data warehouse projects are often being deferred or cut back due to the economic crunch, but smaller projects with credible, quick ROIs are doing fine.
- Exadata is evidently being sold overwhelmingly to Oracle loyalists. Other analytic DBMS vendors aren’t telling me of serious Exadata competition yet. If the market for Exadata is primarily “happy Oracle data warehouse users”, that’s mainly folks who have <5-10 terabytes of user data today.
- Oracle Exadata beta tests were done on a kind of half-rack configuration anyway.
DATAllegro sales price: $275 million
According to a press release announcing a venture capitalist’s job change,
Microsoft purchased DATAllegro for $275 million
Technically, that needn’t shut down the rumor mill altogether: given the way deals are structured and reported, it’s unlikely that Microsoft actually cut checks to DATAllegro stockholders in the aggregate amount of $275 million promptly after the close of the acquisition.
Still, it’s a data point of some weight.
Hat tip to Mark Myers.
Closing the book on the DATAllegro customer base
I’m prepared to call an end to the “Guess DATAllegro’s customers” game. The bottom line is that there are three in all, two of which are TEOCO and Dell, and the third of which is a semi-open secret. I wrote last week:
The number of DATAllegro production references is expected to double imminently, from one to two. Few will be surprised at the identity of the second reference. I imagine the number will then stay at two, as DATAllegro technology is no longer being sold, and the third known production user has never been reputed to be particularly pleased with it.
Dell did indeed disclose at TDWI that it was a large DATAllegro user, notwithstanding that Dell is a huge Teradata user as well. No doubt, Dell is gearing up to be a big user of Madison too.
Also at TDWI, I talked with some former DATAllegro employees who now work for rival vendors. None thinks DATAllegro has more than three customers. Neither do I.
Edit: Subsequently, the DATAllegro customer count declined to 1.
HP and Neoview update
I had lunch with some HP folks at TDWI. Highlights (burgers and jokes aside) included:
- HP’s BI consulting (especially the former Knightsbridge) and analytic product groups (including Neoview) are now tightly integrated.
- HP is trying to develop and pitch “solutions” where it has particular “intellectual property.” This IP can come from ordinary product engineering or internal use, because HP Labs serves both sides of the business. Specific examples offered included:
- Telecom. Apparently, HP made specialized data warehouse devices for CDRs (Call Detail Records) long ago, and claims this has been an area of particular expertise ever since.
- Supply chain – based on HP’s internal experiences.
- Customer relationship – ditto
- The main synergy suggested between consulting and Neoview is that HP’s experts work on talking buyers into such a complex view of their requirements that only Neoview (supposedly) can fit the bill.
- HP insists there are indeed new Neoview sales.
- Neoview sales seem to be concentrated in what Aster might call “frontline” applications — i.e., low latency, OLTP-like uptime requirements, etc.
- HP says it did an actual 80 TB POC. I asked whether this was for an 80 TB app or something a lot bigger, but didn’t get a clear answer.
Given the emphasis on trying to exploit HP’s other expertise in the data warehousing business, I suggested it was a pity that HP spun off Agilent (HP’s instrumentation division, aka HP Classic). Nobody much disagreed.
Draft slides on how to select an analytic DBMS
I need to finalize an already-too-long slide deck on how to select an analytic DBMS by late Thursday night. Anybody see something I’m overlooking, or just plain got wrong?
Edit: The slides have now been finalized.
Winter Corporation on Exadata
The most ridiculous analyst study I can recall — at least since Aberdeen pulled back from the “You pay; we say” business — is Winter Corporation’s list of large data warehouses. (Failings include that it only lists warehouses run by software from certain vendors; it doesn’t even list most of the largest warehouses from those vendors; and its size metrics are in my opinion fried.) So it was with some trepidation that I approached what appears to be an Oracle-sponsored Winter Corporation white paper about Exadata.*
Oracle Exadata article — up at last
I’d been promising Intelligent Enterprise editor Doug Henschen an article on Oracle Exadata for months. It’s finally up. For a variety of reasons, it was a lot more work than one might at first guess. One such reason is that it spawned four related blog posts over the past few days.
As I post this, there are two glitches in the article. One is that em dashes are appearing as quote marks — and as you know, I use a lot of em dashes. The other is that one sentence on in-database data mining seems unclear to me, and I’ve asked for a small edit to make it clearer what I’m talking about. No doubt both will be cleared up soon. Edit: Doug indeed fixed all that within minutes.
This is an edited article. Other than columns, it may be my first such since the Upside Magazine cover story on AOL over a decade ago. But it was edited with a light and skillful touch. Please don’t hold me responsible for every minor subtlety of emphasis or grammatical nuance. But otherwise I stand behind the opinions, for they are indeed mine.