Reports of perfectly-balanced hardware configurations are greatly exaggerated
Data warehouse appliance and software appliance vendors like to claim that they’ve worked out just the right hardware configuration(s), and that a single configuration is correct for a fairly broad range of workloads. But there are a lot of reasons to be dubious about that. Specific vendor evidence includes:
- Teradata ascribes considerable importance to a Virtual Storage technology whose main purpose is to allow mixing of heterogeneous storage devices in a single system. And the discussion rarely suggests that these parts will be in a rigid fixed relationship.
- Netezza — as Teradata keeps reminding me — often sells boxes with the expectation that they won’t be filled with data, so as to increase spindle count and hence performance.
- Oracle/Sun have dropped some comments about Exadata being more flexibly configured going forward.
- Kickfire’s new “high-end” appliance lets you attach fairly arbitrary amounts of external storage.
- And of course, software-only analytic DBMS vendors run their software in all sorts of hardware and storage environments.
What’s more, the claim never made a lot of sense anyway. With the rarest of exceptions, even a single data warehouse’s workload will contain different queries that strain different parts of the system in different ratios. Calculating the “ideal” hardware configuration for that single workload would be forbiddingly difficult. And even if one could calculate it, it would almost surely differ from another user’s “ideal” configuration. How a single hardware configuration can be “ideally balanced” for a broad class of use cases boggles the imagination.
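To make the point concrete, here is a toy model. It is not any vendor’s sizing method. It assumes each query needs a fixed amount of CPU work and a fixed amount of disk I/O, that the two overlap perfectly (so the slower resource sets the elapsed time), and that a fixed hardware budget can be split between cores and disk bandwidth at made-up prices. The names `Query`, `run_time`, and `best_split`, and all the numbers, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    """Hypothetical query profile: total CPU work and data read from disk."""
    name: str
    cpu_core_seconds: float
    disk_gb: float


def run_time(q: Query, cores: int, disk_gb_per_s: float) -> float:
    # Bottleneck model: CPU and I/O overlap fully, so the slower one decides.
    return max(q.cpu_core_seconds / cores, q.disk_gb / disk_gb_per_s)


def best_split(workload, budget, cost_per_core, cost_per_gbps):
    """Brute-force the split of a fixed budget between cores and disk
    bandwidth that minimizes total runtime for this particular workload."""
    best = None
    for cores in range(1, int(budget // cost_per_core)):
        disk_gb_per_s = (budget - cores * cost_per_core) / cost_per_gbps
        total = sum(run_time(q, cores, disk_gb_per_s) for q in workload)
        if best is None or total < best[0]:
            best = (total, cores, disk_gb_per_s)
    return best


# Two made-up query types that stress different resources.
SCAN = Query("big table scan", cpu_core_seconds=20, disk_gb=100)
JOIN = Query("CPU-heavy join/aggregate", cpu_core_seconds=400, disk_gb=10)

# Two users running the same query types in different proportions.
scan_heavy = [SCAN] * 9 + [JOIN]
join_heavy = [SCAN] + [JOIN] * 9

for label, workload in [("scan-heavy", scan_heavy), ("join-heavy", join_heavy)]:
    total, cores, bw = best_split(workload, budget=100, cost_per_core=1, cost_per_gbps=10)
    print(f"{label}: {cores} cores, {bw:.1f} GB/s, total {total:.1f} s")
```

Under these assumptions the scan-heavy mix is best served by about 17 cores and the join-heavy mix by about 65, from the same budget and the same two query types. Even this crude model yields no single “balanced” answer.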
Comments
DW appliance vendors in general do not claim that they have the perfect hardware ‘alignment’ of storage, IO, CPU, and memory for all workloads. Each vendor has a sweet spot with respect to data volume, distribution, and workload, and claims that its default configuration(s) will work reasonably well for workloads with those characteristics. The vendors’ goal is to beat their competition on price/performance for a given workload (TPC-H). To me there is a difference between applying a general algorithm to a problem and specifically tuning the algorithm with knowledge of the problem’s characteristics. While the general algorithm may not be the ‘ideal’ solution, there is generally a bigger cost in maintaining the hand-tuned one.
To a first approximation, you’re right. Some vendors optimize for TPC-H, and advertise TPC-H results. Others optimize for real-life work, and advertise real-life successful customers.
Be that digression as it may, I stand by my opinion that it’s worthwhile pointing out the difference between a naive interpretation of marketing claims on the one hand, and reality on the other.
I think that “ideal” only matters if it affects the price/performance of the system. That is, given some workload with an SLA, “ideal” is the cheapest system that satisfies the SLA.
The advantage of a software-only solution that can “run their software in all sorts of hardware and storage environments” is only meaningful if it allows the hardware and storage environment to be tweaked to some price/performance advantage.
If a hardware/software appliance is too coarse-grained, i.e., the CPU or storage increments are big and expensive, then the software-only offerings will have an advantage because they can offer systems in between the appliance increments. The appliance vendors can mitigate this advantage only by taking a margin hit when an in-between solution is required.
Full Disclosure: I work for Greenplum… but this note presents my personal opinion and does not reflect a company view…
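The granularity argument in this comment can be illustrated with a toy cost model. It is a sketch, not anyone’s actual pricing: it assumes the SLA has already been translated into a required number of capacity units, and the increment sizes, prices, and the helper `cheapest_cost` are all made up for illustration.

```python
import math


def cheapest_cost(required_capacity: float, increment: float, price_per_increment: float) -> float:
    """Price of the smallest configuration, bought in whole increments,
    whose capacity meets the SLA-derived requirement."""
    units = math.ceil(required_capacity / increment)
    return units * price_per_increment


# Hypothetical pricing: the appliance comes in 8-unit increments at 80 each
# (10 per unit of capacity); software-only on commodity servers can be bought
# one unit at a time, but at 11 per unit.
APPLIANCE = dict(increment=8, price_per_increment=80)
SOFTWARE_ONLY = dict(increment=1, price_per_increment=11)

for need in (8, 9, 16):
    a = cheapest_cost(need, **APPLIANCE)
    s = cheapest_cost(need, **SOFTWARE_ONLY)
    print(f"need {need}: appliance {a}, software-only {s}")
```

With these numbers the appliance is cheaper when the requirement lands exactly on an increment (8 or 16 units), and the software-only build is cheaper in between (9 units). That in-between gap is what the commenter says appliance vendors can close only by discounting.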
[…] predictable enough to warrant such configuration specificity; Curt Monash discusses the issue here.) With its strong base of happy customers, Teradata can back that story with real world examples […]
Oracle Exadata…
Technical Documentation Exadata Prepup Presentation (compiled by Suman) Exadata V2 for Data Warehousing.pptx Files provided by Greg Day, Principal DB/Grid Sales Consultant Exadata Technical White Paper exadatatechnicalwhitepaper.pdf…….
[…] believe Teradata will go modular more emphatically than Teradata itself does, because I think doing so will meet users needs more effectively than if Teradata relies strictly on fixed appliance […]