Notes on analytic hardware
I took the opportunity of Teradata’s Aster/Hadoop appliance announcement to catch up with Teradata hardware chief Carson Schmidt. I love talking with Carson, about both general design philosophy and his views on specific hardware component technologies.
From a hardware-requirements standpoint, Carson seems to view Aster and Hadoop as more similar to each other than either is to, say, a Teradata Active Data Warehouse. In particular, for Aster and Hadoop:
- I/O is more sequential.
- The CPU:I/O ratio is higher.
- Uptime is a little less crucial.
The most obvious implication is differences in the choice of parts, and of their ratio. Also, in the new Aster/Hadoop appliance, Carson is content to skate by with RAID 5 rather than RAID 1.
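As a rough illustration of why RAID 5 is an acceptable trade when uptime is a little less crucial, here is the usable-capacity arithmetic; the drive count and sizes below are made-up numbers for the sketch, not anything Teradata has disclosed.

```python
# Illustrative only: usable capacity for RAID 1 (mirroring) vs RAID 5
# (striping with one drive's worth of parity per group). The 8 x 2 TB
# group is a hypothetical example, not a Teradata configuration.

def raid1_usable(drives, size_tb):
    """RAID 1 mirrors every drive, so half the raw capacity is usable."""
    return drives * size_tb / 2

def raid5_usable(drives, size_tb):
    """RAID 5 gives up one drive's worth of capacity to parity per group."""
    return (drives - 1) * size_tb

drives, size_tb = 8, 2
print(raid1_usable(drives, size_tb))  # 8.0 TB usable out of 16 TB raw
print(raid5_usable(drives, size_tb))  # 14 TB usable out of 16 TB raw
```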
I think Carson’s views about flash memory can be reasonably summarized as:
- He doesn’t like flash plugged straight into the motherboard, via PCIe, as some sort of flash cache because:
- The difference in latency between flash and RAM really hurts.
- So do difficulties in writing fine-grained caching logic.
- He doesn’t like the PCIe option as replacement for spinning disks, because you can’t get enough total capacity that way.
- He does like flash via disk controllers as a replacement for spinning disk drives, for reasons of throughput rather than latency.
- He can get 390 MB/second out of a solid-state drive (SSD).
- But for a particularly random workload, a disk could have throughput as low as 14 MB/second.
- Of course, if the disk workload can be made quasi-sequential, then the throughput gap is much reduced.
- Indeed, Carson is happy to let several AMPs* talk to an SSD at once (the current limit is 6), which he wouldn't allow with a hard disk; the back-of-the-envelope arithmetic is sketched after this list.
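To make the throughput gap concrete, here is the arithmetic implied by the figures above (390 MB/second per SSD, as little as 14 MB/second per disk on a very random workload, up to 6 AMPs sharing an SSD); the per-AMP split is my own illustration, not a Teradata specification.

```python
# Back-of-the-envelope math using the throughput figures quoted above.
# Shows why one SSD can stand in for many randomly-accessed disks, and
# why sharing it among several AMPs still leaves each AMP well off.

SSD_MB_PER_SEC = 390          # throughput Carson cited for one SSD
DISK_RANDOM_MB_PER_SEC = 14   # worst-case random throughput he cited for one disk
AMPS_PER_SSD = 6              # current sharing limit mentioned above

ratio = SSD_MB_PER_SEC / DISK_RANDOM_MB_PER_SEC
per_amp = SSD_MB_PER_SEC / AMPS_PER_SSD

print(f"One SSD ~= {ratio:.0f} randomly-accessed disks")              # ~28
print(f"Each of {AMPS_PER_SSD} AMPs still sees ~{per_amp:.0f} MB/s")  # ~65
```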
Notwithstanding the foregoing, Carson seemed open-minded about my conjecture that if MapReduce insists on writing to persistent storage at every step, you might want a flash cache just for that.
*An AMP is a Teradata unit of parallelism; it’s what executes part of a query plan. Thinking of an AMP as a core is a “useful approximation”.
Carson’s views about networking seem to be at their simplest state in a while:
- InfiniBand is the best choice in almost all cases, until something better someday comes down the pike.
- Teradata’s Bynet software can be ported straightforwardly to any plausible kind of network.
The greatest fun in talking with Carson comes when he introduces me to concepts or issues I hadn't heard of before. This time there were two. One was LR-DIMM, where LR stands for Load-Reduced and DIMM for Dual In-line Memory Module. The idea of LR-DIMM seems to be that the RAM has an onboard buffer to smooth its communication with the memory bus. This matters because overloaded memory buses crater in performance.
The other was that Carson is shaking his head about different customers' desires for electric power density. Traditional data centers can't supply more than 4 kilowatts or so to the area occupied by a standard-sized chassis, so that's all the power draw you can put in one. However, enterprises with newer data centers can handle 12 kilowatts or more, and in one case up to 27. And so they want to cram the same computing power into a third or less of the floor space that most enterprises use.
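The floor-space claim follows from simple arithmetic, sketched below; the kilowatt figures come from the discussion above, while equating power density with compute density is my simplification.

```python
# Rough floor-space arithmetic behind the power-density discussion.
# Assumes the same hardware draws the same power wherever it sits, so
# floor space needed scales inversely with allowable kW per footprint.

TRADITIONAL_KW_PER_CHASSIS = 4
NEWER_KW_PER_CHASSIS = [12, 27]   # newer data centers, and the extreme case cited

for kw in NEWER_KW_PER_CHASSIS:
    footprint = TRADITIONAL_KW_PER_CHASSIS / kw
    print(f"At {kw} kW per chassis footprint: ~{footprint:.0%} of the floor space")
# ~33% at 12 kW, ~15% at 27 kW -- consistent with "a third or less"
```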