Notes on analytic hardware
I took the opportunity of Teradata’s Aster/Hadoop appliance announcement to catch up with Teradata hardware chief Carson Schmidt. I love talking with Carson, about both general design philosophy and his views on specific hardware component technologies.
From a hardware-requirements standpoint, Carson seems to view Aster and Hadoop as more similar to each other than either is to, say, a Teradata Active Data Warehouse. In particular, for Aster and Hadoop:
- I/O is more sequential.
- The CPU:I/O ratio is higher.
- Uptime is a little less crucial.
The most obvious implication is differences in the choice of parts, and of their ratio. Also, in the new Aster/Hadoop appliance, Carson is content to skate by with RAID 5 rather than RAID 1.
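As a rough illustration of why RAID 5 is an acceptable trade when uptime is a little less crucial, here is the usable-capacity arithmetic; the drive count and sizes below are made-up numbers for the sketch, not anything Teradata has disclosed.

```python
# Illustrative only: usable capacity for RAID 1 (mirroring) vs RAID 5
# (striping with one drive's worth of parity per group). The 8 x 2 TB
# group is a hypothetical example, not a Teradata configuration.

def raid1_usable(drives, size_tb):
    """RAID 1 mirrors every drive, so half the raw capacity is usable."""
    return drives * size_tb / 2

def raid5_usable(drives, size_tb):
    """RAID 5 gives up one drive's worth of capacity to parity per group."""
    return (drives - 1) * size_tb

drives, size_tb = 8, 2
print(raid1_usable(drives, size_tb))  # 8.0 TB usable out of 16 TB raw
print(raid5_usable(drives, size_tb))  # 14 TB usable out of 16 TB raw
```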
I think Carson’s views about flash memory can be reasonably summarized as:
- He doesn’t like flash plugged straight into the motherboard, via PCIe, as some sort of flash cache because:
- The difference in latency between flash and RAM really hurts.
- So do difficulties in writing fine-grained caching logic.
- He doesn’t like the PCIe option as replacement for spinning disks, because you can’t get enough total capacity that way.
- He does like flash via disk controllers as a replacement for spinning disk drives, for reasons of throughput rather than latency.
- He can get 390 MB/second out of a solid-state drive (SSD).
- But for a particularly random workload, a disk could have throughput as low as 14 MB/second.
- Of course, if the disk workload can be made quasi-sequential, then the throughput gap is much reduced.
- Indeed, Carson is happy to let several AMPs* talk to an SSD at once (the current limit is 6), which he wouldn't allow with a hard disk; the back-of-the-envelope arithmetic is sketched after this list.
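To make the throughput gap concrete, here is the arithmetic implied by the figures above (390 MB/second per SSD, as little as 14 MB/second per disk on a very random workload, up to 6 AMPs sharing an SSD); the per-AMP split is my own illustration, not a Teradata specification.

```python
# Back-of-the-envelope math using the throughput figures quoted above.
# Shows why one SSD can stand in for many randomly-accessed disks, and
# why sharing it among several AMPs still leaves each AMP well off.

SSD_MB_PER_SEC = 390          # throughput Carson cited for one SSD
DISK_RANDOM_MB_PER_SEC = 14   # worst-case random throughput he cited for one disk
AMPS_PER_SSD = 6              # current sharing limit mentioned above

ratio = SSD_MB_PER_SEC / DISK_RANDOM_MB_PER_SEC
per_amp = SSD_MB_PER_SEC / AMPS_PER_SSD

print(f"One SSD ~= {ratio:.0f} randomly-accessed disks")              # ~28
print(f"Each of {AMPS_PER_SSD} AMPs still sees ~{per_amp:.0f} MB/s")  # ~65
```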
Notwithstanding the foregoing, Carson seemed open-minded about my conjecture that if MapReduce insists on writing to persistent storage at every step, you might want a flash cache just for that.
*An AMP is a Teradata unit of parallelism; it’s what executes part of a query plan. Thinking of an AMP as a core is a “useful approximation”.
Carson’s views about networking seem to be at their simplest state in a while:
- InfiniBand is the best choice in almost all cases, until something better someday comes down the pike.
- Teradata’s Bynet software can be ported straightforwardly to any plausible kind of network.
The greatest fun in talking with Carson comes when he introduces me to concepts or issues I hadn't heard of before. This time there were two. One was LR-DIMM, where LR stands for Load-Reduced and DIMM for Dual In-line Memory Module. The idea of LR-DIMM seems to be that the RAM has an onboard buffer to smooth its communication with the memory bus. This matters because overloaded memory buses crater in performance.
The other was that Carson is shaking his head about different customers' desires for electric power density. Traditional data centers can't supply more than 4 kilowatts or so to the area occupied by a standard-sized chassis, so that's all the power draw you can put in one. However, enterprises with newer data centers can handle 12 kilowatts or more, and in one case up to 27. And so they want to cram the same computing power into a third or less of the floor space that most enterprises use.
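The floor-space claim follows from simple arithmetic, sketched below; the kilowatt figures come from the discussion above, while equating power density with compute density is my simplification.

```python
# Rough floor-space arithmetic behind the power-density discussion.
# Assumes the same hardware draws the same power wherever it sits, so
# floor space needed scales inversely with allowable kW per footprint.

TRADITIONAL_KW_PER_CHASSIS = 4
NEWER_KW_PER_CHASSIS = [12, 27]   # newer data centers, and the extreme case cited

for kw in NEWER_KW_PER_CHASSIS:
    footprint = TRADITIONAL_KW_PER_CHASSIS / kw
    print(f"At {kw} kW per chassis footprint: ~{footprint:.0%} of the floor space")
# ~33% at 12 kW, ~15% at 27 kW -- consistent with "a third or less"
```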