Teradata vs. the new appliance vendors, technically
Todd Walter and Randy Lea of Teradata gave generously of their time today, ducking out of their user conference, and shared their take on issues we’ve been discussing here recently. Overall, Teradata’s response to the data warehouse appliance guys is essentially: “Well, those may be fine for specific queries, or for data marts, but in true blended enterprise data warehouse workloads we’re superior, including in performance.”
Specific takeaways included:
- Since 2002, Teradata has had a “cylinder read” option, allowing coarse-grained reads of about 2 megabytes at a time, comparable to what DATallegro or Netezza do all the time. Users evidently find this extremely valuable.
- Teradata’s views on standard processors vs. custom chips or FPGAs are essentially the same as DATallegro’s, and different from Netezza’s. Indeed, it’s hard to conclude based on available price lists that Netezza’s FPGAs give it a compelling price/performance advantage. However, I don’t know how important the heat/power advantages are.
- Data appliance vendors brag that because of their non-reliance on indices, they are immune from the kind of storage explosion that haunts conventional data warehouse (or MOLAP) software vendors. Teradata claims the same advantage, saying their indices are based on compressed Row IDs, and are always a lot smaller than the database itself. (Actually, upon thinking about it I realize I don’t understand that. I’ll have to probe further to see why it makes sense.)
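To see why coarse-grained reads such as the “cylinder read” mentioned above can matter, here is a back-of-the-envelope sketch. The seek time and transfer rate are assumed, illustrative figures for a generic disk, not numbers from Teradata or any other vendor, and the function name is hypothetical.

```python
def effective_throughput_mb_s(read_size_mb, seek_ms, transfer_mb_s):
    """Effective MB/s when every read pays one seek and then streams."""
    transfer_s = read_size_mb / transfer_mb_s
    return read_size_mb / (seek_ms / 1000.0 + transfer_s)


# Assumed, illustrative disk characteristics (not vendor figures).
SEEK_MS = 8.0
TRANSFER_MB_S = 60.0

small = effective_throughput_mb_s(0.0625, SEEK_MS, TRANSFER_MB_S)  # 64 KB reads
large = effective_throughput_mb_s(2.0, SEEK_MS, TRANSFER_MB_S)     # 2 MB reads

print(f"64 KB reads: {small:.1f} MB/s effective")
print(f"2 MB reads:  {large:.1f} MB/s effective")
```

Under these assumptions the small reads spend most of their time seeking, while the 2 megabyte reads get much closer to the disk’s streaming rate. Real systems add caching, read-ahead, and multiple spindles, so treat this only as intuition.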
Teradata also claims a number of features that are more reminiscent of a full DBMS than an appliance (although there’s a hierarchy of appliances, with Netezza’s software features being more limited than those of, say, DATallegro or Greenplum).
- Like DATallegro, Teradata is a big fan of partitioning, especially by time. They’re planning to soon offer the ability to partition by multiple dimensions at once. Also like DATallegro, they stripe one partition across multiple disks for maximum parallelism.
- While Teradata resists hard-to-update index types such as bitmaps or star schemas, it does support “join indices” – i.e., true materialized views, but ones that can’t be referenced in SQL statements.
- They have a capability called “synchronized scan” that lets multiple queries benefit from the same table scans.
- They’re really proud of their caching capabilities. (By way of contrast, Netezza has no cache at all.) They try to have dimension tables and heavily used partitions in cache, and the optimizer is smart about leaving them there, rather than pushing them out with the results of some large join or table scan. Using 32-bit processors, they claim cache hit rates of 90%+ on intermediate files, 80%+ on higher-level index information, and 60%+ on data blocks themselves, and expect further improvements when they exploit 64-bit processors and increase RAM. (Apparently, they just announced their first production 64-bit customer, Overstock.com.)
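The idea behind “synchronized scan” can be illustrated with a toy sketch: several queries register against one pass over a table, so the rows are read once rather than once per query. This is only a conceptual illustration, not Teradata’s implementation; the `shared_scan` function, the sample rows, and the query names are all hypothetical.

```python
def shared_scan(rows, queries):
    """Feed each row to every registered query in a single pass.

    `queries` maps a query name to a (predicate, aggregate) pair, where
    aggregate(acc, row) returns the new accumulator value.
    """
    results = {name: 0 for name in queries}
    rows_read = 0
    for row in rows:
        rows_read += 1  # each row is fetched once, however many queries run
        for name, (predicate, aggregate) in queries.items():
            if predicate(row):
                results[name] = aggregate(results[name], row)
    return results, rows_read


# Hypothetical sample data and queries.
sales = [
    {"region": "east", "amount": 10},
    {"region": "west", "amount": 25},
    {"region": "east", "amount": 5},
]
queries = {
    "east_total": (lambda r: r["region"] == "east",
                   lambda acc, r: acc + r["amount"]),
    "big_orders": (lambda r: r["amount"] >= 10,
                   lambda acc, r: acc + 1),
}

results, rows_read = shared_scan(sales, queries)
print(results, rows_read)
```

Both queries are answered after reading the three rows once. Run separately, they would have read six rows between them. A real DBMS also has to let queries join a scan already in progress, which this sketch ignores.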
Comments
4 Responses to “Teradata vs. the new appliance vendors, technically”
Wow, so the mighty Teradata has finally admitted that we exist. I’m truly honored.
I think it’s kind of amusing that all of a sudden Teradata is claiming to do everything we do – shame they can’t match us on price or performance.
OK, that’s slightly facetious, since there are some areas in large-scale ‘Active Warehousing’ where they still have a lead. However, there are a lot of data warehousing projects for which we simply have the better, easier to use product at a far lower price.
Stuart
DATAllegro
[…] Teradata also has preconfigured hardware. It does have indices, but rather simple ones. Plus it has join indices. And it has a few more configuration options in other areas (e.g., block size) than the other appliance vendors. (Yes, I count Teradata among the appliances.) […]
[…] But the true story is more mixed. Teradata continues to this day as a major data warehouse technology player, and as far as I’m concerned Teradata indeed makes appliances. If we look further than the applications stack, we find that appliances actually occupy a large and growing share of the computing market. So a persuasive anti-appliance argument has to do more than just invoke the names of Britton-Lee and Symbolics. […]
Comparing a database appliance to Teradata is like comparing a bottle rocket to a rocketship or a toaster oven to a full-fledged stove. One can certainly cook all their meals quickly in a toaster oven, but who would want to, let alone try to cater to an audience of thousands? Teradata is a full-fledged RDBMS and has been since it was designed as a shared-nothing, parallel, linearly scalable database. No other solution is even comparable where mixed workloads and simultaneous complex queries are concerned. I think appliances have a niche market; however, I think they shouldn’t call themselves data warehouse appliances, because what they really are is data mart appliances.
Teradata, on the other hand, is a node-based system that scales and gets faster as it scales, and the whole organization benefits. The complexity, or lack thereof, of one’s queries or indexes is entirely up to those implementing it, unlike most appliances, which excel the more specialized they become. As for join indexes not being referenced by SQL, it is because they don’t have to be; they are utilized transparently.