Teradata
Analysis of data warehousing giant Teradata. Related subjects include:
Aster Data nCluster 4.5
Like Vertica, Netezza, and Teradata, Aster is using this week to pre-announce a forthcoming product release, Aster Data nCluster 4.5. Aster is really hanging its identity on “Big Data Analytics” or some variant of that concept, and so the two major named parts of Aster nCluster 4.5 are:
- Aster Data Analytic Foundation, a set of analytic packages prebuilt in Aster’s SQL-MapReduce
- Aster Data Developer Express, an Eclipse-based IDE (Integrated Development Environment) for developing and testing applications built on Aster nCluster, Aster SQL-MapReduce, and Aster Data Analytic Foundation
And in other Aster news:
- Along with the development GUI in Aster nCluster 4.5, there is also a new administrative GUI.
- Aster has certified that nCluster works with Fusion I/O boards, because at least one retail industry prospect cares. However, that in no way means that arm’s-length Fusion I/O certification is Aster’s ultimate solid-state memory strategy.
- I had the wrong impression about how far Aster/SAS integration has gotten. So far, it’s just at the connector level.
Aster Data Developer Express evidently does some cool stuff, like providing some sort of parallelism testing right on your desktop. It also generates lots of stub code, saving humans from the tedium of doing that. Useful, obviously.
But mainly, I want to write about the analytic packages. Read more
Categories: Aster Data, Data warehousing, Investment research and trading, Predictive modeling and advanced analytics, RDF and graphs, SAS Institute, Teradata | 9 Comments |
Intelligent Enterprise’s Editors’/Editor’s Choice list for 2010
As he has before, Intelligent Enterprise Editor Doug Henschen
- Personally selected annual lists of 12 “Most influential” companies and 36 “Companies to watch” in analytics- and database-related sectors.
- Made it clear that these are his personal selections.
- Nonetheless has called it an Editors’ Choice list, rather than Editor’s Choice. 🙂
(Actually, he’s really called it an “award.”)
Comments on the Gartner 2009/2010 Data Warehouse Database Management System Magic Quadrant
February, 2011 edit: I’ve now commented on Gartner’s 2010 Data Warehouse Database Management System Magic Quadrant as well.
At intervals of a little over a year, Gartner Group publishes a Data Warehouse Database Management System Magic Quadrant. Gartner’s 2009 data warehouse DBMS Magic Quadrant — actually, January 2010 — is now out.* For many reasons, including those I noted in my comments on Gartner’s 2008 Data Warehouse DBMS Magic Quadrant, the Gartner quadrant pictures are a bad use of good research. Rather than rehash that this year, I’ll merely call out some points in the surrounding commentary that I find interesting or just plain strange. Read more
Aster Data 4.0 and the evolution of “advanced analytic(s) servers”
Since Linda and I are leaving on vacation in a few hours, Aster Data graciously gave me permission to morph its “12:01 am Monday, November 2” embargo into “late Friday night.”
Aster Data is officially announcing the 4.0 release of nCluster. There are two big pieces to this announcement:
- Aster is offering a slick vision for integrating big-database management and general analytic processing on the same MPP cluster, under the not-so-slick name “Data-Application Server.”
- Aster is also offering a sophisticated vision for workload management.
In addition, Aster has matured nCluster in various ways, for example cleaning up a performance problem with single-row updates.
Highlights of the Aster “Data-Application Server” story include: Read more
Categories: Aster Data, Cloud computing, Data warehousing, EAI, EII, ETL, ELT, ETLT, MapReduce, Market share and customer counts, Teradata, Theory and architecture, Workload management | 9 Comments |
Teradata’s nebulous cloud strategy
As the pun goes, Teradata’s cloud strategy is – well, it’s somewhat nebulous. More precisely, for the foreseeable future, Teradata’s cloud strategy is a collection of rather disjointed parts, including:
- What Teradata calls the Teradata Agile Analytics Cloud, which is a combination of previously existing technology plus one new portlet called the Teradata Elastic Mart(s) Builder. (Teradata’s Elastic Mart(s) Builder Viewpoint portlet is available for download from Teradata’s Developer Exchange.)
- Teradata Data Mover 2.0, coming “Soon”, which will ease copying (ETL without any significant “T”) from one Teradata system to another.
- Teradata Express DBMS crippleware (1 terabyte only, no production use), now available on Amazon EC2 and VMware. (I don’t see where this has much connection to the rest of Teradata’s cloud strategy, except insofar as it serves to fill out a slide.)
- Unannounced (and so far as I can tell largely undesigned) future products.
Teradata openly admits that its direction is heavily influenced by Oliver Ratzesberger at eBay. Like Teradata, Oliver and eBay favor virtual data marts over physical ones. That is, Oliver and eBay believe that the ideal scenario is that every piece of data is only stored once, in an integrated Teradata warehouse. But eBay believes and Teradata increasingly agrees that users need a great deal of control over their use of this data, including the ability to import additional data into private sandboxes, and join it to the warehouse data already there. Read more
Categories: Analytic technologies, Cloud computing, Data integration and middleware, Data warehousing, EAI, EII, ETL, ELT, ETLT, eBay, Teradata, Theory and architecture | 5 Comments |
Teradata hardware strategy and tactics
In my opinion, the most important takeaways about Teradata’s hardware strategy from the Teradata Partners conference last week are:
- Teradata’s future lies in solid-state memory. That’s in line with what Carson Schmidt told me six months ago.
- To Teradata’s surprise, the solid-state future is imminent. Teradata is 6-9 months further along with solid-state drives (SSD) than it thought a year ago it would be at this point.
- Short-term, Teradata is going to increase the number of appliance kinds it sells. I didn’t actually get details on anything but the new SSD-based Blurr, but it seems there will be others as well.
- Teradata’s eventual future is to mix and match parts (especially different kinds of storage) in a more modular product line. Teradata Virtual Storage is of pretty limited value otherwise. I probably believe Teradata will go modular more emphatically than Teradata itself does, because I think doing so will meet users needs more effectively than if Teradata relies strictly on fixed appliance configurations.
In addition, some non-SSD componentry tidbits from Carson Schmidt include:
- Teradata really likes Intel’s Nehalem CPUs, with special reference to multi-threading, QuickPath interconnect, and integrated memory controller. Obviously, Nehalem-based Teradata boxes should be expected in the not too distant future.
- Teradata really likes Nehalem’s successor Westmere too, and expects to be pretty fast to market with it (faster than with Nehalem) because Nehalem and Westmere are plug-compatible in motherboards.
- Teradata will go to 10-gigabit Ethernet for external connectivity on all its equipment, which should improve load performance.
- Teradata will also go to 10-gigabit Ethernet to play the Bynet role on appliances. Tests are indicating this improves query performance.
- What’s more, Teradata believes there will be no practical scale-out limitations with 10-gigabit Ethernet.
- Teradata hasn’t decided yet what to do about 2.5” SFF (Small Form Factor) disk drives, but is leaning favorably. Benefits would include lower power consumption and smaller cabinets.
- Also on Carson’s list of “exciting” future technologies is SAS 2.0, which at 6 gigabits/second doubles the I/O bandwidth of SAS 1.0.
- Carson is even excited about removing universal power supplies from the cabinets, increasing space for other components.
- Teradata picked Intel’s Host Bus Adapters for 10-gigabit Ethernet. The switch supplier hasn’t been determined yet.
Let’s get back now to SSDs, because over the next few years they’re the potential game-changer. Read more
Categories: Data warehouse appliances, Data warehousing, Solid-state memory, Storage, Teradata | 13 Comments |
Reports of perfectly-balanced hardware configurations are greatly exaggerated
Data warehouse appliance and software appliance vendors like to claim that they’ve worked out just the right hardware configuration(s), and that a single configuration is correct for a fairly broad range of workloads. But there are a lot of reasons to be dubious about that. Specific vendor evidence includes:
- Teradata ascribes considerable importance to a Virtual Storage technology whose main purpose is to allow mixing of heterogeneous storage devices in a single system. And the discussion rarely suggests that these parts will be in a rigid fixed relationship.
- Netezza — as Teradata keeps reminding me — often sells boxes with the expectation that they won’t be filled with data, so as to increase spindle count and hence performance.
- Oracle/Sun have dropped some comments about Exadata being more flexibly configured going forward.
- Kickfire’s new “high-end” appliance lets you attach fairly arbitrary amounts of external storage.
- And of course, software-only analytic DBMS vendors run their software in all sorts of hardware and storage environments.
What’s more, the claim never made a lot of sense anyway. With the rarest of exceptions, even a single data warehouse’s workload will contain different queries that strain different parts of the system in different ratios. Calculating the “ideal” hardware configuration for that single workload would be forbiddingly difficult. And even if one could calculate it, it almost surely would be different than another user’s “ideal” configuration. How a single hardware configuration can be “ideally balanced” for a broad class of use cases boggles the imagination.
Categories: Data warehouse appliances, Data warehousing, Exadata, Kickfire, Netezza, Oracle, Teradata | 6 Comments |
This week at the Teradata Partners user conference
Teradata tells me that its press embargoes are ending at 9:00 this morning. Here are some highlights of what’s going on, although names, dates, and details will have to await conversations and press releases this week.
- Teradata is productizing “private cloud,” under names including “Teradata Enterprise Analytics Cloud,” “Teradata Agile Analytics Cloud,” and “Teradata Elastic Mart Builder.” I.e., Teradata hopes to leapfrog Greenplum in its “Enterprise Data Cloud” strategy. This is only fair, in that Greenplum lifted the idea from Teradata and eBay in the first place. It also provides major support for what I think is an extremely sensible trend. Give or take issues of who announces and ships what a couple months before or after a competitor, my early thinking is that the main differences between Greenplum and Teradata in this regard will be:
- Virtual as opposed to just physical data marts, based on robust workload management software. (Advantage: Teradata)
- Pricing, deployment options. (Advantage: Greenplum)
- Features that don’t directly relate to enterprise/private cloud. (Advantage: Either, often Teradata.)
- Teradata is generally strengthening its data movement technology, e.g. for making various appliances work in sync. I’m not too clear yet on the details of that. I think this is what Teradata’s phrase “ecosystem management” refers to.
- Teradata is (pre-)announcing – at least as a statement of direction — an appliance based on solid-state drives (SSDs). I’ve thought for a while that Teradata was a leader in thinking through the issues around solid-state memory in data warehousing, so it makes sense that they’re among the leaders in actually coming to market as well. I plan to say more after meeting with, e.g., Carson Schmidt.
- Teradata has achieved a 300%ish speed-up in geospatial processing. I gather this is largely a byproduct of the parallel analytics work Teradata did around strengthening its SAS integration. However, there don’t seem to be a lot of Teradata geospatial users yet.
- Teradata Express, Teradata’s free Windows-based crippleware, is being ported to Amazon EC2 and VMware as well. Presumably to avoid cannibalizing Teradata product sales, there are quite a few limitations on Teradata Express, including system capacity, database size, and “no production use.”
- Teradata continues to extend its optimizations to handle queries issued by business intelligence tools. Previously, the focus of what Teradata discussed in this regard was query rewrite. But soon automatic recommendation and creation of Aggregate Join Indexes – i.e.., materialized views – will be included as well.
Oracle Exadata customers presenting at Oracle Open World
Greg Rahn tweeted a list of Exadata-focused sessions at Oracle Open World next week. As Oracle employees and supporters have been foreshadowing, there will be Exadata users and user-like folks presenting. I identified what look like half a dozen (not counting any who, for example, will make surprise appearances at keynote addresses), specifically: Read more
Categories: Data warehousing, Exadata, Market share and customer counts, Oracle, Teradata | 5 Comments |
Oracle Exadata 2 capacity pricing
Summary of Oracle Exadata 2 capacity pricing
Analyzing Oracle Exadata pricing is always harder than one would first think. But I’ve finally gotten around to doing an Oracle Exadata 2 pricing spreadsheet. The main takeaways are:
- If we believe Oracle’s claims of 10X compression, Exadata 2 costs more per terabyte of user data than Netezza TwinFin — $22-26K/TB vs. TwinFin’s <$20K — but less than the Teradata 2550.
- These figures are highly sensitive to assumptions about Oracle’s hybrid columnar compression.
- Similarly, if Netezza or Teradata were to significantly upgrade their own compression, the price comparison would look quite different.
- Options such as Data Mining or Oracle Spatial add 12% or so each to Exadata’s total system price.
Longer version
When Oracle introduced Exadata last year it was, well, expensive. Exadata 2 has now been announced, and it is significantly cheaper than Exadata 1 per terabyte of user data, based on:
- Similar overall pricing
- Twice the disk capacity
- Better compression