Data warehouse appliances
Analysis of data warehouse appliances – i.e., of hardware/software bundles optimized for fast query and analysis of large volumes of (usually) relational data. Related subjects include:
- Data warehousing
- Parallelization
- Netezza
- DATAllegro
- Teradata
- Kickfire
- (in The Monash Report) Computing appliances in multiple domains
Notes on analytic hardware
I took the opportunity of Teradata’s Aster/Hadoop appliance announcement to catch up with Teradata hardware chief Carson Schmidt. I love talking with Carson, about both general design philosophy and his views on specific hardware component technologies.
From a hardware-requirements standpoint, Carson seems to view Aster and Hadoop as more similar to each other than either is to, say, a Teradata Active Data Warehouse. In particular, for Aster and Hadoop:
- I/O is more sequential.
- The CPU:I/O ratio is higher.
- Uptime is a little less crucial.
The most obvious implication is differences in the choice of parts, and of their ratio. Also, in the new Aster/Hadoop appliance, Carson is content to skate by with RAID 5 rather than RAID 1.
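For a rough sense of what the RAID choice means for capacity, here is a minimal sketch. It assumes two-way mirroring for RAID 1 and one drive's worth of parity per RAID 5 group; the group sizes are hypothetical and are not Teradata's actual configurations. The capacity gain comes with a well-known trade-off: a RAID 5 group survives only one drive failure and pays a parity penalty on writes, which is easier to accept when uptime is a little less crucial.

```python
def usable_fraction(raid_level: int, drives: int) -> float:
    """Fraction of raw capacity available for data in one RAID group."""
    if raid_level == 1:
        # Mirroring: every block is stored twice (two-way mirrors assumed).
        return 0.5
    if raid_level == 5:
        # Striping with one drive's worth of parity per group.
        if drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return (drives - 1) / drives
    raise ValueError("only RAID 1 and RAID 5 are modeled here")


# Hypothetical group sizes, for illustration only.
for n in (4, 8):
    print(f"{n} drives: RAID 1 = {usable_fraction(1, n):.0%}, "
          f"RAID 5 = {usable_fraction(5, n):.1%}")
```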
I think Carson’s views about flash memory can be reasonably summarized as: Read more
Categories: Aster Data, Data warehouse appliances, Data warehousing, Hadoop, Solid-state memory, Storage, Teradata | 2 Comments |
IBM Pure jargon
As best I can tell, IBM now has three related families of hardware/software bundles, aka appliances, aka PureSystems, aka something that sounds like “expert system” but in fact has nothing to do with the traditional rules-engine meaning of that term. In particular,
- One of the three families is for the data tier, under the name PureData. That’s what’s new today.
- One of the three families is for the application tier, under the name PureApplication. More information can be found here.
- One of the three families is for “infrastructure”, under the name PureFlex. More information can be found here.
Within the PureData line, there are three sub-families:
- One is based on DB2 pureScale and is said to be “optimized exclusively for transactional data workloads”.
- One is based on Netezza, and is said to be “optimized exclusively for analytic workloads”.
- One is based on DB2 with the shared-nothing option, and is said to be “optimized exclusively for operational analytic data workloads”, notwithstanding that the underlying software has for years been IBM’s flagship general-purpose (non-mainframe) DBMS.
The Netezza part of the story seems to start:
- The Netezza name is being deprecated, except insofar as certain PureData systems are “Powered by Netezza Technology.”
- Netezza didn’t trumpet slipstream hardware enhancements even when it was independent, and IBM sure isn’t reversing that policy now.
- The Netezza software has been enhanced, most notably in a ~20X improvement in concurrency for “tactical” queries.
Perhaps someday I’ll be able to supply interesting details, for example about the concurrency improvement or about the uses (if any) customers are finding for Netezza’s in-database analytics — but as previously noted, analyzing big companies is hard.
Categories: Data warehouse appliances, IBM and DB2, Netezza, OLTP | 4 Comments |
Notes on the Oracle OpenWorld Sunday keynote
I’m not at Oracle OpenWorld, but as usual that won’t keep me from commenting. My bottom line on the first night’s announcements is:
- At many large enterprises, Oracle has a lock on much of their IT efforts. (But not necessarily in the internet or investigative analytics areas.) Tonight’s announcements serve to strengthen that.
- Tonight’s announcements do little to help Oracle in other market segments.
In particular:
1. At the highest level, my view of Oracle’s strategy is the same as it’s been for several years:
Clayton Christensen’s The Innovator’s Solution teaches us that Oracle should focus on selling a thick stack of technology to its highest-end customers, and that’s exactly what Oracle does focus on.
2. Tonight’s news is closely in line with what Oracle’s Juan Loaiza told me three years ago, especially:
- Oracle thinks flash memory is the most important hardware technology of the decade, one that could lead to Oracle being “bumped off” if they don’t get it right.
- Juan believes the “bulk” of Oracle’s business will move over to Exadata-like technology over the next 5-10 years. Numbers-wise, this seems to be based more on Exadata being a platform for consolidating an enterprise’s many Oracle databases than it is on Exadata running a few Especially Big Honking Database management tasks.
3. Oracle is confusing people with its comments on multi-tenancy. I suspect:
- What Oracle is talking about when it says “multi-tenancy” is more like consolidation than true multi-tenancy.
- Probably there are a couple of true multi-tenancy features as well.
4. SaaS (Software as a Service) vendors don’t want to use Oracle, because they don’t want to pay for it.* This limits the potential impact of Oracle’s true multi-tenancy features. Even so: Read more
Notes on Hadoop adoption
I successfully resisted telephone consulting while on vacation, but I did do some by email. One was on the oft-recurring subject of Hadoop adoption. I think it’s OK to adapt some of that into a post.
Notes on past and current Hadoop adoption include:
- Enterprise Hadoop adoption is for experimental uses or departmental production (as opposed to serious enterprise-level production). Indeed, it’s rather tough to disambiguate those two. If an enterprise uses Hadoop to search for new insights and gets a few, is that an experiment that went well, or is it production?
- One of the core internet-business use cases for Hadoop is a many-step ETL, ELT, and data refinement pipeline, with Hadoop executing some or many of the steps. But I don’t think that’s in production at many enterprises yet, except in the usual forward-leaning sectors of financial services and (we’re all guessing) national intelligence.
- In terms of industry adoption:
  - Financial services on the investment/trading side are all over Hadoop, just as they’re all over any technology. Ditto national intelligence, one thinks.
  - Consumer financial services, especially credit card, are giving Hadoop a try too, for marketing and/or anti-fraud.
  - I’m sure there’s some telecom usage, but I’m hearing of less than I thought I would. Perhaps this is because telcos have spent so long optimizing their data into short, structured records.
  - Whatever consumer financial services firms do, retailers do too, albeit with smaller budgets.
Thoughts on how Hadoop adoption will look going forward include: Read more
Categories: Cloud computing, Data warehouse appliances, Data warehousing, EAI, EII, ETL, ELT, ETLT, Hadoop, Investment research and trading, Telecommunications | 3 Comments |
Analytic platform — analytic glossary draft entry
This is a draft entry for the DBMS2 analytic glossary. Please comment with any ideas you have for its improvement!
Note: Words and phrases in italics will be linked to other entries when the glossary is complete.
In our usage, an “analytic platform” is an analytic DBMS with well-integrated in-database analytics, or a data warehouse appliance that includes one. The term is also sometimes used to refer to:
- Any analytic DBMS or data warehouse appliance.
- Other kinds of software, or software/hardware combination, that support broad analytic capabilities.
To varying extents, most major vendors of analytic DBMS or data warehouse appliances have extended their products into analytic platforms; see, for example, our original coverage of analytic platform versions of Aster, Netezza, or Vertica.
Related posts
- Our original definition of “analytic platform” (February, 2011)
- Our original feature list for analytic platforms (January, 2011)
Categories: Analytic glossary, Aster Data, Data warehouse appliances, Data warehousing, Netezza, Vertica Systems | 3 Comments |
Data warehouse appliance — analytic glossary draft entry
This is a draft entry for the DBMS2 analytic glossary. Please comment with any ideas you have for its improvement!
Note: Words and phrases in italics will be linked to other entries when the glossary is complete.
A data warehouse appliance is a combination of hardware and software that includes an analytic DBMS (DataBase Management System). However, some observers incorrectly apply the term “data warehouse appliance” to any analytic DBMS.
The paradigmatic vendors of data warehouse appliances are:
- Teradata, which embraced the term “data warehouse appliance” in 2008.
- Netezza — now an IBM company — which popularized the term “data warehouse appliance” in the 2000s.
Further, vendors of analytic DBMS commonly offer — directly or through partnerships — optional data warehouse appliance configurations; examples include:
- Greenplum, now part of EMC.
- Vertica, now an HP company.
- IBM DB2, under the brand “Smart Analytic System”.
- Microsoft (Parallel Data Warehouse).
Oracle Exadata is sometimes regarded as a data warehouse appliance as well, despite not being solely focused on analytic use cases.
Data warehouse appliances inherit marketing claims from the category of analytic DBMS, such as: Read more
Categories: Analytic glossary, Data warehouse appliances, Data warehousing, EMC, Exadata, Greenplum, HP and Neoview, IBM and DB2, Microsoft and SQL*Server, Netezza, Oracle, Teradata | 4 Comments |
Notes on some basic database terminology
In a call Monday with a prominent company, I was told:
- Teradata, Netezza, Greenplum and Vertica aren’t relational.
- Teradata, Netezza, Greenplum and Vertica are all data warehouse appliances.
That, to put it mildly, is not accurate. So I shall try, yet again, to set the record straight.
In an industry where people often call a DBMS just a “database” — so that a database is something that manages a database! — one may wonder why I bother. Anyhow …
1. The products commonly known as Oracle, Exadata, DB2, Sybase, SQL Server, Teradata, Sybase IQ, Netezza, Vertica, Greenplum, Aster, Infobright, SAND, ParAccel, Exasol, Kognitio et al. all either are or incorporate relational database management systems, aka RDBMS or relational DBMS.
2. In principle, there can be difficulties in judging whether or not a DBMS is “relational”. In practice, those difficulties don’t arise — yet. Every significant DBMS still falls into one of two categories:
- Relational:
  - Was designed to do relational stuff* from the get-go, even if it now does other things too.
  - Supports a lot of SQL.
- Non-relational:
  - Was designed primarily to do non-relational things.*
  - Doesn’t support all that much SQL.
*I expect the distinction to get more confusing soon, at which point I’ll adopt terms more precise than “relational things” and “relational stuff”.
3. There are two chief kinds of relational DBMS: Read more
Thoughts on the next releases of Oracle and Exadata
A reporter asked me to speculate about the next releases of Oracle and Exadata. He and I agreed:
- It seems likely that they’ll be discussed at Oracle OpenWorld in a couple of months.
- Exadata in particular is due for a hardware refresh.
- Oracle 12c is a good guess at a name, where “c” is for “Cloud”.
My answers mixed together thoughts on what Oracle should and will emphasize (which aren’t the same thing but hopefully bear some relationship to each other ;)). They were (lightly edited):
- The worst thing about Oracle is the ongoing DBA work for what should be automatic.
- Oracle RAC still makes scale-out too difficult. Presumably, Oracle is looking to build aggressively on recent steps in automating parallelism.
- For Exadata, assume that Oracle is always looking to improve how data gets allocated among disk, flash, and RAM. Look also for Exadata versions with different silicon-disk ratios than are available now.
- Tighter integration among the various appliances is surely a goal, …
- … but I don’t know whether Oracle will pick them apart and let you put various kinds of hardware in the same racks or not. I’d guess against that, because the current set-up gives them a pretext to sell you more capacity than you need.
- I wonder whether Oracle will finally introduce a true columnar storage option, a year behind Teradata. That would be the obvious enhancement on the data warehousing side, if they can pull it off. If they can’t, it’s a damning commentary on the core Oracle codebase.
- Probably Oracle will have something that it portrays as good multi-tenancy support. Some of that could be based on Label Security and so on.
- Anything that makes schema change easier could be a win on the DBA and multi-tenancy sides alike, which would be a nice two-fer.
Categories: Clustering, Columnar database management, Data warehouse appliances, Data warehousing, Exadata, Oracle, Teradata | 7 Comments |
The eternal bogosity of performance marketing
Chris Kanaracus uncovered a case of Oracle actually pulling an ad after having been found “guilty” of false advertising. The essence seems to be that Oracle claimed 20X hardware performance vs. IBM, based on a comparison done against 6-year-old hardware running an earlier version of the Oracle DBMS. My quotes in the article were:
- “Everybody’s guilty of that kind of exaggeration.”
- “Oracle tends to be even a little guiltier than others.”
- “If your new system can’t outperform somebody else’s old system by a huge factor on at least some queries, you’re doing something wrong.”
- “Use newer, better hardware; use newer, better software; have a top sales engineer do a great job of tuning it and of course you’ll see huge performance results.”
Another example of Oracle exaggeration was around the Exadata replacement of Teradata at Softbank. But the bogosity flows both ways. Netezza used to make a flat claim of 50X better performance than Oracle, while Vertica’s standard press release boilerplate long boasted:
“50x-1000x faster performance at 30% the cost of traditional solutions”
Of course, reality is a lot more complicated. Even if you assume apples-to-apples comparisons in terms of hardware and software versions, performance comparisons can vary greatly depending upon queries, databases, or use cases. For example:
- Many queries are inherently much faster over columnar storage than over row-based.
- Different data sets respond very differently to various compression algorithms.
- Some analytic RDBMS can maintain strong performance at high levels of concurrent usage. Some can’t.
- Some queries that run very fast on one DBMS without tuning might require careful tuning in another system.
- Some DBMS scale out much better than others.
- Vendors optimize for different usage assumptions, which may or may not apply in your particular case.
And so, vendor marketing claims about across-the-board performance should be viewed with the utmost suspicion.
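To make the columnar-vs-row point above concrete, here is a back-of-the-envelope sketch of how the same pair of systems can show very different speedups depending on the query. The table shape, column widths, and the `bytes_scanned` helper are all hypothetical. The sketch ignores compression, indexes, caching, and CPU cost, so it illustrates only the I/O side of the argument.

```python
from typing import Sequence


def bytes_scanned(rows: int, col_widths: Sequence[int],
                  cols_needed: Sequence[int], columnar: bool) -> int:
    """Bytes read by a full scan that needs only some of the columns."""
    if columnar:
        # A column store reads only the columns the query references.
        return rows * sum(col_widths[i] for i in cols_needed)
    # A row store reads whole rows, whatever the query references.
    return rows * sum(col_widths)


# Hypothetical table: 1 million rows, 20 columns of 8 bytes each.
rows, widths = 1_000_000, [8] * 20

for label, needed in (("2 of 20 columns", [0, 1]),
                      ("all 20 columns", list(range(20)))):
    row_io = bytes_scanned(rows, widths, needed, columnar=False)
    col_io = bytes_scanned(rows, widths, needed, columnar=True)
    print(f"{label}: row store reads {row_io / col_io:.0f}x the bytes")
```

Under these assumptions the narrow query favors the column store by 10x in bytes read, while the wide query shows no difference at all, which is why a single headline multiple says little without knowing the workload.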
Categories: Columnar database management, Data warehouse appliances, Data warehousing, Database compression, Exadata, Netezza, Oracle, Vertica Systems | Leave a Comment |
Hardware and components — lessons from Teradata
I love talking with Carson Schmidt, chief of Teradata’s hardware engineering (among other things), even if I don’t always understand the details of what he’s talking about. It had been way too long since our last chat, so I requested another one. We were joined by Keith Muller, who I presume is pictured here. Takeaways included:
- Teradata performance growth was slow in the early 2000s, but has accelerated since then; Intel gets a lot of the credit (and blame) for that.
- Carson hopes for a performance “discontinuity” with Intel Ivy Bridge.
- Teradata is not afraid to use niche special-purpose chips.
- Teradata’s views can be taken as well-informed endorsements of InfiniBand and SAS 2.0.