Netezza
Analysis of Netezza and its data warehouse appliances.
MapReduce user eHarmony chose Netezza over Aster or Greenplum
Depending on which IDG reporter you believe, eHarmony has either 4 TB of data or more than 12 TB, stored in Oracle but now analyzed on Netezza. Interestingly, eHarmony is a Hadoop/MapReduce shop, but chose Netezza over Aster Data or Greenplum even so. Price was apparently an important aspect of the purchase decision. Netezza also seems to have had a very smooth POC. Read more
Categories: Application areas, Aster Data, Benchmarks and POCs, Data warehousing, Greenplum, MapReduce, Netezza, Oracle, Predictive modeling and advanced analytics, Pricing | 5 Comments |
The Netezza guys propose a POC checklist
The Netezza guys at “Data Liberators” are being a bit too cute in talking about FULL DISCLOSURE yet not actually saying they’re from Netezza — but only a bit, in that their identity is pretty clear even so. That said, they’ve proposed a not-terrible checklist of how to conduct POCs. Of course, vendor-provided as it is, it’s incomplete; e.g., there’s no real mention of a baseball-bat test.
Here’s the first part of the Netezza list, with my comments interspersed. Read more
Categories: Benchmarks and POCs, Buying processes, Data warehousing, Netezza | 1 Comment |
Draft slides on how to select an analytic DBMS
I need to finalize an already-too-long slide deck on how to select an analytic DBMS by late Thursday night. Anybody see something I’m overlooking, or just plain got wrong?
Edit: The slides have now been finalized.
Netezza’s marketing goes retro again
Netezza loves retro images in its marketing, such as classic rock lyrics, or psychedelic paint jobs on its SPUs. (Given the age demographics at, say, a Teradata or Netezza user conference, this isn’t as nutty as it first sounds.) Netezza’s latest is a creative peoples-liberation/revolution riff, under the name Data Liberators. The ambience of that site and especially its first download should seem instinctively familiar to anybody who recalls the Symbionese Liberation Army when it was active, or who has ever participated in a chant of “The People, United, Will Never Be Defeated!”
The substance of the first “pamphlet”, so far as I can make out, is that you should only trust vendors who do short, onsite POCs, and Oracle may not do those for Exadata. Read more
Categories: Benchmarks and POCs, Buying processes, Data warehouse appliances, Exadata, Netezza, Oracle | 2 Comments |
Gartner’s 2008 data warehouse database management system Magic Quadrant is out
February, 2011 edit: I’ve now commented on Gartner’s 2010 Data Warehouse Database Management System Magic Quadrant as well.
Gartner’s annual Magic Quadrant for data warehouse DBMS is out. Thankfully, vendors don’t seem to be taking it as seriously as usual, so I didn’t immediately hear about it. (I finally noticed it in a Greenplum pay-per-click ad.) Links to Gartner MQs tend to come and go, but as of now here are two working links to the 2008 Gartner Data Warehouse Database Management System MQ. My posts on the 2007 and 2006 MQs have also been updated with working links. Read more
ParAccel actually uses relatively little PostgreSQL code
I often find it hard to write about ParAccel’s technology, for a variety of reasons:
- With occasional exceptions, ParAccel is reluctant to share detailed information.
- With occasional exceptions, ParAccel is reluctant to say anything for attribution.
- In ParAccel’s version of an “agile” development approach, product details keep changing, as do plans and schedules. (The gibe that ParAccel’s product plans are whatever their current sales prospect wants them to be — while of course highly exaggerated — isn’t wholly unfounded.)
- ParAccel has sold very few copies of its products, so it’s hard to get information from third parties.
ParAccel is quick, however, to send email if I post anything about them they think is incorrect.
All that said, I did get careless when I neglected to double-check something I already knew. Read more
Categories: Data warehousing, Netezza, ParAccel, PostgreSQL | 3 Comments |
High-performance analytics
For the past few months, I’ve collected a lot of data points to the effect that high-performance analytics (i.e., beyond straightforward query) is becoming increasingly important. And I’ve written about some of them at length. For example:
- MapReduce – controversial or in some cases even disappointing though it may be – has a lot of use cases.
- It’s early days, but Netezza and Teradata (and others) are beefing up their geospatial analytic capabilities.
- Memory-centric analytics is in the spotlight.
Ack. I can’t decide whether “analytics” should be a singular or plural noun. Thoughts?
Another area that’s come up, which I haven’t blogged about so much, is data mining in the database. Data mining accounts for a large part of data warehouse use. The traditional way to do data mining is to extract data from the database and dump it into SAS. But there are problems with this scenario, including: Read more
Categories: Aster Data, Data warehousing, EAI, EII, ETL, ELT, ETLT, Greenplum, MapReduce, Netezza, Oracle, Parallelization, SAS Institute, Teradata | 6 Comments |
Big scientific databases need to be stored somehow
A year ago, Mike Stonebraker observed that conventional DBMS don’t necessarily do a great job on scientific data, and further pointed out that different kinds of science might call for different data access methods. Even so, some of the largest databases around are scientific ones, and they have to be managed somehow. For example:
- Microsoft just put out an overwrought press release. The substance seems to be that Pan-STARRS — a Jim Gray legacy also discussed in an August, 2008 Computerworld article — is adding 1.4 terabytes of image data per night, and one not so new database adds 15 terabytes per year of some kind of computer simulation output used to analyze protein folding. Both run on SQL Server, of course.
- Kognitio has an astronomical database too, at Cambridge University, adding half a terabyte of data per night.
- Oracle is used for a McGill University proteomics database called CellMapBase. A figure of 50 terabytes of “mass storage” is included, which doesn’t include tape backup and so on.
- The Large Hadron Collider, once it actually starts functioning, is projected to generate 15 petabytes of data annually, which will be initially stored on tape and then distributed to various computing centers around the world.
- Netezza is proud of its ability to serve images and the like quickly, although off the top of my head I’m not thinking of a major customer it has in that area. (But then, if you just sell software, your academic discount can approach 100%; but if like Netezza you have an actual cost of goods sold, that’s not as appealing an option.)
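To put the ingest figures above on a common footing, here is a rough Python sketch that annualizes them. It assumes the per-night sources collect data on all 365 nights of the year and that 1 petabyte is 1,000 terabytes. Neither assumption comes from the sources cited, so treat the results as order-of-magnitude only. The 50-terabyte CellMapBase figure is a total size rather than a rate, so it is left out.

```python
# Rough annualization of the data-growth figures quoted in the list above.
# ASSUMPTION (not from the post): nightly sources run 365 nights a year.
# Real telescopes lose nights to weather and maintenance, so the
# per-night entries are upper bounds.
NIGHTS_PER_YEAR = 365
# ASSUMPTION (not from the post): decimal units, 1 PB = 1,000 TB.
TB_PER_PB = 1000

rates_tb_per_year = {
    "Pan-STARRS images": 1.4 * NIGHTS_PER_YEAR,
    "Protein-folding simulation output": 15.0,
    "Kognitio astronomy database at Cambridge": 0.5 * NIGHTS_PER_YEAR,
    "Large Hadron Collider (projected)": 15 * TB_PER_PB,
}

# Print the sources from slowest-growing to fastest-growing.
for name, tb in sorted(rates_tb_per_year.items(), key=lambda kv: kv[1]):
    print(f"{name}: ~{tb:,.0f} TB/year")
```

Even under these generous assumptions, the projected Large Hadron Collider volume is more than an order of magnitude larger than any of the other rates listed.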
Long-term, I imagine that the most suitable DBMS for these purposes will be MPP systems with strong datatype extensibility — e.g., DB2, PostgreSQL-based Greenplum, PostgreSQL-based Aster nCluster, or maybe Oracle.
Categories: Aster Data, Data types, Greenplum, IBM and DB2, Kognitio, Microsoft and SQL*Server, Netezza, Oracle, Parallelization, PostgreSQL, Scientific research | 1 Comment |
How to tell Teradata’s product lines apart
Once Netezza hit the market, Teradata had a classic “disruptive” price problem – it offered a high-end product, at a high price, sporting lots of features that not all customers needed or were willing to pay for. Teradata has at times slashed prices in competitive situations, but there are obvious risks to that, especially when a customer already has a number of other Teradata systems for which it paid closer to full price.
This year, Teradata has introduced a range of products that flesh out its competitive lineup. There now are three mainstream Teradata offerings, plus two with more specialized applicability. Teradata no longer has to sell Cadillacs to customers on Corolla budgets.
But how do we tell the five Teradata product lines apart? The names are confusing, both in their hardware-vendor product numbers and their data-warehousing-dogma product names, especially since in real life Teradata products’ capabilities overlap. Indeed, Teradata executives freely admit that the Teradata Data Mart Appliance 551 can run smaller data warehouses, while the Teradata Data Warehouse Appliance 2550 is positioned in large part at what Teradata quite reasonably calls data marts.
When one looks past the difficulties of naming, Teradata’s product lineup begins to make more sense. Let’s start by considering the three main Teradata products. Read more
Categories: Data warehouse appliances, Data warehousing, Netezza, Pricing, Teradata | 14 Comments |
Eric Lai on Oracle Exadata, and some addenda
Eric Lai offers a detailed FAQ on Oracle Exadata, including a good selection of links and quotes. I’d like to offer a few comments in response: Read more
Categories: Data warehouse appliances, Data warehousing, Exadata, Greenplum, Netezza, Oracle, Pricing | 4 Comments |