Netezza
Analysis of Netezza and its data warehouse appliances.
In-memory, parallel, not-in-database: SAS HPA does make sense after all
I talked with SAS about its new approach to parallel modeling. The two key points are:
- SAS no longer plans to go as far with in-database modeling as it previously intended.
- Rather, SAS plans to run in RAM on MPP DBMS appliances, exploiting MPI (Message Passing Interface).
The whole thing is called SAS HPA (High-Performance Analytics), in an obvious reference to HPC (High-Performance Computing). It will run initially on RAM-heavy appliances from Teradata and EMC Greenplum.
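SAS hasn't spelled out HPA's internals, but the appeal of MPI-in-RAM is easy to illustrate. The sketch below is my own toy Python/mpi4py example, not anything SAS has published: each rank keeps its slice of the data in memory, and only small aggregates ever cross the interconnect.

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Each rank holds its own RAM-resident partition of the data
local = np.random.default_rng(rank).normal(loc=rank, size=1_000_000)

# Only scalars travel over MPI; the data itself never moves
global_sum = comm.allreduce(local.sum(), op=MPI.SUM)
global_count = comm.allreduce(local.size, op=MPI.SUM)

if rank == 0:
    print(f"global mean across {comm.Get_size()} ranks: {global_sum / global_count:.4f}")
```

Run it with something like mpiexec -n 8 python hpa_sketch.py. A real modeling procedure would iterate, exchanging gradients or sufficient statistics on each pass, but the principle is the same: the data stays put and only tiny messages move.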
A lot of what’s going on here is that SAS found it annoyingly difficult to parallelize modeling within the framework of a massively parallel DBMS such as Teradata. Notes on that aspect include:
- SAS wasn’t exploiting the capabilities of individual DBMS to their fullest; rather, it was looking for an approach that would work across multiple brands of DBMS. Thus, for example, the fact that Aster’s analytic platform architecture is more flexible or powerful than Teradata’s didn’t help much with making SAS run within the Aster nCluster database.
- Notwithstanding everything else, SAS did make a certain set of modeling procedures run in-database.
- SAS’ previous plans to run in-database modeling in Aster and/or Netezza DBMS may never come to fruition.
Netezza TwinFin i-Class overview
I have long complained about difficulties in discussing Netezza’s TwinFin i-Class analytic platform. But I’m ready now, and in the grand sweep of the product’s history I’m not even all that late. The Netezza i-Class timing story goes something like this:
- Netezza i-Class was first foreshadowed in February, 2010.
- Netezza i-Class customer testing started in October, 2010 or so. Netezza i-Class evidently has been shipped to 4-5 partners and a single-digit number of end-user organizations, spread across some usual-suspect industries (financial services, telecom, and so on).
- Netezza i-Class 1.0 general availability is still in the (near) future.
My advice to Netezza as to how it should describe TwinFin i-Class boils down to: Read more
Categories: Cloudera, Data warehouse appliances, Data warehousing, GIS and geospatial, Hadoop, IBM and DB2, MapReduce, Netezza, Parallelization, Predictive modeling and advanced analytics | 5 Comments |
Updating our vendor client disclosures
Edit: This disclosure has been superseded by a March, 2012 version.
From time to time, I disclose our vendor client lists. Another iteration is below. To be clear:
- This is a list of Monash Advantage members.
- All our vendor clients are Monash Advantage members, unless …
- … we work with them primarily in their capacity as technology users. (A large fraction of our user clients happen to be SaaS vendors.)
- We do not usually disclose our user clients.
- We do not usually disclose our venture capital clients, nor those who invest in publicly-traded securities.
- Included in the list below are two expired Monash Advantage members who haven’t said they will renew, as mentioned in my recent post on analyst bias. (You can probably imagine a couple of reasons for that obfuscation.)
With that said, our vendor client disclosures at this time are:
- Aster Data
- Cloudera
- CodeFutures/dbShards
- Couchbase
- EMC/Greenplum
- Endeca
- IBM/Netezza
- Infobright
- Intel
- MarkLogic
- ParAccel
- QlikTech
- salesforce.com/database.com
- SAND Technology
- SAP/Sybase
- Schooner Information Technology
- Skytide
- Splunk
- Teradata
- Vertica
Comments on the 2011 Forrester Wave for Enterprise Data Warehouse Platforms
The Forrester Wave: Enterprise Data Warehouse Platforms, Q1 2011 is now out,* hot on the heels of the Gartner Magic Quadrant. Unfortunately, this particular Forrester Wave is riddled with inaccuracy. Read more
Categories: Analytic technologies, Columnar database management, Data warehousing, EMC, Exadata, Greenplum, Netezza, Oracle, Pricing, SAP AG, Sybase, Teradata, Vertica Systems | 8 Comments |
Comments on the Gartner 2010/2011 Data Warehouse Database Management Systems Magic Quadrant
Edit: Comments on the February, 2012 Gartner Magic Quadrant for Data Warehouse Database Management Systems — and on the companies reviewed in it — are now up.
The Gartner 2010 Data Warehouse Database Management Systems Magic Quadrant is out. I shall now comment, just as I did to varying degrees on the 2009, 2008, 2007, and 2006 Gartner Data Warehouse Database Management System Magic Quadrants.
Note: Links to Gartner Magic Quadrants tend to be unstable. Please alert me if any problems arise; I’ll edit accordingly.
In my comments on the 2008 Gartner Data Warehouse Database Management Systems Magic Quadrant, I observed that Gartner’s “completeness of vision” scores were generally pretty reasonable, but their “ability to execute” rankings were somewhat bizarre; the same remains true this year. For example, Gartner ranks Ingres higher by that metric than Vertica, Aster Data, ParAccel, or Infobright. Yet each of those companies is growing nicely and delivering products that meet serious cutting-edge analytic DBMS needs, neither of which has been true of Ingres since about 1987. Read more
Choices in analytic computing system design
When I posted a long list of architectural options for analytic DBMS, I left in a couple of IOUs for missing parts. One was in the area of what is sometimes called advanced-analytics functionality, which, roughly speaking, means the aspects of analytic database management systems that are not directly related to conventional* SQL queries.
*Main examples of “conventional” = filtering, simple aggregations.
The point of such functionality is generally twofold. First, it helps you execute analytic algorithms with high performance, due to reducing data movement and/or executing the analytics in parallel. Second, it helps you create and execute sophisticated analytic processes with (relatively) little effort.
For now, I’m going to refer to an analytic RDBMS that has been extended by advanced-analytics functionality as an analytic computing system, rather than as some kind of “platform,” although I suspect the latter term is more likely to wind up winning. So far, there have been five major categories of subsystem or add-on module that contribute to making an analytic DBMS a more fully-fledged analytic computing system:
- SQL extensions. Examples include SQL-2003 analytics (notably windowing) and vendor-specific temporal functionality.
- A framework for UDFs (User-Defined Functions) to further extend SQL. At its core, a relational DBMS is a big SQL interpreter. SQL, while powerful, only does a limited number of things; User-Defined Functions add new functions or predicates to the language so that it can do additional things. (Windowing and UDFs are both illustrated in the first sketch after this list.)
- An execution engine for analytic processes that is less tightly coupled to the SQL engine than a pure UDF framework might be. The two main approaches are MapReduce (e.g., Aster Data) and general C++ libraries (Netezza, ParAccel). (The second sketch after this list shows the basic map-then-merge shape.)
- Libraries of pre-built analytic processes. Commonly included are statistics (and other machine learning), general linear algebra, and Monte Carlo analysis. Some of these functions are fully parallelized (perhaps tens per vendor). Others just play nicely with the vendor’s execution framework, in that a separate copy can be run on each node (up to thousands per vendor, for those who bring in open source statistics libraries).
- Development tools such as integrated development environments (IDEs). Aster keeps trying to convince me that having built a nice Eclipse IDE is a major competitive differentiator.
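To make the first two bullets concrete, here is a deliberately toy, single-node sketch in Python against SQLite. It looks nothing like an MPP appliance’s framework, but it shows the distinction: the windowed aggregate is still plain SQL, while the UDF grafts new vocabulary onto the SQL interpreter.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 10.0), ("east", 30.0), ("west", 5.0), ("west", 25.0)])

# SQL-2003-style windowing: a running total per region
# (requires a SQLite build with window-function support, 3.25 or later)
for row in conn.execute("""
    SELECT region, amount,
           SUM(amount) OVER (PARTITION BY region ORDER BY amount) AS running_total
    FROM sales
"""):
    print(row)

# A user-defined function: log_amount() is not part of SQL until we register it
conn.create_function("log_amount", 1, math.log)
for row in conn.execute("SELECT region, log_amount(amount) FROM sales"):
    print(row)
```

The appliance vendors’ versions differ mainly in that the UDF (or table function) is pushed down and executed in parallel on every node, next to the data.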
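As for the execution-engine bullet, Aster’s SQL-MapReduce and the Netezza/ParAccel C++ libraries are proprietary frameworks, so the toy below just shows the map-then-merge shape of such computations, with Python’s multiprocessing standing in for a cluster.

```python
from collections import Counter
from functools import reduce
from multiprocessing import Pool

# "Map": each worker turns its chunk of rows into partial per-region counts
def map_chunk(rows):
    return Counter(region for region, _amount in rows)

# "Reduce": merge the partial counts into one result
def merge(a, b):
    a.update(b)
    return a

if __name__ == "__main__":
    data = [("east", 10.0), ("east", 30.0), ("west", 5.0), ("west", 25.0)] * 1000
    chunks = [data[i::4] for i in range(4)]   # pretend each chunk lives on its own node
    with Pool(4) as pool:
        partials = pool.map(map_chunk, chunks)
    print(reduce(merge, partials))
```

In a real analytic DBMS the chunks are the table partitions already sitting on each node, which is exactly why pushing the map step down to the data is attractive.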
Categories: Aster Data, MapReduce, Netezza, ParAccel, Parallelization, Predictive modeling and advanced analytics, Workload management | 8 Comments |
Notes and links October 22, 2010
A number of recent posts have had good comments. This time, I won’t call them out individually.
Evidently Mike Olson of Cloudera is still telling the machine-generated data story, exactly as he should be. The Information Arbitrage/IA Ventures folks said something similar, focusing specifically on “sensor data” …
… and, even better, went on to say: Read more
Notes on data warehouse appliance prices
I’m not terribly motivated to do a detailed analysis of data warehouse appliance list prices, in part because:
- Everybody knows that in practice data warehouse appliances tend to be deeply discounted from list price.
- The only realistic metric to use for pricing data warehouse appliances is price-per-terabyte, and people have gotten pretty sick of that one.
That said, here are some notes on data warehouse appliance prices. Read more
Categories: Data warehouse appliances, Data warehousing, Database compression, EMC, Exadata, Greenplum, Netezza, Oracle, Pricing | 8 Comments |
It can be hard to analyze analytics
When vendors talk about the integration of advanced analytics into database technology, confusion tends to ensue. For example: Read more
Categories: Aster Data, Greenplum, Netezza, Predictive modeling and advanced analytics, SAS Institute | 7 Comments |
Notes and links October 3 2010
Some notes, follow-up, and links before I head out to California: Read more
Categories: GIS and geospatial, Google, HP and Neoview, Humor, Kickfire, Netezza, Solid-state memory, Teradata, Web analytics | 3 Comments |