Netezza TwinFin i-Class overview
I have long complained about difficulties in discussing Netezza’s TwinFin i-Class analytic platform. But I’m ready now, and in the grand sweep of the product’s history I’m not even all that late. The Netezza i-Class timing story goes something like this:
- Netezza i-Class was first foreshadowed in February, 2010.
- Netezza i-Class customer testing started in October, 2010 or so. Netezza i-Class evidently has been shipped to 4-5 partners and a single-digit number of end-user organizations, spread across some usual-suspect industries (financial services, telecom, and so on).
- Netezza i-Class 1.0 general availability is still in the (near) future.
My advice to Netezza as to how it should describe TwinFin i-Class boils down to:
1. The Netezza platform has been enhanced in two major ways:
- There’s a good way to run all kinds of analytic processes. This is very flexible and powerful, but tightly integrated with the SQL engine even so.
- You are supplying some specific high-performing, highly parallel, big-data analytic process building blocks. More precisely, you have greatly extended the set of such building blocks; you had some cool building blocks (notably Spatial) even before this.
2. There are three main ways to get at this:
- Extended SQL.
- Programming, in a bunch of languages and paradigms, integrated into the SQL.
- Partner code, with them doing the programming for you.
Some of the rah-rah words aside, that’s a pretty fair overview. Here’s more detail.
To refresh your memory: Netezza TwinFin i-Class functionality basics include, as best I can tell (and there’s some more detail at the links above):
- You can run processes in a usual-suspect set of languages on Netezza i-Class (even Fortran).
- One notable example is R; indeed, there’s an R client for talking to Netezza TwinFin.
- Netezza provides its own Hadoop implementation, which differs from standard Hadoop implementations most notably in that it manages data relationally via the usual Netezza DBMS, not in anything like HDFS.
- Anything written in any language other than C/C++ (or, of course, SQL), and in particular anything involving Hadoop, runs outside the Netezza DBMS process. C/C++ can run in-process, for maximum performance.*
- There’s an assortment of parallelized mathematical analytic packages built into Netezza i-Class. The matrix algebra ones are called nzMatrix. Most of the rest are part of a collection called nzAnalytics. Often these are implemented as stored procedures, as they may make multiple passes through the data.
- Netezza has thoughtfully ported thousands of analytic procedures (in essence, the basic R/CRAN and GNU libraries) to the Netezza platform for you. These are not promised to be parallel on their own, but you're welcome to invoke an instance on each node and parallelize that way.
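To make "invoke an instance on each node" concrete, here is a minimal Python sketch. It assumes the data has already been split into slices (one per node) and uses the standard library's `statistics.median` as a stand-in for a ported, non-parallel library routine. The `run_per_slice` helper and the sample data are hypothetical, not Netezza's API, and threads merely stand in for separate nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import median
from typing import Callable, Dict, List, TypeVar

T = TypeVar("T")

def run_per_slice(func: Callable[[List[float]], T],
                  slices: Dict[str, List[float]]) -> Dict[str, T]:
    """Invoke a serial function independently on each data slice.

    Threads stand in for per-node instances; on an MPP system each
    slice would be processed on the node that holds it.
    """
    with ThreadPoolExecutor(max_workers=len(slices)) as pool:
        futures = {name: pool.submit(func, rows) for name, rows in slices.items()}
        return {name: fut.result() for name, fut in futures.items()}

# Hypothetical data: one slice of values per node.
slices = {
    "node1": [1.0, 2.0, 3.0],
    "node2": [10.0, 20.0],
    "node3": [5.0, 6.0, 7.0, 8.0],
}

per_slice = run_per_slice(median, slices)
print(per_slice)  # {'node1': 2.0, 'node2': 15.0, 'node3': 6.5}

# Per-slice results do not automatically combine into the global answer.
all_rows = [x for rows in slices.values() for x in rows]
print(median(all_rows))                # 6.0
print(median(list(per_slice.values())))  # 6.5
```

The last two lines illustrate the caveat behind "not promised to be parallel on their own": running a serial routine per slice is easy, but whether the per-slice outputs can be merged into a correct global result depends on the algorithm.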
I forgot to check, but I’m guessing any extension of workload management to cover non-DBMS processes won’t be in the first release of Netezza i-Class.
*However, Netezza says that if you can batch requests to return even just 500-1000 records at a time, the out-of-process performance penalty, which comes from waiting for data to be transferred between processes, becomes insignificant.
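A back-of-the-envelope model shows why batching helps. It assumes, hypothetically, that each round trip between the DBMS and an out-of-process program carries a fixed overhead, while per-record processing cost is the same either way. The cost figures below are made up for illustration and are not Netezza measurements.

```python
import math

def overhead_fraction(n_records: int, batch_size: int,
                      per_call_overhead_us: float,
                      per_record_cost_us: float) -> float:
    """Fraction of total time spent on inter-process round trips.

    Simple model: a fixed cost per round trip plus a constant cost per record.
    """
    calls = math.ceil(n_records / batch_size)
    overhead = calls * per_call_overhead_us
    work = n_records * per_record_cost_us
    return overhead / (overhead + work)

# Hypothetical costs: 20 microseconds per round trip, 2 microseconds per record.
for batch in (1, 10, 100, 1000):
    frac = overhead_fraction(1_000_000, batch, 20.0, 2.0)
    print(f"batch={batch:>4}: {frac:.1%} of time spent on transfer overhead")
```

With these made-up numbers, the overhead share falls from about 91% at one record per call to about 1% at 1,000 records per call, because the fixed per-call cost is amortized over the whole batch.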
None of that is particularly new information. But after a visit to Netezza on Tuesday, I’ve finally gotten some kind of handle on how i-Class is architected. Highlights of the Netezza i-Class architecture story, as I understand them, include:
- It all starts with UDtFs — User-Defined (table) Functions, which are subject to the usual limitations.
- To overcome the standard limitations of UDtFs, Netezza built:
- A set of UDtFs that, taken together, execute command-line programs.
- For each language (Java, Python, R, etc., and I think also C/C++), a library that talks to the command-line executor. This library can talk to multiple instances of the executor, so it's not limited to a single data stream. It can also persist past the life of a single query.
- In the same vein, Netezza built a C/C++ library that talks to the command-line executor and also speaks MPI (Message Passing Interface).
- This MPI library has not yet been exposed outside Netezza.
- Rather, nzMatrix uses it, so that nzMatrix can invert (for example) really, really big matrices.
- There are two* main ways to invoke all this.
- SQL. Any analytic process can be invoked via a SQL UDtF. Netezza tends to use the term UDAP (User-Defined Analytic Process) interchangeably for the process itself and for the SQL UDtF that encapsulates it.
- Netezza’s (interfaces to an) R client. More on that below.
- Netezza’s version of Hadoop is an important special case. The mappers and reducers you write in Hadoop are UDAPs.
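The command-line-executor design is easier to picture with a toy version. The following Python sketch is an analogy, not Netezza's implementation or API: a hypothetical analytic program reads one value per line on stdin and writes result rows on stdout, and a driver (standing in for the UDtF side) launches one instance of that program per data slice and streams the slices to those instances concurrently.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from typing import List

# Hypothetical analytic process: reads one number per line on stdin and
# writes "<count>\t<sum>" on stdout. Any command-line program could play this role.
CHILD_SCRIPT = r"""
import sys
count, total = 0, 0.0
for line in sys.stdin:
    line = line.strip()
    if line:
        count += 1
        total += float(line)
print(f"{count}\t{total}")
"""

def run_instance(rows: List[float]) -> List[str]:
    """Stream one data slice to a fresh executor instance and collect its output rows."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT],
        input="\n".join(str(r) for r in rows),
        capture_output=True,
        text=True,
        check=True,
    )
    return proc.stdout.splitlines()

def run_on_all_slices(slices: List[List[float]]) -> List[List[str]]:
    """Drive one executor instance per slice concurrently, one data stream each."""
    with ThreadPoolExecutor(max_workers=len(slices)) as pool:
        return list(pool.map(run_instance, slices))

results = run_on_all_slices([[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]])
print(results)  # [['2\t3.0'], ['1\t3.0'], ['3\t15.0']]
```

Because each instance is an ordinary operating-system process, the pattern is language-agnostic; the price is exactly the cross-process data transfer that batching is meant to amortize.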
I didn’t delve far enough into Netezza’s UDAP syntax to understand how it compares to, say, Aster’s SQL/MR.
*From a marketing standpoint, Netezza might prefer to count partner code separately as a third way, but I’m focusing on the technology here, which is used by partners and end-user organizations alike.
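Since the mappers and reducers in Netezza's Hadoop are UDAPs, it may help to see why map and reduce fit the table-function mold at all. The sketch below is plain Python, not Netezza's UDAP syntax or Hadoop's API: a mapper is a function from one row to zero or more rows, a reducer is a function from one key's partition of rows to result rows, and a sort plus `itertools.groupby` stands in for the shuffle. The word-count example and all names are illustrative.

```python
from itertools import groupby
from operator import itemgetter
from typing import Callable, Iterable, Iterator, List, Tuple

Row = Tuple[str, int]

def mapper(line: str) -> Iterator[Row]:
    """Map step: one input row in, zero or more (key, value) rows out."""
    for word in line.lower().split():
        yield (word, 1)

def reducer(key: str, values: Iterable[int]) -> Iterator[Row]:
    """Reduce step: one partition (all rows sharing a key) in, result rows out."""
    yield (key, sum(values))

def map_reduce(lines: Iterable[str],
               map_fn: Callable[[str], Iterator[Row]],
               reduce_fn: Callable[[str, Iterable[int]], Iterator[Row]]) -> List[Row]:
    mapped = [row for line in lines for row in map_fn(line)]
    mapped.sort(key=itemgetter(0))  # stand-in for the shuffle: bring equal keys together
    out: List[Row] = []
    for key, group in groupby(mapped, key=itemgetter(0)):
        out.extend(reduce_fn(key, (value for _, value in group)))
    return out

print(map_reduce(["the quick fox", "The lazy dog", "quick quick"], mapper, reducer))
# [('dog', 1), ('fox', 1), ('lazy', 1), ('quick', 3), ('the', 2)]
```

Seen this way, a mapper is a row-at-a-time table function and a reducer is a partition-at-a-time table function, which is roughly the shape that SQL-integrated MapReduce designs expose.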
Other Netezza/Hadoop notes include:
- Netezza has the usual kind of Cloudera partnership.
- Since Netezza's owner, IBM, has its own Hadoop implementation, it seems obvious there will be some partnership action with that too. But at this point it's not very far along.
The Netezza TwinFin i-Class R story goes something like this:
- Assume you’re using R on a client. (I’m not sure whether Netezza has an R client to give or recommend to you.)
- There are three Netezza packages that change how R works, by letting it use stuff on the Netezza box.
- nzR translates between logical R memory structures and Netezza tables. In particular, nzR allows R to run, not just in-memory, but against the data on the Netezza box.
- nzMatrix lets you do R matrix algebra against the data on the Netezza box.
- nzAnalytics lets you invoke various algorithms that run on the Netezza box, against Netezza data.
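The idea behind nzR, letting R code operate on data that stays on the Netezza box, is essentially operation push-down. Here is a minimal Python analogy, not nzR's actual interface: a hypothetical `RemoteColumn` proxy turns aggregate calls into SQL that runs where the data lives, with an in-memory SQLite database standing in for the Netezza appliance. Only the scalar result travels back, whereas `fetch()` shows the pull-everything-into-memory alternative.

```python
import sqlite3

class RemoteColumn:
    """Hypothetical proxy for a column that lives in a database table.

    Aggregates are translated into SQL and evaluated where the data lives,
    so only the scalar result crosses the wire.
    """

    def __init__(self, conn: sqlite3.Connection, table: str, column: str):
        self.conn, self.table, self.column = conn, table, column

    def _aggregate(self, fn: str) -> float:
        # Identifiers are interpolated directly; acceptable only for trusted names.
        sql = f"SELECT {fn}({self.column}) FROM {self.table}"
        return self.conn.execute(sql).fetchone()[0]

    def mean(self) -> float:
        return self._aggregate("AVG")

    def max(self) -> float:
        return self._aggregate("MAX")

    def fetch(self) -> list:
        """Pull the whole column into local memory (the non-push-down path)."""
        sql = f"SELECT {self.column} FROM {self.table}"
        return [row[0] for row in self.conn.execute(sql)]

conn = sqlite3.connect(":memory:")  # in-memory SQLite stands in for the appliance
conn.execute("CREATE TABLE sales (amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?)", [(10.0,), (20.0,), (60.0,)])

amount = RemoteColumn(conn, "sales", "amount")
print(amount.mean())        # computed in the database: 30.0
print(max(amount.fetch()))  # computed locally after fetching every row: 60.0
```

Both paths give the same answers here; the difference that matters at scale is how much data has to leave the database to get them.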
A recently announced Netezza partnership with Revolution Analytics is meant to lead to Revolution replacing Netezza's ports of R libraries with its own preferred distribution, and then supporting it.
Finally, there’s Netezza Spatial.
- Netezza claims multiple orders of magnitude of performance advantage for Netezza Spatial vs. geospatial alternatives, which is always a nice thing to be able to say.
- Generally, Netezza Spatial is now regarded as being part of i-Class.
- However, the product timing and adoption comments above don’t apply to Netezza Spatial.
- Netezza Spatial has a couple of dedicated salespeople, and seems to be well-liked by retailers.
- Netezza surely wishes everybody would forget about some of the rewrites and controversy associated with Netezza Spatial.
Perhaps there are yet more pieces of the Netezza TwinFin i-Class story I’m overlooking, but I hope I now have most of the major aspects at least partway right.
Comments
5 Responses to “Netezza TwinFin i-Class overview”
Very good insight. It seems IBM didn't miss the target with the Netezza acquisition; it looks like a mature and flexible product.
Curt, do you think IBM will keep improving Netezza, or did it just buy the technology to compete against Teradata/Exadata?
Regards,
Luiz
Luiz,
Of course IBM is planning to enhance Netezza’s technology. Otherwise it would stop being competitive very soon.
Curt,
Thanks for taking the time to review Netezza's analytics capabilities. I look forward to keeping you updated as we progress. This year's Enzee Universe user conference will highlight Netezza Analytics. We'll have many customers present their use cases and results. These presentations should give you even greater context.
-Matt
Senior Director Analytics Product Marketing
Netezza, an IBM Company
Curt –
Thank you for your thorough review of Netezza TwinFin i-Class.
I thought it might be beneficial for your readers to make it very clear that we run R on the s-Blades inside TwinFin in addition to supporting the R client to connect to R on the TwinFin (via the packages and capabilities you described very well).
See you in Boston at Enzee Universe in June!
Michele Chambers
GM & VP Analytic Solutions
Netezza, an IBM Company
[…] Netezza TwinFin i-Class overview | DBMS 2 : DataBase Management System Services (tags: netezza analytics twinfin) […]