The Netezza Developer Network
Netezza has officially announced the Netezza Developer Network. Associated with that is a set of technical capabilities, which basically amount to programming user-defined functions or other logic straight onto the Netezza nodes (aka SPUs). Specifically, the code goes onto the FPGAs, not the PowerPC processors, and it is written in C. Technically, I think what this boils down to is:
- Extending Netezza’s SQL via user-defined functions (which probably wasn’t too hard, especially since the Netezza engine is related to PostgreSQL).
- Providing a C-to-Verilog compiler.
- Providing an application development environment and associated tools. (Presumably rather primitive, but I haven’t really checked it out.)
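As a rough illustration of the first bullet, here is what extending SQL with a user-defined function looks like in general. This sketch uses Python's built-in sqlite3 module purely as a stand-in. It is not Netezza's API, and the function name `risk_score`, its formula, and the `accounts` table are all made up for the example.

```python
import sqlite3


def risk_score(balance, late_payments):
    """Hypothetical scoring formula, invented for this example."""
    return round(0.001 * balance + 2.5 * late_payments, 2)


conn = sqlite3.connect(":memory:")

# Register the Python function so that SQL statements can call it by name.
conn.create_function("risk_score", 2, risk_score)

conn.execute(
    "CREATE TABLE accounts (id INTEGER, balance REAL, late_payments INTEGER)"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [(1, 1000.0, 0), (2, 5000.0, 3), (3, 250.0, 1)],
)

# The UDF is evaluated inside the query, just like a built-in function.
rows = conn.execute(
    "SELECT id, risk_score(balance, late_payments) FROM accounts ORDER BY id"
).fetchall()
print(rows)
```

The point is only that the function is evaluated where the rows live instead of in a separate application. On Netezza, that would presumably mean C code running on the SPUs.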
The applications mentioned in the NDN press release, and I quote directly, are:
- Multi-dimensional geospatial analytics on comprehensive data sets for risk management
- Predictive model scoring for customer segmentation, enabling real-time offer provisioning for customers
- Iterative modeling and analytics on billions of call detail records (CDRs) for telco price optimization
- Real-time Monte Carlo simulations on terabytes of detail-level data for risk management
- “Fingerprinting” with hashing algorithms for chain-of-custody document fingerprinting and to ensure that files transferred are intact
- Fuzzy text search analysis uses algorithms that provide a “best guess” of most likely results
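To make the fingerprinting item a bit more concrete, here is a minimal sketch of checking that transferred data arrived intact. The press release does not say which hashing algorithms are involved, so SHA-256 is an assumption here, and the sample payload is invented.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return a hex digest that serves as the data's fingerprint.

    SHA-256 is an assumption; the press release names no algorithm.
    """
    return hashlib.sha256(data).hexdigest()


# Invented sample payload standing in for a transferred file.
original = b"call detail record batch 42\n"
sent_fingerprint = fingerprint(original)

received_ok = original                           # arrived unchanged
received_bad = original.replace(b"42", b"43")    # altered in transit

# An intact copy reproduces the fingerprint; an altered one does not.
print(fingerprint(received_ok) == sent_fingerprint)
print(fingerprint(received_bad) == sent_fingerprint)
```

Computing the digest on the node that stores the data means only the short fingerprint, not the file itself, has to travel for the comparison.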
Netezza says that the greatest interest has come from usual-suspect sophisticated users, specifically intelligence agencies and perhaps also financial services firms. But naturally, the partners actually trotted out at Netezza’s user conference were mainly hopeful small-company ISVs. The biggest stir was made by not-so-small SAS, which evidently believes this new capability will provide massive improvements to SAS/Netezza combined performance.
In principle, there are four different ways this new programmability could be a big win:
- Code might just run faster on FPGAs — or on an MPP system in general — than on standard processors. I don’t currently have an opinion as to whether this situation is likely to arise in practice to any significant degree. (Note to self: Talk with one or both of Netezza partners SAS and SPSS on this subject soon.)
- A communication bottleneck is eliminated, whereby query result sets currently have to be sent to an application box via gigabit Ethernet (or whatever) to be processed. I’m sure that’s a biggie. Rival vendors, who run on (more) standard hardware, have this problem to a much lesser extent.
- Network traffic internal to the appliance is also reduced, as data can be massaged right on the node rather than shipped off for processing elsewhere. For some kinds of applications, such as scoring or certain kinds of data reduction, this is surely a big deal. Once again, other MPP data warehouse specialists can and should offer such capabilities too.
- Non-tabular datatypes can now be supported. E.g., small outfits are offering XML and geospatial support, and Netezza has done some internal work to show off its ability to store and load images. I’ll say more about this in another post, not necessarily tonight.
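The two traffic-related points above can be sketched with a toy simulation. It compares shipping every row to a central application with aggregating on each node first and shipping only the partial results. The node count, row count, and data are all invented, and nothing here reflects Netezza's actual internals.

```python
import random

random.seed(0)
NUM_NODES = 4          # hypothetical number of nodes
ROWS_PER_NODE = 1000   # hypothetical rows stored on each node

# Each "node" holds its own slice of (customer_id, amount) rows.
nodes = [
    [(random.randrange(10), random.uniform(1, 100)) for _ in range(ROWS_PER_NODE)]
    for _ in range(NUM_NODES)
]


def ship_everything(nodes):
    """Send every row to a central box, then aggregate there."""
    shipped = [row for node in nodes for row in node]
    totals = {}
    for cust, amount in shipped:
        totals[cust] = totals.get(cust, 0.0) + amount
    return totals, len(shipped)


def reduce_on_node(nodes):
    """Aggregate on each node, then send only the per-customer subtotals."""
    totals = {}
    shipped = 0
    for node in nodes:
        local = {}
        for cust, amount in node:
            local[cust] = local.get(cust, 0.0) + amount
        shipped += len(local)  # one partial result per customer seen locally
        for cust, subtotal in local.items():
            totals[cust] = totals.get(cust, 0.0) + subtotal
    return totals, shipped


full_totals, full_shipped = ship_everything(nodes)
local_totals, local_shipped = reduce_on_node(nodes)
print(full_shipped, local_shipped)
```

Both approaches produce the same totals up to floating-point rounding. The first moves all 4,000 rows across the network, while the second moves at most 40 partial results, since there are only ten customer IDs per node.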
Comments
9 Responses to “The Netezza Developer Network”
Curt,
Technically, this looks the same as regular User Defined Functions (UDFs), which we (and some other appliance vendors) already support.
As you indicate, there can be huge advantages to using UDFs on an MPP system, due to the reduced network traffic and sheer processing power available.
However, I’ll admit that it’s an interesting marketing spin.
Stuart
For the record, I do not believe that NZ’s engine is related to PostgreSQL; they use it on the front end, but I think the actual query processing is an entirely separate beast.
My understanding is that they started with PostgreSQL and then rewrote the back-end to embed in the FPGA.
Query processing on a SPU is split between the general purpose CPU and the FPGA, with the latter mostly responsible for restricting rows and projecting columns.
I’m not sure how much of PostgreSQL is left and I don’t believe they contribute to or benefit from the open source community. Effectively, it’s a proprietary DBMS engine that Netezza develops and supports themselves. Nothing particularly wrong with that, but it’s different to our model.
Stuart
CEO, DATAllegro
So is your model the same as Greenplum’s then?
Well, DATAllegro uses Ingres rather than PostgreSQL, claiming the latter didn’t offer enough support for partitioning. And they’re optimized for a lot less index use than Greenplum is. Not coincidentally, they have less support for exotic indices or datatypes than Greenplum seems to.
Those are a few differences that come to mind.
CAM
Tom,
Our business model is a little different to Greenplum’s. They offer Bizgres as an open source variant of PostgreSQL and then sell Bizgres MPP under a software license.
We embed a set of Ingres licenses under our own commercial MPP layer and sell the solution as an appliance on Dell/EMC/Cisco hardware (and Bull/EMC/Cisco in Continental Europe). We contribute most of our changes to Ingres to the open source version, but we don’t use the GPL version, so we can be selective.
In effect, our model is a hybrid of Netezza’s appliance and Greenplum’s use of an open source, commodity database.
Stuart
CEO, DATAllegro