Predictive modeling and advanced analytics

Discussion of technologies and vendors in the overlapping areas of predictive analytics, predictive modeling, data mining, machine learning, Monte Carlo analysis, and other “advanced” analytics.

May 15, 2010

Further clarifying in-database MPP SAS

My recent post about SAS’ MPP/in-database efforts was based on a discussion in a shared ride to the airport, and was correspondingly rough. SAS’ Shannon Heath was kind enough to write in with clarifications, and to allow me to post same. Read more

May 7, 2010

Notes and cautions about new analytic technology

As previously noted, I headlined Aster’s Big Data Summit in Washington, DC last Thursday. More than others, that talk did reuse material I’d presented before.  I promised the audience that when I got back I’d put up a blog post linking to supporting material for the talk.

Part of the time, I talked about things I’ve written about before. For example: Read more

May 7, 2010

Clarifying the state of MPP in-database SAS

I routinely am briefed way in advance of products’ introductions. For that reason and others, it can be hard for me to keep straight what’s been officially announced, introduced for test, introduced for general availability, vaguely planned for the indefinite future, and so on. Perhaps nothing has confused me more in that regard than the SAS Institute’s multi-year effort to get SAS integrated into various MPP DBMS, specifically Teradata, Netezza Twinfin(i), and Aster Data nCluster.

However, I chatted briefly Thursday with Michelle Wilkie, who is the SAS product manager overseeing all this (and also some other stuff, like SAS running on grids without being integrated into a DBMS). As best I understood, the story is: Read more

May 4, 2010

Revolution Analytics seems very confused

Revolution Analytics is a relaunch of a company previously known as REvolution Computing, built around the open source R language. Last week they sent around email claiming they were a new company (false), and asking for briefings in connection with an embargo this morning. I talked to Revolution Analytics yesterday, and they told me the embargo had been moved to Thursday.* However, Revolution apparently neglected to tell the press the same thing, and there’s an article out today — quoting me, because I’d given quotes in line with the original embargo, before I’d had the briefing myself. And what’s all this botched timing about? Mainly, it seems to be for a “statement of direction” about software Revolution Analytics hasn’t actually developed yet.

*More precisely, they spoke as if the embargo had been Thursday all along.

Read more

February 22, 2010

TwinFin(i) – Netezza’s version of a parallel analytic platform

Much like Aster Data did in Aster 4.0 and now Aster 4.5, Netezza is announcing a general parallel big data analytic platform strategy. It is called Netezza TwinFin(i), it is a chargeable option for the Netezza TwinFin appliance, and many announced details are on the vague side, with Netezza promising more clarity at or before its Enzee Universe conference in June. At a high level, the Aster and Netezza approaches compare/contrast as follows: Read more

February 22, 2010

Aster Data nCluster 4.5

Like Vertica, Netezza, and Teradata, Aster is using this week to pre-announce a forthcoming product release, Aster Data nCluster 4.5. Aster is really hanging its identity on “Big Data Analytics” or some variant of that concept, and so the two major named parts of Aster nCluster 4.5 are:

And in other Aster news:

Aster Data Developer Express evidently does some cool stuff, like providing some sort of parallelism testing right on your desktop. It also generates lots of stub code, saving humans from the tedium of doing that. Useful, obviously.

But mainly, I want to write about the analytic packages. Read more

October 10, 2009

How 30+ enterprises are using Hadoop

MapReduce is definitely gaining traction, especially but by no means only in the form of Hadoop. In the aftermath of Hadoop World, Jeff Hammerbacher of Cloudera walked me quickly through 25 customers he pulled from Cloudera’s files. Facts and metrics ranged widely, of course:

Read more

September 3, 2009

SAS on Netezza and other Netezza extensibility

I chatted with SAS CTO Keith Collins yesterday about the new SAS/Netezza in-database parallel data mining scoring offering. My impression is that this is very similar to SAS’ current Teradata support, notwithstanding SAS’ and Teradata’s apparent original intention of offering in-database modeling by now as well.

I gather this is a big performance-enhancing deal, just as it is for SPSS or Oracle’s own data mining over Oracle.  However, I must confess to not yet understanding why.  That is, I don’t know what’s so complicated about data mining scoring algorithms that makes hand-coding them in SQL particularly forbidding. My naive view of data mining is that you do a big regression to get a bunch of weights, and the resulting scoring algorithm is a linear combination of a few dozen variables.  Evidently, that’s not quite right.

Anyhow, it turns out that SAS held off on this work until it could be done for TwinFin. That’s largely because TwinFin lets partners write code on Intel CPUs, while previously they had to write in C for Netezza’s FPGAs. I got a similar sense from at least one other Netezza partner as well.

February 23, 2009

MapReduce user eHarmony chose Netezza over Aster or Greenplum

Depending on which IDG reporter you believe, eHarmony has either 4 TB of data or more than 12 TB, stored in Oracle but now analyzed on Netezza.  Interestingly, eHarmony is a Hadoop/MapReduce shop, but chose Netezza over Aster Data or Greenplum even so.  Price was apparently an important aspect of the purchase decision. Netezza also seems to have had a very smooth POC. Read more

February 10, 2009

Aster Data nPath

Edit: Unfortunately, this post and its sequel rely on Aster Data posts that Aster’s buyer Teradata no longer makes easily available.

At the same time as it rolled out its cloud story, Aster Data told of nPath, a MapReduce-based feature in nCluster. As best I understand it, the core idea of nPath is that it preprocesses sequential data via MapReduce so that you can then do ordinary SQL on it. (Steve Wooledge’s blog post about nPath outlines why that might be needed. Point 1 in Mayank Bawa’s August, 2008 post is much more concise. 😉 ) Now, that might seem to contradict the syntax, which is all about MapReduce being invoked via SQL — still, it’s what’s really going on.

That leads to two obvious questions: What is nPath used (or useful) for? and How is the preprocessing done anyway? Read more

← Previous PageNext Page →

Feed: DBMS (database management system), DW (data warehousing), BI (business intelligence), and analytics technology Subscribe to the Monash Research feed via RSS or email:

Login

Search our blogs and white papers

Monash Research blogs

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.