February 22, 2010

Aster Data nCluster 4.5

Like Vertica, Netezza, and Teradata, Aster is using this week to pre-announce a forthcoming product release, Aster Data nCluster 4.5. Aster is really hanging its identity on “Big Data Analytics” or some variant of that concept, and so the two major named parts of Aster nCluster 4.5 are:

Aster Data Analytic Foundation, a set of analytic packages prebuilt in Aster’s SQL-MapReduce
Aster Data Developer Express, an Eclipse-based IDE (Integrated Development Environment) for developing and testing applications built on Aster nCluster, Aster SQL-MapReduce, and Aster Data Analytic Foundation

And in other Aster news:

Along with the development GUI in Aster nCluster 4.5, there is also a new administrative GUI.
Aster has certified that nCluster works with Fusion-io boards, because at least one retail industry prospect cares. However, that in no way means that arm’s-length Fusion-io certification is Aster’s ultimate solid-state memory strategy.
I had the wrong impression about how far Aster/SAS integration has gotten. So far, it’s just at the connector level.

Aster Data Developer Express evidently does some cool stuff, like providing some sort of parallelism testing right on your desktop. It also generates lots of stub code, saving humans from the tedium of doing that. Useful, obviously.

But mainly, I want to write about the analytic packages. I’m not convinced that they’re a big deal in themselves yet, or that a whole lot of person-months have gone into their combined development. Still, I think they provide a great indication of one direction in which analytic functionality is going. And by the way, Aster promises to release a lot more of that kind of thing over the next 12 months.

Aster’s flagship analytic package is nPath, which is like a regular expression matcher, but for (time) series of data rather than for character strings. The main use for nPath is in pulling specific kinds of event sequences out of web or network event logs. However, one could imagine uses in other sectors that focus on temporal or sequential data (e.g., trading, intelligence, other sensor analysis), should existing SQL- and/or CEP-based technologies not prove sufficiently flexible. Aster 4.5 adds some new aggregation capabilities around nPath.
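
To make the regular-expression analogy concrete, here is a minimal Python sketch of the general idea. It is not Aster's nPath syntax or implementation: the event names, the one-character `SYMBOLS` encoding, and the `match_paths` helper are all assumptions made for illustration. Each user's events are ordered by time, encoded as a string of symbols, and scanned with an ordinary regular expression.

```python
import re
from collections import defaultdict

# Hypothetical mapping from event type to a one-character symbol.
SYMBOLS = {"search": "S", "view": "V", "add_to_cart": "A", "checkout": "C"}

def match_paths(events, pattern):
    """Return, per user, the event runs whose symbol string matches pattern.

    events: iterable of (user_id, timestamp, event_type) tuples.
    pattern: a regular expression over the one-character symbols above.
    """
    by_user = defaultdict(list)
    for user, ts, kind in events:
        by_user[user].append((ts, kind))

    results = {}
    for user, rows in by_user.items():
        rows.sort()  # order each user's events by timestamp
        encoded = "".join(SYMBOLS[kind] for _, kind in rows)
        # Map each regex match back to the slice of events it covers.
        hits = [rows[m.start():m.end()] for m in re.finditer(pattern, encoded)]
        if hits:
            results[user] = hits
    return results

events = [
    ("u1", 1, "search"), ("u1", 2, "view"), ("u1", 3, "view"),
    ("u1", 4, "add_to_cart"), ("u1", 5, "checkout"),
    ("u2", 1, "search"), ("u2", 2, "view"),
]
# A search, one or more views, then add-to-cart followed by checkout.
funnels = match_paths(events, r"SV+AC")
```

In this toy data only `u1` completes the funnel, so only `u1` appears in `funnels`.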

Other not-wholly-new packages in the Aster Data Analytic Foundation announcement are for sessionization (of clickstream data and the like) and tokenization (of text/character string data). While sessionization can be done in SQL, Aster thinks its MapReduce-based version is faster, since it doesn’t require self-joins. Makes sense. Aster’s tokenization sounds lame, however – text analytics in MapReduce tends to reinvent simplistic wheels for no clear reason, and Aster doesn’t seem to be an exception. (Aster would argue, however, that anything it does in SQL-MapReduce is more flexible than pure SQL or pure MapReduce alternatives.)
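
As a rough illustration of why sessionization needs no self-join once each user's clicks are grouped and ordered, here is a Python sketch. It is not Aster's code; the `sessionize` name and the 1800-second timeout are assumptions. Each user's clicks are walked once, and a new session starts whenever the gap since the previous click exceeds the timeout.

```python
from itertools import groupby
from operator import itemgetter

def sessionize(clicks, timeout=1800):
    """Assign a session number to each click in one pass per user.

    clicks: iterable of (user_id, timestamp_seconds) tuples.
    timeout: gap in seconds that starts a new session (assumed value).
    Returns a list of (user_id, timestamp_seconds, session_number).
    """
    out = []
    ordered = sorted(clicks)  # by user, then by time
    for user, rows in groupby(ordered, key=itemgetter(0)):
        session, previous = 0, None
        for _, ts in rows:
            # Compare only with the immediately preceding click.
            if previous is not None and ts - previous > timeout:
                session += 1
            out.append((user, ts, session))
            previous = ts
    return out

clicks = [("u1", 0), ("u1", 600), ("u1", 4000), ("u2", 100)]
sessions = sessionize(clicks)
```

The only state carried along is the previous timestamp and the current session counter, which is what lets a per-user pass replace the row-against-row comparison a self-join would do.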

Another example of better-living-without-self-joins is Aster’s new market basket package. This lets you look at a set of point-of-sale data, pick a small integer N, and pull out all the sets of N things that were bought by the same person at the same time. I haven’t probed the claim in detail, but Aster implies there’s less combinatorial explosion in its approach than there is in the self-join alternative.
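
One plausible reading of the combinatorial point, sketched in Python under my own assumptions (the `itemsets` helper and the sample baskets are hypothetical, and this is not Aster's package): if each basket is handled as a unit, every N-item set can be generated exactly once in sorted order, whereas a naive N-way self-join produces every ordering of the same items unless extra predicates filter them out.

```python
from collections import Counter
from itertools import combinations

def itemsets(baskets, n):
    """Count every set of n distinct items that appears in the same basket."""
    counts = Counter()
    for items in baskets:
        # Deduplicate and sort so each set is emitted once, in one canonical order.
        for combo in combinations(sorted(set(items)), n):
            counts[combo] += 1
    return counts

baskets = [
    ["bread", "milk", "eggs"],
    ["bread", "milk"],
    ["milk", "eggs", "bread", "bread"],
]
pairs = itemsets(baskets, 2)
```

Here `pairs[("bread", "milk")]` counts the baskets that contain both items, without ever materializing the reversed pair.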

Note: Gartner highlighted self joins as a performance challenge in its recent Data Warehouse Magic Quadrant.

Aster is also releasing a few statistical and general analytic functions — specifically (and I quote a slide):

exponential moving average
weighted moving average
simple moving average
volume-weighted average price
correlation
linear regression
logistic regression
approximate_percentile
approximate_count_distinct

The point of the last two items on the list is that if you set a non-zero tolerance for error, you can count things or order them into bins very efficiently, especially in terms of RAM, while being guaranteed not to exceed your error tolerance.
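
To show how a tolerance can be traded for memory in the binning case, here is a Python sketch of one simple technique, a fixed-width histogram. I don't know what algorithm Aster actually uses; the function signature, and the assumption that all values are known in advance to lie in the range `[lo, hi]`, are mine. Bins are `2 * tolerance` wide, so the midpoint of the bin holding the wanted rank is within `tolerance` of the true value.

```python
import math

def approximate_percentile(values, q, lo, hi, tolerance):
    """Estimate the q-th quantile (0 < q <= 1) of values known to lie in [lo, hi].

    Only bin counts are kept, so memory is about (hi - lo) / (2 * tolerance)
    integers no matter how many values are seen.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    width = 2 * tolerance
    nbins = max(1, math.ceil((hi - lo) / width))
    counts = [0] * nbins
    total = 0
    for v in values:
        # Clamp so that v == hi falls into the last bin.
        index = min(int((v - lo) / width), nbins - 1)
        counts[index] += 1
        total += 1
    if total == 0:
        raise ValueError("no values")
    rank = math.ceil(q * total)  # 1-based rank of the wanted element
    seen = 0
    for index, count in enumerate(counts):
        seen += count
        if seen >= rank:
            # The true quantile lies in this bin, so its midpoint is
            # at most `tolerance` away from it.
            return lo + (index + 0.5) * width

estimate = approximate_percentile(range(1, 1001), 0.5, lo=0, hi=1000, tolerance=5)
```

With these inputs the sketch keeps 100 counters instead of 1,000 values, and the estimate of the median is off by no more than 5.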

Note: One obvious inference from this list — which Aster gladly confirms — is that Aster has high hopes of selling to the financial services industry.
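
For readers who don't live in financial data, the first four functions on the list are simple to state. The Python below uses the textbook definitions, not Aster's implementations; the function names and the choice to seed the exponential average with the first price are my assumptions.

```python
def simple_moving_average(prices, window):
    """Mean of each run of `window` consecutive prices."""
    return [sum(prices[i - window:i]) / window
            for i in range(window, len(prices) + 1)]

def weighted_moving_average(prices, window):
    """Linearly weighted: the newest price in each window gets weight `window`."""
    weights = range(1, window + 1)
    denom = sum(weights)
    return [sum(w * p for w, p in zip(weights, prices[i - window:i])) / denom
            for i in range(window, len(prices) + 1)]

def exponential_moving_average(prices, alpha):
    """EMA seeded with the first price; alpha in (0, 1] is the smoothing factor."""
    out = []
    for p in prices:
        out.append(p if not out else alpha * p + (1 - alpha) * out[-1])
    return out

def vwap(trades):
    """Volume-weighted average price over (price, volume) pairs."""
    volume = sum(v for _, v in trades)
    return sum(p * v for p, v in trades) / volume

prices = [10.0, 11.0, 12.0, 13.0]
sma = simple_moving_average(prices, 2)
wma = weighted_moving_average(prices, 2)
ema = exponential_moving_average(prices, 0.5)
price = vwap([(10.0, 100), (12.0, 300)])
```

The moving averages need only a running window or a single running value per series, which is why they fit naturally into a per-partition pass over ordered data.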

Finally, Aster is releasing its first pure graph-analytic function, for finding the shortest path between a given pair of nodes.
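
Aster didn't tell me which algorithm it uses, or whether edges carry weights. As a baseline for what "shortest path between a given pair of nodes" means, here is a single-machine Python sketch using breadth-first search on an unweighted graph; the adjacency-dict representation and the `shortest_path` name are assumptions for illustration.

```python
from collections import deque

def shortest_path(graph, source, target):
    """Breadth-first search for a fewest-hops path in an unweighted graph.

    graph: dict mapping each node to an iterable of neighbor nodes.
    Returns the path as a list of nodes, or None if target is unreachable.
    """
    parents = {source: None}  # also serves as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the parent links back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in graph.get(node, ()):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None

graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"]}
path = shortest_path(graph, "a", "e")
```

A weighted version would swap the queue for a priority queue (Dijkstra's algorithm); the parallel, in-database version is presumably more involved.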

While I had the Aster folks on the phone anyway, I also took the opportunity to ask about the Aster nCluster 4.0 capability to create fairly persistent non-relational in-memory data structures. Specifically, I asked whether different users could access the same in-memory structure, and was told that this is a little klugey but not too horrendous. That suggests Aster’s capability may be a strict superset of UDF-based (User-Defined Function) approaches to meeting the same need, at least from a functionality standpoint. However, ease of creating those in-memory structures may still be better in the more SQL/UDF-centric approach favored by Teradata.

Categories: Aster Data, Data warehousing, Investment research and trading, Predictive modeling and advanced analytics, RDF and graphs, SAS Institute, Teradata


Comments

9 Responses to “Aster Data nCluster 4.5”

February 2010 data warehouse DBMS news roundup | DBMS2 -- DataBase Management System Services on February 22nd, 2010 5:05 pm

[…] Data nCluster 4.5. Much like Aster’s prior release — Aster Data nCluster 4.0 – Aster Data nCluster 4.5 has a major focus on integrating analytics and database processing. This time, the emphasis is on […]
Clarifying the state of MPP in-database SAS | DBMS2 -- DataBase Management System Services on May 7th, 2010 4:46 pm

[…] I routinely am briefed way in advance of products’ introductions. For that reason and others, it can be hard for me to keep straight what’s been officially announced, introduced for test, introduced for general availability, vaguely planned for the indefinite future, and so on. Perhaps nothing has confused me more in that regard than the SAS Institute’s multi-year effort to get SAS integrated into various MPP DBMS, specifically Teradata, Netezza Twinfin(i), and Aster Data nCluster. […]
So can logistic regression be parallelized or not? | DBMS 2 : DataBase Management System Services on April 6th, 2011 5:04 am

[…] the other hand, Aster Data said it had parallelized logistic regression a year ago. (Slides 6-7 from a mid-2010 Aster deck may be clearer.) I’m guessing Fuzzy Logix might make […]
Lots of Aster Data analytic packages | DBMS 2 : DataBase Management System Services on April 8th, 2011 12:00 am

[…] start with Aster Data, which added to the list of analytic packages it previously announced, and kindly gave me permission to post a partial slide deck from the […]
TwinFin(i) – Netezza’s version of a parallel analytic platform | DBMS 2 : DataBase Management System Services on April 8th, 2011 12:02 am

[…] like Aster Data did in Aster 4.0 and now Aster 4.5, Netezza is announcing a general parallel big data analytic platform strategy. It is called Netezza […]
Teradata’s future product strategy | DBMS 2 : DataBase Management System Services on September 24th, 2011 11:10 pm

[…] Netezza or Aster, Teradata doesn’t seem to plan analytic capability that works outside the UDF (User Defined […]
mICHAEL rAMIREZ on November 10th, 2011 12:06 pm

dO YOU HAVE A PRICE LIST FOR THE NC-PE-75TB
PLEASE SEND ME THE PRICE LIST.
THANKS,
MIKE RAMIREZ
PROCUREMENT DEPT
CACI INC FEDERAL
Entity-centric event series analytics | DBMS 2 : DataBase Management System Services on October 18th, 2013 4:29 am

[…] number of my clients are focused on such scenarios, including WibiData, Teradata Aster (e.g. via nPath), Platfora (in the imminent Platfora 3), and others. And so I get involved in naming exercises. The […]
Interana | DBMS 2 : DataBase Management System Services on April 17th, 2017 6:11 am

[…] Interana may be the first company that’s ever told me it’s focused on providing a better nPath. […]
