Aster Data nCluster 4.5
Like Vertica, Netezza, and Teradata, Aster is using this week to pre-announce a forthcoming product release, Aster Data nCluster 4.5. Aster is really hanging its identity on “Big Data Analytics” or some variant of that concept, and so the two major named parts of Aster nCluster 4.5 are:
- Aster Data Analytic Foundation, a set of analytic packages prebuilt in Aster’s SQL-MapReduce
- Aster Data Developer Express, an Eclipse-based IDE (Integrated Development Environment) for developing and testing applications built on Aster nCluster, Aster SQL-MapReduce, and Aster Data Analytic Foundation
And in other Aster news:
- Along with the development GUI in Aster nCluster 4.5, there is also a new administrative GUI.
- Aster has certified that nCluster works with Fusion I/O boards, because at least one retail industry prospect cares. However, that in no way means that arm’s-length Fusion I/O certification is Aster’s ultimate solid-state memory strategy.
- I had the wrong impression about how far Aster/SAS integration has gotten. So far, it’s just at the connector level.
Aster Data Developer Express evidently does some cool stuff, like providing some sort of parallelism testing right on your desktop. It also generates lots of stub code, saving humans from the tedium of doing that. Useful, obviously.
But mainly, I want to write about the analytic packages. I’m not convinced that they’re a big deal in themselves yet, or that a whole lot of person-months have gone into their combined development. Still, I think they provide a great indication of one direction in which analytic functionality is going. And by the way, Aster promises to release a lot more of that kind of thing over the next 12 months.
Aster’s flagship analytic package is nPath, which is like a regular expression matcher, but for (time) series of data rather than for character strings. The main use for nPath is in pulling specific kinds of event sequences out of web or network event logs. However, one could imagine uses in other sectors that focus on temporal or sequential data (e.g., trading, intelligence, other sensor analysis), should existing SQL- and/or CEP-based technologies not prove sufficiently flexible. Aster 4.5 adds some new aggregation capabilities around nPath.
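To make the "regular expression over event sequences" idea concrete, here is a minimal Python sketch. It is not Aster's nPath syntax or implementation. The event names, the one-letter symbol mapping, and the `match_paths` helper are all assumptions made up for illustration. Each user's events are ordered by time, encoded as one character per event, and matched with an ordinary regex.

```python
import re
from collections import defaultdict

# Hypothetical mapping from event type to a one-character symbol.
SYMBOLS = {"home": "H", "search": "S", "product": "P", "checkout": "C"}

def match_paths(events, pattern):
    """events: iterable of (user_id, timestamp, event_type).

    Returns {user_id: [list of matched event-type sequences]}.
    """
    by_user = defaultdict(list)
    for user, ts, kind in events:
        by_user[user].append((ts, kind))

    results = {}
    for user, rows in by_user.items():
        rows.sort()  # order each user's events by timestamp
        encoded = "".join(SYMBOLS[kind] for _, kind in rows)
        # Each regex match corresponds to a contiguous run of events.
        spans = [m.span() for m in re.finditer(pattern, encoded)]
        if spans:
            results[user] = [[kind for _, kind in rows[a:b]] for a, b in spans]
    return results

events = [
    ("u1", 1, "home"), ("u1", 2, "search"), ("u1", 3, "search"),
    ("u1", 4, "product"), ("u1", 5, "checkout"),
    ("u2", 1, "home"), ("u2", 2, "product"),
]
# One or more searches, then a product view, then a checkout.
paths = match_paths(events, r"S+PC")
```

Here `paths` holds the matching sequence for `u1` only, since `u2` never searches or checks out.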
Other not-wholly-new packages in the Aster Data Analytic Foundation announcement are for sessionization (of clickstream data and the like) and tokenization (of text/character string data). While sessionization can be done in SQL, Aster thinks its MapReduce-based version is faster, since it doesn’t require self-joins. Makes sense. Aster’s tokenization sounds lame, however – text analytics in MapReduce tends to reinvent simplistic wheels for no clear reason, and Aster doesn’t seem to be an exception. (Aster would argue, however, that anything it does in SQL-MapReduce is more flexible than pure SQL or pure MapReduce alternatives.)
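The no-self-join point about sessionization can be illustrated with a short Python sketch. This is an assumption-laden toy, not Aster's code. The `sessionize` name, the 30-minute timeout, and the tuple layout are hypothetical. The idea is that once clicks are grouped by user and sorted by time, one sequential pass that remembers the previous timestamp is enough to assign session numbers.

```python
from itertools import groupby
from operator import itemgetter

def sessionize(clicks, timeout=1800):
    """clicks: iterable of (user_id, timestamp_seconds).

    Returns a list of (user_id, timestamp_seconds, session_number), where a
    gap longer than `timeout` seconds starts a new session for that user.
    """
    out = []
    ordered = sorted(clicks)  # by user, then by timestamp
    for user, rows in groupby(ordered, key=itemgetter(0)):
        session = 0
        prev = None
        for _, ts in rows:
            # Compare each click only with the one before it: no self-join.
            if prev is not None and ts - prev > timeout:
                session += 1
            out.append((user, ts, session))
            prev = ts
    return out

clicks = [("a", 0), ("a", 600), ("a", 4000), ("b", 100), ("a", 4100)]
sessions = sessionize(clicks)
```

In a parallel setting the same pass could presumably run independently per user partition, which is the shape of work MapReduce is suited to.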
Another example of better-living-without-self-joins is Aster’s new market basket package. This lets you look at a set of point-of-sale data, pick a small integer N, and pull out all the sets of N things that were bought by the same person at the same time. I haven’t probed the claim in detail, but Aster implies there’s less combinatorial explosion in its approach than there is in the self-join alternative.
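A minimal Python sketch of the market basket idea follows. It is only an illustration under assumed names (`basket_itemsets`, the `(basket_id, item)` layout), not Aster's package. Items are grouped per basket, and each unordered N-item set is emitted exactly once. A naive N-way self-join, by contrast, produces every ordering of the same items unless extra predicates filter them out. Whether that is the explosion Aster has in mind is my guess.

```python
from collections import Counter
from itertools import combinations

def basket_itemsets(sales, n):
    """sales: iterable of (basket_id, item).

    Returns a Counter mapping each n-item tuple (sorted) to the number of
    baskets that contain all n items.
    """
    baskets = {}
    for basket_id, item in sales:
        baskets.setdefault(basket_id, set()).add(item)

    counts = Counter()
    for items in baskets.values():
        # combinations() yields each unordered set once, in sorted order.
        for combo in combinations(sorted(items), n):
            counts[combo] += 1
    return counts

sales = [
    (1, "bread"), (1, "milk"), (1, "eggs"),
    (2, "bread"), (2, "milk"),
    (3, "milk"),
]
pairs = basket_itemsets(sales, 2)
```

With this toy data, `pairs` counts bread-and-milk twice and the two other pairs from basket 1 once each.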
Note: Gartner highlighted self joins as a performance challenge in its recent Data Warehouse Magic Quadrant.
Aster is also releasing a few statistical and general analytic functions — specifically (and I quote a slide):
- exponential moving average
- weighted moving average
- simple moving average
- volume-weighted average price
- correlation
- linear regression
- logistic regression
- approximate_percentile
- approximate_count_distinct
The point of the last two items on the list is that if you set a non-zero tolerance for error, you can count things or order them into bins very efficiently, especially in terms of RAM, while being guaranteed not to exceed your error tolerance.
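One simple way to see the tolerance-for-RAM trade is a fixed-width histogram. The sketch below is not Aster's algorithm, which I have not seen. The `approximate_percentile` signature, the known value range `[lo, hi]`, and the nearest-rank definition of a percentile are all assumptions. Memory grows with `(hi - lo) / tol` rather than with the number of rows, and the answer is within `tol` of the exact value (up to floating-point rounding), because the true value must fall in the bin whose midpoint is returned.

```python
import math

def approximate_percentile(values, q, lo, hi, tol):
    """Approximate the q-th quantile (0 < q <= 1) of values in [lo, hi].

    Uses bins of width 2 * tol, so the bin midpoint is at most tol away
    from any value that falls in that bin.
    """
    width = 2 * tol
    nbins = max(1, math.ceil((hi - lo) / width))
    counts = [0] * nbins  # the only state kept: one counter per bin
    total = 0
    for v in values:
        idx = min(int((v - lo) / width), nbins - 1)  # clamp v == hi
        counts[idx] += 1
        total += 1
    if total == 0:
        raise ValueError("no values")

    rank = max(1, math.ceil(q * total))  # nearest-rank definition
    seen = 0
    for idx, c in enumerate(counts):
        seen += c
        if seen >= rank:
            return lo + (idx + 0.5) * width

values = range(1, 1001)
median_estimate = approximate_percentile(values, 0.5, lo=0, hi=1000, tol=5)
```

The exact nearest-rank median of 1 through 1000 is 500, so `median_estimate` lands within 5 of that while keeping only 100 counters.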
Note: One obvious inference from this list — which Aster gladly confirms — is that Aster has high hopes of selling to the financial services industry.
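For readers who do not live in financial data, here are textbook versions of three of the functions on the slide, as a Python sketch. These are common definitions, not necessarily the ones Aster implements. In particular, the smoothing factor `alpha` and seeding the exponential average with the first price are assumptions, and the function names are mine.

```python
def simple_moving_average(prices, window):
    """Mean of each run of `window` consecutive prices."""
    return [
        sum(prices[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(prices))
    ]

def exponential_moving_average(prices, alpha):
    """EMA seeded with the first price; alpha in (0, 1] weights the newest price."""
    out = []
    ema = None
    for p in prices:
        ema = p if ema is None else alpha * p + (1 - alpha) * ema
        out.append(ema)
    return out

def vwap(trades):
    """Volume-weighted average price over (price, volume) pairs."""
    total_volume = sum(v for _, v in trades)
    return sum(p * v for p, v in trades) / total_volume

sma = simple_moving_average([1, 2, 3, 4, 5], window=3)
ema = exponential_moving_average([10, 20, 30], alpha=0.5)
avg_price = vwap([(10, 100), (20, 300)])
```

All three are order-dependent or grouped aggregates over a time series, which is why they are more naturally expressed as sequential passes than as plain SQL aggregates.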
Finally, Aster is releasing its first pure graph-analytic function, for finding the shortest path between a given pair of nodes.
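As a reminder of what such a function computes, here is a breadth-first search sketch in Python. Aster did not tell me whether its function handles weighted edges or which algorithm it uses, so the unweighted, undirected graph and the `shortest_path` name here are assumptions for illustration only.

```python
from collections import deque

def shortest_path(edges, source, target):
    """edges: iterable of (node_a, node_b) pairs, treated as undirected.

    Returns a list of nodes from source to target with the fewest hops,
    or None if target cannot be reached.
    """
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)

    parent = {source: None}  # also serves as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk parent links back to the source, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "e"), ("e", "d")]
route = shortest_path(edges, "a", "d")
```

In this toy graph the two-hop route through `e` beats the three-hop route through `b` and `c`.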
While I had the Aster folks on the phone anyway, I also took the opportunity to ask about the Aster nCluster 4.0 capability to create fairly persistent non-relational in-memory data structures. Specifically, I asked whether different users could access the same in-memory structure, and was told that this is a little klugey but not too horrendous. That suggests Aster’s capability may be a strict superset of UDF-based (User-Defined Function) approaches to meeting the same need, at least from a functionality standpoint. However, ease of creating those in-memory structures may still be better in the more SQL/UDF-centric approach favored by Teradata.
Comments
9 Responses to “Aster Data nCluster 4.5”
[…] Data nCluster 4.5. Much like Aster’s prior release — Aster Data nCluster 4.0 – Aster Data nCluster 4.5 has a major focus on integrating analytics and database processing. This time, the emphasis is on […]
[…] I routinely am briefed way in advance of products’ introductions. For that reason and others, it can be hard for me to keep straight what’s been officially announced, introduced for test, introduced for general availability, vaguely planned for the indefinite future, and so on. Perhaps nothing has confused me more in that regard than the SAS Institute’s multi-year effort to get SAS integrated into various MPP DBMS, specifically Teradata, Netezza Twinfin(i), and Aster Data nCluster. […]
[…] the other hand, Aster Data said it had parallelized logistic regression a year ago. (Slides 6-7 from a mid-2010 Aster deck may be clearer.) I’m guessing Fuzzy Logix might make […]
[…] start with Aster Data, which added to the list of analytic packages it previously announced, and kindly gave me permission to post a partial slide deck from the […]
[…] like Aster Data did in Aster 4.0 and now Aster 4.5, Netezza is announcing a general parallel big data analytic platform strategy. It is called Netezza […]
[…] Netezza or Aster, Teradata doesn’t seem to plan analytic capability that works outside the UDF (User Defined […]
DO YOU HAVE A PRICE LIST FOR THE NC-PE-75TB
PLEASE SEND ME THE PRICE LIST.
THANKS,
MIKE RAMIREZ
PROCUREMENT DEPT
CACI INC FEDERAL
[…] number of my clients are focused on such scenarios, including WibiData, Teradata Aster (e.g. via nPath), Platfora (in the imminent Platfora 3), and others. And so I get involved in naming exercises. The […]
[…] Interana may be the first company that’s ever told me it’s focused on providing a better nPath. […]