Aster Data on parallelism
Aster Data’s core claim boils down to “We do parallelism better.” Aster has shied away from saying that for marketing purposes, for fear of the response “Yeah, right, everybody says that.” But when I talked with Mayank Bawa, Steve Wooledge, et al. yesterday, I focused discussions on just that point. Based on that chat and others before, here are some highlights (as I understand them) of what Aster claims or believes about its nCluster technology, including what it believes to be differentiated:
- Aster Data believes that nCluster’s parallel query optimization is highly sophisticated. For example, Aster nCluster reduces network traffic and congestion by aggressively doing partial aggregates and GROUP BYs before moving data around.
- Aster Data claims easy bare-metal installs for nCluster nodes. Aster further claims to have long had background data movement along the lines of what Greenplum recently introduced. Taken together, that produces a claim of easy, without-taking-the-system-down expandability.
- More generally, Aster Data claims that nCluster does more things in parallel than other products.
- One example is backup. Aster is proud of its integrated parallel backup, with the backed-up data going to an nCluster cluster. (One thing about that link — I think it was a bit over-optimistic about Aster’s customer count.)
- In another example, Aster claims that, while MapReduce does analytics in parallel, alternative technologies (typically User Defined Functions) generally do not. (There’s a certain truth to that characterization of competitors’ technology, but more exceptions than Aster would perhaps like to suggest.)
- Most other MPP database management systems, while they can mix and match hardware in a single installation, tend to run heterogeneous hardware on a least-common-denominator basis. (Teradata Virtual Storage would be one partial exception to that rule. DATAllegro’s old mix-and-match “multi-temperature” strategy would be another, very partial one.) Aster claims that nCluster can be run on dissimilar hardware nodes side-by-side, each used to more or less its full capability.
- Aster Data claims that nCluster offers particularly high reliability — due in large part to clever parallel recovery from failures — despite running on low-cost hardware and storage. More on that some other time.
- Empirically, Aster did put this all together and get nCluster into production use in the cloud ahead of most competitors. (Vertica and Kognitio are, I’d say, entitled to be regarded as exceptions.)
And by the way, Aster Data claims that better parallelism doesn’t matter just for huge databases. Rather, it’s important even for small numbers of nodes, due to overhead in parallel processing among even a couple of nodes.
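To make the first bullet concrete, here is a minimal sketch of why doing partial aggregates before moving data around cuts network traffic. It is not Aster's implementation; the node layout, the sample rows, and the function names `partial_aggregate` and `merge_partials` are all hypothetical, chosen only to illustrate the general technique.

```python
from collections import defaultdict

def partial_aggregate(rows):
    """Per-node step: collapse local rows into one (sum, count) pair per group key."""
    partials = defaultdict(lambda: [0, 0])
    for key, value in rows:
        partials[key][0] += value
        partials[key][1] += 1
    return {key: tuple(pair) for key, pair in partials.items()}

def merge_partials(per_node_partials):
    """Final step: combine the small per-node partials into global (sum, count) pairs."""
    totals = defaultdict(lambda: [0, 0])
    for partials in per_node_partials:
        for key, (subtotal, count) in partials.items():
            totals[key][0] += subtotal
            totals[key][1] += count
    return {key: tuple(pair) for key, pair in totals.items()}

# Hypothetical data: two nodes, each holding (region, amount) rows.
node_rows = [
    [("east", 10), ("east", 20), ("west", 5)],
    [("east", 30), ("west", 15), ("west", 25)],
]

# Each node aggregates locally before anything crosses the network.
per_node = [partial_aggregate(rows) for rows in node_rows]
result = merge_partials(per_node)

# Rows shipped if raw data were moved vs. if only partials are moved.
rows_shipped_raw = sum(len(rows) for rows in node_rows)
rows_shipped_partial = sum(len(partials) for partials in per_node)

# Averages can still be computed exactly, because sums and counts were kept.
averages = {key: total / count for key, (total, count) in result.items()}
```

In this toy case each node ships one record per group rather than one per row, so the saving grows with the number of rows per group. Carrying a (sum, count) pair rather than a local average is what lets the final merge stay exact.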
Comments
3 Responses to “Aster Data on parallelism”
Hi Curt,
I’m glad you’re talking to these guys. From what I’ve learned about them, I tend to believe the claims. To me, it seems like an appliance-less data warehouse appliance that installs fast on commodity hardware and clusters well.
Cheers,
Dave
Hi Dave!
I talk to Aster quite a bit, and like them a lot. I’ve just been a little remiss in writing about them.
[…] don’t have a lot more to add right now, mainly because I wrote at some length about Aster’s non-appliance-specific, non-MapReduce technology and positioning a couple of weeks ago. Categories: Analytic technologies, Aster Data, Business intelligence, Data […]