Aster Data

Analysis of data warehouse DBMS vendor Aster Data. Related subjects include:

October 15, 2009

MapReduce webinars and annotated slides

As previously noted, I’m giving a webinar twice today — i.e., Thursday, October 15 — at 10:00 am and 1:00 pm Eastern time.

The subject is MapReduce.
The sponsor is Aster Data.
Part of the webinar will be an explanation of MapReduce basics, especially the conflict between theory/propaganda and reality.
As you might guess from the identity of the sponsor, there will be an emphasis on how MapReduce and SQL play nicely with each other.
You can register for the webinar on Aster’s site.
(Edit) The webinar replay can be found here.
I’ve already uploaded the slides from which I will present. (But not the ones from which Aster folks will be talking. I’ve seen those, and there’s some good technical crunch in some of them.) The “Notes” under the slides have a number of relevant URLs for follow-up, as well as a small number of explanatory comments (e.g., as to why one slide simply has a quote from and corresponding picture of Shakespeare).

Categories: Aster Data, MapReduce, Presentations

October 10, 2009

How 30+ enterprises are using Hadoop

MapReduce is definitely gaining traction, especially but by no means only in the form of Hadoop. In the aftermath of Hadoop World, Jeff Hammerbacher of Cloudera walked me quickly through 25 customers he pulled from Cloudera’s files. Facts and metrics ranged widely, of course:

Some are in heavy production with Hadoop, and closely engaged with Cloudera. Others are active Hadoop users but are very secretive. Yet others signed up for initial Hadoop training last week.
Some have Hadoop clusters in the thousands of nodes. Many have Hadoop clusters in the 50-100 node range. Others are just prototyping Hadoop use. And one seems to be “OEMing” a small Hadoop cluster in each piece of equipment sold.
Many export data from Hadoop to a relational DBMS; many others just leave it in HDFS (Hadoop Distributed File System), e.g. with Hive as the query language, or in exactly one case Jaql.
Some are household names, in web businesses or otherwise. Others seem to be pretty obscure.
Industries include financial services, telecom (Asia only, and quite new), bioinformatics (and other research), intelligence, and lots of web and/or advertising/media.
Application areas mentioned — and these overlap in some cases — include:
- Log and/or clickstream analysis of various kinds
- Marketing analytics
- Machine learning and/or sophisticated data mining
- Image processing
- Processing of XML messages
- Web crawling and/or text processing
- General archiving, including of relational/tabular data, e.g. for compliance

Categories: Application areas, Aster Data, Cloudera, Data types, Data warehousing, Database diversity, EAI, EII, ETL, ELT, ETLT, Hadoop, Investment research and trading, Log analysis, MapReduce, Open source, Parallelization, Predictive modeling and advanced analytics, Scientific research, Structured documents, Telecommunications, Text, Vertica Systems, Web analytics

October 9, 2009

I have some presentations coming up (all on October Thursdays)

On Thursday, October 15, at two different times (10:00 am and 1:00 pm Eastern time), I’ll be giving a webinar for Aster Data on MapReduce. The content is very much a work in progress, but it definitely will:

Be overviewy in nature
Emphasize SQL/MapReduce integration

Then, on the evening of Thursday, October 22, there’s something called the Boston Big Data Summit, in Waltham, where “Big Data” evidently is to be construed as anything from a few terabytes on up. (Things are smaller in the Northeast than in California …) It’s being put together by Amrith Kumar (who I don’t really know) and Bob Zurek (who everybody knows). This is the inaugural meeting. It seems I’m both giving the keynote and running the subsequent panel, one of whose participants will be Ellen Rubin.

Categories: Analytic technologies, Aster Data, Cloud computing, MapReduce, Presentations

October 1, 2009

MapReduce tidbits

I’ve never had children, and so have never had to supervise squabbling siblings, each accusing the other of selfishness and insufficient sharing. Perhaps the MapReduce vendors are a form of karmic payback. Be that as it may, my client Cloudera has organized Hadoop World on October 2 in New York, and my other client Aster Data is hosting a MapReduce-centric Big Data Summit the night before, at the same venue. Even if you don’t go, both conferences’ agenda pages offer a peek into what’s going on in MapReduce applications. I’m not going either, but even so I hope to post an overview of MapReduce uses after the conferences serve to publicize some of them.

Even better, I plan to hold a couple of webinars on MapReduce, at 10 am (blech) and 1 pm Eastern time on October 15. They’re sponsored by Aster Data, and so will have a strong SQL/MapReduce orientation.

In connection with its conference, Aster is introducing an nCluster-Hadoop connector — i.e., a loader from HDFS (Hadoop Distributed File System) implemented in SQL/MapReduce. In particular:

Categories: Aster Data, Cloudera, Data warehousing, Hadoop, MapReduce

September 13, 2009

Fault-tolerant queries

MapReduce/Hadoop fans sometimes raise the question of query fault-tolerance. That is — if a node fails, does the query need to be restarted, or can it keep going? For example, Daniel Abadi et al. trumpet query fault-tolerance as one of the virtues of HadoopDB. Some of the scientists at XLDB spoke of query fault-tolerance as being a good reason to leave 100s or 1000s of terabytes of data in Hadoop-managed file systems.

When we discussed this subject a few months ago in a couple of comment threads, it seemed to be the case that:

Hadoop generally has query fault-tolerance. Intermediate result sets are materialized, and data isn’t tied to nodes anyway. So if a node goes down, its work can be sent to another node.
Hive actually did not have query fault-tolerance at that time, but it was on the roadmap. (Edit: Actually, it did within a single MapReduce job. But one Hive job can comprise several rounds of MapReduce.)
Most DBMS vendors do not have query fault-tolerance. If a query fails, it gets restarted from scratch.
Aster Data’s nCluster, however, does appear to have some kind of query fault-tolerance.
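
To make the Hadoop-style mechanism concrete, here is a toy sketch in Python under simplifying assumptions; it is not Hadoop's actual scheduler or API. Each map task's output is materialized (a dictionary stands in for intermediate results written to disk), and a task that lands on a failed node is simply re-run on the next node, while already-completed tasks are left alone. The function run_job, its parameters, and the node names are hypothetical, invented for illustration.

```python
from collections import defaultdict


def run_job(splits, nodes, map_fn, reduce_fn, failed_nodes=frozenset()):
    """Toy MapReduce driver with task-level retry (hypothetical; not Hadoop's API).

    Each input split is one map task. A task is first tried on nodes[i % len(nodes)];
    if that node is in failed_nodes, the same task is retried on the next node.
    Finished map output is kept in `materialized`, so a failure never
    forces completed tasks to re-run.
    """
    materialized = {}  # split index -> [(key, value), ...]; stands in for on-disk intermediate results
    attempts = []      # (split index, node) for every attempt, including failed ones
    for i, split in enumerate(splits):
        for offset in range(len(nodes)):
            node = nodes[(i + offset) % len(nodes)]
            attempts.append((i, node))
            if node in failed_nodes:
                continue  # simulated node failure: reassign just this task
            materialized[i] = [kv for record in split for kv in map_fn(record)]
            break
        else:
            raise RuntimeError(f"no live node available for split {i}")

    # Shuffle: group the materialized map output by key, then reduce each group.
    groups = defaultdict(list)
    for pairs in materialized.values():
        for key, value in pairs:
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}, attempts


# Usage: count page views in a tiny clickstream while "node-b" is down.
def map_clicks(line):
    _user, page = line.split(",")
    yield page, 1


def sum_counts(_key, values):
    return sum(values)


splits = [["u1,home", "u2,cart"], ["u3,home"], ["u1,cart", "u2,home"]]
result, attempts = run_job(
    splits, ["node-a", "node-b", "node-c"], map_clicks, sum_counts, failed_nodes={"node-b"}
)
print(result)    # {'home': 3, 'cart': 2}
print(attempts)  # [(0, 'node-a'), (1, 'node-b'), (1, 'node-c'), (2, 'node-c')]
```

In this toy run only split 1's work is repeated after the failure; a restart-from-scratch policy, as described above for most DBMS vendors, would instead redo all three splits.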

This raises an obvious (pair of) question(s) — why and/or when would anybody ever care about query fault-tolerance?

Categories: Analytic technologies, Aster Data, Data warehousing, Hadoop, Parallelization, Scientific research, Theory and architecture

July 1, 2009

Correction to a recent quote

I’m quoted in a recent article around Aster’s appliance announcement as saying data warehouse appliances are more suitable for small workgroups of analysts crunching small amounts of data than they are for other uses.

But that’s not what I think at all.

I do think the ease-of-administration pitch for appliances makes them particularly well suited for users who want to scrape by without doing much database administration. This is especially appealing to departments or smaller enterprises. And the first/best scenario that comes to mind is indeed a small team of analysts, with good SQL skills but lightweight DBA experience, although Netezza has proved that many other kinds of users can find appliances appealing as well.

But that small team of analysts may maintain the largest database in the firm.

And by the way — notwithstanding the MySpace counterexample, most of Aster’s initial customers had <10 terabyte databases, and I think indeed <5 terabyte. The “frontline” pitch succeeded for Aster before (MySpace again aside) any better-big-data-crunching story did.

Categories: Analytic technologies, Aster Data, Data warehouse appliances, Data warehousing, Theory and architecture

Aster Data enters the appliance game

Aster Data is rolling out a line of nCluster appliances today. Highlights include:

Configurations ranging from 6.25 terabytes to 1 petabyte of user data. (Edit: Here’s the up-to-date data sheet.)
A $50K “Express Edition” price for <1 terabyte of user data. Unfortunately, that’s the only stated price.
The option of bundled MicroStrategy.
“MapReduce” in the name, which suggests something about the positioning — i.e., enterprise decision support, rather than Aster’s usual web/“frontline” emphasis. (Edit: That also fits with Aster’s recent MapReduce-for-.NET announcement.) (Edit: Actual name is Aster MapReduce Data Warehouse Appliance.)
Claims that because Aster runs effectively on cheaper, more truly “commodity” hardware than competitors, you get more hardware bang for the buck if you buy from Aster.

I don’t have a lot more to add right now, mainly because I wrote at some length about Aster’s non-appliance-specific, non-MapReduce technology and positioning a couple of weeks ago.

Categories: Analytic technologies, Aster Data, Business intelligence, Data warehouse appliances, Data warehousing, Database compression, MapReduce, Pricing

June 25, 2009

My current customer list among the analytic DBMS specialists

(This is an updated version of an August, 2008 post.)

One of my favorite pages on the Monash Research website is the list of many current and a few notable past customers. (Another favorite page is the one for testimonials.) For a variety of reasons, I won’t undertake to be more precise about my current customer list than that. But I don’t think it would hurt anything to list the analytic/data warehouse DBMS/appliance specialists in the group. They are:

Aster Data
Greenplum
Infobright
Kickfire
Kognitio
Microsoft
Netezza (my biggest client this year, probably, because of all the Enzee Universe appearances)
Sybase
Teradata
Vertica
Attivio, which may or may not be construed as being in the analytic DBMS business
Clearpace, ditto

All of those are Monash Advantage members.

If you care about all this, you may also be interested in the rest of my standards and disclosures.

Categories: About this blog, Aster Data, Data warehousing, Greenplum, Infobright, Kickfire, Microsoft and SQL*Server, Netezza, Sybase, Teradata, Vertica Systems

June 16, 2009

Aster Data on parallelism

Aster Data’s core claim boils down to “We do parallelism better.” Aster has shied away from saying that for marketing purposes, for fear of the response “Yeah, right, everybody says that.” But when I talked with Mayank Bawa, Steve Wooledge, et al. yesterday, I focused discussions on just that point. Based on that chat and others before, here are some highlights (as I understand them) of what Aster claims, believes, or believes to be differentiated about its nCluster technology:

Categories: Analytic technologies, Aster Data, Data warehousing, MapReduce, Parallelization, Theory and architecture

June 9, 2009

Aster Data sticks by its SQL/MapReduce guns

Aster Data continues to think that MapReduce, integrated with SQL, is an important technology. For example:

Aster announced today that it’s providing .NET support for SQL/MapReduce. Perhaps not coincidentally, Aster’s biggest customer is MySpace, which is apparently a big Microsoft shop. (And MySpace parent Fox Interactive Media is a SQL/MapReduce fan, albeit running on Greenplum.)
Aster generally puts more emphasis on MapReduce than SQL/MapReduce rival Greenplum. That’s a non-trivial comparison, because Greenplum is making progress in SQL/MapReduce itself.
When talking with Aster folks, I hear a lot about SQL/MapReduce.

I was a big fan of SQL/MapReduce when it was first announced last August. Notwithstanding persuasive examples favoring pure DBMS or pure MapReduce over DBMS/MapReduce integration, I continue to think the SQL/MapReduce idea has great potential. But I do wish more successful production examples would become visible …

Categories: Analytic technologies, Aster Data, Data warehousing, Fox and MySpace, Greenplum, MapReduce, Parallelization

Search our blogs and white papers

Monash Research blogs

DBMS 2 covers database management, analytics, and related technologies.
Text Technologies covers text mining, search, and social software.
Strategic Messaging analyzes marketing and messaging strategy.
The Monash Report examines technology and public policy issues.
Software Memories recounts the history of the software industry.

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.

Links
- Monash Research
- White Papers