EAI, EII, ETL, ELT, ETLT

Analysis of data integration products and technologies, especially ones related to data warehousing, such as ELT (Extract/Transform/Load). Related subjects include:

June 26, 2012

Teradata SQL-H, using HCatalog

When I grumbled about the conference-related rush of Hadoop announcements, one example of many was Teradata Aster’s SQL-H. Still, it’s an interesting idea, and a good hook for my first shot at writing about HCatalog. Indeed, other than the Talend integration bundled into Hortonworks’ HDP 1, Teradata SQL-H is the first real use of HCatalog I’m aware of.

The Teradata SQL-H idea is:

At least in theory, Teradata SQL-H lets you use a full set of analytic tools against your Hadoop data, with little limitation except price and/or performance. Teradata thinks the performance of all this can be much better than if you just use Hadoop (35X was mentioned in one particularly favorable example), but perhaps much worse than if you just copy/extract the data to an Aster cluster in the first place.

So what might the use cases be for something like SQL-H? Offhand, I’d say:

By way of contrast, the whole thing makes less sense for dashboarding kinds of uses, unless the dashboard users are very patient when they want to drill down.

June 25, 2012

Why I’m so forward-leaning about Hadoop features

In my recent series of Hadoop posts, there were several cases where I had to choose between recommending that enterprises:

I favored the more advanced features each time. Here’s why.

To a first approximation, I divide Hadoop use cases into two major buckets, only one of which I was addressing with my comments:

1. Analytic data management.* Here I favored features over reliability because they are more important, for Hadoop as for analytic RDBMS before it. When somebody complains about an analytic data store not being ready for prime time, never really working, or causing them to tear their hair out, what they usually mean is that:

Those complaints are much, much, more frequent than “It crashed”. So it was for Netezza, DATAllegro, Greenplum, Aster Data, Vertica, Infobright, et al. So it also is for Hadoop. And how does one address those complaints? By performance and feature enhancements, of the kind that the Hadoop community is introducing at high speed. Read more

June 16, 2012

Metamarkets’ back-end technology

This is part of a three-post series:

The canonical Metamarkets batch ingest pipeline is a bit complicated.

By “get data read to be put into Druid” I mean:

That metadata is what goes into the MySQL database, which also retains data about shards that have been invalidated. (That part is needed because of the MVCC.)

By “build the data segments” I mean:

When things are being done that way, Druid may be regarded as comprising three kinds of servers: Read more

June 14, 2012

Workday update

In August 2010, I wrote about Workday’s interesting technical architecture, highlights of which included:

I caught up with Workday recently, and things have naturally evolved. Most of what we talked about (by my choice) dealt with data management, business intelligence, and the overlap between the two.

It is now reasonable to say that Workday’s servers fall into at least seven tiers, although we talked mainly about five that work together as a kind of giant app/database server amalgamation. The three that do noteworthy data management can be described as:

Two other Workday server tiers may be described as: Read more

June 12, 2012

QlikTech bought Expressor

QlikTech has bought Expressor. Notes on that include:

April 5, 2012

Human real-time

I first became an analyst in 1981. And so I was around for the early days of the movement from batch to interactive computing, as exemplified by:

Of course, wherever there is interactive computing, there is a desire for interaction so fast that users don’t notice any wait time. Dan Fylstra, when he was pitching me the early windowing system VisiOn, characterized this as response so fast that the user didn’t tap his fingers waiting.* And so, with the move to any kind of interactive computing at all came a desire that the interaction be quick-response/low-latency. Read more

March 21, 2012

DataStax Enterprise 2.0

Edit: Multiple errors in the post below have been corrected in a follow-on post about DataStax Enterprise and Cassandra.

My client DataStax is announcing DataStax Enterprise 2.0. The big point of the release is that there’s a bunch of stuff integrated together, including at least:

DataStax stresses that all this runs on the same cluster, with the same administrative tools and so on. For example, on a single cluster:

Read more

March 16, 2012

Juggling analytic databases

I’d like to survey a few related ideas:

Here goes. Read more

March 12, 2012

Kinds of data integration and movement

“Data integration” can mean many different things, to an extent that’s impeding me from writing about the area. So I’ll start by simply laying out some of the myriad ways that data can be brought to where it is needed, and worry about other subjects later. Yes, this is a massive wall of text, and incomplete even so — but that in itself is my central point.

There are two main paradigms for data integration:

Data movement and replication typically take one of three forms:

Beyond the core functions of movement, replication, and/or federation, there are other concerns closely connected to data integration. These include:

In particular, the following are largely different from each other. Read more

January 25, 2012

Departmental analytics — best practices

I believe IT departments should support and encourage departmental analytics efforts, where “support” and “encourage” are not synonyms for “control”, “dominate”, “overwhelm”, or even “tame”. A big part of that is:
Let, and indeed help, departments have the data they want, when they want it, served with blazing performance.

Three things that absolutely should NOT be obstacles to these ends are:

Read more

← Previous PageNext Page →

Feed: DBMS (database management system), DW (data warehousing), BI (business intelligence), and analytics technology Subscribe to the Monash Research feed via RSS or email:

Login

Search our blogs and white papers

Monash Research blogs

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.