Analytic technologies

Discussion of technologies related to information query and analysis. Related subjects include:

Business intelligence
Data warehousing
(in Text Technologies) Text mining
(in The Monash Report) Data mining
(in The Monash Report) General issues in analytic technology

November 5, 2012

Real-time confusion

I recently proposed a 2×2 matrix of BI use cases:

Is there an operational business process involved?
Is there a focus on root cause analysis?

Let me now introduce another 2×2 matrix of analytic scenarios:

Is there a compelling need for super-fresh data?
Who’s consuming the results — humans or machines?

My point is that there are at least three different cool things people might think about when they want their analytics to be very fast:

Fast investigative analytics — e.g., business intelligence with great query response.
Computations on very fresh data, presented to humans — e.g., “heartbeat” graphics monitoring a network.
Computations on very fresh data, presented back to a machine — e.g., a recommendation engine that makes good use of data about a user’s last few seconds of actions.

There’s also one more scenario, slightly boring, that nonetheless drives a lot of important applications: Read more

Categories: Business intelligence, Games and virtual worlds, Log analysis, Predictive modeling and advanced analytics, Splunk, Streaming and complex event processing (CEP), WibiData

November 1, 2012

Introduction to Continuuity

I chatted with Todd Papaioannou about his new company Continuuity. Todd is as handy at combining buzzwords as he is at concatenating vowels, and so Continuuity — with two “U”s — is making a big data fabric platform as a service with REST APIs that runs over Hadoop and HBase in the private or public clouds. I found the whole thing confusing, in that:

I recoil against buzzwords. In particular …
… I pay as little attention to distinctions among PaaS/IaaS/WaaS — Platform/Infrastructure/Whatever as a Service — as I can.
The Continuuity story sounds Heroku-like, but Todd doesn’t want Continuuity compared to Heroku.
Todd does want Continuuity discussed in terms of the application server category, but:
- It is hard to discuss app servers without segueing quickly amongst development, deployment, and data connectivity, and Continuuity is no exception to that rule.
- There is doubt as to whether using app servers makes any sense.

But all confusion aside, there are some interesting aspects to Continuuity. Read more

Categories: Application servers, Cloud computing, Hadoop, HBase, MapReduce, Parallelization, Predictive modeling and advanced analytics, Software as a Service (SaaS)

October 24, 2012

Introduction to Cirro

Stuart Frost, of DATAllegro fame, has started a small family of companies, and they’ve become my clients sort of as a group. The first one that I’m choosing to write about is Cirro, for which the basics are:

Cirro does data federation for analytics.
Cirro has 10 full-time people plus 4 part-timers.
Cirro launched its product in June.
Cirro doesn’t have customers yet, but hopes to fix that soon.

Data federation stories are often hard to understand because, until you drill down, they implausibly sound as if they do everything for everybody. That said, it’s reasonable to think of Cirro as a layer between Hadoop and your BI tool that:

Helps with data transformations.
Helps join Hadoop data to relational tables, even if the joins are large ones.

In both cases, Cirro is calling on your data management software for help, RDBMS or Hadoop as the case may be.

More precisely, Cirro’s approach is: Read more

Categories: Business intelligence, Cirro, Data integration and middleware, Hadoop, MapReduce, Tableau Software

October 23, 2012

Introduction to Platfora

When I wrote last week that I have at least 5 clients claiming they’re uniquely positioned to support BI over Hadoop (most of whom partner with a 6th client, Tableau), the non-partnering exception I had in mind was Platfora, Ben Werther’s oh-so-stealthy startup that is finally de-stealthing today. Platfora combines:

An interesting approach to analytic data management.
Business intelligence tools integrated with that.

The whole thing sounds like a perhaps more general and certainly non-SaaS version of what Metamarkets has been offering for a while.

The Platfora technical story starts: Read more

Categories: Business intelligence, Columnar database management, Data models and architecture, Data warehousing, Database compression, Hadoop, Memory-centric data management, Platfora

October 18, 2012

Notes on Hadoop adoption and trends

With Strata/Hadoop World being next week, there is much Hadoop discussion. One theme of the season is BI over Hadoop. I have at least 5 clients claiming they’re uniquely positioned to support that (most of whom partner with a 6th client, Tableau); the first 2 whose offerings I’ve actually written about are Teradata Aster and Hadapt. More generally, I’m hearing “Using Hadoop is hard; we’re here to make it easier for you.”

If enterprises aren’t yet happily running business intelligence against Hadoop, what are they doing with it instead? I took the opportunity to ask Cloudera, whose answers didn’t contradict anything I’m hearing elsewhere. As Cloudera tells it (approximately — this part of the conversation* was rushed): Read more

Categories: Business intelligence, Cloudera, EAI, EII, ETL, ELT, ETLT, Hadoop, HBase, Health care, Investment research and trading, MapR, Market share and customer counts, Telecommunications, Web analytics

October 17, 2012

Notes on analytic hardware

I took the opportunity of Teradata’s Aster/Hadoop appliance announcement to catch up with Teradata hardware chief Carson Schmidt. I love talking with Carson, about both general design philosophy and his views on specific hardware component technologies.

From a hardware-requirements standpoint, Carson seems to view Aster and Hadoop as more similar to each other than either is to, say, a Teradata Active Data Warehouse. In particular, for Aster and Hadoop:

I/O is more sequential.
The CPU:I/O ratio is higher.
Uptime is a little less crucial.

The most obvious implications are differences in the choice of parts and in their ratios. Also, in the new Aster/Hadoop appliance, Carson is content to skate by with RAID 5 rather than RAID 1.

I think Carson’s views about flash memory can be reasonably summarized as: Read more

Categories: Aster Data, Data warehouse appliances, Data warehousing, Hadoop, Solid-state memory, Storage, Teradata

October 16, 2012

Hadapt Version 2

My clients at Hadapt are coming out with a Version 2 to be available in Q1 2013, and perhaps slipstreaming some of the features before then. At that point, it will be reasonable to regard Hadapt as offering:

A very tight integration between an RDBMS-based analytic platform and Hadoop …
… that is decidedly immature as an analytic RDBMS …
… but which strongly improves the SQL capabilities of Hadoop (vs., say, the alternative of using Hive).

Solr is in the mix as well.

Hadapt+Hadoop is positioned much more as “better than Hadoop” than as “a better scale-out RDBMS”, and rightly so, due to its limitations when viewed strictly from an analytic RDBMS standpoint. I.e., Hadapt is meant for enterprises that want to do several of:

Dump multi-structured data into Hadoop.
Refine or just move some of it into an RDBMS.
Bring in data from other RDBMS.
Process all of the above via Hadoop MapReduce.
Process all of the above via SQL.
Use full-text indexes on the data.

Hadapt has 6 or so production customers, a dozen or so more coming online soon, 35 or so employees (mainly in Cambridge or Poland), reasonable amounts of venture capital, and the involvement of a variety of industry luminaries. Hadapt’s biggest installation seems to have 10s of terabytes of relational data and 100s of TBs of multi-structured; Hadapt is very confident in its ability to scale an order of magnitude beyond that with the Version 2 product, and reasonably confident it could go even further.

At the highest level, Hadapt works like this: Read more

Categories: Analytic technologies, Cloudera, Columnar database management, Data models and architecture, Data warehousing, Hadapt, Hadoop, MapR, MapReduce, Market share and customer counts, SQL/Hadoop integration, Text

October 15, 2012

What is meant by “iterative analytics”

A number of people and companies are using the term “iterative analytics”. This is confusing, because it can mean at least three different things:

1. You analyze something quickly, decide the result is not wholly satisfactory, and try again. Examples might include:
- Aggressive use of drilldown, perhaps via an advanced-interface business intelligence tool such as Tableau or QlikView.
- Any case where you run a query or a model, think about the results, and run another one after that.
2. You develop an intermediate analytic result and use it as input to the next round of analysis. This is roughly equivalent to saying that iterative analytics refers to a multi-step analytic process involving a lot of derived data.
3. #1 and #2 conflated/combined. This is roughly equivalent to saying that iterative analytics refers to all of investigative analytics, sometimes known instead as exploratory analytics.

Based both on my personal conversations and a quick Google check, it seems reasonable to say that #1 and #3 are the most common usages, with #2 trailing a little behind.

But often it’s hard to be sure which of the various possible meanings somebody has in mind.
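To make meaning #2 a bit more concrete, here is a minimal Python sketch of a multi-step analysis in which each round consumes the derived output of the round before it rather than the raw data. The event data, the function names, and the above/below-average segmentation rule are all hypothetical, chosen only for illustration; they don't come from any particular product.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw event data: (user_id, purchase_amount) pairs.
raw_events = [
    ("alice", 30.0), ("bob", 5.0), ("alice", 45.0),
    ("carol", 12.0), ("bob", 7.5), ("carol", 80.0),
]

def total_per_user(events):
    """Round 1: derive per-user spend totals from the raw events."""
    totals = defaultdict(float)
    for user, amount in events:
        totals[user] += amount
    return dict(totals)

def segment_users(totals):
    """Round 2: work only from the derived totals, labelling each user
    'high' or 'low' relative to the average total (an assumed rule)."""
    avg = mean(totals.values())
    return {user: ("high" if t > avg else "low") for user, t in totals.items()}

def spend_by_segment(segments, totals):
    """Round 3: analyze the derived segments themselves."""
    summary = defaultdict(float)
    for user, seg in segments.items():
        summary[seg] += totals[user]
    return dict(summary)

# Each step's output is the next step's input.
totals = total_per_user(raw_events)
segments = segment_users(totals)
summary = spend_by_segment(segments, totals)
print(totals)    # {'alice': 75.0, 'bob': 12.5, 'carol': 92.0}
print(segments)  # {'alice': 'high', 'bob': 'low', 'carol': 'high'}
print(summary)   # {'high': 167.0, 'low': 12.5}
```

Each round reads the previous round's derived data, which is what separates this sense of "iterative" from simply rerunning a query after looking at its result (#1).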

Related links

Monash’s First and Third Laws of Commercial Semantics state:

Categories: Analytic technologies, Business intelligence, QlikTech and QlikView, Tableau Software

October 1, 2012

Notes on the Oracle OpenWorld Sunday keynote

I’m not at Oracle OpenWorld, but as usual that won’t keep me from commenting. My bottom line on the first night’s announcements is:

At many large enterprises, Oracle has a lock on much of their IT efforts. (But not necessarily in the internet or investigative analytics areas.) Tonight’s announcements serve to strengthen that.
Tonight’s announcements do little to help Oracle in other market segments.

In particular:

1. At the highest level, my view of Oracle’s strategy is the same as it’s been for several years:

Clayton Christensen’s The Innovator’s Solution teaches us that Oracle should focus on selling a thick stack of technology to its highest-end customers, and that’s exactly what Oracle does focus on.

2. Tonight’s news is closely in line with what Oracle’s Juan Loaiza told me three years ago, especially:

Oracle thinks flash memory is the most important hardware technology of the decade, one that could lead to Oracle being “bumped off” if they don’t get it right.

Juan believes the “bulk” of Oracle’s business will move over to Exadata-like technology over the next 5-10 years. Numbers-wise, this seems to be based more on Exadata being a platform for consolidating an enterprise’s many Oracle databases than it is on Exadata running a few Especially Big Honking Database management tasks.

3. Oracle is confusing people with its comments on multi-tenancy. I suspect:

What Oracle is talking about when it says “multi-tenancy” is more like consolidation than true multi-tenancy.
Probably there are a couple of true multi-tenancy features as well.

4. SaaS (Software as a Service) vendors don’t want to use Oracle, because they don’t want to pay for it.* This limits the potential impact of Oracle’s true multi-tenancy features. Even so: Read more

Categories: Business intelligence, Cloud computing, Columnar database management, Data warehouse appliances, Data warehousing, Exadata, Memory-centric data management, Oracle, Software as a Service (SaaS), Solid-state memory, Storage

Monash Research blogs

DBMS 2 covers database management, analytics, and related technologies.
Text Technologies covers text mining, search, and social software.
Strategic Messaging analyzes marketing and messaging strategy.
The Monash Report examines technology and public policy issues.
Software Memories recounts the history of the software industry.

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.

Links
- Monash Research
- White Papers