Predictive modeling and advanced analytics

Discussion of technologies and vendors in the overlapping areas of predictive analytics, predictive modeling, data mining, machine learning, Monte Carlo analysis, and other “advanced” analytics.

November 1, 2012

More on Cloudera Impala

What I wrote before about Cloudera Impala was quite incomplete. After a followup call, I now feel I have a better handle on the whole thing.

First, some basics:

The general technical idea of Impala is:

Read more

October 29, 2012

Introduction to Continuuity

I chatted with Todd Papaioannou about his new company Continuuity. Todd is as handy at combining buzzwords as he is at concatenating vowels, and so Continuuity — with two “U”s —  is making a big data fabric platform as a service with REST APIs that runs over Hadoop and HBase in the private or public clouds. I found the whole thing confusing, in that:

But all confusion aside, there are some interesting aspects to Continuuity. Read more

September 26, 2012

When should analytics be in-memory?

I was asked today for rules or guidance regarding “analytical problems, situations, or techniques better suited for in-database versus in-memory processing”. There are actually two kinds of distinction to be drawn:

Let’s focus on the first part of that — what work, in principle, should be done in memory?  Read more

September 7, 2012

Integrated internet system design

What are the central challenges in internet system design? We probably all have similar lists, comprising issues such as scale, scale-out, throughput, availability, security, programming ease, UI, or general cost-effectiveness. Screw those up, and you don’t have an internet business.

Much new technology addresses those challenges, with considerable success. But the success is usually one silo at a time — a short-request application here, an analytic database there. When it comes to integration, unsolved problems abound.

The top integration and integration-like challenges for me, from a practical standpoint, are:

Other concerns that get mentioned include:

Let’s skip those latter issues for now, focusing instead on the first four.

Read more

August 19, 2012

In-database analytics — analytic glossary draft entry

This is a draft entry for the DBMS2 analytic glossary. Please comment with any ideas you have for its improvement!

Note: Words and phrases in italics will be linked to other entries when the glossary is complete.

“In-database analytics” is a catch-all term for analytic capabilities, beyond standard SQL, running on the same machine as and under the management of an analytic DBMS. These can run in one or both of two modes:

In-database analytics may offer great performance and scalability advantages versus the alternative of extracting data and having it be processed on a separate server. This is particularly likely to be the case in MPP (Massively Parallel Processing) analytic DBMS environments.

Examples of in-database analytics include:

Other common domains for in-database analytics include sessionization, time series analysis, and relationship analytics.

Notable products offering in-database analytics include:

August 6, 2012

People’s facility with statistics — extremely difficult to predict

My recent post on broadening the usefulness of statistics presupposed two things about the statistical sophistication of business intelligence tool users:

Let me now say a little more on the subject. My basic message is — people’s facility with statistics is extremely difficult to predict.

If you DO have to make a point estimate, however, you could do worse than just putting quotation marks around the last four words of that sentence …

Suppose we measure people’s statistical understanding on a 5-point scale:

  1. People who haven’t clue what a p-value is.
  2. People who think a p-value of .05 signifies a 95% chance of truth.
  3. People who know better than that, but who still think that “statistically significant” is pretty close to the same as “true”.
  4. People who know better yet, but aren’t fluent in using statistical techniques correctly.
  5. People who are fluent in statistics.

Just knowing somebody’s job description, can you confidently predict their ranking to within, say, +/- 1 point? I suggest you can’t. People differ wildly in general numeracy and in specific statistical knowledge.

Even our guesses about average knowledge may be off, not least because education is changing things. Read more

July 31, 2012

Integrating statistical analysis into business intelligence

Business intelligence tools have been around for two decades.* In that time, many people have had the idea of integrating statistical analysis into classical BI. Yet I can’t think of a single example that was more than a small, niche success.

*Or four decades, if you count predecessor technologies.

The first challenge, I think, lies in the paradigm. Three choices that come to mind are:

But the first of those approaches requires too much intelligence from the software, while the third requires too much numeracy from the users. So only the second option has a reasonable chance to work, and even that one may be hard to pull off unless vendors focus on one vertical market at a time.

The challenges in full automation start: Read more

July 28, 2012

Some Vertica 6 features

Vertica 6 was recently announced, and so it seemed like a good time to catch up on Vertica features. The main topics I want to address are:

Also:

In general, the main themes of Vertica 6 appear to be:

Let’s do the analytic functionality first. Notes on that include:

I’ll also take this opportunity to expand on something I wrote about a few vendors — including Vertica — at the end of my post on approximate query results. When I probed how customers of Vertica and other RDBMS-based analytic platform vendors used vendor-proprietary advanced analytic SQL and other analytic capabilities, answers included: Read more

May 28, 2012

Quick-turnaround predictive modeling

Last November, I wrote two posts on agile predictive analytics. It’s time to return to the subject. I’m used to KXEN talking about the ability to do predictive modeling, very quickly, perhaps without professional statisticians; that the core of what KXEN does. But I was surprised when Revolution Analytics told me a similar story, based on a different approach, because ordinarily that’s not how R is used at all.

Ultimately, there seem to be three reasons why you’d want quick turnaround on your predictive modeling: Read more

May 21, 2012

Cool analytic stories

There are several reasons it’s hard to confirm great analytic user stories. First, there aren’t as many jaw-dropping use cases as one might think. For as I wrote about performance, new technology tends to make things better, but not radically so. After all, if its applications are …

… all that bloody important, then probably people have already been making do to get it done as best they can, even in an inferior way.

Further, some of the best stories are hard to confirm; even the famed beer/diapers story isn’t really true. Many application areas are hard to nail down due to confidentiality, especially but not only in such “adversarial” domains as anti-terrorism, anti-spam, or anti-fraud.

Even so, I have two questions in my inbox that boil down to “What are the coolest or most significant analytics stories out there?” So let’s round up some of what I know. Read more

← Previous PageNext Page →

Feed: DBMS (database management system), DW (data warehousing), BI (business intelligence), and analytics technology Subscribe to the Monash Research feed via RSS or email:

Login

Search our blogs and white papers

Monash Research blogs

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.