Notes on analytic technology, May 13, 2015
1. There are multiple ways in which analytics is inherently modular. For example:
- Business intelligence tools can reasonably be viewed as application development tools. But the “applications” may be developed one report at a time.
- The point of a predictive modeling exercise may be to develop a single scoring function that is then integrated into a pre-existing operational application.
- Conversely, a recommendation-driven website may be developed a few pages — and hence also a few recommendations — at a time.
Also, analytics is inherently iterative.
- Everything I just called “modular” can reasonably be called “iterative” as well.
- So can any work process of the nature “OK, we got an insight. Let’s pursue it and get more accuracy.”
If I’m right that analytics is or at least should be modular and iterative, it’s easy to see why people hate multi-year data warehouse creation projects. Perhaps it’s also easy to see why I like the idea of schema-on-need.
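The core idea behind schema-on-need — defer schema work until some analysis actually asks for a column, rather than designing the whole warehouse schema up front — can be sketched in a few lines. This is a hypothetical illustration, not any vendor's implementation; the `SchemaOnNeed` class and sample events are invented for the example:

```python
import json

# Raw events land as JSON text; no schema is imposed up front.
raw_events = [
    '{"user": "a", "page": "/home", "ms": 120}',
    '{"user": "b", "page": "/cart", "ms": 340}',
    '{"user": "a", "page": "/buy",  "ms": 95}',
]

class SchemaOnNeed:
    """Materialize a typed column only the first time a query needs it."""
    def __init__(self, raw):
        self.raw = raw
        self.columns = {}  # column name -> list of extracted values

    def column(self, name):
        if name not in self.columns:  # schema work deferred to first use
            self.columns[name] = [json.loads(r).get(name) for r in self.raw]
        return self.columns[name]

store = SchemaOnNeed(raw_events)
print(store.column("ms"))     # parsed on demand the first time it's asked for
print(list(store.columns))    # only "ms" has been materialized so far
```

Each new analytical question can pull in another column, which is exactly the modular/iterative rhythm described above.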
2. In 2011, I wrote, in the context of agile predictive analytics, that
… the “business analyst” role should be expanded beyond BI and planning to include lightweight predictive analytics as well.
I gather that a similar point is at the heart of Gartner’s new term citizen data scientist. I am told that the term resonates with at least some enterprises.
3. Speaking of Gartner, Mark Beyer tweeted
In data management’s future “hybrid” becomes a useless term. Data management is mutable, location agnostic and services oriented.
I replied
And that’s why I launched DBMS2 a decade ago, for “DataBase Management System SERVICES”. 🙂
A post earlier this year offers a strong clue as to why Mark’s tweet was at least directionally correct: The best structures for writing data are the worst for query, and vice-versa.
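That write/query tension is easy to make concrete. A toy sketch (the layouts and records below are invented for illustration): a row layout makes inserting a record a single append, while a columnar layout must touch every column per insert — but the columnar layout lets a query scan just the one field it cares about.

```python
# Hypothetical sketch: the same records in a write-friendly row layout
# versus a query-friendly columnar layout.

rows = [  # row store: one append per new record -- cheap to write
    ("a", "/home", 120),
    ("b", "/cart", 340),
]

cols = {  # column store: contiguous values per field -- cheap to scan
    "user": ["a", "b"],
    "page": ["/home", "/cart"],
    "ms":   [120, 340],
}

# Writing: the row store appends once; the column store touches every column.
rows.append(("a", "/buy", 95))
for field, value in zip(("user", "page", "ms"), ("a", "/buy", 95)):
    cols[field].append(value)

# Querying one field: the column store reads only the "ms" list,
# while the row store must walk every whole record.
total_row_store = sum(r[2] for r in rows)
total_col_store = sum(cols["ms"])
assert total_row_store == total_col_store == 555
```

Since no single layout wins at both ends, it is natural for "data management" to decompose into cooperating services rather than one monolithic structure.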
4. The foregoing notwithstanding, I continue to believe that there’s a large place in the world for “full-stack” analytics. Of course, some stacks are fuller than others, with SaaS (Software as a Service) offerings probably being the only true complete-stack products.
5. Speaking of full-stack vendors, some of the thoughts in this post were sparked by a recent conversation with Platfora. Platfora, of course, is full-stack except for the Hadoop underneath. They’ve taken to saying “data lake” instead of Hadoop, because they believe:
- It’s a benefits-oriented term rather than a geek-oriented one.
- It seems to be more popular than the roughly equivalent terms “data hub” or “data reservoir”.
6. Platfora is coy about metrics, but does boast of high growth, and had >100 employees earlier this year. However, they are refreshingly precise about competition, saying they primarily see four competitors — Tableau, SAS Visual Analytics, Datameer (“sometimes”), and Oracle Data Discovery (which they view as flatteringly imitative of them).
Platfora seems to have a classic BI “land-and-expand” kind of model, with initial installations commonly being a few servers and a few terabytes. Applications cited were the usual suspects — customer analytics, clickstream, and compliance/governance. But they do have some big customer/big database stories as well, including:
- 100s of terabytes or more (but with a “lens” typically being 5 TB or less).
- 4-5 customers who pressed them to break a previous cap of 2 billion discrete values.
7. Another full-stack vendor, ScalingData, has been renamed Rocana, for “root cause analysis”. I’m hearing broader support for their ideas about BI/predictive modeling integration. For example, Platfora has something similar on its roadmap.
Related links
- I did a kind of analytics overview last month, which had a whole lot of links in it. This post is meant to be additive to that one.
Comments
Thanks Curt — great to catch up. Fully on board with your comments here, and I wanted to re-emphasize that our customers increasingly distinguish between ‘traditional data discovery’ (à la Tableau), which supports quick visualization against prepared SQL sources, and ‘big data discovery’ (à la Platfora), which allows regular analysts to look at patterns of behavior around customers, products, security threats, etc. across diverse large datasets in the data lake. The latter really requires superpowering the analyst with a platform that lets them interactively and visually connect the dots down to raw datasets in Hadoop, and that weaves together data prep, in-memory acceleration, and visual analysis into one seamless end-to-end experience. Litmus test: can a non-technical analyst point at a petabyte of raw customer-related data in Hadoop (clickstream, social, loyalty, etc.) and answer meaningful multi-channel or behavioral/segmentation questions with interactive performance in an afternoon, without IT involvement?
Ben, are you claiming that your product can be pointed at some bytes on Hadoop and it magically understands the data and provides analytics? I’m sorry, but let’s not turn this into a marketing forum. And no, you cannot put a petabyte of data on any platform and get interactive performance on every question, so let’s not go there; let’s talk facts.
Please come with facts: explain a clear use case with a PB-scale dataset, what questions you answered, and on what kind of hardware.
Thanks