Datameer at the time of Datameer 5.0
Datameer checked in, having recently announced general availability of Datameer 5.0. So far as I understood, Datameer is still clearly in the investigative analytics business, in that:
- Datameer does business intelligence, but not at human real-time speeds. Datameer query durations are sometimes sub-minute, but surely not sub-second.
- Datameer also does lightweight predictive analytics/machine learning — k-means clustering, decision trees, and so on.
Key aspects include:
- Datameer runs straight against Hadoop.
- Like many other analytic offerings, Datameer is meant to be “self-service”, for line-of-business business analysts, and includes some “data preparation”. Datameer also has had some data profiling since Datameer 4.0.
- The main way of interacting with Datameer seems to be visual analytic programming. However, I think Datameer has evolved somewhat away from its original spreadsheet metaphor.
- Datameer’s primitives resemble those you’d find in SQL (e.g. JOINs, GROUPBYs). More precisely, that would be SQL with a sessionization extension; e.g., there’s a function called GROUPBYGAP.
- Datameer lets you write derived data back into Hadoop.
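To make the sessionization point concrete, here is a minimal Python sketch of gap-based grouping in general. It is not Datameer's GROUPBYGAP implementation, whose exact semantics I haven't confirmed; the function name `sessionize`, the `(user, timestamp)` event shape, and the 30-minute gap are all assumptions for illustration.

```python
from itertools import groupby
from operator import itemgetter


def sessionize(events, max_gap_seconds):
    """Tag each (user, timestamp) event with a per-user session number.

    Hypothetical helper: a new session starts whenever the gap since the
    same user's previous event exceeds max_gap_seconds.
    """
    result = []
    # Sort by user, then by timestamp, so each user's events are contiguous.
    ordered = sorted(events, key=itemgetter(0, 1))
    for user, user_events in groupby(ordered, key=itemgetter(0)):
        session = 0
        previous = None
        for _, ts in user_events:
            if previous is not None and ts - previous > max_gap_seconds:
                session += 1  # gap too large: start the next session
            result.append((user, ts, session))
            previous = ts
    return result


# Made-up sample data: timestamps are seconds since some arbitrary start.
events = [("a", 0), ("a", 100), ("a", 2000), ("b", 50), ("a", 2100)]
sessions = sessionize(events, max_gap_seconds=1800)
```

Plain SQL GROUP BY can't express this directly, because group membership depends on the distance between neighboring rows rather than on a column value. That is presumably why it shows up as an extension.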
Datameer use cases sound like the usual mix, consisting mainly of a lot of customer analytics, a bit of anti-fraud, and some operational analytics/internet-of-things. Datameer claims 200 customers and 240 installations, the majority of which are low-end/single-node users, but at least one of which is a multi-million dollar relationship. I don’t think those figures include OEM sell-through. I forgot to ask for any company size metrics, such as headcount.
In a chargeable add-on, Datameer 5.0 has an interesting approach to execution. (The lower-cost version just uses MapReduce.)
- An overall task can of course be regarded as a DAG (Directed Acyclic Graph).
- Datameer automagically picks an execution strategy for each node. Administrator hints are allowed.
- There are currently three choices for execution: MapReduce, clustered in-memory, or single-node. This all works over Tez and YARN.
- Spark is a likely future option.
Datameer calls this “Smart Execution”. Notes on Smart Execution include:
- Datameer sees a lot of tasks that look at 10-100 megabytes of data, especially in malware/anomaly detection. Datameer believes there can be a huge speed-up from running those on a single node rather than in a clustered mode requiring data (re)distribution, with at least one customer reporting a >20X speedup on at least one job.
- Yes, each step of the overall DAG might look to the underlying execution engine as a DAG of its own.
- Tez can fire up processes ahead of when they’re needed, so you don’t have to wait for all the process start-up delays in series.
- Datameer had a sampling/preview engine from the get-go that ran outside of Hadoop MapReduce. That’s the basis for the non-MapReduce options now.
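The general idea of picking an engine per DAG node can be sketched as follows. This is a toy illustration only. I don't know Datameer's actual selection logic, so the size thresholds, the `Step` class, and the rule that an administrator hint always wins are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical cutoffs, not Datameer's. The 100 MB figure echoes the
# task sizes mentioned above; the in-memory limit is simply invented.
SINGLE_NODE_MAX_BYTES = 100 * 1024**2
IN_MEMORY_MAX_BYTES = 50 * 1024**3


@dataclass
class Step:
    """One node of the overall task DAG (illustrative structure)."""
    name: str
    input_bytes: int
    hint: Optional[str] = None  # optional administrator override


def choose_engine(step):
    """Pick one of the three execution choices for a single step."""
    if step.hint is not None:
        return step.hint  # assumption: a hint always wins
    if step.input_bytes <= SINGLE_NODE_MAX_BYTES:
        return "single-node"
    if step.input_bytes <= IN_MEMORY_MAX_BYTES:
        return "clustered-in-memory"
    return "mapreduce"


def plan(steps):
    """Map each step name to its chosen engine."""
    return {s.name: choose_engine(s) for s in steps}


steps = [
    Step("load_logs", 2 * 1024**4),          # 2 TB
    Step("filter_suspicious", 20 * 1024**3), # 20 GB
    Step("score_anomalies", 40 * 1024**2),   # 40 MB
    Step("export", 40 * 1024**2, hint="mapreduce"),
]
execution_plan = plan(steps)
```

A real optimizer would presumably weigh more than input size, such as operator type, available memory, and the cost of moving data between steps.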
Strictly from a BI standpoint, Datameer seems clunky.
- Datameer doesn’t have drilldown.
- Datameer certainly doesn’t let you navigate from one visualization to the next à la QlikView/Tableau/et al. (Note to self: I really need to settle on a name for that feature.)
- While Datameer does have a bit in the way of event series visualization, it seems limited.
- Of course, Datameer doesn’t have streaming-oriented visualizations.
- I’m not aware of any kind of text search navigation.
Datameer does let you publish BI artifacts, but doesn’t seem to have any collaboration features beyond that.
Last and also least: In an earlier positioning, Datameer made a big fuss about an online app store. Since analytic app stores never amount to much, I scoffed.* That said, they do have it, so I asked which apps got the most uptake. Most of them seem to be apps which boil down to connectors, access to outside data sets, and/or tutorials. Also mentioned were two more substantive apps, one for path-oriented clickstream analysis, and one for funnel analysis combining several event series.
*I once had a conversation with a client that ended:
- “This app store you’re proposing will not be a significant success.”
- “Are you sure?”
- “Almost certain. It really just sounds like StreamBase’s.”
- “I’m not familiar with StreamBase’s app store.”
- “My point exactly.”
Comments
7 Responses to “Datameer at the time of Datameer 5.0”
“name for that feature” = “click across”.
Interesting idea, Dave. Doesn’t seem to be used much in that context, however.
Hi Curt, thanks as ever for an insightful article. Can you please elaborate a bit on the below statement?
“However, I Datameer has evolved somewhat away from its original spreadsheet metaphor.”
Martin,
I don’t think Datameer’s UI is particularly spreadsheet-like at this time. Do you disagree? I’ll confess to not having spent a lot of time with it.
I don’t have anything to compare against; version 5 is the first time I am looking at Datameer. But I would say the majority of analyst time is spent in the Workbook view, which seems fairly spreadsheet-like. Do you see Datameer moving away from this paradigm going forward?
Hi Curt,
What in your opinion makes an analytics app store infeasible?
Thanks,
Fiorina
Hi Fiorina,
Market size. Doing a particular analysis in a particular way using a particular base technology is rarely something for which there’s enough demand to support a major business.