October 13, 2014

Context for Cloudera

Hadoop World/Strata is this week, so of course my clients at Cloudera will have a bunch of announcements. Without front-running those, I think it might be interesting to review the current state of the Cloudera product line. Details may be found on the Cloudera product comparison page. Examining those details helps, I think, with understanding where Cloudera does and doesn’t place sales and marketing focus, which given Cloudera’s Hadoop market stature is in my opinion an interesting thing to analyze.

So far as I can tell (and there may be some errors in this, as Cloudera is not always accurate in explaining the fine details):

In analyzing all this, I’m focused on two particular aspects:

Read more

October 10, 2014

Notes on predictive modeling, October 10, 2014

As planned, I’m getting more active in predictive modeling. Anyhow …

1. I still believe most of what I said in a July, 2013 predictive modeling catch-all post. However, I haven’t heard as much subsequently about Ayasdi as I had expected to.

2. The most controversial part of that post was probably the claim:

I think the predictive modeling state of the art has become:

  • Cluster in some way.
  • Model separately on each cluster.

In particular:

3. Nutonian is now a client. I just had my first meeting with them this week. To a first approximation, they’re somewhat like KXEN (sophisticated math, non-linear models, ease of modeling, quasi-automagic feature selection), but with differences that start: Read more

October 5, 2014

Spark vs. Tez, revisited

I’m on record as noting and agreeing with an industry near-consensus that Spark, rather than Tez, will be the replacement for Hadoop MapReduce. I presumed that Hortonworks, which is pushing Tez, disagreed. But Shaun Connolly of Hortonworks suggested a more nuanced view. Specifically, Shaun tweeted thoughts including:

Tez vs Spark = Apples vs Oranges.

Spark is general-purpose engine with elegant APIs for app devs creating modern data-driven apps, analytics, and ML algos.

Tez is a framework for expressing purpose-built YARN-based DAGs; its APIs are for ISVs & engine/tool builders who embed it

[For example], Hive embeds Tez to convert its SQL needs into purpose-built DAGs expressed optimally and leveraging YARN

That said, I haven’t yet had a chance to understand what advantages Tez might have over Spark in the use cases that Shaun relegates it to.

Related link

October 5, 2014

Streaming for Hadoop

The genesis of this post is that:

Of course, we should hardly assume that what the Hadoop distro vendors favor will be the be-all and end-all of streaming. But they are likely to at least be influential players in the area.

In the parts of the problem that Cloudera emphasizes, the main tasks that need to be addressed are: Read more

September 28, 2014

Some stuff on my mind, September 28, 2014

1. I wish I had some good, practical ideas about how to make a political difference around privacy and surveillance. Nothing else we discuss here is remotely as important. I presumably can contribute an opinion piece to, more or less, the technology publication(s) of my choice; that can have a small bit of impact. But I’d love to do better than that. Ideas, anybody?

2. A few thoughts on cloud, colocation, etc.:

3. As for the analytic DBMS industry: Read more

September 7, 2014

An idealized log management and analysis system — from whom?

I’ve talked with many companies recently that believe they are:

At best, I think such competitive claims are overwrought. Still, it’s a genuinely important subject and opportunity, so let’s consider what a great log management and analysis system might look like.

Much of this discussion could apply to machine-generated data in general. But right now I think more players are doing product management with an explicit conception either of log management or event-series analytics, so for this post I’ll share that focus too.

A short answer might be “Splunk, but with more analytic functionality and more scalable performance, at lower cost, plus numerous coupons for free pizza.” A more constructive and bottoms-up approach might start with:  Read more

August 31, 2014

Notes from a visit to Teradata

I spent a day with Teradata in Rancho Bernardo last week. Most of what we discussed is confidential, but I think the non-confidential parts and my general impressions add up to enough for a post.

First, let’s catch up with some personnel gossip. So far as I can tell:

The biggest change in my general impressions about Teradata is that they’re having smart thoughts about the cloud. At least, Oliver is. All details are confidential, and I wouldn’t necessarily expect them to become clear even in October (which once again is the month for Teradata’s user conference). My main concern about all that is whether Teradata’s engineering team can successfully execute on Oliver’s directives. I’m optimistic, but I don’t have a lot of detail to support my good feelings.

In some quick-and-dirty positioning and sales qualification notes, which crystallize what we already knew before:

Also: Read more

August 14, 2014

“Freeing business analysts from IT”

Many of the companies I talk with boast of freeing business analysts from reliance on IT. This, to put it mildly, is not a unique value proposition. As I wrote in 2012, when I went on a history of analytics posting kick,

  • Most interesting analytic software has been adopted first and foremost at the departmental level.
  • People seem to be forgetting that fact.

In particular, I would argue that the following analytic technologies started and prospered largely through departmental adoption:

  • Fourth-generation languages (the analytically-focused ones, which in fact started out being consumed on a remote/time-sharing basis)
  • Electronic spreadsheets
  • 1990s-era business intelligence
  • Dashboards
  • Fancy-visualization business intelligence
  • Planning/budgeting
  • Predictive analytics
  • Text analytics
  • Rules engines

What brings me back to the topic is conversations I had this week with Paxata and Metanautix. The Paxata story starts:

Metanautix seems to aspire to a more complete full-analytic-stack-without-IT kind of story, but clearly sees the data preparation part as a big part of its value.

If there’s anything new about such stories, it has to be on the transformation side; BI tools have been helping with data extraction since — well, since the dawn of BI. Read more

August 7, 2014

Actian Vector Hadoop Edition

I have a small blacklist of companies I won’t talk with because of their particularly unethical past behavior. Actian is one such; they evidently made stuff up about me that Josh Berkus gullibly posted for them, and I don’t want to have conversations that could be dishonestly used against me.

That said, Peter Boncz isn’t exactly an Actian employee. Rather, he’s the professor who supervised Marcin Zukowski’s PhD thesis that became Vectorwise, and I chatted with Peter by Skype while he was at home in Amsterdam. I believe his assurances that no Actian personnel sat in on the call. 🙂

In other news, Peter is currently working on and optimistic about HyPer. But we literally spent less than a minute talking about that

Before I get to the substance, there’s been a lot of renaming at Actian. To quote Andrew Brust,

… the ParAccel, Pervasive and Vectorwise technologies are being unified under the Actian Analytics Platform brand. Specifically, the ParAccel technology … is being re-branded Actian Matrix; Pervasive’s technologies are rechristened Actian DataFlow and Actian DataConnect; and Vectorwise becomes Actian Vector.

and

Actian … is now “one company, with one voice and one platform” according to its John Santaferraro

The bolded part of the latter quote is untrue — at least in the ordinary sense of the word “one” — but the rest can presumably be taken as company gospel.

All this is by way of preamble to saying that Peter reached out to me about Actian’s new Vector Hadoop Edition when he blogged about it last June, and we finally talked this week. Highlights include:  Read more

July 23, 2014

Teradata bought Hadapt and Revelytix

My client Teradata bought my (former) clients Revelytix and Hadapt.* Obviously, I’m in confidentiality up to my eyeballs. That said — Teradata truly doesn’t know what it’s going to do with those acquisitions yet. Indeed, the acquisitions are too new for Teradata to have fully reviewed the code and so on, let alone made strategic decisions informed by that review. So while this is just a guess, I conjecture Teradata won’t say anything concrete until at least September, although I do expect some kind of stated direction in time for its October user conference.

*I love my business, but it does have one distressing aspect, namely the combination of subscription pricing and customer churn. When your customers transform really quickly, or even go out of existence, so sometimes does their reliance on you.

I’ve written extensively about Hadapt, but to review:

As for what Teradata should do with Hadapt: Read more

← Previous PageNext Page →

Feed: DBMS (database management system), DW (data warehousing), BI (business intelligence), and analytics technology Subscribe to the Monash Research feed via RSS or email:

Login

Search our blogs and white papers

Monash Research blogs

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.