Soft robots, Part 1 — introduction
There may be no other subject on which I’m so potentially biased as robotics, given that:
- I don’t spend a lot of time on the area, but …
- … one of the better robotics engineers in the world (Kevin Albert) just happens to be in my family …
- … and thus he’s overwhelmingly my main source on the general subject of robots.
Still, I’m solely responsible for my own posts and opinions, while Kevin is busy running his startup (Pneubotics) and raising my grandson. And by the way — I’ve been watching the robotics industry slightly longer than Kevin has been alive. 🙂
My overview messages about all this are:
- Historically, robots have been very limited in their scope of motion and action. Indeed, most successful robots to date have been immobile, metallic programmable machines, serving on classic assembly lines.
- Next-generation robots should and will be much more able to safely and effectively navigate through and work within general human-centric environments.
- This will affect a variety of application areas in ways that readers of this blog may care about.
Where the innovation is
I hoped to write a reasonable overview of current- to medium-term future IT innovation. Yeah, right. 🙂 But if we abandon any hope that this post could be comprehensive, I can at least say:
1. Back in 2011, I ranted against the term Big Data, but expressed more fondness for the V words — Volume, Velocity, Variety and Variability. That said, when it comes to data management and movement, solutions to the V problems have generally been sketched out.
- Volume has been solved. There are Hadoop installations with 100s of petabytes of data, analytic RDBMS with 10s of petabytes, general-purpose Exadata sites with petabytes, and 10s/100s of petabytes of analytic Accumulo at the NSA. Further examples abound.
- Velocity is being solved. My recent post on Hadoop-based streaming suggests how. In other use cases, velocity is addressed via memory-centric RDBMS.
- Variety and Variability have been solved. MongoDB, Cassandra and perhaps others are strong NoSQL choices. Schema-on-need is in earlier days, but may help too. (A sketch of the schema-on-need idea follows this list.)
2. Even so, there’s much room for innovation around data movement and management. I’d start with:
- Product maturity is a huge issue for all the above, and will remain one for years.
- Hadoop and Spark show that application execution engines:
- Have a lot of innovation ahead of them.
- Are tightly entwined with data management, and with data movement as well.
- Hadoop is due for another refactoring, focused on both in-memory and persistent storage.
- There are many issues in storage that can affect data technologies as well, including but not limited to:
- Solid-state (flash or post-flash) vs. spinning disk.
- Networked vs. direct-attached.
- Virtualized vs. identifiable-physical.
- Object/file/block.
- Graph analytics and data management are still confused.
3. As I suggested last year, data transformation is an important area for innovation. Read more
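Since schema-on-need is the least familiar idea in that list, here's a minimal Python sketch of the concept. All names and structure are hypothetical, mine rather than any vendor's: records are stored as raw JSON, and a typed column is materialized (and cached) only when a query first asks for it.

```python
import json

class SchemaOnNeedStore:
    """Toy illustration of schema-on-need: keep raw JSON, derive columns lazily."""

    def __init__(self):
        self.raw = []        # records kept as unparsed JSON strings
        self.columns = {}    # column name -> cached list of extracted values

    def append(self, record_json):
        self.raw.append(record_json)
        self.columns.clear()   # new data invalidates previously derived columns

    def column(self, name, cast=str):
        # Materialize the column only the first time it is requested.
        if name not in self.columns:
            values = []
            for record in self.raw:
                v = json.loads(record).get(name)
                values.append(cast(v) if v is not None else None)
            self.columns[name] = values
        return self.columns[name]

store = SchemaOnNeedStore()
store.append('{"user": "alice", "latency_ms": 42}')
store.append('{"user": "bob"}')                  # Variety: field missing entirely
print(store.column("latency_ms", cast=float))    # [42.0, None]
```

The point of the pattern is that nobody had to declare latency_ms up front; the "schema" grows as questions get asked.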
Migration
There is much confusion about migration, by which I mean applications or investment being moved from one “platform” technology — hardware, operating system, DBMS, Hadoop, appliance, cluster, cloud, etc. — to another. Let’s sort some of that out. For starters:
- There are several fundamentally different kinds of “migration”.
- You can re-host an existing application.
- You can replace an existing application with another one that does similar (and hopefully also new) things. This new application may be on a different platform than the old one.
- You can build or buy a wholly new application.
- There’s also the in-between case in which you extend an old application with significant new capabilities — which may not be well-suited for the existing platform.
- Motives for migration generally fall into a few buckets. The main ones are:
- You want to use a new app, and it only runs on certain platforms.
- The new platform may be cheaper to buy, rent or lease.
- The new platform may have lower operating costs in other ways, such as administration.
- Your employees may like the new platform’s “cool” aspect. (If the employee is sufficiently high-ranking, substitute “strategic” for “cool”.)
- Different apps may be much easier or harder to re-host. At two extremes:
- It can be forbiddingly difficult to re-host an OLTP (OnLine Transaction Processing) app that is heavily tuned, tightly integrated with your other apps, and built using your DBMS vendor’s proprietary stored-procedure language.
- It might be trivial to migrate a few long-running SQL queries to a new engine, and pretty easy to handle the data connectivity part of the move as well. (A sketch of this easy case follows the list.)
- Certain organizations, usually packaged software companies, design portability into their products from the get-go, with at least partial success.
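To make the easy end of that spectrum concrete, here's a hedged sketch, assuming standard SQL and SQLAlchemy-style connectivity (my assumptions, not anything from a specific vendor):

```python
from sqlalchemy import create_engine, text

# The query itself is standard, portable SQL -- the easy end of the spectrum.
NIGHTLY_ROLLUP = text("""
    SELECT region, SUM(amount) AS total
    FROM orders
    GROUP BY region
""")

def run_rollup(db_url):
    engine = create_engine(db_url)     # data connectivity is just a URL here
    with engine.connect() as conn:
        return conn.execute(NIGHTLY_ROLLUP).fetchall()

# Hypothetical before/after: only the connection string changes.
# run_rollup("oracle://scott:tiger@legacy-host/orcl")
# run_rollup("postgresql://etl:secret@new-analytic-engine/dw")
```

The hard case has no such one-liner: thousands of lines of proprietary stored-procedure code have to be rewritten, retuned and retested on the new platform.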
I mixed together true migration and new-app platforms in a post last year about DBMS architecture choices, when I wrote: Read more
Notes on machine-generated data, year-end 2014
Most IT innovation these days is focused on machine-generated data (sometimes just called “machine data”), rather than human-generated. So as I find myself in the mood for another survey post, I can’t think of any better idea for a unifying theme.
1. There are many kinds of machine-generated data. Important categories include:
- Web, network and other IT logs.
- Game and mobile app event data.
- CDRs (telecom Call Detail Records).
- “Phone-home” data from large numbers of identical electronic products (for example set-top boxes).
- Sensor network output (for example from a pipeline or other utility network).
- Vehicle telemetry.
- Health care data, in hospitals.
- Digital health data from consumer devices.
- Images from public-safety camera networks.
- Stock tickers (if you regard them as being machine-generated, which I do).
That’s far from a complete list, but if you think about those categories you’ll probably capture most of the issues surrounding other kinds of machine-generated data as well.
2. Technology for better information and analysis is also technology for privacy intrusion. Public awareness of privacy issues is focused in a few areas, mainly: Read more
WibiData’s approach to predictive modeling and experimentation
A conversation I have too often with vendors goes something like:
- “That confidential thing you told me is interesting, and wouldn’t harm you if revealed; probably quite the contrary.”
- “Well, I guess we could let you mention a small subset of it.”
- “I’m sorry, that’s not enough to make for an interesting post.”
That was the genesis of some tidbits I recently dropped about WibiData and predictive modeling, especially but not only in the area of experimentation. However, Wibi just reversed course and said it would be OK for me to tell more or less the full story, as long as I note that we’re talking about something that’s still in beta test, with all the limitations (to the product and my information alike) that beta implies.
As you may recall:
- WibiData started out with a rich technology stack …
- … but decided to cast itself as an application company …
- … whose first vertical market is retailing.
With that as background, WibiData’s approach to predictive modeling as of its next release will go something like this: Read more
Notes and links, December 12, 2014
1. A couple years ago I wrote skeptically about integrating predictive modeling and business intelligence. I’m less skeptical now.
For starters:
- The predictive experimentation I wrote about over Thanksgiving calls naturally for some BI/dashboarding to monitor how it’s going.
- If you think about Nutonian’s pitch, it can be approximated as “Root-cause analysis so easy a business analyst can do it.” That could be interesting to jump to after BI has turned up anomalies. And it should be pretty easy to whip up a UI for choosing a data set and objective function to model on, since those are both things that the BI tool would know how to get to anyway.
I’ve also heard a couple of ideas about how predictive modeling can support BI. One is via my client Omer Trajman, whose startup ScalingData is still semi-stealthy but says it’s “working at the intersection of big data and IT operations”. The idea goes something like this:
- Suppose we have lots of logs about lots of things.* Machine learning can help:
- Notice what’s an anomaly.
- Group* together things that seem to be experiencing similar anomalies.
- That can inform a BI-plus interface for a human to figure out what is happening.
Makes sense to me; a minimal sketch follows the footnote below. (Edit: ScalingData subsequently launched, under the name Rocana.)
* The word “cluster” could have been used here in a couple of different ways, so I decided to avoid it altogether.
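To flesh that out with a deliberately crude sketch (the z-score threshold and the overlap-based grouping are my illustrative choices, not a description of ScalingData's product):

```python
from statistics import mean, stdev

def anomalies(series, z=2.0):
    """Flag points more than z standard deviations from the series mean."""
    m, s = mean(series), stdev(series)
    return {i for i, x in enumerate(series) if s and abs(x - m) / s > z}

def group_by_shared_anomalies(named_series, min_overlap=0.5):
    """Group metrics whose anomalous timestamps largely coincide."""
    flagged = {name: anomalies(s) for name, s in named_series.items()}
    groups = []
    for name, idxs in flagged.items():
        for group in groups:
            rep = flagged[group[0]]      # compare against the group's first member
            union = idxs | rep
            if union and len(idxs & rep) / len(union) >= min_overlap:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups

# Toy log-derived error counts: two web servers spike together, the DB doesn't.
metrics = {
    "web1.errors": [1, 2, 1, 2, 90, 1, 2],
    "web2.errors": [2, 1, 2, 1, 85, 2, 1],
    "db1.errors":  [1, 1, 2, 1, 2, 1, 2],
}
print(group_by_shared_anomalies(metrics))
# [['web1.errors', 'web2.errors'], ['db1.errors']]
```

A BI-plus interface would then show each group as one candidate incident, rather than as a pile of unrelated alerts.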
Finally, I’m hearing a variety of “smart ETL/data preparation” and “we recommend what columns you should join” stories. I don’t know how much machine learning there’s been in those to date, but it’s usually at least on the roadmap to make the systems (yet) smarter in the future. The end benefit is usually to facilitate BI.
2. Discussion of graph DBMS can get confusing. For example: Read more
A few numbers from MapR
MapR put out a press release aggregating some customer information; unfortunately, the release is a monument to vagueness. Let me start by saying:
- I don’t know for sure, but I’m guessing Derrick Harris was incorrect in suspecting that this release was a reaction to my recent post about Hortonworks’ numbers. For one thing, press releases usually don’t happen that quickly.
- And as should be obvious from the previous point — notwithstanding that MapR is a client, I had no direct involvement in this release.
- In general, I advise clients and other vendors to put out the kind of aggregate of customer success stories found in this release. However, I would like to see more substance than MapR offered.
Anyhow, the key statement in the MapR release is:
… the number of companies that have a paid subscription for MapR now exceeds 700.
Unfortunately, that includes OEM customers as well as direct ones; I imagine MapR’s direct customer count is much lower.
In one gesture to numerical conservatism, MapR did indicate by email that it counts by overall customer organization, not by department/cluster/contract (i.e., not the way Hortonworks does). Read more
Hadoop’s next refactoring?
I believe in all of the following trends:
- Hadoop is a Big Deal, and here to stay.
- Spark, for most practical purposes, is becoming a big part of Hadoop.
- Most servers will be operated away from user premises, whether via SaaS (Software as a Service), co-location, or “true” cloud computing.
Trickier is the meme that Hadoop is “the new OS”. My thoughts on that start:
- People would like this to be true, although in most cases only as one of several cluster computing platforms.
- Hadoop, when viewed as an operating system, is extremely primitive.
- Even so, the greatest awkwardness I’m seeing when different software shares a Hadoop cluster isn’t actually in scheduling, but rather in data interchange.
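The least-awkward interchange pattern I know of is for every engine on the cluster to read and write a shared, self-describing columnar format. Here's a present-day Python sketch using Parquet via pyarrow; that library choice is mine (and an anachronism relative to this post), but any format that carries its own schema makes the same point.

```python
import pyarrow as pa
import pyarrow.parquet as pq

# Engine A (say, a Spark-style job) writes its output as Parquet;
# the schema travels with the file, so no side agreement is needed.
table = pa.table({
    "user_id": [101, 102, 103],
    "clicks":  [7, 3, 12],
})
pq.write_table(table, "clicks.parquet")

# Engine B (say, a SQL-on-Hadoop engine, or just another job) reads it back
# knowing nothing about engine A except the file's location.
result = pq.read_table("clicks.parquet")
print(result.schema)       # schema recovered from the file itself
print(result.to_pydict())  # {'user_id': [101, 102, 103], 'clicks': [7, 3, 12]}
```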
There is also a minor issue that if you distribute your Hadoop work among extra nodes you might have to pay a bit more to your Hadoop distro support vendor. Fortunately, the software industry routinely solves more difficult pricing problems than that.
Notes on the Hortonworks IPO S-1 filing
Given my stock research experience, perhaps I should post about Hortonworks’ initial public offering S-1 filing. 🙂 For starters, let me say:
- Hortonworks’ subscription revenues for the 9 months ended last September 30 appear to be:
- $11.7 million from everybody but Microsoft, …
- … plus $7.5 million from Microsoft, …
- … for a total of $19.2 million.
- Hortonworks states subscription customer counts (as per Page 55 this includes multiple “customers” within the same organization) of:
- 2 on April 30, 2012.
- 9 on December 31, 2012.
- 25 on April 30, 2013.
- 54 on September 30, 2013.
- 95 on December 31, 2013.
- 233 on September 30, 2014.
- Per Page 70, Hortonworks’ total September 30, 2014 customer count was 292, including professional services customers.
- Non-Microsoft subscription revenue in the quarter ended September 30, 2014 seems to have been $5.6 million, or $22.5 million annualized. This suggests Hortonworks’ average subscription revenue per non-Microsoft customer is a little over $100K/year.
- This IPO looks to be a sharply “down round” vs. Hortonworks’ Series D financing earlier this year.
- In March and June 2014, Hortonworks sold stock at $12.1871 per share, with each of those shares subsequently converting into 1/2 of a Hortonworks share.
- The tentative top of the offering’s price range is $14/share.
- That’s also slightly down from the Series C price in mid-2013.
And, perhaps of interest only to me — there are approximately 50 references to YARN in the Hortonworks S-1, but only 1 mention of Tez.
Thoughts and notes, Thanksgiving weekend 2014
I’m taking a few weeks defocused from work, as a kind of grandpaternity leave. That said, the venue for my Dances of Infant Calming is a small-but-nice apartment in San Francisco, so a certain amount of thinking about tech industries is inevitable. I even found time last Tuesday to meet or speak with my clients at WibiData, MemSQL, Cloudera, Citus Data, and MongoDB. And thus:
1. I’ve been sloppy in my terminology around “geo-distribution”, in that I don’t always make it easy to distinguish between:
- Storing different parts of a database in different geographies, often for reasons of data privacy regulatory compliance.
- Replicating an entire database into different geographies, often for reasons of latency and/or availability/disaster recovery.
The latter case can be subdivided further depending on whether multiple copies of the data can accept first writes (aka active-active, multi-master, or multi-active), or whether there’s a clear single master for each part of the database.
What made me think of this was a phone call with MongoDB in which I learned that the limit on the number of replicas had been raised from 12 to 50, to support the full-replication/latency-reduction use case.
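For concreteness, here's a hedged sketch of what such a geo-replicated MongoDB configuration can look like. Hostnames are invented; priority 0 keeps a distant replica from ever being elected primary, and non-voting members are what let replica sets grow well past the old limit.

```python
from pymongo import MongoClient

# Hypothetical topology: a primary-eligible pair in the US, plus read-only
# replicas placed near users elsewhere to cut read latency.
config = {
    "_id": "geo_rs",
    "members": [
        {"_id": 0, "host": "us-east.example.com:27017", "priority": 2},
        {"_id": 1, "host": "us-west.example.com:27017", "priority": 1},
        # priority 0: serves local reads, can never become primary;
        # votes 0: doesn't count against the voting-member limit,
        # which is how a set can grow toward the 50-member cap.
        {"_id": 2, "host": "eu-west.example.com:27017",
         "priority": 0, "votes": 0},
        {"_id": 3, "host": "ap-southeast.example.com:27017",
         "priority": 0, "votes": 0},
    ],
}

client = MongoClient("us-east.example.com:27017")
client.admin.command("replSetInitiate", config)
```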
2. Three years ago I posted about agile (predictive) analytics. One of the points was:
… if you change your offers, prices, ad placement, ad text, ad appearance, call center scripts, or anything else, you immediately gain new information that isn’t well-reflected in your previous models.
Subsequently I’ve been hearing more about predictive experimentation such as bandit testing. WibiData, whose views are influenced by a couple of Very Famous Department Store clients (one of which is Macy’s), thinks experimentation is quite important. And it could be argued that experimentation is one of the simplest and most direct ways to increase the value of your data.
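For readers who haven't run into bandit testing: unlike a classic A/B test, a bandit algorithm shifts traffic toward better-performing variants while the experiment is still running. Below is a minimal epsilon-greedy sketch, with invented offers and conversion rates.

```python
import random

class EpsilonGreedyBandit:
    """Explore with probability epsilon; otherwise exploit the best-looking arm."""

    def __init__(self, arms, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = {arm: 0 for arm in arms}      # times each arm was tried
        self.rewards = {arm: 0.0 for arm in arms}   # total reward per arm

    def estimate(self, arm):
        return self.rewards[arm] / self.counts[arm] if self.counts[arm] else 0.0

    def choose(self):
        if random.random() < self.epsilon:
            return random.choice(list(self.counts))   # explore
        return max(self.counts, key=self.estimate)    # exploit

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.rewards[arm] += reward

# Invented example: two offers with different true conversion rates.
true_rate = {"offer_A": 0.05, "offer_B": 0.08}
bandit = EpsilonGreedyBandit(["offer_A", "offer_B"])
for _ in range(10000):
    arm = bandit.choose()
    bandit.update(arm, 1.0 if random.random() < true_rate[arm] else 0.0)
print(bandit.counts)   # traffic concentrates on offer_B as evidence accumulates
```

Fancier versions (Thompson sampling and the like) differ mainly in how they balance exploration against exploitation.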
3. I’d further say that a number of developments, trends or possibilities I’m seeing are or could be connected. These include agile and experimental predictive analytics in general, as noted in the previous point, along with: Read more