December 31, 2014

Notes on machine-generated data, year-end 2014

Most IT innovation these days is focused on machine-generated data (sometimes just called “machine data”), rather than human-generated. So as I find myself in the mood for another survey post, I can’t think of any better idea for a unifying theme.

1. There are many kinds of machine-generated data. Important categories include:

Web, network and other IT logs.
Game and mobile app event data.
CDRs (telecom Call Detail Records).
“Phone-home” data from large numbers of identical electronic products (for example set-top boxes).
Sensor network output (for example from a pipeline or other utility network).
Vehicle telemetry.
Health care data, in hospitals.
Digital health data from consumer devices.
Images from public-safety camera networks.
Stock tickers (if you regard them as being machine-generated, which I do).

That’s far from a complete list, but if you think about those categories you’ll probably capture most of the issues surrounding other kinds of machine-generated data as well.

2. Technology for better information and analysis is also technology for privacy intrusion. Public awareness of privacy issues is focused in a few areas, mainly: …

Categories: Ayasdi, Business intelligence, Data models and architecture, Databricks, Spark and BDAS, Games and virtual worlds, Health care, Investment research and trading, Kafka and Confluent, Log analysis, Memory-centric data management, NoSQL, Predictive modeling and advanced analytics, Splunk, Surveillance and privacy, Telecommunications, Web analytics


October 10, 2014

Notes on predictive modeling, October 10, 2014

As planned, I’m getting more active in predictive modeling. Anyhow …

1. I still believe most of what I said in a July 2013 predictive modeling catch-all post. However, I haven’t heard as much about Ayasdi since then as I had expected to.

2. The most controversial part of that post was probably the claim:

I think the predictive modeling state of the art has become:

Cluster in some way.
Model separately on each cluster.

In particular:

Formally, it is always possible to go with a single model instead.
A lot of people think accuracy, ease of use, or both are better served by a true single-model approach.
Conversely, if you have a single model that’s pretty good, it’s natural to look at the subset of the data for which it works poorly and examine that first. Voilà! You’ve just done a kind of clustering.
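
To make the cluster-then-model idea concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the synthetic two-regime data, the choice of one-dimensional k-means as the way to "cluster in some way", and straight-line least squares as the per-cluster model. It is not a description of any vendor's method; it only compares the error of one global line against the error of one line per cluster.

```python
import random
from statistics import mean

def fit_line(points):
    """Ordinary least squares for y = intercept + slope * x."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in points) / sxx if sxx else 0.0
    return my - slope * mx, slope

def sse(model, points):
    """Sum of squared errors of a fitted line on a set of points."""
    intercept, slope = model
    return sum((y - (intercept + slope * x)) ** 2 for x, y in points)

def nearest(centers, x):
    """Index of the center closest to x."""
    return min(range(len(centers)), key=lambda i: abs(x - centers[i]))

def kmeans_1d(xs, k, iters=20):
    """Lloyd's k-means on one feature, with deterministic quantile seeding."""
    ordered = sorted(xs)
    n = len(ordered)
    centers = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            groups[nearest(centers, x)].append(x)
        # Keep the old center if a cluster happens to be empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers

def make_data(seed=0, n=100):
    """Hypothetical data: two regimes with different linear behavior."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        data.append((x, 2 * x + 1 + rng.gauss(0, 0.5)))
        x = rng.uniform(20, 30)
        data.append((x, 100 - 3 * x + rng.gauss(0, 0.5)))
    return data

data = make_data()

# Option A: one model for everything.
single_sse = sse(fit_line(data), data)

# Option B: cluster first, then model each cluster separately.
centers = kmeans_1d([x for x, _ in data], k=2)
clusters = [[] for _ in centers]
for point in data:
    clusters[nearest(centers, point[0])].append(point)
models = [fit_line(c) for c in clusters]
clustered_sse = sum(sse(m, c) for m, c in zip(models, clusters))

print(f"single model SSE: {single_sse:.1f}; per-cluster SSE: {clustered_sse:.1f}")
```

On data like this, where the regimes really do behave differently, the per-cluster fit wins easily. On data with one underlying relationship, the two approaches should come out about the same, which is the case the single-model camp has in mind.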

3. Nutonian is now a client. I just had my first meeting with them this week. To a first approximation, they’re somewhat like KXEN (sophisticated math, non-linear models, ease of modeling, quasi-automagic feature selection), but with differences that start: …

Categories: Ayasdi, Databricks, Spark and BDAS, Log analysis, Nutonian, Predictive modeling and advanced analytics, Revolution Analytics, Scientific research, Web analytics


July 12, 2013

More notes on predictive modeling

My July 2 comments on predictive modeling were far from my best work. Let’s try again.

1. Predictive analytics has two very different aspects.

Developing models, aka “modeling”:

Is a big part of investigative analytics.
May or may not be difficult to parallelize and/or integrate into an analytic RDBMS.
May or may not require use of your whole database.
Generally is done by humans.
Often is done by people with special skills, e.g. “statisticians” or “data scientists”.

More precisely, some modeling algorithms are straightforward to parallelize and/or integrate into an RDBMS, but many are not.
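
As one illustration of the "straightforward" kind, simple linear regression reduces to a handful of sums, and sums can be computed per partition and then merged. The sketch below assumes a made-up data set split into three partitions; the function names are hypothetical and not taken from any product.

```python
from functools import reduce

def partial_stats(rows):
    """Sufficient statistics for y = intercept + slope * x on one partition."""
    n = sx = sy = sxx = sxy = 0.0
    for x, y in rows:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    return (n, sx, sy, sxx, sxy)

def merge(a, b):
    """Combine the statistics of two partitions by element-wise addition."""
    return tuple(u + v for u, v in zip(a, b))

def solve(stats):
    """Closed-form least-squares solution from the merged statistics."""
    n, sx, sy, sxx, sxy = stats
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope

# Hypothetical data, split as if it lived on three nodes.
rows = [(float(x), 3.0 * x - 2.0) for x in range(12)]
partitions = [rows[0:4], rows[4:8], rows[8:12]]

merged = reduce(merge, map(partial_stats, partitions))
intercept, slope = solve(merged)
print(intercept, slope)
```

Only the five numbers per partition ever need to move between nodes. Algorithms that need repeated passes over all the data with shared state do not decompose this neatly, which is the "many are not" half of the sentence.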

Using models, most commonly:

Is done by machines …
… that “score” data according to the models.
May be done in batch or at run-time.
Is embarrassingly parallel, and is much more commonly integrated into analytic RDBMS than modeling is.
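
A small sketch of why scoring parallelizes so easily: each row is scored independently, so partitions can be handed to separate workers with no coordination. The logistic model, its weights, and the data here are all invented for the example. Note also that threads in CPython mostly illustrate the structure rather than deliver real CPU speedup.

```python
import math
from concurrent.futures import ThreadPoolExecutor

# Hypothetical model: a logistic regression with fixed, made-up coefficients.
WEIGHTS = (0.8, -1.2)
BIAS = 0.1

def score(row):
    """Score one row; depends only on the row and the model."""
    z = BIAS + sum(w * v for w, v in zip(WEIGHTS, row))
    return 1.0 / (1.0 + math.exp(-z))

def score_partition(rows):
    """Score a batch of rows; needs nothing from any other partition."""
    return [score(r) for r in rows]

# Hypothetical data, split into four partitions of ten rows each.
rows = [(i / 10.0, (i % 7) / 5.0) for i in range(40)]
partitions = [rows[i:i + 10] for i in range(0, len(rows), 10)]

# pool.map returns results in partition order, so scores line up with rows.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel_scores = [s for part in pool.map(score_partition, partitions) for s in part]

sequential_scores = score_partition(rows)
print(parallel_scores == sequential_scores)
```

The same shape works in batch (score every partition overnight) or at run-time (score one row when it arrives), since the model is read-only in both cases.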

2. Some people think that all a modeler needs is a few basic algorithms. (That’s why, for example, analytic RDBMS vendors are proud of integrating a few specific modeling routines.) Other people think that’s ridiculous. Depending on use case, either group can be right.

3. If adoption of DBMS-integrated modeling is high, I haven’t noticed.

Categories: Ayasdi, Data warehousing, Hadoop, Health care, IBM and DB2, KXEN, Predictive modeling and advanced analytics, SAS Institute


