Derived data

The management of data that is derived, augmented, enhanced, adjusted, or cooked — as opposed to just the raw stuff.

June 19, 2011

Investigative analytics and derived data: Enzee Universe 2011 talk

I’ll be speaking Monday, June 20 at IBM Netezza’s Enzee Universe conference. Thus, as is my custom:

I’m posting draft slides.
I’m encouraging comment (especially in the short time window before I have to actually give the talk).
I’m offering links below to more detail on various subjects covered in the talk.

The talk concept started out as “advanced analytics” (as opposed to fast query, a subject amply covered in the rest of any Netezza event), as a lunch break in what is otherwise a detailed “best practices” session. So I suggested we constrain the subject by focusing on a specific application area — customer acquisition and retention, something of importance to almost any enterprise, and which exploits most areas of analytic technology. Then I actually prepared the slides — and guess what? The mix of subjects will be skewed somewhat more toward generalities than I first intended, specifically in the areas of investigative analytics and derived data. And, as always when I speak, I’ll try to raise consciousness about the issues of liberty and privacy, our options as a society for addressing them, and the crucial role we play as an industry in helping policymakers deal with these technologically-intense subjects.

Slide 3 refers back to a post I made last December, saying there are six useful things you can do with analytic technology:

Operational BI/Analytically-infused operational apps: You can make an immediate decision.
Planning and budgeting: You can plan in support of future decisions.
Investigative analytics (multiple disciplines): You can research, investigate, and analyze in support of future decisions.
Business intelligence: You can monitor what’s going on, to see when it necessary to decide, plan, or investigate.
More BI: You can communicate, to help other people and organizations do these same things.
DBMS, ETL, and other “platform” technologies: You can provide support, in technology or data gathering, for one of the other functions.

Slide 4 observes that investigative analytics:

Is the most rapidly advancing of the six areas …
… because it most directly exploits performance & scalability.

Slide 5 gives my simplest overview of investigative analytics technology to date: Read more

Categories: Analytic technologies, Business intelligence, Data warehousing, Derived data, EAI, EII, ETL, ELT, ETLT, GIS and geospatial, Netezza, Predictive modeling and advanced analytics, RDF and graphs, Text

4 Comments

May 30, 2011

Another category of derived data

Six months ago, I argued the importance of derived analytic data, saying

… there’s no escaping the importance of derived/augmented/enhanced/cooked/adjusted data for analytic data processing. The five areas I have in mind are, loosely speaking:

Aggregates, when they are maintained, generally for reasons of performance or response time.

Calculated scores, commonly based on data mining/predictive analytics.

Text analytics.

The kinds of ETL (Extract/Transform/Load) Hadoop and other forms of MapReduce are commonly used for.

Adjusted data, especially in scientific contexts.

Probably there are yet more examples that I am at the moment overlooking.

Well, I did overlook at least one category. 🙂

A surprisingly important kind of derived data is metadata, especially for large, poly-structured data sets. For example, CERN has vastly quantities of experiment sensor data, stored as files; just the metadata alone fills over 10 terabytes in an Oracle database. MarkLogic is big on storing derived metadata, both on the publishing/media and intelligence sides of the business.

Categories: Data models and architecture, Derived data, Hadoop, MarkLogic

2 Comments

November 29, 2010

Data that is derived, augmented, enhanced, adjusted, or cooked

On this food-oriented weekend, I could easily go on long metaphorical flights about the distinction between “raw” and “cooked” data. I’ll spare you that part — reluctantly, given my fondness for fresh fruit, sushi, and steak tartare — but there’s no escaping the importance of derived/augmented/enhanced/cooked/adjusted data for analytic data processing. The five areas I have in mind are, loosely speaking:

Aggregates, when they are maintained, generally for reasons of performance or response time.
Calculated scores, commonly based on data mining/predictive analytics.
Text analytics.
The kinds of ETL (Extract/Transform/Load) Hadoop and other forms of MapReduce are commonly used for.
Adjusted data, especially in scientific contexts.

Categories: Analytic technologies, Data warehousing, Derived data

12 Comments

October 6, 2010

eBay followup — Greenplum out, Teradata > 10 petabytes, Hadoop has some value, and more

I chatted with Oliver Ratzesberger of eBay around a Stanford picnic table yesterday (the XLDB 4 conference is being held at Jacek Becla’s home base of SLAC, which used to stand for “Stanford Linear Accelerator Center”). Todd Walter of Teradata also sat in on the latter part of the conversation. Things I learned included: Read more

Categories: Data warehousing, Derived data, eBay, Greenplum, Hadoop, HBase, Log analysis, Petabyte-scale data management, Teradata

30 Comments

← Previous Page

Search our blogs and white papers

Monash Research blogs

DBMS 2 covers database management, analytics, and related technologies.
Text Technologies covers text mining, search, and social software.
Strategic Messaging analyzes marketing and messaging strategy.
The Monash Report examines technology and public policy issues.
Software Memories recounts the history of the software industry.

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.

Links
- Monash Research
- White Papers
Admin
- Log in