December 27, 2005

What needs to be updated anyway?

Shayne Nelson is posting some pretty wild ideas on data architecture and redundancy. In the process of doing so, he’s reopening an old discussion topic:

Why would data ever need to be erased?

and the natural follow-on

If it doesn’t need to be erased, what exactly do we have to update?

Here are some quick cuts at answering the second question:

Respondents to Nelson’s blog generally argue that it’s better to store data once and have redundant subcopies of it in the form of indexes. I haven’t yet seen any holes in those arguments. Still, it’s a discussion worth looking at and noodling over.
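To make the index-as-redundant-subcopy idea concrete, here is a minimal sketch (my own illustration, not anything Nelson or his respondents actually proposed): an append-only store in which base records are never updated or erased, plus a derived index kept purely as a redundant, rebuildable copy. The class and field names are invented for the example.

```python
# Illustrative sketch only (not Nelson's design): base data is append-only,
# and the only redundant structure is a derived index that can be rebuilt
# from the base records at any time.

from collections import defaultdict

class AppendOnlyStore:
    def __init__(self):
        self.records = []                     # base data: never updated or erased
        self.by_customer = defaultdict(list)  # derived index: redundant subcopy

    def append(self, record):
        """Add an immutable record and extend the derived index."""
        offset = len(self.records)
        self.records.append(record)
        self.by_customer[record["customer"]].append(offset)
        return offset

    def rebuild_index(self):
        """The index is redundant: it can always be regenerated from base data."""
        self.by_customer = defaultdict(list)
        for offset, record in enumerate(self.records):
            self.by_customer[record["customer"]].append(offset)

    def current_balance(self, customer):
        """Derive current state by folding over history, not by updating in place."""
        return sum(self.records[i]["amount"] for i in self.by_customer[customer])

store = AppendOnlyStore()
store.append({"customer": "acme", "amount": 100})
store.append({"customer": "acme", "amount": -30})  # a correction is just another record
print(store.current_balance("acme"))               # prints 70
```

In this style, "updates" and "deletes" become new records, and the only thing ever rewritten is derived structure that can be regenerated from the base data.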

Comments

2 Responses to “What needs to be updated anyway?”

  1. Eric on February 1st, 2006 9:41 am

    Hi Curt – I found this interesting:

    Analytic data usually doesn’t need to be updated with full transactional integrity; slight, temporary errors do little harm.

    Perhaps true (difficult to say with precision), but the terms “usually,” “slight,” and “little” give me pause. It would seem that constraints on derived analytical data would follow from the needs which drove the data to be ETLed; wouldn’t you want to know whether your ETL process actually produced the structures you thought it would produce?

  2. Curt Monash on February 1st, 2006 4:43 pm

    Eric,

    If we leave out what is called “operational BI,” and also leave out planning, it is overwhelmingly the case that analytic data is used to identify ratios, trends, statistical correlations, and the like. Small errors simply aren’t important to those uses, because they don’t change the outcome in any detectable way.

    In most non-planning analytics, operational or otherwise, a lot of historical data is being examined. A small amount of latency is rarely an issue.

    Planning applications are almost by definition non-urgent. A small amount of latency is not a big problem.

    There are a few application areas that we call “analytic” which truly require the same near-instantaneous data integrity that an order processing system would. For example, fraud prevention comes to mind, in a variety of industries. (Interestingly, that really is an aspect of order processing.)

    But most analytics are done today with a latency of at least a day, and often a week or more. I rarely find analytic applications that truly need sub-day latency. And it’s very rare to find ones where, to use the most common figure cited to me for “real-time” analytics, 15-minute latency would be a problem.

    Hell, most individuals who dabble in the stock market do so off of price data with 20-minute delays …
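A quick back-of-the-envelope illustration of the point Monash makes above about small errors and analytic outcomes (my own invented numbers, not his): perturbing one record in a large historical data set shifts an aggregate such as an average by far too little to change any ratio, trend, or correlation detectably.

```python
# Hypothetical example: a 10% error in one record out of 100,000 barely
# moves the aggregate it feeds into.

import random

random.seed(0)
sales = [random.uniform(50, 150) for _ in range(100_000)]  # invented daily sales figures

true_average = sum(sales) / len(sales)

corrupted = sales[:]          # same data with one temporarily wrong record
corrupted[42] *= 1.10         # a 10% error in a single record
corrupted_average = sum(corrupted) / len(corrupted)

relative_shift = abs(corrupted_average - true_average) / true_average
print(f"relative shift in the average: {relative_shift:.8%}")  # roughly 0.0001%
```

Transactional systems cannot tolerate that kind of slack, which is exactly the distinction drawn above for fraud prevention and other order-processing-like workloads.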
