What needs to be updated anyway?
Shayne Nelson is posting some pretty wild ideas on data architecture and redundancy. In the process, he’s reopening an old discussion topic:
Why would data ever need to be erased?
and the natural follow-on
If it doesn’t need to be erased, what exactly do we have to update?
Here are some quick cuts at answering the second question:
- “Primary” data usually doesn’t really need to be updated, exactly. But it does need to be stored in such a way that it can immediately be found again and correctly identified as the most recent information.
- Analytic data usually doesn’t need to be updated with full transactional integrity; slight, temporary errors do little harm.
- “Derived” data such as bank balances (derived from deposits and withdrawals) and inventory levels (derived from purchases and sales) commonly needs to be updated with full industrial-strength protections.
- Certain kinds of primary transactions, such as travel reservations, need the same treatment as “derived” data. When the item sold is unique, the primary/derived distinction largely goes away.
- Notwithstanding the foregoing, it must be possible to update anything for error-correction purposes (something Nelson seems to have glossed over to date). The sketch after this list shows all three behaviors side by side.
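To make those distinctions concrete, here is a minimal Python sketch of a hypothetical in-memory ledger (the class and its methods are my own illustration, not anything Nelson has proposed): primary entries are appended and timestamped rather than rewritten, the derived balance is updated under a lock that stands in for a real transaction, and an error is fixed with a compensating entry rather than an erasure.

```python
import threading
from datetime import datetime, timezone

class Ledger:
    """Hypothetical in-memory sketch: primary entries are appended and never
    erased; the derived balance is updated in place under a lock (standing in
    for a real transaction); corrections are compensating entries."""

    def __init__(self):
        self.entries = []              # primary data: append-only, timestamped
        self.balance = 0.0             # derived data: updated in place
        self._lock = threading.Lock()

    def post(self, amount, note=""):
        # Store the primary record once, stamped so the most recent information
        # can always be identified, and update the derived balance alongside it.
        with self._lock:
            self.entries.append((datetime.now(timezone.utc), amount, note))
            self.balance += amount

    def correct(self, amount, note="correction"):
        # Error correction: post a compensating entry rather than erasing history.
        self.post(-amount, note)

ledger = Ledger()
ledger.post(100.0, "deposit")
ledger.post(-30.0, "withdrawal")
ledger.post(-30.0, "withdrawal keyed twice in error")
ledger.correct(-30.0, "reverse the duplicate withdrawal")
print(ledger.balance)          # 70.0, and every original entry is still on file
print(len(ledger.entries))     # 4
```

The design choice worth noticing is the last one: fixing the mistake changes the derived balance, but the primary history stays intact.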
Respondents to Nelson’s blog generally argue that it’s better to store data once and have redundant subcopies of it in the form of indexes. I haven’t yet seen any holes in those arguments. Still, it’s a discussion worth looking at and noodling over.
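For flavor, here is a toy Python sketch of that store-once-plus-indexes idea (the names and structures are illustrative, not drawn from any particular product): the list is the single primary copy, and the dictionary is a redundant subcopy that can be dropped and rebuilt from it at any time.

```python
# Toy sketch of "store once, index redundantly" (names are illustrative).
records = []        # the single primary copy: appended to, never rewritten
by_customer = {}    # a redundant subcopy (index): customer id -> positions

def store(customer_id, payload):
    records.append((customer_id, payload))
    by_customer.setdefault(customer_id, []).append(len(records) - 1)

def rebuild_index():
    # The index carries no information of its own, so losing or corrupting it
    # costs only the time needed to rebuild it from the primary copy.
    by_customer.clear()
    for pos, (customer_id, _) in enumerate(records):
        by_customer.setdefault(customer_id, []).append(pos)

store("c1", {"order": 17})
store("c2", {"order": 18})
store("c1", {"order": 19})
print([records[i] for i in by_customer["c1"]])   # both of c1's records
```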
Comments
Hi Curt – I found this interesting:

“Analytic data usually doesn’t need to be updated with full transactional integrity; slight, temporary errors do little harm.”
Perhaps true (difficult to say with precision), but the terms “usually,” “slight,” and “little” give me pause. It would seem that constraints on derived analytical data would follow from the needs that drove the data to be ETLed; wouldn’t you want to know whether your ETL process actually produced the structures you thought it would produce?
Eric,
If we leave out what is called “operational BI,” and also leave out planning, it is overwhelmingly the case that analytic data is used to identify ratios, trends, statistical correlations, and the like. Small errors simply aren’t important to those uses, because they don’t change the outcome in any detectable way.
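To put toy numbers on that (purely illustrative, not from any real dataset):

```python
# A small transient error in analytic data barely moves the ratios it feeds.
daily_sales = [1000.0, 1040.0, 1100.0, 1150.0, 1210.0, 1260.0]
with_error = list(daily_sales)
with_error[3] += 5.0                  # one record off by 5, roughly 0.4%

def second_half_ratio(series):
    # Ratio of second-half to first-half sales -- a typical trend metric.
    mid = len(series) // 2
    return sum(series[mid:]) / sum(series[:mid])

print(round(second_half_ratio(daily_sales), 3))  # 1.153
print(round(second_half_ratio(with_error), 3))   # 1.154 -- no decision changes
```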
In most non-planning analytics, operational or otherwise, a lot of historical data is being examined. A small amount of latency is rarely an issue.
Planning applications are almost by definition non-urgent. A small amount of latency is not a big problem.
There are a few application areas we call “analytic” that truly require the same near-instantaneous data integrity that an order processing system does. Fraud prevention, in a variety of industries, comes to mind. (Interestingly, that really is an aspect of order processing.)
But most analytics are done today with a latency of at least a day, and often a week or more. I rarely find analytic applications that truly need sub-day latency. And it’s very rare to find ones where 15-minute latency (the figure most commonly cited to me for “real-time” analytics) would be a problem.
Hell, most individuals who dabble in the stock market do so off of price data with 20-minute delays …