Chris Bird’s blog is brilliant, and update-in-place is increasingly passé
I wouldn’t say every post in Chris Bird’s occasionally-updated blog is brilliant. I wouldn’t even say every post is readable. But I’d still recommend his blog to just about anybody who reads here as, at a minimum, a consciousness-raiser.
One of the two posts inspiring me to mention this is a high-level one on “technical debt”, reminding us why things don’t always get done right the first time, and further reminding us that circling back to fix them sooner rather than later is usually wise. The other connects two observations that individually have great merit (at least if you don’t take them to extremes):
- Update-in-place is passé
- So is elaborate up-front database design
Specific points of interest here include:
- Most data never gets changed after being written. Update-in-place doesn’t save all that much in storage hardware.
- Update-in-place interferes with a lot of modern optimizations in analytic DBMS design.
- Knowing what values data had in the past is interesting in and of itself.
- So, potentially, is knowing what “dirty” data end-users — especially customers and prospects — decided to enter.
- The “right” amount of data validation is application-dependent. For example, if data validation involves torturing your customers, maybe it’s not such a good idea. (Great observation by Chris.)
- If you have the old data as well as the new, the harm of having “bad” updates is lessened. (Central connecting observation by Chris; see the sketch after this list.)
- People enter data inconsistently. MDM (Master Data Management) and data cleansing tools fix much (admittedly not all) of the harm. Computers are cheaper than people. You do the math.
- Data is increasingly being managed in non-relational and/or non-persistent ways. Get used to it.
- As the NoSQL guys point out, some of today’s most demanding applications have extremely simple schemas.
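To make the append-only idea concrete, here is a minimal sketch (all names invented for illustration; nothing here is taken from Chris’s post) of a store in which every write appends a new version, so old values, including the “dirty” ones a user once entered, remain queryable:

```python
import time
from collections import defaultdict

class AppendOnlyStore:
    """Toy versioned store: writes append; nothing is updated in place."""

    def __init__(self):
        self._versions = defaultdict(list)  # key -> [(timestamp, value), ...]

    def put(self, key, value):
        # Append a new version; the old value stays queryable forever.
        self._versions[key].append((time.time(), value))

    def get(self, key, as_of=None):
        """Latest value, or the value as of a past timestamp (time travel)."""
        history = self._versions.get(key, [])
        if as_of is None:
            return history[-1][1] if history else None
        for ts, value in reversed(history):
            if ts <= as_of:
                return value
        return None

    def history(self, key):
        # The full audit trail, dirty entries and all.
        return list(self._versions.get(key, []))
```

A real analytic DBMS would of course implement this with immutable blocks, compression, and background compaction rather than Python lists, but the user-visible property is the same: history is retained rather than overwritten.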
Comments
Part of me says, “oh, this is just MVCC.”
But that usually is treated as being an invisible layering, just like the updates in a Log Structured Filesystem. ()
It’s clear that in the later parts of your comments, you’re describing the notion that the sequence of update history is actually intended to be visible.
It seems worthy of some thought, for sure.
Odd… Bits of my comment seem to have gotten lost, notably URLs for MVCC & LSF.
Perhaps your MDM cleansing is a bit overexuberant? 🙂
Yes, I think time-travel is a useful feature. And I suspect Chris feels more emphatically about that than I do — but then he has an outstanding track record of catching on early to technical trends.
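To illustrate the distinction (reusing the hypothetical AppendOnlyStore sketched earlier in the post): an invisible MVCC-style layering exposes only the first read below, while visible time-travel also exposes the other two:

```python
import time

store = AppendOnlyStore()
store.put("price", 100)
t_before = time.time()
time.sleep(0.01)  # just to guarantee the next write gets a later timestamp
store.put("price", 120)

print(store.get("price"))                  # 120: all an invisible layer shows
print(store.get("price", as_of=t_before))  # 100: explicit time travel
print(store.history("price"))              # the full, visible version sequence
```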
A case for update-in-place: http://blog.mongodb.org/post/248614779/fast-updates-with-mongodb-update-in-place and http://blog.mongodb.org/post/171353301/using-mongodb-for-real-time-analytics
RC,
I’m not sure in what cases I’d endorse the application design being emphasized there. You keep some of the incoming data and throw away the rest. What you keep, you send to disk in the form of counters that are constantly changed. The point of the exercise is that you want access in real time.
Huh? Why not keep the small amount you want in real time in memory, and send a complete record of everything to disk however fast you can get it there?
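A minimal sketch of that split (names invented; this is my illustration, not anything from the MongoDB posts): the small real-time aggregate lives in memory, while a complete record of every event is appended to disk untouched:

```python
import json
from collections import Counter

class RealTimeIngest:
    """Sketch: hot counters in memory; full events appended to disk."""

    def __init__(self, log_path):
        self._counts = Counter()         # the small amount wanted in real time
        self._log = open(log_path, "a")  # complete, append-only record

    def ingest(self, event):
        # Serve real-time queries from memory...
        self._counts[event["page"]] += 1
        # ...and keep everything on disk, with no update-in-place.
        self._log.write(json.dumps(event) + "\n")

    def count(self, page):
        return self._counts[page]
```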
Of course, this isn’t quite as black and white as I made it out to be in the original blog post. There is always a careful balance between what you are throwing away and what performance you can afford (or think you need).
The idea of the original post is to apply healthy skepticism every time you are tempted to use update logic. There are good cases for doing updates in place, but I don’t think it should be the default. I will shortly be writing some responses to observations made against the original post.
Some excellent points. I am more practitioner than theorist and tend to take a very pragmatic approach; as a result, in our ever-changing world I have moved away from these two practices as a matter of necessity. However, can you elaborate on the point that data is increasingly being managed with non-relational and non-persistent methods? As I have come to understand it, data is by its very existence relational: all data has relationships inherent within it, and it is only useful in relation to other data and concepts that it either supports, is neutral to (no relationship or a passive relationship), or disproves. Maybe I misunderstood and you simply meant traditional normalized relational models? Thanks for the great posts (I have subscribed to the RSS feeds for both blogs).