September 19, 2006

Is data warehousing now all about sequential access?

A lot of evidence is pointing to a major paradigm shift in data warehouse RDBMS, along the lines of:

Old way: Assume I/O is random; lower total execution time by improving selectivity and thus lowering the amount of I/O.

New way: Drive the amount of random I/O to near zero, and do as much sequential I/O as necessary to achieve this goal.
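
To make the contrast concrete, here is a rough back-of-the-envelope cost model in Python. It is only a sketch under assumptions that are mine, not any vendor's: every selected row costs one random I/O (no caching, no clustering), and the table size, access time, and streaming rate in the usage example are illustrative numbers rather than measurements. The function names are hypothetical.

```python
def random_read_seconds(rows_needed: int, access_ms: float) -> float:
    """'Old way' cost: one random I/O per selected row (assumed worst case)."""
    return rows_needed * access_ms / 1000.0


def sequential_scan_seconds(table_mb: float, stream_mb_per_s: float) -> float:
    """'New way' cost: stream the whole table off disk, ignoring selectivity."""
    return table_mb / stream_mb_per_s


def break_even_selectivity(total_rows: int, table_mb: float,
                           access_ms: float, stream_mb_per_s: float) -> float:
    """Fraction of rows above which the full scan is the cheaper plan."""
    scan_s = sequential_scan_seconds(table_mb, stream_mb_per_s)
    rows_at_break_even = scan_s / (access_ms / 1000.0)
    return min(1.0, rows_at_break_even / total_rows)


if __name__ == "__main__":
    # Illustrative assumptions: 100 million rows in a 10,000 MB table,
    # 5 ms per random access, 50 MB/s of sequential streaming.
    fraction = break_even_selectivity(100_000_000, 10_000, 5, 50)
    print(f"{fraction:.4%}")  # prints 0.0400%
```

With these made-up numbers, the full scan wins once a query touches more than about 0.04% of the rows. That is the sense in which a design can accept far more sequential I/O in exchange for avoiding random I/O.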

Examples include:

Data warehouse appliances (see especially this discussion of DATAllegro’s architecture)
Columnar systems (see Nathan Myer’s first comment in this discussion of the much-hyped Required Technologies prototype)
Memory-centric systems, notably SAP’s BI Accelerator

The hardware logic is compelling, as long as we rely on hard disks rather than, say, flash memory. Rotation speed has only gone up 12.5-fold in the entire 50-year history of the hard drive, and currently maxes out at 15,000 RPM, which puts a floor of 2 ms (the average rotational latency) on average random access time. But streaming data on and off disk gets exponentially faster, in line with increases in disk density and semiconductor performance. Hence sequential data access gets ever faster, while random access does not.
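
The 2 ms floor is simple arithmetic: on average the head waits half a revolution for the right sector to come around. A small Python sketch of that calculation follows. The 1,200 RPM starting point is not stated above; it is derived by dividing 15,000 RPM by the 12.5-fold increase, and the function name is my own.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average wait for a sector to rotate under the head: half a revolution."""
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution / 2.0


CURRENT_MAX_RPM = 15_000        # figure from the post
SPEEDUP_OVER_50_YEARS = 12.5    # figure from the post
# Derived, not stated in the post: 15,000 / 12.5 = 1,200 RPM.
ORIGINAL_RPM = CURRENT_MAX_RPM / SPEEDUP_OVER_50_YEARS

if __name__ == "__main__":
    print(avg_rotational_latency_ms(CURRENT_MAX_RPM))  # 2.0
    print(avg_rotational_latency_ms(ORIGINAL_RPM))     # 25.0
```

So rotational latency has improved only from 25 ms to 2 ms over the drive's whole history, and seek time comes on top of that.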

What I don’t 100% understand yet, however, is the full array of techniques used by the traditional leaders to co-opt or combat this trend. I’m looking into that; in particular, I have a call scheduled with Oracle.

I hope to write about this issue in my October Computerworld column. (My columns are typically submitted on the first Monday or Tuesday morning of the month, to appear in the following week’s edition.) Or if it slips from October, then soon thereafter. Any thoughts in the interim would be most welcome.

Categories: Data warehouse appliances, DATAllegro, Memory-centric data management, SAP AG, Theory and architecture, TransRelational

Comments

4 Responses to “Is data warehousing now all about sequential access?”

DBMS2 — DataBase Management System Services»Blog Archive » I say “sequential”, you say … on September 20th, 2006 5:52 pm

[…] I talked with Teradata today, and they called me on my use of the term “sequential.” Basically, if there’s any head movement for disk seeks, some computer science researchers wouldn’t call it “sequential.” I didn’t know that; I was just familiar with the less precise usage of the term in some vendors’ marketing and discussions.* OK, I’ll make up a new, more precise term instead. How about “coarse-grained”? […]

David Aldridge on September 25th, 2006 5:52 pm

I’m absolutely behind anything that will suppress disk head latency as a factor in data warehouse performance. In fact I wrote something on the subject a little over a year ago. http://oraclesponge.wordpress.com/2005/07/25/time-slicing-of-disk-io/

I suppose that the vendors are still having trouble grasping how inherently different data warehouse workloads are from the small-and-random I/O model that OLTP generates.

Linux 2.6 Kernel I/O Schedulers for Oracle Data Warehousing: Part I « The Oracle Sponge on September 28th, 2006 10:47 pm

[…] This issue popped back into my head after being directed through Log Buffer #11 at Mark Rittman’s site to an article by Curt Monash titled “Is data warehousing all about sequential access?” and which matched my thoughts very well. […]

oraclesponge.com » Blog Archive » Linux 2.6 Kernel I/O Schedulers for Oracle Data Warehousing: Part I on May 20th, 2010 9:57 am

[…] through Log Buffer #11 at Mark Rittman’s site to an article by Curt Monash titled “Is data warehousing all about sequential access?” and which matched my thoughts very […]
