August 18, 2010

More on temp space, compression, and “random” I/O

My PhD was in a probability-related area of mathematics (game theory), so I tend to squirm when something is described as “random” that clearly is not. That said, a comment by Shilpa Lawande on our recent flash/temp space discussion suggests the following way of framing a key point:

- You really, really want to have multiple data streams coming out of temp space, as close to simultaneously as possible.
- The storage performance characteristics of such a workload are more reminiscent of “random” than “sequential” I/O.
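
To make the framing above concrete, here is a minimal Python sketch of one way multiple temp-space streams arise: merging several sorted runs, as an external sort might. It is an illustration, not a description of how Vertica or DB2 actually works. The helper names (`make_runs`, `merge_with_trace`, `count_switches`) and the toy data are hypothetical. The trace records which run each output value came from. Frequent switches between runs are what make the access pattern look more "random" than "sequential", even though each run is read front to back.

```python
import heapq


def make_runs(values, run_size):
    """Split values into sorted runs, as an external sort might spool them to temp space."""
    return [sorted(values[i:i + run_size]) for i in range(0, len(values), run_size)]


def merge_with_trace(runs):
    """Merge sorted runs, recording which run each output value was read from."""
    # Tag every value with the id of the run (stream) it lives in.
    tagged = [[(value, run_id) for value in run] for run_id, run in enumerate(runs)]
    merged, trace = [], []
    for value, run_id in heapq.merge(*tagged):
        merged.append(value)
        trace.append(run_id)
    return merged, trace


def count_switches(trace):
    """Count how often two consecutive reads come from different runs."""
    return sum(1 for a, b in zip(trace, trace[1:]) if a != b)


# Hypothetical toy data: 40 distinct values, spooled as 4 runs of 10.
values = [(i * 37) % 101 for i in range(40)]
runs = make_runs(values, 10)
merged, trace = merge_with_trace(runs)

print("run read order:", trace)
print("switches between runs:", count_switches(trace))
```

Reading the four runs one after another would switch streams only three times. The merge switches far more often, because it has to consume all the runs at roughly the same pace.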

If everybody else is cool with it too, I can live with that. 🙂

Meanwhile, I talked again with Tim Vincent of IBM this afternoon. Tim endorsed the temp space/Flash fit, but with a different emphasis, which upon review I find I don’t really understand. The idea is:

- Analytic DBMS processing generally stresses reads over writes.
- Temp space is an exception: read and write use of temp space is pretty balanced. (You spool data out once, you read it back in once, and that’s the end of that; next time it will be overwritten.)
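
The "spool out once, read back once" pattern can be sketched in a few lines of Python. This is only an illustration using an ordinary temporary file, not anything specific to DB2. The function name `spool_and_read_back` and the sample rows are hypothetical. The point is that the bytes written and the bytes read come out equal, which is the balance Tim describes.

```python
import tempfile


def spool_and_read_back(rows):
    """Write rows to a temp file once, read them back once, and count bytes each way."""
    bytes_written = 0
    bytes_read = 0
    result = []
    with tempfile.TemporaryFile() as spool:
        # Spool phase: every row is written exactly once.
        for row in rows:
            bytes_written += spool.write((row + "\n").encode("utf-8"))
        # Read-back phase: every row is read exactly once.
        spool.seek(0)
        for line in spool:
            bytes_read += len(line)
            result.append(line.decode("utf-8").rstrip("\n"))
    # The file is discarded here; the next query would spool fresh data.
    return result, bytes_written, bytes_read


rows_out, written, read = spool_and_read_back(["alpha", "beta", "gamma"])
print(written, read)
```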

My problem with that is: Flash typically has lower write than read IOPS (I/O per second), so a (relatively) write-intensive workload would, to a first approximation, seem if anything to be a worse fit for flash.
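
A rough back-of-the-envelope version of that first approximation, as a Python sketch. It assumes a simplified model in which I/Os are served one at a time, so the blended rate is a weighted harmonic mean of the read and write rates. The function name `effective_iops` and the IOPS figures below are hypothetical, chosen only to show the shape of the effect. They are not measurements of any real device.

```python
def effective_iops(read_iops, write_iops, read_fraction):
    """Blended IOPS for a mix of reads and writes, assuming I/Os are served one at a time."""
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    # Average time per I/O is the mix-weighted average of the per-operation times.
    avg_seconds_per_io = read_fraction / read_iops + (1.0 - read_fraction) / write_iops
    return 1.0 / avg_seconds_per_io


# Hypothetical device: writes are 5x slower than reads.
READ_IOPS = 10_000
WRITE_IOPS = 2_000

read_heavy = effective_iops(READ_IOPS, WRITE_IOPS, read_fraction=0.9)
balanced = effective_iops(READ_IOPS, WRITE_IOPS, read_fraction=0.5)
print(f"90% reads: {read_heavy:.0f} IOPS, 50% reads: {balanced:.0f} IOPS")
```

Under this model the balanced mix always gets fewer blended IOPS than the read-heavy one whenever writes are slower than reads, which is the concern stated above.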

On the plus side, I was reminded of something I should have noted when I wrote about DB2 compression before:

Much like Vertica, DB2 operates on compressed data all the way through, including in temp space.

Categories: Data warehousing, Database compression, IBM and DB2, Vertica Systems

Comments

6 Responses to “More on temp space, compression, and “random” I/O”

DB2 workload management | DBMS2 -- DataBase Management System Services on August 18th, 2010 4:47 am

[…] By way of contrast, Tim is cautious about the common approach of just lowering a query’s priority. His concern is that a long-running query could linger even longer, creating a long-lasting bottleneck in, for example, temp space. […]

Maris Darbonis on August 18th, 2010 5:27 am

Well, for multiple data streams maybe “concurrency” is the key word. If data needs to be fetched from several places on the storage device concurrently, that places a rather heavy load on the read heads of rotating disks. And temp space is often used by several sessions concurrently.
It is also used for sort segments and hashes, which have random access patterns; for example, this presentation shows that SSDs are a good fit for that:
http://www.cs.arizona.edu/~bkmoon/papers/sigmod08ssd-slides.pdf , slides 17-20.

Curt Monash on August 18th, 2010 6:53 am

Slide 19 is really interesting. Thanks!

Juan Benavides on August 18th, 2010 7:59 am

Regarding the IBM (Tim) comment:
In the early days of DB2, temp storage was mostly used for sorting the output of queries such as ORDER BY. In most of these queries (though not all, e.g. GROUP BY), the number of writes approaches the number of reads.

Alex B on August 20th, 2010 10:47 am

Speaking of concurrency: I haven’t noticed any degradation. Instead, I saw a 400% performance improvement.
http://code.google.com/p/mist01/wiki/Vertica_demystified
Is this because I used relatively small datasets for this test?

Introduction to Kaminario | DBMS 2 : DataBase Management System Services on December 5th, 2010 6:00 am

[…] you can choose to put just your most bottlenecking data on Kaminario K2 – the hot stuff, your temp space, your logs, […]
