September 20, 2006
I say “sequential”, you say …
I talked with Teradata today, and they called me on my use of the term “sequential.” Basically, if there’s any head movement for disk seeks, some computer science researchers wouldn’t call it “sequential.” I didn’t know that; I was just familiar with the less precise usage of the term in some vendors’ marketing and discussions.* OK, I’ll make up a new, more precise term instead. How about “coarse-grained”?
*And so we have another instance of Monash’s First Law of Commercial Semantics: Bad jargon drives out good.
Comments
8 Responses to “I say “sequential”, you say …”
[…] Since 2002, Teradata has had a “cylinder read” option, allowing coarse-grained reads in the 2 megabyte size, comparable to what DATallegro or Netezza do all the time. Users evidently find this extremely valuable. […]
Well OK, I guess we do move the disk heads between reads. But that’s because we have sophisticated partitioning to minimize the amount of data read, rather than scanning the whole disk (who would want to do that?). Our appliance is very carefully optimized to minimize I/O waits due to head movement while reading enough data in each access to max out the disk arrays.
Let’s just do a simple calculation to see if this is effective. In a DATAllegro appliance running a complex mix of concurrent queries, we typically see up to 800MBps per node of twelve disks. At close to 70MBps per disk, that’s around the maximum sequential read speed as quoted by the manufacturer (where caching on the disk or controller is not involved). Hard for a computer scientist to argue with that, eh?
In contrast, random I/O reading one 32k page at a time will max out at around 300 transactions per second with even the most expensive disks. That’s only 9.6MBps.
Stuart
DATAllegro
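The arithmetic in the comment above is easy to check. Here is a minimal Python sketch using only the figures quoted there (800MBps per node, twelve disks, 300 random reads per second, 32k pages). The function names are made up for illustration, and 1 MB is taken as 1000 KB, which is what the 9.6MBps figure implies.

```python
def per_disk_mbps(node_mbps, disks):
    """Average throughput per disk when a node's total is spread evenly."""
    return node_mbps / disks


def random_io_mbps(reads_per_second, page_kb):
    """Throughput of random I/O that fetches one page per read.

    Assumes 1 MB = 1000 KB, matching the 9.6MBps figure in the comment.
    """
    return reads_per_second * page_kb / 1000


# Figures quoted in the comment; nothing here is measured.
sequential = per_disk_mbps(800, 12)   # roughly 67 MBps per disk
random_io = random_io_mbps(300, 32)   # 9.6 MBps per disk

print(f"large reads:  {sequential:.1f} MBps per disk")
print(f"random reads: {random_io:.1f} MBps per disk")
print(f"ratio:        {sequential / random_io:.1f}x")
```

On those numbers the large-read case moves roughly seven times as much data per disk as page-at-a-time random I/O.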
I’m wondering whether the ability that appliances have to keep disks near their theoretical limit is challenged by the anticipatory scheduler introduced in the Linux 2.6 kernel. Some tests that I have started with Oracle parallel query suggest that it might be.
The anticipatory scheduler takes a very small pause, on the order of one millisecond, after satisfying a large read request to see whether another read request is forthcoming that is contiguous with the first. If so, a head movement is avoided. The first test I’ve tried showed that close-to-maximum performance is sustained with eight simultaneous query slaves, with a throughput benefit of 60% over other schedulers.
I should add that the read requests were only 256KB, but it seems to me that this scheduler makes the read size very much less relevant.
I posted the first test results here … http://oraclesponge.wordpress.com/2006/10/02/linux-26-kernel-io-schedulers-for-oracle-data-warehousing-part-ii/
Any comments, pro or con, by those experienced with appliances or other technologies are very welcome, of course.
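The effect described in the comment above can be illustrated with a toy model. This is a sketch under heavy assumptions, not the kernel's algorithm: it ignores timing entirely, assumes each reader's next contiguous request always arrives within the pause, and uses a hypothetical `max_batch` limit to stand in for whatever bounds the real scheduler applies. It only counts how often the head has to move when several readers each scan their own contiguous region.

```python
def count_seeks(num_streams, reads_per_stream, anticipate, max_batch=16):
    """Count head movements in a toy model of concurrent sequential readers.

    anticipate=False: serve one request per reader in turn.
    anticipate=True:  stay with a reader for up to max_batch contiguous
                      reads (max_batch is a made-up illustrative limit).
    """
    # Each reader scans contiguous blocks in its own far-apart region.
    next_block = [s * 1_000_000 for s in range(num_streams)]
    remaining = [reads_per_stream] * num_streams
    head = -1    # block just past the last one read; -1 means unknown
    seeks = 0
    stream = 0
    while sum(remaining) > 0:
        # Skip readers that have already finished.
        while remaining[stream] == 0:
            stream = (stream + 1) % num_streams
        batch = min(max_batch, remaining[stream]) if anticipate else 1
        for _ in range(batch):
            if next_block[stream] != head:
                seeks += 1           # head had to move to reach this block
            head = next_block[stream] + 1
            next_block[stream] += 1
            remaining[stream] -= 1
        stream = (stream + 1) % num_streams
    return seeks


# Eight readers, as in the test described above; 64 reads each is arbitrary.
print(count_seeks(8, 64, anticipate=False))
print(count_seeks(8, 64, anticipate=True))
```

In this model the head moves on every request without anticipation and only once per batch with it, which is the sense in which the individual read size matters less. It says nothing about the actual 60% figure, which depends on real seek and transfer times.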
[…] By way of contrast, DATallegro would endorse 1, 2, and 5, but argue that table scans via sequential reads (I’ve happily given up the “coarse-grained” terminology, since almost nobody cares) obviate most or all of the need for 3 and 4. And Netezza – well, I guess I shouldn’t comment on their views, because of their strict NDA policy. […]
Rather than ‘sequential’ or ‘coarse-grained’, I suggest the terminology of ‘pseudo-sequential’. This captures the idea that near-sequential rates are being achieved regardless of the method, i.e., scheduling and reordering, combining, large transfer sizes, etc.
Mark,
That was actually the first thing I thought of. But I imagined questions like “Is that anything like ‘pseudo-conversational’?” and changed direction.
Now I’ve gone back to just using “sequential”, purism be damned.
Best,
CAM
Can someone help me with an Oracle 10g RAC installation on Windows Server 2003?
I would like to leave documentation on http://www.merovingio.it
thank you
Very helpful !!!
thanks