March 24, 2007
Will database compression change the hardware game?
I’ve recently written a number of posts about database compression. 3X or more compression is rapidly becoming standard; 5X+ is coming soon as processor power increases; 10X or more is not unrealistic. True, this applies mainly to data warehouses, but that’s where the big database growth is happening. And new kinds of data — geospatial, telemetry, document, video, whatever — are highly compressible as well.
This trend suggests a few interesting possibilities for hardware, semiconductors, and storage.
- The growth in demand for storage might actually slow. That said, I frankly think it’s more likely that Parkinson’s Law of Data will continue to hold: Data expands to fill the space available. E.g., video and other media have near-infinite potential to consume storage; it’s just a question of resolution and fidelity.
- Solid-state (aka semiconductor or flash) persistent storage might become practical sooner than we think. If you really can fit a terabyte of data onto 100 gigs of flash, that’s a pretty affordable alternative. And by the way — if that happens, a lot of what I’ve been saying about random vs. sequential reads might be irrelevant.
- Similarly, memory-centric data management is more affordable when compression is aggressive. That’s a key point of schemes such as SAP’s or QlikTech’s. Who needs flash? Just put it in RAM, persisting to disk only for backup.
- There’s a use for faster processors. Compression isn’t free. What you save on disk space and I/O you pay for at the CPU level. Those 5X+ compression levels do depend on faster processors, at least for the row store vendors.
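The footprint arithmetic behind these bullets is simple enough to sketch. The ratios and the 1 TB warehouse size below are illustrative assumptions, not vendor benchmarks:

```python
# Back-of-envelope sketch of the compression arithmetic above.
# Ratios and warehouse size are illustrative assumptions.

def compressed_size_gb(raw_gb: float, ratio: float) -> float:
    """Footprint of a warehouse after compression at the given ratio."""
    return raw_gb / ratio

raw = 1000.0  # a 1 TB warehouse
for ratio in (3, 5, 10):
    print(f"{ratio}X compression: {compressed_size_gb(raw, ratio):.0f} GB")
# At 10X, the terabyte fits on roughly 100 GB of flash or RAM.
```

At 10X the whole warehouse lands in a footprint small enough for 2007-era flash or server RAM, which is what makes the hardware question interesting.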
Categories: Data warehousing, Database compression, Memory-centric data management, QlikTech and QlikView, SAP AG
Comments
6 Responses to “Will database compression change the hardware game?”
Hi Curt,
Just a quick comment regarding flash.
Flash doesn’t actually support the bandwidth required for data warehousing. While flash can support orders of magnitude more IOPS than disk, the amount of data transferred per I/O is fairly small. The bandwidth provided is basically (size of read)*(number of IOPS). In comparison, disks provide few IOPS, but under the right circumstances, the read bandwidth becomes very large.
Most flash devices are not going to get anywhere close to the 70-100MB/s that a disk can achieve (reading data that has high spatial locality). This means flash is largely unsuitable for the underlying data in a warehouse.
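The bandwidth formula in the comment above can be worked through directly. The device numbers below are illustrative assumptions in the spirit of the era’s hardware, not measurements:

```python
# Sketch of the comment's formula: bandwidth = (size of read) * (number of IOPS).
# Device figures are illustrative assumptions, not benchmarks.

def bandwidth_mb_s(iops: float, read_kb: float) -> float:
    """Effective throughput given an IOPS rate and a per-I/O transfer size."""
    return iops * read_kb / 1024.0

# Flash: lots of small random reads.
flash = bandwidth_mb_s(iops=10_000, read_kb=4)
# Disk scanning sequentially: few IOPS, but large transfers per I/O.
disk = bandwidth_mb_s(iops=150, read_kb=512)
print(f"flash ~{flash:.0f} MB/s, disk ~{disk:.0f} MB/s")
```

Even at 10,000 IOPS, 4 KB reads yield under 40 MB/s, while a disk doing 150 large sequential reads per second sustains about 75 MB/s, which is the asymmetry the comment describes.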
Flash makes sense for OLTP (actually, Texas Memory Systems has devices targeted specifically at OLTP). The only issue there is that frequent writes will eventually wear down the flash; if you’re sustaining thousands of writes/sec, it may prove to be an unwise choice.
DK
Thanks, David!
But is that a locked-in aspect of the technology, or can we expect it to improve over time?
So, I believe that the throughput/bandwidth limitation of solid-state memory is here to stay for a while. Just to satisfy our mathematical curiosity, here are some stats on flash drives:
http://www.bitmicro.com/products_storage_devices.php
Most of them look like they provide up to 70MB/s – half the bandwidth of a 15K RPM drive – but ridiculous numbers of IOPS. Most disks top out at 150 IOPS; these provide up to 10x that.
The other side of the equation is the tolerance for writes. While flash memory will never be up to par for heavy write workloads, it’s possible that future types of solid-state memory could handle 10x more writes. That might be sufficient for many transactional workloads.
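The write-endurance concern can also be put in rough numbers. The cycle count, device size, and write rate below are illustrative assumptions (program/erase endurance varies widely by flash generation), and the calculation assumes perfect wear leveling:

```python
# Rough wear-out estimate for a flash device under a sustained write load.
# Cycle count, size, and write rate are illustrative assumptions;
# perfect wear leveling across the device is assumed.

def lifetime_years(device_gb: float, pe_cycles: int, write_mb_per_s: float) -> float:
    """Years until every cell has been rewritten pe_cycles times."""
    total_writable_mb = device_gb * 1024 * pe_cycles
    seconds = total_writable_mb / write_mb_per_s
    return seconds / (365 * 24 * 3600)

# 100 GB device, 10,000 erase cycles per cell, sustaining 10 MB/s of writes:
print(f"~{lifetime_years(100, 10_000, 10):.1f} years")
```

Under these assumptions the device lasts on the order of three years, so a 10x improvement in write endurance would indeed move flash from marginal to comfortable for many transactional workloads.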
Also, it’s probably good to keep in mind that the MTTF numbers that disk vendors provide are likely not very accurate. I think it would be interesting to compare the TTF for a set of solid state and traditional disk drives.
DK
I have used QlikView for almost 6 years now and the compression is unbelievable. I routinely get over 20x compression and average 1-2 bytes per field value.
[…] While I love this work I always seem to come away with a couple of nagging issues. One being the amount of memory required to store state on the server for many requests, which over time will become less of an issue. […]
That increase will be very good for games…
Maybe late, but it will.