Word of the day: “Compression”
IBM sent over a bunch of success stories recently, with DB2’s new aggressive compression prominently mentioned. Mike Stonebraker made a big point of Vertica’s compression when last we talked; other column-oriented data warehouse/mart software vendors (e.g. Kognitio, SAP, Sybase) get strong compression benefits as well. Other data warehouse/mart specialists are doing a lot with compression too, although some of that is governed by please-don’t-say-anything-good-about-us NDA agreements.
Compression is important for at least three reasons:
- It saves disk space, which is a major cost issue in data warehousing.
- It saves I/O, which is the major performance issue in data warehousing.
- In well-designed systems, it can actually make on-chip execution faster, because the gains in memory speed and movement can exceed the cost of actually packing/unpacking the data. (Or so I’m told; I haven’t aggressively investigated that claim.)
When evaluating data warehouse/mart software, take a look at the vendor’s compression story. It’s important stuff.
EDIT: DATAllegro claims in a note to me that they get 3-4x storage savings via compression. They also make the observation that fewer disks ==> fewer disk failures, and spin that — as it were 🙂 — into a claim of greater reliability.
Comments
3 Responses to “Word of the day: “Compression””
Leave a Reply
[…] column stores allow the processors to operate directly on compressed data. But once again, I don’t see why row stores can’t do that too. For example, when you join via bitmapped indices, exactly what you’re doing is operating on highly-compressed data.Want to continue getting great research about DBMS, analytics, and other technologies related to data management? Then subscribe to our feed, by RSS/Atom or e-mail! We recommend taking the integrated feed for all our blogs, but blog-specific ones are also easily available. • • • […]
[…] I’ve recently made a lot of posts about database compression. 3X or more compression is rapidly becoming standard; 5X+ is coming soon as processor power increases; 10X or more is not unrealistic. True, this applies mainly to data warehouses, but that’s where the big database growth is happening. And new kinds of data — geospatial, telemetry, document, video, whatever — are highly compressible as well. […]
[…] и его коллеги открыли свой блог. Компрессия уже давно играет одну из главных ролей в истории DATAllegro, в то время, как Netezza занялась этим вопросом совсем […]