May 22, 2008
Netezza on compression
Phil Francisco put up a nice post on Netezza’s company blog about a month ago, explaining the Netezza compression story. Highlights include:
- Like other row-based vendors, Netezza compresses data on a column-by-column basis, then stores the results in rows. This is obviously something of a limitation — no run-length encoding for them — but can surely accommodate several major compression techniques.
- The Netezza “Compress Engine” compresses data on a block-by-block basis. This is a disadvantage for row-based systems vs. columnar ones in the area of compression, because columnar systems have more values per block to play with, and that yields higher degrees of compression. And among row-based systems, typical block size is an indicator of compression success. Thus, DATAllegro probably does a little better at compression than Netezza, and Netezza does a lot better at compression than Teradata.
- Netezza calls its compression “compilation.” The blog post doesn’t make the reason clear. And the one reason I can recall confuses me. Netezza once said the compression extends at least somewhat to columns with calculated values. But that seems odd, as Netezza only has a very limited capability for materialized views.
- Netezza pays the processing cost of compression in the FPGA, not the microprocessor. And so Netezza spins the overhead of the Compress Engine as being zero or free. That’s actually not ridiculous, since Netezza seems to have still-unused real estate on the FPGA for new features like compression.
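To make the first two points concrete, here is a minimal Python sketch. It is not Netezza's algorithm; the toy table, the helper names `rle_encode` and `compressed_size`, and the use of zlib as a stand-in compressor are all assumptions made for illustration. It shows why run-length encoding finds nothing to work with once column values are interleaved into rows, and why compressing larger blocks independently tends to beat compressing many small ones.

```python
import zlib
from itertools import groupby


def rle_encode(values):
    """Run-length encode a sequence into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def compressed_size(data: bytes, block_size: int) -> int:
    """Total bytes when each block_size-byte block is compressed on its own."""
    return sum(
        len(zlib.compress(data[i:i + block_size]))
        for i in range(0, len(data), block_size)
    )


# Hypothetical toy table of (region, status) rows, sorted by region.
rows = [("east", "open")] * 4 + [("east", "closed")] * 2 + [("west", "open")] * 6

# Columnar layout: each column's values sit next to each other.
region_column = [row[0] for row in rows]
status_column = [row[1] for row in rows]

# Row layout: values from different columns are interleaved.
row_stream = [value for row in rows for value in row]

columnar_runs = rle_encode(region_column) + rle_encode(status_column)
row_runs = rle_encode(row_stream)
print(len(columnar_runs), "runs in columnar layout")
print(len(row_runs), "runs in row layout")

# Block-size effect: the same bytes, compressed in small vs. large blocks.
column_bytes = ",".join(region_column * 200).encode()
small_blocks = compressed_size(column_bytes, 128)
large_blocks = compressed_size(column_bytes, 8192)
print(small_blocks, "bytes with 128-byte blocks")
print(large_blocks, "bytes with 8192-byte blocks")
```

In the row layout every run has length one, so run-length encoding saves nothing. The block-size half of the sketch only illustrates the general principle that bigger blocks give a compressor more repetition to exploit; real block sizes and compression ratios for any of the vendors named above are not modeled here.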
Also in the post is a lot of chest-beating about Netezza’s thought-leading greatness, hyperbolically comparing the company to David Hume, or perhaps to the combination of David Hume and Immanuel Kant. Actually, I don’t think a careful analysis of either Hume’s or Kant’s work would give Phil much joy in the area of marketing metaphors, but hey — anything that calls attention to Hume’s greatness is OK by me. 🙂
Categories: Analytic technologies, Columnar database management, Data warehouse appliances, Data warehousing, Database compression, Netezza, Theory and architecture
Comments
2 Responses to “Netezza on compression”
Netezza calls their compression feature compiled tables because the data on disk is basically a set of instructions that when fed into the FPGA results in a data set being produced (the original uncompressed / unencrypted data).
Compiled tables will also be used to implement encryption. Other possible uses of the compiled tables feature are row-level security and virtual columns.
Shawn,
Yes, that’s what they say. But the “set of instructions” bit could be said of any compression scheme.
CAM