Are column stores really better at compression?
A consensus has evolved that:
- Columnar compression (i.e., value-based compression) compresses better than block-level compression (i.e., compression of bit strings).
- Columnar compression can be done pretty well in row stores.
Still somewhat controversial is the claim that:
- Columnar compression can be done even better in column stores than in row-based systems.
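To make the distinction in the first bullet concrete, here is a minimal sketch (not any particular product's implementation; the data and encoding choices are illustrative) contrasting block-level compression of a serialized bit string with value-based, per-column encoding of the same data:

```python
# Minimal sketch contrasting block-level compression (opaque bit string)
# with value-based, columnar compression (dictionary + run-length encoding).
import zlib

# A repetitive, low-cardinality column, as it might appear in a fact table.
country = ["US", "US", "US", "DE", "DE", "US", "US", "FR"] * 1000

# Block-level compression: treat the serialized values as an opaque byte string.
raw_bytes = ",".join(country).encode()
block_compressed = zlib.compress(raw_bytes)

# Value-based compression: dictionary-encode the values, then run-length encode
# the small integer codes -- the encoder exploits the column's value domain.
dictionary = {v: i for i, v in enumerate(dict.fromkeys(country))}
codes = [dictionary[v] for v in country]
runs = []
for code in codes:
    if runs and runs[-1][0] == code:
        runs[-1][1] += 1
    else:
        runs.append([code, 1])

print(len(raw_bytes), len(block_compressed), len(runs))
```

The point of the sketch is only that value-based encoding works on the column's actual values and their repetition, rather than on an undifferentiated stream of bytes.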
A strong plausibility argument for the latter point is that new in-memory analytic data stores tend to be columnar — think HANA or Platfora; compression is commonly cited as a big reason for the choice. (Another reason is that I/O bandwidth matters even when the I/O is from RAM, and there are further reasons yet.)
One group that made the in-memory columnar choice is the Spark/Shark guys at UC Berkeley’s AMP Lab. So when I talked with them Thursday (more on that another time, but it sounds like cool stuff), I took some time to ask why columnar stores are better at compression. In essence, they gave two reasons — simplicity, and speed of decompression.
In each case, the main supporting argument seemed to be that finding the values in a column is easier when they’re all together in a column store.
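As a hedged illustration of that "simplicity and speed of decompression" argument (names and layout are mine, not any specific engine's format), a scan over a contiguously stored, run-length-encoded column can operate on whole runs without ever materializing rows, whereas in a row layout the same column's values are interleaved with other fields and each row must be touched:

```python
# Column store: the 'status' column stored contiguously as (value, run_length) runs.
status_runs = [("active", 5000), ("closed", 200), ("active", 3000)]

# Count matching values directly from the compressed representation -- only 3 runs touched.
active_count = sum(n for value, n in status_runs if value == "active")

# Row store: the same data interleaved with other fields; every row is touched
# (and in practice decompressed as part of its page) just to read 'status'.
rows = [("id-%d" % i, "active" if i % 41 else "closed", i * 1.5)
        for i in range(8200)]
active_count_rows = sum(1 for _id, status, _amt in rows if status == "active")

print(active_count, active_count_rows)  # both 8000; the columnar path never rebuilds a row
```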