October 14, 2009

Greenplum is going hybrid columnar as well

Over the past summer, Vertica, VectorWise, and Oracle all announced flavors of hybrid row/columnar storage. Now it’s Greenplum’s turn. Greenplum is actually offering true columnar storage, as opposed to Oracle’s PAX-like scheme — and also as opposed to the kind of Frankencolumn storage Daniel Abadi decries. For example, you don’t have to do a join to retrieve multiple columns; you just ask for them and there they are. Similarly, Greenplum doesn’t maintain explicit row IDs – whether in row-oriented or column-oriented append-only storage – relying instead on block-level header information.
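To make the "no join, no explicit row IDs" point concrete, here is a toy sketch of positional column storage. It is an illustration of the general idea only, not Greenplum's implementation: the names `Block`, `PositionalColumn`, `build_table`, and `fetch`, the `first_row` header field, and the fixed block size are all assumptions made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """One chunk of a single column. The header records where the chunk starts."""
    first_row: int                      # hypothetical block-level header field
    values: list = field(default_factory=list)


class PositionalColumn:
    """A column stored as fixed-size blocks; values carry no per-row ID."""

    def __init__(self, block_size=4):
        self.block_size = block_size
        self.blocks = []
        self.count = 0

    def append(self, value):
        # Start a new block when there is none yet or the last one is full.
        if not self.blocks or len(self.blocks[-1].values) == self.block_size:
            self.blocks.append(Block(first_row=self.count))
        self.blocks[-1].values.append(value)
        self.count += 1

    def get(self, row):
        # Locate the block by position, then offset from its header.
        block = self.blocks[row // self.block_size]
        return block.values[row - block.first_row]


def build_table(column_names, rows, block_size=4):
    """Split incoming rows into one PositionalColumn per column name."""
    table = {name: PositionalColumn(block_size) for name in column_names}
    for row in rows:
        for name, value in zip(column_names, row):
            table[name].append(value)
    return table


def fetch(table, names, row):
    """Read several columns for one row by position alone, with no join on a row ID."""
    return tuple(table[name].get(row) for name in names)
```

Calling `fetch(table, ["region", "amount"], 5)` on a table built with `build_table` lines up the two columns purely by position. The contrasting design, in which each column is its own two-column table of (row ID, value) pairs that must be joined back together, is the kind of thing the "Frankencolumn" label is aimed at.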

Highlights include:

- Column orientation is a special case of what Greenplum is calling Polymorphic Data Storage.*
- As per product management chief Ben Werther’s blog post, what Greenplum’s polymorphic data storage boils down to is that you can store different tables in different storage paradigms. This is transparent to the SQL or any other API; it’s just a performance choice.
- Indeed, Greenplum lets you store different partitions of the same table in different storage and/or compression schemes. So Greenplum now has a kind of ILM (Information Lifecycle Management) story, although it doesn’t offer the faster vs. cheaper storage media differentiation options of Sybase IQ or Vertica.
- Greenplum now has, depending on how one counts, three or four main types of table:
  - Traditional PostgreSQL, which has been available since Day One
  - Row-oriented append-only (compressible and scan-optimized), available since Greenplum 3.2 (July, 2008)
  - Columnar append-only (new in Greenplum 3.3.4, shipping now)
  - External, in which Greenplum treats something external – in a relational DBMS or otherwise – as if it were a Greenplum table
- Greenplum offers multiple versions of LZ (Lempel-Ziv) and gzip compression, any of which you can choose on a table-by-table or partition-by-partition basis.
- Greenplum offers the same compression algorithms for both row-oriented and column-oriented tables.
- Greenplum says that compression is typically at least 50% better (i.e., to 2/3 as much space) in columnar vs. row storage, for the same algorithm.
- Just as it doesn’t offer columnar-specific compression algorithms, Greenplum also doesn’t sport other columnar features Daniel loves, such as in-memory compression or late materialization. (But then, VectorWise doesn’t do in-memory compression either, and Daniel likes VectorWise.)
- All the Greenplum choices I’ve mentioned have to be made manually by DBAs.
- Similarly, I doubt Greenplum can match Vertica’s engineering for getting updates and trickle feeds quickly into a column store – a traditional columnar Achilles heel that Vertica has invested a lot of effort to circumvent.
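
The compression claim above packs in two ideas: the arithmetic behind "50% better means 2/3 the space," and the notion that the same general-purpose algorithm can do better on column-ordered bytes. The sketch below illustrates both under stated assumptions. It reads "50% better" as a 1.5x higher compression ratio, uses Python's `zlib` as a stand-in for the gzip family, and runs on a made-up two-column table (a clustered `region` column and a pseudo-random six-character code). None of the function names or data come from Greenplum, and the toy says nothing about how large the gap is on real data.

```python
import random
import string
import zlib


def space_fraction(improvement):
    """Space needed relative to a baseline when compression is `improvement` better.

    Assumes "50% better" (improvement=0.5) means a 1.5x higher compression ratio.
    """
    return 1.0 / (1.0 + improvement)


def build_rows(n, seed=0):
    """Toy table: a clustered low-cardinality column plus a high-entropy code."""
    rng = random.Random(seed)
    alphabet = string.ascii_letters + string.digits
    regions = ["north", "south", "east", "west"]
    rows = []
    for i in range(n):
        region = regions[i * len(regions) // n]            # long runs of one value
        code = "".join(rng.choice(alphabet) for _ in range(6))
        rows.append((region, code))
    return rows


def row_layout(rows):
    """Serialize one record per line, with the fields of each row interleaved."""
    return "\n".join(",".join(row) for row in rows).encode()


def column_layout(rows):
    """Serialize one column per line, so like values sit next to each other."""
    return "\n".join(",".join(col) for col in zip(*rows)).encode()


def compressed_sizes(n=2000):
    """Return (row-ordered size, column-ordered size) after zlib compression."""
    rows = build_rows(n)
    return (len(zlib.compress(row_layout(rows))),
            len(zlib.compress(column_layout(rows))))
```

`space_fraction(0.5)` gives 2/3, matching the parenthetical in the claim. Both layouts serialize to exactly the same number of raw bytes, so any difference in `compressed_sizes()` comes from byte ordering alone; in this toy the column-ordered form comes out smaller, because the repeated `region` values collapse into a few long matches instead of being re-referenced once per row.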

*The term “polymorphic” is somewhat, shall we say, overloaded these days.

Categories: Analytic technologies, Columnar database management, Data warehousing, Database compression, Greenplum, Theory and architecture

Comments

12 Responses to “Greenplum is going hybrid columnar as well”

Osma on October 14th, 2009 2:18 am

Interesting — the append-only compressed row store sounds kind of like a compressed MySQL/MyISAM table though. I’m curious how they’ve approached indexing in the column store mechanism. Have you found any data on that?
Seth Grimes on October 14th, 2009 6:59 am

Very helpful write-up!
DW Consultant on October 14th, 2009 12:23 pm

Nice write-up, although it sounds like Greenplum loves to copy technology rather than innovate. Does this mean that they cannot perform as well as a Columnar DBMS? Are they losing business to Columnar Vendors?
Ben Werther on October 14th, 2009 1:35 pm

DW Consultant —

– You’d have to agree that every vendor is building from a largely shared pool of ideas. Most of everything that every vendor does is covered in academic literature going back decades. Our goal isn’t being novel in everything we do — it is delivering value to customers.

– That being said, I think a little credit is due here. We’ve built a flexible enough storage infrastructure to allow us to (1) easily add a very efficient implementation of column-oriented tables, and (2) allow both row- and column-orientation to be used not just in the same database but in different partitions of the same table.

So why did we add this feature? It is about customer choice. For most analytical queries and mixed workloads – particularly with high-rate continuous microbatched loads – our row processing wins out over columnar approaches. (i.e. There are good reasons why the pure columnar guys aren’t winning mixed-workload EDW deals against Teradata like we are). But there are a lot of cases where columnar processing does great and does have an edge over row processing. Customers wanted the choice, so now we do both.
Daniel Abadi on October 14th, 2009 2:53 pm

Curt,

You pretty much predicted everything I was going to say, but nonetheless, my reactions can be found at:

http://dbmsmusings.blogspot.com/2009/10/greenplum-announces-column-oriented.html
Paul Johnson on October 15th, 2009 3:52 pm

Well done to Greenplum for offering more choice say I. A hybrid column/row capability is pretty cool.

We downloaded the new release a few days ago after Luke mentioned during a call that the new column stuff had been made available.

It’ll be interesting to see how it works once folks start beating on it.
The Top 10 Trends for 2010 in Analytics, Business Intelligence, and Performance Management « Enterprise Information Management on December 3rd, 2009 5:16 am

[…] Data, and the like with significant innovations in in-memory processing, exploiting parallelism, columnar storage options, and more. We already starting to see hybrid approaches between the Hadoop players and […]
Appregatta Blog » 2010: The Year of Business Intelligence on December 10th, 2009 6:08 am

[…] Data, and the like with significant innovations in in-memory processing, exploiting parallelism, columnar storage options, and more. Additionally, significant opportunities to push application processing into […]
Aster Data nCluster Version 4.6 | DBMS 2 : DataBase Management System Services on September 15th, 2010 3:35 am

[…] Aster Data has now joined Greenplum/EMC among row-based analytic DBMS vendors with hybrid row-column stores. Oracle will join them some […]
Columnar compression vs. column storage | DBMS 2 : DataBase Management System Services on February 6th, 2011 4:23 am

[…] that truly offer some form of hybrid row/column storage include Vertica, EMC/Greenplum, and Aster Data. Oracle Exadata, in my opinion, does not, but I can see why people might get […]
The Top 10 Trends for 2010 in Analytics, Business Intelligence, and Performance Management | Analytics Careers on November 11th, 2011 12:41 pm

[…] Data, and the like with significant innovations in in-memory processing, exploiting parallelism, columnar storage options, and more. We already starting to see hybrid approaches between the Hadoop players and […]
Comments on the analytic DBMS industry and Gartner’s Magic Quadrant for same : DBMS 2 : DataBase Management System Services on February 18th, 2012 6:43 pm

[…] neglects to praise Greenplum for true hybrid row/columnar data management, a feature shared by Teradata and Vertica, among others, but not by Oracle, DB2, or […]
