Database compression
Analysis of technology that compresses data within a database management system.
Key questions when selecting an analytic RDBMS
I recently complained that the Gartner Magic Quadrant for Data Warehouse DBMS conflates many use cases into one set of rankings. So perhaps now would be a good time to offer some thoughts on how to tell use cases apart. Assuming you know that you really want to manage your analytic database with a relational DBMS, the first questions you ask yourself could be:
- How big is your database? How big is your budget?
- How do you feel about appliances?
- How do you feel about the cloud?
- What are the size and shape of your workload?
- How fresh does the data need to be?
Let’s drill down. Read more
Comments on Gartner’s 2012 Magic Quadrant for Data Warehouse Database Management Systems — evaluations
To my taste, the most glaring mis-rankings in the 2012/2013 Gartner Magic Quadrant for Data Warehouse Database Management Systems are that it is too positive on Kognitio and too negative on Infobright. Secondarily, it is too negative on HP Vertica, and too positive on ParAccel and Actian/VectorWise. So let’s consider those vendors first.
Gartner seems confused about Kognitio’s products and history alike.
- Gartner calls Kognitio an “in-memory” DBMS, which is not accurate.
- Gartner doesn’t remark on Kognitio’s worst-in-class* compression.
- Gartner gives Kognitio oddly high marks for a late, me-too Hadoop integration strategy.
- Gartner writes as if Kognitio’s next attempt at the US market will be the first one, which is not the case.
- Gartner says that Kognitio pioneered data warehouse SaaS (Software as a Service), which actually has existed since the pre-relational 1970s.
Gartner is correct, however, to note that Kognitio doesn’t sell much stuff overall.
* non-existent
In the cases of HP Vertica, Infobright, ParAccel, and Actian/VectorWise, the facts in the 2012 Gartner Magic Quadrant for Data Warehouse Database Management Systems are fairly accurate, but I dispute Gartner’s evaluations. When it comes to Vertica: Read more
Comments on Gartner’s 2012 Magic Quadrant for Data Warehouse Database Management Systems — concepts
The 2012 Gartner Magic Quadrant for Data Warehouse Database Management Systems is out. I’ll split my comments into two posts — this one on concepts, and a companion on specific vendor evaluations.
Links:
- Maintaining working links to Gartner Magic Quadrants is an adventure. But as of early February, 2013, this link seems live.
- I also commented on the 2011, 2010, 2009, 2008, 2007, and 2006 Gartner Magic Quadrants for Data Warehouse DBMS.
Let’s start by again noting that I regard Gartner Magic Quadrants as a bad use of good research. On the facts:
- Gartner collects a lot of input from traditional enterprises. I envy that resource.
- Gartner also does a good job of rounding up vendor claims about user base sizes and the like. If nothing else, you should skim the MQ report for that reason.
- Gartner observations about product feature sets are usually correct, although not so consistently that they should be relied on.
When it comes to evaluations, however, the Gartner Data Warehouse DBMS Magic Quadrant doesn’t do as well. My concerns (which overlap) start with:
- The Gartner MQ conflates many different use cases into one ranking (inevitable in this kind of work, but still regrettable).
- A number of the MQ vendor evaluations seem hard to defend. So do some of Gartner’s specific comments.
- Some of Gartner’s criteria seemingly amount to “parrots back our opinions to us”.
- Gartner thinks, as do I, that a vendor’s business and financial strength are important. But Gartner overdoes the matter, drilling down into picky issues it can’t hope to judge, such as assessing a vendor’s “ability to generate and develop leads.”
- The 2012 Gartner Data Warehouse DBMS Magic Quadrant is closer to being a 1-dimensional ranking than a 2-dimensional one, in that entries are clustered along the line x=y. This suggests strong correlation among the results on the various specific evaluation criteria.
Categories: Data integration and middleware, Data warehousing, Database compression, Emulation, transparency, portability, Hadoop, Market share and customer counts, Oracle, Text | 5 Comments |
Tokutek update
Alternate title: TokuDB updates 🙂
Now that I’ve addressed some new NewSQL entrants, namely NuoDB and GenieDB, it’s time to circle back to some more established ones. First up are my clients at Tokutek, about whom I recently wrote:
Tokutek turns a performance argument into a functionality one. In particular, Tokutek claims that TokuDB does a much better job than alternatives of making it practical for you to update indexes at OLTP speeds. Hence, it claims to do a much better job than alternatives of making it practical for you to write and execute queries that only make sense when indexes (or other analytic performance boosts) are in place.
That’s all been true since I first wrote about Tokutek and TokuDB in 2009. However, TokuDB’s technical details have changed. In particular, Tokutek has deemphasized the ideas that:
- Vaguely justified the “fractal” metaphor, namely …
- … the stuff in that post about having one block of each power-of-2 size, …
- … which seem to be a form of what is more ordinarily called “cache-oblivious” technology.
Rather, Tokutek’s new focus for getting the same benefits is to provide a separate buffer for each node of a b-tree. In essence, Tokutek is taking the usual “big blocks are better” story and extending it to indexes. TokuDB also uses block-level compression. Notes on that include: Read more
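For a concrete feel of the buffering idea, here is a minimal Python sketch of a write-optimized index in the same general spirit. To be clear, this is my own illustration, not Tokutek’s design; for brevity it puts a single buffer in front of one sorted leaf level rather than a buffer per b-tree node, and every name in it is invented.

```python
import bisect

class BufferedTree:
    """Toy write-optimized index, in the spirit of buffered b-trees and
    fractal trees: inserts land in an in-memory buffer and are merged into
    the sorted ("on-disk") level only in batches, amortizing update cost."""

    def __init__(self, buffer_cap=1024):
        self.buffer_cap = buffer_cap
        self.buffer = {}    # pending writes: key -> value
        self.keys = []      # sorted keys, standing in for leaf blocks
        self.values = []

    def insert(self, key, value):
        self.buffer[key] = value          # O(1); no leaf block touched yet
        if len(self.buffer) >= self.buffer_cap:
            self._flush()                 # one batched merge instead of one
                                          # random leaf update per insert

    def _flush(self):
        for key in sorted(self.buffer):
            i = bisect.bisect_left(self.keys, key)
            if i < len(self.keys) and self.keys[i] == key:
                self.values[i] = self.buffer[key]   # overwrite existing key
            else:
                self.keys.insert(i, key)
                self.values.insert(i, self.buffer[key])
        self.buffer.clear()

    def lookup(self, key):
        if key in self.buffer:            # reads must consult the buffer too
            return self.buffer[key]
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None

idx = BufferedTree(buffer_cap=4)
for k in [5, 3, 9, 1, 7]:
    idx.insert(k, str(k))
print(idx.lookup(7))   # '7'
```

The point of the batching is that index maintenance stops being one random update per row, which is what makes rich indexing affordable at OLTP write rates.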
Categories: Akiban, Database compression, Market share and customer counts, NewSQL, Tokutek and TokuDB | 7 Comments |
Introduction to NuoDB
NuoDB has an interesting NewSQL story. NuoDB’s core design goals seem to be:
- SQL.
- Transactions.
- Very flexible topology, including:
  - Local replicas.
  - Remote replicas.
- Easy deployment and management.
Categories: Cache, Cloud computing, Clustering, Database compression, NewSQL, NuoDB | 5 Comments |
Are column stores really better at compression?
A consensus has evolved that:
- Columnar compression (i.e., value-based compression) compresses better than block-level compression (i.e., compression of bit strings).
- Columnar compression can be done pretty well in row stores.
Still somewhat controversial is the claim that:
- Columnar compression can be done even better in column stores than in row-based systems.
A strong plausibility argument for the latter point is that new in-memory analytic data stores tend to be columnar — think HANA or Platfora; compression is commonly cited as a big reason for the choice. (Another reason is that I/O bandwidth matters even when the I/O is from RAM, and there are further reasons yet.)
One group that made the in-memory columnar choice is the Spark/Shark guys at UC Berkeley’s AMP Lab. So when I talked with them Thursday (more on that another time, but it sounds like cool stuff), I took some time to ask why columnar stores are better at compression. In essence, they gave two reasons — simplicity, and speed of decompression.
In each case, the main supporting argument seemed to be that finding the values in a column is easier when they’re all together in a column store. Read more
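As a toy illustration of the simplicity point, consider run-length encoding a sorted column in Python. This is my own sketch with made-up data, not anything from the AMP Lab; but it shows why having all of a column’s values together makes value-based compression easy, and lets some scans run without decompressing at all.

```python
from itertools import groupby

# A sorted, low-cardinality column -- a common case in analytic tables.
column = ["CA"] * 4000 + ["NY"] * 3500 + ["TX"] * 2500

# Value-based compression: run-length encode the column directly.
rle = [(value, len(list(run))) for value, run in groupby(column)]
print(rle)            # [('CA', 4000), ('NY', 3500), ('TX', 2500)]

# Scans can operate on the compressed form itself -- here, the equivalent
# of SELECT COUNT(*) WHERE state = 'NY', with no decompression step:
print(sum(n for value, n in rle if value == "NY"))   # 3500
```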
Categories: Columnar database management, Database compression, Databricks, Spark and BDAS, In-memory DBMS, Netezza | 10 Comments |
Notes on Microsoft SQL Server
I’ve been known to gripe that covering big companies such as Microsoft is hard. Still, Doug Leland of Microsoft’s SQL Server team checked in for phone calls in August and again today, and I think I got enough to be worth writing about, albeit at a survey level only.
Subjects I’ll mention include:
- Hadoop
- Parallel Data Warehouse
- PolyBase
- Columnar data management
- In-memory data management (Hekaton)
One topic I can’t yet comment about is MOLAP/ROLAP, which is a pity; if anybody can refute my claim that ROLAP trumps MOLAP, it’s either Microsoft or Oracle.
Microsoft’s slides mentioned Yahoo refining a 6 petabyte Hadoop cluster into a 24 terabyte SQL Server “cube”, which was surprising in light of Yahoo’s history as an Oracle reference.
Quick notes on Impala
Edit: There is now a follow-up post on Cloudera Impala with substantially more detail.
In my world it’s possible to have a hasty 2-hour conversation, and that’s exactly what I had with Cloudera last week. We touched on hardware and general adoption, but much of the conversation was about Cloudera Impala, announced today. Like Hive, Impala turns Hadoop into a basic analytic RDBMS, with similar SQL/Hadoop integration benefits to those of Hadapt. In particular:
- Impala is Hive-compatible in query language (HQL, which is a whole lot like SQL), metadata, JDBC/ODBC drivers, etc. (A client-side sketch follows below.)
- Unlike Hive, Impala does not work through Hadoop MapReduce.
- Unlike Hadoop MapReduce and hence Hive, Impala does not persist intermediate results to disk. This is good for performance, but on extremely long-running queries it increases the risk you’ll have a node failure and have to restart the query from scratch.
- Impala in its first version is missing some Hive syntax, notably in support for UDFs (User-Defined Functions).
Beyond that: Read more
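To show what that compatibility can mean in practice, here is a hedged Python sketch of querying Impala over ODBC. Everything specific in it is an assumption on my part: it presumes an installed Impala ODBC driver, a data source name of “Impala”, and a hypothetical page_views table.

```python
import pyodbc  # assumes an Impala ODBC driver is installed and a DSN
               # named "Impala" is configured -- both are assumptions

# Impala speaks HiveQL over standard ODBC/JDBC, so ordinary SQL client
# code works unchanged; "page_views" is a hypothetical table.
conn = pyodbc.connect("DSN=Impala", autocommit=True)
cur = conn.cursor()
cur.execute("""
    SELECT url, COUNT(*) AS hits
    FROM page_views
    GROUP BY url
    ORDER BY hits DESC
    LIMIT 10
""")
for url, hits in cur.fetchall():
    print(url, hits)
conn.close()
```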
Categories: Cloudera, Columnar database management, Database compression, Hadapt, Hadoop, MapReduce, Open source, SQL/Hadoop integration | 6 Comments |
Introduction to Platfora
When I wrote last week that I have at least 5 clients claiming they’re uniquely positioned to support BI over Hadoop (most of whom partner with a 6th client, Tableau), the non-partnering exception I had in mind was Platfora, Ben Werther’s oh-so-stealthy startup that is finally de-stealthing today. Platfora combines:
- An interesting approach to analytic data management.
- Business intelligence tools integrated with that.
The whole thing sounds like a perhaps more general and certainly non-SaaS version of what Metamarkets has been offering for a while.
The Platfora technical story starts: Read more
Categories: Business intelligence, Columnar database management, Data models and architecture, Data warehousing, Database compression, Hadoop, Memory-centric data management, Platfora | 6 Comments |
Hoping for true columnar storage in Oracle12c
I was asked to clarify one of my July comments on Oracle12c,
I wonder whether Oracle will finally introduce a true columnar storage option, a year behind Teradata. That would be the obvious enhancement on the data warehousing side, if they can pull it off. If they can’t, it’s a damning commentary on the core Oracle codebase.
by somebody smart who, however, seemed to have half-forgotten my post comparing (hybrid) columnar compression to (hybrid) columnar storage.
In simplest terms (a code sketch follows this list):
- Columnar storage and columnar compression are two different things. The main connections are:
  - Columnar storage can make columnar compression more effective.
  - In different ways, both technologies reduce I/O.
- EMC Greenplum, Teradata Aster, and Teradata Classic are all originally row-based systems that have gone hybrid columnar.
- Vertica is an originally column-based system that has gone hybrid columnar.
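To make the distinction concrete, here is a toy Python sketch contrasting the two layouts. It is an illustration under invented assumptions (zlib as the compressor, 500-row compression units, fake data), not any vendor’s actual format.

```python
import zlib

# Invented, highly regular data -- real tables compress less neatly.
rows = [("CA", "2012", "premium")] * 1000 + [("NY", "2011", "basic")] * 1000

# True columnar storage: each column lives in its own contiguous region,
# so a query can read and decompress one column's bytes in isolation.
columns = list(zip(*rows))
col_size = sum(len(zlib.compress("".join(col).encode())) for col in columns)

# Hybrid columnar compression: rows are grouped into "compression units",
# and values are pivoted column-wise *within* each unit, while the units
# themselves still sit in an otherwise row-oriented store.
unit_size = 0
for start in range(0, len(rows), 500):          # 500-row units (arbitrary)
    unit = rows[start:start + 500]
    pivoted = "".join("".join(col) for col in zip(*unit))
    unit_size += len(zlib.compress(pivoted.encode()))

print(col_size, unit_size)  # similar compressed sizes -- but only true
                            # columnar storage lets a scan skip whole columns
```

Both layouts group like values together, which is why both compress well; the I/O difference is that only true columnar storage lets a query avoid reading columns it doesn’t need.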