Database compression
Analysis of technology that compresses data within a database management system.
IBM InfoSphere Warehouse pricing, packaging, compression and more
IBM InfoSphere Warehouse 9.7.3 has been announced, and is planned for general availability late this month. IBM InfoSphere Warehouse is, in essence, DB2-plus, where the “plus” comprises:
- DPF (Database Partitioning Feature) — i.e., the ability to do shared-nothing scale-out.
- Unimportant add-ons — e.g., a mere 5 seats of the Cognos BI tool.
The main news in this release of InfoSphere Warehouse is probably pricing. While IBM has long had a funky server-power-based pricing scheme, it is now adding per-terabyte pricing, with a twist: IBM InfoSphere Warehouse now can be bought per terabyte of compressed user data. Specifically:
- IBM InfoSphere Warehouse 9.7.3 Enterprise Edition can be bought for production for $70K or so per terabyte of compressed user data.
- IBM InfoSphere Warehouse 9.7.3 Departmental Edition can be bought for production for $35K or so per terabyte of compressed user data.
- Development/test seats of IBM InfoSphere Warehouse cost about $2K per user.
- High availability/disaster recovery instances are priced as if they were managing 1 TB each — unless, of course, you have an active-active configuration, in which case they’re priced according to their full amount of data.
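To make those list prices concrete, here is a minimal sketch of the per-compressed-terabyte arithmetic. The $70K and $35K figures come from the bullets above; the raw data volume and the 4X compression ratio are purely hypothetical:

```python
# Hypothetical illustration of per-compressed-terabyte pricing.
ENTERPRISE_PER_TB = 70_000    # list price, Enterprise Edition (from above)
DEPARTMENTAL_PER_TB = 35_000  # list price, Departmental Edition (from above)

def license_cost(raw_tb, compression_ratio, price_per_tb):
    """The license is charged on compressed user data, so better
    compression directly lowers the bill."""
    compressed_tb = raw_tb / compression_ratio
    return compressed_tb * price_per_tb

# 10 TB of raw user data at an assumed 4X compression ratio
# is billed as 2.5 TB of compressed data:
print(license_cost(10, 4.0, ENTERPRISE_PER_TB))    # 175000.0
print(license_cost(10, 4.0, DEPARTMENTAL_PER_TB))  # 87500.0
```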
Per-terabyte pricing is generally a good way to think about analytic DBMS costs, for at least two reasons: Read more
Categories: Data warehousing, Database compression, IBM and DB2, Pricing | 1 Comment |
Endeca topics
I visited my then-clients at Endeca in January. We focused more on underpinnings (and strategic counsel) than on the coolness of what the product actually does. But going over my notes, I think there’s enough to write up now.
Before saying much else about Endeca, there’s one confusion to dispose of: What’s the relationship between Endeca’s efforts in e-commerce (helping shoppers navigate websites) and business intelligence (helping people navigate their own data)? As Endeca tells it:
- Endeca’s e-commerce and business intelligence efforts are reflections of the same technical approach. Indeed, I’m pretty sure Endeca’s product lines still share much/most of the same technology.
- Endeca went after e-commerce first because that’s where the provable ROI was. As I pointed out a couple of times in 2007, Endeca became a market leader in that area.
- Endeca increased its BI efforts later.
- Circa 2009-10, Endeca differentiated its e-commerce and BI product lines from each other.
- An e-commerce line extension called Page Builder is what really got Endeca through the recent recession.
- The BI product line Latitude was launched in the fall of 2010.
Endeca’s positioning in the business intelligence market boils down to “investigative analytics for people who aren’t hardcore analysts.” Endeca’s technological support for that stresses: Read more
Categories: Business intelligence, Columnar database management, Database compression, Endeca | 11 Comments |
Introduction to Syncsort and DMExpress
Let’s start with some Syncsort basics.
- Syncsort was founded in 1968.
- As you might guess from its name and age, Syncsort started out selling software for IBM mainframes, used for sorting data. However, for the past 30 or so years, Syncsort’s products have gone beyond sort to also do join, aggregation, and merge (the sketch after this list illustrates why sorting underlies those operations). This was the basis for Syncsort’s expansion into the more general ETL (Extract/Transform/Load) business.
- As you might further guess, along the way there was a port to UNIX, development of a GUI (Graphical User Interface), and a change of ownership as Syncsort’s founder more or less cashed out.
- At this point, Syncsort sees itself primarily as a data integration/ETL company, whose main claim to fame is performance, with further claims of linear scaling and no manual tuning.*
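Since the list mentions sort, join, aggregation, and merge, here is the promised sketch (mine, in illustrative Python, not Syncsort’s actual code) of a sort-merge join and a sorted group-by, showing why a fast sort is a natural base for the other operations:

```python
from itertools import groupby
from operator import itemgetter

def sort_merge_join(left, right, key=itemgetter(0)):
    """Sort both inputs on the join key, then merge matching groups in a
    single pass. This is the classic reason a fast sort also buys you
    join and merge."""
    left, right = sorted(left, key=key), sorted(right, key=key)
    i = 0
    for lrow in left:
        while i < len(right) and key(right[i]) < key(lrow):
            i += 1
        j = i
        while j < len(right) and key(right[j]) == key(lrow):
            yield lrow + right[j][1:]
            j += 1

def sorted_aggregate(rows, key=itemgetter(0)):
    """Group-by over sorted input needs no hash table:
    each group arrives contiguously."""
    for k, group in groupby(sorted(rows, key=key), key=key):
        yield k, sum(r[1] for r in group)

customers = [(1, "Acme"), (2, "Globex")]
orders = [(2, 50), (1, 30), (1, 20)]  # (customer id, order amount)
print(list(sort_merge_join(customers, orders)))
# [(1, 'Acme', 20), (1, 'Acme', 30), (2, 'Globex', 50)]
print(list(sorted_aggregate(orders)))  # [(1, 50), (2, 50)]
```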
One of Syncsort’s favorite value propositions is to contrast the cost of doing ETL in Syncsort, on commodity hardware, to the cost of doing ELT (Extract/Load/Transform) on high-end Teradata gear.
Categories: Data integration and middleware, Database compression, EAI, EII, ETL, ELT, ETLT, Specific users, Syncsort | 9 Comments |
Teradata, Aster Data, and Teradata/Aster
Teradata is acquiring Aster Data. Naturally, the deal is being presented with a Treaty of Tordesillas kind of positioning — Teradata does X, Aster Data does Y, and everybody looks forward to having X and Y in the same product portfolio. That said, my initial positioning and product strategy thoughts on the Teradata/Aster combination go something like this. Read more
Categories: Analytic technologies, Aster Data, Columnar database management, Data warehouse appliances, Data warehousing, Database compression, RDF and graphs, Specific users, Teradata | 9 Comments |
Columnar compression vs. column storage
I’m getting the increasing impression that certain industry observers, such as Gartner, are really confused about columnar technology. (I further suspect that certain vendors are encouraging this confusion, as vendors commonly do.) So here are some basic points.
A simple way to think about the difference between columnar storage and columnar (or any other kind of) compression is this:
- Columnar storage is a reference to how data is grouped together on disk (or in solid-state memory).
- (Columnar) compression is a reference to whether the actual data is on disk, or whether you save space by storing some smaller substitute for the actual data.
Specifically, if data in a relational table is grouped together according to what row it’s in, then the database management system is called “row-based” or a “row store.” If it’s grouped together according to what column it’s in, then it’s called “columnar” or a “column store.” Increasingly, row-based and columnar storage are being hybridized.
There are two main kinds of compression — compression of bit strings and more intelligent compression of actual data values. Compression of actual data values can reasonably be called “columnar,” in that different columns of data can be compressed in different ways, often depending only on the data in that column.* Read more
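Here is a minimal sketch of the distinction in illustrative Python (not any vendor’s actual format): the same table grouped row-wise and then column-wise, plus two per-column encodings of the “intelligent compression of actual data values” kind:

```python
# The same relational table, grouped two different ways on "disk".
rows = [("Alice", "MA", 2010), ("Bob", "MA", 2011), ("Carol", "NY", 2011)]

# Row store: values that share a row sit together.
row_layout = rows

# Column store: values that share a column sit together.
columns = {
    "name":  [r[0] for r in rows],
    "state": [r[1] for r in rows],
    "year":  [r[2] for r in rows],
}

# "Columnar" compression: each column can be encoded on its own terms,
# depending only on the data in that column.
def dictionary_encode(values):
    """Replace repeated values with small integer codes plus a dictionary."""
    dictionary = sorted(set(values))
    codes = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [codes[v] for v in values]

def run_length_encode(values):
    """Collapse runs of identical adjacent values into (value, count) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

print(dictionary_encode(columns["state"]))  # (['MA', 'NY'], [0, 0, 1])
print(run_length_encode(columns["state"]))  # [['MA', 2], ['NY', 1]]
```

Note that a row store can apply these same value-level encodings column by column, which is the sense in which a system can offer “columnar compression” without being a column store.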
Categories: Columnar database management, Data warehousing, Database compression, Exadata, Vertica Systems | 21 Comments |
Comments on the Gartner 2010/2011 Data Warehouse Database Management Systems Magic Quadrant
Edit: Comments on the February, 2012 Gartner Magic Quadrant for Data Warehouse Database Management Systems — and on the companies reviewed in it — are now up.
The Gartner 2010 Data Warehouse Database Management Systems Magic Quadrant is out. I shall now comment, just as I did to varying degrees on the 2009, 2008, 2007, and 2006 Gartner Data Warehouse Database Management System Magic Quadrants.
Note: Links to Gartner Magic Quadrants tend to be unstable. Please alert me if any problems arise; I’ll edit accordingly.
In my comments on the 2008 Gartner Data Warehouse Database Management Systems Magic Quadrant, I observed that Gartner’s “completeness of vision” scores were generally pretty reasonable, but their “ability to execute” rankings were somewhat bizarre; the same remains true this year. For example, Gartner ranks Ingres higher by that metric than Vertica, Aster Data, ParAccel, or Infobright. Yet each of those companies is growing nicely and delivering products that meet serious cutting-edge analytic DBMS needs, neither of which has been true of Ingres since about 1987. Read more
Exadata notes
It’s been a while since I penetrated Oracle’s tight message control and actually talked with them about Exadata. But Doug Henschen wrote a good article about Exadata based on an Andy Mendelsohn webcast. I agree with almost all of it. At first I was a little surprised that Exadata’s emphasis shift from data warehousing to OLTP/generic consolidation hasn’t gone more quickly, but on the other hand:
- On the data warehouse side Exadata can alleviate screaming pain points.
- In OLTP consolidation, Exadata mainly can save money. (Yes, I just said a product from Oracle can save customers money, and I meant it. You may stop laughing at any time.)
Doug did overstate when he said that columnar architectures give 100X or more compression. That doesn’t happen. Yes, columnar compression can be >10X in a variety of use cases, while pre-Exadata Oracle index bloat can approach 10X at times; but even if you’re counting that way, I doubt there are many instances in which it actually multiplies out to >100X.
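For what it’s worth, here is the arithmetic behind the most generous version of such a claim, with both ratios assumed at the high end:

```python
raw_tb = 10                   # hypothetical raw user data
columnar_tb = raw_tb / 10     # assume a generous 10X columnar compression
bloated_row_tb = raw_tb * 10  # assume row-store indexes bloat data ~10X
print(bloated_row_tb / columnar_tb)  # 100.0, but this conflates compression
                                     # with avoided index bloat
```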
In other Exadata news, the long-standing observation that Oracle doesn’t like to do on-site Exadata POCs still holds true. A couple of existing Oracle users — one rather well-known — recently told me that Oracle won’t let them test Exadata except on Oracle premises. In one case, this is a deal-breaker keeping Exadata from being considered for purchase, and Oracle still won’t budge.
Finally, I’m pretty sure that this “new” Softbank Teradata replacement Oracle has been touting since September as competitive evidence — which Doug’s article also references — isn’t quite what it sounds like. I believe Teradata’s version of the story, which, somewhat edited, goes like this: Read more
Categories: Benchmarks and POCs, Columnar database management, Data warehouse appliances, Database compression, Exadata, Oracle, Teradata | 26 Comments |
Architectural options for analytic database management systems
Mike Stonebraker recently kicked off some discussion about desirable architectural features of a columnar analytic DBMS. Let’s expand the conversation to cover desirable architectural characteristics of analytic DBMS in general. Read more
Mike Stonebraker on “real column stores”
Mike Stonebraker has a post up on Vertica’s blog trying to differentiate “real” from “pretend” column stores. (Edit: That post seems to have been taken down, but as of 1/19 it can be found in Google Cache.) In essence, Mike argues that the One Right Way to design a column store is Vertica’s, a position that Daniel Abadi used to share but has since retreated from.
There are some good things about that post, and some not-so-good. The worst paragraph is probably
Several row-store vendors (including Oracle, Greenplum and Aster Data) now claim to be selling a column store. Obviously, this would require a complete rewrite of a DBMS to move from Figure 1 to Figure 2. Hence, none of the “pretenders” have actually done this. Instead all have implemented some aspects of column stores, and then claim to be the real thing. This blog defines what the “real enchilada” looks like, and how to tell it from the pretenders.
which I question on two levels. Read more
Categories: Aster Data, Columnar database management, Database compression, Michael Stonebraker, Sybase, Theory and architecture, Vertica Systems | 24 Comments |
Where ParAccel is at
Until recently, I was extremely critical of ParAccel’s marketing. But there was an almost-clean sweep of the relevant ParAccel executives, and the specific worst practices I was calling out have for the most part been eliminated. So I was open to talking and working with ParAccel again, and that’s now happening. On my recent California trip, I chatted with three ParAccel folks for a few hours. Based on that and other conversations, here’s the current ParAccel story as I understand it.
Read more