Database compression

Analysis of technology that compresses data within a database management system.

May 29, 2009

Sneakernet to the cloud

Recently, Amazon CTO Werner Vogels put up a blog post which suggested that, now and in the future, the best way to get large databases into the cloud is via sneakernet.  In some circumstances, he is surely right. Possible implications include:

But for one-time moves of data sets — sure, sneakernet/snail mail should work just fine.
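For a rough sense of the trade-off, here is a back-of-the-envelope sketch in Python; the database size, link speeds, and utilization factor below are illustrative assumptions of mine, not figures from Amazon's post.

# Back-of-the-envelope comparison: network upload vs. shipping disks.
# All figures here are illustrative assumptions, not vendor-published numbers.

def upload_days(data_tb: float, bandwidth_mbps: float, utilization: float = 0.8) -> float:
    """Days to push data_tb terabytes over a link of bandwidth_mbps megabits/second."""
    bits = data_tb * 1e12 * 8                      # decimal terabytes -> bits
    seconds = bits / (bandwidth_mbps * 1e6 * utilization)
    return seconds / 86400

if __name__ == "__main__":
    data_tb = 10                                   # hypothetical 10 TB database
    for mbps in (10, 45, 100, 1000):
        print(f"{data_tb} TB over {mbps:>4} Mbps: {upload_days(data_tb, mbps):6.1f} days")
    print(f"{data_tb} TB via overnight courier:   ~1-2 days, regardless of link speed")

At anything below gigabit speeds, a 10-terabyte upload is measured in weeks or months, while a courier takes a day or two either way.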

May 26, 2009

Teradata Developer Exchange (DevX) begins to emerge

Every vendor needs developer-facing web resources, and Teradata turns out to have been working on a new umbrella site for its developer community.  It’s called Teradata Developer Exchange — DevX for short.  Teradata DevX seems to be in a low-volume beta now, with a press release/bigger roll-out coming next week or so.  Major elements are about what one would expect:

If you’re a Teradata user, you absolutely should check out Teradata DevX.  If you just research Teradata — my situation 🙂 — there are some aspects that might be of interest anyway.  In particular, I found Teradata’s downloads instructive, most particularly those in the area of extensibility.  Mainly, these are UDFs (User-Defined Functions), in areas such as:

Also of potential interest is a custom-portlet framework for Teradata’s management tool Viewpoint.  A straightforward use would be to plunk some Viewpoint data into a more general system management dashboard.  A yet cooler use — and I couldn’t get a clear sense of whether anybody’s ever done this yet — would be to offer end users some insight as to how long their queries are apt to run.

May 14, 2009

Facebook’s experiences with compression

One little topic didn’t make it into my long post on Facebook’s Hadoop/Hive-based data warehouse: Compression. The story seems to be:

May 14, 2009

The secret sauce to Clearpace’s compression

In an introduction to archiving vendor Clearpace last December, I noted that Clearpace claimed huge compression successes for its NParchive product (Clearpace likes to use a figure of 40X), but didn’t give much reason why NParchive could compress a lot more effectively than other columnar DBMS. Let me now follow up on that.

To the extent there’s a Clearpace secret sauce, it seems to lie in NParchive’s unusual data access method.  NParchive doesn’t just tokenize the values in individual columns; it tokenizes multi-column fragments of rows.  Which particular columns to group together in that way seems to be decided automagically; the obvious guess is that this is based on estimates of the cardinality of their Cartesian products.
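
For intuition only, here is a toy Python sketch of the difference between tokenizing columns individually and tokenizing a multi-column fragment; the column grouping is hard-coded rather than chosen automagically, and none of this is meant to reflect NParchive's actual internals.

# Toy illustration of per-column vs. multi-column ("row fragment") tokenization.
# Hypothetical sketch only -- not a description of NParchive's internals.
from collections import OrderedDict

rows = [
    ("Boston",  "MA", "USA", 29.99),
    ("Boston",  "MA", "USA", 14.50),
    ("Chicago", "IL", "USA", 29.99),
    ("Boston",  "MA", "USA",  7.25),
]

def tokenize(values):
    """Dictionary-encode a sequence: return (dictionary, list of integer tokens)."""
    dictionary = OrderedDict()
    tokens = []
    for v in values:
        if v not in dictionary:
            dictionary[v] = len(dictionary)
        tokens.append(dictionary[v])
    return dictionary, tokens

# Per-column tokenization: one dictionary and one token stream per column.
per_column = [tokenize(col) for col in zip(*rows)]

# Multi-column tokenization: city/state/country tend to move together, so treat
# the (city, state, country) fragment as a single value with one shared dictionary.
fragment_dict, fragment_tokens = tokenize([row[:3] for row in rows])
price_dict, price_tokens = tokenize([row[3] for row in rows])

print("distinct per-column entries:", sum(len(d) for d, _ in per_column))
print("distinct fragment entries:  ", len(fragment_dict) + len(price_dict))

When the grouped columns are strongly correlated, the Cartesian product of their values is barely bigger than any single column's value set, so one token stream replaces several at little extra dictionary cost.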

Off the top of my head, examples for which this strategy might be particularly successful include:

April 24, 2009

Some DB2 highlights

I chatted with IBM Thursday, about recent and imminent releases of DB2 (9.5 through 9.7). Highlights included:

February 23, 2009

The questionable benefits of terabyte-scale data warehouse virtualization

Vertica is virtualizing via VMware, and has suggested a few operational benefits to doing so that might or might not offset VMware’s computational overhead. But on the whole, it seems virtualization’s major benefits don’t apply to large-database MPP data warehousing.

January 28, 2009

More Oracle notes

When I went to Oracle in October, the main purpose of the visit was to discuss Exadata. And so my initial post based on the visit was focused accordingly. But there were a number of other interesting points I’ve never gotten around to writing up. Let me now remedy that, at least in part.

December 20, 2008

More grist for the column vs. row mill

Daniel Abadi and Sam Madden are at it again, following up on their blog posts of six months ago arguing for the general superiority of column stores over row stores (for analytic query processing).  The gist is to recite a number of bases for superiority, beyond the two standard ones of less I/O and better compression, and seems to be based largely on Section 5 of a SIGMOD paper they wrote with Neil Hachem.

A big part of their argument is that if you carry the processing of columnar and/or compressed data all the way through in memory, you get lots of advantages, especially because everything’s smaller and hence fits better into Level 2 cache. There also is some kind of join algorithm enhancement, which seems to be based on noticing when the result winds up falling into a range along some dimension, and perhaps on using dictionary encoding in a way that helps induce such an outcome.
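
To illustrate the general flavor of the "keep it compressed in memory" point, here is a toy Python sketch of evaluating a predicate directly against dictionary-encoded column data; it is my own simplification, not the algorithm from the Abadi/Madden/Hachem paper.

# Toy illustration of operating directly on dictionary-encoded column data.
# A sketch of the general idea only, not the algorithm from the SIGMOD paper.
import array

# Dictionary-encoded "region" column: small dictionary plus compact integer codes.
dictionary = ["APAC", "EMEA", "LATAM", "NA"]
codes = array.array("B", [3, 1, 1, 0, 3, 2, 1, 3, 0, 3])   # one byte per row

# Predicate: region = 'EMEA'. Evaluate it once against the dictionary...
target_code = dictionary.index("EMEA")

# ...then scan only the tiny code array, rather than decompressed strings row by row.
matching_rows = [i for i, c in enumerate(codes) if c == target_code]
print("qualifying row ids:", matching_rows)                # -> [1, 2, 6]

The predicate gets checked once against a four-entry dictionary, and the scan then touches only one byte per row, which is exactly the sort of working set that stays resident in Level 2 cache.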

The main enemy here is row-store vendors who say, in effect, “Oh, it’s easy to shoehorn almost all the benefits of a column-store into a row-based system.”  They also take a swipe — for being insufficiently purely columnar — at unnamed columnar Vertica competitors, described in terms that seemingly apply directly to ParAccel.

December 16, 2008

Database archiving and information preservation

Two similar companies reached out to me recently – SAND Technology and Clearpace. Their current market focus is somewhat different: Clearpace talks mainly of archiving, and sells first and foremost into the compliance market, while SAND has the most traction providing “near-line” storage for SAP databases.* But both stories boil down to pretty much the same thing: Cheap, trustworthy data storage with good-enough query capabilities. E.g., I think both companies would agree the following is a not-too-misleading first-approximation characterization of their respective products:


October 22, 2008

Update on Aster Data Systems and nCluster

I spent a few hours at Aster Data on my West Coast swing last week. Aster has now officially put out Version 3 of nCluster. Highlights included:
