Database compression
Analysis of technology that compresses data within a database management system.
Technical basics of Sybase IQ
The Sybase IQ folks had been rather slow about briefing me, at least with respect to the crunchy technical details. They finally fixed that in February. Since then, I’ve been slow about posting based on those briefings. But what with Sybase being acquired by SAP, Sybase having an analyst meeting this week, and other reasons, this seems like a good time to post about Sybase IQ. 🙂
For starters, Sybase IQ is not just a bitmapped system, but it’s also not all that closely akin to C-Store or Vertica. In particular,
- Sybase IQ stores data in columns – like, for example, Vertica.
- Sybase IQ relies on indexes to retrieve data – unlike, for example, Vertica, in which the column pretty much is the index.
- However, columns themselves can be used as indexes in the usual Vertica-like way.
- Most of Sybase IQ’s indexes are bitmaps, or a lot like bitmaps, à la the original IQ product.
- Some of Sybase IQ’s indexes are not at all like bitmaps, but more like B-trees.
- In general, Sybase recommends that you put multiple indexes on each column because, what the heck, each one of them is pretty small. (In particular, the bitmap-like indexes are highly compressible; the sketch below this list illustrates why.) Together, indexes tend to take up <10% of Sybase IQ storage space.
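To make the compressibility point concrete, here's a toy run-length-encoding sketch in Python. It's purely illustrative; Sybase IQ's actual index formats are proprietary and surely more sophisticated than this.

```python
# Toy sketch: why bitmap indexes on low-cardinality columns compress so well.
# Illustrative only -- this is not Sybase IQ's actual on-disk format.

def build_bitmap(column, value):
    """One bit per row: 1 where the row equals the given value."""
    return [1 if v == value else 0 for v in column]

def rle_compress(bits):
    """Run-length encode the bitmap as [bit, run_length] pairs."""
    runs = []
    for b in bits:
        if runs and runs[-1][0] == b:
            runs[-1][1] += 1
        else:
            runs.append([b, 1])
    return runs

# A low-cardinality column produces long runs of identical bits.
gender = ["F"] * 500 + ["M"] * 500
bitmap = build_bitmap(gender, "F")
print(rle_compress(bitmap))   # [[1, 500], [0, 500]] -- 1000 bits become 2 runs
```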
Categories: Columnar database management, Data warehousing, Database compression, Sybase, Theory and architecture | 3 Comments |
Thoughts on IBM’s anti-Oracle announcements
IBM is putting out a couple of press releases today that are obviously directed competitively at Oracle/Sun, and more specifically at Oracle’s Exadata-centric strategy. I haven’t been briefed, so I just have those to go on.
On the whole, the releases look pretty lame. Highlights seem to include:
- Maybe a claim of enhanced data compression.
- Otherwise, no obvious new technology except product packaging and bundling.
- Aggressive plans to throw capital at the Sun channel to convert it to selling IBM gear. (A figure of $1/2 billion is mentioned for financing.)
Disappointingly, IBM shows a lot of confusion between:
- Text data
- Machine-generated data such as that from sensors
While both are highly important, those are very different things. IBM has not in the past shown much impressive technology in either area, and based on these releases, I presume that trend is continuing.
Edits:
I see from press coverage that at least one new IBM model has some Fusion I/O solid-state memory boards in it. Makes sense.
A Twitter hashtag has a number of observations from the event. Not much substance that I could detect, except various kinds of Oracle-bashing.
Categories: Database compression, Exadata, IBM and DB2, Oracle, Solid-state memory | 14 Comments |
XtremeData update
I talked with Geno Valente of XtremeData tonight. Highlights included:
- XtremeData still hasn’t sold any dbX stuff (they’ve had a side business in generic FPGA-based boards paying the bills for years). Well, there may have been some paid POCs (proofs of concept) or something, but real sales haven’t come through yet.
- XtremeData does have three prospects who have said “Yes”, and expects one order to come through this month.
- XtremeData continues to believe it shines when:
- Data models are complex
- In particular, there are complex joins
- In particular, two large tables have to be joined with each other, under circumstances where no product can avoid doing vast data redistribution (see the sketch after this list)
- XtremeData insists that all the nice things Bill Inmon has said about it, including in webinars, have not been for pay or similar business compensation. That’s quite unusual.
- XtremeData is coming out with a new product, codenamed the Personal Data Warehouse (PDW), which:
- Is ready to go into beta test
- Should be launched in a month and a half or so
- Will have a different name when it is launched
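For readers unfamiliar with the redistribution issue, here's a minimal Python sketch of the generic hash-redistribution ("shuffle") step that a large-table/large-table join can force in an MPP system. It illustrates the general technique, not XtremeData's specific implementation.

```python
# Toy sketch: hash redistribution before a distributed join. When neither
# table is already partitioned on the join key, BOTH must move over the
# interconnect so matching rows land on the same node.

NODES = 4

def redistribute(rows, key_index):
    """Send each row to the node that owns its join-key hash bucket."""
    buckets = {n: [] for n in range(NODES)}
    for row in rows:
        buckets[hash(row[key_index]) % NODES].append(row)
    return buckets

orders    = [(1, 7), (2, 3), (3, 7)]        # (order_id, cust_id)
customers = [(7, "Acme"), (3, "Initech")]   # (cust_id, name)

orders_by_node    = redistribute(orders, key_index=1)
customers_by_node = redistribute(customers, key_index=0)

# After the shuffle, each node can join its slice locally.
for n in range(NODES):
    local = [(o, c) for o in orders_by_node[n] for c in customers_by_node[n]
             if o[1] == c[0]]
    if local:
        print(n, local)
```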
Naming aside, Read more
There sure seem to be a lot of inaccuracies on ParAccel’s website
In what is actually an interesting post on database compression, ParAccel CTO Barry Zane threw in
Anyone who has met with us knows ParAccel shies away from hype.
But like many things ParAccel says, that is not true.
Edit (October, 2010): Like other posts I’ve linked to from Barry Zane’s blog, that one seems to be gone, with the URL redirecting elsewhere on ParAccel’s website.
The latest whoppers came in the form of several customers ParAccel listed on its website who hadn’t actually bought ParAccel’s DBMS, nor even decided to do so. It is fairly common to claim a customer win, then retract the claim due to lack of permission to disclose. But that’s not what happened in these cases. Based on emails helpfully shared by a ParAccel competitor active in some of those accounts, it seems clear that ParAccel actually posted fabricated claims of customer wins. Read more
Categories: Columnar database management, Data warehousing, Database compression, Market share and customer counts, ParAccel, Telecommunications | 24 Comments |
Calpont’s InfiniDB
Since its inception, Calpont has gone through multiple management teams, strategies, and investor groups. What it hadn’t done, ever, is actually shipped a product. Last week, however, Calpont introduced a free/open source DBMS, InfiniDB, with technical details somewhat reminiscent of what Calpont was promising last April. Highlights include:
- Like Infobright, Calpont’s InfiniDB is a columnar DBMS consisting of a MySQL front end and a columnar storage engine.
- Community edition InfiniDB runs on a single server.
- One of commercial/enterprise edition InfiniDB’s main claims to fame will be MPP support.
- There’s no announced time frame for commercial edition InfiniDB.
- InfiniDB’s current compression story is dictionary/token only, with decompression occurring before joins are executed. Improvement is a roadmap item. (See the sketch after this list.)
- Indeed, InfiniDB has many roadmap items, a few of which can be found here. Also, a great overview of InfiniDB’s current state and roadmap can be found in this MySQL Performance Blog thread. (And follow the links there to find performance discussions of other free analytic DBMS.)
- One thing InfiniDB already has that is still a roadmap item for Infobright is the ability to run a query across multiple cores at once.
- One thing free InfiniDB has that Infobright only offers in its Enterprise Edition is ACID-compliant Insert/Update/Delete. (Note: I wish people would stop saying that Infobright Enterprise Edition isn’t ACID-compliant, since that point was cleared up a while ago.)
- InfiniDB has no indexes or materialized views.
- However, InfiniDB’s retrieval is expedited by something called “Extents,” which sounds a lot like Netezza’s zone maps.
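To make those last two points concrete, here's a toy Python sketch of dictionary/token encoding and of extent-level min/max skipping. It's a generic illustration of the two techniques, not InfiniDB's actual code or on-disk format.

```python
# Toy sketch of two ideas from the list above: dictionary/token compression,
# and extent-level min/max metadata for skipping blocks (zone-map style).

def dictionary_encode(column):
    """Replace each value with a small integer token."""
    dictionary, tokens = {}, []
    for v in column:
        tokens.append(dictionary.setdefault(v, len(dictionary)))
    return dictionary, tokens

def extent_stats(values, extent_size):
    """Min/max per fixed-size extent, used to skip extents during scans."""
    return [(min(values[i:i + extent_size]), max(values[i:i + extent_size]))
            for i in range(0, len(values), extent_size)]

states, tokens = dictionary_encode(["TX", "TX", "CA", "TX", "CA"])
print(states)   # {'TX': 0, 'CA': 1}
print(tokens)   # [0, 0, 1, 0, 1]

amounts = [12, 15, 9, 480, 510, 455, 30, 22, 41]
stats = extent_stats(amounts, extent_size=3)
# A predicate like "amount > 400" only needs to scan the middle extent:
print([i for i, (lo, hi) in enumerate(stats) if hi > 400])   # [1]
```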
Being on vacation, I’ll stop there for now. (If it weren’t for Tropical Storm/Depression Ida, I might not even be posting this much until I get back.)
Categories: Analytic technologies, Calpont, Columnar database management, Data warehousing, Database compression, Infobright, MySQL, Open source | 3 Comments |
Introduction to SenSage
I visited with SenSage on my two most recent trips to San Francisco. Both visits were, through no fault of SenSage’s, hasty. Still, I think I have enough of a handle on SenSage basics to be worth writing up.
General SenSage highlights include:
Kickfire capacity and pricing
Kickfire’s marketing communication efforts are still a work in progress. Kickfire did finally relax its secrecy about FPGA-vs.-custom-silicon – not coincidentally during Netezza’s recent publicity cycle. That wise choice helped Kickfire get some favorable attention recently for its technical and market strategy, e.g. from Daniel Abadi, Merv Adrian and, kicking things off — as it were — me. Weeks after a recent Kickfire product release, there’s finally a fairly accurate data sheet up, although there’s still one self-defeatingly misleading line I’ll comment on below. Pricing is a whole other area of confusion, although it seems that current list prices have been inadvertently* leaked in Merv’s post linked above, with only one inaccuracy that I can detect.**
*I gather from the company that they forgot to tell Merv pricing was NDA.
** Merv cited a price as “starting” that I believe to be top-of-the-line. No criticism of Merv is implied in that; Kickfire has not been very clear in communicating hard numbers.
All that said, if one takes Kickfire’s marketing statements literally, Kickfire list pricing is around $20-50K per terabyte for a few small, fixed, high-performance configurations. That’s all-in, for plug-and-play appliances. What’s more, that range is based on the actual published user data capacity numbers for various Kickfire models, which I think are low for several reasons:
- Kickfire doesn’t officially admit that its model with 14.4 terabytes of disk can manage more than 6 terabytes of data, even though it clearly can.
- Actually, those 14.4 terabytes of disk can be increased or decreased as you choose.
- The basic compression figures implied in those calculations seem conservative.
- The capacity figures are more conservative yet, in that Kickfire assumes you’ll devote a lot of space to actual indexes on your data. I’m not sure that’s necessary for most workloads. (A back-of-the-envelope illustration of how capacity assumptions move the $/TB math follows this list.)
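To show how much quoted $/TB figures swing with capacity assumptions, here's a back-of-the-envelope calculation. The list price below is a made-up placeholder; only the 6-terabyte official capacity figure comes from the discussion above.

```python
# Toy arithmetic: quoted $/TB depends entirely on the user-data capacity
# you assume. The price is a hypothetical placeholder, not Kickfire's.

list_price_usd   = 300_000   # hypothetical all-in appliance list price
official_user_tb = 6.0       # Kickfire's published user-data capacity

for capacity_multiple in (1.0, 1.5, 2.0):   # if the official figure is low...
    user_tb = official_user_tb * capacity_multiple
    print(f"{capacity_multiple}x capacity -> ${list_price_usd / user_tb:,.0f}/TB")
# 1.0x -> $50,000/TB   1.5x -> $33,333/TB   2.0x -> $25,000/TB
```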
Categories: Columnar database management, Data warehouse appliances, Data warehousing, Database compression, Kickfire, Pricing | 3 Comments |
Greenplum is going hybrid columnar as well
Over the past summer, Vertica, VectorWise, and Oracle all announced flavors of hybrid row/columnar storage. Now it’s Greenplum’s turn. Greenplum is actually offering true columnar storage, as opposed to Oracle’s PAX-like scheme — and also as opposed to the kind of Frankencolumn storage Daniel Abadi decries. For example, you don’t have to do a join to retrieve multiple columns; you just ask for them and there they are. Similarly, Greenplum doesn’t maintain explicit row IDs – whether in row-oriented or column-oriented append-only storage – relying instead on block-level header information. Read more
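To illustrate what "no explicit row IDs" buys you: in a true column store, row identity can be implicit in position. Here's a toy Python sketch of that general idea; it is not Greenplum's actual storage code.

```python
# Toy sketch: positional columnar storage. Each column is stored separately,
# in the same row order, so reassembling a row is array indexing, not a join.

name   = ["Alice", "Bob", "Carol"]
city   = ["Austin", "Boston", "Chicago"]
amount = [100, 250, 75]

def fetch_row(i, *columns):
    """Row i is just position i in every column -- no row-ID lookup needed."""
    return tuple(col[i] for col in columns)

print(fetch_row(1, name, city, amount))   # ('Bob', 'Boston', 250)

# A block header only needs the starting row number of the block; row IDs
# for the values inside it are then implied by offset.
block = {"first_row": 1000, "values": [9, 8, 7]}
row_ids = range(block["first_row"], block["first_row"] + len(block["values"]))
print(list(row_ids))   # [1000, 1001, 1002]
```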
Categories: Analytic technologies, Columnar database management, Data warehousing, Database compression, Greenplum, Theory and architecture | 12 Comments |
Oracle and Vertica on compression and other physical data layout features
In my recent post on Exadata pricing, I highlighted the importance of Oracle’s compression figures to the discussion, and the uncertainty about same. This led to a Twitter discussion featuring Greg Rahn* of Oracle and Dave Menninger and Omer Trajman of Vertica. I also followed up with Omer on the phone. Read more
Categories: Columnar database management, Data models and architecture, Data warehousing, Database compression, Oracle, Theory and architecture, Vertica Systems | 14 Comments |
Oracle Exadata 2 capacity pricing
Summary of Oracle Exadata 2 capacity pricing
Analyzing Oracle Exadata pricing is always harder than one would first think. But I’ve finally gotten around to doing an Oracle Exadata 2 pricing spreadsheet. The main takeaways are:
- If we believe Oracle’s claims of 10X compression, Exadata 2 costs more per terabyte of user data than Netezza TwinFin — $22-26K/TB vs. TwinFin’s <$20K — but less than the Teradata 2550.
- These figures are highly sensitive to assumptions about Oracle’s hybrid columnar compression (illustrated in the sketch after this list).
- Similarly, if Netezza or Teradata were to significantly upgrade their own compression, the price comparison would look quite different.
- Options such as Data Mining or Oracle Spatial add 12% or so each to Exadata’s total system price.
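To illustrate that sensitivity, here's a toy calculation of how $/TB of user data scales inversely with the assumed compression ratio. The system price and raw capacity below are placeholder round numbers, chosen only so the 10X case lands near the figures above; the 10X claim and the $22-26K/TB range are the post's.

```python
# Toy arithmetic: $/TB of user data is inversely proportional to the
# compression ratio you believe. Inputs are illustrative placeholders.

system_price_usd = 2_400_000   # hypothetical total system price
raw_capacity_tb  = 10          # hypothetical usable disk for data

for compression in (5, 10, 20):
    user_tb = raw_capacity_tb * compression
    print(f"{compression}X compression -> ${system_price_usd / user_tb:,.0f}/TB")
# 5X -> $48,000/TB   10X -> $24,000/TB   20X -> $12,000/TB
```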
Longer version
When Oracle introduced Exadata last year it was, well, expensive. Exadata 2 has now been announced, and it is significantly cheaper than Exadata 1 per terabyte of user data, based on:
- Similar overall pricing
- Twice the disk capacity
- Better compression