Theory and architecture

Analysis of design choices in databases and database management systems.

July 7, 2010

Why analytic DBMS increasingly need to be storage-aware

In my quick reactions to the EMC/Greenplum announcement, I opined

I think that even software-only analytic DBMS vendors should design their systems in an increasingly storage-aware manner

promising to explain what I meant later on. So here goes.  Read more

July 6, 2010

The Wonderful One-Hoss Shay

I often write of Bottleneck Whack-A-Mole, an engineering approach that ensues when parts of a system are out of balance. Well, the flip side of that is the One-Hoss Shay, as in Oliver Wendell Holmes’ marvelous poem. (Here’s a version with Howard Pyle illustrations.)  Read more

July 6, 2010

Riptano, and Cassandra adoption

Tonight’s Cassandra technology post got plenty long enough on its own, so I’m separating out business and adoption issues here. For starters, known Cassandra users include:

Fetlife, Meebo, and others seem to at least have a healthy interest in Cassandra, based on their level of involvement in a forthcoming Cassandra Summit. That said, the @Fetlife tweetstream features numerous yelps of pain, and I don’t mean the recreational kind.  Read more

July 6, 2010

Cassandra technical overview

Back in March, I talked with Jonathan Ellis of Rackspace, who runs the Apache Cassandra project. I started drafting a blog post then, but never put it up. Then Jonathan cofounded Riptano, a company to commercialize Cassandra, and so I talked with him again in May. Well, I’m finally finding time to clear my Cassandra/Riptano backlog. I’ll cover the more technical parts below, and the more business- or usage-oriented ones in a companion Cassandra/Riptano post.

Jonathan’s core claims for Cassandra include:

In general, Jonathan positions Cassandra as being best-suited to handle a small number of operations at high volume, throughput, and speed. The rest of what you do, as far as he’s concerned, may well belong in a more traditional SQL DBMS.  Read more

June 30, 2010

Cloudera Enterprise and Hadoop evolution

I talked with Cloudera a couple of weeks ago in connection with the impending release of Cloudera Enterprise. I’d say:  Read more

June 30, 2010

Details and analysis of the VoltDB argument

Todd Hoff (High Scalability blog) posted a lengthy examination of the case and use cases for VoltDB. That excellent post, in turn, is based on a Mike Stonebraker webinar for VoltDB, for which the slide deck is happily available. It’s all nicely consistent with what I wrote about VoltDB last month, in connection with its launch.  Read more

June 27, 2010

Infobright’s Release 3.4

Infobright called a couple weeks ago to discuss, among other subjects, its subsequently-released Infobright Release 3.4. I made no effort to distinguish between community/open source and professional/chargeable editions, but leaving that aside, it seems fair to characterize Infobright 3.4 as having two overlapping primary themes:

That said, the traditional release for cleaning up the last huge gaps in an analytic DBMS product seems to have become 4.0; recent examples include Aster Data, Vertica and Greenplum. Infobright seems on track to be another example of that rule.

Ack. Now that I’ve said that, other vendors are going to be tempted to accelerate their numbering so as to reach the 4.0 mark sooner …

A lot of Infobright performance enhancements are in the vein of “We used to rely on generic MySQL for that, but now we do it ourselves, and it works a lot better.” Examples include:  Read more

June 25, 2010

Flash is coming, well …

I really, really wanted to title this post “Flash is coming in a flash.” That seems a little exaggerated — but only a little.

Uptake of solid-state memory (i.e. flash) for analytic database processing will probably stay pretty low in 2010, but in 2011 it should be a notable (b)leading-edge technology, and it should get mainstreamed pretty quickly after that.  Read more

June 21, 2010

What kinds of data warehouse load latency are practical?

I took advantage of my recent conversations with Netezza and IBM to discuss what kinds of data warehouse load latency were practical. In both cases I got the impression:

There’s generally a throughput/latency tradeoff, so if you want very low latency with good throughput, you may have to throw a lot of hardware at the problem.

I’d expect to hear similar things from any other vendor with reasonably mature analytic DBMS technology. Low-latency load is a problem for columnar systems, but both Vertica and ParAccel designed in workarounds from the get-go. Aster Data probably didn’t meet these criteria until Version 4.0, its old “frontline” positioning notwithstanding, but I think it does now.

June 21, 2010

The Netezza and IBM DB2 approaches to compression

Thursday, I spent 3 ½ hours talking with 10 of Netezza’s more senior engineers. Friday, I talked for 1 ½ hours with IBM Fellow and DB2 Chief Architect Tim Vincent, and we agreed we needed at least 2 hours more. In both cases, the compression part of the discussion seems like a good candidate to split out into a separate post. So here goes.

When you sell a row-based DBMS, as Netezza and IBM do, there are a couple of approaches you can take to compression. First, you can compress the blocks of rows that your DBMS naturally stores. Second, you can compress the data in a column-aware way. Both Netezza and IBM have chosen completely column-oriented compression, with no block-based techniques entering the picture to my knowledge. But that’s about as far as the similarity between Netezza and IBM compression goes.  Read more
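To make the distinction between the two approaches concrete, here is a minimal Python sketch. It is not how Netezza or IBM actually compress data; it assumes a generic compressor (`zlib`) and hypothetical helper names (`row_block_bytes`, `column_bytes`, `compressed_sizes`), and only contrasts compressing a block of rows as stored with compressing each column's values separately.

```python
import zlib


def row_block_bytes(rows):
    """Serialize rows in row-major order, roughly as a row store's block holds them."""
    return "\n".join(",".join(str(v) for v in row) for row in rows).encode()


def column_bytes(rows):
    """Serialize the same values column by column (one byte string per column)."""
    return ["\n".join(str(v) for v in col).encode() for col in zip(*rows)]


def compressed_sizes(rows):
    """Return (block_size, column_aware_size) in bytes after zlib compression.

    Illustration only: zlib stands in for whatever compressor a real DBMS uses.
    """
    block_size = len(zlib.compress(row_block_bytes(rows)))
    column_aware_size = sum(len(zlib.compress(c)) for c in column_bytes(rows))
    return block_size, column_aware_size


if __name__ == "__main__":
    # Hypothetical sample data: an id, a low-cardinality code, and an amount.
    sample = [(i, "US" if i % 2 else "CA", i * 10) for i in range(1000)]
    block, column_aware = compressed_sizes(sample)
    print("block-of-rows bytes:", block)
    print("column-aware bytes:", column_aware)
```

Which figure comes out smaller depends on the data and the compressor. The point of the sketch is only that a column-aware scheme sees each column's values together and so can, in principle, use a technique suited to that column.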
