Theory and architecture

Analysis of design choices in databases and database management systems. Related subjects include:

Any subcategory
Database diversity
Explicit support for specific data types
(in Text Technologies) Text search

July 7, 2010

Why analytic DBMS increasingly need to be storage-aware

In my quick reactions to the EMC/Greenplum announcement, I opined

“I think that even software-only analytic DBMS vendors should design their systems in an increasingly storage-aware manner”

promising to explain what I meant later on. So here goes. Read more

Categories: Data warehouse appliances, Data warehousing, Solid-state memory, Storage, Theory and architecture

6 Comments

July 6, 2010

The Wonderful One-Hoss Shay

I often write of Bottleneck Whack-A-Mole, an engineering approach that ensues when parts of a system are out of balance. Well, the flip side of that is the One-Hoss Shay, as in Oliver Wendell Holmes’ marvelous poem. (Here’s a version with Howard Pyle illustrations.) Read more

Categories: Humor, Theory and architecture

1 Comment

July 6, 2010

Riptano, and Cassandra adoption

Tonight’s Cassandra technology post got plenty long enough on its own, so I’m separating out business and adoption issues here. For starters, known Cassandra users include:

Facebook, which has said it has 150 or so Cassandra nodes (but see below)
Twitter, which has said it has 45 or so Cassandra nodes
Rackspace, which used to be Jonathan Ellis’ employer, and now is backing Cassandra company Riptano
Digg, which along with Twitter and Rackspace was one of the three major users helping advance the Cassandra project
OpenX, Simple Geo, Digital Reasoning, who Jonathan cited as production users in March
Cloudkick, as noted and linked in my other post
Two customers Riptano named at launch (but I’ve forgotten who they were*)

Fetlife, Meebo, and others seem to at least have a healthy interest in Cassandra, based on their level of involvement in a forthcoming Cassandra Summit. That said, the @Fetlife tweetstream features numerous yelps of pain, and I don’t mean the recreational kind. Read more

Categories: Cassandra, DataStax, Facebook, Market share and customer counts, NoSQL, Open source, Parallelization, Pricing, Specific users

5 Comments

July 6, 2010

Cassandra technical overview

Back in March, I talked with Jonathan Ellis of Rackspace, who runs the Apache Cassandra project. I started drafting a blog post then, but never put it up. Then Jonathan cofounded Riptano, a company to commercialize Cassandra, and so I talked with him again in May. Well, I’m finally finding time to clear my Cassandra/Riptano backlog. I’ll cover the more technical parts below, and the more business- or usage-oriented ones in a companion Cassandra/Riptano post.

Jonathan’s core claims for Cassandra include:

Cassandra is shared-nothing.
Cassandra has good approaches to replication and partitioning, right out of the box.
In particular, Cassandra is good for use cases that distribute a database around the world and want to access it at “local” latencies. (Indeed, Jonathan asserts that non-local replication is a significant non-big-data Cassandra use case.)
Cassandra’s scale-out is application-transparent, unlike sharded MySQL’s.
Cassandra is fast at both appends and range queries, which would be hard to accomplish in a pure key-value store.
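
To make the last claim concrete: a pure key-value store keyed by hash answers point lookups well, but a range query has to examine every key, whereas a store that keeps keys in sorted order can serve a range as one contiguous slice. What follows is a minimal Python sketch of that contrast only. It is not Cassandra's storage engine, and the class names and sample keys are hypothetical.

```python
import bisect


class HashKV:
    """Pure key-value store: point lookups only; ranges need a full scan."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def range(self, lo, hi):
        # No key ordering is available, so every key must be examined.
        return sorted((k, v) for k, v in self._data.items() if lo <= k < hi)


class SortedKV:
    """Keys kept in sorted order, so a range is a contiguous slice."""

    def __init__(self):
        self._keys = []
        self._values = []

    def put(self, key, value):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value  # overwrite an existing key
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def range(self, lo, hi):
        # Binary search finds both ends; only matching entries are touched.
        start = bisect.bisect_left(self._keys, lo)
        end = bisect.bisect_left(self._keys, hi)
        return list(zip(self._keys[start:end], self._values[start:end]))


if __name__ == "__main__":
    hashed, ordered = HashKV(), SortedKV()
    for ts in (105, 101, 103, 102, 104):  # hypothetical timestamp keys
        hashed.put(ts, f"event-{ts}")
        ordered.put(ts, f"event-{ts}")
    print(ordered.range(102, 104))
```

In this sketch, keys that arrive in increasing order (timestamped appends, say) always land at the end of the sorted list, which is why appends and range queries can both be cheap in an ordered layout.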

In general, Jonathan positions Cassandra as being best-suited to handle a small number of operations at high volume, throughput, and speed. The rest of what you do, as far as he’s concerned, may well belong in a more traditional SQL DBMS. Read more

Categories: Amazon and its cloud, Cassandra, DataStax, Facebook, Google, Log analysis, NoSQL, Open source, Parallelization

4 Comments

June 30, 2010

Cloudera Enterprise and Hadoop evolution

I talked with Cloudera a couple of weeks ago in connection with the impending release of Cloudera Enterprise. I’d say: Read more

Categories: Cloudera, Data integration and middleware, EAI, EII, ETL, ELT, ETLT, eBay, Hadoop, Investment research and trading, MapReduce, Market share and customer counts, Petabyte-scale data management, Pricing, Specific users, Web analytics

7 Comments

June 30, 2010

Details and analysis of the VoltDB argument

Todd Hoff (High Scalability blog) posted a lengthy examination of the case and use cases for VoltDB. That excellent post, in turn, is based on a Mike Stonebraker* webinar for VoltDB, for which the slide deck is happily available. It’s all nicely consistent with what I wrote about VoltDB last month, in connection with its launch. Read more

Categories: In-memory DBMS, Michael Stonebraker, OLTP, Parallelization, Theory and architecture, VoltDB and H-Store

3 Comments

June 27, 2010

Infobright’s Release 3.4

Infobright called a couple weeks ago to discuss, among other subjects, its subsequently-released Infobright Release 3.4. I made no effort to distinguish between community/open source and professional/chargeable editions, but leaving that aside, it seems fair to characterize Infobright 3.4 as having two overlapping primary themes:

Performance and bottleneck cleanup.
“Omigod, you mean you didn’t have that feature before?” cleanup.

That said, the traditional release for cleaning up the last huge gaps in an analytic DBMS product seems to have become 4.0; recent examples include Aster Data, Vertica and Greenplum. Infobright seems on track to be another example of that rule.

Ack. Now that I’ve said that, other vendors are going to be tempted to accelerate their numbering so as to reach the 4.0 mark sooner …

A lot of Infobright performance enhancements are in the vein “We used to rely on generic MySQL for that, but now we do it ourselves, and it works a lot better.” Examples include: Read more

Categories: Data warehousing, Infobright, MySQL, Workload management

6 Comments

June 25, 2010

Flash is coming, well …

I really, really wanted to title this post “Flash is coming in a flash.” That seems a little exaggerated — but only a little.

Netezza now intends to come out with a flash-based appliance earlier than it originally expected.
Indeed, Netezza has suspended — by which I mean “scrapped” — prior plans for a RAM-heavy disk-based appliance. It will use a RAM/flash combo instead.*
Tim Vincent of IBM told me that customers seem ready to adopt solid-state memory. One interesting comment he made is that flash isn’t really all that much more expensive than high-end storage area networks.

Uptake of solid-state memory (i.e. flash) for analytic database processing will probably stay pretty low in 2010, but in 2011 it should be a notable (b)leading-edge technology, and it should get mainstreamed pretty quickly after that. Read more

Categories: Data integration and middleware, Data warehousing, IBM and DB2, Memory-centric data management, Netezza, Solid-state memory, Theory and architecture

4 Comments

June 21, 2010

What kinds of data warehouse load latency are practical?

I took advantage of my recent conversations with Netezza and IBM to discuss what kinds of data warehouse load latency were practical. In both cases I got the impression:

Subsecond load latency is substantially impossible. Doing that amounts to OLTP.
5 seconds or so is doable with aggressive investment and tuning.
Several-minute load latency is pretty easy.
10-15 minute latency or longer is now very routine.

There’s generally a throughput/latency tradeoff, so if you want very low latency with good throughput, you may have to throw a lot of hardware at the problem.
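
One way to see the tradeoff is a toy micro-batch model, sketched below in Python. It assumes every load pays a fixed overhead plus a per-row cost, that rows arrive at a steady rate, and that adding nodes splits the rows evenly with perfect scaling. All function names, parameters, and numbers are made up for illustration; none of them come from Netezza or IBM.

```python
import math


def min_batch_interval(arrival_rate, fixed_overhead, per_row_cost):
    """Smallest batch interval (seconds) at which loads keep up with arrivals.

    A batch collected over T seconds holds arrival_rate * T rows and takes
    fixed_overhead + arrival_rate * T * per_row_cost seconds to load.
    Keeping up requires that load time <= T.
    """
    utilization = arrival_rate * per_row_cost
    if utilization >= 1:
        return float("inf")  # one node can never keep up at this rate
    return fixed_overhead / (1 - utilization)


def nodes_needed(arrival_rate, target_interval, fixed_overhead, per_row_cost):
    """Nodes required to sustain a target interval, assuming an even split."""
    if target_interval <= fixed_overhead:
        raise ValueError("target interval must exceed the fixed overhead")
    # Per-node rate r must satisfy fixed_overhead + r * T * per_row_cost <= T.
    max_rate_per_node = (target_interval - fixed_overhead) / (
        target_interval * per_row_cost
    )
    return max(1, math.ceil(arrival_rate / max_rate_per_node))


if __name__ == "__main__":
    rate, overhead, row_cost = 50_000, 2.0, 1e-5  # hypothetical workload
    print(min_batch_interval(rate, overhead, row_cost))
    for target in (600, 10, 2.5):
        print(target, nodes_needed(rate, target, overhead, row_cost))
```

With those invented numbers, a single node cannot batch more often than about every 4 seconds, and pushing the interval down to 2.5 seconds takes 3 nodes. In the model, as in practice, the fixed per-load overhead is what makes very low latency expensive.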

I’d expect to hear similar things from any other vendor with reasonably mature analytic DBMS technology. Low-latency load is a problem for columnar systems, but both Vertica and ParAccel designed in workarounds from the get-go. Aster Data probably didn’t meet these criteria until Version 4.0, its old “frontline” positioning notwithstanding, but I think it does now.

Related link

Just what is your need for speed anyway?

Categories: Analytic technologies, Aster Data, Columnar database management, Data warehousing, IBM and DB2, Netezza, ParAccel, Vertica Systems

4 Comments

June 21, 2010

The Netezza and IBM DB2 approaches to compression

Thursday, I spent 3 ½ hours talking with 10 of Netezza’s more senior engineers. Friday, I talked for 1 ½ hours with IBM Fellow and DB2 Chief Architect Tim Vincent, and we agreed we needed at least 2 hours more. In both cases, the compression part of the discussion seems like a good candidate to split out into a separate post. So here goes.

When you sell a row-based DBMS, as Netezza and IBM do, there are a couple of approaches you can take to compression. First, you can compress the blocks of rows that your DBMS naturally stores. Second, you can compress the data in a column-aware way. Both Netezza and IBM have chosen completely column-oriented compression, with no block-based techniques entering the picture to my knowledge. But that’s about as far as the similarity between Netezza and IBM compression goes. Read more
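
The two approaches can be contrasted in a small Python sketch. The block version compresses the serialized rows as one opaque byte string; the column-aware version pivots the rows into columns and dictionary-encodes each column on its own. This is only an illustration under assumptions: dictionary encoding is one common column-aware technique chosen here for simplicity, and the function names and sample rows are hypothetical. It is not a description of what Netezza or IBM actually implement.

```python
import json
import zlib


def compress_block(rows):
    """Block approach: serialize the rows as stored, then compress the bytes."""
    return zlib.compress(json.dumps(rows).encode("utf-8"))


def decompress_block(blob):
    return [tuple(r) for r in json.loads(zlib.decompress(blob).decode("utf-8"))]


def dictionary_encode(values):
    """Column-aware approach: replace each value with a small integer code."""
    dictionary, codes, index = [], [], {}
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return dictionary, codes


def dictionary_decode(dictionary, codes):
    return [dictionary[c] for c in codes]


def compress_columns(rows):
    """Pivot rows into columns and dictionary-encode each column separately."""
    return [dictionary_encode(col) for col in zip(*rows)]


def decompress_columns(encoded):
    columns = [dictionary_decode(d, c) for d, c in encoded]
    return list(zip(*columns))


if __name__ == "__main__":
    rows = [("east", "open", 10), ("east", "closed", 10), ("west", "open", 20)]
    assert decompress_block(compress_block(rows)) == rows
    assert decompress_columns(compress_columns(rows)) == rows
    print(compress_columns(rows)[0])  # dictionary and codes for the first column
```

The point of the column-aware version is that each column's dictionary sees only values of one type and domain, so low-cardinality columns collapse to a short dictionary plus small integer codes, even though the rows themselves are still stored row-wise.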

Categories: Data warehousing, Database compression, IBM and DB2, Microsoft and SQL*Server, Netezza

17 Comments

