Theory and architecture

Analysis of design choices in databases and database management systems. Related subjects include:

March 2, 2011

How about “Short Request Processing”?

While my other terminology posts seem to have gone pretty well, the Internet Request Processing name is proving a bit problematic. People seem pretty cool with the “request processing” part, but there are issues with the modifier, including:

So how about just going with “short”? OLTP requests are inherently short. “GET” and “SET” are certainly short. 🙂 In general, queries that do not involve JOINs are probably short requests. Analytic queries, however, are generally not short. Even better, all that can apply to the syntax and the execution time alike. 🙂

Please note that I’m focused more here on describing use cases than products. Whether products generally used to do one kind of thing can also be stretched to do another — e.g., complex analytics hardwired into a Cassandra application — is not my primary concern.

February 24, 2011

Terminology: Internet Request Processing (IRP)

As I observed previously, we need a term that means “like OLTP but not necessarily transactional”, to help describe a category of use cases that can reasonably be addressed by NoSQL or scale-out SQL systems alike.* So here’s a candidate phrase: Internet Request Processing (IRP). If we use that, I’ll call Schooner, Cassandra, Couchbase , et al. IRP DBMS, while other people will probably call them IRP databases.

*Consider, for example, the overlapping use cases for Schooner, dbShards, ScaleBase, Couchbase, and DataStax/Cassandra.

In my proposed terminology, an internet request processing (IRP) use case is one in which:  Read more

February 11, 2011

Comments on the 2011 Forrester Wave for Enterprise Data Warehouse Platforms

The Forrester Wave: Enterprise Data Warehouse Platforms, Q1 2011 is now out,* hot on the heels of the Gartner Magic Quadrant. Unfortunately, this particular Forrester Wave is riddled with inaccuracy.  Read more

February 8, 2011

Membase and CouchOne merged to form Couchbase

Membase, the company whose product is Membase and whose former company name is Northscale, has merged with CouchOne, the company whose product is CouchDB and whose former name is Couch.io. The result (product and company) will be called Couchbase. CouchDB inventor Damien Katz will join the Membase (now Couchbase) management team as CTO. Couchbase can reasonably be regarded as a document-oriented NoSQL DBMS, a product category I not coincidentally posted about yesterday.

In essence, Couchbase will be CouchDB with scale-out. Alternatively, Couchbase will be Membase with a richer programming interface. The Couchbase sweet spot is likely to be:  Read more

February 7, 2011

Notes on document-oriented NoSQL

When people talk about document-oriented NoSQL or some similar term, they usually mean something like:

Database management that uses a JSON model and gives you reasonably robust access to individual field values inside a JSON (JavaScript Object Notation) object.

Or, if they really mean,

The essence of whatever it is that CouchDB and MongoDB have in common.

well, that’s pretty much the same thing as what I said in the first place. 🙂

Of the various questions that might arise, three of the more definitional ones are:

Let me take a crack at each.  Read more

February 6, 2011

Columnar compression vs. column storage

I’m getting the increasing impression that certain industry observers, such as Gartner, are really confused about columnar technology. (I further suspect that certain vendors are encouraging this confusion, as vendors commonly do.) So here are some basic points.

A simple way to think about the difference between columnar storage and columnar (or any other kind of) compression is this:

Specifically, if data in a relational table is grouped together according to what row it’s in, then the database manager is called “row-based” or a “row store.” If it’s grouped together according to what column it’s in, then the database management system is called “columnar” or a “column store.” Increasingly, row-based and columnar storage are being hybridized.

There are two main kinds of compression — compression of bit strings and more intelligent compression of actual data values. Compression of actual data values can reasonably be called “columnar,” in that different columns of data can be compressed in different ways, often depending only on the data in that column.*  Read more

February 5, 2011

Comments on the Gartner 2010/2011 Data Warehouse Database Management Systems Magic Quadrant

Edit: Comments on the February, 2012 Gartner Magic Quadrant for Data Warehouse Database Management Systems — and on the companies reviewed in it — are now up.

The Gartner 2010 Data Warehouse Database Management Systems Magic Quadrant is out. I shall now comment, just as I did to varying degrees on the 2009, 2008, 2007, and 2006 Gartner Data Warehouse Database Management System Magic Quadrants.

Note: Links to Gartner Magic Quadrants tend to be unstable. Please alert me if any problems arise; I’ll edit accordingly.

In my comments on the 2008 Gartner Data Warehouse Database Management Systems Magic Quadrant, I observed that Gartner’s “completeness of vision” scores were generally pretty reasonable, but their “ability to execute” rankings were somewhat bizarre; the same remains true this year. For example, Gartner ranks Ingres higher by that metric than Vertica, Aster Data, ParAccel, or Infobright. Yet each of those companies is growing nicely and delivering products that meet serious cutting-edge analytic DBMS needs, neither of which has been true of Ingres since about 1987.  Read more

February 2, 2011

Exadata notes

It’s been a while since I penetrated Oracle’s tight message control and actually talked with them about Exadata. But Doug Henschen wrote a good article about Exadata based on an Andy Mendelsohn webcast. I agree with almost all of it. At first I was a little surprised that Exadata’s emphasis shift from data warehousing to OLTP/generic consolidation hasn’t gone more quickly, but on the other hand:

Doug did overstate when he said that columnar architectures give 100X or more compression. That doesn’t happen. Yes, columnar compression can be >10X in a variety of use cases, while pre-Exadata Oracle index bloat can approach 10X at times; but even if you’re counting that way I doubt there are many instances in which it actually multiplies out to >100.

In other Exadata news, the long-standing observation that Oracle doesn’t like to do on-site Exadata POCs still holds true. A couple of existing Oracle users — one rather well-known — recently told me that Oracle won’t let them text Exadata except on Oracle premises. In one case, this is a deal-breaker keeping Exadata from being considered for a purchase, and Oracle still won’t budge.

Finally, I’m pretty sure that this “new” Softbank Teradata replacement Oracle has been touting since September as competitive evidence — which Doug’s article also references — isn’t quite what it sounds like. I believe Teradata’s version of the story, which somewhat edited goes like this:  Read more

February 1, 2011

Cassandra company DataStax (formerly Riptano) is on track

Riptano, the Cassandra company, has changed its name to DataStax. DataStax has opened headquarters in Burlingame and hired some database-experienced folks – notably Ben Werther from Greenplum and Michael Weir from ParAccel, with Zenobia Godschalk (who worked with Aster Data) somewhere in the outside PR mix. Other than that, what’s new at DataStax is pretty much what could have been expected based on what DataStax folks said last spring.

Most notably, DataStax is introducing a software offering, whose full name is DataStax OpsCenter for Apache Cassandra. DataStax OpsCenter for Apache Cassandra seems to be, in essence, a monitoring tool for Cassandra clusters, with a bit of capacity planning bundled in. (If there are any outright operations parts to DataStax OpsCenter, they got overlooked in our conversation.)* Read more

January 24, 2011

Choices in analytic computing system design

When I posted a long list of architectural options for analytic DBMS, I left a couple of IOUs in for missing parts. One was in the area of what is sometimes called advanced-analytics functionality, which roughly speaking means aspects of analytic database management systems that are not directly related to conventional* SQL queries.

*Main examples of “conventional” = filtering, simple aggregrations.

The point of such functionality is generally twofold. First, it helps you execute analytic algorithms with high performance, due to reducing data movement and/or executing the analytics in parallel. Second, it helps you create and execute sophisticated analytic processes with (relatively) little effort.

For now, I’m going to refer to an analytic RDBMS that has been extended by advanced-analytics functionality as an analytic computing system, rather than as some kind of “platform,” although I suspect the latter term is more likely to wind up winning.  So far, there have been five major categories of subsystem or add-on module that contribute to making an analytic DBMS a more fully-fledged analytic computing system:

Read more

← Previous PageNext Page →

Feed: DBMS (database management system), DW (data warehousing), BI (business intelligence), and analytics technology Subscribe to the Monash Research feed via RSS or email:

Login

Search our blogs and white papers

Monash Research blogs

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.