Theory and architecture

Analysis of design choices in databases and database management systems. Related subjects include:

Any subcategory
Database diversity
Explicit support for specific data types
(in Text Technologies) Text search

January 18, 2011

Architectural options for analytic database management systems

Mike Stonebraker recently kicked off some discussion about desirable architectural features of a columnar analytic DBMS. Let’s expand the conversation to cover desirable architectural characteristics of analytic DBMS in general.

Categories: Analytic technologies, Aster Data, Benchmarks and POCs, Columnar database management, Data pipelining, Data warehousing, Database compression, Exadata, Michael Stonebraker, Oracle, Solid-state memory, Theory and architecture

5 Comments

January 12, 2011

Mike Stonebraker on “real column stores”

Mike Stonebraker has a post up on Vertica’s blog trying to differentiate “real” from “pretend” column stores. (Edit: That post seems to have come back down, but as of 1/19 it can be found in Google Cache.) In essence, Mike argues that the One Right Way to design a column store is Vertica’s, a position that Daniel Abadi used to share but since has retreated from.

There are some good things about that post, and some not-so-good. The worst paragraph is probably

Several row-store vendors (including Oracle, Greenplum and Aster Data) now claim to be selling a column store. Obviously, this would require a complete rewrite of a DBMS to move from Figure 1 to Figure 2. Hence, none of the “pretenders” have actually done this. Instead all have implemented some aspects of column stores, and then claim to be the real thing. This blog defines what the “real enchilada” looks like, and how to tell it from the pretenders.

which I question on two levels.
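To make the row-versus-column distinction in the quoted paragraph concrete, here is a minimal Python sketch. It is not Vertica's design or any other vendor's; the table, its column names, and the helpers `to_columns` and `run_length_encode` are hypothetical and exist only for illustration. It shows the same data held row-wise and column-wise, and why a single-attribute scan and simple compression are easier in the columnar form.

```python
# Hypothetical three-row table; names and values are illustrative only.
rows = [
    {"id": 1, "region": "east", "sales": 100},
    {"id": 2, "region": "west", "sales": 250},
    {"id": 3, "region": "east", "sales": 175},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: summing one attribute still walks every full row.
row_total = sum(row["sales"] for row in rows)

# Column layout: the same sum reads only the "sales" column.
column_total = sum(columns["sales"])


def run_length_encode(values):
    """Collapse consecutive repeats into [value, count] pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return runs


# Values of one column stored together compress well once sorted.
region_runs = run_length_encode(sorted(columns["region"]))
```

Both totals agree; the difference is how much data each layout has to touch to produce them. A real engine adds much more (storage format, execution on compressed data, and so on), none of which this sketch attempts.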

Categories: Aster Data, Columnar database management, Database compression, Michael Stonebraker, Sybase, Theory and architecture, Vertica Systems

24 Comments

December 28, 2010

Evolving definitions and technology categories for 2011

It seems my prediction of a limited blogging schedule in December came emphatically true. I shall re-start with a collection of quick thoughts, clearing the decks for more detailed posts to follow.

Categories: Analytic technologies, Data types, Data warehousing, DBMS product categories, MOLAP, Theory and architecture

6 Comments

November 29, 2010

Data that is derived, augmented, enhanced, adjusted, or cooked

On this food-oriented weekend, I could easily go on long metaphorical flights about the distinction between “raw” and “cooked” data. I’ll spare you that part — reluctantly, given my fondness for fresh fruit, sushi, and steak tartare — but there’s no escaping the importance of derived/augmented/enhanced/cooked/adjusted data for analytic data processing. The five areas I have in mind are, loosely speaking:

Aggregates, when they are maintained, generally for reasons of performance or response time.
Calculated scores, commonly based on data mining/predictive analytics.
Text analytics.
The kinds of ETL (Extract/Transform/Load) Hadoop and other forms of MapReduce are commonly used for.
Adjusted data, especially in scientific contexts.
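The first item on that list, maintained aggregates, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any product's mechanism: the class `MaintainedAggregate`, the function `recompute_average`, and the sample events are all hypothetical. The point is that the derived value is updated as raw records arrive, so a query reads a stored answer instead of rescanning the raw data.

```python
from collections import defaultdict


class MaintainedAggregate:
    """Running count and sum per key, updated as each raw record arrives."""

    def __init__(self):
        self.count = defaultdict(int)
        self.total = defaultdict(float)

    def add(self, key, amount):
        # Incremental maintenance: constant work per incoming record.
        self.count[key] += 1
        self.total[key] += amount

    def average(self, key):
        # Answered from the stored aggregate, without touching raw data.
        return self.total[key] / self.count[key]


# Hypothetical raw data: (region, sale amount).
raw_events = [("east", 100.0), ("west", 250.0), ("east", 175.0)]

agg = MaintainedAggregate()
for region, amount in raw_events:
    agg.add(region, amount)


def recompute_average(events, key):
    """The alternative: scan all raw events at query time."""
    amounts = [a for k, a in events if k == key]
    return sum(amounts) / len(amounts)
```

Here `agg.average("east")` and `recompute_average(raw_events, "east")` return the same value; the maintained version trades extra storage and update work for faster reads, which is the performance motive named in the list.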

Categories: Analytic technologies, Data warehousing, Derived data

12 Comments

November 29, 2010

Document-oriented DBMS without joins

When I talked with MarkLogic’s Ken Chestnut about MarkLogic 4.2, I was surprised to learn that MarkLogic really, truly doesn’t do anything like a join. Unlike some other non-SQL DBMS, MarkLogic has no SQL interface, no ODBC or JDBC. Nothing, nada. (MarkLogic has a Java interface for XQuery, but not for anything like SQL.)

Categories: CouchDB, MarkLogic, NoSQL, Structured documents, Text, Theory and architecture

8 Comments

October 22, 2010

Notes and links October 22, 2010

A number of recent posts have had good comments. This time, I won’t call them out individually.

Evidently Mike Olson of Cloudera is still telling the machine-generated data story, exactly as he should be. The Information Arbitrage/IA Ventures folks said something similar, focusing specifically on “sensor data” …

… and, even better, went on to say: Read more

Categories: Analytic technologies, Aster Data, Cloudera, eBay, Greenplum, Hadoop, IBM and DB2, In-memory DBMS, Market share and customer counts, Netezza, Open source, Oracle, ParAccel, Petabyte-scale data management, SAS Institute, Surveillance and privacy, Teradata, VoltDB and H-Store

1 Comment

October 18, 2010

More notes on Membase and memcached

As a companion to my post about Membase last week, the company has graciously allowed me to post a rather detailed Membase slide deck. (It even has pricing.) Also, I left one point out.

Membase announced a Cloudera partnership. I couldn’t detect anything technically exciting about that, but it serves to highlight what I do find to be an interesting usage trend. A couple of big Web players (AOL and ShareThis) are using Hadoop to crunch data and derive customer profile data, then feed that back into Membase. Why Membase? Because it can serve up the profile in a millisecond, as part of a bigger 40-millisecond-latency request.

And why Hadoop, rather than Aster Data nCluster, which ShareThis also uses? Umm, I didn’t ask.

When I mentioned this to Colin Mahony, he said Vertica had similar stories. However, I don’t recall whether they were about Membase or just memcached, and he hasn’t had a chance to get back to me with clarification. (Edit: As per Colin’s comment below, it’s both.)
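The usage pattern described above (crunch raw data in batch, then push small derived profiles into a fast key-value store for serving) can be sketched roughly as follows. This is an illustration only: a plain Python dict stands in for the key-value store, no real Membase, memcached, or Hadoop API is used, and the record layout, the `profile:` key prefix, and all function names are assumptions made for the example.

```python
from collections import Counter, defaultdict

# Hypothetical clickstream records: (user_id, category_viewed).
clicks = [
    ("u1", "sports"), ("u1", "sports"), ("u1", "news"),
    ("u2", "music"), ("u2", "music"),
]


def derive_profiles(clicks):
    """Batch step: reduce raw clicks to one small profile per user."""
    per_user = defaultdict(Counter)
    for user_id, category in clicks:
        per_user[user_id][category] += 1
    return {
        user_id: {
            "top_category": counts.most_common(1)[0][0],
            "clicks": sum(counts.values()),
        }
        for user_id, counts in per_user.items()
    }


# Stand-in for the key-value store: a plain dict, not a real client.
profile_store = {}


def load_profiles(store, profiles):
    """Feed step: write each derived profile under a per-user key."""
    for user_id, profile in profiles.items():
        store["profile:" + user_id] = profile


def get_profile(store, user_id):
    """Serving step: a single key lookup, no computation at request time."""
    return store.get("profile:" + user_id)


load_profiles(profile_store, derive_profiles(clicks))
```

All of the expensive work happens in `derive_profiles`; the request path is one lookup in `get_profile`, which is why the profile fetch can be a small slice of a larger latency budget.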

Categories: Aster Data, Cache, Cloudera, Couchbase, Hadoop, memcached, Memory-centric data management, NoSQL, Pricing, Specific users, Vertica Systems, Web analytics

7 Comments

October 17, 2010

Where ParAccel is at

Until recently, I was extremely critical of ParAccel’s marketing. But there was an almost-clean sweep of the relevant ParAccel executives, and the specific worst practices I was calling out have for the most part been eliminated. So I was open to talking and working with ParAccel again, and that’s now happening. On my recent California trip, I chatted with three ParAccel folks for a few hours. Based on that and other conversation, here’s the current ParAccel story as I understand it.

Categories: Benchmarks and POCs, Columnar database management, Database compression, Investment research and trading, Memory-centric data management, ParAccel, Solid-state memory, Storage, Vertica Systems

10 Comments

October 15, 2010

Notes on data warehouse appliance prices

I’m not terribly motivated to do a detailed analysis of data warehouse appliance list prices, in part because:

Everybody knows that in practice data warehouse appliances tend to be deeply discounted from list price.
The only realistic metric to use for pricing data warehouse appliances is price-per-terabyte, and people have gotten pretty sick of that one.

That said, here are some notes on data warehouse appliance prices.

Categories: Data warehouse appliances, Data warehousing, Database compression, EMC, Exadata, Greenplum, Netezza, Oracle, Pricing

8 Comments

October 12, 2010

Vertica-Hadoop integration

DBMS/Hadoop integration is a confusing subject. My post on the Cloudera/Aster Data partnership awaits some clarification in the comment thread. A conversation with Vertica left me unsure about some Hadoop/Vertica Year 2 details as well, although I’m doing better after a follow-up call. On the plus side, we also covered some rather cool Hadoop/Vertica product futures, and those seemed easier to understand. 🙂

I say “Year 2” because Hadoop/Vertica integration has been going on since last year. Indeed, Vertica says that there are now over 25 users of the Hadoop/Vertica combination and hence Vertica’s Hadoop connector. Vertica is now introducing — for immediate GA — a new version of its Hadoop connector. So far as I understood: Read more

Categories: Analytic technologies, Cloudera, EAI, EII, ETL, ELT, ETLT, Hadoop, MapReduce, Market share and customer counts, SQL/Hadoop integration, Text, Vertica Systems

6 Comments
