Theory and architecture

Analysis of design choices in databases and database management systems. Related subjects include:

Database diversity
Explicit support for specific data types
(in Text Technologies) Text search

October 11, 2010

Membase simplifies name, goes GA

The company Northscale that makes the product Membase is now the company Membase that makes the product Membase. Good. Also, the product Membase has now gone GA.

I wrote back in August about Membase, and that covers most of what I think, with perhaps a couple of exceptions: Read more

Categories: Basho and Riak, Cache, Couchbase, memcached, Memory-centric data management, NoSQL

4 Comments

October 11, 2010

NoSQL overview

My NoSQL article is finally posted; I hope it lives up to all the foreshadowing. It is being run online at Intelligent Enterprise/Information Week, as per the link above, where Doug Henschen edited it with an admirably light touch.

Below please find three excerpts* that convey the essence of my thinking on NoSQL. For much more detail, please see the article itself.

*Notwithstanding my admiration for Doug’s editing, the excerpts are taken from my final pre-editing submission, not from the published article itself.

My quasi-definition of “NoSQL” wound up being: Read more

Categories: Database diversity, NoSQL, Parallelization

18 Comments

October 10, 2010

A few notes from XLDB 4

As much as I believe in the XLDB conferences, I only found time to go to (a big) part of one day of XLDB 4 myself. In general: Read more

Categories: Analytic technologies, Health care, Michael Stonebraker, MySQL, Open source, Parallelization, Petabyte-scale data management, Scientific research, Surveillance and privacy

2 Comments

October 10, 2010

Partnering with Cloudera

After I criticized the marketing of the Aster/Cloudera partnership, my clients at Aster Data and Cloudera ganged up on me and tried to persuade me I was wrong. Be that as it may, that conversation and others were helpful to me in understanding the core thesis: Read more

Categories: Analytic technologies, Aster Data, Cloudera, Data warehousing, Database diversity, Hadoop, MapReduce, Parallelization, Petabyte-scale data management

11 Comments

October 6, 2010

eBay followup — Greenplum out, Teradata > 10 petabytes, Hadoop has some value, and more

I chatted with Oliver Ratzesberger of eBay around a Stanford picnic table yesterday (the XLDB 4 conference is being held at Jacek Becla’s home base of SLAC, which used to stand for “Stanford Linear Accelerator Center”). Todd Walter of Teradata also sat in on the latter part of the conversation. Things I learned included: Read more

Categories: Data warehousing, Derived data, eBay, Greenplum, Hadoop, HBase, Log analysis, Petabyte-scale data management, Teradata

30 Comments

September 21, 2010

How to tell whether you need ACID-compliant transaction integrity

In a post about the recent JPMorgan Chase database outage, I suggested that JPMorgan Chase’s user profile database was over-engineered, in that various web surfing data was stored in a fully ACID-compliant manner when it didn’t really need to be. I’ve since gotten private communication expressing vehement agreement, and telling of the opposite choice being made in other major web-facing transactional systems.

What’s going on is this:

ACID-compliant transaction integrity commonly costs more in terms of DBMS licenses and many other components of TCO (Total Cost of Ownership) than less rigorous approaches.
Worse, it can actually hurt application uptime, by forcing your system to pull in its horns and stop functioning in the face of failures that a non-transactional system might smoothly work around.
Other flavors of “complexity can be a bad thing” apply as well.
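
To make the uptime point concrete, here is a toy sketch in Python. It is not a real transaction protocol and is not drawn from any product discussed here; the names `Replica`, `strict_write`, and `lenient_write` are hypothetical. It only contrasts an all-or-nothing write, which refuses to proceed when any replica is unreachable, with a best-effort write that carries on and queues the missed update for later repair.

```python
class ReplicaDown(Exception):
    """Raised when a replica cannot accept a write."""


class Replica:
    """A hypothetical storage node holding a simple key-value map."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.data = {}

    def write(self, key, value):
        if not self.up:
            raise ReplicaDown(self.name)
        self.data[key] = value


def strict_write(replicas, key, value):
    """All-or-nothing: refuse the write unless every replica is reachable."""
    if not all(r.up for r in replicas):
        raise ReplicaDown("write refused: a replica is unavailable")
    for r in replicas:
        r.write(key, value)


def lenient_write(replicas, key, value, pending):
    """Best effort: write where possible, queue the rest for later repair."""
    written = 0
    for r in replicas:
        try:
            r.write(key, value)
            written += 1
        except ReplicaDown:
            # Remember what was missed so it can be replayed once the node is back.
            pending.append((r.name, key, value))
    return written
```

With one replica down, `strict_write` raises and nothing is stored anywhere, while `lenient_write` stores the value on the healthy replica and records the missed write in `pending`. The price of the second approach is that replicas can disagree for a while, which is exactly the trade-off under discussion.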

Thus, transaction integrity can be more trouble than it’s worth.

In essence, of course, that’s half of the classic NoSQL claim; the other half is that the same may be said of joins.

So when should you go for ACID-compliant transaction integrity, and when shouldn’t you bother? Every situation is different, but here’s a set of considerations to start you off. Read more

Categories: NoSQL, Web analytics

12 Comments

September 15, 2010

Aster Data nCluster Version 4.6

The main thing in Aster Data nCluster Version 4.6 is Aster’s version of hybrid row-column store technology. Technical highlights include:

Aster Data is simply taking the number of storage options in nCluster up from 1 to 2 – you now can store a table either in the Aster Data nCluster row store or column store.
In fact, you can store parts of a table in the Aster Data nCluster row store and other parts in the Aster Data nCluster column store. I’m a bit foggy on the details of that (Aster makes discussions of partitioning more complicated than they need to be), but it definitely sounds pretty flexible. Edit: See comment thread below.
Anything you can do with the Aster Data nCluster row store you can also do with the Aster Data nCluster column store. In particular, that includes all of Aster Data’s analytic functionality.
The same is true vice versa. In particular, Aster Data nCluster has no column-specific compression at this time.
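
For readers who want a picture of what a hybrid row-column store means, here is a minimal Python sketch. It says nothing about how Aster Data nCluster is actually implemented; the column names, the `build_hybrid` helper, and the "recent vs. older" partitioning rule are all assumptions made up for illustration. The idea it shows is that the same records can be laid out row-wise or column-wise, that different partitions of one table can use different layouts, and that a query returns the same answer either way.

```python
COLUMNS = ("user_id", "country", "spend")  # hypothetical schema


def to_row_store(records):
    """Row layout: each record's fields are kept together."""
    return [tuple(r[c] for c in COLUMNS) for r in records]


def to_column_store(records):
    """Column layout: each column's values are kept together."""
    return {c: [r[c] for r in records] for c in COLUMNS}


def sum_spend_rows(row_store):
    """Scan every row and pick out the spend field."""
    idx = COLUMNS.index("spend")
    return sum(row[idx] for row in row_store)


def sum_spend_columns(col_store):
    """Read only the spend column; other columns are never touched."""
    return sum(col_store["spend"])


def build_hybrid(records, is_recent):
    """Store 'recent' records row-wise and the rest column-wise."""
    recent = [r for r in records if is_recent(r)]
    older = [r for r in records if not is_recent(r)]
    return {
        "recent": ("row", to_row_store(recent)),
        "older": ("column", to_column_store(older)),
    }


def total_spend(table):
    """Run the same aggregate across partitions, whatever their layout."""
    total = 0.0
    for layout, part in table.values():
        if layout == "row":
            total += sum_spend_rows(part)
        else:
            total += sum_spend_columns(part)
    return total
```

Calling `build_hybrid(records, is_recent)` with any list of dictionaries keyed by `COLUMNS` and then `total_spend(...)` on the result gives the same total as summing over a pure row store or a pure column store of those records. The layouts differ only in how much data a given query has to touch.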

So Aster Data has now joined Greenplum/EMC among row-based analytic DBMS vendors with hybrid row-column stores. Oracle will join them some day, and the same probably applies to other row-based vendors as well. Similarly, Aster Data will probably join Oracle some day in having columnar compression. And so this all fits the model:

Aster Data has an impressively competitive analytic relational DBMS, considering the youth and size of the company.
Aster Data is a leader in extending its analytic relational DBMS by integrating in other analytic processing capabilities.

Categories: Analytic technologies, Aster Data, Columnar database management, Data warehousing, Database compression

4 Comments

August 26, 2010

More on NoSQL and HVSP (or OLRP)

Since posting last Wednesday morning that I’m looking into NoSQL and HVSP, I’ve had a lot of conversations, including with (among others):

Dwight Merriman of 10gen (MongoDB)
Damien Katz of Couchio (CouchDB)
Matt Pfeil of Riptano (Cassandra)
Todd Lipcon of Cloudera (HBase committer)
Tony Falco of Basho (Riak)
John Busch of Schooner
Ori Herrnstadt of Akiban

Categories: Akiban, Basho and Riak, Cache, Cassandra, Cloudera, Clustrix, CouchDB, DataStax, Facebook, Hadoop, HBase, memcached, MySQL, NewSQL, NoSQL, Object, OLTP, Open source, Parallelization, Schooner Information Technology, Theory and architecture, Tokutek and TokuDB

3 Comments

August 22, 2010

Workday comments on its database architecture

In my discussion of Workday’s technology, I gave an estimate that Workday’s database, if relationally designed, would require “1000s” of tables. That estimate came from Workday, Inc. CTO Stan Swete, in a thoughtful email that made several points about Workday’s database strategy. Workday kindly gave me permission to quote it below.
Read more

Categories: Data models and architecture, Object, OLTP, Software as a Service (SaaS), Specific users, Theory and architecture, Workday

3 Comments

August 22, 2010

The Workday architecture — a new kind of OLTP software stack

One of my coolest company visits in some time was to SaaS (Software as a Service) vendor Workday, Inc., earlier this month. Reasons included:

Workday has forward-thinking ideas about SaaS enterprise applications and the integration of business intelligence into same.
Workday has highly innovative ideas in how it manages data.
Companies founded by Dave Duffield tend to feature smart, likeable people who talk to one pleasantly and forthrightly. Workday is no exception; CTO Stan Swete and the other Workday folks present were a delight to talk with.
I’d invited Merv Adrian to come along with me. He asked great questions, which let me gather myself a bit despite how sleep-deprived I was for the first part of that trip.

Workday kindly allowed me to post this Workday slide deck. Otherwise, I’ve split out a quick Workday, Inc. company overview into a separate post.

The biggie for me was the data and object management part. Specifically: Read more

Categories: Business intelligence, Data integration and middleware, Data models and architecture, EAI, EII, ETL, ELT, ETLT, NoSQL, Object, OLTP, Software as a Service (SaaS), Specific users, Theory and architecture, Workday

13 Comments
