Theory and architecture
Analysis of design choices in databases and database management systems. Related subjects include:
- Database diversity
- Explicit support for specific data types
- (in Text Technologies) Text search
Membase simplifies name, goes GA
The company Northscale that makes the product Membase is now the company Membase that makes the product Membase. Good. Also, the product Membase has now gone GA.
I wrote back in August about Membase, and that covers most of what I think, with perhaps a couple of exceptions: Read more
Categories: Basho and Riak, Cache, Couchbase, memcached, Memory-centric data management, NoSQL | 4 Comments |
NoSQL overview
My NoSQL article is finally posted; I hope it lives up to all the foreshadowing. It is being run online at Intelligent Enterprise/Information Week, as per the link above, where Doug Henschen edited it with an admirably light touch.
Below please find three excerpts* that convey the essence of my thinking on NoSQL. For much more detail, please see the article itself.
*Notwithstanding my admiration for Doug’s editing, the excerpts are taken from my final pre-editing submission, not from the published article itself.
My quasi-definition of “NoSQL” wound up being: Read more
Categories: Database diversity, NoSQL, Parallelization | 18 Comments |
A few notes from XLDB 4
As much as I believe in the XLDB conferences, I only found time to go to (a big) part of one day of XLDB 4 myself. In general: Read more
Categories: Analytic technologies, Health care, Michael Stonebraker, MySQL, Open source, Parallelization, Petabyte-scale data management, Scientific research, Surveillance and privacy | 2 Comments |
Partnering with Cloudera
After I criticized the marketing of the Aster/Cloudera partnership, my clients at Aster Data and Cloudera ganged up on me and tried to persuade me I was wrong. Be that as it may, that conversation and others were helpful to me in understanding the core thesis: Read more
Categories: Analytic technologies, Aster Data, Cloudera, Data warehousing, Database diversity, Hadoop, MapReduce, Parallelization, Petabyte-scale data management | 11 Comments |
eBay followup — Greenplum out, Teradata > 10 petabytes, Hadoop has some value, and more
I chatted with Oliver Ratzesberger of eBay around a Stanford picnic table yesterday (the XLDB 4 conference is being held at Jacek Becla’s home base of SLAC, which used to stand for “Stanford Linear Accelerator Center”). Todd Walter of Teradata also sat in on the latter part of the conversation. Things I learned included: Read more
Categories: Data warehousing, Derived data, eBay, Greenplum, Hadoop, HBase, Log analysis, Petabyte-scale data management, Teradata | 30 Comments |
How to tell whether you need ACID-compliant transaction integrity
In a post about the recent JPMorgan Chase database outage, I suggested that JPMorgan Chase’s user profile database was over-engineered, in that various web surfing data was stored in a fully ACID-compliant manner when it didn’t really need to be. I’ve since gotten private communication expressing vehement agreement, and telling of the opposite choice being made in other major web-facing transactional systems.
What’s going on is this:
- ACID-compliant transaction integrity commonly costs more in terms of DBMS licenses and many other components of TCO (Total Cost of Ownership) than less rigorous approaches.
- Worse, it can actually hurt application uptime, by forcing your system to pull in its horns and stop functioning in the face of failures that a non-transactional system might smoothly work around.
- Other flavors of “complexity can be a bad thing” apply as well.
Thus, transaction integrity can be more trouble than it’s worth.
In essence, of course, that’s half of the classic NoSQL claim, where the other half of the claim is to assert that the same may be said of joins.
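To make the trade-off concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table, account names, and functions (`make_db`, `transfer`, `balances`, `record_page_view`) are all hypothetical illustrations, not anything taken from the JPMorgan Chase system. The money transfer is the kind of write that wants atomicity: either both updates land or neither does. The page-view write is the kind of web surfing data where dropping a record on failure may be perfectly acceptable.

```python
import sqlite3


def make_db():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
    conn.commit()
    return conn


def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


def transfer(conn, src, dst, amount):
    """Transactional write: both updates commit together or are rolled back together."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            (left,) = conn.execute("SELECT balance FROM accounts WHERE name = ?", (src,)).fetchone()
            if left < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


def record_page_view(write, user, url):
    """Best-effort write: if the store is unavailable, drop the event and keep serving."""
    try:
        write((user, url))
        return True
    except Exception:
        return False
```

In this sketch a failed transfer leaves both balances untouched, while a failed page-view write is simply lost and the caller carries on. Which behavior you want for a given piece of data is exactly the question at hand.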
So when should you go for ACID-compliant transaction integrity, and when shouldn’t you bother? Every situation is different, but here’s a set of considerations to start you off. Read more
Categories: NoSQL, Web analytics | 12 Comments |
Aster Data nCluster Version 4.6
The main thing in Aster Data nCluster Version 4.6 is Aster’s version of hybrid row-column store technology. Technical highlights include:
- Aster Data is simply taking the number of storage options in nCluster up from 1 to 2 – you now can store a table either in the Aster Data nCluster row store or column store.
- In fact, you can store parts of a table in the Aster Data nCluster row store and other parts in the Aster Data nCluster column store. I’m a bit foggy on the details of that – Aster makes discussions of partitioning more complicated than they need to be – but it definitely sounds pretty flexible. Edit: See comment thread below.
- Anything you can do with the Aster Data nCluster row store you can also do with the Aster Data nCluster column store. In particular, that includes all of Aster Data’s analytic functionality.
- The same is true vice-versa. There is no columnar-oriented kind of compression in Aster Data nCluster at this time.
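As a rough illustration of the general idea, and not of how Aster actually implements it, the sketch below models a table whose partitions can each be held in either a row layout or a column layout. All class and method names (`RowPartition`, `ColumnPartition`, `HybridTable`, `column_sum`) are hypothetical. The point is only that the same query can run over either layout and return the same answer.

```python
class RowPartition:
    """Row layout: each record is kept together as one dict."""

    def __init__(self, rows):
        self.rows = list(rows)

    def column(self, name):
        # Reading one column means touching every full record.
        return [row[name] for row in self.rows]


class ColumnPartition:
    """Column layout: each column's values are kept together in one list.

    Assumes every input row has the same set of keys.
    """

    def __init__(self, rows):
        self.columns = {}
        for row in rows:
            for name, value in row.items():
                self.columns.setdefault(name, []).append(value)

    def column(self, name):
        # Reading one column touches only that column's values.
        return list(self.columns[name])


class HybridTable:
    """A table whose partitions may each use a different layout."""

    LAYOUTS = {"row": RowPartition, "column": ColumnPartition}

    def __init__(self):
        self.partitions = []

    def add_partition(self, rows, layout):
        self.partitions.append(self.LAYOUTS[layout](rows))

    def column_sum(self, name):
        # The query is written once and works regardless of partition layout.
        return sum(v for part in self.partitions for v in part.column(name))
```

For example, loading one partition with layout "row" and another with layout "column" and then calling `column_sum("amount")` adds up the values from both, without the caller needing to know which layout holds which records.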
So Aster Data has now joined Greenplum/EMC among row-based analytic DBMS vendors with hybrid row-column stores. Oracle will join them some day, and the same probably applies to other row-based vendors as well. Similarly, Aster Data will probably join Oracle some day in having columnar compression. And so this all fits the model:
- Aster Data has an impressively competitive analytic relational DBMS, considering the youth and size of the company.
- Aster Data is a leader in extending its analytic relational DBMS by integrating in other analytic processing capabilities.
Categories: Analytic technologies, Aster Data, Columnar database management, Data warehousing, Database compression | 4 Comments |
More on NoSQL and HVSP (or OLRP)
Since posting last Wednesday morning that I’m looking into NoSQL and HVSP, I’ve had a lot of conversations, including with (among others):
- Dwight Merriman of 10gen (MongoDB)
- Damien Katz of Couchio (CouchDB)
- Matt Pfeil of Riptano (Cassandra)
- Todd Lipcon of Cloudera (HBase committer)
- Tony Falco of Basho (Riak)
- John Busch of Schooner
- Ori Herrnstadt of Akiban
Workday comments on its database architecture
In my discussion of Workday’s technology, I gave an estimate that Workday’s database, if relationally designed, would require “1000s” of tables. That estimate came from Workday, Inc. CTO Stan Swete, in a thoughtful email that made several points about Workday’s database strategy. Workday kindly gave me permission to quote it below.
Read more
Categories: Data models and architecture, Object, OLTP, Software as a Service (SaaS), Specific users, Theory and architecture, Workday | 3 Comments |
The Workday architecture — a new kind of OLTP software stack
One of my coolest company visits in some time was to SaaS (Software as a Service) vendor Workday, Inc., earlier this month. Reasons included:
- Workday has forward-thinking ideas about SaaS enterprise applications and the integration of business intelligence into same.
- Workday has highly innovative ideas in how it manages data.
- Companies founded by Dave Duffield tend to feature smart, likeable people who talk to one pleasantly and forthrightly. Workday is no exception; CTO Stan Swete and the other Workday folks present were a delight to talk with.
- I’d invited Merv Adrian to come along with me. He asked great questions, and I could gather myself a bit despite how sleep-deprived I was for the first part of that trip.
Workday kindly allowed me to post this Workday slide deck. Otherwise, I’ve split out a quick Workday, Inc. company overview into a separate post.
The biggie for me was the data and object management part. Specifically: Read more