Data models and architecture

Discussion of issues in data modeling, and whether databases should be consolidated or loosely coupled.

December 18, 2007

Amazon SimpleDB – when less is, supposedly, enough

I’ve posted several times about Amazon as an innovative, super-high-end user — doing transactional object caching with ObjectStore, building an in-house less-than-DBMS called Dynamo, or just generally adopting a very DBMS2-like approach to data management. Now Amazon is bringing the Dynamo idea to the public, via a SaaS offering called SimpleDB. (Hat tip to Tim Anderson.)

SimpleDB is obviously meant to be a data server for online applications. There are no joins, and queries are limited to 5 seconds of running time, so serious analytics are out of the question. Domains are limited to 10GB for now, so extreme media file serving also isn’t what’s intended; indeed, Amazon encourages you to use SimpleDB to store pointers to larger objects that are stored as files in Amazon S3.

On the other hand, if you think of SimpleDB as an OLTP DBMS, your head might explode. There’s no concept of a transaction, no mechanisms to help with integrity, no way to do arithmetic, and indeed no assurance that writes will be immediately reflected in subsequent reads. Read more
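To make the intended division of labor concrete, here is a rough Python sketch of the pattern Amazon encourages: small attribute sets in a SimpleDB-style item, with the large file left in S3 and referenced only by a pointer. This is purely illustrative; it uses plain dictionaries rather than the real SimpleDB or S3 APIs, and the item name, attributes, and bucket/key are all invented.

```python
# Hypothetical illustration only: plain Python dictionaries, not the real
# SimpleDB or S3 APIs. Item name, attributes, and the S3 key are invented.

catalog_item = {
    "item_name": "track-8841",
    "attributes": {
        "title": "Blue in Green",
        "artist": "Miles Davis",
        "media_type": "audio/mp3",
        # The large file itself lives in S3; the item holds only a pointer.
        "s3_key": "s3://example-media-bucket/tracks/8841.mp3",
    },
}

# A "domain" is just a bag of items addressed by name: no schema, no joins.
domain = {catalog_item["item_name"]: catalog_item["attributes"]}

def fetch_track(domain, item_name):
    """Look up metadata by item name and return the S3 pointer separately,
    so the caller can fetch the big object from S3. With SimpleDB's relaxed
    consistency, a very recent write might not be visible here yet."""
    attrs = domain[item_name]
    return attrs, attrs["s3_key"]

attrs, pointer = fetch_track(domain, "track-8841")
print(attrs["title"], "->", pointer)
```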

December 2, 2007

Amazon Dynamo — when primary key access is enough

Amazon has a very decentralized technical operation. But even the individual pieces have interestingly huge scale. Thus, various things they’re doing are of interest.

They recently presented a research paper on a high-performance transactional system called Dynamo. (Hat tip to Dare Obasanjo.) A key point is the following:

There are many services on Amazon’s platform that only need primary-key access to a data store. For many services, such as those that provide best seller lists, shopping carts, customer preferences, session management, sales rank, and product catalog, the common pattern of using a relational database would lead to inefficiencies and limit scale and availability. Dynamo provides a simple primary-key only interface to meet the requirements of these applications.

Now, I don’t think many organizations beyond Amazon are going to decide that they can’t afford the overhead of an RDBMS for such OLTP-like applications. But I do think it will become increasingly common to find other reasons to eschew traditional OLTP relational architectures. Maybe you’ll want the schema flexibility of XML. Or perhaps you’ll be happy with a fixed relational schema, but will want to optimize for analytic performance.
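For concreteness, here is a toy Python sketch of what a “primary-key only interface” amounts to. It is purely illustrative, not Amazon’s code; real Dynamo partitions and replicates keys across many nodes and deals with versioning, none of which appears below.

```python
# Toy key-value store exposing only primary-key operations: put and get by
# key, with no joins, no secondary indexes, and no ad hoc queries.

class PrimaryKeyStore:
    """Minimal primary-key-only data store (illustrative sketch)."""

    def __init__(self):
        self._data = {}  # a real system spreads keys across a cluster

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

# Usage: a shopping cart keyed by customer id (identifiers are made up).
carts = PrimaryKeyStore()
carts.put("customer-1234", {"items": ["B000002OTL", "B00005NTQ3"]})
print(carts.get("customer-1234"))
```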

October 23, 2007

Vertica — just star and snowflake schemas?

One of the longest-running technotheological disputes I know of is the one pitting flat/normalized data warehouse architectures vs. cubes, stars, and snowflake schemas. Teradata, for example, is a flagwaver for the former camp; Microstrategy is firmly in the latter. (However, that doesn’t keep lots of retailers from running Microstrategy on Teradata boxes.) Attensity (a good Teradata partner) is in the former camp; text mining rival Clarabridge (sort of a Microstrategy spinoff) is in the latter. And so on.

Vertica is clearly in the star/snowflake camp as well. I asked them about this, and Vertica’s CTO Mike Stonebraker emailed a response. I’m reproducing it below, with light edits; the emphasis is also mine. Key points include:

Great question. This is something that we’ve thought a lot about and have done significant research on with large enterprise customers. … short answer is as follows:

Vertica supports star and snowflake schemas because that is the desired data structure for data warehousing. The overwhelming majority of the schemas we see are of this form, and we have highly optimized for this case. Read more
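For readers who haven’t run into the terminology, here is a minimal star schema sketched in SQLite via Python: one central fact table joined to small dimension tables on surrogate keys, with a typical query that constrains on the dimensions and aggregates the facts. The table and column names are invented for illustration and say nothing about how Vertica stores things internally.

```python
# Minimal star schema sketch (illustrative only; names are made up).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, calendar_date TEXT);
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE fact_sales  (
        date_id    INTEGER REFERENCES dim_date(date_id),
        product_id INTEGER REFERENCES dim_product(product_id),
        units_sold INTEGER,
        revenue    REAL
    );
    INSERT INTO dim_date VALUES (1, '2007-10-01'), (2, '2007-10-02');
    INSERT INTO dim_product VALUES (10, 'widget'), (11, 'gadget');
    INSERT INTO fact_sales VALUES (1, 10, 5, 50.0), (2, 10, 3, 30.0), (2, 11, 7, 140.0);
""")

# Typical star-schema query: join the fact table to its dimensions, then aggregate.
for row in conn.execute("""
    SELECT d.calendar_date, p.product_name, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_date d    ON f.date_id = d.date_id
    JOIN dim_product p ON f.product_id = p.product_id
    GROUP BY d.calendar_date, p.product_name
"""):
    print(row)
```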

September 24, 2007

Pervasive Summit PSQL v10

Pervasive Software has a long history – 25 years, in fact, as they’re emphasizing in some current marketing. Ownership and company name have changed a few times, as the company went from being an independent startup to being owned by Novell to being independent again. The original product, and still the cash cow, was a B-tree-based record manager called Btrieve, eventually renamed Pervasive PSQL as it gained more and more relational functionality.

Pervasive Summit PSQL v10 has just been rolled out, and I wrote a nice little white paper to commemorate the event, describing some of the main advances over v9, primarily for the benefit of current Pervasive PSQL developers. In one major advance, Pervasive made the SQL functionality much stronger. In particular, you now can have a regular SQL data dictionary, so that the database can be used for other purposes – BI, additional apps, whatever. Apparently, that wasn’t possible in v9, although it had been in yet earlier releases. Pervasive also added view-based security permissions, which is obviously a Very Good Thing.
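To illustrate the general idea of view-based security (this is not Pervasive’s syntax), a view exposes only the columns and rows a given application or role should see, and a full DBMS then grants permissions on the view rather than on the underlying table. The sketch below uses SQLite via Python, which supports views but not GRANT, so the permission step is only noted in a comment.

```python
# Illustrative sketch of view-based security; table, view, and column names
# are invented. SQLite has no GRANT, so only the view itself is shown.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        emp_id INTEGER PRIMARY KEY,
        name TEXT,
        department TEXT,
        salary REAL  -- sensitive column, hidden by the view below
    );
    INSERT INTO employees VALUES
        (1, 'Ada', 'Engineering', 95000),
        (2, 'Grace', 'Engineering', 105000),
        (3, 'Edgar', 'Finance', 90000);

    -- The view hides salary and restricts rows to one department.
    -- In a full DBMS you would grant the application role SELECT on this
    -- view and nothing on the base table.
    CREATE VIEW engineering_directory AS
        SELECT emp_id, name, department
        FROM employees
        WHERE department = 'Engineering';
""")

print(conn.execute("SELECT * FROM engineering_directory").fetchall())
```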

There also are some big performance boosts. Read more

June 15, 2007

Fast RDF in specialty relational databases

When Mike Stonebraker and I discussed RDF yesterday, he quickly turned to suggesting fast ways of implementing it over an RDBMS. Then, quite characteristically, he sent over a paper that allegedly covered them, but actually was about closely related schemes instead. 🙂 Edit: The paper has a new, stable URL. Hat tip to Daniel Abadi.

All minor confusion aside, here’s the story. At its core, an RDF database is one huge three-column table storing subject-property-object triples. In the naive implementation, you then have to join this table to itself repeatedly. Materialized views are a good start, but they only take you so far. Read more
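Here is a small sketch of that naive triples-table layout in SQLite via Python, showing the self-join that even a two-hop question forces. The subjects, properties, and objects are invented for illustration.

```python
# Naive RDF storage: one big three-column triples table, queried via self-joins.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE triples (subject TEXT, property TEXT, object TEXT);
    INSERT INTO triples VALUES
        ('gene:BRCA1',     'encodes',   'protein:P38398'),
        ('protein:P38398', 'locatedIn', 'compartment:nucleus'),
        ('gene:TP53',      'encodes',   'protein:P04637');
""")

# "Where is the protein encoded by gene:BRCA1 located?" requires joining the
# triples table to itself once; each additional hop adds another self-join.
query = """
    SELECT t2.object
    FROM triples t1
    JOIN triples t2 ON t1.object = t2.subject
    WHERE t1.subject  = 'gene:BRCA1'
      AND t1.property = 'encodes'
      AND t2.property = 'locatedIn'
"""
print(conn.execute(query).fetchall())  # [('compartment:nucleus',)]
```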

June 15, 2007

RDF “definitely has legs”

Thus spake Mike Stonebraker to me, on a call we’d scheduled to talk about several other things altogether. This was one day after I was told at the Text Analytics Summit that the US government is going nuts for RDF. And I continue to get confirmation of something I first noted last year — Oracle is pushing RDF heavily, especially in the life sciences market.

Evidently, the RDF data model is for real … unless, of course, you’re the kind of purist who cares to dispute whether RDF is a true “data model” at all.

December 9, 2005

More flame war stupidity

Robert Seiner (publisher of TDAN) and Fabian Pascal are now claiming that Computerworld approached Bob and asked him to do something about the false charge that I personally engaged in censorship. To the best of my knowledge, they’re both lying. It was just me, and me alone, who approached Bob, which is exactly what one would think, if for some odd reason one cared about the matter at all. I don’t have the faintest idea why they fabricated this story, or what they think it demonstrates — but they did.

Seiner also picked a title for an article of mine he published, then published one by Fabian attacking me for the title. Classy.

Bob also made two promises in the matter which he didn’t keep. Nor did he have the courtesy to tell me he’d changed his mind, or to really address the matter when I called him on it.

I wondered why Seiner kept on publishing Pascal’s stuff, even for free, when most of Fabian’s other publishers have dropped him. Now I have a better idea. They’re soulmates.

A pity. Partway through our discussions, Bob sounded eminently reasonable. That’s why I jumped at his suggestion I write an article for him. Oh well; live and learn.

And for the record — no, I won’t respond to Pascal’s critiques point by point. He typically attacks straw men, rather than restricting his barbs to my actual opinions. In those areas where we do actually disagree, I haven’t hesitated to publish follow-on arguments, repeatedly and at length, here and elsewhere. I’ve given that relative nonentity much more attention than he deserves.

Also for the record — even though I don’t respond to every nasty shot Pascal and his associates take at me, I’m of course not conceding that his other libels and opinions are actually correct. I just think that by and large he’s a waste of bandwidth, because even his coherent ideas are quickly sidetracked by highly illogical fulminations. Even in articles where he’s otherwise making enough sense to respond to, he usually goes off on some extremist semantics-related kick that doesn’t mesh well with his own imperfect command of the English language.

(I really want to respond to his film contracts example from a three-year-old anti-XML diatribe. But the article gets bogged down with various “definitions” that are not easily reconciled to normal usage of the words, and it’s too much trouble to sort through them all. Maybe I’ll respond to the idea without linking to the article itself, when I get around to it.)

Exception to the above slam at Pascal — he recently posted a good interchange he had with Hugh Darwen, which I’m referencing in another post in this blog. His side was wrong, but both sides were well-presented.
