Memory-centric data management

Analysis of technologies that manage data entirely or primarily in random-access memory (RAM).

January 11, 2006

Another OLTP success for memory-centric OO

Computerworld published a Progress ObjectStore OLTP success story.

Hotel reservations system, this time. Not as impressive as the Amazon store — what is? — but still nice.

December 20, 2005

Solid state (Flash) memory vs. RAM vs. disks

I just wrote a column and a blog post on the potential for diskless PCs based on flash drives. It was a fun exercise, and I think I kept it general enough that my lack of knowledge about hardware technology details didn’t lead me into significant error.

The first vendor response I got was from Bit Micro Networks, who seem to sell such drives for PCs and enterprise storage alike. One of their press releases touts an Oracle implementation. Interesting idea. It’s far from a substitute for full memory-centric data management, but it’s kind of an intermediate way of getting some of the benefits without altering your traditional software setup much at all.

December 9, 2005

SAP’s version of DBMS2

I just spent a couple of days at SAP’s analyst meeting, and realized something I’d somewhat forgotten – much of the DBMS2 concept was inspired by SAP’s technical strategy. That’s not to say that SAP’s techies necessarily agree with me on every last point. But I do think it is interesting to review SAP’s version of DBMS2, to the extent I understand it.

1. SAP’s Enterprise Services Architecture (ESA) is meant to be, among other things, an abstraction layer over relational DBMS. The mantra is that they’re moving to a “message-based architecture” as opposed to a “database architecture.” These messages are in the context of a standards-based SOA, with a strong commitment to remaining open and standards-based, at least on the data and messaging levels. (The main limitation on openness that I’ve detected is that they don’t think much of standards such as BPEL in the business process definition area, which aren’t powerful enough for them.)

2. One big benefit they see to this strategy is that it reduces the need to have grand integrated databases. If one application manages data for an entity that is also important to another application, the two applications can exchange messages about the entity. Anyhow, many of their comments make it clear that, between partner-company databases (still somewhat in the future) and legacy app databases (a very big factor in the present day), SAP is constantly aware of situations in which a single integrated database is infeasible.

3. SAP is still deeply suspicious of redundant transactional data. They feel that with redundant data you can’t have a really clean model – unless, of course, you code up really rigorous synchronization. However, if redundancy is preferred for some reason – e.g., performance – it and the synchronization it requires can be hidden from users and most developers.

4. One area where SAP definitely favors redundancy and synchronization is data warehousing. Indeed, they have an ever more elaborate staging system to move data from operational to analytic systems.

5. In general, they are far from being relational purists. For example, Shai Agassi referred to doing things that you can’t do in a pure relational approach. And Peter Zencke reminded me that this attitude is nothing new. SAP has long had complex business objects, and has even done some of its own memory management to make them perform well when they were structured in a manner that RDBMS weren’t well suited for. (I presume he was referring largely to BAPI.)

6. That said, they’re of course using relational data stores today for most things. One exception is text/content, which they prefer to store in their own text indexing/management system TREX. Another example is their historical support for MOLAP, although they seem to be edging as far away from that as they can without offending the MOLAP-loving part of their customer base.

Incidentally, the whole TREX strategy is subject to considerable doubt too. It’s not a state-of-the-art product, and they currently don’t plan to make it into one. In particular, they have a prejudice against semi-automated ontology creation, and that has clearly become a requirement for top-tier text technologies.

7. One thing Peter said that confused me a bit came up when we were talking about nonrelational data retrieval. The example he used was retrieving information on all of a specific sales rep’s customers, or perhaps on several sales reps’ customers. I got the feeling he was talking about the ability to text search on multiple columns and/or multiple tables/objects/whatever at once, but I can’t honestly claim that I connected all the dots.

And of course, the memory-centric ROLAP tool BI Accelerator — technology that’s based on TREX — is just another example of how SAP is willing to go beyond passively connecting to a single RDBMS. And while their sponsorship of MaxDB isn’t really an example of that, it is another example of how SAP’s strategy is not one to gladden the hearts of the top-tier DBMS vendors.

December 9, 2005

Relational DBMS versus text data

There seems to be tremendous confusion about “search,” “meaning,” “semantics,” the suitability of relational DBMS to manage text data, and similar subjects. Here are some observations that may help sort some of that out.

1. Relational database theorists like to talk about the “meaning” or “semantics” of data as being in the database (specifically its metadata, and more specifically its constraints). This is at best a very limited use of the words “meaning” or “semantics,” and has little to do with understanding the meaning of plain English (or other language) phrases, sentences, paragraphs, etc. that may be stored in the database. Hugh Darwen is right and his fellow relational theorists are confused.

2. The standard way to manage text is via a full-text index, designed like this: for each of hundreds of thousands of words, the index maintains a list of which documents the word appears in, and at what positions in each document it appears. This is a columnar, memory-centric approach that doesn’t work well with the architecture of mainstream relational products. Oracle pulled off a decent single-server integration nonetheless, although performance concerns linger to this day. Others, like Sybase, which attempted a Verity integration, couldn’t make it work reasonably at all. Microsoft, which started from the Sybase architecture, didn’t even try, or if they tried it wasn’t for long; Microsoft’s text search strategy has been multi-server more or less from the get-go.
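
To make the postings-list idea concrete, here is a minimal sketch in Python of the kind of structure described above – a map from each word to the documents and positions where it occurs. It is purely illustrative, not any vendor’s actual implementation.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each word to a list of (doc_id, position) pairs -- a toy postings list."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word].append((doc_id, pos))
    return index

docs = {
    1: "memory centric data management",
    2: "disk based data management is slower for random access",
}
index = build_inverted_index(docs)
print(index["data"])        # [(1, 2), (2, 2)]
print(index["management"])  # [(1, 3), (2, 3)]
```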

3. Notwithstanding point #2, Oracle, IBM, Microsoft, and others have extended their SQL DBMS to handle text via the SQL3 (or SQL/MM) standard. (Truth be told, I get the names and sequencing of the SQL standard versions mixed up.) From this standpoint, the full text of a document sits in a single column, and one can write WHERE clauses on that column using a rich set of text search operators.

But while such SQL statements formally fit into the relational predicate logic model, the fit is pretty awkward. Text search functions aren’t two-valued yes/no affairs; rather, they return scores, e.g. with 101 possible values (the integers from 0 to 100). Collapsing those scores into a two-valued function typically throws away information, especially since that collapsing isn’t well understood (which is why it’s so hard to usefully federate text searches across different corpuses).
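
As a toy illustration of that point (my own sketch, with made-up scores – not any standard’s actual semantics), here is what gets lost when a ranked result is squeezed through a yes/no predicate:

```python
# Hypothetical relevance scores (0-100) for three documents against one query.
scores = {"doc_a": 87, "doc_b": 55, "doc_c": 52}

# What a ranked text-search engine can return: best matches first.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # ['doc_a', 'doc_b', 'doc_c']

# What a two-valued WHERE-style predicate reduces that to: in or out.
THRESHOLD = 50
matches = {doc for doc, s in scores.items() if s >= THRESHOLD}
print(matches)  # doc_a, doc_b, and doc_c are now indistinguishable from one another
```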

4. Something even trickier is going on. Text search can be carried out against many different kinds of things. One increasingly useful target is the tables of a relational database. Where a standard SQL query might have trouble finding all the references in a whole database to a particular customer organization or product line or whatever, a text search can do a better job. This kind of use is becoming more and more common. And while it works OK against relational products, it doesn’t fit into the formal relational model at all (at least not without a tremendous amount of contortion).
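
As a rough sketch of what that kind of whole-database search can look like – using Python’s built-in sqlite3 purely for illustration, not any particular product’s mechanism – one can scan every text column of every table for a term, something a hand-written SQL query against one known schema wouldn’t naturally do:

```python
import sqlite3

def search_all_text_columns(conn, term):
    """Naively scan every TEXT column of every table for rows containing term.
    Toy code: table/column names are interpolated without quoting or escaping."""
    hits = []
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for table in tables:
        cols = [r[1] for r in conn.execute(f"PRAGMA table_info({table})")
                if r[2].upper() == "TEXT"]
        for col in cols:
            rows = conn.execute(
                f"SELECT rowid FROM {table} WHERE {col} LIKE ?", (f"%{term}%",))
            hits += [(table, col, rowid) for (rowid,) in rows]
    return hits

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, notes TEXT)")
conn.execute("CREATE TABLE orders (item TEXT, shipping_notes TEXT)")
conn.execute("INSERT INTO customers VALUES ('Acme Corp', 'key account')")
conn.execute("INSERT INTO orders VALUES ('widgets', 'ship to Acme Corp warehouse')")
print(search_all_text_columns(conn, "Acme"))
# [('customers', 'name', 1), ('orders', 'shipping_notes', 1)]
```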

5. Relational DBMS typically manage the data they index. Text search systems often don’t. But that difference is a relatively small one compared with some of the others mentioned above, especially since it’s a checkmark item for leading RDBMS to have some sort of formal federation capability.

December 2, 2005

Some Moore’s Law data points

I’m not a hardware guy, but here are some data points around the subject of Moore’s Law, quasi-Moore laws, and their bearing on random access times to disk and RAM. This line of inquiry is central to my argument favoring memory-centric data management.

Human-Computer Interaction cites 10 nanoseconds for RAM access time and 7 milliseconds for disk, and those figures are a couple of years old. That works out to 700,000:1 – pretty supportive of my figure, namely a 1,000,000:1 ratio.

Don Burleson asserts that RAM speed has been 50 nanoseconds for decades. Hmm. I’m not sure what that means, since I’d think that RAM access speeds are bounded by clock speed. Of course, 20 megahertz is 50 nanoseconds per cycle, so some multiple of 20 megahertz would suffice to allow true 50 nanosecond access.

EDIT: I looked again, and he says that the 50 nanosecond limit is based on “speed of light and Proximity to CPU.” I’m even more confused than before, since light travels about 30 centimeters per nanosecond (at least in a vacuum).

A summary of a February, 2000 Jim Gray article makes some interesting points and claims in the same general subject area. One that stands out:

Storage capacity improves 100x / decade, while storage device throughput increases 10x / decade. At the same time the ratio between disk capacity and disk accesses/second is increasing more than 10x / decade. Consequently, disk accesses become more precious and disk data becomes colder with time, at a rate of 10x / decade.
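
A quick back-of-the-envelope version of that claim – my arithmetic and made-up starting numbers, not Gray’s – shows how fast accesses-per-byte deteriorate if capacity grows 100x per decade while device throughput grows only 10x:

```python
# Back-of-the-envelope illustration of Gray's claim, with made-up starting numbers.
capacity_gb, accesses_per_sec = 100.0, 150.0   # hypothetical disk, year 0

for decade in range(3):
    accesses_per_gb = accesses_per_sec / capacity_gb
    print(f"decade {decade}: {capacity_gb:>9.0f} GB, "
          f"{accesses_per_gb:.3f} accesses/sec per GB stored")
    capacity_gb *= 100      # capacity: ~100x per decade
    accesses_per_sec *= 10  # device throughput: ~10x per decade
```

Each pass through the loop shows accesses per stored gigabyte falling by a factor of 10 – which is exactly the sense in which disk data “becomes colder with time.”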

Also, the disk capacity analog to Moore’s Law is sometimes named after Mark Kryder, and the network capacity version is attributed to George Gilder.

And finally — the fastest disks now made seem to spin at 15,000 RPM. Those would take 2 milliseconds to spin halfway around, for the most naive estimate of their average random access time. And the naive estimate seems not to be too bad — depending on the exact model, they’re actually advertised with 3.3-3.9 millisecond seek times.
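
For what it’s worth, here’s that naive arithmetic as a short sketch – average rotational latency is just the time for half a revolution at the rated spindle speed:

```python
def avg_rotational_latency_ms(rpm):
    """Average rotational latency = time for half a revolution, in milliseconds."""
    revs_per_ms = rpm / 60.0 / 1000.0
    return 0.5 / revs_per_ms

for rpm in (1200, 7200, 15000):
    print(f"{rpm:>6} RPM -> {avg_rotational_latency_ms(rpm):.1f} ms average rotational latency")
# 1200 RPM -> 25.0 ms, 7200 RPM -> 4.2 ms, 15000 RPM -> 2.0 ms
```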

November 14, 2005

Defining and surveying “Memory-centric data management”

I’m writing more and more about memory-centric data management technology these days, including in my latest Computerworld column. You may be wondering what that term refers to. Well, I’ve basically renamed what are commonly called “in-memory DBMS,” for what I think is a very good reason: Most of the products in the category aren’t true DBMS, aren’t wholly in-memory, or both! Indeed, if you catch me in a grouchy mood I might argue that “in-memory DBMS” is actually a contradiction in terms.

I’ll give a quick summary of the vendors and products I am focusing on in this newly-named category, and it should be clearer what I mean:

So there you have it. There are a whole lot of technologies out there that manage data in RAM, in ways that would make little or no sense if disks were more intimately involved. Conventional DBMS also try to exploit RAM and limit disk access, via caching; but generally the data access methods they use in RAM are pretty similar to those they use when going out to disk. So memory-centric systems can have a major advantage.
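
As a crude sketch of that distinction – a toy model of my own, not any product’s internals – a disk-oriented system navigates the same page-structured index whether the pages happen to be cached or not, while a memory-centric system can use a structure, such as a plain hash table, that only makes sense if everything is guaranteed to be in RAM:

```python
import bisect

# Toy "disk-oriented" index: keys packed into fixed-size pages, searched page by page,
# the same way whether those pages are cached in RAM or fetched from disk.
PAGE_SIZE = 4
keys = list(range(0, 64, 2))                       # 0, 2, 4, ..., 62
pages = [keys[i:i + PAGE_SIZE] for i in range(0, len(keys), PAGE_SIZE)]

def page_oriented_lookup(k):
    for page in pages:                             # navigate page structure (cached or not)
        if page[0] <= k <= page[-1]:
            i = bisect.bisect_left(page, k)
            return i < len(page) and page[i] == k
    return False

# Toy "memory-centric" index: a structure that assumes everything lives in RAM.
hash_index = {k: True for k in keys}

def memory_centric_lookup(k):
    return k in hash_index                         # one hash probe, no page navigation

print(page_oriented_lookup(42), memory_centric_lookup(42))  # True True
```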

November 13, 2005

Breaking through the disk speed barrier

Most aspects of computer performance and capacity grow at Moore’s Law kinds of speeds. Doubling times may be anywhere from 9 months to 2 years, but in any case speeds and storage capacities grow exponentially quickly. Not so, however, with disk rotation speeds. The very first disk drives, over 50 years ago, rotated 1,200 times per minute. Today’s top disk rotation speed is around 15,000 RPM. Indeed, while I recall seeing a reference to one at 15,600 RPM, I can’t now go back and find it. Yes, folks; disk rotational speed in the entire history of computing has increased just by a measly factor of 13.

Why does this matter to DBMS design? Simply put, disk rotation speed is an absolute limit on the speed of random disk-based data access. Today’s fastest disks take 4 milliseconds to rotate once. Thus, multiple heads aside, getting something from a known but random location on the disk entails, on average, about 2 milliseconds of rotational delay alone. And a naive data management algorithm will, for a single query, result in dozens or even hundreds of random accesses.

Thus, for a DBMS to run at acceptable speed, it needs to get data off disk not randomly, but rather a page at a time (i.e., in large blocks of predetermined size) or better yet sequentially (i.e., in continuous streams of indeterminate size). The indexes needed to assure these goals had best be sized to fit entirely in RAM. Clustering also plays an increasingly large role, so that data needed at the same time is likely to be on the same page, or at least in the same part of the disk.
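
A rough illustrative calculation – all the numbers below are round assumptions, not benchmarks – shows why the difference between random and sequential access dominates everything else:

```python
# Rough illustration with assumed round numbers, not benchmark results.
SEEK_MS = 5.0                 # assumed average random access (seek + rotation)
SEQUENTIAL_MB_PER_SEC = 80.0  # assumed sustained sequential transfer rate
ROW_BYTES = 200
ROWS = 100_000

# Fetching each row with its own random disk access:
random_ms = ROWS * SEEK_MS

# Streaming the whole set sequentially off disk:
sequential_ms = (ROWS * ROW_BYTES) / (SEQUENTIAL_MB_PER_SEC * 1_000_000) * 1000

print(f"random accesses: {random_ms / 1000:.0f} seconds")      # ~500 seconds
print(f"sequential scan: {sequential_ms:.0f} milliseconds")    # ~250 milliseconds
```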

Right there I’ve described some of the toughest ongoing challenges facing DBMS engineers. The big vendors all do a great job at meeting them (if they didn’t, they’d be out of business). Even so, some small companies find themselves able to beat the big guys, by some egregious cheating.

Data warehouse appliance vendors such as Netezza and especially Datallegro optimize their systems to stream data sequentially off of disk. In doing so, they go deeper into the operating systems, hardware, etc. than Oracle could ever allow itself to do. And the results seem pretty good. But I’ll write about that another time. Instead, I’m focusing right now on memory-centric data management; please see my other posts in that topic category.

November 12, 2005

TransRelational(TM) — The final debunking

In prior posts, I’ve mentioned the essential dishonesty behind the hoohah around TransRelational(TM) technology from Required Technologies, Inc., and Chris Date’s highly regrettable promotion of same. Now I’ve been able to get more detail from another former executive of the company. Unsurprisingly, it corroborates what I wrote before, and utterly contradicts some of the myths spread by Date and his acolytes. This executive, while requesting that his name be withheld because of the acrimony between the CEO and just about every other company insider, otherwise gave me permission to report fully on what he told me.

October 10, 2005

TransRelational(TM) nonsense

Database guru Christopher J. Date is apparently accepting money from attendees to his seminars on TransRelational(TM) database architecture, so that he can tell them about an as-yet unreleased product from Required Technologies, Inc.

This is regrettable on multiple levels.

1. Required Technologies shut down product development in 2002, after running through $30 million; there’s great acrimony between investors and the CEO; and lawsuits are likely.

2. Required’s product never did most of what Date seems to be claiming it now does. It was a read-oriented columnar data store, much like Sybase IQ or a number of other products from younger companies.

October 10, 2005

The Amazon.com bookstore is a huge, modern OLTP app. So is it relational?

I don’t know for a fact that the Amazon.com bookstore is the world’s biggest OLTP application — but if it isn’t, it’s close.

And the thing is — that’s never been an entirely relational application. Oh, the ordering part surely is. But the inventory lookup is currently driven by an OODBMS (from Progress). The personalization used to be done in Red Brick (I knew which software replaced it, but I’m forgetting at the moment — it may even be one of the relational warehouse appliance vendors). And of course the full-text search is a custom in-house system.
