Memory-centric data management

Analysis of technologies that manage data entirely or primarily in random-access memory (RAM). Related subjects include:

May 22, 2006

Data warehouse appliances

If we define a “data warehouse appliance” as “a special-purpose computer system, with appliance administratibility, that manages a data warehouse,” then there are two major contenders: Netezza and DATAllegro, both startups, both with a small number of disclosed customers. Past contenders would include Teradata and White Cross (which seems to have just merged into Kognitio), but neither would admit to being in that market today. (I suspect this is a mistake on Teradata’s part, but so be it.) IBM with DB2 on the z-Series wouldn’t be properly regarded as an appliance player either, although IBM is certainly conscious of appliance competition. And SAP’s BI Accelerator does not persist data at this time.

In principle, the Netezza and DATAllegro stories are similar — take an established open source RDBMS*, build optimized hardware to run it, and optimize the software configuration as well. Much of the optimization is focused on getting data on and off disk sequentially, minimizing any random accesses. This is why I often refer to data warehouse appliances as being the best alternative to memory-centric data management. Beyond that, the optimizations by the two vendors differ considerably.
*Netezza uses PostgreSQL; DATAllegro uses Ingres.

Hmm. I don’t feel like writing more on this subject at this very moment, yet I want to post something urgently because there’s an IOU in my Computerworld column today for it. OK. More later.

May 10, 2006

White paper on memory-centric data management — excerpt

Here’s an excerpt from the introduction to my new white paper on memory-centric data management. I don’t know why WordPress insists on showing the table gridlines, but I won’t try to fix that now. Anyhow, if you’re interested enough to read most of this excerpt, I strongly suggest downloading the full paper.

Introduction

Conventional DBMS don’t always perform adequately.

Ideally, IT managers would never need to think about the details of data management technology. Market-leading, general-purpose DBMS (DataBase Management Systems) would do a great job of meeting all information management needs. But we don’t live in an ideal world. Even after decades of great technical advances, conventional DBMS still can’t give your users all the information they need, when and where they need it, at acceptable cost. As a result, specialty data management products continue to be needed, filling the gaps where more general DBMS don’t do an adequate job.

Memory-centric technology is a powerful alternative.

One category on the upswing is memory-centric data management technology. While conventional DBMS are designed to get data on and off disk quickly, memory-centric products (which may or may not be full DBMS) assume all the data is in RAM in the first place. The implications of this design choice can be profound. RAM access speeds are up to 1,000,000 times faster than random reads on disk. Consequently, whole new classes of data access methods can be used when the disk speed bottleneck is ignored. Sequential access is much faster in RAM, too, allowing yet another group of efficient data access approaches to be implemented.

It does things disk-based systems can’t.

If you want to query a used-book database a million times a minute, that’s hard to do in a standard relational DBMS. But Progress’ ObjectStore gets it done for Amazon. If you want to recalculate a set of OLAP (OnLine Analytic Processing) cubes in real-time, don’t look to a disk-based system of any kind. But Applix’s TM1 can do just that. And if you want to stick DBMS instances on 99 nodes of a telecom network, all persisting data to a 100th node, a disk-centric system isn’t your best choice – but Solid’s BoostEngine should get the job done.

Memory-centric data managers fill the gap, in various guises.

Those products are some leading examples of a diverse group of specialist memory-centric data management products. Such products can be optimized for OLAP or OLTP (OnLine Transaction Processing) or event-stream processing. They may be positioned as DBMS, quasi-DBMS, BI (Business Intelligence) features, or some utterly new kind of middleware. They may come from top-tier software vendors or from the rawest of startups. But they all share a common design philosophy: Optimize the use of ever-faster semiconductors, rather than focusing on (relatively) slow-spinning disks.

They have a rich variety of benefits.

For any technology that radically improves price/performance (or any other measure of IT efficiency), the benefits can be found in three main categories:

  • Doing the same things you did before, only more cheaply;
  • Doing the same things you did before, only better and/or faster;
  • Doing things that weren’t technically or economically feasible before at all.

For memory-centric data management, the “things that you couldn’t do before at all” are concentrated in areas that are highly real-time or that use non-relational data structures. Conversely, for many relational and/or OLTP apps, memory-centric technology is essentially a much cheaper/better/faster way of doing what you were already struggling through all along.

Memory-centric technology has many applications.

Through both OEM and direct purchases, many enterprises have already adopted memory-centric technology. For example:

  • Financial services vendors use memory-centric data management throughout their trading systems.
  • Telecom service vendors use memory-centric data management in multiple provisioning, billing, and routing applications.
  • Memory-centric data management is used to accelerate web transactions, including in what may be the most demanding OLTP app of all — Amazon.com’s online bookstore.
  • Memory-centric data management technology is OEMed in a variety of major enterprise network management products, including HP Openview.
  • Memory-centric data management is used to accelerate analytics across a broad variety of industries, especially in such areas as planning, scenarios, customer analytics, and profitability analysis.

May 8, 2006

Memory-centric data management whitepaper

I have finally finished and uploaded the long-awaited white paper on memory-centric data management.

This is the project for which I origially coined the term “memory-centric data management,” after realizing that the prevalent “in-memory DBMS” creates all sorts of confusion about how and whether data persists on disk. The white paper clarifies and updates points I have been making about memory-centric data management since last summer. Sponsors included:

If there’s one area in my research I’m not 100% satisfied with, it may be the question of where the true hardware bottlenecks to memory-centric data management lie (it’s obvious that the bottleneck to disk-centric data management is random disk access). Is it processor interconnect (around 1 GB/sec)? Is it processor-to-cache connections (around 5 GB/sec)? My prior pronouncements, the main body of the white paper, and the Intel Q&A appendix to the white paper may actually have slightly different spins on these points.

And by the way — the current hard limit on RAM/board isn’t 2^64 bytes, but a “mere” 2^40. But don’t worry; it will be up to 2^48 long before anybody actually puts 256 gigabytes under the control of a single processor.

April 26, 2006

Solid/MySQL fit and positioning

I felt like writing a lot about the great potential fit between MySQL and Solid over the weekend, but Solid didn’t want me to do so. Now, however, I’m not in the mood, so I’ll just say that in OLTP, Solid’s technology is strong where MySQL’s is weak, and vice-versa. E.g., Solid is so proud of its zero-administration capabilities that, without MySQL, it doesn’t have much in the way of admin tools at all. Conversely, I think that many of those websites that crash all the time with MySQL errors would crash less with the Solid engine underneath. (Solid happens to be proud of its BLOB-handling capability, efficiency-wise.)

Neither outfit is good in data warehousing, or in text search, image search, etc. (Solid slings big files around, but it doesn’t peer closely inside them). But for OLTP of tabular or dumb media data, this looks like a great fit.

Whether anybody will care, however, is a different matter.

Lisa Vaas of eWeek offers a survey of the many MySQL engine options.

EDIT: Another Lisa Vaas article makes it clear that MySQL is planning to compete in data warehousing/OLAP as well.

April 22, 2006

More on Solid and MySQL?

In a stunningly self-defeating move, my friends at Solid have decided that anything about their already-leaked possible cooperation with MySQL is embargoed.

Indeed, they’ve emphasized to me multiple times that they do not wish me to write about it.

I shall honor their wishes. I hope they are pleased with the sophistication and insight of the coverage they receive from other sources.

April 17, 2006

MySQL gets the Solid engine

Solid and MySQL have struck a deal (and for some odd reason I had to find out about it from Slashdot and then here rather than from one one the companies). Apparently Solid will open source a version of its storage engine, to be used with the MySQL front-end.

Solid’s core technology is a lightweight, zero-administration OLTP RDBMS. And they really mean “zero-administration,” because as they like to point out, a typical deployment is embedded in a piece of telecom equipment that doesn’t even have a keyboard. Now, that doesn’t really mean the Solid engine would still be zero-administration in other applications, but sure aren’t talking about something as prickly as, say, Oracle.

That said, Solid’s technology has its limitations. It isn’t historically designed for the query load (volume or mix) of, say, an SAP installation. It certainly doesn’t have much in the way of data warehousing functionality. And it doesn’t have much in the way of administration tools itself (although presumably MySQL will fill that gap).

One very important aspect of the Solid technology is its hybrid memory-centric design. Much more on that soon. My white paper on memory-centric data management is finally close to publication, with Solid as a co-sponsor. At some point I’ll even do a webinar for them associated with the paper.

I don’t know whether that’s part of the MySQL relationship — it would be very cool if it were.

January 31, 2006

Computerworld on memory-centric data management

Computerworld recently ran an excellent story on memory-centric data management. The opening sentences show that correspondent Gary Anthes most definitely “gets it”:

Relational database management systems have become all but ubiquitous in enterprise computing since 1970, when they were first devised by E.F. Codd. But as powerful and flexible as those databases are, they’ve proved inadequate for a handful of ultrademanding applications that have to process hundreds or thousands of transactions per second and never go down.

I’m quoted in one of the sidebars, but with the core article being this good I didn’t really add much.

Incidentally, the article talked a lot about Oracle’s recently acquired TimesTen in-memory DBMS product, and also a fair amount about Streambase. This is complementary to my own research, which has focused more on the other leading memory-centric data management vendors.

January 31, 2006

DB2 Express-C

IBM announced the freeware version of DB2 today. I’ll post links to the details later, but I want to highlight a couple of interesting implications:

1. They define the cutoff between the free and paid version not by how big a database you can manage on disk, but rather by how much RAM the software can address. This supports my thesis that effective use of RAM is crucial to DBMS performance, and is corollary — specially optimized memory-centric data management products deserve a place in most large enterprises’ product portfolios.

2. Having a free version of DB2 lets one play with whatever features DB2 may have that simply aren’t available in other DBMS, to see if they’re worth using. And the most significant such feature, in my opinion, is native XML storage. Whatever else this product does or doesn’t accomplish, it may serve to speed adoption of IBM’s native XML server technology.

January 27, 2006

Detailed webinar on memory-centric technology

I did a webinar on memory-centric data management for Applix. It was the standard hour in length, but they had me do the vast majority of the talking, so I laid out my ideas in some detail.

In line with their business focus, I emphasized OLAP in general and MOLAP in particular. But I did have a chance to lay out pretty much the whole story.

There’s a lot of material in it I haven’t published yet in written form, and some nuances I may never get around to writing down. So if you’re sufficiently interested in the area, I recommend watching the webinar.

January 13, 2006

Memory-centric research — hear the latest!

What I’ve written so far in this blog (and in Computerworld) about memory-centric data management technology is just the tip of the iceberg. A detailed white paper is forthcoming, sponsored by most of the industry leaders: Applix, Progress, SAP, Intel (in association with SAP), and Solid. (But for some odd reason Oracle declined to participate …)

A lot of the material will be rolled out publically for the first time in a webinar on Wednesday, January 25, at 11 EST. Applix is the host. To participate, please follow this link.

I’m also holding forth online, in webinars and even video, on other subjects these days. More details may be found over in the Monash Report.

← Previous PageNext Page →

Feed: DBMS (database management system), DW (data warehousing), BI (business intelligence), and analytics technology Subscribe to the Monash Research feed via RSS or email:

Login

Search our blogs and white papers

Monash Research blogs

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.