April 7, 2012
Many kinds of memory-centric data management
I’m frequently asked to generalize in some way about in-memory or memory-centric data management. I can start:
- The desire for human real-time interactive response naturally leads to keeping data in RAM.
- Thanks to Moore’s Law, putting any given database into RAM keeps getting cheaper over time. (Most) traditional databases will eventually wind up in RAM.
- However, there will be exceptions, mainly on the machine-generated-data side. Where the cost of creating data falls at roughly the same rate as the cost of RAM, the overall cost of keeping that data in RAM may not decline much at all.
Getting more specific than that is hard, however, because:
- The possibilities for in-memory data storage are as numerous and varied as those for disk.
- The individual technologies and products for in-memory storage are much less mature than those for disk.
- Solid-state options such as flash just confuse things further.
Consider, for example, some of the in-memory data management ideas kicking around.
- In many cases there is essentially an in-memory DBMS, trying for as much ACIDity as RAM reasonably allows, then (usually) also copying data synchronously to persistent storage. These can have many different architectures. For example:
- SAP HANA is an in-memory columnar DBMS, with text indexing/inverted-list antecedents, except when it uses one of a couple of approaches to in-memory row-based data management.
- solidDB, now an IBM product, is an RDBMS that relies on Patricia tries. It is actually a hybrid memory/disk product, but optimized for in-memory operation.
- eXtremeDB is an OODBMS, but relies on B-trees.
- H-Store and its commercialization VoltDB are row-based RDBMS that make drastic assumptions about the nature of your workload, but in return get to drop much of the overhead other DBMS need.
- Oracle TimesTen is a row-based RDBMS, oriented to OLTP (OnLine Transaction Processing), which stores its data persistently via another RDBMS. (MySQL was the default choice before Oracle bought the company.)
- Oracle’s answer to SAP HANA is to take TimesTen and do analytics on it, via the Exalytics appliance.
- Some disk-based DBMS just happen to be architected in such a way that, for good performance, you’re going to want to keep all the data in RAM. Often, their in-memory architecture is a lot like their on-disk architecture, with memory mapping for I/O. This is done in very different kinds of DBMS.
- MongoDB is one visible example. In general, scale-out web databases (whether NoSQL or MySQL) often keep all their data in RAM, whether or not that plan is baked into the DBMS architecture.
- Various analytic DBMS vendors have at times been memory-oriented. At the moment, I think:
- Exasol (columnar) isn’t quite as extreme about wanting to be in-memory as it used to be.
- ParAccel (columnar) and its memory-mapped architecture can be happily used either in-memory or on disk.
- Kognitio (row-based), which used to be portrayed as a disk-based system that’s smart about using RAM, is currently being marketed as an in-memory system.
- My last technical briefing on Applix TM1 (now an IBM Cognos product) was in September, 2005. (The product itself dates back to 1984.) At the time TM1 had an interesting sparse MOLAP (Multi-Dimensional OnLine Analytic Processing) story, the point being that the system worked hard to isolate what was actually non-zero. Loading of raw data seemed to be batch, but you could update models with derived data, and there was a transaction log for confident persistence.
- Alternatively, you can use a caching layer that has no responsibility for managing data persistence, typically running on a separate set of servers from your DBMS. (A minimal sketch of this pattern appears right after the list.) For example:
- TimesTen and solidDB are used, respectively, as relational caches for Oracle and DB2.
- Peter Zencke told me years ago that SAP had a purpose-built caching layer that kept over 99% of requests from touching disk.
- The key-value store memcached is central to many of the world’s largest web sites, typically backed by a MySQL cluster.
- ScaleArc has a key-value cache that stores — rather than individual records — the entire TCP string sent by an RDBMS in response to a particular SQL query.
- Some systems manage data in memory in one kind of structure, then ensure persistence via a very different structure on disk. Examples include:
- Workday’s architecture — object-oriented in RAM, MySQL (really key-value) on disk. Edit: Workday thinks “key-value” is a slightly misleading way to put it. Stay tuned for more.
- Oracle Coherence (formerly Tangosol) — object-oriented in RAM, Oracle on disk. Edit: Actually, Coherence isn’t really a write-through ORM (Object-Relational Mapper). It functions more like memcached, albeit with a very different data model.
- Couchbase — memcached (key-value) in-memory, evolving from SQLite to CouchDB on disk.
- Similarly, business intelligence suites can manage data in-memory that comes from some other kind of data store (usually an RDBMS, sometimes Hadoop or whatever). I haven’t had a lot of luck in getting details, with one exception — QlikView, which uses a simple tabular data structure.
- Stream processors — i.e. CEP engines — are a whole other sort of in-memory engine, doing something that’s a lot like data management.
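To make the caching-layer pattern concrete, here is a minimal cache-aside sketch in Python. The in-process dict stands in for memcached and the SQLite table stands in for the backing MySQL cluster; the table, key format, and function names are invented for illustration and do not reflect any particular product’s API.

```python
# Minimal cache-aside sketch: a RAM-only cache in front of a store of record.
# The dict stands in for memcached; the sqlite3 table stands in for MySQL.
import sqlite3

db = sqlite3.connect(":memory:")  # stand-in for the persistent store of record
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users VALUES (1, 'Alice'), (2, 'Bob')")

cache = {}  # RAM-only cache; it has no responsibility for persistence

def get_user(user_id):
    key = f"user:{user_id}"
    if key in cache:          # cache hit: the DBMS is never touched
        return cache[key]
    row = db.execute(         # cache miss: fall through to the DBMS
        "SELECT name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    if row is not None:
        cache[key] = row[0]   # populate the cache for subsequent reads
        return row[0]
    return None

def update_user(user_id, name):
    # Writes go to the store of record; the cached entry is simply invalidated.
    db.execute("UPDATE users SET name = ? WHERE id = ?", (name, user_id))
    cache.pop(f"user:{user_id}", None)

print(get_user(1))          # miss, then served from cache on later calls
update_user(1, "Alicia")
print(get_user(1))          # miss again after invalidation, returns 'Alicia'
```

The point of the pattern is that durability stays entirely with the DBMS; the cache can be wiped or lost at any time, which is exactly the division of labor in the memcached-plus-MySQL deployments mentioned above.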
And that, kiddies, is why I hesitate to generalize in too much detail about “in-memory database management.”
Despite its length, this is still a very partial list of memory-centric data management approaches. I encourage you to use the comments to add examples that I might have left out.
Related link
- I did a simpler overview of memory-centric alternatives in 2005.
Comments
Really great overview of the approaches used in various products. An approach that I have seen used with success in (ad hoc) applications is to keep the raw facts on disk and to keep pre-aggregated datasets at various levels of granularity in memory. This allows “human real-time” slicing and dicing of very large datasets, because a lot of the grunt work has already been done. I am a little surprised that commercial products have not incorporated this idea.
It would be interesting to read Curt’s review of Microsoft’s Analysis Services Tabular Mode in-memory offering.
Very interesting post.
Regarding what you mention about TimesTen, my understanding is that Oracle Exalytics comes with a columnar version of TimesTen, making it more similar to SAP HANA, except that I’m unsure you can have columnar and row-based data in one single DB instance. Also, I’m unsure TimesTen has write capability in its columnar version; for that, Exalytics comes with an in-memory version of Essbase (however, given Essbase’s architecture, I’m unsure that it can provide sparse MOLAP).
Like Ian, I would be interested in your POV on xVelocity from Microsoft. Also, although I know you regard Actian Vectorwise with some caution, I would be interested in how you would position the way it uses CPU cache as execution memory. Would you consider it in-memory too?
Jean-Michel,
You’ve listed a few very frustrating subjects.
First, Oracle likes to imply it has columnar storage when it really just has columnar compression. I’ve heard nothing about TimesTen being an exception.
Second, Ingres has lied rather badly about the contents of my previous conversations with them, in an attempt to discredit my opinion of them. That puts them on a very short blacklist of companies I will not risk talking to, for fear of adding further fuel to the slanderous fire. (Basho is the other big example.)
Third, Microsoft is not in the habit of briefing me if they can help it. They evidently prefer analysts who are easier to control. Oracle takes a similar approach, but not to the same extreme.
Jean-Michel,
Curt is absolutely correct regarding the new Oracle TimesTen (v. 11.2.2) specifically for Exalytics. It includes analytic functions along with in-memory columnar compression (~5x). The compression is specified against individual columns or groups of columns. Although compressed columns take longer to load, they will theoretically be much faster to query (than uncompressed columns), and obviously, take up less disk space.
The idea is to load aggregates into TimesTen, leave base data at source, and use OBIEE’s vertical federation capability to seamlessly report across both (all transparent to the end-user).
Mark Rittman, of UK-based Rittman Mead Consulting, did a terrific job back in late February (http://bit.ly/HtFEWo) of providing a much deeper dive than Oracle does in any of its white papers for public viewing.
Edit: Never mind — I misread Michelle’s comment.
Michelle,
I just drew a distinction between columnar storage and columnar compression, and now you’re commenting as if they’re the same thing. 🙁
And Mark’s link doesn’t say anything about TimesTen having columnar storage.
Sybase, an SAP company, also has an in-memory database in its ASE product.
http://www.sybase.com/manage/in-memory-databases
It is also used in the Sybase RAP product as the RAP Cache.
Personal Actian/Ingres relationships aside, here is a theoretically interesting paper about the pre-VectorWise DBMS MonetDB breaking the “memory wall”:
http://oai.cwi.nl/oai/asset/13805/13805B.pdf
IBM Informix has an in-memory engine, Informix Warehouse Accelerator (IWA). It uses a hybrid columnar approach and is targeted at data analysis. The product details are at http://ibm.co/f0Hc2h and http://ibm.co/xQxaD2. IWA takes a snapshot of the data, compresses it, and runs queries on the compressed data, without decompression when possible. For details, see the references at the end of the paper at http://ibm.co/xQxaD2.
With respect to “real-time analytics,” and hence maintaining data in DRAM, it may be interesting to compare the rate of data growth with Moore’s Law.
Hence my point that some kinds of data are clearly destined to be held entirely in RAM, while other kinds will rely on cheaper storage for a long time to come.