April 7, 2012
Many kinds of memory-centric data management
I’m frequently asked to generalize in some way about in-memory or memory-centric data management. I can start:
- The desire for human real-time interactive response naturally leads to keeping data in RAM.
- Thanks to Moore’s Law, putting any given database into RAM keeps getting cheaper over time. (Most) traditional databases will eventually wind up in RAM.
- However, there will be exceptions, mainly on the machine-generated-data side. Where the cost of creating data falls at roughly the same rate as the cost of RAM, the overall cost of keeping that data in RAM may not decline much at all.
Getting more specific than that is hard, however, because:
- The possibilities for in-memory data storage are as numerous and varied as those for disk.
- The individual technologies and products for in-memory storage are much less mature than those for disk.
- Solid-state options such as flash just confuse things further.
Consider, for example, some of the in-memory data management ideas kicking around.
- In many cases there is essentially an in-memory DBMS, trying for as much ACIDity as RAM reasonably allows, then (usually) also copying data synchronously to persistent storage. These can have many different architectures. For example:
- SAP HANA is an in-memory columnar DBMS, with text indexing/inverted-list antecedents, except when it uses one of a couple of approaches to in-memory row-based data management.
- solidDB, now an IBM product, is an RDBMS that relies on Patricia tries. It is actually a hybrid memory/disk product, but optimized for in-memory operation.
- eXtremeDB is an OODBMS, but relies on B-trees.
- H-Store and its commercialization VoltDB are row-based RDBMS that make drastic assumptions about the nature of your workload, but in return get to drop much of the overhead other DBMS need.
- Oracle TimesTen is a row-based RDBMS, oriented to OLTP (OnLine Transaction Processing), which stores its data persistently via another RDBMS. (MySQL was the default choice before Oracle bought the company.)
- Oracle’s answer to SAP HANA is to take TimesTen and do analytics on it, via the Exalytics appliance.
- Some disk-based DBMS just happen to be architected in such a way that, for good performance, you’re going to want to keep all the data in RAM. Often, their in-memory architecture is a lot like their on-disk architecture, with memory mapping for I/O. This is done in very different kinds of DBMS.
- MongoDB is one visible example. In general, scale-out web databases (whether NoSQL or MySQL) often keep all their data in RAM, whether or not that plan is baked into the DBMS architecture.
- Various analytic DBMS vendors have at times been memory-oriented. At the moment, I think:
- Exasol (columnar) isn’t quite as extreme about wanting to be in-memory as it used to be.
- ParAccel (columnar) and its memory-mapped architecture can be happily used either in-memory or on disk.
- Kognitio (row-based), which used to be portrayed as a disk-based system that’s smart about using RAM, is currently being marketed as an in-memory system.
- My last technical briefing on Applix TM1 (now an IBM Cognos product) was in September, 2005. (The product itself dates back to 1984.) At the time TM1 had an interesting sparse MOLAP (Multi-Dimensional OnLine Analytic Processing) story, the point being that the system worked hard to isolate what was actually non-zero. Loading of raw data seemed to be batch, but you could update models with derived data, and there was a transaction log for confident persistence.
- Alternatively, you can use a caching layer that has no responsibility for managing data persistence, typically running on a separate set of servers from your DBMS. (A minimal sketch of this pattern appears right after the list.) For example:
- TimesTen and solidDB are used, respectively, as relational caches for Oracle and DB2.
- Peter Zencke told me years ago that SAP had a purpose-built caching layer that kept over 99% of requests from touching disk.
- The key-value store memcached is central to many of the world’s largest web sites, typically backed by a MySQL cluster.
- ScaleArc has a key-value cache that stores — rather than individual records — the entire TCP string sent by an RDBMS in response to a particular SQL query.
- Some systems manage data in memory in one kind of structure, then ensure persistence via a very different structure on disk. Examples include:
- Workday’s architecture — object-oriented in RAM, MySQL (really key-value) on disk. Edit: Workday thinks “key-value” is a slightly misleading way to put it. Stay tuned for more.
- Oracle Coherence (formerly Tangosol) — object-oriented in RAM, Oracle on disk. Edit: Actually, Coherence isn’t really a write-through ORM (Object-Relational Mapper). It functions more like memcached, albeit with a very different data model.
- Couchbase — memcached (key-value) in-memory, evolving from SQLite to CouchDB on disk.
- Similarly, business intelligence suites can manage data in-memory that comes from some other kind of data store (usually an RDBMS, sometimes Hadoop or whatever). I haven’t had a lot of luck in getting details, with one exception — QlikView, which uses a simple tabular data structure.
- Stream processors — i.e. CEP engines — are a whole other sort of in-memory engine, doing something that’s a lot like data management.
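To make the caching-layer pattern concrete, here is a minimal cache-aside sketch in Python. The in-process dict stands in for memcached and the SQLite table stands in for the backing MySQL cluster; the table, key format, and function names are invented for illustration and do not reflect any particular product’s API.

```python
# Minimal cache-aside sketch: a RAM-only cache in front of a store of record.
# The dict stands in for memcached; the sqlite3 table stands in for MySQL.
import sqlite3

db = sqlite3.connect(":memory:")  # stand-in for the persistent store of record
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users VALUES (1, 'Alice'), (2, 'Bob')")

cache = {}  # RAM-only cache; it has no responsibility for persistence

def get_user(user_id):
    key = f"user:{user_id}"
    if key in cache:          # cache hit: the DBMS is never touched
        return cache[key]
    row = db.execute(         # cache miss: fall through to the DBMS
        "SELECT name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    if row is not None:
        cache[key] = row[0]   # populate the cache for subsequent reads
        return row[0]
    return None

def update_user(user_id, name):
    # Writes go to the store of record; the cached entry is simply invalidated.
    db.execute("UPDATE users SET name = ? WHERE id = ?", (name, user_id))
    cache.pop(f"user:{user_id}", None)

print(get_user(1))          # miss, then served from cache on later calls
update_user(1, "Alicia")
print(get_user(1))          # miss again after invalidation, returns 'Alicia'
```

The point of the pattern is that durability stays entirely with the DBMS; the cache can be wiped or lost at any time, which is exactly the division of labor in the memcached-plus-MySQL deployments mentioned above.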
And that, kiddies, is why I hesitate to generalize in too much detail about “in-memory database management.”
Despite its length, this is still a very partial list of memory-centric data management approaches. I encourage you to use the comments to add examples that I might have left out.
Related link
- I did a simpler overview of memory-centric alternatives in 2005.
Comments
Really great overview of the approaches used in various products. An approach that I have seen used with success in (ad hoc) applications is to keep the raw facts on disk and to keep pre-aggregated datasets at various levels of granularity in memory. This allows “human real-time” slicing and dicing of very large datasets, because a lot of the grunt work has already been done. I am a little surprised that commercial products have not incorporated this idea.
It would be interesting to read Curt’s review of Microsoft’s Analysis Services Tabular Mode in-memory offering.
Very interesting post.
Regarding what you mention about TimesTen, my understanding is that Oracle Exalytics comes with a columnar version of TimesTen, making it more similar to SAP HANA, except that I’m unsure you can have columnar and row-based data in one single DB instance. Also, I’m unsure TimesTen has write capability in its columnar version; for that, Exalytics comes with an in-memory version of Essbase (however, given Essbase’s architecture, I’m unsure that it can provide sparse MOLAP).
Like Ian, I would be interested in your POV on xVelocity from Microsoft. Also, although I know you regard Actian Vectorwise with some caution, I would be interested in how you would position the way it uses CPU cache as execution memory. Would you consider it in-memory too?
Jean-Michel,
You’ve listed a few very frustrating subjects.
First, Oracle likes to imply it has columnar storage when it really just has columnar compression. I’ve heard nothing about TimesTen being an exception.
Second, Ingres has lied rather badly about the contents of my previous conversations with them, in an attempt to discredit my opinion of them. That puts them on a very short blacklist of companies I will not risk talking to, for fear of adding further fuel to the slanderous fire. (Basho is the other big example.)
Third, Microsoft is not in the habit of briefing me if they can help it. They evidently prefer analysts who are easier to control. Oracle takes a similar approach, but not to the same extreme.
Jean-Michel,
Curt is absolutely correct regarding the new Oracle TimesTen (v. 11.2.2) specifically for Exalytics. It includes analytic functions along with in-memory columnar compression (~5x). The compression is specified against individual columns or groups of columns. Although compressed columns take longer to load, they will theoretically be much faster to query (than uncompressed columns), and obviously, take up less disk space.
The idea is to load aggregates into TimesTen, leave base data at source, and use OBIEE’s vertical federation capability to seamlessly report across both (all transparent to the end-user).
Mark Rittman, of UK-based Rittman Mead Consulting, did a terrific job back in late February (http://bit.ly/HtFEWo) of providing a much deeper dive than Oracle does in any of its white papers for public viewing.
Edit: Never mind — I misread Michelle’s comment.
Michelle,
I just drew a distinction between columnar storage and columnar compression, and now you’re commenting as if they’re the same thing. 🙁
And Mark’s link doesn’t say anything about TimesTen having columnar storage.
Sybase, an SAP company, also has an in-memory database in its ASE product.
http://www.sybase.com/manage/in-memory-databases
It is also used in the Sybase RAP product as the RAP Cache.
Personal Actian/Ingres relationships aside, here is a theoretically interesting paper about the pre-VectorWise DBMS MonetDB breaking the “memory wall”:
http://oai.cwi.nl/oai/asset/13805/13805B.pdf
IBM Informix has an in-memory engine, Informix Warehouse Accelerator (IWA). It uses a hybrid columnar approach and is targeted at data analysis. The product details are at http://ibm.co/f0Hc2h and http://ibm.co/xQxaD2. IWA takes a snapshot of the data, compresses it, and runs queries on the compressed data, without decompression when possible. For details, see the references at the end of the paper at http://ibm.co/xQxaD2.
With respect to “real-time analytics,” and hence maintaining data in DRAM, it may be interesting to compare the rate of data growth with Moore’s Law.
Hence my point that some kinds of data are clearly destined to be held entirely in RAM, while other kinds will rely on cheaper storage for a long time to come.