Traditional databases will eventually wind up in RAM
In January 2010, I posited that it might be helpful to view data as being divided into three categories:
- Human/Tabular data — i.e., human-generated data that fits well into relational tables or arrays.
- Human/Nontabular data — i.e., all other data generated by humans.
- Machine-Generated data.
I won’t now stand by every nuance in that post, which may differ slightly from those in my more recent posts about machine-generated data and poly-structured databases. But one general idea is hard to dispute:
Traditional database data — records of human transactional activity, referred to above as “Human/Tabular data” — will not grow as fast as Moore’s Law makes computer chips cheaper.
And that point has a straightforward corollary, namely:
It will become ever more affordable to put traditional database data entirely into RAM.
Actually, there are numerous ways for OLTP, other short-request, and some analytic databases to wind up in RAM.
- SAP has some good ideas for how it could happen, banging transactions into what is essentially an in-memory analytic database. (I dispute SAP’s claims of transformational database technology leadership, but that doesn’t mean the underlying ideas aren’t good.)
- For those who can afford the associated technology disruption, memory-centric object-oriented DBMS could be appealing.
- Web scalability best practices commonly include keeping data in RAM (e.g., that’s pretty much the point of the memcached caching layer).
- SaaS (Software as a Service) companies — such as Workday — often bring a particular tenant’s database entirely into RAM.
- QlikView highlights the benefits of doing business intelligence in RAM.
- SAS HPA makes the argument that even “big data analytics” should sometimes be done in RAM.
- I don’t have particularly favorable opinions at this time about marketing strategies or momentum at Oracle TimesTen, IBM solidDB, or VoltDB, but those examples at least serve to illustrate that memory-centric OLTP DBMS have existed for years.
- Actually, SAP has at least two good ideas, if you count Sybase as part of SAP.
And here’s the kicker: Intel told me last year that CPUs are headed to 46-bit address spaces around mid-decade. Indeed, they hired me to help figure out if that was enough.* That multiplies out to 64 terabytes of RAM on a single server, chip costs permitting. So most of what we now think of as operational databases — and many of the analytic ones too — will fit in-memory, even if they run very large businesses.
*And did so without putting the discussion under any kind of NDA.
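The arithmetic behind that 64-terabyte figure is simply the size of a 46-bit address space:

```python
# A 46-bit physical address space can address 2**46 distinct bytes.
addressable_bytes = 2 ** 46
terabyte = 2 ** 40            # binary terabyte
print(addressable_bytes // terabyte)  # 64
```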
Likely consequences of all this include:
- Legacy apps will (eventually) be consolidated and virtualized in-memory. Their underlying databases will grow so slowly that eventually the cost of putting them in RAM will be too low to worry about.
- Expensive storage systems will (continue to) be irrelevant to database processing. Databases that don’t fit in RAM will typically be big enough to require the attention of a lot of CPUs — and in those cases the DBMS software itself will handle all the storage tasks.
- Major OLTP DBMS vendors, such as Oracle, will need alternate in-memory code lines, because disk-centric architectures are sub-optimal in-memory. Well, that’s what they have those big R&D budgets for.
- SaaS vendors and web businesses may not rely on today’s major OLTP DBMS vendors. (I was going to say “won’t” rather than “may not” until I recalled the likely M&A endgame.) Traditional enterprises may blanch at migrating away from their legacy DBMS environments, but the trade-offs are different for technology companies using DBMS as subsystems.
Of course, the same trends that make data-storing chips cheaper will make data-generating chips cheaper too. So, just as there are huge amounts of machine-generated data that you’d never pay to store in RAM, the same will still be true 10 years from now; the data volumes involved will just be a lot bigger. And thus there will still be plenty of very large analytic databases using relatively cheap forms of storage, perhaps even disk.
But OLTP and other short-request processing are likely to wind up in-memory. And the same may be true for a considerable amount of analytics, especially but not only if the analytics have a low-latency requirement.
Comments
Hi
I think this is neglecting a big change in hardware that may be coming, wherein RAM-speed non-volatile storage might mean an entire architecture change.
It’s called Phase Change Memory, and Intel at least is betting big on it.
http://en.wikipedia.org/wiki/Phase-change_memory
http://www.technologyreview.com/computing/20148/
These list it as a replacement for flash, but:
http://www.micron.com/innovations/pcm.html
There used to be a site http://www.numonyx.com/ which listed a collaboration between Intel and Micron.
The point is, whether this is the breakthrough itself or not, I think the current state of affairs will not carry on; volatile RAM will be replaced by almost-as-fast NON-volatile RAM of some sort.
Yep, there is another trend in the industry: new non-volatile RAM technologies are coming to market. This makes the future prospects of in-memory data processing even more solid.
Fair point as to whether new solid-state memory developments are relevant. I’m inclined to say:
1. No matter what happens, there will long be major speed or latency differences between various kinds of storage. The level of cache closest to the CPU will be much faster than the storage that holds most of the data. DBMS design will need to optimize around that.
2. More important, however, is the speed or latency of the SLOWEST storage involved in low-latency response. If you can afford to make that be at RAM speeds, very different DBMS architectures seem optimal than those that are optimal in a disk-centric world.
3. Flash memory and its speeds live in a kind of muddled middle. If it were always going to be here, it would itself trigger major DBMS redesign. But while there’s certainly been some good flash-oriented engineering — specifically at a few appliance vendors or former appliance vendors — I think there’s a lot of wait-and-see as to whether PCM, racetrack memory, or whatever else will leapfrog flash.
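To put rough numbers on the tiering gap in points 1 and 2 (these are widely cited ballpark figures, illustrative orders of magnitude only; actual latencies vary by hardware generation):

```python
# Rough ballpark latencies for the storage hierarchy, in nanoseconds.
# Orders of magnitude only -- real figures depend heavily on the hardware.
latency_ns = {
    "L1 cache": 1,
    "DRAM": 100,
    "flash/SSD read": 100_000,
    "disk seek": 10_000_000,
}

# The point about the SLOWEST tier: if a disk seek sits anywhere in the
# low-latency path, it dominates everything above it by orders of magnitude.
print(latency_ns["disk seek"] // latency_ns["DRAM"])  # 100000
```

Flash’s “muddled middle” position is visible in the table: it is roughly a thousand times slower than DRAM and roughly a hundred times faster than disk.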
This suggests that it is best to have human-tabular data stored separately from machine-generated data. Is that driven by the preferred DBMS technology and volumes, or do you see them as being gathered for different purposes such that they naturally stay independent?
Gary,
Primarily the former. There are plenty of use cases in which you’d copy a (relatively speaking) small set of human-generated data into a data warehouse that holds mainly machine-generated stuff. Conversely, you might extract/subset/derive data from a large machine-generated set to load into a warehouse operating on a human-generated-data scale (that already happens in a large fraction of web businesses). Or not just a warehouse; it could be more of an operational system. And that’s even without telling federation stories and so on.
[…] skepticism about specialized storage hardware for database applications applies in part but not in whole to […]
[…] one of the best sources is Curt Monash’s DBMS2 blog. Recently he posted an article called Traditional Databases will eventually wind up in RAM. I have two comments about his points from that […]
I was sufficiently inspired by this post to write a whole blog post about it: see http://danweinreb.org/blog/what-are-human-generated-data-and-in-ram-databases
Secondary storage being as fast as RAM does not help if your DBMS architecture is based on moving data from secondary storage (disk) to primary storage (RAM) before operating on it.
So even if your disk were faster than RAM, a traditional DBMS would still be much slower than a memory-centric database. My claim is easy to validate: move your “favorite-db-name-goes-here” to a RAM disk and benchmark it.
A simple example: if you have CPU addressability to your database storage, you would not copy it to RAM before scanning for a value.
While a naive conclusion would be to simply rewrite such code, the cruel reality is that the complexity of any modern DBMS (due, amongst other things, to concurrent reads and writes) makes a total rewrite the much easier path.
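The contrast this comment describes can be shown with a toy sketch (all names hypothetical): a disk-centric engine copies pages into a buffer pool before scanning them, while a memory-centric engine scans directly addressable data in place.

```python
# Toy contrast, hypothetical names: buffer-pool copying vs. in-place scans.
PAGE_SIZE = 4

def scan_via_buffer_pool(storage, target):
    """Disk-centric style: copy each page into a buffer, then scan the copy."""
    hits = 0
    for i in range(0, len(storage), PAGE_SIZE):
        page_buffer = list(storage[i:i + PAGE_SIZE])  # the copy a buffer pool implies
        hits += page_buffer.count(target)
    return hits

def scan_in_place(storage, target):
    """Memory-centric style: the data is directly addressable, so just scan it."""
    return storage.count(target)

data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
assert scan_via_buffer_pool(data, 5) == scan_in_place(data, 5) == 3
```

Both produce the same answer; the first pays for page-sized copies on every access, which is wasted work when the data already lives in RAM.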
Back to Mark Stacey’s comment, I believe that another important HW based trend is being ignored.
DRAM- and SSD-based [aka fast-media] storage arrays are changing the way DBMSs work. While for years DBMSs have done their best to avoid accessing storage arrays, due to the high latencies resulting from rotating disks, the new generation of fast-media storage arrays enables fast access to data. Though it is still not (and never will be) as fast as accessing DRAM on a server, it enables millions of IOPS, high throughput, and sub-millisecond latency from a shared resource which is persistent and highly available.
I believe that in the near future we will see more and more DBMSs utilizing these new I/O performance capabilities, leaving RAM-centric DBMS only to very exotic applications (like the ones used on trading floors, for example).
Arik,
We’ll see. DBMS vendors will definitely adapt to the fact that persistent storage can have a BROAD range of latencies. And there surely will be a marketing (and product management) strain of “Oh, our old disk-bound stuff is plenty fast enough when you don’t actually have to be bound by disks.” And there definitely is interest in direct-attached and/or custom-appliance use of solid-state memory.
But it may be that, despite various details being very different, from a qualitative design standpoint, running against homogeneous solid-state storage arrays is a lot like running against disk arrays.
As Daniel Weinreb discussed in his blog response to your article, the question of going to 100% in-memory gets a lot more tricky if you need to ensure data don’t vaporize on failures. “Legacy” applications tend to have kind of annoying requirements in this regard.
Economics for capable SSDs are starting to compare very favorably to disk and can be adopted without radical changes to current DBMS architectures. In the MySQL community at least much of the performance focus has been in this area though of course more memory is not bad either.
Finally, what do you mean by “expensive” storage systems? If that means NetApp, EMC, and the like, I would not count them out by any means. They have a lot of very useful features like snapshots, cross-site replication, de-duplication, and centralized management that will keep them relevant for anyone doing transactional processing. However, perhaps you had something else in mind?
Yep, Robert, I meant EMC, NetApp, and the like. All the features you list are ones I’d like to see and indeed expect to see in the DBMS itself. (Well, the Delphix approach to database de-dupe is potentially pretty interesting, but that’s not the province of storage hardware.)
As for “100% in-memory”, obviously you log to persistent storage.
And finally — there’s no doubt that solid-state memory will push out many uses of disk. What I’m wondering is whether the industry will stop there, or go all the way to RAM. (It is of course possible that this distinction is somehow obviated before we ever have a chance to get to that point.)
@Curt, Not that I’m trying to avoid work, but adding sophisticated disk management functions seems to add a lot of complexity to DBMS implementations. Many DBMSs in the 1990s supported functions like mirroring but dropped/de-emphasized them as it became apparent RAID hardware did a better job. Storage technology is changing very rapidly, so it does not seem like a good time for DBMSs to take on advanced storage management again.
My $0.02 anyway. 🙂
It’s also worth noting in this context that Kognitio’s appliance offering is moving very close to an ‘everything in RAM’ approach. The latest specs on their site quote RAM:disk ratios of either 1:4 or 1:8.
Interestingly, the only Kognitio customer I’ve personally spoken to told me that their actual user data was somewhat less than the total system RAM.
I have great trouble keeping up with Kognitio’s shifting strategies. They so hate appliances that they circulate a letter saying Netezza shouldn’t have gone public. They sell appliances. They’re data-as-a-service. No, they sell products. It’s bewildering.
That said, http://www.dbms2.com/2008/12/14/kognitio-and-wx-2-update/ is supportive of your view. 🙂
[…] Last but certainly not least, we share the sentiment that traditional databases will eventually end up in RAM , because RAM is two-to-three orders of magnitude faster than disk. This storage would be more […]
Lots of useful information – thanks for putting it together.
I’m quite surprised that I didn’t run into it earlier.
Anyway, I thought that this could be another useful resource:
Memory is the New Disk for the Enterprise
http://natishalom.typepad.com/nati_shaloms_blog/2010/03/memory-is-the-new-disk-for-the-enterprise.html
[…] We share Curt Monash’s sentiment that traditional databases will eventually end up in RAM , as memory costs continue to fall. In-memory analytics are popular because they are fast, often […]
[…] from disk to SSD, others have observed that many traditional, relational databases will soon be entirely in memory. This is particularly true for applications that require repeated, fast access to a full set of […]
[…] I maintain my opinion that traditional databases will eventually wind up in RAM. […]
[…] argued back in 2011 that traditional databases will wind up in RAM, basically because […]
[…] can make a good stream processing system, as Curt Monash points out in his post traditional databases will eventually end up in RAM. An example of how this can work in the context of real-time analytics for Big Data is provided […]
[…] On DB’s in RAM: Traditional databases will eventually wind up in RAM: […]