Memory-centric data management
Analysis of technologies that manage data entirely or primarily in random-access memory (RAM). Related subjects include:
- Oracle TimesTen
- solidDB
- QlikTech
- SAP’s BI Accelerator
- Exasol
- Solid-state memory as a replacement for disk
Data warehouse appliances
If we define a “data warehouse appliance” as “a special-purpose computer system, with appliance administrability, that manages a data warehouse,” then there are two major contenders: Netezza and DATAllegro, both startups, both with a small number of disclosed customers. Past contenders would include Teradata and White Cross (which seems to have just merged into Kognitio), but neither would admit to being in that market today. (I suspect this is a mistake on Teradata’s part, but so be it.) IBM with DB2 on the z-Series wouldn’t be properly regarded as an appliance player either, although IBM is certainly conscious of appliance competition. And SAP’s BI Accelerator does not persist data at this time.
In principle, the Netezza and DATAllegro stories are similar — take an established open source RDBMS*, build optimized hardware to run it, and optimize the software configuration as well. Much of the optimization is focused on getting data on and off disk sequentially, minimizing any random accesses. This is why I often refer to data warehouse appliances as being the best alternative to memory-centric data management. Beyond that, the optimizations by the two vendors differ considerably.
*Netezza uses PostgreSQL; DATAllegro uses Ingres.
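To see why sequential disk access matters so much, here is a rough back-of-the-envelope sketch. Every number in it is an assumption chosen for illustration (an 8 KB page, about 10 ms per random page read, about 50 MB/sec of sustained sequential throughput); none of them comes from Netezza or DATAllegro.

```python
# Back-of-the-envelope model: read 1 GiB from disk page by page at random,
# versus in one sequential pass. All constants are illustrative assumptions.
PAGE_BYTES = 8 * 1024                 # assumed page size
RANDOM_READ_MS = 10                   # assumed cost of one random page read
SEQ_BYTES_PER_S = 50 * 1024 * 1024    # assumed sequential throughput


def random_read_seconds(total_bytes: int) -> float:
    """Time to fetch every page with a separate random read."""
    pages = -(-total_bytes // PAGE_BYTES)  # ceiling division
    return pages * RANDOM_READ_MS / 1000


def sequential_read_seconds(total_bytes: int) -> float:
    """Time to stream the same bytes in a single sequential scan."""
    return total_bytes / SEQ_BYTES_PER_S


one_gib = 1024 ** 3
print(f"random:     {random_read_seconds(one_gib):.2f} s")
print(f"sequential: {sequential_read_seconds(one_gib):.2f} s")
```

Under these assumed figures the random-access path takes about 1,311 seconds and the sequential scan about 20 seconds. The exact ratio depends entirely on the hardware, but the gap is the reason the appliance vendors work so hard to avoid random I/O.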
Hmm. I don’t feel like writing more on this subject at this very moment, yet I want to post something urgently because there’s an IOU in my Computerworld column today for it. OK. More later.
Categories: Actian and Ingres, Companies and products, Data warehouse appliances, DATAllegro, DBMS product categories, IBM and DB2, Memory-centric data management, Open source, SAP AG | Leave a Comment |
White paper on memory-centric data management — excerpt
Here’s an excerpt from the introduction to my new white paper on memory-centric data management. I don’t know why WordPress insists on showing the table gridlines, but I won’t try to fix that now. Anyhow, if you’re interested enough to read most of this excerpt, I strongly suggest downloading the full paper.
Introduction
Conventional DBMS don’t always perform adequately.
Ideally, IT managers would never need to think about the details of data management technology. Market-leading, general-purpose DBMS (DataBase Management Systems) would do a great job of meeting all information management needs. But we don’t live in an ideal world. Even after decades of great technical advances, conventional DBMS still can’t give your users all the information they need, when and where they need it, at acceptable cost. As a result, specialty data management products continue to be needed, filling the gaps where more general DBMS don’t do an adequate job.
Memory-centric technology is a powerful alternative.
One category on the upswing is memory-centric data management technology. While conventional DBMS are designed to get data on and off disk quickly, memory-centric products (which may or may not be full DBMS) assume all the data is in RAM in the first place. The implications of this design choice can be profound. RAM access speeds are up to 1,000,000 times faster than random reads on disk. Consequently, whole new classes of data access methods can be used when the disk speed bottleneck is ignored. Sequential access is much faster in RAM, too, allowing yet another group of efficient data access approaches to be implemented.
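A quick sanity check on that ratio, as a sketch only: the two latencies below are round-number assumptions (about 10 ms for a random disk read, about 10 ns for a RAM access at the optimistic end), not measurements from the paper.

```python
# Illustrative latencies in nanoseconds; both values are assumptions.
DISK_RANDOM_READ_NS = 10_000_000   # ~10 ms: seek plus rotational delay
RAM_ACCESS_NS = 10                 # ~10 ns: optimistic RAM access


def speedup(slow_ns: int, fast_ns: int) -> int:
    """How many times faster the fast access is than the slow one."""
    return slow_ns // fast_ns


def accesses_per_second(latency_ns: int) -> int:
    """Serial accesses that fit into one second at the given latency."""
    return 1_000_000_000 // latency_ns


print("RAM vs. random disk read:", speedup(DISK_RANDOM_READ_NS, RAM_ACCESS_NS))
print("random disk reads/sec:", accesses_per_second(DISK_RANDOM_READ_NS))
print("RAM accesses/sec:", accesses_per_second(RAM_ACCESS_NS))
```

With these assumptions a single disk arm manages on the order of 100 random reads per second, while RAM handles on the order of 100 million accesses in the same time, which is where the “up to 1,000,000 times” figure comes from.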
It does things disk-based systems can’t.
If you want to query a used-book database a million times a minute, that’s hard to do in a standard relational DBMS. But Progress’ ObjectStore gets it done for Amazon. If you want to recalculate a set of OLAP (OnLine Analytic Processing) cubes in real-time, don’t look to a disk-based system of any kind. But Applix’s TM1 can do just that. And if you want to stick DBMS instances on 99 nodes of a telecom network, all persisting data to a 100th node, a disk-centric system isn’t your best choice – but Solid’s BoostEngine should get the job done.
Memory-centric data managers fill the gap, in various guises.
Those products are some leading examples of a diverse group of specialist memory-centric data management products. Such products can be optimized for OLAP or OLTP (OnLine Transaction Processing) or event-stream processing. They may be positioned as DBMS, quasi-DBMS, BI (Business Intelligence) features, or some utterly new kind of middleware. They may come from top-tier software vendors or from the rawest of startups. But they all share a common design philosophy: Optimize the use of ever-faster semiconductors, rather than focusing on (relatively) slow-spinning disks.
They have a rich variety of benefits.
For any technology that radically improves price/performance (or any other measure of IT efficiency), the benefits can be found in three main categories:
For memory-centric data management, the “things that you couldn’t do before at all” are concentrated in areas that are highly real-time or that use non-relational data structures. Conversely, for many relational and/or OLTP apps, memory-centric technology is essentially a much cheaper/better/faster way of doing what you were already struggling through all along.
Memory-centric technology has many applications.
Through both OEM and direct purchases, many enterprises have already adopted memory-centric technology. For example:
Categories: Data types, Memory-centric data management, MOLAP, Object, OLTP, Open source, Progress, Apama, and DataDirect | 3 Comments |
Memory-centric data management whitepaper
I have finally finished and uploaded the long-awaited white paper on memory-centric data management.
This is the project for which I originally coined the term “memory-centric data management,” after realizing that the prevalent “in-memory DBMS” creates all sorts of confusion about how and whether data persists on disk. The white paper clarifies and updates points I have been making about memory-centric data management since last summer. Sponsors included:
- Applix, vendors of in-memory/memory-centric MOLAP tool TM1
- Progress Software, vendors of ObjectStore, an OODBMS that has more impressive references in-memory or otherwise memory-centric than it does in classical disk-based configurations, and also of the Apama stream processing products
- SAP, vendors of the BI Accelerator functionality of SAP NetWeaver, or whatever tortured name they want to give it this month — basically, that’s a very cool in-memory columnar data mart technology
- Solid Information Technology, vendor of hybrid in-memory/disk-based OLTP RDBMS. Historically focused on the embedded systems market, especially telecom and networking, they’ve recently been in the news because of a deal with MySQL that is designed to extend their reach.
- Intel, makers of the processors used to run a lot of the other sponsors’ products (including all BI Accelerator installations to date).
If there’s one area in my research I’m not 100% satisfied with, it may be the question of where the true hardware bottlenecks to memory-centric data management lie (it’s obvious that the bottleneck to disk-centric data management is random disk access). Is it processor interconnect (around 1 GB/sec)? Is it processor-to-cache connections (around 5 GB/sec)? My prior pronouncements, the main body of the white paper, and the Intel Q&A appendix to the white paper may actually have slightly different spins on these points.
And by the way — the current hard limit on RAM/board isn’t 2^64 bytes, but a “mere” 2^40. But don’t worry; it will be up to 2^48 long before anybody actually puts 256 gigabytes under the control of a single processor.
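For readers who don’t carry powers of two around in their heads, here is a small sketch that converts those address-space limits into binary units. The helper name `human` is my own, purely for illustration.

```python
# Convert exact power-of-two byte counts into binary units (KiB, MiB, ...).
UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]


def human(n_bytes: int) -> str:
    """Render a byte count in the largest binary unit that fits (hypothetical helper)."""
    idx = 0
    while n_bytes >= 1024 and idx < len(UNITS) - 1:
        n_bytes //= 1024
        idx += 1
    return f"{n_bytes} {UNITS[idx]}"


for bits in (40, 48, 64):
    print(f"2^{bits} bytes = {human(2 ** bits)}")

# For comparison: 256 gigabytes, taken here as 256 * 2^30 bytes, is 2^38 bytes.
print("256 GiB is 2^38 bytes:", 256 * 2 ** 30 == 2 ** 38)
```

So 2^40 bytes is 1 TiB, 2^48 is 256 TiB, and 2^64 is 16 EiB. A 256-gigabyte configuration, at 2^38 bytes, fits under even the current 2^40 limit.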
Categories: Cognos, Companies and products, In-memory DBMS, Intel, Memory-centric data management, MOLAP, Open source, Progress, Apama, and DataDirect, SAP AG, solidDB | 2 Comments |
Solid/MySQL fit and positioning
I felt like writing a lot about the great potential fit between MySQL and Solid over the weekend, but Solid didn’t want me to do so. Now, however, I’m not in the mood, so I’ll just say that in OLTP, Solid’s technology is strong where MySQL’s is weak, and vice-versa. E.g., Solid is so proud of its zero-administration capabilities that, without MySQL, it doesn’t have much in the way of admin tools at all. Conversely, I think that many of those websites that crash all the time with MySQL errors would crash less with the Solid engine underneath. (Solid happens to be proud of its BLOB-handling capability, efficiency-wise.)
Neither outfit is good in data warehousing, or in text search, image search, etc. (Solid slings big files around, but it doesn’t peer closely inside them). But for OLTP of tabular or dumb media data, this looks like a great fit.
Whether anybody will care, however, is a different matter.
Lisa Vaas of eWeek offers a survey of the many MySQL engine options.
EDIT: Another Lisa Vaas article makes it clear that MySQL is planning to compete in data warehousing/OLAP as well.
Categories: Memory-centric data management, Mid-range, MySQL, OLTP, Open source, solidDB | 4 Comments |
More on Solid and MySQL?
In a stunningly self-defeating move, my friends at Solid have decided that anything about their already-leaked possible cooperation with MySQL is embargoed.
Indeed, they’ve emphasized to me multiple times that they do not wish me to write about it.
I shall honor their wishes. I hope they are pleased with the sophistication and insight of the coverage they receive from other sources.
Categories: Memory-centric data management, Mid-range, MySQL, OLTP, Open source, solidDB | 3 Comments |
MySQL gets the Solid engine
Solid and MySQL have struck a deal (and for some odd reason I had to find out about it from Slashdot and then here rather than from one of the companies). Apparently Solid will open source a version of its storage engine, to be used with the MySQL front-end.
Solid’s core technology is a lightweight, zero-administration OLTP RDBMS. And they really mean “zero-administration,” because as they like to point out, a typical deployment is embedded in a piece of telecom equipment that doesn’t even have a keyboard. Now, that doesn’t really mean the Solid engine would still be zero-administration in other applications, but we sure aren’t talking about something as prickly as, say, Oracle.
That said, Solid’s technology has its limitations. It isn’t historically designed for the query load (volume or mix) of, say, an SAP installation. It certainly doesn’t have much in the way of data warehousing functionality. And it doesn’t have much in the way of administration tools itself (although presumably MySQL will fill that gap).
One very important aspect of the Solid technology is its hybrid memory-centric design. Much more on that soon. My white paper on memory-centric data management is finally close to publication, with Solid as a co-sponsor. At some point I’ll even do a webinar for them associated with the paper.
I don’t know whether that’s part of the MySQL relationship — it would be very cool if it were.
Categories: Memory-centric data management, Mid-range, MySQL, OLTP, Open source, solidDB | 2 Comments |
Computerworld on memory-centric data management
Computerworld recently ran an excellent story on memory-centric data management. The opening sentences show that correspondent Gary Anthes most definitely “gets it”:
Relational database management systems have become all but ubiquitous in enterprise computing since 1970, when they were first devised by E.F. Codd. But as powerful and flexible as those databases are, they’ve proved inadequate for a handful of ultrademanding applications that have to process hundreds or thousands of transactions per second and never go down.
I’m quoted in one of the sidebars, but with the core article being this good I didn’t really add much.
Incidentally, the article talked a lot about Oracle’s recently acquired TimesTen in-memory DBMS product, and also a fair amount about Streambase. This is complementary to my own research, which has focused more on the other leading memory-centric data management vendors.
Categories: Memory-centric data management, Oracle | Leave a Comment |
DB2 Express-C
IBM announced the freeware version of DB2 today. I’ll post links to the details later, but I want to highlight a couple of interesting implications:
1. They define the cutoff between the free and paid version not by how big a database you can manage on disk, but rather by how much RAM the software can address. This supports my thesis that effective use of RAM is crucial to DBMS performance, and its corollary: specially optimized memory-centric data management products deserve a place in most large enterprises’ product portfolios.
2. Having a free version of DB2 lets one play with whatever features DB2 may have that simply aren’t available in other DBMS, to see if they’re worth using. And the most significant such feature, in my opinion, is native XML storage. Whatever else this product does or doesn’t accomplish, it may serve to speed adoption of IBM’s native XML server technology.
Categories: IBM and DB2, Memory-centric data management, Mid-range, OLTP, Structured documents | Leave a Comment |
Detailed webinar on memory-centric technology
I did a webinar on memory-centric data management for Applix. It was the standard hour in length, but they had me do the vast majority of the talking, so I laid out my ideas in some detail.
In line with their business focus, I emphasized OLAP in general and MOLAP in particular. But I did have a chance to lay out pretty much the whole story.
There’s a lot of material in it I haven’t published yet in written form, and some nuances I may never get around to writing down. So if you’re sufficiently interested in the area, I recommend watching the webinar.
Categories: Memory-centric data management, MOLAP | Leave a Comment |
Memory-centric research — hear the latest!
What I’ve written so far in this blog (and in Computerworld) about memory-centric data management technology is just the tip of the iceberg. A detailed white paper is forthcoming, sponsored by most of the industry leaders: Applix, Progress, SAP, Intel (in association with SAP), and Solid. (But for some odd reason Oracle declined to participate …)
A lot of the material will be rolled out publicly for the first time in a webinar on Wednesday, January 25, at 11 EST. Applix is the host. To participate, please follow this link.
I’m also holding forth online, in webinars and even video, on other subjects these days. More details may be found over in the Monash Report.
Categories: Memory-centric data management, MOLAP | Leave a Comment |