Streaming and complex event processing (CEP)
Discussion of complex event processing (CEP), aka event processing or stream processing – i.e., of technology that executes queries before data is ever stored on disk. Related subjects include:
Coral8
StreamBase
Truviso
Progress Apama
More on stream processing integration with disk-based DBMS
Mike Stonebraker wrote in with one “nit pick” about yesterday’s blog. I had credited Truviso for strong DBMS/stream processor integration. He shot back that StreamBase has Sleepycat integrated in-process. He further pointed out that a Sleepycat record lookup takes only 5 microseconds if the data is in cache. Assuming what he means is that it’s in Sleepycat’s cache, that would be tight integration indeed.
I wonder whether StreamBase will indefinitely rely on Sleepycat, which is of course now an Oracle product …
Categories: Memory-centric data management, Michael Stonebraker, Oracle, StreamBase, Streaming and complex event processing (CEP) | Leave a Comment |
Mike Stonebraker on financial stream processing
After my call with Truviso and blog post referencing same, I had the chance to discuss stream processing with Mike Stonebraker, who among his many other distinctions is also StreamBase’s Founder/CTO. We focused almost exclusively on the financial trading market. Here are some of the highlights. Read more
Categories: Memory-centric data management, Michael Stonebraker, StreamBase, Streaming and complex event processing (CEP), Truviso | Leave a Comment |
StreamBase and Truviso
StreamBase is a decently-established startup, possibly the largest company in its area. Truviso, in the process of changing its name from Amalgamated Insight, has a dozen employees, one referenceable customer, and a product not yet in general availability. Both have ambitious plans for conquering the world, based on similar stories. And the stories make a considerable amount of sense.
Both companies’ core product is a memory-centric SQL engine designed to execute queries without ever writing data to disk. Of course, they both have persistence stories too — Truviso by being tightly integrated into open-source PostgreSQL, StreamBase more via “yeah, we can hand the data off to a conventional DBMS.” But the basic idea is to route data through a whole lot of different in-memory filters, to see what queries it satisfies, rather than executing many queries in sequence against disk-based data. Read more
Categories: Memory-centric data management, PostgreSQL, StreamBase, Streaming and complex event processing (CEP), Truviso | 8 Comments |
Oracle, Tangosol, objects, caching, and disruption
Oracle made a slick move in picking up Tangosol, a leader in object/data caching for all sorts of major OLTP apps. They do financial trading, telecom operations, big web sites (Fedex, Geico), and other good stuff. This is a reminder that the list of important memory-centric data handling technologies is getting fairly long, including:
- Object caching (e.g., Tangosol, Progress ObjectStore)
- In-memory RDBMS (e.g., Oracle TimesTen, Solid BoostEngine, McObject eXtremeDB)
- Stream processing (e.g., Progress Apama, Streambase)
And that’s just for OLTP; there’s a whole other set of memory-centric technologies for analytics as well.
When one connects the dots, I think three major points jump out:
- There’s a lot more to high-end OLTP than relational database management.
- Oracle is determined to be the leader in as many of those areas as possible.
- This all fits the market disruption narrative.
I write about Point #1 all the time. So this time around let me expand a little more on #2 and #3.
Read more
Defining and surveying “Memory-centric data management”
I’m writing more and more about memory-centric data management technology these days, including in my latest Computerworld column. You may be wondering what that term refers to. Well, I’ve basically renamed what are commonly called “in-memory DBMS,” for what I think is a very good reason: Most of the products in the category aren’t true DBMS, aren’t wholly in-memory, or both! Indeed, if you catch me in a grouchy mood I might argue that “in-memory DBMS” is actually a contradiction in terms.
I’ll give a quick summary of the vendors and products I am focusing on in this newly-named category, and it should be clearer what I mean:
- TimesTen (now owned by Oracle): TimesTen is the quintessentional “in-memory DBMS.” It’s a fairly full relational DBMS, but if you want to persist memory to disk it has to be handed off to a conventional DBMS. Historically, that has usually been MySQL or Oracle. TimesTen’s biggest market penetration has been in financial trading.
- Solid Information Technology‘s BoostEngine: Solid is a Finnish company (or was — it’s pretty American now) specializing in embedded DBMS sold mainly for telecommunication uses. Big OEM customers include several well-known telecom equipment manufacturers and HP (for OpenView). “Embedded” often means no DBA, no monitor, no keyboard — they box manufacturer installs it and there it stays for the life of the product. Solid has to offer strong replication capabilities, since its products are often used in highly distributed (e.g., multiblade, multibox) environments. So it’s taken the next step and exploited the replication by allowing customers to use some instances of the product disklessly.
- Event-stream products from Streambase and Progress: The canonical application for event-stream products is automating financial trading decisions based on the flow of market information. Mike Stonebraker, the brains behind Streambase, has recently popularized the idea; Progress bought Apama, who actually have been in the business longer. These applications require even more speed than the financial trading apps that TimesTen handles, and they discard most of the information they look at. In-memory is the only way to go.
- Progress’s ObjectStore: ObjectStore comes from the company Object Design, which merged into Excelon, which was acquired by Progress. It’s really a toolkit for building DBMS and similar systems, which is why it’s at various times been marketed as an OODBMS and an XML DBMS, without a lot of success either way. But there have been a few sterling apps built in ObjectStore even so, including a key part of the Amazon bookstore Despite this limited market success, a significant fraction of Progress’s best engineering talent has moved over to the Real-Time Division to focus on ObjectStore and other memory-centric products. The memory-centric aspect of ObjectStore is this: ObjectStore’s big virtue is that it gets objects from disk to memory and vice-versa very efficiently, then distributes and caches them around a network as needed. This was originally invented for client/server processing, but works fine in a multi-server thin client setup as well. And object processing, of course, relies on a whole lot of pointers. And pointer-chasing is pretty much the worst way to deal with the disk speed barrier, unless you do it in main memory.
- Applix‘s TM1: Like many companies in the analytics area, Applix has had trouble deciding whether it sells applications, BI system software, or both. But in any case its core technology is TM1, a memory-centric MOLAP offering. Traditional MOLAP products reside on the horns of a nasty dilemma: They rely on precalculation to give good performance, but that causes ghastly database explosion. Applix gets out of this problem by doing no precalculation whatsoever, loading the data into main memory, and executing all queries on the fly.
- SAP’s BI Accelerator: SAP is building out an elaborate technology stack with NetWeaver, especially in the BI area. One important aspect is that the full data warehouse is logically broken (or copied) into a series of data marts called “InfoCubes.” BI Accelerator takes the logical next step, loading an entire InfoCube into main memory. Almost every query is executed via a full table scan, which would be insane on disk but makes perfect sense when the data is already in RAM.
So there you have it. There are a whole lot of technologies out there that manage data in RAM, in ways that would make little or no sense if disks were more intimately involved. Conventional DBMS also try to exploit RAM and limit disk access, via caching; but generally the data access methods they use in RAM are pretty similar to those they use when going out to disk. So memory-centric systems can have a major advantage.