Streaming and complex event processing (CEP)
Discussion of complex event processing (CEP), aka event processing or stream processing – i.e., of technology that executes queries before data is ever stored on disk. Related subjects include:
Coral8
StreamBase
Truviso
Progress Apama
Applications for not-so-low-latency CEP
The highest-profile applications for complex event/stream processing are probably the ones that require super-low latency, especially in financial trading. However, as I already noted in writing about StreamBase and Truviso, there are plenty of other CEP apps with less extreme latency requirements.
Commonly, these are data reduction apps – i.e., there’s a gushing stream of inputs, and the CEP engine filters and “enhances” it, so that only a small, modified subset is sent forward. In other cases, disk-based systems could do the job perfectly well from a performance standpoint, but the pattern matching and filtering requirements are just a better fit for the CEP paradigm.
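To make the data reduction idea concrete, here is a minimal sketch of a filter-and-enhance step. The event fields (`symbol`, `size`), the `SECTOR_BY_SYMBOL` lookup table, and the `reduce_stream` function are all hypothetical names chosen for illustration; no particular CEP product works exactly this way.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical reference data used to "enhance" each forwarded event.
SECTOR_BY_SYMBOL = {"IBM": "tech", "XOM": "energy"}


def reduce_stream(events: Iterable[Dict], min_size: int) -> Iterator[Dict]:
    """Forward only events at or above min_size, each tagged with a sector."""
    for event in events:
        if event["size"] < min_size:
            continue  # dropped in flight: never stored, never forwarded
        enriched = dict(event)  # copy so the input event is left untouched
        enriched["sector"] = SECTOR_BY_SYMBOL.get(event["symbol"], "unknown")
        yield enriched


events = [
    {"symbol": "IBM", "size": 100},
    {"symbol": "XOM", "size": 5000},
    {"symbol": "IBM", "size": 20000},
    {"symbol": "ZZZ", "size": 9000},
]
forwarded = list(reduce_stream(events, min_size=1000))
```

Because the function is a generator, each event is examined once as it arrives and nothing is buffered, which is the point of doing the reduction before anything reaches disk.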
Applications for super-low-latency CEP
Complex event/stream processing vendors compete fiercely on the basis of low latency, down to single-digit milliseconds or even sub-millisecond levels. A question naturally springs to mind: When does this extreme low latency matter?
I think I’ve come up with a concise yet fairly accurate answer: Super-low latency matters when the application includes direct competition against a similarly fast opponent. The best example is automated stock trading – if you can exploit a market inefficiency 1 millisecond before your competition, you make money.
Other examples might arise in network security or battlefield systems, but I don’t know of any specific real-life cases. Instead, other applications for complex event/stream processing tend to be content with latencies that are easier to achieve. E.g., 100 milliseconds (1/10 of a second) is likely to be plenty fast enough.
Coral8 versus StreamBase
Besides talking about what Coral8 and StreamBase (and other CEP vendors) have in common, Mark Tsimelzon and I talked quite a bit about what he sees as some of the important differences. There were a lot, of course, but three in particular stood out.
1. Mark believes Coral8 has significantly lower latency than StreamBase. E.g., the Wombat/Coral8 combo achieves sub-millisecond latency, with Coral8 itself consuming less than a tenth of that. The best comparable figures from StreamBase that I currently know of are almost an order of magnitude slower.
2. Top-end speed aside, Mark believes that Coral8 is fundamentally better suited for complex queries and pattern recognition, while StreamBase works well with simpler queries. For example, his other performance claims notwithstanding, he concedes that StreamBase is at least comparable to Coral8 in its throughput for huge numbers of simple queries. (The number he mentioned was ½ million queries/second.) Indeed, while we barely talked about customer/marketing issues, Mark asserts that the companies’ respective customer bases reflect this complex/simple distinction.
The essence of CEP according to Coral8
Last week, I complained that my first briefing with Coral8 wasn’t very technical. Wednesday I had a call with Mark Tsimelzon, CTO and founder of Coral8, and he made up for that in spades. In this post I’ll cover some of his general comments. Others will touch on more Coral8-specific topics, and his view of the Coral8/StreamBase comparison.
As Mark describes it, the big difference between a DBMS – even an in-memory DBMS – and a complex event processing engine is this: CEP engines do instantaneous incremental processing. He commonly refers to this as registering queries and operators for incremental evaluation. For example, suppose you need to maintain the sum of some data stream over the past 10 minutes. Then each second (or other short unit of time), the system adds in all the values that arrived in the past second, and subtracts all those that arrived 600-601 seconds ago. Voila! The sum is incrementally updated.
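The rolling-sum example can be sketched in a few lines. This is only an illustration of the add-the-newest, subtract-the-oldest idea under simple assumptions (one call per tick, a fixed number of ticks per window); the `RollingSum` class and its `tick` method are hypothetical names, not Coral8's API.

```python
from collections import deque


class RollingSum:
    """Sum over the most recent `window` ticks, updated incrementally."""

    def __init__(self, window: int):
        self.window = window
        self.buckets = deque()  # one subtotal per tick, oldest first
        self.total = 0

    def tick(self, values):
        """Advance one tick, given the values that arrived during it."""
        subtotal = sum(values)
        self.buckets.append(subtotal)
        self.total += subtotal  # add in what just arrived
        if len(self.buckets) > self.window:
            self.total -= self.buckets.popleft()  # subtract what aged out
        return self.total


# A 3-tick window stands in for the 600-second window in the text.
rs = RollingSum(window=3)
results = [rs.tick(v) for v in [[1, 2], [3], [], [4], [5, 5]]]
```

Each tick costs one addition and at most one subtraction on the running total, no matter how many values sit inside the window, which is what distinguishes this from re-running a query over stored data.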
Now, rolling sums may not sound very interesting – but where you have rolling sums, you trivially also have rolling averages (just divide the sum by the count) and rolling standard deviations (same idea, with some squares and square roots mixed in). Those, of course, are primitives in Coral8 too. Ditto rolling maxima and minima. Ditto rolling joins (which are updated a lot like materialized views).
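As a rough sketch of why averages and standard deviations come almost for free, the class below keeps a running count, sum, and sum of squares over a time window and derives both statistics from them. The `RollingStats` name and its methods are hypothetical, the standard deviation is the population form, and the sum-of-squares formula is used for brevity even though it can lose precision on large, nearly equal values.

```python
import math
from collections import deque


class RollingStats:
    """Rolling mean and standard deviation over a time window."""

    def __init__(self, window_seconds: float):
        self.window = window_seconds
        self.items = deque()  # (timestamp, value), oldest first
        self.n = 0
        self.s = 0.0   # running sum of values
        self.s2 = 0.0  # running sum of squared values

    def add(self, ts: float, value: float) -> None:
        self.items.append((ts, value))
        self.n += 1
        self.s += value
        self.s2 += value * value
        # Evict everything that has fallen out of the window.
        while self.items and self.items[0][0] <= ts - self.window:
            _, old = self.items.popleft()
            self.n -= 1
            self.s -= old
            self.s2 -= old * old

    def mean(self) -> float:
        return self.s / self.n if self.n else 0.0

    def std(self) -> float:
        if self.n == 0:
            return 0.0
        variance = self.s2 / self.n - self.mean() ** 2
        return math.sqrt(max(variance, 0.0))  # clamp tiny negative round-off


stats = RollingStats(window_seconds=10)
for ts, value in [(0, 2), (1, 4), (2, 6)]:
    stats.add(ts, value)
mean_before, std_before = stats.mean(), stats.std()
stats.add(11, 8)  # the values stamped 0 and 1 age out here
mean_after, std_after = stats.mean(), stats.std()
```

Rolling maxima and minima need a different structure (for instance a monotonic queue), since a maximum cannot be "subtracted out" the way a sum can.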
Competitive claims in CEP
For the most part, the vendors I talk with in complex event/stream processing like and speak well of each other (most of the exceptions seem to involve StreamBase). Even so, there are a lot of interesting competitive claims and counterclaims in this market. Prior posts and comment threads have covered Apama/StreamBase jousting on the subjects of who has more business and how many financial data feeds StreamBase supports. Other areas that generate interesting sparks are performance, parallelism, and determinism.
A deeper dive into Apama
My recent non-technical Apama briefing has now had a much more technical sequel, with charming founder and former Cambridge professor John Bates. He still didn’t fully open the kimono – trade secrets and all that – but here’s the essence of what’s going on.
Complex event/stream processing (CEP) is all about looking for many patterns at once. Reality – the stream(s) of data – is checked against these patterns for matches. In Apama, these patterns are kept in a kind of tree – they call it a hypertree – and John says the work to check them is only logarithmic in the number of patterns.
Since patterns commonly have multiple parts – and usually also take time to unfold – what really goes on is that partial matches are found, after which what’s being matched against is the REMAINDER of the pattern. Thus, there’s constant pruning and rebalancing of the tree. What’s more, a large fraction of all patterns – at least in the financial trading market – involve a short time window, which again creates a need for ongoing, rapid tree modification.
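The partial-match idea can be illustrated with a deliberately naive sketch. This is not Apama's hypertree, whose internals were not disclosed: it scans a flat list, so the work per event is linear rather than logarithmic. The `Partial` and `SequenceMatcher` names, and the representation of a pattern as a tuple of event kinds, are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Partial:
    name: str
    remainder: Tuple[str, ...]  # the part of the pattern still unmatched
    deadline: float             # time after which this partial is discarded


class SequenceMatcher:
    """Match ordered sequences of event kinds within a time window."""

    def __init__(self, patterns: Dict[str, Tuple[str, ...]], window: float):
        self.patterns = patterns
        self.window = window
        self.partials: List[Partial] = []

    def on_event(self, ts: float, kind: str) -> List[str]:
        completed: List[str] = []
        survivors: List[Partial] = []
        for p in self.partials:
            if ts > p.deadline:
                continue  # prune: the time window has closed
            if p.remainder[0] == kind:
                rest = p.remainder[1:]
                if rest:
                    survivors.append(Partial(p.name, rest, p.deadline))
                else:
                    completed.append(p.name)  # nothing left to match
            else:
                survivors.append(p)  # unrelated event, keep waiting
        # The event may also begin fresh matches.
        for name, seq in self.patterns.items():
            if seq[0] == kind:
                if len(seq) > 1:
                    survivors.append(Partial(name, seq[1:], ts + self.window))
                else:
                    completed.append(name)
        self.partials = survivors
        return completed


matcher = SequenceMatcher(
    {"spike_then_drop": ("spike", "drop"), "triple": ("a", "b", "c")},
    window=5,
)
stream = [(0, "spike"), (1, "a"), (2, "drop"), (3, "b"), (7, "c")]
hits = [matcher.on_event(ts, kind) for ts, kind in stream]
```

In this run the two-step pattern completes at time 2, while the three-step pattern is pruned because its final event arrives after the 5-second window opened at time 1 has closed.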
The Coral8 story
Complex event/stream processing vendor Coral8 raised its hand and offered a briefing – non-technical, alas, but at least it was a start. Here are some of the highlights:
StreamBase rebuts
In my post Monday about Apama, I complained that StreamBase hadn’t offered a rebuttal to some of Apama’s claims. This has now been fixed. 🙂 Bill Hobbib, StreamBase’s VP of Marketing, wrote in. Part of what he had to say was the following.
Adapters to Data Feeds
Your blog comment that adapters don’t seem like a key competitive differentiator is accurate, and since adapters are so straightforward to develop with StreamBase as part of a customer engagement, we’ve never found adapters to be a key competitive differentiator. The comment by a competitor that their advantage over StreamBase comes from their having developed more adapters suggests they cannot distinguish themselves based on the other functional capabilities that are important to customers. In reality, our speed/performance and scalability are orders of magnitude superior to competitors, as is the speed with which StreamBase applications are developed, deployed, and modified when business needs change. (If it were easy to develop applications with certain competitive systems, then one might assume they would make free evaluation versions of their product available for download from their websites!)
That being said, StreamBase offers adapters to a broad array of data feeds. Most of these are offered out-of-the-box by StreamBase, including the following:
* Financial Market Data: processes data from Reuters® RMDS™ and Reuters Triarch™
* TIBCO® Rendezvous™: converts Rendezvous message into StreamBase tuples and vice versa.
* StreamBase Adapter for JDBC: connects StreamBase to enterprise databases, allowing submission of SQL queries to external resources such as IBM® DB2™, Oracle®, Microsoft® SQLServer™, and Sybase®.
* StreamBase Adapter for JMS: integrates StreamBase with any JMS-compliant message bus.
* StreamBase Adapter for Microsoft Excel™: allows applications to publish data to Excel or read data from Excel.
* StreamBase CSV Adapters: allow applications to read data from, and write data to, comma-separated value (CSV) files.
* StreamBase SMTP adapter: taps into the IP stack on a running system to process live data, converts the IP packets into a TCP data stream, or reads IP packets from captured files.
* StreamBase XML Adapter: streams XML-formatted data records into and out of StreamBase applications.
We also can connect to financial exchanges either using our own adapters or through a third-party partnership. Below you’ll find a listing of those.
Progress Apama
I finally got my promised briefing with Progress Apama. Unfortunately, nobody particularly technical was able to attend, but I came away with a better understanding even so.
Unlike StreamBase or Truviso, Apama has a rules-based architecture. In essence, the rules engine maintains state of various kinds, and matches that state against desired patterns, called “scenarios.” It can handle hundreds or possibly even thousands of scenarios at once.
Nonstandard data management software — beyond the Bowling Alley?
I just finished a short Monash Letter on markets for nonstandard data management software. Of course, the whole thing is available only to Monash Advantage members, but here are some salient points:
- When new kinds of data are managed, new kinds of data management are used. More precisely, the old ways are tried first – but once they fail, new technologies are tried out.
- Up through the “Bowling Alley,” markets for nonstandard data management technology commonly follow the classic Geoffrey Moore pattern. However, they rarely experience a “Tornado” or mass adoption.
- I think this is apt to change. My three strongest candidates are native XML, RDF, and memory-centric event/stream processing used for data reduction (as opposed to sub-millisecond latency, which I do think will continue to be a niche requirement).