May 21, 2009
Notes on CEP performance
I’ve been talking to CEP vendors on and off for a few years. So what I hear about performance is fairly patchwork. On the other hand, maybe 1-2+ year-old figures of per-core performance are still meaningful today. After all, Moore’s Law is being reflected more in core count than per-core performance, and it seems CEP vendors’ development efforts haven’t necessarily been concentrated on raw engine speed.
So anyway, what do you guys have to add to the following observations?
- Super-low-latency financial services industry tasks are often “embarrassingly parallel.” Thus, near-linear scale-out is common. (There’s a rough sketch of what I mean right after this list.)
- That said, good parallelism seems fairly new in CEP engines (of course, CEP engines are fairly new themselves — for all I know, some have been parallel since inception).
- I’ve heard claims of up to 400,000 messages/second/core for simple queries or patterns.
- I’ve heard claims of 70,000 messages/second/core for not-so-simple queries or patterns, and probably higher than that depending on what the meaning of “simple” is.
- IBM just disclosed >15,000 messages/second/core on a pretty low-powered processor.
- I’ve heard that Coral8, Apama, and StreamBase rarely lost deals due to performance or throughput problems. I’ve heard that the same is not as true of Aleri.
- StreamBase proudly says it’s been fully multithreaded since its academic research-project days. For Apama, multithreading is evidently a more recent feature. But does it matter much?
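(A rough sketch of what I mean by “embarrassingly parallel,” in Java: if queries only ever look at one symbol’s ticks at a time, you can hash-partition the stream by symbol and give each core its own single-threaded worker with private state. This is purely illustrative — not taken from any vendor’s engine — and the class names, queue sizes, and 1% threshold are made up.)

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative sketch: hash-partition ticks by symbol so each worker thread
// owns a disjoint slice of the state and never touches anyone else's.
public class PartitionedTickProcessor {

    static final class Tick {
        final String symbol;
        final double price;
        Tick(String symbol, double price) { this.symbol = symbol; this.price = price; }
    }

    private final BlockingQueue<Tick>[] queues;

    @SuppressWarnings("unchecked")
    PartitionedTickProcessor(int partitions) {
        queues = new BlockingQueue[partitions];
        for (int i = 0; i < partitions; i++) {
            BlockingQueue<Tick> q = new ArrayBlockingQueue<>(10_000);
            queues[i] = q;
            Thread worker = new Thread(() -> {
                Map<String, Double> lastPrice = new HashMap<>();   // private per-worker state
                try {
                    while (true) {
                        Tick t = q.take();
                        Double prev = lastPrice.put(t.symbol, t.price);
                        // A deliberately simple "query": flag a 1% move versus the prior tick.
                        if (prev != null && Math.abs(t.price - prev) / prev > 0.01) {
                            System.out.println("1% move on " + t.symbol + " at " + t.price);
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();            // allow clean shutdown
                }
            });
            worker.setDaemon(true);
            worker.start();
        }
    }

    // All ticks for a given symbol always land on the same worker, so no locking is needed.
    void onTick(Tick t) throws InterruptedException {
        queues[Math.floorMod(t.symbol.hashCode(), queues.length)].put(t);
    }
}
```

Because the workers never share data structures, adding one worker per core is what makes near-linear scale-out plausible for this kind of task.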
Categories: Aleri and Coral8, IBM and DB2, Memory-centric data management, Progress, Apama, and DataDirect, StreamBase, Streaming and complex event processing (CEP)
Comments
13 Responses to “Notes on CEP performance”
[…] performance may not be all that great a source of CEP competitive differentiation, event processing vendors find plenty of other bases for technological competition, including […]
> Super-low-latency financial services industry
> tasks are often “embarrassingly parallel.”
No. FSI uses of events are so broad and varied that you can’t put them in any one category. Some tasks are easily parallelized and scaling has become almost a commodity, while others are so hard yet so important that for 30 years they have driven a significant portion of the sales volume of “supercomputers” and hardware-accelerated math engines.
> That said, good parallelism seems fairly new in
> CEP engines
I doubt that we are at a point where anyone should make generalizations about parallelism. I think in terms of usage decisions, and making a usage decision about a product, or about the CEP category in general, based on “has good parallelism” would be a mistake. There is no standard for parallelism, and “good parallelism” depends heavily on the application.
What you can say is that a few engines never allowed multiple threads to interact with the engine data structures, so they were effectively single-threaded, and they have been working to allow for multiple threads, whereas other products have always used multiple threads. But that is too simple a description for making a usage decision. Using multiple threads absolutely does not mean better performance, lower latency, or whatever other metric you want to apply. For example, the shared-nothing paradigm that is often hyped in performance circles involves breaking a large processing task down into what are essentially many single-threaded applications that do not interact.
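(To make that concrete, here is a tiny, purely hypothetical Java sketch, not modeled on any particular engine: if every thread has to funnel through one lock on shared engine state, extra threads mostly buy contention, while the shared-nothing alternative gives each thread its own private copy of the same logic.)

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical shared-state design: every thread funnels through one lock on
// the engine's window, so extra cores tend to buy contention, not throughput.
class SharedMovingAverage {
    private final Deque<Double> window = new ArrayDeque<>();
    private double sum;

    synchronized double onEvent(double value, int windowSize) {
        window.addLast(value);
        sum += value;
        if (window.size() > windowSize) sum -= window.removeFirst();
        return sum / window.size();
    }
}

// Shared-nothing alternative: one unsynchronized instance per thread or
// partition, with no interaction between the instances at all.
class PrivateMovingAverage {
    private final Deque<Double> window = new ArrayDeque<>();
    private double sum;

    double onEvent(double value, int windowSize) {
        window.addLast(value);
        sum += value;
        if (window.size() > windowSize) sum -= window.removeFirst();
        return sum / window.size();
    }
}
```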
So I would say that some vendors are expanding their threading capabilities, but that is where I would stop.
In terms of the SB vs. Apama threading thing, that is something that *really* needs to be evaluated on an application-by-application basis. I would not recommend anyone make a usage decision without understanding exactly what each vendor means by multithreaded. No one should be thinking “oh, it supports multithreading, so that must be good for me because I need multithreading.”
Hans,
I wouldn’t disagree with any of that, except perhaps to the extent you may be implying that since the financial services industry worked w/o parallelism for a long time, a large fraction of its app categories can’t benefit much from parallelism going forward.
CAM
Threads are just a programming abstraction for concurrency. Event programming is just another. See this classic paper for a discussion of their duality. Nobody should argue about something being good just because it’s threaded, or just because it’s event-based. Make them show you their performance.
Eric Brewer and folks here at Berkeley wrestled with the practical ramifications of this over many years, in part due to Eric’s experience building the Inktomi search engine. Papers worth reading on that front include Matt Welsh’s work on SEDA (“threads can’t scale, events win”), and Rob von Behren’s work on Capriccio (“threads win, events are a bad idea”).
For a long discussion of how the relational database vendors do it (answer: they do it every which way!) see our survey on database system architecture.
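(To see the thread/event duality in miniature, here is a hypothetical Java fragment — not from any of the cited papers — expressing the same bit of logic both ways: a blocked stack frame in the thread style carries the same information as an explicit callback in the event style, which is why neither is inherently faster.)

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// The same logic expressed in both styles; the duality argument is that
// neither is inherently faster, so measure rather than assume.
public class DualityDemo {

    static String compute(String request) { return request.toUpperCase(); }

    // Thread style: one thread per task; "what happens next" lives on the stack.
    static void threadStyle(String request) {
        new Thread(() -> System.out.println(compute(request))).start();
    }

    // Event style: work items are queued to a single loop; the continuation is
    // an explicit callback rather than a blocked stack frame.
    static final class Event {
        final String request;
        final Consumer<String> onDone;
        Event(String request, Consumer<String> onDone) { this.request = request; this.onDone = onDone; }
    }

    static final BlockingQueue<Event> queue = new LinkedBlockingQueue<>();

    static void eventLoop() throws InterruptedException {
        while (true) {
            Event e = queue.take();
            e.onDone.accept(compute(e.request));
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread loop = new Thread(() -> {
            try { eventLoop(); } catch (InterruptedException ignored) { }
        });
        loop.setDaemon(true);
        loop.start();

        threadStyle("order-123");
        queue.add(new Event("order-456", System.out::println));
        Thread.sleep(100);   // give the daemon event loop a moment to drain
    }
}
```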
Joe,
My point on event programming, in the other post, was about PROGRAMMING. And even there I was just passing through vendor claims. 😉
Seriously, I’m a huge believer in the theory that the best programming paradigm is usually the one that gives the best modularity. Two complications in that are:
1) It has to be best both for the “core” of the task and for the “other stuff” — UI, integration, etc.
2) This is a Goldilocks kind of test. Too much modularity can be almost as bad as too little. (Classic examples — debugging programs that rely too much on stored procedures, or almost anything to do w/ rules engines.)
>>> I’ve heard that Coral8, Apama, and StreamBase rarely lost deals due to performance or throughput problems. I’ve heard that the same is not as true of Aleri.
That is very surprising. Aleri always touted themselves as the most performant of the CEP engines. They widely proclaim that they are the only CEP vendor to have submitted themselves to STAC Research for benchmarking, and they have publicly challenged all comers.
I would be interested in the use case where Aleri was thrown out. I would have expected them to be thrown out because of usability issues, not because of performance.
“Events per second” processing throughput is, like the rules per second throughput for rules engines, spectacularly irrelevant as a metric unless you specify what the “processing” is. Aggregating or comparing with the prior event? Prior event from a week ago? Prior 1000 events? Some deep XML payload component in those events? etc. etc.
In other words, how complex are the events (or patterns) you are outputting?
Caveat emptor, as usual.
Cheers
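(To make Paul’s point concrete, here is an illustrative Java fragment, not tied to any engine: the “same” event costs very different amounts of work depending on whether the query compares against the prior event, naively rescans the prior 1,000 events, or maintains the window aggregate incrementally.)

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative only: the "same" event costs very different amounts of work
// depending on what the query actually does with it.
public class PerEventCost {

    // (a) Compare with the prior event: O(1) work, one value of state.
    private double prior = Double.NaN;
    boolean jumpedVsPrior(double price) {
        boolean jumped = !Double.isNaN(prior) && Math.abs(price - prior) / prior > 0.01;
        prior = price;
        return jumped;
    }

    // (b) Naive average over the prior 1,000 events: rescans the window, so
    // the per-event cost grows with the window size.
    private final Deque<Double> naiveWindow = new ArrayDeque<>();
    double naiveAvg(double price) {
        naiveWindow.addLast(price);
        if (naiveWindow.size() > 1000) naiveWindow.removeFirst();
        double sum = 0;
        for (double p : naiveWindow) sum += p;
        return sum / naiveWindow.size();
    }

    // (c) The same answer maintained incrementally: back to O(1) per event,
    // at the price of keeping a running sum in sync with the window.
    private final Deque<Double> window = new ArrayDeque<>();
    private double runningSum;
    double incrementalAvg(double price) {
        window.addLast(price);
        runningSum += price;
        if (window.size() > 1000) runningSum -= window.removeFirst();
        return runningSum / window.size();
    }
}
```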
Paul,
IBM’s Blue Gene/TD Financial case sounded like VWAP and other pretty standard tickstream stuff.
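(I don’t know exactly what IBM ran, but “standard tickstream VWAP” usually means something like the following per-symbol running calculation; the Java sketch is illustrative only.)

```java
import java.util.HashMap;
import java.util.Map;

// Per-symbol running VWAP (volume-weighted average price), maintained
// incrementally as trades arrive; illustrative only.
public class Vwap {

    static final class State {
        double priceVolume;   // sum of price * volume seen so far
        double volume;        // sum of volume seen so far
    }

    private final Map<String, State> bySymbol = new HashMap<>();

    // Returns the running VWAP for the trade's symbol after applying the trade.
    double onTrade(String symbol, double price, double volume) {
        State s = bySymbol.computeIfAbsent(symbol, k -> new State());
        s.priceVolume += price * volume;
        s.volume += volume;
        return s.priceVolume / s.volume;
    }
}
```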
Not sure where you get your information, Curt, but Aleri has never lost a deal on performance grounds. You should check your sources. In fact, we are the only CEP vendor to have submitted our engine for objective third-party benchmarking. See the STAC report: stacresearch.com/aleri. I would welcome other CEP vendors to do the same.
Jeff,
Different competitors in a deal commonly see it differently. And some deals are evaluated fairly, some not so much.
As for STAC — I don’t know a lot about it. And I find the website pretty hard to learn anything from. Outfits that don’t put any management names on a site often are a little shaky.
In theory, however, STAC’s goals sound quite worthy.
CAM
On the subject of performance, we at the University of Coimbra have been running many, many micro-benchmarks on CEP engines and we found… huge differences in performance.
Some engines can do trivial selections at millions of events per second, while others manage 6-7x fewer. Joins are even more extreme, with some configurations about 20x better than others. The type of window (sliding or jumping) also affects performance quite a lot, but does not affect all engines in the same way. In some challenging pattern-matching queries, throughput dropped to mere hundreds of events per second, or the system couldn’t handle them at all.
Memory consumption varies wildly (e.g., from 50 MB to 15 GB) for the same query running in different engines. CPU utilization also varies, and some engines are burstier than others, which affects average and worst-case response times.
Overall, no engine (we tested) was the best in all scenarios but some were clearly ahead in most cases.
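(This isn’t the Coimbra group’s harness — just a minimal, hypothetical Java sketch of what a “trivial selection” micro-benchmark measures: generate synthetic ticks, time a simple price filter, and report events per second. Against a real engine you would push the ticks through its input adapter and query language instead of an inlined predicate, and the results would vary by engine exactly as described above.)

```java
import java.util.Random;

// Minimal, hypothetical harness for a "trivial selection" micro-benchmark:
// generate synthetic prices, time a simple filter, report events per second.
public class SelectionBenchmark {

    public static void main(String[] args) {
        final int events = 10_000_000;
        double[] prices = new double[events];
        Random rnd = new Random(42);
        for (int i = 0; i < events; i++) prices[i] = 90 + 20 * rnd.nextDouble();

        // Warm-up pass so the JIT compiles the loop before we time it.
        long warmup = 0;
        for (int i = 0; i < events; i++) if (prices[i] > 100) warmup++;

        long start = System.nanoTime();
        long selected = 0;
        for (int i = 0; i < events; i++) {
            if (prices[i] > 100) selected++;   // the "query": keep ticks with price > 100
        }
        double seconds = (System.nanoTime() - start) / 1e9;

        System.out.printf("selected %d of %d (warmup %d); about %.0f events/sec%n",
                selected, events, warmup, events / seconds);
    }
}
```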
Pedro,
Sounds fascinating!
Do you have any specifics you could share?
Best,
CAM
[…] Complex event/stream processing, which I’ve written quite a bit about too […]