Bottleneck Whack-A-Mole
Developing a good software product is often a process of incremental improvement. Obviously, that happens through feature addition and bug-fixing. Less obviously, there’s also great scope for incremental improvement in how the product works at its core.
And it goes even further. For example, I was told by a guy who is now a senior researcher at Attivio: “How do you make a good speech recognition product? You start with a bad one and keep incrementally improving it.”
In particular, I’ve taken to calling the process of enhancing a product’s performance across multiple releases “Bottleneck Whack-A-Mole” (rhymes with guacamole). This is a reference to the Whack-A-Mole arcade game,* the core idea of which is:
- An annoying mole pops its head up.
- You whack it with a mallet.
- Another pops its head up.
- You whack that one.
- Repeat, as mole_count increments to a fairly large integer.
*You can see Whack-A-Mole in a great picture here.
Improving performance in, for example, a database management system has a lot in common with Whack-A-Mole. Unclog your worst performance bottleneck(s), and what do you get? You get better performance, limited by other bottlenecks, which may not even have been apparent while the first ones were still in place. For example, Oracle is surely going through that now with Exadata. In its very first release, Exadata probably solved the basic I/O problem that had been limiting Oracle’s analytic query performance – edge cases perhaps aside. With that out of the way, Oracle now gets to:
- Attend to the edge cases
- Fix whatever other bottlenecks are next-worst in the highly engineered, highly complex Oracle DBMS.
When I spoke with Oracle’s development managers last fall, they didn’t really know how many development iterations would be needed to get the product truly unclogged. Of course, they professed optimism – which seemed quite sincere – that it wouldn’t be many iterations at all. But they confessed, as well they should have, to not truly knowing.
*In one way, the metaphor falls short – in the game, you have to whack a mole quickly or else you lose your opportunity entirely, while in software the problems just linger until you fix them. Well – who ever said games were PERFECT mirrors of reality? 🙂
Netezza is an even better example. Originally, Netezza had a “fat head,” in which a lot of query processing was done at a single master node. They fixed that, whereupon they had to get data redistribution right. Now Netezza’s performance focus is in yet different areas.
And in line with this theory – if you plotted a graph of analytic DBMS product age against maximum number of concurrent users supported, you would likely get a strong fit to a monotonically increasing curve. Evidently, concurrent performance is another of those things that takes multiple product revisions to get right.
Comments
I think this is the one and only CAM post I’ve ever seen without comments 🙂
So as mine was too large to fit, I posted it on my blog instead at http://jeromepineau.blogspot.com
Apologies for the plagiarism.
I’m glad you didn’t drop a direct link to the post here, Jerome. It wasn’t one of your smarter ones.
If you’re claiming that your employer precisely predicted every aspect of product performance in every release, multiple releases and years in advance, I find it hard to believe you.
If you’re not claiming that, the whole premise of your post is wrong.
@Curt, no, I’m not implying anything about my employer, besides the fact that their engineering practices might, apparently from your post, be superior to those of a company that has been around for 30 years, spent hundreds of millions of dollars on development, claims market leadership, and costs millions of dollars to buy, but is yet apparently still “putzing” around with performance issues. If I were a customer reading this, I’d have to think twice about dropping $6M on a product which “probably solved the I/O problem” – I might want a little better reassurance. As far as I know, Oracle has some fairly impressive benchmarks too, so I’m not sure why their own people would express doubt about their capabilities. It’s puzzling to me.
That is the premise of my post and nothing else.
Oracle expressed doubt because they were honest when I pressed them.
I am sorry that you do not live up to the same standard in this matter.
CAM
And that’s all in their honor clearly! But I don’t see how I am being dishonest in the least by simply pointing out what you yourself have written about and drawing conclusions. Everyone is free to interpret your paraphrasing of Oracle as they wish.
You’ll notice a similar trend with InnoDB recently. As computers get faster, new bottlenecks show up that were not as evident before, so you whack ’em. Once those are whacked, new ones show up. Remember that these software products were produced in a time when hardware technology was significantly different, so it is understandable that, as time goes by, incremental improvements can be made.
As far as Exadata goes, didn’t Oracle buy them? It is reasonable to expect that it will take some time for Oracle engineers to fully dig into every last bit of the code.
It is also possible that, due to time constraints, certain performance features or optimizations are not put into development, because it is unclear how useful such an optimization will be in the real world, since every database faces a multitude of workload scenarios. Some of these become big bottlenecks which are whacked in the next release.
It is simply the nature of a database product with a long lifecycle.
Sorry, my bad, it is homegrown.
I totally get that, and in no way am I implying it’s possible or realistic to get everything right on the first try, but at least a plan/goal is nice. Now, clearly you improve release by release (hopefully), but to simply go into such an endeavor strategizing with “oh well, we’ll cross the bridge when we get there, it’s an iterative process anyway” is a little shocking to me from a company of such size and resources. You’d think by now they would have pretty much figured all this stuff out, no? If not, maybe they’ve hit a wall? I mean, correct me if I’m wrong, but this Exadata is basically a storage layer designed to feed faster I/O to the same old RAC database, isn’t it? I find it less than reassuring (thinking in a customer’s shoes) that they would then say hey, we’re not sure how many other rounds we’ll need to “unclog” this thing (not my term, mind you). If that’s the case, then be upfront about it, I say – no shame there; I’m sure Oracle’s engineering teams are probably some of the best in the world. You don’t get to where they are by sucking at doing this 🙂
“Fix whatever other bottlenecks are next-worst in the highly engineered, highly complex Oracle DBMS.”
And highly expensive, I might add. I guess herein lies the problem. When you have this level of complexity/engineering, you tend to lose control. Inherently, this is the message being put out here from where I stand. I don’t think Oracle’s complexity is a big secret and as you point out, their lifecycle is quite long, which also explains the issue I suppose – It’s hard to control something so huge and so old.
Jerome,
No, you’re not free to read my paraphrase very differently than I read it.
Oracle views its development operation through rosy glasses similar to those through which you view yours. But, as I said, they’re honest enough to admit that they could be mistaken.
I continue to find it regrettable that you are (were) punishing them for their honesty in what looks like an attempt to score cheap marketing points at their expense. Hence my vigorous defense of them against your misrepresentation.
CAM
Curt,
I addressed the same from Greg on my blog, so I won’t reiterate here (too much echo). The day I am either competent enough or powerful enough to “punish” anything the size/success of Oracle is not likely to come 🙂
Users will come up with new ways to use a certain technology, and hardware vendors come up with new technology too. So there will always be new and unexpected moles that need to be whacked.
@RC: I think I’ll stick to Kevin Closson’s last comment on my blog and leave it at that 🙂
Thanks.