Agile predictive analytics – the heart of the matter
I’ve already suggested that several apparent issues in predictive analytic agility can be dismissed by straightforwardly applying best-of-breed technology, for example in analytic data management. At first blush, the same could be said about the actual analysis, which comprises:
- Data preparation, which is tedious unless you do a good job of automating it.
- Running the actual algorithms.
Numerous statistical software vendors (or open source projects) help you with the second part; some make strong claims in the first area as well (e.g., my clients at KXEN). Even so, large enterprises typically have statistical silos, commonly featuring expensive annual SAS licenses and seemingly slow-moving SAS programmers.
As I see it, the predictive analytics workflow goes something like this:
- Business-knowledgeable people develop a theory as to what kinds of information and segmentation could be valuable in making better business micro-decisions.
- Statistics-knowledgeable people determine a structure for modeling that reflects this theory.
- Statistics-knowledgeable people tweak the model over time, within a fixed general structure, as new data comes in.
- (Optional) Somebody sees to acquiring whatever data is needed that the organization doesn’t already have (and won’t get in the ordinary course of ongoing business).
The optional last part can be a purchase of third-party information (relatively fast and easy) or the development of a business process (and, if necessary, associated software) to capture the information (not always so easy). But even if that step is taken care of, or isn't needed at all, we have at least two hand-offs where agility can be lost:
- Businesspeople may throw a request “over the wall” to the statisticians, who then work on it as their schedule permits.
- Once created, a model may be so set in stone that even small changes are as hard as building a new model from scratch.
The second problem can be solved by the statisticians themselves, without outside involvement. Model research and model refinement should be separate processes. You can recheck your clustering on one schedule, but recalibrate your regressions against each cluster more frequently. If that all sounds forbiddingly difficult, perhaps your model recalibration process needs another level of automation.
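As a concrete illustration of that separation, here is a minimal sketch, assuming Python and scikit-learn (the post names neither), with illustrative function and variable names of my own: the clustering is "researched" occasionally and then held fixed, while the per-cluster regressions are recalibrated on whatever new data has arrived.

```python
# Minimal sketch of "recheck clustering rarely, recalibrate regressions often".
# Python/scikit-learn and all names here are illustrative assumptions, not from the post.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

def research_clusters(X_history, n_clusters=5):
    """Model research: run occasionally (say, quarterly) to fix the segmentation."""
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X_history)

def recalibrate(kmeans, X_new, y_new):
    """Model refinement: run frequently (say, nightly) within the fixed cluster structure."""
    labels = kmeans.predict(X_new)
    models = {}
    for c in range(kmeans.n_clusters):
        mask = labels == c
        if mask.sum() >= 2:  # skip clusters with too little fresh data to refit
            models[c] = LinearRegression().fit(X_new[mask], y_new[mask])
    return models

def score(kmeans, models, X):
    """Route each record to its cluster's most recently recalibrated regression."""
    labels = kmeans.predict(X)
    preds = np.full(len(X), np.nan)
    for c, model in models.items():
        preds[labels == c] = model.predict(X[labels == c])
    return preds
```

The point of the split is that recalibrate() can run unattended on a schedule, while research_clusters() only reruns when somebody decides the segmentation itself deserves a second look.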
So I’ve finally gotten to the point of saying what may have been obvious from the start: The only excusable impediment to predictive analytic agility is the hand-off from the people who know the business to the people who know the math. So let’s examine ways that difficulty can be resolved.
At big internet companies, the usual answer is something like
Hey, it’s just data. From web logs. And network event logs. The data scientists know how to handle that.
In financial trading firms, the answer is more
The traders and analysts work closely together. Very closely. In fact, when the traders rip out their phones and throw them across the room, the analysts need to duck to avoid getting clobbered.
In credit card or telecom marketing or insurance actuarial organizations, the answer may be
Don’t worry; the stats geeks have been at this for a long time; they really do understand our business.
All three approaches work.
But what about conventional enterprises, where line-of-business people may not be as math-savvy as internet developers or financial traders, and where the math experts may not have the business issues down cold? My flippant answer is that businesspeople should know some math too.* My more serious answer is that the “business analyst” role should be expanded beyond BI and planning to include lightweight predictive analytics as well.
*I wasn’t being entirely flippant, of course. Statistics is even being taught in high school these days. And when I got a PhD in game theory, 2/3 of my thesis committee was at the Harvard Business School.
For example, at retailers:
- Market basket analysis is pretty simplistic (it only looks at small subsets of a basket at a time; see the sketch below).
- Seasonality is tricky. (Weather and so on can skew it.)
- Each store or region can be its own universe.
- Some of the results of analytics are rather coarse-grained — e.g., merchandise adjacencies — so precision in statistical analysis may not matter much anyway.
And so truly rigorous statistical analysis may be both infeasible and unnecessary; a lot of business-informed, seat-of-the-pants reasoning needs to be mixed in. Consequently, there's a lot to be said for pushing at least some retail predictive analytics pretty close to the merchandising department(s).
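To illustrate the market-basket point from the list above: classic association analysis really does just count how often small item subsets (usually pairs) co-occur. The sketch below is my own toy example in Python, with invented baskets; none of it comes from the post.

```python
# Toy pairwise market-basket analysis: support, confidence, and lift for item pairs.
# Baskets are invented purely for illustration.
from collections import Counter
from itertools import combinations

baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "diapers", "beer"},
    {"bread", "milk"},
    {"beer", "diapers"},
]

n = len(baskets)
item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(frozenset(pair) for basket in baskets
                      for pair in combinations(sorted(basket), 2))

for pair, count in pair_counts.most_common():
    a, b = sorted(pair)
    support = count / n                          # how often the pair appears at all
    confidence = count / item_counts[a]          # P(b in basket | a in basket)
    lift = confidence / (item_counts[b] / n)     # > 1 means more co-occurrence than chance
    print(f"{a} & {b}: support={support:.2f}, confidence={confidence:.2f}, lift={lift:.2f}")
```

Notice that nothing here looks at whole baskets, seasonality, or store-level differences, which is exactly why business-informed judgment has to be layered on top.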
Similar stories could be told in many other industries and pursuits, including but emphatically not limited to:
- Event marketing.
- College admissions.
- Political campaigning.
- Field maintenance at utility companies.
- Price-setting (across many industries).
In each case, it’s easy to see how statistical and predictive analytic techniques could add real value to the business. But it’s hard to imagine how the enterprise could support the kind of large, experienced, business-knowledge analytic operation one might find in hedge fund investing or telecom churn analysis. And absent that, it’s tough to see why the only people doing predictive analytics for the organization should sit in some silo of statistical expertise.
Comments
Another excellent post. Anyone with actual experience in applied analytics can tell you that the most difficult and time-consuming parts of the analytic process are (a) convincing the business to use predictive modeling, and (b) identifying a business problem to attack.
Part of this is cultural. We all know analysts who can tell you in excruciating detail how to compute an F test but couldn't sell igloos to Eskimos.
Businesspeople, on the other hand, don't care whether you use perceptrons or Kohonen maps; they just want a cost-effective and reliable prediction.
It seems to me that agile has to start and end with a business frame of mind — that the goal is to develop predictions that work, and not to show how well we can do the math.
Thomas,
Thanks!
I’d make that “work well enough”. I think that at a large fraction of organizations, business units should do their own modeling. In some cases, some or all of the models will inform production use directly. In other cases, they’ll just be the “one to throw away” before the modeling professionals get to work on the problem.
Curt,
As a SAS employee I’m perplexed by your statement “expensive annual SAS licenses and seemingly slow-moving SAS programmers.”
I don’t understand what you mean by the statement about SAS programmers. They must provide value to their employers or else they wouldn’t have a job. SAS programmers are some of the most intelligent, productive people out there.
And as for the myth that SAS is expensive, please check out the article by InformationWeek’s Doug Henschen at http://www.informationweek.com/news/software/bi/231002687
SAS is also enabling many businesses, large and small, other than telcos and hedge funds, to benefit from advanced analytics. SAS is not alone in providing analytics tools that offer visual interfaces and don't require statistical expertise.
Curt,
Maybe so — folks have been touting that vision for a long time, but the job market is robust for specialists.
If analytics are mission-critical, you can bet that organizations won’t buy into the notion that the receptionist can build predictive models between calls.
Seems more likely that analysts themselves will learn how to integrate better with the business. Managers who run large analytics shops generally strike me as businesspeople, not geeks.
Henschen’s InformationWeek article is misleading, because it compares apples to oranges. Prices given for Revolution and Alpine are for perpetual licenses; for SAS, the price given appears to be a first-year licensing fee; after the first year, the user must pay renewal fees that substantially exceed the maintenance fees on other software. A more accurate comparison would show life-cycle cost over a period of several years.
Also, Alpine and Revolution are designed to run in high performance appliances, while the SAS pricing given seems to be for the desktop version. A comparable comparison would be with SAS HPA, which SAS expects to release this quarter.
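To make the life-cycle point concrete, here is a back-of-the-envelope comparison in Python. Every number below is a hypothetical placeholder, not an actual price for SAS, Revolution, or Alpine; the point is only that cumulative costs for perpetual-plus-maintenance and annual-renewal licensing diverge over time.

```python
# Hypothetical license-cost arithmetic -- placeholder numbers, not real vendor prices.
perpetual_price = 100_000   # one-time perpetual license (placeholder)
maintenance_rate = 0.20     # annual maintenance as a fraction of the perpetual price (placeholder)
annual_fee = 60_000         # first-year fee, repeated as a renewal each year (placeholder)

for year in range(1, 6):
    perpetual_total = perpetual_price + perpetual_price * maintenance_rate * (year - 1)
    annual_total = annual_fee * year
    print(f"Year {year}: perpetual cumulative = {perpetual_total:,.0f}, "
          f"annual-license cumulative = {annual_total:,.0f}")
```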
SAS is a great product, and its renewal pricing model enables the company to offer a level of customer support that other vendors do not match.
Steve, Thomas,
The other pricing question is — if you don’t pay up for SAS apps, does SAS have the same usability as more modern tools?
If you don’t pay a renewal fee, SAS stops working (as do any applications you build in SAS).
I meant to ask whether SAS had good usability without its vertical applications.
Yes, the SAS Programming Language is very powerful, and that is what most customers actually use. The vertical solutions are strategic for SAS because that’s how they appeal outside their traditional user base.
SAS does not publicly disclose the numbers, but in my personal experience about 70% of all SAS customers use basic SAS, about 25% use applications like Enterprise Miner or SAS BI, and about 5% use a vertical solution.
I’m not sure I agree with this part:
“Business-knowledgeable people develop a theory as to what kinds of information and segmentation could be valuable in making better business micro-decisions.”
The nice thing about statistical modeling is that you don't really need a theory; you just need a question.
“We want to figure out who is most likely to respond to this mailing; here are 1,000 customer attributes; build us a targeting model” or “Segment this user base by trading activity; here are 1,000 customer attributes; tell us which ones differentiate the segments.”
The hypothesizing as to “why” these twenty variables drive trading activity, and the building of a narrative around that explanation, come last, not first, and are in a sense optional.
SAS is expensive from a sticker price perspective, especially compared to R.
The overall discussion needs to be a TCO one of course. TCO is very sensitive to input assumptions. YMMV.
But a big part of what statisticians do is knock things down from 1000 variables to 60, so that the best 20 of the 60 can be pulled out automagically.
My point is that delegating away that selection isn’t necessarily the best move.
My further point is that the best models aren’t necessarily linear. Once you start allowing in the products of two or even, gasp, three variables, you really need business-informed judgment to whittle down the modeling dimensions.
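A sketch of what that interplay can look like in code, under my own assumptions (synthetic data, scikit-learn, and L1 regularization standing in for the "automagic" selection step; none of this is from the thread): pairwise interaction terms blow the column count up fast, and an L1 penalty then shrinks most coefficients to zero, leaving a short list for business-informed review.

```python
# Sketch: interaction terms explode the modeling dimensions; L1 regularization
# automatically pulls a short list of surviving variables back out.
# Synthetic data and scikit-learn are illustrative assumptions, not from the thread.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LogisticRegressionCV

rng = np.random.default_rng(0)
X = rng.normal(size=(1500, 30))   # stand-in for a pre-screened variable set; 1,000 raw columns scale the same way
y = (X[:, 0] * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=1500) > 0).astype(int)

# All pairwise products turn 30 columns into 30 + C(30, 2) = 465 columns.
expander = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
X_wide = expander.fit_transform(X)
print("expanded shape:", X_wide.shape)

# L1-penalized logistic regression drives most coefficients to exactly zero.
model = LogisticRegressionCV(Cs=5, penalty="l1", solver="saga", max_iter=5000).fit(X_wide, y)
kept = np.flatnonzero(model.coef_[0])
print(len(kept), "of", X_wide.shape[1], "columns survive for human review")
```

Whether the surviving list makes business sense, and whether three-way products are worth the added opacity, is exactly the kind of judgment call being described.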
Interesting and articulate article, as always, Curt. Nice job.
One trend we generally see with specialist niche capabilities is that they often get folded into "applications" over time. An example is BI, which used to consist of specialized reporting tools; many of the reporting capabilities are now made available via business applications, and dashboards and visualizations are also being folded into apps.
Do you think predictive analytics will also move in that direction? For example, will retail merchandising apps now have predictive inventory-turn calculations, campaign management tools have list selection based on predictive scoring, or lead management applications have integrated predictive lead-quality scoring? What does that portend for agile predictive analytics?
Nick,
I think it will be hard to tightly couple predictive analytics with operational apps, perhaps a few niches aside. In predictive analytics, you're trying to discover something previously unknown. Sometimes, part of what you're trying to discover is what your data structure should be. None of that meshes well with the packaged operational app business.
Perhaps I should add to my previous answer by saying that verticalizing predictive apps might go better. And if that happens to get integrated in practice with operational apps, so be it.
Does anybody know what actual analytics are done via price optimizers like Zilliant and Vendavo? There’s an example that comes to mind — but then, those companies don’t really seem to be booming …
Thanks Curt. Sounds reasonable. The verticalized predictive + operational apps would likely come first, such as Ascent's gate assignment optimization solution, which has a very large share of the airport and airline market (http://www.ascent.com/)
Cheers,
Nick
Curt, another great post and obviously you are preaching to the choir with us. My view is that in most companies, regardless of their analytical sophistication:
1/ there are still too many people manually cranking things that can be automated (accurately) at scale.
2/ there are plenty of use cases where speed becomes a very important factor in real-life model accuracy, because if it takes too long to build the model, the business has changed and the model is therefore worthless (e.g., fast product lifecycles and marketing campaigns).
So "good enough" may be viewed as suboptimal by the purists. But they are losing sight of the fact that many business decision makers have NO MODELS to base their decisions on, because the analysts are overbooked (e.g., marketing teams running campaigns with no models at all). So, just from a philosophical standpoint, a "good enough" model always beats no model…
I don’t want to oversimplify the process or imply that the receptionist is going to build a great predictive model. But even for big “important” applications for major banks and telecoms companies, speed and agility ARE important.
And with today’s technology, you can build models quickly and accurately without compromising on quality. In fact, we often beat “traditional” models on quality because we can include more dimensions (e.g. thousands of columns including derived data) and nevertheless quickly build the model through automatic variable reduction.
Last, I would agree with unholy guy that the question is what's important. Let the modeling technology find the right variables. Otherwise, you are excluding information a priori, and you just may be another victim of the curse of knowledge.