Notes on predictive modeling, November 2, 2014
Following up on my “notes on predictive modeling” post from three weeks ago, I’d like to tackle some areas of recurring confusion.
Why are we modeling?
Ultimately, there are two reasons to model some aspect of your business:
- You generally want insight and understanding.
  - This is analogous to why you might want to do business intelligence.
  - It commonly includes a search for causality, whether or not “root cause analysis” is exactly the right phrase to describe the process.
- You want to do calculations from the model to drive wholly or partially automated decisions.
  - A big set of examples can be found in website recommenders and personalizers.
  - Another big set of examples can be found in marketing campaigns.
  - For an example of partial automation, consider a tool that advises call center workers.
How precise do models need to be?
Use cases vary greatly with respect to the importance of modeling precision. If you’re doing an expensive mass mailing, 1% additional accuracy is a big deal. But if you’re doing root cause analysis, a 10% error may be immaterial.
Who is doing the work?
It is traditional to have a modeling department, staffed by “data scientists” or SAS programmers as the case may be. While it seems cool to put predictive modeling straight in the hands of business users — some business users, at least — it’s rare for them to use predictive modeling tools more sophisticated than Excel. For example, KXEN never did all that well.
That said, I support the idea of putting more modeling in the hands of business users. Just be aware that doing so is still a small business at this time.
“Operationalizing” predictive models
The topic of “operationalizing” models arises often, and it turns out to be rather complex. Usually, to operationalize a model, you need:
- A program that generates scores, based on the model.
- A program that consumes scores (for example a recommender or fraud alerter).
In some cases, the two programs might be viewed as different modules of the same system.
While a numerical score — or scores — is not strictly necessary to the process, scores seem to be quite common in practice. Certainly the score calculations can create a boundary for loose coupling between model evaluation and the rest of the system.
That said:
- Sometimes the scoring is done on the fly. In that case, the two programs mentioned above are closely integrated.
- Sometimes the scoring is done in batch. In that case, loose coupling seems likely. Often, there will be ETL (Extract/Transform/Load) to make the scores available to the program that will eventually use them.
- PMML (Predictive Modeling Markup Language) is good for some kinds of scoring but not others. (I’m not clear on the details.)
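The batch pattern above can be sketched in a few lines. Everything here is illustrative rather than taken from the post: the toy weighted-sum “model,” the in-memory `SCORE_STORE` (standing in for a database table populated by ETL), and the 0.7 offer threshold are all hypothetical. The point is only the loose coupling: one program writes scores, a separate program reads them later.

```python
import math

# Minimal sketch of batch scoring with loose coupling: one program
# persists scores to a shared store; another consumes them later.
# All names and numbers (score_customer, SCORE_STORE, the 0.7
# threshold) are illustrative, not from any real system.

SCORE_STORE = {}  # stands in for a score table loaded via ETL


def score_customer(features):
    """Toy 'model': a weighted sum squashed into [0, 1] by a sigmoid."""
    weights = {"recency": -0.4, "frequency": 0.3, "spend": 0.5}
    raw = sum(weights[k] * features[k] for k in weights)
    return 1.0 / (1.0 + math.exp(-raw))


def batch_score(customers):
    """Scoring program: runs on a schedule, persists every score."""
    for cust_id, features in customers.items():
        SCORE_STORE[cust_id] = score_customer(features)


def decide_offer(cust_id, threshold=0.7):
    """Consuming program: reads a stored score, makes a decision."""
    return "send_offer" if SCORE_STORE.get(cust_id, 0.0) >= threshold else "hold"


batch_score({"c1": {"recency": 1.0, "frequency": 5.0, "spend": 2.0},
             "c2": {"recency": 8.0, "frequency": 0.5, "spend": 0.2}})
print(decide_offer("c1"), decide_offer("c2"))  # → send_offer hold
```

Because the scoring and deciding programs touch only the shared store, either side can be reworked — or rescheduled from batch to on-the-fly — without the other knowing.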
In any case, operationalizing a predictive model can or should include:
- A process for creating the model.
- A process for validating and refreshing the model.
- A flow of derived data.
- A program that consumes the model’s outputs.
Traditional IT considerations, such as testing and versioning, apply.
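The refresh-and-versioning part of that list can be sketched as follows. This is a deliberately toy example under stated assumptions: the majority-label “training,” the 0.6 accuracy floor, and the `MODEL_REGISTRY` list (standing in for a real model catalogue) are all hypothetical. It shows only the control flow: validate the deployed model on fresh labeled data, retrain when performance degrades, and keep every deployed version.

```python
# Minimal sketch of a validate-and-refresh loop with versioning.
# The drift floor (0.6), the registry, and the toy metric are
# illustrative, not a real production design.

MODEL_REGISTRY = []  # versioned history of deployed models


def train(data):
    """Toy 'training': memorize the majority label of (id, label) pairs."""
    ones = sum(label for _, label in data)
    return {"majority": 1 if ones * 2 >= len(data) else 0}


def accuracy(model, data):
    hits = sum(1 for _, label in data if label == model["majority"])
    return hits / len(data)


def validate_and_refresh(model, fresh_data, floor=0.6):
    """Refresh the model when accuracy on fresh data drops below a floor."""
    if accuracy(model, fresh_data) < floor:
        model = train(fresh_data)
        MODEL_REGISTRY.append(model)  # versioning: keep every deployed model
    return model


model = train([("a", 1), ("b", 1), ("c", 0)])
MODEL_REGISTRY.append(model)
fresh = [("d", 0), ("e", 0), ("f", 1)]
model = validate_and_refresh(model, fresh)
print(len(MODEL_REGISTRY))  # → 2, because a refresh happened
```

A real deployment would swap in genuine training and metrics, but the shape is the same: the registry is what makes testing and rollback — the “traditional IT considerations” above — possible.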
What do we call it anyway?
The term “predictive analytics” was coined by SPSS. It basically won. However, some folks — including whoever named PMML — like the term “predictive modeling” better. I’m in that camp, since “modeling” seems to be a somewhat more accurate description of what’s going on, but I’m fine with either phrase.
Some marketers now use the term “prescriptive analytics”. In theory that makes sense, since:
- “Prescriptive” can be taken to mean “operationalized predictive”, saving precious syllables and pixels.
- What’s going on is usually more directly about prescription than prediction anyway.
Edit: Ack! I left the final paragraph out of the post, namely:
In practice, however, the term “prescriptive analytics” is a strong indicator of marketing nonsense. Predictive modeling has long been used to — as it were — prescribe business decisions; marketers who use the term “prescriptive analytics” are usually trying to deny that very obvious fact.
Comments
4 Responses to “Notes on predictive modeling, November 2, 2014”
Kurt,
A few comments.
(1) Precision and accuracy are not the same thing; predictive models can be precise but not accurate, accurate but not precise, both or neither.
(2) Organizations do not let “business users” deliver high value “money” analytics for applications like credit risk, fraud, trading and so forth. The people who work in these areas aren’t afraid to code.
(3) A scoring engine simply computes a numerical score; a decision engine implements rules based on scores and other criteria. Separating them into modules makes sense because the tasks are often asynchronous; we may want to score in batch, store the results and use the score in a real-time decision.
It also makes sense to separate the scoring engine from the model training or learning engine because (a) the tasks are asynchronous; (b) scoring is embarrassingly parallel and can be implemented inside MPP databases; (c) scoring is a production application; and (d) scoring does not require highly trained analytic specialists.
PMML is a fine standard, but only works if the organization has aligned the data models for the deployment environment and the model development environment. Many haven’t.
The process for operationalizing a model also requires a facility to catalogue deployed models and to track performance over time.
(4) There’s a bit more to prescriptive analytics than marketing:
http://en.wikipedia.org/wiki/Prescriptive_analytics
Regards,
Thomas
Thomas,
That Wikipedia article is my top example for holding the opposite opinion from you. But I didn’t realize until you sent me back to the article that the term “Prescriptive analytics” is trademarked by one obscure firm, and hence should probably be ignored by the rest of the industry entirely.
Kurt,
Ayata is obscure? No more than — let’s say — Nutonian. Obviously, Ayata punches above its weight if their thoughtware is part of the lexicon.
Regards,
Thomas
Thomas,
What’s the relevance of Nutonian here? This post contradicts some of their marketing claims too. Even more to the point, it is approximately true that nobody except a trademark holder should use a trademarked term.
Also — I somehow forgot the last paragraph of the post. It’s been added in now.