When people don’t want accurate predictions made about them
In a recent article on governmental anti-terrorism data mining efforts — and the privacy risks associated with same — The Economist wrote (emphasis mine):
> Abdul Bakier, a former official in Jordan’s General Intelligence Department, says that tips to foil data-mining systems are discussed at length on some extremist online forums. Tricks such as calling phone-sex hotlines can help make a profile less suspicious. “The new generation of al-Qaeda is practising all that,” he says.
Well, duh. Terrorists and fraudsters don’t want to be detected. Algorithms that rely on positive evidence of bad intent may work anyway. But if you rely on evidence that shows people are not bad actors, that’s likely to work about as well as Bayesian spam detectors.*
*I.e., pretty much not at all. The idea behind Bayesian spam detectors is that rare words are the best indicator of subject matter. So spammers salt spam with random non-spammy rare words as companions to their spammy ones, and Bayesian filters wave it through. That’s been going on since shortly after I predicted it in 2003 or 2004.
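To make the footnote concrete, here’s a minimal sketch of that salting effect on a naive Bayes spam score. The word probabilities below are invented for illustration; real filters estimate them from large training corpora.

```python
# Minimal sketch of "Bayesian poisoning": salting spam with innocuous
# rare words to drag down a naive Bayes spam score.
# All probabilities are hypothetical, made up for this illustration.
import math

# P(word | spam) and P(word | ham) -- pretend training estimates
p_spam = {"viagra": 0.90, "winner": 0.80, "meeting": 0.05, "gardening": 0.02}
p_ham  = {"viagra": 0.01, "winner": 0.05, "meeting": 0.60, "gardening": 0.30}

def spam_score(words):
    """Combine per-word evidence as log-odds; a score > 0 leans spam."""
    return sum(
        math.log(p_spam[w] / p_ham[w])
        for w in words if w in p_spam
    )

print(spam_score(["viagra", "winner"]))                      # ~7.3, strongly spammy
# The same message salted with non-spammy rare words scores far lower:
print(spam_score(["viagra", "winner", "meeting", "gardening"]))  # ~2.1
```

The point is just that a handful of strongly “hammy” words can largely cancel strong spam evidence, which is why static word-probability models degrade once senders adapt.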
Now let’s take that idea a little further. A lot of data mining and predictive analytics is devoted to figuring out which customers and prospects should get the most attractive offers, or other preferential treatment. The biggest example of this may be telecom companies that invest in reducing churn rates, but others abound: gaming companies send pit bosses out to give comp tickets to gamblers who may have reached their pain level on losses, and personalized websites might offer individualized deals as well.
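As a toy illustration of that kind of offer targeting, here’s a hypothetical churn-scoring rule of the sort a telecom might apply. The function names, features, and thresholds are all invented; real systems learn scores from behavioral data rather than hand-set weights.

```python
# Hypothetical sketch: a churn model's score decides who gets the
# "preferential treatment" the paragraph above describes.
# Features, weights, and thresholds are all made up for illustration.

def churn_score(support_calls_per_month: int, months_since_upgrade: int) -> float:
    """Toy churn risk in [0, 1]: more complaints and a staler plan mean riskier."""
    return min(1.0, 0.1 * support_calls_per_month + 0.02 * months_since_upgrade)

def retention_offer(score: float) -> str:
    """Map predicted churn risk to an offer tier."""
    if score > 0.7:
        return "deep discount + free handset upgrade"
    if score > 0.4:
        return "modest loyalty discount"
    return "no offer"  # the quiet, docile customer pays full price

# The squeaky wheel gets the grease; the loyal customer gets nothing.
print(retention_offer(churn_score(support_calls_per_month=8, months_since_upgrade=10)))
print(retention_offer(churn_score(support_calls_per_month=0, months_since_upgrade=2)))
```

Note how the quiet customer gets nothing, which is exactly the tension in the ethics list below, and how easy the scoring rule would be to game once its inputs are known.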
As public awareness of these techniques grows, there’s an obvious risk — consumers could try to game the system so as to get special treatment. It’s not as if this is unknown behavior even without data mining; people complain loudly all the time in hope that they’ll get some sort of mollifying payoff. And so we’ll have just one more reason why data mining models need to be constantly moving targets.
And there we have two of the many reasons why data mining is an ethical minefield:
- It’s tempting for companies to take advantage of their most docile, agreeable, least demanding customers.
- It’s tempting for consumers to feign attitudes they don’t actually hold.
Related links
- Freedom even without data privacy (a public policy wish list)
- Who will watch the watchmen?
- The no-fly list in practice
Comments
Interesting analysis – not sure if I agree that Bayesian spam filters are entirely useless, but admittedly spammers have found ways to make them less useful.
It’s something to keep in mind as we decide how to use data mining.