November 16, 2008

When people don’t want accurate predictions made about them

In a recent article on governmental anti-terrorism data mining efforts — and the privacy risks associated with same — The Economist wrote (emphasis mine):

Abdul Bakier, a former official in Jordan’s General Intelligence Department, says that tips to foil data-mining systems are discussed at length on some extremist online forums. Tricks such as calling phone-sex hotlines can help make a profile less suspicious. “The new generation of al-Qaeda is practising all that,” he says.

Well, duh. Terrorists and fraudsters don’t want to be detected. Algorithms that rely on positive evidence of bad intent may work anyway. But if you rely on evidence that shows people are not bad actors, that’s likely to work about as well as Bayesian spam detectors.*

*I.e., pretty much not at all. The idea behind Bayesian spam detectors is that rare words are the best indicator of subject matter. So spammers salt spam with random non-spammy rare words as companions to their spammy ones, and Bayesian filters wave it through. That’s been going on since shortly after I predicted it in 2003 or 2004.
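To make the salting trick concrete, here is a minimal sketch of a toy naive Bayes scorer. It is an illustration under stated assumptions, not the code of any real spam filter: the four training messages, the function names `train` and `spam_log_odds`, and the add-one smoothing are all hypothetical choices made for this example.

```python
import math
from collections import Counter

def train(docs):
    """Count word occurrences per class from (text, label) pairs."""
    counts = {"spam": Counter(), "ham": Counter()}
    n_docs = Counter()
    for text, label in docs:
        counts[label].update(text.lower().split())
        n_docs[label] += 1
    return counts, n_docs

def spam_log_odds(text, counts, n_docs):
    """Naive Bayes log-odds that text is spam (positive means spam)."""
    vocab = set(counts["spam"]) | set(counts["ham"])
    total = {label: sum(counts[label].values()) for label in counts}
    score = math.log(n_docs["spam"] / n_docs["ham"])  # class prior
    for word in text.lower().split():
        # Add-one (Laplace) smoothing so unseen words never give zero.
        p_spam = (counts["spam"][word] + 1) / (total["spam"] + len(vocab))
        p_ham = (counts["ham"][word] + 1) / (total["ham"] + len(vocab))
        score += math.log(p_spam / p_ham)
    return score

# Hypothetical toy training set: two spam and two legitimate messages.
training = [
    ("cheap pills buy now", "spam"),
    ("buy cheap watches now", "spam"),
    ("quarterly budget meeting agenda", "ham"),
    ("sonnet about autumn orchards", "ham"),
]
counts, n_docs = train(training)

plain = "buy cheap pills now"
# The same message, salted with words seen only in legitimate mail.
salted = plain + " sonnet autumn orchards quarterly agenda meeting budget about"

plain_score = spam_log_odds(plain, counts, n_docs)
salted_score = spam_log_odds(salted, counts, n_docs)
print(f"plain:  {plain_score:+.2f}")
print(f"salted: {salted_score:+.2f}")
```

In this toy setup the unsalted message scores above zero (spam) and the salted one scores below zero, because every added non-spammy word contributes evidence of innocence that the spammy words have to outweigh. That is the weakness of relying on evidence that someone is not a bad actor: the bad actor gets to supply it.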

Now let’s take that idea a little further. A lot of data mining and predictive analytics is devoted to figuring out which customers and prospects should get the most attractive offers, or other preferential treatment. The biggest example of this may be telecom companies that invest in reducing churn rates, but other examples abound. Gaming companies send pit bosses out to give comp tickets to gamblers who may have reached their pain level on losses. Personalized websites might offer individualized deals as well.

As public awareness of these techniques grows, there’s an obvious risk — consumers could try to game the system so as to get special treatment. It’s not as if this is unknown behavior even without data mining; people complain loudly all the time in hope that they’ll get some sort of mollifying payoff. And so we’ll have just one more reason why data mining models need to be constantly moving targets.

And we have two of the many reasons why data mining is an ethical minefield:

- It’s tempting for companies to take advantage of their most docile, agreeable, least demanding customers.
- It’s tempting for consumers to pretend to hold attitudes different from how they really feel.

Categories: Analytic technologies, Data warehousing, Surveillance and privacy

Comments

One Response to “When people don’t want accurate predictions made about them”

Brad on December 10th, 2008 8:12 am

Interesting analysis – not sure if I agree that Bayesian spam filters are entirely useless, but admittedly spammers have found ways to make them less useful.
It’s something to keep in mind as we decide how to use mining data.
