How is the surveillance data used?
Over the past week, discussion has exploded about US government surveillance. Having summarized, as best I could, what data the government appears to collect, I’d now like to consider what they actually do with it. More precisely, I’d like to focus on the data’s use(s) in combating US-soil terrorism. In a nutshell:
- Reporting is persuasive that electronic surveillance data is helpful in following up on leads and tips obtained by other means.
- Reporting is not persuasive that electronic surveillance data on its own uncovers or averts many terrorist plots.
- With limited exceptions, neither evidence nor logic suggests that data mining or predictive modeling does much to prevent domestic terrorist attacks.
Consider the example of Tamerlan Tsarnaev:
In response to this 2011 request, the FBI checked U.S. government databases and other information to look for such things as derogatory telephone communications, possible use of online sites associated with the promotion of radical activity, associations with other persons of interest, travel history and plans, and education history.
While that response was unsuccessful in preventing a dramatic act of terrorism, at least they tried.
As for actual success stories — well, that’s a bit tough. In general, there are few known examples of terrorist plots being disrupted by law enforcement in the United States, except for fake plots engineered to draw terrorist-leaning individuals into committing actual crimes. One of those examples, that of Najibullah Zazi, was indeed based on an intercepted email — but the email address itself was uncovered through more ordinary anti-terrorism efforts.
As for machine learning/data mining/predictive modeling, I’ve never seen much of a hint of it being used in anti-terrorism efforts, whether in the news or in my own discussions inside the tech industry. And I think there’s a great reason for that — what would they use for a training set? Here’s what I mean.
Unless the jargon is being misused — which of course happens all too often — data mining works like this:
- Data sets are collected in which outcomes are matched to (vectors of) other (independent) variables. These are called training sets.
- Analytic software is run, with the training sets as inputs and algorithms as outputs. This is called training the model. The output algorithms purport to estimate which other vectors of independent variables are likely to be associated with which outcomes.
Yes, I’m saying that predictive modeling software, used at the modeling stage — as opposed to the model scoring/execution stage — has algorithms as output. Depending on details, that’s either literally true or else just true in effect.
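The split between the modeling stage and the scoring stage can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `train`, the outcome labels, and the toy numbers are all invented here, and nearest-centroid classification is just one very simple technique chosen to keep the sketch short. The point is that `train` takes a training set as input and returns a function (the "algorithm") as output.

```python
from collections import defaultdict


def train(training_set):
    """Modeling stage: take (features, outcome) pairs, return a scoring function."""
    # Group the feature vectors by the outcome they were matched to.
    groups = defaultdict(list)
    for features, outcome in training_set:
        groups[outcome].append(features)

    # Summarize each outcome by the average of its feature vectors.
    centroids = {
        outcome: [sum(column) / len(rows) for column in zip(*rows)]
        for outcome, rows in groups.items()
    }

    def model(features):
        """Scoring stage: pick the outcome whose centroid is closest."""
        def squared_distance(centroid):
            return sum((a - b) ** 2 for a, b in zip(features, centroid))
        return min(centroids, key=lambda o: squared_distance(centroids[o]))

    return model


# Invented toy training set: (feature vector, observed outcome).
toy_training_set = [
    ((1.0, 1.0), "no_sale"),
    ((1.5, 0.5), "no_sale"),
    ((4.0, 5.0), "sale"),
    ((5.0, 4.5), "sale"),
]

model = train(toy_training_set)   # output of training is itself an algorithm
print(model((4.5, 4.0)))          # scoring a new, unseen vector
```

Real predictive modeling software is far more elaborate, but the shape is the same: data goes in at training time, and what comes out is something that can later be run against new vectors.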
For example, in the simplest case, namely a linear regression:
- The outcome is an event such as a product sale (desirable) or equipment failure (to be avoided).
- The algorithm is a weighted sum of the other variables, whose value is interpreted as the probability of that outcome.
- The algorithm discovery process simply boils down to calculating the coefficients in the weighted sum.
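To make the linear regression case concrete, here is a minimal sketch in plain Python, assuming an ordinary least-squares fit solved through the normal equations. The function names `fit_linear` and `score` and the six data rows are invented for illustration. Outcomes are coded as 1 (the event happened) or 0 (it did not).

```python
def fit_linear(rows, outcomes):
    """Least-squares fit; returns weights [intercept, w1, ..., wk]."""
    X = [[1.0] + list(r) for r in rows]  # prepend a constant term
    k = len(X[0])
    # Normal equations (X^T X) w = X^T y, stored as an augmented matrix.
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * y for x, y in zip(X, outcomes))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def score(weights, row):
    """The resulting 'algorithm': a weighted sum of the input variables."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], row))


# Invented toy data: two input variables per row, outcome 1 or 0.
rows = [(1, 0), (2, 1), (3, 0), (4, 1), (5, 1), (6, 0)]
outcomes = [0, 0, 0, 1, 1, 1]

weights = fit_linear(rows, outcomes)
print(weights)
print(score(weights, (4, 1)))
```

On this toy data the fitted weights work out to roughly -0.5, 0.25 and 0.25. Note that a plain weighted sum is not constrained to lie between 0 and 1 (the first row scores about -0.25), which is one reason the probability interpretation should be taken loosely in the simplest case.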
When data mining and predictive modeling get a little more complicated than that, we still call them “statistical analysis”; when they get much more complicated, the name “machine learning” is commonly used instead.
And so my views on the application of predictive modeling to domestic US anti-terrorism start:
- In most respects, there aren’t enough examples to train models to help predict or avert terror attacks.
- Presumably not coincidentally, while I’ve heard of many query and visualization techniques — notably graph analytics — I haven’t heard of predictive modeling applied directly to anti-terrorism.
- There’s one big exception to this rule:
  - Surveillance-based anti-terrorism efforts depend heavily on natural language processing …
  - … and natural language processing depends heavily on machine learning.
Perhaps there are other examples similar to the natural language one, but nothing is currently coming to mind.
Note that not all these arguments apply to all parts of the world. For example, there have been enough roadside IEDs (Improvised Explosive Devices) in Iraq and Afghanistan that looking for unusual communication patterns associated with them might bear fruit. But when it comes to fending off terrorist attacks on US soil, I believe the main use of surveillance data is for straightforward query and data visualization based on the best educated guesses of smart human analysts.
Comments
9 Responses to “How is the surveillance data used?”
I have no idea whether your conclusion is correct, Curt, but a few points are worth noting:
1. There are many techniques (e.g. oversampling) that can be used to help with modeling rare events.
2. The definition of “data mining” above is obviously an informal one, but amongst other things it should also include unsupervised techniques, in which you are trying to find hidden structure and patterns in data that don’t necessarily have a specified dependent variable.
3. Presumably a terrorist attack is not the only event of interest. For example, I assume it’s interesting to predict the likelihood of an individual’s being engaged in a terrorist group.
4. Sadly, terrorist events in the U.S. are not that rare. There have been more than 2000 since 1970. And is every foiled attack reported? I’m not sure.
Steven,
You raise good points. Not all intelligence work is pure anti-terrorism.
And subsequent to my writing the post above, I opened an email from a predictive modeling software company telling me of a forthcoming feature that they were working on “for the spooks”.
Hi Curt,
With regard to training data sets, have you seen the work the LAPD has done? They use the patterns associated with earthquake aftershocks to predict crime after an incident, in the area where it is most likely to occur.
Perhaps all that intelligence data has been (or could be) applied in a similar way?
Rob
“When I touched the unit in my pants, it exploded prematurely”.
Mine that data.
“when it comes to fending off terrorist attacks on US soil…”
… Is this kind of “leaked” information put out in order to make people believe that such a technology is not only possible but that the NSA possesses it, and to create fear among ordinary people on many levels?
Please, are you really serious about playing their game, or do you just not understand who rules the world and how? In case you don’t know who “they” are, check out this video: http://www.youtube.com/watch?v=wtSVBTne-KY&feature=related ; everything you see there is real.
To see another huge secret, see this: http://www.youtube.com/watch?v=_62WWGLzynY ; yes, stars and planets are actually very small and in close proximity around the Earth, the only existing world. No aliens. Keep researching this; there are many more secrets… Once you see the Truth, you will understand how huge the deception is. And then tell it to your friends and the readers of your blog, however fantastic it is.
Still not convinced? Type itanimulli.com into your browser (not Google); “itanimulli” is “illuminati” reversed. It will take you right to the source of the deception: the NSA.
Now that you’ve got the picture of the hidden reality, you are ready to see another piece of the Truth. Stars and planets only appear after dark advances, like this: http://www.youtube.com/watch?v=eaYlUgH7Fac&feature=related ;
Which reminds me to ask: WHO CREATED all this?