Where things stand in US government surveillance
Edit: Please see the comment thread below for updates. Please also see a follow-on post about how the surveillance data is actually used.
US government surveillance has exploded into public consciousness since last Thursday. With one major exception, the news has just confirmed what was already thought or known. So where do we stand?
My views about domestic data collection start:
- I’ve long believed that the Feds — specifically the NSA (National Security Agency) — are storing metadata/traffic data on every telephone call and email in the US. The recent news, for example Senator Feinstein’s responses to the Verizon disclosure, just confirms it. That the Feds sometimes claim this has to be “foreign” data or they won’t look at it hardly undermines my opinion.
- Even private enterprises can more or less straightforwardly buy information about every credit card purchase we make. So of course the Feds can get that as well, as the Wall Street Journal seems to have noticed. More generally, I’d assume the Feds have all the financial data they want, via the IRS if nothing else.
- Similarly, many kinds of social media postings are aggregated for anybody to purchase, or can be scraped by anybody who invests in the equipment and bandwidth. Attensity’s service is just one example.
- I’m guessing that web use data (http requests, search terms, etc.) is not yet routinely harvested by the US government.* Ditto deanonymization of same. I guess so mainly because I’ve heard few rumblings to the contrary. Further, the consumer psychographic profiles that are so valuable to online retailers might be of little help to national security analysts anyway.
- Video surveillance seems likely to grow, from fixed cameras perhaps to drones; note for example the various officials who called for more public cameras after the Boston Marathon bombing. But for the present discussion, that’s of lesser concern to me, simply because it’s done less secretively than other kinds of surveillance. If there’s a camera that can see us, often we can see it too.
*Recall that these comments are US-specific. Data retention legislation has been proposed or passed in multiple countries to require recording of, among other things, all URL requests, with the stated goal of fighting either digital piracy or child pornography.
As for foreign data:
- Last I heard, we were collecting at least 10s of petabytes of satellite images per day. That’s probably too much even for the US government to persist in its entirety at this time. In the installation I heard of, most of the satellite data was deleted within 12-48 hours. But it may fit into the yottabyte-scale data center in Utah.
- I also once heard the US monitors every radio transmission detectable from North Korea.
Beyond that, use your imagination.
The big question is how much domestic or quasi-domestic communications-content data the US government currently captures. I think it’s a lot more than we previously acknowledged. For example:
- Both Edward Snowden and William Binney have said things that sound like the NSA is comprehensively storing actual communications content. I guess it’s possible that in each case they misspoke.
- Other claims to that effect have been more ringing. For example:
  - The secret AT&T room/message splitter story dates back to 2007.
  - The FBI itself states that in 2011 it “checked U.S. government databases and other information to look for such things as derogatory telephone communications, possible use of online sites associated with the promotion of radical activity.”
- Much of the PRISM project seems to be about access to communication or file contents.
- The most visible, emphatic denials — e.g. those from President Obama or various tech companies — seem to leave weasel room if one parses them carefully.
And cost is not a barrier. I would guess the order of magnitude* for all email in the US at 10 petabytes/day uncompressed. (100s of billions of messages, 10s of KB per message.) Phone call volumes are probably less. (Fewer than 10 billion calls per day.) The Feds can afford to store that. Hadoop or NoSQL clusters, for example, can be set up for low six figures per petabyte.** HP Vertica will sell anybody an RDBMS cluster (hardware and software) for around $2 million/petabyte.**
*In the most literal high-school-chemistry sense of the phrase.
**Of raw data; particularly compressible data might be managed yet more cheaply.
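To make the back-of-envelope arithmetic above explicit, here is a minimal Python sketch. The message counts, message sizes, and the $2 million/petabyte figure come from the estimates above. The $200,000/petabyte value standing in for "low six figures", the one-year retention period, and all function names are my own assumptions for illustration. It ignores compression, replication, and operating costs, so treat the output as order-of-magnitude only.

```python
import math

PETABYTE = 10**15  # bytes, decimal convention


def daily_volume_pb(messages_per_day: int, bytes_per_message: int) -> float:
    """Raw, uncompressed daily volume in petabytes."""
    return messages_per_day * bytes_per_message / PETABYTE


def geometric_midpoint(low: float, high: float) -> float:
    """Order-of-magnitude midpoint of a range (the mean in log space)."""
    return math.sqrt(low * high)


def storage_cost(pb_per_day: float, dollars_per_pb: float, days: int = 365) -> float:
    """Cost to keep `days` of raw data; the retention period is an assumption."""
    return pb_per_day * days * dollars_per_pb


# "100s of billions of messages, 10s of KB per message"
low_pb = daily_volume_pb(100 * 10**9, 10 * 10**3)     # bottom of both ranges
high_pb = daily_volume_pb(1000 * 10**9, 100 * 10**3)  # top of both ranges
mid_pb = geometric_midpoint(low_pb, high_pb)

CHEAP_PER_PB = 200_000     # assumed stand-in for "low six figures per petabyte"
RDBMS_PER_PB = 2_000_000   # "around $2 million/petabyte"

if __name__ == "__main__":
    print(f"Email volume: {low_pb:g} to {high_pb:g} PB/day, midpoint {mid_pb:g} PB/day")
    for label, price in (("Hadoop/NoSQL", CHEAP_PER_PB), ("RDBMS", RDBMS_PER_PB)):
        cost = storage_cost(mid_pb, price)
        print(f"One year on {label}: ${cost / 1e9:.2f} billion")
```

The two ends of the stated ranges give 1 and 100 petabytes/day, and their log-scale midpoint is the 10 petabytes/day used above. Under these assumptions, a year of raw email lands somewhere between the high hundreds of millions and the single-digit billions of dollars.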
Coverage of all this has of course been intense. In particular:
- Glenn Greenwald has the big Snowden scoops.
- Matthew Ingram offers an amazing overview of the revelations and discussion.
- Michael Arrington has unleashed a couple of polemics.
And my views can be summarized much as I did three years ago:
- It is inevitable* that governments and other constituencies will obtain huge amounts of information, which can be used to drastically restrict everybody’s privacy and freedom.
- To protect against this grave threat, multiple layers of defense are needed, technical and legal/regulatory/social/political alike.
- One particular layer is getting insufficient attention, namely restrictions upon the use (as opposed to the acquisition or retention) of data.
*And indeed in many ways even desirable
Comments
10 Responses to “Where things stand in US government surveillance”
Government has lots of data, and historically it has been segregated and unintegrated by policy and law. This is clearly changing, and there is likely a lot of integration that would have been considered illegal or unethical in the near past.
I think you are radically underestimating how much streaming traffic is both reviewed and staged. Also, there is at least some evidence that much traffic such as URLs and weblogs is being captured: of course where there is incriminating use or Urdu or such, but more innocuous-seeming stuff as well. That includes (illegal) analysis of traffic in the US.
The amazing part of this journey is the intentional lack of oversight. Does anyone believe that use of this data is being monitored? Has anyone been prosecuted for passing info to a friend or stalking an ex from these hairballs? That is the fundamental problem in this for me.
Good luck actually restricting use of such information by the government. They have NICS for firearm background checks. They are required by law to destroy the information about successful checks, so as to avoid creating a national firearm registry. Yet during the Malvo shootings, the FBI was running around confiscating .223 rifles for ballistic checks and there is no legal way they could have known who had them. BTW, there is a HUGE difference between a company buying or acquiring information that is known to be public (CC purchases) in order to better target customers and a government illegally acquiring information thought to be private, especially when they have a monopoly on force (police, military, IRS, prisons, etc.).
https://medium.com/prism-truth/82a1791c94d3 has a good discussion of some likely PRISM misunderstanding, although with a nothing-to-see-here spin I don’t agree with.
Aaron,
You could be right about what is reviewed/staged. It’s always tough to know everything secret that’s going on — after all, that’s why it’s secret.
http://touch.latimes.com/#section/-1/article/p2p-76220035/ is supportive of Aaron’s views on streaming data.
Just because I call into doubt the PRISM reporting doesn’t mean I think there’s nothing to see in general. I’d argue that there’s a lot the public should know, but doesn’t, about how our government looks at private communications. But we’re all getting sidetracked over PRISM, which seems like it might be something less than alleged.
Fair enough, Mark.
As previously noted, the $20 million PRISM budget proves it’s only a small part of the puzzle.
[…] week, discussion has exploded about US government surveillance. After summarizing, as best I could, what data the government appears to collect, now I’d like to consider what they actually do with it. More precisely, I’d like to […]
Whoops. Declan McCullagh argues in http://news.cnet.com/8301-13578_3-57589078-38/nsa-chief-drops-hint-about-isp-web-e-mail-surveillance/ that the cell phone data includes wireless web surfing logs.
http://www.motherjones.com/kevin-drum/2013/06/some-questions-and-about-edward-snowden is nice and straightforward about inaccuracies in the Snowden disclosures.
[…] recent posts based on surveillance news have been partly superseded by – well, by more news. Some of that news, along with some good […]