October 7, 2008 9:30 AM PDT

Government report: Data mining doesn't work well

Posted by Declan McCullagh

The most extensive government report to date on whether terrorists can be identified through data mining has yielded an important conclusion: It doesn't really work.

A National Research Council report, years in the making and scheduled to be released Tuesday, concludes that automated identification of terrorists through data mining or any other mechanism "is neither feasible as an objective nor desirable as a goal of technology development efforts." Inevitable false positives will result in "ordinary, law-abiding citizens and businesses" being incorrectly flagged as suspects.
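The false-positive concern comes down to base-rate arithmetic, which a few lines of Python can illustrate. This is only a sketch: the function name `flagged_counts` and every number passed to it are hypothetical values chosen for illustration, not figures from the NRC report.

```python
def flagged_counts(population, true_targets, hit_rate, false_alarm_rate):
    """Return (true_positives, false_positives, precision) for a screening rule."""
    innocents = population - true_targets
    # People the rule flags who really are targets.
    true_positives = true_targets * hit_rate
    # Law-abiding people the rule flags by mistake.
    false_positives = innocents * false_alarm_rate
    # Share of all flags that point at a real target.
    precision = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, precision


# All inputs are made-up assumptions, not data from the report.
tp, fp, precision = flagged_counts(
    population=300_000_000,   # assumed population being screened
    true_targets=1_000,       # assumed number of actual terrorists
    hit_rate=0.99,            # assumed chance a real target is flagged
    false_alarm_rate=0.001,   # assumed chance an innocent person is flagged
)
print(f"correctly flagged: {tp:,.0f}")
print(f"wrongly flagged:   {fp:,.0f}")
print(f"share of flags that are real: {precision:.2%}")
```

Even with a detector assumed to be far more accurate than anything the report describes, the wrongly flagged outnumber the correctly flagged by roughly 300 to 1 under these made-up inputs, because the innocent population is so much larger than the target group.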

The whopping 352-page report, called "Protecting Individual Privacy in the Struggle Against Terrorists," amounts to at least a partial repudiation of the Defense Department's controversial data-mining program called Total Information Awareness, which was limited by Congress in 2003.

But the ambition of the report's authors is far broader than just revisiting the problems of the TIA program and its successors. Instead, they aim to produce a scholarly evaluation of the current technologies that exist for data mining, their effectiveness, and how government agencies should use them to limit false positives--of the sort that can result in situations like heavily armed SWAT teams raiding someone's home and shooting their dogs based on the false belief that they were part of a drug ring.

The report was written by a committee whose members include William Perry, a professor at Stanford University; Charles Vest, the former president of MIT; W. Earl Boebert, a retired senior scientist at Sandia National Laboratories; Cynthia Dwork of Microsoft Research; R. Gil Kerlikowske, Seattle's police chief; and Daryl Pregibon, a research scientist at Google.

They acknowledge that far more Americans live their lives online than a decade ago, using everything from VoIP phones to Facebook to RFID tags in automobiles, and that the databases created by those activities are tempting targets for federal agencies. And they draw a distinction between subject-based data mining (starting with one individual and looking for connections) and pattern-based data mining (looking for anomalous activities that could indicate illegal activities).

But the authors conclude the type of data mining that government bureaucrats would like to do--perhaps inspired by watching too many episodes of the Fox series 24--can't work. "If it were possible to automatically find the digital tracks of terrorists and automatically monitor only the communications of terrorists, public policy choices in this domain would be much simpler. But it is not possible to do so."

A summary of the recommendations:

* U.S. government agencies should be required to follow a systematic process to evaluate the effectiveness, lawfulness, and consistency with U.S. values of every information-based program, whether classified or unclassified, for detecting and countering terrorists before it can be deployed, and periodically thereafter.

* Periodically after a program has been operationally deployed, and in particular before a program enters a new phase in its life cycle, policy makers should (carefully review) the program before allowing it to continue operations or to proceed to the next phase.

* To protect the privacy of innocent people, the research and development of any information-based counterterrorism program should be conducted with synthetic population data... At all stages of a phased deployment, data about individuals should be rigorously subjected to the full safeguards of the framework.

* Any information-based counterterrorism program of the U.S. government should be subjected to robust, independent oversight of the operations of that program, a part of which would entail a practice of using the same data mining technologies to "mine the miners and track the trackers."

* Counterterrorism programs should provide meaningful redress to any individuals inappropriately harmed by their operation.

* The U.S. government should periodically review the nation's laws, policies, and procedures that protect individuals' private information for relevance and effectiveness in light of changing technologies and circumstances. In particular, Congress should re-examine existing law to consider how privacy should be protected in the context of information-based programs (e.g., data mining) for counterterrorism.

By itself, of course, this is merely a report with non-binding recommendations that Congress and the executive branch could ignore. But NRC reports are not radical treatises written by an advocacy group; they tend to represent a working consensus of technologists and lawyers.

The great encryption debate of the 1990s was one example. The NRC's so-called CRISIS report on encryption in 1996 concluded that export controls, which treated software like Web browsers and PGP as munitions, were a failure and should be relaxed. That eventually happened two years later.

Declan McCullagh, CNET News' chief political correspondent, chronicles the intersection of politics and technology. He has covered politics, technology, and Washington, D.C., for more than a decade, which has turned him into an iconoclast and a skeptic of anyone who says, "We oughta have a new federal law against this."
15 comments
by Dalkorian October 7, 2008 10:13 AM PDT
"The report was written by a committee whose members include William Perry, a professor at Stanford University; Charles Vest, the former president of MIT; W. Earl Boebert, a retired senior scientist at Sandia National Laboratories; Cynthia Dwork of Microsoft Research; R. Gil Kerlikowske, Seattle's police chief; and Daryl Pregibon, a research scientist at Google."

That's an impressive list of educated folks that came to the obvious conclusion. Now, how do we fix the mess? Should we elect another president that thinks of the Constitution as a piece of toilet paper and is driven to reignite Hitler's ideals, or should we elect change?

Think about it, it's not a hard decision to make. War, terrorism, torture, Gitmo, tanking economy, tanking dollar, homelessness and unemployment still sounding good to you? Feeling safer yet? Where is OBL again?

Voting for mcSHAME is a treasonous act against America!
by imhodudes October 7, 2008 10:33 AM PDT
You are indeed naive if you believe that the primary purpose of government data mining is to detect criminal activity. 'Nuff said, if you've been around for a few decades.
by scdecade October 7, 2008 10:37 AM PDT
I TOLD YOU SO. When CNET was shamelessly regurgitating gov't reports on the efficacy of data mining to identify terrorists I posted in the comments that it was a misdirected waste of time.

http://news.cnet.com/8301-13772_3-9879556-52.html?tag=commProfileMain;profileBot

CNET should be more reticent and use prudent caution when communicating gov't propaganda. Lots and lots of people are still brainwashed into believing the gov't only tells the truth based on their public "education." Pouring more urine into the cesspool of gov't lies really does a disservice to society. For example, how's the bailout working out? Hmph.

Methinks this isn't the only untruth from the Bushies which will later be debunked.
by n3td3v October 7, 2008 12:01 PM PDT
I don't think even they know who the terrorists are anymore, they are confused in their own ******** propaganda.
by Lerianis October 7, 2008 12:26 PM PDT
You hit the nail on the head. Frankly, in Iraq they are now fighting against people analogous to our own Founding Fathers, dissidents who are pissed at the United States on various issues, not to mention in some cases for killing or injuring their family members.
by harms_way October 7, 2008 12:26 PM PDT
William Perry is not just a professor at Stanford. He is a former U.S. Secretary of Defense.
by mbenedict October 7, 2008 2:01 PM PDT
Nitpick: the National Academies (which include the NRC) are not considered part of the government. In fact the Academies were purposely set up to be independent of the government, as private non-profit institutions.

It's important for the National Academies to have an independent voice separate from the government. Compare for example the Institute of Medicine (part of the National Academies) and the NIH (a Federal agency). So it's a disservice to label NRC's work as a "government report".

As an aside, I found the references to the "shooting their dogs" incident out of place. That incident had nothing to do with data mining or terrorism. It wasn't even a case of "false positive." Someone mailed 30 lbs of drugs to a house, so they raided the house.
by declan00 October 7, 2008 11:50 PM PDT
Two thoughts:

* If the NAS/NRC was created by the government and gets its annual budget from the government, it's part of the government. It's certainly not a private-sector enterprise.

* The shooting the dogs incident absolutely was an example of a false positive; more investigation needed to be done to double-check that false positive. Didn't happen.
by bemenaker October 7, 2008 2:25 PM PDT
From a collective of the Internet: DUH!!!!

Why has it taken them so long to figure out such obvious results?
by doconn7 October 7, 2008 2:27 PM PDT
Data mining has never proven a result; it's more or less a voyeuristic tool. After all, has Amazon ever really given you a product intro you could use? NOPE. Statistics are only as good as the one looking at them and everyone is different; that's why we're called individuals.
All that intellect and not one working on cures for cancer, diabetes or whatever ails you. . .
by WMCClark October 7, 2008 2:31 PM PDT
Data mining has been one of the most effective tools (along with more troops and a correct plan to use them) in Operation Iraqi Freedom. There, US military reservists who are in police departments in their "day" jobs started using the same tools they use in their regular work to track connections between criminals. So, they started doing the same thing in Iraq, and, wonder of wonders, it works there, too.

And specifically to respond to Dalkorian, Hitler was a SOCIALIST, just like Lenin, Stalin, Mao, Pol Pot and the rest of the rogues gallery of 20th century criminals. NAZI is a shortened version of National Socialist. If you liked Hitler, you'll LOVE Obama!
by gpTom October 7, 2008 6:57 PM PDT
It's unfortunate that these researchers were too unaware of the history of fascist tyranny to understand that effective discovery of terrorism is not what the data mining is about. The purpose of the data mining is to make all citizens feel watched all the time. If false positives are created, so much the better because a terrorized citizenry is the goal. Watch Naomi Wolf's lecture on the ten-step blueprint to fascism on YouTube. The data mining is just part of a time-tested program put into play by all tyrants.
by rdupuy11 October 7, 2008 7:57 PM PDT
ah ha ah ha ah ha ha ha ha ha ha ha

seriously data mining is a good idea. About 20% of terrorists are going to post a page on MySpace outlining their terrorist activities.

You are going to feel very stupid for not at least catching those people.
It won't do anything for the other 80% but nobody said it would be the only tool.

with that said, I agree Bush treats the constitution like toilet paper and so do most Americans, and if you are voting for Obama as an alternative, guess what, you are just vacillating between tweedle dee and tweedle dum.

The only solution would be to actually do something different, Bob Barr comes to mind.
But as you said, those public school educations work wonders...you believe you can only vacillate from Repub. to Democrat year after year, so it's Carter...then Reagan....Bush...then Clinton...Bush again, then Obama....

that isn't real change.
by Imalittleteapot October 7, 2008 8:46 PM PDT
Basically just wanted to post what gpTom posted. Data mining isn't about terrorism at all. It's about controlling people, even controlling their fear or paranoia, and using what you know about them against them for your own advantage. Companies do it because they want to sell your identity or use it to market to you as well.

You just have to trace it back to greed and power. Everything evil in the world traces back to those two things. Some people say religion too, but usually that's just when someone in the building wants money or power. So, yeah. It's always been that way.

Anyway, going through everyone's information or even having it would just make me feel dirty and perverted so I've always thought maybe there's that aspect too. Maybe they get off on it in some weird way because I just don't know how anyone could even go through with that kind of thing.
by m_onger October 10, 2008 4:52 PM PDT
Data mining is superb at finding facts that support a preconceived notion.

At least that is my conclusion based on some experience. False positives, as McCullagh calls them, would be bad enough if they just occurred with statistical randomness, but they are catastrophic to victims when they can be concocted by a malevolent miner.

Data mining by the Government is part of a challenge to America's culture - do we bargain away our freedoms for illusions of a secure life?