
White House says blocking Iraq Web documents was 'mistake'

In response to questions from News.com, the Bush administration says blocking search engines from indexing Iraq-related documents was a simple mistake, and it has altered the White House Web site.

Declan McCullagh Former Senior Writer
Declan McCullagh is the chief political correspondent for CNET. You can e-mail him or follow him on Twitter as declanm. Declan previously was a reporter for Time and the Washington bureau chief for Wired and wrote the Taking Liberties section and Other People's Money column for CBS News' Web site.

The Bush administration says that blocking search engines from indexing key Iraq-related documents on its White House Web site was a simple mistake.

Until Thursday, the White House was using a robots.txt file that instructed search engines not to visit publicly accessible Iraq files on Whitehouse.gov, including a January strategy report (PDF) and a July benchmark report (PDF).

This public report on Iraq was marked as off-limits to search engines by Whitehouse.gov through the robots.txt file.
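To see what those robots.txt rules actually do, here is a minimal sketch using Python's stock robots.txt parser, fed the same Disallow lines that appeared in the pre-Thursday Whitehouse.gov file. The file path "/nsc/iraq/report.html" is a hypothetical example, not an actual White House URL.

```python
from urllib.robotparser import RobotFileParser

# A few of the Disallow rules from the pre-Thursday Whitehouse.gov
# robots.txt (wildcard record applies to all crawlers).
rules = """\
User-agent: *
Disallow: /cgi-bin
Disallow: /news/releases/iraq
Disallow: /nsc/iraq
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Googlebot matches the "*" record, so anything under /nsc/iraq
# is off-limits; /news/ itself is not listed and stays crawlable.
# (Path below is hypothetical, for illustration only.)
print(parser.can_fetch("Googlebot", "/nsc/iraq/report.html"))  # False
print(parser.can_fetch("Googlebot", "/news/"))                 # True
```

A well-behaved crawler checks exactly this before fetching a page, which is why documents listed here never showed up in Google or MSN even though they were publicly accessible.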

After I pointed out the problem in phone conversations, the White House revised its robots.txt file--meaning the progress report on Iraq due next week should be visible through Google, MSN and so on.

"It was not intentional, and we have corrected the mistake," White House spokesman Blair Jones told me.

I've put the pre-Thursday version of their robots.txt file online here so you can see for yourself.

The other odd thing I noticed is that Whitehouse.gov was programmed to block search engines from indexing a photo gallery of President Bush in a flight suit standing in front of that famous Iraq "Mission Accomplished" banner in May 2003.

What's odd is that the gallery, which has since been moved, was the only one on the entire Whitehouse.gov site listed as off-limits. To be fair, though, the current location is not off-limits.

By way of background, there was a flap in late 2003 about the White House using robots.txt to tell search engine bots to stay away from "/iraq" pages. The explanation then was that the same files were posted in the main section and duplicated in the "/iraq" section--the same logic as blocking text-only pages. Here's an example of the same text appearing in three different templates: normal, text-only, and printer-friendly. The White House seems to have subsequently discontinued the Iraq template.

That explains the "/nsc/iraq" directory being marked as off-limits to search engines. But of the 767 mentions of "/iraq" in the 2003 robots.txt file, the sole Iraq press release or gallery still listed as blocked this week (a) represents a uniquely embarrassing moment for the Bush administration and (b) has been the subject of revisionism.

Don't believe me? Bush's carrier speech originally was titled, according to the Internet Archive, "President Bush Announces Combat Operations in Iraq Have Ended" and featured photographs of smiling Iraqi children. At some point the children vanished and the speech was quietly renamed: "President Bush Announces Major Combat Operations in Iraq Have Ended." Another USS Abraham Lincoln-related switch: before and after.

Jones said that robots.txt entry, too, was just a coincidence. He told me: "We reorganized our Iraq content into one 'In Focus' area and the Web team inadvertently missed these folders in the robots.txt file."

I should point out that the White House is not the only federal Web site to have an overzealous robots.txt file. For no good reason that I can discern, National Intelligence Director Mike McConnell blocks search engines from his entire organization's Web site. In both cases, it's time for a friendly amendment to the Robots Exclusion Protocol: Search engines should ignore robots.txt when a government agency uses it, intentionally or unintentionally, to keep public documents away from the public.

P.S.: Here's what I did to see which directories of interest are listed as off-limits to search engines (you'll have to replace the URL to robots.txt with the archived one I linked to above):

sh-2.05a$ wget http://whitehouse.gov/robots.txt -O wh.txt -o log.txt
sh-2.05a$ grep -v text wh.txt
User-agent:     *
Disallow:       /cgi-bin
Disallow:       /search
Disallow:       /query.html
Disallow:       /help
Disallow:       /news/releases/2003/05/images/iraq
Disallow:       /news/releases/iraq
Disallow:       /nsc/iraq

User-agent:     whsearch
Disallow:       /cgi-bin
Disallow:       /search
Disallow:       /query.html
Disallow:       /help
Disallow:       /sitemap.html
Disallow:       /privacy.html
Disallow:       /accessibility.html
sh-2.05a$