FAQ: Protecting yourself from search engines

Sites record what you look for--just ask AOL users whose search histories were disclosed. How can you protect your privacy?

Declan McCullagh Former Senior Writer
AOL's publication of the search histories of more than 650,000 of its users should reinforce an important point: What you type in online may not be as private as you think.

Search engines place a multibillion-dollar infrastructure in the hands of any random user who stops by their Web site. The price you pay, however, is that the company may hold on to your search queries--which can provide a glimpse into your life--forever.

To offer some suggestions about preserving your privacy while using search engines, CNET News.com has prepared the following list of frequently asked questions.

Q: Why did AOL publish those search histories?
A research arm of AOL published the data in hopes the information would help other scientists and statisticians learn more about how people use the Internet. AOL apologized for this on Monday, saying the release had not been properly vetted.

Q: How can I protect myself from a search engine doing the same thing in the future?
Because of the negative press AOL received, the company is not likely to do the same thing anytime soon.

But of the big four search engines (AOL, Google, Microsoft and Yahoo), only Google resisted a Justice Department subpoena that asked for similar search terms. Keep reading for more detailed suggestions.

Q: Why do search engines store what I type in after my search is complete?
No law requires search companies to delete your search terms, and there are some business justifications for keeping them around at least a little while.

For instance, keeping detailed records can help in identifying click fraud (faking clicks on Web ads to drive up a rival's costs), and in optimizing search results for different geographic areas. Compiling a user profile can aid in tailoring search results in products like Google Personalized Search. Also, disk storage is cheap, and engineers tend to prefer to keep data rather than delete it.

But it's hardly clear that a compelling reason exists for keeping older records--beyond a few months--unless a customer voluntarily chooses options like personalization.

Q: Do any search engines not store records of what their users do?
Yes. Ixquick.com, a start-up funded by Holland Ventures of Amsterdam, pledges to do precisely that.

The Netherlands-based company proudly says it doesn't keep records of its users' Internet addresses. In other words, it does save search terms, but the company says it's unable to link them to any person, unique ID number or Internet address.

"I'm a firm believer in the privacy cause," Ixquick.com CEO Robert Beens said in a recent interview with CNET News.com. "I can imagine a lot of people are keen on their privacy."

Beens said that "we delete the (Internet protocol) address of users. We have a program running which opens the log files and deletes the user IP addresses and overwrites them." And, Beens said, the company removed the unique ID from Ixquick.com's cookies.
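For readers curious what such a scrubbing program might look like, here is a minimal Python sketch. It is not Ixquick's actual code: the log format, the sample lines and the function name scrub_ips are all assumptions made for illustration, and it handles only IPv4 addresses.

```python
import re

# Matches dotted-quad IPv4 addresses; IPv6 is ignored in this sketch.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def scrub_ips(log_lines):
    """Return log lines with every IPv4 address overwritten by a placeholder."""
    return [IPV4.sub("0.0.0.0", line) for line in log_lines]

# Hypothetical log lines; the addresses come from ranges reserved for documentation.
sample = [
    '203.0.113.7 - - [07/Aug/2006:10:15:02] "GET /search?q=privacy"',
    '198.51.100.23 - - [07/Aug/2006:10:15:09] "GET /search?q=weather"',
]
scrubbed = scrub_ips(sample)
for line in scrubbed:
    print(line)
```

The search terms survive in the scrubbed log, but nothing remains to tie them to the address they came from.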

Q: Is AOL thinking of doing the same thing?
Nobody knows. But Jason Calacanis, who co-founded blog publisher Weblogs Inc., which AOL bought last year, says it should.

In a blog post on Monday, Calacanis wrote: "Frankly, I want us to NOT KEEP LOGS of our search data. Yep, you heard that right... we shouldn't even keep this data."

Q: How does Ixquick.com work?
Ixquick.com is what's known as a meta-search engine. For U.S. queries, it contacts Yahoo, AltaVista, Alltheweb, Entireweb, Amazon, Netscape, Wikipedia and a handful of other sites. It compiles the results, decides which Web sites received the most votes as relevant, and displays the top scorers.

"It is possible to fool one search engine by modifying the links, tags, or content of the site," Beens said. "To fool 11 search engines is very hard."
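The voting idea can be sketched in a few lines of Python. This is only an illustration of the general technique, not Ixquick's scoring method, which the company did not describe in detail; the engine names, URLs and the function merge_results are hypothetical.

```python
from collections import Counter

def merge_results(results_by_engine, top_n=3):
    """Rank URLs by how many engines returned them (one vote per engine)."""
    votes = Counter()
    for urls in results_by_engine.values():
        votes.update(set(urls))  # set() keeps an engine from voting twice
    # Most votes first; ties are broken alphabetically by URL.
    ranked = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]

# Hypothetical results from three underlying engines.
results = {
    "engine_a": ["example.org/a", "example.org/b", "example.org/c"],
    "engine_b": ["example.org/b", "example.org/a"],
    "engine_c": ["example.org/b", "example.org/d"],
}
top = merge_results(results)
print(top)
```

A page that games one engine collects only one vote here, which is the point Beens is making.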

Q: OK, Ixquick.com is fine and all that, but I want to keep using my favorite search engine. How can I protect my privacy while doing that?
The first thing you should do is clear the cookies that are set by search engines--those let the company correlate your repeat visits. In Firefox, go to Preferences and select Privacy. There you have the option to delete cookies and even prevent search engines from ever setting them again. (Unfortunately, not all Web browsers offer this option.)

Let's say you're using Google. Add "google.com" to Firefox's list of cookies-not-allowed sites. Be warned: That prevents you from using options like personalization or Gmail, which is why you might want to keep another browser like Opera, Safari or Internet Explorer around to do those things.

If you're really worried, go to Anonymizer.com and sign up for one of its anonymous browsing options (they're primarily for Windows users). Tor is another option. It's a pain, but protecting your privacy may well be worth it.

Q: Excluding Ixquick, what information do other search engines collect?
We surveyed the search engines in February of this year and asked them precisely that question.

The rough overview: First, given a search term, they can produce a list of people (identified by Internet address or cookie) who searched for it. Second, given an Internet address, they can produce a list of the terms searched by the user of that address. That effectively creates an electronic dossier of an individual.
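To see why both lookups are easy once queries are logged, consider this minimal Python sketch. The log records, addresses and the function build_indexes are invented for illustration; no search company has said its systems work this way.

```python
from collections import defaultdict

def build_indexes(query_log):
    """Build both lookups from (address, term) records."""
    by_term = defaultdict(set)      # term -> addresses that searched for it
    by_address = defaultdict(list)  # address -> terms searched, in order
    for address, term in query_log:
        by_term[term].add(address)
        by_address[address].append(term)
    return by_term, by_address

# Hypothetical log; the addresses come from ranges reserved for documentation.
query_log = [
    ("203.0.113.7", "divorce lawyer"),
    ("198.51.100.23", "weather"),
    ("203.0.113.7", "cheap flights"),
    ("198.51.100.23", "divorce lawyer"),
]
by_term, by_address = build_indexes(query_log)
print(sorted(by_term["divorce lawyer"]))
print(by_address["203.0.113.7"])
```

The second lookup is the dossier: everything one address searched for, in order.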

Q: Who can get access to my list of search terms?
Well, prosecutors in criminal cases certainly can. And it's likely that even lawyers in civil cases--divorce attorneys, employers in severance disputes--eventually will demand that Google, Microsoft, Yahoo, AOL and other search engines cough up users' search histories.

Q: Has this happened before?
Almost. A North Carolina man was found guilty of murder in November in part because he Googled the words "neck," "snap," "break" and "hold" before his wife was killed. But those search terms were found on Robert Petrick's computer, not obtained from Google directly.

Q: How are Internet addresses handed out? Do people always have the same one?
It depends. Many DSL and cable modem providers allocate Internet addresses only when they're in use (the methods are called DHCP and PPPoE). Those IP addresses can change frequently.

Other IP addresses tend to be fixed. Faculty and staff members at universities, and employees of corporations, are more likely to have fixed Internet addresses.

AOL Search is a unique case. Because AOL users tend to be logged in when using it, AOL will know who you are--assuming, that is, that you provided accurate information when signing up for its service.

Q: If Google sees me connecting from one dynamically assigned Internet address one day, a different one the next day and yet another on the third, how can it link my queries together to create that dossier?
This is where "cookies" come in. A cookie is simply a device for a Web site to recognize people the next time they return. Google, Yahoo, AOL and Microsoft all set cookies by default. (Microsoft's expire in 2016; Yahoo's in 2010; Google's in 2038. AOL sets a third-party cookie that expires in 2011.)

In the above example, Google.com would set a cookie for whoever is connecting from the first day's Internet address, and then figure out that the same Web browser is connecting from the different addresses on the next two days. If people are logged in to their Google account, this makes the process even easier, of course.
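In code, the linking step amounts to grouping requests by cookie rather than by address. The following Python sketch assumes a hypothetical request log of (address, cookie ID, query) records; the cookie values, addresses and the function link_by_cookie are made up for illustration and do not reflect any company's real systems.

```python
from collections import defaultdict

def link_by_cookie(requests):
    """Group requests by cookie ID, ignoring which address they came from."""
    profiles = defaultdict(lambda: {"addresses": [], "queries": []})
    for address, cookie_id, query in requests:
        profile = profiles[cookie_id]
        if address not in profile["addresses"]:
            profile["addresses"].append(address)
        profile["queries"].append(query)
    return dict(profiles)

# Hypothetical traffic: one browser seen from three addresses on three days.
requests = [
    ("203.0.113.7", "cookie-abc", "neck pain"),
    ("198.51.100.23", "cookie-xyz", "weather"),
    ("203.0.113.42", "cookie-abc", "chiropractor near me"),
    ("203.0.113.99", "cookie-abc", "health insurance"),
]
profiles = link_by_cookie(requests)
print(profiles["cookie-abc"])
```

Clearing or blocking the cookie breaks this grouping, leaving only the changing addresses to go on.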

Q: How long do companies keep records of my search terms?
In our survey, Microsoft, Google and Yahoo all said they keep data as long as it's necessary, which could mean forever. Microsoft did add that it is "looking at ways" to provide users with the option to delete their search histories, and Yahoo made a similar statement. It's unclear how long AOL keeps its records.