
Privacy: What should Google do?

The company claims that search quality and user privacy are a zero-sum game: deleting log data makes it more difficult to improve search results. Perhaps the company is right.

Chris Soghoian
Christopher Soghoian delves into the areas of security, privacy, technology policy and cyber-law. He is a student fellow at Harvard University's Berkman Center for Internet and Society, and is a PhD candidate at Indiana University's School of Informatics. His academic work and contact information can be found by visiting www.dubfire.net/chris/.

Public interest groups, academics and members of the press have hammered Google for its lax privacy policies. The criticism has mostly focused on the search giant's log deletion practices and browser cookie policies. Google claims that search quality and user privacy are a zero-sum game: deleting log data makes it more difficult to improve search results. Perhaps the company is right. However, there are several other pro-privacy steps Google could take to significantly protect its customers--steps it has not taken, and continues to reject.

Over the last few months, a number of Google's engineers have issued public statements on the company's public policy blog to defend its much-criticized log data retention policies. The company claims that the data can be used to hunt down malware, catch people defrauding its advertising system, and improve search results.

These high-profile Googlers make the case that user privacy and search quality are a zero-sum game: deleting logs to protect customer privacy makes it far more difficult to provide a good search experience.

While I personally think this is a load of rubbish, I'm going to give them the benefit of the doubt today, because I want to focus on a different issue. Namely, that Google could take a few easy steps in other areas to protect customers from the prying eyes of AT&T, the NSA, or the pervert next door reading their e-mail sent over a wireless network.

Search terms

Imagine a normal search situation. A user will visit Google.com, type in a few words, "security blogs," perhaps, and click on the search button. From the search results page, a user will click on a link, taking them to www.some-website.com. Due to the way that Google has designed its search engine, Web site owners are given the search terms that brought each Web surfer to their site.

A more technical explanation is as follows: Google embeds the search terms that the user issued into the Web URL of the search results page. That is, an example search URL will look like http://www.google.com/search?q=security+blogs. This is known as an HTTP GET request. When a user clicks on one of the search results on that page, the user's browser tells the destination site the exact address of the referring page, via the HTTP Referer header. Because Google embeds the search terms in its results URL, the Web site owner learns which terms led a user to their page.
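To make the mechanism concrete, here is a minimal Python sketch of how a site operator could pull the terms back out of that Referer header. The helper function is mine, not part of any real analytics package.

```python
# Minimal sketch: recovering a visitor's search terms from the Referer header,
# which carries the full URL of the Google results page the visitor came from.
from urllib.parse import urlparse, parse_qs

def terms_from_referer(referer: str) -> str:
    """Return the query embedded in a Google results URL, or '' if none."""
    parsed = urlparse(referer)
    if "google." not in parsed.netloc:
        return ""
    # Google carries the user's query in the "q" parameter of the results URL.
    return parse_qs(parsed.query).get("q", [""])[0]

# The Referer a browser would send after clicking a result on the page
# http://www.google.com/search?q=security+blogs
print(terms_from_referer("http://www.google.com/search?q=security+blogs"))
# -> security blogs
```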

Google could very easily stop including the search terms in the URL and thus stop passing them on to the Web sites that users click on from a Google results page. It could do so by having the user's browser send the terms to a Google server in a more discreet way. Many Web sites do this, especially those dealing with private information. Amazon.com and other e-commerce sites do not transmit a customer's credit card information by sending it in the URL--even on an SSL-encrypted Web session. To do so would needlessly endanger the user.

A switch to this more privacy-protecting method of Web data submission, known as an HTTP POST, would be a trivial change for Google's engineers. Furthermore, it wouldn't require any additional data-processing resources from its vast fleet of servers. For Google, such a change would cost essentially nothing, yet it would give its customers an immediate increase in privacy.
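As a rough illustration of the difference, the sketch below builds--but does not send--both kinds of request with the third-party requests library. The POST variant is hypothetical: Google's search front end does not accept queries this way, which is exactly the point being argued.

```python
# Sketch of GET vs. POST submission. Nothing goes over the network here;
# requests.Request(...).prepare() just shows what a browser would transmit.
import requests

# HTTP GET: the search terms are encoded into the URL itself, and that URL
# is what later leaks to other sites in the Referer header.
get_req = requests.Request("GET", "http://www.google.com/search",
                           params={"q": "security blogs"}).prepare()
print(get_req.url)    # http://www.google.com/search?q=security+blogs
print(get_req.body)   # None -- nothing travels in the request body

# HTTP POST (hypothetical for Google search): the terms travel in the request
# body, so the URL--and therefore the Referer header--never contains them.
post_req = requests.Request("POST", "http://www.google.com/search",
                            data={"q": "security blogs"}).prepare()
print(post_req.url)   # http://www.google.com/search
print(post_req.body)  # q=security+blogs
```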

The only downside to such a change would be the loss of information for webmasters. Companies would like to know which search terms drew a customer to their Web site, especially if that visit resulted in a sale. While no doubt useful for marketers, this is not something they deserve to know. Furthermore, Google's responsibility is to the users with the eyeballs. At the very least, if a firm wants to know what people are searching for--let it buy an advertisement from Google. Right now, Google gives this data away to every Web site owner, for free.

Encrypted mail

By default, all Google searches, as well as e-mail sent and read via Gmail, are transmitted in the open, over an unencrypted session. That means the data can be seen by anyone with access to the network--anyone else using the Wi-Fi connection at Starbucks, your Internet service provider, or any government agency that has tapped the Internet backbone.

All Web browsers support the SSL encryption standard, and Google even offers encrypted access to Gmail, if users know to ask for it. Simply visit https://www.gmail.com, and the entire e-mail session will be safe from prying eyes.

Unfortunately, encryption is expensive, at least in terms of computing power. Turning SSL on by default for the millions of Gmail users would mean that Google would have to dedicate more computers to the service. Those computers cost money. A Google spokesperson confirmed this, telling me that "we have not made SSL the default due to capacity and latency issues."
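To get a feel for the latency half of that trade-off, here is a small standard-library sketch that times a bare TCP connection against a TCP connection plus the TLS/SSL handshake. The absolute numbers depend entirely on the network path, so treat them as illustrative only.

```python
# Rough sketch: compare connection setup with and without the TLS handshake.
# The handshake adds extra round trips plus cryptographic work on both ends,
# which is the per-connection cost Google cites.
import socket
import ssl
import time

HOST = "www.google.com"  # stand-in host that answers on both port 80 and 443

start = time.perf_counter()
with socket.create_connection((HOST, 80), timeout=10):
    pass
tcp_only = time.perf_counter() - start

context = ssl.create_default_context()
start = time.perf_counter()
with socket.create_connection((HOST, 443), timeout=10) as sock:
    with context.wrap_socket(sock, server_hostname=HOST):
        pass  # the handshake completes inside wrap_socket()
tcp_plus_tls = time.perf_counter() - start

print(f"TCP connect only:            {tcp_only * 1000:.0f} ms")
print(f"TCP connect + TLS handshake: {tcp_plus_tls * 1000:.0f} ms")
```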

Google has made a shrewd business decision: those users who care enough about their privacy to read the company's FAQ can get a bit of protection for their e-mail, while those users who presumably don't care are left exposed to hackers and snoops.

Google should change its policies with regard to SSL and e-mail. At the very least, it should mention the secure Web mail option and provide a link on the main Gmail log-in page. This information is currently hidden in one of the help pages. In an ideal world, Gmail would enable SSL by default.

Searches, exposed

While the company offers encrypted Web mail, it does not do the same for searches. Currently, there is no way to keep your search terms secret from those who might be watching the network. Could the company offer this? Sure, but it has chosen not to, primarily because of cost.

Luckily, someone else has taken steps to fill the search privacy gap left by Google.com. A Texas man named Daniel Brandt has created a Google-powered privacy-preserving search engine: Scroogle.org.

Scroogle submits search queries to Google on a user's behalf, scrapes the results, and displays them to the user. Scroogle's search data policies are fantastic: no cookies, no search-term records and all access logs are deleted within 48 hours. The site uses HTTP POST requests by default, which helps to keep the search terms a secret between the user and the search engine. Furthermore, for those users willing to put up with the 1- or 2-second delay required to initiate an SSL connection, encrypted searches are available to users via https://ssl.scroogle.org/.
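As a sketch of the idea--emphatically not Scroogle's actual code--the toy proxy below accepts a query via HTTP POST, fetches the Google results page on the user's behalf, and keeps no logs. The "q" form field name is simply my choice for the toy form, and in practice Google may refuse or captcha automated requests like this.

```python
# Toy sketch of a Scroogle-style proxy; not Scroogle's real implementation.
# The user's browser POSTs the terms to this proxy, the proxy fetches Google's
# results page itself (so Google sees the proxy's address, not the user's),
# and no access logs are kept anywhere.
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, quote_plus
from urllib.request import Request, urlopen

class SearchProxy(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        form = parse_qs(self.rfile.read(length).decode())
        terms = form.get("q", [""])[0]  # "q" is this toy form's field name
        upstream = Request(
            "https://www.google.com/search?q=" + quote_plus(terms),
            headers={"User-Agent": "Mozilla/5.0"},  # Google may still block bots
        )
        page = urlopen(upstream, timeout=10).read()
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(page)

    def log_message(self, *args):
        pass  # deliberately: keep no access logs at all

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), SearchProxy).serve_forever()
```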

Over 130,000 searches per day are made through the Scroogle site, 10 percent of which use SSL. In an e-mail conversation, Daniel told me that his "ultimate goal is for Scroogle to survive long enough so that the public sector gets the idea that all major search engines should be treated like public utilities."

Daniel Brandt seems like a great guy. He's doing this for free--and accepts tax-deductible donations on the Scroogle site. Users who don't trust Daniel's claims may wish to use the anonymizing Tor proxy in combination with Scroogle.

What Daniel's site shows is that privacy-preserving search is possible. While Scroogle doesn't show any ads, if Google offered this service, it could still make a buck on it. Imagine that--making money, while not being evil.

Disclosure: I'm paid as a technology policy fellow by the Electronic Privacy Information Center, a public interest group that has repeatedly criticized Google for its privacy policies. Furthermore, I interned for Google in 2006, and have received a $5,000 fellowship from the company, both in 2006 and 2007.