Android data tied to users? Some say yes

Google says its collection of location information from Android devices isn't "traceable" to a particular individual, a narrow claim that's already attracting criticism.

Declan McCullagh Former Senior Writer
Declan McCullagh is the chief political correspondent for CNET. You can e-mail him or follow him on Twitter as declanm. Declan previously was a reporter for Time and the Washington bureau chief for Wired and wrote the Taking Liberties section and Other People's Money column for CBS News' Web site.

Google acknowledged today that it collects location information from Android devices, but downplayed concerns about privacy by saying the information is not "traceable to a specific user."

That claim, it turns out, depends on the definition of "traceable."

According to detailed records provided to CNET by a security researcher, Android phones regularly connect to Google.com and disgorge a miniature data dump that includes time down to the millisecond, current and recent GPS coordinates, nearby Wi-Fi network addresses, and two 16-letter strings representing a device ID that's unique to each phone.

Apple, which came under fire this week after reports that approximate location data is stored in perpetuity on iPhones, also collects such data through the Internet. It acknowledged (PDF) to Congress last year that "cell tower and Wi-Fi access point information" is "intermittently" collected and "transmitted to Apple" every 12 hours, but has refused to elaborate. (See CNET's FAQ on the topic.)

Location tracking compared
Declan McCullagh/CNET

Assembling a database of locations can raise privacy concerns. While Android's device ID isn't a name or phone number, it uniquely identifies each phone and is linked to its whereabouts, which means Google might be able to trace the location of an Android phone over months or even years. Less is known about what data Apple collects, including whether a unique device ID is transmitted.

A Google representative said she would not immediately be able to respond to a list of questions posed by CNET this afternoon. The company's statement says: "We provide users with notice and control over the collection, sharing, and use of location in order to provide a better mobile experience on Android devices. Any location data that is sent back to Google location servers is anonymized and is not tied or traceable to a specific user."

"It's not tied to a user," says Samy Kamkar, who provided the Android connection logs to CNET. "But it is a unique identifier to that phone that never changes unless you do a factory reset."

An Android setup screen references these ongoing location updates, saying that choosing to enable location services allows Google to "collect anonymous location data," even when "no applications are running." But that disclosure does not acknowledge that a unique device ID is transmitted. (See a screen snapshot.)

It's difficult to know how significant the privacy risks are. That depends in large part on whether Google anonymizes the location information and device ID that it collects from Android devices--and, especially, how long data is kept.

Marc Rotenberg, executive director of the Electronic Privacy Information Center, is skeptical of Google's claim that the data is not "traceable" to a specific person. "If you can link a person's address with their activity," he says, "bingo! It's personal data."

Excerpts from Android connection-logging done by Samy Kamkar. CNET has redacted his device ID and Wi-Fi MAC address.

Requesting cell phone location information from wireless carriers has become a staple of criminal investigations, often without search warrants being sought. It's not clear how often legal requests for these records have been sent to Google and Apple, or whether the companies have required a judge's signature on a search warrant, the most privacy-protective approach, or settled for less.

The Android device ID can be tied to a person with a minimum of number-crunching, said Kamkar, a onetime hacker with a colorful past. Google can determine that "this is probably their home address because they're there at 3 a.m. every single day," he said. And "this is probably their work address because they're there between 9 a.m. and 5 p.m. every day."
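To make Kamkar's point concrete, here is a minimal sketch of the kind of number-crunching he describes. It is an illustration only, not anything Google is known to run: the function names, the 3-decimal grid, and the input format (a list of timestamp, latitude, longitude tuples already grouped under one device ID) are all assumptions.

```python
from collections import Counter
from datetime import datetime

def round_coord(lat, lon, places=3):
    """Snap a coordinate to a grid cell (roughly 100 meters at 3 decimal places)."""
    return (round(lat, places), round(lon, places))

def most_common_cell(pings, hours):
    """Return the grid cell seen most often during the given hours of the day."""
    counts = Counter(
        round_coord(lat, lon)
        for ts, lat, lon in pings
        if ts.hour in hours
    )
    return counts.most_common(1)[0][0] if counts else None

def guess_home_and_work(pings):
    """Guess (home, work) cells for one device ID from its location pings.

    Hypothetical heuristic: home is where the phone is at 3 a.m.,
    work is where it is between 9 a.m. and 5 p.m.
    """
    home = most_common_cell(pings, hours={3})
    work = most_common_cell(pings, hours=set(range(9, 17)))
    return home, work
```

Fed a few weeks of pings for a single device ID, a routine like this would return two grid cells that could then be matched against street addresses. That matching is the step that turns an "anonymous" identifier into a likely person.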

Even though police are tapping into the locations of mobile phones thousands of times a year by contacting AT&T, Verizon, and other carriers, the legal ground rules remain unclear, and federal privacy laws written a generation ago are ambiguous at best. The Obama Justice Department has claimed that no warrant is required for historical location information. (CNET was the first to report on warrantless cell tracking, in 2005.)

"I think it's important that people know what's happening" inside their phones, Kamkar said.

Like iOS devices, Android phones do collect location information in a local file. But they seem to erase it relatively quickly instead of saving it forever. Swedish programmer Magnus Eriksson has highlighted a portion of the Android source code suggesting a maximum of 50 cell tower locations are retained, which a source close to Google indicates is correct.

Here are the questions, still unanswered, that CNET posed to Google this afternoon:

I've been looking into this a bit more. It appears that Android phones send an HTTP POST data packet to Google, specifically this URL: http://www.google.com/loc/m/api

Included in the POST packet are a series of strings, including:
- carrier name
- time packet was sent, down to the millisecond
- MAC address, name, signal strength of the Wi-Fi network in use
- MAC address, name, signal strength for other visible Wi-Fi networks
- lat/long GPS coordinates of the phone
- other lat/long pairs and times associated with them (showing motion)
- two 16-byte strings that are uniquely tied to that Android device

The last field is the important one. It doesn't include a name or phone number, but it is traceable to a specific user. If I'm at a certain home address every evening, and at a certain work address every day from 9 a.m.-5 p.m., it's pretty clear who I am.

So my questions are:

- Why doesn't Google randomize those two 16-byte strings (let's call them the device ID) on an hourly or daily basis?
- Given a street address or pair of GPS coordinates, is Google able to produce the complete location logs associated with that device ID, if legally required to do so?
- Given a device ID, is Google able to produce the complete location logs associated with it, if legally required to do so?
- Given a MAC address of an access point, is Google able to produce the device IDs and location data associated with it, if legally required to do so?
- How long are these location logs and device ID logs kept?
- If they are partially anonymized after a certain time, how is that done, and can those records be restored from a backup if Google is legally required to do so?
- How many law enforcement requests or forms of compulsory process have you received for access to any portion of this database?
- Why have you assembled this location and device ID database? My current theory is that it shows traffic on Google Maps where street data would be otherwise unavailable (a very useful feature, but one that doesn't appear to require keeping fixed device IDs).
- How are the device ID strings calculated?
- Did Alma Whitten approve this form of device ID logging? If not, what internal process did you use to vet any possible privacy concerns?
- If Google knows that a Gmail user is connecting from a home network IP address every evening, it would be trivial to link that with an Android phone's device ID that also connects via that IP address. Does Google do that?
- Does Android store only a maximum of 50 cell records and 200 Wi-Fi records?
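The first question above asks why the device ID is not randomized hourly or daily. One way such rotation could work is sketched below. This is a hypothetical design, not a description of Android: the `rotating_id` function, the on-device secret, and the use of HMAC-SHA256 are assumptions chosen for illustration.

```python
import hashlib
import hmac
from datetime import datetime, timezone

def rotating_id(device_secret: bytes, when: datetime, period_seconds: int = 86400) -> str:
    """Derive a pseudonymous ID that changes once per period.

    device_secret is assumed to stay on the phone and never be transmitted.
    Without it, IDs from different periods cannot be linked by the ID alone.
    """
    # Number of whole periods since the Unix epoch (daily by default).
    bucket = int(when.timestamp()) // period_seconds
    digest = hmac.new(device_secret, str(bucket).encode(), hashlib.sha256).hexdigest()
    return digest[:16]  # truncate to a 16-character string

# Same day, same ID; next day, a different ID.
secret = b"example-secret"  # placeholder value for illustration
today = rotating_id(secret, datetime(2011, 4, 22, 12, 0, tzinfo=timezone.utc))
tomorrow = rotating_id(secret, datetime(2011, 4, 23, 12, 0, tzinfo=timezone.utc))
```

Passing `period_seconds=3600` would rotate the ID hourly instead. Rotation alone would not rule out re-linking records by location pattern, since a phone that sits at the same address every night remains recognizable regardless of what ID it reports.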

Disclosure: Declan McCullagh is married to a Google employee not involved in this issue.