Earlier this month, Google shared a fascinating statistic. The number of items in the company's Knowledge Graph -- its database of people, places, and things, and the connections between them -- grew over its first seven months in the wild, to 18 billion facts.
Up until this month, though, those facts were available only in English. It wasn't until December 4 that Google made the Knowledge Graph truly global, by introducing it in seven new languages: Spanish, French, German, Portuguese, Japanese, Russian, and Italian.
The project to make Knowledge Graph content available in so many new languages simultaneously was no small feat. Even in English, creating an easily searchable database of relationships is fraught with potential problems. If I search for "giants," do I mean the baseball team, the football team, or enormous people? Using a variety of signals, Google makes its best guess -- and then presents its findings in a handy panel on the right-hand side of the results page. (Or it will display a panel asking if you want results for baseball or football.)
Now imagine running that same query in other languages. What does a user searching for "giants" in French want? How about in Italian? Japanese?
The task of localizing Google's Knowledge Graph fell in part to Tamar Yehoshua, who oversees Google's efforts to take search international. What roadblocks has Google found along the way? Here are a few.
Figuring out where you are. This challenge predates the Knowledge Graph, but determining a user's location is still the foundation of all localization. The most important signal is the Google domain you're using -- .com indicates the United States, .co.jp indicates Japan, and so on. From there, Google looks at your IP address, the language of your browser, and the language you're searching in. The idea is to drill down to your current city so that results are as local as possible. (Using the settings page, users can search as if they were in another city.)
This has a big impact on search results. Search "UPC" on Google.com and you'll see information about universal product codes. But take the same search to Google.ie, the company's Irish domain, and the query brings up results for UPC Ireland -- the biggest cable television provider in that country.
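To make the idea of layered location signals concrete, here is a minimal sketch in Python. It is not Google's implementation: the `Signals` class, the `DOMAIN_COUNTRY` and `INTERPRETATIONS` tables, and the fallback order after the domain are all assumptions made for illustration. Only the signals themselves and the "UPC" example come from the description above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical lookup tables, invented for this sketch.
DOMAIN_COUNTRY = {"google.com": "US", "google.co.jp": "JP", "google.ie": "IE"}
INTERPRETATIONS = {
    ("upc", "US"): "universal product codes",
    ("upc", "IE"): "UPC Ireland (cable television provider)",
}


@dataclass
class Signals:
    domain: str                             # treated as the strongest signal
    ip_country: Optional[str] = None        # country inferred from the IP address
    browser_language: Optional[str] = None
    query_language: Optional[str] = None


def guess_locale(s: Signals) -> Tuple[str, str]:
    """Return a (country, language) guess from the available signals."""
    # Assumed order: domain first, then IP address, then a default.
    country = DOMAIN_COUNTRY.get(s.domain.lower()) or s.ip_country or "US"
    # Assumed order: language of the query, then browser language, then English.
    language = s.query_language or s.browser_language or "en"
    return country, language


def interpret(query: str, s: Signals) -> Optional[str]:
    """Pick the interpretation of a query for the guessed country, if known."""
    country, _ = guess_locale(s)
    return INTERPRETATIONS.get((query.lower(), country))


print(interpret("UPC", Signals(domain="google.com")))
print(interpret("UPC", Signals(domain="google.ie")))
```

A real system would weigh these signals rather than take the first one available, and would narrow the location down to a city instead of stopping at the country.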
Making answers locally relevant. Knowing where users are is only the first step. From there Google has to consider what sorts of things other people in that location tend to search for, and offer results accordingly. This is perhaps the Knowledge Graph's biggest challenge as it expands around the globe, because a query that Google can answer well in English might not be useful in another country. Getting it right means more than performing a basic translation.
"We want to understand entities that are culturally relevant as well," Yehoshua said. "In most software programs, you translate search strings -- all you have to do is translate a string to make your product work in another locale. In this instance, with the Knowledge Graph, we want to make sure the entities are culturally relevant."
Hence a search for "sumo" brings up much more detailed information about the sport in a Japanese-language search than it does in an English-language search. (In San Francisco, naturally, the first search result is a software startup.)
Making the Knowledge Graph work in other languages means teaching Google about the cultural traditions that exist everywhere it is used, and generating results that are useful to local residents. It's a tall order.
Different cultures want different answers. Even when language isn't a factor in Knowledge Graph queries, Google still finds that other cultures have different expectations when they search.
Take the search for "Barack Obama." In the United States, the Knowledge Graph shows facts like his name, age, children, and education. Do the same search in Japan, though, and you'll find another fact, prominently displayed: the president's height (6'1"). It turns out that Japanese searches about the president include a disproportionate number of queries about how tall Obama is, and so the Knowledge Graph has to account for that.
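One plausible way to surface locally popular facts is to count which attributes of an entity people in each locale ask about, then show the most frequent ones. The sketch below illustrates that idea only. The function name `top_facts` and the tiny query log are invented for the example and do not reflect real query data or Google's actual method.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_facts(query_log: Iterable[Tuple[str, str]], locale: str, k: int = 3) -> List[str]:
    """Return the k attributes most often asked about in the given locale.

    query_log holds (locale, attribute) pairs, one per observed query.
    """
    counts = Counter(attr for loc, attr in query_log if loc == locale)
    return [attr for attr, _ in counts.most_common(k)]


# Invented sample log for a single entity; the numbers are illustrative only.
log = [
    ("JP", "height"), ("JP", "height"), ("JP", "age"),
    ("US", "age"), ("US", "age"), ("US", "education"),
]

print(top_facts(log, "JP", k=1))
print(top_facts(log, "US", k=1))
```

With this made-up log, height ranks first for the Japanese locale and age ranks first for the U.S. one, mirroring the kind of difference described above.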
Doing all this mostly with algorithms. Google has localization teams around the world that help bring the service to new languages. But the Knowledge Graph isn't a handcrafted guide to world culture.
"It is predominantly algorithmic," Yehoshua said.
The machine learning involved in getting Google to understand the relative importance of millions of culturally distinct subjects in real time is not trivial. As Yehoshua illustrated for me, expanding into new languages results in all sorts of new complications. But if Google is ever going to transform into the "Star Trek" computer, this is how it has to begin. Tackling those complications has another benefit for Google: it makes it harder for challengers to compete. And with Bing now adding Knowledge Graph-like features to its own search results page, it's clear that competitors will be following close behind.