Google's translation center: Language lessons for the Googlebot?
The search giant appears ready to launch a service to help people get documents translated. Might the service also help train Google's machine translation technology?
Updated 1:17 p.m. to correct that Google ranked first in the machine-translation accuracy evaluation. Updated 10:50 a.m. PDT with Google's no-comment.
Google looks set to launch a beta test of a document translation service, a new move in the company's efforts to break down language barriers.
With the service, the company will connect people who need documents translated with humans who will be paid to do so, according to the Google Translation Center information page. The site was spotted by sharp eyes at the Google Blogoscoped blog.
"Google Translation Center is the fast and easy way to get translations for your content. Simply upload your document, choose your translation language, and choose from our registry of professional and volunteer translators. If a translator accepts, you should receive your translated content back as soon as it's ready," the site said.
Google prefers to rely on computer algorithms rather than humans, so at first glance the Google Translation Center looks somewhat anomalous, even though Google is only playing a middleman role. But it's possible that the human translators might be gradually improving Google's machine translation technology as they work, in effect helping to put themselves out of a job.
That's because Google's translation system uses a statistical model that works better the more it can compare the same text in two different languages. And Google evidently will track translation work in its database; according to the center's introduction for translators, "our translation search feature matches your current translation with previous translations, so you don't have to translate over and over again."
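A translation search like the one the introduction describes is often built as a "translation memory": each new segment is compared with segments translated earlier, and the closest match is offered as a suggestion. The following is a minimal Python sketch of that idea, not a description of Google's system. The `suggest` function, the sample English-Spanish segments, and the 0.8 similarity threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical store of previously translated segments (source -> translation).
MEMORY = {
    "Upload your document.": "Suba su documento.",
    "Choose your translation language.": "Elija el idioma de destino.",
}


def suggest(segment, memory=MEMORY, threshold=0.8):
    """Return (earlier_source, earlier_translation, score) for the most
    similar stored segment, or None if nothing is similar enough."""
    best = None
    for source, target in memory.items():
        # ratio() is 1.0 for identical strings and falls toward 0.0
        # as the two strings share fewer characters in sequence.
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score >= threshold and (best is None or score > best[2]):
            best = (source, target, score)
    return best
```

Calling `suggest("Upload your documents.")` returns the stored translation of the nearly identical earlier sentence, while an unrelated segment returns `None` and would have to be translated from scratch.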
Google is fervently interested in better machine translation. With it, the company can use its search technology to link people with data around the world, regardless of language barriers, making its search engine significantly more powerful.
Wanted: More Rosetta Stones
Google's translation technique essentially relies on having as many Rosetta Stone-like documents as possible. The more documents it has in two languages, the better able it is to match words and phrases from one language to another, according to Jeff Dean, a Google fellow who works on Google's computing infrastructure.
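To see why more paired documents help, consider a toy version of the matching step: count how often a word in one language shows up in the same sentence pair as a word in the other, then pick the partner with the strongest association. The Python sketch below uses the Dice coefficient as the association score. It is a deliberately simplified stand-in for a real statistical translation model, and the four-sentence corpus and the `best_match` function are invented for illustration.

```python
from collections import Counter
from itertools import product

# Toy English-Spanish parallel corpus, invented for illustration.
PAIRS = [
    ("the house", "la casa"),
    ("the green house", "la casa verde"),
    ("the green car", "el coche verde"),
    ("a house", "una casa"),
]


def best_match(word, pairs=PAIRS):
    """Return the target-language word most strongly associated with `word`,
    or None if `word` never appears in the corpus."""
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        src_words, tgt_words = set(src.split()), set(tgt.split())
        src_count.update(src_words)   # sentence pairs containing each source word
        tgt_count.update(tgt_words)   # sentence pairs containing each target word
        joint.update(product(src_words, tgt_words))  # pairs containing both
    # Dice coefficient: 2 * co-occurrences / (source count + target count).
    scores = {
        t: 2 * joint[(word, t)] / (src_count[word] + tgt_count[t])
        for t in tgt_count
        if joint[(word, t)]
    }
    return max(scores, key=scores.get) if scores else None
```

With this corpus, "house" pairs cleanly with "casa" and "green" with "verde". But "car" appears in only one sentence, so "el" and "coche" score identically, and the sketch cannot tell them apart until it sees more text containing either word.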
"By computing statistics over all words and phrases, you...get a model of word-by-word and phrase-by-phrase replacements," Dean said. Machine translation often produces awkward results today, but "the impact of having a really large language model makes the sentences flow a lot more easily."
A screenshot from Google shows the online interface a Google translator apparently will see. It shows text in two languages, with the passage broken down into chunks of text. It also suggests a previous translation of one chunk, offering a "use suggestion" button to employ it. It's not clear if the previous translation draws just on that individual translator's work or a larger collection.
Based on the Bilingual Evaluation Understudy (BLEU) method for rating translation accuracy, Google scored first place in a 2005 evaluation by the National Institute of Standards and Technology.
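The Bilingual Evaluation Understudy method, usually shortened to BLEU, scores a machine translation by how many of its word sequences also appear in a human reference translation, with a penalty for output that is too short. The Python sketch below is a simplified, sentence-level version written for illustration: it uses a single reference, sequences of one and two words only, and no smoothing, so its numbers would not be comparable to those from an official evaluation.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=2):
    """Simplified BLEU score between 0.0 and 1.0 for one sentence pair."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each n-gram's count by how often it occurs in the reference,
        # so repeating a correct word does not earn extra credit.
        matched = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0
        log_precisions.append(math.log(matched / total))
    # Brevity penalty: candidates shorter than the reference are scaled down.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, times the penalty.
    return bp * math.exp(sum(log_precisions) / max_n)
```

An exact match scores 1.0, a candidate that shares no two-word sequence with the reference scores 0.0, and a translation that gets one word wrong lands somewhere in between.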
Google was mum about the project. "We're always looking at new ways of providing tools for users to connect with each other, share information, and improve access to information on the internet, but we don't have any new details to share at this time," the company said in a statement.
Paying the middleman
It's a time-tested business to be the middleman who connects customers willing to pay for a product or service with those who can provide it, but the Internet has taken the role to new heights by more easily enabling that process on a national and sometimes global scale. For example, Serebra Connect and Elance can help companies that need tasks done find people who can do them.
But the Google Translation Center seems to have a different approach. Translators get access to free Google tools, and it appears Google isn't involved in any payment transactions, according to the site.
"Google Translation Center provides a venue for you to enter into and complete translation transactions. Except when you use Google Translation Center as provided in Section 4, Google is not involved in any transactions in Google Translation Center. Your interaction with any third party participant(s) or user(s) within Google Translation Center, including payment and delivery of goods and services...are solely between you and such third party participant(s) or user(s) and Google is not involved in such dealings," according to the terms of service. Section 4, titled "Google Participation," says just that "Google and/or its subsidiaries and affiliates may use Google Translation Center from time to time."
So what's in it for Google?
Of course, Google has a strong search-ad business that it uses to subsidize any number of efforts that may not be profitable for years, if indeed ever. After all, Google's mission is "to organize the world's information and make it universally accessible and useful."
But even if Google doesn't charge a percentage, improving automated translation could be a powerful incentive as Google tries to keep its core product, the search engine, competitive.
Google's translation technology is available through its translation site, but the company also has technology called Cross Language Information Retrieval (CLIR) that builds translation into its search engine.
Search increasingly is the gateway by which people discover what's on the Internet, so building automated two-way translation into the process could open up the very parts of the Internet that today are available but effectively hidden by language barriers.
CLIR can translate a search query into a foreign language, then translate the matching results back into the searcher's own language. Clicking a link produces the translated version of that page.
For example, a search in Russian for Tony Blair's biography will show an option at the bottom of the search results page, written in Russian, to search pages written in English. Clicking on a link then translates the English page into Russian.
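The flow just described has three steps: translate the query, search pages in the other language, and translate the hits back. The Python sketch below walks through those steps with word-for-word dictionaries standing in for a real translation system, and with Spanish substituted for Russian to keep the sample short. The dictionaries, the two sample pages, and the function names are all hypothetical, and nothing here reflects how Google's CLIR is actually implemented.

```python
# Hypothetical word-for-word dictionaries standing in for machine translation.
ES_TO_EN = {"biografia": "biography", "de": "of"}
EN_TO_ES = {"biography": "biografia", "of": "de", "weather": "clima", "in": "en"}

# Hypothetical English-language pages that the searcher cannot read directly.
ENGLISH_PAGES = ["biography of tony blair", "weather in london"]


def translate(text, table):
    """Translate word by word, leaving unknown words (such as names) as-is."""
    return " ".join(table.get(word, word) for word in text.lower().split())


def cross_language_search(query):
    """Translate the query to English, search, and translate the hits back."""
    english_terms = set(translate(query, ES_TO_EN).split())
    # A page matches when it contains every translated query term.
    hits = [p for p in ENGLISH_PAGES if english_terms <= set(p.split())]
    return [translate(page, EN_TO_ES) for page in hits]
```

A query of "biografia de Tony Blair" is turned into English terms, matched against the English pages, and the matching page comes back rendered in Spanish. The searcher never has to type or read a word of English.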
Google executives have recently indicated just how grand the company's ambitions are for automated language translation. The company wants speakers of any major language to be able to understand any other.
"We will eventually do 100 by 100 languages, to take this set of languages and convert to another," Google Chief Executive Eric Schmidt said. "That alone will have a phenomenal impact on an open society," he said, a reference to concerns many have expressed about Google's censored search results in countries such as China.