Google finds perks in its Wikipedia translations

It's nice for Wikipedia fans that Google helps fund and foster translation work. It's also nice for Google's own translation technology.

Google's mission is to organize the world's information and make it universally accessible, but not necessarily to create it outright. This makes Wikipedia a natural partner.

It's therefore no surprise when the search colossus helps out the cooperatively written encyclopedia.

Specifically, Google is helping Wikipedia with translation, so subject matter documented in one language needn't be created from scratch in another. Google described some of its translation work in a presentation at the Wikimania conference in Poland over the weekend.

"In the last 16 months, Google has been working with the Wikimedia Foundation, students, professors, Google volunteers, paid translators, and members of the Wikipedia community to increase Wikipedia content in Arabic, Indic languages, and Swahili," Google said at the conference. In a blog post on Wednesday, Google said it has begun the work with Hindi, which, despite having millions of speakers, had only 21,000 Wikipedia articles in 2008, compared with 2.5 million in English.

Expanding Wikipedia this way is a laudable goal, given how often Wikipedia entries show up in Google search results. But there's an interesting, financially helpful side effect of the work, too: it's perfectly suited to improving Google's own translation tools.

That's because Google's translation technology begins with content in which the same text appears in multiple languages. The more examples of human translation it has, the better it works and the less often it has to fall back on machine translation. Wikipedia provides a diverse and growing body of subject matter that seems ideal for the task.

Google helps others help itself with the Google Translator Toolkit, which lets people collaboratively translate documents with Google's technology offering a head start.

The Translator Toolkit can specifically import Wikipedia pages, and doing so contributes the resulting translations to Google's translation technology. "Translated segments for Wikipedia articles are stored in our global, shared translation memory. You cannot change this setting for Wikipedia translations," the tool notes.
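For readers wondering what a "translation memory" amounts to in practice, here is a minimal Python sketch of the general idea: human-translated segments are stored as pairs, and a new sentence is checked against them before anything else is tried. This is not Google's implementation, which the article does not describe. The class name, the 0.8 similarity threshold, and the sample Spanish sentence are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


class TranslationMemory:
    """Toy translation memory (hypothetical, for illustration only).

    Stores human-translated segments and returns the closest stored
    translation for a new source segment.
    """

    def __init__(self):
        # Maps a normalized source segment to its human translation.
        self._segments = {}

    @staticmethod
    def _normalize(text):
        # Lowercase and collapse whitespace so trivial differences still match.
        return " ".join(text.lower().split())

    def add(self, source, target):
        """Record one human-translated segment pair."""
        self._segments[self._normalize(source)] = target

    def lookup(self, source, threshold=0.8):
        """Return (translation, score), or (None, best score) on a miss.

        An exact match scores 1.0. Otherwise the most similar stored
        segment is returned if its similarity ratio reaches `threshold`
        (an assumed cutoff, not a documented one).
        """
        key = self._normalize(source)
        if key in self._segments:
            return self._segments[key], 1.0

        best_target, best_score = None, 0.0
        for stored, target in self._segments.items():
            score = SequenceMatcher(None, key, stored).ratio()
            if score > best_score:
                best_target, best_score = target, score

        if best_score >= threshold:
            return best_target, best_score
        return None, best_score


# Example usage with an illustrative English-to-Spanish pair.
tm = TranslationMemory()
tm.add("Wikipedia is a free encyclopedia.",
       "Wikipedia es una enciclopedia libre.")
match, score = tm.lookup("Wikipedia is a free encyclopedia")
```

In this sketch, the more segments volunteers add, the more often `lookup` finds a usable human translation, which is the dynamic the article describes. A caller would fall back on machine translation whenever `lookup` returns `None`.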

"There are many Internet users who have used our tools to translate more than 100 million words of Wikipedia content into various languages worldwide," Google product manager Michael Galvez said in the blog post.

Having better translation directly helps Google by lowering language barriers for its sites--not just supplying search results, but indexing Web sites, captioning YouTube videos, translating e-mail, and translating Web pages viewed in Chrome.

About the author

Stephen Shankland has been a reporter at CNET since 1998 and covers browsers, Web development, digital photography and new technology. In the past he has been CNET's beat reporter for Google, Yahoo, Linux, open-source software, servers and supercomputers. He has a soft spot in his heart for standards groups and I/O interfaces.

 
