Google augments open-source spell-check

Because of dictionary entries stemming from Google's translation work, the latest version of Chrome no longer thinks "antivirus" or "screensaver" is a misspelled word.

Google's expertise in translation has begun to pay dividends for an entirely separate project, its Chrome browser--as well as any other software using the open-source spell-checking package called Hunspell.

Chrome combines WebKit's spell-check infrastructure with Hunspell's multilanguage library of correctly spelled words to supply spell-check in 27 languages. But many widely used words were missing from Hunspell, and Google used its translation expertise to fill in the gaps.

Here's the explanation in a Wednesday blog post from Google programmers Brett Wilson and Siddhartha Chattopadhyay:

"The Hunspell dictionary maintainers have done a great job creating high-quality dictionaries that anybody can use, but one of the problems with any dictionary is that there are inevitably omissions, especially as new words appear or proper nouns come into common use. We at Google are in a good position to use our knowledge of the internet to identify and fix some of these omissions. The Google translation team used their language models to generate a sorted list of the most popular words in each language. This was cross-checked with the Hunspell dictionaries to generate a list of the top 1000 words not present in each dictionary. This list includes many popular words, but also common misspellings. To remove these words, each list was reviewed by specialist in that language. Generally, we tried to keep proper nouns and even foreign words as long as they were in common usage."
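The cross-check described in the post can be illustrated with a rough sketch. This is not Google's code: the function names, the sample word counts, and the plain-set stand-in for a dictionary are all assumptions made for illustration. Real Hunspell dictionaries also rely on affix rules, which this sketch ignores.

```python
from collections import Counter


def top_missing_words(word_counts, dictionary_words, limit=1000):
    """Return up to `limit` of the most frequent words absent from the dictionary.

    Hypothetical helper: `word_counts` maps each word to its frequency, and
    `dictionary_words` is a plain collection of accepted spellings (a
    simplification, since real Hunspell dictionaries also use affix rules).
    """
    known = set(dictionary_words)
    # Sort by descending frequency, breaking ties alphabetically.
    ranked = sorted(word_counts.items(), key=lambda item: (-item[1], item[0]))
    missing = [word for word, _ in ranked if word not in known]
    return missing[:limit]


def apply_review(candidates, rejected):
    """Drop the words a human reviewer flagged, such as common misspellings."""
    rejected = set(rejected)
    return [word for word in candidates if word not in rejected]


# Made-up frequencies, for illustration only.
counts = Counter({"the": 100, "antivirus": 40, "teh": 30, "screensaver": 20, "cat": 10})
dictionary = {"the", "cat"}

candidates = top_missing_words(counts, dictionary, limit=1000)
additions = apply_review(candidates, rejected={"teh"})
print(additions)
```

The review step is kept separate from the frequency ranking because, as the post notes, popularity alone cannot distinguish a legitimate new word from a common misspelling.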

Among the English words Google added to the dictionary: antivirus, anime, screensaver, Mozilla, Obama, and Wikipedia.

Google released the resulting dictionary entries under the three open-source licenses that Hunspell uses: the GNU General Public License, the GNU Lesser General Public License, and the Mozilla Public License. Google added new words for 19 languages to the latest developer preview version of Chrome, 2.0.160.0.

By virtue of the way open-source software works, Google's work can help others who adopt the freely available changes. According to the Hunspell site, "Hunspell is the default spell checker of OpenOffice.org and Mozilla Firefox 3 and Thunderbird."

About the author

Stephen Shankland has been a reporter at CNET since 1998 and covers browsers, Web development, digital photography and new technology. In the past he has been CNET's beat reporter for Google, Yahoo, Linux, open-source software, servers and supercomputers. He has a soft spot in his heart for standards groups and I/O interfaces.

 
