
Google augments open-source spell-check

Because of dictionary entries stemming from Google's translation work, the latest version of Chrome no longer thinks "antivirus" or "screensaver" is a misspelled word.

Stephen Shankland Former Principal Writer

Google's expertise in translation has begun to pay dividends for an entirely separate project, its Chrome browser, as well as for any other software that uses the open-source spell-checking package Hunspell.

Chrome combines WebKit's spell-check infrastructure with Hunspell's multilanguage library of correctly spelled words to supply spell-check in 27 languages. But many widely used words were missing from Hunspell, and Google used its translation expertise to fill in the gaps.

Here's the explanation in a Wednesday blog post from Google programmers Brett Wilson and Siddhartha Chattopadhyay:

"The Hunspell dictionary maintainers have done a great job creating high-quality dictionaries that anybody can use, but one of the problems with any dictionary is that there are inevitably omissions, especially as new words appear or proper nouns come into common use. We at Google are in a good position to use our knowledge of the internet to identify and fix some of these omissions. The Google translation team used their language models to generate a sorted list of the most popular words in each language. This was cross-checked with the Hunspell dictionaries to generate a list of the top 1000 words not present in each dictionary. This list includes many popular words, but also common misspellings. To remove these words, each list was reviewed by specialist in that language. Generally, we tried to keep proper nouns and even foreign words as long as they were in common usage.

Among the English words Google added to the dictionary: antivirus, anime, screensaver, Mozilla, Obama, and Wikipedia.
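
The cross-checking step the Google programmers describe can be illustrated with a short Python sketch. This is not Google's code: the function names, the exact-match comparison, and the sample words (including the misspelling "teh") are assumptions made for illustration. It takes a popularity-ranked word list, keeps the first 1,000 words missing from a dictionary, and then drops anything a human reviewer rejected.

```python
from typing import Iterable, List, Set


def top_missing_words(ranked_words: Iterable[str],
                      dictionary: Set[str],
                      limit: int = 1000) -> List[str]:
    """Return up to `limit` words, in popularity order, absent from `dictionary`.

    Assumes `ranked_words` is already sorted from most to least popular.
    Matching is exact and case-sensitive, which is a simplification.
    """
    missing: List[str] = []
    seen: Set[str] = set()
    for word in ranked_words:
        if word in dictionary or word in seen:
            continue
        seen.add(word)
        missing.append(word)
        if len(missing) == limit:
            break
    return missing


def apply_review(candidates: List[str], rejected: Set[str]) -> List[str]:
    """Drop candidates that a reviewer flagged, such as common misspellings."""
    return [word for word in candidates if word not in rejected]


# Hypothetical sample data, ordered from most to least popular.
ranked = ["the", "antivirus", "screensaver", "teh", "Obama", "browser"]
existing_dictionary = {"the", "browser"}

candidates = top_missing_words(ranked, existing_dictionary)
approved = apply_review(candidates, rejected={"teh"})
print(approved)
```

In this toy run, "teh" is popular and missing from the dictionary, so it reaches the candidate list. Only the review step keeps it out of the final additions, which is the role the blog post assigns to the language specialists.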

Google released the resulting dictionary entries under the three open-source licenses that Hunspell uses: the GNU General Public License, the GNU Lesser General Public License, and the Mozilla Public License. The company added new words for 19 languages to the latest developer preview version of Chrome, 2.0.160.0.

By virtue of the way open-source software works, Google's work can help others who adopt the freely available changes. According to the Hunspell site, "Hunspell is the default spell checker of OpenOffice.org and Mozilla Firefox 3 and Thunderbird."