Google's Ngram Viewer: A time machine for wordplay

You may never get through all 500 billion words from more than 5 million books over five centuries. But you can find out, for instance, that "smartphone" is a lot older than you think.

Lance Whitney Contributing Writer
Lance Whitney is a freelance technology writer and trainer and a former IT professional. He's written for Time, CNET, PCMag, and several other publications. He's the author of two tech books--one on Windows and another on LinkedIn.

The word "spiderman" appeared in books in the 1920s, long before the famed Marvel superhero debuted in the early '60s. And the term "smartphone" was in use during the first decade of the 20th century, a century before anyone picked up their first iPhone.

How do I know all this? By using a new tool from Google called the Ngram Viewer. Launched by the search giant yesterday, this tool lets you trace the usage of a word or phrase during the past five centuries--five centuries!--by seeing how often it's appeared in books over that time span.

A product of Google Labs, Ngram Viewer is made possible by Google's sometimes contentious digitization of vast quantities of books, more than 15 million since the project began in 2004. The Ngram tool draws on what the company calls "a subset of that corpus" totaling more than 5 million books, or around 4 percent of all the books ever published. By tracing the 500 billion or so words that appear in those 5 million books, the tool can offer a glimpse into each term's history and popularity over the years.

Ngram Viewer works rather simply. After you enter a word or phrase (up to five words), the tool displays a graph charting how frequently your term has appeared in books over that half a millennium. By default, the Ngram Viewer taps into books written in English. But you can change that to a different "corpus" or category of books, such as American English, British English, English Fiction, Chinese, French, German, Russian, or Spanish.
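For readers curious about what sits behind such a graph, here is a minimal Python sketch of one way to compute a yearly frequency curve. It assumes that each year's value is the phrase's share of all phrases of the same length (its "n-gram" length) found in that year's books. The function names and the tiny sample corpus are hypothetical illustrations, not Google's code or data.

```python
from collections import Counter


def ngrams(tokens, n):
    """Yield every run of n consecutive tokens as a tuple."""
    return zip(*(tokens[i:] for i in range(n)))


def relative_frequency(corpus_by_year, phrase):
    """Return {year: share of that year's n-grams equal to `phrase`}.

    n is the number of words in the phrase. Matching is case-sensitive
    and splits on whitespace only (a simplifying assumption).
    """
    target = tuple(phrase.split())
    n = len(target)
    result = {}
    for year, texts in sorted(corpus_by_year.items()):
        counts = Counter()
        for text in texts:
            counts.update(ngrams(text.split(), n))
        total = sum(counts.values())  # all n-grams of length n that year
        result[year] = counts[target] / total if total else 0.0
    return result


if __name__ == "__main__":
    # Hypothetical toy corpus: year -> list of book texts.
    corpus = {
        1920: ["the hot dog stand sold a frankfurter", "a frankfurter on a bun"],
        2000: ["a hot dog and another hot dog", "the frankfurter"],
    }
    print(relative_frequency(corpus, "hot dog"))
    print(relative_frequency(corpus, "frankfurter"))
```

Dividing by the year's total, rather than plotting raw counts, keeps years with many more published books from dominating the chart. That normalization is also what makes comparing two terms on the same graph meaningful.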

You can vary the years tracked, all the way from 1500 to 2008 or anywhere in between. Providing a wide range of years gives you more of an overview, while narrowing the years lets the tool graph a word's usage in a more granular fashion year by year.

You can enter multiple terms to compare their popularity. For example, typing the two terms "frankfurter" and "hot dog" shows that usage of "frankfurter" has remained steady over the years, while "hot dog" has kept climbing in popularity since the early 1920s.

But although the Ngram Viewer can tell you how frequently a certain word or phrase has shown up in books, it can't tell you why. Nor can it necessarily explain the meaning of that word or phrase at the time it was used. So discovering that the word "android" first appeared in books in the mid-18th century is interesting, but did it mean the same to an Enlightenment reader that it does to someone in the era of Google?

You can, however, select a certain year or range of years to view a page that lists the books containing your chosen word or phrase. By clicking on a specific book, you can see the actual digitized pages, which in some cases can provide a bit of insight into how the word was used at the time.

Researchers at Harvard University lent the project a hand by providing the actual datasets used to generate the information. An article about the project, published online yesterday in the journal Science, describes it as a "quantitative analysis of culture using millions of digitized books."

Though Ngram Viewer sounds more like a tool for scholars and linguists, anyone with a penchant for words and for the history and evolution of language should enjoy taking it for a spin.