Unicode outpaces ASCII for encoding Web site text, and life gets easier for Google and others that grapple with an increasingly international Internet.
Stephen Shanklandprincipal writer
Stephen Shankland has been a reporter at CNET since 1998 and writes about processors, digital photography, AI, quantum computing, computer science, materials science, supercomputers, drones, browsers, 3D printing, USB, and new computing technology in general. He has a soft spot in his heart for standards groups and I/O interfaces. His first big scoop was about radioactive cat poop.
Expertiseprocessors, semiconductors, web browsers, quantum computing, supercomputers, AI, 3D printing, drones, computer science, physics, programming, materials science, USB, UWB, Android, digital photography, scienceCredentials
I've been covering the technology industry for 24 years and was a science writer for five years before that. I've got deep expertise in microprocessors, digital photography, computer hardware and software, internet standards, web technology, and other dee
I picture it happening this way. The Roman alphabet is on the run, pursued by a much larger army of Arabic characters with long scimitar-like ligatures, Chinese characters that look like throwing stars, and European peasant letters bristling with umlauts, cedillas, and tildes.
Unicode is a character encoding standard that gracefully accommodates dozens of languages as well as Roman characters with diacritical marks. ASCII, a tried-and true, decades-old standard, is limited to 128 or 256 characters and has a hard time extending beyond the range of a century-old Remington typewriter.
Unicode vanquished ASCII and Western European within 10 days in December, Davis said.
"What's more impressive than simply overtaking them is the speed with which this happened," he added, pointing to a graph showing the meteoric rise of Unicode.
Google's a fan of Unicode Web sites. When it processes data from Web sites, it converts it into Unicode first if it's not already there. That improves international search abilities.
"The continued rise in use of Unicode makes it even easier to do the processing for the many languages that we cover," he said.
Google just converted to Unicode 5.1, he added, "so people speaking languages such as Malayalam can now search for words containing the new characters," he said.
One disadvantage Unicode has over ASCII, though, is that it takes at least twice as much memory to store a Roman alphabet character because Unicode uses more bytes to enumerate its vastly larger range of alphabetic symbols.