A Chrome extension called Project Naptha allows you to highlight and copy text from an image, with more features on the way.
Last year, XKCD posted a single-panel comic about absentmindedly selecting text while browsing the web. It wasn't about images, but it did give MIT student Kevin Kwok an idea. He started work on what would become Project Naptha, a Chrome extension that lets you select and copy text in images.
It uses optical character recognition (OCR), the kind of software that turns scanned printed material and PDFs into editable text, but OCR isn't the key to how Project Naptha works.
"The primary feature of Project Naptha is actually the text detection, rather than optical character recognition," Kwok wrote. "It runs an algorithm called the Stroke Width Transform, invented by Microsoft Research in 2008, which is capable of identifying regions of text in a language-agnostic manner. In a sense that's kind of like what a human can do: we can recognize that a sign bears written language without knowing what language it's written in, never mind what it means."
The Stroke Width Transform is used in conjunction with other algorithms: connected components analysis, which identifies individual letters; Otsu thresholding, which detects word spacing; and disjoint-set forests, which identify lines of text. In this way, it builds models of letters, words, text regions and paragraphs. For the recognition step itself, if the built-in system isn't up to scratch, it can fall back on Tesseract, an open-source OCR engine whose development Google has sponsored, running in the cloud.
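The letter-to-word-to-line steps described above can be sketched in Python. This is a minimal illustration, not Naptha's code: it assumes letters have already been found (for example by connected components analysis) and arrive as bounding boxes. It merges them into lines with a disjoint-set forest, then applies Otsu's criterion to the gaps between neighbouring letters to separate letter spacing from word spacing. The `Box` type, the `same_line` heuristic and its thresholds are hypothetical choices. Otsu's method is usually run on an image histogram; here it is an exhaustive search over every split of the sorted gap widths, maximizing the same between-class variance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical bounding box of one detected letter, in pixels."""
    x0: float
    y0: float
    x1: float
    y1: float


class DisjointSet:
    """Disjoint-set forest with union by rank and path halving."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra  # attach the shorter tree under the taller one
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1


def otsu_threshold(values):
    """Split 1-D values into two classes maximizing between-class variance."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    best_t, best_var, left_sum = xs[0], -1.0, 0.0
    for i in range(1, n):
        left_sum += xs[i - 1]
        if xs[i] == xs[i - 1]:
            continue  # no boundary between equal values
        w0, w1 = i / n, (n - i) / n
        mu0, mu1 = left_sum / i, (total - left_sum) / (n - i)
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_var, best_t = var, (xs[i - 1] + xs[i]) / 2
    return best_t


def same_line(a, b):
    # Hypothetical heuristic: vertical extents overlap by at least half the
    # shorter letter's height, and the horizontal gap is under two heights.
    overlap = min(a.y1, b.y1) - max(a.y0, b.y0)
    h_min = min(a.y1 - a.y0, b.y1 - b.y0)
    h_max = max(a.y1 - a.y0, b.y1 - b.y0)
    gap = max(a.x0, b.x0) - min(a.x1, b.x1)
    return overlap >= 0.5 * h_min and gap < 2 * h_max


def group_lines(boxes):
    """Union letters that pass same_line; each resulting set is one line."""
    ds = DisjointSet(len(boxes))
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if same_line(boxes[i], boxes[j]):
                ds.union(i, j)
    lines = {}
    for i, box in enumerate(boxes):
        lines.setdefault(ds.find(i), []).append(box)
    ordered = [sorted(line, key=lambda b: b.x0) for line in lines.values()]
    return sorted(ordered, key=lambda line: line[0].y0)  # top to bottom


def split_words(line):
    """Split a left-to-right line into words at gaps above Otsu's threshold."""
    if len(line) < 3:
        return [line]
    gaps = [b.x0 - a.x1 for a, b in zip(line, line[1:])]
    t = otsu_threshold(gaps)
    words, current = [], [line[0]]
    for gap, box in zip(gaps, line[1:]):
        if gap > t:
            words.append(current)
            current = []
        current.append(box)
    words.append(current)
    return words
```

One weakness of this naive version: if a line holds a single word with slightly uneven spacing, Otsu's method still picks a split point, so a practical system would need an extra check before trusting each word break.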
A few other features make the extension user-friendly. It can identify text in many colours, on photographic backgrounds as well as plain ones; it can read text tilted up to 30 degrees from the horizontal; and it constantly watches cursor movement so it can predict where you are about to mouse over and start processing that text in advance.
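The article doesn't say how Naptha predicts cursor movement. One simple way to illustrate the idea, assumed here purely for illustration, is linear extrapolation: estimate the cursor's velocity from its last two positions, project it a fraction of a second ahead, and start processing any text region that contains the predicted point. The `Region` type, the `lookahead` value and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical detected text region (axis-aligned bounding box)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, x, y):
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


def predict_cursor(samples, lookahead):
    """Linearly extrapolate the cursor `lookahead` seconds ahead.

    samples: (t, x, y) tuples in time order; only the last two are used.
    """
    if len(samples) < 2:
        _, x, y = samples[-1]
        return x, y
    (t0, x0, y0), (t1, x1, y1) = samples[-2], samples[-1]
    dt = t1 - t0
    if dt <= 0:
        return x1, y1  # no usable velocity estimate
    vx, vy = (x1 - x0) / dt, (y1 - y0) / dt
    return x1 + vx * lookahead, y1 + vy * lookahead


def regions_to_prefetch(samples, regions, lookahead=0.25):
    """Hypothetical prefetch rule: regions containing the predicted point."""
    px, py = predict_cursor(samples, lookahead)
    return [r for r in regions if r.contains(px, py)]
```

Extrapolating from only two samples is jittery; smoothing the velocity over several recent samples would be a natural refinement.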
Current features include selecting and copying text; you can even erase the text from an image, or rewrite it in a clearer font, via the "Translate" option in the right-click menu. Features still in beta include actual translation to and from languages such as Spanish, Russian, Chinese, Japanese, German and French.
It's not perfect, of course. It sometimes makes mistakes, and seems to perform better on clean, regular fonts; comics and handwriting are trickier. The above image, for example, returned "I'M NOT MFID E ST HLL I'M JUST DIFFERENT L)’ SHNE" as the copied text, but we expect Kwok will keep tweaking his software to reduce its limitations.
Meanwhile, a Firefox version is in the works, and we'll certainly be making use of the extension from now on.