Google is opening a new chapter in its book digitization saga, this time taking on the likes of Amazon.com's Kindle and Sony's eReader.
The search giant on Thursday launched a mobile version of its Google Book Search, giving iPhone and Android users instant access to more than 1.5 million public domain books. The works of authors such as William Shakespeare, Jane Austen, and Charles Dickens were optimized to be read on the small screen, a challenge the Google Book Search team called "daunting" in a blog post announcing the launch:
There's an interesting backstory about the work involved to prepare so many books for mobile devices. If you use Google Book Search, you'll notice that our previews are composed of page images made by digitizing physical copies of books. These page images work well when viewed from a computer, but prove unwieldy when viewed on a phone's small screen.
Our solution to make these books accessible is to extract the text from the page images so it can flow on your mobile browser just like any other web page. This extraction process is known as Optical Character Recognition (or OCR for short).
However, as the team notes, there are frequently obstacles that keep the printed word from being accurately extracted, such as smudges, fancy fonts, old fonts, and torn pages. As an example of an "extreme case," the team presented this page image from Lewis Carroll's Alice's Adventures Under Ground:
...and the resulting extraction:
=> "lV~e.il!" .ÍAoHyU- AUte. U brstty/affc. su.it a. f o.tl as ~tk¿* , I s&O.IL .éfiiíjz tiotkun-) of-ttmlr1¿*y ¿i^n. sta¿rs ! Jfo» ura.ve ...
The e-book reader market has exploded in the past year, with analysts estimating Amazon sold 500,000 Kindles in 2008. Last month, Amazon CEO Jeff Bezos credited the "unusually strong demand" for the unit with helping the e-tailer beat Wall Street's fourth-quarter revenue and earnings expectations.
Google, which last year settled a lawsuit over its scanning project, may have timed this launch to upstage Amazon, which is rumored to be unveiling a new Kindle later this month.