The Internet is a magnificent resource, and the Internet Archive believes it has great potential as a free library for researchers, historians, scholars and those who are just plain curious about the world.
And, with a new project, that library is getting bigger. In collaboration with the Internet Archive, Georgetown University academic Kalev Leetaru is in the process of uploading more than 14 million images from more than 2 million public domain e-books (more than 600 million pages) to Flickr.
The books, which are from the Internet Archive's library, span a period of 500 years and are automatically tagged thanks to a tool that scrapes the text before and after each image, making for a fully searchable database.
"Because we have OCR'd [optical character recognition] the books, we have now been able to attach about 500 words before and after each image," the Internet Archive wrote in a blog post. "This means you can now see, click and read about each image in the collection. Think full-text search of images!"
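To make the idea concrete, here is a minimal sketch of how text on either side of an image could be attached to it and then searched. It is not the Internet Archive's code: the function names, the record layout and the reading of "about 500 words" as a window on each side of the image are all assumptions made for illustration.

```python
def context_window(words, image_index, window=500):
    """Return the words just before and just after an image.

    `words` is the book's OCR text split into words, and `image_index`
    is the position in that list where the image appears. Both are
    hypothetical inputs; the real pipeline's data format is not described.
    """
    start = max(0, image_index - window)
    before = words[start:image_index]
    after = words[image_index:image_index + window]
    return before, after


def search_images(records, query):
    """Return the ids of images whose attached text contains the query.

    Each record is a dict with "image_id" and "text" keys (an assumed layout).
    The match is a plain case-insensitive substring test.
    """
    needle = query.lower()
    return [r["image_id"] for r in records if needle in r["text"].lower()]


# Tiny usage example with a window of two words instead of 500.
words = "the telephone was installed in the office".split()
before, after = context_window(words, image_index=3, window=2)
records = [
    {"image_id": "img-1", "text": " ".join(before + after)},
    {"image_id": "img-2", "text": "a railroad bridge under construction"},
]
matches = search_images(records, "Telephone")
```

A real search service would use a proper text index rather than a substring scan, but the principle is the same: the image becomes findable through the words printed around it.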
Leetaru's contribution was to tap into the part of the Internet Archive's OCR software that recognises pictures in order to exclude them from the e-book scrape. He wrote his own algorithm that looks for the sections of the scanned books that the OCR software omitted and saves each one as a JPG file, capturing the caption and surrounding text in the process.
These are then uploaded to Flickr.
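The extraction step described above can be sketched roughly as follows. This is an illustration only, not Leetaru's algorithm: it assumes the OCR output for a page can be reduced to a list of blocks labelled "text" or "picture" with vertical positions, and it guesses that the text block directly below a picture is its caption. The `Block` class and `extract_figures` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Block:
    kind: str       # "text" or "picture" (assumed labels)
    top: int        # vertical pixel position where the block starts
    bottom: int     # vertical pixel position where the block ends
    text: str = ""  # OCR text; empty for pictures


def extract_figures(blocks):
    """Find the picture regions on a page and pair each with a likely caption.

    Returns one dict per picture with the region to crop and the text of
    the block immediately below it, or an empty caption if there is none.
    """
    ordered = sorted(blocks, key=lambda b: b.top)
    figures = []
    for i, block in enumerate(ordered):
        if block.kind != "picture":
            continue
        caption = ""
        if i + 1 < len(ordered) and ordered[i + 1].kind == "text":
            caption = ordered[i + 1].text
        figures.append({"crop": (block.top, block.bottom), "caption": caption})
    return figures


page = [
    Block("text", 0, 100, "Chapter one"),
    Block("picture", 100, 400),
    Block("text", 400, 430, "Fig. 1. A telephone exchange"),
]
figures = extract_figures(page)
```

Each `crop` range would then be cut from the page scan and saved as an image file, with the caption stored alongside it as metadata for the upload.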
"I think one of the greatest things people will do is time travel through the images," Leetaru told the BBC. "Type in the telephone, for example, and you can see that all the initial pictures are of businesspeople, and mostly men. Then you see it morph into more of a tool to connect families. You see another progression with the railroad where in the first images it was all about innovation and progress that was going to change the world, then you see its evolution as it becomes part of everyday life."
Leetaru and the Internet Archive plan to share the code with library partners, allowing them to add to the already extensive archive. Meanwhile, the Internet Archive Book Images Flickr page is available online for anyone to use.