Google is the No. 1 free tool to snoop on friends or strangers. But government agencies, including the Federal Aviation Administration, are investing in a new search engine being developed at the University at Buffalo to do some of their more sensitive detective work.
The technology, released as a prototype in recent weeks, is designed to mine a corpus of documents for associated ideas, such as connections between two seemingly unrelated concepts that would otherwise go unseen or would take countless hours of investigative work to discover. The project was funded specifically for anti-terrorism efforts and was initially used to search the 9/11 Commission report and public Web pages related to the suicide attacks carried out by terrorists who hijacked four U.S. commercial planes.
"Say you have the kind of question that connects these two people that we don't know about. You could start reading through all those documents. But our system is designed to look specifically for those evidence trails" that connect those two people, said Rohini Srihari, UB professor of computer science and engineering.
John McCarthy, professor emeritus of computer science at Stanford University, said that linking concepts is an old idea but that a new way of doing it could be an important breakthrough. In general, search engines such as Google and Yahoo mine documents for textual clues, or matches to query terms, rather than for the occurrence of ideas. Still, is working in the area of searching for concepts.
"The tools that we already have would be more useful if we could search on concepts," McCarthy said.
Srihari and a team in the Center of Excellence in Document Analysis and Recognition in the UB School of Engineering and Applied Sciences have been developing the search engine for the last two years. She said that her team plans to have a deliverable system for the FAA and the intelligence community by the end of the year, but it will not be widely available to the public. The underlying research, co-funded by the National Science Foundation, will also be published.
The technology, called a concept chain graph, uses several mathematical algorithms to find the best path connecting two concepts. It then ranks the links from strongest to weakest.
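The article does not say which algorithms the concept chain graph uses. As a rough illustration only, the Python sketch below assumes a hypothetical concept graph whose edge weights (between 0 and 1) express how strongly two concepts are associated. It enumerates every short path between two concepts and ranks the chains from strongest to weakest by multiplying their edge weights. The function name ranked_chains, the product scoring rule and the max_hops cap are assumptions for illustration, not UB's method.

```python
from math import prod

# Hypothetical undirected concept graph: each edge weight in (0, 1] stands in
# for how strongly the documents associate two concepts.
Graph = dict[str, dict[str, float]]


def ranked_chains(graph: Graph, start: str, goal: str,
                  max_hops: int = 4) -> list[tuple[float, list[str]]]:
    """Enumerate simple paths from start to goal and rank them strongest first.

    A chain's strength is the product of its edge weights, so a single weak
    link weakens the whole chain. Paths longer than max_hops edges are skipped.
    """
    results: list[tuple[float, list[str]]] = []

    def walk(path: list[str]) -> None:
        node = path[-1]
        if node == goal:
            weights = [graph[a][b] for a, b in zip(path, path[1:])]
            results.append((prod(weights), list(path)))
            return
        if len(path) > max_hops:  # already max_hops edges long; stop extending
            return
        for neighbor in graph.get(node, {}):
            if neighbor not in path:  # keep paths simple (no cycles)
                path.append(neighbor)
                walk(path)
                path.pop()

    walk([start])
    results.sort(key=lambda r: r[0], reverse=True)  # strongest to weakest
    return results
```

Exhaustive path enumeration grows quickly with graph size, which is why the sketch caps chain length; a system working over a large corpus would need a more efficient best-path search.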
For example, the engine might link John Smith, who belongs to a group that sponsors radical right-wing discussions, to company B, because company B owns a subsidiary that is the same organization that sponsors the discussions. The search engine would find the link automatically.
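The John Smith example can be sketched as a breadth-first search over facts extracted from documents. In this hypothetical Python sketch, the facts, the placeholder names Group A and Subsidiary C, and the evidence_trail function are all invented for illustration; the search returns the shortest chain of facts linking two entities, which plays the role of the evidence trail.

```python
from collections import deque

# Hypothetical facts extracted from documents: (subject, relation, object).
FACTS = [
    ("John Smith", "is a member of", "Group A"),
    ("Group A", "sponsors", "radical right-wing discussions"),
    ("Company B", "owns", "Subsidiary C"),
    ("Subsidiary C", "is the same organization as", "Group A"),
]


def evidence_trail(facts, start, goal):
    """Breadth-first search for the shortest chain of facts linking start to goal.

    Relations are traversed in both directions, since a fact is evidence of a
    link regardless of which entity is its subject. Returns None if no chain exists.
    """
    adjacency = {}
    for subj, rel, obj in facts:
        adjacency.setdefault(subj, []).append((obj, (subj, rel, obj)))
        adjacency.setdefault(obj, []).append((subj, (subj, rel, obj)))

    previous = {start: None}  # node -> (parent node, fact used to reach it)
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            trail = []
            while previous[node] is not None:  # walk back to the start
                parent, fact = previous[node]
                trail.append(fact)
                node = parent
            return list(reversed(trail))
        for neighbor, fact in adjacency.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = (node, fact)
                queue.append(neighbor)
    return None  # no connection in this collection
```

Called as evidence_trail(FACTS, "John Smith", "Company B"), the sketch returns three facts in order: John Smith is a member of Group A, Subsidiary C is the same organization as Group A, and Company B owns Subsidiary C.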
The search engine examines a limited collection of documents, such as the 9/11 Commission report. It indexes every document and then identifies the important concepts within them, such as names, places, dates and times, as well as concepts of particular interest to the intelligence community, such as guns, bombs and buildings. It then maps the connections to create a trail of evidence between two ideas.
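The indexing step might look roughly like the following Python sketch, which assumes a small hypothetical vocabulary of concepts and naive substring matching in place of real entity extraction. It links two concepts whenever they appear in the same document and weights the link by the number of such documents. This simple co-occurrence model is chosen for illustration and is not taken from the UB project; the names CONCEPTS, extract_concepts, build_concept_graph and "forum group" are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical concept vocabulary; a real system would use named-entity
# recognition and domain lexicons rather than a fixed list.
CONCEPTS = {"john smith", "company b", "forum group", "bombs", "buildings"}


def extract_concepts(text: str) -> set[str]:
    """Return the vocabulary concepts that appear in a document (naive substring match)."""
    lowered = text.lower()
    return {c for c in CONCEPTS if c in lowered}


def build_concept_graph(documents: list[str]) -> Counter:
    """Weight each concept pair by the number of documents in which both appear."""
    edges: Counter = Counter()
    for doc in documents:
        found = sorted(extract_concepts(doc))  # sort so each pair has one key order
        for a, b in combinations(found, 2):
            edges[(a, b)] += 1
    return edges
```

Given documents such as "John Smith attends events run by the Forum Group." and "Company B owns the Forum Group through a subsidiary.", John Smith and Company B never appear together, so the graph connects them only through the Forum Group. That indirect chain is the kind of link a concept-chain search is meant to surface.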
Computer scientists have taken other approaches to finding such connections, including techniques called graph mining and social-network analysis. Many search engines create associations by using hypertext links to establish connections between documents or query terms. By contrast, UB's project uses only textual analysis.