Two companies recently pitched me on their semantic engines. These are not search engines, as most people assume. Rather, they are databases and algorithms that hold the structure of language (in both cases, the English language). At the most basic level, semantic engines tell you what's synonymous with what. At the advanced end of the spectrum, they know that grammatically similar phrases like "take a seat," "take a stand," and "take a lollipop" mean completely different things.
These engines can be used by search products to greatly improve results. Powerset, now a part of Microsoft, made a big deal of its semantic chops by showing how vaguely worded search queries would return just the results you wanted. Now, it seems, that raw semantic technology is about to become mainstream.
Cognition recently announced its "world's largest semantic map of the English language," sporting more than "10 million semantic connections." The company is rolling the technology into products like CognitionSearch for the Enterprise, which is a knowledge mining tool, as well as an "eDiscovery" product for the legal industry that enables lawyers to "quickly and efficiently find incriminating, smoking gun documents." The company is also applying its technology to a new advertising engine.
The much smaller and newer company, Eeggi, which I was introduced to at Web 2.0 Expo in New York, is also building an engine for discerning meaning. Founder and chief scientist Frank Bandach told me his model was mathematical (his training is as a prime number theorist) and that his engine goes well beyond understanding synonyms. In his demo, he entered the query "Mary kissed John," and showed how traditional word-matching engines also picked up pages that were about John kissing Mary. His system understands English well enough to filter those out as misses.
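To make the difference concrete, here is a toy Python sketch of the "Mary kissed John" problem. It is not Eeggi's algorithm, which Bandach did not detail; the function names (word_match, role_match, parse_svo) are invented for illustration, and the "parser" simply assumes a three-word subject-verb-object sentence.

```python
def words(text):
    """Lowercase the text, drop periods, and split it into words."""
    return text.lower().replace(".", "").split()


def word_match(query, page):
    """Traditional word matching: every query word appears in the page,
    in any order."""
    return set(words(query)) <= set(words(page))


def parse_svo(sentence):
    """Toy parser (an assumption for this sketch): treats a three-word
    sentence as (subject, verb, object)."""
    subject, verb, obj = words(sentence)
    return (subject, verb, obj)


def role_match(query, page):
    """Order-aware matching: who did what to whom must agree."""
    return parse_svo(query) == parse_svo(page)


query = "Mary kissed John"
for page in ["Mary kissed John.", "John kissed Mary."]:
    print(page, "| word match:", word_match(query, page),
          "| role match:", role_match(query, page))
```

Word matching accepts both pages because it ignores word order, while the role-aware check rejects the second. A real engine would need an actual grammar and vocabulary rather than a fixed three-word pattern.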
Bandach says that he's got most of the English language in his system, and that he did English first, "because it's hard. Only Finnish is harder." He's going to work on German next, by feeding it some German dictionaries, which sounds like a science-fiction way to seed a semantic engine, but he said it's enough to get the system going. Bandach says his algorithms are efficient and not, like Powerset's, CPU hogs.
Unlike Cognition, Eeggi is an early-stage project with only four people working on it. It's far too early to tell if the technology is robust and scalable enough to compete with Cognition or Powerset. But I am encouraged to see small companies working on this problem and claiming intellectual breakthroughs. I really would not be surprised to see "meaning engines" become available to Web developers in the same way spelling checkers and grammar engines are now. I have no idea what developers will build with this technology, but I can't wait to see it.
See also: Cycorp.