But if a Google could come out of nowhere to
As it launches, Spock has more than 100 million people in its database, and the company plans to quickly add more by scouring other publicly available sites. While people-related search sites such as Wink, ZoomInfo.com and
CNET News.com recently sat down with CEO and co-founder Jaideep Singh to find out more. By the way, Singh says the company name has nothing to do with the Vulcan science officer of the Starship Enterprise. It's an acronym for "single point of contact and knowledge."
Q: How many people has Spock indexed now?
Singh: A little over 100 million people.
And you're adding approximately how many each day?
Singh: There are two things: one is people, and the other is how many documents we're processing, because one person may have many documents. We're really crawling and indexing the entire Web, picking out documents and organizing those documents around people.
Can you explain exactly how the technology works?
Singh: If you're looking for some specific keyword, Google is great. The issue is that when you now search for people on Google, what you get is a bunch of documents about people. If you have a popular name like David Stern, who is the NBA commissioner, the first couple of pages are really about that person. So you really can't find the David Stern you met at the bar or from a business meeting.
That's a simple manifestation of the thing. It takes a lot more technology to do what we're doing, which is really trying to figure out the unique David Stern and organize documents and information and images and relationships--all those things--around a person.
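Singh doesn't describe how Spock tells same-named people apart. As a rough illustration of the general idea, here is a minimal Python sketch that groups documents mentioning one name into separate people by how much their surrounding context overlaps. It assumes each document has already been reduced to a set of context terms; the function names, the Jaccard-overlap measure, the 0.2 threshold and the sample data are all hypothetical choices, not Spock's method.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Share of terms two sets have in common (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_by_context(docs: list[tuple[str, set[str]]], threshold: float = 0.2) -> list[list[str]]:
    """Greedily assign each document to the first cluster (assumed to be one
    real person) whose pooled context terms overlap enough; otherwise start
    a new cluster."""
    clusters = []  # each cluster: {"terms": pooled term set, "docs": doc ids}
    for doc_id, terms in docs:
        for cluster in clusters:
            if jaccard(terms, cluster["terms"]) >= threshold:
                cluster["docs"].append(doc_id)
                cluster["terms"] |= terms
                break
        else:
            clusters.append({"terms": set(terms), "docs": [doc_id]})
    return [c["docs"] for c in clusters]


# Hypothetical documents that all mention "David Stern".
david_stern_docs = [
    ("d1", {"nba", "commissioner", "basketball", "league"}),
    ("d2", {"nba", "basketball", "draft", "commissioner"}),
    ("d3", {"software", "startup", "engineer", "bar"}),
    ("d4", {"startup", "engineer", "meeting"}),
]
print(cluster_by_context(david_stern_docs))  # [['d1', 'd2'], ['d3', 'd4']]
```

Here the NBA commissioner's pages end up in one group and the engineer someone met at a bar ends up in another. A greedy, threshold-based pass like this is easy to follow but depends on document order; it is meant only to show why disambiguation is a different problem from keyword matching.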
How much harder is it to do that than a general search?
Singh: A lot harder. It's actually a different technology stack. The only thing that's common is crawling.
So where's the difference?
Singh: When we're done crawling, we go off in a different direction. Instead of just doing metadata extraction, we try to figure out who this document is about. We want to figure out the most relevant thing in that document. So, say there's a document about Charlie, and it says, "Jaideep likes to play tennis with Renee." That doesn't mean Charlie plays tennis or likes to play tennis. So you really have to understand language and understand what this document is all about, and that requires things like natural language processing and other technologies.
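The Charlie example can be made concrete. The toy Python sketch below contrasts two ways of crediting a fact found on a page: naively crediting it to whoever the page is about, versus crediting it only to people named in the sentence where the fact appears. The regular expression, the hypothetical `KNOWN_PEOPLE` list and both function names are illustrative assumptions. This is a crude pattern match, not the natural language processing Singh refers to.

```python
import re

# Hypothetical list of names the system already recognizes.
KNOWN_PEOPLE = {"Charlie", "Jaideep", "Renee"}

# Toy pattern: captures the word after "play"/"plays" as an activity.
ACTIVITY = re.compile(r"\bplays? (\w+)")
# Toy name finder: any capitalized word.
CAPITALIZED = re.compile(r"\b[A-Z][a-z]+\b")


def naive_attribution(page_subject: str, sentences: list[str]) -> dict[str, set[str]]:
    """Credit every activity on the page to the page's subject."""
    facts: dict[str, set[str]] = {}
    for sentence in sentences:
        match = ACTIVITY.search(sentence)
        if match:
            facts.setdefault(page_subject, set()).add(match.group(1))
    return facts


def sentence_attribution(sentences: list[str], known_people: set[str]) -> dict[str, set[str]]:
    """Credit an activity only to known people named in the same sentence."""
    facts: dict[str, set[str]] = {}
    for sentence in sentences:
        match = ACTIVITY.search(sentence)
        if not match:
            continue
        for name in CAPITALIZED.findall(sentence):
            if name in known_people:
                facts.setdefault(name, set()).add(match.group(1))
    return facts


page = ["Jaideep likes to play tennis with Renee."]
print(naive_attribution("Charlie", page))      # {'Charlie': {'tennis'}}
print(sentence_attribution(page, KNOWN_PEOPLE))
```

The naive version wrongly says Charlie plays tennis. The sentence-level version credits Jaideep and Renee and leaves Charlie out, which is the distinction Singh describes.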
Is there anything that you folks have come up with that's proprietary?
Singh: Absolutely. We have numerous patents. We have seven Ph.D.-type people in our company working on the algorithms for this thing. We have a lot of other outside help, including a lot of notable advisers from Stanford and from industry who are helping us really solve these problems. It's not just solving the problem, it's solving it at the scale of billions of Web documents. That's the largest-scale problem there is out there, so that's a challenge.
Is what we now see on the screen what the public will see when Spock opens up?
Singh: That's correct.
And one of the first questions they'll have is how this is different from Google.
Singh: Let me just step back a little bit. When users come to use the site, we think they're going to find it to be a very cool service because it's Google-esque in a way. You can give a query and type in a name or any keyword--you can say "Give me all the astronauts"--but when you do that, you get very well-organized results and see the picture of the person. You see the most relevant terms or words that define this person, and you'll see where they are on the Web and their relationships.
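Singh doesn't say how Spock chooses the terms that define a person. One common way to rank such terms is TF-IDF, which favors words that appear often for one person but rarely for everyone else. The minimal Python sketch below assumes documents have already been grouped by person and tokenized; the `profiles` data, the `top_terms` function and the choice of TF-IDF itself are hypothetical stand-ins, not Spock's implementation.

```python
import math
from collections import Counter


def top_terms(profiles: dict[str, list[str]], person: str, k: int = 3) -> list[str]:
    """Rank a person's terms by TF-IDF across all profiles.

    profiles maps each person to the tokens from documents attributed to them.
    Terms that appear for every person get an IDF of 0 and are dropped.
    """
    n = len(profiles)
    # Document frequency: in how many people's profiles does each term appear?
    df = Counter()
    for tokens in profiles.values():
        df.update(set(tokens))
    tf = Counter(profiles[person])
    scores = {term: count * math.log(n / df[term]) for term, count in tf.items()}
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, score in ranked[:k] if score > 0]


# Hypothetical tokenized profiles, e.g. results for "Give me all the astronauts".
profiles = {
    "astronaut_a": ["nasa", "shuttle", "mission", "nasa", "pilot"],
    "astronaut_b": ["nasa", "spacewalk", "mission", "engineer"],
    "commissioner": ["nba", "basketball", "league", "nba"],
}
print(top_terms(profiles, "astronaut_a"))  # ['pilot', 'shuttle', 'nasa']
```

"nasa" still ranks for the first astronaut but below "pilot" and "shuttle", because it is shared with another profile and therefore says less about this particular person.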
I tested the service earlier and it pulled up a lot more
Singh: You raised a really good point. Let's talk about that for a second. One has to realize that what we're doing is identical to what Google is doing in terms of indexing the Web. We're going out to public documents and picking up content. One has to realize that there is a lot of stuff about you on the Internet. You may have blogged someplace but it's on the Web. You may have a MySpace profile and it's on the Web. What we're finding is our users--when they come on to Spock--can really find this valuable in terms of "Hey, what has Spock discovered that's on the Web about me?" So, just knowing that is valuable.
Can you go beyond a firewall?
Singh: We don't do that. Unless it's out on the public Web, we don't try to get inside.