Cleaning spam from swapping networks

Researchers think computers that "gossip" with each other are key to filtering out ads--and piracy-fighting decoys--on P2P networks.

John Borland Staff Writer, CNET News.com
What if every other time you brought a book home from the library, you opened it up and it turned out to be a pop-up advertisement for a local dry cleaner?

Cornell University researchers are trying to clear file-swapping networks of this kind of disappointment, with a new program aimed at filtering spam out of the peer-to-peer pool. But the tool could also ratchet up the antipiracy arms race by filtering out the numerous "decoy" files used by Hollywood and record label allies to discourage illegal downloaders.

Released Monday, the researchers' "Credence" software lets different computers "gossip" with each other in the background to figure out which peer-to-peer files can be trusted, and which should be ignored. The researchers say they're trying to take a page from Google's book, boosting the accuracy of search results by relying on the recommendations of other trusted users.

"I believe in people; I think most people are honest," said Emin Gun Sirer, the assistant professor of computer science at Cornell who is leading the project. "I think it will be people on the periphery who will be kept out."

The project aims at the heart of peer-to-peer networks' biggest weakness today. Allowing people to search each other's hard drives has made hundreds of millions of files potentially available at a mouse-click, but search results remain spotty and badly organized, much like the early days of Web search.

What would ordinarily be a straightforward computer science question has been complicated by the fact that so many of the files on peer-to-peer networks are songs or videos under copyright. In this case, improving search results could also contribute to making copyright infringement more efficient.

Peer-to-peer networks have been polluted with junk files and spam almost since their inception. It took spammers only a few months to realize that the popular networks presented a new opportunity for unsolicited advertising, and to adapt their technologies accordingly.

Advertisements on peer-to-peer networks are typically sent by creating servers that automatically respond to any search request with an affirmative. Thus, if someone is searching for "Bush speech," a spammer might respond with a file that is dynamically named "Bush_speech," which instead turns out to be an ad.

Companies in the file-swapping software business say they've talked to ISPs about unplugging these spammers, but that it has largely remained up to consumers to adapt.

"Users tend to learn to detect and ignore it," said LimeWire Chief Technology Officer Greg Bildson. "It does hamper the user experience a bit, but isn't as bad as endless volumes of e-mail spam, for example."

However, the issue has been complicated by the rise of antipiracy companies, such as Overpeer, that seed file-swapping networks with false versions of popular songs and movies in attempts to prevent would-be copyright infringers from downloading the real versions.

A study performed in May 2004 by researchers at the Polytechnic University in Brooklyn, N.Y., found that more than half the copies of many popular songs found on the Kazaa file-swapping network were decoys, damaged or junk files.

Gossiping over digital fences
Peer-to-peer developers have long focused on the idea of "trust" on their networks. Because any computer can join and become an instant equal in a network, researchers have looked for ways to thwart attackers who want to disrupt data traffic or spread corrupted information.

Many modern peer-to-peer networks include basic file integrity ratings, which simply allow people to rate whether a file is good or not. These ratings have been easily evaded or abused, however--indeed, the Polytechnic study called Kazaa's file rating system so flawed that it was "meaningless."

A popular idea in universities has been establishing a "reputation" system, which would give different computers a way to know how much to trust each other. But such systems have typically been difficult to implement.

The Cornell researchers' open-source Credence system also starts with users giving ratings to files. But from there, the software "gossips" with other computers to see how other people have rated the same files, looking for evaluations that are similar. When searching for files, the software then gives precedence to results that have been rated highly by this "trusted" community of people whose ratings have matched.

The idea is to filter out spammers who rate their own files as genuine, by simply isolating them outside these communities of computers with good reputations.
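
The rate, gossip and rank steps described above can be roughed out in a few lines of Python. This is only an illustrative sketch, not Credence's actual code: the article does not say how the software measures similarity between raters, so the simple agreement score below, the function names and the sample votes are all assumptions made for the example.

```python
from typing import Dict

# Hypothetical vote format: file id -> +1 (genuine) or -1 (junk).
Votes = Dict[str, int]

def agreement(mine: Votes, theirs: Votes) -> float:
    """Score in [-1, 1]: how closely a peer's ratings match ours on shared files."""
    shared = mine.keys() & theirs.keys()
    if not shared:
        return 0.0  # no files in common, so no basis for trust
    return sum(mine[f] * theirs[f] for f in shared) / len(shared)

def file_score(file_id: str, mine: Votes, peers: Dict[str, Votes]) -> float:
    """Add up peers' votes on a file, weighted by how well each peer agrees with us."""
    total = 0.0
    for votes in peers.values():
        if file_id in votes:
            # Assumption: peers who disagree with us get zero weight.
            weight = max(0.0, agreement(mine, votes))
            total += weight * votes[file_id]
    return total

# Our own ratings, plus ratings "gossiped" to us by two other computers.
my_votes = {"song_a": 1, "ad_1": -1}
peer_votes = {
    "honest": {"song_a": 1, "ad_1": -1, "song_b": 1, "ad_2": -1},
    "spammer": {"ad_1": 1, "ad_2": 1},  # rates only its own files as genuine
}

# Rank two unrated search results by what trusted peers think of them.
results = sorted(
    ["ad_2", "song_b"],
    key=lambda f: file_score(f, my_votes, peer_votes),
    reverse=True,
)
print(results)
```

In this toy data, the spammer's ratings clash with ours on the one file we have both rated, so its endorsement of its own files carries no weight and the genuine song sorts ahead of the ad.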

While aimed at unwanted advertising, the reputation-based filtering system could also serve to filter out the decoy files propagated by Overpeer and other piracy fighters, unless they too spend time rating files to become part of trusted communities. That company's executives say they're not worried, however.

"It's an ever-escalating arms race," said Marc Morgenstern, general manager of Overpeer, a division of Loudeye. "We've tackled various types of filters successfully, and we feel very confident that we will continue to do so."