
The file-sharing dilemma

M-Terra CTO Darrell Smith says that while the world may be a different place, arguments in the peer-to-peer file-sharing space are still based on days gone by.

Attempting to deny the undeniable has become the accepted practice for many file-sharing companies, especially when it comes to the ability to filter. As the former CTO of one of those companies, I know firsthand that there are no technical limitations to the ability to filter. I currently own a company, M-Terra, that permits filtering within a distributed environment, and a growing number of companies do the same. Thus, the question is not whether file-sharing companies can filter, but whether they will.

The continued focus on keyword filtering as the way to filter copyrighted works is intended to divert attention from the facts. Kazaa has something that Napster didn't have: metadata. Metadata is the information that's kept about the files that a Kazaa user is sharing. An example of metadata would be the name of a song or the name of an album.

The Napster file-sharing system contained a database used for searching for music on the system. However, it contained limited information about the files. Napster didn't have a mechanism for tracking metadata. In order to find a song on Napster searching by song name, the file name of the MP3 file would have to contain the name of the song.

Take, for example, two Napster file names:

1. Shakespeare-Romeo_and_Juliet.mp3

2. srj.mp3

As you can see, the first file has the name of the artist and also the name of the song appended to it--if it's even a song. But the second file, which possibly is the same file, has a completely unrecognizable name. Using this method for searching content, it's hard for a computer system to know with confidence what the file really is.
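The weakness of file-name matching can be sketched in a few lines of Python. This is only an illustration of the idea, not Napster's actual search code; the function name `filename_matches` and the query are assumptions made for the example.

```python
def filename_matches(filename: str, query: str) -> bool:
    """Return True if every word of the query appears in the file name."""
    # Treat underscores and hyphens as spaces so words can be compared.
    normalized = filename.lower().replace("_", " ").replace("-", " ")
    return all(word in normalized for word in query.lower().split())


# The two file names from the example above.
files = ["Shakespeare-Romeo_and_Juliet.mp3", "srj.mp3"]

# Only the descriptively named file can be found by a name search.
hits = [name for name in files if filename_matches(name, "romeo juliet")]
print(hits)
```

A search or a filter built this way never sees srj.mp3, even if its contents are identical to the first file.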

Kazaa, on the other hand, has metadata, and its record for the same file would look like the following:

File_Name: Shakespeare-Romeo_and_Juliet.mp3

Author: Shakespeare

Title: Romeo and Juliet

Song: N/A

Type: Audio book

Price: .99

The metadata tells us this file is an audio book. However, looking at this same file in the Napster way would make us assume this file was a song.
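A rough sketch of how such a record could drive a search or a filter follows. The `SharedFile` class, the `PROTECTED` list and the helper functions are hypothetical names invented for this illustration; they are not Kazaa's data format or code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedFile:
    """Hypothetical metadata record, modeled on the fields shown above."""
    file_name: str
    author: str
    title: str
    media_type: str


def find_by_title(files, title):
    """Match on the title field, ignoring what the file itself is called."""
    wanted = title.strip().lower()
    return [f for f in files if f.title.strip().lower() == wanted]


# Assumed list of works a rights holder has flagged, keyed by (author, title).
PROTECTED = {("shakespeare", "romeo and juliet")}


def is_protected(shared_file):
    return (shared_file.author.lower(), shared_file.title.lower()) in PROTECTED


library = [
    SharedFile("Shakespeare-Romeo_and_Juliet.mp3", "Shakespeare",
               "Romeo and Juliet", "Audio book"),
    SharedFile("srj.mp3", "Shakespeare", "Romeo and Juliet", "Audio book"),
]

# Both files are found, including the one with the unrecognizable name.
matches = find_by_title(library, "romeo and juliet")
print([f.file_name for f in matches])
```

Because the match is made on the metadata fields, the cryptically named copy is caught along with the descriptively named one, and the type field shows that both are audio books rather than songs.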

Metadata is not perfect, but it can easily be made more accurate in determining whether a file is copyrighted. The question at this point should be: Why haven't any of the popular file-sharing companies added filtering, other than basic keyword filters? All the popular file-sharing clients support metadata.

Morpheus, the file-sharing application from Streamcast, goes even further than Kazaa in how it defines metadata. The open-source origin of the Morpheus application allows Streamcast, as a developer, to add more metadata to its network. This additional metadata can be used for embedding copyright information, or information about the user who's sharing the file with the public.

In a recent letter to Congress, Sharman refers to the term "hash marks," which are unique codes used to identify digital files, like MP3 music files. You could think of a hash value as being similar to a person's fingerprint, allowing for unique identification. Hash marks are another method that can be used for filtering files on Kazaa.

Sharman says that it's not aware of any applications that can scan its users' files for hash marks associated with copyrighted files. Yet the Kazaa application itself relies on hash marks to determine how a file is to be downloaded from the network to a user's computer.

If a user of Kazaa or Morpheus tries to download a song from the artist Sting, the file-sharing application would first look at a list of the songs found during a search, and then compare their hash marks to determine which songs in a list of many files are indeed the same copy. If the file-sharing application sees two or more files that it thinks are the same but with different hash marks, it will filter the choices of files to include only files with the same hash mark.
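The comparison described above can be sketched as follows. This is a simplified illustration under stated assumptions: SHA-1 from Python's standard library stands in for whatever hash the real clients compute, and the peer names, file contents and function names are made up for the example.

```python
import hashlib
from collections import defaultdict


def content_hash(data: bytes) -> str:
    """Stand-in for a client's hash mark (SHA-1 is assumed here)."""
    return hashlib.sha1(data).hexdigest()


def group_by_hash(results):
    """Group (peer, file_name, hash) search results by hash mark."""
    groups = defaultdict(list)
    for peer, file_name, digest in results:
        groups[digest].append((peer, file_name))
    return groups


def pick_sources(results):
    """Keep only the copies that share the most common hash mark."""
    groups = group_by_hash(results)
    digest, sources = max(groups.items(), key=lambda item: len(item[1]))
    return digest, sources


copy_a = b"identical audio bytes"
copy_b = b"a different encoding of the same song"

results = [
    ("peer1", "song.mp3", content_hash(copy_a)),
    ("peer2", "track01.mp3", content_hash(copy_a)),
    ("peer3", "song.mp3", content_hash(copy_b)),
]

digest, sources = pick_sources(results)
print(sources)
```

Note that the two matching copies have different file names, while the odd one out shares a name with the first. The hash, not the name, decides which files count as the same copy.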

Using hash marks for identifying similar files is very important for file-sharing applications. This process allows for downloading a single song from multiple computers at the same time, speeding up the download time. The technology exists today for creating similar systems for identifying copyrighted music or other digital works. Some of the existing digital fingerprinting tools and databases for identifying music contain information for millions of copyrighted works.

The developers of the Kazaa application could easily write a piece of code that could interface with a fingerprinting tool. The fingerprinting tool would confirm if the file was approved for downloading, and would even allow the file to be sold at a specified price similar to Apple's iTunes music store.
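As a hedged sketch of what such a piece of code might look like, the example below checks a file's fingerprint against a registry before a download proceeds. Everything here is hypothetical: the `FingerprintRegistry` class, the record fields, the fingerprint strings and the policy of allowing unknown files are assumptions for illustration, not the interface of any real fingerprinting product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FingerprintRecord:
    """Hypothetical entry in a fingerprint database."""
    title: str
    approved: bool
    price: Optional[float] = None  # None means no price is set


class FingerprintRegistry:
    """Hypothetical stand-in for an external fingerprinting tool."""

    def __init__(self, records):
        self._records = dict(records)

    def lookup(self, fingerprint: str) -> Optional[FingerprintRecord]:
        return self._records.get(fingerprint)


def download_decision(registry: FingerprintRegistry, fingerprint: str) -> str:
    """Decide what the client should do before it starts a download."""
    record = registry.lookup(fingerprint)
    if record is None:
        return "allow"  # assumed policy: files unknown to the registry pass
    if not record.approved:
        return "block"
    if record.price:
        return f"sell at ${record.price:.2f}"
    return "allow"


registry = FingerprintRegistry({
    "fp-001": FingerprintRecord("Romeo and Juliet", approved=True, price=0.99),
    "fp-002": FingerprintRecord("Unlicensed track", approved=False),
})

for fp in ("fp-001", "fp-002", "fp-999"):
    print(fp, "->", download_decision(registry, fp))
```

The client only needs the single `lookup` call. Whether the answer comes from a local list or a remote database is a detail hidden behind that call.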

During the days of the original Napster application, network bandwidth was at a premium, supercomputers cost over $100 million, the average hard drive in a computer could only store about 1,000 songs, and filtering digital files was impractical. Now, let's fast-forward to 2004. Bandwidth is plentiful (in the United States), a supercomputer can be built for $5 million instead of $100 million, and we can carry 10,000 songs in our shirt pocket.

The world is a different place, but arguments in the peer-to-peer file-sharing space are still based on days gone by. The current crop of peer-to-peer file-sharing companies is actually hurting its own cause by purposely not innovating in its applications. The reason? They are scared they might be told to start filtering.