Open-sourcing factual data, Wikipedia style

FriendFeed's Bret Taylor says the barriers to accessing factual data, such as mapping and stock information, are holding back innovation. But there's an alternative.

Bret Taylor, formerly of Google and now of FriendFeed, has a greater appreciation for the business development function. In a post today he wrote about the challenges of getting legal access to factual data--such as mapping, stock quotes, white pages, TV schedules, movie show times, and sports scores--for use in applications.

If you want to experiment with a new driving directions algorithm, it is infinitely more difficult than coming up with an algorithm; you have to hire a lawyer and sign a contract with a company that collects that data in the country you are developing for.
Bret Taylor: Free the data

He adds that some of the data has quality problems or is incomplete. In sum, Taylor believes that the current environment stymies innovation and raises the barrier to entry. It's not just the need for lawyers and contracts; companies that sell data also restrict how it can be used.

What is the solution to freeing up the data? Taylor advocates open-sourcing factual data and competing on the use of the data, not access to it. He wrote:

To this end, I think we should create a Wikipedia for data: a global database for all of these important data sources to which we all contribute and that anyone can use. When a user reports an inaccurate phone number in your products, save it back to the DataWiki so everyone can benefit, and in return, you get everyone else's improvements as well. If your local movie theater doesn't have listings data in DataWiki, you can type it in yourself, and everyone in your town can benefit, and all the products you use that access movie listings will automatically update. Need better mapping data for a city? Pay to collect it, and upload it to the DataWiki. In return you get all the other cities other companies paid for (sort of like a company contributing device drivers to the Linux kernel).

For centuries, companies have made money in exchange for doing the busy work of collecting, massaging, and publishing factual data. The same was true for encyclopedia data until recently. Taylor is definitely onto something, but his proposal presents some real data collection challenges. The open-source community is sure to take up the challenge.

The question is, will the companies that already have the data be of assistance? It's not exactly in their best financial interest to give away their content, but the example of Wikipedia should give them the incentive to press the pause button.

See also: Sarah Perez discusses where to find open data on the Web, such as CKAN (Comprehensive Knowledge Archive Network), OpenStreetMap and Freebase.

