Wikidata to provide structured data for all Wikipedia versions

A new initiative will make it possible for any language version of the online encyclopedia to automatically pull in data rather than enter the information manually. But will it drive off editors?

Daniel Terdiman Former Senior Writer / News
Daniel Terdiman is a senior writer at CNET News covering Twitter, Net culture, and everything in between.

The more than 280 language editions of Wikipedia often share data elements like people's birth dates and definitions, yet there has never been a single central data repository from which each version could pull such information. Until now.

Today, the German chapter of the Wikimedia Foundation took the wraps off Wikidata, a project that aims to be a single common source of structured data that can be used across all versions of Wikipedia. By December, that should allow editors of each language version of a Wikipedia article to pull data from that repository rather than adding it by hand.

The project has an initial budget of $1.7 million, half of which is being funded by Microsoft co-founder Paul Allen's Allen Institute for Artificial Intelligence. An additional 25 percent of the funding is coming from Google, and the remaining 25 percent is from the Gordon and Betty Moore Foundation.

According to Andrew Lih, the author of "The Wikipedia Revolution," the Wikidata effort is a "logical progression" for Wikipedia, and one that will do for data use in the massively popular online encyclopedia what the Wikimedia Commons has done for sharing media across the many different Wikipedia versions.

"When Wikipedia started out, it was every Wikipedia [language version] for itself," Lih said. "It was a manual process of...copying articles over."

The Wikimedia Commons allows editors to automatically pull in things like photographs and maps, and Lih said he sees Wikidata's making structured data commonly available as "much more efficient, and a better way to organize data."

But Lih also worries that the move to more technical solutions like Wikidata will hasten a trend that may be scaring away less tech-savvy Wikipedia editors. Over the years, he explained, the act of adding new information or articles to Wikipedia has gone from being something simple that anyone can understand to being something that requires filling in code that will correctly enter the new data in a database. And that was before the advent of Wikidata.

The problem, he said, is that the Wikipedia editing user interface is becoming trickier and trickier to navigate. That may be driving away people who have trouble with complex data entry systems and making it hard for Wikipedia to recruit new editors. And it will only get worse as more and more articles are reconfigured to pull information from Wikidata.

Still, Lih acknowledges that something like Wikidata is a tool that's essential for the future of Wikipedia because it will allow article editors to query the central database for, say, all Austrian actors born between May 5 and May 12. That's not possible today because the information that appears in Wikipedia articles is nothing more than text, with no structured fields to search against.
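To see why structured data makes that kind of question answerable, here is a minimal sketch in Python. It is not Wikidata's actual data model or query interface. The `Person` record, its field names, the `born_between` helper, and the sample entries are all hypothetical and made up for illustration. The point is only that once nationality, occupation, and birth date are stored as separate fields rather than buried in prose, the query becomes a simple filter.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Person:
    """Hypothetical structured record; field names are illustrative only."""
    name: str
    nationality: str
    occupation: str
    born: date


def born_between(people, nationality, occupation, start, end):
    """Return names of matching people whose birthday falls in [start, end].

    start and end are (month, day) tuples; the birth year is ignored.
    """
    return [
        p.name
        for p in people
        if p.nationality == nationality
        and p.occupation == occupation
        and start <= (p.born.month, p.born.day) <= end
    ]


# Made-up sample records, not real data.
PEOPLE = [
    Person("Actor A", "Austrian", "actor", date(1950, 5, 7)),
    Person("Actor B", "Austrian", "actor", date(1962, 6, 1)),
    Person("Writer C", "Austrian", "writer", date(1948, 5, 10)),
    Person("Actor D", "German", "actor", date(1971, 5, 9)),
    Person("Actor E", "Austrian", "actor", date(1980, 5, 12)),
]

# Austrian actors born between May 5 and May 12 (inclusive).
matches = born_between(PEOPLE, "Austrian", "actor", (5, 5), (5, 12))
print(matches)  # ['Actor A', 'Actor E']
```

Answering the same question from article text alone would mean parsing free-form sentences in every language edition, which is the limitation Lih describes.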

"So even though I'm hesitant," Lih said, "there's no doubt that this is the way it has to go."