Should "open source" include open data?

Open source is increasingly meaningless if the data it manages isn't open, too. Should the Open Source Definition be expanded to include data provisions?


I just read Glyn Moody's post on the importance of open data and, increasingly, open source in science. Good science requires good data--data available to anyone who wants to replicate another's results and confirm that real science is going on, not pseudo-science.

Marry that to Tim O'Reilly's insistence that data, not code, is the new lock-in (and cross that with my own declaration that Microsoft's new platform for lock-in is SharePoint, not Office), and you end up with what I think is an implicit, urgent need in open source today:

The need to ensure data remains free/open.

I'm not speaking for the Open Source Initiative here, but to me this makes it critical to add open data provisions to the Open Source Definition. Why? Because open source that locks down one's data is not all that open, in the grand scheme of things.

As some others have suggested, I, too, believe that Wesabe's Open Data Bill of Rights is a good start. It requires:

  • You can export and/or delete your data from Wesabe whenever you want;
  • Your data is your data, not ours. Our job is to help you understand and act on your data;
  • We'll keep all of your data online and accessible for as long as you have an account. No "archive access" charges;
  • Any data you want us to keep private, we will;
  • If a question comes up not covered by these rights, we will answer it remembering that your data belongs to you.

Simple and yet powerful. It means, essentially, that getting one's data/content out of Wesabe is as easy as putting it in. This is obviously good for the customer, and it may make monetizing user data more difficult than it otherwise would be. Too bad.

Open data should be an inalienable right, one that customers expect from their vendors, whether the product is a consumer-focused financial service (as Wesabe is) or enterprise-level content management software (as SharePoint is).

Is there any downside to expanding the Open Source Definition to include data rights? Or should approving data policies fall to an entirely different body, separate from the OSI?
