Who owns transit data?

Some transit agencies are opening up their data, while others keep it under lock and key. But the prevailing movement is clearly toward openness.

Rafe Needleman Former Editor at Large
Rafe Needleman reviews mobile apps and products for fun, and picks startups apart when he gets bored. He has evaluated thousands of new companies, most of which have since gone out of business.

Commuters on public transit want to know two fundamental things: when can I expect the bus or train to pick me up? And when will it drop me off at my destination?

Nowadays, they may also be wondering whether their local transit agency is willing to share that data with others who can put it into new and helpful formats.

How likely is it that the arrival and departure information will be available on a site or service other than the official one? That depends on how open your local agency is. In some metro areas, transit agencies make data--routes, schedules, and even real-time vehicle location feeds--available to developers to mash into whatever applications they wish. In others, the agencies lock down their information, claiming it may not be reused without permission or fee.

In local blogs and on transit sites, outrage over agencies and companies that claim ownership of the data is growing. The core argument against locking down such data is that it's collected by or paid for by public, taxpayer-funded agencies and thus should be open to all citizens, and that schedule data by itself is not protectable content. The argument for locking it down is that the agencies might be able to profit from the data if they can maintain control of it. The counter to that is the belief that if the data is open, clever developers will create cool apps that make transit systems more usable, thus increasing ridership and helping transit agencies live up to their charters of moving people around and getting as many private cars as possible off the roads.

StationStops gives New York metro rail commuters a timetable on their iPhones. (Credit: StationStops)

Each city and metro area with a transit system is unique, but there are three cases in the U.S. that highlight the way the transit data drama can play out.

New York locks down subway schedules
As reported last week at ReadWriteWeb and elsewhere, the New York Metropolitan Transportation Authority (MTA) believes its public train schedules are protected by copyright, and it has applied an interpretation of the Digital Millennium Copyright Act (DMCA) to send a takedown notice to the developer of StationStops, an iPhone app that gives people access to train schedules on the Metro-North lines.

According to StationStops developer Chris Schoenfeld, the MTA claims that the StationStops iPhone app (not the Web site) infringes on MTA intellectual property. The MTA, Schoenfeld says, has sent a letter to Apple to get it to remove the app from the iTunes App Store. As of this writing, the $2.99 iPhone app is still available.

Schoenfeld does believe that he and the MTA will come to an agreement for use of the data, even though the initial communications were not promising: the MTA, he says, was asking for royalties on use of the data in arrears, at a price that would basically drive him out of business as an app developer in the category. Schoenfeld and his lawyer say that the data isn't protectable content.

Furthermore, Schoenfeld says the procedure that the MTA proposed for updating data for him and other developers is archaic: the agency said it would send StationStops the schedule data on CD-ROM, and that it would send updates only after receiving paper letters requesting them--guaranteeing that Schoenfeld would never have current data.

San Francisco writes data accessibility into contracts

The Routesy iPhone app uses NextBus data to predict transit arrival times. (Credit: Routesy)
In San Francisco last week, Mayor Gavin Newsom unveiled (via TechCrunch) the Datasf.org initiative, which aims to put all the city's data online for open access. Included in the program is the San Francisco Municipal Transportation Agency's schedule data. There's no question that this is a positive development for San Francisco Bay Area transit app developers and that it sets a good precedent for developers elsewhere. However, static schedule data is not the whole story for transit apps, especially on systems where route schedules are poorly adhered to (on New York's Metro-North lines, the schedules are somewhat reliable; for San Francisco's MUNI buses, they are not). The most useful new apps collect real-time vehicle location data, and that information is not yet available from Datasf.org.

In many cities, a company called NextBus gathers location data from vehicles and then makes that information available to the subscribing cities as well as on its own Web site. Developers of real-time transit iPhone apps, such as San Francisco's Routesy and iCommute, have had mixed results in getting access to that data.

The drama around the NextBus data appears to be due in part to the actions of a separate company, confusingly called NextBus Information Systems (NBIS), which has access to the NextBus Inc. data and which has apparently claimed the exclusive right to license it. NBIS is run by the team that founded the original NextBus Inc. and later sold it to Grey Island Systems. A claim from NBIS to the Apple iTunes store led to Routesy being taken down from the store, although the app was reinstated this month.

San Francisco's recently renegotiated NextBus contract contains clear language stating that the real-time data is the SFMTA's property even though the SFMTA pays NextBus to collect it, and that the SFMTA may make it freely available to developers (see last paragraph in this story). As SFMTA spokesperson Judson True says, "There were some legal loose ends from the original contract. We approved a new contract that has clear language on data ownership issues." True also says that there's a nationwide movement to make data created with public funds available to the public--and more importantly, available to entrepreneurs.

The SFMTA contract even specifically states that arrival time prediction data--information created by NextBus based on data derived from vehicle locations--is part of the NextBus agreement.

Visit Portland for the best in transit apps
In Portland, Ore., openness on the part of the local transit agency has been a blessing for transit app developers. There are more than 25 apps that use the public TriMet data stream. Many of the apps duplicate others' functions and features, but it's just this kind of competition that makes apps better over time. When companies control data about their services and are the only ones to provide the apps that use the data, users do not get the same benefit of rapid application evolution.

Google drives the bus
Google is the most aggressive company in the transit planning business. If you ask Google Maps for directions, by default it will route you by car, but you can also ask it to give you directions by public transit. In many metro areas, it will even direct you among different transit systems (from a local bus line to a commuter rail system, for example).

To get the data that powers the transit routing, Google became instrumental in creating a standard that transit agencies can use to publish schedules: the Google Transit Feed Specification, or GTFS, which Google has been developing in concert with transit agencies. First developed with Portland's TriMet agency, it's now being used by agencies in more than 400 cities, Google software engineer Joe Hughes says.
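For readers curious what a developer does with such a feed, here is a minimal sketch in Python. It assumes a GTFS-style stop_times.txt file, which is plain comma-separated text with one row per trip per stop. The sample rows, the trip and stop IDs, and the helper names (to_seconds, departures_at) are made up for illustration and do not come from any agency's feed.

```python
import csv
import io

# Tiny inline sample in the shape of a GTFS stop_times.txt file.
# The trip and stop IDs are invented for this illustration.
SAMPLE_STOP_TIMES = """trip_id,arrival_time,departure_time,stop_id,stop_sequence
T1,08:00:00,08:00:00,S1,1
T1,08:10:00,08:11:00,S2,2
T2,08:30:00,08:30:00,S1,1
T2,08:40:00,08:41:00,S2,2
T3,25:05:00,25:05:00,S2,2
"""


def to_seconds(hhmmss):
    """Convert a GTFS time string to seconds after the start of the service day.

    GTFS allows hours of 24 and above for trips that run past midnight,
    so the string is split by hand instead of parsed as a clock time.
    """
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def departures_at(stop_times_file, stop_id, after="00:00:00"):
    """Return (departure_time, trip_id) pairs for one stop, earliest first."""
    cutoff = to_seconds(after)
    rows = csv.DictReader(stop_times_file)
    hits = [
        (row["departure_time"], row["trip_id"])
        for row in rows
        if row["stop_id"] == stop_id
        and to_seconds(row["departure_time"]) >= cutoff
    ]
    return sorted(hits, key=lambda hit: to_seconds(hit[0]))


if __name__ == "__main__":
    for time, trip in departures_at(io.StringIO(SAMPLE_STOP_TIMES), "S2", after="08:15:00"):
        print(time, trip)
```

Because the format is just text files with agreed-upon column names, the same few lines would work against any agency's published feed, which is a large part of why an open schedule standard lowers the barrier for app developers.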

Google Maps gets its transit data using the GTFS spec that it helped develop. Screenshot by Rafe Needleman/CNET

Hughes confirmed that in recent months cities have been getting more receptive to the ideas of open and public transit data. "For a long time the default was not giving it out," he says. "I'm happy to see a change in the zeitgeist, a push for government transparency. People are putting this in their political platforms, which is helping."

Hughes says, "Agencies that lock up the data have less control over the accuracy of what's out there. It's also a false economy to charge for the data. If you put it out for free, you get great apps and more riders."

Currently, the GTFS covers only schedule data. Google doesn't know where trains and buses actually are, and thus can't tell you when you really need to get to the station to catch your train or bus--only when you should get there if the system you're riding respects its timetables. As city dwellers know, on many metro transit systems, timetables are fiction--hence the real value of the NextBus scheme, which predicts arrival times based on actual vehicle location. Some agencies, like TriMet, also have their own protocols for delivering real-time location data to application developers. Google is likely to work with TriMet and NextBus to bring real-time data to its transit effort, just as it did when it built the schedule-based GTFS spec.
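The gap between a timetable and a location-based prediction can be shown with a deliberately naive sketch. This is not how NextBus or TriMet compute predictions; it simply assumes that the distance remaining to the stop and the vehicle's recent average speed are known, and the function names and numbers are hypothetical.

```python
def predicted_arrival_seconds(distance_to_stop_m, avg_speed_mps):
    """Naive estimate: time to arrival = distance remaining / average speed."""
    if avg_speed_mps <= 0:
        raise ValueError("average speed must be positive to estimate an arrival")
    return distance_to_stop_m / avg_speed_mps


def seconds_behind_schedule(scheduled_in_s, distance_to_stop_m, avg_speed_mps):
    """Positive result: the vehicle is running late relative to the timetable."""
    predicted = predicted_arrival_seconds(distance_to_stop_m, avg_speed_mps)
    return predicted - scheduled_in_s


if __name__ == "__main__":
    # Hypothetical bus: 1,500 meters away, averaging 5 meters per second,
    # while the timetable says it is due in 2 minutes.
    print(predicted_arrival_seconds(1500, 5))
    print(seconds_behind_schedule(120, 1500, 5))
```

A schedule-only app would tell the rider in this example to expect the bus in two minutes; an app with access to vehicle positions could say five. Real prediction systems presumably account for traffic, stops, and history, but even this crude version shows why developers want the location feed and not just the timetable.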

For the moment, the movement in transit is toward an opening up of data, for both schedules and vehicle positions. It appears unlikely that agencies that attempt to hoard their data--or sell it--will be able to withstand the increasing public and political pressure to open it up.