The UK's local public transport data is effectively a closed dataset. The situation in the US seems similar: In spite of the benefits, only a handful of agencies (such as BART and TriMet on the west coast of America) have released raw data freely.

That hasn't stopped "screen-scraping" of data or simply typing in paper timetables (from Urban Mapping to many listed here). Unfortunately, the legal basis for scraping is complex, which creates significant risks for anyone building a business. For example, earlier this year, airline Ryanair requested the removal of all their data from Skyscanner, a flight price comparison site that gathers data by scraping airlines' websites. How many airlines would need to object to their data being scraped before a "price comparison" service becomes unusable?

User-generated mapping content is evolving, often to circumvent restrictive distribution of national mapping. Services include OpenStreetMap and the recently announced Google Map Maker.

Micro-blogging, primarily through Twitter, has started to show the potential of individual travellers to report information about their journeys: Ron Whitman's Commuter Feed is a good example. Tom Morris has also experimented with London Twitter feeds.

This article outlines why the "social web"/tech-entrepreneur sector may wish to stop trying to use official sources of data, and instead apply the technology it understands best: People.

The Big Picture

I will use the example of UK local bus data to summarise the strategic issues for data providers. I can only presume the issues are similar elsewhere (comments welcome).

Explaining exactly who the data providers are is one of the many problems of trying to extract and use the data. I would provide more detail, but the topic is somewhat sensitive. The most critical point in the chain that constructs and distributes the data is the local authority - a sub-regional public body, typically one responsible for a large city, conurbation or county. Local authorities process the data, but are not under any statutory requirement to do so (no national government legislation requires it).

There are a number of issues for the existing data providers:

  1. Mindset of centralised control: Most operators, public authorities, and other agencies, still have a mindset of centralised control of information, delivered to users via the method the agency believes is appropriate. This is heavily driven by the belief that only the agency can be accountable or impartial, and that incorrect information supplied by an uncontrolled third party is likely to damage the image of local transport service and generally reflect badly on the agency.
  2. Mindset of local: Most agencies are locally focused, locally orientated. It seems logical for them to commission a fully-functioning website or piece of information delivery software that is specific to their city, because their target market is local. There's a lack of global perspective: An agency will typically commission a system that is specific to their city, even when 95% of the features would work for any city, and 90% are already in existing global products.
  3. Not appreciating trends in delivery channels: There is still an attitude of "we'll provide a website", without comprehending that the number of channels for delivering information is exploding far faster than any one agency can hope to build bespoke user interfaces for. Examples include mobile devices and integration into social software. There would probably be a market for a "WiFi-enabled" alarm clock that rings later if your morning train has been delayed: We simply can't define the limits of how this information might be used.
  4. Not appreciating trends in cost: Even large, well-funded agencies are starting to fall behind the technology. The cost of systems (many millions of dollars invested year on year in some cases) is starting to hurt. Logically the global system should win out, because one city is very much like another: There is considerable scope for sharing systems costs.

What It Means

Long term, we are heading for global providers of information that pool data from local sources. That will be forced by the cost of technology. The same pattern can be seen in technology costs driving agglomeration in the groceries sector (such as Walmart) over the last 30 years, and in the move from customised mainframe computing to shared operating systems and platforms (such as Windows). The pressure here will be worse, because the number of systems will be exploding at the same time as the complexity of those systems.

As these issues become progressively better understood, data will become more centralised. Even in agencies where (in my opinion) uniqueness and absolute control are culturally inbred, such as London Transport/TfL, cost will eventually win the argument.

However, centralised data handling does not automatically make the data open. Quite the opposite.

Contracted Provision

Currently, effective control of data is with local government. Many individuals within local government will naturally attempt to block any change that might leverage power away from them and their organisation. "Job protection" is an over-simplification, but helps explain the underlying position. But by contracting data handling and presentation to a third-party contractor, local government would gain the technological "economies of scale" (assuming the contractor won many contracts from different authorities) and notionally maintain control.

Use of third-party contractors is already common within the local government sector, particularly for Information Technology.

An example can be seen in Edinburgh City Council's Traffic Map. In spite of how it appears, the information isn't powered directly by Edinburgh City Council or Google. Instead it is part of Mott MacDonald's Common Data Management Facility, providing services under contract to many different local authorities.

In the UK public transport arena, Trapeze is a good example of the gradual agglomeration of data handling within a few large businesses, where historically many small software providers could be found.

The example above provides key driver information, and is somewhat useful, but is it the best outcome? I suspect not. Contracts tend to be priced highly, because local government clients are high risk: Their political control means that they can change their strategic direction and requirements unexpectedly. At best, customer feedback loops through local authorities are slow and politicised. At worst the design of the system will reflect the arbitrary views of a self-proclaimed expert (such as myself). Even if you think it is perfect, there is no scope for choice or creativity. Choice is good and need not be expensive.

Social Provision

Instead of using official data, why not let users reconstruct it? User-generated content is cheaper to create than information from professionally staffed sources: Since very many contributors each do only a little work, no individual expects payment. User-generated content can be just as accurate too, although this is not automatic: It depends, for example, on a strong community that subjects everything to peer review, weeding out poor information and contributors.

This is not an entirely theoretical position. There is a largely untapped human resource, just waiting to help.

The transport enthusiasts (transit fans, "spotters") already collate and produce some extremely high quality information about certain technical aspects of operations and services. For example, enthusiast-run sites contain detail on bus route timing and vehicle allocation (type and number of buses), which turns out to be difficult to extract from official sources. While it may be argued that these sites simply repackage official information, their very existence is a testament to the strength of the underlying community.

Casual observation of people delayed on trains or in traffic suggests they derive some comfort from picking up their mobile (cell) phone and telling someone about it. Something they can do, in a scenario they otherwise have no control over. Their desire to communicate the same information to drivers or users 10 miles behind them (who might be able to re-plan their route, should they know) is untested. But the potential is intriguing.

Nobody has entirely worked out how to use these people. Yet.

Battle Lines

If the social web/tech-entrepreneur sector chooses to fight the "status quo" head on, it does so against large multi-national IT providers who support clients with historically entrenched positions. Not a contest that favours the underdog.

If the tech "upstarts" can find a way to use this human resource effectively, they will ultimately provide a more cost-effective solution than the traditional "government IT" sector can offer. Integrate that user-generated information into the wider consumer internet, and the machinery of government simply won't be able to justify its historic position of pouring millions into systems it controls. The "social web"/tech-entrepreneur sector wins.

The upstarts do not need perfect source data if users consider the way results are implemented to be better. The early Xephos vs TransportDirect comparisons provide some evidence. The success or failure of the social web/tech-entrepreneur sector is ultimately dependent on whether it can provide better information than official sources, using the resources and skills it has available.

Disclaimer: The contents of this article reflect my own personal analysis of the situation. This does not directly reflect advice to, or views of, government or anyone else involved in the handling and provision of public transportation data.
