The value of opening up government data

I presented at the British Library’s event “Open and Engaged” as part of Open Access week on 22 October 2018, on the value of opening up government data. Here are the slides, which I’ve adapted into the following post, with a post-script of additions generously suggested by the ever-excellent Steve Messer.

Why has government opened up data?

Probably the first motivation for opening up government data was to increase transparency and trust.
The MP expenses scandal led to a political drive to make government and politics more transparent.

Data.gov.uk was commissioned by Gordon Brown and overseen by Tim Berners-Lee, and built in 2009/10.

In 2010 Prime Minister David Cameron wrote to government departments on plans to open up government data, promising “Greater transparency across government”. He wrote of a desire to “enable the public to hold politicians and public bodies to account”, “deliver better value for money in public spending” and “realise significant economic benefits”.

What transparency data is published?

Theresa May published a letter in December 2017 clarifying what data departments and the Cabinet Office were expected to publish. This includes things like the pay of senior civil servants, and central government spending over £25,000. (Monitoring and enforcing this is an interesting challenge. Subsequent to this talk, I’ve been having some thoughts and conversations about how we might do this better.)

Transparency data around the world

In the USA you can track spend data back to policy commitments:

It will have taken a lot of political work to have consistent identifiers between different parts of government, so that this type of scrutiny is possible. Not glamorous, but very valuable – a trend you’ll see more of in data work.

My favourite example of transparency work in other countries is DoZorro.org:

Ukraine’s recent reform work is highlighted by its more open online public procurement system. An ecosystem of tools and an engaged community have emerged around this data. Citizen monitoring platform www.DoZorro.org has been used to bring 22 criminal charges and 79 sanctions so far.

This open procurement data has also led to the creation of a tool for identifying corruption risks http://risk.dozorro.org/:

It’s also led to the creation of a business intelligence tool http://bi.prozorro.org:

This takes us to the second big benefit of opening up government data.

Economic value of opening up government data

Open data improves data sharing within government. Previously, having to send a Freedom of Information request to someone else in your own department to access information was a thing that actually happened.

Looking at the datasets on data.gov.uk that are used more, they generally have a clear economic use. These include datasets with information on land and property payments, or information on MOT testing stations. Other popular datasets are more related to understanding society, and are likely used by councils, third sector organisations and other agencies interested in planning service provision – e.g. English Indices of Deprivation 2010 and Statistics on Obesity, Physical Activity and Diet, England.

Measuring value is hard

I don’t think the above section was compelling enough. This is because measuring the value of open data is hard. There are a number of different techniques you can use to measure value. None of them are great – either you have something cheap and broad, which doesn’t give deep insight, or you have to commission a deep and expensive study.

Johanna Walker of the University of Southampton is a great source on this type of thing. She presented ‘Measuring the Impact of Open Data’ in Paris on 14 September 2018. (Aside: Johanna Walker suggested semantically-augmented version control as a way of ensuring quality and consistency, and of giving a better idea of how a dataset is being used.)

This post has more thoughts about the value of open data.

International Leadership

The UK’s early work on opening up government data has helped set the direction internationally.

International rankings are a relative thing, and the rest of the world has been catching up.
The UK has been first in the Open Data Barometer for 5 years in a row. Now we’re joint first with Canada.

Global Open Data Index

Some other countries are doing really good things with procurement data. This work is hard – it took Brazil 5 years to get consistent identifiers so that you can link policy to budget to spend.
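To make the point about consistent identifiers concrete, here is a minimal sketch of linking policy to budget to spend. Everything in it is invented for illustration: the identifiers (`POL-1`, `BUD-10`), the field names and the amounts are assumptions, not taken from any real government dataset. The idea is simply that spend can only be rolled up to a policy when every payment carries a budget identifier that resolves to a known policy.

```python
from collections import defaultdict

# All identifiers, names and amounts below are invented for illustration.
policies = {"POL-1": "Rural broadband"}
budget_lines = [
    {"budget_id": "BUD-10", "policy_id": "POL-1", "allocated": 500_000},
    {"budget_id": "BUD-11", "policy_id": "POL-1", "allocated": 250_000},
]
payments = [
    {"budget_id": "BUD-10", "amount": 120_000},
    {"budget_id": "BUD-10", "amount": 80_000},
    {"budget_id": "BUD-11", "amount": 50_000},
    {"budget_id": "BUD-99", "amount": 10_000},  # identifier with no matching budget line
]


def spend_by_policy(policies, budget_lines, payments):
    """Total payments per policy, and list payments that cannot be linked."""
    # Map each budget line to the policy it funds.
    budget_to_policy = {b["budget_id"]: b["policy_id"] for b in budget_lines}
    totals = defaultdict(int)
    unlinked = []
    for payment in payments:
        policy_id = budget_to_policy.get(payment["budget_id"])
        if policy_id in policies:
            totals[policy_id] += payment["amount"]
        else:
            # The identifier chain is broken, so this spend cannot be scrutinised.
            unlinked.append(payment)
    return dict(totals), unlinked


totals, unlinked = spend_by_policy(policies, budget_lines, payments)
print(totals)
print(unlinked)
```

The payments that end up in `unlinked` are the interesting part: they show how much spend falls outside scrutiny when identifiers are not shared between parts of government.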

Challenges highlighted by user research

International rankings are lovely, but what do we know about the use of open government data and the challenges associated with this?

Government data is hard to find and use. Working with government data is hard – even for users who know government:

  • Metadata is inconsistent and incomplete.
  • Titles are sometimes obscure.
  • Some datasets are published more than once.
  • Data users can only understand how useful the data is once they’ve downloaded it and not all data users have the capability to download and view certain types of data.
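Some of the problems in the list above can be checked mechanically. The following is a rough sketch, not a description of how data.gov.uk works: the required field names (`title`, `description`, `publisher`, `licence`) and the sample catalogue records are assumptions made up for illustration. It flags records with missing metadata and titles that appear more than once.

```python
from collections import Counter

# Hypothetical set of required metadata fields; a real catalogue may differ.
REQUIRED_FIELDS = ("title", "description", "publisher", "licence")


def missing_fields(record):
    """Return the required fields that are absent or blank in a record."""
    return [f for f in REQUIRED_FIELDS if not str(record.get(f) or "").strip()]


def duplicate_titles(records):
    """Return titles that appear more than once, ignoring case and extra spaces."""
    counts = Counter(
        " ".join(str(r.get("title") or "").lower().split()) for r in records
    )
    return sorted(title for title, n in counts.items() if title and n > 1)


# Invented sample records for illustration only.
catalogue = [
    {"title": "MOT testing stations", "description": "Active sites",
     "publisher": "Example Agency", "licence": "OGL"},
    {"title": "mot  testing stations", "description": "",
     "publisher": "Example Agency", "licence": "OGL"},
    {"title": "Spend over £25,000", "description": "Monthly spend",
     "publisher": "", "licence": None},
]

for record in catalogue:
    print(record["title"], missing_fields(record))
print(duplicate_titles(catalogue))
```

Checks like these only catch the mechanical problems. Whether a title is obscure, or whether the data is useful, still needs a person who understands the users.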

Competing catalogues, standards, technologies and propositions.

We don’t have consistent, reliable data:

  • Basic data skills and literacy aren’t strong enough to consistently produce data that is findable, usable, comparable and reliable.
  • This means that many of the solutions designed to make data easier to find are only theoretically possible.
  • It means that many services that need consistent, reliable data are also only theoretical.

Where there is a relationship between the data publisher/producer and the data user, the quality of the data and metadata is better.

Publishing is not enough

Publishing open data is a crucial start. But it isn’t enough.

We need to optimise for use and value.

“It is not enough to have open data; quality, reliability and accessibility are also required.”

Theresa May, December 2017

“Availability is only one aspect of openness, and the government also has a role to play in increasing the value of data, through structuring and linking datasets.”

Treasury Discussion Paper – The Economic Value of Data, 2018

“If no-one can find and understand data it is not open.”

Laura Koesten

How to get more value from data

Some ideas by Elena Simperl and Johanna Walker (Analytical Report 8: The Future of Open Data Portals, 2017) on how we might get more value from open data portals. These actually apply more broadly. Some highlights to pick out:

  • Linking and interoperability, including consistent schemas, so that you can get network effects from combining datasets.
  • Colocation of documentation and tools to reduce barriers to entry.
  • Organisation for and promotion of use – thinking about how to get value out of the data rather than seeing the job as finished when the data has been published. This means reflecting back the use that has been made of data to teach and inspire others, and some level of fostering a community around the data.


Opening up the Geospatial frontier

There’s lots of excitement around the potential of geospatial data. In the UK there’s a newly-created Geospatial Commission looking into how to open up this data. One of its key early tasks is opening up some of the Ordnance Survey’s Master Map data. This looks likely to be on a tiered basis, so free up to a point but paid beyond that.

Highlights of what this Master Map offers include building height data and the water network layer.



Even here we have the question of linked identifiers. The Geo6 are looking at this. This is the kind of graft we need to get the most value from our data.

In summary

Transparency and economic value are the key drivers behind government publishing open data.

But publishing data is not enough – we need to work hard to understand user needs and maximise use and the value derived from data.

Bonus: Extra insights from Steve Messer

TfL

They opened up their data en masse and waited for people to use it before improving it. Here’s the original download page and a post by Chris Applegate on why the first release could be improved. It has since been improved, thus proving your point about ‘optimise for use and value’. Optimise after you’ve opened it up. The MVP for opening up data is creating a good standard and putting it out there.

Economic value

I’ve tried to find some £figures to help you back this point up, as below:

Local government

I once made a timeline of notable events in local gov open data. It’s in this blog post.