A summary of the sessions I attended at the Open Data Institute’s Summit on 12 November 2019
Tim Berners-Lee and Nigel Shadbolt, interviewed by Zoe Kleinman
Tim Berners-Lee described commercial advertising as “win-win”, because targeted advertising is more relevant. But “political advertising is very different… people are being manipulated into voting for things which are not in their best interests.”
Nigel Shadbolt: There’s a risk that people just move on to new shiny things. Creating a common data infrastructure is unfinished business.
Berners-Lee: We should be able to choose where our data is shared, rather than it just being impossible because systems can’t speak to each other. “You can share things with people that you want to share it with to get your life done.”
Shadbolt: Data sharing has to be consensual. Public data shouldn’t be privatised. We need transparency and accountability of algorithms used to make decisions on the basis of data. Platform providers are controlling and tuning the algorithms.
Berners-Lee: How might we train algorithms to feed us news that optimises for ‘aha’ connection moments, rather than feelings of revulsion?
Kriti Sharma – Can AI create a fairer world?
If you’re building tools with data, the biases of that data are perpetuated and potentially amplified, which can worsen existing inequalities, for example in access to credit or benefits, or in deciding who gets job interviews.
- Early on in a design process, think about how things could go wrong.
- Train machine learning or AI on more diverse datasets.
An MIT test of facial recognition found an error rate of 1% for white-skinned men. For darker-skinned women, the error rate was 35%.
- Build diverse teams. Only 12% of the workforce in AI and machine learning are women. A more diverse team is more likely to question and correct biases.
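One way to surface the kind of gap the MIT test found is to report error rates per group rather than a single overall figure. The sketch below is illustrative only: the group names, labels and records are made up, and `error_rate_by_group` is a hypothetical helper, not something shown in the talk.

```python
from collections import defaultdict


def error_rate_by_group(records):
    """Return {group: fraction of records where the prediction is wrong}."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, label, prediction in records:
        totals[group] += 1
        if prediction != label:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


# Hypothetical evaluation records: (group, true label, predicted label).
records = [
    ("group_a", "match", "match"),
    ("group_a", "match", "match"),
    ("group_a", "no_match", "no_match"),
    ("group_a", "match", "match"),
    ("group_b", "match", "no_match"),
    ("group_b", "match", "match"),
    ("group_b", "no_match", "match"),
    ("group_b", "match", "match"),
]

rates = error_rate_by_group(records)
overall = sum(1 for _, label, pred in records if label != pred) / len(records)

print("overall error rate:", overall)
for group, rate in sorted(rates.items()):
    print(group, "error rate:", rate)
```

With this made-up data the overall figure hides the fact that every error falls on one group, which is why a single headline accuracy number can be misleading.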
Data Pitch accelerator
An EU-funded accelerator, connecting public and private sectors to create new data-driven products and services. A 3-year project.
28 data challenges, 13 countries.
4.6 million euros invested
14.8 million euros “value unlocked” – additional sales, investment and efficiencies. These are actual numbers, not optimistic forecasts.
datapitch.eu/datasharingtoolkit
How do we cultivate open data ecosystems?
Richard Dobson, Energy Systems Catapult
Leigh Dodds, Open Data Institute
Rachel Rank, Chief Exec, 360 Giving
Huw Davies, Ecosystem Development Director, Open Banking
Energy Systems Catapult:
If you want to move to renewable energy, you need to know what’s produced, where, and when.
So BEIS, through a Catapult scheme, set up a challenge on this. Seamless data sharing was crucial.
360 Giving:
Help grant makers open up their grant data in an open format so people can see who is funding what, why, and how much.
Open Banking:
Catalysed by regulation from the Competition and Markets Authority. The UK required the largest banks to fund an implementation entity, to make sure it was effective and standards-driven, and to set up a thriving ecosystem. So they worked on standards for consent and security. Every 2 months the ecosystem doubles in size.
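To get a feel for what doubling every 2 months means, here is a rough back-of-the-envelope sketch. It assumes the doubling rate stays constant, which the panel did not claim, and `growth_factor` is a hypothetical helper for illustration.

```python
def growth_factor(months, doubling_period_months=2):
    """Multiplier after `months`, assuming a constant doubling period."""
    return 2 ** (months / doubling_period_months)


# Six doublings in a year, if the rate were sustained.
print(growth_factor(12))
```

Under that assumption a year of growth would be a 64-fold increase, so the rate quoted is unlikely to hold for long.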
When encouraging people to contribute to an ecosystem, show value, don’t tell people about it.
Don’t talk to people about organisational identifiers. Show them that their grants can’t be seen alongside other funders’ grants because they haven’t been collecting these identifiers. People had so little insight into what others were funding that this was very compelling. Make people feel left out if they aren’t sharing their data.
Thoughts on making a healthy ecosystem:
You need standards for an ecosystem to scale
Accept that even with common standards and APIs, a few different technical service providers will emerge, followed by people who add value on top of them. (This was the experience in Open Banking)
“You can’t over-emphasise the importance of good facilitation at the heart of the ecosystem”
(I took this as: you need investment from somewhere to make this collaboration happen)
Open Banking did lots of work to collaboratively set up standards that everyone bought into. And they did lots of work facilitating and matchmaking to get people working together, to understand each other and provide more value.
Need to move away from just thinking about publishers and consumers. Think about the ecosystem more widely.
“When great stuff happens, shine a light on it and celebrate it”
Don’t pre-empt your users. They’ll surprise you.
Work out a way to police/protect data quality without having a single point of failure
Don’t aim for perfection, aim for progress
Start with what you’ve got. Perfect data doesn’t exist.
Caroline Criado Perez – Invisible Women: exposing data bias in a world designed for men
[This was the best session of the day by far. Excellent insight and communication.]
Most data, and the decisions based on it, have been predicated on the male experience.
Le Corbusier defined the generic human as a 6ft British police detective, as the archetype to design buildings for. He rejected the female body as too unharmonious.
Voice recognition software is 70% more accurate for men. 70% of the sample databases are male.
Car crash test dummies for decades were only male. The female ones used now are just scaled-down male ones. 2015 EU regulations only said that female crash dummies should be used in 1 in 5 tests, and only in the passenger seat. Women are 47% more likely to be injured in a car crash and 17% more likely to die.
Medical diagrams generally centre the male body, and then have the female body as little extracts on the side. The female body is seen as a deviation from the (male) standard.
Yes, the menstrual cycle is a complicating factor. So you need to study it! Heart medication and antidepressants are affected by it.
How many treatments might we have ruled out because they didn’t work on the default male body, without ever researching whether they might work on women?
Young women are almost twice as likely as men to die of heart problems in hospital.
Machine learning amplifies our biases.
A 2017 study on image labelling algorithms found that pictures involving cooking were 33% more likely to be categorised as women.
When thinking about different types of use of transport, the way that you classify different types of travel is important. If you don’t bundle ‘care’ together as a category, you can undersell its importance relative to employment-related travel. In general, we undervalue women’s unpaid care work. You should collect sex-disaggregated data. Be careful not to do this by proxy.
Women tend to assess their intelligence accurately. Men of average intelligence think they’re more intelligent than two-thirds of people.
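The point about classifying travel can be illustrated with a small sketch. The trip purposes and counts below are entirely made up, and the choice of which purposes count as ‘care’ is an assumption for illustration, not a classification given in the talk.

```python
from collections import Counter

# Hypothetical survey responses: one trip purpose per trip.
trips = (
    ["employment"] * 4
    + ["school run"] * 2
    + ["shopping for household"] * 2
    + ["escorting relative"]
    + ["leisure"]
)

# Assumed set of purposes that make up care-related travel.
CARE_PURPOSES = {"school run", "shopping for household", "escorting relative"}


def count_purposes(trips, bundle_care):
    """Count trips per purpose, optionally merging care purposes into 'care'."""
    counts = Counter()
    for purpose in trips:
        if bundle_care and purpose in CARE_PURPOSES:
            counts["care"] += 1
        else:
            counts[purpose] += 1
    return counts


separate = count_purposes(trips, bundle_care=False)
bundled = count_purposes(trips, bundle_care=True)

print("separate:", separate.most_common())
print("bundled:", bundled.most_common())
```

Counted separately, employment looks like the dominant purpose in this made-up sample. Once the care trips are bundled, care is the largest category.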
Equality doesn’t mean treating women like men. Men are not the standard that women fail to live up to. Don’t fall into this when you try to fix inequality.
Diversity is the best fix for this sort of thing.
Intersectionality is even more of a problem, but wasn’t the focus of this session.
John Sheridan, Digital Director at the National Archives
Context in which data was created is important.
Good quality URLs essential to data infrastructure
Good quality processes for change: understanding user needs better and improving the data.
Manit Chander on information sharing in the maritime industry
In the maritime industry, information sharing has been fragmented and data classification has not been standardised.
HiLo gets internal near-miss data, does predictive risk modelling, and produces risk analysis and good practice.
They get messy data shared with them and then tidy it up at their end.
They produce simple, easy-to-apply, non-judgmental insights.
They focus on building trust as the most important thing to sustain the community.
The people providing the data are the key group here.
People will share their information if they can see the value to them.
- Reduced risk of lifeboat incidents by 72%
- Reduced engine room fires by 62%
- Reduced risk of bunker spills by 25%