‘Feelings about sharing data can be context and time dependent – you can’t just do one survey or focus group’

Real World Data Science sits down with Helen Miller-Bakewell of the UK Office for Statistics Regulation to talk data sharing and linkage in government – progress made, hurdles to overcome, the importance of public engagement, and why ‘social licence’ is key.

Data sharing
Data linkage
Public engagement
Author

Brian Tarran

Published

October 16, 2023

This summer, the UK Office for Statistics Regulation (OSR) published its report on Data Sharing and Linkage for the Public Good. In the report, the OSR notes that the value of sharing and linking data has become widely recognised within government, though there remain areas of challenge and uncertainties about “the public’s attitude to, and confidence in, data sharing.”

The report also warns that “unless significant changes are implemented… progress that has been made could be lost and the potential for data sharing and linkage to deliver public good will not be achieved.”

To find out more about the report and its recommendations for change, we sat down with Helen Miller-Bakewell, OSR’s head of development and impact. Listen to the full interview below or on YouTube.

Transcript

This transcript has been produced using speech-to-text transcription software. It has been only lightly edited to correct mistranscriptions and remove some repetitions.

Brian Tarran
Hello, I’m Brian Tarran, editor of realworlddatascience.net. And welcome to another Real World Data Science interview. Today I’m speaking with Helen Miller-Bakewell of the Office for Statistics Regulation. And we’re talking about the Office’s July 2023 report, Data Sharing and Linkage for the Public Good. In this report, the OSR reviews progress that has been made towards sharing and linking data for the public good. It says that the value of sharing and linking data has become widely recognised within government, though there remain areas of challenge and uncertainties about, quote, the public’s attitude to and confidence in data sharing. The report warns that quote, unless significant changes are implemented, the OSR is concerned that progress that has been made could be lost and the potential for data sharing and linkage to deliver public good will not be achieved. In my interview with Helen, we talk about some of the key highlights and findings of the report some of its main recommendations, some examples of data sharing and linkage that are going on now within government. So let’s hand over now to Helen, who will begin by introducing herself, her role within OSR and some of the background to the report.

Helen Miller-Bakewell
So I’m Helen, Helen Miller-Bakewell, I have an official title of head of development and impact within OSR. OSR as a whole, our kind of formal job is to regulate statistics produced by government, we are the regulatory arm of the UK Statistics Authority. And our aim is to work towards statistics that serve the public good, and a government that produces and uses statistics and analysis in a way that means the public can feel confident in them, and the analysis that’s done and how they’re used. Within OSR, I used to be a regulator of statistics, I was out there looking at official statistics on crime and security, holding them up against our code of practice for statistics, which sets the standards that we would like to see government statistics meet. In my role now I actually oversee a few of our cross-organisation functions that are all designed to try and improve the way that OSR as a whole works, and to do work that can support the statistical system as a whole. And the most relevant one for today is the data and methods function. And a key piece of work that that function has been working on for the last year is looking at how data sharing and linkage is done across government, and that supports one of OSR’s wider interests and ambitions. One of our ambitions for the current five years is to make greater data available in a secure way for research and evaluation. That’s what this report that we’re going to be talking about has contributed to.

Brian Tarran
Before we get stuck into the report, maybe it’s worth setting out what does OSR mean when it’s talking about data sharing and linkage? What’s the driver for this being a kind of a priority, something that OSR wants to encourage and to see happen? What are the public benefits that you hope to accrue from it?

Helen Miller-Bakewell
The general premise that’s underpinned the report, and the reason that we have been a champion and advocate for data sharing and linkage for a while, is that we think data can be more powerful when it’s linked, and when it’s made accessible, in a secure way, to a wider audience for analysis. And it can offer more insights and better fulfil its potential to serve the public good. And again, I keep saying serve the public good. Within OSR, we very much think that statistics and analysis, it shouldn’t just be for government, for decision makers in government, it should be available to the public, to all stakeholders, really, who want to use it to make decisions and hold government to account.

Brian Tarran
So for people reading this or listening to this, who haven’t yet had a chance to dig into the report – and it’s a very interesting report – what are the key messages that you want to share with people? What is the, I guess, what’s the assessment of the current state of data sharing and linkage?

Helen Miller-Bakewell
I think some of the key messages do echo what other reports have said in this space, which is that there really has been some excellent progress in terms of data sharing and data linkage. We last reported on data sharing and linkage back in 2018 and 2019, when the DEA was kind of coming into force and things were starting to move slowly.

Brian Tarran
Sorry to interrupt you. But what is the DEA, did you say?

Helen Miller-Bakewell
It’s the Digital Economy Act. So there were some amendments to the DEA, it created new, new powers to enable greater sharing of data for research and statistics, and it put the Office for National Statistics at the centre of those powers. It kind of gave ONS greater powers to ask other departments across government to share data with them for research and statistical purposes. And the UK Statistics Authority as well has powers to accredit researchers and accredit processing environments so that people can have access to more data as well across government.

Brian Tarran
So your last report was in 2018, 2019 time. There’s been good progress, you say, since then. I wanted to ask you about some good examples of data linkage that you’ve seen in that time. Maybe the obvious one is Covid, I’m guessing. The pandemic, that offered a lot of opportunities for linking different datasets together. But are there– is that it? Or are there others that you want to highlight?

Helen Miller-Bakewell
I think the pandemic definitely provided a really strong impetus to share data, to link data. And I think, you know, we saw things done which hadn’t been possible previously, it broke down barriers, and was a kind of an excellent enabler, which is fantastic, because it was a crisis situation. And actually, yes, one of the examples that I would highlight, and I know others in OSR would highlight, that was during Covid, was Office for National Statistics estimates of Covid-19 mortality rates among different ethnic groups, and that drew on census data, it drew on death registration data, it drew on hospital episode statistics, it drew on data from lots of different places to create some really, really important analysis. It’s exciting to be able to say now that there are good examples of data sharing and linkage across different topic areas and different organisations. And I think if you spoke to regulators in OSR working in different domains, they will each have like their favourite examples of, of data sharing and linkage. Having worked in crime and security regulation myself, the one that comes straight to mind is Data First, which is a data linkage project led by the Ministry of Justice, working with ADR UK – another one I don’t have to say in full often, Administrative Data Research UK – and that’s done a fantastic job of kind of opening up access again, in a secure way, a real focus on security, to a wealth of data from across MOJ systems, sometimes horrible, clunky legacy systems – I hope no one will be offended if I say that – but making it valuable because people can, you know, to a greater extent now link it and link it to data from other departments as well.

I did ask a couple of colleagues if they wanted to throw me any other examples of data sharing and linkage that they particularly think highly of, and another one that came up was the Registration and Population Interaction Database, RAPID database, which has been created by DWP [Department for Work and Pensions]. And that brings together data, information from DWP, HMRC [His Majesty’s Revenue & Customs] and local authorities to try and give a view of citizens’ interactions across the breadth of DWP services. What the report does say is, although we have these good examples, there are still barriers and challenges to doing data sharing and linkage. And that, that can, can be true across the whole process, right from getting support for the idea, through the practical steps of finding out what data is available, where, who owns it, how you can get to it, and then actually doing the linkage bit technically at the end. So we’ve been in a situation where things definitely have improved. But it’s, in many cases, it is not easy or efficient yet to share or link data. Our report talks about different barriers we heard about during the course of interviews with stakeholders, that we’ve encountered through our regulatory work as well. And we make 16 – to be precise – recommendations for how we could, or how government could, could start to chip away at those to, to improve the situation going forward.

Brian Tarran
Yeah, and I would like to talk about some of those recommendations, I guess the more technical side of the recommendations, a bit, a bit later. One thing I wanted to ask you now about was a word that sort of jumped out at me in, in the report, this idea of needing there to be a social licence for data sharing. And in the report, social licence has, has been defined as the, like, the level of acceptance or approval in local communities for data linkage projects. Now, I guess for something like the pandemic, right, you can argue that there is a kind of an implicit social licence, it was an emergency situation, there was, you know, people at risk. So that sort of use case was kind of justified, but I was curious about how social licences for these things can best be established and, and maintained, because that’s about kind of interaction with the public, right? You know, you could put a load of government statisticians and data scientists into a room and say, what could you do with this, all this data and how you could, could you link it all together? And they’d get very excited about it. But actually, then, how do you then take that to the public and convince them that it’s a good idea, or are there other ways of making sure this social licence has been obtained?

Helen Miller-Bakewell
So, social licence and public engagement were one of the topics that was most consistently mentioned across the interview– interviews we held. And yeah, there just seemed to be a consensus which we would support that those working on data sharing and linkage should be prioritising public engagement around their work, both to kind of gauge the amount of social licence that there might be, or any sticking points, and potentially start to build social licence as well. And I think, you know, it’s really great to see people thinking like that, and I would actually say as well, it seems to be a recurring theme. And it’s really nice to see that having prominence. Another finding from the interviews was, you know, the flip side of this, yes, people think it’s important, but often people can not quite know how to approach public engagement, building social licence. And there are actually a few examples we highlight in the report where we, where we think public engagement has been done, done well, hopefully, to kind of inspire people. I would say, you know, I have a few kind of overarching thoughts on what can be important. And I think it, it comes so much to trust and trustworthiness. And I think the way, some ways that you can support trust with the public are transparency, saying what you’re going to do and why, and how, and actually the outcomes as well, when possible, let’s say, closing the circle. Thinking about this interview, I remembered OSR ran a public dialogue last year with ADR UK. And we were trying to talk to members of the public about what they understand by the public good of statistics and data. And one of the messages that came up there was people want to feel the impact of their data and feel that outcome. And they don’t always feel they get that knowledge back. Like, what, what was, so what was the outcome of you having my data and doing these things that you kind of said you would do?
So yeah, transparency, and then kind of linked to that, I guess, like continuous engagement, and considered engagement. And the public is not a homogenous group, there will be different groups that are important to engage for different data sharing, different data linkage projects, and you kind of need to consider who are the people you need to really engage with for your specific initiatives. And then it’s, I’m afraid you can’t just do one survey or one focus group, there needs to be some kind of mechanism for getting more continuous engagement to keep an eye on actually, how are people feeling now, because we know that social licence and people’s feelings about sharing data, it can be context dependent, it can be time dependent. And actually the first couple of recommendations in the report, so right there at one and two, number one is about the value of trackers like CDEI’s [Centre for Data Ethics and Innovation] Public Attitudes Survey, which, you know, are run on a semi-regular basis to try and track how the public are feeling at a high level about questions around data. And then our second recommendation is about having an organisation that can do more to produce guidance and support people doing research to do public engagement well.

Brian Tarran
Does the mission statement of statistics for the public good, does that kind of help guide the approach, right? So any data linkage, data sharing project, you need to understand, you need to think about okay, what’s the public good that we’re trying to achieve here? And then that becomes your, almost the point, the focal point of the discussion with the public about why we want to do this and why we think there’ll be a benefit.

Helen Miller-Bakewell
Absolutely. This focus on the public good is what we always come back to in, in OSR. And again, it is something that can sometimes slightly differentiate us from other organisations in this space where there may be a very internal government focus. Yeah, absolutely. What’s the outcome that’s seeking to be achieved? And will we achieve it? Did we achieve it?

Brian Tarran
One of the parts of the report that I was quite interested in was the four future scenarios. The task you set yourself was to look five years from, from now at where we might be, and I guess give a range of, like, scenarios in which data linkage is great, and everyone’s doing it and it’s fully supported, down to it’s, you know, it’s happening on a piecemeal basis or not at all. So what I wanted to understand was, how those scena– whether any of those scenarios are more likely than others to emerge, and whether the kind of likelihood of those scenarios emerging is dependent on certain of your recommendations.

Helen Miller-Bakewell
The scenarios just allowed us to explore in a theoretical way, like, yeah, where, where could we end up? We hope the 16 recommendations we’ve made, taken together, if they could all be fully delivered, could lead us towards that ultimate scenario of data sharing and linkage for the public good. And you can put quite neatly different recommendations against different bits of that scenario to help get, get us there. I think if, if I reflect on the scenarios like right now, the one that feels like most familiar to me is data sharing and linkage in silos. We’ve kind of spoken a little bit earlier already about how there are some areas of government and some topic areas and some organisations that are doing some really brilliant work. And I could see quite a realistic scenario where that kind of becomes more entrenched over, over the next five years. But, you know, maybe that’s actually just realistic, right? Maybe it’s unrealistic to expect that every organisation [in] government with different, different sizes, different funding, different priorities, could, could end up in exactly the same place on data sharing and linkage, all at the kind of, the top level. But I do, I do think if we can chip away at the recommendations we’ve made, then every organisation could improve on their starting point and move towards that, that scenario.

Brian Tarran
Okay, so let’s talk about some of the recommendations of how we get there. And there were a couple of areas I particularly wanted to focus on. One was talking about career frameworks, and having those kind of reflect, and I guess, reward the skills of those who are working on data and data linkage projects. So I was kind of wondering, you know, are there, are these skills that are kind of currently either underserved or under-recognised within the existing career frameworks? And if so, how do we change that? Or is that beyond the scope of your report to recommend that?

Helen Miller-Bakewell
I think the situation across government is perhaps relatively complicated here. And I think what we, what we’d really like to see, essentially, is a situation where people in roles, in data roles, basically feel valued, and they can see a clear career pathway for them within government. And I think what we have at the moment is a variety of career frameworks that support people working in roles with data and a decentralised pay model. Which means pay scales across, across roles can, can – and frameworks – can vary. You can see intuitively, how that has the potential to kind of create confusion for individuals – like well, which career framework should I look at – and, and then on the pay side, the potential to create kind of skills sink, where people want to go and work in particular areas simply because they can be paid more, and in some cases it’s a considerable amount more, actually. And the reason I say it’s complicated is because, I mean, to some extent, this, what we have, is appropriate. People who work on data sharing and linkage projects can come from lots of different analytical backgrounds – like, you could have a statistician or a social researcher or a data scientist or data architect, they could all be working on a, on a data sharing project, shall we say. And actually, it kind of makes sense that people in those kind of different roles might have different career frameworks and paths. And similarly, as I say, pay is decentralised, I don’t think OSR has much power to change too much there. But, you can see there are arguments for departments being able to have their own say on what skills they need to kind of pay more for in different circumstances at different times to, to bolster things.
However, what we have said in the report, what we call for, and what we will try and speak to people who own frameworks to try and facilitate, is just a bit more awareness between people who own different frameworks about what else is out there, and a bit more consistency, therefore, about how different data skills perhaps are reflected and where they’re reflected in different frameworks. So we’re not asking for one single framework, I don’t think that would be practical, or particularly serve people working in data very well. But yeah, like more awareness, better joined up working, more consistent use of frameworks in, in job adverts, for example, just to help people see more clarity about their careers and where they can take them.

Brian Tarran
I was at an event recently where, you know, people working in, in data science or data, data kind of roles in government, were talking about how, like, the sort of career pathways and that there’s a feeling that technical skills, or growing technical skills aren’t always as well rewarded as greater sort of managerial responsibilities, so that if you become a, someone who’s excellent at being able to solve the, the knotty problems of data linkage and sharing, right, you might not be as well compensated for that as you might be if you were, say, running a team of 20, 30 people and sort of not actually applying those technical skills on a day-to-day basis. So it’s, I guess that was where my question was coming from was, is it that sort of thing that we kind of need to, need to address in some way? But, again, that might be beyond the scope of, you know, what you were looking at on this particular matter?

Helen Miller-Bakewell
Well, I think it’s something for us to think more about, if I’m honest. We have committed ourselves to a follow on report for this. And we want to kind of take a look at the recommendations next year and see, you know, how far everything’s got. So, you know, any, any additional bits that we haven’t covered in this first report are good for us to think about. I think the situation you’ve just outlined sounds very familiar, to be honest, and not just across data science, across a whole load of roles where, yeah, actually your technical skills, they take you really well to middle management, but then there often comes a point where, if you want to go higher and have greater remuneration, you might have to move more to those softer skills, those managerial skills. That’s something we can think a bit more about, actually. And when we do have conversations with those who we kind of pointed to for the frameworks recommendation, maybe have a look into it a bit more.

Brian Tarran
This is, I guess, somewhat related to the previous point. But the other aspect of the recommendations that jumped out to me was the discussion around quality metadata and documentation, standardisation and things like that as being priorities for effective data linkage, these are the things that need to be in place. But again, when you chat to researchers, not just in government but all over the place, these less glamorous aspects of data management are kind of underappreciated. And often teams are, the way that, the way teams work, the way projects work is you finish one, you move on to the next, you don’t really want to think too much about the one that you just worked on, because you’ve got a new priority or a new round of funding or whatever it is. So how do we convince senior leaders that, that there are sufficient resources for this sort of work that needs to be done? It might not sort of deliver necessarily immediate value and benefits, but it’s about kind of accruing the kind of infrastructure, I guess, to, to make sure that data sharing and linkage, you can achieve your, you know, your most optimistic vision of data sharing and linkage being widespread in government.

Helen Miller-Bakewell
Yeah, I think it’s a really important question. And yeah, having worked as a statistician as well, before I came into this world of regulation, yep, I recognise what you just described. And, you know, I think it’s always going to be a challenge in these kind of fast-paced multiple priority environments, where often people are resource stretched in terms of people, time, money, all those things together. Though, I think there’s a couple of tacks. One, I think, is maybe improving the data literacy of senior leaders, and trying to give them a greater understanding of, of data, how it’s used, and around these issues of kind of standardisation, and why they’re so important. And, actually, a couple of the recommendations earlier on in the report, in the people section, are around improving the– or strengthening the statistical literacy and the data literacy of senior leaders and recommending they go on, for example, the Data Science Campus in ONS run a master class for senior leaders across the service. So I think there was a kind of a bit of a, an education thing. And yeah, I guess, you know, part of that is setting out, like, what are the benefits? And what are the risks of not doing this? I think that, you know, here comes a role for people like OSR in setting expectations, especially in the world of official statistics, it is completely within our power to set the expectation for what government statistics should be doing with regards to metadata, or kind of following, following best practice and things like that. Our code of practice for statistics does do that to some extent already. And us as well, there’s a role for us in demonstrating the benefits and saying why we’re asking people to do these things and, and what’s good when it, when it goes well. I was thinking about this, and it drew to mind the EAST Framework. I don’t know if you’ve come across that.
It’s a framework that was introduced to me by the Behavioural Insights Team for bringing about change and what you– what interventions need to be if they’re going to be successful, and it’s Easy, Attractive, Social and Timely. And I think when we and when other organisations who are kind of working on metadata standardisation, like the Central Digital and Data Office, like Department for Science, Innovation and Technology, there’s, you know, there’s a few players here. We need to be keeping these, the EAST in mind as we design to try and help people kind of come on board with things more easily. And yeah, recommendation 16 in the report, the final one, is about standardisation and about, there are lots of players in this space and can we, can we bring them together a bit to be even more effective? So that’s, that’s definitely something we’ll be looking at in the coming months.

Brian Tarran
And you said that there’ll be a follow up report soon. When is that? When are you targeting?

Helen Miller-Bakewell
We’re planning next summer. So last time when we did our first report on Joined-up Data in 2018. And then we did a follow on one year on in 2019. We found that was quite an effective way to, for us and others, to kind of build and maintain momentum. And yeah, again, you know, feels a bit unfair almost to just put out a load of recommendations, and then leave, leave the world to it. We’d like to see if we can help facilitate and then tell people how we’ve been getting on.

Brian Tarran
And I’m guessing it’s not, you’re not looking for all recommend– 16 recommendations to be implemented by next year. But it’s, are we making steps towards some of them? Are we heading in the right direction?

Helen Miller-Bakewell
You know, let’s practice what we preach. A bit of transparency. Yeah, are we heading in the right direction? And if we’re not, you know, is there a plan? I’d love to be optimistic, that optimistic. I wouldn’t expect that we can just put ticks against all 16 recommendations next year. But hopefully, yeah, we can, we could do some progress bars.

Brian Tarran
Excellent. Well, we should probably schedule a follow up interview for a year’s time then. But Helen, thank you very much for your time today.

Helen Miller-Bakewell
Oh, you’re very welcome. And if anyone, any of your listeners is interested in, in the report, please do get in touch with OSR. We’d be very happy to talk about the report that we’ve just written or about, you know, what we’re doing following on from that. Yeah, thank you.

Brian Tarran
So we’ll definitely put a link to the report in the show notes. So once again, Helen, thank you very much for joining us.

Helen Miller-Bakewell
Oh, you’re welcome. Thank you.

Copyright and licence
© 2023 Royal Statistical Society

This interview is licensed under a Creative Commons Attribution 4.0 (CC BY 4.0) International licence.

How to cite
Tarran, Brian. 2023. “‘Feelings about sharing data can be context and time dependent – you can’t just do one survey or focus group.’” Real World Data Science, October 16, 2023. URL