A month ago now, Real World Data Science published an interview with UK national statistician Professor Sir Ian Diamond. In the process of preparing the text of that interview for publication, I found myself reflecting on a conversation I’d been part of earlier in the year with Robert Santos, director of the US Census Bureau.
I met Santos in Toronto, Canada, in August – a few hours before his President’s Invited Address at the 2023 Joint Statistical Meetings. The meeting was arranged as a joint interview with Anna Britten, editor of our sister publication Significance magazine, and Santos was joined by Sallie Ann Keller, the Census Bureau’s chief scientist and associate director for research and methodology, and Michael Hawes, senior advisor for data access and privacy.
The interview with Santos, Keller, and Hawes was published in the October issue of Significance, so you may have already read it. But, following on from our Sir Ian Diamond interview, I thought it worth highlighting some of what Santos et al. had to say, particularly where key themes, challenges, and opportunities seem to resonate across both the US Census Bureau and the UK Office for National Statistics.
I’ve also gone back to the original interview recording to pick out some previously unpublished comments.
On the scale of the challenge Santos inherited on becoming director of the US Census Bureau in January 2022
Robert Santos: Certainly it was formidable – although I’m comforted in knowing, after a year and a half, that the career staff [of the Census Bureau] were well positioned to accept this challenge anyway, and were working on it. But the challenge was real. We had the pandemic. We had to, basically, not redesign but scramble and adapt to a really threatening situation where the entirety of the 2020 census was conducted before there was a vaccine, and when people didn’t know the nature of the beast. A huge chunk of this operation was conducted when society was shut down. And not only did the Census Bureau need to rethink, nimbly and quickly, how to do its operation, but so did all of the different community partners – which was really enlightening because we realised that, at the end of the day, we could not have completed this job alone. And now our position is that we cannot complete our mission without the external community. They’re the extra folks we need in order to understand better what the needs are, and therefore improve our methods and data and the relevance of what we’re doing.
So, we see our role now as having a continuous engagement with the entire country at all levels – be it elected officials, universities and professors and the research community, or data users like policy users and policy researchers, or local community organisations that are doing neighbourhood stuff. And so we’re actively working between censuses to engage them and show them the value of the data that we’ve collected – not just decennial [census data], but also our flagship American Community Survey and our Current Population Survey and all the 130 other business, economic as well as household types of studies that we’re doing.
On the need to transform Census Bureau operations
Robert Santos: We absolutely have to transform and modernise our operation, from what was historically this transactional survey type of data collection – where we go to somebody that’s randomly sampled and we say, “Please give me your information” – and realise the value of taking that information, blending it with existing data, administrative data, even third-party private sector data, into a huge data pool and linking it together, and that will create new data products that will serve the public in ways that we never imagined before. And we already have some great examples of that. So, that transformation process is an incredible priority that we have to do, regardless of what our funding situation is. If we don’t do that, we’re not going to be able to serve the public in the way that we need to.
On laying the groundwork for the 2030 census and an increased use of administrative data
Robert Santos: There are a couple of things going on. One is that we’re obliged, because of our values of scientific integrity, objectivity, transparency, and independence, to let folks know what we’re doing in terms of our use of administrative records, and we’ve done that and we will continue doing that. The big lift was really in preparing for the last decennial [census], where we took the use of administrative records to new heights in terms of their utility – not only to help us for some enumeration of households, but, more importantly, to help us predict which households were occupied or not, or to predict which households would benefit from the use of administrative record enumeration versus which ones wouldn’t, or how many times should we knock on the door before we do something else. And now, with that knowledge, we’re looking back at 2020 and saying, what worked? What didn’t? How can we exploit it? And we’re kind of moving the dial to say, “What can we take more advantage of for 2030?”, with full recognition that there were some subpopulations, there’s some segments of society, that we really need to focus and hone in on to make sure we get a good count.
On addressing public concerns about data collection and data privacy
Michael Hawes: Even though the decennial census is mandatory under law, we rely on voluntary participation. We’re relying on people being willing to respond to their census. In the lead up to each census, we do an extensive survey of what are the attitudes or motivators that will encourage people to respond or to not respond. And one of the recurring themes in that is concerns about privacy, concerns about how their data can be used. So, in order to help encourage people who have those concerns – and this is a sizable percentage of the population – we do need to have strong messaging about how their data are protected, how they can only be used for statistical purposes, and so on. But that has to be in very easy-to-consume sound bites, because a lot of people don’t have a background in statistical disclosure control or even in the legal conceptions about what privacy is. So, that is a real challenge for us. How do we convey the fact that we are taking this very seriously, and that their data are protected, in a way that people can kind of internalise and respond to?
On making sure statistics serve the public good, and the role of the Census Bureau in supporting data literacy
Sallie Ann Keller: In the US over the last decade, there’s been a really large movement around data for the public good, data science for the public good, and it’s really focused at trying to engage researchers and scholars – and we’re talking about high school students, community college students, undergraduates, graduate students – trying to engage them with civic engagement around data and data insights. That’s happening all over the country – really trying to democratise data and bring it in service of the public good. And I think that’s very exciting.
Michael Hawes: We have a whole programme called Statistics in Schools which is about taking census data and making it valuable to teachers in the classroom, and allowing students at various levels – from elementary school through high school – to be able to engage with the data and use it to inform their own learning, and to learn about their own communities. That is especially profound in the years around the actual census, because that also serves as a catalyst for getting households to respond. If the kids are using the census data within the classroom, then they go home and say, “Hey, have you filled out your census form?”
Robert Santos: It’s really important to start young, but then there’s also folks who want to use the data who are adults. So, we have something called the Census Academy, where you can go on to YouTube and get tutorials that show you visually somebody trying to use census data. And the second thing we do is, we really have a strong commitment for creating easier platforms for folks to access and utilise various types of data produced by the Census Bureau. We’re creating these data visualisation tools that bring together the demographic data that we collect, the economic data that we collect, and visualise it down to the census tract level so that local communities can pull that up. And then finally, in terms of the public good, there’s also work that we’re doing with the Federal Emergency Management Agency and the National Oceanic and Atmospheric Administration on our community resilience estimates to create the same type of data visualisations that can show where the potential worrisome geographic spots are.
On the opportunities for bringing together Census Bureau data and large language models
Sallie Ann Keller: We’re not going to be in the business of building generative AI models. But what we want is the statistics that we put out, the data that we put out, to be picked up by these large language models – to be kind of an input into generative AI. So, we are focused on that in terms of really looking at the structure of how we’re disseminating statistics, and how we’re disseminating things like data tables. How harvestable are they for AI? What are the guardrails we should put around that? We’re looking at and considering issues on data integrity, because when questions are posed, we would like our official statistics to be answering those questions, not our statistics translated through three other parties. Data integrity is really a huge issue, because we don’t want false data and infiltration happening that gets branded as our statistics. I don’t know where we’ll take it all, but I think we’d also like to be incredibly creative here. So, let’s suppose you ask a question and some statistic comes back. Well, why not have that be an experience, so that not just a statistic comes back but maybe a question or two comes back, to try to assess the context that you’re really asking about, so that we can not only have our data coming to you, but we can have the right data coming to you?
Michael Hawes: Even with some of our more traditional statistical data products, informing users of the limitations and the uncertainty baked into a lot of those estimates has historically been a challenge – even for some more sophisticated users. The number of people who ignore margins of error on data tables, even in our data products, is not insubstantial. And so, when we get into an AI-driven data dissemination kind of framework, how can we use the flexibility of those platforms to not just provide the answers to the questions people are asking, but also to educate and inform about what the limitations of those answers are?
Copyright and licence
© 2023 Royal Statistical Society
This article is licensed under a Creative Commons Attribution 4.0 (CC BY 4.0) International licence. Photo of Robert Santos is excluded from this licence; it is a US Government work.
How to cite
Tarran, Brian. 2023. “‘We absolutely have to transform and modernise our operation’ – US Census Bureau director Robert Santos.” Real World Data Science, January 15, 2024. URL