The Food for Thought Challenge: Using AI to support evidence-based food and nutrition policy

Brian Tarran and Julia Lane introduce a collection of articles telling the story of the Food for Thought Challenge, which sought to use machine learning and natural language processing to more efficiently link data on food purchases to nutritional information.

Machine learning
Natural language processing
Public policy
Health and wellbeing

Brian Tarran and Julia Lane


August 21, 2023

There’s a saying: “You are what you eat.” Its meaning is somewhat open to interpretation, as with many such sayings, but it is typically used to make the point that if you want to be well, you need to eat well. Nutrition scientists and dieticians spend their careers trying to figure out what “eating well” looks like – the foods the human body needs, in what quantities, and how best to consume them. Their research informs advice and guidance issued by health professionals and governments. Ultimately, though, the choice of what to eat falls to us – individuals and families – and our choices are often determined by our tastes, the availability of foodstuffs in our local stores, their price and affordability.

So, what exactly do we eat? Answers come from a variety of sources. In the United States, there are dietary recall studies such as the National Health and Nutrition Examination Survey, which asks a sample of respondents to report their food and beverage consumption over a set period of time. There are also organisations like IRI that collect point-of-sale data from retail stores on the actual food and drink being sold to consumers. By and large, this information comes from barcodes on product packaging being scanned at checkouts, so it is often referred to as “scanner data”.

This data – from dietary recall studies and retail scanners – is valuable: once we know what people are eating, we can check the nutritional content of those foods and build up a picture of what the diet of a typical individual or family looks like and how it compares to the diet recommended by doctors and policymakers. And, if we know what other foodstuffs are available, how much they cost, and the nutritional value of those items, we can work out how much families need to spend, and on what, in order to eat well and, hopefully, be well.

Figuring all this out is where something called the Purchase to Plate Crosswalk (PPC) comes in. It’s a key tool for understanding the “healthfulness of retail food purchases” and it does this by linking IRI scanner data on what people buy with data on the nutritional content of those foods, as recorded in the US Department of Agriculture’s Food and Nutrient Database for Dietary Studies (FNDDS). But there’s a catch: scanner data is collected about hundreds of thousands of food products, whereas the FNDDS has nutritional profile information for only a few thousand items. Linking these two datasets therefore gives rise to a one-to-many matching problem – a problem that takes several hundred person-hours to resolve.

Could machine learning help? That question inspired a competition, the Food for Thought Challenge, organised by the Coleridge Initiative, a nonprofit organisation working with governments to ensure that data are more effectively used for public decision-making. Researchers and data scientists were invited to use machine learning and natural language processing to more efficiently link data on supermarket products to nutrient databases.
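To make the linkage task concrete, here is a minimal sketch of matching free-text product descriptions to a much smaller list of reference food items. It is not the method used for the PPC or by any of the competing teams: it is a simple word-overlap (Jaccard) baseline, and every product description, reference code and function name in it is invented for illustration.

```python
import re

def tokens(text):
    """Lowercase a description and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def best_match(product, reference_items):
    """Return (code, score) for the reference item most similar to the product."""
    product_tokens = tokens(product)
    scored = [
        (code, jaccard(product_tokens, tokens(description)))
        for code, description in reference_items.items()
    ]
    return max(scored, key=lambda pair: pair[1])

# Hypothetical reference items: invented codes and descriptions,
# standing in for a nutrient database with a few thousand entries.
reference_items = {
    "REF001": "Milk, whole",
    "REF002": "Yogurt, Greek, plain, nonfat",
    "REF003": "Bread, whole wheat",
}

# Hypothetical scanner-style descriptions, standing in for
# hundreds of thousands of retail products.
products = [
    "BrandA whole milk 1 gal",
    "BrandD milk whole vitamin d half gallon",
    "BrandB greek yogurt plain nonfat 32 oz",
    "BrandC 100 whole wheat bread sliced",
]

# Link every product to its closest reference item.
links = {product: best_match(product, reference_items) for product in products}

for product, (code, score) in links.items():
    print(f"{product!r} -> {code} (similarity {score:.2f})")
```

In this toy example, two different milk products link to the same reference item, which is the pattern that makes the real task so large: many products must be assigned to each nutrient profile. A word-overlap score like this would struggle with abbreviations, brand names and misspellings, which is where machine learning and natural language processing methods come in.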

This collection of articles tells the story of the Food for Thought Challenge. We begin by exploring the policy issues that drive the development of the PPC – the need to understand the national diet, to develop healthy diet plans, and to cost those plans – and the issues posed by record linkage. Next, we learn about the nature of the challenge and the structure of the competition in more detail, and then the three winning teams walk us through their solutions. We end the collection with some closing thoughts on the value of competitions for addressing data science challenges in the public sector.

About the authors
Brian Tarran is editor of Real World Data Science, and head of data science platform at the Royal Statistical Society.

Julia Lane is a professor at the NYU Wagner Graduate School of Public Service and an NYU Provostial Fellow for Innovation Analytics. She co-founded the Coleridge Initiative, which aims to transform the way governments access and use data for the social good through training programs, research projects and a secure data facility. She recently served on the Advisory Committee on Data for Evidence Building and the National AI Research Resource Task Force.

Copyright and licence
© 2023 Royal Statistical Society and Julia Lane

This article is licensed under a Creative Commons Attribution 4.0 (CC BY 4.0) International licence. Thumbnail photo by Melanie Lim on Unsplash.

How to cite
Tarran, Brian, and Julia Lane. 2023. “The Food for Thought Challenge: Using AI to support evidence-based food and nutrition policy.” Real World Data Science, August 21, 2023. URL