Defining Purposes and Uses to Support the Development of Statistical Products in a 21st Century Census Curated Data Enterprise Environment

Learn about researchers’ plans to develop the Curated Data Enterprise through a use case research program.

Public Policy
Data Analysis
Data Integration
Curation
Statistical Products
Authors

Stephanie Shipp, Joseph Salvo, and Vicki Lancaster
University of Virginia

Published

November 22, 2024

Acknowledgments: This research was sponsored by the United States Census Bureau (Agreement No. 01-21-MOU-06) and the Alfred P. Sloan Foundation (Grant No. G-2022-19536).



The views expressed in this article are those of the authors and not those of the Census Bureau.

1 Summing it up

We end where we began in the first article of our series. Through this four-part series, we introduced a Curated Data Enterprise (CDE) Framework (see Figure 1) that can guide the development and dissemination of statistics broadly applicable to social and economic issues while ensuring replicability and reusability. The CDE provides a scaffold for scaling up the development of statistical products of interest to the US Census Bureau and applies broadly to official statistical agencies (Keller et al. 2022). We illustrated this through a use case on the climate resiliency of skilled nursing facilities, highlighting the replicability and reusability of capabilities that merit inclusion in a CDE.

Figure 1: The CDE Framework starts with the purposes & uses of the statistical products. The outer rectangle identifies the guiding principles for ethical, transparent, reproducible statistical product development and dissemination. The inner rectangle identifies the statistical product development steps.

As noted in the first three articles, the process begins with articulating purposes and uses through stakeholder engagement and continues by leveraging that engagement, including subject matter expertise, to inform statistical product development. Eliciting purposes and uses from stakeholders and data users is facilitated by asking questions such as:  

  1. What questions keep you awake at night because you don’t have data insights to address them? What are those purposes and uses that you need statistical products to support?

  2. How do we collaborate and engage with you to better understand your needs and help you identify gaps in understanding regarding purpose and use?

  3. How do we prioritize what statistical products to develop first?

Examples of purposes and uses that drive new statistical products include accurately measuring gig employment (Salvo, Shipp, and Zhang 2022a) and migration due to extreme climate events (Salvo, Shipp, and Zhang 2022b), capturing the various dimensions of housing affordability (Wu et al. 2023), and addressing the undercount of young children (Salvo, Lancaster, and Shipp 2023). Other topics that require multiple sources and types of data include creating a household living budget based on the minimum necessary to ensure an adequate standard of living (Lancaster et al. 2023) and using this budget as a starting point for measuring insecurity across components such as food or housing (Montalvo et al. 2023).

2 Developing an end-to-end (E2E) curation system

Purposes and uses defined in use cases are important for supporting the rapid development of statistical products. Such use cases can capture the imagination of those working to address today’s critical issues and can advance public understanding of, and trust in, federal statistics. The examples listed in the previous section are purposes and uses for which we have already developed use cases.

Use cases are a powerful mechanism for promoting the methodological research needed to develop and implement CDE capabilities. The objective is to undertake research projects that could create statistical products with explicit purposes and uses and that exercise each of the end-to-end (E2E) curation components.

When implemented, these proposed use cases will demonstrate a sequence of capabilities needed to build the CDE, such as agile data discovery; reusing modules, data (including synthetic data), and methods for integrating many types of data; tracking the provenance of collected and generated data; conducting statistical analysis involving heterogeneous data integration; and reviewing data and statistical results through an equity and ethics lens. These steps will be captured in an E2E curation system.

2.1 Criteria for developing and evaluating use cases that will uncover the capabilities and research necessary to develop the CDE

Criteria are needed to evaluate use cases and to partner with researchers and stakeholders in developing and implementing the capabilities to be captured in the CDE. Once curated, the selected use cases need to provide unique insight into CDE capabilities and statistical product development. The capabilities to be developed include addressing a purpose and use that no single source of information can resolve, generating practical diagnostics to improve existing methods, creating pilot software, and validating new and improved statistical products. These criteria, developed through listening sessions and discussions with experts, guide the prioritization and selection of use cases and their evaluation after curation (Table 2; Keller et al. 2022).

Table 2. Criteria for Selecting and Prioritizing Use Cases to Identify CDE Capabilities

  1. Value and feasibility of the CDE approach described in the existing research (potential use case) to address emerging or long-standing issues, i.e., its purpose and use over and above existing approaches to address high-priority problems.

  2. Stakeholders’ challenges and issues as the source of purposes and uses.

  3. Subject matter experts to advise on the approach and implementation.

  4. Partners to access data from local and state governments, non-profit organizations, and the private sector, and strategies to overcome legal and administrative barriers to such access that benefit both the providers and the recipients of the data.

  5. Survey, administrative, opportunity, and procedural data from multiple sources (e.g., local, state, federal, third-party) to address the purpose and use (issue) in an integrated way, with well-defined data ingestion and governance requirements.

  6. Computation and measurement requirements for statistical products, including the unit(s) of analysis and their characteristics, temporal sequence, geocoded location data, and methods for imputations, projections, and statistical analysis.

  7. Equity and ethical dimensions considered at each step to ensure that the use case provides fair and accurate representation across groups, and an assessment that the potential benefits outweigh the potential harm.

  8. Evidence of CDE capabilities to be built, including the code, data, and documentation to create the statistical products, which can be described in the curation step.

  9. Statistical products, including integrated data sources, indicators, maps, visualizations, storytelling, and analysis.

  10. Potential viability of proposed dissemination platforms for interactive access to data products at all levels of data acumen (Keller and Shipp 2021) while adhering to confidentiality and privacy rules.

2.2 An end-to-end curation process

Curation is an end-to-end process that, in the context of the purposes and uses, documents the decisions and trade-offs made at each step of the CDE Framework. We use the following definition of curation because it serves the CDE’s vision:

Curation involves documenting, for each statistical product, the inputs from which the product is derived, the wrangling used to transform the information into the product, and the statistical product itself. Purposes and uses provide the context for each statistic and statistical product.

This definition evolved from numerous stakeholder listening sessions and discussions with Census Bureau staff (Nusser et al. forthcoming; Faniel, Frank, and Yakel 2019; NASEM 2022).
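To make the definition concrete, the sketch below shows one way a curation record might be structured in Python. It is a minimal illustration under our own assumptions, not the CDE’s implementation: the class names (InputSource, WranglingStep, CurationRecord), their fields, and the example values are all hypothetical. The record ties a statistical product to its purpose and use, its inputs, and its wrangling steps (with the decision and trade-off made at each step), and reports which elements of the definition are still undocumented.

```python
from dataclasses import dataclass, field


@dataclass
class InputSource:
    """Hypothetical schema: one input from which a statistical product is derived."""
    name: str
    source_type: str  # e.g., "survey", "administrative", "third-party"
    provider: str     # e.g., "local", "state", "federal"


@dataclass
class WranglingStep:
    """Hypothetical schema: one transformation, with its decision and trade-off recorded."""
    description: str
    decision: str
    trade_off: str


@dataclass
class CurationRecord:
    """Hypothetical record pairing a product with its purpose and use,
    inputs, and wrangling, mirroring the curation definition above."""
    product: str
    purpose_and_use: str
    inputs: list[InputSource] = field(default_factory=list)
    wrangling: list[WranglingStep] = field(default_factory=list)

    def missing_elements(self) -> list[str]:
        """Return the elements of the definition not yet documented."""
        missing = []
        if not self.purpose_and_use.strip():
            missing.append("purpose and use")
        if not self.inputs:
            missing.append("inputs")
        if not self.wrangling:
            missing.append("wrangling")
        if not self.product.strip():
            missing.append("product")
        return missing


# Illustrative usage; the values below are invented for demonstration only.
record = CurationRecord(
    product="County resilience indicator (illustrative)",
    purpose_and_use="Assess the climate resiliency of skilled nursing facilities",
)
print(record.missing_elements())  # ['inputs', 'wrangling']

record.inputs.append(InputSource("Facility list (hypothetical)", "administrative", "federal"))
record.wrangling.append(
    WranglingStep(
        "Geocode facilities to counties",
        "Use facility street addresses",
        "Unmatched addresses are dropped",
    )
)
print(record.missing_elements())  # []
```

Keeping the purpose and use as a required field, rather than optional metadata, reflects the article’s point that purposes and uses provide the context for every statistic.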

As use cases are curated, CDE capabilities will evolve to support the rapid development of statistical products. These curated use cases are integral to developing an E2E curation process for the CDE.

2.3 Invitation to contribute purpose and use ideas for developing new statistical products

The CDE development effort aims to curate a significant number of use cases that address social and economic issues and have the potential to define capabilities to be built in the CDE. Initially, the Census Bureau is seeking ideas for purposes and uses to define these use cases and statistical products.

For example, the skilled nursing facility use case included code, data, and documentation to calculate the probability that workers can get to work during a weather event, resilience indicators at the county or sub-county level, and alternative skilled nursing facility deficiency measures, among other capabilities.

3 Incorporating capabilities in the CDE

To accelerate the development of statistical products, the Census Bureau will develop use cases to articulate and create CDE capabilities. This requires identifying the most valuable lessons from each use case and quickly translating and incorporating them into the CDE. Examples of critical capabilities of interest include assessing the utility of synthetic data, aggregating data into custom geographies, and combining different units of analysis. The expected outcome is an innovative 21st Century Census Curated Data Enterprise that is focused on purposes and uses and overcomes the limitations and challenges of today’s survey-alone model.

The development of the 21st Century Census Curated Data Enterprise presents an opportunity for researchers to help shape the CDE as the foundation for creating new statistical products. The US Census Bureau is seeking ideas for purposes and uses that will define new statistical products. It is interested in research projects (use cases), guided by the CDE Framework, that could become new statistical products. It also wants to learn from your experiences in using the CDE Framework: for example, what worked well, what challenges you faced, how each step in the framework was curated, and which capabilities are replicable and reusable for developing and enhancing statistical products.

About the authors
Stephanie Shipp leads the Curated Data Enterprise research portfolio and collaborates with the US Census Bureau. She is an economist with experience in data science, survey statistics, public policy, innovation, ethics, and evaluation.
Joseph Salvo is a demographer with experience in US Census Bureau statistics and data. He presents to a wide range of groups on demographic subjects, including the management of major demographic projects that involve analyzing large data sets for local applications.
Vicki Lancaster is a statistician with expertise in experimental design, linear models, computation, visualizations, data analysis, and interpretation.
Copyright and licence
© 2024 Stephanie Shipp

This article is licensed under a Creative Commons Attribution 4.0 (CC BY 4.0) International licence. Thumbnail photo by Lukas Blazek on Unsplash.

How to cite
Shipp S, Salvo J, Lancaster V (2024). “Defining Purposes and Uses to Support the Development of Statistical Products in a 21st Century Census Curated Data Enterprise Environment.” Real World Data Science, November 22, 2024. URL

References

Faniel, Ixchel M, Rebecca D Frank, and Elizabeth Yakel. 2019. “Context from the Data Reuser’s Point of View.” Journal of Documentation 75 (6): 1274–97. https://doi.org/10.1108/JD-08-2018-0133.
Keller, Sallie, Kenneth Prewitt, John Thompson, Steve Jost, Christopher Barrett, Sarah Nusser, Joseph Salvo, and Stephanie Shipp. 2022. “A 21st Century Census Curated Data Enterprise. A Bold New Approach to Create Official Statistics. Technical Report.” Proceedings of the Biocomplexity Institute BI-2022-1115: 297–323. https://doi.org/10.18130/r174-yk24.
Keller, Sallie, and Stephanie Shipp. 2021. “Data Acumen in Action.” Notices of the American Mathematical Society. https://www.ams.org/journals/notices/202109/noti2353/noti2353.html?adat=October%202021&trk=2353&galt=feature&cat=feature&pdfissue=202109&pdffile=rnoti-p1468.pdf.
Lancaster, V., M. Montalvo, J. Salvo, and S. Shipp. 2023. “The Importance of Household Living Budget in the Context of Measuring Economic Vulnerability: A Census Curated Data Enterprise Use Case Demonstration.” Proceedings of the Biocomplexity Institute, Technical Report BI-2023-258. https://doi.org/10.18130/p43z-c742.
Montalvo, Cesar, Vicki Lancaster, Joseph Salvo, and Stephanie Shipp. 2023. “The Importance of Household Living Budget in the Context of Food Insecurity: A Census Curated Data Enterprise Use Case Demonstration.” Proceedings of the Biocomplexity Institute, Technical Report BI-2023-261. https://doi.org/10.18130/2kgx-tv50.
NASEM. 2022. “Transparency in Statistical Information for the National Center for Science and Engineering Statistics and All Federal Statistical Agencies.” National Academies of Sciences, Engineering, and Medicine. https://doi.org/10.1162/99608f92.17405bb6.
Nusser, S., S. Keller, S. Shipp, Z. Zhu, and E. Wu. forthcoming. “Curation in the Context of the Census Curated Data Enterprise (CDE).” TBD, forthcoming.
Salvo, J., V. Lancaster, and S. Shipp. 2023. “The Net Undercount of Children Under 5 Years of Age in the Decennial Census: An Art of the Possible Use Case.” Proceedings of the Biocomplexity Institute, Technical Report BI-2023-000. https://doi.org/10.18130/nzyj-m621.
Salvo, J., S. Shipp, and S. Zhang. 2022a. “Defining the Role of Gig Employment in the Post-Pandemic World of Work.” Proceedings of the Biocomplexity Institute, Technical Report BI-2022-026. https://doi.org/10.18130/wkx0-4y46.
———. 2022b. “Building a Case Study of Domestic Migration and the Curated Data TR# 2022-027 - Essential Elements.” Proceedings of the Biocomplexity Institute, Technical Report BI-2022-027. https://doi.org/10.18130/bcwa-gt69.
Wu, E., J. Salvo, V. Lancaster, and S. Shipp. 2023. “Housing Affordability – an Art of the Possible Use Case to Develop the 21st Century Census Curated Data Enterprise.” Proceedings of the Biocomplexity Institute, Technical Report BI-2023-262. https://doi.org/10.18130/qgkd-va29.