Changes to Census Bureau Data Products
The Census Bureau has announced a new set of standards and methods for disclosure control in public use data products. According to the Census Bureau, the new approach “marks a sea change for the way that official statistics are produced and published” and represents “the death knell for public-use detailed tabulations and microdata sets as they have been traditionally prepared.” The reason for these changes is concern about respondent confidentiality, even though the decennial census and American Community Survey (ACS) research data files have an unblemished record of confidentiality. As the Census Bureau acknowledges, there has never been a single documented case where the identity of a respondent in the ACS or decennial census has been revealed by someone outside the Census Bureau.
IPUMS is concerned that scientists, planners, and the public will soon lose the free access we have enjoyed for the past six decades to reliable public Census Bureau data describing American social and economic change. This page reports what we have learned about the new data products.
Use this form to join our mailing list for updates on the Bureau’s evolving plans and to tell us about how the proposed changes might affect your research.
DIFFERENTIAL PRIVACY IN THE 2020 CENSUS
Census Bureau implementation of the new disclosure controls is most advanced for the summary files of the 2020 Census. These data files have limited information, since the short form census asks only a few questions. The data are primarily used for redistricting, allocation of funds based on population counts, planning, and studies of residential segregation. Given the complete coverage of the decennial census, these data provide a crucial high-quality baseline for surveys and estimates throughout each decade. They are also the only source of high-quality nationwide data for small areas, for which survey sample sizes (from the American Community Survey or other sources) are typically too small to produce reliable estimates.
The Census Bureau plans to release only "differentially private" data from the 2020 Census. These data will have intentional errors added to nearly all statistics, including even the total populations of all geographic units below the state level. There is an ongoing lawsuit arguing that the introduction of deliberate errors is unconstitutional.
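To make the idea of intentionally added error concrete, the following is a minimal sketch of the textbook Laplace mechanism applied to population counts. It is an illustration only and not the Census Bureau's Disclosure Avoidance System, which is more elaborate. The function name, the example populations, and the value of epsilon are all hypothetical.

```python
import random


def noisy_count(true_count, epsilon, rng):
    """Return a count with Laplace noise of scale 1/epsilon added.

    Assumes a sensitivity of 1 (adding or removing one person changes
    the count by at most 1). The difference of two independent
    exponential draws with the same rate follows a Laplace distribution.
    """
    rate = epsilon  # Laplace scale is 1/epsilon, so the exponential rate is epsilon
    noise = rng.expovariate(rate) - rng.expovariate(rate)
    return true_count + noise


if __name__ == "__main__":
    rng = random.Random(2020)  # fixed seed so the sketch is repeatable
    epsilon = 0.5              # hypothetical privacy-loss setting

    # Hypothetical true populations for geographic units of very different sizes.
    units = {"small block": 12, "tract": 4000, "county": 250000}

    for name, population in units.items():
        published = noisy_count(population, epsilon, rng)
        error = published - population
        print(f"{name}: true={population}, error={error:+.2f}, "
              f"relative error={abs(error) / population:.4%}")
```

In this simple mechanism the size of the noise depends on epsilon and not on the size of the count, so an error that is negligible for a large county can be a substantial share of a small block's population.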
The Census Bureau justifies the new disclosure controls by citing the threat of database reconstruction, which is a technique for inferring individual-level responses from tabular data. Our analysis, however, determined that the threat of database reconstruction was minimal. The Census Bureau's attempt to reconstruct the 2010 Census from published tabulations produced incorrect results in most cases and did not perform much better than random guesses of people's characteristics. As Acting Director of the Census Bureau Ron Jarmin concluded, “The accuracy of the data our researchers obtained from this study is limited, and confirmation of reidentified responses requires access to confidential internal Census Bureau information … an external attacker has no means of confirming them.”
The Census Bureau produced a series of demonstration products based on the 2010 Census to allow outside experts to assess the usability of the data for redistricting, planning, and research. IPUMS, along with collaborators at the University of Washington, the University of Tennessee, and NORC at the University of Chicago, received a grant from the Alfred P. Sloan Foundation to analyze the demonstration files, and other groups from Harvard and CUNY also undertook analyses.
The most recent demonstration data were released in April 2021, and the analyses are now complete. Although not all of the studies have been publicly released, those that have suggest that the differentially private data will be unfit for redistricting and for many other research applications.
- Feedback on the April 2021 Census Demonstration Files. Van Riper, Schroeder, and Ruggles
- Does the Quality of the Census April 28, 2021, Census Demonstration Product (with an Epsilon of 12.2) Mean that Such a Product Would Be “Fit for Use” for Redistricting? Beveridge
- The Impact of the U.S. Census Disclosure Avoidance System on Redistricting and Voting Rights Analysis. Kenny et al.
- State of Washington Feedback on the April 2021 Census Demonstration Files. Mohrman
- Differential Privacy and the Upcoming Process of Redistricting. Sullivan & Cai
SYNTHETIC MICRODATA FROM THE AMERICAN COMMUNITY SURVEY
The American Community Survey (ACS) microdata is by far the most intensively used dataset disseminated by IPUMS and is a core dataset across social science and health research. Common topics of analysis include poverty, inequality, immigration, internal migration, ethnicity, disability, transportation, fertility, marriage, occupations, education, and family structure.
At the April 2021 ACS Data Users conference, the Census Bureau announced that it will replace the ACS research data with “fully synthetic” data over the next three years. A week after the conference, following an uproar on Twitter, the Census Bureau backtracked, and it now says that there is no firm timeline for implementing simulated ACS data. The Census Bureau has not announced any formal process for evaluation of the change, as is required under the Administrative Procedure Act.
The Bureau has not finalized the details of its methods, but the idea is to develop statistical models describing the interrelationships of the variables in the ACS and then construct a simulated population consistent with those models. Such modeled data capture relationships between variables only if those relationships have been intentionally built into the model. Accordingly, synthetic data are poorly suited to studying unanticipated relationships, which impedes new discovery. Most analyses currently conducted with the ACS are likely to become impossible with the shift to synthetic data. For example, the ACS makes it easy for investigators to measure ethnic intermarriage, or the impact of a partner’s education on women’s fertility. The synthetic data would likely incorporate only individual-level interrelationships among variables, so analysis across household members would be impossible.
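The point about cross-household relationships can be illustrated with a small sketch. Since the Bureau has not finalized its methods, this is not its synthesizer. It is a deliberately simple stand-in that models each partner's education separately, with no term linking the two partners. The data, the 70 percent matching rate, and all function names are hypothetical.

```python
import random

LEVELS = ["high school", "bachelor's", "graduate"]


def make_real_couples(n, rng):
    """Hypothetical 'real' data in which partners often share an education level."""
    couples = []
    for _ in range(n):
        first = rng.choice(LEVELS)
        # 70% of the time the partner matches; otherwise draw a level at random.
        second = first if rng.random() < 0.7 else rng.choice(LEVELS)
        couples.append((first, second))
    return couples


def synthesize_individual_level(couples, rng):
    """Draw each partner independently from that partner's observed distribution.

    The marginal distribution of education is preserved for each partner,
    but nothing in this model ties one partner's education to the other's.
    """
    first_values = [first for first, _ in couples]
    second_values = [second for _, second in couples]
    return [(rng.choice(first_values), rng.choice(second_values))
            for _ in couples]


def share_same_education(couples):
    """Fraction of couples in which both partners have the same education level."""
    return sum(first == second for first, second in couples) / len(couples)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is repeatable
    real = make_real_couples(20000, rng)
    synthetic = synthesize_individual_level(real, rng)
    print("share with matching education, real data:     ",
          round(share_same_education(real), 3))
    print("share with matching education, synthetic data:",
          round(share_same_education(synthetic), 3))
```

In this sketch the synthetic file reproduces each partner's education distribution, yet the strong matching pattern in the real data falls to roughly what chance alone would produce, because that relationship was never part of the model.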
The Bureau apparently recognizes that the synthetic ACS microdata will not be suitable for research. The Bureau therefore proposes a system whereby investigators would develop analyses using synthetic data, and then submit them to the Census Bureau for “validation” using real data. This would preclude exploratory analyses on the real data, and would probably be logistically infeasible.
SMALL-AREA DATA FROM THE AMERICAN COMMUNITY SURVEY
The Census Bureau has announced that the ACS summary data will also be made "formally private" by 2025 at the earliest, but it has provided no further details about either the methods or the timeline for achieving this goal.
Updates & Research Reports
- Census Bureau Disclosure Avoidance System (DAS) Updates
- David Van Riper, "Differential Privacy and the 2020 Decennial Census"
- David Van Riper, "Differential Privacy and the Decennial Census"
- David Van Riper, Tracy Kugler, José Pacas, and Jonathan Schroeder, “Differential Privacy and the Decennial Census”
- Ruggles, Steven, Catherine Fitch, Diana Magnuson, and Jonathan Schroeder. 2019. “Differential Privacy and Census Data: Implications for Social and Economic Research.” AEA Papers and Proceedings, 109: 403-08.
- Task Force on Differential Privacy for Census Data, Implications of Differential Privacy for Census Bureau Data and Research
We will continue to gather relevant information for the IPUMS user community, post it here, and share it via the IPUMS Twitter account.