IPUMS MLP: Revolutionizing Linked Data

By Etienne Breton

As researchers, we often ask questions we cannot answer for lack of data. More intriguingly, however, there are questions we only think of asking once we encounter data that may answer them. Good data address existing problems; great data inspire new questions. The latest iteration of the IPUMS Multigenerational Longitudinal Panel (MLP) project, which links records across full count US census data, fits this description. Visit our data browser and project description for inspiration on research questions you did not know could be asked – and answered.

Full count census data offer unprecedented opportunities for social scientific research. Once harmonized, these data enable precise measurement of key demographic, economic, and social patterns across time and space. Researchers can observe entire populations over long periods and produce estimates virtually free of sampling error. Estimates can also be produced down to the smallest geographical units, allowing researchers to define and observe communities with an outstanding level of detail.

Perhaps even more powerfully, full count data have opened the possibility of automated record linkages across census years to construct millions of individual life histories and trace millions of families over multiple generations. These linked data speak compellingly to core research questions in the social sciences, including intergenerational mobility and the intergenerational transmission of socioeconomic characteristics; exhaustive descriptions of individual and family trajectories; internal migration patterns within small geographic units; long-term outcomes of early-life conditions; and many more.

IPUMS disseminates full count census enumerations for ten census years from 1850 to 1950. These data, covering over 800 million individual records, are the fruit of a collaboration between IPUMS and the world’s two largest genealogical organizations, Ancestry.com and FamilySearch, to leverage genealogical data for scientific purposes. IPUMS MLP now offers longitudinal links between individuals and households enumerated in those ten censuses. As shown in the figure below, MLP’s current iteration offers 645 million links between census pairs. This amounts to more than 175 million people linked over two or more censuses.

Figure 1: Case Counts for Linked Census Pairs

Grid of decennial census pairs, 1850-1950. Cells in grid show number of links.
The IPUMS USA access system allows users to see a detailed count of the number of links between census pairs in the latest version of the MLP data. See the IPUMS MLP data description for more information about links.

Full count and linked census data can also be merged with datasets outside the census, including administrative data and surveys from the present day, to multiply research opportunities. Researchers are already drawing on these new data to tackle ambitious questions. Examples include studies on the life-course impacts of New Deal policies on individual wellbeing and economic outcomes; on the role of declining kin availability in birth rates; on the late-life consequences of early-life exposure to lead; and on the intergenerational mobility of immigrants over the long run.

Figure 2: Intergenerational Mobility in the US, from Connor et al. (2025)

Choropleth maps of continental US. Blue counties denote more upward mobility; yellow denotes lower mobility.
This figure shows two maps of the continental US taken from a recent publication by Connor et al. (2025). On the top map, the authors use MLP data to construct county-level estimates of father-to-son intergenerational mobility between 1904 and 1950. On the lower map, they calculate the same estimates from other data sources, for the years 1978 to 2015. Comparing these two maps, the authors find that “[w]hile states in the central and northern regions exhibit rising upward mobility rates relative to the rest of the country, much of the South continues to perform poorly”.

Despite the enormous potential of these linked data to advance research, users must remain aware of their limitations. Record linkage is inherently probabilistic and subject to non-random omissions, raising concerns about selection bias and generalizability. Yet these challenges also present important methodological opportunities – not only for causal and multilevel modelling, but also for improving the methods needed to create the links themselves.

We have only scratched the surface of the potential of full count and linked census data. These data provide fertile ground for formulating new and original research questions. They also allow researchers to revisit more foundational questions, which can now be answered with added depth, granularity, and exhaustiveness. The field of publication opportunities remains wide open, and studies published in major journals already attest to the data’s significance for scholarly contributions. In both academic and applied research settings, there is also great demand and momentum for improving the methods – chiefly machine learning models – needed to create these data.

If any of these possibilities piques your curiosity, consider yourself cordially invited to visit the IPUMS USA website for more information on full count census data, as well as the MLP project page for more information on the latest updates to MLP data and for the project’s next steps.