Use It for Good – News and information from IPUMS

IPUMS Time Use Leadership Receives Ellen Galinsky Generative Researcher Award

July 17, 2026 by mpcblog

By Kari Williams & Stacy Nordstrom

Drs. Sarah Flood, Liana Sayer, and Melissa Milkie presented with the Ellen Galinsky Generative Researcher Award
Since 1993, IPUMS has worked to preserve and harmonize population data and make them freely accessible to researchers. This includes time diary data from the American Time Use Survey (ATUS) as well as international and historical time diary data. Last month, Drs. Sarah Flood and Liana Sayer, co-PIs of IPUMS Time Use, along with their long-time collaborator Dr. Melissa Milkie, received the Ellen Galinsky Generative Researcher Award, presented by the Work and Family Researchers Network.

The award recognizes work-family researchers who have contributed breakthrough thinking to the work-family field via theory, measures, and/or data sets that led to expansive application, innovation, and diffusion, including the sharing of research opportunities in the spirit of open science. The award committee highlighted Flood’s foundational contributions to public data infrastructure and long history of demystifying complex time diary data, as well as Sayer’s innovative scholarship on gender and social class inequalities in time use over the life course.

Family Interrelationships Variables in IPUMS MEPS

July 31, 2026 by mpcblog

By Etienne Breton

Health and family are inextricably tied. Their interplay is complex and dynamic, ranging from biological transmissions to the presence or absence of familial support over the life course. Elucidating these associations often requires vast datasets collected over multiple decades – to account for the ever-changing health and family circumstances of our lives. Researchers interested in investigating these questions at scale may now add a new tool to their toolkit: IPUMS family interrelationship variables are now available in IPUMS MEPS!

Also known as family pointers, these variables identify the location of a person’s probable co-resident spouse and/or parent(s) in the household. They increase reproducibility, flexibility and ease of use when analyzing family units and relationships within households. Whether interested in studying simple parent-child dyads or complex multigenerational arrangements, users may now seamlessly attach characteristics of in-household family members to a person’s records in MEPS.
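To make the idea concrete, here is a minimal Python sketch of attaching a mother's age to each person's record through a pointer. The names HHID, PERNUM, and MOMLOC are assumptions made for illustration (a household identifier, a person's position in the household, and the position of that person's mother, with 0 meaning no co-resident mother); consult the IPUMS MEPS variable documentation for the actual names and codes.

```python
# Hypothetical person records. HHID, PERNUM, and MOMLOC are assumed names:
# MOMLOC is taken to hold the mother's PERNUM within the household, 0 if none.
persons = [
    {"HHID": 1, "PERNUM": 1, "AGE": 41, "MOMLOC": 0},
    {"HHID": 1, "PERNUM": 2, "AGE": 12, "MOMLOC": 1},
    {"HHID": 2, "PERNUM": 1, "AGE": 67, "MOMLOC": 0},
    {"HHID": 2, "PERNUM": 2, "AGE": 35, "MOMLOC": 1},
    {"HHID": 2, "PERNUM": 3, "AGE": 8, "MOMLOC": 2},
]


def attach_mother_age(records):
    """Return copies of records with AGE_MOM added (None when no mother is linked)."""
    # Index every person by household and position so pointers can be resolved.
    by_position = {(r["HHID"], r["PERNUM"]): r for r in records}
    out = []
    for r in records:
        mother = by_position.get((r["HHID"], r["MOMLOC"])) if r["MOMLOC"] else None
        out.append({**r, "AGE_MOM": mother["AGE"] if mother else None})
    return out


linked = attach_mother_age(persons)
```

In the second household the same lookup resolves two generations: the 8-year-old is linked to the 35-year-old, who is in turn linked to the 67-year-old.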

IPUMS pioneered the development of family pointers on nationally representative samples of households and individuals, and these variables have since been added to most of our data collection projects. Their recent addition to IPUMS MEPS presents exciting opportunities owing to the unique richness of the MEPS data, including the possibility of eventually expanding these pointers to a panel format.

How do the IPUMS MEPS family pointers compare to those in other IPUMS data collections?

The construction of these family interrelationship variables is comparable with that in other IPUMS microdata collections centered on the US: IPUMS USA¹, IPUMS CPS, IPUMS ATUS and IPUMS NHIS. The logic underpinning both common and project-specific codes is best described in the rule variables (as exemplified in the variable descriptions for MEPS: SPRULE and MOMRULE). These variables detail how pointers were attributed to certain individuals and not others, which further allows users to adjust the strictness of pointer attributions.

Let us provide a very brief overview of these procedures. In IPUMS MEPS, as in other IPUMS data collections, the assignment of family pointers and the corresponding rule variables rely primarily on information provided by the variable RELATE (denoting relationship to the householder or household reference person), and additionally on information from variables AGE, SEX and MARSTAT (marital status). The vast majority of family pointers are assigned using direct links established by RELATE (i.e., when a respondent is listed as the child or spouse of the householder). In IPUMS MEPS, these direct attributions represent between 94.7% and 98.9% of all assigned pointers depending on the year and the family pointer variable under consideration.

There remain, therefore, cases that RELATE does not directly solve. For instance, RELATE identifies persons who are grandchildren of the householder but does not specify which of the householder's children are the parents of those grandchildren. In such clear but indirect cases, our codes algorithmically assign parent-child and spouse-spouse links based on information from RELATE as well as respondents’ age and marital status. These assignments are not probabilistic but instead follow a predefined logic which relies on a small number of well-defined assumptions². Crucially, the values of the rule variables listed above correspond to how direct (first digit) and unambiguous (second digit) each case is, with lower numbers indicating more direct and/or unambiguous cases. This means that users can rely on these rule variables to tailor the levels of directness and clarity they prefer for assigning family pointers.
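As an illustration of tailoring pointer strictness, the Python sketch below filters pointers by rule code. It assumes that a rule value is a two-digit integer whose tens digit encodes directness and whose units digit encodes ambiguity, and that 0 means no pointer was assigned. The codes shown are made up; the actual SPRULE and MOMRULE values are defined in their variable descriptions.

```python
def split_rule(code):
    """Split an assumed two-digit rule code into (directness, ambiguity)."""
    return code // 10, code % 10


def keep_pointer(code, max_directness, max_ambiguity):
    """True when a pointer's rule code falls within the chosen thresholds.

    A code of 0 is assumed to mean that no pointer was assigned.
    """
    if code == 0:
        return False
    directness, ambiguity = split_rule(code)
    return directness <= max_directness and ambiguity <= max_ambiguity


# Hypothetical rule codes, not actual MOMRULE or SPRULE values.
codes = [0, 11, 12, 21, 35]
strict = [c for c in codes if keep_pointer(c, max_directness=1, max_ambiguity=1)]
lenient = [c for c in codes if keep_pointer(c, max_directness=2, max_ambiguity=2)]
```

Under these assumptions, the strict setting keeps only the most direct and unambiguous links, while the lenient setting also accepts some indirect or less clear-cut assignments.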

Note that MEPS data are collected in a panel format: they encompass five interview rounds carried out over two calendar years. Currently, we provide family pointers for person records reported at the annual level (or full-year consolidated files); variables reported at this level may differ from individual round-level observations, for which we do not yet offer family pointers. These variables should, therefore, be interpreted as reflecting household membership and family interrelationships within households as of December 31 of the survey year under consideration³.

Does 1 + 2 = 8? Automating QA/QC for Tabular Data

May 21, 2026 by mpcblog

By Tracy Kugler and Tsu Zhu

The problem with OCR and numbers

To extract data tables from census reports only available as print documents, IPUMS IHGIS uses optical character recognition (OCR) software to automate the conversion of scanned images into digital representations of letters and numbers. OCR software has made great strides in accuracy for textual information by using dictionaries of known words to interpret uncertain letters. However, dictionaries do not help in distinguishing uncertain numerical digits. While a dictionary can suggest that the third character in “wh_t” should be an ‘a’ and not an ‘o’, there is no simple way to tell whether the third digit in “45_” should be a 3 or an 8. To ensure that IHGIS data are accurate, we must have confidence that each number has been recognized correctly and matches the number in the source document.

To address this gap, we developed an R package that leverages IHGIS structured metadata to identify logical relationships between cell counts and row/column totals and determine where cells don’t add up as expected. Often, a given cell participates in multiple relationships, which allows the package to use patterns among discrepancies to pinpoint and correct errors. The package can automatically identify and correct up to 95% of error cells, depending on the structure of relationships.
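The way overlapping relationships pinpoint an error can be sketched in a few lines. This is an illustrative Python example, not the IHGIS R package: it assumes the row and column totals were themselves read correctly and that at most one cell is wrong, and all numbers are made up.

```python
def find_and_fix(cells, row_totals, col_totals):
    """Locate and correct a single misread cell using row and column sums.

    Returns (row, col, corrected_value), or None when the discrepancies do
    not point to exactly one cell.
    """
    # Difference between each printed total and the sum of the OCR'ed cells.
    row_diff = [t - sum(r) for r, t in zip(cells, row_totals)]
    col_diff = [t - sum(c) for c, t in zip(zip(*cells), col_totals)]
    bad_rows = [i for i, d in enumerate(row_diff) if d != 0]
    bad_cols = [j for j, d in enumerate(col_diff) if d != 0]
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j = bad_rows[0], bad_cols[0]
        # The same discrepancy in the row and the column implicates cell (i, j).
        if row_diff[i] == col_diff[j]:
            return i, j, cells[i][j] + row_diff[i]
    return None


# OCR read 458 where the source printed 453 (made-up numbers).
cells = [[120, 458], [75, 310]]
row_totals = [573, 385]
col_totals = [195, 763]
fix = find_and_fix(cells, row_totals, col_totals)
```

Here the first row and the second column are each off by the same amount, so their intersection identifies the misread cell and the size of the discrepancy gives the correction.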

Identifying relationships from structured metadata

The R package currently relies on structured metadata generated by earlier stages in the IHGIS data processing pipeline to identify sum and total relationships among rows and columns. After tables are OCR’ed from source documents, we use a customized markup framework to generate metadata. We then convert the marked-up files into CSV files with a standard structure, which serve as input to the quality assurance/quality control (QA/QC) process. The CSV files include hierarchical labels for categories on the columns and geographic units on the rows. Within the labels, blanks are used to indicate totals. The package identifies a column/row with a blank header cell as the sum of other columns/rows that share the same non-blank label(s) and have sub-category labels corresponding to the blank.
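One possible reading of the blank-label rule is sketched below in Python. It is not the package's actual implementation; the tuple representation of labels and the function name `components_of` are assumptions made for illustration.

```python
def components_of(total_label, all_labels):
    """Return the labels that an assumed total column should equal the sum of.

    Labels are tuples of hierarchy levels, with "" marking a total at that
    level. Components share every non-blank level before the first blank,
    are non-blank at that level, and are blank at any deeper levels.
    Raises ValueError if total_label contains no blank level.
    """
    depth = total_label.index("")  # position of the first blank level
    prefix = total_label[:depth]
    return [
        lab
        for lab in all_labels
        if lab[:depth] == prefix
        and lab[depth] != ""
        and all(part == "" for part in lab[depth + 1:])
    ]


# Hypothetical two-level column labels: (sex, age group).
labels = [
    ("", ""),
    ("Male", ""),
    ("Male", "Under 15"),
    ("Male", "15 and over"),
    ("Female", ""),
    ("Female", "Under 15"),
    ("Female", "15 and over"),
]
male_parts = components_of(("Male", ""), labels)
grand_parts = components_of(("", ""), labels)
```

With these labels, the "Male" total is paired with the two male age groups, and the grand total is paired with the "Male" and "Female" totals, which yields the kind of overlapping sum relationships the QA/QC step relies on.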

IPUMS Announces 2025 Research Award Recipients

May 19, 2026 by mpcblog

IPUMS is excited to announce the winners of its annual IPUMS Research Awards. These awards honor both published research and nominated graduate student papers from 2025 that use IPUMS data to advance or deepen our understanding of social and demographic processes.

The 2025 competition awarded prizes for both published research and graduate student research (published or unpublished) in eight categories:

IPUMS USA: data from the U.S. decennial censuses (including full count data for 1850-1950) and American Community Survey Data
IPUMS CPS: monthly data from the Current Population Survey (back to 1976) and Annual Social and Economic supplement (back to 1962)
IPUMS International: harmonized data from censuses and labor force surveys around the world, contributed by more than 100 international statistical office partners, for 1960-forward
IPUMS Health Surveys: harmonized data from the U.S. National Health Interview Survey (NHIS) for 1963 onward and Medical Expenditure Panel Survey (MEPS) for 1996 onward
IPUMS Spatial: Census summary tables and GIS data from the US (IPUMS NHGIS) and around the world (IPUMS IHGIS), and measures of contextual determinants of health (IPUMS CDOH)
IPUMS Global Health: harmonized health survey data from around the world, including harmonized versions of the Demographic and Health Surveys (IPUMS DHS), Multiple Indicator Cluster Surveys (IPUMS MICS), and the Performance Monitoring for Action (IPUMS PMA)
IPUMS Time Use: time diary data from the American Time Use Survey (IPUMS ATUS), historical and contemporary time use data from the U.S. (IPUMS AHTUS), and around the world (IPUMS MTUS)
IPUMS Excellence in Research: The IPUMS mission of democratizing data is strengthened by broad representation among our data users and the research that we highlight. This award was created to recognize the diversity of scholars doing innovative research with IPUMS data. This category includes submissions from all IPUMS data collections.

The award committees received and reviewed hundreds of nominations for our 2025 competition. From these nominations, the committees selected the 2025 honorees.

IPUMS DHS Goes Global

May 4, 2026 by mpcblog

By Miriam L. King and Sula Sarkar

IPUMS DHS now includes integrated variables for 84 countries (up from 51) and nearly 350 samples (up from 233), including new data from Latin America, Eastern Europe, Oceania, the Caribbean, and Central and East Asia. Providing DHS data in a form that facilitates micro-analyses across countries is one of IPUMS’ greatest strengths, so researchers will be excited to learn that they can now do even more! Our latest data release expands the scope of IPUMS DHS beyond its initial coverage of Africa, the Middle East, and South Asia and adds the latest samples for 12 countries previously in the database. Figure 1 shows the full geographic scope of IPUMS DHS and highlights newly added countries and previously included countries with new samples.

Figure 1: Countries included in IPUMS DHS

Adjust Monetary Values for IPUMS CPS

April 1, 2026 by mpcblog

By Kari Williams with support from former IPUMS research staff member Danika Brockman

We love to extend useful functionality across multiple IPUMS data collections, so we were delighted to extend the Adjust Monetary Values (AMV) feature, which adjusts dollar values for inflation and was first developed for IPUMS USA, to IPUMS CPS. The initial release of the AMV feature in IPUMS CPS in 2023 provided adjustment for a limited number of variables. Late last year, we extended the feature to cover variables from the ASEC as well. This blog post provides a quick introduction to the AMV tool and step-by-step guidance for using the tool in IPUMS CPS – for full details on the feature, see our IPUMS working paper on the AMV feature.

The Basics

The AMV feature allows users to adjust the monetary variables in a customized dataset from IPUMS into constant dollars, so that all monetary variables for months and years of data in your downloaded data file are in comparable units. IPUMS CPS variables are adjusted to 2010 dollars using the Consumer Price Index for All Urban Consumers (CPI-U). When you add an inflation-adjusted version of a variable to your data extract, the IPUMS data access system applies the appropriate CPI-U adjustment factor for each year to the variable(s) you’ve selected and includes both the original variable and an inflation-adjusted version of that variable in your extract. The adjustment factor is only applied to codes that represent monetary values in the original variable. All missing data codes (e.g., NIU, “Refused”, “Don’t Know”, and “No response”) from the original variable are combined into a single NIU code consisting entirely of 9s in the adjusted version of the variable (which is two digits wider than the original variable).

Note that IPUMS only adjusts monetary variables for years with a final published CPI-U. Final CPI-U values for a given year are typically published early in the next year (e.g., the 2025 CPI-U values are published in 2026). Notably, for basic monthly CPS data, current-year samples will not be available for adjustment because the final CPI-U will not yet have been published. However, the reference period for the ASEC is the previous calendar year (e.g., 2024 is the reference year for the 2025 ASEC); the adjustment factor for the reference year has been published by the time the Census Bureau releases the ASEC data each September and we integrate them into IPUMS CPS. One quick way to check whether the CPI-U value has been published for a given year is to consult the IPUMS CPS CPI99 documentation. We also update the IPUMS CPS revision history to note when we have extended the AMV tool to cover an additional year of data. Any adjusted variables in your extract for samples that are not yet available for monetary adjustment will consist entirely of the adjusted variable NIU code (i.e., a string of 9s).
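The adjustment rules described above can be summarized in a short Python sketch. The function name, the adjustment factor, and the missing-data code are hypothetical, and rounding to whole dollars is an assumption; this is not the IPUMS extract system's code and the factor is not a real CPI-U value.

```python
def adjust_value(value, width, factor, missing_codes):
    """Return the constant-dollar value, or the all-9s NIU code.

    width is the digit width of the original variable; the adjusted variable
    is two digits wider, so its NIU code consists of width + 2 nines.
    factor is None when no final CPI-U has been published for the year.
    """
    niu = int("9" * (width + 2))
    if factor is None or value in missing_codes:
        return niu
    # Rounding to whole dollars is an assumption made for this sketch.
    return round(value * factor)


# Hypothetical 8-digit income variable with one made-up missing-data code.
MISSING = {99999999}
adjusted = adjust_value(40000, width=8, factor=1.25, missing_codes=MISSING)
missing = adjust_value(99999999, width=8, factor=1.25, missing_codes=MISSING)
unpublished = adjust_value(40000, width=8, factor=None, missing_codes=MISSING)
```

Both the missing-data case and the not-yet-published case return the same ten-digit string of 9s, which is why an adjusted variable made up entirely of 9s signals that the sample is not yet available for adjustment.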

Linking children and adolescents to their mothers using IPUMS MICS

March 20, 2026 by mpcblog

By Anna Bolgrien

IPUMS MICS offers hundreds of harmonized variables related to children’s health and wellbeing that allow for rich and innovative research. From the IPUMS MICS website, users can browse variables and create custom data extracts within a selected unit of analysis. In order to conduct many analyses, however, users will want to combine and link datasets relating to different units of analysis available in MICS.

For example, to investigate how child characteristics are related to characteristics of their mother, users will need to download and link data between the Children (either 0-4 or 5-17) unit of analysis and the Women unit of analysis.

IPUMS MICS provides instructions for linking across units of analysis as a user note. This user note lists the variables available as linking keys for each unit of analysis, and is a general guide for linking across the units, such as linking household characteristics with individual person records.
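The general linking idea can be sketched as a many-to-one merge. The example below is written in Python for illustration, and the key names CLUSTER, HHNUM, LINENO, and MOMLINE are hypothetical; the user note lists the actual linking keys for each unit of analysis.

```python
# Hypothetical extracts. CLUSTER, HHNUM, LINENO, and MOMLINE are assumed names;
# MOMLINE is taken to hold the mother's line number, 0 if she is not listed.
women = [
    {"CLUSTER": 1, "HHNUM": 4, "LINENO": 2, "EDUC": "secondary"},
    {"CLUSTER": 1, "HHNUM": 7, "LINENO": 1, "EDUC": "primary"},
]
children = [
    {"CLUSTER": 1, "HHNUM": 4, "LINENO": 5, "MOMLINE": 2, "AGE": 3},
    {"CLUSTER": 1, "HHNUM": 7, "LINENO": 3, "MOMLINE": 1, "AGE": 9},
    {"CLUSTER": 1, "HHNUM": 7, "LINENO": 4, "MOMLINE": 0, "AGE": 11},
]


def link_to_mothers(child_rows, women_rows, columns=("EDUC",)):
    """Left-join the chosen women's columns onto child records as MOM_<column>."""
    # Index women by their full key so several children can share one mother.
    by_key = {(w["CLUSTER"], w["HHNUM"], w["LINENO"]): w for w in women_rows}
    linked = []
    for c in child_rows:
        mother = by_key.get((c["CLUSTER"], c["HHNUM"], c["MOMLINE"]))
        extra = {f"MOM_{col}": (mother[col] if mother else None) for col in columns}
        linked.append({**c, **extra})
    return linked


linked_children = link_to_mothers(children, women)
```

Every child record is kept, and children whose mother is not found in the Women file simply receive empty mother characteristics.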

In this blog post, we provide more detailed information on how to link children and adolescents to their mothers. Similar logic can be applied to link children to fathers or other caregivers in the household. As IPUMS MICS requires Stata to conduct harmonization, we provide example code in Stata syntax.