Use It for Good

Family Interrelationships Variables in IPUMS MEPS

By Etienne Breton

Health and family are inextricably tied. Their interplay is complex and dynamic, ranging from biological transmissions to the presence or absence of familial support over the life course. Elucidating these associations often requires vast datasets collected over multiple decades – to account for the ever-changing health and family circumstances of our lives. Researchers interested in investigating these questions at scale may now add a new tool to their toolkit: IPUMS family interrelationship variables are now available in IPUMS MEPS!

Also known as family pointers, these variables identify the location of a person’s probable co-resident spouse and/or parent(s) in the household. They increase reproducibility, flexibility and ease of use when analyzing family units and relationships within households. Whether interested in studying simple parent-child dyads or complex multigenerational arrangements, users may now seamlessly attach characteristics of in-household family members to a person’s records in MEPS.
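As a rough illustration of what "attaching" a family member's characteristics means, the sketch below resolves each person's mother pointer and copies her age onto the child's record. It is a minimal sketch in plain Python, not IPUMS code: the field names `hh_id`, `pernum`, and `age` and the toy records are assumptions made for illustration. Only MOMLOC comes from the post, and the sketch assumes it stores the mother's position within the household, with 0 meaning no mother was identified.

```python
# Toy person records; hh_id, pernum, and age are hypothetical field names.
people = [
    {"hh_id": 1, "pernum": 1, "age": 41, "MOMLOC": 0},
    {"hh_id": 1, "pernum": 2, "age": 12, "MOMLOC": 1},
    {"hh_id": 2, "pernum": 1, "age": 67, "MOMLOC": 0},
    {"hh_id": 2, "pernum": 2, "age": 38, "MOMLOC": 1},
    {"hh_id": 2, "pernum": 3, "age": 9, "MOMLOC": 2},
]


def attach_mother_age(records):
    """Return copies of the records with a mom_age field (None if no mother)."""
    # Index every person by (household, position) so pointers can be resolved.
    by_position = {(p["hh_id"], p["pernum"]): p for p in records}
    out = []
    for p in records:
        mother = None
        if p["MOMLOC"] != 0:
            mother = by_position.get((p["hh_id"], p["MOMLOC"]))
        out.append({**p, "mom_age": mother["age"] if mother else None})
    return out


linked = attach_mother_age(people)
```

In the second toy household the same lookup yields a three-generation chain: the 9-year-old points to the 38-year-old, who in turn points to the 67-year-old.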

IPUMS has pioneered the development of family pointers on nationally-representative samples of households and individuals, and these variables have since been added to most of our data collection projects. Their recent addition to IPUMS MEPS presents exciting opportunities owing to the unique richness of the MEPS data, which includes the possibility to eventually expand these pointers to a panel format.

How do the IPUMS MEPS family pointers compare to those in other IPUMS data collections?

The construction of these family interrelationship variables is comparable with other IPUMS microdata collections centered in the US: IPUMS USA1, IPUMS CPS, IPUMS ATUS, and IPUMS NHIS. The logic underpinning both common and project-specific codes is best described in the rule variables (as exemplified in the variable descriptions for MEPS: SPRULE and MOMRULE). These variables detail how pointers were attributed to certain individuals and not others, which further allows users to adjust the strictness of pointer attributions.

Let us provide a very brief overview of these procedures. In IPUMS MEPS, as in other IPUMS data collections, the assignment of family pointers and the corresponding rule variables rely primarily on information provided by the variable RELATE (denoting relationship to the householder or household reference person), and additionally on information from variables AGE, SEX and MARSTAT (marital status). The vast majority of family pointers are assigned using direct links established by RELATE (i.e., when a respondent is listed as the child or spouse of the householder). In IPUMS MEPS, these direct attributions represent between 94.7% and 98.9% of all assigned pointers depending on the year and the family pointer variable under consideration.

There remain, therefore, cases that RELATE does not directly solve. For instance, RELATE identifies persons who are grandchildren of the householder but does not specify which of the householder's children are the parents of those grandchildren. In such clear but indirect cases, our codes algorithmically assign parent-child and spouse-spouse links based on information from RELATE as well as respondents’ age and marital status. These assignments are not probabilistic but instead follow a predefined logic which relies on a small number of well-defined assumptions2. Crucially, the values of the rule variables listed above correspond to how direct (first digit) and unambiguous (second digit) each case is, with lower numbers indicating more direct and/or unambiguous cases. This means that users can rely on these rule variables to tailor the levels of directness and clarity they prefer for assigning family pointers.
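To make the two-digit structure concrete, here is a minimal Python sketch that splits a rule code into its two digits and keeps a pointer only when both digits fall at or below thresholds chosen by the user. The helper names and the thresholds are hypothetical. The only rule codes named in this post are 00 and 21, so consult the SPRULE and MOMRULE variable descriptions for the actual code lists before choosing cutoffs.

```python
def split_rule(code):
    """Split a two-digit rule code into (directness, ambiguity) digits."""
    return divmod(code, 10)


def keep_pointer(loc, rule, max_direct, max_ambiguous):
    """Return loc if the rule is strict enough, otherwise 0 (no pointer)."""
    direct, ambiguous = split_rule(rule)
    if loc != 0 and direct <= max_direct and ambiguous <= max_ambiguous:
        return loc
    return 0


# A pointer assigned under rule 21 is dropped when only first digits
# of 1 or lower are accepted, and kept under a looser threshold.
strict = keep_pointer(2, 21, max_direct=1, max_ambiguous=9)
loose = keep_pointer(2, 21, max_direct=2, max_ambiguous=9)
```

Because lower digits indicate more direct and less ambiguous cases, tightening either threshold can only remove pointers, never add them.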

Note that MEPS data are collected in a panel format: they encompass five interview rounds carried out over two calendar years. Currently, we provide family pointers for person records reported at the annual level (the full-year consolidated files); variables reported at this level may differ from individual round-level observations, for which we do not yet offer family pointers. These variables should, therefore, be interpreted as reflecting household membership and family interrelationships within households as of December 31 of the survey year under consideration. As noted above, the vast majority of family pointers are assigned using direct links established by RELATE (i.e., when a respondent is listed as the child or spouse of the householder)3.

How accurate are IPUMS MEPS family pointers?

While there is no omniscient vantage point allowing us to determine whether any given attribution of a family pointer is accurate or not, we possess at least two ways of assessing the plausibility (or plausible accuracy) of family pointers in IPUMS MEPS. The first is to compare the population-level prevalence of family pointers between IPUMS MEPS and other IPUMS data collections centered in the US. All of these data collections can be used to generate nationally representative statistics of the non-institutionalized population over a long time period. Once weighted, they should therefore provide reasonably convergent demographic estimates.

In brief, such a comparison reveals that IPUMS MEPS pointers describe a family demography within households similar to that described by family pointers in other major US surveys. For instance, as shown in Figure 1, the proportion of all survey respondents who were assigned a mother in their household declined in all US-centered IPUMS data collections between the mid-1990s and the mid-2020s. This trend may well be explained by the ongoing fertility decline in the US, but nonetheless deserves further scrutiny as it could also be due to changes in patterns of living arrangements or even to changes in household rostering accuracy.
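The quantity being compared across surveys is a weighted share: the sum of weights for respondents with MOMLOC not equal to 0, divided by the sum of all weights. A minimal Python sketch follows. The field name `weight` and the toy values are assumptions for illustration, and the appropriate person-level weight should be taken from each collection's documentation.

```python
def weighted_share_with_mother(records, weight_key="weight"):
    """Weighted proportion of records whose MOMLOC is not 0."""
    total = sum(r[weight_key] for r in records)
    with_mom = sum(r[weight_key] for r in records if r["MOMLOC"] != 0)
    return with_mom / total


# Toy records; the weights are made up for illustration.
sample = [
    {"MOMLOC": 0, "weight": 1000.0},
    {"MOMLOC": 1, "weight": 3000.0},
    {"MOMLOC": 0, "weight": 4000.0},
    {"MOMLOC": 2, "weight": 2000.0},
]

share = weighted_share_with_mother(sample)  # 5000 / 10000 = 0.5
```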

Figure 1 – Weighted Proportion of Respondents With Mother in the Household (MOMLOC!=0)

Figure 1 shows a decline in the proportion of respondents for whom IPUMS pointers identify a mother in the household across five major US surveys from the mid-1990s to the early 2020s.

A second way to assess the plausible accuracy of our IPUMS-constructed family pointers is to compare them to family pointers provided in the original MEPS data from AHRQ (the Agency for Healthcare Research and Quality, which fields MEPS). These AHRQ pointers are provided at the round level and not at the annual level. They are initially reported by the respondents themselves and are then validated or imputed by AHRQ based on internal procedures (which include tests of age plausibility in parent-child relationships). These respondent-reported pointers have benefits, but they remain subject to reporting errors from respondents and enumerators. Furthermore, it is worth noting that many other U.S. federal data sources do not provide self-reported, much less agency-validated, family interrelationship variables4. The respondent-reported pointers nonetheless provide a meaningful comparison for pointers constructed strictly from algorithmic rules based on a small number of variables5.

Figure 2 – Agreement between IPUMS and Respondent-Reported Pointers by Type of Pointer

Figure 2 shows a very high level of agreement (above 98% of observations) between IPUMS and Respondent-Reported pointers for identifying in-household mothers and fathers, but a declining level of agreement for identifying in-household spouses over the period 1996-2023.

As shown above in Figure 2, there is a very high level of agreement between IPUMS and respondent-reported pointers of mothers and fathers (MOMLOC and POPLOC compared to MOMPIDRD and POPPIDRD), both of which show more than 98% agreement from 1997 onward. However, Figure 2 also shows a declining level of agreement between the IPUMS-constructed and respondent-reported location of the spouse (SPLOC and SPOUSEPNUMRD) in the household. At first glance, this decline appears to be almost monotonic throughout the whole period. Yet this overall trend hides two distinct components, as shown below in Figure 3.

Figure 3 – Discrepant Cases Between IPUMS and Respondent-Reported Pointers by Selected SPRULE Values

Figure 3 shows a growing proportion of observations where IPUMS and respondent-reported pointers are in disagreement by different values of the variable SPRULE over the period 1996-2023.

The first component of this declining rate of agreement is due to the presence of unmarried partners of household heads (RELATE code 30). These individuals cannot be designated as spouse in respondent-reported pointers, while IPUMS-constructed pointers do designate them as spouse in SPLOC. Hence this simply represents a case of IPUMS-constructed pointers relying on a broader definition of union, one that includes some cohabiting couples, to define their spousal pointer. Fortunately, this discrepancy can be directly addressed by using the variable SPRULE. Indeed, SPRULE code 21 contains all and only cases of unmarried partners to household heads coded as spouses in SPLOC. Users can therefore remove this source of discrepancy in their own extracts by simply recoding SPLOC as 0 for all observations that have SPRULE code 21. Figure 3 shows that the use of this rule has become more prevalent since MEPS was initiated, reaching a peak prevalence in the mid-2010s and declining afterward.
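The recode described above is a one-line operation. Here is a minimal sketch in plain Python. The toy records are made up, and the rule code 11 on the first record is a hypothetical stand-in for any other rule value.

```python
def drop_unmarried_partner_links(records):
    """Return copies of the records with SPLOC set to 0 where SPRULE is 21."""
    return [
        {**r, "SPLOC": 0} if r["SPRULE"] == 21 else dict(r)
        for r in records
    ]


# Toy records: 21 is the unmarried-partner rule named in the text,
# 11 is a hypothetical code for some other rule.
couples = [
    {"SPLOC": 2, "SPRULE": 11},
    {"SPLOC": 1, "SPRULE": 21},
    {"SPLOC": 0, "SPRULE": 0},
]

married_only = drop_unmarried_partner_links(couples)
```

The function returns new records rather than editing in place, so the broader and narrower definitions of a union can be compared side by side.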

The second component of the declining rate of agreement in spousal pointers is more puzzling. This component has been growing in importance since the mid-2010s and cannot be addressed directly. These are a subset of individuals with SPRULE code 00; more specifically, individuals for whom the IPUMS-constructed family pointers find no spouse but who have a respondent-reported spouse located at any round of interview. For the most part, these are respondents living in one-person households reporting that they are married with a spouse present with them in the household and who provide what appears to be a valid PID for that person. This spouse is therefore only identified in the variable SPOUSEPNUMRD. In other words, our IPUMS programming rules cannot find any possible spouse for those respondents living in one-person households. It is unclear whether these discrepant cases result from incomplete household rostering on AHRQ’s part or from inaccurate respondent reports. Additional research on this issue is under way, notably to investigate whether recent trends in one-person households converge between IPUMS MEPS and other major US surveys.

In conclusion

Researchers interested in using family pointers in IPUMS MEPS should keep three caveats in mind. The first is the deterministic nature of the pointer attribution rules. Our family pointers are highly accurate but remain imperfect, and users can manage these imperfections with a great degree of flexibility using the rule variables. The second is the inclusion of some unmarried but cohabiting partners in SPLOC, which users can directly manage using SPRULE code 21. These two caveats apply to all IPUMS data collections centered in the US. The third issue is specific to IPUMS MEPS, where we are observing a growing proportion of one-person households where respondents provide a PID for their spouse’s location in the household. We’ll keep you posted on this one.

Taken together, our family pointers are reliable, comparable, and provide new flexible opportunities for combining person-level and family-level analyses. Use these newly added variables to expand your research in both familiar and unfamiliar directions (pun very much intended)!

 


1 IPUMS USA applies a comparable methodology for 1970-forward samples and uses a similar but unique methodology for pre-1970.

2 This predefined logic states, for instance, that where there are multiple potential spouses in the household those individuals who are closer in age are more likely to be each other’s spouse than those individuals with a larger age gap; or that the older of two sets of dependent children in a household are more likely to have as parents the older of two sets of spouses in that same household (given a plausible age gap between parents and children). There are also rules assigning family pointers to dependent children with no clear parent in the household. For instance, IPUMS rules prioritize assigning those children to relatives over non-relatives; ever-married adults over never-married adults; older adults over younger adults; and so on. This serves as a reminder that IPUMS family pointers for parents represent social in addition to biological relationships within households.

3 Users should note that, in IPUMS MEPS, the householder is not strictly the first person listed on the household roster.

4 The Current Population Survey provides such self-reported pointers for 2007-onward.

5 We define agreement as IPUMS-constructed pointers correctly predicting parental pointers on all non-missing rounds of a given survey year, and as correctly predicting spousal pointers on any non-missing round of a given survey year. This is because we expect marital instability to be more prevalent within a calendar year than changes in living arrangements with one’s own parents.
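The agreement definition in note 5 can be written as a small function: parental pointers must match on all non-missing rounds, spousal pointers on at least one. This is a minimal Python sketch of that definition, not the code used to produce Figure 2. It assumes that round-level values are directly comparable with the annual IPUMS pointer and that `None` marks a missing round, and the function name is hypothetical.

```python
def agrees(ipums_loc, reported_by_round, kind):
    """Apply the note 5 agreement rule for one person-year.

    kind is "parent" (match on all non-missing rounds) or "spouse"
    (match on any non-missing round). Returns None if every round
    is missing.
    """
    if kind not in ("parent", "spouse"):
        raise ValueError("kind must be 'parent' or 'spouse'")
    observed = [r for r in reported_by_round if r is not None]
    if not observed:
        return None
    matches = [r == ipums_loc for r in observed]
    return all(matches) if kind == "parent" else any(matches)


parent_ok = agrees(1, [1, None, 1], "parent")   # every observed round matches
parent_bad = agrees(1, [1, 2, None], "parent")  # one observed round differs
spouse_ok = agrees(2, [0, 2, None], "spouse")   # one matching round suffices
```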

Does 1 + 2 = 8? Automating QA/QC for Tabular Data

By Tracy Kugler and Tsu Zhu

The problem with OCR and numbers

To extract data tables from census reports only available as print documents, IPUMS IHGIS uses optical character recognition (OCR) software to automate the conversion of scanned images into digital representations of letters and numbers. OCR software has made great strides in accuracy for textual information by using dictionaries of known words to interpret uncertain letters. However, dictionaries do not help in distinguishing uncertain numerical digits. While a dictionary can suggest that the third character in “wh_t” should be an ‘a’ and not an ‘o’, there is no simple way to tell whether the third digit in “45_” should be a 3 or an 8. To ensure that IHGIS data are accurate, we must have confidence that each number has been recognized correctly and matches the number in the source document.

To address this gap, we developed an R package that leverages IHGIS structured metadata to identify logical relationships between cell counts and row/column totals and determine where cells don’t add up as expected. Often, a given cell participates in multiple relationships, which allows the package to use patterns among discrepancies to pinpoint and correct errors. The package can automatically identify and correct up to 95% of error cells, depending on the structure of relationships.
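To illustrate how overlapping relationships can pinpoint an error, the sketch below handles the simplest case: a grid of cells with row totals and column totals, where one cell was misread. It is written in Python rather than R and is not the IHGIS package's algorithm. It assumes the totals themselves were read correctly and only corrects a cell when exactly one row and one column are off by the same amount. The function name and the numbers are made up for illustration.

```python
def find_and_fix_single_error(cells, row_totals, col_totals):
    """Locate and correct one misread cell using row and column sums.

    Returns ((row, col), corrected_cells), or (None, cells) when the
    discrepancies do not point to a single cell.
    """
    row_diff = [t - sum(row) for row, t in zip(cells, row_totals)]
    col_diff = [t - sum(col) for col, t in zip(zip(*cells), col_totals)]
    bad_rows = [i for i, d in enumerate(row_diff) if d != 0]
    bad_cols = [j for j, d in enumerate(col_diff) if d != 0]
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j = bad_rows[0], bad_cols[0]
        if row_diff[i] == col_diff[j]:
            fixed = [list(row) for row in cells]
            fixed[i][j] += row_diff[i]  # shift the cell by the shared gap
            return (i, j), fixed
    return None, cells


# The true value 453 was read as 458 (a 3 mistaken for an 8).
ocr_cells = [[120, 458], [200, 310]]
location, corrected = find_and_fix_single_error(
    ocr_cells, row_totals=[573, 510], col_totals=[320, 763]
)
```

Here the first row and the second column are each off by 5, so the cell where they intersect is corrected back to 453. When several cells are wrong, or a total is itself misread, the pattern of discrepancies is more complex and this simple rule declines to make a correction.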

Identifying relationships from structured metadata

The R package currently relies on structured metadata generated by earlier stages in the IHGIS data processing pipeline to identify sum and total relationships among rows and columns. After tables are OCR’ed from source documents, we use a customized markup framework to generate metadata. We then convert the marked up files into CSV files with a standard structure, which serve as input to the quality assurance/quality control (QA/QC) process. The CSV files include hierarchical labels for categories on the columns and geographic units on the rows. Within the labels, blanks are used to indicate totals. The package identifies a column/row with a blank header cell as the sum of other columns/rows that share the same non-blank label(s) and have sub-category labels corresponding to the blank.
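One possible reading of the blank-label rule is sketched below in Python. Each column carries a tuple of hierarchical labels, a blank marks a total, and the components of a total are the columns that share its non-blank labels and fill in the first blank level. This is a hedged interpretation and not the package's implementation: the function name and the example labels are hypothetical, and the sketch assumes blanks only appear at the end of a label tuple.

```python
def components_of(total, labels):
    """Return the column labels that a blank-labelled total should sum."""
    k = total.index("")  # first blank level: the level being summed over
    prefix = total[:k]
    return [
        lab for lab in labels
        if lab[:k] == prefix
        and lab[k] != ""
        and all(part == "" for part in lab[k + 1:])
    ]


# Hypothetical two-level column labels (sex, age group).
labels = [
    ("", ""),
    ("Male", ""), ("Male", "0-14"), ("Male", "15+"),
    ("Female", ""), ("Female", "0-14"), ("Female", "15+"),
]

male_parts = components_of(("Male", ""), labels)
grand_parts = components_of(("", ""), labels)
```

Under this reading the grand total sums the two sex subtotals rather than the four leaf columns, which gives each cell more than one relationship to be checked against.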

Continue reading…

IPUMS Announces 2025 Research Award Recipients

IPUMS is excited to announce the winners of its annual IPUMS Research Awards. These awards honor both published research and nominated graduate student papers from 2025 that use IPUMS data to advance or deepen our understanding of social and demographic processes.

The 2025 competition awarded prizes for both published research and graduate student research (published or unpublished) in eight categories:

  • IPUMS USA: data from the U.S. decennial censuses (including full count data for 1850-1950) and American Community Survey Data
  • IPUMS CPS: monthly data from the Current Population Survey (back to 1976) and Annual Social and Economic supplement (back to 1962)
  • IPUMS International: harmonized data from censuses and labor force surveys around the world, contributed by more than 100 international statistical office partners, for 1960-forward
  • IPUMS Health Surveys: harmonized data from the U.S. National Health Interview Survey (NHIS) for 1963 onward and Medical Expenditure Panel Survey (MEPS) for 1996 onward
  • IPUMS Spatial: Census summary tables and GIS data from the US (IPUMS NHGIS) and around the world (IPUMS IHGIS), and measures of contextual determinants of health (IPUMS CDOH)
  • IPUMS Global Health: harmonized health survey data from around the world, including harmonized versions of the Demographic and Health Surveys (IPUMS DHS), Multiple Indicator Cluster Surveys (IPUMS MICS), and the Performance Monitoring for Action (IPUMS PMA)
  • IPUMS Time Use: time diary data from the American Time Use Survey (IPUMS ATUS), historical and contemporary time use data from the U.S. (IPUMS AHTUS), and around the world (IPUMS MTUS)
  • IPUMS Excellence in Research: The IPUMS mission of democratizing data is strengthened by broad representation among our data users and the research that we highlight. This award was created to recognize the diversity of scholars doing innovative research with IPUMS data. This category includes submissions from all IPUMS data collections.

The award committee received and reviewed hundreds of nominations for our 2025 competition. From these publications the award committees selected the 2025 honorees.

Continue reading…

IPUMS DHS Goes Global

By Miriam L. King and Sula Sarkar

IPUMS DHS now includes integrated variables for 84 countries (up from 51) and nearly 350 samples (up from 233), including new data from Latin America, Eastern Europe, Oceania, the Caribbean, and Central and East Asia. Providing DHS data in a form that facilitates micro-analyses across countries is one of IPUMS’ greatest strengths, so researchers will be excited to learn that they can now do even more! Our latest data release expands the scope of IPUMS DHS beyond its initial coverage of Africa, the Middle East, and South Asia and adds the latest samples for 12 countries previously in the database. Figure 1 shows the full geographic scope of IPUMS DHS and highlights newly added countries and previously included countries with new samples.

Figure 1: Countries included in IPUMS DHS

World map with countries that are new to IPUMS DHS, have new samples in IPUMS DHS, or have no new samples in IPUMS DHS filled in

Continue reading…

Adjust Monetary Values for IPUMS CPS

By Kari Williams with support from former IPUMS research staff member Danika Brockman

We love to extend useful functionality across multiple IPUMS data collections, so we were delighted to extend the Adjust Monetary Values (AMV) feature, which adjusts dollar values for inflation and was first developed for IPUMS USA, to IPUMS CPS. The initial release of the AMV feature in IPUMS CPS in 2023 provided adjustment for a limited number of variables. Late last year, we extended the feature to cover variables from the ASEC as well. This blog post provides a quick introduction to the AMV tool and step-by-step guidance for using the tool in IPUMS CPS – for full details on the feature, see our IPUMS working paper on the AMV feature.

The Basics

The AMV feature allows users to adjust the monetary variables in a customized dataset from IPUMS into constant dollars, so that all monetary variables for months and years of data in your downloaded data file are in comparable units. IPUMS CPS variables are adjusted to 2010 dollars using the Consumer Price Index for All Urban Consumers (CPI-U). When you add an inflation-adjusted version of a variable to your data extract, the IPUMS data access system applies the appropriate CPI-U adjustment factor for each year to the variable(s) you’ve selected and includes both the original variable and an inflation-adjusted version of that variable in your extract. The adjustment factor is only applied to codes that represent monetary values in the original variable. All missing data codes (e.g., NIU, “Refused”, “Don’t Know”, and “No response”) from the original variable are combined into a single NIU code consisting entirely of 9s in the adjusted version of the variable (which is two digits wider than the original variable).
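The adjustment logic described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the IPUMS extract system's code: the function name is hypothetical, the missing-data code is made up, and the factor for 2000 is a placeholder rather than a real CPI-U figure. The factor of 1.0 for 2010 follows from the 2010-dollar base, and the sketch does not attempt to reproduce how IPUMS rounds adjusted values.

```python
def adjust_value(value, year, factors, missing_codes, width):
    """Return the value in constant dollars, or the all-9s NIU code.

    width is the digit width of the original variable; the adjusted
    variable is two digits wider, so its NIU code has width + 2 nines.
    """
    niu = int("9" * (width + 2))
    # Missing codes, and years without a published factor, become NIU.
    if value in missing_codes or year not in factors:
        return niu
    return value * factors[year]


# Placeholder factors: only the 2010 base value of 1.0 is grounded.
factors = {2000: 1.25, 2010: 1.0}
missing = {99999999}  # hypothetical missing-data code, width 8

adjusted = adjust_value(40000, 2000, factors, missing, width=8)
niu_case = adjust_value(99999999, 2010, factors, missing, width=8)
```

Checking for missing codes before multiplying matters: applying a factor to a code such as 99999999 would turn a missing-data flag into what looks like a very large income.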

Note that IPUMS only adjusts monetary variables for years with a final published CPI-U. Final CPI-U values for a given year are typically published early in the next year (e.g., the 2025 CPI-U values are published in 2026). Notably for basic monthly CPS data, current-year samples will not be available for adjustment because the final CPI-U will not yet have been published. However, the reference period for the ASEC is the previous calendar year (e.g., 2024 is the reference year for the 2025 ASEC); the adjustment factor for the reference year has been published by the time the Census Bureau releases the ASEC data each September and we integrate them into IPUMS CPS. One quick way to check whether the CPI-U value has been published for a given year is to consult the IPUMS CPS CPI99 documentation. We also update the IPUMS CPS revision history to note when we have extended the AMV tool to cover an additional year of data. Any adjusted variables in your extract for samples that are not yet available for monetary adjustment will consist entirely of the adjusted variable NIU code (i.e., a string of 9s).

Continue reading…

Linking children and adolescents to their mothers using IPUMS MICS

By Anna Bolgrien

IPUMS MICS offers hundreds of harmonized variables related to children’s health and wellbeing that allow for rich and innovative research. From the IPUMS MICS website, users can browse variables and create custom data extracts within a selected unit of analysis. In order to conduct many analyses, however, users will want to combine and link datasets relating to different units of analysis available in MICS.

IPUMS MICS menu of units of analysis for data browsing

For example, to investigate how child characteristics are related to characteristics of their mother, users will need to download and link data between the Children (either 0-4 or 5-17) unit of analysis and the Women unit of analysis.

IPUMS MICS provides instructions for linking across units of analysis as a user note. This user note lists the variables available as linking keys for each unit of analysis, and is a general guide for linking across the units, such as linking household characteristics with individual person records.

In this blog post, we provide more detailed information on how to link children and adolescents to their mothers. Similar logic can be applied to link children to fathers or other caregivers in the household. As IPUMS MICS requires Stata to conduct harmonization, we provide example code in Stata syntax.

Continue reading…

New Tool! ATUS-CPS Linking Counts

By Sarah Flood

The team at IPUMS is excited to introduce something brand new! ATUS-CPS Linking Counts is an interactive tool for exploring the number of ATUS respondents who can be linked to specific CPS months. We know that linkages between ATUS and CPS have great potential for enabling exciting new research, but we also know firsthand how hard it can be to wrap your head around the panel component of the CPS, the relationship between ATUS and CPS, and the many possibilities for linking them. Even researchers who have deep knowledge of the ATUS and CPS may still wonder whether there is a sufficient number of cases to conduct an analysis of interest. This new tool helps address all of these challenges. It very quickly allows you to view the number of ATUS respondents who should appear in each CPS month and determine if there is sufficient sample size for a particular application of linked ATUS-CPS data.

Linking ATUS and CPS data opens up an incredible wealth of research questions. This tool allows users to specify and view different linking scenarios to assess the feasibility of various ATUS to CPS linkages. For example, you may want to investigate the relationship between food security in the CPS and shopping or eating-related behavior in the ATUS. This interactive tool would allow you to select only years of ATUS data that contain, for example, the Eating and Health module and view the CPS months in which the Food Security supplement was fielded to assess the sample sizes for your desired analysis. Figure 1 shows how you would select ATUS years of interest and find information about which ATUS modules were fielded in each year.

Figure 1. Selecting ATUS Years of Interest

drop-down menu displaying ATUS years with colored bubbles to indicate which ATUS modules are available in each year

Continue reading…