IPUMS Announces 2024 Research Award Recipients

IPUMS is excited to announce the winners of its annual IPUMS Research Awards. These awards honor both published research and nominated graduate student papers from 2024 that use IPUMS data to advance or deepen our understanding of social and demographic processes.

The 2024 competition awarded prizes for the best published and best graduate student research in eight categories, each associated with specific IPUMS data collections:

  1. IPUMS USA, providing data from the U.S. decennial censuses and the American Community Survey, including full count data, from 1850 to the present.
  2. IPUMS CPS, providing data from the monthly U.S. labor force survey, the Current Population Survey (CPS), from 1962 to the present.
  3. IPUMS International, providing harmonized data contributed by more than 100 international statistical office partners; it currently includes information on over 1 billion people in more than 547 censuses and surveys from around the world, from 1960 forward.
  4. IPUMS Health Surveys, which makes available the U.S. National Health Interview Survey (NHIS) and the Medical Expenditure Panel Survey (MEPS).
  5. IPUMS Spatial, covering IPUMS NHGIS, IPUMS IHGIS, and IPUMS CDOH. NHGIS includes GIS boundary files from 1790 to the present; IHGIS provides data tables from population and housing censuses as well as agricultural censuses from around the world; CDOH provides access to measures of disparities, policies, and counts, by state and county, for historically marginalized populations in the US.
  6. IPUMS Global Health, providing harmonized data from the Demographic and Health Surveys (DHS), Multiple Indicator Cluster Surveys (MICS), and the Performance Monitoring for Action (PMA) data series, for low- and middle-income countries.
  7. IPUMS Time Use, providing time diary data from the U.S. and around the world from 1930 to the present.
  8. IPUMS Excellence in Research. The IPUMS mission of democratizing data demands that we increase representation of scholars from groups that are systemically excluded in research spaces. This award is an opportunity to highlight and reward outstanding work using any of the IPUMS data collections by authors who are underrepresented in social science research*.

Over 1,300 publications based on IPUMS data appeared in journals, magazines, and newspapers worldwide last year. From these publications and from nominated graduate student papers, the award committees selected the 2024 honorees.

Continue reading…

IPUMS CPS Checks on Basic Monthly Data

By Sarah Flood, Renae Rodgers, and Kari Williams

Federal data are critical for understanding much about the US population, from its size and composition to its health and employment. The Current Population Survey (CPS) is our nation’s official source of information about the labor force. At the beginning of each month, we eagerly await the first Friday when the Employment Situation Summary (aka the monthly jobs report) will be released (it isn’t just us, right??). The monthly snapshot of the US labor force serves as a bellwether for how our economy is faring.

The Wednesday after the jobs report is released, we at IPUMS clear the decks in preparation for the release of the CPS Basic Monthly Survey (BMS) by the Census Bureau. The CPS BMS is the individual-level data from which the jobs report is generated. Our goal is always to process these data as soon as they’re released by the Census Bureau so that we can deliver them to IPUMS CPS users as quickly as possible. Those who rely on CPS BMS data each month might be familiar with coping strategies while waiting for the data: obsessive page refreshing, some nervous pacing, maybe wondering why they haven’t yet been released (iykyk).

While quickly processing CPS Basic Monthly data is a priority, so, too, is ensuring data quality. Each month, we carefully inspect CPS BMS data at several points in our process. First, we review all of the variables for codes that are undocumented or have suspicious frequencies. Second, we rely on a suite of tools during our integration process that alert us to any codes in the data that we haven’t accounted for in our variable-level harmonizations. After harmonization, we compare univariate statistics from the newest month of data to those from the previous month. Generally, we expect very little change across months, and we have built tools that flag variable-level differences above a certain threshold as well as new codes on either end of the distribution.
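To make the month-over-month comparison concrete, here is a minimal sketch of that kind of check. It is not the IPUMS tooling: the function name `compare_months`, the threshold value, and the toy codes are all assumptions made for illustration. It compares each code's share of records across two months, flags shifts larger than a threshold, and lists codes that appear only in the newest month.

```python
from collections import Counter


def compare_months(prev_codes, new_codes, threshold=0.02):
    """Compare one variable's code distribution across two months.

    Hypothetical helper for illustration only. Returns a tuple
    (unseen, flagged): `unseen` lists codes present in the newest month
    but absent from the previous one; `flagged` maps each code to its
    change in share when the absolute change exceeds `threshold`.
    """
    prev_counts = Counter(prev_codes)
    new_counts = Counter(new_codes)
    prev_total = sum(prev_counts.values())
    new_total = sum(new_counts.values())

    # Codes that show up this month but never appeared last month.
    unseen = sorted(set(new_counts) - set(prev_counts))

    flagged = {}
    for code in set(prev_counts) | set(new_counts):
        # Counter returns 0 for missing codes, so new codes get share 0.
        diff = new_counts[code] / new_total - prev_counts[code] / prev_total
        if abs(diff) > threshold:
            flagged[code] = round(diff, 4)
    return unseen, flagged


# Toy data for a made-up categorical variable (codes are invented).
previous = [10] * 60 + [21] * 5 + [32] * 35
newest = [10] * 58 + [21] * 5 + [32] * 30 + [99] * 7

unseen, flagged = compare_months(previous, newest, threshold=0.03)
print("New codes:", unseen)
print("Shares that moved more than the threshold:", flagged)
```

In this toy run, code 99 is reported as new and codes 32 and 99 are flagged for review, while the small shift in code 10 stays under the threshold.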

Continue reading…

Unlocking Spatial and Social Data with R: Introducing the R Spatial Notebook Series

By Kate Vavra-Musser

Introduction: What is the R Spatial Notebooks Project?

The R Spatial Notebooks Project is a series of R code notebooks, structured like a textbook, designed to guide users through the intricacies of data extraction, integration, cleaning, analysis, and visualization using R. The notebooks are specifically tailored for social science research and applications using spatial data. The modular textbook-style structure is designed for comprehensive skill development by working through sequences of notebooks. The project was developed through a partnership between the Institute for Social Research and Data Innovation (ISRDI), which houses IPUMS, and the Institute for Geospatial Understanding through an Integrated Discovery Environment (I-GUIDE). IPUMS provides census and survey data from around the world integrated across time and space. I-GUIDE is cyberinfrastructure that combines distributed geospatial data with computing for researchers, students, and policymakers.

The initial R Spatial Notebooks release includes roughly 20 freely available notebooks on topics including IPUMS data extraction via API, accessing open-source data, data cleaning, foundational spatial data principles, exploratory data analysis, and mapping.

Continue reading…

New Data! IPUMS International Spring 2025 Data Release

By Derek Burk, Lara Cleveland, Jane Lee, Rodrigo Lovaton, and Sula Sarkar

Great news for IPUMS International (IPUMSI) users! Our ever-expanding census and survey data collection has just released new harmonized census samples from Honduras (2013), Kenya (2019), Malawi (2018), Mongolia (2010, 2020), and Mozambique (2017). We now have an average of 4.5 censuses per country. The Kenya census collection now spans 50 years!

This release also includes a large series of quarterly Labor Force Surveys from the Philippines (1997-2019). The 91 waves of the Philippines Labor Force Survey contain a total of 18 million person records.

Many thanks to the National Statistical Office partners in these countries for their ongoing contributions.

Continue reading…

Tools for Combining Data Across IPUMS Global Health Surveys

By Miriam King, Devon Kristiansen, and Anna Bolgrien

IPUMS Global Health includes integrated data from three international health surveys: Demographic and Health Surveys (IPUMS DHS), Multiple Indicator Cluster Surveys (IPUMS MICS), and Performance Monitoring for Action (IPUMS PMA). All three surveys are nationally representative, primarily focus on low- and middle-income countries, and address issues related to the health and well-being of women and young children. These commonalities make combining integrated data across these data collections appealing. As Figure 1 shows, IPUMS DHS and IPUMS MICS cover different countries; combining them extends the geographic coverage of harmonized versions of data covering similar topics. Researchers can also combine data for those countries included in both IPUMS DHS and IPUMS MICS to provide additional observation points for time-series analyses.

Figure 1: Countries covered by IPUMS DHS and IPUMS MICS

World map with the countries included in IPUMS MICS and IPUMS DHS shaded in

Researchers who want to carry out cross-survey analyses face practical challenges. IPUMS imposes consistent variable names and codes within one kind of survey (DHS, MICS, or PMA), but harmonized variable names and codes differ between these surveys. On each project’s website, the documentation for each variable highlights comparability issues to keep in mind when combining multiple samples, either within one type of survey or across survey types. IPUMS users must make separate customized data files from each database and merge those files. And subtle differences in question wording, skip patterns, geographic boundaries, and sampling procedures (such as MICS taking reports on child health from caretakers other than the biological mother) can introduce inconsistencies and inadvertent errors.
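As a rough illustration of the merge step, the sketch below stacks records from two separately created extracts after mapping survey-specific variable names and codes onto shared ones. Everything in it is hypothetical: the variable names (`KIDAGE_D`, `FEVER_M`, and so on), the code values, and the `harmonize` helper are invented for this example and are not actual IPUMS DHS or IPUMS MICS variables.

```python
# Hypothetical crosswalk: survey-specific variable name -> shared name.
RENAME = {
    "dhs": {"KIDAGE_D": "child_age", "FEVER_D": "fever"},
    "mics": {"KIDAGE_M": "child_age", "FEVER_M": "fever"},
}

# Hypothetical recodes: survey-specific code -> shared label, keyed by
# shared variable name. Variables without an entry are copied unchanged.
RECODE = {
    "dhs": {"fever": {0: "no", 1: "yes", 8: "unknown"}},
    "mics": {"fever": {1: "yes", 2: "no", 9: "unknown"}},
}


def harmonize(records, survey):
    """Rename and recode one survey's records into the shared layout."""
    harmonized = []
    for record in records:
        # Keep the survey of origin so analyses can account for
        # differences in wording, universe, and sampling.
        row = {"survey": survey}
        for old_name, new_name in RENAME[survey].items():
            value = record[old_name]
            codes = RECODE[survey].get(new_name)
            row[new_name] = codes[value] if codes else value
        harmonized.append(row)
    return harmonized


# Toy stand-ins for two separately downloaded extracts.
dhs_rows = [{"KIDAGE_D": 2, "FEVER_D": 1}, {"KIDAGE_D": 4, "FEVER_D": 0}]
mics_rows = [{"KIDAGE_M": 3, "FEVER_M": 2}]

pooled = harmonize(dhs_rows, "dhs") + harmonize(mics_rows, "mics")
for row in pooled:
    print(row)
```

Renaming and recoding only address the mechanical part of the problem; the comparability notes in each variable's documentation still need to be read before pooling.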

Continue reading…

Even More IPUMS Data Available in the SDA Online Data Analysis Tool

By Daniel Backman

Beyond offering the ability to create and download customized datasets from the IPUMS microdata collections, we also support web-based analysis of the data through the SDA (Survey Documentation and Analysis) online data analysis tool. SDA empowers users to analyze IPUMS data directly from their web browsers without the need for additional software or advanced programming skills. Whether you’re a seasoned researcher or a student exploring data for the first time, the SDA tool makes it easier than ever to unlock insights from our datasets. If you’re a current SDA user and ready to get started, check out the new datasets from IPUMS CPS and IPUMS MEPS. Otherwise, read on to learn more about SDA and how to use this tool to analyze IPUMS data.

About IPUMS & SDA

What is SDA?

The SDA tool is a web-based interface that allows you to generate frequency tables, cross-tabulations, and summary statistics; create customized data visualizations, including bar charts, line graphs, and scatter plots; perform regression analysis; and export results as a CSV file for presentations or further analysis.

SDA increases the accessibility of data by allowing users to analyze data through a web interface without needing to use (or purchase!) statistical software. There is detailed guidance on how to use the tool for analyses and how to manipulate variables. Additionally, it provides exceptionally fast real-time processing of data, making it ideal for use in the classroom or other interactive settings. See our data training exercises page for exercises that will guide you through using SDA to analyze IPUMS data.

Continue reading…

Updated Land Cover Summaries for Census Tracts, County Subdivisions, Counties, and Places

By David Van Riper, ISRDI Director of Spatial Analysis

What’s new?

We just released updated land cover summaries for census tracts, county subdivisions, counties, and places. Our land cover summaries describe the proportion of a particular geographic unit (e.g., a county or a census tract) that is covered by a particular land cover class (e.g., deciduous forest, evergreen forest, or cultivated crops). This release provides users with land cover summaries from nine vintages of the National Land Cover Database (NLCD): 2001, 2004, 2006, 2008, 2011, 2013, 2016, 2019, and 2021. Summaries are available for 2010, 2020, and 2022 census tracts, county subdivisions, counties, and places. We include 2022 versions of these geographic units because that was the year the Census Bureau began identifying planning regions in Connecticut. These regions replaced Connecticut’s historical counties, which have long had no official administrative function. These new planning regions changed the unique identifiers for census tracts, county subdivisions, and counties.

Why did IPUMS NHGIS create these land cover summaries?

Land cover data is commonly used to study the impacts of natural events such as hurricanes or human modifications such as converting forest to agriculture or agricultural land to developed land. Land cover data is typically released as a high spatial resolution, gridded spatial dataset where each grid cell (or pixel) is assigned to a land cover category (Figure 1, Panel A). The gridded data almost never align with the geographic units, and the high spatial resolution yields massive files that can be slow to process. A single NLCD file is 25 gigabytes in size.

We summarized nine versions of the NLCD to multiple sets of geographic units so that users can easily integrate the data into analyses already structured around geographic units. This reduces the burden on individual users to create such summaries themselves.
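For readers curious what such a summary involves, here is a minimal sketch of computing the share of each geographic unit covered by each land cover class. It is not the NHGIS workflow. It assumes, as a simplification, that every grid cell has already been assigned to exactly one unit, which sidesteps the misalignment between grid cells and unit boundaries. The function name `land_cover_shares` and the tiny grids are made up for illustration; the class codes follow the NLCD legend (41 deciduous forest, 42 evergreen forest, 82 cultivated crops).

```python
from collections import Counter, defaultdict


def land_cover_shares(cover_grid, unit_grid):
    """Return {unit: {land_cover_class: share of the unit's cells}}.

    Hypothetical helper: `cover_grid` holds a land cover class per cell
    and `unit_grid` holds the geographic unit ID for the same cell.
    """
    counts = defaultdict(Counter)
    for cover_row, unit_row in zip(cover_grid, unit_grid):
        for cover, unit in zip(cover_row, unit_row):
            counts[unit][cover] += 1

    shares = {}
    for unit, class_counts in counts.items():
        total = sum(class_counts.values())
        shares[unit] = {cls: n / total for cls, n in class_counts.items()}
    return shares


# Toy 2 x 4 grids: land cover class per cell and unit ID per cell.
cover = [
    [41, 41, 82, 82],
    [41, 42, 82, 82],
]
units = [
    ["A", "A", "B", "B"],
    ["A", "A", "B", "B"],
]

shares = land_cover_shares(cover, units)
for unit, unit_shares in shares.items():
    print(unit, unit_shares)
```

Here unit A is three-quarters deciduous forest and one-quarter evergreen forest, and unit B is entirely cultivated crops. A real workflow would operate on the full raster with GIS tooling rather than nested Python lists.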

Continue reading…