New Data! IPUMS International Spring 2025 Data Release

By Derek Burk, Lara Cleveland, Jane Lee, Rodrigo Lovaton, and Sula Sarkar

Great news for IPUMS International (IPUMSI) users! We have just added new harmonized census samples from Honduras (2013), Kenya (2019), Malawi (2018), Mongolia (2010, 2020), and Mozambique (2017) to our ever-expanding census and survey data collection. We now have an average of 4.5 censuses per country, and the Kenya census collection now spans 50 years!

This release also includes a large series of quarterly Labor Force Surveys from the Philippines (1997-2019). The 91 waves of the Philippines Labor Force Survey contain a total of 18 million person records.

Many thanks to the National Statistical Office partners in these countries for their ongoing contributions.

Tools for Combining Data Across IPUMS Global Health Surveys

By Miriam King, Devon Kristiansen, and Anna Bolgrien

IPUMS Global Health includes integrated data from three international health surveys: Demographic and Health Surveys (IPUMS DHS), Multiple Indicator Cluster Surveys (IPUMS MICS), and Performance Monitoring for Action (IPUMS PMA). All three surveys are nationally representative, primarily focus on low- and middle-income countries, and address issues related to the health and well-being of women and young children. These commonalities make combining integrated data across these data collections appealing. As Figure 1 shows, IPUMS DHS and IPUMS MICS cover different countries; combining them extends the geographic coverage of harmonized versions of data covering similar topics. Researchers can also combine data for those countries included in both IPUMS DHS and IPUMS MICS to provide additional observation points for time-series analyses.

Figure 1: Countries covered by IPUMS DHS and IPUMS MICS

Researchers who want to carry out cross-survey analyses face practical challenges. IPUMS imposes consistent variable names and codes within each type of survey (DHS, MICS, or PMA), but the harmonized variable names and codes differ between survey types. On each project’s website, the documentation for each variable highlights comparability issues to keep in mind when combining multiple samples, either within one type of survey or across survey types. IPUMS users must create a separate customized data file from each database and then merge those files. Subtle differences in question wording, skip patterns, geographic boundaries, and sampling procedures, such as the MICS practice of taking reports on child health from caretakers other than the biological mother, can also introduce inconsistencies and inadvertent errors.
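To make the rename-and-combine step concrete, here is a minimal Python sketch. It assumes each extract has already been read into a list of dictionaries (one per person record), and every column name in it (`cntry`, `child_fever`, `fever_2wk`, and so on) is hypothetical, not an actual IPUMS DHS or IPUMS MICS variable. Renaming columns only lines up the names; it does not resolve the differences in question wording, reporters, or sampling procedures, which is why each row keeps a `source` tag.

```python
from typing import Dict, List

Record = Dict[str, object]

# Hypothetical mappings from each survey's own column names to one shared
# set of analysis names. Real IPUMS variable names will differ.
COLUMN_MAPS: Dict[str, Dict[str, str]] = {
    "dhs": {"cntry": "country", "yr": "year", "child_fever": "fever"},
    "mics": {"country_code": "country", "survey_year": "year", "fever_2wk": "fever"},
}


def harmonize(records: List[Record], source: str) -> List[Record]:
    """Rename mapped columns to shared names; columns without a mapping are dropped."""
    mapping = COLUMN_MAPS[source]
    out: List[Record] = []
    for row in records:
        new_row: Record = {mapping[k]: v for k, v in row.items() if k in mapping}
        # Keep provenance so survey-specific differences stay visible in analysis.
        new_row["source"] = source
        out.append(new_row)
    return out


def combine(extracts: Dict[str, List[Record]]) -> List[Record]:
    """Stack harmonized rows from several surveys into one list."""
    combined: List[Record] = []
    for source, records in extracts.items():
        combined.extend(harmonize(records, source))
    return combined


if __name__ == "__main__":
    # Made-up rows purely for illustration.
    dhs_rows = [{"cntry": "A", "yr": 2014, "child_fever": 1}]
    mics_rows = [{"country_code": "B", "survey_year": 2019, "fever_2wk": 0}]
    for row in combine({"dhs": dhs_rows, "mics": mics_rows}):
        print(row)
```

Keeping every mapping in one dictionary puts all cross-survey naming decisions in a single place that can be checked against each variable's comparability notes.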

Even More IPUMS Data Available in the SDA Online Data Analysis Tool

By Daniel Backman

Beyond offering the ability to create and download customized datasets from the IPUMS microdata collections, we also support web-based analysis of the data through the SDA (Survey Documentation and Analysis) online data analysis tool. SDA empowers users to analyze IPUMS data directly from their web browsers without the need for additional software or advanced programming skills. Whether you’re a seasoned researcher or a student exploring data for the first time, the SDA tool makes it easier than ever to unlock insights from our datasets. If you’re a current SDA user and ready to get started, check out the new datasets from IPUMS CPS and IPUMS MEPS. Otherwise, read on to learn more about SDA and how to use this tool to analyze IPUMS data.

About IPUMS & SDA

What is SDA?

The SDA tool is a web-based interface that allows you to generate frequency tables, cross-tabulations, and summary statistics; create customized data visualizations, including bar charts, line graphs, and scatter plots; perform regression analysis; and export results as a CSV file for presentations or further analysis.

SDA makes data more accessible by letting users analyze data through a web interface without needing to use (or purchase!) statistical software. The tool includes detailed guidance on running analyses and manipulating variables. SDA also processes data very quickly in real time, making it ideal for use in the classroom or other interactive settings. See our data training exercises page for exercises that will guide you through using SDA to analyze IPUMS data.

Updated Land Cover Summaries for Census Tracts, County Subdivisions, Counties, and Places

By David Van Riper, ISRDI Director of Spatial Analysis

What’s new?

We just released updated land cover summaries for census tracts, county subdivisions, counties, and places. Our land cover summaries describe the proportion of a particular geographic unit (e.g., a county or a census tract) that is covered by a particular land cover class (e.g., deciduous forest, evergreen forest, or cultivated crops). This release provides users with land cover summaries from nine vintages of the National Land Cover Database (NLCD): 2001, 2004, 2006, 2008, 2011, 2013, 2016, 2019, and 2021. Summaries are available for 2010, 2020, and 2022 census tracts, county subdivisions, counties, and places. We include 2022 versions of these geographic units because that was the year the Census Bureau began identifying planning regions in Connecticut. These regions replaced Connecticut’s historical counties, which have long had no official administrative function. The switch to planning regions changed the unique identifiers for census tracts, county subdivisions, and counties.

Why did IPUMS NHGIS create these land cover summaries?

Land cover data is commonly used to study the impacts of natural events such as hurricanes, or of human modifications such as converting forest to agriculture or agricultural land to developed land. Land cover data is typically released as a high spatial resolution, gridded spatial dataset in which each grid cell (or pixel) is assigned to a land cover category (Figure 1, Panel A). The grid cells almost never align with the boundaries of geographic units, and the high spatial resolution yields massive files that can be slow to process. A single NLCD file is 25 gigabytes in size.

We summarized nine versions of the NLCD to multiple sets of geographic units so that users can easily integrate the data into analyses already structured around geographic units. This reduces the burden on individual users to create such summaries themselves.
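The core proportion calculation behind a land cover summary can be sketched in a few lines of Python. This minimal version assumes the land cover grid and a grid of geographic unit IDs are already aligned cell for cell, and that each pixel belongs wholly to one unit (or to none). It illustrates the idea only and is not the IPUMS NHGIS processing, which must also deal with the alignment problem and file sizes described above. The class codes 41, 42, and 82 follow the NLCD convention for deciduous forest, evergreen forest, and cultivated crops.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional, Sequence


def summarize_land_cover(
    land_cover: Sequence[Sequence[int]],
    unit_ids: Sequence[Sequence[Optional[str]]],
) -> Dict[str, Dict[int, float]]:
    """Return, for each geographic unit, the share of its pixels in each land cover class.

    Assumes both grids have identical shape and are aligned pixel for pixel;
    a unit ID of None marks a pixel outside every unit.
    """
    if len(land_cover) != len(unit_ids):
        raise ValueError("grids must have the same number of rows")
    counts: Dict[str, Counter] = defaultdict(Counter)
    for cover_row, unit_row in zip(land_cover, unit_ids):
        if len(cover_row) != len(unit_row):
            raise ValueError("grids must have the same number of columns")
        for land_class, unit in zip(cover_row, unit_row):
            if unit is not None:
                counts[unit][land_class] += 1
    # Convert pixel counts into proportions of each unit's total pixel count.
    shares: Dict[str, Dict[int, float]] = {}
    for unit, class_counts in counts.items():
        total = sum(class_counts.values())
        shares[unit] = {cls: n / total for cls, n in class_counts.items()}
    return shares


if __name__ == "__main__":
    # Toy 2x3 grids: 41 = deciduous forest, 42 = evergreen forest, 82 = cultivated crops.
    land_cover = [[41, 41, 82], [42, 82, 82]]
    unit_ids = [["A", "A", "B"], ["A", "B", None]]
    print(summarize_land_cover(land_cover, unit_ids))
```

Pixel counting is the simplest way to assign cells to units; a production workflow might instead weight pixels that straddle a boundary by their overlap area.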

Online IPUMS Document Collection

By Diana L. Magnuson, Curator and Historian, ISRDI

IPUMS now has an online IPUMS Document Collection for our ancillary census and survey materials collected by IPUMS International!

The IPUMS International manuscript collection

In 1999, with a social science infrastructure grant from the National Science Foundation (NSF), IPUMS International set out with a simple yet audaciously ambitious goal: preserve the world’s microdata resources and democratize access to those sources. Twenty-five years later, the project’s goals remain the same: collecting and preserving census and survey data and documentation, harmonizing those data, and disseminating the harmonized data free of charge.

IPUMS-I amassed tens of thousands of ancillary materials in support of its data harmonization work. These materials came from partner organizations: the United States Census Bureau (USCB), the United Nations Statistics Division (UNSD), the Latin American and Caribbean Demographic Center (CELADE), the East-West Center, the Centre Population et Développement (CEPED), and over one hundred national statistical agencies. Examples of this material include correspondence, maps, enumerator instructions, supervisor instructions, training materials, codebooks, publicity, reports, newspaper clippings, unpublished papers, census timetables, data processing materials, and technical manuals. The ancillary materials in the IPUMS collection attest to the varied technical, business, social, and economic aspects of conducting censuses and surveys across time and space.

A portion of IPUMS-I grant money has funded the curation and preservation of the ancillary materials acquired by the project. For over two decades, archival staff have been preserving thousands of unique pieces of census and survey documentation, creating bibliographic records using an extended Dublin Core profile that supports the use of controlled vocabularies to enhance findability for the project staff and outside users. The goal of this work was the creation of a simple, findable, searchable, and downloadable document access system.

Introducing the MEPS Variable Builder!

By Julia A. Rivera Drew

Earlier this year, IPUMS MEPS launched a new feature – the MEPS Variable Builder – to make it dramatically easier to create customized person-level variables that summarize information from the medical event and condition records and add them to your IPUMS extract. If you have ever thought about using the MEPS event and condition data but didn’t know where to begin because of the complexity of the data, the MEPS Variable Builder is for you!

The Medical Expenditure Panel Survey Household Component (MEPS-HC, referred to as MEPS here) provides comprehensive information on characteristics of people residing in responding households, as well as information about their medical encounters during the calendar year (e.g., office-based provider visits, emergency room (ER) visits, and hospitalizations) and the medical conditions associated with those encounters. This unique combination of information makes the MEPS data ideal for research questions that need detailed health care utilization and/or expenditure data alongside individual-level correlates of health. However, these rich data can be difficult to work with, creating barriers for researchers who wish to use them.

IPUMS MEPS created the MEPS Variable Builder to enable users to easily build person-level variables summarizing information from the MEPS-HC event and medical condition records, also known as “event summary variables.” Using a point-and-click interface, researchers can create custom event summary variables that count the number of events or sum expenditures across event records, filtered on selected characteristics of events and/or medical conditions. Users can then include these custom event summary variables in their IPUMS extract. At this time, the variable builder does not include prescribed medicines data.

In this blog post, we run through an example where we create a variable that is the sum of all expenditures paid for by Workers’ Compensation for medical visits due to a workplace injury.
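Conceptually, an event summary variable collapses many event records into one value per person: filter the events, then either count them or add up an amount. The Python sketch below shows that logic for the Workers’ Compensation example. The field names (`person_id`, `wc_paid`, `work_injury`) are hypothetical stand-ins, not actual MEPS or IPUMS MEPS variables, and the `work_injury` flag stands in for whatever event or condition characteristics a user selects in the Variable Builder. It is not how the tool itself is implemented.

```python
from typing import Callable, Dict, Iterable, Mapping

Event = Mapping[str, object]


def event_summary(
    events: Iterable[Event],
    person_ids: Iterable[str],
    value: Callable[[Event], float],
    keep: Callable[[Event], bool],
) -> Dict[str, float]:
    """Collapse event records to one value per person.

    Every listed person starts at 0, so people with no qualifying events
    still receive a value. Qualifying events for unlisted people are added too.
    """
    totals: Dict[str, float] = {pid: 0.0 for pid in person_ids}
    for event in events:
        if keep(event):
            pid = str(event["person_id"])
            totals[pid] = totals.get(pid, 0.0) + value(event)
    return totals


def work_injury(event: Event) -> bool:
    """Hypothetical filter: the event was due to a workplace injury."""
    return bool(event["work_injury"])


if __name__ == "__main__":
    # Made-up event records; field names are hypothetical.
    events = [
        {"person_id": "p1", "wc_paid": 120.0, "work_injury": True},
        {"person_id": "p1", "wc_paid": 0.0, "work_injury": False},
        {"person_id": "p2", "wc_paid": 300.0, "work_injury": True},
    ]
    people = ["p1", "p2", "p3"]
    # Sum of Workers' Compensation payments for workplace-injury events.
    print(event_summary(events, people, lambda e: float(e["wc_paid"]), work_injury))
    # Count of workplace-injury events per person.
    print(event_summary(events, people, lambda e: 1.0, work_injury))
```

Passing the filter and the value as functions lets one routine produce both kinds of event summary variables the text describes: counts of events and sums of expenditures.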

Census Data for Good: Analysis to Action

By Lara Cleveland

IPUMS International regularly asks representatives of National Statistical Offices (NSOs) around the world to share their data with the research community. While IPUMS offers a license payment to countries for the right to redistribute microdata, NSO representatives are most interested in how sharing data with IPUMS will benefit the people of their countries. After 30 years of harmonizing data that NSOs have shared with us, IPUMS can indeed point to innovative research from data users all over the world, many at major universities in these partner countries. Directors of statistical offices, especially those with close ties to academia, are thrilled that the data are used for scholarly scientific production and to educate the next generation. However, most of these leaders are much more interested in how data sharing leads to effective policy. And they want examples. They are essentially asking how the data have been “used for good,” as the original IPUMS tagline, “Use it for good!”, implores.

IPUMS supports the Sustainable Development Goals

In response, IPUMS has been following data-to-policy trails where we can find them. The United Nations’ efforts to establish and measure the Sustainable Development Goals (SDGs) have provided wins in this area. Early in the life of the SDGs, colleagues from the World Health Organization visited IPUMS to leverage detailed information in the occupational variables to locate the health workforce. Microdata from censuses helped them measure the density of a range of health worker classifications at subnational levels. The International Organization for Migration (IOM) did similar work to disaggregate census-based SDG indicators by migratory status. At the start of the pandemic, the United Nations Population Fund (UNFPA) used IPUMS census microdata to spin up a dashboard showing the living arrangements of older adults, again at subnational levels. Each of these applications of IPUMS International data resulted in policy recommendations, informed by additional data, additional policy research, and pilot projects.
