Accessing IPUMS NHGIS in R: A Primer

By Finn Roberts & Jonathan Schroeder

R users have a powerful new way to access IPUMS NHGIS!

The July 2023 release of ipumsr 0.6.0 includes a fully featured set of client tools enabling R users to get NHGIS data and metadata via the IPUMS API. Without leaving their R environment, users can find, request, download, and read in U.S. census summary tables, geographic time series, and GIS mapping files for years from 1790 through the present. This blog post gives an overview of the possibilities and describes how to get started.

What you can do with ipumsr

Request and download NHGIS data

You can use ipumsr to specify the parameters of an NHGIS data extract request and submit that request for processing by the IPUMS servers. You can request any of the data products that are available through the NHGIS Data Finder: summary tables, time series tables, and shapefiles. You can also specify general formatting parameters (e.g., file format or time series table layout) to customize the structure of your data extract.

Once you have specified a data extract, you can use a series of ipumsr functions to:

  • submit the extract request to the IPUMS servers for processing
  • check on the extract status
  • wait for the extract to complete
  • download the extract as soon as it’s ready
  • load the data into R with detailed data field descriptions.

This workflow allows you to go from a set of abstract NHGIS data specifications to analyzable data, all without having to leave your R session!

Continue reading…

Going Global: IPUMS International

By Diana Magnuson

[Image: the display case at IPUMS HQ, with a banner reading “Going Global: IPUMS International” and memorabilia from around the world]

A new exhibit, “Going Global: IPUMS International,” is now on display at IPUMS headquarters, housed at the University of Minnesota. The exhibit features pieces that tell the history and scope of IPUMS International.

Beginning in 1999 with a social science infrastructure grant from the National Science Foundation, IPUMS International had a simple yet audaciously ambitious goal: preserve the world’s microdata resources and democratize access to those resources. Twenty-four years later, the goals are: collecting and preserving census and survey data and documentation; harmonizing those data; and disseminating the harmonized data free of charge. The data series includes information on an impressive range of population characteristics, including fertility, nuptiality, life-course transitions, migration, labor-force participation, occupational structure, education, ethnicity, and household composition.

[Image: Dr. Bob McCaa standing behind a table with stacks of paper]

Source data for IPUMS International are generously provided by participating national statistical offices. Our staff develop and nurture relationships with representatives of NSOs from around the world. As IPUMS International got underway, co-principal investigator Dr. Bob McCaa, University of Minnesota Department of History, “proved to have formidable persuasive powers and managed to convince . . . agency directors of the benefits of preservation and access to scientific information.” Over time, IPUMS International developed a team of research scientists articulating to a broad international audience the significance of the IPUMS data collection, harmonization, and preservation work. Today, an NSF advisory committee, senior personnel including research scientists and data analysts, an external advisory panel, and graduate and undergraduate research assistants all support the work of IPUMS International.

Continue reading…

Preparing Time Diary Data to Create Tempograms and to Conduct Sequence Analysis

By Sarah Flood and Kamila Kolpashnikova

Time diary data: a unique opportunity

Time diary data offer researchers an opportunity to visualize daily life and demonstrate how people spend time in a way that just isn’t possible with other data. Respondents report every activity that they engage in (along with where they were and who they were with) over the course of the day, which means that time diaries can indicate how much time was spent in various activities as well as when activities occur during the day (i.e., timing) and the order in which they occur (i.e., sequencing). This blog post will describe how to transform IPUMS ATUS data to perform these types of analyses, illustrate how to create a tempogram (including sample code), and link to additional resources for creating tempograms and performing sequence analysis.

While there are several ways to leverage the unique properties of time diary data, analysts are increasingly interested in creating tempograms and conducting sequence analyses, both of which capitalize on the temporal specificity of time diary data. These techniques allow researchers to explore the timing and order of activities over the course of a day. Both creating tempograms and conducting sequence analysis require time units that are consistent across respondents. Most time diary data are not natively in this format.
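To make the idea of consistent time units concrete, here is a minimal Python sketch of the transformation. It is an illustration only, not the code from the full post: the diaries are invented, the activity labels are hypothetical, and the calculation is unweighted (an analysis of real ATUS data would apply the appropriate weights). Each diary is a list of episodes in the order they occurred, and each episode is expanded into one entry per minute so that every respondent ends up with a sequence of the same length.

```python
from collections import Counter

MINUTES_PER_DAY = 1440  # a full diary day covers 24 hours


def expand_episodes(episodes):
    """Turn ordered (activity, duration_in_minutes) episodes into a per-minute list."""
    sequence = []
    for activity, duration in episodes:
        sequence.extend([activity] * duration)
    if len(sequence) != MINUTES_PER_DAY:
        raise ValueError("episode durations must sum to 1440 minutes")
    return sequence


def tempogram(sequences):
    """For each minute, compute the share of respondents engaged in each activity."""
    n = len(sequences)
    shares = []
    for minute in range(MINUTES_PER_DAY):
        counts = Counter(seq[minute] for seq in sequences)
        shares.append({activity: count / n for activity, count in counts.items()})
    return shares


# Hypothetical diaries for two respondents (invented for illustration).
diaries = {
    "r1": [("sleep", 180), ("work", 480), ("leisure", 780)],
    "r2": [("sleep", 240), ("leisure", 1200)],
}

sequences = [expand_episodes(episodes) for episodes in diaries.values()]
shares = tempogram(sequences)
```

The per-minute sequences are the input that sequence analysis needs, and the per-minute shares are what a tempogram plots as stacked areas over the day.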

Continue reading…

IPUMS FAQ: Alternative Measures of Unemployment

By Matthew Bombyk

As part of the IPUMS mission to democratize data, our User Support team strives to answer your questions about the data. Over time, some questions are repeated. This blog post is an extension of an earlier series addressing frequently asked questions. Maybe you’ll learn something. Perhaps you’ll just find the information interesting. Regardless, we hope you enjoy it!

Here’s one of those questions:

How can I use IPUMS CPS to calculate the Alternative Measures of Unemployment published by the BLS?

Every month the Bureau of Labor Statistics (BLS) publishes a set of Alternative Measures of Labor Underutilization as part of its well-known Employment Situation News Release. A common question we are asked at IPUMS is how to calculate these rates using IPUMS CPS data. The “headline” unemployment figure is known as U-3 and is a straightforward calculation using only the main employment status variable, EMPSTAT. However, the other measures are not quite as simple. Nonetheless, these can be calculated using IPUMS CPS! Using the table below, you can calculate these rates using the public use microdata.
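As a rough illustration of the U-3 case, the sketch below computes a weighted unemployment rate from EMPSTAT alone. The records and weights are invented. The EMPSTAT code groupings (10 and 12 for employed; 20, 21, and 22 for unemployed) are assumptions based on our reading of the IPUMS CPS codes and should be checked against the variable documentation, as should the choice of weight. The other measures need additional variables and are not covered here.

```python
# Assumed EMPSTAT groupings; verify against the IPUMS CPS documentation.
EMPLOYED_CODES = {10, 12}        # at work; has job, not at work last week
UNEMPLOYED_CODES = {20, 21, 22}  # unemployed (experienced and new workers)


def u3_rate(records):
    """Weighted unemployed as a percentage of the weighted civilian labor force.

    `records` is an iterable of (empstat, weight) pairs.
    """
    employed = sum(w for empstat, w in records if empstat in EMPLOYED_CODES)
    unemployed = sum(w for empstat, w in records if empstat in UNEMPLOYED_CODES)
    labor_force = employed + unemployed
    return 100 * unemployed / labor_force


# Hypothetical person records: (EMPSTAT, person weight).
sample = [
    (10, 1500.0),
    (12, 500.0),
    (21, 150.0),
    (22, 50.0),
    (34, 800.0),  # not in the labor force, so excluded from both sums
]

rate = u3_rate(sample)
```

People outside the labor force drop out of both the numerator and the denominator, which is why only the EMPSTAT groupings matter for this measure.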

Continue reading…

Malaria Transmission in Context: Linking Health, Census, and Ecological Data

By Yara Ghazal, Ilyana Hohenkirk, Tracy Kugler, and Kelly Searle

Malaria, like many vector-borne diseases, impacts health, economic growth, and society. The burden of malaria incidence and death is concentrated in Sub-Saharan Africa; in 2020, 95% of all malaria cases and 96% of all deaths occurred in Sub-Saharan Africa (WHO, 2022). Malaria impacts not only population health but also the economic growth of these 32 countries. Malaria is estimated to slow economic growth in this region of Africa by up to 1.3% each year (CCP-JHU, 2015). Understanding malaria transmission is essential to ending its spread and creating a healthier and more prosperous future for developing nations.

The literature on malaria transmission patterns has shown that several environmental factors impact mosquito and parasite vital rates, and thus affect the transmission intensity, seasonality, and geographical distribution of malaria (Castro, 2017). Temperature and precipitation are the primary climate-based factors that influence malaria transmission patterns. Temperature creates geographical constraints for vector and parasite development. Increasing temperatures have been found to shorten mosquito maturation time and increase feeding frequency. However, areas of extremely high temperatures usually yield smaller, less fecund mosquitoes. In parallel, because mosquitoes often breed in pools formed by rainfall and flooding, the frequency, duration, and intensity of precipitation have a significant influence on mosquito populations.

Continue reading…

Guidance for Pooling Multiple Years of NHIS Data

By Julia A. Rivera Drew

Introduction

Depending on their research question, analysts will commonly pool multiple years of the National Health Interview Survey (NHIS) data together in order to increase sample sizes of particular subpopulations of interest, such as bisexual adults, immigrants, or pregnant women. The complex design of the NHIS, however, requires analysts to take additional steps to correctly construct and analyze pooled NHIS datasets. Moreover, planned changes to the NHIS design implemented in 2019, as well as changes made in response to the COVID-19 pandemic, require additional special handling to correctly analyze datasets combining multiple years of NHIS data. The objectives of this blog post are to: (1) share tips to correctly construct and analyze pooled NHIS datasets and (2) identify resources for more information.

Tips to Correctly Construct and Analyze Pooled NHIS Datasets

1. Create a pooled sampling weight to use with your pooled dataset.

In general, when pooling multiple years of NHIS data together, you will need to create a new sampling weight to use with the pooled sample. To create this new sampling weight, divide the appropriate sampling weight by the number of years within each distinct sample design period. For example, if one wished to estimate the number of children living in families with low or very low food security (FSSTAT) using pooled 2020-2021 NHIS data (e.g., similar to this report), one would need to create a new sampling weight by dividing the sampling weight identified under the “weights” tab for FSSTAT, SAMPWEIGHT, by the number of years pooled together from the same sampling design period (in this case, two). The sum of the pooled weights would then represent the average annual population size for the pooled time period, rather than the total cumulative population size for the pooled time period. For any given combination of variables, refer to information under the “weights” tab for the variables included in your analysis to help select the appropriate sampling weight. The distinct NHIS sample design periods are 1963-1974, 1975-1984, 1985-1994, 1995-2005, 2006-2015, 2016-2018, and 2019-present.
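The arithmetic for the 2020-2021 example can be sketched as follows. This is a minimal illustration with invented records and weights, covering only the case where all pooled years fall within a single sample design period. YEAR and SAMPWEIGHT follow the variable names used above, while POOLEDWEIGHT is a hypothetical name for the new weight.

```python
# Hypothetical records from two NHIS years in the same design period.
records = [
    {"YEAR": 2020, "SAMPWEIGHT": 3000.0},
    {"YEAR": 2020, "SAMPWEIGHT": 5000.0},
    {"YEAR": 2021, "SAMPWEIGHT": 4000.0},
    {"YEAR": 2021, "SAMPWEIGHT": 4000.0},
]

# Number of years pooled together (two in this example).
years_pooled = len({record["YEAR"] for record in records})

# Divide each original weight by the number of pooled years.
for record in records:
    record["POOLEDWEIGHT"] = record["SAMPWEIGHT"] / years_pooled

original_total = sum(record["SAMPWEIGHT"] for record in records)
pooled_total = sum(record["POOLEDWEIGHT"] for record in records)
```

In this toy data each year's weights sum to 8,000, so the original weights sum to 16,000 across the pooled file (the cumulative total), while the pooled weights sum to 8,000 (the average annual population size).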

Continue reading…

New to IPUMS USA: The Adjust Monetary Values Feature

By Danika Brockman and the Adjust Monetary Values Team

Introducing the Adjust Monetary Values feature

The team at IPUMS is excited to introduce a brand-new extract feature, Adjust Monetary Values, which gives you the option to adjust monetary variables to constant units in the IPUMS data extract system. We know firsthand how tedious it can be to compare things like income and rent over time when you have to manually adjust for inflation. This feature allows you to request pre-adjusted monetary variables (e.g., INCWAGE) as part of your extract request! The feature is first being released on IPUMS USA, where you will be able to adjust monetary variables to 2010 dollars.

What does the Adjust Monetary Values feature do?

This feature gives you the option to adjust the monetary variables you have added to your data cart into constant dollars, so that all samples in your data cart are comparable across time for your selected monetary variables. IPUMS USA variables are adjusted to 2010 dollars using the Consumer Price Index for All Urban Consumers (CPI-U). For more information about why the CPI-U was chosen as the pricing index for this feature, see the Monetary Adjustment Feature page.

When you add an inflation-adjusted version of a variable to your data cart, the IPUMS data extract system applies the appropriate CPI-U adjustment factor for each sample year to the variable(s) you’ve selected. Your extract will include both the original monetary variable and the inflation-adjusted monetary variable. Special codes (e.g., NIU, missing) will not be affected by the inflation adjustment. Inflation-adjusted versions of variables will assign all specialty (i.e., non-monetary) codes to a code composed exclusively of 9s with a width two digits greater than the largest value in the original variable (e.g., a variable where the maximum monetary value is “8500” would assign all specialty codes to “999999” and apply a label of “Non-monetary”). For details on the original specialty codes and their labels, consult the documentation for the original variable on the IPUMS USA website or cross-tab the adjusted and original variables in your statistical program (note that you may want to include a qualifying if statement so you see only the non-monetary codes).
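The coding rule for specialty codes can be sketched in a few lines of Python. This is an illustration of the logic described above, not the extract system's implementation: the adjustment factor of 1.25 is made up rather than a real CPI-U factor, the value 9999 is a hypothetical specialty code, and rounding adjusted values to whole dollars is our assumption.

```python
def specialty_fill_code(max_monetary_value):
    """Build an all-9s code two digits wider than the largest monetary value."""
    width = len(str(max_monetary_value)) + 2
    return int("9" * width)


def adjust_value(value, factor, specialty_codes, fill_code):
    """Apply the adjustment factor to monetary values; recode specialty codes."""
    if value in specialty_codes:
        return fill_code
    return round(value * factor)  # rounding to whole dollars is an assumption


# Hypothetical inputs for illustration only.
original_values = [8500, 1200, 9999]
specialty_codes = {9999}   # hypothetical non-monetary code (e.g., NIU)
factor = 1.25              # made-up factor, not an actual CPI-U value

fill_code = specialty_fill_code(8500)
adjusted_values = [
    adjust_value(value, factor, specialty_codes, fill_code)
    for value in original_values
]
```

With a maximum monetary value of 8500 (four digits), the fill code is six digits wide, matching the “999999” example above. Because every specialty code collapses to that single value, the original variable is still needed to tell the non-monetary categories apart.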

Continue reading…