Automating monthly workflows using IPUMS CPS and the IPUMS Microdata Extract API

By Renae Rodgers

As many readers will know, the Current Population Survey (CPS) is a monthly labor force survey that is, among other things, the data source for the monthly jobs report (more formally, the Employment Situation report) from the Bureau of Labor Statistics.

In this blog post, I will show you how to use IPUMS CPS data, the IPUMS Microdata Extract API, and the ipumspy Python library to create a reproducible, sustainable monthly workflow that updates previous analyses with new data.

If this is not your first CPS rodeo, you may already have a monthly workflow for working with IPUMS CPS data that suits your needs just fine, perhaps written in Stata. Did you know you can use ipumspy to make IPUMS CPS extracts from Stata?! Check out the setup instructions and template .do file in this blog post and optimize your monthly analysis even more with the IPUMS Microdata Extract API!

But I digress. I will first walk through a simple analysis using the IPUMS Microdata Extract API and ipumspy. I will then show you how to package that workflow so that it can be rerun each month, as soon as the newest data become available from IPUMS CPS, to refresh the analysis.
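One way to make the monthly rerun hands-off is to have the script check whether a new month of data has appeared before doing any work. The sketch below is a minimal, standard-library-only illustration, not part of ipumspy. It assumes that basic monthly IPUMS CPS sample IDs follow the cpsYYYY_MMb pattern (e.g., cps2022_06b), that you can obtain the newest available sample ID from somewhere (for example, the sample ID dictionary in ipumspy's utilities module), and that the last processed sample is stored in a small JSON state file. The names month_key, needs_update, record_update, and run_monthly, and the state-file layout, are hypothetical.

```python
import json
import re
from pathlib import Path

# Assumption: basic monthly CPS sample IDs look like "cps2022_06b".
_MONTHLY_ID = re.compile(r"^cps(\d{4})_(\d{2})b$")


def month_key(sample_id: str) -> tuple[int, int]:
    """Return (year, month) for a basic monthly CPS sample ID."""
    match = _MONTHLY_ID.match(sample_id)
    if match is None:
        raise ValueError(f"not a basic monthly CPS sample ID: {sample_id!r}")
    return int(match.group(1)), int(match.group(2))


def needs_update(latest_sample_id: str, state_path: Path) -> bool:
    """True if latest_sample_id is newer than the sample recorded in state_path."""
    if not state_path.exists():
        return True  # first run: nothing has been processed yet
    last_done = json.loads(state_path.read_text())["last_sample"]
    return month_key(latest_sample_id) > month_key(last_done)


def record_update(latest_sample_id: str, state_path: Path) -> None:
    """Remember which sample the analysis was last refreshed with."""
    state_path.write_text(json.dumps({"last_sample": latest_sample_id}))


def run_monthly(latest_sample_id: str, state_path: Path) -> bool:
    """Refresh the analysis only when a new month of data is available.

    Returns True if a refresh happened, False if there was nothing new.
    """
    if not needs_update(latest_sample_id, state_path):
        return False
    # Placeholder: define, submit, and download the extract, then rerun the analysis.
    record_update(latest_sample_id, state_path)
    return True
```

Scheduled with something like cron or Windows Task Scheduler, a script built around run_monthly would do nothing until a new sample ID shows up and would then refresh the analysis exactly once for that month.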

An example IPUMS CPS and IPUMS Microdata Extract API workflow: teleworking due to COVID-19

Let’s suppose that we’re interested in trends in telework due to COVID-19 over the course of the pandemic. The IPUMS CPS variable COVIDTELEW indicates whether the respondent worked from home at any time during the past 4 weeks because of the COVID-19 pandemic. This example will show the overall trend in remote work due to COVID-19 as well as how teleworking breaks down by educational attainment. First, we’ll define an IPUMS CPS extract that contains the COVIDTELEW and EDUC variables for all months from May 2020 to June 2022.
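Defining this extract mostly comes down to listing the right sample IDs. The following is a sketch under two assumptions: basic monthly IPUMS CPS samples are named cpsYYYY_MMb (e.g., cps2020_05b for May 2020), and the ipumspy v0.2.x interface (IpumsApiClient and CpsExtract) is in use. monthly_cps_samples and submit_telework_extract are hypothetical helper names, and actually submitting an extract requires your own IPUMS API key.

```python
from datetime import date


def monthly_cps_samples(start: date, end: date) -> list[str]:
    """Basic monthly CPS sample IDs (cpsYYYY_MMb) from start through end, by month."""
    ids = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        ids.append(f"cps{year}_{month:02d}b")
        month += 1
        if month > 12:  # roll over into January of the next year
            year, month = year + 1, 1
    return ids


def submit_telework_extract(api_key: str, samples: list[str], download_dir: str):
    """Submit, wait for, and download an IPUMS CPS extract (needs ipumspy and an API key)."""
    # Imported here so the sample-ID helper above works without ipumspy installed.
    # CpsExtract is the ipumspy v0.2.x extract class; check the API of the version you have.
    from ipumspy import CpsExtract, IpumsApiClient

    ipums = IpumsApiClient(api_key)
    extract = CpsExtract(
        samples,
        ["COVIDTELEW", "EDUC"],
        description="Telework due to COVID-19 by educational attainment",
    )
    ipums.submit_extract(extract)
    ipums.wait_for_extract(extract)
    ipums.download_extract(extract, download_dir=download_dir)
    return extract


# May 2020 through June 2022, inclusive.
samples = monthly_cps_samples(date(2020, 5, 1), date(2022, 6, 1))
```

Because the sample list is computed from dates rather than typed out by hand, extending the analysis to a newly released month only requires moving the end date.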

Helpful Functions for Constructing Reproducible Workflows in the Absence of an IPUMS Microdata Metadata API

By Renae Rodgers

As you may have heard, the IPUMS Microdata Extract API is now in beta for our IPUMS USA and IPUMS CPS collections! Yay! You may have also heard that an IPUMS Metadata API is not yet available. Bummer. This means that users still need to do data discovery via the IPUMS websites, which involves a lot of clicking. It also makes it difficult to set up a reproducible workflow that can be updated as new data become available (and who doesn’t want that?). This blog post will show you some easy functions you can use to retrieve sample metadata for a monthly or annual workflow that doesn’t require visiting an IPUMS website, and will walk through some ipumspy functionality for accessing metadata on variables already included in your IPUMS extracts. The workflow in this blog post uses ipumspy v0.2.1.

If Stata is your stats package of choice, you can leverage ipumspy to make IPUMS extracts from a Stata do file! You can spruce up your workflow with any of the functions in this blog post by incorporating them into the template .do file offered in our blog post on making IPUMS extracts from Stata.

Sample Metadata

To get started, import the utilities module from ipumspy. All of the helper functions in this blog post use the CollectionInformation class from this module. The sample_ids attribute of CollectionInformation returns a dictionary with sample descriptions as keys and sample IDs as values. This information is pulled from the sample ID page for the specified IPUMS data collection: this page is the source of the sample_ids dictionary for IPUMS CPS, and this page is the source of the sample_ids dictionary for IPUMS USA.

from ipumspy import utilities

Functions for retrieving IPUMS CPS sample IDs

First, let’s take a look at the sample ID dictionary for IPUMS CPS.
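With the dictionary in hand, a few small functions can pull out the IDs a monthly or annual workflow needs. This is a sketch rather than part of ipumspy: it assumes basic monthly IDs look like cps2022_06b and ASEC IDs like cps2022_03s, the function names are hypothetical, and the example dictionary uses made-up descriptions in place of the real ones from the IPUMS CPS sample ID page. With ipumspy installed, the real dictionary should come from utilities.CollectionInformation("cps").sample_ids (passing the collection name "cps" is an assumption here).

```python
import re

# Assumed ID formats: basic monthly samples look like "cps2022_06b",
# ASEC samples like "cps2022_03s".
MONTHLY_ID = re.compile(r"^cps\d{4}_\d{2}b$")
ASEC_ID = re.compile(r"^cps\d{4}_03s$")


def monthly_sample_ids(sample_ids: dict[str, str]) -> list[str]:
    """Basic monthly sample IDs, oldest first.

    The fixed-width cpsYYYY_MM prefix means string order equals date order.
    """
    return sorted(i for i in sample_ids.values() if MONTHLY_ID.match(i))


def latest_monthly_sample_id(sample_ids: dict[str, str]) -> str:
    """Most recent basic monthly sample ID."""
    ids = monthly_sample_ids(sample_ids)
    if not ids:
        raise ValueError("no basic monthly samples found")
    return ids[-1]


def monthly_sample_ids_since(sample_ids: dict[str, str], first_id: str) -> list[str]:
    """Basic monthly sample IDs from first_id through the latest available month."""
    return [i for i in monthly_sample_ids(sample_ids) if i >= first_id]


def latest_asec_sample_id(sample_ids: dict[str, str]) -> str:
    """Most recent ASEC sample ID, for an annual workflow."""
    ids = sorted(i for i in sample_ids.values() if ASEC_ID.match(i))
    if not ids:
        raise ValueError("no ASEC samples found")
    return ids[-1]


# Illustrative stand-in for the real sample_ids dictionary; the keys are made up.
example_ids = {
    "ASEC 2021": "cps2021_03s",
    "December 2021": "cps2021_12b",
    "ASEC 2022": "cps2022_03s",
    "May 2022": "cps2022_05b",
    "June 2022": "cps2022_06b",
}

print(latest_monthly_sample_id(example_ids))  # cps2022_06b
```

Matching on the sample IDs rather than the descriptions keeps the helpers independent of how the descriptions happen to be worded on the sample ID page.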

IPUMS International has brand-new low-level geographic variables and shapefiles

By Quinn Heimann

Map of Myanmar Internet Access: percentage of households with internet access by township, 2014 Myanmar census

An ongoing goal and challenge for IPUMS-International (IPUMSI) is providing users with the most detailed geography possible. A particular obstacle is the set of confidentiality requirements agreed upon in order to distribute these census and survey samples. Nevertheless, IPUMSI has started releasing lower-level geographic variables for samples where the data are sufficiently detailed and confidentiality thresholds are still met. As of spring 2022, twenty samples covering ten countries across Africa and Asia have been released with third administrative level geographic data. Accompanying shapefiles are also being distributed to supplement these variables; used together with the more granular geographic variables, they make it possible to map population trends in greater detail.
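Mapping a variable such as household internet access at the third administrative level generally means two steps: aggregating the microdata to one value per geographic unit, then attaching those values to the shapefile records that carry the same unit codes. The sketch below shows that logic in plain Python, assuming the shapefile's attribute table uses the same codes as the geographic variable. The field names (geo3, hhwt, internet, GEO3) and the helper names are hypothetical placeholders, not actual IPUMSI variable or shapefile field names.

```python
from collections import defaultdict


def weighted_share_by_geo(records, geo_field, weight_field, flag_field):
    """Weighted percentage of records with flag_field set, per geography code."""
    totals = defaultdict(float)
    flagged = defaultdict(float)
    for rec in records:
        code = rec[geo_field]
        totals[code] += rec[weight_field]
        if rec[flag_field]:
            flagged[code] += rec[weight_field]
    # Skip units whose weights sum to zero to avoid dividing by zero.
    return {code: 100 * flagged[code] / totals[code] for code in totals if totals[code] > 0}


def attach_to_shapes(shape_attributes, shares, key_field):
    """Copy each shapefile attribute record, adding 'pct' (None if the code has no data)."""
    return [dict(attrs, pct=shares.get(attrs[key_field])) for attrs in shape_attributes]


# Hypothetical household records and shapefile attribute rows.
households = [
    {"geo3": 101, "hhwt": 10.0, "internet": True},
    {"geo3": 101, "hhwt": 30.0, "internet": False},
    {"geo3": 102, "hhwt": 20.0, "internet": True},
]
shapes = [{"GEO3": 101, "NAME": "Unit A"}, {"GEO3": 103, "NAME": "Unit C"}]

shares = weighted_share_by_geo(households, "geo3", "hhwt", "internet")
mapped = attach_to_shapes(shapes, shares, "GEO3")
```

In practice, a GIS library would handle the join step; for example, a shapefile read with geopandas can be merged with a table of per-unit percentages on the shared code column before plotting.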

Screenshot: IPUMSI third level download page

Many of these countries have multiple samples with lower-level geography variables available. IPUMSI always aims to provide as much detail as possible for each sample, but insufficient data or detail sometimes prevents this. Some countries, such as Bangladesh and Mali, have enough detail to provide lower-level geography for all of their samples in IPUMSI. More recent samples often contain more detail and more thorough documentation, whereas older samples often lack this level of information.

