by Renae Rodgers
As you may have heard, the IPUMS Microdata Extract API is now in beta for our IPUMS USA and IPUMS CPS collections! Yay! You may have also heard that an IPUMS Metadata API is not yet available. Bummer. This means that users still need to do data discovery via the IPUMS websites, which means a lot of clicking. It also makes it difficult to set up a reproducible workflow that can be updated as new data become available (and who doesn’t want that?). This blog post will show you some easy functions you can use to retrieve sample metadata to create a monthly or annual workflow that doesn’t require visiting an IPUMS website, and walk through some ipumspy functionality to access metadata for variables already included in your IPUMS extracts. The workflow in this blog post uses ipumspy v0.2.1.
If Stata is your stats package of choice, you can leverage ipumspy to make IPUMS extracts from a Stata do file! You can spruce up your workflow with any of the functions in the blog post below by incorporating them into the template .do file offered in our blog post on making IPUMS extracts from Stata.
Sample Metadata
To get started, import the utilities module from ipumspy. All of the helper functions in this blog post will be using the CollectionInformation class from this module. The sample_ids attribute of CollectionInformation returns a dictionary with sample descriptions as keys and sample ids as values. This information is pulled from the sample ID page for the specified IPUMS data collection. This page is the source of the sample_ids dictionary for IPUMS CPS, and this page is the source of the sample_ids dictionary for IPUMS USA.
from ipumspy import utilities
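Because sample_ids maps descriptions to IDs, most of the helpers you might want boil down to filtering a dictionary by its keys. Here is a minimal sketch of that idea. The function name find_sample_ids and the stand-in dictionary are hypothetical, and the placeholder descriptions and IDs are made up so the example runs offline; they are not real IPUMS samples. The commented-out line shows how the real dictionary might be obtained, assuming the class is constructed with a collection name such as "cps".

```python
from typing import Dict, List


def find_sample_ids(sample_ids: Dict[str, str], text: str) -> List[str]:
    """Return the sample IDs whose description contains `text`.

    The match is case-insensitive. `sample_ids` is expected to map
    sample descriptions (keys) to sample IDs (values).
    """
    needle = text.lower()
    return [
        sample_id
        for description, sample_id in sample_ids.items()
        if needle in description.lower()
    ]


# Hand-written stand-in with made-up descriptions and IDs (NOT real IPUMS
# samples), shaped like the description -> ID dictionary described above.
example_sample_ids = {
    "Example collection, survey A 2020": "ex2020a",
    "Example collection, survey B 2020": "ex2020b",
    "Example collection, survey A 2021": "ex2021a",
}

# With ipumspy installed and network access, the real dictionary might be
# obtained like this (the constructor argument is an assumption):
# sample_ids = utilities.CollectionInformation("cps").sample_ids

matches = find_sample_ids(example_sample_ids, "survey a")
print(matches)
```

Keeping the search logic separate from the lookup means the same helper can be reused for any collection whose sample_ids dictionary has this shape.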
Functions for retrieving IPUMS CPS sample IDs
First, let’s take a look at the sample ID dictionary for IPUMS CPS.