Accessing IPUMS NHGIS in R: A Primer

By Finn Roberts & Jonathan Schroeder

R users have a powerful new way to access IPUMS NHGIS!

The July 2023 release of ipumsr 0.6.0 includes a fully-featured set of client tools enabling R users to get NHGIS data and metadata via the IPUMS API. Without leaving their R environment, users can find, request, download and read in U.S. census summary tables, geographic time series, and GIS mapping files for years from 1790 through the present. This blog post gives an overview of the possibilities and describes how to get started.

What you can do with ipumsr

Request and download NHGIS data

You can use ipumsr to specify the parameters of an NHGIS data extract request and submit that request for processing by the IPUMS servers. You can request any of the data products that are available through the NHGIS Data Finder: summary tables, time series tables, and shapefiles. You can also specify general formatting parameters (e.g., file format or time series table layout) to customize the structure of your data extract.
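As a minimal sketch of an extract definition, the code below requests one summary table dataset and one time series table, and sets two formatting options. The dataset, table, and geographic level names (`1990_STF1`, `NP1`, `CW3`, `county`, `state`) are illustrative assumptions, so check the NHGIS metadata for the exact names you need. Defining an extract is a local operation and does not contact the IPUMS servers.

```r
library(ipumsr)

# Define (but do not yet submit) an NHGIS extract request.
# Dataset, table, and geography names here are illustrative assumptions.
nhgis_extract <- define_extract_nhgis(
  description = "Example NHGIS extract",
  datasets = ds_spec(
    "1990_STF1",
    data_tables = "NP1",
    geog_levels = "county"
  ),
  time_series_tables = tst_spec(
    "CW3",
    geog_levels = "state"
  ),
  # Formatting options: time series table layout and output file format
  tst_layout = "time_by_column_layout",
  data_format = "csv_no_header"
)

# Print a summary of the extract definition
nhgis_extract
```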

Once you have specified a data extract, you can use a series of ipumsr functions to:

  • submit the extract request to the IPUMS servers for processing
  • check on the extract status
  • wait for the extract to complete
  • download the extract as soon as it’s ready
  • load the data into R with detailed data field descriptions.
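The steps above correspond to a handful of ipumsr functions. This sketch wraps them in a hypothetical helper, `run_nhgis_extract()`, and assumes that an IPUMS API key is already set and that the extract produces a single tabular data file (if it contains several, you will need to tell `read_nhgis()` which one to read). Calling the helper submits a real request under your account.

```r
# Hypothetical helper: submit, monitor, download, and read one NHGIS extract.
# Assumes an IPUMS API key is already available to ipumsr.
run_nhgis_extract <- function(extract, download_dir = tempdir()) {
  # Submit the request; the returned object includes the assigned extract number
  submitted <- ipumsr::submit_extract(extract)

  # Check the status once (TRUE only when files are ready to download)
  message("Ready now? ", ipumsr::is_extract_ready(submitted))

  # Block until the IPUMS servers finish processing the extract
  completed <- ipumsr::wait_for_extract(submitted)

  # Download the files; NHGIS file paths are named "data" and/or "shape"
  files <- ipumsr::download_extract(completed, download_dir = download_dir)

  # Read the tabular data file into a data frame
  ipumsr::read_nhgis(files[["data"]])
}
```

Given an extract definition created with `define_extract_nhgis()`, a call such as `nhgis_data <- run_nhgis_extract(my_extract)` would return the data, and `ipumsr::ipums_var_info(nhgis_data)` summarizes the attached field descriptions.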

This workflow allows you to go from a set of abstract NHGIS data specifications to analyzable data, all without having to leave your R session!

Get metadata describing NHGIS data

You can also use ipumsr to view metadata about NHGIS data. This includes both high-level summaries of all available datasets, time series tables, and shapefiles and detailed descriptions of particular datasets, summary tables, and time series tables.

Access to this information can simplify workflows in several ways:

Identify available data

Browse data descriptions, geographic levels, comparability, and more to explore what’s available and find data that suits your particular research needs. You could also use other R capabilities to search and filter through the thousands of data descriptions in ways that the NHGIS Data Finder doesn’t support.
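For example, a plain regular-expression search can scan every summary table description at once. The helper `search_nhgis_tables()` below is hypothetical; it assumes an API key is set and that the summary metadata returned for `"data_tables"` has a `description` column.

```r
# Hypothetical helper: search all NHGIS summary table descriptions for a pattern.
# Requires an IPUMS API key; the metadata request returns thousands of rows.
search_nhgis_tables <- function(pattern) {
  tables <- ipumsr::get_metadata_nhgis("data_tables")

  # Case-insensitive regular-expression match on the table descriptions
  hits <- grepl(pattern, tables$description, ignore.case = TRUE)
  tables[hits, ]
}
```

A call like `search_nhgis_tables("household income")` would then list every matching table across all datasets, which you can further filter with ordinary R tools.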

Create extract requests

Use the names and options given in the NHGIS metadata to specify requests for desired data. For instance, the metadata for a specific dataset will include lists of tables, geographic levels, and breakdowns available for that dataset. After getting that information, you could copy the names of any items of interest directly into an extract request definition.
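Rather than copying names by hand, you can also pass them straight from the metadata into a dataset specification. This is a sketch built around a hypothetical helper, `spec_from_metadata()`, which assumes an API key is set and that the dataset metadata includes `data_tables` and `geog_levels` components, each with `name` (and, for tables, `description`) columns.

```r
# Hypothetical helper: build a dataset specification from NHGIS metadata.
# Assumes the metadata list has `data_tables` and `geog_levels` components.
spec_from_metadata <- function(dataset, table_pattern, geog_level) {
  meta <- ipumsr::get_metadata_nhgis(dataset = dataset)

  # Keep the names of tables whose descriptions match the pattern
  matches <- grepl(table_pattern, meta$data_tables$description, ignore.case = TRUE)
  tables <- meta$data_tables$name[matches]
  stopifnot(length(tables) > 0)

  # Fail early if the requested geographic level is not offered
  stopifnot(geog_level %in% meta$geog_levels$name)

  ipumsr::ds_spec(dataset, data_tables = tables, geog_levels = geog_level)
}
```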

Streamline data management

You can even use metadata as a resource to build pipelines to make basic data management tasks easier. For instance, you can write code to search the metadata for the most recent ACS 1-year release and add that dataset to your base extract definition when it becomes available, allowing you to quickly update your analysis with the latest data.
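One way to sketch that pipeline: pick the newest ACS 1-year dataset from the dataset metadata and add it to a base extract with `add_to_extract()`. Both helpers are hypothetical, and the sketch assumes that NHGIS ACS 1-year datasets follow a `YYYY_ACS1` naming pattern; the table and geography names you pass in must exist for that dataset.

```r
# Pick the most recent ACS 1-year dataset from a vector of NHGIS dataset names.
# Assumes ACS 1-year datasets are named like "2021_ACS1".
latest_acs1 <- function(dataset_names) {
  acs1 <- grep("^[0-9]{4}_ACS1$", dataset_names, value = TRUE)
  if (length(acs1) == 0) {
    return(NA_character_)
  }
  acs1[which.max(as.integer(substr(acs1, 1, 4)))]
}

# Hypothetical helper: add the latest ACS 1-year dataset to a base extract.
# Requires an IPUMS API key for the metadata request.
add_latest_acs1 <- function(base_extract, data_tables, geog_levels) {
  datasets <- ipumsr::get_metadata_nhgis("datasets")
  latest <- latest_acs1(datasets$name)
  if (is.na(latest)) {
    stop("No ACS 1-year dataset found in the NHGIS metadata")
  }
  ipumsr::add_to_extract(
    base_extract,
    datasets = ipumsr::ds_spec(latest, data_tables = data_tables, geog_levels = geog_levels)
  )
}

latest_acs1(c("2019_ACS1", "2021_ACS1", "2017_2021_ACS5a"))
# returns "2021_ACS1"
```

Keeping the name-selection logic in its own function means it can be checked offline, separately from the API call.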

Share NHGIS extract definitions

The NHGIS terms of use prohibit redistributing NHGIS data outside of collaborative work groups without permission. For public redistribution (e.g., to support research reproducibility, to fulfill journal requirements, or to supply datasets for an open online course), IPUMS requires users to access data directly from our sites, which helps us track usage and demonstrate value to our funders.

The ipumsr package provides an easy way to facilitate data sharing without violating the terms of use: instead of sharing your extract data, you can share the extract definition.

With ipumsr, you can easily save an extract definition to a JSON file and share that with others who can then load the definition into R and submit it to the IPUMS servers for processing under their own account.
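A minimal sketch of that round trip follows; the dataset, table, and geography names are illustrative, and the file is written to a temporary location. Neither saving nor loading the definition contacts the API.

```r
library(ipumsr)

# An extract definition to share (dataset and table names are illustrative)
extract <- define_extract_nhgis(
  description = "Extract to share",
  datasets = ds_spec("1990_STF1", data_tables = "NP1", geog_levels = "county")
)

# Save the definition (not the data) to a JSON file
json_path <- tempfile(fileext = ".json")
save_extract_as_json(extract, file = json_path)

# A collaborator loads the definition from the shared file...
shared_extract <- define_extract_from_json(json_path)

# ...and submits it under their own API key:
# submit_extract(shared_extract)
```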

View previous NHGIS extract definitions

You can also use ipumsr to view the definitions of extracts you previously requested—either via the web or via ipumsr—which you can then share or use as the foundation for a new extract request with similar parameters. You can even view the parameters for extracts that have expired. (All NHGIS extracts expire two weeks after completion, meaning the data are no longer available for download, but the definitions persist.) If you need to reproduce the data in an expired extract, simply resubmit it and download the new files when complete.
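As a sketch, `get_extract_info()` accepts a `"collection:number"` string, and passing the result back to `submit_extract()` creates a new extract with the same parameters. The helper `resubmit_nhgis_extract()` is hypothetical and requires an API key; the extract number you pass must exist in your own account.

```r
# Hypothetical helper: fetch a previous NHGIS extract by number and resubmit it.
# Requires an IPUMS API key; the number refers to an extract in your account.
resubmit_nhgis_extract <- function(extract_number) {
  old_extract <- ipumsr::get_extract_info(paste0("nhgis:", extract_number))

  # Show the stored definition, which persists even after the files expire
  print(old_extract)

  # Submitting it again produces a new extract with the same parameters
  ipumsr::submit_extract(old_extract)
}
```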

Getting Started

To install the latest update of ipumsr from CRAN, run

install.packages("ipumsr")

in your R console. You can confirm that your installed version is 0.6.0 or later with

packageVersion("ipumsr") >= "0.6.0"

Requesting NHGIS data and metadata via ipumsr also requires that you first register to use NHGIS (if you haven’t already) and obtain an IPUMS API key. The IPUMS API (Application Programming Interface) is the system through which IPUMS enables programmatic access to its servers. All of the ipumsr functions that access NHGIS data and metadata do so by submitting calls to the API. Each API call must include a key for a specific registered IPUMS user, which enables IPUMS servers to authenticate API users and associate their requests with their accounts.
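Once you have a key, ipumsr can pick it up automatically. The sketch below uses a placeholder string, not a real key; by default the key is stored only for the current R session.

```r
library(ipumsr)

# Make your key available to ipumsr for this R session.
# "paste-your-api-key-here" is a placeholder, not a real key.
set_ipums_api_key("paste-your-api-key-here")

# To reuse the key in future sessions, save it to your .Renviron file instead:
# set_ipums_api_key("paste-your-api-key-here", save = TRUE)
```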

The ipumsr website includes several articles that demonstrate ways to work with the IPUMS API within R, including instructions on how to get an API key and use it with ipumsr. We suggest you start with the introduction to the IPUMS API for R users. This outlines the core workflow for creating, submitting, and downloading an extract within ipumsr.

Once you have a sense of this workflow, check out the NHGIS API Requests article for more details on the available options when dealing specifically with NHGIS metadata and extract requests.

Finally, the Reading IPUMS Data article will get you familiar with the specialized ipumsr file-reading functionality.
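For a quick taste before working with your own extracts, the sketch below reads a small NHGIS example file, assuming an example named `nhgis0972_csv.zip` ships with your installed version of ipumsr. No API key is needed.

```r
library(ipumsr)

# Path to a small NHGIS example extract bundled with ipumsr
nhgis_file <- ipums_example("nhgis0972_csv.zip")

# Read the CSV inside the zip archive into a data frame
nhgis_data <- read_nhgis(nhgis_file)

# Summarize variable names alongside any labels and descriptions
ipums_var_info(nhgis_data)
```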

Final Notes

Other Supported Data Collections

The IPUMS API currently supports access to three other IPUMS data collections in addition to NHGIS:

  • IPUMS USA
  • IPUMS CPS
  • IPUMS International

These data collections are also supported by ipumsr and use the same general workflow as NHGIS, except that the IPUMS API does not yet support metadata access for these collections. For more details about the specifics of requesting data from these collections, see the Microdata API Requests article.
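For comparison, a microdata extract is defined with a collection-specific function such as `define_extract_usa()`. The sample and variable names below are illustrative assumptions; since metadata access is not yet available for these collections, look them up on the IPUMS USA website.

```r
library(ipumsr)

# A microdata extract definition for IPUMS USA.
# Sample and variable names are illustrative assumptions.
usa_extract <- define_extract_usa(
  description = "Example IPUMS USA extract",
  samples = "us2021a",
  variables = c("AGE", "SEX")
)

# The remaining steps mirror the NHGIS workflow (requires an API key):
# usa_extract <- wait_for_extract(submit_extract(usa_extract))
# ddi_file <- download_extract(usa_extract)
# usa_data <- read_ipums_micro(ddi_file)
```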

The IPUMS team will continue to add API support for more IPUMS collections in the future, and as we do so, we intend to add parallel support in ipumsr. So, if a collection you’re interested in isn’t available yet, stay tuned! Check out the API development roadmap to get a sense of what features may be available in the future.

Python Support

IPUMS also maintains ipumspy, a Python library that provides much of the same functionality that is available in ipumsr. At present, we have not yet added NHGIS support to ipumspy. We hope to do so in the near future!