IPUMS International Has Brand-New Low-Level Geographic Variables and Shapefiles

By Quinn Heimann

Map: Percentage of households with internet access by township, 2014 Myanmar census

An ongoing goal and challenge for IPUMS-International (IPUMSI) is providing users with the most detailed geography possible. One particular obstacle is the set of confidentiality requirements agreed upon in order to distribute these census and survey samples. Nevertheless, IPUMSI has started releasing lower-level geographic variables for samples where the data are sufficiently detailed and confidentiality thresholds are still met. As of spring 2022, twenty samples covering ten countries across Africa and Asia have been released with third-level administrative geography. IPUMSI is also distributing accompanying shapefiles, which can be used with these more granular geographic variables to map population trends in greater detail.
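As a rough illustration of how a third-level variable and a shapefile fit together, the sketch below computes a weighted percentage per geographic unit (the kind of figure shown in the Myanmar map) and attaches it to a stand-in for a shapefile's attribute table. The field names GEO3, HHWT, and INTERNET, the 0/1 coding, and the toy records are all assumptions for illustration; a sample's actual variable names and codes should be checked in its codebook. In practice the attribute table would be read from the shapefile itself with a GIS package.

```python
from collections import defaultdict

# Hypothetical household records. GEO3, HHWT, and INTERNET (coded 0/1 here)
# are illustrative assumptions, not exact IPUMSI variable names or codes.
households = [
    {"GEO3": 101, "HHWT": 10.0, "INTERNET": 1},
    {"GEO3": 101, "HHWT": 10.0, "INTERNET": 0},
    {"GEO3": 102, "HHWT": 5.0, "INTERNET": 1},
    {"GEO3": 102, "HHWT": 15.0, "INTERNET": 1},
]

def weighted_share_by_unit(records, geo_key, weight_key, flag_key):
    """Return {unit code: weighted percentage of records whose flag equals 1}."""
    totals = defaultdict(float)
    hits = defaultdict(float)
    for rec in records:
        unit = rec[geo_key]
        weight = rec[weight_key]
        totals[unit] += weight
        if rec[flag_key] == 1:
            hits[unit] += weight
    return {unit: 100.0 * hits[unit] / totals[unit]
            for unit in totals if totals[unit] > 0}

# Stand-in for a shapefile attribute table: one row per polygon, keyed by the
# same code used in the microdata (an assumption about how the files align).
polygons = [
    {"GEO3": 101, "NAME": "Unit A"},
    {"GEO3": 102, "NAME": "Unit B"},
    {"GEO3": 103, "NAME": "Unit C"},
]

shares = weighted_share_by_unit(households, "GEO3", "HHWT", "INTERNET")
for poly in polygons:
    # None marks polygons with no matching households.
    poly["PCT_INTERNET"] = shares.get(poly["GEO3"])
```

The lookup with `shares.get` behaves like a left join: polygons without matching households stay in the table with an empty value, so gaps in coverage remain visible on a map instead of silently dropping out.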

Screenshot: IPUMSI third-level geography download page

Many of these countries have multiple samples with lower-level geographic variables available. IPUMSI always aims to provide as much detail as possible for each sample, but a lack of sufficient data or detail sometimes prevents this. Some countries, such as Bangladesh and Mali, have enough detail to provide lower-level geography for every sample available in IPUMSI. More recent samples tend to contain more detail and more thorough documentation, while older samples often lack this level of information.

Map series: Third-level administrative boundaries in Bangladesh (Upazilas) in the 1991, 2001, and 2011 censuses, for the entire country and the Dhaka urban area

Another challenge in distributing more granular geographic data is producing the related shapefiles. IPUMSI aims to provide shapefiles for all lower-level geographic variables it produces; however, these files are harder to create for some samples, either because adequate maps are not available or because the country is very large. One example is China, for which IPUMSI has just released lower-level variables. Because China is geographically vast and has more than 2,500 counties, processing takes longer than for other countries. As a compromise, the IPUMSI team has released all currently available county-level variables for each China sample, along with a special GIS file for the 2000 sample that highlights select urban areas across the country. This combination is intended to give users as much data as possible, plus supplemental geographic files, while the complete lower-level shapefile is being processed.

Map: Median age by county in Chongqing and Shanghai and their surrounding prefectures

As IPUMSI moves forward with creating further low-level geographic variables, it is worth noting how much effort these variables require. Many datasets provided to IPUMS lack the detail needed to publish geography below the first or second administrative level. Most of the time spent on these variables goes into matching the many codes and labels in the datasets to real-world boundaries. Sometimes the data are present but adequate maps or shapefiles are not, as is the case for Ethiopia and Senegal. In these cases, IPUMSI releases as many years of data as possible, but the earlier years are omitted. IPUMSI hopes to obtain further funding and resources to continue producing low-level geographic variables and shapefiles.
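The code-to-boundary matching described above can be sketched as a label lookup. This is a minimal, hypothetical example, not the IPUMSI workflow: it normalizes place-name labels (case, accents, spacing) and matches a dataset code to a boundary ID only when exactly one boundary shares that label, leaving everything else for manual review. All codes, IDs, and place names below are made up.

```python
import unicodedata

def normalize(label):
    """Lowercase, strip accents, and collapse whitespace so spelling variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", label)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())

def match_codes(data_units, boundary_units):
    """Map each dataset code to a boundary ID by normalized label.

    Returns (matches, unmatched_codes). Codes with zero or several candidate
    boundaries are left unmatched so a person can resolve them by hand.
    """
    index = {}
    for boundary_id, label in boundary_units.items():
        index.setdefault(normalize(label), []).append(boundary_id)
    matches, unmatched = {}, []
    for code, label in data_units.items():
        candidates = index.get(normalize(label), [])
        if len(candidates) == 1:
            matches[code] = candidates[0]
        else:
            unmatched.append(code)
    return matches, unmatched

# Hypothetical codes and labels from a census dataset and a boundary file.
data_units = {"01": "Río Alto", "02": "San  Pedro", "03": "Nueva Zona"}
boundary_units = {"B1": "Rio Alto", "B2": "SAN PEDRO", "B3": "Lago Verde"}

matches, unmatched = match_codes(data_units, boundary_units)
```

Treating ambiguous labels as unmatched is a deliberately conservative choice: a wrong automatic match would quietly misplace population counts on the map, whereas an unmatched code simply lands in the review queue.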

Reproducible Research with R Markdown, ipumsr, and the IPUMS API

By Dan Ehrlich

Have you ever wanted to share a project using IPUMS data with a colleague, but then thought, “Oh wait! It is against the terms of use to redistribute my IPUMS data file!”

Maybe you’d like a colleague to explore your findings. Or maybe you’re a teacher with an exercise you’d like your students to review and replicate. In the past, if you wanted someone to use the same IPUMS data that you did, you would need to provide a list of samples and variables and instructions for your collaborator on how to navigate the online data extract system.

If that sounds like a pain, don’t worry: the brand-new IPUMS microdata API makes it easier than ever to share your extract definitions with fellow IPUMS users! Using the microdata API, you and your collaborators can:

  • Save an extract definition as a .json file that can be shared freely
  • Submit a new extract request based on a .json definition
  • Download data and metadata directly into your project directory (this feature is a personal favorite)
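The idea of a shareable definition file can be illustrated with the Python standard library alone. This sketch shows only the save-and-reload round trip; the real IPUMS API uses its own JSON layout, and client libraries such as ipumsr and ipumspy handle serialization for you. The keys and the sample ID below are assumptions, not the official schema.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical extract definition; keys and sample ID are illustrative only.
definition = {
    "collection": "usa",
    "description": "Example extract for a colleague",
    "samples": ["us2019a"],
    "variables": ["AGE", "SEX"],
}

def save_definition(defn, path):
    """Write the definition to a human-readable .json file that can be shared."""
    Path(path).write_text(json.dumps(defn, indent=2, sort_keys=True), encoding="utf-8")

def load_definition(path):
    """Read a shared definition back into a dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "my_extract.json"
    save_definition(definition, path)   # the file you would send a colleague
    reloaded = load_definition(path)    # what the colleague reads back

print(reloaded == definition)  # True
```

Because the file holds only the definition and no microdata, it can be passed around freely; each collaborator then submits it under their own IPUMS account to get the data.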


Sharing IPUMS Extract Definitions Using ipumspy

By Renae Rodgers

What is an Extract?

IPUMS users will already be familiar with the concept of an extract, but for those who may just be joining us, here is a brief recap. Public use data files are often large, unwieldy blocks of data: many variables wide and many, many records long. Most analyses require only a small subset of the variables in a given dataset, but downloading public data from government agencies is typically an all-or-nothing endeavor. In addition to offering public use data that is harmonized across time and place, IPUMS lets users choose only their variables of interest for download. These individualized datasets and their accompanying metadata are IPUMS extracts.
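Conceptually, an extract keeps only the columns a user asks for. The toy sketch below applies that idea to a small CSV; it is not how IPUMS builds extracts (that happens server-side and also involves harmonization and metadata), and the tiny file and its values are invented for illustration.

```python
import csv
import io

# A tiny stand-in for a wide public use file (real files have hundreds of columns).
WIDE_FILE = """SERIAL,PERNUM,AGE,SEX,INCTOT,EDUC,OCC
1,1,34,2,42000,10,2310
1,2,36,1,51000,11,1020
2,1,71,2,18000,6,0
"""

def make_extract(source, keep):
    """Return every record restricted to the requested columns, in the requested order."""
    reader = csv.DictReader(source)
    missing = [name for name in keep if name not in reader.fieldnames]
    if missing:
        raise KeyError(f"unknown variables: {missing}")
    return [{name: row[name] for name in keep} for row in reader]

extract = make_extract(io.StringIO(WIDE_FILE), ["AGE", "SEX"])
```

Every record is kept while the unrequested columns are dropped, which mirrors the "choose only your variables of interest" idea; real extracts can also restrict which samples are included.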

What is an Extract Definition?

In short, an IPUMS extract definition is all the information needed to create a user’s personalized extract data file and accompanying metadata – everything short of those files themselves.

An IPUMS extract is defined by:

  1. The name of the IPUMS collection (e.g. “usa”, “cps”)
  2. A list of sample names or IDs (to be) included in the extract file
  3. A list of variable names (to be) included in the extract file
  4. An extract description (e.g. “2022 ACS demographic variables”)
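The four components listed above map naturally onto a small record type. This hypothetical sketch uses a Python dataclass with basic checks (a collection, at least one sample, at least one variable) and drops duplicate selections; the field names are illustrative, and real definitions accepted by the IPUMS API may carry additional options, such as data format.

```python
from dataclasses import dataclass, asdict
from typing import List

@dataclass
class ExtractDefinition:
    """Collection, samples, variables, and description; field names are illustrative only."""
    collection: str
    samples: List[str]
    variables: List[str]
    description: str = ""

    def __post_init__(self):
        # Reject definitions that could not produce a usable extract.
        if not self.collection:
            raise ValueError("collection is required")
        if not self.samples:
            raise ValueError("at least one sample is required")
        if not self.variables:
            raise ValueError("at least one variable is required")
        # Drop duplicate selections while preserving the original order.
        self.samples = list(dict.fromkeys(self.samples))
        self.variables = list(dict.fromkeys(self.variables))

defn = ExtractDefinition(
    collection="usa",
    samples=["us2022a"],              # hypothetical sample ID
    variables=["AGE", "SEX", "AGE"],  # duplicate AGE is removed
    description="2022 ACS demographic variables",
)
```

`asdict(defn)` turns the record into a plain dict, which is a convenient starting point for writing the definition out as JSON.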

IPUMS users build these extract definitions piece by piece when they create an extract through the IPUMS website, selecting samples, variables, and formats.
