by Tsu Zhu and Tracy Kugler
ipumsr now supports IHGIS!
IPUMS spatial data users now have programmatic access to international census data tables in IHGIS. The recent release of ipumsr 0.9.0 enables users to explore metadata; build, submit, and download extracts; and read IHGIS tables directly in R. Many of the ipumsr functions that had been NHGIS-specific have now been generalized to accommodate both IHGIS and NHGIS. More information on the new functions can be found in the ipumsr changelog.
About IPUMS IHGIS
The International Historical Geographic Information System (IHGIS) provides data tables from population, housing, and agricultural censuses from around the world. The data are derived from tables originally published by national statistical offices. The format and structure of the published tables vary widely between countries and across time, even within the same country. IHGIS extracts the tables and standardizes them into a machine-readable structure along with consistently formatted metadata and corresponding GIS boundary files. As of this writing, the IHGIS collection consists of 40 datasets from population and housing censuses, 14 from agricultural censuses, and an additional 305 datasets tabulated from IPUMS International microdata samples.
Working with ipumsr for IHGIS
The following sections walk through how to use the new ipumsr functionality to explore metadata and submit an extract request for data tables from Ireland's censuses from 1966 through 1991. This example highlights changes to ipumsr functions that have been generalized to support both IHGIS and NHGIS. For more details, see the Aggregate Data API Requests article on the ipumsr website.
Getting Started
To install the latest version of ipumsr from CRAN, run install.packages("ipumsr") in your R console. You can confirm that your installed version is 0.9.0 or later with packageVersion("ipumsr").
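As a small illustration of the version check, the sketch below compares version strings with base R's package_version(). The helper name needs_ipumsr_update() is hypothetical and is not part of ipumsr; it only shows why versions should be compared as versions and not as plain strings.

```r
# Hypothetical helper (not part of ipumsr): TRUE when a version string
# predates the release that added IHGIS support.
needs_ipumsr_update <- function(installed, minimum = "0.9.0") {
  package_version(installed) < package_version(minimum)
}

needs_ipumsr_update("0.8.2")   # TRUE: upgrade before requesting IHGIS data
needs_ipumsr_update("0.9.0")   # FALSE
needs_ipumsr_update("0.10.1")  # FALSE: compared component by component

# In practice, pass the installed version:
# needs_ipumsr_update(as.character(utils::packageVersion("ipumsr")))
```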
Requesting IHGIS data and metadata via ipumsr also requires that you first register to use IHGIS (if you haven’t already) and obtain an IPUMS API key. The ipumsr website includes several articles that demonstrate working with the IPUMS API within R, including instructions on how to get an API key and use it with ipumsr. If you are unfamiliar with creating, submitting, and downloading an extract within ipumsr, we suggest you start with the introduction to the IPUMS API for R users.
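Before making requests, it can help to confirm that a key is visible to your R session. The sketch below assumes the key is stored in the IPUMS_API_KEY environment variable, which is where ipumsr looks by default; the helper name has_ipums_api_key() is hypothetical.

```r
# Hypothetical helper: report whether an API key is visible to this session.
# Assumes the key lives in the IPUMS_API_KEY environment variable.
has_ipums_api_key <- function() {
  nzchar(Sys.getenv("IPUMS_API_KEY"))
}

if (!has_ipums_api_key()) {
  message("No IPUMS API key found; see ipumsr::set_ipums_api_key().")
}
```

The articles linked above remain the reference for obtaining and saving a key; this check only catches a missing key early.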
Explore IHGIS Metadata
You can view a list of all available datasets, data tables, or tabulation geographies (sets of geographic units) using 'get_metadata_catalog()'. You can then use R functions to filter the returned list.
Datasets
First, let’s get a list of available datasets for Ireland.
library(ipumsr)

get_metadata_catalog(collection = "ihgis", metadata_type = "datasets") |>
  dplyr::filter(country == "IE") |>
  dplyr::select("name", "description", "dataset_type")
# A tibble: 15 × 3
   name      description                           dataset_type
   <chr>     <chr>                                 <chr>
 1 IE1966pop Census of Population of Ireland, 1966 Population Census
 2 IE1971pop Census of Population of Ireland, 1971 Population Census
 3 IE1971tab Census of Population of Ireland, 1971 Tabulated from IPUMS International Microdata Sample
 4 IE1979pop Census of Population of Ireland, 1979 Population Census
 5 IE1979tab Census of Population of Ireland, 1979 Tabulated from IPUMS International Microdata Sample
 6 IE1981pop Census of Population of Ireland, 1981 Population Census
 7 IE1981tab Census of Population of Ireland, 1981 Tabulated from IPUMS International Microdata Sample
 8 IE1986pop Census of Population of Ireland, 1986 Population Census
 9 IE1986tab Census of Population of Ireland, 1986 Tabulated from IPUMS International Microdata Sample
10 IE1991pop Census of Population of Ireland, 1991 Population Census
11 IE1991tab Census of Population of Ireland, 1991 Tabulated from IPUMS International Microdata Sample
12 IE1996tab Census of Population of Ireland, 1996 Tabulated from IPUMS International Microdata Sample
13 IE2002tab Census of Population of Ireland, 2002 Tabulated from IPUMS International Microdata Sample
14 IE2006tab Census of Population of Ireland, 2006 Tabulated from IPUMS International Microdata Sample
15 IE2011tab Census of Population of Ireland, 2011 Tabulated from IPUMS International Microdata Sample
We can see that IHGIS includes several Ireland datasets tabulated from microdata (those with dataset names ending in ‘tab’) and several derived from published data sources (those with dataset names ending in ‘pop’). In addition to the name, description, and dataset_type fields selected above, fields containing information about how the census was conducted and definitions of key terms are also available.
To view detailed metadata for a specific dataset, we can use 'get_metadata()'. In addition to the high-level summary metadata available in the catalog listing, the detailed dataset-level metadata includes information on available data tables and tabulation geographies. Below is a list of available data tables for the IE1991pop dataset.
get_metadata(collection = "ihgis", dataset = "IE1991pop")$data_tables |>
  dplyr::select("name", "label")
# A tibble: 25 × 2
   name          label
   <chr>         <chr>
 1 IE1991pop.AAA Population by Sex and Age Group
 2 IE1991pop.AAB Population, Marriages, Births, Deaths, Natural Increase, and Estimated Net Migration [1926-1991]
 3 IE1991pop.AAC Population, Area and Density [1986-1991]
 4 IE1991pop.AAD Population and Percent Change [1971-1991]
 5 IE1991pop.AAE Percentage Change in Population [1946-1991]
 6 IE1991pop.AAF Percentage Change in Population in Each Age Group by Sex [1966-1991]
 7 IE1991pop.AAG Average Annual Rate of Change in Population Per 1,000 by Age Group and Sex [1966-1991]
 8 IE1991pop.AAH Population by Sex, Age Group, and Marital Status [1926-1991]
 9 IE1991pop.AAI Population by Single Year of Age, Sex, and Marital Status
10 IE1991pop.AAJ Population by Sex, Age Group, and Detailed Marital Status
# ℹ 15 more rows
In addition to the name and label selected above, available table-level metadata also include the universe, table number, tabulation geographies, and footnotes.
Tabulation Geographies
IHGIS tabulation geographies are sets of units over which population data are summarized. Most tabulation geographies are organized into hierarchies of child units nested within parent units (e.g., states, provinces, districts). To view available tabulation geographies for Ireland population census datasets, the following example filters the catalog to names that start with “IE” and contain “pop”. Metadata on the number of units and the mean population and area of units provides an idea of the level of geographic granularity available. In the case of IHGIS Ireland datasets, hierarchical levels go down to 'g5', which represents towns and environs. For this example, we apply a filter for labels containing the word “Counties” to see which tabulation geographies represent counties.
get_metadata_catalog(collection = "ihgis", metadata_type = "tabulation_geographies") |>
  dplyr::filter(stringr::str_detect(name, "^IE.*pop.*$")) |>
  dplyr::filter(stringr::str_detect(label, "Counties")) |>
  dplyr::arrange(unit_count)
# A tibble: 15 × 7
   name         label                                               hierarchical_level mean_population mean_area sequence unit_count
   <chr>        <chr>                                               <chr>                        <int>     <dbl>    <int>      <int>
 1 IE1966pop.ga Counties with County & City (County Borough) groups ga                          110923      2702        7         26
 2 IE1971pop.gc Counties with County & County Borough groups        gc                          110305     2602.        9         27
 3 IE1979pop.gc Counties with County & County Borough groups        gc                          124749     2602.        8         27
 4 IE1981pop.gc Counties with County & County Borough groups        gc                          127534     2602.        8         27
 5 IE1986pop.gc Counties with County & County Borough groups        gc                          131135     2602.        7         27
 6 IE1966pop.g3 Counties/County Boroughs                            g3                           93032     2266.        4         31
 7 IE1971pop.g3 Counties/County Boroughs                            g3                           96073     2266.        4         31
 8 IE1979pop.g3 Counties/County Boroughs                            g3                          108652     2266.        4         31
 9 IE1981pop.g3 Counties/County Boroughs                            g3                          111078     2266.        4         31
10 IE1971pop.gb Counties/County Boroughs [including Dun Laoghaire]  gb                           93070     2195.        8         32
11 IE1979pop.ga Counties/County Boroughs [including Dun Laoghaire]  ga                          105257     2195.        7         32
12 IE1981pop.ga Counties/County Boroughs [including Dun Laoghaire]  ga                          107606     2195.        7         32
13 IE1986pop.g3 Counties/County Boroughs                            g3                          110645     2195.        4         32
14 IE1991pop.g3 Counties                                            g3                          110179     2195.        4         32
15 IE1966pop.gb Counties/County Boroughs [including Dun Laoghaire]  gb                           87394     2129.        8         33
We can see that several tabulation geographies, designated 'ga', 'gb', or 'g3', represent counties and related units. Also note that the unit counts are similar across the range of years.
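To see what the name filter in this example does, here is a small offline sketch that applies the same regular expression to a handful of names using base R's grepl(). The first and third names appear in the output above; "IE1971tab.g1" and "FR1968pop.g2" are made up for illustration and may not exist in IHGIS.

```r
# Two names from the catalog output above plus two hypothetical ones.
candidate_names <- c("IE1966pop.ga", "IE1971tab.g1", "IE1991pop.g3", "FR1968pop.g2")

# Same pattern as the catalog filter: starts with "IE" and contains "pop".
is_ie_pop <- grepl("^IE.*pop.*$", candidate_names)

candidate_names[is_ie_pop]
# [1] "IE1966pop.ga" "IE1991pop.g3"
```

Because tabulation geography names carry a suffix such as ".ga", the pattern has to allow characters after "pop", which is what the trailing ".*" does.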
Data Tables
Now let’s look for tables we could use to analyze changes in Ireland’s employed population over time at the county/county borough level. First, we get a list of Ireland datasets derived from published population censuses by filtering the results of get_metadata_catalog(). Then, we iterate over each dataset to pull table titles that contain “employ”.
# Pull list of available population censuses for Ireland.
ie_datasets <- get_metadata_catalog(collection = "ihgis", metadata_type = "datasets") |>
  dplyr::filter(stringr::str_detect(name, "^IE.*pop.*$"))

# Collect the data table metadata for each dataset and keep labels mentioning "employ".
purrr::map(ie_datasets$name, function(dataset) {
  get_metadata(collection = "ihgis", dataset = dataset)$data_tables
}) |>
  dplyr::bind_rows(.id = "dataset") |>
  dplyr::filter(grepl("employ", label, ignore.case = TRUE)) |>
  dplyr::pull("label")
[1] "Working Age Population by Employment Status and Sex"
[2] "Employed Population by Industrial Group and Sex"
[3] "Population by Sex and Employment Status"
[4] "Employed Population by Sex and Industrial Group"
[5] "Population 15 Years and Over by Sex, Employment Status, and Age Group"
[6] "Economically Active Population by Employment Status"
[7] "Employed Population by Industrial Group"
[8] "Population by Sex, Employment Status, and Age Group"
[9] "Employed Population by Sex and Industrial Group"
From the list above, we decide to select datasets with tables that have “Employed population” in the title. We can then apply a similar iteration method to refine the filter and identify tables that are available for the 'ga', 'gb', and/or 'g3' county-level tabulation geographies.
purrr::map(ie_datasets$name, function(dataset) {
  get_metadata(collection = "ihgis", dataset = dataset)$data_tables
}) |>
  dplyr::bind_rows(.id = "dataset") |>
  dplyr::filter(
    grepl("Employed population", label, ignore.case = TRUE),
    purrr::map_lgl(tabulation_geographies, ~ any(grepl("ga|gb|g3", .x, ignore.case = TRUE)))
  ) |>
  dplyr::select("dataset", "name", "dataset_name", "label", "universe")
# A tibble: 4 × 5
  dataset name          dataset_name label                                           universe
  <chr>   <chr>         <chr>        <chr>                                           <chr>
1 1       IE1966pop.AAI IE1966pop    Employed Population by Industrial Group and Sex Employed population
2 2       IE1971pop.AAT IE1971pop    Employed Population by Sex and Industrial Group Employed population
3 4       IE1981pop.AAX IE1981pop    Employed Population by Industrial Group         Employed Population
4 6       IE1991pop.AAO IE1991pop    Employed Population by Sex and Industrial Group Employed population
The resulting list includes four tables, one each from 1966, 1971, 1981, and 1991.
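The geography filter in this step tests each table's entry in the tabulation_geographies column. The sketch below reproduces that per-row test offline with base R. It assumes each entry is a character vector of geography names, which may not match the structure the API actually returns, and the sample values are illustrative, not API output. It also uses an anchored pattern, a stricter variant of the "ga|gb|g3" pattern used in this step.

```r
# Assumption: each element holds the geography names available for one table.
# These values are illustrative, not API output.
table_geogs <- list(
  c("IE1966pop.g0", "IE1966pop.gb"),
  c("IE1971pop.g0", "IE1971pop.g1"),
  c("IE1981pop.ga", "IE1981pop.g3")
)

# Anchored pattern: match only names whose suffix is exactly ga, gb, or g3.
county_pattern <- "\\.(ga|gb|g3)$"

# One logical value per table: is at least one county-level geography available?
has_county_level <- vapply(
  table_geogs,
  function(geogs) any(grepl(county_pattern, geogs)),
  logical(1)
)

has_county_level
# [1]  TRUE FALSE  TRUE
```

Anchoring the pattern to the end of the name avoids accidental matches elsewhere in the string, though the unanchored version is sufficient for the Ireland names shown here.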
Request and Download IHGIS Data
After identifying the datasets, data tables, and tabulation geographies you are interested in, you can define an IHGIS data extract using 'define_extract_agg()' and 'ds_spec()'.[1]
Here we’ll define an extract for the tables representing employed populations from the list above.
extract <- define_extract_agg(
  "ihgis",
  description = "Ireland employed population tables for ga, gb, g3",
  datasets = list(
    ds_spec("IE1966pop", data_tables = "IE1966pop.AAI", tabulation_geographies = "IE1966pop.gb"),
    ds_spec("IE1971pop", data_tables = "IE1971pop.AAT", tabulation_geographies = "IE1971pop.gb"),
    ds_spec("IE1981pop", data_tables = "IE1981pop.AAX", tabulation_geographies = "IE1981pop.ga"),
    ds_spec("IE1991pop", data_tables = "IE1991pop.AAO", tabulation_geographies = "IE1991pop.g3")
  )
)
print(extract)
Unsubmitted IPUMS IHGIS extract
Description: Ireland employed population tables for ga, gb, g3
Dataset: IE1966pop
  Tables: IE1966pop.AAI
  Tabulation Geogs: IE1966pop.gb
Dataset: IE1971pop
  Tables: IE1971pop.AAT
  Tabulation Geogs: IE1971pop.gb
Dataset: IE1981pop
  Tables: IE1981pop.AAX
  Tabulation Geogs: IE1981pop.ga
Dataset: IE1991pop
  Tables: IE1991pop.AAO
  Tabulation Geogs: IE1991pop.g3
Once the extract is defined, you can use ipumsr functions to
- submit the extract request (submit_extract())
- check the extract’s status (is_extract_ready(), get_extract_info())
- wait for the extract to complete (wait_for_extract())
- download the extract as soon as it’s ready (download_extract())
See details in the Introduction to the IPUMS API for R Users.
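The steps listed above can be chained as in the following sketch. The wrapper name run_extract() and its download_dir default are hypothetical. The sketch assumes a valid API key is configured and that the extract object was created with define_extract_agg(). The function is only defined here, not called, because running it contacts the IPUMS API.

```r
# Hypothetical wrapper: submit an extract, block until it finishes,
# then download the resulting files and return their paths.
run_extract <- function(extract, download_dir = tempdir()) {
  submitted <- ipumsr::submit_extract(extract)   # send the request
  ready <- ipumsr::wait_for_extract(submitted)   # poll until complete
  ipumsr::download_extract(ready, download_dir = download_dir)
}

# Usage (requires network access and an API key):
# paths <- run_extract(extract)
```

For long-running extracts you may prefer to submit once and check back later with is_extract_ready() or get_extract_info() so that your R session is not blocked.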
After downloading an extract, you can load the data into R with detailed data field descriptions by calling the function 'read_ipums_agg()'.[2] IHGIS extracts come with full metadata that is accessible by calling 'read_ihgis_codebook()'. For more detail on extract metadata, see the article Read metadata from an IHGIS extract’s codebook files.
Final Notes
The workflow outlined above is just a sample of the new features in ipumsr that support aggregate data API requests. See the full updated documentation in the Aggregate Data API Requests article.
Python Support
We have also added IHGIS support to ipumspy, a Python library that provides much of the same functionality that is available in ipumsr! This library is maintained by IPUMS.
Footnotes
1. The function 'define_extract_agg()' has replaced 'define_extract_nhgis()'.
2. The function 'read_ipums_agg()' has replaced 'read_nhgis()'.