Digitizing and Exploring Qatar’s Population Censuses

By Shine Min Thant

Qatar, a small yet influential state in the Middle East, is a compelling case study for demographic research because of its rapid development over the past thirty years. Qatar occupies a peninsula only slightly larger than the U.S. state of Rhode Island that juts out into the Persian Gulf from its border with Saudi Arabia. The country has experienced relatively rapid economic growth since the late 20th century, mainly due to its vast reserves of natural gas and oil. This newfound wealth allowed Qatar to invest heavily in healthcare, infrastructure, and education, making the country an ideal case study for social change and development. Additionally, a recent surge in Qatar’s immigrant population (which constitutes over 78 percent of the population) makes the country well suited to the study of social mobility.

As part of the ISRDI Diversity Fellowship Program, I worked with Dr. Tracy Kugler, Professor Steven Manson, Professor Evan Roberts, and undergraduate student Rawan AlGahtani on a project to examine how Qatar has changed, using census data from 1984, 1997, and 2004. Summary tables from all three censuses were previously available only as printed documents. As a first step, we needed to transform the data from a hard-to-access printed format into the widely accessible IPUMS IHGIS format. This process included multiple steps, from optical character recognition (OCR) to data quality checks using R scripts (Figure 1).

Figure 1: IPUMS IHGIS Workflow

A workflow schematic showing how summary tables and source shapefiles are converted into consistent, machine-readable formats via IPUMS IHGIS
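One kind of data quality check mentioned above can be sketched in a few lines. This is a minimal Python illustration, not the project's R scripts: it assumes (hypothetically) that each transcribed table row lists component counts, such as males and females, that should add up to a reported total, so an OCR misread shows up as a row whose parts no longer match its total. The row labels and numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class Row:
    label: str
    total: int
    components: dict[str, int]


def failing_rows(rows: list[Row], tolerance: int = 0) -> list[tuple[str, int]]:
    """Return (label, discrepancy) for rows whose components don't sum to the total."""
    problems = []
    for row in rows:
        diff = sum(row.components.values()) - row.total
        if abs(diff) > tolerance:
            problems.append((row.label, diff))
    return problems


if __name__ == "__main__":
    # Hypothetical rows; the second simulates an OCR misread (580 read as 530).
    table = [
        Row("Area A", 1200, {"male": 700, "female": 500}),
        Row("Area B", 980, {"male": 530, "female": 400}),
    ]
    for label, diff in failing_rows(table):
        print(f"{label}: components differ from total by {diff}")
```

A nonzero `tolerance` can be used where published tables are known to contain small rounding differences.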


Bivariate Proportional Symbol Maps, Part 2: Design Tips with Instructions for ArcGIS Pro

By Jonathan Schroeder, IPUMS Research Scientist, NHGIS Project Manager

How to make effective bivariate proportional symbol maps

A map of the share of population under age 18 in the Miami area in 2020. There is one colored circle for each census tract. There are five colors ranging from dark blue (representing less than 15% under age 18) to light green (representing 20 to 25% under age 18) to brown (representing 30% or more). The circle sizes correspond to tract populations. Most circles have similar sizes, representing around 1,000 to 10,000 people. The circles cluster together forming groups where there are more tracts and more people. The circles in central Miami and along the coast are bluer than elsewhere.
A bivariate proportional symbol map.

In Part 1 of this blog series, I introduced bivariate proportional symbol maps and shared some examples to demonstrate their advantages. In short, when they’re well designed, they can make it easy to see multiple dimensions of a population all at once: size, composition, and spatial distribution.

A key part of that statement is, “when they’re well designed.” Standard mapping tools can make it easy to get started, but getting all the way to a good design still takes some extra effort.

In this Part 2 post, I discuss some key design considerations for bivariate proportional symbol maps, and I provide specific instructions to help you get to a good design.

Software considerations

I used Esri’s ArcGIS Pro to create the examples here and in Part 1. The design tips I share below should be relevant for any mapping tool, but my instructions are specifically for ArcGIS Pro (version 3.2). I expect there are ways to achieve similar designs with QGIS, R, Python, etc., quite possibly more easily than with ArcGIS Pro. I can only say that it’s easier to create effective bivariate proportional symbols now with ArcGIS Pro than it was with its predecessor ArcMap.

As I proceed, I’ll flag which instructions pertain specifically to ArcGIS Pro. All other tips are “tool neutral.”

General tip: Match size to “size” and color to “character”

When selecting which features to map, a framework that works consistently well is to use symbol color to represent an intensive property—e.g., the share of population under 18 years old, average household size, median household income, or the share of votes cast for a candidate—and use symbol size to represent the number of cases to which the intensive property pertains—e.g., the total population (when color corresponds to a population share) or the count of households (when color corresponds to average household size or median household income).

This framework enables the map to illustrate both the spatial distribution of the mapped characteristics and the frequency distribution of the intensive property—e.g., not only where a candidate received large or small vote shares but also how many votes were cast in each of those areas. Other frameworks can also work well (e.g., see the change maps in Part 1), but it’s generally very helpful if the two mapped characteristics relate to each other in a way that corresponds intuitively with “size” and “color.”
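The framework above can be sketched in Python: color comes from classifying the intensive property (here, the percentage under age 18), and circle size comes from the count. A common proportional-symbol convention, assumed here, is to scale circle area rather than radius with the count, so the radius grows with the square root of the count. The five class labels mirror the legend of the Miami map described earlier, but the two intermediate breaks (15 to 20% and 25 to 30%), the 20-point maximum radius, and the tract values are assumptions; this is not a reproduction of the ArcGIS Pro settings.

```python
import math
from bisect import bisect_right

# Class breaks in percent under age 18. The <15, 20-25 and >=30 classes follow
# the Miami map legend; the other two breaks are assumed for illustration.
BREAKS = [15.0, 20.0, 25.0, 30.0]
LABELS = ["<15%", "15-20%", "20-25%", "25-30%", ">=30%"]


def color_class(share_pct: float) -> str:
    """Intensive property -> color class (a value on a break goes to the higher class)."""
    return LABELS[bisect_right(BREAKS, share_pct)]


def circle_radius(count: float, max_count: float, max_radius: float = 20.0) -> float:
    """Count -> radius, scaled so circle AREA is proportional to the count."""
    return max_radius * math.sqrt(count / max_count)


if __name__ == "__main__":
    # Hypothetical tracts: (total population, population under 18)
    tracts = [(10_000, 1_200), (2_500, 600), (5_000, 1_600)]
    largest = max(pop for pop, _ in tracts)
    for pop, kids in tracts:
        share = 100 * kids / pop
        print(f"pop={pop:>6}  share={share:4.1f}%  class={color_class(share):>6}  "
              f"radius={circle_radius(pop, largest):.1f}")
```

With area scaling, a tract with a quarter of the largest tract's population gets half the maximum radius, which keeps the visual weight of each circle in line with the number of people it represents.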


Bivariate Proportional Symbol Maps, Part 1: An Introduction

By Jonathan Schroeder, IPUMS Research Scientist, NHGIS Project Manager

A powerful, underused mapping technique

The world could use a lot more bivariate proportional symbol maps. These maps pair two basic visual variables—size and (usually) color—to symbolize two characteristics of mapped features. When designed well, they convey multiple key dimensions of a population all at once: size and composition as well as spatial distribution and density.

A map of the share of population under age 18 in the Miami area in 2020. There is one colored circle for each census tract. There are five colors ranging from dark blue (representing less than 15% under age 18) to light green (representing 20 to 25% under age 18) to brown (representing 30% or more). The circle sizes correspond to tract populations. Most circles have similar sizes, representing around 1,000 to 10,000 people. The circles cluster together forming groups where there are more tracts and more people. The circles in central Miami and along the coast are bluer than elsewhere.
A bivariate proportional symbol map.

Unfortunately, standard mapping software hasn’t made it easy to create good versions of these maps, and most introductions to statistical mapping stick to simpler strategies. As a result, bivariate proportional symbols aren’t used very often. With few examples and little guidance to go on, it’s understandable that mapmakers don’t realize how often they’re a viable, well-suited option.

This two-part blog series aims to spark more interest by providing a “few examples” (Part 1) and a “little guidance” (Part 2).

Picking up where I left off

In a previous blog post, I shared an example of a bivariate proportional symbol map and described some of the technique’s advantages. But that post focuses on a mapping resource (census centers of population) rather than on mapping techniques. Most of the examples in the post are also simply “proportional symbol maps,” without the more intriguing “bivariate” part.

To close that post, I suggested “a tantalizing next step” would be to use bivariate proportional symbols with small-area data (for census tracts or block groups), and I shared a few technical notes and design tips without much detail. I later expanded on those ideas in a conference talk, sharing some new examples with small-area data and going a little deeper with design tips.

In these new posts, I’m sharing and building on the examples and tips from the conference talk.


Geospatial Contextuals from IPUMS International

By Ryan Gavin & Quinn Heimann

IPUMS International launched a new platform that will aid researchers using geospatial contextual data along with IPUMS International census microdata!

What is geospatial contextual data?

Geospatial contextual data describe features of the physical and social environment of a geographic area, and allow users to explore how contextual factors interrelate with individual characteristics and outcomes. For example, in their 2020 paper in Global Environmental Change, Mueller et al. estimated the effects that climate-related variables had on migration in Botswana, Kenya, and Zambia between 1989 and 2011. Often, however, these data are large, complex, and packaged in unfamiliar ways. With this new platform, IPUMS International simplifies the process of identifying and linking contextual data with our robust repository of census microdata.

Geospatial contextual data can vary across space, time, or both, and often do not follow administrative boundaries. IPUMS International is unique in offering spatiotemporally harmonized administrative geography variables, which, when linked to time-varying contextual data, allow researchers to explore the relationship between social phenomena and temporally dynamic geospatial data using a consistent spatial footprint.

For example, researchers might be interested in studying how changing January precipitation in Bangladesh from 1991 to 2011 is associated with social or demographic variables. In this case, harmonized geographic variables are ideal because Bangladesh’s administrative boundaries changed between 2001 and 2011, so year-specific units do not cover the same areas in every census year.

Maps of Bangladesh in 1991, 2001, and 2011 showing total January precipitation using year-specific geography and harmonized geography.
Bangladesh maps of January precipitation totals for each census year, illustrating the difference between year-specific and harmonized geography for measuring effects.
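One simple way to put year-specific values onto a consistent footprint is an area-weighted average through a crosswalk of overlapping areas. The Python sketch below assumes a hypothetical crosswalk of (year-specific unit, harmonized unit, overlap area) records and an intensive variable such as precipitation depth in millimeters. It illustrates the idea only; it is not how IPUMS International computes its contextual variables, and the unit names and values are invented.

```python
from collections import defaultdict


def harmonize(values: dict[str, float],
              crosswalk: list[tuple[str, str, float]]) -> dict[str, float]:
    """Area-weighted mean of an intensive variable for each harmonized unit.

    values:    {year_specific_unit: value}, e.g. January precipitation in mm
    crosswalk: (year_specific_unit, harmonized_unit, overlap_area) records
    """
    weighted_sum: dict[str, float] = defaultdict(float)
    total_area: dict[str, float] = defaultdict(float)
    for src, dst, overlap in crosswalk:
        weighted_sum[dst] += values[src] * overlap
        total_area[dst] += overlap
    return {dst: weighted_sum[dst] / total_area[dst] for dst in weighted_sum}


# Hypothetical example: harmonized unit "H1" was split into "A" and "B"
# in a later census; weighting by overlap area recovers one value for H1.
print(harmonize({"A": 10.0, "B": 20.0}, [("A", "H1", 100.0), ("B", "H1", 300.0)]))
```

A count variable (such as a population total) would instead be summed, split by area share, rather than averaged.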


Accessing IPUMS NHGIS in R: A Primer

By Finn Roberts & Jonathan Schroeder

R users have a powerful new way to access IPUMS NHGIS!

The July 2023 release of ipumsr 0.6.0 includes a fully-featured set of client tools enabling R users to get NHGIS data and metadata via the IPUMS API. Without leaving their R environment, users can find, request, download, and read in U.S. census summary tables, geographic time series, and GIS mapping files for years from 1790 through the present. This blog post gives an overview of the possibilities and describes how to get started.

What you can do with ipumsr

Request and download NHGIS data

You can use ipumsr to specify the parameters of an NHGIS data extract request and submit that request for processing by the IPUMS servers. You can request any of the data products that are available through the NHGIS Data Finder: summary tables, time series tables, and shapefiles. You can also specify general formatting parameters (e.g., file format or time series table layout) to customize the structure of your data extract.

Once you have specified a data extract, you can use a series of ipumsr functions to:

  • submit the extract request to the IPUMS servers for processing
  • check on the extract status
  • wait for the extract to complete
  • download the extract as soon as it’s ready
  • load the data into R with detailed data field descriptions

This workflow allows you to go from a set of abstract NHGIS data specifications to analyzable data, all without having to leave your R session!
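The submit, check, wait, and download steps above follow a common polling pattern. In ipumsr itself these steps are handled by R functions such as `submit_extract()`, `wait_for_extract()`, and `download_extract()`; the Python sketch below is not that interface. It uses a hypothetical client with `submit`, `status`, and `download` methods and assumed status strings (`"completed"`, `"failed"`), just to show the control flow with a doubling wait between checks.

```python
import time
from typing import Callable, Protocol


class ExtractClient(Protocol):
    """Hypothetical extract API client; NOT the ipumsr interface."""

    def submit(self, spec: dict) -> int: ...
    def status(self, extract_id: int) -> str: ...
    def download(self, extract_id: int) -> str: ...


def run_extract(
    client: ExtractClient,
    spec: dict,
    poll_seconds: float = 10.0,
    max_wait: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Submit an extract, poll until it finishes, then download it."""
    extract_id = client.submit(spec)  # submit the request
    waited, delay = 0.0, poll_seconds
    while True:
        status = client.status(extract_id)  # check on the status
        if status == "completed":
            return client.download(extract_id)  # download once ready
        if status == "failed":
            raise RuntimeError(f"extract {extract_id} failed")
        if waited >= max_wait:
            raise TimeoutError(f"extract {extract_id} not ready after {waited:.0f} s")
        sleep(delay)  # wait, doubling the delay up to 5 minutes
        waited += delay
        delay = min(delay * 2, 300.0)


class FakeClient:
    """Hypothetical stand-in that reports 'completed' on the third status check."""

    def __init__(self) -> None:
        self.checks = 0

    def submit(self, spec: dict) -> int:
        return 1

    def status(self, extract_id: int) -> str:
        self.checks += 1
        return "completed" if self.checks >= 3 else "queued"

    def download(self, extract_id: int) -> str:
        return f"extract_{extract_id}.zip"


if __name__ == "__main__":
    print(run_extract(FakeClient(), {"datasets": ["hypothetical"]}, sleep=lambda s: None))
```

Injecting `sleep` keeps the loop testable without real waiting; the backoff limits how often a long-running extract is polled.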


IPUMS IHGIS: Unlocking International Population and Agricultural Census Data

By Tracy Kugler

Nearly all countries throughout the world conduct population and housing censuses at least every ten years, and most also conduct agricultural censuses or surveys regularly. These censuses collect information on demographics, education, employment, housing characteristics, migration, agricultural land ownership, agricultural workforce, livestock, crops, and more. The resulting data can be used to study a wide range of questions, from the character of demographic transitions within and across countries, to utilization of irrigation, to educational trends among women. 

Unfortunately, this wealth of data has remained largely inaccessible to researchers. The data are typically published in reports as tables summarizing population characteristics. In recent decades, many of these reports have been published as PDF documents and made available on national statistical office websites. While the reports are available, data in a PDF document cannot easily be imported into a statistical or GIS package. Furthermore, table structures vary widely across countries and even within a single report.
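One generic way to cope with heterogeneous table layouts is to reshape every table into the same long form of (geography, variable, value) records. The Python sketch below illustrates that idea on two invented tables whose geography column sits in different positions. It is not the IHGIS data format or pipeline, and it assumes that commas inside cells are thousands separators.

```python
def to_long(header: list[str], rows: list[list[str]],
            id_col: int = 0) -> list[tuple[str, str, int]]:
    """Reshape a wide table into (geography, variable, value) records.

    Assumes commas in cells are thousands separators, e.g. "1,200" -> 1200.
    """
    records = []
    for row in rows:
        geo = row[id_col]
        for j, (name, cell) in enumerate(zip(header, row)):
            if j == id_col:
                continue  # the geography column is an identifier, not a value
            records.append((geo, name, int(cell.replace(",", ""))))
    return records


# Two differently laid-out hypothetical tables end up in the same shape.
by_sex = to_long(["Region", "Male", "Female"], [["North", "1,200", "1,150"]])
by_age = to_long(["0-14", "15-64", "Region"], [["300", "900", "North"]], id_col=2)
print(by_sex + by_age)
```

Once tables share one shape, the same loading and checking code can be reused regardless of how each source report laid out its columns.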

The International Historical Geographic Information System (IPUMS IHGIS) is designed to provide these data in a form that researchers can readily use for analysis. In the early phases, IHGIS was known internally as “Project Mako,” named after the Mako shark, which has a global range, voracious appetite, and a reputation for a broad-ranging diet. Like the shark, IHGIS (née Project Mako) will encompass the world and ingest all kinds of data tables.


In the Archive: “25 Years of IPUMS Data”

“25 Years of IPUMS Data,” the current IPUMS/MPC archive exhibit, highlights a dynamic quarter-century history of data innovation at the University of Minnesota. In the late 1980s, the Social History Research Laboratory at the University of Minnesota’s History Department proposed “the creation of a single integrated microdata series composed of public use samples for every year … with the exception of the 1890 census, which was destroyed by fire.” The primary aim was to make the U.S. census microdata “as compatible over time as possible while losing little, if any, of the detail in the original datasets” (Integrated Public Use Microdata Series: A Prospectus).
