IPUMS International has released new data! Eighteen new census samples have been added to the collection, including data from Côte d’Ivoire, which is new to IPUMS International. Newly released census samples include Cambodia (2019), Côte d’Ivoire (1988, 1998), Denmark (1845, 1880, 1885), Laos (1995, 2015), Mexico (2020), Peru (2017), Puerto Rico (2015, 2020), Switzerland (2011), United Kingdom (1961, 1971), United States (2015, 2020) and Vietnam (2019). As always, we gratefully acknowledge the national statistical offices of all the countries partnering with IPUMS International to make data available for research.
New geography variables are also now available with harmonized migration variables at the second administrative level; the codes for the newly released migration variables match existing IPUMS International geography codes and labels. As an example, the geographic units in the migration variable for Mexico at the municipio level (place of residence 5 years ago, MIG2_5_MX) are reconciled to the boundaries for place of current residence (GEO2_MX).
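Because the migration and current-residence variables share one coding scheme, a five-year move between municipios can be flagged by comparing the two codes directly. Here is a minimal sketch, assuming an extract has already been read into a list of dictionaries. The numeric codes are invented for illustration, and real extracts may also carry special codes (for example, for unknown or foreign residence) that should be handled according to the variable documentation.

```python
# Hypothetical person records; the codes below are invented, not real GEO2_MX values.
persons = [
    {"GEO2_MX": 484001001, "MIG2_5_MX": 484001001},  # same unit 5 years ago
    {"GEO2_MX": 484001001, "MIG2_5_MX": 484009015},  # different unit 5 years ago
    {"GEO2_MX": 484009015, "MIG2_5_MX": 484009015},  # same unit 5 years ago
]


def moved_between_units(person):
    """Return True when residence 5 years ago differs from current residence.

    This comparison is only meaningful because both variables use the
    same reconciled boundaries and codes.
    """
    return person["MIG2_5_MX"] != person["GEO2_MX"]


movers = [p for p in persons if moved_between_units(p)]
print(f"{len(movers)} of {len(persons)} sample records changed units")
```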
In October 2023, the World Health Organization stated, “3.6 billion people already live in areas highly susceptible to climate change. Between 2030 and 2050, climate change is expected to cause approximately 250,000 additional deaths per year, from undernutrition, malaria, diarrhea and heat stress alone.”
The Demographic and Health Surveys (DHS) are an ideal source for research on the health effects of climate change. Since the 1980s, the DHS has collected a broad range of nationally representative health data from over 90 countries. With supplemental funding from NICHD, harmonized DHS data from IPUMS (dhs.ipums.org) is now doing more to support research on the effects of climate change on health. We are adding new contextual variables; we are integrating data from Malaria Indicator Surveys (MIS); and we are offering guidance through the new Climate Change and Health Research Hub.
Sound research on climate change and health requires combining social science and health data with natural science data. While social scientists and public health researchers have considerable experience analyzing health survey data, few have been trained in simultaneously employing data on environmental factors. This knowledge gap is addressed by the Climate Change and Health Research Hub, under the leadership of Dr. Kathryn Grace and Senior Data Analyst Finn Roberts.
A Public Use Microdata Area (PUMA) is a type of geographic unit created for statistical purposes. PUMAs represent geographic areas with a population size of 100,000–200,000 within a state (PUMAs cannot cross state lines). PUMAs are the smallest level of geography available in American Community Survey (ACS) microdata. They are designed to protect respondent confidentiality while simultaneously allowing analysts to produce estimates for small geographic areas.
Every ten years, the decennial census results are used to redefine ACS PUMA boundaries to account for shifts in population and continue to maintain respondent confidentiality. This process is intended to yield geographic definitions that are meaningful to many stakeholders.
Most recently, new PUMAs were created based on the 2020 Census; these 2020 PUMAs were implemented in the ACS starting in the 2022 data year. Although Public Use Microdata Area components remain consistent to the extent possible, they are updated based on census results and revised criteria. Therefore, they are not directly comparable with PUMAs from any previous ACS data years. For example, the 2020 PUMAs used in the 2022 data year are distinct from the 2010 PUMAs, which were used in the 2012–2021 ACS data years.
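For analysts pooling several ACS data years, the rule above can be written as a small lookup. This is a rough sketch covering only the year ranges stated here; the function names are hypothetical, and years outside those ranges return None rather than a guess.

```python
def puma_vintage(acs_data_year):
    """Return the PUMA vintage used in a given ACS data year.

    Covers only the ranges described above: 2010 PUMAs for the
    2012-2021 data years, and 2020 PUMAs starting in 2022.
    """
    if 2012 <= acs_data_year <= 2021:
        return 2010
    if acs_data_year >= 2022:
        # Assumption: no later redefinition has taken effect yet.
        return 2020
    return None  # earlier data years are not covered by this sketch


def pumas_directly_comparable(year_a, year_b):
    """True when both data years use the same known PUMA vintage."""
    vintage_a = puma_vintage(year_a)
    vintage_b = puma_vintage(year_b)
    return vintage_a is not None and vintage_a == vintage_b


print(pumas_directly_comparable(2019, 2021))
print(pumas_directly_comparable(2021, 2022))
```

A check like this is useful before merging or comparing PUMA-level estimates across years, since identical PUMA codes from different vintages can refer to different areas.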
The 2020 PUMAs were created based on definitions that include two substantive changes relative to the 2010 PUMAs:
1) An increase in the minimum population threshold for partial counties (PUMA-county parts) from 2,400 to 10,000. Increasing the population minimum for a PUMA-county part aims to further protect the confidentiality of respondents. However, exceptions are allowed on a case-by-case basis, either to maintain the stability of PUMA definitions that were based on the previous minimum of 2,400 or to accommodate unique geography.
2) Allowing noncontiguous geographic areas. Allowing PUMAs to include noncontiguous geographic areas aims to avoid unnecessarily splitting up demographic groups in order to provide more meaningful data. This change is not intended to create highly fragmented PUMAs.
Other than the two changes listed above, PUMA criteria remained consistent, such as treating 100,000 as a strict minimum population size for PUMAs. A PUMA's population can exceed 200,000 in certain instances due to expected population declines or geographic constraints.
By Jonathan Schroeder, IPUMS Research Scientist, NHGIS Project Manager
How to make effective bivariate proportional symbol maps
In Part 1 of this blog series, I introduced bivariate proportional symbol maps and shared some examples to demonstrate their advantages. In short, when they’re well designed, they can make it easy to see multiple dimensions of a population all at once: size, composition, and spatial distribution.
A key part of that statement is, “when they’re well designed.” Standard mapping tools can make it easy to get started, but getting all the way to a good design still takes some extra effort.
In this Part 2 post, I discuss some key design considerations for bivariate proportional symbol maps, and I provide specific instructions to help you get to a good design.
Software considerations
I used Esri’s ArcGIS Pro to create the examples here and in Part 1. The design tips I share below should be relevant for any mapping tool, but my instructions are specifically for ArcGIS Pro (version 3.2). I expect there are ways to achieve similar designs with QGIS, R, Python, etc., quite possibly more easily than with ArcGIS Pro. I can only say that it’s easier to create effective bivariate proportional symbols now with ArcGIS Pro than it was with its predecessor ArcMap.
As I proceed, I’ll flag which instructions pertain specifically to ArcGIS Pro. All other tips are “tool neutral.”
General tip: Match size to “size” and color to “character”
When selecting which characteristics to map, a framework that works consistently well is to use symbol color to represent an intensive property, such as the share of population under 18 years old, average household size, median household income, or the share of votes cast for a candidate. Then use symbol size to represent the number of cases to which the intensive property pertains, such as the total population (when color corresponds to a population share) or the count of households (when color corresponds to average household size or median household income).
This framework enables the map to illustrate both the spatial distribution of the mapped characteristics and the frequency distribution of the intensive property—e.g., not only where a candidate received large or small vote shares but also how many votes were cast in each of those areas. Other frameworks can also work well (e.g., see the change maps in Part 1), but it’s generally very helpful if the two mapped characteristics relate to each other in a way that corresponds intuitively with “size” and “color.”
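Outside of any particular mapping tool, the pairing of size with a count and color with an intensive property can be sketched in a few lines. The example below assumes the common convention of scaling symbol area (not radius) in proportion to the count, so the radius grows with the square root of the count. The function names, the maximum radius, and the sample data are all hypothetical.

```python
import math


def symbol_radius(count, max_count, max_radius=20.0):
    """Radius for a circle whose AREA is proportional to count.

    The largest count gets max_radius; other radii scale with the
    square root of their count relative to the largest.
    """
    if max_count <= 0 or count < 0:
        raise ValueError("counts must be non-negative and max_count positive")
    return max_radius * math.sqrt(count / max_count)


def share(part, total):
    """Intensive property for the color ramp, e.g. share under 18."""
    return part / total if total else None


# Invented areas: total population drives size, share under 18 drives color.
areas = [
    {"name": "Area A", "population": 4000, "under_18": 1000},
    {"name": "Area B", "population": 1000, "under_18": 400},
]

max_population = max(a["population"] for a in areas)
symbols = [
    {
        "name": a["name"],
        "radius": symbol_radius(a["population"], max_population),
        "color_value": share(a["under_18"], a["population"]),
    }
    for a in areas
]
```

With area-proportional scaling, an area with one quarter of the largest population gets half the largest radius, which keeps small symbols legible without letting large ones dominate the map.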
By Jonathan Schroeder, IPUMS Research Scientist, NHGIS Project Manager
A powerful, underused mapping technique
The world could use a lot more bivariate proportional symbol maps. These maps pair two basic visual variables—size and (usually) color—to symbolize two characteristics of mapped features. When designed well, they convey multiple key dimensions of a population all at once: size and composition as well as spatial distribution and density.
Unfortunately, standard mapping software hasn’t made it easy to create good versions of these maps, and most introductions to statistical mapping stick to simpler strategies. As a result, bivariate proportional symbols aren’t used very often. With few examples and little guidance to go on, it’s understandable that mapmakers don’t realize how often they’re a viable, well-suited option.
This two-part blog series aims to spark more interest by providing a “few examples” (Part 1) and a “little guidance” (Part 2).
Picking up where I left off
In a previous blog post, I shared an example of a bivariate proportional symbol map and described some of the technique’s advantages. But that post focuses on a mapping resource (census centers of population) rather than on mapping techniques. Most of the examples in the post are also simply “proportional symbol maps,” without the more intriguing “bivariate” part.
To close that post, I suggested “a tantalizing next step” would be to use bivariate proportional symbols with small-area data (for census tracts or block groups), and I shared a few technical notes and design tips without much detail. I later expanded on those ideas in a conference talk, sharing some new examples with small-area data and going a little deeper with design tips.
In these new posts, I’m sharing and building on the examples and tips from the conference talk.
The newest IPUMS data collection, IPUMS MICS, has many similarities with other IPUMS microdata collections. However, there is one major difference: the IPUMS MICS Data Extract System only uses Stata.
Yes, you read that right. Users of IPUMS MICS must use Stata to open and create their customized data file.
Let’s start with how using IPUMS MICS is the same as using other IPUMS microdata collections.
If you are an IPUMS user, you will find the process of browsing the variables, looking at documentation, and adding samples to your data cart completely familiar. If you are not familiar with IPUMS, you can read more about browsing and selecting variables.
However, when you finish choosing variables and samples in IPUMS MICS and click “Create Extract,” things start to look different.
Normally, you could change the data format, but the only option currently available for IPUMS MICS is a .dat (fixed-width text) file format.
Geospatial contextual data describe features of the physical and social environment of a geographic area, and allow users to explore how contextual factors interrelate with individual characteristics and outcomes. For example, in their 2020 paper in Global Environmental Change, Mueller et al. estimated the effects that climate-related variables had on migration in Botswana, Kenya, and Zambia between 1989 and 2011. Often, however, these data are large, complex, and packaged in unfamiliar ways. With this new platform, IPUMS International simplifies the process of identifying and linking contextual data with our robust repository of census microdata.
Geospatial contextual data can vary across space, time, or both, and often do not follow administrative boundaries. IPUMS International is unique in offering spatiotemporally harmonized administrative geography variables, which, when linked to time-variant contextual data, allow researchers to explore the relationship between social phenomena and temporally dynamic geospatial data using a consistent spatial footprint.
For example, researchers might be interested in studying how changing January precipitation in Bangladesh from 1991 to 2011 is associated with social or demographic variables. In this case, harmonized geographic variables are ideal because of administrative boundary changes in Bangladesh between 2001 and 2011.
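The linking step in an example like this amounts to a join on the harmonized unit code and the census year. The sketch below illustrates the idea with invented values; the variable name GEO2_BD, the unit codes, the precipitation figures, and the output field JAN_PRECIP_MM are all assumptions for illustration, not values from an actual extract.

```python
# Hypothetical January precipitation (mm), keyed by (harmonized unit code, census year).
# Because the unit codes are harmonized, the same code refers to the same
# spatial footprint in every year.
january_precip = {
    (50010, 1991): 8.2,
    (50010, 2001): 6.9,
    (50010, 2011): 5.1,
    (50020, 1991): 11.4,
    (50020, 2011): 9.8,
}

# Hypothetical person records from a pooled microdata extract.
persons = [
    {"YEAR": 1991, "GEO2_BD": 50010, "AGE": 34},
    {"YEAR": 2011, "GEO2_BD": 50010, "AGE": 27},
    {"YEAR": 2001, "GEO2_BD": 50020, "AGE": 45},  # no contextual value for this year
]


def attach_context(person, context, geo_var="GEO2_BD"):
    """Return a copy of the record with the contextual value attached.

    Records with no matching (unit, year) entry get None, so that
    unmatched cases stay visible instead of being dropped silently.
    """
    key = (person[geo_var], person["YEAR"])
    return {**person, "JAN_PRECIP_MM": context.get(key)}


linked = [attach_context(p, january_precip) for p in persons]
```

Keeping unmatched records with a missing value, rather than dropping them, makes it easier to spot gaps in the contextual data before analysis.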