2021 IPUMSI New Data Release Highlights

Map depicting where IPUMSI has data

IPUMS International has added 19 new census samples and new labor force surveys. For the first time, the collection includes data from four countries on four different continents: Finland, Mauritius, Myanmar, and Suriname. Other newly added samples extend existing series. Another first is the addition of labor force surveys from Spain and Italy. See a summary of the full IPUMS International collection on the IPUMSI samples page.

In addition to the new data, check out the usability enhancements included in this release:

  • Spatially harmonized migration variables
  • New work variables that maximize the utility of the newly harmonized labor force surveys
  • New disability variables following the Washington Group’s recommendations
  • Access to harmonization tables and code for registered IPUMS data users
  • Population density variables for all samples with the requisite geography: POPDENSGEO1 and POPDENSGEO2 capture the population density, in persons per square kilometer, of the household’s first- and second-level administrative units, respectively.
  • Area variables AREAMOLLWGEO1 and AREAMOLLWGEO2 for the same units, provided for additional convenience
  • New lower-level single-sample variables for select countries, as well as regionalized variables and shapefiles at the third administrative level for Senegal 2013 and 2002, South Africa 2016, 2011, and 2007, Uganda 2014, and Myanmar 2014
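As a rough illustration of the density definition above (persons per square kilometer of a geographic unit), the sketch below divides a weighted population total for each unit by that unit’s area. It assumes person-level records carrying a unit code and a person weight, plus an area lookup such as AREAMOLLWGEO1 provides; the function name, unit codes, and numbers are all hypothetical, and IPUMS may compute the published POPDENSGEO variables differently (for example, from full-count totals rather than sample weights).

```python
from collections import defaultdict

def density_by_unit(persons, area_km2):
    """Return persons per square kilometer for each geographic unit.

    persons: iterable of (unit_code, person_weight) pairs, one per person record.
    area_km2: dict mapping unit_code -> area in square kilometers.
    """
    totals = defaultdict(float)
    for unit, weight in persons:
        totals[unit] += weight  # weighted population estimate for the unit
    return {unit: totals[unit] / area_km2[unit] for unit in totals}

# Hypothetical records: three weighted person records in unit "01", one in "02".
records = [("01", 1000.0), ("01", 1500.0), ("01", 500.0), ("02", 2000.0)]
areas = {"01": 150.0, "02": 400.0}
print(density_by_unit(records, areas))  # {'01': 20.0, '02': 5.0}
```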

Stay tuned for future IPUMS International releases, which will include population density variables for lower-level geography, more third-level geography variables for existing IPUMS International countries, and additional labor force surveys. In the meantime, be sure to share what you’re doing with IPUMS data with us on Twitter @ipumsi!

Locating Dimensions of Women’s Empowerment in Family Planning in Burkina Faso

By Tayler Nelson

Women’s “empowerment,” defined by Naila Kabeer[1] as “the expansion of people’s ability to make strategic life choices in a context where this ability was previously denied to them,” has been shown[2] to be associated with greater birth spacing, lower fertility, and lower rates of unplanned pregnancy. Yet scholars disagree[3] on how to measure women’s empowerment, and meanings of empowerment can shift across geographic and cultural contexts.

IPUMS PMA’s family planning surveys include variables that can help researchers investigate dimensions of women’s empowerment in family planning. All samples include indicators of women’s knowledge about family planning methods. Many survey rounds dig deeper, collecting data on family planning attitudes, beliefs, and decision-making that researchers and policymakers can use to study empowerment.

The Burkina Faso 2018 Round 6 survey includes a range of variables measuring family planning attitudes, beliefs, and decision-making dynamics that relate to women’s empowerment. I used a weighted polychoric factor analysis[4] to investigate women’s empowerment in family planning in Burkina Faso. Factor analysis helps researchers reduce a large number of observed variables to a smaller set of underlying variables, or factors, by identifying observed variables with similar response patterns and grouping them together. By analyzing how variables group together, and the strength and sign of each variable’s loading on each factor, researchers can glean insight into which sets of observed variables might best measure an unobserved construct such as women’s empowerment.
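To make the idea of reducing many observed variables to a few factors more concrete, here is a minimal NumPy sketch. It is not the analysis described in this post: it uses weighted Pearson correlations on synthetic continuous data instead of polychoric correlations, extracts principal-component factors, and skips the promax rotation. The function names and data are hypothetical; the Stata code linked at the end of this article is the reference for the actual analysis.

```python
import numpy as np

def weighted_corr(X, w):
    """Weighted Pearson correlation matrix for the columns of X (n rows x p columns)."""
    w = np.asarray(w, dtype=float)
    w = w / w.sum()                       # normalize weights to sum to 1
    centered = X - w @ X                  # subtract weighted column means
    cov = centered.T @ (centered * w[:, None])
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)

def pc_factor_loadings(R, n_factors):
    """Eigenvalues (largest first) and principal-component factor loadings of R."""
    eigvals, eigvecs = np.linalg.eigh(R)  # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs[:, :n_factors] * np.sqrt(eigvals[:n_factors])
    return eigvals, loadings

# Synthetic data: two latent traits, three noisy indicators of each.
rng = np.random.default_rng(0)
n = 2000
latent = rng.normal(size=(n, 2))
X = np.column_stack(
    [latent[:, 0] + rng.normal(scale=0.5, size=n) for _ in range(3)]
    + [latent[:, 1] + rng.normal(scale=1.0, size=n) for _ in range(3)]
)
weights = rng.uniform(0.5, 2.0, size=n)   # stand-in for survey sampling weights

R = weighted_corr(X, weights)
eigvals, loadings = pc_factor_loadings(R, n_factors=2)
print(np.round(eigvals, 2))   # eigenvalues above 1 suggest factors worth keeping
print(np.round(loadings, 2))  # each block of three indicators loads on its own factor
```

Printing the sorted eigenvalues is also the raw material for a scree plot; the "eigenvalue above 1" cutoff (the Kaiser criterion) is one common rule of thumb for how many factors to retain.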

After reviewing the literature on women’s empowerment in family planning, I selected nineteen PMA variables to capture dimensions of women’s empowerment in family planning in Burkina Faso. These included social context variables (URBAN, WEALTHQ, EDUCATTGEN), whether the woman is a current/recent family planning user (FPCURRECUSER), whether the woman has heard of family planning on television (FPTVHR) or radio (FPRADIOHR), and belief and attitude variables that use a Likert scale to measure how much the woman agrees with a particular statement about family planning (SAFEDISCKID, SAFEDISCFP, CONFLICTFP, DAMRELFP, NEGOTIATEKIDS, BELIEFCARRYPREG, BELIEFDAUPREG, AGREESPACE, AGREELIMIT, AGREECONTR, AGREEFP, AGREEPARTFP). I also combined two variables that measure who is the primary decision-maker in deciding to use (FPDECIDER) or not to use (DECNOFPUSE) family planning into a single ordinal variable (DECIDER) that reports how much input a woman has in the couple’s family planning decisions. I excluded cases that were not in universe (NIU) or missing, as well as women who were not married or partnered; my analytic sample contained 1,686 women. All analyses applied appropriate sampling weights.
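The post does not spell out how FPDECIDER and DECNOFPUSE were mapped onto DECIDER, so the sketch below is purely hypothetical. It assumes each woman answers at most one of the two questions (users about deciding to use, non-users about deciding not to use) and uses made-up text labels rather than the actual IPUMS PMA codes. It only illustrates the general idea of collapsing two decision-maker questions into one ordinal "how much say" score.

```python
# Hypothetical ordinal scale: a higher score means more say for the woman.
SAY_SCORE = {"partner": 0, "joint": 1, "respondent": 2}

def combine_decider(fpdecider, decnofpuse):
    """Combine two decision-maker answers into one ordinal DECIDER score.

    Assumes each woman answers at most one of the two questions; if both are
    present, the FPDECIDER answer is used. Returns None when neither answer is
    usable (NIU or missing), so the case can be dropped from the sample.
    """
    for answer in (fpdecider, decnofpuse):
        if answer in SAY_SCORE:
            return SAY_SCORE[answer]
    return None

print(combine_decider("joint", None))       # 1
print(combine_decider(None, "respondent"))  # 2
print(combine_decider("missing", None))     # None
```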

I retained three factors[5] and examined which characteristics or beliefs relate to each underlying factor by looking at how variables grouped and at their factor loadings. Table 1 displays all factor loadings above 0.3 for each variable; high loadings help identify what each underlying factor represents. I have marked moderate and strong factor loadings[6] in the table with an asterisk (*).
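The display rule used for Table 1 (show loadings above 0.3, and mark those of .60 or greater, per note [6]) is simple enough to express as a small helper. This sketch assumes the cutoffs apply to the absolute value of a loading, which is why a negative loading such as CONFLICTFP’s -.50 still appears in the table; the function name is hypothetical.

```python
def mark_loadings(loadings, show_above=0.3, strong_at=0.6):
    """Keep loadings with |value| > show_above; append ' *' when |value| >= strong_at."""
    marked = []
    for value in loadings:
        if abs(value) > show_above:
            text = f"{value:.2f}".replace("0.", ".")  # 0.66 -> ".66", -0.50 -> "-.50"
            marked.append(text + (" *" if abs(value) >= strong_at else ""))
    return marked

print(mark_loadings([0.66, 0.38]))  # ['.66 *', '.38']
print(mark_loadings([-0.50]))       # ['-.50']
print(mark_loadings([0.25]))        # []
```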

Table 1: Factor Loadings (promax rotation)

URBAN Urban/rural status .94 *
WEALTHQ Wealth score quintile .87 *
EDUCATTGEN Highest level of schooling attended .66 *
FPCURRECUSER Current or recent FP user .32
DECIDER Women’s level of say in decisions about whether to use/not use FP
SAFEDISCKID Safe to discuss when to have children w/ partner .66 *
SAFEDISCFP Safe to discuss FP w/ partner .71 *
CONFLICTFP Conflict in relationship if used FP -.50
DAMRELFP Delaying or limiting children would deteriorate relationship w/ partner -.44
NEGOTIATEKIDS Able to negotiate w/ partner when to stop having children .65 *
BELIEFCARRYPREG Thinks a woman should not get pregnant if child still on her back .60 *
BELIEFDAUPREG Thinks a woman should not get pregnant if daughter is pregnant .48
AGREESPACE Agrees with couple that uses FP to space births .45 .44
AGREELIMIT Agrees with couple that uses FP to limit births .52
AGREECONTR Agrees with man/woman that uses contraception .34
AGREEFP Agrees with couples that use FP .59 .39
AGREEPARTFP Partner agrees with couples that use FP .69 *
FPTVHR Heard about FP on TV .66 * .38
FPRADIOHR Heard about FP on the radio .33

Factor 1 includes the majority of the attitude and belief variables. I looked at similarities and differences between variables and their factor loadings[7] to interpret this factor. The variables with high loadings all relate to good spousal communication around family planning. For instance, SAFEDISCFP, which reports how much the respondent agrees that it is safe to discuss family planning with her partner, has the highest loading. Variables related to spousal conflict, such as CONFLICTFP, have weaker, negative loadings. Together, these results suggest that Factor 1 might be a latent measure of spousal communication. Alternative interpretations are possible: for example, this factor might simply represent support for family planning, with the high-loading variables being its most useful observed indicators.

For Factor 2, the high loadings for URBAN and WEALTHQ suggest that this factor captures socioeconomic status. Whether the woman has heard about family planning on TV (FPTVHR) also loads moderately on this factor, which is plausible if TV ownership reflects socioeconomic status.

For Factor 3, the loadings are relatively small except for BELIEFCARRYPREG, the belief that a woman should not get pregnant while she is still carrying a child on her back. In line with a 2014 Burkina Faso report that found popular acceptance of family planning for spacing rather than limiting births, and stigma around women who have children too close together, this factor might indicate adherence to traditional beliefs about family planning in Burkina Faso. AGREESPACE, which reports the respondent’s agreement with couples using contraception to space the births of their children, also loads on this factor. This factor may therefore reflect traditional beliefs, or more explicit support for child spacing.

This factor analysis points to several dimensions of empowerment relevant to family planning in Burkina Faso and suggests a handful of important variables (those with high factor loadings) that users may want to include in analyses of women’s empowerment. It also highlights other areas to explore. For example, I was surprised that DECIDER was not strongly associated with any single factor (though iterations of this analysis that measured joint decision-making showed a high loading alongside the other socioeconomic status variables in Factor 2). Researchers may want to use PMA data to investigate whether spousal communication, socioeconomic status, and geographically specific traditional beliefs remain important dimensions of family planning across different countries and years, or to look further into how decision-making aligns with these factors.

Get the Stata code to replicate this analysis: Stata_synatax_ipums.txt

[1] Kabeer, Naila. 2001. “Reflections on the Measurement of Women’s Empowerment.” In Discussing Women’s Empowerment: Theory and Practice. Stockholm: Swedish International Development Cooperation Agency (SIDA).

[2] Upadhyay, Ushma D. et al. 2014. “Women’s Empowerment and Fertility: A Review of the Literature.” Social Science & Medicine 115: 111-120.

[3] Prata, Ndola et al. 2017. “Women’s Empowerment and Family Planning: A Review of the Literature.” Journal of Biosocial Science 49(6): 713-743.

[4] I used the polychoric command in Stata to compute correlations that accommodate a mix of binary and ordinal variables in my analysis.

[5] I used eigenvalues and scree plots to determine how many factors to retain.

[6] I am defining moderate and strong factor loadings as those loadings of .60 and greater.

[7] I looked for common conceptual threads between the highest factor loadings (asking myself, ‘What latent factor might all of these variables be measuring?’). I also looked at weak and negative factor loadings to help interpret this factor.

IPUMS IHGIS: Unlocking International Population and Agricultural Census Data

By Tracy Kugler

Nearly all countries throughout the world conduct population and housing censuses at least every ten years, and most also conduct agricultural censuses or surveys regularly. These censuses collect information on demographics, education, employment, housing characteristics, migration, agricultural land ownership, agricultural workforce, livestock, crops, and more. The resulting data can be used to study a wide range of questions, from the character of demographic transitions within and across countries, to utilization of irrigation, to educational trends among women. 

Unfortunately, this wealth of data has remained largely inaccessible to researchers. The data are typically published in reports as tables summarizing population characteristics. In recent decades, many of these reports have been published as PDF documents and made available on national statistical office websites. Although the reports are available, data in a PDF document cannot easily be imported into a statistical or GIS package. Furthermore, table structures are highly heterogeneous, both across countries and within individual reports.

The International Historical Geographic Information System (IPUMS IHGIS) is designed to provide access to these data in a form researchers can readily use for analysis. In the early phases, IHGIS was known internally as “Project Mako,” named after the Mako shark, which has a global range, voracious appetite, and a reputation for a broad-ranging diet. Like the shark, IHGIS (née Project Mako) will encompass the world and ingest all kinds of data tables.


The initial version of IHGIS includes 270 tables from 9 population and housing censuses and 4 agricultural censuses. We plan to release new datasets several times a year. Our next release will include tables for an additional 12 datasets and is planned for early 2021. We have acquired over 30,000 data tables from 150 population and 107 agricultural censuses from 132 countries, which we will move through the processing pipeline over the next few years.

Datasets present/planned in the first two IHGIS data releases.

Our data collection efforts for population data have focused primarily on countries for which microdata are not yet available in IPUMS International. The geographic detail available with microdata is often limited due to confidentiality concerns associated with individual-level data. For several countries, notably Canada, Russia, and much of northern Europe, IPUMS International is only able to release first-level (e.g., province) identifiers. Summary tables raise far fewer confidentiality concerns, so IHGIS may be able to provide much more geographic detail, and we will focus on acquiring such data in future collection efforts.

You can explore the current collection through the IHGIS data finder, where you can filter by dataset, browse available tables, select the tables you are interested in, and download the data. Your extract will include consistently structured data tables in CSV format, ready for use in your analysis. You will also receive comprehensive metadata in both human- and machine-readable formats. For more information about how to use the data finder and interpret your extract, check out our User Guide.

IHGIS also provides GIS shapefiles delineating the boundaries of the geographic units described in the data tables. Each unit is identified with a unique code in both the data tables and shapefiles, allowing you to easily join them in a GIS package.
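Joining an extract’s data table to shapefile attributes is a standard attribute join on the shared unit code. The minimal sketch below uses only the Python standard library and stands in for what a GIS package (or a GeoDataFrame merge in geopandas) would do. The column name GEOCODE, the unit codes, and the values are hypothetical placeholders, not the actual IHGIS field names; check your extract’s metadata for the real identifier column.

```python
import csv
import io

def join_on_code(table_csv, shape_records, key):
    """Attach data-table columns to shapefile attribute records sharing `key`."""
    # Codes are read as text, which preserves any leading zeros.
    rows = {row[key]: row for row in csv.DictReader(io.StringIO(table_csv))}
    joined = []
    for record in shape_records:
        data = rows.get(record[key], {})  # unmatched units get no data columns
        joined.append({**record, **data})
    return joined

# Hypothetical extract table and shapefile attribute records.
table_csv = "GEOCODE,TOTAL_POP\nXX001,1200\nXX002,800\n"
shape_records = [
    {"GEOCODE": "XX001", "NAME": "North"},
    {"GEOCODE": "XX002", "NAME": "South"},
    {"GEOCODE": "XX003", "NAME": "East"},
]
joined = join_on_code(table_csv, shape_records, key="GEOCODE")
print(joined[0])  # {'GEOCODE': 'XX001', 'NAME': 'North', 'TOTAL_POP': '1200'}
```

Keeping unmatched units (like XX003 here) rather than dropping them makes gaps in coverage visible on a map instead of silently disappearing.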

IHGIS Under the Hood

Transforming data tables from the myriad structures in which they are published to the standardized IHGIS structure is no small task. Clearly, it would be impossible without substantial software infrastructure. But it is equally infeasible to completely automate the task of interpreting the contents of any given table. Therefore, the overarching philosophy of IHGIS data processing is to have computers do what computers are good at and have humans do what humans are good at. For example, it is relatively easy for a person to determine whether row headers identify geographic units or categories of marital status or educational attainment. Developing software to make that determination would be a significant challenge. On the other hand, having humans extract state-level totals from a table by copying and pasting is tedious, time-consuming, and error-prone.

The heart of the IHGIS data processing workflow is a table markup framework. Table markup uses Excel as an interface for a lightweight process through which researchers (mostly undergraduate research assistants) indicate the location of key structural elements within each table. For each table, students extract information such as the universe, time frame, and geographic extent. They then add keyword tags indicating the location of geographic unit headers, headers describing the characteristics summarized in the table, the table title, the extent of the data, and other structural elements.

Example of markup for a relatively simple table
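The division of labor described above (humans tag where the structural elements are, software does the extraction) can be illustrated with a toy example. The sketch below is hypothetical throughout: the tags "TITLE", "COLHEAD", and "DATA" are invented stand-ins for the IHGIS markup keywords, which work within Excel rather than as a separate tag column, and the table contents are made up. It shows only how, once a row is tagged, pulling out the title, headers, and values per geographic unit becomes mechanical.

```python
def parse_marked_table(rows):
    """Split a marked-up table into its title and data keyed by geographic unit.

    rows: list of (tag, cells) pairs. Tags are hypothetical stand-ins for
    markup keywords: "TITLE", "COLHEAD", "DATA", or "" (row ignored).
    """
    title, headers, data = None, [], {}
    for tag, cells in rows:
        if tag == "TITLE":
            title = cells[0]
        elif tag == "COLHEAD":
            headers = cells[1:]  # first cell sits above the geography column
        elif tag == "DATA":
            unit, values = cells[0], cells[1:]
            data[unit] = dict(zip(headers, (int(v.replace(",", "")) for v in values)))
    return title, data

rows = [
    ("TITLE", ["Population by sex"]),
    ("", ["Source: hypothetical census report"]),
    ("COLHEAD", ["Region", "Male", "Female"]),
    ("DATA", ["North", "1,200", "1,300"]),
    ("DATA", ["South", "800", "850"]),
]
title, data = parse_marked_table(rows)
print(data["North"])  # {'Male': 1200, 'Female': 1300}
```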

The markup serves as a guide for our software, enabling ingest into a metadata database. The database organizes all row and column headers, titles, universes, and other metadata elements and their relationships in a consistent way. The database, in turn, enables automated restructuring of the data tables to generate the consistently structured tables in IHGIS extracts. For example, many source tables include nested geographic units at two or more levels (e.g., states and counties). IHGIS pulls the appropriate rows apart to create separate files for each level, enabling easier data linkages in GIS packages.
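The level-splitting step described above can be sketched as a simple group-by followed by one output file per level. This is a minimal illustration, not the IHGIS pipeline, which works from its metadata database: it assumes each row’s geographic level is already known (for example, from markup), and the function names, codes, and values are hypothetical. Output goes to in-memory strings here so the example has no file-system side effects.

```python
import csv
import io
from collections import defaultdict

def split_by_level(rows):
    """Group (level, code, values) rows into one list of records per geographic level."""
    by_level = defaultdict(list)
    for level, code, values in rows:
        by_level[level].append({"code": code, **values})
    return dict(by_level)

def to_csv(records):
    """Serialize a list of same-shaped records as CSV text (one 'file' per level)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

# Hypothetical source table with states and their counties interleaved.
rows = [
    ("state", "S1", {"pop": 500}),
    ("county", "S1-C1", {"pop": 200}),
    ("county", "S1-C2", {"pop": 300}),
    ("state", "S2", {"pop": 400}),
    ("county", "S2-C1", {"pop": 400}),
]
tables = split_by_level(rows)
print(to_csv(tables["state"]))
```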

We hope you enjoy using IHGIS. Please send us a note at ipums@umn.edu if you have any questions, comments, or suggestions.