IPUMS IHGIS: Unlocking International Population and Agricultural Census Data

By Tracy Kugler

Nearly all countries throughout the world conduct population and housing censuses at least every ten years, and most also conduct agricultural censuses or surveys regularly. These censuses collect information on demographics, education, employment, housing characteristics, migration, agricultural land ownership, agricultural workforce, livestock, crops, and more. The resulting data can be used to study a wide range of questions, from the character of demographic transitions within and across countries, to the use of irrigation, to educational trends among women.

Unfortunately, this wealth of data has remained largely inaccessible to researchers. The data are typically published in reports as tables summarizing population characteristics. In recent decades, many of these reports have been published as PDF documents and made available on national statistical office websites. While the reports are available, data from a PDF document cannot be easily imported into a statistical or GIS package. Furthermore, the table structures are highly heterogeneous, both across countries and even within the same report.

The International Historical Geographic Information System (IPUMS IHGIS) is designed to give researchers access to these data in a form they can readily use for analysis. In the early phases, IHGIS was known internally as “Project Mako,” named after the Mako shark, which has a global range, a voracious appetite, and a reputation for a broad-ranging diet. Like the shark, IHGIS (née Project Mako) will encompass the world and ingest all kinds of data tables.


What’s new with IPUMS USA? Updates for Industry and Occupation Variables

By Megan Schouweiler (Senior Data Analyst, IPUMS USA) and Sophia Foster (Data Analyst, IPUMS USA)

The Census Bureau drops ACS 1-year PUMS files tomorrow (October 15, 2020)! Don’t worry, the IPUMS USA team will get right to work to get you some data as soon as possible. In the meantime, let’s talk a little about what’s new with occupation and industry variables on IPUMS USA.

New OCCSOC and INDNAICS Crosswalks Available

You may be familiar with our harmonized occupation (OCC1950, OCC1990, OCC2010) and industry (IND1950, IND1990) variables. These variables harmonize occupation/industry codes based on Census Bureau classification systems to a base year, making comparisons across time much easier. Researchers are also interested in using the Standard Occupational Classification (SOC) system and North American Industry Classification System (NAICS) codes that are available in the public use data, but IPUMS has not created nifty harmonized variables for these codes. We hope to harmonize these codes someday; until then, we will settle for providing great documentation about how these codes have changed over time. And we’ve recently made the documentation even better!

OCCSOC reports the primary occupation based on the SOC system, and INDNAICS reports the type of establishment of the primary occupation based on the NAICS system. Both of these coding systems are periodically updated. In the past two decades, the OCCSOC codes have been updated six times and the INDNAICS codes have been updated five times, creating a challenge for those using the codes to conduct research across time. Beyond navigating the changes to the coding schemes, researchers have had to work with a separate crosswalk for each update. We recently updated our crosswalks so that a single table for OCCSOC and a single table for INDNAICS each include all iterations of the underlying coding system from 2000 onward. Instead of a bunch of links to crosswalks that just compare adjacent schemes, you now get all years in one place.

In total, we created four crosswalks: OCC to OCCSOC; IND to INDNAICS; OCCSOC only; and INDNAICS only. These crosswalks include detailed descriptions of how OCCSOC and INDNAICS codes have changed over time from the 2000 Census to present. Examples of changes include one occupation/industry splitting into multiple new categories, multiple categories collapsing into one occupation/industry, and updates to codes and titles. Because these types of changes occur with each new iteration of the coding scheme, it can be difficult to understand how the codes relate to one another across time. We hope that these new crosswalks provide a more comprehensive mapping of the OCCSOC and INDNAICS codes over time and will aid researchers in using these variables. These crosswalks are available to view on the IPUMS USA website and for download in both Excel and CSV format. Trust us, you’ll want to download these crosswalks to make your programming a lot easier.
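As a rough illustration of how a combined crosswalk can simplify programming, the sketch below reads a small table and flags codes that split or collapse between two vintages. Everything specific here is an assumption made for illustration: the column names (occsoc_2000, occsoc_2018, title_2018), the long row-per-pair layout, and the placeholder codes are invented and are not taken from the actual IPUMS USA files, so check the headers of the CSV you download before adapting it.

```python
import csv
import io
from collections import defaultdict

# Hypothetical stand-in for a downloaded crosswalk CSV. The column names and
# codes are invented placeholders, not real OCCSOC values.
CROSSWALK_CSV = """occsoc_2000,occsoc_2018,title_2018
A100,A100,Unchanged occupation
B200,B210,Split occupation (part 1)
B200,B220,Split occupation (part 2)
C300,C999,Collapsed occupation
C310,C999,Collapsed occupation
"""


def build_mapping(csv_text, source_col, target_col):
    """Return {source code: sorted list of target codes} from crosswalk rows."""
    mapping = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        mapping[row[source_col]].add(row[target_col])
    return {src: sorted(targets) for src, targets in mapping.items()}


# Forward direction: which later codes does each earlier code map to?
forward = build_mapping(CROSSWALK_CSV, "occsoc_2000", "occsoc_2018")
# One earlier code mapping to several later codes indicates a split.
splits = {src: tgts for src, tgts in forward.items() if len(tgts) > 1}

# Reverse direction: several earlier codes feeding one later code is a collapse.
backward = build_mapping(CROSSWALK_CSV, "occsoc_2018", "occsoc_2000")
collapses = {tgt: srcs for tgt, srcs in backward.items() if len(srcs) > 1}

print("Splits:", splits)
print("Collapses:", collapses)
```

Codes that appear in neither dictionary map one-to-one and can be recoded directly; splits and collapses are the cases that need a research decision about how to aggregate.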


IPUMS provides demographic data for international COVID-19 research

By Lara Cleveland

Since the onset of the COVID-19 outbreak, researchers across the globe have been accessing census microdata from IPUMS International for COVID-19-related research. Scholars at universities from the U.S. to Nepal, Colombia to Belgium, Nigeria to China, and elsewhere have used IPUMS data to assess population dynamics contributing to COVID-19 vulnerability or spread. Divisions of the United Nations, World Bank, and other policy research institutes have similarly accessed IPUMS census data for COVID-19 response and relief efforts.

IPUMS International harmonizes and disseminates household-level microdata census samples from more than 100 countries. Access to microdata is essential for rapid response in new areas because of its analytic flexibility. Researchers needing to build custom tables or construct variables for complex modeling suited to specific research questions can only do that with microdata. Of particular interest for research on population dynamics of COVID-19 is information about the age structure of the population, household living arrangements (household size, intergenerational co-residence, etc.), indicators of health vulnerability (age, work status, housing conditions, disability, etc.), healthcare workforce distribution, and migration patterns. IPUMS International census samples also include valuable subnational geographic identifiers at the first and second administrative levels, which are especially useful for highlighting particular regions or localities of vulnerability.
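To make the idea of a custom table concrete, here is a minimal sketch that computes the weighted share of people aged 65 and over by region from person-level records. The records, the field names (region, age, weight), and the age cutoff are all hypothetical and chosen only for illustration; an actual IPUMS International extract will have its own variable names and weights.

```python
from collections import defaultdict

# Hypothetical person-level microdata: one dict per sampled person.
# Field names and values are invented for this example.
records = [
    {"region": "North", "age": 70, "weight": 10.0},
    {"region": "North", "age": 30, "weight": 30.0},
    {"region": "South", "age": 68, "weight": 5.0},
    {"region": "South", "age": 45, "weight": 10.0},
    {"region": "South", "age": 12, "weight": 10.0},
]


def weighted_share_65_plus(rows):
    """Return {region: weighted share of persons aged 65 or older}."""
    total = defaultdict(float)
    older = defaultdict(float)
    for row in rows:
        # Each person counts for as many people as their sample weight.
        total[row["region"]] += row["weight"]
        if row["age"] >= 65:
            older[row["region"]] += row["weight"]
    return {region: older[region] / total[region] for region in total}


shares = weighted_share_65_plus(records)
for region, share in sorted(shares.items()):
    print(f"{region}: {share:.1%} aged 65+")
```

A published summary table fixes the age bands and geography in advance; with microdata, the cutoff, the grouping variable, or the population of interest can each be changed by editing a single line.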
