What’s new with IPUMS USA? Updates for Industry and Occupation Variables

By Megan Schouweiler (Senior Data Analyst, IPUMS USA) and Sophia Foster (Data Analyst, IPUMS USA)

The Census Bureau drops ACS 1-year PUMS files tomorrow (October 15, 2020)! Don’t worry, the IPUMS USA team will get right to work to get you some data as soon as possible. In the meantime, let’s talk a little about what’s new with occupation and industry variables on IPUMS USA.

New OCCSOC and INDNAICS Crosswalks Available

You may be familiar with our harmonized occupation (OCC1950, OCC1990, OCC2010) and industry variables (IND1950, IND1990). These variables harmonize occupation/industry codes based on Census Bureau classification systems to a base year, making comparisons across time much easier. Researchers are also interested in using the Standard Occupational Classification (SOC) system and North American Industry Classification System (NAICS) codes that are available in the public use data, but IPUMS has not yet created nifty harmonized variables for these codes. We hope to harmonize these codes someday; until then, we will settle for providing great documentation about how these codes have changed over time. And we’ve recently made the documentation even better!

OCCSOC reports the primary occupation based on the SOC system, and INDNAICS reports the type of establishment of the primary occupation based on the NAICS system. Both of these coding systems are periodically updated. In the past two decades, the OCCSOC codes have been updated six times and the INDNAICS codes have been updated five times, creating a challenge for those utilizing the codes to conduct research across time. Beyond navigating the changes to the coding schemes, there are separate crosswalks for each update. We recently updated each of our crosswalks to include all iterations of the underlying coding systems from 2000 onward in a single table for OCCSOC and INDNAICS, respectively. Instead of a bunch of links to crosswalks that just compare adjacent schemes, we’ve combined all years into one table.

In total, we created four crosswalks: OCC to OCCSOC; IND to INDNAICS; OCCSOC only; and INDNAICS only. These crosswalks include detailed descriptions of how OCCSOC and INDNAICS codes have changed over time from the 2000 Census to present. Examples of changes include one occupation/industry splitting into multiple new categories, multiple categories collapsing into one occupation/industry, and updates to codes and titles. Because these types of changes occur with each new iteration of the coding scheme, it can be difficult to understand how the codes relate to one another across time. We hope that these new crosswalks provide a more comprehensive mapping of the OCCSOC and INDNAICS codes over time and will aid researchers in using these variables. These crosswalks are available to view on the IPUMS USA website and for download in both Excel and CSV format. Trust us, you’ll want to download these crosswalks to make your programming a lot easier.
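As a rough illustration of why a single combined table helps with programming, here is a minimal Python sketch that reads crosswalk rows and labels each older code as unchanged, recoded, split, or collapsed. The row layout, the function names, and the codes themselves are made-up placeholders for illustration. They are not the actual layout of the IPUMS crosswalk files or real OCCSOC/INDNAICS values.

```python
from collections import defaultdict

# Hypothetical crosswalk rows: (code in older scheme, code in newer scheme).
# These codes are made-up placeholders, not real OCCSOC or INDNAICS values.
CROSSWALK_ROWS = [
    ("A100", "A100"),  # same code in both schemes
    ("B200", "B210"),  # one old code splits into two new codes
    ("B200", "B220"),
    ("C310", "C300"),  # two old codes collapse into one new code
    ("C320", "C300"),
]


def build_mapping(rows):
    """Map each left-hand code to the sorted list of right-hand codes."""
    mapping = defaultdict(set)
    for left, right in rows:
        mapping[left].add(right)
    return {left: sorted(rights) for left, rights in mapping.items()}


def classify_changes(rows):
    """Label every old code by how it relates to the newer scheme."""
    rows = list(rows)
    forward = build_mapping(rows)
    backward = build_mapping((new, old) for old, new in rows)
    changes = {}
    for old, news in forward.items():
        if len(news) > 1:
            changes[old] = "split"
        elif len(backward[news[0]]) > 1:
            changes[old] = "collapsed"
        elif news[0] != old:
            changes[old] = "recoded"
        else:
            changes[old] = "unchanged"
    return changes


if __name__ == "__main__":
    for code, change in sorted(classify_changes(CROSSWALK_ROWS).items()):
        print(code, change)
```

With a downloaded CSV, the same functions could be fed rows read with Python's csv module, once you have checked the real column names in the file.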

Occupational Standing Variables: What are they good for?

In addition to updating the OCCSOC and INDNAICS crosswalks, IPUMS USA also released the 2018 occupational standing variables for the ACS/PRCS samples. Updated variables include OCCSCORE, SEI, HWSEI, PRENT, PRESGL, EDSCOR50, EDSCOR90, ERSCOR50, ERSCOR90, NPBOSS50, and NPBOSS90. To provide an example of how these variables can be used in research, we conducted a visual analysis using two of the updated variables, ERSCOR50 and EDSCOR50, to examine how occupational standing has changed over time for occupations highlighted during the COVID-19 pandemic.

The COVID-19 pandemic has directed particular attention towards “essential” workers and their contributions to society, raising the question of whether traditional measures of occupational standing reflect the value that we are placing on these “essential” occupations. We examined the occupational standing of occupations that have received popular attention during the COVID-19 pandemic to understand how these occupations compare to one another based on education and earnings, and to see whether these rankings have changed over time from 1950 to 2010 using the decennial Census samples.

We chose these groups from a list of occupations with high exposure to COVID-19 (Lu, 2020) and then narrowed down to a core list of occupations that have been receiving recent media attention.

Table 1
A List of Occupations Included in Each Occupation Category

Occupation Category | Occupations Included in Each Category
Waitstaff | Waiters and Waitresses
Cashiers | Cashiers
Beauticians | Barbers; Hairdressers, Hair Stylists, and Cosmetologists; Miscellaneous Personal Appearance Workers
Nurses | Registered nurses; Nurse anesthetists; Nurse practitioners and nurse midwives; Physician assistants; Medical and health service managers
Physicians and Surgeons | Physicians and Surgeons
Managers and Officials | Financial analysts; Food service managers; Retail worker supervisors; Producers and Directors; Chief executives and legislators; General and operations managers; Construction managers
Teachers | Elementary and Middle School Teachers; Preschool and Kindergarten Teachers; Secondary School Teachers; Education administrators
Laborers | Construction Laborers; Refuse and Recyclable Material Collectors; Food Cooking Machine Operators and Tenders; Helpers–Production Workers; Subway, streetcar, and other rail transportation workers

Note. This table lists the occupations that are included in each of the eight occupation categories included in the analysis.

Next, we matched the occupation titles to the Census-defined occupation categories and then to the 1950-equivalent occupation titles using the IPUMS USA variable OCC1950. To assess occupational standing based on education and earnings, we used EDSCOR50 and ERSCOR50.

Figure 1

EDSCOR50 is constructed by calculating the percentage of people in a given occupation who have completed one or more years of college. ERSCOR50 is constructed by converting median earnings for each occupation to standardized z-scores, and then converting the z-score to a percent to indicate the percentage of occupations that are above or below a given occupation based on median earnings.
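To make the two constructions concrete, here is a minimal Python sketch of the general idea. It is not the actual IPUMS procedure: the toy records are made up, the function names are hypothetical, and the use of the standard normal CDF to turn a z-score into a percent is our assumption about one plausible way to do that conversion.

```python
from statistics import NormalDist, mean, median, pstdev

# Made-up toy records: (occupation, years of college completed, annual earnings).
WORKERS = [
    ("occ_a", 0, 20000), ("occ_a", 1, 24000), ("occ_a", 0, 22000),
    ("occ_b", 4, 60000), ("occ_b", 2, 50000),
    ("occ_c", 0, 30000), ("occ_c", 4, 40000),
]


def education_score(workers, occ):
    """Percent of workers in `occ` with one or more years of college."""
    years = [y for o, y, _ in workers if o == occ]
    return 100 * sum(y >= 1 for y in years) / len(years)


def earnings_scores(workers):
    """Percent-style score per occupation from z-scored median earnings."""
    occs = sorted({o for o, _, _ in workers})
    medians = {o: median([e for oo, _, e in workers if oo == o]) for o in occs}
    values = list(medians.values())
    mu, sigma = mean(values), pstdev(values)
    # Assumption: map each z-score to a percent with the standard normal CDF.
    return {o: 100 * NormalDist().cdf((m - mu) / sigma) for o, m in medians.items()}


if __name__ == "__main__":
    print({o: round(education_score(WORKERS, o), 1) for o in ("occ_a", "occ_b", "occ_c")})
    print({o: round(s, 1) for o, s in earnings_scores(WORKERS).items()})
```

In this toy example, an occupation whose median earnings sit above the average of all occupation medians gets an earnings score above 50, and one below the average gets a score below 50.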

Overall, these figures show that the educational standing of waitstaff, cashiers, beauticians and laborers has been increasing relative to other occupations but their earnings have not. Despite their “essentialness,” examining the occupational standing variables shows that we’ve been compensating these occupations less and less over time.

Our visual analysis is just one of many ways these variables can be utilized for research. We look forward to learning about all the ways our users are leveraging occupational standing measures in their work. And remember… use it for good, never for evil!

IPUMS provides demographic data for international COVID-19 research

By Lara Cleveland

Since the onset of the COVID-19 outbreak, researchers across the globe have been accessing census microdata from IPUMS International for COVID-19-related research. Scholars at universities from the U.S. to Nepal, Colombia to Belgium, Nigeria to China, and elsewhere have used IPUMS data to assess population dynamics contributing to COVID-19 vulnerability or spread. Divisions of the United Nations, World Bank, and other policy research institutes have similarly accessed IPUMS census data for COVID response and relief efforts.

IPUMS International harmonizes and disseminates household-level microdata census samples from more than 100 countries. Access to microdata is essential for rapid response in new areas because of its analytic flexibility. Researchers needing to build custom tables or construct variables for complex modeling suited to specific research questions can only do that with microdata. Of particular interest for research on population dynamics of COVID-19 is information about the age structure of the population, household living arrangements (household size, intergenerational co-residence, etc.), indicators of health vulnerability (age, work status, housing conditions, disability, etc.), healthcare workforce distribution, and migration patterns. IPUMS International census samples also include valuable subnational geographic identifiers at the first and second administrative levels, which are especially useful for highlighting particular regions or localities of vulnerability.

The surge in new or renewing IPUMS International data user applications listing COVID-related research topics provides only a glimpse of how IPUMS is being used to aid in pandemic response efforts. Many existing IPUMS users are also turning their research focus toward pandemic response. For example, long-time IPUMS colleagues Esteve et al. [1] analyzed how national age and co-residence patterns shape COVID-19 vulnerability using data entirely from IPUMS International (see figure below). The paper, published in the Proceedings of the National Academy of Sciences, simulates a COVID-19 outbreak in 10% of the population to investigate mortality vulnerability by country based on national age and co-residence patterns.

From Esteve et al., 2020: Estimated number of direct (dark) and indirect (light) deaths per 100,000 individuals if primary infections of specific age groups are avoided. Data are from 2010 census round. Individuals from each age group who were selected in the 10% random draw are recoded as not infected before calculating direct deaths and simulating within household transmission.

Over the summer, researchers at IPUMS calculated a series of population-based indicators from census microdata for the UNFPA’s COVID-19 Population Vulnerability Dashboard [2], which maps demographic characteristics contributing to COVID-19 vulnerability (outbreak, spread, or mortality). The resulting dashboard went live in July, and includes additional data layers from WorldPop, the World Health Organization (WHO), and the Johns Hopkins University Coronavirus COVID-19 Global Cases Dashboard. Interestingly enough, the WHO measures of healthcare workforce prevalence also relied, in part, on occupational information from the census samples in IPUMS International. The interactive dashboard aims to provide health workers, policy makers, and the public with important information about vulnerable populations to aid in preparedness and response to COVID-19.

UNFPA’s COVID-19 Population Vulnerability Dashboard

We would love to hear about your COVID-19-related research! Send us a note (ipums@umn.edu), and make sure to submit your work to our bibliography. It is particularly difficult for us to track down work on dashboards, indicators, and policy briefs using IPUMS data. We depend upon you to let us know how you have used the data. All of your research products help us give back to our national data partners and secure future funding.

Census microdata samples from IPUMS International belong to the countries that partner with us. IPUMS adds standardization, harmonization, and documentation work in order to save researchers countless hours of data preparation. We are grateful to the many countries who choose to share their data so that we can all use it for good!

  1. Esteve, Albert, Iñaki Permanyer, Diederik Boertien, James Vaupel. 2020. “National age and coresidence patterns shape COVID-19 vulnerability.” Proceedings of the National Academy of Sciences, July 14, 2020, 117 (28): 16118-16120; first published June 23, 2020 https://doi.org/10.1073/pnas.2008764117 (https://www.pnas.org/content/117/28/16118)
  2. 2020. COVID-19 Population Vulnerability Dashboard (https://covid19-map.unfpa.org).

Cite us! Seriously though…

By Renae Rodgers and Kari Williams

Hi there IPUMS users! Let’s talk about citations. When using our datasets in your insightful, groundbreaking, interesting work, please cite us! 

Seriously though. 

Cite us. 

You wouldn’t steal a car, you wouldn’t rob a little old lady of her handbag, you wouldn’t base work on that of a colleague and not put their paper(s) in your reference section, right?!? Then don’t use IPUMS data and fail to mention it! 

To help you on your way, here are some answers to frequently asked questions:

Q:   Do I have to though? 

A:   Yes. Properly citing IPUMS data is part of the user agreement. Before you ever submitted your first extract, you agreed to do this!

Screenshot of citation agreement

Q:   I’ve mentioned IPUMS in the caption of my figures and tables, so I am good to go, right?

A:   Nope. Putting our URL in a footnote, endnote, or caption is insufficient. Name-checking IPUMS in your “Data and Methods” section is not enough. Just for good measure, we will mention that citing a paper by IPUMS staff about IPUMS data is not the same thing as citing a dataset.


Q:   What about talking about how much I love IPUMS on Twitter or naming my firstborn after this amazing data provider? Is that an appropriate substitute?

A:   [public radio voice] If you appreciate the resources that IPUMS provides, using the data and citing it is the best way to support us. Our ability to continue to provide this service is dependent on capable and intelligent users like you citing our datasets! Seriously, a core part of our funding depends on our ability to prove that the data infrastructure IPUMS offers is being used. If you want IPUMS to keep offering the latest data and developing new tools, we need you to cite us so we can demonstrate to our funders that IPUMS is useful. Citing us is the best way to support us (though we are keen to hear about your children with middle names based on your favorite IPUMS variables).


Q:   How do I cite IPUMS properly?

A:   We are so glad you asked! When you receive an email notification that your custom dataset from IPUMS is ready to download, it includes the citation! Each IPUMS data product has its own citation – be sure to use the citation associated with the IPUMS data that you used. If you use more than one IPUMS data product, cite all of them!

Q:   Okay, wait! I have one more. 

A:   Go for it.

Q:   What if…I deleted the extract email and didn’t make note of the citation? 

A:   Not a problem! For your convenience, we just happen to have this handy link of all the current IPUMS dataset citations with DOIs. You can also find each IPUMS dataset’s citation on the left menu of the homepage.

To those users who are diligent about citing IPUMS datasets, we thank you! If you have used IPUMS without citing it or committed one of the other faux pas above in the past, we hope you now have the instruction and incentive to do better going forward!