By Kari Williams
As part of the IPUMS mission to democratize data, our user support team strives to answer your questions about the data. Over time, some questions are repeated. This blog post is an extension of an earlier series addressing frequently asked questions. Maybe you’ll learn something. Perhaps you’ll just find the information interesting. Regardless, we hope you enjoy it!
Here’s one of those questions:
How do the original occupation and industry codes map onto harmonized versions created by IPUMS?
If your research includes measures of occupation or industry over long periods of time, you may have realized that changes to the labor market change how occupations and industries are classified and reported in data. For example, the 1990 U.S. Census occupation codes include two computer-related occupations; the 2018 version of these codes includes 13 different computer-related occupations! Similarly, comparing occupations from different countries’ censuses is difficult if they don’t use the same original coding scheme. IPUMS provides harmonized occupation and industry variables (see list at end of post) that allow researchers to use a consistent classification scheme over long periods of time or for international comparisons.
These harmonized versions of occupation and industry variables can be useful, but data users often wonder how the original classifications relate to the harmonized versions. A crosswalk shows how different schemas code the same input. IPUMS USA offers crosswalks that show how occupations or industries in adjacent classifications relate to one another, and there are a number of papers that include crosswalks between one or more classifications and the harmonized IPUMS variables (Meyer & Osborne 2005, Williams & Flood 2019). However, we do not provide crosswalks between every sample’s unharmonized occupation and industry codes and the harmonized version of these measures. Next, I will describe how you can make your own crosswalk for this purpose.
To make a crosswalk mapping the contemporaneous occupation or industry coding scheme into a harmonized IPUMS variable, you should create a data extract with the following:
- Unharmonized version of the occupation or industry code (e.g., OCC or IND)
- Harmonized occupation or industry variable of interest (e.g., OCC1990, IND1990, OCCISCO)
- Sample(s) of interest (e.g., 2015 1-year ACS PUMS, January 2020 CPS basic monthly data)
After opening the data in the stats package of your choice, take the following steps:
- If necessary, append the numeric codes to the descriptive labels for the occupation and/or industry codes.
- For each sample included in your extract, identify all unique combinations of the unharmonized and harmonized measures and create a summary dataset with each unique combination as an observation. Note: because the relationship may differ by sample, you should repeat this step separately for each sample included in your extract.
- Celebrate your custom crosswalk with a little dance or a fancy beverage!
Here is sample code for doing this in Stata:
*Let’s create a crosswalk between the contemporaneous OCC variable in 2015 1-year ACS PUMS and OCC1990!
*run do file for extract 00001 from IPUMS USA
do usa_00001.do
*append numeric codes to descriptive labels
numlabel, add
*create summary dataset that has one observation per unique combination of OCC and OCC1990 for the 2015 1-year ACS
contract occ occ1990
*display my crosswalk
list occ occ1990
*I might even sort it by harmonized occupation code and write it out to a log file so I can keep it afterwards
sort occ1990
log using my_crosswalk, text replace
list occ occ1990
log close
*ta-da! I will now do the Macarena while sipping on coffee from my IPUMS mug!
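The Stata code above handles a single sample. If your extract contains several samples, one way to follow the note about repeating the step for each sample is to add a sample identifier to the contract command. Here is a minimal sketch of that idea. It uses a tiny made-up dataset so it runs on its own; the codes are invented for illustration and are not real OCC or OCC1990 values. It also assumes that YEAR is enough to tell your samples apart. If two samples share a year (for example, 1-year and 5-year ACS files), you would want a variable such as SAMPLE instead.

```stata
*toy data standing in for a multi-sample extract (codes are made up, not real OCC/OCC1990 values)
clear
input year occ occ1990
2015 10 1
2015 10 1
2015 11 1
2016 10 1
2016 12 2
end
*one observation per unique combination of sample (here, year), OCC, and OCC1990
*n_obs is a hypothetical name for the number of records behind each combination
contract year occ occ1990, freq(n_obs)
*sort within sample by harmonized code, then display with a divider between samples
sort year occ1990 occ
list, sepby(year)
*save the crosswalk as a CSV file (my_crosswalk.csv is a placeholder name)
export delimited using "my_crosswalk.csv", replace
```

With your own extract, you would replace the toy data block (clear through end) with the do file for your extract. The remaining lines should work unchanged as long as the variable names match.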
If you want to learn more about harmonized occupation and industry variables, check out the following documentation:
- IPUMS USA: Occupation and Industry variables overview (this user guidance documentation includes links to the harmonized OCC1950, OCC1990, OCC2010, IND1950, IND1990 variables in IPUMS USA)
- IPUMS CPS variables: OCC1950, OCC1990, OCC2010, IND1950, IND1990
- IPUMS International variables: OCCISCO, ISCO88A, ISCO68A, INDGEN
- IPUMS NHIS: OCC1995, IND1995
Working papers & journal articles
- Meyer, Peter B. and Anastasiya M. Osborne. 2005. “Proposed Category System for 1960-2000 Census Occupations.” Office of Productivity and Technology Working Paper 383. Bureau of Labor Statistics.
- Sobek, Matthew. 1995. “The Comparability of Occupations and the Generation of Income Scores.” Historical Methods 28 (Spring): 47-51.
- Williams & Flood. 2019. “Harmonizing the 2010 and 2002 Census Occupation Coding Schemes.” Minnesota Population Center Working Paper Series 2019-01. https://doi.org/10.18128/MPC2019-01
Classification schemes