IPUMS CPS Now Offering Unharmonized Variables

Have you ever wanted to include a Current Population Survey (CPS) variable in your analysis that isn’t available in IPUMS CPS? Maybe you have even gone through the drudgery of merging original CPS data onto your IPUMS CPS extract. Well, no more! IPUMS CPS is now offering original basic monthly CPS data as unharmonized variables to save you the trouble. And we’re not stopping there — ASEC and topical supplement unharmonized variables are in the pipeline!

What are unharmonized variables?

Unharmonized variables are original CPS variables packaged for accessibility via IPUMS CPS alongside our harmonized variables.

As a reminder, harmonized variables are coded consistently across years and months: unknown and not in universe (NIU) categories receive the same codes throughout, and unexpected values are documented and recoded. By contrast, unharmonized variables are (mostly) unrecoded variables that apply a consistent name to variables with the same meaning and codes across multiple months of data. While no codes have been changed to harmonize categories across time, original values are occasionally recoded in our unharmonized variables to increase usability. For example, all NIU records are given a common code across time. For more details on recoding in unharmonized variables, see our unharmonized variable documentation.

The CPS is fielded monthly, and variables rarely change from month to month. We leverage these commonalities in the underlying CPS data across months when constructing unharmonized variables. Delivering unharmonized variables means you no longer need to navigate the large number of datasets and the occasional changes in variable names across time, and then merge original data with your IPUMS CPS extract.

How are these different from source variables available in other IPUMS products?

IPUMS USA and IPUMS International make original unrecoded data available to users via source variables. Like source variables, IPUMS CPS unharmonized variables deliver original data. Unlike source variables, which are unique to each dataset, a single unharmonized variable is available for every month in which a given original CPS variable has identical codes and value labels.

IPUMS CPS unharmonized variables are denoted with a “UH_” prefix and a numeric suffix. When codes are added or removed or have different labels between years, a new unharmonized variable is created, and the suffix increments by one.

For example, the race variable exists in all CPS months from 1976 to 2019. However, it has eight different variable names and five different sets of codes across this time period.

IPUMS CPS synthesizes this information into five different unharmonized variables: UH_RACE_1, available from 1976-1988; UH_RACE_2, available from 1989-1995; UH_RACE_3, available from 1996-2002; UH_RACE_4, available from 2003-2012; and UH_RACE_5, available from 2013-2019. Each of these variables bundles the source data from all months that share the same set of possible codes, so the whole series comes down to five easily selected variables.
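If you work with several of these variables in one extract, a small lookup can tell you which one applies to a given record. The following is a minimal sketch, not an IPUMS tool: it uses the year ranges listed above, treats them as whole calendar years (an assumption, since availability is really determined month by month), and assumes each record is a plain dictionary with a YEAR key. The function names are hypothetical.

```python
# Availability of each unharmonized race variable, as listed in this post.
# Ranges are inclusive and treated as whole calendar years (an assumption).
UH_RACE_RANGES = {
    "UH_RACE_1": (1976, 1988),
    "UH_RACE_2": (1989, 1995),
    "UH_RACE_3": (1996, 2002),
    "UH_RACE_4": (2003, 2012),
    "UH_RACE_5": (2013, 2019),
}


def uh_race_variable_for(year):
    """Return the name of the UH_RACE_n variable covering `year`, or None."""
    for name, (first, last) in UH_RACE_RANGES.items():
        if first <= year <= last:
            return name
    return None


def race_code_for(record):
    """Return (variable name, original code) for one record.

    `record` is assumed to be a dict keyed by variable name that
    includes a "YEAR" entry; this layout is illustrative only.
    """
    name = uh_race_variable_for(record["YEAR"])
    if name is None:
        return (None, None)
    return (name, record.get(name))
```

For instance, a record from 2005 would be routed to UH_RACE_4, and a year outside 1976-2019 returns None rather than guessing.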

Why should I use IPUMS CPS unharmonized variables?

Convenience. Unharmonized variables allow you to include variables in your IPUMS CPS extract that we have not yet harmonized! While we’re always happy to take requests via IPUMS user support (ipums@umn.edu) for new harmonized variables, they aren’t going to be available yesterday, which we know is when you actually wanted them. Instead of merging original files with IPUMS extracts while you wait for us to harmonize your variable(s) of interest, you can simply use the unharmonized variables to construct the variable you need.

Flexibility. Don’t like our harmonized variables? Make your own! Don’t need a harmonized variable for your time period of interest? Then there is no need to retain unwanted detail that we preserve, and no need to sacrifice wanted detail that we collapse for comparability!
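To give a flavor of what "make your own" might look like, here is a minimal sketch of a user-defined crosswalk. Everything specific in it is an assumption for illustration: the numeric codes and the group labels ("A", "B", "OTHER") are hypothetical and do not reflect the actual codes of any unharmonized variable, the function name is made up, and a variable that is unavailable for a record is assumed to appear as None. Look up the real codes and labels on each unharmonized variable's page before building anything like this.

```python
# HYPOTHETICAL crosswalk: the codes and group labels below are invented
# for illustration and are NOT the real codes of these variables.
CROSSWALK = {
    "UH_RACE_1": {1: "A", 2: "B", 3: "OTHER"},
    "UH_RACE_2": {1: "A", 2: "B", 3: "OTHER", 4: "OTHER", 5: "OTHER"},
}


def my_race_group(record):
    """Collapse whichever unharmonized variable is present into one scheme.

    `record` is assumed to be a dict keyed by variable name, with None
    (or no entry) for variables not available in that record's month.
    """
    for name, mapping in CROSSWALK.items():
        value = record.get(name)
        if value is not None:
            # Flag codes the crosswalk does not cover instead of guessing.
            return mapping.get(value, "UNMAPPED")
    return "MISSING"
```

Returning "UNMAPPED" for unexpected codes, rather than silently dropping them, keeps any gaps in your own crosswalk visible when you tabulate the result.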

How do I find IPUMS CPS unharmonized variables?

We clearly differentiate between harmonized and unharmonized variables in the IPUMS CPS extract system. Users can browse available unharmonized variables by selecting the “Unharmonized Variables” radio button on the IPUMS CPS Select Data page.

To find the unharmonized variables associated with an IPUMS CPS harmonized variable, see the “Unharmonized Variables” tab on the harmonized variable’s page. For example, the harmonized RACE variable has five unharmonized variables associated with it. From the “Unharmonized Variables” tab, you can easily browse the associated unharmonized variables and include them in your extract.

Links to associated harmonized variables are also available on unharmonized variable pages.

Some IPUMS CPS variables are created out of multiple original CPS variables. Curious users can now easily view the source data underlying our harmonized variables! For example:

The harmonized variable SCHLCOLL is constructed using three unharmonized variables, UH_SCHENR_1, “Attend high school or college last week,” UH_SCHFT_1, “Enrolled in school full-time or part-time,” and UH_SCHLVL_1, “Attending high school, college, or university.”

Excited about unharmonized variables? We are too! With the introduction of unharmonized variables, IPUMS is giving CPS researchers full access to the CPS in a format ready for analyses over time or for analyses of the most recent data. Users can now access the original unrecoded data without merging original files onto their IPUMS CPS extracts. Unharmonized variables also allow researchers to harmonize variables themselves if IPUMS doesn’t recode them exactly the way they would want, and to easily map original unharmonized codes and harmonized codes onto one another. We’re here to make the research community’s access to the CPS easier. Send us your best ideas for continuing to improve IPUMS CPS.


Story by Renae Rodgers and Sarah Flood