IPUMS USA: County Variable Name Changes

“What’s in a name? That which we call a COUNTYFIPS by any other name would still be accurate” ~ Steve Shakespeare (the lesser known, quantitative Shakespeare).

Couldn’t have said it better myself. Though, when you are sharing your variable names with tens of thousands of researchers who really just want their analysis to work, you may want to be cautious about changing variable names. This is exactly the reason our (now renamed) COUNTY variable held onto that moniker for so long. And when you do change variable names, you’d better try your hardest to let everyone know, so let’s start with that part.

Hey, everyone, we changed some variable names! COUNTY is now COUNTYICP, COUNTYFIPS is now COUNTYFIP, and NHGISJOIN is now COUNTYNHG.


Why did we change the names?

Well, we hope that is a little obvious. The old names were inconsistent and did not give the user a lot of clues about how to use them. (You would have to read the variable description for NHGISJOIN to even figure out that it was related to counties.) The new names indicate that all three variables are related to the concept of county, and once you read the variable descriptions it is easy to remember that COUNTYICP uses ICPSR codes while COUNTYFIP uses FIPS codes and COUNTYNHG uses NHGIS county identifiers.

Beyond clarifying the relationship between the three variables, these new names also help identify these variables’ relationship to other geographic identifiers in IPUMS USA. STATEFIP and STATEICP had already established a good naming convention and now it is obvious which county identifier fits better with your state variable of choice. Before, researchers may have selected COUNTY alongside STATEFIP, mixing the ICPSR coding scheme with FIPS. This isn’t inherently problematic as geographic units are still uniquely identified, but mixing the two different coding systems may be confusing, especially if you are trying to link contextual data from another source.

A little bit of history.

You may be wondering how our county variables ended up with these not-so-integrated variable names. Well, some of you may remember that waaaaay back in the 1990s most stats packages had an 8-character limit for variable names. This is why our two state variables, which have been around since the beginning, are both 8 characters long (hence STATEICP not STATEICPSR). This is also why many IPUMS variables to this day are missing a few vowels. Back then we only had the one county variable, so distinguishing the coding scheme in the name wasn’t really an issue. COUNTYFIPS was added later when the character limit was no longer a factor. Why not COUNTYFIP, then? Hey, when you’ve been restricted to 8 characters for so long and finally get let off that leash you go a little crazy sometimes. After our initial “use all the characters!” excitement we did realize that there is a benefit to shorter variable names as long as they make sense. And as stated before, we really want to avoid having to change these names for the sake of our data users. Which leads me to….

What does this mean for you?

There are three ways that an IPUMS variable name change may affect you:

1) You previously created an extract with a now renamed variable.

We’ve got you covered. When you resubmit your extract the extract engine will notice the now deprecated variable name and add the new variable name in its place. So if you have an old extract with NHGISJOIN, when you resubmit, your extract will now include COUNTYNHG. Furthermore, if you happen to have a link to the old variable metadata bookmarked (we all have our favorites, no judgement here) that old link will automatically redirect to the new variable name.

2) You are trying to replicate someone else’s previous work and they used a now renamed variable.

Say you are trying to replicate an analysis of COUNTYFIPS, so you start building a data extract and realize COUNTYFIPS is nowhere to be found in IPUMS USA. In this situation the best place to look is the Errata and Revisions page. If you search the page for COUNTYFIPS you should find a note about “Renamed variables” and hopefully find what you are looking for.

A nifty “hack” would be to try to go to the documentation for the unfindable variable like so: https://usa.ipums.org/usa-action/variables/COUNTYFIPS. As mentioned above, all old links to renamed variables get redirected to the new name, so even though you typed COUNTYFIPS in the URL you will end up on the COUNTYFIP page.

3) You have some old syntax files that perform analysis based on a now renamed variable.

Yeah, that’s gonna break. The good news is that the new variable is already in your extract. So if your old analysis used COUNTY, you just have to figure out which variable in your updated extract used to be COUNTY. Again, going to the Errata and Revisions page is probably the best place to get information on things that may have changed. Hopefully your code breaks in a way that very nicely points you to the fact that COUNTY is not a variable in the dataset. Then, when looking at what variables you do have, you will probably notice COUNTYICP and think to yourself, “I feel like I read a really good blog post about this.”
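If you have a pile of old syntax files, you could also update the names in bulk. Here is a minimal sketch in Python, not an official IPUMS tool: the function name update_syntax is made up for illustration, and the mapping is just the three renames described in this post. It swaps whole-word matches only, and it assumes your files refer to the variables in either all uppercase or all lowercase.

```python
import re

# Old-to-new names, taken from the three renames described in this post.
RENAMED = {
    "COUNTY": "COUNTYICP",
    "COUNTYFIPS": "COUNTYFIP",
    "NHGISJOIN": "COUNTYNHG",
}


def update_syntax(text, renamed=RENAMED):
    """Replace whole-word old variable names with their new names.

    Hypothetical helper: handles all-uppercase and all-lowercase
    spellings and leaves mixed-case words (e.g. "County") untouched.
    """
    lookup = {}
    for old, new in renamed.items():
        lookup[old] = new
        lookup[old.lower()] = new.lower()
    # \b keeps COUNTY from matching inside COUNTYICP or COUNTYFIP,
    # so running the function twice does not change the text again.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, lookup)) + r")\b")
    return pattern.sub(lambda m: lookup[m.group(1)], text)


if __name__ == "__main__":
    print(update_syntax("tab COUNTY STATEICP"))
```

A blunt find-and-replace like this will also change the word “county” if it appears in lowercase in a comment or label, so look over the result before you trust it.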

In conclusion…

We don’t like doing it, but sometimes old names have to change. We have tried to make this change as uneventful as possible and hopefully most of you will never even notice. But for those of you who do, I hope this post has been helpful.

Do you have a favorite IPUMS variable name? I think mine is AVAILBLE, because it drops a vowel but not the one you might expect.

Story by Joe Grover