At IPUMS we try to address every user’s questions and suggestions about our data. This support is just one feature that adds value to IPUMS data. Over time, many of the same questions come up again and again. In this blog series, we will be sharing some of those frequently asked questions. Maybe you’ll learn something, or perhaps you’ll just find these interesting. Regardless, we hope you enjoy.
Why are there so many missing U.S. counties in my data?
There are two answers for this question, depending on the data source:
For IPUMS USA, which provides U.S. Census data, counties are identifiable in public use data prior to 1950, but not after. This is due to the “72-year” rule, which bars the U.S. government from releasing personally identifiable information until 72 years after it was collected for the Census. Identifying the county of every individual record in the U.S. runs the risk of breaching this confidentiality requirement. Experienced IPUMS USA users will note, however, that some counties are actually identifiable in post-1950 IPUMS USA samples. This is because some counties can be recovered from other geographic identifiers, such as, in recent years, the Public Use Microdata Area (PUMA). Not all counties are available, however, and this spreadsheet details which counties are available in which post-1950 samples.
For IPUMS CPS, which provides data from the Bureau of Labor Statistics Current Population Survey (CPS), only about 45% of all households reside in a county that can be identified. For many new users, this is a surprisingly low number. Again, the reason is confidentiality, and the effect is more dramatic because the CPS sample is much smaller than that of the American Community Survey (ACS). As a result, the majority of households are not within an identifiable county.
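If you want to see how much county coverage your own extract has, a quick tabulation helps. The sketch below is illustrative only: it assumes one row per household, a county column named COUNTY, and a code of 0 for households whose county is not identified. Check the variable documentation for your extract before relying on those assumptions. The function name and the toy records are hypothetical, and the share is unweighted.

```python
def share_with_identified_county(households, county_key="COUNTY", unidentified_code=0):
    """Return the unweighted share of household rows with an identified county.

    Assumptions (not taken from the post): each row is one household, the
    county code lives under `county_key`, and `unidentified_code` marks
    households whose county is suppressed.
    """
    total = 0
    identified = 0
    for row in households:
        total += 1
        if int(row[county_key]) != unidentified_code:
            identified += 1
    if total == 0:
        raise ValueError("no household records supplied")
    return identified / total


# Toy records for illustration only; these are not real survey data.
toy_households = [
    {"SERIAL": 1, "COUNTY": 27053},
    {"SERIAL": 2, "COUNTY": 0},
    {"SERIAL": 3, "COUNTY": 0},
    {"SERIAL": 4, "COUNTY": 6037},
]

share = share_with_identified_county(toy_households)
print(f"{share:.0%} of toy households have an identified county")
```

A weighted version would sum household weights instead of counting rows. Either way, a low share is a signal to consider one of the alternatives described next.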
So, what now? Some researchers simply choose a different level of geography, such as the PUMA, which is identifiable for all records in modern IPUMS USA samples. Others, who are interested in performing aggregated analysis at the county level, seek alternative data sources. IPUMS NHGIS provides geo-coded data that are aggregated and so aren’t subject to the confidentiality restrictions of individual record-level data.
Have a question or comment for IPUMS? Email firstname.lastname@example.org or post on our User Forum.
Story by Jeff R. Bloem
PhD Student, Department of Applied Economics
Graduate Research Assistant, Minnesota Population Center