At IPUMS we try to address every user’s questions and suggestions about our data. This support is one of the features that adds value to IPUMS data. Over time, many of the same questions come up again and again. In this blog series, we share some of these frequently asked questions. Maybe you’ll learn something, or perhaps you’ll just find them interesting. Either way, we hope you enjoy.
Here’s one of those questions:
Why isn’t the large U.S. city I’m interested in analyzing identifiable in the data?
Occasionally a user will contact us wondering why a relatively large city (such as Atlanta, GA or Dallas, TX) is missing from their data extract. This understandably seems strange. Surely these cities have large enough populations to maintain confidentiality of individuals included in the sample, so why are these cities not identifiable in the data?
The reason is that the city is not a geographic level identified in modern U.S. Census samples. Rather, the Public Use Microdata Area (PUMA) is the lowest geographic level in public use data. Sometimes, however, a PUMA boundary is coterminous with a city boundary. In such cases the city is identifiable. In other cases the PUMA encompasses several cities or straddles a city boundary, which means the city is likely not identifiable in public use data.
The IPUMS USA team tries to improve on this with a protocol for identifying cities in samples from 1990 onward. The protocol assigns each PUMA to the city in which the majority of that PUMA’s population resides. A given household may therefore not actually reside within the assigned city, since the protocol only judges where a household is most likely located. As a result, there is an error rate associated with this protocol. The default threshold, set by IPUMS, is that fewer than 10% of households are misallocated. Researchers can make this error rate more restrictive by using the CITYERR variable.
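As a rough illustration of tightening the error rate, the sketch below filters a toy list of households down to those whose identified city has an error rate under a stricter cutoff. Everything here is assumed for the example: the `Household` record, the `cityerr_pct` field (an error rate expressed in percent), and the sample values are hypothetical and do not reflect the actual layout or coding of CITYERR in an IPUMS extract, so check the variable documentation before adapting it.

```python
from dataclasses import dataclass


@dataclass
class Household:
    serial: int         # hypothetical household identifier
    city: str           # hypothetical city label; "" means no city identified
    cityerr_pct: float  # hypothetical error rate for that city, in percent


def restrict_by_city_error(households, max_error_pct):
    """Keep households with an identified city whose error rate is below the cutoff."""
    return [
        h for h in households
        if h.city and h.cityerr_pct < max_error_pct
    ]


# Made-up sample records, for illustration only.
sample = [
    Household(1, "City A", 0.0),
    Household(2, "City B", 4.5),
    Household(3, "City C", 9.2),
    Household(4, "", 0.0),
]

# Apply a stricter cutoff than the 10% default described above.
strict = restrict_by_city_error(sample, max_error_pct=5.0)
print([h.serial for h in strict])
```

A lower cutoff trades sample size for accuracy: fewer cities remain identifiable, but the households assigned to them are more likely to actually live there.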
Story by Jeff R. Bloem
PhD Student, Department of Applied Economics
Graduate Research Assistant, Minnesota Population Center