IPUMS FAQs: Why is the 2014 sample twice as large as the total population?

At IPUMS we try to address every user’s questions and suggestions about our data; this support is one of the features that add value to IPUMS data. Over time, many questions come up again and again. In this blog series, we will share some of these frequently asked questions. Maybe you’ll learn something, or perhaps you’ll just find them interesting. Either way, we hope you enjoy them.

Here’s one of those questions:

Why is the 2014 sample twice as large as the total population?

If you use the 2014 CPS ASEC sample to calculate the total population size, you will get a number of about 626 million. For those who have the total population of the US memorized, this number will seem a bit high: it is roughly double the true 2014 population. Many users report this observation, either because they think they’ve uncovered an error (and thus deserve a coveted IPUMS mug) or because they are simply looking for clarification.

The reason for this peculiarity is that in 2014 the US Census Bureau introduced an experimental redesign of the income questions in the ASEC questionnaire. The experiment gave 3/8ths of the sample the new income questions and the remaining 5/8ths of the sample the old income questions. Importantly, the 3/8ths and the 5/8ths samples each carry their own WTSUPP values, so researchers can calculate statistics representative of the entire US population from either sample on its own. Therefore, users who pool the two samples (or simply don’t actively separate them) will inadvertently calculate a population size roughly twice what is expected.
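To see why pooling doubles the total, here is a minimal Python sketch on made-up toy records (not real CPS data). It assumes the population size is estimated by summing WTSUPP, and it labels the two subsamples with plain strings for illustration. Because each subsample's weights are scaled to represent the whole population, each subsample sums to the same total on its own, so the pooled sum comes out twice as large.

```python
# Made-up toy records: (subsample, WTSUPP). Real extracts have many more rows.
records = [
    # 3/8ths subsample: its weights alone represent the whole (toy) population
    ("3/8", 300.0), ("3/8", 400.0), ("3/8", 300.0),
    # 5/8ths subsample: its weights also represent the whole (toy) population
    ("5/8", 200.0), ("5/8", 200.0), ("5/8", 200.0), ("5/8", 200.0), ("5/8", 200.0),
]


def weighted_total(rows):
    """Estimate the population size as the sum of the WTSUPP weights."""
    return sum(weight for _, weight in rows)


three_eighths = [r for r in records if r[0] == "3/8"]
five_eighths = [r for r in records if r[0] == "5/8"]

print(weighted_total(three_eighths))  # 1000.0
print(weighted_total(five_eighths))   # 1000.0
print(weighted_total(records))        # 2000.0: pooling counts the population twice
```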

The best way around this issue is to use only one of the two samples in the 2014 ASEC. The HFLAG variable identifies whether a respondent is part of the 3/8ths or the 5/8ths sample. Ultimately, the decision of which sample to use is up to the researcher: the income variables in the 5/8ths sample are the same as the pre-2014 income variables, while the income variables in the 3/8ths sample are the same as the post-2014 income variables. This presentation by the US Census Bureau discusses the differences between these income questions and how the comparability of income variables over time may, or may not, be affected.
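A minimal sketch of this filtering step in plain Python, on hypothetical records. The helper `select_2014_subsample` and the record layout are illustrative only, not part of any IPUMS tool, and the HFLAG coding used here (1 for the 3/8ths file, 0 for the 5/8ths file) is an assumption to verify in the IPUMS CPS codebook. Records from other years pass through unchanged, so the same filter can be applied to a multi-year extract.

```python
# Assumed HFLAG coding (verify in the IPUMS CPS codebook):
# 1 = 3/8ths file (redesigned income questions), 0 = 5/8ths file (old questions).
HFLAG_THREE_EIGHTHS = 1
HFLAG_FIVE_EIGHTHS = 0


def select_2014_subsample(rows, redesigned_income=True):
    """Hypothetical helper: keep only one of the two 2014 ASEC subsamples.

    Rows from years other than 2014 are kept unchanged.
    redesigned_income=True keeps the 3/8ths file; False keeps the 5/8ths file.
    """
    wanted = HFLAG_THREE_EIGHTHS if redesigned_income else HFLAG_FIVE_EIGHTHS
    return [row for row in rows if row["YEAR"] != 2014 or row["HFLAG"] == wanted]


# Made-up records; HFLAG is None where it does not apply in this toy layout.
rows = [
    {"YEAR": 2013, "HFLAG": None, "WTSUPP": 1000.0},
    {"YEAR": 2014, "HFLAG": 1, "WTSUPP": 600.0},
    {"YEAR": 2014, "HFLAG": 1, "WTSUPP": 400.0},
    {"YEAR": 2014, "HFLAG": 0, "WTSUPP": 500.0},
    {"YEAR": 2014, "HFLAG": 0, "WTSUPP": 500.0},
]

old_questions = select_2014_subsample(rows, redesigned_income=False)
total_2014 = sum(r["WTSUPP"] for r in old_questions if r["YEAR"] == 2014)
print(len(old_questions), total_2014)  # 3 1000.0
```

After filtering, the 2014 weights once again sum to a single population total instead of two.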

Some may be tempted to simply divide the WTSUPP values in the 2014 ASEC sample by 2, so as to preserve the full sample size by pooling the 3/8ths and 5/8ths samples. Although this method seems reasonable, its accuracy has not been verified in all situations. Researchers who want to maximize sample size and estimate precision should compare results from this method with results from the method discussed above.
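The halving approach can be sketched the same way, again on made-up records and with the same assumed HFLAG coding (1 for the 3/8ths file, 0 for the 5/8ths file). Halving the weights fixes the population total. For a weighted mean, multiplying every weight by the same factor cancels out, so the halving leaves the pooled mean unchanged; when the two subsamples' weights sum to (roughly) the same total, the pooled mean is (roughly) the simple average of the two subsample means, which blends answers to the old and new income questions. Printing all three means side by side is one way to make the comparison suggested above.

```python
# Made-up 2014 records: (HFLAG, WTSUPP, income). Not real CPS values.
# Assumption: HFLAG == 1 marks the 3/8ths file and HFLAG == 0 the 5/8ths file.
rows = [
    (1, 600.0, 40_000.0), (1, 400.0, 60_000.0),
    (0, 500.0, 30_000.0), (0, 500.0, 50_000.0),
]


def weighted_total(rows, weight_scale=1.0):
    """Sum of WTSUPP, optionally rescaled (0.5 = the 'divide by 2' approach)."""
    return sum(weight * weight_scale for _, weight, _ in rows)


def weighted_mean_income(rows):
    """Weighted mean income; a common rescaling of all weights cancels out."""
    total_weight = sum(weight for _, weight, _ in rows)
    return sum(weight * income for _, weight, income in rows) / total_weight


pooled_total = weighted_total(rows, weight_scale=0.5)
pooled_mean = weighted_mean_income(rows)
mean_3_8 = weighted_mean_income([r for r in rows if r[0] == 1])
mean_5_8 = weighted_mean_income([r for r in rows if r[0] == 0])

print(pooled_total)                     # 1000.0
print(pooled_mean, mean_3_8, mean_5_8)  # 44000.0 48000.0 40000.0
```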

More information about the 3/8ths ASEC file is available here and here.


Story by Jeff R. Bloem
PhD Student, Department of Applied Economics
Graduate Research Assistant, Minnesota Population Center

