IPUMS FAQs: Sample Weights

At IPUMS we try to address every user’s questions and suggestions about our data. This support is just one feature that adds value to IPUMS data. Over time, many of the same questions come up again and again. In a new blog series, we will be sharing some of these frequently asked questions. Maybe you’ll learn something, or perhaps you’ll just find these interesting. Regardless, we hope you enjoy.

Here’s one of those questions:

What are sample weights? And do I need to use them?

Many users write in asking about sample weights. Most samples within IPUMS projects have complex sampling designs. This means that not everyone who is found in the data had the same probability of being selected into the sample. Without correcting for these unequal selection probabilities, statistics calculated from the sample may not accurately describe the population.

Here is a simple example: Consider a desert island where exactly 1,000 birds live. A researcher wants to know some basic information about the birds on the island, but only has a budget to collect information on 100 birds. If the researcher takes a random sample of 100 birds from the island, each sampled bird would represent 10 birds on the island (itself and 9 others) in the researcher’s analysis.

Next, we need to account for the fact that not all birds on the island are the same. In fact, there are 850 pelicans and 150 hummingbirds on the island. Performing a 1/10 random sample would provide about 15 hummingbirds. This isn’t a large enough sample size for the researcher to understand the characteristics of the hummingbird population on the island. So, instead the sampling procedure becomes a bit more complicated. The researcher samples 30 hummingbirds and 70 pelicans. Now every hummingbird in the researcher’s sample represents 5 hummingbirds (150 / 30) and each pelican represents about 12 pelicans (850 / 70). One kind of sample weight, which is often used in IPUMS data, records how many members of the total population each observation in the sample represents.
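For readers who like to see the arithmetic, here is a minimal Python sketch of how those weights come about. It uses only the counts from the island example; the variable names are our own and are not IPUMS variable names.

```python
# Population and sample counts from the island example.
population = {"pelican": 850, "hummingbird": 150}
sampled = {"pelican": 70, "hummingbird": 30}

# A sampled bird's weight is the number of birds in the population
# that it stands in for: population count divided by sample count.
weights = {species: population[species] / sampled[species] for species in population}

for species, weight in weights.items():
    print(f"{species}: each sampled bird represents {weight:.2f} birds")

# Adding up the weights of all 100 sampled birds recovers the island's population.
total_represented = sum(weights[species] * sampled[species] for species in sampled)
print(f"total birds represented: {total_represented:.0f}")
```

The hummingbird weight is exactly 5, the pelican weight is a little over 12, and the weights of all sampled birds add up to the 1,000 birds on the island.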

So, when is using sample weights necessary? Returning to the island of birds, it isn’t too difficult to see that if we were to simply calculate the average height of the birds in our sample without applying weights, we would calculate an incorrect average for the total population. This is because, without accounting for sampling weights, we will be over-representing the height of hummingbirds and under-representing the height of pelicans. As a general rule of thumb, sample weights are necessary when researchers are calculating descriptive statistics, such as averages, of some characteristic of the population. Regressions are less clear-cut: weighted and unweighted regressions will generally give different point estimates and standard errors, although the two tend to be similar when the model is correctly specified and controls for the characteristics that determined selection into the sample.
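The bird example can be sketched in a few lines of Python to show how far off an unweighted average can be. The heights below are made up purely for illustration (every pelican is assumed to be 120 cm tall and every hummingbird 10 cm); only the population and sample counts come from the example above.

```python
# Hypothetical heights in centimeters, invented for this illustration only.
HEIGHT_CM = {"pelican": 120.0, "hummingbird": 10.0}

# Population and sample counts from the island example.
population = {"pelican": 850, "hummingbird": 150}
sampled = {"pelican": 70, "hummingbird": 30}

# Build the sample: one (height, weight) record per sampled bird.
sample = []
for species, n in sampled.items():
    weight = population[species] / n
    sample.extend((HEIGHT_CM[species], weight) for _ in range(n))

# Unweighted mean treats every sampled bird as equally representative.
unweighted_mean = sum(h for h, _ in sample) / len(sample)

# Weighted mean lets each bird count for the number of birds it represents.
weighted_mean = sum(h * w for h, w in sample) / sum(w for _, w in sample)

# True population mean, known here only because we invented the heights.
true_mean = sum(HEIGHT_CM[s] * population[s] for s in population) / sum(population.values())

print(f"unweighted: {unweighted_mean:.1f} cm")
print(f"weighted:   {weighted_mean:.1f} cm")
print(f"true:       {true_mean:.1f} cm")
```

With these invented heights, the unweighted average is 87.0 cm, while the weighted average matches the true population average of 103.5 cm. The unweighted figure is too low because the small hummingbirds make up 30 percent of the sample but only 15 percent of the island.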

For more information on sample weights in IPUMS samples, see the following resources:

Have a question or comment for IPUMS? Email ipums@umn.edu or post on our User Forum.

Story by Jeff R. Bloem
PhD Student, Department of Applied Economics
Graduate Research Assistant, Minnesota Population Center