By Julia A. Rivera Drew, Kari C.W. Williams, and Natalie Del Ponte
The IPUMS NHIS project offers integrated versions of the National Health Interview Survey (NHIS) data, the leading source of nationally representative information on the health of the U.S. population. The National Center for Health Statistics (NCHS) collects the NHIS data through face-to-face interviews covering information about health, health insurance coverage, health care utilization, socioeconomic characteristics, and demographics of all household members. The survey is representative of the civilian, non-institutionalized U.S. population, with annual samples ranging from 30,000 to 50,000 households and 75,000 to 100,000 people. NCHS has collected the NHIS annually since 1957 (with digital copies of the data available going back to 1963), making it the longest-running annual survey of health in the world.
Periodically, aspects of data collection, such as the sampling frame, oversampled populations, or questionnaire content, are revised to better capture the most pressing health concerns of Americans, shifts in the demographic makeup of the population, and where people reside within the U.S. Most of these revisions are modest, reflecting changes in U.S. population composition and distribution detected in the most recent decennial census. However, 2019 heralded the largest change in NHIS data collection since 1997. In fall 2020, NCHS will release the 2019 public use data files, the first data collected under the newly redesigned NHIS. The upcoming release of the 2019 data warrants a look back at how NCHS collected the NHIS data over the 1997-2018 period.
1997-2018 at a Glance
The data collection design of the 1997-2018 NHIS was largely comparable over time. There were a few minor changes during this period, the largest taking place between 2005 and 2006 to update the sampling frame to reflect the 2000 Census and add an oversample of Asian persons. Most oversamples were discontinued in 2016 (see the IPUMS NHIS note on Sample Design for more information). Under the 1997-2018 design (illustrated in Figure 1), the NHIS was a sample of households, where each household could potentially contain multiple families. One representative from each family, the family respondent, provided demographic, health status, and health insurance coverage information about all family members. In addition to the data collected for all family members, interviewers randomly sampled one adult and one child per family to complete additional interviews (the “sample adult” and “sample child” questionnaires, respectively). Through this mechanism, the NHIS collected further information on topics such as Body Mass Index, mental health, access to health care, health behaviors, and (for adults) sexual orientation and details about paid employment. NCHS releases standalone data files for each of these content areas (households, families, family members, sample adults, and sample children) every year. IPUMS NHIS allows users to review variables from all content areas and include them in a single data extract.
Figure 1. 1997-2018 NHIS Data Collection
For IPUMS NHIS users interested in combining information collected on different parts of the survey, understanding the NHIS data collection process is important for two reasons. First, when users design analyses of the NHIS data, they must take into account the extent to which the overlap of topical supplements collected for sample adults and sample children varies by subject area and over time. Second, which variables analysts combine determines which sampling weight is most appropriate for analyses that utilize data from these different content areas.
Overlapping Sample Adult and Sample Child Content
Users interested in the rich topical content of NHIS may wish to design analyses that take advantage of the occasional and recurring supplements asked of sample adults and sample children. However, it is important to note that the items collected by the sample adult questionnaire are not necessarily also part of the sample child questionnaire, and vice versa. Even when similar topics are covered, the two questionnaires may not include identical measures. IPUMS NHIS combines sample adult and sample child measures into a single integrated variable wherever they overlap to make it easier for users interested in looking at both groups.
Additionally, because NCHS fields some supplements only in certain years, some topical combinations are not possible: the relevant supplemental questions were never asked in the same year (e.g., the balance problems supplement never overlaps with the complementary and alternative medicine supplement). IPUMS NHIS users who add variables of topical interest to their data requests without confirming that they are available for all the relevant years may be confused to find missing values where they did not expect any.
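One way to catch this problem early is to tabulate, for each variable in an extract, the years in which it has any non-missing values, and then look at the years the variables share. The following is a minimal sketch in plain Python, not an IPUMS tool. The variable names SUPP_A and SUPP_B, the years, and the values are all made up for illustration, and the sketch assumes that missing and not-in-universe codes have already been recoded to None.

```python
def years_with_data(records, variables, year_key="YEAR"):
    """Map each variable to the sorted years in which it has a non-missing value.

    Assumes missing / not-in-universe codes were already recoded to None.
    """
    found = {var: set() for var in variables}
    for row in records:
        for var in variables:
            if row.get(var) is not None:
                found[var].add(row[year_key])
    return {var: sorted(years) for var, years in found.items()}


def common_years(availability):
    """Return the sorted years in which every variable has data."""
    year_sets = [set(years) for years in availability.values()]
    if not year_sets:
        return []
    return sorted(set.intersection(*year_sets))


# Hypothetical extract: SUPP_A and SUPP_B stand in for items from two
# supplements that were never fielded in the same year.
records = [
    {"YEAR": 2001, "SUPP_A": 1, "SUPP_B": None},
    {"YEAR": 2001, "SUPP_A": 2, "SUPP_B": None},
    {"YEAR": 2002, "SUPP_A": None, "SUPP_B": 1},
    {"YEAR": 2003, "SUPP_A": None, "SUPP_B": None},
]

availability = years_with_data(records, ["SUPP_A", "SUPP_B"])
shared = common_years(availability)
```

An empty list of shared years signals that the two variables cannot be analyzed together, no matter how many years are pooled.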
Selection of Appropriate Sampling Weight
As described above, NHIS is a complex, multistage probability sample. Users must make use of sampling weights to produce population representative point estimates. For information on producing correct standard errors and statistical tests, see the IPUMS NHIS user note on variance estimation. Because NCHS releases standalone data files for each content area, the original data include multiple weight variables (at least one for each file). Most person-level analyses using IPUMS NHIS will use PERWEIGHT or SAMPWEIGHT.
The IPUMS NHIS variable PERWEIGHT corresponds to WTFA in the original NCHS data files. PERWEIGHT is appropriate for analyses that use variables collected for all family members. The IPUMS NHIS variable SAMPWEIGHT combines two separate weights, one for sample adults and one for sample children, from the original NCHS data files. SAMPWEIGHT reports the sample adult weight only if the person is the selected sample adult and the sample child weight only if the person is the selected sample child. SAMPWEIGHT is 0 for all other persons. SAMPWEIGHT is appropriate for analyses that include variables collected as part of the sample adult or sample child content of the questionnaire. In cases where both types of variables are included, users should apply the more restrictive of the two weights (SAMPWEIGHT in this case).
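The weight-selection rule above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than an IPUMS-provided routine: the helper names, the toy records, and the variable SAMPLE_ITEM are hypothetical, and AGE is used only as an example of a variable collected for all family members. The sketch produces weighted point estimates only; correct standard errors require the design information covered in the variance estimation user note.

```python
def choose_weight(uses_sample_content):
    """Pick the weight name following the rule described in the text.

    Any analysis that includes sample adult or sample child variables uses
    SAMPWEIGHT (the more restrictive weight); otherwise PERWEIGHT.
    """
    return "SAMPWEIGHT" if uses_sample_content else "PERWEIGHT"


def weighted_mean(records, value_key, weight_key):
    """Weighted mean over records with a non-missing value and positive weight."""
    numerator = 0.0
    denominator = 0.0
    for row in records:
        value = row.get(value_key)
        weight = row[weight_key]
        if value is None or weight <= 0:
            continue  # e.g. SAMPWEIGHT is 0 for persons not selected
        numerator += weight * value
        denominator += weight
    if denominator == 0:
        raise ValueError("no records with positive weight and non-missing value")
    return numerator / denominator


# Toy family (all numbers invented): one sample adult, one non-selected
# adult, and one sample child.
persons = [
    {"AGE": 40, "SAMPLE_ITEM": 1, "PERWEIGHT": 1000, "SAMPWEIGHT": 3000},
    {"AGE": 38, "SAMPLE_ITEM": None, "PERWEIGHT": 1000, "SAMPWEIGHT": 0},
    {"AGE": 10, "SAMPLE_ITEM": 0, "PERWEIGHT": 2000, "SAMPWEIGHT": 2000},
]

# Variable collected for all family members: use PERWEIGHT.
mean_age = weighted_mean(persons, "AGE", choose_weight(False))

# Variable from sample adult / sample child content: use SAMPWEIGHT.
share_item = weighted_mean(persons, "SAMPLE_ITEM", choose_weight(True))
```

Skipping zero-weight records makes explicit that persons who were not selected as the sample adult or sample child contribute nothing to SAMPWEIGHT-based estimates.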
Look for a future post describing the 2019 NHIS redesign after the 2019 data are released in fall 2020. Until then, you may be interested in these IPUMS NHIS user notes on Sample Design, Sampling Weights, and Variance Estimation in NHIS data.