Overview of NHIS Data Collection, 1997-2018

By Julia A. Rivera Drew, Kari C.W. Williams, and Natalie Del Ponte

The IPUMS NHIS project offers integrated versions of the National Health Interview Survey (NHIS) data, the leading source of nationally representative information on the health of the U.S. population. The National Center for Health Statistics (NCHS) collects the NHIS data through face-to-face interviews covering information about health, health insurance coverage, health care utilization, socioeconomic characteristics, and demographics of all household members. The data are representative of the civilian, non-institutionalized U.S. population, with annual samples ranging from 30,000 to 50,000 households and from 75,000 to 100,000 people. NCHS has collected the NHIS annually since 1957 (with digital copies of the data available going back to 1963), making it the longest running annual survey of health in the world.

Periodically, aspects of data collection – such as the sampling frame, oversampled populations, or questionnaire content – change to better capture shifts in Americans' most pressing health concerns, in the demographic makeup of the population, or in where people reside within the U.S. Most of these changes are modest, reflecting changes in U.S. population composition and distribution detected in the most recent decennial census. However, 2019 heralded the largest change in NHIS data collection since 1997. In fall 2020, the NCHS will release the 2019 public use data files, the first data collected under the newly redesigned NHIS. The upcoming release of the 2019 data warrants a look back at how NCHS collected the NHIS data over the 1997-2018 period.

1997-2018 at a Glance

The data collection design of the 1997-2018 NHIS was largely comparable over time. There were a few minor changes during this period, the largest taking place between 2005 and 2006 to update the sampling frame to reflect the 2000 Census and add an oversample of Asian persons. Most oversamples were discontinued in 2016 (see the IPUMS NHIS note on Sample Design for more information). Under the 1997-2018 design (illustrated in Figure 1), the NHIS was a sample of households, where each household could potentially contain multiple families. One representative from each family, the family respondent, provided demographic, health status, and health insurance coverage information about all family members. In addition to the data collected for all family members, interviewers randomly sampled one adult and one child per family to complete additional interviews (the “sample adult” and “sample child” questionnaires, respectively). Through this mechanism, the NHIS collected further information on topics such as Body Mass Index, mental health, access to health care, health behaviors, and (for adults) sexual orientation and details about paid employment. NCHS releases standalone data files for each of these content areas (households, families, family members, sample adults, and sample children) every year. IPUMS NHIS allows users to review variables from all content areas and include them in a single data extract.

Figure 1. 1997-2018 NHIS Data Collection

Illustration of sampling of data for NHIS

For IPUMS NHIS users interested in combining information collected on different parts of the survey, understanding the NHIS data collection process is important for two reasons. First, when users design analyses of the NHIS data, they must take into account the extent to which the overlap of topical supplements collected for sample adults and sample children varies by subject area and over time. Second, which variables analysts combine determines which sampling weight is most appropriate for analyses that utilize data from these different content areas.

Overlapping Sample Adult and Sample Child Content

Users interested in the rich topical content of NHIS may wish to design analyses that take advantage of the occasional and recurring supplements asked of sample adults and sample children. However, it is important to note that the items collected by the sample adult questionnaire are not necessarily also part of the sample child questionnaire, and vice versa. Even when similar topics are covered, the two questionnaires may not include identical measures. IPUMS NHIS combines sample adult and sample child measures into a single integrated variable wherever they overlap to make it easier for users interested in looking at both groups.

Additionally, because NCHS fields some supplements only in certain years, there are topical combinations that are not possible because NCHS never asks specific supplemental questions in the same year (e.g., the balance problems supplement never overlaps with the complementary and alternative medicine supplement). IPUMS NHIS users who add variables of topical interest to their data requests without confirming that they are available for all the relevant years may be confused to find missing values where they did not expect any.
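One way to avoid this surprise is to check, before submitting a data request, which years all the variables of interest have in common. The sketch below assumes you have already looked up each variable's availability; the variable names and years are placeholders for illustration, not values taken from the IPUMS NHIS documentation.

```python
# Hypothetical availability table: variable name -> survey years in which it was fielded.
# Names and years are illustrative placeholders only.
availability = {
    "BALANCE_ITEM": {2008, 2016},
    "CAM_ITEM": {2002, 2007, 2012},
    "CORE_ITEM": set(range(1997, 2019)),
}

def common_years(variables, availability):
    """Return the sorted years in which every requested variable is available."""
    year_sets = [availability[name] for name in variables]
    return sorted(set.intersection(*year_sets)) if year_sets else []

# Two supplements that are never fielded in the same year share no years.
overlap = common_years(["BALANCE_ITEM", "CAM_ITEM"], availability)

# A variable asked every year only restricts the analysis to the supplement's years.
core_and_cam = common_years(["CORE_ITEM", "CAM_ITEM"], availability)
```

An empty result means the combination cannot be analyzed together, no matter which years are requested.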

Selection of Appropriate Sampling Weight

As described above, NHIS is a complex, multistage probability sample. Users must make use of sampling weights to produce population-representative point estimates. For information on producing correct standard errors and statistical tests, see the IPUMS NHIS user note on variance estimation. Because NCHS releases standalone data files for each content area, the original NCHS data include more weight variables (at least one for each file) than most IPUMS NHIS users will need. Most person-level analyses using IPUMS NHIS will use PERWEIGHT or SAMPWEIGHT.

The IPUMS NHIS variable PERWEIGHT corresponds to WTFA in the original NCHS data files. PERWEIGHT is appropriate for analyses that use variables collected for all family members. The IPUMS NHIS variable SAMPWEIGHT combines two separate weights, one for sample adults and one for sample children, from the original NCHS data files. SAMPWEIGHT reports the sample adult weight only if the person is the selected sample adult and the sample child weight only if the person is the selected sample child. SAMPWEIGHT is 0 for all other persons. SAMPWEIGHT is appropriate for analyses that include variables collected as part of the sample adult or sample child content of the questionnaire. In cases where both types of variables are included, users should apply the more restrictive of the two weights (SAMPWEIGHT in this case).
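The rule above can be sketched in a few lines of Python. This is only an illustration: PERWEIGHT and SAMPWEIGHT are the real IPUMS NHIS weight names, but the content-area labels, the helper functions, and the toy records are made up for this example, and the code produces point estimates only (it does not address variance estimation).

```python
# Content areas collected only for the selected sample adult or sample child
# (labels are hypothetical, chosen for this sketch).
SAMPLE_CONTENT = {"sample_adult", "sample_child"}

def choose_weight(content_areas):
    """Use SAMPWEIGHT if any sample adult/child content is involved, else PERWEIGHT."""
    if SAMPLE_CONTENT & set(content_areas):
        return "SAMPWEIGHT"
    return "PERWEIGHT"

def weighted_mean(records, value_key, weight_key):
    """Weighted mean over records that carry a positive weight."""
    usable = [r for r in records if r[weight_key] > 0]
    total_weight = sum(r[weight_key] for r in usable)
    return sum(r[value_key] * r[weight_key] for r in usable) / total_weight

# Toy family members; the second person was not selected as a sample adult,
# so SAMPWEIGHT is 0 and the sample adult item ("smoker") is missing.
people = [
    {"insured": 1, "smoker": 1, "PERWEIGHT": 1000, "SAMPWEIGHT": 3000},
    {"insured": 1, "smoker": None, "PERWEIGHT": 1000, "SAMPWEIGHT": 0},
    {"insured": 0, "smoker": 0, "PERWEIGHT": 2000, "SAMPWEIGHT": 1000},
]

insured_share = weighted_mean(people, "insured", choose_weight(["family_member"]))
smoker_share = weighted_mean(
    people, "smoker", choose_weight(["family_member", "sample_adult"])
)
```

Because SAMPWEIGHT is 0 for everyone who was not selected, filtering on a positive weight also drops the records where sample adult or sample child items were never asked.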

Look for a future post describing the 2019 NHIS redesign after it is released in the fall of 2020. Until then, you may be interested in these IPUMS NHIS user notes on Sample Design, Sampling Weights, and Variance Estimation in NHIS data.

New survey data from IPUMS PMA allows for exploration of factors in child nutrition status

By Devon Kristiansen

Last month, when IPUMS PMA released data from nine countries, including the most recent person level and service delivery point level surveys on family planning, we also released data on a new topic for Performance Monitoring for Action (PMA) – nutrition. PMA conducted two survey rounds each in Burkina Faso and Kenya (2017 and 2018), both in people’s homes (households) and where they received care and medical services (service delivery points). Household surveys contained questions about the diet and nutritional status of children under 5 and women between 10 and 49 years, antenatal care and advice received by currently or recently pregnant women, and other household and demographic questions. Service delivery points were surveyed for medical equipment and services relating to malnutrition and anthropometric monitoring.

A key factor for nutrition status of young children in the low- and middle-income country (LMIC) context is incidence of diarrhea. Diarrhea prevents the uptake of nutrients into the child’s body and causes dehydration. According to the World Health Organization¹, diarrhea is the leading cause of malnutrition and second leading cause of death for children under 5 globally. A well-established association in the nutrition literature is the presence of livestock on the homestead and incidence of diarrhea in young children, due to fecal contamination of water and food sources²,³.

The newly released IPUMS PMA Nutrition data confirm past findings regarding the presence of livestock and diarrheal disease incidence. This blog post is a brief, informal exploration of one type of research question these data can address. In the Burkina Faso and Kenya 2017 Nutrition rounds, 26.9% of children under 2 years old living on a homestead with livestock present had experienced diarrhea in the past 2 weeks, compared to 20.2% of young children living on homesteads without livestock. The difference between these prevalence rates is statistically significant.
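For readers who want to see what a comparison like this involves, here is a minimal sketch of a two-proportion z-test in Python. The two prevalence rates come from the paragraph above, but the group sizes are placeholders (the post does not report them), and this simple test ignores survey weights and clustering, so it is not the analysis behind the result described here.

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    """Unweighted two-sided z-test for the difference between two proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p_value

# Prevalence rates from the text above.
p_livestock, p_no_livestock = 0.269, 0.202

# Hypothetical group sizes, for illustration only.
n_livestock, n_no_livestock = 1000, 1000

difference = p_livestock - p_no_livestock  # 6.7 percentage points
z, p_value = two_proportion_z(p_livestock, n_livestock, p_no_livestock, n_no_livestock)
```

With real data, the group sizes, sampling weights, and survey design would all need to enter the calculation.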

The richness of IPUMS PMA Nutrition data allows researchers to further study the nuances of this association and test other hypotheses related to child nutrition. For example, I looked to see if there was a significant correlation between the presence of livestock and diarrhea in young children after controlling for urban-rural status, and examined the impact of possible mitigating factors.

It’s important to note the differences in how ‘urban’ and ‘rural’ are defined in different countries. For example, Burkina Faso’s definition of urban is a locality of more than 10,000 people with sufficient socio-economic and administrative infrastructures. In contrast, Kenya’s definition of urban is municipalities, town councils, and other urban centers with 2,000 or more inhabitants.  The comparability tab on IPUMS PMA makes it easier to identify these differences and take them into consideration when comparing data across countries and surveys.

I found that higher wealth, the child’s mother having a secondary education or higher, treatment of drinking water by boiling, and the presence of hand sanitation facilities in the household may have protective effects on children’s health; that is, these factors are associated with a lower probability of diarrhea in young children.

Looking at the occurrence of diarrhea in young children and factors expected to mitigate it revealed unexpected results. Surprisingly, access to protected drinking water sources and treated water was positively associated with children experiencing diarrhea.  Perhaps families do not take additional precautions when they perceive their drinking water source to be safe.

Also surprising was the finding that the mother’s educational level only seemed to have a protective impact when it was at the post-secondary level. Primary and middle school educational levels among mothers were actually associated with greater diarrhea incidence when compared to mothers with less education.

The results I describe come from an informal look into IPUMS PMA’s new Nutrition data, which hold much potential for further research. More than 1000 new variables have been added to IPUMS PMA from the Nutrition module data.

Like the family planning core surveys, household and service delivery point data can be linked together by the variable (the primary sampling unit).  For more information on how to link person data and facility data, see our user note.
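As a rough illustration of this kind of linkage, the Python sketch below attaches to each household every facility that shares its primary sampling unit. The key name `psu_id` is a placeholder (it is not the actual IPUMS PMA variable name), and the records are invented; the user note remains the authoritative guide.

```python
from collections import defaultdict

def link_by_psu(households, facilities, key="psu_id"):
    """Attach to each household the facilities found in the same primary sampling unit."""
    facilities_by_psu = defaultdict(list)
    for facility in facilities:
        facilities_by_psu[facility[key]].append(facility)
    return [
        {**household, "facilities": facilities_by_psu.get(household[key], [])}
        for household in households
    ]

# Invented records; "psu_id" stands in for the real linking variable.
households = [
    {"hh_id": 1, "psu_id": "A"},
    {"hh_id": 2, "psu_id": "B"},
    {"hh_id": 3, "psu_id": "C"},
]
facilities = [
    {"facility_id": 10, "psu_id": "A"},
    {"facility_id": 11, "psu_id": "A"},
    {"facility_id": 12, "psu_id": "B"},
]

linked = link_by_psu(households, facilities)
```

Households in a sampling unit with no surveyed facility simply receive an empty list, which makes gaps in coverage easy to spot.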

We hope you are able to leverage these data in your own research!

As always – use it for good! 


1 https://www.who.int/news-room/fact-sheets/detail/diarrhoeal-disease

2 Mosites, E. M., Rabinowitz, P. M., Thumbi, S. M., Montgomery, J. M., Palmer, G. H., May, S., … & Walson, J. L. (2015). The relationship between livestock ownership and child stunting in three countries in Eastern Africa using national survey data. PLoS One, 10(9).

3 Kaur, M., Graham, J. P., & Eisenberg, J. N. (2017). Livestock ownership among rural households and child morbidity and mortality: an analysis of demographic health survey data from 30 sub-Saharan African countries (2005–2015). The American Journal of Tropical Medicine and Hygiene, 96(3), 741-748.

IPUMS ATUS data now available for online analysis

A Q&A about the new tool

By Daniel Backman, Senior Data Analyst, IPUMS

Earlier this year, the IPUMS Time Use team enabled analysis of American Time Use Survey (ATUS) data via an online data analysis tool. The Survey Documentation and Analysis (SDA) program was developed at UC Berkeley and allows users to analyze data online without a statistical package.

What data are available for analysis?

All years of ATUS data are available for online analysis. Users can choose to analyze a single year of ATUS data, or select among a number of multiple-year data files. Data from specific modules are also pooled together to facilitate analysis of ATUS module data and appropriate weights are set as defaults.

If you are familiar with ATUS data, it is important to note that the data in SDA are not in a hierarchical (or time sequence) format. As such, you are not able to create your own time use variables that summarize time use within a person through the SDA tool. However, a number of pre-fabricated time use variables are available (BLS and IPUMS summary variables as well as the ERS Eating and Health module time use variables).
