Even More IPUMS Data Available in the SDA Online Data Analysis Tool

By Daniel Backman

Beyond offering the ability to create and download customized datasets from the IPUMS microdata collections, we also support web-based analysis of the data through the SDA (Survey Documentation and Analysis) online data analysis tool. SDA empowers users to analyze IPUMS data directly from their web browsers without the need for additional software or advanced programming skills. Whether you’re a seasoned researcher or a student exploring data for the first time, the SDA tool makes it easier than ever to unlock insights from our datasets. If you’re a current SDA user and ready to get started, check out the new datasets from IPUMS CPS and IPUMS MEPS. Otherwise, read on to learn more about SDA and how to use this tool to analyze IPUMS data.

About IPUMS & SDA

What is SDA?

The SDA tool is a web-based interface that allows you to generate frequency tables, cross-tabulations, and summary statistics; create customized data visualizations, including bar charts, line graphs, and scatter plots; perform regression analysis; and export results as a CSV file for presentations or further analysis.

SDA increases the accessibility of data by allowing users to analyze data through a web interface without needing to use (or purchase!) statistical software. Detailed guidance is available on how to use the tool for analyses and how to manipulate variables. SDA also processes data in real time and returns results within seconds, making it ideal for use in the classroom or other interactive settings. See our data training exercises page for exercises that will guide you through using SDA to analyze IPUMS data.

What’s new?

IPUMS CPS

In addition to the previously available ASEC data for 1962-present, IPUMS CPS has released 17 additional datasets covering BMS data and supplement topics.

  • All BMS Samples: 1976-Present. This dataset contains every month and year of the BMS sample (588 and counting) and all BMS person and household-level core variables.
  • CPS Supplement Specific Datasets. These datasets are tailor-made to include all available samples for each specific supplemental topic available via IPUMS CPS. There are 16 supplement-specific datasets, such as Tobacco Use, Food Security, and Education. Supplement datasets include supplement-specific weights and all supplement variables offered by IPUMS CPS in addition to the BMS core person and household-level variables.

IPUMS MEPS

IPUMS MEPS now offers the ability to analyze person-level variables derived from the Full Year Consolidated (FYC) files (i.e., those listed under the “Annual” variable drop down menu).

  • All MEPS Combined Samples: 1996-Present. This dataset contains all years of MEPS FYC data and all person-level annual variables available from IPUMS MEPS in those years. This combined dataset allows for pooled and time trend analysis.
  • Single-year MEPS samples. These datasets each contain only one year of MEPS and its corresponding variables and weights. The single-year datasets allow for faster data analysis.

These are just the newest additions of IPUMS data to SDA; they augment the datasets previously available for online analysis from IPUMS USA (decennial censuses and ACS), IPUMS CPS (ASEC), IPUMS International, IPUMS Time Use (ATUS and MTUS), IPUMS NHIS, and IPUMS DHS.

Let’s Look at an Example!

Because IPUMS MEPS data are newly available for analysis with SDA, let’s use MEPS as an example.

First, How to Get Started

Using IPUMS MEPS SDA as our example, follow these steps to begin exploring data with the SDA tool:

Choose a dataset. From the IPUMS MEPS SDA page, select your dataset. Let’s use the All MEPS Combined dataset. This dataset contains all years of MEPS data, including the relevant technical survey variables, such as weights, primary sampling unit, and strata variables. For more information on technical variable considerations when using these SDA datasets, view the documentation on our IPUMS MEPS SDA page.

Select variables for analysis. You can use the SDA built-in variable search and selection tools to identify variables of interest. For this analysis, let’s look at the distribution of insulin users by sex across the 2007-2017 MEPS data.

To discover variables, you can use the search tool (in the upper-left corner of the SDA interface in the Variable Selection pane) or explore the drop-down menus (below the search bar in the Variable Selection pane – note that these topical groupings correspond to those on the IPUMS MEPS website). If you know the variable name or prefer to explore variables through the IPUMS MEPS user interface, you can enter variable names directly in the fields for the SDA program you are interested in running.

Figure 1: Screenshot of SDA home page for the data collection All MEPS: 1996-2022 highlighting three methods of variable discovery: view variable metadata in Selected text box, find variables in the variable group menu, or enter variable name directly in Row box to begin analysis.

Run your analysis. For this example, let’s create a cross-tabulation showing the sex distribution (SEX) of insulin users (INSULIN==2 “Yes, now taking insulin”) for 2007-2017. We stop at 2017 because in 2018 MEPS changed how conditions, including diabetes, were reported. We will also calculate confidence intervals and standard errors. Under the Tables tab (this is the default view in SDA), we will enter the criteria for our cross-tabulation as follows:

Row: SEX
Column: YEAR(2007-2017)
Control: INSULIN(2)
Selection Filter(s):
WEIGHT: DIABWEIGHT

Appending parentheses that contain a subset of codes after the variable name is a shortcut for defining a filter (see the SDA variable manipulation guidance for other tips). INSULIN is part of the MEPS Diabetes Care Supplement, which is fielded only to MEPS household members who were ever diagnosed with diabetes. Analyses of INSULIN therefore require the Diabetes Care weight, so you will need to select DIABWEIGHT from the weight drop-down menu (PERWEIGHT is the default weight).
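If it helps to see what this table specification amounts to, here is a minimal Python sketch of a weighted cross-tabulation with column percentages. It is an illustration only, not how SDA is implemented: the records are invented, the SEX codes (1 = male, 2 = female) and the INSULIN code 1 for non-users are assumptions, and `weighted_column_percents` is a hypothetical helper name.

```python
from collections import defaultdict

# Toy person-year records. All values are invented for illustration.
# Assumed codes: SEX 1 = male, 2 = female; INSULIN 2 = "Yes, now taking insulin".
records = [
    {"YEAR": 2007, "SEX": 1, "INSULIN": 2, "DIABWEIGHT": 1500.0},
    {"YEAR": 2007, "SEX": 2, "INSULIN": 2, "DIABWEIGHT": 1000.0},
    {"YEAR": 2007, "SEX": 2, "INSULIN": 1, "DIABWEIGHT": 900.0},
    {"YEAR": 2017, "SEX": 1, "INSULIN": 2, "DIABWEIGHT": 1200.0},
    {"YEAR": 2017, "SEX": 2, "INSULIN": 2, "DIABWEIGHT": 1800.0},
    {"YEAR": 2005, "SEX": 2, "INSULIN": 2, "DIABWEIGHT": 700.0},
]


def weighted_column_percents(rows, row_var, col_var, weight_var, keep):
    """Return {(row_value, col_value): percent of the column's weighted total}."""
    cell = defaultdict(float)
    col_total = defaultdict(float)
    for r in rows:
        if not keep(r):
            continue  # record falls outside the requested subset
        w = r[weight_var]
        cell[(r[row_var], r[col_var])] += w
        col_total[r[col_var]] += w
    return {key: 100.0 * w / col_total[key[1]] for key, w in cell.items()}


def keep(r):
    # Mirrors YEAR(2007-2017) and INSULIN(2) from the table specification.
    return 2007 <= r["YEAR"] <= 2017 and r["INSULIN"] == 2


table = weighted_column_percents(records, "SEX", "YEAR", "DIABWEIGHT", keep)
for (sex, year), pct in sorted(table.items(), key=lambda kv: (kv[0][1], kv[0][0])):
    print(year, sex, round(pct, 1))
```

Each column (year) sums to 100 percent, so a cell reads as the weighted share of that year's insulin users who fall in the given SEX category.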

To calculate confidence intervals and standard errors, check the applicable boxes under the Output Options accordion menu. SDA will automatically apply the appropriate sample design variables to correctly estimate standard errors. There are lots of ways to customize your output through the Chart Options, Decimal Options and Create and Download CSV file, but we won’t use those for this example. When you are ready to generate your cross-tab, click the “Run the Table” box!

Figure 2: Screenshot of SDA home page with Output Options menu open. Variables displayed will create a table that cross-tabulates sex and year for adults currently using insulin from 2007-2017. Under the Output Options menu, the checkboxes for ‘Confidence intervals’ (Level: 95 percent) and ‘Standard error of each percent’ are selected, and the “Run the Table” button is highlighted.

Visualize results. After you “run the table,” SDA should generate your results within seconds. By default, the cross-tabulation output will include column percentages, weighted population count, and a bar chart. Because we selected the confidence intervals and standard error options, those will be displayed as well. At the bottom of the results page, you will see that our standard error calculations were produced using STRATAPLD and PSUPLD, which are the default technical survey variables for this combined years dataset.

We can see that, among people currently using insulin, the proportion who are female increased between 2007 and 2017.

Figure 3: Screenshot of SDA-generated cross-tabulation table and bar chart using the variable definitions entered in Figure 2. The cells within the table display column percentages, confidence intervals, standard errors, and the weighted population.
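For readers curious how stratum and PSU variables such as STRATAPLD and PSUPLD can feed into a standard error, below is a rough Python sketch of one common approach, Taylor series linearization for a weighted proportion. This is not a description of SDA's internals; the function name `ratio_se`, the field names, and the toy data are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def ratio_se(rows, stratum, psu, weight, numerator, denominator):
    """Linearized SE of sum(w*num) / sum(w*den), with PSUs nested in strata."""
    x_total = sum(r[weight] * r[denominator] for r in rows)
    y_total = sum(r[weight] * r[numerator] for r in rows)
    ratio = y_total / x_total

    # Sum each record's linearized value up to its PSU, within its stratum.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for r in rows:
        z = r[weight] * (r[numerator] - ratio * r[denominator]) / x_total
        psu_totals[r[stratum]][r[psu]] += z

    variance = 0.0
    for totals in psu_totals.values():
        n = len(totals)
        if n < 2:
            continue  # simplification: single-PSU strata add nothing here
        mean = sum(totals.values()) / n
        variance += n / (n - 1) * sum((t - mean) ** 2 for t in totals.values())
    return ratio, math.sqrt(variance)


# Toy data: one stratum, two PSUs, equal weights; "female" is a 0/1 indicator
# and "in_domain" marks records that belong in the denominator.
toy = [
    {"stratum": 1, "psu": 1, "w": 1.0, "female": 1, "in_domain": 1},
    {"stratum": 1, "psu": 1, "w": 1.0, "female": 1, "in_domain": 1},
    {"stratum": 1, "psu": 2, "w": 1.0, "female": 0, "in_domain": 1},
    {"stratum": 1, "psu": 2, "w": 1.0, "female": 0, "in_domain": 1},
]
proportion, se = ratio_se(toy, "stratum", "psu", "w", "female", "in_domain")
print(proportion, se)
```

The key idea is that variability is measured between PSU totals within each stratum rather than between individual records, which is why the design variables matter for the standard errors SDA reports.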

Quick Tips

  • Read the documentation. Familiarize yourself with SDA’s capabilities and limitations to make the most of its features. The SDA online analysis homepage for each IPUMS data collection includes links to relevant technical documentation for that specific data collection.
  • Use SDA on its own or with IPUMS extracts. Frequency tables and descriptive statistics are a great way to explore the data before you make an extract to run more complex analyses. For example, you can confirm that there is sufficient sample size for your analysis by showing the unweighted number of observations.
  • Leverage the CSV export option. If you want to produce figures with your SDA output outside of the SDA tool, save yourself the hassle of cleaning up output that you have copy-pasted into Excel. Instead, export the output as a CSV.
  • Customize your options. The output, chart, and decimal options drop-down menus allow you to tailor the display of your results. Explore the choices to help drive home the most important takeaways from your results.

Introducing the MEPS Variable Builder!

By Julia A. Rivera Drew

Earlier this year, IPUMS MEPS launched a new feature – the MEPS Variable Builder – to make it dramatically easier to create customized person-level variables that summarize information from the medical event and condition records and add them to your IPUMS extract. If you have ever thought about using the MEPS event and condition data but didn’t know where to begin because of the complexity of the data, the MEPS Variable Builder is for you!

The Medical Expenditure Panel Survey Household Component (MEPS-HC, referred to as MEPS here) provides comprehensive information on characteristics of people residing in responding households, as well as information about their medical encounters during the calendar year – e.g., office-based provider visits, emergency room (ER) visits, and hospitalizations – and medical conditions associated with those medical encounters. This unique combination of information makes the MEPS data ideal for research questions that need detailed health care utilization and/or expenditure data alongside individual-level correlates of health. However, these rich data can be difficult to work with, creating barriers for researchers who wish to use the MEPS data.

IPUMS MEPS created the MEPS Variable Builder to enable users to easily build person-level variables summarizing information from the MEPS-HC event and medical condition records, also known as “event summary variables.” Using a point-and-click interface, researchers can create custom event summary variables that count the number of events or sum expenditures across event records, filtered on selected characteristics of events and/or medical conditions. Users can then include these custom event summary variables in their IPUMS extract. At this time, the variable builder does not include prescribed medicines data.
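To make the idea of an event summary variable concrete, here is a small Python sketch of the underlying logic: filter event records on chosen characteristics, then count events and sum an expenditure field per person. The Variable Builder itself is point-and-click; everything below, including the field names (`person_id`, `work_injury`, `workers_comp_paid`) and the `summarize_events` helper, is hypothetical and does not correspond to actual IPUMS MEPS variable names.

```python
# Hypothetical event-level records, one row per medical event.
events = [
    {"person_id": "A", "event_type": "office", "work_injury": True, "workers_comp_paid": 120.0},
    {"person_id": "A", "event_type": "er", "work_injury": True, "workers_comp_paid": 800.0},
    {"person_id": "A", "event_type": "office", "work_injury": False, "workers_comp_paid": 0.0},
    {"person_id": "B", "event_type": "office", "work_injury": False, "workers_comp_paid": 0.0},
]


def summarize_events(events, person_ids, keep, amount_field):
    """Build one summary per person: matching event count and summed amount."""
    # Start every person at zero so people with no matching events still appear.
    summary = {pid: {"n_events": 0, "total": 0.0} for pid in person_ids}
    for e in events:
        if keep(e):
            s = summary[e["person_id"]]
            s["n_events"] += 1
            s["total"] += e[amount_field]
    return summary


# Person "C" is in the person file but has no event records at all.
summary = summarize_events(
    events,
    person_ids=["A", "B", "C"],
    keep=lambda e: e["work_injury"],
    amount_field="workers_comp_paid",
)
print(summary)
```

The result has one row per person, which is what allows a summary like this to sit alongside other person-level variables in an extract.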

In this blog post, we run through an example where we create a variable that is the sum of all expenditures paid for by Workers’ Compensation for medical visits due to a workplace injury.

Continue reading…

Introducing the MEPS Prescribed Medicines Data

By Julia A. Rivera Drew

The Household Component of the Medical Expenditure Panel Survey (MEPS), administered by the Agency for Healthcare Research and Quality (AHRQ), is a short panel survey collecting information for a nationally representative sample of the civilian, noninstitutionalized population. Since 1996, the MEPS has collected information on demographic and socioeconomic characteristics; health status; medical conditions; and health care access, utilization, and expenditures.

Based on information provided by a family respondent about each family member at each interview, AHRQ produces a dataset of all reported fills of prescribed medicines purchased by family members during the calendar year (including refills). For example, if a prescription was filled monthly, there would be 12 records for that specific prescribed medicine (DRUGID) in the annual file. The prescribed medicines data includes information such as the medication name (RXNAME), national drug code (RXNDC), therapeutic classification (MULTC1), when the person began taking the medication (RXBEGMM and RXBEGYR), amounts paid (RXFEXPTOT), and source of payment (RXFEXPSRC).
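As a rough illustration of this fill-level structure, the Python sketch below builds toy records in which one medicine is filled monthly (12 records) and another is filled once, then collapses them to one row per person and medicine. The values are invented, `PERSON` is a placeholder identifier rather than an IPUMS MEPS variable, and DRUGID and RXFEXPTOT are used only as labels taken from the description above.

```python
from collections import defaultdict

# Twelve monthly fills of one medicine, plus a single fill of another.
fills = [
    {"PERSON": "A", "DRUGID": "drug-1", "RXFEXPTOT": 10.0} for _ in range(12)
]
fills.append({"PERSON": "A", "DRUGID": "drug-2", "RXFEXPTOT": 35.5})


def collapse_fills(fills):
    """Collapse fill-level records to one row per (PERSON, DRUGID)."""
    out = defaultdict(lambda: {"n_fills": 0, "total_paid": 0.0})
    for f in fills:
        row = out[(f["PERSON"], f["DRUGID"])]
        row["n_fills"] += 1
        row["total_paid"] += f["RXFEXPTOT"]
    return dict(out)


per_drug = collapse_fills(fills)
for key, row in sorted(per_drug.items()):
    print(key, row)
```

Counting records directly would overstate the number of distinct medicines a person takes, so it is worth deciding up front whether an analysis needs fills or medicines as its unit.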

IPUMS MEPS provides a harmonized and integrated version of the MEPS Household Component data, including data from the prescribed medicines files.

Continue reading…

IPUMS Announces 2020 Research Award Recipients

IPUMS is excited to announce the winners of its annual IPUMS Research Awards. These awards honor the best published research and nominated graduate student papers from 2020 that used IPUMS data to advance or deepen our understanding of social and demographic processes.

IPUMS, developed by and housed at the University of Minnesota, is the world’s largest individual-level population database, providing harmonized data on people in the U.S. and around the world to researchers at no cost.

There are six award categories, and each is tied to the following IPUMS projects:

  • IPUMS USA, providing data from the U.S. decennial censuses, the American Community Survey, and IPUMS CPS from 1850 to the present.
  • IPUMS International, providing harmonized data contributed by more than 100 international statistical office partners; it currently includes information on 500 million people in more than 200 censuses from around the world, from 1960 forward.
  • IPUMS Health Surveys, which makes available the U.S. National Health Interview Survey (NHIS) and the Medical Expenditure Panel Survey (MEPS).
  • IPUMS Spatial, covering IPUMS NHGIS and IPUMS Terra. NHGIS includes GIS boundary files from 1790 to the present; Terra provides data on population and the environment from 1960 to the present.
  • IPUMS Global Health, providing harmonized data from the Demographic and Health Surveys and the Performance Monitoring and Accountability surveys, for low- and middle-income countries from the 1980s to the present.
  • IPUMS Time Use, providing time diary data from the U.S. and around the world from 1965 to the present.

Over 2,500 publications based on IPUMS data appeared in journals, magazines, and newspapers worldwide last year. From these publications and from nominated graduate student papers, the award committees selected the 2020 honorees.

Continue reading…