Even More IPUMS Data Available in the SDA Online Data Analysis Tool

By Daniel Backman

Beyond offering the ability to create and download customized datasets from the IPUMS microdata collections, we also support web-based analysis of the data through the SDA (Survey Documentation and Analysis) online data analysis tool. SDA empowers users to analyze IPUMS data directly from their web browsers without the need for additional software or advanced programming skills. Whether you’re a seasoned researcher or a student exploring data for the first time, the SDA tool makes it easier than ever to unlock insights from our datasets. If you’re a current SDA user and ready to get started, check out the new datasets from IPUMS CPS and IPUMS MEPS. Otherwise, read on to learn more about SDA and how to use this tool to analyze IPUMS data.

About IPUMS & SDA

What is SDA?

The SDA tool is a web-based interface that allows you to generate frequency tables, cross-tabulations, and summary statistics; create customized data visualizations, including bar charts, line graphs, and scatter plots; perform regression analysis; and export results as a CSV file for presentations or further analysis.

SDA increases the accessibility of data by allowing users to analyze data through a web interface without needing to use (or purchase!) statistical software. Detailed guidance is available on how to use the tool for analyses and how to manipulate variables. SDA also processes data in real time and returns results quickly, making it ideal for use in the classroom or other interactive settings. See our data training exercises page for exercises that will guide you through using SDA to analyze IPUMS data.

What’s new?

IPUMS CPS

In addition to the previously available ASEC data for 1962-present, IPUMS CPS has released 17 additional datasets covering BMS data and supplement topics.

  • All BMS Samples: 1976-Present. This dataset contains every month and year of the BMS sample (588 and counting) and all BMS person and household-level core variables.
  • CPS Supplement Specific Datasets. These datasets are tailor-made to include all available samples for each specific supplemental topic available via IPUMS CPS. There are 16 supplement-specific datasets, such as Tobacco Use, Food Security, and Education. Supplement datasets include supplement-specific weights and all supplement variables offered by IPUMS CPS in addition to the BMS core person and household-level variables.

IPUMS MEPS

IPUMS MEPS now offers the ability to analyze person-level variables derived from the Full Year Consolidated (FYC) files (i.e., those listed under the “Annual” variable drop down menu).

  • All MEPS Combined Samples: 1996-Present. This dataset contains all years of MEPS FYC data and all person-level annual variables available from IPUMS MEPS in those years. This combined dataset allows for pooled and time trend analysis.
  • Single-year MEPS samples. These datasets each contain only one year of MEPS and their corresponding variables and weights. The single year datasets allow for faster data analysis.

These are just the newest additions of IPUMS data to SDA; they augment the datasets already available for online analysis from IPUMS USA (decennial censuses and ACS), IPUMS CPS (ASEC), IPUMS International, IPUMS Time Use (ATUS and MTUS), IPUMS NHIS, and IPUMS DHS.

Let’s Look at an Example!

Because IPUMS MEPS data are newly available for analysis with SDA, let’s use MEPS as an example.

First, How to Get Started

Using IPUMS MEPS SDA as our example, follow these steps to begin exploring data with the SDA tool:

Choose a dataset. From the IPUMS MEPS SDA page, select your dataset. Let’s use the All MEPS Combined dataset. This dataset contains all years of MEPS data, including the relevant technical survey variables, such as weights, primary sampling unit, and strata variables. For more information on technical variable considerations when using these SDA datasets, view the documentation on our IPUMS MEPS SDA page.

Select variables for analysis. You can use SDA’s built-in variable search and selection tools to identify variables of interest. For this analysis, let’s look at the distribution of insulin users by sex from 2007 to 2017 in the MEPS data.

To discover variables, you can use the search tool (in the upper-left corner of the SDA interface in the Variable Selection pane) or explore the drop-down menus (below the search bar in the Variable Selection pane – note that these topical groupings correspond to those on the IPUMS MEPS website). If you know the variable name or prefer to explore variables through the IPUMS MEPS user interface, you can enter variable names directly in the fields for the SDA program you are interested in running.

Figure 1: Screenshot of SDA home page for the data collection All MEPS: 1996-2022 highlighting three methods of variable discovery: view variable metadata in Selected text box, find variables in the variable group menu, or enter variable name directly in Row box to begin analysis.

Run your analysis. For this example, let’s create a cross-tabulation showing the sex distribution (SEX) of insulin users (INSULIN==2 “Yes, now taking insulin”) for 2007-2017. We stop at 2017 because in 2018 MEPS changed how conditions, including diabetes, were reported. We will also calculate the confidence intervals and standard errors. Under the Tables tab (this is the default view in SDA), we will enter the criteria for our cross-tabulation as follows:

Row: SEX
Column: YEAR(2007-2017)
Control: INSULIN(2)
Selection Filter(s):
WEIGHT: DIABWEIGHT

Appending parentheses that contain a subset of codes after the variable is a shortcut to define a filter (see the SDA variable manipulation guidance for other tips). INSULIN is part of the MEPS Diabetes Care Supplement which is fielded only to MEPS household members who were ever diagnosed with diabetes, and requires the use of the Diabetes Care weight, so you will need to select DIABWEIGHT from the weight drop-down menu (PERWEIGHT is the default weight).
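To make the arithmetic behind this table concrete, here is a minimal Python sketch of a weighted cross-tabulation with column percentages. It is not how SDA is implemented; the records, weights, and the helper name weighted_column_percents are all invented for illustration, and only the variable names (SEX, YEAR, INSULIN, DIABWEIGHT) come from the example above.

```python
from collections import defaultdict

# Hypothetical toy records standing in for MEPS person-level rows.
# All values and weights are invented for illustration only.
records = [
    {"YEAR": 2007, "SEX": 1, "INSULIN": 2, "DIABWEIGHT": 1200.0},
    {"YEAR": 2007, "SEX": 2, "INSULIN": 2, "DIABWEIGHT": 800.0},
    {"YEAR": 2007, "SEX": 2, "INSULIN": 1, "DIABWEIGHT": 950.0},
    {"YEAR": 2017, "SEX": 1, "INSULIN": 2, "DIABWEIGHT": 1000.0},
    {"YEAR": 2017, "SEX": 2, "INSULIN": 2, "DIABWEIGHT": 3000.0},
    {"YEAR": 2005, "SEX": 2, "INSULIN": 2, "DIABWEIGHT": 700.0},
]


def weighted_column_percents(rows, row_var, col_var, weight_var):
    """Return {column value: {row value: percent}} from weighted counts."""
    cell_totals = defaultdict(float)
    col_totals = defaultdict(float)
    for r in rows:
        cell_totals[(r[col_var], r[row_var])] += r[weight_var]
        col_totals[r[col_var]] += r[weight_var]
    table = defaultdict(dict)
    for (col, row), total in cell_totals.items():
        table[col][row] = 100.0 * total / col_totals[col]
    return dict(table)


# YEAR(2007-2017) and INSULIN(2) both restrict which rows are kept.
kept = [r for r in records if 2007 <= r["YEAR"] <= 2017 and r["INSULIN"] == 2]
table = weighted_column_percents(kept, "SEX", "YEAR", "DIABWEIGHT")

for year in sorted(table):
    print(year, {sex: round(pct, 1) for sex, pct in sorted(table[year].items())})
```

Each column (year) sums to 100 percent, which is why the output describes the sex composition of insulin users within a year rather than the share of each sex that uses insulin.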

To calculate confidence intervals and standard errors, check the applicable boxes under the Output Options accordion menu. SDA will automatically apply the appropriate sample design variables to correctly estimate standard errors. There are lots of ways to customize your output through the Chart Options, Decimal Options and Create and Download CSV file, but we won’t use those for this example. When you are ready to generate your cross-tab, click the “Run the Table” box!

Figure 2: Screenshot of SDA home page with Output Options menu open. Variables displayed will create a table that cross-tabulates sex and year for adults currently using insulin from 2007-2017. Under the output options menu, the checkboxes for ‘Confidence Intervals’ and ‘Standard error of each percent’ are selected.

Visualize results. After you “run the table,” SDA should generate your results within seconds. By default, the cross-tabulation output will include column percentages, weighted population count, and a bar chart. Because we selected the confidence intervals and standard error options, those will be displayed as well. At the bottom of the results page, you will see that our standard error calculations were produced using STRATAPLD and PSUPLD, which are the default technical survey variables for this combined years dataset.
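If you want to see how a percentage and its standard error relate to a 95 percent confidence interval, the sketch below uses the common normal approximation (estimate plus or minus 1.96 standard errors). This is an assumption for illustration; SDA's own interval calculation may differ, and the function name and input values are hypothetical.

```python
def normal_confidence_interval(percent, std_error, z=1.96):
    """Symmetric interval: the estimate plus or minus z standard errors.

    The result is clipped to the 0-100 range because the input is a percent.
    """
    lower = max(0.0, percent - z * std_error)
    upper = min(100.0, percent + z * std_error)
    return lower, upper


# Invented inputs: a column percent of 55.0 with a standard error of 2.0.
low, high = normal_confidence_interval(55.0, 2.0)
print(f"{low:.2f} to {high:.2f}")
```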

We can see that the proportion of current insulin users who are female increased between 2007 and 2017.

Figure 3: Screenshot of SDA-generated cross-tabulation table and bar chart using the variable definitions entered in Figure 2. The cells within the table display column percentages, confidence intervals, standard errors, and the weighted population.

Quick Tips

  • Read the documentation. Familiarize yourself with SDA’s capabilities and limitations to make the most of its features. The SDA online analysis homepage for each IPUMS data collection includes links to relevant technical documentation for that specific data collection.
  • Use SDA on its own or with IPUMS extracts. Frequency tables and descriptive statistics are a great way to explore the data before you make an extract to run more complex analyses. For example, you can confirm that there is sufficient sample size for your analysis by showing the unweighted number of observations.
  • Leverage the CSV export option. If you want to produce figures with your SDA output outside of the SDA tool, save yourself the hassle of cleaning up output that you have copy-pasted into Excel. Instead, export the output as a CSV.
  • Customize your options. The output, chart, and decimal options drop-down menus allow you to tailor the display of your results. Explore the choices to help drive home the most important takeaways from your results.
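The sample-size tip above boils down to counting unweighted observations per cell and flagging the thin ones. Here is a small sketch of that check; the observations and the MIN_CELL_SIZE threshold are invented, and a sensible threshold depends on your analysis.

```python
from collections import Counter

# Hypothetical (YEAR, SEX) pairs, one per unweighted observation.
observations = [(2007, 1), (2007, 1), (2007, 2), (2017, 1), (2017, 2), (2017, 2)]

# Illustrative threshold only; choose one that suits your analysis.
MIN_CELL_SIZE = 2

cell_counts = Counter(observations)
small_cells = sorted(cell for cell, n in cell_counts.items() if n < MIN_CELL_SIZE)
print(small_cells)
```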

IPUMS FAQ: Alternative Measures of Unemployment

By Matthew Bombyk

As part of the IPUMS mission to democratize data, our User Support team strives to answer your questions about the data. Over time, some questions are repeated. This blog post is an extension of an earlier series addressing frequently asked questions. Maybe you’ll learn something. Perhaps you’ll just find the information interesting. Regardless, we hope you enjoy it!

Here’s one of those questions:

How can I use IPUMS CPS to calculate the Alternative Measures of Unemployment published by the BLS?

Every month the Bureau of Labor Statistics (BLS) publishes a set of Alternative Measures of Labor Underutilization as part of its well-known Employment Situation News Release. A common question we are asked at IPUMS is how to calculate these rates using IPUMS CPS data. The “headline” unemployment figure is known as U-3 and is a straightforward calculation using only the main employment status variable, EMPSTAT. However, the other measures are not quite as simple. Nonetheless, these can be calculated using IPUMS CPS! Using the table below, you can calculate these rates using the public use microdata.
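As a rough illustration of the "straightforward" U-3 case, the sketch below computes a weighted unemployment rate from EMPSTAT. The grouping of codes (10 and 12 as employed; 20, 21, and 22 as unemployed) and the use of the WTFINL weight reflect my reading of the IPUMS CPS documentation and should be checked there; the records themselves are invented.

```python
# Toy person records; EMPSTAT values and WTFINL weights are invented.
people = [
    {"EMPSTAT": 10, "WTFINL": 1500.0},  # at work
    {"EMPSTAT": 12, "WTFINL": 500.0},   # has job, not at work last week
    {"EMPSTAT": 21, "WTFINL": 100.0},   # unemployed, experienced worker
    {"EMPSTAT": 22, "WTFINL": 25.0},    # unemployed, new worker
    {"EMPSTAT": 36, "WTFINL": 900.0},   # not in labor force
    {"EMPSTAT": 1, "WTFINL": 300.0},    # Armed Forces (outside civilian labor force)
]

# Assumed code groupings; verify against the EMPSTAT codes page.
EMPLOYED = {10, 12}
UNEMPLOYED = {20, 21, 22}


def u3_rate(rows):
    """Weighted unemployed as a percent of the weighted civilian labor force."""
    employed = sum(r["WTFINL"] for r in rows if r["EMPSTAT"] in EMPLOYED)
    unemployed = sum(r["WTFINL"] for r in rows if r["EMPSTAT"] in UNEMPLOYED)
    return 100.0 * unemployed / (employed + unemployed)


rate = u3_rate(people)
print(round(rate, 2))
```

Keep in mind that the headline rate BLS publishes is seasonally adjusted, so a rate computed directly from the microdata like this will not match it exactly.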

Continue reading…

Automating monthly workflows using IPUMS CPS and the IPUMS Microdata Extract API

By Renae Rodgers

As many readers will know, the Current Population Survey (CPS) is a monthly labor force survey that is, among other things, the data source for the monthly jobs report (or more formally the Employment Situation reports) from the Bureau of Labor Statistics.

In this blog post, I will show you how to create a reproducible, sustainable monthly workflow that updates previous analyses with new data, using IPUMS CPS data, the IPUMS Microdata Extract API, and the ipumspy Python library.

If this is not your first CPS rodeo, you may already have a monthly workflow for working with IPUMS CPS data that suits your needs just fine – perhaps written in Stata. Did you know you can use ipumspy to make IPUMS CPS extracts from Stata?! Check out the set up instructions and template .do file in this blog post and optimize your monthly analysis even more with the IPUMS Microdata Extract API!

But I digress. In this blog post, I will first walk through a simple analysis using the IPUMS Microdata Extract API and ipumspy. I will then show you how to package that workflow so that it can be rerun each month, refreshing the analysis as soon as the most recent data become available from IPUMS CPS.

An example IPUMS CPS, IPUMS Microdata Extract API workflow: teleworking due to COVID-19

Let’s suppose that we’re interested in looking at trends in telework due to COVID-19 over the course of the pandemic. The IPUMS CPS variable COVIDTELEW indicates whether the respondent worked from home at any time during the past 4 weeks due to COVID-19. This example will show us the overall trend in remote work due to COVID-19 as well as how teleworking breaks down by educational attainment. First we’ll define an IPUMS CPS extract that contains COVIDTELEW and EDUC variables and all months from May 2020 to June 2022.
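Before defining the extract, it helps to see what "all months from May 2020 to June 2022" looks like as a list of sample IDs. The sketch below builds that list in plain Python. The naming pattern it uses (cpsYYYY_MMs for basic monthly samples, with a "b" suffix for the March basic monthly sample because the "s" suffix in March denotes the ASEC) is an assumption to confirm against the IPUMS CPS sample ID page, and monthly_sample_ids is a hypothetical helper, not part of ipumspy.

```python
def monthly_sample_ids(start, end):
    """Build one sample ID per month from start to end, inclusive.

    start and end are (year, month) tuples. The ID pattern is an assumption;
    check it against the IPUMS CPS sample ID page before relying on it.
    """
    ids = []
    year, month = start
    while (year, month) <= end:
        suffix = "b" if month == 3 else "s"  # assumed March basic monthly suffix
        ids.append(f"cps{year}_{month:02d}{suffix}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return ids


samples = monthly_sample_ids((2020, 5), (2022, 6))
variables = ["COVIDTELEW", "EDUC"]
print(len(samples), samples[0], samples[-1])
```

The resulting list of samples and the two variable names are the pieces an extract definition needs; submitting the extract additionally requires an API key and the ipumspy client, which this sketch leaves out.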

Continue reading…

Helpful Functions for Constructing Reproducible Workflows in the Absence of an IPUMS Microdata Metadata API

By Renae Rodgers

As you may have heard, the IPUMS Microdata Extract API is now in beta for our IPUMS USA and IPUMS CPS collections! Yay! You may have also heard that an IPUMS Metadata API is not yet available. Bummer. This means that users still need to do data discovery via the IPUMS web sites, which means a lot of clicking. It also makes it difficult to set up a reproducible workflow that can be updated as new data become available (and who doesn’t want that?). This blog post will show you some easy functions you can use to retrieve sample metadata to create a monthly or annual workflow that doesn’t require visiting an IPUMS website, and walk through some ipumspy functionality to access metadata for variables already included in your IPUMS extracts. The workflow in this blog post is using ipumspy v0.2.1.

If Stata is your stats package of choice, you can leverage ipumspy to make IPUMS extracts from a Stata do file! You can spruce up your workflow with any of the functions in the blog post below by incorporating them into the template .do file offered in our blog post on making IPUMS extracts from Stata.

Sample Metadata

To get started, import the utilities module from ipumspy. All of the helper functions in this blog post will be using the CollectionInformation class from this module. The sample_ids attribute of CollectionInformation returns a dictionary with sample descriptions as keys and sample ids as values. This information is pulled from the sample ID page for the specified IPUMS data collection. This page is the source of the sample_ids dictionary for IPUMS CPS, and this page is the source of the sample_ids dictionary for IPUMS USA.

from ipumspy import utilities
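To give a feel for working with a dictionary shaped like sample_ids (descriptions as keys, sample IDs as values), here is a sketch that searches the descriptions for a piece of text. The dictionary below is a hypothetical stand-in with invented descriptions, not real output from ipumspy, and ids_matching is an illustrative helper rather than a library function.

```python
# Hypothetical stand-in for the sample_ids dictionary described above.
# Real descriptions come from the IPUMS sample ID page and will differ.
sample_ids = {
    "Hypothetical CPS ASEC 2020": "cps2020_03s",
    "Hypothetical CPS May 2020": "cps2020_05s",
    "Hypothetical CPS June 2020": "cps2020_06s",
}


def ids_matching(samples, text):
    """Return sample IDs whose description contains text (case-insensitive)."""
    needle = text.lower()
    return [sid for description, sid in samples.items() if needle in description.lower()]


may_ids = ids_matching(sample_ids, "may 2020")
print(may_ids)
```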

Functions for retrieving IPUMS CPS sample IDs

First, let’s take a look at the sample ID dictionary for IPUMS CPS.

Continue reading…

An Introduction to the IPUMS Extract API for Microdata

By Renae Rodgers

Have you heard the news?! The IPUMS Extract API now supports microdata! For users who have been clamoring for this feature for some time, feel free to skip to the final section for resources to get started. For our users who haven’t been awaiting this announcement with bated breath, and who may be saying to themselves, “ok…great…but…”



This blog post will give a brief introduction to APIs, give some examples of ways to use the IPUMS Extract API in your workflow, and share some more in-depth resources.

What is an API?

API stands for Application Programming Interface. An API is an intermediate layer between a user and a server that allows the user to interact programmatically with another program or a service. First, the user’s program talks to the API – this is known as making an API call or a request. The API, in turn, talks to the server, translating the user’s request into something the server can understand. The server returns the requested information to the API, and the API then returns that information to the user. For example, Google Maps has an API that allows developers to request and retrieve information from Google Maps from within their applications, without needing to go through a web interface.
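Here is roughly what "making an API call" looks like in code, using only the Python standard library. The sketch builds a request object but does not send it. The base URL, the version parameter, and the use of an Authorization header reflect my understanding of the IPUMS API and are assumptions to verify against its documentation; the key is a placeholder and build_extract_list_request is a hypothetical helper.

```python
import urllib.request

# Assumed values for illustration; check the IPUMS API documentation for the
# current base URL, version, and authentication scheme.
BASE_URL = "https://api.ipums.org/extracts"
API_KEY = "your-api-key-here"  # placeholder, not a real key


def build_extract_list_request(collection):
    """Prepare (but do not send) a request asking the API for recent extracts."""
    url = f"{BASE_URL}?collection={collection}&version=2"
    return urllib.request.Request(url, headers={"Authorization": API_KEY})


request = build_extract_list_request("cps")
print(request.get_method(), request.full_url)

# Sending the request is the actual "API call":
#   with urllib.request.urlopen(request) as response:
#       body = response.read()
```

The request carries everything the API needs (where to go, what collection you mean, and who you are), and the response that comes back is the server's answer relayed through the API.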

At this point you may be thinking, “great, now I have a general idea of what an API is, but I am not a software developer so… thanks anyway.”



The IPUMS Extract API opens up many possibilities for easing collaboration, and creating efficient workflows with only a few simple lines of code. Please read on!

Continue reading…

Introducing CPS-ASEC Longitudinal Extracts

By Renae Rodgers

The panel component of the Current Population Survey and new Longitudinal Extracts

Did you know that the Current Population Survey (CPS) – an important source of information on unemployment, poverty, and many other topics – has a panel component? If you didn’t, you’re not alone. The CPS rotation pattern is complex and can be difficult to work with. In fact, IPUMS CPS has held multi-day workshops intended to introduce researchers to the CPS panel component, help them understand the rotation pattern, and show some convenient IPUMS CPS features that make working with CPS panel data a little easier. If you’re completely new to the CPS panel, check out the materials from our latest workshop!

Maybe you did know about the CPS panel component, but looked at the complex rotation pattern, the Census Bureau guidelines and linking keys, and decided that this was for the birds. If this sounds like you, then our newest IPUMS CPS feature may be right for you! IPUMS CPS users can now download CPS-ASEC panels that contain two observations per person across a one-year period as longitudinal extracts. The rest of this blog post will explain what you are getting when you make a CPS-ASEC longitudinal extract and will walk you through how to create one for yourself.

Continue reading…

IPUMS Announces 2020 Research Award Recipients

IPUMS is excited to announce the winners of its annual IPUMS Research Awards. These awards honor the best published research and nominated graduate student papers from 2020 that used IPUMS data to advance or deepen our understanding of social and demographic processes.

IPUMS, developed by and housed at the University of Minnesota, is the world’s largest individual-level population database, providing harmonized data on people in the U.S. and around the world to researchers at no cost.

There are six award categories, and each is tied to the following IPUMS projects:

  • IPUMS USA, providing data from the U.S. decennial censuses, the American Community Survey, and IPUMS CPS from 1850 to the present.
  • IPUMS International, providing harmonized data contributed by more than 100 international statistical office partners; it currently includes information on 500 million people in more than 200 censuses from around the world, from 1960 forward.
  • IPUMS Health Surveys, which makes available the U.S. National Health Interview Survey (NHIS) and the Medical Expenditure Panel Survey (MEPS).
  • IPUMS Spatial, covering IPUMS NHGIS and IPUMS Terra. NHGIS includes GIS boundary files from 1790 to the present; Terra provides data on population and the environment from 1960 to the present.
  • IPUMS Global Health, providing harmonized data from the Demographic and Health Surveys and the Performance Monitoring and Accountability surveys, for low and middle-income countries from the 1980s to the present.
  • IPUMS Time Use, providing time diary data from the U.S. and around the world from 1965 to the present.

Over 2,500 publications based on IPUMS data appeared in journals, magazines, and newspapers worldwide last year. From these publications and from nominated graduate student papers, the award committees selected the 2020 honorees.

Continue reading…