As part of the 2017 Minnesota Population Center Diversity Fellowship program, we worked on a project titled “IPUMS-International and IPUMS-DHS Stats Education Outreach.” Its goal was to develop basic statistics exercises based on the IPUMS-International (census) and IPUMS-DHS (Demographic and Health Survey) datasets.
The motivation behind this project was to extend the reach of IPUMS datasets to a broader audience of statistics students and teachers around the world. The users we had in mind when designing these exercises were statistics instructors and teaching assistants looking for ready-made exercises based on real data. Specifically, we wanted to draw on the unique quality and breadth of the empirical data in IPUMS-International and IPUMS-DHS, and to expose statistics teachers and students to the complexity and messiness of real-life data. In doing so, we also aimed to bridge the gap between statistics and the social sciences, hoping to get the best of both worlds: the elegance, accuracy, and clarity of statistics on the one hand, and the messier, more holistic approach of the social sciences on the other. Since each of us comes from one of these two very different disciplines, we embodied this endeavor ourselves and had the chance to learn from each other throughout this fruitful summer.
More concretely, the first week of our project involved familiarizing ourselves with the datasets of IPUMS-International and IPUMS-DHS. We created a list of important concepts from an Intro to Statistics class as an outline for us to follow, and from there, we were ready to begin. Right away, we noticed the vast difference in sample sizes between the IPUMS-International and IPUMS-DHS datasets. Since the DHS data contained smaller samples, we decided it would be easier to write exercises using the integrated DHS data first, and then create complementary exercises using the IPUMS-International census data.
Because statistics teachers often do not use, or lack access to, statistical packages such as Stata and SPSS, we wrote all of the code for the exercises in the open-source programming language R. Alongside questions that involved coding in R, we included conceptual questions and questions about interpreting the findings, to ensure that students gain a real understanding of their data and analyses. By the end of the project, we had completed 18 exercises in total, 9 using the IPUMS-International census datasets and 9 using the IPUMS-DHS datasets. These will be posted on the IPUMS-International and IPUMS-DHS websites. We gained valuable experience and thoroughly enjoyed our time at the MPC. We’d like to thank our mentors, Dr. Miriam King and Dr. Lara Cleveland, as well as David Haynes and Mia Riza, for helping us along the way and for giving us such a great opportunity.
Story by Erez Garnai and Stephanie Chen
2017 Summer Diversity Program Fellows, Minnesota Population Center
The Diversity Fellowship Program at the University of Minnesota’s Minnesota Population Center is designed to recruit undergraduate and graduate students to work on U.S. and international demographic data resources. The program runs for 10 weeks over the summer, and students are paired with faculty members and research staff to work on a research project and gain professional skills.