IPUMS Terra: Speeding up Data Transformations

IPUMS Terra integrates population and environmental data by transforming data between microdata records describing individuals, area-level data records describing places, and grid-based raster data. These transformations are computationally intensive and can easily become very time-consuming. For example, in the current IPUMS Terra system, summarizing raster-structured climate data to create area-level data describing Mexican municipios takes about twenty minutes per climate variable. Multiply that by a year’s worth of monthly climate data for several variables, and an extract can easily take many hours to run. Summer Diversity Program Fellows Luyi Hunter and Xinran Duan teamed up with Dr. Eric Shook (Geography faculty) and Dr. Tracy Kugler (IPUMS Terra project manager) to reduce the processing time for these raster-to-area-level transformations.

The Fellows first used a technique known as profiling to identify the specific points in existing code that were taking the most processing time. The profiling exercise revealed that most of the processing time was spent on one particular function. The Fellows then focused on optimizing the computation performed by that function.
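
As an illustration of what profiling looks like in practice, here is a minimal sketch using Python's built-in cProfile module. The story does not say which language or profiler the Fellows used, so Python is an assumption, and the functions `summarize_zone` and `run_transformation` are hypothetical stand-ins for the real transformation code. The point is only that the report ranks functions by time spent, which makes a single dominant function easy to spot.

```python
import cProfile
import io
import pstats


def summarize_zone(raster, zones, zone_id):
    """Hypothetical slow step: scan the whole grid to average one zone."""
    total, count = 0.0, 0
    for row_values, row_zones in zip(raster, zones):
        for value, zone in zip(row_values, row_zones):
            if zone == zone_id:
                total += value
                count += 1
    return total / count if count else float("nan")


def run_transformation(raster, zones):
    """Hypothetical driver: one full scan of the grid per zone."""
    zone_ids = sorted({zone for row in zones for zone in row})
    return {zone_id: summarize_zone(raster, zones, zone_id) for zone_id in zone_ids}


def profile_report(func, *args, top=5):
    """Run func under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the most expensive call paths come first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


# Small synthetic grid: 60 x 60 cells, six zones of ten rows each.
raster = [[float(r + c) for c in range(60)] for r in range(60)]
zones = [[r // 10 for _ in range(60)] for r in range(60)]

means, report = profile_report(run_transformation, raster, zones)
print(report)
```

In the printed report, nearly all of the cumulative time falls under `summarize_zone`, the kind of result that would direct optimization effort toward one function.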

By implementing a new version of the function that takes advantage of existing scientific computing packages and eliminates repeated loops over the data, the Fellows reduced execution time by 83.9% compared to the original version of the transformation code. The Fellows then turned to parallelization to achieve even greater performance improvements. They developed a parallel version of the transformation code that uses multiple processing cores simultaneously to read in the data and process it in smaller chunks. The parallel version reduced execution time by 92.4% compared to the original version, roughly a thirteenfold speedup. With these improvements, the Mexican municipio calculations should drop from twenty minutes per climate variable to about a minute and a half, reducing a year’s worth of monthly analyses for a single climate variable from 4 hours to 18 minutes.
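
The two optimizations described above can be sketched as follows. This is not the Fellows' code: Python, NumPy, the function names, and the synthetic data are all assumptions chosen for illustration. The baseline rescans the grid once per zone, the vectorized version computes every zone's sum and cell count in a single pass with `np.bincount`, and the chunked version splits the grid into row blocks, summarizes the blocks concurrently, and merges the partial results. A thread pool stands in here for whatever multi-core mechanism the real system uses.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def zonal_mean_loop(raster, zones, n_zones):
    """Baseline: rescans the full grid once for every zone."""
    means = np.full(n_zones, np.nan)
    for zone_id in range(n_zones):
        mask = zones == zone_id
        if mask.any():
            means[zone_id] = raster[mask].mean()
    return means


def zonal_sums(raster, zones, n_zones):
    """Single pass: per-zone sums and cell counts via np.bincount."""
    flat_zones = zones.ravel()
    sums = np.bincount(flat_zones, weights=raster.ravel(), minlength=n_zones)
    counts = np.bincount(flat_zones, minlength=n_zones)
    return sums, counts


def zonal_mean_vectorized(raster, zones, n_zones):
    """Per-zone means with no Python-level loop over zones."""
    sums, counts = zonal_sums(raster, zones, n_zones)
    with np.errstate(invalid="ignore", divide="ignore"):
        return sums / counts  # zones with no cells come out as NaN


def zonal_mean_chunked(raster, zones, n_zones, n_chunks=4, workers=4):
    """Summarize row blocks concurrently, then merge the partial results."""
    raster_blocks = np.array_split(raster, n_chunks, axis=0)
    zone_blocks = np.array_split(zones, n_chunks, axis=0)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(
            pool.map(zonal_sums, raster_blocks, zone_blocks, [n_zones] * n_chunks)
        )
    # Sums and counts are additive across blocks, so merging is a plain sum.
    sums = sum(block_sums for block_sums, _ in partials)
    counts = sum(block_counts for _, block_counts in partials)
    with np.errstate(invalid="ignore", divide="ignore"):
        return sums / counts


# Synthetic data: a 200 x 300 grid of values and an integer zone label per cell.
rng = np.random.default_rng(0)
raster = rng.random((200, 300))
zones = rng.integers(0, 25, size=(200, 300))

loop_means = zonal_mean_loop(raster, zones, 25)
vector_means = zonal_mean_vectorized(raster, zones, 25)
chunk_means = zonal_mean_chunked(raster, zones, 25)
```

Chunking works here because the blocks return sums and counts rather than means: partial sums can be added together across blocks, whereas partial means cannot be combined without knowing how many cells each one covers.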

Story by Luyi Hunter and Xinran Duan
2017 Summer Diversity Program Fellows, Minnesota Population Center

The Diversity Fellowship Program at the University of Minnesota’s Minnesota Population Center is designed to help recruit undergraduate and graduate students to work on U.S. or international demographic data resources. The summer program is 10 weeks long, and students are paired with faculty members and research staff to work on a research project and gain professional skills.