Celebrating 30 Years: Three Decades of IPUMS Data

By Diana L. Magnuson; Curator and Historian, ISRDI

"Celebrating 30 years: three decades of IPUMS data" display case with promotional materials, swag items, and historical IPUMS items
“Celebrating 30 years: three decades of IPUMS data” display case at ISRDI Headquarters

“Celebrating 30 Years: Three Decades of IPUMS Data,” the current exhibit at ISRDI Headquarters, highlights thirty years of data innovation at the University of Minnesota. In the late 1980s, the Social History Research Laboratory at the University of Minnesota’s History Department proposed “the creation of a single integrated microdata series composed of public use samples for every year … with the exception of the 1890 census, which was destroyed by fire.” The primary aim was to make the U.S. census microdata “as compatible over time as possible while losing little, if any, of the detail in the original datasets.” (Integrated Public Use Microdata Series: A Prospectus).

Steven Ruggles remembers the moment he went into the History Department lounge on the sixth floor of the Social Science Tower and said, “IPUMS! Integrated Public Use Microdata Series! Isn’t that a great idea?” The response from the graduate research assistants was not enthusiastic. “What a terrible name! You can’t call it that!” According to Ruggles, “It was universal; everyone thought it was just a horrible name … It wasn’t a bad idea to propose, just a terrible thing to call it.” After a brief quandary over pronunciation (Ī-pŭms or Ĭ-pŭms), the name has stuck and is now synonymous with social research, data innovation, and free access. And for the record, we don’t care how you pronounce it, just as long as you cite it!

With funding from both the University of Minnesota and the National Science Foundation (NSF), the Social History Research Laboratory converted existing public use samples of the U.S. census from 1880 to 1980 (excluding 1890) into a single coherent series with extensive documentation. IPUMS data were disseminated through an anonymous FTP site and the first IPUMS dataset was downloaded on November 19, 1993. The first IPUMS website appeared 16 months later in 1995. For three decades, IPUMS continued to secure funding to create additional census datasets. Currently IPUMS is supported by nineteen grants from three funding entities–NIH, NSF, and the Gates Foundation.

Over time, the IPUMS data integration project expanded to include other U.S. microdata sources: the Current Population Survey (IPUMS CPS), the American Community Survey (in IPUMS USA), the National Health Interview Survey and Medical Expenditure Panel Survey (IPUMS Health Surveys), and survey data on scientists and engineers (IPUMS Higher Ed). Census and survey data from around the world were harmonized and disseminated through IPUMS International (with a first data release in 2002), and through IPUMS DHS, IPUMS MICS and IPUMS PMA (in IPUMS Global Health), IPUMS historical full-count census data consists of the North Atlantic Population Project (NAAP, first release 2001, now included in IPUMS International) and IPUMS USA Full Count (1790-1950) data. Aggregate data were also harmonized and disseminated through the National Historical Geographic Information System (IPUMS NHGIS), IPUMS Terra (combining global population, land use, and environmental data; now decommissioned), and the International Historical Geographic Information System (IPUMS IHGIS). Another aggregate data collection, the Contextual Determinants of Health (CDOH) provides access to measures of disparities, policies, and counts, by U.S. state and county, for historical marginalized populations. IPUMS Time Use has harmonized data from three time diary surveys: ATUS, AHTUS, and MTUS.

The first researcher download of IPUMS harmonized microdata occurred in the fall of 1993. Our user base is now at over 300,000 registered researchers worldwide and still growing. IPUMS users currently download over 12,000GB per week. Technological advances over this period of course played a role in the expansion of IPUMS datasets and their use by a wide swath of researchers. For perspective on how far IPUMS has come in 30 years, consider that the largest file available in the 1993 version of IPUMS (the 1990 1% sample) was just 163MB! Today, the largest files in IPUMS are 1940 Full Count File at 923GB and 1950 Full Count File at 835GB! IPUMS succeeded because of two key technological innovations: 1) in 1991, IPUMS introduced the first structured metadata system for data integration and 2) in 1995, IPUMS produced the first interactive web-based system for creating customized pooled datasets.

Our current exhibit highlights these developments by displaying artifacts central to this dynamic history. On display are original documentation, examples of evolving data storage media, and the first iteration of the IPUMS User’s Guide. Promotional swag ranges from mugs to stickers to winter scarves to a “Powered by IPUMS” phone charger. Steve Ruggles donated his first IPUMS Minnesota license plate.


Some content in this blog post was published in:
Diana L. Magnuson and Steven Ruggles, “Challenges of Large-Scale Data Processing in the 1990s: The IPUMS Experience,” special issue of The IEEE Annals of the History of Computing, “The IT of Demography: Analyzing Population Dynamics with Computers,” 44: 71-83.