Measuring the ANZACs: Crowdsourcing a war effort


Historical demographic data has been a big part of the Minnesota Population Center’s history. The MPC can trace its own lineage to the Social History Research Laboratory in the University of Minnesota’s History Department. Current MPC Director Steven Ruggles and one of the MPC’s founding faculty members, Rus Menard, led a project to create a 1% sample of the United States’ 1880 census. Starting in 1988, the data was entered by professional data entry personnel reading microfilm. In the late 1990s and early 2000s, the 1880 census became the first complete-count census that the historical census team at MPC worked on. The complete-count 1880 census was entered by volunteers from the Church of Jesus Christ of Latter-day Saints, introducing us to the challenges of working with data sources created by enthusiastic people around the world.

Now MPC faculty member Evan Roberts is working with the Zooniverse, the world’s leading citizen science organization (in which UMN is a leading institution), to develop new crowd-sourcing methods for transcribing historical demographic data. Historical demographers partner with genealogists and other volunteers to develop datasets because the data wasn’t born digital and is costly to transcribe. While modern censuses and surveys are delivered to the MPC as datasets created by statistical agencies and researchers, historical data has to be transformed from paper or microfilm into a statistical database.

A service record from the Measuring the ANZACs project.

Roberts’ project, Measuring the ANZACs, is an example of the challenges of creating demographic data from historical records. The project has taken nearly 4 million page images of New Zealand soldiers’ personnel files from the South African War and World War I and made them accessible on a website that allows the public to transcribe the records into a structured database of the demographic information on the forms.

Roberts and his co-authors, Kris Inwood (University of Guelph) and Les Oxley (University of Waikato), have been using New Zealand military and prison records, and public health surveys, to examine the changing health of New Zealanders from the mid-nineteenth century to the present. Their research has been published in journals including the Australian Economic History Review, Journal of Family History, and Social Science & Medicine. The research team collected 45,000 records from World War I and World War II, but found their analyses were limited by the size of the database.

The opportunity to expand the database through crowd-sourcing came through a partnership with Zooniverse, one of the world’s leading citizen science organizations. The Zooniverse helps scientists classify huge collections of digital images created for research in areas as diverse as astronomy and wildlife biology. People from around the world assist scientists by classifying images of items such as animal species and galaxies, or by describing aspects of the images that are hard for computers to classify.

Similar challenges exist in historical demography, where there are vast collections of digitized documents but computers do a poor job of reading old handwriting. By tapping into the labor of millions of people worldwide, scientists can create new data on a scale that would be hard to achieve with traditional funding methods for social science.

The documents in the Measuring the ANZACs collection are even less regular and messier than the census data MPC is used to dealing with. Roberts and a CLA freshman intern found there were more than 30 versions of the enlistment form, all with slightly different configurations of the questions asked. Service history sheets had sticky notes affixed to them, so that people had to look at several copies of the page to see what lay under the sticky note.

A sample history sheet for transcription from the Measuring the ANZACs project.

Since a soft launch in October 2015, the Measuring the ANZACs website has been visited by more than 10,000 unique citizen scientists. More than half a million fields of data have been classified, the equivalent of more than 3,000 complete personnel files being transcribed. Each personnel file contains up to 150 variables. Recent publicity for the project in major media outlets including Television New Zealand, TV3, and the BBC has increased traffic on the site. The research team are now processing data from the website to use in their research. Roberts recently used data provided by citizen transcribers in a seminar in the MPC’s Inequality & Methods workshop.

Public engagement and educational outreach are critical to the project’s goal of completing the database by the end of the World War I centenary in November 2018. Inwood, Oxley and Roberts have been developing lesson plans for high school and college students to use the website for social studies, history and demography research, and are happy to share material with MPC friends around the world.