A History of Data: Information Technology and the MPC

Bethel University Professor of History Diana L. Magnuson is documenting the growth of the Minnesota Population Center. Believing that preserving institutional memory is vital, the Center is supporting Magnuson’s work to capture oral histories of past and present MPC faculty and staff.

This is the second in a three-part series, with oral histories from the information technology (IT) side of the MPC. For over 16 years, the IT staff has collaborated with the MPC research staff to recode and disseminate data, develop specialized software, and make research more efficient. The “secret sauce of the MPC” is the longstanding synergistic collaboration between IT and research staff.

BILL BLOCK was the first person to direct the IT core after the organization of the MPC in 2000. Block now serves as director of the Cornell Institute for Social and Economic Research. Block says that from the beginning, the MPC established a work culture that attracted a distinct group of people.

People that like what they are doing and coalesce around a common purpose … It was a very interesting mix of easy comfortable environment but seriousness of purpose underneath. That what we are doing is real, has deadlines, value, and people will use it, and research will be built on it. But it’s amidst a very collegial group…it’s a great place to work, collegial, people like it, they are all data geeks.

COLIN DAVIS is a senior software developer and has been at the MPC since 2001. MARCUS PETERSON is an IT supervisor at the MPC and has been part of IT since 2002. Davis and Peterson were pioneers in the early days of the MPC.

Peterson:
When I first came here, it was the Wild West. We didn’t know what we do now, and we made things work. That was our goal. At that time, that was a perfectly effective strategy. That works fine when you’re a small organization and you don’t have many projects, but as you grow, you need to introduce efficiencies and solutions that scale to bigger teams.

Colin Davis had a long to-do list when he began: 

The first thing I ever did…was write a program to deploy the web application that does the data extraction, and copy the data files over and get everything set up … The second thing I did was fix a ton of bugs on the website with Patt Kelly. So she kept sending me little e-mails, notes about this and that…it was like 150-some things, which was crazy. … [Also] the extract system, the back end part of it, wasn’t good when I got there, so I rewrote the whole thing… I can’t believe I did all that stuff. None of it was great, because all of that was happening at the same time, but it all worked to the point where the data was good enough.

Peterson described the genesis of developing “scalable” software that could also be used across projects:

There was this commonality across each project, and we were thinking, “We could develop one codebase that would support all of these different projects.” At that time [initially], each project was a one-off web application … IPUMS-USA in particular was done by historians, and for not knowing anything about programming, they did a great job, but to have that site be scalable and have that codebase be applicable to other projects, we needed to sort of start from scratch. I remember that was this huge revelation that we had. We could develop just one kind of codebase and immediately we could just keep adding new projects. That’s still how we do things to this day.

Peterson explained that the process of standardization and automation began with the IPUMS-International project: 

They [the research staff] had these translation tables, which would describe how to map each variable and whatever its column location was to its IPUMS format…This information was copied and hard coded in the existing web applications. It was just like someone copied all of this information out of the translation tables and put it into the code. I had this thought, “Well, what if our code was more dynamic than that, since these translation table files are getting generated anyway? What if our programs read those in and dynamically displayed whatever was in those rather than having someone copy those into other files every time?” So we could automate all of this stuff. 
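
To make the idea concrete, here is a minimal sketch of the kind of translation-table-driven recoding Peterson describes: a generic program reads the table and applies it, rather than having the mappings copied into the code. The file format, column names, and function names below are hypothetical illustrations, not the MPC’s actual tooling.

```python
# Hypothetical sketch: read a translation table and apply it dynamically,
# instead of hard-coding each variable's column location and recodes.
import csv

def load_translation_table(path):
    """Read rows like: variable, start_col, width, source_code, ipums_code."""
    table = {}
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            spec = table.setdefault(row["variable"], {
                "start": int(row["start_col"]),
                "width": int(row["width"]),
                "recodes": {},
            })
            spec["recodes"][row["source_code"]] = row["ipums_code"]
    return table

def recode_record(line, table):
    """Apply every variable's column location and recode to one fixed-width record."""
    out = {}
    for var, spec in table.items():
        raw = line[spec["start"]:spec["start"] + spec["width"]].strip()
        out[var] = spec["recodes"].get(raw, raw)  # pass unknown codes through
    return out

# Usage (illustrative): the same generic code can serve any project whose
# translation tables follow the format, so nothing is copied into the program.
# table = load_translation_table("translation_tables/sex.csv")
# records = [recode_record(line, table) for line in open("sample.dat")]
```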

Peterson and Davis applied their new automated approach to the data documentation as well: 

The researchers had detailed variable descriptions of how you could use a certain variable and how it was comparable across data samples. A lot of this information ended up being copied from document to document. I was thinking, “Well, why don’t we read in these documents they’re generating and display that information on the web directly, rather than having all of this copying and hard coding?” We could automate everything instead of having all of this human intervention — people copying and doing repetitive work. Why don’t we have the web applications read in all of the metadata? Colin and I came up with the idea of reading in the metadata and having that drive the content for the web applications.
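
The same pattern applies to the documentation side Peterson and Davis describe: the web application reads the researchers’ metadata and renders it directly, so nothing is copied by hand. The sketch below is again a rough illustration with an invented metadata format, not the MPC’s real system.

```python
# Hypothetical sketch: variable documentation is read from a metadata file
# and rendered directly, so pages always reflect the researchers' source
# documents rather than a hand-copied version.
import json

def load_variable_metadata(path):
    """Read a metadata file with one entry per variable: label, description,
    and per-sample availability (this format is invented for illustration)."""
    with open(path) as f:
        return json.load(f)

def render_variable_page(meta):
    """Build an HTML fragment for one variable directly from its metadata."""
    rows = "".join(
        f"<tr><td>{sample}</td><td>{'yes' if available else 'no'}</td></tr>"
        for sample, available in sorted(meta["availability"].items())
    )
    return (
        f"<h1>{meta['label']}</h1>"
        f"<p>{meta['description']}</p>"
        f"<table><tr><th>Sample</th><th>Available</th></tr>{rows}</table>"
    )

# Usage (illustrative): regenerate pages whenever the metadata changes;
# no HTML is hand-edited when researchers update a variable description.
# for var in load_variable_metadata("metadata/variables.json"):
#     html = render_variable_page(var)
```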

FRAN FABRIZIO came to the MPC in 2011 and is the current IT Director. One of the things that drew Fabrizio to his current position was that it was “nothing” like the previous IT Director position he held.

I really enjoyed that in this role, everybody in the organization was focused on this same mission… it was really strong, and the sense of collaboration across the whole organization was fantastic.

From the IT perspective, you will never have a customer at any job, whether it’s private sector, public sector, wherever — that 1) you will work more closely with, or 2) that just intuitively understands the value of what IT does …

So, you can get to work and start talking about the important things, and people appreciate that in the IT core. The fact that you can walk ten feet and talk to your customer five times a day if you have to, that’s really unique as well. Most IT jobs are writing a piece of software for a client or for a website that is out in the general public. You don’t know exactly who is using it and how it’s being used. Here, we’re all users of the products we are building, and so you can immediately get feedback…that’s the secret sauce of the MPC, is that we’re in the same room; we’re in the same mixing pot…But, I think a big part of it is the co-location and the intrinsic understanding that everybody has accelerated the timeline immensely. That’s what’s allowed us to do as much as we’ve done.

Story and interviews by Diana Magnuson.