By Kari Williams
As part of the IPUMS mission to democratize data, our user support team strives to answer your questions about the data. Over time, some questions are repeated. This blog post is an extension of an earlier series addressing frequently asked questions. Maybe you’ll learn something. Perhaps you’ll just find the information interesting. Regardless, we hope you enjoy it!
Here’s one of those questions:
How do I open IPUMS microdata files in my stats package?
You have honed your research question and analytical approach, identified an IPUMS data collection that suits your needs, learned to navigate the IPUMS interface to create a custom data extract, and just received an email notification that your data file is ready to download. You put your favorite song on the stereo and open your data file in Stata (or whatever statistical software package makes your data analysis dreams come true), and…
record scratch! You see a “file not Stata format” error.
Take a deep breath. This blog post is going to help you restart your data analysis party.
The key info is that IPUMS microdata files, by default, are delivered as fixed-width, compressed files (e.g., ipumsdata_00001.dat.gz).
Okay, great. What does that mean for getting the file to open in your stats package though?
Fixed-width ASCII files
IPUMS data are delivered as fixed-width ASCII files. This is denoted by the “.dat” extension in your data download. Each column, or set of columns, corresponds to a specific variable. Before you can use this file, you need to tell your statistical analysis software package which columns correspond to which variables and provide it with meaningful value labels.
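To make the idea of "columns correspond to variables" concrete, here is a minimal Python sketch of how a fixed-width record gets split apart. The column positions, variable names, and value labels below are hypothetical and chosen only for illustration; the real layout for your extract comes from the syntax file or DDI codebook delivered with it.

```python
# Hypothetical layout: (variable name, start column, end column).
# Positions are 0-based and the end column is exclusive. These are NOT
# real IPUMS column positions; your syntax file or codebook defines those.
LAYOUT = [
    ("YEAR", 0, 4),
    ("SERIAL", 4, 12),
    ("AGE", 12, 15),
    ("SEX", 15, 16),
]

# Hypothetical value labels for one variable.
SEX_LABELS = {1: "Male", 2: "Female"}


def parse_record(line, layout=LAYOUT):
    """Slice one fixed-width line into a dict of integer-coded variables."""
    return {name: int(line[start:end]) for name, start, end in layout}


# Two made-up records, each 16 characters wide with no delimiters.
sample_lines = [
    "2019000000010341",
    "2019000000020272",
]

records = [parse_record(line) for line in sample_lines]
for record in records:
    print(record, SEX_LABELS[record["SEX"]])
```

Without the layout, each line is just an undifferentiated string of digits, which is why your stats package cannot open the raw .dat file on its own.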
For Stata, SAS, and SPSS users, the IPUMS extract system creates a statistical package syntax file to accompany each data file; this is designed to read the ASCII data into your statistical package while applying the appropriate variable and value labels. You must use the syntax file with the extract to read the data. The syntax file may require minor editing to update the filepath for where you store the data on your local computer; this step is not required if you store the data and syntax file in the same directory.
For R and Python users, the ipumsr and ipumspy packages leverage the information from the DDI codebook included with your extract to apply the appropriate variable and value labels.
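For Python users, reading an extract with ipumspy might look like the sketch below. It assumes ipumspy is installed and that you downloaded the DDI codebook (an .xml file) alongside your data file; the file names in the example call are placeholders, and the load_extract helper is a name made up for this post.

```python
def load_extract(ddi_path, data_path):
    """Read an IPUMS extract into a pandas DataFrame using its DDI codebook."""
    # Imported inside the function so this sketch can be defined even on a
    # machine where ipumspy is not installed yet.
    from ipumspy import readers

    # The DDI codebook describes column positions and labels for the extract.
    ddi = readers.read_ipums_ddi(ddi_path)
    # The reader uses that description to parse the fixed-width data file.
    return readers.read_microdata(ddi, data_path)


# Example call (replace the placeholder file names with your own extract):
# df = load_extract("ipumsdata_00001.xml", "ipumsdata_00001.dat.gz")
```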
Decompressing data files
All IPUMS data files are delivered in a gzip compressed format; this saves space on our servers and speeds up your download of the data file. This is denoted by the “.gz” extension in your file. Most statistical software packages require that you decompress or unzip the file before using it (the ipumsr and ipumspy packages allow you to read files without decompressing them first). I recommend 7-Zip as a free decompression tool for Windows and The Unarchiver for Macs (Archive Utility works well for Macs too, but occasionally gets tripped up on gzip files).
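If you would rather not install a separate tool, Python's standard library can also decompress a gzip file. This is a minimal sketch rather than an official IPUMS utility; the decompress_gz helper is a name made up for this post, and the file name in the example call is a placeholder.

```python
import gzip
import shutil
from pathlib import Path


def decompress_gz(gz_path):
    """Write a decompressed copy of a .gz file next to it and return its path."""
    gz_path = Path(gz_path)
    # Dropping the final suffix turns "ipumsdata_00001.dat.gz"
    # into "ipumsdata_00001.dat".
    out_path = gz_path.with_suffix("")
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        # Copy in chunks so large extracts are not loaded into memory at once.
        shutil.copyfileobj(src, dst)
    return out_path


# Example call (replace the placeholder with your own file name):
# decompress_gz("ipumsdata_00001.dat.gz")
```

The original .gz file is left in place, so you can delete it yourself once you have confirmed the decompressed file opens correctly.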
Can’t you format the file for me?!?!
It’s worth noting that fixed-width data files are the easiest for IPUMS servers to process (i.e., you get your data faster this way). However, you can modify your extract request to produce a data file that has been formatted to open directly in Stata, SAS, or SPSS, or a .csv file that can be opened in Excel.
To request a formatted file, click on the “change” option in the Data Format row on your extract summary page.
Image 1: Change data format from extract summary page
On the next page, select your preferred format.
Image 2: Selecting an updated data format.
Note that you will still need to decompress your formatted data file before opening it in the statistical software package of your choice.
You now have all the tools you need to decompress your IPUMS data file and either read it into your stats package via a syntax file or request a formatted dataset. Woo hoo! If you are still running into issues, drop a line to firstname.lastname@example.org for targeted troubleshooting.