Making IPUMS extracts from Stata

By Renae Rodgers

IPUMS has released a beta version of the Extract API that supports the IPUMS USA and IPUMS CPS microdata collections! Check out An Introduction to the IPUMS Extract API for Microdata for a brief introduction to the IPUMS Extract API for microdata. This blog post will demonstrate how to leverage the IPUMS Extract API and the ipumspy Python library to make IPUMS extracts directly from your Stata .do files! No prior knowledge of or interest in learning Python is required, but you will need Stata 16 or higher, an IPUMS user account, and an API key. All the code in the examples below is available in a template .do file if you would like to follow along.

Setting up Python for use with Stata

While you don’t need to be a Python user to make an IPUMS extract via Stata, there is a little bit of Python setup required. Chuck Huber at Stata has put together a great series of blog posts about how to use Python in Stata, and this section is heavily inspired by the first post in that series.

Step 1: Download and install Miniconda

Even if you already have Python installed on your computer, I highly recommend setting up a separate Python installation in a conda environment for your ipumspy-in-Stata work. Miniconda is a lightweight version of the package manager Anaconda that will allow you to install Python, ipumspy, and all of the necessary dependencies in a separate environment that you can access from Stata without disturbing anything else on your system that might be using Python. Download Miniconda for your operating system and install it.

After Miniconda has been successfully installed, add the relevant Miniconda directories to PATH. Enter the following command in the Windows Command Prompt, modifying the file paths to reflect the correct Miniconda file paths on your computer.

    set PATH=%PATH%;C:\Users\renae\Miniconda3;C:\Users\renae\Miniconda3\Scripts;C:\Users\renae\Miniconda3\Library\bin

Now we are ready to create an environment that includes the ipumspy package!

Step 2: Create a conda environment and install ipumspy

Now that Miniconda is installed, we will create a conda environment that includes Python, ipumspy, and all of its dependencies. Note that version 0.2.1 or higher is required to use ipumspy in conjunction with Stata. Open a terminal if you’re using Mac or Linux or Command Prompt if you’re using Windows, and enter the following command to create a conda environment named “ipums” that has Python version 3.8.12 and ipumspy version 0.2.1 installed. This may take a few moments to finish.

conda create -n ipums python=3.8.12 ipumspy=0.2.1 -c conda-forge

Now that we’ve got our environment set up and the requisite libraries installed, it’s time to go to Stata!

Step 3: Setting up Stata for working with Python

In order to use your new Python installation from within Stata, you need to point Stata to the correct Python executable. To view available Python installations, run python search in the Stata command window.

python search
----------------------------------------------------------------------------
Python environments found:
C:\Users\renae\AppData\Local\Programs\Python\Python38\python.exe
C:\Users\renae\Miniconda3\python.exe
----------------------------------------------------------------------------

If you have multiple available Python environments on your computer, as shown above, be sure to point to the Python executable in the environment you just created using Miniconda.

. set python_exec C:\Users\renae\Miniconda3\envs\ipums\python.exe

To make sure that step was successful, run python query. The output should look something like this.

. python query
---------------------------------------------------------------------------
    Python Settings
      set python_exec      C:\Users\renae\Miniconda3\envs\ipums\python.exe
      set python_userpath  

    Python system information
      initialized          no
      version              3.8.12
      architecture         64-bit
      library path         C:\Users\renae\Miniconda3\envs\ipums\python38.dll

Note that python_exec is set to the Python executable in my new ipums conda environment.
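
With Stata pointed at the new environment, you can also confirm from inside a python block that the installed ipumspy meets the 0.2.1 requirement mentioned in Step 2. The snippet below is a minimal sketch that uses only the Python standard library. The helper name meets_minimum is my own invention (it is not part of ipumspy or Stata), and it assumes plain numeric version strings such as "0.2.1".

```python
from importlib.metadata import PackageNotFoundError, version


def meets_minimum(installed: str, minimum: str = "0.2.1") -> bool:
    """Compare dotted release numbers such as "0.2.1" numerically.

    Hypothetical helper: handles plain numeric versions only,
    not pre-release tags such as "0.3.0rc1".
    """
    def as_tuple(text: str) -> tuple:
        return tuple(int(piece) for piece in text.split("."))

    # Tuples compare piece by piece, so (0, 10, 0) correctly beats (0, 2, 1)
    return as_tuple(installed) >= as_tuple(minimum)


try:
    installed = version("ipumspy")
    verdict = "OK" if meets_minimum(installed) else "too old for use with Stata"
    print(installed, verdict)
except PackageNotFoundError:
    print("ipumspy is not installed in this Python environment")
except ValueError:
    print("ipumspy has a version string this simple check cannot parse")
```

Comparing tuples of integers rather than raw strings matters here: as text, "0.10.0" sorts before "0.2.1", which would wrongly flag a newer release as too old.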

Next, save your API key as a global macro. We will access the value of this macro later to make an extract via the IPUMS Extract API.

global MY_API_KEY "YOUR API KEY HERE"

And now you’re all set up!

Making an IPUMS Extract from Stata

I promised at the beginning of this post that I wasn’t going to make you learn Python – and I am not – but I am going to walk through the mechanics of making an IPUMS USA extract using Python… in Stata. The first step is to drop into Python in Stata by entering python in the command window or adding python to a line in your .do file.

Once in Python, import the necessary modules from the ipumspy and Stata Function Interface (sfi) libraries, initialize the IpumsApiClient, and pass it your API key.

    import gzip
    import shutil
    
    from ipumspy import IpumsApiClient, UsaExtract
    from sfi import Macro

    my_api_key = Macro.getGlobal("MY_API_KEY")

    ipums = IpumsApiClient(my_api_key)

(A brief note about API keys – an API key is like a password and you should treat it as such. Don’t share it with co-workers, put it in emails, or include it in code that you publish on GitHub. Ok, back to the code.)
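
If you want a guard against accidentally running with the placeholder text from the global macro example, something like the sketch below could sit between reading the macro and creating the client. Both helpers (check_api_key and mask) are hypothetical names of my own, not part of ipumspy or sfi, and the check only catches an empty or placeholder value; it cannot tell whether a key is actually valid.

```python
PLACEHOLDER = "YOUR API KEY HERE"  # the dummy text used in the global macro example


def check_api_key(key: str) -> str:
    """Return the key without surrounding whitespace, or raise if it is unusable."""
    cleaned = (key or "").strip()
    if not cleaned or cleaned == PLACEHOLDER:
        raise ValueError("Set the MY_API_KEY global macro to your real IPUMS API key first.")
    return cleaned


def mask(key: str, visible: int = 4) -> str:
    """Hide all but the last few characters so a key can be shown in a log."""
    shown = key[-visible:] if len(key) > visible else ""
    return "*" * (len(key) - len(shown)) + shown


# A made-up key, used here only to show the masking
print(mask(check_api_key("  abcdef1234  ")))
```

In the walkthrough above, that would mean passing check_api_key(my_api_key) to IpumsApiClient instead of my_api_key, and printing mask(my_api_key) rather than the key itself if you ever need to confirm which key is in use.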

An IPUMS extract is defined by:

  1. An IPUMS data collection
  2. A list of sample IDs
  3. A list of variable names
  4. An extract description

IPUMS does not currently have a publicly available metadata API (though we hope to offer one in the future). If you’re the type of IPUMS user that has memorized your favorite variable names and sample IDs, you can just drop them into the code below. For us mere mortals, sample IDs and variable names can be found on the IPUMS data collections websites. For this example, check out the IPUMS USA sample ids list and use the IPUMS USA variable selection interface to find your variable names. As more IPUMS data collections are supported by the IPUMS Extract API, the ipumspy documentation will be updated with links to each of these pages for each available data collection.

The code below defines a small extract from the 2012 ACS.

    ipums_collection = "usa"
    samples = ["us2012a"]
    variables = ["AGE", "SEX", "RELATE"]
    extract_description = "My first API extract!"

This information is then passed into the UsaExtract object to build your extract.

    extract = UsaExtract(samples,
                         variables,
                         description=extract_description)
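
Because the sample IDs and variable names are typed by hand, it can be worth tidying the lists before handing them to UsaExtract. The sketch below is one way to do that in plain Python. The helper tidy_names is hypothetical (it is not an ipumspy function), and it only strips stray whitespace, drops repeats, and refuses an empty list; it does not check that a name actually exists in IPUMS USA.

```python
from typing import Iterable, List


def tidy_names(names: Iterable[str], kind: str) -> List[str]:
    """Strip whitespace and drop repeats (keeping first-seen order); reject empty lists."""
    cleaned: List[str] = []
    for name in names:
        name = name.strip()
        if name and name not in cleaned:
            cleaned.append(name)
    if not cleaned:
        raise ValueError(f"An extract needs at least one {kind}.")
    return cleaned


samples = tidy_names(["us2012a"], "sample ID")
variables = tidy_names(["AGE", "SEX ", "RELATE", "AGE"], "variable name")
print(samples, variables)  # ['us2012a'] ['AGE', 'SEX', 'RELATE']
```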

The next steps are to submit your extract definition to the IPUMS extract system, wait for your extract to complete, and download the data, DDI, and Stata syntax files into your current working directory!

    # submit your extract to the IPUMS extract system
    ipums.submit_extract(extract)

    # wait for the extract to finish
    ipums.wait_for_extract(extract, collection=ipums_collection)

    # download it to your current working directory
    ipums.download_extract(extract, stata_command_file=True)

This step will take a few moments. Stata may say that it is not responding, but a little patience will see your extract safely downloaded!
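
If you are curious why Stata appears frozen during this step: waiting for an extract boils down to checking its status repeatedly until it is done, and nothing else runs in the meantime. The sketch below illustrates that general pattern in plain Python. It is not ipumspy's actual implementation; poll_until_done, the status strings, and the fake status source are all stand-ins made up for illustration.

```python
import time
from typing import Callable


def poll_until_done(get_status: Callable[[], str],
                    interval: float = 5.0,
                    timeout: float = 600.0,
                    sleep: Callable[[float], None] = time.sleep) -> int:
    """Call get_status() until it returns "completed"; return the number of checks made.

    Raises TimeoutError once the total time slept reaches `timeout` seconds.
    """
    waited = 0.0
    checks = 0
    while True:
        checks += 1
        if get_status() == "completed":
            return checks
        if waited >= timeout:
            raise TimeoutError(f"Still not finished after {waited:.0f} seconds")
        sleep(interval)  # the program is blocked here between checks
        waited += interval


# Stand-in for a real status request: reports "completed" on the third check
fake_statuses = iter(["queued", "started", "completed"])
checks_needed = poll_until_done(lambda: next(fake_statuses), interval=0.01)
print(checks_needed)  # 3
```

The timeout is the part worth noticing: a loop like this should give up eventually rather than leave your Stata session hanging forever.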

Still in Python, use the sfi Macro class to set local Stata macros containing the name of the IPUMS data collection and the ID number of your extract. You’ll eventually use these to run your extract .do file.

    Macro.setLocal("id", str(extract.extract_id).zfill(5))
    Macro.setLocal("collection", ipums_collection)
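
The zfill(5) call is there because the file names used in this post contain the extract number padded with leading zeros to five digits. A tiny illustration, using a made-up extract number of 42 and a hypothetical helper named extract_file_stem:

```python
def extract_file_stem(collection: str, extract_id: int) -> str:
    """Build the shared file-name stem, e.g. "usa" and 42 -> "usa_00042"."""
    return f"{collection}_{str(extract_id).zfill(5)}"


stem = extract_file_stem("usa", 42)
print(stem)               # usa_00042
print(f"{stem}.dat.gz")   # usa_00042.dat.gz (compressed data file)
print(f"{stem}.do")       # usa_00042.do (Stata syntax file)
```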

The final step before exiting Python and getting back to Stata is to unzip your extract. The extract data file comes in a compressed format and needs to be decompressed before it can be read into Stata.

    extract_name = f"{ipums_collection}_{str(extract.extract_id).zfill(5)}"
    # unzip the extract data file
    with gzip.open(f"{extract_name}.dat.gz", 'rb') as f_in:
        with open(f"{extract_name}.dat", 'wb') as f_out:
            shutil.copyfileobj(f_in, f_out)

    # exit python
    end
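
If you expect to unzip extracts often, the decompression step can be wrapped in a small reusable function that would live inside the python block, before end. This is a sketch using only the standard library; the name gunzip is my own, and the demonstration at the bottom runs on a throwaway file in a temporary folder rather than on a real extract.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def gunzip(gz_path) -> Path:
    """Decompress `name.gz` next to itself as `name` and return the new path."""
    gz_path = Path(gz_path)
    if gz_path.suffix != ".gz":
        raise ValueError(f"{gz_path.name} does not end in .gz")
    if not gz_path.is_file():
        raise FileNotFoundError(f"{gz_path} was not found; did the download finish?")
    out_path = gz_path.with_suffix("")  # usa_00042.dat.gz -> usa_00042.dat
    with gzip.open(gz_path, "rb") as f_in, open(out_path, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return out_path


# Try it on a throwaway file so nothing real is touched
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "usa_00042.dat.gz"
    with gzip.open(demo, "wb") as f:
        f.write(b"fixed-width data would go here\n")
    unzipped = gunzip(demo)
    unzipped_name = unzipped.name
    unzipped_bytes = unzipped.read_bytes()

print(unzipped_name)  # usa_00042.dat
```

Checking that the file exists first gives a clearer message than the default error if the download has not finished or the working directory is not what you expected.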

Now back in Stata, use the local macros defined using sfi above to read in your data using the Stata syntax file that accompanies the extract.

qui do `collection'_`id'.do

And we’ve got data!

list age sex relate in 1/10

        +---------------------------------+
        | age      sex             relate |
        |---------------------------------|
     1. |  36   Female   Head/Householder |
     2. |  38     Male             Spouse |
     3. |  16     Male              Child |
     4. |  13     Male              Child |
     5. |  59   Female   Head/Householder |
        |---------------------------------|
     6. |  60     Male             Spouse |
     7. |  74     Male   Head/Householder |
     8. |  33     Male              Child |
     9. |  41     Male   Head/Householder |
    10. |  42   Female             Spouse |
        +---------------------------------+

TaDa! And there you have it! An IPUMS Extract created, submitted, downloaded, and read into Stata without leaving your .do file!

Some Final Tips

The profile.do file can be used to run commands every time you start Stata. You may find it useful to set the Python executable (as shown in Step 3 of the Setting up Python for use with Stata section above) and set your IPUMS API key as a global macro in this file. This extra bit of setup means you don’t need to set the Python executable every time you want to make an IPUMS extract. Storing your API key this way is also more efficient and slightly more secure than copying and pasting it into every .do file from which you want to make an extract. Setting the API key as a global macro in your Stata session guards against accidentally exposing your credential in .do files that get shared with collaborators or students or are stored on GitHub.

Use it for Good!