Don’t be a square—rectangularize!
What do we mean by “rectangularize”?
You may not be familiar with the term “rectangularizing.” Some data providers call rectangularized data a “flat file.” A flat or rectangularized file is created by attaching information from a higher-level record (most commonly household-level measures, such as geography or number of household members) onto all associated lower-level records (e.g., information about persons nested within those households). The result is a set of lower-level records, each carrying the information from its higher-level record. Rectangularized data is not new to IPUMS projects; many of our datasets provide information about both individuals and their households, and the IPUMS extract system rectangularizes, or flattens, the data by default. This saves users the effort of sorting data by unique household identifiers and attaching values for household-level variables onto all individuals within the household.
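For readers who think in code, here is a minimal sketch of what rectangularizing does, written in plain Python. The records, the variable names (`hh_id`, `region`, `hh_size`, `age`) and the `rectangularize` helper are made up for illustration; they are not actual IPUMS variables or part of the extract system.

```python
# Hypothetical household-level records, keyed by a household identifier.
households = {
    1: {"region": "Midwest", "hh_size": 2},
    2: {"region": "South", "hh_size": 1},
}

# Hypothetical person-level records; each one points to its household.
persons = [
    {"hh_id": 1, "person_id": 1, "age": 34},
    {"hh_id": 1, "person_id": 2, "age": 36},
    {"hh_id": 2, "person_id": 1, "age": 71},
]


def rectangularize(higher, lower, key):
    """Copy each higher-level record's fields onto its lower-level records."""
    return [{**higher[rec[key]], **rec} for rec in lower]


# One row per person, with the household's values repeated on each row.
flat = rectangularize(households, persons, "hh_id")
for row in flat:
    print(row)
```

Both people in household 1 end up with the same `region` and `hh_size` values, which is exactly the repetition a flat file contains.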
Okay, I get it, but why are you doing a blog post about rectangularizing down?
We have expanded the rectangularize down functionality of the IPUMS extract system! Some IPUMS datasets include additional record types that are nested within person records. IPUMS Time Use nests activities reported as part of the time diary within persons, and MEPS data nests round-level observations collected at different points in time within persons. IPUMS now supports rectangularizing down to these additional record types, that is, attaching higher-level data (e.g., household- or person-level variables) to lower-level record types (e.g., round records in MEPS or activity records in Time Use).
Let’s conceptualize why you might rectangularize…
The Medical Expenditure Panel Survey Household Component (MEPS) interviews individuals (nested within households) five times over a two-year period; each interview is called a round. IPUMS MEPS currently offers variables at two levels: those that contain information about people, and those that contain information collected in specific rounds. Person-level variables are available under the “Annual” drop-down menu. The annual, person-level MEPS data include a few different types of variables:
- time-invariant characteristics (e.g., nativity, race, sex),
- annualized measures (e.g., total expenditures in the year, self-administered questionnaire variables that are only asked about once per year), and
- variables that report the value for time-variant characteristics on the last day of the year (e.g., age, marital status).
Variables classified as round-level data in IPUMS MEPS are those variables asked about in some or all round interviews. These can include time-variant demographic and socioeconomic measures (age, marital status, employment status), as well as some health data (condition diagnoses).
Figure 1. Sample of relationship between hierarchical and rectangularized data in IPUMS MEPS
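The relationship between the two layouts can also be sketched in a few lines of Python. This sketch assumes a hierarchical file in which each person record is followed by that person's round records; the record-type codes (`"P"`, `"R"`), the variable names and the values are all hypothetical, not real MEPS data or IPUMS code.

```python
# Hypothetical hierarchical file: each person ("P") record is followed by
# that person's round ("R") records.
hierarchical = [
    {"rectype": "P", "person_id": 101, "sex": "F", "total_exp": 1200},
    {"rectype": "R", "person_id": 101, "round": 1, "marital_status": "single"},
    {"rectype": "R", "person_id": 101, "round": 2, "marital_status": "married"},
    {"rectype": "P", "person_id": 102, "sex": "M", "total_exp": 300},
    {"rectype": "R", "person_id": 102, "round": 1, "marital_status": "married"},
]


def rectangularize_down(records):
    """Return one row per round record, carrying the person-level fields."""
    rows = []
    current_person = None
    for rec in records:
        fields = {k: v for k, v in rec.items() if k != "rectype"}
        if rec["rectype"] == "P":
            # Remember the most recent person record.
            current_person = fields
        elif rec["rectype"] == "R":
            if current_person is None:
                raise ValueError("round record appears before any person record")
            # Attach the person-level fields to this round record.
            rows.append({**current_person, **fields})
    return rows


round_rows = rectangularize_down(hierarchical)
for row in round_rows:
    print(row)
```

In the output, the person-level values (`sex`, `total_exp`) repeat on every round row for a person, while the round-level value (`marital_status`) can change from round to round.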
Keep your eye out in the near future for the opportunity to attach person-level characteristics from IPUMS NHIS to injury records, too!
IPUMS TIME USE
All three IPUMS Time Use data collections (MTUS, AHTUS and ATUS) provide time diary data; they report a person’s activities over a 24-hour period. The data are structured to include information about a person (on the person record), with that person’s activities for the 24-hour reference period nested under the person record (on the activity records). The rectangularize down functionality covers three record types in ATUS (household, person and activity) and two record types in MTUS and AHTUS (person and activity).
Most researchers are interested in analyses that focus on individuals; for example, the research question “Are there leisure time differences between mothers and fathers?” is about making comparisons between mothers and fathers. However, one may also be interested in analyses that focus on activity sequencing; the research question “Is mothers’ leisure more likely to be interrupted than fathers’ leisure?” requires determining, for example, whether a leisure activity is followed by housework and then leisure. Users interested in person-level analyses that summarize activity-level data about time use (e.g., summing up time spent in a particular activity for each person) can do so using the Create Your Own Time Use Variable tool. The rectangularize down functionality allows users interested in activity-level analyses to easily include household- and person-level measures on the activity records (e.g., control for sex and parental status when examining sequencing of particular activities).
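To make the sequencing example concrete, here is a rough sketch of how rectangularized activity records might be used to flag leisure that is interrupted by housework. The variable names (`person_id`, `sex`, `act_order`, `activity`), the activity labels and the data are hypothetical, and the three-activity pattern is only one possible definition of "interrupted."

```python
from itertools import groupby

# Hypothetical rectangularized activity records: the person-level variable
# "sex" is already attached to every activity row.
activities = [
    {"person_id": 1, "sex": "F", "act_order": 1, "activity": "leisure"},
    {"person_id": 1, "sex": "F", "act_order": 2, "activity": "housework"},
    {"person_id": 1, "sex": "F", "act_order": 3, "activity": "leisure"},
    {"person_id": 2, "sex": "M", "act_order": 1, "activity": "leisure"},
    {"person_id": 2, "sex": "M", "act_order": 2, "activity": "sleep"},
]


def has_interrupted_leisure(seq):
    """True if leisure is followed by housework and then leisure again."""
    pattern = ["leisure", "housework", "leisure"]
    return any(seq[i:i + 3] == pattern for i in range(len(seq) - 2))


# Group activity rows by person, keeping each diary in order.
ordered = sorted(activities, key=lambda r: (r["person_id"], r["act_order"]))
interrupted_by_sex = {}
for person_id, group in groupby(ordered, key=lambda r: r["person_id"]):
    rows = list(group)
    seq = [r["activity"] for r in rows]
    sex = rows[0]["sex"]  # same value on every row for this person
    interrupted_by_sex.setdefault(sex, []).append(has_interrupted_leisure(seq))

print(interrupted_by_sex)
```

Because `sex` is already on each activity row, no separate merge with a person file is needed before grouping the results.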
Figure 2. Sample relationship between hierarchical and rectangularized data in IPUMS MTUS
Enough blabbing. How do you make this happen?
Step 1: Click on the “CHANGE DATA STRUCTURE” button from the data browsing area of your IPUMS dataset of choice.
Step 2: Choose the data structure for your extract. Rectangular extracts at the person level are the default. Change this to a rectangular extract at the activity level (Time Use) or the round level (MEPS).
Step 3: Go about selecting your variables like you would with any extract, remembering that your output will be activities (Time Use) or rounds (MEPS), with household- or person-level characteristics attached to them. These higher-level variables will have the same name and report the same data, but will be included on the activity- or round-level record instead.
Step 4: Submit your extract! As always, you can choose to receive an unformatted .dat file with a syntax file to read in the data (the fastest option), or choose to receive a data file that is ready to be analyzed in your preferred statistical software package instead.
Step 5: Do a celebratory dance while you wait for your extract, or maybe make a list of all the amazing things you will do with the time you are about to save by having IPUMS rectangularize your data for you!
Please note that the ATUS “who” and “eldercare” record types still require a hierarchical data structure.
Story by Daniel Backman and Kari Williams
IPUMS Senior Data Analysts