By Rodrigo Lovaton Davila
IPUMS International recently added twenty-one new harmonized variables that expand the thematic coverage of the data collection and enable new possibilities for research. Most notably, the data release introduces harmonized variables representing sample-level information, including selected characteristics of the statistical operation and the sampling design (accessible in technical household). This information was previously available in the sample descriptions section, but is now also accessible through variables that can be included in data extracts. Read on for more details on these new sample-level variables and a few new work and household amenity variables!
New variables about the statistical operation describe whether the data correspond to a census or a survey; whether enumeration was de jure or de facto; the type of form received by respondents in the sample; and the month of data collection. The IPUMS International data collection currently includes 395 census samples, 233 labor force surveys, and 27 population surveys.
FORMTYPE allows users to identify whether the data for each sample consist of responses to a single, standard questionnaire applied to the entire population; responses to a short or long form, in a census that gathered more information from a sample of the population; or records derived from administrative registers (with no questionnaire used in data collection). Most datasets in the collection correspond to one standard questionnaire (79% of 395 census samples). For censuses where both a short and a long form were applied, the samples in IPUMS typically correspond to the long questionnaire (78% of 78 samples), which includes additional questions and is richer for research purposes.
ENUMTYPE indicates whether the enumeration was de jure or de facto, an important distinction for understanding how the population was counted in the census operation. Some censuses combine de jure enumeration (usual residents) with de facto enumeration (those present on the census reference date, whether resident or visitor), and this new variable reflects that combination. Importantly, users can work with the existing variable RESIDENT to eliminate double-counting of persons who were enumerated both at their permanent residence and at the residence they were visiting on census night. ENUMMO complements the variable YEAR to provide a more accurate indicator of the timing of data collection.
The new sampling design variables provide key details about each sample, including the method used to draw the sample, the average sampling fraction, what units were selected (person, household, dwelling, or some geographic area), the identification numbers for the primary sampling units (PSU), whether the sample was drawn by the National Statistical Office or IPUMS, and whether the data are organized into households (or if the sample is only person-level data). The modal sample in the IPUMS International collection uses systematic sampling, selecting 1-in-10 households (and the corresponding person records) from geographically sorted data. SAMPMETH indicates that 87% of samples apply systematic selection, with roughly one quarter designed with stratification (26.5%) and a smaller proportion with clustering (9.6%). SAMPFRCT shows that a majority of samples have a density of 10% of the population or more: two thirds of 361 census samples, excluding the 34 historical full-count datasets. Most census samples have persons organized into households (93% of census samples according to ENUMHH). Users are encouraged to review the sampling design information, along with household and person weights, to produce estimates based on census or survey microdata in IPUMS International.
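To illustrate why weights matter when producing estimates, here is a minimal sketch in Python. It is not taken from IPUMS documentation: the records, the field names `perwt` and `employed`, and both helper functions are hypothetical and chosen only for illustration. It assumes each person record carries a weight equal to the number of people it represents, so a record from a 1-in-10 sample would have a weight of about 10.

```python
# Hypothetical person records; field names and values are illustrative only.
records = [
    {"employed": True,  "perwt": 10.0},
    {"employed": False, "perwt": 10.0},
    {"employed": True,  "perwt": 20.0},  # drawn from a sparser stratum
    {"employed": True,  "perwt": 10.0},
]


def estimated_population(rows, weight="perwt"):
    """Sum of weights: the number of people the sample represents."""
    return sum(row[weight] for row in rows)


def weighted_share(rows, flag, weight="perwt"):
    """Weighted proportion of records for which `flag` is true."""
    total = estimated_population(rows, weight)
    if total == 0:
        raise ValueError("weights sum to zero")
    return sum(row[weight] for row in rows if row[flag]) / total


# Unweighted share, shown only for comparison with the weighted one.
unweighted = sum(1 for row in records if row["employed"]) / len(records)

print(estimated_population(records))
print(weighted_share(records, "employed"))
print(unweighted)
```

In this made-up example the weighted and unweighted shares differ because one record represents twice as many people as the others, which is the kind of discrepancy that stratified or otherwise unequal-probability designs can introduce.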
Finally, additional newly released variables broaden the range of harmonized information available regarding work and household appliances or utilities. New work variables (accessible in work or occupation, industry) harmonize information mostly from labor force surveys, which have a larger representation in IPUMS International after recent data releases. The new work variables include hours of overtime work, ideal hours of work, the reason for absence from work, type of part-time work, and a detailed occupation variable following ISCO-2008 (for samples that use this classification). New variables on household appliances and utilities (accessible in appliances, mechanicals, and other amenities or dwelling characteristics) harmonize questions present in many census and survey samples in the data collection, such as the availability of a bicycle, a motorcycle, a stove, or a boat in the household, and whether the toilet is for the household's private use or shared with other households.
Many thanks to the National Statistical Office partners for their support and contributions to IPUMS. As you incorporate these new data into your work, be sure to let us know about your resulting research or policy findings. National Statistical Offices are eager to know how their countries’ data are being used. Upload a citation for your #poweredbyIPUMS work, or email us today at ipums@umn.edu.

