IPUMS International has brand-new low-level geographic variables and shapefiles

By Quinn Heimann

Map showing percentage of households with internet access in the 2014 Myanmar census by township
Map of Myanmar Internet Access

An ongoing goal and challenge for IPUMS-International (IPUMSI) is providing users with the most detailed geography possible. A particular obstacle is the set of confidentiality requirements agreed upon as a condition of distributing these census and survey samples. Nevertheless, IPUMSI has started releasing lower-level geographic variables for samples where the data are sufficient and confidentiality thresholds are still met. As of spring 2022, twenty samples have been released with third administrative level geographic data, covering ten countries across Africa and Asia. Accompanying shapefiles are also being distributed to supplement these variables. Used together with the more granular geographic variables, the shapefiles make it possible to map population trends in greater detail.
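As a rough illustration of how a third-level geographic variable and a shapefile fit together, the sketch below computes a weighted share per geographic unit and attaches it to boundary attribute records by a shared code. It is a minimal sketch under stated assumptions, not the IPUMSI workflow: the field names `GEO3`, `HAS_INTERNET`, and `WEIGHT`, the unit names, and all values are hypothetical, and plain Python dictionaries stand in for the extract and for the shapefile's attribute table.

```python
from collections import defaultdict


def weighted_share_by_unit(records, geo_key, flag_key, weight_key):
    """Return {geo_code: weighted share of records whose flag is true}."""
    totals = defaultdict(float)
    hits = defaultdict(float)
    for rec in records:
        code = rec[geo_key]
        weight = rec[weight_key]
        totals[code] += weight
        if rec[flag_key]:
            hits[code] += weight
    return {code: hits[code] / totals[code] for code in totals if totals[code] > 0}


def join_to_boundaries(shares, boundary_attrs, geo_key):
    """Attach a 'share' field to each boundary record (None if no data matched)."""
    return [dict(attrs, share=shares.get(attrs[geo_key])) for attrs in boundary_attrs]


# Hypothetical household records; field names and values are invented for illustration.
households = [
    {"GEO3": 101, "HAS_INTERNET": True, "WEIGHT": 10.0},
    {"GEO3": 101, "HAS_INTERNET": False, "WEIGHT": 30.0},
    {"GEO3": 102, "HAS_INTERNET": True, "WEIGHT": 20.0},
    {"GEO3": 102, "HAS_INTERNET": True, "WEIGHT": 20.0},
]

# Hypothetical attribute rows, standing in for a shapefile's attribute table.
boundaries = [
    {"GEO3": 101, "NAME": "Unit A"},
    {"GEO3": 102, "NAME": "Unit B"},
    {"GEO3": 103, "NAME": "Unit C"},
]

shares = weighted_share_by_unit(households, "GEO3", "HAS_INTERNET", "WEIGHT")
joined = join_to_boundaries(shares, boundaries, "GEO3")

for row in joined:
    print(row["NAME"], row["share"])
```

Units that appear in the boundary records but not in the data keep a share of `None`, which makes unmatched codes easy to spot before mapping.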

Screenshot of the IPUMS International third-level download page
IPUMSI third level download page

Many of these countries have multiple samples with lower-level geographic variables available. IPUMSI always aims to provide as much detail as possible for each sample, but this is sometimes hindered by a lack of sufficient data or detail. Some countries, such as Bangladesh and Mali, have enough detail to provide lower-level geography for all of their samples in IPUMSI. More recent samples often contain more detail and more thorough documentation, whereas older samples often lack this level of information.

Map series showing third administrative boundaries in Bangladesh, called Upazilas, in the 1991, 2001, and 2011 censuses for the entire country and the Dhaka urban area
Map of Bangladesh showing complete level3 series


Another challenge in distributing more granular geographic data is producing the related shapefiles. IPUMSI aims to provide accompanying shapefiles for all lower-level geographic variables it produces. However, these files are harder to produce for some samples, either because adequate maps are not available or because the country is very large. One example is China, for which IPUMSI has just released lower-level variables. Because China is geographically very large and consists of more than 2,500 counties, processing is slower than for other countries. As a compromise, the IPUMSI team has released all currently available county-level variables for each China sample, along with a special GIS file that highlights select urban areas across the country for the 2000 sample. This combination is intended to give users as much data as possible, with supplemental geographic files to work with while the complete lower-level file is being processed.

Map showing median age by counties in Chongqing and Shanghai cities as well as their surrounding prefectures
Map of select China cities, showing adjacent areas

As IPUMSI moves forward with creating further low-level geographic variables, it is important to note how much effort these variables require. Many datasets provided to IPUMS lack sufficient detail to publish geography beyond the first or second administrative level. Most of the time spent on these variables goes into matching the many codes and labels in the datasets to real-world boundaries. Often the data are present but adequate maps or shapefiles are not, as is the case with Ethiopia and Senegal. In these cases, IPUMSI works hard to disseminate as many years of data as possible, but the earlier years are omitted. IPUMSI hopes to obtain further funding and resources to continue producing low-level geographic variables and shapefiles.