By Quinn Heimann
An ongoing goal and challenge for IPUMS-International (IPUMSI) is providing users with the most detailed geography possible. A unique obstacle is the set of confidentiality requirements agreed upon in order to distribute these census and survey samples. Nevertheless, IPUMSI has begun releasing lower-level geographic variables for samples where the data are sufficient and confidentiality thresholds are still met. As of spring 2022, twenty samples have been released with third administrative level geographic data, covering ten countries across Africa and Asia. Accompanying shapefiles are also being distributed to supplement these variables. Used together with the more granular geographic variables, the shapefiles make it possible to map population trends in greater detail.
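As a rough illustration of how a lower-level geographic variable and a boundary file fit together, the sketch below sums person weights by geographic unit and attaches the totals to a boundary attribute table. It is a minimal example under stated assumptions, not an IPUMSI tool: the column name `geo3_code`, the use of `PERWT` as the weight column, and all sample values are invented for illustration. In practice the boundary attributes would be read from the shapefile with a GIS library.

```python
from collections import defaultdict

def weighted_population_by_unit(records, geo_key="geo3_code", weight_key="PERWT"):
    """Sum person weights for each geographic unit code."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec[geo_key]] += float(rec[weight_key])
    return dict(totals)

def attach_to_boundaries(boundaries, totals, code_key="geo3_code"):
    """Add a 'population' field to each boundary row and report
    boundary codes that have no matching microdata."""
    joined, unmatched = [], []
    for feature in boundaries:
        row = dict(feature)  # copy so the input is not modified
        row["population"] = totals.get(row[code_key])
        if row["population"] is None:
            unmatched.append(row[code_key])
        joined.append(row)
    return joined, unmatched

# Invented sample data: hypothetical unit codes, names, and weights.
records = [
    {"geo3_code": "001001001", "PERWT": 10},
    {"geo3_code": "001001001", "PERWT": 12.5},
    {"geo3_code": "001001002", "PERWT": 8},
]
boundaries = [
    {"geo3_code": "001001001", "name": "Unit A"},
    {"geo3_code": "001001002", "name": "Unit B"},
    {"geo3_code": "001001003", "name": "Unit C"},
]

totals = weighted_population_by_unit(records)
joined, unmatched = attach_to_boundaries(boundaries, totals)
```

Keeping the list of unmatched codes is a deliberate choice: a boundary with no matching records usually signals a code mismatch between the data and the map rather than an empty area.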
Many of these countries have multiple samples with lower-level geography variables available. IPUMSI always aims to provide as much detail as possible for each sample, but this is sometimes hindered by insufficient data or documentation. Some countries, such as Bangladesh and Mali, have enough detail to support lower-level geography for all of their samples in IPUMSI. More recent samples often contain more detail and more thorough documentation, whereas older samples frequently lack this level of information.
Another challenge in distributing more granular geographic data is the production of the related shapefiles. IPUMSI aims to provide accompanying shapefiles for all lower-level geographic variables it produces. However, these files can be more difficult to produce for certain samples if adequate maps are not available or the country is very large. One example is China, for which IPUMSI has recently released lower-level variables. Because China is geographically very large and consists of more than 2,500 counties, processing is slower than for other countries. As a compromise, the IPUMSI team has released all currently available county-level variables for each China sample, along with a special GIS file that highlights select urban areas across the country for the 2000 sample. This combination is intended to give users as much data as possible, with supplemental geographic files to work with while the complete lower-level file is being processed.
As IPUMSI moves forward with creating further lower-level geographic variables, it is important to note how much effort these variables require. Many datasets provided to IPUMS lack sufficient detail to publish geography beyond the first or second administrative level. Most of the time spent on these variables goes into matching the many codes and labels in the datasets to real-world boundaries. Often the data are present but adequate maps or shapefiles are not, as is the case with Ethiopia and Senegal. In these cases, IPUMSI disseminates as many years of data as possible, but the earlier years are omitted. IPUMSI hopes to obtain further funding and resources to continue producing lower-level geographic variables and shapefiles.