Joint work with Carl Müller-Crepon (ETH Zürich & Harvard) and Yannick Pengl (ETH Zürich).
Abstract
Ethnic identities structure social space. Social scientists have therefore assembled many high-quality datasets on ethnic groups, in particular for African countries. However, these datasets have not yet been integrated to their full potential. We propose a consistent and comprehensive matching of 10,000 ethnic categories across the eleven most prominent datasets. We first match every ethnic category to the linguistic database Ethnologue. We then use this set of all known languages as an intermediate dictionary that links any two groups in the original data. Exploiting the structure of the linguistic trees, we can match ethnic groups at multiple levels of aggregation and take the linguistic distances between them into account. This will help researchers make the most of the available data on ethnic groups in Africa.
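The matching idea described above can be illustrated with a minimal Python sketch. Everything in it is a hypothetical placeholder: the language tree, the group names, the dataset labels, and the function names are invented for illustration and are not taken from Ethnologue or from the project's own software. The distance used here (one minus the share of common tree levels) is a simple stand-in, not necessarily the measure the authors use.

```python
from itertools import product

# Hypothetical language tree: each language is stored as its path from the
# root (family, branch, language). Placeholder names, not Ethnologue data.
LANGUAGE_PATHS = {
    "lang_a1": ("family_a", "branch_a1", "lang_a1"),
    "lang_a2": ("family_a", "branch_a1", "lang_a2"),
    "lang_a3": ("family_a", "branch_a2", "lang_a3"),
    "lang_b1": ("family_b", "branch_b1", "lang_b1"),
}

# Hypothetical first-step matches: (dataset, group) -> set of languages.
GROUP_TO_LANGUAGES = {
    ("survey", "Group X"): {"lang_a1"},
    ("census", "Group Y"): {"lang_a1", "lang_a2"},
    ("atlas", "Group Z"): {"lang_a3"},
    ("atlas", "Group W"): {"lang_b1"},
}


def nodes_at_level(group, level):
    """Tree nodes (path prefixes of length `level`) a group is matched to."""
    return {LANGUAGE_PATHS[lang][:level] for lang in GROUP_TO_LANGUAGES[group]}


def linked(group_a, group_b, level):
    """True if the two groups share at least one tree node at `level`.

    In this toy tree, level 3 is the language, 2 the branch, 1 the family.
    """
    return bool(nodes_at_level(group_a, level) & nodes_at_level(group_b, level))


def shared_depth(lang_a, lang_b):
    """Number of leading tree levels two languages have in common."""
    depth = 0
    for node_a, node_b in zip(LANGUAGE_PATHS[lang_a], LANGUAGE_PATHS[lang_b]):
        if node_a != node_b:
            break
        depth += 1
    return depth


def linguistic_distance(group_a, group_b):
    """Smallest distance over all language pairs of the two groups.

    Illustrative measure only: 1 - shared levels / longer path length,
    so 0.0 means a shared language and 1.0 means different families.
    """
    pairs = product(GROUP_TO_LANGUAGES[group_a], GROUP_TO_LANGUAGES[group_b])
    return min(
        1 - shared_depth(a, b) / max(len(LANGUAGE_PATHS[a]), len(LANGUAGE_PATHS[b]))
        for a, b in pairs
    )


x, y, z = ("survey", "Group X"), ("census", "Group Y"), ("atlas", "Group Z")
print(linked(x, y, level=3), linked(x, z, level=3), linked(x, z, level=1))
print(linguistic_distance(x, y), linguistic_distance(x, z))
```

In this sketch, two groups from different datasets never need a direct match to each other: each is matched once to the language tree, and links at the language, branch, or family level follow from comparing their tree nodes.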
Interlinked datasets
In total, we link ethnic lists drawn from eleven data sources:
- Afrobarometer Surveys
- All Minorities at Risk (AMAR)
- Census data from IPUMS
- Ethnic Power Relations Dataset
- Ethnologue languages
- Ethnic groups in Francois, Rainer & Trebbi (2015)
- Ethnic groups from Fearon (2003)
- GREG Data (based on the Soviet Atlas Narodov Mira)
- Demographic and Health Surveys
- Murdock Atlas
- Spatially Interpolated Data on Ethnicity (SIDE)