Linking Ethnic Data from Africa

Ethnic group clusters in Benin. Colored points denote groups from different datasets.

Joint work with Carl Müller-Crepon (ETH Zürich & Harvard) and Yannick Pengl (ETH Zürich).


Ethnic identities structure social space. Social scientists have therefore assembled many high-quality datasets on ethnic groups, particularly for African countries. However, these datasets have not yet been integrated to their full potential. We propose a consistent and comprehensive matching of 10,000 ethnic categories in the eleven most prominent datasets. We first match all ethnic categories to the linguistic database Ethnologue. We then use this set of all known languages as an intermediate dictionary that links any two groups in the original data. Leveraging the structure of the linguistic trees, we can match ethnic groups at multiple levels of aggregation and take into account the linguistic distances between them. This will help researchers make the most of the available data on ethnic groups in Africa.
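The matching logic described above can be illustrated with a minimal Python sketch. It is not the project's implementation: every dataset, group, and language name below is an invented placeholder rather than an Ethnologue entry, and the distance measure (one minus the share of common tree levels) is an assumption chosen for simplicity. The sketch stores each language as its path from the root of the language tree, links two groups if they map to a common node at a chosen depth, and derives a group distance from the closest pair of their languages.

```python
# Toy language tree: each language is stored as its path from the root.
# All names are invented placeholders, not real Ethnologue entries.
LANGUAGE_PATHS = {
    "lang_x": ("family_A", "branch_A1", "subbranch_A1a", "lang_x"),
    "lang_y": ("family_A", "branch_A1", "subbranch_A1a", "lang_y"),
    "lang_z": ("family_A", "branch_A2", "lang_z"),
    "lang_w": ("family_B", "lang_w"),
}

# Step 1: every ethnic category in a source dataset is matched to one or
# more languages. Keys are (dataset, group) pairs; all are hypothetical.
GROUP_TO_LANGUAGES = {
    ("dataset_1", "group_p"): {"lang_x"},
    ("dataset_2", "group_q"): {"lang_x", "lang_y"},
    ("dataset_2", "group_r"): {"lang_z"},
    ("dataset_3", "group_s"): {"lang_w"},
}


def nodes_at_level(group, level):
    """Return the tree nodes (as path prefixes) a group maps to at a depth.

    Level 1 is the language family; deeper levels are finer aggregations.
    Paths shorter than `level` are kept whole.
    """
    return {LANGUAGE_PATHS[lang][:level] for lang in GROUP_TO_LANGUAGES[group]}


def linked(group_a, group_b, level):
    """Two groups are linked at `level` if they share at least one node there."""
    return bool(nodes_at_level(group_a, level) & nodes_at_level(group_b, level))


def shared_depth(path_a, path_b):
    """Count the leading tree levels that two language paths have in common."""
    depth = 0
    for node_a, node_b in zip(path_a, path_b):
        if node_a != node_b:
            break
        depth += 1
    return depth


def language_distance(lang_a, lang_b):
    """Assumed measure: 0.0 for the same language, 1.0 for different families."""
    path_a, path_b = LANGUAGE_PATHS[lang_a], LANGUAGE_PATHS[lang_b]
    return 1 - shared_depth(path_a, path_b) / max(len(path_a), len(path_b))


def group_distance(group_a, group_b):
    """Distance between two groups: the closest pair of their languages."""
    return min(
        language_distance(lang_a, lang_b)
        for lang_a in GROUP_TO_LANGUAGES[group_a]
        for lang_b in GROUP_TO_LANGUAGES[group_b]
    )


p = ("dataset_1", "group_p")
q = ("dataset_2", "group_q")
r = ("dataset_2", "group_r")

print(linked(p, q, level=4))  # True: both map to lang_x
print(linked(p, r, level=4))  # False: different branches
print(linked(p, r, level=1))  # True: same family
print(group_distance(p, r))   # 0.75: only the family level is shared
```

Because the link runs through the language tree rather than through group names, two datasets never have to be compared directly: each only needs its own mapping to the tree. Taking the minimum over language pairs is one possible choice; an average or a population-weighted measure would be equally easy to substitute.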

Interlinked datasets

In total, we link ethnic lists drawn from eleven data sources:

Nils-Christian Bormann
Senior Lecturer in Political Science

I am a political scientist and my research focuses on causes and consequences of ethnic power sharing and civil wars.