The Ethnologue Global Dataset lets researchers replicate the statistical summaries published in Ethnologue and use Ethnologue data in their own analyses. Most of the information published in Ethnologue takes the form of textual comments that are not amenable to statistical analysis; those fields are deliberately excluded from the dataset. The dataset contains only data fields with simple values (such as booleans, numbers, and categories) that can be submitted to statistical analysis. The data tables are supplied in tab-delimited format, which can be loaded into virtually any spreadsheet, database, or other data analysis tool.
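As a rough illustration of how a tab-delimited table can be read programmatically, the following Python sketch parses a small in-memory sample with the standard `csv` module. The sample rows, the column names (`code`, `name`, `is_example`), and the helper `read_tab_delimited` are invented for this example and are not taken from the dataset; the real column definitions are in the dataset documentation.

```python
import csv
import io

# Hypothetical sample standing in for one of the tab-delimited tables.
# The column names and values are invented for illustration only.
SAMPLE = (
    "code\tname\tis_example\n"
    "xx1\tExample A\ttrue\n"
    "xx2\tExample B\tfalse\n"
)


def read_tab_delimited(handle):
    """Return the rows of a tab-delimited table as a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


rows = read_tab_delimited(io.StringIO(SAMPLE))
print(len(rows), rows[0]["name"])
```

To read an actual file, the same helper could be given a handle from `open(path, newline="", encoding="utf-8")`; the UTF-8 encoding is an assumption that should be checked against the documentation.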
The product is distributed as a zip-archive containing four files:
- A table of country data containing 12 columns of information about 236 countries.
- A table of language data containing 22 columns of information about 7,469 languages. The data in this table pertain to the language in general or provide aggregated results over all the countries in which it is used.
- A table of language-in-country data containing 22 columns of information about 10,995 instances of language-in-country. The data in this table pertain to the language in a particular country where it is used.
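The relationship between the language-in-country table and the language table can be pictured as an aggregation: several per-country rows roll up into one per-language result. The Python sketch below shows that idea on invented data. The codes, the tuple layout, and the function `countries_per_language` are hypothetical and do not reflect the dataset's actual columns or the way Ethnologue computes its aggregates.

```python
from collections import Counter

# Hypothetical language-in-country rows as (language_code, country_code).
# Each pair is assumed to appear at most once.
language_in_country = [
    ("aaa", "C1"),
    ("aaa", "C2"),
    ("bbb", "C1"),
]


def countries_per_language(rows):
    """Count how many country rows exist for each language code."""
    return Counter(language for language, _country in rows)


counts = countries_per_language(language_in_country)
print(counts["aaa"], counts["bbb"])
```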
The full documentation, which defines every data column included in the dataset, is available for download.
The Ethnologue Global Dataset is available for use under a number of different licenses:
- See Personal Research License for research use by a single individual.
- See Institutional Research Subscription License for research use by multiple people at the same institution.
- See Dynamic Republishing Subscription License for republication of certain data in your web site or mobile app.
- To inquire about any other use, contact SIL International at email@example.com.
Discounted pricing is offered to academic and bona fide nonprofit institutions that will use the data for non-commercial purposes.