Ethnologue Global Dataset


The Ethnologue Global Dataset makes it possible for researchers to replicate the statistical summaries published in Ethnologue and to use data from Ethnologue in their own analyses. Most of the information published in Ethnologue takes the form of textual comments that are not amenable to statistical analysis; these fields are deliberately excluded from the dataset. The dataset contains only fields with simple values (such as booleans, numbers, and categories) that can be submitted to statistical analysis. The data tables are supplied in the ubiquitous tab-delimited format, which can be loaded into virtually any spreadsheet, database, or other data analysis tool.
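Because the tables are plain tab-delimited text, they can also be read with a few lines of general-purpose code. The following is a minimal sketch using Python's standard `csv` module. It assumes each table has a single header row and that fields are not quoted; the column names and values in the sample are hypothetical placeholders, since the real column definitions are given in the product documentation.

```python
import csv
import io
from typing import Dict, Iterable, List


def read_tab_delimited(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse tab-delimited text with a header row into a list of dicts.

    Every value comes back as a string; converting booleans, numbers,
    or categories to richer types is left to the caller.
    """
    # Assumption: fields are not wrapped in quotes, so quote characters
    # are treated as ordinary text rather than as field delimiters.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [dict(row) for row in reader]


# Hypothetical sample rows; these are NOT the dataset's real columns.
SAMPLE = (
    "CountryID\tName\tPopulation\n"
    "XA\tExampleland\t1200000\n"
    "XB\tSampleton\t85000\n"
)

rows = read_tab_delimited(io.StringIO(SAMPLE))
for row in rows:
    # Numeric columns must be converted explicitly.
    print(row["CountryID"], int(row["Population"]))
```

To read an actual table, pass an open file instead of the in-memory sample, e.g. `open(path, newline="", encoding="utf-8")`; the UTF-8 encoding here is an assumption to check against the documentation.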

The product is distributed as a zip archive containing four files:

  • A document describing the product in detail: its terms of use, its format, the exact contents of each data table, and information on sources of additional data.
  • A table of country data containing 12 columns of information about 236 countries.
  • A table of language data containing 22 columns of information about 7,469 languages. The data in this table pertain to the language in general or provide aggregated results over all the countries in which it is used.
  • A table of language-in-country data containing 22 columns of information about 10,995 instances of language-in-country. The data in this table pertain to the language in a particular country where it is used.
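The language table is described as aggregating over all the countries in which a language is used, while the language-in-country table holds one row per language per country. As a minimal sketch of that relationship, assuming hypothetical `LangID` and `CountryID` columns in the language-in-country table (the real column names are defined in the documentation), one simple language-level figure, the number of countries in which each language is recorded, can be derived by grouping the per-country rows. This only illustrates the idea; it is not necessarily how Ethnologue computes its published aggregates.

```python
from collections import defaultdict
from typing import Dict, List, Set


def countries_per_language(lang_in_country: List[Dict[str, str]]) -> Dict[str, int]:
    """Count the distinct countries in which each language is recorded."""
    countries: Dict[str, Set[str]] = defaultdict(set)
    for row in lang_in_country:
        # Hypothetical column names: LangID and CountryID.
        countries[row["LangID"]].add(row["CountryID"])
    return {lang: len(cs) for lang, cs in countries.items()}


# Hypothetical language-in-country rows (one row per language per country).
ROWS = [
    {"LangID": "langA", "CountryID": "XA"},
    {"LangID": "langA", "CountryID": "XB"},
    {"LangID": "langB", "CountryID": "XA"},
]

print(countries_per_language(ROWS))  # {'langA': 2, 'langB': 1}
```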

The full documentation may be downloaded to read the definitions of all the data columns included in the dataset.

Licensing options

The Ethnologue Global Dataset is available for use under a number of different licenses: 

Discounted pricing is offered to academic and bona fide nonprofit institutions that will use the data for non-commercial purposes.
