One feature of the Ethnologue since its inception as a database in 1971 has been a system of three-letter codes for uniquely identifying languages. These became part of the publication in 1984. In the interest of fostering the uniform identification of all the world's languages in information systems, beginning with the 14th edition (2000), SIL International has released the complete set of three-letter codes (plus indexing information involving countries and alternate names) as downloadable data tables that the public may incorporate into their own database applications and dynamic web sites. Prior to the publication of the 15th edition in 2005, the Ethnologue worked in cooperation with the International Organization for Standardization (ISO) to create a new international standard for language codes. This was fully adopted in 2007 as ISO 639-3, Codes for the representation of names of languages — Part 3: Alpha-3 code for comprehensive coverage of languages. The current downloadable tables are compatible with the latest updates to that standard. Examples of efforts that are already using these codes as a standard for language identification are the Open Language Archives Community and its participating archives.

Any application that makes use of these language identifiers is just one click away from access to the full language descriptions that are available in the Ethnologue. That is, for any language identifier [abc] that may be stored in a database, an application may present a link to the following URL in order to give the user access to the Ethnologue's description of that language:

https://ethnologue.com/language/abc
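
The link construction described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the released package: the function name ethnologue_url and the validation rule (exactly three ASCII letters, lowercased) are assumptions introduced here; only the URL pattern comes from the text.

```python
import re

# URL pattern quoted in the text; "abc" is replaced by the identifier.
BASE_URL = "https://ethnologue.com/language/"


def ethnologue_url(lang_id: str) -> str:
    """Return the Ethnologue description URL for a three-letter identifier.

    Hypothetical helper: it trims whitespace, lowercases the input, and
    rejects anything that is not exactly three ASCII letters.
    """
    code = lang_id.strip().lower()
    if not re.fullmatch(r"[a-z]{3}", code):
        raise ValueError(f"not a three-letter identifier: {lang_id!r}")
    return BASE_URL + code


if __name__ == "__main__":
    print(ethnologue_url("aaa"))
```

The check only confirms the shape of the identifier; whether a code is actually assigned would have to be looked up in the downloaded tables.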

The remainder of this document describes the relationship of the download tables to standards, explains their structure, gives some hints on how to use them, offers links for downloading them, and states their terms of use.

Relation to Standards

The 15th edition of the Ethnologue (2005) marked an important milestone in the development of the language identifiers, namely, their emergence as part of the draft international standard, ISO 639-3. (See History of the Ethnologue for a fuller discussion of the history of the language identifiers.) ISO 639-3 was devised to enable the uniform identification of all known human languages in a wide range of applications, particularly information systems. It provides as complete an enumeration of languages as possible, including living, extinct, ancient, and constructed languages, whether major or minor. The Ethnologue does not cover this entire scope; it seeks to catalog all known living languages, languages that have gone extinct since the inception of the Ethnologue around 1950, and languages that have no native speakers but are still in use as a second language in certain communities. Long-extinct and constructed languages that fall outside this scope are documented by Linguist List.

The most widely used standard for identifying languages in Internet documents (such as in HTTP headers, HTML metadata, or the XML lang attribute) is BCP 47 of the Internet Engineering Task Force. In that standard, any three-letter identifier from ISO 639-3 is recognized as a valid language identifier. Thus any of the three-letter codes reported in the Ethnologue is valid for use in Internet documents.

Structure of the Code Tables

Three files make up the package of data tables that SIL International releases in support of the ISO 639-3 standard for language identifiers. They are tab-delimited files in which each line represents one row of a database table. The characters are encoded in the 8-bit standard known as ISO 8859-1 (which is a subset of the default Windows code page 1252). See Downloading the Code Tables for the latest version of the tables.

LanguageCodes.tab The complete list of three-letter language identifiers used in the current Ethnologue (along with name, primary country, and language status).
CountryCodes.tab The list of two-letter country codes that are used in the main language code table.
LanguageIndex.tab An index for finding languages by country and by all known names (including primary name, alternate names, and dialect names).

The following declarations provide the formal definitions for SQL data tables into which the tab-delimited files can be loaded:

CREATE TABLE LanguageCodes (
   LangID      char(3) NOT NULL,  -- Three-letter code
   CountryID   char(2) NOT NULL,  -- Main country where used
   LangStatus  char(1) NOT NULL,  -- L(iving), (e)X(tinct)
   Name    varchar(75) NOT NULL   -- Primary name in that country
);

CREATE TABLE CountryCodes (
   CountryID  char(2) NOT NULL,  -- Two-letter code from ISO 3166
   Name   varchar(75) NOT NULL,  -- Country name
   Area   varchar(10) NOT NULL   -- World area
);

CREATE TABLE LanguageIndex (
   LangID    char(3) NOT NULL,  -- Three-letter code for language
   CountryID char(2) NOT NULL,  -- Country where this name is used
   NameType  char(2) NOT NULL,  -- L(anguage), LA(lternate),
                                -- D(ialect), DA(lternate)
                                -- LP,DP (a pejorative alternate)
   Name  varchar(75) NOT NULL   -- The name
);
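
As one possible way to load a tab-delimited file into such a table, the following Python sketch uses the standard csv and sqlite3 modules. Several things here are assumptions rather than part of the release: the helper name load_tab_file, the use of SQLite, and the expectation that the header line of each file carries the same column names as the declarations. The sample rows are a small in-memory stand-in for LanguageCodes.tab.

```python
import csv
import io
import sqlite3

# Schema for one of the three tables, following the declarations in the text.
SCHEMA = """
CREATE TABLE LanguageCodes (
   LangID      char(3) NOT NULL,
   CountryID   char(2) NOT NULL,
   LangStatus  char(1) NOT NULL,
   Name    varchar(75) NOT NULL);
"""


def load_tab_file(conn, table, stream):
    """Insert the rows of one tab-delimited stream into `table`.

    Assumes the first line holds column names that match the table
    definition; `table` must be a trusted name, since it is interpolated.
    Returns the number of rows inserted.
    """
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    columns = ", ".join(header)
    placeholders = ", ".join("?" for _ in header)
    rows = [row for row in reader if row]  # skip blank lines
    conn.executemany(
        f"INSERT INTO {table} ({columns}) VALUES ({placeholders})", rows
    )
    conn.commit()
    return len(rows)


# In-memory stand-in for LanguageCodes.tab. For a real file one would use
# open(path, encoding=..., newline="") with the encoding the file actually has.
sample = io.StringIO(
    "LangID\tCountryID\tLangStatus\tName\n"
    "aaa\tNG\tL\tGhotuo\n"
    "aab\tNG\tL\tAlumu-Tesu\n"
    "aae\tIT\tL\tAlbanian, Arbëreshë\n"
)

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
inserted = load_tab_file(conn, "LanguageCodes", sample)
print(inserted, "rows loaded")
```

Quoting is disabled in the reader so that names containing quotation marks or commas pass through unchanged; only the tab character separates fields.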

Using the Code Tables

LanguageCodes.tab lists the 7,600+ distinct language identifiers used in the current Ethnologue database. All values in the Name column are unique; in cases where distinct languages have the same name, a parenthetical disambiguator is added. The following shows the entries for the first six language identifiers:

LangID CountryID LangStatus Name
------ --------- ---------- ------------- 
aaa    NG        L          Ghotuo
aab    NG        L          Alumu-Tesu
aac    PG        L          Ari
aad    PG        L          Amal
aae    IT        L          Albanian, Arbëreshë
aaf    IN        L          Aranadan

We see that aaa and aab denote living languages spoken in Nigeria, aac and aad denote living languages spoken in Papua New Guinea, and so on. When a language is spoken in more than one country, the CountryID gives the country that is considered primary, usually the country of origin or the country where most of the speakers are located.
 

CountryCodes.tab lists the two-letter identifier and name for the countries reported on by the Ethnologue. The codes are from the international standard known as ISO 3166-1 (1997. Codes for the representation of names of countries and their subdivisions - Part 1: Country codes. Geneva: International Organization for Standardization. http://www.din.de/gremien/nas/nabd/iso3166ma/). The following shows the entries for the first five codes in the list:

CountryID Name                  Area
--------- --------------------- ----------
AD        Andorra               Europe
AE        United Arab Emirates  Asia
AF        Afghanistan           Asia
AG        Antigua and Barbuda   Americas
AI        Anguilla              Americas

The CountryCodes.tab table can be used to narrow the search for an identifier to a particular country. The user would choose a country from the country list in order to select the appropriate country code. That code would then be used in a SQL query to restrict the language identifier list to just entries for that country. For instance, if the user were interested only in Afghanistan, the following SQL query would return just the table rows for that country:

SELECT * FROM LanguageCodes WHERE CountryID='AF'

Alternatively, the following link to the Ethnologue website could be used to generate a report listing all the languages for Afghanistan:

http://www.ethnologue.com/country/AF 
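
When the country code comes from user input, the SQL query above is better issued with a bound parameter than with string concatenation. The Python sketch below shows one way to do that with sqlite3; the function name languages_by_primary_country is hypothetical, and the rows are the six sample entries from LanguageCodes.tab shown earlier, so the demonstration queries NG rather than AF.

```python
import sqlite3


def languages_by_primary_country(conn, country_id):
    """Return (LangID, Name) pairs whose primary country is `country_id`."""
    cur = conn.execute(
        "SELECT LangID, Name FROM LanguageCodes "
        "WHERE CountryID = ? ORDER BY LangID",
        (country_id.upper(),),  # bound parameter, never string-formatted
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE LanguageCodes ("
    "LangID char(3) NOT NULL, CountryID char(2) NOT NULL, "
    "LangStatus char(1) NOT NULL, Name varchar(75) NOT NULL)"
)
# The six sample rows listed in the text.
conn.executemany(
    "INSERT INTO LanguageCodes VALUES (?, ?, ?, ?)",
    [
        ("aaa", "NG", "L", "Ghotuo"),
        ("aab", "NG", "L", "Alumu-Tesu"),
        ("aac", "PG", "L", "Ari"),
        ("aad", "PG", "L", "Amal"),
        ("aae", "IT", "L", "Albanian, Arbëreshë"),
        ("aaf", "IN", "L", "Aranadan"),
    ],
)

print(languages_by_primary_country(conn, "NG"))
```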

LanguageIndex.tab documents over 55,000 distinct names used for the languages and their dialects. The entries in this index of names indicate in which country each name is used. The table thus contains over 70,000 records since many of the names are used in more than one country and some are used with more than one language or dialect. The following shows the entries in the name index for the first three language identifiers:

LangID CountryID NameType Name  
------ --------- -------- ------------- 
aaa    NG        L        Ghotuo
aab    NG        D        Alumu
aab    NG        D        Tesu
aab    NG        DA       Arum
aab    NG        L        Alumu-Tesu
aab    NG        LA       Alumu
aab    NG        LA       Arum-Cesu
aab    NG        LA       Arum-Chessu
aab    NG        LA       Arum-Tesu
aac    PG        D        Serea
aac    PG        L        Ari

We see that aaa has just one name, Ghotuo; aab has four alternate names, two dialect names, and an alternate dialect name in addition to its primary name; aac has a dialect name in addition to the primary name of Ari.

The LanguageIndex.tab table would be used to implement a search by name. For instance, the following query returns the three-letter codes for all the languages that use the name xyz:

SELECT DISTINCT LangID FROM LanguageIndex
WHERE Name='xyz'

Note that DISTINCT is used since the same language could be known by the same name in multiple countries. To allow the user to verify that a proposed identifier is indeed the right one, the software would offer the following link to the Ethnologue website to see a report giving detailed information about the selected language (where abc is the proposed three-letter identifier):

http://www.ethnologue.com/language/abc
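
The name search and the verification link can be combined in a small routine. The following Python sketch is illustrative only: find_by_name is a hypothetical helper, the match is exact (SQLite compares text case-sensitively by default), and the data are the sample LanguageIndex.tab rows shown above.

```python
import sqlite3

LANGUAGE_URL = "http://www.ethnologue.com/language/"  # pattern from the text


def find_by_name(conn, name):
    """Return (LangID, verification URL) for every language using `name`."""
    cur = conn.execute(
        "SELECT DISTINCT LangID FROM LanguageIndex "
        "WHERE Name = ? ORDER BY LangID",
        (name,),
    )
    return [(lang_id, LANGUAGE_URL + lang_id) for (lang_id,) in cur]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE LanguageIndex ("
    "LangID char(3) NOT NULL, CountryID char(2) NOT NULL, "
    "NameType char(2) NOT NULL, Name varchar(75) NOT NULL)"
)
# Sample rows for aaa, aab, and aac as listed in the text.
conn.executemany(
    "INSERT INTO LanguageIndex VALUES (?, ?, ?, ?)",
    [
        ("aaa", "NG", "L", "Ghotuo"),
        ("aab", "NG", "D", "Alumu"),
        ("aab", "NG", "D", "Tesu"),
        ("aab", "NG", "DA", "Arum"),
        ("aab", "NG", "L", "Alumu-Tesu"),
        ("aab", "NG", "LA", "Alumu"),
        ("aab", "NG", "LA", "Arum-Cesu"),
        ("aab", "NG", "LA", "Arum-Chessu"),
        ("aab", "NG", "LA", "Arum-Tesu"),
        ("aac", "PG", "D", "Serea"),
        ("aac", "PG", "L", "Ari"),
    ],
)

# "Alumu" occurs twice for aab (as a dialect name and as an alternate
# name), so DISTINCT collapses the two index rows into one result.
print(find_by_name(conn, "Alumu"))
```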

Another application of the LanguageIndex.tab table is to find all the countries in which a given language is spoken. For instance, the following query returns the names of all the countries in which language abc is spoken:

SELECT DISTINCT C.Name
FROM CountryCodes AS C
JOIN LanguageIndex AS L ON C.CountryID=L.CountryID
WHERE L.LangID='abc'

In this case DISTINCT must be used since a language could have multiple names in a given country.
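
The effect of DISTINCT in the join can be seen with the sample rows for aab, which has eight index entries, all in Nigeria. The Python sketch below runs the same join through sqlite3. The helper name countries_for_language is hypothetical, and the Area values for NG and PG are placeholders assumed for the illustration, since those two rows do not appear in the CountryCodes.tab excerpt.

```python
import sqlite3


def countries_for_language(conn, lang_id):
    """Return the names of all countries in which `lang_id` is indexed."""
    cur = conn.execute(
        "SELECT DISTINCT C.Name "
        "FROM CountryCodes AS C "
        "JOIN LanguageIndex AS L ON C.CountryID = L.CountryID "
        "WHERE L.LangID = ? ORDER BY C.Name",
        (lang_id,),
    )
    return [name for (name,) in cur]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE CountryCodes (
       CountryID char(2) NOT NULL, Name varchar(75) NOT NULL,
       Area varchar(10) NOT NULL);
    CREATE TABLE LanguageIndex (
       LangID char(3) NOT NULL, CountryID char(2) NOT NULL,
       NameType char(2) NOT NULL, Name varchar(75) NOT NULL);
    """
)
# Area values here are assumed placeholders, not taken from the real table.
conn.executemany(
    "INSERT INTO CountryCodes VALUES (?, ?, ?)",
    [("NG", "Nigeria", "Africa"), ("PG", "Papua New Guinea", "Pacific")],
)
# The eight index rows for aab and the two for aac from the text.
conn.executemany(
    "INSERT INTO LanguageIndex VALUES (?, ?, ?, ?)",
    [
        ("aab", "NG", "D", "Alumu"),
        ("aab", "NG", "D", "Tesu"),
        ("aab", "NG", "DA", "Arum"),
        ("aab", "NG", "L", "Alumu-Tesu"),
        ("aab", "NG", "LA", "Alumu"),
        ("aab", "NG", "LA", "Arum-Cesu"),
        ("aab", "NG", "LA", "Arum-Chessu"),
        ("aab", "NG", "LA", "Arum-Tesu"),
        ("aac", "PG", "D", "Serea"),
        ("aac", "PG", "L", "Ari"),
    ],
)

print(countries_for_language(conn, "aab"))
```

Without DISTINCT the same query would return one row per index entry, that is, the country name repeated eight times for aab.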

Finally, the LanguageIndex.tab table can be used to find all the languages spoken in a particular country. Whereas the query illustrated previously retrieves all languages whose primary country is Afghanistan, the following query retrieves all languages spoken in Afghanistan:

SELECT DISTINCT LangID FROM LanguageIndex
WHERE CountryID='AF'
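
The difference between the two country queries can be made concrete with a toy example. In the Python sketch below, the language qaa is purely hypothetical (the code is taken from the range reserved for local use) and is given PK as its primary country with an additional index entry under AF; none of these rows come from the real tables, and the function names primary_in and spoken_in are likewise invented for the illustration.

```python
import sqlite3


def primary_in(conn, country_id):
    """Languages whose primary country is `country_id` (LanguageCodes)."""
    cur = conn.execute(
        "SELECT LangID FROM LanguageCodes WHERE CountryID = ? ORDER BY LangID",
        (country_id,),
    )
    return [lang_id for (lang_id,) in cur]


def spoken_in(conn, country_id):
    """Languages with any index entry in `country_id` (LanguageIndex)."""
    cur = conn.execute(
        "SELECT DISTINCT LangID FROM LanguageIndex "
        "WHERE CountryID = ? ORDER BY LangID",
        (country_id,),
    )
    return [lang_id for (lang_id,) in cur]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE LanguageCodes (
       LangID char(3) NOT NULL, CountryID char(2) NOT NULL,
       LangStatus char(1) NOT NULL, Name varchar(75) NOT NULL);
    CREATE TABLE LanguageIndex (
       LangID char(3) NOT NULL, CountryID char(2) NOT NULL,
       NameType char(2) NOT NULL, Name varchar(75) NOT NULL);
    """
)
# Hypothetical data: "qaa" is a local-use code, not a real language.
conn.execute(
    "INSERT INTO LanguageCodes VALUES ('qaa', 'PK', 'L', 'Example Language')"
)
conn.executemany(
    "INSERT INTO LanguageIndex VALUES (?, ?, ?, ?)",
    [
        ("qaa", "PK", "L", "Example Language"),
        ("qaa", "AF", "L", "Example Language"),
        ("qaa", "AF", "LA", "Example Alternate"),
    ],
)

print("primary in AF:", primary_in(conn, "AF"))
print("spoken in AF: ", spoken_in(conn, "AF"))
```

With this toy data the first query finds nothing for AF, because the only language has PK as its primary country, while the index-based query still reports qaa.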


Downloading the Code Tables

The code tables (as tab-delimited, UTF-8 encoded plain text files) may be downloaded individually by clicking the following links. In each case, the first line contains the column names rather than the first row of data.

Or download the complete set of tables with the terms of use statement in a single zip file:

Terms of Use

In the interest of fostering the use of ISO 639-3 for the uniform identification of all the world's languages in information systems, SIL International releases certain information from the Ethnologue database for use in the development of information systems, specifically, information that describes language identifiers in terms of alternate names and countries where spoken. You are welcome to download the information as provided and incorporate the supplied tables into your own database application. You are authorized to include the information in a product that you make available to the public (even on a commercial basis), provided that you:

  • cite SIL International and this website (ethnologue.com) as the source of the information,
  • do not modify or extend the codes other than those set aside for local use (i.e. qaa to qtz),
  • do not redistribute the code tables for download, and
  • use only the data in the tables and no other data posted on this site. Other information on this site should be accessed by supplying a link like the following (where abc is an ISO 639-3 code):

http://www.ethnologue.com/language/abc

SIL International periodically updates the supplied information, and intends this site to be the sole distribution source in order to ensure uniformity of versions. You are not authorized to redistribute the code tables for download, whether in the exact form they were obtained from this site or in a modified form you have developed, without the written consent of SIL International (see instructions above).