Three-letter codes for identifying languages:
Updating codes from the 14th Edition to the 15th Edition
This page is aimed at those who have used the SIL Three-Letter Codes for Identifying Languages from the 14th edition of the Ethnologue in their own relational database application. Significant changes were made in the set of language identifiers between the 14th and 15th editions. This page explains some of the background behind the changes and then describes how to update the codes in a local database.
- Background
- Overview of the update procedure
- The code update tables
- Dealing with retired codes
- Converting to 15th edition codes
1. Background
Since the tenth edition oif the Ethnologue (1984), the language entries have included a three-letter code that uniquely identifies the language. The 15th edition (2005) marks an important milestone in the development of the language identifiers, namely, their emergence as a draft international standard. In 1998, the International Organization for Standardization adopted ISO 639-2its standard for three-letter language identifiers. That was based on a convergence of ISO 639-1 (its earlier standard for two-letter language identifiers adopted in 1988) and of ANSI Z39.53 (also known as the MARC language codes, a set of three-letter identifiers developed within the library community and adopted as an American National Standard in 1987). The current standard, ISO 639-2, has proven insufficient for many purposes since it has identifiers for fewer than 400 individual languages.
In 2002, ISO TC37SC2 invited SIL International to participate in the development of a new standard based on the language identifiers in the Ethnologue. The new standard would be a superset of ISO 639-2 and would provide identifiers for all known languages. As a consequence, hundreds of codes from the 14th edition have been changed to a different sequence of letters in the 15th edition in order to achieve alignment with ISO 639-2. In fact, all language identifiers have changed, since the convention in previous editions of the Ethnologue was to use three upper-case letters while the ISO convention is to use three lower-case letters. The proposed new standard, ISO 639-3, has passed the first round of balloting to attain the status of Draft International Standard. The three-letter language identifiers in the 15th edition of the Ethnologue are thus the codes of ISO/DIS 639-3.
2. Overview of the update procedure
In addition to the fact that many codes have changed in the transition from the former SIL system to the new ISO system, there are also a number of changes to the inventory of codes that have occurred between the 14th and 15th editions. Most such changes occur when our knowledge of lesser-known languages improves such that what was previously thought to be one language with multiple dialects is determined to be multiple languages, or when what were previously thought to be multiple languages are determined to be dialects of a single language.
Before applying the changes that convert from the upper-case codes of the 14th edition to the lower-case codes of the 15th edition, it is first necessary to apply all the updates to the inventory of codes that have transpired between editions. These updates have been released at intervals (see Code downloads). The instructions given below can be followed without regard to those intermediate releases; that is, the instructions will work regardless of whether or not you have made intermediate updates.
A pair of downloadable data tables is provided in order to facilitate the update process. Section 3 documents the structure of those tables. Section 4 describes the steps involved in using the data tables to deal with all the codes that have been retired from the inventory of codes. After the inventory is up-to-date, then the instructions in section 5 are followed to convert the codes from the old-style upper-case codes to the equivalent new lower-case codes of ISO/DIS 639-3.
3. The code update tables
Two tab-delimited data tables are provided to assist you in automating the process of updating the language codes used in your local database. The first supports the task of dealing with 14th edition codes that have been retired from use (see sections 4 to 6); the second supports the task of converting the old-style codes to the new 15th edition codes. In both cases, the first line contains the column names rather than the first row of data.
Here are formal definitions for SQL data tables into which the tab-delimited files can be loaded, followed by a fuller explanation of what the columns of the tables represent:
CREATE TABLE Retired_Codes ( Code char(3) NOT NULL, -- The language code that was retired Reason char(2) NOT NULL, -- A one-letter code for why it was retired Change_To char(3) NULL) -- A (potential) replacement code
CREATE TABLE Update_Mappings ( Old_Code char(3) NOT NULL, -- The 14th edition code New_Code char(3) NOT NULL, -- The 15th edition (ISO/DIS 639-3) code Name varchar(60) NOT NULL ) -- The language name
Retired_Codes. This table lists all the 14th edition codes that were retired some time after the publication of the 14th edition. The table has just three fields:
| Code | char(3) | The 14th edition code that was retired | |||||||||||||||||||||
| Reason | char(1) | A one-letter code signifying the reason it was retired:
|
|||||||||||||||||||||
| Change_To | char(3) | A code that the retired code should be changed to. The code
given here is still in the style of the 14th edition (that is, all upper case).
This column relates to the Reason column as follows:
|
Code_Mappings. Lists the mapping for each old-style upper-case code to its corresponding new ISO/DIS 639-3 lower-case code. It contains a row for every new code in the 15th edition. It does not have a row for every 14th edition code since those that have been retired are omitted; new old-style codes that were created after the publication of the 14th edition are included. Every code that was added between the 14th and 15th editions was first added as an old-style code. The table has three columns (though just the first two are used in this application; the last is simply provided for reference):
| Old_Code | char(3) | The old-style code (whether in the14th edition, or added in the interim) |
| New_Code | char(3) | The corresponding new 15th edition (ISO/DIS 639-3) code |
| Name | varchar(60) | The language name |
4. Dealing with retired codes
If your database contains any language codes that were retired between the 14th and 15th editions, then a simple application of the code conversion table will not work since the retired codes are not in the conversion table. The following subsections describe how to deal with the retired codes.
Checking for the presence of update problems
The first step is to determine if you have any retired codes in your existing database. Assuming that the table in your application that contains language codes is named My_Table and that the column in it containing language codes is named Code, then the following SQL query produces a two-column table of the retirement types (i.e. one-letter Reason codes) taht occur in your database, along with a count of how many times each occurs.
-- Discover what update problems exist
SELECT r.Reason, COUNT(m.Code)
FROM My_Table AS m JOIN Retired_Codes AS r
ON m.Code = r.Code
GROUP BY r.Reason
If this query returns no rows, then your database contains no retired codes and you may skip directly to the code conversion step in section 5. On the other hand, if the query returns results, then you must do all of the following which apply:
- If there is a count for N, you must remove deleted codes.
- If there is a count for C, D, or M, you must apply automatic code changes.
- If there is a count for S or Z, you must handle non-automatable changes.
Removing deleted codes
Codes with a retirement reason of N were retired because the Ethnologue editor concluded that the language was nonexistent. These typically entered the Ethnologue database decades ago as being included in published lists of world languages, but in the meantime, the editor has not been able to find any solid evidence that the named language did or does exist.
The likelihood of such a code being used to tag data in your database is thus very low. However, if you have used such a code, then you may have the evidence the editor has been looking for! You should first take a look at all the records in your database that contain a deleted code. The following query will find them:
-- Find the records that use deleted codes
SELECT m.*
FROM My_Table AS m JOIN Retired_Codes AS r
ON m.Code = r.Code
WHERE r.Reason = 'N'
You will need to examine each record to determine what should be done. You may want to look up the entry for this code in the 14th edition of the Ethnologue in order to see what language it was purported to represent. Substitute the three-letter code at the end of the following URL:
http://www.ethnologue.com/14/show_language.asp?code=abc
After consideraton of the database record, you should make one of three conclusions:
- The wrong code was used; search the 14th edition to find the code that should have been used and change the data.
- The language really does exist; leave the code as is and send your evidence to the Ethnologue editor (see the Feedback page).
- The language doesn't exist; simply delete the record from your database.
Applying automatic code changes
Codes with a retirement reason of C, D, or M involve a one-to-one change to a different code. This is because the three-letters were simply changed or because the purported language was found to duplicate another language already listed or because the speech variety denoted by the code was determined to be a dialect of another language and merged into it.
In all three cases, an automatic update of the code can be made. The following query will make the necessary change:
-- Automatically update the codes that involve a one-to-one change UPDATE My_Table SET m.Code = r.Change_To FROM My_Table AS m, Retired_Codes AS r WHERE m.Code = r.Code AND (r.Reason = 'C' OR r.Reason = 'D' OR r.Reason = 'M')
Handling non-automatable changes
Codes with a retirement reason of S or Z involve a one-to-many change that cannot be automated. This is because the original code represents something that was previously thought to be one language, but which is now treated as two or more languages. Therefore it is necessary for someone to review each use of the original code and determine which of the new alternatives it should be changed to.
The following query will find all the data instances in your database that need to be changed by hand, and will indicate what possible newer codes they may be changed to:
-- Find data instances that involve a one-to-many change SELECT DISTINCT m.* FROM My_Table AS m, Retired_Codes AS r WHERE m.Code = r.Code AND (r.Reason = 'S' OR r.Reason = 'Z')
This query retrieves every row in MyTable that contains a code that was split. You will need to look up each old code in the archived version of the 14th edition to see both what it originally denoted and then what the possible new codes are:
http://www.ethnologue.com/14/show_language.asp?code=abc
The page will show a yellow block containing an explanation of how the old code has changed. Click on the links to the 15th edition to read the description of each new code in order to decide which one best fits the instance in your database; then update the old codc by hand. When you do so, you must change it to the "Interim SIL code" using upper case letters. If you jump ahead to the lower-case 15th edition code, then the mass conversion operation in step 5 below could unexpectedly change it to the wrong code.
If you are unable to decide which interim code to change an instance to, then you have two options. The cleanest thing to do would be to select a code from the local use area (that is, any code in the range [qaa] to [qtz]) to represent the old dentation until you can get more information and resolve the code assignment at a later date. Alternatively, you could prefix a symbol like an asterisk to the old code in order to mark it as pending and invalid.
Going from the 15th edition forward, it will not be so critical to change codes that have been retired, since the retired code will not be reassigned, so it will always be possible to look up the denotation of the retired code. However, leaving a retired 14th edition code in your database could cause problems, since there are cases in which a retired 14th edition has the same three letters (except for the upper- versus lower-case distinction) as a 15th edition code for a different language.
5. Converting to 15th edition codes
Once your database contains no retired codes, you are ready to proceed to the step of automatically converting all old-style codes to 15th edition codes. Before doing that, it is a good idea to run the following query to ensure that you have no retired codes left in your database:
-- Return all rows containing retired codes
SELECT m.*
FROM My_Table AS m JOIN Retired_Codes AS r
ON m.Code = r.Code
If this query returns any rows, go back to step 4 to take care of the remaining retired codes.
Before making the mass conversion, you should also verify that your database does not contain any unkown codes. Run the following query to test for this:
-- Return all rows containing unknown codes
SELECT m.*
FROM My_Table AS m LEFT JOIN Update_Mappings AS u
ON m.Code = u.Old_Code
WHERE u.Old_Code IS NULL
If this query returns any rows, make a print out of the query results so that you will be able to find these rows again at the end of the process.
You are now ready to perform the mass conversion from old-style codes to 15th edition codes. The following query should do the trick:
-- Automatically convert all old codes to 15th edition codes UPDATE My_Table SET m.Code = u.New_Code FROM My_Table AS m, Update_Mappings AS u WHERE m.Code = u.Old_Code
Finally, if you found any unknown codes in the database, you should now fix these. Look at each problem record in your database, and then look in the 15th edition of the Ethnologue to discover the appropriate code for the record.

