Relation to Standards


This 15th edition of the Ethnologue (2005) marked an important milestone in the development of the language identifiers, namely, their emergence as part of the draft international standard, ISO/DIS 639-3. (See History of the Ethnologue for a fuller discussion of the history of the language identifiers.) ISO 639-3 was devised to enable the uniform identification of all known human languages in a wide range of applications, particularly information systems. It provides as complete an enumeration of languages as possible, including living, extinct, ancient, and constructed languages, whether major or minor. The Ethnologue does not cover this entire scope; it seeks to catalog all known living languages, languages that have gone extinct since the inception of the Ethnologue (1950), and languages that have no native speakers but are still in use as a second language in certain communities. Ancient, historical, and constructed languages that fall outside this scope are documented by Linguist List.

The most widely used standard for identifying languages in Internet documents (such as in HTTP headers, in HTML metadata, or in the xml:lang attribute) is RFC 4646 (formerly RFC 3066). In that standard, a three-letter identifier is interpreted as being a code from the ISO 639-2 standard. RFC 4646 offers an extension mechanism of tags beginning with x- to handle custom codes for languages not covered in the standard. With the 14th edition of the Ethnologue, we recommended that an RFC 4646-compliant language tag be formed from an SIL three-letter language identifier as follows: x-sil-abc. The situation is now different, since the identifiers used in the Ethnologue are a subset of the codes in ISO 639-3, which in turn includes the individual language codes of ISO 639-2 as a subset. We anticipate that the RFC will be revised when ISO 639-3 becomes fully adopted. In the meantime, using an ISO/DIS 639-3 code in a context where a 639-2 code is expected will not lead to misinterpretation, since:

  • If the code is found in the 639-2 code set, then it is in fact the same as that 639-2 code.
  • If the code is not found in the 639-2 code set, then it could be treated as an unknown language, or the 639-3 code set could be consulted to find its denotation.
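
The two cases above can be sketched as a simple lookup. This is only an illustrative sketch, not part of either standard: the function name interpret_code and the two small dictionaries are assumptions made for the example, and the dictionaries stand in for the full ISO 639-2 and ISO 639-3 code tables, which are far larger.

```python
from typing import Optional, Tuple

# Tiny stand-in samples (assumption): real applications would load the
# complete ISO 639-2 and ISO 639-3 code tables instead.
ISO_639_2_SAMPLE = {"eng": "English", "fra": "French"}
ISO_639_3_SAMPLE = {"eng": "English", "fra": "French", "aaa": "Ghotuo"}


def interpret_code(code: str) -> Tuple[str, Optional[str]]:
    """Interpret a three-letter code where a 639-2 code is expected.

    Returns (source, name), where source is "639-2", "639-3", or "unknown".
    """
    code = code.strip().lower()
    # Case 1: the code is in the 639-2 set, so it means the same thing there.
    if code in ISO_639_2_SAMPLE:
        return ("639-2", ISO_639_2_SAMPLE[code])
    # Case 2: not in 639-2; consult the 639-3 set to find its denotation.
    if code in ISO_639_3_SAMPLE:
        return ("639-3", ISO_639_3_SAMPLE[code])
    # Otherwise treat the code as an unknown language.
    return ("unknown", None)


if __name__ == "__main__":
    for sample in ("eng", "aaa", "zzz"):
        print(sample, interpret_code(sample))
```

Checking the 639-2 set first mirrors the behavior of software that knows only that standard; the 639-3 lookup is the optional second step for software that chooses to consult the larger code set.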