The Problem of Language Identification


How one chooses to define a language depends on the purposes one has in identifying one language as being distinct from another. Some base their definition on purely linguistic grounds, focusing on lexical and grammatical differences. Others may see social, cultural, or political factors as being primary. In addition, speakers themselves often have their own perspectives on what makes a particular language uniquely theirs. These perspectives are frequently related to issues of heritage and identity much more than to the actual linguistic features. It is also important to recognize that not all languages are oral. Sign languages constitute an important class of linguistic varieties that merit consideration.

Due to the nature of language and the various perspectives brought to its study, it is not surprising that a number of issues prove controversial. Foremost among these is the definition of the basic unit on which the Ethnologue reports: what constitutes a language?

Language as particle, wave, and field

Scholars recognize that languages are not always easily, nor always best, treated as discrete, identifiable, and countable units with clearly defined boundaries between them (Makoni and Pennycook 2006). Rather, a language is more often composed of waves of features that extend across time, geography, and social space. In addition, growing attention is being given to the roles or functions that language varieties play within the linguistic ecology of a region or a speech community. The Ethnologue approach to listing and counting languages as though they were discrete units does not preclude any of these more dynamic perspectives on the linguistic makeup of the countries and regions we describe.

While discrete linguistic varieties can be distinguished, we also recognize that those varieties exist in a complex set of relationships to each other. Languages can be viewed simultaneously as discrete units (particles) amenable to being listed and counted, as bundles of features across time and space (waves) that are best studied in terms of variational tendencies as examples of “change in progress” (Weinreich, Labov and Herzog 1968), and as parts of a larger ecological matrix (field), where functional roles and usage of the linguistic codes for a wide range of purposes are more in focus. All three of these, language as particle, wave, and field (Lewis 1999; Pike 1959), are useful and important perspectives. Ethnologue focuses primarily on the unitary nature of languages without prejudice against the other perspectives.

Language and dialect

As part of the wave-like nature of language in general, every language is characterized by variation within the speech communities that use it. Innovations, as well as retentions of long-standing lexical, phonological, or grammatical features, spread like waves across geographic and social space and come and go over time. Varieties which share similar features diverge from one another to different degrees. Divergent varieties are often referred to as dialects. In some cases, they may be distinct enough that some would consider them to be separate languages. In other cases, the varieties may be sufficiently similar to be considered merely characteristic of a particular geographic region, social grouping, or historical era. Sometimes speakers may be very aware of dialect variation and be able to label a particular dialect with a name. In other cases, the variation may go largely unnoticed or overlooked. For many, the term dialect is a pejorative term that identifies a variety as being in some way deficient or inadequate.

To further complicate the issue, not all scholars share the same criteria for deciding what level of divergence distinguishes a “language” from a “dialect,” and therefore the terms are not always consistently applied. Since the fifteenth edition (2005), Ethnologue has followed the ISO 639-3 inventory of identified languages as the basis for our listing of distinct languages.

ISO 639-3 criteria for language identification

The ISO 639-3 standard applies the following basic criteria for defining a language in relation to varieties which may be considered dialects:

  • Two related varieties are normally considered to belong to the same individual language if speakers of each variety have inherent understanding of the other language variety (that is, can understand each other based on knowledge of their own language variety without needing to learn the other language variety) at a functional level.
  • Where spoken intelligibility between language varieties is marginal, the existence of a common literature or of a common ethnolinguistic identity with a central variety that both speaker communities understand can be strong indicators that they should nevertheless be considered language varieties of the same individual language.
  • Where there is enough intelligibility between varieties to enable communication, they can nevertheless be treated as different languages when they have long-standing distinctly named ethnolinguistic identities coupled with established standardization and literatures that are distinct.

These criteria make it clear that the identification of “a language” is not based on linguistic criteria alone.
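
As a rough illustration, the three criteria above can be read as a small decision procedure. The sketch below is a simplification under stated assumptions, not part of the ISO 639-3 standard: the `VarietyPair` fields and the `classify` function are hypothetical names introduced here, and the criteria's hedged wording (“normally,” “can be strong indicators,” “can nevertheless be treated”) is flattened into yes/no outcomes that a real assessment would weigh case by case.

```python
from dataclasses import dataclass


@dataclass
class VarietyPair:
    """Hypothetical summary judgments about two related varieties."""
    intelligibility: str  # "functional", "marginal", or "none"
    shared_literature: bool = False
    shared_identity_with_central_variety: bool = False
    distinct_named_identities: bool = False
    distinct_standards_and_literatures: bool = False


def classify(pair: VarietyPair) -> str:
    """Apply a simplified reading of the three criteria to one pair."""
    if pair.intelligibility not in {"functional", "marginal", "none"}:
        raise ValueError(f"unknown intelligibility: {pair.intelligibility!r}")

    if pair.intelligibility == "functional":
        # Third criterion: intelligible, but long-standing distinct
        # identities plus distinct standardization and literatures.
        if pair.distinct_named_identities and pair.distinct_standards_and_literatures:
            return "different languages"
        # First criterion: inherent understanding at a functional level.
        return "same language"

    if pair.intelligibility == "marginal":
        # Second criterion: a common literature or a common identity
        # with a central variety can outweigh marginal intelligibility.
        if pair.shared_literature or pair.shared_identity_with_central_variety:
            return "same language"
        return "different languages"

    # No inherent intelligibility (assumed here to mean distinct languages).
    return "different languages"


print(classify(VarietyPair("marginal", shared_literature=True)))
```

Note that two of the three branches turn on identity and literature rather than on intelligibility, which is why a purely linguistic measure cannot reproduce the inventory.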

The language entries in Ethnologue include a listing of dialect names. In most cases, those listings are not based on rigorous research using the methods of dialectology. Rather, these lists include all names reported to us which may, at one time or another, have been used in reference to a local variety of a language. As a result, some of the names listed may be alternate names for the same linguistic variety.


Macrolanguages

In addition to defining three-letter codes for individual languages, the ISO 639-3 standard also defines codes for macrolanguages. The latter are defined in the standard as “multiple, closely related individual languages that are deemed in some usage contexts to be a single language.” Macrolanguages were introduced into the standard in order to handle cases in which varieties would be considered distinct languages by the criterion of non-intelligibility as described above, but had already been given a code as a single language by the previously existing ISO 639-2 standard. For instance, Arabic [ara] and Chinese [zho] were already defined in ISO 639-2 on the basis of literature shared across many spoken varieties (and a shared writing system in the case of Chinese).

Languages like these (with their existing three-letter codes) were included in ISO 639-3 as macrolanguages, and the varieties that were so distinct as not to be intelligible to each other received new three-letter codes as individual languages. The standard then enumerates the set of individual languages that are the members of each macrolanguage. It is important to note that macrolanguages are more than just groups of related languages. The individual languages that comprise a macrolanguage must be closely related, and there must be some context in which they are commonly viewed as comprising a single language.
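
The relationship between a macrolanguage and its member languages can be pictured as a simple one-to-many mapping. The sketch below is illustrative only: the names `MACROLANGUAGE_MEMBERS` and `macrolanguage_of` are hypothetical, and the member codes shown (which do not appear in the text above) are a small sample chosen for illustration. The standard itself enumerates the complete membership of each macrolanguage.

```python
from typing import Optional

# Partial, illustrative membership lists keyed by macrolanguage code.
# The full sets are enumerated in the ISO 639-3 standard.
MACROLANGUAGE_MEMBERS = {
    "ara": ["arb", "arz"],         # Arabic: Standard Arabic, Egyptian Arabic
    "zho": ["cmn", "yue", "wuu"],  # Chinese: Mandarin, Yue, Wu
}


def macrolanguage_of(code: str) -> Optional[str]:
    """Return the macrolanguage code that lists `code` as a member, if any."""
    for macro, members in MACROLANGUAGE_MEMBERS.items():
        if code in members:
            return macro
    return None


print(macrolanguage_of("yue"))
```

A lookup like this treats each variety as an individual unit while still recording the larger grouping in which it is sometimes viewed as a single language.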

This edition of Ethnologue includes entries for 60 macrolanguages as identified in the ISO 639-3 standard. The addition of this conceptualization of language provides us with a way to represent the fact that linguistic varieties function simultaneously as both individual units and within a larger functional matrix. Macrolanguage entries are brief, consisting largely of a listing of the individual languages that comprise them.

Sign languages

There are hundreds of sign languages in the world, created and used by deaf people. This edition of Ethnologue lists 149 living sign languages. As the primary language of daily face-to-face communication for their respective communities of users, these languages fall within the scope of the Ethnologue. The deaf sign languages listed in language entries are those used exclusively within deaf communities. The listings include only natural sign languages, not signed versions of spoken languages (manual codes), which typically have names like “Signed English” or “Signed French.” Manual codes are, however, sometimes mentioned in the entries for individual sign languages. Generally, we do not include manual systems invented primarily for use by hearing people that are not full languages (e.g., hand signals in sports), though some manual systems that have been assigned ISO 639-3 codes and are used as second languages only are included in our listings.