Methodology

The purpose of Ethnologue is to provide a comprehensive listing of the known living languages of the world. The Ethnologue is intended more as a catalog than as an encyclopedia and so provides summary data rather than more extensive descriptions of identified languages. Information comes from numerous sources and is confirmed by consulting both reliable published sources and a large network of field correspondents.

Much of the focus of Ethnologue is on the less commonly known languages. Even a relatively limited range of information is hard to discover for all of the world’s languages, so the breadth of Ethnologue’s descriptions varies. The information is organized within specific categories as described below in Language information. With each new edition of Ethnologue the range and focus of the categories we use are refined and adapted to better serve those who use the Ethnologue as a reference and research tool. However, we can report little data beyond those categories. Greater detail and depth of description of many of the languages, especially the larger, more commonly studied languages, can best be found in other works.

Among the many reference works about languages, the unique contribution of Ethnologue is to document the global language ecology. By “language ecology” we mean “the study of interactions between any given language and its environment” (Haugen 1972:325). This involves not only the environments of geography and of genealogical linguistic relationships, but also the political and social environments of the society that uses the language. This includes the status a language has with respect to other languages that the community uses, and how the community members use it within their total repertoire of languages. Providing this kind of ecological overview is where the “sweet spot” of Ethnologue lies.

The information given on the demography, geography, typology, vitality, and development of the known living languages of the world can be useful to linguists, translators, anthropologists, bilingual educators, language planners, government officials, aid workers, field investigators, missionaries, students, and any others interested in languages.

The problem of language identification

How one chooses to define a language depends on the purposes one has in identifying one language as being distinct from another. Some base their definition on purely linguistic grounds, focusing on lexical and grammatical differences. Others may see social, cultural, or political factors as being primary. In addition, speakers themselves often have their own perspectives on what makes a particular language uniquely theirs. Those perspectives are frequently related to issues of heritage and identity much more than to the actual linguistic features. It is also important to recognize that not all languages are oral: sign languages constitute an important class of linguistic varieties that merit consideration.

Due to the nature of language and the various perspectives brought to its study, it is not surprising that a number of issues prove controversial. Foremost among these is the definition of the basic unit that the Ethnologue reports on: what constitutes a language?

Language as particle, wave, and field

Scholars recognize that languages are not always easily, or best, treated as discrete, identifiable, and countable units with clearly defined boundaries between them (Makoni and Pennycook 2006). Rather, a language is more often composed of waves of features that extend across time, geography, and social space. In addition, growing attention is being given to the roles or functions that language varieties play within the linguistic ecology of a region or a speech community. The Ethnologue approach to listing and counting languages as though they were discrete units does not preclude any of these more dynamic perspectives on the linguistic makeup of the countries and regions we describe.

While discrete linguistic varieties can be distinguished, we also recognize that those varieties exist in a complex set of relationships to each other. Languages can be viewed simultaneously as discrete units (particles) amenable to being listed and counted, as bundles of features across time and space (waves) that are best studied in terms of variational tendencies as examples of “change in progress” (Weinreich, Labov and Herzog 1968), and as parts of a larger ecological matrix (field), where functional roles and usage of the linguistic codes for a wide range of purposes are more in focus. All three of these, language as particle, wave, and field (Lewis 1999; Pike 1959), are useful and important perspectives. Ethnologue focuses primarily on the unitary nature of languages without prejudice against the other perspectives.

Language and dialect

As part of the wave-like nature of language in general, every language is characterized by variation within the speech communities that use it. Innovations of new features and retentions of long-standing lexical, phonological or grammatical features spread like waves across geographic and social space and come and go over time. Varieties which share similar features diverge from one another to different degrees. Divergent varieties are often referred to as dialects. In some cases, they may be distinct enough that some would consider them to be separate languages. In other cases, the varieties may be sufficiently similar to be considered merely characteristic of a particular geographic region, social grouping, or historical era. Sometimes speakers may be very aware of dialect variation and be able to label a particular dialect with a name. In other cases, the variation may go largely unnoticed or overlooked. For many, the term dialect is a pejorative term that identifies a variety as being in some way deficient or inadequate.

To further complicate the issue, not all scholars share the same criteria for deciding what level of divergence distinguishes a “language” from a “dialect”, and therefore the terms are not always consistently applied. Since the fifteenth edition (2005), Ethnologue has followed the ISO 639-3 inventory of identified languages (http://iso639-3.sil.org/) as the basis for our listing of distinct languages.

ISO 639 criteria for language identification

The ISO 639 standard (ISO 2023) applies the following basic criteria for determining whether two language varieties should be viewed as distinct languages or as dialects of the same language:

Two related language varieties are normally considered to belong to the same individual language if speakers of each language variety have inherent understanding of the other language variety at a functional level (i.e. they can understand each other based on knowledge of their own language variety without needing to learn the other language variety). Where such mutual intelligibility does not exist, the two language varieties are generally seen to belong to different individual languages.
Where spoken intelligibility between language varieties is marginal, the existence of a common literature or of a common ethnolinguistic identity with a central language variety that both speaker communities understand is a strong indicator that they should nevertheless be considered language varieties of the same individual language.
Where there is enough intelligibility between language varieties to enable communication, they can nevertheless be treated as different individual languages when they have long-standing, distinctly named ethnolinguistic identities coupled with established linguistic normalization and literatures that are distinct.

These criteria make it clear that the identification of “a language” is not based on linguistic criteria alone.
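
The three criteria above can be read as a small decision procedure. The following Python sketch is only an illustration of that reading, not part of the ISO 639 standard: the class name, the field names, and the three-way intelligibility scale ("full", "marginal", "none") are assumptions made for this example. The standard's wording is also softer than code ("normally", "generally", "can be treated"), and the treatment of marginal intelligibility without a shared literature or identity is our assumption.

```python
from dataclasses import dataclass


@dataclass
class VarietyPair:
    """Hypothetical summary of how two related language varieties relate."""
    intelligibility: str  # assumed scale: "full", "marginal", or "none"
    shared_literature_or_identity: bool = False        # second criterion
    distinct_identities_and_literatures: bool = False  # third criterion


def same_individual_language(pair: VarietyPair) -> bool:
    """Return True if the pair would normally count as one individual language."""
    if pair.intelligibility not in ("full", "marginal", "none"):
        raise ValueError(f"unknown intelligibility value: {pair.intelligibility!r}")
    if pair.intelligibility == "none":
        # First criterion: no mutual intelligibility, so different languages.
        return False
    if pair.intelligibility == "marginal":
        # Second criterion: a common literature or a common ethnolinguistic
        # identity tips a marginal case toward "same language".
        # Assumption: without that, the pair is treated as different languages.
        return pair.shared_literature_or_identity
    # Third criterion: intelligible varieties can still be separate languages
    # when they have long-standing, distinctly named identities and literatures.
    return not pair.distinct_identities_and_literatures
```

Note that only the first branch depends on linguistic evidence alone; the other two depend on literature and identity.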

The language entries in Ethnologue include a listing of dialect names. In most cases, those listings are not based on rigorous research using the methods of dialectology. Rather, these lists include all names reported to us which may, at one time or another, have been used in reference to a local variety of a language. Names listed may be alternate names for the same linguistic variety.

Macrolanguages

In addition to defining three-letter codes for individual languages, the ISO 639 standard also defines codes for macrolanguages. The latter are defined in the standard as “individual language that for the purpose of language coding can be subdivided into two or more other individual languages.” This has been done when multiple, closely related individual languages are deemed in some language coding contexts to be a single language. Macrolanguages were introduced into the standard in order to handle cases in which varieties would be considered distinct languages by the criterion of non-intelligibility as described above, but had already been given a code as a single language by the previously existing ISO 639-2 standard. For instance, Chinese [zho] was already defined in ISO 639-2 on the basis of literature and writing system shared across many language varieties that are not mutually intelligible when spoken.

Languages like these (with their existing three-letter codes) were included in ISO 639-3 as macrolanguages, and the varieties that were so distinct as not to be intelligible to each other received new three-letter codes as individual languages. The standard then enumerates the set of individual languages that are the members of each macrolanguage. It is important to note that macrolanguages are more than just groups of related languages. The individual languages that comprise a macrolanguage must be closely related, and there must be some use cases for public data interchange in which the varieties are treated as separate languages and others in which they need to be treated as varieties of a single language.

This edition of Ethnologue includes entries for 60 macrolanguages as identified in the ISO 639-3 standard. The addition of this conceptualization of language provides us with a way to represent the fact that linguistic varieties function simultaneously as both individual units and within a larger functional matrix. Macrolanguage entries are brief, consisting largely of a listing of the individual languages that comprise them.
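
As a rough illustration of the two use cases for data interchange, the Python sketch below maps member codes to their macrolanguage. The table is deliberately partial: it contains only Chinese [zho] with two of its member languages, Mandarin Chinese [cmn] and Yue Chinese [yue], and is not the full ISO 639-3 mapping. The function names are hypothetical.

```python
from typing import Optional

# Partial, illustrative table only; the standard enumerates the full membership.
MACROLANGUAGE_MEMBERS = {
    "zho": {"cmn", "yue"},  # Chinese: Mandarin Chinese, Yue Chinese (subset)
}


def macrolanguage_of(code: str) -> Optional[str]:
    """Return the macrolanguage code that contains `code`, if any."""
    for macro, members in MACROLANGUAGE_MEMBERS.items():
        if code in members:
            return macro
    return None


def interchange_code(code: str, collapse: bool) -> str:
    """Pick the code to use in a given data interchange context.

    collapse=True models contexts that treat member languages as varieties
    of a single language; collapse=False keeps them as separate languages.
    """
    if collapse:
        return macrolanguage_of(code) or code
    return code
```

With this table, `interchange_code("yue", collapse=True)` and `interchange_code("cmn", collapse=True)` both give `"zho"`, while `collapse=False` leaves each code unchanged.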

Sign languages

There are hundreds of sign languages in the world, created and used by deaf people. This edition of Ethnologue lists 158 living sign languages. As the primary language of daily face-to-face communication for their respective communities of users, these languages fall within the scope of the Ethnologue. The deaf sign languages listed in language entries are those used exclusively within deaf communities. The listings include only natural sign languages, not signed versions of spoken languages (manual codes), which typically have names like “Signed English” or “Signed French.” Manual codes are, however, sometimes mentioned in the entries for individual sign languages. Generally, we do not include manual systems invented primarily for use by hearing people that are not full languages (e.g., hand signals in sports), though some manual systems that have been assigned ISO 639-3 codes and are used as second languages only are included in our listings.

Language status

We summarize the status of each language in each country where it is used in the Status element of a language entry by reporting two types of information. The first is an estimate of the overall development versus endangerment of the language using the EGIDS scale (Lewis and Simons 2010). The second is a categorization of the official recognition given to a language within the country.

The EGIDS consists of 13 levels with each higher number on the scale representing a greater level of disruption to the intergenerational transmission of the language. Table 1 provides summary definitions of the 13 levels of the EGIDS.

Table 1. Expanded Graded Intergenerational Disruption Scale

Level	Label	Description
0	International	The language is widely used between nations in trade, knowledge exchange, and international policy.
1	National	The language is used in education, work, mass media, and government at the national level.
2	Provincial	The language is used in education, work, mass media, and government within major administrative subdivisions of a nation.
3	Wider Communication	The language is used in work and mass media without official status to transcend language differences across a region.
4	Educational	The language is in vigorous use, with standardization and literature being sustained through a widespread system of institutionally supported education.
5	Developing	The language is in vigorous use, with literature in a standardized form being used by some though this is not yet widespread or sustainable.
6a	Vigorous	The language is used for face-to-face communication by all generations and the situation is sustainable.
6b	Threatened	The language is used for face-to-face communication within all generations, but it is losing users.
7	Shifting	The child-bearing generation can use the language among themselves, but it is not being transmitted to children.
8a	Moribund	The only remaining active users of the language are members of the grandparent generation and older.
8b	Nearly Extinct	The only remaining users of the language are members of the grandparent generation or older who have little opportunity to use the language.
9	Dormant	The language serves as a reminder of heritage identity for an ethnic community, but no one has more than symbolic proficiency.
10	Extinct	The language is no longer used and no one retains a sense of ethnic identity associated with the language.

The EGIDS levels are designed to largely coincide with Fishman’s Graded Intergenerational Disruption Scale, or GIDS (Fishman 1991). We refer users to Fishman’s work for an orientation to this approach to evaluating endangerment and to the original work on EGIDS (Lewis and Simons 2010) for the rationale behind the development of the expanded framework. The descriptions of the levels presented here have been adjusted to take into account significant feedback on the scale that has been received since its initial development. Most notably, the EGIDS level descriptions have been reworded to take into account signed languages (Bickford et al 2014). Like the GIDS, the EGIDS at its core measures the level of disruption of intergenerational transmission. Therefore, stronger, more vital languages have lower numbers on the scale and weaker, more endangered languages have higher numbers.
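
Because the scale mixes plain numbers with lettered sublevels (6a, 6b, 8a, 8b), anyone working with EGIDS values in software needs an explicit ordering. The Python sketch below is one minimal way to encode Table 1; the variable and function names are assumptions for this example and do not describe how the Ethnologue database stores the scale.

```python
# Levels and labels in scale order, copied from Table 1.
EGIDS_LEVELS = [
    ("0", "International"),
    ("1", "National"),
    ("2", "Provincial"),
    ("3", "Wider Communication"),
    ("4", "Educational"),
    ("5", "Developing"),
    ("6a", "Vigorous"),
    ("6b", "Threatened"),
    ("7", "Shifting"),
    ("8a", "Moribund"),
    ("8b", "Nearly Extinct"),
    ("9", "Dormant"),
    ("10", "Extinct"),
]

# Position on the scale: a smaller rank means less disruption (more vital).
EGIDS_RANK = {level: rank for rank, (level, _label) in enumerate(EGIDS_LEVELS)}
EGIDS_LABEL = dict(EGIDS_LEVELS)


def is_more_vital(a: str, b: str) -> bool:
    """True if level `a` is stronger (lower on the scale) than level `b`."""
    return EGIDS_RANK[a] < EGIDS_RANK[b]
```

Sorting by `EGIDS_RANK` rather than by the level strings themselves avoids the trap that `"10"` sorts before `"2"` as text.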

In comparison to GIDS, the EGIDS includes some additional factors at both the stronger and weaker levels of the scale and thus adds some levels not included in the original scale. As a result, the EGIDS can be applied to all of the languages of the world. In addition, two of the levels in the GIDS (6 and 8) have been split (6a, 6b, 8a, 8b) in the EGIDS in order to allow for a finer-grained description of the state of intergenerational transmission in the presence of language shift (or revitalization). The EGIDS uses letters to distinguish these divided levels in order to maintain numbering alignment with Fishman’s better-known GIDS. Each number on the EGIDS has also been assigned a one or two word label that summarizes the state of development or vitality of the language. The labels are intended to provide mnemonics for those who prefer to use words rather than numbers. In a few cases, alternative labels are assigned to a level in order to distinguish significantly different situations that are associated with the same level on the scale. Table 2 lists the alternative labels that are used.

Table 2. Alternative labels for other special situations

Level	Label	Description
5	Dispersed	The language is fully developed in its home country, so that the community of language users in a different country has access to a standardized form and literature, but these are not promoted in the country in focus via institutionally supported education.
9	Reawakening	The ethnic community associated with a dormant language is working to establish more uses and more users for the language, with the result that new L2 speakers are emerging.
9	Second language only	The language was originally vehicular, but it is not the heritage language of an ethnic community and it no longer has enough users to have significant vehicular function.

How the EGIDS Works

The EGIDS is a multi-dimensional scale which focuses on different aspects of vitality at different levels. Like Fishman’s GIDS, the EGIDS, at its core, measures disruption in use. At the weakest levels of vitality, EGIDS 9 (Dormant) and EGIDS 10 (Extinct), the primary factor in focus is the function of the language as a marker of identity. If no one still associates the language with their identity, the language can be considered to be Extinct. If there is an ethnic group that associates its identity with the language but uses the language only for symbolic purposes to remind themselves of that identity, the language can be categorized as Dormant (EGIDS 9).

At EGIDS levels 6a (Vigorous), 6b (Threatened), 7 (Shifting), 8a (Moribund), and 8b (Nearly Extinct), the primary factor in focus is the state of daily face-to-face use and intergenerational transmission of the language. Each successively weaker level on the scale represents the loss of use, generation by generation.

EGIDS 4 (Educational) and EGIDS 5 (Developing) bring into focus the degree to which the ongoing use of the language is supported and reinforced by the use of the language in education. This centers on issues of standardization and literacy acquisition and the degree to which those are institutionally supported and have been adopted by the community of language users.

EGIDS 3 (Wider Communication) focuses primarily on the notion of vehicularity. If a language (whether written or not) is widely used by others as a second language and as a means of intergroup communication, it has greater vitality than a language with a smaller number of users and which is seen as being less useful by outsiders. Where we have data, we report the use of each language by speakers of other languages.

EGIDS 2 (Provincial) and EGIDS 1 (National) focus on the level of recognition and use given to the language by government. Beyond purely official use, however, the focus includes the widespread use of the language in media and the workplace at either the provincial (sub-national) or national levels. EGIDS 0 (International) is a category reserved for those few languages that are used as the means of communication in many countries for the purposes of diplomacy and international commerce. Because the Ethnologue organizes the language entries by country, EGIDS 1 (National) is the strongest vitality level that we report.

The EGIDS levels are hierarchical in nature. With only one exception, the scale assumes that each stronger level of vitality entails the characteristics of the levels below it. Thus, for example, a language cannot be characterized as EGIDS 5 (Developing) if it cannot also be characterized as being at EGIDS 6a (Vigorous). A language with written materials which is not used for day-to-day communication by all generations and which is not being passed on to all children cannot be categorized as EGIDS 5 (Developing). The one exception to this principle is EGIDS 3 (Wider Communication) where the vehicularity of languages of wider communication is counted as being weightier than the existence of an orthography and the use of the language in education. Some languages that are widely used for intergroup communication are not used in formal education and have no written materials. Were these languages to lose that vehicularity, they would drop directly to EGIDS 6a (Vigorous).
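
The factor in focus at each level, and the one exception to the hierarchy, can be sketched in Python as follows. This is a deliberately partial illustration under stated assumptions: the `Profile` fields and the `sketch_level` function are hypothetical, the function models only levels 3, 5, and 6a, and it is not the procedure we use to assign EGIDS estimates.

```python
from dataclasses import dataclass
from typing import Optional

# Primary factor in focus at each level, summarizing the paragraphs above.
PRIMARY_FACTOR = {
    "0": "international use",
    "1": "government recognition and use",
    "2": "government recognition and use",
    "3": "vehicularity",
    "4": "support through education",
    "5": "support through education",
    "6a": "daily use and intergenerational transmission",
    "6b": "daily use and intergenerational transmission",
    "7": "daily use and intergenerational transmission",
    "8a": "daily use and intergenerational transmission",
    "8b": "daily use and intergenerational transmission",
    "9": "identity",
    "10": "identity",
}


@dataclass
class Profile:
    """Hypothetical, simplified description of a language in one country."""
    vigorous_use: bool             # used by all generations, sustainably (6a)
    standardized_literature: bool  # standardized literature used by some (5)
    vehicular: bool                # widely used as an L2 between groups (3)


def sketch_level(p: Profile) -> Optional[str]:
    """Return "3", "5", or "6a"; None means weaker than 6a (not modeled)."""
    if not p.vigorous_use:
        # Written materials alone cannot make a language Developing.
        return None
    if p.vehicular:
        # The exception: vehicularity outweighs orthography and education.
        return "3"
    return "5" if p.standardized_literature else "6a"
```

Under this sketch, an unwritten language of wider communication is placed at level 3, and the same profile with `vehicular=False` falls directly to 6a, skipping levels 4 and 5.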

Further details of how we assign EGIDS levels are given below in the documentation on EGIDS estimates in the Language status section of a language description.

Official recognition

If a language has an official function within a country or is specifically recognized in legislation, the entry for the language includes a description of the nature of its recognition. When that recognition is by statute, the specific law is also cited. Table 3 lists and defines (with examples) the fifteen language recognition categories that are used.

In developing these recognition categories, we have adapted the general framework described by Cooper (1989:99–103). Following Stewart’s (1968) identification of the official function of languages in a country, Cooper further distinguishes between statutory, working, and symbolic official languages. To that we have added a further distinction between those same functions at either the national or the provincial level. This descriptive framework identifies the legal foundation (if any) for the recognition, the nature of the official use of the language, and the geopolitical scope of that use and recognition. The combination of these three parameters (legal status, nature of use, and scope of application) results in the first twelve function categories that are listed in table 3. The final three categories represent any other kind of statutory recognition for a language, either for some designated purpose at the national or the provincial level or by the association of the language with an officially recognized ethnic group.

The distinction between statutory and de facto functions is relatively straightforward. When a language function is described as statutory, it means that there is a legal document such as the constitution of the country, language or diversity policy legislation, or the like, that specifies the functions for which the language will be used. Whenever the function is identified as statutory, we provide the name of the relevant statute. We are unable at this time to distinguish in all cases between legislation that is in force and legislation which may not be enforced though it is still legally viable. As for de facto status, in many countries languages are commonly used for governance functions but there is no formal legislative mandate for that use. In those cases, we identify the function as de facto.

Table 3. Official recognition categories and definitions

Function	Definition	Example
Statutory national language	This is the language in which the business of the national government is conducted and this is mandated by law. It is also the language of national identity for the citizens of the country.	Bengali [ben] in Bangladesh, Indonesian [ind] in Indonesia, Spanish [spa] in Spain
Statutory national working language	This is a language in which the business of the national government is conducted and this is mandated by law. However it is not the language of national identity for the citizens of the country.	Danish [dan] in Greenland, English [eng] in India, French [fra] in Rwanda
Statutory language of national identity	This is the language of national identity and this is mandated by law. However, it is not developed enough to function as the language of government business.	Irish [gle] in Ireland, Kituba [mkw] in Congo, Maori [mri] in New Zealand
De facto national language	This is the language in which the business of the national government is conducted but this is not mandated by law. It is also the language of national identity for the citizens of the country.	Standard German [deu] in Germany, Japanese [jpn] in Japan, Setswana [tsn] in Botswana
De facto national working language	This is a language in which the business of the national government is conducted, but this is not mandated by law. Neither is it the language of national identity for the citizens of the country.	English [eng] in Botswana, Spanish [spa] in Andorra, Tagalog [tgl] in Philippines
De facto language of national identity	This is the language of national identity but this is not mandated by law. Neither is it developed enough or known enough to function as the language of government business.	Algerian Arabic [arq] in Algeria, Jamaican Creole English [jam] in Jamaica, Tokelauan [tkl] in Tokelau
Statutory provincial language	This is the language in which the business of provincial government is conducted and this is mandated by law. It is also the language of identity for the citizens of the province.	Assamese [asm] in India, Slovene [slv] in Italy, Turkish [tur] in Greece
Statutory provincial working language	This is a language in which the business of the provincial government is conducted and this is mandated by law. However, it is not the language of identity for the citizens of the province.	Portuguese [por] in Macao, Russian [rus] in Ukraine
Statutory language of provincial identity	This is the language of identity for the citizens of the province and this is mandated by law. However, it is not developed enough or known enough to function as the language of government business.	Danish [dan] in Germany, Turkmen [tuk] in Afghanistan, Walloon [wln] in Belgium
De facto provincial language	This is the language in which the business of the provincial government is conducted, but this is not mandated by law. It is also the language of identity for the citizens of the province.	Yue Chinese [yue] in China, Faroese [fao] in Denmark, Hausa [hau] in Nigeria
De facto provincial working language	This is a language in which the business of provincial government is conducted, but this is not mandated by law. Neither is it the language of identity for the citizens of the province.	Greek [ell] in Albania, Central Kurdish [ckb] in Iran
De facto language of provincial identity	This is the language of identity for citizens of the province, but this is not mandated by law. Neither is it developed enough or known enough to function as the language of government business.	Aceh [ace] in Indonesia, Khinalugh [kjj] in Azerbaijan, Tausug [tsg] in Philippines
Recognized language	There is a law at the national level that names this language and recognizes its right to be used and developed for some purposes.	Greek Sign Language [gss] in Greece, Mamara Sénoufo [myk] in Mali, Saafi-Saafi [sav] in Senegal
Provincially recognized language	There is a law at the provincial level that names this language and recognizes its right to be used and developed for some purposes.	Plains Indian Sign Language [psd] in Canada, Valencian Sign Language [vsv] in Spain
Language of recognized nationality	There is a law that names the ethnic group that uses this language and recognizes their right to use and develop their identity.	Lisu [lis] in China, Puma [pum] in Nepal

The nature of the use of a language in government operations is specified using the term “working” or “identity” or the absence of these terms. When a language is identified as a working language, it means that the operations of the government (debate in parliament, the language of the laws, the language used in government offices, on official forms) may be carried out in the language, but the language is not the language of identity of the majority of the citizens. There are many countries where an international language or the language of a colonial power is used for day-to-day operations of the government, but national (or provincial) identity is linked to a different language. On the other hand, when a language is identified as a language of identity, the reverse is true. The majority of citizens identify that language as being closely associated with their identity but for practical reasons the language is not generally used for governmental operations. In these cases, the language often has a very strong symbolic use to reinforce a common identity and to build national or provincial unity. In the final case, in which the language functions both as the working language of the government and as the language of identity for the majority of the citizens, the label for the category is simply “national language” or “provincial language”, implying both the working function and the identity function.
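
The first twelve labels in table 3 follow mechanically from the three parameters: legal status, nature of use, and geopolitical scope. The Python sketch below generates them in table order. The parameter value names (for instance `"both"` for a language with both the working and the identity function) are assumptions made for this example.

```python
from itertools import product

SCOPES = ("national", "provincial")
LEGAL_STATUSES = ("Statutory", "De facto")
USES = ("both", "working", "identity")  # "both" = working and identity functions


def recognition_label(legal: str, scope: str, use: str) -> str:
    """Build the category label for one combination of the three parameters."""
    if use == "both":
        return f"{legal} {scope} language"
    if use == "working":
        return f"{legal} {scope} working language"
    if use == "identity":
        return f"{legal} language of {scope} identity"
    raise ValueError(f"unknown use: {use!r}")


# Scope varies slowest so that the order matches table 3.
CATEGORIES = [
    recognition_label(legal, scope, use)
    for scope, legal, use in product(SCOPES, LEGAL_STATUSES, USES)
]
```

Two legal statuses, two scopes, and three kinds of use give 2 × 2 × 3 = 12 labels. The remaining recognition categories in table 3 fall outside this scheme and are not generated here.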

In terms of geopolitical scope, we distinguish between the national and provincial levels of recognition and use. When a language is identified as performing a particular function at the provincial level, we describe the geopolitical regions involved. If there are many, that description may be reduced to a summary statement.

Some languages are not used or recognized for all of the functions of governance as described above, but may instead be granted only partial or limited recognitions by law. Those languages have been identified more generically as a “recognized language”. Though our data are admittedly incomplete, we attempt to describe the nature of the recognition and its geopolitical scope in as many cases as possible. In addition, in some countries, ethnic groups or nationalities are given official recognition rather than their languages. In some cases these recognized nationalities speak multiple languages. We attempt to identify the languages of such officially recognized nationalities using the label “language of recognized nationality”.

The recognition category for each language is based on the best research available to us. As with all Ethnologue information, we welcome corrections and updates from informed users.

How references are cited

Because the Ethnologue is produced by extracting data from a database, there is a great deal of uniformity, and some stiffness, in the wording and phrases used. Frequently the data are maintained using a set of predetermined categories and labels. Because of this, the Ethnologue rarely quotes any source verbatim. Sources are acknowledged wherever specific statements or facts can be directly attributed to them.

Three kinds of source citations are used:

Published works are identified using standard in-text citations enclosed in parentheses. These consist of the author’s or editor’s surname followed by the year of publication. Up to two authors are listed in the citation. Published works authored or edited by three or more persons are cited using the first author’s surname followed by “et al” and the year of publication. The bibliographic details of all cited published works may be found in the bibliography.
Unpublished sources are also acknowledged when specific statements or facts are attributed to them. Unpublished works may include personal communications, manuscripts, unpublished reports, and other materials submitted to us. They are identified using in-text citations enclosed in parentheses in which the year of the communication is given first, followed by the source’s first initial and surname. Unpublished sources with multiple authors are handled in the same way as published source citations, except for the inversion of the order of the citation elements as just described. Unpublished sources are not further described in the bibliography.
Census data are cited like unpublished sources with the year first followed only by the word “census.” Though there may be electronic or print publications where such data can be accessed, we treat them this way in order to avoid swelling the bibliography beyond a manageable size.
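
The three citation patterns just described can be sketched in Python as below. The function names are hypothetical, and some details are assumptions because the text does not fix them: the exact spacing and punctuation of the initial in unpublished citations, and the use of initials for every listed author of a multi-author unpublished source.

```python
from typing import List, Tuple


def cite_published(surnames: List[str], year: int) -> str:
    """Published work: surname(s) then year; three or more authors use "et al"."""
    if not surnames:
        raise ValueError("at least one author or editor is required")
    if len(surnames) >= 3:
        names = f"{surnames[0]} et al"
    else:
        names = " and ".join(surnames)
    return f"({names} {year})"


def cite_unpublished(authors: List[Tuple[str, str]], year: int) -> str:
    """Unpublished source: year first, then first initial and surname.

    `authors` holds (initial, surname) pairs. Assumed rendering: "J. Doe".
    """
    if not authors:
        raise ValueError("at least one source is required")
    rendered = [f"{initial}. {surname}" for initial, surname in authors]
    if len(rendered) >= 3:
        names = f"{rendered[0]} et al"
    else:
        names = " and ".join(rendered)
    return f"({year} {names})"


def cite_census(year: int) -> str:
    """Census data: year first, followed only by the word "census"."""
    return f"({year} census)"
```

For example, `cite_published(["Lewis", "Simons"], 2010)` produces `(Lewis and Simons 2010)`, the form used for that work elsewhere in this document.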

We have made strenuous efforts to track and identify all the sources cited in the text and we are happy to supply additional details upon request.

Country information

The country names used as headings are not official names, but are the commonly known names of the countries in English. Ethnologue uses the ISO 3166 standard as a starting point in determining what geopolitical entities to list as countries. As a consequence, some political dependencies are listed in a country section of their own while others are included within the larger country with which they are associated. We generally do not create a separate country section for smaller geopolitical entities which have only one widespread language. The Ethnologue takes no position on issues of national sovereignty by this arrangement, which is intended solely to facilitate the navigation of the published information.

The information elements reported on the main page for each country are as follows:

Official name

If the name used by the country in its official documents differs from the popular English name as given in the heading, the official name of the country is listed here. There may be more than one official name listed in more than one language. In a few cases, additional or former names used to identify the country are also included.

Sovereignty status

If the geopolitical entity is not a fully sovereign nation, a comment is given here to describe its status in relation to the sovereign state with which it is associated.

Population

These figures are taken from the most recent national census data where available or are the current estimated population from the United Nations or another reliable source, which is identified. Country populations from these sources may be estimates based on population trends rather than the results of actual head counts.

General remarks

The country information may also contain general remarks about the political status, the geography, or the population.

Principal languages

Languages that have been identified as having a working function at the nation-wide level are listed here, whether this is by statute or is the de facto situation. For a fuller discussion of what we mean by a working function, see Official recognition.

Literacy rate

This is an estimate of the percentage of the population in the country that is literate in any language. Data are primarily from UNESCO but may come from various other sources if more recent estimates are available.

International conventions

We have identified 9 conventions within the body of international law that affirm the language and culture rights of indigenous and minority peoples. This element of the country information lists which of these conventions the country in focus has subscribed to. Knowing this information can be of use to those advocating for indigenous and minority languages within the country.

Table 4 lists the international conventions that Ethnologue reports on. The Acronym column gives the abbreviation by which the convention is referenced in the country information; the Full name is given in the second column. See especially identifies the articles of the convention that are particularly pertinent to language use and development. Year adopted is the year in which the convention was adopted by the international body that has promulgated it. Most conventions, however, do not go into force for a particular country until the government of that country takes a further step to ratify them. The country information places the year of ratification by the country after the acronym in parentheses; the Year displayed column names the action that was taken in the year given.

Table 4. International conventions pertaining to language rights

Acronym	Full name	See especially	Year adopted	Year displayed
ACHPR	African Charter on Human and Peoples’ Rights	Article 22 affirms “cultural development”	1987	Ratification
CDE	Convention against Discrimination in Education	Article 5.1c	1960	Deposit
CPPDCE	Convention on the Protection and Promotion of the Diversity of Cultural Expressions	Article 6.2b	2005	Deposit
CSICH	Convention for the Safeguarding of Intangible Cultural Heritage	Article 2.2a	2003	Deposit
ECRML	European Charter for Regional or Minority Languages	Articles 7, 8	1992	Ratification
FCPNM	Framework Convention for the Protection of National Minorities	Articles 5, 10-12, 14	1998	Ratification
ICCPR	International Covenant on Civil and Political Rights	Article 27	1966	Accession, Succession, or Ratification
ILOCITP	ILO Convention on Indigenous and Tribal People	Articles 28, 30	1989	Ratification
UNCRPD	United Nations Convention on the Rights of People with Disabilities	Articles 9.2e, 21b, 24.3b, 30.4	2006	Formal confirmation, Accession, or Ratification
UNDRIP	United Nations Declaration on the Rights of Indigenous Peoples	Article 13	2007	Voted for

General references

This lists author-year citations for published sources of general information about the country and its languages including sources which we may have consulted in developing the language maps of the country. See Bibliography for the full bibliographic references of the cited works. This list offers suggestions for those who wish to begin exploring the language situation of the country. It is not intended to be exhaustive and sources for specific languages are included in the individual language entries. Suggestions for additional or more up-to-date general works on the languages of the country are solicited. See Updates and corrections for submission instructions.

Deaf population

There are millions of deaf and hearing-impaired people in the world. The country overview gives information on the number of audiologically deaf people (which is generally larger than the number of deaf people who use a sign language). The deaf sign languages listed in language entries are those used exclusively within deaf communities. See the fuller discussion under The problem of language identification.

Recognized nationalities

If the country has a system for officially recognizing nationalities within its borders, it is described here. The officially recognized nationality with which the individual languages are associated is reported in the Status section of the language entries.

Language counts (with profile graphic)

The number of established languages in the country is given with a breakdown of the number of living languages versus extinct languages. The counts of living languages are further broken down into languages which are indigenous or non-indigenous in the country and into the summary categories of Institutional (EGIDS 0-4), Developing (EGIDS 5), Vigorous (EGIDS 6a), In trouble (EGIDS 6b-7), and Dying (EGIDS 8a-9). These counts are represented visually in a country profile histogram that shows the relative number of languages in each of these five summary vitality categories. These categories are represented by bars of purple, blue, green, yellow, and red, respectively. See Language status for a detailed discussion of the EGIDS levels that make up these summary categories. Macrolanguages are not included in these counts since they are not distinct from, but encompass, the individual languages that are already counted.
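
A minimal sketch of how the five summary categories might be tallied for a country profile is given below. The names `SUMMARY_CATEGORIES` and `profile_counts` are hypothetical, and the expansion of the ranges (for example, reading 8a-9 as 8a, 8b, and 9) is our interpretation of the ranges given above. A trailing asterisk on an EGIDS estimate is ignored for counting purposes.

```python
from collections import Counter

# Summary category -> EGIDS levels, following the ranges given in the text.
SUMMARY_CATEGORIES = {
    "Institutional": ["0", "1", "2", "3", "4"],  # purple bar
    "Developing": ["5"],                         # blue bar
    "Vigorous": ["6a"],                          # green bar
    "In trouble": ["6b", "7"],                   # yellow bar
    "Dying": ["8a", "8b", "9"],                  # red bar
}

LEVEL_TO_CATEGORY = {
    level: category
    for category, levels in SUMMARY_CATEGORIES.items()
    for level in levels
}


def profile_counts(egids_levels):
    """Count living languages per summary category.

    `egids_levels` holds one EGIDS estimate per individual language;
    macrolanguages are assumed to have been excluded by the caller.
    """
    counts = Counter(LEVEL_TO_CATEGORY[level.rstrip("*")] for level in egids_levels)
    return {category: counts.get(category, 0) for category in SUMMARY_CATEGORIES}
```

For example, `profile_counts(["1", "5", "6a", "6a*", "6b", "8b"])` yields one Institutional, one Developing, two Vigorous, one In trouble, and one Dying language.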

Language information

The title for a language page is an anglicized form of the name used to refer to that language in that country. In most cases the name corresponds to the ISO 639-3 reference name associated with the ISO 639-3 code. Where the users of the language have expressed a preference for a different name, Ethnologue generally follows that preference. In other cases, the primary name may be the most well-known English (or anglicized) name associated with the language. Names are generally recorded using English spellings, though diacritical marks may be included. For some language names in southern Africa special symbols are used to represent the click sounds produced with ingressive mouth air.

The subtitle names the primary country for the language. When a language is spoken in more than one country, Ethnologue designates one of the countries as primary, usually the country of origin. In cases where the language is indigenous in multiple countries, the country having the most users is designated as primary.

A complete language description contains the following elements. Follow the link for a full description of the element. Each language description includes only the elements for which information is known.

Language identification gives the code assigned to the language by the ISO 639-3 standard, plus a list of alternate or other names that have been used to refer to the language.
User population gives the number of people in the country who use this language, plus the total number of users worldwide if it is used in multiple countries. These user populations are broken down into first and second language users when the data are available. Also included may be monolingual population, ethnic population, and other comments about population.
Location describes where the language users are located within the country.
Language status gives the EGIDS level for the language in the country and describes the level of official recognition, if any. If the language is associated with an officially recognized nationality or ethnic group, that association is reported here.
Classification provides the language classification.
Dialects lists the names that have been used to refer to varieties of the language, as well as giving information about dialect relations in terms of intelligibility and lexical similarity with other varieties if available. Includes macrolanguage membership if applicable.
Typology provides typological information, including brief descriptions of basic word order, significant phonological, morphological, and syntactic features, and other matters of interest to linguists.
Language use gives information about domains of use, age of speakers, other comments on the viability of the language and patterns of use, the use of other languages by this language community, and the use by others of this language as a second language.
Language development gives information about literacy rates, use in education, language documentation and development products, revitalization efforts, and language development agencies.
Digital support gives information about the digital vitality of the language as measured on the Digital Language Support scale.
Writing gives information about writing systems and scripts used for the language.
Other comments gives information identifying non-indigenous languages and all additional information about the language or ethnic group, including primary religious affiliations.
Language resources gives a link to the page from the Open Language Archives Community (OLAC) catalog that lists resources in and about the language.

If the language has significant use in other countries, subentries for these countries are listed at the bottom of the page. Information like classification and typology which is the same in every country is not repeated in these subentries.

Language identification

The entry begins with the international three-letter ISO 639-3 code that is used to identify the language uniquely, plus a list of other names that have been used to identify it.

ISO 639-3 code. The code assigned to the language by the ISO 639-3 standard (ISO 2007) is given in lower-case letters within square brackets. When a given language is spoken in multiple countries, all of the entries for that language use the same three-letter code. The code distinguishes the language from other languages with the same or similar names and identifies those cases in which the name differs across country borders. These codes ensure that each language is counted only once in world or area statistics.
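
The role of the code in avoiding double counting can be sketched as follows. This is an illustration under stated assumptions: the heading format "Name [code]" and both function names are hypothetical, and only the bracketed, lower-case, three-letter form of the code comes from the description above.

```python
import re

# Three lower-case letters within square brackets, as described in the text.
ISO_639_3_IN_BRACKETS = re.compile(r"\[([a-z]{3})\]")


def extract_code(entry_heading):
    """Return the bracketed three-letter code, or None if there is none."""
    match = ISO_639_3_IN_BRACKETS.search(entry_heading)
    return match.group(1) if match else None


def count_distinct_languages(country_entries):
    """Count each language once, however many countries list it.

    `country_entries` is a list of (country, code) pairs.
    """
    return len({code for _country, code in country_entries})
```

With entries for [eng] in two countries and [spa] in one, `count_distinct_languages` returns 2 rather than 3.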

Alternate names. Many languages are known by or have been referenced in the literature by more than one name. Alternate names come from many diverse sources: speakers may have more than one name for their language, or neighboring groups may use different names. Other names may have been assigned by outsiders and used in ethnographic or linguistic publications before the name used by the speakers themselves was known. Another source of alternate names is variant spellings of what is essentially the same name. In many cases, spellings used in languages of wider communication or in regional languages are also included in the list. Some names may identify the ethnic group or place names that have been used in the literature as names for the language.

Some names, used in the past or in use by others, are pejorative and offensive to the speakers of the language. Those are identified, wherever they are listed, by enclosing the name in double quotation marks and appending the label pej. (pejorative) following the name. We include these names as a means of helping users find languages they may have only heard of or seen referred to by such names. By so doing, Ethnologue in no way implies any endorsement of the pejorative names.

Autonym. This is the “self name”, that is, the name of the language in the language itself. The form given is a standard spelling within the writing system of the language, which means that this field is never reported for an unwritten language. When the script is non-Roman or contains unusual characters, a romanization of the name is given in parentheses.

User population

Population data have been provided from many different sources over a number of years. This diversity among sources and dates frequently causes the totals of the populations for all of the languages in any given country to differ markedly from the total current census population of the country.

We do not extrapolate population estimates to bring them up-to-date, since populations of language communities do not necessarily increase or decrease at the same rate within a country and since some initial estimates themselves turn out to have been incorrect to start with. However, some population data submitted to the Ethnologue may be the result of extrapolation.

It is often difficult to get an accurate estimate of the number of speakers of a language. All figures are only estimates; this is true even for census figures. Some sources do not include all dialects in their figures or may count as a single language two languages identified separately in the ISO 639-3 inventory. Some sources count members of ethnic groups, who, in some cases, may not be speakers of the language. Some sources do not make clear whether they refer to the total number of speakers in all countries, or only to those in one of the countries. We attempt to distinguish first-language (L1) users from second-language (L2) users. In a case where the source combines these into a single number, we identify the population as “all users”, rather than “L1 users” or “L2 users”.

Country user population. This field begins with a number; it is the number of all known users in the country. It is suffixed with “all users” if it is known to combine L1 and L2 users and further information about the breakdown follows if it is known. If the initial number is suffixed with “L2 users”, then the only known user population is for L2 users. If the initial number has neither of these phrases suffixed, then it is an estimate of L1 users and there are no known L2 users.
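
The reading rules for this field can be sketched as a small parser. The function name is hypothetical, and the exact layout of the field (digits with optional thousands separators, followed directly by the suffix) is an assumption; only the meaning of the two suffixes and of their absence is taken from the paragraph above.

```python
import re

# A leading number, optionally followed by one of the two documented suffixes.
_POPULATION = re.compile(r"\s*(\d[\d,]*)\s*(all users|L2 users)?")


def interpret_population(field):
    """Return (number, interpretation) for a country user population field."""
    match = _POPULATION.match(field)
    if match is None:
        raise ValueError(f"field does not begin with a number: {field!r}")
    number = int(match.group(1).replace(",", ""))
    suffix = match.group(2)
    if suffix == "all users":
        kind = "L1 and L2 users combined"
    elif suffix == "L2 users":
        kind = "L2 users only"
    else:
        kind = "L1 users (no known L2 users)"
    return number, kind
```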

Languages that are no longer in use, but still have ethnic group members who identify with the language, are listed as having “No known speakers” in place of a population figure. Languages that have neither societal use nor remaining ethnic group members are described as “Extinct”. Languages which have no L1 speakers but which are used for specific purposes by a community are identified as “Second Language Only”.

Dates and sources for population data are given where available using the conventions described in How references are cited. Where the word “census” appears as the source, it is generally the national census of the country and is not included in the list of references cited. In some cases the source is a government agency (but not the official census) or another organization. Only when the citation has the form “Author Year” will the source appear in the list of references cited; see How references are cited.

Population stability comment. For some languages, we are able to indicate whether the L1 speaker population is increasing or decreasing. This information also contributes to an overall evaluation of ethnolinguistic vitality. There may be a few cases in which the actual speaker population count is not known or is unreported, but the stability and general trend of the population is evident and has been commented on.

Population remarks. Additional information concerning populations may include population breakdowns (by dialect, gender, ethnic groups, or specific villages or communities), the population of the deaf community (in the case of sign language entry), or other comments on demographics. In the case of an extinct or dormant language, an estimate of when the last speakers died is given when available.

Monolingual population. When the data are available, the number of those who are monolingual is reported. In some cases it is reported as a percentage of the L1 speaker population. Where it is known that there are no monolingual users of the language that fact is reported. This information along with the total speaker population is an indicator of the vitality of the language.

Ethnic population. Where it is known, the population of those who identify themselves as part of the ethnic group, whether or not they speak the language, is given. A language with no first-language speakers will be reported as extinct when the ethnic population figure is zero, absent, or unknown. When the reported L1 speaker population is zero but there is an ethnic population figure, the language will be reported as having “No known speakers”.
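
The decision between a population figure, “No known speakers”, “Second Language Only”, and “Extinct” can be sketched as below. The function name and its parameters are hypothetical, and checking for second-language use before checking the ethnic population is an assumed order of precedence that the text does not state.

```python
from typing import Optional


def population_label(
    l1_speakers: int,
    ethnic_population: Optional[int],
    used_as_l2: bool = False,
) -> str:
    """Choose what appears in place of, or as, the population figure."""
    if l1_speakers > 0:
        return f"{l1_speakers:,}"
    if used_as_l2:
        # No L1 speakers, but a community uses it for specific purposes.
        return "Second Language Only"
    if ethnic_population:
        # Ethnic group members remain who identify with the language.
        return "No known speakers"
    # Ethnic population is zero, absent, or unknown.
    return "Extinct"
```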

Location

A description of the locations where the language is spoken is included in each entry where a specific area can be defined. Those languages that are used everywhere in a country or specified region are reported as “Widespread”. These languages may not appear on the country maps. Languages that are widely dispersed in specific locations, or that are used by nomadic groups, are identified as “Scattered.” Generally, where locations are known, they are listed in descending order from the largest geopolitical subdivision to the smallest. Major administrative subdivisions are followed by a colon followed by a comma-separated list of subordinate locations. The list of locations may not be exhaustive and locations other than the first-order subdivisions may not be ranked accurately in the list.
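
A location description in this form could be split into its parts roughly as follows. This is a sketch only: the function name is hypothetical, the place names in the example are invented placeholders, and the semicolon used to separate one major subdivision from the next is an assumption, since the text specifies only the colon and the comma-separated list.

```python
def parse_locations(description):
    """Map each major subdivision to its list of subordinate locations."""
    text = description.strip()
    if text in ("Widespread", "Scattered"):
        return {text: []}
    result = {}
    for part in text.split(";"):  # separator between subdivisions is assumed
        major, _colon, minor = part.partition(":")
        result[major.strip()] = [m.strip() for m in minor.split(",") if m.strip()]
    return result
```

For instance, `parse_locations("Region A: Town 1, Town 2; Region B")` gives `{"Region A": ["Town 1", "Town 2"], "Region B": []}`.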

Language status

This part of the entry reports on the vitality status of the language in the country, describes its official function in the country, and supplies additional background information for a language of wider communication (LWC).

EGIDS estimate. The vitality status of the language in the country is summarized by estimating its level on the Expanded Graded Intergenerational Disruption Scale (EGIDS); see the complete section on Language status for a listing of the levels. In cases where the rest of the language entry is sparse in terms of reporting facts about the situation of the language, this estimate can be taken to be the best guess of contributors familiar with the region.

An EGIDS estimate is provided only for languages that are judged to be “established” within the country. This includes all languages that are indigenous to the country, plus any languages originating from elsewhere that have become rooted in that country. We judge a non-indigenous language to have become established in a country when it meets the following two characteristics. First, it is being acquired by the next generation. This can take place by various means—in the home, through mandatory schooling, or in the work place. Second, its use is a norm (whether as L1 or L2) within a language community or a community of practice. The community of practice may include students who learn it as an L2 in a widespread, mandatory educational system.

From the point of view of sustaining language use, the single most significant break in the EGIDS scale is the divide between 6a and 6b. For languages that are 6a and higher, it is the norm that the language is being learned by all the children within its user community. But at level 6b and below, this is no longer the norm and intergenerational transmission is being disrupted. The determination about that is based on information that has been reported about whether or not all children use the language. For cases in which no such information has been reported, we use an asterisk as a modifier on the EGIDS estimate to indicate that it represents our editorial best guess. Thus 5* or 6a* indicates a language that we think is most likely to be in vigorous use by all, while 6b* indicates a language that we believe is most likely to be losing speakers. These judgments have been made by comparing the population of the language in question to the populations of all the other languages in the same country or region for which there is explicit data about whether the language is vigorous or is beginning to shift.
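
The asterisk convention and the 6a/6b divide can be illustrated with a short sketch. The names below are hypothetical, and the list of levels covers only those referred to in this section.

```python
# EGIDS levels from strongest to weakest, as referred to in this section.
EGIDS_ORDER = ["0", "1", "2", "3", "4", "5", "6a", "6b", "7", "8a", "8b", "9"]


def parse_egids(estimate):
    """Return (level, is_editorial_guess, transmitted_to_all_children)."""
    is_editorial_guess = estimate.endswith("*")
    level = estimate.rstrip("*")
    if level not in EGIDS_ORDER:
        raise ValueError(f"unrecognized EGIDS level: {estimate!r}")
    # At 6a and higher, learning by all children is the norm;
    # at 6b and below, intergenerational transmission is disrupted.
    transmitted = EGIDS_ORDER.index(level) <= EGIDS_ORDER.index("6a")
    return level, is_editorial_guess, transmitted
```

Under this sketch, `parse_egids("6a*")` returns `("6a", True, True)`, while `parse_egids("6b")` returns `("6b", False, False)`.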

Special cases. There are two cases in which the status field is reported differently as either unestablished or unattested. Due to the increasing influence of human migration on the world language situation, Ethnologue now includes full entries for immigrant languages that have a significant presence in a host country. Languages identified as Unestablished are those that have not yet become rooted in the host country and thus do not share the characteristics described above of being transmitted to the next generation within the country as a norm for a language community or a community of practice. These include the first languages of refugees, newly arrived immigrants, temporary foreign workers, or immigrants who are so scattered as not to form significant speech communities within the host country. These may also include languages learned as an L2 by a significant number of people in the country through elective classes in education.

In a few cases, there is real doubt as to whether the language actually exists. Although an ISO 639-3 code has been assigned, data on the existence of the language are not convincing. In such cases, we do not report an EGIDS level but identify the language status as Unattested. A full entry is published in order to document what the ISO 639-3 code is meant to signify, but the language is not counted in the statistics as a living language. Languages identified as unattested are submitted to the ISO 639-3 review process and removed from future editions if they become deprecated by the ISO 639-3 standard.

Function in country. If the language has been officially recognized in legislation or serves official functions at the national or provincial levels, there is an additional note naming the nature of the recognition and function. If the recognition is statutory, the statute is identified. If the recognition is regional, the region where the status is assigned is identified. The categories for recognition and function are described in the section on Official recognition.

LWC information. If the reported language status is EGIDS 3 (Wider Communication) and the data are available, further information about the history or the nature of the use of this language as an LWC by L1 speakers of other languages is described.

Classification

This part of the entry names the linguistic affiliation of the language.

All languages are slowly changing, and linguistically related varieties may be diverging or merging. Most languages are related to other languages—to some more closely and to others more distantly. Linguists have used terms such as phylum, stock, family, branch, group, language, and dialect to refer to these relationships in increasing order of linguistic similarity much like a family tree.

Linguistic classification. The classification information for each language follows the general order from largest grouping to smallest. More inclusive group names are given first, followed by the names for less inclusive subgroups, separated by commas.

Language classification information comes from a variety of sources. The Ethnologue attempts to report the generally accepted consensus of scholars working in the language family based on published works and scholarly review. The sources on which the classifications are based are not overtly cited in the language entry but may be included in the list of general references listed at the country level.

A listing of the highest-level language families (including the number of languages, average populations, and countries where spoken) is given in the Summary by Language Family. The family trees may be browsed by going to Browse Languages by Family.

Dialects

This part of the entry gives information about the names of dialects of the language. It may also describe the relationships among dialects or to other languages in terms of dialect intelligibility and lexical similarity. It also includes macrolanguage membership if applicable.

Dialect names. Speech varieties which are functionally intelligible to each other’s speakers because of linguistic similarity are generally considered dialects of the same language and the names of all such dialects are listed under that language. In addition, alternate names for individual dialects are listed in parentheses following the primary name for the dialect. When one of these names is known to be offensive to its speakers, it is placed in double quotes (and tagged as pejorative with the abbreviation “pej.” as is also done for alternate language names).

The listing of dialect names is not the result of rigorous dialectological investigations. As with the alternate names, the list of dialect names includes all names reported to us which may, at one time or another, have been used in reference to some variety of a language. Some of these names are village or regional names and may not actually represent significant linguistic variants. In a few cases, the ISO 639-3 standard has assigned individual language identification codes to varieties which we, on the advice of our contributors and consultants, have included in our list of dialects. In those very few cases, we depart from the ISO 639-3 standard and do not list these varieties separately as individual languages.

Intelligibility and dialect relations. A measure of inherent intelligibility with other varieties is given by percent. Values of less than 85% are likely to signal difficulty in comprehension of the indicated language.

The ability of the users of one variety to understand another variety, based only on the similarity of those two varieties, is called inherent intelligibility. Intelligibility may not be reciprocal or mutual, thus the wording of the intelligibility description may indicate the direction of the intelligibility (e.g., 85% intelligibility of another variety, or 85% intelligibility by speakers of another variety). If the direction of intelligibility is not indicated (e.g., 85% intelligibility with another variety) or is identified as being mutual, it should be understood as being reciprocal with speakers of each of the varieties mentioned understanding each other equally well.

The ability of speakers to understand another variety because of previous exposure to it or learning is called acquired intelligibility and may be commented on in some language entries.

Lexical similarity. The percentage of lexical similarity between two linguistic varieties is determined by comparing a set of standardized wordlists and counting those forms that show similarity in both form and meaning. Percentages higher than 85% usually indicate a speech variant that is likely a dialect of the language with which it is being compared. Unlike intelligibility, lexical similarity is bidirectional or reciprocal.
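
The two 85% rules of thumb can be stated compactly as below. The function names are hypothetical, and both thresholds are indicative only, as the hedged wording of the text ("likely", "usually") makes clear. A value of exactly 85% falls on neither side.

```python
THRESHOLD_PERCENT = 85


def likely_comprehension_difficulty(intelligibility_percent):
    """Inherent intelligibility below 85% is likely to signal difficulty."""
    return intelligibility_percent < THRESHOLD_PERCENT


def likely_dialect(lexical_similarity_percent):
    """Lexical similarity above 85% usually indicates a likely dialect."""
    return lexical_similarity_percent > THRESHOLD_PERCENT
```

Note that intelligibility must be recorded per direction, whereas a single lexical similarity figure serves for both varieties being compared.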

Macrolanguage membership. If an individual language is a member of a macrolanguage (see Macrolanguages), that fact is reported here. The listing gives the name of the macrolanguage of which the individual language is a member, the name of the primary country under which its entry is found (if different from the current country), and the ISO code for the macrolanguage. By looking up that entry, it is possible to find a list of all the members of the macrolanguage.

Typology

A list of linguistic features of the language is given. Constituent order is the most commonly reported feature. Other basic characteristics that are of particular interest to linguists are also reported when the data are available. In a growing number of cases these listings are more extensive and cover a range of linguistic features, including information about the existence of prepositions versus postpositions, constituent order in noun phrases, gender, case, transitivity and ergativity, canonical syllable patterns, the number of consonants and vowels, the existence of tone, and in some cases whether users of the language also use whistle speech. These descriptions are no more than brief mentions, however, and do not constitute adequate descriptions of the language.

Language use

This part of the entry gives information about the use and viability of the language, as well as the use of other languages by members of the community. These data, for the most part, provide supporting evidence for the assignment of the EGIDS status; see the section on Language status above.

Vitality remarks. As a general summary, in situations where the language is being passed on to children as their first language, or where it is used frequently and widely within the community, the term “Vigorous” is most often used. Other factors related to language vitality that may be reported are descriptions of languages that are used, use of this language by others, and the degree and nature of language shift that may be taking place.

Domains of use. When more than one language is used in a community, speakers often establish patterns of language use for specific configurations of speakers, topics, and locations. These domains of language use can be described by answering the well-known question, “Who speaks which language to whom, about what, and where?” In some language entries, we are able to specify a set of identified domains of use and we may also report whether the domain is associated exclusively with the language or is one where mixed language use is prevalent.

The Ethnologue does not have sufficient data about each language to permit a full description of the domains of use in this technical sense. Instead, we most often use the term to refer to a general set of categories that name the context in which communication takes place (e.g., home, community, work, education, and religion); these are only indirectly related to the topics and speakers most generally associated with those settings.

User age range. As language use shifts from a traditional language to one of wider communication, differences in use appear between age groups. As language change takes place, older adults tend to be the final speakers of the traditional language. This field describes the age range of those who use the language as an L1. When possible the value is chosen from the following picklist:

Used by all — The language is used by virtually everyone in every age group.
Some young people, all adults — All adults still use the language, but among children and youth, some use it and some do not.
Some of all ages — Language shift has been in progress for multiple generations; as a result, in each generation there are some who use the language and some who do not.
Adults only — No children or youth use the language; all remaining users are in the child-bearing generation and older.
Older adults only — The only remaining speakers are middle-aged and older (e.g., 45 and above).
Elderly only — The only remaining speakers are of the great-grandparent generation (e.g., 70 and above).

Language attitudes. This field describes the general attitudes of the language community itself towards the use of its own language. We report only summary attitude evaluations as positive attitudes, neutral attitudes, or negative attitudes. Where attitudes towards use of the language are not the same throughout the community, we may report “mixed attitudes”.

Bilingualism remarks. Descriptions of the use of second languages by this language community are included here. Generally the remark consists of the phrase “Also use” followed by the name(s) of the additional languages. If use of a particular L2 is restricted to a particular domain or region or population segment, a comment to that effect is added.

These statements may be modified by a term estimating the extent of the second-language usage. The terms correspond to fairly broad percentage ranges as follows:

All — At least 95% of the ethnic population use the reported language as L2.
Most — At least 65% but less than 95% of the ethnic population use the reported language as L2.
Many — At least 35% but less than 65% of the ethnic population use the reported language as L2.
Some — At least 5% but less than 35% of the ethnic population use the reported language as L2.
Few — Less than 5% of the ethnic population use the reported language as L2.

These quantifiers are frequently based on the best estimates reported to us, though in some cases they represent calculated conversions of reported percentages over a wide time period. The bilingualism remarks are constructed automatically from the Ethnologue database with the result that they are sometimes repetitive or redundant.
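The percentage ranges above can be read as a simple lookup. The following is a minimal sketch of that reading, not the conversion actually used in the Ethnologue database; the function name `l2_quantifier` and the input validation are assumptions made for illustration.

```python
def l2_quantifier(percent: float) -> str:
    """Map the share of the ethnic population using an L2 to a quantifier.

    `percent` is a value from 0 to 100. The boundaries follow the ranges
    listed above; the function itself is a hypothetical illustration.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    if percent >= 95:
        return "All"
    if percent >= 65:
        return "Most"
    if percent >= 35:
        return "Many"
    if percent >= 5:
        return "Some"
    return "Few"
```

Each lower bound is inclusive, so a reported figure of exactly 65% would read as "Most" and 64.9% as "Many".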

When significant language shift has taken place, the “Also use” wording is changed to “Shifting to” (in the case of EGIDS 7) or “Shifted to” (in the case of EGIDS 8a, 8b, and 9) to indicate that the named language is the one that has been adopted in the home domain as the new L1 among children. If the entry lists additional languages with the “Also use” designation, these indicate languages that are an L2 for both the L1 user community and for those who have shifted to a different L1.
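The choice of introductory wording described above might be sketched as follows. This is an illustration only: the function names, the representation of EGIDS levels as strings, and the comma-separated output format are assumptions, and the language name in the usage note is a placeholder.

```python
# EGIDS levels that trigger the changed wording, as stated in the text above.
SHIFTING_LEVELS = {"7"}
SHIFTED_LEVELS = {"8a", "8b", "9"}


def shift_phrase(egids: str) -> str:
    """Return the introductory phrase for the language adopted as the new L1.

    Any level not named above falls back to "Also use" (an assumption).
    """
    if egids in SHIFTING_LEVELS:
        return "Shifting to"
    if egids in SHIFTED_LEVELS:
        return "Shifted to"
    return "Also use"


def bilingualism_remark(egids: str, languages: list[str]) -> str:
    """Build a remark in a hypothetical format: phrase plus language names."""
    return f"{shift_phrase(egids)} {', '.join(languages)}."
```

For example, `bilingualism_remark("7", ["Language A"])` yields "Shifting to Language A." under these assumptions.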

Use as second language. When the language in focus is used by others as a second language (as reported in the bilingualism remarks in other language entries), this is indicated with the phrase “Used as L2 by ...”. Following this introductory phrase is a list of the other languages that are reported to use this one as a second language. As with L2 use, this report of usage does not imply any specific level of proficiency.

Language development

This part of the entry gives information about literacy rates, use in education, publications and use in media, revitalization efforts, and language development agencies.

Literacy rates. When available, percentages of the speaker population who are literate are given for L1 and L2 languages. When the L2 is not specifically identified, it is assumed to be the dominant language of the country in focus or another major language in the vicinity.

Literacy remarks. Information concerning motivation for literacy and the existence of government (and other) literacy programs is given where available. Additional information concerning literacy that does not appear in related categories may also be reported here.

Use in education. The use of a language in formal education within the country at primary, secondary, and tertiary levels is reported when known. The statement is introduced with the words “Taught in” when the language is the language of instruction for the basic curriculum across multiple subjects. Such a statement indicates the existence of written curricular materials in the language, as well as its oral use in the classroom. Alternatively, the phrase “Taught as subject” indicates that the curricular use of the language is limited to language subjects, including mother tongue reading and writing. When the language of instruction begins in early grades as the home language and transitions in later grades to a more dominant language, or when instruction as a language subject is limited to just some years, the statement is qualified by indicating the applicable grade levels.

Publications and use in media. The existence of materials that have been produced in the language such as language documentation (dictionaries, grammars, texts), printed literature, and broadcast media is indicated when known. We report the existence of such materials but do not list titles individually. Where extensive literature and media exist, we identify the language as “Fully developed”.

The most widely published book in the world is the Bible: at least portions have been translated and published in 3,181 (44%) of the living languages listed in the Ethnologue. Our information on the existence of the biblical text comes from a variety of sources. The information about Bible publication for each language is given with the dates of both the earliest and the most recent published Bible, New Testament (NT), Old Testament (OT), or complete books (portions) of the Bible.

Revitalization efforts. When formalized efforts to revitalize an endangered language have been reported, a cursory description of those efforts is given.

Language development agencies. Agencies that focus on the revitalization, maintenance, or development of the language are listed. These may be national or provincial official or semi-official entities or they may include formally constituted local organizations. In general, international development organizations are not included here. Additions to the existing information are welcomed.

Digital support

This part of the entry provides an assessment of the digital vitality of the language by reporting its level on the Digital Language Support (DLS) scale. The assessment involves harvesting the lists of supported languages from the websites of over 140 digital tools that were selected to represent a full range of ways that digital technology can support languages. Our methodology is described in full in Assessing digital language support on a global scale (Simons et al. 2022). The assessment is reported in two parts: DLS level (DLS score).

DLS level. The degree to which a language is digitally supported is summarized on a five-level scale. The following are not the formal definitions of the levels, but in practice they roughly correspond to the following kinds of support:

Still — The language shows no signs of digital support.
Emerging — The language has some content in digital form or some encoding tools.
Ascending — The language has some spell checking or localized tools or machine translation as well.
Vital — The language is supported by multiple tools in all of the above categories and some speech processing as well.
Thriving — The language has all of the above plus virtual assistants.

DLS score. Mokken scale analysis is used to transform the harvested data into a numerical score that is ultimately reported as a proportion ranging from 0 (no support detected) to 1.0 (supported in every category by many vendors). When the score is 0, the level is reported as Still. A score greater than 0 maps to one of the other four levels: 0.12 is the upper bound for Emerging, 0.48 is the upper bound for Ascending, and 0.83 is the upper bound for Vital. The exact score is reported in parentheses so that one can compare the relative separation between two languages on the DLS growth curve.
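The mapping from score to level can be sketched as a short function. This is a minimal illustration, not the published implementation: it assumes each upper bound is inclusive, that any score above 0.83 is Thriving, and that the score is printed to two decimal places. The names `dls_level` and `format_dls` are hypothetical.

```python
def dls_level(score: float) -> str:
    """Map a DLS score in [0, 1] to a DLS level.

    Assumption: the stated upper bounds are inclusive.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score == 0:
        return "Still"
    if score <= 0.12:
        return "Emerging"
    if score <= 0.48:
        return "Ascending"
    if score <= 0.83:
        return "Vital"
    return "Thriving"


def format_dls(score: float) -> str:
    """Render the two-part report "DLS level (DLS score)".

    Assumption: two decimal places; the actual display precision may differ.
    """
    return f"{dls_level(score)} ({score:.2f})"
```

Under these assumptions, `format_dls(0.5)` produces "Vital (0.50)".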

Writing

For each language, the script used for written materials is given if known. Where multiple scripts are associated with the language they are reported in alphabetical order. Where possible we also report any specific style of a script that is used, the years when a script began to be used or ceased to be used, and other comments regarding writing and orthography. In general, where no script is identified, it can be assumed that there is no widely accepted and used writing system. Scripts other than transcription systems also exist for some sign languages but are not in wide use and so are not currently reported.

Other comments

This part of the entry gives additional information that does not fit under the above categories.

Non-indigenous. A language that did not originate in the country, but which is now established there either as a result of its longstanding presence or because of institutionally supported use and recognition is identified here with the label “Non-indigenous”. In general, these non-indigenous languages represent two different situations: Some are heritage languages associated with a long-established community which originated elsewhere. In many, but not all, of these cases the language is losing speakers as its users shift to a more dominant language. Others are major languages that are being transmitted to large numbers of people as a second language through formal educational institutions resulting in widespread second-language acquisition and growing use.

General remarks. These are general statements about the language or its context that do not fall into other specific categories. Alternate identifications of the language community or ethnic group may be identified or explained here. These may include government recognized or official nationalities, ethnic names, or the meanings or derivations of certain names. Other historical and ethnographic information may be included here as well.

Religion. The religious affiliations of the speakers of the language are given where known. These are generally listed in descending order of number of adherents.

Macrolanguage member languages. If the entry is describing a macrolanguage, then a complete list is given of the individual languages that fall within the scope of the macrolanguage. See Macrolanguages for a discussion of this concept.

Second Language Only status. While there are many languages that are used as second languages by large populations of speakers, the phrase “Second language only” is used to identify a specific category made up of those languages which are used as second languages but have no L1 speakers and generally weak or secondary ethnic or identity associations. These may include languages of special use, such as languages of initiation, languages of interethnic communication, liturgical languages, as well as cants and jargons. Most often these languages are given a status of EGIDS 3 (Wider Communication) but are identified in this way as well because of the absence of L1 speakers.

Language resources

This part of the entry provides a link to the catalog page for this language compiled by the Open Language Archives Community (OLAC). It lists all of the resources held by participating archives that are known to be in or about the language. OLAC compiles an aggregated catalog of approximately 400,000 items held by more than 60 archives.

Use in other countries. If the language is present in more than one country, the entries for the language in those countries are listed at the bottom of the page.

Language maps

Maps showing the locations of language homelands are available for most countries of the world. Where data are available, mapped language areas show the distribution of first-language (L1) speakers of established languages, unless explicitly stated on the map. The few second-language (L2) distributions we show are all in separate insets, distinct from the main map.

Where detailed population data are available, we typically use the convention of showing a language in an area if at least 25% of the population in that area speak the language fluently as an L1. However, a minority language may be shown on a map even if its speakers do not make up 25% of the population in any given area. This exception is made when the minority language is indigenous to that area only, or when there is historical significance for the language in that region. An exception may also be made for an immigrant language when the speakers are confined to a specific region and we desire to show the approximate extent of their locations. In other cases, the language locations have been plotted using information supplied by language surveyors or by other field linguists. Where none of these sources is available, map locations are plotted using the best available published sources. The location of an immigrant language population is mapped only if the language is classified by Ethnologue as established in the country. The locations of these and other minority languages are not mapped within major cities.
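The mapping convention and its exceptions can be summarized as a decision rule. The sketch below is illustrative only: the class `AreaLanguage`, its field names, and the function `show_on_map` are hypothetical, and the exceptions that the text describes as discretionary ("may be shown") are treated here as always applied. The major-city rule is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class AreaLanguage:
    """Hypothetical record for one language within one mapped area."""
    l1_share: float                      # fraction of the area's population using it fluently as L1
    indigenous_only_here: bool = False   # indigenous to this area only
    historically_significant: bool = False
    is_immigrant: bool = False
    established: bool = True             # classified as established in the country
    regionally_confined: bool = False    # immigrant speakers confined to a specific region


def show_on_map(lang: AreaLanguage) -> bool:
    """Apply the 25% convention and the stated exceptions (simplified)."""
    # Immigrant languages are mapped only if established in the country.
    if lang.is_immigrant and not lang.established:
        return False
    # Main convention: at least 25% of the area's population are fluent L1 users.
    if lang.l1_share >= 0.25:
        return True
    # Exceptions for minority languages below the threshold.
    if lang.indigenous_only_here or lang.historically_significant:
        return True
    if lang.is_immigrant and lang.regionally_confined:
        return True
    return False
```

In practice the exceptions involve editorial judgment, so a rule like this can only approximate the decisions behind a given map.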

Most of the maps make use of polygons to show the approximate boundaries of the language groups. No claim is made for precision in the placement of these boundaries, which in many instances overlap with those of other languages. Reference numbers are used on maps where space does not allow the placement of language names. For some maps where the language boundaries are not known, the names or numbers appear alone.

The earliest maps in Ethnologue were hand drawn. Maps of Central and South America were commissioned for the 10th edition, and some maps of Africa were added in the 11th. For the 12th edition computer-generated maps were developed as part of the Language Mapping Project carried out jointly with Global Mapping International (GMI). For the 16th edition, all of the maps were redrawn with a new and clearer design. The capabilities afforded to us by a new generation of mapping software made it possible to improve the way that we show the family association of each language and the overlap of languages. The current maps are drawn using ArcGIS® software published by Esri®. We continue to increase the level of geographic detail included in the maps to aid in locating the language groups within a country, especially adding state or province boundaries and administrative centers to many more maps.

Most of the maps in this publication include the use of geodata from Esri®. You can see a list of specific data sources here: Esri Data and Maps (https://www.esri.com/content/dam/esrisites/en-us/media/legal/redistribution-rights/redist-rights-104.pdf). A few other maps are drawn against the backdrop of the Seamless Digital Chart of the World (SDCW) that was published by GMI. The Esri geodata has a finer level of resolution than the base map used for the maps in earlier editions. In 2020, all the language polygons were repositioned to fit the greater detail of geographic features offered by that new dataset. This improved data will be incorporated into the remainder of the maps over the next several editions. We continue to improve the accuracy and precision with which the language areas are plotted, particularly with the increased use of location data collected by GPS units and satellite imagery, such as Google Maps and the Esri® Living Atlas. We acknowledge use of some Esri® feature services published in conjunction with Michael Bauer Research GmbH for the development of the language polygons for some European countries.

The introduction of GPS data and the increasing use of census data have also meant that the data are becoming increasingly detailed. Smooth generalized curves are being replaced by more complex features and, in some countries, we are now able to provide a more accurate representation of the complexity where speakers from several language groups live in the same area. We continue to look for ways to improve the depiction of the language groups on the maps. The change to a primarily online format and multiple print volumes has given greater freedom to increase the number of maps. We continue to take advantage of this flexibility to redesign the maps of some countries at larger scales. We also continue to update the maps to reflect changes in the ISO 639-3 inventory of languages.

We acknowledge use on some maps of national park data adapted from the International Union for Conservation of Nature (IUCN) and United Nations Environment Program — World Conservation Monitoring Centre (UNEP-WCMC) (2015). The World Database on Protected Areas (WDPA) is available at www.protectedplanet.net. Some toponymic information is based on the Geographic Names Database, containing standard names approved by the United States Board on Geographic Names and maintained by the National Geospatial-Intelligence Agency. More information is available at the products and services link at www.nga.mil. Where appropriate, ISO standard administrative names are used. We also gratefully acknowledge the use of resources shared by the Myanmar Information Management Unit (MIMU, themimu.info) in developing the maps for Myanmar.

Continental maps that display the locations of sign languages were first published in the 26th edition. Since users of deaf community sign languages are scattered across regions, we often are not able to display precise locations for them. Instead, we have published country-wide or regional sign language areas. Shared signing community languages are almost always local, and these languages continue to appear on the country maps. There are also a few deaf community languages where it is helpful to show both a local area that is important to the community, as well as a country-wide area, such as Myanmar Sign Language. In such cases we have been able to map known concentrations of speakers as well as allowing for scattered distributions of speakers throughout the country. Refer to both the continental sign language maps and the country maps in these situations.

Another map variety, first published in the 27th edition, is a vehicular language map. The term “vehicular language” is defined as follows:

Vehicularity refers to the degree to which a language is used not only by the group of people who associate it most closely with their identity, but also by others who find that the language provides them with rewards and benefits that can’t be as easily derived by using any other language. ... This dimension is called vehicularity since the language serves as a vehicle of communication that can take a user beyond the spheres of activity in which they could operate without it. (Lewis and Simons 2016:104)

When multiple vehicular languages are in use within a region by the speakers of its local languages, it is helpful to create different maps for the same region, one showing the range of the vehicular languages and the other showing the local languages. See, for example, the local and vehicular maps for Sumatra in Indonesia.

The maps use a variety of map projections: African equatorial countries use the Sinusoidal projection. Other equatorial countries use the Mercator (cylindrical) projection. Maps of countries in higher latitudes use the Lambert Conformal Conic projection.
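The projection choice can be expressed as a small rule. This is a sketch under stated assumptions: the text does not define how a country is classed as equatorial, so the function takes that classification as an input, and every non-equatorial country is treated as "higher latitude". The function name and parameters are hypothetical.

```python
def choose_projection(is_equatorial: bool, in_africa: bool) -> str:
    """Pick a map projection name following the three cases described above.

    Both flags are supplied by the caller; how a country is judged to be
    equatorial is not specified in the text and is not modeled here.
    """
    if is_equatorial and in_africa:
        return "Sinusoidal"
    if is_equatorial:
        return "Mercator"
    return "Lambert Conformal Conic"
```

For example, `choose_projection(is_equatorial=True, in_africa=False)` returns "Mercator".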

As with all of the content of the Ethnologue, no political statement is intended by the identification of any territory separately in a map or in the language listings nor by the placement of any boundary lines for any languages or countries on any map.