Language Information


The title for a language page is an anglicized form of the name used to refer to that language in that country. In most cases the name corresponds to the ISO 639-3 reference name associated with the ISO 639-3 code. Where the users of the language have expressed a preference for a different name, Ethnologue generally follows that preference. In other cases, the primary name may be the best-known English (or anglicized) name associated with the language. Names are generally recorded using English spellings, though diacritical marks may be included. For some language names in southern Africa, special symbols are used to represent the click sounds produced with ingressive mouth air.

The subtitle names the primary country for the language. When a language is spoken in more than one country, Ethnologue designates one of the countries as primary, usually the country of origin of the language or the country where most of the language users are located.

A complete language description contains the following elements. Follow the link for a full description of the element. Each language description includes only the elements for which information is known.

  • Language identification gives the code assigned to the language by the ISO 639-3 standard, plus a list of alternate or other names that have been used to refer to the language.
  • Population gives the number of people in the country who use this language, plus the total number of users worldwide if it is used in multiple countries. These user populations are broken down into first and second language users where the data is available. Also included may be monolingual population, ethnic population, and other comments about population.
  • Location describes where the language users are located within the country.
  • Language status gives the EGIDS level for the language in the country and describes the level of official recognition, if any. If the language is associated with an officially recognized nationality or ethnic group, that association is reported here.
  • Classification provides the language classification.
  • Dialects lists the names that have been used to refer to varieties of the language, as well as giving information about dialect relations in terms of intelligibility and lexical similarity with other varieties if available. Includes macrolanguage membership if applicable.
  • Typology provides typological information, including brief descriptions of basic word order, significant phonological, morphological, and syntactic features, and other matters of interest to linguists.
  • Language use gives information about domains of use, age of speakers, viability and patterns of use, the use of other languages by this language community, and the use of this language by others as a second language.
  • Language development gives information about literacy rates, use in education, language documentation and development products, revitalization efforts, and language development agencies.
  • Writing gives information about writing systems and scripts used for the language.
  • Other comments gives information identifying non-indigenous languages and all additional information about the language or ethnic group, including primary religious affiliations.

If the language is indigenous or established in other countries, subentries for these countries are listed at the bottom of the page. Information like classification and typology which is the same in every country is not repeated in these subentries.

Language identification

The entry begins with the international three-letter ISO 639-3 code that is used to identify the language uniquely, plus a list of other names that have been used to identify it.

ISO 639-3 code. The code assigned to the language by the ISO 639-3 standard (ISO 2007) is given in lower-case letters within square brackets. When a given language is spoken in multiple countries, all of the entries for that language use the same three-letter code. The code distinguishes the language from other languages with the same or similar names and identifies those cases in which the name differs across country borders. These codes ensure that each language is counted only once in world or area statistics.
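
As a minimal sketch of the bracket convention described above, the following Python snippet pulls a bracketed three-letter code out of a line of text and counts distinct codes. The function names `extract_iso_code` and `count_languages` and the sample strings are hypothetical; only the lower-case, square-bracket format and the "counted only once" rule come from the text.

```python
import re
from typing import Iterable, Optional

# Three lower-case ASCII letters inside square brackets, e.g. "[eng]".
ISO_CODE_PATTERN = re.compile(r"\[([a-z]{3})\]")


def extract_iso_code(text: str) -> Optional[str]:
    """Return the first bracketed three-letter code in text, or None."""
    match = ISO_CODE_PATTERN.search(text)
    return match.group(1) if match else None


def count_languages(entry_titles: Iterable[str]) -> int:
    """Count distinct codes, so a language listed under several
    countries contributes only once to the total."""
    codes = {extract_iso_code(title) for title in entry_titles}
    codes.discard(None)  # ignore lines that carry no code
    return len(codes)
```

For example, two country entries that both carry `[eng]` would be counted as one language by `count_languages`.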

Alternate names. Many languages are known by or have been referenced in the literature by more than one name. Alternate names come from many diverse sources: speakers may have more than one name for their language, or neighboring groups may use different names. Other names may have been assigned by outsiders and used in ethnographic or linguistic publications before the name used by the speakers themselves was known. Another source of alternate names is variant spellings of what is essentially the same name. In many cases, spellings used in languages of wider communication or in regional languages are also included in the list. Some names may identify the ethnic group or place names that have been used in the literature as names for the language.

Some names, used in the past or in use by others, are pejorative and offensive to the speakers of the language. Those are identified, wherever they are listed, by enclosing the name in double quotation marks and appending the label pej. (pejorative) following the name. We include these names as a means of helping users find languages they may have only heard of or seen referred to by such names. By so doing, Ethnologue in no way implies any endorsement of the pejorative names.
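
The marking convention for pejorative names could be read mechanically along the following lines. This is only a sketch: it assumes the names arrive as a comma-separated list and that a pejorative name is written as `"Name" (pej.)` with straight quotes, neither of which is specified above. The function name and the placeholder names are hypothetical.

```python
import re
from typing import List, Tuple

# Assumed layout: "Name" (pej.)  (the exact punctuation is a guess).
PEJORATIVE_PATTERN = re.compile(r'^"(?P<name>[^"]+)"\s*\(pej\.\)$')


def parse_alternate_names(field: str) -> List[Tuple[str, bool]]:
    """Split a comma-separated name list into (name, is_pejorative) pairs."""
    parsed = []
    for raw in field.split(","):
        item = raw.strip()
        if not item:
            continue  # skip empty fragments such as a trailing comma
        match = PEJORATIVE_PATTERN.match(item)
        if match:
            parsed.append((match.group("name"), True))
        else:
            parsed.append((item, False))
    return parsed
```

Keeping the flag alongside the name lets a search index still match the pejorative form while a display layer can label it or hide it.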

Autonym. This is the “self name”, that is, the name of the language in the language itself. The form given is a standard spelling within the writing system of the language; this field is therefore never reported for an unwritten language. When the script is non-Roman or contains unusual characters, a romanization of the name is given in parentheses.


Population

Population data have been provided from many different sources over a number of years. This diversity among sources and dates frequently causes the totals of the populations for all of the languages in any given country to differ markedly from the total current census population of the country.

We do not extrapolate population estimates to bring them up to date, since populations of language communities do not necessarily increase or decrease at the same rate within a country and since some initial estimates later prove to have been incorrect. However, some population data submitted to the Ethnologue may be the result of extrapolation.

It is often difficult to get an accurate estimate of the number of speakers of a language. All figures are only estimates; this is true even for census figures. Some sources do not include all dialects in their figures or may count as a single language two languages identified separately in the ISO 639-3 inventory. Some sources count members of ethnic groups, who, in some cases, may not be speakers of the language. Some sources do not make clear whether they refer to the total number of speakers in all countries, or only to those in one of the countries. Some do not distinguish first-language (L1) speakers from second-language (L2) speakers.

Country speaker population. This field begins with a number, which is the number of all known users in the country. The number is suffixed with "all users" if it is known to combine L1 and L2 users, and further information about the breakdown follows if it is known. If the initial number is suffixed with "L2 users", then the only known user population is for L2 users. If the initial number has neither of these phrases suffixed, then it is an estimate of L1 users and there are no known L2 users.
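
The three cases above can be sketched as a small reader for the start of the field. The names `PopulationReading` and `read_population` are hypothetical, and the sketch assumes the suffix phrase directly follows the number (optionally after a comma or space); the actual layout of the field may differ.

```python
import re
from dataclasses import dataclass

# Leading number (thousands separators allowed), then an optional suffix.
POPULATION_PATTERN = re.compile(r"^(\d[\d,]*)[\s,]*(all users|L2 users)?")


@dataclass
class PopulationReading:
    count: int
    kind: str  # "all users", "L2 users", or "L1 users"


def read_population(field: str) -> PopulationReading:
    """Interpret the leading number and its optional suffix."""
    match = POPULATION_PATTERN.match(field.strip())
    if match is None:
        raise ValueError(f"field does not begin with a number: {field!r}")
    count = int(match.group(1).replace(",", ""))
    # No suffix means the number is an estimate of L1 users.
    kind = match.group(2) or "L1 users"
    return PopulationReading(count=count, kind=kind)
```

Fields that begin with a label rather than a number, such as “No known speakers”, are deliberately rejected here and would need separate handling.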

Languages that are no longer in use, but still have ethnic group members who identify with the language, are listed as having “No known speakers” in place of a population figure. Languages that have neither societal use nor remaining ethnic group members are described as “Extinct”. Languages which have no L1 speakers but which are used for specific purposes by a community are identified as “Second Language Only”.

Dates and sources for population data are given where available using the conventions described in How references are cited. Where the word “census” appears as the source, it is generally the national census of the country and is not included in the list of references cited. In some cases the source is a government agency (but not the official census) or another organization. Only when the citation has the form “Author Year” will the source appear in the list of references cited; see How references are cited.

Population stability comment. For some languages, we are able to indicate whether the L1 speaker population is increasing or decreasing. This information also contributes to an overall evaluation of ethnolinguistic vitality. There may be a few cases where the actual speaker population count is not known or is unreported, but the stability and general trend of the population is evident and has been commented on.

Monolingual population. Where the data are available, the number of those who are monolingual is reported. In some cases it is reported as a percentage of the L1 speaker population. Where it is known that there are no monolingual users of the language that fact is reported. This information along with the total speaker population is an indicator of the vitality of the language.

Ethnic population. Where it is known, the population of those who identify themselves as part of the ethnic group, whether or not they speak the language, is given. A language with no first-language speakers will be reported as extinct when the ethnic population figure is zero, absent, or unknown. When the reported L1 speaker population is zero but there is an ethnic population figure, the language will be reported as having “No known speakers”.
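
The labelling rules for languages without first-language speakers can be summarized in a short decision function. This is a sketch under stated assumptions: the function name and parameters are hypothetical, and the order in which the “Second Language Only” case is checked relative to the other two is an assumption, since the text does not state a precedence.

```python
from typing import Optional


def population_label(
    l1_speakers: int,
    ethnic_population: Optional[int] = None,
    has_l2_community: bool = False,
) -> Optional[str]:
    """Return the label used in place of a population figure, if any."""
    if l1_speakers > 0:
        return None  # a numeric population figure is reported instead
    if has_l2_community:
        # Assumed precedence: community use for specific purposes wins.
        return "Second Language Only"
    if ethnic_population:
        # A non-zero ethnic population still identifies with the language.
        return "No known speakers"
    # Ethnic population is zero, absent, or unknown.
    return "Extinct"
```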

Population remarks. Additional information concerning populations may include population breakdowns (by dialect, gender, ethnic groups, or specific villages or communities), the population of the deaf community (in the case of a sign language entry), or other comments on demographics.


Location

A description of the locations where the language is spoken is included in each entry where a specific area can be defined. Those languages that are used everywhere in a country or specified region are reported as “Widespread”. These languages may not appear on the country maps. Languages that are widely dispersed in specific locations, or that are used by nomadic groups, are identified as “Scattered”. Generally, where locations are known, they are listed in descending order from the largest geopolitical subdivision to the smallest. Major administrative subdivisions are followed by a colon and then a comma-separated list of subordinate locations. The list of locations may not be exhaustive, and locations other than the first-order subdivisions may not be ranked accurately in the list.
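
The colon and comma layout described above lends itself to a simple parser. The sketch below assumes that successive major subdivisions are separated by semicolons, which the text does not state; the function name and the placeholder place names are hypothetical.

```python
from typing import Dict, List


def parse_location(field: str) -> Dict[str, List[str]]:
    """Map each major subdivision to its list of subordinate locations."""
    locations: Dict[str, List[str]] = {}
    # Assumption: major subdivisions are separated by semicolons.
    for chunk in field.split(";"):
        chunk = chunk.strip().rstrip(".")
        if not chunk:
            continue
        # Text before the colon is the major subdivision; the rest is
        # a comma-separated list of subordinate locations (may be empty).
        major, _, rest = chunk.partition(":")
        locations[major.strip()] = [
            part.strip() for part in rest.split(",") if part.strip()
        ]
    return locations
```

Because the source notes that subordinate locations may not be ranked accurately, the returned lists should be treated as unordered.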

Language status

This part of the entry reports on the vitality status of the language in the country, describes its official function in the country, and supplies additional background information for a language of wider communication (LWC).

EGIDS estimate. The vitality status of the language in the country is summarized by estimating its level on the Expanded Graded Intergenerational Disruption Scale (EGIDS); see the complete section on Language Status for a listing of the levels. In cases where the rest of the language entry is sparse in terms of reporting facts about the situation of the language, this estimate can be taken to be the best guess of contributors familiar with the region.

Function in country. If the language has been officially recognized in legislation or serves official functions at the national or provincial levels, there is an additional note naming the nature of the recognition and function. If the recognition is statutory, the statute is identified. If the recognition is regional, the region where the status is assigned is identified. The categories for recognition and function are described in the section on Official Recognition.

LWC information. If the reported language status is EGIDS 3 (Wider Communication) and the data are available, further information about the history or the nature of the use of this language as an LWC by L1 speakers of other languages is described.


Classification

This part of the entry names the linguistic affiliation of the language, including macrolanguage membership if applicable.

All languages are slowly changing, and linguistically related varieties may be diverging or merging. Most languages are related to other languages, to some more closely and to others more distantly. Linguists have used terms such as phylum, stock, family, branch, group, language, and dialect to refer to these relationships in increasing order of linguistic similarity, much like the levels of a family tree.

Linguistic classification. The classification information for each language follows the general order from largest grouping to smallest. More inclusive group names are given first, followed by the names for less inclusive subgroups, separated by commas.
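
Because the classification runs from the most inclusive group to the least, it can be treated as a path in a tree. The following sketch splits a classification string into that path and finds the groups two languages share; the function names and the placeholder group names are hypothetical, and the sketch assumes that group names themselves contain no commas.

```python
from typing import List


def parse_classification(field: str) -> List[str]:
    """Split a classification into groups, largest grouping first."""
    return [group.strip() for group in field.split(",") if group.strip()]


def shared_groups(path_a: List[str], path_b: List[str]) -> List[str]:
    """Return the leading groups that two classification paths share."""
    common = []
    for group_a, group_b in zip(path_a, path_b):
        if group_a != group_b:
            break  # the paths diverge here
        common.append(group_a)
    return common
```

A longer shared prefix corresponds to a closer relationship in the family tree; an empty result means the two languages are classified under different top-level families.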

Language classification information comes from a variety of sources. The Ethnologue attempts to report the generally accepted consensus of scholars working in the language family based on published works and scholarly review. The sources on which the classifications are based are not overtly cited in the language entry but may be included in the list of general references listed at the country level. The sources used for classifications are available on request by contacting the Editor; see Contact us.

A listing of the highest-level language families (including the number of languages, average populations, and countries where spoken) is given in the Statistical Summaries. The family trees may be browsed by going to Browse by Language Family.


Dialects

This part of the entry gives information about the names of dialects of the language. It may also describe the relationships among dialects or to other languages in terms of dialect intelligibility and lexical similarity.

Dialect names. Speech varieties which are functionally intelligible to each other’s speakers because of linguistic similarity are generally considered dialects of the same language and the names of all such dialects are listed under that language. In addition, alternate names for individual dialects are listed in parentheses following the primary name for the dialect. When one of these names is known to be offensive to its speakers, it is placed in double quotes (and tagged as pejorative with the abbreviation “pej.” as is also done for alternate language names).

The listing of dialect names is not the result of rigorous dialectological investigations. As with the alternate names, the list of dialect names includes all names reported to us which may, at one time or another, have been used in reference to some variety of a language. Some of these names are village or regional names and may not actually represent significant linguistic variants. In a few cases, the ISO 639-3 standard has assigned individual language identification codes to varieties which we, on the advice of our contributors and consultants, have included in our list of dialects. In those very few cases, we depart from the ISO 639-3 standard and do not list these varieties separately as individual languages.

Intelligibility and dialect relations. A measure of inherent intelligibility with other varieties is given as a percentage. Values of less than 85% are likely to signal difficulty in comprehension of the indicated language.

The ability of the users of one variety to understand another variety, based only on the similarity of those two varieties, is called inherent intelligibility. Intelligibility may not be reciprocal or mutual; thus the wording of the intelligibility description may indicate the direction of the intelligibility (e.g., 85% intelligibility of another variety, or 85% intelligibility by speakers of another variety). If the direction of intelligibility is not indicated (e.g., 85% intelligibility with another variety) or is identified as being mutual, it should be understood as being reciprocal, with speakers of each of the varieties mentioned understanding each other equally well.

The ability of speakers to understand another variety because of previous exposure to it or learning is called acquired intelligibility and may be commented on in some language entries.

Lexical similarity. The percentage of lexical similarity between two linguistic varieties is determined by comparing a set of standardized wordlists and counting those forms that show similarity in both form and meaning. Percentages higher than 85% usually indicate a speech variant that is likely a dialect of the language with which it is being compared. Unlike intelligibility, lexical similarity is bidirectional or reciprocal.
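
The wordlist comparison can be sketched as follows. In practice the judgement of whether two forms are similar is made by linguists; here it is a caller-supplied function, with a plain case-insensitive equality check standing in as a default. All names and the sample wordlists are hypothetical, and only the counting procedure and the 85% guideline come from the text.

```python
from typing import Callable, Dict


def same_form(form_a: str, form_b: str) -> bool:
    """Stand-in similarity judgement: identical apart from letter case."""
    return form_a.lower() == form_b.lower()


def lexical_similarity(
    wordlist_a: Dict[str, str],
    wordlist_b: Dict[str, str],
    similar: Callable[[str, str], bool] = same_form,
) -> float:
    """Percentage of shared meanings whose forms are judged similar.

    Each wordlist maps a meaning (gloss) to the form used for it.
    """
    shared = [gloss for gloss in wordlist_a if gloss in wordlist_b]
    if not shared:
        raise ValueError("the wordlists have no meanings in common")
    hits = sum(1 for g in shared if similar(wordlist_a[g], wordlist_b[g]))
    return 100.0 * hits / len(shared)


def likely_dialect(similarity_percent: float) -> bool:
    """Apply the 'higher than 85%' guideline from the text."""
    return similarity_percent > 85.0
```

With a symmetric similarity judgement the result is the same whichever wordlist is passed first, which reflects the reciprocal nature of the measure.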

Macrolanguage membership. If an individual language is a member of a macrolanguage (see Macrolanguages in “The problem of language identification”), that fact is reported following the language classification information. The listing gives the name of the macrolanguage of which the individual language is a member, the name of the primary country under which its entry is found (if different from the current country), and the ISO code for the macrolanguage. By looking up that entry, it is possible to find a list of all the members of the macrolanguage.


Typology

A list of linguistic features of the language is given. Constituent order is the most commonly reported feature. Other basic characteristics that are of particular interest to linguists are also reported when the data is available. In a growing number of cases these listings are more extensive and cover a range of linguistic features, including information about the existence of prepositions versus postpositions, constituent order in noun phrases, gender, case, transitivity and ergativity, canonical syllable patterns, the number of consonants and vowels, the existence of tone, and in some cases whether users of the language also use whistle speech. These descriptions are no more than brief mentions, however, and do not constitute adequate descriptions of the language. A list of the typological features that we wish to collect data on is available upon request.

Language use

This part of the entry gives information about the use and viability of the language, as well as the use of other languages by members of the community. These data, for the most part, provide supporting evidence for the assignment of the EGIDS status (see the Language status section above).

Vitality remarks. As a general summary, where the language is being passed on to children as their first language, or where it is used frequently and widely within the community, the term “Vigorous” is most often used. Other factors related to language vitality that may be reported are the other languages used by the community, the use of this language by others, and the degree and nature of any language shift that may be taking place.

Domains of use. When more than one language is used in a community, speakers often establish patterns of language use for specific configurations of speakers, topics, and locations. These domains of language use can be described by answering the well-known question, “Who speaks which language to whom, about what, and where?” In some language entries, we are able to specify a set of identified domains of use and we may also report whether the domain is associated exclusively with the language or is one where mixed language use is prevalent.

The Ethnologue does not have sufficient data about each language to permit a full description of the domains of use in this technical sense, but uses the term to refer most often to a general set of categories associated with a particular location (e.g., home, school, community) and thus only indirectly related to the topics and speakers most generally associated with those settings.

User age groups. Data in this category are most often reported in terms of generations (children, adolescents, young adults, adults, older adults), but occasionally specific age ranges are identified.

As language use shifts from a traditional language to one of wider communication, differences in use appear between age groups. As this shift progresses, older adults tend to be the final speakers of the traditional language. The use of a language in terms of generational cohorts is thus a significant indicator of the patterns of language transmission, which is key to language maintenance.

Language attitudes. We report only summary attitude evaluations as positive attitudes, neutral attitudes, or negative attitudes. Where attitudes towards use of the language are not the same throughout the community, we may report “mixed attitudes”.

Bilingualism remarks. Descriptions of the use and users of second languages are included. Unless specific information is provided to us, we generally do not characterize levels of proficiency or domains of use of second languages. Generally the remark consists of the phrase “Also use” followed by the name(s) of the additional languages. These statements may be modified by a term indicating the scope of the second-language usage. The terms correspond to fairly broad percentage ranges as follows:

  • All - 95% or more of the ethnic population use the reported language as L2
  • Most - less than 95% but greater than 60% of the ethnic population use the reported language as L2
  • Many - less than or equal to 60% but more than 40% of the ethnic population use the reported language as L2
  • Some - less than or equal to 40% but more than 10% of the ethnic population use the reported language as L2
  • Few - less than or equal to 10% of the ethnic population use the reported language as L2

These quantifiers are frequently based on the best estimates reported to us, though in some cases they represent calculated conversions of percentages reported over a wide time period.
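
The percentage ranges listed above can be expressed as a small lookup. This is a minimal sketch; the function name `bilingualism_quantifier` is hypothetical, and the boundaries follow the ranges exactly as stated.

```python
def bilingualism_quantifier(percent: float) -> str:
    """Map the share of the ethnic population using a language as L2
    to the quantifier term used in bilingualism remarks."""
    if not 0 <= percent <= 100:
        raise ValueError(f"percent must be between 0 and 100, got {percent}")
    if percent >= 95:
        return "All"   # 95% or more
    if percent > 60:
        return "Most"  # less than 95% but greater than 60%
    if percent > 40:
        return "Many"  # at most 60% but more than 40%
    if percent > 10:
        return "Some"  # at most 40% but more than 10%
    return "Few"       # 10% or less
```

Note that the boundary values 60, 40, and 10 fall into the lower band in each case, while 95 falls into the upper one.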

Because second languages are usually learned later than first languages and access to the means of acquiring proficiency in a second language is not always equally available, bilingualism is usually not uniform across a community. When communities use a second language, different speakers usually have varying degrees of bilingual proficiency, ranging from the ability to use only greetings, through the ability to engage in trade, to proficiency adequate for freely expressing anything in the second language. Leaders, the educated, men, traders, those who travel, those in population centers, and people in certain age groups may be more bilingual than other members of their community. Ethnologue generally does not attempt to describe the level of proficiency of users in any language. The bilingualism remarks statements are constructed automatically from the Ethnologue database and are sometimes repetitive or redundant.

Use as second language. When the language in focus is used by others as a second language (as reported in the bilingualism remarks in other language entries), this is indicated with the phrase “Used as L2 by ...”. Following this introductory phrase is a list of the other languages that are reported to use this one as a second language. As with L2 use, this report of usage does not imply any specific level of proficiency.

Language development

This part of the entry gives information about literacy rates, use in education, publications and use in media, revitalization efforts, and language development agencies.

Literacy rates. Where available, percentages of the speaker population who are literate are given for L1 and L2 languages. Where the L2 is not specifically identified, it is assumed to be the dominant language of the country in focus or another major language in the vicinity.

Literacy remarks. Information concerning motivation for literacy and the existence of government (and other) literacy programs is given where available. Additional information concerning literacy that does not appear in related categories may also be reported here.

Use in elementary or secondary schools. The language may be used either as a language of instruction or taught as a subject within one or more schools in the language area. Generally, we only include a statement in this category if the language is used in the schools. Occasionally some additional information about the nature of that use is also available and is reported.

Publications and use in media. The existence of materials that have been produced in the language such as language documentation (dictionaries, grammars), printed literature, and broadcast media is indicated when known. We report the existence of such materials but do not list titles individually. Where extensive literature and media exist, we identify the language as “Fully developed”.

The most widely published book in the world is the Bible, with at least portions having been translated and published in 2970, or 42%, of the living languages listed in the Ethnologue. Our information on the existence of the biblical text comes from a variety of sources. The information about Bible publication for each language is given with the dates of both the earliest and the most recent published Bible, New Testament (NT), Old Testament (OT), or complete books (portions) of the Bible.

Revitalization efforts. When formalized efforts to revitalize an endangered language have been reported, a cursory description of those efforts is given.

Language development agencies. Agencies that focus on the revitalization, maintenance, or development of the language are listed. These may be national or provincial official or semi-official entities or they may include formally constituted local organizations. In general, international development organizations are not included here. Additions to the existing information are welcomed.


Writing

For each language, the script used for written materials is given if known. Where multiple scripts are associated with the language, they are reported in alphabetical order. Where possible we also report any specific style of a script that is used, the years when a script began to be used or ceased to be used, and other comments regarding writing and orthography. In general, where no script is identified, it can be assumed that there is no widely accepted and used writing system. Scripts other than transcription systems also exist for some sign languages but are not in wide use and so are not currently reported.

Other comments

This part of the entry gives additional information that does not fit under the above categories.

Non-Indigenous. A language that is not indigenous to the country, but which is now established there either as a result of its longstanding presence or because of institutionally supported use and recognition is identified here with the label “Non-indigenous”. In general, these non-indigenous languages represent two different situations: Some are heritage languages associated with a long-established community which originated elsewhere. In many, but not all, of these cases the language is losing speakers as its users shift to a more dominant language. Others are major languages that are being transmitted to large numbers of people as a second language through formal educational institutions resulting in widespread second-language acquisition and growing use.

General remarks. These are general statements about the language or its context that do not fall into other specific categories. Alternate identifications of the language community or ethnic group may be identified or explained here. These may include government recognized or official nationalities, ethnic names, or the meanings or derivations of certain names. Other historical and ethnographic information may be included here as well.

Religion. The religious affiliations of the speakers of the language are given where known. These are generally listed in descending order of number of adherents.

Macrolanguage member languages. If the entry is describing a macrolanguage (see Macrolanguages in “The problem of language identification”), then a complete list is given of the individual languages that fall within the scope of the macrolanguage.

Second Language Only status. While there are many languages that are used as second languages by large populations of speakers, the phrase “Second language only” is used to identify a specific category made up of those languages which are used as second languages but have no L1 speakers and generally weak or secondary ethnic or identity associations. These may include languages of special use, such as languages of initiation, languages of interethnic communication, liturgical languages, as well as cants and jargons. Most often these languages are given a status of EGIDS 3 (Wider Communication) but are identified in this way as well because of the absence of L1 speakers.

Use in other countries. If the language is present in more than one country, the entries for the language in those countries are listed at the bottom of the page.