The title for a language page is an anglicized form of the name used to refer to that language in that country. In most cases the name is the one that the speakers prefer if such a preference is known. However, speakers within a language community may have different opinions about which name they prefer. These preferred names are recorded using English spellings, though diacritical marks may be included. For some language names in southern Africa special symbols are used to represent the “click” sounds produced with ingressive mouth air.
The subtitle names the primary country for the language. When a language is spoken in more than one country, we designate one of the countries as primary, usually the country of origin of the language or the country where most of the language users are located.
A complete language description contains the following elements. Follow the link for a full description of the element. Note that each language description includes only the elements for which information is known.
- Language identification gives the code assigned to the language by the ISO 639-3 standard, plus a list of alternate or other names that have been used to refer to the language.
- Population gives the number of people in the country for whom this language is a first language, plus the total number of L1 speakers if it is used in multiple countries. Also included may be monolingual population, ethnic population, and other comments about population.
- Location describes where the language users are located within the country, as well as listing other countries in which the language is used.
- Language status gives the EGIDS level for the language in that country and describes the level of official recognition, if any.
- Classification provides the language classification, including macrolanguage membership if applicable.
- Dialects lists the names of dialects of the language, as well as giving information about dialect relations in terms of intelligibility and lexical similarity with other varieties if available.
- Typology provides typological information, including brief descriptions of basic word order, significant phonological, morphological, and syntactic features, and other matters of interest to linguists.
- Language use gives information about the use and viability of the language, as well as the use of other languages by members of the community.
- Language development gives information about literacy rates, written materials, use in education, and the name(s) of language development agencies focused on the language, if any.
- Writing gives information about writing systems and scripts used for the language.
- Other comments gives additional information that does not fit under the above categories.
If the language has well-established, multi-generational communities of language users in other countries, subentries for these countries are listed at the bottom of the page. Information like classification and typology which is the same in every country is not repeated in these subentries.
The entry begins with the international three-letter ISO 639-3 code that is used to identify the language uniquely, plus a list of other names that have been used to identify it.
ISO 639-3 code. The code assigned to the language by the ISO 639-3 standard (ISO 2007) is given in lower-case letters within square brackets. When a given language is spoken in multiple countries, all of the entries for that language use the same three-letter code. The code distinguishes the language from other languages with the same or similar names and identifies those cases in which the name differs across country borders. These codes ensure that each language is counted only once in world or area statistics.
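The role of the bracketed code in counting each language only once can be illustrated with a minimal Python sketch. The entry strings and the function names `extract_codes` and `count_languages` are hypothetical, and the sketch assumes a code always appears as exactly three lower-case letters inside square brackets.

```python
import re

# Assumption: a code is exactly three lower-case ASCII letters inside
# square brackets, e.g. "[eng]".
ISO_639_3_IN_BRACKETS = re.compile(r"\[([a-z]{3})\]")


def extract_codes(text: str) -> list[str]:
    """Return every bracketed three-letter code in text, in order."""
    return ISO_639_3_IN_BRACKETS.findall(text)


def count_languages(entries: list[str]) -> int:
    """Count distinct languages across entries, keyed by code, not by name."""
    codes: set[str] = set()
    for entry in entries:
        codes.update(extract_codes(entry))
    return len(codes)


# Hypothetical entry strings: the same language listed under two countries.
entries = [
    "English [eng] (United Kingdom)",
    "English [eng] (Australia)",
    "Spanish [spa] (Spain)",
]
print(count_languages(entries))  # 2
```

Keying on the code rather than the name is what prevents double counting when a language has entries in several countries or goes by different names across borders.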
Alternate names. Many languages are known by or have been referenced in the literature by more than one name. Alternate names come from many diverse sources: speakers may have more than one name for their language, or neighboring groups may use different names. Other names may have been assigned by outsiders and used in ethnographic or linguistic publications before the name used by the speakers themselves was known. Another source of alternate names is variant spellings of what is essentially the same name. In many cases, spellings used in languages of wider communication or in regional languages are also included in the list. Names for the ethnic group and place names that have been used in the literature as names for the language are listed in the Other comments part of the entry.
Some names, used in the past or in use by others, are pejorative and offensive to the speakers of the language. Those are identified, wherever they are listed, by enclosing the name in double quotation marks and appending the label pej. (pejorative) following the name. We include these names as a means of helping users find languages they may have only heard of or seen referred to by such names. By so doing, Ethnologue in no way implies any endorsement of the pejorative names.
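A reader processing name lists automatically might separate flagged names from neutral ones along the following lines. This is only a sketch: it assumes the label is written as `(pej.)` immediately after a name in straight or curly double quotation marks, and both `parse_name` and the placeholder names are hypothetical.

```python
import re

# Assumption: a flagged name looks like  "Name" (pej.)  with either
# straight or curly double quotation marks around the name.
QUOTED_PEJ = re.compile(r'^["“](?P<name>.+?)["”]\s*\(pej\.\)$')


def parse_name(raw: str) -> tuple[str, bool]:
    """Return (name, is_pejorative) for one item of a name list."""
    raw = raw.strip()
    match = QUOTED_PEJ.match(raw)
    if match:
        return match.group("name"), True
    return raw, False


# Placeholder names, not taken from any real entry.
print(parse_name('"Name X" (pej.)'))  # ('Name X', True)
print(parse_name("Name Y"))           # ('Name Y', False)
```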
This part of the entry gives the number of first-language (L1) speakers in the country, plus the total number of L1 speakers worldwide if the language is used in multiple countries. Also included may be an indication of population stability, number of monolingual speakers, population of ethnic community, and other comments about population.
Country speaker population. The first population figure given is the estimated number of L1 speakers in the country in focus. Population data have been provided from many different sources over a number of years. This diversity among sources and dates frequently causes the totals of the populations for all of the languages in any given country to differ markedly from the total current population of the country.
We do not extrapolate population estimates to bring them up to date, since populations of language communities do not necessarily increase or decrease at the same rate within a country and since some initial estimates turn out to have been incorrect. However, some population data submitted to the Ethnologue may be the result of extrapolation.
The Ethnologue provides the number of L1 speakers wherever possible. It is often difficult to get an accurate figure for the speakers of a language. All figures are only estimates; this is true even for census figures. Some sources do not include all dialects in their figures or may count as a single language two languages identified separately in the ISO 639-3 inventory. Some sources count members of ethnic groups, who, in some cases, may not be speakers of the language. Some sources do not make clear whether they refer to the total number of speakers in all countries, or only to those in one of the countries. Some do not distinguish L1 speakers from second-language (L2) speakers.
Languages that are no longer in use, but still have ethnic group members who identify with the language, are listed as having “No known speakers” in place of a population figure. Languages that have neither societal use nor remaining ethnic group members are described as “Extinct”. Languages which have no L1 speakers but which are used for specific purposes by a community are identified as “Second Language Only”.
Dates and sources for population data are given where available using the conventions described in How references are cited. Where the word “census” appears as the source, it is generally the national census of the country and is not included in the list of references cited. In some cases the source is a government agency (but not the official census) or another organization. Only when the citation has the form “Author Year” will the source appear in the list of references cited; see How references are cited.
Population stability comment. For some languages, we are able to indicate whether the L1 speaker population is increasing or decreasing. This information also contributes to an overall evaluation of ethnolinguistic vitality. There may be a few cases where the actual speaker population count is not known or is unreported, but the stability and general trend of the population is evident and has been commented on.
Population in all countries. When a language has first-language speakers in more than one country, the entry for the primary country lists the calculated total speaker population for all countries. Since information may come from multiple sources, the sum of the individual country populations may not equal the figure given for all countries in other sources. In some cases, the L1 speaker population of one or more countries may not be available and is not included in this calculation.
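The calculation described above amounts to summing the country figures that are known and leaving out the rest. A minimal sketch in Python follows; the function name `total_l1_population`, the country labels, and the figures are all hypothetical, and returning `None` when no figure is known is an assumption of this sketch.

```python
from typing import Optional


def total_l1_population(by_country: dict[str, Optional[int]]) -> Optional[int]:
    """Sum the known L1 figures; countries with no figure are left out."""
    known = [count for count in by_country.values() if count is not None]
    # Assumption: with no known figure at all, report no total.
    return sum(known) if known else None


# Hypothetical figures; None marks a country with no available figure.
figures = {"Country A": 120_000, "Country B": 4_500, "Country C": None}
print(total_l1_population(figures))  # 124500
```

Because unavailable figures are skipped rather than estimated, a total computed this way is a lower bound on the number of L1 speakers actually covered by the entries.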
Monolingual population. Where the data are available, the number of those who are monolingual is reported. In some cases it is reported as a percentage of the L1 speaker population. This figure can be compared with the total speaker population as an indicator of the vitality of the language.
L2 speaker population. Where the data are available, the number of those who speak this language as a second language is reported. Two numbers may be reported: the number of L2 speakers within the country, and the number of L2 speakers worldwide.
Population remarks. Additional information concerning populations may include population breakdowns (by dialect, gender, ethnic groups, or specific villages or communities), the population of the deaf community, or other comments on demographics.
Ethnic population. Where it is known, the population of those who identify themselves as part of the ethnic group, whether or not they speak the language, is given. A language with no first-language speakers will be reported as extinct when the ethnic population figure is zero, absent, or unknown. When the reported speaker population is zero but there is an ethnic population figure, the language will be reported as having “No known speakers”.
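The reporting rule in this paragraph can be written as a small decision function. The following Python sketch uses a hypothetical function name and covers only the rule stated here; it does not model the separate “Second Language Only” category.

```python
from typing import Optional


def population_label(l1_speakers: int, ethnic_population: Optional[int]) -> str:
    """Choose what to show in place of, or as, the population figure."""
    if l1_speakers > 0:
        return f"{l1_speakers:,}"
    # No L1 speakers: the ethnic population decides the label.
    if ethnic_population:  # known and greater than zero
        return "No known speakers"
    return "Extinct"  # ethnic population is zero, absent, or unknown


print(population_label(0, 300))   # No known speakers
print(population_label(0, None))  # Extinct
```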
A description of the locations where the language is spoken is included in each entry where a specific area can be defined. Those languages that are used everywhere in a country or specified region may not have this information in the entry or may be reported as “Widespread”. These languages may not appear on the country maps. Languages that are widely dispersed in specific locations, or that are used by nomadic groups, may be identified as “Scattered.” Generally, locations are listed in descending order from the largest geopolitical unit to the smallest. Major administrative subdivisions are followed by a colon followed by a comma-separated list of subordinate locations. The list of locations may not be exhaustive. A list of all countries where the language is spoken is provided in the primary entry when a language is spoken in multiple countries.
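The colon and comma convention lends itself to simple parsing. The sketch below is illustrative only: the text does not say how successive subdivisions are separated, so the semicolon separator is an assumption, and the function name `parse_locations` and the place names are hypothetical.

```python
def parse_locations(description: str) -> dict[str, list[str]]:
    """Map each major subdivision to its list of subordinate locations."""
    result: dict[str, list[str]] = {}
    # Assumption: successive major subdivisions are separated by semicolons.
    for group in description.split(";"):
        group = group.strip()
        if not group:
            continue
        unit, colon, rest = group.partition(":")
        places = [p.strip() for p in rest.split(",") if p.strip()] if colon else []
        result[unit.strip()] = places
    return result


# Hypothetical location description.
text = "Northern Province: District A, District B; Coastal Region"
print(parse_locations(text))
# {'Northern Province': ['District A', 'District B'], 'Coastal Region': []}
```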
This part of the entry reports the estimate of the status of the language in the country on the Expanded Graded Intergenerational Disruption Scale (EGIDS); see the complete section on Language Status for a listing of the levels. If the language has been officially recognized in legislation or serves official functions at the national or provincial levels, there is an additional note naming the nature of the recognition and function. If the recognition is statutory, the statute is identified. The categories for recognition and function are described in the section on Official Recognition.
If the reported language status is EGIDS 3 (Wider Communication) and the data are available, the use of this language as a language-of-wider-communication by L1 speakers of other languages is identified.
This part of the entry names the linguistic affiliation of the language, including macrolanguage membership if applicable.
All languages are slowly changing, and linguistically related varieties may be diverging or merging. Most languages are related to other languages, to some more closely and to others more distantly. Linguists have used terms such as phylum, stock, family, branch, group, language, and dialect to refer to these relationships in increasing order of linguistic similarity.
Linguistic classification. The classification information for each language follows the general order from largest grouping to smallest. More inclusive group names are given first, followed by the names for less inclusive subgroups, separated by commas.
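Because the groups are listed from most to least inclusive, two classification strings can be compared from the left to find the groupings they share. The Python sketch below uses hypothetical function names, and the two classification strings are illustrative rather than quoted from any entry.

```python
def parse_classification(classification: str) -> list[str]:
    """Split a classification into groups, most inclusive first."""
    return [part.strip() for part in classification.split(",") if part.strip()]


def shared_groups(first: str, second: str) -> list[str]:
    """Return the leading groups that two classifications have in common."""
    shared: list[str] = []
    for a, b in zip(parse_classification(first), parse_classification(second)):
        if a != b:
            break
        shared.append(a)
    return shared


# Illustrative classification strings.
print(shared_groups(
    "Indo-European, Germanic, West, English",
    "Indo-European, Germanic, West, Frisian",
))  # ['Indo-European', 'Germanic', 'West']
```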
Language classification information comes from a variety of sources. The Ethnologue attempts to report the generally accepted consensus of scholars working in the language family based on published works and scholarly review. The sources on which the classifications are based are not overtly cited in the language entry but may be included in the list of general references listed at the country level. The sources used for classifications are available on request by contacting the Editor; see Contact us.
A listing of the highest-level language families (including the number of languages, average populations, and countries where spoken) is given in the Statistical Summaries. The family trees may be browsed by going to Browse by Language Family.
Macrolanguage membership. If an individual language is a member of a macrolanguage (see Macrolanguages in “The problem of language identification”), that fact is reported following the language classification information. The listing gives the name of the macrolanguage of which the individual language is a member, the name of the primary country under which its entry is found (if different from the current country), and the ISO code for the macrolanguage. By looking up that entry, it is possible to find a list of all the members of the macrolanguage.
This part of the entry gives information about the names of dialects of the language. It may also describe the relationships among dialects or to other languages in terms of dialect intelligibility and lexical similarity.
Dialect names. Speech varieties which are functionally intelligible to each other’s speakers because of linguistic similarity are generally considered dialects of the same language and the names of all such dialects are listed under that language. In addition, alternate names for individual dialects are listed in parentheses following the primary name for the dialect. When one of these names is known to be offensive to its speakers, it is placed in double quotes (and tagged as pejorative with the abbreviation “pej.” as is also done for alternate language names).
The listing of dialect names does not represent the results of rigorous dialectological investigations. As with the alternate names, the list of dialect names includes all names reported to us which may, at one time or another, have been used in reference to some variety of a language. Some of these names are village or regional names and may not actually represent significant linguistic variants. In a few cases, the ISO 639-3 standard has assigned individual language identification codes to varieties which we, on the advice of our contributors and consultants, have included in our list of dialects. In those very few cases, we depart from the ISO 639-3 standard and do not list these varieties separately as individual languages.
Intelligibility and dialect relations. A measure of inherent intelligibility with other varieties is given by percent. Values of less than 85% are likely to signal difficulty in comprehension of the indicated language.
The ability of the users of one variety to understand another variety, based only on the similarity of those two varieties, is called inherent intelligibility. Intelligibility may not be reciprocal or mutual, thus the wording of the intelligibility description may indicate the direction of the intelligibility (e.g., 85% intelligibility of another variety, or 85% intelligibility by speakers of another variety). If the direction of intelligibility is not indicated (e.g., 85% intelligibility with another variety) or is identified as being mutual, it should be understood as being reciprocal with speakers of each of the varieties mentioned understanding each other equally well.
The ability of speakers to understand another variety because of previous exposure to it or learning is called acquired intelligibility and may be commented on in some language entries.
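The directional nature of inherent intelligibility and the 85% guideline can be modeled with a small lookup table. This is a sketch under stated assumptions: the class name `IntelligibilityTable`, its methods, and the variety labels are hypothetical, and only the 85% threshold comes from the text.

```python
from typing import Optional

DIFFICULTY_THRESHOLD = 85.0  # values below this signal likely difficulty


class IntelligibilityTable:
    """Store inherent intelligibility percentages by direction."""

    def __init__(self) -> None:
        self._scores: dict[tuple[str, str], float] = {}

    def add_directed(self, listeners: str, variety: str, percent: float) -> None:
        """Record how well speakers of `listeners` understand `variety`."""
        self._scores[(listeners, variety)] = percent

    def add_mutual(self, first: str, second: str, percent: float) -> None:
        """Record a figure with no stated direction as holding both ways."""
        self._scores[(first, second)] = percent
        self._scores[(second, first)] = percent

    def difficulty_likely(self, listeners: str, variety: str) -> Optional[bool]:
        """True if comprehension is likely difficult, None if unreported."""
        percent = self._scores.get((listeners, variety))
        if percent is None:
            return None
        return percent < DIFFICULTY_THRESHOLD


table = IntelligibilityTable()
table.add_directed("Variety A", "Variety B", 90.0)
print(table.difficulty_likely("Variety A", "Variety B"))  # False
print(table.difficulty_likely("Variety B", "Variety A"))  # None
```

Keeping the two directions as separate keys means a one-way figure never silently implies the reverse.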
Lexical similarity. The percentage of lexical similarity between two linguistic varieties is determined by comparing a set of standardized wordlists and counting those forms that show similarity in both form and meaning. Percentages higher than 85% usually indicate a speech variant that is likely a dialect of the language with which it is being compared. Unlike intelligibility, lexical similarity is bidirectional or reciprocal.
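As a rough illustration of the calculation, the percentage is the share of compared wordlist items whose forms are judged similar. In the Python sketch below, the function name, the wordlists, and the default rule of exact string equality are all assumptions; judging real similarity of form requires linguistic criteria, so the comparison is left as a replaceable parameter.

```python
from typing import Callable


def lexical_similarity(
    wordlist_a: dict[str, str],
    wordlist_b: dict[str, str],
    similar: Callable[[str, str], bool] = lambda x, y: x == y,
) -> float:
    """Percentage of shared glosses whose forms are judged similar."""
    shared = wordlist_a.keys() & wordlist_b.keys()
    if not shared:
        raise ValueError("the wordlists share no glosses")
    matches = sum(1 for gloss in shared if similar(wordlist_a[gloss], wordlist_b[gloss]))
    return 100.0 * matches / len(shared)


# Invented forms keyed by gloss (meaning); not data from any real language.
variety_a = {"water": "aka", "fire": "tuli", "stone": "bato", "hand": "lima"}
variety_b = {"water": "aka", "fire": "tuli", "stone": "vatu", "hand": "lima"}
print(lexical_similarity(variety_a, variety_b))  # 75.0
```

With a symmetric comparison rule such as the default, swapping the two wordlists gives the same percentage, which matches the reciprocal nature of the measure.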
For some languages, brief lists of linguistic features of the language are given. Constituent order (e.g. Subject, Object, Verb = SOV) is the most commonly reported feature. Other basic characteristics that are of particular interest to linguists are also reported when the data are available. In a growing number of cases these listings are more extensive and cover a range of linguistic features, including information about the existence of prepositions versus postpositions, constituent order in noun phrases, gender, case, transitivity and ergativity, canonical syllable patterns, the number of consonants and vowels, the existence of tone, and in some cases whether users of the language also use “whistle speech”. These descriptions are no more than brief mentions, however, and do not constitute adequate descriptions of the language. A list of the typological features that we wish to collect data on is available upon request.
This part of the entry gives information about the use and viability of the language, as well as the use of other languages by members of the community. These data, for the most part, provide supporting evidence for the assignment of the EGIDS status (see the Language Status section above).
Viability remarks. A number of viability indicators are given. As a general summary, where the language is being passed on to children as their first language, or where it is used frequently and widely within the community, the term “Vigorous” is most often used.
Other indicators are the number of people who use the language as their second language, and the degree of language shift of speakers of this language to another language (in some cases indicated by the percentage of speakers within the ethnic community). General estimates of viability may be given. These indicators and general comments are taken into account in assigning the EGIDS status.
Domains of use. When more than one language is used in a community, speakers often establish patterns of language use for specific configurations of speakers, topics, and locations. These domains of language use can be described by answering the well-known question, “Who is speaking to whom, about what, and where?” In some language entries, we are able to specify a set of identified domains of use and we may also report whether the domain is associated exclusively with the language or is one where mixed language use is prevalent.
The Ethnologue does not have sufficient data about each language to permit a full description of the domains of use in this technical sense. Instead, the term most often refers to a general set of categories associated with a particular location (e.g., home, school, community) and thus only indirectly related to the topics and speakers generally associated with those settings. Knowledge of these patterns of language use can help in evaluating ethnolinguistic vitality and in developing strategies for language revitalization or language development.
User age groups. Data in this category are most often reported in terms of generations (children, adolescents, young adults, adults, older adults) but occasionally specific age ranges may be identified.
As language use shifts from a traditional language to one of wider communication, differences in use appear between age groups. As language change takes place, older adults tend to be the final speakers of the traditional language. The use of a language in terms of generational cohorts is thus a significant indicator of the patterns of language transmission which is key to language maintenance.
Language attitudes. We report only summary attitude evaluations as positive attitudes, neutral attitudes, or negative attitudes. In some cases these general categories may include an additional modifier, such as, “somewhat positive attitudes.”
What people think and how they feel about their own language is important to those promoting literacy or other development activities as more positive attitudes generally correspond to stronger ethnolinguistic vitality. Attitudes are difficult to assess directly and equally difficult to describe adequately.
Bilingualism remarks. Brief comments about the use and users of second languages are included. Unless specific information is provided to us, we generally do not characterize levels of proficiency or domains of use of second languages. Generally the remark consists of the phrase “Also use” followed by the name(s) of the additional languages.
Because second languages are usually learned later than first languages and access to the means of acquiring proficiency in a second language is not always equally available, bilingualism is usually not uniform across a community. When communities use a second language, different speakers usually have varying degrees of bilingual proficiency, ranging from the ability to use only greetings, through the ability to engage in trade, to proficiency adequate for freely expressing anything in the second language. Leaders, educated people, men, traders, those who travel, those in population centers, and people in certain age groups may be more bilingual than other members of their community. Where information is available, these factors are described, but only in general terms.
Use as second language. When the language in focus is used by others as a second language (as reported in the bilingualism remarks of other language entries), this is indicated with the phrase “Used as L2 by ...”. Following this introductory phrase is a list of the other languages that are reported to use this one as a second language.
This part of the entry gives information about literacy rates, publications and use in media, use in education, and language development agencies.
Literacy rates. Where available, percentages of the speaker population who are literate are given for L1 and L2 languages. Where the L2 is not specifically identified, it is assumed to be the dominant language of the country in focus or another major language in the vicinity.
Literacy remarks. Information concerning motivation for literacy and the existence of government (and other) literacy programs is given where available. Additional information concerning literacy that does not appear in related categories may also be reported here.
Use in elementary or secondary schools. The language may be used either as a language of instruction or taught as a subject within one or more schools in the language area. Generally, we only include a statement in this category if the language is used in the schools. Occasionally some additional information about the nature of that use is also available and is reported.
Publications and use in media. The existence of materials that have been produced in the language such as linguistic documentation (dictionaries, grammars), printed materials, broadcast media, and new media (SMS, email, websites, etc.) is indicated when known. We report the existence of such materials but do not list titles individually. Where extensive literature and media exist, we identify the language as “Fully developed”. For many languages this information is very incomplete at this time. More information is welcomed though it is unlikely that the Ethnologue will ever be able to document existing literature in a comprehensive way.
The most widely published book in the world is the Bible with at least portions having been translated and published in 2,897 or 41% of the living languages listed in the Ethnologue. Our information on the existence of the biblical text comes from a variety of sources. That information about Bible publication for each language is given with the dates of both the earliest and the most recent published Bible, New Testament (NT), Old Testament (OT), or complete books (portions).
Language development agencies. Agencies that focus on the revitalization, maintenance, or development of the language are listed. These may be national or provincial official or semi-official entities or they may include formally constituted local organizations. In general, international development organizations are not included here. Additions to the existing information are welcomed.
For each language, the script used for written materials is given if known. Where multiple scripts are in use they are reported in alphabetical order. Where possible we also report any specific style of a script that is used, the years when a script began to be used or ceased to be used, and other comments regarding writing and orthography. In general, where no script is identified, it can be assumed that there is no widely accepted and used writing system. Scripts other than transcription systems also exist for some Sign Languages but are not in wide use and so are not currently reported.
This part of the entry gives additional information that does not fit under the above categories.
General remarks. These are general statements about the language or its context that do not fall into other specific categories. Alternate identifications of the language community or ethnic group may be identified here. These may include government recognized or official nationalities, ethnic names (usually identified by the label “Ethnonym:”), or the identification of the meanings or derivations of certain names. Other historical or ethnographic information may be included here as well.
Religion. The religious affiliations of the speakers of the language are given where known. These are generally listed in descending order of number of adherents.
Macrolanguage member languages. If the entry is describing a macrolanguage (see Macrolanguages in “The problem of language identification”), then a complete list is given of the individual languages that fall within the scope of the macrolanguage.
Second Language Only status. While there are many languages that are used as second languages by large populations of speakers, the phrase “Second language only” is used to identify a specific category made up of those languages which are used as second languages but have no mother-tongue speakers and generally weak or secondary ethnic or identity associations.
These may include languages of special use, such as languages of initiation, languages of interethnic communication such as American Plains Indian Sign Language, liturgical languages, as well as cants and jargons. Most often these languages are given a status of EGIDS 3 (Wider Communication) but are identified in this way as well because of the absence of L1 speakers.
Crossreference to primary entry. When this entry is not the primary entry for the language being described, there is a crossreference to the specific page on which the primary entry may be found. Information like the total population, all the countries where spoken, linguistic relationships, linguistic typology, products of language development, and writing systems is included only in the primary entry.