Language Information

The title for a language page is an anglicized form of the name used to refer to that language in that country. In most cases the name is the one that the speakers prefer if such a preference is known. However, speakers within a language community may have different opinions about which name they prefer. These preferred names are recorded using English spellings, though diacritical marks may be included. Among Khoisan languages and a few other languages in southern Africa special symbols are used in language names to represent the “click” sounds produced with ingressive mouth air.

The subtitle names the primary country for the language. When a language is spoken in more than one country, we designate one of the countries as primary, usually the country of origin or the country where most of the speakers are located.

A complete language description contains the following elements. Follow the link for a full description of the element. Note that each language description includes only the elements for which information is known.

  • ISO 639-3 gives the code assigned to the language by the ISO 639-3 standard.
  • Alternate names gives a list of other names that have been used in the literature to refer to the language.
  • Population gives the number of people in the country for whom this language is a first language, plus the total number of L1 speakers if it is used in multiple countries. Also included may be monolingual population, ethnic population, and other comments about population.
  • Location describes where the language is spoken within the country, as well as listing other countries in which it is spoken.
  • Language status gives the EGIDS level for the language in that country and describes the level of official recognition, if any.
  • Classification provides the language classification, including macrolanguage membership if applicable.
  • Dialects lists the names of dialects of the language, as well as giving information about dialect relations in terms of intelligibility and lexical similarity with other varieties if available.
  • Typology provides typological information, including brief descriptions of basic word order, significant phonological, morphological, and syntactic features, and other matters of interest to linguists.
  • Language use gives information about the use and viability of the language, as well as the use of other languages by members of the community.
  • Language development gives information about literacy rates, written materials, and use in education.
  • Writing gives information about writing systems and scripts used for the language.
  • Other comments gives additional information that does not fit under the above categories.

If the language has well-established, multi-generational communities of language users in other countries, subentries for these countries are listed at the bottom of the page. Information that is the same in every country, such as classification and typology, is not repeated in these subentries.

ISO 639-3 code. The code assigned to the language by the ISO 639-3 standard (ISO 2007) is given in lower-case letters within square brackets. When a given language is spoken in multiple countries, all of the entries for that language use the same three-letter code. The code distinguishes the language from other languages with the same or similar names and identifies those cases in which the name differs across country borders. These codes ensure that each language is counted only once in world or area statistics.
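As a small illustration of the bracketed-code convention, the following Python sketch pulls a code out of an entry heading. The function name `extract_iso_code` and the sample heading string are hypothetical and exist only for this example; the only thing taken from the description above is the format itself (three lower-case letters within square brackets).

```python
import re

# Three lower-case ASCII letters inside square brackets, e.g. "[eng]".
ISO_639_3_PATTERN = re.compile(r"\[([a-z]{3})\]")


def extract_iso_code(heading):
    """Return the first bracketed three-letter code in `heading`, or None.

    `heading` is an assumed free-text string such as "English [eng]".
    """
    match = ISO_639_3_PATTERN.search(heading)
    return match.group(1) if match else None


print(extract_iso_code("English [eng]"))
```

Because every country entry for a language carries the same code, a value extracted this way could serve as a key for grouping entries so that each language is counted once.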

Alternate names. Many languages are known by or have been referred to by more than one name. Alternate names come from many diverse sources: speakers may have more than one name for their language, or neighboring groups may use different names. Other names may have been assigned by outsiders and used in ethnographic or linguistic publications before the name used by the speakers themselves was known. Another source of alternate names is variant spellings of what is essentially the same name. In many cases, spellings used in languages of wider communication or regional languages are also included in this list. We list names for the ethnic group and identify place names that have been used in the literature as names for the language in the Other comments part of the entry.

Some names in use by others are offensive to the speakers of the language. Those are identified, wherever they are listed, by enclosing the name in double quotation marks and adding the label pej. (pejorative) after it. We include these names as a means of helping users find languages they may have only heard or seen referred to by such names. By so doing, Ethnologue in no way implies any endorsement of the pejorative names.

Population

The population element gives the number of L1 speakers in the country, plus the total number of L1 speakers worldwide if it is used in multiple countries. Also included may be an indication of population stability, number of monolingual speakers, population of ethnic community, and other comments about population.

Country speaker population. The first population figure given is the estimated number of first-language (L1) speakers in the country in focus. Where it is available we provide the source and date of the information in parentheses following the conventions described above for source citations. Population data have been provided from many different sources over a number of years. This diversity among sources and dates frequently causes the totals of the populations for all of the languages in any given country to differ markedly from the total current population of the country.

We do not extrapolate population estimates to bring them up to date, since populations of language communities do not increase at the same rate within a country and since some initial estimates turn out to have been incorrect. However, some population data submitted to the Ethnologue may be the result of extrapolation.

The Ethnologue provides the number of first-language speakers wherever possible. It is often difficult to get an accurate figure for the speakers of a language. All figures are only estimates; this is true even for census figures. Some sources do not include all dialects in their figures or may count as a single language two languages identified separately in the ISO 639-3 inventory. Some sources count members of ethnic groups, who, in some cases, may not be speakers of the language. Some sources do not make clear whether they refer to the total number of speakers in all countries, or only to those in one of the countries. Some do not distinguish first-language (L1) speakers from second-language (L2) speakers.

Languages that are no longer in use, but still have ethnic group members who identify with the language, are listed as having “No known speakers” in place of a population figure. Languages that have neither societal use nor remaining ethnic group members are described as “Extinct”. Languages which have no L1 speakers but which are used for specific purposes by a community are identified as "Second Language Only".

Dates and sources for population data are given where available using the conventions described in How references are cited. Where the word “census” appears as the source, it is generally the national census of the country and is not included in the list of references. In some cases the source is a government agency (but not the official census) or another organization. The form of the citation, as described in How references are cited, indicates whether a particular source will appear in the list of references.

Population stability comment. For some languages, we are able to indicate whether the speaker population is increasing or decreasing. This information also contributes to an overall evaluation of ethnolinguistic vitality. There may be a few cases where the actual speaker population count is not known or is unreported, but the stability and general trend of the population is evident and has been commented on.

Population in all countries. When a language has first-language speakers in more than one country, the entry for the primary country lists the calculated total speaker population for all countries. Since information may come from multiple sources, the sum of the individual country populations may not equal the figure given for all countries in other sources. In some cases, the L1 speaker population of one or more countries may not be available and is not included in this calculation.
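The calculation described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `total_l1_speakers`, the dictionary layout, and the placeholder country names and figures are all invented for the example, and an unavailable country figure is represented here by `None`.

```python
def total_l1_speakers(country_populations):
    """Sum L1 speaker counts across countries.

    `country_populations` maps a country name to its L1 speaker count,
    or to None when no figure is available. Unavailable figures are
    left out of the total rather than treated as zero speakers.
    """
    known = [count for count in country_populations.values() if count is not None]
    return sum(known)


# Invented figures for three placeholder countries; one is unavailable.
example = {"Country A": 1000, "Country B": None, "Country C": 250}
print(total_l1_speakers(example))
```

Leaving unknown figures out means the result is a lower bound, which is one reason a calculated total may differ from the all-countries figure given in other sources.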

Monolingual population. Where the data are available, the number of those who are monolingual is reported. In some cases it is reported as a percentage of the L1 speaker population. This figure can be compared with the total speaker population as an indicator of the vitality of the language.

Population remarks. Additional information concerning populations may include population breakdowns (by dialect, gender, ethnic groups, or specific villages or communities), the population of the deaf community, or other comments on demographics.

Ethnic population. Where it is known, the population of those who identify themselves as part of the ethnic group, whether or not they speak the language, is given. A language with no first-language speakers will be reported as extinct when the ethnic population figure is zero, absent, or unknown. When the reported speaker population is zero but there is an ethnic population figure, the language will be reported as having “No known speakers”.
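The reporting rules for languages without first-language speakers can be summarized as a small decision function. The Python below is a sketch, not the Ethnologue's actual procedure: the function name `population_label` and its parameters are hypothetical, and checking second-language use before the ethnic population figure is an assumption made for the example.

```python
def population_label(l1_speakers, ethnic_population=None, second_language_use=False):
    """Return the text shown in place of (or as) the population figure.

    l1_speakers         -- reported number of first-language speakers (int)
    ethnic_population   -- ethnic group size, or None if absent or unknown
    second_language_use -- True if a community uses the language for
                           specific purposes despite having no L1 speakers
    """
    if l1_speakers > 0:
        return str(l1_speakers)
    # Assumption: second-language use takes precedence over the other labels.
    if second_language_use:
        return "Second Language Only"
    # A non-zero ethnic population with zero speakers.
    if ethnic_population:
        return "No known speakers"
    # Ethnic population is zero, absent, or unknown.
    return "Extinct"


print(population_label(0, ethnic_population=300))
```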

Location

A description of the location where the language is spoken is included in each entry where a specific area can be defined. Those languages that are scattered through a country or wide region may not have this information in the entry or may be reported as “Widespread”. These languages may not appear on the country maps. Generally, regional locations are listed in descending order from largest geopolitical unit to the smallest. The list of locations may not be exhaustive. When the language is spoken in multiple countries, this element in the entry for the primary country also lists all the other countries where it is spoken.

Language status

This part of the entry reports the estimate of the status of the language in the country on the Expanded Graded Intergenerational Disruption Scale (EGIDS); see the complete section on Language Status for a listing of the levels. If the language has been officially recognized in legislation or serves official functions at the national or provincial levels, there is an additional note naming the nature of the recognition and function. If the recognition is statutory, the statute is identified. The categories for recognition and function are also described in the section on Language Status.

Classification

The classification element names the linguistic affiliation of the language, including macrolanguage membership if applicable.

Linguistic affiliation. The classification information for each language follows the general order from largest grouping to smallest. More inclusive group names are given first, followed by the names for less inclusive subgroups, separated by commas.
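For readers who process classification strings programmatically, the ordering convention means a simple split yields the hierarchy from most inclusive to least inclusive. The Python below is a minimal sketch; the function name `parse_classification` and the placeholder group names are hypothetical.

```python
def parse_classification(classification):
    """Split a comma-separated classification into a list of group names.

    The first item is the most inclusive grouping and the last item is
    the least inclusive subgroup. Empty pieces are dropped.
    """
    return [part.strip() for part in classification.split(",") if part.strip()]


# Placeholder names, not a real classification.
print(parse_classification("Family A, Branch B, Group C"))
```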

All languages are slowly changing, and linguistically related varieties may be diverging or merging. Most languages are related to other languages--to some more closely and to others more distantly. Linguists have used terms such as phylum, stock, family, branch, group, language, and dialect to refer to these relationships in increasing order of linguistic similarity.

Language classification information comes from a variety of sources. The Ethnologue attempts to report the generally accepted consensus of scholars working in the language family based on published works and scholarly review. For this edition, the language classifications for several major families have undergone thorough review and revision. The sources on which the classifications are based are not overtly cited in the language entry but may be included in the list of general references listed at the country level. The sources used for classifications are available on request by contacting the Editor; see Contact us.

A listing of the highest-level language families (including the number of languages, average populations, and countries where spoken) is given in the Statistical Summaries. The family trees may be browsed by going to Browse by Language Family.

Macrolanguage membership. If an individual language is a member of a macrolanguage (see Macrolanguages in “The problem of language identification”), that fact is reported following the language classification information. The listing gives the name of the macrolanguage, the name of the primary country under which its entry is found (if different from the current country), and the ISO code for the macrolanguage.

Dialects

The dialects element gives information about the names of dialects of the language. It may also describe the relationships among dialects or to other languages in terms of dialect intelligibility and lexical similarity.

Dialect names. Speech varieties which are functionally intelligible to each other's speakers because of linguistic similarity are considered dialects of the same language, and the names of all such dialects are listed under that language. In addition, alternate names for individual dialects are listed in parentheses following the primary name for the dialect. When one of these names is known to be offensive to its speakers, it is placed in double quotes (and tagged as pejorative with the abbreviation “pej.” as is also done for alternate language names).

The listing of dialect names does not represent the results of rigorous dialectological investigations. As with the alternate names, the dialect names list includes all names reported to us which may, at one time or another, have been used in reference to a local variety of a language. Some of these names are village or regional names and may not actually represent significant linguistic variants. In a few cases, the ISO 639-3 standard has assigned individual language identification codes to varieties which we, on the advice of our contributors and consultants, have included in our list of dialects. In those very few cases, we depart from the ISO 639-3 standard and do not list these varieties separately as individual languages.

Intelligibility and dialect relations. A measure of inherent intelligibility with other varieties is given by percent. Values of less than 85% are likely to signal difficulty in comprehension of the indicated language.

The ability of the users of one variety to understand another variety, based only on the similarity of those two varieties, is called inherent intelligibility. Intelligibility may not be reciprocal or mutual, so the wording of the intelligibility description may indicate the direction of the intelligibility (e.g., 85% intelligibility of another variety, or 85% intelligibility by speakers of another variety). If the direction of intelligibility is not indicated (e.g., 85% intelligibility with another variety) or is identified as being mutual, it should be understood as being reciprocal, with speakers of each of the varieties mentioned understanding each other equally well.

The ability of speakers to understand another variety because of previous exposure to it or learning is called acquired intelligibility and may be commented on in some language entries.

Lexical similarity. The percentage of lexical similarity between two linguistic varieties is determined by comparing a set of standardized wordlists and counting those forms that show similarity in both form and meaning. Percentages higher than 85% usually indicate a speech variant that is likely a dialect of the language with which it is being compared. Unlike intelligibility, lexical similarity is bidirectional or reciprocal.
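The counting procedure can be sketched in Python as follows. This is an illustration under stated assumptions rather than the method used to produce any published figure: the function name `lexical_similarity`, the dictionary layout (meaning mapped to word form), and the invented word forms are hypothetical, and the judgment of whether two forms are similar is left to a caller-supplied function because the description above does not define it.

```python
def lexical_similarity(wordlist_a, wordlist_b, is_similar):
    """Return the percentage of shared wordlist items judged similar.

    wordlist_a, wordlist_b -- dicts mapping a meaning (gloss) to the word
                              form used for it in each variety
    is_similar             -- function(form_a, form_b) -> bool supplied by
                              the caller; it encodes the similarity judgment
    Only meanings present in both wordlists are compared.
    """
    shared = [gloss for gloss in wordlist_a if gloss in wordlist_b]
    if not shared:
        raise ValueError("the wordlists have no meanings in common")
    hits = sum(1 for gloss in shared if is_similar(wordlist_a[gloss], wordlist_b[gloss]))
    return 100.0 * hits / len(shared)


# Invented forms for two hypothetical varieties.
variety_a = {"water": "tika", "fire": "molu", "stone": "saba", "dog": "reni"}
variety_b = {"water": "tika", "fire": "molo", "stone": "saba", "dog": "kuna"}

# Exact match is used here only as a stand-in for a real similarity judgment.
print(lexical_similarity(variety_a, variety_b, lambda a, b: a == b))
```

Swapping the two wordlists gives the same percentage as long as the similarity judgment itself is symmetric, which matches the statement that lexical similarity is reciprocal.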

Typology

For some languages, brief lists of linguistic phenomena found in the language are given. Constituent order (e.g. Subject, Object, Verb = SOV) is the most commonly reported feature. Other basic features that are of particular interest to linguists are also reported when the data is available. In a growing number of cases these listings are more extensive and cover a range of linguistic features, including information about the existence of prepositions versus postpositions, constituent order in noun phrases, gender, case, transitivity and ergativity, canonical syllable patterns, the number of consonants and vowels, the existence of tone, and in some cases whether users of the language also use "whistle speech". These descriptions are no more than brief mentions, however, and do not constitute adequate descriptions of the language. A list of the typological features that we wish to collect data on is available upon request.

Language Use

The language use element gives information about the use and viability of the language, as well as the use of other languages by members of the community.

Viability remarks. A number of viability indicators are given. As a general summary, where the language is being passed on to children as their first language, or where it is used frequently and widely within the community, the term “Vigorous” is most often used.

Other indicators are the number of people who use the language as their second language, and the degree of language shift of speakers of this language to another language (in some cases indicated by the percentage of speakers within the ethnic community). General estimates of viability may be given. These indicators and general comments are taken into account in assigning the EGIDS status.

Domains of use. When more than one language is used in a community, speakers often establish patterns of language use for specific configurations of speakers, topics, and locations. These domains of language use can be described by answering the well-known question, “Who is speaking to whom, about what, and where?” In some language entries, we are able to specify a set of identified domains of use and we also report whether the domain is associated exclusively with the language or is one where mixed language use is prevalent.

The Ethnologue does not have sufficient data about each language to permit a full description of the domains of use in this technical sense, but uses the term to refer most often to a general set of categories most often associated with a particular location (e.g., home, school, community) and thus only indirectly related to the topics and speakers most generally associated with those settings. Knowledge of these patterns of language use can help in evaluating ethnolinguistic vitality and in developing strategies for language revitalization or language development.

User age groups. Data reported in this category is most often reported in terms of generations (children, adolescents, young adults, adults, older adults) but occasionally specific age ranges may be identified.

As language use shifts from a traditional language to one of wider communication, differences in use appear between age groups. As language change takes place, older adults tend to be the final speakers of the traditional language. The use of a language by children is thus a significant indicator of the patterns of intergenerational language transmission which is key to language maintenance.

Language attitudes. We report only summary attitude evaluations as positive attitudes, neutral attitudes, or negative attitudes. In some cases these general categories may include an additional modifier, such as, "somewhat positive attitudes."

What people think and how they feel about their own language is important to those promoting literacy or other development activities as more positive attitudes generally correspond to stronger ethnolinguistic vitality. Attitudes are difficult to assess directly and equally difficult to describe adequately.

Bilingualism remarks. Brief comments about the use of second languages are included. Unless specific information is provided to us, we generally do not characterize levels of proficiency or domains of use of second languages. Generally the remark consists of the phrase "Also use" followed by the name(s) of the additional languages.

Because second languages are usually learned later than first languages and access to the means of acquiring proficiency in a second language is not always equally available, bilingualism is usually not uniform across a community. When speakers can use a second language, different speakers usually have varying degrees of bilingual proficiency in it, ranging from the ability to use only greetings, through the ability to engage in trade, to proficiency adequate for freely expressing anything in the second language. Language communities are sometimes reported to be bilingual if a few of the speakers can use a second language to some degree, or if there are no monolinguals, whereas other sources would not classify groups as bilingual unless a large majority of their members could use the second language very well. Leaders, the educated, men, traders, those who travel, those in population centers, and people in certain age groups may be more bilingual than others. Where information is available, these factors about bilingualism are described.

Language Development

The language development element gives information about literacy rates, written materials, and use in education.

Literacy rates. Where available, percentages of the speaker population who are literate are given for the first (L1) and second (L2) languages. Where identification of the second language is not given, it is assumed to be the dominant language of the country in focus or another major language in the vicinity.

Literacy remarks. Information concerning motivation for literacy and the existence of government (and other) literacy programs is given where available. Additional information concerning literacy that does not appear in related categories may also be given.

Use in elementary or secondary schools. The language may be used either as a language of instruction or taught as a subject within one or more schools in the language area. Generally, we only include a statement in this category if the language is used in the schools. Occasionally some additional information about the nature of that use is also available and is reported.

Publications and use in media. The existence of materials that have been produced in the language, such as linguistic documentation (dictionaries, grammars), printed materials, broadcast media, and new media (SMS, email, websites, etc.), is indicated when known. We report the existence of such materials but do not list titles individually. Where extensive literature and media exist, we identify the language as “Fully developed”. For many languages this information is very incomplete at this time. More information is welcomed, though it is unlikely that the Ethnologue will ever be able to document existing literature in a comprehensive way.

The most widely published book in the world is the Bible with at least portions having been translated and published in 2,848 or 40% of the living languages listed in the Ethnologue. This figure is based on the thorough archival efforts of the United Bible Societies and the American Bible Society. Information about Bible publication for each language is given with the dates of the earliest and most recent published Bible, New Testament (NT), Old Testament (OT), or complete books (portions).

Writing

For each language, the script used for written materials is given if known. Where multiple scripts are in use they are reported in alphabetical order. Where possible we also report any specific style of a script that is used, the years when a script began to be used or ceased to be used, and other comments regarding writing and orthography. Languages which are known to be unwritten are so identified. Since many languages use the Latin script, that fact is not always reported if its use is obvious.

Other Comments

The final element gives additional information that does not fit under the above categories.

General remarks. These are general statements about the language or its context that do not fall into other specific categories. Alternate identifications of the language community or ethnic group may be identified here. These may include government recognized or official nationalities, ethnic names (usually identified by the label "Ethnonym:"), or the identification of the meanings or derivations of certain names. Other historical or ethnographic information may be included here as well.

Religion. The religious affiliations of the speakers of the language are given where known. These are generally listed in descending order of number of adherents.

Member languages. If the entry is describing a macrolanguage (see Macrolanguages in “The problem of language identification”), then a complete list is given of the individual languages that fall within the scope of the macrolanguage.

Second Language Only status. While there are many languages that are used as second languages by large populations of speakers, the phrase “Second language only” is used to identify a specific category made up of those languages which are used as second languages but have no mother-tongue speakers and generally weak or secondary ethnic or identity associations.

These may include languages of special use, such as languages of initiation, languages of interethnic communication such as American Plains Indian Sign Language, liturgical languages, as well as cants and jargons. Most often these languages are given a status of EGIDS 3 (Wider Communication) but are identified here as well because of the absence of L1 speakers. Languages identified as Second Language Only are not included in the count of living languages. The number of languages in this category in each country is reported in the language statistics at the end of the Country Header.

Increasingly, this categorization may also seem applicable to some languages which previously were considered not to have any remaining speakers but where revitalization efforts are resulting in a community of emerging language users who have learned their heritage language as a second language. Re-emerging languages are identified by the EGIDS level assigned to them (EGIDS 9 initially). Such dormant or reawakening languages are listed in the body of the Ethnologue and are included among the world and major area statistical totals of living languages. The inventory of these languages is also incomplete and we welcome information that will expand our coverage.