Ghost Languages? Ghostbusters needed!

MPLewis | November 1, 2015

A motto that we have around here at Ethnologue Central is that we aim to provide language information that is Clear, Concise, and Comprehensive (and, if you need another alliterative item, Correct, of course). Those goals don't always align easily. Sometimes by being concise, we lose clarity. And in our efforts to be comprehensive, we sometimes find ourselves wandering off the path of correctness. The topic this month, so-called "ghost languages," refers to cases where the Ethnologue lists languages for which there is no real evidence of their existence. These are not extinct languages, the so-called "dead languages"; for most of those there is clear evidence that they existed and were in use at one time. Ghost languages, in contrast, are somewhat akin to "false positives" or, reaching for another metaphor, "rumors." These are languages that somebody at some time "named" but for which there is no data to support that identification. It may seem like shoddy work on our part to have these languages included in the Ethnologue, but the topic warrants some examination. As with nearly everything, It's More Complicated Than You Might Think.

The first complication arises out of one of our strengths: the longevity of the Ethnologue as a research project. The Ethnologue has been around for a long time. It represents three generations of scholarship, with the mantle being passed down from editor to editor and field research passing through the hands of multiple investigators over the last 60+ years. As you can read in the History of the Ethnologue, the initial list of languages was quite short, but that list grew with each new edition and expanded significantly when, in 1971, the scope of the research was increased to include "all known languages of the world." Though that scope has since been trimmed back a bit to include only "living and recently extinct" languages, that shift is the source of our focus on providing comprehensive information.

While we have lots of documentary material in our paper and electronic files, it isn't always easy for us to determine when or why a language was added to the Ethnologue's inventory. We can, of course, look up a language in previous editions and keep going back until we don't see an entry for it. That gives us an idea (within a 4-5 year range) of when a language was identified and reported to the Ethnologue editor. We also turn to our files and look for correspondence that might give us some clues. It is often the case, though, that we can't determine what the evidence was for that inclusion. The legacy repertoire of identified languages became codified, first within the pages of the Ethnologue and more recently, with some modifications, in the ISO 639-3 standard. It serves as the baseline for our statistical analyses, and many people have come to treat it as a given: the true and authoritative list of the (living and recently extinct) languages of the world. While that's what we aim for, we recognize that, given the nature of the early aims and research methodology, neither we nor you should simply take that list on faith. The inclusion of a language (now) in the ISO 639-3 inventory should be based on evidence.
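For readers who like to see a procedure spelled out, the edition-by-edition lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the edition years and language names in `EDITIONS` are made up, the function name `first_appearance_window` is hypothetical, and nothing here reflects how the Ethnologue's files are actually stored or searched.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical data: edition year -> names listed in that edition.
# Years and names are invented for illustration only.
EDITIONS: dict[int, set[str]] = {
    1984: {"Alpha"},
    1988: {"Alpha", "Beta"},
    1992: {"Alpha", "Beta"},
    1996: {"Alpha", "Beta"},
}


def first_appearance_window(
    editions: dict[int, set[str]], name: str
) -> Optional[tuple[Optional[int], int]]:
    """Walk back from the newest edition until the entry disappears.

    Returns (last edition without the entry, first edition with it),
    with None in the first slot if the entry goes back to the oldest
    edition on file. Returns None if the newest edition lacks the entry.
    """
    years = sorted(editions, reverse=True)  # newest first
    if not years or name not in editions[years[0]]:
        return None
    earliest_with = years[0]
    for year in years[1:]:
        if name not in editions[year]:
            # The entry was added somewhere between these two editions.
            return (year, earliest_with)
        earliest_with = year
    return (None, earliest_with)


print(first_appearance_window(EDITIONS, "Beta"))  # (1984, 1988)
```

The result is a window rather than a date, which matches the point made above: the lookup narrows down when a language was reported, but it says nothing about what evidence supported the addition.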

The second complication is that the simple lack of evidence--data about the nature of the language, the number of present or former users, the location of those users, and all the other kinds of information that make an Ethnologue entry "complete"--doesn't necessarily mean that a language isn't there now or wasn't there in the last 60 years (our definition of "recently extinct"). Many small language groups live in very remote areas. Often they are intermixed with users of other languages (and are multilingual in those languages). The use of the language may not be visible to the casual or short-term observer. And even with more extensive observation, the domains and times of use may be so limited or so intimate (only in the home, exclusively among members of the group when no outsiders are present) as to go unremarked. And then, even when somebody may be aware of such a language, they may do no more than mention it in passing in a publication or a conference presentation.

If the language is now extinct, there is some utility in making sure that it gets added to the ISO 639-3 inventory as an extinct language.  This gives the scholarly community a three-letter code that can be used to clearly identify the now-dead language. That may seem like a small thing but it enables those who study languages in a region or in a particular language family to more easily know that they are talking about the same (or different) linguistic entities. And if the language went extinct after 1950, we want to make sure that it is clearly and concisely reported on in the Ethnologue.

The third complication is that when multiple sources report on the linguistic ecology of a region, different sources may use different names for the same languages. Sometimes one source identifies different varieties using different names where the ISO 639-3 standard lumps all of those varieties together under a single name. Often these different names come from the users of the language themselves. If those names were reported to Ethnologue prior to 2005, when Ethnologue began to use the ISO 639-3 standard, chances are they were added to the Ethnologue as separate languages. Wanting to be comprehensive, we included possibly previously unidentified languages rather than omitting them.

An "empty" Ethnologue entry for a language doesn't necessarily mean that the language is a ghost.  It may just be a case where not enough research has been done or reported to us. Unfortunately, while it may be possible to provide evidence that a language exists or has existed (linguistic descriptions, materials in the language, etc.), it is much more difficult to prove that a language never existed. This makes the suppression of these ghost languages all the more difficult, though we are making considerable progress in our ghostbusting efforts.

We are especially grateful for the work of several scholars who take the time to point out when a language included in the Ethnologue is spurious. These ghostbusters have provided us with a great deal of information that helps us achieve our goal of providing accurate information about the global linguistic ecology. Unfortunately, under the protocol that we have set for ourselves, Ethnologue cannot on its own remove a language from the ISO inventory. That requires a request to the ISO Registrar. We can, however, include a comment in the language entry indicating that the language identification is very likely erroneous or spurious. And when enough evidence is provided to us, we will, on occasion, submit the necessary forms to the ISO Registrar to initiate the removal process. Once the ISO standard "deprecates" (to use their term) a language identification, we are happy to adjust the Ethnologue to match, and those updates appear in each new annual edition.