LINGUIST Codes for Ancient and Constructed Languages
Anthony Aristar, Wayne State University and LINGUIST List
DRAFT 2002-02-19
The workgroup's aim is to produce a supplementary set of language codes that will, in conjunction with the Ethnologue's set, constitute a complete set of codes for all languages of which there is any historical or current record. To this end, this document describes a proposal for supplementing Ethnologue's set with codes for ancient and constructed languages. It discusses the criteria for the Ethnologue codes, and how these might, with modifications, be extended to ancient and constructed languages. It then lists a proposed set of criteria by which codes should be assigned to ancient and constructed languages.
The most complete set of language codes in existence is the Ethnologue system (http://www.ethnologue.com/), produced and maintained by the Summer Institute of Linguistics. This system assigns a 3-letter code to every distinct natural language in existence. As noted by Constable and Simons 2001[1], the Ethnologue attempts to:
Every language description, what is more, always includes information on:
It is essential to note that the notion of mutual non-intelligibility, listed in criterion (1), is fundamental. Varieties of language are only assigned a code if they are mutually unintelligible with varieties of any language to which a code has already been assigned. Simply stated, a dialect whose manifestation is merely linguistically idiosyncratic should not normally merit its own code unless speakers of other dialects cannot understand it.
With reference to criterion (2), it should be noted that the Ethnologue system is intended to encompass only those languages of the world in current use. Thus the Ge'ez (Ethnologue code GEE) and Sanskrit (Ethnologue code SKT) languages both appear in Ethnologue, even though they have not been spoken by native speakers for many centuries, simply because they are in common liturgical use today. Akkadian (LINGUIST code XAKK), on the other hand, does not appear in the Ethnologue, simply because it has not been used in any function for almost 2000 years.
Most ancient languages - and many languages which have become extinct over the last 500 years - are thus absent from Ethnologue, along, of course, with all constructed languages except Esperanto (Ethnologue code ESP), which has a small number of native speakers.
There is one clear inconsistency in this regard in Ethnologue. Languages which are recently extinct often appear there, even when they have no current function. The Romance language Dalmatian, for example - spoken on the coast of modern Croatia until the late 19th century - has the Ethnologue code DLM. On the other hand the language Abipon (LINGUIST code XABI), a South American Indian language which became extinct at around the same time, does not appear in Ethnologue.
There are also two shortcomings in Ethnologue's system, and these have to do with the notions of provenance and conflict. Every language in Ethnologue is documented to a greater or lesser degree. But we usually do not have a clear idea of the evidence upon which it was decided to assign the language a unique code. Nor does the system allow for conflicting language classifications. For example, there is disagreement amongst scholars as to the classification of Low German dialects. This is not indicated in Ethnologue.
In mitigation of both these points, it might be noted that Ethnologue intends to include provenance information in the future, and that Ethnologue is not designed to provide a complete classification of the languages to which it assigns codes. The classification it does indicate is merely intended to be of service to those interested in such information.
Since OLAC (the Open Language Archives Community) is designed to allow the categorization of any language, a complete set of language codes must be available, a set which includes codes for both ancient and constructed languages. We propose here that the set of supplementary codes designed by LINGUIST for this purpose should become the OLAC standard, and that the union of the Ethnologue and LINGUIST code sets should be called the Universal Language Codes or ULC.
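The idea of the ULC as a union of two independently maintained code sets can be illustrated with a minimal Python sketch. It uses only the handful of codes quoted in this document; the dictionary names, the `build_ulc` helper, and the (name, source) record layout are illustrative assumptions, not part of the proposal or of any existing OLAC tool.

```python
# Minimal sketch: the ULC as the union of two code sets.
# Only codes quoted in this document are listed; the real sets are far larger.
# All names below (ETHNOLOGUE_CODES, LINGUIST_CODES, build_ulc) are hypothetical.

ETHNOLOGUE_CODES = {
    "GEE": "Ge'ez",
    "SKT": "Sanskrit",
    "ESP": "Esperanto",
    "DLM": "Dalmatian",
}

LINGUIST_CODES = {
    "XAKK": "Akkadian",
    "XABI": "Abipon",
    "XMIO": "Minoan",
    "XECR": "Eteocretan",
}


def build_ulc(ethnologue, linguist):
    """Merge the two code sets, recording which organization assigned each code."""
    # The union is only well defined if no code is claimed by both sets.
    overlap = ethnologue.keys() & linguist.keys()
    if overlap:
        raise ValueError(f"codes assigned by both organizations: {sorted(overlap)}")
    ulc = {}
    for code, name in ethnologue.items():
        ulc[code] = (name, "Ethnologue")
    for code, name in linguist.items():
        ulc[code] = (name, "LINGUIST")
    return ulc


ULC = build_ulc(ETHNOLOGUE_CODES, LINGUIST_CODES)
print(ULC["XAKK"])  # ('Akkadian', 'LINGUIST')
print(ULC["DLM"])   # ('Dalmatian', 'Ethnologue')
```

The overlap check reflects the requirement that the two sets never collide; recording the source alongside each name is one simple way to let a user see which organization is responsible for a given code.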
This set of supplementary codes should conform as closely as is reasonable to the standards set by Ethnologue. However, producing a useful set of codes for ancient and constructed languages entails considerable loosening of the Ethnologue standards. Most importantly, the criterion of mutual intelligibility is problematic here for three reasons. First, in some cases the criterion of mutual intelligibility has to be abandoned simply because the language was culturally distinct, and scholars treat it as a language in its own right. A case in point is Anglo-Norman, which was in reality an aberrant dialect of Old French. However, since it evolved independently, and has a literature distinct from that of Old French, which scholars treat separately, it must be assigned a distinct code so that work on it can be distinguished from work on Old French generally.
Mutual intelligibility also breaks down in another way. Ancient languages often have a diachronic dimension that can usually be ignored with modern languages. Old Latin gave rise to Classical Latin, which in turn gave rise to Late Latin, which in turn gave rise to Vulgar Latin or Proto-Romance. It is likely that no two adjacent stages of this complex process would have been mutually incomprehensible, had there been any speakers who could speak the two versions. How many codes do we assign here on the basis of mutual intelligibility?
There is also the issue of ancient languages in scripts which have as yet not been deciphered (e.g. Minoan, the language(s) of Linear A, LINGUIST code XMIO), or which cannot be understood, though their texts are written in scripts which can be read (e.g. Eteocretan, LINGUIST code XECR).
In conclusion then, we propose the following overriding standard for assigning codes to ancient languages:
In addition, we propose that the following should serve as criteria for the assigning of codes:
With regard to constructed languages, the following standards should apply:
Since we do not believe that any purpose is gained by multiplying language codes, we propose that LINGUIST codes should merely fill the gaps in the Ethnologue system. Thus, liturgical languages should remain part of the Ethnologue system, even if ancient; and all extinct languages already in Ethnologue should remain there.
For the future, however, a clear line of demarcation should exist, so that scholars will know which organization is responsible for assigning a code they need to a language. We propose that an arbitrary division should be decided on, thus:
To ensure that the LINGUIST codes are always clearly distinguishable from those of Ethnologue, and so that codes can be assigned by one organization without reference to the other, we further propose a distinction between Ethnologue and LINGUIST codes:
A list of all current LINGUIST ancient language codes can be found at the URL: http://linguistlist.org/ancientlgs.html.
A list of all current LINGUIST constructed language codes can be found at the URL: http://linguistlist.org/constructedlgs.html.