When a language should be considered artificial - A quick classification of spoken, dead, hypothetical and invented languages

Following Mithridates’ latest post and comment on artificial languages as compared to revived languages, I consider it appropriate to share my point of view on this subject. For me, the schematic classification of languages into “natural” and “artificial” could be made more or less as follows, from ‘most natural’ (1) to ‘most artificial’ (20):

NOTE 1: There are 20 categories, but there could just as well be 4 (living, dead, reconstructed and invented), or 6, or 15, or a million categories corresponding to one language each, based on thorough statistical studies of vocabulary, grammar, ‘prestige’, etc. Thus, 20 is only the number that appeared after I sorted the languages I know into some personal, general and more or less straightforward classes; the aim of this classification is to locate where proto-languages (and especially Modern Indo-European or Europe’s PIE) stand when compared to natural languages and “conlangs”. It is also possibly the minimum number needed to show the interesting difference between categories 9 and 10.

NOTE 2: One may or may not agree with the languages given as examples of this or that particular category; however, the general concept behind the individual categories is what matters. For the term ‘(international) prestige’ as it is used here, I drew in part on Dutch sociologist Abram de Swaan’s Global Language System concept.

1. Spoken languages – with a continuous history of written use and international prestige – enough native historical vocabulary and grammar to communicate everything: English, German, French, Spanish, etc.
2. Spoken languages – with some (interrupted) history of written use and limited international prestige – enough historical vocabulary to build new necessary terms: Polish, Gaelic, Catalan, Occitan, etc.
3. Spoken languages – with limited historical written use or international prestige – limited vocabulary, clear need of lexical and grammatical borrowings (from 1 or 2) to speak in all situations: Ukrainian, Basque, Sardinian, Saami, etc.
4. Spoken languages with no written use at all – many expressions and vocabulary not available; taken if needed from prestigious languages (1 or 2, rarely 3): many Native American and African languages, and generally all so-called dialects (like Scots, Asturian or Piedmontese) not written down before the last century.
5. Dead languages – well attested, with enough history of use and [past] international prestige: Classical Latin, Koine Greek, Classical Sanskrit, etc.
6. Dead languages – some well attested history of use: Archaic Greek, Vedic Sanskrit, Old English, Old French, Old Church Slavonic, etc.
7. Dead languages – not well attested – need for some deciphering and/or interpretation of the writings: Hittite, Avestan, Old Norse, Gothic, Old Prussian, etc.
8. Dead languages – some writings only – deciphering and/or interpretation of the writings necessary – partially reconstructed with the help of related languages: Mycenaean, Oscan, Gaulish, Cornish, etc.
9. Hypothetical languages – no writings available – good archaeological knowledge – well reconstructed thanks to attested dialects and related languages: Proto-Germanic, Proto-Indo-Aryan, Proto-Slavic, Proto-Greek, Europe’s Proto-Indo-European, etc.
10. Dead languages – some writings only – writings difficult to decipher and/or interpret – available data not enough for a reliable reconstruction: Lusitanian, Thracian, Etruscan, Iberian, etc.
11. Hypothetical languages – no writings available – some archaeological knowledge – available reconstruction generally deemed correct by linguists – persistent controversy over reconstructed details: Proto-Celtic, Proto-Italic, Proto-Indo-European (III), etc.
12. Hypothetical languages – insufficient linguistic and archaeological [data for a reliable] reconstruction of the actual language, speakers and/or time span: Proto-Indo-European (II or “Indo-Hittite”), Proto-Uralic, Proto-Turkic, Proto-Semitic, Proto-Dravidian, etc.
13. Hypothetical languages – no academic consensus over their actual shape, but certainty of existence: Early PIE, Proto-Basque, Proto-Albanian, Proto-Armenian, etc.
14. Corrected languages – strictly based on spoken or dead languages with ‘improvements’: Latino sine flexione, etc.
15. Corrected languages – strictly based on hypothetical languages with ‘improvements’: Sambahsa-Mundialect (a modern PIE with an easier verbal and nominal inflection, borrowed [non-translated] IE vocabulary, etc.).
16. Invented languages – loosely based on a homogeneous group of spoken or dead languages: Germanic IAL (mostly Germanic base), Slovio (based on Slavic languages), Interlingua or Lingua Franca Nova (Romance languages), etc.
17. Invented languages – based on an arbitrary combination (usually deemed “the best” or “the easiest”) of spoken or dead language features: Volapük, Esperanto or Ido (taking mostly European languages); most modern IAL-oriented “conlangs” fit into this category.
18. Invented languages – artistic or fictional ones, based on living or dead languages or groups of languages, created following subjective impressions like the ‘beauty’ or ‘aggressiveness’ of their sounds or grammatical features: Klingon, Quenya, etc.
19. Invented languages – not based on any known native or hypothetical language, but still human-oriented: philosophical or mathematical languages, Lojban, etc.
20. Invented languages – not human-oriented.

Some additional comments on the language classes:

A) There is no single “completely artificial” or “completely natural” language. Even “level 1” languages, which develop new terms and syntax mostly from their own continuous use (and not from outside), need “artificial” or “imported” terms and sentences: like Spanish “hardware”, “software”, “mouse”, “te llamo de vuelta” (a literal translation of Eng. “I call you back”), or invented terms like “telefonear”, “televisión”, “ordenador/computador”, etc. Even among terms of Latin origin, an innovation is often artificially generalized as the standard: Spanish “murciélago” was in Old Spanish “murciego” (from Lat. mus caecus, lit. “blind mouse”, i.e. “bat”), which was extended to “murciégalo” and then metathesized to “murciélago”; now the Royal Spanish Academy Dictionary (which ‘rules over’ the Spanish ‘normative’ or formal language) states that the innovative murciélago is the formal or correct word; parents usually correct children who say “murciégalo”, and the use of that word is today generally considered a sign of vulgar speech. That is an example of what language regulation artificially adds to seemingly natural languages, just as Classical Latin or Classical Greek norms imposed artificial (or innovative) terms over traditional (i.e. native or more natural) ones. In fact, language regulation in international languages like English, Spanish or Portuguese makes the formal language still more artificial to its speakers, and innovative trends looking for a more natural language emerge: hence the Brazilian push for its own writing rules (and minority calls for being recognized as a different Galician-Portuguese language, like Galician), or the dialectal pride of US English and of Argentinian and Mexican Spanish, expressed in writing and pronunciation, with speakers adopting their own standards of formal speech, different from the historical one.
And even level 20 languages are ultimately based on human perception, so they are necessarily based on nature, and thus never fully artificial, however artificial they might look…

B) About the Classification:

Dead languages are considered “less natural” than ‘living’ ones because their testimony is not direct. We know of them (mostly) because of writings, so they cannot be “imitated” when spoken as naturally as when directly heard and learned (and pronunciation and style corrected) by native speakers.
Categories 9 and 10 might be interchangeable, depending on who you ask. For me, it’s obvious that we know the actual shape of a well-reconstructed language far better than that of a dead language attested only in some inscriptions nobody is able to read and interpret correctly; in that sense, Proto-Germanic is “more natural” than Etruscan, for example…
Also, “corrected” languages could be classified exactly after their “non-corrected” counterparts; thus, level 6 for Latino sine flexione (Classical Latin without declensions) or level 10-12 for Sambahsa-Mundialect (a European or Common PIE with a simplified inflection system). I don’t think that could be considered the most rational (general) classification, though, as a “corrected language” should be deemed less natural than any native language, and placed just before invented ones, because there are a thousand possible “corrections”, and it’s impossible to say which ones are “few enough” for a language to be considered “still natural”: for me, an arbitrarily and individually “corrected” language comes after a hypothetical one (reconstructed through linguistic studies) and just before a partially invented one, and a partially invented one before a fully invented one. Indeed, if there were a thousand particular classes instead of only 20 general ones, some corrected languages could and should be considered more natural than others.

C) It’s important to note that, just as we have to distinguish between Proto-Greek, Mycenaean, Archaic Greek, Classical Greek, Koine Greek, etc. when we talk about Greek, when we (at Dnghu) talk about Proto-Indo-European we refer to the non-laryngeal, Northwestern or European Proto-Indo-European (ca. 2500-2000 BC). The Indo-European language time span known to us is as follows:

Indo-European I (also Early PIE, Pre-PIE, Paleo-European, etc.) unknown, mostly hypothesis; evolved into Proto-Indo-European II. [Hypothetical locations proposed for IE Urheimat].
Indo-European II (ca. 4000? BC), reconstructed; evolved into Proto-Indo-European III and Hittite. [Map of Kurgan culture]
Indo-European III (ca. 3000 BC), well reconstructed; evolved into Europe’s Proto-Indo-European, Proto-Indo-Iranian, Proto-Greek and Proto-Armenian (possibly Proto-Graeco-Armenian?). [Archaeological map of Yamna & Maykop Cultures]
Europe’s Proto-Indo-European (ca. 2500-2000 BC); evolved into Proto-Germanic, Proto-Celtic, Proto-Italic, Proto-Baltic and Proto-Slavic, among others. [Archaeological map: expansion of Indo-European peoples]

So, when we talk about “reviving PIE for Europe”, we are talking about reviving European (or Northwestern) Proto-Indo-European, which is easier to reconstruct in the details of its vocabulary and syntax than the general, common Late PIE. Both are obviously well reconstructed and quite similar (as Old Italian is quite similar to Latin), but there is often no need to determine the exact phonetic value of this or that general PIE word: we only need its European value, which is logically more straightforward. Thus, PIE *pHter is European (and therefore Modern Indo-European) pater, because that’s how laryngeal *H evolved in the Northern dialect, no matter how that laryngeal actually sounded in the common Proto-Indo-European that was spoken in the steppe (or in Renfrew’s Anatolia) a thousand years earlier, and that gave Indo-Iranian pitar…

D) Ancient Hebrew probably belongs in category 6 (for some maybe 5), and Modern Hebrew or Israeli now fits into category 2 for most people, because it has no continuous language history and there is (or was) a clear need to borrow “foreign” vocabulary and expressions. Something similar could happen with the European PIE we want to revive, which is in level 9 (or 10) but could be in level 1 if revived: there is no need for “foreign” vocabulary or expressions to be adapted into PIE, as there are enough Indo-European words and expressions, not only because of the PIE reconstruction, but also because of the continuous history of Europe’s Indo-European languages, which allows their modern terms to be ‘translated back’ into PIE… Of course, it could always be considered a level 2 language, as there will be a need to adapt terms to PIE, like Greek oikonomia to IE woikonomia, etc. BUT the same need has existed in every Indo-European language, so it’s difficult to classify it (if revived) as 2. Indeed, as Mithridates puts it, Israeli and MIE could be considered level 6 and level 9 languages respectively forever, even if they became spoken; but, exactly as could happen with Esperanto or Ido, once a language is naturally spoken and naturally transmitted from older generations to newer ones, once there is a real generation of native speakers able to twist and shape it and make it evolve, I think it becomes a more natural one and changes category, even if we know that its original category was a different one.

NOTE: So, for example, the history of the Italic languages runs as follows: Proto-Italic (category 12-13); then Old Latin, probably within category 7-8; then Classical Latin (in category 1 in the year 1 AD, nowadays in category 5); and then the Romance languages (earlier in category 2 or 3, while Classical Latin was still the lingua franca), most of them now within modern categories 1-3.

The benefits of, or social need for, choosing languages from the upper levels over those from the lower ones whenever they are available and usable (like European PIE over Esperanto) is another question, which I have dealt (and will deal) with in other posts, and which is indeed a matter of personal opinion, like colours. But, to sum up, it’s not that I or others might prefer it from a rational point of view; the real point is that people, because of cultural and anthropological backgrounds not fully known to us, are apparently prepared to accept language revivals for cultural, social or political purposes (and hopefully proto-language revivals too, in light of the Cornish language revival, from category 8), while there have been no real success stories among invented languages, except within some limited groups of enthusiasts who continuously try to overestimate their number of speakers, prestige, use, etc.

So, if the objective is to speak a common language in the European Union (and not “to unite the world” or “to speak the easiest language possible” or “to communicate with a lingua franca”, etc.), just as there was a clear objective of speaking a common, unifying language in Israel, maybe the correct answer is to select the most rational common language among those available to us Europeans. We can keep speaking English, or a combination of English, French and German, or any combination of any three EU official languages; but, for me, the best option for becoming a truly united people of Europe is a common European PIE that we can speak as OUR language anywhere in Europe, not just a lingua franca or a combination of them.

Published by Carlos Quiles
