When a language should be considered artificial – A quick classification of spoken, dead, hypothetical and invented languages

Following Mithridates’ latest post and comment on artificial language compared to revived language, I consider it appropriate to share my point of view on this subject. For me, the schematic classification of languages into “natural” and “artificial” could be made more or less as follows, from ‘most natural’ (1) to ‘most artificial’ (20):

NOTE 1: There are 20 categories, as there could be just 4 (living, dead, reconstructed and invented) or 6, or 15, or a million categories corresponding to one language each, based on thorough statistical studies of vocabulary, grammar, ‘prestige’, etc. Thus, 20 is only the number that appeared after I sorted the languages I know into some personal, general and more or less straightforward classes; the aim of this classification is to locate where proto-languages (and especially Modern Indo-European or Europe’s PIE) stand compared to natural languages and “conlangs”. It is also possibly the minimum number needed to show the interesting difference between categories 9 and 10.

NOTE 2: one may or may not agree on languages given as examples of this or that particular category; however, the general concept behind individual categories is what matters. For the term ‘(international) prestige’ as it is used here, I took in part as reference Dutch sociologist Abram de Swaan’s Global Language System concept.

  1. Spoken languages – with a continuous history of written use and international prestige – own historical vocabulary and grammar enough to communicate everything: English, German, French, Spanish, etc.
  2. Spoken languages – with some (interrupted) history of written use and limited international prestige – enough historical vocabulary to build new necessary terms: Polish, Gaelic, Catalan, Occitan, etc.
  3. Spoken languages – with limited historical written use or international prestige – limited vocabulary, clear need of lexical and grammatical borrowings (from 1 or 2) to speak in all situations: Ukrainian, Basque, Sardinian, Saami, etc.
  4. Spoken languages with no written use at all – many expressions and vocabulary not available; taken if needed from prestigious languages (1 or 2, rarely 3): many native American and African languages, and generally all so-called dialects (like Scots, Asturian or Piedmontese) not written down before the last century.
  5. Dead languages – well attested, with enough history of use and [past] international prestige: Classical Latin, Koine Greek, Classical Sanskrit, etc.
  6. Dead languages – some well attested history of use: Archaic Greek, Vedic Sanskrit, Old English, Old French, Old Church Slavonic, etc.
  7. Dead languages – not well attested – need for some deciphering and/or interpretation of the writing: Hittite, Avestan, Old Norse, Gothic, Old Prussian, etc.
  8. Dead languages – some writings only – deciphering and/or interpretation of the writing necessary – partially reconstructed with the help of related languages: Mycenaean, Oscan, Gaulish, Cornish, etc.
  9. Hypothetical languages – no writings available – good archaeological knowledge – well reconstructed thanks to attested dialects and related languages: Proto-Germanic, Proto-Indo-Aryan, Proto-Slavic, Proto-Greek, Europe’s Proto-Indo-European, etc.
  10. Dead languages – some writings only – difficult deciphering and/or interpretation of the writing – available data not enough for a reliable reconstruction: Lusitanian, Thracian, Etruscan, Iberian, etc.
  11. Hypothetical languages – no writings available – some archaeological knowledge – reconstruction available generally deemed correct by linguists – persistent controversy over reconstructed details: Proto-Celtic, Proto-Italic, Proto-Indo-European (III), etc.
  12. Hypothetical languages – insufficient linguistic and archaeological [data for a trustable] reconstruction of actual language, speakers and/or time span: Proto-Indo-European (II or “Indo-Hittite”), Proto-Uralic, Proto-Turkic, Proto-Semitic, Proto-Dravidian, etc.
  13. Hypothetical languages – no academic consensus over its actual shape, but certainty of existence: Early PIE, Proto-Basque, Proto-Albanian, Proto-Armenian, etc.
  14. Corrected languages – strictly based on spoken or dead languages with ‘improvements’: Latino sine flexione, etc.
  15. Corrected languages – strictly based on hypothetical languages with ‘improvements’: Sambahsa-Mundialect (a modern PIE with an easier verbal and nominal inflection, borrowed [non-translated] IE vocabulary, etc.).
  16. Invented languages – loosely based on a homogeneous group of spoken or dead languages: Germanic IAL (mostly Germanic base), Slovio (based on Slavic languages), Interlingua or Lingua Franca Nova (Romance languages), etc.
  17. Invented languages – based on an arbitrary combination (usually deemed “the best” or “the easiest”) of spoken or dead language features: Volapük, Esperanto or Ido (taking mostly European languages); most modern IAL-oriented “conlangs” fit into this category.
  18. Invented languages – artistic or fictional ones, based on living or dead languages or group of languages, created following subjective impressions like ‘beauty’ or ‘aggressiveness’ of its sounds or grammatical features: Klingon, Quenya, etc.
  19. Invented languages – not based on any known native or hypothetical language, but still human-oriented: philosophical or mathematical languages, Lojban, etc.
  20. Invented languages – not human-oriented.
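
For readers who prefer to see the scale as data, here is a minimal sketch of how the list above could be encoded. The names `Kind`, `KIND_BY_CATEGORY` and `more_natural` are my own illustrative choices, not part of any existing tool; the mapping is simply transcribed from the 20 categories, and it makes visible that the broad kinds interleave at categories 9 and 10.

```python
from enum import Enum


class Kind(Enum):
    """Broad kinds used in the list above (labels are illustrative)."""
    SPOKEN = "spoken"
    DEAD = "dead"
    HYPOTHETICAL = "hypothetical"
    CORRECTED = "corrected"
    INVENTED = "invented"


# Category number (1 = most natural, 20 = most artificial) -> broad kind,
# transcribed from the list; note that 9 (hypothetical) precedes 10 (dead).
KIND_BY_CATEGORY = {
    **{n: Kind.SPOKEN for n in range(1, 5)},
    **{n: Kind.DEAD for n in (5, 6, 7, 8, 10)},
    **{n: Kind.HYPOTHETICAL for n in (9, 11, 12, 13)},
    **{n: Kind.CORRECTED for n in (14, 15)},
    **{n: Kind.INVENTED for n in range(16, 21)},
}


def more_natural(a: int, b: int) -> bool:
    """True if category a ranks as more natural than category b on this scale."""
    return a < b


# Example from the list: Proto-Germanic (9) versus Etruscan (10).
print(KIND_BY_CATEGORY[9].value, KIND_BY_CATEGORY[10].value)
print(more_natural(9, 10))
```

The single ordered number is a deliberate simplification: it treats “naturalness” as one dimension, which is exactly the assumption of this classification.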

Some additional comments on the language classes:

A) There is no single “completely artificial” or “completely natural” language. Even “level 1” languages, which develop new terms and syntax mostly from their continuous use (and not from outside), have a need for “artificial” or “imported” terms and sentences: like Spanish “hardware”, “software”, “mouse”, “te llamo de vuelta” (a literal translation of Eng. I call you back), or invented terms like “telefonear”, “televisión”, “ordenador/computador”, etc. Even within terms of Latin origin, innovation is often artificially generalized as the standard: as in Spanish “murciélago”, which was in Old Spanish “murciego” (from Lat. mus-caecus, lit. “blind mouse”, “bat”), extended to “murciégalo”, then metathesized to “murciélago”; now, the Royal Spanish Academy Dictionary (which ‘rules over’ the Spanish ‘normative’ or formal language) states that the innovative murciélago is the formal or correct word; parents usually correct children who say “murciégalo”, and the common use of that word is today generally considered a sign of vulgar speech. That is an example of what language regulation artificially adds to seemingly natural languages, just as Classical Latin or Classical Greek norms imposed artificial (or innovative) terms over traditional (i.e. native or more natural) ones. In fact, language regulation in international languages like English, Spanish or Portuguese makes the formal language still more artificial to its speakers, and innovative trends looking for a more natural language emerge: hence the Brazilian push for its own writing rules (and minority calls for being recognized as a different Galician-Portuguese language, like Galician), or US English, Argentinian and Mexican Spanish dialectal pride, expressed in writing and pronunciation, adopting their own standards of formal speech different from the historical one.
And even level 20 languages are ultimately based on human perception, so they are necessarily based on nature, and thus never fully artificial, however artificial they might look…

B) About the Classification:

  1. Dead languages are considered “less natural” than ‘living’ ones because their testimony is not direct. We know of them (mostly) because of writings, so they cannot be “imitated” when spoken as naturally as when directly heard and learned (and pronunciation and style corrected) by native speakers.
  2. Categories 9 and 10 might be interchangeable, depending on who you ask. For me, it’s obvious that a well-reconstructed language is far ‘better’, in the actual shape and knowledge we have of it, than dead languages with some inscriptions nobody is able to read and interpret correctly; in that sense, Proto-Germanic is “more natural” than Etruscan, for example…
  3. Also, “corrected” languages could be classified exactly after their “non-corrected” counterparts; thus, level 6 for Latino sine flexione – Classical Latin without declensions – or level 10-12 for Sambahsa-Mundialect – as a European or Common PIE with a simplified inflection system. I don’t think that could be considered the most rational (general) classification, though, as a “corrected language” should be deemed less natural than any other native language, and just before invented ones – because there are a thousand possible “corrections”, and it’s impossible to say which ones are “few enough” for a language to be considered “still natural”: for me, an arbitrarily and individually “corrected” language comes after a hypothetical one (reconstructed through linguistic studies), and just before a partially invented one, and a partially invented one before a fully invented one. Indeed, if there were a thousand particular classes instead of only 20 general ones, some corrected languages could and should be considered more natural than others.

C) It’s important to note that, as when we talk about Greek we have to distinguish between Proto-Greek, Mycenaean, Archaic Greek, Classical Greek, Koine Greek, etc., when we (at Dnghu) talk about Proto-Indo-European, we refer to the non-laryngeal, Northwestern or European Proto-Indo-European (ca. 2500-2000 BCE). The Indo-European language time span known to us is as follows:

  1. Indo-European I (also Early PIE, Pre-PIE, Paleo-European, etc.) unknown, mostly hypothesis; evolved into Proto-Indo-European II. [Hypothetical locations proposed for IE Urheimat].
  2. Indo-European II (ca. 4000? BC), reconstructed; evolved into Proto-Indo-European III and Hittite. [Map of Kurgan culture]
  3. Indo-European III (ca. 3000 BC), well reconstructed; evolved into Europe’s Proto-Indo-European, Proto-Indo-Iranian, Proto-Greek and Proto-Armenian (possibly Proto-Graeco-Armenian?). [Archaeological map of Yamna & Maykop Cultures]
  4. Europe’s Proto-Indo-European (ca. 2500-2000 BC); evolved into Proto-Germanic, Proto-Celtic, Proto-Italic, Proto-Baltic and Proto-Slavic, among others. [Archaeological map: expansion of Indo-European peoples]

So, when we talk about “reviving PIE for Europe”, we are talking about reviving European (or Northwestern) Proto-Indo-European, which is easier to reconstruct in its vocabulary and syntax details than the general, common Late PIE. Both are obviously well-reconstructed and quite similar (as Old Italian is quite similar to Latin), but there is often no need to determine the exact phonetic value of this or that general PIE word: we only need its European value, which is logically more straightforward. Thus, for PIE *pHter, the European (and therefore Modern Indo-European) form is pater, because that’s how laryngeal *H evolved in the Northern dialect, no matter how that laryngeal actually sounded in the common Proto-Indo-European that was spoken in the steppe (or in Renfrew’s Anatolia) a thousand years earlier, to give an Indo-Iranian pitar.

D) Ancient Hebrew probably enters into category 6 (for some maybe 5), and now Modern Hebrew or Israeli fits into category 2 for most people – because there is no continuous language history, and there is (or was) a clear need to borrow “foreign” vocabulary and expressions. That’s similar to what could happen with the European PIE we want to revive, which is in level 9 (or 10), but could be in level 1 if revived – because there is no need for “foreign” vocabulary or expressions to be adapted into PIE, as there are enough Indo-European words and expressions, not only because of the PIE reconstruction, but because of the continuous history of Europe’s Indo-European languages, that allow its modern terms to be ‘translated back’ into PIE… Of course, it could always be considered a level 2 language, as there will be a need to adapt terms to PIE: like Greek oikonomia to IE woikonomia, etc. BUT, the same need did exist in every Indo-European language, so it’s difficult to classify it (if revived) as 2. Indeed, as Mithridates puts it, both Israeli and MIE could be considered level 6 and level 9 languages respectively forever, even if they became spoken, but – exactly as it could happen with Esperanto or Ido – once a language is naturally spoken and naturally transmitted from older generations to newer ones – once there is a real generation of native speakers able to twist and shape it, and make it evolve – I think it becomes a more natural one and changes category, even if we know that its original category was a different one.

NOTE: So, for example, in the history of Italic languages: Proto-Italic (category 12-13), then Old Latin, probably within category 7-8, which became Classical Latin (in category 1 in year 1 AD, nowadays in category 5), and then the Romance languages (earlier in category 2 or 3, while Classical Latin was still the lingua franca), most of them now within modern categories 1-3.

As for the benefits or social need to choose languages from the upper levels rather than the lower ones, if they are available and it’s possible to use them (like European PIE over Esperanto), that is another question I have dealt (and will deal) with in other posts, and which is indeed a matter of personal opinion, like colours. But, to sum it up, it’s not that I or others might prefer it from a rational point of view; the real question is that people – because of their cultural and anthropological backgrounds, not fully known to us – are apparently prepared to accept language revivals – hopefully then proto-language revivals too, in light of the Cornish language revival (from category 8) – for cultural, social or political purposes, while there have been no real success stories among invented languages, except for some limited groups of enthusiasts who continuously try to overestimate their number of speakers, prestige, use, etc.

So, if the objective is to speak a common language in the European Union (and not “to unite the world” or “to speak the easiest language possible” or “to communicate with a lingua franca”, etc.), just like there was a clear objective of speaking a common, unifying language in Israel, maybe the correct answer is to select the most rational common language among those available for us Europeans. We can keep speaking English, or a combination of English-French-German, or any combination of any three EU official languages; but, for me, a common European PIE that we can speak as OUR language anywhere in Europe, not just a lingua franca or a combination of them, is the best option to be a really united people of Europe.

9 thoughts on “When a language should be considered artificial – A quick classification of spoken, dead, hypothetical and invented languages”

  1. Buenas tardes Carlos!

    I’m OK with the description given of sambahsa in category n°15. I wonder if there are other conlangs which would pass into this category.
    As it is written on langmaker.com, the uniqueness of sambahsa is that it is a mixture of conlang and reconstructed language (and this makes, I suppose, sambahsa’s strength and richness too).
    Below, you suppose that sambahsa could eventually be ranked under 10 and 12. I agree with 12, but I don’t see why with 10; I’d rather say 9 or 11.
    Sambahsa still has a four-case declension (nominative, accusative, dative and genitive), but it is only compulsory for pronouns. Those who have difficulty with the full grammar in French on my blog can click on the message “dialogs in sambahsa-mundialect”, which is entirely in English. Lesson n°5 explains the pronominal system of sambahsa.

    Best wishes for your work Carlos! That’s great!


  2. Hi, Olivier! As I said, the classification is about the general categories as they are related to Europe’s PIE, more than about other languages’ exact classes. In fact, corrected languages like Sambahsa-Mundialect or Latino sine flexione are the most difficult to classify, as they are apparently ‘more artificial than’ their counterparts, but nobody – before a thorough study of the modifications or corrections made – could ascertain to which degree they are in fact ‘more artificial than’ other living or dead languages, proto-languages or conlangs… I classified them as a whole and in general just before artificial languages, because that’s where they necessarily end up if “too corrected to be the same language”, as we discussed regarding Uropi, which is almost unrelated to PIE… But, indeed, the fewer modifications a language has – and possibly the more such corrections are adapted to its living or dead dialects – the less artificial it could be considered, as you say. I’ve corrected the definition given of Sambahsa-Mundialect, though 😉

    I’m sorry, the Akismet plugin classified your comment as spam. It was by chance that I looked in the spam comments; take into account that there can be hundreds each week… If it happens again that your comment doesn’t appear, please contact me. It possibly happened because you put the same website twice – in your nick and in the post. I think you can put up to three different ones within the post without problems, apart from your nick’s one, but I don’t really know how “Akismet anti-spam rules” (so to speak) work: you can link to different pages or posts of your blog, but – I guess – not to exactly the same one twice. And I cannot live right now without Akismet deleting all spam comments for me 🙁 By the way, I’ve added to your comment a direct link to that “dialog section” of your website you referred to.

  3. I think this is a good summary of a complicated field.

    I would attach “spoken language” to Esperanto. It is spoken in a way that Latino sine flexione or Novial are not.

  4. I’m fed up with reading on your pages (here Indo-European) that Uropi is almost unrelated to Proto-Indo-European. I’ve made a list of 360 Uropi root-words which come directly from Indo-European roots (and the list is not yet completed). You can find examples of these roots on the Uropi blog: http.//uropi.canablog.com/
    Of course, Uropi has not kept the original shape of these roots, which seem rather prehistoric to me. But this has been the destiny of these roots when they have survived in modern European languages. I-E sâwel has given Spanish sol, and guess what the Uropi word is? sol. I-E swépmi / swôpeyô has given Latin sopor (sleep) and Scandinavian sove/sova (to sleep): the Uropi word is sopo.
    You only have to open a page of the Uropi-French dictionary to see that the immense majority of Uropi words come from P.I.E. roots. For instance, out of the 12 on the first S page (words in sa-), only 4 are not of Indo-European origin: sabel, from Hungarian szab (sabre); sak (sack), from Greek (an I-E language) sakkos; and sabat, sabadia (Sabbath, Saturday), from Hebrew shabbat.

  5. @Joël Landais:

    I’m sorry to hear you are fed up with my opinion about Uropi not being related to Proto-Indo-European. But I think you contradict yourself by asserting Uropi is related to PIE and at the same time:

    Of course, Uropi has not kept the original shape of these roots which seem rather prehistoric to me.

    And giving examples of how you changed your selected vocabulary, as e.g. *sawel for “sol”, because you found it “less prehistoric” that way. Not to talk about the morphology and syntax, which are rather “modern European made easy”, as you put it.

    For me – and for some others with whom I’ve discussed it – yours is another artificial language, loosely based on PIE, of course, just like Esperanto was loosely based on some European languages. Which is fine.

    Personally, I think it was, as an auxlang project, another step in the right direction after Esperanto’s failure. Not less, but not more than a personal conlang. Good luck.

  6. “after Esperanto’s failure. ”

    What are you talking about? Esperanto is still the biggest artificial language, in terms of literature output, speakers, history etc. It has failed to take over the world, but not as a language.

  7. I’m very puzzled at why you consider languages without a written form, or which only recently acquired a written form, less natural than languages with a longer history of writing. I would consider all languages with a speaker community, which have evolved naturally from older languages which had speaker communities as far back as we know about, to be equally natural, whether they have written forms or not.

    Further, I’m dubious about your classifying less well-attested natural languages as “less natural” than better-attested natural languages. The languages themselves are equally natural; it’s only our knowledge of them that differs. It would be clearer, perhaps, if you say that *our reconstruction of* a dead language is something of a conlang, generally more naturalistic than a conlang made up from scratch but less natural than a living natlang, or than the dead natlang in question was when it was spoken.

    It seems to me that borrowing vocabulary from other natlangs is not a less natural process than compounding, derivation, or stretching the meanings of old words. But if you want to classify languages as less natural because they use many loanwords, why is English in category #1?

    You discuss language academies etc. a little in the latter part of your post, but they don’t seem to figure in your classification scheme proper. I would suggest that colloquial spoken French is as natural as any other natlang spoken by a large and vigorous speaker community, but that standardized written French as regulated by the Academy is somewhat less natural than the typical natlang, though more natural than a reconstructed proto-language or a conlang that has no speakers.

    As an earlier commenter pointed out, your classification of conlangs as more or less natural, more or less artificial, only takes into account their origin as imitating some natural language more or less closely (and that, only in a particular way). It leaves out of account whether the language has acquired a speaker community whose actual usage is now normative, rather than (or more so than) the original language inventor’s blueprints. Esperanto, Interlingua, Ido, probably Toki Pona, maybe even Lojban are all “more natural” by this criterion than Latino Sine Fleksione or Sambahsa, which you rank above them in naturalness.

    Furthermore, your system takes no account, as far as I can tell, of conlangs derived from natural languages not by “correcting” them, as you describe LSF and some other auxlangs, but by putting them through a fictional but naturalistically plausible alternate history of sound changes, grammar changes, and so forth (e.g. Brithenig, Wenedyk, and so on). I would consider these to be more naturalistic than Interlingua or Esperanto, by the origin criterion, but less natural (by the “live speaker community” criterion) than those actually spoken auxlangs. This suggests that we might want a multi-dimensional scheme for classifying languages, taking into account both size and activity of the speaker community, adherence to known linguistic universals, origin of vocabulary and grammar, etc.
