Evolutionary forces in language change depend on selective pressure, but also on random chance


An interesting new paper in Nature: Detecting evolutionary forces in language change, by Newberry, Ahern, Clark, and Plotkin (2017). Discovered via Science Daily.

The following are excerpts of materials related to the publication (written by Katherine Unger Baillie), from The University of Pennsylvania:

Examining substantial collections of annotated texts dating from the 12th to the 21st centuries, the researchers found that certain linguistic changes were guided by pressures analogous to natural selection — social, cognitive and other factors — while others seem to have occurred purely by happenstance.

“Linguists usually assume that when a change occurs in a language, there must have been a directional force that caused it,” said Joshua Plotkin, professor of biology in Penn’s School of Arts and Sciences and senior author on the paper. “Whereas we propose that languages can also change through random chance alone. An individual happens to hear one variant of a word as opposed to another and then is more likely to use it herself. Chance events like this can accumulate to produce substantial change over generations. Before we debate what psychological or social forces have caused a language to change, we must first ask whether there was any force at all.”

“One of the great early American linguists, Leonard Bloomfield, said that you can never see a language change, that the change is invisible,” said Robin Clark, a coauthor and professor of linguistics in Penn Arts and Sciences. “But now, because of the availability of these large corpora of texts, we can actually see it, in microscopic detail, and begin to understand the details of how change happened.”

One change is the regularization of past-tense verbs. Using the Corpus of Historical American English, comprised of more than 100,000 texts ranging from 1810 to 2009 that have been parsed and digitized — a database that includes more than 400 million words — the team searched for verbs where both regular and irregular past-tense forms were present, for example, “dived” and “dove” or “wed” and “wedded.”

“There is a vast literature and a lot of mythology on verb regularization and irregularization,” Clark said, “and a lot of people have claimed that the tendency is toward regularization. But what we found was quite different.”

Indeed, the analysis pointed to particular instances where it seems selective forces are driving irregularization. For example, while a swimmer 200 years ago might have “dived”, today we would say they “dove.” The shift towards using this irregular form coincided with the invention of cars and concomitant increase in use of the rhyming irregular verb “drive”/“drove.”

Despite finding selection acting on some verbs, “the vast majority of verbs we analyzed show no evidence of selection whatsoever,” Plotkin said.

The team recognized a pattern: random chance affects rare words more than common ones. When rarely-used verbs changed, that replacement was more likely to be due to chance. But when more common verbs switched forms, selection was more likely to be a factor driving the replacement.
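The pattern described above (chance weighs more heavily on rare words) can be illustrated with a toy simulation. This is a minimal sketch of neutral drift in the style of a Wright-Fisher copying model, not the inference method used in the paper; the function names, token counts, and number of generations are all assumptions chosen for illustration. Each generation, every token of a verb copies one of two variants (say "dived" or "dove") at random, in proportion to the previous generation's usage, with no selective pressure at all.

```python
import random


def simulate_drift(tokens_per_generation, generations, start_freq=0.5, rng=None):
    """Return the frequency of one variant over time under pure random copying.

    Hypothetical toy model: each of the `tokens_per_generation` uses of a verb
    picks the variant with probability equal to its previous frequency.
    """
    if rng is None:
        rng = random.Random()
    freq = start_freq
    history = [freq]
    for _ in range(generations):
        # Count how many tokens happen to copy the variant this generation.
        count = sum(1 for _ in range(tokens_per_generation) if rng.random() < freq)
        freq = count / tokens_per_generation
        history.append(freq)
    return history


def mean_abs_shift(tokens_per_generation, generations=20, replicates=100, seed=0):
    """Average absolute change in frequency between first and last generation."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(replicates):
        history = simulate_drift(tokens_per_generation, generations, rng=rng)
        total += abs(history[-1] - history[0])
    return total / replicates


# A "rare" verb (few tokens per generation) versus a "common" one (many tokens).
rare_shift = mean_abs_shift(tokens_per_generation=20)
common_shift = mean_abs_shift(tokens_per_generation=1000)
print(f"rare verb, mean shift by chance alone:   {rare_shift:.3f}")
print(f"common verb, mean shift by chance alone: {common_shift:.3f}")
```

In this sketch the rare verb's frequency wanders much further than the common verb's, even though nothing favors either variant. That is why a large shift in a common verb is harder to explain without selection.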

The grammar of negating a sentence has changed from “Ic ne secge” (Beowulf, c. 900) to “Ic ne sege noht” (the Ormulum, c. 1100) to “I seye not” (Chaucer, c. 1400) to “I doe not say” (Shakespeare, c. 1600) before returning to the familiar “I don’t say” (Virginia Woolf, c. 1900). A team from Penn used massive digital libraries along with inference techniques from population genetics to quantify the forces responsible for language evolution, such as in Jespersen’s cycle of negation, depicted here. (c) Cherissa Dukelow, 2017, license information below

The authors also observed a role of random chance in grammatical change. The periphrastic “do,” as used in, “Do they say?” or “They do not say,” did not exist 800 years ago. Back in the 1400s, these sentiments would have been expressed as, “Say they?” or “They say not.”

Using the Penn Parsed Corpora of Historical English, which includes 7 million syntactically parsed words from 1,220 British English texts, the researchers found that the use of the periphrastic “do” emerged in two stages, first in questions (“Don’t they say?”) around the 1500s, and then roughly 200 years later in imperative and declarative statements (“They don’t say.”).

These manuscripts show changes from Old English (Beowulf) through Middle English (Trinity Homilies, Chaucer) to Early Modern English (Shakespeare’s First Folio). Penn researchers used large collections of digitized texts spanning the 12th to the 21st centuries to show that many language changes can be attributed to random chance alone. (c) Mitchell Newberry, 2017, license information below

While most linguists have assumed that such a distinctive grammatical feature must have been driven to dominance by some selective pressure, the Penn team’s analysis questions that assumption. They found that the first stage of the rising periphrastic “do” use is consistent with random chance. Only the second stage appears to have been driven by a selective pressure.

“It seems that, once ‘do’ was introduced in interrogative phrases, it randomly drifted to higher and higher frequency over time,” said Plotkin. “Then, once it became dominant in the question context, it was selected for in other contexts, the imperative and declarative, probably for reasons of grammatical consistency or cognitive ease.”

As the authors see it, it’s only natural that social-science fields like linguistics increasingly exchange knowledge and techniques with fields like statistics and biology.

“To an evolutionary biologist,” said Newberry, “it’s important that language is maintained through a process of copying language; people learn language by copying other people. That copying introduces minute variation, and those variants get propagated. Each change is an opportunity for a different copying rate, which is the basis for evolution as we know it.”

Featured image: copyrighted, modified from the Supplementary information of the article.

Image (c) Cherissa Dukelow, 2017, licensed under CC-BY-NC-SA 4.0 http://creativecommons.org/licenses/by-nc-sa/4.0/
Image (c) Mitchell Newberry, 2017, licensed under CC-BY-NC 4.0 https://creativecommons.org/licenses/by-nc/4.0/ (see materials at University of Pennsylvania for further sources).


About the European Union’s arcane language: the EU does seem difficult for people to understand

Mark Mardell asks in his post Learn EU-speak:

Does the EU shroud itself in obscure language on purpose or does any work of detail produce its own arcane language? Of course it is not just the lingo: the EU does seem difficult for people to understand. What’s at the heart of the problem?

His answer on the radio (like the comments that can be read on his blog) will probably look for complex reasoning about the nature of the European Union as an elitist institution, distant from real people, about the “obscure language” (intentionally?) used by MEPs, about the need for that language to be obscured by legal terms, etc.

All that is great. You can talk a lot about the possible reasons why people would find those Europarliament discussions, where everyone speaks their own national language, too boring; possible reasons why important media (like the BBC) would never show debates on important issues unless the MEP uses their national language; possible reasons why that doesn’t happen with national parliaments, where everyone speaks a common language…

But the most probable answer is so obvious it doesn’t really make sense to ask. The interesting question is: do people actually want to pay the price for having a common Europe?

A simple FAQ about the “advantages” of Esperanto and other conlang religions: “easy”, “neutral” and “number of speakers”

This is, as requested by a reader of the Association’s website, a concise FAQ about Esperanto’s supposed advantages:

Note: Information and questions are being added to the FAQ thanks to the comments made by visitors.

1. Esperanto has an existing community of speakers, it is used in daily life, it has native speakers…

Sorry, I don’t know any native speaker of Esperanto, anyone who has Esperanto as a mother tongue; I only know this Wikipedia article and the Ethnologue “estimations”, which have no references apart from the UEA website. In fact, the only people who are said to be “native Esperanto speakers” are those 4 or 5 famous people who assert they were educated in Esperanto as a second language by their parents. Is it enough to assert “I was taught Volapük as mother tongue by my parents” or “I taught my children Esperanto as mother tongue” to believe it, and report “native speaker” numbers? Do, in any case, those dozens of (in this Esperantist sense) native speakers of Klingon or Quenya that have been reported in the press represent something more than a bad joke by their parents?

Furthermore, there is no single community of speakers that uses Esperanto in daily life; I just know of some yearly so-called World Congresses where Esperantists use some Esperanto words with each other, just like Trekkies use Klingon words in their Congresses, or LOTR fans use Quenya words. Figures about ‘Esperanto speakers’ – and speakers of Interlingua, Ido, Lingua Franca Nova, Lojban or any other conlang – are unproven (there is no independent, trustworthy research), and numbers are usually given by their supporters using rough and simple estimations, when not completely invented. Studies have been prepared, explained, financed and directed by national or international associations like the “Universala Esperanto-Asocio”, sometimes through some of its members from different universities, which doesn’t turn those informal studies into “University research”. The answer is not “let’s learn creationism until evolution is proven”, but the other way round, because the burden of proof is on the least substantiated claim: if you want people to learn a one-man-made code to substitute their natural languages, then first bring the research and then talk about its proven advantages. Esperantists and other conlangers do the opposite, just like proponents of “alternative” medicines, “alternative” history or “alternative” science, and therefore any outputs are corrupted from the start by their false expectations: facts are blurred, figures overestimated and findings biased, in the best case.

2. But people use it in Skype, Firefox, Facebook,… and there are a lot of Google hits for “Esperanto”. And the Wikipedia in Esperanto has a lot of articles!

So what? The Internet is not the real world. If you look for “herbal medicine”, “creationism” or “penis enlargement”, you’ll find a thousand times more information and websites (“Google hits”) than when looking for serious knowledge, say “surgery”. Likewise, you can find more websites in Esperanto than in Modern Hebrew, but Hebrew already has a strong community of (at least) some millions of third-generation native speakers who use Hebrew in daily life, while Esperanto – which had the broadest potential community – has just some hundreds of fans who play with new technologies, even though both language projects began at the same time back in the 19th century.

Also, is the Wikipedia not a language-popularity contest? A competition between conlangers, like Volapükists vs. Esperantists, Ido-ists against Interlingua-ists, Latinists against Anglo-Saxonists, etc. to see which “community” is able to sleep less and do nothing other than “translate” articles into their most spoken “languages”? How many articles have been written in Esperanto or Volapük, or in Anglo-Saxon or Latin, and how many of them have been consulted thereafter, and by how many people? In fact, Volapük now wins in number of articles, so should we all speak Volapük? No, Esperanto is better than Volapük, of course, because of bla bla…
I guess everyone wins here: Wikipedia has more visitors, more people involved and ready to donate, while those language fans have something more to say when discussing the advantages: hey, we have X million articles in the almighty Wikipedia, while your language has fewer! Esperanto/Volapük/Ido/… is so cool, we have so many “speakers”! Then, congratulations to all of you Wikipedian conlangers; but, if I were you, I wouldn’t think the real world revolves around the popularity of Wikipedia, Google or any other (past or future) website.

3. Esperanto is far easier than what you are suggesting. I am fluent in Esperanto, and I only studied 3 hours! And so did my Esperantist friends!

Do you mean something like saying “me spikas lo esperanto linguo” – with that horrible native accent that only your countrymen understand – and then being able to tell anyone “I speak Esperanto fluently after 3 hours of study”? And then speak about two or three sentences made up of a mix of European words once a year with your Esperantist friends in an international “Congress”, and then switch to English or to your mother tongue to really explain what you wanted to say? Well then yes, to say “I speak Esperanto fluently” or “I learned Esperanto in 2 days” is really really easy – hey, I’ve just discovered I am a fluent speaker of Esperanto, too! Esperanto is so cool…
But, talking about easiness… Have you conlangers noticed it’s “easy” just for (some) Western Europeans, because those “languages” you are using are made of a mix of the most common and simplest vocabulary of some Western European languages, whereas other speakers think it is as difficult as any Western European language? Do you really really think it is easier than English for a Chinese speaker? I guess good old Mr. Zamenhof didn’t realize that English, French, Latin, Italian, German and Polish wouldn’t be the only international languages today, as they were back then in the 19th century, when European countries made up almost the whole international community…
Furthermore, do you really really think that supposed ease of use, which actually stems from the lack of elaborate grammatical and syntactic structures, doesn’t come at a cost in culture, communication and even reasoning?

4. But I’ve been told that Esperanto is successful because it has a (mostly) European vocabulary that makes it easy for Europeans, an agglutinative structure that makes it especially fit for Africans and Asians, and some other features that make it better than every other language for everyone…
I won’t go into linguistic details, because those assertions are obviously completely arbitrary and untrustworthy. Not only has Esperantism failed to prove such claims, but some people have also dedicated extensive linguistic studies and thought to seeing whether they were right – Esperantism has received independent criticism from insiders and outsiders alike, and still its supporters claim the same falsehoods again and again. You have e.g. the thorough article “Learn not to speak Esperanto” which, from a conlanger’s point of view, discusses every supposed advantage of this Polish ophthalmologist’s conlang. Also, it is interesting that some researchers have noted the condition of Esperanto for most speakers as an anti-language, as they use the same grammar and words as the main speech community, but in a different way so that they can only be understood by “insiders”. That can indeed be the key to the advantages of Esperanto perceived by Esperantists of different generations and places, just like anti-social people like slang words to communicate with members of their community and to hide from outsiders, and it is especially interesting in light of the condition of Esperantism as an anti-social movement more than a promotion of a language, representing Esperanto with flags, slogans (“democracy”, “rights”, “freedom”,…), international consultative organizations and congresses…

5. You talk about real cultural neutrality for the European Union; but, since there are several non Indo-European languages inside the EU, Proto-Indo-European does not solve that issue either.

In fact, the European Union is made up of a great majority of Indo-European speakers (more than 97%, to say the least), and the rest – i.e. Hungarians, Finns, Maltese, Basque speakers – have a great knowledge (and speaking tradition) of other IE languages of Europe, viz. Latin, French, English, Swedish, Spanish. So, we are proposing to adopt a natural language common to the GREAT majority of the European Union citizens (just like Latin is common to the vast majority of Romance-speaking countries), instead of the current official situation(s) of the EU, like English, or English+French, or English+French+German… To say that Indo-European is not neutral as the European Union’s language, because not all languages spoken in the EU are Indo-European, is a weak argument; to say exactly that, and then to propose English, or English+French, or even a two-days’-work invention (a vocabulary mix of 4 Western European languages) by a Polish ophthalmologist, that’s a big fallacy.

6. So why are you proposing Indo-European? Why do you bother?

Because we want to. Because we like Europe’s Indo-European and the other Proto-Indo-European dialects, just like people who want to study and speak Latin, Greek, or Sanskrit do. Have you noticed the difference in culture, tradition, history, vocabulary, etc. between what you are suggesting (artificial one-man-made inventions) and real-world historical languages? Hint: that’s why many universities offer courses in or about Latin, Greek, Sanskrit, Proto-Indo-European, etc. while Esperanto is still (after more than a century) another conlanging experiment for those who want to travel abroad once a year to meet other conlang fans.
We propose it because we believe this language could be one practical answer (maybe the only real one) for the communication problems that a unified European Union poses. Because we don’t believe that any “Toki Pona” language invented by one enlightened individual can solve any communication or cultural problem at all in the real world. Because historical, natural languages like Hebrew, or Cornish, or Manx, or Basque, are interesting and valuable for people; whereas “languages” like Esperanto, Interlingua, Ido, Lojban or Klingon aren’t. You cannot change how people think, but you can learn from their interests and customs and behave accordingly: if, knowing how people reacted to Esperanto and Hebrew revival proposals after a century, you decide to keep trying to change people (so that they accept inventions) instead of changing your ideas (so that you accept natural languages), maybe you lack the necessary adaptation, a common essential resource in natural selection, applicable to psychology too.

7. Why don’t you explain this when talking about Proto-Indo-European advantages on the Dnghu Association’s website?

Because if you make a website about science, and you include a reference like: “Why shouldn’t you believe in Islamic creationism?” you are in fact saying Islamic creationism is so important that you have to mention it when talking about science… It’s like creating a website about Internal Medicine, and trying to answer in your FAQ why Homeopathy is not the answer for your problems: it’s just not worth it, if you want to keep a serious appearance. We are not the anti-Esperanto league or something, but the Indo-European Language Association.
Apart from this, proto-languages are indeed difficult to promote as ‘real’ languages, because there are no inscriptions in them, so they remain ‘hypothetical’, however well they might be reconstructed, like Europe’s Indo-European, or Proto-Germanic – see Five lines of ancient script on a shard of pottery could be the longest proto-Canaanite text for a curious example of a proto-language becoming a natural dead one. For many people, Proto-Basque (for example) seems exactly as hypothetical as Proto-Indo-European, when it indeed isn’t. If we also mixed Esperanto into a serious explanation of our project as a real alternative, that would be another reason for readers to dismiss the project as “another conlanging joke”. No, thanks.

8. Esperanto has its advantages and disadvantages. You just don’t talk from an objective (or “neutral”) point of view: most linguists (of any opinion) are – like Esperantists – biased, so there is no single truth, but opinions.

Yes, indeed. Many Esperantists, like any supporters of pseudoscience, conclude that people might be for or against their theory, and that therefore both positions are equally valid and should be taken with a grain of salt. On this question, I think it’s interesting, for those who think in terms of “equal validity” of their minority views when confronted with what is generally accepted, to take a quick look at Wikipedia’s Neutral Point of View – equal validity statement, because they’ve had a lot of problems with that issue. To sum up, it says that if you talk about biology, you cannot demand that evolution and creationism be presented as equally valid theories only because some people (are willing to) assume they are; if you talk about the Holocaust, or medicine, you don’t present revisionism or alternative medicines as equally valid theories or sciences: there are academic and scientific criteria that help classify knowledge into scientific and pseudoscientific. Most (if not all) Esperantist claims are at best pseudoscientific, and when they claim real advantages for their conlang, those apply just as well (often better) to other conlangs or even to any language.

9. Then why does the “Universala Esperanto-Asocio” enjoy consultative relations with both UNESCO and the United Nations? Why is Esperantism described as “democracy”, “education”, “rights”, “emancipation”,… Why do Esperantists still support Esperanto, when it hasn’t got any advantages at all, and they know it?
The only conclusion possible is that Esperantism (and some other fanatic conlangism) is actually a religion, because it’s based on faith alone: faith in believed “easiness”, in believed “neutrality”, in believed “number of speakers”, without any facts, numbers or studies to support it; in the belief that languages can be “better” and “worse” than others. And it’s obviously nonsense to discuss faith and beliefs, as useless as a discussion about Buddha, Muhammad or Jesus. But trying to disguise those beliefs as facts helps nobody, not even Esperantism, as it can only attract those very people that see creationism and alternative medicines as real alternatives to raw scientific knowledge. Esperanto is the god, Zamenhof the messiah and the UEA its church.

How many words do we use in daily speech? A new study from the Royal Spanish Academy on language acquisition

According to the members of the Royal Spanish Academy (the Real Academia Española), the humanities have decreased in importance for younger generations, English is becoming predominant, language in general is poorer in the media and in all public speeches, classical languages are disappearing, people pay less attention to reading, and computer terms are invading everything.

All involved in the research agree that language cannot be confined to any artificial limits, that it is mutable, it evolves and changes. However, they warn: it can also get sick and degrade. The average Spaniard generally uses no more than 1000 words, and only the most educated individuals reach 5000 common words. Some young people use only 240 words daily.

Linguists, pedagogues and psychologists say those who write correctly demonstrate they’ve had an adequate education, they’ve read books and they’ve exercised their minds. Thanks to that mental exercise we can achieve more elevated stages of reasoning and culture. Those who cannot understand something as basic as their own natural language will not make much progress in their intellectual life, they assert.

Now, regarding those numbers and the concept behind the findings of that study: would you say learning mixed conlangs like Esperanto – whose supposed benefits are precisely the ease of use, by taking the most common and simplest European vocabulary – could improve that worsening situation? Or do you think it’s better for European culture’s sake to learn the ancient language from which Old Latin, Gaulish, Old Norse or Old Slavonic derived? It is probably not the main reason to adopt Europe’s Indo-European as the official language of the European Union, but it is certainly another great reason to learn it without being compelled to…

Source: Terra; read in Menéame

WordPress Translation Plugin: ‘Indoeuropean Translator Widget’ – now also Swedish, Danish, Norwegian, Polish, Greek, Russian, Romanian, Finnish, Chinese, Japanese, Korean, …

The latest upgrades are only available in the simpler WordPress Translation Widget Plugin.

You can download it from the official WordPress Plugin Repository site. New upgrades will automatically appear on your WordPress blog dashboard.

As always, this widget plugin, when activated from the Design tab of your WordPress blog dashboard, will add links – with the attribute rel="nofollow", so that search engines don’t follow them – to automatic translations of that website, mainly through Google Translation Engine language pairs, to and from (at least) all of the languages listed below, all in all 24×23 language pairs [more or less the number of language translations needed in the European Union…]

The widget offers translations from and into these languages:

English, German, French, Spanish, Italian, Portuguese, Dutch, Arabic, Bulgarian, Czech, Chinese (traditional and simplified), Danish, Greek, Croatian, Hindi, Korean, Japanese, Norwegian, Polish, Romanian, Russian, Swedish and Finnish.
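The “24×23” figure follows from counting every ordered (source, target) combination of distinct languages. A minimal sketch of that arithmetic, assuming that traditional and simplified Chinese count as two separate entries (which is what brings the list above to 24); the variable names are illustrative and are not part of the plugin:

```python
from itertools import permutations

# The languages listed above, with Chinese split into its two written variants
# (an assumption made so that the list has 24 entries).
LANGUAGES = [
    "English", "German", "French", "Spanish", "Italian", "Portuguese",
    "Dutch", "Arabic", "Bulgarian", "Czech", "Chinese (traditional)",
    "Chinese (simplified)", "Danish", "Greek", "Croatian", "Hindi",
    "Korean", "Japanese", "Norwegian", "Polish", "Romanian", "Russian",
    "Swedish", "Finnish",
]

# Every ordered pair of two different languages: translating A -> B is a
# different pair from translating B -> A.
language_pairs = list(permutations(LANGUAGES, 2))

print(len(LANGUAGES))        # 24
print(len(language_pairs))   # 552, i.e. 24 * 23
```

Each of the 24 languages can be translated into the 23 others, which gives 552 directed pairs.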

For the latest changes in version 1.1.1, which follow Google Translation Engine changes and improvements, you can visit the official release note.

Upgrades for the simple WordPress plugin available in this blog are not discontinued, due to the need expressed by some bloggers to have this simpler PHP code inserted in their themes, instead of the less flexible widget.

Thanks for the support.

How ‘difficult’ (using Esperantist terms) is an inflected language like Proto-Indo-European for Europeans?

For native speakers of most modern Romance languages (apart from some remnants of the neuter case), Nordic (Germanic) languages, English, Dutch, or Bulgarian, it is usually considered “difficult” to learn an inflected language like Latin, German or Russian: cases are a priori felt as too strange, too “archaic”, too ‘foreign’ to their own system of expressing ideas. However, for an ordinary German, Baltic, Slavic or Greek speaker, or for non-IE speakers of Basque or Uralic languages (Finnish, Hungarian, Estonian), cases are the only way to express common concepts and ideas, and they were also the common way of expression for speakers of older versions of those very uninflected languages, like Old English, Old Norse or Classical Latin; and their speakers didn’t consider their languages “difficult” …

Therefore, to use different cases is the normal way to express concepts that non-inflected languages express in different ways – i.e. not “more easily”, but “differently”. That’s the point Esperantism has lost in its struggle to convince the world of its “easiness”. In fact, the idea that cases are difficult is so ingrained in Esperantism that some even created “an old version” [probably deemed “more difficult”] of Esperanto called Arcaicam Esperantom, as a fiction of evolution from an older language…

Thus, among the European population (more than 700 million inhabitants), just around 200 million speak non-inflected languages, while the rest use at least 4 cases to express every possible concept. Within the current EU, more or less half of its speakers speak an inflected language – like German, Polish, Czech, Greek, Lithuanian, Slovenian, or non-IE Hungarian, Finnish, etc. – as their mother tongue.

For example, the literal sentence “I go to-the-house” [not exactly the common expression “I go home” which is expressed differently in each language] would be said in Spanish “voy a-la-casa”, or in French “je vais à-la-maison”, in Italian “vado a-la-casa”, etc. Therefore, in an “easy conlang” for Western European speakers, say in something called Esperanto, a sentence like “io vo a-lo-haus” is apparently “easy”, because the syntactical structure is similar to those non-inflected languages.

NOTE: In fact, there are other interesting concepts behind the use of the obligatory subject before the verb in languages like English or Esperanto, which usually appears in those languages that have reduced the verbal system; therefore, the subject is necessary only in those languages whose verbal inflection becomes too simple to express an idea that must still be expressed some way – more or less like different combinations of prepositions and articles are often needed to substitute the lost nominal inflection, as we discuss here. In those ‘less innovative’ languages that retain a rich verbal system, the subject appears for some reason, as e.g. in Spanish “yo voy a la casa”, which must be expressed differently in innovative languages, using different linguistic resources, like e.g. Eng. “I myself go to the house” (or maybe “it’s me who…”), or French “moi, je vais à la maison”. Is that obligatory subject and ‘simplified’ verbal system of Esperanto “easier”, and therefore “better”…? I guess not. It’s just an imitation of French or English that Mr. Zamenhof deemed “better” for his creation to succeed, given the relevance of those languages (and their speakers’ acceptance) back in 1900…

On the other hand, in German it would be “Ich gehe nach-Haus-e”, in Latin “vado ad-domu-m”, in Polish “idę do-dom-u”, etc. The use of declensions, if compared to uninflected languages, usually amounts to a simple change of “preposition+article” -> “declension” – or, in the ‘worst’ case (as shown here), of “preposition+article” -> “preposition+declension”.

To sum up, can some languages be considered “more difficult” than others? Yes, indeed. If seen from a European point of view, some linguistic features are not easy to learn: the Arabic writing system, the endless Chinese characters, Sino-Tibetan or Vietnamese tones, etc. can cause headaches to [adult] speakers willing to learn them… Also, from an English, French or Spanish point of view, learning a language like Esperanto might seem “better” because of its apparent and equivocal “easiness”… But, between (a) all Indo-European speakers learning a non-inflected language like English [or ‘easy’ Esperanto], and (b) all Indo-European speakers learning an inflected one like Proto-Indo-European, I guess there is no language “easier” than the other, and therefore the “better” option should come from other rational considerations, not just faith in the absurd ramblings of an illuminated Polish ophthalmologist.

Therefore, the question remains still the same: why on earth should any European willing to speak a common language select an invented one (from the thousand “super easy” ones available) rather than a natural one, like the ancestor of most of their mother tongues, Proto-Indo-European?