Evolutionary forces in language change depend on selective pressure, but also on random chance

An interesting new paper in Nature: Detecting evolutionary forces in language change, by Newberry, Ahern, Clark, and Plotkin (2017). Discovered via Science Daily.

The following are excerpts of materials related to the publication (written by Katherine Unger Baillie), from The University of Pennsylvania:

Examining substantial collections of annotated texts dating from the 12th to the 21st centuries, the researchers found that certain linguistic changes were guided by pressures analogous to natural selection — social, cognitive and other factors — while others seem to have occurred purely by happenstance.

“Linguists usually assume that when a change occurs in a language, there must have been a directional force that caused it,” said Joshua Plotkin, professor of biology in Penn’s School of Arts and Sciences and senior author on the paper. “Whereas we propose that languages can also change through random chance alone. An individual happens to hear one variant of a word as opposed to another and then is more likely to use it herself. Chance events like this can accumulate to produce substantial change over generations. Before we debate what psychological or social forces have caused a language to change, we must first ask whether there was any force at all.”

“One of the great early American linguists, Leonard Bloomfield, said that you can never see a language change, that the change is invisible,” said Robin Clark, a coauthor and professor of linguistics in Penn Arts and Sciences. “But now, because of the availability of these large corpora of texts, we can actually see it, in microscopic detail, and begin to understand the details of how change happened.”

One change is the regularization of past-tense verbs. Using the Corpus of Historical American English, comprised of more than 100,000 texts ranging from 1810 to 2009 that have been parsed and digitized — a database that includes more than 400 million words — the team searched for verbs where both regular and irregular past-tense forms were present, for example, “dived” and “dove” or “wed” and “wedded.”

“There is a vast literature and a lot of mythology on verb regularization and irregularization,” Clark said, “and a lot of people have claimed that the tendency is toward regularization. But what we found was quite different.”

Indeed, the analysis pointed to particular instances where it seems selective forces are driving irregularization. For example, while a swimmer 200 years ago might have “dived”, today we would say they “dove.” The shift towards using this irregular form coincided with the invention of cars and concomitant increase in use of the rhyming irregular verb “drive”/“drove.”

Despite finding selection acting on some verbs, “the vast majority of verbs we analyzed show no evidence of selection whatsoever,” Plotkin said.

The team recognized a pattern: random chance affects rare words more than common ones. When rarely-used verbs changed, that replacement was more likely to be due to chance. But when more common verbs switched forms, selection was more likely to be a factor driving the replacement.
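To make the intuition concrete, here is a minimal sketch of how pure copying noise hits rare words harder than common ones. It assumes a simple Wright-Fisher-style resampling model borrowed from population genetics; it is not the authors' actual analysis, and the function names, token counts, and parameter values are all hypothetical choices made for illustration.

```python
import random
import statistics


def simulate_variant(tokens_per_generation, start_freq, generations,
                     selection=0.0, rng=None):
    """Return the final frequency of one variant (e.g. an irregular form).

    Assumption: each token in the next generation copies a variant at random
    from the current one. selection=0.0 means pure drift; a positive value
    biases copying towards the variant.
    """
    rng = rng or random.Random()
    freq = start_freq
    for _ in range(generations):
        # Selection tilts the copying probability; with 0.0 it equals freq.
        weight = freq * (1.0 + selection)
        prob = weight / (weight + (1.0 - freq))
        # Resample every token independently (binomial sampling).
        count = sum(1 for _ in range(tokens_per_generation)
                    if rng.random() < prob)
        freq = count / tokens_per_generation
    return freq


def spread_of_outcomes(tokens_per_generation, runs=100, seed=0):
    """Standard deviation of final frequencies across repeated drift-only runs."""
    rng = random.Random(seed)
    finals = [simulate_variant(tokens_per_generation, 0.5, 20, rng=rng)
              for _ in range(runs)]
    return statistics.pstdev(finals)


# Hypothetical token counts: a rarely used verb vs. a commonly used one.
rare_spread = spread_of_outcomes(20)
common_spread = spread_of_outcomes(1000)
print(f"rare verb spread: {rare_spread:.3f}, common verb spread: {common_spread:.3f}")
```

With no selection at all, the rare verb's frequency wanders far from its starting point, while the common verb's barely moves. Under these assumptions, a large shift in a common verb is therefore harder to explain by chance alone, and passing a positive `selection` value to `simulate_variant` shows how even a modest copying bias drives a variant towards dominance.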

The grammar of negating a sentence has changed from “Ic ne secge” (Beowulf, c. 900) to “Ic ne sege noht” (the Ormulum, c. 1100) to “I seye not” (Chaucer, c. 1400) to “I doe not say” (Shakespeare, c. 1600) before returning to the familiar “I don’t say” (Virginia Woolf, c. 1900). A team from Penn used massive digital libraries along with inference techniques from population genetics to quantify the forces responsible for language evolution, such as in Jespersen’s cycle of negation, depicted here. (c) Cherissa Dukelow, 2017, license information below

The authors also observed a role of random chance in grammatical change. The periphrastic “do,” as used in, “Do they say?” or “They do not say,” did not exist 800 years ago. Back in the 1400s, these sentiments would have been expressed as, “Say they?” or “They say not.”

Using the Penn Parsed Corpora of Historical English, which includes 7 million syntactically parsed words from 1,220 British English texts, the researchers found that the use of the periphrastic “do” emerged in two stages, first in questions (“Don’t they say?”) around the 1500s, and then roughly 200 years later in imperative and declarative statements (“They don’t say.”).

These manuscripts show changes from Old English (Beowulf) through Middle English (Trinity Homilies, Chaucer) to Early Modern English (Shakespeare’s First Folio). Penn researchers used large collections of digitized texts spanning the 12th to the 21st centuries to show that many language changes can be attributed to random chance alone. (c) Mitchell Newberry, 2017, license information below

While most linguists have assumed that such a distinctive grammatical feature must have been driven to dominance by some selective pressure, the Penn team’s analysis questions that assumption. They found that the first stage of the rising periphrastic “do” use is consistent with random chance. Only the second stage appears to have been driven by a selective pressure.

“It seems that, once ‘do’ was introduced in interrogative phrases, it randomly drifted to higher and higher frequency over time,” said Plotkin. “Then, once it became dominant in the question context, it was selected for in other contexts, the imperative and declarative, probably for reasons of grammatical consistency or cognitive ease.”

As the authors see it, it’s only natural that social-science fields like linguistics increasingly exchange knowledge and techniques with fields like statistics and biology.

“To an evolutionary biologist,” said Newberry, “it’s important that language is maintained through a process of copying language; people learn language by copying other people. That copying introduces minute variation, and those variants get propagated. Each change is an opportunity for a different copying rate, which is the basis for evolution as we know it.”

Featured image: copyrighted, modified from the Supplementary information of the article.

Image (c) Cherissa Dukelow, 2017, licensed under CC-BY-NC-SA 4.0 http://creativecommons.org/licenses/by-nc-sa/4.0/
Image (c) Mitchell Newberry, 2017, https://creativecommons.org/licenses/by-nc/4.0/, licensed under CC-BY-NC 4.0 (see materials at University of Pennsylvania for further sources).

Schleicher’s Fable in Proto-Indo-European – pitch and stress accent

Also included in our monograph North-West Indo-European (first draft) is a tentative reconstruction of Schleicher’s fable in North-West Indo-European, and just for illustration of the reconstructed sounds (including pitch and stress accent), a recording has been included.

The recording is available as audio (see above) or video (see below) with captions and multiple subtitles. The captions in North-West Indo-European show acute accents over accented vowels, while stressed syllables are underlined:

I think such a recording was necessary for comparison with the most commonly reconstructed pronunciation, as usually taught in courses. And I am not referring to those professors who still use only stress – instead of pitch – accent to pronounce PIE, but to those who, while using pitch accent, place stress on the same syllable.

A good example to illustrate my point is Andrew M. Byrd’s reading of his version of the fable for the journal Archaeology.

Apart from some controversial decisions regarding the Proto-Indo-Hittite reconstruction – see our explanation of our version, or e.g. Kortlandt’s reconstruction of the Fable (PDF) for more details – his recitation does not seem to contrast pitch and stress accent enough, to the extent that pitch and stress always seem to fall on the same syllable. He specialises in Proto-Indo-European phonology, so maybe it is a deliberate choice.

Firstly, as an introduction – in case you don’t know anything about this question – a pitch accent is reconstructed for Proto-Indo-European, based on the reconstructed accent of Old Indian, Greek, Germanic, and Balto-Slavic – hence also valid for North-West Indo-European, even though Italo-Celtic lost it completely.

If you have listened to any tonal language*, you will have noticed that words also have a stress accent, and not necessarily on the same syllable – but usually on the heaviest one. In fact, I don’t know of any accent pattern with pitch and stress on the same syllable (except for certain reconstructed, unstable intermediate stages of a language), and I guess it is so redundant that such a system would always lose one of them.

*pitch-accent systems are also tonal systems, after all, since they involve at least two tones: an acute or rising one, and usually a falling one after it.

You can listen to a sample of the Homeric recitation by Stephen Daitz, with restored Ancient Greek pronunciation, where he contrasts pitch and stress beautifully:

Note: you can buy his readings in restored pronunciation online in Bolchazy-Carducci Publishers. I can’t recommend them highly enough.

You can listen to other samples of Ancient Greek with restored pronunciation by Stefan Hagel (whose Homeric singing is superb), or many others.

To see what I mean by the lack of contrast in Byrd’s pronunciation, just compare the restored pronunciation with these samples of restored Koine Greek from the Biblical Language Center. I think you can hear the pitch accent pronounced, but always with stress on the same syllable. After a while, it gets quite monotone (no pun intended); for me, at least*.

*It seems to be, nevertheless, one of the top rated pronunciations of Koine Greek out there.

Pitch accent in my pronunciation is not as noticeable as in Stephen Daitz’s, and still less than in Stefan Hagel’s. But it is not intended to be.

I wanted to combine tone and stress as naturally as possible, as it is found in modern languages, like Chinese, or like South Slavic, Baltic, or Scandinavian languages. I believe PIE phonology cannot be too different from modern natural examples.

Many Modern Greek scholars complain about the artificiality of the restored pronunciation. I’ve heard particularly harsh criticism against Stefan Hagel’s pronunciation: many scholars do not recognise the ancestral language in the restored pronunciation.

While such critics may seem like snobbish reactionaries, and I really appreciate an exaggerated poetic style for epic poems (I have spent hundreds, probably thousands, of hours listening to Stephen Daitz), I don’t think this is the way Ancient Greek was usually spoken. Listening to the Ancient Greek Assimil recordings, there is a huge contrast between the readers who don’t use the restored pronunciation (thus offering a decaffeinated Ancient Greek) and Hagel’s reading (or, almost, singing).

In my interpretation of the fable I have tried to follow these ideas, and maybe in the end the pitch accent is not as acute as it should be (a fifth higher). On the other hand, it seemed more natural to me this way.

Also, in the final version of my reading, there are many words where it is not clear – not even to me – whether more than one syllable carries pitch or stress accent. This is especially so after my first change of voice to make a higher-pitched ‘sheep voice’, and it then worsens with my deeper ‘horse voice’. I really thought recording this was going to be easier!

If you have any comments or suggestions on the pronunciation, they are all welcome.

UPDATE (November 2, 2017): Frederik Kortlandt comments on our paper – “When comparing PIE with other tonal languages, the best candidate is Japanese, which means that the ‘stress’ falls on the last High syllable of a word form or sequence of connected word forms.”

Our monograph on North-West Indo-European (first draft) is out

I wrote yesterday about the recently updated Indo-European demic diffusion model.

Fernando López-Menchero and I have published our first draft on the North-West Indo-European proto-language. Our contribution mainly concerns phonetics, namely two of its most controversial aspects: a common process of laryngeal loss and two series of velars for PIE.

There is also an updated linguistic model for the Corded Ware substrate hypothesis, which seeks to explain certain similarities between Germanic and Balto-Slavic, and between Balto-Slavic and Indo-Iranian, and potential isoglosses between the three.

As you probably know, our interest is (and has been for the past 15 years or so, even before our common project) the reconstruction of a North-West Indo-European proto-language, the ancestor of Italo-Celtic, Germanic, and Balto-Slavic. At least since Krahe’s proposal of an Alteuropäische substrate to European hydronymy, some 70 years ago, Indo-Europeanists have been supporting an Old European branch of Proto-Indo-European.

Root *sal-, *salm in European river names. Krahe (1949). From Wikipedia.

However, dialectal divisions were tentative. Since Oettinger, some 30 years ago, we have had a clearer picture of a group of closely related dialects, namely Italo-Celtic, Germanic, and Balto-Slavic. Although the nature of Balto-Slavic is somewhat contested (by the few scholars who support an Indo-Slavonic group), the minimalist view holds that at least the substrate language of Baltic and Slavic, Holzer’s Temematic, was part of the North-West Indo-European group.

A North-West Indo-European (NWIE) proto-language solved not only the controversial question of Pan-European IE hydronymy (clearly of Late Indo-European nature), but also – and more elegantly – the question of the origin of the many fragmentary languages attested in Western Europe, usually labelled “Pre-Celtic” or “Pre-Italic” depending on their surrounding languages (Venetic has even been said to be related to Germanic…).

Stages of Proto-Indo-European evolution. IU: Indo-Uralic; PU: Proto-Uralic; PAn: Pre-Anatolian; PToch: Pre-Tocharian; Fin-Ugr: Finno-Ugric. The period between Balkan IE and Proto-Greek could be divided into two periods: an older one, called Proto-Greek (close to the time when NWIE was spoken), probably including Macedonian, and spoken somewhere in the Balkans; and a more recent one, called Mello-Greek, coinciding with the classically reconstructed Proto-Greek, already spoken in the Greek peninsula (West 2007). Similarly, the period between Northern Indo-European and North-West Indo-European could be divided, after the split of Pre-Tocharian, into a North-West Indo-European proper, during the expansion of Yamna to the west, and an Old European period, coinciding with the formation and expansion of the East Bell Beaker group.

Described first mainly in terms of lexical isoglosses, the concept of a NWIE language was then gradually and firmly grounded in common grammatical features, thanks mainly to the German, North American, and Spanish schools (as you know, the British or French schools are quite divided on the nature of Proto-Indo-European itself…). Recent archaeological models pioneered by Harrison and Heyd (2007) showed how this might have happened, with Yamna migrants that evolved into the East Bell Beaker group, and their subsequent expansion into most of Europe.

Genetics is now clearly supporting such a closely related group, too.

Yamna – East Bell Beaker migration 3000-2300 BC according to Heyd in Harrison and Heyd (2007).

The work of Prescott and Walderhaug (1995) on the Pre-Germanic homeland, and the more precise archaeological migration model developed by Prescott, clearly established the advent of Bell Beakers in Scandinavia as the key factor for the development of a unitary Pre-Germanic language in Scandinavia during the Dagger Period of the Nordic Late Neolithic.

The nature of the Únětice and Mierzanowice/Nitra cultures as products of the Bell Beaker absorption of preceding Corded Ware cultures made the identification of the Balto-Slavic homeland in the Lusatian culture quite likely – and this is now being confirmed by the study of Bronze Age samples, like those of the Tollense battlefield, which cluster closely with West Slavic and East German samples.

At the time of Marija Gimbutas’ breakthrough model of the “kurgan peoples”, a common dialect from this Old European branch was deemed to be ‘Northern European’ (or ‘Germano-Balto-Slavic’), which greatly influenced her work, supporting an identification of different burial types as stemming from the same source. This model, rejected already some years after Gimbutas’ proposal, has sadly survived to this day because of tradition (due e.g. to the work and influence of Kristiansen, and to some extent Anthony), and for some years (until the advent of ancient DNA) because of the modern distribution of haplogroup R1a in Europe and its relation to the ancient distribution of the Corded Ware culture.

This traditional model of a ‘Corded Ware -> Bell Beaker expansion of NWIE’, which we also followed until recently, never fit well with the known migration paths from Yamna (into Balkan Early Bronze Age cultures), with the geographic distribution of Old European hydronymy, or with the guesstimates for Late Indo-European and North-West Indo-European. It compelled us to support a break-up of the proto-language further back in time than warranted by models of language change, and it required certain unlikely cultural diffusion events over huge areas (because no such migration from Yamna to northern Europe has been attested): along the steppe/forest-steppe zone first, for a diffusion from Yamna into Corded Ware cultures, and along the Danube or the Rhine later, for a diffusion of Corded Ware into Bell Beaker. These models were also based on the wrong interpretation of the first radiocarbon dates of Beakers – placing an origin of the Bell Beaker people in Iberia (which has been rejected in Archaeology, and now also in Genetics).

Such a ‘Germano-Balto-Slavic’ group faded in Linguistics long ago, with most Indo-Europeanists preferring to talk about late contacts (viz. Celto-Germanic or Italo-Germanic contacts), and for some there is – if any subgroup at all – a core West Indo-European or Italo-Celto-Germanic group, which may be supported by recent genetic research on Bell Beaker peoples, with the Beaker group of the Netherlands being the key. Our research on the potential language spoken by Corded Ware peoples – most likely related to Uralic, from an Indo-Uralic community from the Pontic-Caspian steppe – can elegantly explain the isoglosses that both European dialects share.

Diachronic map of Late Copper Age migrations including Classical Bell Beaker (east group) expansion from central Europe ca. 2600-2250 BC