Bell Beaker/early Late Neolithic (NOT Corded Ware/Battle Axe) identified as forming the Pre-Germanic community in Scandinavia


I wrote recently about the newly created Indo-European Corded Ware Theory group, which represents today the last dying effort to sustain the outdated model of the ‘Kurgan peoples’.

Archaeology and Linguistics (like Genetics) keeps slowly but relentlessly rejecting all the Kurgan model‘s foundations, safe for the steppe origin of Indo-European expansion.

The book Language and Prehistory of the Indo-European Peoples. A Cross-Disciplinary perspective. Eds. A. Hyllested, B.N. Whitehead, Th. Olander and B. Anette. Copenhagen Studies in Indo-European. Museum Tusculanum Press, Copenhagen, has been recently published (December 2017).

In it, Christopher Prescott contributes to the history of Indo-European migrations to Scandinavia and the formation of a common Nordic language, ancestral to Proto-Germanic.

A draft of his chapter is downloadable in Dramatic beginnings of Norway’s history? Archaeology and Indo-Europeanization.

Here are some excerpts from the text:

Thus archaeology can deal with the question of Indo-Europeans through material culture, and archaeology can contribute to unraveling the events leading up to the fact that Indo-European languages were spread from the Indian Ocean to the northwestern European Arctic in pre- and proto-history. In 1995, Prescott and Walderhaug tentatively argued that a dramatic transformation took place in Norway around the Late Neolithic (2350 BCE), and that the swift nature of this transition was tied to the initial Indo-Europeanization of southern and coastal Norway, at least to Trøndelag and perhaps as far north as Troms. Although this interpretation cannot be “proven” in any positivist sense of the word (though aDNA and isotope studies have added a new layer of relevant data), in light of the last ten years of research and excavations, it is has become an increasingly reasonable hypothesis (e.g., Engedal 2002, Fari 2006, Håland and Håland 2000, Kristiansen 2004, Melheim 2006, Østmo 1996, also Kvalø 2007, Larsson 1997).


The Late Neolithic transformation gives rise to a cultural platform where most of southerly Norway is incorporated into the Nordic sphere. Interaction is no longer over borders, rather within a common cultural arena. Locally, the cultural institutions provide a base for the continued dynamic development through the Late Neolithic and Bronze Age. On a larger geographic and historical scale, incorporation into this field of interaction opens even the most peripheral parts of southern Norway to the streams of culture and events that shape Europe’s Bronze Age history, for example those originating from within Unetice, Tumulus Culture, Urnfeld and Hallstatt.


Changes in Scandinavia Norway are linked to wider transformations in Europe. Culturally, both Corded Ware Battle Axe and the Bell Beaker are important referential easterly and westerly European cultural horizons. Both these horizons affect and transform Northern Europe, so developments in Norway are not isolated affairs. Needless to say, though often regarded as Indo-European, the processes leading to and the affect of these cultural horizons is discussed for other parts of Europe as well (Mallory 1989:243ff).

Though there are reasonable arguments to assign both Corded Ware groups and bell Beaker groups Indo-European affiliations, the Corded Ware/Battle Axe horizon did not transform large parts of the Scandinavian Peninsula, nor can this horizon be identifies as the source of the practices, forms and institutions that characterize the ensuing Late Neolithic and Bronze Age. The Bell Beaker/early Late Neolithic, however, represents a source and beginning of these institution and practices, exhibits continuity to the following metal age periods and integrated most of Northern Europe’s Nordic region into a set of interaction fields. This happened around 2400 BCE, at the MNB to LN transition.

Though much is tentative and conjecture, multiple sources indicate that ideology, cosmology, myths social organization and probably language were Indo-European in the Bronze Age, and the development of the Bronze Age is rooted in the preceding Late Neolithic. Though the evidence also indicates that the initial Indo-European encounters, indeed “colliding worlds”, were probably experienced in the Middle Neolithic B, the archaeological record points to the time around transition to the Late Neolithic as the chronologically defining threshold for the entrenchment of an Indo-European platform throughout what would become the Nordic Bronze Age region in Norway. The Late Neolithic is therefore the most likely candidate for the introduction of the foundation for economic, social and ideological institutions, that is Giddens’ “deeply layered structure[s]”, that are fundamental to the development of the region’s identities, also ethnic, in the millennia to come.

Diachronic map of migrations in Europe ca. 2250-1750 BC, after the Bell Beaker invasion, the most likely time of formation of a common Nordic language, ancestor of Proto-Germanic.

Mind you, not that these actual archaeological and linguistic models will deter anyone from supporting tentative sketches of a fictional ‘kurgan people’ that became outdated almost 60 years ago now – especially if they fit certain desires of ancestral ethnolinguistic identification with modern populations…


  • Indo-European demic diffusion model, 3rd edition (revised October 2017)
  • The renewed ‘Kurgan model’ of Kristian Kristiansen and the Danish school: “The Indo-European Corded Ware Theory”
  • Globular Amphora not linked to Pontic steppe migrants – more data against Kristiansen’s Kurgan model of Indo-European expansion
  • Correlation does not mean causation: the damage of the ‘Yamnaya ancestral component’, and the ‘Future America’ hypothesis
  • New Ukraine Eneolithic sample from late Sredni Stog, near homeland of the Corded Ware culture
  • Something is very wrong with models based on the so-called ‘steppe admixture’ – and archaeologists are catching up
  • Germanic–Balto-Slavic and Satem (‘Indo-Slavonic’) dialect revisionism by amateur geneticists, or why R1a lineages *must* have spoken Proto-Indo-European
  • Heyd, Mallory, and Prescott were right about Bell Beakers
  • Coexistence of two different populations in Gotland during the Middle Neolithic


    New insights on cultural dualism and population structure in the Middle Neolithic Funnel Beaker culture on the island of Gotland, by Fraser et al., in Journal of Archaeological Science: Reports (2017).

    Abstract (emphasis mine):

    In recent years it has been shown that the Neolithization of Europe was partly driven by migration of farming groups admixing with local hunter-gatherer groups as they dispersed across the continent. However, little research has been done on the cultural duality of contemporaneous foragers and farming populations in the same region. Here we investigate the demographic history of the Funnel Beaker culture [Trichterbecherkultur or TRB, c. 4000–2800 cal BCE], and the sub-Neolithic Pitted Ware culture complex [PWC, c. 3300–2300 cal BCE] during the Nordic Middle Neolithic period on the island of Gotland, Sweden. We use a multidisciplinary approach to investigate individuals buried in the Ansarve dolmen, the only confirmed TRB burial on the island. We present new radiocarbon dating, isotopic analyses for diet and mobility, and mitochondrial DNA haplogroup data to infer maternal inheritance. We also present a new Sr-baseline of 0.71208 ± 0.0016 for the local isotope variation. We compare and discuss our findings together with that of contemporaneous populations in Sweden and the North European mainland.

    The radiocarbon dating and Strontium isotopic ratios show that the dolmen was used between c. 3300–2700 cal BCE by a population which displayed local Sr-signals. Mitochondrial data show that the individuals buried in the Ansarve dolmen had maternal genetic affinity to that of other Early and Middle Neolithic farming cultures in Europe, distinct from that of the contemporaneous PWC on the island. Furthermore, they exhibited a strict terrestrial and/or slightly varied diet in contrast to the strict marine diet of the PWC. The findings indicate that two different contemporary groups coexisted on the same island for several hundred years with separate cultural identity, lifestyles, as well as dietary patterns.

    “Map indicating distribution of TRB-North group megalithic tombs (Blomqvist, 1989; Midgley, 2008; Sjögren, 2003; Tilley, 1999) and PWC areas (Larsson, 2009) modified from (Malmström et al., 2009). Swedish megalithic TRB burial sites included in the analyses: 1. Gökhem passage grave, Falköping, Västergötland, 2. Alvastra dolmen, Östergötland, 3. Mysinge passage grave, Resmo, Öland, 4. Ansarve dolmen, Tofta, Gotland, and 5. the Ostorf TRB burial ground, Mecklenburg-Vorpommern, Germany.”

    If you are interested in knowing more details about settlements on the island, I recommend you to read Early Holocene human population events on the island of Gotland in the Baltic Sea (9200-3800 cal. BP), by Jan Apel, downloadable here.

    It is important to remember cases like this one when speaking about the steppe as representing a single culture and people, speaking the same language, no matter the period in question and the archaeological cultures involved…


    Featured image: Diachronic map of Early Neolithic migrations ca. 5000-4000 BC.


    Expansion of peoples associated with spread of haplogroups: Mongols and C3*-F3918, Arabs and E-M183 (M81)


    Two recent interesting papers on the potential expansion of cultures associated with haplogroups:

    1. Whole Y-chromosome sequences reveal an extremely recent origin of the most common North African paternal lineage E-M183 (M81), by Solé-Morata et al., Scientific Reports (2017).


    E-M183 (E-M81) is the most frequent paternal lineage in North Africa and thus it must be considered to explore past historical and demographical processes. Here, by using whole Y chromosome sequences from 32 North African individuals, we have identified five new branches within E-M183. The validation of these variants in more than 200 North African samples, from which we also have information of 13 Y-STRs, has revealed a strong resemblance among E-M183 Y-STR haplotypes that pointed to a rapid expansion of this haplogroup. Moreover, for the first time, by using both SNP and STR data, we have provided updated estimates of the times-to-the-most-recent-common-ancestor (TMRCA) for E-M183, which evidenced an extremely recent origin of this haplogroup (2,000–3,000 ya). Our results also showed a lack of population structure within the E-M183 branch, which could be explained by the recent and rapid expansion of this haplogroup. In spite of a reduction in STR heterozygosity towards the West, which would point to an origin in the Near East, ancient DNA evidence together with our TMRCA estimates point to a local origin of E-M183 in NW Africa.

    Distribution of E-M183 subclades among North Africa, the Near East and the Iberian Peninsula. Pie chart sectors areas are proportional to haplogroup frequency and are coloured according to haplogroup in the schematic tree to the right. n: sample size. Map was generated using R software.

    An interesting excerpt, from the discussion:

    Regarding the geographical origin of E-M183, a previous study suggested that an expansion from the Near East could explain the observed east-west cline of genetic variation that extends into the Near East. Indeed, our results also showed a reduction in STR heterozygosity towards the West, which may be taken to support the hypothesis of an expansion from the Near East. In addition, previous studies based on genome-wide SNPs reported that a North African autochthonous component increase towards the West whereas the Near Eastern decreases towards the same direction, which again support an expansion from the Near East. However, our correlations should be taken carefully because our analysis includes only six locations on the longitudinal axis, none from the Near East. As a result, we do not have sufficient statistical power to confirm a Near Eastern origin. In addition, rather than showing a west-to-east cline of genetic diversity, the overall picture shown by this correlation analysis evidences just low genetic diversity in Western Sahara, which indeed could be also caused by the small sample size (n = 26) in this region. Alternatively, given the high frequency of E-M183 in the Maghreb, a local origin of E-M183 in NW Africa could be envisaged, which would fit the clear pattern of longitudinal isolation by distance reported in genome-wide studies. Moreover, the presence of autochthonous North African E-M81 lineages in the indigenous population of the Canary Islands, strongly points to North Africa as the most probable origin of the Guanche ancestors. This, together with the fact that the oldest indigenous inviduals have been dated 2210 ± 60 ya, supports a local origin of E-M183 in NW Africa. Within this scenario, it is also worth to mention that the paternal lineage of an early Neolithic Moroccan individual appeared to be distantly related to the typically North African E-M81 haplogroup30, suggesting again a NW African origin of E-M183. A local origin of E-M183 in NW Africa > 2200 ya is supported by our TMRCA estimates, which can be taken as 2,000–3,000, depending on the data, methods, and mutation rates used.

    The TMRCA estimates of a certain haplogroup and its subbranches provide some constraints on the times of their origin and spread. Although our time estimates for E-M78 are slightly different depending on the mutation rate used, their confidence intervals overlap and the dates obtained are in agreement with those obtained by Trombetta et al Regarding E-M183, as mentioned above, we cannot discard an expansion from the Near East and, if so, according to our time estimates, it could have been brought by the Islamic expansion on the 7th century, but definitely not with the Neolithic expansion, which appeared in NW Africa ~7400 BP and may have featured a strong Epipaleolithic persistence. Moreover, such a recent appearance of E-M183 in NW Africa would fit with the patterns observed in the rest of the genome, where an extensive, male-biased Near Eastern admixture event is registered ~1300 ya, coincidental with the Arab expansion. An alternative hypothesis would involve that E-M183 was originated somewhere in Northwest Africa and then spread through all the region. Our time estimates for the origin of this haplogroup overlap with the end of the third Punic War (146 BCE), when Carthage (in current Tunisia) was defeated and destroyed, which marked the beginning of Roman hegemony of the Mediterranean Sea. About 2,000 ya North Africa was one of the wealthiest Roman provinces and E-M183 may have experienced the resulting population growth.

    2. The Y-chromosome haplogroup C3*-F3918, likely attributed to the Mongol Empire, can be traced to a 2500-year-old nomadic group, by Zhang et al., Journal of Human Genetics (2017)


    The Mongol Empire had a significant role in shaping the landscape of modern populations. Many populations living in Eurasia may have been the product of population mixture between ancient Mongolians and natives following the expansion of Mongol Empire. Geneticists have found that most of these populations carried the Y-haplogroup C3* (C-M217). To trace the history of haplogroup (Hg) C3* and to further understand the origin and development of Mongolians, ancient human remains from the Jinggouzi, Chenwugou and Gangga archaeological sites, which belonged to the Donghu, Xianbei and Shiwei, respectively, were analysed. Our results show that nine of the eleven males of the Gangga site, two of the eight males of Chengwugou site and all of the twelve males of Jinggouzi site were found to have mutations at M130 (Hg C), M217 (Hg C3), L1373 (C2b, ISOGG2015), with the absence of mutations at M93 (Hg C3a), P39 (Hg C3b), M48 (Hg C3c), M407 (Hg C3d) and P62 (Hg C3f). These samples were attributed to the Y-chromosome Hg C3* (Hg C2b, ISOGG2015), and most of them were further typed as Hg C2b1a based on the mutation at F3918. Finally, we inferred that the Y-chromosome Hg C3*-F3918 can trace its origins to the Donghu ancient nomadic group.

    The development of Mongolia and the frequencies of haplogroup C3* in modern Eurasians. a The development of Mongolia. b The frequencies of haplogroup C3 in modern Eurasians. The dotted line represents the approximate boundary between the Xiongnu and the Donghu. The black and grey arrows denote the migration of the Donghu and Mongolians, respectively

    The expansion of peoples is known to be associated with the spread of a certain admixture component, joint with the expansion and reduction in variability of a haplogroup. In other words, few male lineages are usually more successful during the expansion.

    Other known examples include:

    Featured image: Diachronic map of Iron Age migrations ca. 750-250 BC.


    The concept of “Outlier” in Human Ancestry (II): Early Khvalynsk, Sredni Stog, West Yamna, Iron Age Bulgaria, Potapovka, Andronovo…


    I already wrote about the concept of outlier in Human Ancestry, so I am not going to repeat myself. This is just an update of “outliers” in recent studies, and their potential origins (here I will repeat some of the examples):

    Early Khvalynsk: the three samples from the Samara region have quite different positions in PCA, from nearest to EHG (of Y-DNA haplogroup R1a) to nearest to ANE ancestry (of Y-DNA haplogroup Q). This could represent the initial consequences of the second wave of ANE ancestry – as found later in Yamna samples from a neighbouring region -, possibly brought then by Eurasian migrants related to haplogroup Q.
    With only 3 samples, this is obviously just a tentative explanation of the finds. The samples can only be reasonably said to show an unstable time for the region in terms of admixture (i.e. probably migration), judging by the data on PCA.

    Ukraine Eneolithic samples offer a curious example of how the concept of outlier can change radically: from the third version (May 30th) of the preprint paper of Mathieson et al. (2017), when the Ukraine Eneolithic sample with steppe ancestry (and clustering with central European samples) was the ‘outlier’, to the fourth version (September 19th), when two samples with steppe ancestry clustering close to Corded Ware samples were now the ‘normal’ ones (i.e. those representing Ukraine Eneolithic population), and the outlier was the one clustering closely with Ukraine Mesolithic samples…

    PCA and Admixture for south-eastern Europe. Image modified from Mathieson et al. (2017) – Third revision (May 30th), used in the 2nd edition of the Indo-European demic diffusion model.

    This is one of the funny consequences of the wrong interpretation of the ‘yamnaya component’, that made geneticists believe at first that, out of two samples (!), the ‘outlier’ was the one with ‘yamnaya’ ancestry, because this component would have been brought by an eastern immigrant from early Khvalynsk…

    This example offers yet another reason why precise anthropological context is necessary to offer the right interpretation of results. Within the Indo-European demic diffusion model – based mainly on Archaeology and Linguistics – , the sample with steppe ancestry was the most logical find in the region for a potential origin of the Corded Ware culture, and it was interpreted as such, well before the publication of the fourth version of Mathieson et al. (2017).

    PCA of South-East European and other European samples. Image modified from Mathieson et al. (2017) – Fourth revision (September 19th), used in the 3rd edition of the Indo-European demic diffusion model.

    West Yamna (to insist on the same question, the ‘yamnaya’ component): we have only four western Yamna samples, two of them showing Anatolian Neolithic ancestry (one of them, from Ukraine, with a strong ‘southern’ drift). On the other hand, Corded Ware migrants do not show this. So we could infer that their migrations were not coetaneous: whereas peoples of Corded Ware culture expanded ca. 3300 BC to the north – in the natural corridor to the Baltic that has been proposed for this culture in Archaeology for decades (and that is well represented by Ukraine Eneolithic samples) -, peoples of Yamna culture expanded to the west, replacing the Ukraine Eneolithic population (i.e. probably those of ‘Proto-Corded Ware culture’), and eventually mixing with Balkan populations of Anatolian Neolithic ancestry.

    Potapovka, Andronovo, and Srubna: while Potapovka clusters closely to the steppe, and Andronovo (like Sintashta) clusters closely to Corded Ware (i.e. Ukraine Neolithic / Central-East European), both have certain ‘outliers’ in PCA: the former has one individual clustering closely to Corded Ware, and the latter to the steppe. Both ‘outliers’ fit well with the interpretation of the recent mixture of Corded Ware peoples with steppe populations, and they offer a different image for the evolution of populations of Potapovka and Sintashta-Petrovka, potentially influencing their language. The position of Srubna samples, nearer to Sintashta and Andronovo (but occupying the same territory as the previous Potapovka) offers the image of a late westward conquest from Corded Ware-related populations.

    Diachronic map of migrations ca. 2250-1750 BC

    Iron Age Bulgaria: a sample of haplogroup R1a-z93, with more ‘yamnaya’ ancestry than any other previous sample from the Balkans. For some, it might mean continuity from an older time. However – as with the Corded Ware outlier from Esperstedt before it – it is more likely a recent migrant from the steppe. The most likely origin of this individual is therefore people from the steppe, i.e. either the Srubna culture or a related group. Its relatively close cluster in PCA to certain recent Slavic populations can be interpreted in light of the multiple back and forth migrations in the region: of steppe populations to the west (Srubna, Cimmerians, Scythians, Sarmatians,…), and of Slavic-speaking populations:

    Diachronic map of Bronze Age migrations ca. 1750-1250 BC.

    Well-defined outliers are, therefore, essential to understand a recent history of admixture. On the other hand, the very concept of “outlier” can be a dangerous tool – when the lack of enough samples makes their classification as as such unjustified -, leading to the wrong interpretations.


    Globular Amphora not linked to Pontic steppe migrants – more data against Kristiansen’s Kurgan model of Indo-European expansion


    New open access article, Genome diversity in the Neolithic Globular Amphorae culture and the spread of Indo-European languages, by Tassi et al. (2017).


    It is unclear whether Indo-European languages in Europe spread from the Pontic steppes in the late Neolithic, or from Anatolia in the Early Neolithic. Under the former hypothesis, people of the Globular Amphorae culture (GAC) would be descended from Eastern ancestors, likely representing the Yamnaya culture. However, nuclear (six individuals typed for 597 573 SNPs) and mitochondrial (11 complete sequences) DNA from the GAC appear closer to those of earlier Neolithic groups than to the DNA of all other populations related to the Pontic steppe migration. Explicit comparisons of alternative demographic models via approximate Bayesian computation confirmed this pattern. These results are not in contrast to Late Neolithic gene flow from the Pontic steppes into Central Europe. However, they add nuance to this model, showing that the eastern affinities of the GAC in the archaeological record reflect cultural influences from other groups from the East, rather than the movement of people.

    (a) Principal component analysis on genomic diversity in ancient and modern individuals. (b) K = 3,4 ADMIXTURE analysis based only on ancient variation. (a) Principal component analysis of 777 modern West Eurasian samples with 199 ancient samples. Only transversions considered in the PCA (to avoid confounding effects of post-mortem damage). We represented modern individuals as grey dots, and used coloured and labelled symbols to represent the ancient individuals. (b) Admixture plots at K = 3 and K = 4 of the analysis conducted only considering the ancient individuals. The full plot is shown in electronic supplementary material, figure S7. The ancient populations are sorted by a temporal scale from Pleistocene to Iron Age. The GAC samples of this study are displayed in the box on the right.

    Excerpt, from the discussion:

    In its classical formulation, the Kurgan hypothesis, i.e. a late Neolithic spread of proto-Indo-European languages from the Pontic steppes, regards the GAC people as largely descended from Late Neolithic ancestors from the East, most likely representing the Yamna culture; these populations then continued their Westward movement, giving rise to the later Corded Ware and Bell Beaker cultures. Gimbutas [23] suggested that the spread of Indo-European languages involved conflict, with eastern populations spreading their languages and customs to previously established European groups, which implies some degree of demographic change in the areas affected by the process. The genomic variation observed in GAC individuals from Kierzkowo, Poland, does not seem to agree with this view. Indeed, at the nuclear level, the GAC people show minor genetic affinities with the other populations related with the Kurgan Hypothesis, including the Yamna. On the contrary, they are similar to Early-Middle Neolithic populations, even geographically distant ones, from Iberia or Sweden. As already found for other Late Neolithic populations [18], in the GAC people’s genome there is a component related to those of much earlier hunting-gathering communities, probably a sign of admixture with them. At the nuclear level, there is a recognizable genealogical continuity from Yamna to Corded Ware. However, the view that the GAC people represented an intermediate phase in this large-scale migration finds no support in bi-dimensional representations of genome diversity (PCA and MDS), ADMIXTURE graphs, or in the set of estimated f3-statistics.

    Scheme summarizing the five alternative models compared via ABC random forest. We generated by coalescent simulation mtDNA sequences under five models, differing as to the number of migration events considered. The coloured lines represent the ancient samples included in the analysis, namely Unetice (yellow line), Bell Beaker (purple line), Corded Ware (green line) and Globular Amphorae (red line) from Central Europe, Yamnaya (light blue line) and Srubnaya (brown line) from Eastern Europe. The arrows refer to the three waves of migration tested. Model NOMIG was the simplest one, in which the six populations did not have any genetic exchanges; models MIG1, MIG2 and MIG1, 2 differed from NOMIG in that they included the migration events number 1, 2 (from Eastern to Central Europe, respectively before and after the onset of the GAC), or both. Model MIG2, 3 represents a modification of MIG2 model also including a back migration from Central to Eastern Europe after the development of the Corded Ware culture.

    Together with Globular Amphora culture samples from Mathieson et al. (2017), this suggests that Kristiansen’s Indo-European Corded Ware Theory is wrong, even in its latest revised models of 2017.

    The background shading indicates the tree migratory waves proposed by Marija Gimbutas, and personally
    checked by her in 1995. The symbols refer to the ancient populations considered in the ABC analysis

    On the other hand, the article’s genetic finds have some interesting connections in terms of mtDNA phylogeography, but without a proper archaeological model it is difficult to explain them.

    Haplogroup frequencies were obtained for Early Neolithic (EN), Middle Neolithic (MN), Chalcolithic (CA), and Late Neolithic (LN). The color assigned to each haplogroup is represented on the lower right part of each plot. Haplogroup frequencies were plotted geographically using QGIS v2.14.

    Text and images from the article under Creative Commons Attribution 4.0 license.

    Discovered first via Bernard Sécher’s blog.

    See also:

    Evolutionary forces in language change depend on selective pressure, but also on random chance


    A new interesting paper from Nature: Detecting evolutionary forces in language change, by Newberry, Ahern, Clark, and Plotkin (2017). Discovered via Science Daily.

    The following are excerpts of materials related to the publication (written by Katherine Unger Baillie), from The University of Pennsylvania:

    Examining substantial collections of annotated texts dating from the 12th to the 21st centuries, the researchers found that certain linguistic changes were guided by pressures analogous to natural selection — social, cognitive and other factors — while others seem to have occurred purely by happenstance.

    “Linguists usually assume that when a change occurs in a language, there must have been a directional force that caused it,” said Joshua Plotkin, professor of biology in Penn’s School of Arts and Sciences and senior author on the paper. “Whereas we propose that languages can also change through random chance alone. An individual happens to hear one variant of a word as opposed to another and then is more likely to use it herself. Chance events like this can accumulate to produce substantial change over generations. Before we debate what psychological or social forces have caused a language to change, we must first ask whether there was any force at all.”

    “One of the great early American linguists, Leonard Bloomfield, said that you can never see a language change, that the change is invisible,” said Robin Clark, a coauthor and professor of linguistics in Penn Arts and Sciences. “But now, because of the availability of these large corpora of texts, we can actually see it, in microscopic detail, and begin to understand the details of how change happened.”

    One change is the regularization of past-tense verbs. Using the Corpus of Historical American English, comprised of more than 100,000 texts ranging from 1810 to 2009 that have been parsed and digitized — a database that includes more than 400 million words — the team searched for verbs where both regular and irregular past-tense forms were present, for example, “dived” and “dove” or “wed” and “wedded.”

    “There is a vast literature and a lot of mythology on verb regularization and irregularization,” Clark said, “and a lot of people have claimed that the tendency is toward regularization. But what we found was quite different.”

    Indeed, the analysis pointed to particular instances where it seems selective forces are driving irregularization. For example, while a swimmer 200 years ago might have “dived”, today we would say they “dove.” The shift towards using this irregular form coincided with the invention of cars and concomitant increase in use of the rhyming irregular verb “drive”/“drove.”

    Despite finding selection acting on some verbs, “the vast majority of verbs we analyzed show no evidence of selection whatsoever,” Plotkin said.

    The team recognized a pattern: random chance affects rare words more than common ones. When rarely-used verbs changed, that replacement was more likely to be due to chance. But when more common verbs switched forms, selection was more likely to be a factor driving the replacement.

    The grammar of negating a sentence has changed from “Ic ne secge” (Beowulf, c. 900) to “Ic ne sege noht” (the Ormulum, c. 1100) to “I seye not” (Chaucer, c. 1400) to “I doe not say” (Shakespeare, c. 1600) before returning to the familiar “I don’t say” (Virginia Woolf, c. 1900). A team from Penn used massive digital libraries along with inference techniques from population genetics to quantify the forces responsible for language evolution, such as in Jespersen’s cycle of negation, depicted here. (c) Cherissa Dukelow, 2017, license information below

    The authors also observed a role of random chance in grammatical change. The periphrastic “do,” as used in, “Do they say?” or “They do not say,” did not exist 800 years ago. Back in the 1400s, these sentiments would have been expressed as, “Say they?” or “They say not.”

    Using the Penn Parsed Corpora of Historical English, which includes 7 million syntactically parsed words from 1,220 British English texts, the researchers found that the use of the periphrastic “do” emerged in two stages, first in questions (“Don’t they say?”) around the 1500s, and then roughly 200 years later in imperative and declarative statements (“They don’t say.”).

    These manuscripts show changes from Old English (Beowulf) through Middle English (Trinity Homilies, Chaucer) to Early Modern English (Shakespeare’s First Folio). Penn researchers used large collections of digitized texts spanning the 12th to the 21st centuries to show that many language changes can be attributed to random chance alone. (c) Mitchell Newberry, 2017, license information below

    While most linguists have assumed that such a distinctive grammatical feature must have been driven to dominance by some selective pressure, the Penn team’s analysis questions that assumption. They found that the first stage of the rising periphrastic “do” use is consistent with random chance. Only the second stage appears to have been driven by a selective pressure.

    “It seems that, once ‘do’ was introduced in interrogative phrases, it randomly drifted to higher and higher frequency over time,” said Plotkin. “Then, once it became dominant in the question context, it was selected for in other contexts, the imperative and declarative, probably for reasons of grammatical consistency or cognitive ease.”

    As the authors see it, it’s only natural that social-science fields like linguistics increasingly exchange knowledge and techniques with fields like statistics and biology.

    “To an evolutionary biologist,” said Newberry, “it’s important that language is maintained through a process of copying language; people learn language by copying other people. That copying introduces minute variation, and those variants get propagated. Each change is an opportunity for a different copying rate, which is the basis for evolution as we know it.”

    Featured image: copyrighted, modified from the Supplementary information of the article.

    Image (c) Cherissa Dukelow, 2017, licensed under CC-BY-NC-SA 4.0
    Image (c) Mitchell Newberry, 2017,, licensed under CC-BY-NC 4.0 (see materials at University of Pennsylvania for further sources).