Globular Amphora not linked to Pontic steppe migrants – more data against Kristiansen’s Kurgan model of Indo-European expansion


New open access article, Genome diversity in the Neolithic Globular Amphorae culture and the spread of Indo-European languages, by Tassi et al. (2017).


It is unclear whether Indo-European languages in Europe spread from the Pontic steppes in the late Neolithic, or from Anatolia in the Early Neolithic. Under the former hypothesis, people of the Globular Amphorae culture (GAC) would be descended from Eastern ancestors, likely representing the Yamnaya culture. However, nuclear (six individuals typed for 597 573 SNPs) and mitochondrial (11 complete sequences) DNA from the GAC appear closer to those of earlier Neolithic groups than to the DNA of all other populations related to the Pontic steppe migration. Explicit comparisons of alternative demographic models via approximate Bayesian computation confirmed this pattern. These results are not in contrast to Late Neolithic gene flow from the Pontic steppes into Central Europe. However, they add nuance to this model, showing that the eastern affinities of the GAC in the archaeological record reflect cultural influences from other groups from the East, rather than the movement of people.

(a) Principal component analysis on genomic diversity in ancient and modern individuals. (b) K = 3,4 ADMIXTURE analysis based only on ancient variation. (a) Principal component analysis of 777 modern West Eurasian samples with 199 ancient samples. Only transversions were considered in the PCA (to avoid confounding effects of post-mortem damage). We represented modern individuals as grey dots, and used coloured and labelled symbols to represent the ancient individuals. (b) Admixture plots at K = 3 and K = 4 of the analysis conducted only considering the ancient individuals. The full plot is shown in electronic supplementary material, figure S7. The ancient populations are sorted on a temporal scale from Pleistocene to Iron Age. The GAC samples of this study are displayed in the box on the right.
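Why transversions only: post-mortem deamination in ancient DNA mainly turns C into T (and G into A on the complementary strand). Both are transitions, so a transition SNP in an ancient genome may be a damage artefact. A minimal Python sketch of that filtering step follows. The `(snp_id, ref, alt)` tuple format and the function names are hypothetical illustrations, not the authors' pipeline.

```python
# Transitions swap purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T);
# transversions swap a purine for a pyrimidine or vice versa.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transversion(ref: str, alt: str) -> bool:
    """Return True if the ref>alt change is a transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or {ref, alt} - (PURINES | PYRIMIDINES):
        raise ValueError(f"not a biallelic A/C/G/T SNP: {ref}>{alt}")
    # A transversion crosses the purine/pyrimidine boundary.
    return (ref in PURINES) != (alt in PURINES)


def keep_transversions(snps):
    """Keep only transversion SNPs from hypothetical (snp_id, ref, alt) tuples."""
    return [snp for snp in snps if is_transversion(snp[1], snp[2])]


if __name__ == "__main__":
    panel = [("rs1", "C", "T"), ("rs2", "G", "A"), ("rs3", "A", "C"), ("rs4", "G", "T")]
    print([snp[0] for snp in keep_transversions(panel)])  # ['rs3', 'rs4']
```

Dropping transitions discards real variation too. This is the price paid for robustness to damage when comparing low-coverage ancient samples with modern ones.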

Excerpt, from the discussion:

In its classical formulation, the Kurgan hypothesis, i.e. a late Neolithic spread of proto-Indo-European languages from the Pontic steppes, regards the GAC people as largely descended from Late Neolithic ancestors from the East, most likely representing the Yamna culture; these populations then continued their Westward movement, giving rise to the later Corded Ware and Bell Beaker cultures. Gimbutas [23] suggested that the spread of Indo-European languages involved conflict, with eastern populations spreading their languages and customs to previously established European groups, which implies some degree of demographic change in the areas affected by the process. The genomic variation observed in GAC individuals from Kierzkowo, Poland, does not seem to agree with this view. Indeed, at the nuclear level, the GAC people show minor genetic affinities with the other populations related with the Kurgan Hypothesis, including the Yamna. On the contrary, they are similar to Early-Middle Neolithic populations, even geographically distant ones, from Iberia or Sweden. As already found for other Late Neolithic populations [18], in the GAC people’s genome there is a component related to those of much earlier hunting-gathering communities, probably a sign of admixture with them. At the nuclear level, there is a recognizable genealogical continuity from Yamna to Corded Ware. However, the view that the GAC people represented an intermediate phase in this large-scale migration finds no support in bi-dimensional representations of genome diversity (PCA and MDS), ADMIXTURE graphs, or in the set of estimated f3-statistics.
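For readers unfamiliar with f3-statistics: in its basic form, f3(C; A, B) is the average over SNPs of (c − a)(c − b), where a, b and c are the allele frequencies in populations A, B and C. A significantly negative value suggests that C is a mixture of populations related to A and B. In the "outgroup" form f3(O; X, Y), higher values mean that X and Y share more genetic drift relative to the outgroup O. The Python sketch below is a naive estimator that assumes hypothetical per-SNP frequency lists. Published analyses (e.g. with ADMIXTOOLS) add a small-sample bias correction and block-jackknife standard errors, which are omitted here.

```python
from statistics import fmean


def f3(c, a, b):
    """Naive f3(C; A, B): mean over SNPs of (c - a) * (c - b).

    c, a, b are equal-length sequences of allele frequencies (0 to 1) for the
    target population C and the two reference populations A and B, at the
    same SNPs. No small-sample correction and no standard error are computed.
    """
    if not (len(c) == len(a) == len(b)) or len(c) == 0:
        raise ValueError("need equal-length, non-empty frequency vectors")
    return fmean((ci - ai) * (ci - bi) for ci, ai, bi in zip(c, a, b))


if __name__ == "__main__":
    # Toy frequencies (hypothetical): C sits halfway between A and B.
    a = [0.1, 0.9, 0.2]
    b = [0.9, 0.1, 0.8]
    c = [0.5, 0.5, 0.5]
    print(round(f3(c, a, b), 4))  # -0.1367
```

If C were identical to A, every term would be zero. The negative toy value reflects C lying between two divergent sources, which is the admixture signal the f3 test looks for.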

Scheme summarizing the five alternative models compared via ABC random forest. We generated by coalescent simulation mtDNA sequences under five models, differing as to the number of migration events considered. The coloured lines represent the ancient samples included in the analysis, namely Unetice (yellow line), Bell Beaker (purple line), Corded Ware (green line) and Globular Amphorae (red line) from Central Europe, Yamnaya (light blue line) and Srubnaya (brown line) from Eastern Europe. The arrows refer to the three waves of migration tested. Model NOMIG was the simplest one, in which the six populations did not have any genetic exchanges; models MIG1, MIG2 and MIG1, 2 differed from NOMIG in that they included the migration events number 1, 2 (from Eastern to Central Europe, respectively before and after the onset of the GAC), or both. Model MIG2, 3 represents a modification of MIG2 model also including a back migration from Central to Eastern Europe after the development of the Corded Ware culture.

Together with Globular Amphora culture samples from Mathieson et al. (2017), this suggests that Kristiansen’s Indo-European Corded Ware Theory is wrong, even in its latest revised models of 2017.

The background shading indicates the three migratory waves proposed by Marija Gimbutas, and personally checked by her in 1995. The symbols refer to the ancient populations considered in the ABC analysis.

On the other hand, the article’s genetic findings show some interesting connections in terms of mtDNA phylogeography, but without a proper archaeological model they are difficult to explain.

Haplogroup frequencies were obtained for Early Neolithic (EN), Middle Neolithic (MN), Chalcolithic (CA), and Late Neolithic (LN). The color assigned to each haplogroup is represented on the lower right part of each plot. Haplogroup frequencies were plotted geographically using QGIS v2.14.
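The per-period frequencies shown in these maps amount to simple counting. Below is a minimal Python sketch, assuming one (period, haplogroup) record per individual. The records are toy data, the function name is hypothetical, and the QGIS map rendering is not reproduced.

```python
from collections import Counter, defaultdict


def haplogroup_frequencies(records):
    """Relative mtDNA haplogroup frequencies per period.

    records: iterable of (period, haplogroup) pairs, one per individual,
    e.g. ("EN", "K1a"). Haplogroup labels are used exactly as given.
    """
    counts = defaultdict(Counter)
    for period, haplogroup in records:
        counts[period][haplogroup] += 1
    return {
        period: {hg: n / sum(c.values()) for hg, n in c.items()}
        for period, c in counts.items()
    }


if __name__ == "__main__":
    toy = [("EN", "H"), ("EN", "K"), ("EN", "H"), ("EN", "U5"), ("LN", "U5")]
    print(haplogroup_frequencies(toy))
    # {'EN': {'H': 0.5, 'K': 0.25, 'U5': 0.25}, 'LN': {'U5': 1.0}}
```

With small ancient samples, a single individual can shift a frequency by tens of percentage points. This is worth keeping in mind when reading such maps.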

Text and images from the article under Creative Commons Attribution 4.0 license.

Discovered first via Bernard Sécher’s blog.

See also:

The renewed ‘Kurgan model’ of Kristian Kristiansen and the Danish school: “The Indo-European Corded Ware Theory”


A popular science article on Indo-European migrations has appeared at Science News, entitled How Asian nomadic herders built new Bronze Age cultures, signed by Bruce Bower. While the article is well balanced and introduces new readers to the current state of the controversy on Indo-European migrations (including the opposing theories led by Kristiansen/Anthony vs. Heyd), it once again echoes the conclusions of the 2015 Nature articles on the subject, especially with its featured image.

I have argued many times why the recent ‘Yamnaya -> Corded Ware -> Bell Beaker’ migration model is wrong, mainly in my essay Indo-European demic diffusion model, but also in articles on this blog, most recently in the post Correlation does not mean causation: the damage of the ‘Yamnaya ancestral component’, and the ‘Future American’ hypothesis. It is known that Nature is a bit of a ‘tabloid’ in the publishing industry, and these 2015 articles offered simplistic conclusions based on a wrong assessment of archaeological and linguistic data, in search of groundbreaking claims.

An excerpt from Bower’s article:

Corded Ware culture emerged as a hybrid way of life that included crop cultivation, breeding of farm animals and some hunting and gathering, Kristiansen argues. Communal living structures and group graves of earlier European farmers were replaced by smaller structures suitable for families and single graves covered by earthen mounds. Yamnaya families had lived out of their wagons even before trekking to Europe. A shared emphasis on family life and burying the dead individually indicates that members of the Yamnaya and Corded Ware cultures kept possessions among close relatives, in Kristiansen’s view.

“The Yamnaya and the Corded Ware culture were unified by a new idea of transmitting property between related individuals and families,” Kristiansen says.

Yamnaya migrants must have spoken a fledgling version of Indo-European languages that later spread across Europe and parts of Asia, Kristiansen’s group contends. Anthony, a longtime Kristiansen collaborator, agrees. Reconstructed vocabularies for people of the Corded Ware culture include words related to wagons, wheels and horse breeding that could have come only from the Yamnaya, Anthony says.

I have already talked about Kristiansen’s continuation of Gimbutas’ outdated ideas: we are seeing a renewed effort by some Scandinavian (mainly Danish) scholars to boost (and somehow capitalise on) the revitalised concept of the “Kurgan people”, although the fundamental issue has now shifted more clearly to the language spoken by Corded Ware migrants.

As far as I can tell, this renewed interest began two years ago, with the simultaneous publication of genetic studies by Haak et al. (2015) and Allentoft et al. (2015), and with the misuse of the cursed concept of ‘Yamnaya ancestry’ to derive far-fetched conclusions.

On the other hand, genetic research is not solely responsible for this. David Anthony, who was apparently consulted by Haak et al. (2015) for their paper, where he appears as co-author, has kept a low (or lower) profile. Only recently has he suggested potential links between Corded Ware and Bell Beaker cultures in Lesser Poland, which might explain what (as some geneticists told him) appeared to be a Yamna -> Corded Ware -> Bell Beaker migration in the first ancient samples studied.

Otherwise, Anthony’s migration model remains strongly based on Archaeology. It offers a careful interpretation of potential contacts and migrations in the Pontic-Caspian steppe, and only marginally addresses Linguistics (relying on Ringe’s controversial ‘glottochronological model’ of 2006). Because he stays close to the archaeological record, he is compelled to explain the potential adoption of Indo-European by Corded Ware culture (CWC) peoples as multiple cultural diffusion events, since no migration from the steppe into CWC territories is observed.

I think he is thus showing a great deal of restraint by not jumping on the bandwagon of this recent trend based on scarce genetic finds, and is therefore also losing the opportunity to publish articles in high-impact-factor journals…

This newly created Danish school, on the other hand, seems to be swimming with the tide. Kristiansen is known for his controversial ‘universal’ interpretations of European Prehistory, which are nevertheless more readable and interesting than most specialised literature on Archaeology, at least for us non-archaeologists. He has apparently seized the opportunity to give a strong impulse to his theories.

Not that there is anything wrong with that, of course, but sometimes it might seem that a lot of papers (or even researchers) support something, when in fact there are only a few of them, working closely together.

I therefore see three main “branches” of this support (two of them, Genetics and Linguistics, have only recently given some limited air to this dying hypothesis). They involve a closely related group of people who lend continuous support to each other by repeating the same theory, and the same misleading map images (like the one shown in the article), so that the circular reasoning they represent is concealed behind seemingly independent works.

The theory and its development

The main theory is thus officially rooted in Kristiansen’s hypothesis, whose first article on the subject seems to be Prehistoric Migrations – the Case of the Single Grave and Corded Ware Cultures (1989), which applied the Kurgan model to the Corded Ware migrations. It was probably something of a breakthrough in Archaeology, bringing migration back into mainstream Archaeology (followed closely by Anthony), and he deserves credit for this.

After this proposal, the publications supporting this model are mostly his own. Nevertheless, Kristiansen’s model, as I understand it, did not involve the sudden Yamnaya -> Corded Ware migrations discussed in recent genetic articles. Instead, it involved long-lasting contacts between peoples and cultures from the North Pontic steppe, Trypillian, and Globular Amphora, which formed a new, mixed one: the Corded Ware people and culture. Also, Gimbutas’ original migration model (1963) describes waves of Kurgan migrants into Vučedol and Bell Beaker, which seem to have been forgotten in recent models*.
* The most recent model by Anthony describes such migrations into Early Bronze Age Balkan cultures (as do most archaeological publications today). However, he is unable to recognize migration waves from Yamna into the Corded Ware culture, and therefore describes mere potential routes (or modes) of cultural diffusion, including language change.

Proposal for the origin and spread of the Corded Ware/Battle Axe cultural complex: 1) distribution of CWC groups; 2) Yamna culture; 3) presumed area of origin; 4) presumed main directions of the primary distribution. Other individual CW cultures are also numbered. From Kristiansen (1989).

Then, skipping the years of simplistic phylogeography based on modern haplogroup distribution, we have to jump directly to Allentoft (of the Natural History Museum of Denmark) and colleagues. Their article on the population genomics of Bronze Age Eurasia (2015), with which Kristiansen collaborated, offers the first direct association of Corded Ware with the expansion of Indo-European peoples and languages from Yamna. Their very ‘kurgan-like’, Corded Ware-centric map represents an interesting take on the Yamna -> Corded Ware -> Bell Beaker question:

Detail of Fig. 1 from Allentoft et al. (2015): “Distribution of Early Bronze Age cultures Yamnaya, Corded Ware, and Afanasievo with arrows showing the Yamnaya expansions”.

And suddenly, we are now seeing more works that support the central thesis of the group – that Corded Ware must have brought Indo-European languages to Europe:

Recent publications by K-G Sjögren, from the same department as Kristiansen at the University of Gothenburg, seem to imply a direct Corded Ware -> Bell Beaker connection in central Europe.

Guus Kroonen’s recent hypothesis of a potential (Proto-Semitic-like) Germanic substrate (2012) has recently been added to the cause: together with Iversen (also from the University of Copenhagen), Kroonen supports a link with the Battle Axe/Funnelbeaker culture interaction. However, in their archaeological-linguistic model, Germanic seems to be older than the rest of the Indo-European languages, representing the first wave of Indo-Europeanization in Europe (wat?!), whereas Balto-Slavic is much younger and unrelated…? But didn’t they share the same substrate (as did Greek, partially) in Kroonen (2012)? I think Kroonen’s hypothesis might be better explained through an earlier contact in the North Pontic steppe.

Modified from Kristiansen et al. (2017). “Schematic representation of how different Indo-European branches have absorbed words (circles) from a lost Neolithic language or language group (dark fill) in the reconstructed European linguistic setting of the third millennium BC, possibly involving one or more hunter gatherer languages (light fill) (after Kroonen & Iversen 2017)”.


This recently created Danish pressure group is not something bad per se. I don’t agree with their hypothesis (or rather evolving hypotheses, since they change with new genetic results and linguistic proposals, as is shown in Kristiansen et al. 2017), but I understand that the group continues a recent tradition:

Publications are always a great way to advance knowledge, and if they bring some publicity, more publications (with the always craved impact factor), and maybe more investment in the departments (with more local jobs and prestige)… why not?

However, this workgroup model of research is reminiscent of the Anatolian homeland group loosely formed around Renfrew, the Palaeolithic Continuity workgroup around Cavalli-Sforza, or (more recently) the Celtic from the West group around Cunliffe and Koch. In my opinion, the difference between Kristiansen’s workgroup and the supporters of all those other models is that (at least for the moment) their collaboration is not obvious to many.

Therefore, to be fair to outsiders, I think this group should clearly state their end model. I propose the general term “Indo-European Corded Ware Theory” (IECWT) workgroup, because ‘Danish’ is too narrow and ‘Scandinavian’ too broad to represent the whole group. But any name will do.

My opinion on the IECWT

As you can see, no single strong piece of evidence exists in support of the IECWT:

  • Not for a solid model of PIE expansion from Corded Ware, not even within the IECWT group, where there is no support (to date) for a Balto-Slavic expansion associated with the Corded Ware culture… Or any other dialect, for that matter;
  • Not for a Corded Ware -> Bell Beaker connection, that is, not before the publication of Allentoft et al. (2015) and the articles echoing their conclusions;
  • Not for a unified Pre-Germanic community before the Dagger Period, still less one linked with the expansion of the Corded Ware culture from the steppe. That connection is found only in Anthony (2007), where he links it with a cultural diffusion into Usatovo, which, given the current genetic data, seems too late for a linguistic expansion with Corded Ware peoples.

The wrong interpretation of the scarce initial ancient samples has been another feeble stone laid on the ruins of Gimbutas’ theory. Her simple theory of Kurgan invaders was certainly a breakthrough in her time, when speaking about migrating Indo-European peoples was taboo. It has since been superseded by more detailed archaeological and linguistic accounts of what happened in eastern and central Europe during the Chalcolithic and Bronze Age.

However, a lot of people are willing to consume post-truth, genetics-based citebait like crazy. This is happening at a time when Twitter, Facebook, blogs, etc. seem to shape general knowledge, while dozens of new, carefully prepared papers on Archaeology and Linguistics related to Indo-European peoples are published weekly and attract no attention. They are ignored just because they do not support these simplistic claims, or precisely because they fully reject them.

An older connection of Germanic to Scandinavia (and thus an ancestral Indo-European cultural diffusion from north to south) seems to fit the traditional idea of an autochthonous Germanic homeland in Scandinavia better than a bunch of southern Bell Beaker invaders does. Such invaders would have brought a language that could only later develop into a common Nordic language during the Bronze Age, in a genetically diverse community…

One is left to wonder whether the support for Corded Ware + haplogroup R1a as representing Pre-Germanic is also in line with the most natural human Kossinnian tendency, whereby the older the connection of your paternal line and your ancestral language to your historical territory, the better. The absence from the workgroup of researchers from Norway, where the R1b subclades brought by Bell Beakers peak, is revealing.

We are seeing strong popular pressure, e.g. from Hindu nationalists supporting the Out of India Theory, or from some Slavic people seeking to recreate a ‘Northern IE group’ with a Germano-Balto-Slavic Corded Ware culture (along with a renewed interest among amateur geneticists in skin, hair and eye colour). Given this, it is only natural to expect similar autochthonous-first trends in certain regions of the Germanic-speaking community.

NOTE: I feel a bit like an anti-IECWT hooligan here, once again fulfilling Godwin’s Law. Judging by previous reactions on this blog to criticism of the Out of India Theory, and to criticism of R1a as the vector of expansion of Indo-European languages, this post is likely to upset some people.

It is not intended to be against these researchers individually, though. All of them have certainly contributed greatly to their fields, indeed more than I have to any field. Kristiansen is well known for his careful, global interpretations of European prehistory (and has supported his model for quite a long time). I do like Kroonen’s ideas of a Pre-Germanic substratum. People probably take part in the group because they collaborate closely with each other, and because of the huge pressure to publish in high-impact-factor journals, so combining their disparate research within a common model seems only natural.

But their collaboration is boosting certain wrong ideas and giving rise to certain misconceptions in Linguistics. Sadly, it is also renewing past ethnocentric views of language in Northern Europe, which will luckily be shown to be wrong, again. After all, publications (like ideas in general) are subject to criticism, as mine are. Researchers who publish know their work is subject to criticism, not only before publication, but also (and probably more so) after it. That a paper can be incorrect, biased, or even completely absurd does not mean the person who wrote it is a fool. That’s the difference between criticising ideas and insulting people. If criticism offends you, you shouldn’t be publishing. Period.


Featured image: From Allentoft et al. (2015). See here for full caption.

mtDNA haplogroup frequency analysis from Verteba Cave supports a strong cultural frontier between farmers and hunter-gatherers in the North Pontic steppe


New preprint at bioRxiv, led by a Japanese researcher, with an analysis of the mtDNA of Trypillians from Verteba Cave, Analysis of ancient human mitochondrial DNA from Verteba Cave, Ukraine: insights into the origins and expansions of the Late Neolithic-Chalcolithic Cututeni-Tripolye Culture, by Wakabayashi et al. (2017).


Background: The Eneolithic (~5,500 yrBP) site of Verteba Cave in Western Ukraine contains the largest collection of human skeletal remains associated with the archaeological Cucuteni-Tripolye Culture. Their subsistence economy is based largely on agro-pastoralism and had some of the largest and most dense settlement sites during the Middle Neolithic in all of Europe. To help understand the evolutionary history of the Tripolye people, we performed mtDNA analyses on ancient human remains excavated from several chambers within the cave.

Results: Burials at Verteba Cave are largely commingled and secondary in nature. A total of 68 individual bone specimens were analyzed. Most of these specimens were found in association with well-defined Tripolye artifacts. We determined 28 mtDNA D-Loop (368 bp) sequences and defined 8 sequence types, belonging to haplogroups H, HV, W, K, and T. These results do not suggest continuity with local pre-Eneolithic peoples, but rather complete population replacement. We constructed maximum parsimonious networks from the data and generated population genetic statistics. Nucleotide diversity (π) is low among all sequence types and our network analysis indicates highly similar mtDNA sequence types for samples in chamber G3. Using different sample sizes due to the uncertainty in number of individuals (11, 28, or 15), we found Tajima’s D statistic to vary. When all sequence types are included (11 or 28), we do not find a trend for demographic expansion (negative but not significantly different from zero); however, when only samples from Site 7 (peak occupation) are included, we find a significantly negative value, indicative of demographic expansion.

Conclusions: Our results suggest individuals buried at Verteba Cave had overall low mtDNA diversity, most likely due to increased conflict among sedentary farmers and nomadic pastoralists to the East and North. Early Farmers tend to show demographic expansion. We find different signatures of demographic expansion for the Tripolye people that may be caused by existing population structure or the spatiotemporal nature of ancient data. Regardless, peoples of the Tripolye Culture are more closely related to early European farmers and lack genetic continuity with Mesolithic hunter-gatherers or pre-Eneolithic groups in Ukraine.
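The statistics quoted in the abstract are standard ones. As a minimal sketch (not the authors' code), nucleotide diversity and Tajima's D (Tajima 1989) can be computed from aligned sequences as shown below. D compares the mean number of pairwise differences k with Watterson's estimator S/a1. A significantly negative D (an excess of rare variants) is commonly read as population expansion. The function name and toy sequences are hypothetical, and no significance test is included. The statistic needs at least four sequences, and its value depends on n. This is why the uncertainty about the number of individuals (11, 28, or 15) matters.

```python
from itertools import combinations
from math import sqrt


def diversity_stats(seqs):
    """Return (S, k, pi, D) for equal-length aligned sequences.

    S  = number of segregating (polymorphic) sites
    k  = mean number of pairwise differences between sequences
    pi = k divided by alignment length (nucleotide diversity per site)
    D  = Tajima's D, or None when S == 0 (D is undefined then)
    Gaps and missing data are treated as ordinary characters.
    """
    n = len(seqs)
    if n < 4 or len({len(s) for s in seqs}) != 1:
        raise ValueError("need at least four sequences of equal length")
    length = len(seqs[0])

    pairs = list(combinations(seqs, 2))
    k = sum(sum(x != y for x, y in zip(s1, s2)) for s1, s2 in pairs) / len(pairs)
    S = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if S == 0:
        return S, k, k / length, None

    # Constants from Tajima (1989); the variance term vanishes for n < 4.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    D = (k - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))
    return S, k, k / length, D


if __name__ == "__main__":
    toy = ["ACGTAC", "ACGTAA", "ACGAAA", "ACGAAA"]  # hypothetical D-loop fragments
    S, k, pi, D = diversity_stats(toy)
    print(S, round(k, 4), round(pi, 4), round(D, 2))  # 2 1.1667 0.1944 0.59
```

With a handful of sequences, D has a wide null distribution, so even a sizeable value like the toy 0.59 says little on its own.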

Genetic findings keep supporting the long-lasting cultural and linguistic frontier that Anthony (2007), among others, asserted existed in the North-West Pontic steppe in the Mesolithic and Neolithic between western steppe cultures and farmers. At the same time, they disprove Kristiansen’s theories of a Sredni Stog expansion in Kurgan waves, with a mixture of GAC and Trypillia within the Corded Ware culture:

Previous ancient DNA studies showed that hunter-gatherers before 6,500 yrBP in Europe commonly had haplogroups U, U4, U5, and H, whereas hunter-gatherers after 6,500 yrBP in Europe had less frequency of haplogroup H than before. Haplogroups T and K appeared in hunter-gatherers only after 6,500 yrBP, indicating a degree of admixture in some places between farmers and hunter-gatherers. Farmers before and after 6,500 yrBP in Europe had haplogroups W, HV*, H, T, K, and these are also found in individuals buried at Verteba Cave. Therefore, our data point to a common ancestry with early European farmers. Our data also suggest population replacement. Mathieson et al. analyzed a number of Neolithic Ukrainian samples (petrous bone) from several sites in southern, northern, and western Ukraine, dating to ~8,500 – 6,000 yrBP, and found exclusively U (U4 and U5) mtDNA lineages. It should be noted that ‘Neolithic’ in this context does not mean the adoption of agriculture, but rather simply coinciding with a change in material culture. They also analyzed several Trypillian individuals from Verteba Cave (different samples from those included in this study). Similar to our findings, they found a wider diversity of mtDNA lineages, including H, HV, and T2b. These data, combined with our results, appear to confirm almost complete population replacement by individuals associated with the Tripolye Culture during the Middle to Late Neolithic.

The findings also hint at potential contacts of Yamna with Usatovo, as predicted by Anthony (2007), or alternatively (lacking precise dates) at contacts with Corded Ware migrants:

Trypillians were very much a distinct people who most likely displaced local hunter-gatherers with little admixture. Haplogroup W was also observed in several specimens deriving from Site G3. Although we are unsure if all of these haplogroups come from a single or multiple individuals, this observation is interesting in that it is relatively rare and isolated among Neolithic samples. It has, however, been found in samples dating to the Bronze Age. In the study by Wilde et al. [35], they found haplogroup W present in two samples from the Early Bronze Age associated with the Yamnaya and Usatovo cultures. The Usatovo culture (~ 3500 – 2500 BC) was found in Romania, Moldova, and southern Ukraine. It was the conglomeration of Tripolye and North Pontic steppe cultures. Therefore, this individual could link the Trypillian peoples to the Usatovo peoples and perhaps to the greater Yamnaya steppe migrations during the Bronze Age that led to the Corded Ware Culture.

On the other hand, an article based only on mtDNA haplogroup frequencies seems to offer too little evidence of anything today. The lack of Y-DNA haplogroups and admixture data makes its interpretations provisional, subject to change when further data are published. Also, radiocarbon dates are only reliable for individuals from one site (Site 7), dated ca. 5,500 cal BP, while “other chambers in the cave are not as confidently dated”…

“Based on the 8 sequence types of the mtDNA D-loop, a maximum parsimonious phylogenetic network was constructed. Circles represent the sequence types, and the size of the circle is proportional to the number of samples. Numbers on the branches between the circles are nucleotide position numbers (+16,000) of the human mitochondrial genome sequence (rCRS). Information about the location (chamber within the cave) where the specimen was excavated is also provided. Areas 2 and 17 are part of Site 7, and these are defined as a separate chamber, although they are located in close proximity within Site 7. The other chambers, Site 20, G2, and G3, are independent and separate locations within the cave. ‘Undefined’ chamber describes an unknown location within the cave. Specimens from each chamber showed deviation for the sequence type distribution observed in the sample set. For example, specimens excavated from Site 7 had five unique sequence types, (I, II, III, IV, and VIII), while specimens excavated from chamber G 21 had mainly one sequence type (V)”. Made available by the authors under a CC-BY-NC-ND 4.0 International license.

We had also seen signs of conflict between Trypillian and steppe cultures in a recent article, Violence at Verteba Cave, Ukraine: New Insights into the Late Neolithic Intergroup Conflict, by Madden et al. (2017):

Many researchers have pointed to the huge “megasites” and construction of fortifications as evidence of intergroup hostilities among the Late Neolithic Tripolye archaeological culture. However, to date, very few skeletal remains have been analyzed for the types of traumatic injury that serve as direct evidence for violent conflict. In this study, we examine trauma on human remains from the Tripolye site of Verteba Cave in western Ukraine. The remains of 36 individuals, including 25 crania, were buried in the gypsum cave as secondary interments. The frequency of cranial trauma is 30-44% among the 25 crania, six males, four females and one adult of indeterminate sex displayed cranial trauma. Of the 18 total fractures, 10 were significantly large and penetrating suggesting lethal force. Over half of the trauma is located on the posterior aspect of the crania, suggesting the victims were attacked from behind. Sixteen of the fractures observed were perimortem and two were antemortem. The distribution and characteristics of the fractures suggest that some of the Tripolye individuals buried at Verteba Cave were victims of a lethal surprise attack. Resources were limited due to population growth and migration, leading to conflict over resource access. It is hypothesized that during this time of change burial in this cave aided in development of identity and ownership of the local territory.


Correlation does not mean causation: the damage of the ‘Yamnaya ancestral component’, and the ‘Future American’ hypothesis

New Ukraine Eneolithic sample from late Sredni Stog, near homeland of the Corded Ware culture

The concept of “outlier” in studies of Human Ancestry, and the Corded Ware outlier from Esperstedt

Marija Gimbutas and the expansion of the “Kurgan people” based on tumulus-building cultures

Evolutionary forces in language change depend on selective pressure, but also on random chance


An interesting new paper in Nature: Detecting evolutionary forces in language change, by Newberry, Ahern, Clark, and Plotkin (2017). Discovered via Science Daily.

The following are excerpts of materials related to the publication (written by Katherine Unger Baillie), from The University of Pennsylvania:

Examining substantial collections of annotated texts dating from the 12th to the 21st centuries, the researchers found that certain linguistic changes were guided by pressures analogous to natural selection — social, cognitive and other factors — while others seem to have occurred purely by happenstance.

“Linguists usually assume that when a change occurs in a language, there must have been a directional force that caused it,” said Joshua Plotkin, professor of biology in Penn’s School of Arts and Sciences and senior author on the paper. “Whereas we propose that languages can also change through random chance alone. An individual happens to hear one variant of a word as opposed to another and then is more likely to use it herself. Chance events like this can accumulate to produce substantial change over generations. Before we debate what psychological or social forces have caused a language to change, we must first ask whether there was any force at all.”

“One of the great early American linguists, Leonard Bloomfield, said that you can never see a language change, that the change is invisible,” said Robin Clark, a coauthor and professor of linguistics in Penn Arts and Sciences. “But now, because of the availability of these large corpora of texts, we can actually see it, in microscopic detail, and begin to understand the details of how change happened.”

One change is the regularization of past-tense verbs. Using the Corpus of Historical American English, comprised of more than 100,000 texts ranging from 1810 to 2009 that have been parsed and digitized — a database that includes more than 400 million words — the team searched for verbs where both regular and irregular past-tense forms were present, for example, “dived” and “dove” or “wed” and “wedded.”

“There is a vast literature and a lot of mythology on verb regularization and irregularization,” Clark said, “and a lot of people have claimed that the tendency is toward regularization. But what we found was quite different.”

Indeed, the analysis pointed to particular instances where it seems selective forces are driving irregularization. For example, while a swimmer 200 years ago might have “dived”, today we would say they “dove.” The shift towards using this irregular form coincided with the invention of cars and concomitant increase in use of the rhyming irregular verb “drive”/“drove.”

Despite finding selection acting on some verbs, “the vast majority of verbs we analyzed show no evidence of selection whatsoever,” Plotkin said.

The team recognized a pattern: random chance affects rare words more than common ones. When rarely used verbs changed, that replacement was more likely to be due to chance. But when more common verbs switched forms, selection was more likely to be a factor driving the replacement.
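Why rare words should be more exposed to chance is a standard result of population genetics: under pure drift, a variant's frequency changes each generation by a random amount whose variance is p(1-p)/N, so the fewer "copies" (here, uses of a verb) are passed on, the larger the random swings. A minimal Wright-Fisher sketch in Python illustrates this, assuming each generation of speakers simply copies N uses of a verb at random from the previous generation; the function names, token counts and selection parameter `s` are hypothetical illustrations, not the authors' inference method or data.

```python
import random
import statistics


def wright_fisher(p0, n_tokens, generations, s=0.0, rng=None):
    """Frequency trajectory of one variant (say, irregular "dove" vs "dived").

    Each generation copies n_tokens uses at random from the previous
    generation (binomial sampling, i.e. drift). s is an optional selective
    advantage of the variant; s = 0 means pure drift.
    """
    if rng is None:
        rng = random.Random()
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Selection, if any, shifts the expected frequency before copying.
        expected = p * (1 + s) / (p * (1 + s) + (1 - p))
        # Random copying of n_tokens uses: the source of drift.
        count = sum(1 for _ in range(n_tokens) if rng.random() < expected)
        p = count / n_tokens
        trajectory.append(p)
    return trajectory


def drift_spread(n_tokens, replicates=100, generations=20, seed=1):
    """Spread (std. dev.) of the final frequency across neutral replicates."""
    rng = random.Random(seed)
    finals = [wright_fisher(0.5, n_tokens, generations, rng=rng)[-1]
              for _ in range(replicates)]
    return statistics.stdev(finals)


if __name__ == "__main__":
    # A "rare" verb (few uses per generation) vs a "common" one.
    print("rare verb spread:  ", round(drift_spread(20), 3))
    print("common verb spread:", round(drift_spread(1000), 3))
```

With s = 0, the rare verb's frequency wanders far more than the common verb's; a positive s instead makes the variant rise steadily whatever N is, the kind of directional change the authors attribute to selection.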

The grammar of negating a sentence has changed from “Ic ne secge” (Beowulf, c. 900) to “Ic ne sege noht” (the Ormulum, c. 1100) to “I seye not” (Chaucer, c. 1400) to “I doe not say” (Shakespeare, c. 1600) before returning to the familiar “I don’t say” (Virginia Woolf, c. 1900). A team from Penn used massive digital libraries along with inference techniques from population genetics to quantify the forces responsible for language evolution, such as in Jespersen’s cycle of negation, depicted here. (c) Cherissa Dukelow, 2017, license information below

The authors also observed a role of random chance in grammatical change. The periphrastic “do,” as used in, “Do they say?” or “They do not say,” did not exist 800 years ago. Back in the 1400s, these sentiments would have been expressed as, “Say they?” or “They say not.”

Using the Penn Parsed Corpora of Historical English, which includes 7 million syntactically parsed words from 1,220 British English texts, the researchers found that the use of the periphrastic “do” emerged in two stages, first in questions (“Don’t they say?”) around the 1500s, and then roughly 200 years later in imperative and declarative statements (“They don’t say.”).

These manuscripts show changes from Old English (Beowulf) through Middle English (Trinity Homilies, Chaucer) to Early Modern English (Shakespeare’s First Folio). Penn researchers used large collections of digitized texts spanning the 12th to the 21st centuries to show that many language changes can be attributed to random chance alone. (c) Mitchell Newberry, 2017, license information below

While most linguists have assumed that such a distinctive grammatical feature must have been driven to dominance by some selective pressure, the Penn team’s analysis questions that assumption. They found that the first stage of the rising periphrastic “do” use is consistent with random chance. Only the second stage appears to have been driven by a selective pressure.

“It seems that, once ‘do’ was introduced in interrogative phrases, it randomly drifted to higher and higher frequency over time,” said Plotkin. “Then, once it became dominant in the question context, it was selected for in other contexts, the imperative and declarative, probably for reasons of grammatical consistency or cognitive ease.”

As the authors see it, it’s only natural that social-science fields like linguistics increasingly exchange knowledge and techniques with fields like statistics and biology.

“To an evolutionary biologist,” said Newberry, “it’s important that language is maintained through a process of copying language; people learn language by copying other people. That copying introduces minute variation, and those variants get propagated. Each change is an opportunity for a different copying rate, which is the basis for evolution as we know it.”

Featured image: copyrighted, modified from the Supplementary information of the article.

Image (c) Cherissa Dukelow, 2017, licensed under CC-BY-NC-SA 4.0
Image (c) Mitchell Newberry, 2017, licensed under CC-BY-NC 4.0 (see materials at University of Pennsylvania for further sources).


Correlation does not mean causation: the damage of the ‘Yamnaya ancestral component’, and the ‘Future American’ hypothesis


Human ancestry can only help solve anthropological questions if it draws on all the anthropological disciplines involved. I have said this many times on this blog.

Correlation does not mean causation

Really, it does not.

You might think the tenet ‘correlation does not mean causation’ must be evident at this point in Statistics, and that it must also be evident to all those using statistical methods in their research. Sadly, it is not so. A lot of researchers just look for correlation and derive conclusions, without even a sound initial hypothesis to be tested… You can judge for yourself, e.g. by reading the many instances of this complaint about recent publications in Biomedical and Social Sciences on the interesting blog Statistical Modeling, Causal Inference, and Social Science.
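The tenet is easy to demonstrate. In the toy Python simulation that follows, a hidden variable Z drives both X and Y, which never influence each other; X and Y still come out strongly correlated, and the correlation vanishes once Z is accounted for (here by regressing it out). All variables and numbers are invented for illustration: Z stands in for any unmeasured common cause, for example an older shared source of ancestry behind two later populations, neither of which descends from the other.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def residuals(ys, zs):
    """What is left of ys after removing a least-squares linear fit on zs."""
    n = len(zs)
    mz, my = sum(zs) / n, sum(ys) / n
    slope = (sum((z - mz) * (y - my) for z, y in zip(zs, ys))
             / sum((z - mz) ** 2 for z in zs))
    return [y - my - slope * (z - mz) for y, z in zip(ys, zs)]


def simulate(n=5000, seed=0):
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]   # hidden common cause
    x = [zi + rng.gauss(0, 0.5) for zi in z]  # X depends on Z only
    y = [zi + rng.gauss(0, 0.5) for zi in z]  # Y depends on Z only
    return z, x, y


if __name__ == "__main__":
    z, x, y = simulate()
    print("corr(X, Y):    ", round(pearson(x, y), 2))
    print("corr(X, Y | Z):", round(pearson(residuals(x, z), residuals(y, z)), 2))
```

Without Z in the model, the strong X-Y correlation would be easy to misread as one causing the other.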

In anthropological questions related to Indo-European studies there is an added difficulty: not mistaking correlation for causation also means (if only to avoid the most obvious confounders) taking into account the wealth of linguistic and archaeological data already available to explain the expansion of Indo-European languages.

You might also believe that international researchers in Human Evolutionary Biology (after all, essentially a biomedical discipline) are acquainted with statistical methods and the problems they pose in their field. And that scientific journals, especially those with the highest impact factors like Nature, Science, or PNAS, have professional, careful reviewers who would never accept papers that equate correlation with causation, especially when Social Sciences are involved (because this alone might make errors grow exponentially…). Sadly, this is obviously not so, either.

The ‘Yamnaya component’ concept and its damage

From Allentoft et al. (2015), emphasis is mine:

Both studies [Haak et al. (2015) and this one] found a genetic affinity between samples from a central European culture known as Corded Ware, which existed from around 2500 bc, and samples from the earlier Yamnaya steppe culture. This similarity between distant populations is best explained by a substantial westward expansion of the Yamnaya or their close relatives into central Europe (Fig. 1b). Such an expansion is consistent with the steppe hypothesis, which argues that Corded Ware cultures were a conduit for the dispersal of Indo-European languages into Europe.

More interesting than these vague words (and the short, almost invisible suggestion that Yamna may not be exactly the population behind Corded Ware peoples) are the maps that illustrated their risky hypothesis in Nature. They called it the “steppe hypothesis”, just like that, in general terms, as if everyone defending a steppe origin for Proto-Indo-European supported such a model, when they were actually referring to the specific hypothesis of one of their authors, Kristiansen, one of the few archaeologists who keep Gimbutas’ concept of the ‘Kurgan peoples’ alive, based on the Corded Ware culture:

Allentoft et al. (2015): “They conclude that the Corded Ware culture of central Europe had ancestry from the Yamnaya. Allentoft et al. also show that the Afanasievo culture to the east is related to the Yamnaya, and that the Sintashta and Andronovo cultures had ancestry from the Corded Ware. Arrows indicate migrations — those from the Corded Ware reflect the evidence that people of this archaeological culture (or their relatives) were responsible for the spreading of Indo-European languages. All coloured boundaries are approximate.”

In many publications that followed, the trend has been to reproduce this graphical model, asserting (or implying) that Corded Ware peoples migrated from the Yamna culture and were thus the vector of expansion of Indo-European languages in Europe, and that Bell Beaker peoples were the result of subsequent Corded Ware migrations.

All of this is being proven wrong, as I predicted: see Mathieson et al. (2017) and Olalde et al. (2017) for recently studied samples with a ‘steppe component’ that are older than (and unrelated to) the Yamna culture. However, no retraction (or correction, or anything of the kind) has been published to date regarding the concept of the ‘Yamnaya ancestry expansion’ and its consequences.

What we shall see, then, is just a rather surreptitious shift in terminology from ‘Yamnaya’ to ‘steppe’ component to adapt to the new data – i.e. some damage control while the ship of ‘Yamnaya ancestry’ capsizes – but little else. “Earlier ‘Yamnaya ancestry’, you say? Just, you know, let’s call it ‘steppe ancestry’, shift the expansion of Indo-European languages one or two thousand years earlier, and done!”

The damage caused by this post-truth genetics is already done: we will see the endless circulation on the Internet in general, and on social networks in particular, of these grandiose conclusions, of far-fetched Indo-European migration models that include the Corded Ware culture, and of simplistic maps with apparently harmless ‘arrows of migration’ (like the one above) representing fictional population movements and suggesting nonexistent dialectal branches.

You might be one of those sceptics wary of so many boring statistical rules: “But it’s safe reasoning: Yamnaya samples have an ‘ancestral component’ that is found elevated in Corded Ware samples, and less so in Bell Beaker samples, and PCA showed a similar result… so the migration model Yamnaya -> Corded Ware -> Bell Beaker is a priori correct, right?”

The ‘Future American’ hypothesis

Let me illustrate this attractive “Correlation = Causation” argument, using it to solve the problem of Future American languages.

Suppose we live in a future post-apocalyptic world ca. 3500 AD, with no surviving historical records before 3000 AD. None. All we have is the study of cultures and their relationships by Archaeology, proto-languages reconstructed and language families identified by Linguistics, etc.

We thus have Future Germanic and Future Romance as the only language families spoken in Future Western Europe and in the Future Americas, in a distribution similar to that of the present day*, and we have certain, somehow related, archaeologically-defined cultures on both sides of the Atlantic, like Briton, Iberian, Norman, or Lowlandish, although their distribution remains partly undefined in time and space.

* If you are really curious about this scenario, you can read about the potential evolution of a Future North-American language.

But what languages did the ancestors of Future Americans speak, and who spread them? That question remains far from settled among our future researchers, in spite of the soundest linguistic and migration models (mainly about Briton and Iberian cultures): there are too many authorities out there questioning them, fighting to impose their own pet theories.

Suddenly, the newly developed field of Human Ancestry comes to save the day. So let’s say we have this map of recovered ancient samples (dated from, say, the 6th to the 18th century AD), and our study is centred on the newly described “Western European” component (a precise combination of, say, WHG+steppe), which peaks in early samples from the Low Lands; hence we call it, quite daringly, the “Lowlandic component”.

Our group is keen to demonstrate that the ancient Lowlandic culture described in Archaeology (marked especially by the worldwide distribution of tulips, among other traits) is the origin of Western European and American languages… Now, let’s reach conclusions about migrations in the Middle Ages!

‘Future American’ hypothesis. Migration routes in Western Europe and the Americas during the Middle Ages, based on the ‘Lowlandic component’.

PCA shows that South-West European samples cluster close to some North-West European samples, and that some of the few late South American samples available cluster at some distance from North American samples, nearer to a native component represented by two individuals with 0% Lowlandic ancestry who form a separate cluster in the PCA. And some North American samples cluster quite close to North-West European samples.

Based on the decrease in the ‘Lowlandic component’ across the different samples and on PCA, we conclude that Lowlandic peoples (“or their close relatives”) must have migrated at the same time to North America, South America (or potentially from North America to South America?), and western, central, and northern Europe. Both migration events must have happened roughly at the same time, in part because both distinct language families appear in a north-south distribution, and Proto-Lowlandic must be (according to Genetics) the ancestor of both Proto-Future-Germanic and Proto-Future-Romance.

That makes a lot of sense! A huge Lowlandic pressure for migration, you see. Push-pull mechanisms and stuff. A Lowlandic Empire probably (scattered remains are found everywhere)! And, judging by the presence of the ‘Lowlandic component’ in Future East Europe from the Elbe to the Vistula, maybe Lowlandic peoples spread Proto-Slavic, too! We can even date the common Lowlandic-Slavic proto-language this way! So many groundbreaking conclusions!

Future scholars supporting the Lowlandic homeland are on fire; they can’t get enough of publishing papers on the subject. “Two different Future American language families with cultural origins in Britain and Iberia, my ass! Because genetics.”

And don’t forget the future people of haplogroup R1b-U106 and high Lowlandic component: Wow, they are the heirs of those who expanded Future Germanic and Future Romance languages everywhere, aren’t they? How proud they must be. And who wouldn’t want to have these tall, blond, blue-eyed Lowlanders as their forefathers? Personalised genetic analysis is selling like crazy: “let’s know our Lowlandic percentage!”. Everyone is happy, colourful maps with lots of arrows and shit…

But, your future self might ask in awe, seeing that this doesn’t sound quite right based on basic archaeological and linguistic knowledge:

  • What about specific models of migration proposed to date? The soundest ones, not just any one that seems to fit?
  • What about the dialectal classification of languages? The mainstream ones, not those that are compatible with this interpretation?
  • What about archaeological cultures to which individual samples belonged?
  • What about the actual dates of each sample? And how does each date relate to the state of the culture to which the sample belongs?
  • What about the haplogroups, and the actual subclade of each haplogroup?
  • What about the territories, cultures, and dates not sampled? Could they change this interpretation in light of known archaeological models?
  • And what about the actual origin of that ancestral component they so frivolously named? Did it really appear ex nihilo in the Low Lands, and expand from there?

“Who cares! This new data is sooo coool… And it proves what we wanted, what a coincidence! And it’s numbers, mate! Numbers don’t lie.”

No, numbers don’t lie. But people do.

Correlation is fun, isn’t it?



Schleicher’s Fable in Proto-Indo-European – pitch and stress accent


Also included in our monograph North-West Indo-European (first draft) is a tentative reconstruction of Schleicher’s fable in North-West Indo-European, together with a recording that illustrates the reconstructed sounds (including pitch and stress accent).

The recording is available as audio (see above) or video (see below) with captions and multiple subtitles. The captions in North-West Indo-European show acute accents over accented vowels, while stressed syllables are underlined:

I think such a recording was necessary for comparison with the most commonly reconstructed pronunciation, as usually taught in courses. I am not referring to those professors who still use only a stress accent (instead of a pitch accent) to pronounce PIE, but to those who, while using a pitch accent, place the stress on the same syllable.

A good example to illustrate my point is Andrew M. Byrd‘s reading of his version of the fable for the journal Archaeology.

Apart from some controversial decisions regarding the Proto-Indo-Hittite reconstruction (see our explanation of our version, or e.g. Kortlandt’s reconstruction of the Fable (PDF) for more details), his recitation does not seem to contrast pitch and stress accent enough, to the point that pitch and stress always seem to fall on the same syllable. He specialises in Proto-Indo-European phonology, so maybe this is a deliberate choice.

First, as an introduction (in case you are new to this question): a pitch accent is reconstructed for Proto-Indo-European, based on the reconstructed accent of Old Indian, Greek, Germanic, and Balto-Slavic. It is hence also valid for North-West Indo-European, even though Italo-Celtic lost it completely.

If you have listened to any tonal language*, you will know that words also have a stress accent, not necessarily on the same syllable as the pitch (but usually on the heaviest one). In fact, I don’t know of any accent pattern with pitch and stress on the same syllable (except for certain reconstructed, labile intermediate stages of a language), and I guess such a system would be so redundant that it would always lose one of them.

*Pitch-accent systems are also tonal systems, after all, since they involve at least two tones: an acute or rising one, and usually a falling one after it.

You can listen to a sample of the Homeric recitation by Stephen Daitz, with restored Ancient Greek pronunciation, where he contrasts pitch and stress beautifully:

Note: you can buy his readings in restored pronunciation online from Bolchazy-Carducci Publishers. I can’t recommend them highly enough.

You can listen to other samples of Ancient Greek with restored pronunciation by Stefan Hagel (whose Homeric singing is superb), and by many others.

To see what I mean by the lack of contrast in Byrd’s pronunciation, just compare the restored pronunciation with these samples of restored Koine Greek from the Biblical Language Center. I think you can hear the pitch accent, but it always falls on the stressed syllable. After a while, it gets quite monotone (no pun intended); for me, at least*.

*It seems to be, nevertheless, one of the top rated pronunciations of Koine Greek out there.

The pitch accent in my pronunciation is not as noticeable as that of Stephen Daitz, and even less so than that of Stefan Hagel. But it is not meant to be.

I wanted to combine tone and stress as naturally as possible, as they are combined in modern languages like Chinese, or like South Slavic, Baltic, or Scandinavian languages. I believe PIE phonology cannot have been too different from modern natural examples.

Many Modern Greek scholars complain about the artificiality of the restored pronunciation. I’ve heard particularly harsh criticism of Stefan Hagel’s pronunciation: many scholars do not recognise the ancestral language in the restored pronunciation.

While such critics may seem like snobbish reactionaries, and I really appreciate an exaggerated poetic style for epic poems (I have spent hundreds, probably thousands, of hours listening to Stephen Daitz), I don’t think this is the way Ancient Greek was usually spoken. In the recordings of the Ancient Greek Assimil, there is a huge contrast between the readers who don’t use the restored pronunciation (thus offering a decaffeinated Ancient Greek) and Hagel’s reading (or, almost, singing).

In my interpretation of the fable I have tried to follow these ideas, and maybe in the end the pitch accent is not as high as it should be (a fifth higher). On the other hand, it seemed more natural to me this way.

Also, in the final version of my reading, there are many words where it is not clear (not even to me) whether there is more than one syllable with pitch or stress accent. This is especially so after my first change of voice to make a more acute ‘sheep voice’, and it then worsens with my graver ‘horse voice’. I really thought recording this was going to be easier!

If you have any comments or suggestions on the pronunciation, all of them are welcome.

UPDATE (November 2, 2017): Frederik Kortlandt comments on our paper: “When comparing PIE with other tonal languages, the best candidate is Japanese, which means that the “stress” falls on the last High syllable of a word form or sequence of connected word forms.”
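Kortlandt’s rule, as quoted, is mechanical enough to state in a few lines of code. A minimal Python sketch, assuming each syllable is simply marked High (‘H’) or Low (‘L’); the function name and the tone-string notation are hypothetical, and no actual PIE forms or tone assignments are encoded here:

```python
from typing import Optional, Sequence, Tuple


def stressed_syllable(phrase: Sequence[str]) -> Optional[Tuple[int, int]]:
    """Locate the 'stress' under the rule quoted from Kortlandt: the last
    High syllable of a word form or of a sequence of connected word forms.

    `phrase` is a list of word forms, each written as a string of tone
    marks, one per syllable: 'H' (High) or 'L' (Low). Returns
    (word_index, syllable_index) of the stressed syllable, or None if
    the sequence contains no High syllable at all.
    """
    # Scan word forms from the end, since the rule picks the LAST High.
    for w in range(len(phrase) - 1, -1, -1):
        s = phrase[w].upper().rfind("H")
        if s != -1:
            return (w, s)
    return None


if __name__ == "__main__":
    print(stressed_syllable(["LHHL"]))      # one word form
    print(stressed_syllable(["LH", "LL"]))  # connected word forms
```

Treating a sequence of connected word forms as one unit means a High tone in an earlier word can carry the stress when the following words have none.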