Something is very wrong with models based on the so-called ‘Yamnaya admixture’ – and archaeologists are catching up (II)

A new article by Leo S. Klejn tries to improve the Northern Mesolithic Proto-Indo-European homeland model of the Russian school of thought: The Steppe hypothesis of Indo-European origins remains to be proven, Acta Archaeologica, 88:1, 193–204.


Recent genetic studies have claimed to reveal a massive migration of the bearers of the Yamnaya culture (Pit-grave culture) to the Central and Northern Europe. This migration has supposedly lead to the formation of the Corded Ware cultures and thereby to the dispersal of Indo-European languages in Europe. The article is a summary presentation of available archaeological, linguistic, genetic and cultural data that demonstrates many discrepancies in the suggested scenario for the transformations caused by the Yamnaya “invasion” some 5000 years ago.


Both teams [Reich/Anthony, and Willerslev/Kristiansen] interpreted this resemblance in the same way: as evidence of mass migration of the Yamnaya culture from the steppes into the Central and Northern Europe, resulting in the formation of the Corded Ware cultures, and these are universally recognised as Indo-European. Since earlier in this part of Europe existed a different pool of genomes, geneticists presumed that the Yamnaya migration alone had brought the Indo-European languages into Europe. It is difficult to say to what extent the pre-convictions of the involved archaeologists influenced these conclusions, or whether the results of the genetic studies attracted archaeologists with such beliefs.

Mismatch of cultural manifestations

First, we might question the idea of the Yamnaya culture as a unity rather than a loose conglomerate of cultures. Merpert (1974) divided it into nine local groups but did not recognise them as separate cultures. However, in 1975 I suggested that Nerushay (Budzhak) monuments should be recognised as a distinct culture (Klejn 1975), although still as a part of the same broader steppe community.

This was accepted by other specialists (Ivanova 2012; 2013; 2014). Generally, in the western branch of this community, a mixture of the eastern rites of interment with local, Balkan ceramics can be observed. It should be noted that hitherto all genetic samples were taken from eastern material (in the vicinity of Samara in the Volga basin and Kalmykia), while the central thesis concerns the intrusion of the western branch of this community (Budzhak culture) into Europe.

The spread of cultural-historical communities of the Yamnaya culture and the location of the Budzhak culture. GAC – Globular Amphora culture; CWC – Corded Ware culture. After Ivanova 2013.

Simultaneity of cultures

The Yamnaya culture (Chernykh & Orlovskaya 2004a; Heyd 2011; Frȋnculeasa et al. 2015) appears not to be the predecessor of the Corded Ware cultures but is contemporary with them. The Corded Ware cultures appeared also around the turn between the fourth and third millennium BC (Stöckli 2001; Furholt 2003). Their derivation from the Yamnaya seems, therefore, to be less probable. This is evidenced by the fact that the corded beakers or amphorae found in the Budzhak culture are not the prototypes of the corded beakers or amphorae found in more northern territories, but seem instead to be an outcome of contemporaneous contacts (Ivanova 2014; Klejn 2017c).

Discrepancies across the haplogroups

Even more remarkable is the variation in the distribution of types of Y chromosome. In the Yamnaya population, R1b is not just a single occurrence (there are about seven known occurrences) while in the Corded Ware population a different clade of R1b is found and R1a is predominant (several instances). Thus the postulate of unbroken succession finds no support!

Distribution of artefacts and customs of the Yamnaya culture in the area of the Corded Ware cultures. After Bátora 2006.

Paradoxical gradient

In the tables presented in the article by Reichs’ team (Haak et al. 2015) the genetic pool connecting the Yamnaya culture with the Corded Ware people is shown to be more intense in Northern Europe (Norway and Sweden) and decreases gradually from the North to the South (Fig. 6). It is weakest around the Danube, in Hungary, i. e. areas neighbouring the western branch of the Yamnaya culture! This is the reverse image to what the proposed hypothesis by the geneticists would lead us to expect. It is true that this gradient is traced back from the contemporary materials, but it was already present during the Bronze Age (Klejn 2015a).

The author also uses questionable interpretations from selected articles to advance his (as of today) untenable positions regarding a Mesolithic origin of the reconstructible Proto-Indo-European language.

1. Glottochronology, for a PIE origin:

If based on the data of glottochronology (taking into account all disputes) the period of initial dispersal is to be dated to the 7th-5th millennium BC.

2. Doubts on the origin of R1b-L51 subclades expressed in Genetic differentiation between upland and lowland populations shapes the Y-chromosomal landscape of West Asia, by Balanovsky et al. (2017), Human Genetics 136, 4. 437-450:

The currently available dataset does not contradict the hypothesis that R-GG400 marks a link between the East European steppe dwellers and West Asians, though the route and even direction of this migration is disputable. It does, however, demonstrate that present-day West European R1b chromosomes do not originate from the Yamnaya populations analyzed in (Haak et al. 2015; Mathieson et al. 2015) and raises the question of their origin. A Bronze Age origin is more likely than a Neolithic one (Balaresque et al. 2010), but further ancient DNA studies may be necessary to identify this source.

Just yesterday I read the post The retraction paradox: Once you retract, you implicitly have to defend all the many things you haven’t yet retracted, by Andrew Gelman. While – in my opinion – the post does not live up to its title, it poses an interesting question, as to how ad logicam (fallacy fallacy) is often used today in research: One author proposes something that is later demonstrated to be wrong, so everything they wrote or write can be said ipso facto to be wrong…especially if they accept that it was wrong.

This is usual with amateur geneticists (those who don’t publish, and are therefore not subjected to criticism): if anyone is wrong (whether in Archaeology or Genetics), then they are wrong in everything else. It seems to me that Klejn’s theses against recent genetic results rest on the same assumption: The Yamna -> Corded Ware migration model is wrong, ergo the Yamna homeland model is wrong.

I guess this same fallacy is what a lot of angered geneticists (whether professional or amateurs) are going to use to dismiss Klejn’s criticism, trying to focus on what he clearly does not grasp – about genomic data of Yamna peoples and their expansion – to disregard his doubts on genetic interpretations entirely.

I have warned many times about how simplistic interpretations of genetic data would cause a general mistrust in the field, and that archaeologists won’t take the discipline seriously, no matter how many articles get published in famous research tabloids like Nature or Science…

Those who dismiss this warning lightly seem to forget the fate of other recent “scientific breakthroughs” which were initially so promising that Humanities appeared to matter no more, like glottochronology for Linguistics and, to some extent, radiocarbon analysis for Archaeology.
EDIT: see here a recent example of discussion on discrepancies between archaeological and 14C-based chronologies, whereby ‘scientific data’ obviously needs archaeological context for a meaningful interpretation.

Featured image: The direction of the supposed migration of the bearers of the Yamnaya culture into the area of the Corded Ware cultures. After Haak et al. 2015.

NOTE: I obviously don’t agree with Klejn’s main model: he criticises the Proto-Indo-European steppe homeland, and more specifically the expansion of Yamna peoples with R1b-L23 subclades, which I support. But, probably because of his “pre-convictions” (as he puts it when describing proponents of the steppe hypotheses) about the Proto-Indo-European homeland in Northern Europe during the Mesolithic, he was one of the first renowned archaeologists to criticise the obvious inconsistencies in the genetic model of migrations based exclusively on the “Yamnaya ancestral component” concept, and to provoke the necessary reaction from (until then) overconfident geneticists, and he deserves credit for that.

In my opinion, the Russian school’s “Northern European Mesolithic” homeland model – as I have said before – could be based on the appearance of EHG ancestry, or maybe on the expansion of haplogroup R1b with post-Swiderian cultures, but the timeframe proposed is too early for any reconstructible parent proto-language, even for Indo-Uralic.


Prehistoric loan relations: Foreign elements in the Proto-Indo-European vocabulary


An interesting ongoing web project, Prehistoric loan relations, on potential loans of Proto-Indo-European words, from Uralic-Yukaghir, Caucasian, and Middle Eastern influence.

Based on a Ph.D. thesis by Bjørn (2017) Foreign elements in the Proto-Indo-European vocabulary (PDF).

From the website (emphasis mine):

This page allows historical linguists to compare and scrutinize proposed prehistoric lexical borrowings from the perspective of Proto-Indo-European. The first entries are all (135 in total) extracted from my master’s thesis “Foreign elements in the Proto-Indo-European vocabulary” (Bjørn 2017). Comments are encouraged at the bottom of each entry. New entries will be added, also on request.

Take this not as the conclusion, but an invitation to join the conversation.

So, we welcome the invitation, and hope that this new project thrives.

Also, I loved his fantasy-like map of the central Eurasian region (featured image on this post).


Stone Age plague accompanying migrants from the steppe, probably Yamna, Balkan EBA, and Bell Beaker, not Corded Ware


In the latest revisions of the Indo-European demic diffusion model, using the results from the article Early Divergent Strains of Yersinia pestis in Eurasia 5,000 Years Ago, by Rasmussen et al., Cell (2015), I stated (more or less indirectly) that the high east-west mobility of the Corded Ware migrants across related cultures might have been responsible for the spread of this disease, which seems to have originally expanded from Central Eurasia.

New results appeared recently in the article The Stone Age Plague and Its Persistence in Eurasia, by Valtueña et al., Current Biology (2017), which may contradict that interpretation.

Diachronic map of Copper Age migrations ca. 3100-2600 BC – Corded Ware


Yersinia pestis, the etiologic agent of plague, is a bacterium associated with wild rodents and their fleas. Historically it was responsible for three pandemics: the Plague of Justinian in the 6th century AD, which persisted until the 8th century [ 1 ]; the renowned Black Death of the 14th century [ 2, 3 ], with recurrent outbreaks until the 18th century [ 4 ]; and the most recent 19th century pandemic, in which Y. pestis spread worldwide [ 5 ] and became endemic in several regions [ 6 ]. The discovery of molecular signatures of Y. pestis in prehistoric Eurasian individuals and two genomes from Southern Siberia suggest that Y. pestis caused some form of disease in humans prior to the first historically documented pandemic [ 7 ]. Here, we present six new European Y. pestis genomes spanning the Late Neolithic to the Bronze Age (LNBA; 4,800 to 3,700 calibrated years before present). This time period is characterized by major transformative cultural and social changes that led to cross-European networks of contact and exchange [ 8, 9 ]. We show that all known LNBA strains form a single putatively extinct clade in the Y. pestis phylogeny. Interpreting our data within the context of recent ancient human genomic evidence that suggests an increase in human mobility during the LNBA, we propose a possible scenario for the early spread of Y. pestis: the pathogen may have entered Europe from Central Eurasia following an expansion of people from the steppe, persisted within Europe until the mid-Bronze Age, and moved back toward Central Eurasia in parallel with human populations.

Maximum-Likelihood Tree and Percent Coverage Plot of Virulence Factors of Yersinia pestis. (A) Maximum-likelihood tree of all Yersinia pestis genomes, including 1,265 SNP positions with complete deletion. Nodes with support ≥95% are marked with an asterisk. The colors represent different branches in the Y. pestis phylogeny: branch 0 (black), branch 1 (red), branch 2 (green), branch 3 (blue), branch 4 (orange), and LNBA Y. pestis branch (purple). Y. pseudotuberculosis-specific SNPs were excluded from the tree for clarity of representation. In the light-colored boxes, discussed losses and gains of genomic regions and genes are indicated.

It seems that, notwithstanding the simplistic (white) arrows of steppe ancestry expansion shown in their map (see below), the actual expansion of Yersinia pestis might have in fact accompanied Yamna migrants from the Pontic-Caspian steppe into Early Bronze Age cultures from the Balkans, including Bell Beaker migrants, as the phylogenetic analysis and dates suggest – and as the potential arrows of the plague expansion in the map (in green) show.

Late Corded Ware migrants would have only later expanded the disease to eastern Europe, as shown in the second map, most likely because of their close contact with Bell Beaker migrants (but remaining culturally distinct from them), and indeed because of the mobility across related Corded Ware cultures up to the Urals.

The cultural-historical community in the Late Neolithic between steppe peoples that would evolve into Uralic-speaking Sredni Stog/Corded Ware migrants in the western steppe, and Late Indo-European-speaking Yamna/SE EBA/Bell Beaker migrants originally from the eastern steppe, would allow for the spread of the disease first among steppe groups, and then from both distinct late groups into their respective expanded regions.

The phylogenetic tree of Y. pestis available right now (see above), however, seems to suggest a stronger initial link to Yamna migrants, i.e. an origin in the North Caspian steppe, and an expansion with Yamna into the north Pontic area, into the Caucasus, and with the Afanasevo culture, spreading later with Balkan EBA cultures and the expansion of Bell Beaker peoples.

Instead of the warring nature, close ties, and mobility of Corded Ware peoples (reasons I used to justify the rapid spread of the disease among CWC groups), I guess it was rather the higher population density of SE Europe compared to the regions north of the loess belt, as well as the greater admixture of Yamna migrants with native SE European populations, that might have helped expand the disease.

Map of Proposed Yersinia pestis Circulation throughout Eurasia (A) Entrance of Y. pestis into Europe from Central Eurasia with the expansion of Yamnaya pastoralists around 4,800 years ago. (B) Circulation of Y. pestis to Southern Siberia from Europe. Only complete genomes are shown.

Nevertheless, lacking more data, it is unclear if the disease expanded with both steppe groups.


The renewed ‘Kurgan model’ of Kristian Kristiansen and the Danish school: “The Indo-European Corded Ware Theory”

Allentoft Corded Ware

A popular science article on Indo-European migrations has appeared at Science News, entitled How Asian nomadic herders built new Bronze Age cultures, signed by Bruce Bower. While the article is well-balanced and introduces new readers to the current status quo of the controversy on Indo-European migrations – including the opposing theories led by Kristiansen/Anthony vs. Heyd – , it reverberates yet again the conclusions of the 2015 Nature articles on the subject, especially with its featured image.

I have argued many times why the recent ‘Yamnaya -> Corded Ware -> Bell Beaker’ migration model is wrong, mainly within my essay Indo-European demic diffusion model, but also in articles of this blog, most recently in the post Correlation does not mean causation: the damage of the ‘Yamnaya ancestral component’, and the ‘Future American’ hypothesis. It is known that Nature is a bit of a ‘tabloid’ in the publishing industry, and these 2015 articles offered simplistic conclusions based on a wrong assessment of archaeological and linguistic data, in search of groundbreaking conclusions.

An excerpt from Bower’s article:

Corded Ware culture emerged as a hybrid way of life that included crop cultivation, breeding of farm animals and some hunting and gathering, Kristiansen argues. Communal living structures and group graves of earlier European farmers were replaced by smaller structures suitable for families and single graves covered by earthen mounds. Yamnaya families had lived out of their wagons even before trekking to Europe. A shared emphasis on family life and burying the dead individually indicates that members of the Yamnaya and Corded Ware cultures kept possessions among close relatives, in Kristiansen’s view.

“The Yamnaya and the Corded Ware culture were unified by a new idea of transmitting property between related individuals and families,” Kristiansen says.

Yamnaya migrants must have spoken a fledgling version of Indo-European languages that later spread across Europe and parts of Asia, Kristiansen’s group contends. Anthony, a longtime Kristiansen collaborator, agrees. Reconstructed vocabularies for people of the Corded Ware culture include words related to wagons, wheels and horse breeding that could have come only from the Yamnaya, Anthony says.

I have already talked about Kristiansen’s continuation of Gimbutas’ outdated ideas: we are seeing a renewed effort by some Scandinavian (mainly Danish) scholars to boost (and somehow capitalise on) the revitalised concept of the “Kurgan people”, although now the fundamental issue has been more clearly shifted to the language spoken by Corded Ware migrants.

As far as I can tell, this renewed interest began two years ago, with the simultaneous publication of genetic studies by Haak et al. (2015), and Allentoft et al. (2015), and the misuse of the cursed concept of ‘Yamnaya ancestry‘ to derive far-fetched conclusions.

On the other hand, genetic research is not solely responsible for this: David Anthony – who was apparently consulted by Haak et al. (2015) for their paper, where he appears as co-author – has kept a low (or lower) profile, and only recently has he merely suggested potential links between Corded Ware and Bell Beaker cultures in Lesser Poland, that might explain what (some geneticists have told him) appeared as a potential Yamna -> Corded Ware -> Bell Beaker migration in the first ancient samples studied.

Anthony’s migration model remains otherwise strongly based on Archaeology, offering a careful interpretation of potential contacts and migrations in the Pontic-Caspian steppe, and only marginally offers some views on Linguistics (based on Ringe’s controversial ‘glottochronological model’ of 2006), to the extent that he is compelled to explain the potential adoption of Indo-European by Corded Ware culture (CWC) peoples as multiple cultural diffusion events, since no migration is observed from the steppe to CWC territories.

I think he is thus showing a great deal of restraint, not jumping on the bandwagon of this recent trend based on scarce genetic finds – and therefore losing also the opportunity to publish articles in journals of high impact factor….

This newly created Danish school, on the other hand, seems to be swimming with the tide. Kristiansen, known for his controversial ‘universal’ interpretations of European Prehistory – which are nevertheless more readable and interesting than most specialised literature on Archaeology, at least for us non-archaeologists – , has apparently seized the opportunity to give a strong impulse to his theories.

Not that there is anything wrong with that, of course, but sometimes it might seem that a lot of papers (or even researchers) support something, when in fact there are only a few of them, working closely together.

I see therefore three main “branches” of this support (two of them, Genetics and Linguistics, only recently giving some limited air to this dying hypothesis), with a closely related group of people involved in this model, and they are lending continuous support to each other, by repeating the same theory – and repeating the same misleading map images (like the one shown in the article) – , so that the circular reasoning they represent is concealed behind seemingly independent works.

The theory and its development

The main theory is officially rooted then in Kristiansen’s hypothesis, whose first article on the subject seems to be Prehistoric Migrations – the Case of the Single Grave and Corded Ware Cultures (1989), supporting the Kurgan model applied to the Corded Ware migrations. It was probably a kind of a breakthrough in Archaeology, bringing migration to mainstream Archaeology again (followed closely by Anthony), and he deserves merit for this.

After this proposal, there are mostly just his publications supporting this model. Nevertheless, Kristiansen’s model, I gather, did not involve the sudden Yamnaya -> Corded Ware migrations discussed in recent genetic articles, but long-lasting contacts between peoples and cultures from the North Pontic steppe, Trypillian, and Globular Amphora, that formed a new mixed one, the Corded Ware people and culture. Also, Gimbutas’ original model of migration (1963) describes waves of Kurgan migrants into Vučedol and Bell Beaker, which have apparently been forgotten in recent models*.
* The most recent model by Anthony describes such migrations into Early Bronze Age Balkan cultures – as do most archaeological publications today – , but he is unable to recognize migration waves from Yamna into the Corded Ware culture, and because of that describes mere potential routes (or modes) of cultural diffusion including language change.

Proposal for the origin and spread of the Corded Ware/ Battle Axe cultural complex: 1) Distribution of CWC groups; 2) Yamna culture; 3) presumed area of origin; 4) presumed main directions of the primary distribution. Also numbered are other individual CW cultures. From Kristiansen (1989).

Then – skipping the years of simplistic phylogeography based on modern haplogroup distribution – we have to jump directly to Allentoft (of the Natural History Museum of Denmark) and colleagues and their article on population genomics of Bronze Age Eurasia (2015), with which Kristiansen collaborated, and which offers the first direct association of Corded Ware as the vector of expansion of Indo-European peoples and languages from Yamna. An interesting take on the Yamna -> Corded Ware -> Bell Beaker question is represented by their very ‘kurgan-like’ Corded Ware-centric map:

Detail of Fig. 1 from Allentoft et al. (2015): “Distribution of Early Bronze Age cultures Yamnaya, Corded Ware, and Afanasievo with arrows showing the Yamnaya expansions”.

And suddenly, we are now seeing more works that support the central thesis of the group – that Corded Ware must have brought Indo-European languages to Europe:

Recent publications by K-G Sjögren – from the same department as Kristiansen, at the University of Gothenburg – seem to imply that there was a direct connection Corded Ware -> Bell Beaker in central Europe.

Guus Kroonen’s recent hypothesis of a potential (Proto-Semitic-like) Germanic substrate (2012) has recently been added to the cause, supporting, with Iversen (also from the University of Copenhagen), a link with the Battle Axe/Funnelbeaker culture interaction. However, in the archaeological-linguistic model it seems that Germanic must predominate over the rest of Indo-European languages in terms of age, representing the first wave of Indo-Europeanization in Europe (wat?!), whereas Balto-Slavic is much younger and unrelated…? But didn’t they share the same substrate (as did partially Greek) in Kroonen (2012)? I think Kroonen’s hypothesis might be better explained through an earlier contact in the North Pontic steppe.

Modified from Kristiansen et al. (2017). “Schematic representation of how different Indo-European branches have absorbed words (circles) from a lost Neolithic language or language group (dark fill) in the reconstructed European linguistic setting of the third millennium BC, possibly involving one or more hunter gatherer languages (light fill) (after Kroonen & Iversen 2017)”.


This recently created Danish pressure group is not something bad per se. I don’t agree with their hypothesis (or rather evolving hypotheses, since they change with new genetic results and linguistic proposals, as is shown in Kristiansen et al. 2017), but I understand that the group continues a recent tradition:

Publications are always great to advance in knowledge, and if they bring some deal of publicity, and more publications (with the always craved impact factor), and maybe more investment in the departments (with more local jobs and prestige)… why not?

However, this model of workgroup research system is reminiscent of the Anatolian homeland group loosely created around Renfrew; the Palaeolithic Continuity workgroup around Cavalli-Sforza; or (more recently) the Celtic from the West group around Cunliffe and Koch. The difference between Kristiansen’s workgroup and supporters of all those other models, in my opinion, is that (at least for the moment) their collaboration is not obvious to many.

Therefore, to be fair to any outsider, I think this group should clearly state their end model: I propose the general term “Indo-European Corded Ware Theory” (IECWT) workgroup, because ‘Danish’ is too narrow, and ‘Scandinavian’ too broad to represent the whole group. But any name will do.

My opinion on the IECWT

As you can see, no single strong proof exists in support of the IECWT:

  • Not for a solid model of PIE expansion from Corded Ware, not even within the IECWT group, where there is no support (to date) for a Balto-Slavic expansion associated with the Corded Ware culture… Or any other dialect, for that matter;
  • Not for a Corded Ware -> Bell Beaker connection – that is, before the publication of Allentoft et al. (2015) and articles reverberating their conclusions;
  • Not for a unified Pre-Germanic community before the Dagger Period, and still less linked with the expansion of the Corded Ware culture from the steppe – that connection is found only in Anthony (2007), where he links it with a cultural diffusion into Usatovo, which seems too late for a linguistic expansion with Corded Ware peoples, with the current genetic data.

The wrong interpretation of scarce initial ancient samples has been another feeble stone put over the ruins of Gimbutas’ theory. While her simple theory of Kurgan invaders was certainly a breakthrough in her time – when speaking about migrating Indo-European peoples was taboo – , it has since been superseded by more detailed archaeological and linguistic accounts of what happened in east and central Europe during the Chalcolithic and Bronze Age.

However, a lot of people are willing to consume post-truth genetic-based citebait like crazy, in a time when Twitter, Facebook, blogs, etc. seem to shape the general knowledge, while dozens of new, carefully prepared papers on Archaeology and Linguistics related to Indo-European peoples get published weekly and don’t attract any attention, just because they do not support these simplistic claims, or precisely because they fully reject them.

An older connection of Germanic to Scandinavia – and thus an ancestral Indo-European cultural diffusion from north to south – seems to better fit the traditional idea of an autochthonous Germanic homeland in Scandinavia, instead of a bunch of southern Bell Beaker invaders bringing the language that could only later develop as a common Nordic language during the Bronze Age, in a genetically-diverse community…

One is left to wonder whether the support of Corded Ware + haplogroup R1a representing Pre-Germanic is also in line with the most natural human Kossinnian trends, whereby the older your paternal line and your ancestral language are connected to your historical territory, the better. The lack of researchers from Norway – where R1b subclades brought by Bell Beakers peak – in the workgroup is revealing.

Just as we are seeing strong popular pressure e.g. to support the Out of India Theory by Hindu nationalists, or some Slavic people seeking to recreate a ‘Northern IE group’ with a Germano-Balto-Slavic Corded Ware culture – and a renewed interest in skin, hair and eye colour by amateur geneticists – , it is only natural to expect similar autochthonous-first trends in certain regions of the Germanic-speaking community.

NOTE: I feel a bit like an anti-IECWT hooligan here, and once again fulfilling Godwin’s Law. Judging by previous reactions in this blog to criticism of the Out of India Theory, and to criticism of R1a as the vector of expansion of Indo-European languages, this post is likely to cause some people to feel bad.

It is not intended to be against these researchers individually, though. All of them have certainly contributed in great ways to their fields, indeed more than I have to any field: Kristiansen is well-known for his careful, global interpretations of European prehistory (and has been supporting his model for quite a long time). I do like Kroonen’s ideas of a Pre-Germanic substratum. And people involved in the group do so probably because they collaborate closely with each other, and because of the huge pressure to publish in journals of high impact factor, so to mix their disparate research within a common model seems only natural.

But their collaboration is boosting certain wrong ideas, and is giving way to certain misconceptions in Linguistics, and also, sadly, reviving past ethnocentric views of language in Northern Europe – which will luckily be demonstrated, again, to be wrong. After all, publications (like ideas in general) are subject to criticism, as mine are. Researchers who publish know their work is subject to criticism, and not only before publication, but also – and probably more so – after it. That a paper can be incorrect, biased, or even completely absurd, does not mean the person who wrote it is a fool. That’s the difference between criticising ideas and insulting. If criticism offends you, you shouldn’t be publishing. Period.


Featured image: From Allentoft et al. (2015). See here for full caption.

Correlation does not mean causation: the damage of the ‘Yamnaya ancestral component’, and the ‘Future American’ hypothesis


Human ancestry can only help solve anthropological questions by using all anthropological disciplines involved. I have said that many times in this blog.

Correlation does not mean causation

Really, it does not.

You might think the tenet ‘correlation does not mean causation’ must be evident at this point in Statistics, and it must also be for all those using statistical methods in their research. But it is sadly not so. A lot of researchers just look for correlation, and derive conclusions – without even an initial sound hypothesis to be tested… You can judge for yourself, e.g. reading the many instances of this complaint in recent publications of Biomedical and Social Sciences, on the interesting blog Statistical Modeling, Causal Inference, and Social Science.

In anthropological questions regarding Indo-European studies there is an added handicap: not taking correlation to mean causation does also mean – to avoid at least the most obvious confounders – taking into account the multiple linguistic and archaeological data that are available right now, to explain the expansion of Indo-European languages.

You might also believe that international researchers in Human Evolutionary Biology – after all, this is essentially a biomedical discipline – are acquainted with statistical methods and their problems when applied to their field. And that scientific journals – and especially those with the highest impact factors, like Nature, Science, or PNAS – have professional, careful reviewers who would never accept papers that equate correlation with causation, especially when Social Sciences are involved (because this alone might make errors grow exponentially…). Sadly, this is obviously not so, either.
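To make the tenet concrete, here is a minimal toy sketch (mine, not taken from any of the papers discussed). It assumes a hidden common cause `z` that drives two measured variables `a` and `b`, which have no direct effect on each other. All names, sample sizes, and noise levels are hypothetical and chosen only for illustration. The raw correlation between `a` and `b` comes out strong, but it should mostly vanish once the linear effect of `z` is removed from both.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def residuals(ys, xs):
    """Residuals of ys after a simple least-squares fit on xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]


# Hypothetical toy data: z is the confounder, a and b depend only on z.
rng = random.Random(1)
n_samples = 5000
z = [rng.gauss(0, 1) for _ in range(n_samples)]
a = [zi + rng.gauss(0, 0.5) for zi in z]  # a never influences b
b = [zi + rng.gauss(0, 0.5) for zi in z]  # b never influences a

# Naive look at the data: a and b appear strongly linked.
raw_r = pearson(a, b)

# Controlling for the confounder: correlate what is left of a and b
# after removing the part explained by z (a partial correlation).
partial_r = pearson(residuals(a, z), residuals(b, z))

print("raw correlation a~b:", round(raw_r, 3))
print("correlation after controlling for z:", round(partial_r, 3))
```

In this toy setup, reading the raw correlation as "a causes b" would be exactly the kind of error discussed here; only knowledge of `z`, which comes from outside the two measured variables, reveals it.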

The ‘Yamnaya component’ concept and its damage

From Allentoft et al. (2015), emphasis is mine:

Both studies [Haak et al. (2015) and this one] found a genetic affinity between samples from a central European culture known as Corded Ware, which existed from around 2500 bc, and samples from the earlier Yamnaya steppe culture. This similarity between distant populations is best explained by a substantial westward expansion of the Yamnaya or their close relatives into central Europe (Fig. 1b). Such an expansion is consistent with the steppe hypothesis, which argues that Corded Ware cultures were a conduit for the dispersal of Indo-European languages into Europe.

More interesting than these vague words – and the short, almost invisible suggestion that Yamna may not be exactly the population behind Corded Ware peoples – are the maps that illustrated in Nature their risky hypothesis: they called it “steppe hypothesis“, like that (in general terms), as if everyone defending a steppe origin for Proto-Indo-European would support such a model, when they actually referred to the specific hypothesis of one of their authors (Kristiansen), one of the few archaeologists who keep Gimbutas’ concept of the ‘Kurgan peoples’ alive, based on the Corded Ware culture:

Allentoft et al. (2015): “They conclude that the Corded Ware culture of central Europe had ancestry from the Yamnaya. Allentoft et al. also show that the Afanasievo culture to the east is related to the Yamnaya, and that the Sintashta and Andronovo cultures had ancestry from the Corded Ware. Arrows indicate migrations — those from the Corded Ware reflect the evidence that people of this archaeological culture (or their relatives) were responsible for the spreading of Indo-European languages. All coloured boundaries are approximate.”

In many publications that followed, the trend has been to reproduce this graphical model, by asserting (or implying) that Bell Beaker peoples were the result of subsequent Corded Ware migrations, and indeed that Corded Ware peoples migrated from the Yamna culture, and were thus the vector of expansion for Indo-European languages in Europe.

All of this is being proven wrong, as I predicted: see Mathieson et al. (2017) and Olalde et al. (2017) for recently studied samples with ‘steppe component’, older than (and unrelated to) the Yamna culture. However, no retraction (or correction, whatever) has been published to date about the concept of the ‘Yamnaya ancestry expansion’, and its consequences.

We shall see then just a rather surreptitious shift in terminology from ‘Yamnaya’ to ‘steppe’ component, to adapt to the new data – i.e. some damage control while the ship of ‘Yamnaya ancestry’ capsizes – but little else. “Earlier ‘Yamnaya ancestry’, you say? Just, you know, let’s call it ‘steppe ancestry’ and shift the expansion of Indo-European languages to one or two thousand years earlier, and done!”

The damage of this post-truth genetics is already done: we will see the unending distribution on the Internet in general, and on social networks in particular, of these grandiose conclusions, of far-fetched Indo-European migration models that include the Corded Ware culture, of simplistic maps with apparently harmless ‘arrows of migration’ (like the above) representing fictional population movements suggesting nonexistent dialectal branches.

You might be one of those sceptics wary of so many boring statistical rules: “But it’s safe reasoning: Yamnaya samples have an ‘ancestral component’ that is found elevated in Corded Ware samples, and less so in Bell Beaker samples, and PCA showed a similar result…so the migration model Yamnaya -> Corded Ware -> Bell Beaker is a priori correct, right?”

The ‘Future American’ hypothesis

Let me illustrate this attractive “Correlation = Causation” argument, using it to solve the problem of Future American languages.

Suppose we live in a future post-apocalyptic world ca. 3500 AD, with no surviving historical records before 3000 AD. None. Just investigation of cultures and their relationship by Archaeology, proto-languages reconstructed and language families identified by Linguistics, etc.

We have thus Future Germanic and Future Romance as the only language families spoken in Future Western Europe and in the Future Americas, in a distribution similar to the present day*, and we have certain somehow related archaeologically-defined cultures on both sides of the Atlantic, like Briton, Iberian, Norman, or Lowlandish, although their distribution remains partly undefined in time and space.

* If you are really curious about this scenario, you can read about the potential evolution of a Future North-American language.

But what languages did the ancestors of Future Americans speak, and who spread them? That question remains far from being settled by our future researchers, in spite of the solidest linguistic and migration models (talking mainly about Briton and Iberian cultures): too many authorities out there questioning them, fighting to impose their own pet theories.

Suddenly, the newly developed field of Human Ancestry comes to save the day. So let’s say we have this map of ancient samples recovered (dated from, say, the 6th to the 18th century AD), and our study is centered on the newly described “Western European” component (a precise combination of, say, WHG+steppe), which peaks in early samples from the Low Lands – hence we call it, quite daringly, “Lowlandic component“.

Our group is keen to demonstrate that the ancient Lowlandic culture described in Archaeology (marked especially by the worldwide distribution of tulips among other traits) is the origin of Western European and American languages… Now, let’s reach conclusions about migrations in the Middle Ages!

‘Future American’ hypothesis. Migration routes in Western Europe and the Americas during the Middle Ages, based on the ‘Lowlandic component’.

PCA shows that South-West European samples cluster closely to some North-West European samples, and that some late South American samples available cluster at some distance from North American samples – nearer to a native component represented by two individuals with 0% Lowlandic ancestry and a different cluster in PCA. And some North-American samples cluster quite closely to North-West European samples.

Based on the decrease in ‘Lowlandic component’ in the different samples and on PCA, we conclude that Lowlandic peoples (“or their close relatives”) must have migrated at the same time to North America, South America (or potentially from North America to South America?) as well as western, central, and northern Europe. Both migration events must have happened roughly at the same time, in part because both distinct language families appear in a north-south distribution, and Proto-Lowlandic must be (according to Genetics) the ancestor of both Proto-Future-Germanic and Proto-Future-Romance.

That makes a lot of sense! A huge Lowlandic pressure for migration, you see. Push-pull mechanisms and stuff. A Lowlandic Empire probably (scattered remains are found everywhere)! And, judging by the presence of the ‘Lowlandic component’ in Future East Europe from the Elbe to the Vistula, maybe Lowlandic peoples spread Proto-Slavic, too! We can even date the common Lowlandic-Slavic proto-language this way! So many groundbreaking conclusions!

Future scholars supporting the Lowlandic homeland are on fire; they can’t get enough of publishing papers on the subject. “Two different Future American language families with cultural origins in Britain and Iberia, my ass! Because genetics.”

And don’t forget the future people of haplogroup R1b-U106 and high Lowlandic component: Wow, they are the heirs of those who expanded Future Germanic and Future Romance languages everywhere, aren’t they? How proud they must be. And who wouldn’t want to have these tall, blond, blue-eyed Lowlanders as their forefathers? Personalised genetic analysis is selling like crazy: “let’s know our Lowlandic percentage!”. Everyone is happy, colourful maps with lots of arrows and shit…

But – your future you might ask in awe, seeing that this doesn’t sound quite right, based on your basic archaeological and linguistic knowledge:

  • What about specific models of migration proposed to date? The solidest ones, not just any one that seems to fit?
  • What about the dialectal classification of languages? The mainstream ones, not those that are compatible with this interpretation?
  • What about archaeological cultures to which individual samples belonged?
  • What about the actual dates of each sample? And how does this date relate to the state of the culture to which it belongs?
  • What about the haplogroups, and the actual subclade of each haplogroup?
  • What about the territories, cultures, and dates not sampled, could they change this interpretation in light of known archaeological models?
  • And what about the actual origin of that ancestral component they so frivolously named? Did it really appear ex nihilo in the Low Lands, and expand from there?

“Who cares! This new data is sooo coool… And it proves what we wanted, what a coincidence! And it’s numbers, mate! Numbers don’t lie.”

No, numbers don’t lie. But people do.

Correlation is fun, isn’t it?



Schleicher’s Fable in Proto-Indo-European – pitch and stress accent


Also included in our monograph North-West Indo-European (first draft) is a tentative reconstruction of Schleicher’s fable in North-West Indo-European, and just for illustration of the reconstructed sounds (including pitch and stress accent), a recording has been included.

The recording is available as audio (see above) or video (see below) with captions and multiple subtitles. The captions in North-West Indo-European show acute accents over accented vowels, while stressed syllables are underlined:

I think such a recording was necessary for comparison with the most commonly reconstructed pronunciation, as taught usually in courses. And I am not referring to those professors still using only stress – instead of pitch – accent to pronounce PIE, but to those that, using pitch accent, do place stress over the same syllable.

A good example to illustrate my point is Andrew M. Byrd’s reading of his version of the fable for the journal Archaeology.

Apart from some controversial decisions regarding the Proto-Indo-Hittite reconstruction – see our explanation of our version, or e.g. Kortlandt’s reconstruction of the Fable (PDF) for more details – his recitation does not seem to contrast pitch and stress accent enough, to the extent that pitch and stress seem to be always on the same syllable. He specialises in Proto-Indo-European phonology, so maybe it is a deliberate choice.

Firstly, as an introduction – in case you don’t know anything about this question – a pitch accent is reconstructed for Proto-Indo-European, based on the reconstructed accent of Old Indian, Greek, Germanic, and Balto-Slavic – hence also valid for North-West Indo-European, even though Italo-Celtic lost it completely.

If you have listened to any tonal language*, you will have noticed that words also have stress accent, and not necessarily on the same syllable – but usually on the heaviest one. In fact, I don’t know of an accent pattern with pitch+stress on the same syllable (but for certain reconstructed intermediate labile stages of a language), and I guess it is so redundant that it would always lose one of them.

*pitch-accent systems are also tonal systems, after all, since they involve at least two tones: an acute or rising one, and usually a falling one after it.

You can listen to a sample of the Homeric recitation by Stephen Daitz, with restored Ancient Greek pronunciation, where he contrasts pitch and stress beautifully:

Note: you can buy his readings in restored pronunciation online from Bolchazy-Carducci Publishers. I can’t recommend them highly enough.

You can listen to other samples of Ancient Greek with restored pronunciation by Stefan Hagel (whose Homeric singing is superb), or many others.

To see what I mean with the lack of contrast in Byrd’s pronunciation, just compare the restored pronunciation with these samples, of restored Koine Greek, from the Biblical Language Center. I think you can hear pitch accent pronounced, but always stressing the same syllable. After a while, it gets quite monotone (no pun intended); for me, at least*.

*It seems to be, nevertheless, one of the top rated pronunciations of Koine Greek out there.

Pitch accent in my pronunciation is not as noticeable as that of Stephen Daitz, and still less than that of Stefan Hagel. But it is not intended to be.

I wanted to combine tone and stress as naturally as possible, as it is found in modern languages, like Chinese, or like South Slavic, Baltic, or Scandinavian languages. I believe PIE phonology cannot be too different from modern natural examples.

Many Modern Greek scholars complain about the artificiality of the restored pronunciation. I’ve heard particularly harsh criticism against Stefan Hagel’s pronunciation: many scholars do not recognise the ancestral language in the restored pronunciation.

While such critics may seem like snob reactionaries, and I really appreciate an exaggerated poetic style for epic poems (I have spent hundreds, probably thousands, of hours listening to Stephen Daitz), I don’t think this is the way Ancient Greek was usually spoken. Listening to Hagel’s pronunciation in the Ancient Greek Assimil, there is a huge contrast between readers who don’t use the restored pronunciation in the recordings (offering thus a decaffeinated Ancient Greek), and Hagel’s reading (or, almost, singing).

In my interpretation of the fable I have tried to follow these ideas, and maybe in the end the pitch accent is not as acute as it should be (a fifth higher). On the other hand, it seemed more natural to me this way.

Also, in the final version of my reading, there are many words where it is not clear – not even to me – if there is more than one syllable with pitch or stress accent. This is especially so after my first change of voice to make a more acute ‘sheep voice’, and then worsens with my graver ‘horse voice’. I really thought recording this was going to be easier!

If you have any comments or suggestions on the pronunciation, they are all welcome.

UPDATE (November 2, 2017): Frederik Kortlandt comments on our paper – “When comparing PIE with other tonal languages, the best candidate is Japanese, which means that the “stress” falls on the last High syllable of a word form or sequence of connected word forms.”