Correlation does not mean causation: the damage of the ‘Yamnaya ancestral component’, and the ‘Future American’ hypothesis

Human ancestry can only help solve anthropological questions if all the anthropological disciplines involved are taken into account. I have said so many times on this blog.

Correlation does not mean causation

Really, it does not.

You might think that the tenet ‘correlation does not mean causation’ must be self-evident by now in Statistics, and that it must also be so for everyone using statistical methods in their research. Sadly, it is not. Many researchers simply look for correlations and draw conclusions from them, without even a sound initial hypothesis to test… You can judge for yourself, e.g. by reading the many instances of this complaint about recent publications in the Biomedical and Social Sciences on the interesting blog Statistical Modeling, Causal Inference, and Social Science.

In anthropological questions regarding Indo-European studies there is an added handicap: refusing to equate correlation with causation also means, if only to avoid the most obvious confounders, taking into account the many linguistic and archaeological data currently available to explain the expansion of Indo-European languages.

You might also believe that international researchers in Human Evolutionary Biology (after all, essentially a biomedical discipline) are familiar with statistical methods and the problems they pose when applied to their field; and that scientific journals, especially those with the highest impact factors like Nature, Science, or PNAS, have professional, careful reviewers who would never accept papers that equate correlation with causation, particularly when the Social Sciences are involved (since this alone might make errors grow exponentially…). Sadly, this is obviously not so, either.

https://imgs.xkcd.com/comics/correlation.png

The ‘Yamnaya component’ concept and its damage

From Allentoft et al. (2015), emphasis is mine:

Both studies [Haak et al. (2015) and this one] found a genetic affinity between samples from a central European culture known as Corded Ware, which existed from around 2500 bc, and samples from the earlier Yamnaya steppe culture. This similarity between distant populations is best explained by a substantial westward expansion of the Yamnaya or their close relatives into central Europe (Fig. 1b). Such an expansion is consistent with the steppe hypothesis, which argues that Corded Ware cultures were a conduit for the dispersal of Indo-European languages into Europe.

More interesting than these vague words (and the brief, almost invisible suggestion that Yamna may not be exactly the population behind the Corded Ware peoples) are the maps that illustrated their risky hypothesis in Nature. They called it the “steppe hypothesis”, just like that, in general terms, as if everyone defending a steppe origin for Proto-Indo-European supported such a model. In fact, they were referring to the specific hypothesis of one of their authors (Kristiansen), one of the few archaeologists who keep Gimbutas’ concept of the ‘Kurgan peoples’ alive, based on the Corded Ware culture:

Allentoft et al. (2015): “They conclude that the Corded Ware culture of central Europe had ancestry from the Yamnaya. Allentoft et al. also show that the Afanasievo culture to the east is related to the Yamnaya, and that the Sintashta and Andronovo cultures had ancestry from the Corded Ware. Arrows indicate migrations — those from the Corded Ware reflect the evidence that people of this archaeological culture (or their relatives) were responsible for the spreading of Indo-European languages. All coloured boundaries are approximate.”

Many subsequent publications have tended to reproduce this graphical model, asserting (or implying) that Bell Beaker peoples resulted from later Corded Ware migrations, and indeed that Corded Ware peoples migrated from the Yamna culture and were thus the vector for the expansion of Indo-European languages in Europe.

All of this is being proven wrong, as I predicted: see Mathieson et al. (2017) and Olalde et al. (2017) for recently studied samples with a ‘steppe component’ that are older than (and unrelated to) the Yamna culture. However, no retraction (or correction, or whatever) has been published to date regarding the concept of the ‘Yamnaya ancestry expansion’ and its consequences.

What we shall see, then, is merely a rather surreptitious shift in terminology from ‘Yamnaya’ to ‘steppe’ component to accommodate the new data (i.e. some damage control while the ship of ‘Yamnaya ancestry’ capsizes), but little else. “Earlier ‘Yamnaya ancestry’, you say? Just, you know, let’s call it ‘steppe ancestry’, move the expansion of Indo-European languages one or two thousand years earlier, and we’re done!”

The damage of this post-truth genetics is already done: we will see these grandiose conclusions endlessly circulated on the Internet in general, and on social networks in particular, along with far-fetched Indo-European migration models that include the Corded Ware culture, and simplistic maps with apparently harmless ‘arrows of migration’ (like the one above) representing fictional population movements that suggest nonexistent dialectal branches.

You might be one of those sceptics wary of so many boring statistical rules: “But it’s sound reasoning: Yamnaya samples have an ‘ancestral component’ that is found at elevated levels in Corded Ware samples, and less so in Bell Beaker samples, and PCA shows a similar result… so the migration model Yamnaya -> Corded Ware -> Bell Beaker is a priori correct, right?”
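For readers who prefer numbers, here is a toy sketch of why that reasoning does not hold on its own. It is a hypothetical illustration with invented proportions, not real data and not a method used in any of the papers cited: a chain migration (source -> A -> B) and independent admixture of A and B from an unsampled relative of the source produce exactly the same ordering, and even the same values, of the ancestral component. The function names and mixing proportions are assumptions made up for this example.

```python
# Hypothetical toy model: all proportions are invented for illustration.
# Each value is the fraction of a group's ancestry from a single "component".

def chain_migration(source=1.0, mix_a=0.75, mix_b=0.5):
    """Source -> A -> B: A takes mix_a of its ancestry from the source,
    then B takes mix_b of its ancestry from A; the rest comes from
    local groups assumed to carry 0% of the component."""
    a = mix_a * source
    b = mix_b * a
    return a, b


def common_relative(relative=1.0, mix_a=0.75, mix_b=0.375):
    """An unsampled relative of the source admixes independently into
    A and into B; B does not descend from A, and neither descends
    from the sampled source population itself."""
    a = mix_a * relative
    b = mix_b * relative
    return a, b


if __name__ == "__main__":
    chain = chain_migration()
    common = common_relative()
    print("chain migration:", chain)   # (0.75, 0.375)
    print("common relative:", common)  # (0.75, 0.375)
    # Same observed proportions, different causal histories:
    print("indistinguishable:", chain == common)
```

The specific numbers do not matter: a gradient in one ancestral component cannot, by itself, tell the two histories apart. Dates, archaeological context, and haplogroups are needed for that.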

The ‘Future American’ hypothesis

Let me illustrate this attractive “Correlation = Causation” argument, using it to solve the problem of Future American languages.

Suppose we live in a future post-apocalyptic world ca. 3500 AD, with no surviving historical records from before 3000 AD. None. All we have is the study of cultures and their relationships by Archaeology, the proto-languages reconstructed and language families identified by Linguistics, etc.

We thus have Future Germanic and Future Romance as the only language families spoken in Future Western Europe and in the Future Americas, with a distribution similar to today’s*, and we have certain somewhat related, archaeologically defined cultures on both sides of the Atlantic, like Briton, Iberian, Norman, or Lowlandic, although their distribution in time and space remains partly undefined.

* If you are really curious about this scenario, you can read about the potential evolution of a Future North-American language.

But what languages did the ancestors of Future Americans speak, and who spread them? That question remains far from settled among our future researchers, in spite of the most solid linguistic and migration models (which point mainly to the Briton and Iberian cultures): there are too many authorities out there questioning them, each fighting to impose their own pet theory.

Suddenly, the newly developed field of Human Ancestry comes to the rescue. Let’s say we have this map of recovered ancient samples (dated from, say, the 6th to the 18th century AD), and our study is centered on a newly described “Western European” component (a precise combination of, say, WHG+steppe), which peaks in early samples from the Low Lands – hence we call it, quite daringly, the “Lowlandic component”.

Our group is keen to demonstrate that the ancient Lowlandic culture described in Archaeology (marked especially, among other traits, by the worldwide distribution of tulips) is the origin of Western European and American languages… Now, let’s draw conclusions about migrations in the Middle Ages!

‘Future American’ hypothesis. Migration routes in Western Europe and the Americas during the Middle Ages, based on the ‘Lowlandic component’.

PCA shows that South-West European samples cluster close to some North-West European samples, and that some of the late South American samples available cluster at some distance from North American samples, nearer to a native component represented by two individuals with 0% Lowlandic ancestry who form a different cluster in the PCA. Meanwhile, some North American samples cluster quite close to North-West European samples.

Based on the decreasing proportion of the ‘Lowlandic component’ across the different samples, and on the PCA, we conclude that Lowlandic peoples (“or their close relatives”) must have migrated to North America and South America (or perhaps from North America to South America?), as well as to western, central, and northern Europe. Both migration events must have happened at roughly the same time, partly because both distinct language families show a north–south distribution, and partly because Proto-Lowlandic must be (according to Genetics) the ancestor of both Proto-Future-Germanic and Proto-Future-Romance.

That makes a lot of sense! A huge Lowlandic pressure for migration, you see. Push-pull mechanisms and stuff. A Lowlandic Empire probably (scattered remains are found everywhere)! And, judging by the presence of the ‘Lowlandic component’ in Future East Europe from the Elbe to the Vistula, maybe Lowlandic peoples spread Proto-Slavic, too! We can even date the common Lowlandic-Slavic proto-language this way! So many groundbreaking conclusions!

Future scholars supporting the Lowlandic homeland are on fire; they can’t get enough of publishing papers on the subject. “Two different Future American language families with cultural origins in Britain and Iberia, my ass! Because genetics.”

And don’t forget the future people of haplogroup R1b-U106 with a high Lowlandic component: wow, they are the heirs of those who expanded Future Germanic and Future Romance languages everywhere, aren’t they? How proud they must be. And who wouldn’t want these tall, blond, blue-eyed Lowlanders as their forefathers? Personalised genetic analysis is selling like crazy: “let’s find out our Lowlandic percentage!” Everyone is happy, colourful maps with lots of arrows and shit…

But, your future self might ask in awe, seeing that this doesn’t sound quite right given your basic archaeological and linguistic knowledge:

  • What about the specific migration models proposed to date? The most solid ones, not just any one that seems to fit?
  • What about the dialectal classification of languages? The mainstream ones, not just those compatible with this interpretation?
  • What about the archaeological cultures to which individual samples belonged?
  • What about the actual dates of each sample? And how does each date relate to the stage of the culture to which the sample belongs?
  • What about the haplogroups, and the actual subclade of each haplogroup?
  • What about the territories, cultures, and dates not sampled? Could they change this interpretation in light of known archaeological models?
  • And what about the actual origin of that ancestral component they so frivolously named? Did it really appear ex nihilo in the Low Lands and expand from there?

“Who cares! This new data is sooo coool… And it proves what we wanted, what a coincidence! And it’s numbers, mate! Numbers don’t lie.”

 
No, numbers don’t lie. But people do.

Correlation is fun, isn’t it?

trackback

[…] For example, how could Sredni Stog be Late Indo-European-speaking, if the best candidate for a Late Indo-European-speaking community (the Yamna culture) is almost fully unrelated? For some, simply because of the ‘Yamnaya ancestral component’. […]

trackback

[…] geneticists, archaeologists, and linguists: do collaborate. Merely by talking, the so-called ‘Yamnaya ancestral component’ would have never been given that dreadful name, and maybe we could end this quest to find a […]

trackback

[…] and Kristiansen’s school based on the famous 2015 papers, whereby – due to the “Yamnaya ancestral component” – the Yamna culture would have been composed of communities of R1a-M417 and R1b-M269 […]

trackback

[…] is always interesting to see how reports gradually evolve, including more and more doubts about the ‘Yamnaya component’, and how it may be correctly interpreted. Slow but steady wins the […]

trackback

[…] of the Corded Ware culture, and that speculative interpretations of recent genetic papers (especially since 2015) are not doing much in favour of sound anthropological models by connecting directly Yamna to […]

trackback

[…] of this paper) was not so fast to explain the findings the same way they proposed their infamous Indo-European – steppe ancestry association (i.e. ancestry = language, ergo CHG = PIE in this case), and resorted to mainstream anthropological […]

trackback

[…] Directly applicable to the research groups that launched the Yamna-CWC idea based on the fallacious “Yamnaya ancestry” concept, and who are still rooting for it – and to the people who will follow it from now on with […]

trackback

[…] R1a speaking IE (July 2017), my post on the Eneolithic Ukraine sample (September 2017), or on the “Yamnaya ancestral component” (November […]

trackback

[…] seems that – now that the Danish workgroup (responsible for the “steppe ancestry = Indo-European” and “Corded Ware expanded from Yamna“) is backing down, and both it and the […]

trackback

[…] these actual archaeological and linguistic models will stop anyone from relying on preliminary, genetic-admixture-based sketches of a fictional “Kurgan people” that was outdated almost 60 years ago – especially if they […] certain wishes of the […]

trackback

[…] the bad aspect, they keep repeating the same “steppe ancestry” meme (in the featured image above, or the one below). I know this is the news report (i.e. science […]

trackback

[…] adds wheels and wool symbols probably in support of some model based on yet another correlation that is not causation (that I cannot then yet properly […]

trackback

[…] I guess we will still see some groups still resorting to the good old Yamnaya ancestral component™, consciously ignoring that a proportion of ancestral components (some combination of EHG:CHG:WHG in […]

trackback

[…] Kristiansen’s ‘long-lasting GAC-CWC connection’, now ignored to favour their Yamnaya admixture™ concept), and also three ways of defining Corded Ware […]

trackback

[…] everything is possible, since it is brought to you by the same Danish group who proposed the Yamnaya ancestral component™, the CHG = Indo-European (and simultaneously EHG in Maykop = Anatolian??), and now also the CWC/R1a […]

trackback

[…] right; a model – if anyone is lost here – based on proportions of the so-called Yamnaya ancestral component™, as it was found in a small number of samples, from four or five Eneolithic–Chalcolithic cultures […]

trackback

[…] ever had (or intend to have) a common position for their workgroup – beyond their beloved Yamnaya™ ancestral component and R1a must be Indo-European – , because each new publication changes some essential aspects […]

trackback

[…] a similar “steppe ancestry” due to convergence. I have said so many times (see e.g. here). This was clear long ago, just by looking at the Y-chromosome bottlenecks that differentiate them […]

trackback

[…] the “Siberian ancestry” white whale is that nobody really knows what it is; just like we did not know what “Yamnaya ancestry” was, until the most recent data is making the picture clearer. Its nature is changing with each new […]

trackback

[…] true “Yamnaya ancestry” (not the originally described one) was in fact associated with Indo-Europeans: see more on the evolution of Yamna Hungary into Bell […]

trackback

[…] some brilliant minds decided in 2015 that the so-called “Yamnaya ancestry” should be associated to ‘Indo-Europeans’. This is causing the development of various […]

trackback

[…] that of Sredni Stog/Corded Ware origin vs. that of Repin/Yamna origin, a difference that has been known for quite some time […]

trackback

[…] seen that movie, it’s because you have. They are at it again, Corded Ware from Yamna, and more “steppe ancestry” = more Indo-European. It seems we haven’t learnt anything about “Steppe ancestry” since 2015. But […]

Joe Flood

PCA is a very flawed technique; almost all other disciplines dropped it after the 1970s. It has little statistical merit but can be quite good for demonstrating spatial forces. I certainly would never use it to try to ‘prove’ something, just for a rough picture.

By the way, do you know who invented the idea or name of “Yamna” to refer to the pit grave people on the steppe? Gimbutas uses “kurgan”, which is rather less pretentious and more descriptive.

trackback

[…] Correlation does not mean causation: the damage of the ‘Yamnaya ancestral component’, an… […]