The history of the simplistic ‘haplogroup R1a — Indo-European’ association

This is a series of posts I wrote at the end of 2017 / beginning of 2018, to answer the wrong assumptions I could read in forums and blogs. I decided not to publish them then, seeing how many successive papers were confirming my theory in a (surprisingly) clear-cut way. Nevertheless, because I keep reading the same comments and personal attacks no matter what gets published even in mid-2018, I have decided to update and publish them. This way I will be able to respond to the “haplogroup R1a – Indo-European association” directly by pointing to any of these posts from now on, instead of writing a comment each time.

There is a dying trend these days which, against all (recent) evidence, supports the R1a – Indo-European association. As I have argued many times, it does not make any sense to make that identification (or any other), unless it is used to assess potential migrations that coincide with described Proto-Indo-European stages and dialects.

So, for example, in the Indo-European demic diffusion model, phylogeography (and more specifically, the association of R1b-M269 subclades) is used to ascertain the migration of Middle and Late Indo-European speakers from the steppe.

If, in the future, the current picture should change, no R1b/IE-fanboy should stop the evolution of the general knowledge because of a personal attachment to a wrong idea.

History of the Proto-Indo-European homeland question and its association with haplogroup R1a

Here is the simplified history of this trend:

19th and early 20th c.: The now so-called Kossinnian views of cultural-historical communities dominate the field of ethnolinguistic identification. Eurocentric or Indocentric views dominate the field. Simplistic racial theories and histories become fashionable, and have a long-lasting impact on racial views.

“Maximum Expansion of Alpines” — Map from The Passing of the Great Race showing the “essentially peasant” Alpine migrations into Europe. The Nordic [=Indo-European], in his hypothesis, was “Homo europaeus, the white man par excellence. It is everywhere characterized by certain unique specializations, namely, wavy brown or blond hair and blue, gray or light brown eyes, fair skin, high, narrow and straight nose, which are associated with great stature, and a long skull, as well as with abundant head and body hair.” See Wikipedia for more information.

1960s: Gimbutas’ steppe theory, with kurgan-builder horse-riders expanding Proto-Indo-European rapidly through Europe and Asia at the end of the third millennium. It receives almost immediate criticism for some of its simplistic aspects.

“European dialect” expansion of Proto-Indo-European according to Gimbutas (1963)

1970-1980s: Gimbutas and later her disciple Mallory continue and expand the steppe theory, in spite of the controversial basis of a common Kurgan burial, and the controversial dialectal division (including a Nordic or Germano-Balto-Slavic group). Other hypotheses (e.g. Anatolian hypothesis by Colin Renfrew, Armenian homeland and glottalic theory) are born.

PIE homelands suggested by linguists and archaeologists. Image from Indo-European Chronology (TIED)

1990s: Kristiansen and Anthony revive the unpopular migration models, and archaeologists begin to accept the incorporation of ethnology again.

In the east, after the fall of the Soviet Union and its puppet states, Gimbutas’ model is finally known there and it is discussed. Also, western academics are finally able to collaborate with local researchers in Russia and Ukraine to develop more detailed migration models. Gimbutas’ simplistic picture of a steppe origin of PIE may finally be tested and improved upon.

Proposal for the origin and spread of the Corded Ware/ Battle Axe cultural complex: 1) Distribution of CWC groups; 2) Yamna culture; 3) presumed area of origin; 4) presumed main directions of the primary distribution. Also numbered are other individual CW cultures. From Kristiansen (1989).

2000s: Cavalli-Sforza’s phylogeography begins in the 1990s. The distribution of haplogroups R1a and R1b in modern populations depicts a potential ancient R1a-Corded Ware and R1b-Bell Beaker distribution, which makes a cultural diffusion of R1a into R1b through the Rhine seem quite likely.

The first radiocarbon analyses seem to confirm that the Bell Beaker culture originated and thus expanded from Iberia, from a descriptive archaeological point of view.

R1a distribution in Eurasia (Wikipedia). The typical map people relied on in the 2000s to derive wrong conclusions

The pride of ‘belonging’ to certain haplogroups starts all over the world. In Europe, for example, the mental picture of ancient (Palaeolithic/Mesolithic) peoples developed is similar to Wiik’s:

Vasconic/R1b-Uralic/N1c distribution and Indo-European/R1a. The concept that stained it all.

It is a good time for other theories, also, including cute algorithms and glottochronology, that would no doubt overcome the limitations of informal guesstimates.

2010-2014: Several publications revolutionise the traditional archaeological models: Anthony’s revised steppe theory on Khvalynsk/Yamna migrations (based on Ringe’s glottochronological model), where no migration can be seen into Corded Ware; Harrison and Heyd’s (2007-2012) new model of East Bell Beaker peoples derived from Yamna (distinct from Proto-Bell Beaker cultural diffusion from Iberia); and the identification of the Bell Beaker expansion with North-West Indo-European by Mallory (Celtic from the West 2, 2013), and with the formation of a Nordic language by Prescott (1999-2010).

A migration Yamna -> Bell Beaker is now mainstream, while only Anthony keeps claiming that the language of certain Corded Ware groups is the product of cultural diffusion (e.g. Germanic through Usatovo).

linguistic and archaeological evidence for steppe Indo-Europeans

2015-2016: The overconfidence of the famous recent genetic papers, including Brandt et al. (2013), Haak et al. (2015), Mathieson et al. (2015), and Allentoft et al. (2015), and also Lazaridis et al. (2016), changes the mood yet again. The IE – Corded Ware association and some almost forgotten language classifications are brought back to life. A Yamna -> Corded Ware migration is proposed, followed by a later Corded Ware -> Bell Beaker migration wave, based solely on statistical methods.

Allentoft et al. “They conclude that the Corded Ware culture of central Europe had ancestry from the Yamnaya. Allentoft et al. also show that the Afanasievo culture to the east is related to the Yamnaya, and that the Sintashta and Andronovo cultures had ancestry from the Corded Ware. Arrows indicate migrations — those from the Corded Ware reflect the evidence that people of this archaeological culture (or their relatives) were responsible for the spreading of Indo-European languages. All coloured boundaries are approximate.”

R1a fans who didn’t know about archaeological or linguistic models related to the Indo-European question become either ecstatic (those of Northern and East European descent) or defensive (Hindu nationalists).

A new anthropological model of migration is born, based solely on ancient DNA, without Anthropology or Archaeology needed to support it. Linguistics becomes thus a game for many so-called amateur geneticists (who are not geneticists). Amateurs delight in playing with bioinformatic tools and free datasets to obtain “percentages” of some simplistically named ethnic components, whose interpretation is obviously as flawed as the whole process followed.

As for Fernando and me, it would seem as though it confirmed what we had supported. A Yamna -> Corded Ware -> Bell Beaker migration, so West Corded Ware groups could easily represent North-West Indo-European, and Únětice the core languages that remained in close contact and expanded later, with a picture similar to what we depicted in our old maps (in favour of Gimbutas/Kristiansen, and against Anthony).

Chalcolithic – Corded Ware as Late Indo-European languages

However, the most solid alternatives to our preferred theory were the models of Anthony (2007) and Heyd (2007, 2012), which held that a genetic migration could only be seen from Yamna into Afanasevo and into Yamna settlers in the Balkans, not between Yamna and Corded Ware. Y-DNA, even if scarce, pointed in their direction, not in ours. Also, the so-called ‘Yamnaya ancestry’ remained very poorly defined, and the conclusions of the genetic papers were not tenable with the available data. The Indo-Uralic concept of the Leiden school seemed to fit very well with the formation and distribution of steppe ancestry in two main Neolithic European groups, though.

So I decided in 2016 to write a draft on what the studies actually showed, if we took linguistic and archaeological data as the basis for the Indo-European diffusion. I used the little time I had left from working on my PhD thesis.

2017: My Indo-European demic diffusion model and then Mathieson et al. (2017), and Olalde et al. (2017), challenge the mood again, so people get less confident about the whole ‘Yamnaya ancestral component’ concept. Archaeologists begin to react, first with Klejn’s complaint, and then especially Furholt’s thorough criticism.

NOTE: I am being a little over the top here, since I merely cited Heyd, Anthony, Mallory, Prescott, etc. and put them in common with genetic finds, but this is my blog, and the fact that you are reading this proves something ;-)

Heyd’s doubts on the current interpretation are clear, Anthony changes his famous model, and even Kristiansen does not like the genetic interpretation and keeps supporting some kind of long-lasting interaction with Trypillia and Globular Amphorae cultures to justify the adoption of an Indo-European language by Corded Ware peoples (who, he clearly stated, don’t share the same culture as Yamna), even against genetic results.

A new, modified, all-including Anatolian/Armenian/glottochronological homeland hypothesis is also alive and growing thanks to the hopes given in Genomics by the paper on Minoans and Mycenaeans, by Lazaridis et al. (2017).

Bouckaert et al. (Science 2012): “A first line of evidence comes from linguistic analysis based on quantitative lexical data, which returned a tree compatible with the Anatolian hypothesis”. Funny how glottochronological outputs adapt to mainstream linguistic classifications, as these evolve…

2018: Olalde et al. & Mathieson et al. (2018) give a still clearer picture of the R1b-L23 association with Yamna. The ‘Yamnaya’ ancestral component is mostly called simply ‘steppe’ component, and some begin talking about North Pontic Eneolithic as ‘Indo-Slavonic’ (see for example Kroonen’s reaction to Olalde & Mathieson 2018).

Narasimhan et al. (2018) break that new trend before it is even properly born, in asserting that Proto-Indo-Iranian was spoken by steppe MLBA peoples from Sintashta/Potapovka (and later spread as Andronovo/Srubna), a mixture of Yamna-like Poltavka (of mainly R1b-Z2103) and CWC-like Abashevo (of mainly R1a-Z93 lineages). This is confirmed by the Copenhagen school in Damgaard et al. (Science 2018), where they accept clear links between Finno-Ugric speakers and Proto-Indo-Iranians in the Volga-Ural region.

Image modified from Narasimhan et al. (2018), including the most likely proto-language identification of different groups. Original description “Modeling results including Admixture events, with clines or 2-way mixtures shown in rectangles, and clouds or 3-way mixtures shown in ellipses”. See the original full image here.

The controversy over a hypothetical North Iranian PIE homeland due to the lack of EHG in Old Hittite samples is confirmed as a real problem for a simple genetic interpretation in Damgaard et al. (Science 2018). The finding of EHG in some Maykop samples makes Kristiansen support a Caucasian homeland for Middle PIE.

Other papers, such as those on Corded Ware from the Baltic (showing a homogeneous cluster with previous Corded Ware samples and R1a-Z645 lineages) and samples from Fennoscandia showing that N1c-L392 lineages and Siberian ancestry arrived too late to represent Uralic languages, and in successive waves (and probably not exactly related to each other), confirm what we suspected.

Geneticists finally do realize that in spite of that common steppe ancestral component of Chalcolithic expansions, other factors have to be taken into account, among them the most obvious one: that R1a and R1b lineages were distributed differently (i.e. in different ethnocultural communities) for 2,000 years. David Reich talks, and Iosif Lazaridis writes about this. Even the Danish school backs down from their previous reactionary view.

The latest paper on the Caucasus by Wang et al. (2018), offering also data on unreleased samples from Yamna settlers in Hungary and nearby GAC-like populations, shows where the admixture of Bell Beakers happened, breaking the Dnieper-Dniester, Budzhak, Usatovo, or Corded Ware links with the GAC influence seen in East Bell Beakers.

Early Chalcolithic migrations ca. 3300-2600 BC. See more information on this map here.

The future of the “steppe ancestry = Indo-European” fallacy

Of course future developments depend on new results, and my prediction will happen if samples keep appearing as expected from the current distribution (i.e. if selection bias, under-coverage, or over-coverage are not distorting the ancient genetic picture). After all, we thought that R1a and Corded Ware were the vector of Indo-European languages, and now we think differently.

Nobody can say what will happen with new results, and maybe a community of R1a-M417 is actually hidden in Yamna Ukraine, and/or among Hungarian settlers, who knows…

But the fact is that, with the current information available, taking together linguistic, archaeological, anthropological, and genetic data (with an overwhelming number of hg R1b-L23 lineages and genetic homogeneity in Khvalynsk, Yamna, Afanasevo, and Bell Beaker), the most likely a priori assumption is that at least two distinct steppe cultural communities developed, lived, spoke, and spread separately.

Now, the picture should be quite clear for everyone. But, instead of narrowing our possibilities and driving our quest towards the finer structure of North-West Indo-European, Palaeo-Balkan and Indo-Iranian branches, what we are going to see in the near future is:

1) Among some professional geneticists (based on what happens in all other scientific fields): continuity, i.e. they will keep referencing the content of previous papers, with some limited criticism of some findings. They need to cite others and themselves (after all, citations move authors’ prestige and journals’ impact factor), and any clash with other competing labs can only bring prestige wars with bad consequences for all involved. Also, retracting their own interpretations or conclusions in a clear way may make them appear unprofessional in front of peers and journals. Believe it or not, ‘salami publishing’ (divide and conquer the impact factor) and ‘non-retraction’ (we were never fully wrong) are the way to go in publishing these days.

This is understandable. Journals don’t like the bad publicity of retraction, and researchers don’t want to be associated with that (for well-known reasons). Also, among competing labs, which one should be the first to say they were wrong? If the Reich Lab does it, even if they are the leading team now, wouldn’t that make them look worse than the Copenhagen group? And the Copenhagen group has already changed their ideas in their most recent papers, surreptitiously modifying (as I expected) every single fact they claimed as true before; so what benefit is there for any of them to clearly, expressly retract previous papers now? None. Benefits of not retracting? All.

NOTE. I would probably do the same, if I were them. If there is no benefit in it, why retract? On the other hand, I would like to see them clearly retract what they said, and not just give up certain positions and go on, because I feel there is a benefit for many of us in their express retraction (and harm to many in their lack of retraction); because the wrong anthropological theories will find support in certain interpretations that keep popping up in their papers; and because it is good for science in general that retractions become a normalized event in scientific papers.

2) Among anthropologists and amateur geneticists, fans of CWC theory (based on the basic human condition): fait accompli, i.e. remade theories, confirmation and eureka bias; e.g. still supporting GAC influence in Early Yamna -> Corded Ware (Kristiansen), Corded Ware speaking Indo-European by mixing with Bell Beakers (Anthony), invented arrows of migration, admixture events and chronologies, etc. Any theory that may appear as sufficiently complex and in line with certain genetic data will be better than the simple picture by Heyd of a Khvalynsk -> Yamna -> Bell Beaker; especially if it happens to partially support something they said before…

Anthony (2017). Central and Eastern Europe ca. 3000–2500 BCE showing the early Yamnaya culture area 3300–2700 BCE and the Yamnaya migration up the Danube Valley with related/offshoot Makó and Vučedol sites; also the distribution of Corded Ware sites in northern Europe; and site areas sampled for aDNA in Haak et al. (2015). The oldest Corded Ware radiocarbon dates are from southern and central Poland. The Yamnaya cemeteries in the Danube Valley are after Heyd (2011), the shaded Globular Amphorae site area is after Harrison and Heyd (2007); the Corded Ware and Globular Amphorae sites in southern Poland are after Machnik (1999); and the blue dots were all Corded Ware sites with radiocarbon dates as of Furholt (2003).

This is also understandable. If you have placed a bet, and it does not go as you expect, you have two options: accept, or deny. In complex anthropological theories (with scattered, non-connected papers, drafts, comments, etc. written over years) you have a third way: you can accept the new results, but deny that you said something fully against this. So, what we usually read in new papers, drafts, forums, or blogs is not “I was wrong”, but “hey, I said something similar in the year YYYY in that paper/comment/post! In fact, I expect this and this to occur now”. Who is going to discuss that? Who is going to check what you said in the other hundred writings over the years in which you were fully wrong?

NOTE. This happens also with hobbyists in forums and blogs. It is also my impression that among them there are many prone to mentally replicate for prehistory the idealistic 19th century division of Europeans into ‘northerners’ (say, R1a + R1b-U106) and ‘southerners’ (say, the rest of R1b + Neolithic). That’s probably why the R1a-R1b division works so well. This way, we could support an idealized, “full Indo-European Europe” (fully ‘white’ Europe?) divided instantly into the pure blue-eyed blonde Barbarians from the north, and the admixed learned Mediterraneans from the south. I see a lot of theories from the past and from the present which seem to be superficially concerned about prehistory and genetics, but deep down they tend to reproduce this simplistic modern picture in prehistoric Europe since the Mesolithic. Plainly and painfully wrong, too.

3) Among amateur geneticists and R1a-fans of the steppe hypothesis of Northern or East European descent (based on what we see among Basque and Indian nationalists): reactionary views, i.e. the misuse of traditional disciplines to point now to the North Pontic area, Dnieper-Dniester, etc. as the ‘original’ PIE-speaking area – be it Kurgans, horse domestication, horse riding, symbology, blonde hair, blue eyes, lactase persistence, dolichocephaly… anything will do. Where they said Khvalynsk/Yamnaya, they will say now Sredni Stog/Corded Ware – whatever is necessary to prove that R1a spoke Indo-European.

This is also human nature. Imagine you have supported continuity theories of Balto-Slavic or Germanic in Northern Eurasia since ca. 3500 BC to this day, have acted according to your preconceptions and racism and narcissism (i.e. inferiority complex), calling names, participating in forums stating that ‘R1a is a great haplogroup’ and similar stupid shit, learning statistical methods to prove your preconceptions, etc. And now someone tells you that this is not the case, that your haplogroup may have expanded much later, probably including some recent bottlenecks, and was not found among Indo-European speakers (and later probably not even associated with your current IE language)… And that on top of being fully wrong in many of your previous writings… I guess it’s like telling a fundamentalist Muslim that the Qur’an is not true, or something of the sort. Nothing good will come from their reactions. Seeing how Basque racists and Indian nationalists reacted to the latest papers, we will have to wait for new generations to overcome the preconceptions of those who lived with the genetics of the 2000s…

Modern haplogroup R1a distribution from The Genetic Atlas (Public Domain)

NOTE. I wouldn’t do the same. I cannot share the feeling of these people of Northern and Eastern European descent. In fact, I believed in the 2000s that R1b represented ‘autochthonous’ Vasconic-like speakers in Western Europe, and R1a/Corded Ware represented IE speakers, and I changed my views after 2015. And, if the case were reversed, and we had thought that R1b was the Indo-European marker, and ancient DNA had proved me wrong (seeing R1a-Z645 where we see R1b-L23 right now), I would be happy to talk about the same North-West Indo-European represented by (say) the expansion of R1a-Z282 from Yamna Hungary -> Bell Beaker, and then the resurgence of Vasconic-like R1b in Western Europe. That’s why I attack this position ferociously: I don’t think there is any reasonable justification for this attitude. Nor for the nationalist OIT, Basque/R1b, Slavic or Nordic/R1a, etc. It’s obviously a human behaviour, but there is no benefit, no reason, nothing but sheer bigotry, a reek of supremacy (or inferiority complex), and a desire for domination of prehistoric narratives by modern post-truths, however you look at it.

Even Hindu nationalists had a ‘more rational’ motive for their R1a/IE association: conservative Hindus, as I understand it, share a reactionary view against the ethnolinguistic and religious complexity of the Indian subcontinent, and they want to show religious and cultural unity and continuity only broken by Islam. Because I assumed the same supremacy views from those of Indian descent as I did for those of Northern and Eastern European descent, I was surprised to see the reaction of many to the news of R1a spreading from Steppe MLBA: I have seen them shift their theories according to the new papers, supporting (among many different new contradictory positions) that Iranian farmer ancestry (and thus the Indus Valley Civilization) was responsible for the Indo-Aryan expansion, and that R1a is not relevant anymore (and is just the product of late bottlenecks from steppe incomers). It became clear to me that racism based on ancestry or haplogroup is not the driving force here, but religious and nationalist bigotry. The R1a/OIT association was just one of many resources that the Hindutva has used to support the cultural and historical origin of modern India in the Indus Valley Civilization, to simplify the religious, ethnic, and linguistic diversity of India.

All in all, it is discouraging how humans are incapable of adapting to purely theoretical concepts, just because they previously assumed they were true.