Bell Beakers and Mycenaeans from Yamnaya; Corded Ware from the forest steppe


I have recently written about the spread of Pre-Yamnaya or Yamnaya ancestry and Corded Ware-related ancestry throughout Eurasia, using exclusively analyses published by professional geneticists, and filling in the gaps and contradictory data with the most reasonable interpretations. I did so consciously, to avoid any suspicion that I was interspersing my own data or cherry-picking results.

Now that I have finished recapitulating the known public data, the only way forward is to assess these populations using the available datasets and free tools.

Understanding the complexities of qpAdm is fairly difficult without a proper genetic and statistical background, which I won’t pretend to have, so tweaking it to obtain strictly correct results would require an endless game of trial and error. I sadly have little time for this, even taking my tendency to procrastinate into account… so I have used a simple model akin to those published before – in particular, the outgroup selection by Ning, Wang et al. (2019), who seem to be part of the only group interested in distinguishing Yamnaya-related from Corded Ware-related ancestry, probably the most relevant question discussed today in population genomics regarding the Proto-Indo-European and Proto-Uralic homelands.

Supplementary Table 13. P values of rank=2 and admixture proportions in modelling Steppe ancestry populations as a three-way admixture of Eneolithic steppe, Anatolian_Neolithic, and WHG using 14 outgroups.
Left populations: Test, Eneolithic_steppe, Anatolian_Neolithic, WHG.
Right populations: Mbuti.DG, Ust_Ishim.DG, Kostenki14, MA1, Han.DG, Papuan.DG, Onge.DG, Villabruna, Vestonice16, ElMiron, Ethiopia_4500BP.SG, Karitiana.DG, Natufian, Iran_Ganj_Dareh_Neolithic.

I have used for all analyses below a merged dataset including the curated one of the Reich Lab, the latest on Central and South Asia by Narasimhan, Patterson et al. (2019), on Iberia by Olalde et al. (2019), and on the East Baltic by Saag et al. (2019), as well as datasets including samples from Wang et al. (2019) and Lamnidis et al. (2018). I used (and intend to use) the same merged dataset in all cases, despite its huge size, to avoid adding one more uncontrolled variable to the analyses, so that all results obtained can be compared.

I try to prepare in advance a set of files with the left and right populations for each model, following a few principles:

  1. It seems a priori more reasonable to use geographically and chronologically closer proxy populations (say, Trypillia or GAC for Steppe-related peoples) than hypothetical combinations of ancestral ones (viz. Anatolian farmer, WHG, and EHG).
  2. This also means using subgroups closer to the most likely source population, such as (Don-Volga interfluve) Yamnaya_Kalmykia rather than (Middle Volga) Yamnaya_Samara for the western expansion of late Repin/early Yamnaya, or the early Germany_Corded_Ware.SG or Czech_Corded Ware for the group closest to the Proto-Corded Ware population (see below), likely neighbouring the Upper Vistula region.
  3. Whenever I know what I want to test, I usually test two source populations for different targets, which seems a much more efficient use of computing resources, since I need my PC back for its normal use; whenever I don’t know exactly what to test, I use three-way admixture models and look for subsets that improve the results.
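The file preparation described above can be sketched roughly as follows. This is a minimal illustration, not the exact procedure used here: it assumes ADMIXTOOLS' convention of plain-text population lists with one label per line (target first in the left list), which a qpAdm parameter file would then point to through its popleft and popright entries. The function names, the output directory, and the label "Poland_GAC" are hypothetical; actual labels must match those in the dataset's .ind file.

```python
from itertools import combinations
from pathlib import Path

# Outgroups ("right" populations), as listed for Ning, Wang et al. (2019) above.
RIGHT = [
    "Mbuti.DG", "Ust_Ishim.DG", "Kostenki14", "MA1", "Han.DG", "Papuan.DG",
    "Onge.DG", "Villabruna", "Vestonice16", "ElMiron", "Ethiopia_4500BP.SG",
    "Karitiana.DG", "Natufian", "Iran_Ganj_Dareh_Neolithic",
]


def write_model_files(target, sources, outdir):
    """Write a (hypothetical) left/right population pair for one qpAdm model.

    The target goes on the first line of the left file, followed by the sources.
    """
    left = [target, *sources]
    # A population cannot be both a left population and an outgroup.
    overlap = set(left) & set(RIGHT)
    if overlap:
        raise ValueError(f"left populations also used as outgroups: {sorted(overlap)}")
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    stem = "_".join(left)
    left_path = outdir / f"{stem}.left"
    right_path = outdir / f"{stem}.right"
    left_path.write_text("\n".join(left) + "\n")
    right_path.write_text("\n".join(RIGHT) + "\n")
    return left_path, right_path


def write_all_models(target, candidates, n_sources, outdir):
    """Write one file pair for every n-way combination of candidate sources."""
    return [write_model_files(target, combo, outdir)
            for combo in combinations(candidates, n_sources)]
```

For example, `write_all_models("Germany_Beaker", ["Yamnaya_Kalmykia", "Hungary_Baden", "Poland_GAC"], 2, "models")` would write the three possible two-way models for that target, and setting `n_sources=3` gives the exploratory three-way runs.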

I have probably left out some more complex models by individualizing the most relevant groups, but for the time being this will have to do. Also, no other formal stats have been used in any case, which is an evident shortcoming and rules out an interpretation drawn directly and only from the results below.

Full qpAdm results for each batch of samples are presented in a Google Spreadsheet, with each tab (bottom of the page) showing a different combination of sources, usually in order of formally ‘best’ (first to the left) to ‘worst’ (last to the right) fits, although the order is difficult to select in highly heterogeneous target groups, as will be readily visible.
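The ordering of tabs from ‘best’ to ‘worst’, and the distinction between feasible and infeasible models used throughout this post, can be sketched as below. The sketch assumes a common rule of thumb rather than anything qpAdm itself enforces: a model counts as feasible when its p-value is not below a chosen threshold (0.05 here, an assumption) and all admixture weights lie between 0 and 1. The `Model` record and its fields are hypothetical stand-ins for values copied from qpAdm output.

```python
from dataclasses import dataclass


@dataclass
class Model:
    """Hypothetical summary of one qpAdm run: source labels, weights, tail p-value."""
    sources: tuple
    weights: tuple
    p_value: float


def is_feasible(model, alpha=0.05):
    """Rule of thumb: p-value not below alpha and every weight within [0, 1]."""
    return model.p_value >= alpha and all(0.0 <= w <= 1.0 for w in model.weights)


def rank_models(models, alpha=0.05):
    """Order models with feasible ones first, then by decreasing p-value.

    This is only a convenience for browsing results: a higher p-value does not
    make a model 'more true', it only means the model was not rejected.
    """
    return sorted(models, key=lambda m: (not is_feasible(m, alpha), -m.p_value))
```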

Disintegration, migration, and imports of the Azov–Black Sea region. First migration event (solid arrows): Gordineşti–Maikop expansion (groups: I – Bursuchensk; II – Zhyvotylivka; III – Vovchans’k; IV – Crimean; V – Lower Don; VI – pre-Kuban). Second migration event (hollow arrows): Repin expansion. After Rassamakin (1999), Demchenko (2016).

Corded Ware origins

The latest publications on the Yampil barrow complex have not much improved our understanding of the complexity of Corded Ware origins from an archaeological point of view, which involve multiple cultural (hence likely population) influences. This bit is from Ivanova et al., Baltic-Pontic Studies (2015) 20:1, and most hypotheses of the paper remain unanswered (except maybe for the relevance of the Złota group):

In the light of the above outline therefore one should argue that the ‘architecture of barrows’ associated in the ‘Yampil landscape’ of the Middle Dniester Area with the Eneolithic (specifically, mainly with the TC), precedes the development of a similar phenomenon that can be observed from 2900/2800 BC in the Upper Dniester Area and drainage basin of the Upper Vistula, associated with the CWC [Goslar et al. 2015; Włodarczak 2006; 2007; 2008; Jarosz, Włodarczak 2007]. The most consuming research question therefore is whether ritual customs making use of Eneolithic (Tripolye) ‘barrow architecture’ could have penetrated northwards along the Dniester route, where GAC communities functioned. One could also ask what role the rituals played among the autochthons [Kośko 2000; Włodarczak 2008; 2014: 335; Ivanova, Toshchev 2015b].

This issue has already been discussed with a resulting tentative systemic taxonomy in the studies of Włodarczak, arguing for the Złota culture (ZC) in the Vistula region as an illustration of one of the (Małopolska) reception centres of civilization inspirations from the oldest Pontic ‘barrow culture’ circle associated with the Eneolithic and Early Bronze Age [Włodarczak 2008]. Notably, it is in the ZC that one can notice a set of cultural traits (catacomb grave construction, burial details, forms and decoration of vessels) analogous to those shared by the north-western Black Sea Coast groups of the forest-steppe Eneolithic (chiefly Zhyvotilovka-Volchansk) and the Late Tripolye circle (chiefly Usatovo-Gordinești-Horodiștea-Kasperovtsy).

Globular Amphorae culture „exodus” to the Danube Delta: a – Globular Amphorae culture; b – GAC (1), Gorodsk (2), Vykhvatintsy (3) and Usatovo (4) groups of Trypillia culture; c – Coţofeni culture; d – northern border of the late phase of Baden culture; red arrows – direction of Globular Amphora culture expansion; blue arrow – direction of „reflux” of Globular Amphora culture (apud Włodarczak, 2008, with changes).

Taking into account that I6561 might be wrongly dated, we cannot include the Corded Ware-like sample of the end-5th millennium BC in the analysis of Corded Ware origins. That uncertainty in the chronology of the appearance of “Steppe ancestry” in Proto-Corded Ware peoples complicates the selection of any potential source population from the CHG cline.

Nevertheless, the lack of hg. R1a-M417 and sizeable Pre-Yamnaya-related ancestry in the sampled Pontic forest-steppe Eneolithic populations (represented exclusively by two samples from Dereivka ca. 3600-3400 BC) would leave open the interesting possibility that a similar ancestry reached the forest-steppe region between modern Poland and Ukraine during the known complex population movements of the Late Eneolithic.

It is known that Corded Ware-derived groups and Steppe Maykop show bad fits for Pre-Yamnaya/Yamnaya ancestry, and also that Steppe Maykop is a potential source of “Steppe-related ancestry” within the Eneolithic CHG mating network of the Pontic-Caspian steppes and forest-steppes. It thus makes sense to test Corded Ware for recent Trypillia and Maykop influences, characteristic of Late Trypillia and Late Maykop groups in the North Pontic area (such as Zhyvotylivka–Vovchans’k and Gordineşti), side by side with potential Pre-Yamnaya and Yamnaya sources:

Now, the main obvious difference between Khvalynsk-Yamnaya and Corded Ware is the long-lasting, pervasive Y-chromosome bottlenecks under R1b lineages in the former, compared to the haplogroup variability and late bottleneck under R1a-M417 in the latter, which speaks in favour – on top of everything else – of a different community of sub-Neolithic hunter-gatherers including hg. R1a-M417 hijacking the expansion of Steppe_Maykop-related ancestry around the Volhynian-Podolian Upland.

Akin to how Yamnaya patrilineal descendants hijacked regional EEF (±CWC) ancestry components mainly through exogamy, dragging them into the different expanding Bell Beaker groups (see below), but kept their Indo-European languages, these hunter-gatherers that admixed with peoples of “Steppe ancestry” were the most likely vector of expansion of Uralic languages in Eastern Europe.

PCA of ancient Eurasian samples. Marked are likely Proto-Corded Ware samples and the potential origin of its PCA cluster based on qpAdm results. See full PCA and more related files.

Baltic Corded Ware

One of the most interesting aspects of the results above is the surprising heterogeneity of the different regional groups, which is also reflected in the Y-DNA variability of early Corded Ware samples.

Seeing how Baltic CWC groups, especially the early Latvia_LN sample, show particularly bad fits with the models above, it seems necessary to test how this population might have come to be. My first impression in 2017 was that they could represent early Corded Ware groups admixed with Yamnaya settlers through their interactions along the Dnieper-Dniester corridor.

However, I recently predicted that the most likely admixture leading to their ancestry and PCA cluster would involve a Corded Ware-like group and a group related to sub-Neolithic cultures of eastern Europe, whose best proxies to date are EHG-like Khvalynsk samples (i.e. excluding the outlier with Pre-Yamnaya ancestry, I0434):

Detail of the PCA of the Corded Ware expansion. See full PCA and more related files.

Late Corded Ware + Yamnaya vanguard

Also relevant are the mixtures of Corded Ware from Esperstedt, and particularly those of the sample I0104, which, as I have repeated many times in this blog, I suspected to have been influenced by vanguard Yamnaya settlers:

The infeasible models of CWC + Yamnaya_Kalmykia ± Hungary_Baden (see below for Bell Beakers) and the potential cluster formed with other samples from the Baltic suggest that it could represent a more complex set of mixtures with sub-Neolithic populations. On the other hand, its location in Germany, late date (ca. 2500 BC or later), and position in the PCA, together with the good fits obtained for Germany_Beaker as a source, suggest that the increase in Steppe-related ancestry + EEF makes it impossible for the model (as I set it) to directly include Yamnaya_Kalmykia, despite this excess Steppe-related ancestry actually coming from Yamnaya vanguard groups.

I think it is very likely that the future publication of EEF-admixed Yamnaya_Hungary samples (or maybe even Yamnaya vanguard samples) will improve the fits of this model.

These results confirm at least the need to distrust the common interpretation of mixtures including late Corded Ware samples from Esperstedt (giving rise to the “up to 75% Yamnaya ancestry of CWC” in the 2015 papers) as representative of the Corded Ware culture as a whole, and to keep always in mind that an admixture of European BA groups including Corded Ware Esperstedt as a source also includes East BBC-like ancestry, unless proven otherwise.

Yamnaya vanguard groups in Corded Ware territory before the expansion of Bell Beakers (ca. 2500 BC). See full map.

Bell Beaker expansion

A hotly (re)debated topic in the past 6 months or so, and for all the wrong reasons, is the origin of the Bell Beaker folk. Archaeology, linguistics, and different Y-chromosome bottlenecks clearly indicate that Bell Beakers were at the origin of the North-West Indo-European expansion in Europe, while the survival of Corded Ware-related groups in north-eastern Europe is clearly related to the expansion of Uralic languages.

NOTE. For the interesting case of Proto-Indo-Iranians expanding with Corded Ware-like ancestry, see more on the formation of Sintashta-Potapovka-Filatovka from East Uralic-speaking Abashevo and Pre-Proto-Indo-Iranian-speaking Poltavka herders. See also more on R1a in Indo-Iranians and on the social complexity of Sintashta.

Nevertheless, every single discarded theory out there seems to keep coming back to life from time to time, and a new wave of interest in “Bell Beaker from the Single Grave culture” somehow got revived in the process, too, because this obsession – unlike the “Bell Beakers from Iberia Chalcolithic” – is apparently acceptable in certain circles, for some reason.

We know that Iberian Beakers, British Beakers, or Sicilian EBA – representing the most likely closest source populations of speakers of Proto-Galaico-Lusitanian, Pre-Celtic Indo-European, and Proto-Elymian, respectively – have already been successfully tested for a direct origin among Western European Beakers in Olalde et al. (2018), Olalde et al. (2019), and Fernandes et al. (2019).

This success in ascertaining a closer Beaker source is probably due to the physical isolation of the specific groups (related to Germany_Beaker, Netherlands_Beaker, and NE_Mediterranean_Beaker samples, respectively) after their migration into regions dominated by peoples without Steppe-related ancestry. Furthermore, Celtic-speaking populations expanding with Urnfield south of the Pyrenees also show a good fit with a source close to France_Beaker.

So I decided to test sampled Bell Beaker populations, to see if this could shed light on the most likely source population of individual Beaker groups and the direction of migration within Central Europe, i.e. roughly eastwards or westwards. As was to be expected for closely related populations (see the relevant discussion here), an attempt to offer a simplistic analysis of direction based on formal stats does not make any sense, because most of the alternative hypotheses cannot be rejected:

This is not only because of the similar values obtained, but because it is absurd to take p-values as a measure of anything, especially when most of these conflicting groups with slightly ‘better’ or ‘worse’ p-values represent multiple different mixtures of the type (Yamnaya + EEF) + (Corded Ware + EEF ± Yamnaya), impossible to distinguish without selecting proper, direct ancestral populations…
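The point about conflicting directions can be made concrete with a small helper that lists every competing hypothesis not rejected at a given threshold. This is a sketch under the assumption that each hypothesis is summarized by a single qpAdm p-value; the threshold and the hypothesis labels are placeholders, not results from the analyses above.

```python
def not_rejected(p_values, alpha=0.05):
    """Return the labels of all hypotheses whose p-value is not below alpha."""
    return sorted(label for label, p in p_values.items() if p >= alpha)


def direction_is_resolved(p_values, alpha=0.05):
    """A direction of migration is only resolved if exactly one hypothesis survives.

    If several alternatives survive, a slightly higher p-value for one of them
    is not evidence against the others.
    """
    return len(not_rejected(p_values, alpha)) == 1
```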

A further example of how explosive the Bell Beaker expansion was into different territories, and of their extensive local admixture, is the unsuccessful attempt by Olalde et al. (2018) to find a common source of the EEF ancestry for all Beaker groups (excluding Iberian Beakers):

Investigating the genetic makeup of Beaker-complex-associated individuals. Testing different populations as a source for the Neolithic ancestry component in Beaker-complex-associated individuals. The table shows P values (* indicates values > 0.05) for the fit of the model: ‘Steppe_EBA + Neolithic/Copper Age’ source population.

Map of attested Yamnaya pit-grave burials in the Hungarian plains; superimposed in shades of blue are common areas covered by floods before the extensive controls imposed in the 19th century; in orange, cumulative thickness of sand, unfavourable loamy sand layer. Marked are settlements/findings of Boleráz (ca. 3500 BC on), Baden (until ca. 2800 BC), Kostolac (precise dates unknown), and Yamna kurgans (from ca. 3100/3000 BC on).

Now, there is a simpler way to understand what kind of Steppe-related ancestry is proper of Bell Beakers. I tested two simple models for some Beaker groups: Yamnaya + Hungary Baden vs. Corded Ware + GAC Poland. After all, the Bell Beaker folk should prefer a source more closely related to either Yamnaya Hungary or Central European Corded Ware:

Interestingly, models including Yamnaya + Baden show good fits for the most important groups related to North-West Indo-Europeans, including Bell Beakers from Germany, the Netherlands, Italy, and Poland, representing the most likely closest source populations of speakers of Pre-Proto-Celtic, Pre-Proto-Germanic, Proto-Italo-Venetic, and Pre-Proto-Balto-Slavic, respectively.

The admixed Yamnaya samples from Hungary that will hopefully be published soon by the Jena Lab will most likely further improve these fits, especially in combination with intermediate Chalcolithic populations of the Middle and Upper Danube and its tributaries, to a point where there will be an absolute chronological and geographical genomic trail from the fully Yamnaya-like Yamnaya settlers from Hungary to all North-West Indo-European-speaking groups of the Early Bronze Age.

The only difference between groups will be the gradual admixture events of their source Beaker group with local populations on their expansion paths, including peoples of mainly EEF, CWC+EEF, or CWC+EEF+Yamnaya related ancestry. There is ample evidence beyond ancestry models to support this, in particular continued Y-DNA bottlenecks under typical Yamnaya paternal lineages, mainly represented by R1b-L51 subclades.

Distribution of the Bell Beaker East Group, with its regional provinces, as of c. 2400 cal BC (after Heyd et al. 2004, modified). See full maps.

European Early Bronze Age

European EBA groups that might show conflicting results due to multiple admixture events with Corded Ware-related populations are the Únětice culture and the Nordic Late Neolithic.

The results for Únětice groups seem to be in line with what is expected of a Central European EBA population derived from Bell Beakers admixed with surrounding populations of East Bell Beaker and/or late (Epi-)Corded Ware descent.

Potential models of mixture for Nordic Late Neolithic samples – despite the bad fits due to the lack of direct ancestral CWC and BBC groups from Denmark – seem to be impossible to justify as derived exclusively from Single Grave or (even less) from Battle Axe peoples, supporting immigration waves of Bell Beakers from the south and further admixture events with local groups through maritime domination.

PCA of ancient European samples. Marked are Bronze Age clusters. See full PCAs.

Balkans Bronze Age

The potential origin of the typical Corded Ware Steppe-related ancestry in the social upheaval and population movements of the Dnieper-Dniester forest-steppe corridor during the 4th millennium BC raises the question: how much do Balkan Bronze Age groups owe their ancestry to a population other than the expanding Pre-Yamnaya-like Suvorovo-Novodanilovka chieftains? Furthermore, which Bronze Age groups seem to be more likely derived exclusively from Pre-Yamnaya groups, and which are more likely to be derived from a mixture of Yamnaya and Pre-Yamnaya? Do the formal stats obtained correspond to the expected results for each group?

Since the expansion of hg. I2a-L699 (TMRCA ca. 5500 BC) need not be associated with Yamnaya, some of these values – together with the assessment of each individual archaeological culture – may question their origin in a Yamnaya-related expansion rather than in a Khvalynsk-related one.

NOTE. These are the last ones I was able to test yesterday, and I have not thought these models through, so feel free to propose other source and target groups. In particular, complex movements through the North Pontic area during the Late Eneolithic would suggest that there might have been different Steppe-ancestry-related vs. EEF-related interactions in the north-west and west Pontic area before and during the expansion of Yamnaya.


Together with North-West Indo-Europeans, Proto-Greeks are one of the key Indo-European populations that should derive from Yamnaya to confirm the Steppe hypothesis, and they will in turn improve our understanding of the preceding Palaeo-Balkan community. Unfortunately, we only have Mycenaean samples from the Aegean, with slight contributions of Steppe-related ancestry.

Still, analyses with potential source populations for this Steppe ancestry show that the Yamnaya outlier from Bulgaria is a good fit:

The comparison of all results makes it quite evident why (Srubnaya-related) Bulgaria_MLBA I2163 or Sintashta_MLBA give good fits relative to the only a priori reasonable Yamnaya and Catacomb sources: it is not about some hypothetical shared ancestor in Graeco-Aryan-speaking East Yamnaya– or even Catacomb-Poltavka-related groups, because all available Yamnaya-related peoples are almost indistinguishable from each other (at least with the sampling available today). Instead, these results reflect a sizeable contribution of similar EEF-related populations from around the Carpathians in both Steppe-related groups: Corded Ware and Yamnaya settlers from the Balkans.

Cultural groups in and around the Balkans during the Early Bronze Age. See full maps.

qpAdm magic

In hobby ancestry magic, as in magic in general, it is not about getting dubious results out of thin air: misdirection is the key. A magician needs to draw the audience’s attention to ‘remarkable’ ancestry percentages coupled with ‘great’ (?) p-values that purportedly “prove” what the audience expects to see, distracting everyone from the truly interesting aspects, like statistical design, the data used (and its shortcomings), other opposing models, a comparison of values, a proper interpretation… you name it.

I reckon – based on the examples above – that the following problems lie at the core of bad uses of qpAdm:

  1. In the formal aspect, the poor understanding of what p-values and other formal stats actually mean, and – more importantly – what they don’t mean. The simplistic trend to accept the results of a few analyses at face value is necessarily wrong, insofar as there is often no proper reasoning about what is being assessed and how, and there is never a prior expectation of what the results would look like if the alternative hypotheses were true.
  2. In the interpretation aspect, the poor judgement of accompanying any results with simplistic, superficial, irrelevant, and often plainly wrong archaeological or linguistic data selected a posteriori; the inclusion of some racial or sociopolitical overtones in the mixture to set a propitious mood in the target audience; and a sort of ritualistic theatrics with the main theme of ‘winning’, that is best completed with ad hominems.

If you get rid of all this, the most reasonable interpretation of the output of a model proposed and tested should be similar to Nick Patterson’s words in his explanation of how to use qpWave and qpAdm:

Here we see that, at least in this analysis there are reasonable models with CordedWareNeolithic is a mix of either WHG or LBKNeolithic and YamnayaEBA. (…) The point of this note is not to give a serious phylogenetic analysis but the results here certainly support a major Steppe contribution to the Corded Ware population, which is entirely concordant with the archaeology [?].

Very far, as you can see, from the childish “Eureka! I proved the source!”-kind of thinking common among hobbyists.

The Mycenaean case is an illustrative example: if the Yamnaya outlier from Bulgaria were not available, and if one were not careful when designing and assessing those mixture models, the interpretation would range from erroneous (viz. a Graeco-Aryan substrate, as I initially thought) to impossible (say, inventing migration waves of Sintashta or Srubnaya peoples into Crete). The models presented above show that a contribution of Yamnaya to Mycenaeans couldn’t be rejected, and this alone should have been enough to accept Yamnaya as the most likely source population of “Steppe ancestry” in Proto-Greeks, pending intermediate samples from the Balkans. In other words, one could actually find that ‘the best’ p-values for source populations of Mycenaeans are obtained with a combination of modern Poles + Turks, despite the impracticality of such a model…

I haven’t been able to reproduce results which supposedly showed that Corded Ware is more likely to be derived from (Pre-)Yamnaya than from other source populations, or that Corded Ware is better suited as the ancestral population of Bell Beakers. The analyses above show values in line with what has been published in recent scientific papers, and what should be expected based on linguistics and archaeology. So I’ll go out on a limb here and say that it’s only through a careful selection of outgroups and samples tested, and of as few compared models as possible, that you could eventually get this kind of results and interpretation, if at all.

Whether that kind of special care for outgroups and samples is about (a) an acceptable fine-tuning of the analyses, (b) a simplistic selection dragged from the first papers published and applied indiscriminately to all models, or (c) cherry picking analyses until results fit the expected outcome, is a question that will become mostly irrelevant when future publications continue to support an origin of the expansion of ancient Indo-European languages in Khvalynsk- and Yamnaya-related migrations.

Feel free to suggest (reasonable) modifications to correct some of these models in the comments. Also, be sure to check out other values such as proportions, SD or SNPs of the different results that I might have not taken into account when assessing ‘good’ or ‘bad’ fits.


Yamnaya replaced Europeans, but admixed heavily as they spread to Asia


Recent papers: The formation of human populations in South and Central Asia, by Narasimhan, Patterson et al. Science (2019), and An Ancient Harappan Genome Lacks Ancestry from Steppe Pastoralists or Iranian Farmers, by Shinde et al. Cell (2019).

NOTE. For direct access to Narasimhan, Patterson et al. (2019), visit this link courtesy of the first author and the Reich Lab.

I am no longer on holiday, and the paper contains a huge amount of information, with many complex issues raised rather than solved by the new samples and analyses, so I will stick to the Indo-European question, especially to some details that have changed since the publication of the preprint. For a summary of its previous findings, see the book series A Song of Sheep and Horses, in particular the sections from A Clash of Chiefs where I discuss languages and regions related to Central and South Asia.

I have updated the maps of the Prehistory Atlas, and included the most recently reported mtDNA and Y-DNA subclades. I will try to update the Eurasian PCA and related graphics, too.

NOTE. Many subclades from this paper have been reported by Kolgeh (download), Pribislav and Principe at Anthrogenica on this thread. I have checked some of them for comparison, but where my results contradict their analyses, mine are probably the wrong ones. I will upload my spreadsheets and link to them from this page whenever I find the time.

Ancestry clines (1) before and (2) after the advent of farming. Colour modified from the original to emphasize the CHG cline: notice the apparent relevance of forest-steppe groups in the formation of this CHG mating network from which Pre-Yamnaya peoples emerged.


I think the Narasimhan, Patterson et al. (2019) paper is well-balanced, and unexpectedly centered – as it should be – on the spread of Yamnaya-related ancestry (now Western_Steppe_EMBA) as the marker of Proto-Indo-European migrations, which stretched ca. 3000 BC “from Hungary in the west to the Altai mountains in the east”, spreading later Indo-European dialects after admixing with local groups, from the Atlantic to South Asia.

I. Afanasievo

I.1. East or West PIE?

I expected Afanasievo to show (1) R1b-L23(xZ2103, xL51) and (2) R1b-L51 lineages, apart from (3) the known R1b-Z2103 ones, pointing thus to an ancestral PIE community before the typical Yamnaya bottlenecks, and with R1b-L51 supporting a connection with North-West Indo-European. The presence of some samples of hg. Q pointed in this direction, too.

However, Afanasievo samples show overwhelmingly R1b-Z2103 subclades (all except for those with low coverage), all apparently under R1b-Z2108 (formed ca. 3500 BC, TMRCA ca. 3500 BC), like most samples from East Yamnaya.

This necessarily shifts the split and spread of R1b-L23 lineages to Khvalynsk/early Repin-related expansions, in line with what TMRCA estimates suggested and with what Anthony (2019) and Khokhlov (2018) have advanced about future samples from the Reich Lab.

Given the almost indistinguishable ancestry of Afanasievo and Early Yamnaya, population genomics seems to offer as yet little information to support that Pre-Tocharians were more closely related to North-West Indo-Europeans than to Graeco-Aryans, as proposed in linguistics based on the few traits shared between them and the lack of innovations characteristic of the Graeco-Aryan community.

NOTE. A new issue of Wekʷos contains an abstract from a relevant paper by Blažek on vocabulary for ‘word’, including the common NWIE *wrdʰo-/wordʰo-, but also a new (for me, at least) Northern Indo-European one: *rēki-/*rēkoi̯-, shared by Slavic and Tocharian.

The fact that bottlenecks happened around the time of the late Repin expansion suggests that we might be able to see different clans based on the predominant lineages developing around the Don-Volga area in the 4th millennium BC. The finding of Pre-R1b-L51 in Lopatino (see below), and of a Catacomb sample of hg. R1b-Z2103(Z2105-) in the North Caucasus steppe near Novoaleksandrovskij also support a star-like phylogeny of R1b-L23 stemming from the Don-Volga area.

NOTE. Interestingly, a dismissal of a common trunk between Tocharian and North-West Indo-European would mean that shared similarities between such disparate groups could be traced back to a Common Late PIE trunk, and not to a shared (western) Repin community. For an example of such a ‘pure’ East-West dialectal division, see the diagram of Adams & Mallory (2007) at the end of the post. It would thus mean a fatal blow to Kortlandt’s Indo-Slavonic group among other hypothetical groupings (remade versions of the ancient Centum-Satem division), as well as to certain assumptions about laryngeal survival or tritectalism that usually accompany them. Still, I don’t think this is the case, so the question will remain a linguistic one, and maybe some similarities will be found with a sufficient number of samples that differentiate Northern Indo-Europeans from the East Yamna/Catacomb-Poltavka-Balkan_EBA group.

Y-chromosome haplogroups of Afanasievo samples and neighbouring groups. See full maps.

I.2. Expansion or resurgence of hg. Q1b?

Haplogroup Q1b-Y6802(xY6798) seems to be the main lineage that expanded with Afanasievo, or resurged in their territory. It’s difficult to tell, because the three available samples belong to the same family, and to a later period.

NOTE. I have finally put some order into the chaos of Q1a vs. Q1b subclades in my spreadsheet and in the maps. The change from ISOGG 2016 to 2017 meant that many samples reported as Q1 subclades in papers prepared during the 2017-2018 period, which did not provide specific SNP calls, were impossible to assign with certainty. By checking some of them I could determine the specific standard used.

In favour of the presence of this haplogroup in the Pre-Yamnaya community are:

  • The statement by Anthony (2019) that Q1a [hence maybe Q1b in the new ISOGG nomenclature] represented a significant minority among an R1b-rich community.
  • The sample found in a Sintashta WSHG outlier (see below), of hg. Q1b-Y6798, and the sample from Lola, of hg. Q1b-L717, belong to other lineage(s) separated by thousands of years from the Afanasievo subclade, but might be related to the Khvalynsk expansion, as R1b-V1636 and R1b-M269 are.

These are the data that suggest multiple resurgence events in Afanasievo, rather than the expansion of Q1b lineages with late Repin:

  • Overwhelming presence of R1b in early Yamnaya and Afanasievo samples; one Q1(xQ1b) sample reported in Khvalynsk.
  • The three Q1b samples appear only later, although the wide confidence intervals of the radiocarbon dates, the different sites, and the indistinguishable ancestry may preclude a proper interpretation of the only available family.
    • Nevertheless, ancestry seems unimportant in the case of Afanasievo, since the same ancestry is found up to the Iron Age in a community of varied haplogroups.
  • Another sample of hg. Q1b-Y6802(xY6798) is found in Aigyrzhal_BA (ca. 2120 BC), with Central_Steppe_EMBA (WSHG-related) ancestry; however, this clade formed and expanded ca. 14000 BC.
  • The whole Altai – Baikal area seems to be a Q1b-L54 hotspot, although admittedly many subclades separated very early from each other, so they might be found throughout North Eurasia during the Neolithic.
  • One Afanasievo sample is reported as of hg. C in Shin (2017), and the same haplogroup is reported by Hollard (2014) for the only available sample of early Chemurchek to date, from Kulala ula, North Altai (ca. 2400 BC).
Y-chromosome haplogroups of late Afanasievo – early Chemurchek samples and neighbouring groups. See full maps.

I.3. Agricultural substrate

Evidence of continuous contacts of Central_Steppe_MLBA populations with BMAC from ca. 2100 BC on – visible in the appearance of Steppe ancestry among BMAC samples and BMAC ancestry among Steppe pastoralists – supports the close interaction between Indo-Iranian pastoralists and BMAC agriculturalists as the origin of the Asian agricultural substrate found in Proto-Indo-Iranian, hence likely related to the language of the Oxus Civilization.

Similar to the European agricultural substrate adopted by West Yamnaya settlers (both NWIE and Palaeo-Balkan speakers), Tocharian shows a few substrate terms in common with Indo-Iranian, which can be explained, through phonetic reconstruction alone, as contacts at different dialectal stages.

The recent paper by Hermes et al. (2019) supports the early integration of pastoralism and millet cultivation in Central Asia (ca. 2700 BC or earlier), with the spread of agriculture to the north – through the Inner Asian Mountain Corridor – being thus unrelated to the Indo-Iranian expansions, which might support independent loans.

However, compared to the huge number of parallel shared loans between NWIE and Palaeo-Balkan languages in the European substratum, Indo-Iranians seem to have been the first borrowers of vocabulary from Asian agriculturalists, while Proto-Tocharian shows just one certain related word, with phonetic similarities that warrant an adoption from late Indo-Iranian dialects.

Y-chromosome haplogroups of Sintashta, Central Asia, and neighbouring groups in the Early Bronze Age. See full maps.

The finding of hg. (pre-)R1b-PH155 in a BMAC sample from Dzharkutan (to the west of Xinjiang), together with hg. R1b in a sample from Central Mongolia previously reported by Shin (2017), supports the widespread presence of this lineage to the east and west of Xinjiang. This means that it might have been incorporated into Indo-Iranian migrants heading to the Xiaohe horizon, into Afanasievo-Chemurchek-derived groups, or into the latter from the former. In other words, the Island Biogeography Theory, with its explanation of founder effects, might be applicable after all to the whole Xinjiang area, and not only during the Chemurchek – Tianshan-Beilu – Xiaohe interaction.

Of course, there is no need for overly complicated models of haplogroup resurgence events in Central and South Asia, seeing that the number of hg. R1a-L657 samples (a lineage prevalent today among Indo-Aryan speakers from South Asia) among ancient Western/Central_Steppe_MLBA-related individuals is zero, and that many different lineages survived in the region. Similar cases of haplogroup resurgence and Y-DNA bottleneck events are also found in the Central and Eastern Mediterranean, and in North-Eastern Europe. From the paper:

[It] could reflect stronger ecological or cultural barriers to the spread of people in South Asia than in Europe, allowing the previously established groups more time to adapt and mix with incoming groups. A second difference is the smaller proportion of Steppe pastoralist–related ancestry in South Asia compared with Europe, its later arrival by ~500 to 1000 years, and a lower (albeit still significant) male sex bias in the admixture (…).

Y-chromosome haplogroups of samples from the Srubna-Andronovo and Andronovo-related horizon, Xiaohe, late BMAC, and neighbouring groups. See full maps.

II. R1b-Beakers replaced R1a-CWC peoples

II.1. R1a-M417-rich Corded Ware

Newly reported Corded Ware samples from Radovesice show hg. R1a-M417, at least some of them xZ645, ‘archaic’ lineages shared with the early Bergrheinfeld sample (ca. 2650 BC) and with the coeval Esperstedt family, hence supporting that it eventually became the typical Western Corded Ware lineage(s), probably dominating over the so-called A-horizon and the Single Grave culture in particular. On the other hand, R1a-Z645 was typical of bottlenecks among expanding Eastern Corded Ware groups.

Interestingly, this supports once again that the known bottlenecks under hg. R1a-M417 happened during the Corded Ware expansion, as also evidenced by the remarkably high variability of male lineages among early Corded Ware samples. Similarly, these Corded Ware samples from Bohemia form part of the typical ‘Central European’ cluster in the PCA, which once again excludes not only the ‘official’ Esperstedt outlier I1540, but also the known outlier with Yamnaya ancestry.

NOTE. The fact that Esperstedt is closely related geographically and in terms of ancestry to later Únětice samples further complicates the assumption that Únětice is a mixture of Bell Beakers and Corded Ware, being rather an admixture of incoming Bell Beakers with post-Yamnaya vanguard settlers who admixed with Corded Ware (see more on the expansion of Yamnaya ancestry). In other words, Únětice is rather an admixture of Yamnaya+EEF with Yamnaya+(CWC+EEF).

Y-chromosome haplogroups of samples from Catacomb, Poltavka, Balkan EBA, and Bell Beaker, as well as neighbouring groups. See full maps.

On Ukraine_Eneolithic I6561

If the bottlenecks are as straightforward as they appear, with a star-like phylogeny of R1a-M417 starting with the Pre-Corded Ware expansion, then what is happening with the Alexandria sample, so precisely radiocarbon dated to ca. 4045-3974 BC? The reported hg. R1a-M417 was fully compatible with this date, and R1a-Z645 could also be, but the few positive SNPs I got in my analysis do point to a potential subclade of R1a-Z94. I trust more experienced hobbyists in this ‘art’ of ascertaining the SNPs of ancient samples, and they report hg. R1a-Z93 (Z95+, Y26+, Y2-).

Seeing how Y-DNA bottlenecks worked in Yamnaya-Afanasievo and in Corded Ware and related groups, and if this sample really is so deep within R1a-Z93 in a region that should be more strongly affected by the known Neolithic Y-chromosome bottlenecks and forest-steppe ecotone, someone from the lab responsible for this sample should check its date once again, before more people keep chasing their tails with an individual that (based on its derived SNPs’ TMRCA) might actually be dated to the Bronze Age, where it could make much more sense in terms of ancestry and position in the PCA.

EDIT (14 SEP 2019): … and with the fact that he is the first individual to show the genetic adaptation for lactase persistence (I3910-T), which is only found later among Bell Beakers, and much later in Sintashta and related Steppe_MLBA peoples (see comments below).

This is also evidenced by the other Ukraine_Eneolithic sample (likely a late Yamnaya), of hg. R1b-Z2103 from Dereivka (ca. 2800 BC), who, despite being in a similar territory 1,000 years later, shows a Yamnaya ancestry wholly diluted under typically European HG ancestry, even more so than other late Sredni Stog samples from Dereivka of ca. 3600-3400 BC. This suggests a decrease in Steppe ancestry rather than the increase that should supposedly be expected based on the ancestry of the Alexandria sample…

Like the reported Chalcolithic individual of Hajji Firuz, who showed an apparently incompatible subclade and Yamnaya ancestry at least some 1,000 years before he should have, and who turned out to be from the Iron Age (see below), this may be another case of wrong radiocarbon dating.

NOTE. If this turns out to be another Hajji Firuz-like error, it would be interesting to check how well different ancestry models worked, and in whose hands exactly, and whether anyone actually pointed out that this sample was derived, and not ancestral, relative to many different samples that were used in combination with it. It would also be a great control to check whether those still supporting a Sredni Stog origin for PIE would shift their preference even further north or west, depending on where the first “true” R1a-M417 samples popped up. Such a finding could thus be a great tool to discover whether haplogroup-based bias plays a role in ancestry magic as related to the Indo-European question, i.e. whether it really is about “pure statistics”, or there is something else to it…
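The dating argument about I6561 rests on a simple constraint: a man can only be derived for a SNP that had already arisen when he lived, so a radiocarbon range that ends before the plausible formation date of his subclade points to a dating problem. A minimal Python sketch of that check follows, assuming signed calendar years (BC as negative numbers). The class and function names are hypothetical, and the formation dates in the usage lines are illustrative placeholders, not published TMRCA estimates.

```python
from dataclasses import dataclass


@dataclass
class CalibratedRange:
    """Calibrated radiocarbon range in signed calendar years (BC < 0)."""
    earliest: int  # e.g. -4045 for 4045 BC
    latest: int    # e.g. -3974 for 3974 BC


def can_carry_clade(sample: CalibratedRange, clade_formed_earliest: int) -> bool:
    """Hypothetical helper: True if the sample's date range allows it to be
    derived for a clade whose defining SNP arose no earlier than
    `clade_formed_earliest` (signed calendar year)."""
    # The latest possible date of the individual must not precede the
    # earliest possible date of the mutation he carries.
    return sample.latest >= clade_formed_earliest


# I6561 (Alexandria), dated ca. 4045-3974 BC according to the text.
i6561 = CalibratedRange(earliest=-4045, latest=-3974)

# Placeholder formation dates for illustration only (not real estimates).
for label, formed in [("earliest formation 4500 BC", -4500),
                      ("earliest formation 3500 BC", -3500)]:
    verdict = "compatible" if can_carry_clade(i6561, formed) else "date suspect"
    print(label, "->", verdict)
```

A real check would compare full probability distributions (the calibration curve on one side, the confidence interval of the SNP-based estimate on the other) rather than the point bounds used in this sketch.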

II.2. R1b-L51-rich Bell Beakers

The overwhelming majority of R1b-L51 lineages in Radovesice during the Bell Beaker period, just after the sampled Corded Ware individuals from the same site, further strengthens the hypothesis of an almost full replacement of R1a-M417 lineages from Central Europe up to southern Scandinavia after the arrival of Bell Beakers.

Yet another R1b-L151* sample has popped up in Central Europe, in the individual classified as Bilina_BA (ca. 2200-800 BC), which clusters with Bell Beakers from Bohemia, with the outlier from Turlojiškė, and with Early Slavs, suggesting once again that a group of central-east European Beakers represented the Pre-Proto-Balto-Slavic community before their spread and admixture events to the east.

The available ancient distribution of R1b-L51*, R1b-L52* or R1b-L151* is getting thus closer to the most likely origin of R1b-L51 in the expansion of East Bell Beakers, who trace their paternal ancestors to Yamnaya settlers from the Carpathian Basin:

NOTE. Some of these are from other sources, and some are samples I have checked in a hurry, so I may have missed some derived SNPs. If you send me a corrected SNP call to dismiss one of these, or more ‘archaic’ samples, I’ll correct the map accordingly. See also maps of modern distribution of R1b-M269 subclades.

Distribution of ‘archaic’ R1b-L51 subclades in ancient samples, overlaid on a map of Yamnaya and Bell Beaker migrations. In blue, Yamnaya Pre-L51 from Lopatino (not shown) and R1b-L52* from BBC Augsburg. In violet, R1b-L51 (xP312,xU106) from BBC Prague and Poland. In maroon, hg. R1b-L151* from BBC Hungary, BA Bohemia, and (not shown) a potential sample from BBC at Mondelange, which is certainly xU106, maybe xP312. Interestingly, the earliest sample of hg. R1b-U106 (a lineage more proper of northern Europe) has been found in a Bell Beaker from Radovesice (ca. 2350 BC), between two of these ‘archaic’ R1b-L51 samples; and a sample possibly of hg. R1b-ZZ11+ (ancestral to DF27 and U152) was found in a Bell Beaker from Quedlinburg, Germany (ca. 2290 BC), to the north-west of Bohemia. The oldest R1b-U152 samples are logically from Central Europe, too.
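The labels in this map follow the usual paragroup shorthand: ‘L151*’ for a sample derived for L151 and ancestral for its tested subclades, and ‘L51 (xP312,xU106)’ for a sample derived for L51 and ancestral for the listed downstream SNPs, with other SNPs left uncalled. A minimal Python sketch of how such labels can be built from SNP calls follows. It assumes a deliberately simplified tree restricted to the R1b-L51 clades named in this post (not a full phylogeny, so ‘*’ here only means ancestral for the children included in the toy tree), and the function names are hypothetical.

```python
from __future__ import annotations

# Simplified child -> parent tree, limited to clades named in this post.
PARENT = {
    "L52": "L51",
    "L151": "L52",
    "U106": "L151",
    "P312": "L151",
    "ZZ11": "P312",
    "DF27": "ZZ11",
    "U152": "ZZ11",
}


def depth(clade: str) -> int:
    """Number of steps from `clade` up to the root of the toy tree (L51)."""
    steps = 0
    while clade in PARENT:
        clade = PARENT[clade]
        steps += 1
    return steps


def is_below(clade: str, ancestor: str) -> bool:
    """True if `clade` lies strictly downstream of `ancestor`."""
    while clade in PARENT:
        clade = PARENT[clade]
        if clade == ancestor:
            return True
    return False


def paragroup_label(calls: dict[str, bool | None]) -> str:
    """Label a sample from calls: True = derived, False = ancestral, None = no call."""
    derived = [snp for snp, state in calls.items() if state is True]
    if not derived:
        raise ValueError("no derived SNP within this simplified tree")
    deepest = max(derived, key=depth)
    children = [c for c, p in PARENT.items() if p == deepest]
    if children and all(calls.get(c) is False for c in children):
        return f"{deepest}*"  # ancestral for every child in the toy tree
    negatives = sorted(c for c, state in calls.items()
                       if state is False and is_below(c, deepest))
    if negatives:
        return f"{deepest} (" + ",".join(f"x{c}" for c in negatives) + ")"
    return deepest


print(paragroup_label({"L51": True, "L52": True, "L151": True,
                       "U106": False, "P312": False}))  # L151*
print(paragroup_label({"L51": True, "P312": False, "U106": False}))  # L51 (xP312,xU106)
print(paragroup_label({"L51": True, "L52": True, "L151": True,
                       "U106": False, "P312": None}))  # L151 (xU106)
```

The third call mimics a pattern like the one described for Mondelange (negative for U106, no reliable call for P312), which is why it cannot be labelled with an asterisk.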

III. Proto-Indo-Iranian

Before the emergence of Proto-Indo-Iranian, it seems that Pre-Proto-Indo-Iranian-speaking Poltavka groups were subjected to pressure from Central_Steppe_EMBA-related peoples coming from the (south-?)east, such as those sampled from Mereke_BA. Their ‘kurgan’ culture was dated correctly to approximately the same date as Poltavka materials, but their ancestry and hg. N2(pre-N2a) – also found in a previous sample from Botai – point to their intrusive nature, and thus to difficulties in the Pre-Proto-Indo-Iranian community to keep control over the previous East Yamnaya territory in the Don-Volga-Ural steppes.

We know that the region does not show genetic continuity with a previous period (or was not under this ‘eastern’ pressure) because of an Eastern Yamnaya sample from the same site (ca. 3100 BC) showing typical Yamnaya ancestry. Before Yamnaya, it is likely that Pre-Yamnaya ancestry formed through admixture of EHG-like Khvalynsk with a North Caspian steppe population similar to the Steppe_Eneolithic samples from the North Caucasus Piedmont (see Anthony 2019), so we can also rule out some intermittent presence of a Botai/Kelteminar-like population in the region during the Khvalynsk period.

It is very likely, then, that this competition for the same territory – coupled with the known harsher climate of the late 3rd millennium BC – led Poltavka herders to their known joint venture with Abashevo chiefs in the formation of the Sintashta-Potapovka-Filatovka community of fortified settlements. Supporting these intense contacts of Poltavka herders with Central Asian populations, late ‘outliers’ from the Volga-Ural region show admixture with typical Central_Steppe_MLBA populations: one in Potapovka (ca. 2220 BC), of hg. R1b-Z2103; and four in the Sintashta_MLBA_o1 cluster (ca. 2050-1650 BC), with two samples of hg. R1b-L23 (one R1b-Z2109), one Q1b-L56(xL53), one Q1b-Y6798.

Outlier analysis reveals ancient contacts between sites. We plot the average of principal component 1 (x axis) and principal component 2 (y axis) for the West Eurasian and All Eurasian PCA plots (…). In the Middle to Late Bronze Age Steppe, we observe, in addition to the Western_Steppe_MLBA and Central_Steppe_MLBA clusters (indistinguishable in this projection), outliers admixed with other ancestries. The BMAC-related admixture in Kazakhstan documents northward gene flow onto the Steppe and confirms the Inner Asian Mountain Corridor as a conduit for movement of people.

Similar to how the Sintashta_MLBA_o2 cluster shows an admixture with central steppe populations and hg. R1a-Z645, the WSHG ancestry in those outliers from the o1 cluster of typically (or potentially) Yamnaya lineages shows that Poltavka-like herders survived well after centuries of Abashevo-Poltavka coexistence and admixture events, supporting the formation of a Proto-Indo-Iranian community from the local language as pronounced by the incomers, who dominated as elites over the fortified settlements.

The Proto-Indo-Iranian community thus likely formed in situ in the Don-Volga-Ural region, from the admixture of locals of Yamnaya ancestry with incomers of Corded Ware ancestry, represented by ca. 67% Yamnaya-like ancestry and ca. 33% ancestry from the European cline. This community formed ca. 1,000 years after the expansion of Late PIE (ca. 3500 BC), and some 500 years later it expanded a full-fledged Proto-Indo-Iranian language with the Srubna-Andronovo horizon, further admixing with ca. 9% Central_Steppe_EMBA (WSHG-related) ancestry during its migration through Central Asia, as reported in the paper.
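The proportions quoted for the Proto-Indo-Iranian community and its later Central Asian admixture can be combined with simple arithmetic to see what a later pulse does to the earlier components. A minimal sketch, assuming that the ca. 9% Central_Steppe_EMBA contribution is a fraction of the final gene pool and that all earlier components are diluted equally. The paper fitted its models directly with qpAdm, so this only illustrates the bookkeeping; it is not a re-analysis, and `add_pulse` is a hypothetical helper.

```python
from __future__ import annotations


def add_pulse(proportions: dict[str, float], source: str, fraction: float) -> dict[str, float]:
    """Return ancestry proportions after a later pulse in which `fraction` of
    the final gene pool comes from `source`; every earlier component is
    scaled down by (1 - fraction)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    mixed = {name: share * (1.0 - fraction) for name, share in proportions.items()}
    mixed[source] = mixed.get(source, 0.0) + fraction
    return mixed


# Approximate figures quoted in the text (ca. 67% / 33%, then ca. 9%).
proto_indo_iranian = {"Yamnaya-like": 0.67, "European cline": 0.33}
migrants = add_pulse(proto_indo_iranian, "Central_Steppe_EMBA", 0.09)

for name, share in migrants.items():
    print(f"{name}: {share:.1%}")
# Yamnaya-like: 61.0%
# European cline: 30.0%
# Central_Steppe_EMBA: 9.0%
```

Under this assumption the later pulse dilutes both earlier components proportionally, so their ratio (roughly 2:1) is preserved even though each absolute share drops.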

IV. Armenian

The sample from Hajji Firuz, of hg. R1b-Z2103 (xPF331), has been – as expected – re-dated to the Iron Age (ca. 1193-1019 BC), hence it may offer – together with the samples from the Levant and their Aegean-like ancestry rapidly diluted among local populations – yet another proof of how the Late Bronze Age upheaval in Europe was the cause of the Armenian migration to the Armenoid homeland, where they thrived under the strong influence from Hurro-Urartian.

Y-chromosome haplogroups of the Middle East and neighbouring groups during the Late Bronze Age / Iron Age. See full maps.

Indus Valley Civilization and Dravidian

A surprise came from the analysis reported by Shinde et al. (2019) of an Iran_N-related IVC ancestry which may have split earlier than 10000 BC from a source common to Iran hunter-gatherers of the Belt Cave.

For the controversial Elamo-Dravidian hypothesis of the Muscovite school, this difference in ancestry between both groups (IVC and Iran Neolithic) seems to be a death blow, if population genomics were even needed for that. Nevertheless, I guess that a full rejection of a recent connection will come down to more recent and subtle population movements in the area.

EDIT (12 SEP): Apparently, Iosif Lazaridis is not so sure about this deep splitting of ‘lineages’ as shown in the paper, so we may be talking about different contributions of AME+ANE/ENA, which means the Elamo-Dravidian game is afoot; at least in genomics.

I shared the idea that the Indus Valley Civilization was linked to the Proto-Dravidian community, so I’m inclined to support this statement by Narasimhan, Patterson, et al. (2019), even if based only on modern samples and a few ancient ones:

The strong correlation between ASI ancestry and present-day Dravidian languages suggests that the ASI, which we have shown formed as groups with ancestry typical of the Indus Periphery Cline moved south and east after the decline of the IVC to mix with groups with more AASI ancestry, most likely spoke an early Dravidian language.

Natural neighbour interpolation of qpAdm results – Maximum A Posteriori Estimate from the Hierarchical Model (estimates used in the Narasimhan, Patterson et al. 2019 figures) for Central_Steppe_MLBA-related (left), Indus_Periphery_West-related (center) and Andamanese_Hunter-Gatherer-related ancestry (right) among sampled modern Indian populations. In blue, peoples of IE language; in red, Dravidian; in pink, Tibeto-Burman; in black, unclassified. See full image.

I am wary of this sort of simplistic correlation with modern speakers, because we have seen what happened with the wrong assumptions about modern Balto-Slavic and Finno-Ugric speakers and their genetic profile (see e.g. here or here). In fact, I just can’t differentiate as well as those with deep knowledge in South Asian history the social stratification of the different tribal groups – with their endogamous rules under the varna and jati systems – in the ancestry maps of modern India. The pattern of ancestry and language distribution, combined with the findings of ancient populations, seems in principle straightforward, though.


The message to take home from Shinde et al. (2019) is that genomic data is fully at odds with the Anatolian homeland hypothesis – including the latest model by Heggarty (2014)* – whose relevance is still overvalued today, probably due in part to the shift of OIT proponents to more reasonable Out-of-Iran models, apparently more fashionable as a vector of Indo-Aryan languages than Eurasian steppe pastoralists?
*The authors listed this model erroneously as Heggarty (2019).

The paper seems to play with the occasional reference to Corded Ware as a vector of expansion of Indo-European languages, even after accepting the role of Yamnaya as the most evident population expanding Late PIE to western Europe – and the different ancestry that spread with Indo-Iranian to South Asia 1,000 years later. However, the most cringe-worthy aspect is the sole citation of the debunked, pseudoscientific glottochronological method used by Ringe, Warnow, and Taylor (2002) to support the so-called “steppe homeland”, a paper and dialectal scheme which keeps being referenced in papers of the Reich Lab, probably as a consequence of its use in Anthony (2007).

On the other hand, these are the equivalent simplistic comments in Narasimhan, Patterson et al. (2019):

The Steppe ancestry in South Asia has the same profile as that in Bronze Age Eastern Europe, tracking a movement of people that affected both regions and that likely spread the unique features shared between Indo-Iranian and Balto-Slavic languages. (…), which despite their vast geographic separation share the “satem” innovation and “ruki” sound laws.

Indo-European dialectal relationships, from Mallory and Adams (2006).

The only academic closely related to linguistics from the list of authors, as far as I know, is James P. Mallory, who has supported a North-West Indo-European dialect (including Balto-Slavic) for a long time – recently associating its expansion with Bell Beakers – opposed thus to a Graeco-Aryan group which shared certain innovations, “Satemization” not being one of them. Not that anyone needs to be a linguist to dismiss any similarities between Balto-Slavic and Indo-Iranian beyond this phonetic trend, mind you.

Even Anthony (2019) supports now R1b-rich Pre-Yamnaya and Yamnaya communities from the Don-Volga region expanding Middle and Late Proto-Indo-European dialects.

So how does the underlying Corded Ware ancestry of eastern Europe (where Pre-Balto-Slavs eventually spread to from Bell Beaker-derived groups) and of the highly admixed (“cosmopolitan”, according to the authors) Sintashta-Potapovka-Filatovka in the east relate to the similar-but-different phonetic trends of two unrelated IE dialects?

If only there was a language substrate that could (as Shinde et al. put it) “elegantly” explain this similar phonetic evolution, solving at the same time the question of the expansion of Uralic languages and their strong linguistic contacts with steppe peoples. Say, Eneolithic populations of mainly hunter-fisher-gatherers from the North Pontic forest-steppes with a stronger connection to metalworking