Bell Beakers and Mycenaeans from Yamnaya; Corded Ware from the forest steppe


I have recently written about the spread of Pre-Yamnaya or Yamnaya ancestry and Corded Ware-related ancestry throughout Eurasia, using exclusively analyses published by professional geneticists, and filling in the gaps and contradictory data with the most reasonable interpretations. I did so consciously, to avoid any suspicion that I was interspersing my own data or cherry picking results.

Now I’m finished recapitulating the known public data, and the only way forward is the assessment of these populations using the available datasets and free tools.

Understanding the complexities of qpAdm is fairly difficult without a proper genetic and statistical background, which I won’t pretend to have, so tweaking it to get strictly correct results would require an unending game of trial and error. I have sadly little time for this, even taking my tendency to procrastinate into account… so I have used a simple model akin to those published before – in particular, the outgroup selection by Ning, Wang et al. (2019), who seem to be part of the only group interested in distinguishing Yamnaya-related from Corded Ware-related ancestry, probably the most relevant question discussed today in population genomics regarding the Proto-Indo-European and Proto-Uralic homelands.

Supplementary Table 13. P values of rank=2 and admixture proportions in modelling Steppe ancestry populations as a three-way admixture of Eneolithic steppe, Anatolian_Neolithic, and WHG using 14 outgroups.
Left populations: Test, Eneolithic_steppe, Anatolian_Neolithic, WHG.
Right populations: Mbuti.DG, Ust_Ishim.DG, Kostenki14, MA1, Han.DG, Papuan.DG, Onge.DG, Villabruna, Vestonice16, ElMiron, Ethiopia_4500BP.SG, Karitiana.DG, Natufian, Iran_Ganj_Dareh_Neolithic.
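As a rough illustration of how such a model is specified, the sketch below writes the left and right population lists, plus a parameter file, in the plain-text layout that ADMIXTOOLS qpAdm reads. Only the population labels come from the table above; the file names, the dataset prefix `merged`, and the helper `write_qpadm_inputs` are hypothetical, and this is not necessarily the exact setup I ran.

```python
from pathlib import Path

# Population labels copied from Supplementary Table 13 above.
SOURCES = ["Eneolithic_steppe", "Anatolian_Neolithic", "WHG"]
RIGHT = [
    "Mbuti.DG", "Ust_Ishim.DG", "Kostenki14", "MA1", "Han.DG", "Papuan.DG",
    "Onge.DG", "Villabruna", "Vestonice16", "ElMiron", "Ethiopia_4500BP.SG",
    "Karitiana.DG", "Natufian", "Iran_Ganj_Dareh_Neolithic",
]


def write_qpadm_inputs(target, out_dir, prefix="merged"):
    """Write left/right lists and a par file for one target (hypothetical helper)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # qpAdm reads the first population of the left list as the target.
    left = [target] + SOURCES
    (out / "left.txt").write_text("\n".join(left) + "\n")
    (out / "right.txt").write_text("\n".join(RIGHT) + "\n")
    # Dataset file names are assumptions (EIGENSTRAT-style triplet).
    par = "\n".join([
        f"genotypename: {prefix}.geno",
        f"snpname: {prefix}.snp",
        f"indivname: {prefix}.ind",
        f"popleft: {out / 'left.txt'}",
        f"popright: {out / 'right.txt'}",
        "details: YES",
    ]) + "\n"
    par_path = out / "qpadm.par"
    par_path.write_text(par)
    return par_path
```

Calling `write_qpadm_inputs("Germany_Beaker", "models/beaker")` would produce one directory per model, which keeps the outgroup list identical across runs so that results remain comparable.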

I have used for all analyses below a merged dataset including the curated one of the Reich Lab, the latest on Central and South Asia by Narasimhan, Patterson et al. (2019), on Iberia by Olalde et al. (2019), and on the East Baltic by Saag et al. (2019), as well as datasets including samples from Wang et al. (2019) and Lamnidis et al. (2018). I used (and intend to use) the same merged dataset in all cases, despite its huge size, to avoid adding one more uncontrolled variable to the analyses, so that all results obtained can be compared.

I try to prepare in advance a set of relevant files with left and right populations for each model, following these principles:

  1. It seems a priori more reasonable to use geographically and chronologically closer proxy populations (say, Trypillia or GAC for Steppe-related peoples) than hypothetical combinations of ancestral ones (viz. Anatolian farmer, WHG, and EHG).
  2. This also means using subgroups closer to the most likely source population, such as (Don-Volga interfluve) Yamnaya_Kalmykia rather than (Middle Volga) Yamnaya_Samara for the western expansion of late Repin/early Yamnaya, or the early Germany_Corded_Ware.SG or Czech_Corded Ware for the group closest to the Proto-Corded Ware population (see below), likely neighbouring the Upper Vistula region.
  3. I usually test two source populations for different targets, which seems like a much more efficient way of using computer resources, whenever I know what I want to test, since I need my PC back for its normal use; whenever I don’t know exactly what to test, I use three-way admixture models and look for subsets to try and improve the results.
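The batch approach described in the list above can be sketched as a simple enumeration of target and source combinations. This is only an illustration under my own assumptions: the function name `candidate_models` is hypothetical, and in practice I prune the list by hand to the models that make archaeological sense.

```python
from itertools import combinations


def candidate_models(targets, sources, n_sources=2):
    """Yield (target, sources) pairs for every n-way combination of sources."""
    for target in targets:
        # A target population cannot be used as its own source.
        pool = [s for s in sources if s != target]
        for combo in combinations(pool, n_sources):
            yield target, combo
```

With `n_sources=2` this gives the cheaper two-way models; switching to `n_sources=3` gives the exploratory three-way models, at the cost of many more runs per target.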

I have probably left out some more complex models by individualizing the most relevant groups, but for the time being this will have to do. Also, no other formal stats have been used in any case, which is an evident shortcoming, ruling out an interpretation drawn directly and only from the results below.

Full qpAdm results for each batch of samples are presented in a Google Spreadsheet, with each tab (bottom of the page) showing a different combination of sources, usually in order of formally ‘best’ (first to the left) to ‘worst’ (last to the right) fits, although the order is difficult to select in highly heterogeneous target groups, as will be readily visible.

Disintegration, migration, and imports of the Azov–Black Sea region. First migration event (solid arrows): Gordineşti–Maikop expansion (groups: I – Bursuchensk; II – Zhyvotylivka; III – Vovchans’k; IV – Crimean; V – Lower Don; VI – pre-Kuban). Second migration event (hollow arrows): Repin expansion. After Rassamakin (1999), Demchenko (2016).

Corded Ware origins

The latest publications on the Yampil barrow complex have not much improved our understanding of the complexity of Corded Ware origins from an archaeological point of view, involving multiple cultural (hence likely population) influences. This bit is from Ivanova et al., Baltic-Pontic Studies (2015) 20:1, and most hypotheses of the paper remain unanswered (except maybe for the relevance of the Złota group):

In the light of the above outline therefore one should argue that the ‘architecture of barrows’ associated in the ‘Yampil landscape’ of the Middle Dniester Area with the Eneolithic (specifically, mainly with the TC), precedes the development of a similar phenomenon that can be observed from 2900/2800 BC in the Upper Dniester Area and drainage basin of the Upper Vistula, associated with the CWC [Goslar et al. 2015; Włodarczak 2006; 2007; 2008; Jarosz, Włodarczak 2007]. The most consuming research question therefore is whether ritual customs making use of Eneolithic (Tripolye) ‘barrow architecture’ could have penetrated northwards along the Dniester route, where GAC communities functioned. One could also ask what role the rituals played among the autochthons [Kośko 2000; Włodarczak 2008; 2014: 335; Ivanova, Toshchev 2015b].

This issue has already been discussed with a resulting tentative systemic taxonomy in the studies of Włodarczak, arguing for the Złota culture (ZC) in the Vistula region as an illustration of one of the (Małopolska) reception centres of civilization inspirations from the oldest Pontic ‘barrow culture’ circle associated with the Eneolithic and Early Bronze Age [Włodarczak 2008]. Notably, it is in the ZC that one can notice a set of cultural traits (catacomb grave construction, burial details, forms and decoration of vessels) analogous to those shared by the north-western Black Sea Coast groups of the forest-steppe Eneolithic (chiefly Zhyvotilovka-Volchansk) and the Late Tripolye circle (chiefly Usatovo-Gordinești-Horodiștea-Kasperovtsy).

Globular Amphorae culture “exodus” to the Danube Delta: a – Globular Amphorae culture; b – GAC (1), Gorodsk (2), Vykhvatintsy (3) and Usatovo (4) groups of Trypillia culture; c – Coţofeni culture; d – northern border of the late phase of Baden culture; red arrows – direction of Globular Amphora culture expansion; blue arrow – direction of “reflux” of Globular Amphora culture (apud Włodarczak, 2008, with changes).

Taking into account that I6561 might be wrongly dated, we cannot include the Corded Ware-like sample of the end-5th millennium BC in the analysis of Corded Ware origins. That uncertainty in the chronology of the appearance of “Steppe ancestry” in Proto-Corded Ware peoples complicates the selection of any potential source population from the CHG cline.

Nevertheless, the lack of hg. R1a-M417 and sizeable Pre-Yamnaya-related ancestry in the sampled Pontic forest-steppe Eneolithic populations (represented exclusively by two samples from Dereivka ca. 3600-3400 BC) would leave open the interesting possibility that a similar ancestry got to the forest-steppe region between modern Poland and Ukraine during the known complex population movements of the Late Eneolithic.

It is known that Corded Ware-derived groups and Steppe Maykop show bad fits for Pre-Yamnaya/Yamnaya ancestry, and also that Steppe Maykop is a potential source of “Steppe-related ancestry” within the Eneolithic CHG mating network of the Pontic-Caspian steppes and forest-steppes. Testing Corded Ware for recent Trypillia and Maykop influences, typical of Late Trypillia and Late Maykop groups in the North Pontic area (such as Zhyvotylivka–Vovchans’k and Gordineşti), side by side with potential Pre-Yamnaya and Yamnaya sources thus makes sense:

Now, the main obvious difference between Khvalynsk-Yamnaya and Corded Ware is the long-lasting, pervasive Y-chromosome bottlenecks under R1b lineages in the former, compared to the haplogroup variability and late bottleneck under R1a-M417 in the latter, which speaks in favour – on top of everything else – of a different community of sub-Neolithic hunter-gatherers including hg. R1a-M417 hijacking the expansion of Steppe_Maykop-related ancestry around the Volhynian-Podolian Upland.

Akin to how Yamnaya patrilineal descendants hijacked regional EEF (±CWC) ancestry components mainly through exogamy, dragging them into the different expanding Bell Beaker groups (see below), but kept their Indo-European languages, these hunter-gatherers that admixed with peoples of “Steppe ancestry” were the most likely vector of expansion of Uralic languages in Eastern Europe.

PCA of ancient Eurasian samples. Marked likely Proto-Corded Ware samples and potential origin of its PCA cluster based on qpAdm results. See full PCA and more related files.

Baltic Corded Ware

One of the most interesting aspects of the results above is the surprising heterogeneity of the different regional groups, which is also reflected in the Y-DNA variability of early Corded Ware samples.

Seeing how Baltic CWC groups, especially the early Latvia_LN sample, show particularly bad fits with the models above, it seems necessary to test how this population might have come to be. My first impression in 2017 was that they could represent early Corded Ware groups admixed with Yamnaya settlers through their interactions along the Dnieper-Dniester corridor.

However, I recently predicted that the most likely admixture leading to their ancestry and PCA cluster would involve a Corded Ware-like group and a group related to sub-Neolithic cultures of eastern Europe, whose best proxy to date are EHG-like Khvalynsk samples (i.e. excluding the outlier with Pre-Yamnaya ancestry, I0434):

Detail of the PCA of the Corded Ware expansion. See full PCA and more related files.

Late Corded Ware + Yamnaya vanguard

Also relevant are the mixtures of Corded Ware from Esperstedt, particularly those of sample I0104, which, as I have repeated many times in this blog, I suspected to be influenced by vanguard Yamnaya settlers:

The infeasible models of CWC + Yamnaya_Kalmykia ± Hungary_Baden (see below for Bell Beakers) and the potential cluster formed with other samples from the Baltic suggest that it could represent a more complex set of mixtures with sub-Neolithic populations. On the other hand, its location in Germany, late date (ca. 2500 BC or later), and position in the PCA, together with the good fits obtained for Germany_Beaker as a source, suggest that the increase in Steppe-related ancestry + EEF makes it impossible for the model (as I set it) to directly include Yamnaya_Kalmykia, despite this excess Steppe-related ancestry actually coming from Yamnaya vanguard groups.
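What “infeasible” means here can be made concrete. In the usual reading, a qpAdm model is treated as infeasible when any estimated admixture proportion falls outside the 0 to 1 range. A minimal sketch, assuming the proportions have already been parsed from the output into a list; the function name and tolerances are hypothetical:

```python
def is_feasible(proportions, tol=1e-9, sum_tol=0.01):
    """Return True if every proportion lies in [0, 1] and they sum to about 1.

    sum_tol is loose on purpose, since reported proportions are rounded.
    """
    in_range = all(-tol <= p <= 1 + tol for p in proportions)
    sums_to_one = abs(sum(proportions) - 1.0) <= sum_tol
    return in_range and sums_to_one
```

A model with proportions such as 1.2 and -0.2 fails this check no matter how attractive its p-value looks, which is why such a model cannot be read as support for the sources tested.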

I think it is very likely that the future publication of EEF-admixed Yamnaya_Hungary samples (or maybe even Yamnaya vanguard samples) will improve the fits of this model.

These results confirm at least the need to distrust the common interpretation of mixtures including late Corded Ware samples from Esperstedt (giving rise to the “up to 75% Yamnaya ancestry of CWC” in the 2015 papers) as representative of the Corded Ware culture as a whole, and to keep always in mind that an admixture of European BA groups including Corded Ware Esperstedt as a source also includes East BBC-like ancestry, unless proven otherwise.

Yamnaya vanguard groups in Corded Ware territory before the expansion of Bell Beakers (ca. 2500 BC). See full map.

Bell Beaker expansion

A hotly (re)debated topic in the past 6 months or so, and for all the wrong reasons, is the origin of the Bell Beaker folk. Archaeology, linguistics, and different Y-chromosome bottlenecks clearly indicate that Bell Beakers were at the origin of the North-West Indo-European expansion in Europe, while the survival of Corded Ware-related groups in north-eastern Europe is clearly related to the expansion of Uralic languages.

NOTE. For the interesting case of Proto-Indo-Iranians expanding with Corded Ware-like ancestry, see more on the formation of Sintashta-Potapovka-Filatovka from East Uralic-speaking Abashevo and Pre-Proto-Indo-Iranian-speaking Poltavka herders. See also more on R1a in Indo-Iranians and on the social complexity of Sintashta.

Nevertheless, every single discarded theory out there seems to keep coming back to life from time to time, and a new wave of interest in “Bell Beaker from the Single Grave culture” somehow got revived in the process, too, because this obsession – unlike the “Bell Beakers from Iberia Chalcolithic” – is apparently acceptable in certain circles, for some reason.

We know that Iberian Beakers, British Beakers, or Sicilian EBA – representing the most likely closest source population of speakers of Proto-Galaico-Lusitanian, Pre-Celtic Indo-European, and Proto-Elymian, respectively – have already been successfully tested for a direct origin among Western European Beakers in Olalde et al. (2018), Olalde et al. (2019), and Fernandes et al. (2019).

This success in ascertaining a closer Beaker source is probably due to the physical isolation of the specific groups (related to Germany_Beaker, Netherlands_Beaker, and NE_Mediterranean_Beaker samples, respectively) after their migration into regions dominated by peoples without Steppe-related ancestry. Furthermore, Celtic-speaking populations expanding with Urnfield south of the Pyrenees also show a good fit with a source close to France_Beaker.

So I decided to test sampled Bell Beaker populations, to see if it could shed light on the most likely source population of individual Beaker groups and the direction of migration within Central Europe, i.e. roughly eastwards or westwards. As was to be expected for closely related populations (see the relevant discussion here), an attempt to offer a simplistic analysis of direction based on formal stats does not make any sense, because most of the alternative hypotheses cannot be rejected:

Not only because of the similar values obtained, but because it is absurd to take p-values as a measure of anything, especially when most of these conflicting groups with slightly ‘better’ or ‘worse’ p-values represent multiple different mixtures of the type (Yamnaya + EEF) + (Corded Ware + EEF ± Yamnaya), impossible to distinguish without selecting proper, direct ancestral populations…
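To make the point about p-values concrete: a conventional threshold only separates models that are rejected from those that are not, and says nothing about which of the surviving models is true. A minimal sketch, assuming results have been collected as (model label, p-value) pairs and using the conventional 0.05 cut-off; the function name is hypothetical:

```python
def not_rejected(results, alpha=0.05):
    """Return labels of models whose p-value exceeds alpha.

    Passing the threshold means 'cannot be rejected', not 'proven':
    several mutually incompatible models may pass at once.
    """
    return [label for label, p in results if p > alpha]
```

If two competing source combinations both end up in the returned list, ranking them by which p-value is slightly higher adds nothing; the choice between them has to come from other evidence.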

A further example of how explosive the Bell Beaker expansion was into different territories, and of their extensive local admixture, is shown by the unsuccessful attempt by Olalde et al. (2018) to obtain an origin of the EEF source for all Beaker groups (excluding Iberian Beakers):

Investigating the genetic makeup of Beaker-complex-associated individuals. Testing different populations as a source for the Neolithic ancestry component in Beaker-complex-associated individuals. The table shows P values (* indicates values > 0.05) for the fit of the model: ‘Steppe_EBA + Neolithic/Copper Age’ source population.

Map of attested Yamnaya pit-grave burials in the Hungarian plains; superimposed in shades of blue are common areas covered by floods before the extensive controls imposed in the 19th century; in orange, cumulative thickness of sand, unfavourable loamy sand layer. Marked are settlements/findings of Boleráz (ca. 3500 BC on), Baden (until ca. 2800 BC), Kostolac (precise dates unknown), and Yamna kurgans (from ca. 3100/3000 BC on).

Now, there is a simpler way to understand what kind of Steppe-related ancestry is typical of Bell Beakers. I tested two simple models for some Beaker groups: Yamnaya + Hungary Baden vs. Corded Ware + GAC Poland. After all, the Bell Beaker folk should prefer a source more closely related to either Yamnaya Hungary or Central European Corded Ware:

Interestingly, models including Yamnaya + Baden show good fits for the most important groups related to North-West Indo-Europeans, including Bell Beakers from Germany, the Netherlands, Italy, and Poland, representing the most likely closest source populations of speakers of Pre-Proto-Celtic, Pre-Proto-Germanic, Proto-Italo-Venetic, and Pre-Proto-Balto-Slavic, respectively.

The admixed Yamnaya samples from Hungary that will hopefully be published soon by the Jena Lab will most likely further improve these fits, especially in combination with intermediate Chalcolithic populations of the Middle and Upper Danube and its tributaries, to a point where there will be an absolute chronological and geographical genomic trail from the fully Yamnaya-like Yamnaya settlers from Hungary to all North-West Indo-European-speaking groups of the Early Bronze Age.

The only difference between groups will be the gradual admixture events of their source Beaker group with local populations on their expansion paths, including peoples of mainly EEF, CWC+EEF, or CWC+EEF+Yamnaya related ancestry. There is ample evidence beyond ancestry models to support this, in particular continued Y-DNA bottlenecks under typical Yamnaya paternal lineages, mainly represented by R1b-L51 subclades.

Distribution of the Bell Beaker East Group, with its regional provinces, as of c. 2400 cal BC (after Heyd et al. 2004, modified). See full maps.

European Early Bronze Age

European EBA groups that might show conflicting results due to multiple admixture events with Corded Ware-related populations are the Únětice culture and the Nordic Late Neolithic.

The results for Únětice groups seem to be in line with what is expected of a Central European EBA population derived from Bell Beakers admixed with surrounding populations of East Bell Beaker and/or late (Epi-)Corded Ware descent.

Potential models of mixture for Nordic Late Neolithic samples – despite the bad fits due to the lack of direct ancestral CWC and BBC groups from Denmark – seem to be impossible to justify as derived exclusively from Single Grave or (even less) from Battle Axe peoples, supporting immigration waves of Bell Beakers from the south and further admixture events with local groups through maritime domination.

PCA of ancient European samples. Marked are Bronze Age clusters. See full PCAs.

Balkans Bronze Age

The potential origin of the typical Corded Ware Steppe-related ancestry in the social upheaval and population movements of the Dnieper-Dniester forest-steppe corridor during the 4th millennium BC raises the question: how much do Balkan Bronze Age groups owe their ancestry to a population different than the spread of Pre-Yamnaya-like Suvorovo-Novodanilovka chieftains? Furthermore, which Bronze Age groups seem to be more likely derived exclusively from Pre-Yamnaya groups, and which are more likely to be derived from a mixture of Yamnaya and Pre-Yamnaya? Do the formal stats obtained correspond to the expected results for each group?

Since the expansion of hg. I2a-L699 (TMRCA ca. 5500 BC) need not be associated with Yamnaya, some of these values – together with the assessment of each individual archaeological culture – may question their origin in a Yamnaya-related expansion rather than in a Khvalynsk-related one.

NOTE. These are the last ones I was able to test yesterday, and I have not thought these models through, so feel free to propose other source and target groups. In particular, complex movements through the North Pontic area during the Late Eneolithic would suggest that there might have been different Steppe-ancestry-related vs. EEF-related interactions in the north-west and west Pontic area before and during the expansion of Yamnaya.


One of the key Indo-European populations that should be derived from Yamnaya to confirm the Steppe hypothesis, together with North-West Indo-Europeans, are Proto-Greeks, who will in turn improve our understanding of the preceding Palaeo-Balkan community. Unfortunately, we only have Mycenaean samples from the Aegean, with slight contributions of Steppe-related ancestry.

Still, analyses with potential source populations for this Steppe ancestry show that the Yamnaya outlier from Bulgaria is a good fit:

The comparison of all results makes quite evident the reason for the good fits of (Srubnaya-related) Bulgaria_MLBA I2163 or of Sintashta_MLBA relative to the only a priori reasonable Yamnaya and Catacomb sources: it is not about some hypothetical shared ancestor in Graeco-Aryan-speaking East Yamnaya- or even Catacomb-Poltavka-related groups, because all available Yamnaya-related peoples are almost indistinguishable from each other (at least with the sampling available today). These results reflect a sizeable contribution of similar EEF-related populations from around the Carpathians in both Steppe-related groups: Corded Ware and Yamnaya settlers from the Balkans.

Cultural groups in and around the Balkans during the Early Bronze Age. See full maps.

qpAdm magic

In hobby ancestry magic, as in magic in general, it is not about getting dubious results out of thin air: misdirection is the key. A magician needs to draw the audience’s attention to ‘remarkable’ ancestry percentages coupled with ‘great’ (?) p-values that purportedly “prove” what the audience expects to see, distracting everyone from the truly interesting aspects, like statistical design, the data used (and its shortcomings), other opposing models, a comparison of values, a proper interpretation… you name it.

I reckon – based on the examples above – that the following problems lie at the core of bad uses of qpAdm:

  1. In the formal aspect, the poor understanding of what p-values and other formal stats obtained actually mean, and – more importantly – what they don’t mean. The simplistic trend to accept results of a few analyses at face value is necessarily wrong, in so far as there is often no proper reasoning of what is being assessed and how, and there is never a previous opinion about what could be expected if the alternative hypotheses were true.
  2. In the interpretation aspect, the poor judgement of accompanying any results with simplistic, superficial, irrelevant, and often plainly wrong archaeological or linguistic data selected a posteriori; the inclusion of some racial or sociopolitical overtones in the mixture to set a propitious mood in the target audience; and a sort of ritualistic theatrics with the main theme of ‘winning’, that is best completed with ad hominems.

If you get rid of all this, the most reasonable interpretation of the output of a model proposed and tested should be similar to Nick Patterson’s words in his explanation of qpWave and qpAdm use:

Here we see that, at least in this analysis there are reasonable models with CordedWareNeolithic is a mix of either WHG or LBKNeolithic and YamnayaEBA. (…) The point of this note is not to give a serious phylogenetic analysis but the results here certainly support a major Steppe contribution to the Corded Ware population, which is entirely concordant with the archaeology [?].

Very far, as you can see, from the childish “Eureka! I proved the source!”-kind of thinking common among hobbyists.

The Mycenaean case is an illustrative example: if the Yamnaya outlier from Bulgaria were not available, and if one were not careful when designing and assessing those mixture models, the interpretation would range from erroneous (viz. a Graeco-Aryan substrate, as I initially thought) to impossible (say, inventing migration waves of Sintashta or Srubnaya peoples into Crete). The models presented above show that a contribution of Yamnaya to Mycenaeans couldn’t be rejected, and this alone should have been enough to accept Yamnaya as the most likely source population of “Steppe ancestry” in Proto-Greeks, pending intermediate samples from the Balkans. In other words, one could actually find that ‘the best’ p-values for source populations of Mycenaeans come from a combination of modern Poles + Turks, despite the impracticality of such a model…

I haven’t been able to reproduce results which supposedly showed that Corded Ware is more likely to be derived from (Pre-)Yamnaya than from any other source population, or that Corded Ware is better suited as the ancestral population of Bell Beakers. The analyses above show values in line with what has been published in recent scientific papers, and what should be expected based on linguistics and archaeology. So I’ll go out on a limb here and say that it’s only through a careful selection of outgroups and samples tested, and of as few compared models as possible, that you could eventually get these kinds of results and interpretation, if at all.

Whether that kind of special care for outgroups and samples is about (a) an acceptable fine-tuning of the analyses, (b) a simplistic selection dragged from the first papers published and applied indiscriminately to all models, or (c) cherry picking analyses until results fit the expected outcome, is a question that will become mostly irrelevant when future publications continue to support an origin of the expansion of ancient Indo-European languages in Khvalynsk- and Yamnaya-related migrations.

Feel free to suggest (reasonable) modifications to correct some of these models in the comments. Also, be sure to check out other values such as proportions, SD or SNPs of the different results that I might not have taken into account when assessing ‘good’ or ‘bad’ fits.


Yamnaya ancestry: mapping the Proto-Indo-European expansions


The latest papers from Ning et al. Cell (2019) and Anthony JIES (2019) have offered some interesting new data, supporting once more what could be inferred since 2015, and what was evident in population genomics since 2017: that Proto-Indo-Europeans expanded under R1b bottlenecks, and that the so-called “Steppe ancestry” referred to two different components, one – Yamnaya or Steppe_EMBA ancestry – expanding with Proto-Indo-Europeans, and the other one – Corded Ware or Steppe_MLBA ancestry – expanding with Uralic speakers.

The following maps are based on formal stats published in the papers and supplementary materials from 2015 until today, mainly on Wang et al. (2018 & 2019), Mathieson et al. (2018) and Olalde et al. (2018), and others like Lazaridis et al. (2016), Lazaridis et al. (2017), Mittnik et al. (2018), Lamnidis et al. (2018), Fernandes et al. (2018), Jeong et al. (2019), Olalde et al. (2019), etc.

NOTE. As in the Corded Ware ancestry maps, the selected reports in this case are centered on the prototypical Yamnaya ancestry vs. other simplified components, so everything else refers to simplistic ancestral components widespread across populations that do not necessarily share any recent connection, much less a language. In fact, most of the time they clearly didn’t. They can be interpreted as “EHG that is not part of the Yamnaya component”, or “CHG that is not part of the Yamnaya component”. They can’t be read as “expanding EHG people/language” or “expanding CHG people/language”, at least no more than maps of “Steppe ancestry” can be read as “expanding Steppe people/language”. Also, remember that I have left the default behaviour for color classification, so that the highest value (i.e. 1, or white colour) could mean anything from 10% to 100% depending on the specific ancestry and period; that’s what the legend is for… But, fere libenter homines id quod volunt credunt.
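The caveat about colour classification in the note above can be illustrated with a per-map rescaling. This is a minimal sketch under the assumption that the default behaviour amounts to a min-max stretch of each map’s own values; the function name is hypothetical and the mapping software’s actual default may differ in detail.

```python
def rescale_for_map(values):
    """Min-max rescale one map's ancestry proportions to the 0..1 colour range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A flat map has no contrast to stretch.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

Under this assumption, a map whose proportions run from 0 to 0.5 and a map whose proportions run from 0 to 1 both end with their maximum drawn in the same colour, so colours are only comparable across maps through the legend.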


  1. Neolithic or the formation of Early Indo-European
  2. Eneolithic or the expansion of Middle Proto-Indo-European
  3. Chalcolithic / Early Bronze Age or the expansion of Late Proto-Indo-European
  4. European Early Bronze Age and MLBA or the expansion of Late PIE dialects

1. Neolithic

Anthony (2019) agrees with the most likely explanation of the CHG component found in Yamnaya, as derived from steppe hunter-fishers close to the lower Volga basin. The ultimate origin of this specific CHG-like component that eventually formed part of the Pre-Yamnaya ancestry is not clear, though:

The hunter-fisher camps that first appeared on the lower Volga around 6200 BC could represent the migration northward of un-admixed CHG hunter-fishers from the steppe parts of the southeastern Caucasus, a speculation that awaits confirmation from aDNA.

Natural neighbor interpolation of CHG ancestry among Neolithic populations. See full map.

The typical EHG component that eventually formed part of Pre-Yamnaya ancestry came from the Middle Volga Basin, most likely close to the Samara region, as shown by the sampled Samara hunter-gatherer (ca. 5600-5500 BC):

After 5000 BC domesticated animals appeared in these same sites in the lower Volga, and in new ones, and in grave sacrifices at Khvalynsk and Ekaterinovka. CHG genes and domesticated animals flowed north up the Volga, and EHG genes flowed south into the North Caucasus steppes, and the two components became admixed.

Natural neighbor interpolation of EHG ancestry among Neolithic populations. See full map.

To the west, in the Dnieper-Dniester area, WHG became the dominant ancestry after the Mesolithic, at the expense of EHG, revealing a likely mating network reaching to the north into the Baltic:

Like the Mesolithic and Neolithic populations here, the Eneolithic populations of Dnieper-Donets II type seem to have limited their mating network to the rich, strategic region they occupied, centered on the Rapids. The absence of CHG shows that they did not mate frequently if at all with the people of the Volga steppes (…)

Natural neighbor interpolation of WHG ancestry among Neolithic populations. See full map.

North-West Anatolia Neolithic ancestry, typical of expanding Early European farmers, is found up to the border of the Dniester, as Anthony (2007) had predicted.

Natural neighbor interpolation of Anatolia Neolithic ancestry among Neolithic populations. See full map.

2. Eneolithic

From Anthony (2019):

After approximately 4500 BC the Khvalynsk archaeological culture united the lower and middle Volga archaeological sites into one variable archaeological culture that kept domesticated sheep, goats, and cattle (and possibly horses). In my estimation, Khvalynsk might represent the oldest phase of PIE.

(…) this middle Volga mating network extended down to the North Caucasian steppes, where at cemeteries such as Progress-2 and Vonyuchka, dated 4300 BC, the same Khvalynsk-type ancestry appeared, an admixture of CHG and EHG with no Anatolian Farmer ancestry, with steppe-derived Y-chromosome haplogroup R1b. These three individuals in the North Caucasus steppes had higher proportions of CHG, overlapping Yamnaya. Without any doubt, a CHG population that was not admixed with Anatolian Farmers mated with EHG populations in the Volga steppes and in the North Caucasus steppes before 4500 BC. We can refer to this admixture as pre-Yamnaya, because it makes the best currently known genetic ancestor for EHG/CHG R1b Yamnaya genomes.

From Wang et al. (2019):

Three individuals from the sites of Progress 2 and Vonyuchka 1 in the North Caucasus piedmont steppe (‘Eneolithic steppe’), which harbour EHG and CHG related ancestry, are genetically very similar to Eneolithic individuals from Khvalynsk II and the Samara region. This extends the cline of dilution of EHG ancestry via CHG-related ancestry to sites immediately north of the Caucasus foothills

Natural neighbor interpolation of Pre-Yamnaya ancestry among Neolithic populations. See full map. This map corresponds roughly to the map of Khvalynsk-Novodanilovka expansion, and in particular to the expansion of horse-head pommel-scepters (read more about Khvalynsk, and specifically about horse symbolism)

NOTE. Unpublished samples from Ekaterinovka have been previously reported as within the R1b-L23 tree. Interestingly, although the Varna outlier is a female, the Balkan outlier from Smyadovo shows two positive SNP calls for hg. R1b-M269. However, its poor coverage makes its most conservative haplogroup prediction R-M343.

The formation of this Pre-Yamnaya ancestry sets this Volga-Caucasus Khvalynsk community apart from the rest of the EHG-like population of eastern Europe.

Natural neighbor interpolation of non-Pre-Yamnaya EHG ancestry among Eneolithic populations. See full map.

Anthony (2019) seems to rely on ADMIXTURE graphics when he writes that the late Sredni Stog sample from Alexandria shows “80% Khvalynsk-type steppe ancestry (CHG&EHG)”. While this seems the most logical conclusion of what might have happened after the Suvorovo-Novodanilovka expansion through the North Pontic steppes (see my post on “Steppe ancestry” step by step), formal stats have not confirmed that.

In fact, analyses published in Wang et al. (2019) rejected that Corded Ware groups are derived from this Pre-Yamnaya ancestry, a reality that had already been hinted at in Narasimhan et al. (2018), when Steppe_EMBA showed a poor fit for expanding Srubna-Andronovo populations. Hence the need to consider the whole CHG component of the North Pontic area separately:

Natural neighbor interpolation of non-Pre-Yamnaya CHG ancestry among Eneolithic populations. See full map. You can read more about population movements in the late Sredni Stog and closer to the Proto-Corded Ware period.

NOTE. Fits for WHG + CHG + EHG in Neolithic and Eneolithic populations are taken in part from Mathieson et al. (2019) supplementary materials (download Excel here). Unfortunately, while data on the Ukraine_Eneolithic outlier from Alexandria abounds, I don’t have specific data on the so-called ‘outlier’ from Dereivka compared to the other two analyzed together, so these maps of CHG and EHG expansion are possibly showing a lesser distribution to the west than the real one ca. 4000-3500 BC.

Natural neighbor interpolation of WHG ancestry among Eneolithic populations. See full map.

Anatolia Neolithic ancestry clearly spread to the east into the north Pontic area through a Middle Eneolithic mating network, most likely opened after the Khvalynsk expansion:

Natural neighbor interpolation of Anatolia Neolithic ancestry among Eneolithic populations. See full map.
Natural neighbor interpolation of Iran Chl. ancestry among Eneolithic populations. See full map.

Regarding Y-chromosome haplogroups, Anthony (2019) insists on the evident association of Khvalynsk, Yamnaya, and the spread of Pre-Yamnaya and Yamnaya ancestry with the expansion of elite R1b-L754 (and some I2a2) individuals:

Y-DNA haplogroups in West Eurasia during the Early Eneolithic in the Pontic-Caspian steppes. See full map, and see culture, ADMIXTURE, Y-DNA, and mtDNA maps of the Early Eneolithic and Late Eneolithic.

3. Early Bronze Age

Data from Wang et al. (2019) show that Corded Ware-derived populations do not have good fits for Eneolithic_Steppe-like ancestry, no matter the model. In other words: Corded Ware populations show not only a higher contribution of Anatolia Neolithic ancestry (ca. 20-30% compared to the ca. 2-10% of Yamnaya); they show a different EHG + CHG combination compared to the Pre-Yamnaya one.

Supplementary Table 13. P values of rank=2 and admixture proportions in modelling Steppe ancestry populations as a three-way admixture of Eneolithic steppe Anatolian_Neolithic and WHG using 14 outgroups.
Left populations: Test, Eneolithic_steppe, Anatolian_Neolithic, WHG.
Right populations: Mbuti.DG, Ust_Ishim.DG, Kostenki14, MA1, Han.DG, Papuan.DG, Onge.DG, Villabruna, Vestonice16, ElMiron, Ethiopia_4500BP.SG, Karitiana.DG, Natufian, Iran_Ganj_Dareh_Neolithic.

Yamnaya Kalmykia and Afanasevo show the closest fits to the Eneolithic population of the North Caucasian steppes, thus rejecting sizeable contributions from Anatolia Neolithic and/or WHG, as shown by the SD values. Both thus probably show a Pre-Yamnaya ancestry closest to the late Repin population.
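For readers who want to try this kind of model themselves, here is a minimal sketch of how the left and right population lists of Supplementary Table 13 could be written out for qpAdm. The source and outgroup labels are copied from the table; the target label `Yamnaya_Kalmykia`, the dataset prefix `merged_dataset`, and the file names `left.txt` and `right.txt` are assumptions for illustration only, and the labels in any given dataset may differ.

```python
# Sketch: build the text of the qpAdm input files for the three-way model
# Test = Eneolithic_steppe + Anatolian_Neolithic + WHG.
# Labels are taken from Supplementary Table 13 of Wang et al. (2019).

SOURCES = ["Eneolithic_steppe", "Anatolian_Neolithic", "WHG"]

OUTGROUPS = [
    "Mbuti.DG", "Ust_Ishim.DG", "Kostenki14", "MA1", "Han.DG",
    "Papuan.DG", "Onge.DG", "Villabruna", "Vestonice16", "ElMiron",
    "Ethiopia_4500BP.SG", "Karitiana.DG", "Natufian",
    "Iran_Ganj_Dareh_Neolithic",
]


def left_list(target):
    """Left populations: the target comes first, then the candidate sources."""
    return "\n".join([target] + SOURCES) + "\n"


def right_list():
    """Right populations: the 14 outgroups, one per line."""
    return "\n".join(OUTGROUPS) + "\n"


def parfile(prefix, left_path="left.txt", right_path="right.txt"):
    """Parameter file text; `prefix` is a hypothetical EIGENSTRAT file prefix."""
    lines = [
        f"genotypename: {prefix}.geno",
        f"snpname: {prefix}.snp",
        f"indivname: {prefix}.ind",
        f"popleft: {left_path}",
        f"popright: {right_path}",
        "details: YES",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "Yamnaya_Kalmykia" is an assumed label for the Test population.
    print(left_list("Yamnaya_Kalmykia"), end="")
    print(parfile("merged_dataset"), end="")
```

Swapping the target label while keeping the sources and outgroups fixed is what makes the fits of different Test populations comparable with each other.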

Modelling results for the Steppe and Caucasus cluster. Admixture proportions based on (temporally and geographically) distal and proximal models, showing additional AF ancestry in Steppe groups and additional gene flow from the south in some of the Steppe groups as well as the Caucasus groups. See tables above. Modified from Wang et al. (2019). Within a blue square, Yamnaya-related groups; within a cyan square, Corded Ware-related groups. Green background behind best p-values. In red circle, SD of AF/WHG ancestry contribution in Afanasevo and Yamnaya Kalmykia, with ranges that almost include 0%.

EBA maps include data from Wang et al. (2018) supplementary materials, specifically unpublished Yamnaya samples from Hungary that appeared in the analyses of the preprint, but which were taken out of the definitive paper. Their location among Yamnaya settlers from Hungary is speculative, although most uncovered kurgans in Hungary are concentrated in the Tisza-Danube interfluve.

Natural neighbor interpolation of Pre-Yamnaya ancestry among Early Bronze Age populations. See full map. This map corresponds roughly with the known expansion of late Repin/Yamnaya settlers.

The Y-chromosome bottleneck of elite males from Proto-Indo-European clans under R1b-L754 and some I2a2 subclades, already visible in the Khvalynsk sampling, became even more noticeable in the subsequent expansion of late Repin/early Yamnaya elites under R1b-L23 and I2a-L699:

Y-DNA haplogroups in West Eurasia during the Yamnaya expansion. See full map and maps of cultures, ADMIXTURE, Y-DNA, and mtDNA of the Early Chalcolithic and Yamnaya Hungary.

Maps of CHG, EHG, Anatolia Neolithic, and probably WHG show the expansion of these components among Corded Ware-related groups in North Eurasia, apart from other cultures close to the Caucasus:

NOTE. For maps with actual formal stats of Corded Ware ancestry from the Early Bronze Age to the modern times, you can read the post Corded Ware ancestry in North Eurasia and the Uralic expansion.

Natural neighbor interpolation of non-Pre-Yamnaya CHG ancestry among Early Bronze Age populations. See full map.
Natural neighbor interpolation of non-Pre-Yamnaya EHG ancestry among Early Bronze Age populations. See full map.
Natural neighbor interpolation of WHG ancestry among Early Bronze Age populations. See full map.
Natural neighbor interpolation of Anatolia Neolithic ancestry among Early Bronze Age populations. See full map.
Natural neighbor interpolation of Iran Chl. ancestry among Early Bronze Age populations. See full map.

4. Middle to Late Bronze Age

The following maps show the most likely distribution of Yamnaya ancestry during the Bell Beaker-, Balkan-, and Sintashta-Potapovka-related expansions.

4.1. Bell Beakers

The amount of Yamnaya ancestry is probably overestimated among populations where Bell Beakers replaced Corded Ware. A map of Yamnaya ancestry among Bell Beakers gets trickier for the following reasons:

  • Expanding Repin peoples of Pre-Yamnaya ancestry must have admixed through exogamy with late Sredni Stog/Proto-Corded Ware peoples during their expansion into the North Pontic area, and Sredni Stog in turn probably had some Pre-Yamnaya admixture, too (although it doesn’t appear in the simplistic formal stats above). This is supported by the increase of Anatolia farmer ancestry in more western Yamna samples.
  • Later, Yamnaya admixed through exogamy with Corded Ware-like populations in Central Europe during their expansion. Even samples from the Middle to Upper Danube and around the Lower Rhine will probably show increasing contributions of Steppe_MLBA, at the same time as they show an increasing proportion of EEF-related ancestry.
  • To complicate things further, the late Corded Ware Esperstedt family (from ca. 2500 BC or later) shows, in turn, what seems like a recent admixture with Yamnaya vanguard groups, with the sample of highest Yamnaya ancestry being the paternal uncle of other individuals (all of hg. R1a-M417), suggesting that there might have been many similar Central European mating networks from the mid-3rd millennium BC on, of (mainly) Yamnaya-like R1b elites displaying a small proportion of CW-like ancestry admixing through exogamy with Corded Ware-like peoples who already had some Yamnaya ancestry.

Natural neighbor interpolation of Yamnaya ancestry among Middle to Late Bronze Age populations (Esperstedt CWC site close to BK_DE, label is hidden by BK_DE_SAN). See full map. You can see how this map correlates with the map of Late Copper Age migrations and the Yamnaya into Bell Beaker expansion.

NOTE. Terms like “exogamy”, “male-driven migration”, and “sex bias”, are not only based on the Y-chromosome bottlenecks visible in the different cultural expansions since the Palaeolithic. Despite the scarce sampling available in 2017 for analysis of “Steppe ancestry”-related populations, it appeared to show already a male sex bias in Goldberg et al. (2017), and it has been confirmed for Neolithic and Copper Age population movements in Mathieson et al. (2018) – see Supplementary Table 5. The analysis of male-biased expansion of “Steppe ancestry” in CWC Esperstedt and Bell Beaker Germany is, for the reasons stated above, not very useful to distinguish their mutual influence, though.

Based on data from Olalde et al. (2019), Bell Beakers from Germany are the closest sampled ones to expanding East Bell Beakers, and those close to the Rhine – i.e. French, Dutch, and British Beakers in particular – show a clear excess “Steppe ancestry” due to their exogamy with local Corded Ware groups:

Only one 2-way model fits the ancestry in Iberia_CA_Stp with P-value>0.05: Germany_Beaker + Iberia_CA. Finding a Bell Beaker-related group as a plausible source for the introduction of steppe ancestry into Iberia is consistent with the fact that some of the individuals in the Iberia_CA_Stp group were excavated in Bell Beaker associated contexts. Models with Iberia_CA and other Bell Beaker groups such as France_Beaker (P-value=7.31E-06), Netherlands_Beaker (P-value=1.03E-03) and England_Beaker (P-value=4.86E-02) failed, probably because they have slightly higher proportions of steppe ancestry than the true source population.


The exogamy with Corded Ware-like groups in the Lower Rhine Basin seems at this point undeniable, as is the origin of Bell Beakers around the Middle-Upper Danube Basin from Yamnaya Hungary.

To avoid this excess “Steppe ancestry” showing up in the maps, since Bell Beakers from Germany pack the most Yamnaya ancestry among East Bell Beakers outside Hungary (ca. 51.1% “Steppe ancestry”), I equated this maximum with BK_Scotland_Ach (which shows ca. 61.1% “Steppe ancestry”, highest among western Beakers), and applied a simple rule of three for “Steppe ancestry” in Dutch and British Beakers.
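The adjustment described above is just a proportional rescaling, which can be sketched as follows. The two anchor values (51.1% and 61.1%) are the ones given in the text; the function name and the sample input are hypothetical and serve only to illustrate the arithmetic.

```python
# Rule of three: rescale "Steppe ancestry" so that the highest western
# Beaker value (BK_Scotland_Ach, ca. 61.1%) is equated with the value
# taken as the maximum for East Bell Beakers outside Hungary (ca. 51.1%).

GERMANY_BEAKER_PCT = 51.1   # value from the text
SCOTLAND_ACH_PCT = 61.1     # value from the text


def rescale_steppe(raw_pct):
    """Return the adjusted percentage for a Dutch or British Beaker group."""
    if raw_pct < 0:
        raise ValueError("ancestry percentage cannot be negative")
    return raw_pct * GERMANY_BEAKER_PCT / SCOTLAND_ACH_PCT


if __name__ == "__main__":
    # 55.0 is a made-up input, not a published estimate.
    print(round(rescale_steppe(55.0), 1))
```

By construction, BK_Scotland_Ach itself maps to 51.1%, and every other group keeps its relative position below that ceiling.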

NOTE. Formal stats for “Steppe ancestry” in Bell Beaker groups are available in Olalde et al. (2018) supplementary materials (PDF). I didn’t apply this adjustment to Bk_FR groups because of the R1b Bell Beaker sample from the Champagne/Alsace region reported by Samantha Brunel that will pack more Yamnaya ancestry than any other sampled Beaker to date, hence probably driving the Yamnaya ancestry up in French samples.

The most likely outcome in the following years, when Yamnaya and Corded Ware ancestry are investigated separately, is that Yamnaya ancestry will be much lower the farther away from the Middle and Lower Danube region, similar to the case in Iberia, so the map above probably overestimates this component in most Beakers to the north of the Danube. Even the late Hungarian Beaker samples, who pack the highest Yamnaya ancestry (up to 75%) among Beakers, represent likely a back-migration of Moravian Beakers, and will probably show a contribution of Corded Ware ancestry due to the exogamy with local Moravian groups.

Despite this decreasing admixture as Bell Beakers spread westward, the explosive expansion of Yamnaya R1b male lineages (in the words of David Reich) and the radical replacement of local ones – whether derived from Corded Ware or Neolithic groups – shows the true extent of the North-West Indo-European expansion in Europe:

Y-DNA haplogroups in West Eurasia during the Bell Beaker expansion. See full map and see maps of cultures, ADMIXTURE, Y-DNA, and mtDNA of the Late Copper Age and of the Yamnaya-Bell Beaker transition.

4.2. Palaeo-Balkan

Data on Palaeo-Balkan movements are still scarce, although it is known that:

  1. Yamnaya ancestry appears among Mycenaeans, with the Yamnaya Bulgaria sample being its best current ancestral fit;
  2. the emergence of steppe ancestry and R1b-M269 in the eastern Mediterranean was associated with Ancient Greeks;
  3. Thracians, Albanians, and Armenians also show R1b-M269 subclades and “Steppe ancestry”.

4.3. Sintashta-Potapovka-Filatovka

Interestingly, Potapovka is the only Corded Ware-derived culture that shows good fits for Yamnaya ancestry, despite having replaced Poltavka in the region under the same Corded Ware-like (Abashevo) influence as Sintashta.

This proves that there was a period of admixture in the Pre-Proto-Indo-Iranian community between CWC-like Abashevo and Yamnaya-like Catacomb-Poltavka herders in the Sintashta-Potapovka-Filatovka community, probably more easily detectable in this group because of the specific temporal and geographic sampling available.

Supplementary Table 14. P values of rank=3 and admixture proportions in modelling Steppe ancestry populations as a four-way admixture of distal sources EHG, CHG, Anatolian_Neolithic and WHG using 14 outgroups.
Left populations: Steppe cluster, EHG, CHG, WHG, Anatolian_Neolithic
Right populations: Mbuti.DG, Ust_Ishim.DG, Kostenki14, MA1, Han.DG, Papuan.DG, Onge.DG, Villabruna, Vestonice16, ElMiron, Ethiopia_4500BP.SG, Karitiana.DG, Natufian, Iran_Ganj_Dareh_Neolithic.

Srubnaya ancestry shows a best fit with non-Pre-Yamnaya ancestry, i.e. with different CHG + EHG components – possibly because the more western Potapovka (ancestral to Proto-Srubnaya Pokrovka) also showed good fits for it. Srubnaya shows poor fits for Pre-Yamnaya ancestry probably because Corded Ware-like (Abashevo) genetic influence increased during its formation.

On the other hand, more eastern Corded Ware-derived groups like Sintashta and its more direct offshoot Andronovo show poor fits with this model, too, but their fits are still better than those including Pre-Yamnaya ancestry.

Natural neighbor interpolation of non-Pre-Yamnaya EHG ancestry among Middle to Late Bronze Age populations. See full map.
Natural neighbor interpolation of non-Pre-Yamnaya CHG ancestry among Middle to Late Bronze Age populations. See full map.
Natural neighbor interpolation of Anatolia Neolithic ancestry among Middle to Late Bronze Age populations. See full map.
Natural neighbor interpolation of Iran Chl. ancestry among Middle to Late Bronze Age populations. See full map.

NOTE. For maps with actual formal stats of Corded Ware ancestry from the Early Bronze Age to the modern times, you should read the post Corded Ware ancestry in North Eurasia and the Uralic expansion instead.

The bottleneck of Proto-Indo-Iranians under R1a-Z93 was not yet complete by the time when the Sintashta-Potapovka-Filatovka community expanded with the Srubna-Andronovo horizon:

Y-DNA haplogroups in West Eurasia during the European Early Bronze Age. See full map and see maps of cultures, ADMIXTURE, Y-DNA, and mtDNA of the Early Bronze Age.

4.4. Afanasevo

At the end of the Afanasevo culture, at least three samples show hg. Q1b (ca. 2900-2500 BC), which seemed to point to a resurgence of local lineages, despite continuity of the prototypical Pre-Yamnaya ancestry. On the other hand, Anthony (2019) makes this cryptic statement:

Yamnaya men were almost exclusively R1b, and pre-Yamnaya Eneolithic Volga-Caspian-Caucasus steppe men were principally R1b, with a significant Q1a minority.

Since the only available samples from the Khvalynsk community are R1b (x3), Q1a (x1), and R1a (x1), it seems strange that Anthony would talk about a “significant minority”, unless Q1a (potentially Q1b in the newer nomenclature) pops up in some more individuals among the ca. 30 new ones to be published. Because he also mentions I2a2 as appearing in one elite burial, it seems Q1a (like R1a-M459) will not appear under elite kurgans, although it is still possible that hg. Q1a was involved in the expansion of Afanasevo to the east.

Y-DNA haplogroups in West Eurasia during the Middle Bronze Age. See full map and see maps of cultures, ADMIXTURE, Y-DNA, and mtDNA of the Middle Bronze Age and the Late Bronze Age.

Okunevo, which replaced Afanasevo in the Altai region, shows a majority of hg. Q1b, but also some R1b-M269 samples typical of Afanasevo, suggesting partial genetic continuity.

NOTE. Other sampled Siberian populations clearly show a variety of Q subclades that likely expanded during the Palaeolithic, such as Baikal EBA samples from Ust’Ida and Shamanka with a majority of Q1b, and hg. Q reported from Elunino, Sagsai, Khövsgöl, and also among peoples of the Srubna-Andronovo horizon (the Krasnoyarsk MLBA outlier), and in Karasuk.

From Damgaard et al. Science (2018):

(…) in contrast to the lack of identifiable admixture from Yamnaya and Afanasievo in the CentralSteppe_EMBA, there is an admixture signal of 10 to 20% Yamnaya and Afanasievo in the Okunevo_EMBA samples, consistent with evidence of western steppe influence. This signal is not seen on the X chromosome (qpAdm P value for admixture on X 0.33 compared to 0.02 for autosomes), suggesting a male-derived admixture, also consistent with the fact that 1 of 10 Okunevo_EMBA males carries a R1b1a2a2 Y chromosome related to those found in western pastoralists. In contrast, there is no evidence of western steppe admixture among the more eastern Baikal region Bronze Age (~2200 to 1800 BCE) samples.

This Yamnaya ancestry has been also recently found to be the best fit for the Iron Age population of Shirenzigou in Xinjiang – where Tocharian languages were attested centuries later – despite the haplogroup diversity acquired during their evolution, likely through an intermediate Chemurchek culture (see a recent discussion on the elusive Proto-Tocharians).

Haplogroup diversity seems to be common in Iron Age populations all over Eurasia, most likely due to the spread of different types of sociopolitical structures where alliances played a more relevant role in the expansion of peoples. A well-known example of this is the spread of Akozino warrior-traders in the whole Baltic region under a partial N1a-VL29-bottleneck associated with the emerging chiefdom-based systems under the influence of expanding steppe nomads.

Y-DNA haplogroups in West Eurasia during the Early Iron Age. See full map and see maps of cultures, ADMIXTURE, Y-DNA, and mtDNA of the Early Iron Age and Late Iron Age.

Surprisingly, then, Proto-Tocharians from Shirenzigou pack up to 74% Yamnaya ancestry, in spite of the 2,000 years that separate them from the demise of the Afanasevo culture. They show more Yamnaya ancestry than any other population by that time, being thus a sort of Late PIE fossils not only in their archaic dialect, but also in their genetic profile:


The recent intrusion of Corded Ware-like ancestry, as well as the variable admixture with Siberian and East Asian populations, both point to the known intense Old Iranian and Old/Middle Chinese contacts. The scarce Proto-Samoyedic and Proto-Turkic loans in Tocharian suggest a rather loose, probably more distant connection with East Uralic and Altaic peoples from the forest-steppe and steppe areas to the north (read more about external influences on Tocharian).

Interestingly, both R1b samples, MO12 and M15-2 – likely of Asian R1b-PH155 branch – show a best fit for Andronovo/Srubna + Hezhen/Ulchi ancestry, suggesting a likely connection with Iranians to the east of Xinjiang, who later expanded as the Wusun and Kangju. How they might have been related to Huns and Xiongnu individuals, who also show this haplogroup, is yet unknown, although Huns also show hg. R1a-Z93 (probably most R1a-Z2124) and Steppe_MLBA ancestry, earlier associated with expanding Iranian peoples of the Srubna-Andronovo horizon.

All in all, it seems that prehistoric movements explained through the lens of genetic research fit perfectly well the linguistic reconstruction of Proto-Indo-European and Proto-Uralic.


Yamna the likely source of modern horse domesticates; the closest lineage, from East Bell Beakers

Open access Tracking Five Millennia of Horse Management with Extensive Ancient Genome Time Series, by Fages et al. Cell (2019).

Interesting excerpts (emphasis mine):

The earliest archaeological evidence of horse milking, harnessing, and corralling is found in the ∼5,500-year-old Botai culture of Central Asian steppes (Gaunitz et al., 2018, Outram et al., 2009; see Kosintsev and Kuznetsov, 2013 for discussion). Botai-like horses are, however, not the direct ancestors of modern domesticates but of Przewalski’s horses (Gaunitz et al., 2018). The genetic origin of modern domesticates thus remains contentious, with suggested candidates in the Pontic-Caspian steppes (Anthony, 2007), Anatolia (Arbuckle, 2012, Benecke, 2006), and Iberia (Uerpmann, 1990, Warmuth et al., 2011). Irrespective of the origins of domestication, the horse genome is known to have been reshaped significantly within the last ∼2,300 years (Librado et al., 2017, Wallner et al., 2017, Wutke et al., 2018). However, when and in which context(s) such changes occurred remains largely unknown.

To clarify the origins of domestic horses and reveal their subsequent transformation by past equestrian civilizations, we generated DNA data from 278 equine subfossils with ages mostly spanning the last six millennia (n = 265, 95%) (Figures 1A and 1B; Table S1; STAR Methods). Endogenous DNA content was compatible with economical sequencing of 87 new horse genomes to an average depth-of-coverage of 1.0- to 9.3-fold (median = 3.3-fold; Table S2). This more than doubles the number of ancient horse genomes hitherto characterized. With a total of 129 ancient genomes, 30 modern genomes, and new genome-scale data from 132 ancient individuals (0.01- to 0.9-fold, median = 0.08-fold), our dataset represents the largest genome-scale time series published for a non-human organism (Tables S2, S3, and S4; STAR Methods).

Genetic Affinities.
(A) Principal Component Analysis (PCA) of 159 ancient and modern horse genomes showing at least 1-fold average depth-of-coverage. The overall genetic structure is shown for the first three principal components, which summarize 11.6%, 10.4% and 8.2% of the total genetic variation, respectively. The two specimens MerzlyYar_Rus45_23789 and Dunaujvaros_Duk2_4077 discussed in the main text are highlighted. See also Figure S7 and Table S5 for further information.
(B) Visualization of the genetic affinities among individuals, as revealed by the struct-f4 algorithm and 878,475 f4 permutations. The f4 calculation was conditioned on nucleotide transversions present in all groups, with samples grouped as in TreeMix analyses (Figure 3). In contrast to PCA, f4 permutations measure genetic drift along internal branches. They are thus more likely to reveal ancient population substructure.
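To make the idea of an f4 statistic more concrete, here is a minimal sketch of its textbook form, computed from per-site allele frequencies in four groups. This is not the struct-f4 algorithm of the paper, which is far more elaborate; the function name and the toy frequencies are assumptions for illustration only.

```python
# f4(A, B; C, D): the average over sites of (a - b) * (c - d), where
# a, b, c, d are the frequencies of the same allele in the four groups.
# A value near zero is expected when the drift separating A from B is
# uncorrelated with the drift separating C from D.


def f4(freq_a, freq_b, freq_c, freq_d):
    """Mean product of allele-frequency differences across sites."""
    n = len(freq_a)
    if n == 0 or not (n == len(freq_b) == len(freq_c) == len(freq_d)):
        raise ValueError("need four non-empty frequency lists of equal length")
    total = 0.0
    for a, b, c, d in zip(freq_a, freq_b, freq_c, freq_d):
        total += (a - b) * (c - d)
    return total / n


if __name__ == "__main__":
    # Toy frequencies at three sites (made up, not real data).
    group_a = [0.9, 0.1, 0.5]
    group_b = [0.8, 0.2, 0.5]
    group_c = [0.7, 0.3, 0.4]
    group_d = [0.2, 0.6, 0.4]
    print(f4(group_a, group_b, group_c, group_d))
```

In practice, standard errors are obtained by resampling blocks of the genome, a step omitted here for brevity.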

Discovering Two Divergent and Extinct Lineages of Horses

Domestic and Przewalski’s horses are the only two extant horse lineages (Der Sarkissian et al., 2015). Another lineage was genetically identified from three bones dated to ∼43,000–5,000 years ago (Librado et al., 2015, Schubert et al., 2014a). It showed morphological affinities to an extinct horse species described as Equus lenensis (Boeskorov et al., 2018). We now find that this extinct lineage also extended to Southern Siberia, following the principal component analysis (PCA), phylogenetic, and f3-outgroup clustering of an ∼24,000-year-old specimen from the Tuva Republic within this group (Figures 3, 5A and S7A). This new specimen (MerzlyYar_Rus45_23789) carries an extremely divergent mtDNA only found in the New Siberian Islands some ∼33,200 years ago (Orlando et al., 2013) (Figure 6A; STAR Methods) and absent from the three bones previously sequenced. This suggests that a divergent ghost lineage of horses contributed to the genetic ancestry of MerzlyYar_Rus45_23789. However, both the timing and location of the genetic contact between E. lenensis and this ghost lineage remain unknown.

Population modeling of the demographic changes and admixture events in extant and extinct horse lineages. The two models presented show best fitting to the observed multi-dimensional SFS in momi2. The width of each branch scales with effective size variation, while colored dashed lines indicate admixture proportions and their directionality. The robustness of each model was inferred from 100 bootstrap pseudo-replicates. Time is shown in a linear scale up to 120,000 years ago and in a logarithmic scale above.

Modeling Demography and Admixture of Extinct and Extant Horse Lineages

Phylogenetic reconstructions without gene flow indicated that IBE differentiated prior to the divergence between DOM2 and Przewalski’s horses (Figure 3; STAR Methods). However, allowing for one migration edge in TreeMix suggested closer affinities with one single Hungarian DOM2 specimen from the 3rd mill. BCE (Dunaujvaros_Duk2_4077), with extensive genetic contribution (38.6%) from the branch ancestral to all horses (Figure S7B). This, and the extremely divergent IBE Y chromosome (Figure 6B), suggest that a divergent but yet unidentified ghost population could have contributed to the IBE genetic makeup.

Rejecting Iberian Contribution to Modern Domesticates

The genome sequences of four ∼4,800- to 3,900-year-old IBE specimens characterized here allowed us to clarify ongoing debates about the possible contribution of Iberia to horse domestication (Benecke, 2006, Uerpmann, 1990, Warmuth et al., 2011). Calculating the so-called fG ratio (Martin et al., 2015) provided a minimal boundary for the IBE contribution to DOM2 members (Cahill et al., 2013) (Figure 7A). The maximum of such estimate was found in the Hungarian Dunaujvaros_Duk2_4077 specimen (∼11.7%–12.2%), consistent with its TreeMix clustering with IBE when allowing for one migration edge (Figure S7B). This specimen was previously suggested to share ancestry with a yet-unidentified population (Gaunitz et al., 2018). Calculation of f4-statistics indicates that this population is not related to E. lenensis but to IBE (Figure 7B; STAR Methods). Therefore, IBE or horses closely related to IBE, contributed ancestry to animals found at an Early Bronze Age trade center in Hungary from the late 3rd mill. BCE. This could indicate that there was long-distance exchange of horses during the Bell Beaker phenomenon (Olalde et al., 2018). The fG minimal boundary for the IBE contribution into an Iron Age Spanish horse (ElsVilars_UE4618_2672) was still important (~9.6%–10.1%), suggesting that an IBE genetic influence persisted in Iberia until at least the 7th century BCE in a domestic context. However, fG estimates were more limited for almost all ancient and modern horses investigated (median = ~4.9%–5.4%; Figure 7A).

TreeMix Phylogenetic Relationships. The tree topology was inferred using a total of ∼16.8 million transversion sites and disregarding migration. The name of each sample provides the archaeological site as a prefix, and the age of the specimen as a suffix (years ago). Name suffixes (E) and (A) denote European and Asian ancient horses, respectively. See Table S5 for dataset information. Image modified to include the likely ancestor of domesticates in a red circle, represented by Yamna, the most likely direct ancestor of the Dunaujvaros specimen.

Iron Age horses

Y chromosome nucleotide diversity (π) decreased steadily in both continents during the last ∼2,000 years but dropped to present-day levels only after 850–1,350 CE (Figures 2B and S2E; STAR Methods). This is consistent with the dominance of an ∼1,000- to 700-year-old oriental haplogroup in most modern studs (Felkel et al., 2018, Wallner et al., 2017). Our data also indicate that the growing influence of specific stallion lines post-Renaissance (Wallner et al., 2017) was responsible for as much as a 3.8- to 10.0-fold drop in Y chromosome diversity.

We then calculated Y chromosome π estimates within past cultures represented by a minimum of three males to clarify the historical contexts that most impacted Y chromosome diversity. This confirmed the temporal trajectory observed above as Byzantine horses (287–861 CE) and horses from the Great Mongolian Empire (1,206–1,368 CE) showed limited yet larger-than-modern diversity. Bronze Age Deer Stone horses from Mongolia, medieval Aukštaičiai horses from Lithuania (C9th–C10th [ninth through the tenth centuries of the Common Era]), and Iron Age Pazyryk Scythian horses showed similar diversity levels (0.000256–0.000267) (Figure 2A). However, diversity was larger in La Tène, Roman, and Gallo-Roman horses, where Y-to-autosomal π ratios were close to 0.25. This contrasts to modern horses, where marked selection of specific patrilines drives Y-to-autosomal π ratios substantially below 0.25 (0.0193–0.0396) (Figure 2A). The close-to-0.25 Y-to-autosomal π ratios found in La Tène, Roman, and Gallo-Roman horses suggest breeding strategies involving an even reproductive success among stallions or equally biased reproductive success in both sexes (Wilson Sayres et al., 2014).
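As a rough check on where the 0.25 benchmark comes from, the expected Y-to-autosomal diversity ratio can be sketched under a very simple neutral model. The assumptions here are mine, not the paper's: random mating, constant population size, equal mutation rates on the Y chromosome and the autosomes, and diversity proportional to effective population size. The function name and the herd sizes are hypothetical.

```python
# Expected Y-to-autosomal diversity ratio under a simple neutral model.
# Autosomal effective size with unequal numbers of breeding males and
# females: Ne_auto = 4 * Nm * Nf / (Nm + Nf).
# The Y chromosome is haploid and carried only by males, so in the same
# diploid units its effective size is Nm / 2.
# Assuming equal mutation rates, the diversity ratio equals Ne_Y / Ne_auto.


def expected_y_to_autosome_ratio(breeding_males, breeding_females):
    """Return Ne_Y / Ne_auto for the given numbers of breeders."""
    if breeding_males <= 0 or breeding_females <= 0:
        raise ValueError("both sexes need at least one breeder")
    ne_auto = (4.0 * breeding_males * breeding_females
               / (breeding_males + breeding_females))
    ne_y = breeding_males / 2.0
    return ne_y / ne_auto


if __name__ == "__main__":
    # Equal numbers of breeding stallions and mares (made-up herd sizes).
    print(expected_y_to_autosome_ratio(100, 100))
    # Few stallions covering many mares.
    print(expected_y_to_autosome_ratio(10, 100))
```

With equal numbers of breeding stallions and mares the ratio is 0.25, and it drops as fewer stallions breed. In this simple model it cannot fall below 0.125, so the much lower values reported for modern horses would need something beyond a skewed sex ratio, such as the marked selection of specific patrilines mentioned in the excerpt.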

Lineage is used in this paper, as in many others in genetics, as defined by a specific ancestry. I keep that nomenclature below. It should not be confused with the “lineages” or “lines” referring to Y-chromosome (or mtDNA) haplogroups.

Supporting the “archaic” nature of the Hungarian BBC horses expanding from the Pontic-Caspian steppes are:

  • Among Y-chromosome lines, the common group formed by Botai-Borly4 (closely related to DOM2), Scythian horses from Aldy Bel (Arzhani), Iron Age horses from Estonia (Ridala), horses from the Xiongnu culture (Uushgiin Uvur), and Roman horses from Autricum (Chartres).
  • Among mtDNA lines, the common group formed by Botai samples, LebyazhinkaIV NB35, and different Eurasian domesticates, including many ancient Western European ones, which reveals a likely expansion of certain subclades east and west with the Repin culture.
  • (…) DOM2 contributed 22% to the ancestor of Przewalski’s horses ca. 9.47 kya, suggesting the Holocene optimum, rather than the Eneolithic Botai culture (∼5.5 kya), as a period of population contact. This pre-Botai introgression could explain the Y chromosome topology, where Botai horses were reported to carry two different segregating haplogroups: one occupied a basal position in the phylogeny while the other was closely related to DOM2. Multiple admixture pulses, however, are known to have occurred along the divergence of DOM2 and the Botai-Borly4 lineage, including 2.3% post-Borly4 contribution to DOM2, and a more recent 6.8% DOM2 introgression into Przewalski’s horses (Gaunitz et al., 2018). Model C2 parameters accommodate all these as a single admixture pulse, likely averaging the contributions of all these multiple events.

    Tip labels are respectively composed of individual sample names, their reference number as well as their age (years ago, from 2017). Red, orange, light green, green, dark green and blue refer to modern horses, ancient DOM2, Botai horses, Borly4 horses, Przewalski’s horses and E. lenensis, respectively. Black refers to wild horses not yet identified to belong to any particular cluster in absence of sufficient genome-scale data. Clades composed of only Przewalski’s horses or ancient DOM2 horses were collapsed to increase readability.

    (A) Best maximum likelihood tree retracing the phylogenetic relationships between 270 mitochondrial genomes.

    B) Best Y chromosome maximum likelihood tree (GTRGAMMA substitution model) excluding outgroup. Node supports are indicated as fractions of 100 bootstrap pseudoreplicates. Bootstrap supports inferior to 90% are not shown. The root was placed on the tree midpoint. See also Table S5 for dataset information.

    Image modified from the paper, including a red square in archaic groups that contain the Hungarian sample, and a red circle around the most likely common ancestral stallion and mare from the Pontic-Caspian steppes.

    The paper cannot offer a detailed picture of ancient horse domestication, but it is yet another step in showing how Repin/Yamna is the most likely source of expansion of horse domesticates in Eurasia. Even more interestingly, Yamna settlers in Hungary probably expanded an ancient lineage of that horse at the same time as they spread with the Classical Bell Beaker culture. Remarkable parallels are thus found between:

    The expansion of an ancient line of horse domesticates related to Yamna Hungary/East Bell Beakers seems to be confirmed by the pre-Iberian sample from Vilars I, Els Vilars4618 2672 (ca. 700-550 BC), likely of Iberian Beaker descent, showing a lineage older than the Indo-Iranian ones, which later replaced most European lines.

    NOTE. For known contacts between Yamna and Proto-Beakers just before the expansion of East Bell Beakers, see a recent post on Vanguard Yamna groups.

    The findings of the paper confirm that the horse expanded first (and mainly) through the steppe biome, mimicking the expansion of Proto-Indo-Europeans, and that these early lines were then replaced gradually (or not so gradually) by lines brought to Europe during westward expansions of Bronze Age, Iron Age, and later specialized horse-riding steppe cultures. The expansion also correlates well with the known spread of animal traction and pastoralism before 2000 BC:

    Top image: Map with evidence of animal traction before ca. 2000 BC. Bottom image: frequency of finds of evidence for animal traction (orange), cylinder seals (purple) and potter’s wheels (green) in the 4th and 3rd millennium BC (query from the Digital Atlas of Innovations). The data points to an early peak in the expansion of this innovation at the turn of the 4th–3rd millennium BC, while direct evidence supports a radical increase from around the mid-3rd millennium BC until the early 2nd millennium, coinciding with the expansion of East Bell Beakers and related European Early Bronze Age cultures. Data and image modified from Klimscha (2017).

    EDIT (3 MAY 2019): A recent reminder of these parallel developments by David Reich in Insights into language expansions from ancient DNA:

    • Yamna expansion to the west “with horses and wagons”, with a more homogeneous ancestry in modern Europeans due to later migrations from the east (and north):

    • “Descendants” of Yamna (once the culture was already “dead”), expanding to the east mainly with Corded Ware ancestry:

    Another recent open access paper on horse domestication is The horse Y chromosome as an informative marker for tracing sire lines, by Felkel et al. Scientific Reports (2019).


Common Slavs from the Lower Danube, expanding with haplogroup E1b-V13?


Florin Curta has published online his draft for Eastern Europe in the Middle Ages (500-1300), Brill’s Companions to European History, Vol. 10 (2019), apparently due to appear in June.

Some interesting excerpts, relevant for the latest papers (emphasis mine):

The Archaeology of the Early Slavs

(…) One of the most egregious problems with the current model of the Slavic migration is that it is not at all clear where it started. There is in fact no agreement as to the exact location of the primitive homeland of the Slavs, if there ever was one. The problem with the idea of tracing the origin of the Slavs to the Zarubyntsi culture dated between the 3rd century BC and the first century AD is that a gap of about 200 years separates it from the Kiev culture (dated between the 3rd and the 4th century AD), which is also attributed to the Slavs. Furthermore, another century separates the Kiev culture from the earliest assemblages attributed to the Prague culture. It remains unclear as to where the (prehistoric) Slavs went after the first century, and whence they could return, two centuries later, to the same region from which their ancestors had left. The obvious cultural discontinuity in the region of the presumed homeland raises serious doubts about any attempts to write the history of the Slavic migration on such a basis. There is simply no evidence of the material remains of the Zarubyntsi, Kiev, or even Prague culture in the southern and southwestern direction of the presumed migration of the Slavs towards the Danube frontier of the Roman Empire.

Moreover, the material culture revealed by excavations of 6th- to 7th-century settlements and, occasionally, cremation cemeteries in northwestern Russia, Belarus, Poland, Moravia, and Bohemia is radically different from that in the lands north of the Danube river, which according to the early Byzantine sources were inhabited at that time by Sclavenes: no settlement layout with a central, open area; no wheel-made pottery or pottery thrown on a tournette; no clay rolls inside clay ovens; few, if any clay pans; no early Byzantine coins, buckles, or remains of amphorae; no fibulae with bent stem, and few, if any bow fibulae. Conversely, those regions have produced elements of material culture that have no parallels in the lands north of the river Danube: oval, trough-like settlement features (which are believed to be remains of above-ground, log-houses); exclusively handmade pottery of specific forms; very large settlements, with over 300 houses; fortified sites that functioned as religious or communal centers; and burials under barrows. With no written sources to inform about the names and identities of the populations living in the 6th and 7th centuries in East Central and Eastern Europe, those contrasting material culture profiles could hardly be interpreted as ethnic commonality. In other words, there is no serious basis for attributing to the Sclavenes (or, at least, to those whom early Byzantine authors called so) any of the many sites excavated in Russia, Belarus, Poland, Moravia, and Bohemia.

Common Slavic expanding with Prague-Korchak from the east…or was it from the west?


There is of course evidence of migrations in the 6th and 7th centuries, but not in the directions assumed by historians. For example, there are clear signs of settlement discontinuity in northern Germany and in northwestern Poland. German archaeologists believe that the bearers of the Prague culture who reached northern Germany came from the south (from Bohemia and Moravia), and not from the east (from neighboring Poland or the lands farther to the east). At any rate, no archaeological assemblage attributed to the Slavs either in northern Germany or in northern Poland may be dated earlier than ca. 700. In Poland, settlement discontinuity was postulated, to make room for the new, Prague culture introduced gradually from the southeast (from neighboring Ukraine). However, there is increasing evidence of 6th-century settlements in Lower Silesia (western Poland and the lands along the Middle Oder) that have nothing to do with the Prague culture. Nor is it clear how and when did the Prague culture spread over the entire territory of Poland. No site of any of the three archaeological cultures in Eastern Europe that have been attributed to the Slavs (Kolochin, Pen’kivka, and Prague/Korchak) has so far been dated earlier than the sites in the Lower Danube region where the 6th century sources located the Sclavenes. Neither the Kolochin, nor the Pen’kivka cultures expanded westwards into East Central or Southeastern Europe; on the contrary, they were themselves superseded in the late 7th or 8th century by other archaeological cultures originating in eastern Ukraine. Meanwhile, there is an increasing body of archaeological evidence pointing to very strong cultural influences from the Lower and Middle Danube to the Middle Dnieper region during the 7th century—the opposite of the alleged direction of Slavic migration.

When did the Slavs appear in those regions of East Central and Eastern Europe where they are mentioned in later sources? A resistant stereotype of the current scholarship on the early Slavs is that “Slavs are Slavonic-speakers; Slavonic-speakers are Slavs.”* If so, when did people in East Central and Eastern Europe become “Slavonic speakers”? There is in fact no evidence that the Sclavenes mentioned by the 6th-century authors spoke Slavic (or what linguists now call Common Slavic). Nor can the moment be established (with any precision), at which Slavic was adopted or introduced in any given region of East Central and Eastern Europe.** To explain the spread of Slavic across those regions, some have recently proposed the model of a koiné, others that of a lingua franca. The latter was most likely used within the Avar polity during the last century of its existence (ca. 700 to ca. 800).

*Ziółkowski, “When did the Slavs originate?” p. 211. On the basis of the meaning of the Old Church Slavonic word ięzyk (“language,” but also “people” or “nation”), Darden, “Who were the Sclaveni?” p. 138 argues that the meaning of the name the Slavs gave to themselves was closely associated with the language they spoke.

**Uncertainty in this respect dominates even in recent studies of contacts between Slavic and Romance languages (particularly Romanian), even though such contacts are presumed to have been established quite early (Paliga, “When could be dated ‘the earliest Slavic borrowings’?”; Boček, Studie). Recent studies of the linguistic interactions between speakers of Germanic and speakers of Slavic languages suggest that the adoption of place names of Slavic origin was directly linked to the social context of language contact between the 9th and the 13th centuries (Klír, “Sociální kontext”).


During the 6th century, the area between the Danube and the Tisza in what is today Hungary, was only sparsely inhabited, and probably a “no man’s land” between the Lombard and Gepid territories. It is only after ca. 600 that this area was densely inhabited, as indicated by a number of new cemeteries that came into being along the Tisza and north of present-day Kecskemét. There can therefore be no doubt about the migration of the Avars into the Carpathian Basin, even though it was probably not a single event and did not involve only one group of population, or even a cohesive ethnic group.

The number of graves with weapons and of burials with horses is particularly large in cemeteries excavated in southwestern Slovakia and in neighboring, eastern Austria. This was a region of special status on the border of the qaganate, perhaps a “militarized frontier.” From that region, the Avar mores and fashions spread farther to the west and to the north, into those areas of East Central Europe in which, for reasons that are still not clear, Avar symbols of social rank were particularly popular, as demonstrated by numerous finds of belt fittings. Emulating the success of the Avar elites sometimes involved borrowing other elements of social representation, such as the preferential deposition of weapons and ornamented belts. For example, in the early 8th century, a few males were buried in Carinthia (southern Austria) with richly decorated belts imitating those in fashion in the land of the Avars, but also with Frankish weapons and spurs. Much like in the Avar-age cemeteries in Slovakia and Hungary, the graves of those socially prominent men are often surrounded by many burials without any grave goods whatsoever.

Territory of the early Avar Qaganate and the location of the investigated sites in the Carpathian Basin in Csáky et al. (2019).


Carantania was a northern neighbor of the Lombard duchy of Friuli, which was inhabited by Slavs. According to Paul the Deacon, who was writing in the late 780s, those Slavs called their country Carantanum, by means of a corruption of the name of ancient Carnuntum (a former Roman legionary camp on the Danube, between Vienna and Bratislava). Carantanians were regarded as Slavs by the author of a report known as the Conversion of the Bavarians and Carantanians, and written in ca. 870 in order to defend the position of the archbishop of Salzburg against the claims of Methodius, the bishop of Pannonia. According to this text, a duke named Boruth was ruling over Carantania when he was attacked by Avars in ca. 740. He called for the military assistance of his Bavarian neighbors. The Bavarian duke Odilo (737–748) obliged, defeated the Avars, but in the process also subdued the Carantanians to his authority. Once Bavarian overlordship was established in Carantania, Odilo took with him as hostages Boruth’s son Cacatius and his nephew Chietmar (Hotimir). Both were baptized in Bavaria. During the 743 war between Odilo and Charles Martel’s two sons, Carloman and Pepin (the Mayors of the Palace in Austrasia and Neustria, respectively), Carantanian troops fought on the Bavarian side. The Bavarian domination cleared the field for missions of conversion to Christianity sent by Virgil, the new bishop of Salzburg (746–784). Many missionaries were of Bavarian origin, but some were Irish monks.


Several Late Avar cemeteries dated to the last quarter of the 8th century are known from the lands north of the middle course of the river Danube, in what is today southern Slovakia and the valley of the Lower Morava [see image below]. By contrast, only two cemeteries have so far been found in Moravia (the eastern part of the present-day Czech Republic), along the middle and upper course of the Morava and along its tributary, the Dyje. In both Dolní Dunajovice and Hevlín, the latest graves may be dated by means of strap ends and belt mounts with human figures to the very end of the Late Avar period. (…)

The archaeological evidence pertaining to burial assemblages dated to the early 9th century is completely different. Shortly before or after 800, all traces of cremation—with or without barrows—disappear from the valley of the Morava river and southwestern Slovakia, two regions in which cremation had been the preferred burial rite during the previous centuries. This dramatic cultural change has often been interpreted as a direct influence of both Avar and Frankish burial rites, but it coincides in time with the adoption of Christianity by local elites. In spite of conversion, however, the representation of status through furnished burial continued well into the 9th century. Unlike Avar-age sites in Hungary and the surrounding regions, many men were buried in 9th-century Moravia together with their spurs, in addition to such weapons as battle axes, “winged” lance heads, or swords with high-quality steel blades of Frankish production.

Relevant Moravian sites mentioned in Curta’s new book.

When the Magyars inflicted a crushing defeat on the Bavarians at Bratislava (July 4, 907), the fate of Moravia was sealed as well. Moravia and the Moravians disappear from the radar of the written sources, and historians and archaeologists alike believe that the polity collapsed as a result of the Magyar raids.


(…) although there can be no doubt about the relations between Uelgi and the sites in Hungary attributed to the first generations of Magyars, those relations indicate a migration directly from the Trans-Ural lands, and not gradually, with several other stops in the forest-steppe and steppe zones of Eastern Europe. In the lands west of the Ural Mountains, the Magyars are now associated with the Kushnarenkovo (6th to 8th century) and Karaiakupovo (8th to 10th century) cultures, and with such burial sites as Sterlitamak (near Ufa, Bashkortostan) and Bol’shie Tigany (near Chistopol, Tatarstan).* However, the same problem with chronology makes it difficult to draw the model of a migration from the lands along the Middle Volga. Many parallels for the so typically Magyar sabretache plates found in Hungary are from that region. They have traditionally been dated to the 9th century, but more recent studies point to the coincidence in time between specimens found in Eastern Europe and those from Hungary.

* Ivanov, Drevnie ugry-mad’iary; Ivanov and Ivanova, “Uralo-sibirskie istoki”; Boldog et al., “From the ancient homelands,” p. 3; Ivanov, “Similarities.” Ivanov, “Similarities,” p. 562 points out that the migration out of the lands along the Middle Volga is implied by the disappearance of both cultures (Kushnarenkovo and Karaiakupovo) in the mid-9th century. For the Kushnarenkovo culture, see Kazakov, “Kushnarenkovskie pamiatniki.” For the Karaiakupovo culture, see Mogil’nikov, “K probleme.”

Given that the Magyars are first mentioned in relation to events taking place in the Lower Danube area in the 830s, the Magyar sojourn in Etelköz must have been no longer than 60 years or so—a generation. (…)

A detail of the Arrival of the Hungarians, Árpád Feszty’s and his assistants’ vast (1800 m²) cyclorama, painted to celebrate the 1000th anniversary of the Magyar conquest of Hungary, now displayed at the Ópusztaszer National Heritage Park in Hungary. This specific detail is probably based on the account in the Annals of Fulda, which narrates under the year 894 that the Hungarians crossed the Danube into Pannonia where they “killed men and old women outright and carried off the young women alone with them like cattle to satisfy their lusts and reduced the whole” province “to desert”.

It has become obvious by now that one’s impression of the Magyars as “Easterners” and “steppe-like” was (and still is) primarily based on grave finds, while the settlement material is considerably more aligned with what is otherwise known from other contemporary settlement sites in Central and Southeastern Europe. The dominant feature on the 10th- and 11th-century settlements in Hungary is the sunken-floored building of rectangular plan, with a stone oven in a corner. Similarly, the pottery resulting from the excavation of settlement sites is very similar to that known from many other such sites in Eastern Europe. Moreover, while clear changes taking place in burial customs between ca. 900 and ca. 1100 are visible in the archaeological record from cemeteries, there are no substantial differences between 10th- and the 11th-century settlements in Hungary. (…)

As a matter of fact, the increasing quantity of paleobotanical and zooarchaeological data from 10th-century settlements strongly suggests that the economy of the first generations of Magyars in Hungary was anything but nomadic. To call those Magyars “half-nomad” is not only wrong, but also misleading, as it implies that they were half-way toward civilization, with social changes taking place that must have had material culture correlates otherwise visible in the burial customs.


The origin of “Slavs” (i.e. that of “Slavonic” as a language, whatever the ancestral Proto-Slavic ethnic make-up was) is almost as complicated as the origin of Albanians, Basques, Balts, or Finns. Their entry into history is very recent, with few reliable sources available until well into the Middle Ages. If you add to our ignorance of their origin the desire of every single researcher or amateur out there to connect them to their own region (or, still worse, to all the regions where they were historically attested), we are bound to find contradictory data and a constantly biased selection of information.

Furthermore, it is extremely complicated to connect any recent population to its ancestral (linguistic) one through haplogroups prevalent today, and just absurd to connect them through ancestral components. This, which was already suspected for many populations, has been confirmed recently for Basques in Olalde et al. (2019) and will be confirmed soon for Finns with a study of the Proto-Fennic populations in the Gulf of Finland.

NOTE. Yes, the “my parents look like Corded Ware in this PCA” argument made no sense. Ever. Why adults would constantly engage in that kind of false 5,000-year-old connection instead of learning history – or their own family history – escapes all comprehension. But if something is certain about human nature, it is that we will still see nativism and ancestry/haplogroup fetishism for any modern region or modern haplogroups and their historically attested ethnolinguistic groups.

Genetic structure of modern Balto-Slavic populations within a European context according to the three genetic systems. Image from Kushniarevich et al. (2015)

As you can see from my maps and writings, I prefer neat and simple concepts: in linguistics, in archaeology, and in population movements. Hence my aversion to this kind of infinite proto-historical accounts (and interpretations of them) necessary to ascertain the origins of recent peoples (Slavs in this case), and my usual preference for:

  • Clear dialectal classifications, whether or not they can be as clear cut as I describe them. The only thing that sets Slavic apart from other recent languages is its connection with Baltic, luckily for both. Even though this connection is disputed by some linguists, and the question is always far from being resolved, a homeland of Proto-Balto-Slavic would almost necessarily need to be set to the north of the Carpathian Mountains in the Bronze Age (or at least close to them).
  • NOTE. A dismissal of a connection with Baltic would leave Slavic a still more complicated orphan, and its dialectal classification within Late PIE more dubious. Its inclusion in Balto-Slavic locates it close to Germanic, and thus as a Bronze Age North-West Indo-European dialect close to northern Germany. So bear with me in accepting this connection, or enter the linguistic hell of arguing for Indo-Slavonic of R1a-Z93 mixed with Temematic…

  • A priori “pots = people” assumption, which may lead to important errors, but fewer than the usual “pots != people” of modern archaeologists. The traditional identification of the Common Slavic expansion with the Prague-Korchak culture – however undefined this culture may be – has clear advantages: it may be connected (although admittedly with many archaeological holes) with western cultures expanding east during the Bronze Age, and then west again after the Iron Age, and thus potentially also with Baltic.
  • A simplistic “haplogroup expansion = ethnolinguistic expansion”, which is quite useful for prehistoric migrations, but enters into evident contradictions as we approach the Iron Age. Common Slavs may be speculatively (for all we know) associated with an expansion of recent R1a-M458 lineages – among other haplogroups – from the east, and possibly Balto-Slavic as an earlier expansion of older subclades from the west, as I proposed in A Clash of Chiefs.
Modern distribution of R1a-M458, after Underhill et al. (2015).

NOTE. The connection of most R1a-Z280 lineages is more obviously done with ancient Finno-Ugric peoples, as it is clear now (see here and here).

Slavs appeared first in the Danube?

No matter what my personal preference is, one can’t ignore the growing evidence, and it seems that Florin Curta’s long-lasting view of a Danubian origin of expansion for Common Slavic, including its condition as a lingua franca of late Avars, won’t be easy to reject any time soon:

1) Theories concerning Chernyakhov as a Slavic homeland will apparently need to be fully rejected, due to the Germanic-like ancestry that will be reported in the study by Järve et al. (2019).

EDIT (3 MAY 2019). From their poster Shift in the genetic landscape of the western Eurasian Steppe not due to Scythian dominance, but rather at the transition to the Chernyakhov culture (Ostrogoths) (download PDF):

(…) the transition from the Scythian to the Chernyakhov culture (~2,100–1,700 cal BP) does mark a shift in the Ponto-Caspian genetic landscape. Our results agree well with the Ostrogothic origins of the Chernyakhov culture and support the hypothesis that Scythian dominance was cultural rather than achieved through population replacement.

PCA of novel and published ancient samples from Scythian/Sarmatian and related groups on the background of modern samples presented as population medians. Δ – ref. 1, ○ – ref. 2, □ – ref. 3, ◊ – this study. Embedded are the locations of some of the samples. Notice the wide cluster formed by the three samples, from Hungarian Scythians in the west to steppe-like peoples in the east.

2) Therefore, unless Przeworsk shows the traditionally described mixture of populations in terms of ancestry and/or haplogroups, it will also be a sign of East Germanic peoples expanding south (and potentially displacing the ancestors of Slavs in either direction, east or south).

It would seem we are stuck in a Danubian vs. Kievan homeland for Common Slavs, then:

3) About the homeland in the Kiev culture, two early Avar females from Szólád have been reported to cluster “among Modern Slavic populations” based on some data in Amorim et al. (2018).

Rather than supporting an origin of Slavs in common with modern Russians, Poles, and Ukrainians as observed in the PCA, though, the admixture of AV1 and AV2 (ca. AD 540-640) paradoxically supports an admixture of Modern Slavs of Eastern Europe in common with early Avar peoples (an Altaic-speaking population) and other steppe groups with an origin in East Asia… So this admixture would actually support a western origin of the Common Slavs with which East Asian Avars may have admixed, and whose descendants are necessarily sampled at later times.

Procrustes transformed PCA of medieval ancient samples against POPRES imputed SNP dataset. AV1 and AV2 samples have been circled in red. Color coding of medieval samples is same as in Figs 1 and 2. Two- and three-letter codes for POPRES samples: AL=Albania, AT=Austria, BA=Bosnia-Herzegovina, BE=Belgium, BG=Bulgaria, CH=Switzerland, CY=Cyprus, CZ=Czech Republic, DE=Germany, DK=Denmark, ES=Spain, FI=Finland, FR=France, GB=United Kingdom, GR=Greece, HR=Croatia, HU=Hungary, IE=Ireland, IT=Italy, KS=Kosovo, LV=Latvia, MK=Macedonia, NO=Norway, NL=Netherlands, PL=Poland, PT=Portugal, RO=Romania, SM=Serbia and Montenegro, RU=Russia, Sct=Scotland, SE=Sweden, SI=Slovenia, SK=Slovakia, TR=Turkey, UA=Ukraine.

4) Favouring Curta’s Danubian origin (or even an origin near Bohemia) at the moment are thus:

  • The “western” cluster of Early Slavs from Brandýsek, Bohemia (ca. AD 600-900).
  • Two likely Slavic individuals from Usedom, in Mecklenburg-Vorpommern (AD 1200) show hg. R1a-M458 and E1b-M215 (Freder 2010).
  • An early West Slav individual from Hrádek nad Nisou in Northern Bohemia (ca. AD 1330) also shows E1b-M215 (Vanek et al. 2015).
  • One sample from Székkutas-Kápolnadülő (SzK/239) among middle or late Avars (ca. AD 650-710), a supposed Slavonic-speaking polity, of hg. E1b-V13.
  • Two samples from Karos (K1/13, and K2/6) among Hungarian conquerors (ca. AD 895-950), likely both of hg. E1b-V13, probably connected to the alliance with Moravian elites.
  • Possibly a West Slavic sample from Poland in the High Middle Ages (see below).

A later Hungarian sample (II/53) from the Royal Basilica, where King Béla was interred, of hg. E1b1, supports the importance of this haplogroup among elite conquerors, although its original relation to the other buried individuals is unknown.

NOTE. You can see all ancient samples of haplogroup E to date on this Map of ancient E samples, with care to identify the proper subclades related to south-eastern Europe. About the ancestral origin of the haplogroup in Europe, you may read Potential extra Iberomaurusian-related gene flow into European farmers, by Chad Rohlfsen.

Even assuming that the R1a sample reported from the late Avar period is of a subclade typically associated with Slavs (I know, circular reasoning here), which is not warranted, we would have already 6 E1b1b vs. 1-2 R1a-M458 in populations that can be actually assumed to represent early Slavonic speakers (unlike many earlier cultures potentially associated with them), clearly earlier than other Slavic-speaking populations that will be sampled in eastern Europe. It is more and more likely that Early Slavs are going to strengthen Curta’s view, and this may somehow complicate the link of Proto-Slavic with eastern European BA cultures like Trzciniec or Lusatian.
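The tally behind this comparison can be sketched in a few lines. This is only a minimal illustration that counts the samples listed above: the labels, the coarse `E1b` / `R1a-M458` grouping, and the `tentative` flag are my own shorthand (not fields from any published dataset), and I assume the two Usedom individuals carry one haplogroup each.

```python
from collections import Counter

# Samples as listed in the text above: (label, haplogroup group, tentative?)
# "tentative" marks samples the text itself treats as uncertain:
# the possible Polish sample and the late Avar R1a of unknown subclade.
samples = [
    ("Usedom, individual A", "R1a-M458", False),
    ("Usedom, individual B", "E1b", False),
    ("Hrádek nad Nisou", "E1b", False),
    ("SzK/239", "E1b", False),
    ("K1/13", "E1b", False),
    ("K2/6", "E1b", False),
    ("Poland, High Middle Ages", "E1b", True),
    ("Late Avar R1a", "R1a-M458", True),
]


def tally(rows, include_tentative=True):
    """Count samples per haplogroup group, optionally skipping tentative ones."""
    return Counter(
        group
        for _label, group, tentative in rows
        if include_tentative or not tentative
    )


upper = tally(samples)                           # everything, tentative included
lower = tally(samples, include_tentative=False)  # only the firmer assignments

for name, counts in (("with tentative", upper), ("without tentative", lower)):
    print(f"{name}: E1b={counts['E1b']}, R1a-M458={counts['R1a-M458']}")
```

Counting everything gives 6 E1b against 2 R1a-M458; dropping the tentative entries gives 5 against 1. Either way the sample is far too small for anything beyond a first impression.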

NOTE. I am still expecting a clear expansion associated with Prague-Korchak, though, including a connection with bottlenecks based on R1a-M458 in the Middle Ages, whether the expansion is eventually shown to be from the west (i.e. Bohemia -> Prague -> Korchak), or from the east (i.e. Kiev -> Korchak -> Prague), and whether or not this cultural community was later replaced by other ‘true’ Slavonic-speaking cultures through acculturation or population movements.

Common theories on Slavic origins. After “The Early Slavs. Culture and Society in Early Medieval Europe” by P. M. Barford, Cornell University Press (2001). Image by Hxseek at Wikipedia.

5) Back to Przeworsk and the “north of the Carpathians” homeland (i.e. between the Upper Oder and the Upper Dniester), but compatible with Curta’s view: Even if Common Slavic is eventually evidenced to be driven by small migrations north and south of the Danube during the Roman Iron Age, before turning into a mostly “R1a-rich” migration or acculturation to the north in Bohemia and then east (which is what this early E1b-V13 connection suggests), this does not dismiss the traditional idea that Late Bronze Age – Iron Age central-eastern Europe was the Proto-Slavic homeland, i.e. likely the Pomeranian culture disturbed by the East Germanic migrations first (in Przeworsk), and the migrations of steppe nomads later (around the Danube).

Even without taking into account the connection with Baltic, the relevance of haplogroup E1b-V13 among Early Slavs may well be a sign of an ancestral population from the northern or eastern Carpathian region, supported by the finding of this haplogroup among the westernmost Scythians. The expansion of some modern E1b-CTS1273 lineages may link Slavic ancestrally with the Lusatian culture, which is an eastern (very specific) Urnfield culture group, stemming from central-east Europe.

An important paper in this respect is the upcoming Zenczak et al., where another hg. E1b1 will be added to the list above: such a sample is expected from Poland (from Kowalewko, Maslomecz, Legowo or Niemcza), either from the Roman Iron Age or Early Middle Ages, close to an early population of likely Scandinavian origin (eight I1 samples), apart from other varied haplogroups, with little relevance of R1a. Whether this E-V13 sample is an Iron Age one (justifying the bottleneck under E-V13 to the south) or, maybe more likely, a late one from the Middle Ages (maybe supporting a connection of the Gothic/Slavic E1b bottleneck with southern Chernyakhov or further west along the Danube) is unclear.

The finding of south-eastern European ancestry and lineages in both Early Slavs and East Germanic tribes* suggests therefore a Slavonic homeland near (or within) the Przeworsk culture, close to the Albanoid one, as proposed based on topohydronymy. This may point to a complex process of acculturation of different eastern European populations which formed alliances, as was common during the Iron Age and later periods, and which cannot be interpreted as a clear picture of their languages’ original homeland and ancestral peoples (in the case of East Germanic tribes, apparently originally expanding from Scandinavia under strong I1 bottlenecks).

* Iberian samples of the Visigothic period in Spain show up to 25% E1b-V13 samples, with a mixture of haplogroups including local and foreign lineages, as well as some more E1b-V13 samples later during the Muslim period. Out of the two E1b samples from Longobards in Amorim et al. (2018), only SZ18 from Szólád (ca. AD 412-604) is within E1b-V13, in a very specific early branch (SNP M35.2), further locating the expansion of hg. E1b-V13 near the Danube. Samples of haplogroup J (maybe J2a) or G2a among Germanic tribes (and possibly in Poland’s Roman Iron Age / Early Middle Ages) are impossible to compare with early Hungarian ones without precise subclades.

East Slavic expansion in topo-hydronymy. Image from (Udolph 1997, 2016).

I already interpreted the earlier Slavic samples we had as a sign of a Carpathian origin and very recent bottlenecks under R1a lineages among Modern Slavs:

The finding of haplogroup E1b1b-M215 in two independent early West Slavic individuals further supports that the current distribution of R1a1a1b1a-Z282 lineages in Slavic populations is the product of recent bottlenecks. The lack of a precise subclade within the E1b1b-M215 tree precludes a proper interpretation of a potential origin, but they are probably under European E1b1b1a1b1-L618 subclade E1b1b1a1b1a-V13 (formed ca. 6100 BC, TMRCA ca. 2800 BC), possibly under the mutation CTS1273 (formed ca. 2600 BC, TMRCA ca. 2000 BC), in common with other ancient populations around the Carpathians (see below §viii.11. Thracians and Albanians). This gross geographic origin would support the studies of the Common Slavic homeland based on toponymy (Figure 66), which place it roughly between the Upper Oder and the Upper Dniester, north of the Carpathians (Udolph 1997, 2016).

EDIT (8 APR 2019): Another interesting piece of data is the haplogroup distribution among Modern Slavs and neighbouring peoples (see Wikipedia). For example, the bottleneck seen in Modern Albanians, under Z5017 subclade, also points to an origin of the expansion of E1b-V13 subclades among multiethnic groups around the Lower Danube coinciding with the Roman Iron Age, given the estimates for the arrival of Proto-Albanian close to the Latin and Greek linguistic frontier.

Remarkable is also its distribution among Rusyns, East Slavs from the Carpathians not associated with the Kievan Rus’, isolated thus quite soon from East Slavic expansions to the east. They were reported to show ca. 35% hg. E1b-V13 globally in FTDNA, with a frequency similar to or higher than R1a, in common with South Slavic peoples*, reflecting thus a situation similar to the source of East Slavs before further R1a-based bottlenecks (and/or acculturation events) to the east:

* Although probably due in part to founder effects and biased familial sampling, this should be assumed to be common to all FTDNA sampling, anyway.

Map showing the full geographic extent of the Rusyn people in Central Europe, prior to World War I (Carpatho Rusyn Society).

Repeating what should be already evident: in complex organizations and/or demographically dense populations (more common since the Iron Age), we can’t expect language change to happen in the same way as during the known Neolithic or Chalcolithic population replacements, be it in Finland, Hungary, Iberia, or Poland. For example, no matter whether Romans (2nd c. BC) brought some R1b-U152 and other Mediterranean lineages to Iberia; Germanic peoples entering Hispania (AD 5th c.) were of typically Germanic lineages or not; Muslims who spoke mainly Berber (AD 8th c.) and were mainly of hg. E1b-M81 (and J?) brought North African ancestry; etc., the language or languages of Iberia changed (or not) with the political landscape: neither with radical population replacements (or full population continuity), nor with the dominant haplogroups’ ancestral language.

Y-chromosome haplogroups are, in those cases, useful for ascertaining a more recent origin of the population. Like the finding of certain R1a-Z645, I2a-L621 & N-L392 lineages among Hungarians shows a recent origin near the Trans-Urals forest-steppes, or the finding of I1, R1b-U106 & E1b-V13 among Visigoths shows a recent origin near the Danube, the finding of Early Slavs (ca. AD 6th-7th c.) originally with small elite groups of hg. R1a-M458 & E1b-V13 from the Lower/Middle Danube – if strengthened with more Early Slavic samples, with Slavonic partially expanding as a lingua franca in some regions – is not necessarily representative of the Proto-Slavic community, just as it is clearly not representative of the later expansion of Slavic dialects. It would be representative, though, of the same processes of acculturation repeated all over Eurasia at least since the Iron Age, where no genetic continuity can be found with ancestral languages.


Updates to ASoSaH: new maps, updated PCA, and added newest research papers


The title says it all. I have used some free time to update the series A Song of Sheep and Horses:

I basically added information from the latest papers published, which (luckily enough for me) haven’t been too many, and I have added images to illustrate certain sections.

I have updated the PCAs by including North Caucasus samples from Wang et al. (2018), whose position I could only infer for older versions from previously published PCA graphs.

PCA of ancient and modern Eurasian samples. Early Eneolithic admixture events in the steppe drawn.

I have also added to the supplementary materials the “Tip of the Iceberg” R1b tree by Mike Walsh from the FTDNA R1b group, with permission, because some relevant genetic sections are centered on the evolution of R1b lineages, and the reader can get easily lost with so many subclades.

I have also updated maps, including some of the Y-DNA ones, and managed to finish two new maps I was working on, and I added them to the supplementary materials and to the menu above:

One on Yamna kurgans in Hungary, coupled with contemporaneous sites of Baden-Boleráz or Kostolac cultures:

Map of attested Yamnaya pit-grave burials in the Hungarian plains; superimposed in shades of blue are common areas covered by floods before the extensive controls imposed in the 19th century; in orange, cumulative thickness of sand, unfavourable loamy sand layer. Marked are settlements/findings of Boleráz (ca. 3500 BC on), Baden (until ca. 2800 BC), Kostolac (precise dates unknown), and Yamna kurgans (from ca. 3100/3000 BC on).

Another one on Steppe ancestry expansion, with a tentative distribution of “steppe ancestry” divided into that of Sredni Stog/Corded Ware origin vs. that of Repin/Yamna origin, a difference that has been known for quite some time already.

It is tentative because there hasn’t been any professional study or amateur attempt to date to differentiate both “steppe ancestries” in Yamna, and especially in Bell Beakers. So much for the call of professional geneticists since 2018 (see here and here) and archaeologists since 2017 (see e.g. here and here) to distinguish fine-scale population structure to be able to follow neighbouring populations which expanded with different archaeological (and thus ethnolinguistic) groups.

Tentative map of fine-scale population structure during steppe-related expansions (ca. 3500–2000 BC), including Repin–Yamna–Bell Beaker/Balkans and Sredni Stog–Corded Ware groups. Data based on published samples and pairwise comparisons tested to date. Notice that the potential admixture of expanding Repin/Early Yamna settlers in the North Pontic area with the late Sredni Stog population (and thus Sredni Stog-related ancestry in Yamna) has been omitted for simplicity purposes, assuming thus a homogeneous Yamna vs. Corded Ware ancestry.

I think both maps are especially important today, given the current Nordicist reactionary trends arguing (yet again) for an origin of Indo-Europeans in The North™, now based on the Fearsome Tisza River hypothesis, on cephalic index values, and a few pairwise comparisons – i.e. an absolutely no-nonsense approach to the Indo-European question (LOL). At least I get to relax and sit this year out just observing how other people bury themselves and their beloved “steppe ancestry=IE” under so many new pet theories…

NOTE. Not that there is anything wrong with a northern origin of North-West Indo-European from a linguistic point of view, as I commented recently – after all, a Corded Ware origin would roughly fit the linguistic guesstimates, unlike the proposed ancestral origins in Anatolia or India. The problem is that, like many other fringe theories, it is today just based on tradition, or (even worse) ethnic, political, or personal desires, and it doesn’t make sense when all findings from disciplines involved in the Indo-European and Uralic questions are combined.

Simple ancestry percentages in modern populations. Recent image by Iain Mathieson 2019 (min. 5.57). A simplistic “Steppe ancestry” defining Indo-European speakers…? Sure.

Within 20 or 30 years, when genetic genealogists (or amateur geneticists, or however you want to call them) ask why we had the opportunity since 2015 to sample as many Hungarian Yamnaya individuals as possible and we didn’t, when it is clear that the number of unscathed kurgans is diminishing every year (from an estimated 4,000 in the 20th century, of the original tens of thousands, to fewer than 1,500 today), the answer will not be “because this or that archaeologist or linguist was a dilettante or a charlatan”, as they usually describe academics they dislike.

It will be precisely because the very same genetic genealogists – supposedly interested today in the origin of R1b-L151 and/or genetic markers associated with North-West Indo-Europeans – are obsessed with finding them anywhere else but Hungary, and prefer to use their money and time to play with a few statistical tools within a biased framework of flawed assumptions and study designs, obtaining absurd results and accepting far-fetched interpretations of them, to be told exactly what they want to hear: be it the Franco-Cantabrian homeland, the Dutch or Moravian Beaker from CWC homeland, the Maykop homeland, or the Moon homeland.

Poetic justice this heritage destruction, whose indirect causes will remain written in Internet archives for everyone to see, as a good lesson for future generations.

Ahead of the (Indo-European – Uralic) game: in theory and in numbers


There is a good reason for hope, for those who look for a happy ending to the revolution of population genomics that is quickly turning into an involution led by beliefs and personal interests. This blog is apparently one of the most read sites on Indo-European peoples, if not the most read one, and now on Uralic peoples, too.

I’ve been checking the analytics of our sites, and judging by the numbers of the English blog, (without the other languages) is quickly turning into the most visited one from Academia Prisca’s sites on Indo-European languages, beyond (and its parent sites in other languages), which host many popular files for download.

If we take into account file downloads (like images or PDFs), and not only what Google Analytics can record, has not more users than all other websites of Academia Prisca, but at this pace it will soon reach half the total visits, possibly before the end of 2019.

Overall, we have evolved from some 10,000 users/year in 2006 to ~300,000 active users/year and >1,000,000 page+file views/year in 2018 (impossible to say exactly without spending too much time on this task). Nothing out of the ordinary, I guess, and obviously numbers are not a quality index, but rather a hint at increasing popularity of the subject and of our work.

NOTE. The mean reading time is ~2:40 min, which I guess fits the length of most posts, and most visitors read a mean of ~2+ pages before leaving, with increasing reader fidelity over time.

Number of active users of, according to Google Analytics since before the start of the new blog. Notice the peaks corresponding to the posts below (except the last one, corresponding to the publication of A Song of Sheep and Horses).

The most read posts of 2018, now that we can compare those from the last quarter, are as follows:

  1. The series on the Corded Ware-Uralic theory, with a marked increase in readers, especially with the last three posts:
    1. Finno-Permic and the expansion of N-L392/Siberian ancestry,
    2. “Siberian ancestry” and Ugric-Samoyedic expansions, and
    3. Haplogroups R1a and N in Finno-Ugric and Samoyedic
  2. Haplogroup is not language, but R1b-L23 expansion was associated with Proto-Indo-Europeans
  3. The history of the simplistic ‘haplogroup R1a — Indo-European’ association
  4. On the origin of haplogroup R1b-L51 in late Repin / early Yamna settlers
  5. On the origin and spread of haplogroup R1a-Z645 from eastern Europe
  6. The Caucasus a genetic and cultural barrier; Yamna dominated by R1b-M269; Yamna settlers in Hungary cluster with Yamna
  7. Something is very wrong with models based on the so-called ‘Yamnaya admixture’ – and archaeologists are catching up (II)
  8. Olalde et al. and Mathieson et al. (Nature 2018): R1b-L23 dominates Bell Beaker and Yamna, R1a-M417 resurges in East-Central Europe during the Bronze Age
  9. Early Indo-Iranian formed mainly by R1b-Z2103 and R1a-Z93, Corded Ware out of Late PIE-speaking migrations
  10. “Steppe ancestry” step by step: Khvalynsk, Sredni Stog, Repin, Yamna, Corded Ware

NOTE. Of course, the most recent posts are the most visited ones right now, but that’s because of the constant increase in the number of visitors.

I think it is obvious what the greatest interest of readers has been in the past two years. You can see the pattern by looking at the most popular posts of 2017, when the blog took off again:

  1. Germanic–Balto-Slavic and Satem (‘Indo-Slavonic’) dialect revisionism by amateur geneticists, or why R1a lineages *must* have spoken Proto-Indo-European
  2. The renewed ‘Kurgan model’ of Kristian Kristiansen and the Danish school: “The Indo-European Corded Ware Theory”
  3. The new “Indo-European Corded Ware Theory” of David Anthony
  4. Correlation does not mean causation: the damage of the ‘Yamnaya ancestral component’, and the ‘Future American’ hypothesis
  5. The Aryan migration debate, the Out of India models, and the modern “indigenous Indo-Aryan” sectarianism

The most likely reason for the radical increase in this blog’s readership is very simple, then: people want to know what is really happening with the research on ancestral Indo-Europeans and Uralians, and other blogs and forums are not keeping up with that demand, being content with repeating the same ideas again and again (R1a-CWC-IE, R1b-BBC-Vasconic, and N-Comb Ware-Uralic), despite the growing contradictions. As you can imagine, once you have seen the Yamna -> Bell Beaker migration model of North-West Indo-European, with Corded Ware obviously representing Uralic, you can’t unsee it.

The online bullying, personal attacks, and similar childish attempts to silence those who want to talk about this theory elsewhere (while fringe theories like R1a/CHG-OIT, R1b-Vasconic, or the Anatolian/Armenian-CHG hypotheses, to name just a few, are openly discussed) have had, as could be expected, the opposite effect to what was intended. I guess you can say this blog and our projects have profited from the first relevant Streisand effect of population genomics, big time.

If this trend continues this year (and other bloggers’ or forum users’ faith in miracles is not likely to change), I suppose that after the Yamna Hungary samples are published (with the expected results) this blog is going to be the most read in 2020 by a great margin… I can only infer that this tension is also helping raise the interest in (and politicization of) the question, hence probably the overall number of active users and their participation in other blogs and forums is going to increase everywhere in 2019, too, as this debate becomes more and more heated.

So, what I infer from the most popular posts and the numbers is that people want criticism and controversy, and if you want blood you’ve got it. Here it is, my latest addition to the successful series criticizing the “Corded Ware/R1a–Indo-European” pet theories, a post I wrote two or three months ago, slightly updated with the newest comedy, and a sure success for 2019 (already added to the static pages of the menu):

The “Indo-European Corded Ware theory” doesn’t hold water

This is how I feel when I see spikes in visits with more and more returning users linked to my controversial posts 😉

Are you not entertained?! Are you not entertained?! Is this not why you are here?!

Mitogenomes from Avar nomadic elite show Inner Asian origin


Inner Asian maternal genetic origin of the Avar period nomadic elite in the 7th century AD Carpathian Basin, by Csáky et al. bioRxiv (2018).

Abstract (emphasis mine):

After 568 AD the nomadic Avars settled in the Carpathian Basin and founded their empire, which was an important force in Central Europe until the beginning of the 9th century AD. The Avar elite was probably of Inner Asian origin; its identification with the Rourans (who ruled the region of today’s Mongolia and North China in the 4th-6th centuries AD) is widely accepted in the historical research.

Here, we study the whole mitochondrial genomes of twenty-three 7th century and two 8th century AD individuals from a well-characterised Avar elite group of burials excavated in Hungary. Most of them were buried with high value prestige artefacts and their skulls showed Mongoloid morphological traits.

The majority (64%) of the studied samples’ mitochondrial DNA variability belongs to Asian haplogroups (C, D, F, M, R, Y and Z). This Avar elite group shows affinities to several ancient and modern Inner Asian populations.

The genetic results verify the historical thesis on the Inner Asian origin of the Avar elite, as not only a military retinue consisting of armed men, but an endogamous group of families migrated. This correlates well with records on historical nomadic societies where maternal lineages were as important as paternal descent.

MDS with 23 ancient populations. The Multidimensional Scaling plot is based on linearised Slatkin FST values that were calculated based on whole mitochondrial sequences (stress value is 0.1581). The MDS plot shows the connection of the Avars (AVAR) to the Central-Asian populations of the Late Iron Age (C-ASIA_LIAge) and Medieval period (C-ASIA_Medieval) along coordinate 1 and coordinate 2, which is caused by non-significant genetic distances between these populations. The European ancient populations are situated on the left part of the plot, where the Iberian (IB_EBRAge), Central-European (C-EU_BRAge) and British (BRIT_BRAge) populations from Early Bronze Age and Bronze Age are clustered along coordinate 2, while the Neolithic populations from Germany (GER_Neo), Hungary (HUN_Neo), Near-East (TUR_Neo) and Baltic region (BALT_Neo) are located on the skirt of the plot along coordinate 1. The linearised Slatkin FST values, abbreviations and references are presented in Table S4.

Interesting excerpts:

The mitochondrial genome sequences can be assigned to a wide range of the Eurasian haplogroups with dominance of the Asian lineages, which represent 64% of the variability: four samples belong to Asian macrohaplogroup C (two C4a1a4, one C4a1a4a and one C4b6); five samples to macrohaplogroup D (one by one D4i2, D4j, D4j12, D4j5a, D5b1), and three individuals to F (two F1b1b and one F1b1f). Each haplogroup M7c1b2b, R2, Y1a1 and Z1a1 is represented by one individual. One further haplogroup, M7 (probably M7c1b2b), was detected (sample AC20); however, the poor quality of its sequence data (2.19x average coverage) did not allow further analysis of this sample.

European lineages (occurring mainly among females) are represented by the following haplogroups: H (one H5a2 and one H8a1), one J1b1a1, three T1a (two T1a1 and one T1a1b), one U5a1 and one U5b1b (Table S1).
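As a quick sanity check on the 64% figure, the haplogroup counts listed in the two excerpts above can be tallied. This is only a sketch: the counts are transcribed from the quoted text (not from Table S1), and treating the low-coverage sample AC20 as part of the denominator but not of the Asian count is my own assumption about how the percentage was obtained.

```python
from collections import Counter

# Counts transcribed from the excerpts above (assumption: no lineage is missing).
asian = Counter({"C": 4, "D": 5, "F": 3, "M7c1b2b": 1, "R2": 1, "Y1a1": 1, "Z1a1": 1})
european = Counter({"H": 2, "J1b1a1": 1, "T1a": 3, "U5a1": 1, "U5b1b": 1})

# AC20 (M7, 2.19x coverage) is kept in the total but not assigned to a group.
low_coverage = 1

n_asian = sum(asian.values())
n_european = sum(european.values())
n_total = n_asian + n_european + low_coverage

asian_share = n_asian / n_total
print(f"Asian: {n_asian}/{n_total} = {asian_share:.0%}")  # Asian: 16/25 = 64%
```

Under that reading the total matches the 25 individuals mentioned in the abstract (twenty-three from the 7th century and two from the 8th).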

We detected two identical F1b1f haplotypes (AC11 female and AC12 male) and two identical C4a1a4 haplotypes (AC13 and AC15 males) from the same cemetery of Kunszállás; these matches indicate the maternal kinship of these individuals. There is no chronological difference between the female and the male from Grave 30 and 32 (AC11 and AC12), but the two males buried in Grave 28 and 52 (AC13 and AC15) are not contemporaries; they lived at least 2-3 generations apart.

Ward type clustering of 44 ancient populations. The Ward type clustering shows separation of Asian and European populations. The Avar elite group (AVAR) is situated on an Asian branch and clustered together with Central Asian populations from Late Iron Age (C-ASIA_LIAge) and Medieval period (C-ASIA_Medieval), furthermore with Xiongnu period population from Mongolia (MON_Xiongnu) and Scythians from the Altai region (E-EU_IAge_Scyth). P values are given in percent as red numbers on the dendrogram, where red rectangles indicate clusters with significant p values. The abbreviations and references are presented in Table S2.

The Avar period elite shows the lowest and non-significant genetic distances to ancient Central Asian populations dated to the Late Iron Age (Hunnic) and to the Medieval period, which is displayed on the ancient MDS plot (Fig. 4); these connections are also reflected on the haplogroup based Ward-type clustering tree (Fig. 3). Building of these large Central Asian sample pools is enabled by the small number of samples per cultural/ethnic group. Further mitogenomic data from Inner Asia are needed to specify the ancient genetic connections; however, genomic analyses are also set back by the state of archaeological research, i.e. the lack of human remains from the 4th-5th century Mongolia, which would be a particularly important region in the study of the Avar elite’s origin.

The investigated elite group from the Avar period elite also shows low genetic distances and phylogenetic connections to several Central and Inner Asian modern populations. Our results indicate that the source population of the elite group of the Avar Qaganate might have existed in Inner Asia (region of today’s Mongolia and North China) and the studied stratum of the Avars moved from there westwards towards Europe. Further genetic connections of the Avars to modern populations living to East and North of Inner Asia (Yakuts, Buryats, Tungus) probably indicate common source populations.

MDS with the 44 modern populations and the Avar elite group. The Multidimensional Scaling plot is displayed based on linearised Slatkin FST values calculated based on whole mitochondrial sequences (stress value is 0.0677). The MDS plot shows differentiation of European, Near-Eastern, Central- and East-Asian populations along coordinates 1 and 2. The Avar elite (AVAR) is located on the Asian part of plot and clustered with Uyghurs from Northwest-China (NW-CHIN_UYG) and Han Chinese (CHIN), as well as with Burusho and Hazara populations from the Central-Asian Highland (Pakistan). The linearised Slatkin FST values, abbreviations and references are presented in Table S5.

Sadly, no Y-DNA is available from this paper, although haplogroups Q, C2, or R1b (xM269) are probably to be expected, given the reported mtDNA. A replacement of the male population with subsequent migrations is obvious from the current distribution of Y-DNA haplogroups in the Carpathian Basin.

Hungarians and Corded Ware

Ancient Hungarians are important to understand the evolution, not only of Ugric, but also of Finno-Ugric peoples and their origin, since they show a genetic picture before more recent population expansions, genetic drift, and bottlenecks in eastern Europe.

By now it is evident that the migration of Magyar clans from their homeland in the Cis-Urals region (from the 4th century AD on) happened after the first waves of late and gradual expansion of N1c subclades among Finno-Ugric peoples, but before the bottlenecks seen in modern populations of eastern Europe.

In Ob-Ugric peoples, from the scarce data found in Pimenoff et al. (2018), we can see how Siberian N subclades expanded further after the separation of Magyars, evidenced by the inverted proportion of haplogroups R1a and N in modern Khantys and Mansis compared to Hungarians, and the diversity of N subclades compared to modern Fennic peoples.

Similarly to Hungarians, the situation of modern Estonians (where R1a and N subclades show approximately the same proportion, ca. 33%) is probably closer to Fennic peoples in Antiquity, not having undergone the latest strong founder effect evident in modern Finns after their expansion to the north.

Hungarian expansion from the 4th to the 10th century AD.

Modern Hungary

This is data from recent papers, summed up in Wikipedia:

  • In Semino et al. (2001) they found among 45 Palóc from Budapest and northern Hungary: 60% R1a, 13% R1b, 11% I, 9% E, 2% G, 2% J2.
  • In Csányi et al. (2008), among 100 Hungarian men, 90 of whom from the Great Hungarian Plain: 30% R1a, 15% R1b, 13% I2a1, 13% J2, 9% E1b1b1a, 8% I1, 3% G2, 3% J1, 3% I*, 1% E*, 1% F*, 1% K*. Among 97 Székelys, in Romania: 20% R1b, 19% R1a, 17% I1, 11% J2, 10% J1, 8% E1b1b1a, 5% I2a1, 5% G2, 3% P*, 1% E*, 1% N.
  • In Pamjav et al. (2011), among 230 samples expected to include 6-8% Gypsy peoples: 26% R1a, 20% I2a, 19% R1b, 7% I, 6% J2, 5% H, 5% G2a, 5% E1b1b1a1, 3% J1, <1% N, <1% R2.
  • In Pamjav et al. (2017), from the Bodrogköz population: R1a-M458 (20.4%), I2a1-P37 (19%), R1b-M343 (15%), R1a-Z280 (14.3%), E1b-M78 (10.2%), and N1c-Tat (6.2%).

NOTE. The N1c-Tat found in Bodrogköz belongs to the N1c-VL29 subgroup, more frequent among Balto-Slavic peoples, which may suggest (yet again) an initial stage of the expansion of N subclades among Finno-Ugric peoples by the time of the Hungarian migration.

This is the data from FTDNA group on Hungary (copied from a Wikipedia summary of 2017 data):

  • 26.1% R1a (15% Z280, 6.5% M458, 0.9% Z93=>S23201, 3.7% unknown)
  • 19.2% R1b (6% L11-P312/U106, 5.3% P312, 4.2% L23/Z2103, 3.7% U106)
  • 16.9% I2 (15.2% CTS10228, 1.4% M223, 0.5% L38)
  • 8.3% I1
  • 8.1% J2 (5.3% M410, 2.8% M102)
  • 6.9% E1b1b1 (6% V13, 0.3% V22, 0.3% M123, 0.3% M81)
  • 6.9% G2a
  • 3.2% N (1.4% Z9136, 0.5% M2019/VL67, 0.5% Y7310, 0.9% Z16981)- note: only unrelated males are sampled
  • 2.3% Q (1.2% YP789, 0.9% M346, 0.2% M242)
  • 0.9% T
  • 0.5% J1
  • 0.2% L
  • 0.2% C
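Since these figures are copied from a summary, it is worth checking that the subclade shares add up to the reported haplogroup totals. The sketch below uses only the numbers in the list above; the helper name `mismatches` and the 0.05-point tolerance are my own choices, and small differences most likely reflect rounding in the source rather than real errors.

```python
# (reported total %, [subclade shares %]) as given in the list above.
ftdna = {
    "R1a": (26.1, [15.0, 6.5, 0.9, 3.7]),
    "R1b": (19.2, [6.0, 5.3, 4.2, 3.7]),
    "I2": (16.9, [15.2, 1.4, 0.5]),
    "I1": (8.3, []),
    "J2": (8.1, [5.3, 2.8]),
    "E1b1b1": (6.9, [6.0, 0.3, 0.3, 0.3]),
    "G2a": (6.9, []),
    "N": (3.2, [1.4, 0.5, 0.5, 0.9]),
    "Q": (2.3, [1.2, 0.9, 0.2]),
    "T": (0.9, []),
    "J1": (0.5, []),
    "L": (0.2, []),
    "C": (0.2, []),
}


def mismatches(data, tol=0.05):
    """Return haplogroups whose subclade shares do not sum to the reported total."""
    out = {}
    for name, (total, parts) in data.items():
        if not parts:
            continue  # no subclade breakdown given, nothing to check
        diff = sum(parts) - total
        if abs(diff) > tol:
            out[name] = round(diff, 1)
    return out


grand_total = sum(total for total, _ in ftdna.values())
print(round(grand_total, 1))  # overall share covered by the list
print(mismatches(ftdna))      # haplogroups with inconsistent breakdowns
```

With the figures as listed, the totals cover 99.7% of the samples, and only I2 and N show a small discrepancy between the total and its breakdown.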

R1a-Z280 stands out in FTDNA (which we have to assume has no geographic preference among modern Hungarians), while R1a-M458 is prevalent in the north, which probably points to its relationship with (at least West) Slavic populations.

Ancient Hungarians

We already knew that Hungarians show similarities with Srubna and Hunnic peoples, and this paper shows a good reason for the similarities with the Huns.

Also, recent population movements in the region (before the Avars) probably increased the proportion of R1b-L23 and I1 subclades (related to Roman and Germanic peoples) as well as possibly R1a-Z283 (mainly M458, related to the expansion of Slavs). From Understanding 6th-century barbarian social organization and migration through paleogenomics, by Amorim et al. (2018):

Y-chromosome haplogroup attribution for 37 medieval and 1 Bronze age individuals.

NOTE. The sample SZ15, of haplogroup R1a1a1b1a3a (S200), belongs to the Germanic branch Z284, which has a completely different history with its integration into the Nordic Bronze Age community.

Interesting is the Szólád Bronze Age sample of R1a1a1b2a2a (Z2123) subclade (ca. 2100-1700 BC), which is possibly the same haplogroup found in King Béla III [Z93+ (80.6%), Z2123+ (10.8%)]*. Nevertheless, Z2123 refers to an upper clade, found also in East Andronovo sites in Narasimhan et al. (2018), as well as in the modern population of the Tarim Basin.

NOTE. For more on the analysis of probability of the actual subclade, see here.

Bronze Age R1a-Z93 samples of central-east Europe – like the Balkans BA sample (ca. 1750-1625 BC) from Merichleri, of R1a1a1b2 subclade – correspond most likely to the expansion of Iranian-speaking peoples in the early 2nd millennium BC, probably to the westward expansion of the Srubna culture.

The specific subclade of King Béla III, on the other hand, probably corresponds to the more recent expansion of Magyar tribes settled in the region during the 9th century AD, so the specific subclade must have separated from those found in central-east Europe and in Andronovo during the Corded Ware expansion.

Modified image, from Underhill et al. (2015). Spatial frequency distributions of Z282 (green) and Z93 (blue) affiliated haplogroups. Notice the potential Finno-Ugric-associated distribution of Z282 (including M558, a Z280 subclade) according to ancient maps; the northern Eurasian finds of Z2125 (upper clade of Z2123); and the potential of M458 subclades representing a west-east expansion of Balto-Slavic as a western outgroup of an original Finno-Ugric population, equivalent to Z284 in Scandinavia.

The study by Csányi et al. (2008), where the Tat C allele was found in 2 of 4 ancient samples, showed thus a potential 50:50 relationship of N1c in ancient Magyars, which is striking given the modern 1-3% a mere 1,000 years later, without any relevant population movement in between. This result remains to be reproduced with the current technology.
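To get a feeling for how little 2 out of 4 samples can tell us, here is a minimal sketch of a 95% Wilson score interval for that proportion. It assumes (generously) that the four ancient samples can be treated as a random draw from the ancient Magyar population; the function name `wilson_interval` is mine, not from any of the cited papers.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Tat C allele found in 2 of 4 ancient samples (Csányi et al. 2008).
low, high = wilson_interval(2, 4)
print(f"{low:.2f}-{high:.2f}")  # 0.15-0.85
```

An interval running from roughly 15% to 85% is too wide to support any precise estimate of the ancient frequency, which is one more reason to wait for the result to be reproduced with larger samples.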

In fact, recent studies of ancient Magyars, from the 10th to the 12th century, have not shown any N1c sample, and have confirmed instead the ancient presence of R1a (two other samples, interred near Béla III), R1b (four samples), I2a (two samples), J1, and E1b, a mixed genetic picture which is more in line with what is expected.

So the question that I recently posed about east Corded Ware groups remains open: were Proto-Ugric peoples mainly of R1a-Z282 or R1a-Z93 subclades? Without ancient DNA from Middle Dnieper, Fatyanovo, Afanasevo, and the succeeding cultures (like Netted Ware) in north-eastern Europe, it is difficult to say.

It is very likely that they are going to show mainly a mixture of both R1a-Z282 and R1a-Z93 lineages, with later populations showing a higher proportion of R1a-Z280 subclades. Whether this mixture happened already during the Corded Ware period, or is the result of later developments, is still unknown. What is certain is that Hungarian N1a1a1a-L708 subclades belong to more recent additions of Siberian haplogroups to the Ugric stock, probably during the Iron Age, just centuries before the Magyar expansion.


Mitogenomes show Longobard migration was socially stratified and included females


New bioRxiv preprint A genetic perspective on Longobard-Era migrations, by Vai et al. (2018).

Interesting excerpts (emphasis mine):

In this study we sequenced complete mitochondrial genomes from nine early-medieval cemeteries located in the Czech Republic, Hungary and Italy, for a total of 87 individuals. In some of these cemeteries, a portion of the individuals are buried with cultural markers in these areas traditionally associated with the Longobard culture (hereby we refer to these cemeteries as LC), as opposed to burial communities in which no artifacts or rituals associated by archaeologists to Longobard culture have been found in any graves. These necropolises, hereby referred as NLC, may represent local communities or other Barbaric groups previously migrated to this region. This extended sampling strategy provides an excellent condition to investigate the degree of genetic affinity between coeval LC and NLC burials, and to shed light on early-medieval dynamics in Europe.

Geographical and genetic relationship between the newly sequenced individuals. (A) Location of the sampled necropolises. Here and through the other figures LC cemeteries are represented by a circle while NLC ones are indicated by a square. (C) DAPC Scatterplot of the most supported K (7) highlighted by the kmeans analysis.

Social rank

There is also no clear geographical structure between samples in our dataset, with individuals from Italy, Hungary and Czech Republic clustering together. However, the first PC clearly separates a group of 12 LC individuals found at Szólád, Collegno and Mušov from a group composed by both LC and NLC individuals. The same pattern is also found when pairwise differences among individuals are plotted by multidimensional scaling (…)

The presence in this group of LC sequences belonging to macrohaplogroups I and W, commonly found at high frequencies in northern Europe (e.g. Finland 32), suggests (although certainly does not prove) the existence of a possible link between these 12 LC individuals and northern Europe. The peculiarity of this group is strengthened by archaeological information from the Szólád cemetery, where 8 of the 12 individuals in this group originated, indicating that all these samples were found buried with typical Longobard artifacts and grave assemblages. We do not find the same tight association for the 3 samples from Collegno, where the 3 graves are indeed devoid of evident Germanic cultural markers; however they are not placed in a separate and marginal location—as for the tombs without grave goods found in Szólád —but among graves with wooden chambers and weapons. It is worth noting that weapon burials were quite scarce in 5th century Pannonia and 6th century Italy (e.g. Goths never buried weapons), and an increase in weapon burials started in Italy only after the Longobard migration. In this light, the individuals buried in this manner may have been members of the same community as well, but belonging to the lowest social level. This social condition could explain the absence of artifacts and could be related to mixed marriages, whose offspring had an inferior social rank. Finally, this group also includes an individual from the Musov graveyard. This finding is particularly interesting in light of the fact that the Musov necropolis has been only tentatively associated with Longobard occupation (see Supplementary Text for details), based on the presence of but a few archaeological markers.

Female migration

We hence estimated that about 70% of the lineages found in Collegno actually derived from the Hungarian LC groups, in agreement with previous archaeological and historical hypotheses. This supports the idea that the spread of Longobards into Italy actually involved movements of fairly large numbers of people, who gave a substantial contribution to the gene pool of the resulting populations. This is even more remarkable thinking that, in many studied cases, military invasions are movements of males, and hence do not have consequences at the mtDNA level. Here, instead, we have evidence of changes in the composition of the mtDNA pool of an Italian population, supporting the view that immigration from Central Europe involved females as well as males.