Yamnaya ancestry: mapping the Proto-Indo-European expansions


The latest papers from Ning et al. Cell (2019) and Anthony JIES (2019) have offered some interesting new data, supporting once more what could be inferred since 2015, and what was evident in population genomics since 2017: that Proto-Indo-Europeans expanded under R1b bottlenecks, and that the so-called “Steppe ancestry” referred to two different components, one (Yamnaya or Steppe_EMBA ancestry) expanding with Proto-Indo-Europeans, and the other (Corded Ware or Steppe_MLBA ancestry) expanding with Uralic speakers.

The following maps are based on formal stats published in the papers and supplementary materials from 2015 until today, mainly on Wang et al. (2018 & 2019), Mathieson et al. (2018) and Olalde et al. (2018), and others like Lazaridis et al. (2016), Lazaridis et al. (2017), Mittnik et al. (2018), Lamnidis et al. (2018), Fernandes et al. (2018), Jeong et al. (2019), Olalde et al. (2019), etc.

NOTE. As in the Corded Ware ancestry maps, the selected reports in this case are centered on the prototypical Yamnaya ancestry vs. other simplified components, so everything else refers to simplistic ancestral components widespread across populations that do not necessarily share any recent connection, much less a language. In fact, most of the time they clearly didn’t. They can be interpreted as “EHG that is not part of the Yamnaya component”, or “CHG that is not part of the Yamnaya component”. They can’t be read as “expanding EHG people/language” or “expanding CHG people/language”, at least no more than maps of “Steppe ancestry” can be read as “expanding Steppe people/language”. Also, remember that I have left the default behaviour for color classification, so that the highest value (i.e. 1, or white colour) could mean anything from 10% to 100% depending on the specific ancestry and period; that’s what the legend is for… But, fere libenter homines id quod volunt credunt.
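The caveat about colour classification can be illustrated with a minimal sketch. It assumes that the default classification simply rescales each map by its own maximum, which is an assumption made for illustration and not a description of the mapping software actually used. The site labels and proportions are hypothetical.

```python
def scale_to_map_maximum(values):
    """Rescale ancestry proportions so the largest value on the map becomes 1.

    Assumption: the colour ramp is stretched per map, so 1 (white) is always
    the local maximum, whatever its absolute value.
    """
    top = max(values.values())
    if top <= 0:
        raise ValueError("at least one proportion must be positive")
    return {label: value / top for label, value in values.items()}


# Hypothetical ancestry proportions (fractions) on two different maps.
map_a = {"site_1": 0.05, "site_2": 0.10}  # maximum is only 10%
map_b = {"site_1": 0.50, "site_2": 1.00}  # maximum is 100%

scaled_a = scale_to_map_maximum(map_a)
scaled_b = scale_to_map_maximum(map_b)

# Both maps paint site_2 with the top colour, although the underlying
# proportions differ tenfold; only the legend tells them apart.
print(scaled_a)
print(scaled_b)
```

Under this assumption two maps can look identical while their absolute values differ by an order of magnitude, so colours should not be compared between maps without checking each legend.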


  1. Neolithic or the formation of Early Indo-European
  2. Eneolithic or the expansion of Middle Proto-Indo-European
  3. Chalcolithic / Early Bronze Age or the expansion of Late Proto-Indo-European
  4. European Early Bronze Age and MLBA or the expansion of Late PIE dialects

1. Neolithic

Anthony (2019) agrees with the most likely explanation of the CHG component found in Yamnaya, as derived from steppe hunter-fishers close to the lower Volga basin. The ultimate origin of this specific CHG-like component that eventually formed part of the Pre-Yamnaya ancestry is not clear, though:

The hunter-fisher camps that first appeared on the lower Volga around 6200 BC could represent the migration northward of un-admixed CHG hunter-fishers from the steppe parts of the southeastern Caucasus, a speculation that awaits confirmation from aDNA.

Natural neighbor interpolation of CHG ancestry among Neolithic populations. See full map.

The typical EHG component that eventually formed part of Pre-Yamnaya ancestry came from the Middle Volga Basin, most likely close to the Samara region, as shown by the sampled Samara hunter-gatherer (ca. 5600-5500 BC):

After 5000 BC domesticated animals appeared in these same sites in the lower Volga, and in new ones, and in grave sacrifices at Khvalynsk and Ekaterinovka. CHG genes and domesticated animals flowed north up the Volga, and EHG genes flowed south into the North Caucasus steppes, and the two components became admixed.

Natural neighbor interpolation of EHG ancestry among Neolithic populations. See full map.

To the west, in the Dnieper-Dniester area, WHG became the dominant ancestry after the Mesolithic, at the expense of EHG, revealing a likely mating network reaching to the north into the Baltic:

Like the Mesolithic and Neolithic populations here, the Eneolithic populations of Dnieper-Donets II type seem to have limited their mating network to the rich, strategic region they occupied, centered on the Rapids. The absence of CHG shows that they did not mate frequently if at all with the people of the Volga steppes (…)

Natural neighbor interpolation of WHG ancestry among Neolithic populations. See full map.

North-West Anatolia Neolithic ancestry, typical of expanding Early European farmers, is found up to the border of the Dniester, as Anthony (2007) had predicted.

Natural neighbor interpolation of Anatolia Neolithic ancestry among Neolithic populations. See full map.

2. Eneolithic

From Anthony (2019):

After approximately 4500 BC the Khvalynsk archaeological culture united the lower and middle Volga archaeological sites into one variable archaeological culture that kept domesticated sheep, goats, and cattle (and possibly horses). In my estimation, Khvalynsk might represent the oldest phase of PIE.

(…) this middle Volga mating network extended down to the North Caucasian steppes, where at cemeteries such as Progress-2 and Vonyuchka, dated 4300 BC, the same Khvalynsk-type ancestry appeared, an admixture of CHG and EHG with no Anatolian Farmer ancestry, with steppe-derived Y-chromosome haplogroup R1b. These three individuals in the North Caucasus steppes had higher proportions of CHG, overlapping Yamnaya. Without any doubt, a CHG population that was not admixed with Anatolian Farmers mated with EHG populations in the Volga steppes and in the North Caucasus steppes before 4500 BC. We can refer to this admixture as pre-Yamnaya, because it makes the best currently known genetic ancestor for EHG/CHG R1b Yamnaya genomes.

From Wang et al. (2019):

Three individuals from the sites of Progress 2 and Vonyuchka 1 in the North Caucasus piedmont steppe (‘Eneolithic steppe’), which harbour EHG and CHG related ancestry, are genetically very similar to Eneolithic individuals from Khvalynsk II and the Samara region. This extends the cline of dilution of EHG ancestry via CHG-related ancestry to sites immediately north of the Caucasus foothills

Natural neighbor interpolation of Pre-Yamnaya ancestry among Neolithic populations. See full map. This map corresponds roughly to the map of Khvalynsk-Novodanilovka expansion, and in particular to the expansion of horse-head pommel-scepters (read more about Khvalynsk, and specifically about horse symbolism)

NOTE. Unpublished samples from Ekaterinovka have been previously reported as within the R1b-L23 tree. Interestingly, although the Varna outlier is a female, the Balkan outlier from Smyadovo shows two positive SNP calls for hg. R1b-M269. However, its poor coverage makes its most conservative haplogroup prediction R-M343.

The formation of this Pre-Yamnaya ancestry sets this Volga-Caucasus Khvalynsk community apart from the rest of the EHG-like population of eastern Europe.

Natural neighbor interpolation of non-Pre-Yamnaya EHG ancestry among Eneolithic populations. See full map.

Anthony (2019) seems to rely on ADMIXTURE graphics when he writes that the late Sredni Stog sample from Alexandria shows “80% Khvalynsk-type steppe ancestry (CHG&EHG)”. While this seems the most logical conclusion of what might have happened after the Suvorovo-Novodanilovka expansion through the North Pontic steppes (see my post on “Steppe ancestry” step by step), formal stats have not confirmed that.

In fact, analyses published in Wang et al. (2019) rejected that Corded Ware groups are derived from this Pre-Yamnaya ancestry, a reality that had already been hinted at in Narasimhan et al. (2018), when Steppe_EMBA showed a poor fit for expanding Srubna-Andronovo populations. Hence the need to consider the whole CHG component of the North Pontic area separately:

Natural neighbor interpolation of non-Pre-Yamnaya CHG ancestry among Eneolithic populations. See full map. You can read more about population movements in the late Sredni Stog and closer to the Proto-Corded Ware period.

NOTE. Fits for WHG + CHG + EHG in Neolithic and Eneolithic populations are taken in part from Mathieson et al. (2018) supplementary materials (download Excel here). Unfortunately, while data on the Ukraine_Eneolithic outlier from Alexandria abounds, I don’t have specific data on the so-called ‘outlier’ from Dereivka compared to the other two analyzed together, so these maps of CHG and EHG expansion are possibly showing a lesser distribution to the west than the real one ca. 4000-3500 BC.

Natural neighbor interpolation of WHG ancestry among Eneolithic populations. See full map.

Anatolia Neolithic ancestry clearly spread to the east into the north Pontic area through a Middle Eneolithic mating network, most likely opened after the Khvalynsk expansion:

Natural neighbor interpolation of Anatolia Neolithic ancestry among Eneolithic populations. See full map.
Natural neighbor interpolation of Iran Chl. ancestry among Eneolithic populations. See full map.

Regarding Y-chromosome haplogroups, Anthony (2019) insists on the evident association of Khvalynsk, Yamnaya, and the spread of Pre-Yamnaya and Yamnaya ancestry with the expansion of elite R1b-L754 (and some I2a2) individuals:

Y-DNA haplogroups in West Eurasia during the Early Eneolithic in the Pontic-Caspian steppes. See full map, and see culture, ADMIXTURE, Y-DNA, and mtDNA maps of the Early Eneolithic and Late Eneolithic.

3. Early Bronze Age

Data from Wang et al. (2019) show that Corded Ware-derived populations do not have good fits for Eneolithic_Steppe-like ancestry, no matter the model. In other words: Corded Ware populations show not only a higher contribution of Anatolia Neolithic ancestry (ca. 20-30% compared to the ca. 2-10% of Yamnaya); they show a different EHG + CHG combination compared to the Pre-Yamnaya one.

Supplementary Table 13. P values of rank=2 and admixture proportions in modelling Steppe ancestry populations as a three-way admixture of Eneolithic steppe, Anatolian_Neolithic and WHG using 14 outgroups.
Left populations: Test, Eneolithic_steppe, Anatolian_Neolithic, WHG.
Right populations: Mbuti.DG, Ust_Ishim.DG, Kostenki14, MA1, Han.DG, Papuan.DG, Onge.DG, Villabruna, Vestonice16, ElMiron, Ethiopia_4500BP.SG, Karitiana.DG, Natufian, Iran_Ganj_Dareh_Neolithic.

Yamnaya Kalmykia and Afanasievo show the closest fits to the Eneolithic population of the North Caucasian steppes, thus rejecting sizeable contributions from Anatolia Neolithic and/or WHG, as shown by the SD values. Both therefore probably show a Pre-Yamnaya ancestry closest to the late Repin population.

Modelling results for the Steppe and Caucasus cluster. Admixture proportions based on (temporally and geographically) distal and proximal models, showing additional AF ancestry in Steppe groups and additional gene flow from the south in some of the Steppe groups as well as the Caucasus groups. See tables above. Modified from Wang et al. (2019). Within a blue square, Yamnaya-related groups; within a cyan square, Corded Ware-related groups. Green background behind best p-values. In red circle, SD of AF/WHG ancestry contribution in Afanasevo and Yamnaya Kalmykia, with ranges that almost include 0%.

EBA maps include data from Wang et al. (2018) supplementary materials, specifically unpublished Yamnaya samples from Hungary that appeared in analysis of the preprint, but which were taken out of the definitive paper. Their location among Yamnaya settlers from Hungary is speculative, although most uncovered kurgans in Hungary are concentrated in the Tisza-Danube interfluve.

Natural neighbor interpolation of Pre-Yamnaya ancestry among Early Bronze Age populations. See full map. This map corresponds roughly with the known expansion of late Repin/Yamnaya settlers.

The Y-chromosome bottleneck of elite males from Proto-Indo-European clans under R1b-L754 and some I2a2 subclades, already visible in the Khvalynsk sampling, became even more noticeable in the subsequent expansion of late Repin/early Yamnaya elites under R1b-L23 and I2a-L699:

Y-DNA haplogroups in West Eurasia during the Yamnaya expansion. See full map and maps of cultures, ADMIXTURE, Y-DNA, and mtDNA of the Early Chalcolithic and Yamnaya Hungary.

Maps of CHG, EHG, Anatolia Neolithic, and probably WHG show the expansion of these components among Corded Ware-related groups in North Eurasia, apart from other cultures close to the Caucasus:

NOTE. For maps with actual formal stats of Corded Ware ancestry from the Early Bronze Age to the modern times, you can read the post Corded Ware ancestry in North Eurasia and the Uralic expansion.

Natural neighbor interpolation of non-Pre-Yamnaya CHG ancestry among Early Bronze Age populations. See full map.
Natural neighbor interpolation of non-Pre-Yamnaya EHG ancestry among Early Bronze Age populations. See full map.
Natural neighbor interpolation of WHG ancestry among Early Bronze Age populations. See full map.
Natural neighbor interpolation of Anatolia Neolithic ancestry among Early Bronze Age populations. See full map.
Natural neighbor interpolation of Iran Chl. ancestry among Early Bronze Age populations. See full map.

4. Middle to Late Bronze Age

The following maps show the most likely distribution of Yamnaya ancestry during the Bell Beaker-, Balkan-, and Sintashta-Potapovka-related expansions.

4.1. Bell Beakers

The amount of Yamnaya ancestry is probably overestimated among populations where Bell Beakers replaced Corded Ware. A map of Yamnaya ancestry among Bell Beakers gets trickier for the following reasons:

  • Expanding Repin peoples of Pre-Yamnaya ancestry must have had admixture through exogamy with late Sredni Stog/Proto-Corded Ware peoples during their expansion into the North Pontic area, and Sredni Stog in turn had probably some Pre-Yamnaya admixture, too (although they don’t appear in the simplistic formal stats above). This is supported by the increase of Anatolia farmer ancestry in more western Yamnaya samples.
  • Later, Yamnaya admixed through exogamy with Corded Ware-like populations in Central Europe during their expansion. Even samples from the Middle to Upper Danube and around the Lower Rhine will probably show increasing contributions of Steppe_MLBA, at the same time as they show an increasing proportion of EEF-related ancestry.
  • To complicate things further, the late Corded Ware Esperstedt family (from ca. 2500 BC or later) shows, in turn, what seems like a recent admixture with Yamnaya vanguard groups, with the sample of highest Yamnaya ancestry being the paternal uncle of other individuals (all of hg. R1a-M417), suggesting that there might have been many similar Central European mating networks from the mid-3rd millennium BC on, of (mainly) Yamnaya-like R1b elites displaying a small proportion of CW-like ancestry admixing through exogamy with Corded Ware-like peoples who already had some Yamnaya ancestry.

Natural neighbor interpolation of Yamnaya ancestry among Middle to Late Bronze Age populations (Esperstedt CWC site close to BK_DE, label is hidden by BK_DE_SAN). See full map. You can see how this map correlates with the map of Late Copper Age migrations and the Yamnaya into Bell Beaker expansion.

NOTE. Terms like “exogamy”, “male-driven migration”, and “sex bias” are not based only on the Y-chromosome bottlenecks visible in the different cultural expansions since the Palaeolithic. Despite the scarce sampling available in 2017 for analysis of “Steppe ancestry”-related populations, it already appeared to show a male sex bias in Goldberg et al. (2017), and it has been confirmed for Neolithic and Copper Age population movements in Mathieson et al. (2018), see Supplementary Table 5. The analysis of male-biased expansion of “Steppe ancestry” in CWC Esperstedt and Bell Beaker Germany is, for the reasons stated above, not very useful to distinguish their mutual influence, though.

Based on data from Olalde et al. (2019), Bell Beakers from Germany are the closest sampled ones to expanding East Bell Beakers, and those close to the Rhine – i.e. French, Dutch, and British Beakers in particular – show a clear excess “Steppe ancestry” due to their exogamy with local Corded Ware groups:

Only one 2-way model fits the ancestry in Iberia_CA_Stp with P-value>0.05: Germany_Beaker + Iberia_CA. Finding a Bell Beaker-related group as a plausible source for the introduction of steppe ancestry into Iberia is consistent with the fact that some of the individuals in the Iberia_CA_Stp group were excavated in Bell Beaker associated contexts. Models with Iberia_CA and other Bell Beaker groups such as France_Beaker (P-value=7.31E-06), Netherlands_Beaker (P-value=1.03E-03) and England_Beaker (P-value=4.86E-02) failed, probably because they have slightly higher proportions of steppe ancestry than the true source population.
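The fit criterion in this excerpt (P-value > 0.05) can be checked against the three P-values it reports. This is only a minimal sketch of the threshold comparison, not a reimplementation of qpAdm; the P-value for the Germany_Beaker model is not given in the excerpt, so it is left out rather than guessed.

```python
P_VALUE_THRESHOLD = 0.05  # criterion quoted in the excerpt: P-value > 0.05

# P-values quoted above for two-way models Iberia_CA + <Beaker group>.
reported_p_values = {
    "France_Beaker": 7.31e-06,
    "Netherlands_Beaker": 1.03e-03,
    "England_Beaker": 4.86e-02,
}


def model_fits(p_value, threshold=P_VALUE_THRESHOLD):
    """Return True when a model is not rejected at the given threshold."""
    return p_value > threshold


failed = sorted(
    name for name, p_value in reported_p_values.items() if not model_fits(p_value)
)
print(failed)
```

Note that England_Beaker fails only narrowly (0.0486 against 0.05), while France_Beaker is rejected by several orders of magnitude.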


The exogamy with Corded Ware-like groups in the Lower Rhine Basin seems at this point undeniable, as is the origin of Bell Beakers around the Middle-Upper Danube Basin from Yamnaya Hungary.

To avoid this excess “Steppe ancestry” showing up in the maps, since Bell Beakers from Germany pack the most Yamnaya ancestry among East Bell Beakers outside Hungary (ca. 51.1% “Steppe ancestry”), I equated this maximum with BK_Scotland_Ach (which shows ca. 61.1% “Steppe ancestry”, highest among western Beakers), and applied a simple rule of three for “Steppe ancestry” in Dutch and British Beakers.
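One reading of this rule of three, sketched here under that assumption, is a plain proportional rescaling: the 61.1% of BK_Scotland_Ach is mapped onto the 51.1% ceiling of Bell Beakers from Germany, and other Dutch and British values are scaled by the same factor. The function name and the 50.0% input are hypothetical.

```python
GERMANY_BEAKER_MAX = 51.1  # % "Steppe ancestry" in Bell Beakers from Germany (from the text)
SCOTLAND_ACH_RAW = 61.1    # % "Steppe ancestry" in BK_Scotland_Ach (from the text)


def rescale_steppe_ancestry(raw_percent):
    """Rule of three: SCOTLAND_ACH_RAW corresponds to GERMANY_BEAKER_MAX,
    so raw_percent corresponds to raw_percent * 51.1 / 61.1."""
    return raw_percent * GERMANY_BEAKER_MAX / SCOTLAND_ACH_RAW


# The reference sample lands exactly on the ceiling.
print(round(rescale_steppe_ancestry(61.1), 1))

# A hypothetical Dutch or British Beaker group with 50.0% raw "Steppe ancestry".
print(round(rescale_steppe_ancestry(50.0), 1))
```

The scaling factor is 51.1/61.1, roughly 0.84, so every adjusted value drops by about a sixth relative to the published one.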

NOTE. Formal stats for “Steppe ancestry” in Bell Beaker groups are available in Olalde et al. (2018) supplementary materials (PDF). I didn’t apply this adjustment to Bk_FR groups because of the R1b Bell Beaker sample from the Champagne/Alsace region reported by Samantha Brunel that will pack more Yamnaya ancestry than any other sampled Beaker to date, hence probably driving the Yamnaya ancestry up in French samples.

The most likely outcome in the following years, when Yamnaya and Corded Ware ancestry are investigated separately, is that Yamnaya ancestry will be much lower the farther away from the Middle and Lower Danube region, similar to the case in Iberia, so the map above probably overestimates this component in most Beakers to the north of the Danube. Even the late Hungarian Beaker samples, who pack the highest Yamnaya ancestry (up to 75%) among Beakers, represent likely a back-migration of Moravian Beakers, and will probably show a contribution of Corded Ware ancestry due to the exogamy with local Moravian groups.

Despite this decreasing admixture as Bell Beakers spread westward, the explosive expansion of Yamnaya R1b male lineages (in the words of David Reich) and the radical replacement of local ones, whether derived from Corded Ware or Neolithic groups, show the true extent of the North-West Indo-European expansion in Europe:

Y-DNA haplogroups in West Eurasia during the Bell Beaker expansion. See full map and see maps of cultures, ADMIXTURE, Y-DNA, and mtDNA of the Late Copper Age and of the Yamnaya-Bell Beaker transition.

4.2. Palaeo-Balkan

There is scarce data on Palaeo-Balkan movements yet, although it is known that:

  1. Yamnaya ancestry appears among Mycenaeans, with the Yamnaya Bulgaria sample being its best current ancestral fit;
  2. the emergence of steppe ancestry and R1b-M269 in the eastern Mediterranean was associated with Ancient Greeks;
  3. Thracians, Albanians, and Armenians also show R1b-M269 subclades and “Steppe ancestry”.

4.3. Sintashta-Potapovka-Filatovka

Interestingly, Potapovka is the only Corded Ware-derived culture that shows good fits for Yamnaya ancestry, despite having replaced Poltavka in the region under the same Corded Ware-like (Abashevo) influence as Sintashta.

This proves that there was a period of admixture in the Pre-Proto-Indo-Iranian community between CWC-like Abashevo and Yamnaya-like Catacomb-Poltavka herders in the Sintashta-Potapovka-Filatovka community, probably more easily detectable in this group because of the specific temporal and geographic sampling available.

Supplementary Table 14. P values of rank=3 and admixture proportions in modelling Steppe ancestry populations as a four-way admixture of distal sources EHG, CHG, Anatolian_Neolithic and WHG using 14 outgroups.
Left populations: Steppe cluster, EHG, CHG, WHG, Anatolian_Neolithic
Right populations: Mbuti.DG, Ust_Ishim.DG, Kostenki14, MA1, Han.DG, Papuan.DG, Onge.DG, Villabruna, Vestonice16, ElMiron, Ethiopia_4500BP.SG, Karitiana.DG, Natufian, Iran_Ganj_Dareh_Neolithic.

Srubnaya ancestry shows a best fit with non-Pre-Yamnaya ancestry, i.e. with different CHG + EHG components – possibly because the more western Potapovka (ancestral to Proto-Srubnaya Pokrovka) also showed good fits for it. Srubnaya shows poor fits for Pre-Yamnaya ancestry probably because Corded Ware-like (Abashevo) genetic influence increased during its formation.

On the other hand, more eastern Corded Ware-derived groups like Sintashta and its more direct offshoot Andronovo show poor fits with this model, too, but their fits are still better than those including Pre-Yamnaya ancestry.

Natural neighbor interpolation of non-Pre-Yamnaya EHG ancestry among Middle to Late Bronze Age populations. See full map.
Natural neighbor interpolation of non-Pre-Yamnaya CHG ancestry among Middle to Late Bronze Age populations. See full map.
Natural neighbor interpolation of Anatolia Neolithic ancestry among Middle to Late Bronze Age populations. See full map.
Natural neighbor interpolation of Iran Chl. ancestry among Middle to Late Bronze Age populations. See full map.

NOTE. For maps with actual formal stats of Corded Ware ancestry from the Early Bronze Age to the modern times, you should read the post Corded Ware ancestry in North Eurasia and the Uralic expansion instead.

The bottleneck of Proto-Indo-Iranians under R1a-Z93 was not yet complete by the time the Sintashta-Potapovka-Filatovka community expanded with the Srubna-Andronovo horizon:

Y-DNA haplogroups in West Eurasia during the European Early Bronze Age. See full map and see maps of cultures, ADMIXTURE, Y-DNA, and mtDNA of the Early Bronze Age.

4.4. Afanasevo

At the end of the Afanasevo culture, at least three samples show hg. Q1a2-M25 (ca. 2900-2500 BC), which seemed to point to a resurgence of local lineages, despite continuity of the prototypical Pre-Yamnaya ancestry. On the other hand, Anthony (2019) makes this cryptic statement:

Yamnaya men were almost exclusively R1b, and pre-Yamnaya Eneolithic Volga-Caspian-Caucasus steppe men were principally R1b, with a significant Q1a minority.

Since the only available samples from the Khvalynsk community are R1b (x3), Q1a (x1), and R1a (x1), it seems strange that Anthony would talk about a “significant minority”, unless Q1a pops up in some more individuals among the ca. 30 new ones to be published. Because he also mentions I2a2 as appearing in one elite burial, it seems Q1a (like R1a-M459) will not appear under elite kurgans, although it is still possible that hg. Q1a was involved in the expansion of Afanasevo to the east.

Y-DNA haplogroups in West Eurasia during the Middle Bronze Age. See full map and see maps of cultures, ADMIXTURE, Y-DNA, and mtDNA of the Middle Bronze Age and the Late Bronze Age.

Okunevo, which replaced Afanasevo in the Altai region, shows a majority of hg. Q1a2-M25, and at least one Q1a1-B284, but also some R1b-M269 samples typical of Afanasevo, suggesting partial genetic continuity.

NOTE. Other sampled Siberian populations clearly show a variety of Q subclades that likely expanded during the Palaeolithic, such as Baikal EBA samples from Ust’Ida and Shamanka with a majority of Q1a2-M25 (in particular Q1a2-L712), and hg. Q reported from Elunino, Sagsai, Khövsgöl, and also among peoples of the Srubna-Andronovo horizon (the Krasnoyarsk MLBA outlier), and in Karasuk. Q1a-M25 was earlier found in a Baltic hunter-gatherer, which supports a widespread distribution of Q1a2 and Q1a1 in North Eurasia during the Neolithic and Bronze Age.

From Damgaard et al. Science (2018):

(…) in contrast to the lack of identifiable admixture from Yamnaya and Afanasievo in the CentralSteppe_EMBA, there is an admixture signal of 10 to 20% Yamnaya and Afanasievo in the Okunevo_EMBA samples, consistent with evidence of western steppe influence. This signal is not seen on the X chromosome (qpAdm P value for admixture on X 0.33 compared to 0.02 for autosomes), suggesting a male-derived admixture, also consistent with the fact that 1 of 10 Okunevo_EMBA males carries a R1b1a2a2 Y chromosome related to those found in western pastoralists. In contrast, there is no evidence of western steppe admixture among the more eastern Baikal region Bronze Age (~2200 to 1800 BCE) samples.

This Yamnaya ancestry has also recently been found to be the best fit for the Iron Age population of Shirenzigou in Xinjiang (where Tocharian languages were attested centuries later) despite the haplogroup diversity acquired during their evolution, likely through an intermediate Chemurchek culture (see a recent discussion on the elusive Proto-Tocharians).

Haplogroup diversity seems to be common in Iron Age populations all over Eurasia, most likely due to the spread of different types of sociopolitical structures where alliances played a more relevant role in the expansion of peoples. A well-known example of this is the spread of Akozino warrior-traders in the whole Baltic region under a partial N1a-VL29-bottleneck associated with the emerging chiefdom-based systems under the influence of expanding steppe nomads.

Y-DNA haplogroups in West Eurasia during the Early Iron Age. See full map and see maps of cultures, ADMIXTURE, Y-DNA, and mtDNA of the Early Iron Age and Late Iron Age.

Surprisingly, then, Proto-Tocharians from Shirenzigou pack up to 74% Yamnaya ancestry, in spite of the 2,000 years that separate them from the demise of the Afanasevo culture. They show more Yamnaya ancestry than any other population by that time, being thus a sort of Late PIE fossils not only in their archaic dialect, but also in their genetic profile:


The recent intrusion of Corded Ware-like ancestry, as well as the variable admixture with Siberian and East Asian populations, both point to the known intense Old Iranian and Old/Middle Chinese contacts. The scarce Proto-Samoyedic and Proto-Turkic loans in Tocharian suggest a rather loose, probably more distant connection with East Uralic and Altaic peoples from the forest-steppe and steppe areas to the north (read more about external influences on Tocharian).

Interestingly, both R1b samples, MO12 and M15-2 – likely of Asian R1b-PH155 branch – show a best fit for Andronovo/Srubna + Hezhen/Ulchi ancestry, suggesting a likely connection with Iranians to the east of Xinjiang, who later expanded as the Wusun and Kangju. How they might have been related to Huns and Xiongnu individuals, who also show this haplogroup, is yet unknown, although Huns also show hg. R1a-Z93 (probably most R1a-Z2124) and Steppe_MLBA ancestry, earlier associated with expanding Iranian peoples of the Srubna-Andronovo horizon.

All in all, it seems that prehistoric movements explained through the lens of genetic research fit perfectly well the linguistic reconstruction of Proto-Indo-European and Proto-Uralic.


Origin of horse domestication likely on the North Caspian steppes

Open access Late Quaternary horses in Eurasia in the face of climate and vegetation change, by Leonardi et al. Science Advances (2018) 4(7):eaar5589.

Interesting excerpts (emphasis mine):

Here, we compiled an extensive continental-scale database, consisting of 3070 radiocarbon dates associated to horse paleontological and archeological finds across the whole of Eurasia, that has been analyzed in association with coarse-scale paleoclimatic reconstructions. We further collected the number of identified specimens (NISP) frequency data for horses versus other ungulates in 1120 archeological layers in Europe (…) This massive amount of data allowed us to track, with unprecedented details, how the geographic distribution of the species changed through time

Geographic range through time

For most analyses, the data have been divided into climatic periods: pre-LGM (older than 27 ka B.P.), LGM (27 to 18 ka B.P.), Late Glacial (18 to 11.7 ka B.P.), Preboreal (11.7 to 10.6 ka B.P.), Boreal (10.6 to 9.1 ka B.P.), Early Atlantic (9.1 to 7.5 ka B.P.), Late Atlantic (7.5 to 5.5 ka B.P.), and Recent (younger than 5.5 ka B.P.) (Fig. 1, A and B). The spatial and temporal distribution of horse remains compiled in our database reveals a strong imbalance in Eurasia (Fig. 1, A and B).
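The period boundaries listed in this excerpt can be turned into a small lookup, sketched below. The excerpt does not say to which period a date falling exactly on a boundary belongs; assigning it to the younger period is an assumption made here, and the function name is hypothetical.

```python
# (period label, younger bound in ka B.P.), ordered from oldest to youngest.
# Bounds are taken from the excerpt; everything younger than 5.5 ka is "Recent".
PERIODS = [
    ("pre-LGM", 27.0),
    ("LGM", 18.0),
    ("Late Glacial", 11.7),
    ("Preboreal", 10.6),
    ("Boreal", 9.1),
    ("Early Atlantic", 7.5),
    ("Late Atlantic", 5.5),
]


def climatic_period(age_ka_bp):
    """Return the climatic period for an age given in ka B.P.

    Assumption: an age exactly on a boundary is assigned to the younger period.
    """
    if age_ka_bp < 0:
        raise ValueError("age must be non-negative (ka B.P.)")
    for label, younger_bound in PERIODS:
        if age_ka_bp > younger_bound:
            return label
    return "Recent"


for age in (30.0, 20.0, 8.0, 3.0):
    print(age, climatic_period(age))
```

With this binning, the 8 to 7 ka window discussed in the excerpts straddles the Early Atlantic and Late Atlantic periods.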

We found a common trend in both regions for a high number of occurrences at the end of the Pleistocene (with a decrease during the LGM, only visible in Europe), followed by a drastic reduction in the Early and Middle Holocene, and a relative increase toward more recent times. These included both the Early Atlantic in Europe, which started ~9.1 ka B.P., and the time range after 5.5 ka B.P. for Asia. The horse fossil record appears ubiquitous throughout Europe in the Late Pleistocene, while in the Early and Middle Holocene the finds are concentrated in central-western Europe and Iberia. From 7.5 ka B.P., the number of finds increases markedly, and the geographical distribution extends toward the east and southeast.

Horse occurrences through time. (A) Horse occurrences through time. Histograms showing the number of horse observations in Europe (left panel) and Asia (right panel) for each time bin (top) and for climatic period (bottom). Only time bins with more than 10 observations (black horizontal line) have been considered for the SDM analyses. From 22 ka B.P. backward (gray vertical line), time bins cover 2 ka following the available paleoclimatic reconstructions. The central map shows the boundaries considered while defining European and Asian regions, with the black line representing the Urals. The zoomed area shows the geographical resolution of the climatic reconstructions, with each pixel representing a grid cell. (B) Geographic distribution of horse occurrences. Maps showing horse occurrences for each climatic period in Europe (left) and Asia (right).

Different Asian and European niches

This analysis revealed that, in both continents, horses occupied only a portion of the climatic space available. The range covered by random locations shows that the paleoecological conditions present in Europe were only a subset of those found in Asia. However, European horses occupied a much wider climatic space than in Asia, with only limited overlap between the two ranges.

Horses conquered temperate environments from a European source

There is no evidence of climatic barriers between those two populations through time because the forecasts from Europe and Asia always overlap in central Eurasia, except 5 ka B.P. (figs. S3 and S4). An alternative explanation is the role of the Urals as a potential constraint for the dispersal of horses between Europe and north central Asia.

Climatic suitability. (A) Cumulative climatic suitability for the past 44 ka based on simulation on the European (left), Eurasian (middle), and Asian (right) data sets. To correct for sampling bias in the Eurasian data set, for each time slice, all estimates and projections for Eurasia are performed considering 100 random resampling of European occurrences in the same number as Asian occurrences. The darker the colors, the more stable the climatic suitability for horses (climatic niche = p-Hor) through time. (B) Projection of climatic suitability across Eurasia in different climatic periods based on occurrences in Europe (left), Eurasia (middle), and Asia (right). Because of the scarcity of data available for Asia, no models for the Holocene have been possible for both Asia and Eurasia, with the exception of 5 and 3 ka B.P. (both included in the “Recent” period).

Climatic and habitat association patterns for horses in Europe support increasing habitat fragmentation

The decrease of horse remains in Europe is not characterized by a geographic reduction in the overall extent of the area occupied by the species but in a drop of frequencies in a geographic extent that does not vary much between the Late Glacial and the Early Atlantic (Figs. 1B and 4B). This pattern is more likely to result from habitat fragmentation than from a geographic shift in the climatic range suitable for the species, as observed for many animals during the LGM (23).

In the whole period ranging from the Preboreal (11.7 to 10.6 ka B.P.) to the Late Atlantic (7.5 to 5.5 ka B.P.), the total amount of land space most and likely suitable to horses is wider than in the Late Glacial, and only between 8 to 7 ka ago the European range appears patchy and fragmented (Fig. 4C). When comparing each of four successive time bins during the Holocene (8, 7, 6, and 5 ka B.P., respectively) (Fig. 4E), the difference in successive p-Hor values in Europe shows that the suitability for the species in Iberia, northeastern France, Italy, the Balkans, and eastern Europe steadily increased, while in Central Europe strong differences can be observed between neighboring regions.

Analyses of the European data set and biome frequency. (A) Distribution through time of the frequency of horse remains in Europe calculated as NISP of horses versus other ungulates. (B) Density of horse remains through time in Europe, calculated as NISP of horses versus other ungulates. The numbers at the bottom of each bar represent the number of observations falling in each class, from 0 to >5%. (C) Climatic suitability for horses in Europe between 10 and 3 ka B.P. (D) Climatic suitability per time period. Percentage of land cells in Europe with a value of suitability for horses (p-Hor) > 0.5 and p-Hor > 0.8. (E) Holocene climatic amelioration. Difference in p-Hor in Europe comparing five successive time bins during the Holocene: 9, 8, 7, 6, and 5 ka B.P. Each map shows the difference in the more recent distribution compared to the previous one. (F) Environmental reconstructions in the macro area surrounding horse finds in Europe (left) and Asia (right) per climatic period. The lighter the color, the less forested is the region. The numbers at the bottom of the bars show the number of occurrences in closed environments over all the observations. The dotted line represents a frequency of 0.5.

Taken at face value, this pattern would suggest that horses were not restricted to open environments but could equally well inhabit closed, forested environments, as previously suggested (18). However, as others recently emphasized (19), the faunal associations in Holocene sites from Europe suggest a different pattern. The PCAs based on faunal assemblages (figs. S1 and S2) separate on the second principal component sites characterized by ungulates associated to forested areas (red deer, wild boar, and roe deer) and all other animals, associated to semi-open and open environments, including horses for most records.

Together, the contrast between the reconstructed microscale and macroscale vegetable coverage in Europe, the increase of horses in mainly forested macroregions, and the spatial pattern of extinction suggest that, from the beginning of the Holocene, the suitable environment became more and more patchy, with open areas increasingly fragmented by forests, where wild populations of horses could have survived in isolation until one or several waves of arrivals of domestic horses, leading to either local admixture or a full replacement of the preexisting local populations.


Our data show that, up to 5.5 ka ago, horse finds do not show association with species characteristic of forested areas such as wild boar and roe deer. We infer that the open and semi-open habitats occupied by horses on a narrow geographic scale appear less and less frequent at a macroenvironmental scale, supporting the possibility of increasing fragmentation of open habitats. This event is also likely to have led to an intensification of genetic isolation for the remaining horse populations, a pattern that still needs to be tested on genomic data.

The suitability of both Iberia and eastern Europe appears constant throughout the entire post-LGM period, in line with these regions being hotspots of genetic diversity and, possibly, the refugia sources for the recolonization of the continent (11). While the Pontic-Caspian region appears not suitable for European horses around the time when horses were first domesticated some 5.5 ka ago (6), part of this region appears suitable for the Asian horses (with the Caspian Sea as the westernmost boundary). This may suggest that horse domestication started from a population background related to an Asian ancestry and that the further spread of the domesticated horses in Europe involved either adaptation to novel niches (possibly through selective breeding) or the application of domestication techniques to local horse populations pre-adapted to these environmental conditions. Testing this scenario will require mapping the genetic structure of the Eurasian horse population within the fifth to third millennium BCE.

Some remarks

Cultural-anthropological research and archaeological remains (see here), genetics (see here and here), and now also thorough palaeoclimatic and archaeological models point to the North Caspian region, settled by the Khvalynsk culture, as the most likely earliest origin of horse domestication. The paper also supports the favorable conditions of western Europe up to Iberia for the introduction of a horse-riding culture.

I intended to write a post about the myth of Corded Ware horse riders, but for the moment I haven’t found the time. Not that Corded Ware pastoralists didn’t have horses, or could not ride them: they were a highly mobile culture of pastoralists stemming from eastern Poland / western Ukraine, so they must have known horses, like many other European cultures of the late 4th / early 3rd millennium influenced by expanding Yamna settlers. But it just cannot be said to have formed an essential part of their culture, as it was for Khvalynsk-Novodanilovka, and especially Yamna and later East Bell Beaker, Sintashta, etc.

A mere look at these maps suffices to assess the limited role of the horse in north-eastern Europe, the only region where groups of late Corded Ware-derived cultures survived the expansion of Yamna, and especially East Bell Beakers after ca. 2500 BC, which transformed Western, Northern, and Central Europe, and even East Europe reaching the modern Baltic countries, Belarus, and Romania. Even Trzciniec was born out of the influence from expanding Bell Beakers into earlier Corded Ware territory, although the later (Iron Age) relevance of this culture was probably quite limited.

As you can imagine, without horses and horse symbolism, horse riding, carts, and intensive cattle-breeding (associated with Yamna and the broad, east-central European grasslands typical of steppe regions), there can be no Proto-Indo-European, whose reconstructed vocabulary is particularly rich in horse-related words, and whose reconstructed culture, society, and religion cannot be understood without the domesticated horse. In forest regions to the north-east and eastern Europe, there was apparently little space for horses, but plenty of room for other ungulates and thus hunting, and indeed Uralic languages

In the upcoming months we will see R1a-fans associating Proto-Indo-Europeans more and more with wool, and sheep, and corded ware, and forest regions, until the proposed homeland shifts to the Baltic and Finland, instead of dat boring horse-riding people of the steppes… No wait, it’s already happening.

NOTE. Also open access is the recent Horse Y chromosome assembly displays unique evolutionary features and putative stallion fertility genes, by Janečka et al. Nature Communications (2018).


Shared ancestry of ancient Eurasian hepatitis B virus diversity linked to Bronze Age steppe


Ancient hepatitis B viruses from the Bronze Age to the Medieval period, by Mühlemann et al., Nature (2018) 557:418–423.

NOTE. You can read the PDF at Dalia Pokutta’s Academia.edu account.

Abstract (emphasis):

Hepatitis B virus (HBV) is a major cause of human hepatitis. There is considerable uncertainty about the timescale of its evolution and its association with humans. Here we present 12 full or partial ancient HBV genomes that are between approximately 0.8 and 4.5 thousand years old. The ancient sequences group either within or in a sister relationship with extant human or other ape HBV clades. Generally, the genome properties follow those of modern HBV. The root of the HBV tree is projected to between 8.6 and 20.9 thousand years ago, and we estimate a substitution rate of 8.04 × 10⁻⁶–1.51 × 10⁻⁵ nucleotide substitutions per site per year. In several cases, the geographical locations of the ancient genotypes do not match present-day distributions. Genotypes that today are typical of Africa and Asia, and a subgenotype from India, are shown to have an early Eurasian presence. The geographical and temporal patterns that we observe in ancient and modern HBV genotypes are compatible with well-documented human migrations during the Bronze and Iron Ages1,2. We provide evidence for the creation of HBV genotype A via recombination, and for a long-term association of modern HBV genotypes with humans, including the discovery of a human genotype that is now extinct. These data expose a complexity of HBV evolution that is not evident when considering modern sequences alone.

Geographical distribution of analysed samples and modern genotypes. a (featured image), Distribution of modern human HBV genotypes. Genotypes relevant to this Letter are shown in colour. Coloured shapes indicate the locations of the HBV-positive samples included for further analysis. b (above this text), Locations of analysed Bronze Age samples are shown as circles and Iron Age and later samples are shown as triangles. Coloured markers indicate HBV-positive samples. Ancient genotype A samples are found in regions in which genotype D predominates today, and HBV-DA27 is of subgenotype D5 which today is found almost exclusively in India.

Interesting excerpts:

We find genotype A in south-western Russia by 4.3 ka (in samples RISE386 and RISE387) in individuals belonging to the Sintashta culture, and in a Hungarian sample (DA195) from the Scythian culture. The western Scythians are related to the Bronze Age cultures of western steppe populations2 and their shared ancestry suggests that the modern genotype A may descend from this ancient Eurasian diversity and not, as previously hypothesized, from African ancestors29,30. This is also consistent with the phylogeny (Fig. 2), as well as the fact that the three oldest ancient genotype A sequences (HBV-DA195, HBV-RISE386 and HBV-RISE387) lack the six-nucleotide insertion found in the youngest (HBV-DA119) and in all modern genotype A sequences. The ancestors of subgenotypes A1 and A3 could have been carried into Africa subsequently, via migration from western Eurasia31.

The ancient HBV genotype D sequences were all found in Central Asia. HBV-DA27, found in Kazakhstan and dated to 1.6 ka, falls basal to the modern subgenotype D5 sequences that today are found in the Paharia tribe from eastern India32. DA27 and the Paharia people in India are linked by their East Asian ancestry2,33.

Dated maximum clade credibility tree of HBV. A log-normal relaxed clock and coalescent exponential population prior were used. Grey horizontal bars indicate the 95% HPD interval of the age of the node. Larger numbers on the nodes indicate the median age and 95% HPD interval of the age (in parentheses) under a strict clock and Bayesian skyline tree prior. Clades of genotypes C (except clade C4), E, F, G and H are collapsed and shown as dots. The figure includes a possible tenth genotype, J, based on a single human isolate. Taxon names for ancient samples indicate era (BA, Bronze Age; IA, Iron Age or later), sample name, sample age in years, ISO 3166 three-letter abbreviation of country of sequence origin, and region of sequence origin. Taxon names for modern samples indicate human genotype or subgenotype or host species if non-human, GenBank accession number, sample age in years, ISO 3166 three-letter abbreviation of country of sequence origin, and region of sequence origin.

(…) Despite the age of the samples and the imperfect diagnostic test, our dataset contained a high proportion of HBV-positive individuals. The actual ancient prevalence during the Bronze Age and thereafter might have been higher, reaching or exceeding the prevalence typically found in contemporary indigenous populations5. This clearly establishes the potential of HBV as a powerful proxy tool for research into human spread and interactions. The data from ancient genomes reveal aspects of complexity in HBV evolution that are not apparent when only modern sequences are considered. They show the existence of ancient HBV genotypes in locations incongruent with their present-day distribution, contradicting previously suggested geographical or temporal origins of genotypes or sub-genotypes; evidence for the creation of genotype A via recombination and the emergence of the genotype outside Africa; at least one now-extinct human genotype; ancient genotype-level localized diversity; and demonstrate that the viral substitution rate obtained from modern heterochronously sampled sequences is probably misleading. Together, these findings suggest that the difficulty in formulating a coherent theory for the origin and spread of HBV may be due to genetic evidence of an earlier evolutionary scenario being overwritten by relatively recent alterations, as has previously been suggested in the context of recombination24.

See also: