“Steppe ancestry” step by step (2019): Mesolithic to Early Bronze Age Eurasia


Anthony’s (2019) recent update, placing the Indo-Anatolian homeland in the Middle Volga region and describing its evolution into the Indo-Tocharian homeland in the Don–Volga area, has at last a strong scientific foundation: it builds on previous linguistic and archaeological theories, now coupled with ancient phylogeography and genomic ancestry.

There are still some inconsistencies in the interpretation of the so-called “Steppe ancestry”, though, despite the year and a half that has passed since we first had access to the closest Pontic–Caspian steppe source populations. Even my post “Steppe ancestry” step by step from a year ago is already outdated.


The population selection process for models shown below included (1) plausibility of potential influences in the particular geographic and archaeological context; (2) looking for their clusters or particular samples in the PCA; and (3) testing with qpAdm for potential source populations that might have been involved in their development.

The results and graphics posted are therefore only intended as a simplified picture of possible admixture events between populations likely close to the actual sources of the target samples, wherever archaeology supports such mating networks.

NOTE. This is an informal post and I am not a geneticist, so I am turning this flexibility to my advantage. If any reader is – for some strange reason – looking for strict hypothesis testing, a full set of formal stats (as used e.g. in Ning et al. 2019 for Proto-Tocharians), and a properly edited, peer-reviewed text, this is not the right place to find them.

An example pedigree (a) of a focal individual sampled in the modern day, placed in its geographic context to make the spatial pedigree (b). Dashed lines denote matings, and solid lines denote parentage, with red hues for the maternal ancestors and blue hues for the paternal ancestors. In the spatial pedigree, each plane represents a sampled region in a discrete (nonoverlapping) generation, and each dot shows the birth location of an individual. The pedigree of the focal individual is highlighted back through time and across space. Image modified from Bradburd and Ralph (2019).

Despite the natural impulse to draw straight mixture trajectories (see e.g. Wang et al. 2019), simply adding or removing the samples used to compute a PCA shows how strongly the plot depends on that choice (see e.g. what happens when more South Asian samples are included in the PCA below). Hence the need to draw curved arrows, which do not necessarily represent sizable drift, at least not in recent prehistoric admixture events for which we have a reasonable chronological transect.
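To see why the axes move, here is a minimal toy sketch in pure Python (made-up two-dimensional points, not genotypes, and not smartpca): PC1 is simply the direction of greatest variance among whatever samples are used to compute it, so adding a distant reference cluster can rotate it. The helper names (`pc1`, `cluster`) and all values are hypothetical.

```python
import math

def pc1(rows, iters=500):
    """Leading principal axis of `rows` (samples x features), found by
    power iteration on the sample covariance matrix."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    centred = [[r[j] - means[j] for j in range(k)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(k)]
           for i in range(k)]
    v = [1.0] * k  # arbitrary start, assumed not orthogonal to the top axis
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def cluster(cx, cy):
    """Four toy samples placed symmetrically around (cx, cy)."""
    offsets = [(-0.1, -0.1), (0.1, 0.1), (-0.1, 0.1), (0.1, -0.1)]
    return [[cx + dx, cy + dy] for dx, dy in offsets]

group_a = cluster(0.0, 0.0)
group_b = cluster(1.0, 0.0)
group_c = cluster(0.0, 3.0) * 2  # a larger, distant reference group

before = pc1(group_a + group_b)
after = pc1(group_a + group_b + group_c)
print("PC1 with two groups:  ", [round(x, 2) for x in before])
print("PC1 with third group: ", [round(x, 2) for x in after])
```

With only the two nearby groups, PC1 lies along the first axis; adding the distant group pulls it almost onto the second one, so the same two original groups end up plotted along a different direction. Real PCAs have many more dimensions, but the dependence on the sample set is the same in principle.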

Representation of mixture events between European prehistoric peoples in the PCA. Image modified from David Reich’s Who We Are and How We Got Here (2018).

Ethnolinguistic identification is a risky business that brings back memories of an evil use of cultural history and its consequences (at least in Western Europe, where this tradition was discontinued after WWII), but it seems necessary for those of us who want to find some confirmation of proposed dialectal schemes and language contacts.

Eneolithic Steppe vs. Steppe Maykop

First things first: I tested the only two true steppe populations sampled to date as potential sources of the “Steppe ancestry” of Bronze Age Eurasian peoples, conventionally described as an EHG:CHG admixture similar to that found in the first sampled Yamnaya individuals. I used the rightpops of Wang et al. (2018), with one change: since the authors used WHG as a leftpop and Villabruna as a rightpop, which I find somewhat inconsistent*, I preferred the strategy of Ning et al. (2019), contrasting Eneolithic_Steppe (ca. 4300 BC) vs. Steppe_Maykop (ca. 3500 BC) as outgroup when testing for WHG as a source population.
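The basic idea behind modelling a target as an EHG:CHG-like mixture can be sketched with per-SNP allele frequencies: the target is approximated as a weighted average of two sources, and the weight is fitted by least squares. This is NOT qpAdm, which works on f4-statistics computed against the rightpops and reports a p-value for each model; the function names and frequencies below are invented for illustration only.

```python
def two_way_proportion(target, source1, source2):
    """Least-squares weight a in: target ≈ a*source1 + (1 - a)*source2.

    Inputs are per-SNP allele frequencies (hypothetical toy values here).
    """
    diff = [s1 - s2 for s1, s2 in zip(source1, source2)]
    resid = [t - s2 for t, s2 in zip(target, source2)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("identical sources: the proportion is undefined")
    return sum(r * d for r, d in zip(resid, diff)) / denom

def sse(target, source1, source2, a):
    """Sum of squared residuals left by the two-way model with weight a."""
    return sum((t - (a * s1 + (1 - a) * s2)) ** 2
               for t, s1, s2 in zip(target, source1, source2))

# Invented allele frequencies at five SNPs (not real data).
ehg_like = [0.9, 0.2, 0.6, 0.1, 0.7]
chg_like = [0.3, 0.8, 0.2, 0.5, 0.4]
target = [0.6 * e + 0.4 * c for e, c in zip(ehg_like, chg_like)]

a = two_way_proportion(target, ehg_like, chg_like)
print(f"EHG-like share: {a:.2f}, residual: {sse(target, ehg_like, chg_like, a):.1e}")
```

Because the toy target was built as a 60:40 mix, the fit recovers a share of 0.60 with essentially no residual. With real data neither holds exactly, which is why formal tools test whether a model is rejected instead of just reporting a weight.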

*WHG usually includes samples from a ‘western’ cluster (Loschbour and La Braña) and an ‘eastern’ cluster (Villabruna and Koros), see Lipson et al. (2017). Therefore, it doesn’t make much sense to include the same (or a very similar) population as a source AND an outgroup.

NOTE. For all other qpAdm analyses below, where WHG was not used as leftpop, I have used Villabruna as rightpop following Wang et al. (2019).

Map of samples and sites mentioned in Wang et al. (2019), modified from the original to include labels of Eneolithic_Steppe and Steppe_Maykop samples. See PCA and ADMIXTURE graphic for the identification of specific samples.

Results are not much different from what has been reported. In general, Yamnaya and related groups such as Bell Beakers and Steppe-related Chalcolithic/Bronze Age populations show good fits for Eneolithic_Steppe as their closest source for Steppe ancestry, and bad fits for Steppe_Maykop, whereas Corded Ware groups show the opposite, supporting their known differences.

This trend seems to be tempered in some groups, though, most likely due to the influence of Samara_LN-like admixture in Circum-Baltic Late Neolithic and Eastern Corded Ware groups, and of Anatolia_N/EEF-like admixture in Balkan and late European CWC or BBC groups. In fact, the more EEF-related ancestry a population has, the less reliable these generic models (and even specific ones) seem to become at distinguishing the Steppe-related source.

NOTE. For more on this, see the discussion on Circum-Baltic Corded Ware peoples, and the discussion on Mycenaeans and their potential source populations.
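Why would more EEF-related ancestry blur the distinction between steppe sources? A noise-free toy sketch, assuming invented allele frequencies and a plain least-squares two-way fit (not qpAdm): the target carries a fraction s of one steppe source, and we measure how much worse the fit gets when a related but different steppe proxy is used instead. All names and values are hypothetical.

```python
def two_way_sse(target, src, other):
    """Residual SSE after fitting target ≈ a*src + (1 - a)*other by least squares."""
    diff = [x - y for x, y in zip(src, other)]
    resid = [t - y for t, y in zip(target, other)]
    a = sum(r * d for r, d in zip(resid, diff)) / sum(d * d for d in diff)
    return sum((r - a * d) ** 2 for r, d in zip(resid, diff))

# Invented allele frequencies (not real data).
steppe_true = [0.8, 0.3, 0.6, 0.2, 0.7]  # steppe source actually involved
steppe_alt = [0.7, 0.4, 0.6, 0.3, 0.5]   # related but different steppe proxy
eef_like = [0.2, 0.6, 0.4, 0.7, 0.3]

def proxy_penalty(s):
    """Extra residual from the wrong steppe proxy, for a target with steppe fraction s."""
    target = [s * x + (1 - s) * e for x, e in zip(steppe_true, eef_like)]
    return (two_way_sse(target, steppe_alt, eef_like)
            - two_way_sse(target, steppe_true, eef_like))

for s in (1.0, 0.5, 0.25):
    print(f"steppe fraction {s:.2f}: penalty {proxy_penalty(s):.5f}")
```

In this idealized setting the penalty shrinks with the square of the steppe fraction: halving the steppe share divides it by four. In real data, where every statistic carries sampling noise, such a shrinking signal is easier to lose, which is consistent with (though not proof of) the pattern described above.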

These are just broad strokes of what might have happened around the Pontic–Caspian steppes before and during the Early Bronze Age expansions. The most relevant quest right now for Indo-European studies is to ascertain the chain of admixture events that led to the development and expansion of Indo-Uralic and its offshoots, Indo-European and Uralic.

Eastern European Mesolithic with the expansion of Post-Swiderian cultures. See full map.

A history of Steppe ancestry

This post is divided into roughly chronological sections, as follows:

  1. Hunter-gatherer pottery and the steppes
  2. Khvalynsk and Sredni Stog
  3. Post-Stog and Proto-Corded Ware
  4. Yamnaya and Afanasievo

1. Hunter-gatherer pottery and the steppes

I laid out in the ASOSAH book series the general idea – based on attempts to reconstruct the linguistic ancestor of Indo-Uralic – that Eurasiatic speakers might have expanded with the North-Eastern Techno-Complex that spread through north-eastern Europe during the warm period of the transition from the Palaeolithic to the Mesolithic.

If one were to trust the traditional migrationist view, a post-Swiderian population expanded from central-eastern Europe (potentially related originally to Epi-Gravettian peoples, represented by WHG ancestry) into north-eastern Europe, and then further east into the Trans-Urals, to then reappear in eastern Europe as a back-migration represented by the spread of hunter-gatherer pottery.

The marked shift from WHG-like towards EHG-related ancestry from Baltic Mesolithic (ca. 30%) to Combed Ware cultures (ca. 65%-100%) supports this continuous westward expansion, which is possibly best represented in the currently available sampling by the ‘south-eastern’ shift (CHG:ANE-related) of the hunter-gatherer from Lebyazhinka IV (5600 BC) relative to the older one from Sidelkino (9300 BC), both from the Samara region in the Middle Volga:

Mesolithic-Neolithic transition ca. 7000-6000 BC, with hunter-gatherer pottery groups spreading westwards. See full map.

From Anthony (2019):

Along the banks of the lower Volga many excavated hunting-fishing camp sites are dated 6200-4500 BC. They could be the source of CHG ancestry in the steppes. At about 6200 BC, when these camps were first established at Kair-Shak III and Varfolomievka, they hunted primarily saiga antelope around Dzhangar, south of the lower Volga, and almost exclusively onagers in the drier desert-steppes at Kair-Shak, north of the lower Volga. Farther north at the lower/middle Volga ecotone, at sites such as Varfolomievka and Oroshaemoe hunter-fishers who made pottery similar to that at Kair-Shak hunted onagers and saiga antelope in the desert-steppe, horses in the steppe, and aurochs in the riverine forests. Finally, in the Volga steppes north of Saratov and near Samara, hunter-fishers who made a different kind of pottery (Samara type) and hunted wild horses and red deer definitely were EHG. A Samara hunter-gatherer of this era buried at Lebyazhinka IV, dated 5600-5500 BC, was one of the first named examples of the EHG genetic type (Haak et al. 2015). This individual, like others from the same region, had no or very little CHG ancestry. The CHG mating network had not yet reached Samara by 5500 BC.

Given the lack of a proper geographical and chronological transect of ancient DNA from eastern European groups, and the discontinuous appearance of both R1b-M73 and R1b-M269 lineages on both sides of the Urals within the WHG:ANE cline, where EHG appears to have formed, it is impossible at this point to assert anything with a sufficient degree of certainty. For simplicity, though, I ventured to tentatively associate the expansion of R1b-M73 in West Siberia with Micro-Altaic, and the expansion of hg. R1b-M269 with the spread of Indo-Uralic on both sides of the Urals.

NOTE. For incrementally speculative associations of languages with prehistoric cultures and their potential link to ancestry ± haplogroup expansions, you can check sections on Early Indo-Europeans and Uralians, Indo-Uralians, Altaic peoples, Eurasians, or Nostratians. I explained why I made these simplistic choices here.

While this identification of the Indo-Uralic expansion with hg. R1b is more or less straightforward for the Cis-Urals, given the available ancient DNA samples, it will be very difficult (if at all possible) to trace the migration of these originally R1b-M269-rich populations into Trans-Uralian groups that could eventually be linked to Yukaghir speakers. The sheer number of potential admixture events and bottlenecks in the Siberian forest, taiga, and tundra regions from the Mesolithic until Yukaghirs were first attested is guaranteed to give more than one headache in the coming years…

Spread of hunter-gatherer pottery in eastern Europe ca. 6000-5000 BC. See full map.

The slight increase in WHG-related ancestry in Ukraine Neolithic groups relative to Mesolithic ones casts doubt on the arrival of this eastern influence in the north Pontic area, or at least on its relevance in genomic terms. Still, the cluster they form is similar to the Mesolithic one and to Combed Ware groups, despite the Central European and Baltic influences in the north Pontic region, and some samples show no change at all relative to Mesolithic groups.

Structure and change in hunter-gatherer-related populations, from Mathieson et al. (2018). Inferred ancestry proportions for populations modelled as a mixture of WHG, EHG and CHG. Dashed lines show populations from the same geographic region. Percentages indicate proportion of WHG + EHG ancestry. Standard errors range from 1.5 to 8.3%.
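In its simplest allele-frequency form, a three-way model like the one in the figure can be written as below. This is a simplification for illustration only: published estimates of this kind are usually obtained with qpAdm, which fits the weights on f-statistics rather than directly on allele frequencies. Here $p_{X,\ell}$ stands for the allele frequency of population $X$ at SNP $\ell$ and $w_X$ for its ancestry weight, notation introduced only for this example.

$$
\hat{p}_{T,\ell} = w_{\mathrm{WHG}}\,p_{\mathrm{WHG},\ell} + w_{\mathrm{EHG}}\,p_{\mathrm{EHG},\ell} + w_{\mathrm{CHG}}\,p_{\mathrm{CHG},\ell},
\qquad w_{\mathrm{WHG}} + w_{\mathrm{EHG}} + w_{\mathrm{CHG}} = 1,\quad w_X \ge 0
$$

The percentage shown for each population is then $w_{\mathrm{WHG}} + w_{\mathrm{EHG}} = 1 - w_{\mathrm{CHG}}$, i.e. everything that is not CHG-related.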

NOTE. For more on Indo-Uralic and its reconstruction from a linguistic point of view, check out its dedicated section on ASOSAH, or the recently published (behind paywall) The Precursors of Proto-Indo-European, edited by Kloekhorst and Pronk, Brill (2019). Authors of specific chapters have posted their contributions to Academia.edu, where they can be downloaded for free.

2. Khvalynsk and Sredni Stog

The cluster formed by the three available samples of the Khvalynsk culture (early 5th millennium BC) might be described, as expected from its position in the PCA, as a mixture of EHG-like populations of the Middle Volga with CHG-like ancestry close to that represented by samples from Progress-2 and Vonyuchka, in the North Caucasus Piedmont (ca. 4300 BC):

This variable CHG-like admixture, seen in the wide cluster formed by the available Khvalynsk-related samples, supports the interpretation of a recently created CHG mating network in Anthony (2019):

After 5000 BC domesticated animals appeared in these same sites in the lower Volga, and in new ones, and in grave sacrifices at Khvalynsk and Ekaterinovka. CHG genes and domesticated animals flowed north up the Volga, and EHG genes flowed south into the North Caucasus steppes, and the two components became admixed. After approximately 4500 BC the Khvalynsk archaeological culture united the lower and middle Volga archaeological sites into one variable archaeological culture that kept domesticated sheep, goats, and cattle (and possibly horses). In my estimation, Khvalynsk might represent the oldest phase of PIE.

Detail of the PCA of Eurasian samples, including Neolithic clusters with the hypothesized gene flows related to (1) the formation and (2) expansion of Khvalynsk and the (3) emergence of late Sredni Stog. See full image.

The richest copper assemblage found in all Khvalynsk burials belongs to an individual of hg. R1b-V1636 with intermediate Samara_HG:Eneolithic_Steppe ancestry, while full Eneolithic_Steppe-like admixture in the Middle Volga is represented by the commoner of Khvalynsk II, of hg. Q1. The finding of hg. R1b-V1636 in the North Caucasus Piedmont, and of R1b-P297 in the Samara region (probably including Yekaterinovka), raises the question of the origin of hg. R1b-V1636 in the Khvalynsk community. Based on its absence in ancient samples from the forest zone, it is tempting to assign it to steppe hunter-gatherers along the Lower Volga and possibly to the east of it, who infiltrated the Samara region precisely during the population movements described by Anthony (2019).

Suvorovo-related samples from the Balkans, including the Varna and Smyadovo outliers of Steppe ancestry, are closely related to the Khvalynsk expansion:

Similarly, the ancestry of late Sredni Stog samples from Dereivka seems to be directly related to the expansion of Mariupol-like individuals over populations with Suvorovo-Novodanilovka-like admixture, as suggested by the resurgence of typical Ukraine Neolithic haplogroups, the shift in the PCA, and the Eneolithic_Steppe vs. Steppe_Maykop models above:

#EDIT (11 Nov 2019): In fact, the position of the unpublished Greece_Neolithic outlier that appeared in the Wang et al. (2018) preprint (see full PCA and ADMIXTURE) shows that the expanding Suvorovo chiefs from the Balkans formed a tight cluster close to the two published outliers with Steppe ancestry from Bulgaria.

The Ukraine_Neolithic outlier, possibly a Novodanilovka-related sample, suggests, based on its position in the PCA close to the late Trypillian outlier with Steppe-related ancestry, that Ukraine_Eneolithic samples from Dereivka are a mixture of Ukraine_Neolithic and a Novodanilovka-like community similar to Suvorovo.

The Trypillian_Eneolithic-like admixture found among Proto-Corded Ware peoples (see below) would then potentially include a small Eneolithic_Steppe-like component already present in the north Pontic area, too.

Image modified from Wang et al. (2018). Samples projected in PCA of 84 modern-day West Eurasian populations (open symbols). Previously known clusters have been marked and referenced. Marked and labelled are the Balkan samples referenced in this text. EHG and Caucasus ‘clouds’ have been drawn, leaving Pontic-Caspian steppe and derived groups between them. See the original file here.

Furthermore, whereas Anthony (2019) mentions a long-lasting predominance of hg. R1b in elite graves of the Eneolithic Volga basin, he does not mention a single sample of hg. R1a that would support a community around the Alexandria individual, supposedly belonging to late Sredni Stog groups but with a Corded Ware-like genetic profile (suggesting yet again that it may be a wrongly dated sample).

NOTE. A lack of first-hand information rather than an absence of R1a-M417 samples in the north Pontic forest-steppes would not be surprising, since Anthony is involved in the archaeology of the Middle Volga, but not in that of the north Pontic area.

Khvalynsk expansion through the Pontic–Caspian steppes in the early 5th millennium BC. See full map.

3. Post-Stog and Proto-Corded Ware

The origin of the Pre-Corded Ware ancestry is still a mystery, because of the heterogeneity of the sampled groups to date, and because the only ancestral sample that had a compatible genetic profile – I6561 from Alexandria – shows some details that make its radiocarbon date rather unlikely.

The most likely explanation for the closest source population of Corded Ware groups, found in the three core samples of Steppe_Maykop and in Trypillian Eneolithic samples from the first half of the 4th millennium BC, is still that a population of north Pontic forest-steppe hunter-gatherers hijacked this kind of ancestry, which was foreign to the north Pontic region before the Late Eneolithic, and later expanded east and west through the Podolian–Volhynian upland in the context of the complex population movements of the Late Eneolithic.

NOTE. The idea of Trypillia influencing the formation of the Steppe_MLBA ancestry characteristic of Uralic peoples has been around for quite some time already, since the publication of Narasimhan et al. (2018) (see here or here).

Detail of the PCA of Eurasian samples, including Corded Ware groups and related clusters, as well as outliers, with hypothesized gene flows related to the (1) formation and (2) initial expansion of Pre-Corded Ware ancestry, as well as (3) later regional admixture events. See full image.

The specifics of how the Proto-Corded Ware community emerged remain unclear at this point, despite Rassamakin’s (1999) simplistic description of the Late Eneolithic north Pontic population movements as a two-stage migration of (1) late Trypillian groups (Usatovo) west → east, and (2) Late Maykop–Novosvobodnaya east → west. See, for example, Manzura (2016) on the Zhivotilovka “cultural-historical horizon” (emphasis mine):

Indeed, the very complex combination of different cultural traits in the burial sites of the Zhivotilovka type is able to generate certain problems in the search for the origins of this phenomenon. The only really consistent attribute is the burial rite in contracted position on the left or right side. Yu. Rassamakin is correct in asserting that this position of the deceased can be considered as new in the North Pontic region (Rassamakin 1999, 97). However, this opinion can be accepted only partially for the territory between Dniester and Lower Don. This position is well known in the Usatovo culture in the Northwest Pontic region, although skeletons on the right side are evidenced there only in double burials, whereas single burials contain the deceased only in a contracted position on the left side. On the other hand, the southern and western orientation of the deceased, which is one of the main burial traits of the Zhivotilovka type, is not characteristic of the Usatovo culture. Nevertheless, it is possible to suppose that at least part of the Usatovo population could have played a part in the formation of the cultural type under consideration here. One aspect of this cultural tradition, for instance, could be represented by skeletons on the left side and oriented in north-eastern and eastern directions.

Especially close ties can be traced between the Zhivotilovka and Maykop-Novosvobodnaya traditions, as exemplified by similar burial customs and various grave goods. It is beyond any doubt that the Maykop-Novosvobodnaya population was actively involved in the spread of the main Zhivotilovka cultural traits. The influence of North Caucasian traditions can be well observed, at least as far as the Dnieper Basin, but farther west influence is not manifested pronouncedly. The role of cultural units situated between the Dniester and Don rivers in the process of emergence of the Zhivotilovka type looks somewhat vague. Now, it can be quite confidently asserted that at the end of the 4th millennium BC this territory was settled by migrants from the North Caucasus and Carpathian-Dniester region. This event in theory had to stimulate cultural transformations in the Azov-Black Sea steppes and, thus, bearers of local cultural traditions perhaps could have participated in forming the culture under consideration. In any event, the Zhivotilovka type can be regarded as a complex phenomenon that emerged within the regime of intensive cultural dialogue and that it absorbed totally different cultural traditions. The spread of the Zhivotilovka graves across the Pontic steppes from the Carpathians to the Lower Don or even to the Kuban Basin clearly signalizes a rapid dissolution of former cultural borders and the beginning of active movements of people, things and ideas over vast territories.


What were the factors or reasons that could have provoked this event? In the beginning of the second half of the 4th millennium BC two advanced cultural centers emerged in the south of Eastern Europe. These were the Maykop-Novosvobodnaya and Usatovo cultures, which in spite of their separation by great distances were structurally very alike. This is expressed in similar monumental burial architecture, complex burial rites, even the composition of grave goods, developed bronze metallurgy, high standards of material culture, etc. Both cultures in a completely formed state exemplify prosperous societies with a high level of economic and social organization, which can correspond to the type of ranked or early complex societies. Normally, the social elite in such polities tends to rigidly control basic domains social, economic and spiritual life using different mechanisms, even open compulsion (Earle 1987, 294-297). To some extent similar social entities can be found at this moment in the forest-steppe zone of the Carpathian-Dniester region, as reflected by the well organized settlement of Brânzeni III and the Vykhatitsy cemetery (Маркевич 1981; Дергачев 1978). In spite of their complex character, such societies represent rather friable structures, which could rapidly disintegrate due to unfavourable inner or external factors.

The societies in question emerged and existed during a time of favourable natural climatic conditions, which is considered to be a transitional period from the Atlantic to the Subboreal period, lasting approximately from 3600 to 3300 cal BC, or a climatic optimum for the steppe zone (Иванова и др. 2011, 108; Спиридонова, Алешинская 1999, 30-31). These conditions to a large degree could guarantee a stable exploitation of basic resources and support existing social hierarchies. However, after 3300 cal BC significant climatic changes occurred, accompanied by an increasing aridization and fall in temperature. This event is usually termed the “Piora oscillation” or “Rapid Climatic Event”, and is regarded as having been of global character (Magny, Haas 2004). These rapid changes could have seriously disturbed existing economic and social relations and finally provoked a similar rapid disintegration of complex social structures. In this case the sites of the Zhivotilovka type could represent mere fragments of former prosperous societies, which under conditions of the absence of centralized social control and stable cultural borders tried to recombine social and economic ties. However, the population possessed the necessary social experience and important technological resources, such as developed stock-breeding based on the breeding of small cattle and wheeled transport, so they were ready for opening new territories in their search for a better life.

Disintegration, migration, and imports of the Azov–Black Sea region. First migration event (solid arrows): Gordineşti–Maikop expansion (groups: I – Bursuchensk; II – Zhyvotylivka; III – Vovchans’k; IV – Crimean; V – Lower Don; VI – pre-Kuban). Second migration event (hollow arrows): Repin expansion. After Rassamakin (1999), Demchenko (2016).

For more on the chronology and on the potentially larger, longer-lasting Zhivotilovka–Volchansk–Gordineşti cultural horizon and its expansion through the Podolian–Volhynian upland, read e.g. about the Yampil Complex in volume 22 of Baltic-Pontic Studies (2017):

In the forest-steppe zone of the North-West Pontic area, important data concerning the chronological position of the Zhivotilovka-Volchansk group have been produced by the exploration of the Bursuceni kurgan, which is still awaiting full publication [Yarovoy 1978; cf. also Demcenko 2016; Manzura 2016]. Burials linked with the mentioned group were stratigraphically the eldest in the kurgan, and pre-dated a burial in the extended position and [Yamnaya culture] graves. Two of these burials (features 20 and 21) produced radiocarbon dates falling around 3350-3100 BC [Petrenko, Kovaliukh 2003: 108, Tab. 7]. Similar absolute age determinations were obtained for Podolia kurgans at Prydnistryanske [Goslar et al. 2015]. These dates, falling within the Late Eneolithic, mark the currently oldest horizon of kurgan burials in the forest-steppe zone of the North-West Pontic area. The Podolia graves linked with other, older traditions of the steppe Eneolithic seem to represent a slightly later horizon dated to the transition between the Late Eneolithic and Early Bronze Age.

The presence on the left bank of the Dniester River of kurgans associated with the Eneolithic tradition, which at the same time reveals connections with the Gordineşti-Kasperovce-Horodiştea complex, raises questions about the western range of the new trend in funerary rituals, and its potential connection with the expansion of the late Trypilia culture to the West Podolia and West Volhynia Regions. The data potentially suggesting the attribution of kurgans from the upper Dniester basin to this period is patchy and difficult to verify [e.g. Liczkowce – see Sulimirski 1968: 173]. In this context, the discovery of vessels in the Gordineşti style in a kurgan at Zawisznia near Sokal is inspiring [Antoniewicz 1925].

Burials representing funerary traditions of Zhivotilovka-Volchansk group in Podolie kurgans: 1 – Porohy, grave 3A/7, 2 – Kuzmin, grave 2/2 [after Klochko et al. 2015b, Bubulich, Khakhey 2001]

Also interesting, in combination with the Eneolithic_Steppe vs. Steppe_Maykop results above, are the groups with worse fits for Steppe_Maykop_core: Potapovka and Srubnaya, as reported by Wang et al. (2018), but also Sintashta_MLBA (although not Andronovo). This is compatible with the long-term admixture of Abashevo chiefs dominating over a majority of Poltavka-like herders in the Don-Volga-Ural steppes during the formation of the Sintashta-Potapovka-Filatovka community, also visible in the typical Yamnaya lineages and Yamnaya-like ancestry still appearing in the region centuries after the change in power structures had occurred.

NOTE. If you feel tempted to test mixtures of Khvalynsk_EN, Eneolithic_Steppe, Yamnaya, etc. as a source population for Corded Ware, go for it, but it is almost certain to give similar ‘good’ fits, whatever the model, in some Corded Ware groups and not in others. As far as I know, it is still unclear how to formally distinguish a Corded Ware-related from a Yamnaya-related source in the same model. Results obtained with a combination of Steppe_Maykop-related and Eneolithic_Steppe-related sources will probably select one or the other source artificially, as probably happened in Ning et al. (2019) with Proto-Tocharian samples (see qpAdm values), which most likely had a contribution of both, based on their known intense interactions in the Tarim Basin.
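A toy sketch of why two closely related sources are hard to tell apart in one model: when the sources differ only slightly, a tiny change in the target can swing the fitted weight from one source to the other, with both fits looking perfect. Frequencies and names are invented, and the fit is a plain least-squares two-way model, not qpAdm.

```python
def two_way_proportion(target, source1, source2):
    """Least-squares weight a in: target ≈ a*source1 + (1 - a)*source2."""
    diff = [s1 - s2 for s1, s2 in zip(source1, source2)]
    resid = [t - s2 for t, s2 in zip(target, source2)]
    return sum(r * d for r, d in zip(resid, diff)) / sum(d * d for d in diff)

# Two invented, nearly identical sources (e.g. two related steppe proxies).
source_a = [0.20, 0.40, 0.60, 0.80]
source_b = [0.21, 0.39, 0.61, 0.79]

# A target that is an even mix of both...
target_1 = [0.5 * x + 0.5 * y for x, y in zip(source_a, source_b)]
# ...and the same target shifted by only 0.005 at each SNP.
target_2 = [t + d for t, d in zip(target_1, [0.005, -0.005, 0.005, -0.005])]

a1 = two_way_proportion(target_1, source_a, source_b)
a2 = two_way_proportion(target_2, source_a, source_b)
print(f"weight on source_a: {a1:.3f} vs {a2:.3f}")
```

The first target fits as a 50:50 mix, the second as source_b alone, both with essentially zero residual, although the two targets differ by far less than the sampling noise expected from a handful of ancient genomes. The data barely constrain which of two similar sources takes the weight, which is one way a model can end up picking one or the other somewhat arbitrarily.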

Expansion of north Pontic cultures and related groups during the Late Eneolithic. See full map.

4. Yamnaya and Afanasievo

I don’t think it makes much sense to test for GAC (or Iberia_CA, for that matter) as Wang et al. (2019) did, given the implausibility of them taking part in the formation of late Repin during the mid-4th millennium BC around the Don-Volga interfluve (represented by its offshoots Yamnaya and Afanasievo), whether these or other EEF-related populations show ‘better’ fits or not. Therefore, I only tested for more or less straightforward potential source populations:

Detail of the PCA of Eurasian samples, including Yamnaya groups and related clusters, as well as outliers, with hypothesized gene flows related to its (1) formation and (2) expansion. Also included is the inferred position of the admixed sample Yamnaya_Hungary_EBA1. See full image.

Quite unexpectedly – for me, at least – it appears that Afanasievo and Yamnaya invariably prefer Khvalynsk_EN as the closest source rather than a combination including Eneolithic_Steppe directly. In other words, late Repin shows largely genetic continuity with the Steppe ancestry already shown by the three sampled individuals from the Khvalynsk II cemetery, in line with the known strong bottlenecks of Khvalynsk-related groups under R1b lineages, visible also later in Afanasievo and Yamnaya and derived Indo-European-speaking groups under R1b-L23 subclades.

NOTE. This better explains the reported bad fits of models using Eneolithic_Steppe directly instead of Khvalynsk_EN for Afanasievo and Yamnaya Kalmykia, as is readily evident from the results above, rather than a rejection of an additional contribution to an Eneolithic_Steppe-like population, as I had interpreted it based on Anthony (2019).

Map of major sites of the Zhivotilovka-Volchansk group (A) and Repin culture (B), by Rassamakin (see 1994 and 2013). (A) 1 – Primorskoye; 2 – Vasilevka; 3 – Aleksandrovka; 4 – Boguslav; 5 – Pavlograd; 6 – Zhivotilovka; 7 – Podgorodnoye; 8 – Novomoskovsk; 9 – Sokolovo; 10 – Dneprelstan; 11 – Razumovka; 12 – Pologi; 13 – Vinogradnoye; 14 – Novo-Filipovka; 15 – Volchansk; 16 – Yuryevka; 17 – Davydovka; 18 – Novovorontsovka; 19 – Ust-Kamenka; 20 – Staroselye; 21 – Velikaya Aleksandrovka; 22 – Kovalevka; 23 – Tiraspol; 24 – Cura-Bykuluy; 25 – Roshkany; 26 – Tarakliya; 27 – Kazakliya; 28 – Bolgrad; 29 – Sarateny; 30 – Bursucheny; 31 – Novye Duruitory; 32 – Kosteshty. (B) 1 – Podgorovka; 2 – Aleksandria; 3 – Volonterovka; 4 – Zamozhnoye; 5 – Kremenevka; 6 – Ogorodnoye; 7 – Boguslav; 8 – Aleksandrovka; 9 – Verkhnaya Mayevka; 10 – Duma Skela; 11 – Zamozhnoye; 12 – Mikhailovka II.

This might suggest that the Steppe ancestry visible in samples from Progress-2 and Vonyuchka, which share the same cluster with the Khvalynsk II commoner of hg. Q1, most likely represents North Caspian or Black Sea–Caspian steppe hunter-gatherer ancestry that increased as Khvalynsk settlers expanded south-west towards the Greater Caucasus, probably through female exogamy. That would mean that Steppe_Maykop potentially represents the ‘original’ ancestry of hunter-gatherers of the North Caucasus steppes, which is also weakly supported by the similar admixture found in the Lola culture. The chronology, geographical location and admixture of both clusters had seemed to indicate the opposite.

Modelling results for the Steppe and Caucasus cluster. Additional ‘eastern’ AG-Siberian gene flow in Steppe Maykop relative to Eneolithic Steppe. From Wang et al. (2019).

Due to the limitations of the currently available sampling and statistical tools, and barring the dubious Alexandria outlier, it is unclear how much of the late Trypillian-related admixture of late Repin (as reflected in Yamnaya and Afanasievo) corresponds to late Trypillian, Post-Stog, or Proto-Corded Ware groups from the north Pontic area. A mutual exchange suggestive of a common mating network (also supported by the mixed results obtained when including Khvalynsk_EN as a source for early Corded Ware groups) seems to be the strongest proof to date of the Late Proto-Indo-European – Uralic contacts reflected in the period when post-laryngeal vocabulary was borrowed (with some samples predating the merged laryngeal loss), before the period of intense borrowing from Pre- and Proto-Indo-Iranian.

Between-group differences of Yamnaya samples are caused – like those between Corded Ware groups – by the admixture of a rapidly expanding society through exogamy with regional populations, evidenced by the inconstant affinities of western or southern outliers for previous local populations of the west Pontic or Caucasus area. This explanation for the gradual increase in local admixture is also supported by the strong, long-term patrilineal system and female exogamy practiced among expanding Proto-Indo-Europeans.

Groups of the Yamnaya culture and its western expansion after ca. 3100 BC, and Corded Ware after ca. 2900 BC. See full map.

Bell Beakers and Mycenaeans

This Eneolithic_Steppe ancestry is also found among Bell Beaker groups (see above). More specifically, all Bell Beaker groups prefer a source closest to a combination of Yamnaya from the Don and Baden LCA individuals from Hungary, rather than to one of Corded Ware and GAC, despite the quite likely admixture of western Yamnaya settlers with (1) south-eastern European (west Pontic, Balkan) Chalcolithic populations during their expansion through the Lower Danube and with (2) late Corded Ware groups (already admixed with GAC-like populations) during their expansion as East Bell Beakers:

Similarly, Mycenaeans show good fits for a source close to the Yamnaya outlier from Bulgaria:

Detail of the PCA of Eurasian samples, including Bell Beaker and Balkan EBA groups and related clusters, as well as outliers, including ancestral Yamnaya samples from Hungary (position inferred) and Bulgaria. Also marked are Minoans, Mycenaeans and Armenian BA samples. See full image.

You can read more on Yamnaya-related admixture of Bell Beakers and Mycenaeans, and on Afanasievo-related admixture of Iron Age Proto-Tocharians.


The use of the concept of “Yamnaya ancestry”, then “Steppe ancestry” (and now even “Yamnaya Steppe ancestry”?) has already permeated the ongoing research of all labs working with human population genomics. Somehow, the conventional use of Yamnaya_Samara samples, as opposed to a combination of other ancient samples – alternatively selected among WHG, EHG, CHG/Iran_N, Anatolia_N, or ANE – has spread, and this population is now unquestionably accepted as one of the “three quite distinct” ancestral groups that admixed to form the ancestry of modern Europeans, which is a rather odd, simplistic and anachronistic description of prehistory…

It has now become evident that authors involved with the Proto-Indo-European homeland question – and the tightly intertwined one of the Proto-Uralic homeland – are going to dedicate a great part of the discussion of many future papers to correct or outright reject the conclusions of previous publications, instead of simply going forward with new data.

The most striking argument to mistrust the current use of “Steppe ancestry” (as an alternative name for Yamnaya_Samara, and not as ancestry proper of steppe hunter-gatherers) is not the apparent difference in direct Eneolithic sources of Steppe ancestry for Corded Ware and Yamnaya-related peoples – closer to the available samples classified as Steppe_Maykop and Eneolithic_Steppe, respectively – or their different evolution under marked Y-DNA bottlenecks.

It is not even the lack of information about the distant origin of these Pontic–Caspian steppe hunter-gatherers of the 5th and 4th millennia BC, with their shared ancestral component potentially separated during the warmer Palaeolithic-Mesolithic transition, when the steppes were settled, without necessarily sharing any meaningful recent history before the formation of the Proto-Indo-Uralic community.

NOTE. I have raised this question multiple times since 2017 (see e.g. here or here).

The most striking paradox about simplistically misinterpreting “Steppe ancestry” as representative of Indo-European expansions is that those sub-Neolithic Pontic–Caspian steppe hunter-gatherers that had this ancestry in the 6th millennium BC were probably non-Indo-European-speaking communities, most likely related to the North(West) Caucasian language family, based on the substrate of Indo-Anatolian that sets it apart from Uralic within the Indo-Uralic trunk, and on later contacts of Indo-Tocharian with North-West Caucasian and Kartvelian, the former probably represented by Maykop and its contact with the Repin and early Yamnaya cultures.

NOTE. For more on this, see Allan Bomhard’s recent paper on the Caucasian substrate hypothesis and its ongoing supplement Additional Proto-Indo-European/Northwest Caucasian Lexical Parallels.

“Spatiotemporal kriging of YAM steppe ancestry during the Holocene, using 5000 spatial grid points. The colors represent the predicted ancestry proportion at each point in the grid.” Image with evolution from ca. 2800 BC until the present day, modified from Racimo et al. (2019). The Copenhagen group considers the expansion of this component as representative of expanding Indo-Europeans…

This kind of error happens because we all – hence also authors, peer reviewers, and especially journal editors – love far-fetched conclusions and sensational titles, forgetting what a paper actually shows and – always more importantly in scientific reports – what it doesn’t show. This is particularly true when more than one field is involved and when extraordinary claims involve aspects foreign to the journal’s (and usually the authors’ own) main interests. One would have thought that the glottochronological fiasco published in Science in 2012 (open access in PMC) should have taught everyone involved an important lesson. It didn’t, because apparently no one has felt the responsibility or the shame to retract that paper yet, even in the age of population genomics.

If anything, the excesses of mathematical linguistics – using computational methods to try and reconstruct phylogenetic trees – have perpetuated a form of misunderstood Scientism which blindly relies on a simple promise made by authors in the Materials and Methods section (rarely if ever kept beyond it) to use statistics, rather than resorting to the harder, well-informed, comprehensive reasoning that the comparative method requires. After all, why should anyone invest hundreds of hours in (or simply show an interest in) learning about historical linguistics, about ancient Indo-European or Uralic languages, carefully arguing and discussing each and every detail of the reconstruction, when one can simply rely on one’s own gut to decide what is Science and what isn’t? When one can trust a promise that formulas have been used?

The conservative, null hypothesis when studying prehistoric Eurasian samples related to evolving cultures was universally understood as no migration, or “pots not people” (as most western archaeologists chose to believe until recently), whereas the alternative one should have been that there were in fact migration events, some of them potentially related to the expansion of Eurasian languages ancestral to the historically attested ones. Beyond this migrationist view there were obviously dozens of thorough theories concerning potential linguistic expansions associated with specific prehistoric cultures, and a myriad of less developed alternatives, all of which deserved to be evaluated after the null hypothesis had been rejected.

Despite the shortcomings of the 2015 papers and their lack of testing or discussion of different language expansion models, the spread of the so-called “Yamnaya ancestry” – an admixture especially prevalent (after the demise of the Yamnaya) among the most likely ancient Uralic-speaking groups as well as among modern Uralic speakers and recently acculturated groups from Eastern Europe – has nevertheless been invariably interpreted by each lab as support for the theories of its leading archaeologist, often combined with pre-aDNA theories of geneticists based on modern haplogroup distributions. This is as evident a case of confirmation bias, circular reasoning, and jumping to conclusions as it gets.

Why many researchers of other labs have chosen to follow such conclusions instead of challenging or simply ignoring them is difficult to understand.


Bell Beakers and Mycenaeans from Yamnaya; Corded Ware from the forest steppe


I have recently written about the spread of Pre-Yamnaya or Yamnaya ancestry and Corded Ware-related ancestry throughout Eurasia, using exclusively analyses published by professional geneticists, and filling in the gaps and contradictory data with the most reasonable interpretations. I did so consciously, to avoid any suspicion that I was interspersing my own data or cherry picking results.

Now I’m finished recapitulating the known public data, and the only way forward is the assessment of these populations using the available datasets and free tools.

Understanding the complexities of qpAdm is fairly difficult without a proper genetic and statistical background, which I won’t pretend to have, so tweaking it to get strictly correct results would require an unending game of trial and error. I have sadly little time for this, even taking my tendency to procrastination into account… so I have used a simple model akin to those published before – in particular, the outgroup selection by Ning, Wang et al. (2019), who seem to be part of the only group interested in distinguishing Yamnaya-related from Corded Ware-related ancestry, probably the most relevant question discussed today in population genomics regarding the Proto-Indo-European and Proto-Uralic homelands.

Supplementary Table 13. P values of rank=2 and admixture proportions in modelling Steppe ancestry populations as a three-way admixture of Eneolithic steppe, Anatolian_Neolithic and WHG using 14 outgroups.
Left populations: Test, Eneolithic_steppe, Anatolian_Neolithic, WHG.
Right populations: Mbuti.DG, Ust_Ishim.DG, Kostenki14, MA1, Han.DG, Papuan.DG, Onge.DG, Villabruna, Vestonice16, ElMiron, Ethiopia_4500BP.SG, Karitiana.DG, Natufian, Iran_Ganj_Dareh_Neolithic.
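The “rank=2” in this table refers to the rank test behind qpWave/qpAdm. As a minimal sketch, assuming the standard chi-square form of that test (the f4 matrix F4(L0, Li; R0, Rj) has (nL - 1) rows and (nR - 1) columns, and a model with k sources is tested at rank k - 1), the degrees of freedom for the 4 left and 14 right populations above work out as follows. The helper names are hypothetical, and the chi-square value passed in is only a placeholder, not a value from the table.

```python
import math

def rank_test_dof(n_left, n_right, rank):
    """Degrees of freedom of the f4-matrix rank test (assumed chi-square form).

    The f4 matrix has m = n_left - 1 rows and n = n_right - 1 columns;
    a rank-r matrix has r * (m + n - r) free parameters, leaving
    (m - r) * (n - r) degrees of freedom.
    """
    m, n = n_left - 1, n_right - 1
    if not 0 <= rank <= min(m, n):
        raise ValueError("rank must lie between 0 and min(m, n)")
    return (m - rank) * (n - rank)

def chi2_sf(x, dof):
    """Upper tail P(X > x) of a chi-square distribution with integer dof."""
    if dof < 1:
        raise ValueError("dof must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if dof % 2 == 0:
        # Even dof: exp(-x/2) * sum_{k=0}^{dof/2 - 1} (x/2)^k / k!
        term = total = 1.0
        for k in range(1, dof // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd dof: erfc(sqrt(x/2)) + 2*phi(sqrt(x)) * sum_j x^(j-1/2) / (1*3*...*(2j-1))
    root = math.sqrt(x)
    density = math.exp(-half) / math.sqrt(2.0 * math.pi)
    total = math.erfc(math.sqrt(half))
    term = root
    for j in range(1, (dof - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += 2.0 * density * term
    return total

left = ["Test", "Eneolithic_steppe", "Anatolian_Neolithic", "WHG"]  # target + 3 sources
n_right = 14                                                         # outgroups in the table
dof = rank_test_dof(len(left), n_right, rank=len(left) - 2)          # rank = sources - 1
print(dof)  # 11
print(round(chi2_sf(19.675, dof), 3))  # 0.05 (19.675 is roughly the 95% quantile for 11 dof)
```

A large chi-square (small p-value) rejects the rank, and with it the model; a p-value above the chosen threshold only means that the model is not rejected, not that it is the true history.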

I have used for all analyses below a merged dataset including the curated one of the Reich Lab, the latest on Central and South Asia by Narasimhan, Patterson et al. (2019), on Iberia by Olalde et al. (2019), and on the East Baltic by Saag et al. (2019), as well as datasets including samples from Wang et al. (2019) and Lamnidis et al. (2018). I used (and intend to use) the same merged dataset in all cases, despite its huge size, to avoid adding one more uncontrolled variable to the analyses, so that all results obtained can be compared.

I try to prepare in advance a bunch of relevant files with left pops and right pops for each model:

  1. It seems a priori more reasonable to use geographically and chronologically closer proxy populations (say, Trypillia or GAC for Steppe-related peoples) than hypothetical combinations of ancestral ones (viz. Anatolian farmer, WHG, and EHG).
  2. This also means using subgroups closer to the most likely source population, such as (Don-Volga interfluve) Yamnaya_Kalmykia rather than (Middle Volga) Yamnaya_Samara for the western expansion of late Repin/early Yamnaya, or the early Germany_Corded_Ware.SG or Czech_Corded Ware for the group closest to the Proto-Corded Ware population (see below), likely neighbouring the Upper Vistula region.
  3. Whenever I know what I want to test, I usually test two source populations for different targets, which is a much more efficient use of computing resources (I need my PC back for its normal use); whenever I don’t know exactly what to test, I use three-way admixture models and look for subsets to try and improve the results.
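The batching strategy in the list above can be sketched in Python. This is a hypothetical helper rather than the workflow behind the results in this post: it enumerates every two- or three-way combination of candidate sources for a target and writes the left/right population files plus a qpAdm parameter file using the usual ADMIXTOOLS keys (genotypename, snpname, indivname, popleft, popright). The dataset prefix `merged` and the candidate labels are assumptions and may not match the exact labels of the merged dataset.

```python
import itertools
import tempfile
from pathlib import Path

# Outgroups ("right" populations) as listed in Supplementary Table 13 above.
RIGHT = [
    "Mbuti.DG", "Ust_Ishim.DG", "Kostenki14", "MA1", "Han.DG", "Papuan.DG",
    "Onge.DG", "Villabruna", "Vestonice16", "ElMiron", "Ethiopia_4500BP.SG",
    "Karitiana.DG", "Natufian", "Iran_Ganj_Dareh_Neolithic",
]

def enumerate_models(target, sources, n_sources):
    """Yield (model_name, left_pops) for each n-way combination of sources."""
    for combo in itertools.combinations(sources, n_sources):
        yield "__".join([target, *combo]), [target, *combo]

def write_model(outdir, name, left, right, prefix="merged"):
    """Write <name>.left, <name>.right and <name>.par; return the .par path.

    'merged' (merged.geno/.snp/.ind) is a hypothetical dataset prefix.
    """
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    left_path = outdir / f"{name}.left"
    right_path = outdir / f"{name}.right"
    left_path.write_text("\n".join(left) + "\n")
    right_path.write_text("\n".join(right) + "\n")
    par_path = outdir / f"{name}.par"
    par_path.write_text(
        f"genotypename: {prefix}.geno\n"
        f"snpname: {prefix}.snp\n"
        f"indivname: {prefix}.ind\n"
        f"popleft: {left_path}\n"
        f"popright: {right_path}\n"
        "details: YES\n"
    )
    return par_path

# Illustrative labels only; check them against the .ind file before running.
candidates = ["Yamnaya_Kalmykia", "Hungary_Baden", "Germany_Corded_Ware.SG", "Poland_GAC"]
two_way = list(enumerate_models("Germany_Beaker", candidates, 2))
print(len(two_way))  # 6

with tempfile.TemporaryDirectory() as tmp:
    for name, left in two_way:
        write_model(tmp, name, left, RIGHT)
    print(len(list(Path(tmp).glob("*.par"))))  # 6
```

Each parameter file would then be run separately with `qpAdm -p <file>.par`; passing `3` instead of `2` to `enumerate_models` gives the three-way batches.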

I have probably left out some more complex models by individualizing the most relevant groups, but for the time being this will have to do. Also, no other formal stats have been used in any case, which is an evident shortcoming that rules out drawing an interpretation directly and only from the results below.

Full qpAdm results for each batch of samples are presented in a Google Spreadsheet, with each tab (bottom of the page) showing a different combination of sources, usually ordered from formally ‘best’ (leftmost) to ‘worst’ (rightmost) fits, although the order is difficult to select in highly heterogeneous target groups, as will be readily visible.
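One way to automate a ‘best’ to ‘worst’ ordering like the one in the spreadsheet tabs, as a hedged sketch: treat a model as feasible when it is not rejected at a chosen threshold (0.05 here, an assumption) and all admixture weights fall within [0, 1], then sort feasible models by p-value. The class and function names are hypothetical, the fits below are placeholders rather than results from the spreadsheet, and the ordering says nothing about which model is historically correct.

```python
from dataclasses import dataclass

@dataclass
class QpAdmFit:
    model: str           # e.g. "Target = SourceA + SourceB"
    p_value: float       # tail probability of the rank test
    proportions: tuple   # estimated admixture weights (they sum to 1)

    def is_feasible(self, alpha=0.05):
        """Not rejected at level alpha, and every weight within [0, 1]."""
        return self.p_value >= alpha and all(0.0 <= w <= 1.0 for w in self.proportions)

def order_tabs(fits, alpha=0.05):
    """Feasible fits first, then by decreasing p-value within each group."""
    return sorted(fits, key=lambda f: (not f.is_feasible(alpha), -f.p_value))

# Placeholder numbers, not results from the spreadsheet.
fits = [
    QpAdmFit("model_1", 0.31, (0.62, 0.38)),
    QpAdmFit("model_2", 0.48, (1.12, -0.12)),  # negative weight: infeasible
    QpAdmFit("model_3", 0.002, (0.55, 0.45)),  # rejected at 0.05
]
print([f.model for f in order_tabs(fits)])  # ['model_1', 'model_2', 'model_3']
```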

Disintegration, migration, and imports of the Azov–Black Sea region. First migration event (solid arrows): Gordineşti–Maikop expansion (groups: I – Bursuchensk; II – Zhyvotylivka; III – Vovchans’k; IV – Crimean; V – Lower Don; VI – pre-Kuban). Second migration event (hollow arrows): Repin expansion. After Rassamakin (1999), Demchenko (2016).

Corded Ware origins

The latest publications on the Yampil barrow complex have not much improved our understanding of the complexity of Corded Ware origins from an archaeological point of view, which involve multiple cultural (hence likely population) influences. This bit is from Ivanova et al., Baltic-Pontic Studies (2015) 20:1, and most of the paper’s questions remain unanswered (except maybe for the relevance of the Złota group):

In the light of the above outline therefore one should argue that the ‘architecture of barrows’ associated in the ‘Yampil landscape’ of the Middle Dniester Area with the Eneolithic (specifically, mainly with the TC), precedes the development of a similar phenomenon that can be observed from 2900/2800 BC in the Upper Dniester Area and drainage basin of the Upper Vistula, associated with the CWC [Goslar et al. 2015; Włodarczak 2006; 2007; 2008; Jarosz, Włodarczak 2007]. The most consuming research question therefore is whether ritual customs making use of Eneolithic (Tripolye) ‘barrow architecture’ could have penetrated northwards along the Dniester route, where GAC communities functioned. One could also ask what role the rituals played among the autochthons [Kośko 2000; Włodarczak 2008; 2014: 335; Ivanova, Toshchev 2015b].

This issue has already been discussed with a resulting tentative systemic taxonomy in the studies of Włodarczak, arguing for the Złota culture (ZC) in the Vistula region as an illustration of one of the (Małopolska) reception centres of civilization inspirations from the oldest Pontic ‘barrow culture’ circle associated with the Eneolithic and Early Bronze Age [Włodarczak 2008]. Notably, it is in the ZC that one can notice a set of cultural traits (catacomb grave construction, burial details, forms and decoration of vessels) analogous to those shared by the north-western Black Sea Coast groups of the forest-steppe Eneolithic (chiefly Zhyvotilovka-Volchansk) and the Late Tripolye circle (chiefly Usatovo-Gordinești-Horodiștea-Kasperovtsy).

Globular Amphorae culture „exodus” to the Danube Delta: a – Globular Amphorae culture; b – GAC (1), Gorodsk (2), Vykhvatintsy (3) and Usatovo (4) groups of Trypillia culture; c – Coţofeni culture; d – northern border of the late phase of Baden culture; red arrows – direction of Globular Amphora culture expansion; blue arrow – direction of „reflux” of Globular Amphora culture (apud Włodarczak, 2008, with changes).

Taking into account that I6561 might be wrongly dated, we cannot include the Corded Ware-like sample of the end-5th millennium BC in the analysis of Corded Ware origins. That uncertainty in the chronology of the appearance of “Steppe ancestry” in Proto-Corded Ware peoples complicates the selection of any potential source population from the CHG cline.

Nevertheless, the lack of hg. R1a-M417 and of sizeable Pre-Yamnaya-related ancestry in the sampled Pontic forest-steppe Eneolithic populations (represented exclusively by two samples from Dereivka ca. 3600-3400 BC) would leave open the interesting possibility that a similar ancestry reached the forest-steppe region between modern Poland and Ukraine during the known complex population movements of the Late Eneolithic.

It is known that Corded Ware-derived groups and Steppe Maykop show bad fits for Pre-Yamnaya/Yamnaya ancestry, and also that Steppe Maykop is a potential source of “Steppe-related ancestry” within the Eneolithic CHG mating network of the Pontic-Caspian steppes and forest-steppes. It thus makes sense to test Corded Ware for recent Trypillia and Maykop influences, characteristic of Late Trypillia and Late Maykop groups in the North Pontic area (such as Zhyvotylivka–Vovchans’k and Gordineşti), side by side with potential Pre-Yamnaya and Yamnaya sources:

Now, the main obvious difference between Khvalynsk-Yamnaya and Corded Ware is the long-lasting, pervasive Y-chromosome bottlenecks under R1b lineages in the former, compared to the haplogroup variability and late bottleneck under R1a-M417 in the latter, which speaks in favour – on top of everything else – of a different community of sub-Neolithic hunter-gatherers including hg. R1a-M417 hijacking the expansion of Steppe_Maykop-related ancestry around the Volhynian-Podolian Upland.

Akin to how Yamnaya patrilineal descendants hijacked regional EEF (±CWC) ancestry components mainly through exogamy, dragging them into the different expanding Bell Beaker groups (see below), but kept their Indo-European languages, these hunter-gatherers that admixed with peoples of “Steppe ancestry” were the most likely vector of expansion of Uralic languages in Eastern Europe.

PCA of ancient Eurasian samples. Marked are likely Proto-Corded Ware samples and the potential origin of their PCA cluster based on qpAdm results. See full PCA and more related files.

Baltic Corded Ware

One of the most interesting aspects of the results above is the surprising heterogeneity of the different regional groups, which is also reflected in the Y-DNA variability of early Corded Ware samples.

Seeing how Baltic CWC groups, especially the early Latvia_LN sample, show particularly bad fits with the models above, it seems necessary to test how this population might have come to be. My first impression in 2017 was that they could represent early Corded Ware groups admixed with Yamnaya settlers through their interactions along the Dnieper-Dniester corridor.

However, I recently predicted that the most likely admixture leading to their ancestry and PCA cluster would involve a Corded Ware-like group and a group related to sub-Neolithic cultures of eastern Europe, whose best proxy to date are EHG-like Khvalynsk samples (i.e. excluding the outlier with Pre-Yamnaya ancestry, I0434):

Detail of the PCA of the Corded Ware expansion. See full PCA and more related files.

Late Corded Ware + Yamnaya vanguard

Also relevant are the mixtures of Corded Ware from Esperstedt, and particularly those of sample I0104, which, as I have repeated many times in this blog, I suspect to be influenced by vanguard Yamnaya settlers:

The infeasible models of CWC + Yamnaya_Kalmykia ± Hungary_Baden (see below for Bell Beakers) and the potential cluster formed with other samples from the Baltic suggest that it could represent a more complex set of mixtures with sub-Neolithic populations. On the other hand, its location in Germany, late date (ca. 2500 BC or later), and position in the PCA, together with the good fits obtained for Germany_Beaker as a source, suggest that the increase in Steppe-related ancestry + EEF makes it impossible for the model (as I set it up) to directly include Yamnaya_Kalmykia, despite this excess Steppe-related ancestry actually coming from Yamnaya vanguard groups.

I think it is very likely that the future publication of EEF-admixed Yamnaya_Hungary samples (or maybe even Yamnaya vanguard samples) will improve the fits of this model.

These results at least confirm the need to distrust the common interpretation of mixtures including late Corded Ware samples from Esperstedt (giving rise to the “up to 75% Yamnaya ancestry of CWC” in the 2015 papers) as representative of the Corded Ware culture as a whole, and always to keep in mind that an admixture of European BA groups including Corded Ware Esperstedt as a source also includes East BBC-like ancestry, unless proven otherwise.

Yamnaya vanguard groups in Corded Ware territory before the expansion of Bell Beakers (ca. 2500 BC). See full map.

Bell Beaker expansion

A hotly (re)debated topic in the past 6 months or so, and for all the wrong reasons, is the origin of the Bell Beaker folk. Archaeology, linguistics, and different Y-chromosome bottlenecks clearly indicate that Bell Beakers were at the origin of the North-West Indo-European expansion in Europe, while the survival of Corded Ware-related groups in north-eastern Europe is clearly related to the expansion of Uralic languages.

NOTE. For the interesting case of Proto-Indo-Iranians expanding with Corded Ware-like ancestry, see more on the formation of Sintashta-Potapovka-Filatovka from East Uralic-speaking Abashevo and Pre-Proto-Indo-Iranian-speaking Poltavka herders. See also more on R1a in Indo-Iranians and on the social complexity of Sintashta.

Nevertheless, every single discarded theory out there seems to keep coming back to life from time to time, and a new wave of interest in “Bell Beaker from the Single Grave culture” somehow got revived in the process, too, because this obsession – unlike the “Bell Beakers from Iberia Chalcolithic” – is apparently acceptable in certain circles, for some reason.

We know that Iberian Beakers, British Beakers, or Sicilian EBA – representing the most likely closest source population of speakers of Proto-Galaico-Lusitanian, Pre-Celtic Indo-European, and Proto-Elymian, respectively – have already been successfully tested for a direct origin among Western European Beakers in Olalde et al. (2018), Olalde et al. (2019), and Fernandes et al. (2019).

This success in ascertaining a closer Beaker source is probably due to the physical isolation of the specific groups (related to Germany_Beaker, Netherlands_Beaker, and NE_Mediterranean_Beaker samples, respectively) after their migration into regions dominated by peoples without Steppe-related ancestry. Furthermore, Celtic-speaking populations expanding with Urnfield south of the Pyrenees also show a good fit with a source close to France_Beaker.

So I decided to test sampled Bell Beaker populations, to see if this could shed light on the most likely source population of individual Beaker groups and on the direction of migration within Central Europe, i.e. roughly eastwards or westwards. As was to be expected for closely related populations (see the relevant discussion here), an attempt to offer a simplistic analysis of direction based on formal stats does not make any sense, because most of the alternative hypotheses cannot be rejected:

Not only because of the similar values obtained, but also because it is absurd to take p-values as a measure of anything, especially when most of these conflicting groups with slightly ‘better’ or ‘worse’ p-values represent multiple different mixtures of the type (Yamnaya + EEF) + (Corded Ware + EEF ± Yamnaya), impossible to distinguish without selecting proper, direct ancestral populations…
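The logic of the previous paragraph can be made explicit with a small sketch: only rejection (p below a chosen threshold, 0.05 here as an assumption) is informative, so when two or more competing models survive, a slightly higher p-value does not single one of them out. The function names, model names and p-values are hypothetical.

```python
def split_by_rejection(p_values, alpha=0.05):
    """Return (retained, rejected) model names; only p < alpha rejects a model."""
    retained = sorted(name for name, p in p_values.items() if p >= alpha)
    rejected = sorted(name for name, p in p_values.items() if p < alpha)
    return retained, rejected

def verdict(p_values, alpha=0.05):
    """Summarise what a set of competing models can and cannot tell us."""
    retained, _ = split_by_rejection(p_values, alpha)
    if not retained:
        return "all models rejected"
    if len(retained) == 1:
        # Surviving alone still does not prove the model correct.
        return f"only {retained[0]} survives"
    return "cannot discriminate: " + ", ".join(retained)

# Hypothetical p-values for two directions of Beaker migration.
print(verdict({"eastwards": 0.21, "westwards": 0.34}))
# cannot discriminate: eastwards, westwards
```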

A further example of how explosive the Bell Beaker expansion was into different territories, and of their extensive local admixture, is shown by the unsuccessful attempt by Olalde et al. (2018) to obtain an origin of the EEF source for all Beaker groups (excluding Iberian Beakers):

Investigating the genetic makeup of Beaker-complex-associated individuals. Testing different populations as a source for the Neolithic ancestry component in Beaker-complex-associated individuals. The table shows P values (* indicates values > 0.05) for the fit of the model: ‘Steppe_EBA + Neolithic/Copper Age’ source population.

Map of attested Yamnaya pit-grave burials in the Hungarian plains; superimposed in shades of blue are common areas covered by floods before the extensive controls imposed in the 19th century; in orange, cumulative thickness of sand, unfavourable loamy sand layer. Marked are settlements/findings of Boleráz (ca. 3500 BC on), Baden (until ca. 2800 BC), Kostolac (precise dates unknown), and Yamna kurgans (from ca. 3100/3000 BC on).

Now, there is a simpler way to understand what kind of Steppe-related ancestry is proper of Bell Beakers. I tested two simple models for some Beaker groups: Yamnaya + Hungary Baden vs. Corded Ware + GAC Poland. After all, the Bell Beaker folk should prefer a source more closely related to either Yamnaya Hungary or Central European Corded Ware:

Interestingly, models including Yamnaya + Baden show good fits for the most important groups related to North-West Indo-Europeans, including Bell Beakers from Germany, the Netherlands, Italy, and Poland, representing the most likely closest source populations of speakers of Pre-Proto-Celtic, Pre-Proto-Germanic, Proto-Italo-Venetic, and Pre-Proto-Balto-Slavic, respectively.

The admixed Yamnaya samples from Hungary that will hopefully be published soon by the Jena Lab will most likely further improve these fits, especially in combination with intermediate Chalcolithic populations of the Middle and Upper Danube and its tributaries, to a point where there will be an absolute chronological and geographical genomic trail from the fully Yamnaya-like Yamnaya settlers from Hungary to all North-West Indo-European-speaking groups of the Early Bronze Age.

The only difference between groups will be the gradual admixture events of their source Beaker group with local populations on their expansion paths, including peoples of mainly EEF, CWC+EEF, or CWC+EEF+Yamnaya related ancestry. There is ample evidence beyond ancestry models to support this, in particular continued Y-DNA bottlenecks under typical Yamnaya paternal lineages, mainly represented by R1b-L51 subclades.

Distribution of the Bell Beaker East Group, with its regional provinces, as of c. 2400 cal BC (after Heyd et al. 2004, modified). See full maps.

European Early Bronze Age

European EBA groups that might show conflicting results due to multiple admixture events with Corded Ware-related populations are the Únětice culture and the Nordic Late Neolithic.

The results for Únětice groups seem to be in line with what is expected of a Central European EBA population derived from Bell Beakers admixed with surrounding populations of East Bell Beaker and/or late (Epi-)Corded Ware descent.

Potential models of mixture for Nordic Late Neolithic samples – despite the bad fits due to the lack of direct ancestral CWC and BBC groups from Denmark – seem to be impossible to justify as derived exclusively from Single Grave or (even less) from Battle Axe peoples, supporting immigration waves of Bell Beakers from the south and further admixture events with local groups through maritime domination.

PCA of ancient European samples. Marked are Bronze Age clusters. See full PCAs.

Balkans Bronze Age

The potential origin of the typical Corded Ware Steppe-related ancestry in the social upheaval and population movements of the Dnieper-Dniester forest-steppe corridor during the 4th millennium BC raises the question: how much of their ancestry do Balkan Bronze Age groups owe to a population other than the spreading Pre-Yamnaya-like Suvorovo-Novodanilovka chieftains? Furthermore, which Bronze Age groups seem more likely to be derived exclusively from Pre-Yamnaya groups, and which are more likely to be derived from a mixture of Yamnaya and Pre-Yamnaya? Do the formal stats obtained correspond to the expected results for each group?

Since the expansion of hg. I2a-L699 (TMRCA ca. 5500 BC) need not be associated with Yamnaya, some of these values – together with the assessment of each individual archaeological culture – may call into question their origin in a Yamnaya-related expansion rather than in a Khvalynsk-related one.

NOTE. These are the last ones I was able to test yesterday, and I have not thought these models through, so feel free to propose other source and target groups. In particular, complex movements through the North Pontic area during the Late Eneolithic would suggest that there might have been different Steppe-ancestry-related vs. EEF-related interactions in the north-west and west Pontic area before and during the expansion of Yamnaya.


Together with North-West Indo-Europeans, Proto-Greeks are one of the key Indo-European populations that should be derived from Yamnaya to confirm the Steppe hypothesis, and they will in turn improve our understanding of the preceding Palaeo-Balkan community. Unfortunately, we only have Mycenaean samples from the Aegean, with slight contributions of Steppe-related ancestry.

Still, analyses with potential source populations for this Steppe ancestry show that the Yamnaya outlier from Bulgaria is a good fit:

The comparison of all results makes it quite evident why (Srubnaya-related) Bulgaria_MLBA I2163 or Sintashta_MLBA give good fits relative to the only a priori reasonable Yamnaya and Catacomb sources: it is not about some hypothetical shared ancestor in Graeco-Aryan-speaking East Yamnaya– or even Catacomb-Poltavka-related groups, because all available Yamnaya-related peoples are almost indistinguishable from each other (at least with the sampling available today). These results reflect a sizeable contribution of similar EEF-related populations from around the Carpathians in both Steppe-related groups: Corded Ware and Yamnaya settlers from the Balkans.

Cultural groups in and around the Balkans during the Early Bronze Age. See full maps.

qpAdm magic

In hobby ancestry magic, as in magic in general, it is not about getting dubious results out of thin air: misdirection is the key. A magician needs to draw the audience’s attention to ‘remarkable’ ancestry percentages coupled with ‘great’ (?) p-values that purportedly “prove” what the audience expects to see, distracting everyone from the truly interesting aspects, like statistical design, the data used (and its shortcomings), other opposing models, a comparison of values, a proper interpretation… you name it.

I reckon – based on the examples above – that the following problems lie at the core of bad uses of qpAdm:

  1. In the formal aspect, the poor understanding of what p-values and other formal stats obtained actually mean, and – more importantly – what they don’t mean. The simplistic tendency to accept results of a few analyses at face value is necessarily wrong, insofar as there is often no proper reasoning about what is being assessed and how, and there is never a prior opinion about what could be expected if the alternative hypotheses were true.
  2. In the interpretation aspect, the poor judgement of accompanying any results with simplistic, superficial, irrelevant, and often plainly wrong archaeological or linguistic data selected a posteriori; the inclusion of some racial or sociopolitical overtones in the mixture to set a propitious mood in the target audience; and a sort of ritualistic theatrics with the main theme of ‘winning’, that is best completed with ad hominems.

If you get rid of all this, the most reasonable interpretation of the output of a model proposed and tested should be similar to Nick Patterson’s words in his explanation of qpWave and qpAdm use:

Here we see that, at least in this analysis there are reasonable models with CordedWareNeolithic is a mix of either WHG or LBKNeolithic and YamnayaEBA. (…) The point of this note is not to give a serious phylogenetic analysis but the results here certainly support a major Steppe contribution to the Corded Ware population, which is entirely concordant with the archaeology [?].

Very far, as you can see, from the childish “Eureka! I proved the source!”-kind of thinking common among hobbyists.

The Mycenaean case is an illustrative example: if the Yamnaya outlier from Bulgaria were not available, and if one were not careful when designing and assessing those mixture models, the interpretation would range from erroneous (viz. a Graeco-Aryan substrate, as I initially thought) to impossible (say, inventing migration waves of Sintashta or Srubnaya peoples into Crete). The models presented above show that a contribution of Yamnaya to Mycenaeans couldn’t be rejected, and this alone should have been enough to accept Yamnaya as the most likely source population of “Steppe ancestry” in Proto-Greeks, pending intermediate samples from the Balkans. After all, one could actually find that the ‘best’ p-value for source populations of Mycenaeans corresponds to a combination of modern Poles + Turks, despite the impracticality of such a model…
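A minimal sketch of the order of operations argued for here, with hypothetical labels, illustrative rounded dates, and placeholder p-values: first discard models that are a priori impossible (only one criterion is checked in this sketch, that no source postdates the target), and only then compare the remaining, non-rejected models. Archaeological and geographical plausibility would need further criteria, assessed by hand.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Population:
    label: str
    mean_date_bc: float  # positive = BC, negative = AD (illustrative values only)

def chronologically_possible(target, sources):
    """Sources must not be younger than the target they are supposed to feed."""
    return all(s.mean_date_bc >= target.mean_date_bc for s in sources)

def plausible_fits(target, candidates, alpha=0.05):
    """Drop impossible models first, then keep non-rejected ones, highest p first."""
    kept = [
        (tuple(s.label for s in sources), p)
        for sources, p in candidates
        if chronologically_possible(target, sources) and p >= alpha
    ]
    return sorted(kept, key=lambda item: -item[1])

# Hypothetical labels, rounded illustrative dates, placeholder p-values.
mycenaean = Population("Mycenaean", 1400)
poles = Population("Modern_Poles", -2000)
turks = Population("Modern_Turks", -2000)
yamnaya_bg = Population("Yamnaya_Bulgaria_outlier", 3000)
local_eef = Population("Aegean_local_hypothetical", 2500)

candidates = [
    ((poles, turks), 0.60),           # 'best' p-value, but impossible
    ((yamnaya_bg, local_eef), 0.20),  # not rejected and chronologically possible
]
print(plausible_fits(mycenaean, candidates))
# [(('Yamnaya_Bulgaria_outlier', 'Aegean_local_hypothetical'), 0.2)]
```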

I haven’t been able to reproduce results that supposedly showed that Corded Ware is more likely to be derived from (Pre-)Yamnaya than from other source populations, or that Corded Ware is better suited as the ancestral population of Bell Beakers. The analyses above show values in line with what has been published in recent scientific papers, and with what should be expected based on linguistics and archaeology. So I’ll go out on a limb here and say that it’s only through a careful selection of outgroups and tested samples, and by comparing as few models as possible, that one could eventually obtain this kind of result and interpretation, if at all.

Whether that kind of special care for outgroups and samples is about (a) an acceptable fine-tuning of the analyses, (b) a simplistic selection dragged from the first papers published and applied indiscriminately to all models, or (c) cherry picking analyses until results fit the expected outcome, is a question that will become mostly irrelevant when future publications continue to support an origin of the expansion of ancient Indo-European languages in Khvalynsk- and Yamnaya-related migrations.

Feel free to suggest (reasonable) modifications to correct some of these models in the comments. Also, be sure to check other values such as proportions, SD or SNPs of the different results that I might not have taken into account when assessing ‘good’ or ‘bad’ fits.


On the Ukraine Eneolithic outlier I6561 from Alexandria


Over the past week or so, since the publication of new Corded Ware samples in Narasimhan, Patterson et al. (2019) and after finding out that the R1a-M417 star-like phylogeny may have started ca. 3000 BC, I have been ruminating on the relevance of contradictory data about the Ukraine_Eneolithic_o sample from Alexandria, its potentially wrong radiocarbon date, and its implications for the Indo-European question.

How many other similar ‘controversial’ samples are there that we haven’t even considered? And what mechanisms are in place to ensure that the case of Hajji_Firuz_CA I2327 is not repeated?

Ukraine Eneolithic outlier I6561

It is not the first time that I (or many others) have questioned either its subclade or its date, but the contradictory data seem to keep piling up. We can still explain all these discrepancies while accepting the radiocarbon date as correct, seeing how it is a direct and newly reported lab analysis: since he is an isolated individual from a poorly sampled region, he may actually be the first one to show features proper of later Corded Ware-related samples.

PCA of ancient Eurasian samples. An interpretation of the evolution of the Pontic-Caspian steppe populations in the Eneolithic. See full PCA.

The individual seems to be especially relevant for the Indo-European and Uralic homeland question. The last one to mention this sample in a publication was Anthony (2019), who considered it together with two other Eneolithic samples from Dereivka to show how Anatolian farmer-related ancestry first appeared in the recently opened CHG mating network of the Pontic-Caspian steppes and forest-steppes during the Middle Eneolithic, after the expansion of Khvalynsk:

The currently oldest sample with Anatolian Farmer ancestry in the steppes in an individual at Aleksandriya, a Sredni Stog cemetery on the Donets in eastern Ukraine. Sredni Stog has often been discussed as a possible Yamnaya ancestor in Ukraine (Anthony 2007: 239- 254). The single published grave is dated about 4000 BC (4045–3974 calBC/ 5215±20 BP/ PSUAMS-2832) and shows 20% Anatolian Farmer ancestry and 80% Khvalynsk-type steppe ancestry (CHG&EHG). His Y-chromosome haplogroup was R1a-Z93, similar to the later Sintashta culture and to South Asian Indo-Aryans, and he is the earliest known sample to show the genetic adaptation to lactase persistence (I3910-T). Another pre-Yamnaya grave with Anatolian Farmer ancestry was analyzed from the Dnieper valley at Dereivka, dated 3600-3400 BC (grave 73, 3634–3377 calBC/ 4725±25 BP/ UCIAMS-186349). She also had 20% Anatolian Farmer ancestry, but she showed less CHG than Aleksandriya and more Dereivka-1 ancestry, not surprising for a Dnieper valley sample, but also showing that the old fifth-millennium-type EHG/WHG Dnieper ancestry survived into the fourth millennium BC in the Dnieper valley (Mathieson et al. 2018).

The main problem is that this sample shows several inconsistent, anachronistic features compared to its precise reported radiocarbon date ca. 4045–3974 calBCE (5215±20BP, PSUAMS-2832). I summarized them on Twitter:

  • First known R1a-M417 sample, of subclade R1a-Y26 (Y2-), whose estimated formation date and TMRCA are ca. 2750 BC (95% CI ca. 3750–1950 BC), proper of much later Steppe_MLBA bottlenecks. The closest available sample would be the Poltavka outlier of hg. R1a-Z94 (ca. 2700 BC), from a mixed cemetery that could belong to a later (likely Abashevo) layer; the closest related subclade is probably found in sample I12450 of Butkara_IA (ca. 800 BC).
  • NOTE. The formation date of the upper clade R1a-Z93 is estimated ca. 3000 BC, with a 95% CI of ca. 3550–2550 BC. Since a subclade cannot be older than its parent clade, the actual TMRCA range of the subclade most likely has a more recent upper bound (i.e. oldest possible formation date) than the ca. 3750 BC estimated from the available samples under Y3.

  • Ancestry and PCA position like those of Steppe_MLBA (see PCA below), different from neighbouring Sredni Stog samples of the roughly coetaneous Dereivka site (ca. 3600–3400 BC), and from a later Yamnaya sample from Dereivka (ca. 2800 BC), which is even more shifted toward WHG-related ancestry.
  • Allele for lactase persistence (I3910-T), found only much later among Bell Beakers, and still later in Sintashta and Steppe_MLBA samples. This suggests strong selection in northern Europe and South Asia, stemming from steppe-related (and not forest-steppe-related) peoples and postdating the age of massive Indo-European migrations.
    Hajji Firuz Chalcolithic outlier

    My impression is that the Hajji_Firuz Chalcolithic outlier, initially dated ca. 5900–5500 BC, had much less reason to be questioned than this sample, since Pre-Yamnaya ancestry was (and apparently still is) believed by members of the Reich Lab to have come from south of the Caucasus, and to have arrived in the North Caspian steppe around that time or earlier, i.e. before the 5th millennium BC.

    The formation date of its initially reported haplogroup, R1b-Z2103, is ca. 4100 BC (95% CI 4800–3500 BC), which also seems roughly compatible with that date and site (at least as compatible as R1a-Y3(xY2) is with ca. 4000 BC), so it could have been interpreted as a migrant from the South Caspian region, potentially related to Proto-Anatolians, especially before the description of the Caucasus genetic barrier in Wang et al. (2018). For some reason, though, the Hajji_Firuz sample was questioned, but this one didn’t even merit a question mark.

    There was already a similar situation with two samples (RISE568 and RISE569) initially reported as belonging to Czech Corded Ware groups, that turned out to be Early Slavs ca. 3,000 years younger, in turn more closely related to Bell Beaker-derived cultures of Central-East Europe. It seems little has changed since that case.

    All in all, my guess is that genomic data of I6561 would have been a priori more compatible with a later period, during the expansion of East Corded Ware groups: at least Middle Dnieper culture, potentially Multi-Cordoned Ware culture, but most likely a Srubnaya-related one, given the most likely SNP mutation and TMRCA date, and the haplogroup variability found in the few samples available from that culture.

    PCA of ancient Eurasian samples. Marked I6561 sample within the cluster formed by Srubnaya samples. See full PCA.

    Compatibility checks

    I tried to start a thread on the possibility that the radiocarbon date was wrong and, IF it were, how likely it would be that formal stats could actually show this, or how we could automatically prevent ancestry magic fiascos.

    In other words: if this guy were a Srubnaya-related individual actually dated e.g. ca. 1700 BC, and someone tried to ‘prove’ – based on the current open source tools alone – that he was the ancestor of expanding peoples of the 4th and 3rd millennium BC (i.e. Balkan outliers, Yamnaya, Corded Ware, you name it), could these results be formally challenged?

    I was hoping for some original brainstorming where people would propose crazy, essentially impossible to understand statistical models, say plotting dozens of well-studied mutations of different geographically related ancient samples with their reported dates, to visually highlight samples that don’t exactly fit with such a feature-based time series analysis; I mean, the kind of theoretical models I wouldn’t even be able to follow after the first two tweets or so. I didn’t receive an answer like that, but still:

    I have nothing to add to these answers, because I agree that all contradictory data are circumstantial.

    The current complete lack of this kind of validity check for ancestry models is disappointing, though, and leaves the so-called outliers in a dangerous limbo between “potentially very interesting samples” and “potentially wrongly dated samples”. The radiocarbon date is thus – together with the compatibility of source populations in terms of archaeological cultures and their potential relationship – a necessary variable to take into account in any statistical design: an error in one of these variables means a catastrophic error in the whole model.

    Formal stats

    For example, in these qpAdm models, I assumed that the Srubnaya, Ukraine_Eneolithic_outlier, and Bulgaria_MLBA samples were roughly coetaneous and potentially related to the Srubnaya-Sabatinovka-Noua cultural horizon, hence stemming from a source close to:

    1. Abashevo-like individuals (whose best proxy to date should be Poltavka_outlier I0432) potentially admixed with Poltavka-like herders; or
    2. Potapovka-like individuals potentially admixed with Catacomb-like peoples (whose best proxy until recently was probably Yamnaya_Kalmykia*).

    *To avoid adding more potential errors by merging different datasets, I have used only proxy samples available in the Reich Lab’s curated dataset of published ancient DNA.

    Srubnaya and Noua-Sabatinovka cultural horizon during the MLBA. See full maps.

    Apart from the lack of more models for comparison (I’m not going to dedicate more time to this), the results can’t be interpreted without proper sampling and context, either, because (1) Poltavka_o may actually be from a much later group closely related to Srubnaya; (2) Bulgaria_MLBA is only one sample; and (3) there are only two samples from Potapovka. The models presented here are therefore basically useless, like many similar models that have been tested just looking for a formal “best fit”.

    So feel free to chime in and contribute ideas on how to detect in the future whether a sample is ancestral to or derived from others. I will post informative answers from Twitter here, too, if there are any. I don’t think a discussion about the potentially wrong date of this specific sample is very useful, because it seems impossible to prove or disprove at this point. Just what tools or data would you use to at least try to assess whether samples are compatible with their reported dates or not, preferably in some kind of automated sieve that takes dozens or hundreds of samples into account?
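One very simple sieve of the kind asked for above could compare each sample’s calibrated date range with the estimated formation window of the Y-chromosome subclade it carries: a sample dated entirely before the oldest end of the 95% interval for its subclade’s formation deserves a second look (at the date, the SNP calls, or the TMRCA estimate). This is a hypothetical sketch, not an existing tool. Dates are written as negative years for BC; the only real figures are those given above for I6561 (4045–3974 calBC) and R1a-Y26 (95% CI ca. 3750–1950 BC), and the second sample is invented. A flag is a prompt for review, not proof of error, and a real sieve would need many more features (ancestry, other SNPs, context).

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """A dated sample and the Y-DNA subclade it was reported to carry."""
    sample_id: str
    date_range: tuple  # calibrated (oldest, youngest); negative numbers = BC
    subclade: str


def flag_anachronisms(samples, formation_ci):
    """Return ids of samples dated entirely before their subclade's formation window.

    formation_ci maps a subclade to its 95% formation interval (oldest, youngest),
    with the same negative-for-BC convention. Subclades without an estimate are skipped.
    """
    flagged = []
    for s in samples:
        ci = formation_ci.get(s.subclade)
        if ci is None:
            continue
        oldest_formation = ci[0]
        youngest_sample_date = s.date_range[1]
        # Anachronistic if even the youngest end of the sample's date range
        # is older than the oldest plausible formation date of its subclade.
        if youngest_sample_date < oldest_formation:
            flagged.append(s.sample_id)
    return flagged


ci_table = {"R1a-Y26": (-3750, -1950)}  # figures quoted in the text
samples = [
    Sample("I6561", (-4045, -3974), "R1a-Y26"),                  # reported date
    Sample("hypothetical_Srubnaya", (-1800, -1700), "R1a-Y26"),  # invented example
]

print(flag_anachronisms(samples, ci_table))
```

Run over hundreds of samples and several well-dated subclades, such a check would at least produce a list of candidates for re-dating or re-calling.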

    On the bright side, there is so much more than formal stats to arrive at relevant inferences about prehistoric populations, their movements and languages. That’s why I6561 didn’t matter for the conclusion by Anthony (2019) that the R1b-rich Eneolithic Don-Volga-Caucasus region was the most likely Indo-Anatolian and Late Proto-Indo-European homeland, due to the creation of a wide Eneolithic mating network with extended exogamy practices, where Y-chromosome bottlenecks seem to be one of the main kinds of genomic data to take into account from the Neolithic to the Middle Bronze Age.

    And that is the same reason why it doesn’t matter that much for the Proto-Indo-European or Uralic question for me, either.


Yamnaya ancestry: mapping the Proto-Indo-European expansions


The latest papers from Ning et al. Cell (2019) and Anthony JIES (2019) have offered some interesting new data, supporting once more what could be inferred since 2015, and what has been evident in population genomics since 2017: that Proto-Indo-Europeans expanded under R1b bottlenecks, and that the so-called “Steppe ancestry” referred to two different components, one – Yamnaya or Steppe_EMBA ancestry – expanding with Proto-Indo-Europeans, and the other one – Corded Ware or Steppe_MLBA ancestry – expanding with Uralic speakers.

The following maps are based on formal stats published in the papers and supplementary materials from 2015 until today, mainly on Wang et al. (2018 & 2019), Mathieson et al. (2018) and Olalde et al. (2018), and others like Lazaridis et al. (2016), Lazaridis et al. (2017), Mittnik et al. (2018), Lamnidis et al. (2018), Fernandes et al. (2018), Jeong et al. (2019), Olalde et al. (2019), etc.

NOTE. As in the Corded Ware ancestry maps, the selected reports in this case are centered on the prototypical Yamnaya ancestry vs. other simplified components, so everything else refers to simplistic ancestral components widespread across populations that do not necessarily share any recent connection, much less a language. In fact, most of the time they clearly didn’t. They can be interpreted as “EHG that is not part of the Yamnaya component”, or “CHG that is not part of the Yamnaya component”. They can’t be read as “expanding EHG people/language” or “expanding CHG people/language”, at least no more than maps of “Steppe ancestry” can be read as “expanding Steppe people/language”. Also, remember that I have left the default behaviour for colour classification, so that the highest value (i.e. 1, or white colour) could mean anything from 10% to 100% depending on the specific ancestry and period; that’s what the legend is for… But, fere libenter homines id quod volunt credunt (men readily believe what they want to believe).
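To make the colour-scale point concrete: assuming the mapping software stretches the colour ramp linearly between the minimum and maximum values of each map (an assumption for illustration; the actual default classification may use other class breaks), the same white colour stands for very different absolute proportions on different maps. The proportions below are hypothetical.

```python
def colour_position(values):
    """Map each value to a 0..1 position on a colour ramp (linear min-max stretch).

    0 is the lowest value on this map and 1 (white) the highest,
    whatever their absolute size.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical ancestry proportions on two different maps.
low_ancestry_map = [0.00, 0.05, 0.10]   # top value 10%
high_ancestry_map = [0.00, 0.50, 1.00]  # top value 100%

print(colour_position(low_ancestry_map))
print(colour_position(high_ancestry_map))
```

Both maps paint their top value white, although one top value is 10% and the other 100%.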


  1. Neolithic or the formation of Early Indo-European
  2. Eneolithic or the expansion of Middle Proto-Indo-European
  3. Chalcolithic / Early Bronze Age or the expansion of Late Proto-Indo-European
  4. European Early Bronze Age and MLBA or the expansion of Late PIE dialects

1. Neolithic

Anthony (2019) agrees with the most likely explanation for the CHG component found in Yamnaya, namely that it derived from steppe hunter-fishers close to the lower Volga basin. The ultimate origin of this specific CHG-like component that eventually formed part of Pre-Yamnaya ancestry is not clear, though:

The hunter-fisher camps that first appeared on the lower Volga around 6200 BC could represent the migration northward of un-admixed CHG hunter-fishers from the steppe parts of the southeastern Caucasus, a speculation that awaits confirmation from aDNA.

Natural neighbor interpolation of CHG ancestry among Neolithic populations. See full map.
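For readers unfamiliar with the method behind these maps: natural neighbor (Sibson) interpolation estimates the value at an unsampled point as a weighted average of nearby sampled sites, each site weighted by the area of its Voronoi cell that the new point would “steal” if it were inserted. Below is a discrete, grid-based approximation of that idea, written for illustration only. It is not the GIS implementation used to draw these maps; the grid step and margin are arbitrary assumptions, coordinates are treated as planar rather than geographic, and outside the area spanned by the sampled sites the result depends on the grid margin. The site coordinates and values are hypothetical.

```python
def natural_neighbor(sites, values, query, step=0.05, margin=1.0):
    """Discrete natural neighbor (Sibson) interpolation at one query point.

    sites: (x, y) coordinates of sampled sites; values: one value per site.
    The plane is approximated by grid cell centres; each cell belongs to its
    nearest site, and cells that would be closer to the query than to their
    owner count as "stolen" from that owner (the owner's weight).
    """
    if not sites or len(sites) != len(values):
        raise ValueError("need one value per site")
    qx, qy = query
    for (sx, sy), v in zip(sites, values):
        if (sx, sy) == (qx, qy):
            return v  # exact hit on a sampled site

    xs = [x for x, _ in sites] + [qx]
    ys = [y for _, y in sites] + [qy]
    x0, y0 = min(xs) - margin, min(ys) - margin
    nx = int(round((max(xs) + margin - x0) / step))
    ny = int(round((max(ys) + margin - y0) / step))

    stolen = [0] * len(sites)
    for i in range(nx):
        px = x0 + (i + 0.5) * step  # cell centre
        for j in range(ny):
            py = y0 + (j + 0.5) * step
            d2 = [(px - sx) ** 2 + (py - sy) ** 2 for sx, sy in sites]
            owner = min(range(len(sites)), key=d2.__getitem__)
            if (px - qx) ** 2 + (py - qy) ** 2 < d2[owner]:
                stolen[owner] += 1

    total = sum(stolen)
    if total == 0:
        # Query closer to a site than the grid can resolve: use the nearest site.
        nearest = min(range(len(sites)),
                      key=lambda k: (sites[k][0] - qx) ** 2 + (sites[k][1] - qy) ** 2)
        return values[nearest]
    return sum(w * v for w, v in zip(stolen, values)) / total


# Hypothetical sites with hypothetical ancestry proportions.
sites = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
values = [0.1, 0.3, 0.5, 0.7]
print(natural_neighbor(sites, values, (1.0, 1.0)))
```

Unlike a nearest-sample map, the result changes smoothly between sites and never leaves the range of the sampled values, which is why sparse sampling can still produce broad, gently graded patches.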

The typical EHG component that eventually formed part of Pre-Yamnaya ancestry came from the Middle Volga Basin, most likely close to the Samara region, as shown by the sampled Samara hunter-gatherer (ca. 5600-5500 BC):

After 5000 BC domesticated animals appeared in these same sites in the lower Volga, and in new ones, and in grave sacrifices at Khvalynsk and Ekaterinovka. CHG genes and domesticated animals flowed north up the Volga, and EHG genes flowed south into the North Caucasus steppes, and the two components became admixed.

Natural neighbor interpolation of EHG ancestry among Neolithic populations. See full map.

To the west, in the Dnieper-Dniester area, WHG became the dominant ancestry after the Mesolithic, at the expense of EHG, revealing a likely mating network reaching to the north into the Baltic:

Like the Mesolithic and Neolithic populations here, the Eneolithic populations of Dnieper-Donets II type seem to have limited their mating network to the rich, strategic region they occupied, centered on the Rapids. The absence of CHG shows that they did not mate frequently if at all with the people of the Volga steppes (…)

Natural neighbor interpolation of WHG ancestry among Neolithic populations. See full map.

North-West Anatolia Neolithic ancestry, proper of expanding Early European farmers, is found up to the Dniester, as Anthony (2007) had predicted.

Natural neighbor interpolation of Anatolia Neolithic ancestry among Neolithic populations. See full map.

2. Eneolithic

From Anthony (2019):

After approximately 4500 BC the Khvalynsk archaeological culture united the lower and middle Volga archaeological sites into one variable archaeological culture that kept domesticated sheep, goats, and cattle (and possibly horses). In my estimation, Khvalynsk might represent the oldest phase of PIE.

(…) this middle Volga mating network extended down to the North Caucasian steppes, where at cemeteries such as Progress-2 and Vonyuchka, dated 4300 BC, the same Khvalynsk-type ancestry appeared, an admixture of CHG and EHG with no Anatolian Farmer ancestry, with steppe-derived Y-chromosome haplogroup R1b. These three individuals in the North Caucasus steppes had higher proportions of CHG, overlapping Yamnaya. Without any doubt, a CHG population that was not admixed with Anatolian Farmers mated with EHG populations in the Volga steppes and in the North Caucasus steppes before 4500 BC. We can refer to this admixture as pre-Yamnaya, because it makes the best currently known genetic ancestor for EHG/CHG R1b Yamnaya genomes.

From Wang et al (2019):

Three individuals from the sites of Progress 2 and Vonyuchka 1 in the North Caucasus piedmont steppe (‘Eneolithic steppe’), which harbour EHG and CHG related ancestry, are genetically very similar to Eneolithic individuals from Khvalynsk II and the Samara region. This extends the cline of dilution of EHG ancestry via CHG-related ancestry to sites immediately north of the Caucasus foothills

Natural neighbor interpolation of Pre-Yamnaya ancestry among Neolithic populations. See full map. This map corresponds roughly to the map of Khvalynsk-Novodanilovka expansion, and in particular to the expansion of horse-head pommel-scepters (read more about Khvalynsk, and specifically about horse symbolism)

NOTE. Unpublished samples from Ekaterinovka have been previously reported as within the R1b-L23 tree. Interestingly, although the Varna outlier is a female, the Balkan outlier from Smyadovo shows two positive SNP calls for hg. R1b-M269. However, its poor coverage makes its most conservative haplogroup prediction R-M343.

The formation of this Pre-Yamnaya ancestry sets this Volga-Caucasus Khvalynsk community apart from the rest of the EHG-like population of eastern Europe.

Natural neighbor interpolation of non-Pre-Yamnaya EHG ancestry among Eneolithic populations. See full map.

Anthony (2019) seems to rely on ADMIXTURE graphics when he writes that the late Sredni Stog sample from Alexandria shows “80% Khvalynsk-type steppe ancestry (CHG&EHG)”. While this seems the most logical conclusion of what might have happened after the Suvorovo-Novodanilovka expansion through the North Pontic steppes (see my post on “Steppe ancestry” step by step), formal stats have not confirmed that.

In fact, analyses published in Wang et al. (2019) rejected that Corded Ware groups derive from this Pre-Yamnaya ancestry, something already hinted at in Narasimhan et al. (2018), when Steppe_EMBA showed a poor fit for expanding Srubna-Andronovo populations. Hence the need to consider the whole CHG component of the North Pontic area separately:

Natural neighbor interpolation of non-Pre-Yamnaya CHG ancestry among Eneolithic populations. See full map. You can read more about population movements in the late Sredni Stog and closer to the Proto-Corded Ware period.

NOTE. Fits for WHG + CHG + EHG in Neolithic and Eneolithic populations are taken in part from Mathieson et al. (2018) supplementary materials (download Excel here). Unfortunately, while data on the Ukraine_Eneolithic outlier from Alexandria abound, I don’t have specific data on the so-called ‘outlier’ from Dereivka compared to the other two analyzed together, so these maps of CHG and EHG expansion possibly show a narrower westward distribution than the real one ca. 4000-3500 BC.

Natural neighbor interpolation of WHG ancestry among Eneolithic populations. See full map.

Anatolia Neolithic ancestry clearly spread to the east into the north Pontic area through a Middle Eneolithic mating network, most likely opened after the Khvalynsk expansion:

Natural neighbor interpolation of Anatolia Neolithic ancestry among Eneolithic populations. See full map.
Natural neighbor interpolation of Iran Chl. ancestry among Eneolithic populations. See full map.

Regarding Y-chromosome haplogroups, Anthony (2019) insists on the evident association of Khvalynsk, Yamnaya, and the spread of Pre-Yamnaya and Yamnaya ancestry with the expansion of elite R1b-L754 (and some I2a2) individuals:

Y-DNA haplogroups in West Eurasia during the Early Eneolithic in the Pontic-Caspian steppes. See full map, and see culture, ADMIXTURE, Y-DNA, and mtDNA maps of the Early Eneolithic and Late Eneolithic.

3. Early Bronze Age

Data from Wang et al. (2019) show that Corded Ware-derived populations do not have good fits for Eneolithic_Steppe-like ancestry, no matter the model. In other words: Corded Ware populations show not only a higher contribution of Anatolia Neolithic ancestry (ca. 20-30% compared to the ca. 2-10% of Yamnaya); they show a different EHG + CHG combination compared to the Pre-Yamnaya one.

Supplementary Table 13. P values of rank=2 and admixture proportions in modelling Steppe ancestry populations as a three-way admixture of Eneolithic_steppe, Anatolian_Neolithic, and WHG using 14 outgroups.
Left populations: Test, Eneolithic_steppe, Anatolian_Neolithic, WHG.
Right populations: Mbuti.DG, Ust_Ishim.DG, Kostenki14, MA1, Han.DG, Papuan.DG, Onge.DG, Villabruna, Vestonice16, ElMiron, Ethiopia_4500BP.SG, Karitiana.DG, Natufian, Iran_Ganj_Dareh_Neolithic.
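The left/right setup of these tables can be made explicit. In qpAdm the “left” list starts with the target (Test) followed by the candidate sources, and the “right” list holds the outgroups; a model with n sources is evaluated at rank n − 1, which is why this three-way model is reported at rank=2 (and the four-way distal model of Supplementary Table 14 at rank=3). The helper below is a hypothetical sketch that writes both lists, one population per line, as read through the popleft and popright entries of an ADMIXTOOLS qpAdm parameter file. The file names are assumptions, and the other parameter-file entries (genotype, SNP, and individual files) are left out.

```python
import tempfile
from pathlib import Path


def write_qpadm_lists(target, sources, outgroups, folder="."):
    """Write left/right population files for a qpAdm run (hypothetical file names).

    Returns the two paths and the rank at which the full model is tested.
    """
    left_pops = [target, *sources]
    if len(set(left_pops)) != len(left_pops) or set(left_pops) & set(outgroups):
        raise ValueError("target, sources and outgroups must be distinct")
    folder = Path(folder)
    left_file = folder / "left.pops"
    right_file = folder / "right.pops"
    # Left list: the target comes first, followed by the candidate sources.
    left_file.write_text("\n".join(left_pops) + "\n")
    # Right list: the outgroups against which the model is tested.
    right_file.write_text("\n".join(outgroups) + "\n")
    rank = len(sources) - 1  # n sources are tested at rank n - 1
    return left_file, right_file, rank


# Populations as listed for Supplementary Table 13.
outgroups = ["Mbuti.DG", "Ust_Ishim.DG", "Kostenki14", "MA1", "Han.DG", "Papuan.DG",
             "Onge.DG", "Villabruna", "Vestonice16", "ElMiron", "Ethiopia_4500BP.SG",
             "Karitiana.DG", "Natufian", "Iran_Ganj_Dareh_Neolithic"]
sources = ["Eneolithic_steppe", "Anatolian_Neolithic", "WHG"]

with tempfile.TemporaryDirectory() as tmp:
    left_file, right_file, rank = write_qpadm_lists("Test", sources, outgroups, tmp)
    left_lines = left_file.read_text().splitlines()
    right_lines = right_file.read_text().splitlines()

print(rank, left_lines[0], len(right_lines))
```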

Yamnaya Kalmykia and Afanasievo show the closest fits to the Eneolithic population of the North Caucasian steppes, rejecting thus sizeable contributions from Anatolia Neolithic and/or WHG, as shown by the SD values. Both probably show then a Pre-Yamnaya ancestry closest to the late Repin population.

Modelling results for the Steppe and Caucasus cluster. Admixture proportions based on (temporally and geographically) distal and proximal models, showing additional AF ancestry in Steppe groups and additional gene flow from the south in some of the Steppe groups as well as the Caucasus groups. See tables above. Modified from Wang et al. (2019). Within a blue square, Yamnaya-related groups; within a cyan square, Corded Ware-related groups. Green background behind best p-values. In red circle, SD of AF/WHG ancestry contribution in Afanasevo and Yamnaya Kalmykia, with ranges that almost include 0%.
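Whether an SD range “almost includes 0%” can be checked mechanically: build an approximate interval of the estimate plus or minus a multiple of its reported SD (a block-jackknife standard error in qpAdm output) and see whether it reaches zero. The sketch assumes a normal approximation with a multiplier of 1.96 (about 95%); the numbers are hypothetical, not values from Wang et al. (2019).

```python
def interval(estimate, sd, z=1.96):
    """Approximate interval estimate +/- z * sd (normal approximation)."""
    return estimate - z * sd, estimate + z * sd


def consistent_with_zero(estimate, sd, z=1.96):
    """True if the approximate interval for an admixture proportion reaches 0."""
    low, _ = interval(estimate, sd, z)
    return low <= 0.0


# Hypothetical Anatolian farmer proportions and SDs.
print(interval(0.05, 0.03))              # lower end just below zero
print(consistent_with_zero(0.05, 0.03))
print(consistent_with_zero(0.20, 0.03))
```

A proportion whose interval reaches zero cannot be distinguished from no contribution at all, which is the reasoning behind reading the Afanasevo and Yamnaya Kalmykia ranges as leaving little room for Anatolian farmer or WHG input.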

EBA maps include data from Wang et al. (2018) supplementary materials, specifically unpublished Yamnaya samples from Hungary that appeared in the analyses of the preprint, but were taken out of the definitive paper. Their location among Yamnaya settlers from Hungary is speculative, although most uncovered kurgans in Hungary are concentrated in the Tisza-Danube interfluve.

Natural neighbor interpolation of Pre-Yamnaya ancestry among Early Bronze Age populations. See full map. This map corresponds roughly with the known expansion of late Repin/Yamnaya settlers.

The Y-chromosome bottleneck of elite males from Proto-Indo-European clans under R1b-L754 and some I2a2 subclades, already visible in the Khvalynsk sampling, became even more noticeable in the subsequent expansion of late Repin/early Yamnaya elites under R1b-L23 and I2a-L699:

Y-DNA haplogroups in West Eurasia during the Yamnaya expansion. See full map and maps of cultures, ADMIXTURE, Y-DNA, and mtDNA of the Early Chalcolithic and Yamnaya Hungary.

Maps of CHG, EHG, Anatolia Neolithic, and probably WHG show the expansion of these components among Corded Ware-related groups in North Eurasia, apart from other cultures close to the Caucasus:

NOTE. For maps with actual formal stats of Corded Ware ancestry from the Early Bronze Age to the modern times, you can read the post Corded Ware ancestry in North Eurasia and the Uralic expansion.

Natural neighbor interpolation of non-Pre-Yamnaya CHG ancestry among Early Bronze Age populations. See full map.
Natural neighbor interpolation of non-Pre-Yamnaya EHG ancestry among Early Bronze Age populations. See full map.
Natural neighbor interpolation of WHG ancestry among Early Bronze Age populations. See full map.
Natural neighbor interpolation of Anatolia Neolithic ancestry among Early Bronze Age populations. See full map.
Natural neighbor interpolation of Iran Chl. ancestry among Early Bronze Age populations. See full map.

4. Middle to Late Bronze Age

The following maps show the most likely distribution of Yamnaya ancestry during the Bell Beaker-, Balkan-, and Sintashta-Potapovka-related expansions.

4.1. Bell Beakers

The amount of Yamnaya ancestry is probably overestimated among populations where Bell Beakers replaced Corded Ware. A map of Yamnaya ancestry among Bell Beakers gets trickier for the following reasons:

  • Expanding Repin peoples of Pre-Yamnaya ancestry must have admixed through exogamy with late Sredni Stog/Proto-Corded Ware peoples during their expansion into the North Pontic area, and Sredni Stog in turn probably had some Pre-Yamnaya admixture, too (although it doesn’t appear in the simplistic formal stats above). This is supported by the increase of Anatolia farmer ancestry in more western Yamna samples.
  • Later, Yamnaya admixed through exogamy with Corded Ware-like populations in Central Europe during their expansion. Even samples from the Middle to Upper Danube and around the Lower Rhine will probably show increasing contributions of Steppe_MLBA, at the same time as they show an increasing proportion of EEF-related ancestry.
  • To complicate things further, the late Corded Ware Esperstedt family (from ca. 2500 BC or later) shows, in turn, what seems like a recent admixture with Yamnaya vanguard groups, with the sample of highest Yamnaya ancestry being the paternal uncle of other individuals (all of hg. R1a-M417). This suggests that there might have been many similar Central European mating networks from the mid-3rd millennium BC on, where (mainly) Yamnaya-like R1b elites displaying a small proportion of CW-like ancestry admixed through exogamy with Corded Ware-like peoples who already had some Yamnaya ancestry.
Natural neighbor interpolation of Yamnaya ancestry among Middle to Late Bronze Age populations (Esperstedt CWC site close to BK_DE, label is hidden by BK_DE_SAN). See full map. You can see how this map correlates with the map of Late Copper Age migrations and Yamnaya into Bell Beaker expansion.

NOTE. Terms like “exogamy”, “male-driven migration”, and “sex bias” are not only based on the Y-chromosome bottlenecks visible in the different cultural expansions since the Palaeolithic. Despite the scarce sampling available in 2017 for the analysis of “Steppe ancestry”-related populations, a male sex bias already appeared in Goldberg et al. (2017), and it has been confirmed for Neolithic and Copper Age population movements in Mathieson et al. (2018) – see Supplementary Table 5. The analysis of male-biased expansion of “Steppe ancestry” in CWC Esperstedt and Bell Beaker Germany is, for the reasons stated above, not very useful to distinguish their mutual influence, though.
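Why X chromosomes help detect such a bias: under a simple single-pulse admixture model (an assumption; real histories involve continuous gene flow), if a fraction s_m of the founding males and s_f of the founding females come from the incoming population, the expected incoming ancestry is (s_m + s_f)/2 on the autosomes but (s_m + 2·s_f)/3 on the X chromosome, because females carry two X copies and males one. A male-driven migration therefore leaves less incoming ancestry on the X than on the autosomes. The proportions below are hypothetical.

```python
def expected_ancestry(s_m, s_f):
    """Expected incoming ancestry on autosomes and on the X chromosome.

    s_m, s_f: fractions of founding males and females from the incoming group.
    Assumes a single admixture pulse and equal numbers of founding males and females.
    """
    autosomal = (s_m + s_f) / 2
    x_linked = (s_m + 2 * s_f) / 3  # females carry 2 of every 3 X copies
    return autosomal, x_linked


# Hypothetical male-biased pulse: 80% of founding men, 20% of founding women.
auto, x = expected_ancestry(0.8, 0.2)
print(f"autosomes: {auto:.2f}, X: {x:.2f}")
```

With no sex bias (s_m = s_f) both values coincide, so a consistently lower X value is what points to male-driven gene flow.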

Based on data from Olalde et al. (2019), Bell Beakers from Germany are the closest sampled ones to expanding East Bell Beakers, and those close to the Rhine – i.e. French, Dutch, and British Beakers in particular – show a clear excess of “Steppe ancestry” due to their exogamy with local Corded Ware groups:

Only one 2-way model fits the ancestry in Iberia_CA_Stp with P-value>0.05: Germany_Beaker + Iberia_CA. Finding a Bell Beaker-related group as a plausible source for the introduction of steppe ancestry into Iberia is consistent with the fact that some of the individuals in the Iberia_CA_Stp group were excavated in Bell Beaker associated contexts. Models with Iberia_CA and other Bell Beaker groups such as France_Beaker (P-value=7.31E-06), Netherlands_Beaker (P-value=1.03E-03) and England_Beaker (P-value=4.86E-02) failed, probably because they have slightly higher proportions of steppe ancestry than the true source population.


The exogamy with Corded Ware-like groups in the Lower Rhine Basin seems at this point undeniable, as is the origin of Bell Beakers around the Middle-Upper Danube Basin from Yamnaya Hungary.

To avoid this excess “Steppe ancestry” showing up in the maps, and since Bell Beakers from Germany pack the most Yamnaya ancestry among East Bell Beakers outside Hungary (ca. 51.1% “Steppe ancestry”), I equated this maximum with BK_Scotland_Ach (which shows ca. 61.1% “Steppe ancestry”, the highest among western Beakers), and applied a simple rule of three to “Steppe ancestry” in Dutch and British Beakers.
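The adjustment is a plain proportional rescaling: the western maximum (BK_Scotland_Ach, ca. 61.1%) is mapped onto the East Bell Beaker reference (Germany, ca. 51.1%), and every Dutch and British value is multiplied by the same factor, 51.1/61.1 ≈ 0.836. The sketch only reproduces that arithmetic; the 40% input is a hypothetical value, not a published estimate.

```python
GERMANY_BEAKER = 51.1  # % "Steppe ancestry", highest among East Bell Beakers outside Hungary
SCOTLAND_ACH = 61.1    # % "Steppe ancestry", highest among western Beakers


def rescale(steppe_pct, reference=GERMANY_BEAKER, western_max=SCOTLAND_ACH):
    """Rule of three: western_max maps to reference, other values scale proportionally."""
    return steppe_pct * reference / western_max


print(round(rescale(SCOTLAND_ACH), 1))  # the western maximum becomes the reference value
print(round(rescale(40.0), 1))          # hypothetical Dutch or British value
```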

NOTE. Formal stats for “Steppe ancestry” in Bell Beaker groups are available in Olalde et al. (2018) supplementary materials (PDF). I didn’t apply this adjustment to Bk_FR groups because of the R1b Bell Beaker sample from the Champagne/Alsace region reported by Samantha Brunel that will pack more Yamnaya ancestry than any other sampled Beaker to date, hence probably driving the Yamnaya ancestry up in French samples.

The most likely outcome in the following years, when Yamnaya and Corded Ware ancestry are investigated separately, is that Yamnaya ancestry will be much lower the farther one moves from the Middle and Lower Danube region, similar to the case in Iberia, so the map above probably overestimates this component in most Beakers to the north of the Danube. Even the late Hungarian Beaker samples, which pack the highest Yamnaya ancestry (up to 75%) among Beakers, likely represent a back-migration of Moravian Beakers, and will probably show a contribution of Corded Ware ancestry due to exogamy with local Moravian groups.
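The expected decrease can be illustrated with a toy model of repeated exogamy: if, at each step of the expansion, a fraction m of mates comes from a local population whose level of the tracked ancestry is f_local, the group’s ancestry moves as f' = (1 − m)·f + m·f_local. The rate m, the starting value, and the number of steps below are arbitrary assumptions for illustration; the model ignores sex bias, partner choice by ancestry, and back-migration.

```python
def dilute(start, local, m, steps):
    """Ancestry trajectory under repeated admixture with a local population.

    start: initial proportion of the tracked ancestry in the expanding group
    local: proportion of that ancestry in the local population
    m: fraction of mates taken from the local population at each step
    """
    if not 0.0 <= m <= 1.0:
        raise ValueError("m must be between 0 and 1")
    trajectory = [start]
    f = start
    for _ in range(steps):
        f = (1 - m) * f + m * local
        trajectory.append(f)
    return trajectory


# Hypothetical values: 70% tracked ancestry at the start, none among the locals,
# and a quarter of mates taken locally at each step.
print([round(f, 3) for f in dilute(0.7, 0.0, 0.25, 4)])
```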

Despite this decreasing admixture as Bell Beakers spread westward, the explosive expansion of Yamnaya R1b male lineages (in the words of David Reich) and the radical replacement of local ones – whether derived from Corded Ware or Neolithic groups – shows the true extent of the North-West Indo-European expansion in Europe:

Y-DNA haplogroups in West Eurasia during the Bell Beaker expansion. See full map and see maps of cultures, ADMIXTURE, Y-DNA, and mtDNA of the Late Copper Age and of the Yamnaya-Bell Beaker transition.

4.2. Palaeo-Balkan

There are still scarce data on Palaeo-Balkan movements, although it is known that:

  1. Yamnaya ancestry appears among Mycenaeans, with the Yamnaya Bulgaria sample being its best current ancestral fit;
  2. the emergence of steppe ancestry and R1b-M269 in the eastern Mediterranean was associated with Ancient Greeks;
  3. Thracians, Albanians, and Armenians also show R1b-M269 subclades and “Steppe ancestry”.

4.3. Sintashta-Potapovka-Filatovka

Interestingly, Potapovka is the only Corded Ware-derived culture that shows good fits for Yamnaya ancestry, despite having replaced Poltavka in the region under the same Corded Ware-like (Abashevo) influence as Sintashta.

This proves that there was a period of admixture in the Pre-Proto-Indo-Iranian community between CWC-like Abashevo and Yamnaya-like Catacomb-Poltavka herders in the Sintashta-Potapovka-Filatovka community, probably more easily detectable in this group because of the specific temporal and geographic sampling available.

Supplementary Table 14. P values of rank=3 and admixture proportions in modelling Steppe ancestry populations as a four-way admixture of distal sources EHG, CHG, Anatolian_Neolithic and WHG using 14 outgroups.
Left populations: Steppe cluster, EHG, CHG, WHG, Anatolian_Neolithic
Right populations: Mbuti.DG, Ust_Ishim.DG, Kostenki14, MA1, Han.DG, Papuan.DG, Onge.DG, Villabruna, Vestonice16, ElMiron, Ethiopia_4500BP.SG, Karitiana.DG, Natufian, Iran_Ganj_Dareh_Neolithic.
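A note on the “rank=3” in this table: with one target and four sources on the left, a model where the target is a four-way mixture of the sources predicts that the matrix of statistics f4(Target, Source_i; R0, R_j) has rank 3 (number of sources minus one), because the four rows cancel out when weighted by the admixture proportions. The following is only a toy sketch in Python, not qpAdm: allele frequencies are random, the mixture weights are hypothetical, the outgroup names from the caption are used merely as labels, and there is no block jackknife or p-value, just the linear-algebra point behind the rank.

```python
import random

RIGHT = ["Mbuti.DG", "Ust_Ishim.DG", "Kostenki14", "MA1", "Han.DG", "Papuan.DG",
         "Onge.DG", "Villabruna", "Vestonice16", "ElMiron", "Ethiopia_4500BP.SG",
         "Karitiana.DG", "Natufian", "Iran_Ganj_Dareh_Neolithic"]
SOURCES = ["EHG", "CHG", "WHG", "Anatolian_Neolithic"]
N_SNPS = 2000

def f4(a, b, c, d):
    """f4(A, B; C, D): mean over SNPs of (a - b) * (c - d)."""
    return sum((w - x) * (y - z) for w, x, y, z in zip(a, b, c, d)) / len(a)

def f4_matrix(target, sources, right):
    """Rows: f4(Target, Source_i; R0, R_j) for every right population R_j after R0."""
    r0, rest = right[0], right[1:]
    return [[f4(target, s, r0, rj) for rj in rest] for s in sources]

def matrix_rank(rows, rel_tol=1e-9):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    m = [list(row) for row in rows]
    scale = max(abs(v) for row in m for v in row) or 1.0
    rank = 0
    for col in range(len(m[0])):
        if rank == len(m):
            break
        pivot = max(range(rank, len(m)), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) <= rel_tol * scale:
            continue  # no usable pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [v - factor * p for v, p in zip(m[r], m[rank])]
        rank += 1
    return rank

# Toy data: random frequencies for every population (not real genotypes)
random.seed(0)
freqs = {name: [random.random() for _ in range(N_SNPS)] for name in SOURCES + RIGHT}
weights = {"EHG": 0.45, "CHG": 0.40, "WHG": 0.05, "Anatolian_Neolithic": 0.10}  # hypothetical
target = [sum(weights[s] * freqs[s][i] for s in SOURCES) for i in range(N_SNPS)]
sources = [freqs[s] for s in SOURCES]
right = [freqs[r] for r in RIGHT]

# An exact four-way mixture gives rank 3; extra target-specific noise breaks it
rank_exact = matrix_rank(f4_matrix(target, sources, right))
noisy = [min(1.0, max(0.0, t + random.gauss(0.0, 0.05))) for t in target]
rank_noisy = matrix_rank(f4_matrix(noisy, sources, right))
print(rank_exact, rank_noisy)  # 3 4
```

In real data the matrix is never exactly rank 3, which is why the table reports p-values: they measure whether the observed deviations from rank 3 are small enough to be explained by sampling noise.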

Srubnaya shows its best fit with non-Pre-Yamnaya ancestry, i.e. with different CHG + EHG components – possibly because the more western Potapovka (ancestral to Proto-Srubnaya Pokrovka) also showed good fits for it. Srubnaya shows poor fits for Pre-Yamnaya ancestry, probably because Corded Ware-like (Abashevo) genetic influence increased during its formation.

On the other hand, more eastern Corded Ware-derived groups like Sintashta and its more direct offshoot Andronovo show poor fits with this model, too, but their fits are still better than those including Pre-Yamnaya ancestry.

Natural neighbor interpolation of non-Pre-Yamnaya EHG ancestry among Middle to Late Bronze Age populations. See full map.
Natural neighbor interpolation of non-Pre-Yamnaya CHG ancestry among Middle to Late Bronze Age populations. See full map.
Natural neighbor interpolation of Anatolia Neolithic ancestry among Middle to Late Bronze Age populations. See full map.
Natural neighbor interpolation of Iran Chl. ancestry among Middle to Late Bronze Age populations. See full map.

NOTE. For maps with actual formal stats of Corded Ware ancestry from the Early Bronze Age to modern times, you should read the post Corded Ware ancestry in North Eurasia and the Uralic expansion instead.

The bottleneck of Proto-Indo-Iranians under R1a-Z93 was not yet complete by the time the Sintashta-Potapovka-Filatovka community expanded with the Srubna-Andronovo horizon:

Y-DNA haplogroups in West Eurasia during the European Early Bronze Age. See full map and see maps of cultures, ADMIXTURE, Y-DNA, and mtDNA of the Early Bronze Age.

4.4. Afanasevo

At the end of the Afanasevo culture, at least three samples show hg. Q1b (ca. 2900-2500 BC), which seemed to point to a resurgence of local lineages, despite continuity of the prototypical Pre-Yamnaya ancestry. On the other hand, Anthony (2019) makes this cryptic statement:

Yamnaya men were almost exclusively R1b, and pre-Yamnaya Eneolithic Volga-Caspian-Caucasus steppe men were principally R1b, with a significant Q1a minority.

Since the only available samples from the Khvalynsk community are R1b (x3), Q1a (x1), and R1a (x1), it seems strange that Anthony would talk about a “significant minority”, unless Q1a (potentially Q1b in the newer nomenclature) turns up in more of the ca. 30 new individuals to be published. Because he also mentions I2a2 as appearing in one elite burial, it seems Q1a (like R1a-M459) will not appear under elite kurgans, although it is still possible that hg. Q1a was involved in the expansion of Afanasevo to the east.

Y-DNA haplogroups in West Eurasia during the Middle Bronze Age. See full map and see maps of cultures, ADMIXTURE, Y-DNA, and mtDNA of the Middle Bronze Age and the Late Bronze Age.

Okunevo, which replaced Afanasevo in the Altai region, shows a majority of hg. Q1b, but also some R1b-M269 samples typical of Afanasevo, suggesting partial genetic continuity.

NOTE. Other sampled Siberian populations clearly show a variety of Q subclades that likely expanded during the Palaeolithic, such as Baikal EBA samples from Ust’Ida and Shamanka with a majority of Q1b, and hg. Q reported from Elunino, Sagsai, Khövsgöl, and also among peoples of the Srubna-Andronovo horizon (the Krasnoyarsk MLBA outlier), and in Karasuk.

From Damgaard et al. Science (2018):

(…) in contrast to the lack of identifiable admixture from Yamnaya and Afanasievo in the CentralSteppe_EMBA, there is an admixture signal of 10 to 20% Yamnaya and Afanasievo in the Okunevo_EMBA samples, consistent with evidence of western steppe influence. This signal is not seen on the X chromosome (qpAdm P value for admixture on X 0.33 compared to 0.02 for autosomes), suggesting a male-derived admixture, also consistent with the fact that 1 of 10 Okunevo_EMBA males carries a R1b1a2a2 Y chromosome related to those found in western pastoralists. In contrast, there is no evidence of western steppe admixture among the more eastern Baikal region Bronze Age (~2200 to 1800 BCE) samples.
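The X-chromosome argument in this quote rests on a simple expectation: females carry two X chromosomes and males one, so in the long run two thirds of a population’s X chromosomes trace back through females. A minimal sketch, assuming a single admixture pulse, random mating afterwards, no selection and no further gene flow; s_m and s_f are hypothetical fractions of fathers and mothers coming from the source population:

```python
from fractions import Fraction

def expected_ancestry(s_m, s_f):
    """Long-run expected (autosomal, X) ancestry after one admixture pulse.

    s_m, s_f: fractions of fathers and mothers drawn from the source population.
    Assumes random mating afterwards, no selection and no further gene flow.
    """
    autosomal = (s_m + s_f) / 2
    x_chromosome = (s_m + 2 * s_f) / 3  # two thirds of X copies are carried by females
    return autosomal, x_chromosome

def x_over_generations(female, male, generations):
    """Iterate X-chromosome ancestry in females and males generation by generation."""
    for _ in range(generations):
        # sons get their X from their mother; daughters get one X from each parent
        female, male = (female + male) / 2, female
    return female, male

# Hypothetical male-only pulse: 30% of fathers from the source, no mothers
print(expected_ancestry(Fraction(3, 10), Fraction(0)))  # (Fraction(3, 20), Fraction(1, 10))
print(x_over_generations(0.0, 0.3, 40))                 # both values close to 0.1
```

Under male-only admixture the X-chromosomal share is two thirds of the autosomal one, so a 10-20% autosomal signal would correspond to only about 7-13% on the X: a male-mediated pulse leaves a weaker, though not absent, X signal.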

This Yamnaya ancestry has also recently been found to be the best fit for the Iron Age population of Shirenzigou in Xinjiang – where Tocharian languages were attested centuries later – despite the haplogroup diversity acquired during their evolution, likely through an intermediate Chemurchek culture (see a recent discussion on the elusive Proto-Tocharians).

Haplogroup diversity seems to be common in Iron Age populations all over Eurasia, most likely due to the spread of different types of sociopolitical structures where alliances played a more relevant role in the expansion of peoples. A well-known example of this is the spread of Akozino warrior-traders throughout the Baltic region under a partial N1a-VL29 bottleneck, associated with the emerging chiefdom-based systems under the influence of expanding steppe nomads.

Y-DNA haplogroups in West Eurasia during the Early Iron Age. See full map and see maps of cultures, ADMIXTURE, Y-DNA, and mtDNA of the Early Iron Age and Late Iron Age.

Surprisingly, then, Proto-Tocharians from Shirenzigou pack up to 74% Yamnaya ancestry, in spite of the 2,000 years that separate them from the demise of the Afanasevo culture. They show more Yamnaya ancestry than any other population of that time, which makes them a sort of Late PIE fossil not only in their archaic dialect, but also in their genetic profile:


The recent intrusion of Corded Ware-like ancestry, as well as the variable admixture with Siberian and East Asian populations, both point to the known intense Old Iranian and Old/Middle Chinese contacts. The scarce Proto-Samoyedic and Proto-Turkic loans in Tocharian suggest a rather loose, probably more distant connection with East Uralic and Altaic peoples from the forest-steppe and steppe areas to the north (read more about external influences on Tocharian).

Interestingly, both R1b samples, MO12 and M15-2 – likely of the Asian R1b-PH155 branch – show a best fit for Andronovo/Srubna + Hezhen/Ulchi ancestry, suggesting a likely connection with Iranians to the east of Xinjiang, who later expanded as the Wusun and Kangju. How they might have been related to Huns and Xiongnu individuals, who also show this haplogroup, is as yet unknown, although Huns also show hg. R1a-Z93 (probably mostly R1a-Z2124) and Steppe_MLBA ancestry, earlier associated with expanding Iranian peoples of the Srubna-Andronovo horizon.

All in all, it seems that prehistoric movements explained through the lens of genetic research fit perfectly well the linguistic reconstruction of Proto-Indo-European and Proto-Uralic.


Volga Basin R1b-rich Proto-Indo-Europeans of (Pre-)Yamnaya ancestry


New paper (behind paywall) by David Anthony, Archaeology, Genetics, and Language in the Steppes: A Comment on Bomhard, which favourably complements Bomhard’s Caucasian substrate hypothesis in the current issue of the JIES.

NOTE. I have been trying to access this issue for some days, but it is not yet indexed in my university library’s online service (ProQuest). This particular paper is on Academia.edu, though, and Bomhard’s papers on this issue are available on his site.

Interesting excerpts (emphasis mine):

Along the banks of the lower Volga many excavated hunting-fishing camp sites are dated 6200-4500 BC. They could be the source of CHG ancestry in the steppes. At about 6200 BC, when these camps were first established at Kair Shak III and Varfolomievka (42 and 28 on Figure 2), they hunted primarily saiga antelope around Dzhangar, south of the lower Volga, and almost exclusively onagers in the drier desert-steppes at Kair-Shak, north of the lower Volga. Farther north at the lower/middle Volga ecotone, at sites such as Varfolomievka and Oroshaemoe hunter-fishers who made pottery similar to that at Kair-Shak hunted onagers and saiga antelope in the desert-steppe, horses in the steppe, and aurochs in the riverine forests. Finally, in the Volga steppes north of Saratov and near Samara, hunter-fishers who made a different kind of pottery (Samara type) and hunted wild horses and red deer definitely were EHG. A Samara hunter-gatherer of this era buried at Lebyazhinka IV, dated 5600-5500 BC, was one of the first named examples of the EHG genetic type (Haak et al. 2015). This individual, like others from the same region, had no or very little CHG ancestry. The CHG mating network had not yet reached Samara by 5500 BC.

Eneolithic settlements (1–5, 7, 10–16, 20, 22–43, 48, 50), burial grounds (6, 8–9, 17–19, 21, 47, 49) and kurgans (44–46) of the steppe Ural-Volga region: 1 Ivanovka; 2 Turganik; 3 Kuzminki; 4 Mullino; 5 Davlekanovo; 6 Sjezheye (burial ground); 7 Vilovatoe; 8 Ivanovka; 9 Krivoluchye; 10–13 Lebjazhinka I-III-IV-V; 14 Gundorovka; 15–16 Bol. Rakovka I-II; 17–18 Khvalunsk I-II; 19 Lipoviy Ovrag; 20 Alekseevka; 21 Khlopkovskiy; 22 Kuznetsovo I; 23 Ozinki II; 24 Altata; 25 Monakhov I; 26 Oroshaemoe; 27 Rezvoe; 28 Varpholomeevka; 29 Vetelki; 30 Pshenichnoe; 31 Kumuska; 32 Inyasovo; 33 Shapkino VI; 34 Russkoe Truevo I; 35 Tsaritsa I-II; 36 Kamenka I; 37 Kurpezhe-Molla; 38 Istay; 39 Isekiy; 40 Koshalak; 41 Kara-Khuduk; 42 Kair-Shak VI; 43 Kombakte; 44 Berezhnovka I-II; 45 Rovnoe; 46 Politotdelskoe; 47 burial near s. Pushkino; 48 Elshanka; 49 Novoorsk; 50 Khutor Repin. Modified from Morgunova (2014).

But before 4500 BC, CHG ancestry appeared among the EHG hunter-fishers in the middle Volga steppes from Samara to Saratov, at the same time that domesticated cattle and sheep-goats appeared. The Reich lab now has whole-genome aDNA data from more than 30 individuals from three Eneolithic cemeteries in the Volga steppes between the cities of Saratov and Samara (Khlopkov Bugor, Khvalynsk, and Ekaterinovka), all dated around the middle of the fifth millennium BC. Many dates from human bone are older, even before 5000 BC, but they are affected by strong reservoir effects, derived from a diet rich in fish, making them appear too old (Shishlina et al 2009), so the dates I use here accord with published and unpublished dates from a few dated animal bones (not fish-eaters) in graves.

Only three individuals from Khvalynsk are published, and they were first published in a report that did not mention the site in the text (Mathieson et al. 2015), so they went largely unnoticed. Nevertheless, they are crucial for understanding the evolution of the Yamnaya mating network in the steppes. They were mentioned briefly in Damgaard et al (2018) but were not graphed. They were re-analyzed and their admixture components were illustrated in a bar graph in Wang et al (2018: figure 2c), but they are not the principal focus of any published study. All of the authors who examined them agreed that these three Khvalynsk individuals, dated about 4500 BC, showed EHG ancestry admixed substantially with CHG, and not a trace of Anatolian Farmer ancestry, so the CHG was a Hotu-Cave or Kotias-Cave type of un-admixed CHG. The proportion of CHG in the Wang et al. (2018) bar graphs is about 20-30% in two individuals, substantially less CHG than in Yamnaya; but the third Khvalynsk individual had more than 50% CHG, like Yamnaya. The ca. 30 additional unpublished individuals from three middle Volga Eneolithic cemeteries, including Khvalynsk, preliminarily show the same admixed EHG/CHG ancestry in varying proportions. Most of the males belonged to Y-chromosome haplogroup R1b1a, like almost all Yamnaya males, but Khvalynsk also had some minority Y-chromosome haplogroups (R1a, Q1a, J, I2a2) that do not appear or appear only rarely (I2a2) in Yamnaya graves.

Pontic-Caspian steppe and neighbouring groups in the Neolithic. See full map.

Wang et al. (2018) discovered that this middle Volga mating network extended down to the North Caucasian steppes, where at cemeteries such as Progress-2 and Vonyuchka, dated 4300 BC, the same Khvalynsk-type ancestry appeared, an admixture of CHG and EHG with no Anatolian Farmer ancestry, with steppe-derived Y-chromosome haplogroup R1b. These three individuals in the North Caucasus steppes had higher proportions of CHG, overlapping Yamnaya. Without any doubt, a CHG population that was not admixed with Anatolian Farmers mated with EHG populations in the Volga steppes and in the North Caucasus steppes before 4500 BC. We can refer to this admixture as pre-Yamnaya, because it makes the best currently known genetic ancestor for EHG/CHG R1b Yamnaya genomes. The Progress-2 individuals from North Caucasus steppe graves lived not far from the pre-Maikop farmers of the Belaya valley, but they did not exchange mates, according to their DNA.

The hunter-fisher camps that first appeared on the lower Volga around 6200 BC could represent the migration northward of un-admixed CHG hunter-fishers from the steppe parts of the southeastern Caucasus, a speculation that awaits confirmation from aDNA. After 5000 BC domesticated animals appeared in these same sites in the lower Volga, and in new ones, and in grave sacrifices at Khvalynsk and Ekaterinovka. CHG genes and domesticated animals flowed north up the Volga, and EHG genes flowed south into the North Caucasus steppes, and the two components became admixed. After approximately 4500 BC the Khvalynsk archaeological culture united the lower and middle Volga archaeological sites into one variable archaeological culture that kept domesticated sheep, goats, and cattle (and possibly horses). In my estimation, Khvalynsk might represent the oldest phase of PIE.

Pontic-Caspian steppe and neighbouring groups in the Early Eneolithic. See full map.

Anatolian Farmer ancestry and Yamnaya origins

The Eneolithic Volga-North Caucasus mating network (Khvalynsk/Progress-2 type) exhibited EHG/CHG admixtures and Y-chromosome haplogroups similar to Yamnaya, but without Yamnaya’s additional Anatolian Farmer ancestry. (…)

Like the Mesolithic and Neolithic populations here, the Eneolithic populations of Dnieper-Donets II type seem to have limited their mating network to the rich, strategic region they occupied, centered on the Rapids. The absence of CHG shows that they did not mate frequently if at all with the people of the Volga steppes, a surprising but undeniable discovery. Archaeologists have seen connections in ornament types and in some details of funeral ritual between Dnieper-Donets cemeteries of the Mariupol-Nikol’skoe type and cemeteries in the middle Volga steppes such as Khvalynsk and S’yez’zhe (Vasiliev 1981:122-123). Also their cranio-facial types were judged to be similar (Bogdanov and Khokhlov 2012:212). So it is surprising that their aDNA does not indicate any genetic admixture with Khvalynsk or Progress-2. Also, neither they nor the Volga steppe Eneolithic populations showed any Anatolian Farmer ancestry. (…)

All three of the steppe-admixed exceptions were from the Varna region (Mathieson et al. 2018). One of them was the famous “golden man” at Varna (Krause et al. 2016), Grave 43, whose steppe ancestry was the most doubtful of the three. If he had steppe ancestry, it was sufficiently distant (five+ generations before him) that he was not a statistically significant outlier, but he was displaced in the steppe direction, away from the central values of the majority of typical Anatolian Farmers at Varna and elsewhere. The other two, at Varna (grave 158, a 5-7-year-old girl) and Smyadovo (grave 29, a male 20-25 years old), were statistically significant outliers who had recent steppe ancestry (consistent with grandparents or great-grandparents) of the EHG/CHG Khvalynsk/Progress-2 type, not of the Dnieper Rapids EHG/WHG type.

(…) I believe that the Suvorovo-Cernavoda I movement into the lower Danube valley and the Balkans about 4300 BC separated early PIE-speakers (pre-Anatolian) from the steppe population that stayed behind in the steppes and that later developed into late PIE and Yamnaya.

This archaeological transition marked the breakdown of the mating barrier between steppe and Anatolian Farmer mating networks. After this 4300-4200 BC event, Anatolian Farmer ancestry began to pop up in the steppes. The currently oldest sample with Anatolian Farmer ancestry in the steppes is an individual at Aleksandriya, a Sredni Stog cemetery on the Donets in eastern Ukraine. Sredni Stog has often been discussed as a possible Yamnaya ancestor in Ukraine (Anthony 2007: 239- 254). The single published grave is dated about 4000 BC (4045– 3974 calBC/ 5215±20 BP/ PSUAMS-2832) and shows 20% Anatolian Farmer ancestry and 80% Khvalynsk-type steppe ancestry (CHG&EHG). His Y-chromosome haplogroup was R1a-Z93, similar to the later Sintashta culture and to South Asian Indo-Aryans, and he is the earliest known sample to show the genetic adaptation to lactase persistence (I3910-T). Another pre-Yamnaya grave with Anatolian Farmer ancestry was analyzed from the Dnieper valley at Dereivka, dated 3600-3400 BC (grave 73, 3634–3377 calBC/ 4725±25 BP/ UCIAMS-186349). She also had 20% Anatolian Farmer ancestry, but she showed less CHG than Aleksandriya and more Dereivka-1 ancestry, not surprising for a Dnieper valley sample, but also showing that the old fifth-millennium-type EHG/WHG Dnieper ancestry survived into the fourth millennium BC in the Dnieper valley (Mathieson et al. 2018).

Pontic-Caspian steppe and neighbouring groups in the Late Eneolithic. See full map.

Probably, late PIE (Yamnaya) evolved in the same part of the steppes—the Volga-Caucasus steppes between the lower Don, the lower and middle Volga, and the North Caucasus piedmont—where early PIE evolved, and where appropriate EHG/CHG admixtures and Y-chromosome haplogroups were seen already in the Eneolithic (without Anatolian Farmer). There have always been archaeologists who argued for an origin of Yamnaya in the Volga steppes, including Gimbutas (1963), Merpert (1974), and recently Morgunova (2014), who argued that this was where Repin-type ceramics, an important early Yamnaya pottery type, first appeared in dated contexts before Yamnaya, about 3600 BC. The genetic evidence is consistent with Yamnaya EHG/CHG origins in the Volga-Caucasus steppes. Also, if contact with the Maikop culture was a fundamental cause of the innovations in transport and metallurgy that defined the Yamnaya culture, then the lower Don-North Caucasus-lower Volga steppes, closest to the North Caucasus, would be where the earliest phase is expected.

I would still guess that the Darkveti-Meshoko culture and its descendant Maikop culture established the linguistic ancestor of the Northwest Caucasian languages in approximately the region where they remained. I also accept the general consensus that the appearance of the hierarchical Maikop culture about 3600 BC had profound effects on pre-Yamnaya and early Yamnaya steppe cultures. Yamnaya metallurgy borrowed from the Maikop culture two-sided molds, tanged daggers, cast shaft hole axes with a single blade, and arsenical copper. Wheeled vehicles might have entered the steppes through Maikop, revolutionizing steppe economies and making Yamnaya pastoral nomadism possible after 3300 BC.
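Regarding the Varna and Smyadovo outliers in Anthony’s excerpt above (“recent steppe ancestry (consistent with grandparents or great-grandparents)” versus an ancestor “five+ generations before him”), the difference comes down to how fast a single ancestor’s contribution is diluted. A minimal sketch of the expected (average) genome share inherited from ancestors a given number of generations back; the function name is illustrative, and the actual share in any individual scatters around this expectation because of recombination:

```python
def expected_share(generations_back: int, n_ancestors: int = 1) -> float:
    """Expected genome fraction inherited from n_ancestors who all lived
    generations_back generations ago (1 = parents, 2 = grandparents, ...)."""
    if generations_back < 1:
        raise ValueError("generations_back must be >= 1")
    if not 0 <= n_ancestors <= 2 ** generations_back:
        raise ValueError("there are only 2**generations_back ancestors in that generation")
    return n_ancestors / 2 ** generations_back

labels = {2: "one grandparent", 3: "one great-grandparent", 5: "one ancestor five generations back"}
for g, label in labels.items():
    print(f"{label}: {expected_share(g):.1%}")
# one grandparent: 25.0%
# one great-grandparent: 12.5%
# one ancestor five generations back: 3.1%
```

A steppe grandparent therefore leaves an expected signal eight times larger than a steppe ancestor five generations back, which fits the contrast between the two significant outliers and the “golden man”.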

For those who still hoped that Proto-Indo-Europeans of Yamnaya/Afanasievo ancestry from the Don-Volga region were associated with the expansion of hg. R1a-M417, in a sort of mythical “R1-rich” Indo-European society, it seems this is going to be yet another prediction based on ancestry magic that goes wrong.

Proto-Indo-Europeans were, however, associated with other subclades beyond R1b-M269, probably (as I wrote recently) R1b-V1636, I2a-L699, Q1a-M25, and R1a-YP1272, but also interestingly some J subclade, so let’s see what surprises the new study on Khvalynsk and Yamnaya settlers from the Carpathian Basin brings…

On the bright side, it is indirectly confirmed that late Sredni Stog formed part of the neighbouring Corded Ware-like populations of ca. 20-30%+ Anatolian farmer ancestry that gave Yamnaya its share (ca. 6-10%), relative to the comparatively unmixed Khvalynsk and late Repin population (as shown by Afanasevo).
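The arithmetic behind this statement can be made explicit under a deliberately crude assumption: a single two-way pulse in which the Khvalynsk/Repin-like side carries no Anatolian farmer ancestry at all. The fraction of Yamnaya ancestry coming from the Corded Ware-like neighbours is then simply the ratio of the two Anatolian farmer shares. The function and the two-source assumption are purely illustrative; real estimates would come from formal admixture models.

```python
from fractions import Fraction

def source_fraction(target_af, source_af, other_af=Fraction(0)):
    """Solve target = m * source + (1 - m) * other for m (single two-way pulse)."""
    if source_af == other_af:
        raise ValueError("the two sources must differ in Anatolian farmer ancestry")
    return (target_af - other_af) / (source_af - other_af)

yamnaya = (Fraction(6, 100), Fraction(10, 100))      # ca. 6-10% in Yamnaya
neighbours = (Fraction(20, 100), Fraction(30, 100))  # ca. 20-30% in the neighbours
shares = [source_fraction(t, s) for t in yamnaya for s in neighbours]
print(min(shares), max(shares))  # 1/5 1/2
```

Under these assumptions, roughly one fifth to one half of Yamnaya ancestry would trace back to such neighbours; a neighbour share above 30% (the “+” in “20-30%+”) would lower the bound, and any Anatolian farmer ancestry on the Repin side (`other_af`) would lower both.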

In this steppe mating network that opened up after the Khvalynsk expansion, the increasing admixture of Anatolian farmer-related ancestry in Yamnaya from east (ca. 2-10%) to west (ca. 6-15%) points to an exogamy of late Repin males in their western/south-western regions with populations around the Don River basin and beyond (and endogamy within the Yamnaya community), in an evolution relevant for language expansions and language contacts during the Late Eneolithic.
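Percentages like these, or the 20% Anatolian Farmer / 80% Khvalynsk-type split quoted above for Aleksandriya, are estimates from formal admixture models. To show only the basic idea of a two-way mixture, here is a minimal least-squares sketch on simulated allele frequencies (all data hypothetical). Unlike qpAdm, it ignores population-specific drift and uses no outgroups, so it is not how the published figures were obtained.

```python
import random

def two_way_alpha(target, source_a, source_b):
    """Least-squares weight of source A in target ~ alpha * A + (1 - alpha) * B.

    Minimises sum((t - b) - alpha * (a - b))**2 over SNPs; the result is
    clipped to [0, 1] so it can be read as a proportion.
    """
    num = sum((t - b) * (a - b) for t, a, b in zip(target, source_a, source_b))
    den = sum((a - b) ** 2 for a, b in zip(source_a, source_b))
    if den == 0:
        raise ValueError("sources are identical; the proportion is not identifiable")
    return min(1.0, max(0.0, num / den))

random.seed(42)
n_snps = 5000
anatolian = [random.random() for _ in range(n_snps)]  # hypothetical Anatolian Farmer-like source
steppe = [random.random() for _ in range(n_snps)]     # hypothetical Khvalynsk-type source
true_alpha = 0.20
sample = [min(1.0, max(0.0, true_alpha * a + (1 - true_alpha) * s + random.gauss(0.0, 0.02)))
          for a, s in zip(anatolian, steppe)]
alpha_hat = two_way_alpha(sample, anatolian, steppe)
print(f"Anatolian Farmer ~ {alpha_hat:.2f}, steppe ~ {1 - alpha_hat:.2f}")
```

With real genomes, each population also drifts on its own after the admixture, which biases this naive regression; formal methods work with f-statistics relative to outgroups precisely to cancel that drift.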

NOTE. “Mating network” is my new preferred term for “ancestry”. Also great to see scholars finally talk about “Pre-Yamnaya” ancestry, which – combined with the distinction of Yamnaya from Corded Ware ancestry – will no doubt help differentiate fine-scale population movements of steppe- and forest-steppe-related populations.

Modified from Rassamakin (1999), adding red color to the Repin expansion. The system of the latest Eneolithic Pontic cultures and the sites of the Zhivotilovo-Volchanskoe type: 1) Volchanskoe; 2) Zhivotilovka; 3) Vishnevatoe; 4) Koisug.

The whole issue of the JIES is centered on Caucasian influences on Early PIE as an Indo-Uralic dialect, and this language contact/substrate is useful to locate the most likely candidates for the Northeast and Northwest Caucasian and the Proto-Indo-European homelands.

On the other hand, it would also be interesting to read a discussion of how this Volga homeland of Middle PIE and Don-Volga-Ural homeland of Late PIE would be reconciled with the known continuous contacts of Uralic with Middle and Late PIE (see here) to locate the most likely Proto-Uralic homeland.

Especially because Corded Ware fully replaced all sub-Neolithic groups to the north and east of Khvalynsk/Yamnaya, like Volosovo, so no other population neighbouring Middle and Late Proto-Indo-Europeans survived into the Bronze Age…

EDIT: For those new to this blog, this information on unpublished samples from the Volga River basin is yet another confirmation of Khokhlov’s report on the R1b-L23 samples from Yekaterinovka (itself confirmed by a co-author of The unique elite Khvalynsk male from a Yekaterinovskiy Cape burial), and it further supports the newest data placing Yekaterinovka culturally and probably chronologically between Samara and Khvalynsk.


Villabruna cluster in Late Epigravettian Sicily supports South Italian corridor for R1b-V88


New preprint Late Upper Palaeolithic hunter-gatherers in the Central Mediterranean: new archaeological and genetic data from the Late Epigravettian burial Oriente C (Favignana, Sicily), by Catalano et al. bioRxiv (2019).

Interesting excerpts (emphasis mine):

Grotta d’Oriente is a small coastal cave located on the island of Favignana, the largest (~20 km²) of a group of small islands forming the Egadi Archipelago, ~5 km from the NW coast of Sicily.

The Oriente C funeral pit opens in the lower portion of layer 7, specifically sublayer 7D. Two radiocarbon dates on charcoal from sublayers 7D (12149±65 uncal. BP) and 7E (12132±80 uncal. BP) are consistent with the associated Late Epigravettian lithic assemblages (Lo Vetro and Martini, 2012; Martini et al., 2012b) and refer the burial to a period between about 14200-13800 cal. BP, when Favignana was connected to the main island (Agnesi et al., 1993; Antonioli et al., 2002; Mannino et al. 2014).

A-B) Geographic location of Grotta d’Oriente.

The anatomical features of Oriente C are close to those of Late Upper Palaeolithic populations of the Mediterranean and show strong affinity with other Palaeolithic individuals of Sicily. As suggested by Henke (1989) and Fabbri (1995) the hunter-gatherer populations were morphologically rather uniform.

Genetic analysis

We confirmed the originally reported mitochondrial haplogroup assignment of U2’3’4’7’8’9. This haplogroup is present in both pre- and post-LGM populations, but is rare by the Mesolithic, when U5 dominates (Posth et al. 2016).

Lipson et al. (2018) (their supplementary Figure S5.1) and Villalba-Mouco et al. (2019) (their Figure 2A) showed that European Late Palaeolithic and Mesolithic hunter-gatherers fall along two main axes of genetic variation. Multidimensional scaling (MDS) of f3-statistics shows that these axes form a “V” shape (Fig. 3). (…)

Focusing further on Oriente C, we find that it shares most drift with individuals from Northern Italy, Switzerland and Luxembourg, and less with individuals from Iberia, Scandinavia, and East and Southeast Europe (Fig. 4A-B). Shared drift decreases significantly with distance (Fig. 4C) and with time (Fig. 4D) although in a linear model of drift with distance and time as a covariate, only distance (p=1.3×10⁻⁶) and not time (p=0.11) is significant. Consistent with the overall E-W cline in hunter-gatherer ancestry, genetic distance to Oriente C increases more rapidly with longitude than latitude, although this may also be affected by geographic features. For example, Oriente C shares significantly more drift with the 8,000-year-old individual from Loschbour in Luxembourg, 1,400 km away (Lazaridis et al., 2014), than with the 9,000-year-old individual from Vela Spila in Croatia (Mathieson et al., 2018), only 700 km away, as shown by the D-statistic (Patterson et al., 2012) D (Mbuti, Oriente C, Vela Spila, Villabruna); Z=3.42. Oriente C’s heterozygosity was slightly lower than Villabruna (14% lower at 1240k transversion sites), but this difference is not significant (bootstrap P=0.12).

Multidimensional scaling of outgroup f3-statistics for Late Upper Palaeolithic and Mesolithic hunter-gatherers.
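The “shared drift” in these excerpts is measured with outgroup f3-statistics, f3(O; A, B), the average over SNPs of (o - a)(o - b): the longer the history A and B share after splitting from the outgroup O, the larger the value. A minimal sketch on a crudely simulated tree (Gaussian drift on allele frequencies; hypothetical populations A, B and C, where A and B share an extra branch); the MDS step that turns a matrix of such values into the “V” plot is not reproduced here.

```python
import random
from itertools import combinations

def outgroup_f3(o, a, b):
    """f3(O; A, B): mean over SNPs of (o - a) * (o - b)."""
    return sum((x - y) * (x - z) for x, y, z in zip(o, a, b)) / len(o)

def drift(freqs, sd):
    """Crude drift model: Gaussian noise on each frequency, clipped to [0, 1]."""
    return [min(1.0, max(0.0, f + random.gauss(0.0, sd))) for f in freqs]

random.seed(7)
root = [random.uniform(0.2, 0.8) for _ in range(10_000)]
outgroup = drift(root, 0.10)  # plays the role of an African outgroup such as Mbuti
stem = drift(root, 0.05)      # hypothetical shared hunter-gatherer stem
ab = drift(stem, 0.08)        # extra branch shared by A and B only
pops = {"A": drift(ab, 0.05), "B": drift(ab, 0.05), "C": drift(stem, 0.05)}

scores = {(p, q): outgroup_f3(outgroup, pops[p], pops[q]) for p, q in combinations(pops, 2)}
for (p, q), value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"f3(O; {p}, {q}) = {value:.4f}")
```

The pair A, B comes out on top because the expected value of f3(O; A, B) adds up the drift on every branch that the paths from O to A and from O to B have in common, including their extra shared branch.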

Discussion and Conclusion

The robust record of radiocarbon dates proves that they reached Sicily not before 15-14 ka cal. BP, several millennia after the LGM peak. In our opinion, in fact, the hypothesis about an early colonization of Sicily by Aurignacians (Laplace, 1964; Chilardi et al., 1996) must be rejected, on the basis of a recent reinterpretation of the techno-typological features of the lithic industries from Riparo di Fontana Nuova (Martini et al., 2007; Lo Vetro and Martini, 2012; on this topic see also Di Maida et al., 2019).

These analyses have implications for understanding the origin and diffusion of the hunter-gatherers that inhabited Europe during the Late Upper Palaeolithic and Mesolithic. Our findings indicate that Oriente C shows a strong genetic relationship with Western European Late Upper Palaeolithic and Mesolithic hunter-gatherers, suggesting that the “Western hunter-gatherers” was a homogeneous population widely distributed in the Central Mediterranean, presumably as a consequence of continuous gene flow among different groups, or a range expansion following the LGM.

The same statistic as in A plotted with geographic position
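The D-statistic quoted above, D (Mbuti, Oriente C, Vela Spila, Villabruna) with Z=3.42, compares allele-frequency differences among four populations, and Z expresses how many standard errors the value lies from zero. A minimal sketch of the frequency-based D of Patterson et al. (2012) with a simple unweighted delete-one block jackknife for Z, run on simulated frequencies for hypothetical populations P1 to P4 (P2 and P4 share extra drift). Published tools use a weighted block jackknife over large genomic blocks, so this is a simplification.

```python
import math
import random

def d_components(p1, p2, p3, p4):
    """Numerator and denominator sums of D(P1, P2; P3, P4)."""
    num = sum((a - b) * (c - d) for a, b, c, d in zip(p1, p2, p3, p4))
    den = sum((a + b - 2 * a * b) * (c + d - 2 * c * d)
              for a, b, c, d in zip(p1, p2, p3, p4))
    return num, den

def d_statistic(p1, p2, p3, p4, n_blocks=50):
    """Return (D, Z), with Z from an unweighted delete-one block jackknife."""
    n = len(p1)
    bounds = [n * i // n_blocks for i in range(n_blocks + 1)]
    parts = [d_components(*(p[lo:hi] for p in (p1, p2, p3, p4)))
             for lo, hi in zip(bounds, bounds[1:])]
    total_num = sum(num for num, _ in parts)
    total_den = sum(den for _, den in parts)
    d = total_num / total_den
    # D recomputed with each block left out in turn
    loo = [(total_num - num) / (total_den - den) for num, den in parts]
    mean_loo = sum(loo) / n_blocks
    se = math.sqrt((n_blocks - 1) / n_blocks * sum((v - mean_loo) ** 2 for v in loo))
    return d, d / se

def drift(freqs, sd):
    """Crude drift model: Gaussian noise on each frequency, clipped to [0, 1]."""
    return [min(1.0, max(0.0, f + random.gauss(0.0, sd))) for f in freqs]

random.seed(3)
root = [random.uniform(0.1, 0.9) for _ in range(20_000)]
p1 = drift(root, 0.15)          # outgroup (the role Mbuti plays in the quoted test)
ancestor = drift(root, 0.05)    # ancestor of P2, P3 and P4
shared = drift(ancestor, 0.06)  # extra branch shared by P2 and P4 only
p2, p4 = drift(shared, 0.05), drift(shared, 0.05)
p3 = drift(ancestor, 0.05)
d_value, z_score = d_statistic(p1, p2, p3, p4)
print(f"D = {d_value:.4f}, Z = {z_score:.1f}")
```

A positive D here means P2 shares more drift with P4 than with P3; a |Z| around 3 or more, as in the quoted Z=3.42, is the usual threshold for calling such an excess significant.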

The South Italian corridor

Once again, a hypothesis based on phylogeography – apart from scarce archaeological and palaeolinguistic data (“Semitic”-like topo-hydronymy and substrates in Europe) – seems to be confirmed step by step. Since the discovery of the Villabruna individual of hg. R1b-L754 (likely R1b-V88, like the south-eastern European lineages that expanded with WHG ancestry), it seemed quite likely that southern Europe would turn out to be the origin of the expansion of R1b-V88 into Africa.

The most likely explanation for the presence of “archaic” R1b-V88 subclades among modern Sardinians was, therefore, that they represented a remnant from a Late Upper Palaeolithic/Early Mesolithic population that had not been replaced in subsequent migrations, and thus that the migration of these lineages into Northern Africa and the Green Sahara happened during a period when Italy was connected by a shallower Mediterranean (and more land connections) to Northern Africa.

Likely Late Epigravettian/Mesolithic expansion of R1b-V88 into Northern Africa. See full map.

Nevertheless, the arguments for a quite recent expansion of R1b-V88 through the Mediterranean and into Africa keep being repeated, probably based on ancestry from the few ancient (and many modern) populations that have been investigated to date, a simplistic approach prone to serious errors that can undermine whole migration models.

For example, in the recent paper by Marcus et al. (2019) the presence of these lineages among ancient Sardinians (from the late 4th millennium BC on) is interpreted as an expansion of R1b-V88 with the Cardial Neolithic based on their ancestry, disregarding the millennia-long gap between these samples and the presence of this haplogroup in Palaeolithic/Mesolithic Northern Iberia and Northern Italy, and the comparatively much earlier splits in the phylogenetic tree and dispersal among African populations.

Afroasiatic and Nostratic

I was asked recently if I really believed that we could reconstruct Proto-Nostratic and connect it with any ancestral population. My answer is simple: until the Chalcolithic – when the whole picture of Indo-Europeans, Uralians, Egyptians or Semites becomes quite clear – we have just very few (linguistic, archaeological, genetic) dots which we would like to connect, and we do so the best we can. The earlier the population and proto-language, the more difficult this task becomes.

NOTE. 1) I tentatively connected hg. R with Nostratic in a previous text – when it appeared that R1a expanded from around Lake Baikal, hence Eurasiatic; R1b from the south with AME-WHG ancestry, hence Afroasiatic; and R2 with Dravidian.

2) After that, I thought it was more likely to be connected to AME ancestry and the Middle East, because of the apparent expansion of WHG from south-eastern Europe, and the potential association of Afroasiatic and (Elamo-?)Dravidian with Middle Eastern populations.

3) However, after finding more and more R1b samples expanding through northern Eurasia, spreading through the (then wider) steppe regions; and R1a essentially surviving among other groups in eastern Europe for thousands of years without being associated with significant migrations (like, say, hg. C after the Palaeolithic), it didn’t seem like this division was accurate, hence my most recent version.

But, in essence, it’s all about connecting the dots, and we have very few of them…

Phylogenetic tree from Pagel et al. (2013), partially in agreement with Kortlandt’s view on Eurasiatic. “Consensus phylogenetic tree of Eurasiatic superfamily (A) superimposed on Eurasia and (B) rooted tree with estimated dates of origin of families and of superfamily. (A) Unrooted consensus tree with branch lengths (solid lines) shown to scale and illustrating the correspondence between the tree and the contemporary north-south and east-west geographical positions of these language families. Abbreviations: P (proto) followed by initials of language family: PD, proto-Dravidian; PK, proto-Kartvelian; PU, proto-Uralic; PIE, proto–Indo-European; PA, proto-Altaic; PCK, proto–Chukchi-Kamchatkan; PIY, proto–Inuit-Yupik. The dotted line to PIY extends the inferred branch length into the area in which Inuit-Yupik languages are currently spoken: it is not a measure of divergence. The cross-hatched line to PK indicates that branch has been shortened (compare with B). The branch to proto-Dravidian ends in an area that Dravidian populations are thought to have occupied before the arrival of Indo-Europeans (see main text). (B) Consensus tree rooted using proto-Dravidian as the outgroup. The age at the root is 14.45 ± 1.75 kya (95% CI = 11.72–18.38 kya) or a slightly older 15.61 ± 2.29 kya (95% CI = 11.72–20.40 kya) if the tree is rooted with proto-Kartvelian. The age assumes midpoint rooting along the branch leading to proto-Dravidian (rooting closer to PD would produce an older root, and vice versa), and takes into account uncertainty around proto–Indo-European date of 8,700 ± 544 (SD) y following ref. 35 and the PCK date of 692 ± 67 (SD) y ago.”

In linguistics, I trust traditional linguists who tend to trust other, more experimental linguists (like Hyllested or Kortlandt) who consider that – in their experience – an Indo-Uralic and a Eurasiatic phylum can be reconstructed. Similarly, linguists like Kortlandt are apparently (partially) supportive of attempts like that of Allan Bomhard with Nostratic – although almost everyone is critical of the Muscovite school’s attachment to the Brugmannian reconstruction, stuck in pre-laryngeal Proto-Indo-Anatolian and similar archaisms.

I mostly use Nostratic as a way to give a simplistic ethnolinguistic label to the genetically related prehistoric peoples whose languages we will probably never know. I think it’s becoming clear that the strongest connection right now with the expansion of potential Eurasiatic dialects is offered by ANE-related populations (hence Y-chromosome bottlenecks under hg. R, Q, probably also N), however complicated the reconstruction of that hypothetical community (and its dialectalization) may be.

Therefore, the multiple expansions of lineages more or less closely associated with ANE-related peoples – like R1b-V88 in the case of Afrasian, or R2 in the case of Dravidians – are the easiest to link to the traditionally described Nostratic dialects and their highly hypothetical relationship.

Reconstruction of North African vegetation during past green Sahara periods. Estimated and reconstructed MAP for the Holocene GSP (6–10 kyr BP) projected onto a cross-section along the eastern Sahara (left panel) and map view of reconstructed MAP, vegetation and physiographic elements [7,8,11,45] (right panel). Image from Larrasoaña et al. (2013).

What should be clear to anyone is that the attempt of many modern Afroasiatic speakers to connect their language to their own (or their own community’s main) haplogroups, frequently E and/or J, is flawed for many reasons; it was simplistic in the 2000s, but it is absurd after the advent of ancient DNA investigation and of more recent research on SNP mutation rates. R1b-V88 should have been part of the discussions about the expansion of Afroasiatic communities through the Green Sahara long ago, whether one supports a Nostratic phylum or not.

The fact that the role of R1b bottlenecks and expansions in the spread of Afroasiatic is usually not even discussed despite their likely connection with the most recent population expansions through the Green Sahara fitting a reasonable time frame for Proto-Afroasiatic reconstruction, a reasonable geographical homeland, and a compatible dialectal division – unlike many other proposed (E or J) subclades – reveals (once again) a lot about the reasons behind amateur interest in genetics.

The same can be seen in the fixation on (and immobility of) recent writings about the role of I1, I2, or (more recently) R1a in the Proto-Indo-European expansion, of R1b in Vasconic, or of N1c in Proto-Uralic.

NOTE. That evident interest notwithstanding, it is undeniable that we have a much better understanding of the expansions of R1b subclades than other haplogroups, probably due in great part to the easier recovery of ancient DNA from Eurasia (and Europe in particular), for many different – sociopolitical, geographical, technological – reasons. It is quite possible that a more thorough temporal transect of ancient DNA from the Middle East and Africa might radically change our understanding of population movements, especially those related to the Afroasiatic expansion. I am referring in this post to interpretations based on the data we currently have, despite that potential R1b-based bias.


Sea Peoples behind Philistines were Aegeans, including R1b-M269 lineages

New open access paper Ancient DNA sheds light on the genetic origins of early Iron Age Philistines, by Feldman et al. Science Advances (2019) 5(7):eaax0061.

Interesting excerpts (modified for clarity, emphasis mine):

Here, we report genome-wide data from human remains excavated at the ancient seaport of Ashkelon, forming a genetic time series encompassing the Bronze to Iron Age transition. We find that all three Ashkelon populations derive most of their ancestry from the local Levantine gene pool. The early Iron Age population was distinct in its high genetic affinity to European-derived populations and in the high variation of that affinity, suggesting that a gene flow from a European-related gene pool entered Ashkelon either at the end of the Bronze Age or at the beginning of the Iron Age. Of the available contemporaneous populations, we model the southern European gene pool as the best proxy for this incoming gene flow. Last, we observe that the excess European affinity of the early Iron Age individuals does not persist in the later Iron Age population, suggesting that it had a limited genetic impact on the long-term population structure of the people in Ashkelon.

Ancient genomes (marked with color-filled symbols) projected onto the principal components inferred from present-day west Eurasians (gray circles). The newly reported Ashkelon populations are annotated in the upper corner.

Genetic discontinuity between the Bronze Age and the early Iron Age people of Ashkelon

In comparison to ASH_LBA, the four ASH_IA1 individuals from the following Iron Age I period are, on average, shifted along PC1 toward the European cline and are more spread out along PC1, overlapping with ASH_LBA on one extreme and with the Greek Late Bronze Age “S_Greece_LBA” on the other. Similarly, genetic clustering assigns ASH_IA1 with an average of 14% contribution from a cluster maximized in the Mesolithic European hunter-gatherers labeled “WHG” (shown in blue in Fig. 2B) (15, 22, 26). This component is inferred only in small proportions in earlier Bronze Age Levantine populations (2 to 9%).

In agreement with the PCA and ADMIXTURE results, only European hunter-gatherers (including WHG) and populations sharing a history of genetic admixture with European hunter-gatherers (e.g., as European Neolithic and post-Neolithic populations) produced significantly positive f4-statistics (Z ≥ 3), suggesting that, compared to ASH_LBA, ASH_IA1 has additional European-related ancestry.
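For readers unfamiliar with the statistic in the excerpt above: f4(A, B; C, D) averages, over SNPs, the product (pA − pB)(pC − pD) of allele-frequency differences, and a significantly positive value suggests that A shares more genetic drift with C (or B with D). Significance is usually judged with a block jackknife standard error and a threshold like the Z ≥ 3 quoted above. The following is only a minimal Python sketch on simulated toy data: the populations, frequencies, block count, and the equal-size, unweighted jackknife are my own simplifying assumptions, not the data or the exact procedure of Feldman et al. (2019), since ADMIXTOOLS-style software typically uses a weighted jackknife over large chromosomal blocks.

```python
import math
import random


def f4_per_snp(pa, pb, pc, pd):
    """Per-SNP contributions to f4(A, B; C, D) = (pA - pB) * (pC - pD)."""
    return [(a - b) * (c - d) for a, b, c, d in zip(pa, pb, pc, pd)]


def block_jackknife_z(values, n_blocks):
    """Mean of per-SNP values, with an equal-size delete-one-block jackknife SE.

    Simplified: real tools weight blocks by SNP count and use genomic blocks.
    """
    size = len(values) // n_blocks
    usable = values[: size * n_blocks]  # drop leftover SNPs so blocks are equal
    total = sum(usable)
    theta = total / len(usable)  # point estimate of f4
    # Leave-one-block-out estimates of the mean
    loo = [
        (total - sum(usable[j * size:(j + 1) * size])) / (len(usable) - size)
        for j in range(n_blocks)
    ]
    mean_loo = sum(loo) / n_blocks
    var = (n_blocks - 1) / n_blocks * sum((x - mean_loo) ** 2 for x in loo)
    se = math.sqrt(var)
    return theta, se, theta / se


def simulate(n_snps, shared, seed=1):
    """Hypothetical toy frequencies: A and C share a drift term x (C scaled by `shared`)."""
    rng = random.Random(seed)
    pa, pb, pc, pd = [], [], [], []
    for _ in range(n_snps):
        base = 0.5
        x = rng.gauss(0, 0.05)  # drift shared by A and C
        pa.append(base + x + rng.gauss(0, 0.02))
        pb.append(base + rng.gauss(0, 0.02))
        pc.append(base + shared * x + rng.gauss(0, 0.02))
        pd.append(base + rng.gauss(0, 0.02))
    return pa, pb, pc, pd


pa, pb, pc, pd = simulate(20000, shared=0.3)
theta, se, z = block_jackknife_z(f4_per_snp(pa, pb, pc, pd), n_blocks=50)
print(f"f4 = {theta:.5f}, SE = {se:.5f}, Z = {z:.1f}")
```

With `shared=0.0` the same code gives a Z-score around zero, which is the null case that a test like Z ≥ 3 is meant to reject.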

We find that the PC1 coordinates positively correlate with the proportion of WHG ancestry modeled in the Ashkelon individuals, suggesting that WHG reasonably tag a European-related ancestral component within the ASH_IA1 individuals.
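The correlation mentioned in this excerpt is, in its simplest form, a Pearson coefficient between each individual’s PC1 coordinate and its modelled WHG proportion. A minimal sketch follows; the numbers are hypothetical placeholders, not the values reported for the Ashkelon individuals.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical PC1 coordinates and WHG proportions for a handful of individuals
pc1 = [0.010, 0.014, 0.020, 0.025, 0.031]
whg = [0.00, 0.03, 0.06, 0.10, 0.14]
print(round(pearson(pc1, whg), 3))
```

A positive coefficient only says that both measures move together; it does not by itself identify the source of the extra affinity.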

We plot the ancestral proportions of the Ashkelon individuals inferred by qpAdm using Iran_ChL, Levant_ChL, and WHG as sources ±1 SEs. P values are annotated under each model. In cases when the three-way model failed (χ2P < 0.05), we plot the fitting two-way model. The WHG ancestry is necessary only in ASH_IA1.

The best supported one (χ2P = 0.675) infers that ASH_IA1 derives around 43% of ancestry from the Greek Bronze Age “Crete_Odigitria_BA” (43.1 ± 19.2%) and the rest from the ASH_LBA population.
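To illustrate what a two-way admixture proportion such as the ~43% above means, the sketch below fits a target as a mixture α·S1 + (1 − α)·S2 of two sources by least squares on allele frequencies, which has the closed form α = Σ(T − S2)(S1 − S2) / Σ(S1 − S2)². This is explicitly not qpAdm: qpAdm works on f4-statistics against a set of outgroup (“right”) populations and reports the χ² p-value used in the excerpts to accept or reject a model. The simulated sources, the 0.43 proportion, and the noise level are hypothetical.

```python
import random


def two_way_mixture(target, src1, src2):
    """Least-squares alpha for target ~ alpha*src1 + (1 - alpha)*src2 (allele frequencies)."""
    num = sum((t - s2) * (s1 - s2) for t, s1, s2 in zip(target, src1, src2))
    den = sum((s1 - s2) ** 2 for s1, s2 in zip(src1, src2))
    return num / den


rng = random.Random(7)
n = 5000
source_1 = [rng.uniform(0.05, 0.95) for _ in range(n)]  # hypothetical "Aegean-like" source
source_2 = [rng.uniform(0.05, 0.95) for _ in range(n)]  # hypothetical "local" source
true_alpha = 0.43
target = [true_alpha * a + (1 - true_alpha) * b + rng.gauss(0, 0.02)
          for a, b in zip(source_1, source_2)]

alpha = two_way_mixture(target, source_1, source_2)
print(f"estimated alpha = {alpha:.3f}")
```

In real data the sources are only proxies of the actual ancestral populations, which is one reason why the quoted estimates carry such wide standard errors (e.g. ±19.2%).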

(…) only the models including “Sardinian,” “Crete_Odigitria_BA,” or “Iberia_BA” as the candidate population provided a good fit (χ2P = 0.715, 49.3 ± 8.5%; χ2P = 0.972, 38.0 ± 22.0%; and χ2P = 0.964, 25.8 ± 9.3%, respectively). We note that, because of geographical and temporal sampling gaps, populations that potentially contributed the “European-related” admixture in ASH_IA1 could be missing from the dataset.

The transient impact of the “European-related” gene flow on the Ashkelon gene pool

The ASH_IA2 individuals are intermediate along PC1 between the ASH_LBA ones and the earlier Bronze Age Levantines (Jordan_EBA/Lebanon_MBA) in the west Eurasian PCA (Fig. 2A). Notably, despite being chronologically closer to ASH_IA1, the ASH_IA2 individuals position closer, on average, to the earlier Bronze Age individuals.

See more information on Y-DNA SNP calls, including ASH067 as R1b-M269 (xL151).

The transient excess of European-related genetic affinity in ASH_IA1 can be explained by two scenarios. The early Iron Age European-related genetic component could have been diluted by either the local Ashkelon population to the undetectable level at the time of the later Iron Age individuals or by a gene flow from a population outside of Ashkelon introduced during the final stages of the early Iron Age or the beginning of the later Iron Age.

By modeling ASH_IA2 as a mixture of ASH_IA1 and earlier Bronze Age Levantines/Late Period Egyptian, we infer a range of 7 to 38% of contribution from ASH_IA1, although no contribution cannot be rejected because of the limited resolution to differentiate between Bronze Age and early Iron Age ancestries in this model.
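The dilution scenario in the excerpts above can be quantified under a deliberately crude, hypothetical assumption: if in each generation a fraction m of the parents comes from a local population lacking the European-related component, its average share falls as p_g = p0 · (1 − m)^g. The values below (initial share, m, and the “undetectable” threshold) are illustrative only, not estimates from the paper.

```python
def diluted_share(p0, m, generations):
    """Average ancestry share after `generations` of gene flow from a source lacking it.

    Assumes a constant fraction `m` (0 < m <= 1) of each generation's parents are locals.
    """
    return p0 * (1 - m) ** generations


def generations_until(p0, m, threshold):
    """Smallest number of generations after which the share is <= threshold."""
    if not 0 < m <= 1 or threshold <= 0:
        raise ValueError("need 0 < m <= 1 and threshold > 0")
    g, p = 0, p0
    while p > threshold:
        p *= 1 - m
        g += 1
    return g


# Illustrative values only: 40% initial share, half of each generation's parents local,
# and a 5% level below which the component is treated as undetectable.
for g in range(4):
    print(g, round(diluted_share(0.4, 0.5, g), 3))
print(generations_until(0.4, 0.5, 0.05))
```

Even with far less intense local gene flow, a few centuries are enough to push such a component below what a handful of samples can detect.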

Hg. R1b-M269 and the Aegean

I already predicted this relationship of Philistines and Aegeans (Greeks in particular) months ago, based on linguistics, archaeology, and phylogeography, although it was (and still is) unclear whether these paternal lineages might have come instead from other nearby populations descended from Common Anatolians, given the known intense contacts between Helladic and West Anatolian groups.

The alternative view: The Sea Peoples can be traced back to the Aegean, so they could also have consisted of Luwian petty kingdoms, who had formed an alliance and attacked Hatti from the south.

The deduction process for the Greek connection was quite simple:

Palaeo-Balkan populations

We know that R1b-Z2103 expanded with Yamna, including West Yamna settlers: they appear in Vučedol, which means they formed part of the earliest expansion waves of Yamna settlers into the Carpathian Basin, and they also appear scattered among Bell Beakers (apart from dominating East Yamna and Afanasevo), which suggests that they were possibly one of the most successful lineages during the late Repin/early Yamna expansion.

The “Steppe ancestry” associated with I2a-L699 samples among Balkan BA peoples may have also been associated with recent Bronze Age expansions, and this haplogroup’s presence among modern Balkan peoples may also suggest that it expanded with Palaeo-Balkan languages. Nevertheless, we don’t know which specific lineages and “Steppe ancestry” they represent, sadly.

These samples may well be related to remnants of previous Balkan populations like Cernavodă or Ezero, because there has been no peer-reviewed attempt at distinguishing Khvalynsk-/Novodanilovka- from Sredni Stog- from Yamnaya-related populations (see here), and some groups that are associated with this ancestry, like Corded Ware, are known to be culturally distinct from Yamna.

In any case, Proto-Greeks from the southern Balkans (say, Sitagroi IV and related groups) are probably going to show, based on Palaeo-Balkan substrate and Pre-Greek substrate and on the available Mycenaean samples, a process of decreasing proportion of R1b-Z2103 lineages relative to local ones, and a relatively similar cline of Yamna:EEF ancestry from northern to southern areas, at least in the periods closest to the Yamna expansion.

NOTE. The finding of “archaic” R1b-L389 (R1b-V1636) and R1a-M198 subclades among modern Greeks and the likely Neolithic origin of these paternal lineages around the Caucasus suggest that their presence in Greece may be due to any of the more recent migrations between Anatolia and the Balkans, especially during the Common Era, rather than to Indo-Anatolian migrations; probably very recently.

Bronze Age cultures in the Balkans and the Aegean. See full map including ancient samples with Y-DNA, mtDNA, and ADMIXTURE.

Minoans and haplogroup J

In the Aegean, it is already evident that the population changed language partly through cultural diffusion, probably through elite domination of Proto-Greek speakers. Whether that happened before the invasion into the Greek Peninsula or after it is unclear, as we discussed recently, because we only have one reported Y-chromosome haplogroup among Mycenaeans, and it is J (probably continuing earlier lineages).

Now we have more samples from the so-called Emporion 2 cluster in Olalde et al. (2019), which shows Mycenaean-like eastern Mediterranean ancestry and 3 (out of 3) samples of haplogroup J, which – given the origin of the colony in Phocaea – may be interpreted as the prevalence of West Anatolian-like ancestry and lineages in the eastern part of the Aegean (and possibly thus in the southern Peloponnese), in line with the modern situation.

NOTE. It does not seem likely that those R or R1b-L23 samples from the Emporion 1 cluster are R1b-Z2103, based on their West European-like ancestry, although they still may be, because – as we know – ancestry (unlike haplogroup) changes too easily to interpret it as an ancestral ethnolinguistic marker.

PCA of ancient samples related to the Aegean, with Minoans, Mycenaeans (including the Emporion 2 cluster in the background), Anatolia N-Ch.-BA and Levantine BA-LBA populations, including Tel Shadud samples. See more PCAs of ancient Eurasian populations.

Greeks and haplogroup R1b-M269

Therefore, while the presence of R1b-Z2103 among ancient Balkan peoples connected to the Yamna expansion is clear, one might ask if R1b-Z2103 really spread up to the Peloponnese by the time of the Mycenaean Civilization. That has only one indirect answer, and it’s most likely yes.

We already had some R1b-Z2103 among Thracians and around the Armenoid homeland, which offers another clue to the migration of these lineages from the Balkans. The distribution of different “archaic” R1b-Z2103 subclades among modern Balkan populations and around the Aegean offered more support to this conclusion.

But now we have two interesting ancient populations that bear witness to the likely intrusion of R1b-M269 with Proto-Greeks:

An Ancient Greek of hg. R1b

A single ancient sample supports the increase in R1b-Z2103 among Greeks during the “Dorian” invasions that triggered the Dark Ages and the phenomenon of the Aegean Sea Peoples. It comes from a Greek lab study, showing R1b1b (i.e. R1b-P297 in the old nomenclature) as the only Y-chromosome haplogroup obtained from the sampling of the Gulf of Ambracia ca. 470–30 BC, i.e. before the Roman foundation of Nikopolis, hence from people likely from Anaktorion in Ancient Acarnania, of Corinthian origin.


Even with the few data available – and with the caution necessary for studies of this kind from non-established labs, which may be subject to many different kinds of errors – one could argue that the western Greek areas, which received different waves of migrants from the north and show a higher frequency of R1b-Z2103 in modern times, were probably more heavily admixed with R1b-Z2103 than southern and eastern areas, which were always dominated by Greek-speaking populations more heavily admixed with locals.

The Dorian invasion and the Greek Dark Ages may thus account for a renewed influx of R1b-Z2103 lineages accompanying the dialects that would eventually help form the Hellenic Koiné. In a sense, it is only natural that demographically stronger populations around the Bronze Age Aegean would suffer a limited (male) population replacement with the succeeding invasions, starting with a higher genetic impact in the north-west and diminishing as they progressed to the south and the east, coupled with stepped admixture events with local populations.

This would therefore be the late equivalent of what happened at the end of the 3rd millennium BC, with Mycenaeans and their genetic continuity with Minoans.

Distribution of Pre-Greek place-names ending in -ssos/-ssa or -sos/-sa. See original images and more on the south/east cline distribution of Pre-Greek place-names here.

Sea peoples of hg. R1b-M269

Thanks to Wang et al. (2018) supplementary materials we knew that one of the two Levantine LBA II samples from Tel Shadud (final 13th–early 11th c. BC) published in van den Brink (2017) was of hg. R1b-M269 – in fact, the one interpreted as a Canaanite official residing at this site and emulating selected funerary aspects of Egyptian mortuary culture.

Both analyzed samples, this elite individual and a commoner of hg. J buried nearby, were genetically similar and indistinguishable from local populations, though:

Principal Components Analysis of L112 and L126 was carried out within the framework described in Lazaridis et al. (2016). This analysis showed that the two individuals cluster genetically, with similar estimated proportions of ancestry from diverse West Eurasian ancestral sources. These results are consistent with the hypothesis that they derive from the same population, or alternatively that they derive from two quite closely related populations.

We know that ancestry changes easily within a few generations, so there was not much information to go on, except for the fact that – being R1b-M269 – this individual could trace his paternal ancestor at some point to Proto-Indo-Europeans.

One might think that, because many haplogroups in this spreadsheet were wrong, this one is also wrong; nevertheless, many haplogroups are correctly identified by Yleaf, and finding R1b-M269 in the Levant after the expansion of Sea Peoples should not be that surprising, because they were most likely related to populations of the Aegean Sea. Any other related hg. R1b (R1b-M73, R1b-V88, even R1b-V1636) wouldn’t fit as well as R1b-M269.


However, the early expansion of Proto-Indo-Aryans into the Middle East, as well as the later expansion of Armenians from the Balkans through Anatolia and of West Iranians from the east, could all potentially have been related to this sample. But still, the previous linguistic and archaeological theories concerning the Philistines and the expansion of Sea Peoples in the Levant made this sample a likely (originally) Greek “Dorian” lineage, rather than the other (increasingly speculative) alternatives.

In any case, it was obvious to anyone – that is, to anyone with a minimum knowledge of how population genomics works – that the two samples from van den Brink (2017) alone couldn’t be used to reach any conclusions about the ancestral origin of these individuals (or their differences) beyond Levantine peoples, because their ancestry was essentially (i.e. statistically) the same as that of the other few available ancient samples from nearby regions and similar periods.

If anything, the PCA suggested an origin of the R1b sample closer to Aegean populations relative to the J individual (see PCA above), and this should also have been supported by amateur models, albeit without any possible confirmation (as with the ASH_IA2 cluster in this paper). However, if you have followed online discussions of the Tel Shadud R1b-M269 sample since it was first mentioned on Eupedia months ago – including another wave of misguided speculation based on the ancestry of both individuals, triggered by a discussion on this blog –, you have yet more proof of how misleading ancestry analyses can be in the wrong hands.

NOTE. This is the Nth proof (and that only in 2019) of how it’s best to just avoid amateur analyses and interpretations altogether, as I did in the recent publication of the books. All those who didn’t take into account whatever was commented about the ancestry of these samples haven’t lost a single bit of relevant information on Levantine peoples, and have had more time for useful reads, compared to those dedicated to endless void speculation, once again gone awfully wrong, as does everything related to cocky ancient DNA crackpottery 😉

Late Bronze Age population movements in the Eastern Mediterranean and the Middle East. See full map including ancient DNA samples with Y-DNA, mtDNA, and ADMIXTURE.

Admittedly, though, even accepting the evident Mediterranean origin of this lineage, one could have argued that this sample may have been of R1b-L151 subclade, if one were inclined to support the theory that Italic peoples were behind Sea Peoples expanding east – and consequently that the ancestors of Etruscans had migrated eastward into the Aegean (e.g. into Lemnos), so that it could be asserted that Tyrsenian might have been a remnant language of an ancient population of northern Italy.


Fortunately, some of the samples recovered in Feldman et al. (2019) that could be analyzed (those of the cluster ASH_IA1) offer a very specific time frame in which European ancestry appeared (ca. 1250 BC) before it became fully diluted (as seen in cluster ASH_IA2) among the prevalent Levantine ancestry of the area.

Also fortunately, this precise cluster shows another R1b-M269 sample, likely R1b-Z2103 (because it is probably xL151), and this sample together with others from the same cluster prove that the ancestry related to the original southern European incomers was:

  1. Recent, related thus to LBA population movements, as expected; and
  2. More closely related to coeval Aegeans, including Mycenaeans with Steppe-related ancestry.

NOTE. I say “fortunately” because, as you can imagine if you have dealt with amateurish discussions long enough, without this cluster with evident Aegean ancestry and the R1b-M269 (Z2103) sample precisely associated with it, some would once again enter endless comment loops created by ancestry magicians, showing how Aegean peoples were not behind Sea Peoples, or not behind Philistines, or not behind the R1b-M269 among Philistines, depending on their specific agendas.

Map of the Sea People invasions in the Aegean Sea and Eastern Mediterranean at the end of the Late Bronze Age (blue arrows). Some of the major cities impacted by the raids are denoted with historical dates. Inland invasions are represented by purple arrows. From Kaniewski et al. (2011).

The results of the paper don’t solve the question of the exact origin of all Sea Peoples (not even that of Philistines), but it is quite clear that most of those forming this seafaring confederation must have come from sites around the Aegean Sea. This thus supports the traditional origin attributed to them, including a hint at the likely expansion of Eastern Mediterranean ancestry and lineages into the Italian Peninsula precisely from the Aegean, as some oral communications have already disclosed.

As an indirect conclusion from the findings in this paper, then, we can now more confidently argue that Tyrsenian speakers most likely expanded into the Apennines and the Alps originally from a Tyrsenian-speaking LBA population from Lemnos, due to the social unrest in the whole Aegean region, and might have become heavily admixed with local Italic peoples quite quickly, as happened with the Philistines, resulting in yet another case of language expansion through (the simplistically called) elite domination.


Even more interesting than these specific findings, this paper confirms yet another hypothesis based on phylogeography, and proves once again two important starting points for ancient DNA interpretation that I have discussed extensively in this blog:

  • The rare R1b-M269 Y-chromosome lineage of Tel Shadud offered ipso facto the most relevant clue about the ancestral geographical origin of this Canaanite elite male’s paternal family, most likely from the north-west based on ancient phylogeography, which indirectly – in combination with linguistics and archaeology – supported the ancestral ethnolinguistic identification of Philistines with the Aegean and thus with (a population closest to) Ancient Greeks.
  • Ancestry analyses are often wholly unreliable when assessing population movements, especially when few samples from incomplete temporal-geographical transects are assessed in isolation, because – unlike paternal (and maternal) haplogroups – ancestry can change completely within a few generations, depending on the particular anthropological setting. Their investigation is thus bound by many limitations – of design, of statistics, and of anthropology (i.e. archaeology and linguistics) – which are quite often not taken into account.

These cornerstones of ancient DNA interpretation have been already demonstrated to be valid not only for Levantine populations, as in this case, but also for Balkan peoples, for Bell Beakers, for steppe populations (like Khvalynsk, Sredni Stog, Yamna, Corded Ware), for Basques, for Balto-Slavs, for Ugrians and Samoyeds, and for many other prehistoric peoples.

I rest my case.


Balto-Slavic accentual mobility: an innovation in contact with Balto-Finnic


Some very specific prosodic innovations affected the Balto-Slavic linguistic community, probably at a time when it already showed internal dialectal differences. Whether those innovations were related to archaic remnants stemming from the parent Proto-Indo-European language, and whether that disintegrating community included different dialects, remains an object of active debate.

“Archaic” Balto-Slavic?

The main question about Balto-Slavic is whether this concept represents a single community, or it was rather a continuum formed by two (Baltic and Slavic) or possibly three (East Baltic, West Baltic, Slavic) neighbouring communities, speaking closely related Northern European dialects, which just happened to evolve very close to each other, i.e. in cultures that were closer to each other than they were to Germanic or Balto-Finnic.

In my opinion, their similarities warrant the reconstruction of a single original central-east European community after the dissolution of Bell Beakers, speaking a North-West Indo-European dialect, and most internal differences between Baltic and Slavic may be explained as innovations. The precise identification of a Proto-Balto-Slavic community remains elusive, although the Unetice-Iwno-Mierzanowice triangle is the best bet, with Trzciniec showing what seems like an Early Slavic-like population reaching up to the East Baltic.

Bell Beaker expansion in eastern Europe and around the Baltic.

The reconstruction of a common Balto-Slavic proto-language is known to range from difficult to impossible, depending on who you ask, not least because of the differences discussed in this post, which have been the battlefield of Balticists and Slavicists for decades. The old tenet that Balto-Slavic had inherited some traits directly from PIE is – in contrast with e.g. the Italo-Celtic concept – surprisingly alive still today.

Take, for example, these internal differences and supposedly archaic traits:

  • The ruKi rule, where Baltic shows mostly *is, *us, and Slavic shows *ix, *ux (with the velar fricative *x); or the different output of Satemization in Baltic compared to Slavic (and both compared to Indo-Iranian). Nevertheless, the Satemization trends in Balto-Slavic and Indo-Iranian are usually explained together and taken as evidence of a traditional three-velar system for PIE.
    • If you consider Satemization a late trend in Balto-Slavic, affecting each dialect in a different way, and thus the Balto-Slavic phonetic evolution as clearly distinct from the Indo-Iranian trend, rejecting tritectalism, this problem is solved. This would also solve the impossible Indo-Slavonic problem, and the paradox of Balto-Slavic sharing a genetic phylum with Germanic and Italo-Celtic.
    • If you, however, conflate these differences and North-West Indo-European features with an ad hoc explanation of a hypothetical Centum dialect called Temematic, which intends to solve their (in Holzer’s words) unlösbaren (“unsolvable”) inconsistencies, you essentially add a whole new inconsistency without solving the previous ones. For a full rebuttal of Holzer’s Temematic etymologies, see Matasović (2014).
  • Kortlandt’s reconstruction of a PIE 3rd singular *-e (Baltic from *-et, Slavic from *-eti) and 3rd plural *-o, which would have been replaced independently in other Indo-European dialects (by *-eti, *-onti), is reminiscent of his own reconstruction of laryngeals almost up to the attestation of all Indo-European dialects, including Baltic. If you consider these traits an innovation, this artificially created problem is immediately solved.
  • Genitive plural Pre-Baltic *-ōm vs. Pre-Slavic *-ŏm is another commonly cited example. However, I would place this difference among other similar differences found within other related IE dialects, hence a common phonetic innovation (see e.g. below for the classicist view of unstable obliques).
  • Kortlandt’s reconstruction of oblique cases in *-m-, shared with Germanic, as stemming from a common Middle PIE *-mus (based essentially on Old Lithuanian *-mus and on a non-existent equivalent Anatolian formation), hence different from those in *-bʰ-. While you can argue for countless more reasonable alternatives, the most often cited one considers the ins.-dat. pl. *-bʰ- a common NWIE innovation based on the ins. sg. *bʰi-, and the forms in *-m- (including the ins. sg.) a Northern European phonetic innovation. The simplest, most elegant explanation I’ve read to date (I think by Rémy Viredaz) is the similar bilabial change of Giacobo/Giacomo in Italian…

As you can see, some Balto-Slavicists could have written whole books about how their object of study holds the key to solve problems on common Proto-Indo-European paradigms, some of which wouldn’t need solving if they hadn’t been started by Balto-Slavicists themselves…

While all of these “archaic” traits are easily dismissed without further ado (except for some understandably damaged pride among academics), there is one especially pervasive idea among those eager to find the white whale of laryngeal remnants in Indo-European languages (see here for other examples of dubious laryngeal remains).

The prophecy before the battle, Józef Ryszkiewicz, 1890. Or, how to conjure laryngeal remnants in Balto-Slavic.

Accentual development in contact

Whichever position one prefers, the general argument is that the Balto-Slavic accentual system is non-trivial for the classification of both dialects into a common branch. However, that would only be completely true if it were a common innovation, but not so much if it were a natural laryngeal evolution.

In fact, the broken tone preserving a PIE laryngeal, as proposed by Kortlandt – continuing Meillet’s idea of synchronous PIE-PBS developments – was always very difficult to accept. Even the rising pronunciation is not original, and represents a shift of the accent to the initial syllable in Latvian…

In my opinion, the derivation of a modern phenomenon from a PIE laryngeal must always raise a red flag (see below on archaisms vs. innovations in IE languages). As you can see from my version of the fable in Balto-Slavic, which uses Kortlandt’s reconstruction, I preferred not to take the reconstructed accents into account. The fable thus remains a model of what could have been a common Proto-Balto-Slavic, unlike other reconstructions, which are much less tentative.

NOTE. You could argue that accents may be reconstructed in spite of the wrong theory behind them, but this is not true; at least not of all reconstructed accents, some of which require further assumptions. Think about it this way: I wouldn’t take into account a reconstruction of Germanic accent which derived the Danish glottalized tone from a hypothetical Proto-Germanic laryngeal, even if most accents seemed correct at first sight. The truth is, I didn’t want to spend time going through each reconstructed word and its explanation, so it was easier to delete them all, even though that’s not an actual solution, either. You will find the same doubts in the description of Balto-Slavic evolution in my old Modern Indo-European grammar. The introduction to IE dialects was partially copied from Wikipedia (which, in the case of Balto-Slavic, essentially summarized data from Kortlandt), but in the grammar I just tried to keep the basics, and not very successfully, because you need a comprehensive and coherent description of a language’s evolution. That’s how messed up the question was, and how it still is, even though 15 years of research have passed…

Despite the idea of an “archaic Balto-Slavic”, especially prevalent among older researchers, the current trend is to consider Balto-Slavic prosodic changes as a natural innovation, even among those who would artificially reconstruct laryngeal remnants up to late Balto-Slavic stages.

NOTE. You can read more about the Proto-Indo-European laryngeal loss and vocalism. While the presence of certain laryngeals up to Late PIE is accepted, their loss in many environments is also generally agreed upon. This is especially true of a hypothetical Indo-Slavonic branch like the one supported by Kortlandt: even those who support multiple laryngeal loss events must admit that Indo-Iranian showed no laryngeals before its disintegration, whether they place this loss as an internal Proto-Indo-Iranian development or earlier. Tocharian attests to an evolution similar to that of the rest of the Late PIE dialects (hence to a quite early laryngeal loss trend), and the Balkan dialects (supposedly splitting before Indo-Slavonic) also lost laryngeals in a similar way, except for initial ones, which show a vocalic output instead of full loss.

So, where does a laryngeal loss fit in this “Indo-Slavonic” scheme, exactly? Before the Tocharian split? Before the Balkan split? After the Balkan split but before the full loss in Indo-Iranian? And where exactly does this group belong with respect to Corded Ware, and where does Germanic? No idea (but you can read Kortlandt’s attempt to fit his model with Gimbutas’ “Kurgan peoples”). It is one thing to reconstruct Proto-Greek, Proto-Celtic, or Proto-Italic forms without laryngeals and relate them to a purely theoretical three-laryngeal PIE, and quite another to reconstruct laryngeals (including in environments where they were already lost in Tocharian) up to Proto-Baltic and Proto-Slavic, which seems more than just a bit of a stretch…

Indo-European dialectal relationships, from Mallory and Adams (2006).

Thomas Olander recently summarized the current positions on the Balto-Slavic accentual system in Indo-European heritage in the Balto-Slavic accentuation system (2013), which also contains a summary of his Mobility Law, explaining this phenomenon as a common Pre-Baltic and Pre-Slavic innovation.

Andersen, an advocate of different Baltic and Slavic dialects developing in contact with Satem dialects, suggested in The Satem Languages of the Indo-European Northwest. First Contacts? (2009), partially based on Olander’s initial proposal, that Baltic and Slavic accentual mobility arose as a result of contact with languages with fixed word-initial ictus: the accent was lost in the word-final mora in pre-Proto-Baltic and, independently, in pre-Proto-Slavic. Hence, the central innovation, the accent loss

technically is not a shared Slavic and Baltic innovation. On the contrary. It shows that the speakers of the Pre-Slavic and Pre-Baltic dialects formed bilingual communities with speakers of contact dialects that were of the same prosodic type, viz. had fixed initial ictus but no free accent.

Olander (2019) has since gathered more real-world examples of this same phenomenon:

Prosodic features are known to be susceptible to contact influence (Salmons 1992: 1 and passim). While it does not directly influence the evaluation of the Mobility Law as a non-trivial innovation, it is interesting that most of the alleged parallels are indeed considered to be contact-induced changes due to influence from languages with an ictus on the word-initial syllable (Andersen 2009: 11–14; Rinkevičius 2013): Balto-Fennic in the case of the Karelian and (perhaps through Latvian as an intermediary) Žemaitian dialects, and Hungarian in the case of the Slavonian dialects (for Karelian see Jakobson 1938/2002: 239; Veenker 1967: 74; Thomason & Kaufman 1988: 122, 241; Salmons 1992: 41–42; for Žemaitian see Zinkevičius 1966: 45–46; for Slavonian see Ivić 1958: 287).

I am not aware of any hypotheses on a contact-induced origin for Greek prosodic innovations, but it is at least worth noting that there is agreement on significant substrate influence on Greek. While we may speculate that these substrate language(s) had word-initial ictus like Balto-Fennic and Hungarian, we do not have any actual information about the prosodic system(s) (thus even Beekes 2014: 9, who in other respects provides a fairly detailed picture of the substrate).

The parallels from other speech varieties show that an accent loss of the type suggested for a pre-stage of Baltic and Slavic is a type of prosodic change that has occurred several times in different various systems. In the context of the present paper this means that the sound law itself cannot be classified as a non-trivial innovation; it may have taken place in already differentiated dialects or languages. Also, the parallels suggest that a loss of the accent may be the result of influence from languages with fixed word-initial ictus.

At a time when even linguists agree that substrate/contact languages have to be related to specific ethnolinguistic groups (see here for Germanic), it is surprising that Olander stops short of naming this substrate behind Pre-Baltic and Pre-Slavic as Late Uralic in general, or Balto-Finnic in particular.

NOTE. Not least because Olander is part of the Homeland Timeline map project of the Copenhagen group (their website is not working right now), and they placed Volosovo as Uralians expanding with Netted Ware in contact with the Baltic during the Bronze Age… So what’s to doubt about Balto-Slavic – Balto-Finnic contacts, exactly? Maybe that, if Balto-Finnic was the substrate language behind Balto-Slavic (as it was in Germanic), it would mean that Uralic languages were previously spoken in territories that later became Germanic- and Balto-Slavic-speaking?

Still image from the Copenhagen Timeline Map (accessed one year ago), showing in green the Volosovo hunter-gatherers who, according to the map, later expanded to the north-east with Netted Ware…

Archaism vs. Innovation

If we tried to describe, from a purely theoretical point of view, these trends of explaining peculiar traits of recent Indo-European dialects as archaisms vs. innovations, we could roughly distinguish two positions among academics (with infinite variants, of course), just as we can find people more inclined to leftist or rightist views when speaking about the economy. When it comes to linguistics, which is the least messed-up field in which one can describe Indo-European and Indo-Europeans, I think we can find two basic alternative tenets:

  • One idea holds that the oldest attested dialects – and those with an older guesstimated proto-language – are the gold standard for what the original situation may have been, and for what can be described as an archaism. For example: Mycenaean and Ancient Greek or Vedic Sanskrit among the old dialects; Tocharian or the Italic dialects among those with quite old guesstimates, each for different reasons; and Anatolian for both, being an old dialect that is also attested early.
    NOTE. Nevertheless, the phonology of Anatolian inscriptions is often difficult to ascertain, and its status as an ancient dialect stemming from a Middle PIE stage may still be disputed by some. The archaic nature of Tocharian is perhaps less widely accepted than that of Anatolian, but I would say there is general consensus on the matter today.

  • The other general idea holds that the most isolated dialects may hold the key to the oldest Indo-European traits, somehow sheltered from external influences and areal contacts, and thus from the generalized innovative trends that affected the best-known ancient dialects. In that sense, languages like Slavic, Baltic, Albanian, or Armenian – as well as some fragmentary Balkan dialects – are quite common objects of study for revealing exceptional PIE traits.

I think the education system in Southern Europe and South Asia follows the tradition of formal classicists. In eastern Europe, I’d reckon the education system – especially in regions that were never connected to the Graeco-Roman tradition – favours linguistics as the study of one’s own and related proto-languages. For northern Europe, I would say it’s 50/50, especially in Scandinavia, depending on whether classicists or linguists dominate the departments of Indo-European. For example, while Germany or Austria might lean more toward the classics, Copenhagen’s obsession with Germanic as the most archaic IE branch is well known…

A 17th-century birch bark manuscript of Pāṇini’s grammar treatise from Kashmir. Image from Wikipedia.

Both positions, when blindly accepted, are bound to fail at some point or another:

  • If you take Classical Sanskrit, Classical Greek, or Classical Latin as a model of Proto-Indo-European, you are bound to make radical mistakes when reconstructing the parent language, all the more so if you disregard the oldest attested layers of these languages. An interesting view of the so-called Adradists at the Complutense University of Madrid – apart from their famous 9-laryngeal reconstruction – is that Middle PIE had only 5 cases, with a general (unstable) oblique case in Late PIE that later evolved into the attested 5 to 8 cases in the different dialects. That is, in my opinion, a fairly typical classicist error, which could easily be avoided by taking into account the oldest stages, like those attested in Mycenaean and in Old Latin, instead of focusing on classical grammar. The 8-case system is, in fact, one of the few true Balto-Slavic archaisms, supported by external comparanda.
  • On the other hand, if you take Albanian, Armenian, Baltic or Slavic, or even phonetically dubious data like those from some Anatolian inscriptions, you can eventually argue for anything. And I really mean anything; you are leaving the door wide open for any crazy-ass opinion about Proto-Indo-European based on traits found in modern languages: from how many velar series there were and how they evolved (if they evolved at all, because you may find all of them in Luwian, or still alive in Albanian or Armenian…) and their nature as ejective consonants in Late PIE (based on Armenian or Germanic), to how many laryngeals there were and when they disappeared (if they actually did disappear, because some may even find them in Modern Lithuanian, in Armenian, or in Danish…), etc. Once you believe your own romantic view of some modern language(s) retaining traits from five thousand years ago, there is no stopping it; not for you, and not for anyone else, either.

NOTE. Among the funniest consequences of this type of ‘worldview’, in which one assumes that modern dialects (or one’s own interpretations of them) are as reliable as ancient ones, or even more so, and that Indo-European dialects somehow split from the parent language at the same time (so there was one common “full laryngeal” language, and then all attested dialects evolved from it), are some of the theories that you can easily find posted in Facebook’s Proto-Indo-European group. Let’s just say, for the sake of simplicity, that you can compare English ‘sunrise’ with Spanish ‘sonrisa’ “smile” all you want, and assert that both reveal a common origin in PIE *sup-, hence from the Sun and the smile going “up” or something, but no explanation of how you reached that conclusion makes up for the fact that this comparison should never have started in the first place. Now replace English and Spanish with Armenian, Slavic, and/or Albanian, invent some new IE sound law, throw one or two laryngeals into the mix, and somehow this might get a pass among certain linguists…

The Celebration of Svetovid on Rügen, Alphonse Mucha, The Slav Epic. Image from Wikipedia. Were Early Slavs among a select few romantic peoples who kept the “true” Indo-European language and traditions? Of course not.

While no one can deny the value of the different Indo-European branches for the reconstruction of the parent language, no matter how recently they were attested, the only reasonable solution whenever a difficult case arises is to trust ancient dialects more than recent ones. Using data from fringe theories based on recent dialects to build a Proto-Indo-European paradigm, especially when there is contradictory data from ancient IE dialects, is flawed for two reasons:

  1. Languages attested later – especially after periods of population movements and contacts – show, in general, a greater degree of change. Preferring Old Slavic or Classical Armenian over ancient dialects like Ancient Greek, Vedic Sanskrit, or the ancient Italic dialects to reconstruct Indo-European is, in a way, like taking Byzantine Greek, Pali, or Old French as models, respectively.
  2. Classical languages are indeed modified by the action of grammarians, but once standardized, these “languages behind a state” (or a religion) are less prone to change, thanks to the transmission of oral (and written) literature, education, commerce, etc. Languages spoken by unorganized tribes are less constrained in their evolution, and their internal (substrate) and external (contact) influences are greater and (what’s worse) unknown.

Baltic and Slavic, like Albanian or Armenian, are dialects attested only very recently, which may have undergone complex internal and external influences we may never fully understand. When confronted with traits that are controversial or inexplicable compared with ancient branches like Greek, Indo-Iranian, or Italo-Celtic (especially if the ancient data fit with other Indo-European dialects), the conservative solution, which will be right most of the time (and I mean 99.9999% of cases), is to assume that these traits represent an innovation over Late PIE.

The fact that some researchers still use these recent dialects as a blank canvas instead, proposing an endless stream of new ideas about how to reconstruct IE proto-languages, or even older common PIE stages, is shocking. Not “R1a/Steppe” vs. “N1c/Siberian” haplogroup+ancestry bullshit-level shocking, but still unacceptable in a serious academic environment.

Apart from the known differences between Baltic and Slavic, the only reason why Balto-Slavicists have failed so many times at the seemingly “unsolvable” question of Proto-Balto-Slavic reconstruction is precisely the fixation of many of them on their object of study as a model for other IE languages (and thus for PIE), instead of taking the rest as a model for the reconstruction of Balto-Slavic (or of Proto-Baltic and Proto-Slavic).

Repeating ad nauseam the popular notion that Balto-Slavic (or Baltic and Slavic) is among the most archaic, or the slowest-evolving, IE dialects, and other cheap nationalist slogans of the sort, does not help this aim, and just reading or hearing it should make anyone cringe instantly, no less than reading or hearing that Sanskrit is essentially equal to PIE, or that it was spoken in the Indus Valley 10,000 years ago. We are not living in the 19th century, mind you.