Spread of Indo-European and Uralic speakers in ADMIXTURE


The following are updated files for unsupervised ADMIXTURE of most available ancient Eurasian samples with K=7. For reference, see PCA of ancient and modern Eurasian samples.

NOTE. For a precise interpretation of ancestry evolution, be sure to first check the posts on the expansion of “Steppe ancestry”, on the spread of Yamnaya ancestry with Indo-Europeans, and on the evolution of Corded Ware ancestry typical of modern Uralic populations.

ADMIXTURE timeline

This is a YouTube video similar to the one on Indo-Europeans and Y-DNA evolution:


Some comments

  • I have tried running supervised ADMIXTURE models by selecting distant populations based on PCAs and qpAdm results. The most accurate approximations to what the software should offer appear with a small K number, between K=5 and K=7, whether supervised or unsupervised; adding more ancestral populations gives increasingly odd results the more distant (in time) the populations are from the selected samples.
  • Labels for ancestral components follow those commonly used in the literature, although supervised ADMIXTURE using corresponding available samples (viz. Anatolia Neolithic for AHG, Iran Hotu and/or CHG for IHG, AG2, AG3 and Mal’ta for ANE, etc.) offers slightly different, less smooth outputs for some periods, especially among more recent populations.
  • Outputs depend on many different factors, and these files are intended as an overview of the evolution of these simplistic components. The number of available samples per period, the potential ancestry changes within each conventionally selected period, or whether or not each available sample is representative of the territory it was recovered from, among many other factors, influence the outputs and the maps.
Unsupervised ADMIXTURE (K=7). See full image.
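As a rough illustration of how per-sample ADMIXTURE output can be summarised by period, here is a minimal Python sketch. It assumes the usual .Q layout (one row per sample, K whitespace-separated proportions, in the same order as the samples in the input). The function names, the toy K=3 values and the period labels are hypothetical and are not taken from the files above.

```python
from collections import defaultdict


def parse_q(text):
    """Parse .Q-style content: one row per sample, K proportions per row."""
    rows = []
    for line in text.strip().splitlines():
        rows.append([float(x) for x in line.split()])
    return rows


def mean_by_group(q_rows, groups):
    """Average each ancestry component over the samples sharing a group label."""
    if len(q_rows) != len(groups):
        raise ValueError("one group label is needed per Q row")
    sums = {}
    counts = defaultdict(int)
    for row, group in zip(q_rows, groups):
        if group not in sums:
            sums[group] = [0.0] * len(row)
        sums[group] = [s + v for s, v in zip(sums[group], row)]
        counts[group] += 1
    return {g: [s / counts[g] for s in sums[g]] for g in sums}


# Toy values for K=3 (hypothetical, for illustration only).
Q_TEXT = """
0.70 0.20 0.10
0.50 0.40 0.10
0.10 0.10 0.80
"""
PERIODS = ["Early Bronze Age", "Early Bronze Age", "Late Iron Age"]

means = mean_by_group(parse_q(Q_TEXT), PERIODS)
for period, components in means.items():
    print(period, [round(c, 3) for c in components])
```

A plain mean per period hides exactly the factors listed in the comments above (sample counts, changes within a period, unrepresentative samples), so it is only a starting point for a summary like this.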

NOTE. In summary, ADMIXTURE results like these below might be used to develop new ideas, to be then formally tested; they cannot be used to support anything. Don’t be like the Copenhagen group, randomly selecting “Steppe ancestry” with K=4, identifying this component as “Indo-Europeans”, and correlating its evolution with changes in vegetation composition in yet another obvious correlation = causation argument among many confounding factors left unaccounted for…

Static ADMIXTURE + culture maps

Colours correspond to the components as labelled in the video and in the files below.

  1. Anatomically Modern Humans (PDF)
  2. Upper Palaeolithic (PDF)
  3. Epipalaeolithic (PDF)
  4. Early Mesolithic (PDF)
  5. Late Mesolithic (PDF)
  6. Neolithic and hunter-gatherer pottery (PDF)
  7. Early Eneolithic (PDF)
  8. Late Eneolithic (PDF)
  9. Early Chalcolithic (PDF)
  10. Late Chalcolithic (PDF)
  11. Early Bronze Age (PDF)
  12. Middle Bronze Age (PDF)
  13. Late Bronze Age (PDF)
  14. Early Iron Age (PDF)
  15. Late Iron Age (PDF)
  16. Antiquity (PDF)
  17. Middle Ages (PDF)

Natural interpolation maps of ADMIXTURE

The following maps offer natural neighbour interpolations of ancestral components in ancient DNA samples grouped by periods (conventionally selected following the same pattern as in the Prehistory Atlas).

  • Extrapolation (inferred ancestry beyond the frame created by available samples per map) is obtained by adding distant external locations (such as Greenland, Arctic, Alaska…) with a value of 0.
  • Videos offer a dynamic timeline.
  • Click on the images to see a version with higher resolution.
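The effect of the zero-valued external locations can be sketched in a few lines of Python. Note that this sketch uses simple inverse distance weighting, not natural neighbour interpolation, and treats longitude/latitude as planar coordinates. Both are simplifying assumptions, and the sample and anchor coordinates are hypothetical.

```python
import math


def idw(points, query, power=2.0):
    """Inverse-distance-weighted estimate at `query` from (x, y, value) points."""
    num = 0.0
    den = 0.0
    for x, y, value in points:
        # Planar distance in degrees: a simplification, not a geodesic distance.
        d = math.hypot(x - query[0], y - query[1])
        if d == 0.0:
            return value  # the query coincides with a sample
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


# Hypothetical samples: (longitude, latitude, ancestry proportion).
samples = [(30.0, 50.0, 0.8), (35.0, 52.0, 0.6)]
# Hypothetical distant anchors with a value of 0, standing in for the
# external locations added around the frame of available samples.
anchors = [(-40.0, 72.0, 0.0), (-150.0, 64.0, 0.0), (100.0, 85.0, 0.0)]

query = (60.0, 60.0)  # a location outside the frame formed by the samples
without_anchors = idw(samples, query)
with_anchors = idw(samples + anchors, query)
print(round(without_anchors, 3), round(with_anchors, 3))
```

Without anchors, the estimate outside the sampled area stays between the values of the nearest samples. With them, it is pulled towards 0 with distance, which is the behaviour the extrapolation in the maps relies on.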

WHG ancestry


AHG ancestry


ANE ancestry


“Siberian” ancestry

This ancestry peaks among Baikal HG, Ust’Belaya, Nganasans, or Ulchi, hence the different labels used.


Iran HG ancestry


ADMIXTURE maps by period

Click on each image for a higher resolution version.





Early Eneolithic


Late Eneolithic


Early Chalcolithic


Late Chalcolithic


Early Bronze Age


Middle Bronze Age


Late Bronze Age


Early Iron Age


Late Iron Age




Middle Ages


Modern populations



These are the samples used for interpolations in each period (except for modern populations, which are those included in the Reich Lab curated dataset):

See also

“Steppe ancestry” step by step (2019): Mesolithic to Early Bronze Age Eurasia


The recent update on the Indo-Anatolian homeland in the Middle Volga region and its evolution as the Indo-Tocharian homeland in the Don–Volga area as described in Anthony (2019) has, at last, a strong scientific foundation, as it relies on previous linguistic and archaeological theories, now coupled with ancient phylogeography and genomic ancestry.

There are still some inconsistencies in the interpretation of the so-called “Steppe ancestry”, though, despite the one and a half years that have passed since we first had access to the closest Pontic–Caspian steppe source populations. Even my post “Steppe ancestry” step by step from a year ago is already outdated.


The population selection process for models shown below included (1) plausibility of potential influences in the particular geographic and archaeological context; (2) looking for their clusters or particular samples in the PCA; and (3) testing with qpAdm for potential source populations that might have been involved in their development.

The results and graphics posted are therefore intended to simplistically show potential admixture events between populations potentially close to the actual sources of the target samples, whenever such mating networks could be supported by archaeology.

NOTE. This is an informal post and I am not a geneticist, so I am turning this flexibility to my advantage. If any reader is – for some strange reason – looking for strict hypothesis testing, for the use of a full set of formal stats (as used e.g. in Ning et al. 2019 for Proto-Tocharians), and for properly written and peer-reviewed text, this is not the right place to find them.

An example pedigree (a) of a focal individual sampled in the modern day, placed in its geographic context to make the spatial pedigree (b). Dashed lines denote matings, and solid lines denote parentage, with red hues for the maternal ancestors and blue hues for the paternal ancestors. In the spatial pedigree, each plane represents a sampled region in a discrete (nonoverlapping) generation, and each dot shows the birth location of an individual. The pedigree of the focal individual is highlighted back through time and across space. Image modified from Bradburd and Ralph (2019).

Despite the natural impulse to draw straight mixture trajectories (see e.g. Wang et al. 2019), simply adding or subtracting samples used for a PCA shows how the plot is affected by different variables (see e.g. what happens by including more South Asian samples to the PCA below), hence the need to draw curved arrows – not necessarily representing a sizable drift; at least not in recent prehistoric admixture events for which we have a reasonable chronological transect.

Representation of mixture events between European prehistoric peoples in the PCA. Image modified from David Reich’s Who We Are and How We Got Here (2018).

Ethnolinguistic identification is a risky business that brings back memories of an evil use of cultural history and its consequences (at least in Western Europe, where this tradition was discontinued after WWII), but it seems necessary for those of us who want to find some confirmation of proposed dialectal schemes and language contacts.

Eneolithic Steppe vs. Steppe Maykop

First things first: I tested Bronze Age Eurasian peoples for the only two true steppe populations sampled to date, as potential sources of their “Steppe ancestry” – conventionally described as an EHG:CHG admixture, similar to that found in the first sampled Yamnaya individuals. I used the rightpops of Wang et al. (2018), but with a catch: since the authors used WHG as a leftpop and Villabruna as a rightpop, and I find that a little inconsistent*, I preferred the strategy in Ning et al. (2019), contrasting as outgroup Eneolithic_Steppe (ca. 4300 BC) vs. Steppe_Maykop (ca. 3500 BC) when testing for WHG as a source population.

*WHG usually includes samples from a ‘western’ cluster (Loschbour and La Braña) and an ‘eastern’ cluster (Villabruna and Koros), see Lipson et al. (2017). Therefore, it doesn’t make much sense to include the same (or a very similar) population as a source AND an outgroup.

NOTE. For all other qpAdm analyses below, where WHG was not used as leftpop, I have used Villabruna as rightpop following Wang et al. (2019).
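To make the role of sources (leftpops) and outgroups (rightpops) more concrete, here is a toy Python sketch of the underlying idea: the target's statistics against the outgroups are modelled as a weighted combination of the sources' statistics. This is not qpAdm. It ignores standard errors, covariance and p-values, handles only two sources, and the function name and all numbers are made up for illustration.

```python
def two_source_weight(target, source_a, source_b):
    """Least-squares alpha with target ~= alpha*source_a + (1 - alpha)*source_b.

    Each argument is a list of statistics, one per outgroup comparison.
    """
    diff = [a - b for a, b in zip(source_a, source_b)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        # The outgroups cannot tell the two sources apart.
        raise ValueError("sources are indistinguishable with these outgroups")
    alpha = sum((t - b) * d for t, b, d in zip(target, source_b, diff)) / denom
    fitted = [alpha * a + (1 - alpha) * b for a, b in zip(source_a, source_b)]
    residual = sum((t - f) ** 2 for t, f in zip(target, fitted)) ** 0.5
    return alpha, residual


# Made-up statistic vectors (one entry per outgroup comparison).
source_a = [0.010, -0.004, 0.006]
source_b = [0.002, 0.003, -0.001]
# A synthetic target built as 70% source_a and 30% source_b.
target = [0.7 * a + 0.3 * b for a, b in zip(source_a, source_b)]

alpha, residual = two_source_weight(target, source_a, source_b)
print(round(alpha, 3), residual)
```

The `denom == 0.0` branch illustrates why the choice of outgroups matters: if they relate to both sources in the same way, no weight can be recovered. That is the same kind of concern as using a very similar population as both a source and an outgroup.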

Map of samples and sites mentioned in Wang et al. (2019), modified from the original to include labels of Eneolithic_Steppe and Steppe_Maykop samples. See PCA and ADMIXTURE graphic for the identification of specific samples.

Results are not much different from what has been reported. In general, Yamnaya and related groups such as Bell Beakers and Steppe-related Chalcolithic/Bronze Age populations show good fits for Eneolithic_Steppe as their closest source for Steppe ancestry, and bad fits for Steppe_Maykop, whereas Corded Ware groups show the opposite, supporting their known differences.

This trend seems to be tempered in some groups, though, most likely due to the influence of Samara_LN-like admixture in Circum-Baltic Late Neolithic and Eastern Corded Ware groups, and the influence of Anatolia_N/EEF-like admixture in Balkan and late European CWC or BBC groups. In fact, the more EEF-related ancestry in a population, the less reliable these generic models (and even specific ones) seem to become when distinguishing the Steppe-related source.

NOTE. For more on this, see the discussion on Circum-Baltic Corded Ware peoples, and the discussion on Mycenaeans and their potential source populations.

These are just broad strokes of what might have happened around the Pontic–Caspian steppes before and during the Early Bronze Age expansions. The most relevant quest right now for Indo-European studies is to ascertain the chain of admixture events that led to the development and expansion of Indo-Uralic and its offshoots, Indo-European and Uralic.

Eastern European Mesolithic with the expansion of Post-Swiderian cultures. See full map.

A history of Steppe ancestry

This post is divided into (more or less accurate) chronological developments as follows:

  1. Hunter-gatherer pottery and the steppes
  2. Khvalynsk and Sredni Stog
  3. Post-Stog and Proto-Corded Ware
  4. Yamnaya and Afanasievo

1. Hunter-gatherer pottery and the steppes

I laid out in the ASOSAH book series the general idea – based on attempts to reconstruct the linguistic ancestor of Indo-Uralic – that Eurasiatic speakers might have expanded with the North-Eastern Techno-Complex that spread through north-eastern Europe during the warm period represented by the transition of the Palaeolithic to the Mesolithic.

If one were to trust the traditional migrationist view, a post-Swiderian population expanded from central-eastern Europe (potentially related originally to Epi-Gravettian peoples, represented by WHG ancestry) into north-eastern Europe, and then further east into the Trans-Urals, to then reappear in eastern Europe as a back-migration represented by the spread of hunter-gatherer pottery.

The marked shift from WHG-like towards EHG-related ancestry from Baltic Mesolithic (ca. 30%) to Combed Ware cultures (ca. 65%-100%) supports this continuous westward expansion, which is possibly best represented in the currently available sampling by the ‘south-eastern’ shift (CHG:ANE-related) of the hunter-gatherer from Lebyazhinka IV (5600 BC) relative to the older one from Sidelkino (9300 BC), both from the Samara region in the Middle Volga:

Mesolithic-Neolithic transition ca. 7000-6000 BC, with hunter-gatherer pottery groups spreading westwards. See full map.

From Anthony (2019):

Along the banks of the lower Volga many excavated hunting-fishing camp sites are dated 6200-4500 BC. They could be the source of CHG ancestry in the steppes. At about 6200 BC, when these camps were first established at Kair-Shak III and Varfolomievka, they hunted primarily saiga antelope around Dzhangar, south of the lower Volga, and almost exclusively onagers in the drier desert-steppes at Kair-Shak, north of the lower Volga. Farther north at the lower/middle Volga ecotone, at sites such as Varfolomievka and Oroshaemoe hunter-fishers who made pottery similar to that at Kair-Shak hunted onagers and saiga antelope in the desert-steppe, horses in the steppe, and aurochs in the riverine forests. Finally, in the Volga steppes north of Saratov and near Samara, hunter-fishers who made a different kind of pottery (Samara type) and hunted wild horses and red deer definitely were EHG. A Samara hunter-gatherer of this era buried at Lebyazhinka IV, dated 5600-5500 BC, was one of the first named examples of the EHG genetic type (Haak et al. 2015). This individual, like others from the same region, had no or very little CHG ancestry. The CHG mating network had not yet reached Samara by 5500 BC.

Given the lack of a proper geographical and chronological transect of ancient DNA from eastern European groups, and the discontinuous appearance of both R1b-M73 and R1b-M269 lineages on both sides of the Urals within the WHG:ANE cline, where EHG appears to have formed, it is impossible at this point to assert anything with a sufficient degree of certainty. For simplicity, though, I risked equating the expansion of R1b-M73 in West Siberia with Micro-Altaic, and the expansion of hg. R1b-M269 with the spread of Indo-Uralic on both sides of the Urals.

NOTE. For incrementally speculative associations of languages with prehistoric cultures and their potential link to ancestry ± haplogroup expansions, you can check sections on Early Indo-Europeans and Uralians, Indo-Uralians, Altaic peoples, Eurasians, or Nostratians. I explained why I made these simplistic choices here.

While this identification of the Indo-Uralic expansion with hg. R1b is more or less straightforward for the Cis-Urals, given the available ancient DNA samples, it will be very difficult (if at all possible) to trace the migration of these originally R1b-M269-rich populations into Trans-Uralian groups that could eventually be linked to Yukaghir speakers. The sheer number of potential admixture events and bottlenecks in Siberian forest, taiga, and tundra regions from the Mesolithic until Yukaghirs were first attested is guaranteed to give more than one headache in upcoming years…

Spread of hunter-gatherer pottery in eastern Europe ca. 6000-5000 BC. See full map.

The slight increase in WHG-related ancestry in Ukraine Neolithic groups relative to Mesolithic ones questions the arrival of this eastern influence in the north Pontic area, or at least its relevance in genomic terms, although the cluster formed is similar to the previous one and to Combed Ware groups – despite the Central European and Baltic influences in the north Pontic region – with some samples showing 0% change relative to Mesolithic groups.

Structure and change in hunter-gatherer-related populations, from Mathieson et al. (2018). Inferred ancestry proportions for populations modelled as a mixture of WHG, EHG and CHG. Dashed lines show populations from the same geographic region. Percentages indicate proportion of WHG + EHG ancestry. Standard errors range from 1.5 to 8.3%.

NOTE. For more on Indo-Uralic and its reconstruction from a linguistic point of view, check out its dedicated section on ASOSAH, or the recently published (behind paywall) The Precursors of Proto-Indo-European, edited by Kloekhorst and Pronk, Brill (2019). Authors of specific chapters have posted their contributions to Academia.edu, where they can be downloaded for free.

2. Khvalynsk and Sredni Stog

The cluster formed by the three available samples of the Khvalynsk culture (early 5th millennium BC) might be described, as expected from its position in the PCA, as a mixture of EHG-like populations of the Middle Volga with CHG-like ancestry close to that represented by samples from Progress-2 and Vonyuchka, in the North Caucasus Piedmont (ca. 4300 BC):

This variable CHG-like admixture shown in the wide cluster formed by the available Khvalynsk-related samples supports the interpretation of a recently created CHG mating network in Anthony (2019):

After 5000 BC domesticated animals appeared in these same sites in the lower Volga, and in new ones, and in grave sacrifices at Khvalynsk and Ekaterinovka. CHG genes and domesticated animals flowed north up the Volga, and EHG genes flowed south into the North Caucasus steppes, and the two components became admixed. After approximately 4500 BC the Khvalynsk archaeological culture united the lower and middle Volga archaeological sites into one variable archaeological culture that kept domesticated sheep, goats, and cattle (and possibly horses). In my estimation, Khvalynsk might represent the oldest phase of PIE.

Detail of the PCA of Eurasian samples, including Neolithic clusters with the hypothesized gene flows related to (1) the formation and (2) expansion of Khvalynsk and the (3) emergence of late Sredni Stog. See full image.

The richest copper assemblage found in all Khvalynsk burials belongs to an individual of hg. R1b-V1636 and intermediate Samara_HG:Eneolithic_Steppe ancestry, while full Eneolithic_Steppe-like admixture in the Middle Volga is represented by the commoner of Khvalynsk II, of hg. Q1. The finding of hg. R1b-V1636 in the North Caucasus Piedmont – and R1b-P297 in the Samara region (probably including Yekaterinovka) – raises the question of the origin of hg. R1b-V1636 in the Khvalynsk community. Based on its absence in ancient samples from the forest zone, it is tempting to assign it to steppe hunter-gatherers down the Lower Volga and possibly to the east of it, who infiltrated the Samara region precisely during the population movements described by Anthony (2019).

Suvorovo-related samples from the Balkans, including the Varna and Smyadovo outliers of Steppe ancestry, are closely related to the Khvalynsk expansion:

Similarly, the ancestry of late Sredni Stog samples from Dereivka seems to be directly related to the expansion of Mariupol-like individuals over populations of Suvorovo-Novodanilovka-like admixture, as suggested by the resurgence of typical Ukraine Neolithic haplogroups, the shift in the PCA, and the models of Eneolithic_Steppe vs. Steppe_Maykop above:

#EDIT (11 Nov 2019): In fact, the position of the unpublished Greece_Neolithic outlier that appeared in the Wang et al. (2018) preprint (see full PCA and ADMIXTURE) shows that the expanding Suvorovo chiefs from the Balkans formed a tight cluster close to the two published outliers with Steppe ancestry from Bulgaria.

The Ukraine_Neolithic outlier, possibly a Novodanilovka-related sample, suggests, based on its position in the PCA close to the late Trypillian outlier of Steppe-related ancestry, that Ukraine_Eneolithic samples from Dereivka are a mixture of Ukraine_Neolithic and a Novodanilovka-like community similar to Suvorovo.

The Trypillian_Eneolithic-like admixture found among Proto-Corded Ware peoples (see below) would then feature potentially a small Steppe_Eneolithic-like component already present in the north Pontic area, too.

Image modified from Wang et al. (2018). Samples projected in PCA of 84 modern-day West Eurasian populations (open symbols). Previously known clusters have been marked and referenced. Marked and labelled are the Balkan samples referenced in this text. An EHG and a Caucasus ‘clouds’ have been drawn, leaving Pontic-Caspian steppe and derived groups between them. See the original file here.

Furthermore, whereas Anthony (2019) mentions a long-lasting predominance of hg. R1b in elite graves of the Eneolithic Volga basin, not a single sample of hg. R1a is mentioned supporting the community formed by the Alexandria individual, supposedly belonging to late Sredni Stog groups, but with a Corded Ware-like genetic profile (suggesting yet again that it is possibly a wrongly dated sample).

NOTE. A lack of first-hand information rather than an absence of R1a-M417 samples in the north Pontic forest-steppes would not be surprising, since Anthony is involved in the archaeology of the Middle Volga, but not in that of the north Pontic area.

Khvalynsk expansion through the Pontic–Caspian steppes in the early 5th millennium BC. See full map.

3. Post-Stog and Proto-Corded Ware

The origin of the Pre-Corded Ware ancestry is still a mystery, because of the heterogeneity of the sampled groups to date, and because the only ancestral sample that had a compatible genetic profile – I6561 from Alexandria – shows some details that make its radiocarbon date rather unlikely.

The most likely explanation for the closest source population of Corded Ware groups, found in the three core samples of Steppe_Maykop and in Trypillian Eneolithic samples from the first half of the 4th millennium BC, is still that a population of north Pontic forest-steppe hunter-gatherers hijacked this kind of ancestry, which was foreign to the north Pontic region before the Late Eneolithic period, later expanding east and west through the Podolian–Volhynian upland, due to the complex population movements of the Late Eneolithic.

NOTE. The idea of Trypillia influencing the formation of the Steppe_MLBA ancestry typical of Uralic peoples has been around for quite some time already, since the publication of Narasimhan et al. (2018) (see here or here).

Detail of the PCA of Eurasian samples, including Corded Ware groups and related clusters, as well as outliers, with hypothesized gene flows related to the (1) formation and (2) initial expansion of Pre-Corded Ware ancestry, as well as (3) later regional admixture events. See full image.

The specifics of how the Proto-Corded Ware community emerged remain unclear at this point, despite the simplistic description by Rassamakin (1999) of the Late Eneolithic north Pontic population movements as a two-stage migration of (1) late Trypillian groups (Usatovo) west → east, and (2) Late Maykop–Novosvobodnaya east → west. So, for example, Manzura (2016) on the Zhivotilovka “cultural-historical horizon” (emphasis mine):

Indeed, the very complex combination of different cultural traits in the burial sites of the Zhivotilovka type is able to generate certain problems in the search for the origins of this phenomenon. The only really consistent attribute is the burial rite in contracted position on the left or right side. Yu. Rassamakin is correct in asserting that this position of the deceased can be considered as new in the North Pontic region (Rassamakin 1999, 97). However, this opinion can be accepted only partially for the territory between Dniester and Lower Don. This position is well known in the Usatovo culture in the Northwest Pontic region, although skeletons on the right side are evidenced there only in double burials, whereas single burials contain the deceased only in a contracted position on the left side. On the other hand, the southern and western orientation of the deceased, which is one of the main burial traits of the Zhivotilovka type, is not characteristic of the Usatovo culture. Nevertheless, it is possible to suppose that at least part of the Usatovo population could have played a part in the formation of the cultural type under consideration here. One aspect of this cultural tradition, for instance, could be represented by skeletons on the left side and oriented in north-eastern and eastern directions.

Especially close ties can be traced between the Zhivotilovka and Maykop-Novosvobodnaya traditions, as exemplified by similar burial customs and various grave goods. It is beyond any doubt that the Maykop-Novosvobodnaya population was actively involved in the spread of the main Zhivotilovka cultural traits. The influence of North Caucasian traditions can be well observed, at least as far as the Dnieper Basin, but farther west influence is not manifested pronouncedly. The role of cultural units situated between the Dniester and Don rivers in the process of emergence of the Zhivotilovka type looks somewhat vague. Now, it can be quite confidently asserted that at the end of the 4th millennium BC this territory was settled by migrants from the North Caucasus and Carpathian-Dniester region. This event in theory had to stimulate cultural transformations in the Azov-Black Sea steppes and, thus, bearers of local cultural traditions perhaps could have participated in forming the culture under consideration. In any event, the Zhivotilovka type can be regarded as a complex phenomenon that emerged within the regime of intensive cultural dialogue and that it absorbed totally different cultural traditions. The spread of the Zhivotilovka graves across the Pontic steppes from the Carpathians to the Lower Don or even to the Kuban Basin clearly signalizes a rapid dissolution of former cultural borders and the beginning of active movements of people, things and ideas over vast territories.


What were the factors or reasons that could have provoked this event? In the beginning of the second half of the 4th millennium BC two advanced cultural centers emerged in the south of Eastern Europe. These were the Maykop-Novosvobodnaya and Usatovo cultures, which in spite of their separation by great distances were structurally very alike. This is expressed in similar monumental burial architecture, complex burial rites, even the composition of grave goods, developed bronze metallurgy, high standards of material culture, etc. Both cultures in a completely formed state exemplify prosperous societies with a high level of economic and social organization, which can correspond to the type of ranked or early complex societies. Normally, the social elite in such polities tends to rigidly control basic domains social, economic and spiritual life using different mechanisms, even open compulsion (Earle 1987, 294-297). To some extent similar social entities can be found at this moment in the forest-steppe zone of the Carpathian-Dniester region, as reflected by the well organized settlement of Brânzeni III and the Vykhatitsy cemetery (Маркевич 1981; Дергачев 1978). In spite of their complex character, such societies represent rather friable structures, which could rapidly disintegrate due to unfavourable inner or external factors.

The societies in question emerged and existed during a time of favourable natural climatic conditions, which is considered to be a transitional period from the Atlantic to the Subboreal period, lasting approximately from 3600 to 3300 cal BC, or a climatic optimum for the steppe zone (Иванова и др. 2011, 108; Спиридонова, Алешинская 1999, 30-31). These conditions to a large degree could guarantee a stable exploitation of basic resources and support existing social hierarchies. However, after 3300 cal BC significant climatic changes occurred, accompanied by an increasing aridization and fall in temperature. This event is usually termed the “Piora oscillation” or “Rapid Climatic Event”, and is regarded as having been of global character (Magny, Haas 2004). These rapid changes could have seriously disturbed existing economic and social relations and finally provoked a similar rapid disintegration of complex social structures. In this case the sites of the Zhivotilovka type could represent mere fragments of former prosperous societies, which under conditions of the absence of centralized social control and stable cultural borders tried to recombine social and economic ties. However, the population possessed the necessary social experience and important technological resources, such as developed stock-breeding based on the breeding of small cattle and wheeled transport, so they were ready for opening new territories in their search for a better life.

Disintegration, migration, and imports of the Azov–Black Sea region. First migration event (solid arrows): Gordineşti–Maikop expansion (groups: I – Bursuchensk; II – Zhyvotylivka; III – Vovchans’k; IV – Crimean; V – Lower Don; VI – pre-Kuban). Second migration event (hollow arrows): Repin expansion. After Rassamakin (1999), Demchenko (2016).

For more on chronology and the potentially larger, longer-lasting Zhivotilovka–Volchansk–Gordineşti cultural horizon and its expansion through the Podolian–Volhynian upland, read e.g. on the Yampil Complex in the latest volume 22 of Baltic-Pontic Studies (2017):

In the forest-steppe zone of the North-West Pontic area, important data concerning the chronological position of the Zhivotilovka-Volchansk group have been produced by the exploration of the Bursuceni kurgan, which is still awaiting full publication [Yarovoy 1978; cf. also Demcenko 2016; Manzura 2016]. Burials linked with the mentioned group were stratigraphically the eldest in the kurgan, and pre-dated a burial in the extended position and [Yamnaya culture] graves. Two of these burials (features 20 and 21) produced radiocarbon dates falling around 3350-3100 BC [Petrenko, Kovaliukh 2003: 108, Tab. 7]. Similar absolute age determinations were obtained for Podolia kurgans at Prydnistryanske [Goslar et al. 2015]. These dates, falling within the Late Eneolithic, mark the currently oldest horizon of kurgan burials in the forest-steppe zone of the North-West Pontic area. The Podolia graves linked with other, older traditions of the steppe Eneolithic seem to represent a slightly later horizon dated to the transition between the Late Eneolithic and Early Bronze Age.

The presence on the left bank of the Dniester River of kurgans associated with the Eneolithic tradition, which at the same time reveals connections with the Gordineşti-Kasperovce-Horodiştea complex, raises questions about the western range of the new trend in funerary rituals, and its potential connection with the expansion of the late Trypilia culture to the West Podolia and West Volhynia Regions. The data potentially suggesting the attribution of kurgans from the upper Dniester basin to this period is patchy and difficult to verify [e.g. Liczkowce – see Sulimirski 1968: 173]. In this context, the discovery of vessels in the Gordineşti style in a kurgan at Zawisznia near Sokal is inspiring [Antoniewicz 1925].

Burials representing funerary traditions of Zhivotilovka-Volchansk group in Podolie kurgans: 1 – Porohy, grave 3A/7, 2 – Kuzmin, grave 2/2 [after Klochko et al. 2015b, Bubulich, Khakhey 2001]

Another interesting aspect of potential source populations, in combination with those above for Eneolithic_Steppe vs. Steppe_Maykop, is that the groups with worse fits for Steppe_Maykop_core include Potapovka and Srubnaya, as reported by Wang et al. (2018), but also Sintashta_MLBA (although not Andronovo). This is compatible with the long-term admixture of Abashevo chiefs dominating over a majority of Poltavka-like herders in the Don-Volga-Ural steppes during the formation of the Sintashta-Potapovka-Filatovka community, also visible in the typical Yamnaya lineages and Yamnaya-like ancestry still appearing in the region centuries after the change in power structures had occurred.

NOTE. If you feel tempted to test for mixtures of Khvalynsk_EN, Eneolithic_Steppe, Yamnaya, etc. as a source population for Corded Ware, go for it, but it’s almost certain to give similar ‘good’ fits – whatever the model – in some Corded Ware groups and not in others. It is still unclear, as far as I know, how to formally distinguish a Corded Ware-related from a Yamnaya-related source in the same model, and the results obtained with a combination of Steppe_Maykop-related + Eneolithic_Steppe-related sources will probably artificially select either one or the other source, as probably happened in Ning et al. (2019) with Proto-Tocharian samples (see qpAdm values) that most likely had a contribution of both, based on their known intense interactions in the Tarim Basin.

Expansion of north Pontic cultures and related groups during the Late Eneolithic. See full map.

#EDIT (22 NOV 2019): New preprint Gene-flow from steppe individuals into Cucuteni-Trypillia associated populations indicates long-standing contacts and gradual admixture, by Immel et al. bioRxiv (2019), on Gordinești samples from Moldova ca. 3500-3100 BC. Relevant excerpts (emphasis mine):

A principal component analysis of the four Moldova females together with previously published data sets of ancient Eurasians showed that Gordinești, Pocrovca 1 and Pocrovca 3 grouped with later dating Bell Beakers from Germany and Hungary close to the four CTC males from Verteba, while Pocrovca 2 fell into the LBK cluster next to Neolithic farmers from Anatolia and Starčevo individual.

When looking at various proxies for steppe-related ancestry (Yamnaya Samara, Ukraine Mesolithic, Caucasian hunter-gatherer (CHG), Eastern hunter gatherer (EHG)), we did not observe any significant difference in genetic influx from either Yamnaya Samara, EHG or Ukraine Mesolithic. However, relative to CHG, we detected a substantial shift towards Yamnaya Samara steppe-related ancestry. Consequently, Yamnaya Samara, Ukraine Mesolithic and EHG appear to be equally suitable proxies for steppe-related ancestry in the Moldovan CTC individuals.

We did not obtain feasible models when running qpAdm on the X-chromosome in order to test for male-biased admixture from hunter-gatherers or individuals with steppe-related ancestry.

It is not surprising that Gordinești, Pocrovca 1 and Pocrovca 3 showed genetic affinities with later dating Bronze Age or Bell Beaker individuals. The common link among them is the considerable steppe-related ancestry, which each group likely received independently from different parental populations.

Principal component analysis of the CTC individuals from Moldova (Gordinești, Pocrovca 1, Pocrovca 2, Pocrovca 3) in red and the CTC individuals from Verteba Cave (I1926, I2110, I2111, I3151) in blue together with 23 selected ancient populations/individuals projected onto a basemap of 58 modern-day West Eurasian populations (not shown). HG=hunter-gatherer, LBK=Linearbandkeramik, PU=Proto-Unetice, TRB=Trichterbecher (Funnel Beaker Culture, FBC). PC1 is shown on the x-axis and PC2 on the y-axis.

4. Yamnaya and Afanasievo

I don’t think it makes much sense to test for GAC (or Iberia_CA, for that matter) as Wang et al. (2019) did, given the implausibility of them taking part in the formation of late Repin during the mid-4th millennium BC around the Don-Volga interfluve (represented by its offshoots Yamnaya and Afanasievo), whether these or other EEF-related populations show ‘better’ fits or not. Therefore, I only tested for more or less straightforward potential source populations:

Detail of the PCA of Eurasian samples, including Yamnaya groups and related clusters, as well as outliers, with hypothesized gene flows related to its (1) formation and (2) expansion. Also included is the inferred position of the admixed sample Yamnaya_Hungary_EBA1. See full image.

Quite unexpectedly – for me, at least – it appears that Afanasievo and Yamnaya invariably prefer Khvalynsk_EN as the closest source rather than a combination including Eneolithic_Steppe directly. In other words, late Repin shows largely genetic continuity with the Steppe ancestry already shown by the three sampled individuals from the Khvalynsk II cemetery, in line with the known strong bottlenecks of Khvalynsk-related groups under R1b lineages, visible also later in Afanasievo and Yamnaya and derived Indo-European-speaking groups under R1b-L23 subclades.

NOTE. This better explains the reported bad fits of models using Eneolithic_Steppe directly instead of Khvalynsk_EN for Afanasievo and Yamnaya Kalmykia, as is readily evident from the results above, than a rejection of an additional contribution to an Eneolithic_Steppe-like population, which is how I had interpreted it based on Anthony (2019).

Map of major sites of the Zhivotilovka-Volchansk group (A) and Repin culture (B), by Rassamakin (see 1994 and 2013). (A) 1 – Primorskoye; 2 – Vasilevka; 3 – Aleksandrovka; 4 – Boguslav; 5 – Pavlograd; 6 – Zhivotilovka; 7 – Podgorodnoye; 8 – Novomoskovsk; 9 – Sokolovo; 10 – Dneprelstan; 11 – Razumovka; 12 – Pologi; 13 – Vinogradnoye; 14 – Novo-Filipovka; 15 – Volchansk; 16 – Yuryevka; 17 – Davydovka; 18 – Novovorontsovka; 19 – Ust-Kamenka; 20 – Staroselye; 21 – Velikaya Aleksandrovka; 22 – Kovalevka; 23 – Tiraspol; 24 – Cura-Bykuluy; 25 – Roshkany; 26 – Tarakliya; 27 – Kazakliya; 28 – Bolgrad; 29 – Sarateny; 30 – Bursucheny; 31 – Novye Duruitory; 32 – Kosteshty. (B) 1 – Podgorovka; 2 – Aleksandria; 3 – Volonterovka; 4 – Zamozhnoye; 5 – Kremenevka; 6 – Ogorodnoye; 7 – Boguslav; 8 – Aleksandrovka; 9 – Verkhnaya Mayevka; 10 – Duma Skela; 11 – Zamozhnoye; 12 – Mikhailovka II.

This might suggest that the Steppe ancestry visible in samples from Progress-2 and Vonyuchka, sharing the same cluster with the Khvalynsk II cemetery commoner of hg. Q1, most likely represents North Caspian or Black Sea–Caspian steppe hunter-gatherer ancestry that increased as Khvalynsk settlers expanded to the south-west towards the Greater Caucasus, probably through female exogamy. That would mean that Steppe_Maykop potentially represents the ‘original’ ancestry of steppe hunter-gatherers of the North Caucasus steppes, which is also weakly supported by the available similar admixture of the Lola culture. The chronology, geographical location and admixture of both clusters seemed to indicate the opposite.

Modelling results for the Steppe and Caucasus cluster. Additional ‘eastern’ AG-Siberian gene flow in Steppe Maykop relative to Eneolithic Steppe. From Wang et al. (2019).

Due to the limitations of the currently available sampling and statistical tools, and barring the dubious Alexandria outlier, it is unclear how much of the late Trypillian-related admixture of late Repin (as reflected in Yamnaya and Afanasievo) corresponds to late Trypillian, Post-Stog, or Proto-Corded Ware groups from the north Pontic area. A mutual exchange suggestive of a common mating network (also supported by the mixed results obtained when including Khvalynsk_EN as source for early Corded Ware groups) seems to be the strongest proof to date of the Late Proto-Indo-European – Uralic contacts reflected in the period when post-laryngeal vocabulary was borrowed (with some samples predating the merged laryngeal loss), before the period of intense borrowing from Pre- and Proto-Indo-Iranian.

Between-group differences of Yamnaya samples are caused – like those between Corded Ware groups – by the admixture of a rapidly expanding society through exogamy with regional populations, evidenced by the inconstant affinities of western or southern outliers for previous local populations of the west Pontic or Caucasus area. This explanation for the gradual increase in local admixture is also supported by the strong, long-term patrilineal system and female exogamy practiced among expanding Proto-Indo-Europeans.

Groups of the Yamnaya culture and its western expansion after ca. 3100 BC, and Corded Ware after ca. 2900 BC. See full map.

Bell Beakers and Mycenaeans

This Eneolithic_Steppe ancestry is also found among Bell Beaker groups (see above). More specifically, all Bell Beaker groups prefer a source closest to a combination of Yamnaya from the Don and Baden LCA individuals from Hungary, rather than with Corded Ware and GAC, despite the quite likely admixture of western Yamnaya settlers with (1) south-eastern European (west Pontic, Balkan) Chalcolithic populations during their expansion through the Lower Danube and with (2) late Corded Ware groups (already admixed with GAC-like populations) during their expansion as East Bell Beakers:

Similarly, Mycenaeans show good fits for a source close to the Yamnaya outlier from Bulgaria:

Detail of the PCA of Eurasian samples, including Bell Beaker and Balkan EBA groups and related clusters, as well as outliers, including ancestral Yamnaya samples from Hungary (position inferred) and Bulgaria. Also marked are Minoans, Mycenaeans and Armenian BA samples. See full image.

You can read more on Yamnaya-related admixture of Bell Beakers and Mycenaeans, and on Afanasievo-related admixture of Iron Age Proto-Tocharians.


The use of the concept of “Yamnaya ancestry”, then “Steppe ancestry” (and now even “Yamnaya Steppe ancestry”?) has already permeated the ongoing research of all labs working with human population genomics. Somehow, the conventional use of Yamnaya_Samara samples opposed to a combination of other ancient samples – alternatively selected among WHG, EHG, CHG/Iran_N, Anatolia_N, or ANE – has spread and is now unquestionably accepted as one of the “three quite distinct” ancestral groups that admixed to form the ancestry of modern Europeans, which is a rather odd, simplistic and anachronistic description of prehistory…

It has now become evident that authors involved with the Proto-Indo-European homeland question – and the tightly intertwined one of the Proto-Uralic homeland – are going to dedicate a great part of the discussion of many future papers to correct or outright reject the conclusions of previous publications, instead of simply going forward with new data.

The most striking argument to mistrust the current use of “Steppe ancestry” (as an alternative name for Yamnaya_Samara, and not as ancestry proper of steppe hunter-gatherers) is not the apparent difference in direct Eneolithic sources of Steppe ancestry for Corded Ware and Yamnaya-related peoples – closer to the available samples classified as Steppe_Maykop and Eneolithic_Steppe, respectively – or their different evolution under marked Y-DNA bottlenecks.

It is not even the lack of information about the distant origin of these Pontic–Caspian steppe hunter-gatherers of the 5th and 4th millennium BC, with their shared ancestral component potentially separated during the warmer Palaeolithic-Mesolithic transition, when the steppes were settled, without necessarily sharing any meaningful recent history before the formation of the Proto-Indo-Uralic community.

NOTE. I have raised this question multiple times since 2017 (see e.g. here or here).

The most striking paradox about simplistically misinterpreting “Steppe ancestry” as representative of Indo-European expansions is that those sub-Neolithic Pontic–Caspian steppe hunter-gatherers that had this ancestry in the 6th millennium BC were probably non-Indo-European-speaking communities, most likely related to the North(West) Caucasian language family, based on the substrate of Indo-Anatolian that sets it apart from Uralic within the Indo-Uralic trunk, and on later contacts of Indo-Tocharian with North-West Caucasian and Kartvelian, the former probably represented by Maykop and its contact with the Repin and early Yamnaya cultures.

NOTE. For more on this, see Allan Bomhard’s recent paper on the Caucasian substrate hypothesis and its ongoing supplement Additional Proto-Indo-European/Northwest Caucasian Lexical Parallels.

“Spatiotemporal kriging of YAM steppe ancestry during the Holocene, using 5000 spatial grid points. The colors represent the predicted ancestry proportion at each point in the grid.” Image with evolution from ca. 2800 BC until the present day, modified from Racimo et al. (2019). The Copenhagen group considers the expansion of this component as representative of expanding Indo-Europeans…

This kind of error happens because we all – hence also authors, peer reviewers, and especially journal editors – love far-fetched conclusions and sensational titles, forgetting what a paper actually shows and – always more importantly in scientific reports – what it doesn’t show. This is particularly true when more than one field is involved and when extraordinary claims involve aspects foreign to the journal’s (and usually the authors’ own) main interests. One would have thought that the glottochronological fiasco published in Science in 2012 (open access in PMC) should have taught an important lesson to everyone involved. It didn’t, because apparently no one has felt the responsibility or the shame to retract that paper yet, even in the age of population genomics.

If anything, the excesses of mathematical linguistics – using computational methods to try and reconstruct phylogenetic trees – have perpetuated a form of misunderstood Scientism which blindly relies on a simple promise made by authors in the Materials and Methods section (rarely if ever kept beyond it) to use statistics rather than resorting to the harder, well-informed, comprehensive reasoning that is needed in the comparative method. After all, why should anyone invest hundreds of hours in (or simply show an interest in) learning about historical linguistics, about ancient Indo-European or Uralic languages, carefully arguing and discussing each and every detail of the reconstruction, when one can simply rely on one’s own gut to decide what is Science and what isn’t? When one can trust a promise that formulas have been used?

The conservative, null hypothesis when studying prehistoric Eurasian samples related to evolving cultures was universally understood as no migration, or “pots not people” (as most western archaeologists chose to believe until recently), whereas the alternative one should have been that there were in fact migration events, some of them potentially related to the expansion of Eurasian languages ancestral to the historically attested ones. Beyond this migrationist view there were obviously dozens of thorough theories concerning potential linguistic expansions associated with specific prehistoric cultures, and a myriad of less developed alternatives, all of which deserved to be evaluated after the null hypothesis had been rejected.

Despite the shortcomings of the 2015 papers and their lack of testing or discussion of different language expansion models, the spread of the so-called “Yamnaya ancestry” – an admixture especially prevalent (after the demise of the Yamnaya) among the most likely ancient Uralic-speaking groups as well as among modern Uralic speakers and recently acculturated groups from Eastern Europe – has been nevertheless invariably concluded by each lab to support the theories of their leading archaeologist, often combined with pre-aDNA theories of geneticists based on modern haplogroup distributions. This is as evident a case of confirmation bias, circular reasoning, and jumping to conclusions as it gets.

Why many researchers of other labs have chosen to follow such conclusions instead of challenging or simply ignoring them is difficult to understand.


Villabruna cluster in Late Epigravettian Sicily supports South Italian corridor for R1b-V88


New preprint Late Upper Palaeolithic hunter-gatherers in the Central Mediterranean: new archaeological and genetic data from the Late Epigravettian burial Oriente C (Favignana, Sicily), by Catalano et al. bioRxiv (2019).

Interesting excerpts (emphasis mine):

Grotta d’Oriente is a small coastal cave located on the island of Favignana, the largest (~20 km²) of a group of small islands forming the Egadi Archipelago, ~5 km from the NW coast of Sicily.

The Oriente C funeral pit opens in the lower portion of layer 7, specifically sublayer 7D. Two radiocarbon dates on charcoal from the sublayers 7D (12149±65 uncal. BP) and 7E (12132±80 uncal. BP) are consistent with the associated Late Epigravettian lithic assemblages (Lo Vetro and Martini, 2012; Martini et al., 2012b) and refer the burial to a period between about 14200-13800 cal. BP, when Favignana was connected to the main island (Agnesi et al., 1993; Antonioli et al., 2002; Mannino et al. 2014).

A-B) Geographic location of Grotta d’Oriente.

The anatomical features of Oriente C are close to those of Late Upper Palaeolithic populations of the Mediterranean and show strong affinity with other Palaeolithic individuals of Sicily. As suggested by Henke (1989) and Fabbri (1995) the hunter-gatherer populations were morphologically rather uniform.

Genetic analysis

We confirmed the originally reported mitochondrial haplogroup assignment of U2’3’4’7’8’9. This haplogroup is present in both pre- and post-LGM populations, but is rare by the Mesolithic, when U5 dominates (Posth et al. 2016).

Lipson et al. (2018) (their supplementary Figure S5.1) and Villalba-Mouco et al. (2019) (their Figure 2A) showed that European Late Palaeolithic and Mesolithic hunter-gatherers fall along two main axes of genetic variation. Multidimensional scaling (MDS) of f3-statistics shows that these axes form a “V” shape (Fig. 3). (…)
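For readers unfamiliar with the statistic behind such plots, here is a minimal sketch of an outgroup f3 computation, f3(O; A, B), taken as the mean over SNPs of (o − a)(o − b). The allele frequencies and population labels are invented for illustration, and the sketch omits the small-sample bias correction that standard tools apply. Larger values mean more shared drift between A and B since their split from the outgroup. A dissimilarity such as 1 − f3 is one common input for MDS, although that is not necessarily the transformation used in the preprint.

```python
from itertools import combinations

def outgroup_f3(outgroup, pop_a, pop_b):
    """Mean over SNPs of (o - a) * (o - b); no bias correction (simplification)."""
    terms = [(o - a) * (o - b) for o, a, b in zip(outgroup, pop_a, pop_b)]
    return sum(terms) / len(terms)

def f3_table(outgroup, pops):
    """Outgroup f3 for every unordered pair of populations in `pops`."""
    return {
        (a, b): outgroup_f3(outgroup, pops[a], pops[b])
        for a, b in combinations(sorted(pops), 2)
    }

# Hypothetical allele frequencies at four SNPs (not real data).
outgroup = [0.10, 0.50, 0.90, 0.30]
pops = {
    "A": [0.40, 0.20, 0.60, 0.70],
    "B": [0.45, 0.25, 0.55, 0.65],  # drifted together with A
    "C": [0.10, 0.60, 0.80, 0.20],  # stays close to the outgroup
}

table = f3_table(outgroup, pops)
for pair, value in sorted(table.items(), key=lambda kv: -kv[1]):
    print(pair, round(value, 5), "dissimilarity:", round(1 - value, 5))
```

In this toy table the pair (A, B) gets the highest f3, so it would sit closest together in an MDS plot built from these values.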

Focusing further on Oriente C, we find that it shares most drift with individuals from Northern Italy, Switzerland and Luxembourg, and less with individuals from Iberia, Scandinavia, and East and Southeast Europe (Fig. 4A-B). Shared drift decreases significantly with distance (Fig. 4C) and with time (Fig. 4D) although in a linear model of drift with distance and time as a covariate, only distance (p=1.3×10⁻⁶) and not time (p=0.11) is significant. Consistent with the overall E-W cline in hunter-gatherer ancestry, genetic distance to Oriente C increases more rapidly with longitude than latitude, although this may also be affected by geographic features. For example, Oriente C shares significantly more drift with the 8,000 year-old 1,400 km distant individual from Loschbour in Luxembourg (Lazaridis et al., 2014), than with the 9,000 year-old individual from Vela Spila in Croatia (Mathieson et al., 2018) only 700 km away as shown by the D-statistic (Patterson et al., 2012) D (Mbuti, Oriente C, Vela Spila, Villabruna); Z=3.42. Oriente C’s heterozygosity was slightly lower than Villabruna (14% lower at 1240k transversion sites), but this difference is not significant (bootstrap P=0.12).
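As a rough illustration of how a D-statistic and its Z-score are obtained, here is a minimal sketch using the allele-frequency form D(W, X; Y, Z) = Σ(w − x)(y − z) / Σ(w + x − 2wx)(y + z − 2yz), with a delete-one-block jackknife for the standard error. The frequencies are simulated, the function names are my own, and the equal-sized contiguous blocks are a simplification of the weighted, genetic-distance-based blocks used by standard software. With this sign convention, a positive D indicates excess allele sharing between X and Z (or between W and Y).

```python
import math
import random

def d_statistic(snps):
    """D(W, X; Y, Z) from per-SNP allele frequencies given as (w, x, y, z)."""
    num = sum((w - x) * (y - z) for w, x, y, z in snps)
    den = sum((w + x - 2 * w * x) * (y + z - 2 * y * z) for w, x, y, z in snps)
    return num / den

def jackknife_z(snps, n_blocks=20):
    """Delete-one-block jackknife over equal-sized contiguous blocks (simplified)."""
    size = len(snps) // n_blocks
    blocks = [snps[i * size:(i + 1) * size] for i in range(n_blocks)]
    d_full = d_statistic([s for b in blocks for s in b])
    leave_one_out = []
    for i in range(n_blocks):
        rest = [s for j, b in enumerate(blocks) if j != i for s in b]
        leave_one_out.append(d_statistic(rest))
    mean = sum(leave_one_out) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum((d - mean) ** 2 for d in leave_one_out)
    return d_full, d_full / math.sqrt(variance)

def simulate(n_snps=2000, seed=1):
    """Toy frequencies in which X and Z share extra drift (hypothetical data)."""
    rng = random.Random(seed)
    clip = lambda f: min(1.0, max(0.0, f))
    snps = []
    for _ in range(n_snps):
        p = rng.uniform(0.1, 0.9)       # ancestral frequency, also used for W
        shared = rng.gauss(0, 0.05)     # drift shared by X and Z only
        y = clip(p + rng.gauss(0, 0.05))
        x = clip(p + shared + rng.gauss(0, 0.03))
        z = clip(p + shared + rng.gauss(0, 0.03))
        snps.append((p, x, y, z))
    return snps

d, z_score = jackknife_z(simulate())
print(f"D = {d:.4f}, Z = {z_score:.2f}")
```

An absolute Z-score above about 3 is conventionally treated as significant, which is the sense in which the quoted Z=3.42 supports the asymmetry.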

Multidimensional scaling of outgroup f3-statistics for Late Upper Palaeolithic and Mesolithic hunter-gatherers.

Discussion and Conclusion

The robust record of radiocarbon dates proves that they reached Sicily not before 15-14 ka cal. BP, several millennia after the LGM peak. In our opinion, in fact, the hypothesis about an early colonization of Sicily by Aurignacians (Laplace, 1964; Chilardi et al., 1996) must be rejected, on the basis of a recent reinterpretation of the techno-typological features of the lithic industries from Riparo di Fontana Nuova (Martini et al., 2007; Lo Vetro and Martini, 2012; on this topic see also Di Maida et al., 2019).

These analyses have implications for understanding the origin and diffusion of the hunter-gatherers that inhabited Europe during the Late Upper Palaeolithic and Mesolithic. Our findings indicate that Oriente C shows a strong genetic relationship with Western European Late Upper Palaeolithic and Mesolithic hunter-gatherers, suggesting that the “Western hunter-gatherers” was a homogeneous population widely distributed in the Central Mediterranean, presumably as a consequence of continuous gene flow among different groups, or a range expansion following the LGM.

The same statistic as in A plotted with geographic position

The South Italian corridor

Once again, a hypothesis based on phylogeography – apart from scarce archaeological and palaeolinguistic data (“Semitic”-like topo-hydronymy and substrates in Europe) – seems to be confirmed step by step. Since the finding of the Villabruna individual of hg. R1b-L754 (likely R1b-V88, like south-eastern European lineages expanded with WHG ancestry), it has seemed quite likely that southern Europe was the origin of the expansion of R1b-V88 into Africa.

The most likely explanation for the presence of “archaic” R1b-V88 subclades among modern Sardinians was, therefore, that they represented a remnant from a Late Upper Palaeolithic/Early Mesolithic population that had not been replaced in subsequent migrations, and thus that the migration of these lineages into Northern Africa and the Green Sahara happened during a period when Italy was connected by a shallower Mediterranean (and more land connections) to Northern Africa.

Likely Late Epigravettian/Mesolithic expansion of R1b-V88 into Northern Africa. See full map.

Nevertheless, the arguments for a quite recent expansion of R1b-V88 through the Mediterranean and into Africa keep being repeated, probably based on ancestry from the few ancient (and many modern) populations that have been investigated to date, a simplistic approach prone to important errors that overarch whole migration models.

For example, in the recent paper by Marcus et al. (2019) the presence of these lineages among ancient Sardinians (from the late 4th millennium BC on) is interpreted as an expansion of R1b-V88 with the Cardial Neolithic based on their ancestry, disregarding the millennia-long gap between these samples and the presence of this haplogroup in Palaeolithic/Mesolithic Northern Iberia and Northern Italy, and the comparatively much earlier splits in the phylogenetic tree and dispersal among African populations.

Afroasiatic and Nostratic

I was asked recently if I really believed that we could reconstruct Proto-Nostratic and connect it with any ancestral population. My answer is simple: until the Chalcolithic – when the whole picture of Indo-Europeans, Uralians, Egyptians or Semites becomes quite clear – we have just very few (linguistic, archaeological, genetic) dots which we would like to connect, and we do so the best we can. The earlier the population and proto-language, the more difficult this task becomes.

NOTE. 1) I tentatively connected hg. R with Nostratic in a previous text – when it appeared that R1a expanded from around Lake Baikal, hence Eurasiatic; R1b from the south with AME-WHG ancestry, hence Afroasiatic; and R2 with Dravidian.

2) After that, I thought it was more likely to be connected to AME ancestry and the Middle East, because of the apparent expansion of WHG from south-eastern Europe, and the potential association of Afroasiatic and (Elamo-?)Dravidian to Middle Eastern populations.

3) However, after finding more and more R1b samples expanding through northern Eurasia, spreading through the (then wider) steppe regions; and R1a essentially surviving among other groups in eastern Europe for thousands of years without being associated to significant migrations (like, say, hg. C after the Palaeolithic), it didn’t seem like this division was accurate, hence my most recent version.

But, in essence, it’s all about connecting the dots, and we have very few of them…

Phylogenetic tree from Pagel et al. (2013), partially in agreement with Kortlandt’s view on Eurasiatic. “Consensus phylogenetic tree of Eurasiatic superfamily (A) superimposed on Eurasia and (B) rooted tree with estimated dates of origin of families and of superfamily. (A) Unrooted consensus tree with branch lengths (solid lines) shown to scale and illustrating the correspondence between the tree and the contemporary north-south and east-west geographical positions of these language families. Abbreviations: P (proto) followed by initials of language family: PD, proto-Dravidian; PK, proto-Kartvelian; PU, proto-Uralic; PIE, proto–Indo-European; PA, proto-Altaic; PCK, proto–Chukchi-Kamchatkan; PIY, proto–Inuit-Yupik. The dotted line to PIY extends the inferred branch length into the area in which Inuit-Yupik languages are currently spoken: it is not a measure of divergence. The cross-hatched line to PK indicates that branch has been shortened (compare with B). The branch to proto-Dravidian ends in an area that Dravidian populations are thought to have occupied before the arrival of Indo-Europeans (see main text). (B) Consensus tree rooted using proto-Dravidian as the outgroup. The age at the root is 14.45 ± 1.75 kya (95% CI = 11.72–18.38 kya) or a slightly older 15.61 ± 2.29 kya (95% CI = 11.72–20.40 kya) if the tree is rooted with proto-Kartvelian. The age assumes midpoint rooting along the branch leading to proto-Dravidian (rooting closer to PD would produce an older root, and vice versa), and takes into account uncertainty around proto–Indo-European date of 8,700 ± 544 (SD) y following ref. 35 and the PCK date of 692 ± 67 (SD) y ago.”

In linguistics, I trust traditional linguists who tend to trust other more experimental linguists (like Hyllested or Kortlandt) who consider that – in their experience – an Indo-Uralic and a Eurasiatic phylum can be reconstructed. Similarly, linguists like Kortlandt are apparently (partially) supportive of attempts like that of Allan Bomhard with Nostratic – although almost everyone is critical of the Muscovite school’s attachment to the Brugmannian reconstruction, stuck in pre-laryngeal Proto-Indo-Anatolian and similar archaisms.

I mostly use Nostratic as a way to give a simplistic ethnolinguistic label to the genetically related prehistoric peoples whose languages we will probably never know. I think it’s becoming clear that the strongest connection right now with the expansion of potential Eurasiatic dialects is offered by ANE-related populations (hence Y-chromosome bottlenecks under hg. R, Q, probably also N), however complicated the reconstruction of that hypothetical community (and its dialectalization) may be.

Therefore, the multiple expansions of lineages more or less closely associated to ANE-related peoples – like R1b-V88 in the case of Afrasian, or R2 in the case of Dravidians – are the easiest to link to the traditionally described Nostratic dialects and their highly hypothetical relationship.

Reconstruction of North African vegetation during past green Sahara periods. Estimated and reconstructed MAP for the Holocene GSP (6–10 kyr BP) projected onto a cross-section along the eastern Sahara (left panel) and map view of reconstructed MAP, vegetation and physiographic elements [7,8,11,45] (right panel). Image from Larrasoaña et al. (2013).

What should be clear to anyone is that the attempt of many modern Afroasiatic speakers to connect their language to their own (or their own community’s main) haplogroups, frequently E and/or J, is flawed for many reasons; it was simplistic in the 2000s, but it is absurd after the advent of ancient DNA investigation and more recent investigation on SNP mutation rates. R1b-V88 should have been on the table of discussions about the expansion of Afroasiatic communities through the Green Sahara long ago, whether one supports a Nostratic phylum or not.

The fact that the role of R1b bottlenecks and expansions in the spread of Afroasiatic is usually not even discussed despite their likely connection with the most recent population expansions through the Green Sahara fitting a reasonable time frame for Proto-Afroasiatic reconstruction, a reasonable geographical homeland, and a compatible dialectal division – unlike many other proposed (E or J) subclades – reveals (once again) a lot about the reasons behind amateur interest in genetics.

The same can be seen in the fixation in (and immobility of) recent writings about the role of I1, I2, or (more recently) R1a in the Proto-Indo-European expansion, R1b with Vasconic, or N1c with Proto-Uralic.

NOTE. That evident interest notwithstanding, it is undeniable that we have a much better understanding of the expansions of R1b subclades than other haplogroups, probably due in great part to the easier recovery of ancient DNA from Eurasia (and Europe in particular), for many different – sociopolitical, geographical, technological – reasons. It is quite possible that a more thorough temporal transect of ancient DNA from the Middle East and Africa might radically change our understanding of population movements, especially those related to the Afroasiatic expansion. I am referring in this post to interpretations based on the data we currently have, despite that potential R1b-based bias.


Uralic speakers formed clines of Corded Ware ancestry with WHG:ANE populations


The preprint by Jeong et al. (2018) has been published: The genetic history of admixture across inner Eurasia, Nature Ecol. Evol. (2019).

Interesting excerpts, referring mainly to Uralic peoples (emphasis mine):

A model-based clustering analysis using ADMIXTURE shows a similar pattern (Fig. 2b and Supplementary Fig. 3). Overall, the proportions of ancestry components associated with Eastern or Western Eurasians are well correlated with longitude in inner Eurasians (Fig. 3). Notable outliers include known historical migrants such as Kalmyks, Nogais and Dungans. The Uralic- and Yeniseian-speaking populations, as well as Russians from multiple locations, derive most of their Eastern Eurasian ancestry from a component most enriched in Nganasans, while Turkic/Mongolic speakers have this component together with another component most enriched in populations from the Russian Far East, such as Ulchi and Nivkh (Supplementary Fig. 3). Turkic/Mongolic speakers comprising the bottom-most cline have a distinct Western Eurasian ancestry profile: they have a high proportion of a component most enriched in Mesolithic Caucasus hunter-gatherers and Neolithic Iranians and frequently harbour another component enriched in present-day South Asians (Supplementary Fig. 4). Based on the PCA and ADMIXTURE results, we heuristically assigned inner Eurasians to three clines: the ‘forest-tundra’ cline includes Russians and all Uralic and Yeniseian speakers; the ‘steppe-forest’ cline includes Turkic- and Mongolic-speaking populations from the Volga and Altai–Sayan regions and Southern Siberia; and the ‘southern steppe’ cline includes the rest of the populations.

The first two PCs summarizing the genetic structure within 2,077 Eurasian individuals. The two PCs generally mirror geography. PC1 separates western and eastern Eurasian populations, with many inner Eurasians in the middle. PC2 separates eastern Eurasians along the north–south cline and also separates Europeans from West Asians. Ancient individuals (color-filled shapes), including two Botai individuals, are projected onto PCs calculated from present-day individuals.

For the forest-tundra populations, the Nganasan + Srubnaya model is adequate only for the two Volga region populations, Udmurts and Besermyans (Fig. 5 and Supplementary Table 8).

For the other populations west of the Urals, six from the northeastern corner of Europe are modelled with additional Mesolithic Western European hunter-gatherer (WHG) contribution (8.2–11.4%; Supplementary Table 8), while the rest need both WHG and early Neolithic European farmers (LBK_EN; Supplementary Table 2). Nganasan-related ancestry substantially contributes to their gene pools and cannot be removed from the model without a significant decrease in the model fit (4.1–29.0% contribution; χ2 P ≤ 1.68 × 10−5; Supplementary Table 8).

Supplementary Table 8. QpAdm-based admixture modeling of the forest-tundra cline populations. For the 13 populations west of the Urals, we present a four-way admixture model, Nganasan+Srubnaya+WHG+LBK_EN, or its minimal adequate subset. Modified from the article, to include colors for cultures, and underlined best models for Corded Ware ancestry among Uralians.
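To make the idea of a “minimal adequate subset” concrete, here is a minimal sketch of one way to pick it from a set of already fitted models. It assumes that each candidate source combination comes with a p-value and admixture coefficients from a tool such as qpAdm (nothing is fitted here), that a model counts as adequate when its p-value is at least 0.05 and all coefficients are non-negative, and that the smallest adequate combination is preferred. The numbers are invented and the selection rule is my own simplification, not necessarily the exact procedure followed by Jeong et al.

```python
from itertools import combinations

FULL_MODEL = ("Nganasan", "Srubnaya", "WHG", "LBK_EN")

def minimal_adequate(fits, sources=FULL_MODEL, alpha=0.05):
    """Return the smallest feasible source combination with p >= alpha.

    `fits` maps frozenset(sources) -> (p_value, {source: coefficient}).
    Ties at the same model size are broken by the highest p-value.
    """
    for k in range(1, len(sources) + 1):
        adequate = []
        for combo in combinations(sources, k):
            fit = fits.get(frozenset(combo))
            if fit is None:
                continue  # this combination was never fitted
            p_value, coefs = fit
            feasible = all(c >= 0 for c in coefs.values())
            if p_value >= alpha and feasible:
                adequate.append((p_value, combo))
        if adequate:
            return max(adequate)[1]
    return None

# Hypothetical fits for a single target population (not values from the paper).
fits = {
    frozenset({"Nganasan", "Srubnaya"}): (
        0.001, {"Nganasan": 0.10, "Srubnaya": 0.90}),
    frozenset({"Nganasan", "Srubnaya", "WHG"}): (
        0.21, {"Nganasan": 0.09, "Srubnaya": 0.81, "WHG": 0.10}),
    frozenset(FULL_MODEL): (
        0.48, {"Nganasan": 0.09, "Srubnaya": 0.78, "WHG": 0.10, "LBK_EN": 0.03}),
}

print(minimal_adequate(fits))
```

Here the two-way model is rejected, so the three-way model with WHG is returned even though the four-way model has a higher p-value, because a simpler adequate model is preferred over a more complex one.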

NOTE. It doesn’t seem like Hungarians can be easily modelled with Nganasan ancestry, though…

For the 4 populations east of the Urals (Enets, Selkups, Kets and Mansi), for which the above models are not adequate, Nganasan + Srubnaya + AG3 provides a good fit (χ2 P ≥ 0.018; Fig. 5 and Supplementary Table 8). Using early Bronze Age populations from the Baikal Lake region (‘Baikal_EBA’; Supplementary Table 2) as a reference instead of Nganasan, the two-way model of Baikal_EBA + Srubnaya provides a reasonable fit (χ2 P ≥ 0.016; Supplementary Table 8) and the three-way model of Baikal_EBA + Srubnaya + AG3 is adequate but with negative AG3 contribution for Enets and Mansi (χ2 P ≥ 0.460; Supplementary Table 8).

Supplementary Table 8. QpAdm-based admixture modeling of the forest-tundra cline populations. For the four populations east of the Urals, we present three admixture models: Baikal_EBA+Srubnaya, Baikal_EBA+Srubnaya+AG3 and Nganasan+Srubnaya+AG3. For each model, we present qpAdm p-value, admixture coefficient estimates and associated 5 cM jackknife standard errors (estimate ± SE). Modified from the article, to include colors for cultures, and underlined best models for Corded Ware ancestry among Uralians.

Bronze/Iron Age populations from Southern Siberia also show a similar ancestry composition with high ANE affinity (Supplementary Table 9). The additional ANE contribution beyond the Nganasan + Srubnaya model suggests a legacy from ANE-ancestry-rich clines before the Late Bronze Age.

Supplementary Table 9. QpAdm-based admixture modeling of Bronze and Iron Age populations of southern Siberia. For ancient individuals associated with Karasuk and Tagar cultures, Nganasan+Srubnaya model is insufficient. For all five groups, adding AG3 as the third ancestry or substituting Nganasan with Baikal_EBA with higher ANE affinity provides an adequate model. For each model, we present qpAdm p-value, admixture coefficient estimates and associated 5 cM jackknife standard errors (estimate ± SE). Models with p-value ≥ 0.05 are highlighted in bold face. Modified from the article, to include colors for cultures, and underlined best models for Corded Ware ancestry among Uralians.
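As a rough illustration of how tables like these can be read, here is a minimal Python sketch of the screening rule: a model is kept when its p-value reaches the chosen threshold and every admixture coefficient falls between 0 and 1. A negative estimate, like the AG3 contribution reported above for Enets and Mansi, is usually treated as a sign that the model is not feasible even when the p-value looks fine. The class name, the function name, and all numbers below are hypothetical; they are not taken from the supplementary tables, and this is not qpAdm itself.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class QpAdmModel:
    """One row of a qpAdm-style results table (hypothetical container)."""
    target: str
    p_value: float
    coefficients: Dict[str, float]  # source name -> estimated proportion


def is_adequate(model: QpAdmModel, p_threshold: float = 0.05) -> bool:
    """Keep a model only if it fits and all proportions lie in [0, 1]."""
    feasible = all(0.0 <= c <= 1.0 for c in model.coefficients.values())
    return model.p_value >= p_threshold and feasible


# Hypothetical values for illustration only, NOT copied from the tables.
models: List[QpAdmModel] = [
    # Good p-value, but a negative AG3 coefficient: rejected as infeasible.
    QpAdmModel("Target", 0.46, {"Baikal_EBA": 0.58, "Srubnaya": 0.47, "AG3": -0.05}),
    # Feasible proportions, but the p-value is below the default threshold.
    QpAdmModel("Target", 0.02, {"Nganasan": 0.60, "Srubnaya": 0.40}),
    # Passes both checks.
    QpAdmModel("Target", 0.21, {"Nganasan": 0.45, "Srubnaya": 0.40, "AG3": 0.15}),
]

accepted = [m for m in models if is_adequate(m)]
for m in accepted:
    print(m.target, m.p_value, m.coefficients)
```

The threshold is left as a parameter because the excerpts themselves quote different cut-offs (for instance χ2 P ≥ 0.016 for a "reasonable fit"), so the 0.05 default only mirrors the bold-face rule of the caption.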

Lara M. Cassidy comments on the results of the study in A steppe in the right direction (you can read it here):

Even among the earliest available inner Eurasian genomes, east–west connectivity is evident. These, too, form a longitudinal cline, characterized by the easterly increase of a distinct ancestry, labelled Ancient North Eurasian (ANE), lowest in western European hunter-gatherers (WHG) and highest in Palaeolithic Siberians from the Baikal region. Flow-through from this ANE cline is seen in steppe populations until at least the Bronze Age, including the world’s earliest known horse herders — the Botai. However, this is eroded over time by migration from west and east, following agricultural adoption on the continental peripheries (Fig. 1b,c).

Strikingly, Jeong et al. model the modern upper steppe cline as a simple two-way mixture between western Late Bronze Age herders and Northeast Asians (Fig. 1c), with no detectable residue from the older ANE cline. They propose modern steppe peoples were established mainly through migrations post-dating the Bronze Age, a sequence for which has been recently outlined using ancient genomes. In contrast, they confirm a substantial ANE legacy in modern Siberians of the northernmost cline, a pattern mirrored in excesses of WHG ancestry west of the Urals (Fig. 1b). This marks the inhospitable biome as a reservoir for older lineages, an indication that longstanding barriers to latitudinal movement may indeed be at work, reducing the penetrance of gene flows further south along the steppe.

The genomic formation of inner Eurasians. b–d, Depiction of the three main clines of ancestry identified among Inner Eurasians. Sources of admixture for each cline are represented using proxy ancient populations, both sampled and hypothesised, based on the study’s modelling results. The major eastern and western ancestries used to model each cline are shown in bold; the peripheral admixtures that gave rise to these are also shown. Additional contributions to subsections of each cline are marked with dashed lines. b, The northernmost cline, illustrating the legacy of WHG and ANE-related populations. c,d, The upper (c) and lower (d) steppe clines are shown, both of which have substantial eastern contributions related to modern Tungusic speakers. The authors propose these populations are themselves the result of an admixture between groups related to the Nganasan, whose ancestors potentially occupied a wider range, and hunter-gatherers (HGs) from the Amur River Basin. While the upper steppe cline in c can be described as a mixture between this eastern ancestry and western steppe herders, the current model for the southern steppe cline as shown in d is not adequate and is likely confounded by interactions with diverse bordering ancestries. Credit: Ecoregions 2017, Resolve https://ecoregions2017.appspot.com/

Given the findings as reported in the paper, I think it should be much easier to describe different subclines in the “northernmost cline” than in the much more recent “Turkic/Mongolic cline”, which is nevertheless subdivided in this paper in two clines. As an example, there are at least two obvious clines with “Nganasan-related meta-populations” among Uralians, which converge in a common Steppe MLBA (i.e. Corded Ware) ancestry – one with Palaeo-Laplandic peoples, and another one with different Palaeo-Siberian populations:

PCA of ancient and modern Eurasian samples. Ancient Palaeo-Laplandic, Palaeosiberian, and Altai clines drawn, with modern populations labelled. See a version with higher resolution.

The inclusion of certain Eurasian groups (or lack thereof) in the PCA doesn’t help to distinguish these subclines visually, and I guess the tiny “Nganasan-related” ancestral components found in some western populations (e.g. the famous ~5% among Estonians) probably don’t lend themselves easily to further subdivisions. Notice, nevertheless, the different components of the Eastern Eurasian source populations among Finno-Ugrians:

Characterization of the Western and Eastern Eurasian source ancestries in inner Eurasian populations. [Modified from the paper, includes only Uralic populations]. a, Admixture f3 values are compared for different Eastern Eurasian (Mixe, Nganasan and Ulchi; green) and Western Eurasian references (Srubnaya and Chalcolithic Iranians (Iran_ChL); red). For each target group, darker shades mark more negative f3 values. b, Weights of donor populations in two sources characterizing the main admixture signal (date 1 and PC1) in the GLOBETROTTER analysis. We merged 167 donor populations into 12 groups (top right). Target populations were split into five groups (from top to bottom): Aleuts; the forest-tundra cline populations; the steppe-forest cline populations; the southern steppe cline populations; and ‘others’.

Also remarkable is the lack of comparison of Uralic populations with other neighbouring ones, since the described Uralic-like ancestry of Russians was already known, and is most likely due to the recent acculturation of Uralic-speaking peoples in the cradle of Russians, right before their eastward expansions.

Supplementary Fig. 4. ADMIXTURE results qualitatively support PCA-based grouping of inner Eurasians into three clines. (A) Most southern steppe cline populations derive a higher proportion of their total Western Eurasian ancestry from a source related to Caucasus, Iran and South Asian populations. (B) Turkic- and Mongolic-speaking populations tend to derive their Eastern Eurasian ancestry more from the Devil’s Gate related one than from Nganasan-related one, while the opposite is true for Uralic- and Yeniseian-speakers. To estimate overall western Eurasian ancestry proportion, we sum up four components in our ADMIXTURE results (K=14), which are the dominant components in Neolithic Anatolians (“Anatolia_N”), Mesolithic western European hunter-gatherers (“WHG”), early Holocene Caucasus hunter-gatherers (“CHG”) and Mala from southern India, respectively. The “West / South Asian ancestry” is a fraction of it, calculated by summing up the last two components. To estimate overall Eastern Eurasian ancestry proportion, we sum up six components, most prevalent in Surui, Chipewyan, Itelmen, Nganasan, Atayal and early Neolithic Russian Far East individuals (“Devil’s Gate”).
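The bookkeeping described in this caption is simple enough to sketch in a few lines of Python. The component labels follow the caption, but the dictionary keys, the function names, and the example proportions are my own hypothetical choices, and reading the "West / South Asian" value as a share of total western ancestry is my interpretation of panel (A), not something the paper provides as code.

```python
import math

# Component groupings as described in the caption (K=14 run).
WEST = ["Anatolia_N", "WHG", "CHG", "Mala"]
WEST_SOUTH_ASIAN = ["CHG", "Mala"]  # "the last two components"
EAST = ["Surui", "Chipewyan", "Itelmen", "Nganasan", "Atayal", "Devils_Gate"]


def share(part, total):
    """Fraction of a total, or NaN when the total is zero."""
    return part / total if total > 0 else math.nan


def summarize(q):
    """Collapse one individual's ADMIXTURE proportions into cline summaries.

    q maps component label -> proportion; labels not listed above are ignored.
    """
    west = sum(q.get(k, 0.0) for k in WEST)
    east = sum(q.get(k, 0.0) for k in EAST)
    west_south_asian = sum(q.get(k, 0.0) for k in WEST_SOUTH_ASIAN)
    return {
        "west_total": west,
        "east_total": east,
        "west_south_asian_share": share(west_south_asian, west),
        "nganasan_share": share(q.get("Nganasan", 0.0), east),
        "devils_gate_share": share(q.get("Devils_Gate", 0.0), east),
    }


# Hypothetical proportions for one individual (sum to 1), for illustration only.
example = {
    "Anatolia_N": 0.20, "WHG": 0.10, "CHG": 0.05, "Mala": 0.05,
    "Nganasan": 0.30, "Devils_Gate": 0.10, "Itelmen": 0.10,
    "other_components": 0.10,
}
summary = summarize(example)
print(summary)
```

In this toy individual the Nganasan-related share of eastern ancestry exceeds the Devil’s Gate one, which is the direction the caption describes for Uralic and Yeniseian speakers.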

A comparison of Estonians and Finns with Balts, Scandinavians, and Eastern Europeans would have been more informative for the division of the different so-called “Nganasan-like meta-populations”, and to ascertain which one of these ancestral peoples along the ancient WHG:ANE cline could actually be connected (if at all) to the Cis-Urals.

Because, after all, based on linguistics and archaeology, geneticists are not supposed to be looking for populations from the North Asian Arctic region, for “Siberian ancestry”, or for haplogroup N1c (despite previous works by their peers), but for the Bronze Age Volga-Kama region…


Minimal gene flow from western pastoralists in the Bronze Age eastern steppes


Open access paper Bronze Age population dynamics and the rise of dairy pastoralism on the eastern Eurasian steppe, by Jeong et al. PNAS (2018).

Interesting excerpts (emphasis mine):

To understand the population history and context of dairy pastoralism in the eastern Eurasian steppe, we applied genomic and proteomic analyses to individuals buried in Late Bronze Age (LBA) burial mounds associated with the Deer Stone-Khirigsuur Complex (DSKC) in northern Mongolia. To date, DSKC sites contain the clearest and most direct evidence for animal pastoralism in the Eastern steppe before ca. 1200 BCE.

Most LBA Khövsgöls are projected on top of modern Tuvinians or Altaians, who reside in neighboring regions. In comparison with other ancient individuals, they are also close to but slightly displaced from temporally earlier Neolithic and Early Bronze Age (EBA) populations from the Shamanka II cemetery (Shamanka_EN and Shamanka_EBA, respectively) from the Lake Baikal region. However, when Native Americans are added to PC calculation, we observe that LBA Khövsgöls are displaced from modern neighbors toward Native Americans along PC2, occupying a space not overlapping with any contemporary population. Such an upward shift on PC2 is also observed in the ancient Baikal populations from the Neolithic to EBA and in the Bronze Age individuals from the Altai associated with Okunevo and Karasuk cultures.

Image modified from the article. Karasuk cluster in green, closely related to sample ARS026 in red. Principal Component Analysis (PCA) of selected 2,077 contemporary Eurasians belonging to 149 groups. Contemporary individuals are plotted using three-letter abbreviations for operational group IDs. Group IDs color coded by geographic region. Ancient Khövsgöl individuals and other selected ancient groups are represented on the plot by filled shapes. Ancient individuals are projected onto the PC space using the “lsqproject: YES” option in the smartpca program to minimize the impact of high genotype missing rate.
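The reason for the “lsqproject: YES” option mentioned in the caption can be shown with a toy example. As I understand it, the idea is to place a low-coverage sample on axes computed from modern individuals by solving a least-squares problem over the SNPs that were actually genotyped, instead of taking a plain dot product, which shrinks the sample toward the origin when many genotypes are missing. The sketch below is pure Python, skips the allele-frequency normalization that smartpca applies, and uses made-up loadings and genotypes; the function names are hypothetical and this is not the program's own implementation.

```python
def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("too few observed SNPs to place the sample")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def lsq_project(genotypes, means, loadings):
    """Least-squares PC coordinates using only observed SNPs (None = missing)."""
    k = len(loadings[0])
    ata = [[0.0] * k for _ in range(k)]
    atb = [0.0] * k
    for g, m, w in zip(genotypes, means, loadings):
        if g is None:
            continue
        for i in range(k):
            atb[i] += w[i] * (g - m)
            for j in range(k):
                ata[i][j] += w[i] * w[j]
    return solve_linear(ata, atb)


def naive_project(genotypes, means, loadings):
    """Plain dot product; missing SNPs silently contribute zero."""
    k = len(loadings[0])
    coords = [0.0] * k
    for g, m, w in zip(genotypes, means, loadings):
        if g is None:
            continue
        for i in range(k):
            coords[i] += w[i] * (g - m)
    return coords


# Toy setup (hypothetical): 4 SNPs, 2 orthonormal PC loadings, SNP means of 1.
loadings = [[0.5, 0.5], [0.5, -0.5], [0.5, 0.5], [0.5, -0.5]]
means = [1.0, 1.0, 1.0, 1.0]
# A sample whose true coordinates are (1.0, 0.4), with half its SNPs missing.
sample = [1.7, 1.3, None, None]

print(lsq_project(sample, means, loadings))
print(naive_project(sample, means, loadings))
```

With half of the SNPs missing, the dot product returns coordinates pulled halfway toward the origin, while the least-squares version recovers the position the sample would have had with complete data.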

(…) two individuals fall on the PC space markedly separated from the others: ARS017 is placed close to ancient and modern northeast Asians, such as early Neolithic individuals from the Devil’s Gate archaeological site (22) and present-day Nivhs from the Russian far east, while ARS026 falls midway between the main cluster and western Eurasians.

Upper Paleolithic Siberians from nearby Afontova Gora and Mal’ta archaeological sites (AG3 and MA-1, respectively) (25, 26) have the highest extra affinity with the main cluster compared with other groups, including the eastern outlier ARS017, the early Neolithic Shamanka_EN, and present-day Nganasans and Tuvinians (Z > 6.7 SE for AG3). Main cluster Khövsgöl individuals mostly belong to Siberian mitochondrial (A, B, C, D, and G) and Y (all Q1a but one N1c1a) haplogroups.

The genetic affinity of the Khövsgöl clusters measured by outgroup-f3 and -f4 statistics. (A) The top 20 populations sharing the highest amount of genetic drift with the Khövsgöl main cluster measured by f3(Mbuti; Khövsgöl, X). (B) The top 15 populations with the most extra affinity with each of the three Khövsgöl clusters in contrast to Tuvinian (for the main cluster) or to the main cluster (for the two outliers), measured by f4(Mbuti, X; Tuvinian/Khövsgöl, Khövsgöl/ARS017/ARS026). Ancient and contemporary groups are marked by squares and circles, respectively. Darker shades represent a larger f4 statistic.
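For readers unfamiliar with the statistic in panel (A), outgroup f3 is in essence the average, over SNPs, of the product of two allele-frequency differences measured from the outgroup: the larger the value, the more drift the two test populations share after splitting from the outgroup. A minimal sketch follows, assuming plain population allele frequencies and leaving out the finite-sample correction that real software applies; the function names and every frequency are invented for illustration.

```python
def outgroup_f3(p_out, p_a, p_b):
    """Mean of (o - a) * (o - b) over SNPs: drift shared by A and B."""
    terms = [(o - a) * (o - b) for o, a, b in zip(p_out, p_a, p_b)]
    return sum(terms) / len(terms)


def rank_by_shared_drift(p_out, p_target, candidates):
    """Sort candidate populations by outgroup f3 with the target, highest first."""
    scores = {name: outgroup_f3(p_out, p_target, freqs)
              for name, freqs in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical allele frequencies at four SNPs.
outgroup = [0.1, 0.5, 0.9, 0.3]
target = [0.4, 0.2, 0.6, 0.7]
candidates = {
    "PopNear": [0.5, 0.1, 0.5, 0.8],  # drifted in the same direction as the target
    "PopFar": [0.1, 0.6, 0.8, 0.2],   # stays close to the outgroup
}

ranking = rank_by_shared_drift(outgroup, target, candidates)
print(ranking)
```

A ranking like this is what the "top 20 populations" list in the figure represents, although the real analysis runs over hundreds of thousands of SNPs and reports standard errors.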

Previous studies show a close genetic relationship between WSH populations and ANE ancestry, as Yamnaya and Afanasievo are modeled as a roughly equal mixture of early Holocene Iranian/Caucasus ancestry (IRC) and Mesolithic Eastern European hunter-gatherers, the latter of which derive a large fraction of their ancestry from ANE. It is therefore important to pinpoint the source of ANE-related ancestry in the Khövsgöl gene pool: that is, whether it derives from a pre-Bronze Age ANE population (such as the one represented by AG3) or from a Bronze Age WSH population that has both ANE and IRC ancestry.

The amount of WSH contribution remains small (e.g., 6.4 ± 1.0% from Sintashta). Assuming that the early Neolithic populations of the Khövsgöl region resembled those of the nearby Baikal region, we conclude that the Khövsgöl main cluster obtained ∼11% of their ancestry from an ANE source during the Neolithic period and a much smaller contribution of WSH ancestry (4–7%) beginning in the early Bronze Age.

Admixture modeling of Altai populations and the Khövsgöl main cluster using qpAdm. For the archaeological populations, (A) Shamanka_EBA and (B and C) Khövsgöl, each colored block represents the proportion of ancestry derived from a corresponding ancestry source in the legend. Error bars show 1 SE. (A) Shamanka_EBA is modeled as a mixture of Shamanka_EN and AG3. The Khövsgöl main cluster is modeled as (B) a two-way admixture of Shamanka_EBA+Sintashta and (C) a three-way admixture Shamanka_EN+AG3+Sintashta.

Apparently, then, the first individual with substantial WSH ancestry in the Khövsgöl population (ARS026, of haplogroup R1a-Z2123), directly dated to 1130–900 BC, is consistent with the first appearance of admixed forest-steppe-related populations like Karasuk (ca. 1200-800 BC) in the Altai. Interestingly, haplogroup N1a1a-M178 pops up (with mtDNA U5a2d1) among the earlier Khövsgöl samples.

I will repeat what I wrote recently here: Samoyedic arrived in the Altai with Karasuk and hg R1a-Z645 + Steppe_MLBA-like ancestry, admixed with Altai populations, clustering thus within an Ancient Altai cline. Only later did N1a1a subclades infiltrate Samoyedic (and Ugric) populations, bringing them closer to their modern Palaeo-Siberian cline. The shared mtDNA may support an ancestral EHG-“Siberian” cline, or else a more recent Afanasievo-related origin.

Modified image from Jeong et al. (2018), supplementary materials. The first two PCs summarizing the genetic structure within 2,077 Eurasian individuals. The two PCs generally mirror geography. PC1 separates western and eastern Eurasian populations, with many inner Eurasians in the middle. PC2 separates eastern Eurasians along the north-south cline and also separates Europeans from West Asians. Ancient individuals (color-filled shapes), including two Botai individuals, are projected onto PCs calculated from present-day individuals. Read more.

Also interesting is that Q1a2 subclades and ANE ancestry make their appearance everywhere among ancestral Eurasian peoples, as Chetan recently pointed out.


Waves of Palaeolithic ANE ancestry driven by P subclades; new CWC-like Finnish Iron Age

New preprint The population history of northeastern Siberia since the Pleistocene, by Sikora et al. bioRxiv (2018).

Interesting excerpts (emphasis mine; most internal references removed):

ANE ancestry

The earliest, most secure archaeological evidence of human occupation of the region comes from the artefact-rich, high-latitude (~70° N) Yana RHS site dated to ~31.6 kya (…)

The Yana RHS human remains represent the earliest direct evidence of human presence in northeastern Siberia, a population we refer to as “Ancient North Siberians” (ANS). Both Yana RHS individuals were unrelated males, and belong to mitochondrial haplogroup U, predominant among ancient West Eurasian hunter-gatherers, and to Y chromosome haplogroup P1, ancestral to haplogroups Q and R, which are widespread among present-day Eurasians and Native Americans.

Symmetry tests using f4 statistics reject tree-like clade relationships with both Early West Eurasians (EWE; Sunghir) and Early East Asians (EEA; Tianyuan); however, Yana is genetically closer to EWE, despite its geographic location in northeastern Siberia

Using admixture graphs (qpGraph) and outgroup-based estimation of mixture proportions (qpAdm), we find that Yana can be modelled as EWE with ~25% contribution from EEA

Among all ancient individuals, Yana shares the most genetic drift with Mal’ta, and f4 statistics show that Mal’ta shares more alleles with Yana than with EWE (e.g. f4(Mbuti,Mal’ta;Sunghir,Yana) = 0.0019, Z = 3.99). Mal’ta and Yana also exhibit a similar pattern of genetic affinities to both EWE and EEA, consistent with previous studies. The ANE lineage can thus be considered a descendant of the ANS lineage, demonstrating that by 31.6 kya early representatives of this lineage were widespread across northern Eurasia, including far northeastern Siberia.
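To put the quoted numbers in context: the Z-score is the f4 estimate divided by its standard error, so f4 = 0.0019 with Z = 3.99 implies a standard error of roughly 0.0019 / 3.99 ≈ 0.00048. The sketch below shows the general shape of such a calculation, with f4(A, B; C, D) taken as the mean over SNPs of (a − b)(c − d) and the standard error obtained from a delete-one-block jackknife. It is a simplification under my own assumptions: blocks are equal-sized runs of consecutive SNPs and are unweighted, whereas real tools define blocks by genetic distance and weight them. The function names and allele frequencies are hypothetical.

```python
import math


def f4(pa, pb, pc, pd):
    """Mean over SNPs of (a - b) * (c - d)."""
    terms = [(a - b) * (c - d) for a, b, c, d in zip(pa, pb, pc, pd)]
    return sum(terms) / len(terms)


def f4_with_z(pa, pb, pc, pd, n_blocks):
    """Return (estimate, standard error, Z) using a delete-one-block jackknife."""
    if n_blocks < 2:
        raise ValueError("need at least two blocks")
    size = len(pa) // n_blocks
    if size == 0:
        raise ValueError("need at least one SNP per block")
    estimate = f4(pa, pb, pc, pd)
    leave_one_out = []
    for i in range(n_blocks):
        lo, hi = i * size, (i + 1) * size
        # Recompute f4 with block i removed from every population.
        kept = [xs[:lo] + xs[hi:] for xs in (pa, pb, pc, pd)]
        leave_one_out.append(f4(*kept))
    mean_loo = sum(leave_one_out) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum(
        (x - mean_loo) ** 2 for x in leave_one_out)
    se = math.sqrt(variance)
    if se == 0:
        raise ValueError("no variation between blocks; Z is undefined")
    return estimate, se, estimate / se


# Hypothetical allele frequencies at six SNPs (three blocks of two).
pop_a = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60]
pop_b = [0.30, 0.10, 0.50, 0.20, 0.70, 0.40]
pop_c = [0.50, 0.40, 0.60, 0.30, 0.80, 0.20]
pop_d = [0.20, 0.60, 0.30, 0.50, 0.40, 0.50]

estimate, se, z = f4_with_z(pop_a, pop_b, pop_c, pop_d, n_blocks=3)
print(estimate, se, z)
```

With the ordering used in the excerpt, f4(Mbuti, Mal’ta; Sunghir, Yana), a positive value means that Mal’ta shares more alleles with Yana than with Sunghir, which is how the authors read it.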


Ancient Palaeosiberian

(…) the 9.8 kya Kolyma1 individual, representing a group we term “Ancient Paleosiberians” (AP). Our results indicate that AP are derived from a first major genetic shift observed in the region. Principal component analysis (PCA), outgroup f3-statistics and mtDNA and Y chromosome haplogroups (G1b and Q1a1a, respectively) demonstrate a close affinity between AP and present-day Koryaks, Itelmen and Chukchis, as well as with Native Americans.

For both AP and Native Americans, ANS ancestry appears more closely related to Mal’ta than Yana, therefore rejecting a direct contribution of Yana to later AP or Native American groups.

Lake Baikal Neolithic – Bronze Age

(…) the newly reported genomes from Ust’Belaya and recently published neighbouring Neolithic and Bronze Age sites show a succession of three distinct genetic ancestries over a ~6 ky time span. The earliest individuals show predominantly East Asian ancestry, closely related to the ancient individuals from DGC. In the early Bronze Age (BA), we observe a resurgence of AP ancestry (up to ~50% ancestry fraction), as well as influence of West Eurasian Steppe ANE ancestry represented by the early BA individuals from Afanasievo in the Altai region (~10%). This is consistent with previous reports of gene flow from an unknown ANE-related source into Lake Baikal hunter-gatherers.

Our results suggest a southward expansion of AP as a possible source, which is also consistent with the replacement of Y chromosome lineages observed at Lake Baikal, from predominantly haplogroup N in the Neolithic to haplogroup Q in the BA. Finally, the most recent individual from Ust’Belaya, dated to ~600 years ago, falls along the Neosiberian cline, similar to the ~760 year-old ‘Young Yana’ individual from northeastern Siberia, demonstrating the widespread distribution of Neosiberian ancestry in the most recent epoch.

Genetic structure of ancient northeast Siberians. PCA of ancient individuals projected onto a set of modern Eurasian and American individuals. Abbreviations in group labels: UP – Upper Palaeolithic; LP – Late Palaeolithic; M – Mesolithic; EN – Early Neolithic; MN – Middle Neolithic; LN – Late Neolithic; EBA – Early Bronze Age; LBA – Late Bronze Age; IA – Iron Age; PE – Paleoeskimo; MED – Medieval

Finland Saami

At the western edge of northern Eurasia, genetic and strontium isotope data from ancient individuals at the Levänluhta site documents the presence of Saami ancestry in Southern Finland in the Late Holocene 1.5 kya. This ancestry component is currently limited to the northern fringes of the region, mirroring the pattern observed for AP ancestry in northeastern Siberia. However, while the ancient Saami individuals harbour East Asian ancestry, we find that this is better modelled by DGC rather than AP, suggesting that AP influence was likely restricted to the eastern side of the Urals. Comparison of ancient Finns and Saami with their present-day counterparts reveals additional gene flow over the past 1.6 kya, with evidence for West Eurasian admixture into modern Saami. The ancient Finn from Levänluhta shows lower Siberian ancestry than modern Finns.

EDIT (27 OCT 2018): By comparing the three, I see these are samples published already (at least two) in Lamnidis et al. (2018), but here with added (1) specific radiocarbon dates, (2) comparison with Neosiberian populations and (3) strontium isotope analyses.

Finnish_IA (ca. 350 AD) is probably a Saami-speaking individual, just like the Saami_IA with newly reported radiocarbon dates from Levänluhta ca. 400-600 AD (since Fennic peoples were then likely around the Gulf of Finland).

The conflicting strontium isotope data on marine dietary resources for certain samples in the supplementary material hint at a possible external origin of the diet of some of the previously reported (and possibly one newly reported) Saami Iron Age individuals, anywhere from some 25-30 km to the northwest along the river to hundreds of km to the southwest of Levänluhta (i.e. the whole coast of the Bothnian Sea). It is unclear why they would prefer an origin of the dietary source in southern Baltic regions instead of some km to the west, though, unless that’s what they want to propose based on the sample’s admixture…

The coast of the Bothnian Sea (=the northern part of the Baltic Sea, between Sweden and Finland) lay only 25-30 km to the northwest, and accessible to the Iron Age people of the Levänluhta region via the Kyrönjoki river. (…) For individual JA2065/DA236, the low 87Sr/86Sr value (0.71078) would imply an exceptionally heavy reliance on Baltic Sea resources. The δ13C and δ15N values of the individual are near comparable (especially considering within-Baltic latitudinal gradients in δ13C; Torniainen et al. 2017) to the δ13C and δ15N values of a Middle Neolithic population on the Baltic island of Gotland (Eriksson, 2004) interpreted to have subsisted primarily on seals.

These new data on the samples give us some more information than what we already had, because the early date of Finnish_IA implies that there was little East Asian admixture (if any at all) in west Finland during the Roman Iron Age, which pushes still farther forward in time the expected appearance of Siberian ancestry among Saamic (first) and Fennic populations (later). It is unclear whether this East Asian ancestry found in Finnish_IA is actually related to DGC, or it is rather related to the ENA-like ancestry found already in Baltic hunter-gatherers (i.e. in some EHG samples from Karelia), for which Baikal_EN is a good proxy in Lazaridis et al. (2018).

Since Bronze Age and Iron Age samples from Estonia show more Baltic_HG drift compared to Corded Ware samples, it is likely that this supposedly DGC-related ancestry (here considered part of the ‘Siberian ancestry’) is actually an EHG-related ENA component of north-east European hunter-gatherers, with whom Finno-Saamic peoples admixed during the expansion of the Corded Ware culture into Finland.

The paper thus finds increased (probably the actual) Siberian ancestry in modern Finns compared to this Iron Age Saami individual. Coupled with the later Saami Iron Age samples, from one to three centuries later (showing the start of Siberian ancestry influx), we can begin to establish when the expansion of Siberian ancestry happened in central Finland, and thus quite likely when the Saami began to expand to the north and east and admix with Palaeo-Laplandic peoples.

Admixture modelling using qpAdm. Maps showing locations and ancestry proportions of ancient (left) and modern (right) groups.

One sample of haplogroup N1a1a1a1a4a1-M1982, Yana_MED, is found in the Arctic region (north-eastern Yakutia) ca. 1100 AD. Since it is derived from N1a1a1a1a-L392, it might be a surprise for some to find it in a clearly non-Uralic speaking environment at the same time other subclades of this haplogroup were admixing in the west with well-established Finno-Saamic, Volga-Finnic, Ugric, and Samoyedic populations…

On the growing doubts that these data – contradicting the CWC=IE theory – are creating among geneticists (from the supplementary materials):

NOTE. This paper comes from the Copenhagen group, and is also signed by Kristiansen, one of today’s strongest supporters of this connection.

The Proto-Saami language evolved in southern Finland and Karelia in the Early Iron Age, an area now host to Finnish and the closely related Karelian, but with Saami toponyms showing that the latter two languages are intrusive here (Saarikivi 2004). Saami-speaking populations are thought to have retreated to Lapland during the Middle Iron Age (300–800 AD), where it diverged into the modern Saami dialects. Genetically, the northward retreat of the Saami language correlates with the documented decrease of Saami ancestry in Southern Finland between the Iron Age and the modern period (cf. Lamnidis et al. 2018).

On the way to Lapland, the Saami replaced at least two linguistically obscure groups. This can be inferred from 1) an influx of non-Uralic loanwords into Proto-Saami in the Finnish Lakeland area, and 2) an influx of non-Uralic, non-Germanic words into Saami dialects in Lapland (Aikio 2012). Both of these borrowing events imply contact with non-Saami-speaking groups, e.g. non-Uralic-speaking hunter-gatherers that may have left a genetic and linguistic footprint on modern Saami populations.

The linguistic prehistory of Finland thus does not allow for a straightforward interpretation of the genetic data. The detection of East Asian ancestry in the genetically Saami individual is indicative of a population movement from the east (cf. Lamnidis et al. 2018, Rootsi et al. 2007), one that given the affinities with the ~7.6 ky old individuals from the Devil’s Gate Cave may have been a western extension of the Neosiberian turnover. However, it remains unclear whether this gene flow should be associated with the arrival of Uralic speakers, thus providing further support for a Uralic homeland in Eastern Eurasia, or with an earlier immigration of pre-Uralic, so-called “Paleo-Lakelandic” groups.

I think the genetic interpretation is already straightforward, though. We had a sneak peek at how this late admixture with non-Uralians (mainly Palaeo-Lakelandic and Palaeo-Laplandic peoples from Lovozero and related asbestos ware cultures) is going to unfold among expanding Saami-speaking populations thanks to Lamnidis et al. (2018):

PCA plot of 113 Modern Eurasian populations, with individuals from this study projected on the principal components. Uralic speakers are highlighted in light purple. Image modified from Lamnidis et al. (2018)

Also, still no trace of R1a in far East Asia (reported as M17 ca. 5300 BC near Lake Baikal by Moussa et al. 2016), so I still have doubts about my previous assessment that R1a split into M17 (and thus also M417) in Siberia, with the groups that spread hunter-gatherer pottery.


Dzudzuana, Sidelkino, and the Caucasus contribution to the Pontic-Caspian steppe


It has been known for a long time that the Caucasus must have hosted many (at least partially) isolated populations, probably helped by geographical boundaries, setting it apart from open Eurasian areas.

David Reich writes in his book the following about India:

The genetic data told a clear story. Around a third of Indian groups experienced population bottlenecks as strong or stronger than the ones that occurred among Finns or Ashkenazi Jews. We later confirmed this finding in an even larger dataset that we collected working with Thangaraj: genetic data from more than 250 jati groups spread throughout India (…)

Rather than an invention of colonialism as Dirks suggested, long-term endogamy as embodied in India today in the institution of caste has been overwhelmingly important for millennia. (…)

The Han Chinese are truly a large population. They have been mixing freely for thousands of years. In contrast, there are few if any Indian groups that are demographically very large, and the degree of genetic differentiation among Indian jati groups living side by side in the same village is typically two to three times higher than the genetic differentiation between northern and southern Europeans. The truth is that India is composed of a large number of small populations.

There is little doubt now, based on findings spanning thousands of years, that the Mesolithic and Neolithic Caucasus hosted various very small populations, even if the ancestral components may be reduced to the few known to date (such as ANE, EHG, AME*, ENA, CHG, and other “deep” ancestral components).

NOTE. I will call the ancestral component of Dzudzuana/Anatolian hunter-gatherers Ancient Middle Easterner (AME), to give a clear idea of its likely extension during the Late Upper Palaeolithic, and to avoid using the more simplistic Dzudzuana, unless it is useful to mention these specific local samples.

Image modified from Lazaridis et al. (2018), including Caucasus, Don-Volga-Ural, and North Pontic Mesolithic-Neolithic populations. “Ancient West Eurasian population structure. (a) Geographical distribution of key ancient West Eurasian populations. (b) Temporal distribution of key ancient West Eurasian populations (approximate date in ky BP). (c) PCA of key ancient West Eurasians, including additional populations (shown with grey shells), in the space of outgroup f4-statistics (Methods).”

Genetic labs have a strong fixation with ancestry. I guess the use of complex statistical methods gives professionals and laymen alike the feeling of dealing with “Science”, as opposed to academic fields where you have to interpret data. I think language reveals a lot about the way people think, and the fact that ancestral components are called ‘lineages’ – while not wrong per se – is a clear symptom of the lack of interest in the true lineages: Y-DNA haplogroups.

Y-DNA bottlenecks

It has become quite clear that male-biased migrations are often the ones which can be confidently followed for actual population movements and ethnolinguistic identification, at least until the Iron Age. The frequently used Palaeolithic clusters offer a clear example of why ancestry does not represent what some people believe: They merely give a basic idea of sizeable population replacements by distant peoples.

Both concepts are important: sizeable and distant peoples. For example, during the Upper Palaeolithic in Europe there was a sizeable population replacement of the Aurignacian Goyet cluster by the Gravettian Vestonice cluster (probably from populations of far eastern Russia) coupled with the arrival of haplogroup I, although during the thousands of years that this material culture lasted, the previously expanded C1a2 lineages did not disappear, and there were probably different resurgence and admixture events.

Haplogroup I certainly expanded with the Gravettian culture to Iberia, where the Goyet ancestry did not change much (probably because of male-driven migrations), to the extent that during the Magdalenian expansions haplogroup I expanded with an ancestry closer to Goyet, in what is called a ‘resurgence’ of the Goyet cluster, even though there is a clear replacement of male lines.

The Villabruna (WHG) cluster is another good example. It probably spread with haplogroup R1b-L754, which – based on the extra ‘East Asian’ affinity of some samples and on modern samples from the Middle East – came probably from the east through a southern route, and not too long before the expansion of WHG likely from around the Black Sea, although this is still unclear. The finding of haplogroup I in samples of mostly WHG ancestry could confuse people that do not care about timing, sub-structured populations, and gene flow.

Image from David Reich’s Who We Are and How We Got Here. Having migrated out of Africa and the Near East, modern human pioneer populations spread throughout Eurasia (1). By at least thirty-nine thousand years ago, one group founded a lineage of European hunter-gatherers that persisted largely uninterrupted for more than twenty thousand years (2). Eventually, groups derived from an eastern branch of this founding population of European hunter-gatherers spread west (3), displaced previous groups, and were eventually themselves pushed out of northern Europe by the spread of glacial ice, shown at its maximum extent (top right). As the glaciers receded, western Europe was repeopled from the southwest (4) by a population that had managed to persist for tens of thousands of years and was related to an approximately thirty-five-thousand-year old individual from far western Europe. A later human migration, following the first strong warming period, had an even larger impact, with a spread from the southeast (5) that not only transformed the population of western Europe but also homogenized the populations of Europe and the Near East. At a single site – Goyet Caves in Belgium – ancient DNA from individuals spread over twenty thousand years reflects these transformations, with representatives from the Aurignacian, Gravettian, and Magdalenian periods.

NOTE. If you don’t understand why ‘clusters’ that span thousands of years don’t really matter for the many Palaeolithic population expansions that certainly happened among hunter-gatherers in Europe, just take a look at what happened with Bell Beakers expanding from Yamna into western Europe within 500 years.

If we don’t tread carefully when talking about population migrations, these terms are bound to confuse people, just as the fixation on “steppe ancestry” – which marks the arrival in Chalcolithic Europe of peoples from the Pontic-Caspian region – has confused a lot of researchers to this day.

When I began to write about the Indo-European demic diffusion model, my concern was to find a single spot where a North-West Indo-European proto-language could have expanded from ca. 2000 BC (our most common guesstimate). Based on the 2015 papers, and in spite of their conclusions, I thought it had become clear that Corded Ware was not it, and it was rather Bell Beakers. I assumed that Uralic was spoken to the north (as was the traditional belief), and thus Corded Ware expanded from the forest zone, hence steppe ancestry would also be found there with other R1a lineages.

With the publication of Mathieson et al. (2017) and Olalde et al. (2017), I changed my mind, seeing how “steppe ancestry” did in fact appear quite late, hence it was likely to be the result of very specific population movements, probably directly from the Caucasus. Later, Mathieson published in a revision the sample from Alexandria of hg R1a-M417 (probably R1a-Z645, possibly Z93+), which further supported the idea that the migration of Corded Ware peoples started near the North Pontic forest-steppe (as I included in the next revision).

The question remains the same one I repeated recently, though: where do the extra Caucasus components (i.e. beyond EHG) of Eneolithic Ukraine/Corded Ware and Khvalynsk/Yamna come from?

Steppe ancestry: “EHG” + “CHG”?

About EHG ancestry

From Lazaridis et al. (2018):

Considering 2-way mixtures, we can model Karelia_HG as deriving 34 ± 2.8% of its ancestry from a Villabruna-related source, with the remainder mainly from ANE represented by the AfontovaGora3 (AG3) sample from Lake Baikal ~17kya.
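As a quick arithmetic check on the quoted two-way model, the following minimal sketch derives the complementary share and a rough range for each source. It assumes that the ± 2.8% is one standard error, that the remainder can be treated as a single ANE-related source, and that the two shares sum to 100%. The function name is purely illustrative and does not come from the paper.

```python
def two_way_mixture(primary_pct, se_pct, z=2.0):
    """Return the complementary share and rough +/- z*SE ranges for both sources.

    Assumption: the two sources sum to 100%, so they share the same uncertainty.
    """
    remainder = 100.0 - primary_pct
    primary_range = (primary_pct - z * se_pct, primary_pct + z * se_pct)
    remainder_range = (remainder - z * se_pct, remainder + z * se_pct)
    return remainder, primary_range, remainder_range


# Values quoted from Lazaridis et al. (2018) for Karelia_HG.
remainder, villabruna_range, ane_range = two_way_mixture(34.0, 2.8)
print(f"Remaining (mainly ANE-related) share: {remainder:.1f}%")
print(f"Villabruna-related range: {villabruna_range[0]:.1f}-{villabruna_range[1]:.1f}%")
print(f"ANE-related range: {ane_range[0]:.1f}-{ane_range[1]:.1f}%")
```

Under these assumptions, roughly two thirds of Karelia_HG ancestry falls on the ANE-related side of the model.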

AG3 was likely of haplogroup Q1a (as reported by YFull, see Genetiker), and probably the ANE ancestry found in Eastern Europe accompanied a Palaeolithic migration of Q1a2-M25 (formed ca. 22600 BC, TMRCA ca. 14300 BC).

NOTE. You can read more about the expansion of Q lineages during the Palaeolithic.

Combined with what we know about the Eneolithic Steppe and Caucasus populations, it is likely that ANE ancestry remained the most important component of some of the small ghost populations of the Caucasus until their emergence with the Lola culture.

Image modified from Wang et al. (2018). Samples projected in PCA of 84 modern-day West Eurasian populations (open symbols). Previously known clusters have been marked and referenced. Marked and labelled are the Balkan samples referenced in this text. An EHG and a Caucasus ‘clouds’ have been drawn, leaving Pontic-Caspian steppe and derived groups between them. See the original file here. To understand the drawn potential Caucasus Mesolithic cluster, see above the PCA from Lazaridis et al. (2018).

The first sample we have now attributed to the EHG cluster is Sidelkino, from the Samara region (ca. 9300 BC), mtDNA U5a2. In Damgaard et al. (Science 2018), Yamnaya could be modelled as deriving from a CHG population related to Kotias Klde (54%), with the remainder from an ANE population related to Sidelkino (>46%), with the following split events:

  1. A split event, where the CHG component of Yamnaya splits from KK1. The model inferred this time at 27 kya (though we note the larger models in Sections S2.12.4 and S2.12.5 inferred a more recent split time).
  2. A split event, where the ANE component of Yamnaya splits from Sidelkino. This was inferred at about 11 kya.
  3. A split event, where the ANE component of Yamnaya splits from Botai. We inferred this to occur 17 kya. Note that this is above the Sidelkino split time, so our model infers Yamnaya to be more closely related to the EHG Sidelkino, as expected.
  4. An ancestral split event between the CHG and ANE ancestral populations. This was inferred to occur around 40 kya.
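The ordering of the split times listed above can be checked with a small sketch. The dictionary keys and the helper name are illustrative labels only, and the dates are simply the point estimates quoted from Damgaard et al. (2018), with no uncertainty attached.

```python
# Split times in thousands of years ago (kya), as quoted above.
split_kya = {
    ("Yamnaya_CHG", "KK1"): 27,
    ("Yamnaya_ANE", "Sidelkino"): 11,
    ("Yamnaya_ANE", "Botai"): 17,
    ("CHG", "ANE"): 40,
}


def closer_relative(component, first, second, splits=split_kya):
    """Return whichever population split from `component` more recently."""
    if splits[(component, first)] < splits[(component, second)]:
        return first
    return second


closest = closer_relative("Yamnaya_ANE", "Sidelkino", "Botai")

# The ancestral CHG/ANE split should predate every split within either branch.
deepest = split_kya[("CHG", "ANE")]
is_consistent = all(
    age < deepest for pair, age in split_kya.items() if pair != ("CHG", "ANE")
)

print(f"ANE component of Yamnaya is closer to: {closest}")
print(f"Split ordering is internally consistent: {is_consistent}")
```

A more recent split means a closer relationship, which is why the 11 kya Sidelkino split places the ANE component of Yamnaya nearer to EHG than to Botai.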

Other samples classified as of the EHG cluster:

  • Popovo2 (ca. 6250 BC) of hg J1, mtDNA U4d – Po2 and Po4 from the same site (ca. 6550 BC) show continuity of mtDNA.
  • Karelia_HG, from Juzhnii Oleni Ostrov (ca. 6300 BC): I0211/UzOO40 (ca. 6300 BC) of hg J1(xJ1a), mtDNA U4a; and I0061/UzOO74 of hg R1a1(xR1a1a), mtDNA C1
  • UzOO77 and UzOO76 from Juzhnii Oleni Ostrov (ca. 5250 BC) of mtDNA R1b.
  • Samara_HG from Lebyanzhinka (ca. 5600 BC) of hg R1b1a, mtDNA U5a1d.

From the analysis of Lazaridis et al. (2018), we have some details about their admixture:

Image modified from Lazaridis et al. (2018). Modeling present-day and ancient West-Eurasians. Mixture proportions computed with qpAdm (Supplementary Information section 4). The proportion of ‘Mbuti’ ancestry represents the total of ‘Deep’ ancestry from lineages that split prior to the split of Ust’Ishim, Tianyuan, and West Eurasians and can include both ‘Basal Eurasian’ and other (e.g., Sub-Saharan African) ancestry. (Left) ‘Conservative’ estimates. Each population cannot be modeled with fewer admixture events than shown. (Right) ‘Speculative’ estimates. The highest number of sources (≤5) with admixture estimates within [0,1] are shown for each population. Some of the admixture proportions are not significantly different from 0 (Supplementary Information section 4).

About Anatolia_Neolithic ancestry

About the enigmatic Anatolia_Neolithic-related ancestry found in Pontic-Caspian steppe samples, this is what Wang et al. (2018) had to say:

We focused on model of mixture of proximal sources such as CHG and Anatolian Chalcolithic for all six groups of the Caucasus cluster (Eneolithic Caucasus, Maykop and Late Maykop, Maykop-Novosvobodnaya, Kura-Araxes, and Dolmen LBA), with admixture proportions on a genetic cline of 40-72% Anatolian Chalcolithic related and 28-60% CHG related (Supplementary Table 7). When we explored Romania_EN and Greece_Neolithic individuals as alternative southeast European sources (30-46% and 36-49%), the CHG proportions increased to 54-70% and 51-64%, respectively. We hypothesize that alternative models, replacing the Anatolian Chalcolithic individual with yet unsampled populations from eastern Anatolia, South Caucasus or northern Mesopotamia, would probably also provide a fit to the data from some of the tested Caucasus groups.
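The ranges quoted above are two-way models, so each CHG range should be the complement of the range for the alternative source. A minimal sketch of that check follows; the dictionary layout and function name are illustrative, and the only data used are the percentages in the excerpt.

```python
# Ranges in percent, as quoted above: (alternative source range, CHG range).
quoted_models = {
    "Anatolian Chalcolithic": ((40, 72), (28, 60)),
    "Romania_EN": ((30, 46), (54, 70)),
    "Greece_Neolithic": ((36, 49), (51, 64)),
}


def complement_range(pct_range):
    """Return the range left over when a two-way model sums to 100%."""
    low, high = pct_range
    return (100 - high, 100 - low)


consistency = {
    source: complement_range(source_range) == chg_range
    for source, (source_range, chg_range) in quoted_models.items()
}

for source, ok in consistency.items():
    print(f"{source}: CHG range matches the complement -> {ok}")
```

All three pairs are complementary, which makes the shift visible: swapping the Anatolian Chalcolithic source for a southeast European one raises the minimum CHG-related share from 28% to over 50%.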


The first appearance of ‘Near Eastern farmer related ancestry’ in the steppe zone is evident in Steppe Maykop outliers. However, PCA results also suggest that Yamnaya and later groups of the West Eurasian steppe carry some farmer related ancestry as they are slightly shifted towards ‘European Neolithic groups’ in PC2 (Fig. 2D) compared to Eneolithic steppe. This is not the case for the preceding Eneolithic steppe individuals. The tilting cline is also confirmed by admixture f3-statistics, which provide statistically negative values for AG3 as one source and any Anatolian Neolithic related group as a second source
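To illustrate why a negative admixture f3-statistic is read as evidence of mixture, here is a toy sketch on simulated allele frequencies. It uses the naive form f3(C; A, B) = mean of (c - a)(c - b) and leaves out the finite-sample correction and standard errors used in practice. The populations, the mixing fraction, and the uniform random frequencies are all assumptions made for the example, not data from the paper.

```python
import random


def f3(target, source_a, source_b):
    """Naive f3(C; A, B): mean over SNPs of (c - a) * (c - b)."""
    products = [
        (c - a) * (c - b) for c, a, b in zip(target, source_a, source_b)
    ]
    return sum(products) / len(products)


random.seed(0)
n_snps = 1000

# Hypothetical allele frequencies for two unrelated source populations.
source_a = [random.random() for _ in range(n_snps)]
source_b = [random.random() for _ in range(n_snps)]

# A target formed as a 30/70 mixture of the two sources, with no later drift.
alpha = 0.3
admixed = [alpha * a + (1 - alpha) * b for a, b in zip(source_a, source_b)]

f3_admixed_target = f3(admixed, source_a, source_b)
f3_source_target = f3(source_a, admixed, source_b)

print(f"f3(admixed; A, B) = {f3_admixed_target:.4f}")
print(f"f3(A; admixed, B) = {f3_source_target:.4f}")
```

With the mixture as target, each term equals -alpha(1 - alpha)(a - b)^2, so the statistic cannot be positive. With a source as target, the same algebra gives a positive value.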

Modified image from Wang et al. (2018). In blue, Yamna-related populations. In red, Corded Ware-related populations, and two elevated Anatolia_Neolithic values in Yamna. Notice how only GAC-related admixture increases the Anatolian_N-related ancestry in the Yamna outlier from Ozero, and the late Yamna sample from Hungary, related to the homogeneous Yamna population. “Supplementary Table 14. P values of rank=3 and admixture proportions in modelling Steppe ancestry populations as a four-way admixture of distal sources EHG, CHG, Anatolian_Neolithic and WHG using 14 outgroups. Left populations: Steppe cluster, EHG, CHG, WHG, Anatolian_Neolithic. Right populations: Mbuti.DG, Ust_Ishim.DG, Kostenki14, MA1, Han.DG, Papuan.DG, Onge.DG, Villabruna, Vestonice16, ElMiron, Ethiopia_4500BP.SG, Karitiana.DG, Natufian, Iran_Ganj_Dareh_Neolithic.”

Detailed exploration via D-statistics in the form of D(EHG, steppe group; X, Mbuti) and D(Samara_Eneolithic, steppe group; X, Mbuti) show significantly negative D values for most of the steppe groups when X is a member of the Caucasus cluster or one of the Levant/Anatolia farmer-related groups (Supplementary Figs. 5 and 6). In addition, we used f- and D-statistics to explore the shared ancestry with Anatolian Neolithic as well as the reciprocal relationship between Anatolian- and Iranian farmer-related ancestry for all groups of our two main clusters and relevant adjacent regions (Supplementary Fig. 4). Here, we observe an increase in farmer-related ancestry (both Anatolian and Iranian) in our Steppe cluster, ranging from Eneolithic steppe to later groups. In Middle/Late Bronze Age groups especially to the north and east we observe a further increase of Anatolian farmer related ancestry consistent with previous studies of the Poltavka, Andronovo, Srubnaya and Sintashta groups and reflecting a different process not especially related to events in the Caucasus.
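The sign logic of D(EHG, steppe group; X, Mbuti) can be sketched on toy data using the usual allele-frequency form of the statistic. Everything here is an assumption made for illustration: the frequencies are independent random numbers, and the "steppe" group is built as a plain 60/40 mixture of the toy EHG and toy X populations. No real genotypes are involved, and real analyses also estimate standard errors.

```python
import random


def d_statistic(w, x, y, z):
    """Frequency-based D(W, X; Y, Z).

    Numerator: sum of (w - x) * (y - z) over SNPs.
    Denominator: sum of (w + x - 2wx) * (y + z - 2yz) over SNPs.
    """
    numerator = sum(
        (wi - xi) * (yi - zi) for wi, xi, yi, zi in zip(w, x, y, z)
    )
    denominator = sum(
        (wi + xi - 2 * wi * xi) * (yi + zi - 2 * yi * zi)
        for wi, xi, yi, zi in zip(w, x, y, z)
    )
    return numerator / denominator


random.seed(1)
n_snps = 2000

# Hypothetical, independent allele frequencies for three populations.
ehg = [random.random() for _ in range(n_snps)]
caucasus = [random.random() for _ in range(n_snps)]
mbuti = [random.random() for _ in range(n_snps)]

# Toy steppe group: 60% EHG-like plus 40% Caucasus-like ancestry.
steppe = [0.6 * e + 0.4 * c for e, c in zip(ehg, caucasus)]

d_steppe = d_statistic(ehg, steppe, caucasus, mbuti)
d_control = d_statistic(ehg, ehg, caucasus, mbuti)

print(f"D(EHG, steppe; Caucasus, Mbuti) = {d_steppe:.4f}")
print(f"D(EHG, EHG; Caucasus, Mbuti)    = {d_control:.4f}")
```

A negative value means the second population (the steppe group) shares more alleles with X than EHG does, which is the pattern the excerpt reports. The control with identical first and second populations gives zero.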

(…) Surprisingly, we found that a minimum of four streams of ancestry is needed to explain all eleven steppe ancestry groups tested, including previously published ones (Fig. 2; Supplementary Table 12). Importantly, our results show a subtle contribution of both Anatolian farmer-related ancestry and WHG-related ancestry (Fig.4; Supplementary Tables 13 and 14), which was likely contributed through Middle and Late Neolithic farming groups from adjacent regions in the West. The discovery of a quite old AME ancestry has rendered this probably unnecessary, because this admixture from an Anatolian-like ghost population could be driven even by small populations from the Caucasus.

Image modified from Wang et al. (2018). Marked are: in red, approximate limit of Anatolia_Neolithic ancestry found in Yamna populations; in blue, Corded Ware-related groups. “Modelling results for the Steppe and Caucasus cluster. Admixture proportions based on (temporally and geographically) distal and proximal models, showing additional Anatolian farmer-related ancestry in Steppe groups as well as additional gene flow from the south in some of the Steppe groups as well as the Caucasus groups (see also Supplementary Tables 10, 14 and 20).”

NOTE. For a detailed account of the possibilities regarding this differential admixture in the North Pontic area in contrast to the Don-Volga-Ural region, you can read the posts Sredni Stog, Proto-Corded Ware, and their “steppe admixture”, and Corded Ware culture origins: The Final Frontier.

While it is not yet fully clear, the increased Anatolian_Neolithic-like ancestry in Ukraine_Eneolithic samples (see below) makes it unlikely that all such ancestry in Corded Ware groups comes from a GAC-related contribution. It is likely that at least part of it represents contributions from populations of the Caucasus, based on the mostly westward population movements in the steppe from ca. 4600 BC on, including the Suvorovo-Novodanilovka expansion, and especially the Kuban-Maykop expansion during the final Eneolithic into the North Pontic area.

NOTE. Since CHG-like groups from the Caucasus may have combinations of AME and ANE ancestry similar to Yamna (which may thus appear as ‘steppe ancestry’ in the North Pontic area), it is impossible to interpret with precision the following ADMIXTURE graphic:

Modified image from Mathieson et al. (2018). Supervised ADMIXTURE analysis, modelling each ancient individual (one per row) as a mixture of population clusters constrained to contain northwestern-Anatolian Neolithic (grey), Yamnaya from Samara (yellow), EHG (pink) and WHG (green) populations. Dates in parentheses indicate approximate range of individuals in each population.

North-Eastern Technocomplex

The East Asian contribution to WHG samples (like Loschbour or La Braña), as specified in Fu et al. (2016), does not seem to be related to Baikal_EN, and appears possibly (in the ADMIXTURE analysis) integrated into the Villabruna component. I guess this implies that the shared alleles with East Asians are quite early, and potentially due to the expansion of R1b-L754 from the East.

It would be interesting to know the specific material culture Sidelkino belonged to (i.e. whether it was related to the expansion of the North-Eastern Technocomplex), and its Y-DNA. The Post-Swiderian expansion into eastern Europe, probably associated with the expansion of R1b-P297 lineages (including R1b-M73, found later in Botai and in Baltic HG), is supposed to have begun during the 11th millennium BC, but migrations to the Urals and beyond are probably concentrated in the 9th millennium, so this sample is possibly slightly early for R1b.

NOTE. User Rozenfeld at Anthrogenica posted this, which I think is interesting (in case anyone wants to try a Y-SNP call):

there is something strange with Sidelkino EHG: first, its archaeological context is not described in the supplementary. Second, its sex is not listed in the supplementary tables. Third, after looking for info about this sample, I found that: “Сиделькино-3. Для снятия вопроса о половой принадлежности индивида была проведена генетическая экспертиза, выявившая принадлежность останков мужчине.”(translation: Sidelkino-3. To resolve the question about sex of the remains, the genetic analysis was conducted, which showed that remains belonged to male), source: http://static.iea.ras.ru/books/7487_Traditsii.pdf

So either they haven’t mentioned his Y-DNA in the paper for some reason, or there are more than one Sidelkino sample and the male one has not yet been published. The coverage of the Sidelkino sample from the paper is 2.9, more than enough to tell Y-DNA haplogroup.

The map of spreading of Post-Swiderian and Post-Krasnosillian sites in Mesolithic of Eastern Europe in the 8th millennium BC. From Zaliznyak (see here).

My speculative guess right now about specific population movements in far eastern Europe, based on the few data we have:

  • The expansion of the North-Eastern Technocomplex first around the 9th millennium BC, most likely expanded R1b-P297 ca. 11300 BC, judging by its TMRCA, with both R1b-M73 (TMRCA 5300 BC) and R1b-M269 (TMRCA 4400 BC) info (with extra El Mirón ancestry) back, and thus Eurasiatic.
  • The expansion of haplogroup J1 to the north may have happened before or after the R1b-P297 expansion. Judging by the increase in AG3-related ancestry near Karelia compared to Baltic_HG, it is possible that it expanded just after R1b-P297 (hence possibly J1-Y6304? TMRCA 9700 BC). Its long-lasting presence in the Caucasus is supported by the Satsurblia (ca. 11300 BC) and the Dolmen BA (ca. 1300 BC) samples.
  • The expansion of R1a-M17 ca. 6600 BC is still likely to have happened from the east, based on the R1a-M17 samples found in Baikalic cultures slightly later (ca. 5300 BC). The presence of elevated Baikal_EN ancestry in Karelia HG and in Samara HG, and the finding of R1a-M417 samples in the Forest Zone after the Mesolithic suggest a connection with the expansion of Hunter-Gatherer pottery, from the Elshanka culture in the Samara region northward into the Forest Zone and westward into the North Pontic area.
  • The expansion of R1b-M73 ca. 5300 BC is likely to be associated with the emergence of a group east of the Urals (related to the later Botai culture, and potentially Pre-Yukaghir). Its presence in a Narva sample from Donkalnis (ca. 5200 BC) suggests either an early split and spread of both R1b-P297 lineages (M73 and M269) through Eastern Europe, or maybe a back-migration with hunter-gatherer pottery.
  • R1b-M269 spread successfully ca. 4400 BC (and R1b-L23 ca. 4100 BC, both based on TMRCA), and this successful expansion is probably to be associated with the Khvalynsk-Novodanilovka expansion. We already know that Samara_HG ca. 5600 BC was R1b1a, so it is likely that R1b-M269 appeared (or ‘resurged’) in the Volga-Ural region shortly after the expansion of R1a-M17, whose expansion through the region may be inferred by the additional AG3 and Baikal_EN ancestry. Interesting from Samara_HG compared to the previous Sidelkino sample is the introduction of more El Mirón-related ancestry, typical of WHG populations (and thus characteristic of Baltic groups).

NOTE. The TMRCA dates are obviously gross approximations, because a) the actual rate of mutation is unknown and b) TMRCA estimates are based on the convergence of lineages that survived. The potential finding of R1a-Z645 (possibly Z93+) in Ukraine Eneolithic (ca. 4000 BC), and the potential finding of R1b-L23 in Khvalynsk ca. 4250 BC complicate things further, in terms of dates and origins of any subclade.

The question thus remains as it was long ago: did R1b-M269 lineages expand (‘return’) from the east, near the Urals, or directly from the north? Were they already near Samara at the same time as the expansion of hunter-gatherer pottery, and were not much affected by it? Or did they ‘resurge’ from populations admixed with Caucasus-related ancestry after the expansion of R1a-M17 with this pottery (since there are different stepped expansions from the Samara region)? We could even ask, did R1a-M17 really expand from the east, i.e. are the dates on Baikalic subclades from Moussa et al. (2016) reliable? Or did R1a-M17 expand from some pockets in the Pontic-Caspian steppe, taking over the expansion of HG pottery at some point?

Early Neolithic cultures in eastern and central Europe: 1–Yelshanian; 2–North Caspian; 3–Rakushechnyj Yar; 4–Surskian; 5–Dnieper-Donetsian; 6– Bug-Dniesterian; 7–Upper Volga; 8–Narvian; 9–Linear Pottery. White arrows: expansion of early farming; black arrows: spread of pottery-making traditions. From Dolukhanov et al. (2009).

Maglemose-related migrations

The most interesting aspect from the new paper (regarding Indo-Uralic migrations) is that Ancestral Middle Easterner ancestry will probably be a better proxy for the Anatolia_Neolithic component found in Ukraine Mesolithic to Eneolithic, and possibly also for some of the “more CHG-like” component found among Pontic-Caspian steppe populations, all likely derived from different admixture events with groups from the Caucasus.

NOTE. Even the supposed gene flow of Neolithic Iranian ancestry into the Caucasus can be put into question, since that means possibly a Dzudzuana-like population with greater “deep ancestry” proportion than the one found in CHG, which may still be found within the Caucasus.

If it was not clear already that following ‘steppe ancestry’ wherever it appears is a rather lame way of following Indo-European migrations, every single sample from the Caucasus and their admixture with Pontic-Caspian steppe populations will probably show that “steppe ancestry” is in fact formed by a variety of steppe-related ancestral components, impossible to follow coherently with a single population. Exactly what is happening already with the Siberian ancestry.

If the paper on the Dzudzuana samples has shown something, it is that the expansion of an ANE-like population shook the entire Caucasus area up to the Zagros Mountains, creating the ANE – AME cline formed by CHG and Iran_N, with further contributions of “deep ancestries” (probably from the south) complicating the picture further.

If this happens with few known samples, and we know of an ANE-like ghost population in the Caucasus (appearing later in the Lola culture), we can already guess that the often repeated “CHG component” found in Ukraine_Eneolithic and Khvalynsk will not be the same (except the part mediated by the Novodanilovka expansion).

This ANE-like expansion happened probably in the Late Upper Palaeolithic, and reached Northern Europe probably after the expansion of the Villabruna cluster (ca. 12000 BC), judging by the advance of AG3-like and ENA-like ancestry in later WHG samples.

The population movements during the Mesolithic and Early Neolithic in the North Pontic area are quite complicated: the extra AME ancestry is probably connected to the admixture with populations from the Caucasus, while the close similarity of Ukraine populations with Scandinavian ones (with an increase in Villabruna ancestry from Mesolithic to Neolithic samples) probably reveals population movements related to the expansion of Maglemose-related groups.

Ethno-cultural situation in Central and Eastern Europe in the Late Mesolithic – Early Neolithic (VI–V Mill. BC) (after Конча 2004: 201, map 1; made after ideas by L. L. Zaliznyak). Legend: 1 – Maglemose circle in the VII Mill. BC (after Gr. Clark); 2–7 – Mesolithic cultures of the Post-Maglemose tradition, VI Mill. BC (after S. Kozłowsky, L. L. Zaliznyak): 2 – de Leyen-Wartena; 3 – Oldesloe – Godenaa; 4 – Chojnice – Peńki; 5 – Janisłavice; 6 – finds of Janisłavice artefacts outside of the main area; 7 – Donets culture; 8 – directions of the settling of Janisłavice people (after S. Kozłowsky and L. L. Zaliznyak); 9 – the south border of Mesolithic and Early Neolithic cultures of post-Swiderian and post-Arensburgian traditions; 10 – northern border of settlement of the Balkan-Danubian farmers; 11 – Bug-Dniester culture; 12 – Neolithic cultures emerged on the ethno-cultural basis of post-Maglemose: Э – Ertebölle-Ellerbeck, Н – Neman, Д – Dnieper-Donets, М – Mariupol (western variants). From Klein (2017).

These Maglemose-related groups were probably migrants from the north-west, originally from the Northern European Plains, who occupied the previous Swiderian territory, and then expanded into the North Pontic area. The overwhelming presence of I2a (likely all I2a2a1b1b) lineages in Ukraine Neolithic supports this migration.

The likely picture of Mesolithic-Neolithic migrations in the North Pontic area right now is then:

  1. Expansion of R1a-M459 from the east ca. 12000 BC – probably coupled with AG3 and also some Baikal_EN ancestry. First sample is I1819 from Vasilievka (ca. 8700 BC), another is from Dereivka ca. 6900 BC.
  2. Expansion of R1b-V88 from the Balkans in the west ca. 9700 BC, based on its TMRCA and also the Balkan hunter-gatherer population overwhelmingly of this haplogroup from the 10th millennium until the Neolithic. First sample is I1734 from Vasilievka (ca. 7252 BC), which suggests that it replaced the male population there, based on their similar EHG-like admixture (and lack of sizeable WHG increase), and shared mtDNA U5b2, U5a2.
  3. Expansion of I2a-Y5606 probably ca. 6800 BC based on its TMRCA with Janislawice culture. Supporting this is the increase in WHG contribution to Neolithic samples, including the spread of U4 subclades compared to the previous period.
  4. Expansion of R1a-M17 starting probably ca. 6600 BC in the east (see above).

NOTE. The first sample of haplogroup I appears in the Mesolithic: I1763 (ca. 8100 BC) of haplogroup I2a1, probably related to an older Upper Palaeolithic expansion.

Distribution of archeological cultures in the North Pontic Region during the Mesolithic (7th – 6th millennium BCE). Dotted, dashed and solid lines with corresponding arrows indicate alternative models of the spread of the Grebenyky culture groups. (After Bryuako IV., Samojlova TL., Eds, Drevnie kul’tury Severo-Zapadnogo Prichernomor’ya, Odessa: SMIL, 2013.) Nikitin – Ivanova 2017.


It is becoming more and more clear with each new paper that – unless the number of very ancient samples increases – the use of Y-chromosome haplogroups remains one of the most important tools for academics; this is especially so in the steppes, in light of the diversity found in populations from the Caucasus. A clear example comes from the Yamna – Corded Ware similarities:

After the publication of the 2015 papers, it was likely that Yamna expanded with haplogroup R1b-L23, but it has only become crystal clear that Yamna expanded through the steppes into Bell Beakers, now that we have data about the strict genetic homogeneity of the whole Yamna population from west to east (including Afanasevo), in contrast with contemporary Corded Ware peoples which expanded from a different forest-steppe population.

The presence of haplogroups Q and R1a-M459 (xM17) in Khvalynsk along with a R1b1a sample, which some interpreted as being akin to modern ‘mixed’ populations in the past, is likely to point instead to a period of Khvalynsk-Novodanilovka expansion with R1b-M269, where different small populations from the steppe were being integrated into the common Khvalynsk stock, but where differences are seen in material culture surrounding their burials, as supported by the finding of R1b1 in the Kuban area already in the first half of the 5th millennium. The case would be similar to the early ‘mixed’ Icelandic population.

Only with the emergence of the Samara culture (in the second half of the 6th millennium BC), with a sample of haplogroup R1b1a, does the obvious connection with Early Proto-Indo-Europeans start; and only after the appearance of late Sredni Stog and haplogroup R1a-M417 (ca. 4000 BC) is its connection with Uralic also clear. In previous population movements, I think more haplogroups were involved in migrations of small groups, and only some communities among them were eventually successful, expanding to be dominant, creating ever growing cultures during their expansions.

Indeed, if you think in terms of Uralic and Indo-European just as converging languages, and forget their potential genetic connection, then the genetic + linguistic picture becomes simplified, and the upper frontier of the 6th millennium BC with a division North Pontic (Mariupol) vs. Volga-Ural (Samara) is enough. However, tracing their movements backwards – with cultural expansions from west to east (with the expansion of farming), and earlier east to west (with hunter-gatherer pottery), and still earlier west to east (with the north-eastern technocomplex), offers an interesting way to prove their potential connection to macrofamilies, at least in terms of population movements.

Modified image from Tambets et al. (2018). Proportions of ancestral components in studied European and Siberian populations and the tested qpGraph model. a The qpGraph model fitting the data for the tested populations. Colour codes for the terminal nodes: pink—modern populations (‘Population X’ refers to test population) and yellow—ancient populations (aDNA samples and their pools). Nodes coloured other than pink or yellow are hypothetical intermediate populations. We putatively named nodes which we used as admixture sources using the main recipient among known populations. The colours of intermediate nodes on the qpGraph model match those on the admixture proportions panel. The NeolL (Neolithic Levant) ancestry selected in this qpGraph is likely to correspond (at least in part) to a specific Dzudzuana-like component present in the CHG-like population that admixed in the North Pontic area.

I am quite convinced right now that it would be possible to connect the expansion of R1b-L754 subclades with a speculative Nostratic (given the R1b-V88 connection with Afroasiatic, and the obvious connection of R1b-P297 with Eurasiatic). Paradoxically, the connection of an Indo-Uralic community in the steppes (after the separation of Yukaghir) with any lineage expansion (R1a-M17, R1b-M269, or even Q, I or J1) seems somehow blurrier than one year ago, possibly just because there are too many open possibilities.

David Reich says about the admixture with Neanderthals, which he helped discover:

At the conclusion of the Neanderthal genome project, I am still amazed by the surprises we encountered. Having found the first evidence of interbreeding between Neanderthals and modern humans, I continue to have nightmares that the finding is some kind of mistake. But the data are sternly consistent: the evidence for Neanderthal interbreeding turns out to be everywhere. As we continue to do genetic work, we keep encountering more and more patterns that reflect the extraordinary impact this interbreeding has had on the genomes of people living today.

I think this is a shared feeling among many of us who have made proposals about anything, to fear that we have made a gross, evident mistake, and constantly look for flaws. However, it seems to me that geneticists are more preoccupied with being wrong in their developed statistical methods, in the theoretical models they are creating, and not so much about errors in the true ancient ethnolinguistic picture human population genetics is (at least in theory) concerned about. Their publications are, after all, constantly associating genetic finds with cultures and (whenever possible) languages, so this aspect of their research should not be taken lightly.

Seeing how David Anthony or Razib Khan (among many others) have changed their previously preferred migration models as new data was published, and they continue to be respected in their own fields, I guess we can be confident that professionals with integrity are going to accept whatever new picture appears. While I don’t think that genetic finds can change what we can reconstruct with comparative grammar, I am also ready to revise guesstimates and routes of expansion of certain dialects if R1a-Z645 is shown to have accompanied Late Proto-Indo-Europeans during their expansion with Yamna, and later integrated somehow with Corded Ware.

However, taking into account the obsession of some with an ancestral, uninterrupted R1a—Indo-European association, and the lack of actual political repercussion of Neanderthal admixture, I think the most common nightmare that all genetic researchers should be worried about is to keep inflating this “Yamnaya ancestry”-based hornet’s nest, which has been constantly stirred up for the past two years, by rejecting it – or, rather, specifying it into its true complex nature.

This succession of corrections and redefinitions, coupled with the distinct Y-DNA bottleneck of each steppe population, will eventually lead to a completely different ethnolinguistic picture of the Pontic-Caspian region during the Eneolithic, which is likely to eventually piss off not only reasonable academics stubbornly attached to the CWC-IE idea, but also a part of those interested in daydreaming about their patrilineal ancestors.

Sometimes it’s better to just rip off the band-aid once and for all…

Featured image from The oldest pottery in hunter-gatherer communities and models of Neolithisation of Eastern Europe (2015), by Andrey Mazurkevich and Ekaterina Dolbunova.


Interesting is today’s post in Ancient DNA Era: Is Male-driven Genetic Replacement always meaning Language-shift?

The genetic makings of South Asia – IVC as Proto-Dravidian


Review (behind paywall) The genetic makings of South Asia, by Metspalu, Mondal, and Chaubey, Current Opinion in Genetics & Development (2018) 53:128-133.

Interesting excerpts (emphasis mine):

(…) the spread of agriculture in Europe was a result of the demic diffusion of early Anatolian farmers, it was discovered that the spread of agriculture to South Asia was mediated by a genetically completely different farmer population in the Zagros mountains in contemporary Iran (IF). The ANI-ASI cline itself was interpreted as a mixture of three components genetically related to Iranian agriculturalists, Onge and Early and Middle Bronze Age Steppe populations (Steppe_EMBA).

The first ever autosomal aDNA from South Asia comes from Northern Pakistan (Swat Valley, early Iron Age). This study presented altogether 362 aDNA samples from the broad South and Central Asia and contributes substantially to our understanding of the evolutionary past of South and Central Asia. The study redefines the three genetic strata that form the basis of the Indian Cline. The Indus Periphery (IP) component is composed of (varying proportions of): first, IF, second, Ancient Ancestral South Asians (AASI), which represents an ancient branch of human genetic variation in Asia arising from a population split contemporaneous with the splits of East Asian, Onge and Australian Aboriginal ancestors and third, West_Siberian Hunter gatherers (WS_HG).

The authors argue that IP could have formed the genetic base of the Indus Valley Civilization (IVC). Upon the collapse of the IVC IP contributes to the formation of both ASI and ANI. ASI is formed as IP admixes further with AASI. ANI in turn forms when IP admixes with the incoming Middle and Late Bronze Age Steppe (Steppe_MLBA) component, (rather than the Steppe_EMBA groups suggested earlier)

A sketch of the peopling history of South Asia. Depicting the full complexity of available reconstructions is not attempted. Placing of population labels does not indicate precise geographic location or range of the population in question. Rather we aim to highlight the essentials of the recent advancements in the field. We divide the scenario into three time horizons: Panels (a) before 10 000 BCE (pre agriculture era.); (b) 10 000 BCE to 3000 BCE (agriculture era) and (c) 3000 BCE to prehistoric era/modern era. (iron age).

Dating of the arrival of the Austro-Asiatic speakers in South Asia based on Y chromosome haplogroup O2a1-M95 expansion estimates yielded dates between 3000 and 2000 BCE [30]. However, admixture LD decay-based approach on genome-wide data suggests the admixture between South Asian and incoming Austro-Asiatic speakers occurred slightly later between 1800 and 0 BCE (Tätte et al. submitted). It is interesting that while the mtDNA variants of the Mundas are completely South Asian, the Y chromosome variation is dominated at >60% by haplogroup O2a which is phylogeographically nested in East Asian-specific paternal lineages.

In India, the speakers of Tibeto-Burman (TB) languages live in the Seven Sisters States in Northeast India and in the very north of the country. Genetically they show a clear East Asian origin and around 20% of subsequent admixture with South Asians within the last 1000 years. The genetic flavour of East Asia in TB is different from that in Munda speakers as the best surrogates for the East Asian admixing component are contemporary Han Chinese.

I found the simplistic migration maps especially interesting to illustrate ancient population movements. The emergence of EHG is supposed to involve a WHG:ANE cline, though, and this isn’t clear from the map. Also, there is new information on what may be at the origin of WHG and Anatolian hunter-gatherers.

From Reich’s recent session on South Asia at ISBA 8:

– Tale of three clines, with clear indication that “Indus Periphery” samples drawn from an already-cosmopolitan and heterogeneous world of variable ASI & Iranian ancestry. (I know how some people like to pore over these pictures – so note red dots = just dummy data for illustration.)
– Some more certainty about primary window of steppe ancestry injection into S. Asia: 2000-1500 BC
Alexander M. Kim

Featured image: map of South Asian languages from http://llmap.org.