“Steppe ancestry” step by step: Khvalynsk, Sredni Stog, Repin, Yamna, Corded Ware


Wang et al. (2018) is obviously a game changer in many aspects. I have already written about the upcoming Yamna Hungary samples, about the new Steppe_Eneolithic and Caucasus Eneolithic keystones, and about the upcoming Greece Neolithic samples with steppe ancestry.

An interesting aspect of the paper, hidden among so many relevant details, is a clearer picture of how the so-called Yamnaya or steppe ancestry evolved from Samara hunter-gatherers to Yamna nomadic pastoralists, and how this ancestry appeared among Proto-Corded Ware populations.

Image modified from Wang et al. (2018). Marked in orange: equivalent Steppe_Maykop ADMIXTURE; in red: approximate limit of Anatolia_Neolithic ancestry found in Yamna populations; in blue: Corded Ware-related groups. “Modelling results for the Steppe and Caucasus cluster. Admixture proportions based on (temporally and geographically) distal and proximal models, showing additional Anatolian farmer-related ancestry in Steppe groups as well as additional gene flow from the south in some of the Steppe groups as well as the Caucasus groups.”

Please note: arrows of “ancestry movement” in the following PCAs do not necessarily represent physical population movements, or even ethnolinguistic change. To avoid misinterpretations, I have depicted arrows with Y-DNA haplogroup migrations to represent the most likely true ethnolinguistic movements. Admixture graphics shown are from Wang et al. (2018), and also (the K12) from Mathieson et al. (2018).

1. Samara to Early Khvalynsk

The so-called steppe ancestry was born during the Khvalynsk expansion through the steppes, probably through exogamy of expanding elite clans (eventually all R1b-M269 lineages) originally of Samara_HG ancestry. The nearest group to the ANE-like ghost population with which Samara hunter-gatherers admixed is represented by the Steppe_Eneolithic / Steppe_Maykop cluster (from the Northern Caucasus Piedmont).

Steppe_Eneolithic samples, of R1b1 lineages, are probably expanded Khvalynsk peoples, showing thus a proximate ancestry of an Early Eneolithic ghost population of the Northern Caucasus. Steppe_Maykop samples represent a later replacement of this Steppe_Eneolithic population – and/or a similar population with further contribution of ANE-like ancestry – in the area some 1,000 years later.


This is what Steppe_Maykop looks like, different from Steppe_Eneolithic:


NOTE. This admixture shows how different Steppe_Maykop is from Steppe_Eneolithic, but in the different supervised ADMIXTURE graphics below Maykop_Eneolithic is roughly equivalent to Eneolithic_Steppe (see orange arrow in ADMIXTURE graphic above). This is useful for a simplified analysis, but actual differences between Khvalynsk, Sredni Stog, Afanasevo, Yamna and Corded Ware are probably underestimated in the analyses below, and will become clearer in the future when more ancestral hunter-gatherer populations are added to the analysis.

2. Early Khvalynsk expansion

We have direct data of Khvalynsk-Novodanilovka-like populations thanks to Khvalynsk and Steppe_Eneolithic samples (although I’ve used the latter above to represent the ghost Caucasus population with which Samara_HG admixed).

We also have indirect data. First, there is the PCA with outliers:


Second, we have data from north Pontic Ukraine_Eneolithic samples (see next section).

Third, there is the continuity of late Repin / Afanasevo with Steppe_Eneolithic (see below).

3. Proto-Corded Ware expansion

It is unclear if R1a-M459 subclades were continuously in the steppe and resurged after the Khvalynsk expansion, or (the most likely option) they came from the forested region of the Upper Dnieper area, possibly from previous expansions there with hunter-gatherer pottery.

Supporting the latter is the millennia-long continuity of R1b-V88 and I2a2 subclades in the north Pontic Mesolithic, Neolithic, and Early Eneolithic Sredni Stog culture, until ca. 4500 BC (and even later, during the second half).

Only at the end of the Early Eneolithic with the disappearance of Novodanilovka (and beginning of the steppe ‘hiatus’ of Rassamakin) is R1a to be found in Ukraine again (after disappearing from the record some 2,000 years earlier), related to complex population movements in the north Pontic area.

NOTE. In the PCA, a tentative position of Novodanilovka closer to Anatolia_Neolithic / Dzudzuana ancestry is selected, based on the apparent cline formed by Ukraine_Eneolithic samples, and on the position and ancestry of Sredni Stog, Yamna, and Corded Ware later. A good alternative would be to place Novodanilovka still closer to the Balkan outliers (i.e. Suvorovo), and a source closer to EHG as the ancestry driven by the migration of R1a-M417.


The first sample with steppe ancestry appears only after 4250 BC in the forest-steppe, centuries after the samples with steppe ancestry from the Northern Caucasus and the Balkans, which points to exogamy of expanding R1a-M417 lineages with the remnants of the Novodanilovka population.


4. Repin / Early Yamna expansion

We don’t have direct data on early Repin settlers. But we do have a very close representative: Afanasevo, a population we know comes directly from the Repin/late Khvalynsk expansion ca. 3500/3300 BC (just before the emergence of Early Yamna), and which shows fully Steppe_Eneolithic-like ancestry.


Compared to this eastern Repin expansion that gave Afanasevo, the late Repin expansion to the west ca. 3300 BC that gave rise to the Yamna culture was one of colonization, evidenced by the admixture with north Pontic (Sredni Stog-like) populations, no doubt through exogamy:


This admixture is also found (in lesser proportion) in east Yamna groups, which supports the high mobility and exogamy practices among western and eastern Yamna clans, not only with locals:


5. Corded Ware

Corded Ware represents a quite homogeneous expansion of a late Sredni Stog population, compatible with the traditional location of Proto-Corded Ware peoples in the steppe-forest/forest zone of the Dnieper-Dniester region.


We don’t have a comparison with Ukraine_Eneolithic or Corded Ware samples in Wang et al. (2018), but we do have proximate sources for Abashevo, when compared to the Poltavka population (with which it admixed in the Volga-Ural steppes): Sintashta, Potapovka, Srubna (with further Abashevo contribution), and Andronovo:


The two CWC outliers from the Baltic show what I thought was an admixture with Yamna. However, given the previous mixture of Eneolithic_Steppe in north Pontic steppe-forest populations, this elevated “steppe ancestry” found in Baltic_LN (similar to west Yamna) seems rather an admixture of Baltic sub-Neolithic peoples with a north Pontic Eneolithic_Steppe-like population. Late Repin settlers also admixed with a similar population during its colonization of the north Pontic area, hence the Baltic_LN – west Yamna similarities.

NOTE. A direct admixture with west Yamna populations through exogamy by the ancestors of this Baltic population cannot be ruled out yet (without direct access to more samples), though, because of the contacts of Corded Ware with west Yamna settlers in the forest-steppe regions.


A similar case is found in the Yamna outlier from Mednikarovo south of the Danube. It would be absurd to think that Yamna from the Balkans comes from Corded Ware (or vice versa), just because the former is closer in the PCA to the latter than other Yamna samples. The same error is also found e.g. in the Corded Ware → Bell Beaker theory, because of their proximity in the PCA and their shared “steppe ancestry”. All those theories have already been proven wrong.

NOTE. A similar fallacy is found in potential Sintashta→Mycenaean connections, where we should distinguish statistically that result from an East/West Yamna + Balkans_BA admixture. In fact, genetic links of Mycenaeans with west Yamna settlers prove this (there are some related analyses in Anthrogenica, but the site is down at this moment). To try to relate these two populations (separated more than 1,000 years before Sintashta) is like comparing ancient populations to modern ones, without the intermediate samples to trace the real anthropological trail of what is found…Pure numbers and wishful thinking.


Yamna and Corded Ware show a similar “steppe ancestry” due to convergence. I have said so many times (see e.g. here). This was clear long ago, just by looking at the Y-chromosome bottlenecks that differentiate them – and Tomenable noticed this difference in ADMIXTURE from the supplementary materials in Mathieson et al. (2017), well before Wang et al. (2018).

This different stock stems from (1) completely different ancestral populations + (2) different, long-lasting Y-chromosome bottlenecks. Their similarities come from the two neighbouring cultures admixing with similar populations.

If all this does not mean anything, and each lab is going to support some pre-selected archaeological theories from the 1960s or the 1980s, coupled with outdated linguistic models no matter what (Anthony’s model + Ringe’s glottochronological tree of the early 2000s in the Reich Lab; and worse, Kristiansen’s CWC-IE + Germano-Slavonic models of the 1940s in the Copenhagen group), I have to repeat my question again:

What’s (so much published) ancient DNA useful for, exactly?


On the origin of haplogroup R1b-L51 in late Repin / early Yamna settlers


A recent comment on the hypothetical Central European origin of PIE helped me remember that, when news appeared that R1b-L51 had been found in Khvalynsk ca. 4250-4000 BC, I began to think about alternative scenarios for the expansion of this haplogroup, with one of them including Central Europe.

Because, if YFull’s (and Iain McDonald’s) estimation of the split of R1b-L23 into L51 and Z2103 (ca. 4100 BC, TMRCA ca. 3700 BC) was wrong by as much as the R1a-Z645 estimates proved wrong, and both subclades were older than expected, then maybe R1b-L51 was not part of the Yamna expansion, but rather part of an earlier expansion with Suvorovo-Novodanilovka into central Europe.

That is, R1b-L51 and R1b-Z2103 would have expanded with Khvalynsk-Novodanilovka migrants, and they would have either disappeared among local populations, or settled and expanded with successful lineages in certain regions. I think this may give rise to two potential models.

A hidden group in the European east-central steppes?

Here is what Heyd (2011), for example, has to say about the effect of the Khvalynsk-Novodanilovka expansion in the 4th millennium BC, with the first Kurgan wave that shattered the social, economic, and cultural foundations of south-eastern Europe (before the expansion of west Yamna migrants in the region):

Proto-Anatolian migrations with Khvalynsk-Novodanilovka expansion, including ADMIXTURE data from Wang et al. (2018).

As the Boleraz and Baden tumuli cases in Serbia and Hungary demonstrate, there are earlier, 4th millennium cal. B.C. round tumuli in the Carpathian basin. There are also earlier north-Pontic steppe populations who infiltrated similar environments west of the Black Sea prior to the rise of the Yamnaya culture. This situation can be traced back to the 2nd half of the 5th millennium cal. B.C. to a group of distinct burials, zoomorphic maceheads, long flint blades, triangular flint points, etc., summarized under the term Suvorovo-Novodanilovka (Govedarica 2004; Rassamakin 2004; Anthony 2007; Heyd forthcoming 2011). They also erected round personalized tumuli, though smaller in size and height, above inhumations of single individuals. Suvorovo and Casimcea are the key examples in the lower Danube region of Romania. In northeast Bulgaria, the primary grave of Polska Kosovo (ochre-stained supine extended body position: information communicated by S. Alexandrov) can also be seen as such, as should the Targovishte-“Gonova mogila” primary grave 1 in the Thracian plain with a burial arranged in a supine position with flexed legs, southeast-northwest orientated, and strewed with ochre (Kanchev 1991, p. 56-57; Ivanova Gaydarska 2007). In addition to the many copper and shell beads, the 17.4 cm long obsidian blade is exceptional, which links this grave to the Csongrád-“Kettoshalom” grave in the south Hungarian plain (Ecsedy 1979). It also yielded an obsidian blade (13.2 cm long) and copper, shell and limestone beads.

The Southeast European distribution of graves of the Suvorovo-Novodanilovka group and such unequipped ones mentioned in the text which can be attributed by burial custom and stratigraphic position in the barrow, plus zoomorphic and abstract animal head sceptres as well as specific maceheads with knobs as from Decea Maresului (mid-5th millennium until around 4000 BC). Heyd (2016).

However, no traces of a tumulus have been recorded above the Kettoshalom tomb. Conventionally, it is dated to the Bodrogkeresztur-period in east Hungary, shortly after 4000 cal. B.C., which would correspond very well with the suggested Cernavodă I (or its less known cultural equivalent in the Thracian plain) attribution for the “Gonova mogila” grave, a cultural background to which the Csongrád grave should have also belonged. Bodrogkeresztur and Cernavodă I periods are not the only examples of 4th millennium cal. B.C. tumuli and burials displaying this steppe connection. Indeed we can find this early steppe impact throughout the 4th millennium cal. B.C. These include adscriptions to the Horodiștea II (Corlateni-Dealul Stadole, grave I: Burtanescu 1998, p. 37; Holbocai, grave 34: Comșa 1998, p. 16); to Gordinești-Cernavodă II (Liești-Movila Arbănașu, grave 22: Brudiu 2000); to Gorodsk-Usatovo (Corlăteni Dealul Cetăţii, grave I: Comșa 1998, p. 17-18, in Romania; Durankulak, grave 982: Vajsov 2002, in Bulgaria); and to Cernavodă III (Golyama Detelina, tum. 4: Leshtakov, Borisov 1995), and early (end of 4th millennium cal. B.C.) Ezero in Ovchartsi, primary grave (Kalchev 1994, p. 134-138) and Golyama Detelina, tum. 2 (Kanchev 1991) in Bulgaria. Also the Boleráz and Baden tumuli of Banjevac-Tolisavac and Mokrin in the south Carpathian basin account for this, since one should perhaps take into account primary grave 12 of the Sárrétudvari-Őrhalom tumulus in the Hungarian Alfold: a left-sided crouched juvenile (15-17 y) individual in an oval, NW-SE orientated grave pit 14C dated to 3350-3100 cal. B.C. at 2 sigma (Dani, Nepper 2006). Neither the burial custom (no ochre strewing or depositing a lump of ochre has been recorded), nor date account for its ascription to the Yamnaya!

All of these tumuli and burials demonstrate, though, that there is already a constant but perhaps low-level 4th millennium cal. B.C. steppe interaction, linking the regions of the north of the Black Sea with those of the west, and reaching deep into the Carpathian basin. This has to be acknowledged, even if these populations remain small, bounded to their steppe habitat with an economy adapted to this special environment, and are not always visible in the record. Indirect hints may help in seeing them, such as the frequent occurrence of horse bones, regarded as deriving from domesticated horses, in Hungarian Baden settlements (Bokonyi 1978; Benecke 1998), and in those of the south German Cham Culture (Matuschik 1999, p. 80-82) and the east German Bernburg Culture (Becker 1999; Benecke 1999). These occur, however, always in low numbers, perhaps not enough to maintain and regenerate a herd. Does this point us towards otherwise archaeologically hidden horsebreeders in the Carpathian basin, before the Yamnaya? In any case, I hope to make one case clear: these are by no means Yamnaya burials in the strict definition! Attribution to the Yamnaya in its strict definition applies.

Distribution of Pit-Grave burials west of the Black Sea likely dating to the 2nd half of the 4th millennium BC (triangles: side-crouched burials; filled circles: supine extended burials; open circles: suspected). In Alin Frînculeasa, Bianca Preda, Volker Heyd, Pit-Graves, Yamnaya and Kurgans along the Lower Danube.

Also, about the expansion of Yamna settlers along the steppes:

However, it should have been made clear by the distribution map of the Western Yamnaya that they were confining themselves solely to their own, well-known, steppe habitat and therefore not occupying, or pushing away and expelling, the locally settled farming societies. Also, living solely in the steppes requires another lifestyle, and quite different economic and social bases, most likely very different to the established farming societies. Although surely regarded as incoming strangers, they may therefore not have been seen as direct competitors. This argument can be further enforced when remembering that the lowlands and the steppes in the southeast of Europe had already been populated throughout the 4th millennium cal. B.C., as demonstrated above, by societies with a similar north-Pontic steppe origin and tradition, albeit in lower numbers. It is only for these groups that the Yamnaya may have become a threat, but their common origin and perhaps a similar economic/ social background with comparable lifestyles would surely have assisted to allow rapid assimilation. More important, though, is that farming societies in this region may therefore have been accustomed to dealing and interacting with different people and ethnic strangers for a long time. (…)

When assessing farming and steppe societies’ interaction from a general point of view, attitudes can diverge in three main directions:

  1. the violent one; with raids, fights, struggles, warfare, suppression and finally the superiority and exploitation of the one over the other;
  2. the peaceful one; with a continuous exchange of gifts, goods, work, information and genes in a balanced reciprocal system, leading eventually to the merging of the two societies and creation of a new identity;
  3. the neutral one; with the two societies ignoring each other for a long time.

What we see from trying to understand the record of the Yamnaya, based on their tumuli and burials, and the local and neighbouring contemporary societies, based on their settlements, hoards, and graves, is likely a mixture of all three scenarios, with the balance perhaps more towards exchange in a highly dynamic system with alterations over time. However, violence and raids cannot be ruled out; they would be difficult to see in the archaeological record; or only indirectly, such as the building of hill forts, particularly the defence-like chain of Vucedol hillforts along the south shore of the Danube on the Serbian/Croatian border zone (Tasic 1995a), and the retreat of people into them (Falkenstein 1998, p. 261-262), with other interpretations also possible. And finally, we are dealing here with very different local and neighbouring societies, as well as with more distant contemporary ones, looking, in reality, rather like a chequer board of societies and archaeological cultures (see Parzinger 1993 for the overview). These display different regional backgrounds and traditions leading to different social and settlement organizations, different economic bases and material cultures in the wide areas between Prut and Maritza rivers, and Black Sea and Tisza river. They surely found their individual way of responding to the incoming and settling Yamnaya people.

Yamnaya tumuli signalling the expansion of West Yamna from ca. 3100 BC (especially after ca. 2950 BC). Heyd (2011).

The best data we have about this potential non-Yamna origin of R1b-L51 – and thus in favour of its admixture in the Carpathian basin – lies in:

  1. The majority of R1b-Z2103 subclades found to date among Yamna samples.
  2. The presence of R1b-Z2103 in the Catacomb culture – in the Northern Caucasus and in Ukraine.
  3. The limited presence of (ancient and modern) R1b-L51 in eastern Europe and India, whose isolated finds are commonly (and simplistically) attributed to ‘late migrations’.
  4. The presence of R1b-L51 (xZ2103) in cultures related to the ‘Yamna package’, but supposedly not to Yamna settlers. So for example I7043, of haplogroup R1b-L151(xU106,xP312), ca. 2500-2200 BC from Szigetszentmiklós-Üdülősor, probably from the Bell Beaker (Csepel group), but maybe from the early Nagyrév culture.
  5. The expansion of its subclades apparently only from a single region, around the Carpathian basin, in contrast to R1b-Z2103.
  6. The already ‘diluted’ steppe admixture found in the earliest samples with respect to Yamna, which points to its appearance after the Yamna admixture with the local population.
  7. Ukrainian archaeologists (in contrast to their Russian colleagues) point to the relevance of North Pontic cultures like Kvitjana and Lower Mikhailovka in the development of Early Yamna in the west, and some eastern European researchers also believe in this similarity.
  8. If R1b-Z2103 and R1b-L51 had expanded with Suvorovo-Novodanilovka migrants to the west, and had admixed later as Hungary_LCA-LBA-like peoples with Yamna migrants during the long-term contacts with other ‘kurganized cultures’ ca. 2900-2500 BC in the Great Hungarian Plains, it could explain some peculiar linguistic traits of North-West Indo-European, and also why R1b-Z2103 appears in cultures associated with this earlier ‘steppe influence’ (i.e. not directly related to Yamna) such as Vučedol (with a R1b-Z2103 sample, see below). That could also explain the presence of R1b-L151(xP312, xU106) in similar Balkan cultures, possibly not directly related to Yamna.
Image modified from Wang et al. (2018). PCA of ancient and modern samples. Red circle in dashed line around Varna, Greece Neolithic, and (approximate position of) Smyadovo outliers, part of Khvalynsk-Novodanilovka settlers.

A hidden group among north or west Pontic Eneolithic steppe cultures?

The expansion of Khvalynsk as Novodanilovka into the North Pontic area happened through the south across the steppe, near the coast, with the forest-steppe region working as a clear natural border for this culture of likely horse-riding chieftains, whose economy was probably based on some rudimentary form of mobile pastoralism.

Although archaeologists are divided as to the origin of each individual Middle Eneolithic group near the Black Sea after the end of the Khvalynsk-Novodanilovka period, it seems more or less clear that steppe cultures like Cernavodă, Lower Mikhailovka, or Kvitjana are closer (or “more archaic”) in their steppe features, which connects them to Volga–Ural and Northern Caucasus cultures, like Northern Caucasus, Repin or Khvalynsk.

On the other hand, forest-steppe cultures like Dereivka (including Alexandria) show innovative traits and contacts with para- or sub-Neolithic cultures to the north, like Comb-Pit Ware groups, apart from corded decoration influenced by Trypillian groups to the west, especially in their later (‘Proto-Corded Ware‘) stage after ca. 3500 BC.

If Ukrainian researchers like Rassamakin are right, Early Yamna expanded not only from Repin settlers, but also from local steppe cultures adopting Repin traits to develop an Early Yamna culture, similar to how eastern (Volga–Ural) groups seem to have synchronously adopted Early Yamna without a massive influx of Repin settlements.

Furthermore, local traits develop in southern groups, like anthropomorphic stelae (shared with Kemi-Oba, direct heir of Lower Mikhailovka), and rich burials featuring wagons. These traits are seen in west Yamna settlers.

Modified from Rassamakin (1999), adding red color to Repin expansion. The system of the latest Eneolithic Pontic cultures and the sites of the Zhivotilovo-Volchanskoe type: 1) Volchanskoe; 2) Zhivotilovka; 3) Vishnevatoe; 4) Koisug.

Problems of this model include:

  1. In the North Pontic area, in contrast to the Volga–Ural region, there was a clear “colonization” wave of Repin settlers, also supported by Ukrainian researchers, based on the number of new settlements and burials, and on the progressive retreat of Dereivka, Kvitjana, as well as (more recent) Maykop- and Trypillia-related groups from the North Pontic area ca. 3350/3300 BC. It seems unlikely that these expansionist, semi-nomadic, cattle-breeding, patrilineally-related steppe clans that were driving all native populations out of their territories suddenly decided, at some point during their spread into the North Pontic area ca. 3300-3100 BC, to join forces with some foreign male lineages from the area, and then continue their expansion to the west…
  2. Similar to the fate of R1b-P297 subclades in the Baltic after the expansion of Corded Ware migrants, previous haplogroups of the North Pontic region – such as R1a, R1b-V88, and I2 subclades – basically disappeared from the ancient DNA record after the expansion of Khvalynsk-Novodanilovka, and then after the expansion of Yamna, as is clear from Yamna, Afanasevo, and Bell Beaker samples obtained to date. This, in combination with what we know about Y-chromosome bottlenecks in post-Neolithic expansions, leaves little space to think that a big enough territorial group with a majority of “native” haplogroups could survive later expansions (be it R1b-L51 or R1a-Z645).
  3. Supporting an expansion of the same male (and partly female) population, the Yamna admixture from east to west is quite homogeneous, with the only difference found in (non-significant) EEF-like proportion which becomes elevated in distant areas [apart from significant ‘southern’ contribution to certain outlier samples]. Based on the also homogeneous Y-DNA picture, the heterogeneity must come, in general, from the female exogamy practiced by expanding groups.
  4. There is a short period, spanning some centuries (approximately 3300-2700 BC), in which the North Pontic area – especially the forest-steppe territories to the west of the Dnieper, i.e. the Upper Dniester, Boh, and Prut-Siret areas – is a chaos of incoming and emigrating, expanding and shrinking groups of different cultures, such as late Trypillian groups, Maykop-related traits, TRB, GAC, (Proto-)Corded Ware, and Early Yamna settlements. No natural geographic frontier can be delimited between these groups, which probably interacted in different ways. Nevertheless, based on their cultural traits, admixture, and especially on their Y-DNA, it seems that they never incorporated foreign male lineages, beyond those they probably had during their initial expansion trends.
  5. The further expansionist waves of Early Yamna seen ca. 3100 BC, from the Danube Delta to the west, give an overall image of continuously expanding patrilineal clans of R1b-M269 subclades since the Khvalynsk-Novodanilovka migration, in different periodic steps, mostly from eastern Pontic-Caspian nuclei, usually overriding all encountered cultures and (especially male) populations, rather than showing long-term collaboration and interaction. Such interaction is seen only in exceptional cases, e.g. the long-term admixture between Abashevo and Poltavka, as seen in Proto-Indo-Iranian peoples and their language.
Image modified from Wang et al. (2018). PCA of ancient and modern samples. Arrows depicting Khvalynsk -> Yamna drift (blue), and hypothetic approximate Ukraine Eneolithic -> Yamna drift accompanying R1b-L51 (red).


We are now living through an exemplary ego-, (ethno-)nationalism-, and/or supremacy-deflating moment for some individuals of eastern and northern European descent who believed that R1a or ‘steppe ancestry proportions’ meant something special. The same can be said about those who had internalized some social or ethnolinguistic meaning for the origin of R1b in western Europe, N1c in north-eastern Europe, as well as Greeks, Iranians, Armenians, or Mediterranean peoples in general of ‘Near Eastern’ ancestry or haplogroups, or peoples of Near Eastern origin and/or language.

These people had linked their haplogroups or ancestry with some fantasy continuity of ‘their’ ancestral populations to ‘their’ territories or languages (or both), and all are being proven wrong.

Apart from teaching such people a lesson about what simplistic views are useful for (whether they are based on ABO or RH group, white skin, blond hair, blue eyes, lactase persistence, or on one’s own ancestry or Y-DNA haplogroup), it teaches the rest of us what can happen in the near future among western Europeans. Because, until recently, most western Europeans were comfortably settled thinking that our ancestors were some remnant population from an older, Palaeolithic or Mesolithic population, who acquired Indo-European languages by way of cultural diffusion in different periods, including only minor migrations.

Judging by what we can see now among some individuals of Northern and Eastern European descent, the only thing that can worsen the air of superiority among western Europeans is when they realize (within a few years, when all these stupid battles to control the narrative fade) that not only are they the cultural ‘heirs’ of the Graeco-Roman tradition that began with the Roman Empire, but that most of them are the direct patrilineal descendants of Khvalynsk, Yamna, Bell Beaker, and European Bronze Age peoples, and thus direct descendants of Middle PIE, Late PIE, and NWIE speakers.

Steppe-related migrations ca. 3100-2600 BC with tentative linguistic identification.

The finding of R1b-L51 and R1b-Z2103 among expanding Suvorovo-Novodanilovka chieftains, with pockets of R1b-L51 remaining in steppe-like societies of the Balkans and the Carpathian Basin, would have beautifully complemented what we know about the East Yamna admixture with R1a-Z93 subclades (Uralic speakers) ca. 2600-2100 BC to form Proto-Indo-Iranian, and about the regional admixtures seen in the Balkans, e.g. in Proto-Greeks, with the prevalent J subclades of the region.

It would have meant an end to any modern culture or nation identifying themselves with the ‘true’ Late PIE and Yamna heirs, because these would be exclusively associated with the expansion of R1b-Z2103 subclades with late Repin, and later as the full-fledged Late PIE with Yamna settlers to south-east and central Europe, and to the southern Urals. The language would then obviously have undergone different changes in all these territories through long-lasting admixture with other populations. In that sense, it would have ended the ideas of supremacy in western Europe before they even began.

The most likely future

However limited the evidence, it seems that R1b-L51 did expand with Yamna, based on the estimates for the haplogroups involved, and on marginal hints at the variability of L23 subclades within Yamna and neighbouring populations. If R1b-L51 expanded with West Repin / Early Yamna settlers, these are the likely reasons why it has not yet been found among Yamna samples:

Simplified map of Repin expansions from ca. 3500/3400 BC.
  • The subclade division of Yamna settlers need not be 50:50 for L51:Z2103, either in time or in space. I think this is the simplistic view underlying many thoughts on this matter. Many different expanding patrilineal clans of L23 subclades may have been more or less successful in different areas, and non-Z2103 may have been in the minority, or more isolated relative to Z2103-clans among expanding peoples on the steppe, especially in the east. In fact, we usually talk in terms of “Z2103 vs. L51” as if
    1. these two were the only L23 subclades; and
    2. both had split and succeeded (expanding) synchronously;

    that is, as if there had not been multiple subclades of both haplogroups, and as if there had not been different expansion waves for hundreds of years stemming from different evolving nuclei, involving each time only limited (successful) clans. Many different subclades of haplogroups L23 (xZ2103, xL51), Z2103, and L51 must have been unsuccessful during the ca. 1,500 years of late Khvalynsk and late Repin-Early Yamna expansions in which they must have participated (approximately 60-75 generations, based on a mean generation time of 20-25 years).

  • If we want to imagine a pocket of ‘hidden’ L51 for some part of the North Pontic or Carpathian area, the same can be imagined – and much more likely – for any unsampled territory of expanding late Repin/Early Yamna settlers from the Lower Don – Lower Volga region (probably already a mixed society of L51 and Z2103 subclades since their beginning, as the early Repin culture, ca. 3800 BC), with L51 clans being probably successful to the west.
  • The Repin culture expanded only in small, mobile settlements from the Lower Don – Lower Volga to the north, east, and south, starting ca. 3500/3400 BC, in the waves that eventually gave a rather early distant offshoot in the Altai region, i.e. Afanasevo. Afanasevo starts ca. 3300 BC in the archaeological record, and the predominance of R1b-Z2103 subclades among its samples found to date also supports either
    • a mixed Repin society, with Z2103-clans predominating among eastern settlers; or
    • a Repin society marked by haplogroup L51, and thus a cultural diffusion of late Repin/Early Yamna traits among neighbouring (Khvalynsk, Samara, etc.) groups of essentially the same (early Khvalynsk-Novodanilovka) genetic stock in the Volga–Ural region.

    Both options could justify a majority of Z2103 in the Lower Volga–Ural region, with the latter being supported by the scattered archaeological remains of late Repin in the region before the synchronous emergence of Early Yamna findings in the whole Pontic-Caspian steppe.

  • Most Z2103 from Yamna samples to date are from around 3100 BC (on average) onward, and from the right bank of the Lower Don to the east, particularly from the Lower Volga–Ural area (especially the Samara region), which – based on the center of expansion of late Repin settlers – may be depicting an artificially high Z2103 distribution for the whole Yamna community.
Repin expansion into the Volga–Ural region from ca. 3500/3400 BC. Map made by me based on maps and data from Morgunova (2014, 2016). Lopatino is marked with number 64.
  • Yamna sample I0443, R1b-L23 (Y410+, L51-), ca. 3300-2700 BCE from Lopatino II, points to an intermediate subclade between L23 and L51, near one of the supposed late Repin sites (based on kurgan burials with late Repin cultural traits) in the Samara region.
  • Other Balkan cultures potentially unrelated to the Yamna expansion also show Z2103 (and not only L51) subclades, like I3499 (ca. 2884-2666 calBC), of the Vučedol culture, from Beli Manastir-Popova zemlja, which points to the infiltration of Yamna peoples in other cultures. In any case, the appearance of R1b-L23 subclades in the region happens only after the Yamna expansion ca. 3100 BC, probably through intrusions into different neighbouring regions, if these Balkan cultures are not directly derived from Yamna settlements (which is probably the case for the Csepel Bell Beaker or early Nagyrév sample, see above).
  • The diversity of haplogroups found in or around the Carpathian Basin in Late Chalcolithic / Early Bronze Age samples, including L151(xP312, xU106), P312, U106, Z2103, makes it the most likely sink of Yamna settlers, who spread thus with expanding family clans of different R1b-L23 subclades.
  • Even though some Yamna vanguard groups are known to have expanded up to Saxony-Anhalt before ca. 2700 BC, haplogroup Z2103 seems to be restricted to more eastern regions, which suggests that R1b-L51 was already successful among expanding West Yamna clans in Hungary, which gave rise only later to expanding East Bell Beakers (overwhelmingly of L151 subclades). The source of R1b-L51 and L151 expansion over Z2103 must lie therefore in the West Yamna period, and not in the Bell Beaker expansion.
Yamna migrants ca. 3300-2600 BC. Most likely site of admixture with GAC circled in red.
  • The R1b-Z2103 found in Poltavka, Catacomb, and to the south points to a late migration displacing the western R1b-L51, only after the late Repin expansion. This is also seen in the steppe ancestry and R1b-Z2103 south of the Caucasus, in Hajji Firuz, which points to this route as a potential source of the supposed “Earliest Proto-Indo-Iranian” (the mariannu term) of the Near East. A similar replacement event happened some centuries later with expanding R1a-Z93 subclades from the east wiping out haplogroup R1b-Z2103 from the Pontic-Caspian steppe.
  • Many ancient samples from Khvalynsk, Northern Caucasus, Yamna, or later ones are reported simply as R1b-M269 or L23, without a clear subclade, so the simplistic ‘Yamna–Z2103’ picture is not real: if one takes into account that Z2103 might have been successful quite early in the eastern region, it is more likely to obtain a successful Y-SNP call of a Z2103 subclade in the Volga–Ural region than a xZ2103 one.
  • There are some modern samples of R1b-L51 in eastern Europe and Asia, whose common simplistic attribution to “late expansions” is usually not substantiated; and also ancient R1b-L51 samples might be confirmed soon for Asia.
  • ‘Western’ features described by archaeologists for West Yamna settlers, associated with Kemi Oba and southern Yamna groups in the North Pontic area – like rich burials with anthropomorphic stelae and wagons – are actually absent in burials from settlers beyond Bulgaria, which does not support their affiliation with these local steppe groups of the Black Sea. Also, a mix with local traditions is seen across all Early Yamna groups of the Pontic-Caspian steppe, and still genetics and common cultural traits point to their homogenization under the same patrilineal clans expanding continuously for centuries. The maintenance of local traditions (as evidenced by East Bell Beakers in Iberia related to Iberian Proto-Beakers) is often not a useful argument in genetics, especially when the female population is not replaced.
Yamna settlers in the Great Pannonian Plain, showing only kurgans of Hungary ca. 2950-2500 BC. Yamna Hungary was one of the biggest West Yamna provinces. From Horváth et al. (2013).
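
The generation count given above for the ca. 1,500 years of late Khvalynsk and late Repin-Early Yamna expansions is simple division, and can be checked with a minimal sketch. The function name is just illustrative; the span and the two generation lengths are the figures quoted in the text.

```python
# Rough check of how many generations fit in a given time span.
# The span and generation lengths are the figures quoted in the text.

def generations(span_years: float, generation_length_years: float) -> float:
    """Number of generations that fit in a time span."""
    if generation_length_years <= 0:
        raise ValueError("generation length must be positive")
    return span_years / generation_length_years

SPAN_YEARS = 1500                       # ca. 1,500 years of expansions, as in the text
fewest = generations(SPAN_YEARS, 25)    # longer generations -> fewer of them
most = generations(SPAN_YEARS, 20)      # shorter generations -> more of them
print(f"approximately {fewest:.0f}-{most:.0f} generations")
```

With a mean generation length of 25 years this gives 60 generations, and with 20 years it gives 75, which is the range used above. Each of those generations was a chance for a patrilineal clan to fail, which is why many subclades must have disappeared along the way.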


This is what we know, using linguistics, archaeology, and genetics:

  • Middle Proto-Indo-European expanded with Khvalynsk-Novodanilovka after ca. 4800 BC, with the first Suvorovo settlements dated ca. 4600 BC.
  • Archaic Late Proto-Indo-European expanded with late Repin (or Volga–Ural settlers related to Khvalynsk, influenced by the Repin expansion) into Afanasevo ca. 3500/3400 BC.
  • Late Proto-Indo-European expanded with Early Yamna settlers to the west into central Europe and the Balkans ca. 3100 BC; and also to the east (as Pre-Proto-Indo-Iranian) into the southern Urals ca. 2600 BC.
  • North-West Indo-European expanded with Yamna Hungary -> East Bell Beakers, from ca. 2500 BC.
  • Proto-Indo-Iranian expanded with Sintashta, Potapovka, and later Andronovo and Srubna from ca. 2100 BC.

It seems that the subclades from Khvalynsk ca. 4250-4000 BC were wrongly reported – like those of Narasimhan et al. (2018). However, even if they are real and YFull estimates have to be revised, and even if the split had happened before the expansion of Suvorovo-Novodanilovka, the most likely origin of R1b-L51 among Bell Beakers will still be the expansion of late Repin / Early Yamna settlers, and that is what ancient DNA samples will most likely show, whatever the social or political consequences.

The only relevance of the finding of R1b-L51 in one place or another – especially if it is found to be a remnant of a Middle PIE expansion coupled with centuries of admixture and interaction in the Carpathian Basin – is the potential influence of an archaic PIE (or non-IE) layer on the development of North-West Indo-European in Yamna Hungary -> East Bell Beaker. That is, more or less like the Uralic influence related to the appearance of R1a-Z93 among Proto-Indo-Iranians, of R1a-Z284 among Pre-Germanic peoples, and of R1a-Z282 among Balto-Slavic peoples.

I think there is little that ancient DNA samples from West Yamna could add to what we know in general terms of archaeology or linguistics at this point regarding Late PIE migrations, beyond many interesting details. I am sure that those who have not attributed some random 6,000-year-old paternal ancestor any magical (ethnic or nationalist) meaning are just having fun, enjoying more and more the precise data we have now on European prehistoric populations.

As for those who believe in magical consequences of genetic studies, I don’t think there is anything for them in this quest beyond the artificially created grand-daddy issues. And, funnily enough, those who played (and play) the ‘neutrality’ card to feel superior in front of others – the “I only care about the truth”-type of lie, while secretly longing for grandpa’s ethnolinguistic continuity – are suffering the hardest fall.


Sahara’s rather pale-green and discontinuous Sahelo-Sudanian steppe corridor, and the R1b – Afroasiatic connection


Interesting new paper (behind paywall) Megalakes in the Sahara? A Review, by Quade et al. (2018).

Abstract (emphasis mine):

The Sahara was wetter and greener during multiple interglacial periods of the Quaternary, when some have suggested it featured very large (mega) lakes, ranging in surface area from 30,000 to 350,000 km². In this paper, we review the physical and biological evidence for these large lakes, especially during the African Humid Period (AHP) 11–5 ka. Megalake systems from around the world provide a checklist of diagnostic features, such as multiple well-defined shoreline benches, wave-rounded beach gravels where coarse material is present, landscape smoothing by lacustrine sediment, large-scale deltaic deposits, and in places, tufas encrusting shorelines. Our survey reveals no clear evidence of these features in the Sahara, except in the Chad basin. Hydrologic modeling of the proposed megalakes requires mean annual rainfall ≥1.2 m/yr and a northward displacement of tropical rainfall belts by ≥1000 km. Such a profound displacement is not supported by other paleo-climate proxies and comprehensive climate models, challenging the existence of megalakes in the Sahara. Rather than megalakes, isolated wetlands and small lakes are more consistent with the Sahelo-Sudanian paleoenvironment that prevailed in the Sahara during the AHP. A pale-green and discontinuously wet Sahara is the likelier context for human migrations out of Africa during the late Quaternary.

The whole review is an interesting read, but here are some relevant excerpts:

Various researchers have suggested that megalakes coevally covered portions of the Sahara during the AHP and previous periods, such as paleolakes Chad, Darfur, Fezzan, Ahnet-Mouydir, and Chotts (Fig. 2, Table 2). These proposed paleolakes range in size by an order of magnitude in surface area from the Caspian Sea–scale paleo-Lake Chad at 350,000 km² to Lake Chotts at 30,000 km². At their maximum, megalakes would have covered ~10% of the central and western Sahara, similar to the coverage by megalakes Victoria, Malawi, and Tanganyika in the equatorial tropics of the African Rift today. This observation alone should raise questions of the existence of megalakes in the Sahara, and especially if they developed coevally. Megalakes, because of their significant depth and area, generate large waves that become powerful modifiers of the land surface and leave conspicuous and extensive traces in the geologic record.

ETOPO1 digital elevation model (1 arc-minute; Amante and Eakins, 2009) of proposed megalakes in the Sahara Desert during the late Quaternary. Colors denote Köppen-Geiger climate zones: blue, Aw, Af, Am (tropical); light tan, Bwk, BSh, BSk, Csa, Csb, Cwb, Cfa, Cfb (temperate); red-brown, Bwh (arid, hot desert and steppe climate). Lake area at proposed megalake high stands and present Lake Victoria are in blue, and contributing catchment areas are shown as thin solid black lines. The main tributaries of Lake Chad are denoted by blue lines (from west to east: the Komadougou-Yobe, Logone, and Chari Rivers; source: Global Runoff Data Center, Koblenz, Germany). Rainfall isohyets (50, 200, 800, 1200, and 1600) are marked in dashed gray-scale lines. Physical parameters of each basin are shown in white boxes: Abt, total basin area; AW, lake area; Vw, lake volume; and aW= AW/Abt. Black dots mark the location of the paleohydrological records from Lezine et al. (2011), also compiled in Supplementary Table S5.

Lakes, megalakes, and wetlands

Active ground-water discharge systems abound in the Sahara today, although they were much more widespread in the AHP. They range from isolated springs and wet ground in many oases scattered across the Sahara (e.g., Haynes et al., 1989) to wetlands and small lakes (Kröpelin et al., 2008). Ground water feeding these systems is dominated by fossil AHP-age and older water (e.g., Edmunds and Wright 1979; Sonntag et al., 1980), although recently recharged water (<50 yr) has been locally identified in Saharan ground water (e.g., Sultan et al., 2000; Maduapuchi et al., 2006).

Megalake Chad

In our view, Lake Chad is the only former megalake in the Sahara firmly documented by sedimentologic and geomorphic evidence. Mega-Lake Chad is thought to have covered ~ 345,000 km2, stretching for nearly 8° (10–18°N) of latitude (Ghienne et al., 2002) (Fig. 2). The presence of paleo- Lake Chad was at one point challenged, but several—and in our view very robust—lines of evidence have been presented to support its development during the AHP. These include: (1) clear paleo-shorelines at various elevations, visible on the ground (Abafoni et al., 2014) and in radar and satellite images (Schuster et al., 2005; Drake and Bristow, 2006; Bouchette et al., 2010); (2) sand spits and shoreline berms (Thiemeyer, 2000; Abafoni et al., 2014); and (3) evaporites and aquatic fauna such as fresh-water mollusks and diatoms in basin deposits (e.g., Servant, 1973; Servant and Servant, 1983). Age determinations for all but the Holocene history of mega- Lake Chad are sparse, but there is evidence for Mio-Pliocene lake (s) (Lebatard et al., 2010) and major expansion of paleo- Lake Chad during the AHP (LeBlanc et al., 2006; Schuster et al., 2005; Abafoni et al., 2014; summarized in Armitage et al., 2015) up to the basin overflow level at ~ 329m asl.

Insights from hydrologic mass balance of megalakes

Graph of mean annual rainfall (mm/yr) versus aw (area lake/area basin, AW/AL); their modeled relationship using our Sahelo-Sudanian hydrologic model for the different lake basins are shown as solid colored lines. Superimposed on this (dashed lines) are the aw values for individual megalake basins and the mean annual rainfall required to sustain them. Mean annual paleo-rainfall estimates of 200–400 mm/yr during the AHP from fossil pollen and mollusk evidence is shown as a tan box. The intersection of this box with the solid colored lines describes the resulting aw for Saharan paleolakes on the y-axis. The low predicted values for aw suggest that very large lakes would not form under Sahelo-Sudanian conditions where sustained by purely local rainfall and runoff.

Using these conservative conditions (i.e., erring in the direction that will support megalake formation), our hydrologic models for the two biggest central Saharan megalakes (Darfur and Fezzan) require minimum annual average rainfall amounts of ~ 1.1 m/yr to balance moisture losses from their respective basins (Supplementary Table S1). Lake Chad required a similar amount (~1 m/yr; Supplementary Table S1) during the AHP according to our calculations, but this is plausible, because even today the southern third of the Chad basin receives ≥1.2 m/yr (Fig. 2) and experiences a climate similar to Lake Victoria. A modest 5° shift in the rainfall belt would bring this moist zone northward to cover a much larger portion of the Chad basin, which spans N13° ±7°. Estimated rainfall rates for Darfur and Fezzan are slightly less than the average of ~ 1.3 m/yr for the Lake Victoria basin, because of the lower aw values, that is, smaller areas of Saharan megalakes compared with their respective drainage basins (Fig. 15).

Estimates of paleo-rainfall during the AHP

Here major contradictions develop between the model outcomes and paleo-vegetation evidence, because our Sahelo-Sudanian hydrologic model predicts wetter conditions and therefore more tropical vegetation assemblages than found around Lake Victoria today. In fact, none of the very wet rainfall scenarios required by all our model runs can be reconciled with the relatively dry conditions implied by the fossil plant and animal evidence. In short, megalakes cannot be produced in Sahelo-Sudanian conditions past or present; to form, they require a tropical or subtropical setting, and major displacements of the African monsoon or extra-desert moisture sources.

Change in mean annual precipitation over northern Africa between mid-Holocene (6 ka) and pre-industrial conditions in PMIP3 models (affiliations are provided in Supplementary Table S4). Lakes Victoria and Chad outlined in blue. (a) Ensemble mean change in mean annual precipitation and positions of the African summer (July–September) ensemble mean ITCZ during mid-Holocene (solid red line) and pre-industrial conditions (solid blue line). (b) Zonal average of change in mean annual precipitation over land (20°W–30°E) for the ensemble mean (thick black) and individual models (listed on right). The range of minimal estimated change in mean annual precipitation required to sustain steppe is shown in shaded green (Jolly et al., 1998).


If not megalakes, what size lakes, marshes, discharging springs, and flowing rivers in the Sahara were sustainable in Sahelo-Sudanian climatic conditions? For lakes and perennial rivers to be created and sustained, net rainfall in the basin has to exceed loss to evapotranspiration, evaporation, and infiltration, yielding runoff that then supplies a local lake or river. Our hydrologic models (see Supplementary Material) and empirical observations (Gash et al., 1991; Monteith, 1991) for the Sahel suggest that this limit is in the 200–300 mm/yr range, meaning that most of the Sahara during the AHP was probably too dry to support very large lakes or perennial rivers by means of local runoff. This does not preclude creation of local wetlands supplied by ground-water recharge focused from a very large recharge area or forced to the surface by hydrologic barriers such as faults, nor megalakes like Chad supplied by moisture from the subtropics and tropics outside the Sahel. But it does raise a key question concerning the size of paleolakes, if not megalakes, in the Sahara during the AHP. Our analysis suggests that Sahelo-Sudanian climate could perhaps support a paleolake approximately ≤5000 km² in area in the Darfur basin and ≤10,000–20,000 km² in the Fezzan basin. These are more than an order of magnitude smaller than the megalakes envisioned for these basins, but they are still sizable, and if enclosed in a single body of water, should have been large enough to generate clear shorelines (Enzel et al., 2015, 2017). On the other hand, if surface water was dispersed across a series of shallow and extensive but partly disconnected wetlands, as also implied by previous research (e.g., Pachur and Hoelzmann, 1991), then shorelines may not have developed.
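
The mass-balance reasoning in these excerpts can be illustrated with a rough sketch. This is not the hydrologic model of Quade et al., whose details are in their Supplementary Material: it assumes a textbook steady-state closed basin, where rain falling on the lake plus runoff from the rest of the basin equals evaporation from the lake surface. The evaporation rate and runoff coefficient below are hypothetical placeholder values chosen only for illustration; the rainfall values are those mentioned in the excerpts (200–400 mm/yr from fossil evidence, ~1.1 m/yr required for the proposed megalakes).

```python
# Steady-state closed-basin water balance (a textbook simplification,
# NOT the model used by Quade et al.):
#   rain * A_lake + k * rain * (A_basin - A_lake) = evap * A_lake
# Solving for the lake fraction aw = A_lake / A_basin gives
#   aw = k * rain / (evap - rain * (1 - k))

def lake_fraction(rain_m_yr: float, evap_m_yr: float, runoff_coeff: float) -> float:
    """Equilibrium lake area as a fraction of total basin area (0 to 1)."""
    denom = evap_m_yr - rain_m_yr * (1.0 - runoff_coeff)
    if denom <= 0:
        return 1.0  # inputs exceed lake evaporation: basin fills and overflows
    return min(1.0, runoff_coeff * rain_m_yr / denom)

# Hypothetical placeholder parameters, not taken from the paper.
ASSUMED_LAKE_EVAPORATION = 2.0   # m/yr
ASSUMED_RUNOFF_COEFF = 0.1       # fraction of basin rainfall reaching the lake

# Rainfall values mentioned in the excerpts, in m/yr.
for rain in (0.2, 0.3, 0.4, 1.1):
    aw = lake_fraction(rain, ASSUMED_LAKE_EVAPORATION, ASSUMED_RUNOFF_COEFF)
    print(f"rain {rain:.1f} m/yr -> lake covers {aw:.1%} of its basin")
```

Whatever placeholder values are chosen, the qualitative point is the same as in the review: under the few hundred mm/yr of Sahelo-Sudanian rainfall, the equilibrium lake is a small fraction of its basin, and only much wetter conditions sustain a lake covering a large share of it.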

One of the underdeveloped ideas of my Indo-European demic diffusion model was that R1b-V88 had migrated through South Italy to Northern Africa, and from there through the Green Sahara corridor to the south, from where the “upside-down” view of Bender (2007) could have occurred, i.e. Afroasiatic expanding westwards within the Green Sahara, precisely at this time, and from a homeland near the Megalake Chad region (see here).

Whether or not R1b-V88 brought the ‘original’ lineage that expanded Afroasiatic languages may be contended, but after D’Atanasio et al. (2018) it seems that only two lineages, E-M2 and R1b-V88, fit the ‘star-like’ structure suggesting an appropriate haplogroup expansion and necessary regional distribution that could explain the spread of Afroasiatic languages within a reasonable time frame.

Palaeolithic migrations

This review shows that the hypothesized Green Sahara corridor full of megalakes, which some proposed had fully connected Africa from west to east, was actually a strip of Sahelo-Sudanian steppe spread to the north of its current distribution, including the Chad megalake, East Africa and Arabia, apart from other discontinuous local wetlands further to the north in Africa. This greenish belt would probably have allowed for the initial spread of early Afroasiatic proto-languages only through the southern part of the current Sahara Desert. This and the R1b-V88 haplogroup distribution in Central and North Africa (with a prevalence among Chadic speakers probably due to later bottlenecks), and the Near East, leave still fewer possibilities for an expansion of Afroasiatic from anywhere else.

If my proposal turns out to be correct, this Afroasiatic-like language would be the one suggested by some in the vocabulary of Old European and North European local groups (viz. Kroonen for the Agricultural Substrate Hypothesis), and not Anatolian farmer ancestry or haplogroup G2, which would have been rather confined to Southern Europe, mainly south of the Loess line, where incoming Middle East farmers encountered the main difficulties spreading agriculture and herding, and where they eventually admixed with local hunter-gatherers.

NOTE. If related to attested languages before the Roman expansion, Tyrsenian would be a good candidate for a descendant of the language of Anatolian farmers, given the more recent expansion of Anatolian ancestry to the Tuscan region (even if already influenced by Iran farmer ancestry), which reinforces its direct connection to the Aegean.

The fiercest opposition to this R1b-V88 – Afroasiatic connection may come from:

  • Traditional Hamito-Semitic scholars, who try to look for any parent language almost invariably in or around the Near East – the typical “here it was first attested, ergo here must be the origin, too”-assumption (coupled with the cradle of civilization memes) akin to the original reasons behind Anatolian or Out-of-India hypotheses; and of course
  • autochthonous continuity theories based on modern subclades, of (mainly Semitic) peoples of haplogroup E or J, who will root for either one or the other as the Afroasiatic source no matter what. As we have seen with the R1a – Indo-European hypothesis (see here for its history), this is never the right way to look at prehistoric migrations, though.

I proposed that R1a-M417 was the lineage marking an expansion of Indo-Uralic from the east near Lake Baikal, then obviously connected to Yukaghir and Altaic languages marked by R1a-M17, and that haplogroup R could then be the source of a hypothetical Nostratic expansion (where R2 could mark the Dravidian expansion), with upper clades being maybe responsible for Borean.

Simple Nostratic tree by Bomhard (2008)

However, recent studies have shown early expansions of R1b-P297 to East Europe (Mathieson et al. 2017 & 2018), and of R1b-M73 to East Eurasia probably up to Siberia, and possibly reaching the Pacific (Jeong et al. 2018). Also, the Steppe Eneolithic and Caucasus Eneolithic clusters seen in Wang et al. (2018) would be able to explain the WHG – EHG – ANE ancestry cline seen in Mesolithic and Neolithic Eurasia without a need for westward migrations.

After Narasimhan et al. (2018) and Damgaard et al. (Science 2018), Dravidian is now more and more likely to be linked to the expansion of the Indus Valley civilization and haplogroup J, in turn strongly linked to Iranian farmer ancestry, thus giving support to an Elamo-Dravidian group stemming from Iran Neolithic.

NOTE. This Dravidian-IVC and Iran connection has been supported for years by knowledgeable bloggers and commenters alike, see e.g. one of Razib Khan’s posts on the subject. This rather early support for what is obvious today is probably behind the reactionary views by some nationalist Hindus, who probably saw in this a potential reason for a strengthened Indo-Aryan/Dravidian divide adding to the religious patchwork that is modern India.

I am not in a good position to judge Nostratic, and I don’t think Glottochronology, Swadesh lists, or any statistical methods applied to a bunch of words are of any use, here or anywhere. The work of pioneers like Illich-Svitych or Starostin, on the other hand, seems to me a solid attempt to obtain a faithful reconstruction, if rather outdated today.

NOTE. I am still struggling to learn more about Uralic and Indo-Uralic; not because it is more difficult than Indo-European, but because – in comparison to PIE comparative grammar – material about them is scarce, and the few available sources are sometimes contradictory. My knowledge of Afroasiatic is limited to Semitic (Arabic and Akkadian), and the field is not much more developed here than for Uralic…

Spread of Y-haplogroup R1b(xM269) in Eurasia, according to Jeong et al. (2018).

If one wanted to support a Nostratic proto-language, though, without being able to take into account genome-wide autosomal admixture, the only haplogroup which can right now connect the expansion of all its branches is R1b-M343:

  • R1b-L278 expanded from Asia to Europe through the Iranian Plateau, since early subclades are found in Iran and the Caucasus region, thus supporting the separation of Elamo-Dravidian and Kartvelian branches;
  • From the Danube or another European region ‘near’ the Villabruna 1 sample (of haplogroup R1b-L754):
    • R1b-V88 expanding everywhere in Europe, and especially the branch expanding to the south into Africa, may be linked to the initial Afroasiatic expansion through the Pale-Green Sahara corridor (and even a hypothetical expansion with E-M2 subclades and/or from the Middle East would also leave open the influence of V88 and previous R1b subclades from the Middle East in the emergence of the language);
    • R1b-P297 subclades expanding to the east may be linked to Eurasiatic, giving rise to both Indo-Uralic (M269) and Macro- or Micro-Altaic (M73) expansions.

This is shameless, simplistic speculation, of course, but not more than the Nostratic hypothesis, and it has the main advantage of offering ‘small and late’ language expansions relative to other proposals spanning thousands (or even tens of thousands) of years more of language separation. On the other hand, that would leave Borean out of the question, unless the initial expansion of R1b subclades happened from a community close to Lake Baikal (and Mal’ta) that was also at the origin of the other supposedly related Borean branches, whether linked to haplogroup R or to any other…

NOTE. If Afroasiatic and Indo-Uralic (or Eurasiatic) are not genetically related, my previous simplistic model, R1b-Afroasiatic vs. R1a-Eurasiatic, may still be supported, with R1a-M17 potentially marking the latest meaningful westward population expansion from which EHG ancestry might have developed (see here). Without detailed works on Nostratic comparative grammar and dialectalization, and especially without a lot more Palaeolithic and Mesolithic samples, all this will remain highly speculative, like proposals of the 2000s about Y-DNA-haplogroup – language relationships.


R1b-V88 migration through Southern Italy into Green Sahara corridor, and the Afroasiatic connection


Open access article The peopling of the last Green Sahara revealed by high-coverage resequencing of trans-Saharan patrilineages, by D’Atanasio, Trombetta, Bonito, et al., Genome Biology (2018) 19:20.


Little is known about the peopling of the Sahara during the Holocene climatic optimum, when the desert was replaced by a fertile environment.

In order to investigate the role of the last Green Sahara in the peopling of Africa, we deep-sequence the whole non-repetitive portion of the Y chromosome in 104 males selected as representative of haplogroups which are currently found to the north and to the south of the Sahara. We identify 5,966 mutations, from which we extract 142 informative markers then genotyped in about 8,000 subjects from 145 African, Eurasian and African American populations. We find that the coalescence age of the trans-Saharan haplogroups dates back to the last Green Sahara, while most northern African or sub-Saharan clades expanded locally in the subsequent arid phase.

Our findings suggest that the Green Sahara promoted human movements and demographic expansions, possibly linked to the adoption of pastoralism. Comparing our results with previously reported genome-wide data, we also find evidence for a sex-biased sub-Saharan contribution to northern Africans, suggesting that historical events such as the trans-Saharan slave trade mainly contributed to the mtDNA and autosomal gene pool, whereas the northern African paternal gene pool was mainly shaped by more ancient events.

Maximum parsimony Y chromosome tree and dating of the four trans-Saharan haplogroups. a Phylogenetic relations among the 150 samples analysed here. Each haplogroup is labelled in a different colour. The four Y sequences from ancient samples are marked by the dagger symbol. b Phylogenetic tree of the four trans-Saharan haplogroups, aligned to the timeline (at the bottom). At the tip of each lineage, the ethno-geographic affiliation of the corresponding sample is represented by a circle, coloured according to the legend (bottom left). The last Green Sahara period is highlighted by a green belt in the background

Also, interesting excerpts:

The fertile environment established in the Green Sahara probably promoted demographic expansions and rapid dispersals of the human groups, as suggested by the great homogeneity in the material culture of the early Holocene Saharan populations [62]. Our data for all the four trans-Saharan haplogroups are consistent with this scenario, since we found several multifurcated topologies, which can be considered as phylogenetic footprints of demographic expansions. The multifurcated structure of the E-M2 is suggestive of a first demographic expansion, which occurred about 10.5 kya, at the beginning of the last Green Sahara (Fig. 2; Additional file 2: Figure S4). After this initial expansion, we found that most of the trans-Saharan lineages within A3-M13, E-M2 and R-V88 radiated in a narrow time interval at 8–7 kya, suggestive of population expansions that may have occurred in the same time (Fig. 2; Additional file 2: Figures S3, S4 and S6). Interestingly, during roughly the same period, the Saharan populations adopted pastoralism, probably as an adaptive strategy against a short arid period [1, 62, 63]. So, the exploitation of pastoralism resources and the reestablishment of wetter conditions could have triggered the simultaneous population expansions observed here. R-V88 also shows signals of a further and more recent (~ 5.5 kya) Saharan demographic expansion which involved the R-V1589 internal clade. We observed similar demographic patterns in all the other haplogroups in about the same period and in different geographic areas (A3-M13/V3, E-M2/V3862 and E-M78/V32 in the Horn of Africa, E-M2/M191 in the central Sahel/central Africa), in line with the hypothesis that the start of the desertification may have caused massive economic, demographic and social changes [1].

Finally, the onset of the arid conditions at the end of the last African humid period was more abrupt in the eastern Sahara compared to the central Sahara, where an extensive hydrogeological network buffered the climatic changes, which were not complete before ~ 4 kya [6, 62, 64]. Consistent with these local climatic differences, we observed slight differences among the four trans-Saharan haplogroups. Indeed, we found that the contact between northern and sub-Saharan Africa went on until ~ 4.5 kya in the central Sahara, where we mainly found the internal lineages of E-M2 and R-V88 (Additional file 2: Figures S4 and S6). In the eastern Sahara, we found a sharper and more ancient (> 5 kya) differentiation between the people from northern Africa (and, more generally, from the Mediterranean area) and the groups from the eastern sub-Saharan regions (mainly from the Horn of Africa), as testified by the distribution and the coalescence ages of the A3-M13 and E-M78 lineages (Additional file 2: Figures S3 and S5).

Time estimates and frequency maps of the four trans-Saharan haplogroups and major sub-clades. (a) Time estimates of the four trans-Saharan clades and their main internal lineages. To the left of the timeline, the time windows of the main climatic/historical African events are reported in different colours (legend in the upper left). (b) Frequency maps of the main trans-Saharan clades and sub-clades. For each map, the relative frequencies (percentages) are reported to the right.

R-V88 has been observed at high frequencies in the central Sahel (northern Cameroon, northern Nigeria, Chad and Niger) and it has also been reported at low frequencies in northwestern Africa [37]. Outside the African continent, two rare R-V88 sub-lineages (R-M18 and R-V35) have been observed in Near East and southern Europe (particularly in Sardinia) [30, 37, 38, 39]. Because of its ethno-geographic distribution in the central Sahel, R-V88 has been linked to the spread of the Chadic branch of the Afroasiatic linguistic family [37, 40].

(…) the R-V88 lineages date back to 7.85 kya and its main internal branch (branch 233) forms a “star-like” topology (“Star-like” index = 0.55), suggestive of a demographic expansion. More specifically, 18 out of the 21 sequenced chromosomes belong to branch 233, which includes eight sister clades, five of which are represented by a single subject. The coalescence age of this sub-branch dates back to 5.73 kya, during the last Green Sahara period. Interestingly, the subjects included in the “star-like” structure come from northern Africa or central Sahel, tracing a trans-Saharan axis. It is worth noting that even the three lineages outside the main multifurcation (branches 230, 231 and 232) are sister lineages without any nested sub-structure. The peculiar topology of the R-V88 sequenced samples suggests that the diffusion of this haplogroup was quite rapid and possibly triggered by the Saharan favourable climate (Fig. 2b).

One of the theories I have proposed in the Indo-European demic diffusion model since the first edition – based mainly on phylogeography – is that R1b-V88 lineages probably crossed the Mediterranean through southern Italy into a Green Sahara region, and spread from there through important green corridors, humid areas between megalakes. Even though this new study – like the rest of them – is based solely on modern samples, and as such is quite prone to error in assessing ancient distributions (as we have seen in Europe), a southern Italian route (probably through Sicily) for R1b-V88 and a late expansion through Green Sahara seem more and more likely.

If we accept that the migration of R1b-V88 lineages is the last great expansion through a Green Sahara, then this expansion is a potential candidate for the initial Afroasiatic expansion – whereas older haplogroup expansions would represent languages other than Afroasiatic, and more recent haplogroup expansions would represent subsequent expansions of Afroasiatic dialects, like Semitic, Hamitic, Cushitic, or Chadic – as I explained in an older post.

In absolutely shameless speculative terms, then – as is common today in genetic studies, by the way, so let’s all have some fun here – instead of some sort of R1b/Eurasiatic continuity in Europe, as some autochthonous continuists would like, this could mean that there is an old Afroasiatic – R1b connection. That would imply:

NOTE. Regarding the contribution of CHG ancestry in the Pontic-Caspian steppe cultures, it is usually explained as caused by exogamy, or by absorption of a previous population (as in the Indo-Iranian case), although a contribution of communities of mainly J subclades to the formation of Neolithic steppe cultures cannot be ruled out. As for some autochthonous continuists’ belief in some sort of mythical mixed steppe people with mixed haplogroups and mixed language, well…

Simple Nostratic tree by Bomhard (2008)

The Pre-Indo-European linguistic situation, before the formation of Neolithic steppe cultures, seems like pure speculation, because a) language macro-families (with the exception of Afroasiatic) are highly speculative, b) sound anthropological models are lacking for them, and c) migrations inferred from haplogroup distributions of modern populations are often incorrect:

  • Haplogroup R could then be argued to be the source of Nostratic, and earlier subclades the source of Starostin’s Borean, given the distribution of its subclades in Asia and the timing of their migrations.
  • But of course one could also argue that, given the comparatively late population expansions that genomics is showing, which support Western European linguistic schools – where Russian Nostraticists tend to date languages further back in time – the R1b (and not R) expansion could be the marker of Nostratic languages, due to its most likely southern path (and its old subclades found in Iran and the Caucasus), which would be more in line with the wet dreams of Europeans proposing R1b autochthonous continuity theories. I like this option far less because of that, but it cannot be ruled out.

If you have read this blog before, you know I profoundly dislike lexicostatistical and glottochronological methods, and I don’t like mass comparisons either. Whereas these methods pretend to apply mathematics to big (raw) data where there is almost no knowledge of what one is doing, comparative grammar applies complex reasoning where there is a lot of partially processed data.

But it is always fun to ask “what if they were right?” and follow from there…

See also: