Yamnaya replaced Europeans, but admixed heavily as they spread to Asia


Recent papers The formation of human populations in South and Central Asia, by Narasimhan, Patterson et al. Science (2019) and An Ancient Harappan Genome Lacks Ancestry from Steppe Pastoralists or Iranian Farmers, by Shinde et al. Cell (2019).

NOTE. For direct access to Narasimhan, Patterson et al. (2019), visit this link courtesy of the first author and the Reich Lab.

I am no longer on holidays, and the amount of information in the paper is huge, with many complex issues raised by the new samples and analyses rather than solved, so I will stick to the Indo-European question, especially to some details that have changed since the publication of the preprint. For a summary of its previous findings, see the book series A Song of Sheep and Horses, in particular the sections from A Clash of Chiefs where I discuss languages and regions related to Central and South Asia.

I have updated the maps of the Prehistory Atlas, and included the most recently reported mtDNA and Y-DNA subclades. I will try to update the Eurasian PCA and related graphics, too.

NOTE. Many subclades from this paper have been reported by Kolgeh (download), Pribislav and Principe at Anthrogenica on this thread. I have checked some out for comparison, but wherever my results contradict their analyses, mine would be the wrong ones. I will upload my spreadsheets and link to them from this page whenever I find the time.

Ancestry clines (1) before and (2) after the advent of farming. Colour modified from the original to emphasize the CHG cline: notice the apparent relevance of forest-steppe groups in the formation of this CHG mating network from which Pre-Yamnaya peoples emerged.

Indo-Europeans

I think the Narasimhan, Patterson et al. (2019) paper is well-balanced, and unexpectedly centered – as it should be – on the spread of Yamnaya-related ancestry (now Western_Steppe_EMBA) as the marker of Proto-Indo-European migrations, which stretched ca. 3000 BC “from Hungary in the west to the Altai mountains in the east”, spreading later Indo-European dialects after admixing with local groups, from the Atlantic to South Asia.

I. Afanasievo

I.1. East or West PIE?

I expected Afanasievo to show (1) R1b-L23(xZ2103, xL51) and (2) R1b-L51 lineages, apart from (3) the known R1b-Z2103 ones, pointing thus to an ancestral PIE community before the typical Yamnaya bottlenecks, and with R1b-L51 supporting a connection with North-West Indo-European. The presence of some samples of hg. Q pointed in this direction, too.

However, Afanasievo samples show overwhelmingly R1b-Z2103 subclades (all except for those with low coverage), all apparently under R1b-Z2108 (formed ca. 3500 BC, TMRCA ca. 3500 BC), like most samples from East Yamnaya.

This necessarily shifts the split and spread of R1b-L23 lineages to Khvalynsk/early Repin-related expansions, in line with what the TMRCA suggested, and with what Anthony (2019) and Khokhlov (2018) have advanced about future samples from the Reich Lab.

Given the almost indistinguishable ancestry between Afanasievo and Early Yamnaya, there is as yet little in population genomics to support the idea that Pre-Tocharians were more closely related to North-West Indo-Europeans than to Graeco-Aryans, as is proposed in linguistics based on the few traits shared between them and on the lack of innovations proper to the Graeco-Aryan community.

NOTE. A new issue of Wekʷos contains an abstract from a relevant paper by Blažek on vocabulary for ‘word’, including the common NWIE *wrdʰo-/wordʰo-, but also a new (for me, at least) Northern Indo-European one: *rēki-/*rēkoi̯-, shared by Slavic and Tocharian.

The fact that bottlenecks happened around the time of the late Repin expansion suggests that we might be able to see different clans based on the predominant lineages developing around the Don-Volga area in the 4th millennium BC. The finding of Pre-R1b-L51 in Lopatino (see below), and of a Catacomb sample of hg. R1b-Z2103(Z2105-) in the North Caucasus steppe near Novoaleksandrovskij also support a star-like phylogeny of R1b-L23 stemming from the Don-Volga area.

NOTE. Interestingly, a dismissal of a common trunk between Tocharian and North-West Indo-European would mean that shared similarities between such disparate groups could be traced back to a Common Late PIE trunk, and not to a shared (western) Repin community. For an example of such a ‘pure’ East-West dialectal division, see the diagram of Mallory & Adams (2006) at the end of the post. It would thus mean a fatal blow to Kortlandt’s Indo-Slavonic group among other hypothetical groupings (remade versions of the ancient Centum-Satem division), as well as to certain assumptions about laryngeal survival or tritectalism that usually accompany them. Still, I don’t think this is the case, so the question will remain a linguistic one, and maybe some similarities will be found, with a large enough number of samples, that differentiate Northern Indo-Europeans from the East Yamna/Catacomb-Poltavka-Balkan_EBA group.

Y-chromosome haplogroups of Afanasievo samples and neighbouring groups. See full maps.

I.2. Expansion or resurgence of hg. Q1b?

Haplogroup Q1b-Y6802(xY6798) seems to be the main lineage that expanded with Afanasievo, or resurged in their territory. It’s difficult to tell, because the three available samples belong to the same family and to a later period.

NOTE. I have finally put some order into the chaos of Q1a vs. Q1b subclades in my spreadsheet and in the maps. The change from the ISOGG 2016 to the ISOGG 2017 nomenclature meant that many samples reported as Q1 subclades in papers prepared during the 2017-2018 period, and which did not provide specific SNP calls, were impossible to define with certainty. By checking some of them I could determine the specific standard used.

In favour of the presence of this haplogroup in the Pre-Yamnaya community are:

  • The statement by Anthony (2019) that Q1a [hence maybe Q1b in the new ISOGG nomenclature] represented a significant minority among an R1b-rich community.
  • The sample found in a Sintashta WSHG outlier (see below), of hg. Q1b-Y6798, and the sample from Lola, of hg. Q1b-L717, are thus from other lineage(s) separated thousands of years from the Afanasievo subclade, but might be related to the Khvalynsk expansion, like R1b-V1636 and R1b-M269 are.

These are the data that suggest multiple resurgence events in Afanasievo, rather than expanding Q1b lineages with late Repin:

  • Overwhelming presence of R1b in early Yamnaya and Afanasievo samples; one Q1(xQ1b) sample reported in Khvalynsk.
  • The three Q1b samples appear only later, although the wide CIs of the radiocarbon dates, the different sites, and the indistinguishable ancestry may preclude a proper interpretation of the only available family.
    • Nevertheless, ancestry seems unimportant in the case of Afanasievo, since the same ancestry is found up to the Iron Age in a community of varied haplogroups.
  • Another sample of hg. Q1b-Y6802(xY6798) is found in Aigyrzhal_BA (ca. 2120 BC), with Central_Steppe_EMBA (WSHG-related) ancestry; however, this clade formed and expanded ca. 14000 BC.
  • The whole Altai – Baikal area seems to be a Q1b-L54 hotspot, although admittedly many subclades separated very early from each other, so they might be found throughout North Eurasia during the Neolithic.
  • One Afanasievo sample is reported as of hg. C in Shin (2017), and the same haplogroup is reported by Hollard (2014) for the only available sample of early Chemurchek to date, from Kulala ula, North Altai (ca. 2400 BC).
Y-chromosome haplogroups of late Afanasievo – early Chemurchek samples and neighbouring groups. See full maps.

I.3. Agricultural substrate

Evidence of continuous contacts of Central_Steppe_MLBA populations with BMAC from ca. 2100 BC on – visible in the appearance of Steppe ancestry among BMAC samples and BMAC ancestry among Steppe pastoralists – supports the close interaction between Indo-Iranian pastoralists and BMAC agriculturalists as the origin of the Asian agricultural substrate found in Proto-Indo-Iranian, hence likely related to the language of the Oxus Civilization.

Similar to the European agricultural substrate adopted by West Yamnaya settlers (both NWIE and Palaeo-Balkan speakers), Tocharian shows a few substrate terms in common with Indo-Iranian, which can be explained by contacts in different dialectal stages through phonetic reconstruction alone.

The recent Hermes et al. (2019) supports the early integration of pastoralism and millet cultivation in Central Asia (ca. 2700 BC or earlier), with the spread of agriculture to the north – through the Inner Asian Mountain Corridor – being thus unrelated to the Indo-Iranian expansions, which might support independent loans.

However, compared to the huge number of parallel shared loans between NWIE and Palaeo-Balkan languages in the European substratum, Indo-Iranians seem to have been the first borrowers of vocabulary from Asian agriculturalists, while Proto-Tocharian shows just one certain related word, with phonetic similarities that warrant an adoption from late Indo-Iranian dialects.

Y-chromosome haplogroups of Sintashta, Central Asia, and neighbouring groups in the Early Bronze Age. See full maps.

The finding of hg. (pre-)R1b-PH155 in a BMAC sample from Dzharkutan (to the west of Xinjiang), together with hg. R1b in a sample from Central Mongolia previously reported by Shin (2017), supports the widespread presence of this lineage to the east and west of Xinjiang, which means it might have become incorporated into Indo-Iranian migrants into the Xiaohe horizon, into the Afanasievo-Chemurchek-derived groups, or into the latter from the former. In other words, the Island Biogeography Theory with its explanation of founder effects might be, after all, applicable to the whole Xinjiang area, not only during the Chemurchek – Tianshan-Beilu – Xiaohe interaction.

Of course, there is no need for overly complicated models of haplogroup resurgence events in Central and South Asia, seeing how the number of hg. R1a-L657 samples (a lineage prevalent today among Indo-Aryan speakers from South Asia) among ancient Western/Central_Steppe_MLBA-related samples is exactly 0, and that many different lineages survived in the region. Similar cases of haplogroup resurgence and Y-DNA bottleneck events are also found in the Central and Eastern Mediterranean, and in North-Eastern Europe. From the paper:

[It] could reflect stronger ecological or cultural barriers to the spread of people in South Asia than in Europe, allowing the previously established groups more time to adapt and mix with incoming groups. A second difference is the smaller proportion of Steppe pastoralist-related ancestry in South Asia compared with Europe, its later arrival by ~500 to 1000 years, and a lower (albeit still significant) male sex bias in the admixture (…).

Y-chromosome haplogroups of samples from the Srubna-Andronovo and Andronovo-related horizon, Xiaohe, late BMAC, and neighbouring groups. See full maps.

II. R1b-Beakers replaced R1a-CWC peoples

II.1. R1a-M417-rich Corded Ware

Newly reported Corded Ware samples from Radovesice show hg. R1a-M417, at least some of them xZ645, ‘archaic’ lineages shared with the early Bergrheinfeld sample (ca. 2650 BC) and with the coeval Esperstedt family, hence supporting that it eventually became the typical Western Corded Ware lineage(s), probably dominating over the so-called A-horizon and the Single Grave culture in particular. On the other hand, R1a-Z645 was typical of bottlenecks among expanding Eastern Corded Ware groups.

Interestingly, it is supported once again that known bottlenecks under hg. R1a-M417 happened during the Corded Ware expansion, evidenced also by the remarkably high variability of male lineages among early Corded Ware samples. Similarly, these Corded Ware samples from Bohemia form part of the typical ‘Central European’ cluster in the PCA, which excludes once again not only the ‘official’ Esperstedt outlier I1540, but also the known outlier with Yamnaya ancestry.

NOTE. The fact that Esperstedt is closely related geographically and in terms of ancestry to later Únětice samples further complicates the assumption that Únětice is a mixture of Bell Beakers and Corded Ware, being rather an admixture of incoming Bell Beakers with post-Yamnaya vanguard settlers who admixed with Corded Ware (see more on the expansion of Yamnaya ancestry). In other words, Únětice is rather an admixture of Yamnaya+EEF with Yamnaya+(CWC+EEF).

Y-chromosome haplogroups of samples from Catacomb, Poltavka, Balkan EBA, and Bell Beaker, as well as neighbouring groups. See full maps.

On Ukraine_Eneolithic I6561

If the bottlenecks are as straightforward as they appear, with a star-like phylogeny of R1a-M417 starting with the Pre-Corded Ware expansion, then what is happening with the Alexandria sample, so precisely radiocarbon dated to ca. 4045-3974 BC? The reported hg. R1a-M417 was fully compatible, while R1a-Z645 could be compatible with its date, but the few positive SNPs I got in my analysis point indeed to a potential subclade of R1a-Z94, and I trust more experienced hobbyists in this ‘art’ of ascertaining the SNPs of ancient samples, and they report hg. R1a-Z93 (Z95+, Y26+, Y2-).

Seeing how Y-DNA bottlenecks worked in Yamnaya-Afanasievo and in Corded Ware and related groups, and if this sample really is so deep within R1a-Z93 in a region that should be more strongly affected by the known Neolithic Y-chromosome bottlenecks and forest-steppe ecotone, someone from the lab responsible for this sample should check its date once again, before more people keep chasing their tails with an individual that (based on its derived SNPs’ TMRCA) might actually be dated to the Bronze Age, where it could make much more sense in terms of ancestry and position in the PCA.

EDIT (14 SEP 2019): … and with the fact that he is the first individual to show the genetic adaptation for lactase persistence (13910-T), which is only found later among Bell Beakers, and much later in Sintashta and related Steppe_MLBA peoples (see comments below).

This is also evidenced by the other Ukraine_Eneolithic (likely a late Yamnaya) sample of hg. R1b-Z2103 from Dereivka (ca. 2800 BC), who – despite being in a similar territory 1,000 years later – shows a wholly diluted Yamnaya ancestry under typically European HG ancestry, even more so than other late Sredni Stog samples from Dereivka of ca. 3600-3400 BC, suggesting a decrease in Steppe ancestry rather than an increase – which is supposedly what should be expected based on the ancestry from Alexandria…

Like the reported Chalcolithic individual of Hajji Firuz who showed an apparently incompatible subclade and Yamnaya ancestry at least some 1,000 years before it should, and turned out to be from the Iron Age (see below), this may be another case of wrong radiocarbon dating.

NOTE. It would be interesting, if this turns out to be another Hajji Firuz-like error, to check how well different ancestry models worked in whose hands exactly, and if anyone actually pointed out that this sample was derived, and not ancestral, to many different samples that were used in combination with it. It would also be a great control to check if those still supporting a Sredni Stog origin for PIE would shift their preference even more to the north or west, depending on where the first “true” R1a-M417 samples popped up. Such a finding now could be thus a great tool to discover whether haplogroup-based bias plays a role in ancestry magic as related to the Indo-European question, i.e. if it really is about “pure statistics”, or there is something else to it…

II.2. R1b-L51-rich Bell Beakers

The overwhelming majority of R1b-L51 lineages in Radovesice during the Bell Beaker period, just after the sampled Corded Ware individuals from the same site, further strengthens the hypothesis of an almost full replacement of R1a-M417 lineages from Central Europe up to southern Scandinavia after the arrival of Bell Beakers.

Yet another R1b-L151* sample has popped up in Central Europe, in the individual classified as Bilina_BA (ca. 2200-800 BC), which clusters with Bell Beakers from Bohemia, with the outlier from Turlojiškė, and with Early Slavs, suggesting once again that a group of central-east European Beakers represented the Pre-Proto-Balto-Slavic community before their spread and admixture events to the east.

The available ancient distribution of R1b-L51*, R1b-L52* or R1b-L151* is getting thus closer to the most likely origin of R1b-L51 in the expansion of East Bell Beakers, who trace their paternal ancestors to Yamnaya settlers from the Carpathian Basin:

NOTE. Some of these are from other sources, and some are samples I have checked in a hurry, so I may have missed some derived SNPs. If you send me a corrected SNP call to dismiss one of these, or more ‘archaic’ samples, I’ll correct the map accordingly. See also maps of modern distribution of R1b-M269 subclades.

Distribution of ‘archaic’ R1b-L51 subclades in ancient samples, overlaid over a map of Yamnaya and Bell Beaker migrations. In blue, Yamnaya Pre-L51 from Lopatino (not shown) and R1b-L52* from BBC Augsburg. In violet, R1b-L51 (xP312,xU106) from BBC Prague and Poland. In maroon, hg. R1b-L151* from BBC Hungary, BA Bohemia, and (not shown) a potential sample from BBC at Mondelange, which is certainly xU106, maybe xP312. Interestingly, the earliest sample of hg. R1b-U106 (a lineage more proper of northern Europe) has been found in a Bell Beaker from Radovesice (ca. 2350 BC), between two of these ‘archaic’ R1b-L51 samples; and a sample possibly of hg. R1b-ZZ11+ (ancestral to DF27 and U152) was found in a Bell Beaker from Quedlinburg, Germany (ca. 2290 BC), to the north-west of Bohemia. The oldest R1b-U152 are logically from Central Europe, too.
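The ‘archaic’ labels on this map can be read mechanically from SNP calls. The following is a minimal Python sketch of that reading, not the workflow actually used for the map. It assumes a simplified chain R1b-L23 > L51 > L52 > L151, with P312 and U106 under L151 and Z2103 as a sister branch of L51; the function names and the example calls are hypothetical.

```python
# Simplified R1b-L23 tree (an assumption for illustration): child -> parent.
PARENT = {
    "L23": None,
    "Z2103": "L23",
    "L51": "L23",
    "L52": "L51",
    "L151": "L52",
    "P312": "L151",
    "U106": "L151",
}


def depth(snp):
    """Number of steps from `snp` up to the root of the toy tree."""
    steps = 0
    while PARENT[snp] is not None:
        snp = PARENT[snp]
        steps += 1
    return steps


def is_descendant(snp, ancestor):
    """True if `snp` lies somewhere downstream of `ancestor`."""
    node = PARENT[snp]
    while node is not None:
        if node == ancestor:
            return True
        node = PARENT[node]
    return False


def label(calls):
    """Build a label such as 'R1b-L51(xP312,xU106)' from SNP calls.

    `calls` maps SNP name -> True (derived, '+') or False (ancestral, '-').
    SNPs without a call (e.g. because of low coverage) are simply absent.
    Conflicting calls are not handled in this sketch.
    """
    derived = [s for s, state in calls.items() if state and s in PARENT]
    if not derived:
        return "unassigned"
    terminal = max(derived, key=depth)  # most downstream derived SNP
    excluded = sorted(
        s for s, state in calls.items()
        if state is False and s in PARENT and is_descendant(s, terminal)
    )
    suffix = "(" + ",".join("x" + s for s in excluded) + ")" if excluded else ""
    return "R1b-" + terminal + suffix


# An 'archaic' sample: derived for L51, ancestral for P312 and U106.
print(label({"L51": True, "P312": False, "U106": False}))
# The same sample after a corrected call shows P312 to be derived.
print(label({"L51": True, "P312": True, "U106": False}))
```

The second call illustrates why a single corrected derived SNP is enough to remove a sample from the ‘archaic’ group: the label moves downstream and the exclusions no longer apply.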

III. Proto-Indo-Iranian

Before the emergence of Proto-Indo-Iranian, it seems that Pre-Proto-Indo-Iranian-speaking Poltavka groups were subjected to pressure from Central_Steppe_EMBA-related peoples coming from the (south-?)east, such as those sampled at Mereke_BA. Their ‘kurgan’ culture was dated correctly to approximately the same date as Poltavka materials, but their ancestry and hg. N2(pre-N2a) – also found in a previous sample from Botai – point to their intrusive nature, and thus to difficulties in the Pre-Proto-Indo-Iranian community to keep control over the previous East Yamnaya territory in the Don-Volga-Ural steppes.

We know that the region does not show genetic continuity with a previous period (or was not under this ‘eastern’ pressure) because of an Eastern Yamnaya sample from the same site (ca. 3100 BC) showing typical Yamnaya ancestry. Before Yamnaya, it is likely that Pre-Yamnaya ancestry formed through admixture of EHG-like Khvalynsk with a North Caspian steppe population similar to the Steppe_Eneolithic samples from the North Caucasus Piedmont (see Anthony 2019), so we can also rule out some intermittent presence of a Botai/Kelteminar-like population in the region during the Khvalynsk period.

It is very likely, then, that this competition for the same territory – coupled with the known harsher climate of the late 3rd millennium BC – led Poltavka herders to their known joint venture with Abashevo chiefs in the formation of the Sintashta-Potapovka-Filatovka community of fortified settlements. Supporting these intense contacts of Poltavka herders with Central Asian populations, late ‘outliers’ from the Volga-Ural region show admixture with typical Central_Steppe_MLBA populations: one in Potapovka (ca. 2220 BC), of hg. R1b-Z2103; and four in the Sintashta_MLBA_o1 cluster (ca. 2050-1650 BC), with two samples of hg. R1b-L23 (one R1b-Z2109), one Q1b-L56(xL53), one Q1b-Y6798.

Outlier analysis reveals ancient contacts between sites. We plot the average of principal component 1 (x axis) and principal component 2 (y axis) for the West Eurasian and All Eurasian PCA plots (…). In the Middle to Late Bronze Age Steppe, we observe, in addition to the Western_Steppe_MLBA and Central_Steppe_MLBA clusters (indistinguishable in this projection), outliers admixed with other ancestries. The BMAC-related admixture in Kazakhstan documents northward gene flow onto the Steppe and confirms the Inner Asian Mountain Corridor as a conduit for movement of people.

Similar to how the Sintashta_MLBA_o2 cluster shows an admixture with central steppe populations and hg. R1a-Z645, the WSHG ancestry in those outliers from the o1 cluster of typically (or potentially) Yamnaya lineages shows that Poltavka-like herders survived well after centuries of Abashevo-Poltavka coexistence and admixture events, supporting the formation of a Proto-Indo-Iranian community from the local language as pronounced by the incomers, who dominated as elites over the fortified settlements.

The Proto-Indo-Iranian community likely formed thus in situ in the Don-Volga-Ural region, from the admixture of locals of Yamnaya ancestry with incomers of Corded Ware ancestry – represented by the ca. 67% Yamnaya-like ancestry and ca. 33% ancestry from the European cline. Their community formed thus ca. 1,000 years later than the expansion of Late PIE ca. 3500 BC, and expanded (some 500 years after that) a full-fledged Proto-Indo-Iranian language with the Srubna-Andronovo horizon, further admixing with ca. 9% of Central_Steppe_EMBA (WSHG-related) ancestry in their migration through Central Asia, as reported in the paper.
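As a rough worked example of what these percentages imply together, the sketch below dilutes the ca. 67/33 mixture with the later ca. 9% Central_Steppe_EMBA contribution. It assumes that the 9% is a share of the final population and that the earlier components shrink proportionally; the paper's own models may be parameterised differently, and the function name is hypothetical.

```python
def dilute(components, new_name, new_fraction):
    """Add a source that makes up `new_fraction` of the resulting population.

    Existing components are scaled down proportionally, which is an
    assumption of this sketch rather than a result from the paper.
    """
    if not 0.0 <= new_fraction < 1.0:
        raise ValueError("new_fraction must be in [0, 1)")
    scaled = {
        name: share * (1.0 - new_fraction)
        for name, share in components.items()
    }
    scaled[new_name] = new_fraction
    return scaled


# Approximate proportions quoted in the text.
sintashta_like = {"Yamnaya-like": 0.67, "European cline": 0.33}
steppe_mlba = dilute(sintashta_like, "Central_Steppe_EMBA (WSHG-related)", 0.09)

for name, share in steppe_mlba.items():
    print(f"{name}: {share:.1%}")
# Yamnaya-like: 61.0%
# European cline: 30.0%
# Central_Steppe_EMBA (WSHG-related): 9.0%
```

Under these assumptions, the Yamnaya-like share would still be roughly twice the share from the European cline after the migration through Central Asia.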

IV. Armenian

The sample from Hajji Firuz, of hg. R1b-Z2103 (xPF331), has been – as expected – re-dated to the Iron Age (ca. 1193-1019 BC), hence it may offer – together with the samples from the Levant and their Aegean-like ancestry rapidly diluted among local populations – yet another proof of how the Late Bronze Age upheaval in Europe was the cause of the Armenian migration to the Armenoid homeland, where they thrived under the strong influence from Hurro-Urartian.

Y-chromosome haplogroups of the Middle East and neighbouring groups during the Late Bronze Age / Iron Age. See full maps.

Indus Valley Civilization and Dravidian

A surprise came from the analysis reported by Shinde et al. (2019) of an Iran_N-related IVC ancestry which may have split earlier than 10000 BC from a source common to Iran hunter-gatherers of the Belt Cave.

For the controversial Elamo-Dravidian hypothesis of the Muscovite school, this difference in ancestry between both groups (IVC and Iran Neolithic) seems to be a death blow, if population genomics was even needed for that. Nevertheless, I guess that a full rejection of a recent connection will come down to more recent and subtle population movements in the area.

EDIT (12 SEP): Apparently, Iosif Lazaridis is not so sure about this deep splitting of ‘lineages’ as shown in the paper, so we may be talking about different contributions of AME+ANE/ENA, which means the Elamo-Dravidian game is afoot; at least in genomics.

I shared the idea that the Indus Valley Civilization was linked to the Proto-Dravidian community, so I’m inclined to support this statement by Narasimhan, Patterson, et al. (2019), even if based only on modern samples and a few ancient ones:

The strong correlation between ASI ancestry and present-day Dravidian languages suggests that the ASI, which we have shown formed as groups with ancestry typical of the Indus Periphery Cline moved south and east after the decline of the IVC to mix with groups with more AASI ancestry, most likely spoke an early Dravidian language.

Natural neighbour interpolation of qpAdm results – Maximum A Posteriori Estimate from the Hierarchical Model (estimates used in the Narasimhan, Patterson et al. 2019 figures) for Central_Steppe_MLBA-related (left), Indus_Periphery_West-related (center) and Andamanese_Hunter-Gatherer-related ancestry (right) among sampled modern Indian populations. In blue, peoples of IE language; in red, Dravidian; in pink, Tibeto-Burman; in black, unclassified. See full image.

I am wary of this sort of simplistic correlation with modern speakers, because we have seen what happened with the wrong assumptions about modern Balto-Slavic and Finno-Ugric speakers and their genetic profile (see e.g. here or here). In fact, I just can’t make out the social stratification of the different tribal groups – with their endogamous rules under the varna and jati systems – in the ancestry maps of modern India as well as those with deep knowledge of South Asian history can. The pattern of ancestry and language distribution combined with the findings of ancient populations seems in principle straightforward, though.

Conclusion

The message to take home from Shinde et al. (2019) is that genomic data is fully at odds with the Anatolian homeland hypothesis – including the latest model by Heggarty (2014)* – whose relevance is still overvalued today, probably due in part to the shift of OIT proponents to more reasonable Out-of-Iran models, apparently more fashionable as a vector of Indo-Aryan languages than Eurasian steppe pastoralists?
*The authors listed this model erroneously as Heggarty (2019).

The paper seems to play with the occasional reference to Corded Ware as a vector of expansion of Indo-European languages, even after accepting the role of Yamnaya as the most evident population expanding Late PIE to western Europe – and the different ancestry that spread with Indo-Iranian to South Asia 1,000 years later. However, the most cringe-worthy aspect is the sole citation of the debunked, pseudoscientific glottochronological method used by Ringe, Warnow, and Taylor (2002) to support the so-called “steppe homeland”, a paper and dialectal scheme which keeps being referenced in papers of the Reich Lab, probably as a consequence of its use in Anthony (2007).

On the other hand, these are the equivalent simplistic comments in Narasimhan, Patterson et al. (2019):

The Steppe ancestry in South Asia has the same profile as that in Bronze Age Eastern Europe, tracking a movement of people that affected both regions and that likely spread the unique features shared between Indo-Iranian and Balto-Slavic languages. (…), which despite their vast geographic separation share the “satem” innovation and “ruki” sound laws.

Indo-European dialectal relationships, from Mallory and Adams (2006).

The only academic closely related to linguistics from the list of authors, as far as I know, is James P. Mallory, who has supported a North-West Indo-European dialect (including Balto-Slavic) for a long time – recently associating its expansion with Bell Beakers – opposed thus to a Graeco-Aryan group which shared certain innovations, “Satemization” not being one of them. Not that anyone needs to be a linguist to dismiss any similarities between Balto-Slavic and Indo-Iranian beyond this phonetic trend, mind you.

Even Anthony (2019) now supports R1b-rich Pre-Yamnaya and Yamnaya communities from the Don-Volga region expanding Middle and Late Proto-Indo-European dialects.

So how does the underlying Corded Ware ancestry of eastern Europe (where Pre-Balto-Slavs eventually spread to from Bell Beaker-derived groups) and of the highly admixed (“cosmopolitan”, according to the authors) Sintashta-Potapovka-Filatovka in the east relate to the similar-but-different phonetic trends of two unrelated IE dialects?

If only there was a language substrate that could (as Shinde et al. put it) “elegantly” explain this similar phonetic evolution, solving at the same time the question of the expansion of Uralic languages and their strong linguistic contacts with steppe peoples. Say, Eneolithic populations of mainly hunter-fisher-gatherers from the North Pontic forest-steppes with a stronger connection to metalworking


Mitogenomes show continuity of Neolithic populations in Southern India

New paper (behind paywall) Neolithic phylogenetic continuity inferred from complete mitochondrial DNA sequences in a tribal population of Southern India, by Sylvester et al. Genetica (2018).

This paper used a complete mtDNA genome study of 113 unrelated individuals from the Melakudiya tribal population, a Dravidian speaking tribe from the Kodagu district of Karnataka, Southern India.

Some interesting excerpts (emphasis mine):

Autosomal genetic evidence indicates that most of the ethnolinguistic groups in India have descended from a mixture of two divergent ancestral populations: Ancestral North Indians (ANI) related to People of West Eurasia, the Caucasus, Central Asia and the Middle East, and Ancestral South Indians (ASI) distantly related to indigenous Andaman Islanders (Reich et al. 2009). It is presumed that proto-Dravidian language, most likely originated in Elam province of South Western Iran, and later spread eastwards with the movement of people to the Indus Valley and later the subcontinent India (McAlpin et al. 1975; Cavalli-Sforza et al. 1988; Renfrew 1996; Derenko et al. 2013). West Eurasian haplogroups are found across India and harbor many deep-branching lineages of Indian mtDNA pool, and most of the mtDNA lineages of Western Eurasian ancestry must have a recent entry date less than 10 Kya (Kivisild et al. 1999a). The frequency of these lineages is specifically found among the higher caste groups of India (Bamshad et al. 1998, 2001; Basu et al. 2003) and many caste groups are direct descendants of Indo-Aryan immigrants (Cordaux et al. 2004). These waves of various invasions and subsequent migrations resulted in major demographic expansions in the region, which added new languages and cultures to the already colonized populations of India. Although previous genetic studies of the maternal gene pools of Indians had revealed a genetic connection between Iranian populations and the Arabian Peninsula, likely the result of both ancient and recent gene flow (Metspalu et al. 2004; Terreros et al. 2011).


Haplogroup HV14

mtDNA haplogroup HV14 has prominence in North/Western Europe, West Eurasia, Iran, and South Caucasus to Central Asia (Malyarchuk et al. 2008; Schonberg et al. 2011; Derenko et al. 2013; De Fanti et al. 2015). Although Palanichamy identified haplogroup HV14a1 in three Indian samples (Palanichamy et al. 2015), it is restricted to limited unknown distribution. In the present study, by the addition of considerable sequences from the Melakudiya population, a unique novel subclade designated as HV14a1b was found with a high frequency (43%) allowed us to reveal the earliest diverging sequences in the HV14 tree prior to the emergence of HV14a1b in Melakudiya. (…) The coalescence age for haplogroup HV14 in this study is dated ~ 16.1 ± 4.2 kya and the founder age of haplogroup HV14 in Melakudiya tribe, which is represented by a novel clade HV14a1b is ~ 8.5 ± 5.6 kya

Maximum Parsimonious tree of complete mitogenomes constructed using 38 sequences from Melakudiya tribe and 11 previously published sequences belonging to haplogroup HV14 [Supplementary file Table S2] Suffixes @ indicate back mutation, a plus sign (+) an insertion. Control region mutations are underlined, and synonymous transitions are shown in normal font and non-synonymous mutations are shown in bold font. Coalescence ages (Kya) for complete coding region are shown in normal font and synonymous transitions are shown in Italics

Haplogroup U7a3a1a2

The coalescence age of haplogroup U7a3a1a2 dates to ~ 13.3 ± 4.0 kya. (…)

Although, haplogroup U7 has its origin from the Near East and is widespread from Europe to India, the phylogeny of Melakudiya tribe with subclade U7a3a1a2 clusters with populations of India (caste and tribe) and neighboring populations (Irwin et al. 2010; Ranaweera et al. 2014; Sahakyan et al. 2017), hint about the in-situ origin of the subclade in India from Indo-Aryan immigrants.

I am not a native English speaker, but this paper looks like it needs a revision by one.

Also, without comparison with ancient DNA, a coalescence age alone is not enough to prove that a haplogroup expanded in the Neolithic rather than through later bottlenecks. However, since we are talking about mtDNA, it is likely that their analysis is mostly right.
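To make that point concrete, it helps to write out the ranges implied by the ages quoted above. The following is a minimal sketch in Python: the three estimates are taken from the excerpts, but reading the ± value as a simple symmetric bound is my assumption, and the names `ESTIMATES`, `RANGES` and `age_range` are only illustrative.

```python
# Ages in kya (thousand years ago) as quoted from the paper: (estimate, error).
ESTIMATES = {
    "HV14": (16.1, 4.2),
    "HV14a1b": (8.5, 5.6),
    "U7a3a1a2": (13.3, 4.0),
}


def age_range(estimate, error):
    """Return (youngest, oldest) age in kya, assuming a symmetric +/- bound."""
    return round(estimate - error, 1), round(estimate + error, 1)


# Range implied by each quoted estimate.
RANGES = {name: age_range(*value) for name, value in ESTIMATES.items()}

if __name__ == "__main__":
    for name, (youngest, oldest) in RANGES.items():
        print(f"{name}: {youngest}-{oldest} kya")
```

Under this reading the founder age of HV14a1b runs from about 2.9 to 14.1 kya, a span wide enough to fit either a Neolithic expansion or a much later bottleneck.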

Finally, it is one thing to prove that the origin of the Indus Valley Civilization lies (in part) in peoples from the Iranian plateau, and to show with ASI ancestry that they are probably the origin of the Proto-Dravidian expansion, and a completely different thing to prove an Elamo-Dravidian connection.

Since that group is not really accepted in linguistics, it is like talking about proving – through that Iran Neolithic ancestry – a Sumero-Dravidian, or a Hurro-Dravidian connection…

Related

Sahara’s rather pale-green and discontinuous Sahelo-Sudanian steppe corridor, and the R1b – Afroasiatic connection

palaeolakes-world

Interesting new paper (behind paywall) Megalakes in the Sahara? A Review, by Quade et al. (2018).

Abstract (emphasis mine):

The Sahara was wetter and greener during multiple interglacial periods of the Quaternary, when some have suggested it featured very large (mega) lakes, ranging in surface area from 30,000 to 350,000 km². In this paper, we review the physical and biological evidence for these large lakes, especially during the African Humid Period (AHP) 11–5 ka. Megalake systems from around the world provide a checklist of diagnostic features, such as multiple well-defined shoreline benches, wave-rounded beach gravels where coarse material is present, landscape smoothing by lacustrine sediment, large-scale deltaic deposits, and in places, tufas encrusting shorelines. Our survey reveals no clear evidence of these features in the Sahara, except in the Chad basin. Hydrologic modeling of the proposed megalakes requires mean annual rainfall ≥1.2 m/yr and a northward displacement of tropical rainfall belts by ≥1000 km. Such a profound displacement is not supported by other paleo-climate proxies and comprehensive climate models, challenging the existence of megalakes in the Sahara. Rather than megalakes, isolated wetlands and small lakes are more consistent with the Sahelo-Sudanian paleoenvironment that prevailed in the Sahara during the AHP. A pale-green and discontinuously wet Sahara is the likelier context for human migrations out of Africa during the late Quaternary.

The whole review is an interesting read, but here are some relevant excerpts:

Various researchers have suggested that megalakes coevally covered portions of the Sahara during the AHP and previous periods, such as paleolakes Chad, Darfur, Fezzan, Ahnet-Mouydir, and Chotts (Fig. 2, Table 2). These proposed paleolakes range in size by an order of magnitude in surface area from the Caspian Sea–scale paleo-Lake Chad at 350,000 km² to Lake Chotts at 30,000 km². At their maximum, megalakes would have covered ~ 10% of the central and western Sahara, similar to the coverage by megalakes Victoria, Malawi, and Tanganyika in the equatorial tropics of the African Rift today. This observation alone should raise questions of the existence of megalakes in the Sahara, and especially if they developed coevally. Megalakes, because of their significant depth and area, generate large waves that become powerful modifiers of the land surface and leave conspicuous and extensive traces in the geologic record.

megalakes-sahara
ETOPO1 digital elevation model (1 arc-minute; Amante and Eakins, 2009) of proposed megalakes in the Sahara Desert during the late Quaternary. Colors denote Köppen-Geiger climate zones: blue, Aw, Af, Am (tropical); light tan, Bwk, BSh, BSk, Csa, Csb, Cwb, Cfa, Cfb (temperate); red-brown, Bwh (arid, hot desert and steppe climate). Lake area at proposed megalake high stands and present Lake Victoria are in blue, and contributing catchment areas are shown as thin solid black lines. The main tributaries of Lake Chad are denoted by blue lines (from west to east: the Komadougou-Yobe, Logone, and Chari Rivers; source: Global Runoff Data Center, Koblenz, Germany). Rainfall isohyets (50, 200, 800, 1200, and 1600) are marked in dashed gray-scale lines. Physical parameters of each basin are shown in white boxes: Abt, total basin area; AW, lake area; Vw, lake volume; and aW= AW/Abt. Black dots mark the location of the paleohydrological records from Lezine et al. (2011), also compiled in Supplementary Table S5.

Lakes, megalakes, and wetlands

Active ground-water discharge systems abound in the Sahara today, although they were much more widespread in the AHP. They range from isolated springs and wet ground in many oases scattered across the Sahara (e.g., Haynes et al., 1989) to wetlands and small lakes (Kröpelin et al., 2008). Ground water feeding these systems is dominated by fossil AHP-age and older water (e.g., Edmunds and Wright 1979; Sonntag et al., 1980), although recently recharged water (<50 yr) has been locally identified in Saharan ground water (e.g., Sultan et al., 2000; Maduapuchi et al., 2006).

Megalake Chad

In our view, Lake Chad is the only former megalake in the Sahara firmly documented by sedimentologic and geomorphic evidence. Mega-Lake Chad is thought to have covered ~ 345,000 km², stretching for nearly 8° (10–18°N) of latitude (Ghienne et al., 2002) (Fig. 2). The presence of paleo-Lake Chad was at one point challenged, but several—and in our view very robust—lines of evidence have been presented to support its development during the AHP. These include: (1) clear paleo-shorelines at various elevations, visible on the ground (Abafoni et al., 2014) and in radar and satellite images (Schuster et al., 2005; Drake and Bristow, 2006; Bouchette et al., 2010); (2) sand spits and shoreline berms (Thiemeyer, 2000; Abafoni et al., 2014); and (3) evaporites and aquatic fauna such as fresh-water mollusks and diatoms in basin deposits (e.g., Servant, 1973; Servant and Servant, 1983). Age determinations for all but the Holocene history of mega-Lake Chad are sparse, but there is evidence for Mio-Pliocene lake(s) (Lebatard et al., 2010) and major expansion of paleo-Lake Chad during the AHP (LeBlanc et al., 2006; Schuster et al., 2005; Abafoni et al., 2014; summarized in Armitage et al., 2015) up to the basin overflow level at ~ 329 m asl.

Insights from hydrologic mass balance of megalakes

sahara-annaul-rainfall
Graph of mean annual rainfall (mm/yr) versus aw (area lake/area basin, AW/AL); their modeled relationship using our Sahelo-Sudanian hydrologic model for the different lake basins are shown as solid colored lines. Superimposed on this (dashed lines) are the aw values for individual megalake basins and the mean annual rainfall required to sustain them. Mean annual paleo-rainfall estimates of 200–400 mm/yr during the AHP from fossil pollen and mollusk evidence is shown as a tan box. The intersection of this box with the solid colored lines describes the resulting aw for Saharan paleolakes on the y-axis. The low predicted values for aw suggest that very large lakes would not form under Sahelo-Sudanian conditions where sustained by purely local rainfall and runoff. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Using these conservative conditions (i.e., erring in the direction that will support megalake formation), our hydrologic models for the two biggest central Saharan megalakes (Darfur and Fezzan) require minimum annual average rainfall amounts of ~ 1.1 m/yr to balance moisture losses from their respective basins (Supplementary Table S1). Lake Chad required a similar amount (~1 m/yr; Supplementary Table S1) during the AHP according to our calculations, but this is plausible, because even today the southern third of the Chad basin receives ≥1.2 m/yr (Fig. 2) and experiences a climate similar to Lake Victoria. A modest 5° shift in the rainfall belt would bring this moist zone northward to cover a much larger portion of the Chad basin, which spans N13° ±7°. Estimated rainfall rates for Darfur and Fezzan are slightly less than the average of ~ 1.3 m/yr for the Lake Victoria basin, because of the lower aw values, that is, smaller areas of Saharan megalakes compared with their respective drainage basins (Fig. 15).
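The link between rainfall and aw in this excerpt can be illustrated with a generic steady-state water balance for a closed basin. To be clear, this is not the authors' Sahelo-Sudanian hydrologic model, which is not reproduced here. It is only a textbook-style sketch in Python that assumes rain falling on the lake plus a fixed fraction of the rain falling on the rest of the basin must equal evaporation from the lake surface. The function names, the runoff coefficient and the evaporation rate are all hypothetical parameters chosen for illustration.

```python
def required_rainfall(aw, evaporation, runoff_coeff):
    """Rainfall (m/yr) needed to hold a closed lake at area ratio aw.

    Assumed balance per unit basin area (not the paper's model):
        P * aw + runoff_coeff * P * (1 - aw) = evaporation * aw
    where aw is lake area divided by total basin area.
    """
    if not 0 < aw <= 1:
        raise ValueError("aw must be in (0, 1]")
    return evaporation * aw / (runoff_coeff * (1 - aw) + aw)


def sustainable_aw(rainfall, evaporation, runoff_coeff):
    """Inverse relation: lake/basin area ratio sustained by a given rainfall."""
    if rainfall > evaporation:
        # More rain than the lake can evaporate: a closed basin would overflow.
        raise ValueError("rainfall exceeds lake evaporation; basin overflows")
    return rainfall * runoff_coeff / (evaporation - rainfall * (1 - runoff_coeff))


if __name__ == "__main__":
    # Hypothetical inputs: 2 m/yr lake evaporation, 5% of land rainfall as runoff.
    for aw in (0.01, 0.05, 0.10):
        print(aw, required_rainfall(aw, evaporation=2.0, runoff_coeff=0.05))
```

Even in this toy version, the required rainfall rises with aw whenever the runoff coefficient is below 1, which is the qualitative behaviour the excerpt relies on: a lake that covers a large share of its basin cannot be fed by modest local rainfall and runoff alone.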

Estimates of paleo-rainfall during the AHP

Here major contradictions develop between the model outcomes and paleo-vegetation evidence, because our Sahelo-Sudanian hydrologic model predicts wetter conditions and therefore more tropical vegetation assemblages than found around Lake Victoria today. In fact, none of the very wet rainfall scenarios required by all our model runs can be reconciled with the relatively dry conditions implied by the fossil plant and animal evidence. In short, megalakes cannot be produced in Sahelo-Sudanian conditions past or present; to form, they require a tropical or subtropical setting, and major displacements of the African monsoon or extra-desert moisture sources.

sahara-palaeoclimate
Change in mean annual precipitation over northern Africa between mid-Holocene (6 ka) and pre-industrial conditions in PMIP3 models (affiliations are provided in Supplementary Table S4). Lakes Victoria and Chad outlined in blue. (a) Ensemble mean change in mean annual precipitation and positions of the African summer (July–September) ensemble mean ITCZ during mid-Holocene (solid red line) and pre-industrial conditions (solid blue line). (b) Zonal average of change in mean annual precipitation over land (20°W–30°E) for the ensemble mean (thick black) and individual models are listed on right). The range of minimal estimated change in mean annual precipitation required to sustain steppe is shown in shaded green (Jolly et al., 1998).

Conclusions

If not megalakes, what size lakes, marshes, discharging springs, and flowing rivers in the Sahara were sustainable in Sahelo-Sudanian climatic conditions? For lakes and perennial rivers to be created and sustained, net rainfall in the basin has to exceed loss to evapotranspiration, evaporation, and infiltration, yielding runoff that then supplies a local lake or river. Our hydrologic models (see Supplementary Material) and empirical observations (Gash et al., 1991; Monteith, 1991) for the Sahel suggest that this limit is in the 200–300 mm/yr range, meaning that most of the Sahara during the AHP was probably too dry to support very large lakes or perennial rivers by means of local runoff. This does not preclude creation of local wetlands supplied by ground-water recharge focused from a very large recharge area or forced to the surface by hydrologic barriers such as faults, nor megalakes like Chad supplied by moisture from the subtropics and tropics outside the Sahel. But it does raise a key question concerning the size of paleolakes, if not megalakes, in the Sahara during the AHP. Our analysis suggests that Sahelo-Sudanian climate could perhaps support a paleolake approximately ≤5000 km² in area in the Darfur basin and ≤10,000–20,000 km² in the Fezzan basin. These are more than an order of magnitude smaller than the megalakes envisioned for these basins, but they are still sizable, and if enclosed in a single body of water, should have been large enough to generate clear shorelines (Enzel et al., 2015, 2017). On the other hand, if surface water was dispersed across a series of shallow and extensive but partly disconnected wetlands, as also implied by previous research (e.g., Pachur and Hoelzmann, 1991), then shorelines may not have developed.

One of the underdeveloped ideas of my Indo-European demic diffusion model was that R1b-V88 had migrated through South Italy to Northern Africa, and from there southwards along the Sahara Green Corridor. From that region the “upside-down” view of Bender (2007) could have unfolded, i.e. Afroasiatic expanding westwards within the Green Sahara, precisely at this time, from a homeland near the Megalake Chad region (see here).

Whether or not R1b-V88 brought the ‘original’ lineage that expanded Afroasiatic languages may be contested, but after D’Atanasio et al. (2018) it seems that only two lineages, E-M2 and R1b-V88, fit the ‘star-like’ structure suggesting an appropriate haplogroup expansion and the necessary regional distribution that could explain the spread of Afroasiatic languages within a reasonable time frame.

palaeolithic
Palaeolithic migrations

This review shows that the hypothesized Green Sahara corridor full of megalakes, which some proposed had fully connected Africa from west to east, was actually a strip of Sahelo-Sudanian steppe extending north of its current distribution and including the Chad megalake, East Africa and Arabia, with only discontinuous local wetlands further to the north in Africa. This greenish belt would probably have allowed the initial spread of early Afroasiatic proto-languages only through the southern part of the current Sahara Desert. This, together with the R1b-V88 haplogroup distribution in Central and North Africa (with a prevalence among Chadic speakers probably due to later bottlenecks) and the Near East, leaves even fewer possibilities for an expansion of Afroasiatic from anywhere else.

If my proposal turns out to be correct, this Afroasiatic-like language would be the one that some have detected in the vocabulary of Old European and North European local groups (viz. Kroonen for the Agricultural Substrate Hypothesis), rather than a language tied to Anatolian farmer ancestry or haplogroup G2, which would have been rather confined to Southern Europe, mainly south of the Loess line, where incoming Middle East farmers encountered the main difficulties spreading agriculture and herding, and where they eventually admixed with local hunter-gatherers.

NOTE. If related to attested languages before the Roman expansion, Tyrsenian would be a good candidate for a descendant of the language of Anatolian farmers, given the more recent expansion of Anatolian ancestry to the Tuscan region (even if already influenced by Iran farmer ancestry), which reinforces its direct connection to the Aegean.

The fiercest opposition to this R1b-V88 – Afroasiatic connection may come from:

  • Traditional Hamito-Semitic scholars, who try to look for any parent language almost invariably in or around the Near East – the typical “here it was first attested, ergo here must be the origin, too”-assumption (coupled with the cradle of civilization memes) akin to the original reasons behind Anatolian or Out-of-India hypotheses; and of course
  • autochthonous continuity theories based on modern subclades, of (mainly Semitic) peoples of haplogroup E or J, who will root for either one or the other as the Afroasiatic source no matter what. As we have seen with the R1a – Indo-European hypothesis (see here for its history), this is never the right way to look at prehistoric migrations, though.

I proposed that R1a-M417 was the lineage marking an expansion of Indo-Uralic from the east near Lake Baikal, then obviously connected to Yukaghir and Altaic languages marked by R1a-M17, and that haplogroup R could then be the source of a hypothetical Nostratic expansion (where R2 could mark the Dravidian expansion), with upper clades being maybe responsible for Borean.

nostratic-tree
Simple Nostratic tree by Bomhard (2008)

However, recent studies have shown early expansions of R1b-P297 to East Europe (Mathieson et al. 2017 & 2018), and of R1b-M73 to East Eurasia probably up to Siberia, and possibly reaching the Pacific (Jeong et al. 2018). Also, the Steppe Eneolithic and Caucasus Eneolithic clusters seen in Wang et al. (2018) would be able to explain the WHG – EHG – ANE ancestry cline seen in Mesolithic and Neolithic Eurasia without a need for westward migrations.

After Narasimhan et al. (2018) and Damgaard et al. (Science 2018), Dravidian is now more and more likely to be linked to the expansion of the Indus Valley civilization and haplogroup J, in turn strongly linked to Iranian farmer ancestry, thus giving support to an Elamo-Dravidian group stemming from Iran Neolithic.

NOTE. This Dravidian-IVC and Iran connection has been supported for years by knowledgeable bloggers and commenters alike, see e.g. one of Razib Khan’s posts on the subject. This rather early support for what is obvious today is probably behind the reactionary views by some nationalist Hindus, who probably saw in this a potential reason for a strengthened Indo-Aryan/Dravidian divide adding to the religious patchwork that is modern India.

I am not in a good position to judge Nostratic, and I don’t think glottochronology, Swadesh lists, or any statistical methods applied to a bunch of words are of any use, here or anywhere. The work of pioneers like Illich-Svitych or Starostin, on the other hand, seems to me a solid attempt to obtain a faithful reconstruction, if rather outdated today.

NOTE. I am still struggling to learn more about Uralic and Indo-Uralic; not because it is more difficult than Indo-European, but because – in comparison to PIE comparative grammar – material about them is scarce, and the few available sources are sometimes contradictory. My knowledge of Afroasiatic is limited to Semitic (Arabic and Akkadian), and the field is not much more developed here than for Uralic…

y-haplogroup-r1b-p343
Spread of Y-haplogroup R1b(xM269) in Eurasia, according to Jeong et al. (2018).

If one wanted to support a Nostratic proto-language, though, without being able to take into account genome-wide autosomal admixture, the only haplogroup that can currently connect the expansion of all its branches is R1b-M343:

  • R1b-L278 expanded from Asia to Europe through the Iranian Plateau, since early subclades are found in Iran and the Caucasus region, thus supporting the separation of Elamo-Dravidian and Kartvelian branches;
  • From the Danube or another European region ‘near’ the Villabruna 1 sample (of haplogroup R1b-L754):
    • R1b-V88 expanding everywhere in Europe, and especially the branch expanding to the south into Africa, may be linked to the initial Afroasiatic expansion through the Pale-Green Sahara corridor (and even a hypothetical expansion with E-M2 subclades and/or from the Middle East would also leave open the influence of V88 and previous R1b subclades from the Middle East in the emergence of the language);
    • R1b-P297 subclades expanding to the east may be linked to Eurasiatic, giving rise to both Indo-Uralic (M269) and Macro- or Micro-Altaic (M73) expansions.

This is shameless, simplistic speculation, of course, but no more so than the Nostratic hypothesis, and it has the main advantage of offering ‘small and late’ language expansions relative to other proposals spanning thousands (or even tens of thousands) of years more of language separation. On the other hand, that would leave Borean out of the question, unless the initial expansion of R1b subclades happened from a community close to Lake Baikal (and Mal’ta) that was also at the origin of the other supposedly related Borean branches, whether linked to haplogroup R or to any other…

NOTE. If Afroasiatic and Indo-Uralic (or Eurasiatic) are not genetically related, my previous simplistic model, R1b-Afroasiatic vs. R1a-Eurasiatic, may still be supported, with R1a-M17 potentially marking the latest meaningful westward population expansion from which EHG ancestry might have developed (see here). Without detailed works on Nostratic comparative grammar and dialectalization, and especially without a lot more Palaeolithic and Mesolithic samples, all this will remain highly speculative, like proposals of the 2000s about Y-DNA-haplogroup – language relationships.

Related: