The cradle of Russians, an obvious Finno-Volgaic genetic hotspot


First look at an accepted manuscript (behind paywall), Genome-wide sequence analyses of ethnic populations across Russia, by Zhernakova et al. Genomics (2019).

Interesting excerpts:

There remain ongoing discussions about the origins of the ethnic Russian population. The ancestors of ethnic Russians were among the Slavic tribes that separated from the early Indo-European Group, which included ancestors of modern Slavic, Germanic and Baltic speakers, who appeared in the northeastern part of Europe ca. 1,500 years ago. Slavs were found in the central part of Eastern Europe, where they came in direct contact with (and likely assimilation of) the populations speaking Uralic (Volga-Finnish and Baltic-Finnish), and also Baltic languages [11–13]. In the following centuries, Slavs interacted with the Iranian-Persian, Turkic and Scandinavian peoples, all of which in succession may have contributed to the current pattern of genome diversity across the different parts of Russia. At the end of the Middle Ages and in the early modern period, there occurred a division of the East Slavic unity into Russians, Ukrainians and Belarusians. It was the Russians who drove the colonization movement to the East, although other Slavic, Turkic and Finnish peoples took part in this movement, as the eastward migrations brought them to the Ural Mountains and further into Siberia, the Far East, and Alaska. During that interval, the Russians encountered the Finns, Ugrians, and Samoyeds speakers in the Urals, but also the Turkic, Mongolian and Tungus speakers of Siberia. Finally, in the great expanse between the Altai Mountains on the border with Mongolia, and the Bering Strait, they encountered paleo-Asiatic groups that may be genetically closest to the ancestors of the Native Americans. Today’s complex patchwork of human diversity in Russia has continued to be augmented by modern migrations from the Caucasus, and from Central Asia, as modern economic migrations take shape.

Sample relatedness based on genotype data. Eurasia: Principal Component plot of 574 modern Russian genomes. Colors reflect geographical regions of collection; shapes reflect the sample source. Red circles show the location of Genome Russia samples.

In the current study, we annotated whole genome sequences of individuals currently living on the territory of Russia and identifying themselves as ethnic Russian or as members of a named ethnic minority (Fig. 1). We analyzed genetic variation in three modern populations of Russia (ethnic Russians from Pskov and Novgorod regions and ethnic Yakut from the Sakha Republic), and compared them to the recently released genome sequences collected from 52 indigenous Russian populations. The incidence of function-altering mutations was explored by identifying known variants and novel variants and their allele frequencies relative to variation in adjacent European, East Asian and South Asian populations. Genomic variation was further used to estimate genetic distance and relationships, historic gene flow and barriers to gene flow, the extent of population admixture, historic population contractions, and linkage disequilibrium patterns. Lastly, we present demographic models estimating historic founder events within Russia, and a preliminary HapMap of ethnic Russians from the European part of Russia and Yakuts from eastern Siberia.

Sample relatedness based on genotype data. Western Russia and neighboring countries: Principal Component plot of 574 modern Russian genomes. Colors reflect geographical regions of collection; shapes reflect the sample source. Red circles show the location of Genome Russia samples.

The collection of identified SNPs was used to inspect quantitative distinctions among 264 individuals from across Eurasia (Fig. 1) using Principal Component Analysis (PCA) (Fig. 2). The first and the second eigenvectors of the PCA plot are associated with longitude and latitude, respectively, of the sample locations and accurately separate Eurasian populations according to geographic origin. East European samples cluster near Pskov and Novgorod samples, which fall between northern Russians, Finno-Ugric peoples (Karelian, Finns, Veps etc.), and other Northeastern European peoples (Swedes, Central Russians, Estonian, Latvians, Lithuanians, and Ukrainians) (Fig. 2b). Yakut individuals map into the Siberian sample cluster as expected (Fig. 2a). To obtain an extended view of population relationships, we performed a maximum likelihood-based estimation of ancestry and population structure using ADMIXTURE [46] (Fig. 2c). The Novgorod and Pskov populations show similar profiles with their Northeastern European ancestors while the Yakut ethnic group showed mixed ancestry similar to the Buryat and Mongolian groups.
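For readers unfamiliar with how a PCA separates samples from genotype data, here is a minimal sketch in plain Python. It is not the pipeline used in the paper (the excerpt does not describe it); the toy genotype matrix and the function name `first_pc` are invented for illustration, and real analyses normally rely on dedicated tools and allele-frequency normalization.

```python
def first_pc(genotypes, iterations=200):
    """Return PC1 loadings per sample from a samples x SNPs matrix of 0/1/2 counts."""
    n_samples = len(genotypes)
    n_snps = len(genotypes[0])
    # Center every SNP column on its mean across samples.
    means = [sum(row[j] for row in genotypes) / n_samples for j in range(n_snps)]
    centered = [[row[j] - means[j] for j in range(n_snps)] for row in genotypes]
    # Sample-by-sample covariance-like (Gram) matrix.
    gram = [[sum(a * b for a, b in zip(ri, rk)) / n_snps for rk in centered]
            for ri in centered]
    # Power iteration converges to the leading eigenvector of the Gram matrix.
    vec = [float(i + 1) for i in range(n_samples)]
    for _ in range(iterations):
        nxt = [sum(gram[i][k] * vec[k] for k in range(n_samples))
               for i in range(n_samples)]
        norm = sum(x * x for x in nxt) ** 0.5
        vec = [x / norm for x in nxt]
    return vec

# Hypothetical genotypes: two samples from one population, two from another.
toy_genotypes = [
    [0, 0, 2, 2, 0],
    [0, 0, 2, 2, 1],
    [2, 2, 0, 0, 1],
    [2, 2, 0, 0, 0],
]
pc1 = first_pc(toy_genotypes)
```

In this toy case the first two samples get PC1 values of one sign and the last two the opposite sign, which is the same kind of separation the paper's plots show at a much larger scale.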

Population structure across samples in 178 populations from five major geographic regions (k=5). Samples are pooled across three different studies that covered the territory of Russian Federation (Mallick et al. 2016 [36], Pagani et al. 2016 [37], this study). The optimal k-value was selected by value of cross validation error. Russian samples from all studies (highlighted in bold dark blue) show a slight gradient from Eastern European (Ukrainian, Belorussian, Polish) to North European (Estonian, Karelian, Finnish) structures, reflecting population history of northward expansion. Yakut samples from different studies (highlighted in bold red) also show a slight gradient from Mongolian to Siberian people (Evens), as expected from their original admixture and northward expansions. The samples originated from this study are highlighted, and plotted in separated boxes below.

Possible admixture sources of the Genome Russia populations were addressed more formally by calculating F3 statistics, which is an allele frequency-based measure, allowing to test if a target population can be modeled as a mixture of two source populations [48]. Results showed that Yakut individuals are best modeled as an admixture of Evens or Evenks with various European populations (Supplemental Table S4). Pskov and Novgorod showed admixture of European with Siberian or Finno-Ugric populations, with Lithuanian and Latvian populations being the dominant European sources for Pskov samples.
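To make the F3 logic concrete: for a target C and sources A and B, the statistic averages (c - a)(c - b) over SNPs, where a, b and c are allele frequencies, and a clearly negative value suggests that C is a mixture of populations related to A and B. The sketch below uses the naive estimator only (no bias correction, no block jackknife for significance), and all allele frequencies are hypothetical, not taken from the paper.

```python
def f3(target, source_a, source_b):
    """Naive f3(target; A, B): mean of (c - a) * (c - b) over SNPs."""
    products = [(c - a) * (c - b)
                for c, a, b in zip(target, source_a, source_b)]
    return sum(products) / len(products)

# Hypothetical allele frequencies at five SNPs (not real data).
source_a = [0.10, 0.80, 0.35, 0.60, 0.05]
source_b = [0.70, 0.20, 0.55, 0.10, 0.90]

# A target built as a 50/50 mixture of the two sources.
mixed = [0.5 * a + 0.5 * b for a, b in zip(source_a, source_b)]

# A target that only drifted away from source A, with no input from B.
drifted = [0.05, 0.90, 0.30, 0.70, 0.00]

f3_mixed = f3(mixed, source_a, source_b)      # negative: consistent with admixture
f3_drifted = f3(drifted, source_a, source_b)  # positive: no admixture signal
```

Note that a non-negative value does not rule out admixture, since strong drift in the target can mask the signal; only the negative case is informative.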

The heatmaps of gene flow barriers show for each point at the geographical map the interpolated differences in allele frequencies (AF) between the estimated AF at the point with AFs in the vicinity of this point. The direction of the maximal difference in allele frequencies is coded by colors and arrows.

So, Russians expanded in the Middle Ages as acculturated Finno-Volgaic peoples.

Or maybe the true Germano-Slavonic™-speaking area was in north-eastern Europe, until the recent arrival of Finno-Permians with the totally believable Nganasan-Saami horde, whereas Yamna -> Bell Beaker represented Vasconic-Caucasian expanding all over Europe in the Bronze Age. Because steppe ancestry in Fennoscandia and Modern Basques in Iberia.

A really hard choice between equally plausible models.


The genetic and cultural barrier of the Pontic-Caspian steppe – forest-steppe ecotone


We know that the Caucasus Mountains formed a persistent prehistoric barrier to cultural and population movements. Nevertheless, an even more persistent frontier to population movements in Europe, especially since the Neolithic, is the Pontic-Caspian steppe – forest-steppe ecotone.

Like the Caucasus, this barrier could certainly be crossed, and peoples and cultures could permeate in both directions, but there have been no massive migrations through it. The main connection between both regions (steppe vs. forest-steppe/forest zone) was probably through its eastern part, through the Samara region in the Middle Volga.

The chances of population expansions crossing this natural barrier anywhere else seem quite limited, with a much less porous crossing region in the west, through the Dnieper-Dniester corridor.

A persistent ecological and cultural frontier

It is very difficult to think about any culture that transgressed this persistent ecological and cultural frontier: many prehistoric and historical steppe pastoralists did appear eventually in the neighbouring forest-steppe areas during their expansions (e.g. Yamna, Scythians, or Turks), as did forest groups who permeated to the south (e.g. Comb Ware, GAC, or Abashevo), but their respective hold in foreign biomes was mostly temporary, because their cultures had to adapt to the new ecological environment. Most if not all groups originally from a different ecological niche eventually disappeared, subjected to renewed demographic pressure from neighbouring steppe or forest populations…

The Samara region in the Middle Volga may be pointed out as the true prehistoric link between forests and steppes (see David Anthony’s remarks), something reflected in its nature as a prehistoric sink in genetics. This strong forest – forest-steppe – steppe connection was seen in the Eurasian technocomplex, during the expansion of hunter-gatherer pottery, in the expansion of Abashevo peoples to the steppes (in one of the most striking cases of population admixture in the area), with Scythians (visible in the intense contacts with Ananyino), and with Turks (Volga Turks).

Simplified map of the distribution of steppes and forest-steppes (Pontic and Pannonian) and xeric grasslands in Eastern Central Europe (with adjoining East European ranges) with their regionalisation as used in the review (Northern—Pannonic—Pontic). Modified from Kajtoch et al. (2016).

Before the emergence of pastoralism, the cultural contacts of the Pontic region (i.e. forest-steppes) with the Baltic were intense. In fact, the connection of the north Pontic area with the Baltic through the Dnieper-Dniester corridor and the Podolian-Volhynian region is essential to understand the spread of peoples of post-Maglemosian and post-Swiderian cultures (to the south), hunter-gatherer pottery (to the north), TRB (to the south), Late Trypillian groups (north), GAC (south), or Comb Ware (south) (see here for Eneolithic movements), and finally steppe ancestry and R1a-Z645 with Corded Ware (north). After the complex interaction of TRB, Trypillia, GAC, and CWC during the expansion of late Repin, this traditional long-range connection is lost and only emerges sporadically, such as with the expansion of East Germanic tribes.

A barrier to steppe migrations into northern Europe

One may think that this barrier was more permeable, then, in the past. However, the frontier is between steppe and forest-steppe ecological niches, and this barrier evolved during prehistory due to climate changes. The problem is, before the drought that began ca. 4000 BC and increased until the Yamna expansion, the steppe territory in the north Pontic region was much smaller, merely a strip of coastal land, compared to its greater size ca. 3300 BC and later.

This – apart from the cultural and technological changes associated with nomadic pastoralism – justifies the traditional connection of the north Pontic forest-steppes to the north, broken precisely after the expansion of Khvalynsk, as the north Pontic area became gradually a steppe region. The strips of north Pontic and Azov steppes and Crimea seem to have had stronger connections to the Northern Caucasus and Northern Caspian steppes than with the neighbouring forest-steppe areas during the Upper Palaeolithic, Mesolithic, and Neolithic.

NOTE. We still don’t know the genetic nature of Mikhailovka or Ezero, steppe-related groups possibly derived from Novodanilovka and Suvorovo close to the Black Sea (which possibly include groups from the Pannonian plains), and how they compare to neighbouring typically forest-steppe cultures of the so-called late Sredni Stog groups, like Dereivka or partly Kvityana.

Typical migration routes through European steppes and forest-steppes. Red line represents the persistent cultural and genetic barrier, with the latest evolution in steppe region represented by the shift from dashed line to the north. Arrows show the most common population movements. Modified from Kajtoch et al. (2016).

Despite the Pontic-Caspian steppes and forest-steppes neighbouring each other for ca. 2,000 km, peoples from forested and steppe areas had an obvious advantage in their own regions, most likely due to the specialization of their subsistence economy. While this is visible already in Palaeolithic and Mesolithic hunter-gatherers, the arrival of the Neolithic package in the Pontic-Caspian region increased the difference between groups, by spreading specialized animal domestication. The appearance of nomadic pastoralism adapted to the steppe, eventually including the use of horses and carts, made the cultural barrier based on the economic know-how even stronger.

Even though groups could still adapt and permeate a different territory (from steppe to forest-steppe/forest and vice-versa), this required an important cultural change, to the extent that it eventually becomes difficult to distinguish these groups from neighbouring ones (like north-west Pontic Mesolithic or Neolithic groups and their interaction with the steppes, Trypillia-Usatovo, Scythians-Thracians, etc.). In fact, this steppe – forest-steppe barrier is also seen to the east of the Urals, with the distinct expansion of Andronovo and Seima-Turbino/Andronovo-like horizons, which seem to represent completely different ethnolinguistic groups.

As a result of this cultural and genetic barrier, like that formed by the Northern Caucasus:

1) No steppe pastoralist culture (which after the emergence of Khvalynsk means almost invariably horse-riding, chariot-using nomadic herders who could easily pasture their cows in the huge grasslands without direct access to water) has ever been successful in spreading to the north or north-west into northern Europe, until the Mongols. No forest culture has ever been successful in expanding to the steppes, either (except for the infiltration of Abashevo into Sintashta-Potapovka).

2) Corded Ware was not an exception: like hunter-gatherer pottery before it (and like previous population movements of TRB, late Trypillia, GAC, Comb Ware or Lublin-Volhynia settlers) their movements between the north Pontic area and central Europe happened through forest-steppe ecological niches due to their adaptation to them. There is no reason to support a direct connection of CWC with true steppe cultures.

3) The so-called “Steppe ancestry” permeated the steppe – forest-steppe ecotone for hundreds of years during the 5th and early 4th millennium BC, due to the complex interaction of different groups, and probably to the aridization trend that expanded steppe (and probably forest-steppe) to the north. Language, culture, and paternal lineages did not cross that frontier, though.

EDIT (4 FEB 2019): Wang et al. is out in Nature Communications. They deleted the Yamna Hungary samples and related analyses, but it’s interesting to see where exactly they think the trajectory of admixture of Yamna with European MN cultures fits best. This path could also be inferred long ago from the steppe connections shown by the Yamna Hungary -> Bell Beaker evolution and by early Balkan samples:

Prehistoric individuals projected onto a PCA of 84 modern-day West Eurasian populations (open symbols). Dashed arrows indicate trajectories of admixture: EHG—CHG (petrol), Yamnaya—Central European MN (pink), Steppe—Caucasus (green), and Iran Neolithic—Anatolian Neolithic (brown). Modified from the original, a red circle has been added to the Yamna-Central European MN admixture.


The Iron Age expansion of Southern Siberian groups and ancestry with Scythians


Maternal genetic features of the Iron Age Tagar population from Southern Siberia (1st millennium BC), by Pilipenko et al. (2018).

Interesting excerpts (emphasis mine):

The positions of non-Tagar Iron Age groups in the MDS plot were correlated with their geographic position within the Eurasian steppe belt and with frequencies of Western and Eastern Eurasian mtDNA lineages in their gene pools. Series from chronological Tagar stages (similar to the overall Tagar series) were located within the genetic variability (in terms of mtDNA) of Scythian World nomadic groups (Figs 5 and 6; S4 and S6 Tables). Specifically, the Early Tagar series was more similar to western nomads (North Pontic Scythians), while the Middle Tagar was more similar to the Southern Siberian populations of the Scythian period. The Late Tagar group (Tes` culture) belonging to the Early Xiongnu period had the “western-most” location on the MDS plot with the maximal genetic difference from Xiongnu and other eastern nomadic groups (but see Discussion concerning the low sample size for the Tes` series).

In a comparison of our Tagar series with modern populations in Eurasia, we detected similarity between the Tagar group and some modern Turkic-speaking populations (with the exception of the Indo-Iranian Tajik population) (Fig 7; S2 Table). Among the modern Turkic-speaking groups, populations from the western part of the Eurasian steppe belt, such as Bashkirs from the Volga-Ural region and Siberian Tatars from the West Siberian forest-steppe zone, were more similar to the Tagar group than modern Turkic-speaking populations of the Altay-Sayan mountain system (including the Khakassians from the Minusinsk basin) (Fig 7).

Location of Tagar archaeological sites from which samples for this study were obtained. Burial grounds: 1—Novaya Chernaya-1; 2—Podgornoe Ozero, Barsuchiha-1, Barsuchiha-6, Barsuchiha-7; 3—Perevozinskiy; 4—Ulug-Kyuzyur, Kichik-Kyuzyur, Sovetskaya Khakassiya; 5—Tepsey-3, Tepsey-8, Tepsey-9; 6—Dolgiy Kurgan. https://doi.org/10.1371/journal.pone.0204062.g001

Mitochondrial DNA diversity and genetic relationships of the Tagar population

Our results are not inconsistent with the assumption of a probable role of gene flow due to the migration from Western Eurasia to the Minusinsk basin in the Bronze Age in the formation of the genetic composition of the Tagar population. Particularly, we detected many mtDNA lineages/clusters with probable West Eurasian origin that were dominant in modern populations of different parts of Europe, Caucasus, and the Near East (such as K and HV6) in our Tagar series based on a phylogeographic analysis.

We detected relatively low genetic distances between our Tagar population and two Bronze Age populations from the Minusinsk basin—the Okunevo culture population (pre-Andronovo Bronze Age) and Andronovo culture population, followed by Afanasievo population from the Minusinsk Basin and Middle Bronze Age population from the Mongolian Altai Mountains (the region adjacent to the Minusinsk basin) (Figs 3 and 6; S3 and S5 Tables). Among West Eurasian part of our Tagar series we also observed haplogroups/sub-haplogroups and haplotypes shared with Early and Middle Bronze Age populations from Minusinsk Basin and western part of Eurasian steppe belt (Fig 4; S5 Table). Thus, our results suggested a potentially significant role of the genetic components, introduced by migrants from Western Eurasia during the Bronze Age, in the formation of the genetic composition of the Tagar population. It is necessary to note the relatively small size of available mtDNA samples from the Bronze Age populations of Minusinsk basin; accordingly, additional mtDNA data for these populations are required to further confirm our inference.

Phylogenetic tree of mtDNA lineages from the Tagar population. Color coding of the Tagar stages: orange—the Early Tagar stage; blue—the Middle Tagar Stage; green—the Late Tagar stage. Color of haplogroup labels: yellow—for Western Eurasian haplogroups; red—for Eastern Eurasian haplogroups. https://doi.org/10.1371/journal.pone.0204062.g002

Another substantial part of the mtDNA pool of the Tagar and other eastern populations of the Scythian World is typical of populations in Southern Siberia and adjacent regions of Central Asia (autochthonous Central Asian mtDNA clusters). Most of these components belong to the East Eurasian cluster of mtDNA haplogroups. Moreover, the role of each of these components in the formation of the genetic composition of subsequent (to the present) populations in South Siberia and Central Asia could be very different. In this regard, cluster C4a2a (and its subcluster C4a2a1), and haplogroup A8 are of particular interest.

Genetic features of successive Tagar groups

We compared successive Tagar groups (Early, Middle, and Late Tagar) with each other and with other Iron Age nomadic populations to evaluate changes in the mtDNA pool structure. Despite the genetic similarity between the Early and Middle Tagar series and Scythian World nomadic groups (Figs 5 and 6; S4 and S6 Tables), there were some peculiarities. For example, the Early Tagar series was more similar to North Pontic Classic Scythians, while the Middle Tagar samples were more similar to the Southern Siberian populations of the Scythian period (i.e., completely synchronous populations of regions neighboring the Minusinsk basin, such as the Pazyryk population from the Altay Mountains and Aldy-Bel population from Tuva).

We observed differences in the mtDNA pool structure between the Early and the Middle chronological stages of the Tagar culture population, as evidenced by the change in the ratio of Western to Eastern Eurasian mtDNA components. The contribution of Eastern Eurasian lineages increased from about one-third (34.8%) in the Early Tagar group to almost one-half (45.8%) in the Middle Tagar group.

Results of multidimensional scaling based on matrix of Slatkin population differentiation (FST) according to frequencies of mtDNA haplogroup in Tagar populations and modern populations of Eurasia. Populations: Tagar (red pentagon) (this study); Mongolian-speaking populations: Khamnigans (Buryat Republic, Russia) [43]; Barghuts (Inner Mongolia, China) [44]; Buryats (Buryat Republic, Southern Siberia, Russia) [43]; Mongols (Mongolia) [45]. Turkic-speaking populations: Tuvinians (Tuva Republic, Russia) [43]; Tofalars (Irkutsk region, Russia) [46]; Altai-Kizhi (Altai Republic, Russia) [43, 47]; Telenghits (Altai Republic, Russia) [43, 47]; Tubalars (Altai Republic) [48]; Shors (Kemerovo region, Russia) [43, 47]; Khakassians (Khakassian Republic, Russia) [43, 46]; Altaian Kazakhs (Altai Republic) [49]; Kazakhs (Kazakhstan, Uzbekistan) [50, 51]; Kirghiz (Kyrgyzstan) [50, 51]; Uighurs (Kazakhstan and Xinjiang) [50, 52]; Siberian Tatars (Tyumen and Omsk regions, Russia) [53]; Tatars (Volga-Ural region, Russia) [54]; Bashkirs (Volga-Ural region, Russia) [55]; Uzbeks (Uzbekistan) [51, 56]; Turkmens (Turkmenistan) [51, 56]; Nogays [57]; Turks [58]; other populations: Evenks [43, 46]; Ulchi [59]; Koreans (South Korea) [43]; Han Chinese [60]; Zhuang (Guangxi, China) [61]; Tadjiks (Tadjikistan) [43, 51]; Iranians [60]; Russians [62]. https://doi.org/10.1371/journal.pone.0204062.g007

At the level of mtDNA haplogroups, we detected a decrease in the diversity of phylogenetic clusters during the transition from the Early Tagar to the Middle Tagar. This decline in diversity equally affected the West Eurasian and East Eurasian components of the Tagar mtDNA pool. It should be noted that this decrease can be partially explained by the smaller number of Middle Tagar than Early Tagar samples. Under a simple binomial approximation the mtDNA clusters, observed at frequencies of 6.3% and 11.7%, could be lost by chance in our Early (N = 46) and Middle (N = 24) Tagar samples, respectively. However, the simultaneous lack of several such clusters, with a total frequency in the gene pool of the Early group of 34.8%, is unlikely.
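The binomial argument in that last excerpt can be checked in a few lines. If a cluster has frequency f in the population, the chance that none of N sampled individuals carries it is (1 - f)^N. The 6.3% and 11.7% quoted above look like the frequencies at which that chance is about 5% for N = 46 and N = 24; the 5% cutoff is my assumption, since the excerpt does not state it, and the function names are made up for this sketch.

```python
def prob_cluster_missed(frequency, sample_size):
    """Chance that a cluster at the given frequency is absent from the sample."""
    return (1.0 - frequency) ** sample_size

def max_frequency_missed(sample_size, alpha=0.05):
    """Highest cluster frequency still missed with probability alpha (assumed 5%)."""
    return 1.0 - alpha ** (1.0 / sample_size)

# Values from the excerpt: Early Tagar N = 46, Middle Tagar N = 24.
early_miss = prob_cluster_missed(0.063, 46)   # close to 0.05
middle_miss = prob_cluster_missed(0.117, 24)  # close to 0.05

early_threshold = max_frequency_missed(46)    # about 0.063
middle_threshold = max_frequency_missed(24)   # about 0.117
```

Under that reading, any single cluster rarer than about 11.7% could plausibly be missing from the Middle Tagar sample by chance alone, which is why the authors stress the joint absence of several clusters instead.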

The observed reduction in the genetic distance between the Middle Tagar population and other Scythian-like populations of Southern Siberia (Fig 5; S4 Table), in our opinion, is primarily associated with an increase in the role of East Eurasian mtDNA lineages in the gene pool (up to nearly half of the gene pool) and a substantial increase in the joint frequency of haplogroups C and D (from 8.7% in the Early Tagar series to 37.5% in the Middle Tagar series). These features are characteristic of many ancient and modern populations of Southern Siberia and adjacent regions of Central Asia, including the Pazyryk population of the Altai Mountains. We did not obtain strong evidence for an intensification of genetic contact between the population of the Minusinsk basin and the Altai Mountains in the Middle Tagar period compared with the Early Tagar period. Although, several archaeologists have found evidence for the intensification of contact at the level of material culture, namely, a cultural influence of the population of the Altai Mountains (represented by the Pazyryk population) on the population of the Minusinsk basin (the Saragash Tagar group) [6, 71, 72].
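It helps to keep in mind how few individuals stand behind these percentages. A quick back-calculation from the quoted frequencies and sample sizes (N = 46 Early, N = 24 Middle) gives the implied counts; the function name is hypothetical and the counts are my reconstruction, not figures reported in the excerpt.

```python
def implied_count(percent, sample_size):
    """Number of individuals implied by a reported percentage."""
    return round(percent / 100.0 * sample_size)

# Eastern Eurasian lineages: 34.8% of 46 and 45.8% of 24.
east_early = implied_count(34.8, 46)
east_middle = implied_count(45.8, 24)

# Haplogroups C and D combined: 8.7% of 46 and 37.5% of 24.
cd_early = implied_count(8.7, 46)
cd_middle = implied_count(37.5, 24)
```

That is 16 and 11 individuals with Eastern Eurasian lineages, and 4 and 9 with haplogroups C or D, so the shifts rest on a handful of samples.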

Another important issue is the change in the genetic structure of the Tagar population during the transition from the Middle (Saragash) to the Late (Tes`) stage. The Late Tagar stage refers to the Xiongnu period. Many archaeologists suggest that the formation of the Tes` stage involved the direct cultural influence of the Xiongnu and/or related groups of nomads from more eastern regions of Central Asia [71, 73]. Some archaeologists have even suggested renaming the Tes` stage in the Tes` culture [71], emphasizing the role of new eastern cultural elements. If this influence also existed at the genetic level, then we would expect to observe new genetic elements in the Tes` gene pool, particularly those of East Eurasian origin.

Siberian ancestry

Just a reminder of the recent session in ISBA 8 on expanding Scythians (and also Mongolians and Turks) spreading Siberian ancestry, usually (wrongly) identified as “Uralic-Yeniseian” based on modern populations (similar to how steppe ancestry is wrongly identified as “Indo-European”), see the following graphic including the Tagar population:

Very important observation with implication of population turnover is that pre-Turkic Inner Eurasian populations’ Siberian ancestry appears predominantly “Uralic-Yeniseian” in contrast to later dominance of “Tungusic-Mongolic” sort (which does sporadically occur earlier). Alexander M. Kim

And also the poster by Alexander M. Kim et al., Yeniseian hypotheses in light of genome-wide ancient DNA from historical Siberia:

The relevance of ancient DNA data to debates in historical linguistics is an emphatic strand in much recent work on the archaeogenetics of Eurasia, where the discussion has focused heavily on Indo-European (Haak et al. 2015; Narasimhan et al. 2018; de Barros Damgaard et al. 2018a,b). We present new genome-wide ancient DNA data from a historical Siberian individual in relation to Yeniseian, an isolated language “microfamily” (Vajda 2014) that nonetheless sits at the center of numerous controversial proposals in historical linguistics and cultural interaction. Yeniseian’s sole surviving representative is Ket, a critically endangered language fluently spoken by only a few dozen individuals near the Middle Yenisei River of Central Siberia.

In strong contrast to the present-day picture, river names and argued substrate influences and loanwords in languages outside the current range of Yeniseian, as well as direct records from the Russian colonial period, indicate that speakers of extinct Yeniseian languages had a formerly much broader presence in the taiga of Central Siberia as well as further south in the mountainous Altai-Sayan region – and perhaps even further afield in Inner Asia (Vajda 2010; Gorbachov 2017; Blažek 2016). The consilience of these proposals with genetic data is not straightforward (Flegontov et al. 2015, 2017) and faces a major obstacle in the lack of genetic information from verifiable speakers of Yeniseian languages other than the Kets, who have had complex ongoing interactions with speakers of non-Yeniseian languages such as the Samoyedic Selkups. We attempt to remedy this with new historical Siberian aDNA data, orienting our search for common denominators and systematic difference in a broader landscape of concordance, discordance, and uncertainty at the interface of diachronic linguistics and genetics.


Modern Sardinians show elevated Neolithic farmer ancestry shared with Basques


New paper (behind paywall), Genomic history of the Sardinian population, by Chiang et al. Nature Genetics (2018), previously published as a preprint at bioRxiv (2016).

EDIT (18 Sep 2018): Link to read paper for free shared by the main author.

Interesting excerpts (emphasis mine):

Our analysis of divergence times suggests the population lineage ancestral to modern-day Sardinia was effectively isolated from the mainland European populations ~140–250 generations ago, corresponding to ~4,300–7,000 years ago assuming a generation time of 30 years and a mutation rate of 1.25 × 10⁻⁸ per basepair per generation. (…) in terms of relative values, the divergence time between Northern and Southern Europeans is much more recent than either is to Sardinia, signaling the relative isolation of Sardinia from mainland Europe.
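The conversion from generations to calendar years is a plain multiplication by the assumed generation time. A tiny sketch, with a made-up function name, applied to the rounded bounds quoted above:

```python
def generations_to_years(generations, years_per_generation=30):
    """Convert a divergence time in generations to years."""
    return generations * years_per_generation

low_years = generations_to_years(140)   # 4200
high_years = generations_to_years(250)  # 7500
```

The rounded bounds give roughly 4,200 to 7,500 years, close to but not identical with the ~4,300–7,000 in the excerpt; I assume the paper converted unrounded estimates, but the excerpt does not say so.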

We documented fine-scale variation in the ancient population ancestry proportions across the island. The most remote and interior areas of Sardinia—the Gennargentu massif covering the central and eastern regions, including the present-day province of Ogliastra— are thought to have been the least exposed to contact with outside populations. We found that pre-Neolithic hunter-gatherer and Neolithic farmer ancestries are enriched in this region of isolation. Under the premise that Ogliastra has been more buffered from recent immigration to the island, one interpretation of the result is that the early populations of Sardinia were an admixture of the two ancestries, rather than the pre-Neolithic ancestry arriving via later migrations from the mainland. Such admixture could have occurred principally on the island or on the mainland before the hypothesized Neolithic era influx to the island. Under the alternative premise that Ogliastra is simply a highly isolated region that has differentiated within Sardinia due to genetic drift, the result would be interpreted as genetic drift leading to a structured pattern of pre-Neolithic ancestry across the island, in an overall background of high Neolithic ancestry.

PCA results of merged Sardinian whole-genome sequences and the HGDP Sardinians. See below for a map of the corresponding regions.

We found Sardinians show a signal of shared ancestry with the Basque in terms of the outgroup f3 shared-drift statistics. This is consistent with long-held arguments of a connection between the two populations, including claims of Basque-like, non-Indo-European words among Sardinian placenames. More recently, the Basque have been shown to be enriched for Neolithic farmer ancestry and Indo-European languages have been associated with steppe population expansions in the post-Neolithic Bronze Age. These results support a model in which Sardinians and the Basque may both retain a legacy of pre-Indo-European Neolithic ancestry. To be cautious, while it seems unlikely, we cannot exclude that the genetic similarity between the Basque and Sardinians is due to an unsampled pre-Neolithic population that has affinities with the Neolithic representatives analyzed here.

Left: Geographical map of Sardinia. The provincial boundaries are given as black lines. The provinces are abbreviated as Cag (Cagliari), Cmp (Campidano), Car (Carbonia), Ori (Oristano), Sas (Sassari), Olb (Olbia-tempio), Nuo (Nuoro), and Ogl (Ogliastra). For sampled villages within Ogliastra, the names and abbreviations are indicated in the colored boxes. The color corresponds to the color used in the PCA plot (Fig. 2a). The Gennargentu region referred to in the main text is the mountainous area shown in brown that is centered in western Ogliastra and southeastern Nuoro.
Right: Density of Nuraghi in Sardinia, from Wikipedia.

While we can confirm that Sardinians principally have Neolithic ancestry on the autosomes, the high frequency of two Y-chromosome haplogroups (I2a1a1 at ~39% and R1b1a2 at ~18%) that are not typically affiliated with Neolithic ancestry is one challenge to this model. Whether these haplogroups rose in frequency due to extensive genetic drift and/or reflect sex-biased demographic processes has been an open question. Our analysis of X chromosome versus autosome diversity suggests a smaller effective size for males, which can arise due to multiple processes, including polygyny, patrilineal inheritance rules, or transmission of reproductive success. We also find that the genetic ancestry enriched in Sardinia is more prevalent on the X chromosome than the autosome, suggesting that male lineages may more rapidly trace back to the mainland. Considering that the R1b1a2 haplogroup may be associated with post-Neolithic steppe ancestry expansions in Europe, and the recent timeframe when the R1b1a2 lineages expanded in Sardinia, the patterns raise the possibility of recent male-biased steppe ancestry migration to Sardinia, as has been reported among mainland Europeans at large (though see Lazaridis and Reich and Goldberg et al.). Such a recent influx is difficult to square with the overall divergence of Sardinian populations observed here.
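The logic behind the X-versus-autosome comparison can be sketched with the textbook expectations for effective size when the numbers of breeding males (Nm) and females (Nf) differ: Ne(autosomes) = 4·Nm·Nf / (Nm + Nf) and Ne(X) = 9·Nm·Nf / (4·Nm + 2·Nf). With equal numbers the X/autosome ratio is 0.75, and fewer effective males push it higher. This is a minimal idealised sketch (random mating, no selection, no mutation-rate differences), not the analysis of the paper, and the function names are my own.

```python
def ne_autosome(n_males, n_females):
    """Expected autosomal effective size for given breeding males/females."""
    return 4 * n_males * n_females / (n_males + n_females)


def ne_x(n_males, n_females):
    """Expected X-chromosome effective size (males carry a single X)."""
    return 9 * n_males * n_females / (4 * n_males + 2 * n_females)


def x_to_autosome_ratio(n_males, n_females):
    """Ratio of X to autosomal effective size; 0.75 when the sexes are equal."""
    return ne_x(n_males, n_females) / ne_autosome(n_males, n_females)


# Hypothetical population sizes, chosen only for illustration.
equal_sexes = x_to_autosome_ratio(1000, 1000)
fewer_males = x_to_autosome_ratio(500, 1000)
print(equal_sexes, fewer_males)
```

An observed X/autosome diversity ratio above 0.75 is therefore compatible with a smaller male effective size, although other processes can produce the same pattern.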

sardinian-admixture
Mixture proportions of the three-component ancestries among Sardinian populations. Using a method first presented in Haak et al. (Nature 522, 207–211, 2015), we computed unbiased estimates of mixture proportions without a parameterized model of relationships between the test populations and the outgroup populations based on f4 statistics. The three-component ancestries were represented by early Neolithic individuals from the LBK culture (LBK_EN), pre-Neolithic hunter-gatherers (Loschbour), and Bronze Age steppe pastoralists (Yamnaya). See Supplementary Table 5 for standard error estimates computed using a block jackknife.

Once again, haplogroup R1b1a2 (M269), and only R1b1a2, is related to male-biased, steppe-related Indo-European migrations…just sayin’.

Interestingly, haplogroup I2a1a1 is actually found among northern Iberians during the Neolithic and Chalcolithic, and is therefore associated with Neolithic ancestry in Iberia, too, and consequently – unless there is a big surprise hidden somewhere – with the ancestry found today among Basques.

NOTE. In fact, the increase in Neolithic ancestry found in south-west Ireland with expanding Bell Beakers (likely Proto-Beakers), coupled with the finding of I2a subclades in Megalithic cultures of western Europe, would support this replacement after the Cardial and Epi-Cardial expansions, which were initially associated with G2a lineages.

I am not convinced about a survival of Palaeo-Sardo after the Bell Beaker expansion, though, since there is no clear-cut cultural divide (and subsequent continuity) of pre-Beaker archaeological cultures after the arrival of Bell Beakers on the island that could be identified with the survival of Neolithic languages.

We may have to wait for ancient DNA to show a potential expansion of Neolithic ancestry from the west, maybe associated with the emergence of the Nuragic civilization (potentially linked with contemporaneous Megalithic cultures in Corsica and in the Balearic Islands, and thus with an Iberian rather than a Basque stock), although this is quite speculative at this moment in linguistic, archaeological, and genetic terms.

Nevertheless, it seems that the association of a Basque-Iberian language with the Neolithic expansion from Anatolia (see Villar’s latest book on the subject) is somehow strengthened by this paper. However, it is unclear when, how, and where expanding G2a subclades were replaced by native I2 lineages.

Cystic fibrosis probably spread with expanding Bell Beakers

indo-european-uralic-bell-beaker-corded-ware-migrations

New paper (behind paywall) Estimating the age of p.(Phe508del) with family studies of geographically distinct European populations and the early spread of cystic fibrosis, by Farrell et al., European Journal of Human Genetics (2018).

Interesting excerpts (emphasis mine):

Our results revealed tMRCA average values ranging from 4725 to 1175 years ago and support the estimates of Serre et al. (3000–6000 years ago) [11], rather than Morral et al. (52,000 years ago) [6], but the latter figure was challenged by Kaplan et al. [26] because of disagreement with assumptions used in their calculations. In addition, the tMRCA values from western European regions reported herein refine the results of Fichou et al. [7] from a study of Breton CF patients in which the Estiage analysis suggested that the most recent common ancestor lived 115 generations ago. That tMRCA value, however, may have underestimated the age of p.(Phe508del) in Brittany due to consideration of all the haplotypes, even those that were reconstructed with ambiguities, as well as a potential bias associated with consanguinity due to including both haplotypes in homozygous families. In the more stringent Estiage analyses reported herein, those potential biases were avoided for all populations, leading to estimates of the oldest tMRCA values corresponding to the Early Bronze Age in western Europe, which is generally agreed to begin around 3000 BCE. This finding extends our results from a direct investigation of aDNA in teeth from Iron Age burials near Vienna around 350 BCE and allows us to conclude that p.(Phe508del) was present in that region long before then. More specifically, in the Austrian families studied, the Estiage data revealed a mean tMRCA value of 3575 years ago, which converts to 1558 BCE (Middle Bronze Age) [22].

Perhaps most remarkably, the estimated ages of p.(Phe508del) in the three western European regions (France, Ireland, and Denmark) were similar with closely overlapping 95% CI values. This observation is also in line with previously documented spatial autocorrelograms expressing genetic and geographical distance for these populations [24]. Such data provide more insight about the ancient origin of CF in our judgment—both when and where—and lead us to propose that CFTR p.(Phe508del) is derived from ancestors who lived in western Europe during the Bronze Age, as early as 2700 BCE, and that its relatively rapid dissemination occurred because of human migrations around the northwestern Atlantic trading routes [21] and then towards central and eastern Europe [22]. Diffusion from northwestern to central Europe in approximately 1000 years is consistent with the prominent Bronze Age migrations evident in the archeological record [21, 22] and from genomic studies of aDNA [27]. On the other hand, we are assuming a discrete origin of the principal CF-causing variant, but it is possible that p.(Phe508del) arose more than once or earlier, and then reached western Europe subsequently through Neolithic migrations.

cystic-fibrosis

[About Bell Beakers] (…) More specifically, their distinctive Bell Beaker pottery appeared and spread across western and central Europe beginning around 3000–2750 BCE and then disappeared between 2200 and 1800 BCE [22, 29]. Their migrations are linked to the advent of western and central European metallurgy, as they manufactured and traded metal goods, especially weapons, while traveling over long distances [30]. Most relevant to our study is the evidence that they migrated in a direction and over a time period that fits well with the pattern of tMRCA data we found for the p.(Phe508del) variant. Olalde et al. [29] have shown that both migration and cultural transmission played a major role in diffusion of the “Beaker Complex” and led to a “profound demographic transformation” of Britain after 2400 BCE. Moreover, the cultural elements that unite the widely distributed Beaker folk are so obvious that some have considered them a distinct ethnicity of Bronze Age people [33].

From our results, we propose the novel concept that large scale, long term west-to-east migrations of the Bell Beaker Europeans [22, 28–30] during the Bronze Age, could explain the dissemination of p.(Phe508del) in Europe and its documented northwest-to-southeast gradient [4]. In fact, our tMRCA data show a temporal gradient also.
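To follow the dates in these excerpts more easily, here is a small sketch that converts a tMRCA given in "years ago" into a calendar year. It assumes the reference year is 2018 (the publication year, my assumption, since the excerpts do not state it) and that there is no year zero between 1 BCE and 1 CE; the function name is hypothetical.

```python
def years_ago_to_calendar(years_ago, reference_year=2018):
    """Convert 'years before reference_year' to (year, era), with no year zero."""
    year = reference_year - years_ago
    if year > 0:
        return year, "CE"
    # Astronomical year 0 is 1 BCE, -1 is 2 BCE, and so on.
    return 1 - year, "BCE"


# tMRCA values quoted in the excerpts above (mean values, in years ago).
quoted = {
    "upper end of range": 4725,
    "Austrian families": 3575,
    "lower end of range": 1175,
}
converted = {label: years_ago_to_calendar(value) for label, value in quoted.items()}
for label, (year, era) in converted.items():
    print(f"{label}: {year} {era}")
```

Under that assumption the Austrian value reproduces the 1558 BCE given in the paper, and the oldest mean value lands close to the "as early as 2700 BCE" mentioned in the text.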

As you can see from the references, they consulted with Barry Cunliffe (or people accepting his theory), who is obsessed with Bell Beakers expanding Celtic languages from the British Isles. He is something like the British equivalent of the Danish scholar Kristian Kristiansen, with his own obsession with Corded Ware = Indo-European (and Germanic = CWC Denmark), immutable no matter what genetic results might show.

The funny thing is, the interpretation of the paper is probably right. From what we can see in the data, it is quite possible that the disease spread with expanding Bell Beakers…only it spread from the East group in Hungary, i.e. from east to west. The regional difference in tMRCA and the apparent west-east cline would point to the different expansions of affected lineages in the corresponding regions, and not to an origin in the British Isles.

Male-biased expansions and migrations also observed in Northwestern Amazonia

Open access preprint Cultural Innovations influence patterns of genetic diversity in Northwestern Amazonia, by Arias et al., bioRxiv (2018).

Abstract (emphasis mine):

Human populations often exhibit contrasting patterns of genetic diversity in the mtDNA and the non-recombining portion of the Y-chromosome (NRY), which reflect sex-specific cultural behaviors and population histories. Here, we sequenced 2.3 Mb of the NRY from 284 individuals representing more than 30 Native-American groups from Northwestern Amazonia (NWA) and compared these data to previously generated mtDNA genomes from the same groups, to investigate the impact of cultural practices on genetic diversity and gain new insights about NWA population history. Relevant cultural practices in NWA include postmarital residential rules and linguistic-exogamy, a marital practice in which men are required to marry women speaking a different language. We identified 2,969 SNPs in the NRY sequences; only 925 SNPs were previously described. The NRY and mtDNA data showed that males and females experienced different demographic histories: the female effective population size has been larger than that of males through time, and both markers show an increase in lineage diversification beginning ~5,000 years ago, with a male-specific expansion occurring ~3,500 years ago. These dates are too recent to be associated with agriculture, therefore we propose that they reflect technological innovations and the expansion of regional trade networks documented in the archaeological evidence. Furthermore, our study provides evidence of the impact of postmarital residence rules and linguistic exogamy on genetic diversity patterns. Finally, we highlight the importance of analyzing high-resolution mtDNA and NRY sequences to reconstruct demographic history, since this can differ considerably between males and females.

y-dna-mtdna-amazonia
MDS plots for mtDNA and NRY. Stress values (within parentheses) are indicated in percentages.

Looking more precisely at the different groups (even with the resampling approach), there are no significant differences between matrilocal and patrilocal groups. At best, as the study proposes, “this is just one of the factors at play in structuring the observed genetic variation”.

Interesting excerpts:

(…) we found evidence that the patterns of genetic differentiation depend on the geographical scale of the study. The magnitude of between-population differentiation in the NRY compared to the mtDNA is smaller when looking at the continental scale than in NWA (Figure 6). This is in agreement with the findings of Wilkins and Marlowe (2006), who showed that the excess of between-population differentiation for the NRY in comparison to the mtDNA decreases when comparing more geographically distant populations. Heyer et al. (2012) and Wilkins and Marlowe (2006) have proposed that at a local scale the patterns of genetic diversity reflect cultural practices over a relatively small number of generations, whereas at a larger geographic scale the genetic diversity reflects old migration and/or old common ancestry patterns (Heyer et al. 2012; Wilkins and Marlowe 2006).

y-dna-mtdna-amazon
BSPs for the mtDNA and NRY sequences from NWA. The dotted lines indicate the 95% HPD intervals. Ne was corrected for generation time according to (Fenner 2005), using 26 years for mtDNA and 31 years for NRY.

The BSP plots and the diversity statistics indicate that overall the Ne of males has been smaller than that of females. One tentative explanation for this difference is that it reflects larger differences in reproductive success among males than among females. Some support for this explanation comes from the shape of the phylogenies (Supplementary Figures 1 and 6), since differences in reproductive success and the cultural transmission of fertility lead to imbalance phylogenies (Blum et al. 2006; Heyer et al. 2015). We estimated a common index of tree imbalance (Colless index) and calculated whether the mtDNA and NRY trees were more unbalanced than 1000 simulated trees generated under a Yule process (Bortolussi et al. 2006) (i.e. a simple pure birth process that assumes that the birth rate of new lineages is the same along the tree). We found that the NRY tree is more unbalanced than predicted by the Yule model (p-value=0.001), whereas the mtDNA tree is not significantly different from trees generated by the Yule model (p-value=0.628). It has been suggested that highly mobile hunter-gatherer societies, such as those typical of most of human prehistory, were polygynous bands (Dupanloup et al. 2003); similarly, nomadic horticulturalist Amazonian societies exhibit strong differences in reproductive success due to the common practice of polygyny, especially among community chiefs, whose offspring also enjoy a high fertility (Neel 1970; 1980; Neel and Weiss 1975).

Furthermore, a more recent expansion can be observed in the BSP based on the NRY, but not in the mtDNA BSP (Figure 5), indicating an expansion specifically in the paternal line. The reasons behind this recent male-biased population expansion, which starts ~3.5 kya, are as yet unclear. However, similar male-biased expansions have been observed in other studies using high-resolution NRY sequences (Batini et al. 2017; Karmin et al. 2015).
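The tree-imbalance test described in the excerpts (Colless index compared with trees simulated under a Yule process) can be sketched as follows. This is only a minimal illustration of the idea in plain Python, not the software or settings used in the preprint: trees are nested lists, the "observed" tree is an invented, maximally unbalanced toy example, and the p-value is simply the share of simulated trees that are at least as unbalanced.

```python
import random


def yule_tree(n_leaves, rng):
    """Grow a random topology under a Yule (pure-birth) process.

    A node is a list of its two children; a leaf is an empty list.
    """
    root = []
    leaves = [root]
    while len(leaves) < n_leaves:
        # Every extant lineage is equally likely to split next.
        leaf = leaves.pop(rng.randrange(len(leaves)))
        left, right = [], []
        leaf.extend([left, right])
        leaves.extend([left, right])
    return root


def colless(tree):
    """Return (number of leaves, Colless index) of a binary tree."""
    if not tree:  # an empty list is a leaf
        return 1, 0
    n_left, c_left = colless(tree[0])
    n_right, c_right = colless(tree[1])
    return n_left + n_right, c_left + c_right + abs(n_left - n_right)


def imbalance_p_value(observed_tree, n_sim=1000, seed=0):
    """Share of simulated Yule trees at least as unbalanced as the observed one."""
    rng = random.Random(seed)
    n_leaves, observed = colless(observed_tree)
    hits = sum(
        colless(yule_tree(n_leaves, rng))[1] >= observed for _ in range(n_sim)
    )
    return hits / n_sim


def caterpillar(n_leaves):
    """Maximally unbalanced toy tree (one lineage keeps splitting)."""
    tree = []
    for _ in range(n_leaves - 1):
        tree = [tree, []]
    return tree


toy_tree = caterpillar(20)  # invented example, not the NRY tree
print(colless(toy_tree))
print(imbalance_p_value(toy_tree))
```

A tree shaped by strong differences in reproductive success among lineages would sit in the upper tail of the simulated distribution, as reported for the NRY tree but not for the mtDNA tree.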

Ancient DNA reveals temporal population structure of pre-Incan and Incan periods in South‐Central Andes area

Ancient DNA reveals temporal population structure within the South‐Central Andes area, by Russo et al. Am. J. Phys. Anthropol. (2018).

Abstract (emphasis mine):

Objectives
The main aim of this work was to contribute to the knowledge of pre‐Hispanic genetic variation and population structure among the South‐central Andes Area by studying individuals from Quebrada de Humahuaca, North‐western (NW) Argentina.

Materials and methods
We analyzed 15 autosomal STRs in 19 individuals from several archaeological sites in Quebrada de Humahuaca, belonging to the Regional Developments Period (900–1430 AD). Compiling autosomal, mitochondrial, and Y‐chromosome data, we evaluated population structure and differentiation among eight South‐central Andean groups from the current territories of NW Argentina and Peru.

andes-populations
Location of the archaeological sites analyzed in this study (stars) and the South-central Andean populations used for comparisons (triangles). The punctuated line indicates the north-south subdivision of Quebrada de Humahuaca. 1: Peñas Blancas, 2: San José, 3: Huacalera, 4: Banda de Perchel, 5: Juella, 6: Sarahuaico, 7: Tilcara, 8: Muyuna, 9: Los Amarillos, 10: Las Pirguas, 11: Tompullo 2, 12: Puca, 13: Acchaymarca, 14: Lauricocha. Map constructed from the obtained with the R package ggmap (Kahle & Wickham, 2013)

Results
Autosomal data revealed a structuring of the analyzed populations into two clusters which seemed to represent different temporalities in the Andean pre‐Hispanic history: pre‐Inca and Inca. All pre‐Inca samples fell into the same cluster despite being from the two different territories of NW Argentina and Peru. Also, they were systematically differentiated from the Peruvian Inca group. These results were mostly confirmed by mitochondrial and Y‐chromosome analyses. We mainly found a clearly different haplotype composition between clusters.

Discussion
Population structure in South America has been mostly studied on current native groups, mainly showing a west‐to‐east differentiation between the Andean and lowland regions. Here we demonstrated that genetic population differentiation preceded the European contact and might have been more complex than thought, being found within the South‐central Andes Area. Moreover, divergence among temporally different populations might be reflecting socio‐political changes occurred in the evermore complex pre‐Hispanic Andean societies.

pcoa-andes
Principal coordinates analysis (PCoA) based on individual genetic distances obtained with autosomal STRs data. Percentage of variance explained by each coordinate is shown in parenthesis. Colors were assigned according to the two clusters discovered with structure

Early Indo-Iranian formed mainly by R1b-Z2103 and R1a-Z93, Corded Ware out of Late PIE-speaking migrations

yamna-expansion-reich

The awaited, open access paper on Asian migrations is out: The Genomic Formation of South and Central Asia, by Narasimhan et al. bioRxiv (2018).

Abstract:

The genetic formation of Central and South Asian populations has been unclear because of an absence of ancient DNA. To address this gap, we generated genome-wide data from 362 ancient individuals, including the first from eastern Iran, Turan (Uzbekistan, Turkmenistan, and Tajikistan), Bronze Age Kazakhstan, and South Asia. Our data reveal a complex set of genetic sources that ultimately combined to form the ancestry of South Asians today. We document a southward spread of genetic ancestry from the Eurasian Steppe, correlating with the archaeologically known expansion of pastoralist sites from the Steppe to Turan in the Middle Bronze Age (2300-1500 BCE). These Steppe communities mixed genetically with peoples of the Bactria Margiana Archaeological Complex (BMAC) whom they encountered in Turan (primarily descendants of earlier agriculturalists of Iran), but there is no evidence that the main BMAC population contributed genetically to later South Asians. Instead, Steppe communities integrated farther south throughout the 2nd millennium BCE, and we show that they mixed with a more southern population that we document at multiple sites as outlier individuals exhibiting a distinctive mixture of ancestry related to Iranian agriculturalists and South Asian hunter-gatherers. We call this group Indus Periphery because they were found at sites in cultural contact with the Indus Valley Civilization (IVC) and along its northern fringe, and also because they were genetically similar to post-IVC groups in the Swat Valley of Pakistan.
By co-analyzing ancient DNA and genomic data from diverse present-day South Asians, we show that Indus Periphery-related people are the single most important source of ancestry in South Asia — consistent with the idea that the Indus Periphery individuals are providing us with the first direct look at the ancestry of peoples of the IVC — and we develop a model for the formation of present-day South Asians in terms of the temporally and geographically proximate sources of Indus Periphery-related, Steppe, and local South Asian hunter-gatherer-related ancestry. Our results show how ancestry from the Steppe genetically linked Europe and South Asia in the Bronze Age, and identifies the populations that almost certainly were responsible for spreading Indo-European languages across much of Eurasia.

NOTE. The supplementary material seems to be full of errors right now: it lists as R1b-M269 (and further subclades) samples that were previously expressly reported as xM269, so we will have to wait to see if there are big surprises here. Examples include samples from Mal’ta (M269), Iron Gates (M269 and L51), and Latvia Mesolithic (L51), a Deriivka sample from 5230 BC (M269), Armenia_EBA (Z2103)… Also, the sample from Yuzhnyy Oleni Ostrov is now listed as R1a-M417.

EDIT (1 APR 2018): The main author has confirmed on Twitter that they have used a new Y Chr caller that calls haplogroups given the data provided, and depending on the coverage tried to provide a call to the lowest branch of the tree possible, so there are obviously a lot of mistakes – not just in the subclades of R. A revision of the paper is on its way, and soon more people will be able to work with the actual samples, since they say they are releasing them.

Nevertheless, since subclades (and not haplogroups) are the apparent source of gross errors, for the moment it seems we can say with a great degree of confidence that:

  • New samples of East Yamna / Poltavka are of haplogroup R1b-L23.
  • Afanasevo is confirmed to be dominated by R1b-M269.
  • Sintashta, as I predicted could happen, shows a mixed R1b-L23/ R1a-Z645 society, compatible with my model of continuity of Proto-Indo-Iranian in the East Yamna admixture with late Corded Ware immigrants.

With lesser confidence in precise subclades, we find that:

  • A sample from Hajji Firuz in Iran ca. 5650 BC, of subclade R1b-Z2103, may confirm Mesolithic R1b-M269 lineages from the Caucasus as the source of CHG ancestry to Khvalynsk/Yamna, and be thus the reason why Reich wrote about a potential PIE homeland south of the Caucasus. (EDIT 11 APR 2018) The sample shows steppe ancestry, therefore the date is most likely incorrect, and a new radiocarbon dating is due. It is still interesting – depending on the precise subclade – for its potential relationship with IE migrations into the area.
  • New samples of East Yamna / Poltavka are of haplogroup R1b-Z2103.
  • Afanasevo migrants are mainly of haplogroup R1b-Z2103.
    • The Darra-e Kur sample, ca. 2655 BC, of haplogroup R1b-L151, without a clear cultural ascription, may be the expected sign of Afanasevo migrants (Pre-Proto-Tocharian speakers) expanding a Northern Indo-European (in contrast with a Southern or Graeco-Aryan) dialect, in a region closely linked with the later desert mummies in the Tarim Basin. Its early presence there would speak in favour of a migration through the Inner Asian Mountain Corridor prior to the one caused by Andronovo migrants.
  • Sintashta shows a mixed R1b-Z2103 / R1a-Z93 society.
    • Later Indo-Iranian migrations are apparently dominated by R1a-Z2123, an early subclade of R1a-Z93, also found in Srubna.
    • R1b is also seen later in BMAC (ca. 1487 BC), although its subclade is not given.
  • There is also a sample of R1a-Z283 subclade in the eastern steppe (ca. 1600 BC). What may be interesting about it is that it could mark one of the subclades not responsible for the expansion of Balto-Slavic (or responsible for it with the expansion of Srubna, for those who support an Indo-Slavonic branch related to Sintashta-Potapovka).
  • A sample of R1b-U106 subclade is found in Loebanr_IA ca. 950 BC, which – together with the sample of Darra-e Kur – is compatible with the presence of L51 in Yamna.

NOTE. Errors in haplogroups of previously published samples make every subclade of new samples from the supplementary table questionable, but all new samples (save for the Darra_i_Kur one) were analysed and probably reported by the Reich Lab, and at least upper subclades in each haplogroup tree seem mostly coherent with what was expected. Also, the contribution of Iranian Farmer related (a population in turn contributing to Hajji Firuz) to Khvalynsk in their sketch of the genetic history may be a sign of the association of R1b-M269 lineages with CHG ancestry, although previous data on precise R1b subclades in the region contradict this. (EDIT 11 APR 2018) The sample of Hajji Firuz is most likely much younger than the published date, hence its younger subclade may be correct. No revision or comment on this matter has been published, though.

yamna-steppe-emba-mlba-cloud
Modeling results. (A) Admixture events originating from 7 “Distal” populations leading to the formation of the modern Indian cloud shown geographically. Clines or 2-way mixtures of ancestry are shown in rectangles, and clouds (3-way mixtures) are shown in ellipses.

Also, it seems that the Corded Ware culture appears now irrelevant for Late Proto-Indo-European migrations. Observe:

In the text, they use a consistent terminology of Yamnaya or Yamnaya-related Steppe pastoralists, discarding the relevance of previous migrations from the North Pontic steppe in spreading Late Indo-European:

Our results also shed light on the question of the origins of the subset of Indo-European languages spoken in India and Europe (45). It is striking that the great majority of Indo-European speakers today living in both Europe and South Asia harbor large fractions of ancestry related to Yamnaya Steppe pastoralists (corresponding genetically to the Steppe_EMBA cluster), suggesting that “Late Proto-Indo-European”—the language ancestral to all modern Indo-European languages—was the language of the Yamnaya (46). While ancient DNA studies have documented westward movements of peoples from the Steppe that plausibly spread this ancestry to Europe (5, 31), there has not been ancient DNA evidence of the chain of transmission to South Asia. Our documentation of a large-scale genetic pressure from Steppe_MLBA groups in the 2nd millennium BCE provides a prime candidate, a finding that is consistent with archaeological evidence of connections between material culture in the Kazakh middle-to-late Bronze Age Steppe and early Vedic culture in India (46).

EDIT (1 APR 2018): I corrected this text and the word ‘official’ in the title, because more than rejecting the role of Corded Ware migrants in expanding Late PIE, they actually seem to keep considering Corded Ware migrants as continuing the western Yamna expansion in the Carpathian Basin, so no big ‘official’ change or retraction in this paper, just subtle movements out of their previous model.

yamna-migrations-indo-iranian
Modeling results. (B) A schematic model of events originating from 7 “Distal” populations leading to the formation of the modern Indian cline, shown chronologically. (C) Admixture proportions as estimated using qpAdm for populations reflected in A and B.

NOTE. If they correct the haplogroups soon, I will update the information in this post. Unless there is a big surprise that merits a new one, of course.

EDIT (1 APR 2018): Multiple minor edits to the original post.

EDIT (2 APR 2018): While I and other simple-minded people were only looking to confirm our previous theories using Y-DNA haplogroups, and are content with wildly speculating over the consequences if some of those strange (probably wrong) ones were true, intelligent people are using their time for something useful, interpreting the results of the investigation as described in the paper, to offer a clearer picture of Indo-Iranian migrations for everyone:

Visit the beautiful interactive map with samples: with their location, PCA, ADMIXTURE and haplogroups (still with those originally given): https://public.tableau.com/profile/vagheesh#!/vizhome/TheGenomicFormationofSouthandCentralAsia/Fig_1

Featured image, from the article: “A Tale of Two Subcontinents. The prehistory of South Asia and Europe are parallel in both being impacted by two successive spreads, the first from the Near East after 7000 BCE bringing agriculturalists who mixed with local hunter-gatherers, and the second from the Steppe after 3000 BCE bringing people who spoke Indo-European languages and who mixed with those they encountered during their migratory movement. Mixtures of these mixed populations then produced the rough clines of ancestry present in both South Asia and in Europe today (albeit with more variable proportions of local hunter-gatherer-related ancestry in Europe than in India), which are (imperfectly) correlated to geography. The plot shows in contour lines the time of the expansion of Near Eastern agriculture. Human movements and mixtures, which also plausibly contributed to the spread of languages, are shown with arrows.”
