The Iron Age expansion of Southern Siberian groups and ancestry with Scythians


Maternal genetic features of the Iron Age Tagar population from Southern Siberia (1st millennium BC), by Pilipenko et al. (2018).

Interesting excerpts (emphasis mine):

The positions of non-Tagar Iron Age groups in the MDS plot were correlated with their geographic position within the Eurasian steppe belt and with frequencies of Western and Eastern Eurasian mtDNA lineages in their gene pools. Series from chronological Tagar stages (similar to the overall Tagar series) were located within the genetic variability (in terms of mtDNA) of Scythian World nomadic groups (Figs 5 and 6; S4 and S6 Tables). Specifically, the Early Tagar series was more similar to western nomads (North Pontic Scythians), while the Middle Tagar was more similar to the Southern Siberian populations of the Scythian period. The Late Tagar group (Tes' culture) belonging to the Early Xiongnu period had the “western-most” location on the MDS plot with the maximal genetic difference from Xiongnu and other eastern nomadic groups (but see Discussion concerning the low sample size for the Tes' series).

In a comparison of our Tagar series with modern populations in Eurasia, we detected similarity between the Tagar group and some modern Turkic-speaking populations (with the exception of the Indo-Iranian Tajik population) (Fig 7; S2 Table). Among the modern Turkic-speaking groups, populations from the western part of the Eurasian steppe belt, such as Bashkirs from the Volga-Ural region and Siberian Tatars from the West Siberian forest-steppe zone, were more similar to the Tagar group than modern Turkic-speaking populations of the Altay-Sayan mountain system (including the Khakassians from the Minusinsk basin) (Fig 7).

Location of Tagar archaeological sites from which samples for this study were obtained. Burial grounds: 1—Novaya Chernaya-1; 2—Podgornoe Ozero, Barsuchiha-1, Barsuchiha-6, Barsuchiha-7; 3—Perevozinskiy; 4—Ulug-Kyuzyur, Kichik-Kyuzyur, Sovetskaya Khakassiya; 5—Tepsey-3, Tepsey-8, Tepsey-9; 6—Dolgiy Kurgan.

Mitochondrial DNA diversity and genetic relationships of the Tagar population

Our results are not inconsistent with the assumption of a probable role of gene flow due to the migration from Western Eurasia to the Minusinsk basin in the Bronze Age in the formation of the genetic composition of the Tagar population. Particularly, we detected many mtDNA lineages/clusters with probable West Eurasian origin that were dominant in modern populations of different parts of Europe, Caucasus, and the Near East (such as K and HV6) in our Tagar series based on a phylogeographic analysis.

We detected relatively low genetic distances between our Tagar population and two Bronze Age populations from the Minusinsk basin—the Okunevo culture population (pre-Andronovo Bronze Age) and Andronovo culture population, followed by Afanasievo population from the Minusinsk Basin and Middle Bronze Age population from the Mongolian Altai Mountains (the region adjacent to the Minusinsk basin) (Figs 3 and 6; S3 and S5 Tables). Among West Eurasian part of our Tagar series we also observed haplogroups/sub-haplogroups and haplotypes shared with Early and Middle Bronze Age populations from Minusinsk Basin and western part of Eurasian steppe belt (Fig 4; S5 Table). Thus, our results suggested a potentially significant role of the genetic components, introduced by migrants from Western Eurasia during the Bronze Age, in the formation of the genetic composition of the Tagar population. It is necessary to note the relatively small size of available mtDNA samples from the Bronze Age populations of Minusinsk basin; accordingly, additional mtDNA data for these populations are required to further confirm our inference.

Phylogenetic tree of mtDNA lineages from the Tagar population. Color coding of the Tagar stages: orange—the Early Tagar stage; blue—the Middle Tagar Stage; green—the Late Tagar stage. Color of haplogroup labels: yellow—for Western Eurasian haplogroups; red—for Eastern Eurasian haplogroups.

Another substantial part of the mtDNA pool of the Tagar and other eastern populations of the Scythian World is typical of populations in Southern Siberia and adjacent regions of Central Asia (autochthonous Central Asian mtDNA clusters). Most of these components belong to the East Eurasian cluster of mtDNA haplogroups. Moreover, the role of each of these components in the formation of the genetic composition of subsequent (to the present) populations in South Siberia and Central Asia could be very different. In this regard, cluster C4a2a (and its subcluster C4a2a1), and haplogroup A8 are of particular interest.

Genetic features of successive Tagar groups

We compared successive Tagar groups (Early, Middle, and Late Tagar) with each other and with other Iron Age nomadic populations to evaluate changes in the mtDNA pool structure. Despite the genetic similarity between the Early and Middle Tagar series and Scythian World nomadic groups (Figs 5 and 6; S4 and S6 Tables), there were some peculiarities. For example, the Early Tagar series was more similar to North Pontic Classic Scythians, while the Middle Tagar samples were more similar to the Southern Siberian populations of the Scythian period (i.e., completely synchronous populations of regions neighboring the Minusinsk basin, such as the Pazyryk population from the Altay Mountains and Aldy-Bel population from Tuva).

We observed differences in the mtDNA pool structure between the Early and the Middle chronological stages of the Tagar culture population, as evidenced by the change in the ratio of Western to Eastern Eurasian mtDNA components. The contribution of Eastern Eurasian lineages increased from about one-third (34.8%) in the Early Tagar group to almost one-half (45.8%) in the Middle Tagar group.

Results of multidimensional scaling based on matrix of Slatkin population differentiation (FST) according to frequencies of mtDNA haplogroups in Tagar populations and modern populations of Eurasia. Populations: Tagar (red pentagon) (this study); Mongolian-speaking populations: Khamnigans (Buryat Republic, Russia) [43]; Barghuts (Inner Mongolia, China) [44]; Buryats (Buryat Republic, Southern Siberia, Russia) [43]; Mongols (Mongolia) [45]. Turkic-speaking populations: Tuvinians (Tuva Republic, Russia) [43]; Tofalars (Irkutsk region, Russia) [46]; Altai-Kizhi (Altai Republic, Russia) [43, 47]; Telenghits (Altai Republic, Russia) [43, 47]; Tubalars (Altai Republic) [48]; Shors (Kemerovo region, Russia) [43, 47]; Khakassians (Khakassian Republic, Russia) [43, 46]; Altaian Kazakhs (Altai Republic) [49]; Kazakhs (Kazakhstan, Uzbekistan) [50, 51]; Kirghiz (Kyrgyzstan) [50, 51]; Uighurs (Kazakhstan and Xinjiang) [50, 52]; Siberian Tatars (Tyumen and Omsk regions, Russia) [53]; Tatars (Volga-Ural region, Russia) [54]; Bashkirs (Volga-Ural region, Russia) [55]; Uzbeks (Uzbekistan) [51, 56]; Turkmens (Turkmenistan) [51, 56]; Nogays [57]; Turks [58]; other populations: Evenks [43, 46]; Ulchi [59]; Koreans (South Korea) [43]; Han Chinese [60]; Zhuang (Guangxi, China) [61]; Tadjiks (Tadjikistan) [43, 51]; Iranians [60]; Russians [62].

At the level of mtDNA haplogroups, we detected a decrease in the diversity of phylogenetic clusters during the transition from the Early Tagar to the Middle Tagar. This decline in diversity equally affected the West Eurasian and East Eurasian components of the Tagar mtDNA pool. It should be noted that this decrease can be partially explained by the smaller number of Middle Tagar than Early Tagar samples. Under a simple binomial approximation the mtDNA clusters, observed at frequencies of 6.3% and 11.7%, could be lost by chance in our Early (N = 46) and Middle (N = 24) Tagar samples, respectively. However, the simultaneous lack of several such clusters, with a total frequency in the gene pool of the Early group of 34.8%, is unlikely.
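The binomial reasoning in this excerpt can be reproduced in a few lines. This is only a sketch: the excerpt does not say which cutoff the authors used, so the 5% threshold below is an assumption (it happens to reproduce the quoted 6.3% and 11.7%), and the function names are made up for illustration.

```python
def prob_cluster_missed(freq: float, n: int) -> float:
    """Probability that a lineage with population frequency `freq`
    is absent from a random sample of `n` individuals."""
    return (1.0 - freq) ** n


def max_freq_missable(n: int, alpha: float = 0.05) -> float:
    """Highest frequency at which a lineage is still missed with
    probability >= alpha (alpha = 0.05 is an assumed cutoff)."""
    return 1.0 - alpha ** (1.0 / n)


# Sample sizes quoted in the excerpt: Early Tagar N = 46, Middle Tagar N = 24.
early_limit = max_freq_missable(46)
middle_limit = max_freq_missable(24)
print(f"Early (N=46): {early_limit:.1%}, Middle (N=24): {middle_limit:.1%}")
```

In other words, a single cluster below roughly these frequencies could plausibly be absent from a sample of that size by chance alone, which is why the authors stress the joint absence of several clusters rather than any one of them.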

The observed reduction in the genetic distance between the Middle Tagar population and other Scythian-like populations of Southern Siberia (Fig 5; S4 Table), in our opinion, is primarily associated with an increase in the role of East Eurasian mtDNA lineages in the gene pool (up to nearly half of the gene pool) and a substantial increase in the joint frequency of haplogroups C and D (from 8.7% in the Early Tagar series to 37.5% in the Middle Tagar series). These features are characteristic of many ancient and modern populations of Southern Siberia and adjacent regions of Central Asia, including the Pazyryk population of the Altai Mountains. We did not obtain strong evidence for an intensification of genetic contact between the population of the Minusinsk basin and the Altai Mountains in the Middle Tagar period compared with the Early Tagar period, although several archaeologists have found evidence for the intensification of contact at the level of material culture, namely, a cultural influence of the population of the Altai Mountains (represented by the Pazyryk population) on the population of the Minusinsk basin (the Saragash Tagar group) [6, 71, 72].

Another important issue is the change in the genetic structure of the Tagar population during the transition from the Middle (Saragash) to the Late (Tes') stage. The Late Tagar stage refers to the Xiongnu period. Many archaeologists suggest that the formation of the Tes' stage involved the direct cultural influence of the Xiongnu and/or related groups of nomads from more eastern regions of Central Asia [71, 73]. Some archaeologists have even suggested renaming the Tes' stage to the Tes' culture [71], emphasizing the role of new eastern cultural elements. If this influence also existed at the genetic level, then we would expect to observe new genetic elements in the Tes' gene pool, particularly those of East Eurasian origin.

Siberian ancestry

Just a reminder of the recent session in ISBA 8 on expanding Scythians (and also Mongolians and Turks) spreading Siberian ancestry, usually (wrongly) identified as “Uralic-Yeniseian” based on modern populations (similar to how steppe ancestry is wrongly identified as “Indo-European”), see the following graphic including the Tagar population:

Very important observation with implication of population turnover is that pre-Turkic Inner Eurasian populations’ Siberian ancestry appears predominantly “Uralic-Yeniseian” in contrast to later dominance of “Tungusic-Mongolic” sort (which does sporadically occur earlier). Alexander M. Kim

And also the poster by Alexander M. Kim et al. Yeniseian hypotheses in light of genome-wide ancient DNA from historical Siberia:

The relevance of ancient DNA data to debates in historical linguistics is an emphatic strand in much recent work on the archaeogenetics of Eurasia, where the discussion has focused heavily on Indo-European (Haak et al. 2015; Narasimhan et al. 2018; de Barros Damgaard et al. 2018a,b). We present new genome-wide ancient DNA data from a historical Siberian individual in relation to Yeniseian, an isolated language “microfamily” (Vajda 2014) that nonetheless sits at the center of numerous controversial proposals in historical linguistics and cultural interaction. Yeniseian’s sole surviving representative is Ket, a critically endangered language fluently spoken by only a few dozen individuals near the Middle Yenisei River of Central Siberia.

In strong contrast to the present-day picture, river names and argued substrate influences and loanwords in languages outside the current range of Yeniseian, as well as direct records from the Russian colonial period, indicate that speakers of extinct Yeniseian languages had a formerly much broader presence in the taiga of Central Siberia as well as further south in the mountainous Altai-Sayan region – and perhaps even further afield in Inner Asia (Vajda 2010; Gorbachov 2017; Blažek 2016). The consilience of these proposals with genetic data is not straightforward (Flegontov et al. 2015, 2017) and faces a major obstacle in the lack of genetic information from verifiable speakers of Yeniseian languages other than the Kets, who have had complex ongoing interactions with speakers of non-Yeniseian languages such as the Samoyedic Selkups. We attempt to remedy this with new historical Siberian aDNA data, orienting our search for common denominators and systematic difference in a broader landscape of concordance, discordance, and uncertainty at the interface of diachronic linguistics and genetics.


Modern Sardinians show elevated Neolithic farmer ancestry shared with Basques


New paper (behind paywall), Genomic history of the Sardinian population, by Chiang et al. Nature Genetics (2018), previously published as a preprint at bioRxiv (2016).

#EDIT (18 Sep 2018): Link to read paper for free shared by the main author.

Interesting excerpts (emphasis mine):

Our analysis of divergence times suggests the population lineage ancestral to modern-day Sardinia was effectively isolated from the mainland European populations ~140–250 generations ago, corresponding to ~4,300–7,000 years ago assuming a generation time of 30 years and a mutation rate of 1.25 × 10⁻⁸ per basepair per generation. (…) in terms of relative values, the divergence time between Northern and Southern Europeans is much more recent than either is to Sardinia, signaling the relative isolation of Sardinia from mainland Europe.

We documented fine-scale variation in the ancient population ancestry proportions across the island. The most remote and interior areas of Sardinia (the Gennargentu massif covering the central and eastern regions, including the present-day province of Ogliastra) are thought to have been the least exposed to contact with outside populations. We found that pre-Neolithic hunter-gatherer and Neolithic farmer ancestries are enriched in this region of isolation. Under the premise that Ogliastra has been more buffered from recent immigration to the island, one interpretation of the result is that the early populations of Sardinia were an admixture of the two ancestries, rather than the pre-Neolithic ancestry arriving via later migrations from the mainland. Such admixture could have occurred principally on the island or on the mainland before the hypothesized Neolithic era influx to the island. Under the alternative premise that Ogliastra is simply a highly isolated region that has differentiated within Sardinia due to genetic drift, the result would be interpreted as genetic drift leading to a structured pattern of pre-Neolithic ancestry across the island, in an overall background of high Neolithic ancestry.

PCA results of merged Sardinian whole-genome sequences and the HGDP Sardinians. See below for a map of the corresponding regions.

We found Sardinians show a signal of shared ancestry with the Basque in terms of the outgroup f3 shared-drift statistics. This is consistent with long-held arguments of a connection between the two populations, including claims of Basque-like, non-Indo-European words among Sardinian placenames. More recently, the Basque have been shown to be enriched for Neolithic farmer ancestry and Indo-European languages have been associated with steppe population expansions in the post-Neolithic Bronze Age. These results support a model in which Sardinians and the Basque may both retain a legacy of pre-Indo-European Neolithic ancestry. To be cautious, while it seems unlikely, we cannot exclude that the genetic similarity between the Basque and Sardinians is due to an unsampled pre-Neolithic population that has affinities with the Neolithic representatives analyzed here.
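For readers unfamiliar with the outgroup f3 statistic mentioned here, the idea is roughly as follows: f3(O; A, B) averages (o − a)(o − b) over SNPs, where o, a and b are allele frequencies in the outgroup and the two test populations, so higher values mean more drift shared by A and B since their split from O. The sketch below is a simplified illustration, not the paper's pipeline: it omits the finite-sample bias correction and the block-jackknife standard errors that real analyses use, and all frequencies are invented toy values.

```python
def outgroup_f3(outgroup, pop_a, pop_b):
    """Naive outgroup f3: mean over SNPs of (o - a) * (o - b).

    Each argument is a list of per-SNP allele frequencies in [0, 1].
    No bias correction is applied (assumes frequencies are known exactly).
    """
    if not outgroup or not (len(outgroup) == len(pop_a) == len(pop_b)):
        raise ValueError("need equal-length, non-empty frequency lists")
    total = sum((o - a) * (o - b) for o, a, b in zip(outgroup, pop_a, pop_b))
    return total / len(outgroup)


# Hypothetical toy data: four SNPs, one outgroup, three test populations.
outgroup = [0.0, 1.0, 0.0, 1.0]
pop_x = [0.8, 0.2, 0.6, 0.9]
pop_y = [0.7, 0.3, 0.5, 0.9]   # frequencies close to pop_x
pop_z = [0.1, 0.9, 0.6, 0.2]   # frequencies mostly unlike pop_x

print(outgroup_f3(outgroup, pop_x, pop_y))  # larger: more shared drift
print(outgroup_f3(outgroup, pop_x, pop_z))  # smaller: less shared drift
```

Ranking many populations by such a value against one fixed target is how a "signal of shared ancestry" of this kind is usually displayed.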

Left: Geographical map of Sardinia. The provincial boundaries are given as black lines. The provinces are abbreviated as Cag (Cagliari), Cmp (Campidano), Car (Carbonia), Ori (Oristano), Sas (Sassari), Olb (Olbia-tempio), Nuo (Nuoro), and Ogl (Ogliastra). For sampled villages within Ogliastra, the names and abbreviations are indicated in the colored boxes. The color corresponds to the color used in the PCA plot (Fig. 2a). The Gennargentu region referred to in the main text is the mountainous area shown in brown that is centered in western Ogliastra and southeastern Nuoro.
Right: Density of Nuraghi in Sardinia, from Wikipedia.

While we can confirm that Sardinians principally have Neolithic ancestry on the autosomes, the high frequency of two Y-chromosome haplogroups (I2a1a1 at ~39% and R1b1a2 at ~18%) that are not typically affiliated with Neolithic ancestry is one challenge to this model. Whether these haplogroups rose in frequency due to extensive genetic drift and/or reflect sex-biased demographic processes has been an open question. Our analysis of X chromosome versus autosome diversity suggests a smaller effective size for males, which can arise due to multiple processes, including polygyny, patrilineal inheritance rules, or transmission of reproductive success. We also find that the genetic ancestry enriched in Sardinia is more prevalent on the X chromosome than the autosome, suggesting that male lineages may more rapidly trace back to the mainland. Considering that the R1b1a2 haplogroup may be associated with post-Neolithic steppe ancestry expansions in Europe, and the recent timeframe when the R1b1a2 lineages expanded in Sardinia, the patterns raise the possibility of recent male-biased steppe ancestry migration to Sardinia, as has been reported among mainland Europeans at large (though see Lazaridis and Reich and Goldberg et al.). Such a recent influx is difficult to square with the overall divergence of Sardinian populations observed here.

Mixture proportions of the three-component ancestries among Sardinian populations. Using a method first presented in Haak et al. (Nature 522, 207–211, 2015), we computed unbiased estimates of mixture proportions without a parameterized model of relationships between the test populations and the outgroup populations based on f4 statistics. The three-component ancestries were represented by early Neolithic individuals from the LBK culture (LBK_EN), pre-Neolithic hunter-gatherers (Loschbour), and Bronze Age steppe pastoralists (Yamnaya). See Supplementary Table 5 for standard error estimates computed using a block jackknife.

Once again, haplogroup R1b1a2 (M269), and only R1b1a2, related to male-biased, steppe-related Indo-European migrations…just sayin’.

Interestingly, haplogroup I2a1a1 is actually found among northern Iberians during the Neolithic and Chalcolithic, and is therefore associated with Neolithic ancestry in Iberia, too, and consequently – unless there is a big surprise hidden somewhere – with the ancestry found today among Basques.

NOTE. In fact, the increase in Neolithic ancestry found in south-west Ireland with expanding Bell Beakers (likely Proto-Beakers), coupled with the finding of I2a subclades in Megalithic cultures of western Europe, would support this replacement after the Cardial and Epi-Cardial expansions, which were initially associated with G2a lineages.

I am not convinced about a survival of Palaeo-Sardo after the Bell Beaker expansion, though, since there is no clear-cut cultural divide (and posterior continuity) of pre-Beaker archaeological cultures after the arrival of Bell Beakers in the island that could be identified with the survival of Neolithic languages.

We may have to wait for ancient DNA to show a potential expansion of Neolithic ancestry from the west, maybe associated with the emergence of the Nuragic civilization (potentially linked with contemporaneous Megalithic cultures in Corsica and in the Balearic Islands, and thus with an Iberian rather than a Basque stock), although this is quite speculative at this moment in linguistic, archaeological, and genetic terms.

Nevertheless, it seems that the association of a Basque-Iberian language with the Neolithic expansion from Anatolia (see Villar’s latest book on the subject) is somehow strengthened by this paper. However, it is unclear when, how, and where expanding G2a subclades were replaced by native I2 lineages.


Cystic fibrosis probably spread with expanding Bell Beakers


New paper (behind paywall), Estimating the age of p.(Phe508del) with family studies of geographically distinct European populations and the early spread of cystic fibrosis, by Farrell et al., European Journal of Human Genetics (2018).

Interesting excerpts (emphasis mine):

Our results revealed tMRCA average values ranging from 4725 to 1175 years ago and support the estimates of Serre et al. (3000–6000 years ago) [11], rather than Morral et al. (52,000 years ago) [6], but the latter figure was challenged by Kaplan et al. [26] because of disagreement with assumptions used in their calculations. In addition, the tMRCA values from western European regions reported herein refine the results of Fichou et al. [7] from a study of Breton CF patients in which the Estiage analysis suggested that the most recent common ancestor lived 115 generations ago. That tMRCA value, however, may have underestimated the age of p.(Phe508del) in Brittany due to consideration of all the haplotypes, even those that were reconstructed with ambiguities, as well as a potential bias associated with consanguinity due to including both haplotypes in homozygous families. In the more stringent Estiage analyses reported herein, those potential biases were avoided for all populations, leading to estimates of the oldest tMRCA values corresponding to the Early Bronze Age in western Europe, which is generally agreed to begin around 3000 BCE. This finding extends our results from a direct investigation of aDNA in teeth from Iron Age burials near Vienna around 350 BCE and allows us to conclude that p.(Phe508del) was present in that region long before then. More specifically, in the Austrian families studied, the Estiage data revealed a mean tMRCA value of 3575 years ago, which converts to 1558 BCE (Middle Bronze Age) [22].
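The "years ago" to calendar-date conversion quoted above is easy to check. The sketch below assumes the reference point is the publication year (2018), which is not stated in the excerpt but does reproduce the quoted 1558 BCE; the function name is hypothetical.

```python
def years_ago_to_calendar(years_ago: int, reference_year: int = 2018) -> str:
    """Convert 'years before reference_year' into a BCE/CE label.

    There is no year zero, so astronomical year 0 is 1 BCE,
    astronomical year -1 is 2 BCE, and so on.
    reference_year = 2018 is an assumption (the paper's publication year).
    """
    astronomical = reference_year - years_ago
    if astronomical > 0:
        return f"{astronomical} CE"
    return f"{1 - astronomical} BCE"


print(years_ago_to_calendar(3575))  # Austrian mean tMRCA quoted above
print(years_ago_to_calendar(4725))  # oldest mean tMRCA quoted above
print(years_ago_to_calendar(1175))  # youngest mean tMRCA quoted above
```

Note that these are point conversions of mean values only; the confidence intervals reported in the paper are what matter when comparing regions.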

Perhaps most remarkably, the estimated ages of p.(Phe508del) in the three western European regions (France, Ireland, and Denmark) were similar with closely overlapping 95% CI values. This observation is also in line with previously documented spatial autocorrelograms expressing genetic and geographical distance for these populations [24]. Such data provide more insight about the ancient origin of CF in our judgment—both when and where—and lead us to propose that CFTR p.(Phe508del) is derived from ancestors who lived in western Europe during the Bronze Age, as early as 2700 BCE, and that its relatively rapid dissemination occurred because of human migrations around the northwestern Atlantic trading routes [21] and then towards central and eastern Europe [22]. Diffusion from northwestern to central Europe in approximately 1000 years is consistent with the prominent Bronze Age migrations evident in the archeological record [21, 22] and from genomic studies of aDNA [27]. On the other hand, we are assuming a discrete origin of the principal CF-causing variant, but it is possible that p.(Phe508del) arose more than once or earlier, and then reached western Europe subsequently through Neolithic migrations.


[About Bell Beakers] (…) More specifically, their distinctive Bell Beaker pottery appeared and spread across western and central Europe beginning around 3000–2750 BCE and then disappeared between 2200 and 1800 BCE [22, 29]. Their migrations are linked to the advent of western and central European metallurgy, as they manufactured and traded metal goods, especially weapons, while traveling over long distances [30]. Most relevant to our study is the evidence that they migrated in a direction and over a time period that fits well with the pattern of tMRCA data we found for the p.(Phe508del) variant. Olalde et al. [29] have shown that both migration and cultural transmission played a major role in diffusion of the “Beaker Complex” and led to a “profound demographic transformation” of Britain after 2400 BCE. Moreover, the cultural elements that unite the widely distributed Beaker folk are so obvious that some have considered them a distinct ethnicity of Bronze Age people [33].

From our results, we propose the novel concept that large scale, long term west-to-east migrations of the Bell Beaker Europeans [22, 28–30] during the Bronze Age, could explain the dissemination of p.(Phe508del) in Europe and its documented northwest-to-southeast gradient [4]. In fact, our tMRCA data show a temporal gradient also.

As you can see from the references, they consulted with Barry Cunliffe (or people accepting his theory), who is obsessed with Bell Beakers expanding Celtic languages from the British Isles. He is the British counterpart of Danish scholar Kristian Kristiansen, with his own obsession with Corded Ware = Indo-European (and Germanic = CWC Denmark): both ideas remain immutable no matter what genetic results might show.

The funny thing is, the interpretation of the paper is probably right. From what we can see in the data, it is quite possible that the disease spread with expanding Bell Beakers…only it spread from the East group in Hungary, i.e. from east to west. The regional difference in tMRCA and the apparent west-east cline would point to the different expansions of affected lineages in the corresponding regions, and not to an origin in the British Isles.


Male-biased expansions and migrations also observed in Northwestern Amazonia

Open access preprint, Cultural Innovations influence patterns of genetic diversity in Northwestern Amazonia, by Arias et al., bioRxiv (2018).

Abstract (emphasis mine):

Human populations often exhibit contrasting patterns of genetic diversity in the mtDNA and the non-recombining portion of the Y-chromosome (NRY), which reflect sex-specific cultural behaviors and population histories. Here, we sequenced 2.3 Mb of the NRY from 284 individuals representing more than 30 Native-American groups from Northwestern Amazonia (NWA) and compared these data to previously generated mtDNA genomes from the same groups, to investigate the impact of cultural practices on genetic diversity and gain new insights about NWA population history. Relevant cultural practices in NWA include postmarital residential rules and linguistic-exogamy, a marital practice in which men are required to marry women speaking a different language. We identified 2,969 SNPs in the NRY sequences; only 925 SNPs were previously described. The NRY and mtDNA data showed that males and females experienced different demographic histories: the female effective population size has been larger than that of males through time, and both markers show an increase in lineage diversification beginning ~5,000 years ago, with a male-specific expansion occurring ~3,500 years ago. These dates are too recent to be associated with agriculture, therefore we propose that they reflect technological innovations and the expansion of regional trade networks documented in the archaeological evidence. Furthermore, our study provides evidence of the impact of postmarital residence rules and linguistic exogamy on genetic diversity patterns. Finally, we highlight the importance of analyzing high-resolution mtDNA and NRY sequences to reconstruct demographic history, since this can differ considerably between males and females.

MDS plots for mtDNA and NRY. Stress values (within parentheses) are indicated in percentages.

Looking more precisely at the different groups (even with the resampling approach), there are no significant differences between matrilocal and patrilocal groups. At best, as the study proposes, “this is just one of the factors at play in structuring the observed genetic variation”.

Interesting excerpts:

(…) we found evidence that the patterns of genetic differentiation depend on the geographical scale of the study. The magnitude of between-population differentiation in the NRY compared to the mtDNA is smaller when looking at the continental scale than in NWA (Figure 6). This is in agreement with the findings of Wilkins and Marlowe (2006), who showed that the excess of between-population differentiation for the NRY in comparison to the mtDNA decreases when comparing more geographically distant populations. Heyer et al. (2012) and Wilkins and Marlowe (2006) have proposed that at a local scale the patterns of genetic diversity reflect cultural practices over a relatively small number of generations, whereas at a larger geographic scale the genetic diversity reflects old migration and/or old common ancestry patterns (Heyer et al. 2012; Wilkins and Marlowe 2006).

BSPs for the mtDNA and NRY sequences from NWA. The dotted lines indicate the 95% HPD intervals. Ne was corrected for generation time according to (Fenner 2005), using 26 years for mtDNA and 31 years for NRY.

The BSP plots and the diversity statistics indicate that overall the Ne of males has been smaller than that of females. One tentative explanation for this difference is that it reflects larger differences in reproductive success among males than among females. Some support for this explanation comes from the shape of the phylogenies (Supplementary Figures 1 and 6), since differences in reproductive success and the cultural transmission of fertility lead to imbalanced phylogenies (Blum et al. 2006; Heyer et al. 2015). We estimated a common index of tree imbalance (Colless index) and calculated whether the mtDNA and NRY trees were more unbalanced than 1000 simulated trees generated under a Yule process (Bortolussi et al. 2006) (i.e. a simple pure birth process that assumes that the birth rate of new lineages is the same along the tree). We found that the NRY tree is more unbalanced than predicted by the Yule model (p-value=0.001), whereas the mtDNA tree is not significantly different from trees generated by the Yule model (p-value=0.628). It has been suggested that highly mobile hunter-gatherer societies, such as those typical of most of human prehistory, were polygynous bands (Dupanloup et al. 2003); similarly, nomadic horticulturalist Amazonian societies exhibit strong differences in reproductive success due to the common practice of polygyny, especially among community chiefs, whose offspring also enjoy a high fertility (Neel 1970; 1980; Neel and Weiss 1975).
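As a reminder of what the Colless index measures: for every internal node of a rooted binary tree, take the absolute difference between the number of tips on its two sides, and sum these over the tree. The sketch below computes it for toy trees written as nested 2-tuples. This representation is my own choice for illustration, not the format used in the study, and the comparison against Yule-simulated trees is not included.

```python
def _colless(tree):
    """Return (colless_score, number_of_tips) for a nested-tuple tree."""
    if not isinstance(tree, tuple):
        return 0, 1  # a tip contributes no imbalance
    if len(tree) != 2:
        raise ValueError("tree must be strictly binary")
    left_score, left_tips = _colless(tree[0])
    right_score, right_tips = _colless(tree[1])
    score = left_score + right_score + abs(left_tips - right_tips)
    return score, left_tips + right_tips


def colless_index(tree) -> int:
    """Sum over internal nodes of |tips on left - tips on right|."""
    return _colless(tree)[0]


# Hypothetical four-tip trees: fully balanced vs. fully ladder-like.
balanced = (("a", "b"), ("c", "d"))
ladder = ((("a", "b"), "c"), "d")

print(colless_index(balanced))  # no imbalance at any node
print(colless_index(ladder))    # maximal imbalance for four tips
```

A ladder-like tree, where one lineage keeps out-reproducing its sister at every split, is the kind of shape that differences in male reproductive success are expected to produce in the NRY phylogeny.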

Furthermore, a more recent expansion can be observed in the BSP based on the NRY, but not in the mtDNA BSP (Figure 5), indicating an expansion specifically in the paternal line. The reasons behind this recent male-biased population expansion, which starts ~3.5 kya, are as yet unclear. However, similar male-biased expansions have been observed in other studies using high-resolution NRY sequences (Batini et al. 2017; Karmin et al. 2015).


Ancient DNA reveals temporal population structure of pre-Incan and Incan periods in South‐Central Andes area

Ancient DNA reveals temporal population structure within the South‐Central Andes area, by Russo et al. Am. J. Phys. Anthropol. (2018).

Abstract (emphasis mine):

The main aim of this work was to contribute to the knowledge of pre‐Hispanic genetic variation and population structure among the South‐central Andes Area by studying individuals from Quebrada de Humahuaca, North‐western (NW) Argentina.

Materials and methods
We analyzed 15 autosomal STRs in 19 individuals from several archaeological sites in Quebrada de Humahuaca, belonging to the Regional Developments Period (900–1430 AD). Compiling autosomal, mitochondrial, and Y‐chromosome data, we evaluated population structure and differentiation among eight South‐central Andean groups from the current territories of NW Argentina and Peru.

Location of the archaeological sites analyzed in this study (stars) and the South-central Andean populations used for comparisons (triangles). The dotted line indicates the north-south subdivision of Quebrada de Humahuaca. 1: Peñas Blancas, 2: San José, 3: Huacalera, 4: Banda de Perchel, 5: Juella, 6: Sarahuaico, 7: Tilcara, 8: Muyuna, 9: Los Amarillos, 10: Las Pirguas, 11: Tompullo 2, 12: Puca, 13: Acchaymarca, 14: Lauricocha. Map constructed with the R package ggmap (Kahle & Wickham, 2013).

Autosomal data revealed a structuring of the analyzed populations into two clusters which seemed to represent different temporalities in the Andean pre‐Hispanic history: pre‐Inca and Inca. All pre‐Inca samples fell into the same cluster despite being from the two different territories of NW Argentina and Peru. Also, they were systematically differentiated from the Peruvian Inca group. These results were mostly confirmed by mitochondrial and Y‐chromosome analyses. We mainly found a clearly different haplotype composition between clusters.

Population structure in South America has been mostly studied on current native groups, mainly showing a west‐to‐east differentiation between the Andean and lowland regions. Here we demonstrated that genetic population differentiation preceded the European contact and might have been more complex than thought, being found within the South‐central Andes Area. Moreover, divergence among temporally different populations might be reflecting socio‐political changes that occurred in the evermore complex pre‐Hispanic Andean societies.

Principal coordinates analysis (PCoA) based on individual genetic distances obtained with autosomal STRs data. Percentage of variance explained by each coordinate is shown in parentheses. Colors were assigned according to the two clusters discovered with STRUCTURE.

Early Indo-Iranian formed mainly by R1b-Z2103 and R1a-Z93, Corded Ware out of Late PIE-speaking migrations


The awaited, open access paper on Asian migrations is out: The Genomic Formation of South and Central Asia, by Narasimhan et al. bioRxiv (2018).


The genetic formation of Central and South Asian populations has been unclear because of an absence of ancient DNA. To address this gap, we generated genome-wide data from 362 ancient individuals, including the first from eastern Iran, Turan (Uzbekistan, Turkmenistan, and Tajikistan), Bronze Age Kazakhstan, and South Asia. Our data reveal a complex set of genetic sources that ultimately combined to form the ancestry of South Asians today. We document a southward spread of genetic ancestry from the Eurasian Steppe, correlating with the archaeologically known expansion of pastoralist sites from the Steppe to Turan in the Middle Bronze Age (2300-1500 BCE). These Steppe communities mixed genetically with peoples of the Bactria Margiana Archaeological Complex (BMAC) whom they encountered in Turan (primarily descendants of earlier agriculturalists of Iran), but there is no evidence that the main BMAC population contributed genetically to later South Asians. Instead, Steppe communities integrated farther south throughout the 2nd millennium BCE, and we show that they mixed with a more southern population that we document at multiple sites as outlier individuals exhibiting a distinctive mixture of ancestry related to Iranian agriculturalists and South Asian hunter-gatherers. We call this group Indus Periphery because they were found at sites in cultural contact with the Indus Valley Civilization (IVC) and along its northern fringe, and also because they were genetically similar to post-IVC groups in the Swat Valley of Pakistan.
By co-analyzing ancient DNA and genomic data from diverse present-day South Asians, we show that Indus Periphery-related people are the single most important source of ancestry in South Asia — consistent with the idea that the Indus Periphery individuals are providing us with the first direct look at the ancestry of peoples of the IVC — and we develop a model for the formation of present-day South Asians in terms of the temporally and geographically proximate sources of Indus Periphery-related, Steppe, and local South Asian hunter-gatherer-related ancestry. Our results show how ancestry from the Steppe genetically linked Europe and South Asia in the Bronze Age, and identifies the populations that almost certainly were responsible for spreading Indo-European languages across much of Eurasia.

NOTE. The supplementary material seems to be full of errors right now: it lists as R1b-M269 (and further subclades) samples that were previously expressly reported as xM269, so we will have to wait to see if there are big surprises here. Examples include samples from Mal’ta (M269), Iron Gates (M269 and L51), Latvia Mesolithic (L51), a Dereivka sample from 5230 BC (M269), and Armenia_EBA (Z2103). Also, the sample from Yuzhnyy Oleni Ostrov is now listed as R1a-M417.

EDIT (1 APR 2018): The main author has confirmed on Twitter that they have used a new Y Chr caller that calls haplogroups given the data provided, and depending on the coverage tried to provide a call to the lowest branch of the tree possible, so there are obviously a lot of mistakes – not just in the subclades of R. A revision of the paper is on its way, and soon more people will be able to work with the actual samples, since they say they are releasing them.

Nevertheless, since subclades (and not haplogroups) are the apparent source of gross errors, for the moment it seems we can say with a great degree of confidence that:

  • New samples of East Yamna / Poltavka are of haplogroup R1b-L23.
  • Afanasevo is confirmed to be dominated by R1b-M269.
  • Sintashta, as I predicted could happen, shows a mixed R1b-L23/ R1a-Z645 society, compatible with my model of continuity of Proto-Indo-Iranian in the East Yamna admixture with late Corded Ware immigrants.

With lesser confidence in precise subclades, we find that:

  • A sample from Hajji Firuz in Iran ca. 5650 BC, of subclade R1b-Z2103, may confirm Mesolithic R1b-M269 lineages from the Caucasus as the source of CHG ancestry to Khvalynsk/Yamna, and thus be the reason why Reich wrote about a potential PIE homeland south of the Caucasus. (EDIT 11 APR 2018) The sample shows steppe ancestry, therefore the date is most likely incorrect, and a new radiocarbon dating is due. It is still interesting – depending on the precise subclade – for its potential relationship with IE migrations into the area.
  • New samples of East Yamna / Poltavka are of haplogroup R1b-Z2103.
  • Afanasevo migrants are mainly of haplogroup R1b-Z2103.
    • The Darra-e Kur sample, ca. 2655, of haplogroup R1b-L151, without a clear cultural ascription, may be the expected sign of Afanasevo migrants (Pre-Proto-Tocharian speakers) expanding a Northern Indo-European (in contrast with a Southern or Graeco-Aryan) dialect, in a region closely linked with the later desert mummies in the Tarim Basin. Its early presence there would speak in favour of a migration through the Inner Asian Mountain Corridor prior to the one caused by Andronovo migrants.
  • Sintashta shows a mixed R1b-Z2103 / R1a-Z93 society.
    • Later Indo-Iranian migrations are apparently dominated by R1a-Z2123, an early subclade of R1a-Z93, also found in Srubna.
    • R1b is also seen later in BMAC (ca. 1487 BC), although its subclade is not given.
  • There is also a sample of R1a-Z283 subclade in the eastern steppe (ca. 1600 BC). What may be interesting about it is that it could mark one of the subclades not responsible for the expansion of Balto-Slavic (or responsible for it with the expansion of Srubna, for those who support an Indo-Slavonic branch related to Sintashta-Potapovka).
  • A sample of R1b-U106 subclade is found in Loebanr_IA ca. 950 BC, which – together with the sample of Darra-e Kur – is compatible with the presence of L51 in Yamna.

NOTE. Errors in haplogroups of previously published samples make every subclade of new samples from the supplementary table questionable, but all new samples (save for the Darra_i_Kur one) were analysed and probably reported by the Reich Lab, and at least upper subclades in each haplogroup tree seem mostly coherent with what was expected. Also, the contribution of Iranian Farmer related (a population in turn contributing to Hajji Firuz) to Khvalynsk in their sketch of the genetic history may be a sign of the association of R1b-M269 lineages with CHG ancestry, although previous data on precise R1b subclades in the region contradict this. (EDIT 11 APR 2018) The sample of Hajji Firuz is most likely much younger than the published date, hence its younger subclade may be correct. No revision or comment on this matter has been published, though.

Modeling results. (A) Admixture events originating from 7 “Distal” populations leading to the formation of the modern Indian cloud shown geographically. Clines or 2-way mixtures of ancestry are shown in rectangles, and clouds (3-way mixtures) are shown in ellipses.

Also, it seems that the Corded Ware culture appears now irrelevant for Late Proto-Indo-European migrations. Observe:

In the text, they use a consistent terminology of Yamnaya or Yamnaya-related Steppe pastoralists, discarding the relevance of previous migrations from the North Pontic steppe in spreading Late Indo-European:

Our results also shed light on the question of the origins of the subset of Indo-European languages spoken in India and Europe (45). It is striking that the great majority of Indo-European speakers today living in both Europe and South Asia harbor large fractions of ancestry related to Yamnaya Steppe pastoralists (corresponding genetically to the Steppe_EMBA cluster), suggesting that “Late Proto-Indo-European”, the language ancestral to all modern Indo-European languages, was the language of the Yamnaya (46). While ancient DNA studies have documented westward movements of peoples from the Steppe that plausibly spread this ancestry to Europe (5, 31), there has not been ancient DNA evidence of the chain of transmission to South Asia. Our documentation of a large-scale genetic pressure from Steppe_MLBA groups in the 2nd millennium BCE provides a prime candidate, a finding that is consistent with archaeological evidence of connections between material culture in the Kazakh middle-to-late Bronze Age Steppe and early Vedic culture in India (46).

EDIT (1 APR 2018): I corrected this text and the word ‘official’ in the title, because more than rejecting the role of Corded Ware migrants in expanding Late PIE, they actually seem to keep considering Corded Ware migrants as continuing the western Yamna expansion in the Carpathian Basin, so no big ‘official’ change or retraction in this paper, just subtle movements out of their previous model.

Modeling results. (B) A schematic model of events originating from 7 “Distal” populations leading to the formation of the modern Indian cline, shown chronologically. (C) Admixture proportions as estimated using qpAdm for populations reflected in A and B.

NOTE. If they correct the haplogroups soon, I will update the information in this post. Unless there is a big surprise that merits a new one, of course.

EDIT (1 APR 2018): Multiple minor edits to the original post.

EDIT (2 APR 2018): While I and other simple-minded people were only looking to confirm our previous theories using Y-DNA haplogroups, and are content with wildly speculating over the consequences if some of those strange (probably wrong) ones were true, intelligent people are using their time for something useful, interpreting the results of the investigation as described in the paper, to offer a clearer picture of Indo-Iranian migrations for everyone:

Visit the beautiful interactive map with samples: with their location, PCA, ADMIXTURE and haplogroups (still with those originally given):!/vizhome/TheGenomicFormationofSouthandCentralAsia/Fig_1

Featured image, from the article: “A Tale of Two Subcontinents. The prehistory of South Asia and Europe are parallel in both being impacted by two successive spreads, the first from the Near East after 7000 BCE bringing agriculturalists who mixed with local hunter-gatherers, and the second from the Steppe after 3000 BCE bringing people who spoke Indo-European languages and who mixed with those they encountered during their migratory movement. Mixtures of these mixed populations then produced the rough clines of ancestry present in both South Asia and in Europe today (albeit with more variable proportions of local hunter-gatherer-related ancestry in Europe than in India), which are (imperfectly) correlated to geography. The plot shows in contour lines the time of the expansion of Near Eastern agriculture. Human movements and mixtures, which also plausibly contributed to the spread of languages, are shown with arrows.”


Stone Age plague accompanying migrants from the steppe, probably Yamna, Balkan EBA, and Bell Beaker, not Corded Ware


In the latest revisions of the Indo-European demic diffusion model, using the results from the article Early Divergent Strains of Yersinia pestis in Eurasia 5,000 Years Ago, by Rasmussen et al., Cell (2015), I stated (more or less indirectly) that the high east-west mobility of the Corded Ware migrants across related cultures might have been responsible for the spread of this disease, which seems to have originally expanded from Central Eurasia.

New results appeared recently in the article The Stone Age Plague and Its Persistence in Eurasia, by Valtueña et al., Current Biology (2017), which may contradict that interpretation.

Diachronic map of Copper Age migrations ca. 3100-2600 BC – Corded Ware


Yersinia pestis, the etiologic agent of plague, is a bacterium associated with wild rodents and their fleas. Historically it was responsible for three pandemics: the Plague of Justinian in the 6th century AD, which persisted until the 8th century [ 1 ]; the renowned Black Death of the 14th century [ 2, 3 ], with recurrent outbreaks until the 18th century [ 4 ]; and the most recent 19th century pandemic, in which Y. pestis spread worldwide [ 5 ] and became endemic in several regions [ 6 ]. The discovery of molecular signatures of Y. pestis in prehistoric Eurasian individuals and two genomes from Southern Siberia suggest that Y. pestis caused some form of disease in humans prior to the first historically documented pandemic [ 7 ]. Here, we present six new European Y. pestis genomes spanning the Late Neolithic to the Bronze Age (LNBA; 4,800 to 3,700 calibrated years before present). This time period is characterized by major transformative cultural and social changes that led to cross-European networks of contact and exchange [ 8, 9 ]. We show that all known LNBA strains form a single putatively extinct clade in the Y. pestis phylogeny. Interpreting our data within the context of recent ancient human genomic evidence that suggests an increase in human mobility during the LNBA, we propose a possible scenario for the early spread of Y. pestis: the pathogen may have entered Europe from Central Eurasia following an expansion of people from the steppe, persisted within Europe until the mid-Bronze Age, and moved back toward Central Eurasia in parallel with human populations.

Maximum-Likelihood Tree and Percent Coverage Plot of Virulence Factors of Yersinia pestis. (A) Maximum-likelihood tree of all Yersinia pestis genomes, including 1,265 SNP positions with complete deletion. Nodes with support ≥95% are marked with an asterisk. The colors represent different branches in the Y. pestis phylogeny: branch 0 (black), branch 1 (red), branch 2 (green), branch 3 (blue), branch 4 (orange), and LNBA Y. pestis branch (purple). Y. pseudotuberculosis-specific SNPs were excluded from the tree for clarity of representation. In the light-colored boxes, discussed losses and gains of genomic regions and genes are indicated.

It seems that, notwithstanding the simplistic (white) arrows of steppe ancestry expansion shown in their map (see below), the actual expansion of Yersinia pestis might have in fact accompanied Yamna migrants from the Pontic-Caspian steppe into Early Bronze Age cultures from the Balkans, including Bell Beaker migrants, as the phylogenetic analysis and dates suggest – and as the potential arrows of the plague expansion in the map (in green) show.

Late Corded Ware migrants would have only later expanded the disease to eastern Europe, as shown in the second map, most likely because of their close contact with Bell Beaker migrants (but remaining culturally distinct from them), and indeed because of the mobility across related Corded Ware cultures up to the Urals.

The cultural-historical community in the Late Neolithic between steppe peoples that would evolve into Uralic-speaking Sredni Stog/Corded Ware migrants in the western steppe, and Late Indo-European-speaking Yamna/SE EBA/Bell Beaker migrants originally from the eastern steppe, would allow for the spread of the disease first among steppe groups, and then from both distinct late groups into their respective expanded regions.

The phylogenetic tree of Y. pestis available right now (see above), however, seems to suggest a stronger initial link to Yamna migrants, i.e. an origin in the North Caspian steppe, and an expansion with Yamna into the north Pontic area, into the Caucasus, and with the Afanasevo culture, spreading later with Balkan EBA cultures and the expansion of Bell Beaker peoples.

Instead of the warring nature, close ties, and mobility of Corded Ware peoples (reasons I used to justify the rapid spread of the disease among CWC groups), I guess the factors that helped expand the disease were rather the higher population density of SE Europe compared to the regions north of the loess belt, and the greater admixture of Yamna migrants with native SE European populations.

Map of Proposed Yersinia pestis Circulation throughout Eurasia (A) Entrance of Y. pestis into Europe from Central Eurasia with the expansion of Yamnaya pastoralists around 4,800 years ago. (B) Circulation of Y. pestis to Southern Siberia from Europe. Only complete genomes are shown.

Nevertheless, lacking more data, it is unclear if the disease expanded with both steppe groups.


Globular Amphora not linked to Pontic steppe migrants – more data against Kristiansen’s Kurgan model of Indo-European expansion


New open access article, Genome diversity in the Neolithic Globular Amphorae culture and the spread of Indo-European languages, by Tassi et al. (2017).


It is unclear whether Indo-European languages in Europe spread from the Pontic steppes in the late Neolithic, or from Anatolia in the Early Neolithic. Under the former hypothesis, people of the Globular Amphorae culture (GAC) would be descended from Eastern ancestors, likely representing the Yamnaya culture. However, nuclear (six individuals typed for 597 573 SNPs) and mitochondrial (11 complete sequences) DNA from the GAC appear closer to those of earlier Neolithic groups than to the DNA of all other populations related to the Pontic steppe migration. Explicit comparisons of alternative demographic models via approximate Bayesian computation confirmed this pattern. These results are not in contrast to Late Neolithic gene flow from the Pontic steppes into Central Europe. However, they add nuance to this model, showing that the eastern affinities of the GAC in the archaeological record reflect cultural influences from other groups from the East, rather than the movement of people.

(a) Principal component analysis on genomic diversity in ancient and modern individuals. (b) K = 3,4 ADMIXTURE analysis based only on ancient variation. (a) Principal component analysis of 777 modern West Eurasian samples with 199 ancient samples. Only transversions considered in the PCA (to avoid confounding effects of post-mortem damage). We represented modern individuals as grey dots, and used coloured and labelled symbols to represent the ancient individuals. (b) Admixture plots at K = 3 and K = 4 of the analysis conducted only considering the ancient individuals. The full plot is shown in electronic supplementary material, figure S7. The ancient populations are sorted by a temporal scale from Pleistocene to Iron Age. The GAC samples of this study are displayed in the box on the right.

Excerpt, from the discussion:

In its classical formulation, the Kurgan hypothesis, i.e. a late Neolithic spread of proto-Indo-European languages from the Pontic steppes, regards the GAC people as largely descended from Late Neolithic ancestors from the East, most likely representing the Yamna culture; these populations then continued their Westward movement, giving rise to the later Corded Ware and Bell Beaker cultures. Gimbutas [23] suggested that the spread of Indo-European languages involved conflict, with eastern populations spreading their languages and customs to previously established European groups, which implies some degree of demographic change in the areas affected by the process. The genomic variation observed in GAC individuals from Kierzkowo, Poland, does not seem to agree with this view. Indeed, at the nuclear level, the GAC people show minor genetic affinities with the other populations related with the Kurgan Hypothesis, including the Yamna. On the contrary, they are similar to Early-Middle Neolithic populations, even geographically distant ones, from Iberia or Sweden. As already found for other Late Neolithic populations [18], in the GAC people’s genome there is a component related to those of much earlier hunting-gathering communities, probably a sign of admixture with them. At the nuclear level, there is a recognizable genealogical continuity from Yamna to Corded Ware. However, the view that the GAC people represented an intermediate phase in this large-scale migration finds no support in bi-dimensional representations of genome diversity (PCA and MDS), ADMIXTURE graphs, or in the set of estimated f3-statistics.
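
NOTE. For readers unfamiliar with the f3-statistics mentioned in the excerpt, here is a minimal sketch of the idea: f3(C; A, B) averages (c − a)(c − b) over SNPs, where a, b and c are allele frequencies in populations A, B and C, and a clearly negative value suggests that C is a mixture of populations related to A and B. The function name and the toy frequencies are hypothetical, and this naive estimator omits the finite-sample bias correction and the standard errors that dedicated software computes, so it is not the authors' actual procedure.

```python
def f3_statistic(target, source1, source2):
    """Naive f3(target; source1, source2) from per-SNP allele frequencies in [0, 1].

    Illustrative assumption: no bias correction, no block jackknife.
    """
    if not target or not (len(target) == len(source1) == len(source2)):
        raise ValueError("need three non-empty frequency lists of equal length")
    products = [(c - a) * (c - b) for c, a, b in zip(target, source1, source2)]
    return sum(products) / len(products)


# Toy (made-up) allele frequencies at three SNPs.
source_a = [0.1, 0.9, 0.2]
source_b = [0.9, 0.1, 0.8]
mixed = [0.5, 0.5, 0.5]      # lies between the two sources at every SNP
unrelated = [0.0, 1.0, 0.0]  # lies outside the range spanned by the sources

f3_mixed = f3_statistic(mixed, source_a, source_b)          # negative
f3_unrelated = f3_statistic(unrelated, source_a, source_b)  # positive
```

In the toy data, the population lying between both sources gives a negative f3, while the one lying outside them gives a positive value.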

Scheme summarizing the five alternative models compared via ABC random forest. We generated by coalescent simulation mtDNA sequences under five models, differing as to the number of migration events considered. The coloured lines represent the ancient samples included in the analysis, namely Unetice (yellow line), Bell Beaker (purple line), Corded Ware (green line) and Globular Amphorae (red line) from Central Europe, Yamnaya (light blue line) and Srubnaya (brown line) from Eastern Europe. The arrows refer to the three waves of migration tested. Model NOMIG was the simplest one, in which the six populations did not have any genetic exchanges; models MIG1, MIG2 and MIG1, 2 differed from NOMIG in that they included the migration events number 1, 2 (from Eastern to Central Europe, respectively before and after the onset of the GAC), or both. Model MIG2, 3 represents a modification of MIG2 model also including a back migration from Central to Eastern Europe after the development of the Corded Ware culture.
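
NOTE. The logic of comparing demographic models via ABC can be sketched with plain rejection sampling. The article used ABC random forest on coalescent simulations of mtDNA sequences; the sketch below swaps in a much simpler rejection scheme with one summary statistic, and the two toy simulators (their names echo the caption, but their Gaussian outputs are entirely made up) are hypothetical stand-ins for real coalescent simulations.

```python
import random


def abc_model_choice(observed, simulators, n_sim=2000, keep=100, seed=0):
    """Rejection ABC for model choice on a single summary statistic.

    simulators maps a model name to a function(rng) returning one simulated
    summary statistic. The posterior probability of each model is
    approximated by its share among the `keep` simulations closest to the
    observed value.
    """
    rng = random.Random(seed)
    draws = []
    for name, simulate in simulators.items():
        for _ in range(n_sim):
            distance = abs(simulate(rng) - observed)
            draws.append((distance, name))
    draws.sort(key=lambda draw: draw[0])
    accepted = [name for _, name in draws[:keep]]
    return {name: accepted.count(name) / keep for name in simulators}


# Hypothetical toy models: expected genetic distance between two groups
# with and without migration between them (values are invented).
toy_simulators = {
    "NOMIG": lambda rng: rng.gauss(0.30, 0.05),
    "MIG": lambda rng: rng.gauss(0.10, 0.05),
}

posterior = abc_model_choice(observed=0.10, simulators=toy_simulators)
```

With an observed distance close to what the migration model produces, nearly all accepted simulations come from that model, which is the sense in which one model is "preferred" over the others.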

Together with Globular Amphora culture samples from Mathieson et al. (2017), this suggests that Kristiansen’s Indo-European Corded Ware Theory is wrong, even in its latest revised models of 2017.

The background shading indicates the three migratory waves proposed by Marija Gimbutas, and personally checked by her in 1995. The symbols refer to the ancient populations considered in the ABC analysis.

On the other hand, the article’s genetic finds have some interesting connections in terms of mtDNA phylogeography, but without a proper archaeological model it is difficult to explain them.

Haplogroup frequencies were obtained for Early Neolithic (EN), Middle Neolithic (MN), Chalcolithic (CA), and Late Neolithic (LN). The color assigned to each haplogroup is represented on the lower right part of each plot. Haplogroup frequencies were plotted geographically using QGIS v2.14.

Text and images from the article under Creative Commons Attribution 4.0 license.

Discovered first via Bernard Sécher’s blog.

See also: