Thus, our data favor a process of Near Eastern animal domestication that is dispersed in space and time, rather than radiating from a central core (3, 11). This resonates with archaeozoological evidence for disparate early management strategies from early Anatolian, Iranian, and Levantine Neolithic sites (12, 13). Interestingly, our finding of divergent goat genomes within the Neolithic echoes genetic investigation of early farmers. Northwestern Anatolian and Iranian human Neolithic genomes are also divergent (14–16), which suggests the sharing of techniques rather than large-scale migrations of populations across Southwest Asia in the period of early domestication. Several crop plants also show evidence of parallel domestication processes in the region (17).
PCA affinity (Fig. 2), supported by qpGraph and outgroup f3 analyses, suggests that modern European goats derive from a source close to the western Neolithic; Far Eastern goats derive from early eastern Neolithic domesticates; and African goats have a contribution from the Levant, but in this case with considerable admixture from the other sources (figs. S11, S16, and S17 and tables S26 and S27). The latter may be in part a result of admixture that is discernible in the same analyses extended to ancient genomes within the Fertile Crescent after the Neolithic (figs. S18 and S19 and tables S20, S27, and S31) when the spread of metallurgy and other developments likely resulted in an expansion of inter-regional trade networks and livestock movement.
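For readers unfamiliar with the outgroup f3 statistic mentioned above, here is a minimal sketch of the idea, not the authors' pipeline. For an outgroup O and two test populations A and B, f3(O; A, B) averages (o - a)(o - b) over SNPs, and larger values indicate more genetic drift shared by A and B. The function name and the toy allele frequencies are invented for illustration, and the sketch omits the sample-size correction and block jackknife that real analyses use.

```python
def outgroup_f3(outgroup, pop_a, pop_b):
    """Mean of (o - a) * (o - b) over SNPs (no finite-sample correction)."""
    if not (len(outgroup) == len(pop_a) == len(pop_b)):
        raise ValueError("all frequency lists must cover the same SNPs")
    terms = [(o - a) * (o - b) for o, a, b in zip(outgroup, pop_a, pop_b)]
    return sum(terms) / len(terms)


# Toy allele frequencies at four SNPs (invented, not real data).
outgroup = [0.0, 1.0, 0.5, 0.0]
source = [0.8, 0.2, 0.5, 0.6]
close_pop = [0.7, 0.3, 0.5, 0.5]    # shares most of its drift with `source`
distant_pop = [0.1, 0.9, 0.4, 0.2]  # stays close to the outgroup

f3_close = outgroup_f3(outgroup, source, close_pop)
f3_distant = outgroup_f3(outgroup, source, distant_pop)
print(f3_close > f3_distant)
```

Ranking candidate sources by this value is what lets a statement like "modern European goats derive from a source close to the western Neolithic" be checked pair by pair.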
Our results imply a domestication process carried out by humans in dispersed, divergent, but communicating communities across the Fertile Crescent who selected animals in early millennia, including for pigmentation, the most visible of domestic traits.
Native Americans from the Amazon, Andes, and coastal geographic regions of South America have a rich cultural heritage but are genetically understudied, therefore leading to gaps in our knowledge of their genomic architecture and demographic history. In this study, we sequence 150 genomes to high coverage combined with an additional 130 genotype array samples from Native American and mestizo populations in Peru. The majority of our samples possess greater than 90% Native American ancestry, which makes this the most extensive Native American sequencing project to date. Demographic modeling reveals that the peopling of Peru began ∼12,000 y ago, consistent with the hypothesis of the rapid peopling of the Americas and Peruvian archeological data. We find that the Native American populations possess distinct ancestral divisions, whereas the mestizo groups were admixtures of multiple Native American communities that occurred before and during the Inca Empire and Spanish rule. In addition, the mestizo communities also show Spanish introgression largely following Peruvian Independence, nearly 300 y after Spain conquered Peru. Further, we estimate migration events between Peruvian populations from all three geographic regions with the majority of between-region migration moving from the high Andes to the low-altitude Amazon and coast. As such, we present a detailed model of the evolutionary dynamics which impacted the genomes of modern-day Peruvians and a Native American ancestry dataset that will serve as a beneficial resource to addressing the underrepresentation of Native American ancestry in sequencing studies.
The high frequency of Native American mitochondrial haplotypes suggests that European males were the primary source of European admixture with Native Americans, as previously found (23, 24, 41, 42). The only Peruvian populations that have a proportion of the Central American component are in the Amazon (Fig. 2A). This is supported by Homburger et al. (4), who also found Central American admixture in other Amazonian populations; this signal could represent ancient shared ancestry or a recent migration between Central America and the Amazon.
Following the peopling of Peru, we find a complex history of admixture between Native American populations from multiple geographic regions (Figs. 2B and 3 A and C). This likely began before the Inca Empire due to Native American and mestizo groups sharing IBD segments that correspond to the time before the Inca Empire. However, the Inca Empire likely influenced this pattern due to their policy of forced migrations, known as “mitma” (mitmay in Quechua) (28, 31, 37), which moved large numbers of individuals to incorporate them into the Inca Empire. We can clearly see the influence of the Inca through IBD sharing where the center of dominance in Peru is in the Andes during the Inca Empire (Fig. 3C).
A similar policy of large-scale consolidation of multiple Native American populations was continued during Spanish rule through their program of reducciones, or reductions (31, 32), which is consistent with the hypothesis that the Inca and Spanish had a profound impact on Peruvian demography (25). The result of these movements of people created early New World cosmopolitan communities with genetic diversity from the Andes, Amazon, and coast regions as is evidenced by mestizo populations’ ancestry proportions (Fig. 3A). Following Peruvian independence, these cosmopolitan populations were those same ones that predominantly admixed with the Spanish (Fig. 3B). Therefore, this supports our model that the Inca Empire and Spanish colonial rule created these diverse populations as a result of admixture between multiple Native American ancestries, which would then go on to become the modern mestizo populations by admixing with the Spanish after Peruvian independence.
Further, it is interesting that this admixture began before the urbanization of Peru (26) because others suspected the urbanization process would greatly impact the ancestry patterns in these urban centers (25). (…)
Animal domestication gives rise to gradual changes at the genomic level through selection in populations. Selective sweeps have been traced in the genomes of many animal species, including humans, cattle, and dogs. However, little is known regarding positional candidate genes and genomic regions that exhibit signatures of selection in domestic horses. In addition, an understanding of the genetic processes underlying horse domestication, especially the origin of Chinese native populations, is still lacking. In our study, we generated whole genome sequences from 4 Chinese native horses and combined them with 48 publicly available full genome sequences, from which 15 341 213 high-quality unique single-nucleotide polymorphism variants were identified. Kazakh and Lichuan horses are 2 typical Asian native breeds that were formed in Kazakh or Northwest China and South China, respectively. We detected 1390 loss-of-function (LoF) variants in protein-coding genes, and gene ontology (GO) enrichment analysis revealed that some LoF-affected genes were overrepresented in GO terms related to the immune response. Bayesian clustering, distance analysis, and principal component analysis demonstrated that the population structure of these breeds largely reflected weak geographic patterns. Kazakh and Lichuan horses were assigned to the same lineage with other Asian native breeds, in agreement with previous studies on the genetic origin of Chinese domestic horses. We applied the composite likelihood ratio method to scan for genomic regions showing signals of recent selection in the horse genome. A total of 1052 genomic windows of 10 kb, corresponding to 933 distinct core regions, significantly exceeded neutral simulations.
The GO enrichment analysis revealed that the genes under selective sweeps were overrepresented with GO terms, including “negative regulation of canonical Wnt signaling pathway,” “muscle contraction,” and “axon guidance.” Frequent exercise training in domestic horses may have resulted in changes in the expression of genes related to metabolism, muscle structure, and the nervous system.
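The abstract does not say how 1052 significant windows become 933 core regions. One plausible reading, which is only an assumption here, is that windows touching or overlapping on the same chromosome are merged into one region. A minimal sketch of that merging step, with invented coordinates and a hypothetical function name:

```python
def merge_windows(windows, size=10_000):
    """Merge significant windows that touch or overlap on one chromosome.

    `windows` is an iterable of (chromosome, start) pairs; each window spans
    [start, start + size). Returns a list of (chromosome, start, end) regions.
    """
    regions = []
    for chrom, start in sorted(windows):
        end = start + size
        if regions and regions[-1][0] == chrom and start <= regions[-1][2]:
            # Extend the previous region instead of opening a new one.
            prev_chrom, prev_start, prev_end = regions[-1]
            regions[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            regions.append((chrom, start, end))
    return regions


# Invented window starts: two adjacent windows on chr1 collapse into one region.
windows = [("chr1", 0), ("chr1", 10_000), ("chr1", 50_000), ("chr2", 10_000)]
regions = merge_windows(windows)
print(len(windows), "windows ->", len(regions), "regions")
```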
Admixture proportions were assessed without user-defined population information to infer the presence of distinct populations among the samples (Figure 2). At K = 3 or K = 4, Franches-Montagnes and Arabian forms one unique cluster; at K = 5, Jeju pony forms one unique cluster. For other breeds, comparatively strong population structure exists among breeds, and they can be assigned to 2 (or 3) alternate clusters from K = 3 to K = 5 including group A (Duelmener, Fjord, Icelandic, Kazakh, Lichuan, and Mongolian) and group B (Hanoverian, Morgan, Quarter, Sorraia, and Standardbred). For group A, this was geographically unexpected, as Nordic breeds (Norwegian Fjord, Icelandic, and Duelmener) clustered with Asian breeds including the Mongolian. Previous results from mitochondrial DNA have revealed links between the Mongolian horse and breeds in Iceland, Scandinavia, Central Europe, and the British Isles. The Mongol horses are believed to have been originally imported from Russia and subsequently to have become the basis for the Norwegian Fjord horse.31 At K = 6, Sorraia forms one unique cluster. The Sorraia horse has no long history as a domestic breed but is considered to be of a nearly ancestral type in the southern part of the Iberian Peninsula.32 However, our result did not support Sorraia as an independent ancestral type based on the results from K = 2 to K = 5, and the unique cluster at K = 6 may be explained by the small population size and recent inbreeding programs. Genetic admixture in Morgan, seen from K = 2 to K = 8, indicates that this breed is currently, or was traditionally, crossed with other breeds. The Morgan horse has been a largely closed breed for 200 years or more but there has been some unreported crossbreeding in recent times.33
Bayesian clustering and PCA demonstrated the relationships among the horse breeds with weak geographic patterns. The tight grouping within most native breeds and looser grouping of individuals in admixed breeds have been reported previously in modern horses using data from a 54K SNP chip.33,34 Cluster analysis reveals that Arabian or Franches-Montagnes forms one unique cluster with a relatively low K value, which is consistent with a former study using a 50K SNP chip.33,34 Interestingly, Standardbred forms a unique cluster with a relatively high K value in this study, unlike in a previous study.33 To date, no footprints are available to describe how the earliest domestic horses spread into China in ancient times. Our study found that Kazakh and Lichuan were assigned to the same lineage as other native Asian breeds, in agreement with previous studies on the origin of Chinese domestic horses.4,5,35,36 The strong genetic relationship between Asian native breeds and European native breeds has made it more difficult to understand the population history of the horse across Eurasia. Low levels of population differentiation observed between breeds might be explained by historical admixture. Unlike for the domestic pig in China,8 we suggest that distinct Northern/Southern groups cannot be used to genetically distinguish native Chinese horse breeds. We consider that during the domestication process of the horse, gene flow continued among Chinese-domesticated horses.
There are large populations of indigenous horse (Equus caballus) in China and some other parts of East Asia. However, their matrilineal genetic diversity and origin remained poorly understood. Using a combination of mitochondrial DNA (mtDNA) and hypervariable region (HVR-1) sequences, we aim to investigate the origin of matrilineal inheritance in these domestic horses.
To investigate patterns of matrilineal inheritance in domestic horses, we conducted a phylogenetic study using 31 de novo mtDNA genomes together with 317 others from GenBank. In terms of the updated phylogeny, a total of 5,180 horse mitochondrial HVR-1 sequences were analyzed.
Eighteen haplogroups (Aw-Rw) were uncovered from the analysis of the whole mitochondrial genomes. Most of these have a divergence time before the earliest domestication of wild horses (about 5,800 years ago) and during the Upper Paleolithic (35–10 KYA). The distribution of some haplogroups shows geographic patterns. The Lw haplogroup contained a significantly higher proportion of European horses than the horses from other regions, while haplogroups Jw, Rw, and some maternal lineages of Cw, have a higher frequency in the horses from East Asia. The 5,180 sequences of horse mitochondrial HVR-1 form nine major haplogroups (A-I). We revealed a corresponding relationship between the haplotypes of HVR-1 and those of whole mitochondrial DNA sequences. The data of the HVR-1 sequences also suggests that Jw, Rw, and some haplotypes of Cw may have originated in East Asia while Lw probably formed in Europe.
Our study supports the hypothesis of the multiple origins of the maternal lineage of domestic horses and suggests that some maternal lineages of domestic horses may have originated from East Asia.
Geographic distributions of horse mtDNA haplogroups
The analysis of geographic distribution of the mitochondrial genome haplogroups showed that horse populations in Europe or East Asia included all haplogroups defined from the mtDNA genome sequences. The lineage Fw consisted entirely of Przewalskii horses. The two haplogroups Iw and Lw displayed frequency peaks in Europe (14.08% and 37.32%, respectively) and a decline to the east (9.33% and 8.00% in West Asia, and 6.45% and 12.90% in East Asia, respectively), especially for Lw, which contained the largest number of European horses (Table 2). However, an opposite distribution pattern was observed for haplogroups Aw, Hw, Jw, and Rw, which were harbored by more horses from East Asia than those from other regions. The proportions of horses from East Asia for the four haplogroups were 38%, 88%, 62%, and 54%, respectively.
Visualization and exploration of high-dimensional data is a ubiquitous challenge across disciplines. Widely used techniques such as principal component analysis (PCA) aim to identify dominant trends in one dataset. However, in many settings we have datasets collected under different conditions, e.g., a treatment and a control experiment, and we are interested in visualizing and exploring patterns that are specific to one dataset. This paper proposes a method, contrastive principal component analysis (cPCA), which identifies low-dimensional structures that are enriched in a dataset relative to comparison data. In a wide variety of experiments, we demonstrate that cPCA with a background dataset enables us to visualize dataset-specific patterns missed by PCA and other standard methods. We further provide a geometric interpretation of cPCA and strong mathematical guarantees. An implementation of cPCA is publicly available, and can be used for exploratory data analysis in many applications where PCA is currently used.
The Mexican example caught my attention:
Relationship between ancestral groups in Mexico
In previous examples, we have seen that cPCA allows the user to discover subclasses within a target dataset that are not labeled a priori. However, even when subclasses are known ahead of time, dimensionality reduction can be a useful way to visualize the relationship within groups. For example, PCA is often used to visualize the relationship between ethnic populations based on genetic variants, because projecting the genetic variants onto two dimensions often produces maps that offer striking visualizations of geographic and historic trends26,27. But again, PCA is limited to identifying the most dominant structure; when this represents universal or uninteresting variation, cPCA can be more effective at visualizing trends.
The dataset that we use for this example consists of single nucleotide polymorphisms (SNPs) from the genomes of individuals from five states in Mexico, collected in a previous study28. Mexican ancestry is challenging to analyze using PCA since the PCs usually do not reflect geographic origin within Mexico; instead, they reflect the proportion of European/Native American heritage of each Mexican individual, which dominates and obscures differences due to geographic origin within Mexico (see Fig. 4a). To overcome this problem, population geneticists manually prune SNPs, removing those known to derive from European ancestry, before applying PCA. However, this procedure is of limited applicability since it requires knowing the origin of the SNPs and requires the source of background variation to be very different from the variation of interest, conditions which are often not met.
As an alternative, we use cPCA with a background dataset that consists of individuals from Mexico and from Europe. This background is dominated by Native American/European variation, allowing us to isolate the intra-Mexican variation in the target dataset. The results of applying cPCA are shown in Fig. 4b. We find that individuals from the same state in Mexico are embedded closer together. Furthermore, the two groups that are the most divergent are the Sonorans and the Mayans from Yucatan, which are also the most geographically distant within Mexico, while Mexicans from the other three states are close to each other, both geographically as well as in the embedding captured by cPCA (see Fig. 4c). See also Supplementary Fig. 6 for more details.
So, by using a background dataset, cPCA discovers patterns in a single target dataset that standard dimensionality reduction techniques do not. Maybe useful for some prehistoric populations, too…
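To make the idea concrete, here is a minimal sketch of contrastive PCA, not the authors' published implementation. It takes the top eigenvectors of C_target - alpha * C_background, where C denotes a covariance matrix. The function name, the hand-picked contrast strength `alpha`, and the simulated data are all assumptions for illustration. In the toy data, feature 0 carries large variation present in both datasets (think of the European/Native American axis), while feature 1 carries a small group difference present only in the target.

```python
import numpy as np


def cpca(target, background, alpha=1.0, n_components=2):
    """Project `target` onto the top eigenvectors of C_target - alpha * C_background."""
    target = np.asarray(target, dtype=float)
    background = np.asarray(background, dtype=float)
    target_centered = target - target.mean(axis=0)
    cov_target = np.cov(target, rowvar=False)
    cov_background = np.cov(background, rowvar=False)
    contrast = cov_target - alpha * cov_background
    # eigh handles symmetric matrices; eigenvalues come back in ascending order.
    eigvals, eigvecs = np.linalg.eigh(contrast)
    order = np.argsort(eigvals)[::-1][:n_components]
    directions = eigvecs[:, order]
    return target_centered @ directions, directions


rng = np.random.default_rng(0)
n = 200
groups = np.repeat([0, 1], n // 2)
shared = rng.normal(0.0, 5.0, size=(n, 1))  # dominant, uninteresting variation
signal = (2.0 * groups - 1.0).reshape(-1, 1) + rng.normal(0.0, 0.3, size=(n, 1))
target = np.hstack([shared, signal])
background = np.hstack([
    rng.normal(0.0, 5.0, size=(n, 1)),  # same dominant variation
    rng.normal(0.0, 0.3, size=(n, 1)),  # no group structure
])

projected, directions = cpca(target, background, alpha=2.0, n_components=1)
print(np.round(np.abs(directions[:, 0]), 2))  # weight on each original feature
```

Plain PCA on `target` would pick feature 0 because it has the largest variance. Subtracting the background covariance removes that axis, so the leading contrastive direction loads almost entirely on feature 1, the one that separates the two groups.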
Computing D-statistics for each individual of the form D(Baltic LN, Yamnaya; X, Mbuti), we find that the two individuals from the early phase of the LN (Plinkaigalis242 and Gyvakarai1, dating to ca. 3200–2600 calBCE) form a clade with Yamnaya (Supplementary Table 7), consistent with the absence of the farmer-associated component in ADMIXTURE (Fig. 2b). Younger individuals share more alleles with Anatolian and European farmers (Supplementary Table 7) as also observed in contemporaneous Central European CWC individuals2.
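As a reminder of what such a test measures, here is a minimal sketch of a D-statistic computed from population allele frequencies, using the standard form D(P1, P2; P3, P4) = sum[(p1 - p2)(p3 - p4)] / sum[(p1 + p2 - 2 p1 p2)(p3 + p4 - 2 p3 p4)]. A value near zero means P1 and P2 form a clade with respect to P3. The function name and frequencies are invented, sign conventions differ between tools, and published analyses add a block jackknife to obtain standard errors and Z-scores, which this sketch omits.

```python
def d_statistic(p1, p2, p3, p4):
    """D(P1, P2; P3, P4) from per-SNP allele frequencies (no standard error)."""
    numerator = 0.0
    denominator = 0.0
    for w, x, y, z in zip(p1, p2, p3, p4):
        numerator += (w - x) * (y - z)
        denominator += (w + x - 2 * w * x) * (y + z - 2 * y * z)
    if denominator == 0:
        raise ValueError("no informative SNPs")
    return numerator / denominator


# Invented frequencies at two SNPs.
test_x = [0.9, 0.1]
outgroup = [0.1, 0.9]

# P1 and P2 identical: they form a clade relative to X, so D is zero.
d_clade = d_statistic([0.5, 0.4], [0.5, 0.4], test_x, outgroup)

# P1 shares alleles with X and P2 does not: D is clearly positive.
d_shared = d_statistic([0.9, 0.1], [0.1, 0.9], test_x, outgroup)
print(d_clade, round(d_shared, 3))
```

In the notation of the text, the early Baltic LN individuals behave like the first case when paired with Yamnaya, while the younger individuals deviate towards farmer populations placed in the X position.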
My interpretation of the Zvejnieki sample ca. 2880 BC (and thus also of the only Baltic LN sample forming a close cluster with it) as ‘outlier’ seems thus reinforced as more samples come in. My explanation based on exogamy is one possibility for the region. After all, great mobility and exogamy practices are universally accepted for the Corded Ware territory, and Yamna migrants had settled up along the Prut precisely around this period (ca. 3100-2900 BC), so this kind of relation between Yamna and Baltic samples is to be expected.
Plinkaigalis 242, >40 year old female (OxA-5936, 4280 ± 75 BP, 3260–2630 calBCE). The burial site is located in the plains of central Lithuania on the eastern bank of the river Šušvė on the outskirts of the Plinkaigalis village, approximately 400 m southeast of an Iron Age hill fort and settlement. The burial site was discovered in 1975 when local residents started digging for gravel in the western part of the hill. The same year the site was granted legal protection, and archaeological excavations were carried out for eight consecutive years (1977-1984). During the eight years of fieldwork a total of 373 graves (364 inhumation and 9 cremation graves) were uncovered, all but two of them dating to the 3rd to 8th c. AD. The two exceptional graves (no. 241, 242) were uncovered in the northern part of the burial site and C14 dated to the Late Neolithic.
Gyvakarai 1, 35-40 year old male (Poz-61584, 4030 ± 30 BP, 2620–2470 calBCE). The burial site is located in the northern part of Lithuania on the steep gravelly bank (elevation up to 79 m a. s. l.) of the rivulet Žvikė, 500 m to the south from where, in the wet grassland valley, it meets the main stem river Pyvesa. The site was discovered in 2000 when local residents started digging for gravel in the central part of the gravelly bank. The same year rescue excavations were conducted in the surrounding area of the highly disturbed grave resulting in discovery of a single grave C14 dated to the Late Neolithic.
EDIT (16 FEB 2018): A commentator noted that Gyvakarai1 was also studied for Yersinia pestis, a pathogen which appears to have expanded first to the west from the steppe, and then to the east, so it is possible that its position in PCA related to Plinkaigalis242 shows a connection to late Yamna settlers or East Bell Beaker migrants.
NOTE: I haven’t had the time and patience to work with my virtual computer on the PCA of these new samples – my CPU is reaching everyday its limit and my fans work half the time – , so I don’t know exactly which of them is Plinkaigalis242 and which Gyvakarai1, I just made a wild guess (based on ADMIXTURE) that the earlier Plinkaigalis242 forms a common ‘outlier’ group with Zvejnieki; if they are reversed or otherwise wrong in the image, please correct me. It will be much appreciated.
If we take the most recent reliable radiocarbon analyses of material culture, and interpretations based on them of Corded Ware as a ‘complex’ similar to Bell Beaker (accepted more and more by disparate academics such as Anthony or Klejn), it seems that the controversial ‘massive’ Corded Ware migration must have begun somewhat later than previously thought, which leaves these early Baltic samples still less clearly part of the initial Corded Ware culture, and more as outliers waiting for a more precise cultural context among Late Neolithic changes in the region.
However, if traditional Uralicists are right in supposing a loose Neolithic community in the Forest Zone, and Kristiansen is right in supposing long-lasting contacts in the Dniester-Dnieper region, we might actually be seeing with these ‘outliers’ the first proof that Neolithic samples from the forest-steppe and Forest Zone of the 4th millennium – unrelated to the Corded Ware culture – clustered closely to Khvalynsk, Sredni Stog, or Yamna samples, which is compatible with Piezonka’s accounts of intercultural contacts.
Acceptance of the results of radiometric dating meant that the concept of the so called ‘A-Horizon’ also had to be reformulated. If we are dealing with such a phase at all, it is not a classic typological period that is defined by a uniform material culture inventory, but rather a set of types which show a wide distribution, but which are always integrated into a locally specific and thus regionally variable context.
The situation resembles that of the Bell Beakers, where a few supra-regional types are associated with local forms of ‘Begleitkeramik’ (i.e. pottery that accompanies Bell Beakers: Strahm 1995; Besse 1996).
The distribution data indicate that this set of forms (namely the A-Beaker, ‘A-Amphora’, and A-Battle Axe, as well as Herringbone-decorated Beakers) was to be found over much of Europe around 2700 BC, and that the currency of these forms was not short: they seem to have been used continuously during the Final Neolithic, perhaps even until 2000 BC (Fig. 3; Furholt 2004). Analysis of the radiometric and dendrochronological determinations also indicates that the A-Horizon is not the earliest Corded Ware phase. Instead, it appears to follow an apparent earlier phase in Poland during which Corded Ware pottery was in use from as early as 2900 BC (Furholt 2003; 2008a; Włodarczak 2006; Ullrich 2008).
Corded Ware and Yamna/Bell Beaker
While widening networks and a change in the mechanism of exchange appear to have contributed to the emergence of the Corded Ware archaeological phenomenon, and also the contemporaneous Yamnaya graves (Harrison & Heyd 2007) and the following Bell Beaker and Early Bronze Age phenomena, it remains to be seen exactly what factors contributed to the development of these systems. It may be that there were changes in subsistence practices, perhaps involving a rising importance of animal herding that subsequently required higher mobility (for a discussion see Dörfler & Müller 2008), but considering the obvious diversity in subsistence patterns present in different Corded Ware groups, such an explanation would seem appropriate for the transformation in some regions, but surely not for the eastern hunter-fisher-gatherer groups of the Baltic (Bläuer & Kantanen 2013). Also, trade in amber and copper might have played its role, but there are so far no indications for a significant rise in quantity or reach of these two materials in connection with Corded Ware graves or settlements (Furholt 2003, 125–7).
The impacts of animal traction and the wagon are also to be taken into account, as they have been present since 3400 BC (Mischka 2011) but do not play any visible role in Corded Ware burial rituals, very much in contrast to the previous periods (Johannsen & Laursen 2010). There is no evidence for horse riding, but the domesticated horse seems to have been present in central Europe since before 3000 BC (Becker 1999) and has also been found in Corded Ware settlements (Becker 2008), but again the evidence of domesticated horses is much more abundant in the period before 3000 BC.
So, concerning amber and copper exchange, or the impact of the wheel and animal traction, there is the recurrent motif of stronger evidence for the period before 3000 BC than during or in connection to Corded Ware finds after 2700 BC.
The evidence strongly points towards a long period of coalescence from 3000 to 2700 BC, when several innovations in burial customs, pottery, and tool types sprang forth from different places and subsequently spread via different networks of exchange and interaction. These surely showed a significant rise in scale, reach, and impact on local practices, but the same is true for the contemporary Globular Amphora and Yamnaya ‘Cultures’. This exchange resulted, roughly speaking, in a phenomenon like the A-Horizon.
Thus, it seems reasonable to explain the wide regional reach of those Corded Ware elements as the result of a general increase in mobility and thus an increase in the spatial extension of regional networks, triggered by the long-term effects of technological innovations and connected economic and social transformations in Europe since 3400 BC. It is the increase in mobility and regional networks that is new to the European Neolithic Societies after this time, and it is not only the Corded Ware elements that are spread through these channels but also Yamnaya, Globular Amphorae, Bell Beaker ‘Cultures’, and copper and bronze artefacts in later periods. Those are archaeological classification units, heuristic tools for the ordering of finds, while brushing over variability and overlapping traits, and so they should not be confused with real social groups.
These differences between closely related regions, in all these cases and especially among steppe cultures, even when they are supported by Archaeology and anthropological models of migration (and compatible with linguistic models), are expected to be minimal.
Fortunately, we have phylogeography, which helps us point in the right direction when assessing potential migrations using genomic data.
User Tomenable recently pointed out a curious finding on Anthrogenica, from data available in Mathieson et al (2017): in ADMIXTURE results with K=12, a different ancestral component (in light green in the paper, see below) is traceable from the North Caspian steppe since the Neolithic. This is also partially distinguishable on K=10 and K=11, although not so clearly differentiating among later cultures.
Interesting is also the appearance of similar ancestral components later in Vučedol – which probably received admixture from Yamna settlers (see admixture components in West Yamna samples and in the Yamna settler from Bulgaria) – , and later still in the Balkans.
On the other hand, previous ancestral components in outliers from the Balkans seem to be more similar to Sredni Stog samples, giving still more strength to the hypothesis that this common (“steppe”) component expanded westward within the Pontic-Caspian steppe with the spread of Suvorovo-Novodanilovka chiefs.
Problems with this interpretation include:
1) The scarce samples available, the different cultures included, and the CV values of the K populations selected in ADMIXTURE.
3) The sample classified as Latvia_LN/CWC has this component. I have already said before that, given the differences with all other Corded Ware samples, this quite early sample might be an outlier, with Khvalynsk/Yamna population connected directly to the ancestors of this individual, possibly through exogamy (as it is clear from my sketch below). Whether or not this is an outlier among CWC populations in the Baltic, only future samples can tell.
4) Three later individuals from Corded Ware in Germany have the component, in a minimal amount. I would bet – judging by their position in the graphic – that this might be explained through the Esperstedt family. These individuals might have in turn got the contribution directly from the oldest member, who shows what seems (in PCA) like a recent admixture from contemporary steppe cultures (such as the Catacomb culture).
Again, needle in a haystack… And confirmation bias by me, indeed.
But interesting nonetheless.
EDIT (4 JAN 2017): A reader points out that the interpretation of Unsupervised ADMIXTURE should work backwards (i.e. different contributions into different modern populations), and not based solely on ancestral populations, which seems probably right. So again, confirmation bias (and potentially wrong direction fallacy) by me…
I already wrote about the concept of outlier in Human Ancestry, so I am not going to repeat myself. This is just an update of “outliers” in recent studies, and their potential origins (here I will repeat some of the examples):
Early Khvalynsk: the three samples from the Samara region have quite different positions in PCA, from nearest to EHG (of Y-DNA haplogroup R1a) to nearest to ANE ancestry (of Y-DNA haplogroup Q). This could represent the initial consequences of the second wave of ANE ancestry – as found later in Yamna samples from a neighbouring region -, possibly brought then by Eurasian migrants related to haplogroup Q.
With only 3 samples, this is obviously just a tentative explanation of the finds. The samples can only be reasonably said to show an unstable time for the region in terms of admixture (i.e. probably migration), judging by the data on PCA.
Ukraine Eneolithic samples offer a curious example of how the concept of outlier can change radically: from the third version (May 30th) of the preprint paper of Mathieson et al. (2017), when the Ukraine Eneolithic sample with steppe ancestry (and clustering with central European samples) was the ‘outlier’, to the fourth version (September 19th), when two samples with steppe ancestry clustering close to Corded Ware samples were now the ‘normal’ ones (i.e. those representing Ukraine Eneolithic population), and the outlier was the one clustering closely with Ukraine Mesolithic samples…
This is one of the funny consequences of the wrong interpretation of the ‘yamnaya component’, that made geneticists believe at first that, out of two samples (!), the ‘outlier’ was the one with ‘yamnaya’ ancestry, because this component would have been brought by an eastern immigrant from early Khvalynsk…
West Yamna (to insist on the same question, the ‘yamnaya’ component): we have only four western Yamna samples, two of them showing Anatolian Neolithic ancestry (one of them, from Ukraine, with a strong ‘southern’ drift). On the other hand, Corded Ware migrants do not show this. So we could infer that their migrations were not coetaneous: whereas peoples of Corded Ware culture expanded ca. 3300 BC to the north – in the natural corridor to the Baltic that has been proposed for this culture in Archaeology for decades (and that is well represented by Ukraine Eneolithic samples) -, peoples of Yamna culture expanded to the west, replacing the Ukraine Eneolithic population (i.e. probably those of ‘Proto-Corded Ware culture’), and eventually mixing with Balkan populations of Anatolian Neolithic ancestry.
Potapovka, Andronovo, and Srubna: while Potapovka clusters close to the steppe, and Andronovo (like Sintashta) clusters close to Corded Ware (i.e. Ukraine Neolithic / Central-East European), both have certain ‘outliers’ in PCA: the former has one individual clustering close to Corded Ware, and the latter one clustering close to the steppe. Both ‘outliers’ fit well with the interpretation of a recent mixture of Corded Ware peoples with steppe populations, and they offer a different image for the evolution of the populations of Potapovka and Sintashta-Petrovka, potentially influencing their language. The position of Srubna samples, nearer to Sintashta and Andronovo (but occupying the same territory as the previous Potapovka), offers the image of a late westward conquest by Corded Ware-related populations.
Iron Age Bulgaria: a sample of haplogroup R1a-Z93, with more ‘yamnaya’ ancestry than any previous sample from the Balkans. For some, it might mean continuity from an older time. However – as with the Corded Ware outlier from Esperstedt before it – it is more likely a recent migrant from the steppe. The most likely origin of this individual is therefore people from the steppe, i.e. either the Srubna culture or a related group. Its relatively close clustering in PCA with certain recent Slavic populations can be interpreted in light of the multiple back and forth migrations in the region: of steppe populations to the west (Srubna, Cimmerians, Scythians, Sarmatians,…), and of Slavic-speaking populations:
And then rapidly expanding as a Proto-Slavic-speaking community from the steppe to the west.
Well-defined outliers are, therefore, essential to understanding a recent history of admixture. On the other hand, the very concept of “outlier” can be a dangerous tool when the lack of enough samples makes their classification as such unjustified, leading to wrong interpretations.
It is unclear whether Indo-European languages in Europe spread from the Pontic steppes in the late Neolithic, or from Anatolia in the Early Neolithic. Under the former hypothesis, people of the Globular Amphorae culture (GAC) would be descended from Eastern ancestors, likely representing the Yamnaya culture. However, nuclear (six individuals typed for 597 573 SNPs) and mitochondrial (11 complete sequences) DNA from the GAC appear closer to those of earlier Neolithic groups than to the DNA of all other populations related to the Pontic steppe migration. Explicit comparisons of alternative demographic models via approximate Bayesian computation confirmed this pattern. These results are not in contrast to Late Neolithic gene flow from the Pontic steppes into Central Europe. However, they add nuance to this model, showing that the eastern affinities of the GAC in the archaeological record reflect cultural influences from other groups from the East, rather than the movement of people.
Excerpt, from the discussion:
In its classical formulation, the Kurgan hypothesis, i.e. a late Neolithic spread of proto-Indo-European languages from the Pontic steppes, regards the GAC people as largely descended from Late Neolithic ancestors from the East, most likely representing the Yamna culture; these populations then continued their Westward movement, giving rise to the later Corded Ware and Bell Beaker cultures. Gimbutas suggested that the spread of Indo-European languages involved conflict, with eastern populations spreading their languages and customs to previously established European groups, which implies some degree of demographic change in the areas affected by the process. The genomic variation observed in GAC individuals from Kierzkowo, Poland, does not seem to agree with this view. Indeed, at the nuclear level, the GAC people show minor genetic affinities with the other populations related with the Kurgan Hypothesis, including the Yamna. On the contrary, they are similar to Early-Middle Neolithic populations, even geographically distant ones, from Iberia or Sweden. As already found for other Late Neolithic populations, in the GAC people’s genome there is a component related to those of much earlier hunting-gathering communities, probably a sign of admixture with them. At the nuclear level, there is a recognizable genealogical continuity from Yamna to Corded Ware. However, the view that the GAC people represented an intermediate phase in this large-scale migration finds no support in bi-dimensional representations of genome diversity (PCA and MDS), ADMIXTURE graphs, or in the set of estimated f3-statistics.
Together with Globular Amphora culture samples from Mathieson et al. (2017), this suggests that Kristiansen’s Indo-European Corded Ware Theory is wrong, even in its latest revised models of 2017.
On the other hand, the article’s genetic finds have some interesting connections in terms of mtDNA phylogeography, but without a proper archaeological model it is difficult to explain them.
Two days ago, in the post announcing the revised version (October 2017) of the Indo-European demic diffusion model, I wrote about dumping the information I had on doing PCA and ADMIXTURE analyses as ‘drafts’, without reviewing them, in the new section of this website called Human Ancestry.
I began to work with free datasets to see if I could learn something more about the results of recent Genetic research by working with the available free software. For the moment, I don’t see the need to continue working with samples myself, because there are many professionals in Bioinformatics doing an excellent job with their publications (much better than I could do), and publishing results early (as pre-prints) and with free licenses, which allow us to reuse and modify their material. To work again with their samples seems most of the time like reinventing the wheel.
After all, my interpretation of Indo-European migrations does not depend on my own analysis of free datasets – or on genetic analysis, or on archaeological fieldwork, for that matter – but on the study of all anthropological questions involved. I am actually more interested in Linguistics, and – only marginally – in Archaeology, as is the field of Indo-European Studies in general.
I did find certain interesting aspects that I have commented on in the model, though: especially by labelling all samples and reading about them carefully (usually in the supplementary notes of the published papers), you can observe certain patterns and derive some information that others might have missed. Such examples include the Corded Ware outlier from Esperstedt (see more on the Corded Ware migration), or the differences in the three samples from early Khvalynsk.
However, if I need to work with datasets again, I will try to complete the drafts as best I can, especially regarding F3 Statistics and qpGraph, which I didn’t even try. If you want to help improve the sections, you are of course welcome.
If I find time, I might be of help with your work. And even though modern genealogy does not interest me (for the moment), I guess it can also be relevant to obtain conclusions on more recent migrations, so if I can be of any help to any interesting work, I will do it too.
I also noticed after publishing the draft that I had used the wording “Corded Ware outlier” at least once. I certainly had that term in mind when developing the third version, but I did not intend to write it down formally. Nevertheless, I think it is the right name to use.
An outlier in Statistics, as you can infer from the name, is a sample (more precisely, an observation) that lies far from the others. It is a slippery concept in Human Evolutionary Biology, because it has no clear definition, and it is thus dependent on a certain degree of subjective evaluation. It seems to be mainly based on a combination of PCA and ADMIXTURE analyses, but should obviously depend on the number of samples available for a certain culture, and on the regional distribution of those samples.
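To make the dependence on sample size concrete, here is a minimal sketch in Python of one possible convention for flagging outliers from PCA coordinates. It is not the procedure used in any of the papers discussed here. The function name `flag_outliers`, the `min_samples` cut-off, and the coordinates in the usage example are all hypothetical, and the 3.5 threshold is simply the customary value for the modified z-score, applied here (as an assumption) to each sample's distance from the group median.

```python
import numpy as np


def flag_outliers(coords, threshold=3.5, min_samples=5):
    """Flag samples lying unusually far from the rest of their group.

    coords: (n_samples, n_components) PCA coordinates (hypothetical input).
    Returns a boolean array, or None when the group is too small to judge.
    """
    coords = np.asarray(coords, dtype=float)
    if coords.shape[0] < min_samples:
        # Assumption: with so few samples, no classification is justified.
        return None
    centre = np.median(coords, axis=0)               # robust group centre
    dist = np.linalg.norm(coords - centre, axis=1)   # distance of each sample
    med = np.median(dist)
    mad = np.median(np.abs(dist - med))              # median absolute deviation
    if mad == 0:
        # Distances are (almost) all identical: nothing can be singled out.
        return np.zeros(len(dist), dtype=bool)
    robust_z = 0.6745 * (dist - med) / mad           # modified z-score
    return robust_z > threshold


# Invented coordinates: five samples close together, one far away.
group = [[0, 0], [0.01, 0], [0, 0.02], [-0.03, 0], [0, -0.04], [0.5, 0.5]]
print(flag_outliers(group))             # only the last sample is flagged
print(flag_outliers([[0, 0], [1, 1]]))  # None: two samples are not enough
```

The choice of `min_samples` is exactly the subjective step described above: with two or three samples per culture, the function declines to call anything an outlier at all.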
We have thus certain clear cases, like the Poltavka outlier, of R1a-M417 lineage, clustering close to Corded Ware (and Sintashta, and Potapovka) samples, but far from other R1b-L23 samples from Poltavka or Yamna cultures, from neighbouring regions in the steppe.
We have also less clear observations, like Balkan Chalcolithic samples, which may or may not have been part of different cultural groups (say, related to the Suvorovo-Novodanilovka expansion, or not), which may justify their differences in ancestral components in ADMIXTURE, and in their position in PCA.
And we have a Yamna sample from western Ukraine, which – unlike the other two available samples – clusters “to the south” of east Yamna samples. Taking into account the Yamna sample from Bulgaria, clustering closely with south-eastern European samples, could you really call this an outlier? Two outliers out of four western Yamna samples? Well, maybe. If you take east and west Yamna from the steppe as a whole, and exclude the Yamna sample from Bulgaria, of course you can. Whether that classification is useful, or actually hinders a proper interpretation of western Yamna samples, and of the “Yamna component” seen in them, is a different story…
But what then about the Corded Ware male from Esperstedt, labelled I0104, dated ca. 2430 BC, which clusters among contemporaneous steppe (Poltavka) samples, and has the greatest proportion of ‘Yamna component’ in ADMIXTURE? After all, it is different in both respects from any other Corded Ware individual – including the oldest samples available, from Latvia (ca. 2885 BC) and Tiefbrunn (ca. 2755 BC).
This sample is one of the direct links between the steppe and Corded Ware in late times, and has been the main reason for the confusion a lot of people seem to have about the “Yamna component” in Corded Ware, with some supporting a direct migration from one into the other, and a few even daring to say that “Corded Ware is indistinguishable from Yamna”(!?).
His family members, all males of haplogroup R1a-M417 (like I0104 and most males from the Corded Ware culture), show a decreased Yamna component a few generations later, which clearly indicates that this individual’s admixture came directly from the steppe, and most likely from one or multiple female ancestors. That is compatible with the nomadic nature of the Corded Ware culture (and its known exogamy practices), which connected central Europe with the steppes, up to the North Caspian region.
If labelling other samples as outliers may be interesting to improve the conclusions one can obtain from genetic research, labelling this sample is, in my opinion, essential, to avoid certain strong misconceptions about the origin of the Corded Ware culture.