The concept of “Outlier” in Human Ancestry (III): Late Neolithic samples from the Baltic region and origins of the Corded Ware culture


I have written before about how the Late Neolithic sample from Zvejnieki seemed to be an outlier among Corded Ware samples (read also the Admixture analysis section on the IEDDM), because of its position in PCA, even more than its admixture components or statistical comparisons might show.

In the recently published version of Mittnik et al. (2018) on Northern European samples, the authors give an evaluation similar to that of the previous preprint (2017):

Computing D-statistics for each individual of the form D(Baltic LN, Yamnaya; X, Mbuti), we find that the two individuals from the early phase of the LN (Plinkaigalis242 and Gyvakarai1, dating to ca. 3200–2600 calBCE) form a clade with Yamnaya (Supplementary Table 7), consistent with the absence of the farmer-associated component in ADMIXTURE (Fig. 2b). Younger individuals share more alleles with Anatolian and European farmers (Supplementary Table 7) as also observed in contemporaneous Central European CWC individuals [2].
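To make the quoted test more concrete, here is a minimal Python sketch of a frequency-based D-statistic D(W, X; Y, Z), using the ratio-of-sums formula described by Patterson et al. (2012). It assumes you already have per-SNP allele frequencies for the four populations; the function name and the toy inputs are hypothetical, not data from Mittnik et al. (2018). A value close to zero for D(Baltic LN, Yamnaya; X, Mbuti) is what "form a clade with Yamnaya" means here. Real analyses (e.g. ADMIXTOOLS' qpDstat) also report a block-jackknife standard error and Z-score, which this sketch omits.

```python
from typing import Sequence


def d_statistic(w: Sequence[float], x: Sequence[float],
                y: Sequence[float], z: Sequence[float]) -> float:
    """Point estimate of D(W, X; Y, Z) from per-SNP allele frequencies.

    Numerator:   sum over SNPs of (w - x) * (y - z)
    Denominator: sum over SNPs of (w + x - 2wx) * (y + z - 2yz)
    D > 0: W shares more alleles with Y (or X with Z); D < 0: the reverse.
    """
    if not (len(w) == len(x) == len(y) == len(z)):
        raise ValueError("all populations must have frequencies for the same SNPs")
    numerator = 0.0
    denominator = 0.0
    for fw, fx, fy, fz in zip(w, x, y, z):
        numerator += (fw - fx) * (fy - fz)
        denominator += (fw + fx - 2 * fw * fx) * (fy + fz - 2 * fy * fz)
    if denominator == 0:
        raise ValueError("no informative SNPs")
    return numerator / denominator


# Hypothetical toy frequencies (not real data): W and X are identical,
# so they form a clade with respect to Y and the outgroup Z, and D = 0.
baltic = [0.2, 0.6, 0.9]
yamnaya = [0.2, 0.6, 0.9]
test_pop = [0.5, 0.1, 0.7]
mbuti = [0.0, 0.3, 0.4]
print(d_statistic(baltic, yamnaya, test_pop, mbuti))  # 0.0
```

The ratio of sums (rather than an average of per-SNP ratios) keeps uninformative SNPs from dominating the estimate.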

Sampling locations and dating of 38 ancient Northern European samples introduced in this study. Chronology based on calibrated radiocarbon dates or relative dating

My interpretation of the Zvejnieki sample (ca. 2880 BC), and thus also of the only Baltic LN sample forming a close cluster with it, as an ‘outlier’ seems reinforced as more samples come in. My explanation based on exogamy is one possibility for the region. After all, high mobility and exogamy practices are universally accepted for the Corded Ware territory, and Yamna migrants had settled up the Prut precisely around this period (ca. 3100-2900 BC), so this kind of relationship between Yamna and Baltic samples is to be expected.

NOTE: Information on the Late Neolithic burial of Zvejnieki is scarce, since it was identified as Late Neolithic only through radiocarbon dating, as an isolated find among Mesolithic burials. You can read more about it in Ilga Zagorska’s studies, such as The use of ochre in Stone Age burials of the East Baltic (2008), The persistent presence of the dead: recent excavations at the hunter-gatherer cemetery at Zvejnieki (Latvia) (Antiquity 2013), or Dietary freshwater reservoir effects and the radiocarbon ages of prehistoric human bones from Zvejnieki, Latvia (J. Archaeol. Sci. 2016).

Samples of Baltic “Late Neolithic / Corded Ware culture”

The only two samples that cluster more closely with Yamna also cluster closely with the three earlier samples from Khvalynsk in Samara (labelled ‘Steppe Eneolithic’ in the paper), which makes one wonder how strongly connected the cultures of the forest and forest-steppe zones were before the expansion of Corded Ware and Yamna settlers.

NOTE: Apart from the scarcity of available samples, which is common in genetic studies, the descriptions of both additional ‘outlier’ samples of the Baltic Late Neolithic (isolated finds dated mainly by radiocarbon analysis) leave a lot to the imagination, because of the lack of cultural context and potential problems with the dating methods:

Plinkaigalis 242, >40 year old female (OxA-5936, 4280 ± 75 BP, 3260–2630 calBCE). The burial site is located in the plains of central Lithuania on the eastern bank of the river Šušvė on the outskirts of the Plinkaigalis village, approximately 400 m southeast of an Iron age hill fort and settlement. The burial site was discovered in 1975 when local residents started digging for gravel in the western part of the hill. The same year site was granted a legal protection with archaeological excavations carried out for eight straight years in a row (1977-1984). During the eight years of fieldwork a total of 373 graves (364 inhumation and 9 cremation graves) with all but two of them dating to 3rd to 8th c. AD were uncovered. The two exceptional graves (no. 241, 242) were uncovered in the northern part of the burial site and C14 dated to the Late Neolithic.

Gyvakarai 1, 35-40 year old male (Poz-61584, 4030 ± 30 BP, 2620–2470 calBCE). The burial site is located in the northern part of Lithuania on the steep gravelly bank (elevation up to 79 m a. s. l.) of the rivulet Žvikė, 500 m to the south from where, in the wet grassland valley, it meets the main stem river Pyvesa. The site was discovered in 2000 when local residents started digging for gravel in the central part of the gravelly bank. The same year rescue excavations were conducted in the surrounding area of the highly disturbed grave resulting in discovery of a single grave C14 dated to the Late Neolithic.

EDIT (16 FEB 2018): A commentator noted that Gyvakarai1 was also studied for Yersinia pestis, the plague bacterium, which appears to have spread first westward from the steppe and then eastward, so its position in PCA relative to Plinkaigalis242 might reflect a connection to late Yamna settlers or East Bell Beaker migrants.

File modified by me from Mittnik et al. (2018) to include the approximate position of the most common ancestral components, and an identification of potential outliers. Zoomed-in version of the European Late Neolithic and Bronze Age samples. “Principal components analysis of 1012 present-day West Eurasians (grey points, modern Baltic populations in dark grey) with 294 projected published ancient and 38 ancient North European samples introduced in this study (marked with a red outline).”

NOTE: I haven’t had the time or patience to run the PCA of these new samples on my virtual machine (my CPU reaches its limit every day and my fans run half the time), so I don’t know exactly which of them is Plinkaigalis242 and which Gyvakarai1. I just made a wild guess (based on ADMIXTURE) that the earlier Plinkaigalis242 forms a common ‘outlier’ group with Zvejnieki; if they are reversed or otherwise wrong in the image, please correct me. It will be much appreciated.

We can see from the additional samples in Mittnik et al. (2018) that the common cluster formed in PCA by most Baltic LN samples (most of them with a clear cultural context among Late Neolithic or Corded Ware material, unlike the two ‘outliers’ and Gyvakarai1) lies among Ukraine Eneolithic samples, European Corded Ware samples, and Mesolithic-Neolithic samples from the Baltic. This is a logical finding in light of the mainstream view that the expansion of the third horizon of the Corded Ware culture seems to have begun in the Dnieper-Dniester region (a corridor of steppe, forest-steppe, and forest zones) ca. 3300 BC.

PCA and ADMIXTURE analysis reflecting three time periods in Northern European prehistory. (a) Principal components analysis of 1012 present-day West Eurasians (grey points, modern Baltic populations in dark grey) with 294 projected published ancient and 38 ancient North European samples introduced in this study (marked with a red outline). Population labels of modern West Eurasians are given in Supplementary Fig. 7 and a zoomed-in version of the European Late Neolithic and Bronze Age samples is provided in Supplementary Fig. 8. (b) Ancestral components in ancient individuals estimated by ADMIXTURE (k = 11).

Corded Ware culture origins

If we take the most recent reliable radiocarbon analyses of material culture, and the interpretations based on them that treat Corded Ware as a ‘complex’ similar to Bell Beaker (increasingly accepted by academics as different as Anthony and Klejn), the controversial ‘massive’ Corded Ware migration seems to have begun somewhat later than previously thought. This makes these early Baltic samples even less clearly part of the initial Corded Ware culture, and more like outliers awaiting a more precise cultural context among the Late Neolithic changes in the region.

Their position in PCA among Khvalynsk (Samara), Baltic Mesolithic, Eastern Hunter-Gatherer, Yamna, and Ukraine Eneolithic samples leaves us without enough information to understand their actual origin.

EDIT (3 FEB 2018): In the first edition of my IEDDM paper I based the potential expansion of the Corded Ware culture mainly on Piezonka’s detailed analyses of the evolution of Mesolithic and Neolithic cultures in the forest-steppe and Forest Zone, and on later phylogeographic finds, since there were no samples from these regions for this period. I revised it in the second edition to accommodate the model to the Indo-Uralic proto-language supported by the Leiden school, and identified it with a close Neolithic-Chalcolithic steppe community, based on common language guesstimates and (after the latest revision of Mathieson et al. 2017) on the appearance of steppe admixture in the steppe.

However, if traditional Uralicists are right in supposing a loose Neolithic community in the Forest Zone, and Kristiansen is right in supposing long-lasting contacts in the Dniester-Dnieper region, these ‘outliers’ might be the first evidence that Neolithic populations of the forest-steppe and Forest Zone in the 4th millennium BC, unrelated to the Corded Ware culture, clustered closely with Khvalynsk, Sredni Stog, or Yamna samples, which is compatible with Piezonka’s accounts of intercultural contacts.

Martin Furholt’s assessment of the origin of the A-horizon of the Corded Ware culture would place the early Late Neolithic dates in the Baltic at the same time as, or just before, the initial expansion of Corded Ware migrants. For example, here are some excerpts (emphasis mine) from Re-evaluating Corded Ware Variability in Late Neolithic Europe (2014), in Proceedings of the Prehistoric Society (freely available online):

Radiocarbon analysis

Acceptance of the results of radiometric dating meant that the concept of the so called ‘A-Horizon’ also had to be reformulated. If we are dealing with such a phase at all, it is not a classic typological period that is defined by a uniform material culture inventory, but rather a set of types which show a wide distribution, but which are always integrated into a locally specific and thus regionally variable context.

The situation resembles that of the Bell Beakers, where a few supra-regional types are associated with local forms of ‘Begleitkeramik’ (i.e. pottery that accompanies Bell Beakers: Strahm 1995; Besse 1996).

The distribution data indicate that this set of forms (namely the A-Beaker, ‘A-Amphora’, and A-Battle Axe, as well as Herringbone-decorated Beakers) was to be found over much of Europe around 2700 BC, and that the currency of these forms was not short: they seem to have been used continuously during the Final Neolithic, perhaps even until 2000 BC (Fig. 3; Furholt 2004). Analysis of the radiometric and dendrochronological determinations also indicates that the A-Horizon is not the earliest Corded Ware phase. Instead, it appears to follow an apparent earlier phase in Poland during which Corded Ware pottery was in use from as early as 2900 BC (Furholt 2003; 2008a; Włodarczak 2006; Ullrich 2008).

Chronological model following from radiocarbon dating. Mark the contrast to the traditional model of the A-horizon as the earliest phase and a successive increase in regional variability later on

Corded Ware and Yamna/Bell Beaker

While widening networks and a change in the mechanism of exchange appears to have contributed to the emergence of the Corded Ware archaeological phenomenon, and also the contemporaneous Yamnaya graves (Harrison & Heyd 2007) and the following Bell Beaker and Early Bronze Age phenomena, it remains to be seen exactly what factors contributed to the development of these systems. It may be that there were changes in subsistence practices, perhaps involving a rising importance of animal herding that subsequently required higher mobility (for a discussion see Dörfler & Müller 2008), but considering the obvious diversity in subsistence patterns present in different Corded Ware groups, such an explanation would seem appropriate for the transformation in some regions, but surely not for the eastern hunter-fisher-gatherer groups of the Baltic (Bläuer & Kantanen 2013). Also, trade with amber and copper might have played its role, but there are so far no indications for a significant rise in quantity or reach of these two materials in connection with Corded Ware graves or settlements (Furholt 2003, 125–7).

The impacts of animal traction and the wagon are also to be taken into account, as they are present since 3400 BC (Mischka 2011) but does at least not play any visible role in Corded Ware burial rituals, very much in contrast to the previous periods (Johannsen & Laursen 2010). There is no evidence for horse riding, but the domesticated horse seems to be present in central Europe since before 3000 BC (Becker 1999) and have also been found in Corded Ware settlements (Becker 2008), but again the evidence of domesticated horses is much more abundant in the period before 3000 BC.

So, concerning amber and copper exchange, or the impact of the wheel and animal traction, there is the recurrent motive of stronger evidence for the period before 3000 BC than during or in connection to Corded Ware finds after 2700 BC.

Summary table for the chronological positions (extent of name plus vertical lines) of the most important traditional archaeological ‘cultures’, ‘Groups’ or pottery styles discussed in this paper. Note that the definitions of those units are far from consistent or comparable, because they derive from different national and regional research traditions. Bold letters indicate a unit connected to the Corded Ware phenomenon


The evidence strongly points towards a long period of coalescence from 3000 to 2700 BC, when several innovations in burial customs, pottery, and tool types sprung forth from different places and subsequently spread via different networks of exchange and interaction. These surely showed a significant rise in scale, reach, and impact on local practices, but the same is true for the contemporary Globular Amphora and Yamnaya ‘Cultures’. This exchange resulted, roughly spoken, in a phenomenon like the A-Horizon.


Thus, it seems reasonable to explain the wide regional reach of those Corded Ware elements as the result of a general increase in mobility and thus an increase in the spatial extension of regional networks, triggered by the long-term effects of technological innovations and connected economic and social transformations in Europe since 3400 BC. It is the increase in mobility and regional networks that is new to the European Neolithic Societies after this time, and it is not only the Corded Ware elements, that are spread through these channels but also Yamnaya, Globular Amphorae, Bell Beaker ‘Cultures’, and copper and bronze artefacts in later periods. Those are archaeological classification units, heuristic tools for the ordering of finds, while brushing over variability and overlapping traits, and so they should not be confused with real social groups.

Network analysis based on the quantitative occurrence of Corded Ware pottery forms, pottery ornamentation styles, tools, weapons and ornaments as stated in Table 1, based on the catalogues given in Table 2, line thickness representing similarity

In summary, there is still much work to be done on the origins and expansion of the Corded Ware culture. Speculative interpretations of recent genetic papers (especially since 2015), based solely on scarce genetic finds, do little to support sound anthropological models when they connect Yamna directly to Corded Ware (and the latter to Bell Beaker), as the multiple new anthropological ‘steppe’ models, and their unending revisions following the gradual correction from ‘Yamnaya’ to ‘steppe’ admixture in genetic papers, are showing.

Featured image, from Furholt’s article: Map of the Corded Ware regions discussed for central Europe. The dark shading indicates those regions where Corded Ware burial rituals are present regularly.


Differences in ADMIXTURE between Khvalynsk/Yamna and Sredni Stog/Corded Ware


Looking for differences among steppe cultures in Genomics is like looking for a needle in a haystack.

It means, after all, looking for differences among closely related cultures, such as between South-Western and North-Western Anatolian Neolithic cultures, or among Old European cultures (such as Vinča or Cucuteni–Trypillia), or between Iberian cultures after the arrival of steppe-related populations.

In all these cases, and especially among steppe cultures, the differences between closely related regions are expected to be minimal, even when they are supported by Archaeology and anthropological models of migration (and compatible with linguistic models).

Fortunately, we have phylogeography, which helps us point in the right direction when assessing potential migrations using genomic data.

User Tomenable recently pointed out a curious finding on Anthrogenica, based on data available in Mathieson et al. (2017): in ADMIXTURE results at K=12, a distinct ancestral component (light green in the paper, see below) is traceable from the North Caspian steppe since the Neolithic. It is also partially distinguishable at K=10 and K=11, although it does not differentiate as clearly among later cultures.

NOTE: Read more on the controversy regarding the ideal number of ancestral populations, the absurd use of ADMIXTURE to solve language questions, and the meaning of cross-validation (CV) values.

Unsupervised ADMIXTURE plot from k=10 to 12, on a dataset consisting of 1099 present-day individuals and 476 ancient individuals. We show newly reported ancient individuals and some previously published individuals for comparison.
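For readers running ADMIXTURE themselves, a common (but not decisive) heuristic is to run it with cross-validation for several values of K and compare the reported CV errors. The minimal Python sketch below assumes log text containing lines of the form `CV error (K=10): 0.4312`, which is how ADMIXTURE reports them when run with its `--cv` option; the helper names are hypothetical, and the numbers in the example are invented for illustration, not taken from Mathieson et al. (2017). A lower CV error does not by itself make a given K the "true" number of ancestral populations, which is part of the controversy mentioned in the note on CV values.

```python
import re
from typing import Dict

# Assumed format of ADMIXTURE's --cv output, e.g. "CV error (K=11): 0.4301"
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9]*\.?[0-9]+)")


def parse_cv_errors(log_text: str) -> Dict[int, float]:
    """Collect {K: CV error} from the concatenated text of ADMIXTURE logs."""
    return {int(k): float(err) for k, err in CV_LINE.findall(log_text)}


def lowest_cv_k(cv_errors: Dict[int, float]) -> int:
    """Return the K with the lowest CV error (a heuristic, not proof)."""
    if not cv_errors:
        raise ValueError("no CV error lines found")
    return min(cv_errors, key=cv_errors.get)


# Invented example values, for illustration only.
example_log = """
CV error (K=10): 0.4312
CV error (K=11): 0.4301
CV error (K=12): 0.4305
"""
errors = parse_cv_errors(example_log)
print(errors)               # {10: 0.4312, 11: 0.4301, 12: 0.4305}
print(lowest_cv_k(errors))  # 11
```

When CV errors for neighbouring K values differ only in the third or fourth decimal, as in this toy example, the choice among them is weakly supported at best.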

Explanations for this finding might include, as the user points out, a greater contribution of CHG ancestry in the eastern steppe cultures (Khvalynsk/Yamna) compared to the North Pontic steppe (Sredni Stog/Corded Ware), which is probably one of the main genomic differences between the two, as I pointed out in the Indo-European demic diffusion model (see accounts on the origins of Khvalynsk and Sredni Stog populations and on contacts between Yamna and the Caucasus, and see below also my sketch of Eurasian genomic history).

Also interesting is the later appearance of similar ancestral components in Vučedol, which probably received admixture from Yamna settlers (see admixture components in West Yamna samples and in the Yamna settler from Bulgaria), and later still in the Balkans.

On the other hand, earlier ancestral components in outliers from the Balkans seem more similar to Sredni Stog samples, which further strengthens the hypothesis that this common (“steppe”) component expanded westward within the Pontic-Caspian steppe with the spread of Suvorovo-Novodanilovka chiefs.

Problems with this interpretation include:

1) The scarce samples available, the different cultures included, and the CV values of the K populations selected in ADMIXTURE.

2) The lack of data for comparison with Bell Beaker peoples (from Olalde et al. 2017).

3) The sample classified as Latvia_LN/CWC has this component. I have said before that, given its differences from all other Corded Ware samples, this quite early sample might be an outlier, with Khvalynsk/Yamna populations connected directly to the ancestors of this individual, possibly through exogamy (as shown in my sketch below). Whether or not it is an outlier among CWC populations in the Baltic, only future samples can tell.

4) Three later Corded Ware individuals from Germany have the component in minimal amounts. Judging by their position in the graphic, I would bet that this can be explained through the Esperstedt family: these individuals might in turn have received the contribution directly from its oldest member, who shows (in PCA) what seems like recent admixture from contemporary steppe cultures (such as the Catacomb culture).

NOTE: See my graphics with interesting members of the Esperstedt family marked: ADMIXTURE and PCA (outlier).

Tentative sketch modelling the genetic history of Europe and West Eurasia from ancient populations up to the Neolithic, according to results in recent genetic papers and archaeological models of known migrations.

Again, needle in a haystack… And confirmation bias by me, indeed.

But interesting nonetheless.

EDIT (4 JAN 2017): A reader points out that unsupervised ADMIXTURE results should be interpreted backwards (i.e. as different contributions to different modern populations), and not based solely on ancestral populations, which is probably right. So, again, confirmation bias (and potentially a wrong-direction fallacy) on my part…


The concept of “Outlier” in Human Ancestry (II): Early Khvalynsk, Sredni Stog, West Yamna, Iron Age Bulgaria, Potapovka, Andronovo…


I have already written about the concept of outlier in Human Ancestry, so I will not repeat myself. This is just an update on “outliers” in recent studies and their potential origins (some of the examples are repeated here):

Early Khvalynsk: the three samples from the Samara region occupy quite different positions in PCA, from nearest to EHG (of Y-DNA haplogroup R1a) to nearest to ANE ancestry (of Y-DNA haplogroup Q). This could represent the initial consequences of the second wave of ANE ancestry (as found later in Yamna samples from a neighbouring region), possibly brought by Eurasian migrants related to haplogroup Q.
With only three samples, this is obviously just a tentative explanation. Judging by the PCA, the samples can only reasonably be said to show an unstable period for the region in terms of admixture (i.e. probably migration).

Ukraine Eneolithic samples offer a curious example of how radically the concept of outlier can change. In the third version (May 30th) of the Mathieson et al. (2017) preprint, the Ukraine Eneolithic sample with steppe ancestry (clustering with central European samples) was the ‘outlier’; in the fourth version (September 19th), the two samples with steppe ancestry clustering close to Corded Ware samples became the ‘normal’ ones (i.e. those representing the Ukraine Eneolithic population), and the outlier was now the one clustering closely with Ukraine Mesolithic samples…

PCA and Admixture for south-eastern Europe. Image modified from Mathieson et al. (2017) – Third revision (May 30th), used in the 2nd edition of the Indo-European demic diffusion model.

This is one of the funny consequences of the wrong interpretation of the ‘yamnaya component’, which made geneticists believe at first that, out of two samples (!), the ‘outlier’ was the one with ‘yamnaya’ ancestry, because this component was supposed to have been brought by an eastern immigrant from early Khvalynsk…

This example offers yet another reason why precise anthropological context is necessary to interpret results correctly. Within the Indo-European demic diffusion model, based mainly on Archaeology and Linguistics, the sample with steppe ancestry was the most logical find in the region for a potential origin of the Corded Ware culture, and it was interpreted as such well before the publication of the fourth version of Mathieson et al. (2017).

PCA of South-East European and other European samples. Image modified from Mathieson et al. (2017) – Fourth revision (September 19th), used in the 3rd edition of the Indo-European demic diffusion model.

West Yamna (to insist on the same question, the ‘yamnaya’ component): we have only four western Yamna samples, two of them showing Anatolian Neolithic ancestry (one of them, from Ukraine, with a strong ‘southern’ drift). Corded Ware migrants, on the other hand, do not show this ancestry. We could therefore infer that their migrations were not contemporaneous: whereas peoples of the Corded Ware culture expanded to the north ca. 3300 BC, along the natural corridor to the Baltic that has been proposed for this culture in Archaeology for decades (and that is well represented by Ukraine Eneolithic samples), peoples of the Yamna culture expanded to the west, replacing the Ukraine Eneolithic population (i.e. probably that of the ‘Proto-Corded Ware culture’), and eventually mixing with Balkan populations of Anatolian Neolithic ancestry.

Potapovka, Andronovo, and Srubna: while Potapovka clusters close to the steppe, and Andronovo (like Sintashta) clusters close to Corded Ware (i.e. Ukraine Neolithic / Central-East European), both have certain ‘outliers’ in PCA: the former has one individual clustering close to Corded Ware, and the latter one close to the steppe. Both ‘outliers’ fit well with the interpretation of a recent mixture of Corded Ware peoples with steppe populations, and they offer a different picture of the evolution of the Potapovka and Sintashta-Petrovka populations, potentially influencing their language. The position of Srubna samples nearer to Sintashta and Andronovo (although Srubna occupied the same territory as the earlier Potapovka culture) suggests a late westward conquest by Corded Ware-related populations.

Diachronic map of migrations ca. 2250-1750 BC

Iron Age Bulgaria: a sample of haplogroup R1a-Z93, with more ‘yamnaya’ ancestry than any earlier sample from the Balkans. For some, it might mean continuity from an older time. However, as with the Corded Ware outlier from Esperstedt before it, it is more likely a recent migrant from the steppe, i.e. from the Srubna culture or a related group. Its relatively close clustering in PCA with certain recent Slavic populations can be interpreted in light of the multiple back-and-forth migrations in the region: of steppe populations to the west (Srubna, Cimmerians, Scythians, Sarmatians,…), and of Slavic-speaking populations:

Diachronic map of Bronze Age migrations ca. 1750-1250 BC.

Well-defined outliers are, therefore, essential to understanding recent histories of admixture. On the other hand, the very concept of “outlier” can be a dangerous tool, leading to wrong interpretations when a lack of samples makes classifying individuals as such unjustified.


Globular Amphora not linked to Pontic steppe migrants – more data against Kristiansen’s Kurgan model of Indo-European expansion


New open access article, Genome diversity in the Neolithic Globular Amphorae culture and the spread of Indo-European languages, by Tassi et al. (2017).


It is unclear whether Indo-European languages in Europe spread from the Pontic steppes in the late Neolithic, or from Anatolia in the Early Neolithic. Under the former hypothesis, people of the Globular Amphorae culture (GAC) would be descended from Eastern ancestors, likely representing the Yamnaya culture. However, nuclear (six individuals typed for 597 573 SNPs) and mitochondrial (11 complete sequences) DNA from the GAC appear closer to those of earlier Neolithic groups than to the DNA of all other populations related to the Pontic steppe migration. Explicit comparisons of alternative demographic models via approximate Bayesian computation confirmed this pattern. These results are not in contrast to Late Neolithic gene flow from the Pontic steppes into Central Europe. However, they add nuance to this model, showing that the eastern affinities of the GAC in the archaeological record reflect cultural influences from other groups from the East, rather than the movement of people.

(a) Principal component analysis on genomic diversity in ancient and modern individuals. (b) K = 3,4 ADMIXTURE analysis based only on ancient variation. (a) Principal component analysis of 777 modern West Eurasian samples with 199 ancient samples. Only transversions considered in the PCA (to avoid confounding effects of post-mortem damage). We represented modern individuals as grey dots, and used coloured and labelled symbols to represent the ancient individuals. (b) Admixture plots at K = 3 and K = 4 of the analysis conducted only considering the ancient individuals. The full plot is shown in electronic supplementary material, figure S7. The ancient populations are sorted by a temporal scale from Pleistocene to Iron Age. The GAC samples of this study are displayed in the box on the right.
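The caption notes that only transversions were used in the PCA, to avoid confounding effects of post-mortem damage. The reason is that the most common ancient-DNA damage, cytosine deamination, shows up as C→T substitutions (G→A on the complementary strand), which are transitions; transversions are not affected by that pattern. A minimal Python sketch of such a filter, assuming each SNP is given as a hypothetical (id, ref, alt) tuple with single-base alleles:

```python
from typing import Iterable, List, Tuple

BASES = frozenset("ACGT")
# Transitions are purine<->purine (A/G) or pyrimidine<->pyrimidine (C/T).
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def is_transversion(ref: str, alt: str) -> bool:
    """True if the ref/alt pair is a transversion (e.g. A/C, G/T)."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES or ref == alt:
        raise ValueError(f"not a biallelic single-base SNP: {ref}/{alt}")
    return frozenset((ref, alt)) not in TRANSITIONS


def keep_transversions(snps: Iterable[Tuple[str, str, str]]) -> List[str]:
    """Return the IDs of SNPs that survive a transversions-only filter."""
    return [snp_id for snp_id, ref, alt in snps if is_transversion(ref, alt)]


# Hypothetical SNP list: rs_a (C/T) and rs_c (G/A) are transitions, which
# deamination damage can mimic, so only rs_b (A/C) and rs_d (G/T) remain.
snps = [("rs_a", "C", "T"), ("rs_b", "A", "C"), ("rs_c", "G", "A"), ("rs_d", "G", "T")]
print(keep_transversions(snps))  # ['rs_b', 'rs_d']
```

The trade-off is a smaller SNP set (roughly a third of typical SNP panels are transversions), in exchange for robustness to damage.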

Excerpt from the discussion:

In its classical formulation, the Kurgan hypothesis, i.e. a late Neolithic spread of proto-Indo-European languages from the Pontic steppes, regards the GAC people as largely descended from Late Neolithic ancestors from the East, most likely representing the Yamna culture; these populations then continued their Westward movement, giving rise to the later Corded Ware and Bell Beaker cultures. Gimbutas [23] suggested that the spread of Indo-European languages involved conflict, with eastern populations spreading their languages and customs to previously established European groups, which implies some degree of demographic change in the areas affected by the process. The genomic variation observed in GAC individuals from Kierzkowo, Poland, does not seem to agree with this view. Indeed, at the nuclear level, the GAC people show minor genetic affinities with the other populations related with the Kurgan Hypothesis, including the Yamna. On the contrary, they are similar to Early-Middle Neolithic populations, even geographically distant ones, from Iberia or Sweden. As already found for other Late Neolithic populations [18], in the GAC people’s genome there is a component related to those of much earlier hunting-gathering communities, probably a sign of admixture with them. At the nuclear level, there is a recognizable genealogical continuity from Yamna to Corded Ware. However, the view that the GAC people represented an intermediate phase in this large-scale migration finds no support in bi-dimensional representations of genome diversity (PCA and MDS), ADMIXTURE graphs, or in the set of estimated f3-statistics.
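The f3-statistics mentioned in the excerpt can be illustrated with a minimal sketch. Here f3(C; A, B) is estimated as the average over SNPs of (c - a)(c - b), where a, b and c are allele frequencies. Used as an admixture test, a significantly negative value suggests that C is a mixture of populations related to A and B; in the "outgroup" form f3(O; A, B), larger values indicate more shared drift between A and B. This is a simplified, hypothetical implementation assuming equal weighting of SNPs: it omits the small-sample heterozygosity correction and the block-jackknife standard errors applied by real tools such as ADMIXTOOLS' qp3Pop, and it is not the procedure used by Tassi et al. (2017).

```python
from typing import Sequence


def f3_statistic(c: Sequence[float], a: Sequence[float], b: Sequence[float]) -> float:
    """Uncorrected f3(C; A, B): mean over SNPs of (c - a) * (c - b)."""
    if not (len(c) == len(a) == len(b)) or len(c) == 0:
        raise ValueError("need the same, non-empty set of SNPs for all populations")
    total = sum((fc - fa) * (fc - fb) for fc, fa, fb in zip(c, a, b))
    return total / len(c)


# Hypothetical toy data: C sits exactly halfway between A and B at every SNP,
# as expected for a 50/50 mixture, so f3 comes out negative.
pop_a = [0.0, 1.0, 0.2]
pop_b = [1.0, 0.0, 0.6]
mixed = [0.5, 0.5, 0.4]
print(f3_statistic(mixed, pop_a, pop_b))
```

A population with no admixture from A- and B-like sources, or with strong drift of its own, tends to give positive values, which is why a non-negative f3 alone cannot rule out admixture.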

Scheme summarizing the five alternative models compared via ABC random forest. We generated by coalescent simulation mtDNA sequences under five models, differing as to the number of migration events considered. The coloured lines represent the ancient samples included in the analysis, namely Unetice (yellow line), Bell Beaker (purple line), Corded Ware (green line) and Globular Amphorae (red line) from Central Europe, Yamnaya (light blue line) and Srubnaya (brown line) from Eastern Europe. The arrows refer to the three waves of migration tested. Model NOMIG was the simplest one, in which the six populations did not have any genetic exchanges; models MIG1, MIG2 and MIG1, 2 differed from NOMIG in that they included the migration events number 1, 2 (from Eastern to Central Europe, respectively before and after the onset of the GAC), or both. Model MIG2, 3 represents a modification of MIG2 model also including a back migration from Central to Eastern Europe after the development of the Corded Ware culture.

Together with Globular Amphora culture samples from Mathieson et al. (2017), this suggests that Kristiansen’s Indo-European Corded Ware Theory is wrong, even in its latest revised models of 2017.

The background shading indicates the three migratory waves proposed by Marija Gimbutas, and personally checked by her in 1995. The symbols refer to the ancient populations considered in the ABC analysis.

On the other hand, the article’s genetic finds have some interesting connections in terms of mtDNA phylogeography, but without a proper archaeological model it is difficult to explain them.

Haplogroup frequencies were obtained for Early Neolithic (EN), Middle Neolithic (MN), Chalcolithic (CA), and Late Neolithic (LN). The color assigned to each haplogroup is represented on the lower right part of each plot. Haplogroup frequencies were plotted geographically using QGIS v2.14.

Text and images from the article under Creative Commons Attribution 4.0 license.

Discovered first via Bernard Sécher’s blog.


Human ancestry: how to work your own PCA, ADMIXTURE analyses for human evolutionary and genealogical studies


Two days ago, in the post announcing the revised version (October 2017) of the Indo-European demic diffusion model, I wrote about dumping the information I had on running PCA and ADMIXTURE analyses, as unreviewed ‘drafts’, into the new section of this website called Human Ancestry.

I had some time today to review them and to correct gross mistakes in the texts, so that they might be more usable now.

I began working with free datasets to see whether I could learn something more about the results of recent genetic research by using the available free software. For the moment, I don’t see the need to continue working with samples myself, because there are many professionals in bioinformatics doing an excellent job with their publications (much better than I could do), publishing results early (as preprints) and under free licenses that allow us to reuse and modify their material. Working again with their samples seems, most of the time, like reinventing the wheel.

After all, my interpretation of Indo-European migrations does not depend on my own analysis of free datasets – or on genetic analysis, or on archaeological fieldwork, for that matter – but on the study of all anthropological questions involved. I am actually more interested in Linguistics, and – only marginally – in Archaeology, as is the field of Indo-European Studies in general.

I did find certain interesting aspects that I have commented on in the model, though: especially by labelling all samples and reading about them carefully (usually in the supplementary notes of the published papers), you can observe certain patterns and derive information that others might have missed. Examples include the Corded Ware outlier from Esperstedt (see more on the Corded Ware migration), or the differences among the three samples from early Khvalynsk.

Now that most published data seem to keep supporting what I have suggested (regarding the more complex nature of the steppe component, the so-called ‘yamnaya component’, as well as the migration from Yamna to Bell Beaker, and the migration of a different population, and probably language, with Corded Ware), I don’t find it worthwhile to spend more of my quite limited time on these tasks.

However, if I need to work with datasets again, I will try to complete the drafts as best I can, especially regarding F3 statistics and qpGraph, which I didn’t even try. If you want to help improve these sections, you are of course welcome to do so.

If I find time, I might be of help with your work. And even though modern genealogy does not interest me (for the moment), I guess it can also be relevant for drawing conclusions about more recent migrations, so if I can be of any help to any interesting work, I will do it too.

3D plot of the Minoans and Mycenaeans + Scythians and Sarmatians datasets, using the same colours as in the Indo-European demic diffusion model.


    The concept of “outlier” in studies of Human Ancestry, and the Corded Ware outlier from Esperstedt


    While writing the third version of the Indo-European demic diffusion model, I noticed that one Corded Ware sample (labelled I0104) clusters quite closely with steppe samples (i.e. Yamna, Afanasevo, and Potapovka). The other Corded Ware samples cluster, as expected, closely with east-central European samples, which include related cultures such as the Swedish Battle Axe, and later Sintashta, or Potapovka (cultures that are from the steppe proper, but are derived from Corded Ware).

    I also noticed after publishing the draft that I had used the wording “Corded Ware outlier” at least once. I certainly had that term in mind when developing the third version, but I did not intend to write it down formally. Nevertheless, I think it is the right name to use.

    PCA of the dataset including Minoans and Mycenaeans, and Scythians and Sarmatians. The graphic has been arranged so that ancestries and samples are located along geographically friendly axes, similar to north-south (Y) and east-west (X). Symbols are used, in a simplified manner, in accordance with the symbols for Y-DNA haplogroups used in the maps. Labels have been used to simplify the identification of important components. Areas are drawn surrounding Yamna, Poltavka, Afanasevo, Corded Ware (including samples from Estonia, Battle Axe, and the Poltavka outlier), and the succeeding Sintashta and Potapovka cultures, as well as Bell Beaker. Corded Ware sample I0104, from Esperstedt, has also been labelled.

    In statistics, an outlier, as you can infer from the name, is a sample (more precisely, an observation) that lies far from the others. It is a slippery concept in human evolutionary biology, because it has no clear definition and thus depends on a certain degree of subjective evaluation. It seems to be based mainly on a combination of PCA and ADMIXTURE analyses, but it should obviously depend on the number of samples available for a certain culture, and on their regional distribution.
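One way to make the notion a little less subjective is to measure how far each sample lies from the rest of its group in PCA space. The sketch below is only an illustration under stated assumptions, not a standard definition: it takes hypothetical PC1/PC2 coordinates for one culture, computes each sample's distance to the centroid of the other members (leave-one-out, so a candidate does not pull the centroid toward itself), and flags samples beyond an arbitrary multiple (factor) of the median distance. With few samples per culture, as noted above, such a rule is fragile.

```python
import math
from statistics import median


def flag_outliers(coords, factor=3.0):
    """coords maps sample IDs of ONE culture to (PC1, PC2) positions.
    Each sample is compared with the centroid of the other members, and
    samples farther than `factor` times the median of these distances
    are flagged. Requires at least two samples."""
    dists = {}
    for sid, (x, y) in coords.items():
        others = [c for s, c in coords.items() if s != sid]
        cx = sum(c[0] for c in others) / len(others)
        cy = sum(c[1] for c in others) / len(others)
        dists[sid] = math.hypot(x - cx, y - cy)
    cutoff = factor * median(dists.values())
    return sorted(s for s, d in dists.items() if d > cutoff)


# Hypothetical PCA positions for five samples of one culture.
coords = {"s1": (0.00, 0.00), "s2": (0.01, 0.00), "s3": (0.00, 0.01),
          "s4": (0.01, 0.01), "s5": (0.10, 0.10)}
print(flag_outliers(coords))  # ['s5']
```

The choice of factor, and of how many PCs to use, is exactly the kind of subjective decision the text describes.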

    We thus have certain clear cases, like the Poltavka outlier, of R1a-M417 lineage, which clusters close to Corded Ware (and Sintashta, and Potapovka) samples, but far from the other, R1b-L23 samples of the Poltavka and Yamna cultures from neighbouring regions of the steppe.

    We also have less clear observations, like the Balkan Chalcolithic samples, which may or may not have belonged to different cultural groups (say, related to the Suvorovo-Novodanilovka expansion, or not); this may explain their differences in ancestral components in ADMIXTURE and in their position in PCA.

    And we have a Yamna sample from western Ukraine, which – unlike the other two available samples – clusters “to the south” of east Yamna samples. Taking into account the Yamna sample from Bulgaria, clustering closely with south-eastern European samples, could you really call this an outlier? Two outliers out of four western Yamna samples? Well, maybe. If you take east and west Yamna from the steppe as a whole, and exclude the Yamna sample from Bulgaria, of course you can. Whether that classification is useful, or actually hinders a proper interpretation of western Yamna samples, and of the “Yamna component” seen in them, is a different story…

    PCA for European samples of Mathieson et al. (2017)

    But what then about the Corded Ware male from Esperstedt, labelled I0104, dated ca. 2430 BC, which clusters among contemporaneous steppe (Poltavka) samples, and has the greatest proportion of ‘Yamna component’ in ADMIXTURE? After all, it is different in both respects from any other Corded Ware individual – including the oldest samples available, from Latvia (ca. 2885 BC) and Tiefbrunn (ca. 2755 BC).

    This sample is one of the direct links between the steppe and Corded Ware in late times, and has been the main reason for the confusion a lot of people seem to have about the “Yamna component” in Corded Ware, with some supporting a direct migration from one into the other, and a few even daring to say that “Corded Ware is indistinguishable from Yamna”(!?).

    His family members, all males of haplogroup R1a-M417 (like I0104 and most males of the Corded Ware culture), show a decreased Yamna component a few generations later, which clearly indicates that this individual’s admixture came directly from the steppe, most likely from one or more female ancestors. That is compatible with the nomadic nature of the Corded Ware culture (and its known exogamy practices), which connected central Europe with the steppes, up to the North Caspian region.
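The dilution argument can be put in numbers. A minimal sketch, assuming each generation along the paternal line takes a partner with a fixed, lower share of steppe ancestry, and that a child inherits on average half of each parent's autosomal ancestry; under these assumptions the excess over the partners' level halves every generation. The figures are hypothetical, not estimates for I0104 or his relatives, and real individuals deviate from these expectations because of the randomness of recombination.

```python
def expected_ancestry(p0, q_partner, generations):
    """Expected autosomal steppe-like ancestry along one line of descent.

    p0: ancestry of the starting individual.
    q_partner: ancestry of each generation's partner (held constant).
    Each child gets on average half of each parent's ancestry, so
    p[g+1] = (p[g] + q_partner) / 2 and p[g] - q = (p0 - q) / 2**g.
    """
    values = [p0]
    for _ in range(generations):
        values.append((values[-1] + q_partner) / 2)
    return values


# Hypothetical figures: a 0.75 individual, partners at 0.25.
print(expected_ancestry(0.75, 0.25, 3))  # [0.75, 0.5, 0.375, 0.3125]
```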

    If labelling other samples as outliers may help improve the conclusions one can draw from genetic research, labelling this sample is, in my opinion, essential to avoid certain strong misconceptions about the origin of the Corded Ware culture.


    Indo-European demic diffusion model, 3rd edition


    I have just uploaded the working draft of the third version of the Indo-European demic diffusion model. Unlike the previous two versions, which were published as essays (fully developed papers), this new version adds more information on human admixture, and probably needs important corrections before a definitive edition can be published.

    The third version is available now on ResearchGate, and I will post the PDF at Academia Prisca as soon as possible.

    Map overlaid by PCA including Yamna, Corded Ware, Bell Beaker, and other samples

    Feel free to comment on the paper here, or (preferably) in our forum.

    A working version (needing some corrections), divided into sections and illustrated with up-to-date, high-resolution maps, can be found (as always) at the official collaborative Wiki website.

    Palaeogenomic and biostatistical analysis of ancient DNA data from Mesolithic and Neolithic skeletal remains


    PhD Thesis Palaeogenomic and biostatistical analysis of ancient DNA data from Mesolithic and Neolithic skeletal remains, by Zuzana Hofmanova (2017) at the University of Mainz.

    Palaeogenomic data have illuminated several important periods of the human past with surprising implications for our understanding of human evolution. One of the major changes in human prehistory was Neolithisation, the introduction of the farming lifestyle to human societies. Farming originated in the Fertile Crescent approximately 10,000 years BC and in Europe it was associated with a major population turnover. Ancient DNA from Anatolia, the presumed source area of the demic spread to Europe, and the Balkans, one of the first known contact zones between local hunter-gatherers and incoming farmers, was obtained from roughly contemporaneous human remains dated to ∼6th millennium BC. This new, unprecedented dataset comprised 86 full mitogenomes, five whole genomes (7.1–3.7x coverage) and 20 high coverage (7.6–93.8x) genomic samples. The Aegean Neolithic population, relatively homogeneous on both sides of the Aegean Sea, was positively proven to be a core zone for demic spread of farmers to Europe. The farmers were shown to migrate through the central Balkans and while the local sedentary hunter-gatherers of Vlasac in the Danube Gorges seemed to be isolated from the farmers coming from the south, the individuals of the Aegean origin infiltrated the nearby hunter-gatherer community of Lepenski Vir. The intensity of infiltration increased over time and even though there was an impact of the Danubian hunter-gatherers on genetic variation of Neolithic central Europe, the Aegean ancestry dominated during the introduction of farming to the continent.

    Considering only the admixture analyses that include Yamna samples:

    This increased genetic affinity of Neolithic farmers to Danubians was observed for Neolithic Hungarians, LBK from central Europe and the LBK Stuttgart sample. Some post-Neolithic samples also proved to share more drift with Danubians: again samples from Hungary (Bronze Age and Copper Age samples), and also Yamnaya and samples with elevated Yamnaya ancestry (Early Bronze Age samples from Únětice, Bell Beaker samples, the Late Neolithic Karlsdorf sample and Corded Ware samples).


    The results of our ADMIXTURE analysis for the dataset including also Yamnaya samples are shown in Figure S1c. The cross-validation error was the lowest for K=2. Supervised and unsupervised analyses for K=3 are again highly concordant. Early Neolithic farmers again demonstrate almost no evidence of hunter-gatherer admixture, while it is observable in the Middle Neolithic farmers. However, much of the Late Neolithic hunter-gatherer ancestry from the previous analysis is replaced by Yamnaya ancestry. These results are consistent with the results of Haak et al., who demonstrated a resurgence of hunter-gatherer ancestry followed by the establishment of Eastern hunter-gatherer ancestry.
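ADMIXTURE reports cross-validation errors when run with its --cv option, as lines of the form "CV error (K=2): ...", and the K with the lowest error is usually the preferred one. A minimal sketch that picks it from a log excerpt; the error values below are invented, not those of the thesis:

```python
import re

# ADMIXTURE run with --cv prints one such line per K.
CV_LINE = re.compile(r"CV error \(K=(\d+)\): ([0-9.]+)")


def best_k(log_text):
    """Return (K, CV error) for the lowest cross-validation error found."""
    errors = {int(k): float(e) for k, e in CV_LINE.findall(log_text)}
    if not errors:
        raise ValueError("no 'CV error (K=...)' lines found")
    k = min(errors, key=errors.get)
    return k, errors[k]


# Invented log excerpt, for illustration only.
log = """CV error (K=2): 0.4812
CV error (K=3): 0.4830
CV error (K=4): 0.4851"""
print(best_k(log))  # (2, 0.4812)
```

The lowest CV error is a guide rather than a rule: the thesis still discusses K=3 and K=8, because higher K can separate components that matter for interpretation.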

    Again, admixture results show that something in the simplistic Yamna -> Corded Ware model is off. It is still interesting to review admixture results of European Mesolithic and Late Neolithic genomic data in relation to the so-called steppe or yamna ancestry or component (most likely an eastern steppe / forest zone ancestry, which was probably also present in the earlier Corded Ware horizons) and its interpretation…

    Image composed by me from two different images of the PhD thesis. Left: supervised run of ADMIXTURE. The clusters to be supervised were chosen to best fit the presumed ancestral populations (Motala for hunter-gatherers, Bar8 and Bar31 for farmers, and Yamnaya for the later eastern migration). Right: unsupervised run of ADMIXTURE for the Anatolian genomic dataset with Yamnaya samples, for K=8.
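In a supervised run, reference individuals are assigned to fixed clusters, which anchors each cluster's allele frequencies, and the ancestry proportions of the remaining individuals are then estimated. A minimal sketch of that estimation step for one individual, treating the reference allele frequencies as known and fixed (a simplification: ADMIXTURE also updates the frequencies and uses a different, faster optimizer). It uses a plain EM algorithm on a binomial genotype model where, at each SNP, p = Σ q_k f_k. Cluster names, frequencies and proportions are simulated and hypothetical, and the unsupervised K=8 run, where frequencies are estimated too, is not covered.

```python
import random


def supervised_q(genotypes, ref_freqs, iterations=100):
    """EM estimate of one individual's ancestry proportions q.

    genotypes: counts (0, 1 or 2) of the counted allele at each SNP.
    ref_freqs: {cluster: [frequency of that allele at each SNP]}, treated
               as known and fixed (frequencies must lie strictly in (0, 1)).
    Model: each allele copy comes from cluster k with probability q[k],
    so P(counted allele) = p = sum_k q[k] * f[k].
    """
    pops = list(ref_freqs)
    q = {k: 1 / len(pops) for k in pops}  # uniform starting point
    n = len(genotypes)
    for _ in range(iterations):
        acc = {k: 0.0 for k in pops}
        for j, g in enumerate(genotypes):
            p = sum(q[k] * ref_freqs[k][j] for k in pops)
            for k in pops:
                f = ref_freqs[k][j]
                # E-step: expected number of this SNP's two copies from k
                acc[k] += g * q[k] * f / p + (2 - g) * q[k] * (1 - f) / (1 - p)
        # M-step: average over all 2n allele copies
        q = {k: acc[k] / (2 * n) for k in pops}
    return q


# Simulated, hypothetical data: three reference clusters, 2000 SNPs.
rng = random.Random(42)
pops = ["HG", "Farmer", "Yamnaya"]
ref = {k: [rng.uniform(0.05, 0.95) for _ in range(2000)] for k in pops}
true_q = {"HG": 0.2, "Farmer": 0.5, "Yamnaya": 0.3}
genos = []
for j in range(2000):
    p = sum(true_q[k] * ref[k][j] for k in pops)
    genos.append((rng.random() < p) + (rng.random() < p))
est = supervised_q(genos, ref)
print({k: round(v, 2) for k, v in est.items()})  # should land near true_q
```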

    Discovered via Généalogie génétique

    Two more studies on the genetic history of East Asia: Han Chinese and Thailand


    A comprehensive map of genetic variation in the world’s largest ethnic group – Han Chinese, by Charleston et al. (2017).

    It is believed – based on uniparental markers from modern and ancient DNA samples and array-based genome-wide data – that Han Chinese originated in the Central Plain region of China during prehistoric times, expanding with agriculture and technology northward and southward, to become the largest Chinese ethnic group.


    As are most non-European populations around the globe, the Han Chinese are relatively understudied in population and medical genetics studies. From low-coverage whole-genome sequencing of 11,670 Han Chinese women we present a catalog of 25,057,223 variants, including 548,401 novel variants that are seen at least 10 times in our dataset. Individuals from our study come from 19 out of 22 provinces across China, allowing us to study population structure, genetic ancestry, and local adaptation in Han Chinese. We identify previously unrecognized population structure along the East-West axis of China and report unique signals of admixture across geographical space, such as European influences among the Northwestern provinces of China. Finally, we identified a number of highly differentiated loci, indicative of local adaptation in the Han Chinese. In particular, we detected extreme differentiation among the Han Chinese at MTHFR, ADH7, and FADS loci, suggesting that these loci may not be specifically selected in Tibetan and Inuit populations as previously suggested. On the other hand, we find that Neandertal ancestry does not vary significantly across the provinces, consistent with admixture prior to the dispersal of modern Han Chinese. Furthermore, contrary to a previous report, Neandertal ancestry does not explain a significant amount of heritability in depression. Our findings provide the largest genetic data set so far made available for Han Chinese and provide insights into the history and population structure of the world’s largest ethnic group.

    Using Shanghai individuals as representatives, shared drift between Chinese and ancient humans is computed with outgroup f3 statistics of the form f3(Mbuti; X, Y), with ancient individuals separated into approximately Palaeolithic, Mesolithic, Neolithic, and Chalcolithic-Medieval times. Modern Chinese individuals are found to share more drift with pre-Neolithic hunter-gatherers than with Neolithic farmers (featured image from the article).
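The outgroup f3 statistic can be sketched directly from allele frequencies: f3(O; X, Y) averages (o − x)(o − y) over SNPs, so it grows with the drift that X and Y share after splitting from the outgroup O (Mbuti here). A minimal sketch with made-up frequencies; it omits the small-sample bias correction and the block-jackknife standard errors that dedicated tools such as ADMIXTOOLS' qp3Pop provide.

```python
def outgroup_f3(o, x, y):
    """Naive outgroup f3(O; X, Y): mean over SNPs of (o - x) * (o - y),
    from allele frequencies in outgroup O and populations X and Y.
    Higher values = more drift shared by X and Y after splitting from O.
    No small-sample correction or standard error is computed."""
    if not len(o) == len(x) == len(y) > 0:
        raise ValueError("need equal-length, non-empty frequency lists")
    return sum((a - b) * (a - c) for a, b, c in zip(o, x, y)) / len(o)


# Made-up frequencies at four SNPs.
mbuti = [0.5, 0.5, 0.5, 0.5]
target = [0.75, 0.25, 0.75, 0.5]
ancient_a = [0.75, 0.25, 0.5, 0.5]
ancient_b = [0.5, 0.5, 0.75, 0.5]
print(outgroup_f3(mbuti, target, ancient_a))  # 0.03125
print(outgroup_f3(mbuti, target, ancient_b))  # 0.015625
```

In this toy case the target shares more drift with ancient_a than with ancient_b, which is the kind of comparison the article makes between modern Chinese and hunter-gatherers versus farmers.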

    EDIT (17/7/2017): Davidski at Eurogenes shares an interesting view on these kinds of results:

    These sorts of estimates always look way off. And I doubt that it’s largely the result of the Silk Road, which linked China to the Near East and Mediterranean rather than to Northern Europe. More likely it reflects gene flow from the Pontic-Caspian steppe in Eastern Europe during the Bronze and Iron ages, via the Afanasievo, Andronovo, and other closely related steppe peoples.

    New insights from Thailand into the maternal genetic history of Mainland Southeast Asia, by Kutanan et al. (2017)


    Tai-Kadai (TK) is one of the major language families in Mainland Southeast Asia (MSEA), with a concentration in the area of Thailand and Laos. Our previous study of 1,234 mtDNA genome sequences supported a demic diffusion scenario in the spread of TK languages from southern China to Laos as well as northern and northeastern Thailand. Here we add an additional 560 mtDNA sequences from 22 groups, with a focus on the TK-speaking central Thai people and the Sino-Tibetan speaking Karen. We find extensive diversity, including 62 haplogroups not reported previously from this region. Demic diffusion is still a preferable scenario for central Thais, emphasizing the extension and expansion of TK people through MSEA, although there is also some support for an admixture model. We also tested competing models concerning the genetic relationships of groups from the major MSEA languages, and found support for an ancestral relationship of TK and Austronesian-speaking groups.

    Effective migration in Western Eurasia reveals fine-scale migration surface features


    Interesting poster from SMBE 2017, Maps of effective migration as a summary of global human genetic diversity, by Benjamin Peter, Desislava Petkova, Matthew Stephens & John Novembre, of the JNPopGen group of the University of Chicago.

    You can read the full poster in the original PDF, or as a compressed image. The following are important excerpts:

    Aim: To answer the following questions:

    • Which regions have high/low effective migration?
    • How well is human genetic diversity explained by this pure isolation-by-distance model?
    • How does the explanatory performance of EEMS compare to PCA?

    Method: It uses the method proposed by Petkova et al. (2016) to fit a map of time-averaged (effective) migration rates to geographically referenced samples, and merges data from 24 different studies (8740 individuals from 469 populations) to assess human genetic diversity on global and continental scales.

    1. Basic workflow:
      • Merge data, remove duplicated & related individuals.
      • Remove hunter-gatherer and recently admixed populations. Their locations are still indicated with (H) and (X), respectively.
    2. EEMS analysis
      • Calculate genetic distance matrix between all individuals.
      • Fit migration map to data using EEMS MCMC algorithm.
    3. Comparison to PCA: standard PCA was run using flashpca (Abraham & Inouye 2014), and the authors compare how well the genetic distances induced by the first ten PCs and the fitted EEMS distances correlate with the observed genetic distances.
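Steps 2 and 3 can be illustrated on toy data. A minimal sketch, assuming genotypes coded 0/1/2 and the mean squared genotype difference as the pairwise genetic distance (a common choice; the exact dissimilarity used by EEMS may differ), which correlates those distances with Euclidean distances over the first ten PCs. The EEMS fit itself is not modelled here; the same correlation step would apply to its fitted distances. Individuals and PC coordinates are hypothetical.

```python
import math
from itertools import combinations


def genotype_distance(g1, g2):
    """Mean squared difference between two 0/1/2 genotype vectors."""
    return sum((a - b) ** 2 for a, b in zip(g1, g2)) / len(g1)


def pc_distance(p1, p2, n_pcs=10):
    """Euclidean distance over (at most) the first n_pcs components."""
    return math.dist(p1[:n_pcs], p2[:n_pcs])


def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)


def compare(genotypes, pcs):
    """Correlate pairwise genetic distances with pairwise PC distances."""
    pairs = list(combinations(range(len(genotypes)), 2))
    gd = [genotype_distance(genotypes[i], genotypes[j]) for i, j in pairs]
    pd = [pc_distance(pcs[i], pcs[j]) for i, j in pairs]
    return pearson(gd, pd)


# Hypothetical individuals: genotypes at 4 SNPs and their first 2 PCs.
genotypes = [[0, 0, 1, 2], [0, 1, 1, 2], [2, 2, 1, 0], [2, 1, 1, 0]]
pcs = [[-1.0, 0.1], [-0.8, 0.0], [1.0, 0.1], [0.9, -0.1]]
print(round(compare(genotypes, pcs), 3))
```

math.dist requires Python 3.8 or later.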

    Interpretation: a continuous habitat is approximated by a discrete grid (light gray). A Bayesian model is used to infer the most likely migration rates, which are given on a log scale relative to the average (blue = 100x higher, brown = 100x lower).
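Reading the colour scale is a matter of logarithms: on a log10 scale, +2 means 100 times the average rate and −2 means 100 times lower. A minimal sketch, assuming the average is taken on the log scale (that is, rates are centred on their geometric mean, which appears to be how EEMS parameterizes rates; the poster does not spell this out). Rates are in arbitrary units.

```python
import math


def relative_log10(rates):
    """Migration rates on a log10 scale, centred on their mean log10
    value (i.e. the geometric mean): +2 = 100x higher, -2 = 100x lower."""
    logs = [math.log10(m) for m in rates]
    mean_log = sum(logs) / len(logs)
    return [round(v - mean_log, 6) for v in logs]


print(relative_log10([1.0, 100.0, 10000.0]))  # [-2.0, 0.0, 2.0]
```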

    Map of effective migrations in Europe

    Results (see maps):

    1. Global diversity patterns correlate with topographical features
    2. In Western Eurasia, EEMS reveals fine-scale migration surface features

    Discussion: EEMS maps are an intuitive and direct way to visualize geographically referenced genetic data.

    Dense sampling (Western Eurasian panel) in particular yields high resolution and accuracy, but the method works well both at a global scale (FST = 0.06) and within Western Eurasia alone (FST = 0.01).

    EEMS maps are able to predict genetic differences reasonably well, but hunter-gatherer and admixed populations were excluded a priori.

    Discovered via Eurogenes. Full image via Reddit.