Differences in ADMIXTURE between Khvalynsk/Yamna and Sredni Stog/Corded Ware


Looking for differences among steppe cultures in Genomics is like looking for a needle in a haystack.

It means, after all, looking for differences among closely related cultures, such as between South-Western and North-Western Anatolian Neolithic cultures, or among Old European cultures (such as Vinča or Cucuteni–Trypillia), or between Iberian cultures after the arrival of steppe-related populations.

These differences between closely related regions, in all these cases and especially among steppe cultures, even when they are supported by Archaeology and anthropological models of migration (and compatible with linguistic models), are expected to be minimal.

Fortunately, we have phylogeography, which helps us point in the right direction when assessing potential migrations using genomic data.

User Tomenable recently pointed out a curious finding on Anthrogenica, from data available in Mathieson et al. (2017): in ADMIXTURE results with K=12, a different ancestral component (in light green in the paper, see below) is traceable from the North Caspian steppe since the Neolithic. This is also partially distinguishable at K=10 and K=11, although there it differentiates less clearly among later cultures.

NOTE: Read more on the controversy regarding the ideal number of ancestral populations, the absurd use of ADMIXTURE to solve language questions, and the meaning of cross-validation (CV) values.
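For readers unfamiliar with how CV values are used when choosing K, here is a minimal Python sketch. It assumes log lines in the "CV error (K=n): value" form that ADMIXTURE prints when cross-validation is enabled; the error values are invented for illustration and are not taken from Mathieson et al. (2017), and the helper names are hypothetical.

```python
import re

# Hypothetical log lines; the numeric values are invented for illustration only.
log_lines = [
    "CV error (K=10): 0.3421",
    "CV error (K=11): 0.3418",
    "CV error (K=12): 0.3425",
]

# Matches lines such as "CV error (K=12): 0.3425"
CV_PATTERN = re.compile(r"CV error \(K=(\d+)\): ([0-9.]+)")


def parse_cv_errors(lines):
    """Return a {K: cv_error} mapping from ADMIXTURE-style log lines."""
    errors = {}
    for line in lines:
        match = CV_PATTERN.search(line)
        if match:
            errors[int(match.group(1))] = float(match.group(2))
    return errors


def best_k(errors):
    """Return the K with the lowest cross-validation error."""
    return min(errors, key=errors.get)


cv_errors = parse_cv_errors(log_lines)
print(best_k(cv_errors))
```

The lowest CV error only suggests which K fits the data best in a predictive sense; as the note above says, it does not by itself make the components at that K historically meaningful.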

Unsupervised ADMIXTURE plot from k=10 to 12, on a dataset consisting of 1099 present-day individuals and 476 ancient individuals. We show newly reported ancient individuals and some previously published individuals for comparison.

Explanations for this finding might include, as the user points out, a greater contribution of CHG ancestry in the eastern steppe cultures (Khvalynsk/Yamna) compared to the North Pontic steppe (Sredni Stog/Corded Ware), which is probably one of the main genomic differences between the two groups, as I pointed out in the Indo-European demic diffusion model (see accounts on the origins of Khvalynsk and Sredni Stog populations and on contacts between Yamna and the Caucasus, and see below also my sketch of Eurasian genomic history).

Also interesting is the appearance of similar ancestral components later in Vučedol, which probably received admixture from Yamna settlers (see admixture components in West Yamna samples and in the Yamna settler from Bulgaria), and later still in the Balkans.

On the other hand, previous ancestral components in outliers from the Balkans seem to be more similar to Sredni Stog samples, giving still more strength to the hypothesis that this common (“steppe”) component expanded westward within the Pontic-Caspian steppe with the spread of Suvorovo-Novodanilovka chiefs.

Problems with this interpretation include:

1) The scarce samples available, the different cultures included, and the CV values of the K populations selected in ADMIXTURE.

2) The lack of data for comparison with Bell Beaker peoples (from Olalde et al. 2017).

3) The sample classified as Latvia_LN/CWC has this component. I have said before that, given the differences with all other Corded Ware samples, this quite early sample might be an outlier, with a Khvalynsk/Yamna population connected directly to the ancestors of this individual, possibly through exogamy (as is clear from my sketch below). Whether or not this is an outlier among CWC populations in the Baltic, only future samples can tell.

4) Three later individuals from Corded Ware in Germany have the component, in a minimal amount. I would bet – judging by their position in the graphic – that this might be explained through the Esperstedt family. These individuals might have in turn got the contribution directly from the oldest member, who shows what seems (in PCA) like a recent admixture from contemporary steppe cultures (such as the Catacomb culture).

NOTE: See my graphics with interesting members of the Esperstedt family marked: ADMIXTURE and PCA (outlier).

Tentative sketch modelling the genetic history of Europe and West Eurasia from ancient populations up to the Neolithic, according to results in recent genetic papers and archaeological models of known migrations.

Again, needle in a haystack… And confirmation bias by me, indeed.

But interesting nonetheless.

EDIT (4 JAN 2017): A reader points out that the interpretation of Unsupervised ADMIXTURE should work backwards (i.e. different contributions into different modern populations), and not based solely on ancestral populations, which seems probably right. So again, confirmation bias (and potentially wrong direction fallacy) by me…


Genomic history of Northern Eurasians includes East-West and North-South gradients


Open Access article on modern populations (including ancient samples), Between Lake Baikal and the Baltic Sea: genomic history of the gateway to Europe, by Triska et al., BMC Genetics 18(Suppl 1):110, 2017.


The history of human populations occupying the plains and mountain ridges separating Europe from Asia has been eventful, as these natural obstacles were crossed westward by multiple waves of Turkic and Uralic-speaking migrants as well as eastward by Europeans. Unfortunately, the material records of history of this region are not dense enough to reconstruct details of population history. These considerations stimulate growing interest to obtain a genetic picture of the demographic history of migrations and admixture in Northern Eurasia.

We genotyped and analyzed 1076 individuals from 30 populations with geographical coverage spanning from Baltic Sea to Baikal Lake. Our dense sampling allowed us to describe in detail the population structure, provide insight into genomic history of numerous European and Asian populations, and significantly increase quantity of genetic data available for modern populations in region of North Eurasia. Our study doubles the amount of genome-wide profiles available for this region.

We detected unusually high amount of shared identical-by-descent (IBD) genomic segments between several Siberian populations, such as Khanty and Ket, providing evidence of genetic relatedness across vast geographic distances and between speakers of different language families. Additionally, we observed excessive IBD sharing between Khanty and Bashkir, a group of Turkic speakers from Southern Urals region. While adding some weight to the “Finno-Ugric” origin of Bashkir, our studies highlighted that the Bashkir genepool lacks the main “core”, being a multi-layered amalgamation of Turkic, Ugric, Finnish and Indo-European contributions, which points at intricacy of genetic interface between Turkic and Uralic populations. Comparison of the genetic structure of Siberian ethnicities and the geography of the region they inhabit point at existence of the “Great Siberian Vortex” directing genetic exchanges in populations across the Siberian part of Asia.

f3 values to estimate (a) Eastern European Hunter-Gatherer, (b) Neolithic Farmer, (c) Caucasus hunter-gatherer, and (d) Mal’ta (Ancient North Eurasian) ancestry in modern humans
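To make the f3 values in this figure more tangible, here is a naive Python sketch. It assumes they are outgroup f3 statistics of the form f3(Outgroup; Ancient, Modern), which the caption itself does not state. The statistic is computed as the average over SNPs of (o - a)(o - m), where o, a and m are allele frequencies; the frequencies below are invented, and the finite-sample bias correction used by real tools is left out.

```python
def f3(target, source_a, source_b):
    """Naive f3(target; A, B): mean of (t - a) * (t - b) across SNPs.

    Each argument is a list of allele frequencies between 0 and 1,
    one per SNP, in the same SNP order. No bias correction is applied.
    """
    if not (len(target) == len(source_a) == len(source_b)):
        raise ValueError("frequency lists must have the same length")
    if not target:
        raise ValueError("at least one SNP is required")
    products = [(t - a) * (t - b) for t, a, b in zip(target, source_a, source_b)]
    return sum(products) / len(products)


# Invented allele frequencies for four SNPs (illustration only)
outgroup = [0.0, 1.0, 0.0, 1.0]
ancient = [0.5, 0.5, 0.5, 0.5]
modern_near = [0.5, 0.5, 0.5, 0.5]  # shares the ancient population's drift
modern_far = [0.1, 0.9, 0.1, 0.9]   # stays close to the outgroup

shared_near = f3(outgroup, ancient, modern_near)
shared_far = f3(outgroup, ancient, modern_far)
print(shared_near > shared_far)
```

Under this reading, a higher value means that the modern population shares more genetic drift with the ancient reference population after their split from the outgroup.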

Slavic speakers of Eastern Europe are, in general, very similar in their genetic composition. Ukrainians, Belarusians and Russians have almost identical proportions of Caucasus and Northern European components and have virtually no Asian influence. We capitalized on wide geographic span of our sampling to address intriguing question about the place of origin of Russian Starovers, an enigmatic Eastern Orthodox Old Believers religious group relocated to Siberia in seventeenth century. A comparative reAdmix analysis, complemented by IBD sharing, placed their roots in the region of the Northern European Plain, occupied by North Russians and Finno-Ugric Komi and Karelian people. Russians from Novosibirsk and Russian Starover exhibit ancestral proportions close to that of European Eastern Slavs, however, they also include between five to 10 % of Central Siberian ancestry, not present at this level in their European counterparts.

Admixture proportions in studied populations, K = 6. Populations from the Extended dataset. Abbreviated population codes: NSK – Russians from Novosibirsk; STV – Starover Russians; ARK – Bashkirs from Arkhangelskiy district; BRZ – Bashkirs from Burzyansky district

Our project has patched the hole in the genetic map of Eurasia: we demonstrated complexity of genetic structure of Northern Eurasians, existence of East-West and North-South genetic gradients, and assessed different inputs of ancient populations into modern populations.

Featured image, from the article: “Departures from the expected IBD. Shown populations exceed the expected IBD sharing by more than two standard deviations.”


More evidence on the recent arrival of haplogroup N and gradual replacement of R1a lineages in North-Eastern Europe


A new article (in Russian), Kinship Analysis of Human Remains from the Sargat Mounds, Baraba forest-steppe, Western Siberia, by Pilipenko et al. Археология, этнография и антропология Евразии Том 45 № 4 2017, downloadable at ResearchGate.


We present the results of a paleogenetic analysis of nine individuals from two Early Iron Age mounds in the Baraba forest-steppe, associated with the Sargat culture (five from Pogorelka-2 mound 8, and four from Vengerovo-6 mound 1). Four systems of genetic markers were analyzed: mitochondrial DNA, the polymorphic part of the amelogenin gene, autosomal STR-loci, and those of the Y-chromosome. Complete or partial data, obtained for eight of the nine individuals, were subjected to kinship analysis. No direct relatives of the “parent-child” type were detected. However, the data indicate close paternal and maternal kinship among certain individuals. This was evidently one of the reasons why certain individuals were buried under a single mound. Paternal kinship appears to have been of greater importance. The diversity of mtDNA and Y-chromosome lineages among individuals from one and the same mound suggests that kinship was not the only motive behind burying the deceased people jointly. The presence of very similar, though not identical, variants of the Y chromosome in different burial grounds may indicate the existence of groups such as clans, consisting of paternally related males. Our conclusions need further confirmation and detailed elaboration.

Keywords: Paleogenetics, ancient DNA, kinship analysis, mitochondrial DNA, uniparental genetic markers, STR-loci, Y-chromosome, Baraba forest-steppe, Sargat culture, Early Iron Age.

From the older study of the same region (Baraba, numbered 4): “Location of ancient human groups with a high frequency of mtDNA haplogroups U5, U4 and U2e lineages. The area of Northern Eurasian anthropological formation is marked by yellow region on the map (References: 1. Bramanti et al., 2009; 2. Malmstrom et al., 2009; 3. Krause et al., 2010; 4. this study)”

Chronological time scale of Bronze Age Cultures from the Baraba region

This is the same team that published an ancient mtDNA study of different cultures within the Baraba forest-steppe region (from the Open Access book Population Dynamics in Prehistory and Early History).

The Baraba forest-steppe is a region between the Ob and Irtysh rivers (about 800 km from west to east), stretching over 200 km from the taiga zone in the north to the steppes in the south.

The new study brings a more recent picture of the region, from the Iron Age Sargat culture, ca. 500 BC – 500 AD, with five samples of haplogroup N and two samples of haplogroup R1a.

R1a lineages in the region probably derive from the previous expansion of Andronovo and related cultures, which had absorbed North Caspian steppe populations and their Late Indo-European culture.

N subclades prevalent in certain modern Eurasian populations are probably derived from the expansion of the Seima-Turbino phenomenon.

While samples are scarce, Y-DNA data keeps showing the same picture I have spoken about more than once:

N subclades (potentially originally speaking Proto-Yukaghir languages) gradually replacing haplogroup R1a (originally probably speaking Uralic languages), probably through successive founder effects (such as the bottlenecks found in Finland), which left their Uralic culture and ethnolinguistic identification intact.

Therefore, late Corded Ware groups of North-Eastern Europe (in the Forest Zone and the Baltic), mainly of R1a-Z645 subclades, probably never adopted Late Indo-European languages.


Prehistoric loan relations: Foreign elements in the Proto-Indo-European vocabulary


An interesting ongoing web project, Prehistoric loan relations, on words potentially borrowed into Proto-Indo-European from Uralic-Yukaghir, Caucasian, and Middle Eastern sources.

Based on a master’s thesis by Bjørn (2017), Foreign elements in the Proto-Indo-European vocabulary (PDF).

From the website (emphasis mine):

This page allows historical linguists to compare and scrutinize proposed prehistoric lexical borrowings from the perspective of Proto-Indo-European. The first entries are all (135 in total) extracted from my master’s thesis “Foreign elements in the Proto-Indo-European vocabulary” (Bjørn 2017). Comments are encouraged at the bottom of each entry. New entries will be added, also on request.

Take this not as the conclusion, but an invitation to join the conversation.

So, we welcome the invitation, and hope that this new project thrives.

Also, I loved his fantasy-like map of the central Eurasian region (featured image on this post).


Indo-European demic diffusion model, 3rd Ed. – Revised October 2017


I have just uploaded a new working draft of the third version of the Indo-European demic diffusion model.

In this new version I have added information published recently; updated the maps, especially the one on Palaeolithic migrations; added information on Sredni Stog and its potential role in developing the Corded Ware culture and, most likely, its language; and corrected certain parts that have become obsolete, especially after the latest version (19 Sept. 2017) of Mathieson et al. (2017).

It can be read or downloaded at:

Included is my first sketch of the genetic history of Europe, as I interpret it in light of Genetic research (especially from outputs of qpGraph published to date), but also Archaeology (and, to some extent, Linguistics).

Tentative sketch modelling the genetic history of Europe and West Eurasia from ancient populations up to the Bronze Age, according to results in recent Genetic papers and archaeological models of known migrations.

I have also taken this opportunity to upload some drafts I had been preparing in September while working on the Third Edition, which I have sadly not been able to complete as I would have wanted. The drafts are posted in the section Human Ancestry. I post them as they are, in the hope that they can help others.

Another hint at the role of Corded Ware peoples in spreading Uralic languages into north-eastern Europe, found in mtDNA analysis of the Finnish population


Open article at Scientific Reports (Nature): Identification and analysis of mtDNA genomes attributed to Finns reveal long-stagnant demographic trends obscured in the total diversity, by Översti et al. (2017).

Of special interest is its depiction of Finland’s past as including the expansion of a Corded Ware population of mtDNA U5b1b2 (and probably Y-DNA R1a-M417 subclades), most likely Uralic speakers of the Forest Zone, to the north of the Yamna culture (where Late Proto-Indo-European was spoken).

A later expansion of other subclades, particularly Y-DNA N1c, was probably associated with the later western expansion of the Eurasian Seima-Turbino phenomenon, and the current prevalence of N1c among Finnish Y-DNA haplogroups might have been the consequence of the population decline ca. 1500 BC and of the later Iron Age population bottleneck (with the population peak ca. 500 AD) described in the article.

That would more naturally explain the ‘cultural diffusion’ of Finnic languages into invading eastern N1c lineages, a diffusion which would have been in fact a long-term, quite gradual replacement of previously prevalent Y-DNA R1a subclades in the region, as supported by the prevalent “steppe” component in genome-wide ancestry of Finns.

Therefore, there were probably no sudden, strong population (and thus cultural) changes associated with the arrival of N1c lineages, like the ones seen with R1a (Corded Ware / Uralic) and R1b (Yamna / Proto-Indo-European) expansions in Europe.

How the Saami fit into this scheme is not yet obvious, though.


In Europe, modern mitochondrial diversity is relatively homogeneous and suggests an ubiquitous rapid population growth since the Neolithic revolution. Similar patterns also have been observed in mitochondrial control region data in Finland, which contrasts with the distinctive autosomal and Y-chromosomal diversity among Finns. A different picture emerges from the 843 whole mitochondrial genomes from modern Finns analyzed here. Up to one third of the subhaplogroups can be considered as Finn-characteristic, i.e. rather common in Finland but virtually absent or rare elsewhere in Europe. Bayesian phylogenetic analyses suggest that most of these attributed Finnish lineages date back to around 3,000–5,000 years, coinciding with the arrival of Corded Ware culture and agriculture into Finland. Bayesian estimation of past effective population sizes reveals two differing demographic histories: 1) the ‘local’ Finnish mtDNA haplotypes yielding small and dwindling size estimates for most of the past; and 2) the ‘immigrant’ haplotypes showing growth typical of most European populations. The results based on the local diversity are more in line with that known about Finns from other studies, e.g., Y-chromosome analyses and archaeology findings. The mitochondrial gene pool thus may contain signals of local population history that cannot be readily deduced from the total diversity.

From its results:

In general, there appears to be two loose and largely overlapping clusters among the Finn-characteristic haplogroups: the first between 1,000–2,000 ybp and the second around 3,300–5,500 ybp. The age of the older cluster coincides temporally with the arrival of the Corded-Ware culture and, notably, the spread of agriculture in Finland. The arrival and spread of agriculture, temporally corresponding with the age estimates for most of the haplogroups characteristic of Finns, might be a sign of population size increase enabled by the new mode of subsistence, resulting in reduced drift and accumulation of genetic diversity in the population.


Another insight in the past population sizes in Finland is based on radiocarbon-dated archaeological findings in different time periods. These analyses suggest two prehistoric population peaks in Finland, the Stone Age peak (c. 5,500 ybp) and the Metal Age peak (~1,500 ybp). Both of these peaks were followed by a population decline, which appears to have reached its ebb around 3,500 ybp. These developments are not distinguishable in the BSPs. However, these ages correspond well to the two haplogroup age clusters described above. The presumably less severe Iron Age population bottleneck seen in the archaeological data, 1,500–1,300 ybp, temporally coincides with the population size reduction visible for the Finn-characteristic subhaplogroups.


Discovered via Eurogenes.