Spread of Indo-European and Uralic speakers in ADMIXTURE


The following are updated files for unsupervised ADMIXTURE of most available ancient Eurasian samples with K=7. For reference, see PCA of ancient and modern Eurasian samples.

NOTE. For a precise interpretation of ancestry evolution, be sure to first check the posts on the expansion of “Steppe ancestry”, on the spread of Yamnaya ancestry with Indo-Europeans, and on the evolution of Corded Ware ancestry typical of modern Uralic populations.

ADMIXTURE timeline

This is a YouTube video similar to the one on Indo-Europeans and Y-DNA evolution:


Some comments

  • I have tried running supervised ADMIXTURE models by selecting distant populations based on PCAs and qpAdm results. The most accurate approximations to what the software should offer appear with a small number of components, between K=5 and K=7, whether supervised or unsupervised. Adding more ancestral populations gives increasingly odd results the more distant (in time) populations are from the selected samples.
  • Labels for ancestral components follow those commonly used in the literature, although supervised ADMIXTURE using the corresponding available samples (viz. Anatolia Neolithic for AHG, Iran Hotu and/or CHG for IHG, AG2, AG3 and Mal’ta for ANE, etc.) offers slightly different, less smooth outputs for some periods, especially among more recent populations.
  • Outputs depend on many different factors, and these files are intended as an overview of the evolution of these simplistic components. The number of available samples per period, the potential ancestry changes within each conventionally selected period, and whether each available sample is representative of the territory it was recovered from, among many other factors, influence the outputs and the maps.
Unsupervised ADMIXTURE (K=7). See full image.

NOTE. In summary, ADMIXTURE results like those below might be used to develop new ideas, which should then be formally tested; they cannot be used to support anything. Don’t be like the Copenhagen group, randomly selecting “Steppe ancestry” with K=4, identifying this component as “Indo-Europeans”, and correlating its evolution with changes in vegetation composition in yet another obvious correlation = causation argument among many confounding factors left unaccounted for…

Static ADMIXTURE + culture maps

Colours correspond to the components as labelled in the video and in the files below.

  1. Anatomically Modern Humans (PDF)
  2. Upper Palaeolithic (PDF)
  3. Epipalaeolithic (PDF)
  4. Early Mesolithic (PDF)
  5. Late Mesolithic (PDF)
  6. Neolithic and hunter-gatherer pottery (PDF)
  7. Early Eneolithic (PDF)
  8. Late Eneolithic (PDF)
  9. Early Chalcolithic (PDF)
  10. Late Chalcolithic (PDF)
  11. Early Bronze Age (PDF)
  12. Middle Bronze Age (PDF)
  13. Late Bronze Age (PDF)
  14. Early Iron Age (PDF)
  15. Late Iron Age (PDF)
  16. Antiquity (PDF)
  17. Middle Ages (PDF)

Natural interpolation maps of ADMIXTURE

The following maps offer natural neighbour interpolations of ancestral components in ancient DNA samples grouped by periods (conventionally selected following the same pattern as in the Prehistory Atlas).

  • Extrapolation (inferred ancestry beyond the frame created by available samples per map) is obtained by adding distant external locations (such as Greenland, Arctic, Alaska…) with a value of 0.
  • Videos offer a dynamic timeline.
  • Click on the images to see a version with higher resolution.
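
To illustrate the effect of the zero-valued external locations, here is a minimal sketch. It does not use natural neighbour interpolation (the method behind the maps); it swaps in a simple inverse-distance weighting, treats longitude and latitude as flat coordinates, and uses invented sample values and anchor positions, all of which are assumptions made only for the example.

```python
import math

def idw(points, query, power=2.0):
    """Inverse-distance-weighted estimate at `query`.

    `points` is a list of (lon, lat, value) tuples. This is a stand-in for
    natural neighbour interpolation, NOT the method used for the maps.
    """
    num = 0.0
    den = 0.0
    for lon, lat, value in points:
        d = math.hypot(lon - query[0], lat - query[1])
        if d == 0.0:
            return value  # query coincides with a sample location
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den

# Hypothetical ancestry proportions at three sampled sites (lon, lat, value).
samples = [(50.0, 53.0, 0.60), (30.0, 50.0, 0.40), (10.0, 48.0, 0.10)]

# Hypothetical distant external locations fixed at 0, standing in for
# places such as Greenland, Alaska or the Arctic.
zero_anchors = [(-40.0, 72.0, 0.0), (-150.0, 64.0, 0.0), (100.0, 85.0, 0.0)]

query = (-20.0, 65.0)  # a location outside the frame formed by the samples
without_anchors = idw(samples, query)
with_anchors = idw(samples + zero_anchors, query)

print(f"without zero anchors: {without_anchors:.3f}")
print(f"with zero anchors:    {with_anchors:.3f}")
```

Without the anchors, a location far outside the sampled area simply inherits a blend of the nearest samples; with them, the inferred value decays towards 0 away from the data.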

WHG ancestry


AHG ancestry


ANE ancestry


“Siberian” ancestry

This ancestry peaks among Baikal HG, Ust’Belaya, Nganasans, or Ulchi, hence the different labels used.


Iran HG ancestry


ADMIXTURE maps by period

Click on each image for a higher resolution version.





Early Eneolithic


Late Eneolithic


Early Chalcolithic


Late Chalcolithic


Early Bronze Age


Middle Bronze Age


Late Bronze Age


Early Iron Age


Late Iron Age




Middle Ages


Modern populations



These are the samples used for interpolations in each period (except for modern populations, which are those included in the Reich Lab curated dataset):

See also

Iron Age Tocharians of Yamnaya ancestry from Afanasevo show hg. R1b-M269 and Q1a1

New open access Ancient Genomes Reveal Yamnaya-Related Ancestry and a Potential Source of Indo-European Speakers in Iron Age Tianshan, by Ning et al. Current Biology (2019).

Interesting excerpts (emphasis mine, changes for clarity):

Here, we report the first genome-wide data of 10 ancient individuals from northeastern Xinjiang. They are dated to around 2,200 years ago and were found at the Iron Age Shirenzigou site. We find them to be already genetically admixed between Eastern and Western Eurasians. We also find that the majority of the East Eurasian ancestry in the Shirenzigou individuals is related to northeastern Asian populations, while the West Eurasian ancestry is best presented by ∼20% to 80% Yamnaya-like ancestry. Our data thus suggest a Western Eurasian steppe origin for at least part of the ancient Xinjiang population. Our findings furthermore support a Yamnaya-related origin for the now extinct Tocharian languages in the Tarim Basin, in southern Xinjiang.


The dominant mtDNA lineages of the Shirenzigou people are commonly found in modern and ancient West Eurasian populations, such as U4, U5, and H, while they also have East Eurasian-specific haplogroups A, D4, and G3, preliminarily documenting admixed ancestry from eastern and western Eurasia.

The admixture profile is also shown on the paternal Y chromosome side that 4 out of 6 males in Shirenzigou (Figure S2) belong to the West Eurasian-specific haplogroup R1b (n = 2) and East Eurasian-specific haplogroup Q1a (n = 2), the former is predominant in ancient Yamnaya and nearly 100% in Afanasievo, different from the Middle and Late Bronze Age Steppe groups (Steppe_MLBA) such as Andronovo, [Potapovka], Srubnaya, and Sintashta whose Y chromosomal haplogroup is mainly R1a.



We first carried out principal component analysis (PCA) to assess the genetic affinities of the ancient individuals qualitatively by projecting them onto present-day Eurasian variation (Figure 2). We observed a distinct separation between East and West Eurasians. Our ancient Shirenzigou samples and present-day populations from Central Asia and northwestern China form a genetic cline from East to West in the first PC. The distribution of Shirenzigou samples on the cline is relatively scattered with two major clusters, one being closer to modern-day Uygurs and Kazakhs and the other being closer to recently published ancient Saka and Huns from the Tianshan in Kazakhstan (…).

We applied a formal admixture test using f3 statistics in the form of f3 (Shirenzigou; X, Y) where X and Y are worldwide populations that might be the genetic sources for the Shirenzigou individuals. We observed the most significant signals of admixture in the Shirenzigou samples when using Yamnaya_Samara or Srubnaya as the West Eurasian source and some Northern Asians or Koreans as the East Eurasian source (Table S1). We also plotted the outgroup f3 statistics in the form of f3 (Mbuti; X, Anatolia_Neolithic) and f3 (Mbuti; X, Kostenki14) to visualize the allele sharing between population X and Anatolian farmers. As shown in Figure S3, the Steppe_MLBA populations including Srubnaya, Andronovo, and Sintashta were shifted toward farming populations compared with Yamnaya groups and the Shirenzigou samples. This observation is consistent with ADMIXTURE analysis that Steppe_MLBA populations have an Anatolian and European farmer-related component that Yamnaya groups and the Shirenzigou individuals do not seem to have. The analysis consistently suggested Yamnaya-related Steppe populations were the better source in modeling the West Eurasian ancestry in Shirenzigou.
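
For readers unfamiliar with the statistic, the following is a minimal sketch of the idea behind f3(Target; X, Y): the average over SNPs of (c - a)(c - b), where c, a and b are allele frequencies in the target and the two candidate sources. A clearly negative value is taken as a signal that the target is admixed between populations related to X and Y. The allele frequencies below are invented, and the sketch leaves out the finite-sample correction and the block-jackknife standard errors that real analyses rely on.

```python
def f3(target, source_a, source_b):
    """Plain f3 estimate: mean of (c - a) * (c - b) over SNPs.

    Inputs are equal-length lists of allele frequencies. No sample-size
    correction and no standard error are computed in this sketch.
    """
    if not (len(target) == len(source_a) == len(source_b)):
        raise ValueError("frequency lists must have the same length")
    products = [(c - a) * (c - b)
                for c, a, b in zip(target, source_a, source_b)]
    return sum(products) / len(products)

# Hypothetical allele frequencies at four SNPs.
west = [0.9, 0.1, 0.8, 0.2]
east = [0.1, 0.9, 0.3, 0.7]

# A toy population built as an even mixture of the two sources.
mixed = [(w + e) / 2 for w, e in zip(west, east)]

admixture_f3 = f3(mixed, west, east)    # target lies between the sources
unadmixed_f3 = f3(west, mixed, east)    # target is one end of the cline

print(f"f3(mixed; west, east) = {admixture_f3:.4f}")
print(f"f3(west; mixed, east) = {unadmixed_f3:.4f}")
```

In the outgroup form, f3(Mbuti; X, Y), the same product is computed with the outgroup in the first position, and larger positive values indicate more drift shared by X and Y.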

PCA and ADMIXTURE for Shirenzigou Samples. Modified from the original to include in black squares samples related to Yamnaya.

Genetic Composition of Iron Age Shirenzigou Individuals

We continued to use qpAdm to estimate the admixture proportions in the Shirenzigou samples by using different pairs of source populations, such as Yamnaya_Samara, Afanasievo, Srubnaya, Andronovo, BMAC culture (Bustan_BA and Sappali_Tepe_BA) and Tianshan_Hun as the West Eurasian source and Han, Ulchi, Hezhen, Shamanka_EN as the East Eurasian source. In all cases, Yamnaya, Afanasievo, or Tianshan_Hun always provide the best model fit for the Shirenzigou individuals, while Srubnaya, Andronovo, Bustan_BA and Sappali_Tepe_BA only work in some cases. The Yamnaya_Samara or Afanasievo-related ancestry ranges from ∼20% to 80% in different Shirenzigou individuals, consistent with the scattered distribution on the East-West cline in the PCA


(…) we then modeled Shirenzigou as a three-way admixture of Yamnaya_Samara, Ulchi (or Hezhen) and Han to infer the source from the East Eurasia side that contributed to Shirenzigou. We found the Ulchi or Hezhen and Han-related ancestry had a complicated and unevenly distribution in the Shirenzigou samples. The most Shirenzigou individuals derived the majority of their East Eurasian ancestry from Ulchi or Hezhen-related populations, while the following two individuals M820 and M15-2 have more Han related than Ulchi/Hezhen-related ancestry.

One important question remains, though: how and when did these Proto-Tocharian speakers migrate from the Afanasevo culture in the Altai into the Tarim Basin? The traditional answer, now more likely than ever, is through the Chemurchek culture. See e.g. A re-analysis of the Qiemu’erqieke (Shamirshak) cemeteries, Xinjiang, China, by Jia and Betts JIES (2010) 38(4).

Also, given the apparent lack of Corded Ware ancestry (i.e. of the extra farmer ancestry that characterizes it), and if the results were already suspicious before, how likely are the published R1a(xZ93) and/or radiocarbon dates of the Xiaohe mummies from Li et al. (2010, 2015) now? After all, one should have expected at such a late date a generalized admixture with neighbouring Srubna/Andronovo-like populations.


Y-DNA haplogroups of Tuvinian tribes show little effect of the Mongol expansion


Open access Estimating the impact of the Mongol expansion upon the gene pool of Tuvans, by Balanovskaya et al., Vavilov Journal of Genetics and Breeding (2018), 22(5):611-619.

Abstract (emphasis mine):

With a view to trace the Mongol expansion in Tuvinian gene pool we studied two largest Tuvinian clans – those in which, according to data of humanities, one could expect the highest Central Asian ancestry, connected with the Mongol expansion. Thus, the results of Central Asian ancestry in these two clans component may be used as upper limit of the Mongol influence upon the Tuvinian gene pool in a whole. According to the data of 59 Y-chromosomal SNP markers, the haplogroup spectra in these Tuvinian tribal groups (Mongush, N = 64, and Oorzhak, N = 27) were similar. On average, two-thirds of their gene pools (63 %) are composed by North Eurasian haplogroups (N*, N1a2, N3a, Q) connected with autochtonous populations of modern area of Tuvans. The Central Asian haplogroups (C2, O2) composed less then fifth part (17 %) of gene pools of the clans studied. The opposite ratio was revealed in Mongols: there were 10 % North Eurasian haplogroups and 75 % Central Asian haplogroups in their gene pool. All the results derived – “genetic portraits”, the matrix of genetic distances, the dendrogram and the multidimensional scaling plot, which mirror the genetic connections between Tuvinian clans and populations of South Siberia and East Asia, demonstrated the prominent similarity of the Tuvinian gene pools with populations from and Khakassia and Altai. It could be therefore assumed that Tuvinian clans Mongush and Oorzhak originated from autochtonous people (supposedly, from the local Samoyed and Kets substrata). The minor component of Central Asian haplogroups in the gene pool of these clans allowed to suppose that Mongol expansion did not have a significant influence upon the Tuvinan gene pool at a whole.


Interesting excerpts:

Haplogroup C2 peaks in Central Asia (Wells et al., 2001; Zerial et al., 2003), though its variants are abundant in other peoples of Siberia and Far East. For instance, in one of Buryat clans, namely Ekhirids, hg C2 frequency is 88 % (Y-base); in Kazakhs from different regions of Kazakhstan, total occurrence of hg C2 variants averages between 17 and 81 % (Abilev et al., 2012; Zhabagin et al., 2013, 2014, 2017), in populations of the Amur River (such as Nanais, Negidals, Nivkhs, Ulchs) – between 40 and 65 %, in Evenks – up to 68 % (Y-base), in Kyrgyz people of Pamir-Alay – up to 22 %, correspondingly; of all Turkic peoples of Altai, relatively high hg C2 frequency (16 %) is detected only in Telengits (Balanovskaya et al., 2014; Balaganskaya et al., 2011a, 2016). In Tuvinian clans under the study, hg C2 frequency is rather low – 19 % in Mongush and 11 % in Oorzhak, while in Mongols it makes up almost two thirds of the entire gene pool an comprises different genetic lines (subhaplogroups).

Y-chromosomal haplogroup spectra in gene pools of Tuvinian Oorzhak and Mongush clans and of the neighboring populations of South Siberia and Central Asia.

Haplogroup N is abundant all over North Eurasia from Scandinavia to Far East (Rootsi et al., 2007). The study on whole Y-chromosome sequencing conducted with participation of our group (Ilumäe et al., 2016) subdivided this haplogroup into several branches with their regional distribution. In gene pools of the Tuvans involved, hg N was represented by two sub-clades, namely N1a2 and N3a.

Sub-clade N1a2 peaks in populations of West Siberia (in Nganasans, frequency is 92 %) and South Siberia (in Khakas 34 %, in Tofalars 25 %) (Y-base). In Tuvans, N1a2 occurrence is nearly 16 % in Mongush and 15 % in Oorzhak clans, respectively, while in Mongols, the frequency is three times less (5 %). Hg N1a2 is supposed to display the impact of the Samoyedic component to the gene pool of Tuvinian clans (Kharkov et al., 2013).

Sub-clade N3a is major in the Oorzhak clan comprising almost half of the gene pool (45 %); it is represented by two sub-clades, namely N3a* and N3a5. The same sub-branches are specific to the Mongush clan as well, though with lower frequencies: N3a* – 9 % and N3a5 – 14 % (see Table). In Khori-Buryats from the Transbaikal region, a high frequency is observed – 82 % (Kharkov et al., 2014), while in Mongols, N3a5 occurs rather rarely (6 %). Hg N3a* was detected in populations of South Siberia only, and was widely spread in Khakas-Sagays and Shors (up to 40 %) (Ilumäe et al., 2016) (Y-base).

Map of distribution of Samoyedic languages (red) in the XVII century (approximate; hatching) and in the end of XX century (continuous background). Modified from Wikipedia, with the Tuva region labelled.

Within the pan-Eurasian haplogroup R1a1a, two large genetic lines (sub-haplogroups) are identified: “European” (marker M458) and “Asian” (marker Z93) the latter almost never occurring in Europe (Balanovsky, 2015) but abundant in South Siberia and northern Hindustan. In the Altai-Sayan region, high frequencies of the “Asian” branch are spread in many peoples – Shors, Tubalars, Altai-Kizhi people, Telengits, Sagays, Kyzyl Khakas, Koibals, Teleuts (Y-base) (Kharkov et al., 2009). Hg R1a1a comprises perceptible parts of gene pools of Tuvinian clans (19 % in Mongush, and 15 % in Oorzhak), though its occurrence in Mongols is much lower (6 %). Those results also count in favor of the hypothesis of autochtonous component dominance even in the gene pools of clans potentially most influenced by Mongolian ancestry. If we add R1a1a variants to the “North Eurasian” haplogroups, the “not-Central Asian” component will compose average four fifth of the entire gene pools for Tuvinian clans (in Mongush 77 %, and in Oorzhak 81 %), being only 16 % in Mongols. Such data are definitely contrary to the hypothesis of a crucial influence of the Mongol expansion upon the development of Tuvinian gene pool.

I found interesting the high proportion of R1a-Z93 subclades among Sagays in Khakassia, which stem from a local Samoyed substratum, as described by the paper…

Featured Image: Map of Uralic and Altaic languages, from Wikipedia.


On Latin, Turkic, and Celtic – likely stories of mixed societies and little genetic impact


Recent article on The Conversation, The Roman dead: new techniques are revealing just how diverse Roman Britain was, about the paper (behind paywall) A Novel Investigation into Migrant and Local Health-Statuses in the Past: A Case Study from Roman Britain, by Redfern et al. Bioarchaeology International (2018), among others.

Interesting excerpts about Roman London:

We have discovered, for example, that one middle-aged woman from the southern Mediterranean has black African ancestry. She was buried in Southwark with pottery from Kent and a fourth century local coin – her burial expresses British connections, reflecting how people’s communities and lives can be remade by migration. The people burying her may have decided to reflect her life in the city by choosing local objects, but we can’t dismiss the possibility that she may have come to London as a slave.

The evidence for Roman Britain having a diverse population only continues to grow. Bioarchaeology offers a unique and independent perspective, one based upon the people themselves. It allows us to understand more about their life stories than ever before, but requires us to be increasingly nuanced in our understanding, recognising and respecting these people’s complexities.

We already have a more or less clear idea about how little the Roman conquest may have shaped the genetic map of Europe, Africa, or the Middle East, in contrast to other previous or later migrations or conquests.

Also, on the Turkic expansion, the recent paper of Damgaard et al. (Nature 2018) stated:

In the sixth century AD, the Hunnic Empire had been broken up and dispersed as the Turkic Khaganate assumed the military and political domination of the steppes. Khaganates were steppe nomad political organizations that varied in size and became dominant during this period; they can be contrasted to the previous stateless organizations of the Iron Age. The Turkic Khaganate was eventually replaced by a number of short-lived steppe cultures (…).

We find evidence that elite soldiers associated with the Turkic Khaganate are genetically closer to East Asians than are the preceding Huns of the Tian Shan mountains (Supplementary Information section 3.7). We also find that one Turkic Khaganate-period nomad was a genetic outlier with pronounced European ancestries, indicating the presence of ongoing contact with Europe (…).

Analyses of Turk- and Medieval-period population clusters. a, PCA of Tian Shan Hun, Turk, Kimak, Kipchack, Karakhanid and Golden Horde, including 28 individuals analysed at 242,406 autosomal SNP positions. b, Results for model-based clustering analysis at K = 7. Here we illustrate the admixture analyses with K = 7 as it approximately identifies the major component of relevance (Anatolian/ European farmer component, Caucasian ancestry, EHG-related ancestry and East Asian ancestry).

These results suggest that Turkic cultural customs were imposed by an East Asian minority elite onto central steppe nomad populations, resulting in a small detectable increase in East Asian ancestry. However, we also find that steppe nomad ancestry in this period was extremely heterogeneous, with several individuals being genetically distributed at the extremes of the first principal component (Fig. 2) separating Eastern and Western descent. On the basis of this notable heterogeneity, we suggest that during the Medieval period steppe populations were exposed to gradual admixture from the east, while interacting with incoming West Eurasians. The strong variation is a direct window into ongoing admixture processes and the multi-ethnic cultural organization of this period.

We already knew that the expansion of the La Tène culture, associated with the expansion of Celtic languages throughout Europe, was probably not accompanied by massive migrations (from the IEDM, 3rd ed.):

The Mainz research project of bio-archaeometric identification of mobility has not proven to date a mass migration of Celtic peoples in central Europe ca. 4th-3rd centuries BC, i.e. precisely in a period where textual evidence informs of large migratory movements (Scheeres 2014). La Tène material culture points to far-reaching inter-regional contacts and cultural transfers (Burmeister 2016).

Also, from the latest paper on Y-chromosome bottleneck:

[The hypothesis of patrilineal kin group competition] has an added benefit in that it could explain the temporal placement of the bottleneck if competition between patrilineal kin groups was the main form of intergroup competition for a limited episode of time after the Neolithic transition. Anthropologists have repeatedly noted that the political salience of unilineal descent groups is greatest in societies of ‘intermediate social scale’ (Korotayev and its citations on p. 2), which tend to be post-Neolithic small-scale societies that are acephalous, i.e. without hierarchical institutions. Corporate kin groups tend to be absent altogether among mobile hunter gatherers with few defensible resource sites or little property (Kelly pp. 64–73), or in societies utilizing relatively unoccupied and under-exploited resource landscapes (Earle and Johnson pp. 157–171). Once they emerge, complex societies, such as chiefdoms and states, tend to supervene the patrilineal kin group as the unit of intergroup competition, and while they may not eradicate them altogether as sub-polity-level social identities, warfare between such kin groups is suppressed very effectively. These factors restrict the social phenomena responsible for the bottleneck to the period after the initial Neolithic but before the emergence of complex societies, which would place the bottleneck-generating mechanisms in the right period of time for each region of the Old World.

Diachronic map of Late Copper Age migrations including Classical Bell Beaker (east group) expansion from central Europe ca. 2600-2250 BC

However, I recently read in a forum for linguists that the expansion of East Bell Beakers, overwhelmingly of R1b-L21 subclades in the British Isles, “poses a problem”, in that it should be identified with a Celtic expansion earlier than traditionally assumed…

That interpretation would be in line with the simplistic maps we are seeing right now for Bell Beakers (see below for the Copenhagen group).

If anything, the results of Bell Beaker expansions (taken alone) would seem to support a model similar to Cunliffe & Koch’s hypotheses of a rather early Celtic expansion into Great Britain and Iberia from the Atlantic.

Spread of Indo-European languages (by the Copenhagen group).

But it doesn’t. Mallory already explained why in Cunliffe & Koch’s series Celtic from the West: the Bell Beaker expansion is too early for that; even for Italo-Celtic. It should correspond to North-West Indo-European speakers.

Not every population movement that is genetically very significant needs to be significant for the languages attested much later in the region.

This should be obvious to everyone with the many examples we already have. One of the least controversial now would probably be the expansion of R1b-DF27, widespread in Iberia probably at roughly the same time as R1b-L21 was in Great Britain, and still pre-Roman Iberians showed a mix of non-Indo-European languages, non-Celtic languages (at least Galaico-Lusitanian), and also some (certain) Celtic languages. And modern Iberians speak Romance languages, without much genetic impact from the Romans, either…

It is well-established in Academia that the expansion of La Tène is culturally associated with the spread of Celtic languages in Europe, including the British Isles and Iberia. While modern maps of U152 distribution may correspond to the migration of early Celts (or Italo-Celtic speakers) with Urnfield/Hallstatt, the great Celtic expansion across Europe need not show a genetic influence greater than or even equal to that of previous prehistoric migrations.

Post-Bell-Beaker Europe, after ca. 2200 BC.

You can see in these de novo models the same kind of invented theoretical ‘problem’ (as Iosif Lazaridis puts it) that we have seen with the Corded Ware showing steppe ancestry, with Old Hittite samples not showing EHG ancestry, or with CHG ancestry appearing north of the Caucasus but no EHG to the south.

However you may want to explain all these errors in scientific terms (selection bias, under-coverage, over-coverage, faulty statistical methods, etc.), these interpretations were simply the fruit of a lack of knowledge of the anthropological disciplines at play.

Let’s hope the future paper on Celtic expansion takes this into consideration.


Demographic history and genetic adaptation in the Himalayan region

Open access Demographic history and genetic adaptation in the Himalayan region inferred from genome-wide SNP genotypes of 49 populations, by Arciero et al. Mol. Biol. Evol. (2018), accepted manuscript (msy094).

Abstract (emphasis mine):

We genotyped 738 individuals belonging to 49 populations from Nepal, Bhutan, North India or Tibet at over 500,000 SNPs, and analysed the genotypes in the context of available worldwide population data in order to investigate the demographic history of the region and the genetic adaptations to the harsh environment. The Himalayan populations resembled other South and East Asians, but in addition displayed their own specific ancestral component and showed strong population structure and genetic drift. We also found evidence for multiple admixture events involving Himalayan populations and South/East Asians between 200 and 2,000 years ago. In comparisons with available ancient genomes, the Himalayans, like other East and South Asian populations, showed similar genetic affinity to Eurasian hunter-gatherers (a 24,000-year-old Upper Palaeolithic Siberian), and the related Bronze Age Yamnaya. The high-altitude Himalayan populations all shared a specific ancestral component, suggesting that genetic adaptation to life at high altitude originated only once in this region and subsequently spread. Combining four approaches to identifying specific positively-selected loci, we confirmed that the strongest signals of high-altitude adaptation were located near the Endothelial PAS domain-containing protein 1 (EPAS1) and Egl-9 Family Hypoxia Inducible Factor 1 (EGLN1) loci, and discovered eight additional robust signals of high-altitude adaptation, five of which have strong biological functional links to such adaptation. In conclusion, the demographic history of Himalayan populations is complex, with strong local differentiation, reflecting both genetic and cultural factors; these populations also display evidence of multiple genetic adaptations to high-altitude environments.

Population samples analysed in this study. A. Map of South and East Asia, highlighting the four regions examined, and the colour assigned to each. B. Samples from the Tibetan Plateau. C. Samples from Nepal. D. Samples from Bhutan and India. The circle areas are proportional to the sample sizes. The three letter population codes in B-D are defined in supplementary table S1.

Relevant excerpts:

Genetic affinity to ancestral populations

We explored the genetic affinity between the Himalayan populations and five ancient genomes using f3-outgroup statistics. Himalayans show greater affinity to Eurasian hunter-gatherers (MA-1, a 24,000-year-old Upper Palaeolithic Siberian), and the related Bronze Age Yamnaya, than to European farmers (5,500-4,800 years ago; Fig. 5A) or to European hunter-gatherers (La Braña, 7,000 years ago; Fig. 5B), like other South and East Asian populations. We further explored the affinity of Himalayan populations by comparing them with the 45,000-year-old Upper Palaeolithic hunter-gatherer (Ust’-Ishim) and each of MA-1, La Braña, or Yamnaya. Himalayan individuals cluster together with other East Asian populations and show equal distance from Ust’-Ishim and the other ancient genomes, probably because Ust’-Ishim belongs to a much earlier period of time (supplementary fig. S15). We also explored genetic affinity between modern Himalayan populations and five ancient Himalayans (3,150-1,250 years old) from Nepal. The ancient individuals cluster together with modern Himalayan populations in a worldwide PCA (supplementary fig. S16), and the f3-outgroup statistics show modern high-altitude populations have the closest affinity with these ancient Himalayans, suggesting that these ancient individuals could represent a proxy for the first populations residing in the region (supplementary fig. S17 and supplementary table S4). Finally, we explored the genetic affinity of Himalayan samples with the archaic genomes of Denisovans and Neanderthals (Skoglund and Jakobsson 2011), and found that they show a similar sharing pattern with Denisovans and Neanderthals to the other South and East Asian populations. Individuals belonging to four Nepalese, one Cambodian, and three Chinese populations show the highest Denisovan sharing (after populations from Australia and Papua New Guinea) but these values are not significantly greater than other South and East Asian populations (supplementary figs. S18 and S19).

Genetic structure of the Himalayan region populations from analyses using unlinked SNPs. A. PCA of the Himalayan and HGDP-CEPH populations. Each dot represents a sample, coded by region as indicated. The Himalayan region samples lie between the HGDP-CEPH East Asian and South Asian samples on the right-hand side of the plot. B. PCA of the Himalayan populations alone. Each dot represents a sample, coded by country or region as indicated. Most samples lie on an arc between Bhutanese and Nepalese samples; Toto (India) are seen as extreme outlier in the bottom left corner, while Dhimal (Nepal) and Bodo (India) also form outliers.

NOTE. The variance explained in the PCA graphics seems to be too high. This happened recently also with the Damgaard et al. (2018) papers (see here the comment by Iosif Lazaridis).
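
As a reminder of what the percentage on a PCA axis means, here is a minimal sketch with invented eigenvalues. The variance explained by a component is its eigenvalue divided by the sum of all eigenvalues. One way (an assumption for illustration, not something established here about the papers mentioned) to end up with inflated figures is to divide by the sum of only the few components that were computed or plotted.

```python
def variance_explained(eigenvalues, denominator_values=None):
    """Share of variance per component.

    By default each eigenvalue is divided by the sum of ALL eigenvalues.
    Passing a shorter `denominator_values` list mimics dividing by only
    the top components, which inflates every share.
    """
    if denominator_values is None:
        denominator_values = eigenvalues
    total = sum(denominator_values)
    return [ev / total for ev in eigenvalues]

# Hypothetical eigenvalues of a genotype covariance matrix, largest first.
eigenvalues = [4.0, 2.0, 1.0, 1.0, 1.0, 1.0]

correct = variance_explained(eigenvalues)
# Same top two components, but normalised by the top two eigenvalues only.
inflated = variance_explained(eigenvalues[:2], denominator_values=eigenvalues[:2])

print(f"PC1 share, all eigenvalues in the denominator: {correct[0]:.1%}")
print(f"PC1 share, top two only in the denominator:    {inflated[0]:.1%}")
```

With many samples and hundreds of thousands of SNPs there are many small eigenvalues in the full sum, so the correctly normalised share of the first components is usually modest.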

Similarities and differences between high-altitude Himalayan populations

The most striking example is provided by the Toto from North India, an isolated tribal group with the lowest genetic diversity of the Himalayan populations examined here, indicated by the smallest long-term Ne (supplementary fig. S5), and a reported census size of 321 in 1951 (Mitra 1951), although their numbers have subsequently increased. Despite this extreme substructure, shared common ancestry among the high-altitude populations (Fig. 2C and Fig. 3) can be detected, and the Nepalese in general are distinguished from the Bhutanese and Tibetans (Fig. 2C) and they also cluster separately (Fig. 3). In a worldwide context, they share an ancestral component with South Asians (supplementary fig. S2). On the other hand, the Tibetans do not show detectable population substructure, probably due to a much more recent split in comparison with the other populations (Fig. 2C and supplementary fig. S6). The genetic similarity between the high-altitude populations, including Tibetans, Sherpa and Bhutanese, is also supported by their clustering together on the phylogenetic tree, the PCA generated from the co-ancestry matrix generated by fineSTRUCTURE (supplementary fig. S10 and S11), the lack of statistical significance for most of the D-statistics tests (Yoruba, Han; high-altitude Himalayan 1, high-altitude Himalayan 2), and the absence of correlation between the increased genetic affinity to lowland East Asians and the spatial location of the Himalayan populations (supplementary figs. S12 and S13). Together, these results suggest the presence of a single ancestral population carrying advantageous variants for high-altitude adaptation that separated from lowland East Asians, and then spread and diverged into different populations across the Himalayan region. (…)
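
The D-statistics mentioned in this excerpt, of the form D(Yoruba, Han; Himalayan 1, Himalayan 2), can be sketched from allele frequencies as follows. This is a minimal illustration with invented frequencies; it uses the frequency-based form of the statistic, picks one of the possible sign conventions, and omits the block jackknife that gives the Z-scores used to judge significance.

```python
def d_statistic(p1, p2, p3, p4):
    """Frequency-based D(P1, P2; P3, P4).

    Numerator:   sum of (p1 - p2) * (p3 - p4) over SNPs.
    Denominator: sum of (p1 + p2 - 2*p1*p2) * (p3 + p4 - 2*p3*p4).
    With this sign convention, D > 0 means P4 shares more alleles with P2
    (and P3 with P1); D close to 0 means P3 and P4 are symmetrically related.
    """
    num = 0.0
    den = 0.0
    for w, x, y, z in zip(p1, p2, p3, p4):
        num += (w - x) * (y - z)
        den += (w + x - 2 * w * x) * (y + z - 2 * y * z)
    return num / den

# Hypothetical allele frequencies at four SNPs.
yoruba = [0.1, 0.8, 0.3, 0.6]
han = [0.7, 0.2, 0.9, 0.1]
himalayan_1 = [0.5, 0.4, 0.6, 0.3]
himalayan_2_same = list(himalayan_1)             # identical to Himalayan 1
himalayan_2_shifted = [0.6, 0.3, 0.75, 0.2]      # nudged towards Han

d_symmetric = d_statistic(yoruba, han, himalayan_1, himalayan_2_same)
d_shifted = d_statistic(yoruba, han, himalayan_1, himalayan_2_shifted)

print(f"D with identical Himalayan groups: {d_symmetric:.4f}")
print(f"D with group 2 shifted to Han:     {d_shifted:.4f}")
```

A lack of significant deviation from zero across pairs of high-altitude groups, as reported in the excerpt, is what one would expect if they relate to lowland East Asians in the same way.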

Recent admixture events

Genetic structure of the Himalayan region populations from analyses using unlinked SNPs. C. ADMIXTURE (K values of 2 to 6, as indicated) analysis of the Himalayan samples. Note that most increases in the value of K result in a single population being distinguished. Population codes in C are defined in supplementary table S1.

Himalayan populations show signatures of recent admixture events, mainly with South and East Asian populations as well as within the Himalayan region itself. Newar and Lhasa show the oldest signature of admixture, dated to between 2,000 and 1,000 years ago. Majhi and Dhimal display signatures of admixture within the last 1,000 years. Chetri and Bodo show the most recent admixture events, between 500 and 200 years ago (Fig. 4, supplementary table S3). The comparison between the genetic tree and the linguistic association of each Himalayan population highlights the agreement between genetic and linguistic sub-divisions, in particular in the Bhutanese and Tibetan populations. Nepalese populations show more variability, with genetic sub-clusters of populations belonging to different linguistic affiliations (Fig. 3B). Modern high-altitude Himalayans show genetic affinity with ancient genomes from the same region (supplementary fig. S17), providing additional support for the idea of an ancient high-altitude population that spread across the Himalayan region and subsequently diverged into several of the present-day populations. Furthermore, Himalayan populations show a similar pattern of allele sharing with Denisovans as other South-East Asian populations (supplementary fig. S18 and S19). Overall, geographical isolation, genetic drift, admixture with neighbouring populations and linguistic subdivision played important roles in shaping the genetic variability we see in the Himalayan region today.


Linguistic continuity despite genetic replacement in Remote Oceania


Review of recent papers on East Asia, quite relevant these days: Human Genetics: Busy Subway Networks in Remote Oceania? by Anders Bergström & Chris Tyler-Smith, Current Biology (2018) 28.

Interesting excerpts (emphasis mine):

Ancient DNA is transforming our understanding of the human past by forcing geneticists to confront its real complexity [1]. Historians and archaeologists have long known that the development of human societies was complex and often haphazard, but geneticists have persistently tried to explain present-day patterns of genetic variation using simple models.

Early genetic analyses of present-day populations revealed a mix of Asian (Taiwanese) and Papuan (New Guinea or nearby) ancestries throughout Remote Oceania, with maternally-inherited mitochondrial DNA being predominantly Asian, paternally-inherited Y chromosomes mainly Papuan, and autosomes intermediate [7]. This led to the simple model mentioned above of an Austronesian-speaking population starting out from Taiwan, developing the Lapita culture in the islands near New Guinea while mixing with local Papuans, and then boldly launching out into the unknown Pacific.

The surprise came with the first studies of ancient DNA, when early Lapita people from Vanuatu and Tonga (ca. 2,500-3,000 yBP) showed completely Asian genetic ancestry, so the Papuan genetic component must have entered later.

This is what the most recent ancient DNA papers found:


There thus seems to have been a migration of Papuan-ancestry people from the Bismarck archipelago off the coast of New Guinea, into the islands of Remote Oceania, shortly after those very islands were first settled by people from Asia. Few traces of such a migration and its cultural or technological underpinnings have been found in the archaeological record or in linguistic relationships, which is why it comes as such a surprise. The fact that these Near Oceanian people made the long journey to Vanuatu so soon after the Asian seafarers arrived in their neighbourhood, having had tens of thousands of years to do so previously, strongly suggests that the migration was somehow triggered by interactions with the new Austronesian-speaking arrivals and adoption of their sophisticated seafaring technology. The excess of Y chromosomes of Papuan origin in Remote Oceania, somewhat difficult to explain under the traditional model, might also make sense in the light of an active expansion of people from Near Oceania, as such expansions have often been found to be male-biased [10]. Both studies speculate that the arrival of these Papuan-ancestry people might have contributed to the end of the Lapita period and its cultural unity.

The very first settlers of Vanuatu would have spoken Austronesian languages, and the Papuan-ancestry people who arrived shortly after would very likely have spoken Papuan languages. Yet today, all languages of Vanuatu are Austronesian. The arrivals from Near Oceania thus seem to have largely replaced the first settlers but adopted their languages. Posth and colleagues [5] argue that the languages of Vanuatu actually contain some elements of Papuan origin, and that the ancient DNA results are compatible with a more gradual process of cultural interaction and genetic mixing, rather than sudden replacement. Nonetheless, linguistic continuity in the face of this almost complete genetic replacement is extremely unusual in human history, perhaps even unprecedented as Posth and colleagues [5] suggest.

We are now seeing, from the Anatolian expansion and from the formation of the Indo-Iranian community, that such processes were actually not as unusual as some had previously thought…


Two sources of archaic Denisovan ancestry in East Asia, one possibly after the isolation of Native Americans


Open access paper Analysis of Human Sequence Data Reveals Two Pulses of Archaic Denisovan Admixture, by Sharon L. Browning, Brian L. Browning, Zhou, Tucci, & Akey, Cell (2018).


Anatomically modern humans interbred with Neanderthals and with a related archaic population known as Denisovans. Genomes of several Neanderthals and one Denisovan have been sequenced, and these reference genomes have been used to detect introgressed genetic material in present-day human genomes. Segments of introgression also can be detected without use of reference genomes, and doing so can be advantageous for finding introgressed segments that are less closely related to the sequenced archaic genomes. We apply a new reference-free method for detecting archaic introgression to 5,639 whole-genome sequences from Eurasia and Oceania. We find Denisovan ancestry in populations from East and South Asia and Papuans. Denisovan ancestry comprises two components with differing similarity to the sequenced Altai Denisovan individual. This indicates that at least two distinct instances of Denisovan admixture into modern humans occurred, involving Denisovan populations that had different levels of relatedness to the sequenced Altai Denisovan.

Mean detected archaic sequence per individual (Mb)

The discussion on the potential implications of the paper:

Featured image, from the article: Contour Density Plots of Match Proportion of Introgressed Segments to the Altai Neanderthal and Altai Denisovan Genomes.


Genomics reveals four prehistoric migration waves into South-East Asia

Open access preprint article at bioRxiv Ancient Genomics Reveals Four Prehistoric Migration Waves into Southeast Asia, by McColl, Racimo, Vinner, et al. (2018).

Abstract (emphasis mine):

Two distinct population models have been put forward to explain present-day human diversity in Southeast Asia. The first model proposes long-term continuity (Regional Continuity model) while the other suggests two waves of dispersal (Two Layer model). Here, we use whole-genome capture in combination with shotgun sequencing to generate 25 ancient human genome sequences from mainland and island Southeast Asia, and directly test the two competing hypotheses. We find that early genomes from Hoabinhian hunter-gatherer contexts in Laos and Malaysia have genetic affinities with the Onge hunter-gatherers from the Andaman Islands, while Southeast Asian Neolithic farmers have a distinct East Asian genomic ancestry related to present-day Austroasiatic-speaking populations. We also identify two further migratory events, consistent with the expansion of speakers of Austronesian languages into Island Southeast Asia ca. 4 kya, and the expansion by East Asians into northern Vietnam ca. 2 kya. These findings support the Two Layer model for the early peopling of Southeast Asia and highlight the complexities of dispersal patterns from East Asia.

A model for plausible migration routes into Southeast Asia, based on the ancestry patterns observed in the ancient genomes.