Spread of Indo-European and Uralic speakers in ADMIXTURE


The following are updated files for unsupervised ADMIXTURE of most available ancient Eurasian samples with K=7. For reference, see PCA of ancient and modern Eurasian samples.

NOTE. For a precise interpretation of ancestry evolution, be sure to first check the posts on the expansion of “Steppe ancestry”, on the spread of Yamnaya ancestry with Indo-Europeans, and on the evolution of Corded Ware ancestry typical of modern Uralic populations.

ADMIXTURE timeline

This is a YouTube video similar to the one on Indo-Europeans and Y-DNA evolution:


Some comments

  • I have tried running supervised ADMIXTURE models by selecting distant populations based on PCAs and qpAdm results. The most accurate approximations the software can offer appear with a small number of components, between K=5 and K=7, whether supervised or unsupervised. Adding more ancestral populations gives increasingly odd results the more distant (in time) the analysed populations are from the selected samples.
  • Labels for ancestral components follow those commonly used in the literature, although supervised ADMIXTURE using the corresponding available samples (viz. Anatolia Neolithic for AHG, Iran Hotu and/or CHG for IHG, AG2, AG3 and Mal’ta for ANE, etc.) yields slightly different, less smooth outputs for some periods, especially among more recent populations.
  • Outputs depend on many different factors, and these files are intended as an overview of the evolution of these simplistic components. The number of available samples per period, potential ancestry changes within each conventionally selected period, and whether each available sample is representative of the territory it was recovered from, among many other factors, influence the outputs and the maps.
Unsupervised ADMIXTURE (K=7).
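For readers unfamiliar with what ADMIXTURE actually fits, here is a minimal sketch of the likelihood behind its model: each individual's genotype at a SNP is treated as two binomial draws whose allele frequency is the individual's ancestry proportions (Q) weighted by the allele frequencies of the K latent components (F). The sketch only scores a given Q and F; ADMIXTURE itself searches for the Q and F that maximise this likelihood, and in supervised runs the memberships of the reference individuals are fixed rather than estimated. The function name `admixture_loglik` and the toy numbers are hypothetical.

```python
import numpy as np


def admixture_loglik(G, Q, F):
    """Binomial log-likelihood of the ADMIXTURE model (constant terms dropped).

    G: (individuals x SNPs) genotypes coded 0/1/2 (copies of the counted allele)
    Q: (individuals x K) ancestry proportions; each row sums to 1
    F: (K x SNPs) frequency of the counted allele in each ancestral component
    """
    P = Q @ F                          # expected allele frequency per individual and SNP
    P = np.clip(P, 1e-12, 1 - 1e-12)   # guard against log(0)
    return float(np.sum(G * np.log(P) + (2 - G) * np.log(1 - P)))


# Toy example: one heterozygous individual with 50/50 ancestry from two hypothetical components.
G = np.array([[1]])
Q = np.array([[0.5, 0.5]])
F = np.array([[0.2], [0.6]])           # expected frequency = 0.5*0.2 + 0.5*0.6 = 0.4
print(admixture_loglik(G, Q, F))       # equals log(0.4) + log(0.6)
```

Because a model with K+1 components can always reproduce a K-component fit (by giving the extra component zero weight), the maximised likelihood never decreases as K grows. The likelihood alone therefore cannot tell which K is meaningful; ADMIXTURE offers cross-validation for that purpose.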

NOTE. In summary, ADMIXTURE results like the ones below may be used to develop new ideas, which must then be formally tested; they cannot be used to support anything. Don’t be like the Copenhagen group, randomly selecting “Steppe ancestry” with K=4, identifying this component as “Indo-Europeans”, and correlating its evolution with changes in vegetation composition in yet another obvious correlation = causation argument, with many confounding factors left unaccounted for…

Static ADMIXTURE + culture maps

Colours correspond to the components as labelled in the video and in the files below.

  1. Anatomically Modern Humans (PDF)
  2. Upper Palaeolithic (PDF)
  3. Epipalaeolithic (PDF)
  4. Early Mesolithic (PDF)
  5. Late Mesolithic (PDF)
  6. Neolithic and hunter-gatherer pottery (PDF)
  7. Early Eneolithic (PDF)
  8. Late Eneolithic (PDF)
  9. Early Chalcolithic (PDF)
  10. Late Chalcolithic (PDF)
  11. Early Bronze Age (PDF)
  12. Middle Bronze Age (PDF)
  13. Late Bronze Age (PDF)
  14. Early Iron Age (PDF)
  15. Late Iron Age (PDF)
  16. Antiquity (PDF)
  17. Middle Ages (PDF)

Natural interpolation maps of ADMIXTURE

The following maps offer natural neighbour interpolations of ancestral components in ancient DNA samples grouped by periods (conventionally selected following the same pattern as in the Prehistory Atlas).

  • Extrapolation (inferred ancestry beyond the frame created by the available samples on each map) is obtained by adding distant external locations (such as Greenland, the Arctic, Alaska…) with a value of 0.
  • Videos offer a dynamic timeline.
  • Click on the images to see a version with higher resolution.
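To illustrate how zero-valued external points shape an interpolated map, here is a hypothetical sketch in Python. It is not necessarily how these maps were produced: it uses the raster-based "discrete Sibson" approximation of natural neighbour interpolation, treats longitude/latitude as a flat plane, and the function name, sample coordinates and sentinel locations are all invented for illustration.

```python
import numpy as np


def discrete_sibson(sites, values, xs, ys):
    """Discrete (raster) approximation of Sibson natural neighbour interpolation.

    sites: (S, 2) sample coordinates; values: (S,) ancestry fractions.
    Every grid cell p takes the value of its nearest site and "scatters" it to all
    grid cells q within the circle centred on p whose radius is the distance from
    p to that nearest site; the value at q is the mean of what it receives.
    Memory grows with the square of the number of cells, so keep grids small.
    """
    gx, gy = np.meshgrid(xs, ys)                                        # (len(ys), len(xs))
    cells = np.column_stack([gx.ravel(), gy.ravel()])                   # (M, 2) cell centres
    d = np.linalg.norm(cells[:, None, :] - sites[None, :, :], axis=2)   # (M, S)
    nearest = d.argmin(axis=1)                                          # nearest site per cell
    radius = d[np.arange(len(cells)), nearest]                          # distance to that site
    cc = np.linalg.norm(cells[:, None, :] - cells[None, :, :], axis=2)  # (M, M)
    reaches = (cc <= radius[None, :]).astype(float)                     # reaches[q, p]: p scatters to q
    total = reaches @ values[nearest]
    count = reaches.sum(axis=1)                                         # >= 1: each cell reaches itself
    return (total / count).reshape(gy.shape)


# Hypothetical ancient samples on a flat (lon, lat) plane with their ancestry fractions.
samples = np.array([[5.0, 5.0], [3.0, 6.0]])
ancestry = np.array([0.8, 0.5])
# Hypothetical distant "sentinel" locations fixed at 0, standing in for
# Greenland, Alaska, etc., so that values fade towards the edges of the map.
sentinels = np.array([[0.0, 0.0], [0.0, 10.0], [10.0, 0.0], [10.0, 10.0]])

sites = np.vstack([samples, sentinels])
values = np.concatenate([ancestry, np.zeros(len(sentinels))])
grid = discrete_sibson(sites, values, np.linspace(0, 10, 21), np.linspace(0, 10, 21))
print(grid.round(2))
```

Without the zero-valued sentinels, cells beyond the outermost samples would largely repeat those samples' values; with them, the interpolated ancestry decays towards 0 away from the sampled area.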

WHG ancestry


AHG ancestry


ANE ancestry


“Siberian” ancestry

This ancestry peaks among Baikal HG, Ust’Belaya, Nganasans, or Ulchi, hence the different labels used.


Iran HG ancestry


ADMIXTURE maps by period

Click on each image for a higher resolution version.





Early Eneolithic


Late Eneolithic


Early Chalcolithic


Late Chalcolithic


Early Bronze Age


Middle Bronze Age


Late Bronze Age


Early Iron Age


Late Iron Age




Middle Ages


Modern populations



These are the samples used for interpolations in each period (except for modern populations, which are those included in the Reich Lab curated dataset):


Munda admixture probably happened during the ANI-ASI mixture


Preprint The genetic legacy of continental scale admixture in Indian Austroasiatic speakers, by Tätte et al. bioRxiv (2018).

Interesting excerpts:

Studies analysing mtDNA and Y chromosome markers have revealed a sex-specific admixture pattern of Southeast and South Asian ancestry components for Munda speakers. While close to 100% of mtDNA lineages present in Mundas match those in other Indian populations, around 65% of their paternal genetic heritage is more closely related to Southeast Asian than South Asian variation. Such a contrasting distribution of maternal and paternal lineages among the Munda speakers is a classic example of ‘father tongue hypothesis’. However, the temporality of this expansion is contentious. Based on Y-STR data the coalescent time of Indian O2a-M95 haplogroup was estimated to be >10 KYA. Recently, the reconstructed phylogeny of 8.8 Mb region of Y chromosome data showed that Indian O2a-M95 lineages coalesce within a clade nested within East/Southeast Asian within the last ~5-7 KYA. This date estimate sets the upper boundary for the main episode of gene flow of Y chromosomes from Southeast Asia to India.

Supplementary Figure S4. First two components of principal component analysis (PCA). Individuals and population medians (circles) are marked with abbreviations from population names. Different colours represent populations from different geographic areas and/or linguistic groups as shown on the legend on the right. For the full names of populations see Supplementary Table S1. PCA was performed using software EIGENSOFT 6.1.42 on the whole filtered dataset (1072 individuals), previously LD pruned as described in the title of Supplementary Figure S1. The first two principal components describe 5.13% and 2.57% of total variance.
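The percentages quoted for the first two components (5.13% and 2.57%) are each component's eigenvalue divided by the sum of all eigenvalues. A minimal NumPy sketch of that computation follows, assuming a simplified per-SNP normalisation similar in spirit to EIGENSOFT's (centre each SNP, then divide by √(p(1−p))); LD pruning and outlier removal are omitted, and the function name and simulated genotypes are hypothetical.

```python
import numpy as np


def pca_variance_explained(G, n_components=2):
    """PCA of an (individuals x SNPs) 0/1/2 genotype matrix.

    Each SNP is centred and divided by sqrt(p * (1 - p)), with p = mean / 2,
    a simplified EIGENSOFT-style normalisation.
    Returns the PC coordinates and the fraction of total variance per PC.
    """
    G = np.asarray(G, dtype=float)
    mean = G.mean(axis=0)
    p = mean / 2
    keep = (p > 0) & (p < 1)                       # drop monomorphic SNPs
    X = (G[:, keep] - mean[keep]) / np.sqrt(p[keep] * (1 - p[keep]))
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    eigenvalues = s ** 2                           # proportional to variance along each PC
    pcs = U[:, :n_components] * s[:n_components]   # individual coordinates
    return pcs, eigenvalues[:n_components] / eigenvalues.sum()


# Simulated (hypothetical) data: two groups of 30 individuals with drifted frequencies.
rng = np.random.default_rng(1)
freq_a = rng.uniform(0.1, 0.9, 500)
freq_b = np.clip(freq_a + rng.normal(0, 0.2, 500), 0.01, 0.99)
G = np.vstack([rng.binomial(2, freq_a, size=(30, 500)),
               rng.binomial(2, freq_b, size=(30, 500))])
pcs, frac = pca_variance_explained(G)
print(f"PC1 {frac[0]:.2%}, PC2 {frac[1]:.2%} of total variance")
```

With many weakly structured populations, as in a dataset of 1072 individuals, each leading component typically captures only a few percent of the total variance, which is why such low figures are normal.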

Admixture proportions suggest a novel scenario

Regardless of which West Asian population we used, we found that Munda speakers can be described on average as a mixture of ~19% Southeast Asian, 15% West Asian and 66% Onge (South Asian) components. Alternatively, the West and South Asian components of Munda could be modelled using a single South Asian population (Paniya), accounting on average to 77% of the Munda genome. When rescaling the West and South Asian (Onge) components to 1 to explore the Munda genetic composition prior to the introduction of the Southeast Asian component, we note that the West Asian component is lower (~19%) in Munda compared to Paniya (27%) (Supplementary Table S4: *Average_Lao=0). Consistently with qpGraph analyses in Narasimhan et al. (2018), this may point to an initial admixture of a Southeast Asian substrate with a South Asian substrate free of any West Asian component, followed by the encounter of the resulting admixed population with a Paniya-like population. Such a scenario would imply an inverse relationship between the Southeast and West Asian relative proportions in Munda or, in other words, the increase of Southeast Asian component should cause a greater reduction of the West Asian compared to the reduction in the South Asian component in Munda.
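The rescaling step in the excerpt is simple arithmetic: dropping the Southeast Asian share and renormalising gives 15 / (15 + 66) ≈ 0.185, i.e. the ~19% quoted. The second half of this sketch is a toy two-pulse model written only to illustrate the argument (it is not the authors' method): a Southeast Asian × Onge-like mixture without West Asian ancestry later mixes with a Paniya-like population carrying 27% West Asian ancestry. The names `rescale_without` and `two_pulse` and the parameters `s` and `m` are hypothetical.

```python
def rescale_without(props, drop):
    """Drop one component and renormalise the remaining ones to sum to 1."""
    rest = {k: v for k, v in props.items() if k != drop}
    total = sum(rest.values())
    return {k: v / total for k, v in rest.items()}


# Average Munda proportions as quoted in the excerpt.
munda = {"Southeast Asian": 0.19, "West Asian": 0.15, "Onge": 0.66}
print(rescale_without(munda, "Southeast Asian"))  # West Asian: 0.15 / 0.81 ≈ 0.185

PANIYA_WEST_ASIAN = 0.27  # Paniya West Asian share, from the excerpt


def two_pulse(s, m):
    """Toy model (hypothetical): pulse 1 mixes s Southeast Asian with (1 - s)
    Onge-like ancestry; pulse 2 mixes m of that population with (1 - m)
    of a Paniya-like population."""
    sea = m * s
    west = (1 - m) * PANIYA_WEST_ASIAN
    return {"Southeast Asian": sea, "West Asian": west, "Onge": 1 - sea - west}


# A larger m raises the Southeast Asian share and lowers the West Asian share of
# the remaining (non-Southeast-Asian) ancestry: the inverse relationship in the excerpt.
for m in (0.3, 0.45, 0.6):
    props = two_pulse(s=0.43, m=m)
    west_share = rescale_without(props, "Southeast Asian")["West Asian"]
    print(f"m={m}: SEA={props['Southeast Asian']:.3f}, West Asian (rescaled)={west_share:.3f}")
```

In this toy model the rescaled West Asian share equals 0.27·(1 − m)/(1 − m·s), which falls as m grows for any s below 1.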

The distribution of genetic components (K=13) based on the global ADMIXTURE analysis (Supplementary Figure S1, S2, S3) for a subset of populations on a map of South and Southeast Asia. The circular legend in the bottom left corner shows the ancestral components corresponding to the colours on pie charts. The sector sizes correspond to population median.

Dating the admixture event

In this study, we have replicated a result previously reported in Chaubey et al. (2011) that the Mundas lack one ancestral component (k2) that is characteristic to Indian Indo-European and Dravidian speaking populations. If this component came to India through one of the Indo-Aryan migrations then it would be fair to presume that the Munda admixture happened before this component reached India or at least before it spread all over the country. However, the admixture time computed here, falls in the exact same timeframe as the ANI-ASI mixture has been estimated to have happened in India through which the k2 component probably spread. Therefore, we propose that if the Munda admixture happened at the same time, it is possible for it to have happened in the eastern part of the country, east of Bangladesh, and later when populations from East Asia moved to the area, the Mundas migrated towards central India. Such a scenario, which may be further clarified by ancient DNA analyses, seems to be further supported by the fact that Mundas harbor a smaller fraction of West Asian ancestry compared to contemporary Paniya (Supplementary Table S4) and cannot therefore be seen as a simple admixture product of Southern Indian populations with incoming Southeast Asian ancestries.

Image from Damgaard et al. (2018). A summary of the four qpAdm models fitted for South Asian populations. For each modern South Asian population, we fit different models with qpAdm to explain their ancestry composition using ancient groups and present the first model that we could not reject in the following priority order: 1. Namazga_CA + Onge, 2. Namazga_CA + Onge + Late Bronze Age Steppe, 3. Namazga_CA + Onge + Xiongnu_IA (East Asian proxy), and 4. Turkmenistan_IA + Xiongnu_IA. Xiongnu_IA were used here to represent East Asian ancestry. We observe that while South Asian Dravidian speakers can be modeled as a mixture of Onge and Namazga_CA, an additional source related to Late Bronze Age steppe groups is required for IE speakers. In Tibeto-Burman and Austro-Asiatic speakers, an East Asian rather than a Steppe_MLBA source is required.
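The selection rule in the caption (report the first model, in a fixed priority order, that cannot be rejected) can be written as a short loop. This sketch assumes the qpAdm p-values have already been computed, and it uses a rejection threshold of 0.05, a common convention that the caption does not state; the p-values in the example are invented, and Xiongnu_IA refers to the Iron Age Xiongnu group.

```python
# Candidate source sets in the priority order given in the caption.
MODELS = [
    ("Namazga_CA", "Onge"),
    ("Namazga_CA", "Onge", "Late Bronze Age Steppe"),
    ("Namazga_CA", "Onge", "Xiongnu_IA"),
    ("Turkmenistan_IA", "Xiongnu_IA"),
]


def first_unrejected(p_values, alpha=0.05):
    """Return the first model whose qpAdm p-value is not below alpha, else None.

    p_values: one p-value per entry of MODELS, in the same order.
    alpha: assumed rejection threshold (not stated in the caption).
    """
    for model, p in zip(MODELS, p_values):
        if p >= alpha:
            return model
    return None


# Invented p-values: model 1 is rejected, model 2 is the first that fits.
print(first_unrejected([0.001, 0.21, 0.34, 0.0]))
```

Because the loop stops at the first fit, a population that would also fit a later model (here model 3) is still reported with model 2, so the priority order is itself part of the result.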

Linguistics and genome-wide data

(…) by and large, the linguistic classification justifies itself but Kharia and Juang do not fit in this simplification perfectly.

Once again, at the current level of detail of genetic studies, a clear dialectal division is often impossible for certain groups without fine-scale population studies and the help of linguistics and archaeology.

Featured image from open access paper by Chaubey et al. (2011).


South-East Asia samples include shared ancestry with Jōmon


New paper (behind paywall) The prehistoric peopling of Southeast Asia, by McColl et al. (Science 2018) 361(6397):88-92, based on a recent bioRxiv preprint.

Particularly interesting is this apparently new information involving a female sample from the Ikawazu Jōmon of Japan, ca. 570 BC (emphasis mine):

The two oldest samples — Hòabìnhians from Pha Faen, Laos [La368; 7950 to 7795 calendar years before the present (cal B.P.)] and Gua Cha, Malaysia (Ma911; 4415 to 4160 cal B.P.)—henceforth labeled “group 1,” cluster most closely with present-day Önge from the Andaman Islands and away from other East Asian and Southeast-Asian populations (Fig. 2), a pattern that differentiates them from all other ancient samples. We used ADMIXTURE (14) and fastNGSadmix (15) to model ancient genomes as mixtures of latent ancestry components (11). Group 1 individuals differ from the other Southeast Asian ancient samples in containing components shared with the supposed descendants of the Hòabìnhians: the Önge and the Jehai (Peninsular Malaysia), along with groups from India and Papua New Guinea.

We also find a distinctive relationship between the group 1 samples and the Ikawazu Jōmon of Japan (IK002). Outgroup f3 statistics (11, 16) show that group 1 shares the most genetic drift with all ancient mainland samples and Jōmon (fig. S12 and table S4). All other ancient genomes share more drift with present-day East Asian and Southeast Asian populations than with Jōmon (figs. S13 to S19 and tables S4 to S11). This is apparent in the fastNGSadmix analysis when assuming six ancestral components (K = 6) (fig. S11), where the Jōmon sample contains East Asian components and components found in group 1. To detect populations with genetic affinities to Jōmon, relative to present-day Japanese, we computed D statistics of the form D(Japanese, Jōmon; X, Mbuti), setting X to be different present-day and ancient Southeast Asian individuals (table S22). The strongest signal is seen when X = Ma911 and La368 (group 1 individuals), showing a marginally nonsignificant affinity to Jōmon (11). This signal is not observed with X = Papuans or Önge, suggesting that the Jōmon and Hòabìnhians may share group 1 ancestry (11).
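Both statistics mentioned in the excerpt, outgroup f3 and D, can be computed from population allele frequencies. The sketch below uses the plain frequency-based estimators with hypothetical function names and toy data; standard software for these statistics (e.g. ADMIXTOOLS) adds corrections for finite sample size and estimates standard errors with a block jackknife, both of which are omitted here.

```python
import numpy as np


def outgroup_f3(p_a, p_b, p_out):
    """Outgroup f3(Out; A, B) from per-SNP allele frequencies.

    Averages the product of the frequency differences A - Out and B - Out;
    higher values mean more genetic drift shared by A and B.
    """
    return float(np.mean((p_a - p_out) * (p_b - p_out)))


def d_statistic(p_w, p_x, p_y, p_z):
    """Patterson's D(W, X; Y, Z) from per-SNP allele frequencies.

    Expected to be ~0 if W and X form a clade relative to Y and Z; positive when
    W shares more alleles with Y (or X with Z), negative for the opposite pairing.
    """
    num = (p_w - p_x) * (p_y - p_z)
    den = (p_w + p_x - 2 * p_w * p_x) * (p_y + p_z - 2 * p_y * p_z)
    return float(num.sum() / den.sum())


# Toy frequencies at four SNPs (hypothetical): Y mirrors W, Z mirrors X.
w = np.array([0.9, 0.1, 0.8, 0.3])
x = np.array([0.1, 0.9, 0.2, 0.7])
print(d_statistic(w, x, w.copy(), x.copy()))  # positive: W and Y share alleles
print(outgroup_f3(w, w.copy(), x))            # positive: A and B drifted together
```

In the form D(Japanese, Jōmon; X, Mbuti), an excess affinity of X to Jōmon rather than to Japanese therefore pushes D below zero under this sign convention.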

Model for plausible migration routes into SEA. This schematic is based on ancestry patterns observed in the ancient genomes. Because we do not have ancient samples to accurately resolve how the ancestors of Jōmon and Japanese populations entered the Japanese archipelago, these migrations are represented by dashed arrows. A mainland component in Indonesia is depicted by the dashed red-green line. Gr, group; Kra, Kradai.

(…) Finally, the Jōmon individual is best-modeled as a mix between a population related to group 1/Önge and a population related to East Asians (Amis), whereas present-day Japanese can be modeled as a mixture of Jōmon and an additional East Asian component (Fig. 3 and fig. S29).
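To make the idea of "modelling one population as a mix of two others" concrete, here is a deliberately naive sketch: it finds the mixing weight α that best reproduces a target's allele frequencies as α·A + (1 − α)·B in the least-squares sense. This is not the method used in the paper, whose own modelling (Fig. 3 and fig. S29) is more involved; it ignores drift specific to the target and sampling noise, and the function name and simulated frequencies are hypothetical.

```python
import numpy as np


def two_source_alpha(p_target, p_a, p_b):
    """Least-squares weight alpha with p_target ≈ alpha * p_a + (1 - alpha) * p_b.

    Closed form: alpha = (t - b)·(a - b) / |a - b|^2, clipped to [0, 1].
    """
    diff = p_a - p_b
    alpha = np.dot(p_target - p_b, diff) / np.dot(diff, diff)
    return float(np.clip(alpha, 0.0, 1.0))


# Simulated frequencies standing in for an Önge-like source (A), an Amis-like
# source (B) and a target built as a 40/60 mixture plus a little noise.
rng = np.random.default_rng(0)
a = rng.uniform(0.05, 0.95, 2000)
b = rng.uniform(0.05, 0.95, 2000)
target = np.clip(0.4 * a + 0.6 * b + rng.normal(0, 0.02, 2000), 0, 1)
print(round(two_source_alpha(target, a, b), 2))  # close to 0.4
```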

This is interesting in relation to the SMBE oral communication O-03-OS02, Whole genome analysis of the Jomon remain reveals deep lineage of East Eurasian populations, by Gakuhari et al.:

Post late-Paleolithic hunter-gatherers lived throughout the Japanese archipelago, Jomonese, are thought to be a key to understanding the peopling history in East Asia. Here, we report a whole genome sequence (x1.85) of 2,500-year old female excavated from the Ikawazu shell-mound, unearthed typical remains of Jomon culture. The whole genome data places the Jomon as a lineage basal to contemporary and ancient populations of the eastern part of Eurasian continent, and supports the closest relationship with the modern Hokkaido Ainu. The results of ADMIXTURE show the Jomon ancestry is prevalent in present-day Nivkh, Ulchi, and people in the main-island Japan. By including the Jomon genome into phylogenetic trees, ancient lineages of the Kusunda and the Sherpa/Tibetan, early splitting from the rest of East Asian populations, is emerged. Thus, the Jomon genome gives a new insight in East Asian expansion. The Ikawazu shell-mound site locates on 34°38′43″ north latitude and 137°8′52″ east longitude in the central main-island of the Japanese archipelago, corresponding to a warm and humid monsoon region, which has been thought to be almost impossible to maintain sufficient ancient DNA for genome analysis. Our achievement opens up new possibilities for such geographical regions.


Genomics reveals four prehistoric migration waves into South-East Asia

Open access preprint article at bioRxiv Ancient Genomics Reveals Four Prehistoric Migration Waves into Southeast Asia, by McColl, Racimo, Vinner, et al. (2018).

Abstract (emphasis mine):

Two distinct population models have been put forward to explain present-day human diversity in Southeast Asia. The first model proposes long-term continuity (Regional Continuity model) while the other suggests two waves of dispersal (Two Layer model). Here, we use whole-genome capture in combination with shotgun sequencing to generate 25 ancient human genome sequences from mainland and island Southeast Asia, and directly test the two competing hypotheses. We find that early genomes from Hoabinhian hunter-gatherer contexts in Laos and Malaysia have genetic affinities with the Onge hunter-gatherers from the Andaman Islands, while Southeast Asian Neolithic farmers have a distinct East Asian genomic ancestry related to present-day Austroasiatic-speaking populations. We also identify two further migratory events, consistent with the expansion of speakers of Austronesian languages into Island Southeast Asia ca. 4 kya, and the expansion by East Asians into northern Vietnam ca. 2 kya. These findings support the Two Layer model for the early peopling of Southeast Asia and highlight the complexities of dispersal patterns from East Asia.

A model for plausible migration routes into Southeast Asia, based on the ancestry patterns observed in the ancient genomes.