Dzudzuana, Sidelkino, and the Caucasus contribution to the Pontic-Caspian steppe


It has been known for a long time that the Caucasus must have hosted many (at least partially) isolated populations, probably favoured by geographical barriers that set it apart from the open areas of Eurasia.

In his book Who We Are and How We Got Here, David Reich writes the following about India:

The genetic data told a clear story. Around a third of Indian groups experienced population bottlenecks as strong or stronger than the ones that occurred among Finns or Ashkenazi Jews. We later confirmed this finding in an even larger dataset that we collected working with Thangaraj: genetic data from more than 250 jati groups spread throughout India (…)

Rather than an invention of colonialism as Dirks suggested, long-term endogamy as embodied in India today in the institution of caste has been overwhelmingly important for millennia. (…)

The Han Chinese are truly a large population. They have been mixing freely for thousands of years. In contrast, there are few if any Indian groups that are demographically very large, and the degree of genetic differentiation among Indian jati groups living side by side in the same village is typically two to three times higher than the genetic differentiation between northern and southern Europeans. The truth is that India is composed of a large number of small populations.

There is little doubt now, based on findings spanning thousands of years, that the Mesolithic and Neolithic Caucasus hosted various very small populations, even if the ancestral components may be reduced to the few known to date (such as ANE, EHG, AME*, ENA, CHG, and other “deep” ancestral components).

NOTE. I will call the ancestral component of Dzudzuana/Anatolian hunter-gatherers Ancient Middle Easterner (AME), to give a clear idea of its likely extension during the Late Upper Palaeolithic, and to avoid using the more simplistic Dzudzuana, unless it is useful to mention these specific local samples.

Image modified from Lazaridis et al. (2018), including Caucasus, Don-Volga-Ural, and North Pontic Mesolithic-Neolithic populations. “Ancient West Eurasian population structure. (a) Geographical distribution of key ancient West Eurasian populations. (b) Temporal distribution of key ancient West Eurasian populations (approximate date in ky BP). (c) PCA of key ancient West Eurasians, including additional populations (shown with grey shells), in the space of outgroup f4-statistics (Methods).”

Genetic labs have a strong fixation on ancestry. I guess the use of complex statistical methods gives professionals and laymen alike the feeling of dealing with “Science”, as opposed to academic fields where you have to interpret data. I think language reveals a lot about the way people think, and the fact that ancestral components are called ‘lineages’ – while not wrong per se – is a clear symptom of the lack of interest in the true lineages: Y-DNA haplogroups.

Y-DNA bottlenecks

It has become quite clear that male-biased migrations are often the ones that can be followed with confidence to identify actual population movements and ethnolinguistic groups, at least until the Iron Age. The frequently used Palaeolithic clusters offer a clear example of why ancestry does not represent what some people believe: they merely give a basic idea of sizeable population replacements by distant peoples.

Both concepts are important: sizeable and distant peoples. For example, during the Upper Palaeolithic in Europe there was a sizeable population replacement of the Aurignacian Goyet cluster by the Gravettian Vestonice cluster (probably from populations of far eastern Russia) coupled with the arrival of haplogroup I, although during the thousands of years that this material culture lasted, the previously expanded C1a2 lineages did not disappear, and there were probably different resurgence and admixture events.

Haplogroup I certainly expanded with the Gravettian culture to Iberia, where the Goyet ancestry did not change much (probably because of male-driven migrations), to the extent that during the Magdalenian expansions haplogroup I expanded with an ancestry closer to Goyet, in what is called a ‘resurgence’ of the Goyet cluster, even though there is a clear replacement of male lines.

The Villabruna (WHG) cluster is another good example. It probably spread with haplogroup R1b-L754, which, based on the extra ‘East Asian’ affinity of some samples and on modern samples from the Middle East, probably came from the east through a southern route, not long before the expansion of WHG, likely from around the Black Sea (although this is still unclear). The finding of haplogroup I in samples of mostly WHG ancestry could confuse people who do not pay attention to timing, sub-structured populations, and gene flow.

Image from David Reich’s Who We Are and How We Got Here. Having migrated out of Africa and the Near East, modern human pioneer populations spread throughout Eurasia (1). By at least thirty-nine thousand years ago, one group founded a lineage of European hunter-gatherers that persisted largely uninterrupted for more than twenty thousand years (2). Eventually, groups derived from an eastern branch of this founding population of European hunter-gatherers spread west (3), displaced previous groups, and were eventually themselves pushed out of northern Europe by the spread of glacial ice, shown at its maximum extent (top right). As the glaciers receded, western Europe was repeopled from the southwest (4) by a population that had managed to persist for tens of thousands of years and was related to an approximately thirty-five-thousand-year-old individual from far western Europe. A later human migration, following the first strong warming period, had an even larger impact, with a spread from the southeast (5) that not only transformed the population of western Europe but also homogenized the populations of Europe and the Near East. At a single site—Goyet Caves in Belgium—ancient DNA from individuals spread over twenty thousand years reflects these transformations, with representatives from the Aurignacian, Gravettian, and Magdalenian periods.

NOTE. If you don’t understand why ‘clusters’ that span thousands of years don’t really matter for the many Palaeolithic population expansions that certainly happened among hunter-gatherers in Europe, just take a look at what happened with Bell Beakers expanding from Yamna into western Europe within 500 years.

If we don’t tread carefully when talking about population migrations, these terms are bound to confuse people, just as the fixation on “steppe ancestry” – which marks the arrival in Chalcolithic Europe of peoples from the Pontic-Caspian region – has confused many researchers to this day.

When I began to write about the Indo-European demic diffusion model, my concern was to find a single spot from which a North-West Indo-European proto-language could have expanded, starting ca. 2000 BC (our most common guesstimate). Based on the 2015 papers, and in spite of their conclusions, I thought it had become clear that this was not Corded Ware, but rather Bell Beakers. I assumed that Uralic was spoken to the north (as was the traditional belief), and thus that Corded Ware expanded from the forest zone, so steppe ancestry would also be found there with other R1a lineages.

With the publication of Mathieson et al. (2017) and Olalde et al. (2017), I changed my mind, seeing how “steppe ancestry” did in fact appear quite late, so it was likely the result of very specific population movements, probably directly from the Caucasus. Later, Mathieson published in a revised version the sample from Alexandria of hg R1a-M417 (probably R1a-Z645, possibly Z93+), which further supported the idea that the migration of Corded Ware peoples started near the North Pontic forest-steppe (as I included in the next revision).

The question remains the same one I repeated recently, though: where do the extra Caucasus components (i.e. beyond EHG) of Eneolithic Ukraine/Corded Ware and Khvalynsk/Yamna come from?

Steppe ancestry: “EHG” + “CHG”?

About EHG ancestry

From Lazaridis et al. (2018):

Considering 2-way mixtures, we can model Karelia_HG as deriving 34 ± 2.8% of its ancestry from a Villabruna-related source, with the remainder mainly from ANE represented by the AfontovaGora3 (AG3) sample from Lake Baikal ~17kya.
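To illustrate what a two-way mixture proportion such as the 34% Villabruna-related share means, here is a minimal Python sketch that fits a single mixing weight by least squares on allele frequencies. It is a deliberately simplified stand-in, not qpAdm (which works on f4-statistics relative to a set of outgroups and reports standard errors); the function name and all allele frequencies below are hypothetical, made up only for illustration.

```python
"""Toy two-way admixture fit on allele frequencies (a simplified stand-in, not qpAdm)."""


def fit_two_way_mixture(target, source_a, source_b):
    """Least-squares weight alpha for target ~ alpha*A + (1 - alpha)*B.

    Minimizes sum((t - b - alpha*(a - b))**2) over alpha, whose closed form is
    alpha = sum((t - b)*(a - b)) / sum((a - b)**2). The result is clipped to
    [0, 1], since a mixing proportion outside that range is not meaningful.
    """
    num = sum((t - b) * (a - b) for t, a, b in zip(target, source_a, source_b))
    den = sum((a - b) ** 2 for a, b in zip(source_a, source_b))
    if den == 0:
        raise ValueError("sources are identical; proportion is undefined")
    return min(1.0, max(0.0, num / den))


# Hypothetical allele frequencies at five SNPs (made up for illustration).
villabruna_like = [0.10, 0.80, 0.35, 0.60, 0.20]
ane_like = [0.70, 0.20, 0.55, 0.10, 0.90]
# Build a noise-free target that is exactly 34% source A and 66% source B.
target = [0.34 * a + 0.66 * b for a, b in zip(villabruna_like, ane_like)]

alpha = fit_two_way_mixture(target, villabruna_like, ane_like)
print(f"Villabruna-like share: {alpha:.2f}")
```

Real data add genetic drift and sampling noise on top of the mixture, which is why formal tools test model fit against outgroups instead of simply projecting frequencies as done here.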

AG3 was likely of haplogroup Q1a (as reported by YFull, see Genetiker), and probably the ANE ancestry found in Eastern Europe accompanied a Palaeolithic migration of Q1a2-M25 (formed ca. 22600 BC, TMRCA ca. 14300 BC).

NOTE. You can read more about the expansion of Q lineages during the Palaeolithic.

Taking into account what we know about the Eneolithic steppe and Caucasus populations, it is likely that ANE ancestry remained the most important component of some of the small ghost populations of the Caucasus until their emergence with the Lola culture.

Image modified from Wang et al. (2018). Samples projected in PCA of 84 modern-day West Eurasian populations (open symbols). Previously known clusters have been marked and referenced. Marked and labelled are the Balkan samples referenced in this text. An EHG ‘cloud’ and a Caucasus ‘cloud’ have been drawn, leaving Pontic-Caspian steppe and derived groups between them. See the original file here. To understand the drawn potential Caucasus Mesolithic cluster, see the PCA from Lazaridis et al. (2018) above.

The earliest sample now attributed to the EHG cluster is Sidelkino, from the Samara region (ca. 9300 BC), mtDNA U5a2. In Damgaard et al. (Science 2018), Yamnaya could be modelled as deriving 54% of its ancestry from a CHG population related to Kotias Klde and the remainder (>46%) from an ANE population related to Sidelkino, with the following split events:

  1. A split event, where the CHG component of Yamnaya splits from KK1. The model inferred this time at 27 kya (though we note the larger models in Sections S2.12.4 and S2.12.5 inferred a more recent split time).
  2. A split event, where the ANE component of Yamnaya splits from Sidelkino. This was inferred at about 11 kya.
  3. A split event, where the ANE component of Yamnaya splits from Botai. We inferred this to occur 17 kya. Note that this is above the Sidelkino split time, so our model infers Yamnaya to be more closely related to the EHG Sidelkino, as expected.
  4. An ancestral split event between the CHG and ANE ancestral populations. This was inferred to occur around 40 kya.

Other samples classified as of the EHG cluster:

  • Popovo2 (ca. 6250 BC) of hg J1, mtDNA U4d – Po2 and Po4 from the same site (ca. 6550 BC) show continuity of mtDNA.
  • Karelia_HG, from Juzhnii Oleni Ostrov (ca. 6300 BC): I0211/UzOO40 (ca. 6300 BC) of hg J1(xJ1a), mtDNA U4a; and I0061/UzOO74 of hg R1a1(xR1a1a), mtDNA C1
  • UzOO77 and UzOO76 from Juzhnii Oleni Ostrov (ca. 5250 BC) of mtDNA R1b.
  • Samara_HG from Lebyanzhinka (ca. 5600 BC) of hg R1b1a, mtDNA U5a1d.

From the analysis of Lazaridis et al. (2018), we have some details about their admixture:

Image modified from Lazaridis et al. (2018). Modeling present-day and ancient West-Eurasians. Mixture proportions computed with qpAdm (Supplementary Information section 4). The proportion of ‘Mbuti’ ancestry represents the total of ‘Deep’ ancestry from lineages that split prior to the split of Ust’Ishim, Tianyuan, and West Eurasians and can include both ‘Basal Eurasian’ and other (e.g., Sub-Saharan African) ancestry. (Left) ‘Conservative’ estimates. Each population cannot be modeled with fewer admixture events than shown. (Right) ‘Speculative’ estimates. The highest number of sources (≤5) with admixture estimates within [0,1] are shown for each population. Some of the admixture proportions are not significantly different from 0 (Supplementary Information section 4).

About Anatolia_Neolithic ancestry

About the enigmatic Anatolia_Neolithic-related ancestry found in Pontic-Caspian steppe samples, this is what Wang et al. (2018) had to say:

We focused on model of mixture of proximal sources such as CHG and Anatolian Chalcolithic for all six groups of the Caucasus cluster (Eneolithic Caucasus, Maykop and Late Maykop, Maykop-Novosvobodnaya, Kura-Araxes, and Dolmen LBA), with admixture proportions on a genetic cline of 40-72% Anatolian Chalcolithic related and 28-60% CHG related (Supplementary Table 7). When we explored Romania_EN and Greece_Neolithic individuals as alternative southeast European sources (30-46% and 36-49%), the CHG proportions increased to 54-70% and 51-64%, respectively. We hypothesize that alternative models, replacing the Anatolian Chalcolithic individual with yet unsampled populations from eastern Anatolia, South Caucasus or northern Mesopotamia, would probably also provide a fit to the data from some of the tested Caucasus groups.


The first appearance of ‘Near Eastern farmer related ancestry’ in the steppe zone is evident in Steppe Maykop outliers. However, PCA results also suggest that Yamnaya and later groups of the West Eurasian steppe carry some farmer related ancestry as they are slightly shifted towards ‘European Neolithic groups’ in PC2 (Fig. 2D) compared to Eneolithic steppe. This is not the case for the preceding Eneolithic steppe individuals. The tilting cline is also confirmed by admixture f3-statistics, which provide statistically negative values for AG3 as one source and any Anatolian Neolithic related group as a second source.
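Why a negative admixture f3-statistic points to mixture can be shown with a minimal Python sketch of f3(C; A, B), computed here as the mean over SNPs of `(c - a)(c - b)` from population allele frequencies. This is the textbook, uncorrected form: tools such as ADMIXTOOLS additionally apply a finite-sample correction and estimate standard errors by block jackknife, both omitted here. The function name and all frequencies are hypothetical, for illustration only.

```python
"""Uncorrected admixture f3-statistic from allele frequencies (toy sketch)."""


def f3(target, source_a, source_b):
    """Return mean((c - a) * (c - b)) over SNPs for f3(C; A, B).

    For a target that is a pure mixture c = w*a + (1 - w)*b, each term equals
    -w*(1 - w)*(a - b)**2, so the statistic is negative. Drift specific to the
    target adds a positive contribution that can mask this signal.
    """
    terms = [(c - a) * (c - b) for c, a, b in zip(target, source_a, source_b)]
    return sum(terms) / len(terms)


# Hypothetical frequencies for an "AG3-like" and an "Anatolia_N-like" source.
ag3_like = [0.80, 0.15, 0.60, 0.30]
anatolia_like = [0.20, 0.70, 0.10, 0.75]
# A target formed as an exact 70/30 mixture of the two sources.
mixed = [0.7 * a + 0.3 * b for a, b in zip(ag3_like, anatolia_like)]
# A target identical to one source (no admixture) gives exactly zero.
unmixed = list(ag3_like)

print(f3(mixed, ag3_like, anatolia_like) < 0)  # True: negative, an admixture signal
print(f3(unmixed, ag3_like, anatolia_like))  # 0.0
```

Because target-specific drift pushes f3 upwards, a non-negative value does not rule out admixture; only a significantly negative one is taken as positive evidence for it.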

Modified image from Wang et al. (2018). In blue, Yamna-related populations. In red, Corded Ware-related populations, and two elevated Anatolia_Neolithic values in Yamna. Notice how only GAC-related admixture increases the Anatolian_N-related ancestry in the Yamna outlier from Ozero, and the late Yamna sample from Hungary, related to the homogeneous Yamna population. “Supplementary Table 14. P values of rank=3 and admixture proportions in modelling Steppe ancestry populations as a four-way admixture of distal sources EHG, CHG, Anatolian_Neolithic and WHG using 14 outgroups. Left populations: Steppe cluster, EHG, CHG, WHG, Anatolian_Neolithic. Right populations: Mbuti.DG, Ust_Ishim.DG, Kostenki14, MA1, Han.DG, Papuan.DG, Onge.DG, Villabruna, Vestonice16, ElMiron, Ethiopia_4500BP.SG, Karitiana.DG, Natufian, Iran_Ganj_Dareh_Neolithic.”

Detailed exploration via D-statistics in the form of D(EHG, steppe group; X, Mbuti) and D(Samara_Eneolithic, steppe group; X, Mbuti) show significantly negative D values for most of the steppe groups when X is a member of the Caucasus cluster or one of the Levant/Anatolia farmer-related groups (Supplementary Figs. 5 and 6). In addition, we used f- and D-statistics to explore the shared ancestry with Anatolian Neolithic as well as the reciprocal relationship between Anatolian- and Iranian farmer-related ancestry for all groups of our two main clusters and relevant adjacent regions (Supplementary Fig. 4). Here, we observe an increase in farmer-related ancestry (both Anatolian and Iranian) in our Steppe cluster, ranging from Eneolithic steppe to later groups. In Middle/Late Bronze Age groups especially to the north and east we observe a further increase of Anatolian farmer related ancestry consistent with previous studies of the Poltavka, Andronovo, Srubnaya and Sintashta groups and reflecting a different process not especially related to events in the Caucasus.

(…) Surprisingly, we found that a minimum of four streams of ancestry is needed to explain all eleven steppe ancestry groups tested, including previously published ones (Fig. 2; Supplementary Table 12). Importantly, our results show a subtle contribution of both Anatolian farmer-related ancestry and WHG-related ancestry (Fig.4; Supplementary Tables 13 and 14), which was likely contributed through Middle and Late Neolithic farming groups from adjacent regions in the West.

The discovery of a quite old AME ancestry has probably rendered this explanation unnecessary, because this admixture from an Anatolian-like ghost population could have been contributed even by small populations from the Caucasus.
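For readers less familiar with the D(EHG, steppe group; X, Mbuti) notation quoted from Wang et al. (2018), here is a minimal Python sketch of a frequency-based D-statistic D(A, B; C, D), with numerator `(a - b)(c - d)` and normalising denominator `(a + b - 2ab)(c + d - 2cd)` summed over SNPs. It is a simplified illustration: there are no block-jackknife standard errors (hence no Z-scores), and the function name and frequencies are hypothetical. With an outgroup such as Mbuti in the last position, a negative value means that B shares more alleles with C than A does, which is how the negative values for Caucasus and farmer-related X populations are read.

```python
"""Frequency-based D-statistic D(A, B; C, D), without standard errors (toy sketch)."""


def d_statistic(pop_a, pop_b, pop_c, pop_d):
    """Return sum((a - b)*(c - d)) / sum((a + b - 2ab)*(c + d - 2cd)).

    With D as an outgroup, a negative value means B shares more alleles with C
    than A does; a positive value means A shares more alleles with C than B does.
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in zip(pop_a, pop_b, pop_c, pop_d):
        num += (a - b) * (c - d)
        den += (a + b - 2 * a * b) * (c + d - 2 * c * d)
    if den == 0:
        raise ValueError("no informative sites")
    return num / den


# Hypothetical derived-allele frequencies; the outgroup carries only the ancestral allele.
ehg_like = [0.0, 0.1, 0.2, 0.0]
steppe_like = [0.6, 0.5, 0.7, 0.4]
caucasus_like = [0.9, 0.8, 0.9, 0.7]
outgroup = [0.0, 0.0, 0.0, 0.0]

print(d_statistic(ehg_like, steppe_like, caucasus_like, outgroup) < 0)  # True
```

Swapping the first two populations flips the sign of the statistic, so the order of EHG and the steppe group in the quoted statistics matters when reading the results.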

Image modified from Wang et al. (2018). Marked are: in red, approximate limit of Anatolia_Neolithic ancestry found in Yamna populations; in blue, Corded Ware-related groups. “Modelling results for the Steppe and Caucasus cluster. Admixture proportions based on (temporally and geographically) distal and proximal models, showing additional Anatolian farmer-related ancestry in Steppe groups as well as additional gene flow from the south in some of the Steppe groups as well as the Caucasus groups (see also Supplementary Tables 10, 14 and 20).”

NOTE. For a detailed account of the possibilities regarding this differential admixture in the North Pontic area in contrast to the Don-Volga-Ural region, you can read the posts Sredni Stog, Proto-Corded Ware, and their “steppe admixture”, and Corded Ware culture origins: The Final Frontier.

While it is not yet fully clear, the increased Anatolian_Neolithic-like ancestry in Ukraine_Eneolithic samples (see below) makes it unlikely that all such ancestry in Corded Ware groups comes from a GAC-related contribution. It is likely that at least part of it represents contributions from populations of the Caucasus, based on the mostly westward population movements in the steppe from ca. 4600 BC on, including the Suvorovo-Novodanilovka expansion, and especially the Kuban-Maykop expansion during the final Eneolithic into the North Pontic area.

NOTE. Since CHG-like groups from the Caucasus may have combinations of AME and ANE ancestry similar to Yamna (which may thus appear as ‘steppe ancestry’ in the North Pontic area), it is impossible to interpret with precision the following ADMIXTURE graphic:

Modified image from Mathieson et al. (2018). Supervised ADMIXTURE analysis, modelling each ancient individual (one per row) as a mixture of population clusters constrained to contain northwestern-Anatolian Neolithic (grey), Yamnaya from Samara (yellow), EHG (pink) and WHG (green) populations. Dates in parentheses indicate approximate range of individuals in each population.

North-Eastern Technocomplex

The East Asian contribution to WHG samples (like Loschbour or La Braña), as specified in Fu et al. (2016), does not seem to be related to Baikal_EN, and appears possibly (in the ADMIXTURE analysis) integrated into the Villabruna component. I guess this implies that the alleles shared with East Asians are quite early, and potentially due to the expansion of R1b-L754 from the East.

It would be interesting to know the specific material culture Sidelkino belonged to (i.e. whether it was related to the expansion of the North-Eastern Technocomplex), as well as its Y-DNA. The Post-Swiderian expansion into eastern Europe, probably associated with the expansion of R1b-P297 lineages (including R1b-M73, found later in Botai and in Baltic HG), is supposed to have begun during the 11th millennium BC, but migrations to the Urals and beyond are probably concentrated in the 9th millennium, so this sample is possibly slightly early for R1b.

NOTE. User Rozenfeld at Anthrogenica posted this, which I think is interesting (in case anyone wants to try a Y-SNP call):

there is something strange with Sidelkino EHG: first, its archaeological context is not described in the supplementary. Second, its sex is not listed in the supplementary tables. Third, after looking for info about this sample, I found that: “Сиделькино-3. Для снятия вопроса о половой принадлежности индивида была проведена генетическая экспертиза, выявившая принадлежность останков мужчине.”(translation: Sidelkino-3. To resolve the question about sex of the remains, the genetic analysis was conducted, which showed that remains belonged to male), source:

So either they haven’t mentioned his Y-DNA in the paper for some reason, or there are more than one Sidelkino sample and the male one has not yet been published. The coverage of the Sidelkino sample from the paper is 2.9, more than enough to tell Y-DNA haplogroup.

Map of the spread of Post-Swiderian and Post-Krasnosillian sites in the Mesolithic of Eastern Europe in the 8th millennium BC. From Zaliznyak (see here).

My speculative guess right now about specific population movements in far eastern Europe, based on the few data we have:

  • The expansion of the North-Eastern Technocomplex, first around the 9th millennium BC, most likely spread R1b-P297 (ca. 11300 BC, judging by its TMRCA), ancestral to both R1b-M73 (TMRCA ca. 5300 BC) and R1b-M269 (TMRCA ca. 4400 BC), probably with extra El Mirón ancestry, and would thus be associated with Eurasiatic.
  • The expansion of haplogroup J1 to the north may have happened before or after the R1b-P297 expansion. Judging by the increase in AG3-related ancestry near Karelia compared to Baltic_HG, it is possible that it expanded just after R1b-P297 (hence possibly J1-Y6304? TMRCA 9700 BC). Its long-lasting presence in the Caucasus is supported by the Satsurblia (ca. 11300 BC) and the Dolmen BA (ca. 1300 BC) samples.
  • The expansion of R1a-M17 ca. 6600 BC is still likely to have happened from the east, based on the R1a-M17 samples found in Baikalic cultures slightly later (ca. 5300 BC). The presence of elevated Baikal_EN ancestry in Karelia HG and in Samara HG, and the finding of R1a-M417 samples in the Forest Zone after the Mesolithic, suggest a connection with the expansion of hunter-gatherer pottery, from the Elshanka culture in the Samara region northward into the Forest Zone and westward into the North Pontic area.
  • The expansion of R1b-M73 ca. 5300 BC is likely to be associated with the emergence of a group east of the Urals (related to the later Botai culture, and potentially Pre-Yukaghir). Its presence in a Narva sample from Donkalnis (ca. 5200 BC) suggests either an early split and spread of both R1b-P297 lineages (M73 and M269) through Eastern Europe, or maybe a back-migration with hunter-gatherer pottery.
  • R1b-M269 spread successfully ca. 4400 BC (and R1b-L23 ca. 4100 BC, both based on TMRCA), and this successful expansion is probably to be associated with the Khvalynsk-Novodanilovka expansion. We already know that Samara_HG ca. 5600 BC was R1b1a, so it is likely that R1b-M269 appeared (or ‘resurged’) in the Volga-Ural region shortly after the expansion of R1a-M17, whose expansion through the region may be inferred from the additional AG3 and Baikal_EN ancestry. What is interesting about Samara_HG, compared to the earlier Sidelkino sample, is the introduction of more El Mirón-related ancestry, typical of WHG populations (and thus of Baltic groups).

NOTE. The TMRCA dates are obviously gross approximations, because a) the actual rate of mutation is unknown and b) TMRCA estimates are based on the convergence of lineages that survived. The potential finding of R1a-Z645 (possibly Z93+) in Ukraine Eneolithic (ca. 4000 BC) and the potential finding of R1b-L23 in Khvalynsk ca. 4250 BC complicate things further, in terms of dates and origins of any subclade.
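The first caveat can be made concrete with a minimal Python sketch of the simplest SNP-counting approach to a TMRCA: the average number of SNPs accumulated along the sampled descendant lineages since their common ancestor, multiplied by an assumed number of years per SNP. The SNP counts, the years-per-SNP rates, and the function names are hypothetical placeholders, not the values or methods used by YFull or any other service; the point is only that the estimate scales linearly with the assumed rate.

```python
"""Naive SNP-counting TMRCA estimate and its sensitivity to the assumed mutation rate."""


def tmrca_years(snp_counts, years_per_snp):
    """Average SNPs per descendant lineage times an assumed years-per-SNP rate."""
    if not snp_counts:
        raise ValueError("need at least one lineage")
    mean_snps = sum(snp_counts) / len(snp_counts)
    return mean_snps * years_per_snp


def years_bp_to_bc(years_bp, present=1950):
    """Convert years before present to a calendar year BC.

    Assumes 'present' = AD 1950 (the radiocarbon convention); negative results
    would be AD dates.
    """
    return years_bp - present


# Hypothetical SNP counts along four sampled lineages below the same ancestor.
lineage_snps = [40, 44, 47, 41]  # mean = 43
for rate in (120, 144, 170):  # hypothetical years-per-SNP values
    age = tmrca_years(lineage_snps, rate)
    print(f"{rate} y/SNP -> {age:.0f} years BP, ca. {years_bp_to_bc(age):.0f} BC")
```

A shift of a few tens of years per SNP moves the estimate by more than a millennium here, and only lineages that survived to be sampled enter the average, which is the second caveat mentioned above.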

The question thus remains as it was long ago: did R1b-M269 lineages expand (‘return’) from the east, near the Urals, or directly from the north? Were they already near Samara at the same time as the expansion of hunter-gatherer pottery, and were not much affected by it? Or did they ‘resurge’ from populations admixed with Caucasus-related ancestry after the expansion of R1a-M17 with this pottery (since there are different stepped expansions from the Samara region)? We could even ask, did R1a-M17 really expand from the east, i.e. are the dates on Baikalic subclades from Moussa et al. (2016) reliable? Or did R1a-M17 expand from some pockets in the Pontic-Caspian steppe, taking over the expansion of HG pottery at some point?

Early Neolithic cultures in eastern and central Europe: 1–Yelshanian; 2–North Caspian; 3–Rakushechnyj Yar; 4–Surskian; 5–Dnieper-Donetsian; 6– Bug-Dniesterian; 7–Upper Volga; 8–Narvian; 9–Linear Pottery. White arrows: expansion of early farming; black arrows: spread of pottery-making traditions. From Dolukhanov et al. (2009).

Maglemose-related migrations

The most interesting aspect of the new paper (regarding Indo-Uralic migrations) is that Ancient Middle Easterner (AME) ancestry will probably be a better proxy for the Anatolia_Neolithic component found in Ukraine Mesolithic to Eneolithic, and possibly also for some of the “more CHG-like” component found among Pontic-Caspian steppe populations, all likely derived from different admixture events with groups from the Caucasus.

NOTE. Even the supposed gene flow of Neolithic Iranian ancestry into the Caucasus can be called into question, since it might represent a Dzudzuana-like population with a greater proportion of “deep ancestry” than that found in CHG, which may yet be found within the Caucasus.

If it was not clear already that following ‘steppe ancestry’ wherever it appears is a rather lame way of following Indo-European migrations, every single sample from the Caucasus and its admixture with Pontic-Caspian steppe populations will probably show that “steppe ancestry” is in fact formed by a variety of steppe-related ancestral components, impossible to follow coherently with a single population. This is exactly what is already happening with Siberian ancestry.

If the paper on the Dzudzuana samples has shown anything, it is that the expansion of an ANE-like population shook the entire Caucasus area up to the Zagros Mountains, creating the ANE – AME cline represented by CHG and Iran_N, with further contributions of “deep ancestries” (probably from the south) complicating the picture further.

If this is seen with so few known samples, and we know of an ANE-like ghost population in the Caucasus (appearing later in the Lola culture), we can already guess that the often repeated “CHG component” found in Ukraine_Eneolithic and Khvalynsk will not be the same (except for the part mediated by the Novodanilovka expansion).

This ANE-like expansion probably happened in the Late Upper Palaeolithic and reached Northern Europe after the expansion of the Villabruna cluster (ca. 12000 BC), judging by the advance of AG3-like and ENA-like ancestry in later WHG samples.

The population movements during the Mesolithic and Early Neolithic in the North Pontic area are quite complicated: the extra AME ancestry is probably connected to the admixture with populations from the Caucasus, while the close similarity of Ukraine populations with Scandinavian ones (with an increase in Villabruna ancestry from Mesolithic to Neolithic samples) probably reveals population movements related to the expansion of Maglemose-related groups.

Etno-cultural situation in Central and Eastern Europe in the Late Mesolithic — Early Neolithic (VI—V Mill. BC) (after Конча 2004: 201, карта 1; made after ideas by L. L. Zaliznyak). Legend: 1 — Maglemose circle in the VII Mill. BC (after Gr. Clark); 2—7 — Mesolithic cultures of the Post-Maglemose tradition, VI Mill. BC (after S. Kozłowsky, L. L. Zaliznyak): 2 — de Leyen-Wartena; 3 — Oldesloe — Godenaa; 4 — Chojnice — Peńki; 5 — Janisłavice; 6 — finds of Janisłavice artefacts outside of the main area; 7 — Donets culture; 8 — directions of the settling of Janisłavice people (after S. Kozłowsky and L. L. Zaliznyak); 9 — the south border of Mesolithic and Early Neolithic cultures of post-Swidrian and post-Arensburgian traditions; 10 — northern border of settlement of the Balkan-Danubian farmers; 11 — Bug- Dniester culture; 12 — Neolithic cultures emerged on the ethno-cultural basis of post-Maglemose: Э — Ertebölle-Ellerbeck, Н — Neman, Д — Dnieper-Donets, М — Mariupol (western variants). From Klein (2017).

These Maglemose-related groups were probably migrants from the north-west, originally from the Northern European Plains, who occupied the previous Swiderian territory, and then expanded into the North Pontic area. The overwhelming presence of I2a (likely all I2a2a1b1b) lineages in Ukraine Neolithic supports this migration.

The likely picture of Mesolithic-Neolithic migrations in the North Pontic area right now is then:

  1. Expansion of R1a-M459 from the east ca. 12000 BC – probably coupled with AG3 and also some Baikal_EN ancestry. The first sample is I1819 from Vasilievka (ca. 8700 BC), and another is from Dereivka (ca. 6900 BC).
  2. Expansion of R1b-V88 from the Balkans in the west ca. 9700 BC, based on its TMRCA and also on the Balkan hunter-gatherer population, overwhelmingly of this haplogroup from the 10th millennium until the Neolithic. The first sample is I1734 from Vasilievka (ca. 7252 BC), which suggests that it replaced the male population there, based on their similar EHG-like admixture (and lack of sizeable WHG increase), and shared mtDNA U5b2, U5a2.
  3. Expansion of I2a-Y5606 probably ca. 6800 BC, based on its TMRCA, with the Janislawice culture. Supporting this is the increase in WHG contribution to Neolithic samples, including the spread of U4 subclades compared to the previous period.
  4. Expansion of R1a-M17 starting probably ca. 6600 BC in the east (see above).

NOTE. The first sample of haplogroup I appears in the Mesolithic: I1763 (ca. 8100 BC) of haplogroup I2a1, probably related to an older Upper Palaeolithic expansion.

Distribution of archeological cultures in the North Pontic Region during the Mesolithic (7th – 6th millennium BCE). Dotted, dashed and solid lines with corresponding arrows indicate alternative models of the spread of the Grebenyky culture groups. (After Bryuako IV., Samojlova TL., Eds, Drevnie kul’tury Severo-Zapadnogo Prichernomor’ya, Odessa: SMIL, 2013.) Nikitin – Ivanova 2017.


It is becoming more and more clear with each new paper that – unless the number of very ancient samples increases – the use of Y-chromosome haplogroups remains one of the most important tools for academics; this is especially so in the steppes, in light of the diversity found in populations from the Caucasus. A clear example comes from the Yamna – Corded Ware similarities:

After the publication of the 2015 papers, it was likely that Yamna expanded with haplogroup R1b-L23, but it has only become crystal clear that Yamna expanded through the steppes into Bell Beakers, now that we have data about the strict genetic homogeneity of the whole Yamna population from west to east (including Afanasevo), in contrast with contemporary Corded Ware peoples which expanded from a different forest-steppe population.

The presence of haplogroups Q and R1a-M459 (xM17) in Khvalynsk along with an R1b1a sample, which some have interpreted as evidence of a ‘mixed’ population akin to modern ones, is likely to point instead to a period of Khvalynsk-Novodanilovka expansion with R1b-M269, in which different small populations from the steppe were being integrated into the common Khvalynsk stock, while differences remained in the material culture surrounding their burials. This is supported by the finding of R1b1 in the Kuban area already in the first half of the 5th millennium. The case would be similar to that of the early ‘mixed’ Icelandic population.

Only after the emergence of the Samara culture (in the second half of the 6th millennium BC), with a sample of haplogroup R1b1a, does the connection with Early Proto-Indo-Europeans become obvious; and only after the appearance of late Sredni Stog and haplogroup R1a-M417 (ca. 4000 BC) is its connection with Uralic also clear. In previous population movements, I think more haplogroups were involved in migrations of small groups, and only some communities among them were eventually successful, becoming dominant and creating ever-growing cultures during their expansions.

Indeed, if you think of Uralic and Indo-European just as converging languages, and forget their potential genetic connection, then the genetic + linguistic picture becomes simpler, and the upper frontier of the 6th millennium BC, with a North Pontic (Mariupol) vs. Volga-Ural (Samara) division, is enough. However, tracing their movements backwards (with cultural expansions from west to east with the expansion of farming, earlier from east to west with hunter-gatherer pottery, and still earlier from west to east with the North-Eastern Technocomplex) offers an interesting way to assess their potential connection to macrofamilies, at least in terms of population movements.

Modified image from Tambets et al. (2018): “Proportions of ancestral components in studied European and Siberian populations and the tested qpGraph model. (a) The qpGraph model fitting the data for the tested populations. Colour codes for the terminal nodes: pink, modern populations (‘Population X’ refers to test population); yellow, ancient populations (aDNA samples and their pools). Nodes coloured other than pink or yellow are hypothetical intermediate populations. We putatively named nodes which we used as admixture sources using the main recipient among known populations. The colours of intermediate nodes on the qpGraph model match those on the admixture proportions panel.” The NeolL (Neolithic Levant) ancestry selected in this qpGraph is likely to correspond (at least in part) to a specific Dzudzuana-like component present in the CHG-like population that admixed in the North Pontic area.

I am quite convinced right now that it would be possible to connect the expansion of R1b-L754 subclades with a speculative Nostratic, given the R1b-V88 connection with Afroasiatic and the obvious connection of R1b-L297 with Eurasiatic. Paradoxically, the connection of an Indo-Uralic community in the steppes (after the separation of Yukaghir) with any lineage expansion (R1a-M17, R1b-M269, or even Q, I or J1) seems somewhat blurrier than it did a year ago. This is possibly just because there are too many open possibilities.

David Reich says about the admixture with Neanderthals, which he helped discover:

At the conclusion of the Neanderthal genome project, I am still amazed by the surprises we encountered. Having found the first evidence of interbreeding between Neanderthals and modern humans, I continue to have nightmares that the finding is some kind of mistake. But the data are sternly consistent: the evidence for Neanderthal interbreeding turns out to be everywhere. As we continue to do genetic work, we keep encountering more and more patterns that reflect the extraordinary impact this interbreeding has had on the genomes of people living today.

I think many of us who have made proposals about anything share this feeling: the fear of having made a gross, evident mistake, which makes us constantly look for flaws. However, it seems to me that geneticists are more preoccupied with being wrong in the statistical methods they develop and the theoretical models they create. They seem much less concerned with errors in the true ancient ethnolinguistic picture that human population genetics is (at least in theory) concerned with. Their publications are, after all, constantly associating genetic finds with cultures and (whenever possible) languages, so this aspect of their research should not be taken lightly.

David Anthony and Razib Khan (among many others) have changed their previously preferred migration models as new data were published, while continuing to be respected in their own fields. Seeing this, I guess we can be confident that professionals with integrity are going to accept whatever new picture appears. I don’t think that genetic finds can change what we can reconstruct with comparative grammar. Still, I am ready to revise guesstimates and routes of expansion of certain dialects if R1a-Z645 is shown to have accompanied Late Proto-Indo-Europeans during their expansion with Yamna, and to have later integrated somehow with Corded Ware.

However, consider the obsession of some with an ancestral, uninterrupted R1a–Indo-European association, and the lack of actual political repercussions of Neanderthal admixture. Given both, I think the nightmare that all genetic researchers should worry about most is to keep agitating this “Yamnaya ancestry”-based hornet’s nest, constantly stirred up over the past two years. They agitate it by rejecting the concept – or, rather, by breaking it down into its true complex nature.

This succession of corrections and redefinitions, coupled with the distinct Y-DNA bottleneck of each steppe population, will eventually lead to a completely different ethnolinguistic picture of the Pontic-Caspian region during the Eneolithic. That picture is likely to piss off not only reasonable academics stubbornly attached to the CWC-IE idea, but also some of those interested in daydreaming about their patrilineal ancestors.

Sometimes it’s better to just rip off the band-aid once and for all…

Featured image from The oldest pottery in hunter-gatherer communities and models of Neolithisation of Eastern Europe (2015), by Andrey Mazurkevich and Ekaterina Dolbunova.


Also interesting is today’s post at Ancient DNA Era: Is Male-driven Genetic Replacement always meaning Language-shift?

The Iron Age expansion of Southern Siberian groups and ancestry with Scythians


Maternal genetic features of the Iron Age Tagar population from Southern Siberia (1st millennium BC), by Pilipenko et al. (2018).

Interesting excerpts (emphasis mine):

The positions of non-Tagar Iron Age groups in the MDS plot were correlated with their geographic position within the Eurasian steppe belt and with frequencies of Western and Eastern Eurasian mtDNA lineages in their gene pools. Series from chronological Tagar stages (similar to the overall Tagar series) were located within the genetic variability (in terms of mtDNA) of Scythian World nomadic groups (Figs 5 and 6; S4 and S6 Tables). Specifically, the Early Tagar series was more similar to western nomads (North Pontic Scythians), while the Middle Tagar was more similar to the Southern Siberian populations of the Scythian period. The Late Tagar group (Tes’ culture) belonging to the Early Xiongnu period had the “western-most” location on the MDS plot with the maximal genetic difference from Xiongnu and other eastern nomadic groups (but see Discussion concerning the low sample size for the Tes’ series).

In a comparison of our Tagar series with modern populations in Eurasia, we detected similarity between the Tagar group and some modern Turkic-speaking populations (with the exception of the Indo-Iranian Tajik population) (Fig 7; S2 Table). Among the modern Turkic-speaking groups, populations from the western part of the Eurasian steppe belt, such as Bashkirs from the Volga-Ural region and Siberian Tatars from the West Siberian forest-steppe zone, were more similar to the Tagar group than modern Turkic-speaking populations of the Altay-Sayan mountain system (including the Khakassians from the Minusinsk basin) (Fig 7).

Location of Tagar archaeological sites from which samples for this study were obtained. Burial grounds: 1—Novaya Chernaya-1; 2—Podgornoe Ozero, Barsuchiha-1, Barsuchiha-6, Barsuchiha-7; 3—Perevozinskiy; 4—Ulug-Kyuzyur, Kichik-Kyuzyur, Sovetskaya Khakassiya; 5—Tepsey-3, Tepsey-8, Tepsey-9; 6—Dolgiy Kurgan.

Mitochondrial DNA diversity and genetic relationships of the Tagar population

Our results are not inconsistent with the assumption of a probable role of gene flow due to the migration from Western Eurasia to the Minusinsk basin in the Bronze Age in the formation of the genetic composition of the Tagar population. Particularly, we detected many mtDNA lineages/clusters with probable West Eurasian origin that were dominant in modern populations of different parts of Europe, Caucasus, and the Near East (such as K and HV6) in our Tagar series based on a phylogeographic analysis.

We detected relatively low genetic distances between our Tagar population and two Bronze Age populations from the Minusinsk basin—the Okunevo culture population (pre-Andronovo Bronze Age) and Andronovo culture population, followed by Afanasievo population from the Minusinsk Basin and Middle Bronze Age population from the Mongolian Altai Mountains (the region adjacent to the Minusinsk basin) (Figs 3 and 6; S3 and S5 Tables). Among West Eurasian part of our Tagar series we also observed haplogroups/sub-haplogroups and haplotypes shared with Early and Middle Bronze Age populations from Minusinsk Basin and western part of Eurasian steppe belt (Fig 4; S5 Table). Thus, our results suggested a potentially significant role of the genetic components, introduced by migrants from Western Eurasia during the Bronze Age, in the formation of the genetic composition of the Tagar population. It is necessary to note the relatively small size of available mtDNA samples from the Bronze Age populations of Minusinsk basin; accordingly, additional mtDNA data for these populations are required to further confirm our inference.

Phylogenetic tree of mtDNA lineages from the Tagar population. Color coding of the Tagar stages: orange—the Early Tagar stage; blue—the Middle Tagar Stage; green—the Late Tagar stage. Color of haplogroup labels: yellow—for Western Eurasian haplogroups; red—for Eastern Eurasian haplogroups.

Another substantial part of the mtDNA pool of the Tagar and other eastern populations of the Scythian World is typical of populations in Southern Siberia and adjacent regions of Central Asia (autochthonous Central Asian mtDNA clusters). Most of these components belong to the East Eurasian cluster of mtDNA haplogroups. Moreover, the role of each of these components in the formation of the genetic composition of subsequent (to the present) populations in South Siberia and Central Asia could be very different. In this regard, cluster C4a2a (and its subcluster C4a2a1), and haplogroup A8 are of particular interest.

Genetic features of successive Tagar groups

We compared successive Tagar groups (Early, Middle, and Late Tagar) with each other and with other Iron Age nomadic populations to evaluate changes in the mtDNA pool structure. Despite the genetic similarity between the Early and Middle Tagar series and Scythian World nomadic groups (Figs 5 and 6; S4 and S6 Tables), there were some peculiarities. For example, the Early Tagar series was more similar to North Pontic Classic Scythians, while the Middle Tagar samples were more similar to the Southern Siberian populations of the Scythian period (i.e., completely synchronous populations of regions neighboring the Minusinsk basin, such as the Pazyryk population from the Altay Mountains and Aldy-Bel population from Tuva).

We observed differences in the mtDNA pool structure between the Early and the Middle chronological stages of the Tagar culture population, as evidenced by the change in the ratio of Western to Eastern Eurasian mtDNA components. The contribution of Eastern Eurasian lineages increased from about one-third (34.8%) in the Early Tagar group to almost one-half (45.8%) in the Middle Tagar group.
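As a quick consistency check, these percentages can be turned back into sample counts. This is a minimal sketch that assumes the percentages were computed over the full Early (N = 46) and Middle (N = 24) series quoted a little further on. The function name `consistent_counts` is my own, hypothetical helper.

```python
def consistent_counts(percent, n, decimals=1):
    """Return every count k (0..n) whose share k/n rounds to `percent`.

    Assumes the published percentage was computed over all n samples.
    """
    return [k for k in range(n + 1) if round(100 * k / n, decimals) == percent]


# Sample sizes taken from the excerpt: Early Tagar N = 46, Middle Tagar N = 24
early_east = consistent_counts(34.8, 46)
middle_east = consistent_counts(45.8, 24)
print("East Eurasian lineages, Early:", early_east, "Middle:", middle_east)
```

Under that assumption, each figure matches exactly one count: about 16 of 46 Early and 11 of 24 Middle Tagar individuals. The same check applied to the joint C + D frequencies given later (8.7% and 37.5%) gives 4 of 46 and 9 of 24.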

Results of multidimensional scaling based on a matrix of Slatkin population differentiation (FST) values, according to mtDNA haplogroup frequencies in Tagar populations and modern populations of Eurasia. Populations: Tagar (red pentagon) (this study); Mongolian-speaking populations: Khamnigans (Buryat Republic, Russia) [43]; Barghuts (Inner Mongolia, China) [44]; Buryats (Buryat Republic, Southern Siberia, Russia) [43]; Mongols (Mongolia) [45]. Turkic-speaking populations: Tuvinians (Tuva Republic, Russia) [43]; Tofalars (Irkutsk region, Russia) [46]; Altai-Kizhi (Altai Republic, Russia) [43, 47]; Telenghits (Altai Republic, Russia) [43, 47]; Tubalars (Altai Republic) [48]; Shors (Kemerovo region, Russia) [43, 47]; Khakassians (Khakassian Republic, Russia) [43, 46]; Altaian Kazakhs (Altai Republic) [49]; Kazakhs (Kazakhstan, Uzbekistan) [50, 51]; Kirghiz (Kyrgyzstan) [50, 51]; Uighurs (Kazakhstan and Xinjiang) [50, 52]; Siberian Tatars (Tyumen and Omsk regions, Russia) [53]; Tatars (Volga-Ural region, Russia) [54]; Bashkirs (Volga-Ural region, Russia) [55]; Uzbeks (Uzbekistan) [51, 56]; Turkmens (Turkmenistan) [51, 56]; Nogays [57]; Turks [58]; other populations: Evenks [43, 46]; Ulchi [59]; Koreans (South Korea) [43]; Han Chinese [60]; Zhuang (Guangxi, China) [61]; Tadjiks (Tadjikistan) [43, 51]; Iranians [60]; Russians [62].
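For readers unfamiliar with the distance behind this kind of plot, here is a rough sketch of two steps. First, a pairwise FST is obtained from haplogroup frequencies. Then it is linearized following Slatkin, as FST / (1 - FST). The sketch assumes a simple Nei-style estimator, (HT - HS) / HT, without sample-size correction. That is not necessarily what the authors used, since programs like Arlequin apply their own estimators. The haplogroup frequencies below are made up. The resulting matrix of pairwise values is what MDS then projects onto two dimensions; that projection step is not shown here.

```python
def heterozygosity(freqs):
    """Gene diversity 1 - sum(p^2) of one population's haplogroup frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())


def pairwise_fst(pop_a, pop_b):
    """Nei-style FST = (HT - HS) / HT for two populations (no sample-size correction)."""
    haplogroups = set(pop_a) | set(pop_b)
    pooled = {h: (pop_a.get(h, 0.0) + pop_b.get(h, 0.0)) / 2 for h in haplogroups}
    h_s = (heterozygosity(pop_a) + heterozygosity(pop_b)) / 2  # mean within-population diversity
    h_t = heterozygosity(pooled)                                # diversity of the pooled population
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def slatkin_linearized(fst):
    """Slatkin's linearized FST, FST / (1 - FST); infinite for fully differentiated pairs."""
    return float("inf") if fst >= 1 else fst / (1.0 - fst)


# Hypothetical haplogroup frequencies for two populations (illustration only)
pop_x = {"H": 0.6, "C": 0.4}
pop_y = {"H": 0.2, "C": 0.8}
fst = pairwise_fst(pop_x, pop_y)
print(round(fst, 4), round(slatkin_linearized(fst), 4))
```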

At the level of mtDNA haplogroups, we detected a decrease in the diversity of phylogenetic clusters during the transition from the Early Tagar to the Middle Tagar. This decline in diversity equally affected the West Eurasian and East Eurasian components of the Tagar mtDNA pool. It should be noted that this decrease can be partially explained by the smaller number of Middle Tagar than Early Tagar samples. Under a simple binomial approximation the mtDNA clusters, observed at frequencies of 6.3% and 11.7%, could be lost by chance in our Early (N = 46) and Middle (N = 24) Tagar samples, respectively. However, the simultaneous lack of several such clusters, with a total frequency in the gene pool of the Early group of 34.8%, is unlikely.
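The “simple binomial approximation” can be checked directly. A lineage at frequency p is absent from N random samples with probability (1 - p)^N. The sketch assumes a 5% cut-off, which is my assumption, although it reproduces the quoted 6.3% (N = 46) and 11.7% (N = 24). It also assumes the lumped 34.8% of clusters is evaluated against the Middle sample of 24. Both function names are hypothetical.

```python
def prob_absent(freq, n):
    """Probability that a lineage at population frequency `freq` is missing from n random samples."""
    return (1.0 - freq) ** n


def loss_threshold(n, alpha=0.05):
    """Highest frequency whose chance of being missed in n samples is still >= alpha.

    The 5% default is an assumption; it matches the figures quoted in the paper.
    """
    return 1.0 - alpha ** (1.0 / n)


for label, n in (("Early Tagar", 46), ("Middle Tagar", 24)):
    print(label, n, f"{100 * loss_threshold(n):.1f}%")

# Joint absence, in the Middle sample, of clusters totalling 34.8% in the Early group
print(f"{prob_absent(0.348, 24):.1e}")
```

The joint probability comes out well below 1 in 10,000. This illustrates why the authors consider the simultaneous loss of several clusters unlikely to be due to sample size alone.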

The observed reduction in the genetic distance between the Middle Tagar population and other Scythian-like populations of Southern Siberia (Fig 5; S4 Table), in our opinion, is primarily associated with an increase in the role of East Eurasian mtDNA lineages in the gene pool (up to nearly half of the gene pool) and a substantial increase in the joint frequency of haplogroups C and D (from 8.7% in the Early Tagar series to 37.5% in the Middle Tagar series). These features are characteristic of many ancient and modern populations of Southern Siberia and adjacent regions of Central Asia, including the Pazyryk population of the Altai Mountains. We did not obtain strong evidence for an intensification of genetic contact between the population of the Minusinsk basin and the Altai Mountains in the Middle Tagar period compared with the Early Tagar period. However, several archaeologists have found evidence for the intensification of contact at the level of material culture, namely, a cultural influence of the population of the Altai Mountains (represented by the Pazyryk population) on the population of the Minusinsk basin (the Saragash Tagar group) [6, 71, 72].

Another important issue is the change in the genetic structure of the Tagar population during the transition from the Middle (Saragash) to the Late (Tes’) stage. The Late Tagar stage refers to the Xiongnu period. Many archaeologists suggest that the formation of the Tes’ stage involved the direct cultural influence of the Xiongnu and/or related groups of nomads from more eastern regions of Central Asia [71, 73]. Some archaeologists have even suggested renaming the Tes’ stage as the Tes’ culture [71], emphasizing the role of new eastern cultural elements. If this influence also existed at the genetic level, then we would expect to observe new genetic elements in the Tes’ gene pool, particularly those of East Eurasian origin.

Siberian ancestry

Just a reminder of the recent ISBA 8 session on expanding Scythians (and also Mongolians and Turks) spreading Siberian ancestry. This ancestry is usually (wrongly) identified as “Uralic-Yeniseian” based on modern populations, similar to how steppe ancestry is wrongly identified as “Indo-European”. See the following graphic, which includes the Tagar population:

Very important observation with implication of population turnover is that pre-Turkic Inner Eurasian populations’ Siberian ancestry appears predominantly “Uralic-Yeniseian” in contrast to later dominance of “Tungusic-Mongolic” sort (which does sporadically occur earlier). (Alexander M. Kim)

And also the poster by Alexander M. Kim et al., Yeniseian hypotheses in light of genome-wide ancient DNA from historical Siberia:

The relevance of ancient DNA data to debates in historical linguistics is an emphatic strand in much recent work on the archaeogenetics of Eurasia, where the discussion has focused heavily on Indo-European (Haak et al. 2015; Narasimhan et al. 2018; de Barros Damgaard et al. 2018a,b). We present new genome-wide ancient DNA data from a historical Siberian individual in relation to Yeniseian, an isolated language “microfamily” (Vajda 2014) that nonetheless sits at the center of numerous controversial proposals in historical linguistics and cultural interaction. Yeniseian’s sole surviving representative is Ket, a critically endangered language fluently spoken by only a few dozen individuals near the Middle Yenisei River of Central Siberia.

In strong contrast to the present-day picture, river names and argued substrate influences and loanwords in languages outside the current range of Yeniseian, as well as direct records from the Russian colonial period, indicate that speakers of extinct Yeniseian languages had a formerly much broader presence in the taiga of Central Siberia as well as further south in the mountainous Altai-Sayan region – and perhaps even further afield in Inner Asia (Vajda 2010; Gorbachov 2017; Blažek 2016). The consilience of these proposals with genetic data is not straightforward (Flegontov et al. 2015, 2017) and faces a major obstacle in the lack of genetic information from verifiable speakers of Yeniseian languages other than the Kets, who have had complex ongoing interactions with speakers of non-Yeniseian languages such as the Samoyedic Selkups. We attempt to remedy this with new historical Siberian aDNA data, orienting our search for common denominators and systematic difference in a broader landscape of concordance, discordance, and uncertainty at the interface of diachronic linguistics and genetics.


Global demographic history inferred from mitogenomes

Open access Global demographic history of human populations inferred from whole mitochondrial genomes, by Miller, Manica, and Amos, Royal Society Open Science (2018).

Relevant excerpts (emphasis mine):


The Phase 3 sequence data from 20 populations, comprising five populations for each of the four main geographical regions of Europe, East Asia, South Asia and Africa, were downloaded from the 1000 Genomes Project website [8], including whole mitochondrial genome data for 1999 individuals. We decided not to analyse populations from the Americas due to the region’s complex history of admixture [13,14].

The European populations were as follows: Finnish sampled in Finland (FIN); European Caucasians resident in Utah, USA (CEU); British in England and Scotland (GBR); an Iberian population from Spain (IBS) and Toscani from Italy (TSI). Representing East Asia were the Han Chinese in Beijing (CHB); Southern Han Chinese (CHS); Dai Chinese from Xishuangbanna, China (CDX); Kinh population from Ho Chi Minh City, Vietnam (KHV) and Japanese from Tokyo (JPT). The South Asian populations were Punjabi Indians from Lahore, Pakistan (PJL); Gujarati Indians in Houston, USA (GIH) as well as Indian Telugu sampled in the UK (ITU); Bengali from Bangladesh (BEB) and Sri Lankan Tamil from the UK (STU). (…)


We analysed our mtDNA data with the extended Bayesian skyline plot (EBSP) method, a Bayesian, non-parametric technique for inferring past population size fluctuations from genetic data. Building on the previous Bayesian skyline plot (BSP) approach, EBSP uses a piecewise-linear model and Markov chain Monte Carlo (MCMC) methods to reconstruct a population’s demographic history [17] and is implemented in the software package BEAST v. 2.3.2 [11]. Alignments for each of the 20 populations were loaded separately into the Bayesian Evolutionary Analysis Utility tool (BEAUti v. 2.3.2) in NEXUS format.
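To make the “piecewise-linear model” less abstract, here is a minimal sketch of the kind of demographic function such a model describes. Effective size Ne is defined at a few change-points in time and interpolated linearly between them. In EBSP the number and position of change-points and the sizes themselves are sampled by MCMC in BEAST. This sketch only evaluates one fixed, hypothetical curve and does no inference. The helper name and the numbers are illustrative assumptions.

```python
from bisect import bisect_right


def piecewise_linear_ne(times, sizes, t):
    """Ne at time t (years before present), linearly interpolated between
    change-points (times[i], sizes[i]); held constant outside the range.
    `times` must be sorted in increasing order."""
    if t <= times[0]:
        return sizes[0]
    if t >= times[-1]:
        return sizes[-1]
    i = bisect_right(times, t)          # times[i - 1] <= t < times[i]
    t0, t1 = times[i - 1], times[i]
    n0, n1 = sizes[i - 1], sizes[i]
    return n0 + (n1 - n0) * (t - t0) / (t1 - t0)


# Hypothetical change-points: a stable population that grows towards the present
times = [0, 10_000, 14_000, 40_000]
sizes = [50_000, 20_000, 5_000, 5_000]
for t in (0, 5_000, 12_000, 30_000):
    print(t, piecewise_linear_ne(times, sizes, t))
```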

Relationship between profile similarity and genetic distance, measured as Fst. Comparisons between regions, circles, are colour-coded: black = AFR-EA; yellow = AFR-EUR; blue = AFR-SA; orange = EUR-EA; green = EA-SA; red = EUR-SA. Comparisons within regions, squares, are coded: peach = EUR; pink = EA; dark blue = SA; light blue = AFR. Profile similarity is calculated as inferred size difference summed over 20 evenly spaced intervals (see Material and methods).
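The caption’s “inferred size difference summed over 20 evenly spaced intervals” can be sketched as follows. Two Ne(t) curves are evaluated at the midpoints of 20 equal time slices, and the absolute differences are added up. Whether the authors used absolute or log-scale differences, and midpoints or interval edges, is not stated in the excerpt, so both choices here are assumptions. Note that, computed this way, a smaller value means more similar profiles. The function and the two example curves are hypothetical.

```python
def profile_difference(ne_a, ne_b, t_max, n_intervals=20):
    """Sum of |Ne_a(t) - Ne_b(t)| evaluated at the midpoints of n_intervals
    equal slices of [0, t_max]; 0 means identical profiles."""
    step = t_max / n_intervals
    midpoints = [step * (i + 0.5) for i in range(n_intervals)]
    return sum(abs(ne_a(t) - ne_b(t)) for t in midpoints)


def constant(t):
    """Hypothetical profile: constant size through time."""
    return 10_000.0


def growing(t):
    """Hypothetical profile: linear growth towards the present after 20,000 years ago."""
    return max(10_000.0, 60_000.0 - 2.5 * t)


print(profile_difference(constant, growing, t_max=40_000))
```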

Regional demographic histories


The five European profiles are presented in figure 2. The four southerly populations all show profiles with a stable size up to approximately 14 ka followed by a sudden, rapid increase that becomes progressively less steep towards the present. There is also a north-south trend, with confidence intervals becoming broader towards the north, particularly for the oldest time-points. The Finnish population profile appears rather different, but this is to be expected both because it is so far north and because previous studies have identified Finns as a strong genetic outlier in Europe [19–22].

Inferred demographic histories of five European populations. Dotted line is the median estimate of Ne and the thin grey lines show the boundary of the 95% CPD interval. The x-axis represents time from the present in years and all plots are on the same scale. Map shows origins of sampled populations.

South Asia:

The five profiles for South Asia are shown in figure 3. All populations reveal a period of rapid growth approximately 45–40 ka which then slows. Near the present the two southerly populations, GIH and STU both show evidence of a decline. However, this may be due to these samples being drawn from populations no longer living on the subcontinent, with the downward trend capturing a bottleneck associated with moving to Europe/America, perhaps accentuated by the tendency for immigrant populations to group by region, religion and race [23].

Inferred South Asian population demographic histories. Dotted line is the median Ne estimate and the thin grey lines show the boundary of the 95% CPD intervals. The x-axis represents time from the present in thousands of years and all plots are on the same scale. The map shows location of sampled populations.


Linguistic continuity despite genetic replacement in Remote Oceania


Review of recent papers on East Asia, quite relevant these days: Human Genetics: Busy Subway Networks in Remote Oceania? by Anders Bergström & Chris Tyler-Smith, Current Biology (2018) 28.

Interesting excerpts (emphasis mine):

Ancient DNA is transforming our understanding of the human past by forcing geneticists to confront its real complexity [1]. Historians and archaeologists have long known that the development of human societies was complex and often haphazard, but geneticists have persistently tried to explain present-day patterns of genetic variation using simple models.

Early genetic analyses of present-day populations revealed a mix of Asian (Taiwanese) and Papuan (New Guinea or nearby) ancestries throughout Remote Oceania, with maternally-inherited mitochondrial DNA being predominantly Asian, paternally-inherited Y chromosomes mainly Papuan, and autosomes intermediate [7]. This led to the simple model mentioned above of an Austronesian-speaking population starting out from Taiwan, developing the Lapita culture in the islands near New Guinea while mixing with local Papuans, and then boldly launching out into the unknown Pacific.
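The “autosomes intermediate” pattern follows from simple inheritance. Y chromosomes pass from fathers to sons and mtDNA from mothers. Autosomes come half from each parent. This is a minimal sketch assuming a single admixture pulse and ignoring later drift and selection. The founder proportions are hypothetical, and so is the helper name.

```python
def expected_ancestry(male_share, female_share):
    """Expected share of one source ancestry after a single admixture pulse,
    given that source's share among male and among female founders."""
    return {
        "Y chromosome": male_share,                     # inherited father to son
        "mtDNA": female_share,                          # inherited from mothers
        "autosomes": (male_share + female_share) / 2,   # half from each parent
    }


# Hypothetical founders: 80% of men but only 20% of women of Papuan origin
print(expected_ancestry(0.8, 0.2))
```

A male-biased Papuan contribution of this kind would leave Y chromosomes mainly Papuan, mtDNA mainly Asian, and autosomes in between. That is the pattern described above.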

The surprise came with the first studies of ancient DNA, when early Lapita people from Vanuatu and Tonga (ca. 2,500-3,000 yBP) showed completely Asian genetic ancestry, so the Papuan genetic component must have entered later.

This is what the most recent ancient DNA papers found:


There thus seems to have been a migration of Papuan-ancestry people from the Bismarck archipelago off the coast of New Guinea, into the islands of Remote Oceania, shortly after those very islands were first settled by people from Asia. Few traces of such a migration and its cultural or technological underpinnings have been found in the archaeological record or in linguistic relationships, which is why it comes as such a surprise. The fact that these Near Oceanian people made the long journey to Vanuatu so soon after the Asian seafarers arrived in their neighbourhood, having had tens of thousands of years to do so previously, strongly suggests that the migration was somehow triggered by interactions with the new Austronesian-speaking arrivals and adoption of their sophisticated seafaring technology. The excess of Y chromosomes of Papuan origin in Remote Oceania, somewhat difficult to explain under the traditional model, might also make sense in the light of an active expansion of people from Near Oceania, as such expansions have often been found to be male-biased [10]. Both studies speculate that the arrival of these Papuan-ancestry people might have contributed to the end of the Lapita period and its cultural unity.

The very first settlers of Vanuatu would have spoken Austronesian languages, and the Papuan-ancestry people who arrived shortly after would very likely have spoken Papuan languages. Yet today, all languages of Vanuatu are Austronesian. The arrivals from Near Oceania thus seem to have largely replaced the first settlers but adopted their languages. Posth and colleagues [5] argue that the languages of Vanuatu actually contain some elements of Papuan origin, and that the ancient DNA results are compatible with a more gradual process of cultural interaction and genetic mixing, rather than sudden replacement. Nonetheless, linguistic continuity in the face of this almost complete genetic replacement is extremely unusual in human history, perhaps even unprecedented as Posth and colleagues [5] suggest.

We are now seeing, from the Anatolian expansion and from the formation of the Indo-Iranian community, that such processes were actually not as unusual as some had previously thought…


Genomic history of South-East Asia: eastern Polynesians, Peninsular Malaysia and North Borneo

Two recent interesting genetic papers:

1. Open Access Investigating the origins of eastern Polynesians using genome-wide data from the Leeward Society Isles, by Hudjashov et al., in Scientific Reports (2018)


The debate concerning the origin of the Polynesian speaking peoples has been recently reinvigorated by genetic evidence for secondary migrations to western Polynesia from the New Guinea region during the 2nd millennium BP. Using genome-wide autosomal data from the Leeward Society Islands, the ancient cultural hub of eastern Polynesia, we find that the inhabitants’ genomes also demonstrate evidence of this episode of admixture, dating to 1,700–1,200 BP. This supports a late settlement chronology for eastern Polynesia, commencing ~1,000 BP, after the internal differentiation of Polynesian society. More than 70% of the autosomal ancestry of Leeward Society Islanders derives from Island Southeast Asia with the lowland populations of the Philippines as the single largest potential source. These long-distance migrants into Polynesia experienced additional admixture with northern Melanesians prior to the secondary migrations of the 2nd millennium BP. Moreover, the genetic diversity of mtDNA and Y chromosome lineages in the Leeward Society Islands is consistent with linguistic evidence for settlement of eastern Polynesia proceeding from the central northern Polynesian outliers in the Solomon Islands. These results stress the complex demographic history of the Leeward Society Islands and challenge phylogenetic models of cultural evolution predicated on eastern Polynesia being settled from Samoa.

Sampling locations and overview of genomic diversity. (a) Sources of population data used in the present study. The Philippine group names are abbreviated as follows: Aet (Aeta); Agt (Agta); Bat (Batak); Cas (Casiguran); Kan (Kankanaey); Taga (Tagalog); Tagb (Tagbanua); Zam (Zambales); and Phi (Philippines, incorporating all other groups from this region). Colours indicate regional affiliation of populations used for analysis of autosomal DNA: orange – mainland Southeast Asia and East Asia; dark blue – Taiwan; brown – Philippines Aeta, Agta and Batak negritos; light blue – Philippines non-negritos; red – western Indonesia; pink – eastern Indonesia; purple – northern Melanesia and New Guinea; black – Australia; green – Polynesia. The usage of populations varies with the type of analysis employed (Supplementary Table S1). Inset map shows the three populations from the Leeward Society Isles, and Tahiti, the major island in the Windward Society Isles. The red circles within Micronesia and Melanesia represent 20 of the atolls and islands referred to collectively as outlier Polynesia. The red stars denote the three additional Polynesian outlier populations (Rennell and Bellona, Tikopia), which together with Tonga, were used in analysis of ancient admixture by Skoglund et al. [25]. Detailed sample information is given in Supplementary Table S1. The map was created using R v. 3.4.1 (R Core Team (2017). R: A language and environment for statistical computing. R Foundation for Statistical Computing) and packages ‘maps’ v. 3.2.0 and ‘mapdata’ v. 2.2-6. (b) Inset at top right shows two alternative reconstructed sub-groupings of Polynesian languages discussed in the text. The critical differences are the position of the East Polynesian languages relative to the rest of nuclear Polynesian, and their relationship to the Central Northern Outlier languages. In the sub-grouping according to Pawley [31] all the Polynesian Outlier languages group within Samoic, implying an early separation of Proto-East Polynesian from the rest of the Nuclear Polynesian languages. In the alternative sub-grouping proposed by Wilson [32] the Central Northern Outlier languages group with the languages of East Polynesia, within a larger clade containing the other Northern Outlier languages. (c) Principal components analysis of genome-wide SNP diversity in 639 individuals from the populations shown in panel (a); axes are scaled by the proportion of variance described by the corresponding principal component.

2. Genomic structure of the native inhabitants of Peninsular Malaysia and North Borneo suggests complex human population history in Southeast Asia, by Yew et al., in Human Genetics (2018)


Southeast Asia (SEA) is enriched with a complex history of peopling. Malaysia, which is located at the crossroads of SEA, has been recognized as one of the hubs for early human migration. To unravel the genomic complexity of the native inhabitants of Malaysia, we sequenced 12 samples from 3 indigenous populations from Peninsular Malaysia and 4 native populations from North Borneo to a high coverage of 28–37×. We showed that the Negritos from Peninsular Malaysia shared a common ancestor with the East Asians, but exhibited some level of gene flow from South Asia, while the North Borneo populations exhibited closer genetic affinity towards East Asians than the Malays. The analysis of time of divergence suggested that ancestors of Negrito were the earliest settlers in the Malay Peninsula, who first separated from the Papuans ~ 50–33 thousand years ago (kya), followed by East Asian (~ 40–15 kya). The divergence time frame between North Borneo and East Asia populations predates the Austronesian expansion period, which implies a possible pre-Neolithic colonization. Substantial Neanderthal ancestry was confirmed in our genomes, as was observed in other East Asians. However, no significant difference was observed in terms of the proportion of Denisovan gene flow into these native inhabitants from Malaysia. Judging from the similar amount of introgression in the Southeast Asians and East Asians, our findings suggest that the Denisovan gene flow may have occurred before the divergence of these populations and that the shared similarities are likely an ancestral component.

See also:

Two more studies on the genetic history of East Asia: Han Chinese and Thailand


A comprehensive map of genetic variation in the world’s largest ethnic group – Han Chinese, by Charleston et al. (2017).

It is believed – based on uniparental markers from modern and ancient DNA samples and array-based genome-wide data – that Han Chinese originated in the Central Plain region of China during prehistoric times, expanding with agriculture and technology northward and southward, to become the largest Chinese ethnic group.


As are most non-European populations around the globe, the Han Chinese are relatively understudied in population and medical genetics studies. From low-coverage whole-genome sequencing of 11,670 Han Chinese women we present a catalog of 25,057,223 variants, including 548,401 novel variants that are seen at least 10 times in our dataset. Individuals from our study come from 19 out of 22 provinces across China, allowing us to study population structure, genetic ancestry, and local adaptation in Han Chinese. We identify previously unrecognized population structure along the East-West axis of China and report unique signals of admixture across geographical space, such as European influences among the Northwestern provinces of China. Finally, we identified a number of highly differentiated loci, indicative of local adaptation in the Han Chinese. In particular, we detected extreme differentiation among the Han Chinese at MTHFR, ADH7, and FADS loci, suggesting that these loci may not be specifically selected in Tibetan and Inuit populations as previously suggested. On the other hand, we find that Neandertal ancestry does not vary significantly across the provinces, consistent with admixture prior to the dispersal of modern Han Chinese. Furthermore, contrary to a previous report, Neandertal ancestry does not explain a significant amount of heritability in depression. Our findings provide the largest genetic data set so far made available for Han Chinese and provide insights into the history and population structure of the world’s largest ethnic group.

Using Shanghai individuals as representatives, the authors compute the shared drift between modern Chinese and ancient individuals with outgroup f3 statistics of the form f3(Mbuti; X, Y). The ancient individuals are grouped into approximately Palaeolithic, Mesolithic, Neolithic, and Chalcolithic-to-Medieval periods. Modern Chinese individuals show greater shared drift with pre-Neolithic hunter-gatherers than with Neolithic farmers (featured image from the article).
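To make the statistic concrete, here is a minimal sketch of a naive outgroup f3(O; X, Y), assuming we already have per-SNP allele frequencies for the outgroup (Mbuti), for X (e.g. Shanghai) and for Y (an ancient group). It averages (o − x)(o − y) over SNPs: the larger the value, the more genetic drift X and Y share after splitting from the outgroup. This is not the authors' pipeline. It omits the finite-sample bias correction and block-jackknife standard errors that dedicated software such as ADMIXTOOLS applies, and the frequencies in the usage note are invented toy values.

```python
from typing import Sequence


def outgroup_f3(o: Sequence[float], x: Sequence[float], y: Sequence[float]) -> float:
    """Naive outgroup f3(O; X, Y) from per-SNP allele frequencies.

    Averages (o - x) * (o - y) over SNPs. Higher values indicate more drift
    shared by X and Y since their common split from the outgroup O.
    Assumption: no sample-size bias correction and no standard error.
    """
    if not (len(o) == len(x) == len(y)) or len(o) == 0:
        raise ValueError("need equal-length, non-empty frequency lists")
    total = sum((oi - xi) * (oi - yi) for oi, xi, yi in zip(o, x, y))
    return total / len(o)


if __name__ == "__main__":
    # Hypothetical allele frequencies at four SNPs (illustration only).
    mbuti = [0.1, 0.5, 0.9, 0.3]
    shanghai = [0.6, 0.2, 0.4, 0.8]
    hunter_gatherer = [0.5, 0.3, 0.5, 0.7]
    farmer = [0.2, 0.6, 0.8, 0.4]

    print(outgroup_f3(mbuti, shanghai, hunter_gatherer))
    print(outgroup_f3(mbuti, shanghai, farmer))
```

With these toy numbers the hunter-gatherer pairing gives the larger value (about 0.165 against 0.03), which is the kind of contrast the paper reports for modern Chinese versus pre-Neolithic and Neolithic groups. Real comparisons use hundreds of thousands of SNPs.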

EDIT (17/7/2017): Davidski at Eurogenes shares an interesting view on this kind of result:

These sorts of estimates always look way off. And I doubt that it’s largely the result of the Silk Road, which linked China to the Near East and Mediterranean rather than to Northern Europe. More likely it reflects gene flow from the Pontic-Caspian steppe in Eastern Europe during the Bronze and Iron ages, via the Afanasievo, Andronovo, and other closely related steppe peoples

New insights from Thailand into the maternal genetic history of Mainland Southeast Asia, by Kutanan et al. (2017).


Tai-Kadai (TK) is one of the major language families in Mainland Southeast Asia (MSEA), with a concentration in the area of Thailand and Laos. Our previous study of 1,234 mtDNA genome sequences supported a demic diffusion scenario in the spread of TK languages from southern China to Laos as well as northern and northeastern Thailand. Here we add an additional 560 mtDNA sequences from 22 groups, with a focus on the TK-speaking central Thai people and the Sino-Tibetan speaking Karen. We find extensive diversity, including 62 haplogroups not reported previously from this region. Demic diffusion is still a preferable scenario for central Thais, emphasizing the extension and expansion of TK people through MSEA, although there is also some support for an admixture model. We also tested competing models concerning the genetic relationships of groups from the major MSEA languages, and found support for an ancestral relationship of TK and Austronesian-speaking groups.