Eurasian steppe dominated by Iranian peoples, Indo-Iranian expanded from East Yamna


The expected study of Eurasian samples is out (behind paywall): 137 ancient human genomes from across the Eurasian steppes, by de Barros Damgaard et al. Nature (2018).

Discussion (emphasis mine):

Our findings fit well with current insights from the historical linguistics of this region (Supplementary Information section 2). The steppes were probably largely Iranian-speaking in the first and second millennia BC. This is supported by the split of the Indo-Iranian linguistic branch into Iranian and Indian [33], the distribution of the Iranian languages, and the preservation of Old Iranian loanwords in Tocharian [34]. The wide distribution of the Turkic languages from Northwest China, Mongolia and Siberia in the east to Turkey and Bulgaria in the west implies large-scale migrations out of the homeland in Mongolia since about 2,000 years ago [35]. The diversification within the Turkic languages suggests that several waves of migration occurred [36] and, on the basis of the effect of local languages, gradual assimilation to local populations had previously been assumed [37]. The East Asian migration starting with the Xiongnu accords well with the hypothesis that early Turkic was the major language of Xiongnu groups [38]. Further migrations of East Asians westwards find a good linguistic correlate in the influence of Mongolian on Turkic and Iranian in the last millennium [39]. As such, the genomic history of the Eurasian steppes is the story of a gradual transition from Bronze Age pastoralists of West Eurasian ancestry towards mounted warriors of increased East Asian ancestry, a process that continued well into historical times.

This paper will need a careful reading, ideally in combination with Narasimhan et al. (2018) once their tables are corrected, to assess the actual ‘Iranian’ nature of the peoples studied. Their wide and long-term dominion over the steppe could also potentially explain some early samples from Hajji Firuz with steppe ancestry.

Principal component analyses. The principal components 1 and 2 were plotted for the ancient data analysed with the present-day data (no projection bias) using 502 individuals at 242,406 autosomal SNP positions. Dimension 1 explains 3% of the variance and represents a gradient stretching from Europe to East Asia. Dimension 2 explains 0.6% of the variance, and is a gradient mainly represented by ancient DNA starting from a ‘basal-rich’ cluster of Natufian hunter-gatherers and ending with EHGs. BA, Bronze Age; EMBA, Early-to-Middle Bronze Age; SHG, Scandinavian hunter-gatherers.
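To make the "Dimension 1 explains 3% of the variance" wording concrete, here is a minimal sketch of how the share of variance per principal component can be computed from a samples × SNPs genotype matrix. It is not the paper's pipeline: the function name, the 0/1/2 genotype coding and the random toy data are all assumptions made for illustration.

```python
import numpy as np


def pca_explained_variance(genotypes, n_components=2):
    """Return (scores, explained_variance_ratio) for a samples x SNPs matrix.

    Hypothetical helper: assumes genotypes are coded 0/1/2 with no missing data.
    """
    X = np.asarray(genotypes, dtype=float)
    # Center every SNP column so the components describe variation around the mean.
    X = X - X.mean(axis=0)
    # Singular value decomposition of the centered matrix.
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    # Squared singular values are proportional to the variance along each component.
    variance = S ** 2
    ratio = variance / variance.sum()
    # Coordinates of each individual on the leading components.
    scores = U[:, :n_components] * S[:n_components]
    return scores, ratio[:n_components]


# Toy data only (not from the study): 20 individuals x 200 SNPs.
rng = np.random.default_rng(0)
allele_freqs = rng.uniform(0.1, 0.9, size=200)
toy_genotypes = rng.binomial(2, allele_freqs, size=(20, 200))

scores, ratio = pca_explained_variance(toy_genotypes)
print("PC1 and PC2 share of variance:", ratio)
```

With hundreds of thousands of SNPs and mostly unstructured variation, small percentages for the first components, as in the caption, are the expected outcome.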

For the moment, at first sight, it seems that, in terms of Y-DNA lineages:

  • R1a-Z93 (especially Z2124 subclades) dominates the steppes in the studied periods.
  • R1b-P312 is found in Hallstatt ca. 810 BC, which is compatible with its role in the Celtic expansion.
  • R1b-U106 is found in a West Germanic chieftain in Poprad (Slovakia) ca. 400 AD, during the Migration Period, hence supporting once again the expansion of Germanic tribes especially with R1b-U106 lineages.
  • A new sample of N1c-L392 (L1025) lineage dated ca. 400 AD, now from Lithuania, points again to a quite late expansion of this lineage to the region, believed to have hosted Uralic speakers for more than 2,000 years before this.
  • A sample of haplogroup R1a-Z282 (Z92) dated ca. 1300 AD in the Golden Horde is probably not quite revealing, not even for the East Slavic expansion.
  • Also, interestingly, some R1b(xM269) lineages seem to be associated with Turkic expansions from the eastern steppe dated around 500 AD, which probably points to a wide Eurasian distribution of early R1b subclades in the Mesolithic.

NOTE. I have referenced not just the reported subclades from the paper, but also (and mainly) further Y-SNP calls studied by Open Genomes. See the spreadsheet here.

Interesting also to read in the supplementary materials the following, by Michaël Peyrot (emphasis mine):

1. Early Indo-Europeans on the steppe: Tocharians and Indo-Iranians

The Indo-European language family is spread over Eurasia and comprises such branches and languages as Greek, Latin, Germanic, Celtic, Sanskrit etc. The branches relevant for the Eurasian steppe are Indo-Aryan (= Indian) and Iranian, which together form the Indo-Iranian branch, and the extinct Tocharian branch. All Indo-European languages derive from a postulated protolanguage termed Proto-Indo-European. This language must have been spoken ca 4500–3500 BCE in the steppe of Eastern Europe [21]. The Tocharian languages were spoken in the Tarim Basin in present-day Northwest China, as shown by manuscripts from ca 500–1000 CE. The Indo-Aryan branch consists of Sanskrit and several languages of the Indian subcontinent, including Hindi. The Iranian branch is spread today from Kurdish in the west, through a.o. Persian and Pashto, to minority languages in western China, but was in the 2nd and 1st millennia BCE widespread also on the Eurasian steppe. Since despite their location Tocharian and Indo-Iranian show no closer relationship within Indo-European, the early Tocharians may have moved east before the Indo-Iranians. They are probably to be identified with the Afanasievo Culture of South Siberia (ca 2900 – 2500 BCE) and have possibly entered the Tarim Basin ca 2000 BCE [103].

The Indo-Iranian branch is an extension of the Indo-European Yamnaya Culture (ca 3000–2400 BCE) towards the east. The rise of the Indo-Iranian language, of which no direct records exist, must be connected with the Abashevo / Sintashta Culture (ca 2100 – 1800 BCE) in the southern Urals and the subsequent rise and spread of Andronovo-related Culture (1700 – 1500 BCE). The most important linguistic evidence of the Indo-Iranian phase is formed by borrowings into Finno-Ugric languages [104–106]. Kuz’mina (2001) identifies the Finno-Ugrians with the Andronoid cultures in the pre-taiga zone east of the Urals [107]. Since some of the oldest words borrowed into Finno-Ugric are only found in Indo-Aryan, Indo-Aryan and Iranian apparently had already begun to diverge by the time of these contacts, and when both groups moved east, the Iranians followed the Indo-Aryans [108]. Being pushed by the expanding Iranians, the Indo-Aryans then moved south, one group surfacing in equestrian terminology of the Anatolian Mitanni kingdom, and the main group entering the Indian subcontinent from the northwest.

Summary map. Depictions of the five main migratory events associated with the genomic history of the steppe pastoralists from 3000 BC to the present. a, Depiction of Early Bronze Age migrations related to the expansion of Yamnaya and Afanasievo culture. b, Depiction of Late Bronze Age migrations related to the Sintashta and Andronovo horizons. c, Depiction of Iron Age migrations and sources of admixture. d, Depiction of Hun-period migrations and sources of admixture. e, Depiction of Medieval migrations across the steppes.

2. Andronovo Culture: Early Steppe Iranian

Initially, the Andronovo Culture may have encompassed speakers of Iranian as well as Indo-Aryan, but its large expansion over the Eurasian steppe is most probably to be interpreted as the spread of Iranians. Unfortunately, there is no direct linguistic evidence to prove to what extent the steppe was indeed Iranian speaking in the 2nd millennium BCE. An important piece of indirect evidence is formed by an archaic stratum of Iranian loanwords in Tocharian [34,109]. Since Tocharian was spoken beyond the eastern end of the steppe, this suggests that speakers of Iranian spread at least that far. In the west of the Tarim Basin the Iranian languages Khotanese and Tumshuqese were spoken. However, the Tocharian B word etswe ‘mule’, borrowed from Iranian *atswa- ‘horse’, cannot derive from these languages, since Khotanese has aśśa- ‘horse’ with śś instead of tsw. The archaic Iranian stratum in Tocharian is therefore rather to be connected with the presence of Andronovo people to the north and possibly to the east of the Tarim Basin from the middle of the 2nd millennium BCE onwards [110].

Since Kristiansen and Allentoft sign the paper (and Peyrot is a colleague of Kroonen), it seems that they needed to expressly respond to the growing criticism about their recent Indo-European – Corded Ware Theory. That’s nice.

They are obviously trying to reject the Corded Ware – Uralic links that are on the rise lately among Uralicists, now that Comb Ware is not a suitable candidate for the expansion of the language family.

IECWT-proponents are apparently not prepared to let it go quietly, and instead of challenging the traditional Neolithic Uralic homeland in Eastern Europe with a recent paper on the subject, they selected an older one which partially fit, from Kuz’mina (2001), now shifting the Uralic homeland to the east of the Urals (when Kuz’mina asserts it was south of the Urals).

Different authors comment later in this same paper about East Uralic languages spreading quite late, so even their text is not consistent among collaborating authors.

Also interesting is the need to resort to the questionable argument of early Indo-Aryan loans, which may well have been Indo-Iranian instead, since there is no way to prove a difference between both stages in early Uralic borrowings from ca. 4,500-3,500 years ago…

EDIT (10/5/2018) The linguistic supplement of the Science paper deals with different Proto-Indo-Iranian stages in Uralic loans, so on the linguistic side at least this influence is clear to all involved.

A rejection of such proposals of a late, eastern homeland can be found in many recent writings of Finnic scholars; see e.g. my references to Parpola (2017), Kallio (2017), or Nordqvist (2018).

NOTE. I don’t mind repeating it again: Uralic is one possibility (the most likely one) for the substrate language that Corded Ware migrants spread, but it could have been e.g. another Middle PIE dialect, similar to Proto-Anatolian (after the expansion of Suvorovo-Novodanilovka chiefs). I expressly stated this in the Corded Ware substrate hypothesis, since the first edition. What was clear since 2015, and should be clear to anyone now, is that Corded Ware did not spread Late PIE languages to Europe, and that some east CWC groups only spread languages to Asia after admixing with East Yamna. If they did not spread Uralic, then it was a language or group of languages phonetically similar, which has not survived to this day.

Their description of Yamna migrations is already outdated after Olalde et al. & Mathieson et al. (2018), and Narasimhan et al. (2018), so they will need to update their model (yet again) for future papers. As I said before, Anthony seems to be one step behind the current genetic data, and the IECWT seems to be one step behind Anthony in their interpretations.

At least we won’t have the Yamna -> Corded Ware -> BBC nonsense anymore, and they expressly stated that LPIE is to be associated with Yamna, and in particular that the “Indo-Iranian branch is an extension of the Indo-European Yamnaya Culture (ca 3000–2400 BCE) towards the east” (which will evidently show an East Yamna / Poltavka society of R1b-L23 subclades), so that earlier Eneolithic cultures have to be excluded, and Balto-Slavic identification with East Europe is also out of the way.


Ancient nomadic tribes of the Mongolian steppe dominated by a single paternal lineage

The genome of an ancient Rouran individual reveals an important paternal lineage in the Donghu population, by Li et al. Am J Phys Anthropol (2018), 1–11.

Abstract (emphasis mine):

Following the Xiongnu and Xianbei, the Rouran Khaganate (Rouran) was the third great nomadic tribe on the Mongolian Steppe. However, few human remains from this tribe are available for archaeologists and geneticists to study, as traces of the tombs of these nomadic people have rarely been found. In 2014, the IA‐M1 remains (TL1) at the Khermen Tal site from the Rouran period were found by a Sino‐Mongolian joint archaeological team in Mongolia, providing precious material for research into the genetic imprint of the Rouran.

Materials and methods
The mtDNA hypervariable sequence I (HVS‐I) and Y‐chromosome SNPs were analyzed, and capture of the paternal non‐recombining region of the Y chromosome (NRY) and whole‐genome shotgun sequencing of TL1 were performed. The materials from three sites representing the three ancient nationalities (Donghu, Xianbei, and Shiwei) were selected for comparison with the TL1 individual.

The mitochondrial haplotype of the TL1 individual was D4b1a2a1. The Y‐chromosome haplotype was C2b1a1b/F3830 (ISOGG 2015), which was the same as that of the other two ancient male nomadic samples (ZHS5 and GG3) related to the Xianbei and Shiwei, which were also detected as F3889; this haplotype was reported to be downstream of F3830 by Wei et al. (2017).

We conclude that F3889 downstream of F3830 is an important paternal lineage of the ancient Donghu nomads. The Donghu‐Xianbei branch is expected to have made an important paternal genetic contribution to Rouran. This component of gene flow ultimately entered the gene pool of modern Mongolic‐ and Manchu‐speaking populations.

The ancient males (TL1, ZHS5, and GG3) were grouped under C2b1a1b1/F3880 on the Y-DNA haplogroup C lineage using BEAST.


These results suggested that TL1 likely presents a close paternal relationship to the Donghu people and may have even descended from a branch of the ancient Donghu-Xianbei people, based on the conclusion that haplogroup C2b1a/F3918 can be considered the paternal branch of the ancient Donghu people (Zhang et al., 2018). The Y-chromosome phylogenetic tree showed that TL1 shared a branch with modern Mongolian-Buryats, Hezhen, Xibo, Yugur, and Kazakh, suggesting that the TL1 individual from the Rouran period should also generally present close paternal genetic relationships with modern Mongolic- and Manchu-speaking peoples.

In general, the Rouran Khaganate originated from an alliance of the ancient Eurasian steppe nomads, which disintegrated and disappeared with the progress of history. This group was complex, and its origin cannot be explained based only on one individual. However, we can trace the genetic imprint of the Rouran people through genome analysis of the TL1 individual. On the basis of the comparison with other ancient nomadic people (Donghu, Xianbei, and Shiwei) and data on modern individuals from published articles (Lippold et al., 2014; Wei et al., 2017) (Supporting Information S5), we found that they all share the same haplotype implying shared paternal ancestry between the Donghu, Xianbei and Rouran populations. Furthermore, this gene flow (mainly haplogroup C2b1a/F3918) did not stop with the disappearance of the Rouran, and a portion was instead passed on in other groups, such as the ancient Shiwei people (later than Rouran), eventually reaching the gene pool of modern Mongolic- and Manchu-speaking populations (Mongolian-Buryats, Hezhen, Xibo, et al).

Interesting to see now confirmed with ancient DNA the proposal of a C3*-DYS448del cluster as the paternal lineage defining ancient Mongolian tribes, a theory based on ancient and modern samples – since it is found in low frequency in almost all Mongolic- and Turkic-speaking populations.

This is yet another proof of how prehistoric ethnolinguistic expansions are usually accompanied by haplogroup expansion and reduction in variability.

I wonder what other ancient chiefdom-type steppe-based nomadic groups were also dominated by a single paternal lineage.
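The "reduction in variability" mentioned above can be put into numbers with a diversity index. Below is a minimal sketch using Nei's unbiased haplotype diversity, H = n/(n-1) · (1 - Σ p_i²), applied to haplogroup labels. The two sample lists are invented toy data, not calls from Li et al. (2018), and the function name is hypothetical.

```python
from collections import Counter


def haplogroup_diversity(calls):
    """Nei's unbiased diversity for a list of haplogroup labels (one per male)."""
    n = len(calls)
    if n < 2:
        raise ValueError("need at least two samples")
    counts = Counter(calls)
    # Sum of squared haplogroup frequencies (probability that two draws match).
    sum_sq = sum((k / n) ** 2 for k in counts.values())
    # The n / (n - 1) factor corrects for small sample size.
    return n / (n - 1) * (1 - sum_sq)


# Toy data (assumption): a group dominated by one paternal lineage...
dominated = ["C2b1a"] * 9 + ["N"]
# ...versus a group with five equally common lineages.
mixed = ["C2b1a", "N", "Q", "R1a", "O"] * 2

div_dominated = haplogroup_diversity(dominated)
div_mixed = haplogroup_diversity(mixed)
print(div_dominated, div_mixed)
```

A value near 0 means almost every male shares one lineage, while a value near 1 means almost every pair of males differs, so a founder-type expansion should show up as a low index.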


Genetic structure, divergence and admixture of Han Chinese, Japanese and Korean populations


Open access Genetic structure, divergence and admixture of Han Chinese, Japanese and Korean populations, by Wang, Lu, Chung, and Xu, Hereditas (2018) 155:19.

Abstract (emphasis mine):

Han Chinese, Japanese and Korean, the three major ethnic groups of East Asia, share many similarities in appearance, language and culture etc., but their genetic relationships, divergence times and subsequent genetic exchanges have not been well studied.

We conducted a genome-wide study and evaluated the population structure of 182 Han Chinese, 90 Japanese and 100 Korean individuals, together with the data of 630 individuals representing 8 populations worldwide. Our analyses revealed that Han Chinese, Japanese and Korean populations have distinct genetic makeup and can be well distinguished based on either the genome wide data or a panel of ancestry informative markers (AIMs). Their genetic structure corresponds well to their geographical distributions, indicating geographical isolation played a critical role in driving population differentiation in East Asia. The most recent common ancestor of the three populations was dated back to 3000 ~ 3600 years ago. Our analyses also revealed substantial admixture within the three populations which occurred subsequent to initial splits, and distinct gene introgression from surrounding populations, of which northern ancestral component is dominant.

These estimations and findings facilitate to understanding population history and mechanism of human genetic diversity in East Asia, and have implications for both evolutionary and medical studies.

Population level phylogenetic Tree and Principal component analysis (PCA). (A) The maximum likelihood tree was constructed based on pair-wise FST matrix. And the marked number are bootstrap value; (B) The top two PCs of individuals representing six East Asian populations, mapped to their corresponding geographic locations (generated by R 2.15.2 and Microsoft Excel 2010)

Interesting excerpts:

It is obvious that the genetic difference among the three East Asian groups initially resulted from population divergence due to pre-historical or historical migrations. Subsequently, different geographical locations where the three populations are located, mainland of China, Korean Peninsular and Japanese archipelago, respectively, apparently facilitated population differentiation due to physical isolation and independent genetic drift. Our estimations of population divergence time among the three groups, 1.2~ 3.6 KYA, are largely consistent with known history of the three populations and those related. However, considering that recent admixture could have reduced genetic difference between populations, it is likely the divergence time was underestimated.

We detected substantial gene flow among the three populations and also from the surrounding populations. For example, based on our analysis with the F3 test, Korean received gene flow from Han Chinese and Japanese, and gene flow also happened between Han Chinese and Japanese (Additional file 12: Table S3). These gene flows are expected to have reduced the genetic differentiation between the three ethnic groups. On the other hand, we also detected considerable gene flow from surrounding populations to the three populations studied. For instance, an ancestral population represented by Ryukyuan have contributed greater to Japanese than to Han Chinese, while southern ethnic group like Dai have contributed more to continent populations than to island and peninsula populations. Contrary to the gene flow among the three populations, these gene flows from surrounding populations are expected to have increased genetic difference among the three populations if they occurred independently and from different source populations. According to our results, the major source of gene flow to the three ethnic groups were substantially different, for example, the major source of gene flow to Han Chinese was from southern ethnic groups, the major source of gene flow to Japanese was from southern islands, and the major source of gene flow to Korean were from both mainland and islands. Therefore, those gene flows might have significantly contributed to further genetic differentiation of the three populations.
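For readers unfamiliar with the F3 test cited in the excerpt, here is a minimal sketch of the underlying statistic, f3(C; A, B), computed as the mean of (c - a)(c - b) over SNP allele frequencies. This is the naive estimator without the finite-sample correction or the block-jackknife significance test that real analyses apply, and the allele frequencies are invented toy values, not data from Wang et al. (2018).

```python
def f3(target, source_a, source_b):
    """Naive f3(target; source_a, source_b) from per-SNP allele frequencies.

    Hypothetical helper: no sample-size correction, no standard error.
    """
    if not target or not (len(target) == len(source_a) == len(source_b)):
        raise ValueError("frequency lists must be non-empty and of equal length")
    products = [(c - a) * (c - b) for c, a, b in zip(target, source_a, source_b)]
    return sum(products) / len(products)


# Toy allele frequencies at four SNPs (assumption, for illustration only).
pop_a = [0.1, 0.8, 0.3, 0.6]
pop_b = [0.5, 0.2, 0.9, 0.4]
# A population built as a 50/50 mixture of A and B.
mixed = [(a + b) / 2 for a, b in zip(pop_a, pop_b)]

f3_mixed = f3(mixed, pop_a, pop_b)   # negative: target sits between the sources
f3_source = f3(pop_a, mixed, pop_b)  # positive: A is not a mixture of the others
print(f3_mixed, f3_source)
```

A clearly negative value is taken as evidence that the target descends from a mixture of populations related to the two sources. A positive value does not rule admixture out, since drift in the target after mixing can mask the signal.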

The three populations have similar but not identical demographical history; they all experience a strong population expansion in the last 20,000 years. However, according to different geographic distribution, their effective population size and population expansion are different.

Although based on modern populations, the study is interesting in light of the potential implications for a Macro-Altaic proposal.


Statistical methods fashionable again in Linguistics: Reconstructing Proto-Australian dialects

Reconstructing remote relationships – Proto-Australian noun class prefixation, by Mark Harvey & Robert Mailhammer, Diachronica (2017) 34(4): 470–515


Evaluation of hypotheses on genetic relationships depends on two factors: database size and criteria on correspondence quality. For hypotheses on remote relationships, databases are often small. Therefore, detailed consideration of criteria on correspondence quality is important. Hypotheses on remote relationships commonly involve greater geographical and temporal ranges. Consequently, we propose that there are two factors which are likely to play a greater role in comparing hypotheses of chance, contact and inheritance for remote relationships: (i) spatial distribution of corresponding forms; and (ii) language specific unpredictability in related paradigms. Concentrated spatial distributions disfavour hypotheses of chance, and discontinuous distributions disfavour contact hypotheses, whereas hypotheses of inheritance may accommodate both. Higher levels of language-specific unpredictability favour remote over recent transmission. We consider a remote relationship hypothesis, the Proto-Australian hypothesis. We take noun class prefixation as a test dataset for evaluating this hypothesis against these two criteria, and we show that inheritance is favoured over chance and contact.

I was redirected to this work by my wife, who discovered it reading BBC News, and I approached it suspicious of its potential glottochronological content. However, I must say, speaking from my absolute ignorance of the main language family investigated, that it seemed in general an interesting read, with some thorough discussion and attention to detail.

The statistical analyses, however, seem to disrupt the content and, in my opinion, do not help support its conclusions.

Map of Non-Pama-Nyungan languages.

Computer Science and Linguistics

We are evidently on alert to tackle dubious research, because of the revival of pseudoscientific methods in linguistic investigation, promoted (yet again) by Nature.

It seems that journals with the highest impact factor, in their search for groundbreaking conclusions supported by any methods involving numbers, are setting a still lower level of standards for academic disciplines.

NOTE. If you think about it – if glottochronology has survived the disgrace it fell into in the 2000s, to come back again now to the top of the publishing industry… How can we expect the “Yamnaya ancestry” concept to be overcome? I guess we will still see certain Eastern Europeans in 2030 arguing for elevated steppe ancestry here and there to support the conclusions of the 2015 papers, no matter what…

I am sure that worse times lie ahead for traditional comparative grammar. For example, it seems that there will be more publications on Proto-Indo-European using novel computer methods: a group led by Janhunen and Pyysalo, from the Department of Languages at the University of Helsinki, promises – under an ever-growing bubble of mystery (or so it seems from their Twitter and Facebook accounts) – a machine-implemented reconstruction (with the generative etymological PIE lexicon project) that will once and for all solve all our previous ‘inconsistencies’…

Spoiler alert for their publications: whether they select to go on mainly with computer-implemented methods, or they use them to support more traditional results, their conclusions will confirm (surprise!) their authors’ previous reactionary theses, such as a renewed support for the traditional monolaryngealism, and a rejection of Kortlandt’s or Kloekhorst’s (i.e. the Leiden School’s) theories on Proto-Indo-European phonology, and thus a PIE relationship to Proto-Uralic, probably stressing yet again an independent origin for both proto-languages.


Oldest N1c1a1a-L392 samples and Siberian ancestry in Bronze Age Fennoscandia

Open access preprint at bioRxiv, Ancient Fennoscandian genomes reveal origin and spread of Siberian ancestry in Europe, by Lamnidis et al. (2018).

Abstract (emphasis mine):

European history has been shaped by migrations of people, and their subsequent admixture. Recently, evidence from ancient DNA has brought new insights into migration events that could be linked to the advent of agriculture, and possibly to the spread of Indo-European languages. However, little is known so far about the ancient population history of north-eastern Europe, in particular about populations speaking Uralic languages, such as Finns and Saami. Here we analyse ancient genomic data from 11 individuals from Finland and Northwest Russia. We show that the specific genetic makeup of northern Europe traces back to migrations from Siberia that began at least 3,500 years ago. This ancestry was subsequently admixed into many modern populations in the region, in particular populations speaking Uralic languages today. In addition, we show that ancestors of modern Saami inhabited a larger territory during the Iron Age than today, which adds to historical and linguistic evidence for the population history of Finland.

Interesting excerpts (edited):

While the Siberian genetic component described here was previously described in modern-day populations from the region, we gain further insights into its temporal depth. Our data suggest that this fourth genetic component found in modern-day north-eastern Europeans arrived in the area around 4,000 years ago at the latest, as illustrated by ALDER dating using the ancient genome-wide data from Bolshoy Oleni Ostrov. The upper bound for the introduction of this component is harder to estimate. The component is absent in the Karelian hunter-gatherers (EHG) [3] dated to 8,300-7,200 yBP as well as Mesolithic and Neolithic populations from the Baltics from 8,300 yBP and 7,100-5,000 yBP respectively. While this suggests an upper bound of 5,000 yBP for the arrival of Siberian ancestry, we cannot exclude the possibility of its presence even earlier, yet restricted to more northern regions, as suggested by its absence in populations in the Baltic during the Bronze Age. Our study also presents the earliest occurrence of the Y-chromosomal haplogroup N1c in Fennoscandia. N1c is common among modern Uralic speakers, and has also been detected in Hungarian individuals dating to the 10th century, yet it is absent in all published Mesolithic genomes from Karelia and the Baltics.

The large Siberian component in the Bolshoy individuals from the Kola Peninsula provides the earliest direct genetic evidence for an eastern migration into this region. Such contact is well documented in archaeology, with the introduction of asbestos-mixed Lovozero ceramics during the second millennium BC, and the spread of even-based arrowheads in Lapland from 1,900 BCE. Additionally, the nearest counterparts of Vardøy ceramics, appearing in the area around 1,600-1,300 BCE, can be found on the Taymyr peninsula, much further to the east. Finally, the Imiyakhtakhskaya culture from Yakutia spread to the Kola Peninsula during the same period. Contacts between Siberia and Europe are also recognised in linguistics. The fact that the Siberian genetic component is consistently shared among Uralic-speaking populations, with the exceptions of Hungarians and the non-Uralic speaking Russians, would make it tempting to equate this component with the spread of Uralic languages in the area. However, such a model may be overly simplistic. First, the presence of the Siberian component on the Kola Peninsula at ca. 4000 yBP predates most linguistic estimates of the spread of Uralic languages to the area. Second, as shown in our analyses, the admixture patterns found in historic and modern Uralic speakers are complex and in fact inconsistent with a single admixture event. Therefore, even if the Siberian genetic component partly spread alongside Uralic languages, it likely presented only an addition to populations carrying this component from earlier.

Plot of ADMIXTURE (K=3) results containing West Eurasian populations and the Nganasan. Ancient individuals from this study are represented by thicker bars.

The novel genome-wide data here presented from ancient individuals from Finland opens new insights into Finnish population history. Two of the three higher coverage individuals and all six low coverage individuals from Levänluhta showed low genetic affinity to modern-day Finnish speakers of the area. Instead, an increased affinity was observed to modern-day Saami speakers, now mostly residing in the north of the Scandinavian Peninsula. These results suggest that the geographic range of the Saami extended further south in the past, and hints at a genetic shift at least in the western Finnish region during the Iron Age. The findings are in concordance with the noted linguistic shift from Saami languages to early Finnish. Further ancient DNA from Finland is needed to conclude to what extent these signals of migration and admixture are representative of Finland as a whole.

PCA plot of 113 Modern Eurasian populations, with individuals from this study projected on the principal components. Uralic speakers are highlighted in light purple.

The two samples of haplogroup N1c1a1a-L392/L1026, dated ca. 1500 BC, come from the site Bolshoy Oleniy Ostrov, in the Kola Peninsula.

Bolshoy Oleniy Ostrov (Great Reindeer Island), situated in the Kola Bay of the Barents Sea and separated from the mainland by Yekaterininsky Island and two straits, harbors the ancient cemetery of an unknown Early Metal Age culture. The preservation of artifacts made from bone and antler, wooden structures, as well as human remains is remarkable for the location and age this site represents. Altogether 19 skeletons of adults and children have been recognized from both single and collective burials of the site, together with more than 250 artifacts. (…) Apart from these excavations, approximately 25 burials were revealed in 1934 during the construction of fortifications. (…) Radiocarbon dates are provided by Moiseyev and Khartanovich in their 2012 study, placing the site in middle to the late 2nd millennium BC (…)

After seeing how Late Indo-European languages spread with Yamna and (mainly) R1b-L23 lineages, we are now obtaining proof of how Siberian ancestry – likely accompanying N1c-L392 lineages – was probably related to an early archaeological Siberian influence in the easternmost region of North-East Europe, seen also probably in linguistics.

NOTE. Whereas I proposed – based mainly on common guesstimates – that R1a-M417 and EHG ancestry might have signaled the arrival of an early Yukaghir substratum to NE Europe, later acquired by Uralic spreading over this territory, while N1c1a1a lineages with the Seima-Turbino phenomenon might have given Uralic its later Altaic traits, it is indeed possible – and more likely with the findings in this paper – that N1c1a1a lineages may have in fact spread Yukaghir languages, especially if (like the Leiden school) one supports an Indo-Uralic community.

The linguistic effect of this migration may depend on one’s preferred model for Proto-Uralic and its strata, and especially on one’s position in the Proto-Uralic vs. Proto-Uralo-Yukaghir controversy. Although I really didn’t have a strong opinion on this matter, it is clear from my texts that (unlike Kortlandt) I didn’t consider Yukaghir to share a common ancestor with Uralic languages. What genomics is showing right now seems to me directly translatable to a linguistic model, and we should therefore reject an original Proto-Uralo-Yukaghir community.

Also, it seems that the Finnish population peak which expanded today’s prevalent N1c-L392 lineages – after the Iron Age bottleneck which likely reduced its haplogroup diversity – may have been associated with the event that displaced the Saami population from Finland after ca. 1000 AD.

I think it is becoming still clearer where Uralic languages came from.


Model for the spread of Transeurasian (Macro-Altaic) communities with farming


Austronesian influence and Transeurasian ancestry in Japanese: A case of farming/language dispersal, by Martine Robbeets, Max Planck Institute for the Science of Human History.


In this paper, I propose a hypothesis reconciling Austronesian influence and Transeurasian ancestry in the Japanese language, explaining the spread of the Japanic languages through farming dispersal. To this end, I identify the original speech community of the Transeurasian language family as the Neolithic Xinglongwa culture situated in the West Liao River Basin in the sixth millennium bc. I argue that the separation of the Japanic branch from the other Transeurasian languages and its spread to the Japanese Islands can be understood as occurring in connection with the dispersal of millet agriculture and its subsequent integration with rice agriculture. I further suggest that a prehistorical layer of borrowings related to rice agriculture entered Japanic from a sister language of proto-Austronesian, at a time when both language families were still situated in the Shandong-Liaodong interaction sphere.

Classification of the Transeurasian languages according to Robbeets (forthcoming)

Another interesting anthropological model to validate with future genomic analyses, although I was never convinced about a grouping (let alone reconstructible proto-language) beyond Micro-Altaic languages.

NOTE. The Max Planck Institute may be a great source of scientific advancement, but in Linguistics you can see from the projects Indo-European languages originate in Anatolia (2012) and A massive migration from the steppe brought Indo-European languages to Europe (2015) (the last one referring to the Corded Ware culture, associated with the study by Haak et al. 2015) that they have not got it quite right with Proto-Indo-European… I like the traditional approach of this paper, though, including a thorough assessment of archaeological and linguistic details.

Featured images: Left. The eastward spread of millet agriculture in association with ancestral speech communities. Right: The spread of agriculture and language to Japan.

See also:

Y chromosome C2*-star cluster traces back to ordinary Mongols, rather than Genghis Khan


Article behind paywall, Whole-sequence analysis indicates that the Y chromosome C2*-Star Cluster traces back to ordinary Mongols, rather than Genghis Khan, by Wei, Yan, Lu, et al. Eur J Hum Genet (2018); 26:230–237


The Y-chromosome haplogroup C3*-Star Cluster (revised to C2*-ST in this study) was proposed to be the Y-profile of Genghis Khan. Here, we re-examined the origin of C2*-ST and its associations with Genghis Khan and Mongol populations. We analyzed 34 Y-chromosome sequences of haplogroup C2*-ST and its most closely related lineage. We redefined this paternal lineage as C2b1a3a1-F3796 and generated a highly revised phylogenetic tree of the haplogroup, including 36 sub-lineages and 265 non-private Y-chromosome variants. We performed a comprehensive analysis and age estimation of this lineage in eastern Eurasia, including 18,210 individuals from 292 populations. We discovered that the origin of populations with high frequencies of C2*-ST can be traced to either an ancient Niru’un Mongol clan or ordinary Mongol tribes. Importantly, the age of the most recent common ancestor of C2*-ST (2576 years, 95% CI = 1975–3178) and its sub-lineages, and their expansion patterns, are consistent with the diffusion of all Mongolic-speaking populations, rather than Genghis Khan himself or his close male relatives. We concluded that haplogroup C2*-ST is one of the founder paternal lineages of all Mongolic-speaking populations, and direct evidence of an association between C2*-ST and Genghis Khan has yet to be discovered.
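The dating argument in this abstract can be checked with simple arithmetic. The sketch below is illustrative only: the TMRCA and its 95% interval come from the abstract, whereas the reference year (2018) and Genghis Khan's approximate birth year (c. 1162 AD) are assumptions added here, and the function names are hypothetical.

```python
# TMRCA of C2*-ST and its 95% CI, in years before present (from the abstract).
TMRCA_YEARS = 2576
CI_LOW, CI_HIGH = 1975, 3178

# Assumptions not in the abstract: "present" is taken as the publication year,
# and Genghis Khan's birth is taken as c. 1162 AD.
REFERENCE_YEAR = 2018
GENGHIS_BIRTH_YEAR = 1162


def years_before_present(calendar_year, reference_year=REFERENCE_YEAR):
    """Convert a calendar year (AD) into years before the reference year."""
    return reference_year - calendar_year


def within_interval(age, low=CI_LOW, high=CI_HIGH):
    """True if an age in years falls inside the reported 95% interval."""
    return low <= age <= high


genghis_age = years_before_present(GENGHIS_BIRTH_YEAR)
inside_ci = within_interval(genghis_age)
gap_to_youngest_bound = CI_LOW - genghis_age

print(f"Genghis Khan's birth: about {genghis_age} years before {REFERENCE_YEAR}")
print(f"Inside the 95% CI of the TMRCA? {inside_ci}")
print(f"Years between his birth and the youngest CI bound: {gap_to_youngest_bound}")
```

Under these assumptions, even the youngest bound of the interval lies more than a millennium before his birth, which is the sense in which the lineage is said to predate Genghis Khan and his close male relatives.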

This is a great example of the mistake one can make when assessing the leading clans of population expansions from the perspective of the renowned case of the Uí Néill clan’s expansion in Ireland.

Just some days ago I wrote about the first Hungarian dynasty’s haplogroup R1a, and the potential association of other Ugric-speaking clans with R1a subclades, so let’s wait and see if future papers on other ancient Hungarian clans and Hungarian settlers bring surprises…


Prehistoric loan relations: Foreign elements in the Proto-Indo-European vocabulary


An interesting ongoing web project, Prehistoric loan relations, collects potential loanwords in Proto-Indo-European attributed to Uralic-Yukaghir, Caucasian, and Middle Eastern influence.

Based on a master’s thesis by Bjørn (2017), Foreign elements in the Proto-Indo-European vocabulary (PDF).

From the website (emphasis mine):

This page allows historical linguists to compare and scrutinize proposed prehistoric lexical borrowings from the perspective of Proto-Indo-European. The first entries are all (135 in total) extracted from my master’s thesis “Foreign elements in the Proto-Indo-European vocabulary” (Bjørn 2017). Comments are encouraged at the bottom of each entry. New entries will be added, also on request.

Take this not as the conclusion, but an invitation to join the conversation.

So, we welcome the invitation, and hope that this new project thrives.

Also, I loved his fantasy-like map of the central Eurasian region (featured image on this post).


Expansion of peoples associated with spread of haplogroups: Mongols and C3*-F3918, Arabs and E-M183 (M81)


The expansion of peoples is known to be associated with the spread of a certain admixture component, together with the expansion of a haplogroup and a reduction in its variability. In other words, a few male lineages are usually more successful during the expansion.
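The idea of reduced haplogroup variability can be made concrete with a diversity index. The following is a minimal sketch, assuming Nei's unbiased gene diversity as the measure (a common choice, not one named in this post); the two samples and their labels are invented placeholders, not real data.

```python
from collections import Counter


def haplogroup_diversity(haplogroups):
    """Nei's unbiased gene diversity: n / (n - 1) * (1 - sum of squared frequencies)."""
    n = len(haplogroups)
    if n < 2:
        raise ValueError("need at least two sampled individuals")
    counts = Counter(haplogroups)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)


# Hypothetical samples of eight males each (labels are placeholders).
before_expansion = ["A", "A", "B", "B", "C", "C", "D", "D"]
after_expansion = ["A", "A", "A", "A", "A", "A", "A", "B"]

diversity_before = haplogroup_diversity(before_expansion)
diversity_after = haplogroup_diversity(after_expansion)

print(f"Diversity before expansion: {diversity_before:.3f}")
print(f"Diversity after expansion:  {diversity_after:.3f}")
```

A value near 1 means almost every sampled male carries a different lineage; a value near 0 means one lineage dominates, which is the pattern expected after a few male lines expand at the expense of the rest.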

Known examples include:

Two recent interesting papers add prehistoric cases of potential expansion of cultures associated with haplogroups:

1. Whole Y-chromosome sequences reveal an extremely recent origin of the most common North African paternal lineage E-M183 (M81), by Solé-Morata et al., Scientific Reports (2017).


E-M183 (E-M81) is the most frequent paternal lineage in North Africa and thus it must be considered to explore past historical and demographical processes. Here, by using whole Y chromosome sequences from 32 North African individuals, we have identified five new branches within E-M183. The validation of these variants in more than 200 North African samples, from which we also have information of 13 Y-STRs, has revealed a strong resemblance among E-M183 Y-STR haplotypes that pointed to a rapid expansion of this haplogroup. Moreover, for the first time, by using both SNP and STR data, we have provided updated estimates of the times-to-the-most-recent-common-ancestor (TMRCA) for E-M183, which evidenced an extremely recent origin of this haplogroup (2,000–3,000 ya). Our results also showed a lack of population structure within the E-M183 branch, which could be explained by the recent and rapid expansion of this haplogroup. In spite of a reduction in STR heterozygosity towards the West, which would point to an origin in the Near East, ancient DNA evidence together with our TMRCA estimates point to a local origin of E-M183 in NW Africa.

Distribution of E-M183 subclades among North Africa, the Near East and the Iberian Peninsula. Pie chart sectors areas are proportional to haplogroup frequency and are coloured according to haplogroup in the schematic tree to the right. n: sample size. Map was generated using R software.

An interesting excerpt, from the discussion:

Regarding the geographical origin of E-M183, a previous study suggested that an expansion from the Near East could explain the observed east-west cline of genetic variation that extends into the Near East. Indeed, our results also showed a reduction in STR heterozygosity towards the West, which may be taken to support the hypothesis of an expansion from the Near East. In addition, previous studies based on genome-wide SNPs reported that a North African autochthonous component increase towards the West whereas the Near Eastern decreases towards the same direction, which again support an expansion from the Near East. However, our correlations should be taken carefully because our analysis includes only six locations on the longitudinal axis, none from the Near East. As a result, we do not have sufficient statistical power to confirm a Near Eastern origin. In addition, rather than showing a west-to-east cline of genetic diversity, the overall picture shown by this correlation analysis evidences just low genetic diversity in Western Sahara, which indeed could be also caused by the small sample size (n = 26) in this region. Alternatively, given the high frequency of E-M183 in the Maghreb, a local origin of E-M183 in NW Africa could be envisaged, which would fit the clear pattern of longitudinal isolation by distance reported in genome-wide studies. Moreover, the presence of autochthonous North African E-M81 lineages in the indigenous population of the Canary Islands, strongly points to North Africa as the most probable origin of the Guanche ancestors. This, together with the fact that the oldest indigenous individuals have been dated 2210 ± 60 ya, supports a local origin of E-M183 in NW Africa. Within this scenario, it is also worth to mention that the paternal lineage of an early Neolithic Moroccan individual appeared to be distantly related to the typically North African E-M81 haplogroup30, suggesting again a NW African origin of E-M183.
A local origin of E-M183 in NW Africa > 2200 ya is supported by our TMRCA estimates, which can be taken as 2,000–3,000, depending on the data, methods, and mutation rates used.

The TMRCA estimates of a certain haplogroup and its subbranches provide some constraints on the times of their origin and spread. Although our time estimates for E-M78 are slightly different depending on the mutation rate used, their confidence intervals overlap and the dates obtained are in agreement with those obtained by Trombetta et al. Regarding E-M183, as mentioned above, we cannot discard an expansion from the Near East and, if so, according to our time estimates, it could have been brought by the Islamic expansion on the 7th century, but definitely not with the Neolithic expansion, which appeared in NW Africa ~7400 BP and may have featured a strong Epipaleolithic persistence. Moreover, such a recent appearance of E-M183 in NW Africa would fit with the patterns observed in the rest of the genome, where an extensive, male-biased Near Eastern admixture event is registered ~1300 ya, coincidental with the Arab expansion. An alternative hypothesis would involve that E-M183 was originated somewhere in Northwest Africa and then spread through all the region. Our time estimates for the origin of this haplogroup overlap with the end of the third Punic War (146 BCE), when Carthage (in current Tunisia) was defeated and destroyed, which marked the beginning of Roman hegemony of the Mediterranean Sea. About 2,000 ya North Africa was one of the wealthiest Roman provinces and E-M183 may have experienced the resulting population growth.

2. The Y-chromosome haplogroup C3*-F3918, likely attributed to the Mongol Empire, can be traced to a 2500-year-old nomadic group, by Zhang et al., Journal of Human Genetics (2017).


The Mongol Empire had a significant role in shaping the landscape of modern populations. Many populations living in Eurasia may have been the product of population mixture between ancient Mongolians and natives following the expansion of Mongol Empire. Geneticists have found that most of these populations carried the Y-haplogroup C3* (C-M217). To trace the history of haplogroup (Hg) C3* and to further understand the origin and development of Mongolians, ancient human remains from the Jinggouzi, Chenwugou and Gangga archaeological sites, which belonged to the Donghu, Xianbei and Shiwei, respectively, were analysed. Our results show that nine of the eleven males of the Gangga site, two of the eight males of Chengwugou site and all of the twelve males of Jinggouzi site were found to have mutations at M130 (Hg C), M217 (Hg C3), L1373 (C2b, ISOGG2015), with the absence of mutations at M93 (Hg C3a), P39 (Hg C3b), M48 (Hg C3c), M407 (Hg C3d) and P62 (Hg C3f). These samples were attributed to the Y-chromosome Hg C3* (Hg C2b, ISOGG2015), and most of them were further typed as Hg C2b1a based on the mutation at F3918. Finally, we inferred that the Y-chromosome Hg C3*-F3918 can trace its origins to the Donghu ancient nomadic group.
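The per-site counts in this abstract are small, so it helps to see how wide the uncertainty around each proportion is. The sketch below takes the counts directly from the abstract; the Wilson score interval is an addition made here for illustration (the abstract reports no intervals), and the function name is hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, centre - half), min(1.0, centre + half)


# (C3* carriers, males tested) per site, as reported in the abstract.
sites = {
    "Gangga": (9, 11),
    "Chenwugou": (2, 8),
    "Jinggouzi": (12, 12),
}

results = {}
for name, (carriers, males) in sites.items():
    low, high = wilson_interval(carriers, males)
    results[name] = (carriers / males, low, high)
    print(f"{name}: {carriers}/{males} = {carriers / males:.0%} "
          f"(approx. 95% interval {low:.0%} to {high:.0%})")
```

With a dozen or fewer males per site the intervals are wide, so the contrast between sites is better read as a qualitative pattern than as precise frequencies.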

The development of Mongolia and the frequencies of haplogroup C3* in modern Eurasians. (a) The development of Mongolia. (b) The frequencies of haplogroup C3 in modern Eurasians. The dotted line represents the approximate boundary between the Xiongnu and the Donghu. The black and grey arrows denote the migration of the Donghu and Mongolians, respectively.

Featured image: Diachronic map of Iron Age migrations ca. 750-250 BC.


How to do modern phylogeography: Relationships between clans and genetic kin explain cultural similarities over vast distances


A preprint has been published on bioRxiv, Relationships between clans and genetic kin explain cultural similarities over vast distances: the case of Yakutia, by Zvenigorosky et al. (2017).


Archaeological studies sample ancient human populations one site at a time, often limited to a fraction of the regions and periods occupied by a given group. While this bias is known and discussed in the literature, few model populations span areas as large and unforgiving as the Yakuts of Eastern Siberia. We systematically surveyed 31,000 square kilometres in the Sakha Republic (Yakutia) and completed the archaeological study of 174 frozen graves, assembled between the 15th and the 19th century. We analysed genetic data (autosomal genotypes, Y-chromosome haplotypes and mitochondrial haplotypes) for all ancient subjects and confronted it to the study of 190 modern subjects from the same area and the same population. Ancient familial links and paternal clan were identified between graves up to 1500 km apart and we provide new data concerning the origins of the contemporary Yakut population and demonstrate that cultural similarities in the past were linked to (i) the expansion of specific paternal clans, (ii) preferential marriage among the elites and (iii) funeral choices that could constitute a bias in any ancient population study.

Even if you are not interested in the cultural and anthropological evolution of this Turkic-speaking people of the Russian Far Eastern region, the method used is an excellent example of how to use archaeology and genetics (especially Y-DNA and mtDNA data) to obtain meaningful results when investigating ancient populations.

For quite some time, probably since the first renowned admixture analyses of ancient DNA samples were published, we have been living under the impression that phylogeography, or simply archaeogenetics as it was called back in the day, is not needed.

Cavalli-Sforza’s assertion that the study of modern populations could offer a clear picture of past population movements is now considered wrong, and the study of Y-DNA and mtDNA haplogroups is today mostly dismissed as of secondary importance, even among geneticists. Whole-genome investigations (and especially admixture analyses) have been leading the new wave of overconfidence in genetic results, closely tied to ignorance of their shortcomings (and to commercial interests based on desires of ethnic identification), and haplogroups are usually just reported alongside other, not entirely meaningful aspects of ancient DNA analyses.

While it is undeniable that admixture analyses are offering quite interesting results, they must be carefully balanced against known archaeological and linguistic knowledge. Phylogeography – and especially Y-DNA haplogroup assessment – is particularly useful for investigating kinship and clans in patrilocal communities – i.e. most communities in prehistoric and historic periods, unless proven otherwise.

Luckily enough, there are those researchers who still strive to obtain meaningful information from haplotypes. The article referenced in this post is quite interesting due to its phylogeographic method’s applicability to ancient cultures and peoples.

When some geneticists look at simplistic prehistoric maps, like those depicting Yamna, Afanasevo, Corded Ware, and Bell Beaker cultures together, they forget that 1) cultural regions are selected more or less arbitrarily (we only have certain scattered sites for each of these cultures); 2) economic or population contacts are difficult to ascertain and to represent graphically; and 3) time periods for archaeological sites are important – in fact, they are probably THE most important aspect in assessing how accurately a map (and its “arrows” of migration or exchange) represents reality.

A careful, detailed study like this one, if applied to the Pontic-Caspian steppe, would probably reveal how R1b subclades dominated steppe clans, beginning at least during the Suvorovo-Novodanilovka expansion to the west, and certainly representing the vast majority of lineages during the internal expansion in the Early Yamna period and its later expansion east and west of the steppe…

Featured image from the article, summing up Geography, Archaeology, and Genetics of Yakutia – including Y-DNA and mtDNA haplogroups from ancient populations.