After the Uralic link preferred in Europe, the second most popular explanation for the expansion of haplogroup N1a-L392 (ca. 4400 BC) is apparently its association with Turkic, and by extension with Micro-Altaic, at least among certain eastern researchers.
According to the views of a number of authoritative researchers, the Yakut ethnos was formed in the territory of Yakutia as a result of the mixing of people from the south and the autochthonous population.
These three major Sakha paternal lineages may have also arrived in Yakutia at different times and/or from different places and/or with a difference in several generations instead, or perhaps Y-chromosomal STR mutations may have taken place in situ in Yakutia. Nevertheless, the immediate common ancestor(s) from the Asian Steppe of these three most prevalent Sakha Y-chromosomal STR haplotypes possibly lived during the prominence of the Turkic Khaganates, hence the near-perfect matches observed across a wide range of Eurasian geography, including as far as from Cyprus in the West to Liaoning, China in the East, then Middle Lena in the North and Afghanistan in the South (Table 3 and Figure 5). There may also be haplotypes closely-related to ‘the dominant Elley line’ among Karakalpaks, Uzbeks and Tajiks, however, limitations in the loci coverage for the available dataset (only eight Y-chromosomal STR loci) precludes further conclusions on this matter.
According to the results presented here, very similar Y-STR haplotypes to that of the original Elley line were found in the west: Afghanistan and northern Cyprus, and in the east: Liaoning Province, China and Ulaanbaator, Northern Mongolia. In the case of the dominant Omogoy line, very closely matching haplotypes differing by a single mutational step were found in the city of Chifen of the Jirin Province, China. The widest range of similar haplotypes was found for the Yakut haplotype Unknown: In Mongolia, China and South Korea. For instance, haplotypes differing by a single step mutation were found in Northern Mongolia (Khalk, Darhad, Uryankhai populations), Ulaanbaator (Khalk) and in the province of Jirin, China (Han population).
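The idea of haplotypes "differing by a single mutational step" can be illustrated with a minimal sketch. It assumes a simple stepwise model in which each one-repeat difference at a locus counts as one step. The locus names are real Y-STR markers, but the allele values and sample names below are invented for illustration and are not taken from the study.

```python
# Hypothetical Y-STR haplotypes: locus names are real markers, but the
# repeat counts and sample names are made up for illustration only.
reference = {"DYS19": 14, "DYS389I": 14, "DYS390": 23,
             "DYS391": 11, "DYS392": 16, "DYS393": 14}

database = {
    "sample_A": dict(reference),                            # identical haplotype
    "sample_B": {**reference, "DYS390": 24},                # one repeat away
    "sample_C": {**reference, "DYS19": 15, "DYS391": 10},   # two steps away
}


def step_distance(h1, h2):
    """Sum of absolute repeat-count differences over loci typed in both."""
    shared = h1.keys() & h2.keys()
    return sum(abs(h1[locus] - h2[locus]) for locus in shared)


def matches(ref, db, max_steps=1):
    """Return samples within max_steps of ref, with their step distance."""
    result = {}
    for name, haplotype in db.items():
        distance = step_distance(ref, haplotype)
        if distance <= max_steps:
            result[name] = distance
    return result


print(matches(reference, database))
```

With only a handful of loci typed, many unrelated haplotypes can fall within one step of each other, which is why the number of shared loci matters when reading such matches.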
Notably, Tat-C-bearing Y-chromosomes were also observed in ancient DNA samples from the 2700-3000 years-old Upper Xiajiadian culture in Inner Mongolia, as well as those from the Serteya II site at the Upper Dvina region in Russia and the ‘Devichyi gory’ culture of long barrow burials at the Nevel’sky district of Pskovsky region in Russia. A 14-loci Y-chromosomal STR median-joining network of the most prevalent Sakha haplotypes and a Tat-C-bearing haplotype from one of the ancient DNA samples recovered from the Upper Xiajiadian culture in Inner Mongolia (DSQ04) revealed that the contemporary Sakha haplotype ‘Xuo’ (Table 2, Haplotype ID “Xuo”) classified as that of ‘the Xiongnu clan’ in our current study, was the closest to the ancient Xiongnu haplotype (Figure 6). TMRCA estimate for this 14-loci Y-chromosomal STR network was 4357 ± 1038 years or 2341 ± 1038 BCE, which correlated well with the Upper Xiajiadian culture that was dated to the Late Bronze Age (700-1000 BCE).
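The conversion between the two figures quoted for the TMRCA can be sketched as follows. The reference year of 2016 is an assumption inferred from the two quoted numbers (4357 − 2341 = 2016); the quoted text does not state it. The sketch also ignores the absence of a year zero.

```python
REFERENCE_YEAR = 2016  # assumption: inferred from 4357 - 2341, not stated in the quote


def years_ago_to_bce(years_ago, reference_year=REFERENCE_YEAR):
    """Convert an age in years before the reference year to a BCE year."""
    return years_ago - reference_year


tmrca, sd = 4357, 1038
central = years_ago_to_bce(tmrca)        # central estimate
older = years_ago_to_bce(tmrca + sd)     # older bound of the one-sigma range
younger = years_ago_to_bce(tmrca - sd)   # younger bound of the one-sigma range

print(central, older, younger)
```

The resulting one-sigma range can then be compared directly with the 700-1000 BCE dating quoted for the Upper Xiajiadian culture.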
A simple look at TMRCA estimates and modern distribution was enough, long ago, to hypothesize that N1c-L392 has no specific connection with Altaic or Uralic peoples. From Ilumäe et al. (2016):
Previous research has shown that Y chromosomes of the Turkic-speaking Yakuts (Sakha) belong overwhelmingly to hg N3 (formerly N1c1). We found that nearly all of the more than 150 genotyped Yakut N3 Y chromosomes belong to the N3a2-M2118 clade, just as in the Turkic-speaking Dolgans and the linguistically distant Tungusic-speaking Evenks and Evens living in Yakutia (Table S2). Hence, the N3a2 patrilineage is a prime example of a male population of broad central Siberian ancestry that is not intrinsic to any linguistically defined group of people. Moreover, the deepest branch of hg N3a2 is represented by a Lebanese and a Chinese sample. This finding agrees with the sequence data from Hallast et al., where one Turkish Y chromosome was also assigned to the same sub-clade. Interestingly, N3a2 was also found in one Bhutan individual who represents a separate sub-lineage in the clade. These findings show that although N3a2 reflects a recent strong founder effect primarily in central Siberia (Yakutia, Sakha), the sub-clade has a much wider distribution area with incidental occurrences in the Near East and South Asia.
The most striking aspect of the phylogeography of hg N is the spread of the N3a3’6-CTS6967 lineages. Considering the three geographically most distant populations in our study—Chukchi, Buryats, and Lithuanians—it is remarkable to find that about half of the Y chromosome pool of each consists of hg N3 and that they share the same sub-clade N3a3’6. The fractionation of N3a3’6 into the four sub-clades that cover such an extraordinarily wide area occurred in the mid-Holocene, about 5.0 kya (95% CI = 4.4–5.7 kya). It is hard to pinpoint the precise region where the split of these lineages occurred. It could have happened somewhere in the middle of their geographic spread around the Urals or further east in West Siberia, where current regional diversity of hg N sub-lineages is the highest (Figure 1B). Yet, it is evident that the spread of the newly arisen sub-clades of N3a3’6 in opposing directions happened very quickly. Today, it unites the East Baltic, East Fennoscandia, Buryatia, Mongolia, and Chukotka-Kamchatka (Beringian) Eurasian regions, which are separated from each other by approximately 5,000–6,700 km by air. N3a3’6 has high frequencies in the patrilineal pools of populations belonging to the Altaic, Uralic, several Indo-European, and Chukotko-Kamchatkan language families. There is no generally agreed, time-resolved linguistic tree that unites these linguistic phyla. Yet, their split is almost certainly at least several millennia older than the rather recent expansion signal of the N3a3’6 sub-clade, suggesting that its spread had little to do with linguistic affinities of men carrying the N3a3’6 lineages.
It was thus clear long ago that N1c-L392 lineages must have expanded explosively in the 5th millennium through Northern Eurasia, probably from a region to the north of Lake Baikal, and that this expansion – and succeeding ones through Northern Eurasia – may not be associated with any known language group until well into the common era.
A new paper (behind paywall) offers insight into the prevalent presence of R1a-Z93 among eastern Scytho-Siberian groups (most likely including Samoyedic speakers in the forest-steppes), and a new hint to the westward expansion of haplogroups Q and N (probably coupled with the so-called “Siberian ancestry”) from the east with different groups of Iron Age steppe nomads:
From an archeological and historical point of view, the term “Scythians” refers to Iron Age nomadic or seminomadic populations characterized by the presence of three types of artifacts in male burials: typical weapons, specific horse harnesses and items decorated in the so-called “Animal Style”. This complex of goods has been termed the “Scythian triad” and was considered to be characteristic of nomadic groups belonging to the “Scythian World” (Yablonsky 2001). This “Scythian World” includes both the Classic (or European) Scythians from the North Pontic region (7th–3rd century BC) and the Southern Siberian (or Asian) populations of the Scythian period (also called Scytho-Siberians). These include, among others, the Sakas from Kazakhstan, the Tagar population from the Minusinsk Basin (Republic of Khakassia), the Aldy-Bel population from Tuva (Russian Federation) and the Pazyryk and Sagly cultures from the Altai Mountains.
In this work, we first aim to address the question of the familial and social organization of Scytho-Siberian groups by studying the genetic relationship of 29 individuals from the Aldy-Bel and Sagly cultures using autosomal STRs. (…) were obtained from 5 archeological sites located in the valley of the Eerbek river in Tuva Republic, Russia (Fig. 1). All the mounds of this archeological site were excavated but DNA samples were not collected from all of them. 14C dates mainly fall within the Hallstatt radiocarbon calibration plateau (ca. 800–400 cal BC) where the chronological resolution is poor. Only one date falls on an earlier segment of calibration curve: Le 9817: 2650 ± 25 BP, i.e. 843–792 cal BC with a probability of 94.3% (using the OxCal v4.3.2 program). This sample (Bai-Dag 8, Kurgan 1, grave 10) is not from one of the graves studied but was used to date the kurgan as a whole.
Y-chromosome haplogroups were first assigned using the ISOGG 2018 nomenclature. In order to improve the precision of haplogroup definition, we also analyzed a set of Y-chromosome SNP (Supplementary Table 2). Nine samples belonged to the R1a-M513 haplogroup (defined by marker M513) and two of these nine samples were characterized as belonging to the R1a1a1b2-Z93 haplogroup or one of its subclades. Six samples belonged to the Q1b1a-L54 haplogroup and five of these six samples belonged to the Q1b1a3-L330 subclade. One sample belonged to the N-M231 haplogroup.
The distribution of these haplogroups in the population must be confronted with the prevalence of kinship among the samples. Although five individuals belonged to haplogroup Q1b1a3-L330, three of them (ARZ-T18, ARZ-T19 and ARZ-T20) were paternally related (Fig. 2). It must, therefore, be considered that haplogroup Q1b1a3-L330 is present in three independent instances (given that the remaining two instances exhibit no close familial relationship with other samples or one another). All five were buried on the Eki-Ottug 1 archaeological site (although in two different kurgans).
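The counting argument in this passage (five carriers, three of them paternally related, hence three independent instances) can be written down as a small sketch. The three ARZ identifiers come from the quoted text; the labels for the two remaining individuals are placeholders, since the excerpt does not name them.

```python
def independent_instances(samples, kin_groups):
    """Count lineage instances, collapsing each paternal kin group to one."""
    related = set().union(*kin_groups)
    unrelated = [s for s in samples if s not in related]
    return len(kin_groups) + len(unrelated)


# Placeholder labels: "unnamed-1" and "unnamed-2" stand for the two L330
# individuals that the excerpt does not identify by sample ID.
l330_samples = ["ARZ-T18", "ARZ-T19", "ARZ-T20", "unnamed-1", "unnamed-2"]
l330_kin_groups = [{"ARZ-T18", "ARZ-T19", "ARZ-T20"}]

print(independent_instances(l330_samples, l330_kin_groups))
```

Raw counts of a haplogroup in a burial ground can overstate its frequency when close relatives are buried together, which is the point the authors make here.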
In the same way, although two groups, of two and three individuals, shared haplotypes belonging to the R1a-M513 haplogroup, these groups likely include a father/son pair (ARZ-T2 and ARZ-T12). Therefore, among nine R1a-M513 men, we found six independent haplotypes, one being present in two independent instances. All R1a-M513 haplotypes, however, including those attributed to the R1a1a1b2-Z93 subclade, only differed by one-step mutations, across 5 loci at most. All R1a-M513 individuals were buried on the same site, Eki-Ottug 2, in a single Kurgan.
Haplogroup R1a-M173 was previously reported for 6 Scytho-Siberian individuals from the Tagar culture (Keyser et al. 2009) and one Altaian Scytho-Siberian from the Sebÿstei site (Ricaut et al. 2004a), whereas haplogroup R1a1a1b2-Z93 (or R1a1a1b-S224) was described for one Scythian from Samara (Mathieson et al. 2015) and two Scytho-Siberians from Berel and the Tuva Republic (Unterländer et al. 2017). On the contrary, North Pontic Scythians were found to belong to the R1b1a1a2 haplogroup (Krzewińska et al. 2018), showing a distinction between the two groups of Scythians. (…) The absence of R1b lineages in the Scytho-Siberian individuals tested so far and their presence in the North Pontic Scythians suggest that these 2 groups had a completely different paternal lineage makeup with nearly no gene flow from male carriers between them.
The seven other male individuals studied in this work were found to carry Eastern Eurasian Y haplogroups Q1b1a and one of its subclades (n = 6) and N (n = 1). Haplogroup Q1b1a-L54 was previously described in four males from the Bronze Age in the Altai Mountains (Hollard et al. 2014, 2018) and was clearly associated with Siberian populations (Regueiro et al. 2013).
The N-M231 haplogroup emerged from haplogroup K in Southern Asia around 21,000 years BCE, maybe in Southern China (Shi et al. 2013; Ilumäe et al. 2016). Previous studies attested to its presence in samples from Neolithic and Bronze Age in China (Li et al. 2011; Cui et al. 2013). Waves of northwestern expansion of this haplogroup are described as beginning during the Paleolithic period (Derenko et al. 2006; Shi et al. 2013) but traces of this expansion in archeological samples were reported only in two Scytho-Siberian males from the Altai (Pilipenko et al. 2015).
The sample of haplogroup N (ARZ-T15) comes from the Aldy-Bel culture, from the Eerbek site, but has no radiocarbon date. All Q1b-L330 samples come from the Sagly culture, and three of them are paternally related. The remaining Q1b-L54 sample comes from a different tomb, in an Aldy-Bel kurgan.
After 568 AD the Avars settled in the Carpathian Basin and founded the Avar Qaganate that was an important power in Central Europe until the 9th century. Part of the Avar society was probably of Asian origin, however the localisation of their homeland is hampered by the scarcity of historical and archaeological data.
Here, we study mitogenome and Y chromosomal STR variability of twenty-six individuals, a number of them representing a well-characterised elite group buried at the centre of the Carpathian Basin more than a century after the Avar conquest.
The Y-STR analyses of 17 males give evidence on a surprisingly homogeneous Y chromosomal composition. Y chromosomal STR profiles of 14 males could be assigned to haplogroup N-Tat (also N1a1-M46). N-Tat haplotype I was found in four males from Kunpeszér with identical alleles on at least nine loci. The full Y-STR haplotype I, reconstructed from AC17 with 17 detected STRs, is rare in our days. Only nine matches were found among haplotypes in YHRD database, such as samples from the Ural Region, Northern Europe (Estonia, Finland), and Western Alaska (Yupiks). We performed Median Joining (MJ) network analysis using N-Tat haplotypes with ten shared STR loci (Fig. 3, Table S9). All modern N-Tat samples included in the network had derived allele of L708 as well. Haplotype I (Cluster 1 in Fig. 3) is shared by eight populations on the MJ network among the 24 identical haplotypes. Cluster 1 represents the founding lineage, as it is described in Siberian populations, because this haplotype is shared by the most populations and it is more diverse than Cluster 2.
Nine males share N-Tat haplotype II (on a minimum of eight detected alleles), all of them buried in the Danube-Tisza Interfluve. We found 30 direct matches of this N-Tat haplotype II in the YHRD database, using the complete 17 STR Y-filer profile of AC1, AC12, AC14, AC15, AC19 samples. Most hits came from Mongolia (seven Buryats and one Khalkh) and from Russia (six Yakuts), but identical haplotypes also occur in China (five in Xinjiang and four in Inner Mongolia provinces). On the MJ network, this haplotype II is represented by Cluster 2 and is composed of 45 samples (including 32 Buryats) from six populations (Fig. 3).
A third N-Tat lineage (type III) was represented only once in the Avar dataset (AC8), and has no direct modern parallels from the YHRD database. This haplotype on the MJ network (see red arrow in Fig. 3) seems to be a descendent from other haplotype cluster that is shared by three populations (two Buryat from Mongolia, three Khanty and one Northern Mansi samples). This haplotype cluster also differs one molecular step (locus DYS393) from haplotype II. We classified the Avar samples to downstream subgroup N-F4205 within the N-Tat haplogroup, based on the results of ours and Ilumäe et al. and constructed a second network (Fig. S4). The N-F4205 network results support the assumption that the N-Tat Avar samples belong to N-F4205 subgroup (see SI chapter 1d for more details).
Based on our calculation, the age of accumulated STR variance (TMRCA) within N-Tat lineage for all samples is 7.0 kya (95% CI: 4.9 – 9.2 kya), considering the core haplotype (Cluster 1) to be the founding lineage. Y haplogroup N-Tat was not detected by large scale Eurasian ancient DNA studies but it occurs in late Bronze Age Inner Mongolia and late medieval Yakuts, among them N-Tat has still the highest frequency.
Two males (AC4 and AC7) from the Transtisza group belong to two different haplotypes of Y-haplogroup Q1. Both Q1a-F1096 and Q1b-M346 haplotypes have neither direct nor one step neighbour matches in the worldwide YHRD database. A network of the Q1b-M346 haplotype shows that this male had a probable Altaian or South Siberian paternal genetic origin.
EDIT (5 APR 2019): The paper offers an interesting late sample before the arrival of Hungarian conquerors, although we don’t know which precise lineage the sample belongs to:
One sample in our dataset (HC9) comes from this population, and both his mtDNA (T1a1b) and Y chromosome (R1a) support Eastern European connections. (…) Furthermore, we excluded sample HC9 from population-genetic statistical analyses because it belongs to a later period (end of 7th – early 9th centuries)
Apparently, then, results are consistent with what was already known from studies of modern populations:
According to Ilumäe et al. study, the frequency peak of N-F4205 (N3a5-F4205) chromosomes is close to the Transbaikal region of Southern Siberia and Mongolia, and we conclude that most Avar N-Tat chromosomes probably originated from a common source population of people living in this area, completely in line with the results of Ilumäe et al.
The most frequent haplogroups of the Bashkirian Maris were N1b-P43 (42%), R1a-Z280 (16%), R1a-Z93 (16%), N1c-Tat (13%), and J2-M172 (7%). Furthermore, subgroup R1b-M343 accounted for 4% and I2a-P37 covered 2% of the lineages. None of the Mari N1c Y chromosomes belonged to the N1c subgroups investigated (L1034, VL29, Z1936).
In the case of the Southern Mansi males, the most frequent haplogroups were N1b-P43 (33%), N1c-L1034 (28%) and R1a-Z280 (19%). The frequencies of the remaining haplogroups were as follows: R1a-M458 (6%), I1-L22 (3%), I2a-P37 (3%), and R1b-P312 (3%). The haplotype and haplogroup diversities of the Bashkirian Mari group were 0.9929 and 0.7657, whereas these values for the Southern Mansi were 0.9984 and 0.7873, respectively. The results show that, in both populations, haplotypes are much more diverse than haplogroups.
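As a rough illustration of what a haplogroup diversity value measures, the sketch below applies Nei's gene diversity, 1 − Σp², to the Bashkirian Mari haplogroup percentages quoted above. That the authors used this estimator is an assumption; the excerpt does not say. The optional small-sample correction n/(n − 1) needs the number of males typed, which the excerpt does not give, so it is left as a parameter.

```python
def diversity(counts_or_freqs, n=None):
    """Nei's gene diversity: 1 - sum(p_i^2), optionally scaled by n/(n-1)."""
    total = sum(counts_or_freqs)
    proportions = [value / total for value in counts_or_freqs]
    h = 1.0 - sum(p * p for p in proportions)
    if n is not None:
        h *= n / (n - 1)  # small-sample correction; n = sample size
    return h


# Bashkirian Mari haplogroup percentages as quoted in the text:
# N1b-P43, R1a-Z280, R1a-Z93, N1c-Tat, J2-M172, R1b-M343, I2a-P37
mari_percent = [42, 16, 16, 13, 7, 4, 2]

print(round(diversity(mari_percent), 4))
```

The uncorrected value from the rounded percentages is somewhat below the reported 0.7657, which is the direction expected if a small-sample correction was applied. Haplotype diversity is computed the same way but over individual Y-STR haplotypes, most of which are unique, hence values close to 1.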
(…) the studied Bashkirian Mari and Southern Mansi population groups formed a compact cluster along with two Khanty, Northern Mansi, Mari, and Estonian populations based on close Fst-genetic distances (< 0.05), with nonsignificant p values (p > 0.05) except for the Estonian population. All of these populations belong to the Finno-Ugric language family. Interestingly, the other Mansi population studied by Pimenoff et al. (2008) (pop # 38) was located a great distance from the Southern Mansi group (0.268). In addition, the Bashkir population (pop # 6) did not show a close genetic affinity to the Bashkirian Mari group (0.194), even though it is the host population. However, the Russian population from the Eastern European region of Russia (pop # 49) showed a genetic distance of 0.055 with the Southern Mansi group. All Hungarian speaking populations (pops 13, 22, 23, 24, 50, and 51) showed close genetic affinities to each other and to the neighbouring populations, but not to the two studied populations.
Median-joining networks were constructed for:
N-P43 (earlier N1b):
(…) TMRCA estimates for this haplogroup were made for all P43 samples (n = 157) 8.7 kya (95% CI 6.7–10.8 kya), for the N-P43 Asian.
(…) 75% of Buryats belonged to Haplotype 2, indicating that the Buryats studied by us is a young and isolated population (Bíró et al. 2015). Bashkirian Mari samples derive from Haplotype 2 via Haplotype 3 (see dark purple circles on the top of Fig. 6a). Haplotype 3 contained six males (2 Buryat, 1 Northern Mansi, and 3 Khanty samples from Pimenoff et al. 2008). The biggest Bashkirian Mari haplotype node (3 Mari samples) was positioned three mutational steps away from Haplotype 1 and the remaining Mari samples can be derived from this haplotype. Southern Mansi haplotypes were scattered within the network except for two, which formed a smaller haplotype node with two Northern Mansi and two Khanty samples from Pimenoff et al. (2008).
R1a-Z280 haplotypes, shared by Maris, Mansis, and Hungarians, hence ancient Finno-Ugrians:
The founder R1a-Z280 haplotype was shared by four samples from four populations (1 Bashkirian Mari; 1 Southern Mansi; 1 Hungarian speaking Székely; and 1 Hungarian), as presented in Fig. 7 (Haplotype 1). Haplotype 2 included five males (3 Bashkirian Mari and 2 Hungarian), as it can be seen in Fig. 7. Haplotype 4 included two shared haplotypes (1 Bashkirian Mari and one Hungarian speaking Csángó). The remaining two Bashkirian Mari haplotypes differ from the founder haplotype (Haplotype 1) by two mutational steps via Hungarian or Hungarian and Bashkirian Mari shared haplotypes. Beside Haplotype 1, the remaining Southern Mansi haplotypes were shared with Hungarians (Haplotype 5 or turquoise blue and red-coloured circles above Haplotype 7) or with Hungarians and Hungarian speaking Székely group (Haplotypes 3, 5, and 6). Haplotype 7 included ten Hungarian speakers (Hungarian, Székely, and Csángó). One Hungarian and one Uzbek Khwarezm shared haplotype can be found in Fig. 7 as well (red and white-coloured circle). All the other haplotypes were scattered in the network. The age of accumulated STR variation within R1a-Z280 lineage for 93 samples is estimated to be 9.4 kya (95% CI 6.5–12.4 kya) considering Haplotype 1 (Fig. 7) to be the founder.
R1a-Z93 as isolated lineages among Permic and Ugric populations:
Figure 8 depicts an MJ network of R1a-Z93* samples using 106 haplotypes from the 14 populations (Fig. 8). All of the Bashkirian Mari samples (7 haplotypes) formed a very isolated branch and differed from the one Hungarian haplotype (Fig. 8, see Haplotype 1) by seven mutational steps as well from two Uzbek Tashkent samples (see Haplotype 3). Another Hungarian sample shared two haplotypes of Uzbek Khwarezm samples in Haplotype 4. This haplotype can be derived from Haplotype 3 (Uzbek Tashkent). Haplotype 2 included one Hungarian and one Khakassian male. The remaining three Hungarian haplotypes are outliers in the network and are not shared by any sample. The other population samples included in the network either form independent clusters such as Altaians, Khakassians, Khanties, and Uzbek Madjars or were scattered in the network. The age of accumulated STR variation (TMRCA) within R1a-Z93* lineage for 106 samples is estimated as 11.6 kya (95% CI 9.3–14.0 kya) considering an Armenian haplotype (Fig. 8, “A”) to be the founder and the median haplotype.
Results for N subclades (especially N1c) in modern populations show very wide clusters and ancient TMRCAs, consistent with their known ancient and wide distribution among northern and eastern Eurasian groups, and thus with the infiltration of different lineages with eastern nomads (and northern Arctic populations), coupled with later bottlenecks and the acculturation of groups.
EDIT (2 APR): Interesting is the specific subclade to which ancient Mongolic-speaking Avars belong (information from YFull): N1c-F4205 (TMRCA ca. 500 BC), a subclade of N1c-Y6058 (formed ca. 2800 BC, TMRCA ca. 2800 BC). This branch also gives the “European” branch N1c-CTS10760 (formed ca. 2800 BC, TMRCA ca. 2100 BC), and is a subclade of a branch of N1c-L392 (formed ca. 4400 BC, TMRCA ca. 2800 BC). A northern expansion of N1c-L392 is probably represented by its branch N1c-Z1936 (formed ca. 2800 BC, TMRCA ca. 2100 BC), the most likely candidate to appear in the Kola Peninsula in the Bronze Age as the Palaeo-Laplandic population (see here). Read more about potential routes of expansion of haplogroup N.
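The nesting of subclades described in this note can be kept straight with a small parent-pointer table. This is only a sketch of the relationships as stated in the paragraph, using the dates quoted there (negative numbers are years BC). The intermediate branch between L392 and Y6058 is not named in the text, so it is collapsed here; that simplification is an assumption of the sketch.

```python
# Dates as quoted in the text (negative = BC); None = not given there.
CLADES = {
    "L392":     {"parent": None,    "formed": -4400, "tmrca": -2800},
    # The text places Y6058 under "a branch of" L392; that unnamed
    # intermediate step is collapsed in this sketch.
    "Y6058":    {"parent": "L392",  "formed": -2800, "tmrca": -2800},
    "F4205":    {"parent": "Y6058", "formed": None,  "tmrca": -500},
    "CTS10760": {"parent": "Y6058", "formed": -2800, "tmrca": -2100},
    "Z1936":    {"parent": "L392",  "formed": -2800, "tmrca": -2100},
}


def lineage(clade):
    """Path from the root of the table down to the given clade."""
    path = []
    while clade is not None:
        path.append(clade)
        clade = CLADES[clade]["parent"]
    return path[::-1]


def common_ancestor(a, b):
    """Most recent clade in the table shared by both lineages."""
    shared = [x for x, y in zip(lineage(a), lineage(b)) if x == y]
    return shared[-1]


print(lineage("F4205"))
print(common_ancestor("F4205", "CTS10760"))
print(common_ancestor("F4205", "Z1936"))
```

Read this way, the Avar branch and the "European" branch meet at Y6058, while the Z1936 branch only joins them at the L392 level.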
On the other hand, R1a-Z280 lineages form a tight cluster connecting Permic with Ugric groups, with R1a-Z93 showing early isolation (probably) between Cis-Urals and Trans-Urals regions. While both Corded Ware lineages in Finno-Ugrians are most likely related to the Abashevo expansion through Seima-Turbino and the Andronovo-like Horizon (and potentially later Eurasian expansions), a plausible hypothesis would be that Finno-Ugrians are related to an expansion of R1a-Z283 haplogroups (we already knew about the Finno-Permic connection), while the ancient connection between Permians and Hungarians with R1a-Z93 would correspond to this haplogroup’s potentially tighter link with an early Samoyedic split.
I don’t think that an explosive expansion of eastern Corded Ware groups of R1a-Z645 lineages will show a clear-cut division of haplogroups among Eastern Uralic groups, though, and culturally I doubt we will have such a clear image, either (similar to how the explosive expansion of Bell Beakers cannot be easily divided by regional/language group into R1b-L151 subclades before the known bottlenecks). Relevant in this regard are the known Z93 samples from the Árpád dynasty.
Such a “Z283 over Z93” layer in the Trans-Urals (and Cis-Urals?) forest-steppes would be similar to the apparent replacement of Z284 by Z282 in the Eastern Baltic during the Bronze Age (possibly with the second or Estonian Battle Axe wave or, much more likely, during later population movements). Such an early R1a-Z93 split could potentially be supported also by the separation into bottlenecks under “Northern” (R1a-Z283) Finno-Ugric-speaking Abashevo-related groups and “Southern” (R1a-Z93) acculturated Indo-Iranian-speaking Abashevo migrants developing Sintashta-Potapovka admixing with Poltavka R1b-Z2103 herders.
Let’s review some of the most common myths about Hungarians (and Finno-Ugrians in general) repeated ad nauseam, side by side with my assertions:
❌ N (especially N1c-Tat) in ancient and modern samples represent the True Uralic™ N1c peoples including Magyar tribes? Nope.
❌ Modern Hungarian R1a-Z280 lineages represent the majority of the native population, poor Slavic ‘peasants’ from the Carpathian Basin, forcibly acculturated by a minority of bad bad Hungarian hordes? Nope.
Sooo, the theory of a “diluted” Y-DNA in Modern Hungarians from originally fully N-dominated conquerors subjugating native R1a-Z280 Slavs from the Carpathian Basin is not backed up by genetic studies? The ethnic Iranian-Turkic R1a-Z93 federation in the steppes that ended up speaking Magyar is not real?? Who would’ve thunk.
Totally unexpected, too, the drift of “R1a=IE” fans with the newest genetic findings towards a Molgen-like “Yamna/R1b = Vasconic-Caucasian”, “N1c = Uralic-Altaic”, and “R1a = the origin of the white world in Mother Russia”. So much for the supposed interest in “Steppe ancestry” and fancy statistics.
Marital structure. The intensity of interethnic marriages puts the existence of the Ulchi population at risk. The colorful ethnic composition of the Ulchi settlements is reflected in the marriage structure [see featured image]. We found that the proportion of single-ethnic marriages of the Ulchi is on average 51%. The greatest number of such marriages takes place in the village of Bulava. Marriages of Ulchi with Russians are in second place. Marriages with indigenous peoples of the Far East, Nanais, Nivkhs, Evenks, and others, are in third place. Thus, almost half of the Ulchi marriages are with representatives of other nationalities. Such a significant level of interethnic mixing makes it possible to talk about intense processes of assimilation of this indigenous people and puts to the forefront the problem of loss of the unique gene pool of the Ulchi.
Haplogroup C (its branch M48) was genotyped for its five subbranches with markers M86, B470, F13686, B93, and the marker at position 16645386 (GRCh37), which was found by our team for the first time. Variant B93 is rare in the Ulchi, and 14 samples (that is, more than a quarter of the entire gene pool of the Ulchi, Fig. 2) belong to M86 and its subvariants. Therefore, we genotyped STR markers of C-M86 carriers for the Ulchi and neighboring Amur populations and analyzed the relationships of detected haplotypes on the phylogenetic network (Fig. 3, STR haplotypes are available from authors upon request).
(…) On the network, different clusters are associated with different populations: most Mongols belong to F13686, all Evenks of the Amur River region with this haplogroup form a subcluster within F13686, and part of Upper Nanais is the basis of cluster B470.
An estimate of the age of the entire haplogroup C-F12355 obtained from the data of genome-wide sequencing of seven specimens is 2400 ± 500 years (O.P. Balanovsky, unpublished data). That is, the common ancestor of all the studied representatives of various peoples with this haplogroup lived not so long ago, the first millennium BC. The formation time of cluster F13686 is somewhat later: 1990 ± 600 years.
(…) obvious traces of the interaction of the gene pool of the Ulchi with neighboring and remote peoples of the Far East and Central Asia in the time range of the last one to three thousand years were revealed. This shows that the results of work on the similarity of the gene pool of the ancient (age of 7500 years) Neolithic genomes of the Amur River region to the Ulchi probably indicate not the uniqueness of the Ulchi, but the fact that this ancient gene pool was preserved in a vast circle of populations of the Far East interwoven with gene flows both with each other and, to a lesser extent, with populations of Central Asia.
The expansion of C2b1a2a-M86 (among many basal C2-M217 samples) is thus possibly associated with the spread of Tungusic, which puts C2b1a at the root of the Micro-Altaic expansion, with a formation date ca. 12700 BC, TMRCA 12500 BC (and not only Mongolian). This shows that Micro-Altaic is connected with a local population which shows a clear continuity since at least 3500 BC. This, however, tells us little about the origin of the language.
That leaves the ancestral N lineages found among Far East Asians as Palaeo-Siberian in origin, and their late expansions to the west not particularly linked with any of the known Palaeo-Siberian ethnolinguistic groups, let alone a supposed “Uralo-Altaic” language…
With a view to tracing the Mongol expansion in the Tuvinian gene pool we studied the two largest Tuvinian clans – those in which, according to data of the humanities, one could expect the highest Central Asian ancestry, connected with the Mongol expansion. Thus, the Central Asian ancestry component in these two clans may be used as an upper limit of the Mongol influence upon the Tuvinian gene pool as a whole. According to the data of 59 Y-chromosomal SNP markers, the haplogroup spectra in these Tuvinian tribal groups (Mongush, N = 64, and Oorzhak, N = 27) were similar. On average, two-thirds of their gene pools (63 %) are composed of North Eurasian haplogroups (N*, N1a2, N3a, Q) connected with autochthonous populations of the modern area of Tuvans. The Central Asian haplogroups (C2, O2) composed less than a fifth (17 %) of the gene pools of the clans studied. The opposite ratio was revealed in Mongols: there were 10 % North Eurasian haplogroups and 75 % Central Asian haplogroups in their gene pool. All the results derived – “genetic portraits”, the matrix of genetic distances, the dendrogram and the multidimensional scaling plot, which mirror the genetic connections between Tuvinian clans and populations of South Siberia and East Asia – demonstrated the prominent similarity of the Tuvinian gene pools with populations from Khakassia and Altai. It could be therefore assumed that the Tuvinian clans Mongush and Oorzhak originated from autochthonous people (supposedly, from the local Samoyed and Kets substrata). The minor component of Central Asian haplogroups in the gene pool of these clans allows us to suppose that the Mongol expansion did not have a significant influence upon the Tuvinian gene pool as a whole.
Haplogroup C2 peaks in Central Asia (Wells et al., 2001; Zerial et al., 2003), though its variants are abundant in other peoples of Siberia and Far East. For instance, in one of Buryat clans, namely Ekhirids, hg C2 frequency is 88 % (Y-base); in Kazakhs from different regions of Kazakhstan, total occurrence of hg C2 variants averages between 17 and 81 % (Abilev et al., 2012; Zhabagin et al., 2013, 2014, 2017), in populations of the Amur River (such as Nanais, Negidals, Nivkhs, Ulchs) – between 40 and 65 %, in Evenks – up to 68 % (Y-base), in Kyrgyz people of Pamir-Alay – up to 22 %, correspondingly; of all Turkic peoples of Altai, relatively high hg C2 frequency (16 %) is detected only in Telengits (Balanovskaya et al., 2014; Balaganskaya et al., 2011a, 2016). In Tuvinian clans under the study, hg C2 frequency is rather low – 19 % in Mongush and 11 % in Oorzhak, while in Mongols it makes up almost two thirds of the entire gene pool and comprises different genetic lines (subhaplogroups).
Haplogroup N is abundant all over North Eurasia from Scandinavia to Far East (Rootsi et al., 2007). The study on whole Y-chromosome sequencing conducted with participation of our group (Ilumäe et al., 2016) subdivided this haplogroup into several branches with their regional distribution. In gene pools of the Tuvans involved, hg N was represented by two sub-clades, namely N1a2 and N3a.
Sub-clade N1a2 peaks in populations of West Siberia (in Nganasans, frequency is 92 %) and South Siberia (in Khakas 34 %, in Tofalars 25 %) (Y-base). In Tuvans, N1a2 occurrence is nearly 16 % in Mongush and 15 % in Oorzhak clans, respectively, while in Mongols, the frequency is three times less (5 %). Hg N1a2 is supposed to display the impact of the Samoyedic component to the gene pool of Tuvinian clans (Kharkov et al., 2013).
Sub-clade N3a is major in the Oorzhak clan comprising almost half of the gene pool (45 %); it is represented by two sub-clades, namely N3a* and N3a5. The same sub-branches are specific to the Mongush clan as well, though with lower frequencies: N3a* – 9 % and N3a5 – 14 % (see Table). In Khori-Buryats from the Transbaikal region, a high frequency is observed – 82 % (Kharkov et al., 2014), while in Mongols, N3a5 occurs rather rarely (6 %). Hg N3a* was detected in populations of South Siberia only, and was widely spread in Khakas-Sagays and Shors (up to 40 %) (Ilumäe et al., 2016) (Y-base).
Within the pan-Eurasian haplogroup R1a1a, two large genetic lines (sub-haplogroups) are identified: “European” (marker M458) and “Asian” (marker Z93), the latter almost never occurring in Europe (Balanovsky, 2015) but abundant in South Siberia and northern Hindustan. In the Altai-Sayan region, high frequencies of the “Asian” branch are found in many peoples – Shors, Tubalars, Altai-Kizhi people, Telengits, Sagays, Kyzyl Khakas, Koibals, Teleuts (Y-base) (Kharkov et al., 2009). Hg R1a1a comprises perceptible parts of the gene pools of Tuvinian clans (19 % in Mongush, and 15 % in Oorzhak), though its occurrence in Mongols is much lower (6 %). Those results also count in favor of the hypothesis of autochthonous component dominance even in the gene pools of clans potentially most influenced by Mongolian ancestry. If we add R1a1a variants to the “North Eurasian” haplogroups, the “not-Central Asian” component will compose on average four fifths of the entire gene pools of the Tuvinian clans (77 % in Mongush and 81 % in Oorzhak), being only 16 % in Mongols. Such data are definitely contrary to the hypothesis of a crucial influence of the Mongol expansion upon the development of the Tuvinian gene pool.
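As a quick check on the “four fifths” figure quoted above, here is a minimal sketch of the arithmetic. The clan names, sample sizes and percentages are those given in the abstract; the function names, and the decision to compare a simple mean with a sample-size-weighted mean, are my own assumptions, since the abstract does not say which kind of average it reports.

```python
# "Not-Central Asian" share (%) and sample size per clan, as quoted in the abstract.
clans = {
    "Mongush": {"n": 64, "share": 77.0},
    "Oorzhak": {"n": 27, "share": 81.0},
}


def simple_mean(data):
    """Unweighted mean of the clan shares (each clan counts equally)."""
    return sum(c["share"] for c in data.values()) / len(data)


def weighted_mean(data):
    """Mean of the clan shares weighted by the number of sampled men."""
    total_n = sum(c["n"] for c in data.values())
    return sum(c["n"] * c["share"] for c in data.values()) / total_n


print(f"simple mean:   {simple_mean(clans):.1f} %")
print(f"weighted mean: {weighted_mean(clans):.1f} %")
```

Either way of averaging lands just under 80 %, so “about four fifths” holds regardless of how the two clans are combined, against the 16 % reported for Mongols.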
I found interesting the high proportion of R1a-Z93 subclades among Sagays in Khakassia, which stem from a local Samoyed substratum, as described by the paper…
Featured Image: Map of Uralic and Altaic languages, from Wikipedia.
The Sahara was wetter and greener during multiple interglacial periods of the Quaternary, when some have suggested it featured very large (mega) lakes, ranging in surface area from 30,000 to 350,000 km2. In this paper, we review the physical and biological evidence for these large lakes, especially during the African Humid Period (AHP) 11–5 ka. Megalake systems from around the world provide a checklist of diagnostic features, such as multiple well-defined shoreline benches, wave-rounded beach gravels where coarse material is present, landscape smoothing by lacustrine sediment, large-scale deltaic deposits, and in places, tufas encrusting shorelines. Our survey reveals no clear evidence of these features in the Sahara, except in the Chad basin. Hydrologic modeling of the proposed megalakes requires mean annual rainfall ≥1.2 m/yr and a northward displacement of tropical rainfall belts by ≥1000 km. Such a profound displacement is not supported by other paleo-climate proxies and comprehensive climate models, challenging the existence of megalakes in the Sahara. Rather than megalakes, isolated wetlands and small lakes are more consistent with the Sahelo-Sudanian paleoenvironment that prevailed in the Sahara during the AHP. A pale-green and discontinuously wet Sahara is the likelier context for human migrations out of Africa during the late Quaternary.
The whole review is an interesting read, but here are some relevant excerpts:
Various researchers have suggested that megalakes coevally covered portions of the Sahara during the AHP and previous periods, such as paleolakes Chad, Darfur, Fezzan, Ahnet-Mouydir, and Chotts (Fig. 2, Table 2). These proposed paleolakes range in size by an order of magnitude in surface area from the Caspian Sea–scale paleo-Lake Chad at 350,000 km2 to Lake Chotts at 30,000 km2. At their maximum, megalakes would have covered ~ 10% of the central and western Sahara, similar to the coverage by megalakes Victoria, Malawi, and Tanganyika in the equatorial tropics of the African Rift today. This observation alone should raise questions of the existence of megalakes in the Sahara, and especially if they developed coevally. Megalakes, because of their significant depth and area, generate large waves that become powerful modifiers of the land surface and leave conspicuous and extensive traces in the geologic record.
Lakes, megalakes, and wetlands
Active ground-water discharge systems abound in the Sahara today, although they were much more widespread in the AHP. They range from isolated springs and wet ground in many oases scattered across the Sahara (e.g., Haynes et al., 1989) to wetlands and small lakes (Kröpelin et al., 2008). Ground water feeding these systems is dominated by fossil AHP-age and older water (e.g., Edmunds and Wright 1979; Sonntag et al., 1980), although recently recharged water (<50 yr) has been locally identified in Saharan ground water (e.g., Sultan et al., 2000; Maduapuchi et al., 2006).
In our view, Lake Chad is the only former megalake in the Sahara firmly documented by sedimentologic and geomorphic evidence. Mega-Lake Chad is thought to have covered ~ 345,000 km2, stretching for nearly 8° (10–18°N) of latitude (Ghienne et al., 2002) (Fig. 2). The presence of paleo-Lake Chad was at one point challenged, but several, and in our view very robust, lines of evidence have been presented to support its development during the AHP. These include: (1) clear paleo-shorelines at various elevations, visible on the ground (Abafoni et al., 2014) and in radar and satellite images (Schuster et al., 2005; Drake and Bristow, 2006; Bouchette et al., 2010); (2) sand spits and shoreline berms (Thiemeyer, 2000; Abafoni et al., 2014); and (3) evaporites and aquatic fauna such as fresh-water mollusks and diatoms in basin deposits (e.g., Servant, 1973; Servant and Servant, 1983). Age determinations for all but the Holocene history of mega-Lake Chad are sparse, but there is evidence for Mio-Pliocene lake(s) (Lebatard et al., 2010) and major expansion of paleo-Lake Chad during the AHP (LeBlanc et al., 2006; Schuster et al., 2005; Abafoni et al., 2014; summarized in Armitage et al., 2015) up to the basin overflow level at ~ 329 m asl.
Insights from hydrologic mass balance of megalakes
Using these conservative conditions (i.e., erring in the direction that will support megalake formation), our hydrologic models for the two biggest central Saharan megalakes (Darfur and Fezzan) require minimum annual average rainfall amounts of ~ 1.1 m/yr to balance moisture losses from their respective basins (Supplementary Table S1). Lake Chad required a similar amount (~1 m/yr; Supplementary Table S1) during the AHP according to our calculations, but this is plausible, because even today the southern third of the Chad basin receives ≥1.2 m/yr (Fig. 2) and experiences a climate similar to Lake Victoria. A modest 5° shift in the rainfall belt would bring this moist zone northward to cover a much larger portion of the Chad basin, which spans N13° ±7°. Estimated rainfall rates for Darfur and Fezzan are slightly less than the average of ~ 1.3 m/yr for the Lake Victoria basin, because of the lower aw values, that is, smaller areas of Saharan megalakes compared with their respective drainage basins (Fig. 15).
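To make the logic of such a mass balance easier to follow, here is a minimal sketch of a generic steady-state closed-basin water balance. It is not the authors' model (theirs is described in the paper's Supplementary Material). I assume that aw stands for the ratio of lake area to drainage-basin area, as the excerpt implies, and that runoff from the land can be approximated by a single runoff coefficient. The evaporation rate (2.0 m/yr) and runoff coefficient (0.1) are hypothetical placeholders, not values from the paper.

```python
def required_rainfall(evaporation, area_ratio, runoff_coeff):
    """Rainfall P (m/yr) that keeps a closed lake at steady state.

    Balance per unit basin area (assumed textbook form, not the paper's model):
        P * a_w + k * P * (1 - a_w) = E * a_w
    where a_w = lake area / basin area, k = runoff coefficient of the land
    surface, and E = evaporation from the lake surface (m/yr).
    """
    if not 0 < area_ratio <= 1:
        raise ValueError("area_ratio must be in (0, 1]")
    if not 0 <= runoff_coeff <= 1:
        raise ValueError("runoff_coeff must be in [0, 1]")
    return evaporation * area_ratio / (area_ratio + runoff_coeff * (1 - area_ratio))


def net_balance(rainfall, evaporation, area_ratio, runoff_coeff):
    """Inflow minus outflow per unit basin area; zero means the lake is stable."""
    inflow = rainfall * area_ratio + runoff_coeff * rainfall * (1 - area_ratio)
    outflow = evaporation * area_ratio
    return inflow - outflow


# Hypothetical inputs, chosen only to show the direction of the effect.
EVAPORATION = 2.0   # m/yr, assumed lake-surface evaporation
RUNOFF_COEFF = 0.1  # assumed fraction of land rainfall reaching the lake

for a_w in (0.1, 0.2, 0.3):
    p = required_rainfall(EVAPORATION, a_w, RUNOFF_COEFF)
    print(f"a_w = {a_w:.1f} -> required rainfall ~ {p:.2f} m/yr")
```

In this simplified form a smaller lake relative to its basin (lower aw) needs less rainfall to stay in balance, which is the direction of the effect the excerpt invokes when comparing Darfur and Fezzan with the Lake Victoria basin.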
Estimates of paleo-rainfall during the AHP
Here major contradictions develop between the model outcomes and paleo-vegetation evidence, because our Sahelo-Sudanian hydrologic model predicts wetter conditions and therefore more tropical vegetation assemblages than found around Lake Victoria today. In fact, none of the very wet rainfall scenarios required by all our model runs can be reconciled with the relatively dry conditions implied by the fossil plant and animal evidence. In short, megalakes cannot be produced in Sahelo-Sudanian conditions past or present; to form, they require a tropical or subtropical setting, and major displacements of the African monsoon or extra-desert moisture sources.
If not megalakes, what size lakes, marshes, discharging springs, and flowing rivers in the Sahara were sustainable in Sahelo-Sudanian climatic conditions? For lakes and perennial rivers to be created and sustained, net rainfall in the basin has to exceed loss to evapotranspiration, evaporation, and infiltration, yielding runoff that then supplies a local lake or river. Our hydrologic models (see Supplementary Material) and empirical observations (Gash et al., 1991; Monteith, 1991) for the Sahel suggest that this limit is in the 200–300 mm/yr range, meaning that most of the Sahara during the AHP was probably too dry to support very large lakes or perennial rivers by means of local runoff. This does not preclude creation of local wetlands supplied by ground-water recharge focused from a very large recharge area or forced to the surface by hydrologic barriers such as faults, nor megalakes like Chad supplied by moisture from the subtropics and tropics outside the Sahel. But it does raise a key question concerning the size of paleolakes, if not megalakes, in the Sahara during the AHP. Our analysis suggests that Sahelo-Sudanian climate could perhaps support a paleolake approximately ≤5000 km2 in area in the Darfur basin and ≤10,000–20,000 km2 in the Fezzan basin. These are more than an order of magnitude smaller than the megalakes envisioned for these basins, but they are still sizable, and if enclosed in a single body of water, should have been large enough to generate clear shorelines (Enzel et al., 2015, 2017). On the other hand, if surface water was dispersed across a series of shallow and extensive but partly disconnected wetlands, as also implied by previous research (e.g., Pachur and Hoelzmann, 1991), then shorelines may not have developed.
One of the underdeveloped ideas of my Indo-European demic diffusion model was that R1b-V88 had migrated through South Italy to Northern Africa, and from there southwards along the Green Sahara corridor. That would fit the “upside-down” view of Bender (2007), i.e. Afroasiatic expanding westwards within the Green Sahara, precisely at this time, and from a homeland near the Megalake Chad region (see here).
Whether or not R1b-V88 brought the ‘original’ lineage that expanded Afroasiatic languages may be contended, but after D’Atanasio et al. (2018) it seems that only two lineages, E-M2 and R1b-V88, fit the ‘star-like’ structure suggesting an appropriate haplogroup expansion and necessary regional distribution that could explain the spread of Afroasiatic languages within a reasonable time frame.
This review shows that the hypothesized Green Sahara corridor full of megalakes, which some proposed had fully connected Africa from west to east, was actually a strip of Sahelo-Sudanian steppe extending north of its current distribution, including the Chad megalake, East Africa and Arabia, apart from other discontinuous local wetlands further to the north in Africa. This greenish belt would have probably allowed for the initial spread of early Afroasiatic proto-languages only through the southern part of the current Sahara Desert. This, together with the R1b-V88 haplogroup distribution in Central and North Africa (with a prevalence among Chadic speakers probably due to later bottlenecks) and the Near East, leaves still fewer possibilities for an expansion of Afroasiatic from anywhere else.
If my proposal turns out to be correct, this Afroasiatic-like language would be the one suggested by some in the vocabulary of Old European and North European local groups (viz. Kroonen for the Agricultural Substrate Hypothesis), and not Anatolian farmer ancestry or haplogroup G2, which would have been rather confined to Southern Europe, mainly south of the Loess line, where incoming Middle East farmers encountered the main difficulties spreading agriculture and herding, and where they eventually admixed with local hunter-gatherers.
NOTE. If related to attested languages before the Roman expansion, Tyrsenian would be a good candidate for a descendant of the language of Anatolian farmers, given the more recent expansion of Anatolian ancestry to the Tuscan region (even if already influenced by Iran farmer ancestry), which reinforces its direct connection to the Aegean.
The fiercest opposition to this R1b-V88 – Afroasiatic connection may come from:
Traditional Hamito-Semitic scholars, who try to look for any parent language almost invariably in or around the Near East – the typical “here it was first attested, ergo here must be the origin, too”-assumption (coupled with the cradle of civilization memes) akin to the original reasons behind Anatolian or Out-of-India hypotheses; and of course
autochthonous continuity theories based on modern subclades, of (mainly Semitic) peoples of haplogroup E or J, who will root for either one or the other as the Afroasiatic source no matter what. As we have seen with the R1a – Indo-European hypothesis (see here for its history), this is never the right way to look at prehistoric migrations, though.
I proposed that R1a-M417 was the lineage marking an expansion of Indo-Uralic from the east near Lake Baikal, then obviously connected to Yukaghir and Altaic languages marked by R1a-M17, and that haplogroup R could then be the source of a hypothetical Nostratic expansion (where R2 could mark the Dravidian expansion), with upper clades being maybe responsible for Borean.
However, recent studies have shown early expansions of R1b-297 to East Europe (Mathieson et al. 2017 & 2018), and of R1b-M73 to East Eurasia probably up to Siberia, and possibly reaching the Pacific (Jeong et al. 2018). Also, the Steppe Eneolithic and Caucasus Eneolithic clusters seen in Wang et al. (2018) would be able to explain the WHG – EHG – ANE ancestry cline seen in Mesolithic and Neolithic Eurasia without a need for westward migrations.
After Narasimhan et al. (2018) and Damgaard et al. (Science 2018), Dravidian is now more and more likely to be linked to the expansion of the Indus Valley civilization and haplogroup J, in turn strongly linked to Iranian farmer ancestry, thus giving support to an Elamo-Dravidian group stemming from Iran Neolithic.
NOTE. This Dravidian-IVC and Iran connection has been supported for years by knowledgeable bloggers and commenters alike, see e.g. one of Razib Khan’s posts on the subject. This rather early support for what is obvious today is probably behind the reactionary views by some nationalist Hindus, who probably saw in this a potential reason for a strengthened Indo-Aryan/Dravidian divide adding to the religious patchwork that is modern India.
I am not in a good position to judge Nostratic, and I don’t think Glottochronology, Swadesh lists, or any statistical methods applied to a bunch of words are of any use, here or anywhere. The work of pioneers like Illich-Svitych or Starostin, on the other hand, seems to me a solid attempt to obtain a faithful reconstruction, if rather outdated today.
NOTE. I am still struggling to learn more about Uralic and Indo-Uralic; not because it is more difficult than Indo-European, but because – in comparison to PIE comparative grammar – material about them is scarce, and the few available sources are sometimes contradictory. My knowledge of Afroasiatic is limited to Semitic (Arabic and Akkadian), and the field is not much more developed here than for Uralic…
If one wanted to support a Nostratic proto-language, though, and not being able to take into account genome-wide autosomal admixture, the only haplogroup right now which can connect the expansion of all its branches is R1b-M343:
R1b-L278 expanded from Asia to Europe through the Iranian Plateau, since early subclades are found in Iran and the Caucasus region, thus supporting the separation of Elamo-Dravidian and Kartvelian branches;
R1b-V88 expanding everywhere in Europe, and especially the branch expanding to the south into Africa, may be linked to the initial Afroasiatic expansion through the Pale-Green Sahara corridor (and even a hypothetical expansion with E-M2 subclades and/or from the Middle East would also leave open the influence of V88 and previous R1b subclades from the Middle East in the emergence of the language);
R1b-297 subclades expanding to the east may be linked to Eurasiatic, giving rise to both Indo-Uralic (M269) and Macro- or Micro-Altaic (M73) expansions.
This is shameless, simplistic speculation, of course, but not more than the Nostratic hypothesis, and it has the main advantage of offering ‘small and late’ language expansions relative to other proposals spanning thousands (or even tens of thousands) of years more of language separation. On the other hand, that would leave Borean out of the question, unless the initial expansion of R1b subclades happened from a community close to Lake Baikal (and Mal’ta) that was also at the origin of the other supposedly related Borean branches, whether linked to haplogroup R or to any other…
NOTE. If Afroasiatic and Indo-Uralic (or Eurasiatic) are not genetically related, my previous simplistic model, R1b-Afroasiatic vs. R1a-Eurasiatic, may still be supported, with R1a-M17 potentially marking the latest meaningful westward population expansion from which EHG ancestry might have developed (see here). Without detailed works on Nostratic comparative grammar and dialectalization, and especially without a lot more Palaeolithic and Mesolithic samples, all this will remain highly speculative, like proposals of the 2000s about Y-DNA-haplogroup – language relationships.
Objectives
Following the Xiongnu and Xianbei, the Rouran Khaganate (Rouran) was the third great nomadic tribe on the Mongolian Steppe. However, few human remains from this tribe are available for archaeologists and geneticists to study, as traces of the tombs of these nomadic people have rarely been found. In 2014, the IA‐M1 remains (TL1) at the Khermen Tal site from the Rouran period were found by a Sino‐Mongolian joint archaeological team in Mongolia, providing precious material for research into the genetic imprint of the Rouran.
Materials and methods
The mtDNA hypervariable sequence I (HVS‐I) and Y‐chromosome SNPs were analyzed, and capture of the paternal non‐recombining region of the Y chromosome (NRY) and whole‐genome shotgun sequencing of TL1 were performed. The materials from three sites representing the three ancient nationalities (Donghu, Xianbei, and Shiwei) were selected for comparison with the TL1 individual.
The mitochondrial haplotype of the TL1 individual was D4b1a2a1. The Y‐chromosome haplotype was C2b1a1b/F3830 (ISOGG 2015), which was the same as that of the other two ancient male nomadic samples (ZHS5 and GG3) related to the Xianbei and Shiwei, which were also detected as F3889; this haplotype was reported to be downstream of F3830 by Wei et al. (2017).
We conclude that F3889 downstream of F3830 is an important paternal lineage of the ancient Donghu nomads. The Donghu‐Xianbei branch is expected to have made an important paternal genetic contribution to Rouran. This component of gene flow ultimately entered the gene pool of modern Mongolic‐ and Manchu‐speaking populations.
These results suggested that TL1 likely presents a close paternal relationship to the Donghu people and may have even descended from a branch of the ancient Donghu-Xianbei people, based on the conclusion that haplogroup C2b1a/F3918 can be considered the paternal branch of the ancient Donghu people (Zhang et al., 2018). The Y-chromosome phylogenetic tree showed that TL1 shared a branch with modern Mongolian-Buryats, Hezhen, Xibo, Yugur, and Kazakh, suggesting that the TL1 individual from the Rouran period should also generally present close paternal genetic relationships with modern Mongolic- and Manchu-speaking peoples.
In general, the Rouran Khaganate originated from an alliance of the ancient Eurasian steppe nomads, which disintegrated and disappeared with the progress of history. This group was complex, and its origin cannot be explained based only on one individual. However, we can trace the genetic imprint of the Rouran people through genome analysis of the TL1 individual. On the basis of the comparison with other ancient nomadic people (Donghu, Xianbei, and Shiwei) and data on modern individuals from published articles (Lippold et al., 2014; Wei et al., 2017) (Supporting Information S5), we found that they all share the same haplotype, implying shared paternal ancestry between the Donghu, Xianbei and Rouran populations. Furthermore, this gene flow (mainly haplogroup C2b1a/F3918) did not stop with the disappearance of the Rouran, and a portion was instead passed on in other groups, such as the ancient Shiwei people (later than Rouran), eventually reaching the gene pool of modern Mongolic- and Manchu-speaking populations (Mongolian-Buryats, Hezhen, Xibo, et al.).
Interesting to see now confirmed with ancient DNA the proposal of a C3*-DYS448del cluster as the paternal lineage defining ancient Mongolian tribes, a theory based on ancient and modern samples – since it is found in low frequency in almost all Mongolic- and Turkic-speaking populations.
Han Chinese, Japanese and Korean, the three major ethnic groups of East Asia, share many similarities in appearance, language and culture etc., but their genetic relationships, divergence times and subsequent genetic exchanges have not been well studied.
We conducted a genome-wide study and evaluated the population structure of 182 Han Chinese, 90 Japanese and 100 Korean individuals, together with the data of 630 individuals representing 8 populations worldwide. Our analyses revealed that Han Chinese, Japanese and Korean populations have distinct genetic makeup and can be well distinguished based on either the genome-wide data or a panel of ancestry informative markers (AIMs). Their genetic structure corresponds well to their geographical distributions, indicating geographical isolation played a critical role in driving population differentiation in East Asia. The most recent common ancestor of the three populations was dated back to 3000 ~ 3600 years ago. Our analyses also revealed substantial admixture within the three populations which occurred subsequent to initial splits, and distinct gene introgression from surrounding populations, of which the northern ancestral component is dominant.
These estimations and findings facilitate understanding of the population history and mechanisms of human genetic diversity in East Asia, and have implications for both evolutionary and medical studies.
It is obvious that the genetic difference among the three East Asian groups initially resulted from population divergence due to pre-historical or historical migrations. Subsequently, the different geographical locations of the three populations (mainland China, the Korean Peninsula and the Japanese archipelago, respectively) apparently facilitated population differentiation due to physical isolation and independent genetic drift. Our estimations of population divergence time among the three groups, 1.2~3.6 KYA, are largely consistent with the known history of the three populations and those related. However, considering that recent admixture could have reduced genetic difference between populations, it is likely the divergence time was underestimated.
We detected substantial gene flow among the three populations and also from the surrounding populations. For example, based on our analysis with the F3 test, Korean received gene flow from Han Chinese and Japanese, and gene flow also happened between Han Chinese and Japanese (Additional file 12: Table S3). These gene flows are expected to have reduced the genetic differentiation between the three ethnic groups. On the other hand, we also detected considerable gene flow from surrounding populations to the three populations studied. For instance, an ancestral population represented by Ryukyuan has contributed more to Japanese than to Han Chinese, while southern ethnic groups like Dai have contributed more to continental populations than to island and peninsula populations. Contrary to the gene flow among the three populations, these gene flows from surrounding populations are expected to have increased genetic difference among the three populations if they occurred independently and from different source populations. According to our results, the major sources of gene flow to the three ethnic groups were substantially different: the major source of gene flow to Han Chinese was southern ethnic groups, the major source of gene flow to Japanese was southern islands, and the major sources of gene flow to Korean were both mainland and islands. Therefore, those gene flows might have significantly contributed to further genetic differentiation of the three populations.
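For readers unfamiliar with the F3 test mentioned in the excerpt, here is a minimal sketch of the idea behind it. It computes a naive f3 statistic, the average over SNPs of (c - a)(c - b), where c, a and b are allele frequencies in the target and the two candidate source populations. A clearly negative value suggests the target is a mixture of populations related to the two sources. This is only an illustration: it omits the sample-size bias correction and the block-jackknife significance test that real tools apply, and the allele frequencies below are made up, not taken from the paper.

```python
def f3(target, source_a, source_b):
    """Naive f3(target; A, B): mean of (c - a) * (c - b) across SNPs.

    Inputs are equal-length lists of allele frequencies in [0, 1].
    No bias correction or significance test is applied (illustration only).
    """
    if not target or not (len(target) == len(source_a) == len(source_b)):
        raise ValueError("frequency lists must be non-empty and of equal length")
    products = [(c - a) * (c - b) for c, a, b in zip(target, source_a, source_b)]
    return sum(products) / len(products)


# Hypothetical allele frequencies at four SNPs.
pop_a = [0.10, 0.80, 0.35, 0.60]
pop_b = [0.70, 0.20, 0.55, 0.90]

# A target built as a 50/50 mixture of A and B.
admixed = [0.5 * a + 0.5 * b for a, b in zip(pop_a, pop_b)]

# A target that merely drifted away from A, with no contribution from B.
drifted = [0.05, 0.90, 0.30, 0.50]

print(f"f3(admixed; A, B) = {f3(admixed, pop_a, pop_b):+.4f}")
print(f"f3(drifted; A, B) = {f3(drifted, pop_a, pop_b):+.4f}")
```

The mixed target gives a negative value and the merely drifted one a positive value, which is the contrast the test relies on. A non-negative f3 does not rule out admixture, though, since strong drift in the target after mixing can mask the signal.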
The three populations have similar but not identical demographic histories; they all experienced a strong population expansion in the last 20,000 years. However, in line with their different geographic distributions, their effective population sizes and population expansions differ.
Although based on modern populations, the study is interesting in light of the potential implications for a Macro-Altaic proposal.