Reconstruction of Y-DNA phylogeny also helps reconstruct Tibeto-Burman expansion


New paper (behind paywall), Reconstruction of Y-chromosome phylogeny reveals two neolithic expansions of Tibeto-Burman populations, by Wang et al., Mol Genet Genomics (2018).

Interesting excerpts:

Archeological studies suggest that a subgroup of ancient populations of the Miaodigou culture (~ 6300–5500 BP) moved westward to the upper stream region of the Yellow River and created the Majiayao culture (~ 5400–4900 BP) (Liu et al. 2010), which was proposed to be the remains of direct ancestors of Tibeto-Burman populations (Sagart 2008). On the other hand, Han populations, the other major descendant group of the Yang-Shao culture (~ 7000–5500 BP), are composed of many other sub-lineages of Oα-F5 and extremely low frequencies of D-M174 (Additional files 1: Figure S1; Additional files 2: Table S1). Therefore, we propose that Oα-F5 may be one of the dominant paternal lineages in ancient populations of Yang-Shao culture and its successors.

In this study, we demonstrated that both sub-lineages of D-M174 and Oα-F5 are founding paternal lineages of modern Tibeto-Burman populations. The genetic patterns suggested that the ancestor group of modern Tibeto-Burman populations may be an admixture of two distinct ancient populations. One of them may be hunter–gatherer populations who survived on the plateau since the Paleolithic Age, represented by varied sub-lineages of D-M174. The other one was comprised of farmers who migrated from the middle Yellow River basin, represented by sub-lineages of Oα-F5. In general, the genetic evidence in this study supports the conclusion that the appearance of the ancestor group of Tibeto-Burman populations was triggered by the Neolithic expansion from the upper-middle Yellow River basin and admixture with local populations on the Tibetan Plateau (Su et al. 2000).

Simplified phylogenetic tree showing sample locations. The size of the circle for each sampling location corresponds to the number of samples

Two neolithic expansion origins of Tibeto‑Burman populations

We also observed significant differences in the paternal gene pool of different subgroups of Tibeto-Burman populations. Haplogroup D-M174 contributed ~ 54% in a sampling of 2354 Tibetan males throughout the Tibetan Plateau (Qi et al. 2013). Previous studies have also found high frequencies of D-M174 in other populations on the Tibetan Plateau (Shi et al. 2008), including Sherpa (Lu et al. 2016) and Qiang (Wang et al. 2014). In contrast, haplogroup D-M174 is rare or absent from Tibeto-Burman populations from Northeast India and Burma (Shi et al. 2008). In populations of the Ngwi-Burmese language subgroup, the average frequencies of haplogroup D-M174 are ~ 5% (Dong et al. 2004; Peng et al. 2014). Furthermore, we found that lineage Oα1c1b-CTS5308 is mainly found in Tibeto-Burman populations from the Tibetan Plateau. In contrast, lineage Oα1c1a-Z25929 was found in Tibeto-Burman populations from Northeast India, Burma, and the Yunnan and Hunan provinces of China (Additional files 1: Figure S1; Additional files 2: Table S1). In general, enrichment of lineage Oα1c1b-CTS5308 and high frequencies of D-M174 can be found in most Tibeto-Burman populations on the Tibetan Plateau and adjacent regions, whereas Tibeto-Burman populations from other regions tend to have lineage Oα1c1a-Z25929 and a little to no percentage of D-M174.

The inconsistent pattern we observed in the paternal gene pool of modern Tibeto-Burman populations suggested that there may be two distinct ancestor groups (Fig. 3). The proposed migration routes shown in Fig. 3 are somewhat different from those proposed by Su et al. (2000). According to our age estimation, most of the D1a2a-P47 samples belong to sub-lineage PH116, a young lineage that emerged ~ 2500 years ago (95% CI 1915–3188 years). On the other hand, continuous differentiation can be observed on a phylogenetic tree of lineages D1a1a1a1-PH4979 and D1a1a1a2-Z31591 since 6000 years ago. Therefore, we proposed that a group of ancient populations may have moved to the upper basin of the Yellow River and admixed intensively with local populations with high frequencies of haplogroup D-M174, including its sub-lineage D1a2a-P47 (Fig. 3). This ancestor group eventually gave birth to modern Tibeto-Burman populations on the Tibetan Plateau and adjacent regions. The other ancestor group moved toward the southwest and finally reached South East Asia (Burma and other locations) and the northeastern part of India (Fig. 3). This ancestor group may have had no or a minor admixture of D-M174 in their paternal gene pool.

Two proposed ancestor groups and migration routes for Tibeto-Burman populations

Long‑term admixture before expansion to a high‑altitude region

It is interesting to investigate the time gap between the appearance of Neolithic cultures in the northeastern part of the Tibetan Plateau and the final phase of human expansion across the Tibetan Plateau. The Majiayao culture (~ 5400–4900 BP) is the earliest Neolithic culture in the northeastern part of the Tibetan Plateau (Liu et al. 2010). However, previous archeological study has suggested that the final phase of diffusion into the high-altitude area of the Tibetan Plateau occurred at approximately 3.6 kya (Chen et al. 2015). Our genetic evidence in this study is consistent with this scenario based on archeological evidence. Based on Y-chromosome analysis in this study, many unique lineages of Tibeto-Burman populations emerged between 6000 years ago and 2500 years ago (Additional files 3: Table S2). The most recent common age of D1a2-PH116, a sub-lineage that spread throughout the Tibetan Plateau, is only 2500 years ago.

We propose that there may be two important factors for the observed age gap. First, living in a high-altitude environment may require some crucial physical characteristics that were lacking from Neolithic immigrants from the middle Yellow River Basin. Intense genetic admixture with local people who had survived on the Tibetan Plateau since the Paleolithic Age may have actually guaranteed the expansion of humans across the Tibetan Plateau. Therefore, a long period of admixture, lasting from 5.4 to 3.6 kya, may be necessary for the appearance of a population with beneficial genetic variants that was genetically adapted to the high-altitude environment. Second, technological innovations, such as the domestication of wheat and highland barley (Chen et al. 2015), establishment of yak pastoralism (Rhode et al. 2007), and introduction of other culture elements in the Bronze Age (Ma et al. 2016), are also important factors that facilitated permanent settlements with large population sizes in the high-altitude area of the Tibetan Plateau.


Genetic structure, divergence and admixture of Han Chinese, Japanese and Korean populations


Open access Genetic structure, divergence and admixture of Han Chinese, Japanese and Korean populations, by Wang, Lu, Chung, and Xu, Hereditas (2018) 155:19.

Abstract (emphasis mine):

Han Chinese, Japanese and Korean, the three major ethnic groups of East Asia, share many similarities in appearance, language and culture etc., but their genetic relationships, divergence times and subsequent genetic exchanges have not been well studied.

We conducted a genome-wide study and evaluated the population structure of 182 Han Chinese, 90 Japanese and 100 Korean individuals, together with the data of 630 individuals representing 8 populations worldwide. Our analyses revealed that Han Chinese, Japanese and Korean populations have distinct genetic makeup and can be well distinguished based on either the genome wide data or a panel of ancestry informative markers (AIMs). Their genetic structure corresponds well to their geographical distributions, indicating geographical isolation played a critical role in driving population differentiation in East Asia. The most recent common ancestor of the three populations was dated back to 3000 ~ 3600 years ago. Our analyses also revealed substantial admixture within the three populations which occurred subsequent to initial splits, and distinct gene introgression from surrounding populations, of which northern ancestral component is dominant.

These estimations and findings facilitate the understanding of population history and the mechanisms of human genetic diversity in East Asia, and have implications for both evolutionary and medical studies.

Population-level phylogenetic tree and principal component analysis (PCA). (A) The maximum likelihood tree was constructed based on the pairwise FST matrix; the marked numbers are bootstrap values. (B) The top two PCs of individuals representing six East Asian populations, mapped to their corresponding geographic locations (generated by R 2.15.2 and Microsoft Excel 2010).
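The tree in panel (A) starts from a matrix of pairwise FST values. As a rough illustration of what a single cell of such a matrix measures, here is a minimal sketch using Hudson's estimator on per-SNP allele frequencies. This estimator is swapped in for brevity and is not necessarily the one used in the paper; the function name, the toy frequencies and the sample sizes are all assumptions made up for the example.

```python
def hudson_fst(freqs_1, freqs_2, n_1, n_2):
    """Hudson's FST estimate between two populations (ratio of sums over SNPs).

    freqs_1, freqs_2: sample allele frequencies per SNP, in the same SNP order.
    n_1, n_2: number of sampled chromosomes in each population.
    """
    if len(freqs_1) != len(freqs_2) or not freqs_1:
        raise ValueError("need two non-empty frequency lists of equal length")
    if n_1 < 2 or n_2 < 2:
        raise ValueError("need at least two chromosomes per population")
    numerator = 0.0
    denominator = 0.0
    for p1, p2 in zip(freqs_1, freqs_2):
        # Squared frequency difference, minus the part expected from
        # finite sampling alone.
        numerator += ((p1 - p2) ** 2
                      - p1 * (1 - p1) / (n_1 - 1)
                      - p2 * (1 - p2) / (n_2 - 1))
        # Chance that two alleles drawn from different populations differ.
        denominator += p1 * (1 - p2) + p2 * (1 - p1)
    if denominator == 0:
        raise ValueError("no informative SNPs (all sites fixed for the same allele)")
    return numerator / denominator


# Hypothetical toy frequencies: pop_a and pop_b are similar, pop_c is not.
pop_a = [0.10, 0.50, 0.80]
pop_b = [0.15, 0.45, 0.85]
pop_c = [0.60, 0.10, 0.30]
fst_ab = hudson_fst(pop_a, pop_b, 200, 200)
fst_ac = hudson_fst(pop_a, pop_c, 200, 200)
```

With these toy values `fst_ab` comes out much smaller than `fst_ac`, which is the kind of contrast a distance-based tree is then built from.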

Interesting excerpts:

It is obvious that the genetic difference among the three East Asian groups initially resulted from population divergence due to pre-historical or historical migrations. Subsequently, different geographical locations where the three populations are located, mainland of China, Korean Peninsula and Japanese archipelago, respectively, apparently facilitated population differentiation due to physical isolation and independent genetic drift. Our estimations of population divergence time among the three groups, 1.2 ~ 3.6 KYA, are largely consistent with known history of the three populations and those related. However, considering that recent admixture could have reduced genetic difference between populations, it is likely the divergence time was underestimated.

We detected substantial gene flow among the three populations and also from the surrounding populations. For example, based on our analysis with the F3 test, Korean received gene flow from Han Chinese and Japanese, and gene flow also happened between Han Chinese and Japanese (Additional file 12: Table S3). These gene flows are expected to have reduced the genetic differentiation between the three ethnic groups. On the other hand, we also detected considerable gene flow from surrounding populations to the three populations studied. For instance, an ancestral population represented by Ryukyuan has contributed more to Japanese than to Han Chinese, while southern ethnic groups like Dai have contributed more to continental populations than to island and peninsula populations. Contrary to the gene flow among the three populations, these gene flows from surrounding populations are expected to have increased genetic difference among the three populations if they occurred independently and from different source populations. According to our results, the major sources of gene flow to the three ethnic groups were substantially different: for example, the major source of gene flow to Han Chinese was southern ethnic groups, the major source of gene flow to Japanese was the southern islands, and the major sources of gene flow to Korean were both the mainland and the islands. Therefore, those gene flows might have significantly contributed to further genetic differentiation of the three populations.
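For readers unfamiliar with the F3 test mentioned in the excerpt: the admixture f3 statistic f3(C; A, B) averages (c - a)(c - b) over SNPs, and a clearly negative value suggests that C carries ancestry related to both A and B. Below is a minimal sketch, assuming per-SNP sample allele frequencies and a simple finite-sample correction on the target. The function name and all frequencies are made up for illustration, and it leaves out the significance test (usually a block jackknife Z-score) that a real analysis would need.

```python
def f3_admixture(target, source_a, source_b, n_target):
    """Mean bias-corrected f3(target; A, B) over SNPs.

    target, source_a, source_b: sample allele frequencies per SNP.
    n_target: number of sampled chromosomes in the target population.
    """
    if not (len(target) == len(source_a) == len(source_b)) or not target:
        raise ValueError("need three non-empty frequency lists of equal length")
    if n_target < 2:
        raise ValueError("need at least two target chromosomes")
    total = 0.0
    for c, a, b in zip(target, source_a, source_b):
        # Sampling noise in the target frequency inflates (c - a)(c - b);
        # subtract its expected contribution.
        correction = c * (1.0 - c) / (n_target - 1)
        total += (c - a) * (c - b) - correction
    return total / len(target)


# Hypothetical toy data: "mixed" is an exact 50/50 blend of the two sources,
# "drifted" is a population that simply drifted to fixation on its own.
source_a = [0.1, 0.9, 0.3]
source_b = [0.7, 0.2, 0.8]
mixed = [(a + b) / 2 for a, b in zip(source_a, source_b)]
drifted = [0.0, 1.0, 0.0]

f3_mixed = f3_admixture(mixed, source_a, source_b, n_target=200)
f3_drifted = f3_admixture(drifted, source_a, source_b, n_target=200)
```

In this toy setup `f3_mixed` is negative and `f3_drifted` is positive, which is the sign logic behind statements like "Korean received gene flow from Han Chinese and Japanese".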

The three populations have similar but not identical demographical history; they all experienced a strong population expansion in the last 20,000 years. However, according to different geographic distribution, their effective population size and population expansion are different.

Although based on modern populations, the study is interesting in light of the potential implications for a Macro-Altaic proposal.


Ancient genomes document multiple waves of migration in south-east Asian prehistory


Open access preprint at bioRxiv Ancient genomes document multiple waves of migration in Southeast Asian prehistory, by Lipson, Cheronet, Mallick, et al. (2018).

Abstract (emphasis mine):

Southeast Asia is home to rich human genetic and linguistic diversity, but the details of past population movements in the region are not well known. Here, we report genome-wide ancient DNA data from thirteen Southeast Asian individuals spanning from the Neolithic period through the Iron Age (4100-1700 years ago). Early agriculturalists from Man Bac in Vietnam possessed a mixture of East Asian (southern Chinese farmer) and deeply diverged eastern Eurasian (hunter-gatherer) ancestry characteristic of Austroasiatic speakers, with similar ancestry as far south as Indonesia providing evidence for an expansive initial spread of Austroasiatic languages. In a striking parallel with Europe, later sites from across the region show closer connections to present-day majority groups, reflecting a second major influx of migrants by the time of the Bronze Age.

Schematics of admixture graph results. (A) Wider phylogenetic context. (B) Details of the Austroasiatic clade. Branch lengths are not to scale, and the order of the two events on the Nicobarese lineage in (B) is not well determined (Supplementary Text).

Featured image, from the article: “Overview of samples. (A) Locations and dates of ancient individuals. Overlapping positions are shifted slightly for visibility. (B) PCA with East and Southeast Asians. We projected the ancient samples onto axes computed using the present-day populations (with the exception of Mlabri, who were projected instead due to their large population-specific drift). Present-day colors indicate language family affiliation: green, Austroasiatic; blue, Austronesian; orange, Hmong-Mien; black, Sino-Tibetan; magenta, Tai-Kadai.”


Ancient Di-Qiang people show early links with Han Chinese


Bernard Sécher reports on a recent article, Ancient DNA reveals genetic connections between early Di-Qiang and Han Chinese, by Li et al., BMC Evolutionary Biology (2017).


Ancient Di-Qiang people once resided in the Ganqing region of China, adjacent to the Central Plain area from where Han Chinese originated. While gene flow between the Di-Qiang and Han Chinese has been proposed, there is no evidence to support this view. Here we analyzed the human remains from an early Di-Qiang site (Mogou site dated ~4000 years old) and compared them to other ancient DNA across China, including an early Han-related site (Hengbei site dated ~3000 years old) to establish the underlying genetic relationship between the Di-Qiang and ancestors of Han Chinese.

We found Mogou mtDNA haplogroups were highly diverse, comprising 14 haplogroups: A, B, C, D (D*, D4, D5), F, G, M7, M8, M10, M13, M25, N*, N9a, and Z. In contrast, Mogou males were all Y-DNA haplogroup O3a2/P201; specifically one male was further assigned to O3a2c1a/M117 using targeted unique regions on the non-recombining region of the Y-chromosome. We compared Mogou to 7 other ancient and 38 modern Chinese groups, in a total of 1793 individuals, and found that Mogou shared close genetic distances with Taojiazhai (a more recent Di-Qiang population), Hengbei, and Northern Han. We modeled their interactions using Approximate Bayesian Computation, and support was given to a potential admixture of ~13-18% between the Mogou and Northern Han around 3300–3800 years ago.

Mogou harbors the earliest genetically identifiable Di-Qiang, ancestral to the Taojiazhai, and up to ~33% paternal and ~70% of its maternal haplogroups could be found in present-day Northern Han Chinese.

MDS plot of genetic distance Fst between 3 ancient and 38 modern Chinese groups

Interesting times now for the investigation of potential migrations associated with the expansion of Sino-Tibetan and Altaic languages.


Two more studies on the genetic history of East Asia: Han Chinese and Thailand


A comprehensive map of genetic variation in the world’s largest ethnic group – Han Chinese, by Chiang et al. (2017).

It is believed – based on uniparental markers from modern and ancient DNA samples and array-based genome-wide data – that Han Chinese originated in the Central Plain region of China during prehistoric times, expanding with agriculture and technology northward and southward, to become the largest Chinese ethnic group.


As are most non-European populations around the globe, the Han Chinese are relatively understudied in population and medical genetics studies. From low-coverage whole-genome sequencing of 11,670 Han Chinese women we present a catalog of 25,057,223 variants, including 548,401 novel variants that are seen at least 10 times in our dataset. Individuals from our study come from 19 out of 22 provinces across China, allowing us to study population structure, genetic ancestry, and local adaptation in Han Chinese. We identify previously unrecognized population structure along the East-West axis of China and report unique signals of admixture across geographical space, such as European influences among the Northwestern provinces of China. Finally, we identified a number of highly differentiated loci, indicative of local adaptation in the Han Chinese. In particular, we detected extreme differentiation among the Han Chinese at MTHFR, ADH7, and FADS loci, suggesting that these loci may not be specifically selected in Tibetan and Inuit populations as previously suggested. On the other hand, we find that Neandertal ancestry does not vary significantly across the provinces, consistent with admixture prior to the dispersal of modern Han Chinese. Furthermore, contrary to a previous report, Neandertal ancestry does not explain a significant amount of heritability in depression. Our findings provide the largest genetic data set so far made available for Han Chinese and provide insights into the history and population structure of the world’s largest ethnic group.

Using Shanghai individuals as representatives, shared drift between Chinese and ancient humans is computed with outgroup f3 statistics of the form f3(Mbuti; X, Y), with ancient individuals separated into approximately Palaeolithic, Mesolithic, Neolithic, and Chalcolithic-Medieval times. Modern Chinese individuals show greater shared drift with pre-Neolithic hunter-gatherers than with Neolithic farmers (featured image, from the article).
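To make the outgroup f3 idea concrete: f3(O; X, Y) averages (o - x)(o - y) over SNPs, where o, x and y are allele frequencies in the outgroup and the two test populations. The larger the value, the more drift X and Y share after splitting from the outgroup. The following is a minimal sketch of the plain, uncorrected statistic; the function name and every frequency are invented for the example and do not come from the paper.

```python
def outgroup_f3(outgroup, pop_x, pop_y):
    """Plain f3(O; X, Y): mean of (o - x) * (o - y) across SNPs."""
    if not (len(outgroup) == len(pop_x) == len(pop_y)) or not outgroup:
        raise ValueError("need three non-empty frequency lists of equal length")
    products = [(o - x) * (o - y) for o, x, y in zip(outgroup, pop_x, pop_y)]
    return sum(products) / len(products)


# Hypothetical allele frequencies at four SNPs.
outgroup = [0.5, 0.5, 0.5, 0.5]
modern = [0.8, 0.1, 0.6, 0.3]
ancient_a = [0.7, 0.2, 0.5, 0.3]   # drifted in the same direction as "modern"
ancient_b = [0.2, 0.7, 0.1, 0.8]   # drifted in the opposite direction

shared_with_a = outgroup_f3(outgroup, modern, ancient_a)
shared_with_b = outgroup_f3(outgroup, modern, ancient_b)
```

Here `shared_with_a` is larger than `shared_with_b`, so in this toy case "modern" would be read as sharing more drift with `ancient_a`. Real analyses compute this across hundreds of thousands of SNPs and attach standard errors.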

EDIT (17/7/2017): Davidski at Eurogenes shares an interesting view on this kind of result:

These sorts of estimates always look way off. And I doubt that it’s largely the result of the Silk Road, which linked China to the Near East and Mediterranean rather than to Northern Europe. More likely it reflects gene flow from the Pontic-Caspian steppe in Eastern Europe during the Bronze and Iron ages, via the Afanasievo, Andronovo, and other closely related steppe peoples.

New insights from Thailand into the maternal genetic history of Mainland Southeast Asia, by Kutanan et al. (2017)


Tai-Kadai (TK) is one of the major language families in Mainland Southeast Asia (MSEA), with a concentration in the area of Thailand and Laos. Our previous study of 1,234 mtDNA genome sequences supported a demic diffusion scenario in the spread of TK languages from southern China to Laos as well as northern and northeastern Thailand. Here we add an additional 560 mtDNA sequences from 22 groups, with a focus on the TK-speaking central Thai people and the Sino-Tibetan speaking Karen. We find extensive diversity, including 62 haplogroups not reported previously from this region. Demic diffusion is still a preferable scenario for central Thais, emphasizing the extension and expansion of TK people through MSEA, although there is also some support for an admixture model. We also tested competing models concerning the genetic relationships of groups from the major MSEA languages, and found support for an ancestral relationship of TK and Austronesian-speaking groups.