South-East Asia samples include shared ancestry with Jōmon


New paper (behind paywall): The prehistoric peopling of Southeast Asia, by McColl et al., Science (2018) 361(6397):88-92, following a recent bioRxiv preprint.

Of interest is this apparently newly reported information, which includes a female sample from the Ikawazu Jōmon of Japan, ca. 570 BC (emphasis mine):

The two oldest samples, Hòabìnhians from Pha Faen, Laos [La368; 7950 to 7795 calendar years before the present (cal B.P.)] and Gua Cha, Malaysia (Ma911; 4415 to 4160 cal B.P.), henceforth labeled “group 1,” cluster most closely with present-day Önge from the Andaman Islands and away from other East Asian and Southeast-Asian populations (Fig. 2), a pattern that differentiates them from all other ancient samples. We used ADMIXTURE (14) and fastNGSadmix (15) to model ancient genomes as mixtures of latent ancestry components (11). Group 1 individuals differ from the other Southeast Asian ancient samples in containing components shared with the supposed descendants of the Hòabìnhians: the Önge and the Jehai (Peninsular Malaysia), along with groups from India and Papua New Guinea.

We also find a distinctive relationship between the group 1 samples and the Ikawazu Jōmon of Japan (IK002). Outgroup f3 statistics (11, 16) show that group 1 shares the most genetic drift with all ancient mainland samples and Jōmon (fig. S12 and table S4). All other ancient genomes share more drift with present-day East Asian and Southeast Asian populations than with Jōmon (figs. S13 to S19 and tables S4 to S11). This is apparent in the fastNGSadmix analysis when assuming six ancestral components (K = 6) (fig. S11), where the Jōmon sample contains East Asian components and components found in group 1. To detect populations with genetic affinities to Jōmon, relative to present-day Japanese, we computed D statistics of the form D(Japanese, Jōmon; X, Mbuti), setting X to be different present-day and ancient Southeast Asian individuals (table S22). The strongest signal is seen when X = Ma911 and La368 (group 1 individuals), showing a marginally nonsignificant affinity to Jōmon (11). This signal is not observed with X = Papuans or Önge, suggesting that the Jōmon and Hòabìnhians may share group 1 ancestry (11).
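For readers unfamiliar with the statistics mentioned in the excerpt, here is a minimal sketch of how a D statistic of the form D(P1, P2; P3, P4) and an outgroup f3 statistic can be computed from per-site allele frequencies. It is not the authors' pipeline: the function names and the toy frequencies are hypothetical, the f3 estimator omits the finite-sample bias correction, and the sketch does not compute standard errors (significance is usually assessed with a block jackknife Z-score).

```python
from typing import Sequence


def d_statistic(p1: Sequence[float], p2: Sequence[float],
                p3: Sequence[float], p4: Sequence[float]) -> float:
    """D(P1, P2; P3, P4) from per-site allele frequencies in [0, 1].

    Uses the frequency form of the ABBA-BABA test:
    sum((p1 - p2) * (p3 - p4)) / sum((p1 + p2 - 2*p1*p2) * (p3 + p4 - 2*p3*p4)).
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in zip(p1, p2, p3, p4):
        numerator += (a - b) * (c - d)
        denominator += (a + b - 2 * a * b) * (c + d - 2 * c * d)
    if denominator == 0:
        raise ValueError("no informative sites")
    return numerator / denominator


def outgroup_f3(outgroup: Sequence[float], pop_a: Sequence[float],
                pop_b: Sequence[float]) -> float:
    """Simplified f3(Outgroup; A, B): mean of (o - a) * (o - b) over sites.

    Larger values mean more drift shared by A and B since their split
    from the outgroup. No bias correction is applied (an assumption
    made to keep the sketch short).
    """
    products = [(o - a) * (o - b) for o, a, b in zip(outgroup, pop_a, pop_b)]
    if not products:
        raise ValueError("no sites")
    return sum(products) / len(products)


if __name__ == "__main__":
    # Hypothetical toy frequencies at four sites, for illustration only.
    japanese = [0.9, 0.2, 0.5, 0.7]
    jomon = [0.5, 0.6, 0.5, 1.0]
    x = [0.4, 0.8, 0.3, 1.0]
    mbuti = [0.1, 0.1, 0.3, 0.2]
    print("D(Japanese, Jomon; X, Mbuti) =", d_statistic(japanese, jomon, x, mbuti))
    print("f3(Mbuti; Jomon, X) =", outgroup_f3(mbuti, jomon, x))
```

Under the sign convention of this sketch, a negative D(Japanese, Jōmon; X, Mbuti) would indicate that X shares more alleles with Jōmon than with present-day Japanese; other software may define the sign the other way round.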

Model for plausible migration routes into SEA. This schematic is based on ancestry patterns observed in the ancient genomes. Because we do not have ancient samples to accurately resolve how the ancestors of Jōmon and Japanese populations entered the Japanese archipelago, these migrations are represented by dashed arrows. A mainland component in Indonesia is depicted by the dashed red-green line. Gr, group; Kra, Kradai.

(…) Finally, the Jōmon individual is best-modeled as a mix between a population related to group 1/Önge and a population related to East Asians (Amis), whereas present-day Japanese can be modeled as a mixture of Jōmon and an additional East Asian component (Fig. 3 and fig. S29)

This is interesting in relation to the SMBE oral communication O-03-OS02, Whole genome analysis of the Jomon remain reveals deep lineage of East Eurasian populations, by Gakuhari et al.:

Post late-Paleolithic hunter-gatherers lived throughout the Japanese archipelago, Jomonese, are thought to be a key to understanding the peopling history in East Asia. Here, we report a whole genome sequence (x1.85) of 2,500-year old female excavated from the Ikawazu shell-mound, unearthed typical remains of Jomon culture. The whole genome data places the Jomon as a lineage basal to contemporary and ancient populations of the eastern part of Eurasian continent, and supports the closest relationship with the modern Hokkaido Ainu. The results of ADMIXTURE show the Jomon ancestry is prevalent in present-day Nivkh, Ulchi, and people in the main-island Japan. By including the Jomon genome into phylogenetic trees, ancient lineages of the Kusunda and the Sherpa/Tibetan, early splitting from the rest of East Asian populations, is emerged. Thus, the Jomon genome gives a new insight in East Asian expansion. The Ikawazu shell-mound site locates on 34°38′43″ north latitude, and 137°8′52″ east longitude in the central main-island of the Japanese archipelago, corresponding to a warm and humid monsoon region, which has been thought to be almost impossible to maintain sufficient ancient DNA for genome analysis. Our achievement opens up new possibilities for such geographical regions.


Mitogenomes from Thailand offer insights into maternal genetic history of mainland South-East Asia

Open access: New insights from Thailand into the maternal genetic history of Mainland Southeast Asia, by Kutanan et al., Eur. J. Hum. Genet. (2018) 26:898–911.

Abstract (emphasis mine):

Tai-Kadai (TK) is one of the major language families in Mainland Southeast Asia (MSEA), with a concentration in the area of Thailand and Laos. Our previous study of 1234 mtDNA genome sequences supported a demic diffusion scenario in the spread of TK languages from southern China to Laos as well as northern and northeastern Thailand. Here we add an additional 560 mtDNA genomes from 22 groups, with a focus on the TK-speaking central Thai people and the Sino-Tibetan speaking Karen. We find extensive diversity, including 62 haplogroups not reported previously from this region. Demic diffusion is still a preferable scenario for central Thais, emphasizing the expansion of TK people through MSEA, although there is also some support for gene flow between central Thai and native Austroasiatic speaking Mon and Khmer. We also tested competing models concerning the genetic relationships of groups from the major MSEA languages, and found support for an ancestral relationship of TK and Austronesian-speaking groups.

Map showing sample locations and haplogroup distributions. Blue stars indicate the 22 presently studied populations (Tai-Kadai, Austroasiatic, and Sino-Tibetan groups) while red and green circles represent Tai-Kadai and Austroasiatic populations from the previous study [7]. Population abbreviations are in Supplementary Table S1

Interesting excerpts:

Finally, we used simulations to test hypotheses concerning the genetic relationships of groups belonging to different language families. We found that Starosta’s model [11] provided the best fit to the mtDNA data; however, Sagart’s model [9, 10] was also highly supported. These two models both postulate a close linguistic affinity between TK and AN. Although genetic relatedness between TK and AN groups has been previously studied [7, 46, 47], to our knowledge this is the first study to use demographic simulations to select the best-fitting model. Our results support the genetic relatedness of TK and AN groups, which might reflect a postulated shared ancestry among the proto-Austronesian populations of coastal East Asia [48].

Specifically, the best-fitting model suggests that after separation of the prehistoric TK from AN stocks around 5–6 kya in Southeast China, the TK spread southward throughout MSEA around 1–2 kya by a demic diffusion process, accompanied by population growth but with at most minor admixture with the autochthonous AA groups. Meanwhile, the prehistorical AN ancestors entered Taiwan and dispersed southward throughout ISEA, with these two expansions later meeting in western ISEA. The lack of mtDNA haplogroups associated with the expansion out of Taiwan in our Thai/Lao samples has two possible explanations: either the Out of Taiwan expansion did not reach MSEA (at least, in the area of present-day Thailand and Laos); or, if the prehistoric AN migrated through this area, their mtDNA lineages do not survive in modern Thai/Lao populations. Ancient DNA studies in MSEA would further clarify this issue. Moreover, although mtDNA analyses are informative in elucidating genetic perspectives in geographically and linguistically related populations, they have an obvious limitation in that they only provide insights into the maternal history of populations. Future studies of Y chromosomal and genome-wide data will provide further insights into the genetic history of Thai/Lao populations and the role of factors such as post-marital residence patterns and migration in shaping the genetic structure of the region.

Starosta’s chapter referred to in the paper is Proto-East Asian and the origin and dispersal of the languages of East and Southeast Asia and the Pacific.


Genomics reveals four prehistoric migration waves into South-East Asia

Open access preprint at bioRxiv: Ancient Genomics Reveals Four Prehistoric Migration Waves into Southeast Asia, by McColl, Racimo, Vinner, et al. (2018).

Abstract (emphasis mine):

Two distinct population models have been put forward to explain present-day human diversity in Southeast Asia. The first model proposes long-term continuity (Regional Continuity model) while the other suggests two waves of dispersal (Two Layer model). Here, we use whole-genome capture in combination with shotgun sequencing to generate 25 ancient human genome sequences from mainland and island Southeast Asia, and directly test the two competing hypotheses. We find that early genomes from Hoabinhian hunter-gatherer contexts in Laos and Malaysia have genetic affinities with the Onge hunter-gatherers from the Andaman Islands, while Southeast Asian Neolithic farmers have a distinct East Asian genomic ancestry related to present-day Austroasiatic-speaking populations. We also identify two further migratory events, consistent with the expansion of speakers of Austronesian languages into Island Southeast Asia ca. 4 kya, and the expansion by East Asians into northern Vietnam ca. 2 kya. These findings support the Two Layer model for the early peopling of Southeast Asia and highlight the complexities of dispersal patterns from East Asia.

A model for plausible migration routes into Southeast Asia, based on the ancestry patterns observed in the ancient genomes.


New preprint papers on Finland’s population history and disease, skin pigmentation in Africa, and genetic variation in Thailand hunter-gatherers


New and interesting research published recently on bioRxiv:

Haplotype sharing provides insights into fine-scale population history and disease in Finland, by Martin et al. (2017):

Finland provides unique opportunities to investigate population and medical genomics because of its adoption of unified national electronic health records, detailed historical and birth records, and serial population bottlenecks. We assemble a comprehensive view of recent population history (≤100 generations), the timespan during which most rare disease-causing alleles arose, by comparing pairwise haplotype sharing from 43,254 Finns to geographically and linguistically adjacent countries with different population histories, including 16,060 Swedes, Estonians, Russians, and Hungarians. We find much more extensive sharing in Finns, with at least one ≥ 5 cM tract on average between pairs of unrelated individuals. By coupling haplotype sharing with fine-scale birth records from over 25,000 individuals, we find that while haplotype sharing broadly decays with geographical distance, there are pockets of excess haplotype sharing; individuals from northeast Finland share several-fold more of their genome in identity-by-descent (IBD) segments than individuals from southwest regions containing the major cities of Helsinki and Turku. We estimate recent effective population size changes over time across regions of Finland and find significant differences between the Early and Late Settlement Regions as expected; however, our results indicate more continuous gene flow than previously indicated as Finns migrated towards the northernmost Lapland region. Lastly, we show that haplotype sharing is locally enriched among pairs of individuals sharing rare alleles by an order of magnitude, especially among pairs sharing rare disease causing variants. Our work provides a general framework for using haplotype sharing to reconstruct an integrative view of recent population history and gain insight into the evolutionary origins of rare variants contributing to disease.

Migration rates and haplotype sharing within Finland and between neighboring countries. A) Map of regional Finnish, Swedish, and Estonian birthplaces. Purple triangle indicates St. Petersburg, Russia. Hungary not shown. Finnish, Swedish, and Estonian region labels are shown in Table S3. B) Principal components analysis (PCA) of unrelated individuals, colored by birth region as shown in A) if available or country otherwise. C-D) Migration rates inferred with EEMS. Values and colors indicate inferred rates, for example with +1 (shades of blue) indicating an order of magnitude more migration at a given point on average, and shades of orange indicating migration barriers. C) Migration rates among municipalities in Finland. D) Migration rates within and between Finland, Sweden, Estonia, and St. Petersburg, Russia. Available under a CC-BY 4.0 International license.

Useful for understanding this paper is the wider body of research published by the Institute for Molecular Medicine Finland (FIMM): its website contains detailed research on Finland’s recent genetic history.

NOTE: The featured image of this article contains three figures from the FIMM (License CC-BY 4.0). Left: Position of the points represents the locations of 1042 Finnish individuals. By clustering the individuals into two groups based on genome data we see a split between eastern (blue) and western (red) parts. Individuals who show considerable relatedness to both groups have been colored with cyan. Both parents of each individual were born close to each other and based on the parents’ birth years we can infer that we are looking at the genetic structure present in Finland before the 1950s. Center: An estimated borderline of the Treaty of Nöteborg on top of the map from the left. The border line is drawn between Jääski (61.04 N, 28.92 E) and Pyhäjoki (64.46 N, 24.26 E). Right: The settlement border divides Finland into the early settlement region (to the west and south of the border) and the late settlement region (to the east and north of the border) (Jutikkala 1933, p. 91). We see that Southern Savo (in the south-eastern part of the early settlement region) is among the only parts of the early settlement region that is dominated by the eastern genetic group. Information from Matti Pirinen and Sini Kerminen, 24.5.2017.

An Unexpectedly Complex Architecture for Skin Pigmentation in Africans, by Martin et al. (2017):

Fewer than 15 genes have been directly associated with skin pigmentation variation in humans, leading to its characterization as a relatively simple trait. However, by assembling a global survey of quantitative skin pigmentation phenotypes, we demonstrate that pigmentation is more complex than previously assumed with genetic architecture varying by latitude. We investigate polygenicity in the Khoe and the San, populations indigenous to southern Africa, who have considerably lighter skin than equatorial Africans. We demonstrate that skin pigmentation is highly heritable, but that known pigmentation loci explain only a small fraction of the variance. Rather, baseline skin pigmentation is a complex, polygenic trait in the KhoeSan. Despite this, we identify canonical and non-canonical skin pigmentation loci, including near SLC24A5, TYRP1, SMARCA2/VLDLR, and SNX13 using a genome-wide association approach complemented by targeted resequencing. By considering diverse, under-studied African populations, we show how the architecture of skin pigmentation can vary across humans subject to different local evolutionary pressures.

Contrasting maternal and paternal genetic variation of hunter-gatherer groups in Thailand, by Kutanan et al. (2017):

The Maniq and Mlabri are the only recorded nomadic hunter-gatherer groups in Thailand. Here, we sequenced complete mitochondrial (mt) DNA genomes and ~2.364 Mbp of non-recombining Y chromosome (NRY) to learn more about the origins of these two enigmatic populations. Both groups exhibited low genetic diversity compared to other Thai populations, and contrasting patterns of mtDNA and NRY diversity: there was greater mtDNA diversity in the Maniq than in the Mlabri, while the converse was true for the NRY. We found basal uniparental lineages in the Maniq, namely mtDNA haplogroups M21a, R21 and M17a, and NRY haplogroup K. Overall, the Maniq are genetically similar to other negrito groups in Southeast Asia. By contrast, the Mlabri haplogroups (B5a1b1 for mtDNA and O1b1a1a1b and O1b1a1a1b1a1 for the NRY) are common lineages in Southeast Asian non-negrito groups, and overall the Mlabri are genetically similar to their linguistic relatives (Htin and Khmu) and other groups from northeastern Thailand. In agreement with previous studies of the Mlabri, our results indicate that the Mlabri do not directly descend from the indigenous negritos. Instead, they likely have a recent origin (within the past 1,000 years) by an extreme founder event (involving just one maternal and two paternal lineages) from an agricultural group, most likely the Htin or a closely-related group.