South-East Asia samples include shared ancestry with Jōmon


New paper (behind paywall) The prehistoric peopling of Southeast Asia, by McColl et al. Science (2018) 361(6397):88-92, from a recent bioRxiv preprint.

Of interest is the apparently newly reported information, which includes a female sample from the Ikawazu Jōmon of Japan, ca. 570 BC (emphasis mine):

The two oldest samples, Hòabìnhians from Pha Faen, Laos [La368; 7950 to 7795 calendar years before the present (cal B.P.)] and Gua Cha, Malaysia (Ma911; 4415 to 4160 cal B.P.), henceforth labeled “group 1,” cluster most closely with present-day Önge from the Andaman Islands and away from other East Asian and Southeast-Asian populations (Fig. 2), a pattern that differentiates them from all other ancient samples. We used ADMIXTURE (14) and fastNGSadmix (15) to model ancient genomes as mixtures of latent ancestry components (11). Group 1 individuals differ from the other Southeast Asian ancient samples in containing components shared with the supposed descendants of the Hòabìnhians: the Önge and the Jehai (Peninsular Malaysia), along with groups from India and Papua New Guinea.

We also find a distinctive relationship between the group 1 samples and the Ikawazu Jōmon of Japan (IK002). Outgroup f3 statistics (11, 16) show that group 1 shares the most genetic drift with all ancient mainland samples and Jōmon (fig. S12 and table S4). All other ancient genomes share more drift with present-day East Asian and Southeast Asian populations than with Jōmon (figs. S13 to S19 and tables S4 to S11). This is apparent in the fastNGSadmix analysis when assuming six ancestral components (K = 6) (fig. S11), where the Jōmon sample contains East Asian components and components found in group 1. To detect populations with genetic affinities to Jōmon, relative to present-day Japanese, we computed D statistics of the form D(Japanese, Jōmon; X, Mbuti), setting X to be different present-day and ancient Southeast Asian individuals (table S22). The strongest signal is seen when X = Ma911 and La368 (group 1 individuals), showing a marginally nonsignificant affinity to Jōmon (11). This signal is not observed with X = Papuans or Önge, suggesting that the Jōmon and Hòabìnhians may share group 1 ancestry (11).
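For readers unfamiliar with the statistics quoted above, here is a minimal sketch of how a D statistic of the form D(W, X; Y, Z) and an outgroup f3 can be computed from per-SNP allele frequencies. It follows the usual frequency-based definitions, without the block-jackknife standard errors that real analyses use to judge significance. The function names and the allele frequencies are invented for illustration and are not data from the paper. Under the sign convention used here, a negative D(Japanese, Jōmon; X, Mbuti) would mean that X shares more alleles with Jōmon than with Japanese.

```python
def d_statistic(w, x, y, z):
    """D(W, X; Y, Z) from per-SNP allele frequencies (equal-length lists)."""
    rows = list(zip(w, x, y, z))
    num = sum((wi - xi) * (yi - zi) for wi, xi, yi, zi in rows)
    den = sum((wi + xi - 2 * wi * xi) * (yi + zi - 2 * yi * zi)
              for wi, xi, yi, zi in rows)
    if den == 0:
        raise ValueError("no informative sites")
    return num / den


def outgroup_f3(o, a, b):
    """Uncorrected f3(O; A, B): mean of (o - a) * (o - b) over SNPs.

    Larger values mean A and B share more drift since splitting from O.
    """
    if not o:
        raise ValueError("no sites")
    return sum((oi - ai) * (oi - bi) for oi, ai, bi in zip(o, a, b)) / len(o)


# Invented toy frequencies at four SNPs (NOT real data).
japanese = [0.9, 0.1, 0.8, 0.2]
jomon = [0.6, 0.4, 0.5, 0.5]
group1 = [0.2, 0.8, 0.1, 0.9]
mbuti = [1.0, 0.0, 1.0, 0.0]

d_value = d_statistic(japanese, jomon, group1, mbuti)
f3_value = outgroup_f3(mbuti, jomon, group1)
print(f"D(Japanese, Jomon; group1, Mbuti) = {d_value:.3f}")
print(f"f3(Mbuti; Jomon, group1) = {f3_value:.3f}")
```

In this toy data the D value is negative because the invented Jōmon frequencies sit closer to the invented group 1 frequencies than the Japanese ones do.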

Model for plausible migration routes into SEA. This schematic is based on ancestry patterns observed in the ancient genomes. Because we do not have ancient samples to accurately resolve how the ancestors of Jōmon and Japanese populations entered the Japanese archipelago, these migrations are represented by dashed arrows. A mainland component in Indonesia is depicted by the dashed red-green line. Gr, group; Kra, Kradai.

(…) Finally, the Jōmon individual is best-modeled as a mix between a population related to group 1/Önge and a population related to East Asians (Amis), whereas present-day Japanese can be modeled as a mixture of Jōmon and an additional East Asian component (Fig. 3 and fig. S29)
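The idea of modelling one population as a mix of two others can be illustrated with a very simple sketch. This is not the graph-fitting approach used in the paper; it is a plain least-squares estimate of a single mixing proportion, assuming the target's allele frequencies are a linear blend of two source populations. The function name and all numbers are hypothetical toy values.

```python
def mixture_proportion(target, source_a, source_b):
    """Least-squares alpha such that target ~ alpha * A + (1 - alpha) * B."""
    num = sum((t - b) * (a - b) for t, a, b in zip(target, source_a, source_b))
    den = sum((a - b) ** 2 for a, b in zip(source_a, source_b))
    if den == 0:
        raise ValueError("sources are identical at all sites")
    # Clamp to [0, 1] so the result can be read as a proportion.
    return min(1.0, max(0.0, num / den))


# Invented toy allele frequencies at three SNPs (NOT real data).
source_a = [0.2, 0.9, 0.4]
source_b = [0.8, 0.1, 0.6]
# Target built as 25% source A plus 75% source B, so alpha should be 0.25.
target = [0.25 * a + 0.75 * b for a, b in zip(source_a, source_b)]

alpha = mixture_proportion(target, source_a, source_b)
print(f"estimated proportion from source A: {alpha:.2f}")
```

Real analyses also account for drift after admixture and for sampling noise, which this sketch ignores.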

Interesting in relation to the SMBE oral communication O-03-OS02, Whole genome analysis of the Jomon remain reveals deep lineage of East Eurasian populations, by Gakuhari et al.:

Post late-Paleolithic hunter-gatherers lived throughout the Japanese archipelago, Jomonese, are thought to be a key to understanding the peopling history in East Asia. Here, we report a whole genome sequence (x1.85) of 2,500-year old female excavated from the Ikawazu shell-mound, unearthed typical remains of Jomon culture. The whole genome data places the Jomon as a lineage basal to contemporary and ancient populations of the eastern part of Eurasian continent, and supports the closest relationship with the modern Hokkaido Ainu. The results of ADMIXTURE show the Jomon ancestry is prevalent in present-day Nivkh, Ulchi, and people in the main-island Japan. By including the Jomon genome into phylogenetic trees, ancient lineages of the Kusunda and the Sherpa/Tibetan, early splitting from the rest of East Asian populations, is emerged. Thus, the Jomon genome gives a new insight in East Asian expansion. The Ikawazu shell-mound site locates on 34,38,43 north latitude, and 137,8, 52 east longitude in the central main-island of the Japanese archipelago, corresponding to a warm and humid monsoon region, which has been thought to be almost impossible to maintain sufficient ancient DNA for genome analysis. Our achievement opens up new possibilities for such geographical regions.


Expansion of domesticated goat echoes expansion of early farmers


New paper (behind paywall) Ancient goat genomes reveal mosaic domestication in the Fertile Crescent, by Daly et al. Science (2018) 361(6397):85-88.

Interesting excerpts (emphasis mine):

Thus, our data favor a process of Near Eastern animal domestication that is dispersed in space and time, rather than radiating from a central core (3, 11). This resonates with archaeozoological evidence for disparate early management strategies from early Anatolian, Iranian, and Levantine Neolithic sites (12, 13). Interestingly, our finding of divergent goat genomes within the Neolithic echoes genetic investigation of early farmers. Northwestern Anatolian and Iranian human Neolithic genomes are also divergent (14–16), which suggests the sharing of techniques rather than large-scale migrations of populations across Southwest Asia in the period of early domestication. Several crop plants also show evidence of parallel domestication processes in the region (17).

PCA affinity (Fig. 2), supported by qpGraph and outgroup f3 analyses, suggests that modern European goats derive from a source close to the western Neolithic; Far Eastern goats derive from early eastern Neolithic domesticates; and African goats have a contribution from the Levant, but in this case with considerable admixture from the other sources (figs. S11, S16, and S17 and tables S26 and 27). The latter may be in part a result of admixture that is discernible in the same analyses extended to ancient genomes within the Fertile Crescent after the Neolithic (figs. S18 and S19 and tables S20, S27, and S31) when the spread of metallurgy and other developments likely resulted in an expansion of inter-regional trade networks and livestock movement.

Maximum-likelihood phylogeny and geographical distributions of ancient mtDNA haplogroups. (A) A phylogeny placing ancient whole mtDNA sequences in the context of known haplogroups. Symbols denoting individuals are colored by clade membership; shape indicates archaeological period (see key). Unlabeled nodes are modern bezoar and outgroup sequence (Nubian ibex) added for reference. We define haplogroup T as the sister branch to the West Caucasian tur (9). (B and C) Geographical distributions of haplogroups show early highly structured diversity in the Neolithic period (B) followed by collapse of structure in succeeding periods (C). We delineate the tiled maps at 7250 to 6950 BP, a period bracketing both our earliest Chalcolithic sequence (24, Mianroud) and latest Neolithic (6, Aşağı Pınar). Numbered archaeological sites also include Direkli Cave (8), Abu Ghosh (9), ‘Ain Ghazal (10), and Hovk-1 Cave (11) (table S1) (9).

Our results imply a domestication process carried out by humans in dispersed, divergent, but communicating communities across the Fertile Crescent who selected animals in early millennia, including for pigmentation, the most visible of domestic traits.


BMAC: long term interaction between agricultural communities and steppe pastoralists in Central Asia


Interesting new paper Mixing metaphors: sedentary-mobile interactions and local-global connections in prehistoric Turkmenistan, by Rouse & Cerasetti, Antiquity (2018) 92:674-689.

Relevant excerpts (emphasis mine):

The Murghab alluvial fan in southern Turkmenistan witnessed some of the earliest encounters between sedentary farmers and mobile pastoralists from different cultural spheres. During the late third and early second millennia BC, the Murghab was home to the Oxus civilisation and formed a central node in regional exchange networks (Possehl 2005; Kohl 2007). The Oxus civilisation (or the Bactria-Margiana Archaeological Complex) relied on intensive agriculture to support a hierarchical society and specialised craft production of metal and precious stone objects for prestige display and long-distance exchange (Sarianidi 1981; Hiebert 1994). By c. 1800 BC (the local Late Bronze Age), the internal coherence of the Oxus civilisation began to break down, along with the inter-regional exchange networks; the settlement structure of the Murghab shifted from a tiered system of urban centres, villages and hamlets, to a more dispersed pattern of smaller-scale agricultural settlements (Salvatori 2008). Contemporaneous evidence for small campsites (with a distinct ceramic tradition) suggests an influx of mobile pastoralists from the Central Eurasian Steppe and foothills (Cerasetti 1998; Masson 2002; Cattani et al. 2008). This striking combination of the sites and material cultures of both late Oxus farmers and ‘steppe’ pastoralists spans more than 500 years of Murghab prehistory (Salvatori 2008; Rouse & Cerasetti 2017).

The mixed farmer-pastoralist archaeological record of the Murghab has influenced competing interpretations of Later Bronze Age socio-political and economic relationships. Some scholars argue that the ‘collapse’ of the Oxus civilisation was at least partly due to the hostile incursions of nomads (Marushchenko 1956; Kuz’mina & Lyapin 1984; Vinogradova & Kuz’mina 1996). Others suggest that pastoralists took advantage of the Murghab’s crumbling power structure by moving into the area, but occupying only marginal, agriculturally unsuitable zones (P’yankova 1993), or merging with the late Oxus farming populations (Masson 2002). These models broadly follow ‘trade or raid’ paradigms of farmer-pastoralist interaction, whereby the perceived shortages of pastoralist communities force them to rely on agriculturalists for subsistence, material and cultural inputs (Kroeber 1947; Ferdinand 2003; Potts 2014). Such models may explain certain cases of Near Eastern pastoral economic specialisation, or historical contact scenarios between Eurasian steppe and agricultural communities on China’s northern frontier (Lattimore 1979; Barfield 2001; Alizadeh 2009; Khazanov 2009). Near Eastern and Eurasian interaction paradigms, however, fit increasingly poorly with the archaeological evidence for early farmer-pastoralist encounters in southern Central Asia.

We present data from four Murghab pastoralist campsites dating to the third to second millennia BC, restricting our discussion to the materials and practices employed by Oxus-period pastoralists to navigate shifting social, political and economic networks. Our aim is to highlight how variable strategies broadly identified under the rubric of ‘agropastoralism’ can be teased apart to recognise mechanisms of social boundary-making. Individually, these four sites present chronologically and locally distinct snapshots of farmer-pastoralist interactions across different realms of exchange (e.g. subsistence, technology and ideology); they provide examples of how pastoralists and farmers mutually participated in each other’s material and social norms. Together, these sites reveal how varied farmer-pastoralist engagement with technology and material culture did not lead inevitably to the assimilation of the two groups; rather, they worked consciously within existing systems of cultural practice to maintain distinct ‘farmer’ and ‘pastoralist’ identities, potentially over a 900-year period.

Region of Central Asia as discussed in this article. Areas traditionally identified with farming-dependent Oxus communities and non-Oxus mobile pastoralists are shown, acknowledging that in both areas mixed agropastoral practices have occurred in the past and present.


(…)First, the results indicate a cultural model of ‘being’ a pastoralist that was maintained actively over hundreds of years, in part by its material difference from that of local farmers. Second, the variability of materials, technologies and practices shared at these campsites suggests that no hegemonic power controlled trade relationships or regulated economic dependency between Oxus farmers and non-Oxus mobile pastoralists in the Murghab. Indeed, current data indicate that pastoralist occupation in the Murghab intensified during the waning of Oxus political centralisation, suggesting that the loosening of state-level structures provided the opportunity for intercultural interactions, rather than interactions being promoted or facilitated from the top. Finally, in the removal of broad-brush narratives that polarise ‘the steppe’ and ‘the sown’, and the integration of evidence suggesting that mobile pastoralists influenced the crop systems of farmers in southern Central Asia (Spengler et al. 2014b), these four sites allow us to recognise the means by which farmers and pastoralists re-shaped cultural institutions while reinforcing the meaningfulness of the associated social categories. Current work in the Murghab complements detailed studies of pastoralists in other Eurasian contexts (e.g. Frachetti 2008; Rogers 2012; Honeychurch 2015) in beginning to unravel simplistic notions of broad cross-cultural exchanges in Eurasian prehistory and the political entities traditionally seen as directing them.

The whole article is very interesting: the four sites studied, and their relevance for these interactions, are described in detail and in chronological order. If you have the opportunity, read it.

I found it interesting that the article mentions the traditional scholarly opposition of agriculturalists vs. pastoralists (‘civilised/barbarian’, ‘state/tribe’ and ‘centre/periphery’) as an idea of Eurasian origin, and having deep ‘Western’ roots. Reading what many OIT (or anti-AIT, as they like to call themselves) supporters write, it seems to me as though they have entirely accepted and in fact are eager to promote this ‘Western’ narrative from the mid-20th century…

Steppe MLBA

This is what Narasimhan et al. (2018) had to say about the BMAC – Steppe pastoralists interaction:

We document a southward spread of genetic ancestry from the Eurasian Steppe, correlating with the archaeologically known expansion of pastoralist sites from the Steppe to Turan in the Middle Bronze Age (2300-1500 BCE). These Steppe communities mixed genetically with peoples of the Bactria Margiana Archaeological Complex (BMAC) whom they encountered in Turan (primarily descendants of earlier agriculturalists of Iran), but there is no evidence that the main BMAC population contributed genetically to later South Asians. Instead, Steppe communities integrated farther south throughout the 2nd millennium BCE, and we show that they mixed with a more southern population that we document at multiple sites as outlier individuals exhibiting a distinctive mixture of ancestry related to Iranian agriculturalists and South Asian hunter-gatherers.

Narasimhan et al. (2018): “Modeling results. (A) Admixture events originating from 7 “Distal” populations leading to the formation of the modern Indian cloud shown geographically. Clines or 2-way mixtures of ancestry are shown in rectangles, and clouds (3-way mixtures) are shown in ellipses.”

(…) The absence in the BMAC cluster of the Steppe_EMBA ancestry that is ubiquitous in South Asia today—along with qpAdm analyses that rule out BMAC as a substantial source of ancestry in South Asia (Fig. 3A)—suggests that while the BMAC was affected by the same demographic forces that later impacted South Asia (the southward movement of Middle to Late Bronze Age Steppe pastoralists described in the next section), it was also bypassed by members of these groups who hardly mixed with BMAC people and instead mixed with peoples further south. In fact, the data suggest that instead of the main BMAC population having a demographic impact on South Asia, there was a larger effect of gene flow in the reverse direction, as the main BMAC genetic cluster is slightly different from the preceding Turan populations in harboring ~5% of their ancestry from the AASI.

(…) between 2100-1700 BCE, we observe BMAC outliers from three sites with Steppe_EMBA ancestry in the admixed form typically carried by the later Middle to Late Bronze Age Steppe groups (Steppe_MLBA). This documents a southward movement of Steppe ancestry through this region that only began to have a major impact around the turn of the 2nd millennium BCE.


Inca and Spanish Empires had a profound impact on Peruvian demography


Open access Evolutionary genomic dynamics of Peruvians before, during, and after the Inca Empire by Harris et al., PNAS (2018) 201720798 (published ahead of print).

Abstract (emphasis mine):

Native Americans from the Amazon, Andes, and coastal geographic regions of South America have a rich cultural heritage but are genetically understudied, therefore leading to gaps in our knowledge of their genomic architecture and demographic history. In this study, we sequence 150 genomes to high coverage combined with an additional 130 genotype array samples from Native American and mestizo populations in Peru. The majority of our samples possess greater than 90% Native American ancestry, which makes this the most extensive Native American sequencing project to date. Demographic modeling reveals that the peopling of Peru began ∼12,000 y ago, consistent with the hypothesis of the rapid peopling of the Americas and Peruvian archeological data. We find that the Native American populations possess distinct ancestral divisions, whereas the mestizo groups were admixtures of multiple Native American communities that occurred before and during the Inca Empire and Spanish rule. In addition, the mestizo communities also show Spanish introgression largely following Peruvian Independence, nearly 300 y after Spain conquered Peru. Further, we estimate migration events between Peruvian populations from all three geographic regions with the majority of between-region migration moving from the high Andes to the low-altitude Amazon and coast. As such, we present a detailed model of the evolutionary dynamics which impacted the genomes of modern-day Peruvians and a Native American ancestry dataset that will serve as a beneficial resource to addressing the underrepresentation of Native American ancestry in sequencing studies.

Admixture among Peruvian populations. (A) Colors represent contributions from donor populations into the genomes of Peruvian mestizo groups, as estimated by CHROMOPAINTER and GLOBETROTTER. The label within parentheses for each Peruvian Native American source population corresponds to their geographic region where Ama, And, and Coa represent Amazon, Andes, and coast, respectively. (B) Admixture time and proportion for the best fit three-way ancestry (AP, Trujillo and Lima) and two-way ancestry (Iquitos, Cusco, and Puno) TRACT models [European, African, and Native American (NatAm) ancestries] for six mestizo populations. (C) Network of individuals from Peruvian Native American and mestizo groups according to their shared IBD length. Each node is an individual and the length of an edge equals to (1/total shared IBD). IBD segments with different lengths are summed according to different thresholds representing different times in the past (52), with 7.8 cM, 9.3 cM, and 21.8 cM roughly representing the start of the Inca Empire, the Spanish conquest and occupation, and Peruvian independence. IBD networks are generated by Cytoscape (98) and only the major clusters in the network are shown for different cutoffs of segment length. AP, Central Am, and Matsig are short for Afroperuvians, Central American, and Matsiguenka, respectively. The header of each IBD network specifies the length of IBD segments used in each network.
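The network construction described in panel (C) of this caption can be sketched roughly as follows. This is only an illustration: I am assuming that each cutoff acts as a minimum segment length, which the caption does not spell out, and the function name, sample IDs and segment lengths are hypothetical. The only parts taken from the caption are the edge length of 1/(total shared IBD) and the cutoffs of 7.8, 9.3 and 21.8 cM.

```python
from collections import defaultdict


def ibd_edges(segments, min_cm):
    """Build network edges from IBD segments.

    segments: iterable of (id1, id2, length_cM).
    Returns {(idA, idB): edge_length}, where edge_length is
    1 / (total shared IBD in cM, counting segments >= min_cm).
    """
    totals = defaultdict(float)
    for id1, id2, length in segments:
        if length >= min_cm:  # assumed: the cutoff is a lower bound
            totals[tuple(sorted((id1, id2)))] += length
    return {pair: 1.0 / total for pair, total in totals.items()}


# Invented toy segments (NOT real data).
segments = [
    ("A", "B", 8.0),
    ("B", "A", 12.0),
    ("A", "C", 5.0),
    ("B", "C", 22.0),
]

edges_inca = ibd_edges(segments, 7.8)           # cutoff from the caption
edges_independence = ibd_edges(segments, 21.8)  # cutoff from the caption
print(edges_inca)
print(edges_independence)
```

Pairs sharing more IBD get shorter edges, so they end up closer together when the network is laid out.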

Interesting excerpts:

The high frequency of Native American mitochondrial haplotypes suggests that European males were the primary source of European admixture with Native Americans, as previously found (23, 24, 41, 42). The only Peruvian populations that have a proportion of the Central American component are in the Amazon (Fig. 2A). This is supported by Homburger et al. (4), who also found Central American admixture in other Amazonian populations and could represent ancient shared ancestry or a recent migration between Central America and the Amazon.

Following the peopling of Peru, we find a complex history of admixture between Native American populations from multiple geographic regions (Figs. 2B and 3 A and C). This likely began before the Inca Empire due to Native American and mestizo groups sharing IBD segments that correspond to the time before the Inca Empire. However, the Inca Empire likely influenced this pattern due to their policy of forced migrations, known as “mitma” (mitmay in Quechua) (28, 31, 37), which moved large numbers of individuals to incorporate them into the Inca Empire. We can clearly see the influence of the Inca through IBD sharing where the center of dominance in Peru is in the Andes during the Inca Empire (Fig. 3C).

ASPCA of combined Peruvian Genome Project with the HGDP genotyped on the Human Origins Array. A.) European ancestry. B.) African ancestry. Samples are filtered by their corresponding ancestral proportion: European ≥ 30% (panel A) and African ≥ 10% (panel B). The two plots in each panel are identical except for the color scheme: reference populations are colored on the left and Peruvian populations are colored on the right. Each point is one haplotype. In the African ASPCA we note three outliers among our samples, two from Trujillo and one from Iquitos, that cluster closer to the Luhya and Luo populations, though not directly. It is likely that these individuals share ancestry with other regions of Africa in addition to western Africa, but we cannot test this hypothesis explicitly as we have too few samples.

A similar policy of large-scale consolidation of multiple Native American populations was continued during Spanish rule through their program of reducciones, or reductions (31, 32), which is consistent with the hypothesis that the Inca and Spanish had a profound impact on Peruvian demography (25). The result of these movements of people created early New World cosmopolitan communities with genetic diversity from the Andes, Amazon, and coast regions as is evidenced by mestizo populations’ ancestry proportions (Fig. 3A). Following Peruvian independence, these cosmopolitan populations were those same ones that predominantly admixed with the Spanish (Fig. 3B). Therefore, this supports our model that the Inca Empire and Spanish colonial rule created these diverse populations as a result of admixture between multiple Native American ancestries, which would then go on to become the modern mestizo populations by admixing with the Spanish after Peruvian independence.

Further, it is interesting that this admixture began before the urbanization of Peru (26) because others suspected the urbanization process would greatly impact the ancestry patterns in these urban centers (25). (…)


Bantu distinguished from Khoe by uniparental markers, not genome-wide autosomal admixture


The role of matrilineality in shaping patterns of Y chromosome and mtDNA sequence variation in southwestern Angola, by Oliveira et al. bioRxiv (2018).

Interesting excerpts (emphasis mine):

The origins of NRY diversity in SW Angola

In accordance with our previous mtDNA study (9), the present NRY analysis reveals a major division between the Kx’a-speaking !Xun and the Bantu-speaking groups, whose paternal genetic ancestry does not display any old remnant lineages, or a clear link to pre-Bantu eastern African migrants introducing Khoe-Kwadi languages and pastoralism into southern Africa (cf. 15). This is especially evident in the distribution of the eastern African subhaplogroup E1b1b1b2b (29), which reaches the highest frequency in the !Xun (25%) and not in the formerly Kwadi-speaking Kwepe (7%). This observation, together with recent genome-wide estimates of 9-22% of eastern African ancestry in other Kx’a and Tuu-speaking groups (35), suggests that eastern African admixture was not restricted to present-day Khoe-Kwadi speakers. Alternatively, it is likely that the dispersal of pastoralism and Khoe-Kwadi languages involved a series of punctuated contacts that led to a wide variety of cultural, genetic and linguistic outcomes, including possible shifts to Khoe-Kwadi by originally Bantu-speaking peoples (36).

Although traces of an ancestral pre-Bantu population may yet be found in autosomal genome-wide studies, the extant variation in both uniparental markers strongly supports a scenario in which all groups of the Angolan Namib share most of their genetic ancestry with other Bantu groups but became increasingly differentiated within the highly stratified social context of SW African pastoral societies (11).

Y chromosome phylogeny, haplogroup distribution and map of the sampling locations. The phylogenetic tree was reconstructed in BEAST based on 2,379 SNPs and is in accordance with the known Y chromosome topology. Main haplogroup clades and their labels are shown with different colors. Age estimates are reported in italics near each node, with the TMRCA of main haplogroups shown with their corresponding color. A map of the sampling locations, re-used with permission from Oliveira et al. (2018) (9), is shown on the bottom left, and the haplogroup distribution per population is shown on the bottom right, with color-coding corresponding to the phylogenetic tree.

The influence of socio-cultural behaviors on the diversity of NRY and mtDNA

A comparison of the NRY variation with previous mtDNA results for the same groups (9) identifies three main sex-specific patterns. First, gene flow from the Bantu into the !Xun is much higher for male than for female lineages (31% NRY vs. 3% mtDNA), similar to the reported male-biased patterns of gene flow from Bantu to Khoisan-speaking groups (33), and from non-Pygmies to Pygmies in Central Africa (37). A comparable trend, involving exclusive introgression of NRY eastern African lineages into the !Xun (25%) was also found. (…)

Secondly, the levels of intrapopulation diversity in the Bantu-speaking peoples from the Namib were found to be consistently higher for mtDNA than for the NRY, reflecting the marked association between the Bantu expansion and the relatively young NRY E1b1a1a1 haplogroup, which has no parallel in mtDNA (25, 39). (…)

In the context of the Bantu expansions, these patterns have been mostly interpreted as the result of polygyny and/or higher levels of assimilation of females from resident forager communities (38, 40). However, most groups from the Angolan Namib are only mildly polygynous (11) and ethnographic data suggest that the actual rates of polygyny in many populations may be insufficient to significantly reduce Nem (2, 41). In addition, the finding of a large Nef/Nem ratio in the Himba (Fig. S5), who have almost no Khoisan-related mtDNA lineages (9), indicates that female biased introgression cannot fully explain the observed patterns.

An alternative explanation may be sought in the prevailing matrilineal descent rules, which might have created a sex-specific structuring effect, similar to that proposed for patrilineal groups from Central Asia (…)

Bayesian skyline plots (BSP) of effective population size change through time, based on mtDNA (red) and the NRY (black). Thick lines show the mean estimates and dashed lines show the 95% HPD intervals. The vertical line highlights the 2 ky before present mark. Effective sizes are plotted on a log scale. Generation times of 25 and 31 years were assumed for mtDNA and the NRY, respectively (32).

The third important sex-specific pattern observed in this study is the much lower amount of between-group differentiation for NRY than for mtDNA among Bantu-speaking populations (4.4% NRY vs. 20.2% mtDNA), in spite of the patrilocal residence patterns of all ethnic groups (Table S5). This difference can hardly be explained by unequal levels of introgression of “Khoisan” mtDNA lineages into the Bantu, since the percentage of mtDNA variation remains high (18.8%) when the Kuvale, who have high frequencies of “Khoisan”-related mtDNA, are excluded from the comparisons. It therefore seems more plausible that differentiation is higher in the mtDNA simply because there is more ancestral mtDNA than NRY variation that can be sorted among different populations (see 45). Moreover, due to the matriclanic organization of all Bantu-speaking communities, factors enhancing inter-group differentiation, like kin-structured migration and kin-structured founder effects (46), would have been restricted to mtDNA. Finally, it is also likely that the discrepancy between among-group divergence of mtDNA and NRY might have been influenced by higher migration rates in males than females. In fact, although all Bantu-speaking populations have patrilocal residence patterns, the observance of endogamy rules severely constrains the between-group mobility of females. In this context, the children from extramarital unions involving members from different populations tend to be raised in the mother’s group, effectively increasing male versus female migration rates. Moreover, it is likely that, in the highly hierarchized setting of the Namib, most intergroup extramarital unions would involve men from dominant groups and women from peripatetic communities. This hypothesis is indirectly supported by the finding that in NRY-based clusters (but not in mtDNA) pastoralist populations are grouped together with peripatetic communities that share their cultural traits (Figs. S6 and 3b), suggesting that migration of NRY lineages follows a path that is similar to horizontally transmitted cultural features.


Native American genetic continuity and oldest mtDNA hg A2ah in the Andean region

Native American gene continuity to the modern admixed population from the Colombian Andes: Implication for biomedical, population and forensic studies by Criollo-Rayo et al., Forensic Sci Int Genet (2018), in press, corrected proof.

Abstract (emphasis mine):

Andean populations have variable degrees of Native American and European ancestry, representing an opportunity to study admixture dynamics in the populations from Latin America (also known as Hispanics). We characterized the genetic structure of two indigenous (Nasa and Pijao) and three admixed (Ibagué, Ortega and Planadas) groups from Tolima, in the Colombian Andes. DNA samples from 348 individuals were genotyped for six mitochondrial DNA (mtDNA), seven non-recombining Y-chromosome (NRY) region and 100 autosomal ancestry informative markers. Nasa and Pijao had a predominant Native American ancestry at the autosomal (92%), maternal (97%) and paternal (70%) level. The admixed groups had a predominant Native American mtDNA ancestry (90%), a substantial frequency of European NRY haplotypes (72%) and similar autosomal contributions from Europeans (51%) and Amerindians (45%). Pijao and nearby Ortega were indistinguishable at the mtDNA and autosomal level, suggesting a genetic continuity between them. Comparisons with multiple Native American populations throughout the Americas revealed that Pijao had close similarities with Carib-speakers from distant parts of the continent, suggesting an ancient correlation between language and genes. In summary, our study aimed to understand Hispanic patterns of migration, settlement and admixture, supporting an extensive contribution of local Amerindian women to the gene pool of admixed groups and consistent with previous reports of European-male driven admixture in Colombia.

Ancestral uniparental haplogroups and diversity in Tolima. Geography of sampling locations. The top and middle sections show the frequency of Native American mtDNA haplogroups and NRY lineages for all populations. Gene diversity is shown below their respective pie chart. The lower part depicts the geography of the region where the sampling sites of Ortega and Pijao are closely located in Tolima’s Magdalena river valley and Ibague, Planadas and Nasa located in the Andes cordilleras (additional geographic details are shown in SF1).

Highlights from the paper:

  • MtDNA suggest a pre/post Columbian genetic continuity in the Colombian Andes.
  • Y-chromosome diversity follows a clinal gradient in the studied region.
  • Sex-biased/male-driven admixture process, involving Pijao women with European men.
  • Admixed closer to Indigenous resguardos have a higher Native American ancestry.

Also of note is the recent paper Mitochondrial lineage A2ah found in a pre‐Hispanic individual from the Andean region, by Russo et al., in American Journal of Human Biology (2018), which reports a sample from the Regional Developments II period (540 ± 60 BP).

Phylogeny of the A2ah mitochondrial lineage based on HVR I sequences. Both Maximum Parsimony and Maximum Likelihood reconstructions led to the same topology. The tree was rooted with the RSRS. Sample ID: Cueva: Pukara de La Cueva, STACRUZ: Santa Cruz, BNI: Beni, BR: South-eastern Brazil, TobaChA: Toba Gran Chaco.


Domesticated horse population structure, selection, and mtDNA geographic patterns


Open access Detecting the Population Structure and Scanning for Signatures of Selection in Horses (Equus caballus) From Whole-Genome Sequencing Data, by Zhang et al., Evolutionary Bioinformatics (2018) 14:1–9.

Abstract (emphasis mine):

Animal domestication gives rise to gradual changes at the genomic level through selection in populations. Selective sweeps have been traced in the genomes of many animal species, including humans, cattle, and dogs. However, little is known regarding positional candidate genes and genomic regions that exhibit signatures of selection in domestic horses. In addition, an understanding of the genetic processes underlying horse domestication, especially the origin of Chinese native populations, is still lacking. In our study, we generated whole genome sequences from 4 Chinese native horses and combined them with 48 publicly available full genome sequences, from which 15 341 213 high-quality unique single-nucleotide polymorphism variants were identified. Kazakh and Lichuan horses are 2 typical Asian native breeds that were formed in Kazakh or Northwest China and South China, respectively. We detected 1390 loss-of-function (LoF) variants in protein-coding genes, and gene ontology (GO) enrichment analysis revealed that some LoF-affected genes were overrepresented in GO terms related to the immune response. Bayesian clustering, distance analysis, and principal component analysis demonstrated that the population structure of these breeds largely reflected weak geographic patterns. Kazakh and Lichuan horses were assigned to the same lineage with other Asian native breeds, in agreement with previous studies on the genetic origin of Chinese domestic horses. We applied the composite likelihood ratio method to scan for genomic regions showing signals of recent selection in the horse genome. A total of 1052 genomic windows of 10 kB, corresponding to 933 distinct core regions, significantly exceeded neutral simulations. 
The GO enrichment analysis revealed that the genes under selective sweeps were overrepresented with GO terms, including “negative regulation of canonical Wnt signaling pathway,” “muscle contraction,” and “axon guidance.” Frequent exercise training in domestic horses may have resulted in changes in the expression of genes related to metabolism, muscle structure, and the nervous system.

Bayesian clustering output for 5 K values from K = 2 to K = 8 in 45 domestic horses. Each individual is represented by a vertical line, which is partitioned into colored segments that represent the proportion of the inferred K clusters.

Interesting excerpts:

Admixture proportions were assessed without user-defined population information to infer the presence of distinct populations among the samples (Figure 2). At K = 3 or K = 4, Franches-Montagnes and Arabian forms one unique cluster; at K = 5, Jeju pony forms one unique cluster. For other breeds, comparatively strong population structure exists among breeds, and they can be assigned to 2 (or 3) alternate clusters from K = 3 to K = 5 including group A (Duelmener, Fjord, Icelandic, Kazakh, Lichuan, and Mongolian) and group B (Hanoverian, Morgan, Quarter, Sorraia, and Standardbred). For group A, geographically this was unexpected, where Nordic breeds (Norwegian Fjord, Icelandic, and Duelmener) clustered with Asian breeds including the Mongolian. Previous results of mitochondrial DNA have revealed links between the Mongolian horse and breeds in Iceland, Scandinavia, Central Europe, and the British Isles. The Mongol horses are believed to have been originally imported from Russia subsequently became the basis for the Norwegian Fjord horse (31). At K = 6, Sorraia forms one unique cluster. The Sorraia horse has no long history as a domestic breed but is considered to be of a nearly ancestral type in the southern part of the Iberian Peninsula (32). However, our result did not support Sorraia as an independent ancestral type based on result from K = 2 to K = 5, and the unique cluster in K = 6 may be explained by the small population size and recently inbreeding programs. Genetic admixture of Morgan reveals that these breeds are currently or traditionally continually crossed with other breeds from K = 2 to K = 8. The Morgan horse has been a largely closed breed for 200 years or more but there has been some unreported crossbreeding in recent times (33).

Principal component analysis results of all 48 horses. The x-axis denotes the value of PC1, whereas the y-axis denotes the value of PC2. Each dot in the figure represents one individual.

Bayesian clustering and PCA demonstrated the relationships among the horse breeds with weak geographic patterns. The tight grouping within most native breeds and looser grouping of individuals in admixed breeds have been reported previously in modern horses using data from a 54K SNP chip (33, 34). Cluster analysis reveals that Arabian or Franches-Montagnes forms one unique cluster with relatively low K value, which is consistent with former study using 50K SNP chip (33, 34). Interestingly, Standardbred forms a unique cluster with relatively high K value in this study, different from previous study (33). To date, no footprints are available to describe how the earliest domestic horses spread into China in ancient times. Our study found that Kazakh and Lichuan were assigned to the same lineage as other native Asian breeds, in agreement with previous studies on the origin of Chinese domestic horses (4, 5, 35, 36). The strong genetic relationship between Asian native breeds and European native breeds have made it more difficult to understand the population history of the horse across Eurasia. Low levels of population differentiation observed between breeds might be explained by historical admixture. Unlike the domestic pig in China (8), we suggest that in China, Northern/Southern distinct groups could not be used to genetically distinct native Chinese horse breeds. We consider that during domestication process of horse, gene flow continued among Chinese-domesticated horses.

Open access Some maternal lineages of domestic horses may have origins in East Asia revealed with further evidence of mitochondrial genomes and HVR-1 sequences, by Ma et al., PeerJ (2018).


There are large populations of indigenous horse (Equus caballus) in China and some other parts of East Asia. However, their matrilineal genetic diversity and origin remained poorly understood. Using a combination of mitochondrial DNA (mtDNA) and hypervariable region (HVR-1) sequences, we aim to investigate the origin of matrilineal inheritance in these domestic horses.

To investigate patterns of matrilineal inheritance in domestic horses, we conducted a phylogenetic study using 31 de novo mtDNA genomes together with 317 others from the GenBank. In terms of the updated phylogeny, a total of 5,180 horse mitochondrial HVR-1 sequences were analyzed.

Eighteen haplogroups (Aw-Rw) were uncovered from the analysis of the whole mitochondrial genomes. Most of which have a divergence time before the earliest domestication of wild horses (about 5,800 years ago) and during the Upper Paleolithic (35–10 KYA). The distribution of some haplogroups shows geographic patterns. The Lw haplogroup contained a significantly higher proportion of European horses than the horses from other regions, while haplogroups Jw, Rw, and some maternal lineages of Cw, have a higher frequency in the horses from East Asia. The 5,180 sequences of horse mitochondrial HVR-1 form nine major haplogroups (A-I). We revealed a corresponding relationship between the haplotypes of HVR-1 and those of whole mitochondrial DNA sequences. The data of the HVR-1 sequences also suggests that Jw, Rw, and some haplotypes of Cw may have originated in East Asia while Lw probably formed in Europe.

Our study supports the hypothesis of the multiple origins of the maternal lineage of domestic horses and some maternal lineages of domestic horses may have originated from East Asia.

Median joining network constructed based on the 247-bp HVR-1 sequences. Circles are proportional to the number of horses represented and a scale indicator (for node sizes) was provided. The length of lines represents the number of variants that separate nodes (some manual adjustment was made for visually good). In the circles, the colors of solid pie slices indicate studied horse populations: Orange, European horses; Blue, horses of West Asia; Light Green, horses from East Asia; Grey, ancient horses; Purple, Przewalskii horses.

Geographic distributions of horse mtDNA haplogroups

The analysis of geographic distribution of the mitochondrial genome haplogroups showed that horse populations in Europe or East Asia included all haplogroups defined from the mtDNA genome sequences. The lineage Fw comprised entirely of Przewalskii horses. The two haplogroups Iw and Lw displayed frequency peaks in Europe (14.08% and 37.32%, respectively) and a decline to the east (9.33% and 8.00% in the West Asia, and 6.45% and 12.90% in East Asia, respectively), especially for Lw, which contained the largest number of European horses (Table 2). However, an opposite distribution pattern was observed for haplogroups Aw, Hw, Jw, and Rw, which were harbored by more horses from East Asia than those from other regions. The proportions of horses from East Asia for the four haplogroups were 38%, 88%, 62%, and 54%, respectively.

Schematic phylogeny of mtDNAs genome from modern horses. This tree includes 348 sequences and was rooted at a donkey (E. asinus) mitochondrial genome (not displayed). The topology was inferred by a BEAST approach, whereas a time divergence scale (based on rate substitutions) is shown on the bottom (age estimates were indicated with thousand years (KY)). The percentages on each branch represent Bayesian posterior credibility and the alphabets on the right represent the names of haplogroups. Additional details concerning ages were given in Tables S3 and S6.


Bayesian estimation of partial population continuity by using ancient DNA and spatially explicit simulations


Open access Bayesian estimation of partial population continuity by using ancient DNA and spatially explicit simulations, by Silva et al., Evolutionary Applications (2018).

Abstract (emphasis mine):

The retrieval of ancient DNA from osteological material provides direct evidence of human genetic diversity in the past. Ancient DNA samples are often used to investigate whether there was population continuity in the settlement history of an area. Methods based on the serial coalescent algorithm have been developed to test whether the population continuity hypothesis can be statistically rejected by analysing DNA samples from the same region but of different ages. Rejection of this hypothesis is indicative of a large genetic shift, possibly due to immigration occurring between two sampling times. However, this approach is only able to reject a model of full continuity model (a total absence of genetic input from outside), but admixture between local and immigrant populations may lead to partial continuity. We have recently developed a method to test for population continuity that explicitly considers the spatial and temporal dynamics of populations. Here we extended this approach to estimate the proportion of genetic continuity between two populations, by using ancient genetic samples. We applied our original approach to the question of the Neolithic transition in Central Europe. Our results confirmed the rejection of full continuity, but our approach represents an important step forward by estimating the relative contribution of immigrant farmers and of local hunter‐gatherers to the final Central European Neolithic genetic pool. Furthermore, we show that a substantial proportion of genes brought by the farmers in this region were assimilated from other hunter‐gatherer populations along the way from Anatolia, which was not detectable by previous continuity tests. Our approach is also able to jointly estimate demographic parameters, as we show here by finding both low density and low migration rate for pre‐Neolithic hunter‐gatherers. It provides a useful tool for the analysis of the numerous aDNA datasets that are currently being produced for many different species.

A) Different zones defined for computing proportions of ancestry in Central Europeans 4,500 BP. B) Schematic representation of various population contributions. C) Mean proportions of ancestry from the various PHG zones (A+B+C+D) in Central European populations from zone A at the end of the Neolithic transition 4,500 BP, computed for autosomal and mitochondrial markers.

Relevant excerpts:

Our results are in general accordance with two distinct ancestry components that have previously been detected at the continental scale by Lazaridis, Patterson et al. (2014): the “early European farmer” (EEF), which corresponds here to the NFA from Anatolia (zone C in Figure 3), and the “West European hunter-gatherer” (WHG), which corresponds here to the PHG from zones A and B in Figure 3. Notably, the contribution of an Ancient North Eurasians (ANE) component is not included in our model as we did not consider potential post-Neolithic immigration waves, which could have contributed to the modern European genetic pool, such as the wave that came from the Pontic steppes and was associated with the Yamnaya culture (Haak, Lazaridis et al. 2015). Without considering the ANE ancestry component, our estimate of the autosomal genetic contribution of Early farmers to the gene pool of Central European populations (25%) tends to be lower than the EEF ancestry estimated in most modern Western European populations, but is of the same order than the estimations in modern Estonians and in the ancient Late Neolithic genome “Karsdorf” from Germany (Lazaridis, Patterson et al. 2014, Haak, Lazaridis et al. 2015). Note that the contribution of hunter-gatherers to Neolithic communities appears to be variable in different regions of Europe (Skoglund, Malmstrom et al. 2012, Brandt, Haak et al. 2013, Lazaridis, Patterson et al. 2014), while we computed an average value for Central Europe. Moreover, we computed the ancestry of the two groups at the end of the Neolithic period while previous studies estimated it in modern times. Finally, previous studies used molecular information to directly estimate admixture proportions, while we use molecular information to estimate the model parameters and, then, we computed the expected genetic contributions of both groups using the best parameters, without using molecular information during this second step. 
Model assumptions may thus influence the inferences on the relative genetic contribution of both groups. In particular, we made the assumption of a uniform expansion of NFA with constant and similar assimilation of PHG over the whole continent but spatio-temporally heterogeneous environment, variable assimilation rate and long distance dispersal may have played an important role. The effects of those factors should be investigated in future studies.

Contrastive principal component analysis (cPCA) to explore patterns specific to a dataset

Interesting open access paper Exploring patterns enriched in a dataset with contrastive principal component analysis, by Abid, Zhang, Bagaria & Zou, Nature Communications (2018) 9:2134.

Abstract (emphasis mine):

Visualization and exploration of high-dimensional data is a ubiquitous challenge across disciplines. Widely used techniques such as principal component analysis (PCA) aim to identify dominant trends in one dataset. However, in many settings we have datasets collected under different conditions, e.g., a treatment and a control experiment, and we are interested in visualizing and exploring patterns that are specific to one dataset. This paper proposes a method, contrastive principal component analysis (cPCA), which identifies low-dimensional structures that are enriched in a dataset relative to comparison data. In a wide variety of experiments, we demonstrate that cPCA with a background dataset enables us to visualize dataset-specific patterns missed by PCA and other standard methods. We further provide a geometric interpretation of cPCA and strong mathematical guarantees. An implementation of cPCA is publicly available, and can be used for exploratory data analysis in many applications where PCA is currently used.

Schematic Overview of cPCA. To perform cPCA, compute the covariance matrices C_X, C_Y of the target and background datasets. The singular vectors of the weighted difference of the covariance matrices, C_X − α · C_Y, are the directions returned by cPCA. As shown in the scatter plot on the right, PCA (on the target data) identifies the direction that has the highest variance in the target data, while cPCA identifies the direction that has a higher variance in the target data as compared to the background data. Projecting the target data onto the latter direction gives patterns unique to the target data and often reveals structure that is missed by PCA. Specifically, in this example, reducing the dimensionality of the target data by cPCA would reveal two distinct clusters.

The Mexican example caught my attention:

Relationship between ancestral groups in Mexico

In previous examples, we have seen that cPCA allows the user to discover subclasses within a target dataset that are not labeled a priori. However, even when subclasses are known ahead of time, dimensionality reduction can be a useful way to visualize the relationship within groups. For example, PCA is often used to visualize the relationship between ethnic populations based on genetic variants, because projecting the genetic variants onto two dimensions often produces maps that offer striking visualizations of geographic and historic trends (26, 27). But again, PCA is limited to identifying the most dominant structure; when this represents universal or uninteresting variation, cPCA can be more effective at visualizing trends.

The dataset that we use for this example consists of single nucleotide polymorphisms (SNPs) from the genomes of individuals from five states in Mexico, collected in a previous study (28). Mexican ancestry is challenging to analyze using PCA since the PCs usually do not reflect geographic origin within Mexico; instead, they reflect the proportion of European/Native American heritage of each Mexican individual, which dominates and obscures differences due to geographic origin within Mexico (see Fig. 4a). To overcome this problem, population geneticists manually prune SNPs, removing those known to derive from Europeans ancestry, before applying PCA. However, this procedure is of limited applicability since it requires knowing the origin of the SNPs and that the source of background variation to be very different from the variation of interest, which are often not the case.

Relationship between Mexican ancestry groups. (a) PCA applied to genetic data from individuals from 5 Mexican states does not reveal any visually discernible patterns in the embedded data. (b) cPCA applied to the same dataset reveals patterns in the data: individuals from the same state are clustered closer together in the cPCA embedding. (c) Furthermore, the distribution of the points reveals relationships between the groups that matches the geographic location of the different states: for example, individuals from geographically adjacent states are adjacent in the embedding. Panel (c) is adapted from a map of Mexico that is originally the work of User:Allstrak at Wikipedia, published under a CC-BY-SA license.

As an alternative, we use cPCA with a background dataset that consists of individuals from Mexico and from Europe. This background is dominated by Native American/European variation, allowing us to isolate the intra-Mexican variation in the target dataset. The results of applying cPCA are shown in Fig. 4b. We find that individuals from the same state in Mexico are embedded closer together. Furthermore, the two groups that are the most divergent are the Sonorans and the Mayans from Yucatan, which are also the most geographically distant within Mexico, while Mexicans from the other three states are close to each other, both geographically as well as in the embedding captured by cPCA (see Fig. 4c). See also Supplementary Fig. 6 for more details.

So, by using a background dataset, cPCA uncovers patterns in a single target dataset that standard dimensionality reduction techniques miss. Maybe useful for some prehistoric populations, too…
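
To make the covariance step concrete, here is a minimal NumPy sketch of the idea described in the schematic overview above. It is not the authors' released implementation: the function name `cpca`, the synthetic three-column data and the choice of α = 2 are all my own assumptions for illustration, and I take the eigenvectors with the largest eigenvalues of C_X − α · C_Y as the contrastive directions.

```python
import numpy as np


def cpca(target, background, alpha=1.0, n_components=2):
    """Hypothetical helper: project `target` onto contrastive directions.

    Directions are the eigenvectors of C_X - alpha * C_Y with the largest
    eigenvalues, where C_X and C_Y are the covariance matrices of the
    target and background data (rows = samples, columns = features).
    """
    target = np.asarray(target, dtype=float)
    background = np.asarray(background, dtype=float)
    # Center the target so the projection is taken around its own mean
    target_centered = target - target.mean(axis=0)
    c_x = np.cov(target, rowvar=False)
    c_y = np.cov(background, rowvar=False)
    # The weighted difference is symmetric, so eigh is appropriate
    eigenvalues, eigenvectors = np.linalg.eigh(c_x - alpha * c_y)
    order = np.argsort(eigenvalues)[::-1][:n_components]
    directions = eigenvectors[:, order]
    return target_centered @ directions, directions


# Synthetic toy data (an assumption, not data from the paper):
# column 0 = large variance shared by target and background,
# column 1 = two hidden clusters present only in the target,
# column 2 = plain noise in both.
rng = np.random.default_rng(0)
n = 1000
labels = rng.integers(0, 2, size=n)
target = np.column_stack([
    rng.normal(0, 10, n),
    rng.normal(0, 1, n) + np.where(labels == 1, 3.0, -3.0),
    rng.normal(0, 1, n),
])
background = np.column_stack([
    rng.normal(0, 10, n),
    rng.normal(0, 1, n),
    rng.normal(0, 1, n),
])

embedding, directions = cpca(target, background, alpha=2.0, n_components=1)

# Ordinary PCA on the target alone, for comparison (top eigenvector of C_X)
pca_top = np.linalg.eigh(np.cov(target, rowvar=False))[1][:, -1]

print("top PCA direction: ", np.round(pca_top, 2))
print("top cPCA direction:", np.round(directions[:, 0], 2))
```

In this toy setup, plain PCA picks the high-variance first column that target and background share, while the contrastive direction points along the second column, where the two hidden clusters sit. The value of α is a judgment call, so in practice one would try several values and compare the embeddings.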

They have released a Python implementation of cPCA on GitHub, including Python notebooks and datasets.

Tales of Human Migration, Admixture, and Selection in Africa


Comprehensive review (behind paywall) Tales of Human Migration, Admixture, and Selection in Africa, by Carina M. Schlebusch & Mattias Jakobsson, Annual Review of Genomics and Human Genetics (2018), Vol. 19.

Abstract (emphasis mine):

In the last three decades, genetic studies have played an increasingly important role in exploring human history. They have helped to conclusively establish that anatomically modern humans first appeared in Africa roughly 250,000–350,000 years before present and subsequently migrated to other parts of the world. The history of humans in Africa is complex and includes demographic events that influenced patterns of genetic variation across the continent. Through genetic studies, it has become evident that deep African population history is captured by relationships among African hunter–gatherers, as the world’s deepest population divergences occur among these groups, and that the deepest population divergence dates to 300,000 years before present. However, the spread of pastoralism and agriculture in the last few thousand years has shaped the geographic distribution of present-day Africans and their genetic diversity. With today’s sequencing technologies, we can obtain full genome sequences from diverse sets of extant and prehistoric Africans. The coming years will contribute exciting new insights toward deciphering human evolutionary history in Africa.

Regarding potential Afroasiatic origins and expansions:

It is currently believed that farming practices in northeastern and eastern Africa developed independently in the Sahara/Sahel (around 7,000 BP) and the Ethiopian highlands (7,000–4,000 BP), while farming in the Nile River Valley developed as a consequence of the Neolithic Revolution in the Middle East (84). Northeastern and eastern African farmers today speak languages from the Afro-Asiatic and Nilo-Saharan linguistic groups, which is also reflected in their genetic affinities (Figure 3, K=6). In the northern parts of East Africa (South Sudan, Somalia, and Ethiopia), Nilo-Saharan and Afro-Asiatic speakers with farming lifeways have completely replaced hunter–gatherers. It is still largely unclear how farming and herding practices influenced the northeastern African prefarming population structure and whether the spread of farming is better explained by demic or cultural diffusion in this part of the world. Genetic studies of contemporary populations and aDNA have started to provide some insights into population continuity and incoming gene flow in this region of Africa.

Demographic model of African history and estimated divergences. (a) Population split times, hierarchy, and population sizes (summarized from 123). Horizontal width represents population size; horizontal colored lines represent migrations, with down-pointing triangles indicating admixture into another group. (b) Population structure analysis at 5 assumed ancestries (K=5) for 93 African and 6 non-African populations. Non-Africans (brown), East Africans (blue), West Africans (green), central African hunter–gatherers (light blue), and Khoe-San (red) populations are sorted according to their broad historical distributions.

For example, studies have shown that a back-migration from Eurasia into Africa affected most of northeastern and eastern Africa (36, 46, 53, 89, 132) (Figure 1b). A genetic baseline of eastern African ancestral genetic variation unaffected by recent Eurasian admixture and farming migrations within the last 4,500 years has been suggested in the form of the genome sequence of a 4,500-year-old individual from Mota, Ethiopia (36). Based on comparisons with the ancient Mota genome, we know that certain populations from northeastern Africa show deep continuity in their local area with very limited gene flow resulting from recent population movements. For example, the Nilotic herder populations from South Sudan (e.g., Dinka, Nuer, and Shilluk) appear to have remained relatively isolated over time and received little to no gene flow from Eurasians, West African Bantu-speaking farmers, and other surrounding groups (53) (Figures 2 and 3). By contrast, the Nubian and Arab populations to their north show gene flow with Eurasians, which has been connected to the Arab expansion (53). The Nubian, Arab, and Beja populations of northeastern Africa roughly display equal admixture fractions from a local northeastern African gene pool (similar to the Nilotic component) and an incoming Eurasian migrant component (53) (Figure 3). The Eurasian component has been linked to the Middle East and the Arab migration, but only the Arab groups shifted to the Semitic languages; the Nubians and Beja groups kept their original languages. The Eurasian gene flow appears to have spread from north to south along the Nile and Blue Nile in a succession of admixture events (53).

Skoglund and Mathieson’s preprint has also been published in the same volume, without meaningful changes.