The expansion of Indo-Europeans in Y-chromosome haplogroups

yamnaya-corded-ware-y-dna-haplogroups

I have been playing around a little more with GIS tools and haplogroups, and I managed to get some interesting outputs.

I made a video with a timeline of the evolution of Indo-European speakers, according to what is known today about reconstructed languages, prehistoric cultures and ancient DNA:

yamnaya-expansion

NOTE. The video is best viewed in HD 1080p (1920×1080) with a display that allows for this or greater video quality, and a screen big enough to see haplogroup symbols, i.e. tablet or greater. The YouTube link is here. The Facebook link is here.

Based on the results of the past 5 years or so, which have been confirming this combined picture every single time, I doubt there will be much need to change it in any radical way, as only minor details remain to be clarified.

Haplogroup maps

I wanted to publish a GIS tool of my own for everyone to have an updated reference of all data I use for my books.

The most complex GIS tools consume too many resources when used online in a client-server model, so I have to keep those to myself, but there are some ways to publish lower-quality outputs.

The files below can be zoomed in by several levels to reveal more samples, and each sample can be checked for more information: ID, attributed culture and label, archaeological site, source paper, subclade (and the people responsible for SNP inferences, if any), etc.

Some usage notes:

  • Files are large (ca. 20 MB), so they still take some time to load.
  • For the meaning of symbols and colors (for Y-DNA haplogroups), if there is any doubt, check the video above.
  • Pop-ups with sample information work on desktop browsers when you click on a symbol, but apparently not on smartphones and other touch-based OS. I have changed the settings to show pop-ups on hover, so they now work (to some extent) on touch devices too.
  • The search tool looks up specific samples by their official ID. It highlights the symbol of the selected individual (turning it into a bright blue dot) and moves the layer view to its location, but this seems to work well only with some browser and OS settings; in other browsers, you need to zoom out to see where the dot is located. Paradoxically, the sample and its information may disappear in search mode, so you might need to reload and look again for the site that was highlighted.
  • Latitude and longitude values have been randomly modified to avoid samples overcrowding specific sites, so they are not the original ones.
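
As an illustration of the last point, here is a minimal sketch of how coordinates can be randomly shifted so that samples from the same site do not overlap. It is not the actual procedure used for these files: the function name, the maximum offset of 0.05 degrees and the sample IDs are all hypothetical and chosen only for the example.

```python
import random


def jitter_coordinates(lat, lon, max_offset_deg=0.05, rng=None):
    """Return (lat, lon) shifted by a small uniform random offset.

    max_offset_deg is an assumed value for illustration only.
    """
    if rng is None:
        rng = random.Random()
    new_lat = lat + rng.uniform(-max_offset_deg, max_offset_deg)
    new_lon = lon + rng.uniform(-max_offset_deg, max_offset_deg)
    # Keep latitude inside [-90, 90] and wrap longitude into [-180, 180).
    new_lat = max(-90.0, min(90.0, new_lat))
    new_lon = (new_lon + 180.0) % 360.0 - 180.0
    return new_lat, new_lon


# Two hypothetical samples recorded at exactly the same site.
rng = random.Random(42)  # fixed seed so the output is reproducible
samples = [("sample_A", 48.5, 35.0), ("sample_B", 48.5, 35.0)]
jittered = [(sid, *jitter_coordinates(lat, lon, rng=rng)) for sid, lat, lon in samples]

for sid, lat, lon in jittered:
    print(f"{sid}: {lat:.4f}, {lon:.4f}")
```

With an offset this small, symbols stay close to the real site while no longer being drawn exactly on top of each other.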

Y-DNA

There are three versions:

  1. Labels with more specific subclades (including negative SNPs), using YTree for R1b samples (whenever it conflicts with YFull).
  2. Labels with YFull nomenclature.
  3. Symbols without labels (more symbols visible per layer).

y-dna-haplogroups

mtDNA

There are two versions:

  1. Symbols with labels.
  2. Symbols without labels.

NOTE. Because there are too many samples at the starting view, depending on the file you should zoom some levels to start seeing symbols.

mtdna-haplogroups

ADMIXTURE

I have tried running supervised ADMIXTURE models by selecting distant populations based on PCA and qpAdm results, but this seems to work well only for a small number of ancestral populations (K), and the results are easily improved by running it unsupervised.

Adding distant populations seems to improve or mess up the results in unpredictable ways, too, so at this point I doubt that ADMIXTURE (or anything other than qpAdm) is actually useful for obtaining anything precise in terms of ancestry evolution, although it can give a good overall idea of rough ancestry changes if K is kept small enough.

Anyway, I will keep trying to find a simple way to show the actual evolution and expansion of “Steppe ancestry”. Since every single run for thousands of samples takes days, I don’t really know if and when I will find something interesting to show…

A multidisciplinary approach to Neolithic life reconstruction

france-neolithic

Open access A Multidisciplinary Approach to Neolithic Life Reconstruction, by Goude et al. J Archaeol Method Th (2018).

Abstract (emphasis mine):

The expansion of Neolithic stable isotope studies in France now allows distinct regional population-scale food patterns to be linked to both local environment influences and specific economic choices. Carbon and nitrogen isotope values of more than 500 humans and of animal samples also permit hypotheses on sex-biased human provenance. To advance population scale research, we here present the first study that draws together carbon (C), nitrogen (N), sulphur (S) and strontium (Sr), dental calculus, aDNA, and palaeoparasitology analysis to infer intra-population patterns of diet and provenance in a Middle Neolithic population from Le Vigneau 2 (human = 40; fauna = 12; 4720–4350 cal. BC) from north-western France. The data of the different studies, such as palaeoparasitology to detect diet and hygiene, CNS isotopes and dental calculus analysis to examine dietary staples, Sr and S isotopes to discriminate non-locals, and aDNA to detect maternal (mtDNA) versus paternal lineages (Y chromosome), were compared to anthropological information of sex and age. Collagen isotope data suggest a similar diet for all individuals except for one child. The provenance isotopic studies suggest no clear differences between sexes, suggesting both males and females used the territory in a similar pattern and had access to foods from the same environments.

internal-external-burials
Radiogenic strontium isotope ratios from human teeth

Relevant excerpt:

With regard to aDNA analysis and the information this reveals on genetic provenance, Table 1 presents the mitochondrial haplogroups (SNPs typing) retrieved from the human remains. SNPs typing made it possible to assign one individual (LVH3, male < 60 years old) to maternal lineage K (or derivatives), and another individual (LVH12) to lineage H (or derivatives), whereas the low number of SNPs recovered for the last sample (LVH26) did not make it possible to assign any haplogroup. No Y chromosome SNP, as well as no reproducible result for HVR-I sequences, could be obtained for any Le Vigneau 2 individual. Unfortunately, major DNA degradation prevents precise identification of the maternal and paternal lineages, and these two mitochondrial haplogroups do not allow any assessment about female mobility. However, we can note that maternal lineages characterized in the Le Vigneau 2 site are quite common in Neolithic farmer groups and fit within the French Middle Neolithic variability (from 14 to 25.5% for haplogroup K and from 7.9 to 40.9% for haplogroup H; Beau et al. 2017), including farmers from the Paris Basin (35% of H and 18.33% of K for the Gurgy site; Rivollat et al. 2015).

FADS1 and the timing of human adaptation to agriculture

fads1-farmers

Open access FADS1 and the timing of human adaptation to agriculture, by Sara Mathieson & Iain Mathieson, bioRxiv (2018).

Abstract:

Variation at the FADS1/FADS2 gene cluster is functionally associated with differences in lipid metabolism and is often hypothesized to reflect adaptation to an agricultural diet. Here, we test the evidence for this relationship using both modern and ancient DNA data. We document pre-out-of-Africa selection for both the derived and ancestral FADS1 alleles and show that almost all the inhabitants of Europe carried the ancestral allele until the derived allele was introduced approximately 8,500 years ago by Early Neolithic farming populations. However, we also show that it was not under strong selection in these populations. Further, we find that this allele, and other proposed agricultural adaptations including variants at LCT/MCM6, SLC22A4 and NAT2, were not strongly selected until the Bronze Age, 2,000-4,000 years ago. Similarly, increased copy number variation at the salivary amylase gene AMY1 is not linked to the development of agriculture although in this case, the putative adaptation precedes the agricultural transition. Our analysis shows that selection at the FADS locus was not tightly linked to the development of agriculture. Further, it suggests that the strongest signals of recent human adaptation may not have been driven by the agricultural transition but by more recent changes in environment or by increased efficiency of selection due to increases in effective population size.

Interesting excerpt for the steppe-related expansion:

agricultural-adaptation-allele-frequency
Allele frequency trajectories for other putative agricultural adaptation variants. As in Figure 2C, estimated allele frequency trajectories and selection coefficients in different ancient European populations. Significant selection coefficients are labelled.

In the case of FADS1 and all the other examples we investigated, the proposed agricultural adaption was either not temporally linked with agriculture or showed no evidence of selection in agricultural populations. Instead, most of the variants with any evidence of selection were only strongly selected at some point between the Bronze Age and the present day, that is, in a period starting 2000-4000 BP and continuing until the present. This time period is one in which there is relatively limited ancient DNA data, and so we are unable to determine the timing of selection any more accurately. Future research should address the question of why this recent time period saw the most rapid changes in apparently diet associated genes. One plausible hypothesis is that the change in environment at this time was actually more dramatic than the earlier change associated with agriculture. Another is that effective population sizes were so small before this time that selection did not operate efficiently on variants with small selection coefficients. For example, analysis of present-day genomes from the United Kingdom suggests that effective population size increased by a factor of 100-1000 in the past 4500 years (Browning and Browning 2015). Ancient effective population sizes less than 10⁴ would suggest that those populations would not be able to efficiently select for variants with selection coefficients on the order of 10⁻⁴ or smaller. Larger ancient DNA datasets from the past 4,000 years will likely resolve this question.
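
The last argument in the excerpt (effective population sizes below 10⁴ versus selection coefficients around 10⁻⁴) can be made concrete with a small calculation. The sketch below is my own and not taken from the paper: it uses the common population-genetics rule of thumb that selection only overcomes genetic drift when the scaled coefficient 2·Ne·s is well above 1. The function name and the grid of values are chosen only for illustration.

```python
def scaled_selection(effective_size, selection_coefficient):
    """Return 2 * Ne * s, the scaled selection coefficient (diploid convention).

    Rule of thumb (an assumption of this sketch): values near or below 1 mean
    drift is at least as strong as selection; values much larger than 1 mean
    selection can act efficiently.
    """
    return 2.0 * effective_size * selection_coefficient


# Illustrative grid: small, intermediate and large effective population sizes.
for ne in (10**3, 10**4, 10**6):
    for s in (1e-4, 1e-2):
        print(f"Ne={ne:>8}, s={s:g}: 2*Ne*s = {scaled_selection(ne, s):g}")
```

For s on the order of 10⁻⁴, the product only rises clearly above 1 once Ne is well beyond 10⁴, which is the point the authors make about small ancient populations.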

This complexity of the reasons for selection reminded me of the comment by Narasimhan on lactase persistence expanding with steppe populations into Central Asia (based on data of the paper where he is the first author):

I always thought that to argue for natural selection in humans (viz. skin color, lactase persistence, etc.) was possible for archaic groups over tens of thousands of years, but that more recent selections would be very difficult to prove, in so far as historical population expansions involve more ‘artificial’ (i.e. man-made or man-caused) societal changes.

NOTE. I am probably more inclined to think about regional outbreaks (especially of new diseases) as one of the few potential short-term selection mechanisms in historical societies, because of their potential to create sudden bottlenecks of better fitted survivors.

I think recent works like these are showing a mixed situation, where maybe some traits were strongly selected for environmental reasons; but most of the time they were probably – like, say, Y-DNA haplogroup bottlenecks in Europe after the steppe-related expansions – due mostly to chance.

Phylogeny of leprosy, relevant for prehistoric Eurasian contacts

leprosy-medieval-europe

Some interesting studies were published at roughly the same time as Damgaard et al. (Nature 2018 and Science 2018), and that’s probably why they got little attention (at least by me).

Monica H. Green (also on Academia.edu), a specialist in the history of medicine, summed up their relevance quite well on Twitter (her text is edited here for clarity):

I’ve been disappointed that three recent exceptional studies of one of the world’s most historically important diseases, leprosy, have gotten so little notice from the science communication. It will take me a few hours to lay out their significance. But I think it’s important to do so.

So, here are the new studies on the historical distribution and evolutionary development of Mycobacterium leprae, one of two organisms that cause leprosy (fourth study dropped yesterday!).

  1. Phylogenomics and antimicrobial resistance of the leprosy bacillus Mycobacterium leprae, by Benjak et al., Nature Communications (2018) 9:352.

    Abstract:

    Leprosy is a chronic human disease caused by the yet-uncultured pathogen Mycobacterium leprae. Although readily curable with multidrug therapy (MDT), over 200,000 new cases are still reported annually. Here, we obtain M. leprae genome sequences from DNA extracted directly from patients’ skin biopsies using a customized protocol. Comparative and phylogenetic analysis of 154 genomes from 25 countries provides insight into evolution and antimicrobial resistance, uncovering lineages and phylogeographic trends, with the most ancestral strains linked to the Far East. In addition to known MDT-resistance mutations, we detect other mutations associated with antibiotic resistance, and retrace a potential stepwise emergence of extensive drug resistance in the pre-MDT era. Some of the previously undescribed mutations occur in genes that are apparently subject to positive selection, and two of these (ribD, fadD9) are restricted to drug-resistant strains. Finally, nonsense mutations in the nth excision repair gene are associated with greater sequence diversity and drug resistance.

  2. Ancient DNA study reveals HLA susceptibility locus for leprosy in medieval Europeans, by Krause-Kyora et al., Nature Communications (2018) 9:1569.

    NOTE. I referred to this study in this blog.

  3. Ancient genomes reveal a high diversity of Mycobacterium leprae in medieval Europe, by Schuenemann et al., PLOS Pathogens (2018).

    Abstract:

    Studying ancient DNA allows us to retrace the evolutionary history of human pathogens, such as Mycobacterium leprae, the main causative agent of leprosy. Leprosy is one of the oldest recorded and most stigmatizing diseases in human history. The disease was prevalent in Europe until the 16th century and is still endemic in many countries with over 200,000 new cases reported annually. Previous worldwide studies on modern and European medieval M. leprae genomes revealed that they cluster into several distinct branches of which two were present in medieval Northwestern Europe. In this study, we analyzed 10 new medieval M. leprae genomes including the so far oldest M. leprae genome from one of the earliest known cases of leprosy in the United Kingdom – a skeleton from the Great Chesterford cemetery with a calibrated age of 415–545 C.E. This dataset provides a genetic time transect of M. leprae diversity in Europe over the past 1500 years. We find M. leprae strains from four distinct branches to be present in the Early Medieval Period, and strains from three different branches were detected within a single cemetery from the High Medieval Period. Altogether these findings suggest a higher genetic diversity of M. leprae strains in medieval Europe at various time points than previously assumed. The resulting more complex picture of the past phylogeography of leprosy in Europe impacts current phylogeographical models of M. leprae dissemination. It suggests alternative models for the past spread of leprosy such as a wide spread prevalence of strains from different branches in Eurasia already in Antiquity or maybe even an origin in Western Eurasia. Furthermore, these results highlight how studying ancient M. leprae strains improves understanding the history of leprosy worldwide.

  4. The genome sequence of a SNP type 3K strain of Mycobacterium leprae isolated from a seventh‐century Hungarian case of lepromatous leprosy, by Mendum et al., International Journal of Osteoarchaeology (2018).

    Abstract:

    We report on a Mycobacterium leprae genome isolated from the remains of an individual with lepromatous leprosy that were excavated from a seventh‐century Hungarian cemetery. We determined that the genome was from a single nucleotide polymorphism (SNP) type 3K0 M. leprae strain, a lineage that diverged early from other M. leprae lineages. This is one of the earliest 3K0 M. leprae genomes to be sequenced to date. A number of novel SNPs as well as SNPs characteristic of the 3K0 lineage were confirmed by conventional polymerase chain reaction and Sanger sequencing. Recovery of accompanying human DNA from the burial was poor, particularly when compared with that of the pathogen. Modern 3K0 M. leprae strains have only been isolated from East Asia and the Pacific, and so these findings require new scenarios to describe the origins and routes of dissemination of leprosy during antiquity that have resulted in the modern phylogeographical distribution of M. leprae.

A fifth study can be added to the list, which, though not as extensive, is significant because it validates findings of others: Mycobacterium leprae genomes from naturally infected nonhuman primates, by Honap et al. PLOS Neglected Tropical Diseases (2018).

Abstract:

Leprosy is caused by the bacterial pathogens Mycobacterium leprae and Mycobacterium lepromatosis. Apart from humans, animals such as nine-banded armadillos in the Americas and red squirrels in the British Isles are naturally infected with M. leprae. Natural leprosy has also been reported in certain nonhuman primates, but it is not known whether these occurrences are due to incidental infections by human M. leprae strains or by M. leprae strains specific to nonhuman primates. In this study, complete M. leprae genomes from three naturally infected nonhuman primates (a chimpanzee from Sierra Leone, a sooty mangabey from West Africa, and a cynomolgus macaque from The Philippines) were sequenced. Phylogenetic analyses showed that the cynomolgus macaque M. leprae strain is most closely related to a human M. leprae strain from New Caledonia, whereas the chimpanzee and sooty mangabey M. leprae strains belong to a human M. leprae lineage commonly found in West Africa. Additionally, samples from ring-tailed lemurs from the Bezà Mahafaly Special Reserve, Madagascar, and chimpanzees from Ngogo, Kibale National Park, Uganda, were screened using quantitative PCR assays, to assess the prevalence of M. leprae in wild nonhuman primates. However, these samples did not show evidence of M. leprae infection. Overall, this study adds genomic data for nonhuman primate M. leprae strains to the existing M. leprae literature and finds that this pathogen can be transmitted from humans to nonhuman primates as well as between nonhuman primate species. While the prevalence of natural leprosy in nonhuman primates is likely low, nevertheless, future studies should continue to explore the prevalence of leprosy-causing pathogens in the wild.

These five studies are doing whole-genome sequencing on either modern isolates of M. leprae, or genomic fragments retrieved from buried remains (aDNA). The main objective of all the studies is to understand the diversity of M. leprae, both in terms of its history and in terms of its present-day distribution. (Benjak et al. 2018 are especially concerned to study possible reasons for variance in multiple drug resistance).

The following comments concern only leprosy’s history.

So, let’s start with a common claim of the science communication pieces on Schuenemann et al. 2018, which was published last week. A common formula: “New Study Suggests Leprosy Began To Spread From Europe To The World”. Is it plausible that Europe was where leprosy originated as a human disease?

The answer, actually, is no. There are two reasons for this, one having to do with chronology, the other with geography.

For chronology, these studies cumulatively suggest we are looking at a bottleneck. The Time to Most Recent Common Ancestor (TMRCA) currently suggested for M. leprae and its closest known “cousin,” M. lepromatosis (which also causes leprosy in humans), is ca. 13.9 million years. There were no humans around 13.9M ya, so we cannot have been M. leprae’s original host. All studies being discussed here agree on a consensus phylogeny, which puts the origin of all known strains of M. leprae at about 4-5K ya. So when we talk about the “origin” of M. leprae, we are only talking about those lineages formed after this bottleneck.

Next we have to look at geography. Let’s start with this statement from the most recent study, Mendum et al. 2018, which is discussing a genome sequenced from an individual in Hungary from the 7th c. CE: “Modern 3K0 M. leprae strains have only been isolated from East Asia and the Pacific and so these findings require new scenarios to describe the origins and routes of dissemination of leprosy during antiquity that have resulted in the modern phylogeographical distribution of M. leprae.”

Okay, so stop and consider the implications of this. We have someone from 7th c. Hungary with leprosy. The strain of M. leprae that he has is not most closely related to strains sequenced earlier in Denmark or Sweden or England (see Schuenemann et al. 2018, refs. 9, 20, & 21). Rather, the strain he has (3K0) is most closely related to modern strains currently documented on the Pacific Rim, a very, very long way from Hungary. Here are the summary reflections of Mendum et al. 2018:

The global distribution of 3K0 and 3K1 strains is today restricted to regions of the Western Pacific such as Japan (except Okinawa), Korea, China, The Philippines, New Caledonia and Indonesia amongst others (Kai et al, 2013; Avanzi et al, 2015; Monot et al, 2009; Weng et al, 2013; Honap et al, 2018). This could indicate that the 3K lineage originated in Northern or Eastern Asia. The presence of two type 3K cases (KD271 and 222) in early medieval Hungary would then suggest a route of dissemination from Asia to central Europe, perhaps via trade links or migrations. This would be consistent with what is known of the origins of the Pannonian Avars, who are believed to have reached the Hungarian plain from the Eurasian steppe in the late 6th to early 9th centuries (Curta, 2006). The other possibility is that Europe was a centre of dissemination of the ancestral 3K0 and related strains, some of which later became less common or even absent from Europe but persisted in East Asia and the Pacific. Determining the likelihood of each of these scenarios will require more sampling and characterisation of both ancient and modern strains.

Two things bear stressing:

  1. The lineage in which the Hungarian sample has been placed, Lineage 0, has now been documented in historical remains from Denmark, too. (Schuenemann et al. 2018) So whatever transmission routes are postulated to connect the Pacific Rim to Hungary, we will also need to postulate routes to connect the Pacific Rim to Denmark.
  2. Schuenemann et al. 2018 document four of the five known M. leprae lineages in medieval western Europe. (Mendum et al. 2018 now declare the existence of 6 lineages; see tree.)
leprosy-phylogenetic-tree
Phylogenetic relationships between selected modern (regular text) and ancient (bold text) M. leprae strains. The phylogeny was inferred by the Maximum Likelihood method of MEGA7 (Kumar et al, 2016) and the Tamura-3-Parameter model. The tree with the highest log likelihood value is shown. Bootstrap percentages from 1000 replicates are shown next to the branches. The scale indicates the number of substitutions per site. All positions with less than 90% site coverage were eliminated. M. lepromatosis was used as an outgroup (not shown). CM1 and Br15-1 are derived from a cynomolgus macaque and a red squirrel respectively.

Now, remember that we also need to keep chronology in mind: Lineage 0 is thought to have diverged from the common ancestor of Lineages 1-4 at least 3.5K ya. (Here’s the phylogenetic tree from Benjak et al. 2018, which I have marked with time divisions for emphasis.)

phylogeny-mleprae
Modified image by Monica H. Green. Phylogeny of M. leprae. Bayesian phylogenetic tree of 146 genomes of M. leprae calculated with BEAST 2.4.4. Hypermutated samples with mutations in the nth gene were excluded from the analysis. The tree is drawn to scale, with branch lengths representing years of age. Samples were binned according to geographic origin as given in the legend. Posterior probabilities for each node are shown in gray. Location probabilities of nodes were inferred by the Discrete Phylogeny model

So what we need to explain is how a strain (Lineage 0, or 3K0 as Mendum et al. 2018 call it) can be found all the way from Denmark to New Caledonia. An “Out of Europe” narrative isn’t really helpful, any more than the earlier “Out of Africa” narrative worked.

Given the extreme amount of suffering leprosy has caused, and continues to cause around the world, and given the extraordinary investigative power that paleogenetics has now developed, it’s really time that we did a better job pulling these global narratives together.

If you have Twitter, be sure to retweet this thread!

NOTE. Another (probably also interesting) article was published recently, Digging up the plague: A diachronic comparison of aDNA confirmed plague burials and associated burial customs in Germany, by Gutsmiedl-Schümann, Praehistorische Zeitschrift (2018) 92:2, but sadly my university does not have access to it.

Abstract:

Plague outbreaks in the past are mainly known from written sources; in particular, the Justinianic Plague of the Early Middle Ages and the Black Death of the Late Middle Ages have been described in vivid detail. Yet prior to the introduction of aDNA analysis, it was often quite difficult to associate burials with plague beyond doubt – especially in areas where written evidence of the plague is scarce. As analysis of ancient DNA now allows the detection of plague victims in the archaeological record, new ways are being developed for combining archaeological, historical and ancient DNA research. In this paper we would like to present and compare known examples of plague graves from the Early Middle Ages, the Late Middle Ages and the Thirty Years’ War in Germany that have also been confirmed by ancient DNA analyses. We would like to argue for a differentiated view of the burial customs, especially when more than one plague victim shared a grave, and would like to show possible conclusions, drawn from the aDNA-confirmed plague burials, that can indicate the different strategies adopted by ancient societies to deal with catastrophic events like a pandemic disease.

Ancient nomadic tribes of the Mongolian steppe dominated by a single paternal lineage

The genome of an ancient Rouran individual reveals an important paternal lineage in the Donghu population, by Li et al. Am J Phys Anthropol (2018), 1–11.

Abstract (emphasis mine):

Objectives
Following the Xiongnu and Xianbei, the Rouran Khaganate (Rouran) was the third great nomadic tribe on the Mongolian Steppe. However, few human remains from this tribe are available for archaeologists and geneticists to study, as traces of the tombs of these nomadic people have rarely been found. In 2014, the IA‐M1 remains (TL1) at the Khermen Tal site from the Rouran period were found by a Sino‐Mongolian joint archaeological team in Mongolia, providing precious material for research into the genetic imprint of the Rouran.

Materials and methods
The mtDNA hypervariable sequence I (HVS‐I) and Y‐chromosome SNPs were analyzed, and capture of the paternal non‐recombining region of the Y chromosome (NRY) and whole‐genome shotgun sequencing of TL1 were performed. The materials from three sites representing the three ancient nationalities (Donghu, Xianbei, and Shiwei) were selected for comparison with the TL1 individual.

Results
The mitochondrial haplotype of the TL1 individual was D4b1a2a1. The Y‐chromosome haplotype was C2b1a1b/F3830 (ISOGG 2015), which was the same as that of the other two ancient male nomadic samples (ZHS5 and GG3) related to the Xianbei and Shiwei, which were also detected as F3889; this haplotype was reported to be downstream of F3830 by Wei et al. (2017).

Discussion
We conclude that F3889 downstream of F3830 is an important paternal lineage of the ancient Donghu nomads. The Donghu‐Xianbei branch is expected to have made an important paternal genetic contribution to Rouran. This component of gene flow ultimately entered the gene pool of modern Mongolic‐ and Manchu‐speaking populations.

mongol-f3830-tree
The ancient males (TL1, ZHS5, and GG3) was grouped under C2b1a1b1/F3880 on the Y-DNA haplogroup C lineage using BEAST

Excerpt:

These results suggested that TL1 likely presents a close paternal relationship to the Donghu people and may have even descended from a branch of the ancient Donghu-Xianbei people, based on the conclusion that haplogroup C2b1a/F3918 can be considered the paternal branch of the ancient Donghu people (Zhang et al., 2018). The Y-chromosome phylogenetic tree showed that TL1 shared a branch with modern Mongolian-Buryats, Hezhen, Xibo, Yugur, and Kazakh, suggesting that the TL1 individual from the Rouran period should also generally present close paternal genetic relationships with modern Mongolic- and Manchu-speaking peoples.

In general, the Rouran Khaganate originated from an alliance of the ancient Eurasian steppe nomads, which disintegrated and disappeared with the progress of history. This group was complex, and its origin cannot be explained based only on one individual. However, we can trace the genetic imprint of the Rouran people through genome analysis of the TL1 individual. On the basis of the comparison with other ancient nomadic people (Donghu, Xianbei, and Shiwei) and data on modern individuals from published articles (Lippold et al., 2014; Wei et al., 2017) (Supporting Information S5), we found that they all share the same haplotype implying shared paternal ancestry between the Donghu, Xianbei and Rouran populations. Furthermore, this gene flow (mainly haplogroup C2b1a/F3918) did not stop with the disappearance of the Rouran, and a portion was instead passed on in other groups, such as the ancient Shiwei people (later than Rouran), eventually reaching the gene pool of modern Mongolic- and Manchu-speaking populations (Mongolian-Buryats, Hezhen, Xibo, et al).

Interesting to see now confirmed with ancient DNA the proposal of a C3*-DYS448del cluster as the paternal lineage defining ancient Mongolian tribes, a theory based on ancient and modern samples – since it is found in low frequency in almost all Mongolic- and Turkic-speaking populations.

This is yet another proof of how prehistoric ethnolinguistic expansions are usually accompanied by haplogroup expansion and reduction in variability.
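
To make "reduction in variability" measurable, one option is Nei's gene (haplotype) diversity, h = n/(n-1) · (1 - Σ pᵢ²), computed over the haplogroup labels of the sampled males. The sketch below is mine and is not taken from the paper; the labels "A" to "D" are placeholders, not real haplogroup assignments, and the two lists are a toy example rather than data.

```python
from collections import Counter


def haplogroup_diversity(haplogroups):
    """Nei's unbiased gene diversity for a list of haplogroup labels."""
    n = len(haplogroups)
    if n < 2:
        raise ValueError("at least two samples are needed")
    counts = Counter(haplogroups)
    sum_sq_freq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq_freq)


# Toy example with placeholder labels (not real data):
# an even mix of four lineages versus one lineage dominating.
before = ["A", "B", "C", "D", "A", "B", "C", "D"]
after = ["A"] * 7 + ["B"]

print(f"before expansion: h = {haplogroup_diversity(before):.3f}")
print(f"after expansion:  h = {haplogroup_diversity(after):.3f}")
```

A population dominated by a single paternal lineage gives a value close to 0, while an even mix of several lineages gives a value close to 1.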

I wonder what other ancient chiefdom-type steppe-based nomadic groups were also dominated by a single paternal lineage.

The uneasy relationship between Archaeology and Ancient Genomics

Allentoft Corded Ware

News feature Divided by DNA: The uneasy relationship between archaeology and ancient genomics, Two fields in the midst of a technological revolution are struggling to reconcile their views of the past, by Ewen Callaway, Nature (2018) 555:573-576.

Interesting excerpts (emphasis mine):

In duelling 2015 Nature papers, the teams arrived at broadly similar conclusions: an influx of herders from the grassland steppes of present-day Russia and Ukraine – linked to Yamnaya cultural artefacts and practices such as pit burial mounds – had replaced much of the gene pool of central and Western Europe around 4,500–5,000 years ago. This was coincident with the disappearance of Neolithic pottery, burial styles and other cultural expressions and the emergence of Corded Ware cultural artefacts, which are distributed throughout northern and central Europe. “These results were a shock to the archaeological community,” Kristiansen says.

(…)

Still, not everyone was satisfied. In an essay titled ‘Kossinna’s Smile’, archaeologist Volker Heyd at the University of Bristol, UK, disagreed, not with the conclusion that people moved west from the steppe, but with how their genetic signatures were conflated with complex cultural expressions. Corded Ware and Yamnaya burials are more different than they are similar, and there is evidence of cultural exchange, at least, between the Russian steppe and regions west that predate Yamnaya culture, he says. None of these facts negates the conclusions of the genetics papers, but they underscore the insufficiency of the articles in addressing the questions that archaeologists are interested in, he argued. “While I have no doubt they are basically right, it is the complexity of the past that is not reflected,” Heyd wrote, before issuing a call to arms. “Instead of letting geneticists determine the agenda and set the message, we should teach them about complexity in past human actions.”

Many archaeologists are also trying to understand and engage with the inconvenient findings from genetics. (…)
[Carlin:] “I would characterize a lot of these papers as ‘map and describe’. They’re looking at the movement of genetic signatures, but in terms of how or why that’s happening, those things aren’t being explored,” says Carlin, who is no longer disturbed by the disconnect. “I am increasingly reconciling myself to the view that archaeology and ancient DNA are telling different stories.” The changes in cultural and social practices that he studies might coincide with the population shifts that Reich and his team are uncovering, but they don’t necessarily have to. And such biological insights will never fully explain the human experiences captured in the archaeological record.

Reich agrees that his field is in a “map-making phase”, and that genetics is only sketching out the rough contours of the past. Sweeping conclusions, such as those put forth in the 2015 steppe migration papers, will give way to regionally focused studies with more subtlety.

This is already starting to happen. Although the Bell Beaker study found a profound shift in the genetic make-up of Britain, it rejected the notion that the cultural phenomenon was associated with a single population. In Iberia, individuals buried with Bell Beaker goods were closely related to earlier local populations and shared little ancestry with Beaker-associated individuals from northern Europe (who were related to steppe groups such as the Yamnaya). The pots did the moving, not the people.

This final paragraph apparently sums up a view that Reich has of this field, since he repeats it:

Reich concedes that his field hasn’t always handled the past with the nuance or accuracy that archaeologists and historians would like. But he hopes they will eventually be swayed by the insights his field can bring. “We’re barbarians coming late to the study of the human past,” Reich says. “But it’s dangerous to ignore barbarians.”

I would say that the true barbarians had neither the habit nor the opportunity to learn from the higher civilizations they attacked or invaded. Geneticists, on the other hand, only have to do what they expect archaeologists to do: study.

EDIT (30 MAR 2018): An interesting new editorial in Nature, On the use and abuse of ancient DNA.

David Reich on the influence of ancient DNA on Archaeology and Linguistics

An interesting interview has appeared in The Atlantic, Ancient DNA Is Rewriting Human (and Neanderthal) History, on the occasion of the publication of David Reich’s book Who We Are and How We Got Here: Ancient DNA and the New Science of the Human Past.

Some interesting excerpts (I have emphasized some of Reich’s words):

On the efficiency of the Reich Lab

Zhang: How much does it cost to process an ancient DNA sample right now?

Reich: In our hands, a successful sample costs less than $200. That’s only two or three times more than processing them on a present-day person. And maybe about one-third to one half of the samples we screen are successful at this point.

This is probably the most controversial assessment for the Twitterverse, since it puts the Reich Lab at the top of the publishing chain, but I don’t find this fact controversial at all.

Anyone interested in doing genetic studies has free datasets, papers, and bioinformatic tools at hand – thanks to his lab, mostly – to develop new methods and publish papers. Such secondary works probably won’t be published in journals with the highest impact factor, but what can you do, welcome to the scientific world…

Also, by the looks of it, every single researcher involved in recovering an archaeological sample is included as co-author of the papers, so there is a clear benefit for ‘local’ researchers collaborating with the Lab. Therefore, these researchers and their institutions are responsible for whatever unfair situation might be created by their exchange.

On Archaeology’s reaction to Kossinna and Nazi ideas

(…)
Zhang: You actually had German collaborators drop out of a study because of these exact concerns, right? One of them wrote, “We must(!) avoid … being compared with the so-called ‘siedlungsarchäologie Method’ from Gustaf Kossinna!”

Reich: Yeah, that’s right. I think one of the things the ancient DNA is showing is actually the Corded Ware culture does correspond coherently to a group of people. I think that was a very sensitive issue to some of our coauthors, and one of the coauthors resigned because he felt we were returning to that idea of migration in archaeology that pots are the same as people. There have been a fair number of other coauthors from different parts of continental Europe who shared this anxiety.

We responded to this by adding a lot of content to our papers to discuss these issues and contextualize them. Our results are actually almost diametrically opposite from what Kossinna thought because these Corded Ware people come from the East, a place that Kossinna would have despised as a source for them. But nevertheless it is true that there’s big population movements, and so I think what the DNA is doing is it’s forcing the hand of this discussion in archaeology, showing that in fact, major movements of people do occur. They are sometimes sharp and dramatic, and they involve large-scale population replacements over a relatively short period of time. We now can see that for the first time.

What the genetics is finding is often outside the range of what the archaeologists are discussing these days.

This is mostly true: Genomics offers a whole new dimension to assess exchanges among groups, and thus helps select anthropological models of cultural diffusion. It offers another way of interpreting prehistoric cultural evolution and change, including the investigation of potential languages of these cultures, ways of change and replacement, etc.

Also, he acknowledges that there is a lot of content added to the papers in search of context – and thus to avoid simplistic assumptions and conclusions – so this is a reasonable way to look at the (often erroneous) cultural and linguistic context which accompanies most genetic papers, and even the new methods being developed to assess samples.

On the other hand, the fact that many in Archaeology didn’t want to discuss migrations does not mean that it was not discussed at all, as he seems to suggest.

On how Genomics fits with traditional disciplines

Zhang: I think at one point in your book you actually describe ancient DNA researchers as the “barbarians” at the gates of the study of history.

Reich: Yeah.

Zhang: Does it feel that way? Have you gotten into arguments with archaeologists over your findings?

Reich: I think archaeologists and linguists find it frustrating that we’re not trained in the language of archaeology and all these sensitivities like about Kossinna. Yet we have this really powerful tool which is this way of looking at things nobody has been able to look at before.

The point I was trying to make there was that even if we’re not always able to articulate the context of our findings very well, this is very new information, and a serious scholar really needs to take this on board. It’s dangerous. Barbarians may not talk in an educated and learned way but they have access to weapons and ways of looking at things that other people haven’t looked to. And time and again we’ve learned in the past that ignoring barbarians is a dangerous thing to do.

I think this is also mostly true: many academics find it frustrating to read these papers, most of which lack a minimal understanding of the topics being discussed.

For example, you can’t pretend to derive meaningful conclusions about Proto-Indo-Europeans knowing nothing about their language and the potential cultures associated with them (and why they were associated with them in the first place)…

I also agree with him that the study of ancient DNA is a very powerful tool. Everyone involved in Anthropology and Archaeology should be trained these days in Genomics – or, at least, they should have the opportunity to do so.

On the dangers of Genomics

Reich: (…) I know there are extremists who are interested in genealogy and genetics. But I think those are very marginal people, and there’s, of course, a concern they may impinge on the mainstream.

But if you actually take any serious look at this data, it just confounds every stereotype. It’s revealing that the differences among populations we see today are actually only a few thousand years old at most and that everybody is mixed. I think that if you pay any attention to this world, and have any degree of seriousness, then you can’t come out feeling affirmed in the racist view of the world. You have to be more open to immigration. You have to be more open to the mixing of different peoples. That’s your own history.

I guess David Reich does not frequent forums on human genetics linked to ethnolinguistic identification, or he would not think of ‘extremists’ as marginal people. Or else we have a different view of what defines an ‘extremist’…

Conclusion

I did not have the best of opinions about David Reich – or any other geneticist involved in publishing anthropological theories, for that matter. I have always had great respect for their scientific work, though.

If anything, this article shows that he knows his own (and his fellow geneticists’) limitations, and the dangers and limitations of Genomics as a whole, so I have more respect for him – and anyone involved with his Lab’s work – after reading this piece.

I would sum up his interview with his humbling sentence:

We should think we really don’t know what we’re talking about.

NOTE. Also on the occasion of the publication of his book, Nature has published the piece Sex, power and ancient DNA – Turi King hails David Reich’s thrilling account of mapping humans through time and place.

After buying Lalueza-Fox’s recent book ‘La forja genètica d’Europa’, I don’t really feel like buying another book on Genomics and migrations from a geneticist. If you have read Reich’s book, please share your impressions.

EDIT (19 MAR 2018): Razib Khan has written a ‘preview of a review’ that he intends to publish in the National Review, and it seems the book might be worth it, after all.

EDIT (20 MAR 2018): The New York Times’ Carl Zimmer writes a review, David Reich Unearths Human History Etched in Bone. Seen first in Razib Khan’s Gene Expression blog.

Y-DNA relevant in the postgenomic era, mtDNA study of Iron Age Italic population, and reconstructing the genetic history of Italians

An open access issue of Annals of Human Biology (2018), Volume 45, Issue 1, has been published with the title Human population genetics of the Mediterranean.

Among the most interesting articles (emphasis mine):

Iron Age Italic population genetics: the Piceni from Novilara (8th–7th century BC), by Serventi, Panicucci, Bodega, et al.

Background: Archaeological data provide evidence that Italy, during the Iron Age, witnessed the appearance of the first communities with well defined cultural identities. To date, only a few studies report genetic data about these populations and, in particular, the Piceni have never been analysed.

Aims: To provide new data about mitochondrial DNA (mtDNA) variability of an Iron Age Italic population, to understand the contribution of the Piceni in shaping the modern Italian gene pool and to ascertain the kinship between some individuals buried in the same grave within the Novilara necropolis.

Subjects and methods: In a first set of 10 individuals from Novilara, we performed deep sequencing of the HVS-I region of the mtDNA, combined with the genotyping of 22 SNPs in the coding region and the analysis of several autosomal markers.

Results: The results show a low nucleotide diversity for the inhabitants of Novilara and highlight a genetic affinity of this ancient population with the current inhabitants of central Italy. No family relationship was observed between the individuals analysed here.

Conclusions: This study provides a preliminary characterisation of the mtDNA variability of the Piceni of Novilara, as well as a kinship assessment of two peculiar burials.


Reconstructing the genetic history of Italians: new insights from a male (Y-chromosome) perspective, by Grugni, Raveani, Mattioli, et al.

Background: Due to its central and strategic position in Europe and in the Mediterranean Basin, the Italian Peninsula played a pivotal role in the first peopling of the European continent and has been a crossroad of peoples and cultures since then.

Aim: This study aims to gain more information on the genetic structure of modern Italian populations and to shed light on the migration/expansion events that led to their formation.

Subjects and methods: High resolution Y-chromosome variation analysis in 817 unrelated males from 10 informative areas of Italy was performed. Haplogroup frequencies and microsatellite haplotypes were used, together with available data from the literature, to evaluate Mediterranean and European inputs and date their arrivals.

Results: Fifty-three distinct Y-chromosome lineages were identified. Their distribution is in general agreement with geography, southern populations being more differentiated than northern ones.

Conclusions: A complex genetic structure reflecting the multifaceted peopling pattern of the Peninsula emerged: southern populations show high similarity with those from the Middle East and Southern Balkans, while those from Northern Italy are close to populations of North-Western Europe and the Northern Balkans. Interestingly, the population of Volterra, an ancient town of Etruscan origin in Tuscany, displays a unique Y-chromosomal genetic structure.

Frequencies of the main Y-chromosome haplogroups E1b, J2 and R1b and their sub-clades in the 10 analysed Italian population samples. Black sectors in the primary pies are proportional to the frequency of the main haplogroup in each population. Coloured sectors in the secondary pies are proportional to the frequencies of sub-haplogroups within the relative main haplogroup.

Mitochondrial variability in the Mediterranean area: a complex stage for human migrations, by De Angelis, Scorrano, Martínez-Labarga, et al.

Context: The Mediterranean area has always played a significant role in human dispersal due to the large number of migratory events contributing to shape the cultural features and the genetic pool of its populations.

Objective: This paper aims to review and diachronically describe the mitogenome variability in the Mediterranean population and the main demic diffusions that occurred in this area over time.

Methods: Frequency distributions of the leading mitochondrial haplogroups have been geographically and chronologically evaluated. The variability of U5b and K lineages has been focussed to broaden the knowledge of their genetic histories.

Results: The mitochondrial genetic makeup of Palaeolithic hunter-gatherers is poorly defined within the extant Mediterranean populations, since only a few traces of their genetic contribution are still detectable. The Neolithic lineages are more represented, suggesting that the Neolithic revolution had a marked effect on the peopling of the Mediterranean area. The largest effect, however, was provided by historical migrations.

Conclusion: Although the mitogenome variability has been widely used to try and clarify the evolution of the Mediterranean genetic makeup throughout almost 50 000 years, it is necessary to collect whole genome data on both extinct and extant populations from this area to fully reconstruct and interpret the impact of multiple migratory waves and their cultural and genetic consequences on the structure of the Mediterranean populations.

Major migratory routes with the associated mtDNA haplogroups for the Upper Palaeolithic (solid lines) and the Neolithic (dashed lines) chronologies. Other hypothetical migratory routes are presented with dotted lines (see text for more details).

Mediterranean Y-chromosome 2.0—why the Y in the Mediterranean is still relevant in the postgenomic era, by Larmuseau & Ottoni.

Context: Due to its unique paternal inheritance, the Y-chromosome has been a highly popular marker among population geneticists for over two decades. Recently, the advent of cost-effective genome-wide methods has unlocked information-rich autosomal genomic data, paving the way to the postgenomic era. This seems to have announced the decreasing popularity of investigating Y-chromosome variation, which provides only the paternal perspective of human ancestries and is strongly influenced by genetic drift and social behaviour.

Objective: For this special issue on population genetics of the Mediterranean, the aim was to demonstrate that the Y-chromosome still provides important insights in the postgenomic era and in a time when ancient genomes are becoming exponentially available.

Methods: A systematic literature search on Y-chromosomal studies in the Mediterranean was performed.

Results: Several applications of Y-chromosomal analysis with future opportunities are formulated and illustrated with studies on Mediterranean populations.

Conclusions: There will be no reduced interest in Y-chromosomal studies going from reconstruction of male-specific demographic events to ancient DNA applications, surname history and population-wide estimations of extra-pair paternity rates. Moreover, more initiatives are required to collect population genetic data of Y-chromosomal markers for forensic research, and to include Y-chromosomal data in GWAS investigations and studies on male infertility.

Two-dimensional plot of the PCA of Y-chromosomal haplogroup frequencies of modern populations from Europe, the Near and Middle East and North Africa. Symbols are as in the legend. The inset shows the plot of factor coordinates of the variables used.

We are clearly seeing in the latest genomic papers that Y-DNA is indeed extremely important for assessing ancient population movements.

See also: