Modern Sardinians show elevated Neolithic farmer ancestry shared with Basques


New paper (behind paywall), Genomic history of the Sardinian population, by Chiang et al. Nature Genetics (2018), previously published as a preprint at bioRxiv (2016).

EDIT (18 Sep 2018): Link to read paper for free shared by the main author.

Interesting excerpts (emphasis mine):

Our analysis of divergence times suggests the population lineage ancestral to modern-day Sardinia was effectively isolated from the mainland European populations ~140–250 generations ago, corresponding to ~4,300–7,000 years ago assuming a generation time of 30 years and a mutation rate of 1.25 × 10⁻⁸ per basepair per generation. (…) in terms of relative values, the divergence time between Northern and Southern Europeans is much more recent than either is to Sardinia, signaling the relative isolation of Sardinia from mainland Europe.

We documented fine-scale variation in the ancient population ancestry proportions across the island. The most remote and interior areas of Sardinia—the Gennargentu massif covering the central and eastern regions, including the present-day province of Ogliastra— are thought to have been the least exposed to contact with outside populations. We found that pre-Neolithic hunter-gatherer and Neolithic farmer ancestries are enriched in this region of isolation. Under the premise that Ogliastra has been more buffered from recent immigration to the island, one interpretation of the result is that the early populations of Sardinia were an admixture of the two ancestries, rather than the pre-Neolithic ancestry arriving via later migrations from the mainland. Such admixture could have occurred principally on the island or on the mainland before the hypothesized Neolithic era influx to the island. Under the alternative premise that Ogliastra is simply a highly isolated region that has differentiated within Sardinia due to genetic drift, the result would be interpreted as genetic drift leading to a structured pattern of pre-Neolithic ancestry across the island, in an overall background of high Neolithic ancestry.

PCA results of merged Sardinian whole-genome sequences and the HGDP Sardinians. See below for a map of the corresponding regions.

We found Sardinians show a signal of shared ancestry with the Basque in terms of the outgroup f3 shared-drift statistics. This is consistent with long-held arguments of a connection between the two populations, including claims of Basque-like, non-Indo-European words among Sardinian placenames. More recently, the Basque have been shown to be enriched for Neolithic farmer ancestry and Indo-European languages have been associated with steppe population expansions in the post-Neolithic Bronze Age. These results support a model in which Sardinians and the Basque may both retain a legacy of pre-Indo-European Neolithic ancestry. To be cautious, while it seems unlikely, we cannot exclude that the genetic similarity between the Basque and Sardinians is due to an unsampled pre-Neolithic population that has affinities with the Neolithic representatives analyzed here.
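For readers unfamiliar with the statistic, here is a minimal sketch of how an outgroup f3 value is obtained from allele frequencies: the average over SNPs of (a − o)(b − o), where o, a and b are the frequencies in the outgroup and the two test populations. The frequencies below are invented toy values, and this simplified version skips the finite-sample corrections and standard errors that real tools apply, so it is an illustration rather than the procedure used in the paper.

```python
def outgroup_f3(freq_out, freq_a, freq_b):
    """Average of (a - o) * (b - o) across SNPs.

    Larger values mean populations A and B share more drift
    since their split from the outgroup.
    """
    products = [(a - o) * (b - o) for o, a, b in zip(freq_out, freq_a, freq_b)]
    return sum(products) / len(products)


# Hypothetical allele frequencies at four SNPs (toy values, not real data).
outgroup = [0.0, 0.0, 1.0, 0.0]
pop_a = [0.5, 0.2, 0.5, 0.0]
pop_b = [0.5, 0.2, 0.5, 0.1]  # drifted together with pop_a
pop_c = [0.0, 0.1, 1.0, 0.6]  # mostly drifted on its own

f3_ab = outgroup_f3(outgroup, pop_a, pop_b)
f3_ac = outgroup_f3(outgroup, pop_a, pop_c)
print(f3_ab, f3_ac)
```

In this toy case the A–B value is much larger than the A–C value, which is the kind of contrast behind a statement like "Sardinians show shared ancestry with the Basque".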

Left: Geographical map of Sardinia. The provincial boundaries are given as black lines. The provinces are abbreviated as Cag (Cagliari), Cmp (Campidano), Car (Carbonia), Ori (Oristano), Sas (Sassari), Olb (Olbia-tempio), Nuo (Nuoro), and Ogl (Ogliastra). For sampled villages within Ogliastra, the names and abbreviations are indicated in the colored boxes. The color corresponds to the color used in the PCA plot (Fig. 2a). The Gennargentu region referred to in the main text is the mountainous area shown in brown that is centered in western Ogliastra and southeastern Nuoro.
Right: Density of Nuraghi in Sardinia, from Wikipedia.

While we can confirm that Sardinians principally have Neolithic ancestry on the autosomes, the high frequency of two Y-chromosome haplogroups (I2a1a1 at ~39% and R1b1a2 at ~18%) that are not typically affiliated with Neolithic ancestry is one challenge to this model. Whether these haplogroups rose in frequency due to extensive genetic drift and/or reflect sex-biased demographic processes has been an open question. Our analysis of X chromosome versus autosome diversity suggests a smaller effective size for males, which can arise due to multiple processes, including polygyny, patrilineal inheritance rules, or transmission of reproductive success. We also find that the genetic ancestry enriched in Sardinia is more prevalent on the X chromosome than the autosome, suggesting that male lineages may more rapidly trace back to the mainland. Considering that the R1b1a2 haplogroup may be associated with post-Neolithic steppe ancestry expansions in Europe, and the recent timeframe when the R1b1a2 lineages expanded in Sardinia, the patterns raise the possibility of recent male-biased steppe ancestry migration to Sardinia, as has been reported among mainland Europeans at large (though see Lazaridis and Reich and Goldberg et al.). Such a recent influx is difficult to square with the overall divergence of Sardinian populations observed here.
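The X-versus-autosome argument rests on a textbook expectation: with equal numbers of breeding males and females, the X chromosome has three quarters of the autosomal effective size, and the ratio rises above 0.75 when fewer males than females reproduce. A small sketch using the standard formulas for the two effective sizes (the population counts are made-up numbers, and this is not the estimation method of the paper):

```python
def autosomal_ne(n_males, n_females):
    """Standard autosomal effective size with unequal sex ratio."""
    return 4 * n_males * n_females / (n_males + n_females)


def x_chromosome_ne(n_males, n_females):
    """Standard X-chromosome effective size with unequal sex ratio."""
    return 9 * n_males * n_females / (4 * n_males + 2 * n_females)


def x_to_autosome_ratio(n_males, n_females):
    return x_chromosome_ne(n_males, n_females) / autosomal_ne(n_males, n_females)


print(x_to_autosome_ratio(1000, 1000))  # 0.75, equal breeding sexes
print(x_to_autosome_ratio(250, 1000))   # 0.9375, fewer breeding males
```

An observed X/autosome diversity ratio above 0.75 is therefore compatible with a smaller male effective size, although other processes can also shift it.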

Mixture proportions of the three-component ancestries among Sardinian populations. Using a method first presented in Haak et al. (Nature 522, 207–211, 2015), we computed unbiased estimates of mixture proportions without a parameterized model of relationships between the test populations and the outgroup populations based on f4 statistics. The three-component ancestries were represented by early Neolithic individuals from the LBK culture (LBK_EN), pre-Neolithic hunter-gatherers (Loschbour), and Bronze Age steppe pastoralists (Yamnaya). See Supplementary Table 5 for standard error estimates computed using a block jackknife.
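As a reminder of what a block jackknife does, here is a minimal delete-one-block sketch: the statistic is recomputed with each genomic block left out in turn, and the spread of those leave-one-out values gives the standard error. It assumes equally weighted blocks and uses a plain mean as the statistic; real pipelines typically weight blocks by their SNP count and jackknife the f4-based estimate itself, so the function and the toy values are purely illustrative.

```python
import math


def block_jackknife_se(blocks, statistic):
    """Delete-one-block jackknife standard error.

    blocks: list of lists, each holding the per-SNP values of one block.
    statistic: function mapping a flat list of values to a number.
    """
    g = len(blocks)
    leave_one_out = []
    for i in range(g):
        kept = [v for j, block in enumerate(blocks) if j != i for v in block]
        leave_one_out.append(statistic(kept))
    mean_loo = sum(leave_one_out) / g
    variance = (g - 1) / g * sum((t - mean_loo) ** 2 for t in leave_one_out)
    return math.sqrt(variance)


def mean(values):
    return sum(values) / len(values)


# Toy per-SNP values grouped into four blocks (hypothetical numbers).
toy_blocks = [[1.0], [2.0], [3.0], [4.0]]
se = block_jackknife_se(toy_blocks, mean)
print(se)
```

With one value per block and the mean as statistic, the result coincides with the usual standard error of the mean, which is a handy sanity check.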

Once again, haplogroup R1b1a2 (M269), and only R1b1a2, related to male-biased, steppe-related Indo-European migrations…just sayin’.

Interestingly, haplogroup I2a1a1 is actually found among northern Iberians during the Neolithic and Chalcolithic, and is therefore associated with Neolithic ancestry in Iberia, too, and consequently – unless there is a big surprise hidden somewhere – with the ancestry found today among Basques.

NOTE. In fact, the increase in Neolithic ancestry found in south-west Ireland with expanding Bell Beakers (likely Proto-Beakers), coupled with the finding of I2a subclades in Megalithic cultures of western Europe, would support this replacement after the Cardial and Epi-Cardial expansions, which were initially associated with G2a lineages.

I am not convinced about a survival of Palaeo-Sardo after the Bell Beaker expansion, though, since there is no clear-cut cultural divide (and subsequent continuity) of pre-Beaker archaeological cultures after the arrival of Bell Beakers on the island that could be identified with the survival of Neolithic languages.

We may have to wait for ancient DNA to show a potential expansion of Neolithic ancestry from the west, maybe associated with the emergence of the Nuragic civilization (potentially linked with contemporaneous Megalithic cultures in Corsica and in the Balearic Islands, and thus with an Iberian rather than a Basque stock), although this is quite speculative at this moment in linguistic, archaeological, and genetic terms.

Nevertheless, it seems that the association of a Basque-Iberian language with the Neolithic expansion from Anatolia (see Villar’s latest book on the subject) is somewhat strengthened by this paper. However, it is unclear when, how, and where expanding G2a subclades were replaced by native I2 lineages.


Male-biased expansions and migrations also observed in Northwestern Amazonia

Open access preprint Cultural Innovations influence patterns of genetic diversity in Northwestern Amazonia, by Arias et al., bioRxiv (2018).

Abstract (emphasis mine):

Human populations often exhibit contrasting patterns of genetic diversity in the mtDNA and the non-recombining portion of the Y-chromosome (NRY), which reflect sex-specific cultural behaviors and population histories. Here, we sequenced 2.3 Mb of the NRY from 284 individuals representing more than 30 Native-American groups from Northwestern Amazonia (NWA) and compared these data to previously generated mtDNA genomes from the same groups, to investigate the impact of cultural practices on genetic diversity and gain new insights about NWA population history. Relevant cultural practices in NWA include postmarital residential rules and linguistic-exogamy, a marital practice in which men are required to marry women speaking a different language. We identified 2,969 SNPs in the NRY sequences; only 925 SNPs were previously described. The NRY and mtDNA data showed that males and females experienced different demographic histories: the female effective population size has been larger than that of males through time, and both markers show an increase in lineage diversification beginning ~5,000 years ago, with a male-specific expansion occurring ~3,500 years ago. These dates are too recent to be associated with agriculture, therefore we propose that they reflect technological innovations and the expansion of regional trade networks documented in the archaeological evidence. Furthermore, our study provides evidence of the impact of postmarital residence rules and linguistic exogamy on genetic diversity patterns. Finally, we highlight the importance of analyzing high-resolution mtDNA and NRY sequences to reconstruct demographic history, since this can differ considerably between males and females.

MDS plots for mtDNA and NRY. Stress values (within parentheses) are indicated in percentages.

Looking more precisely at the different groups (even with the resampling approach), there are no significant differences between matrilocal and patrilocal groups. At best, as the study proposes, “this is just one of the factors at play in structuring the observed genetic variation”.

Interesting excerpts:

(…) we found evidence that the patterns of genetic differentiation depend on the geographical scale of the study. The magnitude of between-population differentiation in the NRY compared to the mtDNA is smaller when looking at the continental scale than in NWA (Figure 6). This is in agreement with the findings of Wilkins and Marlowe (2006), who showed that the excess of between-population differentiation for the NRY in comparison to the mtDNA decreases when comparing more geographically distant populations. Heyer et al. (2012) and Wilkins and Marlowe (2006) have proposed that at a local scale the patterns of genetic diversity reflect cultural practices over a relatively small number of generations, whereas at a larger geographic scale the genetic diversity reflects old migration and/or old common ancestry patterns (Heyer et al. 2012; Wilkins and Marlowe 2006).

BSPs for the mtDNA and NRY sequences from NWA. The dotted lines indicate the 95% HPD intervals. Ne was corrected for generation time according to (Fenner 2005), using 26 years for mtDNA and 31 years for NRY.

The BSP plots and the diversity statistics indicate that overall the Ne of males has been smaller than that of females. One tentative explanation for this difference is that it reflects larger differences in reproductive success among males than among females. Some support for this explanation comes from the shape of the phylogenies (Supplementary Figures 1 and 6), since differences in reproductive success and the cultural transmission of fertility lead to imbalanced phylogenies (Blum et al. 2006; Heyer et al. 2015). We estimated a common index of tree imbalance (Colless index) and calculated whether the mtDNA and NRY trees were more unbalanced than 1000 simulated trees generated under a Yule process (Bortolussi et al. 2006) (i.e. a simple pure birth process that assumes that the birth rate of new lineages is the same along the tree). We found that the NRY tree is more unbalanced than predicted by the Yule model (p-value=0.001), whereas the mtDNA tree is not significantly different from trees generated by the Yule model (p-value=0.628). It has been suggested that highly mobile hunter-gatherer societies, such as those typical of most of human prehistory, were polygynous bands (Dupanloup et al. 2003); similarly, nomadic horticulturalist Amazonian societies exhibit strong differences in reproductive success due to the common practice of polygyny, especially among community chiefs, whose offspring also enjoy a high fertility (Neel 1970; 1980; Neel and Weiss 1975).
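The imbalance test described in that excerpt can be sketched in a few lines. The Colless index sums, over all internal nodes of a rooted binary tree, the absolute difference in tip counts between the two daughter clades. To simulate Yule topologies I rely on the known property that, under the Yule model, the size of the left clade at any split is uniform between 1 and n − 1. The nested-tuple tree format, the function names and the toy trees are my own assumptions, the one-sided p-value is simply the fraction of simulated trees at least as unbalanced as the observed one, and the authors' actual software is not reproduced here.

```python
import random


def tips_and_colless(tree):
    """Return (number of tips, Colless index) of a nested-tuple binary tree."""
    if not isinstance(tree, tuple):
        return 1, 0  # a tip
    left, right = tree
    n_left, c_left = tips_and_colless(left)
    n_right, c_right = tips_and_colless(right)
    return n_left + n_right, c_left + c_right + abs(n_left - n_right)


def yule_colless(n_tips, rng):
    """Colless index of one random Yule topology with n_tips tips."""
    if n_tips < 2:
        return 0
    n_left = rng.randint(1, n_tips - 1)  # uniform split under the Yule model
    n_right = n_tips - n_left
    return (abs(n_left - n_right)
            + yule_colless(n_left, rng)
            + yule_colless(n_right, rng))


def yule_p_value(tree, n_sim=1000, seed=0):
    """Fraction of simulated Yule trees at least as unbalanced as `tree`."""
    rng = random.Random(seed)
    n_tips, observed = tips_and_colless(tree)
    hits = sum(yule_colless(n_tips, rng) >= observed for _ in range(n_sim))
    return hits / n_sim


# Toy trees: a fully unbalanced "caterpillar" and a fully balanced tree.
caterpillar = ((((("a", "b"), "c"), "d"), "e"), "f")
balanced = (("a", "b"), ("c", "d"))

print(tips_and_colless(caterpillar))  # (6, 10)
print(tips_and_colless(balanced))     # (4, 0)
print(yule_p_value(caterpillar), yule_p_value(balanced))
```

A low p-value, as reported for the NRY tree, means the observed phylogeny is more lopsided than a constant-rate pure birth process would usually produce.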

Furthermore, a more recent expansion can be observed in the BSP based on the NRY, but not in the mtDNA BSP (Figure 5), indicating an expansion specifically in the paternal line. The reasons behind this recent male-biased population expansion, which starts ~3.5 kya, are as yet unclear. However, similar male-biased expansions have been observed in other studies using high-resolution NRY sequences (Batini et al. 2017; Karmin et al. 2015).


Mitochondrial DNA unsuitable to test for IBD, and undersampling genomes show biased time and rate estimates

Two interesting papers questioning previous methods have been published.

Open access Mitochondrial DNA is unsuitable to test for isolation by distance, by Teske et al. Scientific Reports (2018) 8:8448.

Abstract (emphasis mine):

Tests for isolation by distance (IBD) are the most commonly used method of assessing spatial genetic structure. Many studies have exclusively used mitochondrial DNA (mtDNA) sequences to test for IBD, but this marker is often in conflict with multilocus markers. Here, we report a review of the literature on IBD, with the aims of determining (a) whether significant IBD is primarily a result of lumping spatially discrete populations, and (b) whether microsatellite datasets are more likely to detect IBD when mtDNA does not. We also provide empirical data from four species in which mtDNA failed to detect IBD by comparing these with microsatellite and SNP data. Our results confirm that IBD is mostly found when distinct regional populations are pooled, and this trend disappears when each is analysed separately. Discrepancies between markers were found in almost half of the studies reviewed, and microsatellites were more likely to detect IBD when mtDNA did not. Our empirical data rejected the lack of IBD in the four species studied, and support for IBD was particularly strong for the SNP data. We conclude that mtDNA sequence data are often not suitable to test for IBD, and can be misleading about species’ true dispersal potential. The observed failure of mtDNA to reliably detect IBD, in addition to being a single-locus marker, is likely a result of a selection-driven reduction in genetic diversity obscuring spatial genetic differentiation.

Plots of geographic distances vs. F-statistics for the following species (plots on the left show mtDNA data, those on the right SNP or microsatellite data): (a) Sardinops sagax; (b) Psammogobius knysnaensis; (c) Nerita atramentosa; (d) Siphonaria diemenensis. The density of data points is indicated by colours.
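Tests for IBD like those reviewed in the paper are commonly run as a Mantel test: the correlation between pairwise genetic distances (e.g. FST) and pairwise geographic distances is compared with the correlations obtained after randomly relabelling the sampling sites. Below is a minimal standard-library sketch of that idea. The site positions and the FST values are invented, and I am not claiming this is the exact procedure or software used by Teske et al.

```python
import math
import random


def upper_triangle(matrix, order=None):
    """Flatten the upper triangle, optionally after relabelling the sites."""
    n = len(matrix)
    order = list(range(n)) if order is None else order
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mantel_test(genetic, geographic, n_perm=999, seed=0):
    """Return (observed r, one-sided permutation p-value)."""
    rng = random.Random(seed)
    geo = upper_triangle(geographic)
    observed = pearson(upper_triangle(genetic), geo)
    order = list(range(len(genetic)))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)  # relabel sites in the genetic matrix
        if pearson(upper_triangle(genetic, order), geo) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical sites along a coastline (km) and toy FST values that
# increase linearly with distance, i.e. a textbook IBD pattern.
positions = [0, 10, 25, 45, 70]
geo_km = [[abs(a - b) for b in positions] for a in positions]
toy_fst = [[0.001 * d for d in row] for row in geo_km]

r, p = mantel_test(toy_fst, geo_km)
print(r, p)
```

The point made in the paper is about the input rather than the test itself: with a single-locus marker such as mtDNA the genetic matrix can be too uninformative for the correlation to show up, even when multilocus data reveal it.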

Behind paywall, Undersampling Genomes has Biased Time and Rate Estimates Throughout the Tree of Life, by Julie Marin and S. Blair Hedges, Mol Biol Evol (2018), msy103.

Abstract (emphasis mine):

Genomic data drive evolutionary research on the relationships and timescale of life but the genomes of most species remain poorly sampled. Phylogenetic trees can be reconstructed reliably using small data sets and the same has been assumed for the estimation of divergence time with molecular clocks. However, we show here that undersampling of molecular data results in a bias expressed as disproportionately shorter branch lengths and underestimated divergence times in the youngest nodes and branches, termed the small sample artifact. In turn, this leads to increasing speciation and diversification rates towards the present. Any evolutionary analyses derived from these biased branch lengths and speciation rates will be similarly biased. The widely used timetrees of the major species-rich studies of amphibians, birds, mammals, and squamate reptiles are all data-poor and show upswings in diversification rate, suggesting that their results were biased by undersampling. Our results show that greater sampling of genomes is needed for accurate time and rate estimation, which are basic data used in ecological and evolutionary research.

Potential biases on speciation rate estimation. The black line represents constant speciation rate as expected if there are no artifacts or other factors affecting the rate. The small sample artifact (an insufficient number of variable sites) may impact all of the tree and diversification plot, resulting in a rate increase towards the present. The taxonomic artifact (incomplete sampling of taxa or lineages) also may impact all of the tree and diversification plot and results in a speciation rate decrease towards the present. The sparse nodes artifact (stochastic effect of a limited number of nodes) may impact the beginning of the diversification plot, causing decreases or increases in rate.

Do you remember the male-biased expansion from the Pontic-Caspian steppe, which was later contested in its methods by Lazaridis and Reich, but which is today again accepted by Reich and Lazaridis (probably for different reasons, namely Y-DNA evidence)?

Every time I read these kinds of studies rejecting previous methods – which get written and published only because there is a future interest in them, not because they are (or may cause) retractions of previous results and interpretations – I remember these people inventing migration models based on genomic studies and saying “genetics is a science, linguistics/archaeology/anthropology is not”…

NOTE. Even if papers eventually receive a correction, journalists and blogs will keep echoing whatever gets published (see the famous Dennis/Denise will become dentists); there is no end to that. Believe it or not, we still see Underhill et al. (2014) being cited against the most recent papers, and even against the author’s own rejection of his paper’s results.

Especially right now, it must cause some kind of dissociated reasoning among those naysayers, when they need to resort to anthropological disciplines to discuss the latest interpretations of a potential Caucasus origin or North Iranian homeland of Proto-Indo-European…

EDIT (5 JUN 2018): Also, check out the recent review From genome-wide associations to candidate causal variants by statistical fine-mapping, by Schaid, Chen, and Larson.