Improving environmental conditions favoured higher local population density, which favoured domestication


New paper (behind paywall): Hindcasting global population densities reveals forces enabling the origin of agriculture, by Kavanagh et al., Nature Human Behaviour (2018).

Abstract (emphasis mine):

The development and spread of agriculture changed fundamental characteristics of human societies1,2,3. However, the degree to which environmental and social conditions enabled the origins of agriculture remains contested4,5,6. We test three hypothesized links between the environment, population density and the origins of plant and animal domestication, a prerequisite for agriculture: (1) domestication arose as environmental conditions improved and population densities increased7 (surplus hypothesis); (2) populations needed domestication to overcome deteriorating environmental conditions (necessity hypothesis)8,9; (3) factors promoting domestication were distinct in each location10 (regional uniqueness hypothesis). We overcome previous data limitations with a statistical model, in which environmental, geographic and cultural variables capture 77% of the variation in population density among 220 foraging societies worldwide. We use this model to hindcast potential population densities across the globe from 21,000 to 4,000 years before present. Despite the timing of domestication varying by thousands of years, we show that improving environmental conditions favoured higher local population densities during periods when domestication arose in every known agricultural origin centre. Our results uncover a common, global factor that facilitated one of humanity’s most significant innovations and demonstrate that modelling ancestral demographic changes can illuminate major events deep in human history.
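The hindcasting logic in the abstract (fit a model of forager population density on present-day societies, then feed it reconstructed past conditions) can be sketched as follows. This is a minimal illustration assuming a plain least-squares model of log10 density on two hypothetical standardized predictors (called productivity and temperature here); the paper's actual model, predictors and fitting procedure are different and are not reproduced. All numbers are synthetic.

```python
import numpy as np

def fit_log_density(X, density):
    """OLS of log10(density) on the predictors in X, with an intercept."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, np.log10(density), rcond=None)
    return coef

def hindcast(coef, X_past):
    """Apply fitted coefficients to predictor values reconstructed for a past time slice."""
    design = np.column_stack([np.ones(len(X_past)), X_past])
    return 10 ** (design @ coef)

rng = np.random.default_rng(1)
n_societies = 220  # same sample size as the paper; the data below are synthetic

# Hypothetical standardized predictors: column 0 "productivity", column 1 "temperature"
X = rng.normal(size=(n_societies, 2))
true_coef = np.array([-1.0, 0.6, 0.3])
log_d = true_coef[0] + X @ true_coef[1:] + rng.normal(scale=0.2, size=n_societies)
density = 10 ** log_d

coef = fit_log_density(X, density)

# Share of variance in log10 density captured by the model (R^2)
fitted = np.column_stack([np.ones(n_societies), X]) @ coef
r2 = 1 - np.sum((log_d - fitted) ** 2) / np.sum((log_d - log_d.mean()) ** 2)

# Two hypothetical grid cells: a colder, less productive past cell and a milder one
X_past = np.array([[-1.0, -1.5], [0.5, 0.2]])
pred = hindcast(coef, X_past)
```

Predicting for every grid cell of each past climate reconstruction, rather than for two cells, gives maps of potential density like the ones in the paper.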

Path diagram for piecewise-SEM exploring the effects of environmental and cultural variables on population densities of foraging societies. Measured variables are represented by the large boxes and R²GLMM values (see Methods) are provided for response variables. n = 220. Red arrows depict negative relationships among variables, black arrows positive relationships, and dashed grey arrows depict non-significant paths (P ≥ 0.05). Standardized coefficients are presented for all paths (small boxes) and arrow widths are scaled to reflect the magnitude of path coefficients.
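A piecewise SEM fits one model per response variable and reads the paths off those separate fits. The sketch below assumes a hypothetical two-equation system (mobility depending on productivity; log density depending on productivity and mobility), uses ordinary least squares on z-scored variables to obtain standardized coefficients, and multiplies paths to get an indirect effect. The paper used mixed models (hence the R²GLMM values); the variables, data and effect sizes here are invented.

```python
import numpy as np

def zscore(v):
    """Center and scale a variable to unit standard deviation."""
    return (v - v.mean()) / v.std()

def standardized_coefs(y, *predictors):
    """OLS of z-scored y on z-scored predictors; returns one standardized coefficient per predictor."""
    X = np.column_stack([zscore(p) for p in predictors])
    # No intercept needed: all z-scored variables have mean zero
    coef, *_ = np.linalg.lstsq(X, zscore(y), rcond=None)
    return coef

rng = np.random.default_rng(5)
n = 220
productivity = rng.normal(size=n)                       # hypothetical environmental variable
mobility = -0.5 * productivity + rng.normal(size=n)     # hypothetical residential mobility
log_density = 0.6 * productivity - 0.5 * mobility + rng.normal(size=n)

# Component model 1: mobility ~ productivity
(a,) = standardized_coefs(mobility, productivity)
# Component model 2: log_density ~ productivity + mobility
b_prod, b_mob = standardized_coefs(log_density, productivity, mobility)

# Indirect path productivity -> mobility -> density is the product of its two paths
indirect = a * b_mob
```

Two negative paths multiply into a positive indirect effect, which is how "mobility scales negatively with density" and "productivity lowers mobility" combine into one more route from environment to density.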

Interesting excerpts:

(…) our results are consistent with the surplus hypothesis, which suggests that improving environmental conditions and the potential for increased population density may have facilitated the domestication of plants and animals in agricultural origin centres4,7 (Fig. 3). Several factors may explain the links between environmental conditions, potential population density and the origin of domestication. For one, rates of innovation may scale positively with the number of potential innovators13,14. In turn, the likelihood of domestication innovations may have increased in environments that could support increasingly higher densities of foraging people.

In addition, foraging societies may have become more sedentary to take advantage of locally abundant resources, some of which were later domesticated35. Our results indicate that residential mobility scales negatively with population density in foraging societies (Fig. 1). Therefore, increasingly sedentary lifestyles may have contributed further to increases in population density and the potential for innovation. Increases in the productivity of wild progenitors of important domesticates may have also facilitated growing population densities and the viability of cultivation for food production15,16.

Predictions of potential population density for foragers. a–c, Predicted population densities at 4,000 (a), 10,000 (b) and 21,000 (c) YBP. Blue hues depict potential population densities below the median population density of observed foraging societies, and red hues depict potential population densities above the median. The second red hue and above are greater than the mean population density of observed foraging societies. Note the increase in area, through time, with potential population densities greater than the mean of observed foraging societies (number of 0.5° × 0.5° cells: 21,000 YBP = 3,027; 4,000 YBP = 4,673). For example, note the increase in the number of red cells in the Sudanic savannah and the Ganges of East India (Northeast India) between panels c and a.
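The cell counts in the caption amount to thresholding a gridded prediction and counting the cells above the mean of observed societies. A minimal sketch, assuming predictions are stored in a 2-D array with NaN for cells without a prediction (a hypothetical layout, not the paper's data format); the toy grid and threshold are made up, while the two counts come from the caption. Note that 0.5° cells shrink in area towards the poles, so cell counts are not equal-area measures.

```python
import numpy as np

def count_cells_above(density_grid, threshold):
    """Count grid cells whose predicted density exceeds threshold; NaN cells (e.g. sea, ice) are ignored."""
    valid = ~np.isnan(density_grid)
    return int(np.count_nonzero(density_grid[valid] > threshold))

# A global 0.5 x 0.5 degree grid would have 360 rows (latitude) x 720 columns (longitude);
# a tiny hypothetical grid is used here
toy_grid = np.array([[1.0, 5.0, np.nan],
                     [3.0, 10.0, 0.5]])
n_toy = count_cells_above(toy_grid, threshold=2.0)   # 3 cells: 5.0, 3.0 and 10.0

# Change reported in the caption between 21,000 and 4,000 YBP
cells_21k, cells_4k = 3027, 4673
increase = cells_4k - cells_21k                      # 1646 cells
relative_increase = 100 * increase / cells_21k       # about 54% more cells
```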

It is also possible that improving environmental conditions may have resulted in a situation where necessity drove the origins of domestication. For example, population densities may have increased in foraging societies that occupied productive, coastal areas, causing an outflow of groups into regions with less ideal conditions where the cultivation of plants and animals was required to secure adequate food resources6,17,18. Our results cannot support, or refute, the possible influence the outflow of people from hospitable locations to less ideal environments may have played. A detailed understanding of the movements of ancient populations is required for more rigorous testing of the role that forced habitation of marginal environments may have played in the origins of domestication at particular sites.


Mitochondrial DNA unsuitable to test for IBD, and undersampling genomes show biased time and rate estimates

Two interesting papers questioning previous methods have been published.

Open access Mitochondrial DNA is unsuitable to test for isolation by distance, by Teske et al. Scientific Reports (2018) 8:8448.

Abstract (emphasis mine):

Tests for isolation by distance (IBD) are the most commonly used method of assessing spatial genetic structure. Many studies have exclusively used mitochondrial DNA (mtDNA) sequences to test for IBD, but this marker is often in conflict with multilocus markers. Here, we report a review of the literature on IBD, with the aims of determining (a) whether significant IBD is primarily a result of lumping spatially discrete populations, and (b) whether microsatellite datasets are more likely to detect IBD when mtDNA does not. We also provide empirical data from four species in which mtDNA failed to detect IBD by comparing these with microsatellite and SNP data. Our results confirm that IBD is mostly found when distinct regional populations are pooled, and this trend disappears when each is analysed separately. Discrepancies between markers were found in almost half of the studies reviewed, and microsatellites were more likely to detect IBD when mtDNA did not. Our empirical data rejected the lack of IBD in the four species studied, and support for IBD was particularly strong for the SNP data. We conclude that mtDNA sequence data are often not suitable to test for IBD, and can be misleading about species’ true dispersal potential. The observed failure of mtDNA to reliably detect IBD, in addition to being a single-locus marker, is likely a result of a selection-driven reduction in genetic diversity obscuring spatial genetic differentiation.

Plots of geographic distances vs. F-statistics for the following species (plots on the left show mtDNA data, those on the right SNP or microsatellite data): (a) Sardinops sagax; (b) Psammogobius knysnaensis; (c) Nerita atramentosa; (d) Siphonaria diemenensis. The density of data points is indicated by colours.
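IBD is usually tested with a Mantel test: correlate pairwise genetic distances with pairwise geographic distances, then compare against correlations obtained after permuting the site labels. The sketch below uses the common linearized F_ST/(1 − F_ST) as genetic distance. Its toy data (two hypothetical regions of six sites, 100 km apart, with no relation between genetic and geographic distance inside either region) illustrate point (a) of the abstract: pooling discrete regional populations can produce a strong "IBD" signal on its own. Site positions and F_ST values are invented.

```python
import numpy as np

def mantel(dist_a, dist_b, n_perm=999, rng=None):
    """One-sided Mantel test: Pearson r between the upper triangles, with a permutation p-value."""
    rng = np.random.default_rng() if rng is None else rng
    n = dist_a.shape[0]
    iu = np.triu_indices(n, k=1)
    a = dist_a[iu]
    r_obs = np.corrcoef(a, dist_b[iu])[0, 1]
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(n)
        # Permute rows and columns of one matrix together (relabel sites)
        b_perm = dist_b[np.ix_(perm, perm)][iu]
        if np.corrcoef(a, b_perm)[0, 1] >= r_obs:
            count += 1
    return r_obs, (count + 1) / (n_perm + 1)

rng = np.random.default_rng(11)

# Hypothetical sites along a coastline (km): region A at 0-5, region B at 100-105
x = np.concatenate([np.arange(6.0), 100.0 + np.arange(6.0)])
geo = np.abs(x[:, None] - x[None, :])

# F_ST: small random values within regions, about 0.2 between regions
region = np.array([0] * 6 + [1] * 6)
noise = rng.uniform(0.0, 0.02, size=(12, 12))
noise = (noise + noise.T) / 2
fst = np.where(region[:, None] == region[None, :], 0.0, 0.2) + noise
np.fill_diagonal(fst, 0.0)
gen = fst / (1 - fst)

r_pooled, p_pooled = mantel(geo, gen, rng=rng)            # both regions lumped together
r_a, p_a = mantel(geo[:6, :6], gen[:6, :6], rng=rng)      # region A analysed on its own
```

By construction, genetic distance in each region analysed separately does not depend on geography, so any pooled signal comes only from the gap between regions.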

Behind paywall, Undersampling Genomes has Biased Time and Rate Estimates Throughout the Tree of Life, by Julie Marin and S. Blair Hedges, Mol Biol Evol (2018), msy103.

Abstract (emphasis mine):

Genomic data drive evolutionary research on the relationships and timescale of life but the genomes of most species remain poorly sampled. Phylogenetic trees can be reconstructed reliably using small data sets and the same has been assumed for the estimation of divergence time with molecular clocks. However, we show here that undersampling of molecular data results in a bias expressed as disproportionately shorter branch lengths and underestimated divergence times in the youngest nodes and branches, termed the small sample artifact. In turn, this leads to increasing speciation and diversification rates towards the present. Any evolutionary analyses derived from these biased branch lengths and speciation rates will be similarly biased. The widely used timetrees of the major species-rich studies of amphibians, birds, mammals, and squamate reptiles are all data-poor and show upswings in diversification rate, suggesting that their results were biased by undersampling. Our results show that greater sampling of genomes is needed for accurate time and rate estimation, which are basic data used in ecological and evolutionary research.

Potential biases on speciation rate estimation. The black line represents constant speciation rate as expected if there are no artifacts or other factors affecting the rate. The small sample artifact (an insufficient number of variable sites) may impact all of the tree and diversification plot, resulting in a rate increase towards the present. The taxonomic artifact (incomplete sampling of taxa or lineages) also may impact all of the tree and diversification plot and results in a speciation rate decrease towards the present. The sparse nodes artifact (stochastic effect of a limited number of nodes) may impact the beginning of the diversification plot, causing decreases or increases in rate.
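The "black line" case in the caption, and how shortened young branches turn it into an apparent upswing, can be illustrated with a pure-birth (Yule) tree. The sketch below simulates node ages, estimates the speciation rate in each time bin as events divided by lineage-time (the pure-birth maximum-likelihood estimate), then re-estimates after a hypothetical distortion that shrinks young node ages disproportionately. The distortion formula is an assumption chosen for illustration only; it is not the authors' model of the small sample artifact.

```python
import numpy as np

def simulate_yule_node_ages(rate, total_time, rng):
    """Grow a pure-birth tree forward from a root split; return node ages before present, root first."""
    t, k = 0.0, 2
    split_times = [0.0]                               # root split at time 0
    while True:
        t += rng.exponential(1.0 / (k * rate))        # waiting time with k lineages
        if t >= total_time:
            break
        split_times.append(t)
        k += 1
    return total_time - np.array(split_times)

def rate_through_time(ages, total_time, n_bins):
    """Speciation rate per age bin: events / lineage-time (pure-birth MLE within each bin)."""
    edges = np.linspace(0.0, total_time, n_bins + 1)
    events = ages[1:]                                 # exclude the root split
    rates = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        n_events = np.sum((events >= lo) & (events < hi))
        # Lineages at age a = 1 + number of nodes older than a; integrate over [lo, hi)
        lineage_time = (hi - lo) + np.clip(ages - lo, 0.0, hi - lo).sum()
        rates.append(n_events / lineage_time)
    return np.array(rates)                            # index 0 = bin nearest the present

rng = np.random.default_rng(3)
T = 8.0
ages = simulate_yule_node_ages(rate=1.0, total_time=T, rng=rng)
true_rates = rate_through_time(ages, T, n_bins=4)     # roughly constant, near 1.0

# Hypothetical distortion: young node ages shrink much more than old ones (root age unchanged)
compressed = T * (ages / T) ** 1.5
biased_rates = rate_through_time(compressed, T, n_bins=4)
```

Compressing the youngest part of the tree packs the same number of splits into less time, so the estimated rate climbs towards the present even though the simulated rate never changed.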

Do you remember the proposed male-biased expansion from the Pontic-Caspian steppe, whose supporting methods were later contested by Lazaridis and Reich, but which is today again accepted by Reich and Lazaridis (probably for different reasons, namely Y-DNA evidence)?

Every time I read this kind of study rejecting previous methods (studies that get written and published only because there is a future interest in them, not because they are, or may cause, retractions of previous results and interpretations), I remember those people inventing migration models based on genomic studies and saying “genetics is a science, linguistics/archaeology/anthropology is not”…

NOTE. Even if papers eventually receive a correction, journalists and blogs will keep echoing whatever gets published (see the famous “Dennis/Denise will become dentists” example); there is no end to that. Believe it or not, we still see Underhill et al. (2014) being cited against the most recent papers, and even against the author’s own rejection of his paper’s results.

Especially right now, it must cause some kind of dissociated reasoning among those naysayers, when they need to resort to anthropological disciplines to discuss the latest interpretations of a potential Caucasus origin or North Iranian homeland of Proto-Indo-European…

EDIT (5 JUN 2018): Also, check out the recent review From genome-wide associations to candidate causal variants by statistical fine-mapping, by Schaid, Chen, and Larson.


Contrastive principal component analysis (cPCA) to explore patterns specific to a dataset

Interesting open access paper Exploring patterns enriched in a dataset with contrastive principal component analysis, by Abid, Zhang, Bagaria & Zou, Nature Communications (2018) 9:2134.

Abstract (emphasis mine):

Visualization and exploration of high-dimensional data is a ubiquitous challenge across disciplines. Widely used techniques such as principal component analysis (PCA) aim to identify dominant trends in one dataset. However, in many settings we have datasets collected under different conditions, e.g., a treatment and a control experiment, and we are interested in visualizing and exploring patterns that are specific to one dataset. This paper proposes a method, contrastive principal component analysis (cPCA), which identifies low-dimensional structures that are enriched in a dataset relative to comparison data. In a wide variety of experiments, we demonstrate that cPCA with a background dataset enables us to visualize dataset-specific patterns missed by PCA and other standard methods. We further provide a geometric interpretation of cPCA and strong mathematical guarantees. An implementation of cPCA is publicly available, and can be used for exploratory data analysis in many applications where PCA is currently used.

Schematic Overview of cPCA. To perform cPCA, compute the covariance matrices $C_X$, $C_Y$ of the target and background datasets. The singular vectors of the weighted difference of the covariance matrices, $C_X - \alpha \cdot C_Y$, are the directions returned by cPCA. As shown in the scatter plot on the right, PCA (on the target data) identifies the direction that has the highest variance in the target data, while cPCA identifies the direction that has a higher variance in the target data as compared to the background data. Projecting the target data onto the latter direction gives patterns unique to the target data and often reveals structure that is missed by PCA. Specifically, in this example, reducing the dimensionality of the target data by cPCA would reveal two distinct clusters.
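The procedure in the caption fits in a few lines of NumPy. This is a minimal from-scratch sketch, not the authors' released code: it takes the eigenvectors of $C_X - \alpha C_Y$ with the largest eigenvalues, i.e. the unit directions $v$ maximizing $v^\top C_X v - \alpha\, v^\top C_Y v$. The toy data (a large, uninteresting axis shared by target and background, plus two hidden subgroups in the target along a low-variance axis) and the choice $\alpha = 2$ are assumptions for illustration; in practice $\alpha$ is usually explored over a range of values.

```python
import numpy as np

def cpca(target, background, alpha, n_components=2):
    """Contrastive PCA: top eigenvectors of C_X - alpha * C_Y and the projected target data."""
    X = target - target.mean(axis=0)
    Y = background - background.mean(axis=0)
    C_X = np.cov(X, rowvar=False)
    C_Y = np.cov(Y, rowvar=False)
    C = C_X - alpha * C_Y                    # symmetric, so eigh applies
    eigvals, eigvecs = np.linalg.eigh(C)     # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    V = eigvecs[:, order]                    # columns = contrastive directions
    return V, X @ V

rng = np.random.default_rng(0)
n = 5000
scales = np.array([10.0, 1.0, 1.0])          # axis 0 dominates in both datasets

background = rng.normal(size=(n, 3)) * scales
target = rng.normal(size=(n, 3)) * scales
labels = rng.integers(0, 2, size=n)
target[:, 1] += np.where(labels == 1, 2.0, -2.0)   # hidden subgroups along axis 1

V, projected = cpca(target, background, alpha=2.0, n_components=1)

# Ordinary PCA on the target alone: the top eigenvector of C_X
pca_dir = np.linalg.eigh(np.cov(target, rowvar=False))[1][:, -1]
```

Here PCA picks axis 0 (variance about 100 in both datasets), while cPCA picks axis 1, the only direction where the target is more variable than the background, and the projection separates the two subgroups.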

The Mexican example caught my attention:

Relationship between ancestral groups in Mexico

In previous examples, we have seen that cPCA allows the user to discover subclasses within a target dataset that are not labeled a priori. However, even when subclasses are known ahead of time, dimensionality reduction can be a useful way to visualize the relationship within groups. For example, PCA is often used to visualize the relationship between ethnic populations based on genetic variants, because projecting the genetic variants onto two dimensions often produces maps that offer striking visualizations of geographic and historic trends26,27. But again, PCA is limited to identifying the most dominant structure; when this represents universal or uninteresting variation, cPCA can be more effective at visualizing trends.
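For reference, the standard genotype PCA mentioned above can be sketched like this, assuming a matrix of 0/1/2 allele counts (individuals × SNPs) and the common per-SNP normalization by the binomial standard deviation $\sqrt{2p(1-p)}$. The two simulated populations and their allele frequencies are hypothetical.

```python
import numpy as np

def genotype_pca(G, n_components=2):
    """PCA of a genotype matrix G (individuals x SNPs, entries = copies of one allele: 0, 1 or 2)."""
    p = G.mean(axis=0) / 2.0                        # per-SNP allele frequency
    keep = (p > 0) & (p < 1)                        # drop monomorphic SNPs (zero variance)
    Z = (G[:, keep] - 2 * p[keep]) / np.sqrt(2 * p[keep] * (1 - p[keep]))
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    return U[:, :n_components] * S[:n_components]   # coordinates of each individual

rng = np.random.default_rng(7)
n_snps = 500
# Two hypothetical populations with independently drawn allele frequencies
freq_a = rng.uniform(0.1, 0.9, n_snps)
freq_b = rng.uniform(0.1, 0.9, n_snps)
G = np.vstack([rng.binomial(2, freq_a, size=(50, n_snps)),
               rng.binomial(2, freq_b, size=(50, n_snps))])

pcs = genotype_pca(G)
```

When one axis of ancestry differs strongly between groups, it takes over PC1 in exactly this way, which is the problem described for European/Native American admixture in the Mexican samples.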

The dataset that we use for this example consists of single nucleotide polymorphisms (SNPs) from the genomes of individuals from five states in Mexico, collected in a previous study28. Mexican ancestry is challenging to analyze using PCA since the PCs usually do not reflect geographic origin within Mexico; instead, they reflect the proportion of European/Native American heritage of each Mexican individual, which dominates and obscures differences due to geographic origin within Mexico (see Fig. 4a). To overcome this problem, population geneticists manually prune SNPs, removing those known to derive from European ancestry, before applying PCA. However, this procedure is of limited applicability since it requires knowing the origin of the SNPs and requires the source of background variation to be very different from the variation of interest, which is often not the case.

Relationship between Mexican ancestry groups. a PCA applied to genetic data from individuals from 5 Mexican states does not reveal any visually discernible patterns in the embedded data. b cPCA applied to the same dataset reveals patterns in the data: individuals from the same state are clustered closer together in the cPCA embedding. c Furthermore, the distribution of the points reveals relationships between the groups that match the geographic location of the different states: for example, individuals from geographically adjacent states are adjacent in the embedding. c Adapted from a map of Mexico that is originally the work of User:Allstrak at Wikipedia, published under a CC-BY-SA license, sourced from https://commons.wikimedia.org/wiki/File:Mexico_Map.svg

As an alternative, we use cPCA with a background dataset that consists of individuals from Mexico and from Europe. This background is dominated by Native American/European variation, allowing us to isolate the intra-Mexican variation in the target dataset. The results of applying cPCA are shown in Fig. 4b. We find that individuals from the same state in Mexico are embedded closer together. Furthermore, the two groups that are the most divergent are the Sonorans and the Mayans from Yucatan, which are also the most geographically distant within Mexico, while Mexicans from the other three states are close to each other, both geographically as well as in the embedding captured by cPCA (see Fig. 4c). See also Supplementary Fig. 6 for more details.

So, by using a background dataset, cPCA discovers patterns in a single target dataset that standard dimensionality reduction techniques miss. Maybe useful for some prehistoric populations, too…

They have released a Python implementation of cPCA on GitHub: https://github.com/abidlabs/contrastive, including Python notebooks and datasets.
