A history of male migration in and out of the Green Sahara

Open access research highlight A history of male migration in and out of the Green Sahara, by Yali Xue, Genome Biology (2018) 19:30, on the recent paper by D’Atanasio et al.

Insights from the Green Saharan Y-chromosomal findings (emphasis mine):

It is widely accepted that sub-Saharan Y chromosomes are dominated by E-M2 lineages carried by Bantu-speaking farmers as they expanded from West Africa starting < 5 kya, reaching South Africa within recent centuries [4]. The E-M2-Bantu lineages lie phylogenetically within the E-M2-Green Sahara lineage and show at least three explosive lineage expansions beginning 4.9–5.3 kya [5] (Fig. 1a). These events of E-M2-Bantu expansion are slightly later than the R-V88 expansion, and highlight the range of male demographic changes in the mid-Holocene. North of the Sahara, in addition to the four trans-Saharan haplogroups, haplogroup E-M81 (which diverged from E-M78 ~ 13 kya) became very common in present-day populations as a result of another massive expansion ~ 2 kya [6] (Fig. 1a).

Simplified Y-chromosomal phylogeny and inferred past or observed present-day distribution of relevant Y-chromosomal lineages. a Calibrated phylogenetic tree of Y-chromosomal lineages discussed in the text. Green shading represents the period when the present-day Sahara Desert was green and fertile. Lineages represented by filled pentagons have undergone very rapid expansions. b [featured image] The Green Sahara period 5–12 kya. Green shading indicates that the present-day Sahara Desert was green and fertile. The colors within the large oval represent the four Y-chromosomal haplogroups deduced to be present in the region at this time; specific locations are not implied. The arrows indicate the inferred origins of these haplogroups to the north or south, but specific origins and routes are not implied. c The present-day distributions of the four Green Saharan Y-chromosomal haplogroups. Yellow shading indicates the Sahara Desert. Each circle represents a sampled population, with the presence or absence of the four Green Saharan haplogroups shown by the colored sectors; other haplogroups may also be present in these populations, but are not shown. The small arrows indicate the inferred northwards and southwards movements of these haplogroups when the Sahara became uninhabitable.

Although Y chromosomes exist within populations and so share and reflect the general history of those populations, they can sometimes show some departures from other parts of the genome that result from differences in male and female behaviors. D’Atanasio et al. [1] highlight one such contrast in their study. Present-day North African populations show substantial sub-Saharan autosomal and mtDNA genetic components ascribed to the Roman and Arab slave trades 1–2 kya [7], but carry few sub-Saharan Y lineages from this source, probably reflecting the smaller numbers of male slaves and their reduced reproductive opportunities when compared to those of female slaves. The sub-Saharan Y chromosomes in these North African populations thus originate predominantly from the earlier Green Sahara period.

In this part of Africa, the indigenous languages that are spoken belong to three of the four African linguistic families (Afro-Asiatic, Nilo-Saharan and Niger-Congo). Interestingly, these languages show non-random associations with Y lineages. For example, Chadic languages within the Afro-Asiatic family are associated with haplogroup R-V88, whereas Nilo-Saharan languages are associated with specific sublineages within A3-M13 and E-M78, further illustrating the complex human history of the region.

The main question after D’Atanasio et al. (2018) is thus:

(…) what are the reasons for the very rapid R-V88 expansion 5–6 kya [1] and E-M81 expansion ~ 2 kya [6], and how do these expansions fit within general worldwide patterns of male-specific expansions, which in other cases have been linked to cultural and technological changes [5]?

I think that the only known haplogroup expansion that might, as of today, fit the spread and dialectalization of Afroasiatic, a proto-language probably contemporaneous with or slightly older than Middle Proto-Indo-European, is that of R1b-V88 lineages. However, without ancient DNA samples to corroborate this, we cannot be sure.

Genetic ancestry of Hadza and Sandawe peoples reveals ancient population structure in Africa

Open access paper Genetic Ancestry of Hadza and Sandawe Peoples Reveals Ancient Population Structure in Africa, by Shriner, Tekola-Ayele, Adeyemo, & Rotimi, GBE (2018).

Abstract (emphasis mine):

The Hadza and Sandawe populations in present-day Tanzania speak languages containing click sounds and therefore thought to be distantly related to southern African Khoisan languages. We analyzed genome-wide genotype data for individuals sampled from the Hadza and Sandawe populations in the context of a global data set of 3,528 individuals from 163 ethno-linguistic groups. We found that Hadza and Sandawe individuals share ancestry distinct from and most closely related to Omotic ancestry; share Khoisan ancestry with populations such as ≠Khomani, Karretjie, and Ju/’hoansi in southern Africa; share Niger-Congo ancestry with populations such as Yoruba from Nigeria and Luhya from Kenya, consistent with migration associated with the Bantu Expansion; and share Cushitic ancestry with Somali, multiple Ethiopian populations, the Maasai population in Kenya, and the Nama population in Namibia. We detected evidence for low levels of Arabian, Nilo-Saharan, and Pygmy ancestries in a minority of individuals. Our results indicate that west Eurasian ancestry in eastern Africa is more precisely the Arabian parent of Cushitic ancestry. Relative to the Out-of-Africa migrations, Hadza ancestry emerged early whereas Sandawe ancestry emerged late.


In the Hadza population, the distribution of Y chromosomes includes mostly B2 haplogroups, with a smaller number of E1b1a haplogroups, which are common in Niger-Congo-speaking populations, and E1b1b haplogroups, which are common in Cushitic populations (Tishkoff, et al. 2007). In the Sandawe population, E1b1a and E1b1b haplogroups are more common, with lower frequencies of B2 and A3b2 haplogroups (Tishkoff, et al. 2007).

We found that Hadza ancestry diverged early, rather than late. We found evidence for contributions of Cushitic and Niger-Congo ancestries in Tanzania, consistent with the movements of herding and cultivating Cushitic speakers ~4,000 years ago and agricultural Niger-Congo speakers ~2,500 years ago (Newman 1995). However, we did not find evidence of a substantial contribution of Nilo-Saharan ancestry that might have resulted from movement of pastoralist Nilo-Saharan speakers (Newman 1995). We also identified west Eurasian ancestry in eastern and southern African populations more precisely as the Arabian parent of Cushitic ancestry. Finally, our ancestry analyses support the hypothesis that Omotic, Hadza, and Sandawe languages group together, rather than Omotic languages belonging to the Afroasiatic family and Hadza and Sandawe languages belonging to the Khoisan family.

I don’t like linguistic assumptions drawn from admixture analysis, especially from scarce modern samples, as in this case.

Nevertheless, these papers may help clarify the different nature of Omotic and Cushitic among Afroasiatic languages, and thus place the origin of Afroasiatic either:

a) To the east, with the traditionalist Afroasiatic – Semitic/Hamitic homeland association.

Expansion of Afroasiatic

b) To the west, near modern Chadic languages (associated with the expansion of R1b-V88 subclades through a Green Sahara), as I suggested.


The demographic history and mutational load of African hunter-gatherers and farmers


Interesting new article (behind paywall), The demographic history and mutational load of African hunter-gatherers and farmers, Nat Ecol Evol (2018).

Abstract (emphasis mine):

Understanding how deleterious genetic variation is distributed across human populations is of key importance in evolutionary biology and medical genetics. However, the impact of population size changes and gene flow on the corresponding mutational load remains a controversial topic. Here, we report high-coverage exomes from 300 rainforest hunter-gatherers and farmers of central Africa, whose distinct subsistence strategies are expected to have impacted their demographic pasts. Detailed demographic inference indicates that hunter-gatherers and farmers recently experienced population collapses and expansions, respectively, accompanied by increased gene flow. We show that the distribution of deleterious alleles across these populations is compatible with a similar efficacy of selection to remove deleterious variants with additive effects, and predict with simulations that their present-day additive mutation load is almost identical. For recessive mutations, although an increased load is predicted for hunter-gatherers, this increase has probably been partially counteracted by strong gene flow from expanding farmers. Collectively, our predicted and empirical observations suggest that the impact of the recent population decline of African hunter-gatherers on their mutation load has been modest and more restrained than would be expected under a fully recessive model of dominance.

“Inferred demographic models of the studied populations. a, EUR-first branching model, in which ancestors of EUR (aEUR) diverged from African populations before the divergence of the ancestors of RHG (aRHG) and AGR (aAGR). b, RHG-first branching model, in which aRHG were the first to diverge from the other groups. c, AGR-first branching model, in which aAGR were the first to diverge from the other groups. We assumed an ancient change in the size of the ancestral population of all humans (ANC). We assumed that each subsequent divergence of populations was followed by an instantaneous change in the effective population size (Ne). We also assumed that there were two epochs of migration between the following population pairs: wAGR/aAGR and wRHG/aRHG, eAGR/aAGR and eRHG/aRHG, and EUR and eAGR/aAGR. The figure labels correspond to the parameters of the model estimated by maximum likelihood and the 95% confidence intervals assessed by bootstrapping by site 100 times (Supplementary Table 4). Vertical arrow corresponds to the direction of time, from past to present, with divergence times given on the left and expressed in thousand years ago (ka). Effective population sizes (N) are given within the diagram and expressed in thousands of individuals. Bold horizontal arrows indicate an estimated parameter for the effective strength of migration 2Nm > 1, while thin horizontal arrows indicate 2Nm ≤ 1.”
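
To make the two conventions in that caption concrete, here is a minimal Python sketch. It is not the authors' inference procedure: it assumes that a percentile interval is taken over the bootstrap replicates (the caption only says that the 95% intervals come from 100 bootstraps by site), and the function names and any input values are hypothetical, chosen only for illustration.

```python
def arrow_style(n_e, m):
    """Classify migration strength using the caption's rule.

    n_e: effective population size in individuals (not thousands).
    m:   migration rate per generation.
    Returns "bold" when 2*N*m > 1, otherwise "thin".
    """
    return "bold" if 2 * n_e * m > 1 else "thin"


def percentile_interval(replicates, level=0.95):
    """Percentile bootstrap interval over replicate estimates.

    Assumption: a plain percentile interval with linear interpolation
    between order statistics; the paper may have used another method.
    """
    vals = sorted(replicates)
    if not vals:
        raise ValueError("need at least one bootstrap replicate")

    def at(q):
        # Position of quantile q on the 0..len-1 index scale.
        pos = q * (len(vals) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(vals) - 1)
        return vals[lo] + (pos - lo) * (vals[hi] - vals[lo])

    tail = (1 - level) / 2
    return at(tail), at(1 - tail)
```

With 100 replicates, as in the caption, the interval bounds fall between the 3rd and 4th smallest and between the 97th and 98th smallest estimates. A pair with 2Nm exactly equal to 1 is drawn thin, matching the "2Nm ≤ 1" wording.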
