Villabruna cluster in Late Epigravettian Sicily supports South Italian corridor for R1b-V88

New preprint Late Upper Palaeolithic hunter-gatherers in the Central Mediterranean: new archaeological and genetic data from the Late Epigravettian burial Oriente C (Favignana, Sicily), by Catalano et al. bioRxiv (2019).

Interesting excerpts (emphasis mine):

Grotta d’Oriente is a small coastal cave located on the island of Favignana, the largest (~20 km2) of a group of small islands forming the Egadi Archipelago, ~5 km from the NW coast of Sicily.

The Oriente C funeral pit opens in the lower portion of layer 7, specifically sublayer 7D. Two radiocarbon dates on charcoal from the sublayers 7D (12149±65 uncal. BP) and 7E (12132±80 uncal. BP) are consistent with the associated Late Epigravettian lithic assemblages (Lo Vetro and Martini, 2012; Martini et al., 2012b) and refer the burial to a period between about 14200-13800 cal. BP, when Favignana was connected to the main island (Agnesi et al., 1993; Antonioli et al., 2002; Mannino et al., 2014).

sicily-grotta-oriente
A-B) Geographic location of Grotta d’Oriente.

The anatomical features of Oriente C are close to those of Late Upper Palaeolithic populations of the Mediterranean and show strong affinity with other Palaeolithic individuals of Sicily. As suggested by Henke (1989) and Fabbri (1995) the hunter-gatherer populations were morphologically rather uniform.

Genetic analysis

We confirmed the originally reported mitochondrial haplogroup assignment of U2’3’4’7’8’9. This haplogroup is present in both pre- and post-LGM populations, but is rare by the Mesolithic, when U5 dominates (Posth et al., 2016).

Lipson et al. (2018) (their supplementary Figure S5.1) and Villalba-Mouco et al. (2019) (their Figure 2A) showed that European Late Palaeolithic and Mesolithic hunter-gatherers fall along two main axes of genetic variation. Multidimensional scaling (MDS) of f3-statistics shows that these axes form a “V” shape (Fig. 3). (…)
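
NOTE. For readers who want to see what such an MDS does in practice, here is a minimal sketch of classical (Torgerson) MDS in plain Python. It is not the preprint's pipeline: the conversion of outgroup f3 into a distance as 1 - f3, the helper names (`f3_to_distance`, `classical_mds`) and the toy values are all assumptions made for illustration.

```python
import math


def f3_to_distance(f3):
    """Turn a symmetric matrix of outgroup f3 values into distances.

    Assumption: distance = 1 - f3 (more shared drift, smaller distance),
    with the diagonal forced to zero.
    """
    n = len(f3)
    return [[0.0 if i == j else 1.0 - f3[i][j] for j in range(n)]
            for i in range(n)]


def _matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def classical_mds(dist, dims=2, iters=500):
    """Classical MDS: double-centre squared distances, then take the top
    eigenvectors (found here by power iteration with deflation)."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in sq]
    grand = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        v = [1.0 + 0.1 * i for i in range(n)]  # arbitrary non-constant start
        for _ in range(iters):
            w = _matvec(b, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # nothing left to extract on this axis
                break
            v = [x / norm for x in w]
        length = math.sqrt(sum(x * x for x in v))
        u = [x / length for x in v]
        lam = sum(x * y for x, y in zip(u, _matvec(b, u)))  # Rayleigh quotient
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][d] = u[i] * scale
        # Deflate so the next pass finds the next eigenvector.
        b = [[b[i][j] - lam * u[i] * u[j] for j in range(n)] for i in range(n)]
    return coords


# Toy outgroup f3 values for four hypothetical individuals (not real data).
toy_f3 = [
    [0.00, 0.30, 0.25, 0.24],
    [0.30, 0.00, 0.26, 0.24],
    [0.25, 0.26, 0.00, 0.29],
    [0.24, 0.24, 0.29, 0.00],
]
for label, xy in zip("ABCD", classical_mds(f3_to_distance(toy_f3))):
    print(label, [round(c, 3) for c in xy])
```

Individuals that share more drift end up closer together on the two axes; the orientation and sign of the axes are arbitrary, so only relative positions should be read from such a plot.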

Focusing further on Oriente C, we find that it shares most drift with individuals from Northern Italy, Switzerland and Luxembourg, and less with individuals from Iberia, Scandinavia, and East and Southeast Europe (Fig. 4A-B). Shared drift decreases significantly with distance (Fig. 4C) and with time (Fig. 4D) although in a linear model of drift with distance and time as a covariate, only distance (p=1.3×10⁻⁶) and not time (p=0.11) is significant. Consistent with the overall E-W cline in hunter-gatherer ancestry, genetic distance to Oriente C increases more rapidly with longitude than latitude, although this may also be affected by geographic features. For example, Oriente C shares significantly more drift with the 8,000 year-old 1,400 km distant individual from Loschbour in Luxembourg (Lazaridis et al., 2014), than with the 9,000 year old individual from Vela Spila in Croatia (Mathieson et al., 2018) only 700 km away as shown by the D-statistic (Patterson et al., 2012) D(Mbuti, Oriente C, Vela Spila, Villabruna); Z=3.42. Oriente C’s heterozygosity was slightly lower than Villabruna (14% lower at 1240k transversion sites), but this difference is not significant (bootstrap P=0.12).
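
NOTE. For those unfamiliar with how a D-statistic and its Z-score are obtained, the following is a rough sketch, assuming the allele-frequency form of D as I understand it from Patterson et al. (2012) and a simple leave-one-block-out jackknife. The function names, the random toy frequencies, and the use of equal-sized contiguous blocks are my own assumptions; real tools define blocks by genomic position and weight them by SNP count.

```python
import math
import random


def d_statistic(freqs):
    """D(W, X; Y, Z) from per-SNP allele frequencies (w, x, y, z).

    With this sign convention, D > 0 means X shares more alleles with Z
    than with Y (taking W as the outgroup).
    """
    num = sum((w - x) * (y - z) for w, x, y, z in freqs)
    den = sum((w + x - 2 * w * x) * (y + z - 2 * y * z)
              for w, x, y, z in freqs)
    return num / den


def jackknife_z(freqs, n_blocks):
    """Leave-one-block-out jackknife; returns (D, standard error, Z)."""
    size = len(freqs) // n_blocks
    used = freqs[:size * n_blocks]  # drop the remainder to keep blocks equal
    d_all = d_statistic(used)
    leave_one_out = []
    for b in range(n_blocks):
        kept = used[:b * size] + used[(b + 1) * size:]
        leave_one_out.append(d_statistic(kept))
    mean = sum(leave_one_out) / n_blocks
    var = (n_blocks - 1) / n_blocks * sum((d - mean) ** 2
                                          for d in leave_one_out)
    se = math.sqrt(var)
    return d_all, se, d_all / se


# Random toy frequencies (not real data): no true signal, so |Z| should be small.
rng = random.Random(0)
toy_snps = [tuple(rng.random() for _ in range(4)) for _ in range(1000)]
d, se, z = jackknife_z(toy_snps, n_blocks=20)
print(f"D = {d:.4f}, SE = {se:.4f}, Z = {z:.2f}")
```

A |Z| above roughly 3 is the threshold usually treated as significant, which is why a value like Z=3.42 is reported as evidence of asymmetric shared drift.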

oriente-c-villabruna-f3-statistics
Multidimensional scaling of outgroup f3-statistics for Late Upper Palaeolithic and Mesolithic hunter-gatherers.

Discussion and Conclusion

The robust record of radiocarbon dates proves that they reached Sicily not before 15-14 ka cal. BP, several millennia after the LGM peak. In our opinion, in fact, the hypothesis about an early colonization of Sicily by Aurignacians (Laplace, 1964; Chilardi et al., 1996) must be rejected, on the basis of a recent reinterpretation of the techno-typological features of the lithic industries from Riparo di Fontana Nuova (Martini et al., 2007; Lo Vetro and Martini, 2012; on this topic see also Di Maida et al., 2019).

These analyses have implications for understanding the origin and diffusion of the hunter-gatherers that inhabited Europe during the Late Upper Palaeolithic and Mesolithic. Our findings indicate that Oriente C shows a strong genetic relationship with Western European Late Upper Palaeolithic and Mesolithic hunter-gatherers, suggesting that the “Western hunter-gatherers” was a homogeneous population widely distributed in the Central Mediterranean, presumably as a consequence of continuous gene flow among different groups, or a range expansion following the LGM.

shared-drift-whg-villabruna-oriente-c
The same statistic as in A plotted with geographic position

The South Italian corridor

Once again, a hypothesis based on phylogeography – apart from scarce archaeological and palaeolinguistic data (“Semitic”-like topo-hydronymy and substrates in Europe) – seems to be confirmed step by step. Ever since the Villabruna individual was found to carry hg. R1b-L754 (likely R1b-V88, like the south-eastern European lineages that expanded with WHG ancestry), it has seemed quite likely that southern Europe would turn out to be the origin of the expansion of R1b-V88 into Africa.

The most likely explanation for the presence of “archaic” R1b-V88 subclades among modern Sardinians was, therefore, that they represented a remnant from a Late Upper Palaeolithic/Early Mesolithic population that had not been replaced in subsequent migrations, and thus that the migration of these lineages into Northern Africa and the Green Sahara happened during a period when Italy was connected by a shallower Mediterranean (and more land connections) to Northern Africa.

late-epigravettian
Likely Late Epigravettian/Mesolithic expansion of R1b-V88 into Northern Africa. See full map.

Nevertheless, the arguments for a quite recent expansion of R1b-V88 through the Mediterranean and into Africa keep being repeated, probably based on ancestry from the few ancient (and many modern) populations that have been investigated to date, a simplistic approach prone to serious errors that propagate through whole migration models.

For example, in the recent paper by Marcus et al. (2019) the presence of these lineages among ancient Sardinians (from the late 4th millennium BC on) is interpreted as an expansion of R1b-V88 with the Cardial Neolithic based on their ancestry, disregarding the millennia-long gap between these samples and the presence of this haplogroup in Palaeolithic/Mesolithic Northern Iberia and Northern Italy, and the comparatively much earlier splits in the phylogenetic tree and dispersal among African populations.

Afroasiatic and Nostratic

I was asked recently if I really believed that we could reconstruct Proto-Nostratic and connect it with any ancestral population. My answer is simple: until the Chalcolithic – when the whole picture of Indo-Europeans, Uralians, Egyptians or Semites becomes quite clear – we have just very few (linguistic, archaeological, genetic) dots which we would like to connect, and we do so the best we can. The earlier the population and proto-language, the more difficult this task becomes.

NOTE. 1) I tentatively connected hg. R with Nostratic in a previous text – when it appeared that R1a expanded from around Lake Baikal, hence Eurasiatic; R1b from the south with AME-WHG ancestry, hence Afroasiatic; and R2 with Dravidian.

2) After that, I thought it was more likely to be connected to AME ancestry and the Middle East, because of the apparent expansion of WHG from south-eastern Europe, and the potential association of Afroasiatic and (Elamo-?)Dravidian to Middle Eastern populations.

3) However, after finding more and more R1b samples expanding through northern Eurasia, spreading through the (then wider) steppe regions; and R1a essentially surviving among other groups in eastern Europe for thousands of years without being associated to significant migrations (like, say, hg. C after the Palaeolithic), it didn’t seem like this division was accurate, hence my most recent version.

But, in essence, it’s all about connecting the dots, and we have very few of them…

eurasiatic-phylum-ultraconserved-words
Phylogenetic tree from Pagel et al. (2013), partially in agreement with Kortlandt’s view on Eurasiatic. “Consensus phylogenetic tree of Eurasiatic superfamily (A) superimposed on Eurasia and (B) rooted tree with estimated dates of origin of families and of superfamily. (A) Unrooted consensus tree with branch lengths (solid lines) shown to scale and illustrating the correspondence between the tree and the contemporary north-south and east-west geographical positions of these language families. Abbreviations: P (proto) followed by initials of language family: PD, proto-Dravidian; PK, proto-Kartvelian; PU, proto-Uralic; PIE, proto–Indo-European; PA, proto-Altaic; PCK, proto–Chukchi-Kamchatkan; PIY, proto–Inuit-Yupik. The dotted line to PIY extends the inferred branch length into the area in which Inuit-Yupik languages are currently spoken: it is not a measure of divergence. The cross-hatched line to PK indicates that branch has been shortened (compare with B). The branch to proto-Dravidian ends in an area that Dravidian populations are thought to have occupied before the arrival of Indo-Europeans (see main text). (B) Consensus tree rooted using proto-Dravidian as the outgroup. The age at the root is 14.45 ± 1.75 kya (95% CI = 11.72–18.38 kya) or a slightly older 15.61 ± 2.29 kya (95% CI = 11.72–20.40 kya) if the tree is rooted with proto-Kartvelian. The age assumes midpoint rooting along the branch leading to proto-Dravidian (rooting closer to PD would produce an older root, and vice versa), and takes into account uncertainty around proto–Indo-European date of 8,700 ± 544 (SD) y following ref. 35 and the PCK date of 692 ± 67 (SD) y ago.”

In linguistics, I trust traditional linguists who tend to trust other more experimental linguists (like Hyllested or Kortlandt) who consider that – in their experience – an Indo-Uralic and a Eurasiatic phylum can be reconstructed. Similarly, linguists like Kortlandt are apparently (partially) supportive of attempts like that of Allan Bomhard with Nostratic – although almost everyone is critical of the Muscovite school’s attachment to the Brugmannian reconstruction, stuck in pre-laryngeal Proto-Indo-Anatolian and similar archaisms.

I mostly use Nostratic as a way to give a simplistic ethnolinguistic label to the genetically related prehistoric peoples whose languages we will probably never know. I think it’s becoming clear that the strongest connection right now with the expansion of potential Eurasiatic dialects is offered by ANE-related populations (hence Y-chromosome bottlenecks under hg. R, Q, probably also N), however complicated the reconstruction of that hypothetical community (and its dialectalization) may be.

Therefore, the multiple expansions of lineages more or less closely associated to ANE-related peoples – like R1b-V88 in the case of Afrasian, or R2 in the case of Dravidians – are the easiest to link to the traditionally described Nostratic dialects and their highly hypothetical relationship.

green-sahara-neolithic
Reconstruction of North African vegetation during past green Sahara periods. Estimated and reconstructed MAP for the Holocene GSP (6–10 kyr BP) projected onto a cross-section along the eastern Sahara (left panel) and map view of reconstructed MAP, vegetation and physiographic elements [7,8,11,45] (right panel). Image from Larrasoaña et al. (2013).

What should be clear to anyone is that the attempt of many modern Afroasiatic speakers to connect their language to their own (or their own community’s main) haplogroups, frequently E and/or J, is flawed for many reasons; it was simplistic in the 2000s, but it is absurd after the advent of ancient DNA investigation and more recent investigation on SNP mutation rates. R1b-V88 should have been on the table of discussions about the expansion of Afroasiatic communities through the Green Sahara long ago, whether one supports a Nostratic phylum or not.

The fact that the role of R1b bottlenecks and expansions in the spread of Afroasiatic is usually not even discussed despite their likely connection with the most recent population expansions through the Green Sahara fitting a reasonable time frame for Proto-Afroasiatic reconstruction, a reasonable geographical homeland, and a compatible dialectal division – unlike many other proposed (E or J) subclades – reveals (once again) a lot about the reasons behind amateur interest in genetics.

The same can be seen in the fixation in (and immobility of) recent writings about the role of I1, I2, or (more recently) R1a in the Proto-Indo-European expansion, R1b with Vasconic, or N1c with Proto-Uralic.

NOTE. That evident interest notwithstanding, it is undeniable that we have a much better understanding of the expansions of R1b subclades than other haplogroups, probably due in great part to the easier recovery of ancient DNA from Eurasia (and Europe in particular), for many different – sociopolitical, geographical, technological – reasons. It is quite possible that a more thorough temporal transect of ancient DNA from the Middle East and Africa might radically change our understanding of population movements, especially those related to the Afroasiatic expansion. I am referring in this post to interpretations based on the data we currently have, despite that potential R1b-based bias.

A Song of Sheep and Horses, revised edition, now available as printed books

As I said 6 months ago, 2019 is a tough year to write a blog, because this was going to be a complex regional election year and therefore a time of political promises, hence tenure offers too. Now the preliminary offers have been made, elections have passed, but the timing has slightly shifted toward 2020. So I may have the time, but not really any benefit of dedicating too much effort to the blog, and a lot of potential benefit of dedicating any time to evaluable scientific work.

On the other hand, I saw some potential benefit for publishing texts with ISBNs, hence the updates to the text and the preparation of these printed copies of the books, just in case. While Spain’s accreditation agency has some hard rules for becoming a tenured professor, especially for medical associates (whose years of professional experience are almost worthless compared to published peer-reviewed papers), it is quite flexible in assessing one’s merits.

However, regional and/or autonomous entities are not, and need an official identifier and preferably printed versions to evaluate publications, such as an ISBN for books. I thus took some time about a month ago to update the texts and supplementary materials, to publish a printed copy of the books with Amazon. The first copies have arrived, and they look good.

Corrections and Additions

Titles
I have changed the names and order of the books, as I intended for the first publication – as some of you may have noticed when the linguistic book was referred to as the third volume in some parts. In the first concept I just wanted to emphasize that the linguistic work had priority over the rest. Now the whole series and the linguistic volume don’t share the same name, and I hope this added clarity is for the better, despite the linguistic volume being the third one.

Uralic dialects
I have changed the nomenclature for Uralic dialects, as I said recently. I haven’t really modified anything deeper than that, because – unlike adding new information from population genomics – this would require me to do thorough research into the most recent publications on Uralic comparative grammar, and I just can’t begin with that right now.

Anyway, the use of terms like Finno-Ugric or Finno-Samic is as correct now for the reconstructed forms as it was before the change in nomenclature.

Mediterranean
The most interesting recent genetic data has come from Iberia and the Mediterranean. Lacking direct data from the Italian Peninsula (and thus from the emergence of the Etruscan and Rhaetian ethnolinguistic community), it is becoming clearer how some quite early waves of Indo-Europeans and non-Indo-Europeans expanded and shrank – at least in West Iberia, West Mediterranean, and France.

Finno-Ugric
Some of the main updates to the text have been made to the sections on Finno-Ugric populations, because some interesting new genetic data (especially Y-DNA) have been published in the past months. This is especially true for Baltic Finns and for Ugric populations.

Balto-Slavic
Consequently, and somewhat unsurprisingly, the Balto-Slavic section has been affected by this; e.g. by the identification of Early Slavs likely with central-eastern populations dominated by (at least some subclades of) hg. I2a-L621 and E1b-V13.

Maps
I have updated some cultural borders in the prehistoric maps, and the maps with Y-DNA and mtDNA. I have also added one new version of the Early Bronze age map, to better reflect the most likely location of Indo-European languages in the Early European Bronze Age.

As those in software programming will understand, major changes in the files that are used for maps and graphics come with an increasing risk of additional errors, so I would not be surprised if some major ones were found (I already spotted three of them). Feel free to communicate these errors in any way you see fit.

bronze-age-early-indo-european
European Early Bronze Age: tentative language map based on linguistics, archaeology, and genetics.

SNPs
I have selected more conservative SNPs in certain controversial cases.

I have also deleted most SNP-related footnotes and replaced them with the marking of each individual tentative SNP, leaving only those footnotes that give important specific information, because:

  • My way of referencing tentative SNP authors did not make it clear which samples were tentative, if there was more than one.
  • It was probably not necessary to see four names repeated 100 times over.
  • Often I don’t really know if the person I have listed as author of the SNP call is the true author – unless I saw the full SNP data posted directly – or just someone who reposted the results.
  • Sometimes there is more than one author of SNPs for a certain sample, but I might have added just one for all.

ancient-dna-all
More than 6000 ancient DNA samples compiled to date.

For a centralized file to host the names of those responsible for the unofficial/tentative SNPs used in the text (and to correct them if necessary), readers will eventually be able to use Phylogeographer’s tool for ancient Y-DNA, for which they use (partly) the same data I compiled, adding Y-Full’s nomenclature and references. You can see another map tool in ArcGIS.

NOTE. As I say in the text, if the final working map tool does not deliver the names, I will publish another supplementary table to the text, listing all tentative SNPs with their respective author(s).

If you are interested in ancient Y-DNA and you want to help develop comprehensive and precise maps of ancient Y-DNA and mtDNA haplogroups, you can contact Hunter Provyn at Phylogeographer.com. You can also find more about phylogeography projects at Iain McDonald’s website.

Graphics
I have also added more samples to both the “Asian” and the “European” PCAs, and to the ADMIXTURE analyses, too.

I previously used certain samples prepared by amateurs from BAM files (like Botai, Okunevo, or Hittites), and the results were obviously less than satisfactory – hence my criticism of the lack of publication of prepared files by the most famous labs, especially the Copenhagen group.

Fortunately for all of us, most published datasets are free, so we don’t have to reinvent the wheel. I criticized genetic labs for not releasing all data, so now it is time for praise, at least for one of them: thank you to all responsible at the Reich Lab for this great merged dataset, which includes samples from other labs.

NOTE. I would like to make my tiny contribution here, for beginners interested in working with these files, so I will update – whenever I have time – the “How To” sections of this blog for PCAs, PCA3d, and ADMIXTURE.

-iron-age-europe-romans
Detail of the PCA of European Iron Age populations. See full versions.

ADMIXTURE
For unsupervised ADMIXTURE in the maps, a K=5 is selected based on the CV, giving a kind of visual WHG : NWAN : CHG/IN : EHG : ENA, but with Steppe ancestry “in between”. Higher K gave worse CV, which I guess depends on the many ancient and modern samples selected (and on the fact that many samples are repeated from different sources in my files, because I did not have time to filter them all individually).
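
NOTE. For beginners wondering what “selected based on the CV” means in practice: when ADMIXTURE is run with the `--cv` flag, each run logs a line of the form `CV error (K=5): ...`, and the K with the lowest error is usually preferred. The snippet below is only a sketch of that selection step; the function name `best_k` and the error values in `example_log` are made up for illustration and are not my actual results.

```python
import re

# Matches the cross-validation line that ADMIXTURE writes when run with --cv.
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9.]+)")


def best_k(log_lines):
    """Return (K with the lowest CV error, {K: error}) from log lines."""
    errors = {}
    for line in log_lines:
        match = CV_LINE.search(line)
        if match:
            errors[int(match.group(1))] = float(match.group(2))
    if not errors:
        raise ValueError("no CV error lines found")
    return min(errors, key=errors.get), errors


# Hypothetical values, e.g. collected with: grep -h CV log*.out
example_log = [
    "CV error (K=3): 0.4120",
    "CV error (K=4): 0.4051",
    "CV error (K=5): 0.4013",
    "CV error (K=6): 0.4027",
    "CV error (K=7): 0.4066",
]
k, errors = best_k(example_log)
print(f"lowest CV error at K={k}")
```

The lowest CV error is a heuristic for choosing a visualization, not proof that K real ancestral populations existed, which is also why I avoid reading too much into individual components.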

I found some interesting component shared by Central European populations in K=7 to K=9 (from CEU Bell Beakers to Denmark LN to Hungarian EBA to Iberia BA, in a sort of “CEU BBC ancestry” potentially related to North-West Indo-Europeans), but still, I prefer to go for a theoretically more correct visualization instead of cherry-picking the ‘best-looking’ results.

Since I made fun of the search for “Siberian ancestry” in coloured components in Tambets et al. 2018, I have to be consistent and prefer to avoid doing the same here…

qpAdm
In the first publication (in January) and subsequent minor revisions until March, I trusted analyses and ancestry estimates reported by amateurs in 2018, which I used for the text adding my own interpretations. Most of them have been refuted in papers from 2019, as you probably know if you have followed this blog (see very recent examples here, here, or here), compelling me to delete or change them again, and again, and again. I don’t have experience from previous years, although the current pattern must have been evidently repeated many times over, or else we would be still talking about such previous analyses as being confirmed today…

I wanted to be one step ahead of peer-reviewed publications in the books, but I prefer now to go for something safe in the book series, rather than having one potentially interesting prediction – which may or may not be right – and ten huge mistakes that I would have helped to endlessly redistribute among my readers (online and now in print) based on some cherry-picked pairwise comparisons. This is especially true when predictions of “Steppe”- and/or “Siberian”-related ancestry have been published, which, for some reason, seem to go horribly wrong most of the time.

I am sure whole books can be written about why and how this happened (and how this is going to keep happening), based on psychology and sociology, but the reasons are irrelevant, and that would be a futile effort; like writing books about glottochronology and its intermittent popularity due to misunderstood scientific trends. The most efficient way to deal with this problem is to avoid such information altogether, because – as you can see in the current revised text – it wouldn’t really add anything essential to the content of these books, anyway.

Official site of the book series:
A Song of Sheep and Horses: eurafrasia nostratica, eurasia indouralica

Fulani from Cameroon show ancestry similar to Afroasiatic speakers from East Africa

Open access African evolutionary history inferred from whole genome sequence data of 44 indigenous African populations, by Fan et al. Genome Biology (2019) 20:82.

Interesting excerpts (emphasis mine):

Introduction

To extend our knowledge of patterns of genomic diversity in Africa, we generated high coverage (> 30×) genome sequencing data from 43 geographically diverse Africans originating from 22 ethnic groups, representing a broad array of ethnic, linguistic, cultural, and geographic diversity (Additional file 1: Table S1). These include a number of populations of anthropological interest that have never previously been characterized for high-coverage genome sequence diversity such as Afroasiatic-speaking El Molo fishermen and Nilo-Saharan-speaking Ogiek hunter-gatherers (Kenya); Afroasiatic-speaking Aari, Agaw, and Amhara agro-pastoralists (Ethiopia); Niger-Congo-speaking Fulani pastoralists (Cameroon); Nilo-Saharan-speaking Kaba (Central African Republic, CAR); and Laka and Bulala (Chad) among others. We integrated this data with 49 whole genome sequences generated as part of the Simons Genome Diversity Project (SGDP) [14] (…)

afroasiatic-samples
Locations of samples included in this study. Each dot is an individual and the color indicates the language classification

Results and discussion

We found that the CRHG populations from central Africa, including the Mbuti from the Democratic Republic of Congo (DRC), Biaka from the CAR, and Baka, Bakola, and Bedzan from Cameroon, also form a basal lineage in the phylogeny. The other two hunter-gatherer populations, Hadza and Sandawe, living in Tanzania, group with populations from eastern Africa (Fig. 2). The two Nilo-Saharan-speaking populations, the Mursi from southern Ethiopia and the Dinka from southern Sudan, group into a single cluster, which is consistent with archeological data indicating that the migration of Nilo-Saharan populations to eastern Africa originated from a source population in southern Sudan in the last 3000 years [4, 23, 24, 25].

phylogenetic-relationship-africans
Phylogenetic relationship of 44 African and 32 west Eurasian populations determined by a neighbor joining analysis assuming no admixture. Here, the dots of each node represent bootstrap values and the color of each branch indicates language usage of each population. Human_AA human ancestral alleles

The Fulani people are traditionally nomadic pastoralists living across a broad geographic range spanning Sudan, the Sahel, Central, and Western Africa. The Fulani in our study, sampled from Cameroon, clustered with the Afroasiatic-speaking populations in East Africa in the phylogenetic analysis, indicating a potential language replacement from Afroasiatic to Niger-Congo in this population (Fig. 2). Prior studies suggest a complex history of the Fulani; analyses of Y chromosome variation suggest a shared ancestry with Nilo-Saharan and Afroasiatic populations [24], whereas mtDNA indicates a West African origin [26]. An analysis based on autosomal markers found traces of West Eurasian-related ancestry in this population [4], which suggests a North African or East African origin (as North and East Africans also have such ancestry likely related to expansions of farmers and herders from the Near East) and is consistent with the presence at moderate frequency of the −13,910T variant associated with lactose tolerance in European populations [15, 16].

Phylogenetic reconstruction of the relationship of African individuals under a model allowing for migration using TREEMIX [27] largely recapitulates the NJ phylogeny with the exception of the Fulani who cluster near neighboring Niger-Congo-speaking populations with whom they have admixed (Additional file 2: Figure S1). Interestingly, TREEMIX analysis indicates evidence for gene flow between the Hadza and the ancestors of the Ju|’hoan and Khomani San, supporting genetic, linguistic, and archeological evidence that Khoesan-speaking populations may have originated in Eastern Africa [28, 29, 30].

afroasiatic-niger-congo-admixture
ADMIXTURE analysis of 92 African and 62 West Eurasian individuals. Each bar is an individual and colors represent the proportion of inferred ancestry from K ancestral populations. The bottom bar shows the language classification of each individual. With the increasing of K, the populations are largely grouped by their current language usage

About the Fulani, this is what the referenced study of Y-chromosome variation among 15 Sudanese populations by Hassan et al. (2008) had to say:

  • Haplogroups A-M13 and B-M60 are present at high frequencies in Nilo-Saharan groups except Nubians, with low frequencies in Afro-Asiatic groups although notable frequencies of B-M60 were found in Hausa (15.6%) and Copts (15.2%).
  • Haplogroup E (four different haplotypes) accounts for the majority (34.4%) of the chromosomes and is widespread in the Sudan. E-M78 represents 74.5% of haplogroup E, the highest frequencies observed in Masalit and Fur populations. E-M33 (5.2%) is largely confined to Fulani and Hausa, whereas E-M2 is restricted to Hausa. E-M215 was found to occur more in Nilo-Saharan rather than Afro-Asiatic speaking groups.
  • In contrast, haplogroups F-M89, I-M170, J-12f2, and J-M172 were found to be more frequent in the Afro-Asiatic speaking groups. J-12f2 and J-M172 represent 94% and 6%, respectively, of haplogroup J with high frequencies among Nubians, Copts, and Arabs.
  • Haplogroup K-M9 is restricted to Hausa and Gaalien with low frequencies and is absent in Nilo-Saharan and Niger-Congo.
  • Haplogroup R-M173 appears to be the most frequent haplogroup in Fulani, and haplogroup R-P25 has the highest frequency in Hausa and Copts and is present at lower frequencies in north, east, and western Sudan.
  • Haplogroups A-M51, A-M23, D-M174, H-M52, L-M11, O-M175, and P-M74 were completely absent from the populations analyzed.

fulfulde-fulani-language
Image modified from “Fulfulde Language Family Report” Author: Annette Harrison; Cartographer: Irene Tucker; SIL International 2003.

This is what David Reich will talk about in the seminar Insights into language expansions from ancient DNA:

In this talk, I will describe how the new science of genome-wide ancient DNA can provide insights into past spreads of language and culture. I will discuss five examples: (1) the spread of Indo-European languages to Europe and South Asia in association with Steppe pastoralist ancestry, (2) the spread of Austronesian languages to the open Pacific islands in association with Taiwanese aboriginal-associated ancestry, (3) the spread of Austroasiatic languages through southeast Asia in association with the characteristic ancestry type that is also represented in western Indonesia suggesting that these languages were once widespread there, (4) the spread of Afroasiatic languages through East Africa as part of the Pastoral Neolithic farming expansion, and (5) the spread of Na-Dene languages in North America in association with Proto-Paleoeskimo ancestry. I will highlight the ways that ancient DNA can meaningfully contribute to our understanding of language expansions—increasing the plausibility of some scenarios while decreasing the plausibility of others—while emphasizing that with genetic data by itself we can never definitively determine what languages ancient people spoke.

EDIT (3 MAY 2019): Apparently, there was not much to take from the talk.

neolithic-pastoralist-africa
Pastoralist Neolithic in Africa, through a pale-green Sahelo-Sudanian steppe corridor. See full map.

This seminar (and maybe some new paper on the Neolithic expansion in Africa) could shed light on population movements that may be related to the spread of Afroasiatic dialects. Until now, it seems that Bantu peoples have been more interesting for linguistics and archaeology, and South and East Africans for anthropology.

Archaeology in Africa appears to be in its infancy, as is population genomics. From the latest publication by Carina Schlebusch, Population migration and adaptation during the African Holocene: A genetic perspective, a chapter from Modern Human Origins and Dispersal (2019):

The process behind the introduction and development of farming in Africa is still unclear. It is not known how many independent invention events there were in the continent and to which extent the various first instances of farming in northern Africa are linked. Based on the archeological record, it was proposed that at least three regions in Africa may have developed agriculture independently: the Sahara/Sahel (around 7 ka), the Ethiopian highlands (7-4 ka), and western Africa (5-3 ka). In addition to these developments, the Nile River Valley is thought to have adopted agriculture (around 7.2 ka), from the Neolithic Revolution in the Middle East (Chapter 12 – Jobling et al. 2014; Chapter 35, 37 – Mitchell and Lane 2013). From these diverse centers of origin, farmers or farming practices spread to the rest of Africa, with domesticated animals reaching the southern tip of Africa ~2 ka and crop farming ~1.8 ka (Mitchell 2002; Huffman 2007).

african-popularion-movements
Schematic representation of possible migration routes related to the expansion of herders and crop farmers during Holocene times. Arrow colors indicate source populations: Brown-Eurasian, Green-western African, Blue-eastern African.

As happened in Europe with the flawed haplogroup histories of the 1990s-2000s, which were based on the modern distribution of R1b, R1a, N, or I2, it is possible that neither of the haplogroups most often linked to the Afroasiatic expansion, E and J, was responsible for its early spread within Africa, despite their widespread distribution in certain modern Afroasiatic-speaking areas. The fact that such assessments include implausible glottochronological dates spanning up to 20,000 years for the parent language, combined with regional language continuities despite archaeological changes, makes them even more suspicious.

Similar to the case with Indo-Europeans and the “steppe ancestry” concept of the 2010s, it may be that the often-looked-for West Eurasian ancestry among Africans is the effect of recent migrations, unrelated to the Afroasiatic expansion. The results of this paper could be offering another sign of how this ancestry may have expanded only quite recently westwards from East Africa through the Sahel, after the Semitic expansion to the south:

1. From approximately 1000 BC, accompanying Nilo-Saharan peoples.

2. From approximately AD 1500, with the different population movements related to the nomadic Fulani:

sahel-nomadic-sedentary
Image from Sahel in West African History – Oxford Research Encyclopedia of African History.
  • Arguably, since the Fulani caste system wasn’t as elaborate in northern Nigeria, eastern Niger, and Cameroon, these specific groups would be a good example of the admixture with eastern populations, based on the (proportionally) huge amount of slaves they dealt with.
  • Similarly, it could be argued that the caste-based social stratification in most other territories (including Sudan) would have helped them keep a genetic make-up similar to that of their region of origin in terms of ancient lineages, hence similar to Chadic populations from west to east.

Reich’s assertion of the association of the language expansion with the spread of Pastoral Neolithic is still too vague, but – based on previous publications of ancient DNA in Africa and the Levant – I don’t have high hopes for a revolutionary paper in the near future. Without many samples and proper temporal transects, we are stuck with speculations based on modern distributions and scarce historical data.

fula-people-distribution
A distribution map of Fula people. Dark green: a major ethnic group; Medium: significant; Light: minor. Modified from image by Sarah Welch at Wikipedia.

About the potential genetic make-up of Cameroon before the arrival of the Neolithic, from the recent SAA 84th Annual Meeting (Abstracts in PDF):

Lipson, Mark (Harvard Medical School), Mary Prendergast (Harvard University), Isabelle Ribot (Université de Montréal), Carles Lalueza-Fox (Institute of Evolutionary Biology CSIC-UPF) and David Reich (Harvard Medical School)

[253] Ancient Human DNA from Shum Laka (Cameroon) in the Context of African Population History

We generated genome-wide DNA data from four people buried at the site of Shum Laka in Cameroon between 8000–3000 years ago. One individual carried the deeply divergent Y chromosome haplogroup A00 found at low frequencies among some present-day Niger-Congo speakers, but the genome-wide ancestry profiles for all four individuals are very different from the majority of West Africans today and instead are more similar to West-Central African hunter-gatherers. Thus, despite the geographic proximity of Shum Laka to the hypothesized birthplace of Bantu languages and the temporal range of our samples bookending the initial Bantu expansion, these individuals are not representative of a Bantu source population. We present a phylogenetic model including Shum Laka that features three major radiations within Africa: one phase early in the history of modern humans, one close to the time of the migration giving rise to non-Africans, and one in the past several thousand years. Present-day West Africans and some East Africans, in addition to Central and Southern African hunter-gatherers, retain ancestry from the first phase, which is therefore still represented throughout the majority of human diversity in Africa today.

Happy new year 2019…and enjoy our new books!

Sorry for the last weeks of silence; I have been rather busy lately. I have more projects going on, and (because of that) I also wanted to finish a project I have been working on for many months already.

I have therefore decided to publish a provisional version of the text, in the hope that it will be useful in the following months, when I won’t be able to update it as often as I would like to:

EDIT (20 JAN 2019): For those of you who are more comfortable reading in your native language, I have placed some links to automatic translations by Google Translate. They might work especially well for the texts of A Game of Clans & A Clash of Chiefs.

Don’t forget to check out the maps included in the supplementary materials: I have added Y-DNA, mtDNA, and ADMIXTURE data using GIS software. The PCA graphics are also important to follow the main text.

NOTE. I have uploaded the files to Academia.edu and ResearchGate, in case the websites are too slow.

I would have preferred to wait for a thorough revision of the section on archaeology and the linguistic sections on Uralic, but I doubt I will have time when the reviews come, so it was either now or maybe next December…

I say so in the introduction, but it is evident that certain aspects of the book are tentative to say the least: the farther back we go from Late Proto-Indo-European, the less clear are many aspects. Also, linguistically I am not convinced about Eurasiatic or Nostratic, although they do have a certain interest when we try to offer a comprehensive view of the past, including ethnolinguistic identities.

I cannot be an expert in everything, and these books cover a lot. I am bound to publish many corrections as new information appears and more reviews are sent. For example, just days ago (before SNP calls of Wang et al. 2018 were published) some paragraphs implied that AME might have expanded Nostratic from the Middle East. Now it does not seem so, and I changed them just before uploading the text. That’s how tentative certain routes are, and how much all of this may change. And that only if we accept a Nostratic phylum…

NOTE. Since the first book I wrote was the linguistic one, and I have spent the last months updating the archaeology + genetics part, now many of you will probably understand 1) why I am so convinced about certain language relationships and 2) how I used many posts to clarify certain ideas and receive comments. Many posts probably offer a good timeline of what I worked with, and when.

Acknowledgements

I did not add this section to the books, because they are still not ready for print, but I think this is due somewhere now. It is impossible to reference all who have directly or indirectly contributed to this, so this is a list of those I feel have played an important role.

I am indebted to the following people (which does not mean that they share my views, obviously):

First and foremost, to Fernando López-Menchero, for having the patience to review in detail many parts on Indo-European linguistics, knowing that I won’t accept many of his comments anyway. The additional information he offers is invaluable, but I didn’t want to turn this into a huge linguistic encyclopaedia with unending discussions of tiny details of each reconstructed word. I think it is already too big as it is.

I would not have thought about doing this if it were not for the interest of Wekwos (Xavier Delamarre) in publishing a full book about the Indo-European demic diffusion model (in the second half of 2017, I think). It was they who suggested that I extend the content, when all I had done until then was write an essay and draw some maps in my free time between depositing the PhD thesis and defending it.

Sadly, as much as I would like to publish a book with a professional publisher, I don’t think ancient DNA lends itself to the traditional format, so my requests (mainly to have free licenses and to be able to revise the text at will, as new genetic papers are published) were logically not acceptable. Also, the main aim of all volumes, especially the linguistic one, is the teaching of essentials of Late Proto-Indo-European and related languages, and this objective would be thwarted by selling each volume for $50-70 and only in printed format. I prefer a wider distribution.

At first I didn’t think much of this proposal, because I do not benefit from this kind of publication in my scientific field, but with time my interest in writing a whole, comprehensive book on the subject grew to the point where it was already an ongoing project, probably by the start of 2018.

I would not have been in contact with Wekwos if it were not for user Camulogène Rix at Anthrogenica, so thanks for that and for the interest in this work.

I would not have thought of writing this either if not for the spontaneous support (with an unexpected phone call!) of a professor of the Complutense University of Madrid, Ángel Gómez Moreno, who is interested in this subject – as is his wife, a professor of Classics more closely associated with Indo-European studies, who helped me with a search for Indo-Europeanists.

EDIT (1 JAN 2019): I remembered that Karin Bojs sent me her book after reading the demic diffusion model. I may have also thought about writing a whole book back then, but mid-2017 is probably too early for the project.

Professor Kortlandt is still to review the text, but he contributed to both previous essays in some very interesting ways, so I hope he can help me improve the parts on Uralic, and maybe alternative accounts of expansion for Balto-Slavic, depending on the time depth that he would consider warranted according to the Temematic hypothesis.

The maps are evidently (for those who are interested in genetics) in part the result of the effort of the late Jean Manco: As you can see from the maps including Y-DNA and mtDNA samples, I have benefitted from her way of organising data and publishing it. Similarly, the work of Iain McDonald in assessing the potential migration routes of R1b and R1a in Europe with the help of detailed maps was behind my idea for the first maps, and consequently behind these, too.

I should thank all people responsible for the release of free datasets to work with, including the Reich and Jena labs, the Veeramah Lab, and also researchers from the Max Planck Institute or the Mainz Palaeogenetics group, who didn’t mind sharing datasets with me.

Readers of this blog with interesting comments have also been essential for the improvement of the texts. You can probably see some of your many contributions there. I may not answer many comments, because I am always busy (and sometimes I just don’t have anything interesting to say), but I try to read all of them.

EDIT (1 JAN 2019) I think I should mention at least Chetan, Egg, or Robert George; but then I would leave out old europe, Sgr Ganesh, or Tileman Ehlen; and if I include them I would leave out others…

I am also indebted to users of other sites, like Anthrogenica, whose particular points of view and deep knowledge of some very specific aspects are sometimes very useful. In particular, user Anglesqueville helped me to fix some issues with the merging of datasets to obtain the PCAs and ADMIXTURE, and prepared some individual samples to merge them.

Even when I don’t post anything, Google Analytics keeps sending me messages about increasing user fidelity (returning users), and stats haven’t really changed (which probably means more people are reading old posts), so thank you for that.

I hope you enjoy the books.

Happy new year!

Sahara’s rather pale-green and discontinuous Sahelo-Sudanian steppe corridor, and the R1b – Afroasiatic connection

Interesting new paper (behind paywall) Megalakes in the Sahara? A Review, by Quade et al. (2018).

Abstract (emphasis mine):

The Sahara was wetter and greener during multiple interglacial periods of the Quaternary, when some have suggested it featured very large (mega) lakes, ranging in surface area from 30,000 to 350,000 km2. In this paper, we review the physical and biological evidence for these large lakes, especially during the African Humid Period (AHP) 11–5 ka. Megalake systems from around the world provide a checklist of diagnostic features, such as multiple well-defined shoreline benches, wave-rounded beach gravels where coarse material is present, landscape smoothing by lacustrine sediment, large-scale deltaic deposits, and in places, tufas encrusting shorelines. Our survey reveals no clear evidence of these features in the Sahara, except in the Chad basin. Hydrologic modeling of the proposed megalakes requires mean annual rainfall ≥1.2 m/yr and a northward displacement of tropical rainfall belts by ≥1000 km. Such a profound displacement is not supported by other paleo-climate proxies and comprehensive climate models, challenging the existence of megalakes in the Sahara. Rather than megalakes, isolated wetlands and small lakes are more consistent with the Sahelo-Sudanian paleoenvironment that prevailed in the Sahara during the AHP. A pale-green and discontinuously wet Sahara is the likelier context for human migrations out of Africa during the late Quaternary.

The whole review is an interesting read, but here are some relevant excerpts:

Various researchers have suggested that megalakes coevally covered portions of the Sahara during the AHP and previous periods, such as paleolakes Chad, Darfur, Fezzan, Ahnet-Mouydir, and Chotts (Fig. 2, Table 2). These proposed paleolakes range in size by an order of magnitude in surface area from the Caspian Sea–scale paleo-Lake Chad at 350,000 km2 to Lake Chotts at 30,000 km2. At their maximum, megalakes would have covered ~ 10% of the central and western Sahara, similar to the coverage by megalakes Victoria, Malawi, and Tanganyika in the equatorial tropics of the African Rift today. This observation alone should raise questions of the existence of megalakes in the Sahara, and especially if they developed coevally. Megalakes, because of their significant depth and area, generate large waves that become powerful modifiers of the land surface and leave conspicuous and extensive traces in the geologic record.

megalakes-sahara
ETOPO1 digital elevation model (1 arc-minute; Amante and Eakins, 2009) of proposed megalakes in the Sahara Desert during the late Quaternary. Colors denote Köppen-Geiger climate zones: blue, Aw, Af, Am (tropical); light tan, Bwk, BSh, BSk, Csa, Csb, Cwb, Cfa, Cfb (temperate); red-brown, Bwh (arid, hot desert and steppe climate). Lake area at proposed megalake high stands and present Lake Victoria are in blue, and contributing catchment areas are shown as thin solid black lines. The main tributaries of Lake Chad are denoted by blue lines (from west to east: the Komadougou-Yobe, Logone, and Chari Rivers; source: Global Runoff Data Center, Koblenz, Germany). Rainfall isohyets (50, 200, 800, 1200, and 1600) are marked in dashed gray-scale lines. Physical parameters of each basin are shown in white boxes: Abt, total basin area; AW, lake area; Vw, lake volume; and aW= AW/Abt. Black dots mark the location of the paleohydrological records from Lezine et al. (2011), also compiled in Supplementary Table S5.

Lakes, megalakes, and wetlands

Active ground-water discharge systems abound in the Sahara today, although they were much more widespread in the AHP. They range from isolated springs and wet ground in many oases scattered across the Sahara (e.g., Haynes et al., 1989) to wetlands and small lakes (Kröpelin et al., 2008). Ground water feeding these systems is dominated by fossil AHP-age and older water (e.g., Edmunds and Wright 1979; Sonntag et al., 1980), although recently recharged water (<50 yr) has been locally identified in Saharan ground water (e.g., Sultan et al., 2000; Maduapuchi et al., 2006).

Megalake Chad

In our view, Lake Chad is the only former megalake in the Sahara firmly documented by sedimentologic and geomorphic evidence. Mega-Lake Chad is thought to have covered ~ 345,000 km2, stretching for nearly 8° (10–18°N) of latitude (Ghienne et al., 2002) (Fig. 2). The presence of paleo-Lake Chad was at one point challenged, but several—and in our view very robust—lines of evidence have been presented to support its development during the AHP. These include: (1) clear paleo-shorelines at various elevations, visible on the ground (Abafoni et al., 2014) and in radar and satellite images (Schuster et al., 2005; Drake and Bristow, 2006; Bouchette et al., 2010); (2) sand spits and shoreline berms (Thiemeyer, 2000; Abafoni et al., 2014); and (3) evaporites and aquatic fauna such as fresh-water mollusks and diatoms in basin deposits (e.g., Servant, 1973; Servant and Servant, 1983). Age determinations for all but the Holocene history of mega-Lake Chad are sparse, but there is evidence for Mio-Pliocene lake(s) (Lebatard et al., 2010) and major expansion of paleo-Lake Chad during the AHP (LeBlanc et al., 2006; Schuster et al., 2005; Abafoni et al., 2014; summarized in Armitage et al., 2015) up to the basin overflow level at ~ 329 m asl.

Insights from hydrologic mass balance of megalakes

sahara-annaul-rainfall
Graph of mean annual rainfall (mm/yr) versus aw (area lake/area basin, AW/AL); their modeled relationship using our Sahelo-Sudanian hydrologic model for the different lake basins are shown as solid colored lines. Superimposed on this (dashed lines) are the aw values for individual megalake basins and the mean annual rainfall required to sustain them. Mean annual paleo-rainfall estimates of 200–400 mm/yr during the AHP from fossil pollen and mollusk evidence is shown as a tan box. The intersection of this box with the solid colored lines describes the resulting aw for Saharan paleolakes on the y-axis. The low predicted values for aw suggest that very large lakes would not form under Sahelo-Sudanian conditions where sustained by purely local rainfall and runoff. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Using these conservative conditions (i.e., erring in the direction that will support megalake formation), our hydrologic models for the two biggest central Saharan megalakes (Darfur and Fezzan) require minimum annual average rainfall amounts of ~ 1.1 m/yr to balance moisture losses from their respective basins (Supplementary Table S1). Lake Chad required a similar amount (~1 m/yr; Supplementary Table S1) during the AHP according to our calculations, but this is plausible, because even today the southern third of the Chad basin receives ≥1.2 m/yr (Fig. 2) and experiences a climate similar to Lake Victoria. A modest 5° shift in the rainfall belt would bring this moist zone northward to cover a much larger portion of the Chad basin, which spans N13° ±7°. Estimated rainfall rates for Darfur and Fezzan are slightly less than the average of ~ 1.3 m/yr for the Lake Victoria basin, because of the lower aw values, that is, smaller areas of Saharan megalakes compared with their respective drainage basins (Fig. 15).

Estimates of paleo-rainfall during the AHP

Here major contradictions develop between the model outcomes and paleo-vegetation evidence, because our Sahelo-Sudanian hydrologic model predicts wetter conditions and therefore more tropical vegetation assemblages than found around Lake Victoria today. In fact, none of the very wet rainfall scenarios required by all our model runs can be reconciled with the relatively dry conditions implied by the fossil plant and animal evidence. In short, megalakes cannot be produced in Sahelo-Sudanian conditions past or present; to form, they require a tropical or subtropical setting, and major displacements of the African monsoon or extra-desert moisture sources.

sahara-palaeoclimate
Change in mean annual precipitation over northern Africa between mid-Holocene (6 ka) and pre-industrial conditions in PMIP3 models (affiliations are provided in Supplementary Table S4). Lakes Victoria and Chad outlined in blue. (a) Ensemble mean change in mean annual precipitation and positions of the African summer (July–September) ensemble mean ITCZ during mid-Holocene (solid red line) and pre-industrial conditions (solid blue line). (b) Zonal average of change in mean annual precipitation over land (20°W–30°E) for the ensemble mean (thick black) and individual models (listed on right). The range of minimal estimated change in mean annual precipitation required to sustain steppe is shown in shaded green (Jolly et al., 1998).

Conclusions

If not megalakes, what size lakes, marshes, discharging springs, and flowing rivers in the Sahara were sustainable in Sahelo-Sudanian climatic conditions? For lakes and perennial rivers to be created and sustained, net rainfall in the basin has to exceed loss to evapotranspiration, evaporation, and infiltration, yielding runoff that then supplies a local lake or river. Our hydrologic models (see Supplementary Material) and empirical observations (Gash et al., 1991; Monteith, 1991) for the Sahel suggest that this limit is in the 200–300 mm/yr range, meaning that most of the Sahara during the AHP was probably too dry to support very large lakes or perennial rivers by means of local runoff. This does not preclude creation of local wetlands supplied by ground-water recharge focused from a very large recharge area or forced to the surface by hydrologic barriers such as faults, nor megalakes like Chad supplied by moisture from the subtropics and tropics outside the Sahel. But it does raise a key question concerning the size of paleolakes, if not megalakes, in the Sahara during the AHP. Our analysis suggests that Sahelo-Sudanian climate could perhaps support a paleolake approximately ≤5000 km2 in area in the Darfur basin and ≤10,000–20,000 km2 in the Fezzan basin. These are more than an order of magnitude smaller than the megalakes envisioned for these basins, but they are still sizable, and if enclosed in a single body of water, should have been large enough to generate clear shorelines (Enzel et al., 2015, 2017). On the other hand, if surface water was dispersed across a series of shallow and extensive but partly disconnected wetlands, as also implied by previous research (e.g., Pachur and Hoelzmann, 1991), then shorelines may not have developed.
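To make the water-balance argument in these excerpts easier to follow, here is a minimal sketch of a steady-state balance for a closed lake basin. It is not the hydrologic model used by Quade et al., whose details are in their supplementary material; it is a textbook simplification that assumes the lake loses water only by evaporation and that a fixed share of the rain falling on the surrounding land reaches the lake. The function names and the evaporation and runoff values are hypothetical placeholders chosen for illustration only.

```python
"""Steady-state water balance for a closed (endorheic) lake basin.

Simplifying assumptions (mine, not those of Quade et al.):
  * the lake loses water only by evaporation (no overflow, no seepage);
  * a fixed fraction k of the rain falling on the land part of the
    basin reaches the lake as runoff;
  * rainfall P and lake evaporation E are uniform, both in m/yr.

Balance per unit of basin area, with a_w = lake area / basin area:
    P * a_w + k * P * (1 - a_w) = E * a_w
"""


def lake_fraction(rain, evap, runoff_coeff):
    """Return the equilibrium a_w for a given rainfall (capped at 1.0)."""
    if not 0.0 < runoff_coeff <= 1.0:
        raise ValueError("runoff_coeff must lie in (0, 1]")
    if rain >= evap:
        # Rain alone outpaces evaporation: the basin fills and overflows.
        return 1.0
    gain_from_land = runoff_coeff * rain
    return gain_from_land / (evap - rain + gain_from_land)


def required_rainfall(a_w, evap, runoff_coeff):
    """Return the rainfall (m/yr) needed to sustain a lake covering a_w."""
    if not 0.0 <= a_w <= 1.0:
        raise ValueError("a_w must lie between 0 and 1")
    if not 0.0 < runoff_coeff <= 1.0:
        raise ValueError("runoff_coeff must lie in (0, 1]")
    return evap * a_w / (a_w + runoff_coeff * (1.0 - a_w))


if __name__ == "__main__":
    EVAP = 2.0    # hypothetical lake evaporation, m/yr
    RUNOFF = 0.1  # hypothetical runoff coefficient
    for rain in (0.2, 0.4, 1.2):
        share = lake_fraction(rain, EVAP, RUNOFF)
        print(f"rain {rain:.1f} m/yr -> lake covers {share:.1%} of basin")
```

With these placeholder values, rainfall in the 0.2–0.4 m/yr range sustains a lake covering only a few percent of its basin, which is the qualitative point of the excerpts above. The real figures depend on evaporation and runoff behaviour that this sketch does not attempt to reproduce.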

One of the underdeveloped ideas of my Indo-European demic diffusion model was that R1b-V88 had migrated through South Italy to Northern Africa, and from there southwards along the Green Sahara corridor, from where the “upside-down” view of Bender (2007) could have occurred, i.e. Afroasiatic expanding westwards within the Green Sahara, precisely at this time, and from a homeland near the Megalake Chad region (see here).

Whether or not R1b-V88 brought the ‘original’ lineage that expanded Afroasiatic languages may be contended, but after D’Atanasio et al. (2018) it seems that only two lineages, E-M2 and R1b-V88, fit the ‘star-like’ structure suggesting an appropriate haplogroup expansion and necessary regional distribution that could explain the spread of Afroasiatic languages within a reasonable time frame.

palaeolithic
Palaeolithic migrations

This review shows that the hypothesized Green Sahara corridor full of megalakes, which some proposed had fully connected Africa from west to east, was actually a strip of Sahelo-Sudanian steppe extending north of its current distribution, including the Chad megalake, East Africa and Arabia, apart from other discontinuous local wetlands further to the north in Africa. This greenish belt would have probably allowed for the initial spread of early Afroasiatic proto-languages only through the southern part of the current Sahara Desert. This, together with the R1b-V88 haplogroup distribution in Central and North Africa (with a prevalence among Chadic speakers probably due to later bottlenecks) and the Near East, leaves still fewer possibilities for an expansion of Afroasiatic from anywhere else.

If my proposal turns out to be correct, this Afroasiatic-like language would be the one suggested by some in the vocabulary of Old European and North European local groups (viz. Kroonen for the Agricultural Substrate Hypothesis), and not Anatolian farmer ancestry or haplogroup G2, which would have been rather confined to Southern Europe, mainly south of the Loess line, where incoming Middle East farmers encountered the main difficulties spreading agriculture and herding, and where they eventually admixed with local hunter-gatherers.

NOTE. If related to attested languages before the Roman expansion, Tyrsenian would be a good candidate for a descendant of the language of Anatolian farmers, given the more recent expansion of Anatolian ancestry to the Tuscan region (even if already influenced by Iran farmer ancestry), which reinforces its direct connection to the Aegean.

The fiercest opposition to this R1b-V88 – Afroasiatic connection may come from:

  • Traditional Hamito-Semitic scholars, who try to look for any parent language almost invariably in or around the Near East – the typical “here it was first attested, ergo here must be the origin, too”-assumption (coupled with the cradle of civilization memes) akin to the original reasons behind Anatolian or Out-of-India hypotheses; and of course
  • autochthonous continuity theories based on modern subclades, of (mainly Semitic) peoples of haplogroup E or J, who will root for either one or the other as the Afroasiatic source no matter what. As we have seen with the R1a – Indo-European hypothesis (see here for its history), this is never the right way to look at prehistoric migrations, though.

I proposed that R1a-M417 was the lineage marking an expansion of Indo-Uralic from the east near Lake Baikal, then obviously connected to Yukaghir and Altaic languages marked by R1a-M17, and that haplogroup R could then be the source of a hypothetical Nostratic expansion (where R2 could mark the Dravidian expansion), with upper clades being maybe responsible for Borean.

nostratic-tree
Simple Nostratic tree by Bomhard (2008)

However, recent studies have shown early expansions of R1b-P297 to East Europe (Mathieson et al. 2017 & 2018), and of R1b-M73 to East Eurasia probably up to Siberia, and possibly reaching the Pacific (Jeong et al. 2018). Also, the Steppe Eneolithic and Caucasus Eneolithic clusters seen in Wang et al. (2018) would be able to explain the WHG – EHG – ANE ancestry cline seen in Mesolithic and Neolithic Eurasia without a need for westward migrations.

After Narasimhan et al. (2018) and Damgaard et al. (Science 2018), Dravidian is now more and more likely to be linked to the expansion of the Indus Valley civilization and haplogroup J, in turn strongly linked to Iranian farmer ancestry, thus giving support to an Elamo-Dravidian group stemming from Iran Neolithic.

NOTE. This Dravidian-IVC and Iran connection has been supported for years by knowledgeable bloggers and commenters alike, see e.g. one of Razib Khan’s posts on the subject. This rather early support for what is obvious today is probably behind the reactionary views by some nationalist Hindus, who probably saw in this a potential reason for a strengthened Indo-Aryan/Dravidian divide adding to the religious patchwork that is modern India.

I am not in a good position to judge Nostratic, and I don’t think glottochronology, Swadesh lists, or any statistical methods applied to a bunch of words are of any use, here or anywhere. The work of pioneers like Illich-Svitych or Starostin, on the other hand, seems to me a solid attempt to obtain a faithful reconstruction, if rather outdated today.

NOTE. I am still struggling to learn more about Uralic and Indo-Uralic; not because it is more difficult than Indo-European, but because – in comparison to PIE comparative grammar – material about them is scarce, and the few available sources are sometimes contradictory. My knowledge of Afroasiatic is limited to Semitic (Arabic and Akkadian), and the field is not much more developed here than for Uralic…

y-haplogroup-r1b-p343
Spread of Y-haplogroup R1b(xM269) in Eurasia, according to Jeong et al. (2018).

If one wanted to support a Nostratic proto-language, though, without being able to take into account genome-wide autosomal admixture, the only haplogroup right now which can connect the expansion of all its branches is R1b-M343:

  • R1b-L278 expanded from Asia to Europe through the Iranian Plateau, since early subclades are found in Iran and the Caucasus region, thus supporting the separation of Elamo-Dravidian and Kartvelian branches;
  • From the Danube or another European region ‘near’ the Villabruna 1 sample (of haplogroup R1b-L754):
    • R1b-V88 expanding everywhere in Europe, and especially the branch expanding to the south into Africa, may be linked to the initial Afroasiatic expansion through the Pale-Green Sahara corridor (and even a hypothetical expansion with E-M2 subclades and/or from the Middle East would also leave open the influence of V88 and previous R1b subclades from the Middle East in the emergence of the language);
    • R1b-P297 subclades expanding to the east may be linked to Eurasiatic, giving rise to both Indo-Uralic (M269) and Macro- or Micro-Altaic (M73) expansions.

This is shameless, simplistic speculation, of course, but not more than the Nostratic hypothesis, and it has the main advantage of offering ‘small and late’ language expansions relative to other proposals spanning thousands (or even tens of thousands) of years more of language separation. On the other hand, that would leave Borean out of the question, unless the initial expansion of R1b subclades happened from a community close to Lake Baikal (and Mal’ta) that was also at the origin of the other supposedly related Borean branches, whether linked to haplogroup R or to any other…

NOTE. If Afroasiatic and Indo-Uralic (or Eurasiatic) are not genetically related, my previous simplistic model, R1b-Afroasiatic vs. R1a-Eurasiatic, may still be supported, with R1a-M17 potentially marking the latest meaningful westward population expansion from which EHG ancestry might have developed (see here). Without detailed works on Nostratic comparative grammar and dialectalization, and especially without a lot more Palaeolithic and Mesolithic samples, all this will remain highly speculative, like proposals of the 2000s about Y-DNA-haplogroup – language relationships.

A history of male migration in and out of the Green Sahara

Open access research highlight A history of male migration in and out of the Green Sahara, by Yali Xue, Genome Biology (2018) 19:30, on the recent paper by D’Atanasio et al.

Insights from the Green Saharan Y-chromosomal findings (emphasis mine):

It is widely accepted that sub-Saharan Y chromosomes are dominated by E-M2 lineages carried by Bantu-speaking farmers as they expanded from West Africa starting < 5 kya, reaching South Africa within recent centuries [4]. The E-M2-Bantu lineages lie phylogenetically within the E-M2-Green Sahara lineage and show at least three explosive lineage expansions beginning 4.9–5.3 kya [5] (Fig. 1a). These events of E-M2-Bantu expansion are slightly later than the R-V88 expansion, and highlight the range of male demographic changes in the mid-Holocene. North of the Sahara, in addition to the four trans-Saharan haplogroups, haplogroup E-M81 (which diverged from E-M78 ~ 13 kya) became very common in present-day populations as a result of another massive expansion ~ 2 kya [6] (Fig. 1a).

Simplified Y-chromosomal phylogeny and inferred past or observed present-day distribution of relevant Y-chromosomal lineages. a Calibrated phylogenetic tree of Y-chromosomal lineages discussed in the text. Green shading represents the period when the present-day Sahara Desert was green and fertile. Lineages represented by filled pentagons have undergone very rapid expansions. b [featured image] The Green Sahara period 5–12 kya. Green shading indicates that the present-day Sahara Desert was green and fertile. The colors within the large oval represent the four Y-chromosomal haplogroups deduced to be present in the region at this time; specific locations are not implied. The arrows indicate the inferred origins of these haplogroups to the north or south, but specific origins and routes are not implied. c The present-day distributions of the four Green Saharan Y-chromosomal haplogroups. Yellow shading indicates the Sahara Desert. Each circle represents a sampled population, with the presence or absence of the four Green Saharan haplogroups shown by the colored sectors; other haplogroups may also be present in these populations, but are not shown. The small arrows indicate the inferred northwards and southwards movements of these haplogroups when the Sahara became uninhabitable.

Although Y chromosomes exist within populations and so share and reflect the general history of those populations, they can sometimes show some departures from other parts of the genome that result from differences in male and female behaviors. D’Atanasio et al. [1] highlight one such contrast in their study. Present-day North African populations show substantial sub-Saharan autosomal and mtDNA genetic components ascribed to the Roman and Arab slave trades 1–2 kya [7], but carry few sub-Saharan Y lineages from this source, probably reflecting the smaller numbers of male slaves and their reduced reproductive opportunities when compared to those of female slaves. The sub-Saharan Y chromosomes in these North African populations thus originate predominantly from the earlier Green Sahara period.

In this part of Africa, the indigenous languages that are spoken belong to three of the four African linguistic families (Afro-Asiatic, Nilo-Saharan and Niger-Congo). Interestingly, these languages show non-random associations with Y lineages. For example, Chadic languages within the Afro-Asiatic family are associated with haplogroup R-V88, whereas Nilo-Saharan languages are associated with specific sublineages within A3-M13 and E-M78, further illustrating the complex human history of the region.

The main question after D’Atanasio et al. (2018) is thus:

(…) what are the reasons for the very rapid R-V88 expansion 5–6 kya [1] and E-M81 expansion ~ 2 kya [6], and how do these expansions fit within general worldwide patterns of male-specific expansions, which in other cases have been linked to cultural and technological changes [5]?

I think that the only known haplogroup expansion that might, on present evidence, fit the spread and dialectalization of Afroasiatic, a proto-language probably contemporaneous with or slightly older than Middle Proto-Indo-European, is that of R1b-V88 lineages. However, without ancient DNA samples to corroborate this, we cannot be sure.

See also: