Human ancestry solves language questions? New admixture citebait


A paper in Scientific Reports: Human ancestry correlates with language and reveals that race is not an objective genomic classifier, by Baker, Rotimi, and Shriner (2017).

Abstract (emphasis mine):

Genetic and archaeological studies have established a sub-Saharan African origin for anatomically modern humans with subsequent migrations out of Africa. Using the largest multi-locus data set known to date, we investigated genetic differentiation of early modern humans, human admixture and migration events, and relationships among ancestries and language groups. We compiled publicly available genome-wide genotype data on 5,966 individuals from 282 global samples, representing 30 primary language families. The best evidence supports 21 ancestries that delineate genetic structure of present-day human populations. Independent of self-identified ethno-linguistic labels, the vast majority (97.3%) of individuals have mixed ancestry, with evidence of multiple ancestries in 96.8% of samples and on all continents. The data indicate that continents, ethno-linguistic groups, races, ethnicities, and individuals all show substantial ancestral heterogeneity. We estimated correlation coefficients ranging from 0.522 to 0.962 between ancestries and language families or branches. Ancestry data support the grouping of Kwadi-Khoe, Kx’a, and Tuu languages, support the exclusion of Omotic languages from the Afroasiatic language family, and do not support the proposed Dené-Yeniseian language family as a genetically valid grouping. Ancestry data yield insight into a deeper past than linguistic data can, while linguistic data provide clarity to ancestry data.

Regarding European ancestry:

Southern European ancestry correlates with both Italic and Basque speakers (r = 0.764, p = 6.34 × 10⁻⁴⁹). Northern European ancestry correlates with Germanic and Balto-Slavic branches of the Indo-European language family as well as Finno-Ugric and Mordvinic languages of the Uralic family (r = 0.672, p = 4.67 × 10⁻³⁴). Italic, Germanic, and Balto-Slavic are all branches of the Indo-European language family, while the correlation with languages of the Uralic family is consistent with an ancient migration event from Northern Asia into Northern Europe. Kalash ancestry is widely spread but is the majority ancestry only in the Kalash people (Table S3). The Kalasha language is classified within the Indo-Iranian branch of the Indo-European language family.

Sure, admixture analysis came to save the day. Yet again. Now it’s not just Archaeology related to language anymore, it’s Linguistics; all modern languages and their classification, no less. Because why the hell not? Why would anyone study languages, history, archaeology, etc. when you can run certain algorithms on free datasets of modern populations to explain everything?

What I am criticising here, as always, is not the study per se, its methods (PCA, the use of Admixture or any other tools), or its results, which might be quite interesting – even regarding the origin or position of certain languages (or more precisely their speakers) within their linguistic groups; it’s the many broad, unsupported, striking conclusions (read the article if you want to see more wishful thinking).

This is obviously simplistic citebait: it benefits journals and authors (and is therefore tacitly encouraged), but not knowledge, because it is not supported by any linguistic or archaeological data or expertise.

Is anyone with a minimum knowledge of languages, or general anthropology, actually reviewing these articles?


Featured image: Ancestry analysis of the global data set, from the article.

C.C. Uhlenbeck on the Proto-Indo-European homeland in the 19th century


Michiel de Vaan, from the University of Lausanne, has recently uploaded three of his papers, published in the JIES in 2016, on the work of Dutch linguist C.C. Uhlenbeck:

1. The Early C. C. Uhlenbeck on Indo-European, JIES 44/1-2, 2016, p. 73-80

Christianus Cornelius Uhlenbeck (1866–1951) was one of the leading Dutch linguists between the 1880s and the 1940s. He made his mark on a number of disciplines in descriptive and comparative linguistics, such as Basque, the indigenous languages of North America, Old Germanic and Sanskrit. In 2008, a special issue of the Canadian Journal of Netherlandic Studies (Genee & Hinrichs 2008) was devoted to his memory, the contents of which can be read online.

Uhlenbeck’s work and thinking on the Indo-European language family, and, in particular, on the original habitat of its speakers, have been discussed by Kortlandt 2010, who concluded that Uhlenbeck had remarkably advanced views for his time. The first two journal articles in which Uhlenbeck (1895, 1897) sets forth his views were published in Dutch. During the academic year 2013/14, I had the opportunity to read a number of articles on the question of the Indo-European homeland problem with my students at Leiden University. I provided Uhlenbeck’s Dutch articles from 1895 and 1897 with an English translation which I hereby submit to all colleagues

On Anthony and Haarmann:

Anthony focuses on the socioeconomic changes that took place in the fifth and fourth millennium BC, when the Indo-European steppe peoples entered into contact with the sedentary, agricultural population of Southeast-Europe, also termed Old European or Palaeo-European. Importantly, Anthony dismantles the monolithic view of a single “steppe pastoralism”, and instead stresses that the steppe economy itself went through various developmental phases, which might be linked to different periods of expansion of Indo-European into Europe. Haarmann zooms in on the sociocultural effects of the Indo-European expansion(s). Since language contact will often heavily influence the languages which are in contact, he sets out to look for traces of the language of the Old Europeans in the surviving Indo-European languages, first of all, in Ancient Greek. As many scholars before him have also realized, there is a thick layer of non-Indo-European words in Greek in fields such as agriculture, wine production, weaving, metallurgy, religion and mythology, building techniques, and local flora and fauna. Even the Greeks themselves acknowledged the presence of a “Pelasgian” substratum in their own language. Haarmann concludes (2012: 119): “Despite the fact that Indo-Europeans exercised political power and promoted their language as the common vehicle, they were nevertheless impressed by the achievements of the Old Europeans to the extent that the dominant language of the élite absorbed manifold influences from the local language(s).”

2. Where was the Indo-European proto-language spoken?, by C.C. Uhlenbeck (1895), translation by Michiel de Vaan, JIES 44/1-2, 2016, p. 181-185.

It cannot be objected that the eastern and the western Iranians differed much in their dialects, for the PIE language itself must have been split in a number of fairly different dialects. There has never been in the world a language without dialect differences, larger or smaller, depending on the geographic distance. That is why, in the beginning of this piece, I spoke not of one original language, but of a group of closely cognate dialects. Since the linguistic area of PIE was probably very large, it is certainly possible that part of it lay in the steppes, another part in the mountains, and yet another part in the fertile plains. If so, the fauna and the flora of the homeland cannot have been the same in different areas. And this is an argument, which the linguistic prehistorician must not lose sight of!

Regarding the necessary natural division of PIE (by geography and by stage), he apparently proposed a dialectal division into a European group (including Greek?), a Balkan-Balto-Slavic group, and Indo-Iranian.

3. The prehistory of the Indo-European peoples, by C.C. Uhlenbeck (1897), translation by Michiel de Vaan, JIES 44/1-2, 2016, p. 186-212.

The following excerpt is probably not the most interesting one (check out the different aspects of prehistoric life described through linguistics), but it is fun to be able to support the same arguments today:

Does linguistics provide us with the means to indicate a smaller region as the center of expansion of the Indo-European languages and peoples? Hardly. After all, it is far from certain that the people who speak Indo-European languages are also ethnologically more closely related to each other than to peoples with languages very different from ours. If the homeland of the Indo-European languages does not coincide with that of the Indo-European peoples, it becomes impossible to determine either one. In reality, if the Indo-European speaking peoples do not form an ethnological unity, we have not the slightest reason to suppose that they all hail from a single region. The use of a common language can just as well be explained by a powerful, prehistoric cultural influence, as by common ancestry. The unknown, unknowable origin of that cultural force is then, in a certain sense, the homeland of our language family. Searching a homeland of the Indo-Europeans or of the Indo-European dialects is like taking a wild stab, something which all who understand history must abhor. If Schrader regards as the homeland the Pontic steppes, if Hirt regards the coasts of Lithuania as such, this is based on insufficient and partially judged data. Still, the large agreement in vocabulary between Indo-European and Egypto-Semitic remains a remarkable fact, which Friedrich Delitzsch first illustrated in a truly scientific way.

(…)

If we stick to the facts, and refrain from bottomless speculations, we will find no other homeland than the area indicated above, which encompasses half of Europe and a part of Asia.

Potential Afroasiatic Urheimat near Lake Megachad


The publication of new ancient DNA samples from Africa is near, according to people at the SMBE meeting. As reported by Anthropology.net, Pontus Skoglund’s group has analysed new samples (complementing the study made by Carina Schlebusch), so we will have ancient samples of Africans from 300 to 6,000 years ago. They have been compared to data from modern African populations, and these are among their likely conclusions (to be published):

  • Several thousand years ago, likely Tanzanian herders migrated far and wide, reaching Southern Africa centuries before the first farmers.
  • West Africans were likely early contributors to the gene pool of sub-Saharan Africans.
  • One ancient African herder showed influence from even farther abroad, with 38% of their DNA coming from outside Africa. 9-22% of the DNA of modern farmers, including the southern Khoe-San, comes from East Africans and Eurasian herders.
  • Modern farmers, the ones up to 500 years old, did have Bantu DNA in their genomes, but the ancient hunter-gatherers predated the spread of the Bantu.

Razib Khan, asked about the Afroasiatic homeland by David Reich, has taken this opportunity to publish his own hypothesis on the expansion of Afroasiatic, given the known Admixture analyses, using Y-DNA phylogeography, and with reasonable assumptions. He concludes that Afroasiatic expansion might also be associated with the western expansion of E1b1b subclades from a Levantine (“Natufian”) homeland.

I think it is necessary to remind everyone of the many problems left unsolved by Indo-European studies – a much older discipline (and with more research published) than Afroasiatic studies. It is already quite revealing that we still can’t trace Proto-Semitic back to its homeland, and that Proto-Semitic is probably as old as Late Proto-Indo-European. We are talking, then, about an ancient proto-language – Afroasiatic – possibly older than Middle Indo-European (or Indo-Hittite), and whose dialects are still not well studied, except for the Semitic and Egyptian branches. Linguistic guesstimates or phylogenetic speculation date the proto-language (and thus the homeland) within a wide range, from 15,000 to 6,000 years ago.

There is an obvious trend (probably driven by Semitic and Egyptian researchers) to place the Afroasiatic Homeland near one of the many proposed Semitic homelands, i.e. in East Africa. This is similar to the trend seen in the first half of the 20th century in Indo-European studies, with most proposals locating the Proto-Indo-European homeland in Europe. European languages were the best known, and only the perceived antiquity of Vedic Sanskrit made some propose South Asian origins for the proto-language. However, it was only careful interpretation of linguistic finds, combined with archaeological data, that eventually yielded the Kurgan hypothesis, which has since been refined.

A model for the homeland and expansion of Afroasiatic, from Wikipedia

Razib Khan’s proposal makes sense in that it fits what others have proposed before, i.e. an east African or Middle Eastern Afroasiatic homeland, and that it links it with the expansion of farming. However, we have to keep in mind that until 5,000 years ago the Sahara was not the desert we know: it had certain important green corridors, humid areas between megalakes. The Sahara might not have been exactly green 10,000 to 5,000 years ago (roughly the time when Afroasiatic must have been spoken), but it had certain regions that allowed for an east-west migration. Those same regions also allowed for a west-east migration, and – perhaps more importantly – for a sizeable population expansion in central Saharan territory. Forgetting that invites potentially wrong assumptions.

What we can expect from the next papers on ancient African DNA samples is the result of certain (more recent) population – and thus potentially ethnolinguistic – movements, but they probably won’t solve the question of the Afroasiatic homeland, which lies further back in time than the samples studied. There is a wide void in African prehistory – compared with Near Eastern history – and this research will be closing that gap, just like European samples are helping close the gap in the prehistory of western, northern, and eastern Europe, compared to the history of the eastern Mediterranean regions.

Diachronic map of Paleolithic migrations of R1b lineages in Europe and Africa

I already wrote, regarding the potential ethnolinguistic link between Indo-European and Afroasiatic, that the migration of R1b-V88 lineages from Europe (through southern Italy?) into the Sahara – through the Fezzan-Chad-Chotts, and Chad-Chotts-Ahnet-Moyer megalake green corridors – deserves a close look, since it could have been the key to the successful expansion of Afrasians.

Interesting aspects to take into account are the distribution of R1b-V88 lineages, compared to the location of Chadic languages (probably the most divergent and least known of the group) and to the potential North Afroasiatic (composed of Egyptian, Berber, and Semitic) and South Afroasiatic (made up of Cushitic and Omotic) groups. Chadic has been argued to be connected variously to North Afroasiatic, or to the Berber branch, but the Northern group has also been argued to be connected with Cushitic, with Omotic as an independent branch. Also interesting would then be the potential connection between Indo-European (or Indo-Uralic) and Afroasiatic.

Modern distribution of haplogroup R1b, from Wikipedia

We could speculatively place the potential primary Afroasiatic homeland in the south-central Sahara, near the Megachad lake (i.e. near the peak of R1b-V88 lineages), with a secondary homeland in eastern Africa (as in the map above) – and maybe a tertiary homeland (of North Afroasiatic) in the Middle East, associated with the expansion of “Natufians” and E1b1b subclades. The identification of the spread of Afroasiatic languages with the expansion of R1b-V88 lineages needs an anthropological context (linguistic and archaeological) that is obviously lacking today.

It is important to keep all possibilities in sight when reviewing genetic analyses.


EDIT (16/7/2017): Added link to Neby’s post on a potential Semitic homeland, and Nature article on Schlebusch and Skoglund research.