Statistical methods fashionable again in Linguistics: Reconstructing Proto-Australian dialects

Reconstructing remote relationships – Proto-Australian noun class prefixation, by Mark Harvey & Robert Mailhammer, Diachronica (2017) 34(4): 470–515


Evaluation of hypotheses on genetic relationships depends on two factors: database size and criteria on correspondence quality. For hypotheses on remote relationships, databases are often small. Therefore, detailed consideration of criteria on correspondence quality is important. Hypotheses on remote relationships commonly involve greater geographical and temporal ranges. Consequently, we propose that there are two factors which are likely to play a greater role in comparing hypotheses of chance, contact and inheritance for remote relationships: (i) spatial distribution of corresponding forms; and (ii) language specific unpredictability in related paradigms. Concentrated spatial distributions disfavour hypotheses of chance, and discontinuous distributions disfavour contact hypotheses, whereas hypotheses of inheritance may accommodate both. Higher levels of language-specific unpredictability favour remote over recent transmission. We consider a remote relationship hypothesis, the Proto-Australian hypothesis. We take noun class prefixation as a test dataset for evaluating this hypothesis against these two criteria, and we show that inheritance is favoured over chance and contact.

I was redirected to this work by my wife – who discovered it reading BBC News – , suspicious of its potential glottochronological content. However, I must say – speaking from my absolute ignorance of the main language family investigated – , that it seemed in general an interesting read, with some thorough discussion and attention to detail.

The statistical analyses, however, seem to disrupt the content, and – in my opinion – do not help support its conclusions.

Map of Non-Pama-Nyungan languages.

Computer Science and Linguistics

We are evidently on alert to tackle dubious research, because of the revival of pseudoscientific methods in linguistic investigation, promoted (yet again) by Nature.

It seems that journals with the highest impact factor, in their search for groundbreaking conclusions supported by any methods involving numbers, are setting a still lower level of standards for academic disciplines.

NOTE. If you think about it – if glottochronology has survived the disgrace it fell into in the 2000s, to come back again now to the top of the publishing industry… How can we expect the “Yamnaya ancestry” concept to be overcome? I guess we will still see certain Eastern Europeans in 2030 arguing for elevated steppe ancestry here and there to support the conclusions of the 2015 papers, no matter what…

I am sure that worse times lie ahead for traditional comparative grammar. For example, it seems that there will be more publications on Proto-Indo-European using novel computer methods: a group led by Janhunen and Pyysalo, from the Department of Languages at the University of Helsinki, promises – under an ever-growing bubble of mistery (or so it seems from their Twitter and Facebook accounts) – a machine-implemented reconstruction (with the generative etymological PIE lexicon project) that will once and for all solve all our previous ‘inconsistencies’…

Spoiler alert for their publications: whether they select to go on mainly with computer-implemented methods, or they use them to support more traditional results, their conclusions will confirm (surprise!) their authors’ previous reactionary theses, such as a renewed support for the traditional monolaryngealism, and a rejection of Kortlandt’s or Kloekhorst’s (i.e. the Leiden School’s) theories on Proto-Indo-European phonology, and thus a PIE relationship to Proto-Uralic, probably stressing yet again an independent origin for both proto-languages.

See also:

Genomics reveals four prehistoric migration waves into South-East Asia

Open access preprint article at bioRxiv Ancient Genomics Reveals Four Prehistoric Migration Waves into Southeast Asia, by McColl, Racimo, Vinner, et al. (2018).

Abstract (emphasis mine):

Two distinct population models have been put forward to explain present-day human diversity in Southeast Asia. The first model proposes long-term continuity (Regional Continuity model) while the other suggests two waves of dispersal (Two Layer model). Here, we use whole-genome capture in combination with shotgun sequencing to generate 25 ancient human genome sequences from mainland and island Southeast Asia, and directly test the two competing hypotheses. We find that early genomes from Hoabinhian hunter-gatherer contexts in Laos and Malaysia have genetic affinities with the Onge hunter-gatherers from the Andaman Islands, while Southeast Asian Neolithic farmers have a distinct East Asian genomic ancestry related to present-day Austroasiatic-speaking populations. We also identify two further migratory events, consistent with the expansion of speakers of Austronesian languages into Island Southeast Asia ca. 4 kya, and the expansion by East Asians into northern Vietnam ca. 2 kya. These findings support the Two Layer model for the early peopling of Southeast Asia and highlight the complexities of dispersal patterns from East Asia.

A model for plausible migration routes into Southeast Asia, based on the ancestry patterns observed in the ancient genomes.