Statistical methods fashionable again in Linguistics: Reconstructing Proto-Australian dialects

Reconstructing remote relationships – Proto-Australian noun class prefixation, by Mark Harvey & Robert Mailhammer, Diachronica (2017) 34(4): 470–515


Evaluation of hypotheses on genetic relationships depends on two factors: database size and criteria on correspondence quality. For hypotheses on remote relationships, databases are often small. Therefore, detailed consideration of criteria on correspondence quality is important. Hypotheses on remote relationships commonly involve greater geographical and temporal ranges. Consequently, we propose that there are two factors which are likely to play a greater role in comparing hypotheses of chance, contact and inheritance for remote relationships: (i) spatial distribution of corresponding forms; and (ii) language specific unpredictability in related paradigms. Concentrated spatial distributions disfavour hypotheses of chance, and discontinuous distributions disfavour contact hypotheses, whereas hypotheses of inheritance may accommodate both. Higher levels of language-specific unpredictability favour remote over recent transmission. We consider a remote relationship hypothesis, the Proto-Australian hypothesis. We take noun class prefixation as a test dataset for evaluating this hypothesis against these two criteria, and we show that inheritance is favoured over chance and contact.

I was redirected to this work by my wife – who discovered it reading BBC News – , suspicious of its potential glottochronological content. However, I must say – speaking from my absolute ignorance of the main language family investigated – , that it seemed in general an interesting read, with some thorough discussion and attention to detail.

The statistical analyses, however, seem to disrupt the content, and – in my opinion – do not help support its conclusions.

Map of Non-Pama-Nyungan languages.

Computer Science and Linguistics

We are evidently on alert to tackle dubious research, because of the revival of pseudoscientific methods in linguistic investigation, promoted (yet again) by Nature.

It seems that journals with the highest impact factor, in their search for groundbreaking conclusions supported by any methods involving numbers, are setting a still lower level of standards for academic disciplines.

NOTE. If you think about it – if glottochronology has survived the disgrace it fell into in the 2000s, to come back again now to the top of the publishing industry… How can we expect the “Yamnaya ancestry” concept to be overcome? I guess we will still see certain Eastern Europeans in 2030 arguing for elevated steppe ancestry here and there to support the conclusions of the 2015 papers, no matter what…

I am sure that worse times lie ahead for traditional comparative grammar. For example, it seems that there will be more publications on Proto-Indo-European using novel computer methods: a group led by Janhunen and Pyysalo, from the Department of Languages at the University of Helsinki, promises – under an ever-growing bubble of mistery (or so it seems from their Twitter and Facebook accounts) – a machine-implemented reconstruction (with the generative etymological PIE lexicon project) that will once and for all solve all our previous ‘inconsistencies’…

Spoiler alert for their publications: whether they select to go on mainly with computer-implemented methods, or they use them to support more traditional results, their conclusions will confirm (surprise!) their authors’ previous reactionary theses, such as a renewed support for the traditional monolaryngealism, and a rejection of Kortlandt’s or Kloekhorst’s (i.e. the Leiden School’s) theories on Proto-Indo-European phonology, and thus a PIE relationship to Proto-Uralic, probably stressing yet again an independent origin for both proto-languages.

See also: