Statistical methods fashionable again in Linguistics: Reconstructing Proto-Australian dialects

Reconstructing remote relationships – Proto-Australian noun class prefixation, by Mark Harvey & Robert Mailhammer, Diachronica (2017) 34(4): 470–515

Abstract:

Evaluation of hypotheses on genetic relationships depends on two factors: database size and criteria on correspondence quality. For hypotheses on remote relationships, databases are often small. Therefore, detailed consideration of criteria on correspondence quality is important. Hypotheses on remote relationships commonly involve greater geographical and temporal ranges. Consequently, we propose that there are two factors which are likely to play a greater role in comparing hypotheses of chance, contact and inheritance for remote relationships: (i) spatial distribution of corresponding forms; and (ii) language specific unpredictability in related paradigms. Concentrated spatial distributions disfavour hypotheses of chance, and discontinuous distributions disfavour contact hypotheses, whereas hypotheses of inheritance may accommodate both. Higher levels of language-specific unpredictability favour remote over recent transmission. We consider a remote relationship hypothesis, the Proto-Australian hypothesis. We take noun class prefixation as a test dataset for evaluating this hypothesis against these two criteria, and we show that inheritance is favoured over chance and contact.

My wife, who came across it while reading BBC News, pointed me to this work, and I approached it suspicious of its potential glottochronological content. However, I must say, speaking from my absolute ignorance of the main language family investigated, that it seemed in general an interesting read, with some thorough discussion and attention to detail.

The statistical analyses, however, seem to disrupt the content, and – in my opinion – do not help support its conclusions.

Map of Non-Pama-Nyungan languages.

Computer Science and Linguistics

We are evidently on alert to tackle dubious research, because of the revival of pseudoscientific methods in linguistic investigation, promoted (yet again) by Nature.

It seems that journals with the highest impact factor, in their search for groundbreaking conclusions supported by any methods involving numbers, are setting ever lower standards for academic disciplines.

NOTE. If you think about it – if glottochronology has survived the disgrace it fell into in the 2000s, to come back again now to the top of the publishing industry… How can we expect the “Yamnaya ancestry” concept to be overcome? I guess we will still see certain Eastern Europeans in 2030 arguing for elevated steppe ancestry here and there to support the conclusions of the 2015 papers, no matter what…

I am sure that worse times lie ahead for traditional comparative grammar. For example, it seems that there will be more publications on Proto-Indo-European using novel computer methods: a group led by Janhunen and Pyysalo, from the Department of Languages at the University of Helsinki, promises – under an ever-growing bubble of mystery (or so it seems from their Twitter and Facebook accounts) – a machine-implemented reconstruction (with the generative etymological PIE lexicon project) that will once and for all solve all our previous ‘inconsistencies’…

Spoiler alert for their publications: whether they choose to rely mainly on computer-implemented methods or use them to support more traditional results, their conclusions will confirm (surprise!) their authors’ previous reactionary theses, such as renewed support for traditional monolaryngealism and a rejection of Kortlandt’s or Kloekhorst’s (i.e. the Leiden School’s) theories on Proto-Indo-European phonology, and thus of a PIE relationship to Proto-Uralic, probably stressing yet again an independent origin for both proto-languages.

See also:

Another nail in the coffin for the Anatolian hypothesis: continuity and isolation in the Caucasus during the Neolithic and Chalcolithic, in mtDNA samples


A new paper has appeared in Current Biology, by Margaryan et al. (including Morten E. Allentoft): Eight Millennia of Matrilineal Genetic Continuity in the South Caucasus.

Among its conclusions:

The plot clearly shows the clustering of the ancient group together with the modern European, Armenian, and Caucasian populations. We observe none of the typical East Eurasian mtDNA lineages (A, C, D, F, G, and M) among the ancient individuals, and only one individual with haplogroup D is present in the modern Armenian maternal gene pool (Artsakh). As such, the archaeologically and historically attested migrations of Central Asian groups (e.g., Turks and Mongols) into the South Caucasus [14, 15] do not seem to have had a major contribution in the maternal gene pool of Armenians. Both geographic (mountainous area) and cultural (Indo-European-speaking Christians and Turkic-speaking Muslims) factors could have served as barriers for genetic contacts between Armenians and Muslim invaders in the 11th–14th centuries CE. The same pattern was observed using Y chromosome markers in geographically diverse Armenian groups.

Also, regarding the potential Indo-European migration into the area:

It appears that during the last eight millennia, there were no major genetic turnovers in the female gene pool in the South Caucasus, despite multiple well-documented cultural changes in the region [27, 28]. This is in contrast to the dramatic shifts of mtDNA lineages occurring in Central Europe during the same time period, which suggests either a different mode of cultural change in the two regions or that the genetic turnovers simply occurred later in Europe compared to the South Caucasus. More data from earlier Mesolithic cultures in the South Caucasus are needed to clarify this. During the highly dynamic Bronze Age and Iron Age periods, with the formation of complex societies and the emergence of distinctive cultures such as Kura-Araxes, Trialeti-Vanadsor, Sevan-Artsakh, Karmir-Berd, Karmir-Vank, Lchashen-Metsamor, and Urartian, we cannot document any changes in the female gene pool. This supports a cultural diffusion model in the South Caucasus, unless the demographic changes were heavily male biased, as was most likely the case in Europe during the Bronze Age migrations [29, 30]. However, genome-wide data from the few Bronze Age individuals published so far from the South Caucasus also support a continuity scenario [26]. Another possibility is that any gene flow into the South Caucasus occurred from groups with a very similar genetic composition, facilitating only subtle genetic changes that are not detectable with the current datasets.

I would obviously support the latter possibility, a demic diffusion that can be shown by precise subclade and admixture analyses, because cultural diffusion is quite difficult to justify in any ancient setting. Since the original marker of Palaeo-Balkan speakers was most likely south-eastern European R1b-Z2103 lineages (or R1b-M269, if it resurged during the proto-language transition in the Balkans), that is what one should be looking for in Y-DNA investigation in the area. Since migrations were probably male-biased, it is not likely that mtDNA was much affected. But, especially during the Iron Age, a change should also be seen, marked by the appearance of (recent) U subclades.

Related:

The Aryan migration debate, the Out of India models, and the modern “indigenous Indo-Aryan” sectarianism

On the origin of R1a and R1b subclades in Greece

I first saw news of the article at Eurogenes (you can see the specific samples there).

Featured image is from the article.