Datasets from Olalde et al. (Nature 2018) and Mathieson et al. (Nature 2018) are out

Datasets from Olalde et al. (Nature 2018) and Mathieson et al. (Nature 2018) – papers yet to appear – are out:

That hopefully means that both papers will be published soon…

New monograph on The Tale of Igor’s Campaign (in Russian)

Sergej Nikolaev has published a new monograph on The Tale of Igor’s Campaign (you should download and open it in a PDF viewer to view some special characters correctly):

«Слово о полку Игореве»: реконструкция стихотворного текста (“The Tale of Igor’s Campaign”: a reconstruction of the verse text), by S. L. Nikolaev (2018).

Abstract (translated from Russian):

The text of The Tale of Igor’s Campaign (hereafter “the Tale”) has come down to us in two inexact (edited) copies made from an early 16th-century manuscript, and in several excerpts from it. The layers introduced by the early 16th-century scribe (or by several scribes), namely editing in line with the Second South Slavic influence and late dialectisms, are inconsistent (§9.3.1) and did not distort the verse text of the turn of the 12th–13th centuries so much as to make its reconstruction impossible. By its genre (secular poetry), the Tale does not belong to the texts that were copied many times over in monastery scriptoria. It is therefore not excluded that the early 16th-century manuscript is a careless copy, but nonetheless the very first copy, of the Old Russian original.

The Tale may have sounded approximately as I propose in my reconstruction; the morphology and accentology of its author’s language may have been structured as I assume; and it may have been composed in the system of versification that I reconstruct. In reality, however, much may have been otherwise. The reconstruction of the accentological system and the two other hypotheses (on non-isosyllabic syllabotonic verse and on the optional vocalization of weak reduced vowels) are interlocked and form a circulus in probando. The accentological system reconstructed for the Tale is derived from the Proto-Slavic reconstruction and is confirmed by data from modern dialects, but it is not attested in Old Russian written records. A weak point of my reconstruction is the vocalization of weak reduced vowels in positions where it is needed solely for metrical reasons. In a work like this one, conjecture and risky assumptions cannot be avoided, and a number of the hypotheses put forward are “on the verge of a foul”; on the whole, however, my reconstruction is built on facts and their interpretation, and is thus a scholarly study. The work uses the results of neighbouring disciplines, first of all the study of verse. The reconstruction of the Tale presented in this book is the first attempt at systematically modelling a verse text in a hypothetical Old Russian dialect of the 12th–13th centuries, whose existence is highly probable. I would like to hope that my work will make its modest contribution to the study of this great monument of Old Russian literature.

The Tale of Igor’s Campaign is probably the oldest Slavic epic available, recorded later than what oral tradition and linguistic details reflect, like the oldest Indo-Iranian texts. It contains many details interesting for Proto-Slavic (and North-West Indo-European) language and culture reconstruction.

For those confusing recent attestation of languages with their relevance for comparative grammar, I would suggest Martin Joachim Kümmel’s article Is ancient old and modern new? Fallacies of attestation and reconstruction (with special focus on Indo-Iranian).

Featured image: Viktor Vasnetsov. After Igor Svyatoslavich’s fighting with the Polovtsy (Photographer, referenced in Wikipedia).

Correlation does not mean causation: the damage of the ‘Yamnaya ancestral component’, and the ‘Future American’ hypothesis

Human ancestry can only help solve anthropological questions by using all anthropological disciplines involved. I have said that many times in this blog.

Correlation does not mean causation

Really, it does not.

You might think the tenet ‘correlation does not mean causation’ must be self-evident by now in Statistics, and therefore to everyone using statistical methods in their research. Sadly, it is not. A lot of researchers just look for correlations and derive conclusions from them, without even a sound initial hypothesis to test… You can judge for yourself by reading, e.g., the many instances of this complaint about recent publications in the Biomedical and Social Sciences on the interesting blog Statistical Modeling, Causal Inference, and Social Science.

In anthropological questions regarding Indo-European studies there is an added handicap: to avoid at least the most obvious confounders, not taking correlation to mean causation also means taking into account the many linguistic and archaeological data available right now when explaining the expansion of Indo-European languages.

You might also believe that international researchers in Human Evolutionary Biology – after all, this is essentially a biomedical discipline – are acquainted with statistical methods and their problems when applied to their field. And that scientific journals – and especially those with the highest impact factors, like Nature, Science, or PNAS – have professional, careful reviewers who would never accept papers that equate correlation with causation, especially when Social Sciences are involved (because this alone might make errors grow exponentially…). Sadly, this is obviously not so, either.
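To make the confounder problem concrete, here is a minimal sketch in Python. It is my own toy illustration, not an analysis of any real dataset: the variable names, the noise levels, and the sample size are all assumptions chosen for the example. A hidden factor z drives both x and y, which never influence each other, and yet they end up strongly correlated. Once z is accounted for, the correlation vanishes.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=5000, seed=42):
    """Return (raw, adjusted) correlations between x and y.

    Hypothetical setup: z is a hidden common cause; x and y each equal
    z plus independent noise, so neither one causes the other.
    """
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [zi + rng.gauss(0, 0.5) for zi in z]
    y = [zi + rng.gauss(0, 0.5) for zi in z]

    raw = pearson(x, y)

    # Remove the confounder's contribution. The coefficient of z is known
    # to be 1 here only because we built the data ourselves.
    x_resid = [xi - zi for xi, zi in zip(x, z)]
    y_resid = [yi - zi for yi, zi in zip(y, z)]
    adjusted = pearson(x_resid, y_resid)

    return raw, adjusted


raw, adjusted = simulate()
print(f"raw correlation:      {raw:.2f}")
print(f"after removing z:     {adjusted:.2f}")
```

With these parameters the raw correlation is strong (about 0.8 in expectation), while the adjusted one sits close to zero. In real data nobody hands you z or its coefficient, which is exactly why the other disciplines are needed to say what the confounders might be.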

https://imgs.xkcd.com/comics/correlation.png

The ‘Yamnaya component’ concept and its damage

From Allentoft et al. (2015), emphasis is mine:

Both studies [Haak et al. (2015) and this one] found a genetic affinity between samples from a central European culture known as Corded Ware, which existed from around 2500 bc, and samples from the earlier Yamnaya steppe culture. This similarity between distant populations is best explained by a substantial westward expansion of the Yamnaya or their close relatives into central Europe (Fig. 1b). Such an expansion is consistent with the steppe hypothesis, which argues that Corded Ware cultures were a conduit for the dispersal of Indo-European languages into Europe.

More interesting than these vague words – and the short, almost invisible suggestion that Yamna may not be exactly the population behind Corded Ware peoples – are the maps that illustrated their risky hypothesis in Nature. They called it the “steppe hypothesis”, just like that, in general terms, as if everyone defending a steppe origin for Proto-Indo-European would support such a model. They were actually referring to the specific hypothesis of one of their authors (Kristiansen), one of the few archaeologists who keep Gimbutas’ concept of the ‘Kurgan peoples’ alive, based on the Corded Ware culture:

Allentoft et al. (2015): “They conclude that the Corded Ware culture of central Europe had ancestry from the Yamnaya. Allentoft et al. also show that the Afanasievo culture to the east is related to the Yamnaya, and that the Sintashta and Andronovo cultures had ancestry from the Corded Ware. Arrows indicate migrations — those from the Corded Ware reflect the evidence that people of this archaeological culture (or their relatives) were responsible for the spreading of Indo-European languages. All coloured boundaries are approximate.”

In many publications that followed, the trend has been to reproduce this graphical model, by asserting (or implying) that Bell Beaker peoples were the result of subsequent Corded Ware migrations, and indeed that Corded Ware peoples migrated from the Yamna culture, and were thus the vector of expansion for Indo-European languages in Europe.

All of this is being proven wrong, as I predicted: see Mathieson et al. (2017) and Olalde et al. (2017) for recently studied samples with ‘steppe component’, older than (and unrelated to) the Yamna culture. However, no retraction (or correction, whatever) has been published to date about the concept of the ‘Yamnaya ancestry expansion’, and its consequences.

We shall see then just a rather surreptitious shift in terminology from ‘Yamnaya’ to ‘steppe’ component, to adapt to the new data – i.e. some damage control while the ship of ‘Yamnaya ancestry’ capsizes – but little else. “Earlier ‘Yamnaya ancestry’, you say? Just, you know, let’s call it ‘steppe ancestry’ and shift the expansion of Indo-European languages to one or two thousand years earlier, and done!”

The damage of this post-truth genetics is already done: we will see the unending distribution on the Internet in general, and on social networks in particular, of these grandiose conclusions, of far-fetched Indo-European migration models that include the Corded Ware culture, of simplistic maps with apparently harmless ‘arrows of migration’ (like the above) representing fictional population movements suggesting nonexistent dialectal branches.

You might be one of those sceptics wary of so many boring statistical rules: “But it’s safe reasoning: Yamnaya samples have an ‘ancestral component’ that is found elevated in Corded Ware samples, and less so in Bell Beaker samples, and PCA showed a similar result… so the migration model Yamnaya -> Corded Ware -> Bell Beaker is a priori correct, right?”

The ‘Future American’ hypothesis

Let me illustrate this attractive “Correlation = Causation” argument, using it to solve the problem of Future American languages.

Suppose we live in a future post-apocalyptic world ca. 3500 AD, with no surviving historical records before 3000 AD. None. Just investigation of cultures and their relationship by Archaeology, proto-languages reconstructed and language families identified by Linguistics, etc.

We have thus Future Germanic and Future Romance as the only language families spoken in Future Western Europe and in the Future Americas, in a distribution similar to the present day*, and we have certain somehow related archaeologically-defined cultures on both sides of the Atlantic, like Briton, Iberian, Norman, or Lowlandish, although their distribution remains partly undefined in time and space.

* If you are really curious about this scenario, you can read about the potential evolution of a Future North-American language.

But what languages did the ancestors of Future Americans speak, and who spread them? That question remains far from being settled by our future researchers, in spite of the solidest linguistic and migration models (talking mainly about Briton and Iberian cultures): too many authorities out there questioning them, fighting to impose their own pet theories.

Suddenly, the newly developed field of Human Ancestry comes to save the day. So let’s say we have this map of ancient samples recovered (dated from, say, the 6th to the 18th century AD), and our study is centered on the newly described “Western European” component (a precise combination of, say, WHG+steppe), which peaks in early samples from the Low Lands – hence we call it, quite daringly, “Lowlandic component”.

Our group is keen to demonstrate that the ancient Lowlandic culture described in Archaeology (marked especially by the worldwide distribution of tulips among other traits) is the origin of Western European and American languages… Now, let’s reach conclusions about migrations in the Middle Ages!

‘Future American’ hypothesis. Migration routes in Western Europe and the Americas during the Middle Ages, based on the ‘Lowlandic component’.

PCA shows that South-West European samples cluster closely to some North-West European samples, and that some late South American samples available cluster at some distance from North American samples – nearer to a native component represented by two individuals with 0% Lowlandic ancestry and a different cluster in PCA. And some North-American samples cluster quite closely to North-West European samples.

Based on the decrease in ‘Lowlandic component’ in the different samples and on PCA, we conclude that Lowlandic peoples (“or their close relatives”) must have migrated at the same time to North America, South America (or potentially from North America to South America?) as well as western, central, and northern Europe. Both migration events must have happened roughly at the same time, in part because both distinct language families appear in a north-south distribution, and Proto-Lowlandic must be (according to Genetics) the ancestor of both, Proto-Future-Germanic and Proto-Future-Romance.

That makes a lot of sense! A huge Lowlandic pressure for migration, you see. Push-pull mechanisms and stuff. A Lowlandic Empire probably (scattered remains are found everywhere)! And, judging by the presence of the ‘Lowlandic component’ in Future East Europe from the Elbe to the Vistula, maybe Lowlandic peoples spread Proto-Slavic, too! We can even date the common Lowlandic-Slavic proto-language this way! So many groundbreaking conclusions!

Future scholars supporting the Lowlandic homeland are on fire; they can’t get enough of publishing papers on the subject. “Two different Future American language families with cultural origins in Britain and Iberia, my ass! Because genetics.”

And don’t forget the future people of haplogroup R1b-U106 and high Lowlandic component: Wow, they are the heirs of those who expanded Future Germanic and Future Romance languages everywhere, aren’t they? How proud they must be. And who wouldn’t want to have these tall, blond, blue-eyed Lowlanders as their forefathers? Personalised genetic analysis is selling like crazy: “let’s know our Lowlandic percentage!”. Everyone is happy, colourful maps with lots of arrows and shit…

But – your future you might ask in awe, seeing that this doesn’t sound quite right, based on your basic archaeological and linguistic knowledge:

  • What about specific models of migration proposed to date? The solidest ones, not just any one that seems to fit?
  • What about the dialectal classification of languages? The mainstream ones, not those that are compatible with this interpretation?
  • What about archaeological cultures to which individual samples belonged?
  • What about the actual dates of each sample? And how does this date relate to the state of the culture to which it belongs?
  • What about the haplogroups, and the actual subclade of each haplogroup?
  • What about the territories, cultures, and dates not sampled, could they change this interpretation in light of known archaeological models?
  • And what about the actual origin of that ancestral component they so frivolously named? Did it really appear ex nihilo in the Low Lands, and expand from there?

“Who cares! This new data is sooo coool… And it proves what we wanted, what a coincidence! And it’s numbers, mate! Numbers don’t lie.”

 
No, numbers don’t lie. But people do.

Correlation is fun, isn’t it?

 

Academic journals can’t be trusted to tell the scientific truth

Dutch researcher Julian Kirchherr has published an interesting article in The Guardian about the reliability of academic journals, and the consequences for the academic world that orbits around them.

Science (and more specifically the scientific publication market) is in a major crisis: journals are publishing a large number of articles with fake results, which cannot be replicated in other experiments, and even false data fabricated by researchers.

The interesting aspect of Mr. Kirchherr’s opinion is that, unlike many others who criticize the shortcomings of the publishing industry, he stresses the value of performance indicators – such as the number of papers published in high-impact journals – for academic institutions.

However, not only novel and surprising results (many of them obtained by chance) should be taken into account, but also good (and dull) research that doesn’t produce great discoveries, as well as the teaching prowess of a researcher.

The solution cannot be to revolutionize or get rid of the publishing industry: what we need is a proper evolution of a system in crisis, including a general, healthy distrust of methods, materials, results and discussion of any article published in any journal, and not only major scandals.