Consequences of Damgaard et al. 2018 (III): Proto-Finno-Ugric & Proto-Indo-Iranian in the North Caspian region


The Indo-Iranian – Finno-Ugric connection

On the linguistic aspect, this is what the Copenhagen group had to say (in the linguistic supplement) based on Kuz’mina (2001):

(…) a northern connection is suggested by contacts between the Indo-Iranian and the Finno-Ugric languages. Speakers of the Finno-Ugric family, whose antecedent is commonly sought in the vicinity of the Ural Mountains, followed an east-to-west trajectory through the forest zone north and directly adjacent to the steppes, producing languages across to the Baltic Sea. In the languages that split off along this trajectory, loanwords from various stages in the development of the Indo-Iranian languages can be distinguished: 1) Pre-Proto-Indo-Iranian (Proto-Finno-Ugric *kekrä (cycle), *kesträ (spindle), and *-teksä (ten) are borrowed from early preforms of Sanskrit cakrá- (wheel, cycle), cattra- (spindle), and daśa- (10); Koivulehto 2001), 2) Proto-Indo-Iranian (Proto-Finno-Ugric *śata (one hundred) is borrowed from a form close to Sanskrit śatám (one hundred)), 3) Pre-Proto-Indo-Aryan (Proto-Finno-Ugric *ora (awl), *reśmä (rope), and *ant- (young grass) are borrowed from preforms of Sanskrit ā́rā- (awl), raśmí- (rein), and ándhas- (grass); Koivulehto 2001: 250; Lubotsky 2001: 308), and 4) loanwords from later stages of Iranian (Koivulehto 2001; Korenchy 1972). The period of prehistoric language contact with Finno-Ugric thus covers the entire evolution of Pre-Proto-Indo-Iranian into Proto-Indo-Iranian, as well as the dissolution of the latter into Proto-Indo-Aryan and Proto-Iranian. As such, it situates the prehistoric location of the Indo-Iranian branch around the southern Urals (Kuz’mina 2001).

NOTE. While I agree with the evident ancestral nature of the *kekrä borrowing, I will repeat it here again: I don’t believe that the distinction of late Proto-Indo-Iranian from ‘Pre-Proto-Indo-Aryan’ loans is warranted; not for words reconstructed from recent Finno-Ugric languages.

The time and place for Finno-Ugric and Indo-Iranian contacts. Late Copper Age migrations in Asia ca. 2800-2300 BC.

In this period of a Pre-Proto-Indo-Iranian community, which is to be associated with East Yamna/Poltavka, ca. 3000-2400 BC (as accepted in the supplement to de Barros Damgaard et al., Nature 2018), both Poltavka and Abashevo/Balanovo herders were expanding to the east ca. 2800-2600 BC, near the southern Urals, with Abashevo already admixing into Poltavka territory.

There is no other, clearer, later connection between Finno-Ugric and Proto-Indo-Iranian speakers. Even the arrival of the Seima-Turbino phenomenon (after ca. 2000 BC), if it brought migrants to North-East Europe, would not fit the linguistic, archaeological, or genetic data. It is by now quite clear that Seima-Turbino does not fit with incoming N1c1 lineages and/or Siberian ancestry, either, for those looking for these as potential signs of incoming Uralic speakers.

While the Copenhagen group did not have access to data from Sintashta ca. 2100 BC onwards – now available in Narasimhan et al. (2018) – when submitting the papers, we already know that there was a clear long period of slow progressive admixture in the North Caspian region. It can be seen in the genetic contribution of Yamna to incoming Abashevo groups, and in the R1b-L23 samples still appearing in Sintashta until ca. 1800 BC (as I predicted could happen).

Since the first sample signalling incoming Abashevo migrants is found in the Poltavka outlier dated ca. 2700 BC (of R1a-Z93 lineage), this represents a rather unique, several centuries long process of admixture in the North Caspian region, different from the massive Afanasevo or Bell Beaker migrations in Asia and Europe, whereby a great part of the native male population was suddenly replaced.

This offers further support for language continuity despite genetic replacement in the development of East Yamna/Poltavka (part of the Steppe EMBA cline, formed by Yamna and Afanasevo) mixing with Abashevo migrants (probably identical to Corded Ware samples) to form Potapovka, Sintashta, and later Srubna, and Andronovo communities (all forming, with Corded Ware groups, a wide Eurasian Steppe MLBA cloud). See the available data from Narasimhan et al. (2018).

Image modified from Narasimhan et al. (2018), including the most likely proto-language identification of different groups. Original description “Modeling results including Admixture events, with clines or 2-way mixtures shown in rectangles, and clouds or 3-way mixtures shown in ellipses”. See the original full image here.

The continuous interactions and migrations thus eventually left two communities in the southern Urals that were genetically similar but ethnolinguistically distinct:

  • To the north, Abashevo-Balanovo – but potentially also Fatyanovo, and related North-East European late Corded Ware groups – borrowed necessary words from Indo-Iranian neighbours, while maintaining their Finno-Ugric language and culture.
  • To the south, immigrants (or their descendants) of Abashevo origin expanding among Pre-Proto-Indo-Iranian-speaking North Caspian communities assimilated the surrounding culture and language, giving it their own accent (i.e. ‘satemizing’ it) and turning it into Proto-Indo-Iranian (see e.g. Parpola’s account).

Anthropologically, this ‘long-term founder effect’ that appears as genetic replacement is probably explained by the faster life history in MLBA North Caspian populations, likely due to a combination of changing environmental and social circumstances.

NOTE. The prevalent explanation before the latest studies on the Sintashta society was social strife and isolation of small groups, an argument I used in my demic diffusion model. Other, similar cases of proven linguistic continuity despite genetic replacement are seen in the Iberian Bronze Age after the expansion of R1b-L23 lineages (with Vasconic, Iberian, and Tartessian surviving at least until proto-historic times), and in Remote Oceania.

Diachronic map of migrations in Asia ca. 2250-1750 BC

Implications for Late PIE migrations

I am happy to see that people are resorting now to dialectal classifications and Y-DNA to explain the findings in Old Hittites, Tocharians (and related migrations), and Indo-Iranians. It is especially interesting to see precisely this Danish group downplay the relevance of ancestry and favor complex anthropological models when assessing migrations and ethnolinguistic identification.

So let’s talk about the growing elephant in the room.

It seems we all accept now Tocharian’s more archaic Late PIE nature, which is supported by waves of late Khvalynsk migrants starting probably ca. 3300 BC, as seen in different samples to the east in Central Asia, and to the south in Iran. Almost all of them share R1b-L23 lineages.

NOTE. Whereas their early LPIE dialects have not survived to historic times, the rather speculative hypotheses of Euphratic and Gutian languages may be of interest.

We also know of the coetaneous migrants that settled to the west of the Don River (in the territory of the previous late Sredni Stog culture), to form the western South-Bug / Lower Don groups, which, together with the Volga-Ural / North Caucasian groups formed the early Yamna culture, that dominated from ca. 3300 BC over the Pontic-Caspian steppe.

It is only logical that the other attested languages belonging to the common Late PIE trunk must come from these groups, which must have stuck together for quite some time after the recently proven late Khvalynsk migrations, to allow for the spread of isoglosses (not found in Tocharian) among them.

This is agreed, even by the Copenhagen group, who expressly state that Yamna is to be identified with the rest of Late PIE languages after the Tocharian-related migrations.

Early Yamna community and its migrations ca. 3000 BC onwards.

The period of an early Yamna community constrained to the Pontic-Caspian steppe (ca. 3300-3000 BC) is followed by renewed waves of Late Proto-Indo-European migrations, during which areal contacts and innovations (even between unrelated LPIE branches) can still be reconstructed.

These later migrations can be precisely described as follows (after the latest studies):

  • Yamna migrants, of mixed R1b-L51 and R1b-Z2103 lineages, settle ca. 3000-2600 BC along the lower Danube, in the Balkans and the Carpathian basin, giving rise later to groups of:
  • In the Pontic-Caspian steppe, early Yamna groups evolve into (from west to east) Late Yamna, Catacomb, and Poltavka groups, ca. 2800-2300 BC, all still dominated by R1b-L23 lineages (see discussion on the Catacomb sample), with:
    • Poltavka peoples admixing with Abashevo migrants to form admixed Potapovka and Sintashta-Petrovka groups, showing still after ca. 1800 BC a mixed society of R1a-Z93 and R1b-Z2103 lineages (see Narasimhan et al. 2018);
      • Expanding early Proto-Iranian and Proto-Indo-Aryan groups in Srubna (to the west) and Andronovo (to the east), during the first half of the 2nd millennium BC, dominate over the Bronze Age steppe and Central Asia with expanding R1a-Z93 lineages.


Diachronic map of Late Copper Age migrations including Classical Bell Beaker (east group) expansion from central Europe ca. 2600-2250 BC

1) East Bell Beakers clearly dominated culturally and genetically over almost all of Europe, ca. 2500-2000 BC, including previous Corded Ware territory, representing thus the most recent massive migration of steppe peoples in Europe, and being the only pan-European culture derived from Late Proto-Indo-European-speaking Yamna. They must therefore be identified with North-West Indo-European speakers, as proposed by Mallory (2013), and not just Italo-Celtic (as supported recently by the Danish school, based on Gimbutas’ outdated model):

1.A) For Germanic, we already have proof that an appropriate, unitary Scandinavian society, ripe for the development of a common Pre-Germanic language (that expanded much later, during the Iron Age, as Proto-Germanic) could have developed only after the arrival of Bell Beakers (see Prescott 2017). The association of proto-historic Germanic tribes mainly with the expansion of R1b-U106 lineages bears witness to that.

NOTE. Even without taking into account the likely L51 samples from Khvalynsk, it is by now quite clear that R1b-L51 lineages were already admixed in Yamna settlers from the Carpathian Basin, and any subclade of U106, L21, DF27, or U152 can thus be found everywhere in Europe associated with any of those North-West Indo-European migrations. What we are seeing later, as in the East Bell Beaker migrants arriving in the British Isles (L21), Iberia (DF27), or the Netherlands/Scandinavia (U106), is the further reduction in variability coupled with the expansion of a few successful families (and their lineages), as we know usually happens during migrations.

1.B) For Balto-Slavic, it seems they were not part of the eastern Corded Ware peoples: the Copenhagen group denies an Indo-Slavonic group in the Nature paper, referring instead to a dominion of early Iranians in the steppes, following their traces to proto-historic and historic Iranian-speaking peoples. And we knew already that Bell Beakers dominated over Central-East Europe before the resurgence of R1a-Z645 lineages in the region, which is compatible with the North-West Indo-European nature of their language undergoing a satemization process similar (but not identical) to the Indo-Iranian one (see the full discussion on Balto-Slavic here).

NOTE. The few ancestral traits common to Germanic and Balto-Slavic are today considered a common substrate language to both, and not due to close contacts (and still less a common branch, as was proposed in the 1st half of the 20th c.). You can read e.g. Kortlandt’s Baltic, Slavic, Germanic (2017), or our Corded Ware substrate hypothesis (2017). In both theories, the referenced substrate is likely a non-Indo-European language, and in both cases it is related to the Corded Ware culture, which represents their most common immediate ancestral population before the spread of Bell Beakers.

2) The late Corded Ware groups of Finland and Estonia, as well as Fatyanovo and Abashevo (and succeeding groups of Eastern Europe) may now be more clearly associated with Proto-Finno-Ugric dialects, and thus probably Corded Ware groups in general with Uralic languages, whose western branches have not survived to this day, with their culture and language being replaced quite early by expanding Bell Beakers.

NOTE. While the demise of Central and Central-East European CWC groups is evident, continuous contacts among Battle Axe culture groups in Scandinavia and the Gulf of Finland through the Baltic Sea, and the strong Bronze Age Palaeo-Germanic influence on Finnic languages (stronger than earlier Indo-Iranian borrowings), may point to the continuity of Proto-Finnic in Northern Scandinavia, which may force a reinterpretation of the prehistoric location of Proto-Finnic-speaking groups.

Those supporting a Corded Ware expansion of Germanic or Balto-Slavic with R1a subclades, now rejecting the expansion of Proto-Indo-European from an Anatolian homeland (following the spread of Neolithic farmer ancestry), and negating the close Proto-Indo-Iranian – Uralic contacts, are willfully ignoring linguistic, archaeological, and genetic data whenever it does not fit with their previous theories.

Good times ahead to chase false syllogisms and contradictions everywhere.


Y-DNA haplogroup R1b-Z2103 in Proto-Indo-Iranians?


We already know that the Sintashta -> Andronovo migrants will probably be dominated by Y-DNA R1a-Z93 lineages. However, I doubt it will be the only Y-DNA haplogroup found.

I said in my predictions for this year that there could not be much new genetic data to ascertain how Pre-Indo-Iranian survived the invasion, gradual replacement and founder effects that happened in terms of male haplogroups after the arrival of late Corded Ware migrants, and that we would probably have to rely on anthropological explanations for language continuity despite genetic replacement, as in the Basque case.

Nevertheless, since we have very few samples, I think we could still see a clear genetic contribution from Yamna to Corded Ware immigrants in the North Caspian region (from Abashevo, in turn a mix of Fatyanovo/Balanovo and Catacomb/Poltavka cultures) in terms of:

  • Ancestral components and PCA in new Sintashta-Petrovka, Andronovo, and/or later samples, similar to the ‘steppe’ drift seen in Potapovka relative to Sintashta samples, both formed by incoming Corded Ware migrants; and
  • R1b-L23 subclades, either appearing scattered during the Sintashta melting pot (of Abashevo/R1a-Z645 and East Yamna-Poltavka/R1b-Z2103 peoples), or resurging after this period, as we have seen in Pre-Balto-Slavic territory.

This contribution could better explain the obvious language continuity in the region, beautifully complementing the complex anthropological model we have now of archaeological continuity of Sintashta and Potapovka with the previous Poltavka, seen in a similar material and symbolic culture that survived the arrival of newcomers.

A lot of people seem to have been looking like crazy since O&M 2018 for some sort of connection between Corded Ware and Yamna migrants in Eastern and Central Europe (whether in SNP calls of published samples, or among almost forgotten academic papers), either to support the ideas of the 2015 papers (for those who relied on their conclusions and built, even if only mentally, far-fetched migration models around them), or just because of some sort of absurd continuity theory involving modern R1a-Z645 subclades.

NOTE. The situation we have seen with the hundreds of samples from O&M 2018, and with the recent additional Eastern European samples, depicts an unexpectedly clear-cut distinction in Y-DNA haplogroups between Corded Ware and Yamna/Bell Beaker: I really can’t see how the situation could be more obvious for everyone, so I doubt any further samples will make certain people change their minds. Their hope is, I guess, that just one sample may give some more oxygen to infinite pet theories, as we are still surprisingly seeing even with reactionary R1b autochthonous continuists in Western Europe…

However, looking into the most likely future for the field, what we should be expecting right now is continuity of Yamna ancestry and lineages in early Proto-Indo-Iranian territory. Since we only have a few samples from Sintashta-Petrovka, Potapovka, and Andronovo, I think there might be a sizeable number of R1b-Z2103 subclades in the territory inhabited by those who – no doubt – spread the language into Central Asia.

Modern Y-DNA haplogroup R1b distribution, by Maulucioni at Wikipedia

While full population replacement by R1a-Z93 lineages in the North Caspian region ca. 2000 BC is not impossible, I don’t think it is very likely, since we already know that there are R1b-Z2103 lineages widely distributed in Indo-Iranian-speaking territory. Moreover, Z93 is now known to be an older subclade than YFull’s mean formation date suggested (due to the SNP call of the Ukraine_Eneolithic I6561 sample), so what we can now infer actually happened in Sintashta -> Andronovo is not exactly the spread of haplogroup Z93 during its formation, but rather a regional reduction in its variability coupled with the expansion of some of its subclades.

The main question, after the South Asia paper is finally published, will then be:

  1. Given that Yamna peoples were an elite group of patrilineally-related families mainly of R1b-L23 subclades:
  2. Accepting that PCA, ADMIXTURE, and other statistical methods are not relevant (alone) for ethnolinguistic identification: e.g. Yamna ‘outliers’ and East Bell Beaker migrants of R1b-L23 lineages without steppe ancestry; N1c1a1a-L392 lineages and Siberian ancestry unrelated to Uralic speakers; R1a-Z645 and steppe ancestry in North-East Europe related to Uralic-speaking cultures
  3. If we find now, as I expect, genetic continuity of east Yamna in Sintashta -> Andronovo (relative to other late Corded Ware peoples), probably including haplogroup R1b-Z2103 mixed with R1a-Z93 before its further reduction of subclades (e.g. to L657) and expansion during its subsequent spread southward…

Diachronic map of migrations in Asia ca. 2250-1750 BC

Why exactly do we need Corded Ware to explain migrations of Late Indo-European speakers?

In other words: if we had the data we have today in 2015, would we have a need for Corded Ware to explain Indo-European migrations from the steppe? Are some people so blinded by their will to (appear to) be right in their past interpretations that they can’t just let go?

NOTE. On a side note, wouldn’t it be nice for this paper to publish some other R1b-L23 (x2103) sample – maybe even R1b-L51 – in Yamna, Andronovo, or Afanasevo territory, to end both autochthonous continuity theories (of North-Eastern and Western Europe) at the same time?

I really hope someone in David Reich’s team understands this matter, or else they will still identify Corded Ware as the (now probably ‘a’ instead) vector of expansion of Indo-European languages, and some of us will still have fun for another 2 or 3 years with such conclusions, until someone in the lab realizes that ancestry ≠ population ≠ ethnic identification ≠ language.

NOTE. It seems rather dull to read how people are discussing in the Twitterverse conventional constructs like ‘human race’ as found in Reich’s op-ed in The New York Times, as if such grandiose semantic discussions had any practical meaning, when basic anthropological questions actually relevant for Genomics, like the essential ancestral component ≠ people tenet, seem not to be of interest to anyone in the field…

Since our Indo-European demic diffusion model (and its consequences for our reconstruction of North-West Indo-European) and this blog are becoming more and more popular each day, judging by the constant growth in visits in the past 6 months or so, I guess the simplemindedness and predictability of certain geneticists is benefitting traditional anthropology directly, driving more and more amateur geneticists to look for sound academic models to answer the growing inconsistencies of genetic research.

NOTE. I am not saying the rejection of Corded Ware as spreading Indo-European is definitive. Maybe more samples within some years will depict a clear ancient expansion of Early or Middle Proto-Indo-Europeans from Khvalynsk to the forest-steppe and forest zone, and later with certain Corded Ware migrants into Central Europe, over whose territory a Late Indo-European dialect from Bell Beakers became the superstrate, as some have proposed in the past – e.g. to explain Krahe’s Old European hydronymy. I really doubt you could demonstrate such an old ethnolinguistic identification with a clear, unbroken archaeological trail, though, and we know now that this old hydronymy is probably of Late Indo-European nature (possibly even more recent).

What I am saying is: with the data we have now, it does not make any sense to keep the anthropological models invented by geneticists ex nihilo in 2015, and the hundred different alternative Late Indo-European migration models that are born with each new paper.

These Yamna -> Corded Ware migration models have not made any sense to me since early 2016, but now, after O&M 2017 and especially O&M 2018, I don’t think any geneticist with a little knowledge of Linguistics or Archaeology (if they are decent about their quest for truth in describing ancient European migrations) would buy them, if not for some sort of created ‘tradition’. So let’s ditch Corded Ware as Late Indo-European-speaking, let’s accept that late Corded Ware migrants should most likely be identified as early Uralic speakers, and then future data will tell if we are, again, wrong.

Please, don’t let Genomics become another pseudoscience based solely on Bioinformatics like glottochronology: let anthropologists (preferably mainstream archaeologists, but also the true Indo-Europeanists, linguists) help you interpret your raw data. Don’t deceive yourselves thinking that you have read enough about the Indo-European question, or that you know enough Indo-Europeanists (say what?) to derive your own conclusions.

Use the South Asia paper to begin expressly retracting the Corded Ware mess.

Please pretty please with sugar on top?


For commenters: this post concerns an anthropological question, and deals with the expansion of Late Proto-Indo-European speakers from Yamna, and the controversy surrounding the role of Corded Ware migrants that a handful of academics propose spread from it, based on a renewed model of Gimbutas’ outdated Kurgan theory and on the so-called ‘Yamnaya’ ancestry.

It so happens that the discussion has lately turned mainly to ancient Y-DNA haplogroups, because they help confirm previous mainstream anthropological models of cultural diffusion and migration. It is obviously not reasonable to judge prehistoric ethnolinguistic migrations from ca. 5,000 years ago based on historical nation-states and ethnic or religious concepts invented since the Middle Ages, coupled with “your” people’s main modern (or your own) paternal lineage.

EDIT (27 MAR 2018): Minor corrections and post made shorter.

Uralic as a Corded Ware substrate of Indo-Iranian, and loanwords in Finno-Ugric


Asko Parpola has recently published a new paper, Finnish vatsa ~ Sanskrit vatsá and the formation of Indo-Iranian and Uralic languages.


Finnish vatsa ‘stomach’ < PFU *vaćća < Proto-Indo-Aryan *vatsá- ‘calf’ < PIE *vet-(e)s-ó- ‘yearling’ contrasts with Finnish vasa- ‘calf’ < Proto-Iranian *vasa- ‘calf’. Indo-Aryan -ts- versus Iranian -s- reflects the divergent development of PIE *-tst- in the Iranian branch (> *-st-, with Greek and Balto-Slavic) and in the Indo-Aryan branch (> *-tt-, probably due to Uralic substratum). The split of Indo-Iranian can be traced in the archaeological record to the differentiation of the Yamnaya culture in the North Pontic and Volga steppes respectively during the third millennium BCE, due to the use of separate sources of metal: the Iranian branch was dependent on the North Caucasus, while the Indo-Aryan branch was oriented towards the Urals. It is argued that the Abashevo culture of the Mid-Volga-Kama-Belaya basins and the Sejma-Turbino trade network (2200–1900 BCE) were bilingual in Proto-Indo-Aryan and PFU, and introduced the PFU as the basis of West Uralic (Volga-Finnic) into the Netted Ware Culture of the Upper Volga-Oka (1900–200 BCE).

He thus updates his quite recent model from On the emergence, contacts and dispersal of Proto-Indo-European, Proto-Uralic and Proto-Aryan in an archaeological perspective (2017).

In it he supported a North-West Indo-European expansion with Corded Ware, and a Neolithic Proto-Uralic community in East Europe (associated with the Comb Ware culture), as I did before the famous 2015 papers.

In fact, he argues that the satemization trend of Proto-Indo-Iranian is due to a Proto-Finno-Ugric substratum in its population in the Volga-Ural region, similar to the model I propose (with the Corded Ware substratum hypothesis).

NOTE. While for Parpola the ‘satemizing’ substratum of Balto-Slavic (a NWIE dialect) may not come exactly from the same Finno-Ugric population as for Indo-Iranian, but from a different Uralic dialect (as I explain in my hypothesis), for the few extant supporters of an Indo-Slavonic group there should not be any problem identifying the same ancient substrate as for the Proto-Indo-Iranian population…

Now that North-West Indo-European is clearly associated with the Yamna -> Bell Beaker expansion, I understand that his previous model is obsolete and needs a revision.

I find it especially difficult to understand (in light of his previous theory) why he compares Indo-Aryan *vatsa- and Iranian *vasa- to assert that the former is the origin of the loanword in Finno-Ugric, when the Proto-Indo-Iranian form is essentially the same as the Indo-Aryan one, with respect to the *w- evolution into *v- in both PII and late FU dialects…

NOTE: I wrote to him yesterday asking about this issue; I will post his answer here.

EDIT (20 MAR 2018): The summary of his answer regarding his selection of Indo-Aryan *vatsa- vs. Iranian *vasa- (instead of just PII *watsa-/vatsa-) is one based on Archaeology (and likely guesstimates), since he understands the split into Iranian and Indo-Aryan to have happened early within the Yamna culture, so that the cultural admixture of Abashevo must have happened after the separation.

Potential spread of Finnic. “Distribution of the Netted Ware according to Carpelan (2002: 198). A: Emergence of the Netted Ware on the Upper Volga c. 1900 calBC. B: Spread of Netted Ware by c. 1800 calBC. C: Early Iron Age spread of Netted Ware. (After Carpelan 2002: 198 > Parpola 2012a: 151.)”

His effort to link the actual expansion of Finno-Ugric to Corded Ware territory, linking it also partially to population movements from the Seima-Turbino phenomenon – probably associated with the initial expansion of N1c lineages – is another good example of convergence of the different anthropological theories thanks to recent Genomic studies.


Admixture of Srubna and Huns in Hungarian conquerors


New preprint at BioRxiv, Mitogenomic data indicate admixture components of Asian Hun and Srubnaya origin in the Hungarian Conquerors, by Neparáczki et al. (2018).

Abstract (emphasis mine):

It has been widely accepted that the Finno-Ugric Hungarian language, originated from proto Uralic people, was brought into the Carpathian Basin by the Hungarian Conquerors. From the middle of the 19th century this view prevailed against the deep-rooted Hungarian Hun tradition, maintained in folk memory as well as in Hungarian and foreign written medieval sources, which claimed that Hungarians were kinsfolk of the Huns. In order to shed light on the genetic origin of the Conquerors we sequenced 102 mitogenomes from early Conqueror cemeteries and compared them to sequences of all available databases. We applied novel population genetic algorithms, named Shared Haplogroup Distance and MITOMIX, to reveal past admixture of maternal lineages. Phylogenetic and population genetic analysis indicated that more than one third of the Conqueror maternal lineages were derived from Central-Inner Asia and their most probable ultimate sources were the Asian Huns. The rest of the lineages most likely originated from the Bronze Age Potapovka-Poltavka-Srubnaya cultures of the Pontic-Caspian steppe, which area was part of the later European Hun empire. Our data give support to the Hungarian Hun tradition and provides indirect evidence for the genetic connection between Asian and European Huns. Available data imply that the Conquerors did not have a major contribution to the gene pool of the Carpathian Basin, raising doubts about the Conqueror origin of Hungarian language.

“Comparison of major Hg distributions from modern and ancient populations. Asian main Hg-s are designated with brackets. Major Hg distribution of Conqueror samples from this study are very similar to that of other 91 Conquerors taken from previous studies [11,12]. Scythians and ancient Xiongnus show similar Hg composition to the bracketed Asian fraction of the Conqueror samples, but Hg B is present just in Xiongnus. Modern Hungarians have very small Asian components pointing at small contribution from the Conquerors. Of the 289 modern Hungarian mitogenomes 272 are published in [29]. Scythian Hg-s are from [48,49,55,59,71–74]. Xiongnu Hg-s are from [66–69].”

Just recently another article contributed to a similar idea. I already talked about the Bronze Age R1a-Z93 sample with high steppe ancestry found in the Balkans, and its likely origin in an expansion of the Srubna or a related culture. No truce, therefore, for those looking for autochthonous continuity anywhere in Europe.

We are seeing how multiple migrations, often from the Pontic-Caspian steppe, shaped the history of the Carpathian basin (and its complex genetic structure) and of Europe in general. That is clear from many different prehistoric and historical periods, such as the expansions of Suvorovo-Novodanilovka, Yamna, Srubna, Thraco-Cimmerians, Sarmatians, Scythians, Huns…

About the linguistic interpretations based on genetics contained in the paper (Hungarian language as a legacy of Huns), well, you know my stance regarding the Yamnaya ancestral concept (and the wrong linguistic interpretations derived from it, which many sadly keep to this day), and regarding genetics in general as a way to solve language questions.

This is yet another example of how (what some people would call) “scientific data” is useless without sound anthropological models.

Featured image, from the article: “Hypothetic origin and migration route of different components of the Hungarian Conquerors. Bluish line frames the Eurasian steppe zone, within which all presumptive ancestors of the Conquerors were found. Yellow area designates the Xiongnu Empire at its zenith from which area the East Eurasian lineages originated. Phylogeographical distribution of modern East Eurasian sequence matches (Fig. 1) well correspond to this territory, especially considering that Yakuts, Evenks and Evens lived more south in the past [108], and European Tatars also originated from this area. Regions where Asian and European Scythian remains were found are labeled green, pink is the presumptive range of the Srubnaya culture. Migrants of Xiongnu origin most likely incorporated descendants of these groups. The map was created using QGIS 2.18.4[109]”.

Article available under a CC-BY-NC-ND 4.0 International license.

Discovered via Razib Khan.


The concept of “Outlier” in Human Ancestry (II): Early Khvalynsk, Sredni Stog, West Yamna, Iron Age Bulgaria, Potapovka, Andronovo…


I already wrote about the concept of outlier in Human Ancestry, so I am not going to repeat myself. This is just an update of “outliers” in recent studies, and their potential origins (here I will repeat some of the examples):

Early Khvalynsk: the three samples from the Samara region have quite different positions in PCA, from nearest to EHG (of Y-DNA haplogroup R1a) to nearest to ANE ancestry (of Y-DNA haplogroup Q). This could represent the initial consequences of the second wave of ANE ancestry (as found later in Yamna samples from a neighbouring region), possibly brought then by Eurasian migrants related to haplogroup Q.
With only three samples, this is obviously just a tentative explanation of the finds. Judging by the PCA data, the samples can only reasonably be said to show an unstable time for the region in terms of admixture (i.e. probably migration).

Ukraine Eneolithic samples offer a curious example of how the concept of outlier can change radically. In the third version (May 30th) of the preprint of Mathieson et al. (2017), the Ukraine Eneolithic sample with steppe ancestry (clustering with central European samples) was the ‘outlier’. In the fourth version (September 19th), two samples with steppe ancestry clustering close to Corded Ware samples became the ‘normal’ ones (i.e. those representing the Ukraine Eneolithic population), and the outlier was the one clustering closely with Ukraine Mesolithic samples…

PCA and Admixture for south-eastern Europe. Image modified from Mathieson et al. (2017) – Third revision (May 30th), used in the 2nd edition of the Indo-European demic diffusion model.

This is one of the funny consequences of the wrong interpretation of the ‘yamnaya component’, that made geneticists believe at first that, out of two samples (!), the ‘outlier’ was the one with ‘yamnaya’ ancestry, because this component would have been brought by an eastern immigrant from early Khvalynsk…

This example offers yet another reason why precise anthropological context is necessary to interpret results correctly. Within the Indo-European demic diffusion model (based mainly on Archaeology and Linguistics), the sample with steppe ancestry was the most logical find in the region for a potential origin of the Corded Ware culture, and it was interpreted as such well before the publication of the fourth version of Mathieson et al. (2017).

PCA of South-East European and other European samples. Image modified from Mathieson et al. (2017) – Fourth revision (September 19th), used in the 3rd edition of the Indo-European demic diffusion model.

West Yamna (to insist on the same question, the ‘yamnaya’ component): we have only four western Yamna samples, two of them showing Anatolian Neolithic ancestry (one of them, from Ukraine, with a strong ‘southern’ drift). Corded Ware migrants, on the other hand, do not show this. So we could infer that their migrations were not contemporaneous. Peoples of the Corded Ware culture expanded ca. 3300 BC to the north, along the natural corridor to the Baltic that has been proposed for this culture in Archaeology for decades (and that is well represented by Ukraine Eneolithic samples). Peoples of the Yamna culture expanded to the west, replacing the Ukraine Eneolithic population (i.e. probably those of the ‘Proto-Corded Ware culture’), and eventually mixing with Balkan populations of Anatolian Neolithic ancestry.

Potapovka, Andronovo, and Srubna: while Potapovka clusters closely to the steppe, and Andronovo (like Sintashta) clusters closely to Corded Ware (i.e. Ukraine Neolithic / Central-East European), both have certain ‘outliers’ in PCA: the former has one individual clustering closely to Corded Ware, and the latter to the steppe. Both ‘outliers’ fit well with the interpretation of the recent mixture of Corded Ware peoples with steppe populations, and they offer a different image for the evolution of populations of Potapovka and Sintashta-Petrovka, potentially influencing their language. The position of Srubna samples, nearer to Sintashta and Andronovo (but occupying the same territory as the previous Potapovka) offers the image of a late westward conquest from Corded Ware-related populations.

Diachronic map of migrations ca. 2250-1750 BC

Iron Age Bulgaria: a sample of haplogroup R1a-Z93, with more ‘yamnaya’ ancestry than any previous sample from the Balkans. For some, it might mean continuity from an older time. However, as with the Corded Ware outlier from Esperstedt before it, it is more likely a recent migrant from the steppe, i.e. from either the Srubna culture or a related group. Its relatively close clustering in PCA with certain recent Slavic populations can be interpreted in light of the multiple back-and-forth migrations in the region: of steppe populations to the west (Srubna, Cimmerians, Scythians, Sarmatians…), and of Slavic-speaking populations:

Diachronic map of Bronze Age migrations ca. 1750-1250 BC.

Well-defined outliers are, therefore, essential to understand a recent history of admixture. On the other hand, the very concept of “outlier” can be a dangerous tool, leading to wrong interpretations when there are too few samples to justify classifying any of them as such.


The concept of “outlier” in studies of Human Ancestry, and the Corded Ware outlier from Esperstedt


While writing the third version of the Indo-European demic diffusion model, I noticed that one Corded Ware sample (labelled I0104) clusters quite closely with steppe samples (i.e. Yamna, Afanasevo, and Potapovka). The other Corded Ware samples cluster, as expected, closely with east-central European samples, which include related cultures such as the Swedish Battle Axe, and later Sintashta, or Potapovka (cultures that are from the steppe proper, but are derived from Corded Ware).

I also noticed after publishing the draft that I had used the wording “Corded Ware outlier” at least once. I certainly had that term in mind when developing the third version, but I did not intend to write it down formally. Nevertheless, I think it is the right name to use.

PCA of dataset including Minoans and Mycenaeans, and Scythians and Sarmatians. The graphic has been arranged so that ancestries and samples are located in geographically friendly axes similar to north-south (Y), east-west(X). Symbols are used, in a simplified manner, in accordance with symbols for Y-DNA haplogroups used in the maps. Labels have been used for simplification of important components. Areas are drawn surrounding Yamna, Poltavka, Afanasevo, Corded Ware (including samples from Estonia, Battle Axe, and Poltavka outlier), and succeeding Sintashta and Potapovka cultures, as well as Bell Beaker. Corded Ware sample I0104, from Esperstedt, has also been labelled.

An outlier in statistics, as you can infer from the name, is a sample (more precisely, an observation) that lies distant from the others. It is a slippery concept in Human Evolutionary Biology because it has no clear definition, and it thus depends on a certain degree of subjective evaluation. In practice, it seems to be based mainly on a combination of PCA and ADMIXTURE analyses, but it should obviously also depend on the number of samples available for a given culture, and on their regional distribution.
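To make the dependence on sample size concrete, here is a minimal sketch of one possible way to flag outliers among the PCA positions of a single cultural group. It is not the method used in any of the papers discussed: the function name `flag_outliers`, the `min_samples` guard, and the modified z-score cut-off of 3.5 (a common rule of thumb) are all assumptions of mine, and the coordinates in the usage example are invented for illustration, not real data.

```python
import math
from statistics import median


def flag_outliers(points, min_samples=5, threshold=3.5):
    """Flag samples lying far from the median position of their group.

    points: list of (pc1, pc2) coordinates for samples of one group.
    Returns a list of indices, or None when the group is too small
    for an outlier label to be justified (hypothetical guard).
    """
    if len(points) < min_samples:
        return None  # too few samples: refuse to call anything an outlier

    # Median position of the group, robust to a few extreme samples.
    cx = median(p[0] for p in points)
    cy = median(p[1] for p in points)

    # Distance of every sample from that median position.
    dists = [math.hypot(x - cx, y - cy) for x, y in points]

    # Median absolute deviation (MAD) of the distances.
    med = median(dists)
    mad = median(abs(d - med) for d in dists)
    if mad == 0:
        return []  # no measurable spread, so nothing can be ranked

    # Modified z-score on the distances; only the far side is flagged.
    return [i for i, d in enumerate(dists)
            if 0.6745 * (d - med) / mad > threshold]


# Invented coordinates: six samples close together, one far away.
group = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (-0.3, 0.0),
         (0.0, -0.4), (0.3, 0.4), (3.0, 4.0)]
flagged = flag_outliers(group)
too_few = flag_outliers(group[:3] + [group[6]])
```

With seven samples the distant one is flagged; with only four, the function declines to label anything, which is the kind of caution the small Yamna and Ukraine Eneolithic sample sets seem to call for.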

We have thus certain clear cases, like the Poltavka outlier, of R1a-M417 lineage, clustering close to Corded Ware (and Sintashta, and Potapovka) samples, but far from other R1b-L23 samples from Poltavka or Yamna cultures, from neighbouring regions in the steppe.

We also have less clear observations, like Balkan Chalcolithic samples, which may or may not have been part of different cultural groups (say, related to the Suvorovo-Novodanilovka expansion, or not), which may explain their differences in ancestral components in ADMIXTURE, and in their position in PCA.

And we have a Yamna sample from western Ukraine, which – unlike the other two available samples – clusters “to the south” of east Yamna samples. Taking into account the Yamna sample from Bulgaria, clustering closely with south-eastern European samples, could you really call this an outlier? Two outliers out of four western Yamna samples? Well, maybe. If you take east and west Yamna from the steppe as a whole, and exclude the Yamna sample from Bulgaria, of course you can. Whether that classification is useful, or actually hinders a proper interpretation of western Yamna samples, and of the “Yamna component” seen in them, is a different story…

PCA for European samples of Mathieson et al. (2017)

But what then about the Corded Ware male from Esperstedt, labelled I0104, dated ca. 2430 BC, which clusters among contemporaneous steppe (Poltavka) samples, and has the greatest proportion of ‘Yamna component’ in ADMIXTURE? After all, it is different in both respects from any other Corded Ware individual – including the oldest samples available, from Latvia (ca. 2885 BC) and Tiefbrunn (ca. 2755 BC).

This sample is one of the direct links between the steppe and Corded Ware in late times, and has been the main reason for the confusion a lot of people seem to have about the “Yamna component” in Corded Ware, with some supporting a direct migration from one into the other, and a few even daring to say that “Corded Ware is indistinguishable from Yamna”(!?).

His family members, all males of haplogroup R1a-M417 (like I0104 and most males from the Corded Ware culture), show a decreased Yamna component a few generations later, which clearly indicates that this individual’s admixture came directly from the steppe, and most likely from one or multiple female ancestors. That is compatible with the nomadic nature of the Corded Ware culture (and its known exogamy practices), which connected central Europe with the steppes, up to the North Caspian region.

If labelling other samples as outliers may be interesting to improve the conclusions one can obtain from genetic research, labelling this sample is, in my opinion, essential, to avoid certain strong misconceptions about the origin of the Corded Ware culture.