Eurasian steppe chariots and social complexity during the Bronze Age


New paper (behind paywall), Eurasian Steppe Chariots and Social Complexity During the Bronze Age, by Chechushkov and Epimakhov, Journal of World Prehistory (2018).

Interesting excerpts (emphasis mine):

Nowadays, archaeologists distinguish at least three Bronze Age pictorial traditions on the basis of style, and demonstrate some parallels in the material culture. The earliest is the Yamna–Afanasievo tradition, which is characterized by the symbolic depiction of sun-headed men and animals. Another tradition is a record of the Andronovo people (Kuzmina 1994; Novozhenov 2012), who depicted in it their everyday life and the importance of wheeled transport (Novozhenov 2014a, b). Although petroglyphs on open-air natural rock surfaces are obviously hard to date, the occurrence of similar carvings on stone grave stelae within some Andronovo culture cemeteries (such as the Tamgaly Cemetery and the Samara Cemetery in Sary Arka, Kazakhstan) provide a level of chronological control. Finally, the finds of petroglyphs depicting chariots in the burials of the Karasuk culture (c. 1400–800 BC) in southern Siberia and Kazakhstan allow us to distinguish the latest tradition (Novozhenov 2014b).

“Depictions of a chariot on the petroglyphs, the Koksu River valley, Kazakhstan (redrawn after Novozhenov 2012, p. 45, with the author’s permission)”

The site of Sintashta in the steppe zone of the Southern Trans-Urals (the eastern side of the Ural Mountains) was excavated in the 1970s and yielded abundant Bronze Age material, including unparalleled evidence of six vehicles buried in graves, each with two spoked wheels accompanied by cheekpieces and sacrificial horses (Gening 1977; Gening et al. 1992). (…) Chariot remains from the Middle and Late Bronze Age in the southern Urals are quite abundant compared with early chariot remains from other parts of the world, and allow statistical analysis.

In contrast, only two wagons and one sledge were found in the Royal Cemetery of Ur (Woolley 1965), and only ten actual chariots and their parts are known from tombs of the New Kingdom of Egypt (1550–1069 BC) (Littauer and Crouwel 1985; James 1974; Herold 2006), with the rest of the information on the Near Eastern chariots coming in other forms. Two chariots and the wheels of a third were also found in the Lchashen Cemetery in Armenia (Yesayan 1960), dated to 1400–1300 BC (Pogrebova 2003, p. 397), and bronze models of chariots were found in the burial sites of neighboring Transcaucasia (Brileva 2012). Over one hundred chariots have been discovered in Shang period tombs in China, but none dates before 1200 BC (Wu 2013).

Sintashta–Petrovka chariots were functional and used for carrying passengers and, probably, for warfare. Otherwise, one would not expect to see consistency in the measurements and technological solutions (…)

(1) The technological solutions used to construct a wheel and its dimensions are derived from the measurements of the ‘wheel pits’. They allow such analysis because some had the actual imprints of felloes and spokes. (…) Due to the imprints of spokes and felloes left in the soil, it is clear that the Bronze Age people knew of and utilized the spoked wheel.

(2) Wheel track is the distance between the centerlines of two wheels on an axle. It can be estimated on the basis of the distance between the central axes of all known wheel pits, in addition to direct measurement of the eight known cases of wheel imprints.(…) the majority of findings with a mean wheel track of 136 ± 12 cm might represent either a single-driver chariot or a vehicle with two passengers who accessed the vehicle from the rear, since one extreme of this wheel-track provides enough space for a standing person, while another is suitable for a driver and passenger.

(3) The means of traction is the element that connects the vehicle to the yoke of the draft animals (Littauer et al. 2002, p. xvii). It is needed for a vehicle to be pulled by harnessed animals and is constructed as a central draft pole located between the animals, or shafts located on the external sides of the animals, called thills. (…) Using burial chamber size as a proxy, chariots had a maximum estimated length of 327 ± 20 cm, and a maximum estimated width of 205 ± 21 cm. These dimensions suggest a great similarity to six chariots of Tutankhamun that have maximum dimensions of 260 × 236 cm (Crouwel 2013).
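As a side calculation (mine, not the paper's), the ± figures quoted above can be turned into explicit ranges, which makes the "one extreme ... while another" remark about the wheel track easier to picture. This is only a minimal sketch: the `Estimate` class and variable names are hypothetical, the ± value is treated simply as a spread around the mean, and reading the Tutankhamun figure of 260 × 236 cm as length × width is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Estimate:
    """A reported mean with its ± spread, both in centimetres."""
    mean: float
    spread: float

    def bounds(self) -> tuple[float, float]:
        """Return (mean - spread, mean + spread)."""
        return (self.mean - self.spread, self.mean + self.spread)


# Figures as quoted in the excerpts above.
wheel_track = Estimate(136, 12)
max_length = Estimate(327, 20)
max_width = Estimate(205, 21)

# Tutankhamun chariots; order (length, width) is an assumption.
TUT_LENGTH, TUT_WIDTH = 260, 236

for label, est in [
    ("wheel track", wheel_track),
    ("max. length", max_length),
    ("max. width", max_width),
]:
    low, high = est.bounds()
    print(f"{label}: {low:.0f}-{high:.0f} cm")

# Plain differences between the Egyptian figures and the steppe means.
print(f"length difference: {TUT_LENGTH - max_length.mean:+.0f} cm")
print(f"width difference: {TUT_WIDTH - max_width.mean:+.0f} cm")
```

The burial-chamber figures are upper limits on vehicle size, so the differences are only a rough orientation, not a test of the paper's comparison.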

Elements of Bronze Age chariots. Image from Chechushkov (2007).

Associated individuals

(…) suggest that this person was a chief, and that the burial context illustrates his significance in the social life of the local community (Logvin and Shevnina 2008, p. 193). However, it also suggests the diverse role of the Sintashta–Petrovka elites, who were likely engaged in a number of different activities, such as warfare, craft production, food production, and a broad social life.

(…) while weapons are not universally present with chariots, they are present much more often than in non-chariot burials: more than 50% of the chariot burials are accompanied by weapons, with a clear predominance of projectile arms.

The creation, utilization, and maintenance of the chariots would have required a number of important skills, and some degree of standardization in manufacturing chariots might be related to a very small number of chariot makers. This means that the Sintashta–Petrovka craftsmen were ‘attached specialists’ and made their products following the orders and desires of those who were interested in the competitive use of chariots. Hence, the social group interested in producing and maintaining chariots sponsored all of those processes. While the nature of this social group is unclear, it is reasonable to hypothesize that it could be a group of military elites characterized by aggrandizing behavior. These people shared military identities and values, but also belonged to bigger collectives, presumably diverse kin groups. The competition between these collectives for resources, power, and prestige created the chariot complex.


Analyzing horse-headed knobs, Kovalevskaya demonstrates the evolution of horse tack from a simple muzzle to a bridle with bits during the 5th and 4th millennia BC (Kovalevskaya 2014). Her analysis correlates well with a study of pathologies in horse teeth conducted by Brown and Anthony, who suggest the appearance of bits and horseback riding at Botai and Tersek (Anthony et al. 2006). Cheekpieces became the next necessary and logical step in the evolution of means of horse control. Their appearance together with the wheeled vehicles is not a coincidence, but the development of preceding tools. After the year 2000 BC, cheekpieces often occur together with sacrificed horses—13 out of 15 Sintashta burials with cheekpieces also contain horse bones (Epimakhov and Berseneva 2012)—showing evolution in the role of horses.
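A quick aside on the "13 out of 15" figure: with such a small sample, the share of roughly 87% carries a wide margin of uncertainty. The sketch below is my own illustration, not part of the paper's analysis. It computes the share and a Wilson score interval at the conventional 95% level (z = 1.96); the function name is hypothetical.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z**2 / trials
    centre = (p + z**2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return centre - half_width, centre + half_width


# Sintashta burials with cheekpieces that also contain horse bones (from the excerpt).
share = 13 / 15
low, high = wilson_interval(13, 15)
print(f"share: {share:.1%}, 95% Wilson interval: {low:.1%} to {high:.1%}")
```

The association is strong either way, but the interval is a reminder that fifteen burials cannot pin the proportion down precisely.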

The whole paper offers an interesting summary of cultural and population events in the Pontic-Caspian steppes since the Early Yamna period. Also, horse-headed knobs!

NOTE. You can find similar information in other (free) papers from Chechushkov in his account in


Consequences of Damgaard et al. 2018 (II): The late Khvalynsk migration waves with R1b-L23 lineages


This post should probably read “Consequences of Narasimhan et al. (2018),” too, since there seems to be enough data and materials published by the Copenhagen group in Nature and Science to make a proper interpretation of the data that will appear in their corrected tables.

The finding of late Khvalynsk/early Yamna migrations, identified with early LPIE migrants almost exclusively of R1b-L23 subclades, is probably one of the most interesting findings in the recent papers regarding the Indo-European question.

Although there are still few samples to derive fully-fledged theories, they begin to depict a clearer idea of waves that shaped the expansion of Late Proto-Indo-European migrants in Eurasia during the 4th millennium BC, i.e. well before the expansion of North-West Indo-European, Palaeo-Balkan, and Indo-Iranian languages.

Late Khvalynsk expansions and archaic Late PIE

Like Anatolian, Tocharian has been described as having a more archaic nature than the rest of Late PIE. However, Pre-Tocharian belongs to the Late PIE trunk, clearly distinguishable phonetically and morphologically from Anatolian.

It is especially remarkable that – even though it expanded into Asia – it has more in common with North-West Indo-European, hence its classification (together with NWIE) as part of a Northern group, unrelated to Graeco-Aryan.

The linguistic supplement by Kroonen et al. accepts that peoples from the Afanasevo culture (ca. 3000-2500 BC) are the most likely ancestors of Tocharians.

NOTE. For those equating the Tarim Mummies (of R1a-Z93 lineages) with Tocharians, you have this assertion from the linguistic supplement, which I support:

An intermediate stage has been sought in the oldest so-called Tarim Mummies, which date to ca. 1800 BCE (Mallory and Mair 2000; Wáng 1999). However, also the language(s) spoken by the people(s) who buried the Tarim Mummies remain unknown, and any connection between them and the Afanasievo culture on the one hand or the historical speakers of Tocharian on the other has yet to be demonstrated (cf. also Mallory 2015; Peyrot 2017).

New samples of late Khvalynsk origin

These are the recent samples that could, with more or less certainty, correspond to migration waves from late Khvalynsk (or early Yamna), from oldest to most recent:

  • The Namazga III samples from the Late Eneolithic period (in Turkmenistan), dated ca. 3360-3000 BC (one of haplogroup J), potentially showing the first wave of EHG-related steppe ancestry into South Asia. Not related to Indo-Iranian migrations.

NOTE. A proper evaluation with further samples from Narasimhan et al. (2018) is necessary, though, before we can assert a late Khvalynsk origin of this ancestry.

  • Afanasevo samples, dated ca. 3081-2450 BC, with all samples dated before ca. 2700 BC uniformly of R1b-Z2103 subclades, sharing a common genetic cluster with Yamna, showing together the most likely genomic picture of late Khvalynsk peoples.

NOTE 1. Anthony (2007) put this expansion from Repin ca. 3300-3000 BC, while his most recent review (2015) of his own work put its completion ca. 3000-2800. While the migration into Afanasevo may have lasted some time, the wave of migrants (based on the most recent radiocarbon dates) must be set at least before ca. 3100 BC from Khvalynsk.

NOTE 2. I proposed that we could find R1b-L51 in Afanasevo, presupposing the development of R1b-L51 and R1b-Z2103 lineages with separating clans, and thus with dialectal divisions. While finding this is still possible within Khvalynsk regions, it seems we will have a division of these lineages already ca. 4250-4000 BC, which would require a closer follow-up of the different inner late Khvalynsk groups and their samples. For the moment, we don’t have a clear connection through lineages between North-West Indo-European groups and Tocharian.

Early Copper Age migrations in Asia ca. 3300-2800, according to Anthony (2015).
  • Subsequent and similar migration waves are probably to be suggested from the new sample of Karagash, beyond the Urals (attributed to the Yamna culture, hence maintaining cultural contacts after the migration waves), of R1b-Z2103 subclade, ca. 3018-2887 BC, potentially connected then to the event that caused the expansion of Yamna migrants westward into the Carpathians at the same time. Not related to Indo-Iranian migrations.
  • The isolated Darra-e Kur sample, without cultural attribution, ca. 2655 BC, of R1b-L151 lineage. Not related to Indo-Iranian migrations.
  • The Hajji Firuz samples: I4243 dated ca. 2326 BC, female, with a clear inflow of steppe ancestry; and I2327 (probably to be dated to the late 3rd millennium BC or after that), of R1b-Z2103 lineage. Not related to Indo-Iranian migrations.

NOTE. A new radiocarbon dating of I2327 is expected, to correct the currently available date of 5900-5000 BC. Since it clusters nearer to Chalcolithic samples from the site than I4243 (from the same archaeological site), it is possible that both are part of similar groups receiving admixture around this period, or maybe I2327 is from a later period, coinciding with the Iron Age sample F38 from Iran (Broushaki et al. 2016), with which it closely clusters. Also, the finding of EHG-related ancestry in Maykop samples dated ca. 3700-3000 BC (maybe with R1b-L23 subclades) offers another potential source of migrants for this Iranian group.

NOTE. Samples from Narasimhan et al. (2018) still need to be published in corrected tables, which may change the actual subclades shown here.

These late Khvalynsk / early Yamna migration waves into Asia are quite early compared to the Indo-Iranian migrations, whose ancestors can only be first identified with Volga-Ural groups of Yamna/Poltavka (ca. 3000-2400 BC), with its fully formed language expanding only with MLBA waves ca. 2300-1200 BC, after mixing with incoming Abashevo migrants.

While the authors apparently forget to reference the previous linguistic theories whereby Tocharian is more archaic than the rest of Late PIE dialects, they refer to the ca. 1,000-year gap between Pre-Tocharian and Proto-Indo-Iranian migrations, and thus their obvious difference:

The fact that Tocharian is so different from the Indo-Iranian languages can only be explained by assuming an extensive period of linguistic separation.

Potential linguistic substrates in the Middle East

A few words about relevant substrate language proposals.

Euphratic language

What Gordon Whittaker proposes is a North-West Indo-European-related substratum in Sumerian language and texts ca. 3500 BC, which may explain some non-Sumerian, non-Semitic word forms. It is just one of many theories concerning this substratum.

Diachronic map of Eneolithic migrations ca. 4000-3100 BC

This is a summary of his findings from his latest writing on the subject (a chapter of a book on Indo-European phonetics, from the series Copenhagen Studies in Indo-European):

In Sumerian and Akkadian vocabulary, the cuneiform writing system, and the names of deities and places in Southern Mesopotamia a body of lexical material has been preserved that strongly suggests influence emanating from a superstrate of Indo-European origin. This Indo-European language, which has been given the name Euphratic, is, at present, attested only indirectly through the filters of Sumerian and Akkadian. The attestations consist of words and names recorded from the mid-4th millennium BC (Late Uruk period) onwards in texts and lexical lists. In addition, basic signs that originally had a recognizable pictorial structure in proto-cuneiform preserve (at least from the early 3rd millennium on) a number of phonetic values with no known motivation in Sumerian lexemes related semantically to the items depicted. This suggests that such values are relics from the original logographic values for the items depicted and, thus, that they were inherited from a language intimately associated with the development of writing in Mesopotamia. Since specialists working on proto-cuneiform, most notably Robert K. Englund of the Cuneiform Digital Library Initiative, see little or no evidence for the presence of Sumerian in the corpus of archaic tablets, the proposed Indo-European language provides a potential solution to this problem. It has been argued that this language, Euphratic, had a profound influence on Sumerian, not unlike that exerted by Sumerian and Akkadian on each other, and that the writing system was the primary vehicle of this influence. The phonological sketch drawn up here is an attempt to chart the salient characteristics of this influence, by comparing reconstructed Indo-European lexemes with similarly patterned ones in Sumerian (and, to a lesser extent, in Akkadian).

His original model, based on phonetic values in basic proto-cuneiform signs, is quite imaginative and a very interesting read, if you have the time. His account hosts most of his papers on the subject.

We could speculate about the potential expansion of this substrate language with the commercial contacts between Uruk and Maykop (as I did), now probably more strongly supported because of the EHG found in Maykop samples.

NOTE. We could also put it in relation with the Anatolian language of Mari, but this would require a reassessment of its North-West Indo-European nature.

Nevertheless, this theory is far from being mainstream, anywhere. At least today.

NOTE. The proposal remains hypothetical, because of the flaws in the Indo-European parallels – similar to Koch’s proposal of Indo-European in Tartessian inscriptions. A comprehensive critical approach to the theory is found in Sylvie Vanséveren’s A “new” ancient Indo-European language? On assumed linguistic contacts between Sumerian and Indo-European “Euphratic”, in JIES (2008) 36:3&4.

Gutian language

References to Gutian are popping up related to the Hajji Firuz samples of the mid-3rd millennium.

The hypothesis was put forward by Henning (1978) in purely archaeological terms.

This is the relevant excerpt from the book:

(…) Comparativists have asserted that, in spite of its late appearance, Tokharian is a relatively archaic form of Indo-European. This claim implies that the speakers of this group separated from their Indo-European brethren at a comparatively early date. They should accordingly have set out on their migrations rather early, and should have appeared within the Babylonian sphere of influence also rather early. Earlier, at any rate, than the Indo-Iranians, who spoke a highly developed (therefore probably later) form of Indo-European. Moreover, as some of the Indo-Iranians after their division into Iranians and Indo-Aryans appeared in Mesopotamia about 1500 B.C., we should expect the Proto-Tokharians about 2000 B.C. or even earlier.

If, armed with these assumptions as our working hypothesis, we look through the pages of history, we find one nation – one nation only – that perfectly fulfills all three conditions, which, therefore, entitles us to recognize it as the “Proto-Tokharians”. Its name was Guti; the initial is also spelled with q (a voiceless back velar or pharyngeal), but the spelling with g is the original one. The closing -i is part of the name, for the Akkadian case-endings are added to it, nom. Gutium etc. Guti (or Gutium, as some scholars prefer) was valid for the nation, considered as an entity, but also for the territory it occupied.

The text goes on to follow the invasion of Babylonia by the Guti, and further eastward expansions supposedly connected with these, to form the attested Tocharians.

The referenced text by Thorkild Jacobsen offers some interesting linguistic data:

Among the Gutian rulers is one Elulumesh, whose name is evidently Akkadian Elulum slightly “Gutianized” by the Gutian case(?) ending -eš. This Gutian ruler Elulum is obviously the same man whom we find participating in the scramble for power after the death of Shar-kali-sharrii; his name appears there in Sumerian form without mimation as Elulu.

The Gutian dynasty, from ca. 22nd c. BC appears as follows:


I don’t think we could derive a potential relation to any specific Indo-European branch from this simple suffix repeated in Gutian rulers, though.

The hypothesis of the Tocharian-like nature of the Guti (apart from the obvious error of considering them as the ancestors of Tocharians) has not been tested in later works. It was cited e.g. by Gamkrelidze and Ivanov (1995) to advance their Armenian homeland, and by Mallory and Adams in their Encyclopedia (1997).

It lies therefore in the obscurity of undeveloped archaeological-linguistic hypotheses, and its connection with the attested R1b-Z2103 samples from Iran is not (yet) warranted.


Y-DNA haplogroup R1b-Z2103 in Proto-Indo-Iranians?


We already know that the Sintashta -> Andronovo migrants will probably be dominated by Y-DNA R1a-Z93 lineages. However, I doubt it will be the only Y-DNA haplogroup found.

I said in my predictions for this year that there could not be much new genetic data to ascertain how Pre-Indo-Iranian survived the invasion, gradual replacement and founder effects that happened in terms of male haplogroups after the arrival of late Corded Ware migrants, and that we should probably have to rely on anthropological explanations for language continuity despite genetic replacement, as in the Basque case.

Nevertheless, since we have very few samples, I think we could still see a clear genetic contribution from Yamna to Corded Ware immigrants in the North Caspian region (from Abashevo, in turn a mix of Fatyanovo/Balanovo and Catacomb/Poltavka cultures) in terms of:

  • Ancestral components and PCA in new Sintashta-Petrovka, Andronovo, and/or later samples – similar to the ‘steppe’ drift seen in Potapovka relative to Sintashta samples, both formed by incoming Corded Ware migrants –; and
  • R1b-L23 subclades, either appearing scattered during the Sintashta melting pot (of Abashevo/R1a-Z645 and East Yamna-Poltavka/R1b-Z2103 peoples), or resurging after this period, as we have seen in Pre-Balto-Slavic territory.

This contribution could better explain the obvious language continuity in the region, beautifully complementing the complex anthropological model we have now of archaeological continuity of Sintashta and Potapovka with the previous Poltavka, seen in a similar material and symbolic culture that survived the arrival of newcomers.

A lot of people seem to be looking like crazy since O&M 2018 for some sort of connection between Corded Ware and Yamna migrants in Eastern and Central Europe (whether in SNP calls of samples published, or among almost forgotten academic papers), either to support the ideas of the 2015 papers – for those who relied on their conclusions and built (even if only mentally) far-fetched migration models around them –, or just because of some sort of absurd continuity theory involving modern R1a-Z645 subclades:

NOTE. The situation we have seen with the hundreds of samples from O&M 2018, and with the recent additional Eastern European samples, depicts an unexpected, absolutely clear-cut distinction in Y-DNA haplogroups between Corded Ware and Yamna/Bell Beaker: I really can’t see how the situation could be more obvious for everyone, so I doubt any further samples will make certain people change their minds. Their hope is, I guess, that just one sample may give some more oxygen to infinite pet theories, as we are still surprisingly seeing even with reactionary R1b autochthonous continuists in Western Europe…

However, looking into the most likely future for the field, what we should be expecting right now is continuity of Yamna ancestry and lineages in early Proto-Indo-Iranian territory. Since we only have a few samples from Sintashta-Petrovka, Potapovka, and Andronovo, I think there might be a sizeable number of R1b-Z2103 subclades in the territory inhabited by those who – no doubt – spread the language into Central Asia.

Modern Y-DNA haplogroup R1b distribution, by Maulucioni at Wikipedia

While full population replacement by R1a-Z93 lineages in the North Caspian region ca. 2000 BC is not impossible, I don’t think it is very likely, since we already know that there are R1b-Z2103 lineages widely distributed in Indo-Iranian-speaking territory, and Z93 is now known to be an older subclade than YFull’s mean formation date suggested (due to the Ukraine_Eneolithic I6561 sample’s SNP call). What we can infer now is that what actually happened in Sintashta -> Andronovo was not exactly the spread of haplogroup Z93 during its formation, but rather a regional reduction in its variability coupled with the expansion of some of its subclades.

The main question, after the South Asia paper is finally published, will then be:

  1. Given that Yamna peoples were an elite group of patrilineally-related families mainly of R1b-L23 subclades:
  2. Accepting that PCA, ADMIXTURE, and other statistical methods are not relevant (alone) for ethnolinguistic identification: e.g. Yamna ‘outliers’ and East Bell Beaker migrants of R1b-L23 lineages without steppe ancestry; N1c1a1a-L392 lineages and Siberian ancestry unrelated to Uralic speakers; R1a-Z645 and steppe ancestry in North-East Europe related to Uralic-speaking cultures
  3. If we find now, as I expect, genetic continuity of east Yamna in Sintashta -> Andronovo (relative to other late Corded Ware peoples), probably including haplogroup R1b-Z2103 mixed with R1a-Z93 before its further reduction of subclades (e.g. to L657) and expansion during its subsequent spread southward…

Diachronic map of migrations in Asia ca. 2250-1750 BC

Why exactly do we need Corded Ware to explain migrations of Late Indo-European speakers?

In other words: if we had the data we have today in 2015, would we have a need for Corded Ware to explain Indo-European migrations from the steppe? Are some people so blinded by their will to (appear to) be right in their past interpretations that they can’t just let go?

NOTE. Wouldn’t it be nice for this paper to publish some other R1b-L23 (xZ2103) sample – maybe even R1b-L51 – in Yamna, Andronovo, or Afanasevo territory, to end both autochthonous continuity theories (of North-Eastern and Western Europe) at the same time?

I really hope someone in David Reich’s team understands this matter, or else they will still identify Corded Ware as the (now probably ‘a’ instead) vector of expansion of Indo-European languages, and some of us will still have fun for another 2 or 3 years with such conclusions, until someone in the lab realizes that ancestry ≠ population ≠ ethnic identification ≠ language.

NOTE. It seems rather dull to read how people are discussing in the Twitterverse conventional constructs like ‘human race’ as found in Reich’s op-ed in The New York Times, as if such grandiose semantic discussions had any practical meaning, when basic anthropological questions actually relevant for Genomics, like the essential ancestral component ≠ people tenet, seem not to be of interest to anyone in the field…

Since our Indo-European demic diffusion model (and its consequences for our reconstruction of North-West Indo-European) and this blog are becoming more and more popular each day – judging by the constant growth in visits in the past 6 months or so –, I guess the simplemindedness and predictability of certain geneticists is benefitting traditional anthropology directly, driving more and more amateur geneticists to look for sound academic models to answer the growing inconsistencies of genetic research.

NOTE. I am not saying the rejection of Corded Ware as spreading Indo-European is definitive. Maybe more samples within some years will depict a clear ancient expansion of Early or Middle Proto-Indo-Europeans from Khvalynsk to the forest-steppe and forest zone, and later with certain Corded Ware migrants into Central Europe, over whose territory a Late Indo-European dialect from Bell Beakers became the superstrate, as some have proposed in the past – e.g. to explain Krahe’s Old European hydronymy. I really doubt you could demonstrate such an old ethnolinguistic identification with a clear, unbroken archaeological trail, though, and we know now that this old hydronymy is probably of Late Indo-European nature (possibly even more recent).

What I am saying is: with the data we have now, it does not make any sense to keep the anthropological models invented by geneticists ex nihilo in 2015, and the hundred different alternative Late Indo-European migration models that are born with each new paper.

These Yamna -> Corded Ware migration models haven’t made any sense to me since early 2016, but now after O&M 2017, and especially O&M 2018, I don’t think any geneticist with a little knowledge of Linguistics or Archaeology (if they are decent about their quest for truth in describing ancient European migrations) would buy them, if not for some sort of created ‘tradition’. So let’s ditch Corded Ware as Late Indo-European-speaking, let’s accept that late Corded Ware migrants should most likely be identified as early Uralic speakers, and then future data will tell if we are – again – wrong.

Please, don’t let Genomics become another pseudoscience based solely on Bioinformatics like glottochronology: let anthropologists (preferably mainstream archaeologists, but also the true Indo-Europeanists, linguists) help you interpret your raw data. Don’t deceive yourselves thinking that you have read enough about the Indo-European question, or that you know enough Indo-Europeanists (say what?) to derive your own conclusions.

Use the South Asia paper to begin expressly retracting the Corded Ware mess.

Please pretty please with sugar on top?


For commenters: this post concerns an anthropological question, and deals with the expansion of Late Proto-Indo-European speakers from Yamna, and the controversy surrounding the role of Corded Ware migrants that a handful of academics propose spread from it, based on a renewed model of Gimbutas’ outdated Kurgan theory and on the so-called ‘Yamnaya’ ancestry.

It so happens that the discussion has turned lately mainly to ancient Y-DNA haplogroups, because they help confirm previous mainstream anthropological models of cultural diffusion and migration. It is obviously not reasonable to judge prehistoric ethnolinguistic migrations from ca. 5,000 years ago based on historical nation-states and ethnic or religious concepts invented since the Middle Ages, coupled with “your” people’s main modern (or your own) paternal lineage.

EDIT (27 MAR 2018): Minor corrections and post made shorter.

Admixture of Srubna and Huns in Hungarian conquerors


New preprint at BioRxiv, Mitogenomic data indicate admixture components of Asian Hun and Srubnaya origin in the Hungarian Conquerors, by Neparáczki et al. (2018).

Abstract (emphasis mine):

It has been widely accepted that the Finno-Ugric Hungarian language, originated from proto Uralic people, was brought into the Carpathian Basin by the Hungarian Conquerors. From the middle of the 19th century this view prevailed against the deep-rooted Hungarian Hun tradition, maintained in folk memory as well as in Hungarian and foreign written medieval sources, which claimed that Hungarians were kinsfolk of the Huns. In order to shed light on the genetic origin of the Conquerors we sequenced 102 mitogenomes from early Conqueror cemeteries and compared them to sequences of all available databases. We applied novel population genetic algorithms, named Shared Haplogroup Distance and MITOMIX, to reveal past admixture of maternal lineages. Phylogenetic and population genetic analysis indicated that more than one third of the Conqueror maternal lineages were derived from Central-Inner Asia and their most probable ultimate sources were the Asian Huns. The rest of the lineages most likely originated from the Bronze Age Potapovka-Poltavka-Srubnaya cultures of the Pontic-Caspian steppe, which area was part of the later European Hun empire. Our data give support to the Hungarian Hun tradition and provides indirect evidence for the genetic connection between Asian and European Huns. Available data imply that the Conquerors did not have a major contribution to the gene pool of the Carpathian Basin, raising doubts about the Conqueror origin of Hungarian language.

“Comparison of major Hg distributions from modern and ancient populations. Asian main Hg-s are designated with brackets. Major Hg distribution of Conqueror samples from this study are very similar to that of other 91 Conquerors taken from previous studies [11,12]. Scythians and ancient Xiongnus show similar Hg composition to the bracketed Asian fraction of the Conqueror samples, but Hg B is present just in Xiongnus. Modern Hungarians have very small Asian components pointing at small contribution from the Conquerors. Of the 289 modern Hungarian mitogenomes 272 are published in [29]. Scythian Hg-s are from [48,49,55,59,71–74]. Xiongnu Hg-s are from [66–69].”

Just recently, another article contributed to a similar idea. I have already talked about the Bronze Age R1a-Z93 sample with high steppe ancestry found in the Balkans, and its likely origin in an expansion of the Srubna or a related culture. No truce, therefore, for those looking for autochthonous continuity anywhere in Europe.

We are seeing how multiple migrations, often from the Pontic-Caspian steppe, shaped the history of the Carpathian Basin (and its complex genetic structure) and of Europe in general. That is clear from many different prehistoric and historical periods, such as the expansions of Suvorovo-Novodanilovka, Yamna, Srubna, Thraco-Cimmerians, Sarmatians, Scythians, Huns…

As for the linguistic interpretations based on genetics contained in the paper (Hungarian language as a legacy of the Huns): well, you know my stance regarding the Yamnaya ancestral concept (and the wrong linguistic interpretations derived from it, which many sadly keep to this day), and regarding the use of genetics in general to solve language questions.

This is yet another example of how (what some people would call) “scientific data” is useless without sound anthropological models.

Featured image, from the article: “Hypothetic origin and migration route of different components of the Hungarian Conquerors. Bluish line frames the Eurasian steppe zone, within which all presumptive ancestors of the Conquerors were found. Yellow area designates the Xiongnu Empire at its zenith from which area the East Eurasian lineages originated. Phylogeographical distribution of modern East Eurasian sequence matches (Fig. 1) well correspond to this territory, especially considering that Yakuts, Evenks and Evens lived more south in the past [108], and European Tatars also originated from this area. Regions where Asian and European Scythian remains were found are labeled green, pink is the presumptive range of the Srubnaya culture. Migrants of Xiongnu origin most likely incorporated descendants of these groups. The map was created using QGIS 2.18.4[109]”.

Article available under a CC-BY-NC-ND 4.0 International license.

Discovered via Razib Khan.


The concept of “Outlier” in Human Ancestry (II): Early Khvalynsk, Sredni Stog, West Yamna, Iron Age Bulgaria, Potapovka, Andronovo…


I already wrote about the concept of outlier in Human Ancestry, so I am not going to repeat myself. This is just an update of “outliers” in recent studies, and their potential origins (here I will repeat some of the examples):

Early Khvalynsk: the three samples from the Samara region have quite different positions in PCA, from nearest to EHG (of Y-DNA haplogroup R1a) to nearest to ANE ancestry (of Y-DNA haplogroup Q). This could represent the initial consequences of the second wave of ANE ancestry (as found later in Yamna samples from a neighbouring region), possibly brought at that time by Eurasian migrants related to haplogroup Q.
With only three samples, this is obviously just a tentative explanation of the finds. Judging by their positions in PCA, the samples can only reasonably be said to show an unstable time for the region in terms of admixture (i.e. probably migration).

Ukraine Eneolithic samples offer a curious example of how the concept of outlier can change radically: from the third version (May 30th) of the preprint paper of Mathieson et al. (2017), when the Ukraine Eneolithic sample with steppe ancestry (and clustering with central European samples) was the ‘outlier’, to the fourth version (September 19th), when two samples with steppe ancestry clustering close to Corded Ware samples were now the ‘normal’ ones (i.e. those representing Ukraine Eneolithic population), and the outlier was the one clustering closely with Ukraine Mesolithic samples…

PCA and Admixture for south-eastern Europe. Image modified from Mathieson et al. (2017) – Third revision (May 30th), used in the 2nd edition of the Indo-European demic diffusion model.

This is one of the funny consequences of the wrong interpretation of the ‘yamnaya component’, which made geneticists believe at first that, out of two samples (!), the ‘outlier’ was the one with ‘yamnaya’ ancestry, because this component would have been brought by an eastern immigrant from early Khvalynsk…

This example offers yet another reason why precise anthropological context is necessary to offer the right interpretation of results. Within the Indo-European demic diffusion model (based mainly on Archaeology and Linguistics), the sample with steppe ancestry was the most logical find in the region for a potential origin of the Corded Ware culture, and it was interpreted as such, well before the publication of the fourth version of Mathieson et al. (2017).

PCA of South-East European and other European samples. Image modified from Mathieson et al. (2017) – Fourth revision (September 19th), used in the 3rd edition of the Indo-European demic diffusion model.

West Yamna (to insist on the same question, the ‘yamnaya’ component): we have only four western Yamna samples, two of them showing Anatolian Neolithic ancestry (one of them, from Ukraine, with a strong ‘southern’ drift). Corded Ware migrants, on the other hand, do not show this. So we could infer that their migrations were not contemporaneous: whereas peoples of the Corded Ware culture expanded ca. 3300 BC to the north, through the natural corridor to the Baltic that has been proposed for this culture in Archaeology for decades (and that is well represented by Ukraine Eneolithic samples), peoples of the Yamna culture expanded to the west, replacing the Ukraine Eneolithic population (i.e. probably those of the ‘Proto-Corded Ware culture’), and eventually mixing with Balkan populations of Anatolian Neolithic ancestry.

Potapovka, Andronovo, and Srubna: while Potapovka clusters closely with the steppe, and Andronovo (like Sintashta) clusters closely with Corded Ware (i.e. Ukraine Neolithic / Central-East European), both have certain ‘outliers’ in PCA: the former has one individual clustering closely with Corded Ware, and the latter one clustering with the steppe. Both ‘outliers’ fit well with the interpretation of a recent mixture of Corded Ware peoples with steppe populations, and they offer a different image for the evolution of the populations of Potapovka and Sintashta-Petrovka, potentially influencing their language. The position of Srubna samples, nearer to Sintashta and Andronovo (but occupying the same territory as the previous Potapovka), offers the image of a late westward conquest by Corded Ware-related populations.

Diachronic map of migrations ca. 2250-1750 BC

Iron Age Bulgaria: a sample of haplogroup R1a-Z93, with more ‘yamnaya’ ancestry than any previous sample from the Balkans. For some, it might mean continuity from an older time. However, as with the Corded Ware outlier from Esperstedt before it, it is more likely a recent migrant from the steppe, i.e. from either the Srubna culture or a related group. Its relatively close clustering in PCA with certain recent Slavic populations can be interpreted in light of the multiple back-and-forth migrations in the region: of steppe populations to the west (Srubna, Cimmerians, Scythians, Sarmatians…), and of Slavic-speaking populations:

Diachronic map of Bronze Age migrations ca. 1750-1250 BC.

Well-defined outliers are, therefore, essential to understand a recent history of admixture. On the other hand, the very concept of “outlier” can be a dangerous tool, leading to wrong interpretations when too few samples are available to justify the classification.


The concept of “outlier” in studies of Human Ancestry, and the Corded Ware outlier from Esperstedt


While writing the third version of the Indo-European demic diffusion model, I noticed that one Corded Ware sample (labelled I0104) clusters quite closely with steppe samples (i.e. Yamna, Afanasevo, and Potapovka). The other Corded Ware samples cluster, as expected, closely with east-central European samples, which include related cultures such as the Swedish Battle Axe, and later Sintashta, or Potapovka (cultures that are from the steppe proper, but are derived from Corded Ware).

I also noticed after publishing the draft that I had used the wording “Corded Ware outlier” at least once. I certainly had that term in mind when developing the third version, but I did not intend to write it down formally. Nevertheless, I think it is the right name to use.

PCA of dataset including Minoans and Mycenaeans, and Scythians and Sarmatians. The graphic has been arranged so that ancestries and samples are located in geographically friendly axes similar to north-south (Y), east-west(X). Symbols are used, in a simplified manner, in accordance with symbols for Y-DNA haplogroups used in the maps. Labels have been used for simplification of important components. Areas are drawn surrounding Yamna, Poltavka, Afanasevo, Corded Ware (including samples from Estonia, Battle Axe, and Poltavka outlier), and succeeding Sintashta and Potapovka cultures, as well as Bell Beaker. Corded Ware sample I0104, from Esperstedt, has also been labelled.

An outlier in Statistics, as you can infer from the name, is a sample (more precisely, an observation) that lies far from the others. It is a slippery concept in Human Evolutionary Biology, because it has no clear definition, and it is thus dependent on a certain degree of subjective evaluation. It seems to be based mainly on a combination of PCA and ADMIXTURE analyses, but it should obviously also depend on the number of samples available for a certain culture, and on their regional distribution.
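
To make the dependence on sample size concrete, here is a minimal Python sketch of one possible way to flag outliers within a single group in PCA space. It is not the method used in any of the papers discussed: the function name, the coordinates, and the minimum group size of five are all hypothetical, and the rule applied (a modified z-score on the distance to the group's median position, with the conventional 3.5 cut-off) is just one common convention among many.

```python
import math
import statistics


def flag_outliers(points, threshold=3.5, min_samples=5):
    """Flag points that lie far from the group's median position.

    points: list of (pc1, pc2) coordinates for ONE cultural group.
    Returns a list of booleans (True = flagged), or None when the
    group is too small for the label to mean anything.
    min_samples=5 is an arbitrary, illustrative choice.
    """
    if len(points) < min_samples:
        return None  # e.g. three Khvalynsk samples: do not label anything

    # Robust centre of the group: coordinate-wise median.
    cx = statistics.median(p[0] for p in points)
    cy = statistics.median(p[1] for p in points)

    # Euclidean distance of every sample to that centre.
    dists = [math.hypot(x - cx, y - cy) for x, y in points]

    # Median absolute deviation (MAD) of the distances.
    med = statistics.median(dists)
    mad = statistics.median(abs(d - med) for d in dists)
    if mad == 0:
        return [False] * len(points)  # no spread to measure against

    # Modified z-score; only unusually LARGE distances are flagged.
    return [0.6745 * (d - med) / mad > threshold for d in dists]


# Hypothetical PCA coordinates: six samples close together, one far away.
group = [(0.00, 0.00), (0.02, 0.01), (-0.03, 0.02), (0.05, -0.04),
         (-0.06, -0.05), (0.08, 0.07), (0.60, 0.50)]

flags = flag_outliers(group)          # only the last sample is flagged
too_few = flag_outliers(group[:3])    # None: not enough samples
```

The point of returning `None` for small groups is the one made above: with three or four samples per culture, any threshold is arbitrary, and which sample ends up as the "outlier" depends entirely on which others happen to have been sequenced.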

We have thus certain clear cases, like the Poltavka outlier, of R1a-M417 lineage, clustering close to Corded Ware (and Sintashta, and Potapovka) samples, but far from other R1b-L23 samples from Poltavka or Yamna cultures, from neighbouring regions in the steppe.

We have also less clear observations, like Balkan Chalcolithic samples, which may or may not have been part of different cultural groups (say, related to the Suvorovo-Novodanilovka expansion, or not), which may justify their differences in ancestral components in ADMIXTURE, and in their position in PCA.

And we have a Yamna sample from western Ukraine, which – unlike the other two available samples – clusters “to the south” of east Yamna samples. Taking into account the Yamna sample from Bulgaria, clustering closely with south-eastern European samples, could you really call this an outlier? Two outliers out of four western Yamna samples? Well, maybe. If you take east and west Yamna from the steppe as a whole, and exclude the Yamna sample from Bulgaria, of course you can. Whether that classification is useful, or actually hinders a proper interpretation of western Yamna samples, and of the “Yamna component” seen in them, is a different story…

PCA for European samples of Mathieson et al. (2017)

But what then about the Corded Ware male from Esperstedt, labelled I0104, dated ca. 2430 BC, which clusters among contemporaneous steppe (Poltavka) samples, and has the greatest proportion of ‘Yamna component’ in ADMIXTURE? After all, it is different in both respects from any other Corded Ware individual – including the oldest samples available, from Latvia (ca. 2885 BC) and Tiefbrunn (ca. 2755 BC).

This sample is one of the direct links between the steppe and Corded Ware in late times, and has been the main reason for the confusion a lot of people seem to have about the “Yamna component” in Corded Ware, with some supporting a direct migration from one into the other, and a few even daring to say that “Corded Ware is indistinguishable from Yamna”(!?).

His family members (all males of haplogroup R1a-M417, like I0104 and most males from the Corded Ware culture), a few generations later, show a decreased Yamna component, which clearly indicates that this individual’s admixture came directly from the steppe, and most likely from one or multiple female ancestors. That is compatible with the nomadic nature of the Corded Ware culture (and its known exogamy practices), which connected central Europe with the steppes, up to the North Caspian region.

If labelling other samples as outliers may be interesting to improve the conclusions one can obtain from genetic research, labelling this sample is, in my opinion, essential, to avoid certain strong misconceptions about the origin of the Corded Ware culture.