Iron Age Tocharians of Yamnaya ancestry from Afanasevo show hg. R1b-M269 and Q1a1

New open access paper Ancient Genomes Reveal Yamnaya-Related Ancestry and a Potential Source of Indo-European Speakers in Iron Age Tianshan, by Ning et al. Current Biology (2019).

Interesting excerpts (emphasis mine, changes for clarity):

Here, we report the first genome-wide data of 10 ancient individuals from northeastern Xinjiang. They are dated to around 2,200 years ago and were found at the Iron Age Shirenzigou site. We find them to be already genetically admixed between Eastern and Western Eurasians. We also find that the majority of the East Eurasian ancestry in the Shirenzigou individuals is related to northeastern Asian populations, while the West Eurasian ancestry is best presented by ∼20% to 80% Yamnaya-like ancestry. Our data thus suggest a Western Eurasian steppe origin for at least part of the ancient Xinjiang population. Our findings furthermore support a Yamnaya-related origin for the now extinct Tocharian languages in the Tarim Basin, in southern Xinjiang.

Haplogroups

The dominant mtDNA lineages of the Shirenzigou people are commonly found in modern and ancient West Eurasian populations, such as U4, U5, and H, while they also have East Eurasian-specific haplogroups A, D4, and G3, preliminarily documenting admixed ancestry from eastern and western Eurasia.

The admixture profile is also shown on the paternal Y chromosome side that 4 out of 6 males in Shirenzigou (Figure S2) belong to the West Eurasian-specific haplogroup R1b (n = 2) and East Eurasian-specific haplogroup Q1a (n = 2), the former is predominant in ancient Yamnaya and nearly 100% in Afanasievo, different from the Middle and Late Bronze Age Steppe groups (Steppe_MLBA) such as Andronovo, [Potapovka], Srubnaya, and Sintashta whose Y chromosomal haplogroup is mainly R1a.

tocharians-y-dna-mtdna

Autosomal

We first carried out principal component analysis (PCA) to assess the genetic affinities of the ancient individuals qualitatively by projecting them onto present-day Eurasian variation (Figure 2). We observed a distinct separation between East and West Eurasians. Our ancient Shirenzigou samples and present-day populations from Central Asia and northwestern China form a genetic cline from East to West in the first PC. The distribution of Shirenzigou samples on the cline is relatively scattered with two major clusters, one being closer to modern-day Uygurs and Kazakhs and the other being closer to recently published ancient Saka and Huns from the Tianshan in Kazakhstan (…).

We applied a formal admixture test using f3 statistics in the form of f3 (Shirenzigou; X, Y) where X and Y are worldwide populations that might be the genetic sources for the Shirenzigou individuals. We observed the most significant signals of admixture in the Shirenzigou samples when using Yamnaya_Samara or Srubnaya as the West Eurasian source and some Northern Asians or Koreans as the East Eurasian source (Table S1). We also plotted the outgroup f3 statistics in the form of f3 (Mbuti; X, Anatolia_Neolithic) and f3 (Mbuti; X, Kostenki14) to visualize the allele sharing between population X and Anatolian farmers. As shown in Figure S3, the Steppe_MLBA populations including Srubnaya, Andronovo, and Sintashta were shifted toward farming populations compared with Yamnaya groups and the Shirenzigou samples. This observation is consistent with ADMIXTURE analysis that Steppe_MLBA populations have an Anatolian and European farmer-related component that Yamnaya groups and the Shirenzigou individuals do not seem to have. The analysis consistently suggested Yamnaya-related Steppe populations were the better source in modeling the West Eurasian ancestry in Shirenzigou.

PCA and ADMIXTURE for Shirenzigou Samples. Modified from the original to include in black squares samples related to Yamnaya.

Genetic Composition of Iron Age Shirenzigou Individuals

We continued to use qpAdm to estimate the admixture proportions in the Shirenzigou samples by using different pairs of source populations, such as Yamnaya_Samara, Afanasievo, Srubnaya, Andronovo, BMAC culture (Bustan_BA and Sappali_Tepe_BA) and Tianshan_Hun as the West Eurasian source and Han, Ulchi, Hezhen, Shamanka_EN as the East Eurasian source. In all cases, Yamnaya, Afanasievo, or Tianshan_Hun provide the best model fit for the Shirenzigou individuals, while Srubnaya, Andronovo, Bustan_BA and Sappali_Tepe_BA only work in some cases. The Yamnaya_Samara or Afanasievo-related ancestry ranges from ∼20% to 80% in different Shirenzigou individuals, consistent with the scattered distribution on the East-West cline in the PCA.
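The idea of a two-way admixture proportion can be sketched with a much simpler model than qpAdm. qpAdm works on f4 statistics computed against a set of outgroups; the snippet below instead fits target ≈ alpha * West + (1 - alpha) * East directly on allele frequencies by least squares. That is a deliberate simplification and not the paper's method, and all names and numbers are made up.

```python
def two_way_alpha(target, west, east):
    """Least-squares estimate of alpha in target = alpha*west + (1 - alpha)*east.

    Simplified stand-in for a real admixture model: it uses raw allele
    frequencies (hypothetical toy data), no outgroups and no standard errors.
    """
    if not (len(target) == len(west) == len(east)):
        raise ValueError("all frequency lists must have the same length")
    numerator = sum((t - e) * (w - e) for t, w, e in zip(target, west, east))
    denominator = sum((w - e) ** 2 for w, e in zip(west, east))
    if denominator == 0:
        raise ValueError("sources are identical; alpha is not identifiable")
    return numerator / denominator


# Toy allele frequencies (made up).
west = [0.9, 0.1, 0.8, 0.2]
east = [0.1, 0.9, 0.2, 0.8]
# A target carrying 30% 'West' and 70% 'East' ancestry by construction.
target = [0.3 * w + 0.7 * e for w, e in zip(west, east)]

alpha = two_way_alpha(target, west, east)  # recovers roughly 0.3
```

An individual at the 'western' end of the reported ∼20% to 80% range would simply come out with a larger alpha under the same kind of fit.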

ancestry-tocharians

(…) we then modeled Shirenzigou as a three-way admixture of Yamnaya_Samara, Ulchi (or Hezhen) and Han to infer the source from the East Eurasia side that contributed to Shirenzigou. We found the Ulchi or Hezhen and Han-related ancestry had a complicated and uneven distribution in the Shirenzigou samples. Most Shirenzigou individuals derived the majority of their East Eurasian ancestry from Ulchi or Hezhen-related populations, while the two individuals M820 and M15-2 have more Han-related than Ulchi/Hezhen-related ancestry.

One important question remains, though: how and when did these Proto-Tocharian speakers migrate from the Afanasevo culture in the Altai into the Tarim Basin? The traditional answer, now more likely than ever, is through the Chemurchek culture. See e.g. A re-analysis of the Qiemu’erqieke (Shamirshak) cemeteries, Xinjiang, China, by Jia and Betts, JIES (2010) 38(4).

Also, given the apparent lack of Corded Ware ancestry (that is, of the extra farmer ancestry that characterizes it), and given that the results were already suspicious before, how reliable are the published R1a(xZ93) calls and/or radiocarbon dates of the Xiaohe mummies from Li et al. (2010, 2015)? After all, at such a late date one would have expected generalized admixture with neighbouring Srubna/Andronovo-like populations.

Related

Early Iranian steppe nomadic pastoralists also show Y-DNA bottlenecks and R1b-L23

New paper (behind paywall) Ancient genomes suggest the eastern Pontic-Caspian steppe as the source of western Iron Age nomads, by Krzewińska et al. Science Advances (2018) 4(10):eaat4457.

Interesting excerpts (emphasis mine, some links to images and tables deleted for clarity):

Late Bronze Age (LBA) Srubnaya-Alakulskaya individuals carried mtDNA haplogroups associated with Europeans or West Eurasians (17) including H, J1, K1, T2, U2, U4, and U5 (table S3). In contrast, the Iron Age nomads (Cimmerians, Scythians, and Sarmatians) additionally carried mtDNA haplogroups associated with Central Asia and the Far East (A, C, D, and M). The absence of East Asian mitochondrial lineages in the more eastern and older Srubnaya-Alakulskaya population suggests that the appearance of East Asian haplogroups in the steppe populations might be associated with the Iron Age nomads, starting with the Cimmerians.

scythian-cimmerian-sarmatian-y-dna-mtdna

#UPDATE (5 OCT 2018): Some Y-SNP calls have been published in a Molgen thread, with:

  • Srubna samples have possibly two R1a-Z280, three R1a-Z93.
  • Cimmerians may not have R1b: cim357 is reported as R1a.
  • Some Scythians have low coverage to the point where it is difficult to assign even a reliable haplogroup (they report hg I2 for scy301, or E for scy197, probably based on some shared SNPs?), but those which can be reliably assigned seem R1b-Z2103 [hence probably the use of question marks and asterisks in the table, and the assumption of the paper that all Scythians are R1b-L23]:
    • The most recent subclade is found in scy305: R1b-Z2103>Z2106 (Z2106+, Y12538/Z8131+)
    • scy304: R1b-Z2103 (M12149/Y4371/Z8128+).
    • scy009: R1b-P312>U152>L2 (P312+, U152?, L2+)?
  • Sarmatians are apparently all R1a-Z93 (including tem002 and tem003);
  • You can read here the Excel file with (some probably as speculative as the paper’s own) results.

    About the PCA

    1. Srubnaya-Alakulskaya individuals exhibited genetic affinity to northern and northeastern present-day Europeans, and these results were also consistent with outgroup f3 statistics.
    2. The Cimmerian individuals, representing the time period of transition from Bronze to Iron Age, were not homogeneous regarding their genetic similarities to present-day populations according to the PCA. F3 statistics confirmed the heterogeneity of these individuals in comparison with present-day populations.
    3. The Scythians reported in this study, from the core Scythian territory in the North Pontic steppe, showed high intragroup diversity. In the PCA, they are positioned as four visually distinct groups compared to the gradient of present-day populations:
      1. A group of three individuals (scy009, scy010, and scy303) showed genetic affinity to north European populations (…).
      2. A group of four individuals (scy192, scy197, scy300, and scy305) showed genetic similarities to southern European populations (…).
      3. A group of three individuals (scy006, scy011, and scy193) located between the genetic variation of Mordovians and populations of the North Caucasus (…). In addition, one Srubnaya-Alakulskaya individual (kzb004), the most recent Cimmerian (cim357), and all Sarmatians fell within this cluster. In contrast to the Scythians, and despite being from opposite ends of the Pontic-Caspian steppe, the five Sarmatians grouped close together in this cluster.
      4. A group of three Scythians (scy301, scy304, and scy311) formed a discrete group between the SC and SE and had genetic affinities to present-day Bulgarian, Greek, Croatian, and Turkish populations (…).
      5. Finally, one individual from a Scythian cultural context (scy332) is positioned outside of the modern West Eurasian genetic variation (Fig. 1C) but shared genetic drift with East Asian populations.
    Radiocarbon ages and geographical locations of the ancient samples used in this study. Figure panels presented (Left) Bar plot visualizing approximate timeline of presented and previously published individuals. (Right) Principal component analysis (PCA) plot visualizing 35 Bronze Age and Iron Age individuals presented in this study and in published ancient individuals (table S5) in relation to modern reference panel from the Human Origins data set (41).

    Cimmerians

    The presence of an SA component (as well as finding of metals imported from Tien Shan Mountains in Muradym 8) could therefore reflect a connection to the complex networks of the nomadic transmigration patterns characteristic of seasonal steppe population movements. These movements, although dictated by the needs of the nomads and their animals, shaped the economic and social networks linking the outskirts of the steppe and facilitated the flow of goods between settled, semi-nomadic, and nomadic peoples. In contrast, all Cimmerians carried the Siberian genetic component. Both the PCA and f4 statistics supported their closer affinities to the Bronze Age western Siberian populations (including Karasuk) than to Srubnaya. It is noteworthy that the oldest of the Cimmerians studied here (cim357) carried almost equal proportions of Asian and West Eurasian components, resembling the Pazyryks, Aldy-Bel, and Iron Age individuals from Russia and Kazakhstan (12). The second oldest Cimmerian (cim358) was also the only one with both uniparental markers pointing toward East Asia. The Q1* Y chromosome sublineage of Q-M242 is widespread among Asians and Native Americans and is thought to have originated in the Altai Mountains (24)

    Scythians

    In contrast to the eastern steppe Scythians (Pazyryks and Aldy-Bel) that were closely related to Yamnaya, the western North Pontic Scythians were instead more closely related to individuals from Afanasievo and Andronovo groups. Some of the Scythians of the western Pontic-Caspian steppe lacked the SA and the East Eurasian components altogether and instead were more similar to a Montenegro Iron Age individual (3), possibly indicating assimilation of the earlier local groups by the Scythians.

    Toward the end of the Scythian period (fourth century BCE), a possible direct influx from the southern Ural steppe zone took place, as indicated by scy332. However, it is possible that this individual might have originated in a different nomadic group despite being found in a Scythian cultural context.

    Genetic diversity and ancestral components of the Srubnaya-Alakulskaya population (here called “Srubnaya”). (Left) Mean f3 statistics for Srubnaya and other Bronze Age populations. The Srubnaya group is color-coded as in the PCA. (Right) Pairwise mismatch estimates for Bronze Age populations.

    Comments

    I am surprised to find this new R1b-L23-based bottleneck in Eastern Iranian expansions so late, but admittedly – based on data from later times in the Pontic-Caspian steppe near the Caucasus – it was always a possibility. The fact that pockets of R1b-L23 lineages remained somehow ‘hidden’ in early Indo-Iranian communities was already clear since Narasimhan et al. (2018), as I predicted could happen, and is compatible with the limited archaeological data on Sintashta-Potapovka populations outside fortified settlements. I already said then that Corded Ware was out of Indo-European migrations; this further supports it.

    Even with all these data coming just from a north-west Pontic steppe region (west of the Dnieper), these ‘Cimmerians’ – or rather the ‘Proto-Scythian’ nomadic cultures appearing before ca. 800 BC in the Pontic-Caspian steppes – are shown to be probably formed by diverse peoples from Central Asia who brought about the first waves of Siberian ancestry (and Asian lineages) seen in the western steppes. You can read about a Cimmerian-related culture, Ananino, key for the evolution of Finno-Permic peoples.

    Also interesting about the Y-DNA bottleneck seen here is the rejection of the supposed continuous western expansions of R1a-Z645 subclades with steppe tribes since the Bronze Age, and thus a clearer link of the Hungarian Árpád dynasty (of R1a-Z2123 lineage) to either the early Srubna-related expansions or – much more likely – to the actual expansions of Hungarian tribes near the Urals in historic times.

    NOTE. I will add the information of this paper to the upcoming post on Ugric and Samoyedic expansions, and the late introduction of Siberian ancestry to these peoples.

    A few interesting lessons to be learned:

    • Remember the fantasy story about that supposed steppe nomadic pastoralist society sharing different Y-DNA lineages? You know, that Yamna culture expanding with R1b from Khvalynsk-Repin into the whole Pontic-Caspian steppes and beyond, developing R1b-dominated Afanasevo, Bell Beaker, and Poltavka, but suddenly appearing (in the middle of those expansions through the steppes) as a different culture, Corded Ware, to the north (in the east-central European forest zone) and dominated by R1a? Well, it hasn’t happened with any other steppe migration, so…maybe Proto-Indo-Europeans were that kind of especially friendly language-teaching neighbours?
    • Remember that ‘pure-R1a’ Indo-Slavonic society emerged from Sintashta ca. 2100 BC? (or even Graeco-Aryan??) Hmmmm… Another good fantasy story that didn’t happen; just like a central-east European Bronze Age Balto-Slavic R1a continuity didn’t happen, either. So, given that cultures from around Estonia are those showing the closest thing to R1a continuity in Europe until the Iron Age, I assume we have to get ready for the Gulf of Finland Balto-Slavic soon.
    • Remember that ‘pure-R1a’ expansion of Indo-Europeans based on the Tarim Basin samples? This paper means ipso facto an end to the Tarim Basin – Tocharian artificial controversy. The Pre-Tocharian expansion is represented by Afanasevo, and whether or not (Andronovo-related) groups of R1a-Z645 lineages replaced part or eventually all of its population before, during, or after the Tocharian expansion into the Tarim Basin, this does not change the origin of the language split and expansion from Yamna to Central Asia; just like this paper does not change the fact that these steppe groups were Proto-Iranian (Srubna) and Eastern Iranian (Scythian) speakers, regardless of their dominant haplogroup.
    • And, best of all, remember the Copenhagen group’s recent R1a-based “Indo-Germanic” dialect revival vs. the R1b-Tocharo-Italo-Celtic? Yep, they made that proposal, in 2018, based on the obvious Yamna—R1b-L23 association, and the desire to support Kristiansen’s model of Corded Ware – Indo-European expansion. Pepperidge Farm remembers. This new data on Early Iranians means another big NO to that imaginary R1a-based PIE society. But good try to go back to Gimbutas’ times, though.
    Olander’s (2018) tree of Indo-European languages. Presented at Languages and migrations in pre-historic Europe (7-12 Aug 2018)

    Do you smell that fresher air? It’s the Central and East European post-Communist populist and ethnonationalist bullshit (viz. pure blond R1a-based Pan-Nordicism / pro-Russian Pan-Slavism / Pan-Eurasianism, as well as Pan-Turanism and similar crap from the 19th century) going down the toilet with each new paper.

    #EDIT (5 OCT 2018): It seems I was too quick to rant about the consequences of the paper without taking into account the complexity of the data presented. Not the first time this impulsivity happens, I guess it depends on my mood and on the time I have to write a post on the specific work day…

    While the data on Srubna, Cimmerians, and Sarmatians shows clearer Y-DNA bottlenecks (of R1a-Z645 subclades) with the new data, the Scythian samples remain controversial, because of the many doubts about the haplogroups (although the most certain cases are R1b-Z2103), their actual date, and cultural attribution. However, I doubt they belong to other peoples, given the expansionist trends of steppe nomads before, during, and after Scythians (as shown in statistical analyses), so most likely they are Scythian or ‘Para-Scythian’ nomadic groups that probably came from the east, whether or not they incorporated Balkan populations. This is further supported by the remaining R1b-P312 and R1b-Z2103 populations in and around the modern Eurasian steppe region.

    Early Iron Age cultures of the Carpathian basin ca. 7-6th century BC, including steppe groups Basarabi and Scythians. Ďurkovič et al. (2018).

    You can find an interesting and detailed take on the data published (in Russian) at Vol-Vlad’s LiveJournal (you can read an automatic translation from Google). I think that post is maybe too detailed in debunking all the information associated with the supposed Scythians (to the point where just a single sample seems to be an actual Scythian?!), but it is nevertheless interesting to read about the potential pitfalls of the study.

    Related

    A study of genetic diversity of three isolated populations in Xinjiang using Y-SNP

    indo-european-indo-iranian-migrations

    New open access paper (in Chinese) A study of genetic diversity of three isolated populations in Xinjiang using Y-SNP, by Liu et al. Acta Anthropologica Sinica (2018)

    Abstract:

    The Keriyan, Lopnur and Dolan peoples are isolated populations with sparse numbers living in the western border desert of our country. By sequencing and typing the complete Y-chromosome of 179 individuals in these three isolated populations, all mutations and SNPs in the Y-chromosome and their corresponding haplotypes were obtained. Types and frequencies of each haplotype were analyzed to investigate genetic diversity and genetic structure in the three isolated populations. The results showed that 12 haplogroups were detected in the Keriyan with high frequencies of the J2a1b1 (25.64%), R1a1a1b2a (20.51%), R2a (17.95%) and R1a1a1b2a2 (15.38%) groups. Sixteen haplogroups were noted in the Lopnur with the following frequencies: J2a1 (43.75%), J2a2 (14.06%), R2 (9.38%) and L1c (7.81%). Forty haplogroups were found in the Dolan, noting the following frequencies: R1b1a1a1 (9.21%), R1a1a1b2a1a (7.89%), R1a1a1b2a2b (6.58%) and C3c1 (6.58%). These data show that these three isolated populations have a closer genetic relationship with the Uygur, Mongolian and Sala peoples. In particular, there are no significant differences in haplotype and frequency between the three isolated populations and Uygur (f=0.833, p=0.367). In addition, the genetic haplotypes and frequencies in the three isolated populations showed marked Eurasian mixing illustrating typical characteristics of Central Asian populations.

    Figure 1. The populations distribution map. Left: Uluru. Center: Dali Yabuyi. Right: Kaerqu.

    My knowledge of written Chinese is almost zero, so here are some excerpts with the help of Google Translate:

    The source of 179 blood samples used in the study is shown in Figure 1. The Keriyan blood samples were collected from Dali Yabuyi Township, Yutian County (39 samples). The blood samples of the Lopnur people were collected from Kaerqu Township, Yuli County (64 samples); the blood samples of the Dolan people were collected from the town of Uluru, Awati County (76 samples).
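As a sanity check on the abstract, the reported percentages can be turned back into whole-number counts using the sample sizes just given (39, 64 and 76). The sketch below is only my own arithmetic: the paper's excerpt reports percentages, not counts, so the counts are inferred, and the helper name is hypothetical.

```python
def count_from_percent(percent, sample_size):
    """Return the integer count whose percentage rounds to `percent`, else None.

    The counts are inferred from percentages rounded to two decimals;
    they are not figures reported in the quoted excerpt.
    """
    count = round(percent / 100 * sample_size)
    # Accept the count if it reproduces the percentage within rounding error.
    if abs(100 * count / sample_size - percent) <= 0.005 + 1e-9:
        return count
    return None


# Top frequencies quoted in the abstract, with sample sizes from the text.
keriyan = {"J2a1b1": 25.64, "R1a1a1b2a": 20.51, "R2a": 17.95, "R1a1a1b2a2": 15.38}
lopnur = {"J2a1": 43.75, "J2a2": 14.06, "R2": 9.38, "L1c": 7.81}
dolan = {"R1b1a1a1": 9.21, "R1a1a1b2a1a": 7.89, "R1a1a1b2a2b": 6.58, "C3c1": 6.58}

keriyan_counts = {hg: count_from_percent(p, 39) for hg, p in keriyan.items()}
lopnur_counts = {hg: count_from_percent(p, 64) for hg, p in lopnur.items()}
dolan_counts = {hg: count_from_percent(p, 76) for hg, p in dolan.items()}
```

Every quoted percentage maps cleanly onto a whole number of men (for instance 10 of 39 Keriyan for J2a1b1, or 7 of 76 Dolan for R1b1a1a1), which also shows how few individuals stand behind each figure.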

    Columns one and two are the Keriyan haplotypes and frequencies, respectively; the third and fourth columns are the Lopnur haplotypes and frequencies; the last four columns are the Daolang haplotypes and frequencies.

    The composition and frequency of the Keriyan people’s haplogroup are closest to those of the Uighurs, and both Principal Component Analysis and Phylogenetic Tree Analysis show that their kinship is recent. We initially infer that the Keriyan are local desert indigenous people. They have a connection with the source of the Uighurs. Chen et al. [42] studied the patriarchal and maternal genetic analysis of the Keriyan people and found that they are not descendants of the Tibetan ethnic group in the West. The Keriyan people are a mixed group of Eastern and Western Europeans, which may originate from the local Vil group. Duan Ranhui [43] and other studies have shown that the nucleotide variability and average nucleotide differences in the Keriyan population are between the reported Eastern and Western populations. The phylogenetic tree also shows that the populations in Central Asia are between the continental lineage of the eastern population and the European lineage of the western population, and the genetic distance between the Keriyan and the Uighurs is the closest, indicating that they have a close relationship.

    y-chromosome-pca

    Regarding the origin of the Lopnur people, Przhevalsky judged that it was a mixture of Mongolians and Aryans according to the physical characteristics of the Lopnur people. In 1934, the Sino-Swedish expedition discovered the famous burials of the ancient tombs in the Peacock River. After research, they were the indigenous people before the Loulan period; the researcher Yang Lan, a researcher at the Institute of Cultural Relics of the Chinese Academy of Social Sciences, said that the Lopnur people were descendants of the ancient “Loulan survivors”. However, the Loulan people speaking an Indo-European language, and the Lopnur people speaking Uyghur languages contradict this; the historical materials of the Western Regions, “The Geography of the Western Regions” and “The Western Regions of the Ming Dynasty” record the Uighurs who lived in Cao Cao in the late 17th and early 18th centuries. Because of the occupation of the land by the Junggar nobles and their oppression, they fled. Some of them were forced to move to the Lop Nur area. There are many similar archaeological discoveries and historical records. We have no way to determine their accuracy, but they are at different times, and there is a great difference in what is heard in the same region. (…) The genetic characteristics of modern Lopnur people are the result of the long-term ethnic integration of Uyghurs, Mongols, and Europeans. This is also consistent with the similarity of the genetic structure of the Y chromosome of Lopnur in this study with the Uighurs and Mongolians. For example, the frequency of J haplogroup is as high as 59.37%, while J and its downstream sub-haplogroup are mainly distributed in western Europe, West Asia and Central Asia; the frequency of O, R haplogroup is close to that of Mongolians.

    1) KA: Keriya, LB: Lopnur, DL: Daolang, HTW: Hetian Uygur, HTWZ: and Uygur, TLFW: Turpan Uighur, HZ: Hui, HSKZ: Kazakh, WZBKZ: Uzbek, TJKZ: Tajik, KEKZZ: Kirgiz, TTEZ: Tatar, ELSZ: Russian, XBZ: Xibo, MGZ: Mongolian, SLZ: Salar, XJH: Xinjiang Han, GSH: Gansu Han, GDH: Guangdong Han, SCH: Sichuan Han. 2) Reference population data source: literature 19-22. Once a population name has been marked in the table, all abbreviations in the text refer to this table. 3) Because the reference populations are typed to different depths within each haplogroup, the sub-branches under the same haplogroup are merged when the population haplogroup data are aggregated. For example, for haplogroup G, some populations are resolved to the G1a and G2a levels, others are assigned to G1, G2, and G3, while some can only be determined as G. Therefore, each subgroup is merged into a single group G.
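The merging step described in note 3 can be sketched as follows. This is my own minimal illustration, not the authors' procedure: it assumes the major haplogroup is simply the first letter of the label (which would not handle joint labels such as NO1), and it only uses the four Lopnur frequencies quoted in the abstract.

```python
from collections import defaultdict


def merge_to_major(frequencies):
    """Sum sub-haplogroup frequencies under their major haplogroup letter.

    Assumption for illustration: the major haplogroup is the first
    character of the label (e.g. 'J2a1' -> 'J').
    """
    merged = defaultdict(float)
    for label, percent in frequencies.items():
        merged[label[0]] += percent
    return dict(merged)


# Four most frequent Lopnur haplogroups from the abstract (percent).
lopnur_top = {"J2a1": 43.75, "J2a2": 14.06, "R2": 9.38, "L1c": 7.81}
merged = merge_to_major(lopnur_top)  # J2a1 and J2a2 collapse into a single 'J' entry
```

Since only four of the sixteen Lopnur haplogroups are listed in the abstract, the merged J value here is a lower bound, which is compatible with the 59.37% quoted for J as a whole.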

    According to Ming History·Western Biography, the Mongolians originated from the Mobei Plateau and later ruled Asia and Eastern Europe. Mongolia was established, and large areas of southern Xinjiang and Central Asia were included. Later, due to the Mongolian king’s struggle for power, it fell into a long-term conflict. People of the land fled to avoid the war, and the uninhabited plain of the lower reaches of the Yarkant River naturally became a good place to live. People from all over the world gathered together and called themselves “Dura” and changed to “Dang Lang”. The long-term local Uyghur exchanges that entered the southern Mongolian monks and “Dura” were gradually assimilated [44]. According to the report, locals wore Mongolian clothes, especially women who still maintained a Mongolian face [45]. In 1976, the robes and waistbands found in the ancient time of the Daolang people in Awati County were very similar to those of the ancients. Dalang Muqam is an important part of Daolang culture. It is also a part of the Uyghur Twelve Muqam, and it retains the ancient Western culture, but it also contains a larger Mongolian culture and relics. The above historical records show that the Daolang people should appear in the Chagatai Khanate and be formed by the integration of Mongolian and Uighur ethnic groups. Through our research, we also found that the paternal haplotype of the Daolang people is contained in both Uygur and Mongolian, and the main haplogroups are the same, whereas the frequencies are different (see Table 3). The principal component analysis and the NJ analysis are also the same. It is very close to the Uyghur and the Mongolian people, which establishes new evidence for the “mixed theory” in molecular genetics.

    Genetic relationship between the three isolated populations: the Uygur and the Mongolian is the closest, and the main haplogroup can more intuitively compare the source composition of the genetic structure of each population. Haplogroups C, D, and O are mainly distributed in Asia as the East Asian characteristic haplogroup; haplogroups G, J, and R are mainly distributed in continental Europe, and the high frequency distribution is in Europe and Central Asia.

    If the nomenclature follows a recent ISOGG standard, it appears that:

    The presence of exclusively R1a-Z93 subclades and the lack of R1b-M269 samples is compatible with the expansion of R1a-Z93 into the area with Proto-Tocharians, at the turn of the 3rd-2nd millennium BC, as suggested by the Xiaohe samples, supposedly R1a(xZ93).

    Now that it is obvious from ancient DNA (as it was clear from linguistics) that Pre-Tocharians separated earlier than other Late PIE peoples, with the expansion of late Khvalynsk/Repin into the Altai, at the end of the 4th millennium, these prevalent R1a (probably Z93) samples may be showing a replacement of Pre-Tocharian Y-DNA with the Andronovo expansion already by 2000 BC.

    Lacking proper assessment of ancient DNA from Proto-Tocharians, this potential early Y-DNA replacement is still speculative*. However, if that is the case, I wonder what the Copenhagen group will say when supporting this, but rejecting at the same time the more obvious Y-DNA replacement in East Yamna / Poltavka in the mid-3rd millennium with incoming Corded Ware-related peoples. I guess the invention of an Indo-Tocharian group may be near…

    *NOTE. The presence of R1b-M269 among Proto-Tocharians, as well as the presence of R1b-M269 among Tarim Basin peoples in modern and ancient times is not yet fully discarded. The prevalence of R1a-Z93 may also be the sign of a more recent replacement by Iranian peoples, before the Mongolian and Turkic expansions that probably brought R1b(xM269).

    Also, the presence of R1b (xM269) samples in east Asia strengthens the hypothesis of a back-migration of R1b-P297 subclades, from Northern Europe to the east, into the Lake Baikal area, during the Early Mesolithic, as found in the Botai samples and later also in Turkic populations – which are the most likely source of these subclades (and probably also of Q1a2 and N1c) in the region.

    Related

    Yamna/Afanasevo elite males dominated by R1b-L23, Okunevo brings ancient Siberian/Asian population

    afanasevo-okunevo

    Open access paper New genetic evidence of affinities and discontinuities between bronze age Siberian populations, by Hollard et al., Am J Phys Anthropol. (2018) 00:1–11.

    NOTE. This seems to be a peer-reviewed paper based on a more precise re-examination of the samples from Hollard’s PhD thesis, Peuplement du sud de la Sibérie et de l’Altaï à l’âge du Bronze : apport de la paléogénétique (2014).

    Interesting excerpts:

    Afanasevo and Yamna

    The Afanasievo culture is the earliest known archaeological culture of southern Siberia, occupying the Minusinsk-Altai region during the Eneolithic era 3600/3300 BC to 2500 BC (Svyatko et al., 2009; Vadetskaya et al., 2014). Archeological data showed that the Afanasievo culture had strong affinities with the Yamnaya and pre-Yamnaya Eneolithic cultures in the West (Grushin et al., 2009). This suggests a Yamnaya migration into western Altai and into Afanasievo. Note that, in most current publications, “the Yamnaya culture” combines the so-called “classical Yamnaya culture” of the Early Bronze Age and archeological sites of the preceding Repin culture in the middle reaches of the Don and Volga rivers. In the present article we conventionally use the term Yamnaya in the same sense, in which case the beginning of the “Yamnaya culture” can be dated after the middle of the 4th millennium BC, when the Afanasievo culture appeared in the Altai.

    Because of numerous traits attributed to early Indo-Europeans and cultural relations with Kurgan steppe cultures, members of the Afanasievo culture are believed to have been Indo-European speakers (Mallory and Mair, 2000). In a recent whole-genome sequencing study, Allentoft et al. (2015) concluded that Eastern Yamnaya individuals and Afanasievo individuals were genetically indistinguishable. Moreover, this study and one published concurrently by Haak et al. (2015) analyzed 11 Eastern Yamnaya males and showed that all of them belonged to the R1b1a1a (formerly R1b1a) (…)

    Early Chalcolithic migrations ca. 3300-2600 BC.

    Published works indicate that R1b was a predominant haplogroup from the late Neolithic to the early Bronze Age, notably in the Bell Beaker and Yamnaya cultures (Allentoft et al., 2015; Haak et al., 2015; Lee et al., 2012; Mathieson et al., 2015). Nearly 100% of the Afanasievo men we typed belonged to the R1b1a1a subhaplogroup and, for at least three of them, more precisely to the L23 (xM412) subclade. (…)

    (…) our results therefore support the hypothesis of a genetic link between Afanasievo and Yamnaya. This also suggests that R1b was indeed dominant in the early Bronze Age Siberian steppe, at least in individuals that were buried in kurgans (possibly an elite part of the population). The geographical and temporal distribution of subhaplogroup R1b1a1a supports the hypothesis of population expansion from West to East in the Eurasian steppe during this period. It should however be noted that the Yamnaya burials from which the samples for DNA analysis were obtained (Allentoft et al., 2015; Haak et al., 2015; Mathieson et al., 2015) were dated within the limits of the Afanasievo period. Ancestors of both East Yamnaya and Afanasievo populations must therefore be sought in the context of earlier Eneolithic cultures in Eastern Europe. Sufficient Y-chromosomal data from such Eneolithic populations is, unfortunately, not yet available.

    mtdna-ydna-afanasevo-okunevo
    Mitochondrial- (A) and Y- (B) haplogroup distribution in studied populations

    Okunevo and paternal lineage shift in South Siberia

    Results obtained in the current study, from more than a dozen Okunevo individuals belonging to the earliest stage of Okunevo culture, that is the Uibat period (2500–2200 BC) (Lazaretov, 1997), suggest a discontinuity in the genetic pool between Afanasievo and Okunevo cultures. Although Y-chromosomal data obtained for bearers of the Okunevo culture showed that one individual carried haplogroup R1b, most Okunevo Y-haplogroups are representative of an Asian component represented by paternal lineages Q and NO1.

    (…) Okunevo carrier of Y-haplogroup Q1b1a-L54, which also supports this hypothesis (L54 being a marker of the lineage from which M3, the main Amerindian lineage, arose). Okunevo people could therefore be a remnant paleo-Siberian population with possible Afanasievo input, as suggested by the presence of the R1b1a1a2a subhaplogroup in one individual.

    indo-european-uralic-migrations-afanasevo-late
    Late Chalcolithic migrations ca. 2600-2250 BC.

    Replacement of Asian Indo-European elite lineages by R1a

    Published genetic data from the late Bronze Age Andronovo culture from the Minusinsk Basin (Keyser et al., 2009), the Sintashta culture from Russia (Allentoft et al., 2015) and the Srubnaya culture from the region of Samara (Mathieson et al., 2015), show that males did not belong to Y-haplogroup R1b but mostly to R1a clades: there appears to have been a change in the dominant Y-chromosomal haplogroup between the early and the late Bronze Age in these regions. Moreover, as described in Allentoft et al. (2015), the Andronovo and Sintashta peoples were closely related to each other but clearly distinct from both Yamnaya and Afanasievo. Although these results do not imply that Y-haplogroup R1b was entirely absent in these later populations, they could correspond to a replacement of the elite between these two main periods and therefore a difference in the haplogroups of the men that were preferentially buried.

    indo-european-uralic-migrations-okunevo-andronovo
    Early Bronze Age migrations ca. 2250-1750 BC.

    Afanasevo and the Tarim Basin

    The discovery, in the Tarim Basin, of well-preserved mummies from the Bronze Age allows for the construction of two hypotheses regarding the peopling of the Xinjiang province at this period. The “steppe hypothesis,” argues for a link with nomadic steppe herders (Hemphill and Mallory, 2004), possibly represented in this case by Afanasievo populations and their descendants (Mallory and Mair, 2000). However, newly published cultural data from the burial grounds of Gumugou (Wang, 2014) and Xiaohe (Xinjiang, 2003, 2007) shows material culture and burial rites incompatible with the Afanasievo culture. The earliest 14C date for Tarim Basin burials would place them at the turn of the 2nd millennium BC (Wang, 2013), 500 years after the Afanasievo period.

    Instead, early Gumugou and Xiaohe burial grounds were contemporary with the start of the Andronovo period. Likewise, the Bronze Age population of the Xinjiang at Gumugou/Qäwrighul is not phenotypically closest to Afanasievo but to the Andronovo (Fedorovo) group of northeastern Kazakhstan and western Altai (Kozintsev, 2009). Our investigations demonstrate that Y-chromosomal lineage composition is also compatible with the notion that the ancient Tarim population was genetically distinct from the Afanasievo population. The only Y-haplogroup found by Li et al. (2010) in the Bronze Age Tarim Basin population was Y-haplogroup R1a, which suggests a proximity of this population with Andronovo groups rather than Afanasievo groups.

    I don’t think these finds are much of a surprise based on what we already know, or need much explanation…

    I would add that, once again, we have more proof that the movement of Okunevo and related ancient Siberian migrants from Central or North Asia will not be able to explain the presence of Uralic languages spread over North-East Europe and Scandinavia already during the Bronze Age.

    It is also interesting to see more peer-reviewed papers present the idea that Late Indo-European speakers are clearly linked to the expansion of patrilineally related elite males marked by haplogroup R1b-L23, most likely since the Eneolithic Khvalynsk/Repin cultures.

    Related:

    Consequences of Damgaard et al. 2018 (III): Proto-Finno-Ugric & Proto-Indo-Iranian in the North Caspian region

    copper-age-early_yamna-corded-ware

    The Indo-Iranian – Finno-Ugric connection

    On the linguistic aspect, this is what the Copenhagen group had to say (in the linguistic supplement) based on Kuz’mina (2001):

    (…) a northern connection is suggested by contacts between the Indo-Iranian and the Finno-Ugric languages. Speakers of the Finno-Ugric family, whose antecedent is commonly sought in the vicinity of the Ural Mountains, followed an east-to-west trajectory through the forest zone north and directly adjacent to the steppes, producing languages across to the Baltic Sea. In the languages that split off along this trajectory, loanwords from various stages in the development of the Indo-Iranian languages can be distinguished: 1) Pre-Proto-Indo-Iranian (Proto-Finno-Ugric *kekrä (cycle), *kesträ (spindle), and *-teksä (ten) are borrowed from early preforms of Sanskrit cakrá- (wheel, cycle), cattra- (spindle), and daśa- (10); Koivulehto 2001), 2) Proto-Indo-Iranian (Proto-Finno-Ugric *śata (one hundred) is borrowed from a form close to Sanskrit śatám (one hundred)), 3) Pre-Proto-Indo-Aryan (Proto-Finno-Ugric *ora (awl), *reśmä (rope), and *ant- (young grass) are borrowed from preforms of Sanskrit ā́rā- (awl), raśmí- (rein), and ándhas- (grass); Koivulehto 2001: 250; Lubotsky 2001: 308), and 4) loanwords from later stages of Iranian (Koivulehto 2001; Korenchy 1972). The period of prehistoric language contact with Finno-Ugric thus covers the entire evolution of Pre-Proto-Indo-Iranian into Proto-Indo-Iranian, as well as the dissolution of the latter into Proto-Indo-Aryan and Proto-Iranian. As such, it situates the prehistoric location of the Indo-Iranian branch around the southern Urals (Kuz’mina 2001).

    NOTE. While I agree with the evident ancestral nature of the *kekrä borrowing, I will repeat it here again: I don’t believe that the distinction of late Proto-Indo-Iranian from ‘Pre-Proto-Indo-Aryan’ loans is warranted; not for words reconstructed from recent Finno-Ugric languages.

    copper-age-late-urals
    The time and place for Finno-Ugric and Indo-Iranian contacts. Late Copper Age migrations in Asia ca. 2800-2300 BC.

    In this period of a Pre-Proto-Indo-Iranian community, which is to be associated with East Yamna/Poltavka ca. 3000-2400 BC (as accepted in the supplement from de Barros Damgaard et al., Nature 2018), both Poltavka and Abashevo/Balanovo herders were expanding to the east ca. 2800-2600 BC (with Abashevo already admixing into Poltavka territory), near the southern Urals.

    There is no other, clearer, later connection between Finno-Ugric and Proto-Indo-Iranian speakers. Even the arrival of the Seima-Turbino phenomenon (after ca. 2000 BC), if it brought migrants to North-East Europe, would not fit the linguistic, archaeological, or genetic data. It is by now quite clear that Seima-Turbino does not fit with incoming N1c1 lineages and/or Siberian ancestry, either, for those looking for these as potential signs of incoming Uralic speakers.

    While the Copenhagen group did not have access to data from Sintashta ca. 2100 BC onwards – now available in Narasimhan et al. (2018) – when submitting the papers, we already know that there was a clear long period of slow progressive admixture in the North Caspian region. It can be seen in the genetic contribution of Yamna to incoming Abashevo groups, and in the R1b-L23 samples still appearing in Sintashta until ca. 1800 BC (as I predicted could happen).

    Since the first sample signalling incoming Abashevo migrants is found in the Poltavka outlier dated ca. 2700 BC (of R1a-Z93 lineage), this represents a rather unique, several centuries long process of admixture in the North Caspian region, different from the massive Afanasevo or Bell Beaker migrations in Asia and Europe, whereby a great part of the native male population was suddenly replaced.

    This offers further support for language continuity despite genetic replacement in the development of East Yamna/Poltavka (part of the Steppe EMBA cline, formed by Yamna and Afanasevo) mixing with Abashevo migrants (probably identical to Corded Ware samples) to form Potapovka, Sintashta, and later Srubna, and Andronovo communities (all forming, with Corded Ware groups, a wide Eurasian Steppe MLBA cloud). See the available data from Narasimhan et al. (2018).

    yamna-late-proto-indo-european
    Image modified from Narasimhan et al. (2018), including the most likely proto-language identification of different groups. Original description “Modeling results including Admixture events, with clines or 2-way mixtures shown in rectangles, and clouds or 3-way mixtures shown in ellipses”. See the original full image here.

    The continuous interactions and migrations thus eventually left two communities in the southern Urals that were genetically similar but ethnolinguistically distinct:

    • To the north, Abashevo-Balanovo – but potentially also Fatyanovo, and related North-East European late Corded Ware groups – borrowed necessary words from Indo-Iranian neighbours, while maintaining their Finno-Ugric language and culture.
    • To the south, immigrants (or their descendants) of Abashevo origin expanding among Pre-Proto-Indo-Iranian-speaking North Caspian communities assimilated the surrounding culture and language, giving it their own accent (i.e. ‘satemizing’ it) and turning it into Proto-Indo-Iranian (see e.g. Parpola’s account).

    Anthropologically, this ‘long-term founder effect’ that appears as genetic replacement is probably explained by the faster life history in MLBA North Caspian populations, likely due to a combination of changing environmental and social circumstances.

    NOTE. The prevalent explanation before the latest studies on the Sintashta society was social strife and isolation of small groups, an argument I used in my demic diffusion model. Other, similar cases of proven linguistic continuity despite genetic replacement are seen in the Iberian Bronze Age after the expansion of R1b-L23 lineages (with Vasconic, Iberian, and Tartessian surviving at least until proto-historic times), and in Remote Oceania.

    bronze_age_early_Asia-andronovo
    Diachronic map of migrations in Asia ca. 2250-1750 BC

    Implications for Late PIE migrations

    I am happy to see that people are resorting now to dialectal classifications and Y-DNA to explain the findings in Old Hittites, Tocharians (and related migrations), and Indo-Iranians. It is especially interesting to see precisely this Danish group downplay the relevance of ancestry and favor complex anthropological models when assessing migrations and ethnolinguistic identification.

    So let’s talk about the growing elephant in the room.

    It seems we all accept now Tocharian’s more archaic Late PIE nature, which is supported by waves of late Khvalynsk migrants starting probably ca. 3300 BC, as seen in different samples to the east in Central Asia, and to the south in Iran. Almost all of them share R1b-L23 lineages.

    NOTE. Whereas their early LPIE dialects have not survived to historic times, the rather speculative hypotheses of Euphratic and Gutian languages may be of interest.

    We also know of the coetaneous migrants that settled to the west of the Don River (in the territory of the previous late Sredni Stog culture), to form the western South-Bug / Lower Don groups, which, together with the Volga-Ural / North Caucasian groups, formed the early Yamna culture that dominated the Pontic-Caspian steppe from ca. 3300 BC.

    It is only logical that the other attested languages belonging to the common Late PIE trunk must come from these groups, which must have stuck together for quite some time after the recently proven late Khvalynsk migrations, to allow for the spread of isoglosses (not found in Tocharian) among them.

    This is agreed, even by the Copenhagen group, who expressly state that Yamna is to be identified with the rest of Late PIE languages after the Tocharian-related migrations.

    copper-age-early_yamna-corded-ware
    Early Yamna community and its migrations ca. 3000 BC onwards.

    The period of an early Yamna community constrained to the Pontic-Caspian steppe (ca. 3300-3000 BC) is followed by renewed waves of Late Proto-Indo-European migrations, during which areal contacts and innovations (even between unrelated LPIE branches) can still be reconstructed.

    These later migrations can be precisely described as follows (after the latest studies):

    • Yamna migrants, of mixed R1b-L51 and R1b-Z2103 lineages, settle ca. 3000-2600 BC along the lower Danube, in the Balkans and the Carpathian basin, giving rise later to groups of:
    • In the Pontic-Caspian steppe, early Yamna groups evolve into (from west to east) Late Yamna, Catacomb, and Poltavka groups, ca. 2800-2300 BC, all still dominated by R1b-L23 lineages (see discussion on the Catacomb sample), with:
      • Poltavka peoples admixing with Abashevo migrants to form admixed Potapovka and Sintashta-Petrovka groups, showing still after ca. 1800 BC a mixed society of R1a-Z93 and R1b-Z2103 lineages (see Narasimhan et al. 2018);
        • Expanding early Proto-Iranian and Proto-Indo-Aryan groups in Srubna (to the west) and Andronovo (to the east), during the first half of the 2nd millennium BC, dominate over the Bronze Age steppe and Central Asia with expanding R1a-Z93 lineages.

    Conclusion

    chalcolithic_late_Europe_Bell_Beaker
    Diachronic map of Late Copper Age migrations including Classical Bell Beaker (east group) expansion from central Europe ca. 2600-2250 BC

    1) East Bell Beakers clearly dominated culturally and genetically over almost all of Europe, ca. 2500-2000 BC, including previous Corded Ware territory, representing thus the most recent massive migration of steppe peoples in Europe, and being the only pan-European culture derived from Late Proto-Indo-European-speaking Yamna. They must therefore be identified with North-West Indo-European speakers, as proposed by Mallory (2013), and not just Italo-Celtic (as supported recently by the Danish school, based on Gimbutas’ outdated model):

    1.A) For Germanic, we already have proof that an appropriate, unitary Scandinavian society, ripe for the development of a common Pre-Germanic language (that expanded much later, during the Iron Age, as Proto-Germanic) could have developed only after the arrival of Bell Beakers (see Prescott 2017). The association of proto-historic Germanic tribes mainly with the expansion of R1b-U106 lineages bears witness to that.

    NOTE. Even without taking into account the likely L51 samples from Khvalynsk, it is by now quite clear that R1b-L51 lineages were already admixed in Yamna settlers from the Carpathian Basin, and any subclade of U106, L21, DF27, or U152 can thus be found everywhere in Europe associated with any of those North-West Indo-European migrations. What we are seeing later, as in the East Bell Beaker migrants arriving in the British Isles (L21), Iberia (DF27), or the Netherlands/Scandinavia (U106), is the further reduction in variability coupled with the expansion of a few successful families (and their lineages), as we know usually happens during migrations.

    1.B) For Balto-Slavic, it seems they were not part of the eastern Corded Ware peoples: the Copenhagen group denies an Indo-Slavonic group in the Nature paper, referring instead to a dominion of early Iranians in the steppes, following their traces to proto-historic and historic Iranian-speaking peoples. And we knew already that Bell Beakers dominated over Central-East Europe, before the resurgence of R1a-Z645 lineages in the region, which is compatible with the North-West Indo-European nature of their language undergoing a satemization process similar (but not identical) to the Indo-Iranian one (see the full discussion on Balto-Slavic here).

    NOTE. The few ancestral traits common to Germanic and Balto-Slavic are today considered a common substrate language to both, and not due to close contacts (and still less a common branch, as was proposed in the 1st half of the 20th c.). You can read e.g. Kortlandt’s Baltic, Slavic, Germanic (2017), or our Corded Ware substrate hypothesis (2017). In both theories, the referenced substrate is likely a non-Indo-European language, and in both cases it is related to the Corded Ware culture, which represents their most common immediate ancestral population before the spread of Bell Beakers.

    2) The late Corded Ware groups of Finland and Estonia, as well as Fatyanovo and Abashevo (and succeeding groups of Eastern Europe) may now be more clearly associated with Proto-Finno-Ugric dialects, and thus probably Corded Ware groups in general with Uralic languages, whose western branches have not survived to this day, with their culture and language being replaced quite early by expanding Bell Beakers.

    NOTE. While the demise of Central and Central-East European CWC groups is evident, continuous contacts among Battle Axe culture groups in Scandinavia and the Gulf of Finland through the Baltic Sea, and the strong Bronze Age Palaeo-Germanic influence on Finnic languages (stronger than earlier Indo-Iranian borrowings), may point to the continuity of Proto-Finnic in Northern Scandinavia, which may force a reinterpretation of the prehistoric location of Proto-Finnic-speaking groups.

    Those supporting a Corded Ware expansion of Germanic or Balto-Slavic with R1a subclades, now rejecting the expansion of Proto-Indo-European from an Anatolian homeland (following the spread of Neolithic farmer ancestry), and negating the close Proto-Indo-Iranian – Uralic contacts, are willfully ignoring linguistic, archaeological, and genetic data whenever it does not fit with their previous theories.

    Good times ahead to chase false syllogisms and contradictions everywhere.

    Related:

    Consequences of Damgaard et al. 2018 (II): The late Khvalynsk migration waves with R1b-L23 lineages

    chalcolithic_early-asia

    This post should probably read “Consequences of Narasimhan et al. (2018),” too, since there seems to be enough data and materials published by the Copenhagen group in Nature and Science to make a proper interpretation of the data that will appear in their corrected tables.

    The finding of late Khvalynsk/early Yamna migrations, identified with early LPIE migrants almost exclusively of R1b-L23 subclades, is probably one of the most interesting results in the recent papers regarding the Indo-European question.

    Although there are still few samples to derive fully-fledged theories, they begin to depict a clearer idea of waves that shaped the expansion of Late Proto-Indo-European migrants in Eurasia during the 4th millennium BC, i.e. well before the expansion of North-West Indo-European, Palaeo-Balkan, and Indo-Iranian languages.

    Late Khvalynsk expansions and archaic Late PIE

    Like Anatolian, Tocharian has been described as having a more archaic nature than the rest of Late PIE. However, Pre-Tocharian belongs to the Late PIE trunk, clearly distinguishable phonetically and morphologically from Anatolian.

    It is especially remarkable that – even though it expanded into Asia – it has more in common with North-West Indo-European, hence its classification (together with NWIE) as part of a Northern group, unrelated to Graeco-Aryan.

    The linguistic supplement by Kroonen et al. accepts that peoples from the Afanasevo culture (ca. 3000-2500 BC) are the most likely ancestors of Tocharians.

    NOTE. For those equating the Tarim Mummies (of R1a-Z93 lineages) with Tocharians, you have this assertion from the linguistic supplement, which I support:

    An intermediate stage has been sought in the oldest so-called Tarim Mummies, which date to ca. 1800 BCE (Mallory and Mair 2000; Wáng 1999). However, also the language(s) spoken by the people(s) who buried the Tarim Mummies remain unknown, and any connection between them and the Afanasievo culture on the one hand or the historical speakers of Tocharian on the other has yet to be demonstrated (cf. also Mallory 2015; Peyrot 2017).

    New samples of late Khvalynsk origin

    These are the recent samples that could, with more or less certainty, correspond to migration waves from late Khvalynsk (or early Yamna), from oldest to most recent:

    • The Namazga III samples from the Late Eneolithic period (in Turkmenistan), dated ca. 3360-3000 BC (one of haplogroup J), potentially showing the first wave of EHG-related steppe ancestry into South Asia. Not related to Indo-Iranian migrations.

    NOTE. A proper evaluation with further samples from Narasimhan et al. (2018) is necessary, though, before we can assert a late Khvalynsk origin of this ancestry.

    • Afanasevo samples, dated ca. 3081-2450 BC, with all samples dated before ca. 2700 BC uniformly of R1b-Z2103 subclades, sharing a common genetic cluster with Yamna, showing together the most likely genomic picture of late Khvalynsk peoples.

    NOTE 1. Anthony (2007) put this expansion from Repin ca. 3300-3000 BC, while his most recent review (2015) of his own work put its completion ca. 3000-2800. While the migration into Afanasevo may have lasted some time, the wave of migrants (based on the most recent radiocarbon dates) must be set at least before ca. 3100 BC from Khvalynsk.

    NOTE 2. I proposed that we could find R1b-L51 in Afanasevo, presupposing the development of R1b-L51 and R1b-Z2103 lineages with separating clans, and thus with dialectal divisions. While finding this is still possible within Khvalynsk regions, it seems we will have a division of these lineages already ca. 4250-4000 BC, which would require a closer follow-up of the different inner late Khvalynsk groups and their samples. For the moment, we don’t have a clear connection through lineages between North-West Indo-European groups and Tocharian.

    tocharian-early-copper-age
    Early Copper Age migrations in Asia ca. 3300-2800 BC, according to Anthony (2015).

    • Subsequent and similar migration waves are probably to be suggested from the new sample of Karagash, beyond the Urals (attributed to the Yamna culture, hence maintaining cultural contacts after the migration waves), of R1b-Z2103 subclade, ca. 3018-2887 BC, potentially connected then to the event that caused the expansion of Yamna migrants westward into the Carpathians at the same time. Not related to Indo-Iranian migrations.
    • The isolated Darra-e Kur sample, without cultural attribution, ca. 2655 BC, of R1b-L151 lineage. Not related to Indo-Iranian migrations.
    • The Hajji Firuz samples: I4243 dated ca. 2326 BC, female, with a clear inflow of steppe ancestry; and I2327 (probably to be dated to the late 3rd millennium BC or after that), of R1b-Z2103 lineage. Not related to Indo-Iranian migrations.

    NOTE. A new radiocarbon dating of I2327 is expected, to correct the currently available date of 5900-5000 BC. Since it clusters nearer to Chalcolithic samples from the site than I4243 (from the same archaeological site), it is possible that both are part of similar groups receiving admixture around this period, or maybe I2327 is from a later period, coinciding with the Iron Age sample F38 from Iran (Broushaki et al. 2016), with which it closely clusters. Also, the finding of EHG-related ancestry in Maykop samples dated ca. 3700-3000 BC (maybe with R1b-L23 subclades) offers another potential source of migrants for this Iranian group.

    NOTE. Samples from Narasimhan et al. (2018) still need to be published in corrected tables, which may change the actual subclades shown here.

    These late Khvalynsk / early Yamna migration waves into Asia are quite early compared to the Indo-Iranian migrations, whose ancestors can only be first identified with Volga-Ural groups of Yamna/Poltavka (ca. 3000-2400 BC), with its fully formed language expanding only with MLBA waves ca. 2300-1200 BC, after mixing with incoming Abashevo migrants.

    While the authors apparently forget to reference the previous linguistic theories whereby Tocharian is more archaic than the rest of Late PIE dialects, they refer to the ca. 1,000-year gap between Pre-Tocharian and Proto-Indo-Iranian migrations, and thus their obvious difference:

    The fact that Tocharian is so different from the Indo-Iranian languages can only be explained by assuming an extensive period of linguistic separation.

    Potential linguistic substrates in the Middle East

    A few words about relevant substrate language proposals.

    Euphratic language

    What Gordon Whittaker proposes is a North-West Indo-European-related substratum in Sumerian language and texts ca. 3500 BC, which may explain some non-Sumerian, non-Semitic word forms. It is just one of many theories concerning this substratum.

    eneolithic_steppe
    Diachronic map of Eneolithic migrations ca. 4000-3100 BC

    This is a summary of his findings from his latest writing on the subject (a chapter of a book on Indo-European phonetics, from the series Copenhagen Studies in Indo-European):

    In Sumerian and Akkadian vocabulary, the cuneiform writing system, and the names of deities and places in Southern Mesopotamia a body of lexical material has been preserved that strongly suggests influence emanating from a superstrate of Indo-European origin. This Indo-European language, which has been given the name Euphratic, is, at present, attested only indirectly through the filters of Sumerian and Akkadian. The attestations consist of words and names recorded from the mid-4th millennium BC (Late Uruk period) onwards in texts and lexical lists. In addition, basic signs that originally had a recognizable pictorial structure in proto-cuneiform preserve (at least from the early 3rd millennium on) a number of phonetic values with no known motivation in Sumerian lexemes related semantically to the items depicted. This suggests that such values are relics from the original logographic values for the items depicted and, thus, that they were inherited from a language intimately associated with the development of writing in Mesopotamia. Since specialists working on proto-cuneiform, most notably Robert K. Englund of the Cuneiform Digital Library Initiative, see little or no evidence for the presence of Sumerian in the corpus of archaic tablets, the proposed Indo-European language provides a potential solution to this problem. It has been argued that this language, Euphratic, had a profound influence on Sumerian, not unlike that exerted by Sumerian and Akkadian on each other, and that the writing system was the primary vehicle of this influence. The phonological sketch drawn up here is an attempt to chart the salient characteristics of this influence, by comparing reconstructed Indo-European lexemes with similarly patterned ones in Sumerian (and, to a lesser extent, in Akkadian).

    His original model, based on phonetic values in basic proto-cuneiform signs, is quite imaginative and a very interesting read, if you have the time. His Academia.edu account hosts most of his papers on the subject.

    We could speculate about the potential expansion of this substrate language with the commercial contacts between Uruk and Maykop (as I did), now probably more strongly supported because of the EHG found in Maykop samples.

    NOTE. We could also put it in relation with the Anatolian language of Mari, but this would require a new reassessment of its North-West Indo-European nature.

    Nevertheless, this theory is far from being mainstream, anywhere. At least today.

    NOTE. The proposal remains hypothetical, because of the flaws in the Indo-European parallels – similar to Koch’s proposal of Indo-European in Tartessian inscriptions. A comprehensive critical approach to the theory is found in Sylvie Vanséveren’s A “new” ancient Indo-European language? On assumed linguistic contacts between Sumerian and Indo-European “Euphratic”, in JIES (2008) 36:3&4.

    Gutian language

    References to Gutian are popping up related to the Hajji Firuz samples of the mid-3rd millennium.

    The hypothesis was put forward by Henning (1978) in purely archaeological terms.

    This is the relevant excerpt from the book:

    (…) Comparativists have asserted that, in spite of its late appearance, Tokharian is a relatively archaic form of Indo-European. This claim implies that the speakers of this group separated from their Indo-European brethren at a comparatively early date. They should accordingly have set out on their migrations rather early, and should have appeared within the Babylonian sphere of influence also rather early. Earlier, at any rate, than the Indo-Iranians, who spoke a highly developed (therefore probably later) form of Indo-European. Moreover, as some of the Indo-Iranians after their division into Iranians and Indo-Aryans appeared in Mesopotamia about 1500 B.C., we should expect the Proto-Tokharians about 2000 B.C. or even earlier.

    If, armed with these assumptions as our working hypothesis, we look through the pages of history, we find one nation – one nation only – that perfectly fulfills all three conditions, which, therefore, entitles us to recognize it as the “Proto-Tokharians”. Its name was Guti; the initial is also spelled with q (a voiceless back velar or pharyngeal), but the spelling with g is the original one. The closing -i is part of the name, for the Akkadian case-endings are added to it, nom. Gutium etc. Guti (or Gutium, as some scholars prefer) was valid for the nation, considered as an entity, but also for the territory it occupied.
    (…).

    The text goes on to follow the invasion of Babylonia by the Guti, and further eastward expansions supposedly connected with these, to form the attested Tocharians.

    The referenced text by Thorkild Jacobsen offers some interesting linguistic data:

    Among the Gutian rulers is one Elulumesh, whose name is evidently Akkadian Elulum slightly “Gutianized” by the Gutian case(?) ending -eš. This Gutian ruler Elulum is obviously the same man whom we find participating in the scramble for power after the death of Shar-kali-sharrii; his name appears there in Sumerian form without mimation as Elulu.

    The Gutian dynasty, from ca. the 22nd c. BC, appears as follows:

    gutian-rulers

    I don’t think we could derive a potential relation to any specific Indo-European branch from this simple suffix repeated in Gutian rulers, though.

    The hypothesis of the Tocharian-like nature of the Guti (apart from the obvious error of considering them as the ancestors of Tocharians) has not been tested in newer works. It was cited e.g. by Gamkrelidze and Ivanov (1995) in support of their Armenian homeland, and by Mallory and Adams in their Encyclopedia (1997).

    It lies therefore in the obscurity of undeveloped archaeological-linguistic hypotheses, and its connection with the attested R1b-Z2103 samples from Iran is not (yet) warranted.

    Related:

    Eurasian steppe dominated by Iranian peoples, Indo-Iranian expanded from East Yamna

    yamna-indo-iranian-expansion

    The expected study of Eurasian samples is out (behind paywall): 137 ancient human genomes from across the Eurasian steppes, by de Barros Damgaard et al. Nature (2018).

    Discussion (emphasis mine):

    Our findings fit well with current insights from the historical linguistics of this region (Supplementary Information section 2). The steppes were probably largely Iranian-speaking in the first and second millennia bc. This is supported by the split of the Indo-Iranian linguistic branch into Iranian and Indian33, the distribution of the Iranian languages, and the preservation of Old Iranian loanwords in Tocharian34. The wide distribution of the Turkic languages from Northwest China, Mongolia and Siberia in the east to Turkey and Bulgaria in the west implies large-scale migrations out of the homeland in Mongolia since about 2,000 years ago35. The diversification within the Turkic languages suggests that several waves of migration occurred36 and, on the basis of the effect of local languages, gradual assimilation to local populations had previously been assumed37. The East Asian migration starting with the Xiongnu accords well with the hypothesis that early Turkic was the major language of Xiongnu groups38. Further migrations of East Asians westwards find a good linguistic correlate in the influence of Mongolian on Turkic and Iranian in the last millennium39. As such, the genomic history of the Eurasian steppes is the story of a gradual transition from Bronze Age pastoralists of West Eurasian ancestry towards mounted warriors of increased East Asian ancestry—a process that continued well into historical times.

    This paper will need a careful reading, ideally in combination with Narasimhan et al. (2018) once their tables are corrected, to assess the actual ‘Iranian’ nature of the peoples studied. Their wide and long-term dominion over the steppe could also potentially explain some early samples from Hajji Firuz with steppe ancestry.

    eurasian-steppe-samples
    Principal component analyses. The principal components 1 and 2 were plotted for the ancient data analysed with the present-day data (no projection bias) using 502 individuals at 242,406 autosomal SNP positions. Dimension 1 explains 3% of the variance and represents a gradient stretching from Europe to East Asia. Dimension 2 explains 0.6% of the variance, and is a gradient mainly represented by ancient DNA starting from a ‘basal-rich’ cluster of Natufian hunter-gatherers and ending with EHGs. BA, Bronze Age; EMBA, Early-to-Middle Bronze Age; SHG, Scandinavian hunter-gatherers.

    For the moment, at first sight, it seems that, in terms of Y-DNA lineages:

    • R1a-Z93 (especially Z2124 subclades) dominates the steppes in the studied periods.
    • R1b-P312 is found in Hallstatt ca. 810 BC, which is compatible with its role in the Celtic expansion.
    • R1b-U106 is found in a West Germanic chieftain in Poprad (Slovakia) ca. 400 AD, during the Migration Period, hence supporting once again the expansion of Germanic tribes especially with R1b-U106 lineages.
    • A new sample of N1c-L392 (L1025) lineage dated ca. 400 AD, now from Lithuania, points again to a quite late expansion of this lineage to the region, believed to have hosted Uralic speakers for more than 2,000 years before this.
    • A sample of haplogroup R1a-Z282 (Z92) dated ca. 1300 AD in the Golden Horde is probably not quite revealing, not even for the East Slavic expansion.
    • Also, interestingly, some R1b(xM269) lineages seem to be associated with Turkic expansions from the eastern steppe dated around 500 AD, which probably points to a wide Eurasian distribution of early R1b subclades in the Mesolithic.

    NOTE. I have referenced not just the reported subclades from the paper, but also (and mainly) further Y-SNP calls studied by Open Genomes. See the spreadsheet here.

    Interesting also to read in the supplementary materials the following, by Michaël Peyrot (emphasis mine):

    1. Early Indo-Europeans on the steppe: Tocharians and Indo-Iranians

    The Indo-European language family is spread over Eurasia and comprises such branches and languages as Greek, Latin, Germanic, Celtic, Sanskrit etc. The branches relevant for the Eurasian steppe are Indo-Aryan (= Indian) and Iranian, which together form the Indo-Iranian branch, and the extinct Tocharian branch. All Indo-European languages derive from a postulated protolanguage termed Proto-Indo-European. This language must have been spoken ca 4500–3500 BCE in the steppe of Eastern Europe21. The Tocharian languages were spoken in the Tarim Basin in present-day Northwest China, as shown by manuscripts from ca 500–1000 CE. The Indo-Aryan branch consists of Sanskrit and several languages of the Indian subcontinent, including Hindi. The Iranian branch is spread today from Kurdish in the west, through a.o. Persian and Pashto, to minority languages in western China, but was in the 2nd and 1st millennia BCE widespread also on the Eurasian steppe. Since despite their location Tocharian and Indo-Iranian show no closer relationship within Indo-European, the early Tocharians may have moved east before the Indo-Iranians. They are probably to be identified with the Afanasievo Culture of South Siberia (ca 2900 – 2500 BCE) and have possibly entered the Tarim Basin ca 2000 BCE103.

    The Indo-Iranian branch is an extension of the Indo-European Yamnaya Culture (ca 3000–2400 BCE) towards the east. The rise of the Indo-Iranian language, of which no direct records exist, must be connected with the Abashevo / Sintashta Culture (ca 2100 – 1800 BCE) in the southern Urals and the subsequent rise and spread of Andronovo-related Culture (1700 – 1500 BCE). The most important linguistic evidence of the Indo-Iranian phase is formed by borrowings into Finno-Ugric languages104–106. Kuz’mina (2001) identifies the Finno-Ugrians with the Andronoid cultures in the pre-taiga zone east of the Urals107. Since some of the oldest words borrowed into Finno-Ugric are only found in Indo-Aryan, Indo-Aryan and Iranian apparently had already begun to diverge by the time of these contacts, and when both groups moved east, the Iranians followed the Indo-Aryans108. Being pushed by the expanding Iranians, the Indo-Aryans then moved south, one group surfacing in equestrian terminology of the Anatolian Mitanni kingdom, and the main group entering the Indian subcontinent from the northwest.

    steppe-migrations-pastoralists
    Summary map. Depictions of the five main migratory events associated with the genomic history of the steppe pastoralists from 3000 bc to the present. a, Depiction of Early Bronze Age migrations related to the expansion of Yamnaya and Afanasievo culture. b, Depiction of Late Bronze Age migrations related to the Sintashta and Andronovo horizons. c, Depiction of Iron Age migrations and sources of admixture. d, Depiction of Hun-period migrations and sources of admixture. e, Depiction of Medieval migrations across the steppes.

    2. Andronovo Culture: Early Steppe Iranian

    Initially, the Andronovo Culture may have encompassed speakers of Iranian as well as Indo-Aryan, but its large expansion over the Eurasian steppe is most probably to be interpreted as the spread of Iranians. Unfortunately, there is no direct linguistic evidence to prove to what extent the steppe was indeed Iranian speaking in the 2nd millennium BCE. An important piece of indirect evidence is formed by an archaic stratum of Iranian loanwords in Tocharian34,109. Since Tocharian was spoken beyond the eastern end of the steppe, this suggests that speakers of Iranian spread at least that far. In the west of the Tarim Basin the Iranian languages Khotanese and Tumshuqese were spoken. However, the Tocharian B word etswe ‘mule’, borrowed from Iranian *atswa- ‘horse’, cannot derive from these languages, since Khotanese has aśśa- ‘horse’ with śś instead of tsw. The archaic Iranian stratum in Tocharian is therefore rather to be connected with the presence of Andronovo people to the north and possibly to the east of the Tarim Basin from the middle of the 2nd millennium BCE onwards110.

    Since Kristiansen and Allentoft are among the authors of the paper (and Peyrot is a colleague of Kroonen), it seems that they needed to expressly respond to the growing criticism about their recent Indo-European – Corded Ware Theory. That’s nice.

    They are obviously trying to reject the Corded Ware – Uralic links that are on the rise lately among Uralicists, now that Comb Ware is not a suitable candidate for the expansion of the language family.

    IECWT-proponents are apparently not prepared to let it go quietly, and instead of challenging the traditional Neolithic Uralic homeland in Eastern Europe with a recent paper on the subject, they selected an older one which partially fit, from Kuz’mina (2001), now shifting the Uralic homeland to the east of the Urals (when Kuz’mina asserts it was south of the Urals).

    Different authors comment later in this same paper about East Uralic languages spreading quite late, so even their text is not consistent among collaborating authors.

    Also interesting is the need to resort to the questionable argument of early Indo-Aryan loans – which may just as well have been Indo-Iranian, since there is no way to prove a difference between both stages in early Uralic borrowings from ca. 4,500-3,500 years ago…

    EDIT (10/5/2018) The linguistic supplement of the Science paper deals with different Proto-Indo-Iranian stages in Uralic loans, so on the linguistic side at least this influence is clear to all involved.

    A rejection of such proposals of a late, eastern homeland can be found in many recent writings of Finnic scholars; see e.g. my references to Parpola (2017), Kallio (2017), or Nordqvist (2018).

    NOTE. I don’t mind repeating it again: Uralic is one possibility (the most likely one) for the substrate language that Corded Ware migrants spread, but it could have been e.g. another Middle PIE dialect, similar to Proto-Anatolian (after the expansion of Suvorovo-Novodanilovka chiefs). I expressly stated this in the Corded Ware substrate hypothesis, since the first edition. What was clear since 2015, and should be clear to anyone now, is that Corded Ware did not spread Late PIE languages to Europe, and that some east CWC groups only spread languages to Asia after admixing with East Yamna. If they did not spread Uralic, then it was a language or group of languages phonetically similar, which has not survived to this day.

    Their description of Yamna migrations is already outdated after Olalde et al. & Mathieson et al. (2018), and Narasimhan et al. (2018), so they will need to update their model (yet again) for future papers. As I said before, Anthony seems to be one step behind the current genetic data, and the IECWT seems to be one step behind Anthony in their interpretations.

    At least we won’t have the Yamna -> Corded Ware -> BBC nonsense anymore, and they expressly stated that LPIE is to be associated with Yamna, and in particular that the “Indo-Iranian branch is an extension of the Indo-European Yamnaya Culture (ca 3000–2400 BCE) towards the east” (which will evidently show an East Yamna / Poltavka society of R1b-L23 subclades), so that earlier Eneolithic cultures have to be excluded, and Balto-Slavic identification with East Europe is also out of the way.

    Related: