Sorry for the last weeks of silence; I have been rather busy lately. I have more projects going on, and (because of that) I also wanted to finish a project I have been working on for many months already.
I have therefore decided to publish a provisional version of the text, in the hope that it will be useful in the following months, when I won’t be able to update it as often as I would like to:
Don’t forget to check out the maps included in the supplementary materials (I have added Y-DNA, mtDNA, and ADMIXTURE data using GIS software).
NOTE. Right now the files are only on my server. I will try to upload them to Academia.edu and ResearchGate when I have time, in case the websites are too slow.
I would have preferred to wait for a thorough revision of the section on archaeology and the linguistic sections on Uralic, but I doubt I will have time when the reviews come, so it was either now or maybe next December…
I say so in the introduction, but it is evident that certain aspects of the book are tentative to say the least: the farther back we go from Late Proto-Indo-European, the less clear are many aspects. Also, linguistically I am not convinced about Eurasiatic or Nostratic, although they do have a certain interest when we try to offer a comprehensive view of the past, including ethnolinguistic identities.
I cannot be an expert in everything, and these books cover a lot. I am bound to publish many corrections as new information appears and more reviews are sent. For example, just days ago (before SNP calls of Wang et al. 2018 were published) some paragraphs implied that AME might have expanded Nostratic from the Middle East. Now it does not seem so, and I changed them just before uploading the text. That’s how tentative certain routes are, and how much all of this may change. And that only if we accept a Nostratic phylum…
NOTE. Since the first book I wrote was the linguistic one, and I have spent the last months updating the archaeology + genetics part, now many of you will probably understand 1) why I am so convinced about certain language relationships and 2) how I used many posts to clarify certain ideas and receive comments. Many posts probably offer a good timeline of what I worked with, and when.
I did not add this section to the books, because they are still not ready for print, but I think this is due somewhere now. It is impossible to reference all who have directly or indirectly contributed to this, so this is a list of those I feel have played an important role.
I am indebted to the following people (which does not mean that they share my views, obviously):
First and foremost, to Fernando López-Menchero, for having the patience to review in detail many parts on Indo-European linguistics, knowing that I won’t accept many of his comments anyway. The additional information he offers is invaluable, but I didn’t want to turn this into a huge linguistic encyclopaedia with unending discussions of tiny details of each reconstructed word. I think it is already too big as it is.
Professor Kortlandt is still to review the text, but he contributed to both previous essays in some very interesting ways, so I hope he can help me improve the parts on Uralic, and maybe alternative accounts of expansion for Balto-Slavic, depending on the time depth that he would consider warranted according to the Temematic hypothesis.
I would not have thought about doing this if it were not for the interest of Wekwos (Xavier Delamarre) in publishing a full book about the Indo-European demic diffusion model (in the second half of 2017, I think). They were the ones who suggested that I extend the content, when all I had done until then was write an essay and draw some maps in my free time between depositing the PhD thesis and defending it.
Sadly, as much as I would like to publish a book with a professional publisher, I don’t think ancient DNA lends itself to the traditional format, so my requests (mainly to have free licenses and to be able to revise the text at will, as new genetic papers are published) were logically not acceptable. Also, the main aim of all volumes, especially the linguistic one, is the teaching of essentials of Late Proto-Indo-European and related languages, and this objective would be thwarted by selling each volume for $50-70 and only in printed format. I prefer a wider distribution.
At first I didn’t think much of this proposal, because I do not benefit from this kind of publication in my scientific field, but with time my interest in writing a whole, comprehensive book on the subject grew to the point where it was already an ongoing project, probably by the start of 2018.
I would not have been in contact with Wekwos if it were not for user Camulogène Rix at Anthrogenica, so thanks for that and for the interest in this work.
I would not have thought of writing this either if not for the spontaneous support (with an unexpected phone call!) of a professor of the Complutense University of Madrid, Ángel Gómez Moreno, who is interested in this subject – as is his wife, a professor of Classics more closely associated with Indo-European studies, and who helped me with a search for Indo-Europeanists.
EDIT (1 JAN 2019): I remembered that Karin Bojs sent me her book after reading the demic diffusion model. I may have also thought about writing a whole book back then, but mid-2017 is probably too early for the project.
The maps are evidently (for those who are interested in genetics) in part the result of the effort of the late Jean Manco: As you can see from the maps including Y-DNA and mtDNA samples, I have benefitted from her way of organising data and publishing it. Similarly, the work of Iain McDonald in assessing the potential migration routes of R1b and R1a in Europe with the help of detailed maps was behind my idea for the first maps, and consequently behind these, too.
Readers of this blog with interesting comments have also been essential for the improvement of the texts. You can probably see some of your many contributions there. I may not answer many comments, because I am always busy (and sometimes I just don’t have anything interesting to say), but I try to read all of them.
Users of other sites, like Anthrogenica, whose particular points of view and deep knowledge of some very specific aspects are sometimes very useful. In particular, user Anglesqueville helped me to fix some issues with the merging of datasets to obtain the PCAs and ADMIXTURE, and prepared some individual samples to merge them.
Even without posting anything, Google Analytics keeps sending me messages about increasing user fidelity (returning users), and stats haven’t really changed (which probably means more people are reading old posts), so thank you for that.
It is a great resource to learn Late Proto-Indo-European as a modern language, from the most basic level up to an intermediate level (estimated B1–B2, depending on one’s previous background in Indo-European and classical languages).
Instead of working on unending details and discussions of the language reconstruction, it takes Late Proto-Indo-European as a learned, modern language that can be used for communication, so that people not used to studying with university manuals on comparative grammar can learn almost everything necessary about PIE in the most comfortable way.
A Guidebook for Modern Indo-European Explorers (Part I), by Fernando López-Menchero, is online! https://t.co/L82Zx75bAj It is a self-learning method of Late PIE as a modern language (A1 up to B1-B2 level), divided into fun lessons with key grammar and culture details.
NOTE. Even though we help each other with our works, Fernando is not the least interested in genetics (the “steppe ancestry” or the “R1b–R1a” question, or any other issue involving population genomics), or even too much about archaeology or the homeland question (although he uses the mainstream view that Late Proto-Indo-Europeans expanded from Yamna). His only interest is language reconstruction, and I doubt you can find anything else in his works but pure love for linguistics, including this one.
I was starting to call his project of a self-learning method The Winds of Winter, seeing how it appeared to be always in the making, but never actually finished. It seems that the publication of this first part will make my revision of the Indo-European demic diffusion model become the true The Winds of Winter here, in this our common series of books on Late Proto-Indo-European and its dialects…
As you can see, I am publishing less and less in this blog lately, and it’s all just to be able to finish a revision in time (that is, before more new genetic research compels me to delay it again…). It is a very thorough revision, so those of you who liked it are not going to be disappointed.
I hoped to have it ready for mid-December, but, as it turns out, due to different unexpected delays, I am now more confident about a mid-January / February date, and that only if everything goes well.
Overall, 96 samples ranging from Slovenian littoral to Lower Styria were genotyped for 713,599 markers using the OmniExpress 24-V1 BeadChips (Figure 1), genetic data were obtained from Esko et al. (2013). After removing related individuals, 92 samples were left. The Slovenian dataset has been subsequently merged with the Human Origin dataset (Lazaridis et al., 2016) for a total of 2163 individuals.
First, Y chromosome genetic diversity was assessed. A total of 52 Y chromosomes were analyzed for 195 SNPs. The majority of individuals (25, 48.1%) belong to the haplogroup R1a1a1a (R-M417) while the second major haplogroup is represented by R1b (R-M343) including 15 individuals (28.8%). Twelve samples are assigned to haplogroup I (I M170): five and two samples belong to haplogroup I2a (I L460) and I1 (I M253), respectively, while the remaining five samples did not have enough information to be further assigned.
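The percentages quoted above follow directly from the counts. A minimal Python sketch (the dictionary and function names are mine, not the paper's) reproduces them and checks that the three groups account for all 52 chromosomes:

```python
# Counts as quoted in the paragraph above (52 Y chromosomes in total).
counts = {"R1a1a1a (R-M417)": 25, "R1b (R-M343)": 15, "I (I-M170)": 12}

def haplogroup_frequencies(counts):
    """Return each haplogroup's share of the sample, in percent (1 decimal)."""
    total = sum(counts.values())
    return {hg: round(100 * n / total, 1) for hg, n in counts.items()}

freqs = haplogroup_frequencies(counts)
for hg, pct in freqs.items():
    print(f"{hg}: {pct}%")
```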
Considering the unbalanced sample size of the Slovenian population compared to the other populations included in the dataset, a subset of 20 Slovenian individuals randomly sampled was used.
All Slovenian samples group together with Hungarians, Czechs, and some Croatians (“Central-Eastern European” cluster) as also suggested by the PCA. All Basque individuals with few French and Spanish cluster together (“Basque” cluster) while a “Northern-European” cluster is made of the majority of French, English, Icelanders, Norwegians, and Orcadians. Five populations contributed to the “Eastern-European” cluster including Belarusians, Estonians, Lithuanians, Mordovians, and Russians. Western and South Europe is split into two cluster: the first (“Western European” cluster) includes all Spanish individuals, few French, and some Italians (North Italy) while the second (“Southern-European” cluster) groups Sicilians, Greeks, some Croatians, Romanians, and some Italians (North Italy).
Admixture Pattern and Migration
All Slovenian individuals share common pattern of genetic ancestry, as revealed by ADMIXTURE analysis. The three major ancestry components are the North East and North West European ones (light blue and dark blue, respectively, Figure 3), followed by a South European one (dark green, Figure 3). Contribution from the Sardinians and Basque are present in negligible amount. The admixture pattern of Slovenians mimics the one suggested by the neighboring Eastern European populations, but it is different from the pattern suggested by North Italian populations even though they are geographically close.
Using ALDER, the most significant admixture event was obtained with Russians and Sardinians as source populations and it happened 135 ± 9.31 generations ago (Z-score = 11.54). (…) When tested for multiple admixture events (MALDER), we obtained evidence for one admixture event 165.391 ± 17.1918 generations ago corresponding to ∼2620 BCE (CI: 3101–2139) considering a generation time of 28 years (Figure 4), with Kalmyk and Sardinians as sources.
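The conversion from generations to a calendar date in the MALDER result can be checked with a few lines of Python. This is only a sketch: the 28-year generation time comes from the quoted text, but the reference year (2011 CE) is my assumption, chosen because it reproduces the quoted ∼2620 BCE; the quoted interval matches plus or minus one standard error, not a 95% interval.

```python
GENERATION_YEARS = 28      # generation time stated in the quoted text
REFERENCE_YEAR_CE = 2011   # ASSUMPTION: chosen so that the quoted ~2620 BCE is reproduced

def generations_to_bce(generations, std_err,
                       gen_years=GENERATION_YEARS, ref_year=REFERENCE_YEAR_CE):
    """Convert an admixture date in generations to (point, older, younger) years BCE.

    The bounds are the point estimate plus/minus one standard error.
    The absence of a year zero is ignored.
    """
    point = generations * gen_years - ref_year
    half_width = std_err * gen_years
    return round(point), round(point + half_width), round(point - half_width)

print(generations_to_bce(165.391, 17.1918))  # (2620, 3101, 2139)
```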
We then modeled the Slovenian population as target of admixture of ancient individuals from Haak et al. (2015) while computing the f3(Ancient 1, Ancient 2, Slovenian) statistic. The most significant signal was obtained with Yamnaya and HungaryGamba_EN (Z-score = -10.66), followed by MA1 with LBK_EN (Z-score -9.7) and Yamnaya with Stuttgart (Z-score = -8.6) used as possible source populations (Supplementary Figure 5).
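For readers unfamiliar with the f3 admixture test: a significantly negative value means the target's allele frequencies lie between those of the two sources, as expected for an admixed population. The sketch below uses the naive estimator on made-up allele frequencies; it is not the bias-corrected, block-jackknifed version that real software computes, and it puts the target first, whereas the quoted text writes it last.

```python
def f3(target, source1, source2):
    """Naive f3(target; source1, source2): mean over SNPs of (c - a) * (c - b)."""
    if not (len(target) == len(source1) == len(source2)):
        raise ValueError("all populations need the same number of SNPs")
    products = [(c - a) * (c - b) for c, a, b in zip(target, source1, source2)]
    return sum(products) / len(products)

# HYPOTHETICAL allele frequencies at four SNPs, for illustration only.
source_a = [0.10, 0.90, 0.50, 0.20]
source_b = [0.90, 0.10, 0.50, 0.80]
# A target built as a 50/50 mixture of the two sources.
mixed = [(a + b) / 2 for a, b in zip(source_a, source_b)]

print(f3(mixed, source_a, source_b))   # negative: consistent with admixture
print(f3(source_a, mixed, source_b))   # positive: source_a is not a mixture of the others
```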
We found a significant signal of admixture by using both pairs as ancient sources. Specifically, for the pair Yamnaya and Hungary_EN the admixture event is dated at 134.38 ± 23.69 generations ago (Z-score = 5.26, p-value of 1.5e-07) while for Yamnaya and LBK_EN at 153.65 ± 22.19 generations ago (Z-score = 6.92, p-value 4.4e-12). Outgroup f3 with Yamnaya put Slovenian population close to Hungarians, Czechs, and English, indicating a similar shared drift between these population with the Steppe populations (Supplementary Figure 6).
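The quoted p-values appear consistent with a two-sided tail of the standard normal distribution for the given Z-scores. That is my assumption about how they were derived, not something the paper states here. A short check in Python:

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard-normal Z-score."""
    return math.erfc(abs(z) / math.sqrt(2))

for z in (5.26, 6.92):
    print(f"Z = {z}: p = {two_sided_p(z):.1e}")
```

The results land close to the quoted 1.5e-07 and 4.4e-12; small differences are expected because the Z-scores are rounded to two decimals.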
Not that any of this would come as a surprise, but:
PCA keeps supporting the common cluster of certain West, South, and East Slavs in a “Central-Eastern European” cluster, distinct from the “North-Eastern European” cluster formed by modern Finno-Ugrians, as well as ancient Finno-Ugrians of north-eastern Europe who were only recently Slavicized.
Admixture supports the same ancient ‘western’ (a core West+South+East Slavic) cluster, and the admixture event with Yamna + Hungary_EN is logically a proxy for Yamna Hungary being at the core of ancestral Central-East population movements related to Bell Beakers in the mid- to late 3rd millennium.
I don’t know where exactly this impulse for the theory of Russia being the cradle of Slavs comes from today (although there are some obvious political trends to revive 19th c. ideas), but it was always clear for everyone, including Russians, that East Slavs had migrated to the east and north and assimilated indigenous Finno-Ugrians, apart from Turkic-, Iranian-, and Caucasian-speaking peoples to the east. Genetics is only confirming what was clear from other disciplines long ago.
Consistent with their origin, Mongolic-speaking Buryats demonstrate genetic similarity with Mongols, and Turkic-speaking Altai-Kizhi and Teleuts are drawn close to CAS groups. The Tungusic-speaking Evenks collected in central and eastern Siberia cluster together and overlap with Yukagirs. Dolgans are widely scattered in the plot, justifying their recent origin from one Evenk clan, Yakuts, and Russian peasants in the 18th century (Popov, 1964). Uralic-speaking populations comprise a very wide cluster with Komi drawn to Europe, and Khants showing a closer affinity with Selkups, Tundra and Forest Nentsi. Yenisey-speaking Kets are intermingled with Selkups. Interestingly, Samoyedic-speaking Nganasans from the Taymyr Peninsula form a separate tight cluster closer to Evenks, Yukagirs, and Koryaks.
ADMIXTURE and the “Siberian component”
Among Siberians, the Komi are primarily Europeans, while Nganasans, Evenks, Yukagirs, and Koryaks are nearly 100% East Asians. At K = 4 finer scale subcontinental structure can be distinguished with the emergence of a “Siberian” component. This component is highly pronounced in the Nganasans. Outside Siberia, this component is present in Germany and in CAS at low frequency. Within ancient cultures, this component has the highest frequency in three BA Karasuk samples. It is also found in Mal’ta, ENE Afanasievo and BA Andronovo, but not in Ust’-Ishim and BA Okunevo. At K = 5, the “Siberian” component is roughly subdivided into two components with different geographic distributions. The “Nganasan” component is frequent in nearly all Siberian populations, except the Komi, Kets and Selkups. The newly derived “Selkup-Ket” component is found at high frequencies in western Siberian populations. It is observed in BA Karasuk and in Mal’ta. At K = 6, the western Siberian “Nentsi-Khant” ancestry component was developed in Forest and Tundra Nentsi, Khants. This component is also present at low levels in EUR, CAS, Tibet, and southern Siberia.
The Dolgans share more segments with the Nganasans than within themselves (54.13 vs 41.72, Mann-Whitney test, P = .000000000001562546). The result is not surprising as the demographic data showed that the Nganasans were subjected to intense assimilation by the Dolgans in the second half of the 20th century (Goltsova, Osipova, Zhadanov, & Villems, 2005). Tundra Nentsi share more IBD with Forest Nentsi than within themselves (83.96 vs 50.3, P = .000055) possibly due to the common origin and long-term gene flow. The Ket and Selkup populations allocate significantly more IBD blocks between populations than with individuals from their own population (121.2 cM vs 85.9 cM for Kets, P = .000008, and 121.2 cM vs 114.9 cM for Selkups, P = .043).
Haplogroup N in Siberia
Although Siberia exhibits 42 haplogroups, the vast majority of Siberian Y-chromosomes belong only to 4 of the 18 major clades (N = 46.2%; C = 20.9%; Q = 14.4%; and R = 15.2%). The Y-chromosome haplogroup N is widely spread across Siberia and Eastern Europe (Ilumae et al., 2016; Karafet et al., 2002; Wong et al., 2016) and reaches its maximum frequency among Siberian populations such as Nganasans (94.1%) and Yakuts (91.9%). Within Siberia, two sister subclades N-P43 and N-L708 show different geographic distributions. N-P43 and derived haplogroups N-P63 and N- P362 (phylogenetically identical to N-B478* and N-B170, respectively) (Ilumae et al., 2016) are extremely rare in other major geographic regions. Likely originating in western Siberia, they are limited almost entirely to northwest Siberia, the Volga- Uralic regions, and the Taymyr Peninsula (ie, do not extend to eastern Siberia). Conversely, clade N-L708 is frequent in all Siberian populations except the Kets and Selkups, reaching its highest frequency in the Yakuts (91.9%).
Surprisingly, not a single sign of the proposed reindeer pastoralist horde led by Nganasans into north-eastern Europe. This is strange because “Siberian” migrants hypothetically imposed their language over Indo-Europeans quite recently, apparently after the Iron Age…
Interesting comparisons among Siberian groups, though.
We report genome-wide ancient DNA from 49 individuals forming four parallel time transects in Belize, Brazil, the Central Andes, and the Southern Cone, each dating to at least ∼9,000 years ago. The common ancestral population radiated rapidly from just one of the two early branches that contributed to Native Americans today. We document two previously unappreciated streams of gene flow between North and South America. One affected the Central Andes by ∼4,200 years ago, while the other explains an affinity between the oldest North American genome associated with the Clovis culture and the oldest Central and South Americans from Chile, Brazil, and Belize. However, this was not the primary source for later South Americans, as the other ancient individuals derive from lineages without specific affinity to the Clovis-associated genome, suggesting a population replacement that began at least 9,000 years ago and was followed by substantial population continuity in multiple regions.
The D4h3a mtDNA haplogroup has been hypothesized to be a marker for an early expansion into the Americas along the Pacific coast (Perego et al., 2009). However, its presence in two Lapa do Santo individuals and Anzick-1 (Rasmussen et al., 2014) makes this hypothesis unlikely.
The patterns we observe on the Y chromosome also force us to revise our understanding of the origins of present-day variation. Our ancient DNA analysis shows that the Q1a2a1b-CTS1780 haplogroup, which is currently rare, was present in a third of the ancient South Americans. In addition, our observation of the currently extremely rare C2b haplogroup at Lapa do Santo disproves the suggestion that it was introduced after 6,000 BP (Roewer et al., 2013).
(…) Our discovery that the Clovis-associated Anzick-1 genome at ∼12,800 BP shares distinctive ancestry with the oldest Chilean, Brazilian, and Belizean individuals supports the hypothesis that an expansion of people who spread the Clovis culture in North America also affected Central and South America, as expected if the spread of the Fishtail Complex in Central and South America and the Clovis Complex in North America were part of the same phenomenon (direct confirmation would require ancient DNA from a Fishtail-context) (Pearson, 2017). However, the fact that the great majority of ancestry of later South Americans lacks specific affinity to Anzick-1 rules out the hypothesis of a homogeneous founding population. Thus, if Clovis-related expansions were responsible for the peopling of South America, it must have been a complex scenario involving arrival in the Americas of sub-structured lineages with and without specific Anzick-1 affinity, with the one with Anzick-1 affinity making a minimal long-term contribution. While we cannot at present determine when the non-Anzick-1 associated lineages first arrived in South America, we can place an upper bound on the date of the spread to South America of all the lineages represented in our sampled ancient genomes as all are ANC-A and thus must have diversified after the ANC-A/ANC-B split estimated to have occurred ∼17,500–14,600 BP (Moreno-Mayar et al., 2018a).
Studies of the peopling of the Americas have focused on the timing and number of initial migrations. Less attention has been paid to the subsequent spread of people within the Americas. We sequenced 15 ancient human genomes spanning Alaska to Patagonia; six are ≥10,000 years old (up to ~18× coverage). All are most closely related to Native Americans, including an Ancient Beringian individual, and two morphologically distinct “Paleoamericans.” We find evidence of rapid dispersal and early diversification, including previously unknown groups, as people moved south. This resulted in multiple independent, geographically uneven migrations, including one that provides clues of a Late Pleistocene Australasian genetic signal, and a later Mesoamerican-related expansion. These led to complex and dynamic population histories from North to South America.
The Australasian signal is not present in USR1 or Spirit Cave, but only appears in Lagoa Santa. None of these individuals has UPopA/Mesoamerican-related admixture, which apparently dampened the Australasian signature in South American groups, such as the Karitiana. These findings suggest the Australasian signal, possibly present in a structured ancestral NA population, was absent in NA prior to the Spirit Cave/Lagoa Santa split. Groups carrying this signal were either already present in South America when the ancestors of Lagoa Santa reached the region, or Australasian-related groups arrived later but before 10.4 ka (the Lagoa Santa 14C age). That this signal has not been previously documented in North America implies that an earlier group possessing it had disappeared, or a later-arriving group passed through North America without leaving any genetic trace. If such a signal is ultimately detected in North America it could help determine when groups bearing Australasian ancestry arrived, relative to the divergence of SNA groups.
Although we detect the Australasian signal in one of the Lagoa Santa individuals identified as a “Paleoamerican,” it is absent in other “Paleoamericans” (2, 10), including Spirit Cave with its strong genetic affinities to Lagoa Santa. This indicates the “Paleoamerican” cranial form is not associated with the Australasian genetic signal, as previously suggested (6), or any other specific NA clade (2). The cause of this cranial form, if it is representative of broader population patterns, evidently did not result from separate ancestry, but likely multiple factors, including isolation and drift and non-stochastic mechanisms.
The peopling of the Andean highlands above 2500 m in elevation was a complex process that included cultural, biological, and genetic adaptations. Here, we present a time series of ancient whole genomes from the Andes of Peru, dating back to 7000 calendar years before the present (BP), and compare them to 42 new genome-wide genetic variation datasets from both highland and lowland populations. We infer three significant features: a split between low- and high-elevation populations that occurred between 9200 and 8200 BP; a population collapse after European contact that is significantly more severe in South American lowlanders than in highland populations; and evidence for positive selection at genetic loci related to starch digestion and plausibly pathogen resistance after European contact. We do not find selective sweep signals related to known components of the human hypoxia response, which may suggest more complex modes of genetic adaptation to high altitude.
To understand the population history and context of dairy pastoralism in the eastern Eurasian steppe, we applied genomic and proteomic analyses to individuals buried in Late Bronze Age (LBA) burial mounds associated with the Deer Stone-Khirigsuur Complex (DSKC) in northern Mongolia. To date, DSKC sites contain the clearest and most direct evidence for animal pastoralism in the Eastern steppe before ca. 1200 BCE.
Most LBA Khövsgöls are projected on top of modern Tuvinians or Altaians, who reside in neighboring regions. In comparison with other ancient individuals, they are also close to but slightly displaced from temporally earlier Neolithic and Early Bronze Age (EBA) populations from the Shamanka II cemetery (Shamanka_EN and Shamanka_EBA, respectively) from the Lake Baikal region. However, when Native Americans are added to PC calculation, we observe that LBA Khövsgöls are displaced from modern neighbors toward Native Americans along PC2, occupying a space not overlapping with any contemporary population. Such an upward shift on PC2 is also observed in the ancient Baikal populations from the Neolithic to EBA and in the Bronze Age individuals from the Altai associated with Okunevo and Karasuk cultures.
(…) two individuals fall on the PC space markedly separated from the others: ARS017 is placed close to ancient and modern northeast Asians, such as early Neolithic individuals from the Devil’s Gate archaeological site (22) and present-day Nivhs from the Russian far east, while ARS026 falls midway between the main cluster and western Eurasians.
Upper Paleolithic Siberians from nearby Afontova Gora and Mal’ta archaeological sites (AG3 and MA-1, respectively) (25, 26) have the highest extra affinity with the main cluster compared with other groups, including the eastern outlier ARS017, the early Neolithic Shamanka_EN, and present-day Nganasans and Tuvinians (Z > 6.7 SE for AG3). Main cluster Khövsgöl individuals mostly belong to Siberian mitochondrial (A, B, C, D, and G) and Y (all Q1a but one N1c1a) haplogroups.
Previous studies show a close genetic relationship between WSH populations and ANE ancestry, as Yamnaya and Afanasievo are modeled as a roughly equal mixture of early Holocene Iranian/ Caucasus ancestry (IRC) and Mesolithic Eastern European hunter-gatherers, the latter of which derive a large fraction of their ancestry from ANE. It is therefore important to pinpoint the source of ANE-related ancestry in the Khövsgöl gene pool: that is, whether it derives from a pre-Bronze Age ANE population (such as the one represented by AG3) or from a Bronze Age WSH population that has both ANE and IRC ancestry.
The amount of WSH contribution remains small (e.g., 6.4 ± 1.0% from Sintashta). Assuming that the early Neolithic populations of the Khövsgöl region resembled those of the nearby Baikal region, we conclude that the Khövsgöl main cluster obtained ∼11% of their ancestry from an ANE source during the Neolithic period and a much smaller contribution of WSH ancestry (4–7%) beginning in the early Bronze Age.
Apparently, then, the first individual with substantial WSH ancestry in the Khövsgöl population (ARS026, of haplogroup R1a-Z2123), directly dated to 1130–900 BC, is consistent with the first appearance of admixed forest-steppe-related populations like Karasuk (ca. 1200-800 BC) in the Altai. Interestingly, haplogroup N1a1a-M178 pops up (with mtDNA U5a2d1) among the earlier Khövsgöl samples.
I will repeat what I wrote recently here: Samoyedic arrived in the Altai with Karasuk and hg R1a-Z645 + Steppe_MLBA-like ancestry, admixed with Altai populations, clustering thus within an Ancient Altai cline. Only later did N1a1a subclades infiltrate Samoyedic (and Ugric) populations, bringing them closer to their modern Palaeo-Siberian cline. The shared mtDNA may support an ancestral EHG-“Siberian” cline, or else a more recent Afanasievo-related origin.
Also interesting: Q1a2 subclades and ANE ancestry keep making an appearance everywhere among ancestral Eurasian peoples, as Chetan recently pointed out.
Let me begin this final post on the Corded Ware—Uralic connection with an assertion that should be obvious to everyone involved in ethnolinguistic identification of prehistoric populations but, for one reason or another, is usually forgotten. In the words of David Reich, in Who We Are and How We Got Here (2018):
Human history is full of dead ends, and we should not expect the people who lived in any one place in the past to be the direct ancestors of those who live there today.
Another recurrent argument – apart from “Siberian ancestry” – for the location of the Uralic homeland is “haplogroup N”. This is as serious as saying “haplogroup R1” to refer to Indo-European migrations, but let’s explore this possibility anyway:
We now have a better idea of what many ancient migrations (previously hypothesized to be associated with westward Uralic migrations) look like in genetic terms. From Damgaard et al. (Science 2018):
These serial changes in the Baikal populations are reflected in Y-chromosome lineages (Fig. 5A; figs. S24 to S27, and tables S13 and S14). MA1 carries the R haplogroup, whereas the majority of Baikal_EN males belong to N lineages, which were widely distributed across Northern Eurasia (29), and the Baikal_LNBA males all carry Q haplogroups, as do most of the Okunevo_EMBA as well as some present-day Central Asians and Siberians.
The only N1c1 sample comes from Ust’Ida Late Neolithic, 180km to the north of Lake Baikal, which – together with the Bronze Age sample from the Kola peninsula, and the medieval sample from Ust’Ida – gives a good idea of the overall expansion of N subclades and Siberian ancestry among the Circum-Arctic peoples of Eurasia, speakers of Palaeo-Siberian languages.
What we should expect from Uralic peoples expanding with haplogroup N – seeing how Yamna expands with R1b-L23, and Corded Ware expands with R1a-Z645 – is to find a common subclade spreading with Uralic populations. Let’s see if it works like that for any N-X subclade, in data from Ilumäe et al. (2016):
Within the Eurasian circum-Arctic spread zone, N3 and N2a reveal a well-structured spread pattern where individual sub-clades show very different distributions:
N1a1-M46 (or N-TAT), formed ca. 13900 BC, TMRCA 9800 BC
N1a1a2-B187, formed ca. 9800 BC, TMRCA 1050 AD:
The sub-clade N3b-B187 is specific to southern Siberia and Mongolia, whereas N3a-L708 is spread widely in other regions of northern Eurasia.
N1a1a1a-L708, formed ca. 6800 BC, TMRCA 5400 BC.
N1a1a1a2-B211/Y9022, formed ca. 5400 BC, TMRCA 1900 BC:
The deepest clade within N3a is N3a1-B211, mostly present in the Volga-Uralic region and western Siberian Khanty and Mansi populations.
N1a1a1a1a-L392/L1026, formed ca. 4400 BC, TMRCA 2800 BC:
The neighbor clade, N3a3’6-CTS6967, spreads from eastern Siberia to the eastern part of Fennoscandia and the Baltic States
N1a1a1a1a1a-CTS2929/VL29, formed ca. 2100 BC, TMRCA 1600 BC:
In Europe, the clade N3a3-VL29 encompasses over a third of the present-day male Estonians, Latvians, and Lithuanians but is also present among Saami, Karelians, and Finns (Table S2 and Figure 3). Among the Slavic-speaking Belarusians, Ukrainians, and Russians, about three-fourths of their hg N3 Y chromosomes belong to hg N3a3.
In the post on Finno-Permic expansions, I depicted what seems to me the most likely way of infiltration of N1c-L392 lineages with Akozino warrior-traders into the western Finno-Ugric populations, with an origin around the Barents Sea.
This includes the potential spread of (a minority of) N1c-B211 subclades due to contacts with Ananino on both sides of the Urals, through a northern route of forest and forest-steppe regions (equivalent to the distribution of Cherkaskul compared to Andronovo), given the spread of certain subclades in Ugric populations.
NOTE. An alternative possibility is the association of certain B211 subclades with a southern route of expansion with Pre-Scythian and Scythian populations, under whose influence the Ananino culture emerged – which would imply a very quick infiltration of certain groups of haplogroup N everywhere among Finno-Ugrics on both sides of the Urals – , and also the expansion of some subclades with Turkic-speaking peoples, who apparently expanded with alliances of different peoples. Both (Scythian and Turkic) populations expanded from East Asia, where haplogroup N (including N1c) was present since the Neolithic. I find this a worse model of expansion for upper clades, but – given the YFull estimates and the presence of this haplogroup among Turkic peoples – it is a possibility for many subclades.
N1a1a1a1a2-Z1936, formed ca. 2800 BC, TMRCA 2400 BC:
The only notable exception from the pattern are Russians from northern regions of European Russia, where, in turn, about two-thirds of the hg N3 Y chromosomes belong to the hg N3a4-Z1936, the second west Eurasian clade. Thus, according to the frequency distribution of this clade, these Northern Russians fit better among other non-Slavic populations from northeastern Europe. N3a4 tends to increase in frequency toward the northeastern European regions but is also somewhat unexpectedly a dominant hg N3 lineage among most Turkic-speaking Volga Tatars and South-Ural Bashkirs.
N1a1a1a1a4-M2019 (previously N3a2), formed ca. 4400 BC, TMRCA 1700 BC:
Sub-hg N3a2-M2118 is one of the two main bifurcating branches in the nested cladistic structure of N3a2’6-M2110. It is predominantly found in populations inhabiting present-day Yakutia (Republic of Sakha) in central Siberia and at lower frequencies in the Khanty and Mansi populations, which exhibit a distinct Y-STR pattern (Table S7) potentially intrinsic to an additional clade inside the sub-hg N3a2
The second widespread sub-clade of hg N is N2a. (…):
N1a2b-P43 (B523/FGC10846/Y3184), formed ca. 6800 BC, TMRCA ca. 2700 BC:
The absolute majority of N2a individuals belong to the second sub-clade, N2a1-B523, which diversified about 4.7 kya (95% CI = 4.0–5.5 kya). Its distribution covers the western and southern parts of Siberia, the Taimyr Peninsula, and the Volga-Uralic region with frequencies ranging from 10% to 30% and does not extend to eastern Siberia (…)
The “European” branch suggested earlier from Y-STR patterns turned out to consist of two clades
N1a2b2a-Y3185/FGC10847, formed ca. 2200 BC, TMRCA 800 BC:
N2a1-L1419, spread mainly in the northern part of that region.
N1a2b2b1-B528/Y24382, formed ca. 900 BC, TMRCA ca. 900 BC:
N2a1-B528, spread in the southern Volga-Uralic region.
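The formed/TMRCA pairs listed above can be compared directly: a long gap between the two dates suggests a lineage that survived for a long time before its surviving branches began to diversify. The following is only a minimal sketch in Python using the approximate dates quoted above. The names `SUBCLADES` and `lag_years`, the shortened labels, and the signed-year convention (negative = BC, ignoring the missing year zero) are assumptions made for illustration, not part of any of the cited studies.

```python
# Approximate (formed, TMRCA) dates as quoted in the list above.
# Assumed convention: negative = BC, positive = AD; the missing year
# zero is ignored, which is negligible at this resolution.
SUBCLADES = {
    "N1a1-M46": (-13900, -9800),
    "N1a1a2-B187": (-9800, 1050),
    "N1a1a1a-L708": (-6800, -5400),
    "N1a1a1a2-B211": (-5400, -1900),
    "N1a1a1a1a-L392": (-4400, -2800),
    "N1a1a1a1a1a-VL29": (-2100, -1600),
    "N1a1a1a1a2-Z1936": (-2800, -2400),
    "N1a1a1a1a4-M2019": (-4400, -1700),
    "N1a2b-P43": (-6800, -2700),
    "N1a2b2a-Y3185": (-2200, -800),
    "N1a2b2b1-B528": (-900, -900),
}


def lag_years(formed, tmrca):
    """Years elapsed between formation of a clade and its TMRCA."""
    return tmrca - formed


lags = {name: lag_years(*dates) for name, dates in SUBCLADES.items()}

# Print the subclades from the longest to the shortest gap.
for name, lag in sorted(lags.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} {lag:6d} years")
```

With these figures, B187 shows by far the longest gap and B528 none at all, but the estimates themselves are rough, so the output is only as good as the dates fed into it.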
We also have a good idea of the distribution of haplogroup R1a-Z645 in ancient samples. Its subclades were associated with the Corded Ware expansion, and some of them fit quite well the early expansion of Finno-Permic, Ugric, and Samoyedic peoples to the east.
This is what the modern distribution of R1a among Uralians looks like, from the latest report in Tambets et al. (2018):
Among Fennic populations, Estonians and Karelians (ca. 1.1 million) have not suffered the greatest bottleneck of Finns (ca. 6-7 million), and show thus a greater proportion of R1a-Z280 than N1c subclades, which points to the original situation of Fennic peoples before their expansion. To trust Finnish Y-DNA to derive conclusions about the Uralic populations is as useful as relying on the Basque Y-DNA for the language spread by R1b-P312…
Among Volga-Finnic populations, Mordovians (the closest to the original Uralic cluster, see above) show a plurality of R1a lineages (27%).
Hungarians (ca. 13-15 million) represent the majority of Ugric (and Finno-Ugric) peoples. They are mainly R1a-Z280, also R1a-Z2123, have little N1c, and lack Siberian ancestry, and represent thus the most likely original situation of Ugric peoples in the 4th century AD (read more on Avars and Hungarians).
Among Samoyedic peoples, the Selkup, the southernmost ones and latest to expand – that is, those not heavily admixed with Siberian populations – , also have a majority of R1a-Z2123 lineages (see also here for the original Samoyedic haplogroups to the south).
To understand the relevance of Hungarians for Ugric peoples, as well as Estonians, Karelians, and Mordovians (and northern Russians, Finno-Ugric peoples recently Russified) for Finno-Permic peoples, as opposed to the Circum-Arctic and East Siberian populations, one has to put demographics in perspective. Even a modern map can show the relevance of certain territories in the past:
Summary of ancestry + haplogroups
Fennic and Samic populations seem to be clearly influenced by Palaeo-Laplandic peoples, whereas Volga-Finnic and especially Permic populations may have received gene flow from both, but essentially Palaeo-Siberian influence from the north and east.
The fact that modern Mansis and Khantys offer the highest variation in N1a subclades, and some of the highest “Siberian ancestry” among non-Nganasans, should have raised a red flag long ago. The fact that Hungarians – supposedly stemming from a source population similar to Mansis – do not offer the same amount of N subclades or Siberian ancestry (not even close), and offer instead more R1a, in common with Estonians (among Finno-Samic peoples) and Mordvins (among Volga-Finnic peoples) should have raised a still bigger red flag. The fact that Nganasans – the model for Siberian ancestry – show completely different N1a2b-P43 lineages should have been a huge genetic red line (on top of the anthropological one) to regard them as the Uralian-type population.
It is not hard to model the stepped arrival, infiltration, and/or resurgence of N subclades and “Siberian ancestries”, as well as their gradual expansion in certain regions, associated with certain migrations first – such as the expansions to the Circum-Arctic region, and later the Scythian- and Turkic-related movements – , as well as limited regional developments, like the known bottleneck in Finns, or the clear late expansion of Ugric and Samoyedic languages to the north among nomadic Palaeo-Siberians due to traditions of exogamy and multilingualism. This fits quite well with the different arrival of N (N1c and xN1c) lineages to the different Uralic-speaking groups, and to the stepped appearance of “Siberian ancestry” in the different regions.
It is evident that a lot of people were too attached to the idea of Palaeolithic R1b lineages ‘native’ to western Europe speaking Basque languages; of R1a lineages speaking Indo-European and spreading with Yamna; and N lineages ‘native’ to north-eastern Europe and speaking Uralic, and this is causing widespread weeping and gnashing of teeth (instead of the joy of discovering where one’s true patrilineal ancestors come from, and what language they spoke in each given period, which is the supposed objective of genetic genealogy…)
As far as I know – and there might be many other similar pet theories out there – there have been proposals of “modern Balto-Slavic-like” populations (in an obvious circular reasoning based on modern populations) in some Scythian clusters of the Iron Age.
NOTE. I will not enter into “Balto-Slavic-like R1a” of the Late Bronze Age or earlier because no one can seriously believe at this point of development of Population Genetics that autosomal similarity predating the appearance of Slavs by 1,500+ years equates to their (ethnolinguistic) ancestral population, without a clear intermediate cultural and genetic trail – something we lack today in the Slavic case even for the late Roman period…
We also know of R1a-Z280 lineages in Srubna, probably expanding to the west. With that in mind, and knowing that Palaeo-Germanic was in close contact with Finno-Samic while both were already separated but still in contact, and that Palaeo-Germanic was also in contact and closely related to a ‘Temematic’ distinct from Balto-Slavic (and also that early Proto-Baltic and Proto-Slavic from the Roman Iron Age and later were in contact with western Uralic) this will be the linguistic map of the Iron Age if R1a is considered to expand Indo-European from some kind of “patron-client” relationship with west Yamna:
My problem with this proposal is that it is obviously beholden to the notion of the uninterrupted cultural, historic and ethnic continuity in certain territories. This bias is common in historiography (von Falkenhausen 1993), but it extends even more easily into the lesser known prehistory of any territory, and now more than ever some people feel the need to corrupt (pre)history based on their own haplogroups (or the majority haplogroups of their modern countries). However, more than on philosophical grounds, my rejection is based on facts: this picture is not what the combination of linguistic, archaeological, and genetic data shows. Period.
Nevertheless, if Yamna + Corded Ware represented the “big and early expansion” of Germanic and Italo-Celtic peoples worthy of the Nazi Lebensraum and Fascist spazio vitale dreams; Uralians were Siberian hunter-gatherers that controlled the whole eastern and northern Russia, and miraculously managed to push (ethnolinguistically) Neolithic agropastoralists to the west during and after the Iron Age, with gradual (and often minimal) genetic impact; and Balto-Slavic peoples were represented by horse riders from Pokrovka/Srubna, hiding then somewhere around the forest-steppe until after the Scythian expansion, and then spreading their language (without much genetic impact) during the early Middle Ages…so be it.
Even though proposals of an Eastern Uralic (or Ugro-Samoyedic) group are in the minority – and those who support it tend to search for an origin of Uralic in Central Asia – , there is nothing wrong in supporting this from the point of view of a western homeland, because the eastward migration of both Proto-Ugric and Pre-Samoyedic peoples may have been coupled with each other at an early stage. It’s like Indo-Slavonic: it just doesn’t fit the linguistic data as well as the alternative, i.e. the expansion of Samoyedic first, different from a Finno-Ugric trunk. But, in case you are wondering about this possibility, here is Häkkinen’s (2012) phonological argument:
The case of Samoyedic is quite similar to that of Hungarian, although the earliest Palaeo-Siberian contact languages have been lost. There were contacts at least with Tocharian (Kallio 2004), Yukaghir (Rédei 1999) and Turkic (Janhunen 1998). Samoyedic also:
a) has moved far from the related languages and has been exposed to strong foreign influence
b) shares a small number of common words with other branches (from Sammallahti 1988: only 123 ‘Uralic’ words, versus 390 ‘Uralic’ + ‘Finno-Ugric’ words found in other branches than Samoyedic = 31.5%)
c) derives phonologically from the East Uralic dialect.
The phonological level is taxonomically more reliable, since it lacks the distortion caused by invisible convergence and false divergence at the lexical level. Thus we can conclude that the traditional taxonomic model, according to which Samoyedic was the first branch to split off from the Proto-Uralic unity, is just as incorrect as the view that Hungarian was the first branch to split off.
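The lexical proportion quoted in point b) is easy to verify. A minimal sketch in Python, where the variable names are hypothetical and the two counts are simply the ones attributed above to Sammallahti (1988):

```python
# Word counts as quoted above from Sammallahti (1988).
uralic_words = 123        # 'Uralic' words, i.e. those shared with Samoyedic
other_branch_words = 390  # 'Uralic' + 'Finno-Ugric' words in the other branches

# Share of the reconstructed lexicon that Samoyedic has in common.
share = uralic_words / other_branch_words
print(f"{share:.1%}")  # prints 31.5%
```

The point of the quoted argument is that such a low lexical share says little about taxonomy on its own, which is why it gives more weight to the phonological level.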
Late Uralic can be traced back to metallurgical cultures thanks to terms like PU *wäśka ‘copper/bronze’ (borrowed from Proto-Samoyedic *wesä into Tocharian); PU *äsa and *olna/*olni, ‘lead’ or ‘tin’, found in *äsa-wäśka ‘tin-bronze’; and e.g. *weŋći ‘knife’, borrowed into Indo-Iranian (through the stage of vocalization of nasals), appearing later as Proto-Indo-Aryan *wāćī ‘knife, awl, axe’.
It is known that the southern regions of the Abashevo culture developed Proto-Indo-Iranian-speaking Sintashta-Petrovka and Pokrovka (Early Srubna). To the north, however, Abashevo kept its Uralic nature, with continuous contacts allowing for the spread of lexicon – mainly into Finno-Ugric – , and phonetic influence – mainly Uralisms into Proto-Indo-Iranian phonology (read more here).
The northern part of Abashevo (just like the south) was mainly a metallurgical society, with Abashevo metal prospectors found also side by side with Sintashta pioneers in the Zeravshan Valley, near BMAC, in search of metal ores. About the Seima-Turbino phenomenon, from Parpola (2013):
From the Urals to the east, the chain of cultures associated with this network consisted principally of the following: the Abashevo culture (extending from the Upper Don to the Mid- and South Trans-Urals, including the important cemeteries of Sejma and Turbino), the Sintashta culture (in the southeast Urals), the Petrovka culture (in the Tobol-Ishim steppe), the Taskovo-Loginovo cultures (on the Mid- and Lower Tobol and the Mid-Irtysh), the Samus’ culture (on the Upper Ob, with the important cemetery of Rostovka), the Krotovo culture (from the forest steppe of the Mid-Irtysh to the Baraba steppe on the Upper Ob, with the important cemetery of Sopka 2), the Elunino culture (on the Upper Ob just west of the Altai mountains) and the Okunevo culture (on the Mid-Yenissei, in the Minusinsk plain, Khakassia and northern Tuva). The Okunevo culture belongs wholly to the Early Bronze Age (c. 2250–1900 BCE), but most of the other cultures apparently to its latter part, being currently dated to the pre-Andronovo horizon of c. 2100–1800 BCE (cf. Parzinger 2006: 244–312 and 336; Koryakova & Epimakhov 2007: 104–105).
The majority of the Sejma-Turbino objects are of the better quality tin-bronze, and while tin is absent in the Urals, the Altai and Sayan mountains are an important source of both copper and tin. Tin is also available in southern Central Asia. Chernykh & Kuz’minykh have accordingly suggested an eastern origin for the Sejma-Turbino network, backing this hypothesis also by the depiction on the Sejma-Turbino knives of mountain sheep and horses characteristic of that area. However, Christian Carpelan has emphasized that the local Afanas’evo and Okunevo metallurgy of the Sayan-Altai area was initially rather primitive, and could not possibly have achieved the advanced and difficult technology of casting socketed spearheads as one piece around a blank. Carpelan points out that the first spearheads of this type appear in the Middle Bronze Age Caucasia c. 2000 BCE, diffusing early on to the Mid-Volga-Kama-southern Urals area, where “it was the experienced Abashevo craftsmen who were able to take up the new techniques and develop and distribute new types of spearheads” (Carpelan & Parpola 2001: 106, cf. 99–106, 110). The animal argument is countered by reference to a dagger from Sejma on the Oka river depicting an elk’s head, with earlier north European prototypes (Carpelan & Parpola 2001: 106–109). Also the metal analysis speaks for the Abashevo origin of the Sejma-Turbino network. Out of 353 artefacts analyzed, 47% were of tin-bronze, 36% of arsenical bronze, and 8.5% of pure copper. Both the arsenical bronze and pure copper are very clearly associated with the Abashevo metallurgy.
The Abashevo metal production was based on the Volga-Kama-Belaya area sandstone ores of pure copper and on the more easterly Urals deposits of arsenical copper (Figure 9). The Abashevo people, expanding from the Don and Mid-Volga to the Urals, first reached the westerly sandstone deposits of pure copper in the Volga and Kama basins, and started developing their metallurgy in this area, before moving on to the eastern side of the Urals to produce harder weapons and tools of arsenical copper. Eventually they moved even further south, to the area richest in copper in the whole Urals region, founding there the very strong and innovative Sintashta culture.
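The metal-analysis figures quoted above (353 artefacts; 47% tin-bronze, 36% arsenical bronze, 8.5% pure copper) can be put side by side with a few lines of arithmetic. This is only an illustrative sketch: the variable names are hypothetical, the counts are back-calculated from rounded percentages and are therefore approximate, and the quoted shares do not add up to 100%, so the remainder is left as unspecified.

```python
# Figures as quoted above from the Sejma-Turbino metal analysis.
TOTAL_ARTEFACTS = 353
SHARES = {  # percentages of analyzed artefacts
    "tin-bronze": 47.0,
    "arsenical bronze": 36.0,
    "pure copper": 8.5,
}

# Metals that the quoted text associates with Abashevo metallurgy.
abashevo_share = SHARES["arsenical bronze"] + SHARES["pure copper"]

# Share not accounted for by the three categories named in the text.
unspecified_share = 100.0 - sum(SHARES.values())

# Approximate artefact counts (the source percentages are rounded).
approx_counts = {
    metal: round(TOTAL_ARTEFACTS * pct / 100) for metal, pct in SHARES.items()
}

print(f"Abashevo-associated metals: {abashevo_share}%")
print(f"Unspecified remainder: {unspecified_share}%")
print(approx_counts)
```

On these numbers, the two metals linked to Abashevo account for almost as many artefacts as tin-bronze does.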
Regarding the most likely expansion of Eastern Uralic peoples:
Nataliya L’vovna Chlenova (1929–2009; cf. Korenyako & Kuz’minykh 2011) published in 1981 a detailed study of the Cherkaskul’ pottery. In her carefully prepared maps of 1981 and 1984 (Figure 10), she plotted Cherkaskul’ monuments not only in Bashkiria and the Trans-Urals, but also in thick concentrations on the Upper Irtysh, Upper Ob and Upper Yenissei, close to the Altai and Sayan mountains, precisely where the best experts suppose the homeland of Proto-Samoyed to be.
The Cherkaskul’ culture was transformed into the genetically related Mezhovka culture (c. 1500–1000 BCE), which occupied approximately the same area from the Mid-Kama and Belaya rivers to the Tobol river in western Siberia (cf. Parzinger 2006: 444–448; Koryakova & Epimakhov 2007: 170–175). The Mezhovka culture was in close contact with the neighbouring and probably Proto-Iranian speaking Alekseevka alias Sargary culture (c. 1500–900 BCE) of northern Kazakhstan (Figure 4 no. 8) that had a Fëdorovo and Cherkaskul’ substratum and a roller pottery superstratum (cf. Parzinger 2006: 443–448; Koryakova & Epimakhov 2007: 161–170). Both the Cherkaskul’ and the Mezhovka cultures are thought to have been Proto-Ugric linguistically, on the basis of the agreement of their area with that of Mansi and Khanty speakers, who moreover in their Fëdorovo-like ornamentation have preserved evidence of continuity in material culture (cf. Chlenova 1984; Koryakova & Epimakhov 2007: 159, 175).
The Mezhovka culture was succeeded by the genetically related Gamayun culture (c. 1000–700 BCE) (cf. Parzinger 2006: 446; 542–545).
From the Gamayun culture descend Trans-Urals cultures in close contact with Finno-Permic populations of the Cis-Ural region:
[Proto-Mansi] Itkul’ culture (c. 700–200 BCE) distributed along the eastern slope of the Ural Mountains (cf. Parzinger 2006: 552–556). Known from its walled forts, it constituted the principal Trans-Uralian centre of metallurgy in the Iron Age, and was in contact with both the Anan’ino and Akhmylovo cultures (the metallurgical centres of the Mid-Volga and Kama-Belaya region) and the neighbouring Gorokhovo culture.
[Proto-Hungarian] via the Vorob’evo Group (c. 700–550 BCE) (cf. Parzinger 2006: 546–549), to the Gorokhovo culture (c. 550–400 BCE) of the Trans-Uralian forest steppe (cf. Parzinger 2006: 549–552). For various reasons the local Gorokhovo people started mobile pastoral herding and became part of the multicomponent pastoralist Sargat culture (c. 500 BCE to 300 CE), which in a broader sense comprised all cultural groups between the Tobol and Irtysh rivers, succeeding here the Sargary culture. The Sargat intercommunity was dominated by steppe nomads belonging to the Iranian-speaking Saka confederation, who in the summer migrated northwards to the forest steppe
[Proto-Khanty] Late Bronze Age and Early Iron Age cultures related to the Gamayunskoe and Itkul’ cultures that extended up to the Ob: the Nosilovo, Baitovo, Late Irmen’, and Krasnoozero cultures (c. 900–500 BCE). Some were in contact with the Akhmylovo on the Mid-Volga.
Parpola (2012) connects the expansion of Samoyedic with the Cherkaskul variant of Andronovo. As we know, Andronovo was genetically diverse, which speaks in favour of different groups developing similar material cultures in Central Asia.
Juha Janhunen, author of the etymological dictionary of the Samoyed languages (1977), places the homeland of Proto-Samoyedic in the Minusinsk basin on the Upper Yenissei (cf. Janhunen 2009: 72). Mainly on the basis of Bulghar Turkic loanwords, Janhunen (2007: 224; 2009: 63) dates Proto-Samoyedic to the last centuries BCE. Janhunen thinks that the language of the Tagar culture (c. 800–100 BCE) ought to have been Proto-Samoyedic (cf. Janhunen 1983: 117– 118; 2009: 72; Parzinger 2001: 80 and 2006: 619–631 dates the Tagar culture c. 1000–200 BCE; Svyatko et al. 2009: 256, based on human bone samples, c. 900 BCE to 50 CE). The Tagar culture largely continues the traditions of the Karasuk culture (c. 1400–900 BCE), (…)
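The three date ranges given above for the Tagar culture do not coincide, so it may help to see which span they all agree on. A minimal sketch in Python, assuming a signed-year convention (negative = BCE, positive = CE); the dictionary labels and the helper names `common_span` and `outer_span` are hypothetical and used only for illustration:

```python
# Date ranges for the Tagar culture as quoted above.
# Assumed convention: negative = BCE, positive = CE.
TAGAR_ESTIMATES = {
    "main text": (-800, -100),
    "Parzinger 2001/2006": (-1000, -200),
    "Svyatko et al. 2009": (-900, 50),
}


def common_span(ranges):
    """Return the interval shared by all ranges, or None if there is none."""
    ranges = list(ranges)
    start = max(s for s, _ in ranges)
    end = min(e for _, e in ranges)
    return (start, end) if start <= end else None


def outer_span(ranges):
    """Return the widest interval covered by any of the ranges."""
    ranges = list(ranges)
    return (min(s for s, _ in ranges), max(e for _, e in ranges))


core = common_span(TAGAR_ESTIMATES.values())
envelope = outer_span(TAGAR_ESTIMATES.values())
print("agreed by all estimates:", core)
print("covered by at least one:", envelope)
```

All three estimates overlap from about 800 to 200 BCE, which is consistent with a Proto-Samoyedic dating in the last centuries BCE falling at or after the end of that core span.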
The use of a map of “Siberian ancestry” peaking in the arctic to show a supposedly late Uralic population movement (starting in the Iron Age!) seems to be the latest trend in population genomics:
I guess that would make this map of Neolithic farmer ancestry represent an expansion of Indo-European from the south, because Anatolia, Greece, Italy, southern France, and Iberia – where this ancestry peaks in modern populations – are among the oldest territories where Indo-European languages were recorded:
Probably not the right interpretation of this kind of simplistic data about modern populations, though…
Overall, and specifically at lower values of K, the genetic makeup of Uralic speakers resembles that of their geographic neighbours. The Saami and (a subset of) the Mansi serve as exceptions to that pattern being more similar to geographically more distant populations (Fig. 3a, Additional file 3: S3). However, starting from K = 9, ADMIXTURE identifies a genetic component (k9, magenta in Fig. 3a, Additional file 3: S3), which is predominantly, although not exclusively, found in Uralic speakers. This component is also well visible on K = 10, which has the best cross-validation index among all tests (Additional file 3: S3B). The spatial distribution of this component (Fig. 3b) shows a frequency peak among Ob-Ugric and Samoyed speakers as well as among neighbouring Kets (Fig. 3a). The proportion of k9 decreases rapidly from West Siberia towards east, south and west, constituting on average 40% of the genetic ancestry of FU speakers in Volga-Ural region (VUR) and 20% in their Turkic-speaking neighbours (Bashkirs, Tatars, Chuvashes; Fig. 3a).
However, this ‘something’ that some people occasionally find in some Uralic populations is also common to other modern and ancient groups, and not so common in some other Uralic peoples. Simply put:
I already said this in the recent publication of Siberian samples, where a renamed and radiocarbon dated Finnish_IA clearly shows that Late Iron Age Saami (ca. 400 AD) had little “Siberian ancestry”, if any at all, representing the most likely Fennic (and Samic) ancestral components before their expansion into central and northern Finland, where they admixed with circum-polar peoples of asbestos ware cultures.
I will say that again and again, any time they report the so-called “Siberian ancestry” in Uralic samples, no matter how it is defined each time: it does not seem to be that special something people are looking for, but rather (at least in a great part) a quite old ancestral component forming an evident cline with EHG, whose best proximate source are Baikal_EN (and/or Devil’s Gate) at this moment, and thus also East European hunter-gatherers for Western Uralic peoples:
So either Samara_HG, Karelia_HG, and many other groups from eastern Europe all spoke Uralic according to this ADMIXTURE graphic (and the formation of steppe ancestry in the Volga-Ural region brought the Proto-Indo-European language to the steppes through the CHG/ANE expansion), or a great part of this “Siberian ancestry” found in modern Uralic-speaking populations is not what some people would like to think it is…
PCA clines can be looked for to represent expansions of ancient populations. Most recently, Flegontov et al. (2018) are attempting to do this with Asian populations:
For some Turkic groups in the Urals and the Altai regions and in the Volga basin, a different admixture model fits the data: the same West Eurasian source + Uralic- or Yeniseian-speaking Siberians. Thus, we have revealed an admixture cline between Scythians and the Iranian farmer genetic cluster, and two further clines connecting the former cline to distinct ancestry sources in Siberia. Interestingly, few Wusun-period individuals harbor substantial Uralic/Yeniseian-related Siberian ancestry, in contrast to preceding Scythians and later Turkic groups characterized by the Tungusic/Mongolic-related ancestry. It remains to be elucidated whether this genetic influx reflects contacts with the Xiongnu confederacy. We are currently assembling a collection of samples across the Eurasian steppe for a detailed genetic investigation of the Hunnic confederacies.
There are potential errors with this approach:
The main one is practical – does a modern cline represent an ancestral language? The answer is: sometimes. It depends on the anthropological context that we have, and especially on the precision of the PCA:
The ‘Europe’, ‘Middle East’, etc. clines of the above PCA do not represent one language, but many. For starters, the PCA includes too many (and modern) populations, so its precision is useless for ethnolinguistic groups. Which is the right level? Again, it depends.
The other error is one of detail of the clines drawn (which, in turn, depends on the precision of the PCA). For example, we can draw two parallel lines (or even one line, as in Flegontov et al. above) in one PCA graphic, but we still don’t have the direction of expansion. How do we know if this supposed “Uralic-speaking cline” goes from one region to the other? For that level of detail, we should examine closely modern Uralic-speaking peoples and Circum-Arctic populations:
The real ancient Uralic cluster (drawn above in blue) thus probably extends from a North-East European source (probably formed by Battle Axe / Fatyanovo-Balanovo / Abashevo) to the east into Siberian populations, and to the north into Laplandic populations (see below also on Mezhovska ancestry for the drawn ‘European cline’, which some may a priori wrongly assume to be quite late).
The fact that the three formed clines point to an admixture of CWC-related populations from North-Eastern Europe, and that variation is greater at the Palaeo-Laplandic and Palaeo-Siberian extremities compared to the CWC-related one, also supports this as the correct interpretation.
However, judging by the two main clines formed, one could be alternatively inclined to interpret that Palaeo-Laplandic and Palaeo-Siberian populations formed a huge ancestral “Uralic” ghost cluster in Siberia (spanning from the Palaeo-Laplandic to the Palaeo-Siberian one), and from there expanded Finno-Samic on one hand, and “Volga-Ugro-Samoyed” on the other. That poses different problems: an obvious linguistic and archaeological one – which I assume a lot of people do not really care about – , and a not-so-obvious genetic one (see below for ancient samples and for the expansion of haplogroup N).
Unlike this PCA with ancient samples, where Bell Beaker clines could be a rough approximation to the real sources for each population, and where a cluster spanning all three depicted Early Bronze Age clusters could give a rough proximate source of European Bell Beakers in Hungary (and where one can even distinguish the Y-DNA bottlenecks in the L23 trunk created by each cline) the PCA of modern Uralic populations is probably not suitable for a good estimate of the ancient situation, which may be found shifted up or down of the drawn “Uralic” cluster along East European groups.
After all, we already know that the Siberian cline shows probably as much an ancient admixture event – from the original Uralic expansion to the east with Corded Ware ancestry – as another more recent one – a westward migration of Siberian ancestry (or even more than one). While we know with more or less exactitude what happened with the Palaeo-Laplandic admixture by expanding Proto-Finno-Samic populations (see here), the Proto-Ugric and Pre-Samoyedic populations formed probably more than one cline during the different ancient migrations through central Asia.
Apparently, the Corded Ware expansion to the east was not marked by a huge change in ancestry. While the final version of Narasimhan et al. (2018) may show a little more detail about other forest-steppe Seima-Turbino/Andronovo-related migrations (and thus also Eastern Uralic peoples), we have already had enough information for quite some time to get a good idea.
Mezhovska’s position is similar to the later Pre-Scythian and Scythian populations. There are some interesting details: apart from haplogroup R1a-Z280 (CTS1211+), there is one R1b-M269 (PF6494+), probably Z2103, and an outlier (out of three) in a similar position to the recently described central/southern Scythian clusters.
NOTE. The finding of R1b-M269 in the forest-steppe is probably either 1) from an Afanasevo-Okunevo origin, or 2) from an admixture with neighbouring Andronovo-related populations, such as Sargary. A third, maybe less likely option is that this haplogroup admixed with Abashevo directly (as it happened in Sintashta, Potapovka, or Pokrovka) and formed part of early Uralic migrations. In any case, since Mezhovska is a Bronze Age society from the Urals region, its association with R1b-Z2103 – like the association of R1b-Z2103 in Scythian clusters – cannot be attributed to “Thracian peoples”, a link which is (as I already said) too simplistic.
The drawn “European cline” of Hungarians (see above), leading from ‘west-like’ Mansi to Hungarian populations – and hosting also Finnic and Estonian samples – , cannot therefore be attributed simply to late “Slavic/Balkan-like” admixture.
Karasuk – located further to the east – is basically also Corded Ware peoples showing clearly a recent admixture with local ANE / Baikal_EN-like populations. In terms of haplogroups it shows haplogroup Q, R1a-Z2124, and R1a-Z2123, later found among early Hungarians, and present also in ancient Samoyedic populations now acculturated.
The most interesting aspect of both Mezhovska and Karasuk is that they seem to diverge from a point close to Ukraine_Eneolithic, which is the supposed ancestral source of Corded Ware peoples (read more about the formation of “steppe ancestry”). This means that Eastern Uralians derive from a source closer to Middle Dnieper/Abashevo populations, rather than Battle Axe (shifted to Latvian Neolithic), which is more likely the source prevalent in Finno-Permic peoples.
Their initial admixture with (Palaeo-)Siberian populations is thus seen already starting by this time in Mezhovska and especially in Karasuk, but this process (compared to modern populations) is incomplete:
We know now that Samic peoples expanded during the Late Iron Age into Palaeo-Laplandic populations, admixing with them and creating this modern cline. Finns expanded later to the north (in one of their known genetic bottlenecks), admixing with (and displacing) the Saami in Finland, especially replacing their male lines.
So how did Ugric and Samoyedic peoples admix with Palaeo-Siberian populations further, to obtain their modern cline? The answer is, logically, with East Asian migrations related to forest-steppe populations of Central Asia after the Mezhovska and Karasuk periods, i.e. during the Iron Age and later. Other groups from the forest-steppe in Central Asia show similar East Asian (“Siberian”) admixture. We know this from Narasimhan et al. (2018):
(…) we observe samples from multiple sites dated to 1700-1500 BCE (Maitan, Kairan, Oy_Dzhaylau and Zevakinskiy) that derive up to ~25% of their ancestry from a source related to present-day East Asians and the remainder from Steppe_MLBA. A similar ancestry profile became widespread in the region by the Late Bronze Age, as documented by our time transect from Zevakinskiy and samples from many sites dating to 1500-1000 BCE, and was ubiquitous by the Scytho-Sarmatian period in the Iron Age.
Flegontov: Present day Turkic speakers fall into two clusters of admixture patterns (Uralic/Yeniseian and Tungusic/Mongolic) based on genomic data with ancient Turks belonging almost exclusively to the first cluster. #ISBA8
The Ugric-speaking Sargat culture in Western Siberia (ca. 500 BC – 500 AD) shows the expected mixture of haplogroups, with 5 samples of hg N and 2 of hg R1a1 in Pilipenko et al. (2017). Although radiocarbon dates and subclades are lacking, N lineages probably spread late, given the late and gradual admixture of Siberian cultures into the Sargat melting pot.
The observed reduction in the genetic distance between the Middle Tagar population and other Scythian-like populations of Southern Siberia (Fig 5; S4 Table), in our opinion, is primarily associated with an increase in the role of East Eurasian mtDNA lineages in the gene pool (up to nearly half of the gene pool) and a substantial increase in the joint frequency of haplogroups C and D (from 8.7% in the Early Tagar series to 37.5% in the Middle Tagar series). These features are characteristic of many ancient and modern populations of Southern Siberia and adjacent regions of Central Asia, including the Pazyryk population of the Altai Mountains.
Before the Iron Age, the Karasuk and Mezhovska populations were probably already somewhat ‘to the north’ within the ancient Steppe-Altai cline (see image below) created by expanding Seima-Turbino- and Andronovo-related populations. During the Iron Age, further Siberian contributions accompanying Iranian expansions must have placed Uralians of the Central Asian forest-steppe areas much closer to today’s Palaeo-Siberian cline.
However, the modern genetic picture was probably fully developed only in historic times, when Samoyedic and Ugric languages expanded to the north, only in part admixing further with Palaeo-Siberian-speaking nomads from the Circum-Arctic region (see here for a recent history of Samoyedic Enets), which explains their more recent radical ‘northern shift’.
This late acquisition of the language by Palaeo-Siberian nomads (without much population replacement) also explains the wide PCA clusters of very small Siberian populations. See, for example, the PCA from Tambets et al. (2018):
For their relationship with modern Mansi, we have information on Hungarian conqueror populations from Neparáczki et al. (2018):
Moreover, Y, B and N1a1a1a1a Hg-s have not been detected in Finno-Ugric populations [80–84], implying that the east Eurasian component of the Conquerors and Finno-Ugric people are probably not directly related. The same inference can be drawn from phylogenetic data, as only two Mansi samples appeared in our phylogenetic trees on the side branches (S1 Fig, Networks; 1, 4) suggesting that ancestors of the Mansis separated from Asian ancestors of the Conquerors a long time ago. This inference is also supported by genomic Admixture analysis of Siberian and Northeastern European populations, which revealed that Mansis received their eastern Siberian genetic component approximately 5–7 thousand years ago from ancestors of modern Even and Evenki people. Most likely the same explanation applies to the Y-chromosome N-Tat marker which originated from China [86,87] and its subclades are now widespread between various language groups of North Asia and Eastern Europe.
The genetic picture of Hungarians (the cline they form with Mansi, and their haplogroups) may be quite useful for inferring the true admixture originally found in Mansi peoples at the beginning of the Iron Age. By now it is clear even from modern populations that Steppe_MLBA ancestry accompanied the Uralic expansion to the east (roughly approximated in the graphic with Afanasievo_EBA + Bichon_LP EasternHG_M):