There is a good reason for hope, for those who look for a happy ending to the revolution of population genomics that is quickly turning into an involution led by beliefs and personal interests. This blog is apparently one of the the most read sites on Indo-European peoples, if not the most read one, and now on Uralic peoples, too.
I’ve been checking the analytics of our sites, and judging by the numbers of the English blog, Indo-European.eu (without the other languages) is quickly turning into the most visited one from Academia Prisca‘s sites on Indo-European languages, beyond Indo-European.info (and its parent sites in other languages), which host many popular files for download.
If we take into account file downloads (like images or PDFs), and not only what Google Analytics can record, Indo-European.eu has not more users than all other websites of Academia Prisca, but at this pace it will soon reach half the total visits, possibly before the end of 2019.
Overall, we have evolved from some 10,000 users/year in 2006 to ~300,000 active users/year and >1,000,000 page+file views/year in 2018 (impossible to say exactly without spending too much time on this task). Nothing out of the ordinary, I guess, and obviously numbers are not a quality index, but rather a hint at increasing popularity of the subject and of our work.
NOTE. The mean reading time is ~2:40 m, which I guess fits the length of most posts, and most visitors read a mean of ~2+ pages before leaving, with increasing reader fidelity over time.
The most read posts of 2018, now that we can compare those from the last quarter, are as follows:
– The series on the Corded Ware-Uralic theory, with a marked increase in readers, especially with the last three posts:
The most likely reason for the radical increase in this blog’s readership is very simple, then: people want to know what is really happening with the research on ancestral Indo-Europeans and Uralians, and other blogs and forums are not keeping up with that demand, being content with repeating the same ideas again and again (R1a-CWC-IE, R1b-BBC-Vasconic, and N-Comb Ware-Uralic), despite the growing contradictions. As you can imagine, once you have seen the Yamna -> Bell Beaker migration model of North-West Indo-European, with Corded Ware obviously representing Uralic, you can’t unsee it.
The online bullying, personal attacks, and similar childish attempts to silence those who want to talk about this theory elsewhere (while fringe theories like R1a/CHG-OIT, R1b-Vasconic, or the Anatolian/Armenian-CHG hypotheses, to name just a few, are openly discussed) has had, as could be expected, the opposite effect to what was intended. I guess you can say this blog and our projects have profited from the first relevant Streisand effect of population genomics, big time.
If this trend continues this year (and other bloggers’ or forum users’ faith in miracles is not likely to change), I suppose that after the Yamna Hungary samples are published (with the expected results) this blog is going to be the most read in 2020 by a great margin… I can only infer that this tension is also helping raise the interest in (and politicization of) the question, hence probably the overall number of active users and their participation in other blogs and forums is going to increase everywhere in 2019, too, as this debate becomes more and more heated.
So, what I infer from the most popular posts and the numbers is that people want criticism and controversy, and if you want blood you’ve got it. Here it is, my latest addition to the successful series criticizing the “Corded Ware/R1a–Indo-European” pet theories, a post I wrote two-three months ago, slightly updated with the newest comedy, and a sure success for 2019 (already added to the static pages of the menu):
While the true source of R1a-M417 – the main haplogroup eventually associated with Corded Ware, and thus Uralic speakers – is still not known with precision, due to the lack of R1a-M198 in ancient samples, we already know that the Pontic-Caspian steppes were probably not it.
R1a-M459 (xR1a-M198) lineages appear from the Mesolithic to the Chalcolithic scattered from the Baltic to the Caucasus, from the Dniester to Samara, in a situation similar to haplogroups Q1a-M25 and R1b-L754, which supports the idea that R1a, Q1a, and R1b expanded with ANE ancestry, possibly in different waves since the Epipalaeolithic, and formed the known ANE:EHG:WHG cline.
The first confirmed R1a-M417 sample comes from Alexandria, roughly coinciding with the so-called steppe hiatus. Its emergence in the area of the previous “early Sredni Stog” groups (see the mess of the traditional interpretation of the north Pontic groups as “Sredni Stog”) and its later expansion with Corded Ware supports Kristiansen’s interpretation that Corded Ware emerged from the Dnieper-Dniester corridor, although samples from the area up to ca. 4000 BC, including the few Middle Eneolithic samples available, show continuity of hg. I2a-M223 and typical Ukraine Neolithic ancestry.
NOTE. The further subclade R1a-Z93 (Y26) reported for the sample from Alexandria seems too early, given the confidence interval for its formation (ca. 3500-2500 BC); even R1a-Z645 could be too early. Like the attribution of the R1b-L754 from Khvalynsk to R1b-V1636 (after being previously classifed as of Pre-V88 and M73 subclade), it seems reasonable to take these SNP calls with a pinch of salt: especially because Yleaf (designed to look for the furthest subclade possible) does not confirm for them any subclade beyond R1a-M417 and R1b-L754, respectively.
The sudden appearance of “steppe ancestry” in the region, with the high variability shown by Ukraine_Eneolithic samples, suggests that this is due to recent admixture of incoming foreign peoples (of Ukraine Neolithic / Comb Ware ancestry) with Novodanilovka settlers.
The most likely origin of this population, taking into account the most common population movements in the area since the Neolithic, is the infiltration of (mainly) hunter-gatherers from the forest areas. That would confirm the traditional interpretation of the origin of Uralic speakers in the forest zone, although the nature of Pontic-Caspian settlers as hunter-gatherers rather than herders make this identification today fully unnecessary (see here).
EDIT (3 FEB 2019): As for the most common guesstimates for Proto-Uralic, roughly coinciding with the expansion of this late Sredni Stog community (ca. 4000 BC), you can read the recent post by J. Pystynen in Freelance Reconstruction, Probing the roots of Samoyedic.
NOTE. Although my initial simplistic interpretation (of early 2017) of Comb Ware peoples – traditionally identified as Uralic speakers – potentially showing steppe ancestry was probably wrong, it seems that peoples from the forest zone – related to Comb Ware or neighbouring groups like Lublyn-Volhynia – reached forest-steppe areas to the south and eventually expanded steppe ancestry into east-central Europe through the Volhynian Upland to the Polish Upland, during the late Trypillian disintegration (see a full account of the complex interactions of the Final Eneolithic).
The most interesting aspect of ascertaining the origin of R1a-M417, given its prevalence among Uralic speakers, is to precisely locate the origin of contacts between Late Proto-Indo-European and Proto-Uralic. Traditionally considered as the consequence of contacts between Middle and Upper Volga regions, the most recent archaeological research and data from ancient DNA samples has made it clear that it is Corded Ware the most likely vector of expansion of Uralic languages, hence these contacts of Indo-Europeans of the Volga-Ural region with Uralians have to be looked for in neighbours of the north Pontic area.
My bet – rather obvious today – is that the Don River area is the source of the earliest borrowings of Late Uralic from Late Indo-European (i.e. post-Indo-Anatolian). The borrowing of the Late PIE word for ‘horse’ is particularly interesting in this regard. Later contacts (after the loss of the initial laryngeal) may be attributed to the traditionally depicted Corded Ware – Yamna contact zone in the Dnieper-Dniester area.
NOTE. While the finding of R1a-M417 populations neighbouring R1b-L23 in the Don-Volga interfluve would be great to confirm these contacts, I don’t know if the current pace of more and more published samples will continue. The information we have right now, in my opinion, suffices to support close contacts of neighbouring Indo-Europeans and Uralians in the Pontic-Caspian area during the Late Eneolithic.
Single Grave and central Corded Ware groups – showing some of the earliest available dates (emerging likely ca. 3000/2900 BC) – are as varied in their haplogroups as it is expected from a sink (which does not in the least resemble the Volga-Ural population):
Interesting is the presence of R1b-L754 in Obłaczkowo, potentially of R1b-V88 subclade, as previously found in two Central European individuals from Blätterhole MN (ca. 3650 and 3200 BC), and in the Iron Gates and north Pontic areas.
Haplogroups I2a and G have also been reported in early samples, all potentially related to the supposed Corded Ware central-east European homeland, likely in southern Poland, a region naturally connected to the north Pontic forest-steppe area and to the expansion of Neolithic groups.
The true bottlenecks under haplogroup R1a-Z645 seem to have happened only during the migration of Corded Ware to the east: to the north into the Battle Axe culture, mainly under R1a-Z282, and to the south into Middle Dnieper – Fatyanovo-Balanovo – Abashevo, probably eventually under R1a-Z93.
This bottleneck also supports in archaeology the expansion of a sort of unifying “Corded Ware A-horizon” spreading with people (disputed by Furholt), the disintegrating Uralians, and thus a source of further loanwords shared by all surviving Uralic languages.
Confirming this ‘concentrated’ Uralic expansion to the east is the presence of R1a-M417 (xR1a-Z645) lineages among early and late Single Grave groups in the west – which essentially disappeared after the Bell Beaker expansion – , as well as the presence of these subclades in modern Central and Western Europeans. Central European groups became thus integrated in post-Bell Beaker European EBA cultures, and their Uralic dialect likely disappeared without a trace.
NOTE. The fate of R1b-L51 lineages – linked to North-West Indo-Europeans undergoing a bottleneck in the Yamna Hungary -> Bell Beaker migration to the west – is thus similar to haplogroup R1a-Z645 – linked to the expansion of Late Uralians to the east – , hence proving the traditional interpretation of the language expansions as male-driven migrations. These are two of the most interesting genetic data we have to date to confirm previous language expansions and dialectal classifications.
It will be also interesting to see if known GAC and Corded Ware I2a-Y6098 subclades formed eventually part of the ancient Uralic groups in the east, apart from lineages which will no doubt appear among asbestos ware groups and probably hunter-gatherers from north-eastern Europe (see the recent study by Tambets et al. 2018).
Corded Ware ancestry marked the expansion of Uralians
Sadly, some brilliant minds decided in 2015 that the so-called “Yamnaya ancestry” (now more appropriately called “steppe ancestry”) should be associated to ‘Indo-Europeans’. This is causing the development of various new pet theories on the go, as more and more data contradicts this interpretation.
There is a clear long-lasting cultural, populational, and natural barrier between Yamna and Corded Ware: they are derived from different ancestral populations, which show clearly different ancestry and ancestry evolution (although they did converge to some extent), as well as different Y-DNA bottlenecks; they show different cultures, including those of preceding and succeeding groups, and evolved in different ecological niches. The only true steppe pastoralists who managed to dominate over grasslands extending from the Upper Danube to the Altai were Yamna peoples and their cultural successors.
[A]rchaeologist Volker Heyd at the University of Bristol, UK, disagreed, not with the conclusion that people moved west from the steppe, but with how their genetic signatures were conflated with complex cultural expressions. Corded Ware and Yamnaya burials are more different than they are similar, and there is evidence of cultural exchange, at least, between the Russian steppe and regions west that predate Yamnaya culture, he says. None of these facts negates the conclusions of the genetics papers, but they underscore the insufficiency of the articles in addressing the questions that archaeologists are interested in, he argued. “While I have no doubt they are basically right, it is the complexity of the past that is not reflected,” Heyd wrote, before issuing a call to arms. “Instead of letting geneticists determine the agenda and set the message, we should teach them about complexity in past human actions.
Interesting report by Bernard Sécher on Anthrogenica, about the Ph.D. thesis of Samantha Brunel from Institut Jacques Monod, Paris, Paléogénomique des dynamiques des populations humaines sur le territoire Français entre 7000 et 2000 (2018).
A summary from user Jool, who was there, translated into English by Sécher (slight changes to translation, and emphasis mine):
They have a good hundred samples from the North, Alsace and the Mediterranean coast, from the Mesolithic to the Iron Age.
There is no major surprise compared to the rest of Europe. On the PCA plot, the Mesolithic are with the WHG, the early Neolithics with the first farmers close to the Anatolians. Then there is a small resurgence of hunter-gatherers that moves the Middle Neolithics a little closer to the WHGs.
From the Bronze Age, they have 5 samples with autosomal DNA, all in Bell Beaker archaeological context, which are very spread on the PCA. A sample very high, close to the Yamnaya, a little above the Corded Ware, two samples right in the Central European Bell Beakers, a fairly low just above the Neolithic package, and one last full in the package. The most salient point was that the Y chromosomes of their 12 Bronze Age samples (all Bell Beakers) are all R1b, whereas there was no R1b in the Neolithic samples.
Finally they have samples of the Iron Age that are collected on the PCA plot close to the Bronze Age samples. They could not determine if there is continuity with the Bronze Age, or a partial replacement by a genetically close population.
The sample with likely high “steppe ancestry“, clustering closely to Yamna (more than Corded Ware samples) is then probably an early East Bell Beaker individual, probably from Alsace, or maybe close to the Rhine Delta in the north, rather than from the south, since we already have samples from southern France from Olalde et al. (2018) with high Neolithic ancestry, and samples from the Rhine with elevated steppe ancestry, but not that much.
This specific sample, if confirmed as one of those reported as R1b (then likely R1b-L151), as it seems from the wording of the summary, is key because it would finally link Yamna to East Bell Beaker through Yamna Hungary, all of them very “Yamnaya-like”, and therefore R1b-L151 (hence also R1b-L51) directly to the steppe, and not only to the Carpathian Basin (that is, until we have samples from late Repin or West Yamna…)
NOTE. The only alternative explanation for such elevated steppe ancestry would be an admixture between a ‘less Yamnaya-like’ East Bell Beaker + a Central European Corded Ware sample like the Esperstedt outlier + drift, but I don’t think that alternative is the best explanation of its position in the PCA closer to Yamna in any of the infinite parallel universes, so… Also, the sample from Esperstedt is clearly a late outlier likely influenced by Yamna vanguard settlers from Hungary, not the other way round…
Unexpectedly, then, fully Yamnaya-like individuals are found not only in Yamna Hungary ca. 3000-2500 BC, but also among expanding East Bell Beakers later than 2500 BC. This leaves us with unexplained, not-at-all-Yamnaya-like early Corded Ware samples from ca. 2900 BC on. An explanation based on admixture with locals seems unlikely, seeing how Corded Ware peoples continue a north Pontic cluster, being thus different from Yamna and their ancestors since the Neolithic; and how they remained that way for a long time, up to Sintashta, Srubna, Andronovo, and even later samples… A different, non-Indo-European community it is, then.
Let’s wait and see the Ph.D. thesis, when it’s published, and keep observing in the meantime the absurd reactions of denial, anger, bargaining, and depression (stages of grief) among BBC/R1b=Vasconic and CWC/R1a=Indo-European fans, as if they had lost something (?). Maybe one of these reactions is actually the key to changing reality and going back to the 2000s, who knows…
Featured image: initial expansion of the East Bell Beaker Group, by Volker Heyd (2013).
Overall, 96 samples ranging from Slovenian littoral to Lower Styria were genotyped for 713,599 markers using the OmniExpress 24-V1 BeadChips (Figure 1), genetic data were obtained from Esko et al. (2013). After removing related individuals, 92 samples were left. The Slovenian dataset has been subsequently merged with the Human Origin dataset (Lazaridis et al., 2016) for a total of 2163 individuals.
First, Y chromosome genetic diversity was assessed. A total of 52 Y chromosomes were analyzed for 195 SNPs. The majority of individuals (25, 48.1%) belong to the haplogroup R1a1a1a (R-M417) while the second major haplogroup is represented by R1b (R-M343) including 15 individuals (28.8%). Twelve samples are assigned to haplogroup I (I M170): five and two samples belong to haplogroup I2a (I L460) and I1 (I M253), respectively, while the remaining five samples did not have enough information to be further assigned.
Considering the unbalanced sample size of the Slovenian population compared to the other populations included in the dataset, a subset of 20 Slovenian individuals randomly sampled was used.
All Slovenian samples group together with Hungarians, Czechs, and some Croatians (“Central-Eastern European” cluster) as also suggested by the PCA. All Basque individuals with few French and Spanish cluster together (“Basque” cluster) while a “Northern-European” cluster is made of the majority of French, English, Icelanders, Norwegians, and Orcadians. Five populations contributed to the “Eastern-European” cluster including Belarusians, Estonians, Lithuanians, Mordovians, and Russians. Western and South Europe is split into two cluster: the first (“Western European” cluster) includes all Spanish individuals, few French, and some Italians (North Italy) while the second (“Southern-European” cluster) groups Sicilians, Greeks, some Croatians, Romanians, and some Italians (North Italy).
Admixture Pattern and Migration
All Slovenian individuals share common pattern of genetic ancestry, as revealed by ADMIXTURE analysis. The three major ancestry components are the North East and North West European ones (light blue and dark blue, respectively, Figure 3), followed by a South European one (dark green, Figure 3). Contribution from the Sardinians and Basque are present in negligible amount. The admixture pattern of Slovenians mimics the one suggested by the neighboring Eastern European populations, but it is different from the pattern suggested by North Italian populations even though they are geographically close.
Using ALDER, the most significant admixture event was obtained with Russians and Sardinians as source populations and it happened 135 ± 9.31 generations ago (Z-score = 11.54). (…) When tested for multiple admixture events (MALDER), we obtained evidence for one admixture event 165.391 ± 17.1918 generations ago corresponding to ∼2620 BCE (CI: 3101–2139) considering a generation time of 28 years (Figure 4), with Kalmyk and Sardinians as sources.
We then modeled the Slovenian population as target of admixture of ancient individuals from Haak et al. (2015) while computing the f3(Ancient 1, Ancient 2, Slovenian) statistic. The most significant signal was obtained with Yamnaya and HungaryGamba_EN (Z-score = -10.66), followed by MA1 with LBK_EN (Z-score -9.7) and Yamnaya with Stuttgart (Z-score = -8.6) used as possible source populations (Supplementary Figure 5).
We found a significant signal of admixture by using both pairs as ancient sources. Specifically, for the pair Yamnaya and Hungary_EN the admixture event is dated at 134.38 ± 23.69 generations ago (Z-score = 5.26, p-value of 1.5e-07) while for Yamnaya and LBK_EN at 153.65 ± 22.19 generations ago (Z-score = 6.92, p-value 4.4e-12). Outgroup f3 with Yamnaya put Slovenian population close to Hungarians, Czechs, and English, indicating a similar shared drift between these population with the Steppe populations (Supplementary Figure 6).
Not that any of this would come as a surprise, but:
PCA keeps supporting the common cluster of certain West, South, and East Slavs in a “Central-Eastern European” cluster, distinct from the “North-Eastern European” cluster formed by modern Finno-Ugrians, as well as ancient Finno-Ugrians of north-eastern Europe who were only recently Slavicized.
Admixture supports the same ancient ‘western’ (a core West+South+East Slavic) cluster, and the admixture event with Yamna + Hungary_EN is logically a proxy for Yamna Hungary being at the core of ancestral Central-East population movements related to Bell Beakers in the mid- to late 3rd millennium.
I don’t know where exactly this impulse for the theory of Russia being the cradle of Slavs comes from today (although there are some obvious political trends to revive 19th c. ideas), but it was always clear for everyone, including Russians, that East Slavs had migrated to the east and north and assimilated indigenous Finno-Ugrians, apart from Turkic-, Iranian-, and Caucasian-speaking peoples to the east. Genetics is only confirming what was clear from other disciplines long ago.
Let me begin this final post on the Corded Ware—Uralic connection with an assertion that should be obvious to everyone involved in ethnolinguistic identification of prehistoric populations but, for one reason or another, is usually forgotten. In the words of David Reich, in Who We Are and How We Got Here (2018):
Human history is full of dead ends, and we should not expect the people who lived in any one place in the past to be the direct ancestors of those who live there today.
Another recurrent argument – apart from “Siberian ancestry” – for the location of the Uralic homeland is “haplogroup N”. This is as serious as saying “haplogroup R1” to refer to Indo-European migrations, but let’s explore this possibility anyway:
We have now a better idea of how many ancient migrations (previously hypothesized to be associated with westward Uralic migrations) look like in genetic terms. From Damgaard et al. (Science 2018):
These serial changes in the Baikal populations are reflected in Y-chromosome lineages (Fig. SA; figs. S24 to S27, and tables S13 and SI4). MAI carries the R haplogroup, whereas the majority of Baikal_EN males belong to N lineages, which were widely distributed across Northern Eurasia (29), and the Baikal_LNBA males all carry Q haplogroups, as do most of the Okunevo_EMBA as well as some present-day Central Asians and Siberians.
The only N1c1 sample comes from Ust’Ida Late Neolithic, 180km to the north of Lake Baikal, which – together with the Bronze Age sample from the Kola peninsula, and the medieval sample from Ust’Ida – gives a good idea of the overall expansion of N subclades and Siberian ancestry among the Circum-Arctic peoples of Eurasia, speakers of Palaeo-Siberian languages.
What we should expect from Uralic peoples expanding with haplogroup N – seeing how Yamna expands with R1b-L23, and Corded Ware expands with R1a-Z645 – is to find a common subclade spreading with Uralic populations. Let’s see if it works like that for any N-X subclade, in data from Ilumäe et al. (2016):
Within the Eurasian circum-Arctic spread zone, N3 and N2a reveal a well-structured spread pattern where individual sub-clades show very different distributions:
N1a1-M46 (or N-TAT), formed ca. 13900 BC, TMRCA 9800 BC
N1a1a2-B187, formed ca. 9800 BC, TMRCA 1050 AD:
The sub-clade N3b-B187 is specific to southern Siberia and Mongolia, whereas N3a-L708 is spread widely in other regions of northern Eurasia.
N1a1a1a-L708, formed ca. 6800 BC, TMRCA 5400 BC.
N1a1a1a2-B211/Y9022, formed ca. 5400 BC, TMRCA 1900 BC:
The deepest clade within N3a is N3a1-B211, mostly present in the Volga-Uralic region and western Siberian Khanty and Mansi populations.
N1a1a1a1a-L392/L1026), formed ca. 4400 BC, TMRCA 2800 BC:
The neighbor clade, N3a3’6-CTS6967, spreads from eastern Siberia to the eastern part of Fennoscandia and the Baltic States
N1a1a1a1a1a-CTS2929/VL29, formed ca. 2100 BC, TMRCA 1600 BC:
In Europe, the clade N3a3-VL29 encompasses over a third of the present-day male Estonians, Latvians, and Lithuanians but is also present among Saami, Karelians, and Finns (Table S2 and Figure 3). Among the Slavic-speaking Belarusians, Ukrainians, and Russians, about three-fourths of their hg N3 Y chromosomes belong to hg N3a3.
In the post on Finno-Permic expansions, I depicted what seems to me the most likely way of infiltration of N1c-L392 lineages with Akozino warrior-traders into the western Finno-Ugric populations, with an origin around the Barents sea.
This includes the potential spread of (a minority of) N1c-B211 subclades due to contacts with Anonino on both sides of the Urals, through a northern route of forest and forest-steppe regions (equivalent to the distribution of Cherkaskul compared to Andronovo), given the spread of certain subclades in Ugric populations.
NOTE. An alternative possibility is the association of certain B211 subclades with a southern route of expansion with Pre-Scythian and Scythian populations, under whose influence the Ananino culture emerged -which would imply a very quick infiltration of certain groups of haplogroup N everywhere among Finno-Ugrics on both sides of the Urals – , and also the expansion of some subclades with Turkic-speaking peoples, who apparently expanded with alliances of different peoples. Both (Scythian and Turkic) populations expanded from East Asia, where haplogroup N (including N1c) was present since the Neolithic. I find this a worse model of expansion for upper clades, but – given the YFull estimates and the presence of this haplogroup among Turkic peoples – it is a possibility for many subclades.
N1a1a1a1a2-Z1936, formed ca. 2800 BC, TMRCA 2400 BC:
The only notable exception from the pattern are Russians from northern regions of European Russia, where, in turn, about two-thirds of the hg N3 Y chromosomes belong to the hg N3a4-Z1936—the second west Eurasian clade. Thus, according to the frequency distribution of this clade, these Northern Russians fit better among other non-Slavic populations from northeastern Europe. N3a4 tends to increase in frequency toward the northeastern European regions but is also somewhat unexpectedly a dominant hg N3 lineage among most Turcic-speaking Volga Tatars and South-Ural Bashkirs.
N1a1a1a1a4-M2019 (previously N3a2), formed ca. 4400 BC, TMRCA 1700 BC:
Sub-hg N3a2-M2118 is one of the two main bifurcating branches in the nested cladistic structure of N3a2’6-M2110. It is predominantly found in populations inhabiting present-day Yakutia (Republic of Sakha) in central Siberia and at lower frequencies in the Khanty and Mansi populations, which exhibit a distinct Y-STR pattern (Table S7) potentially intrinsic to an additional clade inside the sub-hg N3a2
The second widespread sub-clade of hg N is N2a. (…):
N1a2b-P43 (B523/FGC10846/Y3184), formed ca. 6800 BC, TMRCA ca. 2700 BC:
The absolute majority of N2a individuals belong to the second sub-clade, N2a1-B523, which diversified about 4.7 kya (95% CI = 4.0–5.5 kya). Its distribution covers the western and southern parts of Siberia, the Taimyr Peninsula, and the Volga-Uralic region with frequencies ranging from from 10% to 30% and does not extend to eastern Siberia (…)
The “European” branch suggested earlier from Y-STR patterns turned out to consist of two clades
N1a2b2a-Y3185/FGC10847, formed ca. 2200 BC, TMRCA 800 BC:
N2a1-L1419, spread mainly in the northern part of that region.
N1a2b2b1-B528/Y24382, formed ca. 900 BC, TMRCA ca. 900 BC:
N2a1-B528, spread in the southern Volga-Uralic region.
We also have a good idea of the distribution of haplogroup R1a-Z645 in ancient samples. Its subclades were associated with the Corded Ware expansion, and some of them fit quite well the early expansion of Finno-Permic, Ugric, and Samoyedic peoples to the east.
This is how the modern distribution of R1a among Uralians looks like, from the latest report in Tambets et al. (2018):
Among Fennic populations, Estonians and Karelians (ca. 1.1 million) have not suffered the greatest bottleneck of Finns (ca. 6-7 million), and show thus a greater proportion of R1a-Z280 than N1c subclades, which points to the original situation of Fennic peoples before their expansion. To trust Finnish Y-DNA to derive conclusions about the Uralic populations is as useful as relying on the Basque Y-DNA for the language spread by R1b-P312…
Among Volga-Finnic populations, Mordovians (the closest to the original Uralic cluster, see above) show a majority of R1a lineages (27%).
Hungarians (ca. 13-15 million) represent the majority of Ugric (and Finno-Ugric) peoples. They are mainly R1a-Z280, also R1a-Z2123, have little N1c, and lack Siberian ancestry, and represent thus the most likely original situation of Ugric peoples in 4th century AD (read more on Avars and Hungarians).
Among Samoyedic peoples, the Selkup, the southernmost ones and latest to expand – that is, those not heavily admixed with Siberian populations – , also have a majority of R1a-Z2123 lineages (see also here for the original Samoyedic haplogroups to the south).
To understand the relevance of Hungarians for Ugric peoples, as well as Estonians, Karelians, and Mordovians (and northern Russians, Finno-Ugric peoples recently Russified) for Finno-Permic peoples, as opposed to the Circum-Arctic and East Siberian populations, one has to put demographics in perspective. Even a modern map can show the relevance of certain territories in the past:
Summary of ancestry + haplogroups
Fennic and Samic populations seem to be clearly influenced by Palaeo-Laplandic peoples, whereas Volga-Finnic and especially Permic populations may have received gene flow from both, but essentially Palaeo-Siberian influence from the north and east.
The fact that modern Mansis and Khantys offer the highest variation in N1a subclades, and some of the highest “Siberian ancestry” among non-Nganasans, should have raised a red flag long ago. The fact that Hungarians – supposedly stemming from a source population similar to Mansis – do not offer the same amount of N subclades or Siberian ancestry (not even close), and offer instead more R1a, in common with Estonians (among Finno-Samic peoples) and Mordvins (among Volga-Finnic peoples) should have raised a still bigger red flag. The fact that Nganasans – the model for Siberian ancestry – show completely different N1a2b-P43 lineages should have been a huge genetic red line (on top of the anthropological one) to regard them as the Uralian-type population.
It is not hard to model the stepped arrival, infiltration, and/or resurge of N subclades and “Siberian ancestries”, as well as their gradual expansion in certain regions, associated with certain migrations first – such as the expansions to the Circum-Arctic region, and later the Scythian- and Turkic-related movements – , as well as limited regional developments, like the known bottleneck in Finns, or the clear late expansion of Ugric and Samoyedic languages to the north among nomadic Palaeo-Siberians due to traditions of exogamy and multilingualism. This fits quite well with the different arrival of N (N1c and xN1c) lineages to the different Uralic-speaking groups, and to the stepped appearance of “Siberian ancestry” in the different regions.
It is evident that a lot of people were too attached to the idea of Palaeolithic R1b lineages ‘native’ to western Europe speaking Basque languages; of R1a lineages speaking Indo-European and spreading with Yamna; and N lineages ‘native’ to north-eastern Europe and speaking Uralic, and this is causing widespread weeping and gnashing of teeth (instead of the joy of discovering where one’s true patrilineal ancestors come from, and what language they spoke in each given period, which is the supposed objective of genetic genealogy…)
As far as I know – and there might be many other similar pet theories out there – there have been proposals of “modern Balto-Slavic-like” populations (in an obvious circular reasoning based on modern populations) in some Scythian clusters of the Iron Age.
NOTE. I will not enter into “Balto-Slavic-like R1a” of the Late Bronze Age or earlier because no one can seriously believe at this point of development of Population Genetics that autosomal similarity predating 1,500+ years the appearance of Slavs equates to their (ethnolinguistic) ancestral population, without a clear intermediate cultural and genetic trail – something we lack today in the Slavic case even for the late Roman period…
We also know of R1a-Z280 lineages in Srubna, probably expanding to the west. With that in mind, and knowing that Palaeo-Germanic was in close contact with Finno-Samic while both were already separated but still in contact, and that Palaeo-Germanic was also in contact and closely related to a ‘Temematic’ distinct from Balto-Slavic (and also that early Proto-Baltic and Proto-Slavic from the Roman Iron Age and later were in contact with western Uralic) this will be the linguistic map of the Iron Age if R1a is considered to expand Indo-European from some kind of “patron-client” relationship with west Yamna:
My problem with this proposal is that it is obviously beholden to the notion of the uninterrupted cultural, historic and ethnic continuity in certain territories. This bias is common in historiography (von Falkenhausen 1993), but it extends even more easily into the lesser known prehistory of any territory, and now more than ever some people feel the need to corrupt (pre)history based on their own haplogroups (or the majority haplogroups of their modern countries). However, more than on philosophical grounds, my rejection is based on facts: this picture is not what the combination of linguistic, archaeological, and genetic data shows. Period.
Nevertheless, if Yamna + Corded Ware represented the “big and early expansion” of Germanic and Italo-Celtic peoples proper of the dream Nazi’s Lebensraum and Fascist’s spazio vitale proposals; Uralians were Siberian hunter-gatherers that controlled the whole eastern and northern Russia, and miraculously managed to push (ethnolinguistically) Neolithic agropastoralists to the west during and after the Iron Age, with gradual (and often minimal) genetic impact; and Balto-Slavic peoples were represented by horse riders from Pokrovka/Srubna, hiding then somewhere around the forest-steppe until after the Scythian expansion, and then spreading their language (without much genetic impact) during the early Middle Ages…so be it.
Even though proposals of an Eastern Uralic (or Ugro-Samoyedic) group are in the minority – and those who support it tend to search for an origin of Uralic in Central Asia – , there is nothing wrong in supporting this from the point of view of a western homeland, because the eastward migration of both Proto-Ugric and Pre-Samoyedic peoples may have been coupled with each other at an early stage. It’s like Indo-Slavonic: it just doesn’t fit the linguistic data as well as the alternative, i.e. the expansion of Samoyedic first, different from a Finno-Ugric trunk. But, in case you are wondering about this possibility, here is Häkkinen’s (2012) phonological argument:
The case of Samoyedic is quite similar to that of Hungarian, although the earliest Palaeo-Siberian contact languages have been lost. There were contacts at least with Tocharian (Kallio 2004), Yukaghir (Rédei 1999) and Turkic (Janhunen 1998). Samoyedic also:
a) has moved far from the related languages and has been exposed to strong foreign influence
b) shares a small number of common words with other branches (from Sammallahti 1988: only 123 ‘Uralic’ words, versus 390 ‘Uralic’ + ‘Finno-Ugric’ words found in other branches than Samoyedic = 31,5 %)
c) derives phonologically from the East Uralic dialect.
The phonological level is taxonomically more reliable, since it lacks the distortion caused by invisible convergence and false divergence at the lexical level. Thus we can conclude that the traditional taxonomic model, according to which Samoyedic was the first branch to split off from the Proto-Uralic unity, is just as incorrect as the view that Hungarian was the first branch to split off.
Late Uralic can be traced back to metallurgical cultures thanks to terms like PU *wäśka ‘copper/bronze’ (borrowed from Proto-Samoyedic *wesä into Tocharian); PU *äsa and *olna/*olni, ‘lead’ or ‘tin’, found in *äsa-wäśka ‘tin-bronze’; and e.g. *weŋći ‘knife’, borrowed into Indo-Iranian (through the stage of vocalization of nasals), appearing later as Proto-Indo-Aryan *wāćī ‘knife, awl, axe’.
It is known that the southern regions of the Abashevo culture developed Proto-Indo-Iranian-speaking Sintashta-Petrovka and Pokrovka (Early Srubna). To the north, however, Abashevo kept its Uralic nature, with continuous contacts allowing for the spread of lexicon – mainly into Finno-Ugric – , and phonetic influence – mainly Uralisms into Proto-Indo-Iranian phonology (read more here).
The northern part of Abashevo (just like the south) was mainly a metallurgical society, with Abashevo metal prospectors found also side by side with Sintashta pioneers in the Zeravshan Valley, near BMAC, in search of metal ores. About the Seima-Turbino phenomenon, from Parpola (2013):
From the Urals to the east, the chain of cultures associated with this network consisted principally of the following: the Abashevo culture (extending from the Upper Don to the Mid- and South Trans-Urals, including the important cemeteries of Sejma and Turbino), the Sintashta culture (in the southeast Urals), the Petrovka culture (in the Tobol-Ishim steppe), the Taskovo-Loginovo cultures (on the Mid- and Lower Tobol and the Mid-Irtysh), the Samus’ culture (on the Upper Ob, with the important cemetery of Rostovka), the Krotovo culture (from the forest steppe of the Mid-Irtysh to the Baraba steppe on the Upper Ob, with the important cemetery of Sopka 2), the Elunino culture (on the Upper Ob just west of the Altai mountains) and the Okunevo culture (on the Mid-Yenissei, in the Minusinsk plain, Khakassia and northern Tuva). The Okunevo culture belongs wholly to the Early Bronze Age (c. 2250–1900 BCE), but most of the other cultures apparently to its latter part, being currently dated to the pre-Andronovo horizon of c. 2100–1800 BCE (cf. Parzinger 2006: 244–312 and 336; Koryakova & Epimakhov 2007: 104–105).
The majority of the Sejma-Turbino objects are of the better quality tin-bronze, and while tin is absent in the Urals, the Altai and Sayan mountains are an important source of both copper and tin. Tin is also available in southern Central Asia. Chernykh & Kuz’minykh have accordingly suggested an eastern origin for the Sejma-Turbino network, backing this hypothesis also by the depiction on the Sejma-Turbino knives of mountain sheep and horses characteristic of that area. However, Christian Carpelan has emphasized that the local Afanas’evo and Okunevo metallurgy of the Sayan-Altai area was initially rather primitive, and could not possibly have achieved the advanced and difficult technology of casting socketed spearheads as one piece around a blank. Carpelan points out that the first spearheads of this type appear in the Middle Bronze Age Caucasia c. 2000 BCE, diffusing early on to the Mid-Volga-Kama-southern Urals area, where “it was the experienced Abashevo craftsmen who were able to take up the new techniques and develop and distribute new types of spearheads” (Carpelan & Parpola 2001: 106, cf. 99–106, 110). The animal argument is countered by reference to a dagger from Sejma on the Oka river depicting an elk’s head, with earlier north European prototypes (Carpelan & Parpola 2001: 106–109). Also the metal analysis speaks for the Abashevo origin of the Sejma-Turbino network. Out of 353 artefacts analyzed, 47% were of tin-bronze, 36% of arsenical bronze, and 8.5% of pure copper. Both the arsenical bronze and pure copper are very clearly associated with the Abashevo metallurgy.
The Abashevo metal production was based on the Volga-Kama-Belaya area sandstone ores of pure copper and on the more easterly Urals deposits of arsenical copper (Figure 9). The Abashevo people, expanding from the Don and Mid-Volga to the Urals, first reached the westerly sandstone deposits of pure copper in the Volga and Kama basins, and started developing their metallurgy in this area, before moving on to the eastern side of the Urals to produce harder weapons and tools of arsenical copper. Eventually they moved even further south, to the area richest in copper in the whole Urals region, founding there the very strong and innovative Sintashta culture.
Regarding the most likely expansion of Eastern Uralic peoples:
Nataliya L’vovna Chlenova (1929–2009; cf. Korenyako & Ku’zminykh 2011) published in 1981 a detailed study of the Cherkaskul’ pottery. In her carefully prepared maps of 1981 and 1984 (Figure 10), she plotted Cherkaskul’ monuments not only in Bashkiria and the Trans-Urals, but also in thick concentrations on the Upper Irtysh, Upper Ob and Upper Yenissei, close to the Altai and Sayan mountains, precisely where the best experts suppose the homeland of Proto-Samoyed to be.
The Cherkaskul’ culture was transformed into the genetically related Mezhovka culture (c. 1500–1000 BCE), which occupied approximately the same area from the Mid-Kama and Belaya rivers to the Tobol river in western Siberia (cf. Parzinger 2006: 444–448; Koryakova & Epimakhov 2007: 170–175). The Mezhovka culture was in close contact with the neighbouring and probably Proto-Iranian speaking Alekseevka alias Sargary culture (c. 1500–900 BCE) of northern Kazakhstan (Figure 4 no. 8) that had a Fëdorovo and Cherkaskul’ substratum and a roller pottery superstratum (cf. Parzinger 2006: 443–448; Koryakova & Epimakhov 2007: 161–170). Both the Cherkaskul’ and the Mezhovka cultures are thought to have been Proto-Ugric linguistically, on the basis of the agreement of their area with that of Mansi and Khanty speakers, who moreover in their Fëdorovo-like ornamentation have preserved evidence of continuity in material culture (cf. Chlenova 1984; Koryakova & Epimakhov 2007: 159, 175).
The Mezhovka culture was succeeded by the genetically related Gamayun culture (c. 1000–700 BCE) (cf. Parzinger 2006: 446; 542–545).
From the Gamayun culture descend Trans-Urals cultures in close contact with Finno-Permic populations of the Cis-Ural region:
[Proto-Mansi] Itkul’ culture (c. 700–200 BCE) distributed along the eastern slope of the Ural Mountains (cf. Parzinger 2006: 552–556). Known from its walled forts, it constituted the principal Trans-Uralian centre of metallurgy in the Iron Age, and was in contact with both the Anan’ino and Akhmylovo cultures (the metallurgical centres of the Mid-Volga and Kama-Belaya region) and the neighbouring Gorokhovo culture.
[Proto-Hungarian] via the Vorob’evo Group (c. 700–550 BCE) (cf. Parzinger 2006: 546–549), to the Gorokhovo culture (c. 550–400 BCE) of the Trans-Uralian forest steppe (cf. Parzinger 2006: 549–552). For various reasons the local Gorokhovo people started mobile pastoral herding and became part of the multicomponent pastoralist Sargat culture (c. 500 BCE to 300 CE), which in a broader sense comprized all cultural groups between the Tobol and Irtysh rivers, succeeding here the Sargary culture. The Sargat intercommunity was dominated by steppe nomads belonging to the Iranian-speaking Saka confederation, who in the summer migrated northwards to the forest steppe
[Proto-Khanty] Late Bronze Age and Early Iron Age cultures related to the Gamayunskoe and Itkul’ cultures that extended up to the Ob: the Nosilovo, Baitovo, Late Irmen’, and Krasnoozero cultures (c. 900–500 BCE). Some were in contact with the Akhmylovo on the Mid-Volga.
Parpola (2012) connects the expansion of Samoyedic with the Cherkaskul variant of Andronovo. As we know, Andronovo was genetically diverse, which speaks in favour of different groups developing similar material cultures in Central Asia.
Juha Janhunen, author of the etymological dictionary of the Samoyed languages (1977), places the homeland of Proto-Samoyedic in the Minusinsk basin on the Upper Yenissei (cf. Janhunen 2009: 72). Mainly on the basis of Bulghar Turkic loanwords, Janhunen (2007: 224; 2009: 63) dates Proto-Samoyedic to the last centuries BCE. Janhunen thinks that the language of the Tagar culture (c. 800–100 BCE) ought to have been Proto-Samoyedic (cf. Janhunen 1983: 117– 118; 2009: 72; Parzinger 2001: 80 and 2006: 619–631 dates the Tagar culture c. 1000–200 BCE; Svyatko et al. 2009: 256, based on human bone samples, c. 900 BCE to 50 CE). The Tagar culture largely continues the traditions of the Karasuk culture (c. 1400–900 BCE), (…)
The use of a map of “Siberian ancestry” peaking in the arctic to show a supposedly late Uralic population movement (starting in the Iron Age!) seems to be the latest trend in population genomics:
I guess that would make this map of Neolithic farmer ancestry represent an expansion of Indo-European from the south, because Anatolia, Greece, Italy, southern France, and Iberia – where this ancestry peaks in modern populations – are among the oldest territories where Indo-European languages were recorded:
Probably not the right interpretation of this kind of simplistic data about modern populations, though…
Overall, and specifically at lower values of K, the genetic makeup of Uralic speakers resembles that of their geographic neighbours. The Saami and (a subset of) the Mansi serve as exceptions to that pattern being more similar to geographically more distant populations (Fig. 3a, Additional file 3: S3). However, starting from K = 9, ADMIXTURE identifies a genetic component (k9, magenta in Fig. 3a, Additional file 3: S3), which is predominantly, although not exclusively, found in Uralic speakers. This component is also well visible on K = 10, which has the best cross-validation index among all tests (Additional file 3: S3B). The spatial distribution of this component (Fig. 3b) shows a frequency peak among Ob-Ugric and Samoyed speakers as well as among neighbouring Kets (Fig. 3a). The proportion of k9 decreases rapidly from West Siberia towards east, south and west, constituting on average 40% of the genetic ancestry of FU speakers in Volga-Ural region (VUR) and 20% in their Turkic-speaking neighbours (Bashkirs, Tatars, Chuvashes; Fig. 3a).
However, this ‘something’ that some people occasionally find in some Uralic populations is also common to other modern and ancient groups, and not so common in some other Uralic peoples. Simply put:
I already said this in the recent publication of Siberian samples, where a renamed and radiocarbon dated Finnish_IA clearly shows that Late Iron Age Saami (ca. 400 AD) had little “Siberian ancestry”, if any at all, representing the most likely Fennic (and Samic) ancestral components before their expansion into central and northern Finland, where they admixed with circum-polar peoples of asbestos ware cultures.
I will say that again and again, any time they report the so-called “Siberian ancestry” in Uralic samples, no matter how it is defined each time: it does not seem to be that special something people are looking for, but rather (at least in a great part) a quite old ancestral component forming an evident cline with EHG, whose best proximate source are Baikal_EN (and/or Devil’s Gate) at this moment, and thus also East European hunter-gatherers for Western Uralic peoples:
So either Samara_HG, Karelia_HG, and many other groups from eastern Europe all spoke Uralic according to this ADMIXTURE graphic (and the formation of steppe ancestry in the Volga-Ural region brought the Proto-Indo-European language to the steppes through the CHG/ANE expansion), or a great part of this “Siberian ancestry” found in modern Uralic-speaking populations is not what some people would like to think it is…
PCA clines can be looked for to represent expansions of ancient populations. Most recently, Flegontov et al. (2018) are attempting to do this with Asian populations:
For some Turkic groups in the Urals and the Altai regions and in the Volga basin, a different admixture model fits the data: the same West Eurasian source + Uralic- or Yeniseian-speaking Siberians. Thus, we have revealed an admixture cline between Scythians and the Iranian farmer genetic cluster, and two further clines connecting the former cline to distinct ancestry sources in Siberia. Interestingly, few Wusun-period individuals harbor substantial Uralic/Yeniseian-related Siberian ancestry, in contrast to preceding Scythians and later Turkic groups characterized by the Tungusic/Mongolic-related ancestry. It remains to be elucidated whether this genetic influx reflects contacts with the Xiongnu confederacy. We are currently assembling a collection of samples across the Eurasian steppe for a detailed genetic investigation of the Hunnic confederacies.
There are potential errors with this approach:
The main one is practical – does a modern cline represent an ancestral language? The answer is: sometimes. It depends on the anthropological context that we have, and especially on the precision of the PCA:
The ‘Europe’, ‘Middle East’, etc. clines of the above PCA do not represent one language, but many. For starters, the PCA includes too many (and modern) populations, its precision is useless for ethnolinguistic groups. Which is the right level? Again, it depends.
The other error is one of detail of the clines drawn (which, in turn, depends on the precision of the PCA). For example, we can draw two paralell lines (or even one line, as in Flegontov et al. above) in one PCA graphic, but we still don’t have the direction of expansion. How do we know if this supposed “Uralic-speaking cline” goes from one region to the other? For that level of detail, we should examine closely modern Uralic-speaking peoples and Circum-Arctic populations:
The real ancient Uralic cluster (drawn above in blue) is thus probably from a North-East European source (probably formed by Battle Axe / Fatyanovo-Balanovo / Abashevo) to the east into Siberian populations, and to the north into Laplandic populations (see below also on Mezhovska ancestry for the drawn ‘European cline’, which some may a priori wrongly assume to be quite late).
The fact that the three formed clines point to an admixture of CWC-related populations from North-Eastern Europe, and that variation is greater at the Palaeo-Laplandic and Palaeo-Siberian extremities compared to the CWC-related one, also supports this as the correct interpretation.
However, judging by the two main clines formed, one could be alternatively inclined to interpret that Palaeo-Laplandic and Palaeo-Siberian populations formed a huge ancestral “Uralic” ghost cluster in Siberia (spanning from the Palaeo-Laplandic to the Palaeo-Siberian one), and from there expanded Finno-Samic on one hand, and “Volga-Ugro-Samoyed” on the other. That poses different problems: an obvious linguistic and archaeological one – which I assume a lot of people do not really care about – , and a not-so-obvious genetic one (see below for ancient samples and for the expansion of haplogroup N).
Unlike this PCA with ancient samples, where Bell Beaker clines could be a rough approximation to the real sources for each population, and where a cluster spanning all three depicted Early Bronze Age clusters could give a rough proximate source of European Bell Beakers in Hungary (and where one can even distinguish the Y-DNA bottlenecks in the L23 trunk created by each cline) the PCA of modern Uralic populations is probably not suitable for a good estimate of the ancient situation, which may be found shifted up or down of the drawn “Uralic” cluster along East European groups.
After all, we already know that the Siberian cline shows probably as much an ancient admixture event – from the original Uralic expansion to the east with Corded Ware ancestry – as another more recent one – a westward migration of Siberian ancestry (or even more than one). While we know with more or less exactitude what happened with the Palaeo-Laplandic admixture by expanding Proto-Finno-Samic populations (see here), the Proto-Ugric and Pre-Samoyedic populations formed probably more than one cline during the different ancient migrations through central Asia.
Apparently, the Corded Ware expansion to the east was not marked by a huge change in ancestry. While the final version of Narasimhan et al. (2018) may show a little more detail about other forest-steppe Seima-Turbino/Andronovo-related migrations (and thus also Eastern Uralic peoples), we have already had enough information for quite some time to get a good idea.
Mezhovska‘s position is similar to the later Pre-Scythian and Scythian populations. There are some interesting details: apart from haplogroup R1a-Z280 (CTS1211+), there is one R1b-M269 (PF6494+), probably Z2103, and an outlier (out of three) in a similar position to the recently described central/southern Scythian clusters.
NOTE. The finding of R1b-M269 in the forest-steppe is probably either 1) from an Afanasevo-Okunevo origin, or 2) from an admixture with neighbouring Andronovo-related populations, such as Sargary. A third, maybe less likely option is that this haplogroup admixed with Abashevo directly (as it happened in Sintashta, Potapovka, or Pokrovka) and formed part of early Uralic migrations. In any case, since Mezhovska is a Bronze Age society from the Urals region, its association with R1b-Z2103 – like the association of R1b-Z2103 in Scythian clusters – cannot be attributed to “Thracian peoples”, a link which is (as I already said) too simplistic.
The drawn “European cline” of Hungarians (see above), leading from ‘west-like’ Mansi to Hungarian populations – and hosting also Finnic and Estonian samples – , cannot therefore be attributed simply to late “Slavic/Balkan-like” admixture.
Karasuk – located further to the east – is basically also Corded Ware peoples showing clearly a recent admixture with local ANE / Baikal_EN-like populations. In terms of haplogroups it shows haplogroup Q, R1a-Z2124, and R1a-Z2123, later found among early Hungarians, and present also in ancient Samoyedic populations now acculturated.
The most interesting aspect of both Mezhovska and Karasuk is that they seem to diverge from a point close to Ukraine_Eneolithic, which is the supposed ancestral source of Corded Ware peoples (read more about the formation of “steppe ancestry”). This means that Eastern Uralians derive from a source closer to Middle Dnieper/Abashevo populations, rather than Battle Axe (shifted to Latvian Neolithic), which is more likely the source prevalent in Finno-Permic peoples.
Their initial admixture with (Palaeo-)Siberian populations is thus seen already starting by this time in Mezhovska and especially in Karasuk, but this process (compared to modern populations) is incomplete:
We know now that Samic peoples expanded during the Late Iron Age into Palaeo-Laplandic populations, admixing with them and creating this modern cline. Finns expanded later to the north (in one of their known genetic bottlenecks), admixing with (and displacing) the Saami in Finland, especially replacing their male lines.
So how did Ugric and Samoyedic peoples admix with Palaeo-Siberian populations further, to obtain their modern cline? The answer is, logically, with East Asian migrations related to forest-steppe populations of Central Asia after the Mezhovska and Karasuk periods, i.e. during the Iron Age and later. Other groups from the forest-steppe in Central Asia show similar East Asian (“Siberian”) admixture. We know this from Narasimhan et al. (2018):
(…) we observe samples from multiple sites dated to 1700-1500 BCE (Maitan, Kairan, Oy_Dzhaylau and Zevakinsikiy) that derive up to ~25% of their ancestry from a source related to present-day East Asians and the remainder from Steppe_MLBA. A similar ancestry profile became widespread in the region by the Late Bronze Age, as documented by our time transect from Zevakinsikiy and samples from many sites dating to 1500-1000 BCE, and was ubiquitous by the Scytho-Sarmatian period in the Iron Age.
Flegontov: Present day Turkic speakers fall into two clusters of admixture patterns (Uralic/Yenisean and Tungussic/Mngolic) based on genomic data with ancient Turks belonging almost exclusively to the first cluster. #ISBA8
The Ugric-speaking Sargat culture in Western Siberia shows the expected mixture of haplogroups (ca. 500 BC – 500 AD), with 5 samples of hg N and 2 of hg R1a1, in Pilipenko et al. (2017). Although radiocarbon dates and subclades are lacking, N lineages probably spread late, because of the late and gradual admixture of Siberian cultures into the Sargat melting pot.
The observed reduction in the genetic distance between the Middle Tagar population and other Scythian like populations of Southern Siberia(Fig 5; S4 Table), in our opinion, is primarily associated with an increase in the role of East Eurasian mtDNA lineages in the gene pool (up to nearly half of the gene pool) and a substantial increase in the joint frequency of haplogroups C and D (from 8.7% in the Early Tagar series to 37.5% in the Middle Tagar series). These features are characteristic of many ancient and modern populations of Southern Siberia and adjacent regions of Central Asia, including the Pazyryk population of the Altai Mountains.
Before the Iron Age, the Karasuk and Mezhovska population were probably already somehow ‘to the north’ within the ancient Steppe-Altai cline (see image below9 created by expanding Seima-Turbino- and Andronovo-related populations. During the Iron Age, further Siberian contributions with Iranian expansions must have placed Uralians of the Central Asian forest-steppe areas much closer to today’s Palaeo-Siberian cline.
However, the modern genetic picture was probably fully developed only in historic times, when Samoyedic and Ugric languages expanded to the north, only in part admixing further with Palaeo-Siberian-speaking nomads from the Circum-Arctic region (see here for a recent history of Samoyedic Enets), which justifies their more recent radical ‘northern shift’.
This late acquisition of the language by Palaeo-Siberian nomads (without much population replacement) also justifies the wide PCA clusters of very small Siberian populations. See for example in the PCA from Tambets et al. (2018):
For their relationship with modern Mansi, we have information on Hungarian conqueror populations from Neparáczki et al. (2018):
Moreover, Y, B and N1a1a1a1a Hg-s have not been detected in Finno-Ugric populations [80–84], implying that the east Eurasian component of the Conquerors and Finno-Ugric people are probably not directly related. The same inference can be drawn from phylogenetic data, as only two Mansi samples appeared in our phylogenetic trees on the side branches (S1 Fig, Networks; 1, 4) suggesting that ancestors of the Mansis separated from Asian ancestors of the Conquerors a long time ago. This inference is also supported by genomic Admixture analysis of Siberian and Northeastern European populations , which revealed that Mansis received their eastern Siberian genetic component approximately 5–7 thousand years ago from ancestors of modern Even and Evenki people. Most likely the same explanation applies to the Y-chromosome N-Tat marker which originated from China [86,87] and its subclades are now widespread between various language groups of North Asia and Eastern Europe .
The genetic picture of Hungarians (their formed cline with Mansi and their haplogroups) may be quite useful for the true admixture found originally in Mansi peoples at the beginning of the Iron Age. By now it is clear even from modern populations that Steppe_MLBA ancestry accompanied the Uralic expansion to the east (roughly approximated in the graphic with Afanasievo_EBA + Bichon_LP EasternHG_M):
It has been known for a long time that the Caucasus must have hosted many (at least partially) isolated populations, probably helped by geographical boundaries, setting it apart from open Eurasian areas.
David Reich writes in his book the following about India:
The genetic data told a clear story. Around a third of Indian groups experienced population bottlenecks as strong or stronger than the ones that occurred among Finns or Ashkenazi Jews. We later confirmed this finding in an even larger dataset that we collected working with Thangaraj: genetic data from more than 250 jati groups spread throughout India (…)
Rather than an invention of colonialism as Dirks suggested, long-term endogamy as embodied in India today in the institution of caste has been overwhelmingly important for millennia. (…)
The Han Chinese are truly a large population. They have been mixing freely for thousands of years. In contrast, there are few if any Indian groups that are demographically very large, and the degree of genetic differentiation among Indian jati groups living side by side in the same village is typically two to three times higher than the genetic differentiation between northern and southern Europeans. The truth is that India is composed of a large number of small populations.
There is little doubt now, based on findings spanning thousands of years, that the Mesolithic and Neolithic Caucasus hosted various very small populations, even if the ancestral components may be reduced to the few known to date (such as ANE, EHG, AME*, ENA, CHG, and other “deep” ancestral components).
NOTE. I will call the ancestral component of Dzudzuana/Anatolian hunter-gatherers Ancient Middle Easterner (AME), to give a clear idea of its likely extension during the Late Upper Palaeolithic, and to avoid using the more simplistic Dzudzuana, unless it is useful to mention these specific local samples.
Genetic labs have a strong fixation with ancestry. I guess the use of complex statistical methods gives professionals and laymen alike the feeling of dealing with “Science”, as opposed to academic fields where you have to interpret data. I think language reveals a lot about the way people think, and the fact that ancestral components are called ‘lineages’ – while not wrong per se – is a clear symptom of the lack of interest in the true lineages: Y-DNA haplogroups.
It has become quite clear that male-biased migrations are often the ones which can be confidently followed for actual population movements and ethnolinguistic identification, at least until the Iron Age. The frequently used Palaeolithic clusters offer a clear example of why ancestry does not represent what some people believe: They merely give a basic idea of sizeable population replacements by distant peoples.
Both concepts are important: sizeable and distant peoples. For example, during the Upper Palaeolithic in Europe there was a sizeable population replacement of the Aurignacian Goyet cluster by the Gravettian Vestonice cluster (probably from populations of far eastern Russia) coupled with the arrival of haplogroup I, although during the thousands of years that this material culture lasted, the previously expanded C1a2 lineages did not disappear, and there were probably different resurgence and admixture events.
Haplogroup I certainly expanded with the Gravettian culture to Iberia, where the Goyet ancestry did not change much – probably because of male-driven migrations -, to the extent that during the Magdalenian expansions haplogroup I expanded with an ancestry closer to Goyet, in what is called a ‘resurge’ of the Goyet cluster – even though there is a clear replacement of male lines.
The Villabruna (WHG) cluster is another good example. It probably spread with haplogroup R1b-L754, which – based on the extra ‘East Asian’ affinity of some samples and on modern samples from the Middle East – came probably from the east through a southern route, and not too long before the expansion of WHG likely from around the Black Sea, although this is still unclear. The finding of haplogroup I in samples of mostly WHG ancestry could confuse people that do not care about timing, sub-structured populations, and gene flow.
NOTE. If you don’t understand why ‘clusters’ that span thousands of years don’t really matter for the many Palaeolithic population expansions that certainly happened among hunter-gatherers in Europe, just take a look at what happened with Bell Beakers expanding from Yamna into western Europe within 500 years.
If we don’t thread carefully when talking about population migrations, these terms are bound to confuse people. Just as the fixation on “steppe ancestry” – which marks the arrival in Chalcolithic Europe of peoples from the Pontic-Caspian region – has confused a lot of researchers to this day.
When I began to write about the Indo-European demic diffusion model, my concern was to find a single spot where a North-West Indo-European proto-language could have expanded from ca. 2000 BC (our most common guesstimate). Based on the 2015 papers, and in spite of their conclusions, I thought it had become clear that Corded Ware was not it, and it was rather Bell Beakers. I assumed that Uralic was spoken to the north (as was the traditional belief), and thus Corded Ware expanded from the forest zone, hence steppe ancestry would also be found there with other R1a lineages.
With the publication of Mathieson et al. (2017) and Olalde et al. (2017), I changed my mind, seeing how “steppe ancestry” did in fact appear quite late, hence it was likely to be the result of very specific population movements, probably directly from the Caucasus. Later, Mathieson published in a revision the sample from Alexandria of hg R1a-M417 (probably R1a-Z645, possibly Z93+), which further supported the idea that the migration of Corded Ware peoples started near the North Pontic forest-steppe (as I included in a the next revision).
The question remains the same I repeated recently, though: where do the extra Caucasus components (i.e. beyond EHG) of Eneolithic Ukraine/Corded Ware and Khvalynsk/Yamna come from?
Considering 2-way mixtures, we can model Karelia_HG as deriving 34 ± 2.8% of its ancestry from a Villabruna-related source, with the remainder mainly from ANE represented by the AfontovaGora3 (AG3) sample from Lake Baikal ~17kya.
AG3 was likely of haplogroup Q1a (as reported by YFull, see Genetiker), and probably the ANE ancestry found in Eastern Europe accompanied a Palaeolithic migration of Q1a2-M25 (formed ca. 22600 BC, TMRCA ca. 14300 BC).
Combined with what we know about the Eneolithic Steppe and Caucasus populations – it is likely that ANE ancestry remained the most important component of some of the small ghost populations of the Caucasus until their emergence with the Lola culture.
The first sample we have now attributed to the EHG cluster is Sidelkino, from the Samara region (ca. 9300 BC), mtDNA U5a2. In Damgaard et al. (Science 2018), Yamnaya could be modelled as a CHG population related to Kotias Klde (54%) and the remaining from ANE population related to Sidelkino (>46%), with the following split events:
A split event, where the CHG component of Yamnaya splits from KK1. The model inferred this time at 27 kya (though we note the larger models in Sections S2.12.4 and S2.12.5 inferred a more recent split time).
A split event, where the ANE component of Yamnaya splits from Sidelkino. This was inferred at about about 11 kya.
A split event, where the ANE component of Yamnaya splits from Botai. We inferred this to occur 17 kya. Note that this is above the Sidelkino split time, so our model infers Yamnaya to be more closely related to the EHG Sidelkino, as expected.
An ancestral split event between the CHG and ANE ancestral populations. This was inferred to occur around 40 kya.
Other samples classified as of the EHG cluster:
Popovo2 (ca. 6250 BC) of hg J1, mtDNA U4d – Po2 and Po4 from the same site (ca. 6550 BC) show continuity of mtDNA.
Karelia_HG, from Juzhnii Oleni Ostrov (ca. 6300 BC): I0211/UzOO40 (ca. 6300 BC) of hg J1(xJ1a), mtDNA U4a; and I0061/UzOO74 of hg R1a1(xR1a1a), mtDNA C1
UzOO77 and UzOO76 from Juzhnii Oleni Ostrov (ca. 5250 BC) of mtDNA R1b.
Samara_HG from Lebyanzhinka (ca. 5600 BC) of hg R1b1a, mtDNA U5a1d.
About the enigmatic Anatolia_Neolithic-related ancestry found in Pontic-Caspian steppe samples, this is what Wang et al. (2018) had to say:
We focused on model of mixture of proximal sources such as CHG and Anatolian Chalcolithic for all six groups of the Caucasus cluster (Eneolithic Caucasus, Maykop and Late Makyop, Maykop-Novosvobodnaya, Kura-Araxes, and Dolmen LBA), with admixture proportions on a genetic cline of 40-72% Anatolian Chalcolithic related and 28-60% CHG related (Supplementary Table 7). When we explored Romania_EN and Greece_Neolithic individuals as alternative southeast European sources (30-46% and 36-49%), the CHG proportions increased to 54-70% and 51-64%, respectively. We hypothesize that alternative models, replacing the Anatolian Chalcolithic individual with yet unsampled populations from eastern Anatolia, South Caucasus or northern Mesopotamia, would probably also provide a fit to the data from some of the tested Caucasus groups.
The first appearance of ‘Near Eastern farmer related ancestry’ in the steppe zone is evident in Steppe Maykop outliers. However, PCA results also suggest that Yamnaya and later groups of the West Eurasian steppe carry some farmer related ancestry as they are slightly shifted towards ‘European Neolithic groups’ in PC2 (Fig. 2D) compared to Eneolithic steppe. This is not the case for the preceding Eneolithic steppe individuals. The tilting cline is also confirmed by admixture f3-statistics, which provide statistically negative values for AG3 as one source and any Anatolian Neolithic related group as a second source
Detailed exploration via D-statistics in the form of D(EHG, steppe group; X, Mbuti) and D(Samara_Eneolithic, steppe group; X, Mbuti) show significantly negative D values for most of the steppe groups when X is a member of the Caucasus cluster or one of the Levant/Anatolia farmer-related groups (Supplementary Figs. 5 and 6). In addition, we used f- and D-statistics to explore the shared ancestry with Anatolian Neolithic as well as the reciprocal relationship between Anatolian- and Iranian farmer-related ancestry for all groups of our two main clusters and relevant adjacent regions (Supplementary Fig. 4). Here, we observe an increase in farmer-related ancestry (both Anatolian and Iranian) in our Steppe cluster, ranging from Eneolithic steppe to later groups. In Middle/Late Bronze Age groups especially to the north and east we observe a further increase of Anatolian farmer related ancestry consistent with previous studies of the Poltavka, Andronovo, Srubnaya and Sintashta groups and reflecting a different process not especially related to events in the Caucasus.
(…) Surprisingly, we found that a minimum of four streams of ancestry is needed to explain all eleven steppe ancestry groups tested, including previously published ones (Fig. 2; Supplementary Table 12). Importantly, our results show a subtle contribution of both Anatolian farmer-related ancestry and WHG-related ancestry (Fig.4; Supplementary Tables 13 and 14), which was likely contributed through Middle and Late Neolithic farming groups from adjacent regions in the West. The discovery of a quite old AME ancestry has rendered this probably unnecessary, because this admixture from an Anatolian-like ghost population could be driven even by small populations from the Caucasus.
While it is not yet fully clear, the increased Anatolian_Neolithic-like ancestry in Ukraine_Eneolithic samples (see below) makes it unlikely that all such ancestry in Corded Ware groups comes from a GAC-related contribution. It is likely that at least part of it represents contributions from populations of the Caucasus, based on the mostly westward population movements in the steppe from ca. 4600 BC on, including the Suvorovo-Novodanilovka expansion, and especially the Kuban-Maykop expansion during the final Eneolithic into the North Pontic area.
NOTE. Since CHG-like groups from the Caucasus may have combinations of AME and ANE ancestry similar to Yamna (which may thus appear as ‘steppe ancestry’ in the North Pontic area), it is impossible to interpret with precision the following ADMIXTURE graphic:
The East Asian contribution to samples from the WHG samples (like Loschbour or La Braña), as specified in Fu et al. (2016), does not seem to be related to Baikal_EN, and appears possibly (in the ADMIXTURE analysis) integrated into he Villabruna component. I guess this implies that the shared alleles with East Asians are quite early, and potentially due to the expansion of R1b-L754 from the East.
It would be interesting to know the specific material culture Sidelkino belonged to – i.e. if it was related to the expansion of the North-Eastern Technocomplex – , and its Y-DNA. The Post-Swiderian expansion into eastern Europe, probably associated with the expansion of R1b-P297 lineages (including R1b-M73, found later in Botai and in Baltic HG) is supposed to have begun during the 11th millennium BC, but migrations to the Urals and beyond are probably concentrated in the 9th millennium, so this sample is possibly slightly early for R1b.
NOTE. User Rozenfeld at Anthrogenica posted this, which I think is interesting (in case anyone wants to try a Y-SNP call):
there is something strange with Sidelkino EHG: first, its archaeological context is not described in the supplementary. Second, its sex is not listed in the supplementary tables. Third, after looking for info about this sample, I found that: “Сиделькино-3. Для снятия вопроса о половой принадлежности индивида была проведена генетическая экспертиза, выявившая принадлежность останков мужчине.”(translation: Sidelkino-3. To resolve the question about sex of the remains, the genetic analysis was conducted, which showed that remains belonged to male), source: http://static.iea.ras.ru/books/7487_Traditsii.pdf
So either they haven’t mentioned his Y-DNA in the paper for some reason, or there are more than one Sidelkino sample and the male one has not yet been published. The coverage of the Sidelkino sample from the paper is 2.9, more than enough to tell Y-DNA haplogroup.
My speculative guess right now about specific population movements in far eastern Europe, based on the few data we have:
The expansion of the North-Eastern Technocomplex first around the 9th millennium BC, most likely expanded R1b-P279 ca. 11300 BC, judging by its TMRCA, with both R1b-M73 (TMRCA 5300) and R1b-M269 (TMRCA 4400 BC) info (with extra El Mirón ancestry) back, and thus Eurasiatic.
The expansion of haplogroup J1 to the north may have happened before or after the R1b-P279 expansion. Judging by the increase in AG3-related ancestry near Karelia compared to Baltic_HG, it is possible that it expanded just after R1b-P279 (hence possibly J1-Y6304? TMRCA 9700 BC). Its long-lasting presence in the Caucasus is supported by the Satsurblia (ca. 11300 BC) and the Dolmen BA (ca. 1300 BC) samples.
The expansion of R1a-M17 ca. 6600 BC is still likely to have happened from the east, based on the R1a-M17 samples found in Baikalic cultures slightly later (ca. 5300 BC). The presence of elevated Baikal_EN ancestry in Karelia HG and in Samara HG, and the finding of R1a-M417 samples in the Forest Zone after the Mesolithic suggests a connection with the expansion of Hunter-Gatherer pottery, from the Elshanka culture in the Samara region northward into the Forset Zone and westward into the North Pontic area.
The expansion of R1b-M73 ca. 5300 BC is likely to be associated with the emergence of a group east of the Urals (related to the later Botai culture, and potentially Pre-Yukaghir). Its presence in a Narva sample from Donkalnis (ca. 5200 BC) suggest either an early split and spread of both R1b-P297 lineages (M73 and M269) through Eastern Europe, or maybe a back-migration with hunter-gatherer pottery.
R1b-M269 spread successfully ca. 4400 BC (and R1b-L23 ca. 4100 BC, both based on TMRCA), and this successful expansion is probably to be associated with the Khvalynsk-Novodanilovka expansion. We already know that Samara_HG ca. 5600 was R1b1a, so it is likely that R1b-M269 appeared (or ‘resurged’) in the Volga-Ural region shortly after the expansion of R1a-M17, whose expansion through the region may be inferred by the additional AG3 and Baikal_EN ancestry. Interesting from Samara_HG compared to the previous Sidelkino sample is the introduction of more El Mirón-related ancestry, typical of WHG populations (and thus proper of Baltic groups).
NOTE. The TMRCA dates are obviously gross approximations, because a) the actual rate of mutation is unknown and b) TMRCA estimates are based on the convergence of lineages that survived. The potential finding of R1a-Z645 (possibly Z93+) in Ukraine Eneolithic (ca. 4000 BC), and the potential finding of R1b-L23 in Khvalynsk ca. 4250 BC complicates things further, in terms of dates and origins of any subclade.
The question thus remains as it was long ago: did R1b-M269 lineages expand (‘return’) from the east, near the Urals, or directly from the north? Were they already near Samara at the same time as the expansion of hunter-gatherer pottery, and were not much affected by it? Or did they ‘resurge’ from populations admixed with Caucasus-related ancestry after the expansion of R1a-M17 with this pottery (since there are different stepped expansions from the Samara region)? We could even ask, did R1a-M17 really expand from the east, i.e. are the dates on Baikalic subclades from Moussa et al. (2016) reliable? Or did R1a-M17 expand from some pockets in the Pontic-Caspian steppe, taking over the expansion of HG pottery at some point?
The most interesting aspect from the new paper (regarding Indo-Uralic migrations) is that Ancestral Middle Easterner ancestry will probably be a better proxy for the Anatolia_Neolithic component found in Ukraine Mesolithic to Eneolithic, and possibly also for some of the “more CHG-like” component found among Pontic-Caspian steppe populations, all likely derived from different admixture events with groups from the Caucasus.
NOTE. Even the supposed gene flow of Neolithic Iranian ancestry into the Caucasus can be put into question, since that means possibly a Dzudzuana-like population with greater “deep ancestry” proportion than the one found in CHG, which may still be found within the Caucasus.
If it was not clear already that following ‘steppe ancestry’ wherever it appears is a rather lame way of following Indo-European migrations, every single sample from the Caucasus and their admixture with Pontic-Caspian steppe populations will probably show that “steppe ancestry” is in fact formed by a variety of steppe-related ancestral components, impossible to follow coherently with a single population. Exactly what is happening already with the Siberian ancestry.
If the paper on the Dzudzuana samples has shown something, is that the expansion of an ANE-like population shook the entire Caucasus area up to the Zagros Mountains, creating this ANE – AME cline that are CHG and Iran_N, with further contributions of “deep ancestries” (probably from the south) complicating the picture further.
If this happens with few known samples, and we know of an ANE-like ghost population in the Caucasus (appearing later in the Lola culture), we can already guess that the often repeated “CHG component” found in Ukraine_Eneolithic and Khvalynsk will not be the same (except the part mediated by the Novodanilovka expansion).
This ANE-like expansion happened probably in the Late Upper Palaeolithic, and reached Northern Europe probably after the expansion of the Villabruna cluster (ca. 12000 BC), judging by the advance of AG3-like and ENA-like ancestry in later WHG samples.
The population movements during the Mesolithic and Early Neolithic in the North Pontic area are quite complicated: the extra AME ancestry is probably connected to the admixture with populations from the Caucasus, while the close similarity of Ukraine populations with Scandinavian ones (with an increase in Villabruna ancestry from Mesolithic to Neolithic samples), probably reveal population movements related to the expansion of Maglemose-related groups.
These Maglemose-related groups were probably migrants from the north-west, originally from the Northern European Plains, who occupied the previous Swiderian territory, and then expanded into the North Pontic area. The overwhelming presence of I2a (likely all I2a2a1b1b) lineages in Ukraine Neolithic supports this migration.
The likely picture of Mesolithic-Neolithic migrations in the North Pontic area right now is then:
Expansion of R1a-M459 from the east ca. 12000 BC – probably coupled with AG3 and also some Baikal_EN ancestry. First sample is I1819 from Vasilievka (ca. 8700 BC), another is from Dereivka ca. 6900 BC.
Expansion of R1b-V88 from the Balkans in the west ca. 9700 BC, based on its TMRCA and also the Balkan hunter-gatherer population overwhemingly of this haplogroup from the 10th millennium until the Neolithic. First sample is I1734 from Vasilievka (ca. 7252 BC), which suggests that it replaced the male population there, based on their similar EHG-like adxmixture (and lack of sizeable WHG increase), and shared mtDNA U5b2, U5a2.
Expansion of I2a-Y5606 probably ca. 6800 based on its TMRCA with Janislawice culture. Supporting this is the increase in WHG contribution to Neolithic samples, including the spread of U4 subclades compared to the previous period.
Expansion of R1a-M17 starting probably ca. 6600 BC in the east (see above).
NOTE. The first sample of haplogroup I appears in the Mesolithic: I1763 (ca. 8100 BC) of haplogroup I2a1, probably related to an older Upper Palaeolithic expansion.
It is becoming more and more clear with each new paper that – unless the number of very ancient samples increases – the use of Y-chromosome haplogroups remains one of the most important tools for academics; this is especially so in the steppes, in light of the diversity found in populations from the Caucasus. A clear example comes from the Yamna – Corded Ware similarities:
The presence of haplogroups Q and R1a-M459 (xM17) in Khvalynsk along with a R1b1a sample, which some interpreted as being akin to modern ‘mixed’ populations in the past, is likely to point instead to a period of Khvalynsk-Novodanilovka expansion with R1b-M269, where different small populations from the steppe were being integrated into the common Khvalynsk stock, but where differences are seen in material culture surrounding their burials, as supported by the finding of R1b1 in the Kuban area already in the first half of the 5th millennium. The case would be similar to the early ‘mixed’ Icelandic population.
Only after the emergence of the Samara culture (in the second half of the 6th millennium BC), with a sample of haplogroup R1b1a, starts then the obvious connection with Early Proto-Indo-Europeans; and only after the appearance of late Sredni Stog and haplogroup R1a-M417 (ca. 4000 BC) is its connection with Uralic also clear. In previous population movements, I think more haplogroups were involved in migrations of small groups, and only some communities among them were eventually successful, expanding to be dominant, creating ever growing cultures during their expansions.
Indeed, if you think in terms of Uralic and Indo-European just as converging languages, and forget their potential genetic connection, then the genetic + linguistic picture becomes simplified, and the upper frontier of the 6th millennium BC with a division North Pontic (Mariupol) vs. Volga-Ural (Samara) is enough. However, tracing their movements backwards – with cultural expansions from west to east (with the expansion of farming), and earlier east to west (with hunter-gatherer pottery), and still earlier west to east (with the north-eastern technocomplex), offers an interesting way to prove their potential connection to macrofamilies, at least in terms of population movements.
I am quite convinced right now that it would be possible to connect the expansion of R1b-L754 subclades with a speculative Nostratic (given the R1b-V88 connection with Afroasiatic, and the obvious connection of R1b-L297 with Eurasiatic). Paradoxically, the connection of an Indo-Uralic community in the steppes (after the separation of Yukaghir) with any lineage expansion (R1a-M17, R1b-M269, or even Q, I or J1) seems somehow blurrier than one year ago, possibly just because there are too many open possibilities.
David Reich says about the admixture with Neanderthals, which he helped discover:
At the conclusion of the Neanderthal genome project, I am still amazed by the surprises we encountered. Having found the first evidence of interbreeding between Neanderthals and modern humans, I continue to have nightmares that the finding is some kind of mistake. But the data are sternly consistent: the evidence for Neanderthal interbreeding turns out to be everywhere. As we continue to do genetic work, we keep encountering more and more patterns that reflect the extraordinary impact this interbreeding has had on the genomes of people living today.
I think this is a shared feeling among many of us who have made proposals about anything, to fear that we have made a gross, evident mistake, and constantly look for flaws. However, it seems to me that geneticists are more preoccupied with being wrong in their developed statistical methods, in the theoretical models they are creating, and not so much about errors in the true ancient ethnolinguistic picture human population genetics is (at least in theory) concerned about. Their publications are, after all, constantly associating genetic finds with cultures and (whenever possible) languages, so this aspect of their research should not be taken lightly.
Seeing how David Anthony or Razib Khan (among many others) have changed their previously preferred migration models as new data was published, and they continue to be respected in their own fields, I guess we can be confident that professionals with integrity are going to accept whatever new picture appears. While I don’t think that genetic finds can change what we can reconstruct with comparative grammar, I am also ready to revise guesstimates and routes of expansion of certain dialects if R1a-Z645 is shown to have accompanied Late Proto-Indo-Europeans during their expansion with Yamna, and later integrated somehow with Corded Ware.
However, taking into account the obsession of some with an ancestral, uninterrupted R1a—Indo-European association, and the lack of actual political repercussion of Neanderthal admixture, I think the most common nightmare that all genetic researchers should be worried about is to keep inflating this “Yamnaya ancestry”-based hornet’s nest, which has been constantly stirred up for the past two years, by rejecting it – or, rather, specifying it into its true complex nature.
This succession of corrections and redefinitions, coupled with the distinct Y-DNA bottleneck of each steppe population, will eventually lead to a completely different ethnolinguistic picture of the Pontic-Caspian region during the Eneolithic, which is likely to eventually piss off not only reasonable academics stubbornly attached to the CWC-IE idea, but also a part of those interested in daydreaming about their patrilineal ancestors.
Sometimes it’s better to just rip off the band-aid once and for all…
Experienced researchers, particularly those interested in population structure and historical inference, typically present STRUCTURE results alongside other methods that make different modelling assumptions. These include TreeMix, ADMIXTUREGRAPH, fineSTRUCTURE, GLOBETROTTER, f3 and D statistics, amongst many others. These models can be used both to probe whether assumptions of the model are likely to hold and to validate specific features of the results. Each also comes with its own pitfalls and difficulties of interpretation. It is not obvious that any single approach represents a direct replacement as a data summary tool. Here we build more directly on the results of STRUCTURE/ADMIXTURE by developing a new approach, badMIXTURE, to examine which features of the data are poorly fit by the model. Rather than intending to replace more specific or sophisticated analyses, we hope to encourage their use by making the limitations of the initial analysis clearer.
The default interpretation protocol
Most researchers are cautious but literal in their interpretation of STRUCTURE and ADMIXTURE results, as caricatured in Fig. 1, as it is difficult to interpret the results at all without making several of these assumptions. Here we use simulated and real data to illustrate how following this protocol can lead to inference of false histories, and how badMIXTURE can be used to examine model fit and avoid common pitfalls.
STRUCTURE and ADMIXTURE are popular because they give the user a broad-brush view of variation in genetic data, while allowing the possibility of zooming down on details about specific individuals or labelled groups. Unfortunately it is rarely the case that sampled data follows a simple history comprising a differentiation phase followed by a mixture phase, as assumed in an ADMIXTURE model and highlighted by case study 1. Naïve inferences based on this model (the Protocol of Fig. 1) can be misleading if sampling strategy or the inferred value of the number of populations K is inappropriate, or if recent bottlenecks or unobserved ancient structure appear in the data. It is therefore useful when interpreting the results obtained from real data to think of STRUCTURE and ADMIXTURE as algorithms that parsimoniously explain variation between individuals rather than as parametric models of divergence and admixture.
For example, if admixture events or genetic drift affect all members of the sample equally, then there is no variation between individuals for the model to explain. Non-African humans have a few percent Neanderthal ancestry, but this is invisible to STRUCTURE or ADMIXTURE since it does not result in differences in ancestry profiles between individuals. The same reasoning helps to explain why for most data sets—even in species such as humans where mixing is commonplace—each of the K populations is inferred by STRUCTURE/ADMIXTURE to have non-admixed representatives in the sample. If every individual in a group is in fact admixed, then (with some exceptions) the model simply shifts the allele frequencies of the inferred ancestral population to reflect the fraction of admixture that is shared by all individuals.
Several methods have been developed to estimate K, but for real data, the assumption that there is a true value is always incorrect; the question rather being whether the model is a good enough approximation to be practically useful. First, there may be close relatives in the sample which violates model assumptions. Second, there might be “isolation by distance”, meaning that there are no discrete populations at all. Third, population structure may be hierarchical, with subtle subdivisions nested within diverged groups. This kind of structure can be hard for the algorithms to detect and can lead to underestimation of K. Fourth, population structure may be fluid between historical epochs, with multiple events and structures leaving signals in the data. Many users examine the results of multiple K simultaneously but this makes interpretation more complex, especially because it makes it easier for users to find support for preconceptions about the data somewhere in the results.
In practice, the best that can be expected is that the algorithms choose the smallest number of ancestral populations that can explain the most salient variation in the data. Unless the demographic history of the sample is particularly simple, the value of K inferred according to any statistically sensible criterion is likely to be smaller than the number of distinct drift events that have practically impacted the sample. The algorithm uses variation in admixture proportions between individuals to approximately mimic the effect of more than K distinct drift events without estimating ancestral populations corresponding to each one. In other words, an admixture model is almost always “wrong” (Assumption 2 of the Core protocol, Fig. 1) and should not be interpreted without examining whether this lack of fit matters for a given question.
Because STRUCTURE/ADMIXTURE accounts for the most salient variation, results are greatly affected by sample size in common with other methods. Specifically, groups that contain fewer samples or have undergone little population-specific drift of their own are likely to be fit as mixes of multiple drifted groups, rather than assigned to their own ancestral population. Indeed, if an ancient sample is put into a data set of modern individuals, the ancient sample is typically represented as an admixture of the modern populations (e.g., ref. 28,29), which can happen even if the individual sample is older than the split date of the modern populations and thus cannot be admixed.
This paper was already available as a preprint in bioRxiv (first published in 2016) and it is incredible that it needed to wait all this time to be published. I found it weird how reviewers focused on the “tone” of the paper. I think it is great to see files from the peer review process published, but we need to know who these reviewers were, to understand their whiny remarks… A lot of geneticists out there need to develop a thick skin, or else we are going to see more and more delays based on a perceived incorrect tone towards the field, which seems a rather subjective reason to force researchers to correct a paper.
A potential hindrance to our advice to upgrade from PCA graphs to PCA biplots is that the SNPs are often so numerous that they would obscure the Items if both were graphed together. One way to reduce clutter, which is used in several figures in this article, is to present a biplot in two side-by-side panels, one for Items and one for SNPs. Another stratagem is to focus on a manageable subset of SNPs of particular interest and show only them in a biplot in order to avoid obscuring the Items. A later section on causal exploration by current methods mentions several procedures for identifying particularly relevant SNPs.
One of several data transformations is ordinarily applied to SNP data prior to PCA computations, such as centering by SNPs. These transformations make a huge difference in the appearance of PCA graphs or biplots. A SNPs-by-Items data matrix constitutes a two-way factorial design, so analysis of variance (ANOVA) recognizes three sources of variation: SNP main effects, Item main effects, and SNP-by-Item (S×I) interaction effects. Double-Centered PCA (DC-PCA) removes both main effects in order to focus on the remaining S×I interaction effects. The resulting PCs are called interaction principal components (IPCs), and are denoted by IPC1, IPC2, and so on. By way of preview, a later section on PCA variants argues that DC-PCA is best for SNP data. Surprisingly, our literature survey did not encounter even a single analysis identified as DC-PCA.
The axes in PCA graphs or biplots are often scaled to obtain a convenient shape, but actually the axes should have the same scale for many reasons emphasized recently by Malik and Piepho . However, our literature survey found a correct ratio of 1 in only 10% of the articles, a slightly faulty ratio of the larger scale over the shorter scale within 1.1 in 12%, and a substantially faulty ratio above 2 in 16% with the worst cases being ratios of 31 and 44. Especially when the scale along one PCA axis is stretched by a factor of 2 or more relative to the other axis, the relationships among various points or clusters of points are distorted and easily misinterpreted. Also, 7% of the articles failed to show the scale on one or both PCA axes, which leaves readers with an impressionistic graph that cannot be reproduced without effort. The contemporary literature on PCA of SNP data mostly violates the prohibition against stretching axes.
The percentage of variation captured by each PC is often included in the axis labels of PCA graphs or biplots. In general this information is worth including, but there are two qualifications. First, these percentages need to be interpreted relative to the size of the data matrix because large datasets can capture a small percentage and yet still be effective. For example, for a large dataset with over 107,000 SNPs for over 6,000 persons, the first two components capture only 0.3693% and 0.117% of the variation, and yet the PCA graph shows clear structure (Fig 1A in ). Contrariwise, a PCA graph could capture a large percentage of the total variation, even 50% or more, but that would not guarantee that it will show evident structure in the data. Second, the interpretation of these percentages depends on exactly how the PCA analysis was conducted, as explained in a later section on PCA variants. Readers cannot meaningfully interpret the percentages of variation captured by PCA axes when authors fail to communicate which variant of PCA was used.
Five simple recommendations for effective PCA analysis of SNP data emerge from this investigation.
Use the SNP coding 1 for the rare or minor allele and 0 for the common or major allele.
Use DC-PCA; for any other PCA variant, examine its augmented ANOVA table.
Report which SNP coding and PCA variant were selected, as required by contemporary standards in science for transparency and reproducibility, so that readers can interpret PCA results properly and reproduce PCA analyses reliably.
Produce PCA biplots of both Items and SNPs, rather than merely PCA graphs of only Items, in order to display the joint structure of Items and SNPs and thereby to facilitate causal explanations. Be aware of the arch distortion when interpreting PCA graphs or biplots.
Produce PCA biplots and graphs that have the same scale on every axis.
I read the referenced paper Biplots: Do Not Stretch Them!, by Malik and Piepho (2018), and even though it is not directly applicable to the most commonly available PCA graphs out there, it is a good reminder of the distorting effects of stretching. So for example quite recently in Krause-Kyora et al. (2018), where you can see Corded Ware and BBC samples from Central Europe clustering with samples from Yamna:
NOTE. This is related to a vertical distorsion (i.e. horizontal stretching), but possibly also to the addition of some distant outlier sample/s.
The so-called ‘Yamnaya’ ancestry
Every time I read papers like these, I remember commenters who kept swearing that genetics was the ultimate science that would solve anthropological problems, where unscientific archaeology and linguistics could not. Well, it seems that, like radiocarbon analysis, these promising developing methods need still a lot of refinement to achieve something meaningful, and that they mean nothing without traditional linguistics and archaeology… But we already knew that.
Also, if this is happening in most peer-reviewed publications, made by professional geneticists, in journals of high impact factor, you can only wonder how many more errors and misinterpretations can be found in the obscure market of so many amateur geneticists out there. Because amateur geneticist is a commonly used misnomer for people who are not geneticists (since they don’t have the most basic education in genetics), and some of them are not even ‘amateurs’ (because they are selling the outputs of bioinformatic tools)… It’s like calling healers ‘amateur doctors’.
NOTE. While everyone involved in population genetics is interested in knowing the truth, and we all have our confirmation (and other kinds of) biases, for those who get paid to tell people what they want to hear, and who have sold lots of wrong interpretations already, the incentives of ‘being right’ – and thus getting involved in crooked and paranoid behaviour regarding different interpretations – are as strong as the money they can win or loose by promoting themselves and selling more ‘product’.
As a reminder of how badly these wrong interpretations of genetic results – and the influence of the so-called ‘amateurs’ – can reflect on research groups, yet another turn of the screw by the Copenhagen group, in the oral presentations at Languages and migrations in pre-historic Europe (7-12 Aug 2018), organized by the Copenhagen University. The common theme seems to be that Bell Beaker and thus R1b-L23 subclades do represent a direct expansion from Yamna now, as opposed to being derived from Corded Ware migrants, as they supported before.
NOTE. Yes, the “Yamna → Corded Ware → Únětice / Bell Beaker” migration model is still commonplace in the Copenhagen workgroup. Yes, in 2018. Guus Kroonen had already admitted they were wrong, and it was already changed in the graphic representation accompanying a recent interview to Willerslev. However, since there is still no official retraction by anyone, it seems that each member has to reject the previous model in their own way, and at their own pace. I don’t think we can expect anyone at this point to accept responsibility for their wrong statements.
I love the newly invented arrows of migration from Yamna to the north to distinguish among dialects attributed by them to CWC groups, and the intensive use of materials from Heyd’s publications in the presentation, which means they understand he was right – except for the fact that they are used to support a completely different theory, radically opposed to those defended in Heyd’s model…
Now added to the Copenhagen’s unending proposals of language expansions, some pearls from the oral presentation:
Corded Ware north of the Carpathians of R1a lineages developed Germanic;
R1b borugh [?] Italo-Celtic;
the increase in steppe ancestry on north European Bell Beakers mean that they “were a continuation of the Yamnaya/Corded Ware expansion”;
“Corded Ware groups  stopped their expansion and took over the Bell Beaker package before migrating to England” [yep, it literally says that];
Italo-Celtic expanded to the UK and Iberia with Bell Beakers [I guess that included Lusitanian in Iberia, but not Messapian in Italy; or the opposite; or nothing like that, who knows];
2nd millennium BC Bronze Age Atlantic trade systems expanded Proto-Celtic [yep, trade systems expanded the language]
1st millennium BC expanded Gaulish with La Tène, including a “Gaulish version of Celtic to Ireland/UK” [hmmm, datBritish Gaulish indeed].
You know, because, why the hell not? A logical, stable, consequential, no-nonsense approach to Indo-European migrations, as always.
Also, compare still more invented arrows of migrations, from Mikkel Nørtoft’s Introducing the Homeland Timeline Map, going against Kristiansen’s multiple arrows, and even against the own recent fantasy map series in showing Bell Beakers stem from Yamna instead of CWC (or not, you never truly know what arrows actually mean):
I really, really loved that perennial arrow of migration from Volosovo, ca. 4000-800 BC (3000+ years, no less!), representing Uralic?, like that, without specifics – which is like saying, “somebody from the eastern forest zone, somehow, at some time, expanded something that was not Indo-European to Finland, and we couldn’t care less, except for the fact that they were certainly not R1a“.
This and Kristiansen’s arrows are the most comical invented migration routes of 2018; and that is saying something, given the dozens of similar maps that people publish in forums and blogs each week.
It’s hard to accept that this is a series of presentations made by professional linguists, archaeologists, and geneticists, as stated by the official website, and still harder to imagine that they collaborate within the same professional workgroup, which includes experienced geneticists and academics.
I propose the following video to close future presentations introducing innovative ideas like those above, to help the audience find the appropriate mood: