Grotta d’Oriente is a small coastal cave located on the island of Favignana, the largest (~20 km²) of a group of small islands forming the Egadi Archipelago, ~5 km from the NW coast of Sicily.
The Oriente C funeral pit opens in the lower portion of layer 7, specifically sublayer 7D. Two radiocarbon dates on charcoal from sublayers 7D (12149±65 uncal. BP) and 7E (12132±80 uncal. BP) are consistent with the associated Late Epigravettian lithic assemblages (Lo Vetro and Martini, 2012; Martini et al., 2012b) and refer the burial to a period between about 14200 and 13800 cal. BP, when Favignana was connected to the main island (Agnesi et al., 1993; Antonioli et al., 2002; Mannino et al., 2014).
The anatomical features of Oriente C are close to those of Late Upper Palaeolithic populations of the Mediterranean and show strong affinity with other Palaeolithic individuals of Sicily. As suggested by Henke (1989) and Fabbri (1995) the hunter-gatherer populations were morphologically rather uniform.
We confirmed the originally reported mitochondrial haplogroup assignment of U2’3’4’7’8’9. This haplogroup is present in both pre- and post-LGM populations, but is rare by the Mesolithic, when U5 dominates (Posth et al., 2016).
Lipson et al. (2018) (their supplementary Figure S5.1) and Villalba-Mouco et al. (2019) (their Figure 2A) showed that European Late Palaeolithic and Mesolithic hunter-gatherers fall along two main axes of genetic variation. Multidimensional scaling (MDS) of f3-statistics shows that these axes form a “V” shape (Fig. 3). (…)
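To make the MDS step less abstract, here is a minimal sketch of classical (Torgerson) MDS applied to a matrix of pairwise shared-drift values. It is not the pipeline used in the paper. The function names are hypothetical, and turning f3 values into distances by subtracting them from the matrix maximum is an assumption made only for illustration.

```python
import numpy as np

def f3_to_distance(f3):
    """Turn a shared-drift (similarity) matrix into a distance matrix.

    Assumption: distance = max(f3) - f3, with a zero diagonal.
    """
    f3 = np.asarray(f3, dtype=float)
    dist = f3.max() - f3
    np.fill_diagonal(dist, 0.0)
    return dist

def classical_mds(dist, n_components=2):
    """Classical MDS on a symmetric distance matrix; returns coordinates."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    # Double-centre the squared distances: B = -1/2 * J D^2 J
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (dist ** 2) @ centering
    # eigh returns eigenvalues in ascending order, so pick the largest ones
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Clip small negative eigenvalues that arise from non-Euclidean input
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale
```

Each row of the returned array is one individual's position on the plot, so individuals that share more drift end up closer together.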
Focusing further on Oriente C, we find that it shares most drift with individuals from Northern Italy, Switzerland and Luxembourg, and less with individuals from Iberia, Scandinavia, and East and Southeast Europe (Fig. 4A-B). Shared drift decreases significantly with distance (Fig. 4C) and with time (Fig. 4D), although in a linear model of drift with distance and time as covariates, only distance (p=1.3×10⁻⁶) and not time (p=0.11) is significant. Consistent with the overall E-W cline in hunter-gatherer ancestry, genetic distance to Oriente C increases more rapidly with longitude than latitude, although this may also be affected by geographic features. For example, Oriente C shares significantly more drift with the 8,000-year-old, 1,400 km distant individual from Loschbour in Luxembourg (Lazaridis et al., 2014) than with the 9,000-year-old individual from Vela Spila in Croatia (Mathieson et al., 2018) only 700 km away, as shown by the D-statistic (Patterson et al., 2012) D(Mbuti, Oriente C, Vela Spila, Villabruna); Z=3.42. Oriente C’s heterozygosity was slightly lower than Villabruna’s (14% lower at 1240k transversion sites), but this difference is not significant (bootstrap P=0.12).
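For readers unfamiliar with D-statistics and their Z-scores, the following is a minimal sketch of the allele-frequency form of D(W, X; Y, Z) described by Patterson et al. (2012), with a delete-one-block jackknife for the standard error. It is an illustration under stated assumptions, not the authors' pipeline: real analyses run on genotype data with tools such as qpDstat in ADMIXTOOLS, blocks are defined by genomic position and weighted by SNP count, and sign conventions differ between programs. Here the input is assumed to be plain lists of per-site allele frequencies, blocks are equal-sized contiguous chunks, and the function names are hypothetical.

```python
import math

def d_statistic(w, x, y, z):
    """D(W, X; Y, Z) from per-site allele frequencies (lists of floats in [0, 1])."""
    num = sum((a - b) * (c - d) for a, b, c, d in zip(w, x, y, z))
    den = sum((a + b - 2 * a * b) * (c + d - 2 * c * d)
              for a, b, c, d in zip(w, x, y, z))
    return num / den

def _drop_block(values, lo, hi):
    """Return the list without the sites in positions lo..hi-1."""
    return values[:lo] + values[hi:]

def jackknife_z(w, x, y, z, n_blocks=10):
    """Return (D, Z), with Z from an unweighted delete-one-block jackknife."""
    n = len(w)
    full = d_statistic(w, x, y, z)
    # Assumption: equal-sized contiguous blocks stand in for genomic blocks
    bounds = [round(i * n / n_blocks) for i in range(n_blocks + 1)]
    estimates = []
    for lo, hi in zip(bounds, bounds[1:]):
        estimates.append(d_statistic(_drop_block(w, lo, hi), _drop_block(x, lo, hi),
                                     _drop_block(y, lo, hi), _drop_block(z, lo, hi)))
    mean = sum(estimates) / n_blocks
    var = (n_blocks - 1) / n_blocks * sum((e - mean) ** 2 for e in estimates)
    return full, full / math.sqrt(var)
```

A |Z| above roughly 3 is the usual threshold for calling the asymmetry significant, which is how a value like Z=3.42 is read.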
Discussion and Conclusion
The robust record of radiocarbon dates proves that they reached Sicily not before 15-14 ka cal. BP, several millennia after the LGM peak. In our opinion, in fact, the hypothesis about an early colonization of Sicily by Aurignacians (Laplace, 1964; Chilardi et al., 1996) must be rejected, on the basis of a recent reinterpretation of the techno-typological features of the lithic industries from Riparo di Fontana Nuova (Martini et al., 2007; Lo Vetro and Martini, 2012; on this topic see also Di Maida et al., 2019).
These analyses have implications for understanding the origin and diffusion of the hunter-gatherers that inhabited Europe during the Late Upper Palaeolithic and Mesolithic. Our findings indicate that Oriente C shows a strong genetic relationship with Western European Late Upper Palaeolithic and Mesolithic hunter-gatherers, suggesting that the “Western hunter-gatherers” was a homogeneous population widely distributed in the Central Mediterranean, presumably as a consequence of continuous gene flow among different groups, or a range expansion following the LGM.
The South Italian corridor
Once again, a hypothesis based on phylogeography – apart from scarce archaeological and palaeolinguistic data (“Semitic”-like topo-hydronymy and substrates in Europe) – seems to be confirmed step by step. Since the finding of the Villabruna individual of hg. R1b-L754 (likely R1b-V88, like south-eastern European lineages expanded with WHG ancestry), it was quite likely to find out that southern Europe was the origin of the expansion of R1b-V88 into Africa.
The most likely explanation for the presence of “archaic” R1b-V88 subclades among modern Sardinians was, therefore, that they represented a remnant from a Late Upper Palaeolithic/Early Mesolithic population that had not been replaced in subsequent migrations, and thus that the migration of these lineages into Northern Africa and the Green Sahara happened during a period when Italy was connected by a shallower Mediterranean (and more land connections) to Northern Africa.
Nevertheless, the arguments for a quite recent expansion of R1b-V88 through the Mediterranean and into Africa keep being repeated, probably based on ancestry from the few ancient (and many modern) populations that have been investigated to date, a simplistic approach prone to important errors that overarch whole migration models.
For example, in the recent paper by Marcus et al. (2019) the presence of these lineages among ancient Sardinians (from the late 4th millennium BC on) is interpreted as an expansion of R1b-V88 with the Cardial Neolithic based on their ancestry, disregarding the millennia-long gap between these samples and the presence of this haplogroup in Palaeolithic/Mesolithic Northern Iberia and Northern Italy, and the comparatively much earlier splits in the phylogenetic tree and dispersal among African populations.
Afroasiatic and Nostratic
I was asked recently if I really believed that we could reconstruct Proto-Nostratic and connect it with any ancestral population. My answer is simple: until the Chalcolithic – when the whole picture of Indo-Europeans, Uralians, Egyptians or Semites becomes quite clear – we have just very few (linguistic, archaeological, genetic) dots which we would like to connect, and we do so the best we can. The earlier the population and proto-language, the more difficult this task becomes.
2) After that, I thought it was more likely to be connected to AME ancestry and the Middle East, because of the apparent expansion of WHG from south-eastern Europe, and the potential association of Afroasiatic and (Elamo-?)Dravidian to Middle Eastern populations.
3) However, after finding more and more R1b samples expanding through northern Eurasia, spreading through the (then wider) steppe regions; and R1a essentially surviving among other groups in eastern Europe for thousands of years without being associated to significant migrations (like, say, hg. C after the Palaeolithic), it didn’t seem like this division was accurate, hence my most recent version.
But, in essence, it’s all about connecting the dots, and we have very few of them…
In linguistics, I trust traditional linguists who tend to trust other more experimental linguists (like Hyllested or Kortlandt) who consider that – in their experience – an Indo-Uralic and a Eurasiatic phylum can be reconstructed. Similarly, linguists like Kortlandt are apparently (partially) supportive of attempts like that of Allan Bomhard with Nostratic – although almost everyone is critical of the Muscovite school’s attachment to the Brugmannian reconstruction, stuck in pre-laryngeal Proto-Indo-Anatolian and similar archaisms.
I mostly use Nostratic as a way to give a simplistic ethnolinguistic label to the genetically related prehistoric peoples whose languages we will probably never know. I think it’s becoming clear that the strongest connection right now with the expansion of potential Eurasiatic dialects is offered by ANE-related populations (hence Y-chromosome bottlenecks under hg. R, Q, probably also N), however complicated the reconstruction of that hypothetical community (and its dialectalization) may be.
What should be clear to anyone is that the attempt of many modern Afroasiatic speakers to connect their language to their own (or their own community’s main) haplogroups, frequently E and/or J, is flawed for many reasons; it was simplistic in the 2000s, but it is absurd after the advent of ancient DNA investigation and more recent investigation on SNP mutation rates. R1b-V88 should have been on the table of discussions about the expansion of Afroasiatic communities through the Green Sahara long ago, whether one supports a Nostratic phylum or not.
The fact that the role of R1b bottlenecks and expansions in the spread of Afroasiatic is usually not even discussed despite their likely connection with the most recent population expansions through the Green Sahara fitting a reasonable time frame for Proto-Afroasiatic reconstruction, a reasonable geographical homeland, and a compatible dialectal division – unlike many other proposed (E or J) subclades – reveals (once again) a lot about the reasons behind amateur interest in genetics.
NOTE. That evident interest notwithstanding, it is undeniable that we have a much better understanding of the expansions of R1b subclades than other haplogroups, probably due in great part to the easier recovery of ancient DNA from Eurasia (and Europe in particular), for many different – sociopolitical, geographical, technological – reasons. It is quite possible that a more thorough temporal transect of ancient DNA from the Middle East and Africa might radically change our understanding of population movements, especially those related to the Afroasiatic expansion. I am referring in this post to interpretations based on the data we currently have, despite that potential R1b-based bias.
As I said 6 months ago, 2019 is a tough year to write a blog, because this was going to be a complex regional election year and therefore a time of political promises, hence tenure offers too. Now the preliminary offers have been made, elections have passed, but the timing has slightly shifted toward 2020. So I may have the time, but not really any benefit of dedicating too much effort to the blog, and a lot of potential benefit of dedicating any time to evaluable scientific work.
On the other hand, I saw some potential benefit for publishing texts with ISBNs, hence the updates to the text and the preparation of these printed copies of the books, just in case. While Spain’s accreditation agency has some hard rules for becoming a tenured professor, especially for medical associates (whose years of professional experience are almost worthless compared to published peer-reviewed papers), it is quite flexible in assessing one’s merits.
However, regional and/or autonomous entities are not, and need an official identifier and preferably printed versions to evaluate publications, such as an ISBN for books. I thus took some time about a month ago to update the texts and supplementary materials, to publish a printed copy of the books with Amazon. The first copies have arrived, and they look good.
Corrections and Additions
I have changed the names and order of the books, as I intended for the first publication – as some of you may have noticed when the linguistic book was referred to as the third volume in some parts. In the first concept I just wanted to emphasize that the linguistic work had priority over the rest. Now the whole series and the linguistic volume don’t share the same name, and I hope this added clarity is for the better, despite the linguistic volume being the third one.
I have changed the nomenclature for Uralic dialects, as I said recently. I haven’t really modified anything deeper than that, because – unlike adding new information from population genomics – this would require me to do thorough research into the most recent publications on Uralic comparative grammar, and I just can’t begin with that right now.
Anyway, the use of terms like Finno-Ugric or Finno-Samic is as correct now for the reconstructed forms as it was before the change in nomenclature.
The most interesting recent genetic data has come from Iberia and the Mediterranean. Lacking direct data from the Italian Peninsula (and thus from the emergence of the Etruscan and Rhaetian ethnolinguistic community), it is becoming clearer how some quite early waves of Indo-Europeans and non-Indo-Europeans expanded and shrank – at least in West Iberia, West Mediterranean, and France.
Some of the main updates to the text have been made to the sections on Finno-Ugric populations, because some interesting new genetic data (especially Y-DNA) have been published in the past months. This is especially true for Baltic Finns and for Ugric populations.
Consequently, and somewhat unsurprisingly, the Balto-Slavic section has been affected by this; e.g. by the identification of Early Slavs likely with central-eastern populations dominated by (at least some subclades of) hg. I2a-L621 and E1b-V13.
I have updated some cultural borders in the prehistoric maps, and the maps with Y-DNA and mtDNA. I have also added one new version of the Early Bronze age map, to better reflect the most likely location of Indo-European languages in the Early European Bronze Age.
As those in software programming will understand, major changes in the files that are used for maps and graphics come with an increasing risk of additional errors, so I would not be surprised if some major ones were found (I already spotted three of them). Feel free to communicate these errors in any way you see fit.
I have selected more conservative SNPs in certain controversial cases.
I have also deleted most SNP-related footnotes and replaced them with the marking of each individual tentative SNP, leaving only those footnotes that give important specific information, because:
My way of referencing tentative SNP authors did not make it clear which samples were tentative, if there were more than one.
It was probably not necessary to see four names repeated 100 times over.
Often I don’t really know if the person I have listed as author of the SNP call is the true author – unless I saw the full SNP data posted directly – or just someone who reposted the results.
Sometimes there is more than one author of SNPs for a certain sample, but I might have added just one for all.
For a centralized file to host the names of those responsible for the unofficial/tentative SNPs used in the text – and to correct them if necessary – readers will eventually be able to use Phylogeographer’s tool for ancient Y-DNA, for which they use (partly) the same data I compiled, adding Y-Full’s nomenclature and references. You can see another map tool in ArcGIS.
NOTE. As I say in the text, if the final working map tool does not deliver the names, I will publish another supplementary table to the text, listing all tentative SNPs with their respective author(s).
If you are interested in ancient Y-DNA and you want to help develop comprehensive and precise maps of ancient Y-DNA and mtDNA haplogroups, you can contact Hunter Provyn at Phylogeographer.com. You can also find more about phylogeography projects at Iain McDonald’s website.
I previously used certain samples prepared by amateurs from BAM files (like Botai, Okunevo, or Hittites), and the results were obviously less than satisfactory – hence my criticism of the lack of publication of prepared files by the most famous labs, especially the Copenhagen group.
Fortunately for all of us, most published datasets are free, so we don’t have to reinvent the wheel. I criticized genetic labs for not releasing all data, so now it is time for praise, at least for one of them: thank you to all responsible at the Reich Lab for this great merged dataset, which includes samples from other labs.
NOTE. I would like to make my tiny contribution here, for beginners interested in working with these files, so I will update – whenever I have time – the “How To” sections of this blog for PCAs, PCA3d, and ADMIXTURE.
For unsupervised ADMIXTURE in the maps, a K=5 is selected based on the CV, giving a kind of visual WHG : NWAN : CHG/IN : EHG : ENA, but with Steppe ancestry “in between”. Higher K gave worse CV, which I guess depends on the many ancient and modern samples selected (and on the fact that many samples are repeated from different sources in my files, because I did not have time to filter them all individually).
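As a small aid for beginners, this is a minimal sketch of how one might pick K from the cross-validation errors. It assumes ADMIXTURE was run with the `--cv` flag for several values of K and that the logs contain lines of the form `CV error (K=5): 0.52100`; the helper names and the numbers in the usage example are made up for illustration and are not my actual results.

```python
import re

# Assumption: each log contains a line such as "CV error (K=5): 0.52100"
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9.]+)")

def parse_cv_errors(log_text):
    """Return {K: cv_error} from the concatenated text of ADMIXTURE logs."""
    return {int(k): float(err) for k, err in CV_LINE.findall(log_text)}

def best_k(cv_errors):
    """Return the K with the lowest cross-validation error."""
    return min(cv_errors, key=cv_errors.get)

if __name__ == "__main__":
    # Illustrative values only
    example_log = (
        "CV error (K=4): 0.5300\n"
        "CV error (K=5): 0.5210\n"
        "CV error (K=6): 0.5250\n"
    )
    errors = parse_cv_errors(example_log)
    print(best_k(errors))
```

The lowest CV error is only a guide: duplicated or closely related samples, like the repeated ones mentioned above, can distort it.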
I found some interesting component shared by Central European populations in K=7 to K=9 (from CEU Bell Beakers to Denmark LN to Hungarian EBA to Iberia BA, in a sort of “CEU BBC ancestry” potentially related to North-West Indo-Europeans), but still, I prefer to go for a theoretically more correct visualization instead of cherry-picking the ‘best-looking’ results.
Since I made fun of the search for “Siberian ancestry” in coloured components in Tambets et al. 2018, I have to be consistent and prefer to avoid doing the same here…
In the first publication (in January) and subsequent minor revisions until March, I trusted analyses and ancestry estimates reported by amateurs in 2018, which I used for the text adding my own interpretations. Most of them have been refuted in papers from 2019, as you probably know if you have followed this blog (see very recent examples here, here, or here), compelling me to delete or change them again, and again, and again. I don’t have experience from previous years, although the current pattern must have been evidently repeated many times over, or else we would be still talking about such previous analyses as being confirmed today…
I wanted to be one step ahead of peer-reviewed publications in the books, but I prefer now to go for something safe in the book series, rather than having one potentially interesting prediction – which may or may not be right – and ten huge mistakes that I would have helped to endlessly redistribute among my readers (online and now in print) based on some cherry-picked pairwise comparisons. This is especially true when predictions of “Steppe“- and/or “Siberian“-related ancestry have been published, which, for some reason, seem to go horribly wrong most of the time.
I am sure whole books can be written about why and how this happened (and how this is going to keep happening), based on psychology and sociology, but the reasons are irrelevant, and that would be a futile effort; like writing books about glottochronology and its intermittent popularity due to misunderstood scientific trends. The most efficient way to deal with this problem is to avoid such information altogether, because – as you can see in the current revised text – it wouldn’t really add anything essential to the content of these books, anyway.
Interesting excerpts (emphasis mine, edited for clarity):
On the high frequency of R1b-V88
Our genome-wide data allowed us to assign Y haplogroups for 25 ancient Sardinian individuals. More than half of them consist of R1b-V88 (n=10) or I2-M223 (n=7).
Francalacci et al. (2013) identified three major Sardinia-specific founder clades based on present-day variation within the haplogroups I2-M26, G2-L91 and R1b-V88, and here we found each of those broader haplogroups in at least one ancient Sardinian individual. Two major present-day Sardinian haplogroups, R1b-M269 and E-M215, are absent.
Compared to other Neolithic and present-day European populations, the number of identified R1b-V88 carriers is relatively high.
(…)ancient Sardinian mtDNA haplotypes belong almost exclusively to macro-haplogroups HV (n = 16), JT (n = 17) and U (n = 9), a composition broadly similar to other European Neolithic populations.
On the origin of a Vasconic-like Paleosardo with the Western EEF
(…) the Neolithic (and also later) ancient Sardinian individuals sit between early Neolithic Iberian and later Copper Age Iberian populations, roughly on an axis that differentiates WHG and EEF populations and embedded in a cluster that additionally includes Neolithic British individuals. This result is also evident in terms of absolute genetic differentiation, with low pairwise FST ~ 0.005 ± 0.002 between Neolithic Sardinian individuals and Neolithic western mainland European populations. Pairwise outgroup-f3 analysis shows a very similar pattern, with the highest values of f3 (i.e. most shared drift) being with Neolithic and Copper Age Iberia, gradually dropping off for temporally and geographically distant populations.
In explicit admixture models (using qpAdm, see Methods) the southern French Neolithic individuals (France-N) are the most consistent with being a single source for Neolithic Sardinia (p ~ 0.074 to reject the model of one population being the direct source of the other); followed by other populations associated with the western Mediterranean Neolithic Cardial Ware expansion.
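To illustrate why a higher outgroup-f3 means “more shared drift”, here is a minimal sketch of the statistic computed from per-site allele frequencies. It assumes plain lists of frequencies and omits the sample-size bias correction and the standard errors that real tools (for example qp3Pop in ADMIXTOOLS) provide; the function name is hypothetical.

```python
def outgroup_f3(outgroup, pop_a, pop_b):
    """Mean of (o - a) * (o - b) over sites.

    Larger values mean A and B share a longer branch (more drift)
    after splitting from the outgroup. No bias correction is applied.
    """
    sites = list(zip(outgroup, pop_a, pop_b))
    return sum((o - a) * (o - b) for o, a, b in sites) / len(sites)
```

Ranking populations by this value against a fixed test population gives the kind of ordering described in the excerpt, with the closest populations at the top.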
Pervasive Western Hunter-Gatherer ancestry in Iberian/French/Sardinian population
Similar to western European Neolithic and central European Late Neolithic populations, ancient Sardinian individuals are shifted towards WHG individuals in the top two PCs relative to early Neolithic Anatolians. Admixture analysis using qpAdm infers that ancient Sardinian individuals harbour HG ancestry (~ 17%) that is higher than early Neolithic mainland populations (including Iberia, ~ 8%), but lower than Copper Age Iberians (~ 25%) and about the same as Southern French Middle-Neolithic individuals (~ 21%).
Continuity from Sardinia Neolithic through the Nuragic
We found several lines of evidence supporting genetic continuity from the Sardinian Neolithic into the Bronze Age and Nuragic times. Importantly, we observed low genetic differentiation between ancient Sardinian individuals from various time periods.
A qpAdm analysis, which is based on simultaneously testing f-statistics with a number of outgroups and adjusts for correlations, cannot reject a model of Neolithic Sardinian individuals being a direct predecessor of Nuragic Sardinian individuals (…) Our qpAdm analysis further shows that the WHG ancestry proportion, in a model of admixture with Neolithic Anatolia, remains stable at ~17% throughout three ancient time-periods.
Steppe influx in Modern Sardinians
While contemporary Sardinian individuals show the highest affinity towards EEF-associated populations among all of the modern populations, they also display membership with other clusters (Fig. 5). In contrast to ancient Sardinian individuals, present-day Sardinian individuals carry a modest “Steppe-like” ancestry component (but generally less than continental present-day European populations), and an appreciable broadly “eastern Mediterranean” ancestry component (also inferred at a high fraction in other present-day Mediterranean populations, such as Sicily and Greece).
Mitochondrial genomes from all three individuals belong to the U5a2d haplogroup. (…) The mitochondrial U5a2d haplogroup is consistent with earlier published results for ancient individuals from Scandinavia, U5a being the most common within SHG. Of the 16 Mesolithic individuals from Scandinavia published prior to our study, seven belong to the U5a haplogroup, nine share the U2 and U4 haplogroups.
We divided the SHG group into two groups: SHGa and SHGb (ancient individuals found in contemporary Norway and Sweden, respectively). We based this on both the geographical distribution and the previous studies demonstrating the close relation of SHGa to EHG group and SHGb to WHG group. To further explore the demography within the SHG group, we compared the ancestry of BLE individuals within SHGa and SHGb groups. This comparison revealed a high relative shared drift between BLE individuals and the SHGb group.
The results from Huseby Klev allow us to finally connect the SHG group with the eastern pressure blade technology. However, the higher genetic affinity between Huseby Klev individuals and the WHG group challenges the earlier suggested tie between eastern technology and EHG genetics. Our results suggest either early cultural transmission, or a more complex course of events involving both non- and co-dependent cultural and genetic admixture.
Seeing how culture is indeed usually associated with the expansion of a certain population, especially at such an early date, I guess this similarity with WHG of incoming eastern peoples comes from an originally EHG population expanding into a mainly WHG area in the west (similar to what happens e.g. with Bell Beakers), or being replaced later by a WHG population which adopted the culture (similar to what happened with late Corded Ware populations in central-east Europe after the expansion of Bell Beakers).
Unlike later periods, it will always be difficult to judge such ancient population movements with few samples covering thousands of years… Probably specific Y-DNA haplogroups would help differentiate between both expanding populations from east and west.
An interesting aspect of the paper, hidden among so many relevant details, is a clearer picture of how the so-called Yamnaya or steppe ancestry evolved from Samara hunter-gatherers to Yamna nomadic pastoralists, and how this ancestry appeared among Proto-Corded Ware populations.
Please note: arrows of “ancestry movement” in the following PCAs do not necessarily represent physical population movements, or even ethnolinguistic change. To avoid misinterpretations, I have depicted arrows with Y-DNA haplogroup migrations to represent the most likely true ethnolinguistic movements. Admixture graphics shown are from Wang et al. (2018), and also (the K12) from Mathieson et al. (2018).
1. Samara to Early Khvalynsk
The so-called steppe ancestry was born during the Khvalynsk expansion through the steppes, probably through exogamy of expanding elite clans (eventually all R1b-M269 lineages) originally of Samara_HG ancestry. The nearest group to the ANE-like ghost population with which Samara hunter-gatherers admixed is represented by the Steppe_Eneolithic / Steppe_Maykop cluster (from the Northern Caucasus Piedmont).
Steppe_Eneolithic samples, of R1b1 lineages, are probably expanded Khvalynsk peoples, showing thus a proximate ancestry of an Early Eneolithic ghost population of the Northern Caucasus. Steppe_Maykop samples represent a later replacement of this Steppe_Eneolithic population – and/or a similar population with further contribution of ANE-like ancestry – in the area some 1,000 years later.
This is what Steppe_Maykop looks like, different from Steppe_Eneolithic:
NOTE. This admixture shows how different Steppe_Maykop is from Steppe_Eneolithic, but in the different supervised ADMIXTURE graphics below Maykop_Eneolithic is roughly equivalent to Eneolithic_Steppe (see orange arrow in ADMIXTURE graphic above). This is useful for a simplified analysis, but actual differences between Khvalynsk, Sredni Stog, Afanasevo, Yamna and Corded Ware are probably underestimated in the analyses below, and will become clearer in the future when more ancestral hunter-gatherer populations are added to the analysis.
2. Early Khvalynsk expansion
We have direct data of Khvalynsk-Novodanilovka-like populations thanks to Khvalynsk and Steppe_Eneolithic samples (although I’ve used the latter above to represent the ghost Caucasus population with which Samara_HG admixed).
We also have indirect data. First, there is the PCA with outliers:
Second, we have data from north Pontic Ukraine_Eneolithic samples (see next section).
Third, there is the continuity of late Repin / Afanasevo with Steppe_Eneolithic (see below).
3. Proto-Corded Ware expansion
It is unclear if R1a-M459 subclades were continuously in the steppe and resurged after the Khvalynsk expansion, or (the most likely option) they came from the forested region of the Upper Dnieper area, possibly from previous expansions there with hunter-gatherer pottery.
Supporting the latter is the millennia-long continuity of R1b-V88 and I2a2 subclades in the north Pontic Mesolithic, Neolithic, and Early Eneolithic Sredni Stog culture, until ca. 4500 BC (and even later, during the second half).
Only at the end of the Early Eneolithic with the disappearance of Novodanilovka (and beginning of the steppe ‘hiatus’ of Rassamakin) is R1a to be found in Ukraine again (after disappearing from the record some 2,000 years earlier), related to complex population movements in the north Pontic area.
NOTE. In the PCA, a tentative position of Novodanilovka closer to Anatolia_Neolithic / Dzudzuana ancestry is selected, based on the apparent cline formed by Ukraine_Eneolithic samples, and on the position and ancestry of Sredni Stog, Yamna, and Corded Ware later. A good alternative would be to place Novodanilovka still closer to the Balkan outliers (i.e. Suvorovo), and a source closer to EHG as the ancestry driven by the migration of R1a-M417.
The first sample with steppe ancestry appears only after 4250 BC in the forest-steppe, centuries after the samples with steppe ancestry from the Northern Caucasus and the Balkans, which points to exogamy of expanding R1a-M417 lineages with the remnants of the Novodanilovka population.
4. Repin / Early Yamna expansion
We don’t have direct data on early Repin settlers. But we do have a very close representative: Afanasevo, a population we know comes directly from the Repin/late Khvalynsk expansion ca. 3500/3300 BC (just before the emergence of Early Yamna), and which shows fully Steppe_Eneolithic-like ancestry.
Compared to this eastern Repin expansion that gave Afanasevo, the late Repin expansion to the west ca. 3300 BC that gave rise to the Yamna culture was one of colonization, evidenced by the admixture with north Pontic (Sredni Stog-like) populations, no doubt through exogamy:
This admixture is also found (in lesser proportion) in east Yamna groups, which supports the high mobility and exogamy practices among western and eastern Yamna clans, not only with locals:
We don’t have a comparison with Ukraine_Eneolithic or Corded Ware samples in Wang et al. (2018), but we do have proximate sources for Abashevo, when compared to the Poltavka population (with which it admixed in the Volga-Ural steppes): Sintashta, Potapovka, Srubna (with further Abashevo contribution), and Andronovo:
The two CWC outliers from the Baltic show what I thought was an admixture with Yamna. However, given the previous mixture of Eneolithic_Steppe in north Pontic steppe-forest populations, this elevated “steppe ancestry” found in Baltic_LN (similar to west Yamna) seems rather an admixture of Baltic sub-Neolithic peoples with a north Pontic Eneolithic_Steppe-like population. Late Repin settlers also admixed with a similar population during its colonization of the north Pontic area, hence the Baltic_LN – west Yamna similarities.
NOTE. A direct admixture with west Yamna populations through exogamy by the ancestors of this Baltic population cannot be ruled out yet (without direct access to more samples), though, because of the contacts of Corded Ware with west Yamna settlers in the forest-steppe regions.
A similar case is found in the Yamna outlier from Mednikarovo south of the Danube. It would be absurd to think that Yamna from the Balkans comes from Corded Ware (or vice versa), just because the former is closer in the PCA to the latter than other Yamna samples. The same error is also found e.g. in the Corded Ware → Bell Beaker theory, because of their proximity in the PCA and their shared “steppe ancestry”. All those theories have already been proven wrong.
NOTE. A similar fallacy is found in potential Sintashta→Mycenaean connections, where we should distinguish statistically that result from an East/West Yamna + Balkans_BA admixture. In fact, genetic links of Mycenaeans with west Yamna settlers prove this (there are some related analyses on Anthrogenica, but the site is down at this moment). To try to relate these two populations (separated more than 1,000 years before Sintashta) is like comparing ancient populations to modern ones, without the intermediate samples to trace the real anthropological trail of what is found… Pure numbers and wishful thinking.
It has been known for a long time that the Caucasus must have hosted many (at least partially) isolated populations, probably helped by geographical boundaries, setting it apart from open Eurasian areas.
David Reich writes in his book the following about India:
The genetic data told a clear story. Around a third of Indian groups experienced population bottlenecks as strong or stronger than the ones that occurred among Finns or Ashkenazi Jews. We later confirmed this finding in an even larger dataset that we collected working with Thangaraj: genetic data from more than 250 jati groups spread throughout India (…)
Rather than an invention of colonialism as Dirks suggested, long-term endogamy as embodied in India today in the institution of caste has been overwhelmingly important for millennia. (…)
The Han Chinese are truly a large population. They have been mixing freely for thousands of years. In contrast, there are few if any Indian groups that are demographically very large, and the degree of genetic differentiation among Indian jati groups living side by side in the same village is typically two to three times higher than the genetic differentiation between northern and southern Europeans. The truth is that India is composed of a large number of small populations.
There is little doubt now, based on findings spanning thousands of years, that the Mesolithic and Neolithic Caucasus hosted various very small populations, even if the ancestral components may be reduced to the few known to date (such as ANE, EHG, AME*, ENA, CHG, and other “deep” ancestral components).
NOTE. I will call the ancestral component of Dzudzuana/Anatolian hunter-gatherers Ancient Middle Easterner (AME), to give a clear idea of its likely extension during the Late Upper Palaeolithic, and to avoid using the more simplistic Dzudzuana, unless it is useful to mention these specific local samples.
Genetic labs have a strong fixation with ancestry. I guess the use of complex statistical methods gives professionals and laymen alike the feeling of dealing with “Science”, as opposed to academic fields where you have to interpret data. I think language reveals a lot about the way people think, and the fact that ancestral components are called ‘lineages’ – while not wrong per se – is a clear symptom of the lack of interest in the true lineages: Y-DNA haplogroups.
It has become quite clear that male-biased migrations are often the ones which can be confidently followed for actual population movements and ethnolinguistic identification, at least until the Iron Age. The frequently used Palaeolithic clusters offer a clear example of why ancestry does not represent what some people believe: They merely give a basic idea of sizeable population replacements by distant peoples.
Both concepts are important: sizeable and distant peoples. For example, during the Upper Palaeolithic in Europe there was a sizeable population replacement of the Aurignacian Goyet cluster by the Gravettian Vestonice cluster (probably from populations of far eastern Russia) coupled with the arrival of haplogroup I, although during the thousands of years that this material culture lasted, the previously expanded C1a2 lineages did not disappear, and there were probably different resurgence and admixture events.
Haplogroup I certainly expanded with the Gravettian culture to Iberia, where the Goyet ancestry did not change much – probably because of male-driven migrations -, to the extent that during the Magdalenian expansions haplogroup I expanded with an ancestry closer to Goyet, in what is called a ‘resurge’ of the Goyet cluster – even though there is a clear replacement of male lines.
The Villabruna (WHG) cluster is another good example. It probably spread with haplogroup R1b-L754, which – based on the extra ‘East Asian’ affinity of some samples and on modern samples from the Middle East – came probably from the east through a southern route, and not too long before the expansion of WHG likely from around the Black Sea, although this is still unclear. The finding of haplogroup I in samples of mostly WHG ancestry could confuse people that do not care about timing, sub-structured populations, and gene flow.
NOTE. If you don’t understand why ‘clusters’ that span thousands of years don’t really matter for the many Palaeolithic population expansions that certainly happened among hunter-gatherers in Europe, just take a look at what happened with Bell Beakers expanding from Yamna into western Europe within 500 years.
If we don’t tread carefully when talking about population migrations, these terms are bound to confuse people, just as the fixation on “steppe ancestry” – which marks the arrival in Chalcolithic Europe of peoples from the Pontic-Caspian region – has confused a lot of researchers to this day.
When I began to write about the Indo-European demic diffusion model, my concern was to find a single spot where a North-West Indo-European proto-language could have expanded from ca. 2000 BC (our most common guesstimate). Based on the 2015 papers, and in spite of their conclusions, I thought it had become clear that Corded Ware was not it, and it was rather Bell Beakers. I assumed that Uralic was spoken to the north (as was the traditional belief), and thus Corded Ware expanded from the forest zone, hence steppe ancestry would also be found there with other R1a lineages.
With the publication of Mathieson et al. (2017) and Olalde et al. (2017), I changed my mind, seeing how “steppe ancestry” did in fact appear quite late, hence it was likely to be the result of very specific population movements, probably directly from the Caucasus. Later, Mathieson published in a revision the sample from Alexandria of hg R1a-M417 (probably R1a-Z645, possibly Z93+), which further supported the idea that the migration of Corded Ware peoples started near the North Pontic forest-steppe (as I included in the next revision).
The question remains the same one I raised recently, though: where do the extra Caucasus components (i.e. beyond EHG) of Eneolithic Ukraine/Corded Ware and Khvalynsk/Yamna come from?
Considering 2-way mixtures, we can model Karelia_HG as deriving 34 ± 2.8% of its ancestry from a Villabruna-related source, with the remainder mainly from ANE represented by the AfontovaGora3 (AG3) sample from Lake Baikal ~17kya.
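As a quick arithmetic check on the quoted two-way model, the sketch below derives the implied remainder and a rough interval around the Villabruna-related share. It assumes that the "± 2.8%" is one standard error and that a normal approximation is acceptable; neither is stated in the quoted passage, and the function name is purely illustrative.

```python
def two_way_mixture(source_a_pct, std_err_pct, z=1.96):
    """Return the complementary share and an approximate interval for source A.

    Assumption (not from the quoted paper): std_err_pct is one standard
    error and the estimate is roughly normal, so the interval is
    source_a_pct +/- z * std_err_pct.
    """
    remainder_pct = 100.0 - source_a_pct
    low = source_a_pct - z * std_err_pct
    high = source_a_pct + z * std_err_pct
    return remainder_pct, (low, high)


# Values quoted above for Karelia_HG: 34 +/- 2.8% Villabruna-related.
remainder_pct, villabruna_interval = two_way_mixture(34.0, 2.8)
print(f"Remainder (mainly ANE-related): {remainder_pct:.1f}%")
print("Villabruna-related interval: %.1f%% to %.1f%%" % villabruna_interval)
```

Under these assumptions, roughly two thirds of the modelled ancestry is left for the (mainly ANE-related) remainder, and the Villabruna-related share stays well below half across the whole interval.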
AG3 was likely of haplogroup Q1a (as reported by YFull, see Genetiker), and probably the ANE ancestry found in Eastern Europe accompanied a Palaeolithic migration of Q1a2-M25 (formed ca. 22600 BC, TMRCA ca. 14300 BC).
Combined with what we know about the Eneolithic Steppe and Caucasus populations – it is likely that ANE ancestry remained the most important component of some of the small ghost populations of the Caucasus until their emergence with the Lola culture.
The first sample we have now attributed to the EHG cluster is Sidelkino, from the Samara region (ca. 9300 BC), mtDNA U5a2. In Damgaard et al. (Science 2018), Yamnaya could be modelled as a CHG population related to Kotias Klde (54%) and the remainder from an ANE population related to Sidelkino (>46%), with the following split events:
A split event, where the CHG component of Yamnaya splits from KK1. The model inferred this time at 27 kya (though we note the larger models in Sections S2.12.4 and S2.12.5 inferred a more recent split time).
A split event, where the ANE component of Yamnaya splits from Sidelkino. This was inferred at about 11 kya.
A split event, where the ANE component of Yamnaya splits from Botai. We inferred this to occur 17 kya. Note that this is above the Sidelkino split time, so our model infers Yamnaya to be more closely related to the EHG Sidelkino, as expected.
An ancestral split event between the CHG and ANE ancestral populations. This was inferred to occur around 40 kya.
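The reasoning in the quoted list (a more recent split implies a closer relationship) can be written down as a tiny comparison. This is only an illustrative sketch: the dictionary and function names are hypothetical, and the tree-like interpretation is a simplifying assumption that ignores later gene flow.

```python
# Split times of the ANE component of Yamnaya, as quoted above,
# in thousands of years ago (kya).
yamnaya_ane_splits_kya = {"Sidelkino": 11, "Botai": 17}


def closest_reference(split_times_kya):
    """Return the reference population whose lineage split most recently.

    Assumption: in a tree-like model, a more recent split means the two
    lineages shared ancestry for longer, hence a closer relationship.
    """
    return min(split_times_kya, key=split_times_kya.get)


print(closest_reference(yamnaya_ane_splits_kya))  # Sidelkino
```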
Other samples classified as of the EHG cluster:
Popovo2 (ca. 6250 BC) of hg J1, mtDNA U4d – Po2 and Po4 from the same site (ca. 6550 BC) show continuity of mtDNA.
Karelia_HG, from Juzhnii Oleni Ostrov (ca. 6300 BC): I0211/UzOO40 (ca. 6300 BC) of hg J1(xJ1a), mtDNA U4a; and I0061/UzOO74 of hg R1a1(xR1a1a), mtDNA C1
UzOO77 and UzOO76 from Juzhnii Oleni Ostrov (ca. 5250 BC) of mtDNA R1b.
Samara_HG from Lebyanzhinka (ca. 5600 BC) of hg R1b1a, mtDNA U5a1d.
About the enigmatic Anatolia_Neolithic-related ancestry found in Pontic-Caspian steppe samples, this is what Wang et al. (2018) had to say:
We focused on model of mixture of proximal sources such as CHG and Anatolian Chalcolithic for all six groups of the Caucasus cluster (Eneolithic Caucasus, Maykop and Late Maykop, Maykop-Novosvobodnaya, Kura-Araxes, and Dolmen LBA), with admixture proportions on a genetic cline of 40-72% Anatolian Chalcolithic related and 28-60% CHG related (Supplementary Table 7). When we explored Romania_EN and Greece_Neolithic individuals as alternative southeast European sources (30-46% and 36-49%), the CHG proportions increased to 54-70% and 51-64%, respectively. We hypothesize that alternative models, replacing the Anatolian Chalcolithic individual with yet unsampled populations from eastern Anatolia, South Caucasus or northern Mesopotamia, would probably also provide a fit to the data from some of the tested Caucasus groups.
The first appearance of ‘Near Eastern farmer related ancestry’ in the steppe zone is evident in Steppe Maykop outliers. However, PCA results also suggest that Yamnaya and later groups of the West Eurasian steppe carry some farmer related ancestry as they are slightly shifted towards ‘European Neolithic groups’ in PC2 (Fig. 2D) compared to Eneolithic steppe. This is not the case for the preceding Eneolithic steppe individuals. The tilting cline is also confirmed by admixture f3-statistics, which provide statistically negative values for AG3 as one source and any Anatolian Neolithic related group as a second source.
Detailed exploration via D-statistics in the form of D(EHG, steppe group; X, Mbuti) and D(Samara_Eneolithic, steppe group; X, Mbuti) show significantly negative D values for most of the steppe groups when X is a member of the Caucasus cluster or one of the Levant/Anatolia farmer-related groups (Supplementary Figs. 5 and 6). In addition, we used f- and D-statistics to explore the shared ancestry with Anatolian Neolithic as well as the reciprocal relationship between Anatolian- and Iranian farmer-related ancestry for all groups of our two main clusters and relevant adjacent regions (Supplementary Fig. 4). Here, we observe an increase in farmer-related ancestry (both Anatolian and Iranian) in our Steppe cluster, ranging from Eneolithic steppe to later groups. In Middle/Late Bronze Age groups especially to the north and east we observe a further increase of Anatolian farmer related ancestry consistent with previous studies of the Poltavka, Andronovo, Srubnaya and Sintashta groups and reflecting a different process not especially related to events in the Caucasus.
(…) Surprisingly, we found that a minimum of four streams of ancestry is needed to explain all eleven steppe ancestry groups tested, including previously published ones (Fig. 2; Supplementary Table 12). Importantly, our results show a subtle contribution of both Anatolian farmer-related ancestry and WHG-related ancestry (Fig.4; Supplementary Tables 13 and 14), which was likely contributed through Middle and Late Neolithic farming groups from adjacent regions in the West. The discovery of a quite old AME ancestry has rendered this probably unnecessary, because this admixture from an Anatolian-like ghost population could be driven even by small populations from the Caucasus.
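For readers unfamiliar with the D-statistics of the form D(EHG, steppe group; X, Mbuti) used in the quoted passage, here is a minimal sketch of how such a statistic can be computed from per-SNP allele frequencies. The formula and sign convention follow my reading of Patterson et al. (2012); the frequencies are invented toy values, the function name is hypothetical, and real analyses would use dedicated software (e.g. ADMIXTOOLS) and a block jackknife to obtain Z-scores.

```python
def d_statistic(p1, p2, p3, p4):
    """D(P1, P2; P3, P4) from per-SNP allele frequencies.

    Each argument is a sequence of frequencies (0..1) of the same allele
    at the same SNPs. With this sign convention, D < 0 indicates excess
    allele sharing between P2 and P3 (or between P1 and P4).
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in zip(p1, p2, p3, p4):
        numerator += (a - b) * (c - d)
        denominator += (a + b - 2 * a * b) * (c + d - 2 * c * d)
    if denominator == 0:
        raise ValueError("no informative sites")
    return numerator / denominator


# Toy frequencies (invented for illustration only): the steppe group is
# shifted towards X relative to EHG at every SNP.
ehg = [0.1, 0.2, 0.1]
steppe = [0.4, 0.5, 0.3]
x = [0.8, 0.9, 0.7]
mbuti = [0.1, 0.1, 0.2]

d_value = d_statistic(ehg, steppe, x, mbuti)
print("D is negative:", d_value < 0)
```

In this toy setup the statistic comes out negative, which is the pattern the quoted passage reports for most steppe groups when X is a Caucasus or Levant/Anatolia farmer-related group.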
While it is not yet fully clear, the increased Anatolian_Neolithic-like ancestry in Ukraine_Eneolithic samples (see below) makes it unlikely that all such ancestry in Corded Ware groups comes from a GAC-related contribution. It is likely that at least part of it represents contributions from populations of the Caucasus, based on the mostly westward population movements in the steppe from ca. 4600 BC on, including the Suvorovo-Novodanilovka expansion, and especially the Kuban-Maykop expansion during the final Eneolithic into the North Pontic area.
NOTE. Since CHG-like groups from the Caucasus may have combinations of AME and ANE ancestry similar to Yamna (which may thus appear as ‘steppe ancestry’ in the North Pontic area), it is impossible to interpret with precision the following ADMIXTURE graphic:
The East Asian contribution to WHG samples (like Loschbour or La Braña), as specified in Fu et al. (2016), does not seem to be related to Baikal_EN, and appears possibly (in the ADMIXTURE analysis) integrated into the Villabruna component. I guess this implies that the shared alleles with East Asians are quite early, and potentially due to the expansion of R1b-L754 from the East.
It would be interesting to know the specific material culture Sidelkino belonged to – i.e. if it was related to the expansion of the North-Eastern Technocomplex – , and its Y-DNA. The Post-Swiderian expansion into eastern Europe, probably associated with the expansion of R1b-P297 lineages (including R1b-M73, found later in Botai and in Baltic HG) is supposed to have begun during the 11th millennium BC, but migrations to the Urals and beyond are probably concentrated in the 9th millennium, so this sample is possibly slightly early for R1b.
NOTE. User Rozenfeld at Anthrogenica posted this, which I think is interesting (in case anyone wants to try a Y-SNP call):
there is something strange with Sidelkino EHG: first, its archaeological context is not described in the supplementary. Second, its sex is not listed in the supplementary tables. Third, after looking for info about this sample, I found that: “Сиделькино-3. Для снятия вопроса о половой принадлежности индивида была проведена генетическая экспертиза, выявившая принадлежность останков мужчине.”(translation: Sidelkino-3. To resolve the question about sex of the remains, the genetic analysis was conducted, which showed that remains belonged to male), source: http://static.iea.ras.ru/books/7487_Traditsii.pdf
So either they haven’t mentioned his Y-DNA in the paper for some reason, or there are more than one Sidelkino sample and the male one has not yet been published. The coverage of the Sidelkino sample from the paper is 2.9, more than enough to tell Y-DNA haplogroup.
My speculative guess right now about specific population movements in far eastern Europe, based on the few data we have:
The expansion of the North-Eastern Technocomplex first around the 9th millennium BC, most likely expanded R1b-P297 ca. 11300 BC, judging by its TMRCA, with both R1b-M73 (TMRCA 5300 BC) and R1b-M269 (TMRCA 4400 BC) info (with extra El Mirón ancestry) back, and thus Eurasiatic.
The expansion of haplogroup J1 to the north may have happened before or after the R1b-P297 expansion. Judging by the increase in AG3-related ancestry near Karelia compared to Baltic_HG, it is possible that it expanded just after R1b-P297 (hence possibly J1-Y6304? TMRCA 9700 BC). Its long-lasting presence in the Caucasus is supported by the Satsurblia (ca. 11300 BC) and the Dolmen BA (ca. 1300 BC) samples.
The expansion of R1a-M17 ca. 6600 BC is still likely to have happened from the east, based on the R1a-M17 samples found in Baikalic cultures slightly later (ca. 5300 BC). The presence of elevated Baikal_EN ancestry in Karelia HG and in Samara HG, and the finding of R1a-M417 samples in the Forest Zone after the Mesolithic suggest a connection with the expansion of Hunter-Gatherer pottery, from the Elshanka culture in the Samara region northward into the Forest Zone and westward into the North Pontic area.
The expansion of R1b-M73 ca. 5300 BC is likely to be associated with the emergence of a group east of the Urals (related to the later Botai culture, and potentially Pre-Yukaghir). Its presence in a Narva sample from Donkalnis (ca. 5200 BC) suggests either an early split and spread of both R1b-P297 lineages (M73 and M269) through Eastern Europe, or maybe a back-migration with hunter-gatherer pottery.
R1b-M269 spread successfully ca. 4400 BC (and R1b-L23 ca. 4100 BC, both based on TMRCA), and this successful expansion is probably to be associated with the Khvalynsk-Novodanilovka expansion. We already know that Samara_HG ca. 5600 BC was R1b1a, so it is likely that R1b-M269 appeared (or ‘resurged’) in the Volga-Ural region shortly after the expansion of R1a-M17, whose expansion through the region may be inferred by the additional AG3 and Baikal_EN ancestry. What is interesting in Samara_HG compared to the previous Sidelkino sample is the introduction of more El Mirón-related ancestry, typical of WHG populations (and thus characteristic of Baltic groups).
NOTE. The TMRCA dates are obviously gross approximations, because a) the actual rate of mutation is unknown and b) TMRCA estimates are based on the convergence of lineages that survived. The potential finding of R1a-Z645 (possibly Z93+) in Ukraine Eneolithic (ca. 4000 BC), and the potential finding of R1b-L23 in Khvalynsk ca. 4250 BC complicates things further, in terms of dates and origins of any subclade.
The question thus remains as it was long ago: did R1b-M269 lineages expand (‘return’) from the east, near the Urals, or directly from the north? Were they already near Samara at the same time as the expansion of hunter-gatherer pottery, and were not much affected by it? Or did they ‘resurge’ from populations admixed with Caucasus-related ancestry after the expansion of R1a-M17 with this pottery (since there are different stepped expansions from the Samara region)? We could even ask, did R1a-M17 really expand from the east, i.e. are the dates on Baikalic subclades from Moussa et al. (2016) reliable? Or did R1a-M17 expand from some pockets in the Pontic-Caspian steppe, taking over the expansion of HG pottery at some point?
The most interesting aspect of the new paper (regarding Indo-Uralic migrations) is that Ancient Middle Easterner (AME) ancestry will probably be a better proxy for the Anatolia_Neolithic component found in Ukraine Mesolithic to Eneolithic, and possibly also for some of the “more CHG-like” component found among Pontic-Caspian steppe populations, all likely derived from different admixture events with groups from the Caucasus.
NOTE. Even the supposed gene flow of Neolithic Iranian ancestry into the Caucasus can be put into question, since that means possibly a Dzudzuana-like population with greater “deep ancestry” proportion than the one found in CHG, which may still be found within the Caucasus.
If it was not clear already that following ‘steppe ancestry’ wherever it appears is a rather lame way of following Indo-European migrations, every single sample from the Caucasus and their admixture with Pontic-Caspian steppe populations will probably show that “steppe ancestry” is in fact formed by a variety of steppe-related ancestral components, impossible to follow coherently with a single population. Exactly what is happening already with the Siberian ancestry.
If the paper on the Dzudzuana samples has shown something, it is that the expansion of an ANE-like population shook the entire Caucasus area up to the Zagros Mountains, creating the ANE – AME cline formed by CHG and Iran_N, with further contributions of “deep ancestries” (probably from the south) complicating the picture further.
If this happens with few known samples, and we know of an ANE-like ghost population in the Caucasus (appearing later in the Lola culture), we can already guess that the often repeated “CHG component” found in Ukraine_Eneolithic and Khvalynsk will not be the same (except the part mediated by the Novodanilovka expansion).
This ANE-like expansion happened probably in the Late Upper Palaeolithic, and reached Northern Europe probably after the expansion of the Villabruna cluster (ca. 12000 BC), judging by the advance of AG3-like and ENA-like ancestry in later WHG samples.
The population movements during the Mesolithic and Early Neolithic in the North Pontic area are quite complicated: the extra AME ancestry is probably connected to the admixture with populations from the Caucasus, while the close similarity of Ukraine populations with Scandinavian ones (with an increase in Villabruna ancestry from Mesolithic to Neolithic samples) probably reveals population movements related to the expansion of Maglemose-related groups.
These Maglemose-related groups were probably migrants from the north-west, originally from the Northern European Plains, who occupied the previous Swiderian territory, and then expanded into the North Pontic area. The overwhelming presence of I2a (likely all I2a2a1b1b) lineages in Ukraine Neolithic supports this migration.
The likely picture of Mesolithic-Neolithic migrations in the North Pontic area right now is then:
Expansion of R1a-M459 from the east ca. 12000 BC – probably coupled with AG3 and also some Baikal_EN ancestry. First sample is I1819 from Vasilievka (ca. 8700 BC), another is from Dereivka ca. 6900 BC.
Expansion of R1b-V88 from the Balkans in the west ca. 9700 BC, based on its TMRCA and also the Balkan hunter-gatherer population overwhelmingly of this haplogroup from the 10th millennium until the Neolithic. First sample is I1734 from Vasilievka (ca. 7252 BC), which suggests that it replaced the male population there, based on their similar EHG-like admixture (and lack of sizeable WHG increase), and shared mtDNA U5b2, U5a2.
Expansion of I2a-Y5606 probably ca. 6800 BC (based on its TMRCA) with the Janislawice culture. Supporting this is the increase in WHG contribution to Neolithic samples, including the spread of U4 subclades compared to the previous period.
Expansion of R1a-M17 starting probably ca. 6600 BC in the east (see above).
NOTE. The first sample of haplogroup I appears in the Mesolithic: I1763 (ca. 8100 BC) of haplogroup I2a1, probably related to an older Upper Palaeolithic expansion.
It is becoming more and more clear with each new paper that – unless the number of very ancient samples increases – the use of Y-chromosome haplogroups remains one of the most important tools for academics; this is especially so in the steppes, in light of the diversity found in populations from the Caucasus. A clear example comes from the Yamna – Corded Ware similarities:
The presence of haplogroups Q and R1a-M459 (xM17) in Khvalynsk along with a R1b1a sample, which some interpreted as being akin to modern ‘mixed’ populations in the past, is likely to point instead to a period of Khvalynsk-Novodanilovka expansion with R1b-M269, where different small populations from the steppe were being integrated into the common Khvalynsk stock, but where differences are seen in material culture surrounding their burials, as supported by the finding of R1b1 in the Kuban area already in the first half of the 5th millennium. The case would be similar to the early ‘mixed’ Icelandic population.
Only after the emergence of the Samara culture (in the second half of the 6th millennium BC), with a sample of haplogroup R1b1a, does the obvious connection with Early Proto-Indo-Europeans start; and only after the appearance of late Sredni Stog and haplogroup R1a-M417 (ca. 4000 BC) is its connection with Uralic also clear. In previous population movements, I think more haplogroups were involved in migrations of small groups, and only some communities among them were eventually successful, expanding to be dominant, creating ever growing cultures during their expansions.
Indeed, if you think in terms of Uralic and Indo-European just as converging languages, and forget their potential genetic connection, then the genetic + linguistic picture becomes simplified, and the upper frontier of the 6th millennium BC with a division North Pontic (Mariupol) vs. Volga-Ural (Samara) is enough. However, tracing their movements backwards – with cultural expansions from west to east (with the expansion of farming), and earlier east to west (with hunter-gatherer pottery), and still earlier west to east (with the north-eastern technocomplex), offers an interesting way to prove their potential connection to macrofamilies, at least in terms of population movements.
I am quite convinced right now that it would be possible to connect the expansion of R1b-L754 subclades with a speculative Nostratic (given the R1b-V88 connection with Afroasiatic, and the obvious connection of R1b-P297 with Eurasiatic). Paradoxically, the connection of an Indo-Uralic community in the steppes (after the separation of Yukaghir) with any lineage expansion (R1a-M17, R1b-M269, or even Q, I or J1) seems somehow blurrier than one year ago, possibly just because there are too many open possibilities.
David Reich says about the admixture with Neanderthals, which he helped discover:
At the conclusion of the Neanderthal genome project, I am still amazed by the surprises we encountered. Having found the first evidence of interbreeding between Neanderthals and modern humans, I continue to have nightmares that the finding is some kind of mistake. But the data are sternly consistent: the evidence for Neanderthal interbreeding turns out to be everywhere. As we continue to do genetic work, we keep encountering more and more patterns that reflect the extraordinary impact this interbreeding has had on the genomes of people living today.
I think this is a shared feeling among many of us who have made proposals about anything, to fear that we have made a gross, evident mistake, and constantly look for flaws. However, it seems to me that geneticists are more preoccupied with being wrong in their developed statistical methods, in the theoretical models they are creating, and not so much about errors in the true ancient ethnolinguistic picture human population genetics is (at least in theory) concerned about. Their publications are, after all, constantly associating genetic finds with cultures and (whenever possible) languages, so this aspect of their research should not be taken lightly.
Seeing how David Anthony or Razib Khan (among many others) have changed their previously preferred migration models as new data was published, and they continue to be respected in their own fields, I guess we can be confident that professionals with integrity are going to accept whatever new picture appears. While I don’t think that genetic finds can change what we can reconstruct with comparative grammar, I am also ready to revise guesstimates and routes of expansion of certain dialects if R1a-Z645 is shown to have accompanied Late Proto-Indo-Europeans during their expansion with Yamna, and later integrated somehow with Corded Ware.
However, taking into account the obsession of some with an ancestral, uninterrupted R1a—Indo-European association, and the lack of actual political repercussion of Neanderthal admixture, I think the most common nightmare that all genetic researchers should be worried about is to keep inflating this “Yamnaya ancestry”-based hornet’s nest, which has been constantly stirred up for the past two years, by rejecting it – or, rather, specifying it into its true complex nature.
This succession of corrections and redefinitions, coupled with the distinct Y-DNA bottleneck of each steppe population, will eventually lead to a completely different ethnolinguistic picture of the Pontic-Caspian region during the Eneolithic, which is likely to eventually piss off not only reasonable academics stubbornly attached to the CWC-IE idea, but also a part of those interested in daydreaming about their patrilineal ancestors.
Sometimes it’s better to just rip off the band-aid once and for all…
Most mtDNA lineages found are characteristic of the early Neolithic farmers in south-eastern and central Europe of the Starčevo-Kőrös-Criş and LBK cultures. Haplogroups N1a, T2, J, K, and V, which are found in the Neolithic BKG, TRB, GAC and Early Bronze Age samples, are part of the mitochondrial ‘Neolithic package’ (which also includes haplogroups HV, V, and W) that was introduced to Europe with farmers migrating from Anatolia at the onset of the Neolithic [17, 31].
A noteworthy proportion of Mesolithic haplogroup U5 is also found among the individuals of the current study. The proportion of haplogroup U5 already present in the earliest of the analysed Neolithic groups from the examined area differs from the expected pattern of diversity of mtDNA lineages based on a previous archaeological view and on the aDNA findings from the neighbouring regions which were settled by post-Linear farmers similar to BKG at that time. A large proportion of Mesolithic haplogroups in late-Danubian farmers in Kuyavia was also shown in previous studies concerning BKG samples based on mtDNA only, although these frequencies were derived on the basis of very small sample sizes.
A significant genetic influence of HG populations persisted in this region at least until the Eneolithic/Early Bronze Age period, when steppe migrants arrived to central Europe. The presence of two outliers from the middle and late phases of the BKG in Kuyavia associated with typical Neolithic burial contexts provides evidence that hunter-farmer contacts were not restricted to the final period of this culture and were marked by various episodes of interaction between two societies with distinct cultural and subsistence differences.
The identification of both mitochondrial and Y-chromosome haplogroup lineages of Mesolithic provenance (U5 and I, respectively) in the BKG support the theory that both male and female hunter-gatherers became part of these Neolithic agricultural societies, as has been reported for similar cases from the Carpathian Basin, and the Balkans. The identification of an individual with WHG affinity, dated to ca. 4300 BCE, in a Middle Neolithic context within a BKG settlement, provides direct evidence for the regional existence of HG enclaves that persisted and coexisted at least for over 1000 years, from the arrival of the LBK farmers ca. 5400 BCE until ca. 4300 BCE, in proximity with Neolithic settlements, but without admixing with their inhabitants.
The analysis of two Late Neolithic cultures, the GAC and CWC, shows that steppe ancestry was present only among the CWC individuals analysed, and that the single GAC individual had more WHG ancestry than previous local Neolithic individuals. (…) The CWC’s affinity to WHG, however, contrasts with results from published CWC individuals that identified steppe ancestry related to Yamnaya as the major contributor to the CWC genomes, while here we report also substantial contributions from WHG that could relate to the late persistence of pockets of WHG populations, as supported by the admixture results of N42 and the finding of the 4300-year-old N22 HG individual. These results agree with archaeological theories that suggest that the CWC interaction with incoming steppe cultures was complex and that it varied by region.
About the analyzed CWC samples, it is remarkable that, even though they are somehow related to each other, they do not form a tight cluster. Also, their Y-DNA (I2a), and this:
When compared to previously published CWC data, our CWC group (not individuals) is genetically significantly closer to WHG than to steppe individuals (Z = −4.898), a result which is in contrast with those for CWC from Germany (Z = 2.336), Estonia (Z = 0.555), and Latvia (Z = 1.553).
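To make the contrast between those Z-scores explicit, the sketch below applies the common |Z| ≥ 3 significance convention to the four quoted values. The threshold, the labels, and the reading of positive Z as "closer to steppe" are assumptions for illustration; the quoted passage only states that the negative value means significantly closer to WHG.

```python
# Z-scores quoted above; the dictionary keys are descriptive labels only.
z_scores = {
    "CWC (this study)": -4.898,
    "CWC Germany": 2.336,
    "CWC Estonia": 0.555,
    "CWC Latvia": 1.553,
}

THRESHOLD = 3.0  # assumed convention, not taken from the quoted paper


def classify(z, threshold=THRESHOLD):
    """Label a Z-score under the assumed sign convention."""
    if z <= -threshold:
        return "significantly closer to WHG"
    if z >= threshold:
        return "significantly closer to steppe"
    return "not significant"


labels = {group: classify(z) for group, z in z_scores.items()}
for group, label in labels.items():
    print(f"{group}: Z = {z_scores[group]:+.3f} -> {label}")
```

Under this convention only the newly reported group crosses the threshold, while the German, Estonian, and Latvian values remain below it.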
Włodarczak (2017) talks about the CWC period in Poland after ca. 2600 BC as a time of emergence of an allochthonous population, marked by the rare graves of this area, showing infiltrations initially mainly from Lesser Poland, and later (after 2500 BC) from the western Baltic zone.
Since forest sub-Neolithic populations would have probably given more EHG to the typical CWC population, these samples support the resurge of ‘local’ pockets of GAC- or TRB-like groups with more WHG (and also Levant_Neolithic) ancestry.
The known presence of I2a2a1b lineages in GAC groups in Poland also supports this interpretation, and the subsistence of such pockets of pre-steppe-like populations is also seen with the same or similar lineages appearing in comparable ‘resurge’ events in Central Europe, e.g. in samples from the Únětice and Tumulus culture.
As for the Bronze Age sample, we have at last official confirmation of haplogroup R1a1a (sadly no subclade*) at the very beginning of the Trzciniec period, in a region between the western (Iwno) and eastern (Strzyżów) groups related to Mierzanowice. This has to be put in relation with the samples from the final Trzciniec period in the Baltic published in Mittnik et al. (2018).
EDIT (8 OCT 2018): More specific subclades have been published, including an R1a-Z280 lineage for the Bronze Age sample (see spreadsheet).
This confirms the early resurgence of R1a-Z645 (probably R1a-Z282) lineages at the core of the developing East European Bronze Age, a province of the European Bronze Age that emerged from evolving Bell Beaker groups in Poland.
I have no hope that the Balto-Slavic evolution through BBC Poland → Mierzanowice/Iwno → Trzciniec → Lusatian cultures will be confirmed any time soon, at least not until we have a complete trail of samples to follow all the way to the historic Slavs of the Prague culture. However, I do think that the current data on central-east Europe, together with the recent data we are receiving from north-east Europe and the Iranian steppes (at odds with the Indo-Slavonic alternative), support this model.
I guess that, in the end, similar to how the Yamna vs. Corded Ware question is being solved, the real route of expansion of Proto-Balto-Slavic (supposedly spoken ca. 1500-1000 BC) is probably going to be decided by the expansion of either R1a-M458 (from the west) or R1a-Z280 lineages (from the east), because the limited precision of the genetic data and analyses available today is going to show ‘modern Slavic’-like populations across the whole eastern half of Europe for the past 4,000 years…
The Middle Neolithic is known to mark the westward expansion of Comb Ware and related cultures in North-Eastern Europe.
Mathieson et al. (2017 and 2018) had this to say about the Middle Neolithic in the Baltic:
At Zvejnieki in Latvia, using 17 newly reported individuals and additional data for 5 previously reported [34] individuals, we observe a transition in hunter-gatherer-related ancestry that is opposite to that seen in Ukraine. We find that Mesolithic and Early Neolithic individuals (labelled ‘Latvia_HG’) associated with the Kunda and Narva cultures have ancestry that is intermediate between WHG (approximately 70%) and EHG (approximately 30%), consistent with previous reports [34–36] (Supplementary Table 3). We also detect a shift in ancestry between Early Neolithic individuals and those associated with the Middle Neolithic Comb Ware complex (labelled ‘Latvia_MN’), who have more EHG-related ancestry; we estimate that the ancestry of Latvia_MN individuals comprises 65% EHG-related ancestry, but two of the four individuals appear to be 100% EHG in principal component space (Fig. 1b).
Other samples and errors on Y-SNP calls
The truth is, this is another sample (Latvia_MN_dup.I4627.SG) from the same individual ZVEJ26.
There is another sample used for the analysis of ZVEJ26, with the same data as in Mathieson et al. (2018), i.e. better coverage, and Y-DNA R1b1a1a(xR1b1a1a2).
Most samples in the tables from Wang et al. (2018) seem to be classified correctly, as in previous papers, except for:
Blätterhöhle Cave sample from Lipson et al. (2017), wrongly classified (again) as R1b1a1a2a1a2a1b2 (I am surprised no R1b-autochthonous-continuity fan rushed to proclaim something based on this);
Mal’ta 1 sample from Raghavan et al. (2013) as R1b1a1a2;
Iron Gates HG, Schela Cladovei from Gonzalez Fortes (2017) as R1b1a1a2;
Oase1 from Fu (2015) as N1c1a;
samples from Skoglund et al. (2017) from Africa also wrongly classified as R1b1a1a2 and subclades.
It seems therefore that poor coverage (as measured by SNPs hit on autosomes) is the key common factor behind these Y-SNP calls, and the same holds for the duplicated Zvejnieki MN1 sample. Anyway, if all Y-SNP calls come from the same software applied to all data, and this is going to be used in future papers, this seems to be a great improvement compared to Narasimhan et al. (2018)…
EDIT (25 JUN 2018): I have been reviewing some more papers apart from Mathieson et al. (2018) and Olalde et al. (2018) to compare the reported haplogroups, and there seem to be many potential errors beyond those listed above (or updated data; it is sometimes difficult to say, especially when the newly reported haplogroup is just one or two subclades below the one reported in ‘old’ papers).
The sample accession number in the European Nucleotide Archive (ENA) is SAMEA45565168 (Latvia_MN1/ZVEJ26) (see here), in case anyone used to this kind of analysis wishes to repeat the Y-SNP calls on both samples.
EDIT (25 JUN 2018): Added that it is another sample with lesser coverage from the same ZVEJ26 individual.