I also noticed after publishing the draft that I had used, although not voluntarily, the wording “Corded Ware outlier” at least once. I certainly had that term in mind when developing the third version, but I did not intend to write it down formally. Nevertheless, I think it is the right name to use.
Outlier in Statistics, as you can infer from the name, is a sample (more precisely an observation) that lies distant to others. It is a slippery concept in Human Evolutionary Biology, because it has no clear definition, and it is thus dependent on a certain degree of subjective evaluation. It seems to be mainly based on a combination of PCA and ADMIXTURE analyses, but should obviously be dependent on the number of samples available for a certain culture, and the regional distribution of the samples available.
We have thus certain clear cases, like the Poltavka outlier, of R1a-M417 lineage, clustering close to Corded Ware (and Sintashta, and Potapovka) samples, but far from other R1b-L23 samples from Poltavka or Yamna cultures, from neighbouring regions in the steppe.
We have also less clear observations, like Balkan Chalcolithic samples, which may or may not have been part of different cultural groups (say, related to the Suvorovo-Novodanilovka expansion, or not), which may justify their differences in ancestral components in ADMIXTURE, and in their position in PCA.
And we have a Yamna sample from western Ukraine, which – unlike the other two available samples – clusters “to the south” of east Yamna samples. Taking into account the Yamna sample from Bulgaria, clustering closely with south-eastern European samples, could you really call this an outlier? Two outliers out of four western Yamna samples? Well, maybe. If you take east and west Yamna from the steppe as a whole, and exclude the Yamna sample from Bulgaria, of course you can. Whether that classification is useful, or actually hinders a proper interpretation of western Yamna samples, and of the “Yamna component” seen in them, is a different story…
But what then about the Corded Ware sample from Erperstedt labelled I0104, dated ca. 2430 BC, which clusters among contemporaneous steppe (Poltavka) samples, and has the greatest proportion of ‘Yamna component’ in ADMIXTURE? After all, it is different in both respects from any other Corded Ware individual – including the oldest samples available, from Latvia (ca. 2885 BC) and Tiefbrunn (ca. 2755 BC).
This sample is one of the direct links between the steppe and Corded Ware in late times, and has been the main reason for the confusion a lot of people seem to have about the “Yamna component” in Corded Ware, with some supporting a direct migration from one into the other, and a few even daring to say that “Corded Ware is indistinguishable from Yamna”(!?).
The same family few generations later shows a decreased Yamna component, which clearly indicates that this individual’s admixture came directly from the steppe. That is compatible with the nomadic nature of the Corded Ware culture (and its known exogamy practices), which connected central Europe with the steppes, up to the North Caspian region.
If labelling other samples as outliers may be interesting to improve the conclusions one can obtain from genetic research, labelling this sample is, in my opinion, essential, to avoid certain strong misconceptions about the origin of the Corded Ware culture.
I have just uploaded the working draft of the third version of the Indo-European demic diffusion model. Unlike the previous two versions, which were published as essays (fully developed papers), this new version adds more information on human admixture, and probably needs important corrections before a definitive edition can be published.
The third version is available right now on ResearchGate and Academia.edu. I will post the PDF at Academia Prisca, as soon as possible:
Feel free to comment on the paper here, or (preferably) in our forum.
A working version (needing some corrections) divided by sections, illustrated with up-to-date, high resolution maps, can be found (as always) at the official collaborative Wiki website indo-european.info.
Genetic and archaeological studies have established a sub-Saharan African origin for anatomically modern humans with subsequent migrations out of Africa. Using the largest multi-locus data set known to date, we investigated genetic differentiation of early modern humans, human admixture and migration events, and relationships among ancestries and language groups. We compiled publicly available genome-wide genotype data on 5,966 individuals from 282 global samples, representing 30 primary language families. The best evidence supports 21 ancestries that delineate genetic structure of present-day human populations. Independent of self-identified ethno-linguistic labels, the vast majority (97.3%) of individuals have mixed ancestry, with evidence of multiple ancestries in 96.8% of samples and on all continents. The data indicate that continents, ethno-linguistic groups, races, ethnicities, and individuals all show substantial ancestral heterogeneity. We estimated correlation coefficients ranging from 0.522 to 0.962 between ancestries and language families or branches. Ancestry data support the grouping of Kwadi-Khoe, Kx’a, and Tuu languages, support the exclusion of Omotic languages from the Afroasiatic language family, and do not support the proposed Dené-Yeniseian language family as a genetically valid grouping. Ancestry data yield insight into a deeper past than linguistic data can, while linguistic data provide clarity to ancestry data.
Regarding European ancestry:
Southern European ancestry correlates with both Italic and Basque speakers (r = 0.764, p = 6.34 × 10−49). Northern European ancestry correlates with Germanic and Balto-Slavic branches of the Indo-European language family as well as Finno-Ugric and Mordvinic languages of the Uralic family (r = 0.672, p = 4.67 × 10−34). Italic, Germanic, and Balto-Slavic are all branches of the Indo-European language family, while the correlation with languages of the Uralic family is consistent with an ancient migration event from Northern Asia into Northern Europe. Kalash ancestry is widely spread but is the majority ancestry only in the Kalash people (Table S3). The Kalasha language is classified within the Indo-Iranian branch of the Indo-European language family.
Sure, admixture analysis came to save the day. Yet again. Now it’s not just Archaeology related to language anymore, it’s Linguistics; all modern languages and their classification, no less. Because why the hell not? Why would anyone study languages, history, archaeology, etc. when you can run certain algorithms on free datasets of modern populations to explain everything?
What I am criticising here, as always, is not the study per se, its methods (PCA, the use of Admixture or any other tools), or its results, which might be quite interesting – even regarding the origin or position of certain languages (or more precisely their speakers) within their linguistic groups; it’s the many broad, unsupported, striking conclusions (read the article if you want to see more wishful thinking).
This is obviously simplistic citebait – that benefits only journals and authors, and it is therefore tacitly encouraged -, but not knowledge, because it is not supported by any linguistic or archaeological data or expertise.
Is anyone with a minimum knowledge of languages, or general anthropology, actually reviewing these articles?
Agriculture first reached the Iberian Peninsula around 5700 BCE. However, little is known about the genetic structure and changes of prehistoric populations in different geographic areas of Iberia. In our study, we focused on the maternal genetic makeup of the Neolithic (~ 5500-3000 BCE), Chalcolithic (~ 3000-2200 BCE) and Early Bronze Age (~ 2200-1500 BCE). We report ancient mitochondrial DNA results of 213 individuals (151 HVS-I sequences) from the northeast, central, southeast and southwest regions and thus on the largest archaeogenetic dataset from the Peninsula to date. Similar to other parts of Europe, we observe a discontinuity between hunter-gatherers and the first farmers of the Neolithic. During the subsequent periods, we detect regional continuity of Early Neolithic lineages across Iberia, however the genetic contribution of hunter-gatherers is generally higher than in other parts of Europe and varies regionally. In contrast to ancient DNA findings from Central Europe, we do not observe a major turnover in the mtDNA record of the Iberian Late Chalcolithic and Early Bronze Age, suggesting that the population history of the Iberian Peninsula is distinct in character.
Detailed conclusions of their work,
The present study, based on 213 new and 125 published mtDNA data of prehistoric Iberian individuals suggests a more complex mode of interaction between local hunter-gatherers and incoming early farmers during the Early and Middle Neolithic of the Iberian Peninsula, as compared to Central Europe. A characteristic of Iberian population dynamics is the proportion of autochthonous hunter-gatherer haplogroups, which increased in relation to the distance to the Mediterranean coast. In contrast, the early farmers in Central Europe showed comparatively little admixture of contemporaneous hunter-gatherer groups. Already during the first centuries of Neolithic transition in Iberia, we observe a mix of female DNA lineages of different origins. Earlier hunter-gatherer haplogroups were found together with a variety of new lineages, which ultimately derive from Near Eastern farming groups. On the other hand, some early Neolithic sites in northeast Iberia, especially the early group from the cave site of Els Trocs in the central Pyrenees, seem to exhibit affinities to Central European LBK communities. The diversity of female lineages in the Iberian communities continued even during the Chalcolithic, when populations became more homogeneous, indicating higher mobility and admixture across different geographic regions. Even though the sample size available for Early Bronze Age populations is still limited, especially with regards to El Argar groups, we observe no significant changes to the mitochondrial DNA pool until the end of our time transect (1500 BCE). The expansion of groups from the eastern steppe, which profoundly impacted Late Neolithic and EBA groups of Central and North Europe, cannot (yet) be seen in the contemporaneous population substrate of the Iberian Peninsula at the present level of genetic resolution. This highlights the distinct character of the Neolithic transition both in the Iberian Peninsula and elsewhere and emphasizes the need for further in depth archaeogenetic studies for reconstructing the close reciprocal relationship of genetic and cultural processes on the population level.
So it seems more and more likely that the North-West Indo-European invasion during the Copper Age (signaled by changes in Y-DNA lineages) was not, as in central Europe, accompanied by much mtDNA turnover. What that means – either a male-dominated invasion, or a longer internal evolution of invasive Y-DNA subclades – remains to bee seen, but I am still more inclined to see the former as the most likely interpretation, in spite of admixture results.
Observable patterns of cultural variation are consistently intertwined with demic movements, cultural diffusion, and adaptation to different ecological contexts [Cavalli-Sforza and Feldman (1981) Cultural Transmission and Evolution: A Quantitative Approach; Boyd and Richerson (1985) Culture and the Evolutionary Process]. The quantitative study of gene–culture coevolution has focused in particular on the mechanisms responsible for change in frequency and attributes of cultural traits, the spread of cultural information through demic and cultural diffusion, and detecting relationships between genetic and cultural lineages. Here, we make use of worldwide whole-genome sequences [Pagani et al. (2016) Nature 538:238–242] to assess the impact of processes involving population movement and replacement on cultural diversity, focusing on the variability observed in folktale traditions (n = 596) [Uther (2004) The Types of International Folktales: A Classification and Bibliography. Based on the System of Antti Aarne and Stith Thompson] in Eurasia. We find that a model of cultural diffusion predicted by isolation-by-distance alone is not sufficient to explain the observed patterns, especially at small spatial scales (up to ~4,000 km). We also provide an empirical approach to infer presence and impact of ethnolinguistic barriers preventing the unbiased transmission of both genetic and cultural information. After correcting for the effect of ethnolinguistic boundaries, we show that, of the alternative models that we propose, the one entailing cultural diffusion biased by linguistic differences is the most plausible. Additionally, we identify 15 tales that are more likely to be predominantly transmitted through population movement and replacement and locate putative focal areas for a set of tales that are spread worldwide.
I am very interested in folktales and their origins within Proto-Indo-European culture, so the title alone was an immediate click-bait for me. It did, as always, disappoint in its methods and conclusions, but just the idea it proposes is of great interest for future studies.
There are gross limitations in assessing folktales using simply the Aarne-Thompson-Uther Classification without further analysis or explanation, apart from a summary of tales in the supplementary materials.
But their maps and simplistic hypothesized waves of diffusion (‘African origin’, ‘northern Eurasian’, ‘Eastern European’, or ‘Middle-Eastern/Caucasian’) seem to me as if they try to swim with the tide of the current literature regarding the identification of Proto-Indo-European demic diffusion with “steppe admixture” distribution (and ancient language family diffusion in general through admixture), and as such it can only be wrong.
If you just look at actual folktale distribution (black dots) and compare them with prehistoric cultures and ancient Y-DNA distribution, you realize their maps don’t make much sense, and more complex methods (and a clearer idea of what admixture represents) are needed.
If their intention was to get published in a journal of high impact factor, they succeeded, so good for them. I am glad this subject gets more attention. Of course, their conclusions are kept formally in line with the many limitations of their methods, and are the most interesting aspect of the article:
By correcting for the presence of ethnolinguistic barriers, we find that the null model of cultural diffusion predicted by IBD alone cannot explain the observed distribution of folktales across Eurasia. Instead, beyond ~4,000 km, cultural diffusion biased by linguistic barriers exhibits the highest correlation at all geographic bins. At small geographic bins (<4,000 km), population movements and linguistic barriers may be more relevant than geographic proximity, pointing once again at the possible importance of small-scale processes of cultural transmission for testing more specific hypotheses when using genetic evidence. In addition, processes other than simple cultural diffusion may be more relevant for a smaller group of tales shared by pairs of populations that are genetically closer than populations not exhibiting those tales. Looking for smaller packages of tales or individual tales and their variants can be useful to shed light on the formation process of this vast body of popular knowledge. The long-range patterns detected by our analyses may complement this picture by suggesting a more ancient origin of some of these folktales (SI Appendix). On a broader level, these results can be used in the future to infer directional trends of cultural dispersal as well as to test for the emergence of systematic social biases [such as prestige bias, conformism/anticonformism, heterophily, and content-dependent biases] or cultural barriers different from linguistic ones, which have a chronology that may be independently ascertained.
If you are interested in studies about folktales, and especially those related to Indo-European traditions, you can check out the following articles I found interesting in the past:
Featured image (featured also in the article): Possible focal area and dispersion pattern for tale ATU313 “The Magic Flight,” one the most popular folktales in this dataset, which may have been additionally spread through population movement and replacement. It is interesting to note how this tale reached locations that are far from its putative origin (such as Japan and southeastern Africa), whereas it was not retained by many populations located in between (gray dots).
Haplogroup R1b-M269 comprises most Western European Y chromosomes; of its main branches, R1b-DF27 is by far the least known, and it appears to be highly prevalent only in Iberia. We have genotyped 1072 R1b-DF27 chromosomes for six additional SNPs and 17 Y-STRs in population samples from Spain, Portugal and France in order to further characterize this lineage and, in particular, to ascertain the time and place where it originated, as well as its subsequent dynamics. We found that R1b-DF27 is present in frequencies ~40% in Iberian populations and up to 70% in Basques, but it drops quickly to 6–20% in France. Overall, the age of R1b-DF27 is estimated at ~4,200 years ago, at the transition between the Neolithic and the Bronze Age, when the Y chromosome landscape of W Europe was thoroughly remodeled. In spite of its high frequency in Basques, Y-STR internal diversity of R1b-DF27 is lower there, and results in more recent age estimates; NE Iberia is the most likely place of origin of DF27. Subhaplogroup frequencies within R1b-DF27 are geographically structured, and show domains that are reminiscent of the pre-Roman Celtic/Iberian division, or of the medieval Christian kingdoms.
Some people like to say that Y-DNA haplogroup analysis, or phylogeography in general, is of no use anymore (especially modern phylogeography), and they are content to see how ‘steppe admixture’ was (or even is) distributed in Europe to draw conclusions about ancient languages and their expansion. With each new paper, we are seeing the advantages of analysing ancient and modern haplogroups in ascertaining population movements.
Quite recently there was a suggestion based on steppe admixture that Basque-speaking Iberians resisted the invasion from the steppe. Observing the results of this article (dates of expansion and demographic data) we see a clear expansion of Y-DNA haplogroups precisely by the time of Bell Beaker expansion from the east. Y-DNA haplogroups of ancient samples from Portugal point exactly to the same conclusion.
The recent article on Mycenaean and Minoan genetics also showed that, when it comes to Europe, most of the demographic patterns we see in admixture are reminiscent of the previous situation, only rarely can we see a clear change in admixture (which would mean an important, sudden replacement of the previous population).
The following are excerpts from the article (emphasis is mine):
Dates and expansions
The average STR variance of DF27 and each subhaplogroup is presented in Suppl. Table 2. As expected, internal diversity was higher in the deeper, older branches of the phylogeny. If the same diversity was divided by population, the most salient finding is that native Basques (Table 2) have a lower diversity than other populations, which contrasts with the fact that DF27 is notably more frequent in Basques than elsewhere in Iberia (Suppl. Table 1). Diversity can also be measured as pairwise differences distributions (Fig. 5). The distribution of mean pairwise differences within Z195 sits practically on top of that of DF27; L176.2 and Z220 have similar distributions, as M167 and Z278 have as well; finally, M153 shows the lowest pairwise distribution values. This pattern is likely to reflect the respective ages of the haplogroups, which we have estimated by a modified, weighted version of the ρ statistic (see Methods).
Z195 seems to have appeared almost simultaneously within DF27, since its estimated age is actually older (4570 ± 140 ya). Of the two branches stemming from Z195, L176.2 seems to be slightly younger than Z220 (2960 ± 230 ya vs. 3320 ± 200 ya), although the confidence intervals slightly overlap. M167 is clearly younger, at 2600 ± 250 ya, a similar age to that of Z278 (2740 ± 270 ya). Finally, M153 is estimated to have appeared just 1930 ± 470 ya.
Haplogroup ages can also be estimated within each population, although they should be interpreted with caution (see Discussion). For the whole of DF27, (Table 3), the highest estimate was in Aragon (4530 ± 700 ya), and the lowest in France (3430 ± 520 ya); it was 3930 ± 310 ya in Basques. Z195 was apparently oldest in Catalonia (4580 ± 240 ya), and with France (3450 ± 269 ya) and the Basques (3260 ± 198 ya) having lower estimates. On the contrary, in the Z220 branch, the oldest estimates appear in North-Central Spain (3720 ± 313 ya for Z220, 3420 ± 349 ya for Z278). The Basques always produce lower estimates, even for M153, which is almost absent elsewhere.
The median value for Tstart has been estimated at 103 generations (Table 4), with a 95% highest probability density (HPD) range of 50–287 generations; effective population size increased from 131 (95% HPD: 100–370) to 72,811 (95% HPD: 52,522–95,334). Considering patrilineal generation times of 30–35 years, our results indicate that R1b-DF27 started its expansion ~3,000–3,500 ya, shortly after its TMRCA.
As a reference, we applied the same analysis to the whole of R1b-S116, as well as to other common haplogroups such as G2a, I2, and J2a. Interestingly, all four haplogroups showed clear evidence of an expansion (p > 0.99 in all cases), all of them starting at the same time, ~50 generations ago (Table 4), and with similar estimated initial and final populations. Thus, these four haplogroups point to a common population expansion, even though I2 (TMRCA, weighted ρ, 7,800 ya) and J2a (TMRCA, 5,500 ya) are older than R1b-DF27. It is worth noting that the expansion of these haplogroups happened after the TMRCA of R1b-DF27.
Sum up and discussion
We have characterized the geographical distribution and phylogenetic structure of haplogroup R1b-DF27 in W. Europe, particularly in Iberia, where it reaches its highest frequencies (40–70%). The age of this haplogroup appears clear: with independent samples (our samples vs. the 1000 genome project dataset) and independent methods (variation in 15 STRs vs. whole Y-chromosome sequences), the age of R1b-DF27 is firmly grounded around 4000–4500 ya, which coincides with the population upheaval in W. Europe at the transition between the Neolithic and the Bronze Age. Before this period, R1b-M269 was rare in the ancient DNA record, and during it the current frequencies were rapidly reached. It is also one of the haplogroups (along with its daughter clades, R1b-U106 and R1b-S116) with a sequence structure that shows signs of a population explosion or burst. STR diversity in our dataset is much more compatible with population growth than with stationarity, as shown by the ABC results, but, contrary to other haplogroups such as the whole of R1b-S116, G2a, I2 or J2a, the start of this growth is closer to the TMRCA of the haplogroup. Although the median time for the start of the expansion is older in R1b-DF27 than in other haplogroups, and could suggest the action of a different demographic process, all HPD intervals broadly overlap, and thus, a common demographic history may have affected the whole of the Y chromosome diversity in Iberia. The HPD intervals encompass a broad timeframe, and could reflect the post-Neolithic population expansions from the Bronze Age to the Roman Empire.
While when R1b-DF27 appeared seems clear, where it originated may be more difficult to pinpoint. If we extrapolated directly from haplogroup frequencies, then R1b-DF27 would have originated in the Basque Country; however, for R1b-DF27 and most of its subhaplogroups, internal diversity measures and age estimates are lower in Basques than in any other population. Then, the high frequencies of R1b-DF27 among Basques could be better explained by drift rather than by a local origin (except for the case of M153; see below), which could also have decreased the internal diversity of R1b-DF27 among Basques. An origin of R1b-DF27 outside the Iberian Peninsula could also be contemplated, and could mirror the external origin of R1b-M269, even if it reaches there its highest frequencies. However, the search for an external origin would be limited to France and Great Britain; R1b-DF27 seems to be rare or absent elsewhere: Y-STR data are available only for France, and point to a lower diversity and more recent ages than in Iberia (Table 3). Unlike in Basques, drift in a traditionally closed population seems an unlikely explanation for this pattern, and therefore, it does not seem probable that R1b-DF27 originated in France. Then, a local origin in Iberia seems the most plausible hypothesis. Within Iberia, Aragon shows the highest diversity and age estimates for R1b-DF27, Z195, and the L176.2 branch, although, given the small sample size, any conclusion should be taken cautiously. On the contrary, Z220 and Z278 are estimated to be older in North Central Spain (N Castile, Cantabria and Asturias). Finally, M153 is almost restricted to the Basque Country: it is rarely present at frequencies >1% elsewhere in Spain (although see the cases of Alacant, Andalusia and Madrid, Suppl. Table 1), and it was found at higher frequencies (10–17%) in several Basque regions; a local origin seems plausible, but, given the scarcity of M153 chromosomes outside of the Basque Country, the diversity and age values cannot be compared.
Within its range, R1b-DF27 shows same geographical differentiation: Western Iberia (particularly, Asturias and Portugal), with low frequencies of R1b-Z195 derived chromosomes and relatively high values of R1b-DF27* (xZ195); North Central Spain is characterized by relatively high frequencies of the Z220 branch compared to the L176.2 branch; the latter is more abundant in Eastern Iberia. Taken together, these observations seem to match the East-West patterning that has occurred at least twice in the history of Iberia: i) in pre-Roman times, with Celtic-speaking peoples occupying the center and west of the Iberian Peninsula, while the non-Indoeuropean eponymous Iberians settled the Mediterranean coast and hinterland; and ii) in the Middle Ages, when Christian kingdoms in the North expanded gradually southwards and occupied territories held by Muslim fiefs.
I wouldn’t trust the absence of R1b-DF27 outside France as a proof that its origin must be in Western Europe – especially since we have ancient DNA, and that assertion might prove quite wrong – but aside from that the article seems solid in its analysis of modern populations.