The concept of “Outlier” in Human Ancestry (II): Early Khvalynsk, Sredni Stog, West Yamna, Iron Age Bulgaria, Potapovka, Andronovo…


I already wrote about the concept of outlier in Human Ancestry, so I am not going to repeat myself. This is just an update of “outliers” in recent studies, and their potential origins (here I will repeat some of the examples):

Early Khvalynsk: the three samples from the Samara region have quite different positions in PCA, from nearest to EHG (of Y-DNA haplogroup R1a) to nearest to ANE ancestry (of Y-DNA haplogroup Q). This could represent the initial consequences of the second wave of ANE ancestry (as found later in Yamna samples from a neighbouring region), possibly brought then by Eurasian migrants related to haplogroup Q.
With only 3 samples, this is obviously just a tentative explanation of the finds. The samples can only be reasonably said to show an unstable time for the region in terms of admixture (i.e. probably migration), judging by the data on PCA.

Ukraine Eneolithic samples offer a curious example of how the concept of outlier can change radically: from the third version (May 30th) of the preprint paper of Mathieson et al. (2017), when the Ukraine Eneolithic sample with steppe ancestry (and clustering with central European samples) was the ‘outlier’, to the fourth version (September 19th), when two samples with steppe ancestry clustering close to Corded Ware samples were now the ‘normal’ ones (i.e. those representing Ukraine Eneolithic population), and the outlier was the one clustering closely with Ukraine Mesolithic samples…

PCA and Admixture for south-eastern Europe. Image modified from Mathieson et al. (2017) – Third revision (May 30th), used in the 2nd edition of the Indo-European demic diffusion model.

This is one of the funny consequences of the wrong interpretation of the ‘yamnaya component’, which made geneticists believe at first that, out of two samples (!), the ‘outlier’ was the one with ‘yamnaya’ ancestry, because this component would have been brought by an eastern immigrant from early Khvalynsk…

This example offers yet another reason why precise anthropological context is necessary to offer the right interpretation of results. Within the Indo-European demic diffusion model – based mainly on Archaeology and Linguistics – , the sample with steppe ancestry was the most logical find in the region for a potential origin of the Corded Ware culture, and it was interpreted as such, well before the publication of the fourth version of Mathieson et al. (2017).

PCA of South-East European and other European samples. Image modified from Mathieson et al. (2017) – Fourth revision (September 19th), used in the 3rd edition of the Indo-European demic diffusion model.

West Yamna (to insist on the same question, the ‘yamnaya’ component): we have only four western Yamna samples, two of them showing Anatolian Neolithic ancestry (one of them, from Ukraine, with a strong ‘southern’ drift). On the other hand, Corded Ware migrants do not show this. So we could infer that their migrations were not coetaneous: whereas peoples of the Corded Ware culture expanded ca. 3300 BC to the north, in the natural corridor to the Baltic that has been proposed for this culture in Archaeology for decades (and that is well represented by Ukraine Eneolithic samples), peoples of the Yamna culture expanded to the west, replacing the Ukraine Eneolithic population (i.e. probably those of ‘Proto-Corded Ware culture’), and eventually mixing with Balkan populations of Anatolian Neolithic ancestry.

Potapovka, Andronovo, and Srubna: while Potapovka clusters closely to the steppe, and Andronovo (like Sintashta) clusters closely to Corded Ware (i.e. Ukraine Neolithic / Central-East European), both have certain ‘outliers’ in PCA: the former has one individual clustering closely to Corded Ware, and the latter to the steppe. Both ‘outliers’ fit well with the interpretation of the recent mixture of Corded Ware peoples with steppe populations, and they offer a different image for the evolution of populations of Potapovka and Sintashta-Petrovka, potentially influencing their language. The position of Srubna samples, nearer to Sintashta and Andronovo (but occupying the same territory as the previous Potapovka) offers the image of a late westward conquest from Corded Ware-related populations.

Diachronic map of migrations ca. 2250-1750 BC

Iron Age Bulgaria: a sample of haplogroup R1a-Z93, with more ‘yamnaya’ ancestry than any other previous sample from the Balkans. For some, it might mean continuity from an older time. However, as with the Corded Ware outlier from Esperstedt before it, it is more likely a recent migrant from the steppe. The most likely origin of this individual is therefore people from the steppe, i.e. either the Srubna culture or a related group. Its relatively close clustering in PCA with certain recent Slavic populations can be interpreted in light of the multiple back and forth migrations in the region: of steppe populations to the west (Srubna, Cimmerians, Scythians, Sarmatians, …), and of Slavic-speaking populations:

Diachronic map of Bronze Age migrations ca. 1750-1250 BC.

Well-defined outliers are, therefore, essential to understand a recent history of admixture. On the other hand, the very concept of “outlier” can be a dangerous tool when the lack of enough samples makes their classification as such unjustified, leading to the wrong interpretations.
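
The point about sample size can be made concrete with a small sketch. It is only an illustration under stated assumptions: the PC1 coordinates are invented, and a z-score on a single PCA axis is a simplification chosen here, not the method used in any of the cited papers. It relies on a known property of the z-score: in a sample of size n, no point can lie more than (n - 1) / sqrt(n) sample standard deviations from the mean.

```python
import math
import statistics


def max_abs_z(values):
    """Return the largest absolute z-score, using the sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # n - 1 in the denominator
    return max(abs(v - mean) / sd for v in values)


def z_ceiling(n):
    """Largest |z| any single point can reach in a sample of size n."""
    return (n - 1) / math.sqrt(n)


# Hypothetical PC1 coordinates, invented for illustration (not real data).
three_samples = [0.010, 0.012, 0.250]

for n in (2, 3, 10, 30):
    print(f"n={n:>2}  ceiling on |z| = {z_ceiling(n):.3f}")

print(f"observed max |z| for three samples = {max_abs_z(three_samples):.3f}")
```

With two samples the ceiling is about 0.71, and with three it is about 1.15, so a conventional threshold such as |z| > 2 can never be reached, however far apart the individuals plot. Calling one of two or three samples the ‘outlier’ is therefore a judgement made from context, not something the data alone can support.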


Review article about Ancient Genomics, by Pontus Skoglund and Iain Mathieson


A preprint article by two of the most prolific researchers in Human Ancestry is out, and they request feedback: Ancient genomics: a new view into human prehistory and evolution, by Skoglund and Mathieson (2017). Right now, it is downloadable on Dropbox.


The first decade of ancient genomics has revolutionized the study of human prehistory and evolution. We review new insights based on ancient genomic data, including greatly increased resolution of the timing and structure of the out-of-Africa event, the diversification of present-day non-African populations, and the earliest expansions of those populations into Eurasia and America. Prehistoric genomes now document patterns of population continuity and change on every inhabited continent–in particular the effect of agricultural expansions in Africa, Europe and Oceania–and record a history of natural selection that shapes present-day phenotypic diversity. Despite these advances, much remains unknown, in particular about the genomic histories of Asia–the most populous continent, and Africa–the continent that contains the most genetic diversity. Ancient genomes from these and other regions, integrated with a growing understanding of the genomic basis of human phenotypic diversity, will be in focus during the next decade of research in the field.

The paper may be highly recommended as an introduction for anyone interested in the field of Human Ancestry in general.

However, its short summary of steppe ancestry expansion (where the Corded Ware culture predominates) is still reminiscent of the infamous “Yamnaya -> Corded Ware -> Bell Beaker” model set forth by the 2015 Nature articles on the subject, and Kristiansen’s Indo-European Corded Ware theory.

Here is an excerpt (emphasis mine):

The next substantial change is closely related to ancestry that by around 5000 BP extended over a region of more than 2000 miles of the Eurasian steppe, including in individuals associated with the Yamnaya Cultural Complex in far-eastern Europe (1; 38) and with the Afanasievo culture in the central Asian Altai mountains (1). This “steppe” ancestry is itself a mixture between ancestry that is related to Mesolithic hunter-gatherers of eastern Europe and ancestry that is related to both present-day populations (38) and Mesolithic hunter-gatherers (46) from the Caucasus mountains, and also to the populations of Neolithic (11), and Copper Age (56) Iran. Steppe ancestry appeared in southeastern Europe by 6000 BP (72), northeastern Europe around 5000 BP (47) and central Europe at the time of the Corded Ware Complex around 4600 BP (1; 38). These dates are reasonably tight constraints, because in each case there is no evidence of steppe ancestry in individuals immediately preceding these dates (47; 72). Gene flow on the steppe was extensive and bidirectional, as shown by the eastward flow of Anatolian Neolithic ancestry– reaching well into central Eurasia by the time of the Andronovo culture ~3500 BP (1)–and the westward flow of East Asian ancestry–found in individuals associated with the Iron Age Scythian culture close to the Black Sea ~2500 BP (143).

Copper and Bronze Age population movements (14; 78 Martiniano, 2017 #8761; 85; 112), as well as later movements in the Iron Age and Historical period (70; 119) further distributed steppe ancestry around Europe. Present-day western European populations can be modeled as mixtures of these three ancestry components (Mesolithic hunter-gatherer, Anatolian Neolithic and Steppe) (38; 57). In eastern Europe, further shifts in ancestry are the result of additional or distinct gene flow from Anatolia throughout the Neolithic and Bronze Age in the Aegean (42; 51; 55; 72; 87), and gene flow from Siberian-related populations in Finland and the Baltic region (38). East-west gene flow also brought new ancestry–related to populations from Copper Age Iran–to the Levant during the Copper and Bronze ages (39; 56).

The geographic structure of these population transformations gave rise to population structure of present-day Europe. For example Anatolian Neolithic ancestry is highest in southern European populations like Sardinians, and lowest in northern European populations (38). Steppe ancestry is at high frequency in north-central Europeans and low in the south. Isolation-by-distance may have contributed to these patterns to some extent, but the contribution must have been small. In much of Europe, extreme population discontinuity was the norm.

Featured image: from the article, “Major Holocene population movements and expansions that have been demonstrated using ancient DNA.”


New Ukraine Eneolithic sample from late Sredni Stog, near homeland of the Corded Ware culture


Just one day after publishing the draft of the Indo-European demic diffusion model, 3rd version, Mathieson et al. (2017) have updated some information in a new version of their article, including a new interesting sample from late Sredni Stog. It gives support to what I predicted, regarding the potential origin of the third Corded Ware horizon.

After my first version, findings in Olalde et al. (2017) and Mathieson et al. (2017) supported some of my predictions. Now, after my third, their new data also support another prediction, because the model is based on solid linguistic and archaeological models. Here is an excerpt from the Indo-European demic diffusion model, 3rd ed. (pp. 55-56):

At the end of the Trypillian culture, herding/hunting trends intensified, and the agricultural system collapsed, with people moving to the steppe zone, as confirmed by the presence of numerous graves to the south (Rassamakin 1999). At the same time, the Trypillian world absorbed a foreign tradition related to materials of settlement sites of the Dnieper steppes – such as the late Sredni Stog culture –, like cord impressions and burial rites similar to the later Corded Ware culture, marking also the transformation of decors and changes in their interpretation (Palaguta 2007).

The similarity in burial rituals between Yamna and Corded Ware made Gimbutas define a common “Kurgan people”, whose relationship has also been long supported by Kristiansen (Kristiansen 1989; Kristiansen et al. 2017). An equivalence of both burial rites has been, however, rejected (Häusler 1963, 1978, 1983), and it is generally agreed that the Yamna culture did not expand to the north of the Tisza River.

The importance of horse exploitation in Deriivka, in the forest-steppe zone of the north Pontic region along the Dnieper region, during the Middle Eneolithic period (probably ca. 3700-3530 BC), suggests that horses played a significant role in the life of this Sredni Stog community (Anthony and Brown 2003). In its late period (ca. 4000-3500 BC), this culture had adopted corded ware pottery, and stone battle-axes.

However, this [sic] western steppe peoples were mainly hunters (Rassamakin 1999), and the ‘herding skill’ essential for wild horse domestication seems absent (Kuzmina 2003). All this has been confirmed with zooarchaeological evidence and new molecular and stable isotope results, suggesting an absence of horse domestication in territories of the late Sredni Stog culture in the north Pontic steppe (Mileto et al. 2017), before the advent of migrants from the Indo-European-speaking Repin culture.

The new sample described in Mathieson et al. (2017), dated ca. 4200 BC (but within a wide range, 5000-3500 BC) is from a site classified as of late Sredni Stog (although potentially from Post-Mariupol / Kvitjana), a culture of hunters who probably did not breed domesticated horses (even after the period of conquest and dominance of Suvorovo-Novodanilovka chiefs, from Indo-Hittite-speaking early Khvalynsk, who had domesticated horses), and – more importantly – is of R1a-M417 lineage, shows high so-called “Yamna component” in ADMIXTURE, and clusters among Corded Ware samples in PCA approximately a thousand years before this culture’s expansion. Information from the supplementary material:

An Eneolithic cemetery of the Sredny Stog II culture was excavated by D. Telegin in 1955-1957 near the village of Alexandria, Kupyansk district, Kharkov region on the left bank of the river Oskol. A total of 33 individuals were recovered. Based on craniometric analysis (I.Potekhina 1999) it was suggested that the Eneolithic inhabitants of Alexandria were not homogeneous and resulted from admixture of local Neolithic hunter-gatherers and early farmers, possibly Trypillian groups. We report genetic data from one individual: I6561

PCA, Admixture of Ukraine Eneolithic and other samples from Mathieson et al. (2017)

Another individual from Eneolithic Ukraine (of R1b1 xM269 lineage) clusters quite closely with Neolithic samples from the Baltic, which points to the strong connection between both – southern and northern – regions of east-central Europe before the period of great Chalcolithic expansions, and the potential origin of the spread of R1b (xM269) lineages with the Corded Ware culture.

The so-called ‘Yamna component’ – an infamous name which, as expected, is turning out to be very wrong – has been found quite elevated in this sample, previous and completely unrelated to the Yamna culture and its expansion, and similar to the (later) Corded Ware samples. In fact, we are seeing that Corded Ware samples are actually clustering closely with east-central Europe (excluding the CWC outlier), and not with Yamna and other Indo-European-speaking steppe cultures.

‘Yamna component’ (in yellow) in the North-West Pontic Steppe and the Balkans, including Eneolithic Ukraine samples

It will be fun to see the mess that certain researchers have made (and will still make in the near future) of their findings coupled with the concept of “Yamna component”, when trying to describe the “proxy ancestral populations” of European Copper Age and Bronze Age cultures… Difficult times ahead for many, after the collapse of the simplistic Yamna -> Corded Ware -> Bell Beaker genetic model laid out since Haak et al. (2015) and Allentoft et al. (2015).

[EDIT 27 September 2017] Not directly related, but here is today’s interesting discussion on Twitter surrounding the ancestral populations of the “Yamnaya component”, for illustration of the discussions to come when this ancestry is divided into different, more precise, older (Neolithic) steppe components, and these in turn shown to contribute to different European and Asian Chalcolithic and Bronze Age cultures:

Given the variance found in the three samples from Eneolithic Ukraine (comparable to the variance found in east Bell Beaker samples), we may now be getting closer to the precise territory and culture where the Corded Ware culture might have formed, which cannot be much further from the Dnieper-Dniester region before the Yamna expansion to the west ca. 3300 BC, judging from the elevated steppe component.

It seems, because of the proximity of both cultures and the similar dates of their migrations, that the westward expansion of the Yamna culture may have indeed provided an important push (among some strong ‘pull’ forces) for the expansion of the peoples of the Corded Ware culture.

Diachronic map of Eneolithic migrations in the Caucasus ca. 4000-3100 BC

It keeps being demonstrated that archaeologists like Anthony, Heyd, Mallory, etc. were right where others tried to interpret admixture based on few samples and their own imagination, without any knowledge (or interest) whatsoever about Archaeology or Linguistics.

So Genetics reinforces the solidest models of Archaeology and Linguistics? Professional academics being mostly right in their careful research, and amateur geneticists playing with software being wrong? Who would have thought… More and more papers help thus shut up naysayers who state (again and again) that new algorithms are here to revolutionise these academic fields.

As Heyd predicted more than 10 years ago, and as many pointed out in terms of linguistic influence (like Mallory or Prescott), the transformation of Yamna settlers into the east Bell Beaker culture, and this culture’s spread into western and northern Europe, must be noticed in genetic investigation.

The expansion of peoples is known to be associated with the spread of a certain admixture component + the expansion and reduction in variability of a haplogroup (i.e. few male lineages are usually more successful during the expansion): Neolithic farmers from the Middle East expanding with haplogroup G2a; Natufian component (Levant hunter-gatherers or later, Neolithic farmers) and haplogroup E southward into Africa; CHG component expansion with haplogroup J; WHG expansion into east Europe with haplogroup R1b; etc.
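
The ‘reduction in variability of a haplogroup’ can be quantified in several ways. A minimal sketch, assuming Nei's unbiased gene diversity, H = n/(n-1) · (1 - Σ pᵢ²), as the measure (the text above does not name one), and using invented haplogroup calls rather than real samples:

```python
from collections import Counter


def haplogroup_diversity(calls):
    """Nei's unbiased gene diversity for a list of haplogroup labels."""
    n = len(calls)
    if n < 2:
        raise ValueError("need at least two samples")
    freqs = [count / n for count in Counter(calls).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


# Hypothetical Y-DNA calls, invented for illustration (not real data).
before_expansion = ["R1b", "R1a", "I2", "G2a", "J", "R1b", "I2", "Q"]
after_expansion = ["R1b"] * 7 + ["I2"]

print(f"before: H = {haplogroup_diversity(before_expansion):.3f}")
print(f"after:  H = {haplogroup_diversity(after_expansion):.3f}")
```

A value near 1 means that two randomly drawn individuals almost always carry different lineages; a value near 0 means that one lineage dominates, which is the signature expected when few male lineages are more successful during an expansion. With sample sizes as small as those discussed here, such estimates are very noisy.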

There were (at least) two main expansion processes involving Proto-Indo-European: one causing the branching off of the language ancestral to Anatolian, and another during the spread of Late Indo-European dialects. Based on this, and on known archaeological models, I have predicted since the first version of the demic diffusion model:

  • Based on haplogroups found until then in Yamna (R1b-M269), Corded Ware (R1a-M417, especially Z645), and Bell Beaker (R1b-L151):
    • that mainly R1b-L23 (especially L51) lineages and more steppe admixture would be found in east Bell Beaker – confirmed some two months after my publication by Olalde et al. (2017);
    • and that mainly R1a-M417 (especially Z645) subclades will be found in Corded Ware samples.
  • Based on the finding of “Yamna component” in the Corded Ware culture: that this admixture must have come from somewhere else. I pointed to eastern Europe, including the forest and forest-steppe zone, especially in the natural continuum of the Dniester-Dnieper region. Especially after Mathieson et al. (2017), in my second and third versions of the model, I have more specifically suggested a southern origin in the region, nearer to where the CHG ancestry must have come from (the Caucasus and cultures formed in contact with it), according to mainstream archaeological data, i.e. cultures of the North Pontic steppe / forest-steppe. But of course, until more samples are available, more CHG ancestry in other cultures of the Forest Zone cannot be discarded.

For the vast majority of academics, more samples (regionally proportioned) are needed only from early Corded Ware, as we have from Bell Beaker: if they are (as expected) mostly R1a-M417, then everything is clear, and it will finally mean the end for the tiring, now almost ‘traditional’ association R1a – Proto-Indo-European. Some more samples from the potential homeland of the third Corded Ware horizon, most likely Ukraine (Podolia and Volynia regions), nearer to the time of the Corded Ware expansion, would also be great, to locate the actual ancestral population of Corded Ware migrants – recognisable by the main presence of haplogroup R1a-Z645 (formed ca. 3500 BC), and elevated “Yamna component” before the arrival of the Yamna culture…

If, however, early Corded Ware samples of R1b-L23 subclades are found in certain quantity, especially old samples from east-central Europe (excluding Yamna migrants along the Prut), the tricky question of Late Indo-European cultural diffusion will remain: Did Corded Ware peoples adopt a Late Indo-European language from clans of R1b-L23 lineages? That is what Kristiansen and Anthony have been betting on, a cultural diffusion, caused by:

  • A long-lasting contact, according to Kristiansen (1989,…,2017). He argues that Sredni Stog adopted the language – but obviously not the same culture – from the east, but that it is a genetic and cultural mix from Globular Amphora, Trypillia, and steppe cultures. This has been Kristiansen’s model for almost 30 years, and it follows Marija Gimbutas’ outdated theory of the “Kurgan people”.
  • A rapid change according to Anthony (2007). He associates the adoption of Pre-Germanic with the domination of Yamna chiefs over Usatovo people, and the adoption of Balto-Slavic by the people from (Corded Ware) Middle Dnieper group because of the technical superiority of neighbouring Yamna herders.

Linguistics, with the growing support of a North-West Indo-European group, points clearly to a European expansion of a community speaking the ancestral language of Italo-Celtic, Germanic, and probably Balto-Slavic. Archaeology, too, showed migration from Yamna only to south-eastern Europe (correcting Gimbutas’ Kurgan model) and later with east Bell Beaker mainly into central, western, and northern Europe.

Even Kristiansen admits that only after the arrival of Bell Beaker in Scandinavia was a linguistic community (i.e. Germanic) formed – although he places the center of gravity in Úněticean influence, and (yet again) a cultural diffusion event into the Danish Dagger period.

Because of more and more data contrasting with old theories, some have elected to develop weak, indemonstrable links, to keep supporting e.g. Gimbutas’ concept of “Kurgan people” in Archaeology, and a sudden, early expansion of all PIE dialects at once in Linguistics. It seems that, after so much fuss about the (misleading) ‘Yamna component’ concept, and so many far-fetched assumptions by amateur geneticists, the Corded Ware connection will once again hinge on weak, indemonstrable cultural diffusion theories, be it ‘Kurgan peoples’ (including now, of course, Eneolithic cultures of Ukraine) or any culture from eastern Europe that will reveal some close samples to Corded Ware migrants, in terms of PCA, ADMIXTURE, or haplogroup.

So once we find mainly R1a-Z645 in more Corded Ware samples (and this haplogroup and more “Yamna component” in non-Yamna cultures of Eneolithic Ukraine, and potentially Poland or Belarus) we all may finally expect a peaceful acceptance of reality, at least in Genetics? Nope. No siree. Nein. Not then, not ever.

Why? Because some people want their paternal lineage to have lived in their historical region, and spoken their historical language, since time immemorial. It won’t matter if Archaeology, Linguistics, Genetics, etc. don’t support their claims: if they need to use some aspects of admixture, or haplogroups (or a combination of them) from carefully selected samples instead of looking at the whole picture; if they have to support that Indo-Europeans came from a culture different than Yamna, in- or outside of the steppe or forest-steppe, be it the Balkans, Anatolia, Armenia, or the Moon; if their proto-language should then come directly from Indo-Hittite, or from a Germano-Slavonic, or Indo-Slavonic, or Indo-Germanic group, or whatever invented dialectal branch necessary to fit their model, or if they have to support the ‘constellation analogy’ of Clackson, or thousands of years of development for each branch; etc. They will support whatever is necessary.

And this adaptation, obviously, has no end. It’s stupid, I know. But that’s how we are, how we think. We have seen that these sad trends continue no matter what, for decades, and not only regarding Indo-European. Some common examples include:

  • Indo-Aryan-speaking Indians defending an autochthonous origin of R1a and Indo-European; as well as the ‘opposite’ autochthonous continuity theory of Dravidian-speaking Indians (based on ASI ancestry, haplogroup R2, mtDNA haplogroup M, or whatever is at hand).
  • Western Europeans defending an autochthonous origin of the R1b haplogroup, with a Palaeolithic or Mesolithic origin, including the language, viz. the recent Indo-European from the Atlantic façade theories (in the Celtic from the West series, by Koch and Cunliffe); the now fading Palaeolithic Continuity Theory; and many other forgotten Eurocentric proposals; as well as the more recent informal hints of a central European/Balkan homeland based on the Villabruna cluster and south-eastern Mesolithic finds, which is at risk of being related to a Balkan origin of Proto-Indo-European.
    • There is also the ‘opposite’ theory of the autochthonous origin of the Basques, including Proto-Iberians and potentially other peoples like Paleo-Sardinians, based on the previously popular Vasconic-Uralic hypothesis (and an ancient Europe divided into R1b and N1c1 haplogroups), which is still widely believed in certain regions.
  • Finno-Ugric speakers of N1c1 lineage defending an autochthonous origin of the lineage and language in eastern Europe.
  • Nordic speakers supporting the autochthonous nature of Germanic and haplogroup I1 to Scandinavia.
  • Armenian speakers delighted to see a proposal of Indo-European homeland in the Armenian highlands, be it supported by glottalic consonants, CHG ancestry, R1b (xM269) or J lineages…
  • Greek speakers now willing to support continuity of haplogroup J as a ‘native’ Greek lineage, of people speaking Proto-Greek (and in earlier times PIE), because of two Minoan, and one Mycenaean samples found in Lazaridis et al. (2017).
  • Even Turks linking Yamna with the expansion of Turkic languages. That one is fun to read, almost like a parody for the rest – substituting “Indo-European” for “Turkic”.
  • For years, a lot of people – me included (at least since 2005) – believed, because of modern maps of R1a distribution, that R1a and Corded Ware are the vector of Indo-European languages. For those of us who don’t have any personal or national tie with this haplogroup, this notion has been easy to change with new data. For others, it obviously isn’t, and it won’t be.
Modern R1a distribution in Eurasia (Wikipedia, 2008). The typical, simplistic map we relied on in the 2000s to derive wrong conclusions based on Genetics, conclusions which are sadly still alive and kicking…

For all these people, a sample, result, or conclusion from any paper, just dubiously in favour, means everything, but a thousand against mean nothing, or can be reinterpreted to support their fantasies.

The “Kossinnian autochthonous continuity” crap permeates this relatively new subfield of Human Evolutionary Genetics, as it permeated Indo-European studies (first Linguistics, then Archaeology) in its infancy. It seems to be a generalised human trend, no doubt related to some absurd inferiority complex, mixed with historical romanticism, a certain degree of chauvinism, and (falling in the eternal Godwin’s Law of our field) some outdated, childish notion of ‘supremacy’ linked with the expansion of one’s own language and people.

Such simplistic and popular models are also lucrative, judging by the boom in demand for DNA analysis, which companies embellish with modern fortune tellers (or fortune tellers themselves sell for a price), promising to ascertain your ‘ancestry proportions’ using automated algorithms, so that you don’t have to get lost in complex genetic data and prehistoric accounts, which can’t help you define your “ethnicity”…

Some just don’t want to realize that the spread of prehistoric languages (like Late Indo-European dialects) was a complex, non-uniform, stepped process, devoid of modern romantic concepts, which in genetic terms necessarily included later founder effects and cultural diffusions, so that no one can trace their haplogroup, lineage, family, region, or country to any single culture, language, or ethnic group. The same, by the way, can be said of peoples and countries in historic times.

As I said before, we shall expect supporters of the Kurgan model (and thus the expansion of R1a-Z645 with Yamna) to wait for just one sample of R1a-M417 in Yamna and/or Bell Beaker (which will eventually be found), and just one sample of R1b-M269 in Corded Ware (which will also eventually be found), to blow the horn of victory in this naïve competition against time, general knowledge, and (essentially) themselves.

A sad consequence of how we are is that, because of the obvious influence of these stupid modern ethnolinguistic agendas, because we are not all rowing in the same direction, genetic results and conclusions are still perceived as far-fetched and labile, and thus most archaeologists and linguists prefer not to include genetic results in their investigation. And those who dare to do so are badly counselled by those who go with the tide, so that their papers become almost instantly outdated.


Iberian Peninsula: Discontinuity in mtDNA between hunter-gatherers and farmers, not so much during the Chalcolithic and EBA


A new preprint paper at bioRxiv, The maternal genetic make-up of the Iberian Peninsula between the Neolithic and the Early Bronze Age, by Szécsényi-Nagy et al. (2017).


Agriculture first reached the Iberian Peninsula around 5700 BCE. However, little is known about the genetic structure and changes of prehistoric populations in different geographic areas of Iberia. In our study, we focused on the maternal genetic makeup of the Neolithic (~ 5500-3000 BCE), Chalcolithic (~ 3000-2200 BCE) and Early Bronze Age (~ 2200-1500 BCE). We report ancient mitochondrial DNA results of 213 individuals (151 HVS-I sequences) from the northeast, central, southeast and southwest regions and thus on the largest archaeogenetic dataset from the Peninsula to date. Similar to other parts of Europe, we observe a discontinuity between hunter-gatherers and the first farmers of the Neolithic. During the subsequent periods, we detect regional continuity of Early Neolithic lineages across Iberia, however the genetic contribution of hunter-gatherers is generally higher than in other parts of Europe and varies regionally. In contrast to ancient DNA findings from Central Europe, we do not observe a major turnover in the mtDNA record of the Iberian Late Chalcolithic and Early Bronze Age, suggesting that the population history of the Iberian Peninsula is distinct in character.

Iberian mtDNA samples

Detailed conclusions of their work:

The present study, based on 213 new and 125 published mtDNA data of prehistoric Iberian individuals suggests a more complex mode of interaction between local hunter-gatherers and incoming early farmers during the Early and Middle Neolithic of the Iberian Peninsula, as compared to Central Europe. A characteristic of Iberian population dynamics is the proportion of autochthonous hunter-gatherer haplogroups, which increased in relation to the distance to the Mediterranean coast. In contrast, the early farmers in Central Europe showed comparatively little admixture of contemporaneous hunter-gatherer groups. Already during the first centuries of Neolithic transition in Iberia, we observe a mix of female DNA lineages of different origins. Earlier hunter-gatherer haplogroups were found together with a variety of new lineages, which ultimately derive from Near Eastern farming groups. On the other hand, some early Neolithic sites in northeast Iberia, especially the early group from the cave site of Els Trocs in the central Pyrenees, seem to exhibit affinities to Central European LBK communities. The diversity of female lineages in the Iberian communities continued even during the Chalcolithic, when populations became more homogeneous, indicating higher mobility and admixture across different geographic regions. Even though the sample size available for Early Bronze Age populations is still limited, especially with regards to El Argar groups, we observe no significant changes to the mitochondrial DNA pool until the end of our time transect (1500 BCE). The expansion of groups from the eastern steppe, which profoundly impacted Late Neolithic and EBA groups of Central and North Europe, cannot (yet) be seen in the contemporaneous population substrate of the Iberian Peninsula at the present level of genetic resolution. 
This highlights the distinct character of the Neolithic transition both in the Iberian Peninsula and elsewhere and emphasizes the need for further in depth archaeogenetic studies for reconstructing the close reciprocal relationship of genetic and cultural processes on the population level.

So it seems more and more likely that the North-West Indo-European invasion of Iberia during the Copper Age (signaled by changes in Y-DNA lineages) was, unlike in central Europe, not accompanied by much mtDNA turnover. What that means – either a male-dominated invasion, or a longer internal evolution of invasive Y-DNA subclades – remains to be seen, but I am still more inclined to see the former as the most likely interpretation, in spite of admixture results.


Featured images: from the article, licensed BY-NC-ND.

Analysis of R1b-DF27 haplogroups in modern populations adds new information that contrasts with ‘steppe admixture’ results


New open access article published in Scientific Reports, Analysis of the R1b-DF27 haplogroup shows that a large fraction of Iberian Y-chromosome lineages originated recently in situ, by Solé-Morata et al. (2017).


Haplogroup R1b-M269 comprises most Western European Y chromosomes; of its main branches, R1b-DF27 is by far the least known, and it appears to be highly prevalent only in Iberia. We have genotyped 1072 R1b-DF27 chromosomes for six additional SNPs and 17 Y-STRs in population samples from Spain, Portugal and France in order to further characterize this lineage and, in particular, to ascertain the time and place where it originated, as well as its subsequent dynamics. We found that R1b-DF27 is present in frequencies ~40% in Iberian populations and up to 70% in Basques, but it drops quickly to 6–20% in France. Overall, the age of R1b-DF27 is estimated at ~4,200 years ago, at the transition between the Neolithic and the Bronze Age, when the Y chromosome landscape of W Europe was thoroughly remodeled. In spite of its high frequency in Basques, Y-STR internal diversity of R1b-DF27 is lower there, and results in more recent age estimates; NE Iberia is the most likely place of origin of DF27. Subhaplogroup frequencies within R1b-DF27 are geographically structured, and show domains that are reminiscent of the pre-Roman Celtic/Iberian division, or of the medieval Christian kingdoms.

Some people like to say that Y-DNA haplogroup analysis, or phylogeography in general, is of no use anymore (especially modern phylogeography), and they are content to see how ‘steppe admixture’ was (or even is) distributed in Europe to draw conclusions about ancient languages and their expansion. With each new paper, we are seeing the advantages of analysing ancient and modern haplogroups in ascertaining population movements.

Quite recently there was a suggestion based on steppe admixture that Basque-speaking Iberians resisted the invasion from the steppe. Observing the results of this article (dates of expansion and demographic data) we see a clear expansion of Y-DNA haplogroups precisely by the time of Bell Beaker expansion from the east. Y-DNA haplogroups of ancient samples from Portugal point exactly to the same conclusion.

The situation of R1b-DF27 in Basques, as I have pointed out elsewhere, is then probably similar to the genetic drift of Finns, who are mainly of N1c lineages but speak today a Uralic language that expanded with Corded Ware and R1a subclades.

The recent article on Mycenaean and Minoan genetics also showed that, when it comes to Europe, most of the demographic patterns we see in admixture are reminiscent of the previous situation; only rarely can we see a clear change in admixture (which would mean an important, sudden replacement of the previous population).

Equating the so-called steppe admixture with Indo-European languages is wrong. Period.

The following are excerpts from the article (emphasis is mine):

Dates and expansions

The average STR variance of DF27 and each subhaplogroup is presented in Suppl. Table 2. As expected, internal diversity was higher in the deeper, older branches of the phylogeny. If the same diversity was divided by population, the most salient finding is that native Basques (Table 2) have a lower diversity than other populations, which contrasts with the fact that DF27 is notably more frequent in Basques than elsewhere in Iberia (Suppl. Table 1). Diversity can also be measured as pairwise differences distributions (Fig. 5). The distribution of mean pairwise differences within Z195 sits practically on top of that of DF27; L176.2 and Z220 have similar distributions, as M167 and Z278 have as well; finally, M153 shows the lowest pairwise distribution values. This pattern is likely to reflect the respective ages of the haplogroups, which we have estimated by a modified, weighted version of the ρ statistic (see Methods).
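The two diversity measures mentioned in this excerpt can be illustrated with a minimal sketch. The haplotypes below are made-up repeat counts at three hypothetical loci, not data from the article, and the use of population variance is an assumption, since the excerpt does not say which estimator the authors applied.

```python
from itertools import combinations
from statistics import mean, pvariance

# Hypothetical toy haplotypes: repeat counts at three invented Y-STR loci.
# These are NOT data from Solé-Morata et al. (2017).
haplotypes = [
    (13, 24, 14),
    (13, 25, 14),
    (14, 24, 14),
    (13, 24, 15),
]


def average_str_variance(haps):
    """Mean across loci of the variance in repeat counts.

    Population variance is an assumption made for this sketch.
    """
    loci = list(zip(*haps))  # one tuple of repeat counts per locus
    return mean(pvariance(locus) for locus in loci)


def mean_pairwise_differences(haps):
    """Mean number of loci at which two haplotypes differ, over all pairs."""
    pairs = list(combinations(haps, 2))
    return mean(sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs)


print(average_str_variance(haplotypes))       # 0.1875
print(mean_pairwise_differences(haplotypes))  # 1.5
```

A younger or more drifted lineage would be expected to show lower values on both measures, which is the pattern the authors report for Basques.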

Z195 seems to have appeared almost simultaneously within DF27, since its estimated age is actually older (4570 ± 140 ya). Of the two branches stemming from Z195, L176.2 seems to be slightly younger than Z220 (2960 ± 230 ya vs. 3320 ± 200 ya), although the confidence intervals slightly overlap. M167 is clearly younger, at 2600 ± 250 ya, a similar age to that of Z278 (2740 ± 270 ya). Finally, M153 is estimated to have appeared just 1930 ± 470 ya.
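As a rough check on how these age estimates relate to each other, the following sketch treats each "estimate ± error" as a simple symmetric interval. That reading is an assumption: the excerpt does not say whether the ± values are standard errors or confidence bounds, and the helper names are invented for illustration.

```python
def interval(estimate, error):
    """Symmetric interval around an age estimate (years ago)."""
    return (estimate - error, estimate + error)


def overlaps(a, b):
    """True if two closed intervals share at least one point."""
    return max(a[0], b[0]) <= min(a[1], b[1])


# Age estimates quoted in the excerpt above (years ago).
ages = {
    "Z195": interval(4570, 140),
    "L176.2": interval(2960, 230),
    "Z220": interval(3320, 200),
    "M167": interval(2600, 250),
    "Z278": interval(2740, 270),
    "M153": interval(1930, 470),
}

print(overlaps(ages["L176.2"], ages["Z220"]))  # True
print(overlaps(ages["Z220"], ages["M153"]))    # False
```

Under this reading, L176.2 (2730–3190 ya) and Z220 (3120–3520 ya) share only a narrow band, consistent with the "slightly overlap" wording of the article.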

Haplogroup ages can also be estimated within each population, although they should be interpreted with caution (see Discussion). For the whole of DF27, (Table 3), the highest estimate was in Aragon (4530 ± 700 ya), and the lowest in France (3430 ± 520 ya); it was 3930 ± 310 ya in Basques. Z195 was apparently oldest in Catalonia (4580 ± 240 ya), and with France (3450 ± 269 ya) and the Basques (3260 ± 198 ya) having lower estimates. On the contrary, in the Z220 branch, the oldest estimates appear in North-Central Spain (3720 ± 313 ya for Z220, 3420 ± 349 ya for Z278). The Basques always produce lower estimates, even for M153, which is almost absent elsewhere.

Simplified phylogenetic tree of the R1b-M269 haplogroup. SNPs in italics were not analyzed in this manuscript.


The median value for Tstart has been estimated at 103 generations (Table 4), with a 95% highest probability density (HPD) range of 50–287 generations; effective population size increased from 131 (95% HPD: 100–370) to 72,811 (95% HPD: 52,522–95,334). Considering patrilineal generation times of 30–35 years, our results indicate that R1b-DF27 started its expansion ~3,000–3,500 ya, shortly after its TMRCA.
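The conversion from generations to calendar years in this excerpt is simple arithmetic, sketched below. The function name is hypothetical; the generation counts and the 30–35 year generation times are those quoted above.

```python
def generations_to_years(generations, years_per_generation):
    """Convert a number of generations into years."""
    return generations * years_per_generation


MEDIAN_TSTART = 103      # median start of expansion, in generations
HPD_RANGE = (50, 287)    # 95% HPD range, in generations
GENERATION_TIMES = (30, 35)

# Median start of expansion under each generation time.
median_years = [generations_to_years(MEDIAN_TSTART, g) for g in GENERATION_TIMES]
print(median_years)  # [3090, 3605]

# Widest span implied by the HPD range: shortest generation time on the
# lower bound, longest on the upper bound.
hpd_years = (
    generations_to_years(HPD_RANGE[0], GENERATION_TIMES[0]),
    generations_to_years(HPD_RANGE[1], GENERATION_TIMES[1]),
)
print(hpd_years)  # (1500, 10045)
```

The median therefore corresponds to roughly 3,100–3,600 years ago, which the authors round to ~3,000–3,500 ya, while the HPD range spans a far wider window.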

As a reference, we applied the same analysis to the whole of R1b-S116, as well as to other common haplogroups such as G2a, I2, and J2a. Interestingly, all four haplogroups showed clear evidence of an expansion (p > 0.99 in all cases), all of them starting at the same time, ~50 generations ago (Table 4), and with similar estimated initial and final populations. Thus, these four haplogroups point to a common population expansion, even though I2 (TMRCA, weighted ρ, 7,800 ya) and J2a (TMRCA, 5,500 ya) are older than R1b-DF27. It is worth noting that the expansion of these haplogroups happened after the TMRCA of R1b-DF27.

Principal component analysis of STR haplotypes. (a) Colored by subhaplogroup, (b) colored by population. Larger squares represent subhaplogroup or population centroids.

Sum up and discussion

We have characterized the geographical distribution and phylogenetic structure of haplogroup R1b-DF27 in W. Europe, particularly in Iberia, where it reaches its highest frequencies (40–70%). The age of this haplogroup appears clear: with independent samples (our samples vs. the 1000 genome project dataset) and independent methods (variation in 15 STRs vs. whole Y-chromosome sequences), the age of R1b-DF27 is firmly grounded around 4000–4500 ya, which coincides with the population upheaval in W. Europe at the transition between the Neolithic and the Bronze Age. Before this period, R1b-M269 was rare in the ancient DNA record, and during it the current frequencies were rapidly reached. It is also one of the haplogroups (along with its daughter clades, R1b-U106 and R1b-S116) with a sequence structure that shows signs of a population explosion or burst. STR diversity in our dataset is much more compatible with population growth than with stationarity, as shown by the ABC results, but, contrary to other haplogroups such as the whole of R1b-S116, G2a, I2 or J2a, the start of this growth is closer to the TMRCA of the haplogroup. Although the median time for the start of the expansion is older in R1b-DF27 than in other haplogroups, and could suggest the action of a different demographic process, all HPD intervals broadly overlap, and thus, a common demographic history may have affected the whole of the Y chromosome diversity in Iberia. The HPD intervals encompass a broad timeframe, and could reflect the post-Neolithic population expansions from the Bronze Age to the Roman Empire.

While when R1b-DF27 appeared seems clear, where it originated may be more difficult to pinpoint. If we extrapolated directly from haplogroup frequencies, then R1b-DF27 would have originated in the Basque Country; however, for R1b-DF27 and most of its subhaplogroups, internal diversity measures and age estimates are lower in Basques than in any other population. Then, the high frequencies of R1b-DF27 among Basques could be better explained by drift rather than by a local origin (except for the case of M153; see below), which could also have decreased the internal diversity of R1b-DF27 among Basques. An origin of R1b-DF27 outside the Iberian Peninsula could also be contemplated, and could mirror the external origin of R1b-M269, even if it reaches there its highest frequencies. However, the search for an external origin would be limited to France and Great Britain; R1b-DF27 seems to be rare or absent elsewhere: Y-STR data are available only for France, and point to a lower diversity and more recent ages than in Iberia (Table 3). Unlike in Basques, drift in a traditionally closed population seems an unlikely explanation for this pattern, and therefore, it does not seem probable that R1b-DF27 originated in France. Then, a local origin in Iberia seems the most plausible hypothesis. Within Iberia, Aragon shows the highest diversity and age estimates for R1b-DF27, Z195, and the L176.2 branch, although, given the small sample size, any conclusion should be taken cautiously. On the contrary, Z220 and Z278 are estimated to be older in North Central Spain (N Castile, Cantabria and Asturias). Finally, M153 is almost restricted to the Basque Country: it is rarely present at frequencies >1% elsewhere in Spain (although see the cases of Alacant, Andalusia and Madrid, Suppl. Table 1), and it was found at higher frequencies (10–17%) in several Basque regions; a local origin seems plausible, but, given the scarcity of M153 chromosomes outside of the Basque Country, the diversity and age values cannot be compared.

Within its range, R1b-DF27 shows some geographical differentiation: Western Iberia (particularly, Asturias and Portugal), with low frequencies of R1b-Z195 derived chromosomes and relatively high values of R1b-DF27* (xZ195); North Central Spain is characterized by relatively high frequencies of the Z220 branch compared to the L176.2 branch; the latter is more abundant in Eastern Iberia. Taken together, these observations seem to match the East-West patterning that has occurred at least twice in the history of Iberia: i) in pre-Roman times, with Celtic-speaking peoples occupying the center and west of the Iberian Peninsula, while the non-Indoeuropean eponymous Iberians settled the Mediterranean coast and hinterland; and ii) in the Middle Ages, when Christian kingdoms in the North expanded gradually southwards and occupied territories held by Muslim fiefs.

Contour maps of the derived allele frequencies of the SNPs analyzed in this manuscript. Population abbreviations as in Table 1. Maps were drawn with SURFER v. 12 (Golden Software, Golden CO, USA).

I wouldn’t trust the rarity or absence of R1b-DF27 outside Iberia, France and Great Britain in modern populations as proof that its origin must be in Western Europe – especially since we now have ancient DNA, which might prove that assertion quite wrong – but aside from that the article seems solid in its analysis of modern populations.


Text and figures from the article, licensed under a Creative Commons Attribution 4.0 International License. To view a copy of this license, visit