Yekaterinovsky Cape, a link between the Samara culture and early Khvalynsk


We already had conflicting information about the elite individual from the Yekaterinovsky Cape and the materials of his grave, which seemed quite old:

For burial 45, a 14C date was obtained in the laboratory of the University of Pennsylvania: PSUAMS-2880 (Sample ID 16068; >30 kDa gelatin; Russia. 12, Ekaterinovka Grave 45), 14C age 6325 ± 25 BP, δ13C –23.6‰, δ15N 14.5‰. The results of dating suggest chronological proximity with typologically close materials from Yasinovatsky and Nikolsky burial grounds (Telegini et al. 2001: 126). The date obtained also precedes the existing dates for the Khvalynsk culture (Morgunova 2009: 14–15), which, given the dominance of Mariupol traits of the burial rite and inventory, confirms its validity. However, the date obtained for human bones does not exclude the possibility of a “reservoir effect”, whereby the apparent age can increase by three or more centuries (Shishlin et al. 2006: 135–140).

Now the same date is being confirmed by the latest study published on the site, by Korolev, Kochkina, and Stashenkov (2019), and it seems the site really is going to be old. Abstract (in part the official one, in part newly translated for clarity):

For the first time, pottery of the Early Eneolithic burial ground Ekaterinovsky Cape is published. Ceramics were predominantly located on the sacrificial sites in the form of compact clusters of fragments. As a rule, such clusters were located above the burials, sometimes over the burials, some were sprinkled with ocher. The authors have identified more than 70 vessels, some of which have been partially reconstructed. Ceramic was made with inclusion of the crushed shell into molding mass. The rims of vessels had the thickened «collar»; the bottoms had a rounded shape. The ornament was located on the rims and the upper part of the potteries. Fully decorated vessels are rare. The vessels are ornamented with prints of comb and rope stamps, with small pits. A particularity of ceramics ornamentation is presented by the imprints of soft stamps (leather?) or traces of leather form for the making of vessels. The ornamentation, made up of «walking comb» and incised lines, was used rarely as well as the belts of pits made decoration under «collar» of a rim. Some features of the ceramics decoration under study relate it with ceramics of the Khvalynsk culture. The ceramics of Ekaterinovsky Cape burial ground is attributed by the authors to the Samara culture. The ceramic complex under study has proximity to the ceramics from Syezzhe burial ground and the ceramics of the second phase of Samara culture. The chronological position is determined by the authors as a later period than the ceramics from the Syezzhe burial ground, and earlier than the chronological position of ceramics of the Ivanovka stage of the Samara culture and the Khvalynsk culture.

Ceramics from Ekaterinovsky Cape burial ground. 1–2, 4–5, 7–11 – ceramics from aggregations; 3, 6 – ceramics from the cultural layer.

More specifically:

Based on ceramic fragments from a large vessel from a cluster of sq.m. 14, the date received was SPb-2251: 5673 ± 120 BP. The second date was obtained from fragments from the aggregation [see picture above] from the cluster of sq.m. 45–46, SPb-2252: 6372 ± 100 BP. The difference in dating indicates that the process of determining the chronology of the burial ground is far from complete, although we note that the earlier date almost coincided with the date obtained from the human bone from individual 45 (Korolev, Kochkina, Stashenkov, 2018, p. 300).
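As a rough check on how close these three determinations are, the uncalibrated ages can be compared directly. The sketch below is a simplification and not the authors' procedure: it treats the quoted errors as independent 1σ values, ignores calibration and any reservoir offset, and the function name `age_difference` is made up for illustration.

```python
import math

def age_difference(age_a, sd_a, age_b, sd_b):
    """Compare two uncalibrated 14C ages (BP), assuming independent 1-sigma errors.

    Returns the difference in years, its combined error, and the
    difference expressed in units of that error (a z-score).
    """
    diff = age_a - age_b
    combined_sd = math.sqrt(sd_a ** 2 + sd_b ** 2)
    return diff, combined_sd, diff / combined_sd

# Ages as quoted in the text (BP, error taken as 1 sigma).
spb_2251 = (5673, 120)     # pottery, cluster of sq.m. 14
spb_2252 = (6372, 100)     # pottery, cluster of sq.m. 45-46
psuams_2880 = (6325, 25)   # human bone, grave 45

pottery_vs_bone = age_difference(*spb_2252, *psuams_2880)
pottery_vs_pottery = age_difference(*spb_2252, *spb_2251)

for label, (diff, sd, z) in [
    ("SPb-2252 vs PSUAMS-2880", pottery_vs_bone),
    ("SPb-2252 vs SPb-2251", pottery_vs_pottery),
]:
    print(f"{label}: {diff} ± {sd:.0f} years (z = {z:.1f})")
```

Under these assumptions the older pottery date and the bone date differ by well under one combined error, whereas the two pottery dates are more than four combined errors apart, which is the discrepancy the authors point to.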

Therefore, the ceramics of the burial ground Ekaterinovsky Cape possess an originality that determines the chronological position of the burial ground between the earliest materials of the burial type in Syezzhe and the Khvalynsk culture. Techno-typological features of dishes make it possible to attribute it to the Samara culture at the stage preceding the appearance of Ivanovska-Khvalynsk ceramics.

It seems that this site showed cultural influences from the upstream region near the Kama-Vyatka interfluve, too, according to Korolev, Kochkina, Stashenkov, and Khokhlov (2018):

In 2017, excavations of the burial ground Ekaterinovsky Cape, located in the area of the confluence of the Bezenchuk River with the Volga River, were continued. During the new excavations, 14 burials were studied. The skeletons of the buried lay extended on the back, less often flexed on the back with legs bent at the knees. In one burial (No. 90), a special position of the skeleton was recorded. In burial No. 90, parts of the male skeleton lay in anatomical order. This gave grounds for the reconstruction of his original position as semi-sitting, with the elbows supported on the bottom of the pit. Noteworthy inventory: on the pelvic bones on the left lay a bone spoon; near the right humerus, the pommel of a cruciform club was found. A conclusion is made about the high social status of the buried. The results of the analysis of the burial allow us to outline the closest circle of analogies in the materials of Khvalynsky I and Murzikhinsky burial grounds.

Important sites mentioned in both papers and in this text:

To sum up, it seems that the relative dates we have used until now have to be corrected: older Khvalynsk I and Khvalynsk II individuals, supposedly dated ca. 5200-4000 BC (most likely after 4700 BC), and younger Yekaterinovsky individuals, supposedly of the fourth quarter of the 5th millennium (ca. 4250-4000 BC), are possibly to be considered, in fact, roughly reversed, if not chronologically, at least culturally speaking.

Interestingly, this gives a new perspective to the presence of a rare fish- or reptile-headed pommel-scepter, which would be natural in a variable period of expansion of the horse and horse-related symbolism, a cultural trait rooted in the Samara culture attested in Syezzhe before the unification of the symbol of power under the ubiquitous Khvalynsk-Suvorovo horse-headed scepters and related materials.

Ekaterinovsky Cape Burial Ground. Inventory of the burial no 90: 1, 2 – stone pommel of the mace; 3, 4 – bone article.

The Khvalynsk chieftain

If the reported lineages from Yekaterinovsky Cape are within the R1b-P297 tree, but without further clades, as Yleaf comparisons may suggest, there is not much change to what we have, and R1b-M269 could actually represent a part of the local population, but also incomers from the south (e.g. the north Caspian steppe hunter-gatherers like Kairshak), the east (with hunter-gatherer pottery), or the west near the Don River (in contact with Mariupol-related cultures, as the authors inferred initially from material culture).

Just like R1a-M417 became incorporated into the Sredni Stog groups after the Novodanilovka-Suvorovo expansion, probably as incoming hunter-gatherer pottery groups from the north admixing with peoples of “Steppe ancestry”, R1b-M269 lineages might have expanded explosively only during the Repin expansion, and maybe (like R1b-L51 later) they formed just a tiny part of the clans that dominated the steppe during the Khvalynsk-Novodanilovka community.

On the other hand, the potential finding of various R1b-M269/L23 samples in Yekaterinovsky Cape (including an elite individual) would suggest now, as it was supported in the original report by Mathieson et al. (2015), that these ancient R1b lineages found in the Volga – Ural region are in fact most likely all R1b-M269 without enough coverage to obtain proper SNP calls, which would simplify the picture of Neolithic expansions (yet again). From the supplementary materials:

10122 / SVP35 (grave 12). Male (confirmed genetically), age 20-30, positioned on his back with raised knees, with 293 copper artifacts, mostly beads, amounting to 80% of the copper objects in the combined cemeteries of Khvalynsk I and II. Probably a high-status individual, his Y-chromosome haplotype, R1b1, also characterized the high-status individuals buried under kurgans in later Yamnaya graves in this region, so he could be regarded as a founder of an elite group of patrilineally related families. His MtDNA haplotype H2a1 is unique in the Samara series.

Khvalynsk cemetery and grave gifts. Grave 90 contained copper beads and rings, a harpoon, flint blades, and a bird-bone tube. Both graves (90 and 91) were partly covered by Sacrificial Deposit 4 with the bones from a horse, a sheep, and a cow. Center: grave goods from the Khvalynsk cemetery: copper rings and bracelets, polished stone mace heads, polished stone bracelet, Cardium shell ornaments, boar's tusk chest ornaments, flint blades, and bifacial projectile points. Bottom: shell-tempered pottery from the Khvalynsk cemetery. After Agapov, Vasiliev, and Pestrikova 1990; and Ryndina 1998, Figure 31. Modified from Anthony (2007).

This remarkable Khvalynsk chieftain, whose rich assemblage may correspond to the period of domination of the culture all over the Pontic-Caspian steppes, has been consistently reported as of hg. R1b-L754 in all publications, including the tentative SNP calls in the supplementary materials of Wang et al. (2018/2019) (obtained with Yleaf, like the infamous Narasimhan et al. 2018 samples), but has been variously reported by amateurs as within the R1b-M73, R1b-V88, or (lately) R1b-V1636 trees, which makes it unlikely that the quality of the sample allows for a proper SNP call.

The fact that Mathieson et al. (2015) considered it a member of the R1b-M269 clans appearing later in Yamna seems on point right now, especially if samples from Yekaterinovka are all within this tree. The relevance of R1b-L23 in the expansion of Repin and Yamna is reminiscent of the influence of successful clans among Yamna offshoots, such as Bell Beakers, and among Bell Beaker offshoots during the Bronze Age all over Europe.

Taking these younger expansions as an example, it seems quite likely based on cultural links that (at least part of) the main clans of Khvalynsk were of R1b-M269 lineage, stemming from an R1b-dominated Samara culture, in line with the known succeeding expansions and the expected strictly patriarchal and patrilineal society of Proto-Indo-Europeans, which would have exacerbated the usual reduction in Y-chromosome haplogroup variability that happens during population expansions, and the aversion towards foreign groups while the culture lasted.

Cultures of the Pontic-Caspian steppes and forest-steppes and surrounding areas during the Neolithic.

The finding of R1b-L23 in Yekaterinovka, associated with the Samara culture, before or during the Khvalynsk expansion, and close to the Khvalynsk site, would make this Khvalynsk chieftain most likely a member of the M269 tree (paradoxically, the only R1b-L754 branch amateurs have not yet reported for it). Similarly, the sample of a “Samara hunter-gatherer” of Lebyazhinka, of hg. R1b-P297, could also be under this tree, just like most R1b-M269 from Yamna are downstream from R1b-L23, and most reported R1b-M269 or R1b-L23 from Bell Beakers are under R1b-L151.

On the other hand, we know of the shortcomings of attributing a haplogroup expansion to the best known rulers, such as the famous lineages previously wrongly attributed to Niall of the Nine Hostages or Genghis Khan. The known presence of R1b-V1636 up to modern Greeks would be in line with an ancient steppe expansion that we know will show up during the Neolithic, although it could also be a sign of a more recent migration from the Caucasus. The presence of a sister clade of R1b-L23, R1b-PF7562, among modern Balkan populations, may also be attributed to a pre-Yamna steppe expansion.

Y-DNA samples from Khvalynsk and neighbouring cultures. See full version here.

On SNP calls

I reckon that even informal reports on SNP calls, like any other analyses, should be offered in full: not only with a personal or automatic estimation of the result, but with a detailed explanation of the good, dubious, and bad calls, alternatives to that SNP estimation, and a motivated reasoning of why one branch should be preferred over others. Downloading a sample and giving an instruction using a free software tool is never enough, as it became crystal clear recently for the hilariously biased and flawed qpAdm reports on Dutch Bell Beakers as the ‘missing link’ between Corded Ware and Bell Beakers…
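To make the idea of a "full" report concrete, here is a minimal sketch of what such a tally could look like. Everything in it is an assumption for illustration: the `SnpCall` structure, the `quality` thresholds, and the calls themselves are invented placeholders that describe no real sample and no real tool's output. The only general fact it leans on is that C>T and G>A changes can be mimicked by ancient DNA damage, so derived calls at such positions deserve extra caution.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass

@dataclass
class SnpCall:
    marker: str         # placeholder marker name
    branch: str         # placeholder branch the marker would define
    state: str          # "derived", "ancestral" or "no_call"
    reads: int          # reads covering the position
    damage_prone: bool  # C>T or G>A change, which ancient DNA damage can mimic

def quality(call, min_reads=2):
    """Label a call as good, dubious or missing (thresholds are arbitrary)."""
    if call.state == "no_call" or call.reads == 0:
        return "missing"
    if call.damage_prone or call.reads < min_reads:
        return "dubious"
    return "good"

def full_report(calls):
    """Count calls per branch, keeping state and quality visible side by side."""
    report = defaultdict(Counter)
    for call in calls:
        q = quality(call)
        label = "missing" if q == "missing" else f"{call.state}/{q}"
        report[call.branch][label] += 1
    return {branch: dict(counts) for branch, counts in report.items()}

# Invented calls for illustration only; they describe no real sample.
calls = [
    SnpCall("marker_1", "branch_A", "derived", 3, False),
    SnpCall("marker_2", "branch_A", "derived", 1, True),
    SnpCall("marker_3", "branch_B", "ancestral", 2, False),
    SnpCall("marker_4", "branch_B", "derived", 1, True),
    SnpCall("marker_5", "branch_C", "no_call", 0, False),
]

for branch, counts in full_report(calls).items():
    print(branch, counts)
```

A table like this shows at a glance when a proposed branch rests on a single damage-prone read while a competing branch has a solid ancestral call, which is exactly the information a bare haplogroup label hides.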

Another example I can recall is the report of a R1a-Z93 subclade in the R1a-M417 sample ca. 4000 BC from Alexandria, which seems rather unlikely, seeing how this subclade must have split and expanded explosively with R1a-Z645 to the east with eastern Corded Ware groups, i.e. 1,000 years later, just like Z282 lineages expanded mainly to the north-east. But then again, as with the Khvalynsk chieftain, I have only seen indirect reports of that supposed SNP (including Y26+!), so we should just stick with its officially reported R1a-M417 lineage. This upstream haplogroup was, in fact, repeated with Yleaf’s tentative estimates in Wang et al. (2019) supplementary materials…

The combination of inexperienced, biased, or simply careless design, analyses, and reports, including SNP calls and qpAdm analyses (whether in forums or publications), however well-intentioned (or not) they might be, is hindering a proper analysis of data, adding to the difficulties we already have due to the scarcity of samples, their limited coverage, and the lack of proper context.

Some people like to repeat ad nauseam that archaeology and/or linguistics are ‘not science’ whenever they don’t fit their beliefs and myths based on haplogroup and/or ancestry. But it’s becoming harder and harder to rely on certain genetic data, too, and on their infinite changing interpretations, much more than it is to rely on linguistic and archaeological research, including data, assessments, and discussions that are open for anyone to review…if one is truly interested in them.

The Pazyryk culture spoke a “Uralic-Altaic” language… because haplogroup N

Matrilineal and patrilineal genetic continuity of two iron age individuals from a Pazyryk culture burial, by Tikhonov, Gurkan, Peler, & Dyakonov, Int J Hum Genet (2019).

Relevant excerpts (emphasis mine):

Of particular interest to the current study are the archaeogenetic investigations associated with the exemplary mound 1 from the Ak-Alakha-1 site on the Ukok Plateau in the Altai Republic (Polosmak 1994a; Pilipenko et al. 2015). This typical Pazyryk “frozen grave” was dated around 2268±39 years before present (Bln-4977) (Gersdorff and Parzinger 2000). Initial anthropological findings suggested an undisturbed dual inhumation comprising “a middle-aged European-type man” and “a young European-type woman”, both of whom presumably had a high social status among the Pazyryk elite (Polosmak 1994a). In contrast, recent archaeogenetic investigations revealed somewhat contradicting results since analyses at both the amelogenin gene and Y-chromosome short tandem repeat (Y-STR) loci clearly established that both Scythians were actually males and had paternal and maternal lineages that are typically associated with eastern Eurasians (Pilipenko et al. 2015). Through the use of mitochondrial, autosomal and Y-chromosomal DNA typing systems, it was possible to not only investigate the potential relationships between the two ancient Scythians but also to gather initial phylogenetic and phylogeographic information on their paternal and maternal lineages (Pilipenko et al. 2015).

Based on the Y-STR data available, the two Ak-Alakha-1 Scythians had an in silico haplogroup assignment of N, which first appeared in southeastern Asia and then expanded in southern Siberia (Rootsi et al. 2007; Pilipenko et al. 2015).

Current study aims to investigate the geographical distributions of the ancient and contemporary matches and close genetic variants of the maternal and paternal lineages observed in the two Scythians from the exemplary Ak-Alakha-1 kurgan.

Geographic distribution of the exact matches with the Scythian (PZ1) Y-STR (17-loci) and mtDNA (HVR1) haplotypes detailed in Tables 1a and 1b. Boundaries of the Altai Republic within the Russian Federation are shown with dashed lines, along with an approximate position of the Ak-Alakha-1 burial site, which is denoted with an ‘x’ on the map. Countries shaded in gray refer to those that have full 17-loci Y-STR and/or mtDNA HVR1 match(es) with the PZ1 haplotypes. Inset in the top and bottom left corners are the Altai and Uzbekistan maps, respectively, both scaled-up to allow better representation of the samples derived from these countries. There were no other exact matches from around parts of the globe that are not shown on the map, except for a single contemporary mtDNA haplotype from US, which presumably belonged to an ‘East Asian’ individual. Inset in the top right corner provides a scale for the number of haplotypes observed, but only up to three samples, which is valid for the entire map as well as the inset maps, irrespective of the differences in the scales of the actual map and inset maps themselves. For sample pools larger than three, the same linear scale provided on the inset in the top right corner still applies; please refer to Tables 1a and b for actual sample pool sizes. Samples are depicted on the entire map and the insets maps with circles and diamonds for the Y-STR and mtDNA haplotypes, respectively. Black and white coloring for samples depict whether the haplotype(s) are contemporary or ancient, respectively. Location of the PZ1 mtDNA and Y-STR haplotypes are shown on top of each other.

In response to aggressive Xiongnu expansion into the Altai region around the 2nd century BCE, some members of the Pazyryk culture may have started moving up North, and eventually reached the Vilyuy River at the beginning of 1st century CE. Notably, there is clear population continuity between the Uralic people such as Khants, Mansis and Nganasans, Paleo-Siberian people such as Yukaghirs and Chuvantsi, and the Pazyryk people even when considering just the two mtDNA and Y-STR haplotypes from the Ak-Alakha-1 mound 1 kurgan (Tables 1a, b, Table 2, Fig. 1). These concepts are also in agreement with the famous Yakut ethnographer Ksenofontov, who suggested that technologies associated with ferrous metallurgy were brought to the Vilyuy Valley at around 1st century CE by the first (proto)Turkic-speaking pioneers (Ksenofontov 1992). Yakut ethnogenesis per se possibly involved two major stages, the first being the proto-Turkic epoch through the arrival of Scytho-Siberian culture originating from Southern Siberia, such as that associated with the Pazyryk culture and the second being the proper Turkic epoch.

Nomadic peoples from the Central Asian steppes are East Iranian speakers whenever they are of haplogroup R1a, but “Uralic-Altaic” speakers whenever they are of haplogroup N. True story.

So they followed a haplogroup ca. 37,000 years old, in a sample dated some 2,300 years ago, whose precise subclade and ancient history is (yet) unknown, compared it to present-day populations, and the result is that they spoke “Uralic-Altaic” because haplogroup N and continuity. Sound familiar? Yep, it’s the kind of reasoning you might be reading right now about Iberian Bell Beakers, about Bell Beakers, or even about Yamna and their relationship to a Vasconic-Caucasian language, based on haplogroup R1b in modern Basques. Another true story.

Anyway, based on the multi-ethnic federations created during this time, and on the ancestral components visible in the different groups (see a post on Karasuk by Chad Rohlfsen), the Pazyryk culture’s language is unknown, and it could be, as a matter of fact (apart from the obvious East Iranian connection):

We also know that haplogroup N and Siberian ancestry expanded into cultures of Northern Eurasia precisely with the creation of the new social paradigm of chiefdoms and alliances, roughly at the same time as Scythians expanded, with the first sample of haplogroup N in Hungary appearing with Cimmerians.

Map of archaeological cultures in north-eastern Europe ca. 8th-3rd centuries BC. [The Mid-Volga Akozino group not depicted] Shaded area represents the Ananino cultural-historical society. Fading purple arrows represent likely stepped movements of subclades of haplogroup N for centuries (e.g. Siberian → Ananino → Akozino → Fennoscandia [N-VL29]; Circum-Arctic → forest-steppe [N1, N2]; etc.). Blue arrows represent eventual expansions of Uralic peoples to the north. Modified image from Vasilyev (2002).

While the study of modern populations is interesting, the problem I have with the paper is the reasoning of “language of ancient haplogroups based on modern populations”, and especially the concept of “Uralic-Altaic”, the highly hypothetical “Proto-Turkic” nomadic steppe pastoralists before “Hunnic Turkic” (which is itself questionable), before the “real Turkic” layer (the authors apparently being Turkic themselves), and the supposed “continuity” of Eastern Uralic and Turkic groups in Asia since the Out of Africa migration. The combination of all of this in the same text is just disturbing.

If you look at it from the bright side, at least these samples were not of haplogroup R1a-Z280, or we would be talking about great Slavonic Scythians showing continuity from Russia with love, as the paper threatened to do in its introduction…

If you are enjoying the comeback of this retro 2000s comedy in 2019 (based on the classic nativist “R1a=IE”, “R1b=Basque”, and “N=Uralic” combo) it’s because you – like me – are putting yourself in this guy’s shoes every time a new episode of funny self-destruction appears:



A Game of Thrones in Indo-European: proto-languages in Westeros and Essos, and population genomics


I think proto-languages can be applied to basically any appropriate prehistoric setting, and especially to science fiction and fantasy settings. I often viewed the lack of interest for them as based on the idea that they are not fantastic enough, that they would render a fantastic world too realistic to allow for an adequate immersion of the reader (or viewer) into a new world.

With time, I have become more and more convinced that most authors don’t use proto-languages (or tweaked versions of them) simply because they can’t, and resort to the easier way: inventing some rules and words based on some basic ideas and sounds they feel would fit a certain culture or people, to get going. After all, world-building is about a good enough, not too detailed description, and books are about characters and settings, not worlds.

After the end of the 7th season of the Game of Thrones TV series, of which I have become a great fan, I had some season finale grief to deal with, so I thought about applying what we knew about Proto-Indo-Europeans to the fantasy world. Since all book translations deal with English names as if they were translations of the Common Tongue (e.g. Spanish “Invernalia” or “Poniente” for “Winterfell” or “Westeros”), the idea of a translation into Proto-Indo-European seemed quite interesting.

NOTE. I understand that, for some, the idea that “the original language is the best” would make them reject this. However, just take into account the millions who enjoy the books and the TV series only in their native language, and know nothing about the ‘original’ version…

Here are the text and images:

A Dance with Old Tongues

As you can see, the idea of the Common Tongue being Late Proto-Indo-European brings about a whole new (infinite) world of dialectal evolution, language contacts, and population expansions which must be established for the whole setting to work. This is what the text I began to write was about: to use languages (and related populations) of ca. 6000-1500 BC, and to avoid anachronisms and impossible language relationships.

As an added advantage, fans of role-playing games could expand their world with the use of the language correspondences and the maps. This way, instead of “Northern English” being spoken in the North, and “Spanish English” being spoken in Dorne, according to some selections that have been naturally criticized, you have ancient languages that fit with the ancient setting, and which were actually related to each other.

Equivalence of languages of the known world with coeval proto-languages. Solid red lines divide Graeco-Aryan from Northern Indo-European dialects (Tocharian is separated from North-West Indo-European by a dotted red line). See all maps.

I also began drawing a fantasy map, my first one (even though I have been a member of the Cartographers’ Guild for years), which eventually helped me with my updates of maps of prehistoric migrations, and even with the use of arrows and colors for scientific publications. I drew details mainly to illustrate the text, not to offer a comprehensive translated world. Most of the work was done in the summer of 2017, with some map changes made in 2018 with the help of the maps and works of fans.

NOTE. I have reviewed it during some long travels lately, and included names of “bloodlines” (i.e. haplogroups), which I find more interesting today for people to understand bottlenecks during prehistoric migrations; I have also added a map using pie charts. If this doesn’t fit well with the whole picture, it’s because it’s a recent addition. The rest is more or less the same as one-two years ago.

I don’t have time now to correct much of what I wrote. I have forgotten most of the relevant details from the books, especially A World of Ice and Fire which I think helped me a lot with this, and I am sure that after writing A Song of Sheep and Horses (now you know the why of the book names) I would deal with some language identification and cognates differently.

I decided to publish it to liven up our Facebook page of Modern Indo-European now that the 8th season is near, so that people can participate and try to translate (translatable) names and expressions into Proto-Indo-European, to see how it would work out. You can also request access to our Modern Indo-European and Proto-Indo-European groups; both are administered mainly by Fernando.

If you think this whole idea is crazy, or a huge loss of time, I agree; this is how you lose your time when you like fantasy, comic books, etc. But I am a great fan of fantasy and fiction, and I had a lot of free time back then, so I couldn’t help it…

On the other hand, if you feel that mixing fantasy (or SF) with the Proto-Indo-European question (especially population genomics) is a bad idea, I may have agreed with that two years ago, and maybe this is the reason why I hesitated to publish it then.

However, today we can read a whole new (2018 and 2019) bunch of “steppe ancestry=Indo-European” fantasies: invisible Nganasan reindeer hordes, a Fearsome Tisza River where Yamna settlers mysteriously disappear, shapeshifting Dutch CWC peoples who change haplogroups, languages dependent on cephalic types, or Yamna/Bell Beaker expanding Vasconic… So what’s the matter with some more fantasy?

Ancient Sardinia hints at Mesolithic spread of R1b-V88, and Western EEF-related expansion of Vasconic


New preprint Population history from the Neolithic to present on the Mediterranean island of Sardinia: An ancient DNA perspective, by Marcus et al. bioRxiv (2019)

Interesting excerpts (emphasis mine, edited for clarity):

On the high frequency of R1b-V88

Our genome-wide data allowed us to assign Y haplogroups for 25 ancient Sardinian individuals. More than half of them consist of R1b-V88 (n=10) or I2-M223 (n=7).

Francalacci et al. (2013) identified three major Sardinia-specific founder clades based on present-day variation within the haplogroups I2-M26, G2-L91 and R1b-V88, and here we found each of those broader haplogroups in at least one ancient Sardinian individual. Two major present-day Sardinian haplogroups, R1b-M269 and E-M215, are absent.

Compared to other Neolithic and present-day European populations, the number of identified R1b-V88 carriers is relatively high.

(…)ancient Sardinian mtDNA haplotypes belong almost exclusively to macro-haplogroups HV (n = 16), JT (n = 17) and U (n = 9), a composition broadly similar to other European Neolithic populations.

Geographic and temporal distribution of R1b-V88 Y-haplotypes in ancient European samples. We plot the geographic position of all ancient samples inferred to carry R1b-V88 equivalent markers. Dates are given as years BCE (means of calibrated 2σ radio-carbon dates). Multiple V88 individuals with similar geographic positions are vertically stacked. We additionally color-code the status of the R1b-V88 subclade R1b-V2197, which is found in most present-day African R1b-V88 carriers.

On the origin of a Vasconic-like Paleosardo with the Western EEF

(…) the Neolithic (and also later) ancient Sardinian individuals sit between early Neolithic Iberian and later Copper Age Iberian populations, roughly on an axis that differentiates WHG and EEF populations and embedded in a cluster that additionally includes Neolithic British individuals. This result is also evident in terms of absolute genetic differentiation, with low pairwise FST ~ 0.005 ± 0.002 between Neolithic Sardinian individuals and Neolithic western mainland European populations. Pairwise outgroup-f3 analysis shows a very similar pattern, with the highest values of f3 (i.e. most shared drift) being with Neolithic and Copper Age Iberia, gradually dropping off for temporally and geographically distant populations.

In explicit admixture models (using qpAdm, see Methods) the southern French Neolithic individuals (France-N) are the most consistent with being a single source for Neolithic Sardinia (p ~ 0.074 to reject the model of one population being the direct source of the other); followed by other populations associated with the western Mediterranean Neolithic Cardial Ware expansion.

Principal Components Analysis based on the Human Origins dataset. A: Projection of ancient individuals’ genotypes onto principal component axes defined by modern Western Eurasians (gray labels).

Pervasive Western Hunter-Gatherer ancestry in Iberian/French/Sardinian population

Similar to western European Neolithic and central European Late Neolithic populations, ancient Sardinian individuals are shifted towards WHG individuals in the top two PCs relative to early Neolithic Anatolians. Admixture analysis using qpAdm infers that ancient Sardinian individuals harbour HG ancestry (~ 17%) that is higher than early Neolithic mainland populations (including Iberia, ~ 8%), but lower than Copper Age Iberians (~ 25%) and about the same as Southern French Middle-Neolithic individuals (~ 21%).

Principal Components Analysis based on the Human Origins dataset. B: Zoom into the region most relevant for Sardinian individuals.

Continuity from Sardinia Neolithic through the Nuragic

We found several lines of evidence supporting genetic continuity from the Sardinian Neolithic into the Bronze Age and Nuragic times. Importantly, we observed low genetic differentiation between ancient Sardinian individuals from various time periods.

A qpAdm analysis, which is based on simultaneously testing f-statistics with a number of outgroups and adjusts for correlations, cannot reject a model of Neolithic Sardinian individuals being a direct predecessor of Nuragic Sardinian individuals (…) Our qpAdm analysis further shows that the WHG ancestry proportion, in a model of admixture with Neolithic Anatolia, remains stable at ~17% throughout three ancient time-periods.

Present-day genetic structure in Sardinia reanalyzed with aDNA. A: Scatter plot of the first two principal components trained on 1577 present-day individuals with grand-parental ancestry from Sardinia. Each individual is labeled with a location if at least 3 of the 4 grandparents were born in the same geographical location (“small” three-letter abbreviations); otherwise with “x” or, if grand-parental ancestry is missing, with “?”. We calculated median PC values for each Sardinian province (large abbreviations). We also projected each ancient Sardinian individual on to the top two PCs (gray points). B/C: We plot f-statistics that test for admixture of modern Sardinian individuals (grouped into provinces) when using Nuragic Sardinian individuals as one source population. Uncertainty ranges depict one standard error (calculated from block bootstrap). Karitiana are used in the f-statistic calculation as a proxy for ANE/Steppe ancestry (Patterson et al., 2012).

Steppe influx in Modern Sardinians

While contemporary Sardinian individuals show the highest affinity towards EEF-associated populations among all of the modern populations, they also display membership with other clusters (Fig. 5). In contrast to ancient Sardinian individuals, present-day Sardinian individuals carry a modest “Steppe-like” ancestry component (but generally less than continental present-day European populations), and an appreciable broadly “eastern Mediterranean” ancestry component (also inferred at a high fraction in other present-day Mediterranean populations, such as Sicily and Greece).


Arrival of steppe ancestry with R1b-P312 in the Mediterranean: Balearic Islands, Sicily, and Iron Age Sardinia


New preprint The Arrival of Steppe and Iranian Related Ancestry in the Islands of the Western Mediterranean by Fernandes, Mittnik, Olalde et al. bioRxiv (2019)

Interesting excerpts (emphasis in bold; modified for clarity):

Balearic Islands: The expansion of Iberian speakers

Mallorca_EBA dates to the earliest period of permanent occupation of the islands at around 2400 BCE. We parsimoniously modeled Mallorca_EBA as deriving 36.9 ± 4.2% of her ancestry from a source related to Yamnaya_Samara; (…). We next used qpAdm to identify “proximal” sources for Mallorca_EBA’s ancestry that are more closely related to this individual in space and time, and found that she can be modeled as a clade with the (small) subset of Iberian Bell Beaker culture associated individuals who carried Steppe-derived ancestry (p=0.442).

Suppl. Materials: The model used was with Bell_Beaker_Iberia_highsteppe, a group of outliers from Iberia buried in a Bell Beaker mortuary context who unlike most individuals from this context in that region had high proportions of Steppe ancestry (p=0.442).

Our estimates of Steppe ancestry in the two later Balearic Islands individuals are lower than the earlier one: 26.3 ± 5.1% for Formentera_MBA and 23.1 ± 3.6% for Menorca_LBA, but the Middle to Late Bronze Age Balearic individuals are not a clade relative to non-Balearic groups. Specifically, we find that f4(Mbuti.DG, X; Formentera_MBA, Menorca_LBA) is positive when X=Iberia_Chalcolithic (Z=2.6) or X=Sardinia_Nuragic_BA (Z=2.7). While it is tempting to interpret the latter statistic as suggesting a genetic link between peoples of the Talaiotic culture of the Balearic islands and the Nuragic culture of Sardinia, the attraction to Iberia_Chalcolithic is just as strong, and the mitochondrial haplogroup U5b1+16189+@16192 in Menorca_LBA is not observed in Sardinia_Nuragic_BA but is observed in multiple Iberia_Chalcolithic individuals. A possible explanation is that both the ancestors of Nuragic Sardinians and the ancestors of Talaiotic people from the Balearic Islands received gene flow from an unsampled Iberian Chalcolithic-related group (perhaps a mainland group affiliated to both) that did not contribute to Formentera_MBA.
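To put the numbers quoted above in perspective, here is a rough sketch (mine, not from the preprint) that converts the reported f4 Z-scores into two-sided p-values and compares the Steppe ancestry estimates of the three Balearic individuals. It assumes a standard normal null and, as a simplification, treats the qpAdm estimates as independent, which they are not strictly; the function names are hypothetical.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a Z-score under a standard normal null."""
    return math.erfc(abs(z) / math.sqrt(2))

def difference_z(est_a, se_a, est_b, se_b):
    """Z-score of (est_a - est_b), ASSUMING independent errors (a simplification)."""
    return (est_a - est_b) / math.hypot(se_a, se_b)

# Z-scores quoted for f4(Mbuti.DG, X; Formentera_MBA, Menorca_LBA)
f4_z = {"Iberia_Chalcolithic": 2.6, "Sardinia_Nuragic_BA": 2.7}
for x, z in f4_z.items():
    print(f"X={x}: Z={z}, two-sided p ~ {two_sided_p(z):.4f}")

# Steppe ancestry estimates quoted above (percent, +/- one standard error)
steppe = {
    "Mallorca_EBA": (36.9, 4.2),
    "Formentera_MBA": (26.3, 5.1),
    "Menorca_LBA": (23.1, 3.6),
}
early_est, early_se = steppe["Mallorca_EBA"]
for label in ("Formentera_MBA", "Menorca_LBA"):
    est, se = steppe[label]
    z = difference_z(early_est, early_se, est, se)
    print(f"Mallorca_EBA vs {label}: difference Z ~ {z:.2f}")
```

Both f4 statistics fall below the |Z| > 3 threshold that is commonly used in this kind of analysis, and under these assumptions the drop in Steppe ancestry relative to Mallorca_EBA is only suggestive for Formentera_MBA, so these signals are best read as hints rather than firm results.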

This sample, like another one in El Argar, is of hg. R1b-P312. So there you are, the data that connects the Proto-Iberian expansion (replacing IE-speaking Bell Beakers) to the Iberian Chalcolithic population, signaled by the increase in Iberian Chalcolithic ancestry after the arrival of Bell Beakers, most likely connected originally to the Argaric and post-Argaric expansions during the MBA.

PCA with previously published ancient individuals (non-filled symbols), projected onto variation from present-day populations (gray squares).

Steppe in Sardinia IA: Phocaeans from Italy?

Most Sardinians buried in a Nuragic Bronze Age context possessed uniparental haplogroups found in European hunter-gatherers and early farmers, including Y-haplogroup R1b1a[xR1b1a1a] which is different from the characteristic R1b1a1a2a1a2 spread in association with the Bell Beaker complex. An exception is individual I10553 (1226-1056 calBCE) who carried Y-haplogroup J2b2a, previously observed in a Croatian Middle Bronze Age individual bearing Steppe ancestry, suggesting the possibility of genetic input from groups that arrived from the east after the spread of first farmers. This is consistent with the evidence of material culture exchange between Sardinians and mainland Mediterranean groups, although genome-wide analyses find no significant evidence of Steppe ancestry so the quantitative demographic impact was minimal.

Another interesting piece of data: these remnant (Mesolithic) R1b-V88 lineages are closely related to the Italian Peninsula, the most likely region from which these lineages expanded into Africa, in turn possibly connected to the expansion of Proto-Afroasiatic.

We detect definitive evidence of Iranian-related ancestry in an Iron Age Sardinian I10366 (391-209 calBCE) with an estimate of 11.9 ± 3.7% Iran_Ganj_Dareh_Neolithic related ancestry, while rejecting the model with only Anatolian_Neolithic and WHG at p=0.0066 (Supplementary Table 9). The only model that we can fit for this individual using a pair of populations that are closer in time is as a mixture of Iberia_Chalcolithic (11.9 ± 3.2%) and Mycenaean (88.1 ± 3.2%) (p=0.067). This model fits even when including Nuragic Sardinians in the outgroups of the qpAdm analysis, which is consistent with the hypothesis that this individual had little if any ancestry from earlier Sardinians.

Proportions of ancestry using a distal qpAdm framework on an individual basis (a), and based on qpWave clusters

Sicily EBA: The Lusitanian/Ligurian connection?

(…) While a previously reported Bell Beaker culture-associated individual from Sicily had no evidence of Steppe ancestry, (…) we find evidence of Steppe ancestry in the Early Bronze Age by ~2200 BCE. In distal qpAdm, the outlier Sicily_EBA11443 is parsimoniously modeled as harboring 40.2 ± 3.5% Steppe ancestry, and the outlier Sicily_EBA8561 is parsimoniously modeled as harboring 23.3 ± 3.5% Steppe ancestry. (…) The presence of Steppe ancestry in Early Bronze Age Sicily is also evident in Y chromosome analysis, which reveals that 4 of the 5 Early Bronze Age males had Steppe-associated Y-haplogroup R1b1a1a2a1a2. (Online Table 1). Two of these were Y-haplogroup R1b1a1a2a1a2a1 (Z195) which today is largely restricted to Iberia and has been hypothesized to have originated there 2500-2000 BCE. This evidence of west-to-east gene flow from Iberia is also suggested by qpAdm modeling where the only parsimonious proximate source for the Steppe ancestry we found in the main Sicily_EBA cluster is Iberians.

What’s this? An ancestral connection between Sicel Elymian and Galaico-Lusitanian or Ligurian (based on an origin in NE Iberia)? Impossible to say, especially if the languages of these early settlers were replaced later by non-Indo-European speakers from the eastern Mediterranean, and by Indo-European speakers from the mainland closely related to Proto-Italic during the LBA, but see below.

Regarding the comment on R1b-Z195, it is associated with modern Iberians, as DF27 in general, due to founder effects beyond the Pyrenees. It is a very old subclade, split directly from DF27 roughly at the same time as it split from the parent P312, i.e. it can be found anywhere in Europe, and it almost certainly accompanied the expansion of Celts from Central Europe under the subclade R1b-M167/SRY2627.

The connection is thus strong only because of the qpAdm modeling, since R1b-DF27 and subclade R1b-Z195 are certainly lineages expanded quite early, most likely with Yamna settlers in Hungary and East Bell Beakers.

In this case, if stemming from Iberia, it is most likely of subclade R1b-Z220 – or another Z195 (xM167) lineage – originally associated with the Old European substrate found in topo-hydronymy in Iberia, whose most likely remnants attested during the Iron Age were Lusitanians.

Left: Modern distribution of R1b-Z195 (YFull estimate 2700 BC); Right: Modern distribution of DF27. Both include later founder effects within Iberia, so the increase in the Basque country and the Crown of Aragon and the decrease in Portugal can safely be ignored. Contour maps of the derived allele frequencies of the SNPs analyzed in Solé-Morata et al. (2017).

We detect Iranian-related ancestry in Sicily by the Middle Bronze Age 1800-1500 BCE, consistent with the directional shift of these individuals toward Mycenaeans in PCA. Specifically, two of the Middle Bronze Age individuals can only be fit with models that in addition to Anatolia_Neolithic and WHG, include Iran_Ganj_Dareh_Neolithic. The most parsimonious model for Sicily_MBA3125 has 18.0 ± 3.6% Iranian-related ancestry (p=0.032 for rejecting the alternative model of Steppe rather than Iranian-related ancestry), and the most parsimonious model for Sicily_MBA has 14.9 ± 3.9% Iranian-related ancestry (p=0.037 for rejecting the alternative model).

The modern southern Italian Caucasus-related signal identified in Raveane et al. (2018) is plausibly related to the same Iranian-related spread of ancestry into Sicily that we observe in the Middle Bronze Age (and possibly the Early Bronze Age).

The non-Indo-European Sicanians and Elymians were possibly then connected to eastern Mediterranean groups before the expansion of the Sea Peoples.

For the Late Bronze Age group of individuals, qpAdm documented Steppe-related ancestry, modeling this group as 80.2 ± 1.8% Anatolia_Neolithic, 5.3 ± 1.6% WHG, and 14.5 ± 2.2% Yamnaya_Samara. Our modeling using sources more closely related in space and time also supports Sicily_LBA having Minoan-related ancestry or being derived from local preceding populations or individuals with ancestries similar to those of Sicily_EBA3123 (p=0.527), Sicily_MBA3124 (p=0.352), and Sicily_MBA3125 (p=0.095).
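As a quick sanity check on the three-way model quoted above, the following sketch (my own illustration, not the authors' code) verifies that the proportions add up to 100% and derives rough 95% intervals for each source. It assumes a normal approximation around each estimate, and the helper name `summarize` is hypothetical.

```python
# Sicily_LBA distal qpAdm model as quoted above (percent, +/- one standard error)
sicily_lba = {
    "Anatolia_Neolithic": (80.2, 1.8),
    "WHG": (5.3, 1.6),
    "Yamnaya_Samara": (14.5, 2.2),
}

def summarize(model):
    """Return the total of the proportions plus, per source, a Z-score
    against zero and an approximate 95% interval (normal approximation)."""
    total = sum(est for est, _ in model.values())
    rows = {}
    for source, (est, se) in model.items():
        rows[source] = {
            "z": est / se,
            "ci95": (est - 1.96 * se, est + 1.96 * se),
        }
    return total, rows

total, rows = summarize(sicily_lba)
print(f"sum of proportions: {total:.1f}%")
for source, row in rows.items():
    low, high = row["ci95"]
    print(f"{source}: Z ~ {row['z']:.1f}, approx. 95% interval {low:.1f}% to {high:.1f}%")
```

Under this approximation the Yamnaya_Samara component sits well above zero, which is what "documented Steppe-related ancestry" amounts to in practice, while the WHG component is much closer to its lower bound.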

This increase in Steppe-related ancestry in a western site during the LBA most likely represents either an expansion from the Aegean or – maybe more likely, given the archaeological finds – a regional population similar to Sicily EBA re-emerging or rather being displaced from the eastern part of the island because of a westward movement from nearby Calabria.

Whether this population sampled spoke Indo-European or not at this time is questionable, since the Iron Age accounts show non-IE Elymians in this region.

Actually, Elymians seem to have spoken Indo-European, which fits well with the increase in steppe ancestry.

EDIT (21 MAR): Interesting about a proposed incoming Minoan-like ancestry is the potential origin of the Iran Neolithic-related ancestry that is going to appear in Central Italy during the LBA. This could then be potentially associated with Tyrsenians passing through the area, although the traditional description may be more compatible with an arrival of Sea Peoples from the Adriatic.

Sad to read this:

This manuscript is dedicated to the memory of Sebastiano Tusa of the Soprintendenza del Mare in Palermo, who would have been an author of this study had he not tragically died in the crash of Ethiopia Airlines flight 302 on March 10.


Aquitanians and Iberians of haplogroup R1b are exactly like Indo-Iranians and Balto-Slavs of haplogroup R1a


The final paper on Indo-Iranian peoples, by Narasimhan and Patterson (see preprint), is soon to be published, according to the first author’s Twitter account.

One of the interesting details of the development of Bronze Age Iberian ethnolinguistic landscape was the making of Proto-Iberian and Proto-Basque communities, which we already knew were going to show R1b-P312 lineages, a haplogroup clearly associated during the Bell Beaker period with expanding North-West Indo-Europeans:

From the Bronze Age (~2200–900 BCE), we increase the available dataset from 7 to 60 individuals and show how ancestry from the Pontic-Caspian steppe (Steppe ancestry) appeared throughout Iberia in this period, albeit with less impact in the south. The earliest evidence is in 14 individuals dated to ~2500–2000 BCE who coexisted with local people without Steppe ancestry. These groups lived in close proximity and admixed to form the Bronze Age population after 2000 BCE with ~40% ancestry from incoming groups. Y-chromosome turnover was even more pronounced, as the lineages common in Copper Age Iberia (I2, G2, and H) were almost completely replaced by one lineage, R1b-M269.

Proportion of ancestry derived from central European Beaker/Bronze Age populations in Iberians from the Middle Neolithic to the Iron Age (table S15). Colors indicate the Y-chromosome haplogroup for each male. Red lines represent period of admixture. Modified from Olalde et al. (2019).

The arrival of East Bell Beakers speaking Indo-European languages involved, nevertheless, the survival of the two non-IE communities isolated from each other – likely stemming from south-western France and south-eastern Iberia – thanks to a long-lasting process of migration and admixture. There are some common misconceptions about ancient languages in Iberia which may have caused some wrong interpretations of the data in the paper and elsewhere:

NOTE. A simple reading of Iberian prehistory would be enough to correct these. Two recent books on this subject are Villar’s Indoeuropeos, iberos, vascos y otros parientes and Vascos, celtas e indoeuropeos. Genes y lenguas.

Iberian languages were spoken at least in the Mediterranean and the south (ca. “1/3 of Iberia“) during the Bronze Age.

Nope, we only know the approximate location of Iberian culture and inscriptions from the Late Iron Age, and they occupy the south-eastern and eastern coastal areas, but before that it is unclear where they were spoken. In fact, it seems evident now that the arrival of Urnfield groups from the north marks the arrival of Celtic-speaking peoples, as we can infer from the increase in Central European admixture, while the expansion of anthropomorphic stelae from the north-west must have marked the expansion of Lusitanian.

Vasconic was spoken on both sides of the Pyrenees, as it was in the Middle Ages.

Wrong. One of the worst mistakes I am seeing in many comments since the paper was published, although admittedly the paper goes around this problem talking about “Modern Basques”. Vasconic toponyms appear south of the Pyrenees only after the Roman conquests, and tribes of the south-western Pyrenees and Cantabrian regions were likely Celtic-speaking peoples. Aquitanians (north of the western Pyrenees) are the only known ancient Vasconic-speaking population in proto-historic times, ergo the arrival of Bell Beakers in Iberia was most likely accompanied by Indo-European languages which were later replaced by Celtic expanding from Central Europe, and Iberian expanding from south-east Iberia, and only later with Latin and Vasconic.

Ligurian is non-Indo-European, and Lusitanian is Celtic-like, so Iberia must have been mostly non-Indo-European-speaking.

The fragmentary material available on Ligurian is enough to show that phonetically it is a NWIE dialect of non-Celtic, non-Italic nature, much like Lusitanian; that is, unless you follow laryngeals up to Celtic or Italic, in which case you can argue anything about this or any other IE language, as people who reconstruct laryngeals for Baltic in the common era do.

EDIT (19 Mar 2019): It was not clear enough from this paragraph, because Ligurian-like languages in NE Iberia is just a hypothesis based on the archaeological connection of the whole southern France Bell Beaker region. My aim was to repeat the idea that Old European topo-hydronymy is older in NE Iberia (as almost anywhere in Iberia) than Iberian toponymy, so the initial hypothesis is that:

  1. a Palaeo-European language (as Villar puts it) expanded into most regions of Iberia in ancient times (he considered at some point the Mesolithic, but that is obviously wrong, as we know now); then
  2. Celts expanded at least to the Ebro River Basin; then
  3. Iberians expanded to the north and replaced these in NE Iberia; and only then
  4. after the Roman invasion, around the start of the Common Era, appear Vasconic toponyms south of the Pyrenees.

Lusitanian obviously does not qualify as Celtic, lacking the most essential traits that define Celticness… Unless you define “(Para-)Celtic” as Pre-Proto-Celtic-like, or anything of the sort to support some Atlantic continuity, in which case you can also argue that Pre-Italic or Pre-Germanic are Celtic, because you would be essentially describing North-West Indo-European.

If Basques have R1b, it’s because of a culture of “matrilocality” as opposed to the “patrilocality” of Indo-Europeans

So wrong it hurts my eyes every time I read this. Not only does matrilocality in a regional group have few known effects in genetics, but there are many well-documented cases of population replacement (with either ancestry or Y-DNA haplogroups, or both) without language replacement, without a need to resort to “matrilineality” or “matrilocality” or any other cultural difference in any of these cases.

In fact, it seems quite likely now that isolated ancient peoples north of the Pyrenees will show a gradual replacement of surviving I2a lineages by neighbouring R1b, while early Iberian R1b-DF27 lineages are associated with Lusitanians, and later incoming R1b-DF27 lineages (apart from other haplogroups) are most likely associated with incoming Celts, which must have remained in north-central and central-east European groups.

NOTE. Notice how R1a is fully absent from all known early Indo-European peoples to date, whether Iberian IE, British IE, Italic, or Greek. The absence of R1a in Iberia after the arrival of Celts is even more telling of the origin of expanding Celts in Central Europe.

I haven’t had enough time to add Iberian samples to my spreadsheet, and hence neither to the ASoSaH texts nor maps/PCAs (and I don’t plan to, because it’s more efficient for me to add both, Asian and Iberian samples, at the same time), but luckily Maciamo has summed it up on Eupedia. Or, graphically depicted in the paper for the southeast:

Y chromosome haplogroup composition of individuals from southeast Iberia during the past 2000 years. The general Iberian Bronze and Iron Age population is included for comparison. Modified from Olalde et al. (2019).

Does this continued influx of Y-DNA haplogroups in Iberia with different cultures represent permanent changes in language? Are, therefore, modern Iberian languages derived from Lusitanian, Sorothaptic/Celtic, Greek, Phoenician, East or West Germanic, Hebrew, Berber, or Arabic languages? Obviously not. Same with Italy (see the recent preprint on modern Italians by Raveane et al. 2018), with France, with Germany, or with Greece.

If that happens in European regions with a known ancient history, why would the recent expansions and bottlenecks of R1b in modern Basques (or N1c around the Baltic, or R1a in Slavs) in the Middle Ages represent an ancestral language surviving into modern times?


If something is clear from Narasimhan, Patterson, et al. (2018), it is that we finally know the timing of the introduction and expansion of R1a-Z645 lineages among Indo-Iranians.

We could already propose since 2015 that a slow admixture happened in the steppes, based on archaeological finds, due to settlement elites dominating over common peoples, coupled with the known Uralic linguistic traits of Indo-Iranian (and known Indo-Iranian influence on Finno-Ugric) – as I did in the first version of the Indo-European demic diffusion model.

The new huge sampling of Sintashta – combined with that of Catacomb, Poltavka, Potapovka, Andronovo, and Srubna – shows quite clearly how this long-term admixture process between Uralic peoples and Indo-Iranians happened between forest-steppe CWC (mainly Abashevo) and steppe groups. The situation is not different from that of Iberia ca. 2500-2000 BC; from Narasimhan, Patterson, et al. (2018):

We combined the newly reported data from Kamennyi Ambar 5 with previously reported data from the Sintashta 5 individuals (10). We observed a main cluster of Sintashta individuals that was similar to Srubnaya, Potapovka, and Andronovo in being well modeled as a mixture of Yamnaya-related and Anatolian Neolithic (European agriculturalist-related) ancestry.

Even with so few words devoted to one of the most important findings in the paper about what happened in the steppes, Wang et al. (2018) help us understand what really happened with this simplistic concept of “steppe ancestry” regarding Yamna vs. Corded Ware differences:

Image modified from Wang et al. (2018). Marked are: in red, approximate limit of Anatolia_Neolithic ancestry found in Yamna populations; in blue, Corded Ware-related groups. “Modelling results for the Steppe and Caucasus cluster. Admixture proportions based on (temporally and geographically) distal and proximal models, showing additional Anatolian farmer-related ancestry in Steppe groups as well as additional gene flow from the south in some of the Steppe groups as well as the Caucasus groups (see also Supplementary Tables 10, 14 and 20).”

As with Iberia (or any prehistoric region), the details of how exactly this language change happened are not evident, but we only need a plausible explanation coupled with archaeology and linguistics. Poltavka, Potapovka, and Sintashta samples – like the few available Iberian ones ca. 2500-2000 BC – offer a good picture of the cohabitation of R1b-L23 (mainly Z2103) and R1a-Z645 (mainly Z93+): a glimpse at the likely presence of R1a-Z93 within settlements – which must have evolved as the dominant elites – in a society where the majority of the population was initially formed by nomad herders (probably most R1b-Z2103), who were usually buried outside of the main settlements.

Will the upcoming Narasimhan, Patterson et al. (2019) deal with this problem of how R1a-M417 replaced R1b-M269, and how the so-called “Steppe_MLBA” (i.e. Corded Ware) ancestry admixed with “Steppe_EMBA” (i.e. Yamnaya) ancestry in the steppes, and which one of their languages survived in the region (that is, the same the Reich Lab has done with Iberia)? Not likely. The ‘genetic wars’ in Iberia deal with haplogroup R1b-P312, and how it was neither ‘native’ nor associated with Basques and non-Indo-European peoples in general. The ‘genetic wars’ in South Asia are concerned with the steppe origin of R1a, to prove that it is not a ‘native’ haplogroup to India, and thus neither are Indo-Aryan languages. To each region a politically correct account of genetic finds, with enough care not to fully dismiss national myths, it seems.

NOTE. Funnily enough, these ‘genetic wars’ are the making of geneticists since the 1990s and 2000s, so we are still in the midst of mostly internal wars caused by what they write. Just as genetic papers of the 2020s will most likely be a reaction to what they are writing right now about “steppe ancestry” and R1a. You won’t find much change to the linguistic reconstruction in this whole period, except for the most multicolored glottochronological proposals…

The first author of the paper has engaged, as far as I could see on Twitter, in dialogue with Hindu nationalists who try to dismiss the arrival of steppe ancestry and R1a into South Asia as inconclusive (to support the potential origin of Sanskrit millennia ago in the Indus Valley Civilization). How can geneticists deal with the real problem here (the original ethnolinguistic group expanding with Corded Ware), when they have to fend off anti-steppists from Europe and Asia? How can they do it, when they themselves are part of the same societies that demand a politically correct presentation of data?

This is how the data on the most likely Indo-Iranian-speaking region should be presented in an ideal world, where – as in the Iberia paper – geneticists would look closely at the Volga-Ural region to discover what happened with Proto-Indo-Iranians from their earliest to their latest stage, instead of constantly looking for sites close to the Indus Valley to demonstrate who knows what about modern Indian culture:

Tentative map of the Late PIE and Indo-Iranian community in the Volga-Ural steppes since the Eneolithic. Proportion of ancestry derived from central European Corded Ware peoples. Colors indicate the Y-chromosome haplogroup for each male. Red lines represent period of admixture. Modified from Olalde et al. (2019).

Now try and tell Hindu nationalists that Sanskrit expanded from an Early Bronze Age steppe community of R1b-rich nomadic herders that spoke Pre-Indo-Iranian, which was dominated and eventually (genetically) mostly replaced by elite Uralic-speaking R1a peoples from the Russian forest, hence the known phonetic (and some morphological) traits that remained. Good luck with the Europhobic shitstorm ahead.


Iberian cultures, already with a majority of R1b lineages, show a clear northward expansion over previously Urnfield-like groups of north-east Iberia and Mediterranean France (which we now know probably represent the migration of Celts from central Europe). Similarly, Eastern Balts already under a majority of R1a lineages expanded likely into the Baltic region at the same time as the outlier from Turlojiškė (ca. 1075 BC), which represents the first obvious contacts of central-east Europe with the Baltic.

Iberia shows a more recent influx of central and eastern Mediterranean peoples, one of which eventually succeeded in imposing their language in Western Europe: Romans were possibly associated mainly with R1b-U152, apart from many other lineages. Proto-Slavs probably expanded later than Celts, too, connected to the disintegration of the Lusatian culture, and they were at some point associated with R1a-M458 and R1a-Z280(xZ92) lineages, apart from others already found in Early Slavs.

PCA of central-eastern European groups which may have formed the Balto-Slavic-speaking community derived from Bell Beaker, evident from the position ‘westwards’ of CWC in the PCA, and surrounding cultures. Left: Early Bronze Age. Right: Tollense Valley samples.

This parallel between Iberia and eastern Europe is no coincidence: as Europe entered the Bronze Age, chiefdom-based systems became common, and thus the connection of ancestry or haplogroups with ethnolinguistic groups became weaker.

What happened earlier (and who may represent the Pre-Balto-Slavic community) will be clearer when we have enough eastern European samples, but basically we will be able to depict this admixture of NWIE-speaking BBC-derived peoples with Uralic-speaking CWC-derived groups (since Uralic is known to have strongly influenced Balto-Slavic), similar to the admixture found in Indo-Iranians, more or less like this:

Tentative map of the North-West Indo-European and Balto-Slavic community in central-eastern Europe since the East Bell Beaker expansion. Proportion of ancestry derived from Corded Ware peoples. Colors indicate the Y-chromosome haplogroup for each male. Red lines represent period of admixture. Modified from Olalde et al. (2019).

The Early Scythian period marked a still stronger chiefdom-based system which promoted the creation of alliances and federation-like groups, with an earlier representation of the system expanding from north-eastern Europe around the Baltic Sea, precisely during the spread of Akozino warrior-traders (in turn related to the Scythian influence in the forest-steppes), who are the most likely ancestors of most N1c-V29 lineages among modern Germanic, Balto-Slavic, and Volga-Finnic peoples.

Modern haplogroup+language = ancient ones?

It is not difficult to realize, then, that the complex modern genetic picture in Eastern Europe and around the Urals, and also in South Asia (like that of the Aegean or Anatolia) is similar to the Iron Age / medieval Iberian one, and that following modern R1a as an Indo-European marker just because some modern Indo-European-speaking groups showed it was always a flawed methodology; as flawed as following R1b for ancient Vasconic groups, or N1c for ancient Uralic groups.

Why people would argue that haplogroups mean continuity (e.g. R1b with Basques, N1c with Finns, R1a with Slavs, etc.) may be understood, if one still lives in the 2000s. Just like why one would argue that Corded Ware is Indo-European, because of Gimbutas’ huge influence since the 1960s with her myth of “Kurgan peoples”. Not many denied these haplogroup associations, because there was no reason to do it, and those who did usually aligned with a defense of descriptive archaeology.

However, it is a growing paradox that some people interested in genetics today would now, after the Iberian paper, need to:

  • accept that ancient Iberians and probably Aquitanians (each from different regions, and probably from different “Basque-Iberian dialects” in the Chalcolithic, if both were actually related) show eventually expansions with R1b-L23, the haplogroup most obviously associated with expanding Indo-Europeans;
  • acknowledge that modern Iberians have many different lineages derived from prehistoric or historic peoples (Celts, Phoenicians, Greeks, Romans, Jews, Goths, Berbers, Arabs), which have undergone different bottlenecks, the last ones during the Reconquista, but none of their languages have survived;
  • realize that a similar picture is to be found everywhere in central and western Europe since the first proto-historic records, with language replacement in spite of genetic continuity, such as the British Isles (and R1b-L21 continuity) after the arrival of Celts, Romans, Anglo-Saxons, Vikings, or Normans;
  • but, at the same time, continue blindly asserting that haplogroup R1a + “steppe ancestry” represent some kind of supernatural combination which must show continuity with their modern Indo-Iranian or Balto-Slavic language from time immemorial.

Replacement of R1b-L23 lineages during the Early Bronze Age in eastern Europe and in the Eurasian steppes: emergence of R1a in previous Yamnaya and Bell Beaker territories. Modified from EBA Y-DNA map.

Behave, pretty please

The ‘conservative’ message espoused by some geneticists and amateur genealogists here is basically as follows:

  • Let’s not rush to new theories that contradict the 2000s, lest some people get offended by granddaddy not being these pure whatever wherever as they believed, and let’s wait some 5, 10, or 20 years, as long as necessary – to see if some corner of the Yamna culture shows R1a, or some region in north-eastern Europe shows N1c, or some Atlantic Chalcolithic sample shows R1b – to challenge our preferred theories, if we actually need to challenge anything at all, because it hurts too much.
  • Just don’t let many of these genetic genealogists or academics of our time be unhappy, pretty please with sugar on top, and let them slowly adapt to reality with more and more pet theories to fit everything together (past theories + present data), so maybe when all of them are gone, within 50 or 70 years, society can smoothly begin to move on and propose something closer to reality, but always as politically correct as possible for the next generations.
  • For starters, let’s discuss now (yet again) that Bell Beakers may not have been Indo-European at all, despite showing (unlike Corded Ware) clearly Yamna male lineages and ancestry, because then Corded Ware and R1a could not have been Indo-European and that’s terrible, so maybe Bell Beakers are too brachycephalic to speak Indo-European or something, or they were stopped by the Fearsome Tisza River, or they are not pure Dutch Single Grave in The South hence not Indo-European, or whatever, and that’s why Iron Age Iberians or Etruscans show non-Indo-European languages. That’s not disrespectful to the history of certain peoples, of course not, but talking about the evident R1a-Uralic connection is, because this is The South, not The North, and respect works differently there.
  • Just don’t talk about how Slavs and Balts enter history more than 1,500 years later than Indo-European peoples in Western and Southern Europe, including Iberia, and assume a heroic continuity of Balts and Slavs as pure R1a ‘steppe-like’ peoples dominating over thousands of kms. in the Baltic, Fennoscandia, eastern Europe, and northern Asia for 5,000 years, with multiple Balto-Slavs-over-Balto-Slavs migrations, because these absolute units of Indo-European peoples were a trip and a half. They are the Asterix and Obelix of white Indo-European prehistory.
  • Perhaps in the meantime we can also invent some new glottochronological dialectal scheme that fits the expansion of Sredni Stog/Corded Ware with (Germano-?)Indo-Slavonic separated earlier than any other Late PIE dialect; and Finno-Volgaic later than any other Uralic dialect, in the Middle Ages, with N1c.
Genetic structure of the Balto-Slavic populations within a European context according to the three genetic systems, from Kushniarevich et al. (2015). Pure Balto-Slavs from…hmm…yeah this…ancient…region…or people…cluster…Whatever, very very steppe-like peoples, the True Indo-Europeans™, so close to Yamna…almost as close as Finno-Ugrians.

To sum up: Iberia, Italy, France, the British Isles, central Europe, the Balkans, the Aegean, or Anatolia, all these territories can have a complex history of periodic admixture and language replacement everywhere, but some peoples appearing later than all others in the historical record (viz. Basques or Slavs) apparently cannot, because that would be shameful for their national or ethnic myths, and these should be respected.

Ignorance of one’s own past as a blank canvas to be filled in with stupid ethnolinguistic continuity, turned into something valuable that should not be challenged. Ethnonationalist-like reasoning typical of the 19th century. How can our times be called ‘modern’ when this kind of magical thinking is still prevalent, even among supposedly well-educated people?


Haplogroup R1b-M167/SRY2627 linked to Celts expanding with the Urnfield culture


As you can see from my interest in the recently published Olalde et al. (2019) Iberia paper, once you accept that East Bell Beakers expanded North-West Indo-European, the most important question becomes how its known dialects spread to their known historic areas.

We already had a good idea about the expansion of Celts, based on proto-historical accounts, fragmentary languages, and linguistic guesstimates, but the connection of Celtic with either Urnfield or slightly later Hallstatt/La Tène was always blurred, due to the lack of precise data on population movements.

The latest paper on Iberia is interesting for many details, such as:

  • The express dismissal of the newest pet theory based on the simplistic “steppe ancestry = IE”: the obsessive comparisons of Dutch Bell Beakers as the origin of basically anything that moves in Europe.
  • A discrete influx of North African ancestry in certain samples before the Moorish invasion (which was probably mediated by peoples of North African rather than Levantine admixture).
  • The finding of very Mycenaean-like Greek colonies of the 5th century (interestingly, under R1b lineages).
Modified from section of PCA of ancient samples by Olalde et al. (2019). “IE Iberia” refers to Pre-Celtic Indo-European languages of Iberia, such as Galaico-Lusitanian in the west (see more on Lusitanian), and a potentially Ligurian-related language in the North-East and southern France.

The paper is, however, of particular importance from the perspective of historical linguistics. It confirms that:

  • Celtic-speaking peoples expanded in Iberia likely during the Late Bronze Age – Early Iron Age (probably with the Urnfield culture, before 1000 BC) with North/Central European ancestry.

NOTE. The paper marks what are believed to be the boundaries of non-Indo-European languages during the Iron Age in later times, extrapolating that situation to the past. Mediterranean sites with Iberian traits (ca. 6th century on) were probably non-Indo-European-speaking tribes, but it is unclear what happened in the centuries before their sampling, and there are no clear boundaries. The arrival of these Celts from central Europe with the Urnfield culture makes it very likely that the Iberian expansion to the north happened later, thus incorporating this central European ancestry in the process. The southern (orientalizing, Tartessian) site of La Angorrilla shows incineration and influence from Phoenician settlers, and the actual language of its inhabitants is also far from clear. The other investigated samples, with higher central European contribution, are from Celtiberian sites.

  • The slightly later arrival of (Phoenician, Greek and) Latin-speaking peoples into Iberia is marked by Central/Eastern Mediterranean and North African ancestry.
Expansion of different ancestry components in Iberia during Prehistory. Modified from Olalde et al. (2019) to include labels with populations expanding with each component.

While both confirm what was more or less already known about the oldest attested NWIE dialects, and further support the role of East Bell Beakers in expanding North-West Indo-European, the first part is interesting for two main reasons:

  1. Koch’s Celtic from the West hypothesis, which made a recent comeback with a renewed model based on “steppe ancestry”, is once again rejected in population genomics, as expected. At this point I doubt this will mean anything to the supporters of the theory (because you can propose as many “Celtic-over-Celtic” layers as you want), but if you are not obsessed with autochthonous continuity of Celtic languages in the Atlantic area, we might begin to judge which of the dialectal splits (and thus classifications) proposed to date is the most correct, based on ancestry and haplogroup expansions.
  2. We believed in the 2000s that the expansion of haplogroup R1b-M167 (TMRCA ca. 1100 BC for YTree or 1700 BC for YFull) was coupled with the expansion of Iberians from the Pyrenees, in turn (thus) closely related to Basques. This non-IE presence has been contested with toponymic data in linguistics, and with the testing of many modern samples and the subsequent discovery of the widespread distribution of the subclade in western and northern Europe. Now it has become even more likely (lacking confirmation with aDNA) that this haplogroup expanded with Celts.

NOTE. Regarding R1b SNPs, YTree has more samples (and thus more SNPs) on which to base its estimates, due to its connection with FTDNA groups, so it is in principle more reliable (although its estimates were calculated in 2017). Nevertheless, YTree and YFull use different methods to estimate the age of the MRCA.

YTree estimations of TMRCA for R1b-Z262 (left) and R1b-M167 (right).

Why this is important has to do with the realization that Celts must have expanded explosively in all directions during the estimated range for Common Celtic (ca. 1500-1000 BC), and as such R1b-M167 is probably going to be one of the clear Y-DNA markers of the Celtic expansion, when it appears in the ancient DNA record, maybe in new SNP calls from samples of the Olalde et al. (2019) paper, or in future Urnfield/Hallstatt/La Tène papers.

Sister clades derived from R1b-Z262 (TMRCA ca. 1650 BC for YTree, or 2700 BC for YFull), although sharing a quite old origin, may have taken part in the same communities that expanded R1b-M167, likely from some point in central Europe, possibly as remnants of a previous (Tumulus culture?) central European expansion, as the sample SZ5 from Szólád (R1b-CTS1595) and the distribution of modern samples suggest.

Left: Modern distribution of upstream clade L176.2 (YFull R1b-CTS4188); Right: Modern distribution of M167. Both include later expansions within Iberia (probably with the Crown of Aragon during the Reconquista). Contour maps of the derived allele frequencies of the SNPs analyzed in Solé-Morata et al. (2017).

EDIT (23 APRIL): In Hernández et al. (2018), the TMRCA of R1b-M167 is reported as 3372-3718 ybp:

The youngest sub-branch, R1b-M167, dates to approximately 3.5 kya (95% CI= 2.5-5.3 kya), i.e. even after the Bronze Age.
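Since the estimates above mix “BC” dates (YTree, YFull) with “ybp” and “kya” figures (Hernández et al.), a quick conversion helps to compare them. The following is only a minimal sketch: the function name `ybp_to_calendar` is made up for illustration, and the reference year for “present” is an assumption (1950 is the radiocarbon convention; the paper may count back from a different year).

```python
def ybp_to_calendar(ybp, present=1950):
    """Convert years before present to a (year, era) pair.

    `present` is an assumption: 1950 is the radiocarbon convention, while
    genetic TMRCA estimates may count back from the sampling year instead.
    """
    astronomical = present - ybp  # astronomical numbering includes a year 0
    if astronomical <= 0:
        return (1 - astronomical, "BC")  # year 0 is 1 BC, -1 is 2 BC, ...
    return (astronomical, "AD")


# TMRCA range for R1b-M167 as quoted from Hernández et al. (2018)
for ybp in (3372, 3718):
    year, era = ybp_to_calendar(ybp)
    print(f"{ybp} ybp ~ {year} {era}")
```

Under that assumption the reported range corresponds to roughly 1770-1420 BC. Counting back from the publication year instead shifts both ends by less than 70 years, which is negligible next to the quoted confidence interval.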

Contour (surface) maps displaying the frequencies of Y-chromosome haplogroup and its sub-lineages across Europe and the Mediterranean basin. Modified from Hernández et al. (2018).

NOTE. Admittedly, the maps are mainly based on Iberian samples and certain limited sampling elsewhere, so most of the frequencies displayed in other territories are extrapolated. Since the percentage of R1b-M167 in France is estimated to be ca. 3%, and in Bavaria ca. 5%, the actual frequency in Central Europe is probably much higher, and around the Mediterranean much lower, than represented in them.

The Celtic expansion might not have been a mass migration of peoples replacing all male lines of their controlled territories (as was common in the Neolithic and Chalcolithic), because of the Bronze Age dominant chiefdom-based system that relied on alliances, but it is becoming clear that Early Celts are also going to show the expansion of certain successful male lineages.

Oh, and you can say goodbye to the autochthonous “Vasconic = R1b-DF27” (latest heir of the “Vasconic = R1b-P312”) theory, too, if – for some strange reason – you hadn’t already.

EDIT (16 MAR): Just in case the wording is not clear: the fact that this haplogroup most likely expanded with Celts does not mean that its lineages didn’t eventually become incorporated into Iberian cultures and adopt non-IE languages: some of them probably did at some point, in some regions of northern Iberia, and most were certainly later incorporated into the Roman civilization and spoke Latin, then into the medieval kingdoms with their languages, and so on until the present day… Only those eventually associated with Iron Age Aquitanians may have retained their non-IE language, unless those lineages today associated with Basques were incorporated later into the Basque-speaking regions by expanding medieval kingdoms. A complex picture repeated everywhere in Europe: no haplogroup+language continuity in sight, anywhere.

NOTE. This is currently the most likely interpretation of the data, based on estimations of mutations; it is not confirmed with ancient samples.


Iberia: East Bell Beakers spread Indo-European languages; Celts expanded later


New paper (behind paywall), The genomic history of the Iberian Peninsula over the past 8000 years, by Olalde et al. Science (2019).

NOTE. Access to article from Reich Lab: main paper and supplementary materials.


We assembled genome-wide data from 271 ancient Iberians, of whom 176 are from the largely unsampled period after 2000 BCE, thereby providing a high-resolution time transect of the Iberian Peninsula. We document high genetic substructure between northwestern and southeastern hunter-gatherers before the spread of farming. We reveal sporadic contacts between Iberia and North Africa by ~2500 BCE and, by ~2000 BCE, the replacement of 40% of Iberia’s ancestry and nearly 100% of its Y-chromosomes by people with Steppe ancestry. We show that, in the Iron Age, Steppe ancestry had spread not only into Indo-European–speaking regions but also into non-Indo-European–speaking ones, and we reveal that present-day Basques are best described as a typical Iron Age population without the admixture events that later affected the rest of Iberia. Additionally, we document how, beginning at least in the Roman period, the ancestry of the peninsula was transformed by gene flow from North Africa and the eastern Mediterranean.

Interesting excerpts:

From the Bronze Age (~2200–900 BCE), we increase the available dataset (6, 7, 17) from 7 to 60 individuals and show how ancestry from the Pontic-Caspian steppe (Steppe ancestry) appeared throughout Iberia in this period (Fig. 1, C and D), albeit with less impact in the south (table S13). The earliest evidence is in 14 individuals dated to ~2500–2000 BCE who coexisted with local people without Steppe ancestry (Fig. 2B). These groups lived in close proximity and admixed to form the Bronze Age population after 2000 BCE with ~40% ancestry from incoming groups (Fig. 2B and fig. S6).

Y-chromosome turnover was even more pronounced (Fig. 2B), as the lineages common in Copper Age Iberia (I2, G2, and H) were almost completely replaced by one lineage, R1b-M269. These patterns point to a higher contribution of incoming males than females, also supported by a lower proportion of nonlocal ancestry on the X-chromosome (table S14 and fig. S7), a paradigm that can be exemplified by a Bronze Age tomb from Castillejo del Bonete containing a male with Steppe ancestry and a female with ancestry similar to Copper Age Iberians.


For the Iron Age, we document a consistent trend of increased ancestry related to Northern and Central European populations with respect to the preceding Bronze Age (Figs. 1, C and D, and 2B). The increase was 10 to 19% (95% confidence intervals given here and in the percentages that follow) in 15 individuals along the Mediterranean coast where non-Indo-European Iberian languages were spoken; 11 to 31% in two individuals at the Tartessian site of La Angorrilla in the southwest with uncertain language attribution; and 28 to 43% in three individuals at La Hoya in the north where Indo-European Celtiberian languages were likely spoken (fig. S6 and tables S11 and S12).

This trend documents gene flow into Iberia during the Late Bronze Age or Early Iron Age, possibly associated with the introduction of the Urnfield tradition (18). Unlike in Central or Northern Europe, where Steppe ancestry likely marked the introduction of Indo-European languages (12), our results indicate that, in Iberia, increases in Steppe ancestry were not always accompanied by switches to Indo-European languages.

I think it is obvious they are extrapolating the traditional (not that well-known) linguistic picture of Iberia during the Iron Age, believing in continuity of that picture (especially non-Indo-European languages) during the Urnfield period and earlier.

What this data shows is, as expected, the arrival of Celtic languages in Iberia after Bell Beakers and, by extension, in the rest of western Europe. Somewhat surprisingly, this may have happened during the Urnfield period, and not during the La Tène period.

Also important are the precise subclades:

We thus detect three Bronze Age males who belonged to DF27 (154, 155), confirming its presence in Bronze Age Iberia. The other Iberian Bronze Age males could belong to DF27 as well, but the extremely low recovery rate of this SNP in our dataset prevented us to study its true distribution. All the Iberian Bronze Age males with overlapping sequences at R1b-L21 were negative for this mutation. Therefore, we can rule out Britain as a plausible proximate origin since contemporaneous British males are derived for the L21 subtype.

New open access paper Survival of Late Pleistocene Hunter-Gatherer Ancestry in the Iberian Peninsula, by Villalba-Mouco et al. Cell (2019):

BAL0051 could be assigned to haplogroup I1, while BAL003 carries the C1a1a haplogroup. To the limits of our typing resolution, EN/MN individuals CHA001, CHA003, ELT002 and ELT006 share haplogroup I2a1b, which was also reported for Loschbour [73] and Motala HG [13], and other LN and Chalcolithic individuals from Iberia [7, 9], as well as Neolithic Scotland, France, England [9], and Lithuania [14]. Both C1 and I1/ I2 are considered typical European HG lineages prior to the arrival of farming. Interestingly, CHA002 was assigned to haplogroup R1b-M343, which together with an EN individual from Cova de Els Trocs (R1b1a) confirms the presence of R1b in Western Europe prior to the expansion of steppe pastoralists that established a related male lineage in Bronze Age Europe [3, 6, 9, 13, 19]. The geographical vicinity and contemporaneity of these two sites led us to run genomic kinship analysis in order to rule out any first or second degree of relatedness. Early Neolithic individual FUC003 carries the Y haplogroup G2a2a1, commonly found in other EN males from Neolithic Anatolia [13], Starçevo, LBK Hungary [18], Impressa from Croatia and Serbia Neolithic [19] and Czech Neolithic [9], but also in MN Croatia [19] and Chalcolithic Iberia [9].

See also