R1b-rich earliest Corded Ware, a Yamnaya-related vector of Indo-European languages

New open access paper, Dynamic changes in genomic and social structures in third millennium BCE central Europe, by Papac et al., Science Advances (2021).

Interesting excerpts (emphasis mine):

We report genomic data from the earliest CW individuals to date [show] that CW was widespread across Bohemia by 2900 BCE. The early radiocarbon dates are also supported by these individuals’ genetic profiles, who occupy the most extreme positions on PC2, as expected under a scenario of the earliest CW being migrants from the east who mixed with locals, resulting in intermediate PC2 positions in later generations.

(…)We found poor statistical support (P < 0.005) for modeling Bohemia_CW_Early as a two-way mixture of any known Yamnaya source and any local Bohemian or nonlocal pre-CW source from Poland, Ukraine, Hungary, or Germany (...) We find that when either one of Latvia_MN, Ukraine_Neolithic, or PittedWare is added as a source, almost all (280 of 285) model fits (P values) improve and most of them by several orders of magnitude.

PCA of published and newly reported ancient individuals from Bohemia (n = 271). Data are displayed in four major time periods: (A) Pre-CW–Eneolithic, (B) CWC. Modern-day West Eurasian individuals upon which principal components were calculated (n = 1141; table S7) are grayed out in the background with modern-day Czech and relevant ancients (projected) plotted as colored polygons for reference [labeled in (A), WHG, EHG, Latvia Bronze Age (BA), Yamnaya Samara/Kalmykia, and Anatolia Neolithic]. Individuals mentioned in the main text are labeled.

We provide the first genomic data from CW individuals without steppe ancestry, thereby elucidating the social processes of interaction between CW and pre-CW people. Observing only females (four of four) among early CW individuals without steppe ancestry suggests that the process of assimilating pre-CW people into early CW society was female-biased. Two of these females (STD003 and VLI008) plot in close PCA space to GAC individuals from Bohemia and Poland.

R1b-L151 is the most common Y-lineage among early CW males (6 of 11, 55%) and one branch ancestral to R1b-P312, the dominant Y-lineage in BB.
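A quick way to keep the sample size in mind when reading the "6 of 11, 55%" figure is to attach an interval to it. The sketch below is only an illustration and is not part of the paper's analysis: it computes a standard Wilson score interval for a proportion, and the function name and the 95% level (z = 1.96) are my own assumptions.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to an approximate 95% interval (an assumption
    chosen for illustration, not a value taken from the paper).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    centre = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


# Counts quoted in the excerpt: 6 R1b-L151 males out of 11 early CW males.
share = 6 / 11
low, high = wilson_interval(6, 11)
print(f"observed share: {share:.0%}, approx. 95% interval: {low:.0%} to {high:.0%}")
```

With only eleven males the interval is wide, so the percentage is best read as "the most common lineage in this sample" rather than as a precise population frequency.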

Temporal Y chromosome and autosomal PC2 variation in Bohemia.
(A) Temporal distribution of Y chromosome haplogroups by culture. Schematic of phylogenetic relationships between Y chromosome lineages is shown along y axis. Dashed vertical lines demarcate respective (colored) cultural group into early and late phases.
(B) Temporal variation in PC2 showing the genetic variation of males and females within each cultural group.


Six years after their famous publications linked the “Steppe” ancestry of Corded Ware individuals with that of Yamnaya-Afanasievo samples, two of the labs responsible for these papers have finally released data showing incontrovertible proof of a migration of a population directly related to the paradigmatic representatives of Indo-Tocharian speakers in terms of Y-chromosome haplogroups, despite their differing ancestry and the conspicuous lack of hg. R1a among sampled Yamnaya or Afanasievo groups.

The spread of hg. R1b-L151 (including common subclade R-U106) is now undoubtedly associated with the expansion of the early Corded Ware community. Consequently, as suggested in the paper, Bell Beakers probably formed to the west of Bohemia, closer to the Rhine, and spread with (North-?)West Indo-European dialects from there in all directions. Even though R-P312 has not been reported yet in this sampling (and there is still no access to BAM files), I think the most parsimonious model is that the predecessors of typical Bell Beaker branches formed part of this “Northern Route”, too.

The Corded Ware culture is therefore the most likely initial vector of expansion of Core Indo-European dialects in Northern Europe. Apart from that important revelation, this paper also shows that geneticists and archaeologists are recognizing some of the errors of their ways:

  • Eneolithic migrations into Europe were heavily male biased. Just like many other prehistoric population movements, mind you. But Proto-Indo-Europeans in particular have shown an obvious, strong tendency towards patrilineally dominated groups since the very first genomic papers were published; so much so that there seemed to be a 1:1 relationship between Y-DNA bottlenecks and cultural expansions. The fact that patrilineality and patrilocality are also reflected in their reconstructed languages should make anachronistic woke debates about gender equality in prehistoric cultures completely unnecessary. David Reich might have finally changed his opinion about Goldberg, Günther et al. PNAS (2017). Or not, it is difficult to tell with so many coauthors.
  • Detailed phylogenetic and geographic investigation of Y-chromosome haplogroups is key to establishing direct genetic links between population expansions. The links with R1b-L51 (R-P310 subclades have already been found at least in Yamnaya Slovakia, Yamnaya Bulgaria, and in Afanasievo) and R1b-L151 subclades in early Corded Ware individuals, and especially the more prevalent R1b-Z2103 in a slightly later CWC sample, are the ultimate proof of how tightly both populations are connected. Further, the presence of a Q1b2a individual – a lineage probably shared with Yamnaya from the Carpathian Basin and earlier with Khvalynsk – might offer further support to this connection, as probably do the archaic R1b-V1636 lineages from late Single Gravers in Denmark.
  • It is nice to see labs finally admitting that early Corded Ware individuals showed a higher contribution of hunter-gatherer ancestry than Yamnaya, representing a different kind of “Forest-Steppe” ancestry, as I had long suggested. Contrary to the authors’ weird belated change of heart about a lack of connection with sampled steppe populations based on this differing ancestry contribution, their published Y-DNA data has finally established a direct link with Y-chromosome haplogroups present among Yamnaya and Afanasievo. This suggests that the extra HG admixture in early Corded Ware was probably a mainly female-mediated Ukraine Eneolithic-like contribution stemming from the north Pontic region (i.e. from “Skelya” cultures).

In fact, this difference from the Yamnaya-Afanasievo or “Steppe” ancestry proper and the consistency of its contribution among the early R1b-rich population from CWC Bohemia – in line with the ancestry of early R1a-rich CWC samples from the SE Baltic – is what has also made me change my mind about their most likely “Northern Route” of expansion, instead of a “Southern Route” mediated by Yamnaya settlers from the Carpathian Basin, which would be supported by radiocarbon dates.

NOTE. Nevertheless, it is not unthinkable that a similar, unsampled hunter-gatherer population survived around the northern Carpathian Basin or in southern Poland or Slovakia, and that it was in fact the western Yamnaya settlers from the Carpathian Basin who admixed with it and eventually spread as the earliest Corded Ware peoples from Lesser Poland, as David W. Anthony seems to believe even today. Admittedly, the current phylogeographic map shows a weird pincer-like Steppe-related population movement, with R1b-Z2103 and R1b-L51 subclades emerging in unrelated coeval settlements in Yamnaya Slovakia and CWC Bohemia ca. 2700 BC… On the other hand, I reckon that their radically different cultural packages suggest an earlier separation in the north Pontic area, geographically and chronologically closer to the Yamnaya-Afanasievo split.

Schematic summary of the major processes that shaped the genetic and cultural diversity of Bohemia (red outline) over time. Arrows on maps indicate a general direction of influences rather than discrete routes of migration.

I was wrong

Based on Heyd’s model and available genetic data, I have argued for about four years now that the Yamnaya from the Carpathian Basin gave rise directly to the earliest full-fledged, R1b-L151-rich East Bell Beakers along the Middle-Upper Danube, which later expanded to most of Europe, an archaeological model which seamlessly aligned chronologically and geographically with the development and expansion of North-West Indo-European.

After all, Gimbutas’ outdated Kurgan hypothesis had already been critically reviewed multiple times, proving that many of her so-called kurgan cultures featured just a few traits of a broadly defined Yamnaya package. Lothar Kilian in particular showed that the vast majority of defining traits of the Corded Ware package (among the 23 diagnostic features he selected) did not coincide with those of the Yamnaya (Mallory 1989), which contrasts with other very similar roughly contemporary cultures ranging from South-Eastern Europe to Mongolia, including the Bell Beaker package.

The differing Y-DNA and ancestry of Corded Ware peoples excluded it, in my opinion – like many other cultures Gimbutas considered part of the “kurgan waves” – from taking part in early Indo-European expansions. An obvious consequence of this, given the mixed sub-Neolithic nature of the north Pontic forest-steppe cultures, heavily influenced by incoming Comb Ware groups, was that it became ipso facto the most likely vector of Uralic languages, due to the traditional description of the Uralic homeland in North-Eastern Europe, and the lack of appropriate alternative genetic inflows to explain the expansion of Uralic languages.

In my opinion, the data published in this paper make my ethnolinguistic model no longer valid.

Battle Axe culture European single grave burial forms, trepanation centers, and selected connecting important tomb gifts: amphora types and battle stone hammer axe forms. Exchanges between the South Hvar and Central European Bohemian groups are supported by presence of traded West-Bohemian hammer axes (selected redrawn axes, amphora and C14 data composed from different papers). Image from Diedrich (2018).

Historical linguistics

I have to (re)think in detail how I would fit ethnolinguistic labels to Final Eneolithic / Early Bronze Age groups, which is my main interest. I have put little effort to date into fully comprehending the internal evolution of the Central European Corded Ware, because I deemed it a mere cultural dead end, phagocytosed very soon by the Yamnaya-Bell Beaker expansion. I suppose now that the unifying A-horizon might have to do with the visible Y-DNA bottleneck under R1a-CTS4385, and potentially with an internal reactionary wave that gave rise to the Bell Beaker phenomenon in the west.

For the moment, early Corded Ware being a vector of Indo-European languages means that most of Europe received waves of Indo-European speakers at some point or another during the EEBA, hence most Bronze Age groups will remain unclassifiable, formed by speakers of IE and non-IE branches that were never recorded. Also, Únětice with its 80% influx of new Y-lineages from the east (including late samples!) gives yet another turn of the screw to the complex interactions in Central-Eastern European EBA, and it is very likely that it will add pressure to locate later Proto-Balto-Slavic developments in a more westerly area than is customary.

The basic ethnolinguistic scheme would look like this:

  • Indo-Anatolian spreading with Khvalynsk-Novodanilovka, and the Proto-Anatolian split with Suvorovo. No other cultural and population expansion fits better the assumed Early Eneolithic spread of PIE.
    • Indo-Tocharian spreading with the population ancestral to Yamnaya-Afanasievo, which is most likely* represented by Repin, a culture with its core area located around the Middle Don River and Don-Volga interfluve.
      • The archaic Tocharian branch spreading with Afanasievo. No other culture could better fit its assumed earlier split.
      • Core Indo-European spreading with Early Yamnaya*. From here on, the picture becomes blurrier:
        • North-West Indo-European spreading with Central European Corded Ware.
        • Palaeo-Balkan branches spreading with West Yamnaya and/or Catacomb.

*That is, assuming that the traditional picture of Yamnaya-Afanasievo ethnogenesis is right. The amazingly close and recent kinship ties shared by Yamnaya and Afanasievo individuals, as well as their late radiocarbon dates (mean ca. 3100-3000 BC), seem to speak for something a little older than Yamnaya for a community that would include “proto-Corded Ware”, of roughly the same age, which could potentially mean the neighbouring late Sredni Stog. On the other hand, I am guessing that these new Early Corded Ware individuals from Bohemia will show much closer Yamnaya “cousins” than the diluted IBD-sharing reported for late CW individuals, and the highly variable HG-related contribution among upcoming Yamnaya samples from the Don River (the core Repin area) could be the perfect fit to explain the extra HG ancestry among early Corded Ware.
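The branching scheme above is essentially a tree of language stages paired with archaeological cultures, so it can be written down as a small data structure. The following is a minimal sketch for readers who want to keep the scheme in a machine-readable form; the class and function names are hypothetical, placing Proto-Anatolian as a child of Indo-Anatolian is my reading of the first bullet, and the tentative status of Repin and Early Yamnaya (the asterisks) is kept only as a comment.

```python
from dataclasses import dataclass, field


@dataclass
class Branch:
    language: str   # reconstructed language stage
    culture: str    # archaeological culture proposed as its vector
    children: list = field(default_factory=list)


# Hypothetical encoding of the scheme listed in the post.
# Repin and Early Yamnaya are the asterisked (tentative) assignments.
SCHEME = Branch("Indo-Anatolian", "Khvalynsk-Novodanilovka", [
    Branch("Proto-Anatolian", "Suvorovo"),
    Branch("Indo-Tocharian", "Repin", [
        Branch("Tocharian", "Afanasievo"),
        Branch("Core Indo-European", "Early Yamnaya", [
            Branch("North-West Indo-European", "Central European Corded Ware"),
            Branch("Palaeo-Balkan", "West Yamnaya and/or Catacomb"),
        ]),
    ]),
])


def walk(branch, depth=0):
    """Yield (depth, branch) pairs in depth-first order."""
    yield depth, branch
    for child in branch.children:
        yield from walk(child, depth + 1)


def lineage(root, language):
    """Return the list of language stages from the root down to `language`,
    or an empty list if that stage is not in the tree."""
    if root.language == language:
        return [root.language]
    for child in root.children:
        path = lineage(child, language)
        if path:
            return [root.language] + path
    return []


if __name__ == "__main__":
    for depth, branch in walk(SCHEME):
        print("  " * depth + f"{branch.language} <- {branch.culture}")
    print(" > ".join(lineage(SCHEME, "North-West Indo-European")))
```

Keeping the culture as a separate field makes it easy to swap an assignment (for example, late Sredni Stog instead of Repin) without touching the linguistic branching itself.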

Some of the main problems with this new picture, as far as I can tell, will revolve around the identification of Fatyanovo-Abashevo with the Indo-Iranian-speaking community. While it would suit the areal satemization process of late eastern Corded Ware groups in contact (signaled strongly by the secondary and late spread of R1a-Z645 from the east, a phenomenon also attested in data from this paper), it leads to important inconsistencies about Pre-Proto-Indo-Iranian, like:

  • Its mainstream classification in common with Palaeo-Balkan languages into a Graeco-Aryan group which must have remained in close contact after the split of NWIE;
  • its evident close, long-term interaction with a disintegrating Proto-Uralic community, and the roughly coeval influence on most Uralic branches of the PU Agricultural Substrate;
  • and, quite simply, the need to refer to Pre-Proto-Indo-Iranians now as agriculturalists from the North-Eastern European forest and taiga, which contradicts the most basic tenets of PIIr. palaeolinguistics.

I see a future full of ad hoc explanations to the Indo-Iranian and Uralic question(s), which is bound to create a constant conflict with linguists, but I consider the simpler solution in genetics better than more complex alternatives involving unfalsifiable cultural diffusion events.

Future changes

As some of you already know, I have been pondering for a while whether to continue with this project or not. Due to water damage to my hard drives, I have lost almost all the original files I worked with in the past 3½ years or so, as well as hundreds of collected books and thousands of carefully classified articles on linguistics and archaeology. Not to mention other personal and professional projects, which have a higher value than this hobby.

Thanks to the online sharing of resources, I still have an almost up-to-date Ancient DNA Dataset, which I intend to keep updating the best I can, and a year-old copy of the Prehistory Map series in .PSD, as well as whatever copies of documents and images I had stored with this blog.

I guess I could find new purpose now in recreating files to avoid spreading outdated information on the evolution of early Indo-European and Uralic languages and related haplogroups, including images, GIS maps, and other documents. I’ll admit though that this seems like a very hard, long-term task, and I have enjoyed spending my free time on outdoor activities in the past two months too much to just suddenly stay home writing and drawing.

Commenting on the data from this paper was worth the effort, though.
