Haplogroup is not language, but R1b-L23 expansion was associated with Proto-Indo-Europeans

This is a series of posts I wrote at the end of 2017 / beginning of 2018, to answer the wrong assumptions I could read in forums and blogs. I decided not to publish them then, seeing how many successive papers were confirming my theory in a (surprisingly) clear-cut way. Nevertheless, because I keep reading the same comments and personal attacks no matter what gets published even in mid-2018, I have decided to update and publish them. This way I will be able to respond to the “haplogroup R1a – Indo-European association” directly by pointing to any of these posts from now on, instead of writing a comment each time.

  1. Haplogroup is not language, but R1b-L23 expansion was associated with Proto-Indo-Europeans
  2. The history of the simplistic ‘haplogroup R1a — Indo-European’ association
  3. Tips for dialogue with those supporting the ‘haplogroup R1a — Indo-European’ association

The present state of the Indo-European question

Sometimes when I read what people think I wrote in the Indo-European demic diffusion model, it’s just sad: ‘Non-steppe origin of PIE’; ‘Indo-Europeans are R1b’; ‘R1a represents a Satem PIE dialect’… They either can’t read, or don’t want to read.

Even the post summing up my concept of (and predictions of 2018 for) the Proto-Indo-European / R1b-L23 association seems not to prevent people from making the wrong assumptions.

That is partly why I have decided to add some genetic data and simplistic ethnolinguistic labels to my cultural maps, which I am sure will be used by some against the complexity of Indo-Uralic migrations I wanted to convey in the demic diffusion model, but which (hopefully) can help some of those lost right now.

NOTE. I have changed my concepts little since the first edition of my paper (you can read the different versions at my Academia.edu account), and for the most part all genetic papers have been confirming the broad strokes of the Indo-European demic diffusion model.

I will repeat it, so that there is no confusion: these are SIMPLISTIC maps. The kind that should never be published. However, since there are a lot of similar wrong maps on the Internet spreading the wrong concepts, my hope is that these here will give a much better idea of Indo-European migrations than those.

NOTE. Even the cultural maps are too simplistic, in that they try to represent with artificial borders (similar to modern countries) what were usually many different cultural groups coexisting in neighbouring settlements, without any frontier running through them. To assign a clear-cut ethnolinguistic and haplogroup identification further simplifies the actual ancestral complex picture, rendering it more and more inexact.

They are thus a graphic representation of the cultural, linguistic, genetic, and phylogeographic concepts of the Indo-European demic diffusion model summed up the best I could, whereby Khvalynsk/Repin->Yamna culture represent the formation of Late Indo-European, and its consequences.

  • These are individual, medium resolution images. You can read and download a bigger one with all images combined (15 Mb) by clicking here.
  • Admixture data is added from graphics available in Wang et al. 2018. You can find the relevant accompanying graphic data at the end of the map row (or by clicking here).

Proto-Indo-European migrations can be simplistically described as follows, after the latest studies by Olalde et al., Mathieson et al., Narasimhan et al., Damgaard et al. (x2), Wang et al. 2018, among others:

NOTE. For a more conservative description of events, without some of these recent findings, you can read the Indo-European demic diffusion model, 3rd ed. (to be updated).


Palaeolithic migrations. See the original map.

Indo-Uralic (or Early PIE and Pre-Uralic near each other, for those who don’t support a genetic relationship of Indo-European and Uralic) must have been spoken in Eastern Europe before ca. 5000 BC. The development of that loose community in Eastern Europe after the Palaeolithic-Mesolithic transition may have accompanied the formation of EHG ancestry, and thus potentially the westward expansion of haplogroup R1a-M17, the eastward expansion of haplogroup R1b-P297, or both.

The spread of Elshanian pottery from the east, then the Middle Eastern Neolithisation wave spreading from the west, as well as the in situ formation of an early Khvalynsk – Sredni Stog cultural-historical community from an admixture with local cultures in the Pontic-Caspian steppe, offer the most likely ethnolinguistic community to be associated with Indo-Uralic speakers.

Mesolithic-Neolithic transition ca. 7000-5000 BC, with ancestral components. See the original map.


The development of a Middle PIE or Indo-Anatolian-speaking community must be identified with the peoples inhabiting the Volga-Ural region – i.e. probably Samara / early Khvalynsk cultures, whereas Early Proto-Uralic should probably be identified with cultures in the North Pontic region (such as early Sredni Stog), during the fifth millennium BC, in light of their cultural differences, their known samples and successive migration waves.

The different contribution of CHG ancestry from the Northern Caucasus, potentially from absorption of steppe peoples (apart from other potential admixture events) defines the formation of different Neolithic cultures, together with the appearance of evident Y-chromosome bottlenecks caused by expanding patrilineal clans and social interaction. The transition to nomadic pastoralism must have strengthened this social hierarchical trend that began ca. 5200-5000 BC in the steppe (Anthony 2007).

At the end of this period of initial differentiation (ca. 4200 BC or possibly earlier), the Suvorovo-Novodanilovka horse-rider chieftains appear in the North Pontic area and in the Lower Danube, bringing Khvalynsk symbology, which is likely to be the sign of Proto-Anatolian speakers migrating into the Balkans.

Neolithic migrations ca. 5000-4000 BC, with ancestral components. Notice the formation of different “steppe ancestral components” in the Pontic-Caspian steppe. See the original map.


Khvalynsk and Repin cultural materials develop side by side in the Volga-Ural region representing the development of an early Late PIE community (Common Indo-European stage of Late PIE), while cultures to the west of the Don such as late Sredni Stog or Kvitjana continue the early Sredni Stog traditions, thus probably connected with Proto-Uralic. The Caucasus Mountains remain an ethnolinguistic frontier between the steppe and North Iran, although some influence is seen from the steppe into Maykop, from known economic contacts.

Migration waves from Khvalynsk/Repin ca. 3300 BC are seen in different samples to the east in Central Asia, and to the south in Iran. Almost all of them share R1b-L23 lineages. Pre-Tocharian (of a more archaic Late PIE nature) is probably to be identified with the Afanasevo culture (predominantly of R1b-L23 lineages).

At the same time, Khvalynsk/Repin migrants settle to the west of the Don River (in the territory of the previous late Sredni Stog culture), to form the western South-Bug / Lower Don groups, which – together with the Volga-Ural / North Caucasian groups – are part of the early Yamna culture, which dominates from ca. 3300 BC over the Pontic-Caspian steppe. To the north of the North Pontic steppe-forest or forest zone, a Uralic-speaking Proto-Corded Ware culture develops (probably related to North Pontic steppe-forest peoples) somewhere in or around the Dnieper-Dniester area.

Eneolithic migrations ca. 4000-3300 BC. Notice the formation of different “steppe ancestral components” in the Pontic-Caspian steppe. See the original map.

Chalcolithic – Early Bronze Age

The period of an early Yamna culture constrained to the Pontic-Caspian steppe (ca. 3300-3000 BC) is followed by renewed waves of Late Proto-Indo-European migrations, during which areal contacts and innovations (even between Northern and Southern branches) can still be reconstructed, in a Disintegrating Proto-Indo-European community.

Early Chalcolithic migrations ca. 3300-2600 BC. See the original map.

Western Europe

Yamna migrants, of mixed R1b-L51 and R1b-Z2103 lineages, settle ca. 3000-2600 BC along the lower Danube, in the Balkans and the Carpathian basin, showing still a homogeneous genetic cluster with other Yamna groups, giving rise later to groups of:

East Bell Beakers clearly dominated culturally and genetically over almost all of Europe, ca. 2500-2000 BC, including previous Corded Ware territory, representing thus the most recent massive migration of steppe peoples in Europe, and being the only pan-European culture derived from Late Proto-Indo-European-speaking Yamna. They must therefore be identified with North-West Indo-European speakers, as proposed by Mallory (2013), and not just Italo-Celtic (as supported recently by the Danish school, based on Gimbutas’ outdated model):

Late Chalcolithic migrations ca. 2600-2250 BC. See the original map.

A) For Germanic, we already have proof that an appropriate, unitary Scandinavian society, ripe for the development of a common Pre-Germanic language (that expanded much later, during the Iron Age, as Proto-Germanic) could have developed only after the arrival of Bell Beakers (see Prescott 2017). The association of proto-historic Germanic tribes mainly with the expansion of R1b-U106 lineages bears witness to that.

NOTE. Even without taking into account the potential L51 samples from Khvalynsk, it is by now quite clear that R1b-L51 lineages were already admixed in Yamna settlers from the Carpathian Basin, and any subclade of U106, L21, DF27, or U152 can thus be found everywhere in Europe associated with any of those North-West Indo-European migrations. What we are seeing later, as in the East Bell Beaker migrants arriving in the British Isles (L21), Iberia (DF27), or the Netherlands/Scandinavia (U106), is the further reduction in variability coupled with the expansion of a few successful families (and their lineages), as we know usually happens during migrations.

B) For Balto-Slavic, it seems they were not part of the eastern Corded Ware peoples: the Copenhagen group denies an Indo-Slavonic group in the Nature paper, referring instead to a dominion of early Iranians in the steppes, following their traces to proto-historic and historic Iranian-speaking peoples. And we knew already that Bell Beakers dominated over Central-East Europe, before the resurgence of R1a-Z645 lineages in the region, which is compatible with the North-West Indo-European nature of their language undergoing a satemization process similar (but not identical) to the Indo-Iranian one (see the full discussion on Balto-Slavic here).

NOTE. The few ancestral traits common to Germanic and Balto-Slavic are today considered a common substrate language to both, and not due to close contacts (and still less a common branch, as was proposed in the 1st half of the 20th c.). You can read e.g. Kortlandt’s Baltic, Slavic, Germanic (2017), or our Corded Ware substrate hypothesis (2017). In both theories, the referenced substrate is likely a non-Indo-European language, and in both cases it is related to the Corded Ware culture, which represents their most common immediate ancestral population before the spread of Bell Beakers.

C) Italo-Celtic seems to be associated (based on modern haplogroup distribution) with the expansion of R1b-U152 with the Urnfield/Hallstatt culture in Central Europe. We already have one Hallstatt sample of R1b-U152 lineage. On the other hand, there is already some proof, from the Celtic expansions (associated early with La Tène) and from some Italic mtDNA samples, that proto-historic and early historic European cultural expansions were not necessarily similar to prehistoric migrations, in which a great part of the male lineages was replaced.

NOTE. The Italic and Celtic expansions may thus be more similar to the Turkic example, with Turkic cultural customs being imposed by an East Asian minority over Central Asian populations, which resulted in a small detectable increase in East Asian ancestry. And even then the ancestry of nomad populations which received it was extremely heterogeneous, with several individuals being genetically distributed at the extremes of the first principal component. This strong variation was probably more likely in multi-ethnic cultural organizations of proto-historic and early historic times (see Damgaard et al. Nature 2018 for details on the Turkic expansions).

Eastern Europe

In the Pontic-Caspian steppe, early Yamna groups evolve into (from west to east) Late Yamna, Catacomb, and Poltavka groups, ca. 2800-2300 BC, all still dominated by R1b-L23 lineages (see Catacomb samples and the Catacomb outlier), with:

Early Bronze Age migrations ca. 2250-1750 BC. See the original map.

The late Corded Ware groups of Finland and Estonia, as well as Fatyanovo and Abashevo (and succeeding groups of Eastern Europe) may now be more clearly associated with Proto-Finno-Ugric dialects, and thus probably Corded Ware groups in general with Uralic languages, whose western branches have not survived to this day, with their culture and language being replaced quite early by expanding Bell Beakers.

NOTE. While the demise of Central and Central-East European CWC groups is evident, continuous contacts among Battle Axe culture groups in Scandinavia and the Gulf of Finland through the Baltic Sea – and the strong Bronze Age Palaeo-Germanic influence on Finnic languages (stronger than earlier Indo-Iranian borrowings) – may point to the continuity of Proto-Finnic in Northern Scandinavia, which may force a reinterpretation of the prehistoric location of Proto-Finnic-speaking groups.

Wang et al. (2018). ADMIXTURE and PCA results, and chronological order of ancient Caucasus individuals. (a) ADMIXTURE results (k=12) of the newly genotyped individuals (filled symbols with black outlines) sorted by genetic clusters (Steppe and Caucasus) and in chronological order (coloured bars indicate the relative archaeological dates, (b) white circles the mean calibrated radiocarbon date and the error bars the 2-sigma range). (c) ADMIXTURE results of relevant prehistoric individuals mentioned in the text (filled symbols) and (d) shows these projected onto a PCA of 84 modern-day West Eurasian populations (open symbols).

Other prehistoric expansions associated with ancestry and/or haplogroups

There are many examples by now of how prehistoric ethnolinguistic expansions are usually accompanied by haplogroup expansion and reduction in variability, with or without a common ancestral component.

Ancestry and/or Haplogroup are not language

I am tired of saying that genetics ≠ people, and especially that genetics ≠ language. I became convinced that this needed to be said more often after the wrong identification of ‘Yamnaya’ ancestry, now ‘steppe’ ancestry, with Indo-European.

A still more absurd identification, heir of the 2000s, is that of haplogroup = people, and even haplogroup = language.

NOTE. I guess the main culprits were Cavalli-Sforza and the first geneticists. But certainly some of the worst are some blogs and websites born in the 2000s, which still today misinform their readers.

This haplogroup/language continuity association trend is not new. You may think that R1b/Indo-European is another one. Then check again what I claimed about the R1b-L23/Indo-European association.

On the other hand, there are many wrong associations with haplogroups (including R1b). Let’s check some of them:

  1. Expanding R1a-M417 and especially R1a-Z645 lineages with steppe ancestry did not speak Late Indo-European. In fact, they can trace their origin to the Dnieper-Dniester area as early Uralic speakers.
     NOTE. I feel a special responsibility for helping this misconception, so I wrote this dedicated series about it.
  2. Expanding R1a-Z93 lineages in Central Asia during the Middle and Late Bronze Age represent expanding Proto-Indo-Aryan and Proto-Iranian speakers with a steppe and Central Asian origin, no matter what modern genetic composition their linguistic descendants have today, or even in historic times.
  3. R1b-P312 lineages expanded to Iberia with East Bell Beakers. That is, 90% of modern Basque speakers can trace their paternal ancestry to North-West Indo-European-speaking Yamna settlers in Hungary ca. 2900-2600 BC. David Reich and Iosif Lazaridis have confirmed the role of R1b-L23 subclades in the expansion of East Bell Beakers to Iberia, however small their share of “steppe ancestry” actually is.
  4. N1c lineages and Siberian ancestry (if in fact they spread together) did not accompany the expansion of Uralic; they spread quite late, and in successive waves, into North-East European peoples already speaking early Uralic dialects. This is clear from the described Indo-Uralic community (or close Proto-Uralic – Proto-Indo-European and Proto-Anatolian contacts), from Uralic – North-West Indo-European contacts, from Finno-Ugric contacts with early Proto-Indo-Iranian, and from Proto-Finnic contacts with Palaeo-Germanic. The homogeneous genetic landscape we see in Finland is due to known genetic bottlenecks.
  5. Haplogroups G2a and Middle East farmer ancestry; J and CHG ancestry; I1 and SHG ancestry; or I2 and WHG/EEF ancestry. These are not Proto-Indo-European markers, either. Each subclade may need to be explored individually to see which culture or people’s expansion they may have been associated with.
  6. R1b-L23 (probably mainly Z2103) lineages represent the initial expansion of Balkan Indo-European, later seen as Mycenaean, Illyrian, Phrygian, Dacian, Thracian, Armenian, etc., even though the majority of their attested speakers (even the earliest ones) will probably not be of these lineages, and will have more CHG or EEF than steppe ancestry.
  7. Haplogroups I, R1a, or R1b-U106 seen in proto-historic Germanic tribes do not represent the original Germanic community: they are the main successful lineages that expanded with the Germanic migrations 2,000 years after the formation of the Pre-Germanic Nordic Late Neolithic. The initial community was formed by Bell Beakers of R1b-L23 lineages in Scandinavia – probably mainly a few expanding clans of early U106 lineages, which may not correspond to those that expanded Germanic later. Later R1b-U106 lineages may have actually expanded from the Northern Lowlands.
  8. Most early R1b-L21 and R1b-DF27 lineages found in Western Europe do not represent any modern Indo-European language. Expansions of Celtic and Italic languages are unrelated to most of their subclades.
  9. Only some R1b-U152 lineages probably represent the initial Italo-Celtic expansions from Urnfield/Hallstatt, even though proto-historic Celtic and Italic speakers probably had as little from them as Mycenaeans had of R1b-Z2103, and most people of R1b-U152 lineage today don’t speak Celtic languages (and those who do probably wouldn’t be able to find continuity in their Celtic languages).

As a conclusion, 99.99999% (if not all) subclades of the most common Eurasian haplogroups today (whether R1b, R1a, N1c, I1, I2, E, J, G2a, R2, L, etc.) do not show any continuity with ancestral languages, their history being full of episodes of language and population replacement everywhere.

Ethnolinguistic continuity dreams are therefore stupid. There is no other word for it.