The Pazyryk culture spoke a “Uralic-Altaic” language… because haplogroup N

Matrilineal and patrilineal genetic continuity of two iron age individuals from a Pazyryk culture burial, by Tikhonov, Gurkan, Peler, & Dyakonov, Int J Hum Genet (2019).

Relevant excerpts (emphasis mine):

Of particular interest to the current study are the archaeogenetic investigations associated with the exemplary mound 1 from the Ak-Alakha-1 site on the Ukok Plateau in the Altai Republic (Polosmak 1994a; Pilipenko et al. 2015). This typical Pazyryk “frozen grave” was dated around 2268±39 years before present (Bln-4977) (Gersdorff and Parzinger 2000). Initial anthropological findings suggested an undisturbed dual inhumation comprising “a middle-aged European-type man” and “a young European-type woman”, both of whom presumably had a high social status among the Pazyryk elite (Polosmak 1994a). In contrast, recent archaeogenetic investigations revealed somewhat contradicting results since analyses at both the amelogenin gene and Y-chromosome short tandem repeat (Y-STR) loci clearly established that both Scythians were actually males and had paternal and maternal lineages that are typically associated with eastern Eurasians (Pilipenko et al. 2015). Through the use of mitochondrial, autosomal and Y-chromosomal DNA typing systems, it was possible to not only investigate the potential relationships between the two ancient Scythians but also to gather initial phylogenetic and phylogeographic information on their paternal and maternal lineages (Pilipenko et al. 2015).

Based on the Y-STR data available, the two Ak-Alakha-1 Scythians had an in silico haplogroup assignment of N, which first appeared in southeastern Asia and then expanded in southern Siberia (Rootsi et al. 2007; Pilipenko et al. 2015).

Current study aims to investigate the geographical distributions of the ancient and contemporary matches and close genetic variants of the maternal and paternal lineages observed in the two Scythians from the exemplary Ak-Alakha-1 kurgan.

Geographic distribution of the exact matches with the Scythian (PZ1) Y-STR (17-loci) and mtDNA (HVR1) haplotypes detailed in Tables 1a and 1b. Boundaries of the Altai Republic within the Russian Federation are shown with dashed lines, along with an approximate position of the Ak-Alakha-1 burial site, which is denoted with an ‘x’ on the map. Countries shaded in gray refer to those that have full 17-loci Y-STR and/or mtDNA HVR1 match(es) with the PZ1 haplotypes. Inset in the top and bottom left corners are the Altai and Uzbekistan maps, respectively, both scaled-up to allow better representation of the samples derived from these countries. There were no other exact matches from around parts of the globe that are not shown on the map, except for a single contemporary mtDNA haplotype from US, which presumably belonged to an ‘East Asian’ individual. Inset in the top right corner provides a scale for the number of haplotypes observed, but only up to three samples, which is valid for the entire map as well as the inset maps, irrespective of the differences in the scales of the actual map and inset maps themselves. For sample pools larger than three, the same linear scale provided on the inset in the top right corner still applies; please refer to Tables 1a and b for actual sample pool sizes. Samples are depicted on the entire map and the insets maps with circles and diamonds for the Y-STR and mtDNA haplotypes, respectively. Black and white coloring for samples depict whether the haplotype(s) are contemporary or ancient, respectively. Location of the PZ1 mtDNA and Y-STR haplotypes are shown on top of each other.

In response to aggressive Xiongnu expansion into the Altai region around the 2nd century BCE, some members of the Pazyryk culture may have started moving up North, and eventually reached the Vilyuy River at the beginning of 1st century CE. Notably, there is clear population continuity between the Uralic people such as Khants, Mansis and Nganasans, Paleo-Siberian people such as Yukaghirs and Chuvantsi, and the Pazyryk people even when considering just the two mtDNA and Y-STR haplotypes from the Ak-Alakha-1 mound 1 kurgan (Tables 1a, b, Table 2, Fig. 1). These concepts are also in agreement with the famous Yakut ethnographer Ksenofontov, who suggested that technologies associated with ferrous metallurgy were brought to the Vilyuy Valley at around 1st century CE by the first (proto)Turkic-speaking pioneers (Ksenofontov 1992). Yakut ethnogenesis per se possibly involved two major stages, the first being the proto-Turkic epoch through the arrival of Scytho-Siberian culture originating from Southern Siberia, such as that associated with the Pazyryk culture and the second being the proper Turkic epoch.

Nomadic peoples from the Central Asian steppes are East Iranian speakers whenever they are of haplogroup R1a, but “Uralic-Altaic” speakers whenever they are of haplogroup N. True story.

So they took a haplogroup ca. 37,000 years old, found in a sample dated to some 2,300 years ago, whose precise subclade and ancient history are (as yet) unknown, compared it to present-day populations, and concluded that they spoke “Uralic-Altaic” because haplogroup N and continuity. Sound familiar? Yep, it’s the kind of reasoning you might be reading right now about Iberian Bell Beakers, about Bell Beakers, or even about Yamna and their relationship to a Vasconic-Caucasian language, based on haplogroup R1b in modern Basques. Another true story.

Anyway, considering the multi-ethnic federations created during this time, and the ancestral components visible in the different groups (see a post on Karasuk by Chad Rohlfsen), the language of the Pazyryk culture is unknown, and it could in fact be (apart from the obvious East Iranian connection):

We also know that haplogroup N and Siberian ancestry expanded into cultures of Northern Eurasia precisely with the creation of the new social paradigm of chiefdoms and alliances, roughly at the same time as Scythians expanded, with the first sample of haplogroup N in Hungary appearing with Cimmerians.

Map of archaeological cultures in north-eastern Europe ca. 8th-3rd centuries BC. [The Mid-Volga Akozino group not depicted] Shaded area represents the Ananino cultural-historical society. Fading purple arrows represent likely stepped movements of subclades of haplogroup N over centuries (e.g. Siberian → Ananino → Akozino → Fennoscandia [N-VL29]; Circum-Arctic → forest-steppe [N1, N2]; etc.). Blue arrows represent eventual expansions of Uralic peoples to the north. Modified image from Vasilyev (2002).

While the study of modern populations is interesting, my problem with the paper is its reasoning (“language of ancient haplogroups based on modern populations”), and especially the concept of “Uralic-Altaic”, the highly hypothetical “Proto-Turkic” nomadic steppe pastoralists before “Hunnic Turkic” (which is itself questionable), before the “real Turkic” layer (the authors apparently being Turkic themselves), and the supposed “continuity” of Eastern Uralic and Turkic groups in Asia since the Out of Africa migration. The combination of all of this in the same text is just disturbing.

Looking on the bright side, at least these samples were not of haplogroup R1a-Z280, or we would be talking about great Slavonic Scythians showing continuity from Russia with love, as the paper threatened to do in its introduction…

If you are enjoying the comeback of this retro 2000s comedy in 2019 (based on the classic nativist “R1a=IE”, “R1b=Basque”, and “N=Uralic” combo) it’s because you – like me – are putting yourself in this guy’s shoes every time a new episode of funny self-destruction appears:



A Game of Thrones in Indo-European: proto-languages in Westeros and Essos, and population genomics


I think proto-languages can be applied to basically any appropriate prehistoric setting, and especially to science fiction and fantasy settings. I have often attributed the lack of interest in them to the idea that they are not fantastic enough, that they would render a fantastic world too realistic to allow for an adequate immersion of the reader (or viewer) into a new world.

With time, I have become more and more convinced that most authors don’t use proto-languages (or tweaked versions of them) simply because they can’t, and resort to the easier way: inventing some rules and words based on some basic ideas and sounds they feel would fit a certain culture or people, to get going. After all, world-building is about a good enough, not too detailed description, and books are about characters and settings, not worlds.

After the end of the 7th season of the Game of Thrones TV series, of which I have become a great fan, I had some season finale grief to deal with, so I thought about applying what we knew about Proto-Indo-Europeans to the fantasy world. Since all book translations treat English names as if they were translations from the Common Tongue (e.g. Spanish “Invernalia” or “Poniente” for “Winterfell” or “Westeros”), the idea of a translation into Proto-Indo-European seemed quite interesting.

NOTE. I understand that some people, believing that “the original language is the best”, would reject this. However, just take into account the millions who enjoy the books and the TV series only in their native language, and know nothing about the ‘original’ version…

Here are the text and images:

A Dance with Old Tongues

As you can see, the idea of the Common Tongue being Late Proto-Indo-European brings about a whole new (infinite) world of dialectal evolution, language contacts, and population expansions which must be established for the whole setting to work. This is what the text I began to write was about: to use languages (and related populations) of ca. 6000-1500 BC, and to avoid anachronisms and impossible language relationships.

As an added advantage, fans of role-playing games could expand their world with the use of the language correspondences and the maps. This way, instead of “Northern English” being spoken in the North and “Spanish English” in Dorne, following some choices that have naturally been criticized, you have ancient languages that fit with the ancient setting, and which were actually related to each other.

Equivalence of languages of the known world with coeval proto-languages. Solid red lines divide Graeco-Aryan from Northern Indo-European dialects (Tocharian is separated from North-West Indo-European by a dotted red line). See all maps.

I also began drawing a fantasy map, my first one (even though I have been a member of the Cartographers’ Guild for years), which eventually helped me with my updates of maps of prehistoric migrations, and even with the use of arrows and colors for scientific publications. I drew details mainly to illustrate the text, not to offer a comprehensive translated world. Most of the work was done in the summer of 2017, with some map changes done in 2018 with the help of the maps and works of fans.

NOTE. I have reviewed it during some long trips recently, and included names of “bloodlines” (i.e. haplogroups), which I find more interesting today for people to understand bottlenecks during prehistoric migrations; I have also added a map using pie charts. If this doesn’t fit well with the whole picture, it’s because it’s a recent addition. The rest is more or less the same as one or two years ago.

I don’t have time now to correct much of what I wrote. I have forgotten most of the relevant details from the books, especially The World of Ice & Fire, which I think helped me a lot with this, and I am sure that after writing A Song of Sheep and Horses (now you know where the book names come from) I would deal with some language identification and cognates differently.

I decided to publish it to liven up our Facebook page of Modern Indo-European now that the 8th season is near, so that people can participate and try to translate (translatable) names and expressions into Proto-Indo-European, to see how it would work out. You can also request access to our Modern Indo-European and Proto-Indo-European groups; both are administered mainly by Fernando.

If you think this whole idea is crazy, or a huge waste of time, I agree; this is how you waste your time when you like fantasy, comic books, etc. But I am a great fan of fantasy and fiction, and I had a lot of free time back then, so I couldn’t help it…

On the other hand, if you feel that mixing fantasy (or SF) with the Proto-Indo-European question (especially population genomics) is a bad idea, I might have agreed with that two years ago, and maybe this is the reason why I hesitated to publish it then.

However, today we can read a whole new (2018 and 2019) bunch of “steppe ancestry=Indo-European” fantasies: invisible Nganasan reindeer hordes, a Fearsome Tisza River where Yamna settlers mysteriously disappear, shapeshifting Dutch CWC peoples who change haplogroups, languages dependent on cephalic types, or Yamna/Bell Beaker expanding Vasconic… So what’s the matter with some more fantasy?

ASoSaH Reread (II): Y-DNA haplogroups among Uralians (apart from R1a-M417)


This is mainly a reread of Book Two: A Game of Clans of the series A Song of Sheep and Horses, chapters iii.5. Early Indo-Europeans and Uralians, iv.3. Early Uralians, v.6. Late Uralians and vi.3. Disintegrating Uralians.

“Sredni Stog”

While the true source of R1a-M417 – the main haplogroup eventually associated with Corded Ware, and thus Uralic speakers – is still not known with precision, due to the lack of R1a-M198 in ancient samples, we already know that the Pontic-Caspian steppes were probably not it.

We have many more samples from the north Pontic area since the Mesolithic than from the Volga-Ural territory, and there is a clear prevalence of I2a-M223 lineages in the forest-steppe area, mixed with R1b-V88 (possibly a back-migration from south-eastern Europe).

R1a-M459 (xR1a-M198) lineages appear from the Mesolithic to the Chalcolithic scattered from the Baltic to the Caucasus, from the Dniester to Samara, in a situation similar to haplogroups Q1a-M25 and R1b-L754, which supports the idea that R1a, Q1a, and R1b expanded with ANE ancestry, possibly in different waves since the Epipalaeolithic, and formed the known ANE:EHG:WHG cline.

Y-DNA samples from Khvalynsk and neighbouring cultures. See full version.

The first confirmed R1a-M417 sample comes from Alexandria, roughly coinciding with the so-called steppe hiatus. Its emergence in the area of the previous “early Sredni Stog” groups (see the mess of the traditional interpretation of the north Pontic groups as “Sredni Stog”) and its later expansion with Corded Ware supports Kristiansen’s interpretation that Corded Ware emerged from the Dnieper-Dniester corridor, although samples from the area up to ca. 4000 BC, including the few Middle Eneolithic samples available, show continuity of hg. I2a-M223 and typical Ukraine Neolithic ancestry.

NOTE. The further subclade R1a-Z93 (Y26) reported for the sample from Alexandria seems too early, given the confidence interval for its formation (ca. 3500-2500 BC); even R1a-Z645 could be too early. Like the attribution of the R1b-L754 from Khvalynsk to R1b-V1636 (after being previously classified as of Pre-V88 and M73 subclade), it seems reasonable to take these SNP calls with a pinch of salt, especially because Yleaf (designed to look for the furthest subclade possible) does not confirm for them any subclade beyond R1a-M417 and R1b-L754, respectively.

The sudden appearance of “steppe ancestry” in the region, with the high variability shown by Ukraine_Eneolithic samples, suggests that this is due to recent admixture of incoming foreign peoples (of Ukraine Neolithic / Comb Ware ancestry) with Novodanilovka settlers.

The most likely origin of this population, taking into account the most common population movements in the area since the Neolithic, is the infiltration of (mainly) hunter-gatherers from the forest areas. That would confirm the traditional interpretation of the origin of Uralic speakers in the forest zone, although the nature of Pontic-Caspian settlers as hunter-gatherers rather than herders makes this identification today fully unnecessary (see here).

EDIT (3 FEB 2019): As for the most common guesstimates for Proto-Uralic, roughly coinciding with the expansion of this late Sredni Stog community (ca. 4000 BC), you can read the recent post by J. Pystynen in Freelance Reconstruction, Probing the roots of Samoyedic.

Late Sredni Stog admixture shows variability typical of recent admixture of forest-steppe peoples with a steppe-like population. See full version here.

NOTE. Although my initial simplistic interpretation (of early 2017) of Comb Ware peoples – traditionally identified as Uralic speakers – potentially showing steppe ancestry was probably wrong, it seems that peoples from the forest zone – related to Comb Ware or neighbouring groups like Lublin-Volhynia – reached forest-steppe areas to the south and eventually expanded steppe ancestry into east-central Europe through the Volhynian Upland to the Polish Upland, during the late Trypillian disintegration (see a full account of the complex interactions of the Final Eneolithic).

The most interesting aspect of ascertaining the origin of R1a-M417, given its prevalence among Uralic speakers, is to precisely locate the origin of contacts between Late Proto-Indo-European and Proto-Uralic. These contacts were traditionally considered the consequence of contacts between the Middle and Upper Volga regions, but the most recent archaeological research and data from ancient DNA samples have made it clear that Corded Ware is the most likely vector of expansion of Uralic languages; hence the contacts of Indo-Europeans of the Volga-Ural region with Uralians have to be looked for in neighbours of the north Pontic area.

Sredni Stog – Repin contacts representing Uralic – Late Indo-European contacts were probably concentrated around the Don River.

My bet – rather obvious today – is that the Don River area is the source of the earliest borrowings of Late Uralic from Late Indo-European (i.e. post-Indo-Anatolian). The borrowing of the Late PIE word for ‘horse’ is particularly interesting in this regard. Later contacts (after the loss of the initial laryngeal) may be attributed to the traditionally depicted Corded Ware – Yamna contact zone in the Dnieper-Dniester area.

NOTE. While the finding of R1a-M417 populations neighbouring R1b-L23 in the Don-Volga interfluve would be great to confirm these contacts, I don’t know if the current pace of more and more published samples will continue. The information we have right now, in my opinion, suffices to support close contacts of neighbouring Indo-Europeans and Uralians in the Pontic-Caspian area during the Late Eneolithic.

Classical Corded Ware

After some complex movements of TRB, late Trypillia and GAC peoples, Corded Ware apparently emerged in central-east Europe, under the influence of different cultures and from a population that probably (at least partially) stemmed from the north Pontic forest-steppe area.

Single Grave and central Corded Ware groups – showing some of the earliest available dates (emerging likely ca. 3000/2900 BC) – are as varied in their haplogroups as would be expected from a sink (which does not in the least resemble the Volga-Ural population):

Interesting is the presence of R1b-L754 in Obłaczkowo, potentially of R1b-V88 subclade, as previously found in two Central European individuals from Blätterhöhle MN (ca. 3650 and 3200 BC), and in the Iron Gates and north Pontic areas.

Haplogroups I2a and G have also been reported in early samples, all potentially related to the supposed Corded Ware central-east European homeland, likely in southern Poland, a region naturally connected to the north Pontic forest-steppe area and to the expansion of Neolithic groups.

Y-DNA samples from early Corded Ware groups and neighbouring cultures. See full version.

The true bottlenecks under haplogroup R1a-Z645 seem to have happened only during the migration of Corded Ware to the east: to the north into the Battle Axe culture, mainly under R1a-Z282, and to the south into Middle Dnieper – Fatyanovo-Balanovo – Abashevo, probably eventually under R1a-Z93.

This separation is in line with their reported TMRCA, and supports the split of Finno-Permic from an eastern Uralic group (Ugric and Samoyedic), which nevertheless remained in contact through the Russian forest zone, allowing for the spread of Indo-Iranian loans.

In archaeology, this bottleneck also supports the expansion of a sort of unifying “Corded Ware A-horizon” spreading with people (disputed by Furholt), i.e. the disintegrating Uralians, and thus a source of further loanwords shared by all surviving Uralic languages.

Confirming this ‘concentrated’ Uralic expansion to the east is the presence of R1a-M417 (xR1a-Z645) lineages among early and late Single Grave groups in the west (which essentially disappeared after the Bell Beaker expansion), as well as the presence of these subclades in modern Central and Western Europeans. Central European groups thus became integrated in post-Bell Beaker European EBA cultures, and their Uralic dialect likely disappeared without a trace.

NOTE. The fate of R1b-L51 lineages (linked to North-West Indo-Europeans undergoing a bottleneck in the Yamna Hungary -> Bell Beaker migration to the west) is thus similar to that of haplogroup R1a-Z645 (linked to the expansion of Late Uralians to the east), hence proving the traditional interpretation of the language expansions as male-driven migrations. These are two of the most interesting pieces of genetic data we have to date to confirm previous language expansions and dialectal classifications.

It will be also interesting to see if known GAC and Corded Ware I2a-Y6098 subclades formed eventually part of the ancient Uralic groups in the east, apart from lineages which will no doubt appear among asbestos ware groups and probably hunter-gatherers from north-eastern Europe (see the recent study by Tambets et al. 2018).

Corded Ware ancestry marked the expansion of Uralians

Sadly, some brilliant minds decided in 2015 that the so-called “Yamnaya ancestry” (now more appropriately called “steppe ancestry”) should be associated with ‘Indo-Europeans’. This is causing the development of various new pet theories on the go, as more and more data contradicts this interpretation.

There is a clear long-lasting cultural, populational, and natural barrier between Yamna and Corded Ware: they are derived from different ancestral populations, which show clearly different ancestry and ancestry evolution (although they did converge to some extent), as well as different Y-DNA bottlenecks; they show different cultures, including those of preceding and succeeding groups, and evolved in different ecological niches. The only true steppe pastoralists who managed to dominate over grasslands extending from the Upper Danube to the Altai were Yamna peoples and their cultural successors.

Corded Ware admixture typical of expanding late Sredni Stog-like populations from the forest-steppe. See full version here.

NOTE. You can also read two recent posts by FrankN in the blog aDNA era, with detailed information on the Pontic-Caspian cultures and the formation of “steppe ancestry” during the Palaeolithic, Mesolithic and Neolithic: How did CHG get into Steppe_EMBA? Part 1: LGM to Early Holocene and How did CHG get into Steppe_EMBA? Part 2: The Pottery Neolithic. Unlike your typical amateur blogger on genetics using few statistical comparisons coupled with ‘archaeolinguoracial mumbo jumbo’ to reach unscientific conclusions, these are obviously carefully written texts which deserve to be read.

I will not enter into the discussion of “steppe ancestry” and the mythical “Siberian ancestry” for this post, though. I will just repost the opinion of Volker Heyd – an archaeologist specialized in Yamna Hungary and Bell Beakers who is working with actual geneticists – on the early conclusions based on “steppe ancestry”:

[A]rchaeologist Volker Heyd at the University of Bristol, UK, disagreed, not with the conclusion that people moved west from the steppe, but with how their genetic signatures were conflated with complex cultural expressions. Corded Ware and Yamnaya burials are more different than they are similar, and there is evidence of cultural exchange, at least, between the Russian steppe and regions west that predate Yamnaya culture, he says. None of these facts negates the conclusions of the genetics papers, but they underscore the insufficiency of the articles in addressing the questions that archaeologists are interested in, he argued. “While I have no doubt they are basically right, it is the complexity of the past that is not reflected,” Heyd wrote, before issuing a call to arms. “Instead of letting geneticists determine the agenda and set the message, we should teach them about complexity in past human actions.”


Happy new year 2019…and enjoy our new books!


Sorry for the past few weeks of silence; I have been rather busy lately. I have more projects going on, and (because of that) I also wanted to finish a project I have been working on for many months already.

I have therefore decided to publish a provisional version of the text, in the hope that it will be useful in the following months, when I won’t be able to update it as often as I would like to:

EDIT (20 JAN 2019): For those of you who are more comfortable reading in your native language, I have placed some links to automatic translations by Google Translate. They might work especially well for the texts of A Game of Clans & A Clash of Chiefs.

Don’t forget to check out the maps included in the supplementary materials: I have added Y-DNA, mtDNA, and ADMIXTURE data using GIS software. The PCA graphics are also important to follow the main text.

NOTE. The files were initially only on my server; I have since uploaded them to and ResearchGate, in case the websites are too slow.

I would have preferred to wait for a thorough revision of the section on archaeology and the linguistic sections on Uralic, but I doubt I will have time when the reviews come, so it was either now or maybe next December…

I say so in the introduction, but it is evident that certain aspects of the book are tentative to say the least: the farther back we go from Late Proto-Indo-European, the less clear many aspects become. Also, linguistically I am not convinced about Eurasiatic or Nostratic, although they do have a certain interest when we try to offer a comprehensive view of the past, including ethnolinguistic identities.

I cannot be an expert in everything, and these books cover a lot. I am bound to publish many corrections as new information appears and more reviews are sent. For example, just days ago (before SNP calls of Wang et al. 2018 were published) some paragraphs implied that AME might have expanded Nostratic from the Middle East. Now it does not seem so, and I changed them just before uploading the text. That’s how tentative certain routes are, and how much all of this may change. And that only if we accept a Nostratic phylum…

NOTE. Since the first book I wrote was the linguistic one, and I have spent the last months updating the archaeology + genetics part, now many of you will probably understand 1) why I am so convinced about certain language relationships and 2) how I used many posts to clarify certain ideas and receive comments. Many posts offer probably a good timeline of what I worked with, and when.


I did not add this section to the books, because they are still not ready for print, but I think it is overdue by now. It is impossible to reference all who have directly or indirectly contributed to this, so this is a list of those I feel have played an important role.

I am indebted to the following people (which does not mean that they share my views, obviously):

First and foremost, to Fernando López-Menchero, for having the patience to review in detail many parts on Indo-European linguistics, knowing that I won’t accept many of his comments anyway. The additional information he offers is invaluable, but I didn’t want to turn this into a huge linguistic encyclopaedia with unending discussions of tiny details of each reconstructed word. I think it is already too big as it is.

I would not have thought about doing this if it were not for the interest of Wekwos (Xavier Delamarre) in publishing a full book about the Indo-European demic diffusion model (in the second half of 2017, I think). It was they who suggested that I extend the content, when all I had done until then was write an essay and draw some maps in my free time between depositing the PhD thesis and defending it.

Sadly, as much as I would like to publish a book with a professional publisher, I don’t think ancient DNA lends itself to the traditional format, so my requests (mainly to have free licenses and being able to review the text at will, as new genetic papers are published) were logically not acceptable. Also, the main aim of all volumes, especially the linguistic one, is the teaching of essentials of Late Proto-Indo-European and related languages, and this objective would be thwarted by selling each volume for $50-70 and only in printed format. I prefer a wider distribution.

At first I didn’t think much of this proposal, because I do not benefit from this kind of publications in my scientific field, but with time my interest in writing a whole, comprehensive book on the subject grew to the point where it was already an ongoing project, probably by the start of 2018.

I would not have been in contact with Wekwos if it were not for user Camulogène Rix at Anthrogenica, so thanks for that and for the interest in this work.

I would not have thought of writing this either if not for the spontaneous support (with an unexpected phone call!) of a professor of the Complutense University of Madrid, Ángel Gómez Moreno, who is interested in this subject – as is his wife, a professor of Classics more closely associated with Indo-European studies, and who helped me with a search for Indo-Europeanists.

EDIT (1 JAN 2019): I remembered that Karin Bojs sent me her book after reading the demic diffusion model. I may have also thought about writing a whole book back then, but mid-2017 is probably too early for the project.

Professor Kortlandt is still to review the text, but he contributed to both previous essays in some very interesting ways, so I hope he can help me improve the parts on Uralic, and maybe alternative accounts of expansion for Balto-Slavic, depending on the time depth that he would consider warranted according to the Temematic hypothesis.

The maps are evidently (for those who are interested in genetics) in part the result of the effort of the late Jean Manco: As you can see from the maps including Y-DNA and mtDNA samples, I have benefitted from her way of organising data and publishing it. Similarly, the work of Iain McDonald in assessing the potential migration routes of R1b and R1a in Europe with the help of detailed maps was behind my idea for the first maps, and consequently behind these, too.

I should thank all people responsible for the release of free datasets to work with, including the Reich and Jena labs, the Veeramah Lab, and also researchers from the Max Planck Institute or the Mainz Palaeogenetics group, who didn’t mind sharing their datasets with me.

Readers of this blog with interesting comments have also been essential for the improvement of the texts. You can probably see some of your many contributions there. I may not answer many comments, because I am always busy (and sometimes I just don’t have anything interesting to say), but I try to read all of them.

EDIT (1 JAN 2019): I think I should mention at least Chetan, Egg, or Robert George; but then I would leave out old europe, Sgr Ganesh, or Tileman Ehlen; and if I include them I would leave out others…

Users of other sites, like Anthrogenica, have also helped: their particular points of view and deep knowledge of some very specific aspects are sometimes very useful. In particular, user Anglesqueville helped me fix some issues with the merging of datasets for the PCAs and ADMIXTURE, and prepared some individual samples for merging.

Even when I am not posting anything, Google Analytics keeps sending me messages about increasing user fidelity (returning users), and stats haven’t really changed (which probably means more people are reading old posts), so thank you for that.

I hope you enjoy the books.

Happy new year!

Genetic landscape and past admixture of modern Slovenians


Open access Genetic Landscape of Slovenians: Past Admixture and Natural Selection Pattern, by Maisano Delser et al. Front. Genet. (2018).

Interesting excerpts (emphasis mine):


Overall, 96 samples ranging from Slovenian littoral to Lower Styria were genotyped for 713,599 markers using the OmniExpress 24-V1 BeadChips (Figure 1), genetic data were obtained from Esko et al. (2013). After removing related individuals, 92 samples were left. The Slovenian dataset has been subsequently merged with the Human Origin dataset (Lazaridis et al., 2016) for a total of 2163 individuals.

Y chromosome

First, Y chromosome genetic diversity was assessed. A total of 52 Y chromosomes were analyzed for 195 SNPs. The majority of individuals (25, 48.1%) belong to the haplogroup R1a1a1a (R-M417) while the second major haplogroup is represented by R1b (R-M343) including 15 individuals (28.8%). Twelve samples are assigned to haplogroup I (I-M170): five and two samples belong to haplogroup I2a (I-L460) and I1 (I-M253), respectively, while the remaining five samples did not have enough information to be further assigned.

PCA of Slovenian samples with European populations (Slovenian_HO_EU dataset). For details regarding the populations included, see Supplementary Table 1.


Considering the unbalanced sample size of the Slovenian population compared to the other populations included in the dataset, a subset of 20 Slovenian individuals randomly sampled was used.

All Slovenian samples group together with Hungarians, Czechs, and some Croatians (“Central-Eastern European” cluster) as also suggested by the PCA. All Basque individuals with few French and Spanish cluster together (“Basque” cluster) while a “Northern-European” cluster is made of the majority of French, English, Icelanders, Norwegians, and Orcadians. Five populations contributed to the “Eastern-European” cluster including Belarusians, Estonians, Lithuanians, Mordovians, and Russians. Western and South Europe is split into two cluster: the first (“Western European” cluster) includes all Spanish individuals, few French, and some Italians (North Italy) while the second (“Southern-European” cluster) groups Sicilians, Greeks, some Croatians, Romanians, and some Italians (North Italy).

Admixture Pattern and Migration

Modified image, from the paper (Central-East Europeans marked). Unsupervised admixture analysis of Slovenians. Results for K = 5 are showed as it represents the lowest cross-validation error. Slovenian samples show an admixture pattern similar to the neighboring populations such as Croatians and Hungarians. The major ancestral components are: the blue one which is shared with Lithuanians and Russians, followed by the dark green one that is mostly present in Greek samples and the light blue which characterizes Orcadians and English. For population acronyms see Supplementary Table 1.

All Slovenian individuals share common pattern of genetic ancestry, as revealed by ADMIXTURE analysis. The three major ancestry components are the North East and North West European ones (light blue and dark blue, respectively, Figure 3), followed by a South European one (dark green, Figure 3). Contribution from the Sardinians and Basque are present in negligible amount. The admixture pattern of Slovenians mimics the one suggested by the neighboring Eastern European populations, but it is different from the pattern suggested by North Italian populations even though they are geographically close.

Using ALDER, the most significant admixture event was obtained with Russians and Sardinians as source populations and it happened 135 ± 9.31 generations ago (Z-score = 11.54). (…) When tested for multiple admixture events (MALDER), we obtained evidence for one admixture event 165.391 ± 17.1918 generations ago corresponding to ∼2620 BCE (CI: 3101–2139) considering a generation time of 28 years (Figure 4), with Kalmyk and Sardinians as sources.
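The MALDER estimate can be turned into a calendar date with simple arithmetic: multiply the generations by the generation time (28 years) to get years before present, then count back from a reference "present" year. The Python sketch below assumes a reference year of 2010 CE (the excerpt does not state one) and treats the interval as mean ± 1 SE; it handles the missing year zero between 1 BCE and 1 CE with astronomical year numbering. Under these assumptions it lands within a couple of years of the quoted ∼2620 BCE (3101–2139).

```python
# Convert an admixture date given in generations into an approximate BCE year.
# Assumptions of this sketch (not from the paper): the "present" is
# REFERENCE_YEAR_CE, and the interval is mean +/- 1 standard error.

REFERENCE_YEAR_CE = 2010  # hypothetical reference year


def generations_to_bce(generations: float, gen_time: float = 28.0,
                       reference_year: int = REFERENCE_YEAR_CE) -> float:
    """Return the BCE year of an event that happened `generations` ago."""
    years_ago = generations * gen_time
    # Astronomical numbering: year 0 = 1 BCE, year -1 = 2 BCE, and so on.
    astronomical_year = reference_year - years_ago
    if astronomical_year > 0:
        raise ValueError("date falls in the Common Era, not BCE")
    return 1 - astronomical_year


def dated_interval(mean_gen: float, se_gen: float, gen_time: float = 28.0):
    """Point estimate plus the older and younger ends of a +/- 1 SE interval."""
    point = generations_to_bce(mean_gen, gen_time)
    older = generations_to_bce(mean_gen + se_gen, gen_time)
    younger = generations_to_bce(mean_gen - se_gen, gen_time)
    return point, older, younger


if __name__ == "__main__":
    point, older, younger = dated_interval(165.391, 17.1918)
    # prints: ~2622 BCE (interval 3103-2141 BCE)
    print(f"~{point:.0f} BCE (interval {older:.0f}-{younger:.0f} BCE)")
```

Changing `reference_year` shifts every date by the same amount, so the sketch shows the arithmetic rather than the paper's exact dating convention.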

We then modeled the Slovenian population as target of admixture of ancient individuals from Haak et al. (2015) while computing the f3(Ancient 1, Ancient 2, Slovenian) statistic. The most significant signal was obtained with Yamnaya and HungaryGamba_EN (Z-score = -10.66), followed by MA1 with LBK_EN (Z-score -9.7) and Yamnaya with Stuttgart (Z-score = -8.6) used as possible source populations (Supplementary Figure 5).

We found a significant signal of admixture by using both pairs as ancient sources. Specifically, for the pair Yamnaya and Hungary_EN the admixture event is dated at 134.38 ± 23.69 generations ago (Z-score = 5.26, p-value of 1.5e-07) while for Yamnaya and LBK_EN at 153.65 ± 22.19 generations ago (Z-score = 6.92, p-value 4.4e-12). Outgroup f3 with Yamnaya put Slovenian population close to Hungarians, Czechs, and English, indicating a similar shared drift between these population with the Steppe populations (Supplementary Figure 6).
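The f3 test in these excerpts rests on a simple quantity. Below is a minimal Python sketch of the naive three-population statistic computed from per-SNP allele frequencies. Note that it takes the target first, f3(Target; A, B), whereas the excerpt writes the target last. The populations are hypothetical toy data, and the sketch omits the small-sample correction and the block jackknife that tools such as ADMIXTOOLS use to obtain Z-scores like those quoted above.

```python
import numpy as np


def f3(target, source_a, source_b) -> float:
    """Naive f3(Target; A, B): mean over SNPs of (t - a) * (t - b).

    Inputs are per-SNP allele frequencies in [0, 1]. A clearly negative value
    means the target's frequencies tend to fall between those of A and B,
    which is the signature of admixture between populations related to A and B.
    """
    t, a, b = (np.asarray(x, dtype=float) for x in (target, source_a, source_b))
    return float(np.mean((t - a) * (t - b)))


rng = np.random.default_rng(0)
n_snps = 10_000
# Hypothetical source populations (toy frequencies, not real genotypes).
freq_a = rng.uniform(0.05, 0.95, n_snps)
freq_b = np.clip(freq_a + rng.normal(0.0, 0.15, n_snps), 0.0, 1.0)
# Hypothetical target formed as a 50/50 mixture of A and B.
freq_mixed = 0.5 * freq_a + 0.5 * freq_b
# Hypothetical unadmixed population that simply drifted away from A.
freq_drifted = np.clip(freq_a + rng.normal(0.0, 0.10, n_snps), 0.0, 1.0)

print(f"admixed target: {f3(freq_mixed, freq_a, freq_b):+.4f}")    # negative
print(f"drifted target: {f3(freq_drifted, freq_a, freq_b):+.4f}")  # positive here
```

A 50/50 mix sits halfway between its sources at every SNP, so each term (t - a)(t - b) equals -(a - b)²/4 and the mixed target always scores negative; a population that merely drifted away from A does not.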

Admixture events identified with ALDER and MALDER. The gray dots represent significant admixture events detected with ALDER using Slovenians as target, the solid line represents the single admixture event detected using MALDER, dashed lines represent the confidence interval. Only the significant results after multiple testing correction are plotted. For ALDER results see Supplementary Table 5.

Not that any of this would come as a surprise, but:

  • R1a-M458 and some R1a-Z280 (xR1a-Z92) lineages (found among Slovenes) were associated with the Slavic expansion, likely with the Prague-Korchak culture, probably stemming originally from peoples of the Lusatian culture. Other R1a-Z280 lineages remained associated with Uralic peoples, and some became Slavicized only recently.
  • PCA continues to support the grouping of certain West, South, and East Slavs in a “Central-Eastern European” cluster, distinct from the “North-Eastern European” cluster formed by modern Finno-Ugrians, as well as by populations of north-eastern Europe of Finno-Ugric origin that were only recently Slavicized.
  • ADMIXTURE supports the same ancient ‘western’ (a core West+South+East Slavic) cluster, and the admixture event with Yamna + Hungary_EN is logically a proxy for Yamna Hungary being at the core of ancestral Central-East European population movements related to Bell Beakers in the mid- to late 3rd millennium BC.

The theory that East Slavs are at the core of the Slavic expansion makes no sense, in terms of archaeology (see Florin Curta’s dismissal of those recent eastern ‘Slavic’ finds, his commentary on 19th century Pan-Slavic crap, or his book on Slavic migrations), in terms of ancient DNA (the earliest Slavs sampled cluster with modern West Slavs, distant from the steppe cluster, unlike Finno-Ugrians), or in terms of modern DNA.

I don’t know where exactly today’s push for the theory of Russia as the cradle of the Slavs comes from (although there are some obvious political trends to revive 19th c. ideas), but it was always clear to everyone, including Russians, that East Slavs had migrated to the east and north and assimilated indigenous Finno-Ugrians, as well as Turkic-, Iranian-, and Caucasian-speaking peoples to the east. Genetics is only confirming what was clear from other disciplines long ago.


Minimal gene flow from western pastoralists in the Bronze Age eastern steppes


Open access paper Bronze Age population dynamics and the rise of dairy pastoralism on the eastern Eurasian steppe, by Jeong et al. PNAS (2018).

Interesting excerpts (emphasis mine):

To understand the population history and context of dairy pastoralism in the eastern Eurasian steppe, we applied genomic and proteomic analyses to individuals buried in Late Bronze Age (LBA) burial mounds associated with the Deer Stone-Khirigsuur Complex (DSKC) in northern Mongolia. To date, DSKC sites contain the clearest and most direct evidence for animal pastoralism in the Eastern steppe before ca. 1200 BCE.

Most LBA Khövsgöls are projected on top of modern Tuvinians or Altaians, who reside in neighboring regions. In comparison with other ancient individuals, they are also close to but slightly displaced from temporally earlier Neolithic and Early Bronze Age (EBA) populations from the Shamanka II cemetery (Shamanka_EN and Shamanka_EBA, respectively) from the Lake Baikal region. However, when Native Americans are added to PC calculation, we observe that LBA Khövsgöls are displaced from modern neighbors toward Native Americans along PC2, occupying a space not overlapping with any contemporary population. Such an upward shift on PC2 is also observed in the ancient Baikal populations from the Neolithic to EBA and in the Bronze Age individuals from the Altai associated with Okunevo and Karasuk cultures.

Image modified from the article. Karasuk cluster in green, closely related to sample ARS026 in red. Principal Component Analysis (PCA) of selected 2,077 contemporary Eurasians belonging to 149 groups. Contemporary individuals are plotted using three-letter abbreviations for operational group IDs. Group IDs color coded by geographic region. Ancient Khövsgöl individuals and other selected ancient groups are represented on the plot by filled shapes. Ancient individuals are projected onto the PC space using the “lsqproject: YES” option in the smartpca program to minimize the impact of high genotype missing rate.

(…) two individuals fall on the PC space markedly separated from the others: ARS017 is placed close to ancient and modern northeast Asians, such as early Neolithic individuals from the Devil’s Gate archaeological site (22) and present-day Nivhs from the Russian far east, while ARS026 falls midway between the main cluster and western Eurasians.

Upper Paleolithic Siberians from nearby Afontova Gora and Mal’ta archaeological sites (AG3 and MA-1, respectively) (25, 26) have the highest extra affinity with the main cluster compared with other groups, including the eastern outlier ARS017, the early Neolithic Shamanka_EN, and present-day Nganasans and Tuvinians (Z > 6.7 SE for AG3). Main cluster Khövsgöl individuals mostly belong to Siberian mitochondrial (A, B, C, D, and G) and Y (all Q1a but one N1c1a) haplogroups.

The genetic affinity of the Khövsgöl clusters measured by outgroup-f3 and -f4 statistics. (A) The top 20 populations sharing the highest amount of genetic drift with the Khövsgöl main cluster measured by f3(Mbuti; Khövsgöl, X). (B) The top 15 populations with the most extra affinity with each of the three Khövsgöl clusters in contrast to Tuvinian (for the main cluster) or to the main cluster (for the two outliers), measured by f4(Mbuti, X; Tuvinian/Khövsgöl, Khövsgöl/ARS017/ARS026). Ancient and contemporary groups are marked by squares and circles, respectively. Darker shades represent a larger f4 statistic.
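Panel (A) ranks populations by outgroup-f3, a measure of shared drift. A minimal Python sketch of the naive statistic and of such a ranking follows; the populations are hypothetical toy data (the outgroup only plays the role of Mbuti), and the sketch omits the bias correction and block-jackknife standard errors that real analyses report.

```python
import numpy as np


def outgroup_f3(outgroup, pop_a, pop_b) -> float:
    """Naive outgroup-f3(O; A, B): mean over SNPs of (o - a) * (o - b).

    Larger values mean A and B share more genetic drift since their common
    split from the outgroup O.
    """
    o, a, b = (np.asarray(x, dtype=float) for x in (outgroup, pop_a, pop_b))
    return float(np.mean((o - a) * (o - b)))


def rank_by_shared_drift(outgroup, target, candidates: dict) -> list:
    """Return (name, outgroup-f3) pairs sorted from most to least shared drift."""
    scores = {name: outgroup_f3(outgroup, target, freqs)
              for name, freqs in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


rng = np.random.default_rng(1)
n_snps = 20_000


def drift(freqs, sd):
    # Toy drift model: add Gaussian noise and keep frequencies in [0, 1].
    return np.clip(freqs + rng.normal(0.0, sd, n_snps), 0.0, 1.0)


root = rng.uniform(0.1, 0.9, n_snps)  # hypothetical ancestral frequencies
outgroup = drift(root, 0.05)          # plays the role of Mbuti
shared = drift(root, 0.10)            # drift shared by "target" and "close"
target = drift(shared, 0.05)
close = drift(shared, 0.05)
distant = drift(root, 0.10)           # drifted independently from the root

for name, score in rank_by_shared_drift(outgroup, target,
                                        {"close": close, "distant": distant}):
    print(f"{name}: {score:.4f}")
```

The "close" population shares the `shared` drift branch with the target, so it ranks first; only the outgroup's own drift contributes to the "distant" score.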

Previous studies show a close genetic relationship between WSH populations and ANE ancestry, as Yamnaya and Afanasievo are modeled as a roughly equal mixture of early Holocene Iranian/ Caucasus ancestry (IRC) and Mesolithic Eastern European hunter-gatherers, the latter of which derive a large fraction of their ancestry from ANE. It is therefore important to pinpoint the source of ANE-related ancestry in the Khövsgöl gene pool: that is, whether it derives from a pre-Bronze Age ANE population (such as the one represented by AG3) or from a Bronze Age WSH population that has both ANE and IRC ancestry.

The amount of WSH contribution remains small (e.g., 6.4 ± 1.0% from Sintashta). Assuming that the early Neolithic populations of the Khövsgöl region resembled those of the nearby Baikal region, we conclude that the Khövsgöl main cluster obtained ∼11% of their ancestry from an ANE source during the Neolithic period and a much smaller contribution of WSH ancestry (4–7%) beginning in the early Bronze Age.

Admixture modeling of Altai populations and the Khövsgöl main cluster using qpAdm. For the archaeological populations, (A) Shamanka_EBA and (B and C) Khövsgöl, each colored block represents the proportion of ancestry derived from a corresponding ancestry source in the legend. Error bars show 1 SE. (A) Shamanka_EBA is modeled as a mixture of Shamanka_EN and AG3. The Khövsgöl main cluster is modeled as (B) a two-way admixture of Shamanka_EBA+Sintashta and (C) a three-way admixture Shamanka_EN+AG3+Sintashta.
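The two-way model in panel (B) can be illustrated, very loosely, with a least-squares fit on allele frequencies. This is a swapped-in toy technique, not qpAdm: qpAdm works with f4-statistics against a set of outgroup populations and reports standard errors and p-values, none of which is reproduced here. The source names below are hypothetical stand-ins, and the 6.4% mixing value is only borrowed from the excerpt to build the toy target.

```python
import numpy as np


def two_source_proportion(target, source_a, source_b) -> float:
    """Least-squares alpha for target ~= alpha * A + (1 - alpha) * B.

    Rearranged as (target - B) ~= alpha * (A - B), the closed-form solution is
    alpha = sum((t - b) * (a - b)) / sum((a - b) ** 2), clipped to [0, 1].
    Toy allele-frequency fit only: this is NOT qpAdm.
    """
    t, a, b = (np.asarray(x, dtype=float) for x in (target, source_a, source_b))
    diff = a - b
    alpha = np.sum((t - b) * diff) / np.sum(diff * diff)
    return float(np.clip(alpha, 0.0, 1.0))


rng = np.random.default_rng(2)
n_snps = 50_000
# Hypothetical stand-ins: "steppe_like" for a WSH source, "baikal_like" for a local one.
steppe_like = rng.uniform(0.05, 0.95, n_snps)
baikal_like = rng.uniform(0.05, 0.95, n_snps)
true_alpha = 0.064  # toy value borrowed from the ~6.4% quoted in the excerpt
target = true_alpha * steppe_like + (1 - true_alpha) * baikal_like
target = np.clip(target + rng.normal(0.0, 0.02, n_snps), 0.0, 1.0)  # toy noise

estimate = two_source_proportion(target, steppe_like, baikal_like)
print(f"estimated steppe-like share: {estimate:.3f}")
```

With many SNPs and independent noise, the fit recovers the planted proportion closely; real ancestry modelling has to cope with sources that are only proxies, which is why methods like qpAdm test model fit explicitly.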

Apparently, then, the first individual with substantial WSH ancestry in the Khövsgöl population (ARS026, of haplogroup R1a-Z2123), directly dated to 1130–900 BC, is consistent with the first appearance of admixed forest-steppe-related populations like Karasuk (ca. 1200-800 BC) in the Altai. Interestingly, haplogroup N1a1a-M178 pops up (with mtDNA U5a2d1) among the earlier Khövsgöl samples.

I will repeat what I wrote recently here: Samoyedic arrived in the Altai with Karasuk and hg R1a-Z645 + Steppe_MLBA-like ancestry, and admixed with Altai populations, thus clustering within an Ancient Altai cline. Only later did N1a1a subclades infiltrate Samoyedic (and Ugric) populations, bringing them closer to their modern Palaeo-Siberian cline. The shared mtDNA may support an ancestral EHG-“Siberian” cline, or else a more recent Afanasevo-related origin.

Modified image from Jeong et al. (2018), supplementary materials. The first two PCs summarizing the genetic structure within 2,077 Eurasian individuals. The two PCs generally mirror geography. PC1 separates western and eastern Eurasian populations, with many inner Eurasians in the middle. PC2 separates eastern Eurasians along the north-south cline and also separates Europeans from West Asians. Ancient individuals (color-filled shapes), including two Botai individuals, are projected onto PCs calculated from present-day individuals.

Also interesting: Q1a2 subclades and ANE ancestry are making their appearance everywhere among ancestral Eurasian peoples, as Chetan recently pointed out.


Mongolian tribes cluster with East Asians, closely related to the Japanese


New paper behind paywall Whole-genome sequencing of 175 Mongolians uncovers population-specific genetic architecture and gene flow throughout North and East Asia, by Bai et al. Nature Genetics (2018).

Interesting excerpts (emphasis mine):

Genome sequencing, variant calling, and construction of the Mongolian reference panel. We collected peripheral blood with informed consent from 175 Mongolian individuals representing six distinct tribes/regions in northern China and Mongolia, including the Abaga, Khalkha, Oirat, Buryat, Sonid, and Horchin tribes.

Population genetic structure. a, PCA of Mongolian individuals and 1000G samples. Mongolians fill a large, less characterized gap between Admixed/Native Americans and other East Asians in the 1000G project. b, PCA of Mongolians and East Asians of 1000G. The abbreviations of EAS populations were used from reference 11.

The fixation index (FST) was used to estimate pairwise genetic differentiation among our Mongolian samples and 26 modern human populations selected from 1000G (…) the Mongolian tribes cluster with East Asian groups. The Mongolian populations show the smallest differentiation from the CHB, and FST values increase relative to the magnitude of geographical separation. The Buryat are the most differentiated tribe compared with other East Asians (1.82–2.97%), while the Horchin are the least (0.25–1.35%). All tribes are closer to the Japanese (JPT) than the CHS with the exception of the Horchin. Among the tribes, the Abaga, Khalkha, Oirat, and Sonid show the least differentiation from one another (FST < 0.15%)
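Pairwise FST can be estimated in several ways, and the excerpt does not say which estimator Bai et al. used. As an illustration only, here is a Python sketch of Hudson's estimator in the form described by Bhatia et al. (2013), combined across SNPs as a ratio of averages and applied to simulated toy populations (all names and parameters are hypothetical). Results are printed as percentages, matching how the excerpt reports FST.

```python
import numpy as np


def hudson_fst(p1, p2, n1: int, n2: int) -> float:
    """Hudson's FST estimator, combined across SNPs as a ratio of averages.

    p1, p2: per-SNP sample allele frequencies; n1, n2: sampled chromosomes
    (twice the number of diploid individuals). The subtracted terms correct
    the squared frequency difference for finite sample size.
    """
    p1, p2 = np.asarray(p1, dtype=float), np.asarray(p2, dtype=float)
    numerator = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    denominator = p1 * (1 - p2) + p2 * (1 - p1)
    return float(np.sum(numerator) / np.sum(denominator))


rng = np.random.default_rng(3)
n_snps, n_individuals = 20_000, 50
n_chrom = 2 * n_individuals
ancestral = rng.uniform(0.05, 0.95, n_snps)  # hypothetical ancestral frequencies


def population(drift_sd):
    # Toy population: ancestral frequencies plus Gaussian drift.
    return np.clip(ancestral + rng.normal(0.0, drift_sd, n_snps), 0.01, 0.99)


def sample_freqs(true_freqs):
    # Observed frequencies from n_chrom sampled chromosomes per SNP.
    return rng.binomial(n_chrom, true_freqs) / n_chrom


pop_x = population(0.02)
others = {"near": population(0.02), "far": population(0.10)}
results = {name: hudson_fst(sample_freqs(pop_x), sample_freqs(freqs), n_chrom, n_chrom)
           for name, freqs in others.items()}
for name, fst in results.items():
    print(f"{name}: FST = {100 * fst:.2f}%")  # percentages, as in the excerpt
```

Summing numerator and denominator before dividing (rather than averaging per-SNP ratios) keeps rare, noisy SNPs from dominating the estimate.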

A PCA places the Mongolians in close genetic proximity to a group of North Asian Siberians, including Altaians, Tuvinians, Evenki, and Yakut, indicating that the Mongolian whole-genome variation panel could be a better proxy for these groups than any populations currently in the 1000G panel

The most common Y-chromosome haplogroups are from the C3 sublineage (41.67%), including C3c (29.17%) and C3b (12.50%), followed by haplogroup O (23.61%), and haplogroup N (18.06%) (…) While haplogroups C and O are primarily restricted to Asia, haplogroup N is present at high frequency in Finns (60.5%), at low frequency in non-Mongolian East Asians (< 1%), and virtually absent throughout the remainder of European and African samples in 1000G

Comparison with Finns

Distribution of D-values from D-test under the model of [EAS, Mongolians, X, chimpanzee], where X represents the test population and chimpanzee serves as an outgroup. The positive D-value (Z > 3) indicates that the test population (X) is closer to Mongolians than to EAS. The whiskers correspond to range, and the dots to individual data points, box limits are the upper and lower quartiles. The n in each boxplot is 30. All abbreviations of populations in the figure were used from reference 11.

Of the populations included in our study, Mongolians share the second-highest level of IBD with the Finnish people (FIN), behind only Northern Han Chinese (CHB). While Mongolians share more IBD with Europeans (EUR) as a whole compared with other non-EAS people (Fig. 4b), removal of Finns from the Europeans drops the level of sharing to as low as that with South Asians (SAS) or Admixed American (AMR).

There is considerable geographic separation between modern-day Mongolians and Europe. The positive D-statistic that reveal gene flow between Mongolians and Europeans (Fig. 4c), and the high degree of IBD sharing with Finnish people in particular suggest that complex admixture may have occurred throughout northeastern Europe and Siberia. To see whether Mongolians represent the ethnic group in East Asia with the highest level of gene flow with Finnish people, we calculated a D-statistic for each set of populations [Mongolians, X, FIN, Yoruba (YRI)], where X represents a population from Siberia or Northern Canada. Most of the populations reveal an imbalance in allele frequencies that suggests gene flow with Finns (D >0, Z >3), but the greatest imbalance is observed between Siberians/Northern Canadians and Finnish, rather than between Mongolians and Finns. This pattern indicates that northern Asian populations interacted across large geographic ranges.
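Both D-statistics quoted here follow the ABBA-BABA logic. The Python sketch below implements the frequency-based form of D, assuming the ordering (P1, P2, P3, Outgroup); under that ordering a positive D means P3 is closer to P2 than to P1, which matches the reading of [EAS, Mongolians, X, chimpanzee] in the figure caption above. Software packages differ in argument order and sign conventions, so this is an illustration rather than the authors' pipeline; Z-scores (usually from a block jackknife) are omitted, and all populations are hypothetical toy data.

```python
import numpy as np


def d_statistic(p1, p2, p3, p_out) -> float:
    """Frequency-based ABBA-BABA statistic D(P1, P2, P3, Outgroup).

    With p = derived-allele frequency at each SNP:
      ABBA = (1 - p1) * p2 * p3 * (1 - p_out)
      BABA = p1 * (1 - p2) * p3 * (1 - p_out)
    D = sum(ABBA - BABA) / sum(ABBA + BABA).
    D > 0: P3 shares more derived alleles with P2 than with P1.
    """
    p1, p2, p3, p_out = (np.asarray(x, dtype=float) for x in (p1, p2, p3, p_out))
    abba = (1 - p1) * p2 * p3 * (1 - p_out)
    baba = p1 * (1 - p2) * p3 * (1 - p_out)
    return float(np.sum(abba - baba) / np.sum(abba + baba))


rng = np.random.default_rng(4)
n_snps = 20_000


def drift(freqs, sd):
    # Toy drift model: Gaussian noise, frequencies kept in [0, 1].
    return np.clip(freqs + rng.normal(0.0, sd, n_snps), 0.0, 1.0)


base = rng.uniform(0.05, 0.95, n_snps)
eas = drift(base, 0.10)                # hypothetical stand-in for P1 (EAS)
mongolians = drift(base, 0.10)         # hypothetical stand-in for P2
x_gene_flow = drift(mongolians, 0.05)  # test population close to P2
x_neutral = drift(base, 0.10)          # test population related to neither
chimp = np.zeros(n_snps)               # outgroup fixed for the ancestral allele

d_flow = d_statistic(eas, mongolians, x_gene_flow, chimp)
d_neutral = d_statistic(eas, mongolians, x_neutral, chimp)
print(f"X close to Mongolians: D = {d_flow:+.3f}")
print(f"unrelated X:           D = {d_neutral:+.3f}")
```

Read the same way, [Mongolians, X, FIN, YRI] with D > 0 means population X shares more alleles with Finns than Mongolians do, which is the comparison the excerpt draws for Siberians and Northern Canadians.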

6 migration events, from the supplementary materials.

I guess the 1000G does not include northern Eurasian groups; otherwise, the IBD map and values would be lighting up with Palaeo-Siberian peoples.


Corded Ware—Uralic (IV): Hg R1a and N in Finno-Ugric and Samoyedic expansions


This is the fourth of four posts on the Corded Ware—Uralic identification:

Let me begin this final post on the Corded Ware—Uralic connection with an assertion that should be obvious to everyone involved in ethnolinguistic identification of prehistoric populations but, for one reason or another, is usually forgotten. In the words of David Reich, in Who We Are and How We Got Here (2018):

Human history is full of dead ends, and we should not expect the people who lived in any one place in the past to be the direct ancestors of those who live there today.

Haplogroup N

Another recurrent argument – apart from “Siberian ancestry” – for the location of the Uralic homeland is “haplogroup N”. This is as serious as saying “haplogroup R1” to refer to Indo-European migrations, but let’s explore this possibility anyway:

Ancient haplogroups

We now have a better idea of what many ancient migrations (previously hypothesized to be associated with westward Uralic migrations) look like in genetic terms. From Damgaard et al. (Science 2018):

These serial changes in the Baikal populations are reflected in Y-chromosome lineages (Fig. 5A; figs. S24 to S27, and tables S13 and S14). MA1 carries the R haplogroup, whereas the majority of Baikal_EN males belong to N lineages, which were widely distributed across Northern Eurasia (29), and the Baikal_LNBA males all carry Q haplogroups, as do most of the Okunevo_EMBA as well as some present-day Central Asians and Siberians.

The only N1c1 sample comes from Ust’Ida Late Neolithic, 180 km to the north of Lake Baikal, which – together with the Bronze Age sample from the Kola Peninsula, and the medieval sample from Ust’Ida – gives a good idea of the overall expansion of N subclades and Siberian ancestry among the Circum-Arctic peoples of Eurasia, speakers of Palaeo-Siberian languages.

Geographical location of ancient samples belonging to major clade N of the Y-chromosome.

Modern haplogroups

What we should expect from Uralic peoples expanding with haplogroup N – seeing how Yamna expands with R1b-L23, and Corded Ware expands with R1a-Z645 – is to find a common subclade spreading with Uralic populations. Let’s see if it works like that for any N-X subclade, in data from Ilumäe et al. (2016):

Geographic-Distribution Map of hg N3 / N1c / N1a.

Within the Eurasian circum-Arctic spread zone, N3 and N2a reveal a well-structured spread pattern where individual sub-clades show very different distributions:

N1a1-M46 (or N-TAT), formed ca. 13900 BC, TMRCA 9800 BC

   N1a1a2-B187, formed ca. 9800 BC, TMRCA 1050 AD:

The sub-clade N3b-B187 is specific to southern Siberia and Mongolia, whereas N3a-L708 is spread widely in other regions of northern Eurasia.

     N1a1a1a-L708, formed ca. 6800 BC, TMRCA 5400 BC.

       N1a1a1a2-B211/Y9022, formed ca. 5400 BC, TMRCA 1900 BC:

The deepest clade within N3a is N3a1-B211, mostly present in the Volga-Uralic region and western Siberian Khanty and Mansi populations.

         N1a1a1a1a-L392/L1026, formed ca. 4400 BC, TMRCA 2800 BC:

The neighbor clade, N3a3’6-CTS6967, spreads from eastern Siberia to the eastern part of Fennoscandia and the Baltic States

Frequency-Distribution Maps of Individual Subclade N3a3 / N1a1a1a1a1a-CTS2929/VL29, probably initially with Akozino warrior-traders.

           N1a1a1a1a1a-CTS2929/VL29, formed ca. 2100 BC, TMRCA 1600 BC:

In Europe, the clade N3a3-VL29 encompasses over a third of the present-day male Estonians, Latvians, and Lithuanians but is also present among Saami, Karelians, and Finns (Table S2 and Figure 3). Among the Slavic-speaking Belarusians, Ukrainians, and Russians, about three-fourths of their hg N3 Y chromosomes belong to hg N3a3.

In the post on Finno-Permic expansions, I depicted what seems to me the most likely route by which N1c-L392 lineages infiltrated the western Finno-Ugric populations with Akozino warrior-traders, starting from an origin around the Barents Sea.

This includes the potential spread of (a minority of) N1c-B211 subclades due to contacts with Ananino on both sides of the Urals, through a northern route of forest and forest-steppe regions (equivalent to the distribution of Cherkaskul compared to Andronovo), given the spread of certain subclades in Ugric populations.

NOTE. An alternative possibility is the association of certain B211 subclades with a southern route of expansion with Pre-Scythian and Scythian populations, under whose influence the Ananino culture emerged (which would imply a very quick infiltration of certain groups of haplogroup N everywhere among Finno-Ugric peoples on both sides of the Urals), and also the expansion of some subclades with Turkic-speaking peoples, who apparently expanded with alliances of different peoples. Both (Scythian and Turkic) populations expanded from East Asia, where haplogroup N (including N1c) had been present since the Neolithic. I find this a worse model of expansion for upper clades, but – given the YFull estimates and the presence of this haplogroup among Turkic peoples – it is a possibility for many subclades.

           N1a1a1a1a2-Z1936, formed ca. 2800 BC, TMRCA 2400 BC:

The only notable exception from the pattern are Russians from northern regions of European Russia, where, in turn, about two-thirds of the hg N3 Y chromosomes belong to the hg N3a4-Z1936—the second west Eurasian clade. Thus, according to the frequency distribution of this clade, these Northern Russians fit better among other non-Slavic populations from northeastern Europe. N3a4 tends to increase in frequency toward the northeastern European regions but is also somewhat unexpectedly a dominant hg N3 lineage among most Turcic-speaking Volga Tatars and South-Ural Bashkirs.

Frequency-Distribution Maps of Individual Subclade N3a4 / N1a1a1a1a2-Z1936, probably with the Samic (first) and Fennic (later) expansions into Palaeo-Lakelandic and Palaeo-Laplandic territories.

The expansion of N1a-Z1936 in Fennoscandia is most likely associated with the expansion of Saami into asbestos ware-related territory (like the Lovozero culture) during the Late Iron Age, and their admixture with its population; with the later Fennic expansion to the east and north, replacing their language; and with Arctic and forest populations assimilated during Permic, Ugric, and Samoyedic expansions to the north.

           N1a1a1a1a4-M2118 (previously N3a2), formed ca. 4400 BC, TMRCA 1700 BC:

Sub-hg N3a2-M2118 is one of the two main bifurcating branches in the nested cladistic structure of N3a2’6-M2110. It is predominantly found in populations inhabiting present-day Yakutia (Republic of Sakha) in central Siberia and at lower frequencies in the Khanty and Mansi populations, which exhibit a distinct Y-STR pattern (Table S7) potentially intrinsic to an additional clade inside the sub-hg N3a2

The second widespread sub-clade of hg N is N2a. (…):

   N1a2b-P43 (B523/FGC10846/Y3184), formed ca. 6800 BC, TMRCA ca. 2700 BC:

The absolute majority of N2a individuals belong to the second sub-clade, N2a1-B523, which diversified about 4.7 kya (95% CI = 4.0–5.5 kya). Its distribution covers the western and southern parts of Siberia, the Taimyr Peninsula, and the Volga-Uralic region with frequencies ranging from 10% to 30% and does not extend to eastern Siberia (…)

Geographic-Distribution Map of hg N2a1 / N1a2b-P43

The “European” branch suggested earlier from Y-STR patterns turned out to consist of two clades

     N1a2b2a-Y3185/FGC10847, formed ca. 2200 BC, TMRCA 800 BC:

N2a1-L1419, spread mainly in the northern part of that region.

     N1a2b2b1-B528/Y24382, formed ca. 900 BC, TMRCA ca. 900 BC:

N2a1-B528, spread in the southern Volga-Uralic region.

Haplogroup R1a

We also have a good idea of the distribution of haplogroup R1a-Z645 in ancient samples. Its subclades were associated with the Corded Ware expansion, and some of them fit the early expansion of Finno-Permic, Ugric, and Samoyedic peoples to the east quite well.

Modified image, from Underhill et al. (2015). Spatial frequency distributions of Z282 (green) and Z93 (blue) affiliated haplogroups. Notice the potential Finno-Ugric-associated distribution of Z282 (especially R1a-M558, a Z280 subclade), and the expansion of R1a-Z2123 subclades with Central Asian forest-steppe groups.

This is what the modern distribution of R1a among Uralians looks like, from the latest report in Tambets et al. (2018):

  • Among Fennic populations, Estonians and Karelians (ca. 1.1 million) did not undergo the severe bottleneck experienced by Finns (ca. 6-7 million), and thus show a greater proportion of R1a-Z280 than of N1c subclades, which points to the original situation of Fennic peoples before their expansion. Relying on Finnish Y-DNA to draw conclusions about Uralic populations is as useful as relying on Basque Y-DNA to identify the language spread by R1b-P312.
  • Among Volga-Finnic populations, Mordovians (the closest to the original Uralic cluster, see above) show a large share of R1a lineages (27%).
  • Hungarians (ca. 13-15 million) represent the majority of Ugric (and Finno-Ugric) peoples. They carry mainly R1a-Z280 and also R1a-Z2123, have little N1c, and lack Siberian ancestry; they thus represent the most likely original situation of Ugric peoples in the 4th century AD (read more on Avars and Hungarians).
  • Among Samoyedic peoples, the Selkups, the southernmost and latest to expand (that is, those not heavily admixed with Siberian populations), also have a majority of R1a-Z2123 lineages (see also here for the original Samoyedic haplogroups to the south).

To understand the relevance of Hungarians for Ugric peoples, and of Estonians, Karelians, and Mordovians (and northern Russians, recently Russified Finno-Ugric peoples) for Finno-Permic peoples, as opposed to the Circum-Arctic and East Siberian populations, one has to put demographics in perspective. Even a modern map can show the relevance of certain territories in the past:

Population density (people per km2) map of the world in 1994. From Wikipedia.

Summary of ancestry + haplogroups

Fennic and Samic populations seem to be clearly influenced by Palaeo-Laplandic peoples, whereas Volga-Finnic and especially Permic populations may have received gene flow from both Palaeo-Laplandic and Palaeo-Siberian peoples, but essentially Palaeo-Siberian influence from the north and east.

The fact that modern Mansis and Khantys show the highest variation in N1a subclades, and some of the highest “Siberian ancestry” among non-Nganasans, should have raised a red flag long ago. The fact that Hungarians – supposedly stemming from a source population similar to Mansis – do not show the same amount of N subclades or Siberian ancestry (not even close), and show instead more R1a, in common with Estonians (among Finno-Samic peoples) and Mordvins (among Volga-Finnic peoples), should have raised a still bigger red flag. The fact that Nganasans – the model for Siberian ancestry – show completely different N1a2b-P43 lineages should have been a huge genetic red line (on top of the anthropological one) against regarding them as the Uralian-type population.

We now know that ethnolinguistic groups have usually expanded with massive (usually male-biased) migrations, and that neighbouring locals often ‘resurge’ later without changing the language. This is seen in Europe after the spread of Bell Beakers: in Scandinavia, with the increase of previous ancestry and lineages during the formation of the Nordic ethnolinguistic community; in Central-West Europe, with the resurgence of Neolithic ancestry (and lineages) over steppe ancestry during the Bronze Age; and in Central-East Europe, with Unetice or East European Bronze Age groups like Mierzanowice, Trzciniec, or Lusatian showing an increase in steppe ancestry (and a resurgence of R1a subclades). None of these represented a radical ethnolinguistic change.

Map of archaeological cultures in north-eastern Europe ca. 8th-3rd centuries BC. [The Mid-Volga Akozino group not depicted] Shaded area represents the Ananino cultural-historical society. Fading purple arrows represent likely stepped movements of subclades of haplogroup N for centuries (e.g. Siberian → Ananino → Akozino → Fennoscandia [N-VL29]; Circum-Arctic → forest-steppe [N1, N2]; etc.). Blue arrows represent eventual expansions of Uralic peoples to the north. Modified image from Vasilyev (2002).

It is not hard to model the stepped arrival, infiltration, and/or resurgence of N subclades and “Siberian ancestries”, and their gradual expansion in certain regions, associated with certain migrations first (such as the expansions to the Circum-Arctic region, and later the Scythian- and Turkic-related movements), as well as with limited regional developments, like the known bottleneck in Finns, or the clearly late expansion of Ugric and Samoyedic languages to the north among nomadic Palaeo-Siberians due to traditions of exogamy and multilingualism. This fits quite well with the different arrival times of N (N1c and xN1c) lineages in the different Uralic-speaking groups, and with the stepped appearance of “Siberian ancestry” in the different regions.

The alternative

It is evident that a lot of people were too attached to the idea of Palaeolithic R1b lineages ‘native’ to western Europe speaking Basque languages; of R1a lineages speaking Indo-European and spreading with Yamna; and N lineages ‘native’ to north-eastern Europe and speaking Uralic, and this is causing widespread weeping and gnashing of teeth (instead of the joy of discovering where one’s true patrilineal ancestors come from, and what language they spoke in each given period, which is the supposed objective of genetic genealogy…)

Since an Indo-Germanic branch (as now revived by some in the Copenhagen group to fit Kristiansen’s theory of the 1980s to recent genetic data) does not make any sense in linguistics, the finding of R1a in Yamna would not have led where some think it would, because North-West Indo-European would still be the main Late PIE branch in Europe. Don’t take my word for it; take James P. Mallory’s (2013).

The levels of Indo-European reconstruction, from Mallory & Adams (2006).

If an (unlikely) Indo-Slavonic group were posited, though, such a group would still be bound (with Indo-Iranian) to the steppes with East Yamna/Poltavka (admixing with Abashevo migrants but retaining its language), developing Sintashta/Potapovka → Srubna/Andronovo. Its R1a lineages would equally have undergone the known bottlenecks of the steppes, where they replaced R1b-Z2103, a haplogroup this eastern group shares with the Balkan languages and which therefore links the Graeco-Aryan group together.

As far as I know – and there might be many other similar pet theories out there – there have been proposals of “modern Balto-Slavic-like” populations (in an obvious circular reasoning based on modern populations) in some Scythian clusters of the Iron Age.

NOTE. I will not go into “Balto-Slavic-like R1a” of the Late Bronze Age or earlier, because at this point in the development of Population Genetics no one can seriously believe that autosomal similarity predating the appearance of the Slavs by more than 1,500 years equates to their (ethnolinguistic) ancestral population without a clear intermediate cultural and genetic trail, something we lack today in the Slavic case even for the late Roman period…

The Finnic and Saamic separation looks shallower than it actually is. Invisible convergence can be ‘triangulated’ with the help of Germanic layers of mutual loanwords (Häkkinen 2012).

We also know of R1a-Z280 lineages in Srubna, probably expanding to the west. With that in mind, consider three points. First, Palaeo-Germanic was in close contact with Finno-Samic when Finnic and Samic were already separated but still in contact with each other. Second, Palaeo-Germanic was also in contact with, and closely related to, a ‘Temematic’ group distinct from Balto-Slavic. Third, early Proto-Baltic and Proto-Slavic of the Roman Iron Age and later were in contact with western Uralic. Given all this, if R1a is considered to have expanded Indo-European through some kind of “patron-client” relationship with west Yamna, this would be the linguistic map of the Iron Age:

Eastern European language map during the Late Bronze Age / Iron Age, if R1a spread Indo-European languages and Eastern Yamna spoke Indo-Slavonic. Palaeo-Germanic (i.e. Pre- to Proto-Germanic) needs to be in contact with both the Samic Lovozero population and the Fennic west Circum-Arctic one. Italic and Celtic in contact with Pre-Germanic. Germanic in contact with Temematic. Balto-Slavic in contact with Iranian, and near Fennic to allow for later loanwords. For Germanic and Temematic, see Kortlandt (2018).

You might think I have some personal or political reason against this kind of proposal. I don’t. We have been proposing Indo-European as the language of the European Union for more than 10 years, so supporting R1b-Italo-Celtic in the whole of Western Europe, R1a-Germanic in Central and Eastern Europe, and R1a-Indo-Slavonic in the steppes (as the Danish group seems to be doing) is neither inherently bad nor good for me. If anything, it would give more reason to support the revival of North-West Indo-European in Europe.

My problem with this proposal is that it is obviously beholden to the notion of uninterrupted cultural, historical and ethnic continuity in certain territories. This bias is common in historiography (von Falkenhausen 1993), but it extends even more easily into the lesser-known prehistory of any territory, and now more than ever some people feel the need to corrupt (pre)history based on their own haplogroups (or the majority haplogroups of their modern countries). However, my rejection is based less on philosophical grounds than on facts: this picture is not what the combination of linguistic, archaeological, and genetic data shows. Period.

Nevertheless, suppose that Yamna + Corded Ware represented the “big and early expansion” of Germanic and Italo-Celtic peoples worthy of the Nazi dream of Lebensraum and the Fascist spazio vitale. Suppose that Uralians were Siberian hunter-gatherers who controlled the whole of eastern and northern Russia and miraculously managed to push (ethnolinguistically) Neolithic agropastoralists to the west during and after the Iron Age, with gradual (and often minimal) genetic impact. And suppose that Balto-Slavic peoples were represented by horse riders from Pokrovka/Srubna, who then hid somewhere around the forest-steppe until after the Scythian expansion and spread their language (without much genetic impact) during the early Middle Ages. If so… so be it.

See also