Proto-Indo-European homeland south of the Caucasus?

User Camulogène Rix at Anthrogenica posted an interesting excerpt of Reich’s new book in a thread on ancient DNA studies in the news (emphasis mine):

Ancient DNA available from this time in Anatolia shows no evidence of steppe ancestry similar to that in the Yamnaya (although the evidence here is circumstantial as no ancient DNA from the Hittites themselves has yet been published). This suggests to me that the most likely location of the population that first spoke an Indo-European language was south of the Caucasus Mountains, perhaps in present-day Iran or Armenia, because ancient DNA from people who lived there matches what we would expect for a source population both for the Yamnaya and for ancient Anatolians. If this scenario is right the population sent one branch up into the steppe – mixing with steppe hunter-gatherers in a one-to-one ratio to become the Yamnaya as described earlier – and another to Anatolia to found the ancestors of people there who spoke languages such as Hittite.

The thread has since logically become a trolling hell, and it seems not to have been working properly for hours now.

Reich’s proposal based on ancestral components to explain the formation of a people and language is a continuation of their emphasis on ancestry to explain cultures and languages. It seems quite interesting to see this happen again, given their current trend to surreptitiously modify their previous ‘Yamnaya ancestry’ concept and Yamnaya millennia-long R1a-R1b community (that supposedly explains a Yamna -> Corded Ware -> Bell Beaker migration) to a more general ‘steppe people’ sharing a ‘steppe ancestry’ who spoke a ‘steppe language’.

Interesting arrows of dispersal of steppe ancestry, from Yamna -> Corded Ware -> Bell Beaker, from David Reich’s new book (yes, from 2018, number one bestseller in

This new idea based on ancestral components suffers thus from the same essential methodological problems, which equate it – yet again – to pure speculation:

  1. It is a conclusion based on the genomic analysis of few individuals from distant regions and different periods, and – maybe more disturbingly – on the lack of steppe ancestry in the few samples at hand.
  2. Wait, what? Steppe ancestry? So they are trying to derive potential genetic connections among specific prehistoric cultures with a poorly depicted genetic sketch, based on previous flawed concepts (instead of on anthropological disciplines), which seems a rather long stretch for any scientist, whether they are content with seeing themselves as barbaric scientific conquerors of academic disciplines or not. In other words, statistics is also science (in fact, the main one to assert anything in almost any scientific field), and you cannot overcome essential errors (design, sampling, hypothesis testing) merely by using a priori correct statistical methods. Results obtained this way constitute a statistical fallacy.

  3. Even if the sampling and hypothesis testing were fine, to derive anthropological models from genomic investigation is completely wrong. Ancestral component ≠ population.
  4. To include not only potential migrations, but also languages spoken by these potential migrants? It’s sad that we have a need to repeat it, but if ancestral component ≠ population, how could ancestral component = language?
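
To put a rough number on the first point, here is a minimal sketch. It assumes (quite unrealistically for ancient DNA) that the samples are independent random draws from a single population. Under that assumption, if none of n sampled individuals shows a given ancestry, the exact one-sided 95% upper bound on its true frequency is 1 − 0.05^(1/n). The function name and the sample sizes below are my own illustration, not taken from Reich's book or any paper.

```python
def upper_bound_zero_hits(n, confidence=0.95):
    """Exact one-sided upper confidence bound on a frequency p
    when 0 of n independent samples carry the trait.

    Solves (1 - p)**n = 1 - confidence for p.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    alpha = 1.0 - confidence
    return 1.0 - alpha ** (1.0 / n)


if __name__ == "__main__":
    # Hypothetical sample sizes, chosen only for illustration.
    for n in (3, 5, 10, 30):
        bound = upper_bound_zero_hits(n)
        print(f"n={n:2d}: true frequency could still be up to {bound:.0%}")
```

With five samples and no hits, the bound is still about 45%, so absence in a handful of individuals says little about absence in the population, even before non-random sampling across regions and periods is taken into account.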

The Proto-Indo-European-speaking community

This is what we know about the formation of a Proto-Indo-European community (i.e. a community speaking a reconstructible Proto-Indo-European language) in the Pontic-Caspian steppe, which is based on linguistic reconstruction and guesstimates, tracing archaeological cultures backwards from cultures known to have spoken ancient (proto-)languages, and helping both disciplines with anthropological models (for which ancient genomics is only helping select certain details) of migration or – rarely – cultural diffusion:

NOTE. The following dates are obviously simplified. Read here a more detailed linguistic assessment based on phonology.

Most likely Pre-Proto-Anatolian migration with Suvorovo-Novodanilovka chiefs in the North Pontic steppe and the Balkans.
  • ca. 5000 BC. Early Proto-Indo-European (or Indo-Uralic) spoken probably during the formation and development of a loose Early Khvalynsk – Sredni Stog I cultural-historical community over the Pontic-Caspian steppe region, whose indigenous population probably had mainly Caucasus hunter-gatherer ancestry.
  • ca. 4500 BC. Khvalynsk probably speaking Middle Proto-Indo-European expands, most likely including Suvorovo-Novodanilovka chiefs into the North Pontic steppe, and probably expanding R1b-M269 lineages for the first time.
  • ca. 4000 BC. Separated communities develop, including North Pontic cultures probably gradually dominated by R1a-Z645 (potentially speaking Proto-Uralic); and Khvalynsk (and Repin) cultures probably dominated by R1b-L23 lineages, most likely developing a Late Proto-Indo-European already separated from Proto-Anatolian.
  • ca. 3500 BC. A Proto-Corded Ware population dominated by R1a-Z645 expands to the north, and slightly later an early Yamna community develops from Late Khvalynsk and Repin, expanding to the west of the Don River, and to the east into Afanasevo. This is most likely the period of reduction of variability and expansion of subclades of R1a-Z645 and R1b-L23 that we expect to see with more samples.
  • ca. 3000 BC. Expansion of Corded Ware migrants in northern Europe, and Yamna migrants along the Danube and into the Balkans, with further reduction and expansion of certain subclades.
  • ca. 2500 BC. Expansion of Bell Beaker migrants dominated by R1b-L51 subclades in Europe, and late Corded Ware migrants in east Yamna expanding R1a-Z93 subclades.

All these events are compatible with language reconstruction in mainstream European schools since at least the 1980s, supported by traditional archaeological research of the past 20 years, and are being confirmed with Genomics.

For those willingly lost in a myriad of new dreams boosted by the shallow comment contained in David Reich’s paragraph on CHG ancestry, even he does not doubt that the origin of Late Proto-Indo-European lies in Yamna, to the north of the Caucasus, based on Anthony’s (2007) account:

Both images from the book, posted by Twitter user Jasper at

NOTE: By the way, David Anthony, one of the main sources of information for Reich’s group, never considered Corded Ware to have received Yamna migrants, and although he changed his model due to the conclusions of the 2015 papers, he has recently changed his model again to adapt it to the inconsistencies found in phylogeography.

CHG ancestry and PIE homeland south of the Caucasus

As for the potential origins of CHG ancestry in early Proto-Indo-European speakers, I already stated clearly my opinion quite recently. They may be attributed to:

Just to be clear, an expansion of Proto-Anatolian to the south, through the Caucasus, cannot be discarded today. It will remain a possibility until Maykop and more Balkan Chalcolithic and Anatolian-speaking samples are published.

However, an original Early Proto-Indo-European community south of the Caucasus seems to me highly unlikely, based on anthropological data, which should drive any conclusion. From what I could read, here are the rather simplistic arguments used:

  • Gimbutas and Maykop: Maykop was thought to be (in Gimbutas’ times) a rather late archaeological culture, directly connected to a Transcaucasian Copper Age culture ca. 2400-2300 BC. It has been demonstrated in recent years that this culture is substantially older, and even then language guesstimates for a Late PIE / Proto-Anatolian would not fit a migration to the north. While our ignorance may certainly be used to derive far-fetched conclusions about potential migrations from and to it, using Gimbutas (or any archaeological theory until the 1990s) today does not make any sense. Still less if we think that she favoured a steppe homeland.

NOTE. It seems that the Reich Lab may already have access to Maykop samples, so this suggested Proto-Indo-European – Maykop connection may have some real foundation. Regardless, we already know that intense contacts happened, so there will be no surprise (unless Y-DNA shows some sort of direct continuity from one to the other).

  • Gamkrelidze & Ivanov: they argued for an Armenian homeland (and are thus at the origin of yet another autochthonous continuity theory), but they did so to support their glottalic theory, i.e. merely to support what they saw as favouring their linguistic model (with Armenian being the most archaic dialect). The glottalic theory is supported today – as far as I know – mainly by Kortlandt, Jagodziński, or (Nostraticist) Bomhard, but even they most likely would not need to argue for an Armenian homeland. In fact, their support of a Graeco-Aryan group (also supported by Gamkrelidze & Ivanov) would be against this, at least in archaeological terms.
  • Colin Renfrew and the Anatolian homeland: This conceptual umbrella of language spreading with farming everywhere has changed so much and so many times in the past 20 years, with so many glottochronological and archaeological estimates circulating, that you can support anything by now using them. It is mostly used today for abstract models of long-lasting language contacts, cultural diffusion, and constellation analogies. Anyway, he strives to keep up-to-date information to revise the model, that much is certain:
    Science Magazine
    “A first line of evidence comes from linguistic analysis based on quantitative lexical data, which returned a tree compatible with the Anatolian hypothesis
  • Glottochronology, phylogenetic trees, Swadesh list analysis, statistical estimates, psychics, pyramid power, and healing crystals: no, please, no.

In principle, unlike many other recent autochthonous continuity theories, I doubt there can be much racial-based opposition anywhere in the world to an origin of Proto-Indo-European in the Middle East, where the oldest civilizations appeared – apart, obviously, from modern Northeast and Northwest Caucasian, Kartvelian, or Semitic speakers, who may in turn have to revisit their autochthonous continuity theories radically…

Nevertheless, it is obvious that prehistoric (and many historic) migrations are signalled by the reduction in variability and expansion of certain Y-DNA haplogroups, and not just by ancestral components. That is generally accepted, although the reasons for this almost universal phenomenon are not always clear.

In fact, Proto-Anatolian and Common Anatolian speakers need not share any ancestral component, PCA cluster, or any other statistical parameter related to steppe populations, not even the same Y-DNA haplogroups, given that approximately three thousand years might have passed between their split from an Indo-Hittite community and the first attested Anatolian-speaking communities… We must carefully follow their tracks from Anatolia ca. 1500 BC to the steppe ca. 4500 BC, otherwise we risk creating another mess like the Corded Ware one.

In my opinion, the substantial contribution of EHG ancestry and R1a-M417 lineages to the Pontic-Caspian steppe (probably ca. 6500 BC) from Central or East Eurasia is the most recent sizeable genomic event in the region, and thus the best candidate for the community that expanded a language ancestral to Proto-Indo-European – whether you call it Pre-Proto-Indo-European, Pre-Indo-Uralic, or Eurasiatic, depending on your preferences.

An early (and substantial) contribution of CHG ancestry in Khvalynsk relative to North Pontic cultures, if it is found with new samples, may actually be a further proof of the Caucasian substrate of Proto-Indo-European proposed by Kortlandt (or Bomhard) as contributing to the differentiation of Middle PIE from Uralic. Genomics could thus help support, again, traditional disciplines in accepting or rejecting academic controversial theories.


In the case of an Early PIE (or Indo-Uralic) homeland, genomic data is scarce. But all traditional anthropological disciplines point to the Pontic-Caspian steppe, so we should stick to it, regardless of the informal suggestion written by a renowned geneticist in one paragraph of a book conceived as an introduction to the field.

It seems we are not learning much from the hundreds of peer-reviewed, statistically (superficially, at least) sound genetic papers whose anthropological conclusions have been proven wrong by now. A lot of people should be spending their time learning about the complex, endless methods at hand in this kind of research – not just bioinformatics – instead of fruitlessly speculating about wild unsubstantiated proposals.

As a final note, I would like to remind some in the discussion, who seem to dismiss the identification of CHG with Proto-Indo-European by supporting a “R1a-R1b” community for PIE, of their previous commitment to ancestral components in identifying peoples and languages, and thus their support for Reich’s (and his group’s) fundamental premises.

You cannot have it both ways. At least David Reich is being consistent.


Statistical methods fashionable again in Linguistics: Reconstructing Proto-Australian dialects

Reconstructing remote relationships – Proto-Australian noun class prefixation, by Mark Harvey & Robert Mailhammer, Diachronica (2017) 34(4): 470–515


Evaluation of hypotheses on genetic relationships depends on two factors: database size and criteria on correspondence quality. For hypotheses on remote relationships, databases are often small. Therefore, detailed consideration of criteria on correspondence quality is important. Hypotheses on remote relationships commonly involve greater geographical and temporal ranges. Consequently, we propose that there are two factors which are likely to play a greater role in comparing hypotheses of chance, contact and inheritance for remote relationships: (i) spatial distribution of corresponding forms; and (ii) language specific unpredictability in related paradigms. Concentrated spatial distributions disfavour hypotheses of chance, and discontinuous distributions disfavour contact hypotheses, whereas hypotheses of inheritance may accommodate both. Higher levels of language-specific unpredictability favour remote over recent transmission. We consider a remote relationship hypothesis, the Proto-Australian hypothesis. We take noun class prefixation as a test dataset for evaluating this hypothesis against these two criteria, and we show that inheritance is favoured over chance and contact.

I was redirected to this work by my wife (who discovered it reading BBC News), suspicious of its potential glottochronological content. However, I must say, speaking from my absolute ignorance of the main language family investigated, that it seemed in general an interesting read, with some thorough discussion and attention to detail.

The statistical analyses, however, seem to disrupt the content, and – in my opinion – do not help support its conclusions.

Map of Non-Pama-Nyungan languages.

Computer Science and Linguistics

We are evidently on alert to tackle dubious research, because of the revival of pseudoscientific methods in linguistic investigation, promoted (yet again) by Nature.

It seems that journals with the highest impact factor, in their search for groundbreaking conclusions supported by any methods involving numbers, are setting a still lower level of standards for academic disciplines.

NOTE. If you think about it – if glottochronology has survived the disgrace it fell into in the 2000s, to come back again now to the top of the publishing industry… How can we expect the “Yamnaya ancestry” concept to be overcome? I guess we will still see certain Eastern Europeans in 2030 arguing for elevated steppe ancestry here and there to support the conclusions of the 2015 papers, no matter what…

I am sure that worse times lie ahead for traditional comparative grammar. For example, it seems that there will be more publications on Proto-Indo-European using novel computer methods: a group led by Janhunen and Pyysalo, from the Department of Languages at the University of Helsinki, promises – under an ever-growing bubble of mystery (or so it seems from their Twitter and Facebook accounts) – a machine-implemented reconstruction (with the generative etymological PIE lexicon project) that will once and for all solve all our previous ‘inconsistencies’…

Spoiler alert for their publications: whether they choose to proceed mainly with computer-implemented methods, or use them to support more traditional results, their conclusions will confirm (surprise!) their authors’ previous reactionary theses, such as a renewed support for the traditional monolaryngealism, and a rejection of Kortlandt’s or Kloekhorst’s (i.e. the Leiden School’s) theories on Proto-Indo-European phonology, and thus a PIE relationship to Proto-Uralic, probably stressing yet again an independent origin for both proto-languages.


Oldest N1c1a1a-L392 samples and Siberian ancestry in Bronze Age Fennoscandia

Open access preprint at bioRxiv, Ancient Fennoscandian genomes reveal origin and spread of Siberian ancestry in Europe, by Lamnidis et al. (2018).

Abstract (emphasis mine):

European history has been shaped by migrations of people, and their subsequent admixture. Recently, evidence from ancient DNA has brought new insights into migration events that could be linked to the advent of agriculture, and possibly to the spread of Indo-European languages. However, little is known so far about the ancient population history of north-eastern Europe, in particular about populations speaking Uralic languages, such as Finns and Saami. Here we analyse ancient genomic data from 11 individuals from Finland and Northwest Russia. We show that the specific genetic makeup of northern Europe traces back to migrations from Siberia that began at least 3,500 years ago. This ancestry was subsequently admixed into many modern populations in the region, in particular populations speaking Uralic languages today. In addition, we show that ancestors of modern Saami inhabited a larger territory during the Iron Age than today, which adds to historical and linguistic evidence for the population history of Finland.

Interesting excerpts (edited):

While the Siberian genetic component described here was previously described in modern-day populations from the region, we gain further insights into its temporal depth. Our data suggest that this fourth genetic component found in modern-day north-eastern Europeans arrived in the area around 4,000 years ago at the latest, as illustrated by ALDER dating using the ancient genome-wide data from Bolshoy Oleni Ostrov. The upper bound for the introduction of this component is harder to estimate. The component is absent in the Karelian hunter-gatherers (EHG) dated to 8,300-7,200 yBP as well as Mesolithic and Neolithic populations from the Baltics from 8,300 yBP and 7,100-5,000 yBP respectively. While this suggests an upper bound of 5,000 yBP for the arrival of Siberian ancestry, we cannot exclude the possibility of its presence even earlier, yet restricted to more northern regions, as suggested by its absence in populations in the Baltic during the Bronze Age. Our study also presents the earliest occurrence of the Y-chromosomal haplogroup N1c in Fennoscandia. N1c is common among modern Uralic speakers, and has also been detected in Hungarian individuals dating to the 10th century, yet it is absent in all published Mesolithic genomes from Karelia and the Baltics.

The large Siberian component in the Bolshoy individuals from the Kola Peninsula provides the earliest direct genetic evidence for an eastern migration into this region. Such contact is well documented in archaeology, with the introduction of asbestos-mixed Lovozero ceramics during the second millennium BC, and the spread of even-based arrowheads in Lapland from 1,900 BCE. Additionally, the nearest counterparts of Vardøy ceramics, appearing in the area around 1,600-1,300 BCE, can be found on the Taymyr peninsula, much further to the east. Finally, the Imiyakhtakhskaya culture from Yakutia spread to the Kola Peninsula during the same period. Contacts between Siberia and Europe are also recognised in linguistics. The fact that the Siberian genetic component is consistently shared among Uralic-speaking populations, with the exceptions of Hungarians and the non-Uralic speaking Russians, would make it tempting to equate this component with the spread of Uralic languages in the area. However, such a model may be overly simplistic. First, the presence of the Siberian component on the Kola Peninsula at ca. 4000 yBP predates most linguistic estimates of the spread of Uralic languages to the area. Second, as shown in our analyses, the admixture patterns found in historic and modern Uralic speakers are complex and in fact inconsistent with a single admixture event. Therefore, even if the Siberian genetic component partly spread alongside Uralic languages, it likely presented only an addition to populations carrying this component from earlier.

Plot of ADMIXTURE (K=3) results containing West Eurasian populations and the Nganasan. Ancient individuals from this study are represented by thicker bars.

The novel genome-wide data here presented from ancient individuals from Finland opens new insights into Finnish population history. Two of the three higher coverage individuals and all six low coverage individuals from Levänluhta showed low genetic affinity to modern-day Finnish speakers of the area. Instead, an increased affinity was observed to modern-day Saami speakers, now mostly residing in the north of the Scandinavian Peninsula. These results suggest that the geographic range of the Saami extended further south in the past, and hints at a genetic shift at least in the western Finnish region during the Iron Age. The findings are in concordance with the noted linguistic shift from Saami languages to early Finnish. Further ancient DNA from Finland is needed to conclude to what extent these signals of migration and admixture are representative of Finland as a whole.

PCA plot of 113 Modern Eurasian populations, with individuals from this study projected on the principal components. Uralic speakers are highlighted in light purple.

The two samples of haplogroup N1c1a1a-L392/L1026, dated ca. 1500 BC, come from the site Bolshoy Oleniy Ostrov, in the Kola Peninsula.

Bolshoy Oleniy Ostrov (Great Reindeer Island), situated in the Kola Bay of the Barents Sea and separated from the mainland by Yekaterininsky Island and two straits, harbors the ancient cemetery of an unknown Early Metal Age culture. The preservation of artifacts made from bone and antler, wooden structures, as well as human remains is remarkable for the location and age this site represents. Altogether 19 skeletons of adults and children have been recognized from both single and collective burials of the site, together with more than 250 artifacts. (…) Apart from these excavations, approximately 25 burials were revealed in 1934 during the construction of fortifications. (…) Radiocarbon dates are provided by Moiseyev and Khartanovich in their 2012 study, placing the site in middle to the late 2nd millennium BC (…)

After seeing how Late Indo-European languages spread with Yamna and (mainly) R1b-L23 lineages, we are now obtaining proof of how Siberian ancestry – likely accompanying N1c-L392 lineages – was probably related to an early archaeological Siberian influence in the easternmost region of North-East Europe, seen also probably in linguistics.

NOTE. Whereas I proposed – based mainly on common guesstimates – that R1a-M417 and EHG ancestry might have signaled the arrival of an early Yukaghir substratum to NE Europe, later acquired by Uralic spreading over this territory, while N1c1a1a lineages with the Seima-Turbino phenomenon might have given Uralic its later Altaic traits, it is indeed possible – and more likely with the findings in this paper – that N1c1a1a lineages may have in fact spread Yukaghir languages, especially if (like the Leiden school) one supports an Indo-Uralic community.

The linguistic effect of this migration may depend on one’s preferred model for Proto-Uralic and its strata, and especially on one’s position in the Proto-Uralic vs. Proto-Uralo-Yukaghir controversy. Although I really didn’t have a strong opinion on this matter, it is clear from my texts that (unlike Kortlandt) I didn’t consider Yukaghir to share a common ancestor with Uralic languages. What genomics is showing right now seems to me directly translatable to a linguistic model, and we should therefore reject an original Proto-Uralo-Yukaghir community.

Also, it seems that the Finnish population peak which expanded today’s prevalent N1c-L392 lineages – after the Iron Age bottleneck which likely reduced its haplogroup diversity – may have been associated with the event that displaced the Saami population from Finland after ca. 1000 AD.

I think it is becoming still clearer where Uralic languages came from.


Olalde et al. and Mathieson et al. (Nature 2018): R1b-L23 dominates Bell Beaker and Yamna, R1a-M417 resurges in East-Central Europe during the Bronze Age

The official papers Olalde et al. (Nature 2018) and Mathieson et al. (Nature 2018) have appeared. They are based on the 2017 preprints at bioRxiv, The Beaker Phenomenon And The Genomic Transformation Of Northwest Europe and The Genomic History Of Southeastern Europe respectively, but with a sizeable number of new samples.

Papers are behind a paywall, but here are the authors’ shareable links to read the papers and supplementary materials: Olalde et al. (2018), Mathieson et al. (2018).

NOTE: The corresponding datasets have been added to the Reich Lab website. Remember you can use my drafts on DIY Human Ancestry analysis (viz. Plink/Eigensoft, PCA, or ADMIXTURE) to investigate the data further in your own computer.
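
As a rough idea of what the PCA step does conceptually, here is a minimal sketch in Python with NumPy, on synthetic data. It is not Eigensoft's smartpca: it assumes complete genotypes coded 0/1/2, applies no allele-frequency normalization, and does not handle missing data. The function names `fit_pca` and `project` are my own, hypothetical helpers. The point is only that the axes are computed from the reference (modern) individuals, and other samples are then projected onto them.

```python
import numpy as np


def fit_pca(reference, n_components=2):
    """Compute principal axes from a reference genotype matrix
    (rows = individuals, columns = SNPs coded 0/1/2)."""
    mean = reference.mean(axis=0)
    centered = reference - mean
    # Rows of vt are the principal axes in SNP space.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]


def project(samples, mean, axes):
    """Project samples onto axes computed from the reference only."""
    return (samples - mean) @ axes.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_snps = 500
    # Two synthetic 'modern' populations with different allele frequencies.
    freq_a = rng.uniform(0.1, 0.9, n_snps)
    freq_b = np.clip(freq_a + rng.normal(0, 0.2, n_snps), 0.05, 0.95)
    pop_a = rng.binomial(2, freq_a, size=(40, n_snps))
    pop_b = rng.binomial(2, freq_b, size=(40, n_snps))
    reference = np.vstack([pop_a, pop_b]).astype(float)

    mean, axes = fit_pca(reference)
    # A synthetic 'ancient' sample drawn from population A's frequencies.
    ancient = rng.binomial(2, freq_a, size=(1, n_snps)).astype(float)
    coords = project(ancient, mean, axes)
    print("PC1, PC2 of projected sample:", coords[0])
```

Because the projected sample does not contribute to the axes, where it lands depends entirely on which reference populations were chosen, which is one more reason to read such plots with caution.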

Image modified by me, from Olalde et al (2018). PCA of 999 Eurasian individuals. Marked is the late CWC outlier sample from Esperstedt, showing how early East Bell Beaker samples are the closest to Yamna samples.

I don’t have time to analyze the samples in detail right now, but in short they seem to convey the same information as before: in Olalde et al. (2018) the pattern of Y-DNA haplogroup and steppe ancestry distribution is overwhelming, with an all-R1b-L23 Bell Beaker people accompanying steppe ancestry into western Europe.

EDIT: In Mathieson et al. (2018), a sample classified as of Ukraine_Eneolithic from Dereivka ca. 2890-2696 BC is of R1b1a1a2a2-Z2103 subclade, so Western Yamna during the migrations was also of R1b-L23 subclades, in contrast with the previous R1a lineages in Ukraine. In Olalde et al. (2018), it is clearly stated that of the four BB individuals with higher steppe ancestry, the two with higher coverage could be classified as of R1b-S116/P312 subclades.

This is compatible with the expansion of Indo-European-speaking Yamna migrants (also mainly of R1b-L23 subclades) into the East Bell Beaker group, as described in detail in Archaeology (and with the population movement we are seeing having been predicted) first by Volker Heyd in 2007.

Yamna – East Bell Beaker migration 3000-2300 BC. Adapted from Harrison and Heyd (2007), Heyd (2007)

Also, the resurgence of R1a-Z645 subclades in Czech and Polish lands (from previous Corded Ware migrants) – accompanying other lineages indigenous to the region – seems to have happened only after the Bell Beaker expansion into these territories, during the Bronze Age, probably leading to the formation of the Balto-Slavic community, as I predicted based on previous papers. The fact that a sample of R1b-U106 subclade pops up in this territory is interesting from the point of view of a shared substrate with Germanic, as is the earlier BB sample of R1b-Z2103 for its connection with Graeco-Aryan dialects.

All this suggests that a North-West Indo-European dialect – ancestor of Italo-Celtic, Germanic, and Balto-Slavic, and supported in Linguistics by most modern Indo-European schools of thought – expanded roughly along the Danube, and later to northern, eastern, and western Europe with the Bell Beaker expansion, as supported in Anthropology by Mallory (in Celtic from the West 2, 2013), and by Prescott for the development of a Nordic or Pre-Germanic language in Scandinavia since 1995.

Diachronic map of Late Copper Age migrations including Classical Bell Beaker (east group) expansion from central Europe ca. 2600-2250 BC

Maybe more importantly, the fact that only Indo-Iranian-speaking Sintashta-Petrovka (and later Andronovo) cultures were clearly associated with R1a-Z645 subclades, and rather late – after mixing with early Chalcolithic North Caspian steppe groups (mainly East Yamna and Poltavka herders of R1b-L23 subclades) – gives support to the theory that Corded Ware (and probably the earlier Sredni Stog) groups did not speak or spread Indo-European languages with their migration, but most likely Uralic – as seen in recent papers on the much later arrival of haplogroup N1c – (compatible with the Corded Ware substrate hypothesis), adopting Indo-Iranian by way of cultural diffusion or founder effect events.

As Sheldon Cooper would say,

Under normal circumstances I’d say I told you so. But, as I have told you so with such vehemence and frequency already the phrase has lost all meaning. Therefore, I will be replacing it with the phrase, I informed you thusly

I informed you thusly:

Corded Ware culture contacts in the Baltic Sea region linked to immigrant potters


Article behind paywall Tracing grog and pots to reveal neolithic Corded Ware Culture contacts in the Baltic Sea region (SEM-EDS, PIXE) by Larsson et al., J. Archaeol. Sci. (2018) 91:77-91.

Abstract (emphasis mine):

The Neolithic Corded Ware Culture (CWC) complex spread across the Baltic Sea region ca. 2900/2800–2300/2000 BCE. Whether this cultural adaptation was driven by migration or diffusion remains widely debated. To gather evidence for contact and movement in the CWC material culture, grog-tempered CWC pots from 24 archaeological sites in southern Baltoscandia (Estonia and the southern regions of Finland and Sweden) were sampled for geochemical and micro-structural analyses. Scanning electron microscopy with energy dispersive spectrometry (SEM-EDS) and particle-induced X-ray emission (PIXE) were used for geochemical discrimination of the ceramic fabrics to identify regional CWC pottery-manufacturing traditions and ceramic exchange. Major and minor element concentrations in the ceramic body matrices of 163 individual vessels and grog temper (crushed pottery) present in the ceramic fabrics were measured by SEM-EDS. Furthermore, the high-sensitivity PIXE technique was applied for group confirmation. The combined pot and grog matrix data reveal eight geochemical clusters. At least five geochemical groups appeared to be associated with specific find locations and regional manufacturing traditions. The results indicated complex inter-site and cross-Baltic Sea pottery exchange patterns, which became more defined through the grog data, i.e., the previous generations of pots. The CWC pottery exhibited high technological standards at these latitudes, which, together with the identified exchange patterns and the existing evidence of mobility based on human remains elsewhere in the CWC complex, is indicative of the relocation of skilled potters, possibly through exogamy. An analytical protocol for the geochemical discrimination of grog-tempered pottery, and its challenges and possibilities, is presented.

Neolithic Corded Ware Culture sites studied in the Baltic Sea region

We are seeing a growing complexity for the definition of the Corded Ware culture in anthropological models that help us understand genomic data, including its precise origins and expansion, and indeed for the question of the expansion of ancient Uralic languages in the region.


Genetic prehistory of the Baltic Sea region and Y-DNA: Corded Ware and R1a-Z645, Bronze Age and N1c


Open Access The genetic prehistory of the Baltic Sea region, by Mittnik et al., Nature Communications 9: 442 (2018), based on the preprint The Genetic History of Northern Europe, at bioRxiv.

As you can see, it follows my predictions in terms of haplogroups, and sadly the same trend to substitute ‘Yamna’ for ‘steppe’ while keeping linguistic interpretations unchanged…

Important excerpts for the Indo-European question (emphasis mine):

Mesolithic to Neolithic

In the archaeological understanding, the transition from Mesolithic to Neolithic in the Eastern Baltic region does not coincide with a large-scale population turnover and a stark shift in economy as seen in Central and Southern Europe. Rather, it is signified by a change in networks of contacts and the use of pottery, among other material, cultural and economic changes. Our results suggest continued admixture between groups in the south of the Eastern Baltic region, who are more closely related to WHG, and northern or eastern groups, more closely related to EHG. Neolithic social networks from the Eastern Baltic to the River Volga could also explain similarities of the hunter-gatherer pottery styles, although morphologically analogous ceramics might also have developed independently due to similar functionality. The genetic evidence for a change in networks and possibly even a large-scale population movement is most pronounced in the Middle Neolithic in individuals attributed to the CCC. The distribution of this culture overlaps in the north with the Narva culture and extends further north to Finland and Karelia. Its spread in the Eastern Baltic is linked with a significant change in imported raw materials, artefacts, and the appearance of village-like settlements [15].

Neolithic to Chalcolithic

We see a further population movement into the regions surrounding the Baltic Sea with the CWC in the Late Neolithic that was accompanied by the first evidence of extensive animal husbandry in the Eastern Baltic. The presence of ancestry from the Pontic-Caspian Steppe among Baltic CWC individuals without the genetic component from north-western Anatolian Neolithic farmers must be due to a direct migration of steppe pastoralists that did not pick up this ancestry in Central Europe. It suggests import of the new economy by an incoming steppe-like population independent of the agricultural societies that were already established to the south and west of the Baltic Sea. The presence of direct contacts to the steppe could lend support to a linguistic model that sees an early branching of Balto-Slavic from a Proto-Indo-European language, for which the west Eurasian steppe was proposed as a homeland. However, as farmer ancestry is found in later Eastern Baltic individuals, it is likely that considerable individual mobility and a network of contact throughout the range of the CWC facilitated its spread eastward, possibly through exogamous marriage practices. Conversely, the appearance of mitochondrial haplogroup U4 in the Central European Late Neolithic after millennia of absence could indicate female gene-flow from the Eastern Baltic, where this haplogroup was present at high frequency.

PCA and ADMIXTURE analysis reflecting Late Neolithic in Northern European prehistory. a Principal components analysis of 1012 present-day West Eurasians (grey points, modern Baltic populations in dark grey) with 294 projected published ancient and ancient North European samples introduced in this study (marked with a red outline). b Ancestral components in ancient individuals estimated by ADMIXTURE (k = 11)
Zoomed-in version of the European Late Neolithic PCA.

So we see that no farmer ancestry is found in the Baltic (unlike in Western Yamna), and that the Late Neolithic samples lie closer in the PCA to Corded Ware samples from Europe (or to earlier samples from the region) than to Yamna, contrary to what the Zvejnieki individual suggested at first.

There obviously was exogamy – which may in fact explain the PCA positions close to Yamna (like that of the Zvejnieki sample), although the researchers overlook that.

Also, as expected, there is no R1b-M269 in the Baltic during the Corded Ware period: most samples are R1a, with the majority showing subclade R1a-Z645 (the others have poor SNP coverage), which supports the reduction in haplogroup diversity to this very subclade during the expansion of Corded Ware peoples, as I predicted.

Bronze Age

Local foraging societies were, however, not completely replaced and contributed a substantial proportion to the ancestry of Eastern Baltic individuals of the latest LN and Bronze Age. This ‘resurgence’ of hunter-gatherer ancestry in the local population through admixture between foraging and farming groups recalls the same phenomenon observed in the European Middle Neolithic and is responsible for the unique genetic signature of modern-day Eastern Baltic populations.

We suggest that the Siberian and East Asian related ancestry in Estonia, and Y-haplogroup N in north-eastern Europe, where it is widespread today, arrived there after the Bronze Age, ca. 500 calBCE, as we detect neither in our Bronze Age samples from Lithuania and Latvia. As Uralic speaking populations of the Volga-Ural region show high frequencies of haplogroup N, a connection was proposed with the spread of Uralic language speakers from the east that contributed to the male gene pool of Eastern Baltic populations and left linguistic descendants in the Finno-Ugric languages Finnish and Estonian. A potential future direction of research is the identification of the proximate population that contributed to the arrival of this eastern ancestry into Northern Europe.

I predicted that haplogroup N probably arrived in the region west of the Urals with the Seima-Turbino phenomenon, and that it expanded quite late, probably through founder effects. A late arrival in the region obviously leaves (save for these researchers and others working with old ideas) only the Corded Ware culture (represented by steppe admixture and mainly haplogroup R1a-Z645) as the vector of expansion of Uralic languages, which obviously show a dialectalization process and regional expansion much older than 500 BC…

It is funny to see how people keep trying to identify R1a with ‘Yamnaya’, now ‘steppe’, but always Indo-European (an ethnolinguistic term, mind you), supposedly because of the ‘Yamnaya’ (now ‘steppe’) admixture, while the only ‘mark’ of Uralic languages for the same researchers, in the same paper using this very concept, is nevertheless, paradoxically, haplogroup N, with an assumption explicitly based on prevalence in modern populations.

This admixture vs. haplogroup question for language and culture identification in genetic papers is really getting messed up with new data, now in a contortionist-like way…

Images and text: Content of the paper is licensed under CC-by 4.0.

See also:

Differences in ADMIXTURE between Khvalynsk/Yamna and Sredni Stog/Corded Ware


Looking for differences among steppe cultures in Genomics is like looking for a needle in a haystack.

It means, after all, looking for differences among closely related cultures, such as between South-Western and North-Western Anatolian Neolithic cultures, or among Old European cultures (such as Vinča or Cucuteni–Trypillia), or between Iberian cultures after the arrival of steppe-related populations.

These differences between closely related regions, in all these cases and especially among steppe cultures, even when they are supported by Archaeology and anthropological models of migration (and compatible with linguistic models), are expected to be minimal.

Fortunately, we have phylogeography, which helps us point in the right direction when assessing potential migrations using genomic data.

User Tomenable recently pointed out a curious finding on Anthrogenica, from data available in Mathieson et al. (2017): in ADMIXTURE results with K=12, a different ancestral component (in light green in the paper, see below) is traceable from the North Caspian steppe since the Neolithic. This is also partially distinguishable at K=10 and K=11, although it does not differentiate so clearly among later cultures.

NOTE: Read more on the controversy regarding the ideal number of ancestral populations, the absurd use of ADMIXTURE to solve language questions, and the meaning of cross-validation (CV) values.
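As a rough illustration of how CV values are usually read: when ADMIXTURE is run with its `--cv` flag, it prints one `CV error (K=…): …` line per run, and the K with the lowest value is conventionally treated as the best-supported one. The sketch below assumes that log format. The numbers in `example_log` are invented for illustration and are not the values from Mathieson et al. (2017); the function names are hypothetical too.

```python
import re

# Matches lines such as "CV error (K=10): 0.48512".
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9.]+)")

def parse_cv_errors(log_text):
    """Return {K: cv_error} for every CV line found in the log text."""
    return {int(k): float(err) for k, err in CV_LINE.findall(log_text)}

def best_k(cv_errors):
    """Return the K with the lowest cross-validation error."""
    return min(cv_errors, key=cv_errors.get)

# Hypothetical log excerpt (made-up numbers, for illustration only).
example_log = """
CV error (K=10): 0.48512
CV error (K=11): 0.48497
CV error (K=12): 0.48530
"""

cv = parse_cv_errors(example_log)
print(best_k(cv))
```

When adjacent K values differ only in the fourth decimal, as in this invented example, the minimum is a weak basis for preferring one K over another.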

Unsupervised ADMIXTURE plot from k=10 to 12, on a dataset consisting of 1099 present-day individuals and 476 ancient individuals. We show newly reported ancient individuals and some previously published individuals for comparison.

Explanations for this finding might include, as the user points out, a greater contribution of CHG ancestry in the eastern steppe cultures (Khvalynsk/Yamna) compared to the North Pontic steppe (Sredni Stog/Corded Ware), which is probably one of the main genomic differences between both cultures, as I pointed out in the Indo-European demic diffusion model (see accounts on the origins of Khvalynsk and Sredni Stog populations and on contacts between Yamna and the Caucasus, and see below also my sketch of Eurasian genomic history).

Interesting is also the appearance of similar ancestral components later in Vučedol – which probably received admixture from Yamna settlers (see admixture components in West Yamna samples and in the Yamna settler from Bulgaria) – and later still in the Balkans.

On the other hand, previous ancestral components in outliers from the Balkans seem to be more similar to Sredni Stog samples, giving still more strength to the hypothesis that this common (“steppe”) component expanded westward within the Pontic-Caspian steppe with the spread of Suvorovo-Novodanilovka chiefs.

Problems with this interpretation include:

1) The scarce samples available, the different cultures included, and the CV values of the K populations selected in ADMIXTURE.

2) The lack of data for comparison with Bell Beaker peoples (from Olalde et al. 2017).

3) The sample classified as Latvia_LN/CWC has this component. I have already said before that, given the differences with all other Corded Ware samples, this quite early sample might be an outlier, with a Khvalynsk/Yamna population connected directly to the ancestors of this individual, possibly through exogamy (as is clear from my sketch below). Whether or not this is an outlier among CWC populations in the Baltic, only future samples can tell.

4) Three later individuals from Corded Ware in Germany have the component, in a minimal amount. I would bet – judging by their position in the graphic – that this might be explained through the Esperstedt family. These individuals might have in turn got the contribution directly from the oldest member, who shows what seems (in PCA) like a recent admixture from contemporary steppe cultures (such as the Catacomb culture).

NOTE: See my graphics with interesting members of the Esperstedt family marked: ADMIXTURE and PCA (outlier).

Tentative sketch modelling the genetic history of Europe and West Eurasia from ancient populations up to the Neolithic, according to results in recent genetic papers and archaeological models of known migrations.

Again, needle in a haystack… And confirmation bias by me, indeed.

But interesting nonetheless.

EDIT (4 JAN 2017): A reader points out that the interpretation of Unsupervised ADMIXTURE should work backwards (i.e. different contributions into different modern populations), and not be based solely on ancestral populations, which is probably right. So again, confirmation bias (and potentially wrong direction fallacy) by me…


Genomic history of Northern Eurasians includes East-West and North-South gradients


Open Access article on modern populations (including ancient samples), Between Lake Baikal and the Baltic Sea: genomic history of the gateway to Europe, by Triska et al., BMC Genetics 18(Suppl 1):110, 2017.


The history of human populations occupying the plains and mountain ridges separating Europe from Asia has been eventful, as these natural obstacles were crossed westward by multiple waves of Turkic and Uralic-speaking migrants as well as eastward by Europeans. Unfortunately, the material records of history of this region are not dense enough to reconstruct details of population history. These considerations stimulate growing interest to obtain a genetic picture of the demographic history of migrations and admixture in Northern Eurasia.

We genotyped and analyzed 1076 individuals from 30 populations with geographical coverage spanning from Baltic Sea to Baikal Lake. Our dense sampling allowed us to describe in detail the population structure, provide insight into genomic history of numerous European and Asian populations, and significantly increase quantity of genetic data available for modern populations in region of North Eurasia. Our study doubles the amount of genome-wide profiles available for this region.

We detected unusually high amount of shared identical-by-descent (IBD) genomic segments between several Siberian populations, such as Khanty and Ket, providing evidence of genetic relatedness across vast geographic distances and between speakers of different language families. Additionally, we observed excessive IBD sharing between Khanty and Bashkir, a group of Turkic speakers from Southern Urals region. While adding some weight to the “Finno-Ugric” origin of Bashkir, our studies highlighted that the Bashkir genepool lacks the main “core”, being a multi-layered amalgamation of Turkic, Ugric, Finnish and Indo-European contributions, which points at intricacy of genetic interface between Turkic and Uralic populations. Comparison of the genetic structure of Siberian ethnicities and the geography of the region they inhabit point at existence of the “Great Siberian Vortex” directing genetic exchanges in populations across the Siberian part of Asia.

f3 values to estimate (a) Eastern European Hunter-Gatherer, (b) Neolithic Farmer, (c) Caucasus hunter-gatherer, and (d) Mal’ta (Ancient North Eurasian) ancestry in modern humans
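For readers unfamiliar with the statistic in this figure: f3(T; A, B) is, at its core, the average over SNPs of the product (t − a)(t − b), where t, a and b are allele frequencies in the three populations. In its "outgroup" form, with T an outgroup, a larger value means more genetic drift shared by A and B. The caption does not say which form the authors used, so the sketch below only illustrates the quantity itself. It is a naive estimator without the finite-sample correction that real tools apply, and the function name and all frequencies are invented for illustration.

```python
def f3_naive(target, a, b):
    """Naive f3(target; a, b): mean of (t - a) * (t - b) over SNPs.

    Inputs are equal-length sequences of allele frequencies in [0, 1].
    No finite-sample bias correction is applied (an intentional simplification).
    """
    if not (len(target) == len(a) == len(b)) or not target:
        raise ValueError("need equal-length, non-empty frequency lists")
    return sum((t - x) * (t - y) for t, x, y in zip(target, a, b)) / len(target)

# Invented allele frequencies at four SNPs (hypothetical populations).
outgroup = [0.0, 1.0, 0.0, 1.0]
pop_x = [0.5, 0.5, 0.4, 0.6]
pop_y = [0.5, 0.5, 0.5, 0.5]
pop_z = [0.1, 0.9, 0.0, 1.0]

shared_xy = f3_naive(outgroup, pop_x, pop_y)
shared_xz = f3_naive(outgroup, pop_x, pop_z)
print(shared_xy > shared_xz)
```

In this toy data, pop_x shares more drift (relative to the outgroup) with pop_y than with pop_z, which is the kind of comparison such maps visualise.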

Slavic speakers of Eastern Europe are, in general, very similar in their genetic composition. Ukrainians, Belarusians and Russians have almost identical proportions of Caucasus and Northern European components and have virtually no Asian influence. We capitalized on wide geographic span of our sampling to address intriguing question about the place of origin of Russian Starovers, an enigmatic Eastern Orthodox Old Believers religious group relocated to Siberia in seventeenth century. A comparative reAdmix analysis, complemented by IBD sharing, placed their roots in the region of the Northern European Plain, occupied by North Russians and Finno-Ugric Komi and Karelian people. Russians from Novosibirsk and Russian Starover exhibit ancestral proportions close to that of European Eastern Slavs, however, they also include between five to 10 % of Central Siberian ancestry, not present at this level in their European counterparts.

Admixture proportions in studied populations, K = 6. Populations from the Extended dataset. Abbreviated population codes: NSK – Russians from Novosibirsk; STV – Starover Russians; ARK – Bashkirs from Arkhangelskiy district; BRZ – Bashkirs from Burzyansky district

Our project has patched the hole in the genetic map of Eurasia: we demonstrated complexity of genetic structure of Northern Eurasians, existence of East-West and North-South genetic gradients, and assessed different inputs of ancient populations into modern populations.

Featured image, from the article: “Departures from the expected IBD. Shown populations exceed the expected IBD sharing by more than two standard deviations.”
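The "more than two standard deviations" criterion in that caption can be illustrated with a small sketch. The excerpt does not say how the article computes the expected IBD sharing, so the code below swaps in the simplest possible stand-in: the mean across all population pairs. That is an assumption of mine, not the authors' model; the function name, population labels and values are all invented.

```python
from statistics import mean, stdev

def excess_sharing(ibd_by_pair, n_sd=2.0):
    """Return pairs whose IBD sharing exceeds mean + n_sd * SD of all pairs.

    The mean over pairs is used as the 'expected' value here, which is an
    assumption for illustration and not the expectation used in the article.
    """
    values = list(ibd_by_pair.values())
    threshold = mean(values) + n_sd * stdev(values)
    return sorted(pair for pair, v in ibd_by_pair.items() if v > threshold)

# Invented IBD sharing values for pairs of hypothetical populations A-E.
pairs = {
    ("A", "B"): 1.0, ("A", "C"): 1.2, ("A", "D"): 0.9, ("A", "E"): 1.0,
    ("B", "C"): 1.1, ("B", "D"): 1.0, ("B", "E"): 1.1,
    ("C", "D"): 0.8, ("C", "E"): 0.9, ("D", "E"): 6.0,
}

print(excess_sharing(pairs))
```

Note that with very few pairs no value can exceed such a threshold at all, so this kind of flagging only makes sense with dense sampling like that described in the abstract.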


More evidence on the recent arrival of haplogroup N and gradual replacement of R1a lineages in North-Eastern Europe


A new article (in Russian), Kinship Analysis of Human Remains from the Sargat Mounds, Baraba forest-steppe, Western Siberia, by Pilipenko et al. Археология, этнография и антропология Евразии Том 45 № 4 2017, downloadable at ResearchGate.


We present the results of a paleogenetic analysis of nine individuals from two Early Iron Age mounds in the Baraba forest-steppe, associated with the Sargat culture (five from Pogorelka-2 mound 8, and four from Vengerovo-6 mound 1). Four systems of genetic markers were analyzed: mitochondrial DNA, the polymorphic part of the amelogenin gene, autosomal STR-loci, and those of the Y-chromosome. Complete or partial data, obtained for eight of the nine individuals, were subjected to kinship analysis. No direct relatives of the “parent-child” type were detected. However, the data indicate close paternal and maternal kinship among certain individuals. This was evidently one of the reasons why certain individuals were buried under a single mound. Paternal kinship appears to have been of greater importance. The diversity of mtDNA and Y-chromosome lineages among individuals from one and the same mound suggests that kinship was not the only motive behind burying the deceased people jointly. The presence of very similar, though not identical, variants of the Y chromosome in different burial grounds may indicate the existence of groups such as clans, consisting of paternally related males. Our conclusions need further confirmation and detailed elaboration. Keywords: Paleogenetics, ancient DNA, kinship analysis, mitochondrial DNA, uniparental genetic markers, STR-loci, Y-chromosome, Baraba forest-steppe, Sargat culture, Early Iron Age.
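The distinction the abstract draws between identical and "very similar, though not identical" Y-chromosome variants can be made concrete by counting the STR loci at which two haplotypes differ. The following is a minimal sketch of that comparison, not the authors' method: the locus names are common Y-STR markers, but the repeat counts, individuals and function name are invented for illustration.

```python
def str_differences(hap_a, hap_b):
    """Return the loci typed in both haplotypes at which the repeat counts differ.

    Haplotypes are dicts mapping locus name -> repeat count. Loci missing
    from either profile (partial data) are skipped rather than counted.
    """
    shared = sorted(set(hap_a) & set(hap_b))
    return [locus for locus in shared if hap_a[locus] != hap_b[locus]]

# Invented Y-STR profiles for three hypothetical individuals.
ind_1 = {"DYS19": 14, "DYS390": 23, "DYS391": 11, "DYS393": 14}
ind_2 = {"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS393": 14}
ind_3 = {"DYS19": 14, "DYS390": 23, "DYS391": 11}  # partial profile

print(str_differences(ind_1, ind_2))  # differ at a single locus
print(str_differences(ind_1, ind_3))  # identical at all shared loci
```

Skipping untyped loci matters with ancient DNA, where the abstract itself notes that only complete or partial data were obtained; a match over few shared loci is correspondingly weaker evidence.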

From the older study of the same region (Baraba, numbered 4): “Location of ancient human groups with a high frequency of mtDNA haplogroups U5, U4 and U2e lineages. The area of Northern Eurasian anthropological formation is marked by yellow region on the map (References: 1. Bramanti et al., 2009; 2. Malmstrom et al., 2009; 3. Krause et al., 2010; 4. this study)”

Chronological time scale of Bronze Age Cultures from the Baraba region

This is the same team that brought an ancient mtDNA study of different cultures within the Baraba forest-steppe region (from the Open Access book Population Dynamics in Prehistory and Early History).

The Baraba forest-steppe is a region between the Ob and Irtysh rivers (about 800 km from west to east), stretching over 200 km from the taiga zone in the north to the steppes in the south.

The new study brings a more recent picture of the region, from the Iron Age Sargat culture, ca. 500 BC – 500 AD, with five samples of haplogroup N and two samples of haplogroup R1a.

R1a lineages in the region probably derive from the previous expansion of Andronovo and related cultures, which had absorbed North Caspian steppe populations and their Late Indo-European culture.

N subclades prevalent in certain modern Eurasian populations are probably derived from the expansion of the Seima-Turbino phenomenon.

While samples are scarce, Y-DNA data keeps showing the same picture I have spoken about more than once:

N subclades (whose carriers potentially spoke Proto-Yukaghir languages originally) gradually replacing haplogroup R1a (whose carriers probably spoke Uralic languages originally), probably through successive founder effects (such as the bottlenecks found in Finland), which left their Uralic culture and ethnolinguistic identification intact.

Therefore, late Corded Ware groups of North-Eastern Europe (in the Forest Zone and the Baltic), mainly of R1a-Z645 subclades, probably never adopted Late Indo-European languages.


Prehistoric loan relations: Foreign elements in the Proto-Indo-European vocabulary


An interesting ongoing web project, Prehistoric loan relations, on potential loans of Proto-Indo-European words, from Uralic-Yukaghir, Caucasian, and Middle Eastern influence.

Based on a master’s thesis by Bjørn (2017), Foreign elements in the Proto-Indo-European vocabulary (PDF).

From the website (emphasis mine):

This page allows historical linguists to compare and scrutinize proposed prehistoric lexical borrowings from the perspective of Proto-Indo-European. The first entries are all (135 in total) extracted from my master’s thesis “Foreign elements in the Proto-Indo-European vocabulary” (Bjørn 2017). Comments are encouraged at the bottom of each entry. New entries will be added, also on request.

Take this not as the conclusion, but an invitation to join the conversation.

So, we welcome the invitation, and hope that this new project thrives.

Also, I loved his fantasy-like map of the central Eurasian region (featured image on this post).