Happy new year 2019…and enjoy our new books!


Sorry for the last weeks of silence; I have been rather busy lately. I have more projects going on, and (because of that) I also wanted to finish a project I have been working on for many months already.

I have therefore decided to publish a provisional version of the text, in the hope that it will be useful in the following months, when I won’t be able to update it as often as I would like to:

A Song of Sheep and Horses

Don’t forget to check out the maps included in the supplementary materials (I have added Y-DNA, mtDNA, and ADMIXTURE data using GIS software).

NOTE. Right now the files are only on my server. I will try to upload them to Academia.edu and ResearchGate when I have time, in case the websites are too slow.

I would have preferred to wait for a thorough revision of the section on archaeology and the linguistic sections on Uralic, but I doubt I will have time when the reviews come, so it was either now or maybe next December…

I say so in the introduction, but it is evident that certain aspects of the book are tentative to say the least: the farther back we go from Late Proto-Indo-European, the less clear many aspects become. Also, linguistically I am not convinced about Eurasiatic or Nostratic, although they do have a certain interest when we try to offer a comprehensive view of the past, including ethnolinguistic identities.

I cannot be an expert in everything, and these books cover a lot. I am bound to publish many corrections as new information appears and more reviews are sent. For example, just days ago (before SNP calls of Wang et al. 2018 were published) some paragraphs implied that AME might have expanded Nostratic from the Middle East. Now it does not seem so, and I changed them just before uploading the text. That’s how tentative certain routes are, and how much all of this may change. And that only if we accept a Nostratic phylum…

NOTE. Since the first book I wrote was the linguistic one, and I have spent the last months updating the archaeology + genetics part, now many of you will probably understand 1) why I am so convinced about certain language relationships and 2) how I used many posts to clarify certain ideas and receive comments. Many posts probably offer a good timeline of what I worked with, and when.


I did not add this section to the books, because they are still not ready for print, but I think this is due somewhere now. It is impossible to reference all who have directly or indirectly contributed to this, so this is a list of those I feel have played an important role.

I am indebted to the following people (which does not mean that they share my views, obviously):

First and foremost, to Fernando López-Menchero, for having the patience to review in detail many parts on Indo-European linguistics, knowing that I won’t accept many of his comments anyway. The additional information he offers is invaluable, but I didn’t want to turn this into a huge linguistic encyclopaedia with unending discussions of tiny details of each reconstructed word. I think it is already too big as it is.

Professor Kortlandt is still to review the text, but he contributed to both previous essays in some very interesting ways, so I hope he can help me improve the parts on Uralic, and maybe alternative accounts of expansion for Balto-Slavic, depending on the time depth that he would consider warranted according to the Temematic hypothesis.

I would not have thought about doing this if it were not for the interest of Wekwos (Xavier Delamarre) in publishing a full book about the Indo-European demic diffusion model (in the second half of 2017, I think). It was they who suggested that I extend the content, when all I had done until then was write an essay and draw some maps in my free time between depositing the PhD thesis and defending it.

Sadly, as much as I would like to publish a book with a professional publisher, I don’t think ancient DNA lends itself to the traditional format, so my requests (mainly to have free licenses and to be able to revise the text at will, as new genetic papers are published) were logically not acceptable. Also, the main aim of all volumes, especially the linguistic one, is the teaching of essentials of Late Proto-Indo-European and related languages, and this objective would be thwarted by selling each volume for $50-70 and only in printed format. I prefer a wider distribution.

At first I didn’t think much of this proposal, because I do not benefit from this kind of publication in my scientific field, but with time my interest in writing a whole, comprehensive book on the subject grew to the point where it was already an ongoing project, probably by the start of 2018.

I would not have been in contact with Wekwos if it were not for user Camulogène Rix at Anthrogenica, so thanks for that and for the interest in this work.

I would not have thought of writing this either if not for the spontaneous support (with an unexpected phone call!) of a professor of the Complutense University of Madrid, Ángel Gómez Moreno, who is interested in this subject – as is his wife, a professor of Classics more closely associated with Indo-European studies, who helped me with a search for Indo-Europeanists.

EDIT (1 JAN 2019): I remembered that Karin Bojs sent me her book after reading the demic diffusion model. I may have also thought about writing a whole book back then, but mid-2017 is probably too early for the project.

The maps are evidently (for those who are interested in genetics) in part the result of the effort of the late Jean Manco: As you can see from the maps including Y-DNA and mtDNA samples, I have benefitted from her way of organising data and publishing it. Similarly, the work of Iain McDonald in assessing the potential migration routes of R1b and R1a in Europe with the help of detailed maps was behind my idea for the first maps, and consequently behind these, too.

I should thank all the people responsible for the release of free datasets to work with, including the Reich and Jena labs, the Veeramah Lab, and also researchers from the Max Planck Institute or the Mainz Palaeogenetics group, who didn’t mind sharing datasets with me.

Readers of this blog with interesting comments have also been essential for the improvement of the texts. You can probably see some of your many contributions there. I may not answer many comments, because I am always busy (and sometimes I just don’t have anything interesting to say), but I try to read all of them.

EDIT (1 JAN 2019) I think I should mention at least Chetan, Egg, or Robert George; but then I would leave out old europe, Sgr Ganesh, or Tileman Ehlen; and if I include them I would leave out others…

I should also thank users of other sites, like Anthrogenica, whose particular points of view and deep knowledge of some very specific aspects are sometimes very useful. In particular, user Anglesqueville helped me to fix some issues with the merging of datasets to obtain the PCAs and ADMIXTURE, and prepared some individual samples to merge them.

Even when I have not posted anything, Google Analytics keeps sending me messages about increasing user fidelity (returning users), and stats haven’t really changed (which probably means more people are reading old posts), so thank you for that.

I hope you enjoy the books.

Happy new year!

Long-term matrilineal continuity in a nonisolated region of Tuscany


New paper (behind paywall) The female ancestor’s tale: Long‐term matrilineal continuity in a nonisolated region of Tuscany, by Leonardi et al. Am J Phys Anthr (2018).

EDIT (10 SEP 2018): The main author has shared an open access link to read the PDF.

Interesting excerpts:

Here we analyze North-western Tuscany, a region that was a corridor of exchanges between Central Italy and the Western Mediterranean coast.

We newly obtained mitochondrial HVRI sequences from 28 individuals, and after gathering published data, we collected genetic information for 119 individuals from the region. Those span five periods during the last 5,000 years: Prehistory, Etruscan age, Roman age, Renaissance, and Present-day. We used serial coalescent simulations in an approximate Bayesian computation framework to test for continuity between the mentioned groups.

In all cases, a simple model of a long-term genealogical continuity proved to fit the data better, and sometimes much better, than the alternative hypothesis of discontinuity.

The low number of samples analyzed requires some caution in the interpretation. Because we did not test for gene flow, it is at this stage impossible to reject it, but our results suggest at least significant levels of genealogical continuity. Moreover, as it has not been possible to obtain more precise information on the age of the Eneolithic samples, they were grouped together considering the average archaeological period of interest, which may cause a bias in the analyses. (…)
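As an aside to the excerpt above, the general logic of comparing a continuity and a discontinuity model by approximate Bayesian computation can be illustrated with a toy rejection sampler. This is only a minimal sketch under strong assumptions: the simulator below is a made-up stand-in, not the serial coalescent simulations used by Leonardi et al., and every name, parameter, and summary statistic in it (`simulate_shared_fraction`, `abc_model_choice`, the haplotype counts, the mutation rate, the tolerance) is hypothetical.

```python
import random


def simulate_shared_fraction(model, rng, n_haplotypes=200, sample_size=30,
                             mutation_rate=0.1):
    """Toy simulator (hypothetical, NOT a coalescent model).

    Returns the fraction of 'modern' individuals whose haplotype is also
    present in the 'ancient' sample.
    """
    ancient = [rng.randrange(n_haplotypes) for _ in range(sample_size)]
    modern = []
    for i in range(sample_size):
        if model == "continuity":
            # Descendants copy an ancient haplotype and occasionally mutate
            # into a new one (ids >= n_haplotypes never occur in 'ancient').
            haplotype = rng.choice(ancient)
            if rng.random() < mutation_rate:
                haplotype = n_haplotypes + i
        else:
            # Discontinuity: the modern sample is an unrelated draw.
            haplotype = rng.randrange(n_haplotypes)
        modern.append(haplotype)
    ancient_set = set(ancient)
    return sum(h in ancient_set for h in modern) / sample_size


def abc_model_choice(observed, n_sims=2000, tolerance=0.1, seed=0):
    """Rejection ABC: keep simulations whose summary statistic is close to
    the observed one, then report each model's share of accepted runs."""
    rng = random.Random(seed)
    models = ("continuity", "discontinuity")
    accepted = {m: 0 for m in models}
    for _ in range(n_sims):
        model = rng.choice(models)  # equal prior weight for both models
        simulated = simulate_shared_fraction(model, rng)
        if abs(simulated - observed) <= tolerance:
            accepted[model] += 1
    total = sum(accepted.values())
    if total == 0:
        return None  # nothing accepted: tolerance too strict for this data
    return {m: count / total for m, count in accepted.items()}


# Invented observation: 90% of modern haplotypes also seen in the ancient sample.
print(abc_model_choice(observed=0.9))
```

The share of accepted simulations per model plays the role of an approximate posterior model probability. A real analysis would replace the toy simulator with an explicit demographic model and use several summary statistics.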

Geographic location of the samples considered in this work

(…) clearly, our samples show high levels of continuity when considering the whole Tuscan region as a genetic reservoir during the Iron Age.

The posterior distributions of the parameters confirm a high degree of genetic isolation in the sampled population, with very small values for the female effective population sizes across time. Such values, in particular the Neolithic ones, are in accord with the estimates obtained in similar studies, both in Tuscany (Ghirotto et al., 2013) and in France (Rivollat et al., 2017).


Taken at their face value, our results do not show any major shift in the composition of the maternal ancestry of the population, across 50 centuries. This does not mean that no demographic process of relevance has affected the population, and indeed the higher diversity accumulating in time is the likely consequence of immigrating people, enriching the mitochondrial gene pool.

(…) the population of the current Lucca province appears to have retained very ancient mitochondrial features, despite occupying a geographical corridor between the Ligurian and the Tyrrhenian coast, and despite not showing the persistence of unique cultural traits through the centuries.


Another possibility is that the different populations passing through the area (Etruscans, Romans, and Lombards) had a consistent social and/or sex bias. An example of similar patterns has been observed several times. Between the Late Neolithic and the Early Bronze Age, female exogamy in patrilocal society has been observed in Southern Germany (Knipper et al., 2017); during the Bronze Age the migrations toward Europe from the steppes appears to have consisted prevalently of males (Goldberg, Günther, Rosenberg, & Jakobsson, 2017); and in more recent periods in the Canary Islands, the female ancestry maintains a significant amount of autochthonous lineages, while the male ancestry was strongly influenced by the European colonization (Fregel et al., 2009, b).

It is well known that military invasions may not have a significant genetic impact upon the invaded population (Schiffels et al., 2016; Sokal, Oden, Walker, Di Giovanni, & Thomson, 1996; Weale, Weiss, Jager, Bradman, & Thomas, 2002), especially at the mitochondrial level, because of the limited size of a sustainable army, and of the fact that armies are generally composed mostly or only of males. Even if a substantial share of invaders decided to remain and settle the region, this form of gene flow would affect mostly or only the paternal lineages, rather than the maternal ones. We can also hypothesize the immigration of a number of people (e.g., Romans, Lombards) that may have acted as ruler of the region, remaining socially (and so genetically) separated by the local population, and leaving few (if any) traces in the gene pools of the local population.

Supporting Information, Table S1 New ancient samples genotyped

We expect to see that certain migrations since the Iron Age – like the Celtic and Roman ones – were somehow different from previous ones, where, at least since the Neolithic, male-dominated expansions were the rule.

If, however, male-biased expansions are also seen during the Iron Age – probably driven by particular subclades then – this would certainly justify the continuity of admixture in certain regions in spite of these population expansions, and thus the importance of Y-DNA to track more recent language changes.

One of the most interesting details of the upcoming paper on Italic peoples will be the Y-DNA (and admixture) of Etruscans compared to other neighbouring peoples, given the known conflicting theories regarding their recent vs. older origin in the East before the historical record.


The future of the Reich Lab’s studies and interpretations of Late Indo-European migrations


Short report on advances in Genomics, and on the Reich Lab:

Some interesting details:

  • The Lab is impressive. I would never dream of having something like this at our university. I am really jealous of that working environment.
  • They are currently working on population transformations in Italy; I hope we can have at last Italic and Etruscan samples.
  • It is always worth repeating that we are all the source of multiple admixture events, many of them quite recent; and I liked the Star Wars simile.
  • Also, some names hinting at potential new samples?? Zajo-I, Chanchan, Gurulde?, Володарка (Ukraine – medieval?), Autodrom, Облевка, Кресты, Кудуксай (Ural region, palaeo-metal?), Золкут, etc.

Ancient DNA sample bag?

On the downside, they keep repeating the same “steppe ancestry” meme (in the featured image above, or the one below). I know this is the news report (i.e. science communication), not exactly the Reich Lab, but these maps didn’t appear out of the blue.

Steppe ancestry distribution in Europe, according to PBS.

Interesting for future interpretations is the whiteboard behind David Reich’s back (apparently they like to keep relevant information on whiteboards…):

Whiteboard behind David Reich’s back (at his office?).

It seems that while the Copenhagen group will still be bound (see here) by the Gimbutas/Kristiansen starting point, the Reich Lab will remain bound by Anthony’s selection of Ringe’s (2002) glottochronological model, and they will try to make genomic data fit in with it.

In fact, the whiteboard doesn’t even include Ringe’s link of Germanic with Italo-Celtic, which could maybe hint at Anthony’s recent change of heart? (i.e. Yamna Hungary -> Corded Ware). That would mean still less Linguistics (if glottochronology can be called that), and more Archaeology…

Image from Anthony & Ringe (2015). “The Proto-Indo-European homeland, with migrations outward at about 4200 BCE (1), 3300 BCE (2), and 3000 BCE (3a and 3b). A tree diagram (inset) shows the pre-Germanic split as unresolved. Modified from Anthony (2013).”

I don’t know why university labs need to do this: to select the linguistic model preferred by a single archaeologist, who happens to be the lead archaeologist of the group, and then try to make genetic data agree again and again with that model. I guess it is a strategic question, and has to do with guaranteeing continued contact with archaeological sites, and access to samples from them?

I understand none of them will try to learn ancient languages, too much work probably. But wouldn’t it have been more scientifish, at least, to start from, say, three or four reasonable potential linguistic models (that is, from Indo-Europeanists), and from there discuss the best potential fits for the current genomic data in each paper?

This is, for example, what the Heyd (archaeologist) + German/Spanish Indo-Europeanist schools would look like:

Yamnaya expansion coupled with Meid’s (1975) description of three stages of Proto-Indo-European development (as interpreted by Adrados 1998) and depiction of Heyd’s proposal of Yamna expansion.

Wouldn’t you say it could have fitted the statistical and Y-DNA data seamlessly, in contrast to Gimbutas/Trager (i.e. Kristiansen today), or to Anthony/Ringe?

NOTE. I would say the mainstream German school follows Meid’s (1975) three-stage theory coupled with Dunkel’s (e.g. 1997) nomenclature. The Spanish school follows Adrados, who has repeated ad nauseam that he was the first to mention the three-stage theory in conferences and papers previous to and coincident with Meid’s proposal (see his latest JIES article, a paper available in Scribd). In any case, Spanish and German scholars have been working hand in hand in accepting and developing a general linguistic model similar to the one above.

Archaeological theories like those of Heyd or Mallory for Yamna and Bell Beaker (in contrast to Kristiansen or Anthony), and Prescott and Walderhaug for Bell Beaker and Germanic (contrasting with Kristiansen and Iversen) are compatible with this German/Spanish model.

The French school is non-existent on the homeland matter, Italian scholars seem to be behind even in the description of Anatolian as archaic (probably related to the general wish to have Latin as derived from Vergil’s Troy), Russian scholars are still working with Nostratic and Mesolithic expansions, and Leiden, as the leading IE publisher worldwide today, is full of very different ‘divos’, each with his own pet theory (some obviously agreeing with the German/Spanish model; and especially interesting is that some of them are strong supporters of an Indo-Uralic proto-language).

The English-speaking world, on the other hand, has seen the most varied models being either proposed or translated into its language, with the most popular ones being those publicized by archaeologists (Winfred P. Lehmann being one of the noteworthy exceptions), which may explain why for some people (archaeologists or geneticists) linguistics seems more like a game. It is to be assumed that these same people haven’t taken a look at the dozens of genetic papers published to date – and hundreds of archaeological papers using a bit of linguistics to support their models – and how wrong they have all been in their interpretations, or else they would realize that genomics does (sadly) not really look like a serious discipline at all right now among most linguists, nor among many archaeologists…

Thus, instead of comparing the main theories on Proto-Indo-European (i.e. linguistics->archaeology->genetics), which would have offered the most stable framework to assess potential prehistoric ethnolinguistic identifications, they keep using a single, simplistic language tree liked by an archaeologist, and trying to fit genetic data to it, while also adapting archaeology to genetics, i.e. genetics->archaeology->linguistics; which, as you can imagine, is not going to convince any linguist.

Especially disappointing is that the world’s leading genetic lab still relies on a marginal proposal based on glottochronology, the homeopathy of linguistics… At least in that regard everyone should know better by now.

Also, they keep interacting with the wrong audience: instead of trying to engage linguists in the real homeland and dialectal quest, to keep Genomics a serious discipline among academics, they tend to discuss with politically- or racially-motivated people, which is probably also in line with strategic decisions.

In the example below, we see the main author of their recent paper on Indo-Iranian migrations seeking once again interaction, this time through “news” promoted by Hindu nationalist bigots, so that – even if that makes them look more neutral in the eyes of those who may allow access to Indian samples – in the end we see in genomics a fictitious revival of the “AIT vs. OIT debate” dead long ago in linguistics and archaeology (anywhere but in India).

Pretty disappointing to see these trends; so much effort and time invested in futile discussions and infinitely reworked doomed glottochronological or 19th-century models, when it is the fine-scale population structure of expanding Yamna peoples that we should be discussing now, and thus Late PIE dialectalisation with offshoots Afanasevo, East Bell Beaker, Balkan Bronze Age, and Sintashta/Potapovka; as well as Corded Ware evolution in Uralic-speaking territory.

EDIT (7 JUN 2018): Some parts of the text have been corrected or slightly modified.


On Latin, Turkic, and Celtic – likely stories of mixed societies and little genetic impact


Recent article on The Conversation, The Roman dead: new techniques are revealing just how diverse Roman Britain was, about the paper (behind paywall) A Novel Investigation into Migrant and Local Health-Statuses in the Past: A Case Study from Roman Britain, by Redfern et al. Bioarchaeology International (2018), among others.

Interesting excerpts about Roman London:

We have discovered, for example, that one middle-aged woman from the southern Mediterranean has black African ancestry. She was buried in Southwark with pottery from Kent and a fourth century local coin – her burial expresses British connections, reflecting how people’s communities and lives can be remade by migration. The people burying her may have decided to reflect her life in the city by choosing local objects, but we can’t dismiss the possibility that she may have come to London as a slave.

The evidence for Roman Britain having a diverse population only continues to grow. Bioarchaeology offers a unique and independent perspective, one based upon the people themselves. It allows us to understand more about their life stories than ever before, but requires us to be increasingly nuanced in our understanding, recognising and respecting these people’s complexities.

We already have a more or less clear idea about how little the Roman conquest may have shaped the genetic map of Europe, Africa, or the Middle East, in contrast to other previous or later migrations or conquests.

Also, on the Turkic expansion, the recent paper of Damgaard et al. (Nature 2018) stated:

In the sixth century AD, the Hunnic Empire had been broken up and dispersed as the Turkic Khaganate assumed the military and political domination of the steppes [22,23]. Khaganates were steppe nomad political organizations that varied in size and became dominant during this period; they can be contrasted to the previous stateless organizations of the Iron Age [24]. The Turkic Khaganate was eventually replaced by a number of short-lived steppe cultures [25] (…).

We find evidence that elite soldiers associated with the Turkic Khaganate are genetically closer to East Asians than are the preceding Huns of the Tian Shan mountains (Supplementary Information section 3.7). We also find that one Turkic Khaganate-period nomad was a genetic outlier with pronounced European ancestries, indicating the presence of ongoing contact with Europe (…).

Analyses of Turk- and Medieval-period population clusters. a, PCA of Tian Shan Hun, Turk, Kimak, Kipchack, Karakhanid and Golden Horde, including 28 individuals analysed at 242,406 autosomal SNP positions. b, Results for model-based clustering analysis at K = 7. Here we illustrate the admixture analyses with K = 7 as it approximately identifies the major component of relevance (Anatolian/European farmer component, Caucasian ancestry, EHG-related ancestry and East Asian ancestry).

These results suggest that Turkic cultural customs were imposed by an East Asian minority elite onto central steppe nomad populations, resulting in a small detectable increase in East Asian ancestry. However, we also find that steppe nomad ancestry in this period was extremely heterogeneous, with several individuals being genetically distributed at the extremes of the first principal component (Fig. 2) separating Eastern and Western descent. On the basis of this notable heterogeneity, we suggest that during the Medieval period steppe populations were exposed to gradual admixture from the east, while interacting with incoming West Eurasians. The strong variation is a direct window into ongoing admixture processes and the multi-ethnic cultural organization of this period.

We already knew that the expansion of the La Tène culture, associated with the expansion of Celtic languages throughout Europe, was probably not accompanied by massive migrations (from the IEDM, 3rd ed.):

The Mainz research project of bio-archaeometric identification of mobility has not proven to date a mass migration of Celtic peoples in central Europe ca. 4th-3rd centuries BC, i.e. precisely in a period where textual evidence informs of large migratory movements (Scheeres 2014). La Tène material culture points to far-reaching inter-regional contacts and cultural transfers (Burmeister 2016).

Also, from the latest paper on Y-chromosome bottleneck:

[The hypothesis of patrilineal kin group competition] has an added benefit in that it could explain the temporal placement of the bottleneck if competition between patrilineal kin groups was the main form of intergroup competition for a limited episode of time after the Neolithic transition. Anthropologists have repeatedly noted that the political salience of unilineal descent groups is greatest in societies of ‘intermediate social scale’ (Korotayev [47] and its citations on p. 2), which tend to be post-Neolithic small-scale societies that are acephalous, i.e. without hierarchical institutions [48]. Corporate kin groups tend to be absent altogether among mobile hunter gatherers with few defensible resource sites or little property (Kelly [49] pp. 64–73), or in societies utilizing relatively unoccupied and under-exploited resource landscapes (Earle and Johnson [50] pp. 157–171). Once they emerge, complex societies, such as chiefdoms and states, tend to supervene the patrilineal kin group as the unit of intergroup competition, and while they may not eradicate them altogether as sub-polity-level social identities, warfare between such kin groups is suppressed very effectively [51,52]. These factors restrict the social phenomena responsible for the bottleneck to the period after the initial Neolithic but before the emergence of complex societies, which would place the bottleneck-generating mechanisms in the right period of time for each region of the Old World.

Diachronic map of Late Copper Age migrations including Classical Bell Beaker (east group) expansion from central Europe ca. 2600-2250 BC

However, I recently read in a forum for linguists that the expansion of East Bell Beakers overwhelmingly of R1b-L21 subclades in the British Isles “poses a problem”, in that it should be identified with a Celtic expansion earlier than traditionally assumed…

That interpretation would be in line with the simplistic maps we are seeing right now for Bell Beakers (see below for the Copenhagen group).

If anything, the results of Bell Beaker expansions (taken alone) would seem to support a model similar to Cunliffe & Koch’s hypotheses of a rather early Celtic expansion into Great Britain and Iberia from the Atlantic.

Spread of Indo-European languages (by the Copenhagen group).

But it doesn’t. Mallory already explained why in Cunliffe & Koch’s series Celtic from the West: the Bell Beaker expansion is too early for that; even for Italo-Celtic. It should correspond to North-West Indo-European speakers.

Not every population movement that is genetically very significant needs to be significant for the languages attested much later in the region.

This should be obvious to everyone with the many examples we already have. One of the least controversial now would probably be the expansion of R1b-DF27, widespread in Iberia probably at roughly the same time as R1b-L21 was in Great Britain, and still pre-Roman Iberians showed a mix of non-Indo-European languages, non-Celtic languages (at least Galaico-Lusitanian), and also some (certain) Celtic languages. And modern Iberians speak Romance languages, without much genetic impact from the Romans, either…

It is well-established in Academia that the expansion of La Tène is culturally associated with the spread of Celtic languages in Europe, including the British Isles and Iberia. While modern maps of U152 distribution may correspond to the migration of early Celts (or Italo-Celtic speakers) with Urnfield/Hallstatt, the great Celtic expansion across Europe need not show a genetic influence greater than or even equal to that of previous prehistoric migrations.

Post-Bell-Beaker Europe, after ca. 2200 BC.

You can see in these de novo models the same kind of invented theoretical ‘problem’ (as Iosif Lazaridis puts it) that we have seen with the Corded Ware showing steppe ancestry, with Old Hittite samples not showing EHG ancestry, or with CHG ancestry appearing north of the Caucasus but no EHG to the south.

However you may want to explain all these errors in scientific terms (selection bias, under-coverage, over-coverage, faulty statistical methods, etc.), these interpretations were simply the fruit of a lack of knowledge of the anthropological disciplines at play.

Let’s hope the future paper on Celtic expansion takes this into consideration.


Pre-Roman and Roman mitogenomes from Southern Italy


Ph.D. thesis Assessing Migration and Demographic Change in pre-Roman and Roman Period Southern Italy Using Whole-Mitochondrial DNA and Stable Isotope Analysis, or The Biogeographic Origins of Iron Age Peucetians and Working-Class Romans From Southern Italy, by Matthew Emery, McMaster University (2018).

Abstract (emphasis mine):

Assessing population diversity in southern Italy has traditionally relied on archaeological and historic evidence. Although informative, these lines of evidence do not establish specific instances of within lifetime mobility, nor track population diversity over time. In order to investigate the population structure of ancient South Italy I sequenced the mitochondrial DNA (mtDNA) from 15 Iron Age (7th – 4th c. BCE) and 30 Roman period (1st – 4th c. CE) individuals buried at Iron Age Botromagno and Roman period Vagnari, in southern Italy, and analyzed δ18O and 87Sr/86Sr values from a subset of the Vagnari skeletal assemblage.

Phylogenetic analysis of 15 Iron Age mtDNAs together with 231 mtDNAs spanning European prehistory suggest that southern Italian Iapygians share close genetic affinities to Neolithic populations from eastern Europe and the Near East. Population pairwise analysis of Iron Age, Roman, and mtDNA datasets spanning the pan-Mediterranean region (n=357), indicate that Roman maternal genetic diversity is more similar to Neolithic and Bronze Age populations from central Europe and the eastern Mediterranean, respectively, than to Iron Age Italians. Genetic distance between population age categories imply moderate mtDNA turnover and constant population size during the Roman conquest of South Italy in the 3rd century BCE.

In order to determine the local versus non-local demographic at Vagnari, I measured the 87Sr/86Sr and 18O/16O composition of 43 molars, and the 87Sr/86Sr composition of an additional 13 molars, and constructed a preliminary 87Sr/86Sr variation map of the Italian peninsula using disparate 87Sr/86Sr datasets. The relationship between 87Sr/86Sr and previously published δ18O data suggest a relatively low proportion of migrants lived at Vagnari (7%).

This research is the first to generate whole-mitochondrial DNA sequences from Iron Age and Roman period necropoleis, and demonstrates the ability to gain valuable information from the integration of aDNA, stable isotope, archaeological and historic evidence.

mtDNA haplogroup composition between Botromagno (7th – 4th century BCE; n=15) and Vagnari (1st – 4th century CE; n=30) skeletal assemblages.

Interesting excerpts:

Taken together, population pairwise ΦST, and the distribution of mtDNA haplotypes in relation to the comparative mtDNA data set show that the Iron Age southern Italians likely descended from early to late Neolithic farmers from Anatolia and possibly as far East as the Caucasus, and from migrants arriving from eastern Europe around the late Neolithic/early Bronze Age. These findings support previous hypotheses that the ancestors of the Iapygians may have originated in the eastern Balkan region, or derive shared ancestry with a common source population from eastern Europe. Alternatively, southern Italian Iron Age mtDNA variation might also reflect LGM gene flow between southwestern European, Mediterranean, and Carpathian basin refugia, which was suggested for haplogroup subclusters of U5 and J (Malyarchuk et al., 2010; Pala et al., 2012). Future mtDNA (and nuclear DNA) analysis comprised of a larger Iron Age data set from southern Italy is necessary to answer Theodor Mommsen’s initial hypothesis that the Iapygians were the oldest immigrants to the southern Italian region.

Our investigation provides the first mtDNA evidence for the maternal ancestral affiliations of a subset of the Iapygian individuals recovered from southern Italy, and suggests a closer genetic link to European Neolithic and Iron Age Armenians, than to Bronze Age Aegeans. Future comparative ancient DNA data using whole-genome SNP, mtDNA, and NRY-chromosome analysis of pre-Roman populations will provide complementary evidence for the ancestral roots of understudied Iron Age individuals from Italy.

Simplistic map of Illyrian colonies in Italy 550 BCE, from Wikipedia

Archaeological evidence indicates that the Iapygians traded and incorporated Hellenistic elements into their material and cultural traditions (Small, 1992; Peruzzi, 2016). These changes are most apparent in burial custom and ceramic production, and become increasingly prominent by 2400 BP (Peruzzi, 2016). Further evidence shows that Iron Age communities across South Italy retracted in size amidst ongoing conflict between colonies in Magna Graecia, and Rome and Carthage (Small, 1992). This apparent change was interpreted as a decline in local populations throughout the region. However, Bayesian Skygrid analysis using the mtDNA profiles of 15 Iapygians and 30 Roman period individuals suggest that female effective population size was comparable between the two populations. In Chapter 4, population distance (measured as population pairwise ΦST values) across a range of mtDNAs obtained from the pan-Mediterranean, European, and western Asian regions suggest closer maternal affinities to Neolithic and Bronze Age populations from the eastern Mediterranean as a cohort, than with Iron Age Italians. This finding points to moderate mtDNA turnover, and is likely the consequence of Roman gene flow stemming from central and northern Italy via the migration and subsequent occupation by Roman colonies after 2250 BP.

Roman Imperial pursuits peaked by ~2050 BP. This extension of power, coupled with an increase in food and materials procurement, was driven by a substantial labour force comprised of both low status Romans and slaves (Harris, 1980; Bradley, 1987, 1994, 2000). Although several attempts have been made to quantify the number of slaves required to maintain the Roman economy, it is unknown what fraction of the Roman population was slave-owned (~approximately 1 to 3 million by 2050 BP) (Scheidel, 2005). Rome’s slave acquisition during the early centuries of the Republic was likely maintained through military campaigns and conquest, a trend that is well documented in Italy (Scheidel, 1997, 1999, 2005; Harris, 1999; Small, 2002). However, once territory was secured, local slave populations were likely maintained through one or a combination of the following: i) the importation of slaves from non-local regions, ii) were born to slave-owned parents, or iii) were voluntarily self-enslaved to acquire subsistence (Harris, 1999). The importation of foreign slaves was likely more costly than maintaining a self-reproducing slave population, especially in rural areas. As such, rural Roman necropoleis, such Vagnari, provide an opportune case to determine the local versus non-local demographic. Archaeological evidence suggests that Vagnari was involved in agriculture and industrial procurement, and was likely staffed by low-class individuals possibly including slaves (Small et al., 2000). However, without direct archaeological or epigraphic evidence, it is impossible to identify the proportion of slaves at rural sites.
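For readers who prefer calendar dates, the BP figures in these excerpts (2400, 2250 and 2050 BP) can be converted roughly as sketched below. This is my own minimal illustration, not something from the thesis: it assumes the conventional 1950 CE reference point for "before present" and ignores the absence of a year zero, so BCE results may be off by one year. The function name `bp_to_calendar` is hypothetical.

```python
def bp_to_calendar(bp, reference_year=1950):
    """Convert years BP to an approximate calendar label.

    Assumption: BP is counted back from 1950 CE (the usual convention).
    The missing year zero is ignored, so BCE values are approximate.
    """
    year = reference_year - bp
    if year > 0:
        return f"{year} CE"
    return f"{-year} BCE"


# Dates mentioned in the excerpts above
for bp in (2400, 2250, 2050):
    print(bp, "BP ≈", bp_to_calendar(bp))
```

Under these assumptions, 2250 BP falls around 300 BCE, which fits the abstract's placement of the Roman conquest of South Italy in the 3rd century BCE.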

Multi-dimensional scaling plots showing pairwise ΦST values by a) age and b) country. We removed age and geographic categories with less than 5 mtDNA sequence representation to reduce scaling stress, which decreased the sample size from 402 mtDNAs to n = 378 by age, and n= 382 by country. a) MDS plot of the mtDNA categorized by country of origin; b) MDS of mtDNA dataset by age spanning the Upper Paleolithic (pre-LGM) to the Roman period. IronAge 1 = Italian Iron Age samples; IronAge 2 = Armenian Iron Age samples; Roman 1 = Italian Roman samples; Roman 2 = Egyptian Roman samples; TIP = Third Intermediary Period (Egypt); LP = Late Period (Egypt); PP = Ptolemaic Period (Egypt).

(…) The isotope values presented in Chapter 3 obtained from 56 Roman individuals buried at Vagnari suggest that over half (58%) were born directly at Vagnari, with a further 34% originating from South Italy. Only 7% (3/43 with both δ18O and 87Sr/86Sr values) of the individuals sampled resulted in isotope values non-local to the southern peninsula. Two of these individuals originated from either northern Italy or, more broadly, from central Europe, while one individual likely originated from North Africa. Overall, the isotope data suggest a low number of immigrants at Vagnari, which conforms with the population pairwise (ΦST) data for the Iron Age and Roman mtDNAs, and suggests that as the Romans occupied the region, they populated their Imperial properties with people from central Italy (possible the region of Latium, and the surrounding environs of Rome). These results also integrate with the historical evidence concerning the Roman slave economy during the Imperial period. Future research using a larger comparative dataset comprised of pre-Roman and Roman period mtDNAs, δ18O and, 87Sr/86Sr results will refine the interpretations outlined here.
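The 7% figure is simply the share of non-local individuals among those with both isotope measurements (3 of 43). A quick check, as a minimal sketch of the arithmetic only (the helper name `share_percent` is my own):

```python
def share_percent(count, total):
    """Return count/total as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * count / total


# 3 non-local individuals out of 43 with both δ18O and 87Sr/86Sr values
non_local_share = share_percent(3, 43)
print(f"{non_local_share:.1f}%")
```

Note that the three reported shares (58%, 34% and 7%) sum to 99%, presumably because of rounding.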

A paper from this thesis has already been published in a peer-reviewed journal: Mapping the origins of Imperial Roman workers (1st–4th century CE) at Vagnari, Southern Italy, using 87Sr/86Sr and δ18O variability, Am J Phys Anthropol (2018).


Olalde et al. and Mathieson et al. (Nature 2018): R1b-L23 dominates Bell Beaker and Yamna, R1a-M417 resurges in East-Central Europe during the Bronze Age

The official papers Olalde et al. (Nature 2018) and Mathieson et al. (Nature 2018) have appeared. They are based on the 2017 preprints at BioRxiv The Beaker Phenomenon And The Genomic Transformation Of Northwest Europe and The Genomic History Of Southeastern Europe respectively, but with a sizeable number of new samples.

Papers are behind a paywall, but here are the authors’ shareable links to read the papers and supplementary materials: Olalde et al. (2018), Mathieson et al. (2018).

NOTE: The corresponding datasets have been added to the Reich Lab website. Remember you can use my drafts on DIY Human Ancestry analysis (viz. Plink/Eigensoft, PCA, or ADMIXTURE) to investigate the data further on your own computer.

Image modified by me, from Olalde et al (2018). PCA of 999 Eurasian individuals. Marked is the late CWC outlier sample from Esperstedt, showing how early East Bell Beaker samples are the closest to Yamna samples.

I don’t have time to analyze the samples in detail right now, but in short they seem to convey the same information as before: in Olalde et al. (2018) the pattern of Y-DNA haplogroup and steppe ancestry distribution is overwhelming, with an all-R1b-L23 Bell Beaker people accompanying steppe ancestry into western Europe.

EDIT: In Mathieson et al. (2018), a sample classified as Ukraine_Eneolithic from Dereivka ca. 2890-2696 BC is of R1b1a1a2a2-Z2103 subclade, so Western Yamna during the migrations was also of R1b-L23 subclades, in contrast with the previous R1a lineages in Ukraine. In Olalde et al. (2018), it is clearly stated that of the four BB individuals with higher steppe ancestry, the two with higher coverage could be classified as of R1b-S116/P312 subclades.

This is compatible with the expansion of Indo-European-speaking Yamna migrants (also mainly of R1b-L23 subclades) into the East Bell Beaker group, as described in detail in Archaeology first by Volker Heyd in 2007, who thus predicted the population movement we are now seeing.

Yamna – East Bell Beaker migration 3000-2300 BC. Adapted from Harrison and Heyd (2007), Heyd (2007)

Also, the resurgence of R1a-Z645 subclades in Czech and Polish lands (from previous Corded Ware migrants), accompanying other lineages indigenous to the region, seems to have happened only after the Bell Beaker expansion into these territories, during the Bronze Age, probably leading to the formation of the Balto-Slavic community, as I predicted based on previous papers. The fact that a sample of R1b-U106 subclade pops up in this territory is interesting from the point of view of a shared substrate with Germanic, as is the earlier BB sample of R1b-Z2103 for its connection with Graeco-Aryan dialects.

All this suggests that a North-West Indo-European dialect (ancestor of Italo-Celtic, Germanic, and Balto-Slavic), supported in Linguistics by most modern Indo-European schools of thought, expanded roughly along the Danube, and later to northern, eastern, and western Europe with the Bell Beaker expansion, as supported in Anthropology by Mallory (in Celtic from the West 2, 2013), and by Prescott for the development of a Nordic or Pre-Germanic language in Scandinavia since 1995.

Diachronic map of Late Copper Age migrations including Classical Bell Beaker (east group) expansion from central Europe ca. 2600-2250 BC

Maybe more importantly, the fact that only Indo-Iranian-speaking Sintashta-Petrovka (and later Andronovo) cultures were clearly associated with R1a-Z645 subclades, and rather late – after mixing with early Chalcolithic North Caspian steppe groups (mainly East Yamna and Poltavka herders of R1b-L23 subclades) – gives support to the theory that Corded Ware (and probably the earlier Sredni Stog) groups did not speak or spread Indo-European languages with their migration, but most likely Uralic – as seen in recent papers on the much later arrival of haplogroup N1c – (compatible with the Corded Ware substrate hypothesis), adopting Indo-Iranian by way of cultural diffusion or founder effect events.

As Sheldon Cooper would say,

Under normal circumstances I’d say I told you so. But, as I have told you so with such vehemence and frequency already the phrase has lost all meaning. Therefore, I will be replacing it with the phrase, I informed you thusly

I informed you thusly:

The Indo-European demic diffusion model, and the “R1b – Indo-European” association


Beginning with the new year, I wanted to commit myself to some predictions, as I did last year, even though they constantly change with new data.

I recently read Proto-Indo-European homelands – ancient genetic clues at last?, by Edward Pegler, which is a good summary of the current state of the art in the Indo-European question for many geneticists – and thus a great example of how well Genetics can influence Indo-European studies, and how badly it can be used to interpret actual cultural events – although more time is necessary for some to realize it. Notice for example the distribution of ‘Yamnaya’ in 3000 BC, all the way to Latvia (based on the initial findings of Mathieson et al. 2017), and the map of 2000 BC with ‘Corded Ware’, both suggesting communities linked by admixture and unrelated to actual cultures.

Some people – especially those interested in keeping a simplistic picture of Europe, either divided into admixture groups or simplistic R1b-Vasconic / R1a-Indo-European / N1c-Uralic (or any combination thereof) – want (others) to believe that I am linking ‘Indo-Europeans’ with haplogroup R1b. That is simply not true. In fact, my model dismisses such simplistic identifications of the reconstructible proto-languages with any modern peoples, admixtures, or haplogroups.

Simplistic Vasconic/R1b-Uralic/N1c distribution, and intruding Indo-European/R1a, according to Wiik.

The beauty of the model lies, therefore, precisely in that if you take any modern group speaking Indo-European languages, none can trace back their combination of language, admixture, and/or haplogroup to a common Indo-European-speaking people. All our ancestral lines have no doubt changed language families (and indeed cultures), they have admixed, and our European regions’ paternal lines have changed, so that any dreams of ‘purity’ or linguistic/cultural/regional continuity become absurd.

That conclusion, which should be obvious to all, has been denied for a long time in blogs and forums alike, and is behind the effort of many of those involved in amateur genetics.

Main linguistic aim

The main consequence of the model, as the title of the paper suggests, is that reconstructible Indo-European proto-languages expanded with people, i.e. with actual communities, which is what we can assert with the help of Genomics. From a personal (or ethnic, or political) point of view genomics is useless, but from an anthropological (and thus linguistic) point of view, genomics can be a very useful tool to decide between alternative models of language diffusion, which has given lots of headaches to those of us involved in Indo-European studies.

The demic diffusion theory for the three main stages of the proto-language expansion was originally, therefore, a dismissal of impossible-to-prove cultural diffusion models for the proto-language – e.g. the adoption of Late Proto-Indo-European by Corded Ware groups due to a patron-client relationship (as proposed by Anthony), or a long-lasting connection between cultures (as proposed by Kristiansen, and favoured by “constellation analogy” proponents like Clackson, who denied the existence of common proto-languages). It also means the acceptance of the easiest anthropological model for language change: migration and – consequently – replacement.

By the time of the famous 2015 papers, I had been dealing for some time with the idea that the shared features between Indo-Iranian and Balto-Slavic may have been due to a common substrate, and must have therefore had some reflection in genomic finds. The data on these papers, and the addition of a weak connection between Pre-Germanic and Balto-Slavic communities, together with their clearest genetic link – R1a-M417 subclades (especially European Z283) – made it still easier to propose a Corded Ware substrate, partially common to the three.

Allentoft et al. “Arrows indicate migrations — those from the Corded Ware reflect the evidence that people of this archaeological culture (or their relatives) were responsible for the spreading of Indo-European languages. All coloured boundaries are approximate.”

Before the famous 2015 papers (and even after them, if we followed their interpretation), we were left to wonder why the languages linked to the supposed vector of expansion of Indo-European, Corded Ware migrants (represented by R1a-Z645 subclades, and supposedly continued unchanged into modern populations in their ‘original’ ancestral territories), namely Balto-Slavic and Indo-Iranian, were precisely the (phonetically) most divergent Indo-European languages relative to the parent Late Indo-European proto-language.

My paper implied therefore the dismissal of an unlikely Indo-Slavonic group, as proposed by Kortlandt, and of a still less feasible Germano-Slavonic, or Germano-Indo-Slavonic (?) group, as loosely implied by some in the past, and maybe supported in certain archaeological models (viz. Kristiansen or partially Anthony), and presently by some geneticists since their simplistic 2015 papers on “massive migrations from the steppe”, and by amateur genetic fans with infinite pet theories, indeed.

A common Corded Ware substrate to Balto-Slavic and Indo-Iranian, and common also partially between Balto-Slavic and Germanic (as supported by Kortlandt, too, albeit with different linguistic connotations), would explain their common features. The Corded Ware culture (and Uralic, tentatively proposed by me as the group’s main language family) is a strong potential connection between them, further supported by phylogeography, too.

Other consequences

Interpretations in my paper help thus dismiss the simplistic Yamna -> Corded Ware -> Bell Beaker migration model implied with phylogeography in the 2000s, and revived again by geneticists and Kristiansen’s workgroup based on the famous 2015 papers, whereby – due to the “Yamnaya ancestral component” – the Yamna culture would have been composed of communities of R1a-M417 and R1b-M269 lineages which remained against all odds ‘related but separated’ for more than two thousand years, sharing a common unitary language (why? and how?), and which expanded from Yamna (mainly R1b-L23) into Corded Ware (mainly R1a-M417) and then into Bell Beaker (mainly R1b-L51), in imaginary migration waves whose traces Archaeology has not found, or Anthropology described, before.

While phylogeography (especially the distribution of ancient samples of certain R1b and R1a subclades) was the main genetic aspect I used in combination with Archaeology and Anthropology to challenge the reliability of the “Yamnaya ancestral component” in assessing migrations – and thus Kristiansen’s now-popular-again modified Kurgan model – , my main aim was to prove a recent expansion of Late Proto-Indo-European from the steppe, and a still more recent expansion of a common group of speakers of North-West Indo-European, the language ancestral to Italo-Celtic, Germanic, and probably Balto-Slavic (or ‘Temematic’, the NWIE substrate of Balto-Slavic, according to some linguists).

My arguments serve for this purpose, and modern distributions of haplogroups or admixture are fully irrelevant: I am ready to change my view at any time, regarding the role of any haplogroup, or ancestral component, archaeological data, or anthropological migration model, to the extent that it supports the soundest linguistic model.

Stages of Proto-Indo-European evolution. IU: Indo-Uralic; PU: Proto-Uralic; PAn: Pre-Anatolian; PToch: Pre-Tocharian; Fin-Ugr: Finno-Ugric. The period between Balkan IE and Proto-Greek could be divided in two periods: an older one, called Proto-Greek (close to the time when NWIE was spoken), probably including Macedonian, and spoken somewhere in the Balkans; and a more recent one, called Mello-Greek, coinciding with the classically reconstructed Proto-Greek, already spoken in the Greek peninsula (West 2007). Similarly, the period between Northern Indo-European and North-West Indo-European could be divided, after the split of Pre-Tocharian, into a North-West Indo-European proper, during the expansion of Yamna to the west, and an Old European period, coinciding with the formation and expansion of the East Bell Beaker group.

Gimbutas’ old theory of sudden and recent expansion served well to support a real community of Proto-Indo-European speakers, as did later the Yamna -> Corded Ware -> Bell Beaker theory that circulated in the 2000s based on modern phylogeography, and as did later partially Anthony’s updated steppe theory (2007). On the other hand, Kristiansen’s long-lasting connections among north-west Pontic steppe cultures and Globular Amphorae and Trypillian cultures, did not fit well with a close community expanding rapidly – although recent genetic data on Trypillia and Globular Amphorae might be compelling him to improve his migration theory.

So, if data turns out to be not as I expect now, I will reflect that in future versions of the paper. I have no problem saying I am wrong. I have been wrong many times before, and something I am certain of is that I am wrong now in many details, and that I will be in the future.

If, for example, R1b-L23(xZ2105) is demonstrated to come from Hungary and not the steppe (as supported by Balanovsky) or R1a-M417 samples are proved to have expanded with West Yamna settlers (as recently proposed by Anthony, see below the Balto-Slavic question), I would support the same model from a linguistic point of view, but modified to reflect these facts. Or if a direct migration link is found in Archaeology from Yamna to Corded Ware, and from Corded Ware to Bell Beaker (as proposed in the 2015 papers), I will revise that too (again, see the image below). Or, if – as the Lazaridis et al. (2017) paper on Minoans and Mycenaeans suggested – the Anatolian hypothesis (that is, one of the multiple ones proposed) turns out to be somehow right, I will support it.

My map of Late Proto-Indo-European expansion (A Grammar of Modern Indo-European, 2006), following Gimbutas and Mallory.

Haplogroups are the least important aspect of the whole model; they are just another piece of data that has to be taken into account for a thorough explanation of migrations. They have become essential today because of the apparent lack of vision on the part of geneticists, who failed to use them to adjust their findings of admixture with findings of haplogroup expansions, favouring thus a marginal theory of long-lasting steppe expansion instead of the mainstream anthropological models.

Since many of these alternative scenarios seem less and less likely with each new paper, it is probably more efficient to talk about which developments are most likely to challenge my model.

Main points

My main predictions (based mostly on language guesstimates, archaeological cultures, and anthropological models of migration), even with the scarce genomic data we had, have been proven right until now with new samples from Mathieson et al. (2017) and Olalde et al. (2017), among other papers of this past year. These were my original assumptions:

(1) A Middle Proto-Indo-European expansion defined by the appearance of steppe ancestry + reduction in haplogroup diversity and expansion of (mainly) R1b-M269 and R1b-L23 lineages;

(2) A Late Proto-Indo-European expansion defined by steppe ancestry + reduction in haplogroup diversity and expansion of (mainly) R1b-L23 subclades; and

(3) A North-West Indo-European expansion defined by steppe ancestry + reduction in haplogroup diversity and expansion of (mainly) R1b-L51 subclades.

The expansion of Corded Ware peoples, associated with steppe ancestry + reduction in haplogroup diversity and expansion of (mainly) R1a-Z645 subclades, represents thus a different migration, which is compatible with the different nature of the Corded Ware culture, unrelated to Yamna and without migration waves from one to the other (although there were certainly contacts in neighbouring regions).

As you can see, none of the 3+1 expansion models implies that no other haplogroup can be found in the cultures or regions involved (others have in fact been found, and still the models remain valid): these migrations imply a reduction of haplogroup diversity, and the expansion of certain subclades, as is common in population expansions throughout history. While we all accept this general idea, some people have difficulties accepting just those cases not compatible with their dreams of autochthonous continuity.
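To make "reduction in haplogroup diversity" more concrete, here is a minimal sketch using Nei's haplotype (gene) diversity, H = n/(n−1) · (1 − Σ pᵢ²), a standard population-genetics measure. The two sample lists are invented purely for illustration and are not real ancient DNA data; the function name is also my own. The point is only that diversity can drop sharply while minority haplogroups are still present.

```python
from collections import Counter


def haplotype_diversity(haplogroups):
    """Nei's haplotype diversity for a list of haplogroup labels."""
    n = len(haplogroups)
    if n < 2:
        raise ValueError("need at least two samples")
    counts = Counter(haplogroups)
    sum_sq_freq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq_freq)


# HYPOTHETICAL samples (not real data): a mixed population before an
# expansion, and one dominated by a single lineage afterwards.
before = ["R1b", "R1a", "I2", "J", "Q", "R1b", "I2", "G2"]
after = ["R1b"] * 7 + ["I2"]

print("before:", round(haplotype_diversity(before), 3))
print("after: ", round(haplotype_diversity(after), 3))
```

In the second list another haplogroup is still found, yet the diversity value is much lower, which is the pattern the models above refer to.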

Nevertheless, there are still voids in genetic investigation.

Controversial aspects

In my humble opinion, these are potential conflict periods and the most likely areas of change for the future of the theory:

1. When and how did R1b-M269 lineages become “chiefs” in the steppe?

Based on scarce data from Khvalynsk, it seems that during the Neolithic there were many haplogroups in the North Pontic and North Caspian steppes. A reduction to R1b-M269 subclades must have happened either just before or (as I support) during (the migrations that caused) the Suvorovo-Novodanilovka expansion among Sredni Stog, probably coinciding also with the expansion (or one of the expansions) of CHG ancestry (and thus the appearance of ‘Steppe component’ in the steppe). My theory was based initially on Anthony’s account and TMRCA of haplogroups of modern populations (both ca. 4200-4000 BC), but recent samples of the Balkans (R1b-M269 and steppe ancestry) seem to trace the population expansion some centuries back.

If my assessment is correct, then modern populations of haplogroup R1b-M269* and R1b-L23* in the Balkans probably reflect that ancient expansion, and samples related to Proto-Anatolian cultures in the Balkans will most likely be of R1b-M269 subclades and R1b-L23*. After admixture in the Balkans, later migrations of Anatolian languages into Anatolia might be associated with a different admixture component and haplogroups; we don’t have enough data yet.

If the haplogroup reduction and expansion in Khvalynsk happened later than the Suvorovo-Novodanilovka expansion, then we might find the expansion of Pre- or Proto-Anatolian associated with many different haplogroups, such as R1b (xM269), R1a, I, J, or G2, and more or less associated with steppe ancestry in the Balkans.

Another reason for finding such variety of haplogroups in ancient samples from the Balkans would be that this Khvalynsk group of “chiefs” traversed – and mixed with – the Sredni Stog population. Nevertheless, if we suppose homogeneity in haplogroups in Khvalynsk during the expansion, a high proportion of different haplogroups explained by admixture with the local population of Sredni Stog would challenge the whole “chief domination” explanation by Anthony, and we would have to return to the “different culture” theory by Rassamakin and potentially an older migration from Khvalynsk. In any case, both researchers show clear links of the Suvorovo-Novodanilovka phenomenon to Khvalynsk, and a differentiation with the surrounding Sredni Stog culture.

A less likely model would support the identification of the whole Eneolithic Pontic-Caspian steppe as a loose Indo-Hittite-speaking community, which would be in my opinion too big a territory and too loose a cultural bond to justify such a long-lasting close linguistic connection. This will probably be the refuge of certain people looking desperately for R1a-IE connections. However, the nature of the western steppe will remain distinct from Late Proto-Indo-European, which must have developed in the Yamna culture, so autochthonous continuity is not on the table anymore, in any case…

Coexistence of the Varna-Gumelniţa culture and the Suvorovo phase of the sceptre-bearer communities. 1 — Fălciu; 2 — Fundeni-Lungoţi; 3 — Novoselskaja; 4 — Suvorovo; 5 — Casimcea; 6 — Kjulevča; 7 — Reka Devnja; 8 — Drama; 9 — Gonova mogila; 10 — Reževo; 11 — geographically separate Decea variant of the sceptre bearer group (after Govedarica, Manzura 2011: Abb. 5, adapted).

2. How did R1a-M417 (and especially R1a-Z645) haplogroups come to dominate over the Corded Ware cultures?

If I am right (again, based on TMRCA of modern populations), then it is precisely at the time of the potential expansion of Proto-Corded Ware from the Dnieper-Dniester forest, forest-steppe, and steppe regions, ca. 3300-3000 BC. Furholt’s recent radiocarbon analysis and suggestions of a Lesser Poland origin of the third or A-horizon, on which disparate archaeologists such as Anthony or Klejn rely now, seem to suggest also that Corded Ware was a cultural complex rather than a compact culture reflecting a migration of peoples – similar thus to the Bell Beaker complex.

This cultural complex interpretation of Corded Ware contrasts with the quite homogeneous late samples we have, suggesting clear migration waves in northern Europe, at least at some point in time, so Genomics will be a great tool to ascertain when and from where, approximately, Corded Ware peoples expanded. Right now, it seems that Eneolithic Ukraine populations are the closest to its origin, so the traditional interpretation of its regional origin by Kristiansen or Anthony remains valid.

3. How was Indo-Iranian adopted by Corded Ware invaders?

This is rather an anthropological question. We need reasonable models of founder effect/cultural diffusion necessary for that to happen – similar to the ones necessary to explain the arrival of N1c subclades into north-east Europe, or the arrival of R1b subclades in Basque/Iberian-speaking regions in south-west Europe. My description of potential events in the eastern steppe – based partially on Anthony – is merely a short sketch. Genomic data is unlikely to offer more than it does today (replacement of haplogroups, and gradually of some steppe component, by late Corded Ware groups in the steppe), but let’s see what new samples can contribute.

As for what some Indians – and other people willing to confront them – are looking for, regarding R1a-M417 and/or Indo-European origins in India, I don’t see the point: we already know a) that the origin of the expansion is in the steppe and b) that Hindu nationalist bigots will not accept results from research that oppose their views. I don’t expect huge surprises there, just more fruitless discussions (fomented by those who live from trolling or conspiracies)…

4. Yamna settlers from Hungary

Anthony’s new theory – and the nature of Balto-Slavic – hinges on the presence of R1a-M417 subclades (associated with later Corded Ware samples) in Yamna settlers of Hungary, potentially originally from the North Pontic area, where the oldest sample has been found.

My ‘modified’ version of Anthony’s new model (the only one I deem even remotely feasible) includes the expansion of a Proto-Corded Ware from Lesser Poland, but (given the overwhelming R1b found in East Bell Beaker), with R1a-M417 being associated with the region. How to explain this language change with objective data? Well, we have Bell Beaker expanding to these areas at a later time, so we would need to find R1b-L23 settlers in Lesser Poland, and then a resurgence of the R1a-M417 haplogroup. If not, resorting yet again to cultural diffusion from Yamna “patrons” to Corded Ware “clients” of Lesser Poland would bring us back to square one, now with the ‘steppe ancestry’ controversy included…

Since some Eastern Europeans are (for no obvious reason whatsoever) putting their hopes on that IE-R1a-CWC association, let’s hope some samples of R1a-M417 in Yamna or Hungary give them a break, so that they can begin accepting something closer to mainstream anthropological models. We could then work from there a Yamna-> Bell Beaker / North-West Indo-European association truce, and from there keep accepting that no single haplogroup from Yamna settlers is linked with modern languages, cultures or ethnic groups.

Localization of Central-European funerary monuments with elements of the Pit Grave culture (after Bátora 2006).

5. How and when was Balto-Slavic associated with haplogroup R1a?

If we accept the Southern or Graeco-Aryan nature of Balto-Slavic with influence from an absorbed North-West Indo-European dialect, “Temematic” (as Kortlandt does), then Indo-Slavonic adopted in the steppe from Potapovka by Sintashta and Poltavka populations divided ca. 2000 BC into Indo-Iranian (migrating to the east with Andronovo), and Balto-Slavic (migrating westward with the Srubna culture). History from there is not straightforward, and it should follow Srubna, Thraco-Cimmerian, or other late expansions from cultures of the steppe.

On the other hand, if it is a Northern dialect related closely to Germanic and Italo-Celtic (in a North-West Indo-European group), then its origin has to be found in the initial expansion of East Bell Beakers, and its development into either the Únětice culture (of Balkan and thus potentially “Southern IE” influence), or the Mierzanowice-Nitra culture (of Corded Ware and thus potentially Uralic influence), or maybe from both, given the intermediate substrate found in Germanic and Balto-Slavic.

It is my opinion that the association of Balto-Slavic with haplogroup R1a is quite early after the East Bell Beaker expansion, probably initially with the subclade typically associated with West Slavic, R1a-M458. I have not much data to support this (apart from the most common linguistic model), just modern haplogroup distribution maps and common TMRCA, and highly hypothetical archaeological-anthropological models. Genetics will hopefully bring more data.

Let’s see also what information on ancient haplogroups we can obtain from the Tollense valley (already showing a close cluster with modern West Slavic populations) and steppe regions.

6. How did Germanic, Celtic, and Italic expand?

Germanic is probably the most interesting one. Following the expansion of R1b-L51 subclades (especially R1b-U106) and steppe ancestry (a confounding factor, with the previous expansion of R1a-Z284 subclades) in Scandinavia is going to be fascinating. Anthropological models already point to a linguistic and archaeological expansion of Pre-Germanic with Bell Beaker peoples.

The expansion of Celtic seems to be associated with chiefdoms, untraceable today in terms of haplogroups, and it seems thus different from previous expansions. New studies might tell how that happened, if it was actually in successive ways, as proposed, or maybe we don’t have enough data yet to reach conclusions.

Nor do we know how Italic expanded into the Italian Peninsula, or whether Latin expanded with peoples from Italy at all or was mostly a cultural diffusion event, as it seems.

Regarding Etruscan, while I think the controversy was initiated by fantastic accounts and ignited by a few finds of Middle Eastern ancestry (which seem logical from the point of view of regional contacts), it will be important for Italian linguists and archaeologists to accept the most likely scenario as well.

As for Palaeo-Hispanic languages, while steppe ancestry is found quite reduced in R1b-L51 subclades (after so many different expansions and admixture events since the departure from the steppe), their distribution from the Chalcolithic onwards and the resurgence of native haplogroups may serve to ascertain which Pre-Roman tribes were associated with the oldest regions where these subclades dominated. For that aim, a closer look at the developments in Aquitania and other pre-Roman Vasconic- and Iberian-speaking regions may shed some light on how founder effects might develop to leave the native language intact (in a case similar to the adoption of Indo-Iranian by post-Corded Ware Sintashta and Potapovka in the eastern Pontic-Caspian steppe).

NOTE: Although mostly unrelated, linguistic questions may also be somehow altered with a change of migration models. For example, our current Corded Ware Substrate Hypothesis – strongly contested by Kortlandt and others – implies that Uralic was potentially the language spoken by Eneolithic Ukraine / Proto-Corded Ware peoples, therefore early Uralic languages were spoken by Corded Ware peoples, as a substrate for Germanic and Balto-Slavic, and Balto-Slavic and Indo-Iranian. If an Indo-Hittite branch different from Late PIE is accepted for Eneolithic Ukraine (thus suggesting a millennia-long cultural-historical community in the steppe), then the model still stands (e.g. Ger. and BSl. *-mos/-mus, as stated by Kortlandt, would correspond to the oldest morphological IE layer). As you can read in the different versions of our model, the different possibilities for the common substrate are stated, and the most likely one selected. But the most likely a priori option sometimes turns out to be wrong…

NOTE 2: You can comment whatever you want here, but I opened a specific thread in our forum if you want serious comments on the model to stick and be further discussed.

Featured images: from the book Interactions, changes and meanings. Essays in honour of Igor Manzura on the occasion of his 60th birthday. Țerna S., Govedarica B. (eds.). 2016. Kishinev: Stratum Plus.

See also:

Our monograph on North-West Indo-European (first draft) is out

I wrote yesterday about the recently updated Indo-European demic diffusion model.

Fernando López-Menchero and I have published our first draft on the North-West Indo-European proto-language. Our contribution concerns mainly phonetics, and namely two of its most controversial aspects: a common process of laryngeal loss and two series of velars for PIE.

There is also an updated linguistic model for the Corded Ware substrate hypothesis, which seeks to explain certain similarities between Germanic and Balto-Slavic, and between Balto-Slavic and Indo-Iranian, and potential isoglosses between the three.


As you probably know, our interest is (and has been for the past 15 years or so, even before our common project) the reconstruction of a North-West Indo-European proto-language, the ancestor of Italo-Celtic, Germanic, and Balto-Slavic. At least since Krahe’s proposal of an Alteuropäische substrate to European hydronymy, some 70 years ago, Indo-Europeanists have been supporting an Old European branch of Proto-Indo-European.

Root *sal-, *salm in European river names. Krahe (1949). From Wikipedia.

However, dialectal divisions were tentative. Since Oettinger, some 30 years ago, we have had a clearer picture of a group of closely related dialects, namely Italo-Celtic, Germanic, and Balto-Slavic. Although the nature of Balto-Slavic is somewhat contested (by the few scholars who support an Indo-Slavonic group), the minimalist view holds that at least the substrate language of Baltic and Slavic, Holzer’s Temematic, was part of the North-West Indo-European group.

A North-West Indo-European (NWIE) proto-language not only solved the controversial question of Pan-European IE hydronymy (clearly of Late Indo-European nature), but also – and more elegantly – the question of the origin of the many fragmentary languages attested in Western Europe, usually attributed to a “Pre-Celtic” or “Pre-Italic” nature depending on their surrounding languages (Venetic has even been said to be related to Germanic…).

Stages of Proto-Indo-European evolution. IU: Indo-Uralic; PU: Proto-Uralic; PAn: Pre-Anatolian; PToch: Pre-Tocharian; Fin-Ugr: Finno-Ugric. The period between Balkan IE and Proto-Greek could be divided in two periods: an older one, called Proto-Greek (close to the time when NWIE was spoken), probably including Macedonian, and spoken somewhere in the Balkans; and a more recent one, called Mello-Greek, coinciding with the classically reconstructed Proto-Greek, already spoken in the Greek peninsula (West 2007). Similarly, the period between Northern Indo-European and North-West Indo-European could be divided, after the split of Pre-Tocharian, into a North-West Indo-European proper, during the expansion of Yamna to the west, and an Old European period, coinciding with the formation and expansion of the East Bell Beaker group.

Described first mainly in terms of lexical isoglosses, the concept of a NWIE language was then gradually and firmly grounded in common grammatical features, contributed mainly by the German, North American, and Spanish schools (as you know, the British or French schools are quite divided on the nature of Proto-Indo-European itself…). Recent archaeological models pioneered by Harrison and Heyd (2007) showed how this might have happened, with Yamna migrants that evolved as the East Bell Beaker group, and their subsequent expansion into most of Europe.

Genetics is now clearly supporting such a closely related group, too.

Yamna – East Bell Beaker migration 3000-2300 BC according to Heyd in Harrison and Heyd (2007).

The work of Prescott and Walderhaug (1995) on the Pre-Germanic homeland, and the more precise archaeological migration model developed by Prescott clearly established the advent of Bell Beakers in Scandinavia as the key factor for the development of a unitary Pre-Germanic language in Scandinavia during the Dagger Period of the Nordic Late Neolithic.

The nature of the Únětice and Mierzanowice/Nitra cultures, as products of the Bell Beaker absorption of preceding Corded Ware cultures, made the identification of the Balto-Slavic homeland with the Lusatian culture quite likely – and this is now being confirmed with the study of Bronze Age samples, like those of the Tollense battlefield, which cluster closely with West Slavic and East German samples.

At the time of Marija Gimbutas’ breakthrough model of the “kurgan peoples”, a common dialect from this Old European branch was deemed to be ‘Northern European’ (or ‘Germano-Balto-Slavic’), which greatly influenced her work, supporting an identification of different burial types as stemming from the same source. This model, rejected already some years after Gimbutas’ proposal, has sadly survived to this day because of tradition (due e.g. to the work and influence of Kristiansen, and to some extent Anthony), and for some years (until the advent of ancient DNA) because of the modern distribution of haplogroup R1a in Europe and its relation to the ancient distribution of the Corded Ware culture.

This traditional model of a ‘Corded Ware -> Bell Beaker expansion of NWIE’, which we also followed until recently, never fit well with the known migration paths from Yamna (into Balkan Early Bronze Age cultures), with the geographic distribution of Old European hydronymy, or with the guesstimates for Late Indo-European and North-West Indo-European. This compelled us to support a break-up of the proto-language further back in time than warranted by models of language change, and it needed certain unlikely cultural diffusion events over huge areas (because no such migration from Yamna to northern Europe has been attested): along the steppe/forest-steppe zone first, for a diffusion from Yamna into Corded Ware cultures, and along the Danube or the Rhine later, for a diffusion of Corded Ware into Bell Beaker. These models were also based on the wrong interpretation of the first radiocarbon dates of Beakers, which placed the origin of the Bell Beaker people in Iberia (an origin that has been rejected in Archaeology, and now also in Genetics).

Such a ‘Germano-Balto-Slavic’ group faded in Linguistics long ago, with most Indo-Europeanists preferring to talk about late contacts (viz. Celto-Germanic or Italo-Germanic contacts), and for some there is – if any subgroup at all – a core West Indo-European or Italo-Celto-Germanic group, which may be supported by recent genetic research on Bell Beaker peoples, with the Beaker group of the Netherlands being the key. Our research on the potential language spoken by Corded Ware peoples – most likely related to Uralic, from an Indo-Uralic community from the Pontic-Caspian steppe – can elegantly explain the isoglosses that both European dialects share.

Diachronic map of Late Copper Age migrations including Classical Bell Beaker (east group) expansion from central Europe ca. 2600-2250 BC

Read also: Schleicher’s Fable in Proto-Indo-European – pitch and stress accent