Analysis of R1b-DF27 haplogroups in modern populations adds new information that contrasts with ‘steppe admixture’ results


New open access article published in Scientific Reports, Analysis of the R1b-DF27 haplogroup shows that a large fraction of Iberian Y-chromosome lineages originated recently in situ, by Solé-Morata et al. (2017).


Haplogroup R1b-M269 comprises most Western European Y chromosomes; of its main branches, R1b-DF27 is by far the least known, and it appears to be highly prevalent only in Iberia. We have genotyped 1072 R1b-DF27 chromosomes for six additional SNPs and 17 Y-STRs in population samples from Spain, Portugal and France in order to further characterize this lineage and, in particular, to ascertain the time and place where it originated, as well as its subsequent dynamics. We found that R1b-DF27 is present in frequencies ~40% in Iberian populations and up to 70% in Basques, but it drops quickly to 6–20% in France. Overall, the age of R1b-DF27 is estimated at ~4,200 years ago, at the transition between the Neolithic and the Bronze Age, when the Y chromosome landscape of W Europe was thoroughly remodeled. In spite of its high frequency in Basques, Y-STR internal diversity of R1b-DF27 is lower there, and results in more recent age estimates; NE Iberia is the most likely place of origin of DF27. Subhaplogroup frequencies within R1b-DF27 are geographically structured, and show domains that are reminiscent of the pre-Roman Celtic/Iberian division, or of the medieval Christian kingdoms.

Some people like to say that Y-DNA haplogroup analysis, or phylogeography in general, is of no use anymore (especially modern phylogeography), and they are content to see how ‘steppe admixture’ was (or even is) distributed in Europe to draw conclusions about ancient languages and their expansion. With each new paper, we are seeing the advantages of analysing ancient and modern haplogroups in ascertaining population movements.

Quite recently there was a suggestion based on steppe admixture that Basque-speaking Iberians resisted the invasion from the steppe. Observing the results of this article (dates of expansion and demographic data) we see a clear expansion of Y-DNA haplogroups precisely by the time of Bell Beaker expansion from the east. Y-DNA haplogroups of ancient samples from Portugal point exactly to the same conclusion.

The situation of R1b-DF27 in Basques, as I have pointed out elsewhere, is probably then similar to the genetic drift of Finns, mainly of N1c lineages, speaking today a Uralic language that expanded with Corded Ware and R1a subclades.

The recent article on Mycenaean and Minoan genetics also showed that, when it comes to Europe, most of the demographic patterns we see in admixture are reminiscent of the previous situation; only rarely can we see a clear change in admixture (which would mean an important, sudden replacement of the previous population).

Equating the so-called steppe admixture with Indo-European languages is wrong. Period.

The following are excerpts from the article (emphasis is mine):

Dates and expansions

The average STR variance of DF27 and each subhaplogroup is presented in Suppl. Table 2. As expected, internal diversity was higher in the deeper, older branches of the phylogeny. If the same diversity was divided by population, the most salient finding is that native Basques (Table 2) have a lower diversity than other populations, which contrasts with the fact that DF27 is notably more frequent in Basques than elsewhere in Iberia (Suppl. Table 1). Diversity can also be measured as pairwise differences distributions (Fig. 5). The distribution of mean pairwise differences within Z195 sits practically on top of that of DF27; L176.2 and Z220 have similar distributions, as M167 and Z278 have as well; finally, M153 shows the lowest pairwise distribution values. This pattern is likely to reflect the respective ages of the haplogroups, which we have estimated by a modified, weighted version of the ρ statistic (see Methods).
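To make the idea behind this kind of dating more tangible, here is a minimal sketch of the plain, unweighted ρ statistic. It is not the modified, weighted version used by the authors, and the founder haplotype, the sample haplotypes, the per-locus mutation rates and the 30-year generation time are all hypothetical values chosen only for illustration.

```python
from statistics import mean

def rho_age(haplotypes, founder, mutation_rates, years_per_generation=30):
    """Plain (unweighted) rho age estimate from Y-STR haplotypes.

    rho is the average number of repeat-unit steps separating each
    chromosome from the assumed founder haplotype; dividing it by the
    summed per-locus mutation rate gives an age in generations.
    """
    rho = mean(
        sum(abs(allele - founder_allele)
            for allele, founder_allele in zip(hap, founder))
        for hap in haplotypes
    )
    generations = rho / sum(mutation_rates)
    return rho, generations, generations * years_per_generation

# Hypothetical toy data: three STR loci, four chromosomes.
founder = (14, 12, 24)
haplotypes = [(14, 12, 24), (15, 12, 24), (14, 11, 24), (14, 12, 26)]
rates = (0.002, 0.002, 0.004)  # assumed mutations per locus per generation

rho, gens, years = rho_age(haplotypes, founder, rates)
print(rho, round(gens), round(years))
```

With these toy numbers the chromosomes are on average one step away from the founder, which translates into roughly 125 generations. The result depends heavily on the assumed founder haplotype and mutation rates, which is one reason why such estimates should be read with caution.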

Z195 seems to have appeared almost simultaneously within DF27, since its estimated age is actually older (4570 ± 140 ya). Of the two branches stemming from Z195, L176.2 seems to be slightly younger than Z220 (2960 ± 230 ya vs. 3320 ± 200 ya), although the confidence intervals slightly overlap. M167 is clearly younger, at 2600 ± 250 ya, a similar age to that of Z278 (2740 ± 270 ya). Finally, M153 is estimated to have appeared just 1930 ± 470 ya.
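The overlap mentioned for L176.2 and Z220 can be checked with simple arithmetic. The sketch below assumes that each ± value is the half-width of the interval around the estimate (the excerpt does not say exactly how the intervals are defined), and the helper names are mine.

```python
def interval(age, half_width):
    """Return (lower, upper) bounds for an estimate given as age ± half_width."""
    return (age - half_width, age + half_width)

def overlap(a, b):
    """Return the shared range of two intervals, or None if they are disjoint."""
    lower = max(a[0], b[0])
    upper = min(a[1], b[1])
    return (lower, upper) if lower <= upper else None

# Ages in years ago, taken from the excerpt above.
l176_2 = interval(2960, 230)   # (2730, 3190)
z220 = interval(3320, 200)     # (3120, 3520)
m167 = interval(2600, 250)     # (2350, 2850)
z278 = interval(2740, 270)     # (2470, 3010)

print(overlap(l176_2, z220))   # (3120, 3190)
print(overlap(m167, z278))     # (2470, 2850)
```

Under that assumption, L176.2 and Z220 share only a narrow band of about 70 years, while M167 and Z278 overlap over most of their ranges.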

Haplogroup ages can also be estimated within each population, although they should be interpreted with caution (see Discussion). For the whole of DF27, (Table 3), the highest estimate was in Aragon (4530 ± 700 ya), and the lowest in France (3430 ± 520 ya); it was 3930 ± 310 ya in Basques. Z195 was apparently oldest in Catalonia (4580 ± 240 ya), and with France (3450 ± 269 ya) and the Basques (3260 ± 198 ya) having lower estimates. On the contrary, in the Z220 branch, the oldest estimates appear in North-Central Spain (3720 ± 313 ya for Z220, 3420 ± 349 ya for Z278). The Basques always produce lower estimates, even for M153, which is almost absent elsewhere.

Simplified phylogenetic tree of the R1b-M269 haplogroup. SNPs in italics were not analyzed in this manuscript.


The median value for Tstart has been estimated at 103 generations (Table 4), with a 95% highest probability density (HPD) range of 50–287 generations; effective population size increased from 131 (95% HPD: 100–370) to 72,811 (95% HPD: 52,522–95,334). Considering patrilineal generation times of 30–35 years, our results indicate that R1b-DF27 started its expansion ~3,000–3,500 ya, shortly after its TMRCA.
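The conversion from generations to years in this passage is easy to reproduce. The following sketch uses only the figures quoted above (a median of 103 generations, an HPD range of 50–287 generations, and 30–35 years per generation); the function name is my own.

```python
def generations_to_years(generations, years_per_generation):
    """Convert a time expressed in generations into years."""
    return generations * years_per_generation

GEN_TIMES = (30, 35)  # patrilineal generation times quoted in the excerpt

# Median start of the expansion: 103 generations.
median_years = tuple(generations_to_years(103, g) for g in GEN_TIMES)

# Widest span implied by the 95% HPD range of 50-287 generations.
hpd_years = (generations_to_years(50, GEN_TIMES[0]),
             generations_to_years(287, GEN_TIMES[1]))

print(median_years)  # (3090, 3605)
print(hpd_years)     # (1500, 10045)
```

The median gives about 3,100–3,600 years, roughly in line with the ~3,000–3,500 ya quoted, while the HPD range alone spans from about 1,500 to about 10,000 years, which shows how wide the uncertainty around the start of the expansion is.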

As a reference, we applied the same analysis to the whole of R1b-S116, as well as to other common haplogroups such as G2a, I2, and J2a. Interestingly, all four haplogroups showed clear evidence of an expansion (p > 0.99 in all cases), all of them starting at the same time, ~50 generations ago (Table 4), and with similar estimated initial and final populations. Thus, these four haplogroups point to a common population expansion, even though I2 (TMRCA, weighted ρ, 7,800 ya) and J2a (TMRCA, 5,500 ya) are older than R1b-DF27. It is worth noting that the expansion of these haplogroups happened after the TMRCA of R1b-DF27.

Principal component analysis of STR haplotypes. (a) Colored by subhaplogroup, (b) colored by population. Larger squares represent subhaplogroup or population centroids.

Sum up and discussion

We have characterized the geographical distribution and phylogenetic structure of haplogroup R1b-DF27 in W. Europe, particularly in Iberia, where it reaches its highest frequencies (40–70%). The age of this haplogroup appears clear: with independent samples (our samples vs. the 1000 genome project dataset) and independent methods (variation in 15 STRs vs. whole Y-chromosome sequences), the age of R1b-DF27 is firmly grounded around 4000–4500 ya, which coincides with the population upheaval in W. Europe at the transition between the Neolithic and the Bronze Age. Before this period, R1b-M269 was rare in the ancient DNA record, and during it the current frequencies were rapidly reached. It is also one of the haplogroups (along with its daughter clades, R1b-U106 and R1b-S116) with a sequence structure that shows signs of a population explosion or burst. STR diversity in our dataset is much more compatible with population growth than with stationarity, as shown by the ABC results, but, contrary to other haplogroups such as the whole of R1b-S116, G2a, I2 or J2a, the start of this growth is closer to the TMRCA of the haplogroup. Although the median time for the start of the expansion is older in R1b-DF27 than in other haplogroups, and could suggest the action of a different demographic process, all HPD intervals broadly overlap, and thus, a common demographic history may have affected the whole of the Y chromosome diversity in Iberia. The HPD intervals encompass a broad timeframe, and could reflect the post-Neolithic population expansions from the Bronze Age to the Roman Empire.

While when R1b-DF27 appeared seems clear, where it originated may be more difficult to pinpoint. If we extrapolated directly from haplogroup frequencies, then R1b-DF27 would have originated in the Basque Country; however, for R1b-DF27 and most of its subhaplogroups, internal diversity measures and age estimates are lower in Basques than in any other population. Then, the high frequencies of R1b-DF27 among Basques could be better explained by drift rather than by a local origin (except for the case of M153; see below), which could also have decreased the internal diversity of R1b-DF27 among Basques. An origin of R1b-DF27 outside the Iberian Peninsula could also be contemplated, and could mirror the external origin of R1b-M269, even if it reaches there its highest frequencies. However, the search for an external origin would be limited to France and Great Britain; R1b-DF27 seems to be rare or absent elsewhere: Y-STR data are available only for France, and point to a lower diversity and more recent ages than in Iberia (Table 3). Unlike in Basques, drift in a traditionally closed population seems an unlikely explanation for this pattern, and therefore, it does not seem probable that R1b-DF27 originated in France. Then, a local origin in Iberia seems the most plausible hypothesis. Within Iberia, Aragon shows the highest diversity and age estimates for R1b-DF27, Z195, and the L176.2 branch, although, given the small sample size, any conclusion should be taken cautiously. On the contrary, Z220 and Z278 are estimated to be older in North Central Spain (N Castile, Cantabria and Asturias). Finally, M153 is almost restricted to the Basque Country: it is rarely present at frequencies >1% elsewhere in Spain (although see the cases of Alacant, Andalusia and Madrid, Suppl. Table 1), and it was found at higher frequencies (10–17%) in several Basque regions; a local origin seems plausible, but, given the scarcity of M153 chromosomes outside of the Basque Country, the diversity and age values cannot be compared.

Within its range, R1b-DF27 shows some geographical differentiation: Western Iberia (particularly, Asturias and Portugal), with low frequencies of R1b-Z195 derived chromosomes and relatively high values of R1b-DF27* (xZ195); North Central Spain is characterized by relatively high frequencies of the Z220 branch compared to the L176.2 branch; the latter is more abundant in Eastern Iberia. Taken together, these observations seem to match the East-West patterning that has occurred at least twice in the history of Iberia: i) in pre-Roman times, with Celtic-speaking peoples occupying the center and west of the Iberian Peninsula, while the non-Indoeuropean eponymous Iberians settled the Mediterranean coast and hinterland; and ii) in the Middle Ages, when Christian kingdoms in the North expanded gradually southwards and occupied territories held by Muslim fiefs.

Contour maps of the derived allele frequencies of the SNPs analyzed in this manuscript. Population abbreviations as in Table 1. Maps were drawn with SURFER v. 12 (Golden Software, Golden CO, USA).

I wouldn’t trust the absence of R1b-DF27 outside France as proof that its origin must be in Western Europe – especially since we have ancient DNA, and that assertion might prove quite wrong – but aside from that the article seems solid in its analysis of modern populations.


Text and figures from the article, licensed under a Creative Commons Attribution 4.0 International License.

Genetic origins of Minoans and Mycenaeans and their continuity into modern Greeks


A new article has appeared in Nature, Genetic origins of the Minoans and Mycenaeans, by Lazaridis et al. (2017), referenced by Science.


The origins of the Bronze Age Minoan and Mycenaean cultures have puzzled archaeologists for more than a century. We have assembled genome-wide data from 19 ancient individuals, including Minoans from Crete, Mycenaeans from mainland Greece, and their eastern neighbours from southwestern Anatolia. Here we show that Minoans and Mycenaeans were genetically similar, having at least three-quarters of their ancestry from the first Neolithic farmers of western Anatolia and the Aegean, and most of the remainder from ancient populations related to those of the Caucasus and Iran. However, the Mycenaeans differed from Minoans in deriving additional ancestry from an ultimate source related to the hunter–gatherers of eastern Europe and Siberia, introduced via a proximal source related to the inhabitants of either the Eurasian steppe or Armenia. Modern Greeks resemble the Mycenaeans, but with some additional dilution of the Early Neolithic ancestry. Our results support the idea of continuity but not isolation in the history of populations of the Aegean, before and after the time of its earliest civilizations.

Samples are scarce, and there is only one Y-DNA haplogroup of Mycenaeans, J2a1 (in Galatas Apatheia, ca. 1700–1200 BC), which shows continuity of haplogroups from Minoan samples, so it does not clarify the potential demic diffusion of Proto-Greeks marked by R1b subclades.

Regarding admixture analyses, it is explicitly or implicitly (according to the press release) stated that:

  • There is continuity between Mycenaeans and living people, so that the major components of the Greeks’ ancestry were in place already in the Bronze Age, after the migration of the earliest farmers from Anatolia.
  • Anatolians may have been the source of “eastern” Caucasian ancestry in Mycenaeans, and maybe of early Indo-European languages (i.e. earlier than Proto-Greek) in the region.
  • The “northern” steppe population (speaking a Late Indo-European dialect, then) had arrived only in mainland Greece, with a 13-18% admixture, by the time studied.
  • Samples before the Final Neolithic (ca. 4100 BC) do not possess either type of ancestry, suggesting that the admixture detected occurred during the fourth to second millennium BC.
  • Levantine or African influence (i.e. Egyptian or Phoenician colonists) cannot be supported by the admixture results.

All in all, there is some interesting new information, including the possibility of obtaining ancient DNA from arid regions, which is promising for future developments in the field.


Featured map: samples studied, from the article.

Something is very wrong with models based on the so-called ‘steppe admixture’ – and archaeologists are catching up


Russian archaeologist Leo Klejn has published an article Discussion: Are the Origins of Indo-European Languages Explained by the Migration of the Yamnaya Culture to the West?, which includes the criticism received from Wolfgang Haak, Iosif Lazaridis, Nick Patterson, and David Reich (mainly on the genetic aspect), and from Kristian Kristiansen, Karl-Göran Sjögren, Morten Allentoft, Martin Sikora, and Eske Willerslev (mainly on the archaeological aspect).

I will not post details of Klejn’s model of North-South Proto-Indo-European expansion – which is explained in the article, and relies on the north-south cline of ‘steppe admixture’ in the modern European population – since it is based on marginal anthropological methods and theories, including glottochronological dates, and archaeological theories from the Russian school (mainly Zalyzniak), which are obviously not mainstream in the field of Indo-European Studies, and (paradoxically) on the modern distribution of ‘steppe admixture’…

The most interesting aspects of the article are the reactions to the criticism, some of which can be used from the point of view of the Indo-European demic diffusion model, too. It is sad, however, that they didn’t choose to respond earlier to Heyd’s criticism (or to Heyd’s model, which is essentially also that of Mallory and Anthony), instead of just waiting for proponents of the least interesting models to react…

The answer by Haak et al.:

Klejn mischaracterizes our paper as claiming that practitioners of the Corded Ware culture spoke a language ancestral to all European Indo-European languages, including Greek and Celtic. This is incorrect: we never claim that the ancestor of Greek is the language spoken by people of the Corded Ware culture. In fact, we explicitly state that the expansion of steppe ancestry might account for only a subset of Indo-European languages in Europe. Klejn asserts that ‘a source in the north’ is a better candidate for the new ancestry manifested in the Corded Ware than the Yamnaya. While it is indeed the case that the present-day people with the greatest affinity to the Corded Ware are distributed in north-eastern Europe, a major part of the new ancestry of the Corded Ware derives from a population most closely related to Armenians (Haak et al., 2015) and hunter-gatherers from the Caucasus (Jones et al., 2015). This ancestry has not been detected in any European hunter-gatherers analysed to date (Lazaridis et al., 2014; Skoglund et al., 2014; Haak et al., 2015; Fu et al., 2016), but made up some fifty per cent of the ancestry of the Yamnaya. The fact that the Corded Ware traced some of its ancestry to the southern Caucasus makes a source in the north less parsimonious.

In our study, we did not speculate about the date of Proto-Indo-European and the locations of its speakers, as these questions are unresolved by our data, although we do think the genetic data impose constraints on what occurred. We are enthusiastic about the potential of genetics to contribute to a resolution of this longstanding issue, but this is likely to require DNA from multiple, as yet unsampled, ancient populations.

Klejn’s response to that:

Allegedly, I had accused the authors of tracing all Indo-European languages back to Yamnaya, whereas they did not trace all of them but only a portion! Well, I shall not reproach the authors for their ambiguous language: it remains the case that (beginning with the title of the first article) their qualifications are lost and their readers have understood them as presenting the solution to the whole question of the origins of Indo-European languages.

(…) they had in view not the Proto-Indo-European before the separation of the Hittites, but the language that was left after the separation. Yet, this was still the language ancestral to all the remaining Indo-European languages, and the followers of Sturtevant and Kloekhorst call only this language Proto-Indo-European (while they call the initial one Indo-Hittite). The majority of linguists (specialists in Indo-European languages) is now inclined to this view. True, the breakup of this younger language is several hundred years more recent (nearly a thousand years later according to some glottochronologies) than the separation of Anatolian languages, but it is still around a thousand years earlier than the birth of cultures derived from Yamnaya.

More than that, I analysed in my criticism both possibilities — the case for all Indo-European languages spreading from Yamnaya and the case for only some of them spreading from Yamnaya. In the latter case, it is argued that only the languages of the steppes, the Aryan (Indo-Iranian) are descended from Yamnaya, not the languages of northern Europe. Together with many scholars, I am in agreement with the last possibility. But, then, what sense can the proposed migration of the Yamnaya culture to the Baltic region have? It would bring the Indo-Iranian proto-language to that region! Yet, there are no traces of this language on the coasts of the Baltic!

My main concern is that, to my mind, one should not directly apply conclusions from genetics to events in the development of language because there is no direct and inevitable dependence between events in the life of languages, culture, and physical structure (both anthropological and genetic). They can coincide, but often they all follow divergent paths. In each case the supposed coincidence should be proved separately.

The authors’ third objection concerns the increase of the genetic similarity of European population with that of the Yamnaya culture. This increases in the north of Europe and is weak in the south, in the places adjacent to the Yamnaya area, i.e. in Hungary. This gradient is clearly expressed in the modern population, but was present already in the Bronze Age, and hence cannot be explained by shifts that occurred in the Early Iron Age and in medieval times. However, the supposed migration of the Yamnaya culture to the west and north should imply a gradient in just the opposite direction!

Regarding the arguments of Kristiansen and colleagues:

[They argue that] in two early burials of the Corded Ware culture (one in Germany, the other in Poland) some single attributes of Yamnaya origin have been found.

(…) if this is the full extent of Yamnaya infiltration into central Europe—two burials (one for each country) from several thousands (and from several hundreds of early burials)—then it hardly amounts to large-scale migration.

Quite recently we have witnessed the success of a group of geneticists from Stanford University and elsewhere (Poznik et al., 2016). They succeeded in revealing varieties of Y-chromosome connected with demographic expansions in the Bronze Age. Such expansion can give rise to migration. Among the variants connected with this expansion is R1b, and this haplogroup is typical for the Yamnaya culture. But what bad luck! This haplogroup connected with expansion is indicated by the clade L11, while the Yamnaya burials are associated with a different clade, Z2103, that is not marked by expansion. It is now time to think about how else the remarkable results reached by both teams of experienced and bright geneticists may be interpreted.

Regarding the work of Heyd,

(…) with regard to the barrow burials of the third millennium BC in the basin of the Danube, although they have been assigned to the Yamnaya culture, I would consider them as also belonging to another, separate culture, perhaps a mixed culture: its burial custom is typical of the Yamnaya, but its pottery is absolutely not Yamnaya, but local Balkan with imports of distinctive corded beakers (Schnurbecher). I would not be surprised if Y-chromosome haplogroups of this population were somewhat similar to those of the Yamnaya, while mitochondrial groups were indigenous. As yet, geneticists deal with great blocks of populations and prefer to match them to very large and generalized cultural blocks, while archaeology now analyses more concrete and smaller cultures, each of which had its own fate.

Iosif Lazaridis shares more thoughts on the discussion in his Twitter account:

As we mentioned in Haak, Lazaridis et al. (2015), the Yamnaya are the best proximate source for the new ancestry that first appears with the Corded Ware in central Europe, as it has the right mix of both ANE (related to Native Americans, MA1, and EHG), but also Armenian/Caucasus/Iran-like southern component of ancestry. The Yamnaya is a westward expansive culture that bears exactly the two new ancestral components (EHG + Caucasus/Iran/Armenian-like).
As for the Y-chromosome, it was already noted in Haak, Lazaridis et al. (2015) that the Yamnaya from Samara had Y-chromosomes which belonged to R-M269 but did not belong to the clade common in Western Europe (p. 46 of supplement). Also, not a single R1a in Yamnaya unlike Corded Ware (R1a-dominated). But Yamnaya samples = elite burials from eastern part of the Yamnaya range. Both R1a/R1b found in Eneolithic Samara and EHG, so in conclusion Yamnaya expansion still the best proximate source for the post-3,000 BCE population change in central Europe. And since 2015 steppe expansion detected elsewhere (Cassidy et al. 16, Martiniano et al. 17, Mittnik et al. 17, Mathieson et al. 17, Lazaridis et al. 2016 (South Asia) and …?…

I love the smell of new wording in the morning… viz. Yamnaya best proximate source for Corded Ware, Corded Ware might account for only a subset of Indo-European languages, Corded Ware representing Aryan languages (probably Klejn misinterprets what the authors mean, i.e. some kind of Indo-Slavonic or Germano-Balto-Slavic group)…

We shall expect more and more ambiguous rewording and more adjustments of previous conclusions as new papers and new criticisms appear.


Featured image from the article: Distribution of the ‘Yamnaya’ genetic component in the populations of Europe (data taken from Haak et al., 2015). The intensity of the colour corresponds to the contribution of this component in various modern populations

Neolithic and Bronze Age Basque-speaking Iberians resisted invaders from the steppe


Good clickbait, right? I have received reports about this new paper in Google Now the whole weekend, and their descriptions are getting worse each day.

The original title of the article published in PLOS Genetics (already known by its preprint in BioRxiv) was The population genomics of archaeological transition in west Iberia: Investigation of ancient substructure using imputation and haplotype-based methods, by Martiniano et al. (2017).

Maybe the title was not attractive enough, so they sent a summary entitled “Bronze Age Iberia received fewer Steppe invaders than the rest of Europe”.

From their article, the only short reference to the linguistic situation of Iberia (as an attempt to sum up potential consequences of the genetic data obtained):

Iberia is unusual in harbouring a surviving pre-Indo-European language, Euskera, and inscription evidence at the dawn of history suggests that pre-Indo-European speech prevailed over a majority of its eastern territory with Celtic-related language emerging in the west. Our results showing that predominantly Anatolian-derived ancestry in the Neolithic extended to the Atlantic edge strengthen the suggestion that Euskara is unlikely to be a Mesolithic remnant. Also our observed definite, but limited, Bronze Age influx resonates with the incomplete Indo-European linguistic conversion on the peninsula, although there are subsequent genetic changes in Iberia and defining a horizon for language shift is not yet possible. This contrasts with northern Europe which both lacks evidence for earlier language strata and experienced a more profound Bronze Age migration.

Judging from the article, more precise summaries of potential consequences would have been “Proto-Basque and Proto-Iberian peoples derived from Neolithic farmers, not Mesolithic or Palaeolithic hunter-gatherers”, or “incomplete Indo-European linguistic conversion of the Iberian Peninsula” – both aspects, by the way, are already known. That would have been quite unromantic, though.

Their carefully selected title has been unsurprisingly distorted at least as “Ancient DNA Reveals Why the Iberian Peninsula Is So Unique”, and “Ancient Iberians resisted Steppe invasions better than the rest of Europe 6,000 years ago”.

So I thought, what the hell, let’s go with the tide. Using the published dataset, I have also helped reconstruct the original phenotype of Bronze Age Iberians, and this is what our Iberian ancestors probably looked like:

Typical Iberian village during the Steppe invasion, according to my phenotype study of Martiniano et al. (2017). Notice typical invaders to the right.

And, by the way, they spoke Basque, the oldest language. Period.

Now, for those new to the article, we already knew that there is less “steppe admixture” in Iberian samples from southern Portugal after the time of east Bell Beaker expansion.

(A) PCA estimated from the CHROMOPAINTER coancestry matrix of 67 ancient samples ranging from the Paleolithic to the Anglo-Saxon period. The samples belonging to each one of the 19 populations identified with fineSTRUCTURE are connected by a dashed line. Samples are placed geographically in 3 panels (with random jitter for visual purposes): (B) Hunter-gatherers; (C) Neolithic Farmers (including Ötzi) and (D) Copper Age to Anglo-Saxon samples. The Portuguese Bronze Age samples (D, labelled in red) formed a distinct population (Portuguese_BronzeAge), while the Middle and Late Neolithic samples from Portugal clustered with Spanish, Irish and Scandinavian Neolithic farmers, which are termed “Atlantic_Neolithic” (C, in green).

However, there is also a clear discontinuity in Neolithic Y-DNA haplogroups (to R1b-P312 haplogroups). That obviously means a male-driven invasion, from the North-West Indo-European-speaking Bell Beaker culture – which in turn did not have much “steppe admixture” compared to other north-eastern cultures, like the Corded Ware culture, probably unrelated to Indo-European languages.

Summary of the samples sequenced in the present study.

As always, trying to equate steppe or Yamna admixture with invasion or language is plainly wrong. Doing it with few samples, and with the wrong assumptions of what “steppe admixture” means, well…

Proto-Basque and Proto-Iberian no doubt survived the Indo-European Bell Beaker migrations, but if Y-DNA lineages were replaced already by the Bronze Age in southern Portugal, there is little reason to support an increased “resistance” of Iberians to Bell Beaker invaders compared to other marginal regions of Europe (relative to the core Yamna expansion in eastern and central Europe).

As you know, Aquitanian (the likely ancestor of Basque) and Iberian were just two of the many non-Indo-European languages spoken in Europe at the dawn of historical records, so to speak about Iberia as radically different than Italy, Greece, Northern Britain, Scandinavia, or Eastern Europe, is reminiscent of the racism (or, more exactly, xenophobia) that is hidden behind romantic views certain people have of their genetic ancestry.

Some groups formed by a majority of R1b-DF27 lineages, now prevalent in Iberia, probably spoke Iberian languages during the Iron Age in north and eastern Iberia, before their acculturation during the expansion of Celtic-speaking peoples, and later during the expansion of Rome, when most of them eventually spoke Latin. In Mediaeval times, these lineages probably expanded Romance languages southward during the Reconquista.

Before speaking Iberian languages, R1b-DF27 lineages (or older R1b-P312) were probably Indo-European speakers who expanded with the Bell Beaker culture from the lower Danube – in turn created by the interaction of Yamna with Proto-Bell Beaker cultures – and probably adopted the native Proto-Basque and Proto-Iberian languages (or possibly the ancestor of both) near the Pyrenees, either by acculturation, or because some elite invaders expanded successfully (their Y-DNA haplogroup) over the general population, for generations.

Maybe some kind of genetic bottleneck happened, that expanded previously not widespread lineages, as with N1c subclades in Finland.

There is nothing wrong with hypothetical models of ancient genetic prehistory: there are still too many potential scenarios for the expansion of haplogroup R1b-DF27 in Iberia. But, please, stop supporting romantic pictures of ethnolinguistic continuity for modern populations. It’s embarrassing.

Featured image from Wikipedia, and Pinterest, with copyright from Albert Uderzo and publisher company Hachette.

Images from the article, licensed CC-by-sa, as all articles from PLOS.

My European Family: The First 54,000 years, by Karin Bojs


I have recently read the book My European Family: The First 54,000 years (2015), by Karin Bojs, a well-known Swedish science journalist, former science editor of the Dagens Nyheter.

It is written in a fresh, dynamic style, contains a general introduction to Genetics, Archaeology, and their relation to language, and was written at a time of great change (2015) for the disciplines involved.

The book is informed, it shows a balanced exercise between responsible science journalism and entertaining content, and it is at times nuanced, going beyond the limits of popular science books. It is not written for scholars, although you might learn – as I did – interesting details about researchers and institutions of the anthropological disciplines involved. It contains, for example, interviews with known academics, which she uses to share details about their personalities and careers, which give – in my opinion – a much needed context to some of their publications.

Since I am clearly biased against some of the findings and research papers which are nevertheless considered mainstream in the field (like the identification of haplogroup R1a with the Proto-Indo-European expansion, or the concept of steppe admixture), I asked my wife (who knew almost nothing about genetics, or Indo-European studies) to read it and write a summary, if she liked it. She did, so much so that I have convinced her to read The Horse, the Wheel, and Language: How Bronze-Age Riders from the Eurasian Steppes Shaped the Modern World (2007), by David Anthony.

Here is her summary of the book, translated from Spanish:

The book is divided into three main parts: The Hunters, The Farmers, and The Indo-Europeans. Each has in turn chapters which introduce and break down information in an entertaining way, mixing it with accounts of her interactions and personal genealogical quest.

Part one, The Hunters, offers intriguing accounts about the direct role music had in the development of the first civilizations, the first mtDNA analyses of dogs (Savolainen), and the discovery of the author’s Saami roots. Explanations about the first DNA studies and their value for archaeological studies are clear and comprehensible for any non-specialized reader. Interviews help give a close view of investigations, like that of Frederic Plassard’s in Les Combarelles cave.

Part two, The Farmers, begins with her travel to Cyprus, and arouses the interest of the reader with her description of the circular houses, her notes on the Basque language, the new papers and theories related to DNA analyses, the theory of the decision of cats to live with humans, the first beers, and the houses built over graves. Karin Bojs analyses the subgroup H1g1 of her grandmother Hilda, and how it belonged to the first migratory wave into Central Europe. This interest in her grandmother’s origins leads her to a conference in Pilsen about the first farmers in Europe, where she learns firsthand of the results of studies by János Jakucs, and studies of nuclear DNA. Later on she interviews Guido Brandt and Joachim Burger, with whom she talks about haplogroups U, H, and J.

The chapter on Ötzi and the South Tyrol Museum of Archaeology (Bolzano) introduces the reader to the first prehistoric individual whose DNA was analysed, belonging to haplogroup G2a4, but also revealing other information on the Iceman, such as his lactose intolerance.

Part three, dealing with the origin of Indo-Europeans, begins with the difficulties that researchers have in locating the origin of horse domestication (which probably happened in western Kazakhstan, in the Russian steppe between the rivers Volga and Don). She mentions studies by David Anthony and on the Yamna culture, and its likely role in the diffusion of Proto-Indo-European. In an interview with Mallory in Belfast, she recalls the potential interest of far-right extremists in genetic studies (and early links of the Journal of Indo-European Studies to certain ideology), as well as controversial statements of Gimbutas, and her potentially biased vision as a refugee from communist Europe. During the interview, Mallory had a copy of the latest genetic paper sent to Nature Magazine by Haak et al., not yet published, for review, but he didn’t share it.

Then haplogroups R1a and R1b are introduced as the most common in Europe. She visits the Halle State Museum of Prehistory (where the Nebra sky disk is exhibited), and later Krakow, where she interviews Slawomir Kadrow, dealing with the potential creation of the Corded Ware culture from a mix of Funnelbeaker and Globular Amphorae cultures. New studies of ancient DNA samples, published in the meantime, are showing that admixture analyses of Yamna and Corded Ware samples correlate at about 75%.

In the following chapters there is a broad review of all studies published to date, as well as individuals studied in different parts of Europe, stressing the importance of ships for the expansion of R1b lineages (Hjortspring boat).

The concluding chapter is dedicated to the Vikings, and is used to demystify them as aggressive warmongers, sketching their relevance as founders of the Russian state.

To sum up, it is a thoroughly documented book, written in a clear style, and is capable of awakening the reader’s interest in genetic and anthropological research. The author enthusiastically looks for new publications and information from researchers, but is at the same time critical of them, often showing her own personal reactions to new discoveries, all of which offers a complex personal dynamic often shared by the reader, who stays engaged with her first-person account for the full length of the book.

Mayte Batalla (July 2017)

DISCLAIMER: The author sent me a copy of the book (a translation into Spanish), so there is a potential conflict of interest in this review. She didn’t ask for a review, though, and it was my wife who did it.

The Aryan migration debate, the Out of India models, and the modern “indigenous Indo-Aryan” sectarianism


The Proto-Indo-European Urheimat

Not long ago, the Proto-Indo-European language Urheimat problem used to be cyclic in nature: linguistic and archaeological publications appeared supporting a Copper Age migration from the steppe proposed by Marija Gimbutas, or a Neolithic expansion from Anatolia (or Armenia) proposed by Colin Renfrew, and back again.

I have always supported the simpler, more recent Chalcolithic migration of Late Indo-Europeans from the Pontic-Caspian steppe over an older Neolithic expansion from Anatolia with agriculture. The latter model implied a complex cultural diffusion over a greater span of time than is warranted by linguistic guesstimates, understood as the general grasp that anyone can have on how much a language changes in time, comparing the different stages of different Indo-European languages. Whether or not they like to talk about them, and whether they would describe them as such (or else as terminus ante or post quem), most known linguists and archaeologists involved in Indo-European studies have published at some point their own guesstimates.

To have an idea about how guesstimates work, you only have to learn some Indo-European languages from different branches, the ancient languages from which they are derived, how they have evolved from them through time, and their proto-languages, to see how unlikely it is that the differences from Late Indo-European to Proto-Greek, Proto-Indo-Iranian, Proto-Celtic, or Proto-Italic need a leap of ca. 3000 years almost without change, as required by the Anatolian hypothesis. Some have strong reactions against guesstimates, arguing that you cannot compare historic or proto-historic changes to prehistoric ones, in order to support a different rate of linguistic change from Proto-Indo-European to the proto-languages. I find this to be a sound criticism, but it is often used to justify a worse, ad hoc estimate that supports another theory.

Glottochronology – in case you are looking for mathematics or statistics to solve the problem – is as useless today as it always was. Not everything – in fact few things in anthropology – can be solved with algorithms and statistics. I do love algorithms and statistics, because their results – if based on sound assumptions – are hard to contest, but not a single good one has been proposed for comparative grammar, as far as I know.

Algorithms solve everything

Steppe hypothesis

The steppe hypothesis was always the simpler connection with modern Indo-European languages, from a linguistic and archaeological point of view, and archaeogenetics (since the advent of haplogroup investigation, and the finding of modern R1a distribution) also supported it. However, it implied a conquest by warring patrilocal peoples that replaced the ‘original’ Neolithic European and Asian population and languages, and invasions have not been a fashionable anthropological subject for a long time.

One of the consequences of the genocidal racism and xenophobia seen during World War II was the strong reaction to its ideological foundations, and there was a common will to put an end to Kossinna’s trend of historic ethnolinguistic identification of modern peoples. Linguistics and archaeology then searched for more complex models of human relations and exchange, mostly to avoid what appeared as simplistic concepts of migration or invasion. Marija Gimbutas’ simplistic model of a kurganist, male-driven invasion of territories inhabited by matrilocal Old Europeans, albeit reasonable, did not fit well with these post-war times. One could accept historic and proto-historic atrocities and genocide by any people against others, and even tribal conflicts between prehistoric hunter-gatherers that ended in the destruction of one of them, but a violent, massive spread of ‘Aryans’ was considered a dangerous idea to be avoided.

Thanks to the effort of David Anthony (among others) in supporting migration models in Archaeology, the steppe model did have a strong revival even before archaeogenetics began to be a thing in anthropological research.

Anatolian hypothesis

The Anatolian hypothesis, on the other hand, seemed like a fine, long evolution of a language accompanying the peaceful spread of a technological innovation, farming and cattle herding. Originally believed to be mostly a cultural diffusion (now it has been demonstrated to be a mixed diffusion event, with strong demic diffusion in its early phase), it was thus in line with a more politically correct view of prehistoric events.

This cultural diffusion in turn gave way to more peaceful and innovative solutions to language spread, like waves of expansion, or a constellation of languages influencing each other for long periods, so that even the potential reconstruction of a single Proto-Indo-European language or people was doubted. Prehistoric friendly neighbours would have adopted farming and exchanged goods and languages for thousands of years, and only with proto-historic events did people have ethnolinguistic identification that caused conflicts…

While recently there have been some doubts expressed by Mathieson et al. (2017) on the steppe hypothesis regarding Proto-Anatolian, it is likely that the lack of enough ancient DNA from the Balkans and Anatolia is the key factor here.

An interesting linguistic proposal, the glottalic theory, while sound in its assumptions and results – much less likely in my opinion than the more common two-dorsal theory, and this much more likely than the prevalent three-dorsal one – gave some theoretical support to the Anatolian (or Armenian) hypothesis, since some proponents felt that a glottalic Proto-Indo-European should have an origin near to the Armenian homeland – because glottalic Proto-Armenian would have retained a phonetic state nearer to the “original” Proto-Indo-European.

That simplistic regional continuity explanation is akin to the trend of Basque researchers to discover links of Proto-Basque with the Pyrenees in Mesolithic and Palaeolithic times, when there is no data to warrant such identifications – and it seems in fact that Proto-Basque, Proto-Iberian, and Palaeo-Sardinian might have accompanied the expansion of farming in the Neolithic. Probably most remaining proponents of the glottalic theory today (like Frederik Kortlandt and Alan Bomhard) would accept a steppe migration unrelated to an Armenian or Anatolian origin.

Marginal proposals

There were indeed other marginal proposals, with people supporting origins of Proto-Indo-European at both ends of the current distribution of Indo-European languages, from the “Indo-” in Out of India theories, to the “-European” in Eurocentric proposals. Most Eurocentric proposals – based on certain archaeological cultures and their evolution in- and outside Europe – have been dismissed with archaeological and genetic research, and the remaining ones usually favour the more fashionable peaceful spread of languages.

Palaeolithic Continuity Theory

A small group in support of the more recent Palaeolithic Continuity Theory remains. It seems to me deeply flawed from a linguistic point of view (with a much larger time span needed than for a Neolithic expansion), but their arguments are led by research on genetics and archaeology, and not much is left for European romanticism, so it has always appeared to me as a professionally acceptable – although futile – attempt by eccentric researchers to disentangle prehistoric events.

Similar to what happens with proponents of the Anatolian hypothesis, new linguistic, archaeological, and genetic research is used to remake PCT models (instead of just dismissing the theory), so it is likely that we will have many different proposals of stepped population movements that will make both models eventually converge with the steppe migration theory, to the point where only the steppe migration theory remains, with some added details on its most ancient origin. I guess sometimes it is difficult to let (part of) your life’s research just go away without fighting for some recognition… You desperately look for a pat on the back from some colleagues, even out of pity, who will tell you ‘it seems you might have been right in some details, after all!’…

Out of India

The Out of India theory is the name given to a group of (mostly) independent models that usually propose a Proto-Indo-European homeland based on or around India. Contrary to the PCT, an Out of India theory set during the Mesolithic or Neolithic would be feasible from a linguistic point of view: you could somehow connect some archaeological migrations to support an east-to-west (and northward) spread of Early Proto-Indo-European-speaking R1a lineages, and genetically it had support in some papers on the modern distribution of R1a subclades, for example in Underhill et al. (2014). Underhill himself has since questioned his conclusion in view of recent papers publishing ancient DNA analysis.

Out of India theories, overall, could thus be as strong (or as weak) as the theories concerning an Anatolian origin, in their potential for explanation of the ancient origin of the Proto-Indo-European language spoken in the steppe during the Neolithic and Chalcolithic. However feasible they might a priori be, I have yet to encounter a decent modern paper with that kind of proposal, based on recent genetic papers. Most modern articles are just Indian nationalist crap, and the only decent papers on this matter are becoming quite old for this relatively young field of Indo-European studies. Maybe that’s because I don’t have enough time to look for the hidden good anthropological papers among so much dirt. After all, it is not a very likely theory, and one has a limited amount of time.

In recent papers, if you get rid of simplistic reactionary and revisionist views, conservative Indo-Aryan Hindu nationalist or religious bigotry, fantastic connections with the Indus Valley civilization, and simplistic identifications of Proto-Indo-European as ‘nearer’ to Vedic Sanskrit (with absurdly old and odd references to Schleicher’s reconstruction and dialectal Indo-Slavonic or Satem references), you are left at best with some basic criticisms of Eurocentrism and the known shortcomings of anthropological disciplines in investigating the Proto-Indo-European Urheimat, but no data to support any connection with India whatsoever.

If there is a reason for a generalised inferiority complex in India, I would find it in the shameless publication and popularity of such worthless research papers, a trend that is also seen in scientific fields, with Indian researchers having an increasingly tough time passing editorial and peer reviews, and resorting thus to national journals. In the case of Indo-European studies, instead of trying to fit data with what we know, the only aim in Indian research seems to be to connect the Indus Valley with Proto-Indo-European, and Proto-Indo-European with a “pure” (i.e. Vedic) Indo-Aryan, to support a mythological Indo-Aryan Hinduist India. And that is mostly what you will find in any Out of India article today, whether based on linguistic, archaeological, or – what is prevalent today – genetic investigation.

This has been The Out of India Controversy Week: it began last week with the publication of a quite decent article in The Hindu by Tony Joseph summing up the current situation of anthropological research. It was followed by reactions in conservative Indian news, and this in turn was contested by Davidski and Razib Khan. The original article by Tony Joseph has been echoed by Victor Mair in Language Log, and I agree with his description of Joseph’s article as “informed, sensitive, balanced, and nuanced. This is responsible science journalism”, even if I disagree with some of his statements (in a different way than Mr. Mair). However, this propaganda disguised as scientific criticism is what you get from Indian nationalists.

EDIT (25/6/2017): Razib Khan has published a thorough post on Indian evolutionary genetics as follow-up to this week’s controversy. I think there is too much effort being invested during these controversies precisely by the people who need not explain themselves. Anyway, good summaries of anthropological matters are always welcome.

EDIT (29/6/2017): Other posts on the subject, from Brown Pundits: On the “Aryan” debate – the linguistics POV; Razib Khan’s Indian genetics, part n of many; and Aryan Migration and its Discontents.

Interestingly, any time new research comes to shake certain Indian nationalist foundations, a stronger backfire effect happens, and more criticism is levelled at the shortcomings of such anthropological research. Because, indeed, if the anthropological theory is flawed, mythical Indo-Aryans spread from the Indus Valley, right…? One can only expect this kind of controversy to escalate in conservative Indian blogs and fora alike, and then de-escalate until the next paper is published. A dialectic cycle whose only evident result is the increased opposition that conservative Indian researchers – or researchers that depend on funding by such groups – will have in publishing anything related to a potential Aryan invasion, and the addition of a stronger bias in Indian research.

Western European history

It might well be because I am western European, and western Europeans tend to accept multiple invasions from the East quite well. After all, they have happened so many times in proto-historical and historical times that they are part of our ethnolinguistic nation-building lore. French people trace their history to the expansion of Celts, Romans, and Franks; Spaniards and Portuguese trace it to the spread of Celts, Ibero-Basques, Romans, and Visigoths; Italians to the expansion of Etruscans, Celts and Italics, Romans, Ostrogoths and Langobards; the English to the expansion of Celts, Angles and Saxons, Vikings, and Normans…

It often seems to me that western Europeans will romanticise their origins no matter what appears in historic and genetic investigation: if Neanderthals are unrelated to Europeans, they are ‘cavemen’; if they intermixed with our ancestors, then they suddenly become quite human in their behaviour, and it is great to have more Neanderthal admixture. If Indo-European-speaking R1a lineages invaded central Europe from the east, and transferred their languages, great, because “we” are heirs of original western European hunter-gatherers of Palaeolithic R1b lineages; if R1b lineages represent an invasion of eastern peoples speaking Late Indo-European, great too, because it means that our paternal forefathers were the ‘original’ Indo-European speakers…

This reaction (our history is great no matter what) seems to be a good one for research, since it allows for any change in our romantic views of the past. This, however, does not seem to be the case for some nations, and this inability to change their views is likely related to the inferiority complex that some nations have developed, in turn probably caused by western European colonialism, so one is left to wonder how responsible we are for modern chauvinist trends.

The sad future

Seeing how so many people of eastern European ancestry are convinced of an origin of R1a-M417 in Indo-European migrations from Yamna – when there is not (yet?) a single piece of evidence for it – may be just as troubling as the Indian case, or maybe more, since it affects an important part of Europe. I cannot believe that even today only western Europeans are capable of romanticising their own past no matter what, while the rest of the world lives in a quest to appropriate whatever they view as some great ancient culture, people, or language for their own ancestors.

I have already received complaints and have seen people (of Y-DNA haplogroup R1a) complain online that their forefathers cannot have been Uralic speakers, and some Uralic speakers (of haplogroup N) that original Uralic speakers cannot have been of R1a lineages. Firstly, if I were eastern European – be it a Germanic, Balto-Slavic, or Uralic speaker, or a speaker of Indo-Aryan languages, of R1a or N lineage, whatever my country of origin – I like to think I would prefer to know where my forefathers actually came from, and what languages they did in fact speak thousands of years ago, even if that disrupts everything I or my fellow countrymen (wrongly) assumed for a long time. Secondly, we – as western Europeans speaking Romance or Germanic languages – have the right to know exactly how our peoples and languages really came to be, even if that means disrupting others’ dreams. Our paternal ancestors probably changed languages 3 or 4 times during their multiple migrations from the east, and were not peaceful hunter-gatherers living since the Palaeolithic in the same region we do now, as traditionally held; if we can get over this, eastern Europeans and Indians can get over it, too.

I think everyone deserves to know the truth, and they will eventually like it and fantasise about it. But many individuals want to disrupt any possible change to keep their current ethnic and nationalist agendas untouched, and that can affect us all. Nationalistic and romantic trends are understandable: Romans needed Virgil at the peak of their conquests to tell them that they had a glorious past in Troy, connecting them to the immortal Greek epics. The most important lesson one can learn from that example is that Italian researchers are still (2000 years later!) influenced by that myth, and they keep trying to look for Anatolian remains in Latin studies, and in the archaeology and evolutionary genetics of Italy. I guess you could therefore say these mythification trends are naturally human… but wasting so much time on quests for mythological identities seems absurd, and can only damage research.

It is sad to think about future generations of Indians looking for any sign to support an autochthonous Indo-Aryan homeland, while the rest of the world keeps moving in the right direction…

(Note: featured image is licensed CC-by-sa 4.0 from Avantiputra7 at Wikipedia)