Genetic landscapes showing human genetic diversity aligning with geography


New preprint at bioRxiv: Genetic landscapes reveal how human genetic diversity aligns with geography, by Peter, Petkova, and Novembre (2017).

Abstract:

Summarizing spatial patterns in human genetic diversity to understand population history has been a persistent goal for human geneticists. Here, we use a recently developed spatially explicit method to estimate “effective migration” surfaces to visualize how human genetic diversity is geographically structured (the EEMS method). The resulting surfaces are “rugged”, which indicates the relationship between genetic and geographic distance is heterogenous and distorted as a rule. Most prominently, topographic and marine features regularly align with increased genetic differentiation (e.g. the Sahara desert, Mediterranean Sea or Himalaya at large scales; the Adriatic, inter-island straits in near Oceania at smaller scales). We also see traces of historical migrations and boundaries of language families. These results provide visualizations of human genetic diversity that reveal local patterns of differentiation in detail and emphasize that while genetic similarity generally decays with geographic distance, there have regularly been factors that subtly distort the underlying relationship across space observed today. The fine-scale population structure depicted here is relevant to understanding complex processes of human population history and may provide insights for geographic patterning in rare variants and heritable disease risk.

“Regional patterns of genetic diversity. a: scale bar for relative effective migration rate. Posterior effective migration surfaces for b: Western Eurasia (WEA), e: Central/Eastern Eurasia (CEA), g: Africa (AFR), h: Southern African hunter-gatherers (SAHG), and k: Southeast Asian (SEA) analysis panels. ‘X’ marks locations of samples noted as displaced or recently admixed, ‘H’ denotes Hunter-Gatherer populations (both ‘X’ and ‘H’ samples are omitted from the EEMS model fit); in panel g, red circles indicate Nilo-Saharan speakers and in panel h, ‘B’ denotes Bantu-speaking populations. Approximate locations of troughs are shown with dashed lines (see Extended Data Figure 4). PCA plots: c: WEA, d: Europeans in WEA, f: CEA, i: SAHG, j: AFR, l: SEA. Individuals are displayed as grey dots. Large dots reflect median PC position for a sample, with colors reflecting geography matched to the corresponding EEMS figure. In the EEMS plots, approximate sample locations are annotated. For exact locations, see annotated Extended Data Figure 4 and Table S1. Features discussed in the main text and supplement are labeled. FST values per panel emphasize the low absolute levels of differentiation.”
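The caption gives FST values per panel but does not say here which estimator was used. As a rough illustration of why such values come out small, here is a minimal sketch of Hudson's FST combined across SNPs as a ratio of averages. This is an assumed, commonly used estimator, not necessarily the one Peter et al. applied; the input format (per-SNP sample allele frequencies plus haploid sample sizes) is also an assumption.

```python
import numpy as np

def hudson_fst(p1, p2, n1, n2):
    """Hudson's FST for two populations, combined as a ratio of averages over SNPs.

    p1, p2: per-SNP sample allele frequencies in each population.
    n1, n2: number of sampled chromosomes (haploid sample size) per population.
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    # Numerator: squared frequency difference minus a sampling-noise correction
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    # Denominator: probability that one allele from each population differs
    den = p1 * (1 - p2) + p2 * (1 - p1)
    # Sum over SNPs before dividing (ratio of averages, not average of ratios)
    return num.sum() / den.sum()
```

Fixed differences give 1.0, while a 5-point frequency difference at a single SNP (0.50 vs 0.55, 1,000 chromosomes each) already gives a value around 0.004, the order of magnitude the captions call "low absolute levels of differentiation".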

Among the ‘effective migration surfaces’ (or potential past migration routes), the Pontic-Caspian steppe and its most direct connection with the Carpathian Basin, the Danubian plains, appear, perhaps paradoxically, as a constant ‘trough’ (below-average migration rate) in all maps.

After all, one would a priori expect this region to have been the route of many migrations from the steppe and Asia into Central Europe (and thus of ‘effective migration’) in prehistoric, proto-historic and historic times, such as Suvorovo-Novodanilovka (Pre-Anatolian), Yamna (Late Indo-European), probably Srubna, Scythian-Cimmerian, Sarmatian, Huns, Goths, Avars, Slavs, and Mongols.

It most likely (at least partially) represents a rather recent historical barrier to admixture, involving successive Byzantine, South Slavic, and Ottoman spheres of influence positioned against Balto-Slavic societies of Eastern Europe.

Location of troughs in West Eurasia (below average migration rate in more than 95% of MCMC iterations) are given in brown. Sample locations and EEMS grid are displayed for the West Eurasian analysis panel. FST values are provided per panel to emphasize the low absolute levels of differentiation.
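The caption defines a trough as a location whose effective migration rate is below average in more than 95% of MCMC iterations. A minimal sketch of that posterior-probability rule follows. It assumes, hypothetically, that the sampler's output is available as a matrix of log10 effective migration rates (iterations × grid vertices), and it takes "average" to mean the mean across vertices within each iteration. This is an illustration of the rule, not the EEMS code.

```python
import numpy as np

def find_troughs(log_m, threshold=0.95):
    """Flag grid vertices that are below average in more than `threshold` of iterations.

    log_m: array of shape (n_iterations, n_vertices) of log10 effective
           migration rates sampled by MCMC (hypothetical input format).
    Returns a boolean array with one entry per vertex.
    """
    log_m = np.asarray(log_m, dtype=float)
    # Relative rate: each vertex compared with that iteration's mean over all vertices
    relative = log_m - log_m.mean(axis=1, keepdims=True)
    # Posterior probability that each vertex is below average
    p_below = (relative < 0).mean(axis=0)
    return p_below > threshold
```

A vertex that is below average in, say, 90% of iterations is not flagged. That is why the mapped troughs are a conservative subset of the low-migration areas visible in the posterior mean surfaces.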

Featured image, from the article: “Large-scale patterns of population structure. a: EEMS posterior mean effective migration surface for Afro-Eurasia (AEA) panel. ‘X’ marks locations of samples excluded as displaced or recently admixed. ‘H’ marks locations of excluded hunter-gatherer populations. Regions and features discussed in the main text are labeled. Approximate locations of troughs are annotated with dashed lines (see Extended Data Figure 4). b: PCA plot of AEA panel: Individuals are displayed as grey dots, colored dots reflect median of sample locations; with colors reflecting geography and matching with the EEMS plot. Locations displayed in the EEMS plot reflect the position of populations after alignment to grid vertices used in the model (see methods).”

Images and text available under a CC-BY-NC-ND 4.0 International License.

Discovered via Razib Khan’s blog.


Asian ancestry of the Roma people in Europe

New article, Tau haplotypes support the Asian ancestry of the Roma population settled in the Basque Country, by Alfonso-Sánchez et al., Nature (2017).

Abstract:

We examined tau haplotype frequencies in two different ethnical groups from the Basque Country (BC): Roma people and residents of European ancestry (general population). In addition, we analyzed the spatial distribution of tau haplotypes in Eurasian populations to explore the genetic affinities of the Romani groups living in Europe in a broader scope. The 17q21.31 genomic region was characterized through the genotyping of two diagnostic single nucleotide polymorphisms, SNPs (rs10514879 and rs199451), which allow the identification of H1 and H2 haplotypes. A significant heterozygous deficit was detected in the Romani for rs10514879. The H2 haplotype frequency proved to be more than twice in the BC general population (0.283) than in the Roma people (0.127). In contrast, H2 frequency proved to be very similar between Basque and Hungarian Romani, and similar to the H2 frequencies found in northwestern India and Pakistan as well. Several statistical analyses unveiled genetic structuring for the MAPT diversity, mirrored in a significant association between geography and genetic distances, with an upward trend of H2 haplotype frequencies from Asia to Europe. Yet, Roma samples did not fit into this general spatial patterning because of their discrepancy between geographical position and H2 frequency. Despite the long spatial coexistence in the Basque region between the residents of European ancestry and the Roma, the latter have preserved their Asian genetic ancestry. Bearing in mind the lack of geographical barriers between both ethnical groups, these findings support the notion that sociocultural mores might promote assortative matings in human populations.
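The abstract reports a significant heterozygote deficit for rs10514879 in the Roma sample. Such a deficit is usually expressed as the inbreeding coefficient F = 1 − Ho/He, where He = 2pq is the heterozygosity expected under Hardy-Weinberg proportions. Below is a minimal sketch from genotype counts. The counts in the usage note are hypothetical, not the study's data, and the paper's own significance test is not reproduced.

```python
def heterozygote_deficit(n_aa, n_ab, n_bb):
    """Return (observed heterozygosity, expected heterozygosity, F) for a biallelic SNP."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    h_exp = 2 * p * q  # Hardy-Weinberg expectation
    if h_exp == 0:
        raise ValueError("monomorphic site: expected heterozygosity is zero")
    h_obs = n_ab / n
    f = 1 - h_obs / h_exp  # F > 0 means fewer heterozygotes than expected
    return h_obs, h_exp, f
```

For instance, hypothetical counts of 40/20/40 give an observed heterozygosity of 0.2 against an expected 0.5, i.e. F ≈ 0.6. Assortative mating or population subdivision within the sample would push F above zero in this way.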

“Regression line and 95% confidence intervals (dashed lines) in a regression analysis of tau H2 haplotype frequencies on the rotated geographical coordinates (H2 freq = 0.4256 − coord × 0.000083) of 35 European and Asian populations (coefficient of determination, r² = 0.515). Populations examined in this study are highlighted with a frame. Solid circles are European populations, solid squares are Middle Eastern populations, and solid triangles represent South Asian populations. Romani populations are designated by stars. Population labels: BCRoma (Basque Country Roma), BC-resid (Basque Country general population), BC-Spain (Iberian Basques), BC-French (French Basques), UK (British), IT-Sardn (Sardinia, Italy), ITBergm (Bergamo, Italy), ITBresc (Brescia, Italy), IT-Tuscn (Tuscany, Italy), HU-Roma (Hungarian Roma), Palestn (Palestinians), and Samartn (Samaritans)”
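The fitted line in the caption can be read in both directions. As a worked illustration, inverting it gives the position along the rotated axis at which the regression would predict each of the two H2 frequencies reported in the abstract for the Basque Country samples. The caption does not state the units of the rotated coordinate, so the results are only meaningful in those unstated units.

```python
INTERCEPT = 0.4256
SLOPE = -0.000083  # change in H2 frequency per unit of rotated coordinate

def predicted_h2(coord):
    """H2 frequency predicted by the regression line at a rotated coordinate."""
    return INTERCEPT + SLOPE * coord

def coord_for_h2(h2_freq):
    """Invert the regression line: coord = (h2 - intercept) / slope."""
    return (h2_freq - INTERCEPT) / SLOPE

for label, h2 in [("BC general population", 0.283), ("BC Roma", 0.127)]:
    print(f"{label}: H2 = {h2} -> rotated coordinate ~ {coord_for_h2(h2):.0f}")
```

The two samples live in the same region, so they share one geographic coordinate. Their frequencies nonetheless map to points roughly 1,900 units apart on the axis (about 1718 vs 3598). Given the negative slope and the Asia-to-Europe increase in H2, the Roma value lies much further toward the Asian end of the cline, which is the mismatch between geography and H2 frequency that the abstract describes.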

I just realized I forgot to include the migration of Indo-Aryan Roma people in the map of medieval migrations… I shall correct that in future versions.

Map showing the migrations of Romani people through Europe and Asia minor. From Wikipedia.

Featured image: Map of Romani dialects. From Wikipedia, by ArnoldPlaton.

Migration vs. Acculturation models for Aegean Neolithic in Genetics — still depending strongly on Archaeology


Recent paper in Proceedings of the Royal Society B: Archaeogenomic analysis of the first steps of Neolithization in Anatolia and the Aegean, by Kılınç et al. (2017).

Abstract:

The Neolithic transition in west Eurasia occurred in two main steps: the gradual development of sedentism and plant cultivation in the Near East and the subsequent spread of Neolithic cultures into the Aegean and across Europe after 7000 cal BCE. Here, we use published ancient genomes to investigate gene flow events in west Eurasia during the Neolithic transition. We confirm that the Early Neolithic central Anatolians in the ninth millennium BCE were probably descendants of local hunter–gatherers, rather than immigrants from the Levant or Iran. We further study the emergence of post-7000 cal BCE north Aegean Neolithic communities. Although Aegean farmers have frequently been assumed to be colonists originating from either central Anatolia or from the Levant, our findings raise alternative possibilities: north Aegean Neolithic populations may have been the product of multiple westward migrations, including south Anatolian emigrants, or they may have been descendants of local Aegean Mesolithic groups who adopted farming. These scenarios are consistent with the diversity of material cultures among Aegean Neolithic communities and the inheritance of local forager know-how. The demographic and cultural dynamics behind the earliest spread of Neolithic culture in the Aegean could therefore be distinct from the subsequent Neolithization of mainland Europe.

The paper's analysis highlights two points about the Neolithisation of the Aegean, a process that is essential for assessing the impact of later Indo-European migrations of Proto-Anatolian, Proto-Greek and other Palaeo-Balkan speakers (text partially taken verbatim from the paper):

  • The observation that the two central Anatolian populations cluster together to the exclusion of Neolithic populations of south Levant or of Iran restates the conclusion that farming in central Anatolia in the PPN was established by local groups instead of immigrants, which is consistent with the described cultural continuity between central Anatolian Epipalaeolithic and Aceramic communities. This reiterates the earlier conclusion that the early Neolithisation in the primary zone was largely a process of cultural interaction instead of gene flow.
Principal component analysis (PCA) with modern and ancient genomes. The eigenvectors were calculated using 50 modern west Eurasian populations, onto which genome data from ancient individuals were projected. The gray circles highlight the four ancient gene pools of west Eurasia. Modern-day individuals are shown as gray points. In the Near East, Pre-Neolithic (Epipaleolithic/Mesolithic) and Neolithic individuals genetically cluster by geography rather than by cultural context. For instance, Neolithic individuals of Anatolia cluster to the exclusion of individuals from the Levant or Iran. In Europe, genetic clustering reflects cultural context but not geography: European early Neolithic individuals are genetically distinct from European pre-Neolithic individuals but tightly cluster with Anatolians. PPN: Pre-Pottery/Aceramic Neolithic, PN: Pottery Neolithic, Tepecik: Tepecik-Çiftlik (electronic supplementary material, table S1 lists the number of SNPs per ancient individual).
  • The realisation that there are still two possibilities regarding the question of whether Aegean Neolithisation (post-7000 cal BC) involved similar acculturation processes, or was driven by migration similar to Neolithisation in mainland Europe — a long-standing debate in Archaeology:
    1. Migration from Anatolia to the Aegean: the Aegean Neolithisation must have involved replacement of a local, WHG-related Mesolithic population by incoming easterners. Central Anatolia or south Anatolia / north Levant (for which no data are available) are potential origins of the components observed. Notably, the north Aegeans – Revenia (ca. 6438-6264 BC) and Barcın (ca. 6500-6200 BC) – show higher diversity than the central Anatolians, and the population size of Aegeans was larger than that of central Anatolians. The lack of WHG in later samples indicates that they must have been fully replaced by the eastern migrant farmers.
    2. Adoption of Neolithic elements by local foragers: Alternatively, the Aegean coast Mesolithic populations may have been part of the Anatolian-related gene pool that occupied the Aegean seaboard during the Early Holocene, in an “out-of-the-Aegean” hypothesis. Following the LGM, Aegean emigrants would have dispersed into central Anatolia and established populations that eventually gave rise to the local Epipalaeolithic and later Neolithic communities, in line with the earliest direct evidence for human presence in central Anatolia ca. 14,000 cal BCE.
  • On the archaeological evidence (excerpt):

    Instead of a single-sourced colonization process, the Aegean Neolithization may thus have flourished upon already existing coastal and interior interaction networks connecting Aegean foragers with the Levantine and central Anatolian PPN populations, and involved multiple cultural interaction events from its early steps onward [16,20,64,74]. This wide diversity of cultural sources and the potential role of local populations in Neolithic development may set apart Aegean Neolithization from that in mainland Europe. While Mesolithic Aegean genetic data are awaited to fully resolve this issue, researchers should be aware of the possibility that the initial emergence of the Neolithic elements in the Aegean, at least in the north Aegean, involved cultural and demographic dynamics different than those in European Neolithization.

    Featured image, from the article: “Summary of the data analyzed in this study. (a) Map of west Eurasia showing the geographical locations and (b) timeline showing the time period (years BCE) of ancient individuals investigated in the study. Blue circles: individuals from pre-Neolithic context; red triangles: individuals from Neolithic contexts”.


Before steppe ancestry: Europe’s genetic diversity shaped mainly by local processes, with varied sources and proportions of hunter-gatherer ancestry


The final version of a bioRxiv preprint has been published in Nature: Parallel palaeogenomic transects reveal complex genetic history of early European farmers, by Lipson et al. (2017).

The dataset with all new samples is available at the Reich Lab’s website. You can try my drafts on how to do your own PCA and ADMIXTURE analysis with some of their new datasets.

Abstract:

Ancient DNA studies have established that Neolithic European populations were descended from Anatolian migrants who received a limited amount of admixture from resident hunter-gatherers. Many open questions remain, however, about the spatial and temporal dynamics of population interactions and admixture during the Neolithic period. Here we investigate the population dynamics of Neolithization across Europe using a high-resolution genome-wide ancient DNA dataset with a total of 180 samples, of which 130 are newly reported here, from the Neolithic and Chalcolithic periods of Hungary (6000–2900 BC, n = 100), Germany (5500–3000 BC, n = 42) and Spain (5500–2200 BC, n = 38). We find that genetic diversity was shaped predominantly by local processes, with varied sources and proportions of hunter-gatherer ancestry among the three regions and through time. Admixture between groups with different ancestry profiles was pervasive and resulted in observable population transformation across almost all cultural transitions. Our results shed new light on the ways in which gene flow reshaped European populations throughout the Neolithic period and demonstrate the potential of time-series-based sampling and modelling approaches to elucidate multiple dimensions of historical population interactions.

There were some interesting finds on a regional level, with some late survival of hunter-gatherer ancestry (and Y-DNA haplogroups) in certain specific sites, but nothing especially surprising. This survival of HG ancestry and lineages in Iberia and other regions may be used to revive (yet again) the controversy over the origin of non-Indo-European languages of Europe attested in historical times, such as the only (non-Uralic) one surviving to this day, the Basque language.

This study does, however, confirm once again the absence of Y-DNA R1b-M269 subclades in Central Europe before the arrival of Yamna migrants, which offers strong reasons to reject the ‘Indo-European from the West’ hypothesis.

Here are first the PCA of samples included in this paper, and then the PCA of ancient Eurasians (Mathieson et al. 2017) and modern populations (Lazaridis et al. 2014) for comparison of similar clusters:

First two principal components from the PCA. We computed the principal components (PCs) for a set of 782 present-day western Eurasian individuals genotyped on the Affymetrix Human Origins array (background grey points) and then projected ancient individuals onto these axes. A close-up omitting the present-day Bedouin population is shown. From Lipson et al. (2017).
PCA of South-East European and other European samples from Mathieson et al. (2017)
Ancient and modern samples, from Lazaridis et al. (2014).
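The Lipson et al. caption describes the approach behind these PCA plots: principal axes are computed from present-day individuals only, and ancient individuals, which often have many missing SNPs, are then projected onto those fixed axes. Below is a simplified sketch. It assumes a genotype matrix coded 0/1/2 with NaN for missing calls, and it projects each ancient individual by least squares on its observed SNPs. That is in the spirit of smartpca's lsqproject option, but it is not that program's code, and it skips smartpca's per-SNP normalisation.

```python
import numpy as np

def fit_pcs(modern, n_pcs=2):
    """SNP means and loadings (top PCs) from a complete modern genotype matrix.

    modern: array (n_individuals, n_snps) of 0/1/2 genotypes, no missing data.
    """
    mean = modern.mean(axis=0)
    centered = modern - mean
    # Rows of vt are the principal axes in SNP space
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_pcs].T  # loadings: (n_snps, n_pcs)

def project(ancient, mean, loadings):
    """Project individuals with missing genotypes (NaN) onto fixed axes by least squares."""
    coords = []
    for row in np.atleast_2d(ancient):
        observed = ~np.isnan(row)
        # Solve (row - mean) ~= loadings @ t using only the observed SNPs
        t, *_ = np.linalg.lstsq(loadings[observed], row[observed] - mean[observed], rcond=None)
        coords.append(t)
    return np.array(coords)
```

Because the axes are fixed by present-day variation, ancient samples can only be placed along directions that separate modern populations. This is why clusters in such plots are best read relative to the modern background rather than as independent structure.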


Holocene rise in mobility in at least three stages: Strong link between technological change and human mobility in Western Eurasia


An interesting new article in PNAS: Estimating mobility using sparse data: Application to human genetic variation, by Loog et al. (2017).


Significance

Migratory activity is a critical factor in shaping processes of biological and cultural change through time. We introduce a method to estimate changes in underlying migratory activity that can be applied to genetic, morphological, or cultural data and is well-suited to samples that are sparsely distributed in space and through time. By applying this method to ancient genome data, we infer a number of changes in human mobility in Western Eurasia, including higher mobility in pre- than post-Last Glacial Maximum hunter–gatherers, and oscillations in Holocene mobility with peaks centering on the Neolithic transition and the beginnings of the Bronze Age and the Late Iron Age.

Abstract

Mobility is one of the most important processes shaping spatiotemporal patterns of variation in genetic, morphological, and cultural traits. However, current approaches for inferring past migration episodes in the fields of archaeology and population genetics lack either temporal resolution or formal quantification of the underlying mobility, are poorly suited to spatially and temporally sparsely sampled data, and permit only limited systematic comparison between different time periods or geographic regions. Here we present an estimator of past mobility that addresses these issues by explicitly linking trait differentiation in space and time. We demonstrate the efficacy of this estimator using spatiotemporally explicit simulations and apply it to a large set of ancient genomic data from Western Eurasia. We identify a sequence of changes in human mobility from the Late Pleistocene to the Iron Age. We find that mobility among European Holocene farmers was significantly higher than among European hunter–gatherers both pre- and postdating the Last Glacial Maximum. We also infer that this Holocene rise in mobility occurred in at least three distinct stages: the first centering on the well-known population expansion at the beginning of the Neolithic, and the second and third centering on the beginning of the Bronze Age and the late Iron Age, respectively. These findings suggest a strong link between technological change and human mobility in Holocene Western Eurasia and demonstrate the utility of this framework for exploring changes in mobility through space and time.

Featured image, from the article: Estimation of mobility through time from empirical data. (A) Relative mobility rate estimates in Western Eurasia over the last 14,000 y, using a 4,000-y sliding window (121 windows). The solid black line represents the mean α value from 10,000 date resampled iterations; the colored area represents the 95% confidence intervals of the jackknife distribution.
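The caption describes relative mobility estimated in 4,000-year windows sliding over the last 14,000 years, 121 windows in total. The spacing between windows is not stated here. Assuming the windows are evenly spaced, the step works out to 10,000/120 ≈ 83 years. Under that assumption, assigning dated samples to windows could be sketched as follows. The sample dates in the usage are hypothetical, and the mobility estimator α itself is not reproduced.

```python
import numpy as np

def sliding_windows(oldest=14_000, width=4_000, n_windows=121):
    """(start, end) pairs in years BP, oldest first; even spacing is an assumption."""
    starts = np.linspace(oldest, width, n_windows)  # older bound of each window
    return [(s, s - width) for s in starts]

def samples_in_window(dates_bp, window):
    """Indices of samples whose date (years BP) falls inside the window."""
    start, end = window
    dates_bp = np.asarray(dates_bp, dtype=float)
    return np.flatnonzero((dates_bp <= start) & (dates_bp > end))
```

Overlapping windows smooth the estimate: each sample contributes to many adjacent windows. Peaks such as those at the Neolithic transition and the Bronze Age therefore appear as broad rises rather than sharp steps.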

Something is very wrong with models based on the so-called ‘steppe admixture’ – and archaeologists are catching up


Russian archaeologist Leo Klejn has published the article Discussion: Are the Origins of Indo-European Languages Explained by the Migration of the Yamnaya Culture to the West?, which includes the criticism it received from Wolfgang Haak, Iosif Lazaridis, Nick Patterson, and David Reich (mainly on the genetic aspect), and from Kristian Kristiansen, Karl-Göran Sjögren, Morten Allentoft, Martin Sikora, and Eske Willerslev (mainly on the archaeological aspect).

I will not post details of Klejn’s model of a north-south Proto-Indo-European expansion, which is explained in the article and relies (paradoxically) on the north-south cline of ‘steppe admixture’ in the modern European population. It is otherwise based on marginal anthropological methods and theories, including glottochronological dates and archaeological theories of the Russian school (mainly Zalyzniak), which are obviously not mainstream in the field of Indo-European Studies…

The most interesting aspects of the article are the responses to the criticism, some of which are relevant from the point of view of the Indo-European demic diffusion model, too. It is a pity, however, that the authors did not choose to answer Heyd’s earlier criticism (or Heyd’s model, which is essentially also that of Mallory and Anthony), instead of waiting for proponents of the least interesting models to react…

The answer by Haak et al.:

Klejn mischaracterizes our paper as claiming that practitioners of the Corded Ware culture spoke a language ancestral to all European Indo-European languages, including Greek and Celtic. This is incorrect: we never claim that the ancestor of Greek is the language spoken by people of the Corded Ware culture. In fact, we explicitly state that the expansion of steppe ancestry might account for only a subset of Indo-European languages in Europe. Klejn asserts that ‘a source in the north’ is a better candidate for the new ancestry manifested in the Corded Ware than the Yamnaya. While it is indeed the case that the present-day people with the greatest affinity to the Corded Ware are distributed in north-eastern Europe, a major part of the new ancestry of the Corded Ware derives from a population most closely related to Armenians (Haak et al., 2015) and hunter-gatherers from the Caucasus (Jones et al., 2015). This ancestry has not been detected in any European hunter-gatherers analysed to date (Lazaridis et al., 2014; Skoglund et al., 2014; Haak et al., 2015; Fu et al., 2016), but made up some fifty per cent of the ancestry of the Yamnaya. The fact that the Corded Ware traced some of its ancestry to the southern Caucasus makes a source in the north less parsimonious.

In our study, we did not speculate about the date of Proto-Indo-European and the locations of its speakers, as these questions are unresolved by our data, although we do think the genetic data impose constraints on what occurred. We are enthusiastic about the potential of genetics to contribute to a resolution of this longstanding issue, but this is likely to require DNA from multiple, as yet unsampled, ancient populations.

Klejn’s response:

Allegedly, I had accused the authors of tracing all Indo-European languages back to Yamnaya, whereas they did not trace all of them but only a portion! Well, I shall not reproach the authors for their ambiguous language: it remains the case that (beginning with the title of the first article) their qualifications are lost and their readers have understood them as presenting the solution to the whole question of the origins of Indo-European languages.

(…) they had in view not the Proto-Indo-European before the separation of the Hittites, but the language that was left after the separation. Yet, this was still the language ancestral to all the remaining Indo-European languages, and the followers of Sturtevant and Kloekhorst call only this language Proto-Indo-European (while they call the initial one Indo-Hittite). The majority of linguists (specialists in Indo-European languages) is now inclined to this view. True, the breakup of this younger language is several hundred years more recent (nearly a thousand years later according to some glottochronologies) than the separation of Anatolian languages, but it is still around a thousand years earlier than the birth of cultures derived from Yamnaya.
More than that, I analysed in my criticism both possibilities: the case for all Indo-European languages spreading from Yamnaya and the case for only some of them spreading from Yamnaya. In the latter case, it is argued that only the languages of the steppes, the Aryan (Indo-Iranian) are descended from Yamnaya, not the languages of northern Europe. Together with many scholars, I am in agreement with the last possibility. But, then, what sense can the proposed migration of the Yamnaya culture to the Baltic region have? It would bring the Indo-Iranian proto-language to that region! Yet, there are no traces of this language on the coasts of the Baltic!

My main concern is that, to my mind, one should not directly apply conclusions from genetics to events in the development of language because there is no direct and inevitable dependence between events in the life of languages, culture, and physical structure (both anthropological and genetic). They can coincide, but often they all follow divergent paths. In each case the supposed coincidence should be proved separately.

The authors’ third objection concerns the increase of the genetic similarity of European population with that of the Yamnaya culture. This increases in the north of Europe and is weak in the south, in the places adjacent to the Yamnaya area, i.e. in Hungary. This gradient is clearly expressed in the modern population, but was present already in the Bronze Age, and hence cannot be explained by shifts that occurred in the Early Iron Age and in medieval times. However, the supposed migration of the Yamnaya culture to the west and north should imply a gradient in just the opposite direction!

Regarding the arguments of Kristiansen and colleagues:

[They argue that] in two early burials of the Corded Ware culture (one in Germany, the other in Poland) some single attributes of Yamnaya origin have been found.

(…) if this is the full extent of Yamnaya infiltration into central Europe—two burials (one for each country) from several thousands (and from several hundreds of early burials)—then it hardly amounts to large-scale migration.

Quite recently we have witnessed the success of a group of geneticists from Stanford University and elsewhere (Poznik et al., 2016). They succeeded in revealing varieties of Y-chromosome connected with demographic expansions in the Bronze Age. Such expansion can give rise to migration. Among the variants connected with this expansion is R1b, and this haplogroup is typical for the Yamnaya culture. But what bad luck! This haplogroup connected with expansion is indicated by the clade L11, while the Yamnaya burials are associated with a different clade, Z2103, that is not marked by expansion. It is now time to think about how else the remarkable results reached by both teams of experienced and bright geneticists may be interpreted.

Regarding the work of Heyd:

(…) with regard to the barrow burials of the third millennium BC in the basin of the Danube, although they have been assigned to the Yamnaya culture, I would consider them as also belonging to another, separate culture, perhaps a mixed culture: its burial custom is typical of the Yamnaya, but its pottery is absolutely not Yamnaya, but local Balkan with imports of distinctive corded beakers (Schnurbecher). I would not be surprised if Y-chromosome haplogroups of this population were somewhat similar to those of the Yamnaya, while mitochondrial groups were indigenous. As yet, geneticists deal with great blocks of populations and prefer to match them to very large and generalized cultural blocks, while archaeology now analyses more concrete and smaller cultures, each of which had its own fate.

Iosif Lazaridis shares more thoughts on the discussion in his Twitter account:

As we mentioned in Haak, Lazaridis et al. (2015), the Yamnaya are the best proximate source for the new ancestry that first appears with the Corded Ware in central Europe, as it has the right mix of both ANE (related to Native Americans, MA1, and EHG), but also Armenian/Caucasus/Iran-like southern component of ancestry. The Yamnaya is a westward expansive culture that bears exactly the two new ancestral components (EHG + Caucasus/Iran/Armenian-like).
As for the Y-chromosome, it was already noted in Haak, Lazaridis et al. (2015) that the Yamnaya from Samara had Y-chromosomes which belonged to R-M269 but did not belong to the clade common in Western Europe (p. 46 of supplement). Also, not a single R1a in Yamnaya unlike Corded Ware (R1a-dominated). But Yamnaya samples = elite burials from eastern part of the Yamnaya range. Both R1a/R1b found in Eneolithic Samara and EHG, so in conclusion Yamnaya expansion still the best proximate source for the post-3,000 BCE population change in central Europe. And since 2015 steppe expansion detected elsewhere (Cassidy et al. 16, Martiniano et al. 17, Mittnik et al. 17, Mathieson et al. 17, Lazaridis et al. 2016 (South Asia) and …?…

I love the smell of new wording in the morning… viz. Yamnaya best proximate source for Corded Ware, Corded Ware might account for only a subset of Indo-European languages, Corded Ware representing Aryan languages (probably Klejn misinterprets what the authors mean, i.e. some kind of Indo-Slavonic or Germano-Balto-Slavic group)…

We shall expect more and more ambiguous rewording and more adjustments of previous conclusions as new papers and new criticisms appear.


Featured image from the article: Distribution of the ‘Yamnaya’ genetic component in the populations of Europe (data taken from Haak et al., 2015). The intensity of the colour corresponds to the contribution of this component in various modern populations

Germanic–Balto-Slavic and Satem (‘Indo-Slavonic’) dialect revisionism by amateur geneticists, or why R1a lineages *must* have spoken Proto-Indo-European


I feel there has recently been an increase in references to quite old – and generally outdated – terms, such as Germano-Balto-Slavic and “Indo-Slavonic” (i.e. Satem), described as Late Indo-European dialects. This is happening in forums and blogs that deal with “Indo-European genetics”, and only marginally (if at all) with the main anthropological subjects that form Indo-European studies, that is Linguistics and Archaeology.

First, let me seemingly go against the very aim of this post by acknowledging the common traits that these dialects actually share.

Satem Indo-European or Indo-Slavonic

Balto-Slavic is a complex dialect, whose known proto-history and history already offer a difficult picture. Contrary to the opinion of many, there is no single document that can identify the terms Antes, Sklavenes, and Venedi with the cultures that are usually identified as speaking languages ancestral to East Slavic, South Slavic, and West Slavic. These names were used interchangeably in the Byzantine Empire, which was obviously not concerned with classifying Slavic peoples by their linguistic branches… For more on the historical identification of Slavic tribes, read Florin Curta’s The Making of the Slavs: History and Archaeology of the Lower Danube Region, c. 500-700 A.D. On the identification of potential candidates for early Slavic and Baltic cultures, you can read the appropriate entries in the Encyclopedia of Indo-European Culture, by Mallory & Adams.

Baltic and Slavic tribes seem to have too recently recorded a history for us to confidently trace back their cultural predecessors. In its recent history, close to the formation of its community, Proto-Slavic must have had intense contacts with Iranian-speaking peoples. Also, earlier, if R1a-M417 subclades are in fact the most common lineages that expanded with the Corded Ware culture (as it seems now), they no doubt shared a common language, most likely a non-Indo-European one. Not Indo-European in the strict sense, at least, since it most likely formed part of the Indo-Uralic continuum that must have been spoken during the Mesolithic in Eastern Europe, and was probably a language nearer to Uralic than to classic Indo-European.

A strong connection between Balto-Slavic and Indo-Iranian in a common Satem branch, as supported by Kortlandt (see e.g. Balto-Slavic and Indo-Iranian, 2016, or a reconstruction of Schleicher’s Fable in PIE branches), would imply that a Corded Ware culture from the Dnieper-Donets – speaking a Graeco-Aryan dialect – interacted for centuries with Uralic and other Graeco-Aryan languages, and was only later influenced by North-West Indo-European (as late as its contact with East Germanic during the Barbarian migrations). This model cannot account for the shared traits of Balto-Slavic with North-West Indo-European, unless a third, substrate language – like Holzer’s (1989) Temematic proposal – is added to the equation. Such models are not impossible, but seem too complex.

On the other hand, linguistically Balto-Slavic seems to have split into its known branches quite early, and traits such as the satemization trend appear to have affected each main dialect (Baltic and Slavic) differently, as attested by their different ruki developments; hence the assumption that the trend exerted an early but distinct influence on both Indo-Iranian and Balto-Slavic (or, more exactly, on Indo-Iranian, Baltic, and Slavic). Also, the common North-West Indo-European vocabulary, as well as morphological trends shared by NW IE dialects, clearly affects the oldest layer of both languages (hence the parent Proto-Balto-Slavic too), which thus predates the satemization trend, and further contributes to the idea of a common root between West Indo-European (or Italo-Celtic), Northern Indo-European (the language ancestral to Pre-Germanic), and Proto-Balto-Slavic.

Germano-Balto-Slavic or North European

A common group between Germanic and Balto-Slavic is justified by the presence of certain common isoglosses, such as the famous shared oblique cases in *-m- instead of *-bh-. Support for such a group is found more recently e.g. in Gamkrelidze & Ivanov (1993-1994) – who nevertheless support a North-West Indo-European continuum –, or in Jasanoff, for whom both languages (regarding phonological traits) “began their post-IE history together”.

Proto-Indo-European language tree, including early (Indo-Hittite) and late (European) stages by Trager & Smith (1950). From the paper On the Origin of North Indo-European, Gimbutas (1952)

On the other hand, such shared traits could have derived either from old contacts – traditionally supported because of their proximity –, or from a common substrate to both, without a need for direct contacts, as supported by Kortlandt in Baltic, Slavic, Germanic (2016), among others.

The fact that there might have been a different, third language involved – the hypothetical Temematic substrate language to Balto-Slavic, potentially nearer to Baltic because of the stronger superstrate influence in Slavic – further complicates the dialectal identification of Baltic and Slavic, that is, if one supports a common Germanic and Balto-Slavic group.

This Northern group was supported by Gimbutas (1952) based on a previous linguistic paper by Trager & Smith (1950) – published in the infancy of dialectal PIE differentiation –, but this model is not mainstream nowadays. The linguistic model followed by Anthony (based on Ringe, Warnow, and Taylor 2002) did not link Germanic to Balto-Slavic (or to Italo-Celtic, for that matter) more than to any other dialect, but according to their maps the three might have spread from the same western (Repin-derived) Yamna region. See for example their recent article The Indo-European Homeland from Linguistic and Archaeological Perspectives, which sums up and partly corrects Anthony’s detailed account of steppe migrations in the already classic book The Horse, the Wheel, and Language: How Bronze-Age Riders from the Eurasian Steppes Shaped the Modern World (2007). Dialectal groups and their implications are unclear from their publications, with changing linguistic schemes since 2007, but with a quite stable archaeological framework.

Dialectal Late Indo-European

I am not implying that a common group of Balto-Slavic with Indo-Aryan (or of Germanic with Balto-Slavic) is fully discarded by linguistics: history and archaeology can indeed support a close interaction between these languages, and there has historically been some support for the inclusion of Balto-Slavic within a Graeco-Aryan group. However, Linguistics and Archaeology are increasingly supportive of the association of Italo-Celtic with Germanic in a North-West Indo-European group, and of Balto-Slavic with them (Oettinger 1997). See for example any recent article or book by Mallory, Adams, Beekes, Adrados, etc., or if you prefer, refer to the mainstream models followed by scholars in the German, Spanish, Leiden, or American schools. As you probably know, Clackson for the British school supports an abstract “constellation analogy” model for the language reconstruction, and the French school is dominated by archaeologist Jean-Paul Demoule’s rejection of a Proto-Indo-European community; both schools, as you can imagine, will have to revise their theories in light of recent genetic studies…

Regarding Archaeology, North-West Indo-European must have expanded with the westward Yamna expansion, i.e. associated with the Bell Beaker expansion. That was supported in mainstream Archaeology before the most recent genetic studies.

Even Anthony (2007), who has related the Corded Ware culture to the expansion of Indo-European languages through cultural diffusion, while recognizing the expansion of Yamna migrants to the west (identifying them with Italo-Celtic and Proto-Greek speakers), has to offer two or three separate cultural diffusion events (!), whereby Pre-Germanic, Proto-Balto-Slavic, and Proto-Indo-Iranian would have been learned through the influence of the Yamna culture on neighbouring (unrelated) peoples of Corded Ware cultures: the Central European Single Grave culture (from Pre-Germanic Usatovo), the Middle Dnieper culture (from Balto-Slavic in the Contact Zone), and the Potapovka culture (from Poltavka), respectively. No actual spread or migration from Yamna into Corded Ware has been supported since Gimbutas.

Balto-Slavic is indeed a complex group of languages, with some scholars supporting (since Toporov and Ivanov’s proposal in the 1960s) three dialectal groups, composed of East Baltic, West Baltic, and Slavic branches (thus implying an older split of Baltic). Because of the close interaction of eastern Europe with Eurasian invasions, the nature of their language will probably never be fully resolved. Genetics is not the savior that overcomes these difficulties; so far it has only brought more (albeit no doubt interesting) questions, and even though their correct interpretation might shed some new light, we will still be far from obtaining a clear picture of the cultural and linguistic development of Proto-Baltic and Proto-Slavic communities.

What I am criticizing here, therefore, is this recent revisionist trend whereby PIE must have been spoken by R1a-Z645 lineages, a trend found not only among amateur geneticists. I am beginning to think – judging from online comments, posts or tweets – that this trend is becoming stronger as a reaction to the fact that not a single R1a-Z645 sample has been found in Yamna or its expansion. These new revisionist models depict a common group of R1a-Z645 lineages hidden somewhere in the steppe, sharing some sort of Indo-Germanic (??) group, or argue for a shared Late PIE community without dialectal divisions, to justify a potential future find of these lineages somewhere marginal to the PIE territory, followed by a later development of Corded Ware into Bell Beaker cultures (and, it is implied, peoples).

While not impossible, these are unlikely models, based not on knowledge but on wishes, since linguistic data strongly suggest a North-West Indo-European dialect including Italo-Celtic, Germanic, and (at the very least in its substrate, and thus for western R1a lineages) Balto-Slavic, and archaeological findings don’t show any meaningful population exchange between Corded Ware and Yamna… That is, they didn’t until after the first famous papers on the so-called ‘steppe admixture’ of 2015, when (surprise!) Kristiansen jumped on the bandwagon (and Anthony seems to be beginning to suggest the same) of previously discarded Yamna -> Corded Ware and Corded Ware -> Bell Beaker migrations.

Not a single serious researcher can deny that a hidden community of R1a-M417 in Yamna is possible. But no one should claim that it is the most likely explanation for the current genetic picture, whether based on Linguistics, Archaeology, Anthropology, or Genetics (be it phylogeography or admixture analyses).

I think this recent trend must therefore be the fruit of the influence of previous, deeply entrenched concepts regarding the Corded Ware culture and its link with Proto-Indo-European. These concepts are based on Gimbutas’ Kurgan model, Anthony’s revision of it – explaining the expansion as multiple cultural diffusions (thus renewing Gimbutas’ claim) -, and early studies of modern populations’ haplogroups. Apart from those trends, especially worrying for the future of the field (if it is to be taken seriously), is the interest of some pressure groups, including especially eastern Slavic peoples of R1a lineages, and Finnic speakers of N1c lineages, who are linking some fantastic ancient ethnolinguistic community to their modern national pride.

“European dialect” expansion of Proto-Indo-European according to The Indo-Europeans: Archeological Problems, Gimbutas (1963). Observe the similarities of the western European expansion to the recently proposed expansion of R1b lineages with western Yamna and Bell Beaker.

Adapting to reality

You can find support for anything you like in anthropology: there is certainly a paper out there that apparently supports your personal view on prehistoric ethnolinguistic Europe. You only have to do a quick search in Academia.edu, and you can justify whatever new genetic results you personally obtained playing with the freely available datasets and open source software – e.g. from Reich’s lab, or the famous ADMIXTURE. If you are one of those few interested in the field who haven’t tried it out yet, Razib Khan helps introduce you to DIY Genetics, so you can show off some graphics and proportions, like most popular bloggers and forum users are doing. Then you can also publish your results in BioRxiv, just to try it out.

So there is no merit at all in justifying these genetic results by supporting a potential anthropological scenario for them. Heck, you can invent it! Here, I said it. Anyone can do Anthropology. In fact, it seems that everyone does Human Evolutionary Genetics nowadays, no matter their background. Some lab knowledge and experience in doctoral research seem to be enough.

Admixture analyses are obtained using one or more algorithms, which have a limited potential to inform us of possible migrations (their ultimate objective, at least regarding their complementary function to Archaeology within Indo-European studies). Such algorithms invariably have:

  • Intrinsic constraints: You have to understand each algorithm’s intrinsic limitations to be able to apply it correctly, and to derive meaningful but cautious conclusions. Using software commands and obtaining graphics and percentages does not imply you understand the constraints at stake. If you have tried them out, you have seen their great limitations; if you don’t see them, you should realize how little you understand them.
  • Extrinsic constraints: Most are known, and often mentioned explicitly in research papers:
    • Few DNA samples, from limited sites.
    • Scarce and variable material recovered from these samples.
    • Quality of the retrieval, human errors, etc.
    • Lack of precise anthropological context.

Admixture results (whether by professionals or amateurs) are nevertheless often illustrated with tailored anthropological models: in the case of renowned papers, most likely because of ignorance of the anthropological context, both broad (philosophical or theoretical) and precise (historical), or a lack of sufficient understanding of the different fields involved; and in the case of many amateur geneticists, also (often) to justify a desire for a prehistoric ethnolinguistic identification matching their social or political agendas, in a new Kossinnian trend.

Admixture analyses are not wrong per se. It is wrong to trust them to inform you of something they can’t, because they need context, and ancient samples need ancient context, which in prehistoric times is obviously quite limited. If you don’t know as much as possible about the ancient context (i.e. Linguistics, Archaeology, Anthropology), you get the wrong conclusions. Period. If you look for papers on ancient context expecting to find whichever model fits your results (or worse, your wish), that is called bias. Don’t expect to get the right conclusions doing that, either. If you find it, that’s called confirmation bias. Such results are not useful for anyone, not even for you: you just deceive yourself and maybe others.

Some apparently think that a group of geneticists can achieve a meaningful interpretation of data just by adding one or more archaeologists to the research group – or as ‘co-authors’ of individual research papers. Wrong again. Ten people with IQ 20 don’t make the reasoning of a person with IQ 200 (not that I believe in measuring intelligence, but you get the point). Similarly, twenty researchers, each one with knowledge exclusively (or almost exclusively) of their own field, can’t achieve a meaningful explanation for the data obtained. Geneticists look for an anthropological model that coherently fits their results. Archaeologists will look for a model known to them that fits the genetic results (or more likely the interpretations thereof) they are given. That way, when working together, they can achieve a common ground. If neither of them understands the complexities and shortcomings of the others’ materials and methods (and their whole background), the results will be formally correct, but still wrong. They need to know all aspects involved in the others’ fields in great detail, to understand all potential implications of new data.

Since the advent of ancient DNA samples and especially PCA analyses, phylogeography (leaning predominantly on Y chromosomes) has been relegated to a (probably deserved) second place in assessing DNA samples. However, as Razib Khan states, “in the scaffold of the ancient DNA framework it can resolve some issues”. I think this is one of those issues, an issue that is not trivial at all – in that it affects migration models from the steppe at a critical period of linguistic expansion -, and the shortcomings of not relying on it are becoming quite evident with each new publication.

Many amateur geneticists who support the mainstream genetic models of the past two years don’t like the ad hoc explanations that others have been constantly giving to support their previous theories. After all, it seems unfair that some people would reject data that offer an obvious prehistoric picture of populations because of their unwillingness to change their own preconceptions, right? For example, against the mainstream steppe migration theory, we have those who claim that R1b lineages must have belonged to western European (Palaeolithic or Mesolithic) hunter-gatherers expanded from Iberia, or those who want R1a to have expanded from India. No matter how strong the evidence against those models is, some groups harbour a desire to fit everything into their previous image of reality.

However, some people who can’t stand those absurd ad hoc explanations and rationalisations are quite ready to embrace the idea that, somehow, during the Chalcolithic expansion of Yamna, an imaginary community was formed where communities of divergent lineages R1a-Z645 (found mostly north of the steppe and later in Corded Ware cultures) and R1b-M269 (found mostly in the steppe and later in the cultures known to have evolved from Yamna, like Afanasevo, Vučedol, and Bell Beaker) lived together and spoke the same language for centuries, or even millennia. And that community would have existed after a Late Neolithic westward expansion of the Khvalynsk culture, and another westward expansion of the Repin culture, both of which probably reduced the diversity of Y-DNA lineages within Yamna: the first to R1b-M269 lineages, the second to R1b-L23 subclades.

Both communities of R1a and R1b lineages, described then as united until the Yamna expansion (although no sample of an R1a-Z645 subclade has been ascribed to any steppe expansion), would have expanded somehow separately: R1a-M417 exclusively to the north into Corded Ware – without any migratory connection found between Yamna and Corded Ware in mainstream Archaeology –, thus forming dialectal groups (like “Germano-Balto-Slavic” or “Indo-Slavonic”) that are not supported by mainstream linguistics.

On the other hand, R1b-Z2103 and R1b-L51 lineages, which were already separated within Yamna and probably formed different communities, are known to have spread to the west with the Yamna expansion, and in some places and cultures (like Bell Beaker) they are found together, as would be expected in a common migration of separate groups. Not a single R1b-L23 sample has been found in the Corded Ware expansion, and not a single R1a-M417 individual in the Yamna expansion.

These convoluted explanations of how R1a lineages must have spoken Indo-European are based on the assumption that admixture analyses (from the current limited data, with the current wrong interpretation of their context) necessarily mean that Corded Ware peoples spread as Yamna migrants – hence R1a lineages must come from Yamna – and then spread into Bell Beaker.

It is possible, and in my opinion expected, that eventually some R1a-M417 subclade will be found in Yamna samples (east or west), and some haplogroup R1b-M269 (especially R1b-L23 and subclades) will be found in samples from Corded Ware cultures (west or east). Indeed, there must have been close contacts between both cultures (between Yamna–Southeast Europe–East Bell Beaker and Corded Ware), and not only through female exogamy. It would be quite strange not to find a single R1b-L23 sample in Corded Ware cultures, or an R1a-M417 sample in Late Proto-Indo-European-speaking territories. Those scattered samples, whenever they are found, will probably not change the overall picture, but they might give some a reason to keep supporting a model that is not the most likely one. It still won’t be the most reasonable, simplest model that explains all the data.

What it means to be an ‘ethnic’ Balt or Slav

Older models – older even than Gimbutas’ kurgan model of the 1950s, as you can see -, by presupposing an instant breakup of a unitary Proto-Indo-European language into different linguistic communities without previous dialectal relation with each other, cannot explain our common European linguistic heritage. More recent models based on recent genetic studies (and on outdated or newly invented linguistic and archaeological theories), by trying to connect genetically (directly) modern eastern Europeans with Proto-Indo-Europeans, are in fact disconnecting Balto-Slavic peoples from the rest of Europe for three thousand years, and connecting them either with Uralic or with Indo-Iranian speakers. Ethnolinguistic identification, however, is not about genetics – and it has never been, and I hope it will never be -; it is related to self-identification into groups, and more broadly to a common culture, and often specifically a common language.

In terms of language, it makes sense to support a situation where Balto-Slavic was a North-West Indo-European dialect (sharing a common language ancestral to Germanic and Italo-Celtic too), with certain ancient (Uralic?) innovative traits shared with Indo-Iranian and partly with Germanic (but with no direct contacts necessary between these branches). Its recent transition into the Baltic and Slavic proto-languages, already among eastern European groups, shows strong external influence from Uralic and Iranian, respectively, so the identification of Balto-Slavic with the expansion of R1a lineages is probably to be found in a western group of R1a-Z282 subclades expanding eastwards between the Bronze Age and the Iron Age.

Eastern Europe’s Indo-European heritage (Balto-Slavic) is therefore connected to the western European one (Italo-Celtic and Germanic), each with its own linguistic substrate and influences, but with a common, shared ancient language. North-West Indo-European derived in turn from Late Indo-European, a language ancestral to Indo-Iranian and Palaeo-Balkan languages, the latter showing continued contacts with western Europe for millennia.

In the minimum-case scenario – for supporters of a Satem proto-language like Kortlandt – the language substrate to Baltic and Slavic must be a North-West Indo-European language (to fit its shared traits with North-West Indo-European), like Holzer’s Temematic (a hypothesis which Kortlandt seems to support) that would have then been recently absorbed by Satem speakers of Eastern Europe. In that context, central European R1a-Z282 lineages (which form the majority of West Slavic lineages) would have spoken that NWIE language for millennia, until proto-historic times, when a cultural diffusion of a Graeco-Aryan dialect (mainly spoken by R1a-Z93 or eastern European R1a-Z282 lineages, then) would have happened in eastern Europe, and then a cultural diffusion (or demic diffusion?) of Slavic-speaking peoples would have happened to the west, into central Europe.

In none of these scenarios is any sort of Proto-Indo-European -> Balto-Slavic ethnolinguistic, genetic, or territorial continuity to be seen. The former model is not only the simpler explanation for Slavic and Baltic, but also the communis opinio of most Indo-Europeanists today; it is supported by Archaeology, and Genetics is likely to keep supporting it with each new paper. I don’t find anything shameful in accepting any of those models, nor anything that could diminish modern Baltic and Slavic identities in the least, so I don’t understand the imperative need some people seem to have to identify R1a lineages with the Yamna expansion and thus with Proto-Indo-European.

For those who will still be vying for a more prominent role of haplogroup R1a in the Proto-Indo-European ethnogenesis, there are alternative older scenarios to the arrival of Proto-Indo-European in the steppe, as there are older models for an Anatolian origin of PIE. For example, I have already laid out the possibility that the invasion of R1a-M417 brought Indo-European (or, more precisely, Indo-Uralic) to eastern Europe – as part of a Uralo-Yukaghir or Paleo-Siberian group –, while R1b communities may have originally been speakers of Afroasiatic (cf. R1b-V88 and the potential Afroasiatic homeland in Lake Megachad), and R2 would have been associated with the spread of Dravidian (and maybe Kartvelian and Altaic), all of them descending from a common Nostratic associated with haplogroup R. This model could find support in genetics in the link found between Mesolithic Northeast Europeans and Neolithic Siberians, from an Ancient North Eurasian (ANE) population probably rich in Y-chromosome haplogroup R1a.

This is just one of many highly hypothetical ancient scenarios, and it requires more assumptions than a continuity of Indo-Uralic (or even Indo-Uralic and Afroasiatic) with R1b lineages – R1a potentially marking the spread of Paleo-Siberian languages –, and above all it is based on controversial linguistic macrofamilies, not (yet?) supported by mainstream anthropological disciplines. It is nevertheless one theory certain romantics can place their hopes in, as R1b communities of the steppe become accepted as those originally speaking (Middle and Late) Proto-Indo-European in the steppe.

Wrap-up

I am not saying I am right. There is still too much to be said and corrected. In fact, I could be wrong, and we may lack a lot of interesting data: there might have been a late R1a-R1b North-West Indo-European-speaking community within western Yamna, and we might need to revise yet again what we knew about Archaeology (and maybe even Linguistics!) before admixture algorithms; then maybe geneticists will have come to save the day after all. However, all anthropological evidence points strongly (and genetic studies point more strongly with each new study) to the image we had prior to the first genetic data based on haplogroups.

I think it is preposterous of some researchers (no matter whether professional or amateur geneticists, archaeologists, or even linguists) to think algorithms can beat more than two hundred years and thousands of works on this matter. In Academia, mathematics rarely revolutionizes a field; it can usually help, but it can also just make you sound scientifish and point you in the wrong direction.

And no, I am not smarter than the rest; I can only judge from what I know, and that is always too little, far less than I would like. But maybe I am in a more neutral position regarding the end result, given my renewed skepticism of revolutionary methods to solve academic problems, and my indifference as to a western European or eastern European origin of R1b or R1a lineages. And I am not alone in my lack of confidence in the interpretation of recent genetic admixture results – read Volker Heyd’s papers, for example, if you want the view of a renowned and experienced archaeologist who was in the field of Indo-European studies earlier than any of those now popular geneticists.

In fact, I also fell for the R1a-Corded Ware expansion of Late Indo-European, before many in the anthropological fields and with even less proof, back when we only had haplogroups of modern populations and the promises of Cavalli-Sforza. When I decided to publish a grammar to learn Indo-European as a modern language, the aim was to offer a mainstream reconstruction of Late Proto-Indo-European without adding my own contributions; despite this, I added the then newly, archaeogenetically supported Corded Ware migration model (see A Grammar of Modern Indo-European, Third Edition, pp. 74 ff.).

I guess I liked the picture of an old romantic Europe, divided in western Vasconic (R1b) and eastern Uralic (N1c) hunter-gatherers (and later farmers) being invaded by warring kurgan-makers from the steppe (R1a) … And I really liked the article of Haak et al. (2015) – the first one I read on this subject -, which I saw, like everyone else, as supporting what many of us already believed about a single, common expansion of North-West Indo-European into western Europe. It also made our life – regarding the linguistic unity of Balto-Slavic with the West Indo-European core – much easier…

Chalcolithic expansion from Yamna into Corded Ware (as early North-West Indo-European dialects), as interpreted in the second edition of A Grammar of Modern Indo-European.

Recent papers, when compared to what linguists and archaeologists had been saying for years – since before Y-DNA haplogroups were even a thing for any of these now popular genetic labs (not to speak of internet geneticists) –, leave little room for doubt right now. I embraced the results of haplogroup analysis of modern populations, which seemed to support an expansion of Proto-Indo-European R1a lineages with the Corded Ware culture, and thus dismissed Gimbutas’ and Anthony’s model (of a Yamna -> Bell Beaker expansion of Italo-Celtic). I also embraced the results of the 2015 genetics publications with open arms.

But I was able to change my mind when the careful observation of individual samples of these recent studies began to contradict what we thought, and I did so publicly only recently (publishing the Indo-European demic diffusion model), and more strongly after the latest papers (publishing the updated second edition), without remorse. And I will reverse that decision again if needed, and change it again and again as I feel necessary, no matter how many times. In Science, to adapt to new data does not make you a brownnoser, it makes you a scientist. Not to adapt to new data does not make you a man of firm ideals, or any chivalrous concept you might have about that, it makes you look ignorant and biased. It’s that simple.

Some of you may think that there is a third way: to keep an old, now unlikely idea you have supported in the past, but not bragging about it in the meantime until it is proven fully wrong, just in case it is demonstrated to be right in the end – because then you might claim you were right all along, like you had some magic understanding or hidden data the rest of us didn’t. I don’t think that’s the correct way to behave in a scientific environment, either. That makes you a coward. And you wouldn’t have been right all along: you would have been right, then wrong, then right again. Everybody can see that, and so do you.

Geneticists working on future publications should be planning ahead for what might happen. The overconfidence of Haak et al. (2015), Mathieson et al. (2015), and Allentoft et al. (2015), including Lazaridis et al. (2016), in supporting a Yamna -> Corded Ware migration and a Corded Ware -> Bell Beaker migration is understandable in a rapidly growing field that didn’t leave enough time to study complex anthropological questions. The recent errors of following that simplistic and wrong model in Mathieson et al. (2017) and Olalde et al. (2017), coupled with Kristiansen’s (2017) and Anthony’s (2017) new interpretations (to fit the conclusions of those genetic studies), can be forgiven, because of all the fuss created around the Steppe admixture concept, the desire of journals to publish popular papers, and the desire of researchers to go with the tide and gain some popularity along the way.

From now on, however, if the evidence keeps pointing in the same direction, a lack of attention to anthropological detail will be simple wishful ignorance, and that cannot be forgiven in any field that strives to ascertain the truth. If continued, this trend will damage the field of Human Evolutionary Biology for years – at least in the view of anthropologists, who are the real filter of this field’s conclusions –, when its current results prove wrong. Genetic studies will be banished from anthropological studies, dismissed as a pseudo-science, and avoided by any scientific or academic journal worthy of a minimum of self-respect.

To regain trust in a field that purportedly uses “more scientific methods” but is nonetheless proven to have been so wrong for years in its essential assumption (a Yamna -> Corded Ware migration model), especially when it is associated with the traditionally despised ‘Kossinnian trends’, will be a hard task for those involved. So many postdoc offers in so many labs being created right now will vanish, as the interest in publishing papers from this discredited field disappears. This could also threaten the recently renewed impulse given by archaeologists and linguists to migration models, which had been rejected for a long time, giving impulse to those who deny them (e.g. in the UK and in France), or who just don’t want to see Archaeology or Linguistics get involved with such a controversial question, or even with each other.

High-impact-factor journals like Nature, Science, and PNAS, and less famous ones too, as well as their reviewers and readers, are doing a disservice to the endeavour of ascertaining the historical truth if they allow this to happen without protest. But such consequences for the field will be of their own making, and not that of suspicious anthropologists, who do well to distrust any revolutionary results published by overconfident researchers from newly developed and too broadly defined subfields.


(Note: featured image is licensed CC-by-sa 4.0 from crates at Wikipedia)

My European Family: The First 54,000 years, by Karin Bojs


I have recently read the book My European Family: The First 54,000 years (2015), by Karin Bojs, a well-known Swedish science journalist and former science editor of Dagens Nyheter.

My European Family: The First 54,000 Years
It is written in a fresh, dynamic style, offers a general introduction to Genetics, Archaeology, and their relation to language, and was written at a time of great change (2015) for the disciplines involved.

The book is well informed: it strikes a balance between responsible science journalism and entertaining content, and at times it is nuanced enough to go beyond the limits of popular science books. It is not written for scholars, although you might learn (as I did) interesting details about researchers and institutions of the anthropological disciplines involved. It contains, for example, interviews with well-known academics, which she uses to share details about their personalities and careers; these give, in my opinion, much-needed context to some of their publications.

Since I am clearly biased against some of the findings and research papers that are nevertheless considered mainstream in the field (like the identification of haplogroup R1a with the Proto-Indo-European expansion, or the concept of steppe admixture), I asked my wife (who knew almost nothing about genetics or Indo-European studies) to read it and write a summary if she liked it. She did, so much so that I have convinced her to read The Horse, the Wheel, and Language: How Bronze-Age Riders from the Eurasian Steppes Shaped the Modern World (2007), by David Anthony.

Here is her summary of the book, translated from Spanish:

The book is divided into three main parts: The Hunters, The Farmers, and The Indo-Europeans, each in turn divided into chapters that introduce and break down information in an entertaining way, mixing it with accounts of her interactions and of her personal genealogical quest.

Part one, The Hunters, offers intriguing accounts of the direct role music had in the development of the first civilizations, the first mtDNA analyses of dogs (Savolainen), and the discovery of the author’s Saami roots. Explanations of the first DNA studies and their value for archaeology are clear and comprehensible for any non-specialist reader. Interviews give a close view of research, like that of Frederic Plassard in Les Combarelles cave.

Part two, The Farmers, begins with her trip to Cyprus, and arouses the reader’s interest with her description of the circular houses, her notes on the Basque language, the new papers and theories related to DNA analyses, the theory that cats decided to live with humans, the first beers, and the houses built over graves. Karin Bojs analyses her grandmother Hilda’s subgroup H1g1, and how it belonged to the first migratory wave into Central Europe. This interest in her grandmother’s origins leads her to a conference in Pilsen about the first farmers in Europe, where she learns firsthand of the results of studies by János Jakucs, and of studies of nuclear DNA. Later on she interviews Guido Brandt and Joachim Burger, with whom she talks about haplogroups U, H, and J.

The chapter on Ötzi and the South Tyrol Museum of Archaeology (Bolzano) introduces the reader to the first prehistoric individual whose DNA was analysed, who belonged to haplogroup G2a4, and to other information revealed about the Iceman, such as his lactose intolerance.

Part three, dealing with the origin of Indo-Europeans, begins with the difficulties researchers have in locating the origin of horse domestication (which probably happened in western Kazakhstan, in the Russian steppe between the rivers Volga and Don). She mentions studies by David Anthony and on the Yamna culture, and its likely role in the diffusion of Proto-Indo-European. In an interview with Mallory in Belfast, she recalls the potential interest of far-right extremists in genetic studies (and the early links of the Journal of Indo-European Studies to a certain ideology), as well as controversial statements by Gimbutas, and her potentially biased vision as a refugee from communist Europe. During the interview, Mallory had a copy of the not-yet-published genetic paper by Haak et al. that had been sent to Nature for review, but he didn’t share it.

Then haplogroups R1a and R1b are introduced as the most common in Europe. She visits the Halle State Museum of Prehistory (where the Nebra sky disk is exhibited), and later Krakow, where she interviews Slawomir Kadrow about the possible formation of the Corded Ware culture from a mix of the Funnelbeaker and Globular Amphorae cultures. New studies of ancient DNA samples, published in the meantime, show that Corded Ware samples derive about 75% of their ancestry from a Yamna-like population.

The following chapters offer a broad review of all studies published to date and of individuals sampled in different parts of Europe, stressing the importance of ships (e.g. the Hjortspring boat) for the expansion of R1b lineages.

The concluding chapter is dedicated to the Vikings; it demystifies their image as aggressive warmongers and sketches their relevance as founders of the Russian state.

To sum up, it is a well-documented book, written in a clear style, and capable of awakening the reader’s interest in genetic and anthropological research. The author enthusiastically seeks out new publications and information from researchers, but is at the same time critical of them, often showing her own reactions to new discoveries; all of this creates a complex personal dynamic that the reader, engaged by her first-person account, often shares throughout the book.

Mayte Batalla (July 2017)

DISCLAIMER: The author sent me a copy of the book (a translation into Spanish), so there is a potential conflict of interest in this review. She didn’t ask for a review, though, and it was my wife who wrote it.

About the European Union’s arcane language: the EU does seem difficult for people to understand

Mark Mardell asks in his post Learn EU-speak:

Does the EU shroud itself in obscure language on purpose or does any work of detail produce its own arcane language? Of course it is not just the lingo: the EU does seem difficult for people to understand. What’s at the heart of the problem?

His answer on the radio (like the comments that can be read on his blog) will probably look for complex reasoning on the nature of the European Union as an elitist institution, distant from real people, on the “obscure language” (intentionally?) used by MEPs, on the need for that language to be obscured by legal terms, etc.

All that is great. You can talk a lot about the possible reasons why people find European Parliament discussions, where everyone speaks their own national language, too boring; why important media (like the BBC) never show debates on important issues unless the MEP uses their national language; why that doesn’t happen with national parliaments, where everyone speaks a common language…

But the most probable answer is so obvious that it doesn’t really make sense to ask. The interesting question is: do people actually want to pay the price of having a common Europe?

A simple FAQ about the “advantages” of Esperanto and other conlang religions: “easy”, “neutral” and “number of speakers”

As requested by a reader of the Association’s website, this is a concise FAQ about Esperanto’s supposed advantages:

Note: Information and questions are being added to the FAQ thanks to the comments made by visitors.

1. Esperanto has an existing community of speakers, it is used in daily life, it has native speakers…

Sorry, I don’t know of any native speaker of Esperanto, i.e. someone who has Esperanto as their mother tongue; there is only this Wikipedia article and the Ethnologue “estimations”, with no references apart from the UEA website. In fact, the only people said to be “native Esperanto speakers” are those 4 or 5 famous people who assert they were educated in Esperanto as a second language by their parents. Is it enough for someone to assert “I was taught Volapük as mother tongue by my parents” or “I taught my children Esperanto as mother tongue” for us to believe it and report “native speaker” numbers? In any case, do those dozens of (in this Esperantist sense) native speakers of Klingon or Quenya reported in the press represent anything more than a bad joke by their parents?

Furthermore, there is no single community of speakers that uses Esperanto in daily life; I only know of some yearly so-called World Congresses where Esperantists use some Esperanto words with each other, just as Trekkies use Klingon words at their congresses, or LOTR fans use Quenya words. Figures about ‘Esperanto speakers’ (and speakers of Interlingua, Ido, Lingua Franca Nova, Lojban or any other conlang) are unproven, as there is no independent, trustworthy research, and numbers are usually given by their supporters as rough estimates, when not completely invented. Studies have been prepared, explained, financed and directed by national or international associations like the “Universala Esperanto-Asocio”, sometimes through some of its members at different universities, which doesn’t turn those informal studies into “University research”. The answer is not “let’s learn creationism until evolution is proven”, but the other way round, because the burden of proof lies with the less established claim: if you want people to learn a one-man-made code to replace their natural languages, then first bring the research and then talk about its proven advantages. Esperantists and other conlangers do the opposite, just like proponents of “alternative” medicine, “alternative” history or “alternative” science, and therefore their outputs are corrupted from the start by their false expectations: facts are blurred, figures overestimated and findings biased, in the best case.

2. But people use it in Skype, Firefox, Facebook,… and there are a lot of Google hits for “Esperanto”. And the Wikipedia in Esperanto has a lot of articles!

So what? The Internet is not the real world. If you search for “herbal medicine”, “creationism” or “penis enlargement”, you’ll find a thousand times more information and websites (“Google hits”) than when looking for serious knowledge, say “surgery”. Likewise, you can find more websites in Esperanto than in Modern Hebrew, but Hebrew already has a strong community of (at least) some millions of third-generation native speakers who use Hebrew in daily life, while Esperanto, which had the broadest potential community, has just some hundreds of fans who play with new technologies, even though both language projects began at the same time, back in the 19th century.

Also, isn’t Wikipedia a language-popularity contest? A competition between conlangers, like Volapükists vs. Esperantists, Ido-ists vs. Interlingua-ists, Latinists vs. Anglo-Saxonists, etc. to see which “community” is able to sleep less and do nothing other than “translate” articles into their most spoken “languages”? How many articles have been written in Esperanto or Volapük, or in Anglo-Saxon or Latin, and how many of them have been consulted thereafter, and by how many people? In fact, Volapük now wins in number of articles, so should we all speak Volapük? No, Esperanto is better than Volapük, of course, because of bla bla…
I guess everyone wins here: Wikipedia gets more visitors and more people involved and ready to donate, while those language fans have something more to say when discussing the advantages: hey, we have X million articles in the almighty Wikipedia, while your language has fewer! Esperanto/Volapük/Ido/… is so cool, we have so many “speakers”! Then, congratulations to all of you Wikipedian conlangers; but, if I were you, I wouldn’t think the real world revolves around Wikipedia, Google or the popularity of any other (past or future) website.

3. Esperanto is far easier than what you are suggesting. I am fluent in Esperanto, and I only studied 3 hours! And so did my Esperantist friends!

Do you mean something like saying “me spikas lo esperanto linguo”, with that horrible native accent that only your countrymen understand, and then being able to tell anyone “I speak Esperanto fluently after 3 hours of study”? And then speaking two or three sentences made up of a mix of European words once a year with your Esperantist friends at an international “Congress”, before switching to English or to your mother tongue to really explain what you wanted to say? Well then yes, saying “I speak Esperanto fluently” or “I learned Esperanto in 2 days” is really really easy; hey, I’ve just discovered I am a fluent speaker of Esperanto, too! Esperanto is so cool…
But, speaking of ease… Have you conlangers noticed that it is “easy” only for (some) Western Europeans, because those “languages” you are using are made of a mix of the most common and simplest vocabulary of some Western European languages, whereas other speakers find it as difficult as any Western European language? Do you really really think it is easier than English for a Chinese speaker? I guess good old Mr. Zamenhof didn’t realize that English, French, Latin, Italian, German and Polish wouldn’t be the only international languages today, as they were back in the 19th century, when European countries made up almost the whole international community…
Furthermore, do you really really think that this supposed ease of use, which actually stems from the lack of elaborate grammatical and syntactic structures, doesn’t come at a cost in culture, communication and even reasoning?

4. But I’ve been told that Esperanto is successful because it has a (mostly) European vocabulary that makes it easy for Europeans, an agglutinative structure that makes it especially fit for Africans and Asians, and some other features that make it better than every other language for everyone…
I won’t go into linguistic details, because those assertions are obviously arbitrary and untrustworthy. Not only has Esperantism failed to prove such claims, but some people have also devoted extensive linguistic study and thought to checking whether they are right; Esperantism has received independent criticism from insiders and outsiders alike, and still its supporters repeat the same falsehoods again and again. See e.g. the thorough article “Learn not to speak Esperanto” which, from a conlanger’s point of view, discusses every supposed advantage of this Polish ophthalmologist’s conlang. Also, it is interesting that some researchers have described Esperanto, for most of its speakers, as an anti-language: they use the same grammar and words as the main speech community, but in a different way, so that they can only be understood by “insiders”. That may indeed be the key to the advantages that Esperantists of different generations and places perceive in Esperanto, just as anti-social groups use slang words to communicate with members of their community and to hide from outsiders. It is especially interesting in light of Esperantism being more an anti-social movement than the promotion of a language, representing Esperanto with flags, slogans (“democracy”, “rights”, “freedom”,…), international consultative organizations and congresses…

5. You talk about real cultural neutrality for the European Union; but, since there are several non-Indo-European languages inside the EU, Proto-Indo-European does not solve that issue either.

In fact, the European Union is made up of a great majority of Indo-European speakers (more than 97%, by a conservative estimate), and the rest (i.e. speakers of Hungarian, Finnish, Maltese and Basque) have a good knowledge (and a tradition of speaking) of other IE languages of Europe, viz. Latin, French, English, Swedish, Spanish. So we are proposing to adopt a natural language common to the GREAT majority of European Union citizens (just as Latin is common to the vast majority of Romance-speaking countries), instead of the current official situation(s) of the EU, like English, or English+French, or English+French+German… To say that Indo-European is not neutral as the European Union’s language, because not all languages spoken in the EU are Indo-European, is a weak argument; to say exactly that, and then to propose English, or English+French, or even a two-day-of-work invention (a vocabulary mix of 4 Western European languages) by a Polish ophthalmologist, is a big fallacy.

6. So why are you proposing Indo-European? Why do you bother?

Because we want to. Because we like Europe’s Indo-European and the other Proto-Indo-European dialects, just as people who study and speak Latin, Greek, or Sanskrit like those languages. Have you noticed the difference in culture, tradition, history, vocabulary, etc. between what you are suggesting (artificial one-man-made inventions) and real-world historical languages? Hint: that’s why many universities offer courses in or about Latin, Greek, Sanskrit, Proto-Indo-European, etc., while Esperanto is still (after more than a century) just another conlanging experiment for those who want to travel abroad once a year to meet other conlang fans.
We propose it because we believe this language could be one practical answer (maybe the only real one) to the communication problems that a unified European Union poses. Because we don’t believe that any “Toki Pona” language invented by one enlightened individual can solve any communication or cultural problem at all in the real world. Because historical, natural languages like Hebrew, Cornish, Manx or Basque are interesting and valuable to people, whereas “languages” like Esperanto, Interlingua, Ido, Lojban or Klingon aren’t. You cannot change how people think, but you can learn from their interests and customs and behave accordingly: if, knowing how people have reacted over a century to the Esperanto proposal and to the Hebrew revival, you decide to keep trying to change people (so that they accept inventions) instead of changing your ideas (so that you accept natural languages), maybe you lack adaptability, an essential resource in natural selection that applies to psychology too.

7. Why don’t you explain this when talking about Proto-Indo-European advantages in the Dnghu Association’s website?

Because if you make a website about science and include a reference like “Why shouldn’t you believe in Islamic creationism?”, you are in fact saying that Islamic creationism is so important that you have to mention it when talking about science… It’s like creating a website about Internal Medicine and trying to answer in your FAQ why Homeopathy is not the answer to your problems: it’s just not worth it, if you want to keep a serious appearance. We are not the anti-Esperanto league or anything like that, but the Indo-European Language Association.
Apart from this, proto-languages are indeed difficult to promote as ‘real’ languages, because there are no inscriptions in them, so they remain ‘hypothetical’, however well they might be reconstructed, like Europe’s Indo-European or Proto-Germanic; see Five lines of ancient script on a shard of pottery could be the longest proto-Canaanite text for a curious example of a proto-language turning into an attested dead language. For many people, Proto-Basque (for example) seems exactly as hypothetical as Proto-Indo-European, when in fact it isn’t. If we also mixed Esperanto into a serious explanation of our project as a real alternative, that would be another reason for readers to dismiss the project as “another conlanging joke”. No, thanks.

8. Esperanto has its advantages and disadvantages. You just don’t talk from an objective (or “neutral”) point of view: most linguists (of any opinion) are – like Esperantists – biased, so there is no single truth, but opinions.

Yes, indeed. Many Esperantists, like any supporters of pseudoscience, conclude that, since people might be for or against their theory, both positions are equally valid and should be taken with a grain of salt. On this question, I think it’s interesting for those who think in terms of the “equal validity” of their minority views, when confronted with what is generally accepted, to take a quick look at Wikipedia’s Neutral Point of View statement on equal validity, because they’ve had a lot of problems with that issue. To sum up, it says that if you talk about biology, you cannot demand that evolution and creationism be presented as equally valid theories just because some people (are willing to) assume they are; if you talk about the Holocaust, or medicine, you don’t present revisionism or alternative medicine as equally valid theories or sciences: there are academic and scientific criteria that help classify knowledge as scientific or pseudoscientific. Most (if not all) Esperantist claims are at best pseudoscientific, and when they claim real advantages for their conlang, those apply just as well (often better) to other conlangs or even to any language.

9. Then why does the “Universala Esperanto-Asocio” enjoy consultative relations with both UNESCO and the United Nations? Why is Esperantism described in terms of “democracy”, “education”, “rights”, “emancipation”,…? Why do Esperantists still support Esperanto, when it hasn’t got any advantages at all, and they know it?
The only possible conclusion is that Esperantism (and some other fanatic conlangism) is actually a religion, because it’s based on faith alone: faith in its supposed “easiness”, its supposed “neutrality”, its supposed “number of speakers”, without any facts, numbers or studies to support them; and on the belief that languages can be “better” or “worse” than others. And it’s obviously nonsense to discuss faith and beliefs, as useless as a discussion about Buddha, Muhammad or Jesus. But trying to disguise those beliefs as facts helps nobody, not even Esperantism, as it can only attract the very people who see creationism and alternative medicine as real alternatives to scientific knowledge. Esperanto is its god, Zamenhof its messiah and the UEA its church.