A manuscript co-authored by Angel Carracedo, from the University of Santiago de Compostela, and (always according to him) pre-accepted in Nature, will offer more insight into the population substructure of Spain, based on autosomal DNA.
Carracedo’s lecture about DNA (in Galician), including his summary of the paper (from December 2017):
Some of the points made in the video:
The study shows a situation parallelling – as expected – the expansion of Spanish Medieval kingdoms during the Reconquista (and subsequent repopulation).
In it, the biggest surprise seems to be the greater substructure found in Galicia, the north-western Spanish territory – greater even than expected by the authors.
As a side note, Galicia shows a great influence from “Moorish” ancestral components, due mainly to the influx from Portugal, which shows even more of it.
It is difficult to judge only from the image and his words, but one could say that there are:
Certain quite old ancestral Galician groups;
then two – also quite old – ancestral Basque groups;
then more recent Galician groups;
and then a common, central Spanish group – including
a wider Asturian-Catalan group, with a western Asturian-Leonese, and an eastern Catalan subgroup;
and a central Castilian-Aragonese group, also with a western Castilian, and an eastern Aragonese subgroup.
We thought that certain parts of the British Isles could show ancestral components related to the old population, although this has not proven exactly right, due to more recent population expansions.
However, this paper might shed light on the controversy surrounding Lusitanian (possibly Gallaico-Lusitanian) as a Pre-Celtic Indo-European group of Iberia, either slightly older as an Italo-Celtic dialect, or potentially from the Bell Beaker expansion, whose genetic imprint might have survived the Roman conquest, which apparently didn’t replace its ancestral population.
Given the presence of a central Spanish group opposed to the other minor groups – and knowing that (at least part of) the Medieval kingdoms should be related to the Occitan region – due to the Celtic expansion, and also potentially later during the Visigothic Kingdom, and the Carolingian Empire – , we can only guess that the other (north-western and Basque) groups are potentially quite old, and reflect prehistoric population structures.
Just speculating here, of course. Another interesting genetic paper to await…
Culture evolves according to dynamics on multiple temporal scales, from individuals’ minute-by-minute behaviour to millennia of cultural accumulation that give rise to population-level differences. These dynamics act on a range of entities—including behavioural sequences, ideas and artefacts as well as individuals, populations and whole species—and involve mechanisms at multiple levels, from neurons in brains to inter-population interactions. Studying such complex phenomena requires an integration of perspectives from a diverse array of fields, as well as bridging gaps between traditionally disparate areas of study. In this article, which also serves as an introduction to the current special issue, we highlight some specific respects in which the study of cultural evolution has benefited and should continue to benefit from an integrative approach. We showcase a number of pioneering studies of cultural evolution that bring together numerous disciplines. These studies illustrate the value of perspectives from different fields for understanding cultural evolution, such as cognitive science and neuroanatomy, behavioural ecology, population dynamics, and evolutionary genetics. They also underscore the importance of understanding cultural processes when interpreting research about human genetics, neuroscience, behaviour and evolution.
Within the past decade or so, archaeology has increasingly utilised and contributed to major advances in scientific methods when exploring the past. This progress is frequently celebrated as a quantum leap in the possibilities for understanding the archaeological record, opening up hitherto inaccessible dimensions of the past. This article represents a critique of the current consumption of science in archaeology, arguing that the discipline’s grounding in the humanities is at stake, and that the notion of ‘interdisciplinarity’ is becoming distorted with the increasing fetishisation of ‘data’, ‘facts’ and quantitative methods. It is argued that if archaeology is to break free of its self-induced inferiority to and dependence on science, it must revitalise its methodology for asking questions pertinent to the humanities.
Thus, I argue that what we are witnessing with ‘the third science revolution’ (Kristiansen 2014) is precisely the proliferation of an already very authoritative science ideal in archaeology. And I worry that this dominance will limit research possibilities and potentials rather than encouraging plurality and radical experimentation with different forms of knowing.
I do believe in the coexistence of disparate academic principles and that collaboration is very often necessary, but I am also of the conviction that some degree of epistemological friction keeps both fields of research progressing. Nurturing distinctions, in other words, is no less useful than aiming for assimilation. What I am arguing for is thus a more respectful friction than the one characterising the processual/post-processual collisions, hoping for an academic environment where differences between research ideals are humbly accepted and cultivated precisely for their disparate strengths.
So, what I am arguing for is a more kaleidoscopic academic landscape, where different positions do not always have to assume a defensive or compromising stance, especially in confrontation with paradigms that are prospering politically. This also implies that science is not simply in the service of archaeology, as Lidén argues, but that we need to consider how archaeology may benefit science more generally by continuing to debate epistemological grounds, methodology and our modes of inquiry. And so, my fellow archaeologists: ask not what science can do for us, but what we can do for science.
In my original article, I addressed the widespread tendency in archaeology to disseminate research findings with sometimes too much conviction, where ambiguous results (and limited statistical data) are adopted with little concern for the inherent uncertainties. It is precisely this valorisation and authority of scientific observations that I claim to lead to an implicit devaluation of studies based in the humanities. The problem is – as stated numerous times in my original article – not science, but the consumption of scientific observations in archaeology, where the subtleties and not least ambiguities of scientific results are filtered out, leaving space almost exclusively for scientifically ‘proven’ facts and unequivocal results. This mode of consumption stands in direct contrast to the epistemological observation in the sciences, dictating that ‘“proof” and “certainty” are actually in short supply in the world of science’ (Freudenburg et al. 2008, p. 5). Hence, the risk is that archaeology somewhat uncritically adopts scientific observations that are in fact ‘empirically underdetermined – based largely on evidence that is in the category of the “maybe,” being inherently ambiguous rather than being absolutely clear-cut’ (Freudenburg et al. 2008, p. 6).
As I said recently about the article Massive Migrations…, by Martin Furholt, we are living through a historic debate on essential questions for the future of all these disciplines.
And, as always, there is no shortcut to reading the texts. Unlike in Science, you cannot write a table with a summary of findings…
Featured image from Allentoft et al. “They conclude that the Corded Ware culture of central Europe had ancestry from the Yamnaya. Allentoft et al. also show that the Afanasievo culture to the east is related to the Yamnaya, and that the Sintashta and Andronovo cultures had ancestry from the Corded Ware. Arrows indicate migrations — those from the Corded Ware reflect the evidence that people of this archaeological culture (or their relatives) were responsible for the spreading of Indo-European languages. All coloured boundaries are approximate.”
These differences between closely related regions, in all these cases and especially among steppe cultures, even when they are supported by Archaeology and anthropological models of migration (and compatible with linguistic models), are expected to be minimal.
Fortunately, we have phylogeography, which helps us point in the right direction when assessing potential migrations using genomic data.
User Tomenable recently pointed out a curious finding on Anthrogenica, from data available in Mathieson et al (2017): in ADMIXTURE results with K=12, a different ancestral component (in light green in the paper, see below) is traceable from the North Caspian steppe since the Neolithic. This is also partially distinguishable on K=10 and K=11, although not so clearly differentiating among later cultures.
Interesting is also the appearance of similar ancestral components later in Vučedol – which probably received admixture from Yamna settlers (see admixture components in West Yamna samples and in the Yamna settler from Bulgaria) – , and later still in the Balkans.
On the other hand, previous ancestral components in outliers from the Balkans seem to be more similar to Sredni Stog samples, giving still more strength to the hypothesis that this common (“steppe”) component expanded westward within the Pontic-Caspian steppe with the spread of Suvorovo-Novodanilovka chiefs.
Problems with this interpretation include:
1) The scarce samples available, the different cultures included, and the CV values of the K populations selected in ADMIXTURE.
3) The sample classified as Latvia_LN/CWC has this component. I have already said before that, given the differences with all other Corded Ware samples, this quite early sample might be an outlier, with a Khvalynsk/Yamna population connected directly to the ancestors of this individual, possibly through exogamy (as is clear from my sketch below). Whether or not this is an outlier among CWC populations in the Baltic, only future samples can tell.
4) Three later individuals from Corded Ware in Germany have the component, in a minimal amount. I would bet – judging by their position in the graphic – that this might be explained through the Esperstedt family. These individuals might have in turn got the contribution directly from the oldest member, who shows what seems (in PCA) like a recent admixture from contemporary steppe cultures (such as the Catacomb culture).
Again, needle in a haystack… And confirmation bias by me, indeed.
But interesting nonetheless.
EDIT (4 JAN 2017): A reader points out that the interpretation of Unsupervised ADMIXTURE should work backwards (i.e. different contributions into different modern populations), and not based solely on ancestral populations, which seems probably right. So again, confirmation bias (and potentially wrong direction fallacy) by me…
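Regarding the CV values mentioned in point 1, here is a minimal sketch of how one might compare runs with different K. It assumes the runs were made with ADMIXTURE's cross-validation option, which prints one "CV error (K=…)" line per run; the error values below are invented purely for illustration and are not taken from Mathieson et al. (2017).

```python
import re

# Hypothetical log lines in the format ADMIXTURE prints when run with --cv.
# The error values are invented for illustration only.
LOG_LINES = [
    "CV error (K=10): 0.5312",
    "CV error (K=11): 0.5298",
    "CV error (K=12): 0.5305",
]

CV_PATTERN = re.compile(r"CV error \(K=(\d+)\): ([0-9.]+)")


def parse_cv_errors(lines):
    """Return {K: cv_error} for every line that matches the CV pattern."""
    errors = {}
    for line in lines:
        match = CV_PATTERN.search(line)
        if match:
            errors[int(match.group(1))] = float(match.group(2))
    return errors


def best_k(errors):
    """Return the K with the lowest cross-validation error."""
    return min(errors, key=errors.get)


cv_errors = parse_cv_errors(LOG_LINES)
chosen_k = best_k(cv_errors)
print(cv_errors, chosen_k)
```

A component that only separates at a K whose CV error is no better than its neighbours (as K=12 is in this invented example) deserves extra caution, which is part of the problem noted above.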
The history of human populations occupying the plains and mountain ridges separating Europe from Asia has been eventful, as these natural obstacles were crossed westward by multiple waves of Turkic and Uralic-speaking migrants as well as eastward by Europeans. Unfortunately, the material records of history of this region are not dense enough to reconstruct details of population history. These considerations stimulate growing interest to obtain a genetic picture of the demographic history of migrations and admixture in Northern Eurasia.
We genotyped and analyzed 1076 individuals from 30 populations with geographical coverage spanning from Baltic Sea to Baikal Lake. Our dense sampling allowed us to describe in detail the population structure, provide insight into genomic history of numerous European and Asian populations, and significantly increase quantity of genetic data available for modern populations in region of North Eurasia. Our study doubles the amount of genome-wide profiles available for this region.
We detected unusually high amount of shared identical-by-descent (IBD) genomic segments between several Siberian populations, such as Khanty and Ket, providing evidence of genetic relatedness across vast geographic distances and between speakers of different language families. Additionally, we observed excessive IBD sharing between Khanty and Bashkir, a group of Turkic speakers from Southern Urals region. While adding some weight to the “Finno-Ugric” origin of Bashkir, our studies highlighted that the Bashkir genepool lacks the main “core”, being a multi-layered amalgamation of Turkic, Ugric, Finnish and Indo-European contributions, which points at intricacy of genetic interface between Turkic and Uralic populations. Comparison of the genetic structure of Siberian ethnicities and the geography of the region they inhabit point at existence of the “Great Siberian Vortex” directing genetic exchanges in populations across the Siberian part of Asia.
Slavic speakers of Eastern Europe are, in general, very similar in their genetic composition. Ukrainians, Belarusians and Russians have almost identical proportions of Caucasus and Northern European components and have virtually no Asian influence. We capitalized on wide geographic span of our sampling to address intriguing question about the place of origin of Russian Starovers, an enigmatic Eastern Orthodox Old Believers religious group relocated to Siberia in seventeenth century. A comparative reAdmix analysis, complemented by IBD sharing, placed their roots in the region of the Northern European Plain, occupied by North Russians and Finno-Ugric Komi and Karelian people. Russians from Novosibirsk and Russian Starover exhibit ancestral proportions close to that of European Eastern Slavs, however, they also include between five to 10 % of Central Siberian ancestry, not present at this level in their European counterparts.
Our project has patched the hole in the genetic map of Eurasia: we demonstrated complexity of genetic structure of Northern Eurasians, existence of East-West and North-South genetic gradients, and assessed different inputs of ancient populations into modern populations.
Featured image, from the article: “Departures from the expected IBD. Shown populations exceed the expected IBD sharing by more than two standard deviations.”
A thousand years back I’m descended from nearly everyone everywhere in Europe. I’m related to these individuals via millions of lines of descent back through my vast family tree. Yet the majority of the lines back through my pedigree trace to people living in the UK and Western Europe. Many lines trace back to more distant locations, but these are relatively few in number compared to those tracing back to closer to home. Ancestors along each of these lines are (roughly) equally likely to contribute to my genome. Therefore, most of my roughly 2600 genetic ancestors from 1000 years ago, who contributed the majority of my genome to me, will be random people living in the UK and western Europe at that time (who happened to leave descendants).
Looking back a few thousand years more, I’m a descendant of nearly everyone who ever lived almost everywhere in the world (at least those who left descendants, and many did). Yet most of the just over ~6000 individuals from that time who contributed the majority of my genome to me will mostly be found all over Western Eurasia. There’s nothing much special about these individuals who happen to be my genetic ancestors a few thousand years back. They’re likely not royalty. My genetic ancestors are just a random subset of all of my genealogical ancestors, they just happen to be my genetic ancestors due to the vagaries of meiosis and recombination.
As always, a humbling example, e.g. for those looking at haplogroups in the distant past to make modern ethnolinguistic identifications.
Featured image (from the article): Simulation of how much of your autosomal genome is present in each genealogical ancestor as we go back up the generations. Image explained in detail in the article How many genetic ancestors do I have?
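To make the difference between genealogical and genetic ancestors concrete, here is a rough back-of-the-envelope sketch. It is not the simulation from the article: it assumes 22 autosomes, about 33 crossovers per meiosis, 25 years per generation, a Poisson approximation for the number of blocks inherited from each ancestor, and no pedigree collapse. All of these are illustrative assumptions, and the function names are hypothetical.

```python
import math

AUTOSOMES = 22               # number of autosome pairs
CROSSOVERS_PER_MEIOSIS = 33  # assumed average, for illustration
YEARS_PER_GENERATION = 25    # assumed generation time


def genealogical_ancestors(k):
    """Pedigree slots k generations back (ignores pedigree collapse)."""
    return 2 ** k


def expected_genetic_ancestors(k):
    """Approximate number of ancestors k generations back who left DNA.

    One parental set of autosomes is cut into roughly
    22 + 33 * (k - 1) chunks, spread over the 2 ** (k - 1) ancestors
    on that side. Treating the chunks per ancestor as Poisson, an
    ancestor contributes with probability 1 - exp(-mean).
    """
    chunks = AUTOSOMES + CROSSOVERS_PER_MEIOSIS * (k - 1)
    mean_blocks = chunks / 2 ** (k - 1)
    p_contributes = -math.expm1(-mean_blocks)  # stable 1 - exp(-mean)
    return genealogical_ancestors(k) * p_contributes


for years in (250, 500, 1000):
    k = years // YEARS_PER_GENERATION
    print(years, k, genealogical_ancestors(k),
          round(expected_genetic_ancestors(k)))
```

Under these assumptions, 1000 years is 40 generations, which gives about a trillion pedigree slots but only around 2600 expected genetic ancestors, in line with the figure quoted above.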
The extent of population structure within Ireland is largely unknown, as is the impact of historical migrations. Here we illustrate fine-scale genetic structure across Ireland that follows geographic boundaries and present evidence of admixture events into Ireland. Utilising the ‘Irish DNA Atlas’, a cohort (n = 194) of Irish individuals with four generations of ancestry linked to specific regions in Ireland, in combination with 2,039 individuals from the Peoples of the British Isles dataset, we show that the Irish population can be divided in 10 distinct geographically stratified genetic clusters; seven of ‘Gaelic’ Irish ancestry, and three of shared Irish-British ancestry. In addition we observe a major genetic barrier to the north of Ireland in Ulster. Using a reference of 6,760 European individuals and two ancient Irish genomes, we demonstrate high levels of North-West French-like and West Norwegian-like ancestry within Ireland. We show that our ‘Gaelic’ Irish clusters present homogenous levels of ancient Irish ancestries. We additionally detect admixture events that provide evidence of Norse-Viking gene flow into Ireland, and reflect the Ulster Plantations. Our work informs both on Irish history, as well as the study of Mendelian and complex disease genetics involving populations of Irish ancestry.
Previous studies of the genetic landscape of Ireland have suggested homogeneity, with population substructure undetectable using single-marker methods. Here we have harnessed the haplotype-based method fineSTRUCTURE in an Irish genome-wide SNP dataset, identifying 23 discrete genetic clusters which segregate with geographical provenance. Cluster diversity is pronounced in the west of Ireland but reduced in the east where older structure has been eroded by historical migrations. Accordingly, when populations from the neighbouring island of Britain are included, a west-east cline of Celtic-British ancestry is revealed along with a particularly striking correlation between haplotypes and geography across both islands. A strong relationship is revealed between subsets of Northern Irish and Scottish populations, where discordant genetic and geographic affinities reflect major migrations in recent centuries. Additionally, Irish genetic proximity of all Scottish samples likely reflects older strata of communication across the narrowest inter-island crossing. Using GLOBETROTTER we detected Irish admixture signals from Britain and Europe and estimated dates for events consistent with the historical migrations of the Norse-Vikings, the Anglo-Normans and the British Plantations. The influence of the former is greater than previously estimated from Y chromosome haplotypes. In all, we paint a new picture of the genetic landscape of Ireland, revealing structure which should be considered in the design of studies examining rare genetic variation and its association with traits.
Here are some interesting excerpts (emphasis mine):
Population structure in Ireland
The geographical distribution of this deep subdivision of Leinster resembles pre-Norman territorial boundaries which divided Ireland into fifths (cúige), with north Leinster a kingdom of its own known as Meath (Mide). However interpreted, the firm implication of the observed clustering is that despite its previously reported homogeneity, the modern Irish population exhibits genetic structure that is subtly but detectably affected by ancestral population structure conferred by geographical distance and, possibly, ancestral social structure.
ChromoPainter PC1 demonstrated high diversity amongst clusters from the west coast, which may be attributed to longstanding residual ancient (possibly Celtic) structure in regions largely unaffected by historical migration. Alternatively, genetic clusters may also have diverged as a consequence of differential influence from outside populations. This diversity between western genetic clusters cannot be explained in terms of geographic distance alone.
In contrast to the west of Ireland, eastern individuals exhibited relative homogeneity; (…) The overall pattern of western diversity and eastern homogeneity in Ireland may be explained by increased gene flow and migration into and across the east coast of Ireland from geographically proximal regions, the closest of which is the neighbouring island of Britain.
Analysis of variance of the British admixture component in cluster groups showed a significant difference (p < 2×10⁻¹⁶), indicating a role for British Anglo-Saxon admixture in distinguishing clusters, and ChromoPainter PC2 was correlated with the British component (p < 2×10⁻¹⁶), explaining approximately 43% of the variance. PC2 therefore captures an east to west Anglo-Celtic cline in Irish ancestry. This may explain the relative eastern homogeneity observed in Ireland, which could be a result of the greater English influence in Leinster and the Pale during the period of British rule in Ireland following the Norman invasion, or simply geographic proximity of the Irish east coast to Britain. Notably, the Ulster cluster group harboured an exceptionally large proportion of the British component (Fig 1D and 1E), undoubtedly reflecting the strong influence of the Ulster Plantations in the 17th century and its residual effect on the ethnically British population that has remained.
On the genetic structure of the British Isles
The genetic substructure observed in Ireland is consistent with long term geographic diversification of Celtic populations and the continuity shown between modern and Early Bronze Age Irish people
Clusters representing Celtic populations harbouring less Anglo-Saxon influence separate out above and below SEE on PC4. Notably, northern Irish clusters (NLU), Scottish (NISC, SSC and NSC), Cumbria (CUM) and North Wales (NWA) all separate out at a mutually similar level, representing northern Celtic populations. The southern Celtic populations Cornwall (COR), south Wales (SWA) and south Munster (SMN) also separate out on similar levels, indicating some shared haplotypic variation between geographically proximate Celtic populations across both Islands. It is notable that after the split of the ancestrally divergent Orkney, successive ChromoPainter PCs describe diversity in British populations where “Anglo-saxonization” was repelled. PC3 is dominated by Welsh variation, while PC4 in turn splits North and South Wales significantly, placing south Wales adjacent to Cornwall and north Wales at the other extreme with Cumbria, all enclaves where Brittonic languages persisted.
In an interesting symmetry, many Northern Irish samples clustered strongly with southern Scottish and northern English samples, defining the Northern Irish/Cumbrian/Scottish (NICS) cluster group. More generally, by modelling Irish genomes as a linear mixture of haplotypes from British clusters, we found that Scottish and northern English samples donated more haplotypes to clusters in the north of Ireland than to the south, reflecting an overall correlation between Scottish/north English contribution and ChromoPainter PC1 position in Fig 1 (Linear regression: p < 2×10⁻¹⁶, r² = 0.24).
North to south variation in Ireland and Britain are therefore not independent, reflecting major gene flow between the north of Ireland and Scotland (Fig 5) which resonates with three layers of historical contacts. First, the presence of individuals with strong Irish affinity among the third generation PoBI Scottish sample can be plausibly attributed to major economic migration from Ireland in the 19th and 20th centuries. Second, the large proportion of Northern Irish who retain genomes indistinguishable from those sampled in Scotland accords with the major settlements (including the Ulster Plantation) of mainly Scottish farmers following the 16th Century Elizabethan conquest of Ireland which led to these forming the majority of the Ulster population. Third, the suspected Irish colonisation of Scotland through the Dál Riata maritime kingdom, which expanded across Ulster and the west coast of Scotland in the 6th and 7th centuries, linked to the introduction and spread of Gaelic languages. Such a migratory event could work to homogenise older layers of Scottish population structure, in a similar manner as noted on the east coasts of Britain and Ireland. Earlier communications and movements across the Irish Sea are also likely, which at its narrowest point separates Ireland from Scotland by approximately 20 km.
Genomic footprints of migration into Ireland
Quite interestingly, it is haplogroups, and not admixture, that define the oldest migration layers into Ireland. Without evidence from paternal Y-DNA lineages we would probably not be able to ascertain the oldest migrations and the languages brought by migrants, including Celtic languages:
Of all the European populations considered, ancestral influence in Irish genomes was best represented by modern Scandinavians and northern Europeans, with a significant single-date one-source admixture event overlapping the historical period of the Norse-Viking settlements in Ireland (p < 0.01; fit quality FQB > 0.985; Fig 6). (…) This suggests a contribution of historical Viking settlement to the contemporary Irish genome and contrasts with previous estimates of Viking ancestry in Ireland based on Y chromosome haplotypes, which have been very low. The modern-day paucity of Norse-Viking Y chromosome haplotypes may be a consequence of drift with the small patrilineal effective population size, or could have social origins with Norse males having less influence after their military defeat and demise as an identifiable community in the 11th century, with persistence of the autosomal signal through recombination.
European admixture date estimates in northwest Ulster did not overlap the Viking age but did include the Norman period and the Plantations
The genetic legacies of the populations of Ireland and Britain are therefore extensively intertwined and, unlike admixture from northern Europe, too complex to model with GLOBETROTTER.
Featured image, from the article in Scientific Reports: The clustering of individuals with Irish and British ancestry based solely on genetics. Shown are 30 clusters identified by fineStructure from 2,103 Irish and British individuals. The dendrogram (left) shows the tree of clusters inferred by fineStructure and the map (right) shows the geographic origin of 192 Atlas Irish individuals and 1,611 British individuals from the Peoples of the British Isles (PoBI) cohort, labelled according to fineStructure cluster membership. Individuals are placed at the average latitude and longitude of either their great-grandparental (Atlas) or grandparental (PoBI) birthplaces. Great Britain is separated into England, Scotland, and Wales. The island of Ireland is split into the four Provinces; Ulster, Connacht, Leinster, and Munster. The outline of Britain was sourced from Global Administrative Areas (2012). GADM database of Global Administrative Areas, version 2.0. www.gadm.org. The outline of Ireland was sourced from Open Street Map Ireland, Copyright OpenStreetMap Contributors, (https://www.openstreetmap.ie/) – data available under the Open Database Licence. The figure was plotted in the statistical software language R, version 3.4.1, with various packages.
A preprint article by two of the most prolific researchers in Human Ancestry is out, and they request feedback: Ancient genomics: a new view into human prehistory and evolution, by Skoglund and Mathieson (2017). Right now, it is downloadable on Dropbox.
The first decade of ancient genomics has revolutionized the study of human prehistory and evolution. We review new insights based on ancient genomic data, including greatly increased resolution of the timing and structure of the out-of-Africa event, the diversification of present-day non-African populations, and the earliest expansions of those populations into Eurasia and America. Prehistoric genomes now document patterns of population continuity and change on every inhabited continent–in particular the effect of agricultural expansions in Africa, Europe and Oceania–and record a history of natural selection that shapes present-day phenotypic diversity. Despite these advances, much remains unknown, in particular about the genomic histories of Asia–the most populous continent, and Africa–the continent that contains the most genetic diversity. Ancient genomes from these and other regions, integrated with a growing understanding of the genomic basis of human phenotypic diversity, will be in focus during the next decade of research in the field.
The paper may be highly recommended as an introduction for anyone interested in the field of Human Ancestry in general.
The next substantial change is closely related to ancestry that by around 5000 BP extended over a region of more than 2000 miles of the Eurasian steppe, including in individuals associated with the Yamnaya Cultural Complex in far-eastern Europe (1; 38) and with the Afanasievo culture in the central Asian Altai mountains (1). This “steppe” ancestry is itself a mixture between ancestry that is related to Mesolithic hunter-gatherers of eastern Europe and ancestry that is related to both present-day populations (38) and Mesolithic hunter-gatherers (46) from the Caucasus mountains, and also to the populations of Neolithic (11), and Copper Age (56) Iran. Steppe ancestry appeared in southeastern Europe by 6000 BP (72), northeastern Europe around 5000 BP (47) and central Europe at the time of the Corded Ware Complex around 4600 BP (1; 38). These dates are reasonably tight constraints, because in each case there is no evidence of steppe ancestry in individuals immediately preceding these dates (47; 72). Gene flow on the steppe was extensive and bidirectional, as shown by the eastward flow of Anatolian Neolithic ancestry– reaching well into central Eurasia by the time of the Andronovo culture ~3500 BP (1)–and the westward flow of East Asian ancestry–found in individuals associated with the Iron Age Scythian culture close to the Black Sea ~2500 BP (143).
Copper and Bronze Age population movements (14; 78; Martiniano, 2017; 85; 112), as well as later movements in the Iron Age and Historical period (70; 119) further distributed steppe ancestry around Europe. Present-day western European populations can be modeled as mixtures of these three ancestry components (Mesolithic hunter-gatherer, Anatolian Neolithic and Steppe) (38; 57). In eastern Europe, further shifts in ancestry are the result of additional or distinct gene flow from Anatolia throughout the Neolithic and Bronze Age in the Aegean (42; 51; 55; 72; 87), and gene flow from Siberian-related populations in Finland and the Baltic region (38). East-west gene flow also brought new ancestry–related to populations from Copper Age Iran–to the Levant during the Copper and Bronze ages (39; 56).
The geographic structure of these population transformations gave rise to population structure of present-day Europe. For example Anatolian Neolithic ancestry is highest in southern European populations like Sardinians, and lowest in northern European populations (38). Steppe ancestry is at high frequency in north-central Europeans and low in the south. Isolation-by-distance may have contributed to these patterns to some extent, but the contribution must have been small. In much of Europe, extreme population discontinuity was the norm.
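As a toy illustration of what "modeled as mixtures of these three ancestry components" means, the sketch below fits mixture proportions to a target population by brute-force grid search over allele frequencies. This is a deliberately simplified stand-in: the papers cited in the review rely on formal statistics on real genotype data, not on this procedure, and every frequency and proportion below is invented.

```python
from itertools import product

# Invented allele frequencies at five SNPs for three source populations.
SOURCES = {
    "hunter_gatherer":     [0.10, 0.80, 0.30, 0.60, 0.20],
    "anatolian_neolithic": [0.70, 0.20, 0.50, 0.10, 0.90],
    "steppe":              [0.40, 0.50, 0.90, 0.30, 0.40],
}

# Invented "true" proportions used to build the toy target population.
TRUE_MIX = (0.2, 0.3, 0.5)


def mix(weights):
    """Allele frequencies of a population mixed with the given weights."""
    per_snp = zip(*SOURCES.values())
    return [sum(w * f for w, f in zip(weights, freqs)) for freqs in per_snp]


TARGET = mix(TRUE_MIX)


def fit_mixture(target, steps=20):
    """Grid-search weights (summing to 1) that minimise squared error."""
    best_weights, best_error = None, float("inf")
    for i, j in product(range(steps + 1), repeat=2):
        if i + j > steps:
            continue
        weights = (i / steps, j / steps, (steps - i - j) / steps)
        error = sum((t - m) ** 2 for t, m in zip(target, mix(weights)))
        if error < best_error:
            best_weights, best_error = weights, error
    return best_weights, best_error


weights, error = fit_mixture(TARGET)
print(dict(zip(SOURCES, weights)), error)
```

Because the target here is built from the sources themselves, the search recovers the invented proportions; with real data the fit is never this clean, and the choice of source populations matters as much as the arithmetic.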
Featured image: from the article, “Major Holocene population movements and expansions that have been demonstrated using ancient DNA.”
Human ancestry can only help solve anthropological questions by using all anthropological disciplines involved. I have said that many times in this blog.
Correlation does not mean causation
Really, it does not.
You might think the tenet ‘correlation does not mean causation’ must be evident by now in Statistics, and to everyone using statistical methods in their research. Sadly, it is not so. A lot of researchers just look for correlations and derive conclusions from them, without even a sound initial hypothesis to test… You can judge for yourself, e.g. by reading the many instances of this complaint about recent publications in Biomedical and Social Sciences on the interesting blog Statistical Modeling, Causal Inference, and Social Science.
In anthropological questions regarding Indo-European studies there is an added handicap: to avoid at least the most obvious confounders, not taking correlation to mean causation also means taking into account the many linguistic and archaeological data now available to explain the expansion of Indo-European languages.
You might also believe that international researchers in Human Evolutionary Biology – after all, this is essentially a biomedical discipline – are acquainted with statistical methods and their problems when applied to their field. And that scientific journals – especially those with the highest impact factors, like Nature, Science, or PNAS – have professional, careful reviewers who would never accept papers that equate correlation with causation, especially when Social Sciences are involved (because this alone might make errors grow exponentially…). Sadly, this is obviously not so, either.
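The confounder problem can be made concrete with a toy simulation. This is only a minimal sketch under invented assumptions: x, y, and z are hypothetical variables (think of x as the share of some ancestral component, y as any cultural trait, and z as a hidden common factor such as geography), and none of the numbers come from real data. By construction x and y never influence each other, yet they correlate strongly until z is controlled for.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=2000, seed=1):
    """Draw n samples where a hidden factor z drives both x and y.

    x and y never influence each other: each is z plus independent noise.
    """
    rng = random.Random(seed)
    zs = [rng.gauss(0, 1) for _ in range(n)]
    xs = [z + rng.gauss(0, 0.5) for z in zs]
    ys = [z + rng.gauss(0, 0.5) for z in zs]
    return zs, xs, ys


def residuals(vals, zs):
    """Remove the linear effect of z from vals (simple least squares)."""
    mz, mv = statistics.fmean(zs), statistics.fmean(vals)
    slope = (sum((z - mz) * (v - mv) for z, v in zip(zs, vals))
             / sum((z - mz) ** 2 for z in zs))
    return [(v - mv) - slope * (z - mz) for z, v in zip(zs, vals)]


zs, xs, ys = simulate()
raw_r = pearson(xs, ys)
partial_r = pearson(residuals(xs, zs), residuals(ys, zs))
print(f"raw correlation x~y:     {raw_r:.2f}")
print(f"after controlling for z: {partial_r:.2f}")
```

The raw correlation is high and the partial correlation is close to zero. In real research the hidden factor is usually not available as a tidy column of numbers, which is exactly why the linguistic and archaeological context has to do that job.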
Both studies [Haak et al. (2015) and this one] found a genetic affinity between samples from a central European culture known as Corded Ware, which existed from around 2500 bc, and samples from the earlier Yamnaya steppe culture. This similarity between distant populations is best explained by a substantial westward expansion of the Yamnaya or their close relatives into central Europe (Fig. 1b). Such an expansion is consistent with the steppe hypothesis, which argues that Corded Ware cultures were a conduit for the dispersal of Indo-European languages into Europe.
More interesting than these vague words – and the short, almost invisible suggestion that Yamna may not be exactly the population behind Corded Ware peoples – are the maps that illustrated their risky hypothesis in Nature. They called it the “steppe hypothesis”, just like that, in general terms, as if everyone defending a steppe origin for Proto-Indo-European would support such a model. In fact they were referring to the specific hypothesis of one of their authors (Kristiansen), one of the few archaeologists who keep Gimbutas’ concept of the ‘Kurgan peoples’ alive, based on the Corded Ware culture:
In many publications that followed, the trend has been to reproduce this graphical model, by asserting (or implying) that Bell Beaker peoples were the result of subsequent Corded Ware migrations, and indeed that Corded Ware peoples migrated from the Yamna culture, and were thus the vector of expansion for Indo-European languages in Europe.
We shall see then just a rather surreptitious shift in terminology from ‘Yamnaya’ to ‘steppe’ component, to adapt to the new data – i.e. some damage control while the ship of ‘Yamnaya ancestry’ capsizes – but little else. “Earlier ‘Yamnaya ancestry’, you say? Just, you know, let’s call it ‘steppe ancestry’ and shift the expansion of Indo-European languages to one or two thousand years earlier, and done!”
The damage of this post-truth genetics is already done: we will see the unending distribution on the Internet in general, and on social networks in particular, of these grandiose conclusions, of far-fetched Indo-European migration models that include the Corded Ware culture, of simplistic maps with apparently harmless ‘arrows of migration’ (like the above) representing fictional population movements suggesting nonexistent dialectal branches.
You might be one of those sceptics wary of so many boring statistical rules: “But it’s safe reasoning: Yamnaya samples have an ‘ancestral component’ that is found elevated in Corded Ware samples, and less so in Bell Beaker samples, and PCA showed a similar result… so the migration model Yamnaya -> Corded Ware -> Bell Beaker is a priori correct, right?”
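A bit of toy arithmetic shows why that reasoning is not safe. The sketch below is purely illustrative: the populations A, B, C, and U, the mixing fractions, and the helper names are all hypothetical and not taken from any study. It builds three different histories (a chain A -> B -> C, two independent movements from A, and two movements from an unsampled relative U) and compares the component shares each one leaves behind.

```python
import math


def admix(recipient, donor, donor_fraction):
    """Share of the component after the donor contributes donor_fraction of the ancestry."""
    return (1 - donor_fraction) * recipient + donor_fraction * donor


LOCAL = 0.0    # hypothetical: local populations start without the component
CARRIER = 1.0  # hypothetical: a population carrying the component in full


def chain_model():
    """A -> B -> C: C receives the component only through B."""
    b = admix(LOCAL, CARRIER, 0.8)
    c = admix(LOCAL, b, 0.5)
    return {"A": CARRIER, "B": b, "C": c}


def independent_model():
    """A -> B and A -> C: two separate movements, B and C never mix."""
    b = admix(LOCAL, CARRIER, 0.8)
    c = admix(LOCAL, CARRIER, 0.4)
    return {"A": CARRIER, "B": b, "C": c}


def unsampled_model():
    """U -> B and U -> C: the donor is an unsampled relative of A."""
    unsampled = CARRIER  # assumption: U carries the component as fully as A
    b = admix(LOCAL, unsampled, 0.8)
    c = admix(LOCAL, unsampled, 0.4)
    return {"A": CARRIER, "B": b, "C": c}


def same_shares(m1, m2):
    """True if both histories leave the same component share in every population."""
    return all(math.isclose(m1[k], m2[k]) for k in m1)


for model in (chain_model, independent_model, unsampled_model):
    shares = model()
    print(model.__name__, {k: round(v, 2) for k, v in shares.items()})
```

All three histories produce the same decreasing gradient A > B > C, so the gradient alone cannot tell them apart. Choosing between them requires information from outside the admixture numbers: dates, cultures, haplogroups, and the archaeological and linguistic models already on the table.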
The ‘Future American’ hypothesis
Let me illustrate this attractive “Correlation = Causation” argument, using it to solve the problem of Future American languages.
Suppose we live in a future post-apocalyptic world ca. 3500 AD, with no surviving historical records before 3000 AD. None. Just investigation of cultures and their relationship by Archaeology, proto-languages reconstructed and language families identified by Linguistics, etc.
We have thus Future Germanic and Future Romance as the only language families spoken in Future Western Europe and in the Future Americas, in a distribution similar to the present day*, and we have certain somehow related archaeologically-defined cultures on both sides of the Atlantic, like Briton, Iberian, Norman, or Lowlandish, although their distribution remains partly undefined in time and space.
* If you are really curious about this scenario, you can read about the potential evolution of a Future North-American language.
But what languages did the ancestors of Future Americans speak, and who spread them? That question remains far from being settled by our future researchers, in spite of the solidest linguistic and migration models (talking mainly about Briton and Iberian cultures): too many authorities out there questioning them, fighting to impose their own pet theories.
Suddenly, the newly developed field of Human Ancestry comes to save the day. So let’s say we have this map of ancient samples recovered (dated from, say, the 6th to the 18th century AD), and our study is centered on the newly described “Western European” component (a precise combination of, say, WHG+steppe), which peaks in early samples from the Low Lands – hence we call it, quite daringly, the “Lowlandic component”.
Our group is keen to demonstrate that the ancient Lowlandic culture described in Archaeology (marked especially by the worldwide distribution of tulips among other traits) is the origin of Western European and American languages… Now, let’s reach conclusions about migrations in the Middle Ages!
PCA shows that South-West European samples cluster closely to some North-West European samples, and that some late South American samples available cluster at some distance from North American samples – nearer to a native component represented by two individuals with 0% Lowlandic ancestry and a different cluster in PCA. And some North-American samples cluster quite closely to North-West European samples.
Based on the decrease in ‘Lowlandic component’ in the different samples and on PCA, we conclude that Lowlandic peoples (“or their close relatives”) must have migrated at the same time to North America, South America (or potentially from North America to South America?) as well as western, central, and northern Europe. Both migration events must have happened roughly at the same time, in part because both distinct language families appear in a north-south distribution, and Proto-Lowlandic must be (according to Genetics) the ancestor of both Proto-Future-Germanic and Proto-Future-Romance.
That makes a lot of sense! A huge Lowlandic pressure for migration, you see. Push-pull mechanisms and stuff. A Lowlandic Empire probably (scattered remains are found everywhere)! And, judging by the presence of the ‘Lowlandic component’ in Future East Europe from the Elbe to the Vistula, maybe Lowlandic peoples spread Proto-Slavic, too! We can even date the common Lowlandic-Slavic proto-language this way! So many groundbreaking conclusions!
Future scholars supporting the Lowlandic homeland are on fire; they can’t get enough of publishing papers on the subject. “Two different Future American language families with cultural origins in Britain and Iberia, my ass! Because genetics.”
And don’t forget the future people of haplogroup R1b-U106 and high Lowlandic component: Wow, they are the heirs of those who expanded Future Germanic and Future Romance languages everywhere, aren’t they? How proud they must be. And who wouldn’t want to have these tall, blond, blue-eyed Lowlanders as their forefathers? Personalised genetic analysis is selling like crazy: “let’s know our Lowlandic percentage!”. Everyone is happy, colourful maps with lots of arrows and shit…
But – your future you might ask in awe, seeing that this doesn’t sound quite right, based on your basic archaeological and linguistic knowledge:
What about specific models of migration proposed to date? The solidest ones, not just any one that seems to fit?
What about the dialectal classification of languages? The mainstream ones, not those that are compatible with this interpretation?
What about archaeological cultures to which individual samples belonged?
What about the actual dates of each sample? And how does each date relate to the state of the culture to which the sample belongs?
What about the haplogroups, and the actual subclade of each haplogroup?
What about the territories, cultures, and dates not sampled, could they change this interpretation in light of known archaeological models?
And what about the actual origin of that ancestral component they so frivolously named? Did it really appear ex nihilo in the Low Lands, and expand from there?
“Who cares! This new data is sooo coool… And it proves what we wanted, what a coincidence! And it’s numbers, mate! Numbers don’t lie.”
Genetic and archaeological studies have established a sub-Saharan African origin for anatomically modern humans with subsequent migrations out of Africa. Using the largest multi-locus data set known to date, we investigated genetic differentiation of early modern humans, human admixture and migration events, and relationships among ancestries and language groups. We compiled publicly available genome-wide genotype data on 5,966 individuals from 282 global samples, representing 30 primary language families. The best evidence supports 21 ancestries that delineate genetic structure of present-day human populations. Independent of self-identified ethno-linguistic labels, the vast majority (97.3%) of individuals have mixed ancestry, with evidence of multiple ancestries in 96.8% of samples and on all continents. The data indicate that continents, ethno-linguistic groups, races, ethnicities, and individuals all show substantial ancestral heterogeneity. We estimated correlation coefficients ranging from 0.522 to 0.962 between ancestries and language families or branches. Ancestry data support the grouping of Kwadi-Khoe, Kx’a, and Tuu languages, support the exclusion of Omotic languages from the Afroasiatic language family, and do not support the proposed Dené-Yeniseian language family as a genetically valid grouping. Ancestry data yield insight into a deeper past than linguistic data can, while linguistic data provide clarity to ancestry data.
Regarding European ancestry:
Southern European ancestry correlates with both Italic and Basque speakers (r = 0.764, p = 6.34 × 10^−49). Northern European ancestry correlates with Germanic and Balto-Slavic branches of the Indo-European language family as well as Finno-Ugric and Mordvinic languages of the Uralic family (r = 0.672, p = 4.67 × 10^−34). Italic, Germanic, and Balto-Slavic are all branches of the Indo-European language family, while the correlation with languages of the Uralic family is consistent with an ancient migration event from Northern Asia into Northern Europe. Kalash ancestry is widely spread but is the majority ancestry only in the Kalash people (Table S3). The Kalasha language is classified within the Indo-Iranian branch of the Indo-European language family.
Sure, admixture analysis came to save the day. Yet again. Now it’s not just Archaeology related to language anymore, it’s Linguistics; all modern languages and their classification, no less. Because why the hell not? Why would anyone study languages, history, archaeology, etc. when you can run certain algorithms on free datasets of modern populations to explain everything?
What I am criticising here, as always, is not the study per se, its methods (PCA, the use of Admixture or any other tools), or its results, which might be quite interesting – even regarding the origin or position of certain languages (or more precisely their speakers) within their linguistic groups; it’s the many broad, unsupported, striking conclusions (read the article if you want to see more wishful thinking).
This is obviously simplistic citebait – which benefits only journals and authors, and is therefore tacitly encouraged – but it is not knowledge, because it is not supported by any linguistic or archaeological data or expertise.
Is anyone with a minimum knowledge of languages, or general anthropology, actually reviewing these articles?