A preprint article by two of the most prolific researchers in Human Ancestry is out, and they request feedback: Ancient genomics: a new view into human prehistory and evolution, by Skoglund and Mathieson (2017). Right now, it is downloadable on Dropbox.
The first decade of ancient genomics has revolutionized the study of human prehistory and evolution. We review new insights based on ancient genomic data, including greatly increased resolution of the timing and structure of the out-of-Africa event, the diversification of present-day non-African populations, and the earliest expansions of those populations into Eurasia and America. Prehistoric genomes now document patterns of population continuity and change on every inhabited continent–in particular the effect of agricultural expansions in Africa, Europe and Oceania–and record a history of natural selection that shapes present-day phenotypic diversity. Despite these advances, much remains unknown, in particular about the genomic histories of Asia–the most populous continent, and Africa–the continent that contains the most genetic diversity. Ancient genomes from these and other regions, integrated with a growing understanding of the genomic basis of human phenotypic diversity, will be in focus during the next decade of research in the field.
The paper is highly recommended as an introduction for anyone interested in the field of Human Ancestry.
The next substantial change is closely related to ancestry that by around 5000 BP extended over a region of more than 2000 miles of the Eurasian steppe, including in individuals associated with the Yamnaya Cultural Complex in far-eastern Europe (1; 38) and with the Afanasievo culture in the central Asian Altai mountains (1). This “steppe” ancestry is itself a mixture between ancestry that is related to Mesolithic hunter-gatherers of eastern Europe and ancestry that is related to both present-day populations (38) and Mesolithic hunter-gatherers (46) from the Caucasus mountains, and also to the populations of Neolithic (11), and Copper Age (56) Iran. Steppe ancestry appeared in southeastern Europe by 6000 BP (72), northeastern Europe around 5000 BP (47) and central Europe at the time of the Corded Ware Complex around 4600 BP (1; 38). These dates are reasonably tight constraints, because in each case there is no evidence of steppe ancestry in individuals immediately preceding these dates (47; 72). Gene flow on the steppe was extensive and bidirectional, as shown by the eastward flow of Anatolian Neolithic ancestry– reaching well into central Eurasia by the time of the Andronovo culture ~3500 BP (1)–and the westward flow of East Asian ancestry–found in individuals associated with the Iron Age Scythian culture close to the Black Sea ~2500 BP (143).
Copper and Bronze Age population movements (14; 78; Martiniano, 2017; 85; 112), as well as later movements in the Iron Age and Historical period (70; 119) further distributed steppe ancestry around Europe. Present-day western European populations can be modeled as mixtures of these three ancestry components (Mesolithic hunter-gatherer, Anatolian Neolithic and Steppe) (38; 57). In eastern Europe, further shifts in ancestry are the result of additional or distinct gene flow from Anatolia throughout the Neolithic and Bronze Age in the Aegean (42; 51; 55; 72; 87), and gene flow from Siberian-related populations in Finland and the Baltic region (38). East-west gene flow also brought new ancestry–related to populations from Copper Age Iran–to the Levant during the Copper and Bronze ages (39; 56).
The geographic structure of these population transformations gave rise to population structure of present-day Europe. For example Anatolian Neolithic ancestry is highest in southern European populations like Sardinians, and lowest in northern European populations (38). Steppe ancestry is at high frequency in north-central Europeans and low in the south. Isolation-by-distance may have contributed to these patterns to some extent, but the contribution must have been small. In much of Europe, extreme population discontinuity was the norm.
Featured image: from the article, “Major Holocene population movements and expansions that have been demonstrated using ancient DNA.”
Ancient DNA studies have established that Neolithic European populations were descended from Anatolian migrants who received a limited amount of admixture from resident hunter-gatherers. Many open questions remain, however, about the spatial and temporal dynamics of population interactions and admixture during the Neolithic period. Here we investigate the population dynamics of Neolithization across Europe using a high-resolution genome-wide ancient DNA dataset with a total of 180 samples, of which 130 are newly reported here, from the Neolithic and Chalcolithic periods of Hungary (6000–2900 BC, n = 100), Germany (5500–3000 BC, n = 42) and Spain (5500–2200 BC, n = 38). We find that genetic diversity was shaped predominantly by local processes, with varied sources and proportions of hunter-gatherer ancestry among the three regions and through time. Admixture between groups with different ancestry profiles was pervasive and resulted in observable population transformation across almost all cultural transitions. Our results shed new light on the ways in which gene flow reshaped European populations throughout the Neolithic period and demonstrate the potential of time-series-based sampling and modelling approaches to elucidate multiple dimensions of historical population interactions.
There were some interesting finds on a regional level, with some late survival of hunter-gatherer ancestry (and Y-DNA haplogroups) in certain specific sites, but nothing especially surprising. This survival of HG ancestry and lineages in Iberia and other regions may be used to revive (yet again) the controversy over the origin of non-Indo-European languages of Europe attested in historical times, such as the only (non-Uralic) one surviving to this day, the Basque language.
This study kept confirming the absence of Y-DNA R1b-M269 subclades in Central Europe before the arrival of Yamna migrants, though, which offers strong reasons to reject the Indo-European from the west hypothesis.
First comes the PCA of the samples included in this paper, and then the PCA of ancient Eurasians (Mathieson et al. 2017) and modern populations (Lazaridis et al. 2014), for comparison of similar clusters:
Human ancestry can only help solve anthropological questions by using all anthropological disciplines involved. I have said that many times in this blog.
Correlation does not mean causation
Really, it does not.
You might think the tenet ‘correlation does not mean causation‘ must be evident at this point in Statistics, and it must also be for all those using statistical methods in their research. But it is sadly not so. A lot of researchers just look for correlation, and derive conclusions – without even an initial sound hypothesis to be contrasted… You can judge for yourself, e.g. reading the many instances of this complaint in recent publications of Biomedical and Social Sciences, on the interesting blog Statistical Modeling, Causal Inference, and Social Science.
In anthropological questions regarding Indo-European studies there is an added handicap: refusing to take correlation for causation also means taking into account, to avoid at least the most obvious confounders, the many linguistic and archaeological data now available to explain the expansion of Indo-European languages.
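To make the point about confounders concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the variables `z`, `x` and `y`, the noise levels and the sample size are not taken from any study. A hidden factor `z` drives both `x` and `y`, which never influence each other, yet they correlate strongly until `z` is controlled for.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def residuals(ys, zs):
    """What is left of ys after a simple linear regression on zs."""
    n = len(ys)
    mz, my = sum(zs) / n, sum(ys) / n
    slope = (sum((z - mz) * (y - my) for z, y in zip(zs, ys))
             / sum((z - mz) ** 2 for z in zs))
    return [y - my - slope * (z - mz) for z, y in zip(zs, ys)]


rng = random.Random(1)  # fixed seed so the run is repeatable
n = 5000

# Hypothetical hidden confounder (think: shared geography).
z = [rng.gauss(0, 1) for _ in range(n)]
# x and y both depend on z plus independent noise, never on each other.
x = [zi + rng.gauss(0, 0.5) for zi in z]
y = [zi + rng.gauss(0, 0.5) for zi in z]

raw_r = pearson(x, y)
partial_r = pearson(residuals(x, z), residuals(y, z))

print(f"raw correlation of x and y:        {raw_r:.2f}")
print(f"correlation after controlling z:   {partial_r:.2f}")
```

Read `z` as, say, shared geography, and `x` and `y` as an ancestry component and a cultural trait: the raw correlation is strong, but it says nothing about which of the two, if either, spread the other.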
You might also believe that international researchers in Human Evolutionary Biology – after all, this is essentially a biomedical discipline – are acquainted with statistical methods and their problems when applied to their field. And that scientific journals – and especially those with the highest impact factors, like Nature, Science, or PNAS – have professional, careful reviewers who would never accept papers that equate correlation with causation, especially when Social Sciences are involved (because this alone might make errors grow exponentially…). Sadly, this is obviously not so, either.
Both studies [Haak et al. (2015) and this one] found a genetic affinity between samples from a central European culture known as Corded Ware, which existed from around 2500 bc, and samples from the earlier Yamnaya steppe culture. This similarity between distant populations is best explained by a substantial westward expansion of the Yamnaya or their close relatives into central Europe (Fig. 1b). Such an expansion is consistent with the steppe hypothesis, which argues that Corded Ware cultures were a conduit for the dispersal of Indo-European languages into Europe.
More interesting than these vague words (and the short, almost invisible suggestion that Yamna may not be exactly the population behind Corded Ware peoples) are the maps that illustrated their risky hypothesis in Nature. They called it the “steppe hypothesis”, in general terms, as if everyone defending a steppe origin for Proto-Indo-European would support such a model. In fact, they were referring to the specific hypothesis of one of their authors (Kristiansen), one of the few archaeologists who keep Gimbutas’ concept of the ‘Kurgan peoples’ alive, based on the Corded Ware culture:
In many publications that followed, the trend has been to reproduce this graphical model, by asserting (or implying) that Bell Beaker peoples were the result of subsequent Corded Ware migrations, and indeed that Corded Ware peoples migrated from the Yamna culture, and were thus the vector of expansion for Indo-European languages in Europe.
We shall see then just a rather surreptitious shift in terminology from ‘Yamnaya’ to ‘steppe’ component, to adapt to the new data – i.e. some damage control while the ship of ‘Yamnaya ancestry’ capsizes – but little else. “Earlier ‘Yamnaya ancestry’, you say? Just, you know, let’s call it ‘steppe ancestry’ and shift the expansion of Indo-European languages to one or two thousand years earlier, and done!”
The damage of this post-truth genetics is already done: we will see the unending distribution on the Internet in general, and on social networks in particular, of these grandiose conclusions, of far-fetched Indo-European migration models that include the Corded Ware culture, of simplistic maps with apparently harmless ‘arrows of migration’ (like the above) representing fictional population movements suggesting nonexistent dialectal branches.
You might be one of those sceptics wary of so many boring statistical rules: “But it’s a safe reasoning: Yamnaya samples have an ‘ancestral component’ that is found elevated in Corded Ware samples, and less so in Bell Beaker samples, and PCA showed a similar result…so the migration model Yamnaya -> Corded Ware -> Bell Beaker is a priori correct, right?”
The ‘Future American’ hypothesis
Let me illustrate this attractive “Correlation = Causation” argument, using it to solve the problem of Future American languages.
Suppose we live in a future post-apocalyptic world ca. 3500 AD, with no surviving historical records before 3000 AD. None. Just investigation of cultures and their relationship by Archaeology, proto-languages reconstructed and language families identified by Linguistics, etc.
We have thus Future Germanic and Future Romance as the only language families spoken in Future Western Europe and in the Future Americas, in a distribution similar to the present day*, and we have certain somehow related archaeologically-defined cultures on both sides of the Atlantic, like Briton, Iberian, Norman, or Lowlandish, although their distribution remains partly undefined in time and space.
* If you are really curious about this scenario, you can read about the potential evolution of a Future North-American language.
But what languages did the ancestors of Future Americans speak, and who spread them? That question remains far from being settled by our future researchers, in spite of the solidest linguistic and migration models (talking mainly about Briton and Iberian cultures): too many authorities out there questioning them, fighting to impose their own pet theories.
Suddenly, the newly developed field of Human Ancestry comes to save the day. So let’s say we have this map of ancient samples recovered (dated from, say, the 6th to the 18th century AD), and our study is centered on the newly described “Western European” component (a precise combination of, say, WHG+steppe), which peaks in early samples from the Low Lands – hence we call it, quite daringly, “Lowlandic component“.
Our group is keen to demonstrate that the ancient Lowlandic culture described in Archaeology (marked especially by the worldwide distribution of tulips among other traits) is the origin of Western European and American languages… Now, let’s reach conclusions about migrations in the Middle Ages!
PCA shows that South-West European samples cluster closely to some North-West European samples, and that some late South American samples available cluster at some distance from North American samples – nearer to a native component represented by two individuals with 0% Lowlandic ancestry and a different cluster in PCA. And some North-American samples cluster quite closely to North-West European samples.
Based on the decrease in ‘Lowlandic component’ in the different samples and on PCA, we conclude that Lowlandic peoples (“or their close relatives”) must have migrated at the same time to North America, South America (or potentially from North America to South America?) as well as western, central, and northern Europe. Both migration events must have happened roughly at the same time, in part because both distinct language families appear in a north-south distribution, and Proto-Lowlandic must be (according to Genetics) the ancestor of both Proto-Future-Germanic and Proto-Future-Romance.
That makes a lot of sense! A huge Lowlandic pressure for migration, you see. Push-pull mechanisms and stuff. A Lowlandic Empire probably (scattered remains are found everywhere)! And, judging by the presence of the ‘Lowlandic component’ in Future East Europe from the Elbe to the Vistula, maybe Lowlandic peoples spread Proto-Slavic, too! We can even date the common Lowlandic-Slavic proto-language this way! So many groundbreaking conclusions!
Future scholars supporting the Lowlandic homeland are on fire; they can’t get enough of publishing papers on the subject. “Two different Future American language families with cultural origins in Britain and Iberia, my ass! Because genetics.”
And don’t forget the future people of haplogroup R1b-U106 and high Lowlandic component: Wow, they are the heirs of those who expanded Future Germanic and Future Romance languages everywhere, aren’t they? How proud they must be. And who wouldn’t want to have these tall, blond, blue-eyed Lowlanders as their forefathers? Personalised genetic analysis is selling like crazy: “let’s know our Lowlandic percentage!”. Everyone is happy, colourful maps with lots of arrows and shit…
But – your future you might ask in awe, seeing that this doesn’t sound quite right, based on your basic archaeological and linguistic knowledge:
What about specific models of migration proposed to date? The solidest ones, not just anyone that seems to fit?
What about the dialectal classification of languages? The mainstream ones, not those that are compatible with this interpretation?
What about archaeological cultures to which individual samples belonged?
What about the actual dates of each sample? And how does each date relate to the state of the culture to which the sample belongs?
What about the haplogroups, and the actual subclade of each haplogroup?
What about the territories, cultures, and dates not sampled, could they change this interpretation in light of known archaeological models?
And what about the actual origin of that ancestral component they so frivolously named? Did it really appear ex nihilo in the Low Lands, and expand from there?
“Who cares! This new data is sooo coool… And it proves what we wanted, what a coincidence! And it’s numbers, mate! Numbers don’t lie.”
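The questions above can be boiled down to a toy calculation. The sketch below uses invented numbers and a hypothetical helper `mix`, and it assumes a crude two-way mixing model (not the actual algorithm of any admixture software). Two very different migration histories produce exactly the same ‘Lowlandic component’ in every sampled population.

```python
from fractions import Fraction


def mix(incoming_share, incoming_component, local_component=Fraction(0)):
    """Component fraction after locals absorb a given share of incomers."""
    return (incoming_share * incoming_component
            + (1 - incoming_share) * local_component)


LOWLANDS = Fraction(1)  # invented: the component peaks at 100% here


def lowlandic_model():
    """Lowlanders migrate directly to Britain and to North America."""
    return {
        "lowlands": LOWLANDS,
        "britain": mix(Fraction(7, 10), LOWLANDS),
        "north_america": mix(Fraction(2, 5), LOWLANDS),
    }


def briton_model():
    """An older, unsampled source feeds Britain; Britons then cross the Atlantic."""
    older_source = Fraction(1)  # never sampled by our future researchers
    britain = mix(Fraction(7, 10), older_source)
    return {
        "lowlands": LOWLANDS,
        "britain": britain,
        "north_america": mix(Fraction(4, 7), britain),
    }


same = lowlandic_model() == briton_model()
print(same)  # True: the component percentages cannot tell the two stories apart
```

The decreasing percentages are identical in both models, so choosing between them requires exactly what the list of questions asks for: dates, cultures, haplogroup subclades, and the territories not yet sampled.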
I have just uploaded the working draft of the third version of the Indo-European demic diffusion model. Unlike the previous two versions, which were published as essays (fully developed papers), this new version adds more information on human admixture, and probably needs important corrections before a definitive edition can be published.
The third version is available right now on ResearchGate and Academia.edu. I will post the PDF at Academia Prisca as soon as possible.
Feel free to comment on the paper here, or (preferably) in our forum.
A working version (needing some corrections) divided by sections, illustrated with up-to-date, high resolution maps, can be found (as always) at the official collaborative Wiki website indo-european.info.
Genetic and archaeological studies have established a sub-Saharan African origin for anatomically modern humans with subsequent migrations out of Africa. Using the largest multi-locus data set known to date, we investigated genetic differentiation of early modern humans, human admixture and migration events, and relationships among ancestries and language groups. We compiled publicly available genome-wide genotype data on 5,966 individuals from 282 global samples, representing 30 primary language families. The best evidence supports 21 ancestries that delineate genetic structure of present-day human populations. Independent of self-identified ethno-linguistic labels, the vast majority (97.3%) of individuals have mixed ancestry, with evidence of multiple ancestries in 96.8% of samples and on all continents. The data indicate that continents, ethno-linguistic groups, races, ethnicities, and individuals all show substantial ancestral heterogeneity. We estimated correlation coefficients ranging from 0.522 to 0.962 between ancestries and language families or branches. Ancestry data support the grouping of Kwadi-Khoe, Kx’a, and Tuu languages, support the exclusion of Omotic languages from the Afroasiatic language family, and do not support the proposed Dené-Yeniseian language family as a genetically valid grouping. Ancestry data yield insight into a deeper past than linguistic data can, while linguistic data provide clarity to ancestry data.
Regarding European ancestry:
Southern European ancestry correlates with both Italic and Basque speakers (r = 0.764, p = 6.34 × 10−49). Northern European ancestry correlates with Germanic and Balto-Slavic branches of the Indo-European language family as well as Finno-Ugric and Mordvinic languages of the Uralic family (r = 0.672, p = 4.67 × 10−34). Italic, Germanic, and Balto-Slavic are all branches of the Indo-European language family, while the correlation with languages of the Uralic family is consistent with an ancient migration event from Northern Asia into Northern Europe. Kalash ancestry is widely spread but is the majority ancestry only in the Kalash people (Table S3). The Kalasha language is classified within the Indo-Iranian branch of the Indo-European language family.
Sure, admixture analysis came to save the day. Yet again. Now it’s not just Archaeology related to language anymore, it’s Linguistics; all modern languages and their classification, no less. Because why the hell not? Why would anyone study languages, history, archaeology, etc. when you can run certain algorithms on free datasets of modern populations to explain everything?
What I am criticising here, as always, is not the study per se, its methods (PCA, the use of Admixture or any other tools), or its results, which might be quite interesting – even regarding the origin or position of certain languages (or more precisely their speakers) within their linguistic groups; it’s the many broad, unsupported, striking conclusions (read the article if you want to see more wishful thinking).
This is obviously simplistic citebait. It benefits journals and authors, and is therefore tacitly encouraged, but it does not benefit knowledge, because it is not supported by any linguistic or archaeological data or expertise.
Is anyone with a minimum knowledge of languages, or general anthropology, actually reviewing these articles?
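A small simulation shows why a high correlation between an ancestry and a set of languages says little about how those languages are classified. This is only a sketch: the group means, the sample sizes and the use of a plain Pearson coefficient against a 0/1 indicator are my assumptions, not the procedure of the paper. Speakers of two unrelated language groups (Italic and Basque) are simply given similar amounts of a ‘Southern European’ ancestry.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


rng = random.Random(42)  # fixed seed so the run is repeatable

# Invented mean fractions of a "Southern European" ancestry per language group.
GROUP_MEANS = {
    "Italic": 0.70,
    "Basque": 0.75,
    "Germanic": 0.20,
    "Balto-Slavic": 0.15,
}

ancestry = []
speaks_italic_or_basque = []
for group, mean in GROUP_MEANS.items():
    for _ in range(200):
        # Individual ancestry fraction, clipped to the valid range [0, 1].
        fraction = min(1.0, max(0.0, rng.gauss(mean, 0.10)))
        ancestry.append(fraction)
        speaks_italic_or_basque.append(
            1.0 if group in ("Italic", "Basque") else 0.0
        )

r = pearson(ancestry, speaks_italic_or_basque)
print(f"r = {r:.3f}")
```

The coefficient comes out high even though Basque is not an Indo-European language at all: it reflects where people live and whom they descend from, not the genealogy of what they speak.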
I have made changes to some of the old blogs I had, like this one, and I have merged two of them (from carlosquiles.com and indo-european.info) in this domain, indo-european.eu, to begin blogging about anthropological questions regarding Proto-Indo-Europeans and their language.
This blog was used years ago as my personal dialectic training site in English, mostly filled with controversial topics, and while I hope to keep some form of discussion, I want to turn it into a more pragmatic blog for news and reports on Indo-European studies.
Indo-European.info will be used as a collaborative Wiki website for this model, to include supplementary information from published papers, such as results of individual and group admixture analyses, archaeological information on individual samples, and also mtDNA. To collaborate, users will have to request an account first (it will be a closed community), and those with important contributions will be added as authors of the following editions of the paper.