Migrations painted by Irish and Scottish genetic clusters, and their relationship with British and European ones


Interesting and related publications, now appearing in pairs…

1. The Irish DNA Atlas: Revealing Fine-Scale Population Structure and History within Ireland, by Gilbert et al., in Scientific Reports (2017).


The extent of population structure within Ireland is largely unknown, as is the impact of historical migrations. Here we illustrate fine-scale genetic structure across Ireland that follows geographic boundaries and present evidence of admixture events into Ireland. Utilising the ‘Irish DNA Atlas’, a cohort (n = 194) of Irish individuals with four generations of ancestry linked to specific regions in Ireland, in combination with 2,039 individuals from the Peoples of the British Isles dataset, we show that the Irish population can be divided in 10 distinct geographically stratified genetic clusters; seven of ‘Gaelic’ Irish ancestry, and three of shared Irish-British ancestry. In addition we observe a major genetic barrier to the north of Ireland in Ulster. Using a reference of 6,760 European individuals and two ancient Irish genomes, we demonstrate high levels of North-West French-like and West Norwegian-like ancestry within Ireland. We show that our ‘Gaelic’ Irish clusters present homogenous levels of ancient Irish ancestries. We additionally detect admixture events that provide evidence of Norse-Viking gene flow into Ireland, and reflect the Ulster Plantations. Our work informs both on Irish history, as well as the study of Mendelian and complex disease genetics involving populations of Irish ancestry.

The European ancestry profiles of 30 Irish and British clusters. (a) The total ancestry contribution summarised by majority European country of origin to each of the 30 Irish and British clusters. (b) (left) The ancestry contributions of 19 European clusters that donate at least 2.5% ancestry to any one Irish or British cluster. (right) The geographic distribution of the 19 European clusters, shown as the proportion of individuals in each European region belonging to each of the 19 European clusters. The proportion of individuals from each European region not a member of the 19 European clusters is shown in grey. Total numbers of individuals from each region are shown in white text. Not all Europeans included in the analysis were phenotyped geographically. The figure was generated in the statistical software language R, version 3.4.1, using various packages. The map of Europe was sourced from the R software package “mapdata” (https://CRAN.R-project.org/package=mapdata).

2. New preprint on BioRxiv, Insular Celtic population structure and genomic footprints of migration, by Byrne, Martiniano et al. (2017).


Previous studies of the genetic landscape of Ireland have suggested homogeneity, with population substructure undetectable using single-marker methods. Here we have harnessed the haplotype-based method fineSTRUCTURE in an Irish genome-wide SNP dataset, identifying 23 discrete genetic clusters which segregate with geographical provenance. Cluster diversity is pronounced in the west of Ireland but reduced in the east where older structure has been eroded by historical migrations. Accordingly, when populations from the neighbouring island of Britain are included, a west-east cline of Celtic-British ancestry is revealed along with a particularly striking correlation between haplotypes and geography across both islands. A strong relationship is revealed between subsets of Northern Irish and Scottish populations, where discordant genetic and geographic affinities reflect major migrations in recent centuries. Additionally, Irish genetic proximity of all Scottish samples likely reflects older strata of communication across the narrowest inter-island crossing. Using GLOBETROTTER we detected Irish admixture signals from Britain and Europe and estimated dates for events consistent with the historical migrations of the Norse-Vikings, the Anglo-Normans and the British Plantations. The influence of the former is greater than previously estimated from Y chromosome haplotypes. In all, we paint a new picture of the genetic landscape of Ireland, revealing structure which should be considered in the design of studies examining rare genetic variation and its association with traits.

Here are some interesting excerpts (emphasis mine):

Population structure in Ireland

The geographical distribution of this deep subdivision of Leinster resembles pre-Norman territorial boundaries which divided Ireland into fifths (cúige), with north Leinster a kingdom of its own known as Meath (Mide) [15]. However interpreted, the firm implication of the observed clustering is that despite its previously reported homogeneity, the modern Irish population exhibits genetic structure that is subtly but detectably affected by ancestral population structure conferred by geographical distance and, possibly, ancestral social structure.

ChromoPainter PC1 demonstrated high diversity amongst clusters from the west coast, which may be attributed to longstanding residual ancient (possibly Celtic) structure in regions largely unaffected by historical migration. Alternatively, genetic clusters may also have diverged as a consequence of differential influence from outside populations. This diversity between western genetic clusters cannot be explained in terms of geographic distance alone.

In contrast to the west of Ireland, eastern individuals exhibited relative homogeneity; (…) The overall pattern of western diversity and eastern homogeneity in Ireland may be explained by increased gene flow and migration into and across the east coast of Ireland from geographically proximal regions, the closest of which is the neighbouring island of Britain.

Analysis of variance of the British admixture component in cluster groups showed a significant difference (p < 2×10⁻¹⁶), indicating a role for British Anglo-Saxon admixture in distinguishing clusters, and ChromoPainter PC2 was correlated with the British component (p < 2×10⁻¹⁶), explaining approximately 43% of the variance. PC2 therefore captures an east to west Anglo-Celtic cline in Irish ancestry. This may explain the relative eastern homogeneity observed in Ireland, which could be a result of the greater English influence in Leinster and the Pale during the period of British rule in Ireland following the Norman invasion, or simply geographic proximity of the Irish east coast to Britain. Notably, the Ulster cluster group harboured an exceptionally large proportion of the British component (Fig 1D and 1E), undoubtedly reflecting the strong influence of the Ulster Plantations in the 17th century and its residual effect on the ethnically British population that has remained.

Fine-grained population structure in Ireland. (A) fineSTRUCTURE clustering dendrogram for 1,035 Irish individuals. Twenty-three clusters are defined, which are combined into cluster groups for clusters that are neighbouring in the dendrogram, overlapping in principal component space (B) and sampled from regions that are geographically contiguous. Details for each cluster in the dendrogram are provided in S1 Fig. (B) Principal components analysis (PCA) of haplotypic similarity, based on ChromoPainter coancestry matrix for Irish individuals. Points are coloured according to cluster groups defined in (A); the median location of each cluster group is plotted. (C) Map of Ireland showing the sampling location for a subset of 588 individuals analysed in (A) and (B), coloured by cluster group. Points have been randomly jittered within a radius of 5 km to preserve anonymity. Precise sampling location for 44 Northern Irish individuals from the People of the British Isles dataset was unknown; these individuals are plotted geometrically in a circle. (D) “British admixture component” (ADMIXTURE estimates; k=2) for Irish cluster groups. This component has the largest contribution in ancient Anglo-Saxons and the SEE cluster. (E) Linear regression of principal component 2 (B) versus British admixture component (r² = 0.43; p < 2×10⁻¹⁶). Points are coloured by cluster group. (Standard error for ADMIXTURE point estimates presented in S11 Fig.)

On the genetic structure of the British Isles

The genetic substructure observed in Ireland is consistent with long term geographic diversification of Celtic populations and the continuity shown between modern and Early Bronze Age Irish people

Clusters representing Celtic populations harbouring less Anglo-Saxon influence separate out above and below SEE on PC4. Notably, northern Irish clusters (NLU), Scottish (NISC, SSC and NSC), Cumbria (CUM) and North Wales (NWA) all separate out at a mutually similar level, representing northern Celtic populations. The southern Celtic populations Cornwall (COR), south Wales (SWA) and south Munster (SMN) also separate out on similar levels, indicating some shared haplotypic variation between geographically proximate Celtic populations across both Islands. It is notable that after the split of the ancestrally divergent Orkney, successive ChromoPainter PCs describe diversity in British populations where “Anglo-saxonization” was repelled [22]. PC3 is dominated by Welsh variation, while PC4 in turn splits North and South Wales significantly, placing south Wales adjacent to Cornwall and north Wales at the other extreme with Cumbria, all enclaves where Brittonic languages persisted.

In an interesting symmetry, many Northern Irish samples clustered strongly with southern Scottish and northern English samples, defining the Northern Irish/Cumbrian/Scottish (NICS) cluster group. More generally, by modelling Irish genomes as a linear mixture of haplotypes from British clusters, we found that Scottish and northern English samples donated more haplotypes to clusters in the north of Ireland than to the south, reflecting an overall correlation between Scottish/north English contribution and ChromoPainter PC1 position in Fig 1 (Linear regression: p < 2×10⁻¹⁶, r² = 0.24).
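
The idea of modelling a genome as a "linear mixture of haplotypes" from donor clusters can be illustrated with a toy calculation. The sketch below is not the authors' pipeline: it assumes that each cluster is summarised by an invented "copying profile", and it fits non-negative mixture weights with projected gradient descent, which is one simple way of solving a non-negative least squares problem. The donor labels, the profiles and the function name `fit_mixture` are all hypothetical.

```python
# Toy illustration only: every profile below is an invented number,
# not data from the preprint.

def fit_mixture(donors, recipient, step=1.0, iterations=500):
    """Fit non-negative mixture weights by projected gradient descent.

    donors: list of donor profiles (lists of floats of equal length)
    recipient: profile to be explained as a weighted sum of the donors
    Returns one non-negative weight per donor.
    The default step size suits this small, well-scaled toy example;
    other inputs may need a smaller step.
    """
    k = len(donors)
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # Current fitted profile: weighted sum of the donor profiles.
        fitted = [sum(w * d[i] for w, d in zip(weights, donors))
                  for i in range(len(recipient))]
        residual = [f - r for f, r in zip(fitted, recipient)]
        # Gradient of 0.5 * ||fitted - recipient||^2 for each weight.
        gradient = [sum(res * value for res, value in zip(residual, d))
                    for d in donors]
        # Gradient step, then clip at zero to keep the weights non-negative.
        weights = [max(0.0, w - step * g) for w, g in zip(weights, gradient)]
    return weights


# Hypothetical donor clusters and their made-up profiles.
labels = ["donor_A", "donor_B", "donor_C"]
donors = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
]
# Recipient built as 0.6 * donor_A + 0.3 * donor_B + 0.1 * donor_C.
recipient = [0.46, 0.28, 0.16, 0.10]

weights = fit_mixture(donors, recipient)
for label, weight in zip(labels, weights):
    print(label, round(weight, 3))
```

The fitted weights play the role of the "contribution" of each donor group; comparing such weights across recipient clusters is the kind of summary that the north-south contrast in the excerpt refers to.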

North to south variation in Ireland and Britain are therefore not independent, reflecting major gene flow between the north of Ireland and Scotland (Fig 5) which resonates with three layers of historical contacts. First, the presence of individuals with strong Irish affinity among the third generation PoBI Scottish sample can be plausibly attributed to major economic migration from Ireland in the 19th and 20th centuries [6]. Second, the large proportion of Northern Irish who retain genomes indistinguishable from those sampled in Scotland accords with the major settlements (including the Ulster Plantation) of mainly Scottish farmers following the 16th Century Elizabethan conquest of Ireland which led to these forming the majority of the Ulster population. Third, the suspected Irish colonisation of Scotland through the Dál Riata maritime kingdom, which expanded across Ulster and the west coast of Scotland in the 6th and 7th centuries, linked to the introduction and spread of Gaelic languages [3]. Such a migratory event could work to homogenise older layers of Scottish population structure, in a similar manner as noted on the east coasts of Britain and Ireland. Earlier communications and movements across the Irish Sea are also likely, which at its narrowest point separates Ireland from Scotland by approximately 20 km.

Genes mirror geography in the British Isles. (A) fineSTRUCTURE clustering dendrogram for combined Irish and British data. Data principally split into Irish and British groups before subdividing into a total of 50 distinct clusters, which are combined into cluster groups for clusters that formed clades in the dendrogram, overlapped in principal component space (B) and were sampled from regions that are geographically contiguous. Names and labels follow the geographical provenance for the majority of data within the cluster group. Details for each cluster in the dendrogram are provided in S2 Fig. (B) Principal component analysis (PCA) of haplotypic similarity based on the ChromoPainter coancestry matrix, coloured by cluster group with their median locations labelled. We have chosen to present PC1 versus PC4 here as these components capture new information regarding correlation between haplotypic variation across Britain and Ireland and geography, while PC2 and PC3 (Fig 4) capture previously reported splitting for Orkney and Wales from Britain [7]. A map of Ireland and Britain is shown for comparison, coloured by sampling regions for cluster groups, the boundaries of which are defined by the Nomenclature of Territorial Units for Statistics (NUTS 2010), with some regions combined. Sampling regions are coloured by the cluster group with the majority presence in the sampling region; some sampling regions have significant minority cluster group representations as well, for example the Northern Ireland sampling region (UKN0; NUTS 2010) is majorly explained by the NICS cluster group but also has significant representation from the NLU cluster group. The PCA plot has been rotated clockwise by 5 degrees to highlight its similarity with the geographical map of the Ireland and Britain. NI, Northern Ireland; PC, principal component. 
Cluster groups that share names with groups from Fig 1 (NLU; SMN; CLN; CNN) have an average of 80% of their samples shared with the initial cluster groups. © EuroGeographics for the map and administrative boundaries, note some boundaries have been subsumed or modified to better reflect sampling regions.

Genomic footprints of migration into Ireland

Quite interestingly, it is haplogroups, and not admixture, that define the oldest migration layers into Ireland. Without evidence from paternal Y-DNA lineages we would probably not be able to ascertain the oldest migrations and the languages brought by migrants, including Celtic languages:

Of all the European populations considered, ancestral influence in Irish genomes was best represented by modern Scandinavians and northern Europeans, with a significant single-date one-source admixture event overlapping the historical period of the Norse-Viking settlements in Ireland (p < 0.01; fit quality FQB > 0.985; Fig 6). (…) This suggests a contribution of historical Viking settlement to the contemporary Irish genome and contrasts with previous estimates of Viking ancestry in Ireland based on Y chromosome haplotypes, which have been very low [25]. The modern-day paucity of Norse-Viking Y chromosome haplotypes may be a consequence of drift with the small patrilineal effective population size, or could have social origins with Norse males having less influence after their military defeat and demise as an identifiable community in the 11th century, with persistence of the autosomal signal through recombination.

European admixture date estimates in northwest Ulster did not overlap the Viking age but did include the Norman period and the Plantations

The genetic legacies of the populations of Ireland and Britain are therefore extensively intertwined and, unlike admixture from northern Europe, too complex to model with GLOBETROTTER.

All-Ireland GLOBETROTTER admixture date estimates for European and British surrogate admixing populations. A summary of the date estimates and 95% confidence intervals for inferred admixture events into Ireland from European and British admixing sources is shown in (A), with ancestry proportion estimates for each historical source population for the two events and example coancestry curves shown in (B). In the coancestry curves Relative joint probability estimates the pairwise probability that two haplotype chunks separated by a given genetic distance come from the two modeled source populations respectively (i.e. FRA(8) and NOR-SG); if a single admixture event occurred, these curves are expected to decay exponentially at a rate corresponding to the number of generations since the event. The green fitted line describes this GLOBETROTTER fitted exponential decay for the coancestry curve. If the sources come from the same ancestral group the slope of this curve will be negative (as with FRA(8) vs FRA(8)), while a positive slope indicates that sources come from different admixing groups (as with FRA(8) vs NOR-SG). The adjacent bar plot shows the inferred genetic composition of the historical admixing sources modelled as a mixture of the sampled modern populations. A European admixture event was estimated by GLOBETROTTER corresponding to the historical record of the Viking age, with major contributions from sources similar to modern Scandinavians and northern Europeans and minor contributions from southern European-like sources. For admixture date estimates from British-like sources the influence of the Norman settlement and the Plantations could not be disentangled, with the point estimate date for admixture falling between these two eras and GLOBETROTTER unable to adequately resolve source and proportion details of admixture event (fit quality FQB < 0.985). The relative noise of the coancestry curves reflects the uncertainty of the British event. 
Cluster labels (for the European clustering dendrogram, see S4 Fig; for the PoBI clustering dendrogram, see S3 Fig): FRA(8), France cluster 8; NOR-SG, Norway, with significant minor representations from Sweden and Germany; SE_ENG, southeast England; N_SCOT(4) northern Scotland cluster 4.
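
The caption's statement that coancestry curves "decay exponentially at a rate corresponding to the number of generations since the event" can be made concrete with a small sketch. It is a simplified illustration, not GLOBETROTTER itself: it assumes a noise-free curve with the baseline already subtracted, genetic distance measured in Morgans, and a generation time of 28 years. The synthetic curve, the sampling year and the function names are all assumptions made for the example.

```python
import math

def fit_decay_rate(distances_morgans, curve_values):
    """Estimate g in y = a * exp(-g * d) by least squares on log(y).

    Assumes all curve values are positive (baseline already removed).
    """
    xs = distances_morgans
    ys = [math.log(v) for v in curve_values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # The slope of log(y) against distance is -g.
    return -sxy / sxx

def generations_to_year(generations, sample_year=2000, years_per_generation=28):
    """Convert generations before sampling into a calendar year.

    Both defaults are assumptions of this sketch, not values from the paper.
    """
    return sample_year - generations * years_per_generation

# Synthetic curve generated with a known answer (40 generations),
# sampled every 0.1 cM from 0.1 cM to 5 cM.
true_generations = 40
distances = [i * 0.001 for i in range(1, 51)]
curve = [0.3 * math.exp(-true_generations * d) for d in distances]

estimated_generations = fit_decay_rate(distances, curve)
estimated_year = round(generations_to_year(estimated_generations))
print(round(estimated_generations, 3), estimated_year)
```

Real curves are noisy, which is why the caption speaks of confidence intervals and of a British event whose date could not be resolved: the noisier the curve, the less well a single decay rate is pinned down.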

Another study that strengthens the need to ascertain haplogroup-admixture differences between Yamna/Bell Beaker and Sredni Stog/Corded Ware.

Text and images from preprint article under a CC-BY-NC-ND 4.0 International license.

Featured image, from the article in Scientific Reports: The clustering of individuals with Irish and British ancestry based solely on genetics. Shown are 30 clusters identified by fineStructure from 2,103 Irish and British individuals. The dendrogram (left) shows the tree of clusters inferred by fineStructure and the map (right) shows the geographic origin of 192 Atlas Irish individuals and 1,611 British individuals from the Peoples of the British Isles (PoBI) cohort, labelled according to fineStructure cluster membership. Individuals are placed at the average latitude and longitude of either their great-grandparental (Atlas) or grandparental (PoBI) birthplaces. Great Britain is separated into England, Scotland, and Wales. The island of Ireland is split into the four Provinces; Ulster, Connacht, Leinster, and Munster. The outline of Britain was sourced from Global Administrative Areas (2012). GADM database of Global Administrative Areas, version 2.0. www.gadm.org. The outline of Ireland was sourced from Open Street Map Ireland, Copyright OpenStreetMap Contributors, (https://www.openstreetmap.ie/) – data available under the Open Database Licence. The figure was plotted in the statistical software language R, version 3.4.1, with various packages.

Review article about Ancient Genomics, by Pontus Skoglund and Iain Mathieson


A preprint article by two of the most prolific researchers in Human Ancestry is out, and they request feedback: Ancient genomics: a new view into human prehistory and evolution, by Skoglund and Mathieson (2017). Right now, it is downloadable on Dropbox.


The first decade of ancient genomics has revolutionized the study of human prehistory and evolution. We review new insights based on ancient genomic data, including greatly increased resolution of the timing and structure of the out-of-Africa event, the diversification of present-day non-African populations, and the earliest expansions of those populations into Eurasia and America. Prehistoric genomes now document patterns of population continuity and change on every inhabited continent–in particular the effect of agricultural expansions in Africa, Europe and Oceania–and record a history of natural selection that shapes present-day phenotypic diversity. Despite these advances, much remains unknown, in particular about the genomic histories of Asia–the most populous continent, and Africa–the continent that contains the most genetic diversity. Ancient genomes from these and other regions, integrated with a growing understanding of the genomic basis of human phenotypic diversity, will be in focus during the next decade of research in the field.

The paper may be highly recommended as an introduction for anyone interested in the field of Human Ancestry in general.

However, its short summary of steppe ancestry expansion (where the Corded Ware culture predominates) is still reminiscent of the infamous “Yamnaya -> Corded Ware -> Bell Beaker” model set forth by the 2015 Nature articles on the subject, and Kristiansen’s Indo-European Corded Ware theory.

Here is an excerpt (emphasis mine):

The next substantial change is closely related to ancestry that by around 5000 BP extended over a region of more than 2000 miles of the Eurasian steppe, including in individuals associated with the Yamnaya Cultural Complex in far-eastern Europe (1; 38) and with the Afanasievo culture in the central Asian Altai mountains (1). This “steppe” ancestry is itself a mixture between ancestry that is related to Mesolithic hunter-gatherers of eastern Europe and ancestry that is related to both present-day populations (38) and Mesolithic hunter-gatherers (46) from the Caucasus mountains, and also to the populations of Neolithic (11), and Copper Age (56) Iran. Steppe ancestry appeared in southeastern Europe by 6000 BP (72), northeastern Europe around 5000 BP (47) and central Europe at the time of the Corded Ware Complex around 4600 BP (1; 38). These dates are reasonably tight constraints, because in each case there is no evidence of steppe ancestry in individuals immediately preceding these dates (47; 72). Gene flow on the steppe was extensive and bidirectional, as shown by the eastward flow of Anatolian Neolithic ancestry– reaching well into central Eurasia by the time of the Andronovo culture ~3500 BP (1)–and the westward flow of East Asian ancestry–found in individuals associated with the Iron Age Scythian culture close to the Black Sea ~2500 BP (143).

Copper and Bronze Age population movements (14; 78 Martiniano, 2017 #8761; 85; 112), as well as later movements in the Iron Age and Historical period (70; 119) further distributed steppe ancestry around Europe. Present-day western European populations can be modeled as mixtures of these three ancestry components (Mesolithic hunter-gatherer, Anatolian Neolithic and Steppe) (38; 57). In eastern Europe, further shifts in ancestry are the result of additional or distinct gene flow from Anatolia throughout the Neolithic and Bronze Age in the Aegean (42; 51; 55; 72; 87), and gene flow from Siberian-related populations in Finland and the Baltic region (38). East-west gene flow also brought new ancestry–related to populations from Copper Age Iran–to the Levant during the Copper and Bronze ages (39; 56).

The geographic structure of these population transformations gave rise to population structure of present-day Europe. For example Anatolian Neolithic ancestry is highest in southern European populations like Sardinians, and lowest in northern European populations (38). Steppe ancestry is at high frequency in north-central Europeans and low in the south. Isolation-by-distance may have contributed to these patterns to some extent, but the contribution must have been small. In much of Europe, extreme population discontinuity was the norm.

Featured image: from the article, “Major Holocene population movements and expansions that have been demonstrated using ancient DNA.”


Correlation does not mean causation: the damage of the ‘Yamnaya ancestral component’, and the ‘Future American’ hypothesis


Human ancestry can only help solve anthropological questions by using all anthropological disciplines involved. I have said that many times in this blog.

Correlation does not mean causation

Really, it does not.

You might think the tenet ‘correlation does not mean causation’ must be evident at this point in Statistics, and that it must also be so for all those using statistical methods in their research. But it is sadly not so. A lot of researchers just look for correlation, and derive conclusions – without even an initial sound hypothesis to be tested… You can judge for yourself, e.g. by reading the many instances of this complaint about recent publications in Biomedical and Social Sciences on the interesting blog Statistical Modeling, Causal Inference, and Social Science.

In anthropological questions regarding Indo-European studies there is an added handicap: refusing to take correlation for causation also means – to avoid at least the most obvious confounders – taking into account the multiple linguistic and archaeological data that are available right now to explain the expansion of Indo-European languages.
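
A minimal sketch of what a confounder does, with entirely invented numbers: two quantities that never influence each other end up strongly correlated because both depend on a third, hidden one, and the correlation disappears once that third variable is taken into account. The variable names are hypothetical placeholders, not measurements from any study.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def residuals(ys, zs):
    """Remove the linear effect of zs from ys (simple regression residuals)."""
    n = len(ys)
    mean_y, mean_z = sum(ys) / n, sum(zs) / n
    slope = (sum((z - mean_z) * (y - mean_y) for z, y in zip(zs, ys))
             / sum((z - mean_z) ** 2 for z in zs))
    return [(y - mean_y) - slope * (z - mean_z) for y, z in zip(ys, zs)]

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hidden common cause; neither trait is generated from the other.
confounder = [rng.gauss(0, 1) for _ in range(5000)]
trait_a = [z + rng.gauss(0, 0.5) for z in confounder]
trait_b = [z + rng.gauss(0, 0.5) for z in confounder]

raw_correlation = pearson(trait_a, trait_b)
adjusted_correlation = pearson(residuals(trait_a, confounder),
                               residuals(trait_b, confounder))
print(round(raw_correlation, 2), round(adjusted_correlation, 2))
```

The raw correlation is high, while the adjusted one hovers around zero. In the setting of this post, the "hidden variable" is exactly what archaeology and linguistics supply, and it is what gets lost when an ancestry component is read directly as a migration route.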

You might also believe that international researchers in Human Evolutionary Biology – after all, this is essentially a biomedical discipline – are acquainted with statistical methods and their problems when applied to their field. And that scientific journals – and especially those with the highest impact factors, like Nature, Science, or PNAS – have professional, careful reviewers who would never accept papers that equate correlation with causation, especially when Social Sciences are involved (because this alone might make errors grow exponentially…). Sadly, this is obviously not so, either.


The ‘Yamnaya component’ concept and its damage

From Allentoft et al. (2015), emphasis is mine:

Both studies [Haak et al. (2015) and this one] found a genetic affinity between samples from a central European culture known as Corded Ware, which existed from around 2500 bc, and samples from the earlier Yamnaya steppe culture. This similarity between distant populations is best explained by a substantial westward expansion of the Yamnaya or their close relatives into central Europe (Fig. 1b). Such an expansion is consistent with the steppe hypothesis, which argues that Corded Ware cultures were a conduit for the dispersal of Indo-European languages into Europe.

More interesting than these vague words – and the short, almost invisible suggestion that Yamna may not be exactly the population behind Corded Ware peoples – are the maps that illustrated their risky hypothesis in Nature: they called it “steppe hypothesis”, like that (in general terms), as if everyone defending a steppe origin for Proto-Indo-European would support such a model, when they actually referred to the specific hypothesis of one of their authors (Kristiansen), one of the few archaeologists who keep Gimbutas’ concept of the ‘Kurgan peoples’ alive, based on the Corded Ware culture:

Allentoft et al. (2015): “They conclude that the Corded Ware culture of central Europe had ancestry from the Yamnaya. Allentoft et al. also show that the Afanasievo culture to the east is related to the Yamnaya, and that the Sintashta and Andronovo cultures had ancestry from the Corded Ware. Arrows indicate migrations — those from the Corded Ware reflect the evidence that people of this archaeological culture (or their relatives) were responsible for the spreading of Indo-European languages. All coloured boundaries are approximate.”

In many publications that followed, the trend has been to reproduce this graphical model, by asserting (or implying) that Bell Beaker peoples were the result of subsequent Corded Ware migrations, and indeed that Corded Ware peoples migrated from the Yamna culture, and were thus the vector of expansion for Indo-European languages in Europe.

All of this is being proven wrong, as I predicted: see Mathieson et al. (2017) and Olalde et al. (2017) for recently studied samples with ‘steppe component’, older than (and unrelated to) the Yamna culture. However, no retraction (or correction, whatever) has been published to date about the concept of the ‘Yamnaya ancestry expansion’, and its consequences.

We shall see then just a rather surreptitious shift in terminology from ‘Yamnaya’ to ‘steppe’ component, to adapt to the new data – i.e. some damage control while the ship of ‘Yamnaya ancestry’ capsizes – but little else. “Earlier ‘Yamnaya ancestry’, you say? Just, you know, let’s call it ‘steppe ancestry’ and shift the expansion of Indo-European languages to one or two thousand years earlier, and done!”

The damage of this post-truth genetics is already done: we will see the unending distribution on the Internet in general, and on social networks in particular, of these grandiose conclusions, of far-fetched Indo-European migration models that include the Corded Ware culture, of simplistic maps with apparently harmless ‘arrows of migration’ (like the above) representing fictional population movements suggesting nonexistent dialectal branches.

You might be one of those sceptics wary of so many boring statistical rules: “But it’s safe reasoning: Yamnaya samples have an ‘ancestral component’ that is found elevated in Corded Ware samples, and less so in Bell Beaker samples, and PCA showed a similar result…so the migration model Yamnaya -> Corded Ware -> Bell Beaker is a priori correct, right?”

The ‘Future American’ hypothesis

Let me illustrate this attractive “Correlation = Causation” argument, using it to solve the problem of Future American languages.

Suppose we live in a future post-apocalyptic world ca. 3500 AD, with no surviving historical records before 3000 AD. None. Just investigation of cultures and their relationship by Archaeology, proto-languages reconstructed and language families identified by Linguistics, etc.

We have thus Future Germanic and Future Romance as the only language families spoken in Future Western Europe and in the Future Americas, in a distribution similar to the present day*, and we have certain somehow related archaeologically-defined cultures on both sides of the Atlantic, like Briton, Iberian, Norman, or Lowlandish, although their distribution remains partly undefined in time and space.

* If you are really curious about this scenario, you can read about the potential evolution of a Future North-American language.

But what languages did the ancestors of Future Americans speak, and who spread them? That question remains far from being settled by our future researchers, in spite of the solidest linguistic and migration models (talking mainly about Briton and Iberian cultures): too many authorities out there questioning them, fighting to impose their own pet theories.

Suddenly, the newly developed field of Human Ancestry comes to save the day. So let’s say we have this map of ancient samples recovered (dated from, say, the 6th to the 18th century AD), and our study is centered on the newly described “Western European” component (a precise combination of, say, WHG+steppe), which peaks in early samples from the Low Lands – hence we call it, quite daringly, “Lowlandic component”.

Our group is keen to demonstrate that the ancient Lowlandic culture described in Archaeology (marked especially by the worldwide distribution of tulips among other traits) is the origin of Western European and American languages… Now, let’s reach conclusions about migrations in the Middle Ages!

‘Future American’ hypothesis. Migration routes in Western Europe and the Americas during the Middle Ages, based on the ‘Lowlandic component’.

PCA shows that South-West European samples cluster closely to some North-West European samples, and that some late South American samples available cluster at some distance from North American samples – nearer to a native component represented by two individuals with 0% Lowlandic ancestry and a different cluster in PCA. And some North-American samples cluster quite closely to North-West European samples.

Based on the decrease in ‘Lowlandic component’ in the different samples and on PCA, we conclude that Lowlandic peoples (“or their close relatives”) must have migrated at the same time to North America and South America (or potentially from North America to South America?), as well as to western, central, and northern Europe. Both migration events must have happened roughly at the same time, in part because both distinct language families appear in a north-south distribution, and Proto-Lowlandic must be (according to Genetics) the ancestor of both Proto-Future-Germanic and Proto-Future-Romance.

That makes a lot of sense! A huge Lowlandic pressure for migration, you see. Push-pull mechanisms and stuff. A Lowlandic Empire probably (scattered remains are found everywhere)! And, judging by the presence of the ‘Lowlandic component’ in Future East Europe from the Elbe to the Vistula, maybe Lowlandic peoples spread Proto-Slavic, too! We can even date the common Lowlandic-Slavic proto-language this way! So many groundbreaking conclusions!

Future scholars supporting the Lowlandic homeland are on fire; they can’t get enough of publishing papers on the subject. “Two different Future American language families with cultural origins in Britain and Iberia, my ass! Because genetics.”

And don’t forget the future people of haplogroup R1b-U106 and high Lowlandic component: Wow, they are the heirs of those who expanded Future Germanic and Future Romance languages everywhere, aren’t they? How proud they must be. And who wouldn’t want to have these tall, blond, blue-eyed Lowlanders as their forefathers? Personalised genetic analysis is selling like crazy: “let’s know our Lowlandic percentage!”. Everyone is happy, colourful maps with lots of arrows and shit…

But – your future you might ask in awe, seeing that this doesn’t sound quite right, based on your basic archaeological and linguistic knowledge:

  • What about specific models of migration proposed to date? The most solid ones, not just any one that seems to fit?
  • What about the dialectal classification of languages? The mainstream ones, not those that are compatible with this interpretation?
  • What about archaeological cultures to which individual samples belonged?
  • What about the actual dates of each sample? And how this date relates to the state of the culture to which it belongs?
  • What about the haplogroups, and the actual subclade of each haplogroup?
  • What about the territories, cultures, and dates not sampled, could they change this interpretation in light of known archaeological models?
  • And what about the actual origin of that ancestral component they so frivolously named? Did it really appear ex nihilo in the Low Lands, and expand from there?

“Who cares! This new data is sooo coool… And it proves what we wanted, what a coincidence! And it’s numbers, mate! Numbers don’t lie.”

No, numbers don’t lie. But people do.

Correlation is fun, isn’t it?



Human ancestry solves language questions? New admixture citebait


A paper in Scientific Reports, Human ancestry correlates with language and reveals that race is not an objective genomic classifier, by Baker, Rotimi, and Shriner (2017).

Abstract (emphasis mine):

Genetic and archaeological studies have established a sub-Saharan African origin for anatomically modern humans with subsequent migrations out of Africa. Using the largest multi-locus data set known to date, we investigated genetic differentiation of early modern humans, human admixture and migration events, and relationships among ancestries and language groups. We compiled publicly available genome-wide genotype data on 5,966 individuals from 282 global samples, representing 30 primary language families. The best evidence supports 21 ancestries that delineate genetic structure of present-day human populations. Independent of self-identified ethno-linguistic labels, the vast majority (97.3%) of individuals have mixed ancestry, with evidence of multiple ancestries in 96.8% of samples and on all continents. The data indicate that continents, ethno-linguistic groups, races, ethnicities, and individuals all show substantial ancestral heterogeneity. We estimated correlation coefficients ranging from 0.522 to 0.962 between ancestries and language families or branches. Ancestry data support the grouping of Kwadi-Khoe, Kx’a, and Tuu languages, support the exclusion of Omotic languages from the Afroasiatic language family, and do not support the proposed Dené-Yeniseian language family as a genetically valid grouping. Ancestry data yield insight into a deeper past than linguistic data can, while linguistic data provide clarity to ancestry data.

Regarding European ancestry:

Southern European ancestry correlates with both Italic and Basque speakers (r = 0.764, p = 6.34 × 10^−49). Northern European ancestry correlates with Germanic and Balto-Slavic branches of the Indo-European language family as well as Finno-Ugric and Mordvinic languages of the Uralic family (r = 0.672, p = 4.67 × 10^−34). Italic, Germanic, and Balto-Slavic are all branches of the Indo-European language family, while the correlation with languages of the Uralic family is consistent with an ancient migration event from Northern Asia into Northern Europe. Kalash ancestry is widely spread but is the majority ancestry only in the Kalash people (Table S3). The Kalasha language is classified within the Indo-Iranian branch of the Indo-European language family.

Sure, admixture analysis came to save the day. Yet again. Now it’s not just Archaeology related to language anymore, it’s Linguistics; all modern languages and their classification, no less. Because why the hell not? Why would anyone study languages, history, archaeology, etc. when you can run certain algorithms on free datasets of modern populations to explain everything?

What I am criticising here, as always, is not the study per se, its methods (PCA, the use of Admixture or any other tools), or its results, which might be quite interesting – even regarding the origin or position of certain languages (or more precisely their speakers) within their linguistic groups; it’s the many broad, unsupported, striking conclusions (read the article if you want to see more wishful thinking).

This is obviously simplistic citebait that benefits only journals and authors (and is therefore tacitly encouraged), but not knowledge, because it is not supported by any linguistic or archaeological data or expertise.
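To make concrete what a figure like r = 0.764 does and does not tell us, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the toy populations, the `northern_ancestry` values, and the `speaks_family_a` labels are hypothetical and have nothing to do with the data set of the paper. Both variables are built from latitude alone, so any correlation between them comes from shared geography, not from one explaining the other.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy populations, ordered from north to south (made-up values).
latitude = [60, 58, 55, 52, 50, 47, 44, 41, 39, 37]

# Made-up share of a "northern" ancestry component: a function of latitude only.
northern_ancestry = [0.1 + 0.03 * (lat - 37) for lat in latitude]

# 1 = speaks a language of hypothetical family A, 0 = family B.
# Also a function of latitude only (an arbitrary border at 50 degrees north).
speaks_family_a = [1 if lat >= 50 else 0 for lat in latitude]

r = pearson_r(northern_ancestry, speaks_family_a)
print(f"r = {r:.3f}")
```

With these invented numbers the coefficient comes out high (about 0.88), even though by construction neither column says anything about who spread which language, or when. That is a question for archaeological and linguistic data.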

Is anyone with a minimum knowledge of languages, or general anthropology, actually reviewing these articles?


Featured image: Ancestry analysis of the global data set, from the article.

Neolithic and Bronze Age Basque-speaking Iberians resisted invaders from the steppe


Good clickbait, right? I have received reports about this new paper in Google Now the whole weekend, and their descriptions are getting worse each day.

The original title of the article published in PLOS Genetics (already known by its preprint in BioRxiv) was The population genomics of archaeological transition in west Iberia: Investigation of ancient substructure using imputation and haplotype-based methods, by Martiniano et al. (2017).

Maybe the title was not attractive enough, so they sent the following summary, entitled “Bronze Age Iberia received fewer Steppe invaders than the rest of Europe” (also in Phys.org). From their article, the only short reference to the linguistic situation of Iberia (as an attempt to sum up potential consequences of the genetic data obtained):

Iberia is unusual in harbouring a surviving pre-Indo-European language, Euskera, and inscription evidence at the dawn of history suggests that pre-Indo-European speech prevailed over a majority of its eastern territory with Celtic-related language emerging in the west. Our results showing that predominantly Anatolian-derived ancestry in the Neolithic extended to the Atlantic edge strengthen the suggestion that Euskara is unlikely to be a Mesolithic remnant. Also our observed definite, but limited, Bronze Age influx resonates with the incomplete Indo-European linguistic conversion on the peninsula, although there are subsequent genetic changes in Iberia and defining a horizon for language shift is not yet possible. This contrasts with northern Europe which both lacks evidence for earlier language strata and experienced a more profound Bronze Age migration.

Judging from the article, more precise summaries of potential consequences would have been “Proto-Basque and Proto-Iberian peoples derived from Neolithic farmers, not Mesolithic or Palaeolithic hunter-gatherers”, or “incomplete Indo-European linguistic conversion of the Iberian Peninsula” – both aspects, by the way, are already known. That would have been quite unromantic, though.

Their carefully selected title has been unsurprisingly distorted at least as “Ancient DNA Reveals Why the Iberian Peninsula Is So Unique”, and “Ancient Iberians resisted Steppe invasions better than the rest of Europe 6,000 years ago”.

So I thought, what the hell, let’s go with the tide. Using the published dataset, I have also helped reconstruct the original phenotype of Bronze Age Iberians, and this is what our Iberian ancestors probably looked like:

Typical Iberian village during the Steppe invasion, according to my phenotype study of Martiniano et al. (2017). Notice typical invaders to the right.

And, by the way, they spoke Basque, the oldest language. Period.

Now, for those new to the article, we already knew that there is less “steppe admixture” in Iberian samples from southern Portugal after the time of the East Bell Beaker expansion.

(A) PCA estimated from the CHROMOPAINTER coancestry matrix of 67 ancient samples ranging from the Paleolithic to the Anglo-Saxon period. The samples belonging to each one of the 19 populations identified with fineSTRUCTURE are connected by a dashed line. Samples are placed geographically in 3 panels (with random jitter for visual purposes): (B) Hunter-gatherers; (C) Neolithic Farmers (including Ötzi) and (D) Copper Age to Anglo-Saxon samples. The Portuguese Bronze Age samples (D, labelled in red) formed a distinct population (Portuguese_BronzeAge), while the Middle and Late Neolithic samples from Portugal clustered with Spanish, Irish and Scandinavian Neolithic farmers, which are termed “Atlantic_Neolithic” (C, in green).

However, there is also a clear discontinuity in Neolithic Y-DNA haplogroups (to R1b-P312 haplogroups). That obviously means a male-driven invasion from the North-West Indo-European-speaking Bell Beaker culture – which in turn did not have much “steppe admixture” compared to other north-eastern cultures, like the Corded Ware culture, probably unrelated to Indo-European languages.

Summary of the samples sequenced in the present study.

As always, trying to equate steppe or Yamna admixture with invasion or language is plainly wrong. Doing it with few samples, and with the wrong assumptions of what “steppe admixture” means, well…

Proto-Basque and Proto-Iberian no doubt survived the Indo-European Bell Beaker migrations, but if Y-DNA lineages were replaced already by the Bronze Age in southern Portugal, there is little reason to support an increased “resistance” of Iberians to Bell Beaker invaders compared to other marginal regions of Europe (relative to the core Yamna expansion in eastern and central Europe).

As you know, Aquitanian (the likely ancestor of Basque) and Iberian were just two of the many non-Indo-European languages spoken in Europe at the dawn of historical records, so to speak about Iberia as radically different from Italy, Greece, Northern Britain, Scandinavia, or Eastern Europe is reminiscent of the racism (or, more exactly, xenophobia) that is hidden behind the romantic views certain people have of their genetic ancestry.

Some groups formed by a majority of R1b-DF27 lineages, now prevalent in Iberia, probably spoke Iberian languages during the Iron Age in northern and eastern Iberia, before their acculturation during the expansion of Celtic-speaking peoples, and later during the expansion of Rome, when most of them eventually spoke Latin. In Mediaeval times, these lineages probably expanded Romance languages southward during the Reconquista.

Before speaking Iberian languages, R1b-DF27 lineages (or older R1b-P312) were probably Indo-European speakers who expanded with the Bell Beaker culture from the lower Danube (a culture in turn created by the interaction of Yamna with Proto-Bell Beaker cultures). They probably adopted the native Proto-Basque and Proto-Iberian languages (or possibly the ancestor of both) near the Pyrenees, either by acculturation, or because some elite invaders successfully expanded their Y-DNA haplogroup over the general population for generations.

Maybe some kind of genetic bottleneck happened, that expanded previously not widespread lineages, as with N1c subclades in Finland.

There is nothing wrong with hypothetical models of ancient genetic prehistory: there are still too many potential scenarios for the expansion of haplogroup R1b-DF27 in Iberia. But, please, stop supporting romantic pictures of ethnolinguistic continuity for modern populations. It’s embarrassing.

Featured image from Wikipedia, and Pinterest, with copyright from Albert Uderzo and publisher company Hachette.

Images from the article, licensed CC-by-sa, as all articles from PLOS.

How to do modern phylogeography: Relationships between clans and genetic kin explain cultural similarities over vast distances


A preprint has been published in BioRxiv, Relationships between clans and genetic kin explain cultural similarities over vast distances: the case of Yakutia, by Zvenigorosky et al. (2017).


Archaeological studies sample ancient human populations one site at a time, often limited to a fraction of the regions and periods occupied by a given group. While this bias is known and discussed in the literature, few model populations span areas as large and unforgiving as the Yakuts of Eastern Siberia. We systematically surveyed 31,000 square kilometres in the Sakha Republic (Yakutia) and completed the archaeological study of 174 frozen graves, assembled between the 15th and the 19th century. We analysed genetic data (autosomal genotypes, Y-chromosome haplotypes and mitochondrial haplotypes) for all ancient subjects and confronted it to the study of 190 modern subjects from the same area and the same population. Ancient familial links and paternal clan were identified between graves up to 1500 km apart and we provide new data concerning the origins of the contemporary Yakut population and demonstrate that cultural similarities in the past were linked to (i) the expansion of specific paternal clans, (ii) preferential marriage among the elites and (iii) funeral choices that could constitute a bias in any ancient population study.

Even if you are not interested in the cultural and anthropological evolution of this Turkic-speaking people of the Russian Far Eastern region, the method used is an excellent example of how to use archaeology and genetics (especially Y-DNA and mtDNA data) to obtain meaningful results when investigating ancient populations.

For quite some time, probably since the first renowned admixture analyses of ancient DNA samples were published, we have been living under the impression that phylogeography, or simply archaeogenetics as it was called back in the day, is not needed.

Cavalli-Sforza’s assertion that the study of modern populations could offer a clear picture of past population movements is now considered wrong, and the study of Y-DNA and mtDNA haplogroups is today mostly disregarded as of secondary importance, even among geneticists. Whole-genome investigations (and especially admixture analyses) have been leading the new wave of overconfidence in genetic results, tightly joined with ignorance of their shortcomings (and with commercial interests based on desires of ethnic identification), and haplogroups are usually just reported alongside other, not entirely meaningful aspects of ancient DNA analyses.

While it is undeniable that admixture analyses are offering quite interesting results, they must be carefully balanced against known archaeological and linguistic knowledge. Phylogeography – and especially Y-DNA haplogroup assessment – is quite interesting in investigating kinship and clans in patrilocal communities – i.e. most communities in prehistoric and historic periods, unless proven otherwise.

Luckily enough, there are those researchers who still strive to obtain meaningful information from haplotypes. The article referenced in this post is quite interesting due to its phylogeographic method’s applicability to ancient cultures and peoples.
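The basic idea of linking graves through paternal lineages can be sketched in a few lines of Python. This is only a toy illustration under loud assumptions, not the procedure of the preprint: the `graves` records, their marker values, and their coordinates are invented, and treating an exact match of a short Y-chromosome haplotype as "same paternal clan" is a deliberate simplification.

```python
from collections import defaultdict
from itertools import combinations
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points (mean Earth radius)."""
    earth_radius_km = 6371.0
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * earth_radius_km * asin(sqrt(a))


# Hypothetical records: (grave id, Y haplotype as a tuple of marker values,
# latitude, longitude). All values are made up for illustration.
graves = [
    ("grave_1", (14, 23, 15), 62.0, 129.7),
    ("grave_2", (14, 23, 15), 67.5, 133.4),
    ("grave_3", (13, 24, 16), 62.1, 129.9),
    ("grave_4", (14, 23, 15), 63.8, 121.6),
]


def paternal_groups(records):
    """Group graves by identical Y haplotype; keep groups with 2+ members."""
    groups = defaultdict(list)
    for grave_id, haplotype, lat, lon in records:
        groups[haplotype].append((grave_id, lat, lon))
    return {hap: members for hap, members in groups.items() if len(members) > 1}


def max_span_km(members):
    """Largest distance between any two graves of one group."""
    return max(
        haversine_km(a[1], a[2], b[1], b[2]) for a, b in combinations(members, 2)
    )


for haplotype, members in paternal_groups(graves).items():
    ids = [grave_id for grave_id, _, _ in members]
    print(haplotype, ids, f"max span: {max_span_km(members):.0f} km")
```

Real kinship inference would also need autosomal data, dating, and the archaeological context of each grave, which is precisely why the combined approach of the preprint is worth reading.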

When some geneticists look at simplistic prehistoric maps, like those depicting Yamna, Afanasevo, Corded Ware, and Bell Beaker cultures together, they forget that 1) cultural regions are selected more or less arbitrarily (we only have certain scattered sites for each of these cultures); 2) economic or population contacts are difficult to ascertain and to represent graphically; and 3) time periods for archaeological sites are important – in fact, they are probably THE most important aspect in assessing how accurately a map (and its “arrows” of migration or exchange) represents reality.

A careful, detailed study like this one, if applied to the Pontic-Caspian steppe, would probably reveal how R1b subclades dominated steppe clans, beginning at least during the Suvorovo-Novodanilovka expansion to the west, and certainly representing the vast majority of lineages during the internal expansion in the Early Yamna period and its later expansion east and west of the steppe…

Featured image from the article, summing up Geography, Archaeology, and Genetics of Yakutia – including Y-DNA and mtDNA haplogroups from ancient populations.