More Celts of hg. R1b, more Afanasievo ancestry, more maps


Interesting recent developments:

Celts and hg. R1b


Recent paper (behind paywall), Multi-scale archaeogenetic study of two French Iron Age communities: From internal social- to broad-scale population dynamics, by Fischer et al. J Archaeol Sci (2019).

In it, Fischer and colleagues update their previous data for the Y-DNA of Gauls from the Urville-Nacqueville necropolis, Normandy (ca. 300-100 BC), with 8 samples of hg. R, at least 5 of them R1b. They also report new data from the Gallic cemetery at Gurgy ‘Les Noisats’, Southern Paris Basin (ca. 120-80 BC), with 19 samples of hg. R, at least 13 of them R1b.

In both cases, it is likely that the members of each community belonged to the same paternal lineages, which fits the patrilocal residence rules and patrilineality described for Gallic groups, and is also supported by the different maternal gene pools.

The interesting data would be whether these individuals were of hg. R1b-L21, hence mainly local lineages later replaced or displaced to the west, or – a priori much more likely – of some R1b-U152 and/or R1b-DF27 subclades from Central Europe that became less and less prevalent as Celts expanded into more isolated regions south of the Pyrenees and into the British Isles. Such information is lacking in the paper, probably due to the poor coverage of the samples.

Y-DNA haplogroups in Europe during the Early Iron Age. See full map.

Other Celts

As for early Celts, we already have:

Celtiberians from the Basque Country (one of hg. I2a) and likely Celtic genetic influence in north-east Iberia (all R1b), where Iberian languages spread later, showing that Celts expanded from some place in Central Europe, probably already with the Urnfield culture (ca. 1300 BC on).

Two Hallstatt samples from Bylany, Bohemia (ca. 836-780 BC), by Damgaard et al. Nature (2018), one of them of hg. R1b-U152.

Photo and diagram of burial HÜ-I/8, Mitterkirchen, Oberösterreich, Leskovar 1998.

Another Hallstatt HaC/D1 sample from Mitterkirchen, Austria (ca. 850-650/600 BC), by Kiesslich et al. (2012), with predicted hg. G2a (see Athey’s haplogroup prediction).

One sample of early La Tène culture A from Putzenfeld am Dürrnberg, Hallein, Austria (ca. 450–380 BC), by Kiesslich et al. (2012), with predicted hg. R1b (see Athey’s haplogroup prediction).

NOTE. On the potential unreliability of haplogroup prediction with Whit Athey’s haplogroup predictor, see e.g. Zhang et al. (2017).

Photo and diagram of Burial 376, Putzenfeld, Dürrnberg bei Hallein, Moser 2007.

Three Britons from Hinxton, South Cambridgeshire (ca. 170 BC – AD 80) from Schiffels et al. (2016), two of them of local hg. R1b-S461.

Indirectly, data of Vikings by Margaryan et al. (2019) from the British Isles and beyond show hg. R1b associated with modern British-like ancestry, also linked to early “Picts”, hence likely associated with Britons even after the Anglo-Saxon settlement. Supporting both (1) my recent prediction of hg. R1b-M167 expanding with Celts and (2) the reason for its presence among modern Scandinavians, is the finding of the first ancient sample of this subclade (VK166) among the Vikings of St John’s College Oxford, associated with the ‘St Brice’s Day Massacre’ (see Margaryan et al. 2019 supplementary materials).

The R1b-M167 sample shows 23.5% British-like ancestry, hence autosomally closer to other local samples (and related to the likely Picts from Orkney) than to some of his deceased partners at the site. Other samples with sizeable British-like ancestry include VK177 (32.6%, hg. R1b-U152), VK173 (33.3%, hg. I2a1b1a), or VK150 (25.6%, hg. I2a1b1a), while typical Germanic subclades like I1 or R1b-U106 – which may be associated with Anglo-Saxons, too – tend to show less.

Y-DNA haplogroups in Europe during the Late Iron Age. See full map.

I remember some commenter asking recently what would happen to the theory of Proto-Indo-European-speaking R1b-rich Yamnaya culture if Celts expanded with hg. R1a, because there was only one sample of hg. R1b and one (possibly) of hg. G2a from Hallstatt. As it turns out, they were mostly R1b. However, the increasingly frequent obsession with searching for specific haplogroups and ancestry during the Iron Age and the Middle Ages is weird, even as a desperate attempt, because:

  1. it is evident that the more recent the ancient DNA samples are, the more they are going to resemble modern populations of the same area, so ancient DNA would become essentially useless;
  2. cultures from the early Iron Age onward (and even earlier) were based on increasingly complex sociopolitical systems everywhere, which is reflected in haplogroup and ancestry variability, e.g. among Balts, East Germanic peoples, Slavs (of hg. E1b-V13, I2a-L621), or Tocharians.

In fact, even the finding of hg. R1b among Celts of central and western Europe during the Iron Age is rather unenlightening, because more specific subclades and information on ancestry changes are needed to reach any meaningful conclusion as to migration vs. acculturation waves of expanding Celtic languages, which spread into areas that were mostly Indo-European-speaking since the Bell Beaker expansion.

Afanasievo ancestry in Asia

Wang and colleagues continue to publish interesting analyses, now in the preprint Inland-coastal bifurcation of southern East Asians revealed by Hmong-Mien genomic history, by Xia et al. bioRxiv (2019).

Interesting excerpt (emphasis mine):

Although the Devil’s Cave ancestry is generally the predominant East Asian lineage in North Asia and adjacent areas, there is an intriguing discrepancy between the eastern [Korean, Japanese, Tungusic (except northernmost Oroqen), and Mongolic (except westernmost Kalmyk) speakers] and the western part [West Xiōngnú (~2,150 BP), Tiānshān Hun (~1,500 BP), Turkic-speaking Karakhanid (~1,000 BP) and Tuva, and Kalmyk]. Whereas the East Asian ancestry of populations in the western part has entirely belonged to the Devil’s Cave lineage till now, populations in the eastern part have received the genomic influence from an Amis-related lineage (17.4–52.1%) posterior to the presence of the Devil’s Cave population roughly in the same region (~7,600 BP) [12]. Analogically, archaeological record has documented the transmission of wet-rice cultivation from coastal China (Shāndōng and/or Liáoníng Peninsula) to Northeast Asia, notably the Korean Peninsula (Mumun pottery period, since ~3,500 BP) and the Japanese archipelago (Yayoi period, since ~2,900 BP) [2]. Especially for Japanese, the Austronesian-related linguistic influence in Japanese may indicate a potential contact between the Proto-Japonic speakers and population(s) affiliating to the coastal lineage. Thus, our results imply that a southern-East-Asian-related lineage could be arguably associated with the dispersal of wet-rice agriculture in Northeast Asia at least to some extent.

Spatial and temporal distribution of ancestries in East Asians. Reference populations and corresponding hypothesized ancestral populations: (1) Devil’s Cave (~7,600 BP), the northern East Asian lineage; (2) Amis, the southern East Asian lineage (= AHM + AAA + AAN); (3) Hòabìnhian (~7,900 BP), a lineage related to Andamanese and indigenous hunter-gatherer of MSEA; (4) Kolyma (~9,800 BP), “Ancient Palaeo-Siberians”; (5) Afanasievo (~4,800 BP), steppe ancestry; (6) Namazga (~5,200 BP), the lineage of Chalcolithic Central Asian. Here, we report the best-fitting results of qpAdm based on following criteria: (1) a feasible p-value (> 0.05), (2) feasible proportions of all the ancestral components (mean > 0 and standard error < mean), and (3) with the highest p-value if meeting previous conditions.
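The three selection criteria in the caption are simple enough to write down. Here is a minimal sketch in Python of how such a filter might look; the `Fit` class, the function names and all the numbers are hypothetical illustrations of mine, not the authors' code or their actual qpAdm output.

```python
from dataclasses import dataclass


@dataclass
class Fit:
    """One candidate qpAdm model (hypothetical container, not a qpAdm format)."""
    sources: tuple      # names of the ancestral sources in the model
    p_value: float      # tail probability reported for the model
    means: tuple        # estimated ancestry proportions, one per source
    std_errors: tuple   # standard errors of those proportions


def is_feasible(fit, alpha=0.05):
    """Criteria (1) and (2): p-value above alpha, every mean > 0 and SE < mean."""
    if fit.p_value <= alpha:
        return False
    return all(m > 0 and se < m for m, se in zip(fit.means, fit.std_errors))


def best_fit(fits, alpha=0.05):
    """Criterion (3): among feasible models, keep the one with the highest p-value."""
    feasible = [f for f in fits if is_feasible(f, alpha)]
    return max(feasible, key=lambda f: f.p_value) if feasible else None


# Invented numbers, only to exercise each rejection rule once.
candidates = [
    Fit(("Devils_Cave", "Afanasievo"), 0.31, (0.82, 0.18), (0.03, 0.04)),
    # highest p-value, but one proportion is negative -> rejected
    Fit(("Devils_Cave", "Amis", "Afanasievo"), 0.62, (0.70, 0.33, -0.03), (0.05, 0.06, 0.04)),
    # p-value below 0.05 -> rejected
    Fit(("Devils_Cave", "Namazga"), 0.02, (0.75, 0.25), (0.04, 0.04)),
    # standard error larger than the mean of the second source -> rejected
    Fit(("Devils_Cave", "Amis"), 0.12, (0.90, 0.10), (0.05, 0.12)),
]

chosen = best_fit(candidates)
print(chosen.sources if chosen else "no feasible model")
```

Note how the model with the highest p-value is discarded because of a negative proportion: a rule like this picks a single "winner" and hides how close the alternatives were.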

In this case, the study doesn’t compare Steppe_MLBA, though, so the findings of Afanasievo ancestry have to be taken with a pinch of salt. They are, however, compared to Namazga, so “Steppe ancestry” is there. Taking into account the limited amount of Yamnaya-like ancestry that could have reached the Tian Shan area with the Srubna-Andronovo horizon in the Iron Age (see here), and the amount of Yamnaya-like ancestry that appears in some of these populations, it seems unlikely that this much “Steppe ancestry” derives only from Steppe_MLBA. The most likely explanation is therefore that Turkic peoples had contacts with populations of both Afanasievo (first) and Corded Ware-derived ancestry (later) to the west of Lake Baikal.

(1) The simplification of ancestral components into A vs. B vs. C… (when many were already mixed), and (2) the simplistic selection of one OR the other in the preferred models (such as those published for Yamnaya or Corded Ware), both common strategies in population genomics, pose evident problems when assessing the actual gene flow from some populations into others.

Also, it seems that when the “Steppe”-like contribution is small, both Yamnaya and Corded Ware ancestry will be good fits in admixed populations of Central Asia, due to the presence of peoples of EHG-like (viz. West Siberia HG) and/or CHG-like (viz. Namazga) ancestry in the area. Unless and until these problems are addressed, there is little that can be confidently said about the history of Yamnaya vs. Corded Ware admixture among Asian peoples.

Maps, maps, and more maps

As you have probably noticed if you follow this blog regularly, I have been experimenting with GIS software in the past month or so, trying to map haplogroups and ancestry components (see examples for Vikings, Corded Ware, and Yamnaya). My idea was to show the (pre)historical evolution of ancestry and haplogroups coupled with the atlas of prehistoric migrations, but I have to understand first what I can do with GIS statistical tools.

My latest exercise has been to map modern haplogroup distribution (now added to the main menu above) using data from the latest available reports. While there have been no great surprises – beyond the sometimes awful display of data by some papers – I think it is becoming clearer with each new publication how wrong it was for geneticists to target initially those populations considered “isolated” – hence subject to strong founder effects – to extrapolate language relationships. For example:

  • The mapping of R1b-M269, in particular basal subclades, corresponds nicely with the Indo-European expansions.
  • There is no clear relationship of R1b, not even R1b-DF27 (especially basal subclades), with Basques. There is no apparent relationship between the distribution of R1b-M269 and some mythical non-Indo-European “Old Europeans”, like Etruscans or Caucasian speakers, either.
  • Basal R1a-M417 shows an interesting distribution, as do maps of basal Z282 and Z93 subclades, despite the evident late bottlenecks and acculturation among Slavs.
  • The distribution of hg. N1a-VL29 (and other N1a-L392 subclades) is clearly dissociated from Uralic peoples, and their expansion in the whole Baltic Sea during the Iron Age doesn’t seem to be related to any specific linguistic expansion.
    Modern distribution of haplogroup N1a-VL29. See full map.
  • Even the most recent association in Post et al. (2019) with hg. N1a-Z1639 – due to the lack of relationship of Uralic with N1a-VL29 – seems like a stretch, seeing how it probably expanded from the Kola Peninsula and the East Urals, and neither the Lovozero Ware nor forest hunter-fishers of the Cis- and Trans-Urals regions were Uralic-speaking cultures.
  • The current prevalence of hg. R1b-M73 supports its likely expansion with Turkic-speaking peoples.
  • The distribution of haplogroup R1b-V88 in Africa doesn’t look like it was a mere founder effect in Chadic peoples – although they certainly underwent a bottleneck under it.
  • The distributions of R1a-M420 (xM198) and hg. R1b-M343 (possibly not fully depicted in the east) seem to be related to expansions close to the Caucasus, supporting once more their location in Eastern Europe / West Siberia during the Mesolithic.
  • The mapping of E1b-V13 and I-M170 (I haven’t yet divided it into subclades) is particularly relevant for the recent eastward expansion of early Slavic peoples.

All in all, modern haplogroup distribution might have been used to ascertain prehistoric language movements even in the 2000s. It was the obsession with (and the wrong assumptions about) the “purity” of certain populations – say, Basques or Finns – that caused many of the interpretation problems and circular reasoning we are still seeing today.

I have also updated maps of Y-chromosome haplogroups reported for ancient samples in Europe and/or West Eurasia for the Early Eneolithic, Early Chalcolithic, Late Chalcolithic, Early Bronze Age, Middle Bronze Age, Late Bronze Age, Early Iron Age, Late Iron Age, Antiquity, and Middle Ages.

Haplogroup inference

I have also tried Yleaf v.2 – which seems like an improvement over the infamous v.1 – to test some samples that hobbyists and/or geneticists have reported differently in the past. I have posted the results in this ancient DNA haplogroup page. It doesn’t mean that the inferences I obtain are the correct ones, but now you have yet another source to compare.

Not many surprises here, either:

  • M15-1 and M012, two Proto-Tocharians from Shirenzigou, are of hg. R1b-PH155, not R1b-M269.
  • I0124, the Samara HG, is of hg. R1b-P297, but uncertain for both R1b-M73 and R1b-M269.
  • I0122, the Khvalynsk chieftain, is of hg. R1b-V1636.
  • I2181, the Smyadovo outlier of poor coverage, is possibly of hg. R, and could be of hg. R1b-M269, but could also be even non-P.
  • I6561 from Alexandria is probably of hg. R1a-M417, likely R1a-Z645, maybe R1a-Z93, but can’t be known beyond that, which is more in line with the TMRCA of R1a subclades and the radiocarbon date of the sample.
  • I0443, the Yamnaya individual (supposedly Pre-R1b-L51) at Lopatino II is R1b-M269, negative for R1b-L51. Nothing beyond that.

You can ask me to try mapping more data or to test the haplogroup of more samples, provided you give me a proper link to the relevant data, they are interesting for the subject of this blog…and I have the time to do it.


Reproductive success among ancient Icelanders stratified by ancestry


New paper (behind paywall), Ancient genomes from Iceland reveal the making of a human population, by Ebenesersdóttir et al. Science (2018) 360(6392):1028-1032.

Abstract and relevant excerpts (emphasis mine):

Opportunities to directly study the founding of a human population and its subsequent evolutionary history are rare. Using genome sequence data from 27 ancient Icelanders, we demonstrate that they are a combination of Norse, Gaelic, and admixed individuals. We further show that these ancient Icelanders are markedly more similar to their source populations in Scandinavia and the British-Irish Isles than to contemporary Icelanders, who have been shaped by 1100 years of extensive genetic drift. Finally, we report evidence of unequal contributions from the ancient founders to the contemporary Icelandic gene pool. These results provide detailed insights into the making of a human population that has proven extraordinarily useful for the discovery of genotype-phenotype associations.

Shared drift of ancient and contemporary Icelanders. (A) Scatterplot of D-statistics reflecting Iceland-specific drift. To aid interpretation, we included values for ancient British-Irish Islanders and a subset of contemporary individuals (who were correspondingly removed from the reference populations).

We estimated the mean Norse ancestry of the settlement population (24 pre-Christians and one early Christian) as 0.566 [95% confidence interval (CI) 0.431–0.702], with a nonsignificant difference between males (0.579) and females (0.521). Applying the same ADMIXTURE analysis to each of the 916 contemporary Icelanders, we obtained a mean Norse ancestry of 0.704 (95% CI 0.699–0.709). Although not statistically significant (t test p = 0.058), this difference is suggestive. A similar difference of Norse ancestry was observed with a frequency-based weighted least-squares admixture estimator (16), 0.625 [Mean squared error (MSE) = 0.083] versus 0.74 (MSE = 0.0037). Finally, the D-statistic test D(YRI, X; Gaelic, Norse) also revealed a greater affinity between Norse and contemporary Icelanders (0.0004, 95% CI 0.00008–0.00072) than between Norse and ancient Icelanders (−0.0002, 95% CI −0.00056–0.00015). This observation raises the possibility that reproductive success among the earliest Icelanders was stratified by ancestry, as genetic drift alone is unlikely to systematically alter ancestry at thousands of independent loci (fig. S10). We note that many settlers of Gaelic ancestry came to Iceland as slaves, whose survival and freedom to reproduce is likely to have been constrained (17). Some shift in ancestry must also be due to later immigration from Denmark, which maintained colonial control over Iceland from 1380 to 1944 (for example, in 1930 there were 745 Danes out of a total population of 108,629 in Iceland) (18).
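For readers unfamiliar with the D-statistic quoted above, here is a minimal sketch in Python of how D(W, X; Y, Z) can be computed from per-SNP allele frequencies. It assumes the usual frequency-based definition (as in Patterson et al. 2012); the function name and the toy frequencies are mine and purely illustrative. Real analyses run over hundreds of thousands of SNPs and use a block jackknife for the confidence intervals, which this sketch omits.

```python
def d_statistic(w, x, y, z):
    """D(W, X; Y, Z) from four equal-length lists of allele frequencies.

    With W an outgroup, positive values mean X shares more drift with Z
    than with Y under this sign convention (an assumption of the sketch).
    """
    numerator = 0.0
    denominator = 0.0
    for pw, px, py, pz in zip(w, x, y, z):
        numerator += (pw - px) * (py - pz)
        denominator += (pw + px - 2 * pw * px) * (py + pz - 2 * py * pz)
    return numerator / denominator


# Toy allele frequencies at three SNPs (invented, not real data).
outgroup = [0.1, 0.2, 0.3]   # stands in for YRI
gaelic = [0.5, 0.5, 0.5]
norse = [0.7, 0.3, 0.6]

# Two extreme test populations: one identical to Norse, one identical to Gaelic.
d_norse_like = d_statistic(outgroup, norse, gaelic, norse)
d_gaelic_like = d_statistic(outgroup, gaelic, gaelic, norse)

print(d_norse_like > d_gaelic_like)
```

In the excerpt, contemporary Icelanders take the place of the more "Norse-like" test population and ancient Icelanders that of the less Norse-like one, although the real differences are of course much smaller than in this toy case.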

Shared drift of ancient and contemporary Icelanders. (B) Estimated Norse, Gaelic, and Icelandic ancestry for ancient Icelanders using ADMIXTURE in supervised mode.

Five pre-Christian Icelanders (VDP-A5, DAV-A9, NNM-A1, SVK-A1 and TGS-A1) fall just outside the space occupied by contemporary Norse in Fig. 3A. That these individuals show a stronger signal of drift shared with contemporary Icelanders is also apparent in the results of ADMIXTURE, run in supervised mode with three contemporary reference populations (Norse, Gaelic, and Icelandic) (Fig. 3B). The correlation between the proportion of Icelandic ancestry from this analysis and PC1 in Fig. 2A is |r| = 0.913. (…)

(…) as the five ancient Icelanders fall well within the cluster of contemporary Scandinavians (Fig. 3C), we conclude that they, or close relatives, likely contributed more to the contemporary Icelandic gene pool than the other pre-Christians. We note that this observation is consistent with the inference that settlers of Norse ancestry had greater reproductive success than those of Gaelic ancestry.

Haplogroup data, from the paper. Image modified by me, with those close to Gaelic and British/Irish samples (see above Scatterplot of D-statistics and ADMIXTURE data) marked in fluorescent: yellow closer to Gaelic, green less close.

Ancient Icelanders show a clear relation with the typically Norse Y-DNA distribution: I1 / R1a-Z284 / R1b-U106.

  • Among R1a, the picture is uniformly of R1a-Z284 (at least five of the seven reported).
  • There are six samples of I1, with great variation in subclades.
  • Among R1b-L51 subclades (ten samples), there are U106 (at least one sample), L21 (three samples), and another P312 (L238); see above the relationship with those clustering closely with Gaelic samples, marked in fluorescent, which is compatible with Gaelic settlers (predominantly of R1b-L21 lineages) coming to Iceland as slaves.

Probably not much of a surprise, coming from Norse speakers, but they are another relevant reference for comparison with samples of East Germanic tribes, when they appear.

Also, the first reported case of Klinefelter syndrome (XXY) in ancient DNA (sample ID is YGS-B2).


Network analysis of the Viking Age in Ireland as portrayed in Cogadh Gaedhel re Gallaibh


Open access Network analysis of the Viking Age in Ireland as portrayed in Cogadh Gaedhel re Gallaibh, by Yose et al., Royal Society Open Science (2018).


Cogadh Gaedhel re Gallaibh (‘The War of the Gaedhil with the Gaill’) is a medieval Irish text, telling how an army under the leadership of Brian Boru challenged Viking invaders and their allies in Ireland, culminating with the Battle of Clontarf in 1014. Brian’s victory is widely remembered for breaking Viking power in Ireland, although much modern scholarship disputes traditional perceptions. Instead of an international conflict between Irish and Viking, interpretations based on revisionist scholarship consider it a domestic feud or civil war. Counter-revisionists challenge this view and a long-standing and lively debate continues. Here, we introduce quantitative measures to the discussions. We present statistical analyses of network data embedded in the text to position its sets of interactions on a spectrum from the domestic to the international. This delivers a picture that lies between antipodal traditional and revisionist extremes; hostilities recorded in the text are mostly between Irish and Viking—but internal conflict forms a significant proportion of the negative interactions too.
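The kind of measure described in the abstract can be illustrated with a toy example. The following Python sketch takes a signed network of characters and computes what share of the hostile edges is Irish–Viking versus internal to each side; the character labels, edges and function name are hypothetical placeholders of mine, not data or code from the paper.

```python
from collections import Counter


def hostile_breakdown(groups, edges):
    """Share of hostile edges per pair type, e.g. 'Irish-Viking'.

    groups: dict mapping character name -> group label
    edges:  iterable of (name_a, name_b, sign), sign 'hostile' or 'friendly'
    """
    counts = Counter()
    for a, b, sign in edges:
        if sign != "hostile":
            continue
        # Sort the two labels so that A-B and B-A count as the same pair type.
        pair_type = "-".join(sorted((groups[a], groups[b])))
        counts[pair_type] += 1
    total = sum(counts.values())
    return {pair_type: n / total for pair_type, n in counts.items()} if total else {}


# Placeholder characters (not taken from the text).
groups = {"I1": "Irish", "I2": "Irish", "I3": "Irish", "V1": "Viking", "V2": "Viking"}
edges = [
    ("I1", "V1", "hostile"),
    ("I1", "V2", "hostile"),
    ("I2", "V1", "hostile"),
    ("I1", "I3", "hostile"),   # internal (domestic) conflict
    ("I1", "I2", "friendly"),
    ("V1", "V2", "friendly"),
]

shares = hostile_breakdown(groups, edges)
print(shares)
```

A network where nearly all hostile edges are Irish–Viking would sit at the "international" end of the spectrum, and one dominated by Irish–Irish hostility at the "domestic" end.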

The entire Cogadh network of interacting characters. Characters identified as Irish are represented by green nodes and those identified as Vikings are in blue. Other characters are in grey. Edges between pairs of Irish nodes are also coloured green while those between Viking pairs are blue. Edges linking Irish to Viking nodes are brown and the remaining edges are grey.


Migrations painted by Irish and Scottish genetic clusters, and their relationship with British and European ones


Interesting and related publications, now appearing in pairs…

1. The Irish DNA Atlas: Revealing Fine-Scale Population Structure and History within Ireland, by Gilbert et al., in Scientific Reports (2017).


The extent of population structure within Ireland is largely unknown, as is the impact of historical migrations. Here we illustrate fine-scale genetic structure across Ireland that follows geographic boundaries and present evidence of admixture events into Ireland. Utilising the ‘Irish DNA Atlas’, a cohort (n = 194) of Irish individuals with four generations of ancestry linked to specific regions in Ireland, in combination with 2,039 individuals from the Peoples of the British Isles dataset, we show that the Irish population can be divided in 10 distinct geographically stratified genetic clusters; seven of ‘Gaelic’ Irish ancestry, and three of shared Irish-British ancestry. In addition we observe a major genetic barrier to the north of Ireland in Ulster. Using a reference of 6,760 European individuals and two ancient Irish genomes, we demonstrate high levels of North-West French-like and West Norwegian-like ancestry within Ireland. We show that our ‘Gaelic’ Irish clusters present homogenous levels of ancient Irish ancestries. We additionally detect admixture events that provide evidence of Norse-Viking gene flow into Ireland, and reflect the Ulster Plantations. Our work informs both on Irish history, as well as the study of Mendelian and complex disease genetics involving populations of Irish ancestry.

The European ancestry profiles of 30 Irish and British clusters. (a) The total ancestry contribution summarised by majority European country of origin to each of the 30 Irish and British clusters. (b) (left) The ancestry contributions of 19 European clusters that donate at least 2.5% ancestry to any one Irish or British cluster. (right) The geographic distribution of the 19 European clusters, shown as the proportion of individuals in each European region belonging to each of the 19 European clusters. The proportion of individuals from each European region not a member of the 19 European clusters is shown in grey. Total numbers of individuals from each region are shown in white text. Not all Europeans included in the analysis were phenotyped geographically. The figure was generated in the statistical software language R [46], version 3.4.1, using various packages. The map of Europe was sourced from the R software package “mapdata”.

2. New preprint on bioRxiv, Insular Celtic population structure and genomic footprints of migration, by Byrne, Martiniano et al. (2017).


Previous studies of the genetic landscape of Ireland have suggested homogeneity, with population substructure undetectable using single-marker methods. Here we have harnessed the haplotype-based method fineSTRUCTURE in an Irish genome-wide SNP dataset, identifying 23 discrete genetic clusters which segregate with geographical provenance. Cluster diversity is pronounced in the west of Ireland but reduced in the east where older structure has been eroded by historical migrations. Accordingly, when populations from the neighbouring island of Britain are included, a west-east cline of Celtic-British ancestry is revealed along with a particularly striking correlation between haplotypes and geography across both islands. A strong relationship is revealed between subsets of Northern Irish and Scottish populations, where discordant genetic and geographic affinities reflect major migrations in recent centuries. Additionally, Irish genetic proximity of all Scottish samples likely reflects older strata of communication across the narrowest inter-island crossing. Using GLOBETROTTER we detected Irish admixture signals from Britain and Europe and estimated dates for events consistent with the historical migrations of the Norse-Vikings, the Anglo-Normans and the British Plantations. The influence of the former is greater than previously estimated from Y chromosome haplotypes. In all, we paint a new picture of the genetic landscape of Ireland, revealing structure which should be considered in the design of studies examining rare genetic variation and its association with traits.

Here are some interesting excerpts (emphasis mine):

Population structure in Ireland

The geographical distribution of this deep subdivision of Leinster resembles pre-Norman territorial boundaries which divided Ireland into fifths (cúige), with north Leinster a kingdom of its own known as Meath (Mide) [15]. However interpreted, the firm implication of the observed clustering is that despite its previously reported homogeneity, the modern Irish population exhibits genetic structure that is subtly but detectably affected by ancestral population structure conferred by geographical distance and, possibly, ancestral social structure.

ChromoPainter PC1 demonstrated high diversity amongst clusters from the west coast, which may be attributed to longstanding residual ancient (possibly Celtic) structure in regions largely unaffected by historical migration. Alternatively, genetic clusters may also have diverged as a consequence of differential influence from outside populations. This diversity between western genetic clusters cannot be explained in terms of geographic distance alone.

In contrast to the west of Ireland, eastern individuals exhibited relative homogeneity; (…) The overall pattern of western diversity and eastern homogeneity in Ireland may be explained by increased gene flow and migration into and across the east coast of Ireland from geographically proximal regions, the closest of which is the neighbouring island of Britain.

Analysis of variance of the British admixture component in cluster groups showed a significant difference (p < 2×10⁻¹⁶), indicating a role for British Anglo-Saxon admixture in distinguishing clusters, and ChromoPainter PC2 was correlated with the British component (p < 2×10⁻¹⁶), explaining approximately 43% of the variance. PC2 therefore captures an east to west Anglo-Celtic cline in Irish ancestry. This may explain the relative eastern homogeneity observed in Ireland, which could be a result of the greater English influence in Leinster and the Pale during the period of British rule in Ireland following the Norman invasion, or simply geographic proximity of the Irish east coast to Britain. Notably, the Ulster cluster group harboured an exceptionally large proportion of the British component (Fig 1D and 1E), undoubtedly reflecting the strong influence of the Ulster Plantations in the 17th century and its residual effect on the ethnically British population that has remained.

Fine-grained population structure in Ireland. (A) fineSTRUCTURE clustering dendrogram for 1,035 Irish individuals. Twenty-three clusters are defined, which are combined into cluster groups for clusters that are neighbouring in the dendrogram, overlapping in principal component space (B) and sampled from regions that are geographically contiguous. Details for each cluster in the dendrogram are provided in S1 Fig. (B) Principal components analysis (PCA) of haplotypic similarity, based on ChromoPainter coancestry matrix for Irish individuals. Points are coloured according to cluster groups defined in (A); the median location of each cluster group is plotted. (C) Map of Ireland showing the sampling location for a subset of 588 individuals analysed in (A) and (B), coloured by cluster group. Points have been randomly jittered within a radius of 5 km to preserve anonymity. Precise sampling location for 44 Northern Irish individuals from the People of the British Isles dataset was unknown; these individuals are plotted geometrically in a circle. (D) “British admixture component” (ADMIXTURE estimates; k=2) for Irish cluster groups. This component has the largest contribution in ancient Anglo-Saxons and the SEE cluster. (E) Linear regression of principal component 2 (B) versus British admixture component (r² = 0.43; p < 2×10⁻¹⁶). Points are coloured by cluster group. (Standard error for ADMIXTURE point estimates presented in S11 Fig.)
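As a reminder of what the r² in panel (E) means, here is a minimal ordinary least-squares sketch in Python. The PC2 coordinates and admixture proportions below are invented toy values, and the function name is mine; the point is only that r² is the share of variance in one variable explained by a straight line through the other, so r² = 0.43 corresponds to the "approximately 43% of the variance" in the excerpt.

```python
def linear_fit(xs, ys):
    """Ordinary least squares of ys on xs; returns (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)   # squared Pearson correlation
    return slope, intercept, r_squared


# Toy values: PC2 position vs. "British admixture component" per individual.
pc2 = [-2.0, -1.0, 0.0, 1.0, 2.0]
british_component = [0.10, 0.25, 0.20, 0.35, 0.30]

slope, intercept, r_squared = linear_fit(pc2, british_component)
print(round(slope, 3), round(intercept, 3), round(r_squared, 3))
```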

On the genetic structure of the British Isles

The genetic substructure observed in Ireland is consistent with long term geographic diversification of Celtic populations and the continuity shown between modern and Early Bronze Age Irish people.

Clusters representing Celtic populations harbouring less Anglo-Saxon influence separate out above and below SEE on PC4. Notably, northern Irish clusters (NLU), Scottish (NISC, SSC and NSC), Cumbria (CUM) and North Wales (NWA) all separate out at a mutually similar level, representing northern Celtic populations. The southern Celtic populations Cornwall (COR), south Wales (SWA) and south Munster (SMN) also separate out on similar levels, indicating some shared haplotypic variation between geographically proximate Celtic populations across both Islands. It is notable that after the split of the ancestrally divergent Orkney, successive ChromoPainter PCs describe diversity in British populations where “Anglo-saxonization” was repelled [22]. PC3 is dominated by Welsh variation, while PC4 in turn splits North and South Wales significantly, placing south Wales adjacent to Cornwall and north Wales at the other extreme with Cumbria, all enclaves where Brittonic languages persisted.

In an interesting symmetry, many Northern Irish samples clustered strongly with southern Scottish and northern English samples, defining the Northern Irish/Cumbrian/Scottish (NICS) cluster group. More generally, by modelling Irish genomes as a linear mixture of haplotypes from British clusters, we found that Scottish and northern English samples donated more haplotypes to clusters in the north of Ireland than to the south, reflecting an overall correlation between Scottish/north English contribution and ChromoPainter PC1 position in Fig 1 (Linear regression: p < 2×10⁻¹⁶, r² = 0.24).

North to south variation in Ireland and Britain are therefore not independent, reflecting major gene flow between the north of Ireland and Scotland (Fig 5) which resonates with three layers of historical contacts. First, the presence of individuals with strong Irish affinity among the third generation PoBI Scottish sample can be plausibly attributed to major economic migration from Ireland in the 19th and 20th centuries [6]. Second, the large proportion of Northern Irish who retain genomes indistinguishable from those sampled in Scotland accords with the major settlements (including the Ulster Plantation) of mainly Scottish farmers following the 16th Century Elizabethan conquest of Ireland which led to these forming the majority of the Ulster population. Third, the suspected Irish colonisation of Scotland through the Dál Riata maritime kingdom, which expanded across Ulster and the west coast of Scotland in the 6th and 7th centuries, linked to the introduction and spread of Gaelic languages [3]. Such a migratory event could work to homogenise older layers of Scottish population structure, in a similar manner as noted on the east coasts of Britain and Ireland. Earlier communications and movements across the Irish Sea are also likely, which at its narrowest point separates Ireland from Scotland by approximately 20 km.

Genes mirror geography in the British Isles. (A) fineSTRUCTURE clustering dendrogram for combined Irish and British data. Data principally split into Irish and British groups before subdividing into a total of 50 distinct clusters, which are combined into cluster groups for clusters that formed clades in the dendrogram, overlapped in principal component space (B) and were sampled from regions that are geographically contiguous. Names and labels follow the geographical provenance for the majority of data within the cluster group. Details for each cluster in the dendrogram are provided in S2 Fig. (B) Principal component analysis (PCA) of haplotypic similarity based on the ChromoPainter coancestry matrix, coloured by cluster group with their median locations labelled. We have chosen to present PC1 versus PC4 here as these components capture new information regarding correlation between haplotypic variation across Britain and Ireland and geography, while PC2 and PC3 (Fig 4) capture previously reported splitting for Orkney and Wales from Britain [7]. A map of Ireland and Britain is shown for comparison, coloured by sampling regions for cluster groups, the boundaries of which are defined by the Nomenclature of Territorial Units for Statistics (NUTS 2010), with some regions combined. Sampling regions are coloured by the cluster group with the majority presence in the sampling region; some sampling regions have significant minority cluster group representations as well, for example the Northern Ireland sampling region (UKN0; NUTS 2010) is mostly explained by the NICS cluster group but also has significant representation from the NLU cluster group. The PCA plot has been rotated clockwise by 5 degrees to highlight its similarity with the geographical map of Ireland and Britain. NI, Northern Ireland; PC, principal component.
Cluster groups that share names with groups from Fig 1 (NLU; SMN; CLN; CNN) have an average of 80% of their samples shared with the initial cluster groups. © EuroGeographics for the map and administrative boundaries, note some boundaries have been subsumed or modified to better reflect sampling regions.

Genomic footprints of migration into Ireland

Quite interestingly, it is haplogroups, and not admixture, that define the oldest migration layers into Ireland. Without evidence of paternal Y-DNA lineages we would probably not be able to ascertain the oldest migrations and the languages brought by migrants, including Celtic languages:

Of all the European populations considered, ancestral influence in Irish genomes was best represented by modern Scandinavians and northern Europeans, with a significant single-date one-source admixture event overlapping the historical period of the Norse-Viking settlements in Ireland (p < 0.01; fit quality FQB > 0.985; Fig 6). (…) This suggests a contribution of historical Viking settlement to the contemporary Irish genome and contrasts with previous estimates of Viking ancestry in Ireland based on Y chromosome haplotypes, which have been very low [25]. The modern-day paucity of Norse-Viking Y chromosome haplotypes may be a consequence of drift with the small patrilineal effective population size, or could have social origins with Norse males having less influence after their military defeat and demise as an identifiable community in the 11th century, with persistence of the autosomal signal through recombination.

European admixture date estimates in northwest Ulster did not overlap the Viking age but did include the Norman period and the Plantations.

The genetic legacies of the populations of Ireland and Britain are therefore extensively intertwined and, unlike admixture from northern Europe, too complex to model with GLOBETROTTER.

All-Ireland GLOBETROTTER admixture date estimates for European and British surrogate admixing populations. A summary of the date estimates and 95% confidence intervals for inferred admixture events into Ireland from European and British admixing sources is shown in (A), with ancestry proportion estimates for each historical source population for the two events and example coancestry curves shown in (B). In the coancestry curves, "Relative joint probability" estimates the pairwise probability that two haplotype chunks separated by a given genetic distance come from the two modeled source populations respectively (i.e. FRA(8) and NOR-SG); if a single admixture event occurred, these curves are expected to decay exponentially at a rate corresponding to the number of generations since the event. The green fitted line describes this GLOBETROTTER fitted exponential decay for the coancestry curve. If the sources come from the same ancestral group the slope of this curve will be negative (as with FRA(8) vs FRA(8)), while a positive slope indicates that sources come from different admixing groups (as with FRA(8) vs NOR-SG). The adjacent bar plot shows the inferred genetic composition of the historical admixing sources modelled as a mixture of the sampled modern populations. A European admixture event was estimated by GLOBETROTTER corresponding to the historical record of the Viking age, with major contributions from sources similar to modern Scandinavians and northern Europeans and minor contributions from southern European-like sources. For admixture date estimates from British-like sources the influence of the Norman settlement and the Plantations could not be disentangled, with the point estimate date for admixture falling between these two eras and GLOBETROTTER unable to adequately resolve source and proportion details of the admixture event (fit quality FQB < 0.985). The relative noise of the coancestry curves reflects the uncertainty of the British event.
Cluster labels (for the European clustering dendrogram, see S4 Fig; for the PoBI clustering dendrogram, see S3 Fig): FRA(8), France cluster 8; NOR-SG, Norway, with significant minor representations from Sweden and Germany; SE_ENG, southeast England; N_SCOT(4), northern Scotland cluster 4.
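The caption's key point, that a coancestry curve decays exponentially at a rate equal to the number of generations since admixture, can be illustrated with a short Python sketch. This is not GLOBETROTTER itself: the curve below is synthetic, the function name `generations_from_decay` is made up for illustration, the fit is a plain log-linear least squares that assumes a positive curve decaying to a zero baseline, and the 28 years per generation and the 1950 reference year are assumptions chosen only to turn generations into a rough calendar date.

```python
import math


def generations_from_decay(distances_morgans, curve):
    """Fit ln(y) = ln(A) - g * d by least squares and return g.

    d is the genetic distance in Morgans, so g is read as the number of
    generations since a single pulse of admixture. Assumes every curve
    value is positive (no baseline offset, no noise handling).
    """
    xs = list(distances_morgans)
    ys = [math.log(y) for y in curve]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return -slope


# Synthetic curve: amplitude 0.02, admixture 35 generations ago (toy values).
distances = [cm / 100.0 for cm in range(1, 51)]        # 1..50 cM in Morgans
curve = [0.02 * math.exp(-35.0 * d) for d in distances]

generations = generations_from_decay(distances, curve)

# Assumed conversion: 28 years per generation, counted back from 1950.
YEARS_PER_GENERATION = 28
REFERENCE_YEAR = 1950
approx_year = REFERENCE_YEAR - generations * YEARS_PER_GENERATION

print(f"generations since admixture: {generations:.1f}")
print(f"approximate date: {approx_year:.0f} CE")
```

On this toy input the fit recovers the 35 generations that were put in, which the assumed conversion places in the Viking age. Real coancestry curves are noisy and may reflect several events, which is why the British-source estimate in the paper could not be resolved.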

Another study that strengthens the need to ascertain haplogroup-admixture differences between Yamna/Bell Beaker and Sredni Stog/Corded Ware.

Text and images from preprint article under a CC-BY-NC-ND 4.0 International license.

Featured image, from the article in Scientific Reports: The clustering of individuals with Irish and British ancestry based solely on genetics. Shown are 30 clusters identified by fineStructure from 2,103 Irish and British individuals. The dendrogram (left) shows the tree of clusters inferred by fineStructure and the map (right) shows the geographic origin of 192 Atlas Irish individuals and 1,611 British individuals from the Peoples of the British Isles (PoBI) cohort, labelled according to fineStructure cluster membership. Individuals are placed at the average latitude and longitude of either their great-grandparental (Atlas) or grandparental (PoBI) birthplaces. Great Britain is separated into England, Scotland, and Wales. The island of Ireland is split into the four Provinces: Ulster, Connacht, Leinster, and Munster. The outline of Britain was sourced from Global Administrative Areas (2012), GADM database of Global Administrative Areas, version 2.0. The outline of Ireland was sourced from Open Street Map Ireland, Copyright OpenStreetMap Contributors; data available under the Open Database Licence. The figure was plotted in the statistical software language R [46], version 3.4.1, with various packages.