Haplogroup J spread in the Mediterranean due to Phoenician and Greek colonizations


Open access A finely resolved phylogeny of Y chromosome Hg J illuminates the processes of Phoenician and Greek colonizations in the Mediterranean, by Finocchio et al. Scientific Reports (2018) Nº 7465.

Abstract (emphasis mine):

In order to improve the phylogeography of the male-specific genetic traces of Greek and Phoenician colonizations on the Northern coasts of the Mediterranean, we performed a geographically structured sampling of seven subclades of haplogroup J in Turkey, Greece and Italy. We resequenced 4.4 Mb of Y-chromosome in 58 subjects, obtaining 1079 high quality variants. We did not find a preferential coalescence of Turkish samples to ancestral nodes, contradicting the simplistic idea of a dispersal and radiation of Hg J as a whole from the Middle East. Upon calibration with an ancient Hg J chromosome, we confirmed that signs of Holocenic Hg J radiations are subtle and date mainly to the Bronze Age. We pinpointed seven variants which could potentially unveil star clusters of sequences, indicative of local expansions. By directly genotyping these variants in Hg J carriers and complementing with published resequenced chromosomes (893 subjects), we provide strong temporal and distributional evidence for markers of the Greek settlement of Magna Graecia (J2a-L397) and Phoenician migrations (rs760148062). Our work generated a minimal but robust list of evolutionarily stable markers to elucidate the demographic dynamics and spatial domains of male-mediated movements across and around the Mediterranean, in the last 6,000 years.

J2-L397. The star indicates the centroid of derived alleles. The solid square indicates the centroid of ancestral alleles, with its 95% C.I. (ellipse). In the insets: distributions of the pairwise sampling distances (in Km) for the carriers of the ancestral (black) and derived (white) allele, with solid and dashed lines indicating the respective averages. At right: median joining network of 7-STR haplotypes and SNPs in the same groups, with sectors coloured according to sampling location. Haplotype structure is detailed for some nodes, in the order YCA2a-YCA2b-DYS19-DYS390-DYS391-DYS392-DYS393 (in italics).

Interesting excerpts:

Two features of our tree are at odds with the simplistic idea of a dispersal of Hg J as a whole from the Middle East towards Greece and Italy and an accompanying radiation [26]. First, there is little evidence of sudden diversification between 15 and 5 kya, a period of likely population increase and pressure for range expansion, due to the Agricultural revolution in the Fertile Crescent. Second, within each subclade, lineages currently sampled in Turkey do not show up as preferentially ancestral. Both findings are replicated and reinforced by examining the previous landmark studies. Our Turkish samples do not coalesce preferentially to ancestral nodes when mapped onto these studies’ trees.

Additional relevant information on the entire Hg J comes from the discontinuous distribution of J2b-M12. The northern fringe of our sample is enriched in the J2b-M241 subclade, which reappears in the gulf of Bengal [38, 45], with low frequencies in the intervening Iraq [46] and Iran [47]. No J2b-M12 carriers were found among 35 modern Lebanese, as contrasted to one of two ancient specimens from the same region [35].

In summary, a first conclusion of our sequencing effort and merge with available data is that the phylogeography of Hg J is complex and hardly explained by the presence of a single population harbouring the major lineages at the onset of agriculture and spreading westward. A unifying explanation for all the above inconsistencies could be a centre of initial radiation outside the area here sampled more densely, i.e. the Caucasus and regions North of it, from which different Hg J subclades may have later reached mainland Italy, Greece and Turkey, possibly following different routes and times. Evidence in this direction comes from the distribution of J2a-M410 [45, 48] and the early- [49] or mid-Holocene [50] southward spread of J1.

Supplemental Figure 7. Maps of sampling locations for the carriers of the derived allele (white triangle point-down) at the indicated SNP vs carriers of the ancestral allele (black triangle point-up), conditioned on identical genotype at the same most terminal marker. Coastlines were drawn with the R packages [18] “map” and “mapproj” v. 3.1.3 (https://cran.r-project.org/web/packages/mapproj/index.html), and additional features added with default functions. The star indicates the centroid of derived alleles. The solid square indicates the centroid of ancestral alleles, with its 95% C.I. (ellipse). In the insets: distributions of the pairwise sampling distances (in Km) for the carriers of the ancestral (black) and derived (white) allele, with solid and dashed lines indicating the respective averages. At right: median joining network of 7-STR haplotypes and SNPs in the same groups, with sectors coloured according to sampling location. Haplotype structure is detailed for some nodes, in the order YCA2a-YCA2b-DYS19-DYS390-DYS391-DYS392-DYS393 (in italics).
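To make the caption's "centroid" and "pairwise sampling distances (in Km)" concrete, here is a minimal Python sketch. It is only an illustration: the paper does not spell out its exact procedure in the text quoted here, so the arithmetic-mean centroid, the haversine distance, the function names and the coordinates are all my own assumptions, not the authors' method or data.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def centroid(points):
    """Arithmetic mean of latitudes and longitudes (a regional-scale approximation)."""
    lats, lons = zip(*points)
    return sum(lats) / len(lats), sum(lons) / len(lons)


def mean_pairwise_km(points):
    """Average distance over all unordered pairs of sampling locations."""
    dists = [haversine_km(p, q) for p, q in combinations(points, 2)]
    return sum(dists) / len(dists)


# Hypothetical sampling locations (lat, lon); NOT data from the paper.
derived = [(38.2, 16.3), (40.8, 14.3), (37.5, 15.1)]
ancestral = [(38.4, 27.1), (41.0, 29.0), (37.9, 23.7), (35.3, 25.1)]

for label, group in (("derived", derived), ("ancestral", ancestral)):
    print(label, centroid(group), round(mean_pairwise_km(group)), "km")
```

A derived-allele centroid displaced from the ancestral one, together with a smaller mean pairwise distance, is the kind of pattern the maps are meant to show for a local expansion.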

The lineage defined by rs779180992, belonging to J2b-M205, and dated at 4–4.5 kya, has a radically different distribution, with derived alleles in Continental Italy, Greece and Northern Turkey, and two instances in a Palestinian and a Jew. The interpretation of the spread of this lineage is not straightforward. Tentative hypotheses are linked to Southward movements that occurred in the Balkan Peninsula from the Bronze Age [29, 53], through the Roman occupation and later [54].

The slightly older (5.6–6.3 kya) branch 98 lineage displays a similar trend of a Eastward positioning of derived alleles, with the notable difference of being present in Sardinia, Crete, Cyprus and Northern Egypt. This feature and the low frequency of the parental J2a-M92 lineage in the Balkans [27] calls for an explanation different from the above.

Finally, we explored the distribution of J2a-L397 and three derived lineages within it. J2a-L397 is tightly associated with a typical DYS445 6-repeat allele. This has been hypothesized as a marker of the Greek colonizations in the Mediterranean [55], based on its presence in Greek Anatolia and Provence (France), a region with attested Iron Age Greek contribution. All of our chromosomes in this clade were characterized also by DYS391(9), confirming their Anatolian Greek signature. We resolved the J2a-L397 clade to an unprecedented precision, with three internal markers which allow a finer discrimination than STRs. The ages of the three lineages (2.0–3.0 kya) are compatible with the beginning of the Greek colonial period, in the 8th century BCE. The three subclades have different distributions (Fig. 2B), with two (branches 57, 59) found both East and West to Greece, and one only in Italy (branch 58). As to Mediterranean Islands, J2a-L397 was found in Cyprus [56] and Crete [43]. Its presence as one of the three branches 57–59 will represent an important test. In Italy all three variants were found mainly along the Western coast (18/25), which hosted the preferred Greek trade cities. The finding of all three differentiated lineages in Locri excludes a local founder effect of a single genealogy. Interestingly, an important Greek colony was established in this location, with continuity of human settlement until modern times. The sample composed of the same subjects displayed genetic affinities with Eastern Greece and the Aegean also at autosomal markers [57]. In summary, the distributions of branches 57–59 mirror the variety of the cities of origin and geographic ranges during the phases of the colonization process [58].

So, there you have it: further evidence that the spread of haplogroup J and CHG-related ancestry in the Mediterranean was mainly driven by different (and late) expansions of historic peoples.


Population structure in Argentina shows most European sources of South European origin


Open access Population structure in Argentina, by Muzzio et al., PLOS One (2018).

Abstract (emphasis mine):

We analyzed 391 samples from 12 Argentinian populations from the Center-West, East and North-West regions with the Illumina Human Exome Beadchip v1.0 (HumanExome-12v1-A). We did Principal Components analysis to infer patterns of populational divergence and migrations. We identified proportions and patterns of European, African and Native American ancestry and found a correlation between distance to Buenos Aires and proportion of Native American ancestry, where the highest proportion corresponds to the Northernmost populations, which is also the furthest from the Argentinian capital. Most of the European sources are from a South European origin, matching historical records, and we see two different Native American components, one that spreads all over Argentina and another specifically Andean. The highest percentages of African ancestry were in the Center West of Argentina, where the old trade routes took the slaves from Buenos Aires to Chile and Peru. Subcontinentaly, sources of this African component are represented by both West Africa and groups influenced by the Bantu expansion, the second slightly higher than the first, unlike North America and the Caribbean, where the main source is West Africa. This is reasonable, considering that a large proportion of the ships arriving at the Southern Hemisphere came from Mozambique, Loango and Angola.

Principal component analysis.
On the x axis is PC 1 while PC2 is the y axis. Plus symbols represent Argentinian samples and circles are for reference panels. Fig 2a (left) Argentinians with YRI and LWK for African references (“African”), IBS and TSI for European references (“European”) and the PEL, MXL, PUR and CLM as a Latin American references. Fig 2b (right) samples from Argentina with IBS, MXL, CLM and PEL.
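The correlation reported in the abstract (distance to Buenos Aires vs. proportion of Native American ancestry) is the kind of relationship that a plain Pearson coefficient can summarize. The sketch below is illustrative only: the abstract does not say which statistic the authors used, and the distances and ancestry proportions are invented placeholders, not values from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical populations: distance to Buenos Aires (km) and mean
# Native American ancestry proportion. NOT data from Muzzio et al.
distance_km = [0, 400, 700, 1100, 1500]
native_american = [0.15, 0.25, 0.30, 0.45, 0.60]

r = pearson_r(distance_km, native_american)
print(f"r = {r:.2f}")
```

With only twelve sampled populations in the real study, a coefficient like this would need a significance test before being read as more than a trend.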


Distribution of Southern Iberian haplogroup H indicates exchanges in the western Mediterranean

Recent open access paper The distribution of mitochondrial DNA haplogroup H in southern Iberia indicates ancient human genetic exchanges along the western edge of the Mediterranean, by Hernández, Dugoujon, Novelletto, Rodríguez, Cuesta and Calderón, BMC Genetics (2017).

Abstract (emphasis mine):

The structure of haplogroup H reveals significant differences between the western and eastern edges of the Mediterranean, as well as between the northern and southern regions. Human populations along the westernmost Mediterranean coasts, which were settled by individuals from two continents separated by a relatively narrow body of water, show the highest frequencies of mitochondrial haplogroup H. These characteristics permit the analysis of ancient migrations between both shores, which may have occurred via primitive sea crafts and early seafaring. We collected a sample of 750 autochthonous people from the southern Iberian Peninsula (Andalusians from Huelva and Granada provinces). We performed a high-resolution analysis of haplogroup H by control region sequencing and coding SNP screening of the 337 individuals harboring this maternal marker. Our results were compared with those of a wide panel of populations, including individuals from Iberia, the Maghreb, and other regions around the Mediterranean, collected from the literature.

Both Andalusian subpopulations showed a typical western European profile for the internal composition of clade H, but eastern Andalusians from Granada also revealed interesting traces from the eastern Mediterranean. The basal nodes of the most frequent H sub-haplogroups, H1 and H3, harbored many individuals of Iberian and Maghrebian origins. Derived haplotypes were found in both regions; haplotypes were shared far more frequently between Andalusia and Morocco than between Andalusia and the rest of the Maghreb. These and previous results indicate intense, ancient and sustained contact among populations on both sides of the Mediterranean.

Our genetic data on mtDNA diversity, combined with corresponding archaeological similarities, provide support for arguments favoring prehistoric bonds with a genetic legacy traceable in extant populations. Furthermore, the results presented here indicate that the Strait of Gibraltar and the adjacent Alboran Sea, which have often been assumed to be an insurmountable geographic barrier in prehistory, served as a frequently traveled route between continents.

a, b, c. Interpolated frequency surfaces of clade H and its main sub-clades (H1 and H3). Frequencies (%) are showed in a colour scale. See information about the populations used in Additional files 4 and 5. Map templates were taken from Natural Earth free map repository (http://www.naturalearthdata.com/)

I usually find mtDNA data, especially studies like this one based on modern populations, very difficult to interpret for anthropological purposes. It is well-known that there are important differences in the pattern of Y-DNA and mtDNA expansion and distribution.

A paragraph in this respect caught my attention:

The patterns of variation in the Y-chromosome between western and eastern Andalusians, based on 416 males, have also been investigated for a set of Y-Short Tandem Repeats (Y-STRs) and Y-SNPs [53, 54, 55], Calderón et al., unpublished data] in combination to mtDNA analyses ([18, 19] and present study). In general, for both uniparental makers, Andalusians exhibit a typical western European genetic background, with peak frequencies of mtDNA Hg H and Y-chromosome Hg R1b1b2-M269 (45% and 60%, respectively). Interestingly, our results have further revealed that the influence of African female input is far more significant when compared to male influence in contemporary Andalusians. The lack of correspondence between the maternal and paternal genetic profiles of human populations reflects intrinsic differences in migratory behavior related to sex-biased processes and admixture, as well as differences in male and female effective population sizes related to the variance in reproductive success affected, for example, by polygyny [56, 57].

I think that the greater reduction in paternal lineages compared to maternal lineages that we usually see during and after prehistoric or historic migrations has more to do with dynastic founder effects like the renowned Uí Néill family case, and with war-related casualties (since combatants were usually men), than with other more popular explanations, such as enslavement of women or polygyny.

The most successful paternal lines (anywhere in the world) were probably those who remained in power for a long time (be it a patriarchal society based on families, clans, or more complex organizational units), who were richer and thus more capable of having healthy offspring, who in turn were able to survive longer and have more children who inherited power, etc.

In case of recent migrations or population movements that disrupt the previously established organization, after a certain number of generations, successful patrilocal families (usually from incoming lineages) might slowly dominate over a whole region, with poorer families (usually of ‘indigenous’ lineages) suffering a greater – especially perinatal and child – mortality, without any obvious (pre)historic event associated to these gradual changes.
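A toy calculation shows how slow and undramatic such a replacement can be. This is only a sketch of my own reasoning, not a model from any of the papers cited here: it assumes a deterministic haploid model in which sons of incoming lineages leave, on average, `w` surviving sons for every one left by native lineages, and the starting share and the value of `w` are arbitrary.

```python
def next_share(p, w):
    """Share of incoming lineages after one generation.

    p: current share of incoming paternal lineages (0..1)
    w: relative reproductive success of incoming vs. native lineages
    """
    return p * w / (p * w + (1.0 - p))


def generations_to_majority(p0, w, max_generations=10_000):
    """Generations until incoming lineages exceed 50%, or None if never reached."""
    p = p0
    for generation in range(1, max_generations + 1):
        p = next_share(p, w)
        if p > 0.5:
            return generation
    return None


# Arbitrary illustration: incoming lineages start at 10% of men and enjoy a
# 5% advantage in surviving sons per generation.
print(generations_to_majority(0.10, 1.05))
```

In this toy setting a 5% edge turns a 10% minority into a majority in 46 generations, with no massacre or other single event required. Real populations add drift, which makes outcomes noisier but does not change the direction.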

This gradual replacement of paternal lineages is compatible with the adoption of the native language by newcomers. If the number of migrants is greater than the native population, and especially if their technology is more advanced, then a more radical change including ethnolinguistic identification is more likely.

I don’t deny the (pre)historic existence of radical replacement of male populations with continuity of female lineages due to massacres of men, female slavery, or polygyny, but they are probably not the main explanation for most regional differences seen in paternal lineages, and should thus be used with caution.

Gradual replacement and founder effects are also the most logical explanation for why autochthonous continuity myths (which the modern regional prevalence of a few successful lineages tended to create in the 2000s) haven’t been corroborated by ancient DNA; e.g. R1b-DF27 in Basques, N1c-M178 in Finnic populations, R1a-Z283 in Slavs, etc. There is nothing different in those areas from other recent founder effects and internal migratory flows seen everywhere in Europe in the past millennia.

Paper discovered via a link by Alberto Gonzalez on the Facebook group Iberia ADN.


Iberia in the Copper and Early Bronze Age: Cultural, demographic, and environmental analysis


New paper (behind paywall), Cultural, Demographic and Environmental Dynamics of the Copper and Early Bronze Age in Iberia (3300–1500 BC): Towards an Interregional Multiproxy Comparison at the Time of the 4.2 ky BP Event, Blanco-González, Lillios, López-Sáez, et al. J World Prehist (2018).

Abstract (emphasis mine):

This paper presents the first comprehensive pan-Iberian overview of one of the major episodes of cultural change in later prehistoric Iberia, the Copper to Bronze Age transition (c. 2400–1900 BC), and assesses its relationship to the 4.2 ky BP climatic event. It synthesizes available cultural, demographic and palaeoenvironmental evidence by region between 3300 and 1500 BC. Important variation can be discerned through this comparison. The demographic signatures of some regions, such as the Meseta and the southwest, diminished in the Early Bronze Age, while other regions, such as the southeast, display clear growth in human activities; the Atlantic areas in northern Iberia barely experienced any changes. This paper opens the door to climatic fluctuations and inter-regional demic movements within the Peninsula as plausible contributing drivers of particular historical dynamics.

Division of Iberia into 5 study areas according to their culture history (3300–1550 BC)

Interesting excerpts summarizing key trends in the different regions:

  • Between 2200 and 1900 BC, the northernmost regions (i.e. Galicia, the Cantabrian strip and the northeastern sector to the north of the Ebro valley) underwent relatively minor changes in the realms of settlement and burial practices. (…) In addition, some Atlantic areas show a marked and statistically significant fall in human activity c. 2200 BC, with a subsequent recovery c. 1600 BC, and such observations are matched by paleoenvironmental proxies and a lack of known EBA sites
  • The overall impression from the Meseta is one of sharp disruption in cultural practices; these include both settlement and burial patterns, abrupt shifts in local climate conditions, and striking differences in human pressure on vegetation. However, there was also clear intra-regional variability, with remarkable internal particularities and differential tempos between the western and eastern sectors. In terms of material culture, discontinuity with the Copper Age is the main trend in the western Duero and the Tagus valleys, yet EBA communities to the north of the Central System adopted far more distinctive and therefore traceable site types (hilltops) and material repertoire. This shift was even stronger in the case of the Motillas culture at La Mancha, whose pathway seems closely tied to the Argaric area.
  • Intra-regional variability is also apparent within the northeast (…) In the second millennium BC, material culture changed, long-distance exchange intensified and anthropogenic pressure increased, despite continuity in diverse realms of social practice.
  • The pattern in the southwest was one of marked discontinuity with two key features: a) it follows the general decreasing trends manifest across Atlantic Iberia; and b) its temporality was clearly different from the rest of the Peninsula and apparently unrelated to the 4.2 ky BP event. Thus, a highly conspicuous and rich variety of cultural expressions in the Chalcolithic, with an early and marked peak in human activity during the Beaker phase c. 2500 BC, gave way to a sudden cultural collapse prior to the onset of the EBA
  • The southeast exhibits one of the most remarkable cultural shifts in Western Europe. (…) The radical transformation in Chalcolithic materiality and ways of life could be regarded as a kind of societal collapse. The Argaric, a highly hierarchical and integrated regional polity, is the clearest example of a new scenario that emerged after the 4.2 ky BP event, yet the contributing role of environmental change and immigration from other regions remains to be fully explored.

Since R1b-DF27 lineages are widely distributed in modern western Europe, it is only logical that the recent find of its first ancient sample in Iberia has sparked interest in Chalcolithic and Early Bronze Age Iberian cultures.

There is not much literature in English about Iberian prehistory, especially on the evolution of Bell Beaker culture. Also, most papers in Spanish on this cultural phenomenon – in my humble opinion, as a non-archaeologist – seem to be written from a merely descriptive archaeological point of view, many of them still sharing the radiocarbon-based assessment of origin and distribution of materials, instead of more complex anthropological models of cultural change and potential migrations.

Nevertheless, changes and influences in Iberian cultures are obvious regardless of the view taken on population movements (which are becoming quite clear now), and this paper seems to me a thorough review, very interesting for international researchers when interpreting ancient DNA from Iberia.

Featured image, modified from the paper: “The Bell Beaker culture in the northern Meseta: an artistic recreation of funerary ritual at Fuente Olmedo (Valladolid). Source: Garrido-Pena et al. 2011, Fig. 7.7”.

EDIT (21 MAR 2018): Interesting C14 date repository project Cronología de la Prehistoria de la Península Ibérica (read a brief description, in Spanish).


First Iberian R1b-DF27 sample, probably from incoming East Bell Beakers


I had some more time to read the paper by Valdiosera et al. (2018) and its supplementary material.

One of the main issues since the publication of Olalde et al. (2018) (and its hundreds of Bell Beaker samples) was the lack of clear Y-DNA R1b-DF27 subclades among East Bell Beaker migrants, which left us wondering when the subclade entered the Iberian Peninsula, since it could have (theoretically) happened at any time from the Chalcolithic to the Iron Age.

My prediction was that this lineage found today widespread among the Iberian population crossed the Pyrenees quite early, during the Chalcolithic, with migrating East Bell Beakers expanding North-West Indo-European dialects, and that it spread slowly afterwards.

The first ancient sample clearly identified as belonging to the R1b-DF27 subclade is reported in this paper, at the Late Bronze Age site Cueva de los Lagos. Although the individual is unidentified and has no radiocarbon date, the site as a whole is associated with the Cogotas culture and its Boquique ceramic decoration.

Y-DNA and mtDNA haplogroups, from the paper. Sequencing statistics and contamination rates for newly generated sequence data.

It was found in the northern part of the Cogotas culture territory (which lies mainly between Castile and Aragon, in North-Central Spain) and shows evident steppe admixture. It has become obvious with the latest papers (including this one) that R1b-M269 lineages intruded south of the Pyrenees associated with East Bell Beaker migrations.

The Proto-Cogotas culture is associated with a Bell Beaker substrate influenced by either El Argar or Atlantic Bronze, and the specific type of ceramics found at this Cogotas culture site are probably from the mid-2nd millennium, which is too early for the Celtic expansion.

Supervised ADMIXTURE results.

Nevertheless, due to the quite likely late date of the sample (in the centuries around 1500 BC), there is still a possibility that incoming R1b-DF27 lineages were not among the early R1b-M269 lineages found in the Iberian Chalcolithic, and were associated with later migrations from Central Europe, potentially linked to the expansion of the Urnfield culture, and thus nearer to an Italo-Celtic community.

Diachronic map of migrations in Europe ca. 1250-750 BC.

In any of these scenarios, a Pre-Celtic expansion of North-West Indo-European in Iberia (possibly associated with Lusitanian) is still the best explanation for the origin and expansion of (at least some) modern Iberian R1b-DF27 lineages, including those found among the Basque-speaking population.

This implies that the ‘indigenous’ Neolithic lineages of Iberia (like I2 and G2a2) were replaced with subsequent internal gene flows and founder effects, such as those that evidently happened (probably quite recently) among Basques, even though indigenous languages show an obvious continuity.

I would say this is the last nail in the coffin for autochthonous Y-DNA continuity theories for Spain and France (i.e. for the traditional Vasconic-Uralic hypothesis), but we know that data is never enough for any die-hard continuist… so let’s just say another nail in the coffin for endless autochthonous continuity theories.

EDIT (18 & 26 MAR 2018): Genetiker has published Y-SNP calls for both R1b samples, showing this one is R1b1a1a2a1a2a-BY15964 (see modern members of this subclade in ytree), and that the other one is R1b1a1a2a~L23.


Iberian prehistoric migrations in Genomics from Neolithic, Chalcolithic, and Bronze Age


New open access paper Four millennia of Iberian biomolecular prehistory illustrate the impact of prehistoric migrations at the far end of Eurasia, by Valdiosera, Günther, Vera-Rodríguez, et al. PNAS (2018) published ahead of print.

Abstract (emphasis mine):

Population genomic studies of ancient human remains have shown how modern-day European population structure has been shaped by a number of prehistoric migrations. The Neolithization of Europe has been associated with large-scale migrations from Anatolia, which was followed by migrations of herders from the Pontic steppe at the onset of the Bronze Age. Southwestern Europe was one of the last parts of the continent reached by these migrations, and modern-day populations from this region show intriguing similarities to the initial Neolithic migrants. Partly due to climatic conditions that are unfavorable for DNA preservation, regional studies on the Mediterranean remain challenging. Here, we present genome-wide sequence data from 13 individuals combined with stable isotope analysis from the north and south of Iberia covering a four-millennial temporal transect (7,500–3,500 BP). Early Iberian farmers and Early Central European farmers exhibit significant genetic differences, suggesting two independent fronts of the Neolithic expansion. The first Neolithic migrants that arrived in Iberia had low levels of genetic diversity, potentially reflecting a small number of individuals; this diversity gradually increased over time from mixing with local hunter-gatherers and potential population expansion. The impact of post-Neolithic migrations on Iberia was much smaller than for the rest of the continent, showing little external influence from the Neolithic to the Bronze Age. Paleodietary reconstruction shows that these populations have a remarkable degree of dietary homogeneity across space and time, suggesting a strong reliance on terrestrial food resources despite changing culture and genetic make-up.

(A) f4 statistics testing affinities of prehistoric European farmers to either early Neolithic Iberians or central Europeans, restricting these reference populations to SNP-captured individuals to avoid technical artifacts driving the affinities. The boxplots in A show the distributions of all individual f4 statistics belonging to the respective groups. The signal is not sensitive to the choice of reference populations and is not driven by hunter-gatherer–related admixture (Datasets S4 and S5). (B) Estimates of ancestry proportions in different prehistoric Europeans as well as modern southwestern Europeans. Individuals from regions of Iberia were grouped together for the analysis in A and B to increase sample sizes per group and reduce noise
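For readers unfamiliar with the f4 statistics in panel A: in its standard form, f4(A, B; C, D) is the average over SNPs of (a − b)(c − d), where a, b, c, d are the allele frequencies in the four populations. A value near zero means that the A/B difference is uncorrelated with the C/D difference. The sketch below implements only that textbook definition; the population roles in the comments and the frequencies are hypothetical, and the paper's actual configuration, SNP set and standard-error estimation (usually a block jackknife) are not reproduced here.

```python
def f4(freqs_a, freqs_b, freqs_c, freqs_d):
    """Textbook f4(A, B; C, D): mean over SNPs of (a - b) * (c - d).

    Each argument is a list of allele frequencies (0..1), one per SNP,
    in the same SNP order for all four populations.
    """
    terms = [(a - b) * (c - d)
             for a, b, c, d in zip(freqs_a, freqs_b, freqs_c, freqs_d)]
    return sum(terms) / len(terms)


# Hypothetical allele frequencies at two SNPs; NOT data from the paper.
outgroup = [0.1, 0.5]          # stands in for an outgroup
test_farmer = [0.3, 0.1]       # stands in for a later prehistoric farmer group
iberia_en = [0.2, 0.6]         # stands in for early Neolithic Iberians
central_en = [0.4, 0.2]        # stands in for early Neolithic central Europeans

print(f4(outgroup, test_farmer, iberia_en, central_en))
```

Swapping the first pair (or the second) flips the sign, which is why the direction of the deviation from zero tells you which reference population the test group is closer to.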


We present a comprehensive biomolecular dataset spanning four millennia of prehistory across the whole Iberian Peninsula. Our results highlight the power of archaeogenomic studies focusing on specific regions and covering a temporal transect. The 4,000 y of prehistory in Iberia were shaped by major chronological changes but with little geographic substructure within the Peninsula. The subtle but clear genetic differences between early Neolithic Iberian farmers and early Neolithic central European farmers point toward two independent migrations, potentially originating from two slightly different source populations. These populations followed different routes, one along the Mediterranean coast, giving rise to early Neolithic Iberian farmers, and one via mainland Europe forming early Neolithic central European farmers. This directly links all Neolithic Iberians with the first migrants that arrived with the initial Mediterranean Neolithic wave of expansion. These Iberians mixed with local hunter-gatherers (but maintained farming/pastoral subsistence strategies, i.e., diet), leading to a recovery from the loss of genetic diversity emerging from the initial migration founder bottleneck. Only after the spread of Bell Beaker pottery did steppe-related ancestry arrive in Iberia, where it had smaller contributions to the population compared with the impact that it had in central Europe. This implies that the two prehistoric migrations causing major population turnovers in central Europe had differential effects at the southwestern edge of their distribution: The Neolithic migrations caused substantial changes in the Iberian gene pool (the introduction of agriculture by farmers) (6, 9, 11, 13, 24), whereas the impact of Bronze Age migrations (Yamnaya) was significantly smaller in Iberia than in north-central Europe (24). 
The post-Neolithic prehistory of Iberia is generally characterized by interactions between residents rather than by migrations from other parts of Europe, resulting in relative genetic continuity, while most other regions were subject to major genetic turnovers after the Neolithic (4, 6, 7, 9, 25, 48). Although Iberian populations represent the furthest wave of Neolithic expansion in the westernmost Mediterranean, the subsequent populations maintain a surprisingly high genetic legacy of the original pioneer farming migrants from the east compared with their central European counterparts. This counterintuitive result emphasizes the importance of in-depth diachronic studies in all parts of the continent.


Patterns of genetic differentiation and the footprints of historical migrations in the Iberian Peninsula

Open access preprint (which I announced already) at bioRxiv Patterns of genetic differentiation and the footprints of historical migrations in the Iberian Peninsula, by Bycroft et al. (2018).

Abstract (emphasis mine):

Genetic differences within or between human populations (population structure) has been studied using a variety of approaches over many years. Recently there has been an increasing focus on studying genetic differentiation at fine geographic scales, such as within countries. Identifying such structure allows the study of recent population history, and identifies the potential for confounding in association studies, particularly when testing rare, often recently arisen variants. The Iberian Peninsula is linguistically diverse, has a complex demographic history, and is unique among European regions in having a centuries-long period of Muslim rule. Previous genetic studies of Spain have examined either a small fraction of the genome or only a few Spanish regions. Thus, the overall pattern of fine-scale population structure within Spain remains uncharacterised. Here we analyse genome-wide genotyping array data for 1,413 Spanish individuals sampled from all regions of Spain. We identify extensive fine-scale structure, down to unprecedented scales, smaller than 10 Km in some places. We observe a major axis of genetic differentiation that runs from east to west of the peninsula. In contrast, we observe remarkable genetic similarity in the north-south direction, and evidence of historical north-south population movement. Finally, without making particular prior assumptions about source populations, we show that modern Spanish people have regionally varying fractions of ancestry from a group most similar to modern north Moroccans. The north African ancestry results from an admixture event, which we date to 860 – 1120 CE, corresponding to the early half of Muslim rule. Our results indicate that it is possible to discern clear genetic impacts of the Muslim conquest and population movements associated with the subsequent Reconquista.

“(a) Binary tree showing the inferred hierarchical relationships between clusters. The colours and points correspond to each cluster as shown on the map, and the length of the coloured rectangles is proportional to the number of individuals assigned to that cluster. We combined some small clusters (Methods) and the thick black branches indicate the clades of the tree that we visualise in the map. We have labeled clusters according to the approximate location of most of their members, but geographic data was not used in the inference. (b) Each individual is represented by a point placed at (or close to) the centroid of their grandparents’ birthplaces. On this map we only show the individuals for whom all four grandparents were born within 80km of their average birthplace, although the data for all individuals were used in the fineSTRUCTURE inference. The background is coloured according to the spatial densities of each cluster at the level of the tree where there are 14 clusters (see Methods). The colour and symbol of each point corresponds to the cluster the individual was assigned to at a lower level of the tree, as shown in (a). The labels and boundaries of Spain’s Autonomous Communities are also shown.”
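As an aside, the plotting rule described in (b) is easy to sketch in Python. This is only an illustration under my own assumptions, not the authors' code: the function names are hypothetical, the centroid is taken as a plain mean of latitude and longitude, and distances use the haversine formula on a spherical Earth. The paper may have computed both differently.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation (assumption)


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def centroid(points):
    """Plain average of latitudes and longitudes (reasonable only over short distances)."""
    lat = sum(p[0] for p in points) / len(points)
    lon = sum(p[1] for p in points) / len(points)
    return (lat, lon)


def plot_position(grandparents, max_km=80.0):
    """Return the centroid of the four grandparents' birthplaces if all of them
    lie within max_km of it; otherwise return None (individual not drawn on the map)."""
    if len(grandparents) != 4:
        return None
    c = centroid(grandparents)
    if all(haversine_km(c, g) <= max_km for g in grandparents):
        return c
    return None
```

An individual with four grandparents born around Santiago de Compostela would get a point on the map, while one with three Galician grandparents and one from Barcelona would be left out of the plot (though, as the caption says, still used in the inference).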

Some interesting excerpts:

Our results further imply that north west African-like DNA predominated in the migration. Moreover, admixture mainly, and perhaps almost exclusively, occurred within the earlier half of the period of Muslim rule. Within Spain, north African ancestry occurs in all groups, although levels are low in the Basque region and in a region corresponding closely to the 14th-century ‘Crown of Aragon’. Therefore, although genetically distinct this implies that the Basques have not been completely isolated from the rest of Spain over the past 1300 years.

NOTE. I must add here that the Expulsion of the Moriscos is known to have been quite successful in the old Crown of Aragon (deeply affecting its economy), in contrast with the territories of the Crown of Castile, where they either formed less sizeable communities, or were dispersed and eventually Christianized and integrated into local communities. For example, thousands of Moriscos from Granada were dispersed following the War of Alpujarras (1567–1571) into different regions of the Crown of Castile, and many could not be later expelled due to the locals’ resistance to follow the expulsion edict.

Perhaps surprisingly, north African ancestry does not reflect proximity to north Africa, or even regions under more extended Muslim control. The highest amounts of north African ancestry found within Iberia are in the west (11%) including in Galicia, despite the fact that the region of Galicia as it is defined today (north of the Miño river), was never under Muslim rule and Berber settlements north of the Douro river were abandoned by. This observation is consistent with previous work using Y-chromosome data. We speculate that the pattern we see is driven by later internal migratory flows, such as between Portugal and Galicia, and this would also explain why Galicia and Portugal show indistinguishable ancestry sharing with non-Spanish groups more generally. Alternatively, it might be that these patterns reflect regional differences in patterns of settlement and integration with local peoples of north African immigrants themselves, or varying extents of the large-scale expulsion of Muslim people, which occurred post-Reconquista and especially in towns and cities.

We estimated ancestry profiles for each point on a fine spatial grid across Spain (Methods). Gray crosses show the locations of sampled individuals used in the estimation. Map shows the fraction contributed from the donor group ‘NorthMorocco’.

Overall, the pattern of genetic differentiation we observe in Spain reflects the linguistic and geopolitical boundaries present around the end of the time of Muslim rule in Spain, suggesting this period has had a significant and long-term impact on the genetic structure observed in modern Spain, over 500 years later. In the case of the UK, similar geopolitical correspondence was seen, but to a different period in the past (around 600 CE). Noticeably, in these two cases, country-specific historical events rather than geographic barriers seem to drive overall patterns of population structure. The observation that fine-scale structure evolves at different rates in different places could be explained if observed patterns tend to reflect those at the ends of periods of significant past upheaval, such as the end of Muslim rule in Spain, and the end of the Anglo-Saxon and Danish Viking invasions in the UK.

Certain people want to believe (well into the 21st century) in ideal ancestral populations and ancient ethnolinguistic identifications linked to their own – or their own country’s dominant – ancestral components and Y-DNA haplogroup.

We are nevertheless seeing how it is mainly the most recent relevant geopolitical events and late internal migratory flows that have shaped the genetic structure (including Y-DNA haplogroup composition) of modern regions and countries, regardless of their populations’ actual language or ethnic identification, whether (pre)historical or modern.

Another surprise for many, I guess.


Population substructure in Iberia, highest in the north-west territory (to appear in Nature)

A manuscript co-authored by Angel Carracedo, from the University of Santiago de Compostela, and (always according to him) pre-accepted in Nature, will offer more insight into the population substructure of Spain, based on autosomal DNA.

Carracedo’s lecture about DNA (in Galician), including his summary of the paper (from December 2017):

Some of the points made in the video:

  • The study shows a situation parallelling – as expected – the expansion of Spanish Medieval kingdoms during the Reconquista (and subsequent repopulation).
  • In it, the biggest surprise seems to be the greater substructure found in Galicia, the north-western Spanish territory – greater even than expected by the authors.
  • As a side note, Galicia shows a great influence from “Moorish” ancestral components, due mainly to the influx from Portugal, which shows even more of them.

It is difficult to judge only from the image and his words, but one could say that there are:

  • Certain quite old ancestral Galician groups;
    • then two – also quite old – ancestral Basque groups;
      • then more recent Galician groups;
        • and then a common, central Spanish group – including
          • a wider Asturian-Catalan group, with a western Asturian-Leonese, and an eastern Catalan subgroup;
          • and a central Castilian-Aragonese group, also with a western Castilian, and an eastern Aragonese subgroup.
Spain’s population substructure, from the video.

We thought that certain parts of the British Isles could show ancestral components related to the old population, although this has not proven exactly right, due to more recent population expansions.

However, this paper might shed light on the controversy surrounding Lusitanian (possibly Gallaico-Lusitanian) as a Pre-Celtic Indo-European group of Iberia, either slightly older as an Italo-Celtic dialect, or potentially from the Bell Beaker expansion, whose genetic imprint might have survived the Roman conquest, which apparently didn’t replace its ancestral population.

Given the presence of a central Spanish group opposed to the other minor groups, and knowing that (at least part of) the Medieval kingdoms should be related to the Occitan region (due to the Celtic expansion, and also potentially later during the Visigothic Kingdom and the Carolingian Empire), we can only guess that the other (north-western and Basque) groups are potentially quite old, and reflect prehistoric population structures.

Just speculating here, of course. Another interesting genetic paper to await…

Seen first in the Facebook group Iberia ADN.