A history of male migration in and out of the Green Sahara

Open access research highlight A history of male migration in and out of the Green Sahara, by Yali Xue, Genome Biology (2018) 19:30, on the recent paper by D’Atanasio et al.

Insights from the Green Saharan Y-chromosomal findings (emphasis mine):

It is widely accepted that sub-Saharan Y chromosomes are dominated by E-M2 lineages carried by Bantu-speaking farmers as they expanded from West Africa starting < 5 kya, reaching South Africa within recent centuries [4]. The E-M2-Bantu lineages lie phylogenetically within the E-M2-Green Sahara lineage and show at least three explosive lineage expansions beginning 4.9–5.3 kya [5] (Fig. 1a). These events of E-M2-Bantu expansion are slightly later than the R-V88 expansion, and highlight the range of male demographic changes in the mid-Holocene. North of the Sahara, in addition to the four trans-Saharan haplogroups, haplogroup E-M81 (which diverged from E-M78 ~ 13 kya) became very common in present-day populations as a result of another massive expansion ~ 2 kya [6] (Fig. 1a).

Simplified Y-chromosomal phylogeny and inferred past or observed present-day distribution of relevant Y-chromosomal lineages. a Calibrated phylogenetic tree of Y-chromosomal lineages discussed in the text. Green shading represents the period when the present-day Sahara Desert was green and fertile. Lineages represented by filled pentagons have undergone very rapid expansions. b The Green Sahara period 5–12 kya. Green shading indicates that the present-day Sahara Desert was green and fertile. The colors within the large oval represent the four Y-chromosomal haplogroups deduced to be present in the region at this time; specific locations are not implied. The arrows indicate the inferred origins of these haplogroups to the north or south, but specific origins and routes are not implied. c The present-day distributions of the four Green Saharan Y-chromosomal haplogroups. Yellow shading indicates the Sahara Desert. Each circle represents a sampled population, with the presence or absence of the four Green Saharan haplogroups shown by the colored sectors; other haplogroups may also be present in these populations, but are not shown. The small arrows indicate the inferred northwards and southwards movements of these haplogroups when the Sahara became uninhabitable.

Although Y chromosomes exist within populations and so share and reflect the general history of those populations, they can sometimes show some departures from other parts of the genome that result from differences in male and female behaviors. D’Atanasio et al. [1] highlight one such contrast in their study. Present-day North African populations show substantial sub-Saharan autosomal and mtDNA genetic components ascribed to the Roman and Arab slave trades 1–2 kya [7], but carry few sub-Saharan Y lineages from this source, probably reflecting the smaller numbers of male slaves and their reduced reproductive opportunities when compared to those of female slaves. The sub-Saharan Y chromosomes in these North African populations thus originate predominantly from the earlier Green Sahara period.

In this part of Africa, the indigenous languages that are spoken belong to three of the four African linguistic families (Afro-Asiatic, Nilo-Saharan and Niger-Congo). Interestingly, these languages show non-random associations with Y lineages. For example, Chadic languages within the Afro-Asiatic family are associated with haplogroup R-V88, whereas Nilo-Saharan languages are associated with specific sublineages within A3-M13 and E-M78, further illustrating the complex human history of the region.

The main question after D’Atanasio et al. (2018) is thus:

(…) what are the reasons for the very rapid R-V88 expansion 5–6 kya [1] and E-M81 expansion ~ 2 kya [6], and how do these expansions fit within general worldwide patterns of male-specific expansions, which in other cases have been linked to cultural and technological changes [5]?

I think that the only known haplogroup expansion that might currently fit the spread and dialectalization of Afroasiatic, a proto-language probably contemporaneous with or slightly older than Middle Proto-Indo-European, is that of R1b-V88 lineages. However, without ancient DNA samples to corroborate this, we cannot be sure.


Pleistocene North African genomes link Near Eastern and sub-Saharan African human populations


Pleistocene North African genomes link Near Eastern and sub-Saharan African human populations, by van de Loosdrecht et al. Science (2018).


North Africa is a key region for understanding human history, but the genetic history of its people is largely unknown. We present genomic data from seven 15,000-year-old modern humans from Morocco, attributed to the Iberomaurusian culture. We find a genetic affinity with early Holocene Near Easterners, best represented by Levantine Natufians, suggesting a pre-agricultural connection between Africa and the Near East. We do not find evidence for gene flow from Paleolithic Europeans into Late Pleistocene North Africans. The Taforalt individuals derive one third of their ancestry from sub-Saharan Africans, best approximated by a mixture of genetic components preserved in present-day West and East Africans. Thus, we provide direct evidence for genetic interactions between modern humans across Africa and Eurasia in the Pleistocene.


We analyzed the genetic affinities of the Taforalt individuals by performing principal component analysis (PCA) and model-based clustering of worldwide data (Fig. 2). When projected onto the top PCs of African and West Eurasian populations, the Taforalt individuals form a distinct cluster in an intermediate position between present-day North Africans (e.g., Amazighes (Berbers), Mozabite and Saharawi) and East Africans (e.g., Afar, Oromo and Somali) (Fig. 2A). Consistently, we find that all males with sufficient nuclear DNA preservation carry Y haplogroup E1b1b1a1 (M-78; table S16). This haplogroup occurs most frequently in present-day North and East African populations (18). The closely related E1b1b1b (M-123) haplogroup has been reported for Epipaleolithic Natufians and Pre-Pottery Neolithic Levantines (“Levant_N”) (16). Unsupervised genetic clustering also suggests a connection of Taforalt to the Near East. The three major components that comprise the Taforalt genomes are maximized in early Holocene Levantines, East African hunter-gatherer Hadza from north-central Tanzania, and West Africans (K = 10; Fig. 2B). In contrast, present-day North Africans have smaller sub-Saharan African components with minimal Hadza-related contribution (Fig. 2B).

(…) Taforalt harboring an ancestry that contains additional affinity with South, East and Central African outgroups. None of the present-day or ancient Holocene African groups serve as a good proxy for this unknown ancestry, because adding them as the third source is still insufficient to match the model to the Taforalt gene pool.

Mitochondrial consensus sequences of the Taforalt individuals belong to the U6a (n = 6) and M1b (n = 1) haplogroups (15), which are mostly confined to present-day populations in North and East Africa (7). U6 and M1 have been proposed as markers for autochthonous Maghreb ancestry, which might have been originally introduced into this region by a back-to-Africa migration from West Asia (6, 7). The occurrence of both haplogroups in the Taforalt individuals proves their pre-Holocene presence in the Maghreb.
(…) the diversification of haplogroup U6a and M1 found for Taforalt is dated to ~24,000 yBP (fig. S23), which is close in time to the earliest known appearance of the Iberomaurusian in Northwest Africa (25,845-25,270 cal. yBP at Tamar Hat (26)).

A summary of the genetic profile of the Taforalt individuals. (A) The top two PCs calculated from present-day African, Near Eastern and South European individuals from 72 populations. The Taforalt individuals are projected thereon (red-colored circles). Selected present-day populations are marked by colored symbols. Labels for other populations (marked by small grey circles) are provided in fig. S8. (B) ADMIXTURE results of chosen African and Middle Eastern populations (K = 10). Ancient individuals are labeled in red color. Major ancestry components in Taforalt are maximized in early Holocene Levantines (green), West Africans (purple) and East African Hadza (brown). The ancestry component prevalent in pre-Neolithic Europeans (beige) is absent in Taforalt.

The relationships of the Iberomaurusian culture with the preceding MSA, including the local backed bladelet technologies in Northeast Africa, and the Epigravettian in southern Europe have been questioned (13). The genetic profile of Taforalt suggests substantial Natufian-related and sub-Saharan African-related ancestries (63.5% and 36.5%, respectively), but not additional ancestry from Epigravettian or other Upper Paleolithic European populations. Therefore, we provide genomic evidence for a Late Pleistocene connection between North Africa and the Near East, predating the Neolithic transition by at least four millennia, while rejecting a potential Epigravettian gene flow from southern Europe into northern Africa within the resolution of our data.
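To get an intuition for what a two-source split like 63.5% / 36.5% means, here is a minimal sketch, assuming we simply model the target's allele frequencies as a linear mix of two source populations and fit the mixing weight by least squares. This is a deliberately simpler stand-in, not the formal admixture modeling used in the paper; the function name and the toy frequencies are hypothetical.

```python
def mixture_proportion(target, source_a, source_b):
    """Least-squares alpha in: target ~= alpha * A + (1 - alpha) * B.

    Each argument is a list of allele frequencies (0..1), one per SNP,
    aligned so index i is the same SNP in every list. Minimizing
    sum((t - b) - alpha * (a - b))**2 gives
    alpha = sum((t - b) * (a - b)) / sum((a - b)**2).
    """
    num = 0.0
    den = 0.0
    for t, a, b in zip(target, source_a, source_b):
        diff = a - b
        num += (t - b) * diff
        den += diff * diff
    if den == 0:
        raise ValueError("sources identical at every SNP; alpha is undefined")
    alpha = num / den
    return min(1.0, max(0.0, alpha))  # clamp to a valid proportion


if __name__ == "__main__":
    # Hypothetical toy frequencies, not data from the paper.
    natufian_like = [0.10, 0.50, 0.90, 0.30]
    west_african_like = [0.60, 0.20, 0.40, 0.80]
    # Build a target that is exactly 63.5% / 36.5% of the two sources.
    target = [0.635 * a + 0.365 * b for a, b in zip(natufian_like, west_african_like)]
    print(round(mixture_proportion(target, natufian_like, west_african_like), 3))  # 0.635
```

Real data are noisy, involve many more SNPs and more than two plausible sources, and need proper standard errors, which is why the published estimates rely on formal f-statistic machinery rather than a bare regression like this one.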

It seems that the Taforalt gene pool (ca. 13000-12000 BC) cannot be explained by a connection with Upper Palaeolithic Europeans, but rather by a more archaic admixture, so the authors cannot prove a migration through the Strait of Gibraltar or Sicily.

Nevertheless, these results apparently suggest:

  • That there was no contact through the Strait of Gibraltar before ca. 12000 BC; therefore, the Sicilian route I support for the migration of R1b-V88 lineages is still the most likely one.
  • That the North African connection with Natufians is quite old (as modern Y-DNA studies had already indicated), and therefore unlikely to be related to the Afroasiatic expansion.

I am glad I had some more time this week to read at least some interesting parts of the published papers, because the information to process is becoming insanely huge…


Iberian prehistoric migrations in Genomics from Neolithic, Chalcolithic, and Bronze Age


New open access paper Four millennia of Iberian biomolecular prehistory illustrate the impact of prehistoric migrations at the far end of Eurasia, by Valdiosera, Günther, Vera-Rodríguez, et al. PNAS (2018) published ahead of print.

Abstract (emphasis mine)

Population genomic studies of ancient human remains have shown how modern-day European population structure has been shaped by a number of prehistoric migrations. The Neolithization of Europe has been associated with large-scale migrations from Anatolia, which was followed by migrations of herders from the Pontic steppe at the onset of the Bronze Age. Southwestern Europe was one of the last parts of the continent reached by these migrations, and modern-day populations from this region show intriguing similarities to the initial Neolithic migrants. Partly due to climatic conditions that are unfavorable for DNA preservation, regional studies on the Mediterranean remain challenging. Here, we present genome-wide sequence data from 13 individuals combined with stable isotope analysis from the north and south of Iberia covering a four-millennial temporal transect (7,500–3,500 BP). Early Iberian farmers and Early Central European farmers exhibit significant genetic differences, suggesting two independent fronts of the Neolithic expansion. The first Neolithic migrants that arrived in Iberia had low levels of genetic diversity, potentially reflecting a small number of individuals; this diversity gradually increased over time from mixing with local hunter-gatherers and potential population expansion. The impact of post-Neolithic migrations on Iberia was much smaller than for the rest of the continent, showing little external influence from the Neolithic to the Bronze Age. Paleodietary reconstruction shows that these populations have a remarkable degree of dietary homogeneity across space and time, suggesting a strong reliance on terrestrial food resources despite changing culture and genetic make-up.

(A) f4 statistics testing affinities of prehistoric European farmers to either early Neolithic Iberians or central Europeans, restricting these reference populations to SNP-captured individuals to avoid technical artifacts driving the affinities. The boxplots in A show the distributions of all individual f4 statistics belonging to the respective groups. The signal is not sensitive to the choice of reference populations and is not driven by hunter-gatherer–related admixture (Datasets S4 and S5). (B) Estimates of ancestry proportions in different prehistoric Europeans as well as modern southwestern Europeans. Individuals from regions of Iberia were grouped together for the analysis in A and B to increase sample sizes per group and reduce noise.
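For readers unfamiliar with f4 statistics: in their simplest allele-frequency form, f4(A, B; C, D) is the average over SNPs of (pA − pB)(pC − pD), and a value away from zero signals that the populations do not fit the tree ((A, B), (C, D)). The sketch below assumes that plain definition; the population names and frequencies are hypothetical, and it omits the block-jackknife standard errors and the exact population ordering used in the paper.

```python
def f4(pa, pb, pc, pd):
    """Average of (pA - pB) * (pC - pD) over SNPs.

    Each argument is a list of allele frequencies (0..1) for one population,
    aligned so that index i refers to the same SNP in every list.
    """
    n = len(pa)
    if n == 0 or not (len(pb) == len(pc) == len(pd) == n):
        raise ValueError("need non-empty, equal-length frequency lists")
    total = 0.0
    for a, b, c, d in zip(pa, pb, pc, pd):
        total += (a - b) * (c - d)
    return total / n


if __name__ == "__main__":
    # Hypothetical toy frequencies at four SNPs (not data from the paper).
    outgroup = [0.50, 0.50, 0.50, 0.50]
    farmer_x = [0.80, 0.20, 0.70, 0.40]
    iberia_en = [0.75, 0.25, 0.65, 0.45]
    central_en = [0.40, 0.60, 0.30, 0.55]
    # With this ordering, a negative value means farmer_x's frequencies
    # track iberia_en more closely than central_en.
    print(f4(outgroup, farmer_x, iberia_en, central_en))
```

In this toy case farmer_x was built to resemble iberia_en, so the statistic comes out negative (about −0.07); swapping the last two populations flips its sign.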


We present a comprehensive biomolecular dataset spanning four millennia of prehistory across the whole Iberian Peninsula. Our results highlight the power of archaeogenomic studies focusing on specific regions and covering a temporal transect. The 4,000 y of prehistory in Iberia were shaped by major chronological changes but with little geographic substructure within the Peninsula. The subtle but clear genetic differences between early Neolithic Iberian farmers and early Neolithic central European farmers point toward two independent migrations, potentially originating from two slightly different source populations. These populations followed different routes, one along the Mediterranean coast, giving rise to early Neolithic Iberian farmers, and one via mainland Europe forming early Neolithic central European farmers. This directly links all Neolithic Iberians with the first migrants that arrived with the initial Mediterranean Neolithic wave of expansion. These Iberians mixed with local hunter-gatherers (but maintained farming/pastoral subsistence strategies, i.e., diet), leading to a recovery from the loss of genetic diversity emerging from the initial migration founder bottleneck. Only after the spread of Bell Beaker pottery did steppe-related ancestry arrive in Iberia, where it had smaller contributions to the population compared with the impact that it had in central Europe. This implies that the two prehistoric migrations causing major population turnovers in central Europe had differential effects at the southwestern edge of their distribution: The Neolithic migrations caused substantial changes in the Iberian gene pool (the introduction of agriculture by farmers) (6, 9, 11, 13, 24), whereas the impact of Bronze Age migrations (Yamnaya) was significantly smaller in Iberia than in north-central Europe (24). 
The post-Neolithic prehistory of Iberia is generally characterized by interactions between residents rather than by migrations from other parts of Europe, resulting in relative genetic continuity, while most other regions were subject to major genetic turnovers after the Neolithic (4, 6, 7, 9, 25, 48). Although Iberian populations represent the furthest wave of Neolithic expansion in the westernmost Mediterranean, the subsequent populations maintain a surprisingly high genetic legacy of the original pioneer farming migrants from the east compared with their central European counterparts. This counterintuitive result emphasizes the importance of in-depth diachronic studies in all parts of the continent.


Genomic analysis of Germanic tribes from Bavaria shows North-Central European ancestry


New open access paper Population genomic analysis of elongated skulls reveals extensive female-biased immigration in Early Medieval Bavaria, by Veeramah, Rott, Groß, et al. PNAS (2018), published ahead of print.

First, a bit of context on the Bavarii:

Europe experienced a profound cultural transformation between Late Antiquity and the Middle Ages that laid the foundations of the modern political, social, and religious landscape. During this period, colloquially known as the “Migration Period,” the Roman Empire gradually dissolved, with 5th and 6th century historiographers and contemporary witnesses describing the formation and migration of numerous Germanic peoples, such as the Goths, Alamanni, Gepids, and Longobards. However, the genetic and social composition of groups involved and the exact nature of these “migrations” are unclear and have been a subject of substantial historical and archaeological debate.

In the mid 6th century AD, the historiographer Jordanes and the poet and hagiographer Venantius Fortunatus provide the first mention of a group known as the Baiuvarii that resided in modern day Bavaria. It is likely that this group had already started to form in the 5th century AD, and that it emanated from a combination of the romanized local population of the border province of the former Roman Empire and immigrants from north of the Danube (2). While the Baiuvarii are less well known than some other contemporary groups, an interesting archaeological feature in Bavaria from this period is the presence of skeletons with artificially deformed or elongated skulls.

Procrustes-transformed PCA of ancient samples using pseudohaploid calls based on off-target reads using an imputed POPRES modern reference dataset. Blue, green, and red male or female symbols are ancient Bavarian individuals with normal, intermediate, and elongated skulls, respectively. Orange circles are Anglo-Saxon era individuals. Large circles are medians for regions, dots are individuals. CE, central Europe; EE, eastern Europe; NE, northern Europe; NEE, northeastern Europe; NEW, northwestern Europe; SE, southern Europe; SEE, southeast Europe; WE, western Europe. Percentage of variation explained by PCs 1 and 2 for modern populations only is 0.25% and 0.15%.

Abstract (emphasis mine):

Modern European genetic structure demonstrates strong correlations with geography, while genetic analysis of prehistoric humans has indicated at least two major waves of immigration from outside the continent during periods of cultural change. However, population-level genome data that could shed light on the demographic processes occurring during the intervening periods have been absent. Therefore, we generated genomic data from 41 individuals dating mostly to the late 5th/early 6th century AD from present-day Bavaria in southern Germany, including 11 whole genomes (mean depth 5.56×). In addition we developed a capture array to sequence neutral regions spanning a total of 5 Mb and 486 functional polymorphic sites to high depth (mean 72×) in all individuals. Our data indicate that while men generally had ancestry that closely resembles modern northern and central Europeans, women exhibit a very high genetic heterogeneity; this includes signals of genetic ancestry ranging from western Europe to East Asia. Particularly striking are women with artificial skull deformations; the analysis of their collective genetic ancestry suggests an origin in southeastern Europe. In addition, functional variants indicate that they also differed in visible characteristics. This example of female-biased migration indicates that complex demographic processes during the Early Medieval period may have contributed in an unexpected way to shape the modern European genetic landscape. Examination of the panel of functional loci also revealed that many alleles associated with recent positive selection were already at modern-like frequencies in European populations ∼1,500 years ago.

Supervised model-based clustering ADMIXTURE analysis for ancient samples based on phased haplotypes for individual 1,000 bp loci from the 5-Mb neutralome. Analysis is based on the best of 100 runs for K = 8, but NC_EUR is the ancestry summed across 1000 Genomes CEU, 1000 Genomes GBR, and GoNL populations (i.e., it represents a northern/central European ancestry). Blue, green, and red male or female symbols are ancient Bavarian individuals with normal, intermediate, and elongated skulls, respectively.

There are still no Y-DNA data to further confirm the North-Central European origin of certain modern subclades found in Central and South-Central Europe.

The potential Ostrogothic sample from Crimea was probably Hunnic, as the paper itself suggests, and both Ostrogoths and Gepids are known to have been allies of the Huns for a long time. It is also a well-known fact that East Germanic tribes migrated south- and eastward through eastern Europe, and then from the steppe westward.

Obviously, the PCA positions of a late Gepid sample (after a certain number of generations and admixture events with ‘local’ populations during the migrations) and of a Crimean sample without a clear cultural identification are of limited value until more samples are available.

Hence, sadly, there are no valid data yet to add to the debate on the nature of East Germanic, which mainly concerns its traditionally described origin in Scandinavia (i.e. close to North Germanic dialects) as opposed to a different origin (and dialectal branch) within Proto-Germanic territory.

NOTE. Just to be clear for future papers on Germanic tribes, I would expect East Germanic males to show either:
a) mainly R1b-U106, I1, and R1a-Z645 subclades, and to cluster closely to samples of Scandinavia during Antiquity, which would support a Scandinavian origin – a predominance of typically Scandinavian R1a-Z284 subclades would be more indicative of this origin, of course;
b) or mainly R1b-U106, R1b-P312, and I1 subclades and a PCA cluster close to West Germanic tribes, which would challenge its traditional dialectal identification.

I do agree with the authors, though, that a few samples can describe certain migratory events, such as the emphasized female-biased long-distance migration in Bavaria, as well as the diverse ancestry of women versus men.


Patterns of genetic differentiation and the footprints of historical migrations in the Iberian Peninsula

Open access preprint (which I announced already) at bioRxiv Patterns of genetic differentiation and the footprints of historical migrations in the Iberian Peninsula, by Bycroft et al. (2018).

Abstract (emphasis mine):

Genetic differences within or between human populations (population structure) has been studied using a variety of approaches over many years. Recently there has been an increasing focus on studying genetic differentiation at fine geographic scales, such as within countries. Identifying such structure allows the study of recent population history, and identifies the potential for confounding in association studies, particularly when testing rare, often recently arisen variants. The Iberian Peninsula is linguistically diverse, has a complex demographic history, and is unique among European regions in having a centuries-long period of Muslim rule. Previous genetic studies of Spain have examined either a small fraction of the genome or only a few Spanish regions. Thus, the overall pattern of fine-scale population structure within Spain remains uncharacterised. Here we analyse genome-wide genotyping array data for 1,413 Spanish individuals sampled from all regions of Spain. We identify extensive fine-scale structure, down to unprecedented scales, smaller than 10 Km in some places. We observe a major axis of genetic differentiation that runs from east to west of the peninsula. In contrast, we observe remarkable genetic similarity in the north-south direction, and evidence of historical north-south population movement. Finally, without making particular prior assumptions about source populations, we show that modern Spanish people have regionally varying fractions of ancestry from a group most similar to modern north Moroccans. The north African ancestry results from an admixture event, which we date to 860 – 1120 CE, corresponding to the early half of Muslim rule. Our results indicate that it is possible to discern clear genetic impacts of the Muslim conquest and population movements associated with the subsequent Reconquista.

“(a) Binary tree showing the inferred hierarchical relationships between clusters. The colours and points correspond to each cluster as shown on the map, and the length of the coloured rectangles is proportional to the number of individuals assigned to that cluster. We combined some small clusters (Methods) and the thick black branches indicate the clades of the tree that we visualise in the map. We have labeled clusters according to the approximate location of most of their members, but geographic data was not used in the inference. (b) Each individual is represented by a point placed at (or close to) the centroid of their grandparents’ birthplaces. On this map we only show the individuals for whom all four grandparents were born within 80km of their average birthplace, although the data for all individuals were used in the fineSTRUCTURE inference. The background is coloured according to the spatial densities of each cluster at the level of the tree where there are 14 clusters (see Methods). The colour and symbol of each point corresponds to the cluster the individual was assigned to at a lower level of the tree, as shown in (a). The labels and boundaries of Spain’s Autonomous Communities are also shown.”
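The map filter in (b) (keep only individuals whose four grandparents were all born within 80 km of their average birthplace) can be sketched as follows. This assumes the "average birthplace" is a plain mean of latitudes and longitudes, which is adequate at the scale of Spain, and uses great-circle (haversine) distance; the paper's exact centroid and distance computations are not described in the caption, and the function names and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def grandparent_centroid(birthplaces):
    """Assumed centroid: plain mean of latitudes and of longitudes."""
    lats = [lat for lat, _ in birthplaces]
    lons = [lon for _, lon in birthplaces]
    return sum(lats) / len(lats), sum(lons) / len(lons)


def shown_on_map(birthplaces, max_km=80.0):
    """True if every grandparent was born within max_km of the centroid."""
    clat, clon = grandparent_centroid(birthplaces)
    return all(haversine_km(lat, lon, clat, clon) <= max_km for lat, lon in birthplaces)


if __name__ == "__main__":
    # Hypothetical grandparents' birthplaces (latitude, longitude).
    local = [(42.1, -8.0), (41.9, -8.0), (42.0, -8.1), (42.0, -7.9)]
    mixed = [(42.88, -8.54), (42.88, -8.54), (40.42, -3.70), (40.42, -3.70)]
    print(shown_on_map(local), shown_on_map(mixed))  # True False
```

Note that, as the caption says, the filter only affects what is plotted: all individuals were still used in the fineSTRUCTURE inference.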

Some interesting excerpts:

Our results further imply that north west African-like DNA predominated in the migration. Moreover, admixture mainly, and perhaps almost exclusively, occurred within the earlier half of the period of Muslim rule. Within Spain, north African ancestry occurs in all groups, although levels are low in the Basque region and in a region corresponding closely to the 14th-century ‘Crown of Aragon’. Therefore, although genetically distinct this implies that the Basques have not been completely isolated from the rest of Spain over the past 1300 years.

NOTE. I must add here that the expulsion of the Moriscos is known to have been quite effective in the old Crown of Aragon, where it deeply affected the economy, in contrast with other territories of the Crown of Castile, where Moriscos either formed less sizeable communities or were dispersed and eventually christened and integrated into local communities. For example, thousands of Moriscos from Granada were dispersed into different regions of the Crown of Castile following the War of the Alpujarras (1567–1571), and many could not later be expelled because locals resisted enforcing the expulsion edict.

Perhaps surprisingly, north African ancestry does not reflect proximity to north Africa, or even regions under more extended Muslim control. The highest amounts of north African ancestry found within Iberia are in the west (11%) including in Galicia, despite the fact that the region of Galicia as it is defined today (north of the Miño river), was never under Muslim rule and Berber settlements north of the Douro river were abandoned by. This observation is consistent with previous work using Y-chromosome data. We speculate that the pattern we see is driven by later internal migratory flows, such as between Portugal and Galicia, and this would also explain why Galicia and Portugal show indistinguishable ancestry sharing with non-Spanish groups more generally. Alternatively, it might be that these patterns reflect regional differences in patterns of settlement and integration with local peoples of north African immigrants themselves, or varying extents of the large-scale expulsion of Muslim people, which occurred post-Reconquista and especially in towns and cities.

We estimated ancestry profiles for each point on a fine spatial grid across Spain (Methods). Gray crosses show the locations of sampled individuals used in the estimation. Map shows the fraction contributed from the donor group ‘NorthMorocco’.

Overall, the pattern of genetic differentiation we observe in Spain reflects the linguistic and geopolitical boundaries present around the end of the time of Muslim rule in Spain, suggesting this period has had a significant and long-term impact on the genetic structure observed in modern Spain, over 500 years later. In the case of the UK, similar geopolitical correspondence was seen, but to a different period in the past (around 600 CE). Noticeably, in these two cases, country-specific historical events rather than geographic barriers seem to drive overall patterns of population structure. The observation that fine-scale structure evolves at different rates in different places could be explained if observed patterns tend to reflect those at the ends of periods of significant past upheaval, such as the end of Muslim rule in Spain, and the end of the Anglo-Saxon and Danish Viking invasions in the UK.

Certain people want to believe (well into the 21st century) in ideal ancestral populations and ancient ethnolinguistic identifications linked to their own ancestral components and Y-DNA haplogroup, or to those dominant in their own country.

We are nevertheless seeing how mainly the most recent relevant geopolitical events and late internal migratory flows have shaped the genetic structure (including the Y-DNA haplogroup composition) of modern regions and countries, regardless of their populations’ actual language or ethnic identification, whether (pre)historical or modern.

Another surprise for many, I guess.


Population substructure in Iberia, highest in the north-west territory (to appear in Nature)

A manuscript co-authored by Angel Carracedo, from the University of Santiago de Compostela, and (always according to him) pre-accepted in Nature, will offer more insight into the population substructure of Spain, based on autosomal DNA.

Carracedo’s lecture about DNA (in Galician), including his summary of the paper (from December 2017):

Some of the points made in the video:

  • The study shows a situation parallelling – as expected – the expansion of Spanish Medieval kingdoms during the Reconquista (and subsequent repopulation).
  • In it, the biggest surprise seems to be the greater substructure found in Galicia, the north-western Spanish territory – greater even than expected by the authors.
  • As a side note, Galicia shows a great influence from “Moorish” ancestral components, due mainly to the influx from Portugal, which shows even more of them.

It is difficult to judge only from the image and his words, but one could say that there are:

  • Certain quite old ancestral Galician groups;
    • then two – also quite old – ancestral Basque groups;
      • then more recent Galician groups;
        • and then a common, central Spanish group – including
          • a wider Asturian-Catalan group, with a western Asturian-Leonese, and an eastern Catalan subgroup;
          • and a central Castilian-Aragonese group, also with a western Castilian, and an eastern Aragonese subgroup.
Spain’s population substructure, from the video.

We thought that certain parts of the British Isles could show ancestral components related to the older population, although this has not proven exactly right, due to more recent population expansions.

However, this paper might shed light on the controversy surrounding Lusitanian (possibly Gallaico-Lusitanian) as a Pre-Celtic Indo-European group of Iberia, either slightly older as an Italo-Celtic dialect or potentially derived from the Bell Beaker expansion, whose genetic imprint might have survived the Roman conquest, which apparently did not replace its ancestral population.

Given the presence of a central Spanish group opposed to the other minor groups, and knowing that (at least part of) the Medieval kingdoms should be related to the Occitan region (due to the Celtic expansion, and potentially also later, during the Visigothic Kingdom and the Carolingian Empire), we can only guess that the other (north-western and Basque) groups are potentially quite old and reflect prehistoric population structures.

Just speculating here, of course. Another interesting genetic paper to await…

Seen first in the Facebook group Iberia ADN.


Olalde et al. and Mathieson et al. (Nature 2018): R1b-L23 dominates Bell Beaker and Yamna, R1a-M417 resurges in East-Central Europe during the Bronze Age

The official papers Olalde et al. (Nature 2018) and Mathieson et al. (Nature 2018) have appeared. They are based on the 2017 preprints at BioRxiv The Beaker Phenomenon And The Genomic Transformation Of Northwest Europe and The Genomic History Of Southeastern Europe respectively, but with a sizeable number of new samples.

Papers are behind a paywall, but here are the authors’ shareable links to read the papers and supplementary materials: Olalde et al. (2018), Mathieson et al. (2018).

NOTE: The corresponding datasets have been added to the Reich Lab website. Remember you can use my drafts on DIY Human Ancestry analysis (viz. Plink/Eigensoft, PCA, or ADMIXTURE) to investigate the data further on your own computer.
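For readers new to the PCA step mentioned in the note, here is a toy Python sketch of what a tool like EIGENSOFT's smartpca does conceptually: normalize each SNP, build an individual-by-individual covariance matrix, and extract the leading principal component. It is a minimal sketch under stated assumptions (genotypes coded 0/1/2, no missing data, a simplified per-SNP normalization, and plain power iteration instead of a full eigendecomposition); the function names and the four-individual matrix are hypothetical, and real analyses should use Plink/Eigensoft on the actual datasets.

```python
import math
import random

# Hypothetical toy data: 4 individuals x 5 SNPs, genotypes coded 0/1/2.
# Individuals 0-1 and 2-3 are meant to form two "populations".
TOY = [
    [2, 2, 0, 0, 0],
    [2, 1, 0, 0, 0],
    [0, 0, 2, 2, 0],
    [0, 1, 2, 1, 0],
]


def normalize(genotypes):
    """Center each SNP and scale it by sqrt(p(1-p)); drop monomorphic SNPs."""
    n_ind = len(genotypes)
    kept = []
    for j in range(len(genotypes[0])):
        column = [row[j] for row in genotypes]
        mean = sum(column) / n_ind
        p = mean / 2  # frequency of the counted allele
        if p in (0.0, 1.0):
            continue  # monomorphic SNP carries no information
        scale = math.sqrt(p * (1 - p))
        kept.append([(g - mean) / scale for g in column])
    # transpose back to individuals x kept SNPs
    return [[col[i] for col in kept] for i in range(n_ind)]


def covariance(matrix):
    """Individual-by-individual covariance matrix, averaged over SNPs."""
    n_snp = len(matrix[0])
    return [
        [sum(a * b for a, b in zip(row_i, row_j)) / n_snp for row_j in matrix]
        for row_i in matrix
    ]


def top_component(matrix, iterations=200, seed=0):
    """Leading eigenvector via power iteration (its sign is arbitrary)."""
    rng = random.Random(seed)
    vec = [rng.random() for _ in matrix]
    for _ in range(iterations):
        new = [sum(m * v for m, v in zip(row, vec)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in new))
        vec = [x / norm for x in new]
    return vec


pc1 = top_component(covariance(normalize(TOY)))
print([round(x, 3) for x in pc1])
```

Individuals from the same toy "population" should share the sign of their PC1 score; because the sign of an eigenvector is arbitrary, only the relative signs are meaningful.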

Image modified by me, from Olalde et al. (2018). PCA of 999 Eurasian individuals. Marked is the late CWC outlier sample from Esperstedt, showing how early East Bell Beaker samples are the closest to Yamna samples.

I don’t have time to analyze the samples in detail right now, but in short they seem to convey the same information as before: in Olalde et al. (2018) the pattern of Y-DNA haplogroup and steppe ancestry distribution is overwhelming, with an all-R1b-L23 Bell Beaker people accompanying steppe ancestry into western Europe.

EDIT: In Mathieson et al. (2018), a sample classified as Ukraine_Eneolithic from Dereivka, dated ca. 2890-2696 BC, is of subclade R1b1a1a2a2-Z2103, so Western Yamna during the migrations was also of R1b-L23 subclades, in contrast with the previous R1a lineages in Ukraine. In Olalde et al. (2018), it is clearly stated that, of the four BB individuals with higher steppe ancestry, the two with higher coverage could be classified as of R1b-S116/P312 subclades.

This is compatible with the expansion of Indo-European-speaking Yamna migrants (also mainly of R1b-L23 subclades) into the East Bell Beaker group, as described in detail in Archaeology, and as first predicted (including the population movement we are now seeing) by Volker Heyd in 2007.

Yamna – East Bell Beaker migration 3000-2300 BC. Adapted from Harrison and Heyd (2007), Heyd (2007)

Also, the resurgence of R1a-Z645 subclades (from previous Corded Ware migrants) in Czech and Polish lands, accompanying other lineages indigenous to the region, seems to have happened only after the Bell Beaker expansion into these territories, during the Bronze Age, probably leading to the formation of the Balto-Slavic community, as I predicted based on previous papers. The appearance of a sample of R1b-U106 subclade in this territory is interesting from the point of view of a shared substrate with Germanic, as is the earlier BB sample of R1b-Z2103 for its connection with Graeco-Aryan dialects.

All this suggests that a North-West Indo-European dialect (ancestor of Italo-Celtic, Germanic, and Balto-Slavic), supported in Linguistics by most modern Indo-European schools of thought, expanded roughly along the Danube, and later to northern, eastern, and western Europe with the Bell Beaker expansion, as supported in Anthropology by Mallory (in Celtic from the West 2, 2013), and by Prescott (since 1995) for the development of a Nordic or Pre-Germanic language in Scandinavia.

Diachronic map of Late Copper Age migrations including Classical Bell Beaker (east group) expansion from central Europe ca. 2600-2250 BC

Maybe more importantly, the fact that only Indo-Iranian-speaking Sintashta-Petrovka (and later Andronovo) cultures were clearly associated with R1a-Z645 subclades, and rather late (after mixing with early Chalcolithic North Caspian steppe groups, mainly East Yamna and Poltavka herders of R1b-L23 subclades), supports the theory that Corded Ware (and probably the earlier Sredni Stog) groups did not speak or spread Indo-European languages with their migration, but most likely Uralic (compatible with the Corded Ware substrate hypothesis, and with recent papers on the much later arrival of haplogroup N1c), and that they adopted Indo-Iranian by way of cultural diffusion or founder effect events.

As Sheldon Cooper would say,

Under normal circumstances, I’d say “I told you so.” But, as I have told you so with such vehemence and frequency already, the phrase has lost all meaning. Therefore, I will be replacing it with the phrase “I informed you thusly.”

I informed you thusly:

Germanic tribes during the Barbarian migrations show mainly R1b, also I lineages


New preprint at BioRxiv, Understanding 6th-Century Barbarian Social Organization and Migration through Paleogenomics, by Amorim, Vai, Posth, et al. (2018)

Abstract (emphasis mine):

Despite centuries of research, much about the barbarian migrations that took place between the fourth and sixth centuries in Europe remains hotly debated. To better understand this key era that marks the dawn of modern European societies, we obtained ancient genomic DNA from 63 samples from two cemeteries (from Hungary and Northern Italy) that have been previously associated with the Longobards, a barbarian people that ruled large parts of Italy for over 200 years after invading from Pannonia in 568 CE. Our dense cemetery-based sampling revealed that each cemetery was primarily organized around one large pedigree, suggesting that biological relationships played an important role in these early Medieval societies. Moreover, we identified genetic structure in each cemetery involving at least two groups with different ancestry that were very distinct in terms of their funerary customs. Finally, our data was consistent with the proposed long-distance migration from Pannonia to Northern Italy.

Interesting excerpts:

Since the adults were almost all non-local, it is tempting to suggest that we may be observing the historically described fara during migration. Regardless, this group appears to be a unit organized around one high-status, kin-based group of predominantly males, but also incorporating other males that may have some common central/northern European descent. The relative lack of adult female representatives from Kindred SZ1, the diverse genetic and isotope signatures of the sampled women around the males and their rich graves goods suggests that they may have been acquired and incorporated into the unit during the process of migration (perhaps hinting at a patrilocal societal structure that has been shown to be prominent in Europe during earlier periods).

The remaining part of this community for which we have genomic data (N=7) is composed of individuals of mainly southern European genetic ancestry that are conspicuously lacking grave goods and occupy the southeastern part of the cemetery, with randomly oriented graves with straight walls. While the lack of grave goods does not necessarily imply that these individuals were of lower status, it does point to them belonging to a different social group. Interestingly, the strontium isotope data suggest that they may have migrated together with the warrior-based group from outside Szólád, but barriers to gene flow were largely been maintained.

Genetic structure of Szólád and Collegno. (A) Procrustes Principal Component Analysis of modern and ancient European population (faded small dots are individuals, larger circle is median of individuals) along with samples from Szólád (filled circles), Collegno (filled stars), Bronze Age SZ1 (filled grey circle), second period CL36 (grey star), two Avar-period samples from Szólád (yellow circles), Anglo-Saxon period UK (orange circles) and 6th Century Bavaria (green circles). Szólád and Collegno samples are filled with colors based on estimated ancestry from ADMIXTURE. Blue circles with thick black edge = Kindred_SZ1 , blue stars with thick black edge = Kindred_CL1 , stars with thick green edge = Kindred_CL2 . NWE = northwest Europe, NE = modern north Europe, NEE = modern northeast Europe, CE =central Europe, EE = eastern Europe, WE =western Europe, SE = southern Europe, SEE = southeast Europe, HUN = modern Hungarian, HBr = Hungarian Bronze Age, Br = central, northern and eastern Europe Bronze age. Model-based ancestry estimates from Admixture for Szólád (B) and Collegno (C) using 1000 Genomes Project Eurasian and YRI populations to supervise analysis. Note that high contamination was identified in CL31 and is shown with a triangle in the (A) and overlaid with a pink hue in the (C).

Evidence for Migrating Barbarians and “Longobards”

Our two cemeteries overlap chronologically with the historically documented migration of Longobards from Pannonia to Italy at the end of the 6th century. It is thus intriguing that we observe that central/northern European ancestry is dominant not only in Szólád, but also in Collegno. Based on modern genetic data we would not expect to see a preponderance of such ancestry in either Hungary or especially Northern Italy. While we do not yet know the general genomic background of Europe in these geographic regions just before the establishment of Szólád and Collegno, other Migration Period genomes from the UK and Germany show a fairly strong correlation with modern geography (while also possessing a similar central/northern European ancestry component to that found in Szólád and Collegno). Going further back in time, Late Bronze Age Hungarians show almost no resemblance to populations from modern central/northern Europe, especially compare to Bronze Age Germans and in particular Scandinavians, who, in contrast, show considerable overlap with our Szólád and Collegno central/northern ancestry samples. Coupled with the strontium isotope data, our paleogenomic analysis suggest that the earliest individuals of central/northern ancestry in Collegno were probably migrants while those with southern ancestry were local residents. Our results are thus consistent with an origin of barbarian groups such as the Longobards somewhere in Northern and Central Europe east of the Rhine and north of the Danube. Thus our results cannot reject the migration, its route, and settlement of “the Longobards” described in historical texts.

We note however that whether these people identified as “Longobard” or any other particular barbarian people is impossible to assess. Modern European genetic variation is generally highly structured by geography 22,32 , even at the level of individual villages 33 . It is, therefore, surprising to find significant diversity, even amongst individuals with central/northern ancestry, within small, individual Langobard cemeteries. Even amongst the two family groups of primarily central/northern ancestry, who may have formed the heart of such migration, there is clear evidence of admixture with individuals with more southern ancestry. If we are seeing evidence of movements of barbarians, there is no evidence that these were genetically homogenous groups of people.

From the supplementary material:

The haplogroups detected in the samples show a prevalence of R1b (55.3%), which is the most common sub-haplogroup in western Europe, with a peak in the Iberian Peninsula and in the British islands and a west-east gradient in central Europe. A consistent percentage of haplotypes belongs to the I haplogroup (26.4%), both in the I1a and, more abundantly, in I2a2 sub-haplogroups. They are particularly frequent in the northern Balkans with a westward gradient in central and western Europe, with some lineages belonging to I2a2a1b particularly common in the Germanic region.

Relative and absolute haplogroup frequencies: COL = Collegno; SZO = Szólád; CEU = Central European from Utah; FIN = Finnish; GBR = Britons; IBS = Iberians; SAR = Sardinians; TSI = Tuscans


Y chromosome C2*-star cluster traces back to ordinary Mongols, rather than Genghis Khan


Article behind paywall, Whole-sequence analysis indicates that the Y chromosome C2*-Star Cluster traces back to ordinary Mongols, rather than Genghis Khan, by Wei, Yan, Lu, et al. Eur J Hum Genet (2018); 26:230–237


The Y-chromosome haplogroup C3*-Star Cluster (revised to C2*-ST in this study) was proposed to be the Y-profile of Genghis Khan. Here, we re-examined the origin of C2*-ST and its associations with Genghis Khan and Mongol populations. We analyzed 34 Y-chromosome sequences of haplogroup C2*-ST and its most closely related lineage. We redefined this paternal lineage as C2b1a3a1-F3796 and generated a highly revised phylogenetic tree of the haplogroup, including 36 sub-lineages and 265 non-private Y-chromosome variants. We performed a comprehensive analysis and age estimation of this lineage in eastern Eurasia, including 18,210 individuals from 292 populations. We discovered that the origin of populations with high frequencies of C2*-ST can be traced to either an ancient Niru’un Mongol clan or ordinary Mongol tribes. Importantly, the age of the most recent common ancestor of C2*-ST (2576 years, 95% CI = 1975–3178) and its sub-lineages, and their expansion patterns, are consistent with the diffusion of all Mongolic-speaking populations, rather than Genghis Khan himself or his close male relatives. We concluded that haplogroup C2*-ST is one of the founder paternal lineages of all Mongolic-speaking populations, and direct evidence of an association between C2*-ST and Genghis Khan has yet to be discovered.
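The abstract gives an age of 2576 years (95% CI 1975–3178) for the most recent common ancestor of C2*-ST, a "star cluster" whose lineages radiate from a single ancestor. One classic way to date such a star-like cluster is the rho statistic: the mean number of mutations separating each sampled lineage from the common ancestor, divided by the mutation rate. The Python sketch below assumes a perfect star phylogeny and a normal-approximation interval; it is not necessarily how Wei et al. obtained their estimate, and the function name, mutation counts, and mutation rate are hypothetical placeholders.

```python
import math


def rho_tmrca(mutation_counts, mutations_per_year):
    """Rho-style TMRCA estimate for a star-like cluster.

    mutation_counts: derived mutations observed on each sampled lineage
        since the cluster's most recent common ancestor.
    mutations_per_year: expected mutations per lineage per year over the
        sequenced region (per-bp rate x callable bp); an input assumption.
    Returns (age_in_years, (lower_95, upper_95)).
    """
    n = len(mutation_counts)
    rho = sum(mutation_counts) / n
    # In a perfect star phylogeny each lineage accumulates mutations
    # independently (Poisson), so Var(rho) = rho / n.
    se_rho = math.sqrt(rho / n)
    age = rho / mutations_per_year
    half_width = 1.96 * se_rho / mutations_per_year  # normal approximation
    return age, (age - half_width, age + half_width)


# Hypothetical counts for 10 lineages and a hypothetical rate of
# 8e-10 mutations/bp/year over 10 Mb of callable sequence.
counts = [20, 22, 18, 21, 19, 20, 23, 17, 20, 20]
rate = 8e-10 * 10_000_000
age, (low, high) = rho_tmrca(counts, rate)
print(f"TMRCA ~ {age:.0f} years (95% CI {low:.0f}-{high:.0f})")
```

The star assumption matters: if the lineages share internal branches, they are not independent, and the true uncertainty is larger than this simple interval suggests.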

This is a great example of the potential mistake one can make when identifying the leading clans of population expansions from the perspective of the renowned case of the Uí Néill clan’s expansion in Ireland.

Just a few days ago, I wrote about the first Hungarian dynasty’s haplogroup R1a, and the potential association of other Ugric-speaking clans with R1a subclades, so let’s wait and see whether future papers on other ancient Hungarian clans and Hungarian settlers bring surprises…


mtDNA suggests original East Germanic population linked to Jutland Iron Age and Bell Beaker


Open Access article A mosaic genetic structure of the human population living in the South Baltic region during the Iron Age, by Stolarek et al., at Scientific Reports 8:2455 (2018).

About the site:

Kowalewko is a village in Wielkopolskie vojevodship, close to Poznan, in the middle reaches of the Samica Kierska river. Biritual Roman Age cemetery (site 12), dated from the mid-1st to the beginning of 3rd century AD, is located in the featureless arable fields at the South and West of the village

About the Wielbark culture:

Chronology spans almost all the Roman Iron Age, since ca. 20 AD to ca. 450 AD. The Wielbark culture is associated with the Goths and Gepids, who migrated from Scandinavia towards the Black Sea, and their successors, who, after several centuries, returned to the lands formerly occupied by their ancestors. Typical features of the culture include inhumation graves rich in goods of numerous ornaments frequently of noble metals, while no implements and weapons have been observed and iron objects very rarely. Less frequent cremations. Barrows recorded within cemeteries reflect emergence of elites. The Wielbark communities built stone constructions, including pebbled floors and circles. This culture is mainly known from cemeteries, as settlements, not fortified, are less recognized.

Location of Kowalewko and a scheme of the Kowalewko cemetery site 12, based on the Fig. 3 from the monograph by Tomasz Skorupka, Kowalewko 12. Biritual cemetery of a population of the Wielbark Culture (mid 1st to beginning of 3rd century AD), published in: Marek Chlodnicki [ed.], Archaeological rescue investigations along the gas transit pipeline, vol. II – Wielkopolska, part 3, Poznan 2001, generated using Corel Draw ver. 12.0, with the author permission. Sampled graves are marked with a red color. Europe and Poland maps were downloaded from Wikimedia Commons (https://commons.wikimedia.org), under the free licence, and modified with Corel Draw ver. 12.0.

Interesting excerpts with emphasis added (and some stylistic changes for abbreviations):

Analysis of genetic distances (see Fig. 2b) showed that both Jutland Iron Age (JIA) and Kowalewko (Kow-OVIA), are the closest to the Central Europe Metapopulation (CEM). However, it should be mentioned that many of the resulting genetic proximities did not reach statistical significance at the alpha level 0.05 (mainly due to the multiple comparisons), thus they should be interpreted with caution. Higher prevalence of the mtDNA haplogroup H in Kowalewko and Jutland Iron Age (its high level is also characteristic for the Bell Beaker Culture) than in the preceding Corded Ware Culture (CWC) and Unetic Culture (UC) supports the hypothesis assuming significant demographic changes in Central Europe after the LN/EBA period. This hypothesis is additionally strengthened by the results of AMOVA analysis indicating that there is some inconsistency between genetic distances and the chronology of the appearance of the studied populations in Central Europe, i.e., the older populations (BBC, CWC) contributed more to the genetic structure of CEM than the younger ones (UC).

Changes in the occurrence of mtDNA haplogroups U5a/U5b in Central Europe are also worth noting. At LN and EBA, the prevailing haplogroup was U5a for BBC/CWC/UC. Next, there was a dominance of U5b for the Kow-OVIA/JIA during IA and now U5a is again more popular (CEM). The first alteration in the U5a/U5b prevalence between the LN/EBA and the IA supports the hypothesis of demographic changes right after the LN, proposed by Brandt et al (2013). The second conversion indicated by our results suggests another crucial demographic event that should occur between the IA and present.

On the basis of the above observations, one may assume that in the IA, specific genetic substructures were formed in Central Europe. Because the demographic history of fossil populations often has a local character33,34, it is worth considering the range of the observed changes. These considerations should also take into account the hypothesis on the migrations that most likely occurred between the 3rd and 6th century AD. In this context, it seems necessary to compare Kow-OVIA and JIA with other populations from the IA, in particular those located east of Vistula, and with the populations that inhabited this region during the Middle Ages.

PCA2 vs. PCA3 on the haplogroup frequencies of ‘European Population Transect’ populations

Finally, we found that the genetic structures of female and male subpopulations of Kow-OVIA were significantly different. This fact cannot be explicitly determined based on the results of individual analyses; however, it is quite evident if one considers the whole set of data presented here including the Fisher test on haplogroup frequencies. The analyses of both mtDNA haplogroups and genetic distances indicated that women from Kowalewko were related closer to the EN/MN populations, and the men were closer to the CWC and UC. This observation may explain why the genetic relationships of Kow-OVIA with other ancient European populations were more complex and more difficult to define as it was in the case of JIA. In analyzing Kow-OVIA, we observed multiple overlapping effects of two subpopulations with different genetic affinities. One would speculate that the genetic profile of Kow-OVIA-F resulted from exogamy that was described for the CWC population. This is, however, not the case. We found that the genetic differences between women and men were maintained for the entire observation period, i.e., for 200 years (approximately 8 generations). Such a composition of the genetic structure of Kow-OVIA could exist only if at least one subgroup (Kow-OVIA-F or -M) was periodically exchanged. It would further mean that Kowalewko played some specific roles in that region. According to the recent archaeological studies, the colonization pattern in IA Greater Poland could be linked with the existence of a centralized organization system32. Kowalewko could have been one of the important elements of this system. For example, it could have functioned as a garrison for the population closely associated with the JIA, such that warriors stayed in the garrison for only a few years and were then replaced by others. Other scenarios are also possible; however, verification of any hypothesis requires more detailed studies.
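The excerpt mentions a Fisher test on haplogroup frequencies between the women and men of Kowalewko. As a hedged illustration only, here is a minimal Python sketch of the simplest case, a two-sided Fisher exact test on a 2x2 table (for instance, one haplogroup versus all others, by sex). The authors may well have used a larger contingency table or a different implementation; the function name is hypothetical, and the counts in the usage line are invented for illustration, not taken from the paper.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Rows could be women/men and columns "haplogroup X" / "other haplogroups".
    Returns the total hypergeometric probability of all tables with the same
    margins that are no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def table_prob(x):
        # probability that the top-left cell equals x, given fixed margins
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = table_prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # small relative tolerance so floating-point ties are counted
    p = sum(
        table_prob(x)
        for x in range(lo, hi + 1)
        if table_prob(x) <= observed * (1 + 1e-7)
    )
    return min(p, 1.0)


# Hypothetical counts (not from the paper): women 8 H / 4 other, men 2 H / 10 other.
print(round(fisher_exact_2x2(8, 4, 2, 10), 4))
```

An exact test is the natural choice here because ancient DNA samples per group are small, where chi-squared approximations become unreliable.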

All in all, Wielbark probably represented the initial migration period of East Germanic tribes, traditionally believed to come from Northern Scandinavia, into territory later inhabited by Slavic tribes (and potentially earlier by a Balto-Slavic community).

Other than that, the results hint at a potentially stable mtDNA pool in the Germanic homeland after the Bell Beaker expansion, which probably brought Pre-Germanic to Scandinavia.

Nevertheless, only a comprehensive study of all Germanic regions from that period (with whole genomes and Y-DNA) might shed light on the real origin of East Germanic peoples, and thus on their contested dialectal position, since we already know that certain modern Slavic and Germanic populations cluster closely with some Bronze Age communities of the same region, so differences during the Iron Age may already be quite subtle.

In my humble opinion, there are too many hypotheses in the paper for so few interesting data, as is more and more usual in genetic papers. I guess journals expect that in order to attract more attention, although serious reviewers should actually encourage the opposite, and only informal blogs like this one should come up with far-fetched theories, instead of having to rebut them…