Shared ancestry of ancient Eurasian hepatitis B virus diversity linked to Bronze Age steppe


Ancient hepatitis B viruses from the Bronze Age to the Medieval period, by Mühlemann et al., Nature (2018) 557:418–423.

NOTE. You can read the PDF at Dalia Pokutta’s account.

Abstract (emphasis mine):

Hepatitis B virus (HBV) is a major cause of human hepatitis. There is considerable uncertainty about the timescale of its evolution and its association with humans. Here we present 12 full or partial ancient HBV genomes that are between approximately 0.8 and 4.5 thousand years old. The ancient sequences group either within or in a sister relationship with extant human or other ape HBV clades. Generally, the genome properties follow those of modern HBV. The root of the HBV tree is projected to between 8.6 and 20.9 thousand years ago, and we estimate a substitution rate of 8.04 × 10⁻⁶ to 1.51 × 10⁻⁵ nucleotide substitutions per site per year. In several cases, the geographical locations of the ancient genotypes do not match present-day distributions. Genotypes that today are typical of Africa and Asia, and a subgenotype from India, are shown to have an early Eurasian presence. The geographical and temporal patterns that we observe in ancient and modern HBV genotypes are compatible with well-documented human migrations during the Bronze and Iron Ages1,2. We provide evidence for the creation of HBV genotype A via recombination, and for a long-term association of modern HBV genotypes with humans, including the discovery of a human genotype that is now extinct. These data expose a complexity of HBV evolution that is not evident when considering modern sequences alone.
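To give a sense of scale for the substitution rate quoted in the abstract, here is a minimal sketch that converts the two bounds into an expected number of substitutions per genome over a given time span. The genome length of about 3,200 nucleotides is an assumption (the approximate size of the HBV genome, not a figure from the abstract), the function name is hypothetical, and the calculation ignores rate variation among sites and lineages.

```python
# Rough scale check for the substitution rate quoted in the abstract.
# ASSUMPTION: genome length of ~3,200 nt (approximate HBV size, not from the abstract).
GENOME_LENGTH_NT = 3200

RATE_LOW = 8.04e-6   # substitutions per site per year (lower bound in the abstract)
RATE_HIGH = 1.51e-5  # substitutions per site per year (upper bound in the abstract)


def expected_substitutions(rate, years, genome_length=GENOME_LENGTH_NT):
    """Expected substitutions along one lineage: rate x sites x time."""
    return rate * genome_length * years


if __name__ == "__main__":
    # 800 and 4,500 years are the approximate ages of the youngest and oldest genomes.
    for years in (800, 4500):
        low = expected_substitutions(RATE_LOW, years)
        high = expected_substitutions(RATE_HIGH, years)
        print(f"{years} years: {low:.1f} to {high:.1f} substitutions per genome")
```

Under these assumptions, a lineage sampled 4,500 years ago would differ from its modern descendants by roughly 116 to 217 substitutions, which is why even a handful of ancient genomes can shift the calibration of the tree.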

Geographical distribution of analysed samples and modern genotypes. a (featured image), Distribution of modern human HBV genotypes. Genotypes relevant to this Letter are shown in colour. Coloured shapes indicate the locations of the HBV-positive samples included for further analysis. b (above this text), Locations of analysed Bronze Age samples are shown as circles and Iron Age and later samples are shown as triangles. Coloured markers indicate HBV-positive samples. Ancient genotype A samples are found in regions in which genotype D predominates today, and HBV-DA27 is of subgenotype D5 which today is found almost exclusively in India.

Interesting excerpts:

We find genotype A in south-western Russia by 4.3 ka (in samples RISE386 and RISE387) in individuals belonging to the Sintashta culture, and in a Hungarian sample (DA195) from the Scythian culture. The western Scythians are related to the Bronze Age cultures of western steppe populations2 and their shared ancestry suggests that the modern genotype A may descend from this ancient Eurasian diversity and not, as previously hypothesized, from African ancestors29,30. This is also consistent with the phylogeny (Fig. 2), as well as the fact that the three oldest ancient genotype A sequences (HBV-DA195, HBV-RISE386 and HBV-RISE387) lack the six-nucleotide insertion found in the youngest (HBV-DA119) and in all modern genotype A sequences. The ancestors of subgenotypes A1 and A3 could have been carried into Africa subsequently, via migration from western Eurasia31.

The ancient HBV genotype D sequences were all found in Central Asia. HBV-DA27, found in Kazakhstan and dated to 1.6 ka, falls basal to the modern subgenotype D5 sequences that today are found in the Paharia tribe from eastern India32. DA27 and the Paharia people in India are linked by their East Asian ancestry2,33.

Dated maximum clade credibility tree of HBV. A log-normal relaxed clock and coalescent exponential population prior were used. Grey horizontal bars indicate the 95% HPD interval of the age of the node. Larger numbers on the nodes indicate the median age and 95% HPD interval of the age (in parentheses) under a strict clock and Bayesian skyline tree prior. Clades of genotypes C (except clade C4), E, F, G and H are collapsed and shown as dots. The figure includes a possible tenth genotype, J, based on a single human isolate. Taxon names for ancient samples indicate era (BA, Bronze Age; IA, Iron Age or later), sample name, sample age in years, ISO 3166 three-letter abbreviation of country of sequence origin, and region of sequence origin. Taxon names for modern samples indicate human genotype or subgenotype or host species if non-human, GenBank accession number, sample age in years, ISO 3166 three-letter abbreviation of country of sequence origin, and region of sequence origin.

(…) Despite the age of the samples and the imperfect diagnostic test, our dataset contained a high proportion of HBV-positive individuals. The actual ancient prevalence during the Bronze Age and thereafter might have been higher, reaching or exceeding the prevalence typically found in contemporary indigenous populations5. This clearly establishes the potential of HBV as powerful proxy tool for research into human spread and interactions. The data from ancient genomes reveal aspects of complexity in HBV evolution that are not apparent when only modern sequences are considered. They show the existence of ancient HBV genotypes in locations incongruent with their present-day distribution, contradicting previously suggested geographical or temporal origins of genotypes or sub-genotypes; evidence for the creation of genotype A via recombination and the emergence of the genotype outside Africa; at least one now-extinct human genotype; ancient genotype-level localized diversity; and demonstrate that the viral substitution rate obtained from modern heterochronously sampled sequences is probably misleading. Together, these findings suggest that the difficulty in formulating a coherent theory for the origin and spread of HBV may be due to genetic evidence of an earlier evolutionary scenario being overwritten by relatively recent alterations, as has previously been suggested in the context of recombination24.

See also:

On Latin, Turkic, and Celtic – likely stories of mixed societies and little genetic impact


Recent article on The Conversation, The Roman dead: new techniques are revealing just how diverse Roman Britain was, about the paper (behind paywall) A Novel Investigation into Migrant and Local Health-Statuses in the Past: A Case Study from Roman Britain, by Redfern et al. Bioarchaeology International (2018), among others.

Interesting excerpts about Roman London:

We have discovered, for example, that one middle-aged woman from the southern Mediterranean has black African ancestry. She was buried in Southwark with pottery from Kent and a fourth century local coin – her burial expresses British connections, reflecting how people’s communities and lives can be remade by migration. The people burying her may have decided to reflect her life in the city by choosing local objects, but we can’t dismiss the possibility that she may have come to London as a slave.

The evidence for Roman Britain having a diverse population only continues to grow. Bioarchaeology offers a unique and independent perspective, one based upon the people themselves. It allows us to understand more about their life stories than ever before, but requires us to be increasingly nuanced in our understanding, recognising and respecting these people’s complexities.

We already have a more or less clear idea about how little the Roman conquest may have shaped the genetic map of Europe, Africa, or the Middle East, in contrast to other previous or later migrations or conquests.

Also, on the Turkic expansion, the recent paper of Damgaard et al. (Nature 2018) stated:

In the sixth century AD, the Hunnic Empire had been broken up and dispersed as the Turkic Khaganate assumed the military and political domination of the steppes22,23. Khaganates were steppe nomad political organizations that varied in size and became dominant during this period; they can be contrasted to the previous stateless organizations of the Iron Age24. The Turkic Khaganate was eventually replaced by a number of short-lived steppe cultures25 (…).

We find evidence that elite soldiers associated with the Turkic Khaganate are genetically closer to East Asians than are the preceding Huns of the Tian Shan mountains (Supplementary Information section 3.7). We also find that one Turkic Khaganate-period nomad was a genetic outlier with pronounced European ancestries, indicating the presence of ongoing contact with Europe (…).

Analyses of Turk- and Medieval-period population clusters. a, PCA of Tian Shan Hun, Turk, Kimak, Kipchack, Karakhanid and Golden Horde, including 28 individuals analysed at 242,406 autosomal SNP positions. b, Results for model-based clustering analysis at K = 7. Here we illustrate the admixture analyses with K = 7 as it approximately identifies the major component of relevance (Anatolian/European farmer component, Caucasian ancestry, EHG-related ancestry and East Asian ancestry).

These results suggest that Turkic cultural customs were imposed by an East Asian minority elite onto central steppe nomad populations, resulting in a small detectable increase in East Asian ancestry. However, we also find that steppe nomad ancestry in this period was extremely heterogeneous, with several individuals being genetically distributed at the extremes of the first principal component (Fig. 2) separating Eastern and Western descent. On the basis of this notable heterogeneity, we suggest that during the Medieval period steppe populations were exposed to gradual admixture from the east, while interacting with incoming West Eurasians. The strong variation is a direct window into ongoing admixture processes and the multi-ethnic cultural organization of this period.

We already knew that the expansion of the La Tène culture, associated with the expansion of Celtic languages throughout Europe, was probably not accompanied by massive migrations (from the IEDM, 3rd ed.):

The Mainz research project of bio-archaeometric identification of mobility has not proven to date a mass migration of Celtic peoples in central Europe ca. 4th-3rd centuries BC, i.e. precisely in a period where textual evidence informs of large migratory movements (Scheeres 2014). La Tène material culture points to far-reaching inter-regional contacts and cultural transfers (Burmeister 2016).

Also, from the latest paper on Y-chromosome bottleneck:

[The hypothesis of patrilineal kin group competition] has an added benefit in that it could explain the temporal placement of the bottleneck if competition between patrilineal kin groups was the main form of intergroup competition for a limited episode of time after the Neolithic transition. Anthropologists have repeatedly noted that the political salience of unilineal descent groups is greatest in societies of ‘intermediate social scale’ (Korotayev47 and its citations on p. 2), which tend to be post-Neolithic small-scale societies that are acephalous, i.e. without hierarchical institutions48. Corporate kin groups tend to be absent altogether among mobile hunter gatherers with few defensible resource sites or little property (Kelly49 pp. 64–73), or in societies utilizing relatively unoccupied and under-exploited resource landscapes (Earle and Johnson50 pp. 157–171). Once they emerge, complex societies, such as chiefdoms and states, tend to supervene the patrilineal kin group as the unit of intergroup competition, and while they may not eradicate them altogether as sub-polity-level social identities, warfare between such kin groups is suppressed very effectively51,52. These factors restrict the social phenomena responsible for the bottleneck to the period after the initial Neolithic but before the emergence of complex societies, which would place the bottleneck-generating mechanisms in the right period of time for each region of the Old World.

Diachronic map of Late Copper Age migrations including Classical Bell Beaker (east group) expansion from central Europe ca. 2600-2250 BC

However, I recently read in a forum for linguists that the expansion of East Bell Beakers, overwhelmingly of R1b-L21 subclades, into the British Isles “poses a problem”, in that it should be identified with a Celtic expansion earlier than traditionally assumed…

That interpretation would be in line with the simplistic maps we are seeing right now for Bell Beakers (see below for the Copenhagen group).

If anything, the results of Bell Beaker expansions (taken alone) would seem to support a model similar to Cunliffe & Koch’s hypotheses of a rather early Celtic expansion into Great Britain and Iberia from the Atlantic.

Spread of Indo-European languages (by the Copenhagen group).

But it doesn’t. Mallory already explained why in Cunliffe & Koch’s series Celtic from the West: the Bell Beaker expansion is too early for that; even for Italo-Celtic. It should correspond to North-West Indo-European speakers.

Not every population movement that is genetically very significant needs to be significant for the languages attested much later in the region.

This should be obvious to everyone, given the many examples we already have. One of the least controversial now would probably be the expansion of R1b-DF27, which probably became widespread in Iberia at roughly the same time as R1b-L21 did in Great Britain; and yet pre-Roman Iberians spoke a mix of non-Indo-European languages, non-Celtic Indo-European languages (at least Galaico-Lusitanian), and also some (certain) Celtic languages. And modern Iberians speak Romance languages, without much genetic impact from the Romans, either…

It is well-established in Academia that the expansion of La Tène is culturally associated with the spread of Celtic languages in Europe, including the British Isles and Iberia. While modern maps of U152 distribution may correspond to the migration of early Celts (or Italo-Celtic speakers) with Urnfield/Hallstatt, the great Celtic expansion across Europe need not show a genetic influence greater than or even equal to that of previous prehistoric migrations.

Post-Bell-Beaker Europe, after ca. 2200 BC.

You can see in these de novo models the same kind of invented theoretical ‘problem’ (as Iosif Lazaridis puts it) that we have seen with the Corded Ware showing steppe ancestry, with Old Hittite samples not showing EHG ancestry, or with CHG ancestry appearing north of the Caucasus but no EHG to the south.

However you may want to explain all these errors in scientific terms (selection bias, under-coverage, over-coverage, faulty statistical methods, etc.), these interpretations were simply the fruit of a lack of knowledge of the anthropological disciplines at play.

Let’s hope the future paper on Celtic expansion takes this into consideration.


Haplogroup J spread in the Mediterranean due to Phoenician and Greek colonizations


Open access A finely resolved phylogeny of Y chromosome Hg J illuminates the processes of Phoenician and Greek colonizations in the Mediterranean, by Finocchio et al. Scientific Reports (2018) 8:7465.

Abstract (emphasis mine):

In order to improve the phylogeography of the male-specific genetic traces of Greek and Phoenician colonizations on the Northern coasts of the Mediterranean, we performed a geographically structured sampling of seven subclades of haplogroup J in Turkey, Greece and Italy. We resequenced 4.4 Mb of Y-chromosome in 58 subjects, obtaining 1079 high quality variants. We did not find a preferential coalescence of Turkish samples to ancestral nodes, contradicting the simplistic idea of a dispersal and radiation of Hg J as a whole from the Middle East. Upon calibration with an ancient Hg J chromosome, we confirmed that signs of Holocenic Hg J radiations are subtle and date mainly to the Bronze Age. We pinpointed seven variants which could potentially unveil star clusters of sequences, indicative of local expansions. By directly genotyping these variants in Hg J carriers and complementing with published resequenced chromosomes (893 subjects), we provide strong temporal and distributional evidence for markers of the Greek settlement of Magna Graecia (J2a-L397) and Phoenician migrations (rs760148062). Our work generated a minimal but robust list of evolutionarily stable markers to elucidate the demographic dynamics and spatial domains of male-mediated movements across and around the Mediterranean, in the last 6,000 years.

J2-L397. The star indicates the centroid of derived alleles. The solid square indicates the centroid of ancestral alleles, with its 95% C.I. (ellipse). In the insets: distributions of the pairwise sampling distances (in Km) for the carriers of the ancestral (black) and derived (white) allele, with solid and dashed lines indicating the respective averages. At right: median joining network of 7-STR haplotypes and SNPs in the same groups, with sectors coloured according to sampling location. Haplotype structure is detailed for some nodes, in the order YCA2a-YCA2b-DYS19-DYS390-DYS391-DYS392-DYS393 (in italics).

Interesting excerpts:

Two features of our tree are at odds with the simplistic idea of a dispersal of Hg J as a whole from the Middle East towards Greece and Italy and an accompanying radiation26. First, there is little evidence of sudden diversification between 15 and 5 kya, a period of likely population increase and pressure for range expansion, due to the Agricultural revolution in the Fertile Crescent. Second, within each subclade, lineages currently sampled in Turkey do not show up as preferentially ancestral. Both findings are replicated and reinforced by examining the previous landmark studies. Our Turkish samples do not coalesce preferentially to ancestral nodes when mapped onto these studies’ trees.

Additional relevant information on the entire Hg J comes from the discontinuous distribution of J2b-M12. The northern fringe of our sample is enriched in the J2b-M241 subclade, which reappears in the gulf of Bengal38,45, with low frequencies in the intervening Iraq46 and Iran47. No J2b-M12 carriers were found among 35 modern Lebanese, as contrasted to one of two ancient specimens from the same region35.

In summary, a first conclusion of our sequencing effort and merge with available data is that the phylogeography of Hg J is complex and hardly explained by the presence of a single population harbouring the major lineages at the onset of agriculture and spreading westward. A unifying explanation for all the above inconsistencies could be a centre of initial radiation outside the area here sampled more densely, i.e. the Caucasus and regions North of it, from which different Hg J subclades may have later reached mainland Italy, Greece and Turkey, possibly following different routes and times. Evidence in this direction comes from the distribution of J2a-M410 [45,48] and the early- [49] or mid-Holocene [50] southward spread of J1.

Supplemental Figure 7. Maps of sampling locations for the carriers of the derived allele (white triangle point down) at the indicated SNP vs carriers of the ancestral allele (black triangle point-up), conditioned on identical genotype at the same most terminal marker. Coastlines were drawn with the R packages18 “map” and “mapproj” v. 3.1.3, and additional features added with default functions. The star triangle indicates the centroid of derived alleles. The solid square indicates the centroid of ancestral alleles, with its 95% C.I. (ellipse). In the insets: distributions of the pairwise sampling distances (in Km) for the carriers of the ancestral (black) and derived (white) allele, with solid and dashed lines indicating the respective averages. At right: median joining network of 7-STR haplotypes and SNPs in the same groups, with sectors coloured according to sampling location. Haplotype structure is detailed for some nodes, in the order YCA2a-YCA2b-DYS19-DYS390-DYS391-DYS392-DYS393 (in italics).
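For readers who want to see what the centroids and pairwise sampling distances in these captions amount to, here is a minimal sketch in Python. It is not the authors' R code: the haversine formula, the simple latitude/longitude mean used as a centroid, the function names, and the coordinates are all assumptions chosen for illustration.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def pairwise_distances_km(points):
    """Distances between every unordered pair of sampling locations."""
    return [haversine_km(p, q) for p, q in combinations(points, 2)]


def centroid(points):
    """Plain mean of latitudes and longitudes.

    ASSUMPTION: adequate at regional scale; not valid across the antimeridian.
    """
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


if __name__ == "__main__":
    # HYPOTHETICAL sampling locations (lat, lon), not data from the paper.
    derived = [(38.2, 16.3), (40.8, 14.3), (37.5, 15.1)]
    ancestral = [(38.0, 23.7), (39.9, 32.9), (35.2, 33.4)]
    for label, pts in (("derived", derived), ("ancestral", ancestral)):
        d = pairwise_distances_km(pts)
        print(label, "centroid:", centroid(pts), "mean distance (km):", sum(d) / len(d))
```

A derived allele whose carriers sit closer together than the carriers of the ancestral allele, with a displaced centroid, is the pattern the captions use to flag a local expansion.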

The lineage defined by rs779180992, belonging to J2b-M205, and dated at 4–4.5 kya, has a radically different distribution, with derived alleles in Continental Italy, Greece and Northern Turkey, and two instances in a Palestinian and a Jew. The interpretation of the spread of this lineage is not straightforward. Tentative hypotheses are linked to Southward movements that occurred in the Balkan Peninsula from the Bronze Age29,53, through the Roman occupation and later54.

The slightly older (5.6–6.3 kya) branch 98 lineage displays a similar trend of a Eastward positioning of derived alleles, with the notable difference of being present in Sardinia, Crete, Cyprus and Northern Egypt. This feature and the low frequency of the parental J2a-M92 lineage in the Balkans27 calls for an explanation different from the above.

Finally, we explored the distribution of J2a-L397 and three derived lineages within it. J2a-L397 is tightly associated with a typical DYS445 6-repeat allele. This has been hypothesized as a marker of the Greek colonizations in the Mediterranean55, based on its presence in Greek Anatolia and Provence (France), a region with attested Iron Age Greek contribution. All of our chromosomes in this clade were characterized also by DYS391(9), confirming their Anatolian Greek signature. We resolved the J2a-L397 clade to an unprecedented precision, with three internal markers which allow a finer discrimination than STRs. The ages of the three lineages (2.0–3.0 kya) are compatible with the beginning of the Greek colonial period, in the 8th century BCE. The three subclades have different distributions (Fig. 2B), with two (branches 57, 59) found both East and West to Greece, and one only in Italy (branch 58). As to Mediterranean Islands, J2a-L397 was found in Cyprus56 and Crete43. Its presence as one of the three branches 57–59 will represent an important test. In Italy all three variants were found mainly along the Western coast (18/25), which hosted the preferred Greek trade cities. The finding of all three differentiated lineages in Locri excludes a local founder effect of a single genealogy. Interestingly, an important Greek colony was established in this location, with continuity of human settlement until modern times. The sample composed of the same subjects displayed genetic affinities with Eastern Greece and the Aegean also at autosomal markers57. In summary, the distributions of branches 57–59 mirror the variety of the cities of origin and geographic ranges during the phases of the colonization process58.

So, there you have it: further evidence that the spread of haplogroup J and CHG-related ancestry in the Mediterranean was mainly driven by different (and late) expansions of historic peoples.


Decline of genetic diversity in ancient domestic stallions in Europe

Open access research article Decline of genetic diversity in ancient domestic stallions in Europe, by Wutke et al., Science Advances (2018), 4(4):eaap9691.

Abstract (emphasis mine):

Present-day domestic horses are immensely diverse in their maternally inherited mitochondrial DNA, yet they show very little variation on their paternally inherited Y chromosome. Although it has recently been shown that Y chromosomal diversity in domestic horses was higher at least until the Iron Age, when and why this diversity disappeared remain controversial questions. We genotyped 16 recently discovered Y chromosomal single-nucleotide polymorphisms in 96 ancient Eurasian stallions spanning the early domestication stages (Copper and Bronze Age) to the Middle Ages. Using this Y chromosomal time series, which covers nearly the entire history of horse domestication, we reveal how Y chromosomal diversity changed over time. Our results also show that the lack of multiple stallion lineages in the extant domestic population is caused by neither a founder effect nor random demographic effects but instead is the result of artificial selection—initially during the Iron Age by nomadic people from the Eurasian steppes and later during the Roman period. Moreover, the modern domestic haplotype probably derived from another, already advantageous, haplotype, most likely after the beginning of the domestication. In line with recent findings indicating that the Przewalski and domestic horse lineages remained connected by gene flow after they diverged about 45,000 years ago, we present evidence for Y chromosomal introgression of Przewalski horses into the gene pool of European domestic horses at least until medieval times.

The frequencies of Y chromosome haplotypes started to change during the Late Bronze Age (1600–900 BCE).
Inferred temporal trajectories of haplotype frequencies. Each haplotype is displayed by a different color. The shaded area represents the 95% highest-density region. The trajectories were constructed taking the median values across frequencies from the simulations of the Bayesian posterior sample. The small chart represents the stacked frequencies; the amplitude of each colored area is proportional to the median haplotype frequencies (normalized) at a given time. The x and y axes of the small chart match those in the large one. Ka, thousands of years.

Interesting excerpts:

The first record of the modern domestic Y chromosome haplotype stems from two Bronze Age samples of similar age. Notably, both samples were found in two distantly located regions: present-day Slovakia (2000–1600 BCE, dated by archaeological context) and western Siberia (14C-dated: 1609–1436 cal. BCE). Although a very recent study proposes an oriental origin of this haplotype (14), we cannot determine the geographical origin of Y-HT-1 with certainty, because this haplotype has not been found thus far in predomestic or wild stallions. There are two possible scenarios: (i) Y-HT-1 emerged within the domestic population by mutation and (ii) Y-HT-1 was already present in wild horses and entered the domestic population either at the beginning of domestication (but initially restricted to Asian horses) or later by introgression (from wild Y-HT-1 carrying studs during the Iron Age). Crosses between domestic animals and their wild counterparts have been observed in several domestic species (15–18); thus, the simplest explanation would be that we missed Y-HT-1 in older samples because of limited geographical sampling. However, the estimated haplotype age is contemporary (Fig. 4) with the assumed starting point of horse domestication ~4000–3500 BCE (19), rendering it likely that Y-HT-1 originated within the domestic horse gene pool. Still, we cannot rule out definitively that it appeared before domestication.

Independent of its geographical origin, Y-HT-1 progressively replaced all other haplotypes—except for one additional lineage that is restricted to Yakutian horses (11). Considering our data, this trend in paternal diversity toward dominance of the modern lineage appears to start in the Bronze Age and becomes even more pronounced during the Iron Age. The Bronze Age was a time of large-scale human migrations across Eurasia (20–22), movements that were undoubtedly facilitated by the spread of horses as a means of transport and warfare. At that time, the western Eurasian steppes were inhabited by highly mobile cultures that largely relied on horses (20, 21, 23, 24). The genetic admixture of northern and central European humans with Caucasians/eastern Europeans did correlate with the spread of the Yamnaya culture from the Pontic-Caspian steppe (25), an area that has repeatedly been suggested as the center of horse domestication (19, 26, 27). Given the importance of domestic horses, it appears that deliberate selection/rejection of certain stallions by these people might have contributed to the loss of paternal diversity. The spread of humans out of this region might also have resulted in the spread of Y-HT-1 from Asia to Europe. This scenario also agrees with recent findings that the low male diversity of extant horses is not caused by recruiting only a limited number of stallions during early domestication (13).

Decline of paternal diversity began in Asia.
Maps displaying age, locality, and haplotype (different colors) of each successfully genotyped sample.

The presence of the Y chromosome haplotype carried by present-day Przewalski horses (Y-HT-2) in early domestic stallions and a European wild horse (Pie05; table S2) could be the result of introgression of Przewalski stallions. Although the original distribution of the Przewalski horse is unknown, it was probably much larger than that of the relict population in Mongolia that produced modern Przewalski horses and might even have extended into Central Europe. However, it is also possible that either Przewalski horses were among the initially domesticated horses or that Y-HT-2 occurred both in Przewalski horses and in those wild horses that are the ancestors of domestic horses, based on autosomal DNA data (30). Regardless of how Y-HT-2 entered the domestic gene pool, it was eventually lost, as were all haplotypes except Y-HT-1. In our sample set, Y-HT-2 was undetectable as early as the third time bin. However, it is possible that Y-HT-2 may have been present during this time period, but with a frequency below 0.11 (with 95% probability). The inferred time trajectories for Y-HT-2 frequencies suggest that it could nevertheless have persisted at very low frequencies until the Middle Ages (Fig. 3). On the basis of these simulations, this finding could be interpreted as a relic of this haplotype’s formerly higher frequency in the domestic horse gene pool. It is also possible that the presence of this haplotype could be the result of mating a wild stallion with a domestic mare, a frequently reported breeding practice when wild horses were still widely distributed. However, a significant contribution of the Przewalski horse to the gene pool of modern domestic horses has been almost ruled out by recent genomic studies (13, 31, 32).
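The remark that an unobserved haplotype may still have been present “with a frequency below 0.11 (with 95% probability)” reflects a general point about small samples. The sketch below is not the authors' simulation-based method; it uses the simpler exact binomial bound for zero observations, solving (1 − p)^n = 1 − confidence for p. The function name and the sample sizes are hypothetical.

```python
def max_frequency_if_unseen(n_samples, confidence=0.95):
    """Upper bound on a haplotype's frequency when none of n_samples carries it.

    Solves (1 - p) ** n_samples = 1 - confidence for p, i.e. the largest
    frequency at which seeing zero carriers is still not surprising.
    """
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return 1.0 - (1.0 - confidence) ** (1.0 / n_samples)


if __name__ == "__main__":
    # HYPOTHETICAL sample sizes, not the per-bin counts from the paper.
    for n in (10, 25, 50):
        print(n, "samples, none carrying the haplotype -> frequency below",
              round(max_frequency_if_unseen(n), 3))
```

Under this simplified bound, a limit near 0.11 corresponds to roughly 25 to 26 genotyped stallions with no carriers; the paper's own figure comes from its simulations and need not match this arithmetic exactly.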

Stallion lineages through time.
Temporal haplotype network of the four detected Y chromosome haplotypes. Age of the samples indicated by multiple layers separated by color; vertical lines connecting the haplotypes of consecutive layers/ages represent which haplotype was transferred into a later/younger period. Numbers constitute the respective number of individuals showing this particular haplotype for that period. Prz, Przewalski; Dom, domestic.


Genetic prehistory of the Baltic Sea region and Y-DNA: Corded Ware and R1a-Z645, Bronze Age and N1c


Open Access The genetic prehistory of the Baltic Sea region, by Mittnik et al., Nature Communications 9: 442 (2018), based on preprint The Genetic History of Northern Europe, at bioRxiv.

As you can see, it follows my predictions in terms of haplogroups, and sadly the same trend to substitute ‘Yamna’ for ‘steppe’ while keeping linguistic interpretations unchanged…

Important excerpts for the Indo-European question (emphasis mine):

Mesolithic to Neolithic

In the archaeological understanding, the transition from Mesolithic to Neolithic in the Eastern Baltic region does not coincide with a large-scale population turnover and a stark shift in economy as seen in Central and Southern Europe. Rather, it is signified by a change in networks of contacts and the use of pottery, among other material, cultural and economic changes. Our results suggest continued admixture between groups in the south of the Eastern Baltic region, who are more closely related to WHG, and northern or eastern groups, more closely related to EHG. Neolithic social networks from the Eastern Baltic to the River Volga could also explain similarities of the hunter-gatherer pottery styles, although morphologically analogous ceramics might also have developed independently due to similar functionality. The genetic evidence for a change in networks and possibly even a large-scale population movement is most pronounced in the Middle Neolithic in individuals attributed to the CCC. The distribution of this culture overlaps in the north with the Narva culture and extends further north to Finland and Karelia. Its spread in the Eastern Baltic is linked with a significant change in imported raw materials, artefacts, and the appearance of village-like settlements15.

Neolithic to Chalcolithic

We see a further population movement into the regions surrounding the Baltic Sea with the CWC in the Late Neolithic that was accompanied by the first evidence of extensive animal husbandry in the Eastern Baltic. The presence of ancestry from the Pontic-Caspian Steppe among Baltic CWC individuals without the genetic component from north-western Anatolian Neolithic farmers must be due to a direct migration of steppe pastoralists that did not pick up this ancestry in Central Europe. It suggests import of the new economy by an incoming steppe-like population independent of the agricultural societies that were already established to the south and west of the Baltic Sea. The presence of direct contacts to the steppe could lend support to a linguistic model that sees an early branching of Balto-Slavic from a Proto-Indo-European language, for which the west Eurasian steppe was proposed as a homeland. However, as farmer ancestry is found in later Eastern Baltic individuals, it is likely that considerable individual mobility and a network of contact throughout the range of the CWC facilitated its spread eastward, possibly through exogamous marriage practices. Conversely, the appearance of mitochondrial haplogroup U4 in the Central European Late Neolithic after millennia of absence could indicate female gene-flow from the Eastern Baltic, where this haplogroup was present at high frequency.

PCA and ADMIXTURE analysis reflecting Late Neolithic in Northern European prehistory. a Principal components analysis of 1012 present-day West Eurasians (grey points, modern Baltic populations in dark grey) with 294 projected published ancient and ancient North European samples introduced in this study (marked with a red outline). b Ancestral components in ancient individuals estimated by ADMIXTURE (k = 11)
Zoomed-in version of the European Late Neolithic PCA.

So, we see that no farmer ancestry is found in the Baltic (unlike in Western Yamna), that PCA of Late Neolithic is closer to Corded Ware samples from Europe (or to earlier samples from the region) and not to Yamna, as suggested at first by the Zvejnieki individual.

There obviously was exogamy, which may in fact explain the samples that fall close to Yamna in PCA (like the Zvejnieki sample), although the researchers overlook that.

Also, as expected, there is no R1b-M269 in the Baltic during the Corded Ware period: most samples are R1a, the majority of them showing subclade R1a-Z645 (the others have poor SNP coverage). This supports the reduction in haplogroup diversity to this very subclade during the expansion of Corded Ware peoples, as I predicted would happen.

Bronze Age

Local foraging societies were, however, not completely replaced and contributed a substantial proportion to the ancestry of Eastern Baltic individuals of the latest LN and Bronze Age. This ‘resurgence’ of hunter-gatherer ancestry in the local population through admixture between foraging and farming groups recalls the same phenomenon observed in the European Middle Neolithic and is responsible for the unique genetic signature of modern-day Eastern Baltic populations.

We suggest that the Siberian and East Asian related ancestry in Estonia, and Y-haplogroup N in north-eastern Europe, where it is widespread today, arrived there after the Bronze Age, ca. 500 calBCE, as we detect neither in our Bronze Age samples from Lithuania and Latvia. As Uralic speaking populations of the Volga-Ural region show high frequencies of haplogroup N, a connection was proposed with the spread of Uralic language speakers from the east that contributed to the male gene pool of Eastern Baltic populations and left linguistic descendants in the Finno-Ugric languages Finnish and Estonian. A potential future direction of research is the identification of the proximate population that contributed to the arrival of this eastern ancestry into Northern Europe.

I predicted that haplogroup N probably arrived in the region west of the Urals with the Seima-Turbino phenomenon, and that it expanded quite late, probably through founder effects. A late arrival in the region obviously leaves (save for these researchers and others working with old ideas) only the Corded Ware culture (represented by steppe admixture and mainly haplogroup R1a-Z645) as the vector of expansion of Uralic languages, which obviously show a dialectalization process and regional expansion much older than 500 BC…

It is funny to see how people keep trying to identify R1a with ‘Yamnaya’, now ‘steppe’, but always Indo-European (an ethnolinguistic term, mind you), supposedly because of the ‘Yamnaya’ (now ‘steppe’) admixture, while the only ‘mark’ of Uralic languages for the same researchers, in the same paper using this very concept, is paradoxically haplogroup N, an assumption explicitly based on its prevalence in modern populations.

This admixture vs. haplogroup question for language and culture identification in genetic papers is really getting messed up with new data, now in a contortionist-like way…

Images and text: Content of the paper is licensed under CC-by 4.0.

See also:

More evidence on the recent arrival of haplogroup N and gradual replacement of R1a lineages in North-Eastern Europe


A new article (in Russian), Kinship Analysis of Human Remains from the Sargat Mounds, Baraba forest-steppe, Western Siberia, by Pilipenko et al. Археология, этнография и антропология Евразии Том 45 № 4 2017, downloadable at ResearchGate.


We present the results of a paleogenetic analysis of nine individuals from two Early Iron Age mounds in the Baraba forest-steppe, associated with the Sargat culture (five from Pogorelka-2 mound 8, and four from Vengerovo-6 mound 1). Four systems of genetic markers were analyzed: mitochondrial DNA, the polymorphic part of the amelogenin gene, autosomal STR-loci, and those of the Y-chromosome. Complete or partial data, obtained for eight of the nine individuals, were subjected to kinship analysis. No direct relatives of the “parent-child” type were detected. However, the data indicate close paternal and maternal kinship among certain individuals. This was evidently one of the reasons why certain individuals were buried under a single mound. Paternal kinship appears to have been of greater importance. The diversity of mtDNA and Y-chromosome lineages among individuals from one and the same mound suggests that kinship was not the only motive behind burying the deceased people jointly. The presence of very similar, though not identical, variants of the Y chromosome in different burial grounds may indicate the existence of groups such as clans, consisting of paternally related males. Our conclusions need further confirmation and detailed elaboration. Keywords: Paleogenetics, ancient DNA, kinship analysis, mitochondrial DNA, uniparental genetic markers, STR-loci, Y-chromosome, Baraba forest-steppe, Sargat culture, Early Iron Age.

From the older study of the same region (Baraba, numbered 4): “Location of ancient human groups with a high frequency of mtDNA haplogroups U5, U4 and U2e lineages. The area of Northern Eurasian anthropological formation is marked by yellow region on the map (References: 1. Bramanti et al., 2009; 2. Malmstrom et al., 2009; 3. Krause et al., 2010; 4. this study)”

Chronological time scale of Bronze Age Cultures from the Baraba region
This is the same team that brought an ancient mtDNA study of different cultures within the Baraba steppe-forest region (from the Open Access book Population Dynamics in Prehistory and Early History).

The Baraba steppe-forest is a region between the Ob and Irtysh rivers (about 800 km from west to east), stretching over 200 km from the taiga zone in the north to the steppes in the south.

The new study brings a more recent picture of the region, from the Iron Age Sargat culture, ca. 500 BC – 500 AD, with five samples of haplogroup N and two samples of haplogroup R1a.

R1a lineages in the region probably derive from the previous expansion of Andronovo and related cultures, which had absorbed North Caspian steppe populations and their Late Indo-European culture.

N subclades prevalent in certain modern Eurasian populations are probably derived from the expansion of the Seima-Turbino phenomenon.

While samples are scarce, Y-DNA data keeps showing the same picture I have spoken about more than once:

N subclades (potentially originally speaking Proto-Yukaghir languages) gradually replacing haplogroup R1a (originally probably speaking Uralic languages), probably through successive founder effects (such as the bottlenecks found in Finland), which left their Uralic culture and ethnolinguistic identification intact.

Therefore, late Corded Ware groups of North-Eastern Europe (in the Forest Zone and the Baltic), mainly of R1a-Z645 subclades, probably never adopted Late Indo-European languages.


The concept of “Outlier” in Human Ancestry (II): Early Khvalynsk, Sredni Stog, West Yamna, Iron Age Bulgaria, Potapovka, Andronovo…


I already wrote about the concept of outlier in Human Ancestry, so I am not going to repeat myself. This is just an update of “outliers” in recent studies, and their potential origins (here I will repeat some of the examples):

Early Khvalynsk: the three samples from the Samara region have quite different positions in PCA, from nearest to EHG (of Y-DNA haplogroup R1a) to nearest to ANE ancestry (of Y-DNA haplogroup Q). This could represent the initial consequences of the second wave of ANE ancestry – as found later in Yamna samples from a neighbouring region -, possibly brought then by Eurasian migrants related to haplogroup Q.
With only 3 samples, this is obviously just a tentative explanation of the finds. The samples can only be reasonably said to show an unstable time for the region in terms of admixture (i.e. probably migration), judging by the data on PCA.

Ukraine Eneolithic samples offer a curious example of how the concept of outlier can change radically: from the third version (May 30th) of the preprint paper of Mathieson et al. (2017), when the Ukraine Eneolithic sample with steppe ancestry (and clustering with central European samples) was the ‘outlier’, to the fourth version (September 19th), when two samples with steppe ancestry clustering close to Corded Ware samples were now the ‘normal’ ones (i.e. those representing Ukraine Eneolithic population), and the outlier was the one clustering closely with Ukraine Mesolithic samples…

PCA and Admixture for south-eastern Europe. Image modified from Mathieson et al. (2017) – Third revision (May 30th), used in the 2nd edition of the Indo-European demic diffusion model.

This is one of the funny consequences of the wrong interpretation of the ‘yamnaya component’, that made geneticists believe at first that, out of two samples (!), the ‘outlier’ was the one with ‘yamnaya’ ancestry, because this component would have been brought by an eastern immigrant from early Khvalynsk…

This example offers yet another reason why precise anthropological context is necessary to offer the right interpretation of results. Within the Indo-European demic diffusion model – based mainly on Archaeology and Linguistics – , the sample with steppe ancestry was the most logical find in the region for a potential origin of the Corded Ware culture, and it was interpreted as such, well before the publication of the fourth version of Mathieson et al. (2017).

PCA of South-East European and other European samples. Image modified from Mathieson et al. (2017) – Fourth revision (September 19th), used in the 3rd edition of the Indo-European demic diffusion model.

West Yamna (to insist on the same question, the ‘yamnaya’ component): we have only four western Yamna samples, two of them showing Anatolian Neolithic ancestry (one of them, from Ukraine, with a strong ‘southern’ drift). On the other hand, Corded Ware migrants do not show this. So we could infer that their migrations were not coetaneous: whereas peoples of Corded Ware culture expanded ca. 3300 BC to the north – in the natural corridor to the Baltic that has been proposed for this culture in Archaeology for decades (and that is well represented by Ukraine Eneolithic samples) -, peoples of Yamna culture expanded to the west, replacing the Ukraine Eneolithic population (i.e. probably those of ‘Proto-Corded Ware culture’), and eventually mixing with Balkan populations of Anatolian Neolithic ancestry.

Potapovka, Andronovo, and Srubna: while Potapovka clusters closely to the steppe, and Andronovo (like Sintashta) clusters closely to Corded Ware (i.e. Ukraine Neolithic / Central-East European), both have certain ‘outliers’ in PCA: the former has one individual clustering closely to Corded Ware, and the latter to the steppe. Both ‘outliers’ fit well with the interpretation of the recent mixture of Corded Ware peoples with steppe populations, and they offer a different image for the evolution of populations of Potapovka and Sintashta-Petrovka, potentially influencing their language. The position of Srubna samples, nearer to Sintashta and Andronovo (but occupying the same territory as the previous Potapovka) offers the image of a late westward conquest from Corded Ware-related populations.

Diachronic map of migrations ca. 2250-1750 BC

Iron Age Bulgaria: a sample of haplogroup R1a-Z93, with more ‘yamnaya’ ancestry than any other previous sample from the Balkans. For some, it might mean continuity from an older time. However – as with the Corded Ware outlier from Esperstedt before it – it is more likely a recent migrant from the steppe. The most likely origin of this individual is therefore people from the steppe, i.e. either the Srubna culture or a related group. Its relatively close cluster in PCA to certain recent Slavic populations can be interpreted in light of the multiple back and forth migrations in the region: of steppe populations to the west (Srubna, Cimmerians, Scythians, Sarmatians,…), and of Slavic-speaking populations:

Diachronic map of Bronze Age migrations ca. 1750-1250 BC.

Well-defined outliers are, therefore, essential to understand a recent history of admixture. On the other hand, the very concept of “outlier” can be a dangerous tool when too few samples make the classification unjustified, leading to wrong interpretations.
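The sample-size problem can be shown with a toy calculation. The following is only a minimal sketch under my own assumptions, not a method used in any of the papers discussed: the PC1/PC2 coordinates are hypothetical, and the rule (flag a sample if its distance from the centroid of the remaining samples clearly exceeds everyone else's, with an arbitrary `ratio` of 1.5) is a simplification chosen for illustration.

```python
import math


def leave_one_out_distances(samples):
    """Distance of each sample from the centroid of all the *other* samples."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    distances = {}
    for name, (x, y) in samples.items():
        others = [xy for other, xy in samples.items() if other != name]
        cx = sum(p[0] for p in others) / len(others)
        cy = sum(p[1] for p in others) / len(others)
        distances[name] = math.hypot(x - cx, y - cy)
    return distances


def flag_outlier(samples, ratio=1.5):
    """Return the sample that is clearly farther out than the runner-up, or None.

    `ratio` is an arbitrary illustrative threshold, not a published criterion.
    """
    d = leave_one_out_distances(samples)
    ranked = sorted(d, key=d.get, reverse=True)
    top, runner_up = ranked[0], ranked[1]
    if d[top] > ratio * d[runner_up]:
        return top
    return None


# Hypothetical PCA coordinates (PC1, PC2); not taken from any real dataset.
two_samples = {"A": (0.0, 0.0), "B": (4.0, 3.0)}
three_samples = {"A": (0.0, 0.0), "B": (4.0, 3.0), "C": (4.2, 3.1)}

print(flag_outlier(two_samples))    # two samples are symmetric: nothing to flag
print(flag_outlier(three_samples))  # a third sample near B makes A stand out
```

With two samples each one is exactly as far from the other, so calling either the ‘outlier’ is a matter of interpretation, not of data. Only the third sample tips the balance, and even then it says which sample is isolated, not which cluster represents the ‘normal’ population.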


Another hint at the role of Corded Ware peoples in spreading Uralic languages into north-eastern Europe, found in mtDNA analysis of the Finnish population


Open article at Scientific Reports (Nature): Identification and analysis of mtDNA genomes attributed to Finns reveal long-stagnant demographic trends obscured in the total diversity, by Översti et al. (2017).

Of special interest is its depiction of Finland’s past as including the expansion of Corded Ware population of mtDNA U5b1b2 (and probably Y-DNA R1a-M417 subclades), most likely Uralic speakers of the Forest Zone, to the north of the Yamna culture (where Late Proto-Indo-European was spoken).

A later expansion of other subclades – particularly Y-DNA N1c -, was probably associated with the later western expansion of the Eurasian Seima-Turbino phenomenon, and its current prevalence in Finnish Y-DNA haplogroups might have been the consequence of the population decline ca. 1500 BC, and later Iron Age population bottleneck (with the population peak ca. 500 AD) described in the article.

That would more naturally explain the ‘cultural diffusion’ of Finnic languages into invading eastern N1c lineages, a diffusion which would have been in fact a long-term, quite gradual replacement of previously prevalent Y-DNA R1a subclades in the region, as supported by the prevalent “steppe” component in genome-wide ancestry of Finns.

Therefore, there were probably no sudden, strong population (and thus cultural) changes associated with the arrival of N1c lineages, like the ones seen with R1a (Corded Ware / Uralic) and R1b (Yamna / Proto-Indo-European) expansions in Europe.

How the Saami fit into this scheme is not yet obvious, though.


In Europe, modern mitochondrial diversity is relatively homogeneous and suggests an ubiquitous rapid population growth since the Neolithic revolution. Similar patterns also have been observed in mitochondrial control region data in Finland, which contrasts with the distinctive autosomal and Y-chromosomal diversity among Finns. A different picture emerges from the 843 whole mitochondrial genomes from modern Finns analyzed here. Up to one third of the subhaplogroups can be considered as Finn-characteristic, i.e. rather common in Finland but virtually absent or rare elsewhere in Europe. Bayesian phylogenetic analyses suggest that most of these attributed Finnish lineages date back to around 3,000–5,000 years, coinciding with the arrival of Corded Ware culture and agriculture into Finland. Bayesian estimation of past effective population sizes reveals two differing demographic histories: 1) the ‘local’ Finnish mtDNA haplotypes yielding small and dwindling size estimates for most of the past; and 2) the ‘immigrant’ haplotypes showing growth typical of most European populations. The results based on the local diversity are more in line with that known about Finns from other studies, e.g., Y-chromosome analyses and archaeology findings. The mitochondrial gene pool thus may contain signals of local population history that cannot be readily deduced from the total diversity.

From its results:

In general, there appears to be two loose and largely overlapping clusters among the Finn-characteristic haplogroups: the first between 1,000–2,000 ybp and the second around 3,300–5,500 ybp. The age of the older cluster coincides temporally with the arrival of the Corded-Ware culture and, notably, the spread of agriculture in Finland. The arrival and spread of agriculture, temporally corresponding with the age estimates for most of the haplogroups characteristic of Finns, might be a sign of population size increase enabled by the new mode of subsistence, resulting in reduced drift and accumulation of genetic diversity in the population.


Another insight in the past population sizes in Finland is based on radiocarbon-dated archaeological findings in different time periods. These analyses suggest two prehistoric population peaks in Finland, the Stone Age peak (c. 5,500 ybp) and the Metal Age peak (~1,500 ybp). Both of these peaks were followed by a population decline, which appears to have reached its ebb around 3,500 ybp. These developments are not distinguishable in the BSPs. However, these ages correspond well to the two haplogroup age clusters described above. The presumably less severe Iron Age population bottleneck seen in the archaeological data, 1,500–1,300 ybp, temporally coincides with the population size reduction visible for the Finn-characteristic subhaplogroups.
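To line up the paper's ‘ybp’ figures with the BC/AD dates I use above, here is a minimal sketch of the conversion. It assumes that ‘present’ means roughly the year 2000 (my assumption; the radiocarbon convention for ‘BP’ is 1950) and rounds to the nearest century, since all these dates are approximate anyway. The function name and parameters are illustrative.

```python
def ybp_to_calendar(ybp, present=2000, round_to=100):
    """Convert years before present into an approximate BC/AD label.

    `present` (reference year) and `round_to` (rounding step) are
    assumptions made for this illustration.
    """
    year = present - ybp  # astronomical year: values <= 0 fall before 1 AD
    if year <= 0:
        year_bc = 1 - year  # there is no year 0, so astronomical 0 is 1 BC
        return f"ca. {round(year_bc / round_to) * round_to} BC"
    return f"ca. {round(year / round_to) * round_to} AD"


for ybp in (5500, 3500, 1500):
    print(ybp, "ybp ->", ybp_to_calendar(ybp))
```

Under these assumptions the ebb around 3,500 ybp corresponds to the decline ca. 1500 BC, and the Metal Age peak at about 1,500 ybp to the peak ca. 500 AD mentioned above.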


Discovered via Eurogenes.