The previous post showed the potential use of TreeToM to visualize ancient DNA samples in maps together with their Y-DNA phylogenetic trees. I have written Newick trees for Y-chromosome haplogroups R1b-L388 (encompassing R-V1636 and R-P297, which in turn split into R-M73 and R-M269), R1a, and N.
I have reviewed some of the BAM files from my previous bulk analyses with YLeaf v.2, to add information that I had not previously included in the All Ancient DNA Dataset, and which might be relevant to the proper depiction of phylogenetic trees; in particular, positive and negative SNPs potentially distinguishing archaic subclades (“pre-” and “basal” branches) within the R1a, R1b, and N trunks:
NOTE. I don’t currently have time for much more, and SNP inference is a kind of “art” that requires experience assessing hundreds of ancient and modern samples (their available ancestral and derived reads, and what they imply). Very few people have that experience, and fewer still dedicate time to ancient DNA, so if you see errors, just contact me.
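To make the “pre-” and “basal” distinction concrete, here is a minimal Python sketch of how positive and negative calls could be turned into a label such as P297*. It is not the procedure used for the dataset: the function name `label_sample`, the `+`/`-` encoding of calls, and the rule "basal only if every modeled child branch is tested and ancestral" are assumptions made for illustration. The topology is limited to the R1b-L388 branches named above.

```python
from typing import Dict, List, Optional

# Topology as described in this post for R1b-L388; only these branches are
# modeled, so this is an illustration and not a complete tree.
CHILDREN: Dict[str, List[str]] = {
    "L388": ["V1636", "P297"],
    "P297": ["M73", "M269"],
}


def label_sample(calls: Dict[str, str], root: str = "L388") -> Optional[str]:
    """Label a sample from its SNP calls (hypothetical helper).

    calls maps a SNP name to "+" (derived read) or "-" (ancestral read);
    a SNP missing from the dict is treated as not covered.
    """
    if calls.get(root) != "+":
        return None  # no evidence that the sample belongs to this trunk
    node = root
    while True:
        kids = CHILDREN.get(node, [])
        derived = [k for k in kids if calls.get(k) == "+"]
        if len(derived) > 1:
            # Two sister branches cannot both be derived: flag for manual review.
            raise ValueError(f"conflicting derived calls under {node}: {derived}")
        if derived:
            node = derived[0]  # follow the derived branch downstream
            continue
        if kids and all(calls.get(k) == "-" for k in kids):
            return node + "*"  # every modeled child branch is ancestral: basal
        return node  # terminal node, or downstream branches not fully covered


print(label_sample({"L388": "+", "P297": "+", "M73": "-", "M269": "-"}))
print(label_sample({"L388": "+", "P297": "+", "M73": "-"}))
```

The first call prints `P297*`, because both modeled downstream branches are negative. The second prints plain `P297`, because M269 is not covered and the sample is therefore undefined rather than basal. Real cases also have to weigh read counts and damage patterns, which this sketch ignores.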
1. Haplogroup K split in Northern Asia
It seems quite likely that the expansions of early K subclades accompanied the different waves of ancestry described for AMH, Upper Palaeolithic and Epipalaeolithic expansions, although it is still unclear which precise route(s) R1a and R1b carriers took and when, because the few available ancient samples (and modern DNA distributions) are of little help regarding potential northern vs. southern Caspian paths for ca. 20,000-10,000 years ago.
2.1. Haplogroup R1b Newick tree
This is a simplified Newick tree for the available samples within the R1b-L388 trunk:
NOTE. I have used a simplified adjustment of node distances to reflect the TMRCA of early branches (where 0.1 is roughly a thousand years), but then I have kept 0.1 for L23 and subsequent branches to avoid overcrowding.
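The distance convention in the note above can be sketched in Python as follows. This is an illustration under stated assumptions, not the script used to write the actual tree: the function `to_newick` is hypothetical, and the ages are round placeholder numbers chosen only to make the arithmetic visible, not TMRCA estimates. The only things taken from the post are the topology, the scale (0.1 per thousand years, so one unit per 10,000 years), and the fixed 0.1 for L23 and later branches.

```python
from typing import Dict, FrozenSet, List, Optional

YEARS_PER_UNIT = 10000.0  # 0.1 units is roughly a thousand years
FIXED_LENGTH = 0.1        # used for L23 and subsequent branches


def to_newick(node: str,
              children: Dict[str, List[str]],
              age_bp: Dict[str, float],
              fixed: FrozenSet[str] = frozenset(),
              parent_age: Optional[float] = None) -> str:
    """Serialize the subtree under `node` as Newick text, without the final ';'."""
    kids = children.get(node, [])
    subtree = ""
    if kids:
        parts = [to_newick(k, children, age_bp, fixed, age_bp.get(node))
                 for k in kids]
        subtree = "(" + ",".join(parts) + ")"
    text = subtree + node
    if node in fixed:
        # Branches listed in `fixed` get a constant length to avoid overcrowding.
        text += f":{FIXED_LENGTH:.2f}"
    elif parent_age is not None:
        # Branch length is the age difference to the parent, rescaled.
        text += f":{(parent_age - age_bp[node]) / YEARS_PER_UNIT:.2f}"
    return text  # the root has no parent and therefore no branch length


children = {"L388": ["V1636", "P297"], "P297": ["M73", "M269"], "M269": ["L23"]}
# PLACEHOLDER ages in years before present, for illustration only.
age_bp = {"L388": 20000, "V1636": 10000, "P297": 15000, "M73": 5000, "M269": 5000}

tree = to_newick("L388", children, age_bp, fixed=frozenset({"L23"})) + ";"
print(tree)  # (V1636:1.00,(M73:1.00,(L23:0.10)M269:1.00)P297:0.50)L388;
```

Internal node labels (here P297, M269, L388) are written after the closing parenthesis of their subtree, which is standard Newick.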
This is the list of available ancient samples within the R1b-L388 tree ordered by date:
NOTE. I have tried to remove all samples with undefined subclades whenever they are not informative for the specific period and region (viz. undefined L23 in the Bell Beaker period, undefined L151 or P312 in the Bronze Age, etc.).
NOTE. Not included is the potential R1b-M269 sample with Steppe ancestry from the Balkan Chalcolithic Smyadovo site, due to conflicting negative SNPs (for P) potentially indicating damage. If it is within the R trunk, as officially reported, then it’s likely R1b-M269, too, as it likely expanded with Suvorovo chiefs.
2.3.2. Haplogroup R1b in Yamnaya- and Afanasievo-related peoples
These are the samples and corresponding TreeToM map of Yamnaya, Afanasievo, and other Early Bronze Age groups. Particularly interesting are the basal L23* and Z2103* subclades found in the Don-Volga region, supporting Anthony’s (2019) description of it as the cradle of Proto-Indo-Europeans.
NOTE. The Corded Ware sample from Obłaczkowo, RISE1, is not included due to lack of subclade beyond R1b-L754, and especially because of the conflicting archaeological context (hence origin and radiocarbon date), given that the tested individual should be the same as poz44, which has a different Y-DNA and mtDNA…
2.3.3. Haplogroup R1b in Bell Beaker, Balkan, and Poltavka expansions
These are the samples and corresponding TreeToM map of European Late Neolithic – Early Bronze Age groups, such as Bell Beakers, Balkan BA, Catacomb, Poltavka. Particularly interesting are:
High variability, as expected of the Bell Beaker source – including basal subclades L51*, L151*, P310*, as well as Z2103 (and dubious U106, which I haven’t included despite the confirmed slightly later sample from Únětice) – among the East BBC Group (see another map with basal subclades).
The following map is a close-up of the Bell Beaker and EEBA area and associated R1b lineages. Even without including the dubious U106 samples from Bavarian, Czech, and Hungarian BBC groups, it is evident from Bell Beaker and European Early Bronze Age data combined that the common origin of expansion of Yamnaya lineages with Bell Beakers was located in Central Europe closest to the Middle Danube, with the periphery showing bottlenecks under specific subclades.
These are the samples and corresponding TreeToM map of Bronze Age groups from Eurasia. Particularly relevant in this period is the emergence of Yamnaya R1b-L23 lineages among Mycenaeans (Z2103 and potentially also xZ2103), and among early Italic-speaking peoples (basal Z2118* and Z2103, apart from L151), which – together with Pre-Tocharian Afanasievo and Pre-Indo-Iranian Poltavka – confirms the association of the expansion of R1b-L23-rich Yamnaya with the spread of Late Proto-Indo-European.
And this is a close-up of Europe:
3. Haplogroup R1a East European expansion
3.1. Haplogroup R1a Newick tree
I have used a simplified adjustment of node distances to reflect the TMRCA of early branches (where 0.1 is roughly a thousand years), but then I have kept 0.1 for M417 and subsequent branches to avoid overcrowding.
This is the list of R1a cases used for the maps in the required CSV format:
NOTE. I have removed all samples with undefined subclades not informative for later dates (viz. undefined M459 in EMBA, undefined M417 in LBA, etc.). I have also removed the unlikely R1a cases from Cis-Baikal Neolithic samples, as well as the unlikely R-M198 (xZ93) from the Xiaohe cemetery, since it is more straightforward to assume that any Andronovo-related R1a case from Xinjiang should be, in fact, Z93+.
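The pruning rule in the note above can be expressed as a small filter. The Python sketch below is only an illustration: the column names (`id`, `subclade`, `period`) and the sample rows are invented and do not reproduce the CSV format TreeToM requires. The two (subclade, period) pairs are the examples given in the note.

```python
import csv
import io

# (subclade, period) pairs that the note gives as uninformative examples.
UNINFORMATIVE = {("M459", "EMBA"), ("M417", "LBA")}


def keep_row(row: dict) -> bool:
    """Keep a sample unless its undefined subclade is uninformative for its period."""
    return (row["subclade"], row["period"]) not in UNINFORMATIVE


# Invented rows with hypothetical column names, for illustration only.
SAMPLE_CSV = """id,subclade,period
s1,M459,EMBA
s2,M417,LBA
s3,M417,EMBA
s4,Z93,LBA
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
kept = [r["id"] for r in rows if keep_row(r)]
print(kept)  # ['s3', 's4']
```

A lookup table like this keeps the decision explicit for each period, which is closer to the case-by-case judgment described in the note than a blanket rule on tree depth would be.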
These are the Mesolithic and Neolithic samples up until just before the Corded Ware expansion, and the map they form (with slightly modified colors for easier visualization):
NOTE. The R1a-Y3* sample from Alexandria (I6561) has a radiocarbon date ca. 4000 BC that should be questioned based on its fully formed Corded Ware-like ancestry (different from coeval or later samples from the area), Y3 subclade, and LP alleles appearing 1,000 years earlier than they should, but I include it nevertheless in case its real date corresponds to a Pre-Yamnaya or Yamnaya phase (say, ca. 3500-2900 BC) from the Middle Dnieper.
3.3.2. Haplogroup R1a in Corded Ware and MLBA Sintashta-Potapovka
It is hard not to notice that the division of R1a-M417 subclades before and during the expansion of Corded Ware groups runs ostensibly through the Volhynian-Podolian Upland, exactly where archaeologists have traditionally estimated the origin of the Corded Ware culture. That is, until the fiasco of conventionally selecting a poorly understood (and catastrophically named) “Yamnaya ancestry” component as defining a single population, which has thrown the field into disarray.
3.3.3. Haplogroup R1a in the Bronze Age
These are the samples from the Bronze Age until just before the Scythian and later expansions from Asia. The most interesting aspect in Europe is the apparent replacement of most R1a-CTS4385 lineages prevalent in CWC groups of Central Europe, which can be attributed to the massive migration of Bell Beakers.
The later gradual expansion of Z645 subclades in the Eastern EEBA province – seen in the Bronze Age samples from Iwno-Trzciniec and Chłopice-Veselé (in the previous map), and later in the Halberstadt LBA sample – suggests resurgence events and bottlenecks among expanding Balto-Slavs.
The assessment of haplogroup N is complicated by the current lack of a proper regional and temporal transect of Siberia. Nevertheless, the different available patterns in selected periods shed some light on the most likely causes of its bottlenecks.
4.1. Haplogroup N Newick tree
This is the Newick tree for the reported N subclades (no distance between nodes for clarity):
4.3.1. Haplogroup N Mesolithic and Neolithic expansions
These are the early N subclades available and the corresponding map, which shows the most likely origin of the split of N lineages west of Lake Baikal:
NOTE. I did not include the reported N-Tat of a Comb Ware group by Chekunova because of the dubious combination of subclade and radiocarbon date in that location, although admittedly N-Tat is old enough to appear anywhere during the Neolithic…
4.3.2. Haplogroup N Eurasian Bronze Age
These are the available Bronze Age samples until just before the (Pre-Scythian) nomadic expansions. Notice the L1026* from Lovozero related to the expansion of Palaeo-Arctic peoples, and the L1026 from Khövsgöl in the Eastern Steppes, a region under the impact of Karasuk-related ancestry, suggesting its presence among (and expansion with) the Scythian communities of Siberia. Both cases help explain the later emergence of the so-called “Siberian ancestry” in Eastern and Northern Europe.
4.3.3. Haplogroup N and Iron Age nomads
The westward expansion of nomads throughout Northern Eurasia brought different haplogroups from Siberia across the Urals, with hg. N probably already prevalent among Iranian-, East Uralic-, and Altaic-speaking populations of Asia and possibly already on both sides of the Urals due to the integration of forest hunter-gatherers among expanding Corded Ware-related groups.
The Siberian origin of N lineages pushed into the Cis-Urals by Altaic-speaking peoples – some of which eventually spread with the Hungarian Conquerors – can be inferred from previous Iron Age expansions, as well as from the modern wide distribution of N-Y13850 subclades among disparate Turkic-speaking populations, from Turan to Samara in the north, and to Anatolia in the south.
Similarly, many N1a-VL29 (especially N1a-L550) subclades found in the East Baltic today spread eastwards with Germanic-speaking peoples during the Viking migrations in the Middle Ages, adding to the known radical founder effects that happened roughly at the same time as Finnic spread to the north and east from its Estonian homeland.
There is an obvious complementary distribution of R1b-M269 vs. R1a-M198, particularly R1b-L23 vs. R1a-M417, precisely during the relevant period of Indo-European vs. Uralic splits and expansions from Eastern Europe.
Only after the evolution of Bell Beakers into the European Early Bronze Age cultures, and of Poltavka herders into the Sintashta-Potapovka-Filatovka community, did different R1a subclades appear integrated into disparate (previously R1b-rich) Indo-European-speaking groups, expanding further under distinct regional bottlenecks corresponding to already developing dialects.
And only much later did N subclades, probably widely distributed among Palaeosiberian peoples during the Seima-Turbino phenomenon, start to have an impact within R1a-rich Uralic-speaking communities around the Urals, Arctic, and West Siberia, at the same time as other haplogroups (such as I1, I2, E1b, J2, or J1 subclades) either resurged or infiltrated and later expanded with Indo-European-speaking societies in Europe, too.
The relevance of proper SNP inference for tracking fine-scale population movements alongside ancestry estimations, in order to properly distinguish potential prehistoric language expansions, has already become evident (see e.g. the recent Mittnik et al. 2019). It is amazing that in 2020 many labs are still working with outdated software or simplistic automated inferences for haplogroup assignments, instead of carefully reviewing each sample’s available data by hand.