Visualizing phylogenetic trees of ancient DNA on a map

Yesterday the Eaton Lab at Columbia University announced on Twitter a nifty little tool by Carlos Alonso Maya-Lastra called TreeToM, which accepts Newick trees and CSV latitude/longitude data to explore phylogeny and geography interactively, with no coding required.

I thought it could nicely complement my All Ancient DNA Dataset, particularly for those newly described SNPs (FTDNA private variants, etc.) that have not yet been incorporated into SNP Tracker.

Here are two examples with snippets to copy and paste into the appropriate boxes in TreeToM. Feel free to add others in the comments:

A. Epipalaeolithic R1b-V2219, Neolithic R1b-V88

The map below was published in the recent Marcus et al. (2019), who were probably right in assuming that R1b-V88 expanded with the Neolithic, even though they relied on a simpler reasoning based on ADMIXTURE and on the distinction of SNP V2197 (currently an equivalent of FTDNA’s FGC21027 or YFull’s Y8451) within the R1b-(pre-)V88 trunk:

r1b-v88-europe
Geographic and temporal distribution of R1b-V88 Y-haplotypes in ancient European samples. We plot the geographic position of all ancient samples inferred to carry R1b-V88 equivalent markers. Dates are given as years BCE (means of calibrated 2σ radiocarbon dates). Multiple V88 individuals with similar geographic positions are vertically stacked. We additionally color-code the status of the R1b-V88 subclade R1b-V2197, which is found in most present-day African R1b-V88 carriers. Modified from Marcus et al. (2019).

A.1. Phylogenetic tree

First, we need the most likely phylogenetic tree based on known SNPs. In this case, due to FTDNA and YFull lagging behind ancient DNA, Pribislav’s V2219 tree is the most precise description available to date:

pribislav-r1b-v2219-haplotree
Pribislav’s R1b-V2219 tree.

Once we have a reasonable haplotree, we need to convert it to the Newick format (see specification here). There are as of today no estimated formation / TMRCA dates for all SNPs of the V2219 tree, so we will leave the tentatively described branches with the same node distance (hence the “0.1” in all).

NOTE. Blank spaces (imitating tabs) are added only for clarity. I have added internal node names as the Newick specification allows, but TreeToM apparently does not support them at the moment, nor does it seem common for other tools using Newick trees to display them.

('(L754)':0.01,
  ('V2219*':0.01,
  '(V2219)':0.01,
    ('PF6340*':0.01,
    '(PF6340)':0.01,
      ('Y8457*':0.1,
      '(Y8457)':0.1,
        ('V88*':0.1,
        '(V88)':0.1,
          ('M18*':0.1,
          '(M18)':0.1
        )M18:0.1,
          ('Y7777*':0.1,
          '(Y7777)':0.1,
            ('Y8451*':0.1,
            '(Y8451)':0.1,
            'Y8447':0.1,
              ('V35*':0.1,
              '(V35)':0.1
              )V35:0.1
            )Y8451:0.1
          )Y7777:0.1
        )V88:0.1
      )Y8457:0.1
    )PF6340:0.1
  )V2219:0.1
)L754;
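If you would rather not type the parentheses by hand, the conversion can be scripted. Below is a minimal Python sketch, not part of TreeToM: the `(name, children)` tuple layout and the function names `quote`, `subtree` and `to_newick` are my own assumptions for illustration. It quotes every tip label (needed for names containing parentheses), leaves internal node names unquoted as in the tree above, and gives every branch the same length.

```python
def quote(label):
    """Wrap a label in single quotes, doubling any embedded quote (Newick rule)."""
    return "'" + label.replace("'", "''") + "'"


def subtree(node, length):
    """Serialize a (name, children) tuple as a Newick subtree with a branch length."""
    name, children = node
    if not children:
        # Tip: quoted label plus the uniform branch length.
        return f"{quote(name)}:{length}"
    inner = ",".join(subtree(child, length) for child in children)
    # Internal node: children in parentheses, then the (unquoted) node name.
    return f"({inner}){name}:{length}"


def to_newick(root, length=0.1):
    """Serialize the whole tree; the root carries a name but no branch length."""
    name, children = root
    inner = ",".join(subtree(child, length) for child in children)
    return f"({inner}){name};"


# Hypothetical fragment of the haplotree, using the same naming scheme as above.
m18 = ("M18", [("M18*", []), ("(M18)", [])])
v88 = ("V88", [("V88*", []), ("(V88)", []), m18])
print(to_newick(v88))
# ('V88*':0.1,'(V88)':0.1,('M18*':0.1,'(M18)':0.1)M18:0.1)V88;
```

The output is a single line, which is equivalent to the indented version since whitespace between elements is not significant in Newick.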

A.2. Samples and latitude/longitude

Secondly, we need to sort the dataset to isolate the samples under the old pre-R1b-V88 trunk, or R1b-L754 (xL388), i.e. within the recently described R1b-V2219 tree.

NOTE. I also included the Villabruna sample because of its most likely R1b-(pre-?)V2219 lineage, but not the LBK sample in Nikitin et al. (2019), whose subclade (if it is, in fact, of hg. R) is probably also well within the V2219 trunk.

r1b-v2219-v88
All Ancient DNA spreadsheet sorted by samples belonging to the old R1b-V88 trunk.

Now we can select only the columns with subclade (using the same names as for the SNP branches above), latitude, and longitude, add another one with web colors, and then save the file as CSV, so that we have an output similar to this one:

(L754),46.15,12.21,black
V2219*,44.6258333,22.6066666,black
V2219*,44.6258333,22.6066666,black
V2219*,44.5154854,23.52087725,black
V2219*,44.640262,22.30333,black
V2219*,44.629711,22.6125561,black
V2219*,44.552924,22.027563,black
V2219*,44.595879,22.010568,black
V2219*,51.79,11.14,black
(V2219),48.91422,33.76493,green
(V2219),39.866642,9.233355,green
(V2219),39.866642,9.233355,green
(V2219),44.29,4.46,green
(V2219),44.517,22.722,green
(V2219),44.517,22.722,green
PF6340*,44.595879,22.010568,blue
PF6340*,44.595879,22.010568,blue
PF6340*,44.595879,22.010568,blue
PF6340*,44.53,22.05,blue
PF6340*,51.358333,7.551111,blue
(PF6340),48.91422,33.76493,#BA55D3
(PF6340),48.91422,33.76493,#BA55D3
(PF6340),48.91422,33.76493,#BA55D3
(PF6340),43.2131,27.8644,#BA55D3
(PF6340),51.358333,7.551111,#BA55D3
(PF6340),41.49194444,2.138888889,#BA55D3
(PF6340),41.49194444,2.138888889,#BA55D3
(Y8457),48.3,35.139,violet
(Y8457),48.91422,33.76493,violet
(Y8457),48.91422,33.76493,violet
(Y8457),48.91422,33.76493,violet
(Y8457),48.91422,33.76493,violet
(Y8457),48.91422,33.76493,violet
(Y8457),48.91422,33.76493,violet
(Y8457),43.05777778,26.98361111,violet
(V88),39.87,9.2356,red
(V88),39.86666667,9.233333333,red
(V88),39.86666667,9.233333333,red
(V88),39.9489,8.4022,red
(V88),39.87,9.2356,red
M18*,39.87,9.2356,brown
(M18),39.87,9.2356,#8B0000
(M18),39.87,9.2356,#8B0000
Y8451*,42.5,0.5,#E9967A
Y8451*,39.1632,8.5326,#E9967A
(Y8451),44.53,22.05,#F08080
(Y8451),41.96,13.54,#F08080
(Y8451),42.5,0.5,#F08080
(Y8451),42.7343714,-7.0305368,#F08080
V35*,39.4397,9.1688,orange
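The column selection and colouring described above can also be done with a few lines of Python instead of a spreadsheet. This is only a sketch under assumptions: the header names `Subclade`, `Latitude` and `Longitude`, the sample IDs, and the `to_treetom_csv` helper are hypothetical and need to be adapted to the actual export. Assigning colours from a lookup table also avoids spreadsheet autofill silently altering hex codes.

```python
import csv
import io

# Colour per subclade label; extend with the remaining branches as needed.
COLORS = {"V2219*": "black", "(V2219)": "green", "PF6340*": "blue"}


def to_treetom_csv(rows, colors, default="gray"):
    """Return headerless 'label,lat,lon,color' lines from dict-like rows."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        label = row["Subclade"]
        # Labels without an entry in the lookup table get the default colour.
        writer.writerow([label, row["Latitude"], row["Longitude"],
                         colors.get(label, default)])
    return out.getvalue()


# Hypothetical two-row export; the coordinates are taken from the list above.
export = io.StringIO(
    "Sample,Subclade,Latitude,Longitude\n"
    "ex1,V2219*,44.640262,22.30333\n"
    "ex2,(V2219),44.29,4.46\n"
)
print(to_treetom_csv(csv.DictReader(export), COLORS), end="")
# V2219*,44.640262,22.30333,black
# (V2219),44.29,4.46,green
```

For a real file, replace the `io.StringIO` object with `open("export.csv", newline="")`.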

A.3. TreeToM output

This is the output obtained by copy&pasting both snippets above in TreeToM:

r1b-v2219-phylogeography

You can also use a simpler tree and color scheme, for easier visualization:

ancient-r1b-v88-y-dna

As can be readily seen, whereas R1b-V2219 lineages spread with the Epipalaeolithic expansion of WHG-related peoples, R1b-V88 and its subclades show a secondary bottleneck coinciding with the Neolithic expansion, crossing from Europe into Northern Africa most likely with impressed ware through the Central Mediterranean.

neolithic-barley
Map of the Neolithic expansion coupled with barley landrace data from Genesys, by Luigi Guarino.

That puts an end to my attempts to integrate traditional views – such as the oft-cited Diakonoff (1998) – in tracing Proto-Afroasiatic to ca. 10000 BC (or even earlier!) to potential Mesolithic expansions, and further supports that the spread of Afro-Asiatic speakers was linked to pastoralist movements through the Green Sahara, as shown in Prendergast et al. (2019) and hinted at by reconstructible PAA vocabulary.

NOTE. It seems quite fitting that population genomics helps dismiss the idea of a Stone Age Afro-Asiatic, as it helped dismiss the similarly (guesstimate-wise) far-fetched Anatolia Neolithic homeland for Proto-Indo-Europeans…

prendergast-africa-herders-dna
Admixture events contributing to ancestry of ancient eastern Africans. Results were inferred from genome-wide ancient DNA data from 41 individuals from archaeological sites in Kenya and Tanzania, analyzed together with published ancient and present-day genetic data. Black circles represent reported individuals, placed at their median calibrated radiocarbon dates (six individuals, five of whom have forager-related ancestry, had insufficient collagen for dating and thus are not represented here). Ancestry components depicted in green and gray continue to the present day (outside of eastern Africa) but are truncated for readability. Image modified from Prendergast et al. (2019).

This complicates the picture in terms of male lineages because, as is well known by now, different Y-DNA bottlenecks can be observed in the spread of the Neolithic through the Mediterranean (and, it is to be assumed, through Northern Africa), including G2a, R1b, J2b, T1a, or E1b subclades, and that without taking into account potential East Mediterranean expansions directly from the Levant.

On the other hand, based on D’Atanasio et al. (2018), it seems that by the end of the Green Sahara period, only a handful of lineages show a Pan-African expansion compatible with a massive language replacement event.

green-sahara-r1b-v88-v69-e1b-m-78
Time estimates and frequency maps of the four trans-Saharan haplogroups and major sub-clades. Left: Time estimates of the four trans-Saharan clades and their main internal lineages. To the left of the timeline, the time windows of the main climatic/historical African events are reported in different colours (legend in the upper left). Right: Frequency maps of the main trans-Saharan clades and sub-clades. For each map, the relative frequencies (percentages) are reported to the right.

A.4. Comparison with other maps

Although information on Y-DNA of ancient samples can be read directly from the All Ancient DNA Dataset, or visualized in context in the Prehistory Atlas, TreeToM offers a nice visualization that can help understand the geographic evolution of ancient SNPs.

For comparison, here is a dynamic OpenLayers map of samples in the R1b-V2219 trunk with a colour code similar to the one used above (click on the symbols to display more information):

ancient-dna-r1b-v2219-v88

You can also look for these samples in the ancient Y-DNA maps divided by ages:

chalcolithic-early-y-dna-map

For a clear picture of this haplogroup’s relevance in Africa, also check out the map of the modern distribution of R1b-V88 in the maps of basal R1-M173 subclades.

B. Neolithic J2b-L283

At some point in late 2018 – 2019, it seemed possible that hg. J2b-L283 could have emerged as a minority among the R1b-rich Proto-Indo-Europeans of the Don–Volga region, only to become more successful within Balkan EBA cultures, more or less like I2a-L699 or R1b-M269(xL23) lineages.

The discovery of a basal J2b-L283* in the Caucasus MBA sample reported by Wang et al. (2018) reinforced that idea, and no one could rule out that, much like R1b-V1636 or R1a (xM417), some other minor lineages had expanded with Khvalynsk and (later) Yamnaya. The TMRCA of European clades was particularly fitting.

However, the ancient Sardinian samples reported in Marcus et al. (2019) – showing basal subclades and no Steppe ancestry – have shattered that expectation, and the Khvalynsk sample of hg. J reported by Anthony (2019) turned out to be J1.

In fact, the reported Etruscan from Civitavecchia by Antonio et al. (2019) complicates things further, because it is of the same subclade as the Balkan BA sample from Veliki Vanik, potentially evidencing a wider distribution of J2b-Y15058 among non-homogeneous Southern European Bronze Age communities.

B.1. Phylogenetic tree

This is a simplistic Newick tree made from YFull’s current public tree for J2b-L283:

NOTE. Node distances have been selected for better visualization, not (as they should be) by taking estimated formation or TMRCA dates into account. For simplicity, this tree does not include basal subclades – which would be essential to visualize the proper migration paths if there were many samples available.

('L283*':0.6,
'L283':0.6,
  ('YP91':0.7
  )YP91:0.1,
  ('Z600':0.7,
    ('YP157':0.6
    )YP157:0.1,
    ('Z585':0.1,
      ('Z615':0.1,
        ('Z597':0.1,
          ('Z2507':0.1,
            ('Y15058':0.1,
              ('Z38240':0.1,
                ('CTS6190':0.1
                )CTS6190:0.1
              )Z38240:0.1
            )Y15058:0.1
          )Z2507:0.1,
          ('Z638':0.1,
            ('Z1297':0.1,
              ('Y27522':0.1,
                ('Y23094':0.1
                )Y23094:0.1
              )Y27522:0.1,
              ('Z1295':0.1,
                ('Z8421':0.1,
                  ('Z631':0.1
                  )Z631:0.1
                )Z8421:0.1
              )Z1295:0.1
            )Z1297:0.1
          )Z638:0.1
        )Z597:0.1
      )Z615:0.1
    )Z585:0.1
  )Z600:0.1
)L283;

B.2. Ancient samples

These are J2b-L283 lineages selected from the All Ancient DNA Dataset:

ancient-dna-j2b-l283

L283,40.15,45.86,green
L283*,43.354032,43.721893,black
Z600,39.667222,9.426111,blue
Z631,41.9179937,12.4912439,orange
Y23094,41.6877312,13.0864953,brown
Y15058,43.194,17.344,red
CTS6190,42.0888597,11.7656459,violet
YP157,39.667222,9.426111,aqua
YP91,39.667222,9.426111,lime
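Since each CSV row is tied to the tree by its label, a quick sanity check before pasting can save some head-scratching. The sketch below assumes that TreeToM links rows to tips by exact label match, which I have not verified beyond these examples; the helper names `quoted_labels` and `unmatched_labels` are my own. It lists the CSV labels that do not appear as quoted tip labels in the Newick string.

```python
import re


def quoted_labels(newick):
    """Collect every single-quoted label (the tips) from a Newick string."""
    return set(re.findall(r"'([^']*)'", newick))


def unmatched_labels(newick, csv_text):
    """Return the sorted CSV labels (first column) missing from the tree tips."""
    tips = quoted_labels(newick)
    csv_labels = {line.split(",")[0].strip()
                  for line in csv_text.splitlines() if line.strip()}
    return sorted(csv_labels - tips)


# Truncated tree for illustration: Z631 is deliberately left out of it.
tree = "('L283*':0.6,'L283':0.6,('YP91':0.7)YP91:0.1)L283;"
samples = (
    "L283,40.15,45.86,green\n"
    "YP91,39.667222,9.426111,lime\n"
    "Z631,41.9179937,12.4912439,orange\n"
)
print(unmatched_labels(tree, samples))
# ['Z631']
```

An empty list means every sample label has a tip of the same name. The regular expression does not handle labels that themselves contain single quotes, which none of the SNP names here do.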

B.3. TreeToM output

This is the output obtained by copy&pasting both snippets above in TreeToM:

ancient-y-dna-j-l283

B.4. Comparison with other tools

For comparison, here is the path created by SNP Tracker for J-Z631 based on modern descendants, following FTDNA’s public Y-DNA haplotree, showing uncertainty areas:

snp-tracker-j2b-l283

And here is the SNP Tracker path for J2b-CTS6190, the subclade found in the Etruscan sample from Civitavecchia (zoomed to Europe):

j2b-l283-cts6190

Referencing TreeToM

How to cite TreeToM, according to the author on Twitter:

Maya-Lastra, C. A. 2020. TreeToM, Simple tool for visualizing phylogenetic trees onto a map. camayal.info/wa/treetom/
