Ancient Y-DNA haplogroups

This page hosts inferences of ancient human Y-chromosomal haplogroups from next-generation sequencing data, performed with the software Yleaf v.2. For information on data and interpretation, refer to the manual at Erasmus MC Resources.

As a quick reference on data interpretation, please read papers on ancient DNA damage, such as Dabney, Meyer, and Pääbo (2013), to understand which derived (and ancestral) SNP calls may be wrong.

Fragmentation and deamination. (A) A likely cause of fragmentation in ancient DNA is depurination, in which the N-glycosyl bond between a sugar and an adenine or guanine residue is cleaved, resulting in an abasic site. The DNA strand is then fragmented through β elimination, leaving 3′-aldehydic and 5′-phosphate ends. (B) Deamination of cytosine to uracil is the major mechanism leading to miscoding lesions in ancient DNA. DNA polymerases will incorporate an A across from the U, and in turn a T across from the A, causing apparent G to A and C to T substitutions.
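As a rough aid for reading the notes below, the deamination pattern described in the caption can be turned into a small check. This is only a sketch, not part of Yleaf: the function names are hypothetical, and it covers just the simplest case, a derived call whose ancestral->derived change is C->T or G->A and which could therefore be an artefact of damage rather than a true mutation.

```python
# Substitution types that post-mortem cytosine deamination can mimic:
# C->T on the damaged strand, read as G->A on the complementary strand.
DAMAGE_PRONE = {("C", "T"), ("G", "A")}


def parse_change(change):
    """Split a string such as 'C->T' into the pair ('C', 'T')."""
    ancestral, derived = change.split("->")
    return ancestral.strip().upper(), derived.strip().upper()


def is_damage_prone(change):
    """Return True if a derived call for this change matches the deamination pattern."""
    return parse_change(change) in DAMAGE_PRONE


if __name__ == "__main__":
    for change in ["C->T", "G->A", "C->G", "T->C"]:
        label = "possibly damage" if is_damage_prone(change) else "not the deamination pattern"
        print(change, label)
```

A call flagged this way is not necessarily wrong; it only means that a single read cannot distinguish a real derived state from damage.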

For the most recent automated Y-SNP calls obtained with Yleaf v.2.3 or later over FASTQ files, including discussion of individual samples, see:

These are recent automated Y-SNP calls performed by Yleaf v.2.2 over FASTQ files (careful – it does not distinguish male from female samples):

The following are folders with results (in text files) for most ancient samples on which I have run Yleaf v.2 or v.2.2 (for some detailed examples with a more colorful presentation, see below):

If you are interested in adding a specific sample to the list, you can contact me.

Ancient DNA examples

The following examples are ordered chronologically, following the official date of the published papers. Most samples are tested with two sets of options (labelled in the output spreadsheet):

  1. Option 20/90, to test standard quality:
    • Minimum number of reads for each base above the quality threshold: 1
    • Minimum quality for each read: 20
    • Minimum percentage of a base result for acceptance: 90
  2. Option 0/10, to examine most SNP calls:
    • Minimum number of reads for each base above the quality threshold: 1
    • Minimum quality for each read: 0
    • Minimum percentage of a base result for acceptance: 10
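To make the effect of the two option sets concrete, here is a minimal sketch of how three such thresholds could be applied to the reads covering one SNP position. It is an illustration under my own assumptions, not Yleaf's actual code: the function name call_base, its parameters, and the use of "greater than or equal" for every threshold are all hypothetical.

```python
from collections import Counter


def call_base(reads, min_reads=1, min_quality=20, min_percent=90):
    """Return the accepted base at one position, or None if no call is made.

    reads: list of (base, phred_quality) tuples covering the position.
    """
    # Keep only reads whose base quality reaches the quality threshold.
    kept = [base for base, quality in reads if quality >= min_quality]
    if not kept or len(kept) < min_reads:
        return None
    # Take the most frequent base and check its share of the kept reads.
    base, count = Counter(kept).most_common(1)[0]
    percent = 100.0 * count / len(kept)
    return base if percent >= min_percent else None


if __name__ == "__main__":
    reads = [("T", 10), ("T", 15)]
    print(call_base(reads, min_quality=20, min_percent=90))  # None: both reads are below quality 20
    print(call_base(reads, min_quality=0, min_percent=10))   # T
```

In this toy case the 20/90 setting makes no call, while 0/10 reports the low-quality T, which is why the second option shows more SNP calls at the cost of reliability.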

Ning et al. Cell (2019)

Proto-Tocharian samples from Shirenzigou, officially reported as of hg. R1b1a1b-M269.

Sample_name: M15-1
Hg: R1b2
Hg_marker: R-PH491/etc*(xY105682)
Total_reads: 3141489
Valid_markers: 4337
QC-score: 1.0
QC-1: 1.0
QC-2: 1.0
QC-3: 1.0
See full output.


Sample_name: M012
Hg: R1b2b
Hg_marker: R-BY14575*(xY32793,Y104457)
Total_reads: 2695398
Valid_markers: 3631
QC-score: 1.0
QC-1: 1.0
QC-2: 1.0
QC-3: 1.0
See full output.
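For anyone who wants to collect these summaries in a table, the blocks on this page are simple enough to parse. The following is a sketch only: the field names are taken from the blocks as shown here, while the function parse_summary and the choice of numeric types are my own assumptions and not part of Yleaf.

```python
import re

# Field names as they appear in the summary blocks on this page.
INT_FIELDS = {"Total_reads", "Valid_markers"}
FLOAT_FIELDS = {"QC-score", "QC-1", "QC-2", "QC-3"}
TEXT_FIELDS = {"Sample_name", "Hg", "Hg_marker"}

# Accept "Key: value" as well as "Key value" (the colon is optional).
LINE_RE = re.compile(r"^\s*(\S+?):?\s+(\S.*?)\s*$")


def parse_summary(text):
    """Turn one summary block into a dict, ignoring lines that are not known fields."""
    record = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue
        key, value = match.groups()
        if key in INT_FIELDS:
            record[key] = int(value)
        elif key in FLOAT_FIELDS:
            record[key] = float(value)
        elif key in TEXT_FIELDS:
            record[key] = value
    return record


if __name__ == "__main__":
    block = "Sample_name: M15-1\nHg: R1b2\nTotal_reads: 3141489\nQC-score: 1.0\nSee full output."
    print(parse_summary(block))
```

Lines such as "See full output." are skipped because their first word is not one of the known field names.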

Olalde et al. Nature (2019)

Proto-Lusitanian sample from Coimbra, reported as of hg. R1b-P297.

Sample_name: I7687_1240k
Hg: R1b1a1b
Hg_marker: R-CTS1415^^/etc*(xA561,A1243,CTS8234,Z222,PF6584,DF106,S798,BY451^^,A543,L1403,FGC11317,L277.1,CTS7556)
Total_reads: 121116
Valid_markers: 328
QC-score: 1.0
QC-1: 1.0
QC-2: 1.0
QC-3: 1.0

See full output.
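The Hg_marker strings can be long, so it may help to split them into their parts. The sketch below assumes the usual reading of this notation, in which the markers listed after "x" inside the parentheses are those for which the sample is excluded; that interpretation, the function name parse_hg_marker, and the returned keys are my assumptions. Carets such as "^^" are kept verbatim as part of the marker name, and the "/etc" suffix is only recorded as a flag without interpreting it.

```python
import re

# Pattern for strings such as "R-PH491/etc*(xY105682)" or "Q-F2676*(xF4930,L57)".
HG_MARKER_RE = re.compile(
    r"^(?P<major>[A-Z])-"          # major haplogroup letter
    r"(?P<marker>[^/*(]+)"         # defining marker, carets included
    r"(?P<etc>/etc)?"              # optional "/etc" suffix
    r"(?P<star>\*)?"               # optional star
    r"(?:\(x(?P<excluded>[^)]*)\))?$"  # optional "(x...)" list
)


def parse_hg_marker(value):
    """Split an Hg_marker string into its parts; return None for 'NA' or unknown formats."""
    match = HG_MARKER_RE.match(value.strip())
    if not match:
        return None
    excluded = match.group("excluded")
    return {
        "major": match.group("major"),
        "marker": match.group("marker"),
        "etc": match.group("etc") is not None,
        "star": match.group("star") is not None,
        "excluded": excluded.split(",") if excluded else [],
    }


if __name__ == "__main__":
    print(parse_hg_marker("R-PH491/etc*(xY105682)"))
    print(parse_hg_marker("NA"))
```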

Mittnik et al. Cell (2018)

Sample Turlojiske1932, male, haplogroup not reported in the original paper.

Sample_name: SRR6354776
Hg: R1a~
Hg_marker: R-F1769*(xYP5531,YP360,FGC32020,YP5292)
Total_reads: 20965
Valid_markers: 185
QC-score: 1.0
QC-1: 1.0
QC-2: 1.0
QC-3: 1.0

See full output.

Poor coverage, but the sample seems ancestral to YP5531, hence possibly to R1a1a1a1-CTS7083/L664/S298 (the “basal” R1a-M417 subclades), which would make it R1a-Z645, but not even this can be confirmed.

Olalde et al. Nature (2017, 2018)

Sample from Bohemia, reported as of hg. R1b1a2a1a(xR1b1a2a1a1,xR1b1a2a1a2) in the paper, and by amateurs as potentially of hg. R1b-U106.

Sample_name: I7288_1240k
Hg: R1b1a1b1a1a1c2b2
Hg_marker: R-S268*(xS23409,BY118,S26063,S1743)
Total_reads: 190985
Valid_markers: 977
QC-score: 1.0
QC-1: 1.0
QC-2: 1.0
QC-3: 1.0

See full output.

Certainly R1b-M269, probably R1b-L151, with one quality read within R1b-U106: R1b1a1b1a1a1c2b2-S268 (C->T). No other SNP calls challenge or corroborate this.


Sample from Hungary, reported as of hg. R1b1a2a1a2b1-L51 in the paper, and by amateurs as potentially of hg. R1b-U106.

Sample_name: I4178_1240k
Hg: NA
Hg_marker: NA
Total_reads: 1838301
Valid_markers: 7832
QC-score: 0.0
QC-1: 0.0
QC-2: 0.0
QC-3: 0.0

See full output.

Most likely R1b-M269, with one quality read within R1b-U106: R1b1a1b1a1a1b-S493 (G->A). No other SNP calls challenge or corroborate this.

Mathieson et al. Nature (2017, 2018)

Balkans Chalcolithic

Balkan outlier from Smyadovo. Officially reported as of hg. R.

Sample_name: I2181
Hg: NA
Hg_marker: NA
Total_reads: 146894
Valid_markers: 713
QC-score: 0.0
QC-1: 0.0
QC-2: 1.0
QC-3: 1.0
See full output.

Poor coverage. Positive up to CT, then for R-P280 (C->G), R1b1a1-CTS9018 (C->T), and R1b1a1b-PF6452 (G->A), with no way of checking whether these calls are due to damage.

Also negative for P~-PF5867 (G->A, ditto) and P~-CTS10081 (C->A).

Ukraine_Eneolithic

Sample with steppe ancestry from Alexandria. Officially reported as of hg. R1a-M417. Amateurs have reported it as of hg. R1a-Z93 (Y95+, Y26+, L657-).

Sample_name: I6561
Hg: NA
Hg_marker: NA
Total_reads: 2043663
Valid_markers: 10718
QC-score: 0.0
QC-1: 0.0
QC-2: 1.0
QC-3: 1.0
See full output.

It has positive calls up to R1a1a1-M417, and also R1a1a1b~-F3044, R1a1a1b~-CTS9754, even R1a1a1b2~-AM01870, possibly then R1a1a1b2-Z93.

The other positives are C->T, including R1a1a1b2a~-F3568 and R1a1a1b2a1~-AM00483.

In fact, R1a1a1b2a1~-AM00479 (G->T) is negative.

van de Loosdrecht et al. Science (2018)

TAF009 is reported in the supplementary materials as E1b1b1a1b1-Z5994 (see here), but the SNP calls are few, and the main text of the paper only mentions E1b1b1a1-M78 for five samples.

Sample_name: SRR6664777
Hg: NA
Hg_marker: NA
Total_reads: 722793
Valid_markers: 2880
QC-score: 0
QC-1: 0
QC-2: 0
QC-3: 0

See full output.

The report allows us to confirm E1b1b1a1-M78 in combination with the other Taforalt samples of the same haplogroup (the output alone shows a warning because of conflicting data), but the sample seems ancestral to several SNPs defining E1b1b1a1b1.

Mathieson et al. Nature (2015)

Yamnaya

Sample from Yamnaya at Lopatino II, Samara, officially reported as of hg. R1b-L23. Genetiker reported it as of hg. R1b-L23, Pre-L51 (Y410+, L51-).

Sample_name: I0443
Hg: NA
Hg_marker: NA
Total_reads: 146894
Valid_markers: 713
QC-score: 0.0
QC-1: 0.0
QC-2: 1.0
QC-3: 1.0
See full output.

Positive up to R1b-L23 and negative for R1b1a1b1b-M12149 (G->A) and R1b1a1b1b-Z2105 (C->A) and beyond. Negative for R1b1a1b1a-L51 (G->A) and negative for multiple SNPs at R1b-L151. Surely R1b-L23 (xZ2103, xL51), but Y410+ (confirmed with other software by different amateurs) means it’s probably upstream from R1b-L51.

Samara_Eneolithic

Three samples from Khvalynsk II.

The chieftain, officially reported as of hg. R1b-L752. Amateurs had reported it as of hg. R1b-Pre-V88, hg. R1b-M73, hg. R1a-V1636.

Sample_name: I0122
Hg: R1b1a2
Hg_marker: R-BY15369/etc*(xBY15332,Y109041)
Total_reads: 1914224
Valid_markers: 7449
QC-score: 1.0
QC-1: 1.0
QC-2: 1.0
QC-3: 1.0
See full output.


The sample reported as of hg. R1a-M459. Amateurs have reported it to be of hg. R1a-YP1272.

Sample_name: I0433
Hg: NA
Hg_marker: NA
Total_reads: 1396905
Valid_markers: 5074
QC-score: 0.0
QC-1: 0.0
QC-2: 1.0
QC-3: 1.0
See full output.

It has positive calls up to R1a1, but negative for R1a1a-M198 and R1a1b-YP1272.


The sample reported as of hg. Q1a:

Sample_name: I0434
Hg: Q1
Hg_marker: Q-F2676*(xF4930,L57)
Total_reads: 147970
Valid_markers: 490
QC-score: 1.0
QC-1: 1.0
QC-2: 1.0
QC-3: 1.0
See full output.

It has dubious negative calls for Q1b-L57 (G->A), Q1b-CTS8328 (C->T), Q1b1-L55 (G->A), and Q1b1-L476 (G->A). Undefined beyond Q1.

Samara hunter-gatherer

Officially reported as of hg. R1b-L754. Amateurs have reported it to be of hg. R1b-M73.

Sample_name: I0124
Hg: NA
Hg_marker: NA
Total_reads: 1272057
Valid_markers: 5080
QC-score: 0.0
QC-1: 0.0
QC-2: 1.0
QC-3: 1.0
See full output.

It has positive calls up to R1b-P297.
Positive for two R1b-M73 SNPs: R1b1a1a-Y13872 (C->T), but negative for others.
Positive for two R1b-M269 SNPs: R1b1a1b-L777 (T->C) and R1b1a1b-PF6436 (C->T), but negative for others.

Allentoft et al. Nature (2015)

Sample_name: RISE98
Hg: NA
Hg_marker: NA
Total_reads: 9783990
Valid_markers: 39643
QC-score: 0.0
QC-1: 1.0
QC-2: 1.0
QC-3: 1.0

See full output.

Even though the result is undetermined, the sample has positive calls up to R1b-L151. Since it is also positive for R1b1a1b1a1a1-M405 (C->T), R1b-U106 is the most likely haplogroup, although no other SNPs confirm it.