If you have followed this blog for some time, you know that I consider the evolution of ancient haplogroups a keystone for understanding prehistoric migrations at the fine scale needed to outline potential language evolution and language change. Haplogroups, especially Y-DNA bottlenecks among pre-Iron Age populations, are often more relevant than simplistic admixture analyses. I am glad to see that most labs working on human population genomics seem to care about this aspect lately, too.
All Ancient DNA Dataset
Since 2018, I have compiled Y-DNA and mtDNA data for most reported ancient samples, including analyses of BAM files by hobbyists, informal online reports of research papers in preparation, and my own automated haplogroup inferences performed with Yleaf v.2, software used by professional geneticists. I referenced the first version of the dataset in the book series A Song of Sheep and Horses, and I have since improved it considerably, in both quantity and quality.
You can access the latest version of the data at the following links:
As a shortcut, just type haplogroup.info in your browser.
NOTE. You can download the files or read them online. The behaviour of the Excel sheet might differ when reading it with Google Drive, though.
Direct link to files from haplogroup.info (may not be the latest ones):
- Excel: original file.
- CSV: comma-separated file (non-UTF8 to avoid errors).
- TSV: tab-separated values (text file).
- PDF for reading and searching.
- HTML for online reference (slow to load!).
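Because the CSV export is not UTF-8, a script that reads it has to decode it explicitly. The following is a minimal sketch using only the Python standard library. The encoding (`cp1252`) and the column names `Sample` and `Site` are assumptions made for illustration; check the actual file before relying on them.

```python
import csv
import io


def read_samples(raw: bytes, encoding: str = "cp1252") -> list:
    """Decode a non-UTF-8 CSV export and return one dict per sample row.

    The default encoding is an assumption; pass another one if needed.
    """
    # errors="replace" keeps the script running if a byte cannot be decoded.
    text = raw.decode(encoding, errors="replace")
    # newline="" lets the csv module handle line endings itself.
    return list(csv.DictReader(io.StringIO(text, newline="")))


# Hypothetical two-column extract, encoded the way the file is assumed to be.
example = "Sample,Site\nX001,Blätterhöhle\n".encode("cp1252")
rows = read_samples(example)
print(rows[0]["Site"])
```

For a downloaded copy, something like `read_samples(Path("dataset.csv").read_bytes())` would work, where `dataset.csv` is a placeholder file name and `Path` comes from `pathlib`.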
For more information on the meaning of columns, color codes, emphasis, and styles in general check the second sheet in the file, called Supp. Info.
By default, samples are ordered by ISOGG nomenclature and (secondarily) by mean year cal. BC, which gives a more natural visual of Y-DNA subclades, but you can order samples by any other column, search for specific values, etc.
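If you work with the CSV or TSV export rather than the spreadsheet, the default ordering can be approximated in a few lines. This is only a sketch: the field names `hg` and `mean_bc` are hypothetical, the rows are invented, and putting the oldest samples first within a haplogroup is my assumption about the secondary order. The key function splits ISOGG labels into letter and number tokens so that, for instance, `R1b1a10` sorts after `R1b1a2`.

```python
import re


def isogg_key(label: str) -> tuple:
    """Split a label such as 'R1b1a2' into tokens that compare naturally."""
    tokens = re.findall(r"\d+|\D+", label)
    # Numeric tokens compare as integers, others as text; the leading
    # flag keeps the two kinds from ever being compared directly.
    return tuple((0, int(t), "") if t.isdigit() else (1, 0, t) for t in tokens)


def sort_samples(rows: list) -> list:
    """Order by ISOGG label, then by mean year cal. BC (oldest first)."""
    return sorted(rows, key=lambda r: (isogg_key(r["hg"]), -r["mean_bc"]))


# Invented rows, for illustration only.
rows = [
    {"hg": "R1b1a10", "mean_bc": 2500},
    {"hg": "R1b1a2", "mean_bc": 2000},
    {"hg": "R1b1a2", "mean_bc": 3000},
    {"hg": "I2a", "mean_bc": 5000},
]
for row in sort_samples(rows):
    print(row["hg"], row["mean_bc"])
```

A parent label is a prefix of its subclades, so it always sorts before them, which keeps each branch visually grouped.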
A formatted spreadsheet with all FTDNA SNPs available to date has been kindly provided by Göran Runfeldt.
The log of updates, with their dates and comments, starts anew with each major version. For updates, comments, etc. there is a dedicated thread at the forum.
Many detailed Y-DNA SNP calls are available from the haplogroup inference section of this website.
Citing this work
You can reuse and modify the files posted here and their content as you see fit, for any personal or academic research, as well as for any kind of project, whether personal or professional, open or copyrighted.
Copyright 2013-2018 Jean Manco.
This work is licensed under a Creative Commons Attribution 3.0 Unported License.
Please always cite the source with version number, and whenever possible (online work) this website indo-european.eu, too.
Every sample whose haplogroup inference differs from the original publication lists the individual responsible for that inference. If you find that the inference or its attribution is incorrect, please contact me.
NOTE. I have checked some of these inferences (many more than the most relevant ones I posted here) and, whenever necessary, selected those I agree with. My own inferences follow results from Yleaf v.2.
Beyond the simplest netiquette, you should cite and/or link to this file whenever you publish any bulk information remade or taken from (or based on) it, for a simple reason:
If you reference the file with its version, you make it easier for people to track changes, especially those related to errors corrected since you last downloaded or checked it, and you thus offer information on how reliable your data (and project) actually are.
Everyone can read the online spreadsheet, but only I am able to modify it. The best way to make changes is to email me corrections directly at [email protected].
Alternatively, you can leave a comment on this page or any post of this blog related to specific samples, or at the Indo-European forum, especially if you think it merits some discussion.
If you have an interesting (public or private) project and want to receive immediate notifications of updates from Google Drive, or you would like to discuss the possibility of editing the file directly for bulk modifications, you may also contact me.
Maps and GIS
ArcGIS Online offers the possibility of publishing data layers over image layers with predetermined themes for ease of use. You can read more about instructions for use of the software.
NOTE. The easier-to-remember domain name haplogroup.info can be used to access both this ArcGIS Online map and the Google Drive folder with the dataset.
The free GIS software QGIS can export to different open formats, so I have used it to publish online maps, both with all samples and divided by age, for quick reference.
There is an updated list of available static and dynamic maps for reference of Y-DNA and mtDNA haplogroups of ancient samples:
I will post here relevant projects using this spreadsheet:
NOTE. Before you make a simple copy (or one with just some minor modifications) to publish it elsewhere, think about how your contribution is going to compensate for the multiple errors that will be corrected in this spreadsheet, while your copy (and many other similar ones) might keep spreading them. In my experience, it very soon becomes prohibitive to track all changes made to just two files, so imagine multiple ones. If your aim is to improve certain parts, think about collaborating instead.
SNP Tracker is currently the only free tool to obtain migration paths of SNPs, offering inferences based on FTDNA SNPs and additional YFull data at a professional level.
The tool is constantly updated and uses YFull SNP formation dates to interpolate dates onto the much larger FTDNA Y and mt SNP trees. SNP Tracker loads the full Y and mt trees in the background.
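To give a rough idea of what interpolating dates onto a larger tree can mean, here is a toy sketch. It is not SNP Tracker's actual algorithm, which is not described here. It assumes a single root-to-tip path of SNPs, a few of which have known formation dates, and spreads the undated ones evenly between their nearest dated neighbours. The function name, the SNP names, and the dates are all hypothetical.

```python
from typing import Dict, List, Optional


def interpolate_dates(
    path: List[str], known: Dict[str, float]
) -> Dict[str, Optional[float]]:
    """Linearly interpolate formation dates along a root-to-tip SNP path.

    path  -- SNP names ordered from oldest to youngest
    known -- formation dates (years before present) for some of them
    SNPs outside the first and last dated anchor are left as None.
    """
    dates: Dict[str, Optional[float]] = {snp: known.get(snp) for snp in path}
    anchors = [i for i, snp in enumerate(path) if snp in known]
    # Fill each gap between two consecutive dated SNPs.
    for lo, hi in zip(anchors, anchors[1:]):
        start, end = known[path[lo]], known[path[hi]]
        for i in range(lo + 1, hi):
            frac = (i - lo) / (hi - lo)
            dates[path[i]] = start + frac * (end - start)
    return dates


# Hypothetical path with two dated anchors and two undated SNPs.
path = ["SNP_A", "SNP_B", "SNP_C", "SNP_D"]
known = {"SNP_A": 6000.0, "SNP_C": 4000.0}
print(interpolate_dates(path, known))
```

A real tool would have to work on a branching tree and could weight branches differently; the even spacing here is only the simplest possible assumption.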
Furthermore, Paleolithic to Bronze Age paths are largely determined by hand-curated SNPs that are “pegged” to specific locations, thus avoiding the usual distortions of any algorithm blindly applied to the data. To validate these “pegged” locations, Rob has analysed DNA data from ancient sites here relative to specific SNPs to see how they work out compared to other automatic means.
NOTE. For example, without “pegging”, R1b’s route based on modern samples would very likely follow a maritime route from the Middle East through the Mediterranean into Central Europe…
Maps also show visually additional information, such as plagues, cultures, other “co-resident” haplogroups, etc. without excess visual clutter.
Rob is now actively developing new ideas based on ancient DNA data, so stay tuned to his website for news!
Karim Roussi uses the professional ArcGIS online tool for quick reference and search of ancient DNA data, although the forked spreadsheet used is now outdated:
You can contact him at mirekium (at gmail.com) to help him update the collaborative file, propose other projects, etc.