All Ancient DNA Dataset


Viewing 10 posts - 41 through 50 (of 50 total)
  • #34321
    Carlos Quiles
    Keymaster

      Updated to version 2.04.53, including new data from (and updates to previously reported) early farmers of Anatolia, South-East and Central Europe, from Marchi et al. bioRxiv (2020).

      #34495
      Carlos Quiles
      Keymaster

        Updated to version 2.04.68, including the changes reported for Uyelgi samples, as well as other FTDNA Haplotree (provisional) assessments, like:

        • Kostenki14, which splits hg. C1b-B66 with another FTDNA customer.
        • SG41, which splits hg. D1a-BY12975 with another FTDNA customer from Kazakhstan.
        • Samples from Yu et al. Cell (2020), such as UKY001 and probably KAG001, GLZ002, which will probably split hg. C2a-BY728.
        #34989
        Carlos Quiles
        Keymaster

          Release of version 2.05:

          I have been updating the Ancient DNA Dataset with some global additions, clearly enough to warrant a change in version number. In particular, these are the columns added (or those I consider likely or possible to be added):

          FTDNA-Y-Haplotree: for FTDNA Y-Haplotree Y-names. I hope that sorting the file following their SNP order will help clarify the actual position of each ancient sample in their respective haplogroup branches.

          As you might have noticed, I am also shifting the “main” original column, YTree, to an FTDNA-friendly naming system. Naming consistency was becoming an issue, since many samples now have a depth that cannot be followed with either ISOGG or YFull.

          NOTE. For the moment, though, I am wary of changing the subclade naming for certain haplogroups. For example, haplogroup J – for some reason – appears to have an important user base in YFull which encourages the addition of ancient samples to their YTree. Anyway, it looks as though in the near future, when all ancient samples get fully analyzed and published by FTDNA, the whole haplogroup naming ecosystem will possibly be dominated by FTDNA.

          Y-SNP: I am now selecting only SNPs approved by FTDNA, so as to avoid the many dubious SNPs described by other companies and individuals but not fully accepted by others. Nevertheless, a proper terminal SNP (with negative and dubious ones) needs a manual check, and (unless you are Michael Sager) this is an impossible task for one person. Also, I am not well-versed in most subclades, and a certain experience with ancient and modern samples is needed when it comes to assessing which derived and ancestral downstream calls are more likely to be correct. I will be posting links to the files, including pathPhynder’s estimation, and including as many alternative Responsible-SNP sources as possible, to strengthen the reliability of each call.

          Isotopes: Basically, whether the sample is considered local or non-local, not necessarily the specific isotopic values, which might increase the file unnecessarily.

          Skeletal-Element: Will NOT be included, for the moment. I am not convinced that a column with bone type (or other sample origin) is useful for this ancient haplogroup compilation, except maybe for statistical analyses. For the moment, I prefer not to increase the file size.

          Data-Type: Ditto. Furthermore, by following the current Reich Lab’s naming standard (adding .SG or .DG) I think this information is mostly included in the Object_ID of the samples relevant for genome-wide analyses.
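The suffix convention mentioned above can be read back out of an Object_ID with a few lines of Python. This is only a minimal sketch: the meanings given to the suffixes (.SG as shotgun data, .DG as diploid genotype calls, anything else left unspecified) are my assumption about the Reich Lab convention and are not stated in the post, and the function name and example IDs are hypothetical.

```python
def data_type_from_object_id(object_id: str) -> str:
    """Guess the data type from the suffix of an Object_ID.

    Assumed mapping (not taken from the post):
    ".SG" -> shotgun, ".DG" -> diploid genotype calls, otherwise unspecified.
    """
    if object_id.endswith(".SG"):
        return "shotgun"
    if object_id.endswith(".DG"):
        return "diploid"
    return "unspecified"


# Hypothetical Object_IDs, used only to illustrate the suffix check.
EXAMPLE_IDS = ["Sample1", "Sample2.SG", "Sample3.DG"]
LABELS = {oid: data_type_from_object_id(oid) for oid in EXAMPLE_IDS}
```

If the suffixes turn out to carry a different meaning in a given release, only the two return values need to change.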

          Qualitative Assessment/Confidence of archaeological and chronological contextualization for the genetic data of an individual: Very useful columns that the Reich Lab has been adding for the past (2?) years. Since most samples offer reliable results, only some raise doubts, and a few have alerts, I am not sure whether only doubts and alerts should be added to the final column, reserved for “site”, which seems like the most economical choice. Until the next release of the Reich Lab curated Dataset, I don’t think I will make a decision on this.

          In general, I will try to keep up with the Reich Lab’s Dataset naming changes, to make both compatible and easy to combine when performing formal stats, even though their slow pace of corrections (and radical naming changes from the first to the second version released) suggest that those conventions might not be valid for long.

          #34994
          Carlos Quiles
          Keymaster

            Version 2.05.07 includes recent samples from:

            Other papers like Moussa et al. (2021) and others with few samples – to see the whole list of new samples since your last downloaded version, order the spreadsheet by date (second-to-last column).
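For those who prefer to do this outside a spreadsheet program, the same ordering can be sketched in Python. This assumes the spreadsheet has been exported to CSV with a header row, and that the second-to-last column holds dates in YYYY-MM-DD form so that plain string sorting matches chronological order. The column names and rows below are invented for illustration and are not the real layout of the file.

```python
import csv
import io


def sort_by_second_to_last(csv_text: str, newest_first: bool = True) -> list[list[str]]:
    """Return the header plus all rows ordered by the second-to-last column."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    # ISO-style dates (assumed) sort correctly as plain strings.
    body.sort(key=lambda row: row[-2], reverse=newest_first)
    return [header] + body


# Hypothetical export: column names and values are made up.
SAMPLE = (
    "Object_ID,YTree,Date-Added,Notes\n"
    "A1,R1b,2021-01-05,\n"
    "B2,C2a,2021-03-17,\n"
    "C3,J2,2020-12-30,\n"
)

for row in sort_by_second_to_last(SAMPLE):
    print(row)
```

Rows added after your last download are then simply those above the date of that download.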

            #35401
            Carlos Quiles
            Keymaster

              Updated to version 2.05.21, including:

              Minor changes, like the update of I6561, the Alexandria sample of hg. R1a-Y3, dated supposedly ca. 4000 BC, but now corrected in the AADR based on genetic data (as I suggested to the authors here):

              Context: Layer date based on 6 20-28 cM IBD individuals with Srubnaya/Alakul/Kazakhstan_MLBA individuals from 3900-3400 [based on these genetic results we ignore the direct date of 4153-3970 calBCE (5215±20 BP, PSUAMS-2832) from same site calibrated as 95.4%; IntCal20, OxCal v4.4.2 Bronk Ramsey (2020)

              #35914
              Carlos Quiles
              Keymaster

                Updated to version 2.05.75 (There have been other intermediate versions published with some of these updates):

                • New rules for access to Y-SNP files: Now fully restricted to reliable users; bots are forbidden.
                • I have checked new batches of samples for SNP calls from the FTDNA Haplotree, including Allentoft et al. (2015), Mathieson et al. (2015) and (partially) Mathieson et al. (2018), Damgaard et al. Nature (2018), and Jeong et al. (2020).
                • Added links to Y-SNP calls from Olalde et al. (2018) and Olalde et al. (2019). Currently working on Damgaard et al. Science (2018).

                  The new color codes are intended to immediately convey information visually about recent Y-SNP updates (2021):

                • Light green background: those checked by me, in contrast with those with a green background, which carry the ‘seal of approval’ of FTDNA or YFull.
                • Bold: those calls I consider estimations (e.g. due to a lack of intermediate SNPs, or to unreliable derived or ancestral SNP calls subject to deamination).
                • Strikethrough: in the “responsible” column, whenever the previous call is corrected (not just updated to a more specific subclade, which remains underlined).
                #37359
                Carlos Quiles
                Keymaster

                  Recent changes leading up to the current version 2.06.160:

                  Today I added FTDNA’s assessment of the Y-DNA of Peder Winstrup from Krzewińska et al. (2021).

                  Also updated are the ADMIXTURE values, including the new samples from Gnecchi et al. (2021), and experimenting with the SE Asia proxy: now the reference is Thailand LN_BA rather than Papuan.

                  All files (including PDFs) updated and uploaded.

                  #38583
                  Carlos Quiles
                  Keymaster

                    New version 2.07, now adding an inverted Formation-Age Ratio (FAR) applied to Y-SNPs and mt-SNPs, as a measure of time-related precision of the terminal SNP: the closer the value is to 1, the closer the formation date is to the ancient sample’s (radiocarbon vs. contextual) date.

                    This metric was proposed by Jari Kinnunen (from haplotree.info), and estimates are based on some relatively recent YFull formation dates adapted to FTDNA’s Y-DNA Haplotree at SNP Tracker.
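The post does not give the exact formula, so the following is only one plausible reading of an "inverted" ratio: the sample's age divided by the formation age of its terminal SNP, both in years before present. Under that assumption the value approaches 1 as the formation date gets closer to the sample's date. The function name and the figures are hypothetical and do not come from the dataset.

```python
def inverted_far(sample_age_ybp: float, formation_age_ybp: float) -> float:
    """Assumed inverted Formation-Age Ratio: sample age / SNP formation age.

    Both ages are in years before present. A result near 1 means the terminal
    SNP formed shortly before the individual lived; a result near 0 means the
    SNP is much older than the sample, so the call is less precise in time.
    """
    if formation_age_ybp <= 0:
        raise ValueError("formation age must be positive")
    return sample_age_ybp / formation_age_ybp


# Hypothetical figures: an SNP formed ~5000 ybp, found in a sample dated ~4500 ybp.
close_call = inverted_far(4500, 5000)
# The same sample assigned only to a much older upstream SNP (~20000 ybp).
coarse_call = inverted_far(4500, 20000)
```

A value above 1 under this reading would mean the sample is dated older than the estimated formation of its own SNP, which would point to a problem with either the date or the formation estimate.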

                    [We are still waiting for FTDNA’s own estimations to be published, as recently announced]

                    Changes from the previously published 2.06.209 also include new SNP inferences, especially from the R1a (mainly Z93) and R1b branches (mainly P312).

                    The spreadsheet is up to date with the most recent reports of ancient samples.

                    #38921
                    Jvd
                    Participant

                      Hi Carlos,

                      Do you have any idea when the next update of the ancient spreadsheet/map will be available?

                      Best regards,

                      Jan

                      #39013
                      bce
                      Participant

                        could ancient Y-DNA from these studies be added to the dataset?

                        https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7111520/

                        https://www.biorxiv.org/content/10.1101/2021.08.30.458211v1.full

                        https://dspace.cuni.cz/handle/20.500.11956/31423 (medieval Czech, data on page 83 and 84 of “text prace”)

                        I apologize if it’s already there.
