Estimating genetic kin relationships in prehistoric populations: the Corded Ware family from Esperstedt


Open access: Estimating genetic kin relationships in prehistoric populations, by Monroy Kuhn, Jakobsson, and Günther, PLOS ONE (2018).


Archaeogenomic research has proven to be a valuable tool to trace migrations of historic and prehistoric individuals and groups, whereas relationships within a group or burial site have not been investigated to a large extent. Knowing the genetic kinship of historic and prehistoric individuals would give important insights into social structures of ancient and historic cultures. Most archaeogenetic research concerning kinship has been restricted to uniparental markers, while studies using genome-wide information were mainly focused on comparisons between populations. Applications which infer the degree of relationship based on modern-day DNA information typically require diploid genotype data. Low concentration of endogenous DNA, fragmentation and other post-mortem damage to ancient DNA (aDNA) makes the application of such tools unfeasible for most archaeological samples. To infer family relationships for degraded samples, we developed the software READ (Relationship Estimation from Ancient DNA). We show that our heuristic approach can successfully infer up to second degree relationships with as little as 0.1x shotgun coverage per genome for pairs of individuals. We uncover previously unknown relationships among prehistoric individuals by applying READ to published aDNA data from several human remains excavated from different cultural contexts. In particular, we find a group of five closely related males from the same Corded Ware culture site in modern-day Germany, suggesting patrilocality, which highlights the possibility to uncover social structures of ancient populations by applying READ to genome-wide aDNA data. READ is publicly available from

Kin-relationship among males at the Corded Ware site in Esperstedt, Germany. The five individuals, their inferred degree of relationship and their uniparental haplogroups. The dashed line between I1540 and I1538 shows a second degree relationship missed by READ.
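The abstract does not say how READ works internally, so what follows is only a rough sketch of the general idea of comparing low-coverage genomes pairwise: count how often two individuals carry different alleles at sites covered in both, scale by the typical value in the dataset, and bin the result. The function names, the toy encoding (one sampled allele per site, None where a site is not covered) and the cutoffs are all assumptions made for illustration, not READ's actual code or parameters.

```python
from itertools import combinations
from statistics import median


def mismatch_rate(a, b):
    """Share of sites with different alleles, among sites called in both."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return None  # no overlapping sites, nothing to compare
    return sum(x != y for x, y in shared) / len(shared)


def normalized_mismatch(samples):
    """Mismatch rate of every pair, divided by the median over all pairs.

    Assumes most pairs in the dataset are unrelated, so the median
    approximates the mismatch rate expected for two unrelated individuals.
    """
    rates = {}
    for (name1, g1), (name2, g2) in combinations(samples.items(), 2):
        rate = mismatch_rate(g1, g2)
        if rate is not None:
            rates[(name1, name2)] = rate
    baseline = median(rates.values())
    return {pair: rate / baseline for pair, rate in rates.items()}


# Assumed cutoffs: midpoints between idealised expectations of
# 0.5 (identical), 0.75 (first degree), 0.875 (second degree)
# and 0.9375 (third degree). Illustrative only.
CUTOFFS = (
    (0.625, "identical/twin"),
    (0.8125, "first degree"),
    (0.90625, "second degree"),
)


def classify(normalized_rate, cutoffs=CUTOFFS):
    """Return the label of the first bin whose upper bound exceeds the rate."""
    for upper, label in cutoffs:
        if normalized_rate < upper:
            return label
    return "unrelated"
```

With real data each list would hold hundreds of thousands of sites, and the estimate would need some measure of uncertainty before a pair is called related; the sketch leaves that out.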

I already wrote about its bioRxiv preprint, and about how this late Corded Ware family from Esperstedt, which has evidently led some researchers to wrong conclusions since its publication some five years ago, shows a clear shift towards the steppe (in admixture and PCA cluster), probably unrelated to the initial Corded Ware expansion.

This difference from other, earlier Corded Ware migrants may also explain their shared R1a-M417 (possibly xZ645) lineages, which differ from the R1a-Z645 subclades that expanded with Corded Ware migrants.


The concept of “outlier” in studies of Human Ancestry, and the Corded Ware outlier from Esperstedt


While writing the third version of the Indo-European demic diffusion model, I noticed that one Corded Ware sample (labelled I0104) clusters quite closely with steppe samples (i.e. Yamna, Afanasevo, and Poltavka). The other Corded Ware samples cluster, as expected, closely with east-central European samples, which include related cultures such as the Swedish Battle Axe, and the later Sintashta and Potapovka (cultures that are from the steppe proper, but derive from Corded Ware).

I also noticed after publishing the draft that I had used the wording “Corded Ware outlier” at least once. I certainly had that term in mind when developing the third version, but I did not intend to write it down formally. Nevertheless, I think it is the right name to use.

PCA of dataset including Minoans and Mycenaeans, and Scythians and Sarmatians. The graphic has been arranged so that ancestries and samples are located on geographically friendly axes, roughly north-south (Y) and east-west (X). Symbols are used, in a simplified manner, in accordance with the symbols for Y-DNA haplogroups used in the maps. Labels have been used to simplify important components. Areas are drawn surrounding Yamna, Poltavka, Afanasevo, Corded Ware (including samples from Estonia, Battle Axe, and the Poltavka outlier), and the succeeding Sintashta and Potapovka cultures, as well as Bell Beaker. Corded Ware sample I0104, from Esperstedt, has also been labelled.

An outlier in statistics, as you can infer from the name, is a sample (more precisely, an observation) that lies far from the others. It is a slippery concept in Human Evolutionary Biology, because it has no clear definition there, and it thus depends on a certain degree of subjective evaluation. It seems to be based mainly on a combination of PCA and ADMIXTURE analyses, but it should obviously also depend on the number of samples available for a given culture, and on their regional distribution.
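To make the subjectivity concrete, here is a minimal sketch of one possible rule for flagging outliers on a two-dimensional PCA plot: measure each sample's distance from the median position of its group and flag those lying several times farther out than the typical sample. The function name, the factor of 3, and the minimum of four samples are arbitrary choices of mine, which is exactly the point; no published study is claimed to use this rule.

```python
from math import hypot
from statistics import median


def flag_outliers(coords, factor=3.0, min_samples=4):
    """Flag samples lying far from the centre of their own group.

    coords maps a sample label to its (PC1, PC2) position.
    factor and min_samples are arbitrary, illustrative thresholds.
    """
    if len(coords) < min_samples:
        return set()  # too few samples to say what the cluster looks like

    # The coordinate-wise median is a centre that one distant sample
    # cannot drag towards itself.
    centre_x = median(x for x, _ in coords.values())
    centre_y = median(y for _, y in coords.values())

    distances = {
        label: hypot(x - centre_x, y - centre_y)
        for label, (x, y) in coords.items()
    }
    typical = median(distances.values())
    if typical == 0:
        # At least half of the samples sit exactly on the centre:
        # anything away from it is flagged.
        return {label for label, d in distances.items() if d > 0}
    return {label for label, d in distances.items() if d > factor * typical}
```

Changing the factor, or adding a handful of samples from a different region, can change which samples get flagged, so a rule like this formalises the judgment without removing it.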

We thus have certain clear cases, like the Poltavka outlier, of an R1a-M417 lineage, which clusters close to Corded Ware (and Sintashta and Potapovka) samples, but far from the R1b-L23 samples of the Poltavka and Yamna cultures from neighbouring regions of the steppe.

We also have less clear observations, like Balkan Chalcolithic samples, which may or may not have belonged to different cultural groups (say, related to the Suvorovo-Novodanilovka expansion, or not); that could explain their differences in ancestral components in ADMIXTURE, and in their position in PCA.

And we have a Yamna sample from western Ukraine, which – unlike the other two available samples – clusters “to the south” of east Yamna samples. Taking into account the Yamna sample from Bulgaria, clustering closely with south-eastern European samples, could you really call this an outlier? Two outliers out of four western Yamna samples? Well, maybe. If you take east and west Yamna from the steppe as a whole, and exclude the Yamna sample from Bulgaria, of course you can. Whether that classification is useful, or actually hinders a proper interpretation of western Yamna samples, and of the “Yamna component” seen in them, is a different story…

PCA for European samples of Mathieson et al. (2017)

But what then about the Corded Ware male from Esperstedt, labelled I0104, dated ca. 2430 BC, which clusters among contemporaneous steppe (Poltavka) samples, and has the greatest proportion of ‘Yamna component’ in ADMIXTURE? After all, it is different in both respects from any other Corded Ware individual – including the oldest samples available, from Latvia (ca. 2885 BC) and Tiefbrunn (ca. 2755 BC).

This sample is one of the direct links between the steppe and Corded Ware in late times, and has been the main reason for the confusion a lot of people seem to have about the “Yamna component” in Corded Ware, with some supporting a direct migration from one into the other, and a few even daring to say that “Corded Ware is indistinguishable from Yamna”(!?).

His family members, all males of haplogroup R1a-M417 (like I0104 and most males from the Corded Ware culture), show a decreased Yamna component a few generations later, which clearly indicates that this individual’s admixture came directly from the steppe, most likely from one or more female ancestors. That is compatible with the nomadic nature of the Corded Ware culture (and its known exogamy practices), which connected central Europe with the steppes, up to the North Caspian region.

If labelling other samples as outliers may help improve the conclusions one can draw from genetic research, labelling this sample is, in my opinion, essential to avoid certain strong misconceptions about the origin of the Corded Ware culture.