A Song of Sheep and Horses, revised edition, now available as printed books


As I said six months ago, 2019 is a tough year for writing a blog, because it was going to be a complex regional election year, and therefore a time of political promises, including tenure offers. Now the preliminary offers have been made and the elections have passed, but the timing has shifted slightly toward 2020. So I may have the time, but there is little benefit in dedicating too much effort to the blog, and a lot of potential benefit in dedicating any time I have to scientific work that can be formally evaluated.

On the other hand, I saw some potential benefit in publishing texts with ISBNs, hence the updates to the text and the preparation of these printed copies of the books, just in case. While Spain’s accreditation agency has some hard rules for becoming a tenured professor, especially for medical associates (whose years of professional experience are almost worthless compared to published peer-reviewed papers), it is quite flexible in assessing one’s merits.

However, regional and/or autonomous entities are not: to evaluate publications they need an official identifier, such as an ISBN for books, and preferably printed versions. So about a month ago I took some time to update the texts and supplementary materials, and to publish printed copies of the books with Amazon. The first copies have arrived, and they look good.


Corrections and Additions

Titles
I have changed the names and order of the books, as I had intended for the first publication (as some of you may have noticed when the linguistic book was referred to as the third volume in some parts). In the first concept I just wanted to emphasize that the linguistic work had priority over the rest. Now the whole series and the linguistic volume no longer share the same name, and I hope this added clarity is for the better, despite the linguistic volume being the third one.

Uralic dialects
I have changed the nomenclature for Uralic dialects, as I said recently. I haven’t really modified anything deeper than that, because – unlike adding new information from population genomics – this would require me to do thorough research of the most recent publications on Uralic comparative grammar, and I just can’t begin with that right now.

Anyway, the use of terms like Finno-Ugric or Finno-Samic is as correct now for the reconstructed forms as it was before the change in nomenclature.


Mediterranean
The most interesting recent genetic data has come from Iberia and the Mediterranean. Although direct data from the Italian Peninsula (and thus on the emergence of the Etruscan and Rhaetian ethnolinguistic community) is still lacking, it is becoming clearer how some quite early waves of Indo-Europeans and non-Indo-Europeans expanded and shrank – at least in West Iberia, the West Mediterranean, and France.

Finno-Ugric
Some of the main updates to the text have been made to the sections on Finno-Ugric populations, because some interesting new genetic data (especially Y-DNA) have been published in the past months. This is especially true for Baltic Finns and for Ugric populations.


Balto-Slavic
Consequently, and somewhat unsurprisingly, the Balto-Slavic section has been affected by this; e.g. by the likely identification of Early Slavs with central-eastern populations dominated by (at least some subclades of) hg. I2a-L621 and E1b-V13.

Maps
I have updated some cultural borders in the prehistoric maps, and the maps with Y-DNA and mtDNA. I have also added one new version of the Early Bronze Age map, to better reflect the most likely location of Indo-European languages in the Early European Bronze Age.

As those in software programming will understand, major changes in the files that are used for maps and graphics come with an increasing risk of additional errors, so I would not be surprised if some major ones were found (I have already spotted three of them). Feel free to communicate these errors in any way you see fit.

European Early Bronze Age: tentative language map based on linguistics, archaeology, and genetics.

SNPs
I have selected more conservative SNPs in certain controversial cases.

I have also deleted most SNP-related footnotes and replaced them by marking each individual tentative SNP, leaving only those footnotes that give important specific information, because:

  • My way of referencing tentative SNP authors did not make it clear which samples were tentative, if there were more than one.
  • It was probably not necessary to see four names repeated 100 times over.
  • Often I don’t really know if the person I have listed as author of the SNP call is the true author – unless I saw the full SNP data posted directly – or just someone who reposted the results.
  • Sometimes there is more than one author of SNPs for a certain sample, but I might have added just one for all.
More than 6000 ancient DNA samples compiled to date.

For a centralized file hosting the names of those responsible for the unofficial/tentative SNPs used in the text (and to correct them if necessary), readers will eventually be able to use Phylogeographer’s tool for ancient Y-DNA, which uses (in part) the same data I compiled, adding Y-Full’s nomenclature and references. You can see another map tool in ArcGIS.

NOTE. As I say in the text, if the final working map tool does not deliver the names, I will publish another supplementary table to the text, listing all tentative SNPs with their respective author(s).

If you are interested in ancient Y-DNA and you want to help develop comprehensive and precise maps of ancient Y-DNA and mtDNA haplogroups, you can contact Hunter Provyn at Phylogeographer.com. You can also find more about phylogeography projects at Iain McDonald’s website.

Graphics
I have also added more samples to both the “Asian” and the “European” PCAs, and to the ADMIXTURE analyses, too.

I previously used certain samples prepared by amateurs from BAM files (like Botai, Okunevo, or Hittites), and the results were obviously less than satisfactory – hence my criticism of the lack of publication of prepared files by the most famous labs, especially the Copenhagen group.

Fortunately for all of us, most published datasets are free, so we don’t have to reinvent the wheel. I criticized genetic labs for not releasing all data, so now it is time for praise, at least for one of them: thank you to everyone responsible at the Reich Lab for this great merged dataset, which includes samples from other labs.

NOTE. I would like to make my tiny contribution here, for beginners interested in working with these files, so I will update – whenever I have time – the “How To” sections of this blog for PCAs, PCA3d, and ADMIXTURE.

Detail of the PCA of European Iron Age populations. See full versions.

ADMIXTURE
For unsupervised ADMIXTURE in the maps, K=5 was selected based on the cross-validation (CV) error, giving a rough visual split into WHG : NWAN : CHG/IN : EHG : ENA, with Steppe ancestry “in between”. Higher K values gave worse CV errors, which I guess is due to the many ancient and modern samples selected (and to the fact that many samples are repeated from different sources in my files, because I did not have time to filter them all individually).
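For those following the “How To” sections: when ADMIXTURE is run with its `--cv` flag it prints one line per run of the form `CV error (K=5): …`, and the usual heuristic is to keep the K with the lowest value. A minimal sketch, assuming the output of all K runs has been concatenated into one text; the function name `best_k` and the CV values in the example log are hypothetical and illustrative only, not the values from the analyses described here:

```python
import re

# ADMIXTURE run with --cv prints lines such as "CV error (K=5): 0.4795".
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9.]+)")


def best_k(log_text: str) -> tuple[int, float]:
    """Return (K, CV error) for the K with the lowest cross-validation error."""
    errors = {int(k): float(err) for k, err in CV_LINE.findall(log_text)}
    if not errors:
        raise ValueError("no 'CV error' lines found in the log text")
    k = min(errors, key=errors.get)
    return k, errors[k]


# Hypothetical log excerpt (illustrative numbers only).
example_log = """
CV error (K=4): 0.4812
CV error (K=5): 0.4795
CV error (K=6): 0.4801
CV error (K=7): 0.4809
"""

print(best_k(example_log))
```

The lowest CV error is only a guide: as noted above, repeated samples and heterogeneous datasets can push higher K values up, so the choice is still a judgment call.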

I found an interesting component shared by Central European populations in K=7 to K=9 (from CEU Bell Beakers to Denmark LN to Hungarian EBA to Iberia BA, in a sort of “CEU BBC ancestry” potentially related to North-West Indo-Europeans), but still, I prefer to go for a theoretically more correct visualization instead of cherry-picking the ‘best-looking’ results.

Since I made fun of the search for “Siberian ancestry” in coloured components in Tambets et al. 2018, I have to be consistent, and I preferred to avoid doing the same here…

qpAdm
In the first publication (in January) and subsequent minor revisions until March, I trusted analyses and ancestry estimates reported by amateurs in 2018, which I used for the text, adding my own interpretations. Most of them have been refuted in papers from 2019, as you probably know if you have followed this blog (see very recent examples here, here, or here), compelling me to delete or change them again, and again, and again. I have no experience of previous years, although this pattern must evidently have repeated itself many times over, or else we would still be talking about such previous analyses as confirmed today…

I wanted to be one step ahead of peer-reviewed publications in the books, but I now prefer to go for something safe in the book series, rather than having one potentially interesting prediction – which may or may not be right – and ten huge mistakes that I would have helped to endlessly redistribute among my readers (online and now in print) based on some cherry-picked pairwise comparisons. This is especially true when predictions of “Steppe”- and/or “Siberian”-related ancestry have been published, which, for some reason, seem to go horribly wrong most of the time.

I am sure whole books could be written about why and how this happened (and how it is going to keep happening), based on psychology and sociology, but the reasons are irrelevant, and that would be a futile effort; like writing books about glottochronology and its intermittent popularity due to misunderstood scientific trends. The most efficient way to deal with this problem is to avoid such information altogether, because – as you can see in the current revised text – it wouldn’t really add anything essential to the content of these books, anyway.


Official site of the book series:
A Song of Sheep and Horses: eurafrasia nostratica, eurasia indouralica

More Hungarian Conquerors of hg. N1c-Z1936, and the expansion of ‘Altaic-Uralic’ N1c

Open access Y-chromosomal connection between Hungarians and geographically distant populations of the Ural Mountain region and West Siberia, by Post et al. Scientific Reports (2019) 9:7786.

Hungarian Conquerors

More interesting than the paper’s study of modern populations is the following excerpt from the introduction, referring to a paper that is likely in preparation, Európai És Ázsiai Apai Genetikai Vonalak A Honfoglaló Magyar Törzsekben, by Fóthi, E., Fehér, T., Fóthi, Á. & Keyser, C., Avicenna Institute of Middle Eastern Studies (2019):

Certain chr-Y lineages from haplogroup (hg) N have been proposed to be associated with the spread of Uralic languages. So far, hg N3 has not been reported for Indo-European speaking populations in Central Europe, but it is present among Hungarians, although the proportion of hg N in the paternal gene pool of present-day Hungarians is only marginal (up to 4%) compared to other Uralic speaking populations. It has been shown earlier that one of the sub-clades of hg N – N3a4-Z1936 – could be a potential link between two Ugric speaking populations: the Hungarians and the Mansi. It is also notable that some ancient Hungarian samples from the 9th and 10th century Carpathian Basin belonged to this hg N sub-clade: Three Z1936 samples were found in the Upper-Tisza area (Karos II, Bodrogszerdahely/Streda nad Bodrogom) and two in the Middle-Tisza basin cemeteries (Nagykörű and Tiszakécske). The haplotype of the Nagykörű sample is identical with one contemporary Hungarian sample from Transylvania that tested positive for B545 marker downstream of N3a4-Z1936 [32]. Similar findings come from the maternal gene pool of historical Hungarians: the analyses of early medieval aDNA samples from Karos-Eperjesszög cemeteries revealed the presence of mtDNA hgs of East Asian provenance.

A commenter recently wrote that in a study by Fehér (probably this one) two Hungarian conquerors, from Ormenykut and Tuzser, will be of hg. N1c-2110. Assuming no other lineages will appear, this would leave the proportion of N1c-L392 vs. R1a-Z280/Z93 closer to the reported proportion of hg. N vs. R1a (5 vs. 2) among Sargat samples, and is thus compatible with a direct migration of Hungarians from around the Urals.

However, the sampling of Iron Age populations around the Urals is scarce, and we don’t know what other lineages these studied Magyars will have, but – based on the known variability of the published ones, and on the ca. 50-60 early Magyar males whose Y-chromosome haplogroups have been obtained in previous studies – I would say these reported N1c lineages are just a tiny proportion of what’s to come…

“Altaic-Uralic” N1c

Phylogenetic tree of hg N3a4. Phylogenetic tree of 33 high coverage Y-chromosomes from haplogroup N3a4 was reconstructed with BEAST v.1.7.5 software package.

Archaeogenetic studies based on mtDNA haplotypes have shown that ancient Hungarians were relatively close to contemporary Bashkirs who are a Turkic speaking population residing in the Volga-Ural region. Another study reported excessive identical-by-descent (IBD) genomic segments shared between the Ob-Ugric speaking Khantys and Bashkirs but a moderate IBD sharing between Turkic speaking Tatars and their neighbours including Bashkirs.

Phylogenetic tree of hg N3a4 has two main sub-clades defined by markers B535 and B539 that diverged around 4.9 kya (95% confidence interval [CI] = 3.7–6.3 kya). Inner sub-clades of N3a4-B539 (defined by markers B540 and B545) split 4.2 kya (95% CI = 3.0–5.6 kya). (…) The phylogenetic tree reveals that all five Hungarian samples belong to N3a4-B539 sub-clade that they share with Ob-Ugric speaking Khanty and Mansi, and Turkic speaking Bashkirs and Tatars from the Volga-Ural region. Hungarian and Bashkir chrY lineages belong to both sub-clades of N3a4-B539.

Modern distribution of the “Ugric N1c”

To test the presence and proportions of hg N3a4 lineages in a more comprehensive sample set and with a higher phylogenetic resolution level compared to earlier studies, we analysed the genotyping data of about 5000 Eurasian individuals, including West Siberian Mansi and Khanty who are linguistically closest to Hungarians.

Map of the entire hg N3a4.

There is a clear difference in geographic distribution patterns of these two hg N3a4 sub-clades. Hg N3a4-B535 (Fig. 3b) is common mostly among Finnic (Finns, Karelians, Vepsas, Estonians) and Saami speaking populations in North eastern Europe. The highest frequency is detected in Finns (~44%) but it also reaches up to 32% in Vepsas and around 20% in Karelians, Saamis and North Russians. The latter are known to have changed their language or to be an admixed population with reported similar genetic composition to their Finnic speaking neighbors. The frequency of N3a4-B535 rapidly decreases towards south to around 5% in Estonians, being almost absent in Latvians (1%) and not found among Lithuanians. Towards east its frequency is from 1–9% among Eastern European Russians and populations of the Volga-Ural region such as Komis, Mordvins and Chuvashes (…)

Map of N3a4 subclades defined by B535.

Hg N3a4-B539, on the other hand, is prevalent among Turkic speaking Bashkirs and also found in Tatars but is entirely missing from other populations of the Volga-Ural region such as Uralic speaking Udmurts, Maris, Komis and Mordvins, and in Northeast Europe, where instead N3a4-B535 lineages are frequent. Besides Bashkirs and Tatars in Volga-Ural region, N3a4-B539 is substantially represented in West Siberia among Ugric speaking Mansis and Khantys. Among Hungarians, however, N3a4-B539 has a subtle frequency of 1–4%.

Map of N3a4 subclades defined by B539, with a local snapshot showing the N3a4-B539 distribution among Hungarian speakers.

The battle to appropriate N1c-L392

So, basically, the team of Kristiina Tambets is arguing that N1c-VL29 expanded Finnic to the East Baltic (hence from a common Finno-Mordvinic dialect splitting ca. 600 BC on?) because, you know, apparently the agreed separation of known Uralic dialects from ca. 2000 BC, and their Bronze Age presence around the Baltic, is not valid when you follow haplogroups instead of languages or archaeology.

But now this other group, with Tambets as co-author of this paper, considers that hg. N1c-Z1936 – which is probably behind the N1c-L392 samples from Lovozero Ware in the Kola Peninsula – represents either the True Uralic-speaking Palaeo-Arctic peoples, or else merely Ugric-speaking peoples who happened to expand to Fennoscandia but left no trace of their language…

To accept this identification you only have to NOT ask why:

  • N1c is first found in ancient cultures close to Lake Baikal.
  • N1c-L392 appears in ancient East Asian populations speaking completely different languages, with Altaic and Uralic being just some among many Palaeo-Siberian populations where the haplogroup will pop up.
  • Turkic populations like Bashkirs and Tatars (who expanded to the Volga through the southern Urals before the expansion of Hungarians) show a shared distribution of the B539 haplotype with Hungarians.
  • The phylogenetic tree and areas of N1c-L392 expansions don’t make any sense in light of the known linguistic and cultural expansions of Uralic-speaking peoples.

In fact, the Hungarian research group of Neparáczki – publishing the recent paper on Hungarian Conquerors – was apparently looking for a connection with Turkic peoples to support some traditional Turanian myths, and they found it in some scattered R1a-Z93 samples which supposedly connect Hungarian Conquerors to Huns (?), instead of looking for this closer link through N1c-Z1936 (especially haplotype B539)…

Also, is it me or are there two opposed trends with completely different interpretations among researchers publishing papers about hg. N1c: one systematically arguing for Altaic origins, and another for Uralic ones?

If somebody sees some complex reasoning behind the discussions of all these recent papers, beyond the simplest “let’s follow N for Uralic/Altaic”, feel free to comment below. Just so I can understand what I might be doing wrong in assessing Neolithic and Bronze Age migrations in linguistics and archaeology with the help of ancient haplogroups coupled with ancestral components, and what these researchers are doing right by playing with obsessive ideas born out of the 2000s coupled with phylogenetic trees and maps of modern haplogroup distributions…

This is probably going to be this blog’s most used image in 2019:

horse-meme-steppe-ancestry


Mitogenomes suggest rapid expansion of domesticated horse before 3500 BC

Open access Origin and spread of Thoroughbred racehorses inferred from complete mitochondrial genome sequences: Phylogenomic and Bayesian coalescent perspectives, by Yoon et al. PLOS One (2018).

Abstract (emphasis mine)

The Thoroughbred horse breed was developed primarily for racing, and has a significant contribution to the qualitative improvement of many other horse breeds. Despite the importance of Thoroughbred racehorses in historical, cultural, and economical viewpoints, there was no temporal and spatial dynamics of them using the mitogenome sequences. To explore this topic, the complete mitochondrial genome sequences of 14 Thoroughbreds and two Przewalski’s horses were determined. These sequences were analyzed together along with 151 previously published horse mitochondrial genomes from a range of breeds across the globe using a Bayesian coalescent approach as well as Bayesian inference and maximum likelihood methods. The racing horses were revealed to have multiple maternal origins and to be closely related to horses from one Asian, two Middle Eastern, and five European breeds. Thoroughbred horse breed was not directly related to the Przewalski’s horse which has been regarded as the closest taxon to the all domestic horses and the only true wild horse species left in the world. Our phylogenomic analyses also supported that there was no apparent correlation between geographic origin or breed and the evolution of global horses. The most recent common ancestor of the Thoroughbreds lived approximately 8,100–111,500 years ago, which was significantly younger than the most recent common ancestor of modern horses (0.7286 My). Bayesian skyline plot revealed that the population expansion of modern horses, including Thoroughbreds, occurred approximately 5,500–11,000 years ago, which coincide with the start of domestication. This is the first phylogenomic study on the Thoroughbred racehorse in association with its spatio-temporal dynamics. 
The database and genetic history information of Thoroughbred mitogenomes obtained from the present study provide useful information for future horse improvement projects, as well as for the study of horse genomics, conservation, and in association with its geographical distribution.

Bayesian skyline plot (BSP) based on mitochondrial genome sequences from 167 modern horses.
The dark line in the BSP represents the estimated effective population size through time. The green area represents the 95% highest posterior density confidence intervals for this estimate.

Interesting excerpts:

We carried out a Bayesian coalescent approach using extended mitochondrial genome sequences from 167 horses in order to further assess the timescale of horse domestication. Here, we first calculated the time of the most recent common ancestor of Thoroughbred horses. Our analysis revealed the age of the most recent common ancestor of the racing horse to be around 8,100–111,500 years old. This estimate is much younger than that of the most recent common ancestor of the global horses, which has been estimated at 0.7286 Mys old.

Bayesian maximum clade credibility phylogenomic tree on the ground of the mitochondrial genome sequences of 167 modern horses.
The data set (16,432 base pairs) was also analyzed phylogenetically using Bayesian inference (BI) and maximum likelihood (ML) methods which showed the same topologies. 95% Highest Posterior Density of node heights are shown by blue bars. Groups are marked by a “G”. Numbers at the nodes represent (left to right): posterior probabilities (≥0.80) for the BI tree and bootstrap values (≥70%) for the ML tree. The racing horses were revealed to have multiple maternal origins and to be closely related to horses from one Asian, two Middle Eastern, and five European breeds. Results of phylogenomic analyses also uncovered no apparent association between geographic origin or breed and heterogeneity of global horses. The most recent common ancestor of the Thoroughbreds lived approximately 8,100–111,500 years ago, which was significantly younger than the most recent common ancestor of modern horses (0.7286 My).

On the domestication time of modern horses, there have been several publications derived from both archaeological [49–51] and molecular [11–12, 23, 48] evidences. D’Andrade [49] reported that the origin of domestic horses was around 4,000 years ago. Ludwig et al. [50] stated the domestication time to be about 5,000 years ago, while Anthony [51] noted that horse rearing by humans may have occurred approximately 6,000 years ago. Subsequently, on the basis of mitochondrial genome sequences, Lippold et al. [11] and Achilli et al. [12] postulated domestication time to be about 6,000–8,000 and 6,000–7,000 years ago, respectively. Warmuth [48] dated domestication time to 5,500 years ago based on autosomal genotype data, while Orlando et al. [23] claimed that Przewalski’s and domestic horse populations diverged 38,000–72,000 years ago based on analysis of genome sequences. In contrast to the previous hypothesized date of horse domestication, the results of our Bayesian skyline plot (BSP) analysis depict a rapid expansion of the horse population approximately 5,500–11,000 years ago, which coincides with the start of domestication.

It seems that we will not have an update on horse aDNA from the ISBA 8, so we will have to make do with this for the moment.


Yet another Bayesian phylogenetic tree – now for Dravidian


Open access A Bayesian phylogenetic study of the Dravidian language family, by Kolipakam et al. (including Bouckaert and Gray), Royal Society Open Science (2018).

Abstract (emphasis mine):

The Dravidian language family consists of about 80 varieties (Hammarström H. 2016 Glottolog 2.7) spoken by 220 million people across southern and central India and surrounding countries (Steever SB. 1998 In The Dravidian languages (ed. SB Steever), pp. 1–39: 1). Neither the geographical origin of the Dravidian language homeland nor its exact dispersal through time are known. The history of these languages is crucial for understanding prehistory in Eurasia, because despite their current restricted range, these languages played a significant role in influencing other language groups including Indo-Aryan (Indo-European) and Munda (Austroasiatic) speakers. Here, we report the results of a Bayesian phylogenetic analysis of cognate-coded lexical data, elicited first hand from native speakers, to investigate the subgrouping of the Dravidian language family, and provide dates for the major points of diversification. Our results indicate that the Dravidian language family is approximately 4500 years old, a finding that corresponds well with earlier linguistic and archaeological studies. The main branches of the Dravidian language family (North, Central, South I, South II) are recovered, although the placement of languages within these main branches diverges from previous classifications. We find considerable uncertainty with regard to the relationships between the main branches.

MCC tree summary of the posterior probability distribution of the tree sample generated by the analysis with the relaxed covarion model with relative mutation rates estimated. Node bars give the 95% highest posterior density (HPD) limits of the node heights. Numbers over branches give the posterior probability of the node to the right (range 0–1). Colour coding of the branches gives subgroup affiliation: red, South I; blue, Central; purple, North; yellow, South II.

With every new paper using these revamped pseudoscientific linguistic methods popular in the early 2000s, including glottochronology, Swadesh lists, phylogenetic trees, mutation rates, etc., I feel a little more like Sergeant Murtaugh…

Featured image, from the article: “Map of the Dravidian languages in India, Pakistan, Afghanistan and Nepal adapted from Ethnologue [2]. Each polygon represents a language variety (language or dialect). Colours correspond to subgroups (see text). The three large South I languages, Kannada, Tamil and Malayalam are light red, while the smaller South I languages are bright red. Languages present in the dataset used in this paper are indicated by name, with languages with long (950 + years) literatures in bold.”


Quantitative analysis of population-scale family trees with millions of relatives


The paper Quantitative analysis of population-scale family trees with millions of relatives, by Kaplanis, Gordon, Shor, et al. Science (2018) 359(6379), based on a study of genealogical information at Geni, is making news worldwide today.

Abstract:

Family trees have vast applications in multiple fields from genetics to anthropology and economics. However, the collection of extended family trees is tedious and usually relies on resources with limited geographical scope and complex data usage restrictions. Here, we collected 86 million profiles from publicly-available online data shared by genealogy enthusiasts. After extensive cleaning and validation, we obtained population-scale family trees, including a single pedigree of 13 million individuals. We leveraged the data to partition the genetic architecture of longevity by inspecting millions of relative pairs and to provide insights into the geographical dispersion of families. We also report a simple digital procedure to overlay other datasets with our resource in order to empower studies with population-scale genealogical data.

While the article is behind a paywall, you can still read its preprint at bioRxiv.

Excerpts interesting for genetic genealogy (emphasis mine):

Assessment of theories of familial dispersion

Familial dispersion is a major driving force of various genetic, economical, and demographic processes (…)

First, we analyzed sex-specific migration patterns (21) to resolve conflicting results regarding sex bias in human migration (52). Our results indicate that females migrate more than males in Western societies but over shorter distances. The median mother-child distances were significantly larger (Wilcox, one-tailed, p < 10⁻⁹⁰) by a factor of 1.6x than father-child distances (Fig. 4A). This trend appeared throughout the 300 years of our analysis window, including in the most recent birth cohort, and was observed both in North American (Wilcox, one-tailed, p < 10⁻²³) and European duos (Wilcox, one-tailed, p < 10⁻⁸⁷). On the other hand, we found that the average mother-child distances (fig. S17) were significantly shorter than the father-child distances (t-test, p < 10⁻⁹⁰), suggesting that long-range migration events are biased toward males. Consistent with this pattern, fathers displayed a significantly (p < 10⁻⁸³) higher frequency than mothers to be born in a different country than their offspring (Fig. 4B). Again, this pattern was evident when restricting the data to North American or European duos. Taken together, males and females in Western societies show different migration distributions in which patrilocality occurs only in relatively local migration events and large-scale events that usually involve a change of country are more common in males than females.
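The apparent paradox in this excerpt (larger median but smaller mean mother-child distances) is what a long-tailed distance distribution produces: many short moves raise the median, while a few very long moves dominate the mean. A minimal sketch with made-up numbers, purely hypothetical and not data from the paper:

```python
from statistics import mean, median

# Hypothetical birthplace distances in km (NOT data from the paper).
# Mothers: moderate distances, no extreme long-range move.
mother_child_km = [5, 6, 7, 8, 9]
# Fathers: mostly very local, plus one long-range (e.g. cross-country) move.
father_child_km = [1, 2, 3, 4, 500]

for label, distances in [("mother-child", mother_child_km),
                         ("father-child", father_child_km)]:
    print(f"{label}: median={median(distances)} km, mean={mean(distances)} km")
```

Here the mother-child median is larger than the father-child one, while the single long father-child move makes the father-child mean far larger, which mirrors the pattern the authors describe.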

An example of the genealogical and demographic information available on the website, with a real pedigree of ~6000 individuals. Green: profiles, red: marriages. The family tree spans about 7 generations.

Next, we inspected the marital radius (the distance between mates’ places of birth) and its effect on the genetic relatedness of couples (21). The isolation by distance theory of Malécot predicts that increases in the marital radius should exponentially decrease the genetic relatedness of individuals (53). But the magnitude of these forces is also a function of factors such as taboos against cousin marriages (54).

We started by analyzing temporal changes in the birth locations of couples in our cohort. Prior to the Industrial Revolution (<1750), most marriages occurred between people born only 10km from each other (Fig. 4A [black line]). Similar patterns were found when analyzing European-born individuals (fig. S18) or North American-born individuals (fig. S19). After the beginning of the second Industrial Revolution (1870), the marital radius rapidly increased and reached ~100km for most marriages in the birth cohort in 1950. Next, we analyzed the genetic relatedness (IBD) of couples as measured by tracing their genealogical ties (Fig. 4C). Between 1650 and 1850, the average IBD of couples was relatively stable and on the order of ~4th cousins, whereas IBD exhibited a rapid decrease post-1850. Overall, the median marital radius for each year showed a strong correlation (R² = 72%) with the expected IBD between couples. Every 70km increase in the marital radius correlated with a decrease in the genetic relatedness of couples by one meiosis event (Fig. 4D). This correlation matches previous isolation by distance forces in continental regions (55). However, this trend was not consistent over time and exhibits three phases. For the pre-1800 birth cohorts, the correlation between marital distance and IBD was insignificant (p > 0.2) and weak (R² = 0.7%) (fig. S20A). Couples born around 1800-1850 showed a two-fold increase in their marital distance from 8km in 1800 to 19km in 1850. Marriages are usually about 20-25 years after birth and around this time (1820-1875) rapid transportation changes took place, such as the advent of railroad travel in most of Europe and the United States. However, the increase in marital distance was significantly (p < 10⁻¹³) coupled with an increase in genetic relatedness, contrary to the isolation by distance theory (fig. S20B). Only for the cohorts born after 1850, did the data match (R² = 80%) the theoretical model of isolation by distance (fig. S20C). Taken together, the data shows a 50-year lag between the advent of increased familial dispersion and the decline of genetic relatedness between couples. During this time, individuals continued to marry relatives despite the increased distance. From these results, we hypothesize that changes in 19th century transportation were not the primary cause for decreased consanguinity. Rather, our results suggest that shifting cultural factors played a more important role in the recent reduction of genetic relatedness of couples in Western societies.
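To make the reported slope concrete, the arithmetic can be sketched as follows. The 70 km per meiosis figure comes from the excerpt and, per the authors, only describes the post-1850 cohorts; the assumption that each extra meiosis halves the expected shared ancestry is a standard simplification added here, not a result of the paper, and the function names are hypothetical:

```python
# Slope reported in the excerpt: ~70 km of extra marital radius per
# additional meiosis separating a couple (valid for post-1850 cohorts only).
KM_PER_MEIOSIS = 70.0


def extra_meioses(radius_increase_km: float) -> float:
    """Additional meioses implied by an increase in marital radius."""
    return radius_increase_km / KM_PER_MEIOSIS


def relative_relatedness(radius_increase_km: float) -> float:
    """Expected relatedness relative to baseline.

    Assumption (not from the paper): each extra meiosis halves the
    expected shared ancestry between the two mates.
    """
    return 0.5 ** extra_meioses(radius_increase_km)


for delta_km in (0, 70, 140):
    print(f"+{delta_km} km -> {extra_meioses(delta_km):.1f} extra meioses, "
          f"relatedness x{relative_relatedness(delta_km):.2f}")
```

This is a linear extrapolation of a correlation, so it should not be read as a causal model; the excerpt itself shows that before 1850 marital distance and relatedness did not follow this relation.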

EDIT 3/2/2018: Added details of the article.
