New preprint papers on Finland’s population history and disease, skin pigmentation in Africa, and genetic variation in Thailand hunter-gatherers


New and interesting research posted recently on bioRxiv:

Haplotype sharing provides insights into fine-scale population history and disease in Finland, by Martin et al. (2017):

Finland provides unique opportunities to investigate population and medical genomics because of its adoption of unified national electronic health records, detailed historical and birth records, and serial population bottlenecks. We assemble a comprehensive view of recent population history (≤100 generations), the timespan during which most rare disease-causing alleles arose, by comparing pairwise haplotype sharing from 43,254 Finns to geographically and linguistically adjacent countries with different population histories, including 16,060 Swedes, Estonians, Russians, and Hungarians. We find much more extensive sharing in Finns, with at least one ≥ 5 cM tract on average between pairs of unrelated individuals. By coupling haplotype sharing with fine-scale birth records from over 25,000 individuals, we find that while haplotype sharing broadly decays with geographical distance, there are pockets of excess haplotype sharing; individuals from northeast Finland share several-fold more of their genome in identity-by-descent (IBD) segments than individuals from southwest regions containing the major cities of Helsinki and Turku. We estimate recent effective population size changes over time across regions of Finland and find significant differences between the Early and Late Settlement Regions as expected; however, our results indicate more continuous gene flow than previously indicated as Finns migrated towards the northernmost Lapland region. Lastly, we show that haplotype sharing is locally enriched among pairs of individuals sharing rare alleles by an order of magnitude, especially among pairs sharing rare disease causing variants. Our work provides a general framework for using haplotype sharing to reconstruct an integrative view of recent population history and gain insight into the evolutionary origins of rare variants contributing to disease.

Migration rates and haplotype sharing within Finland and between neighboring countries. A) Map of regional Finnish, Swedish, and Estonian birthplaces. Purple triangle indicates St. Petersburg, Russia. Hungary not shown. Finnish, Swedish, and Estonian region labels are shown in Table S3. B) Principal components analysis (PCA) of unrelated individuals, colored by birth region as shown in A) if available or country otherwise. C-D) Migration rates inferred with EEMS. Values and colors indicate inferred rates, for example with +1 (shades of blue) indicating an order of magnitude more migration at a given point on average, and shades of orange indicating migration barriers. C) Migration rates among municipalities in Finland. D) Migration rates within and between Finland, Sweden, Estonia, and St. Petersburg, Russia. Available under a CC-BY 4.0 International license.

Useful background for understanding this paper is the body of research published by the Institute for Molecular Medicine Finland (FIMM): their website contains detailed research on Finland’s recent genetic history.

NOTE: The featured image of this article contains three figures from the FIMM (License CC-BY 4.0). Left: Position of the points represents the locations of 1042 Finnish individuals. By clustering the individuals into two groups based on genome data we see a split between eastern (blue) and western (red) parts. Individuals who show considerable relatedness to both groups have been colored with cyan. Both parents of each individual were born close to each other and based on the parents’ birth years we can infer that we are looking at the genetic structure present in Finland before the 1950s. Center: An estimated borderline of the Treaty of Nöteborg on top of the map from the left. The border line is drawn between Jääski (61.04 N, 28.92 E) and Pyhäjoki (64.46 N, 24.26 E). Right: The settlement border divides Finland into the early settlement region (to the west and south of the border) and the late settlement region (to the east and north of the border) (Jutikkala 1933, p. 91). We see that Southern Savo (in the south-eastern part of the early settlement region) is one of the few parts of the early settlement region dominated by the eastern genetic group. Information from Matti Pirinen and Sini Kerminen, 24.5.2017.

An Unexpectedly Complex Architecture for Skin Pigmentation in Africans, by Martin et al. (2017):

Fewer than 15 genes have been directly associated with skin pigmentation variation in humans, leading to its characterization as a relatively simple trait. However, by assembling a global survey of quantitative skin pigmentation phenotypes, we demonstrate that pigmentation is more complex than previously assumed with genetic architecture varying by latitude. We investigate polygenicity in the Khoe and the San, populations indigenous to southern Africa, who have considerably lighter skin than equatorial Africans. We demonstrate that skin pigmentation is highly heritable, but that known pigmentation loci explain only a small fraction of the variance. Rather, baseline skin pigmentation is a complex, polygenic trait in the KhoeSan. Despite this, we identify canonical and non-canonical skin pigmentation loci, including near SLC24A5, TYRP1, SMARCA2/VLDLR, and SNX13 using a genome-wide association approach complemented by targeted resequencing. By considering diverse, under-studied African populations, we show how the architecture of skin pigmentation can vary across humans subject to different local evolutionary pressures.

Contrasting maternal and paternal genetic variation of hunter-gatherer groups in Thailand, by Kutanan et al. (2017):

The Maniq and Mlabri are the only recorded nomadic hunter-gatherer groups in Thailand. Here, we sequenced complete mitochondrial (mt) DNA genomes and ~2.364 Mbp of non-recombining Y chromosome (NRY) to learn more about the origins of these two enigmatic populations. Both groups exhibited low genetic diversity compared to other Thai populations, and contrasting patterns of mtDNA and NRY diversity: there was greater mtDNA diversity in the Maniq than in the Mlabri, while the converse was true for the NRY. We found basal uniparental lineages in the Maniq, namely mtDNA haplogroups M21a, R21 and M17a, and NRY haplogroup K. Overall, the Maniq are genetically similar to other negrito groups in Southeast Asia. By contrast, the Mlabri haplogroups (B5a1b1 for mtDNA and O1b1a1a1b and O1b1a1a1b1a1 for the NRY) are common lineages in Southeast Asian non-negrito groups, and overall the Mlabri are genetically similar to their linguistic relatives (Htin and Khmu) and other groups from northeastern Thailand. In agreement with previous studies of the Mlabri, our results indicate that the Mlabri do not directly descend from the indigenous negritos. Instead, they likely have a recent origin (within the past 1,000 years) by an extreme founder event (involving just one maternal and two paternal lineages) from an agricultural group, most likely the Htin or a closely-related group.


Indo-European demic diffusion model, 3rd edition


I have just uploaded the working draft of the third version of the Indo-European demic diffusion model. Unlike the previous two versions, which were published as essays (fully developed papers), this new version adds more information on human admixture, and probably needs important corrections before a definitive edition can be published.

The third version is available right now on ResearchGate and I will post the PDF at Academia Prisca, as soon as possible:

Map overlaid by PCA including Yamna, Corded Ware, Bell Beaker, and other samples

Feel free to comment on the paper here, or (preferably) in our forum.

A working version (needing some corrections), divided into sections and illustrated with up-to-date, high-resolution maps, can be found (as always) at the official collaborative Wiki website.

Effective migration in Western Eurasia reveals fine-scale migration surface features


Interesting poster from SMBE 2017, Maps of effective migration as a summary of global human genetic diversity, by Benjamin Peter, Desislava Petkova, Matthew Stephens & John Novembre, of the JNPopGen group of the University of Chicago.

You can read the full poster in the original PDF, or as a compressed image. The following are important excerpts:

Aim: To answer the following questions:

  • Which regions have high/low effective migration?
  • How well is human genetic diversity explained by this pure isolation-by-distance model?
  • How does the explanatory performance of EEMS compare to PCA?

Method: The poster uses the method proposed by Petkova et al. (2016) to fit a map of time-averaged (effective) migration rates to geographically referenced samples, and merges data from 24 different studies (8740 individuals from 469 populations) to assess human genetic diversity on global and continental scales.

  1. Basic workflow:
    • Merge data, remove duplicated & related individuals.
    • Remove hunter-gatherer and recently admixed populations. Their locations are still indicated with (H) and (X), respectively.
  2. EEMS analysis:
    • Calculate genetic distance matrix between all individuals.
    • Fit migration map to data using EEMS MCMC algorithm.
  3. Comparison to PCA: Standard PCA using flashpca (Abraham & Inouye 2014) was used; the authors compare the correlation of the genetic distance induced from the first ten PCs with the fitted EEMS distance.
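
The distance-matrix and comparison steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the poster's actual pipeline: the dissimilarity used here (mean squared difference in allele counts) is a simple stand-in for whatever EEMS takes as input, the toy genotypes and `pc_scores` are invented, and all function names are hypothetical. The PC coordinates are assumed to come from a separate PCA run (for instance with flashpca).

```python
from itertools import combinations
from math import sqrt


def genetic_distance_matrix(genotypes):
    """Mean squared difference in allele counts (0/1/2) for every pair of individuals."""
    n = len(genotypes)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        diffs = [(a - b) ** 2 for a, b in zip(genotypes[i], genotypes[j])]
        dist[i][j] = dist[j][i] = sum(diffs) / len(diffs)
    return dist


def squared_distance_from_coords(coords):
    """Squared Euclidean distance between individuals in a low-dimensional space (e.g. PC scores)."""
    n = len(coords)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
    return dist


def upper_triangle(matrix):
    """Flatten the entries above the diagonal, one value per pair of individuals."""
    return [matrix[i][j] for i, j in combinations(range(len(matrix)), 2)]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Invented toy data: 4 individuals x 6 SNPs, plus made-up scores on two PCs.
genotypes = [
    [0, 0, 1, 2, 0, 1],
    [0, 1, 1, 2, 0, 1],
    [2, 2, 0, 0, 1, 0],
    [2, 1, 0, 0, 2, 0],
]
pc_scores = [[-1.0, 0.1], [-0.8, -0.1], [0.9, 0.2], [1.0, -0.2]]

observed = genetic_distance_matrix(genotypes)
induced = squared_distance_from_coords(pc_scores)
r = pearson(upper_triangle(observed), upper_triangle(induced))
print(f"correlation between observed and PC-induced distances: {r:.3f}")
```

The same `pearson(upper_triangle(...), upper_triangle(...))` call could be applied to a fitted EEMS distance matrix in place of `induced`, which is the kind of comparison the poster describes.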

Interpretation: A continuous habitat is approximated by a discrete grid (light gray). A Bayesian model is used to infer the most likely migration rates, which are given on a log scale relative to the average (blue = 100x higher, brown = 100x lower).
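
To make the colour scale concrete, here is a minimal sketch of how a value on such a log scale translates into a fold change relative to the average migration rate. It assumes the scale is base 10 (so that +2 corresponds to 100x higher); the function name `fold_change` is hypothetical and not part of EEMS.

```python
def fold_change(log10_rate):
    """Convert a log10 relative migration rate into a fold change versus the average."""
    return 10.0 ** log10_rate


# Values as they might be read off a map legend (assumed base-10 scale).
for value in (-2, -1, 0, 1, 2):
    print(f"log10 rate {value:+d} -> {fold_change(value):g}x the average")
```

Under this assumption, a value of 0 means average migration, positive values mark corridors of higher effective migration, and negative values mark barriers.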

Map of effective migrations in Europe

Results (see maps):

  1. Global diversity patterns correlate with topographical features
  2. In Western Eurasia, EEMS reveals fine-scale migration surface features

Discussion: EEMS maps are an intuitive and direct way to visualize geographically referenced genetic data.

Dense sampling (Western Eurasian panel) in particular yields high resolution and accuracy, but the method works well both at a global scale (FST=0.06) and within Western Eurasia alone (FST=0.01).

EEMS maps predict genetic differences reasonably well, but hunter-gatherer populations and admixed populations were excluded a priori.

Discovered via Eurogenes. Full image via Reddit.

The over-simplistic “Kossinian Model”: homogeneous peoples speaking a common language within clearly delimited cultures


There seems to be a growing trend towards over-simplistic assumptions in archaeology and linguistics, led by amateur and professional geneticists alike, due to the recent (only partially deserved) popularity of Human Evolutionary Biology.

These studies are offering ancient DNA samples, whose Y-DNA and mtDNA haplogroups and admixture analyses are showing some new valuable information on ancient cultures and peoples. However, their authors are constantly giving uninformed conclusions.

I have read a good, simple description of the Kossinnian model in the book Balkan Dialogues (Routledge, 2017), which co-editor Maria Ivanova has made fully available to read online.

Chapter 3, The transitions between Neolithic and Early Bronze Age in Greece, and the “Indo-European problem”, by Jean-Paul Demoule, offers a clear account of the difficulties found in tracing the arrival of Proto-Greek speakers to Greece or the “Coming of the Greeks”. The identifications of cultural breaks most commonly supported by academics as potentially signaling the arrival of Proto-Greeks are cited, including the Early Helladic III period ca. 2300 BC (with the diffusion of Minyan ware), or the Middle Helladic period ca. 2000 BC. The problem of finding a clear cultural break before the emergence of Mycenaean Greece (which obviously spoke an early Greek dialect) has led some to adopt a “Palaeolithic autochthonous theory” (Giannopoulos 2012), which creates more problems than it solves.

Of interest is his reference to Kossinna in light of the recent popularity of resorting to DNA to answer all problems. It is mandatory for the field of Indo-European studies, regardless of what renowned labs and journals of high impact factor are publishing, to avoid carrying on “in the steps of race based cranial measurement which enjoyed its floruit in the 19th century before fading into oblivion.”

This is why, without denying the relationship between Indo-European languages, we need to question the validity of the overall model itself, which has shown itself to be over-simplistic in assuming the movement of permanent and long-lasting homogeneous “peoples”. More precisely, we have to criticize in details the “Kossinnian Model” underlying all those assumptions – “Kossinnian”, because of the German archaeologist Gustaf Kossina (1858–1931), well known for the famous sentence: “Cultural provinces, which are clearly delimited on the basis of archaeology, correspond in every era to specific peoples or tribes” (“Scharf umgrenzte archäologische Kultur-provinzen decken sich zu allen Zeiten mit ganz bestimmten Völkern und Völkerstämmen”). Four basic assumptions arise from this central idea:

  1. Changes in languages are due to population movements, usually involving conquest, and every migration implies a linguistic change.
  2. Archaeological “cultures” are homogenous ethnic groups, with defined frontiers, based on the model of 19th- and 20th-century nation-states and equally on the model of biological entities that reproduce by parthenogenesis.
  3. There is coincidence between language and material culture.
  4. Finally, languages are also homogenous biological entities which are autonomous and clearly delimited, and which can reproduce by parthenogenesis or by scissiparity.

Unfortunately, none of these points is self-evident and each can be countered by a number of historical examples (Demoule 2014: 553–592).

While I agree with the first part of the first statement attributed to the “Kossinnian model”, i.e. that language changes are usually the product of population movements (whether or not they involve conquest), the other statements are obviously and demonstrably false. They are nevertheless frequently assumed in comments, blog posts, forums, and even research articles (particularly in those based on genetic studies), and this trend seems to be increasing lately.