The cradle of Russians, an obvious Finno-Volgaic genetic hotspot


First look of an accepted manuscript (behind paywall), Genome-wide sequence analyses of ethnic populations across Russia, by Zhernakova et al. Genomics (2019).

Interesting excerpts:

There remain ongoing discussions about the origins of the ethnic Russian population. The ancestors of ethnic Russians were among the Slavic tribes that separated from the early Indo-European Group, which included ancestors of modern Slavic, Germanic and Baltic speakers, who appeared in the northeastern part of Europe ca. 1,500 years ago. Slavs were found in the central part of Eastern Europe, where they came in direct contact with (and likely assimilation of) the populations speaking Uralic (Volga-Finnish and Baltic-Finnish), and also Baltic languages [11–13]. In the following centuries, Slavs interacted with the Iranian-Persian, Turkic and Scandinavian peoples, all of which in succession may have contributed to the current pattern of genome diversity across the different parts of Russia. At the end of the Middle Ages and in the early modern period, there occurred a division of the East Slavic unity into Russians, Ukrainians and Belarusians. It was the Russians who drove the colonization movement to the East, although other Slavic, Turkic and Finnish peoples took part in this movement, as the eastward migrations brought them to the Ural Mountains and further into Siberia, the Far East, and Alaska. During that interval, the Russians encountered the Finns, Ugrians, and Samoyeds speakers in the Urals, but also the Turkic, Mongolian and Tungus speakers of Siberia. Finally, in the great expanse between the Altai Mountains on the border with Mongolia, and the Bering Strait, they encountered paleo-Asiatic groups that may be genetically closest to the ancestors of the Native Americans. Today’s complex patchwork of human diversity in Russia has continued to be augmented by modern migrations from the Caucasus, and from Central Asia, as modern economic migrations take shape.

Sample relatedness based on genotype data. Eurasia: Principal Component plot of 574 modern Russian genomes. Colors reflect geographical regions of collection; shapes reflect the sample source. Red circles show the location of Genome Russia samples.

In the current study, we annotated whole genome sequences of individuals currently living on the territory of Russia and identifying themselves as ethnic Russian or as members of a named ethnic minority (Fig. 1). We analyzed genetic variation in three modern populations of Russia (ethnic Russians from Pskov and Novgorod regions and ethnic Yakut from the Sakha Republic), and compared them to the recently released genome sequences collected from 52 indigenous Russian populations. The incidence of function-altering mutations was explored by identifying known variants and novel variants and their allele frequencies relative to variation in adjacent European, East Asian and South Asian populations. Genomic variation was further used to estimate genetic distance and relationships, historic gene flow and barriers to gene flow, the extent of population admixture, historic population contractions, and linkage disequilibrium patterns. Lastly, we present demographic models estimating historic founder events within Russia, and a preliminary HapMap of ethnic Russians from the European part of Russia and Yakuts from eastern Siberia.

Sample relatedness based on genotype data. Western Russia and neighboring countries: Principal Component plot of 574 modern Russian genomes. Colors reflect geographical regions of collection; shapes reflect the sample source. Red circles show the location of Genome Russia samples.

The collection of identified SNPs was used to inspect quantitative distinctions among 264 individuals from across Eurasia (Fig. 1) using Principal Component Analysis (PCA) (Fig. 2). The first and the second eigenvectors of the PCA plot are associated with longitude and latitude, respectively, of the sample locations and accurately separate Eurasian populations according to geographic origin. East European samples cluster near Pskov and Novgorod samples, which fall between northern Russians, Finno-Ugric peoples (Karelian, Finns, Veps etc.), and other Northeastern European peoples (Swedes, Central Russians, Estonian, Latvians, Lithuanians, and Ukrainians) (Fig. 2b). Yakut individuals map into the Siberian sample cluster as expected (Fig. 2a). To obtain an extended view of population relationships, we performed a maximum likelihood-based estimation of ancestry and population structure using ADMIXTURE [46] (Fig. 2c). The Novgorod and Pskov populations show similar profiles with their Northeastern European ancestors while the Yakut ethnic group showed mixed ancestry similar to the Buryat and Mongolian groups.
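For readers unfamiliar with how a PCA plot like the one described in this excerpt is produced, here is a minimal sketch in Python. It is not the authors' pipeline: the per-SNP scaling by the binomial standard deviation is one common convention that I am assuming, the function names are my own, and the data are simulated for two made-up populations.

```python
import numpy as np

def genotype_pca(genotypes, n_components=2):
    """Project individuals onto the top principal components.

    genotypes: (n_individuals, n_snps) array of alternate-allele counts (0, 1, 2).
    """
    g = np.asarray(genotypes, dtype=float)
    freq = g.mean(axis=0) / 2.0              # per-SNP allele frequency
    keep = (freq > 0) & (freq < 1)           # drop SNPs that do not vary
    g, freq = g[:, keep], freq[keep]
    # Centre each SNP and scale it by its binomial standard deviation
    # (an assumed, commonly used normalisation).
    z = (g - 2 * freq) / np.sqrt(2 * freq * (1 - freq))
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def simulate_two_populations(n_per_pop=20, n_snps=500, seed=0):
    """Toy data: two populations whose allele frequencies drift apart."""
    rng = np.random.default_rng(seed)
    base = rng.uniform(0.2, 0.8, n_snps)
    shift = rng.normal(0.0, 0.15, n_snps)
    p_a = np.clip(base + shift, 0.05, 0.95)
    p_b = np.clip(base - shift, 0.05, 0.95)
    g_a = rng.binomial(2, p_a, size=(n_per_pop, n_snps))
    g_b = rng.binomial(2, p_b, size=(n_per_pop, n_snps))
    return np.vstack([g_a, g_b])

if __name__ == "__main__":
    pcs = genotype_pca(simulate_two_populations())
    print(pcs[:3])    # first three individuals of population A
    print(pcs[-3:])   # last three individuals of population B
```

In this toy case the first component separates the two simulated populations; in real data, as in the excerpt, the leading components tend to track geography instead.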

Population structure across samples in 178 populations from five major geographic regions (k=5). Samples are pooled across three different studies that covered the territory of Russian Federation (Mallick et al. 2016 [36], Pagani et al. 2016 [37], this study). The optimal k-value was selected by value of cross validation error. Russian samples from all studies (highlighted in bold dark blue) show a slight gradient from Eastern European (Ukrainian, Belorussian, Polish) to North European (Estonian Karelian, Finnish) structures, reflecting population history of northward expansion. Yakut samples from different studies (highlighted in bold red) also show a slight gradient from Mongolian to Siberian people (Evens), as expected from their original admixture and northward expansions. The samples originated from this study are highlighted, and plotted in separated boxes below.

Possible admixture sources of the Genome Russia populations were addressed more formally by calculating F3 statistics, which is an allele frequency-based measure, allowing to test if a target population can be modeled as a mixture of two source populations [48]. Results showed that Yakut individuals are best modeled as an admixture of Evens or Evenks with various European populations (Supplemental Table S4). Pskov and Novgorod showed admixture of European with Siberian or Finno-Ugric populations, with Lithuanian and Latvian populations being the dominant European sources for Pskov samples.
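The logic of the F3 test in this excerpt can be shown with a small Python sketch. It uses the plain definition f3(C; A, B) = mean over SNPs of (c − a)(c − b) on allele frequencies, and it deliberately leaves out the finite-sample bias correction and the block-jackknife standard errors that real tools apply; the function name and the frequencies are invented for illustration.

```python
import numpy as np

def f3(target, source1, source2):
    """Naive f3(target; source1, source2) from per-SNP allele frequencies.

    A clearly negative value suggests the target can be modelled as a
    mixture of populations related to the two sources. No bias correction
    or significance test is performed here (simplifying assumption).
    """
    c, a, b = (np.asarray(x, dtype=float) for x in (target, source1, source2))
    return float(np.mean((c - a) * (c - b)))

if __name__ == "__main__":
    # Invented allele frequencies at three SNPs.
    a = np.array([0.1, 0.5, 0.9])
    b = np.array([0.3, 0.2, 0.6])
    c = 0.5 * (a + b)          # a 50/50 mixture of A and B
    print(f3(c, a, b))         # negative: C looks admixed
    print(f3(a, b, c))         # positive: A is not a mixture of B and C
```

For a perfect 50/50 mixture each term equals −0.25·(a − b)², which is why the statistic comes out negative for the admixed target and positive when an unadmixed population is put in the target slot.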

The heatmaps of gene flow barriers show for each point at the geographical map the interpolated differences in allele frequencies (AF) between the estimated AF at the point with AFs in the vicinity of this point. The direction of the maximal difference in allele frequencies is coded by colors and arrows.

So, Russians expanded in the Middle Ages as acculturated Finno-Volgaic peoples.

Or maybe the true Germano-Slavonic™-speaking area was in north-eastern Europe, until the recent arrival of Finno-Permians with the totally believable Nganasan-Saami horde, whereas Yamna -> Bell Beaker represented Vasconic-Caucasian expanding all over Europe in the Bronze Age. Because steppe ancestry in Fennoscandia and Modern Basques in Iberia.

A really hard choice between equally plausible models.


New monograph on The Tale of Igor’s Campaign (in Russian)


Sergej Nikolaev has published a new monograph on The Tale of Igor’s Campaign (you should download and open it in a PDF viewer to view some special characters correctly):

«Слово о полку Игореве»: реконструкция стихотворного текста (The Tale of Igor’s Campaign: a reconstruction of the verse text), by С.Л. Николаев (S. L. Nikolaev) (2018).

Abstract (translated from Russian):

The text of The Tale of Igor’s Campaign (hereafter the Tale) has come down to us in two inexact (edited) copies made from an early 16th-century manuscript, and in several extracts from it. The layers introduced by the early 16th-century scribe (or by several scribes), namely editing in the vein of the Second South Slavic influence and late dialectisms, are inconsistent (§9.3.1) and did not distort the verse text of the turn of the 12th–13th centuries so much as to make its reconstruction impossible. By its genre (secular poetry), the Tale does not belong to the texts that were copied many times over in monastic scriptoria. It is therefore not excluded that the early 16th-century manuscript is a careless copy, yet the very first copy, of the Old Russian original.

The Tale could have sounded approximately as I propose in my reconstruction, the morphology and accentology of its author’s language could have been organized as I assume, and it could have been composed in the system of versification that I reconstruct. In reality, however, much could have been organized differently. The reconstruction of the accentological system and two other hypotheses (on non-isosyllabic syllabotonic verse and on the optional vocalization of weak reduced vowels) depend on one another and form a circulus in probando. The accentological system reconstructed for the Tale is derived from the Proto-Slavic reconstruction and is confirmed by data from modern dialects, but it is not attested in Old Russian written records. A weak point of my reconstruction is the vocalization of weak reduced vowels in positions where it is needed solely for metrical reasons. In a work such as this it is impossible to avoid conjecture and risky assumptions, and a number of the hypotheses put forward are “on the verge of a foul”; on the whole, however, my reconstruction is built on facts and their interpretations, and is thus a scholarly study. The work uses results from adjacent disciplines, above all the study of verse. The reconstruction of the Tale presented in this book is the first attempt at systematic modeling of a verse text in a hypothetical Old Russian dialect of the 12th–13th centuries, whose existence is highly probable. I would like to hope that my work will make its modest contribution to the study of this great monument of Old Russian literature.

The Tale of Igor’s Campaign is probably the oldest Slavic epic available, recorded later than what oral tradition and linguistic details reflect, like the oldest Indo-Iranian texts. It contains many details interesting for Proto-Slavic (and North-West Indo-European) language and culture reconstruction.

For those confusing recent attestation of languages with their relevance for comparative grammar, I would suggest Martin Joachim Kümmel’s article Is ancient old and modern new? Fallacies of attestation and reconstruction (with special focus on Indo-Iranian).

Featured image: Viktor Vasnetsov. After Igor Svyatoslavich’s fighting with the Polovtsy (Photographer, referenced in Wikipedia).


Genetic landscapes showing human genetic diversity aligning with geography


New preprint at bioRxiv, Genetic landscapes reveal how human genetic diversity aligns with geography, by Peter, Petkova, and Novembre (2017).


Summarizing spatial patterns in human genetic diversity to understand population history has been a persistent goal for human geneticists. Here, we use a recently developed spatially explicit method to estimate “effective migration” surfaces to visualize how human genetic diversity is geographically structured (the EEMS method). The resulting surfaces are “rugged”, which indicates the relationship between genetic and geographic distance is heterogenous and distorted as a rule. Most prominently, topographic and marine features regularly align with increased genetic differentiation (e.g. the Sahara desert, Mediterranean Sea or Himalaya at large scales; the Adriatic, inter-island straits in near Oceania at smaller scales). We also see traces of historical migrations and boundaries of language families. These results provide visualizations of human genetic diversity that reveal local patterns of differentiation in detail and emphasize that while genetic similarity generally decays with geographic distance, there have regularly been factors that subtly distort the underlying relationship across space observed today. The fine-scale population structure depicted here is relevant to understanding complex processes of human population history and may provide insights for geographic patterning in rare variants and heritable disease risk.

Regional patterns of genetic diversity. a: scale bar for relative effective migration rate. Posterior effective migration surfaces for b: Western Eurasia (WEA) e: Central/Eastern Eurasia (CEA) g: Africa (AFR) h: Southern African hunter-gatherers (SAHG) and k: Southeast Asian (SEA) analysis panels. ‘X’ marks locations of samples noted as displaced or recently admixed, ‘H’ denotes Hunter-Gatherer populations (both ‘X’ and ‘H’ samples are omitted from the EEMS model fit); in panel g, red circles indicate Nilo-Saharan speakers and in panel h, ‘B’ denotes Bantu-speaking populations. Approximate location of troughs are shown with dashed lines (see Extended Data Figure 4). PCA plots: c: WEA d: Europeans in WEA f: CEA i: SAHG j: AFR l: SEA. Individuals are displayed as grey dots. Large dots reflect median PC position for a sample; with colors reflecting geography matched to the corresponding EEMS figure. In the EEMS plots, approximate sample locations are annotated. For exact locations, see annotated Extended Data Figure 4 and Table S1. Features discussed in the main text and supplement are labeled. FST values per panel emphasize the low absolute levels of differentiation.

Among ‘effective migration surfaces’ (or potential past migration routes), the Pontic-Caspian steppe and its most direct connection with the Carpathian basin, the Danubian plains, appear, perhaps paradoxically, as a constant ‘trough’ (below-average migration rate) in all maps.

After all, we could have agreed that this region should a priori be thought of as the route of many migrations from the steppe and Asia into Central Europe (and thus of ‘effective migration’) in prehistoric, proto-historic and historic times, such as Suvorovo-Novodanilovka (Pre-Anatolian), Yamna (Late Indo-European), probably Srubna, Scythian-Cimmerian, Sarmatian, Huns, Goths, Avars, Slavs, and Mongols.

It most likely (at least partially) represents a rather recent historical barrier to admixture, involving successive Byzantine, South Slavic, and Ottoman spheres of influence positioned against Balto-Slavic societies of Eastern Europe.

Location of troughs in West Eurasia (below average migration rate in more than 95% of MCMC iterations) are given in brown. Sample locations and EEMS grid are displayed for the West Eurasian analysis panel. FST values are provided per panel to emphasize the low absolute levels of differentiation.
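The criterion in this caption (below-average migration rate in more than 95% of MCMC iterations) is easy to picture with a toy Python sketch. This is not EEMS code: I am assuming that "average" means the mean log migration rate across grid cells within each iteration, and the function names and the three-cell posterior are invented.

```python
import numpy as np

def trough_mask(log_rates, min_fraction=0.95):
    """Flag grid cells that are below average in most posterior samples.

    log_rates: (n_iterations, n_cells) array of sampled log migration rates.
    Returns a boolean array with one entry per cell.
    """
    rates = np.asarray(log_rates, dtype=float)
    # Assumed definition of "average": the mean over cells in each iteration.
    iteration_mean = rates.mean(axis=1, keepdims=True)
    below = rates < iteration_mean
    # Fraction of iterations in which each cell sits below the average.
    return below.mean(axis=0) > min_fraction

def simulate_posterior(n_iterations=200, seed=0):
    """Toy posterior: a low, an average and a high migration cell."""
    rng = np.random.default_rng(seed)
    cell_means = np.array([-1.0, 0.0, 1.0])
    return cell_means + rng.normal(0.0, 0.1, size=(n_iterations, 3))

if __name__ == "__main__":
    print(trough_mask(simulate_posterior()))
```

Only the consistently low cell is flagged; the middle cell dips below the average in roughly half of the iterations, which is not enough to count as a trough under the 95% rule.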

Featured image, from the article: “Large-scale patterns of population structure. a: EEMS posterior mean effective migration surface for Afro-Eurasia (AEA) panel. ‘X’ marks locations of samples excluded as displaced or recently admixed. ‘H’ marks locations of excluded hunter-gatherer populations. Regions and features discussed in the main text are labeled. Approximate locations of troughs are annotated with dashed lines (see Extended Data Figure 4). b: PCA plot of AEA panel: Individuals are displayed as grey dots, colored dots reflect median of sample locations; with colors reflecting geography and matching with the EEMS plot. Locations displayed in the EEMS plot reflect the position of populations after alignment to grid vertices used in the model (see methods).”

Images and text available under a CC-BY-NC-ND 4.0 International License.

Discovered via Razib Khan’s blog.