A Game of Thrones in Indo-European: proto-languages in Westeros and Essos, and population genomics


I think proto-languages can be applied to basically any appropriate prehistoric setting, and especially to science fiction and fantasy settings. I often attributed the lack of interest in them to the idea that they are not fantastic enough, that they would render a fantastic world too realistic to allow for an adequate immersion of the reader (or viewer) into a new world.

With time, I have become more and more convinced that most authors don’t use proto-languages (or tweaked versions of them) simply because they can’t, and resort to the easier way: inventing some rules and words based on some basic ideas and sounds they feel would fit a certain culture or people, to get going. After all, world-building is about a good enough, not too detailed description, and books are about characters and settings, not worlds.

After the end of the 7th season of the Game of Thrones TV series, of which I have become a great fan, I had some season finale grief to deal with, so I thought about applying what we knew about Proto-Indo-Europeans to the fantasy world. Since all book translations deal with English names as if they were translations of the Common Tongue (e.g. Spanish “Invernalia” or “Poniente” for “Winterfell” or “Westeros”), the idea of a translation into Proto-Indo-European seemed quite interesting.

NOTE. I understand that, for some, the idea that “the original language is the best” would make them reject this. However, just take into account the millions who enjoy the books and the TV series only in their native language, and know nothing about the ‘original’ version…

Here are the text and images:

A Dance with Old Tongues

As you can see, the idea of the Common Tongue being Late Proto-Indo-European brings about a whole new (infinite) world of dialectal evolution, language contacts, and population expansions which must be established for the whole setting to work. This is what the text I began to write was about: to use languages (and related populations) of ca. 6000-1500 BC, and to avoid anachronisms and impossible language relationships.

As an added advantage, fans of role-playing games could expand their world with the use of the language correspondences and the maps. This way, instead of “Northern English” being spoken in the North, and “Spanish English” being spoken in Dorne, according to some selections that have been naturally criticized, you have ancient languages that fit with the ancient setting, and which were actually related to each other.

Equivalence of languages of the known world with coeval proto-languages. Solid red lines divide Graeco-Aryan from Northern Indo-European dialects (Tocharian is separated from North-West Indo-European by a dotted red line). See all maps.

I also began drawing a fantasy map, my first one (even though I have been a member of the Cartographer’s Guild for years), which eventually helped me with my updates of maps of prehistoric migrations, and even with the use of arrows and colors for scientific publications. I drew details mainly to illustrate the text, not to offer a comprehensive translated world. Most of the work was done in the summer of 2017, with some map changes done in 2018 with the help of the maps and works of fans.

NOTE. I have reviewed it during some long travels lately, and included names of “bloodlines” (i.e. haplogroups), which I find more interesting today for people to understand bottlenecks during prehistoric migrations; I have also added a map using pie charts. If this doesn’t fit well with the whole picture, it’s because it’s a recent addition. The rest is more or less the same as one or two years ago.

I don’t have time now to correct much of what I wrote. I have forgotten most of the relevant details from the books, especially A World of Ice and Fire which I think helped me a lot with this, and I am sure that after writing A Song of Sheep and Horses (now you know the why of the book names) I would deal with some language identification and cognates differently.

I decided to publish it to liven up our Facebook page of Modern Indo-European now that the 8th season is near, so that people can participate and try to translate (translatable) names and expressions into Proto-Indo-European, to see how it would work out. You can also request access to our Modern Indo-European and Proto-Indo-European groups; both are administered mainly by Fernando.

If you think this whole idea is crazy, or a huge waste of time, I agree; this is how you waste your time when you like fantasy, comic books, etc. But I am a great fan of fantasy and fiction, and I had a lot of free time back then, so I couldn’t help it…

On the other hand, if you feel that mixing fantasy (or SF) with the Proto-Indo-European question (especially population genomics) is a bad idea, I may have agreed with that two years ago, and maybe this is the reason why I hesitated to publish it then.

However, today we can read a whole new (2018 and 2019) bunch of “steppe ancestry=Indo-European” fantasies: invisible Nganasan reindeer hordes, a Fearsome Tisza River where Yamna settlers mysteriously disappear, shapeshifting Dutch CWC peoples who change haplogroups, languages dependent on cephalic types, or Yamna/Bell Beaker expanding Vasconic… So what’s the matter with some more fantasy?

Genetic landscapes showing human genetic diversity aligning with geography


New preprint at bioRxiv, Genetic landscapes reveal how human genetic diversity aligns with geography, by Peter, Petkova, and Novembre (2017).


Summarizing spatial patterns in human genetic diversity to understand population history has been a persistent goal for human geneticists. Here, we use a recently developed spatially explicit method to estimate “effective migration” surfaces to visualize how human genetic diversity is geographically structured (the EEMS method). The resulting surfaces are “rugged”, which indicates the relationship between genetic and geographic distance is heterogenous and distorted as a rule. Most prominently, topographic and marine features regularly align with increased genetic differentiation (e.g. the Sahara desert, Mediterranean Sea or Himalaya at large scales; the Adriatic, inter-island straits in near Oceania at smaller scales). We also see traces of historical migrations and boundaries of language families. These results provide visualizations of human genetic diversity that reveal local patterns of differentiation in detail and emphasize that while genetic similarity generally decays with geographic distance, there have regularly been factors that subtly distort the underlying relationship across space observed today. The fine-scale population structure depicted here is relevant to understanding complex processes of human population history and may provide insights for geographic patterning in rare variants and heritable disease risk.

Regional patterns of genetic diversity. a: scale bar for relative effective migration rate. Posterior effective migration surfaces for b: Western Eurasia (WEA), e: Central/Eastern Eurasia (CEA), g: Africa (AFR), h: Southern African hunter-gatherers (SAHG), and k: Southeast Asian (SEA) analysis panels. ‘X’ marks locations of samples noted as displaced or recently admixed, ‘H’ denotes Hunter-Gatherer populations (both ‘X’ and ‘H’ samples are omitted from the EEMS model fit); in panel g, red circles indicate Nilo-Saharan speakers and in panel h, ‘B’ denotes Bantu-speaking populations. Approximate locations of troughs are shown with dashed lines (see Extended Data Figure 4). PCA plots: c: WEA, d: Europeans in WEA, f: CEA, i: SAHG, j: AFR, l: SEA. Individuals are displayed as grey dots. Large dots reflect the median PC position for a sample, with colors reflecting geography matched to the corresponding EEMS figure. In the EEMS plots, approximate sample locations are annotated. For exact locations, see annotated Extended Data Figure 4 and Table S1. Features discussed in the main text and supplement are labeled. FST values per panel emphasize the low absolute levels of differentiation.

Among ‘effective migration surfaces’ (or potential past migration routes), the Pontic-Caspian steppe and its most direct connection with the Carpathian basin, the Danubian plains, appear, maybe paradoxically, as a constant ‘trough’ (below average migration rate) in all maps.

After all, we could have agreed that this region should a priori be thought of as the route of many migrations from the steppe and Asia into Central Europe (and thus of ‘effective migration’) in prehistoric, proto-historic and historic times, such as Suvorovo-Novodanilovka (Pre-Anatolian), Yamna (Late Indo-European), probably Srubna, Scythian-Cimmerian, Sarmatian, Huns, Goths, Avars, Slavs, Mongols…

It most likely (at least partially) represents a rather recent historical barrier to admixture, involving successive Byzantine, South Slavic, and Ottoman spheres of influence positioned against Balto-Slavic societies of Eastern Europe.

Locations of troughs in West Eurasia (below average migration rate in more than 95% of MCMC iterations) are given in brown. Sample locations and EEMS grid are displayed for the West Eurasian analysis panel. FST values are provided per panel to emphasize the low absolute levels of differentiation.

Featured image, from the article: “Large-scale patterns of population structure. a: EEMS posterior mean effective migration surface for Afro-Eurasia (AEA) panel. ‘X’ marks locations of samples excluded as displaced or recently admixed. ‘H’ marks locations of excluded hunter-gatherer populations. Regions and features discussed in the main text are labeled. Approximate locations of troughs are annotated with dashed lines (see Extended Data Figure 4). b: PCA plot of AEA panel: Individuals are displayed as grey dots, colored dots reflect median of sample locations; with colors reflecting geography and matching with the EEMS plot. Locations displayed in the EEMS plot reflect the position of populations after alignment to grid vertices used in the model (see methods).”

Images and text available under a CC-BY-NC-ND 4.0 International License.

Discovered via Razib Khan’s blog.


Effective migration in Western Eurasia reveals fine-scale migration surface features


Interesting poster from SMBE 2017, Maps of effective migration as a summary of global human genetic diversity, by Benjamin Peter, Desislava Petkova, Matthew Stephens & John Novembre, of the JNPopGen group of the University of Chicago.

You can read the full poster in the original PDF, or as a compressed image. The following are important excerpts:

Aim: To answer the following questions:

  • Which regions have high/low effective migration?
  • How well is human genetic diversity explained by this pure isolation-by-distance model?
  • How does the explanatory performance of EEMS compare to PCA?

Method: It uses the method proposed by Petkova et al. (2016) to fit a map of time-averaged (effective) migration rates to geographically referenced samples, and merges data from 24 different studies (8740 individuals from 469 populations) to assess human genetic diversity on global and continental scale.

  1. Basic workflow:
    • Merge data, remove duplicated & related individuals.
    • Remove Hunter-Gatherer and recently admixed populations. Their locations are still indicated with (H) and (X), respectively.
  2. EEMS analysis
    • Calculate genetic distance matrix between all individuals.
    • Fit migration map to data using EEMS MCMC algorithm.
  3. Comparison to PCA: Standard PCA using flashpca (Abraham & Inouye 2014) was used; the authors compare the correlation of the genetic distance induced from the first ten PCs with the fitted EEMS distance.
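Steps 2 and 3 above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the genotype matrix is random toy data, the dissimilarity is assumed to be the mean squared genotype difference, a plain NumPy SVD stands in for flashpca, and the EEMS MCMC fit itself is not reproduced, so the PCA-induced distance is compared here with the observed distance matrix rather than with the fitted EEMS distance. All function names are hypothetical.

```python
import numpy as np


def _pairwise_sq_dist(x):
    """Squared Euclidean distance between every pair of rows of x."""
    sq_norms = (x ** 2).sum(axis=1)
    d = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (x @ x.T)
    np.fill_diagonal(d, 0.0)
    return np.maximum(d, 0.0)  # clip tiny negative values from rounding


def genetic_distance_matrix(genotypes):
    """Assumed dissimilarity: mean squared genotype difference per SNP."""
    g = np.asarray(genotypes, dtype=float)
    return _pairwise_sq_dist(g) / g.shape[1]


def pca_induced_distance(genotypes, n_pcs=10):
    """Same dissimilarity, computed only from the first n_pcs PCs."""
    g = np.asarray(genotypes, dtype=float)
    centered = g - g.mean(axis=0)
    # PCA via SVD (a stand-in for flashpca, used here for simplicity).
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    k = min(n_pcs, len(s))
    coords = u[:, :k] * s[:k]  # individuals projected onto the PCs
    return _pairwise_sq_dist(coords) / g.shape[1]


def upper_triangle_correlation(a, b):
    """Pearson correlation between the off-diagonal entries of a and b."""
    iu = np.triu_indices_from(a, k=1)
    return float(np.corrcoef(a[iu], b[iu])[0, 1])


# Toy data: 20 individuals x 200 SNPs, genotypes coded 0/1/2.
rng = np.random.default_rng(0)
toy_genotypes = rng.integers(0, 3, size=(20, 200))

d_gen = genetic_distance_matrix(toy_genotypes)
d_pca = pca_induced_distance(toy_genotypes, n_pcs=10)
print("correlation (10 PCs):", upper_triangle_correlation(d_gen, d_pca))
```

With all PCs retained, the PCA-induced distance reproduces the full distance matrix exactly; truncating to ten PCs is what makes the correlation an informative measure of how much structure the leading components capture.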

Interpretation: A continuous habitat is approximated by a discrete grid (light gray). A Bayesian model is used to infer the most likely migration rates, which are given on a log scale compared to the average (BLUE = 100x higher, BROWN = 100x lower).
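To make the colour scale concrete: on a log10 scale, a rate 100x above the average sits at +2 and a rate 100x below it at -2. The following minimal sketch assumes that "average" means the mean of the log rates (i.e. the geometric mean of the rates); the function name and the toy rates are hypothetical.

```python
import math


def relative_log10_rates(rates):
    """log10 of each rate relative to the mean log10 rate (assumed average)."""
    log_rates = [math.log10(r) for r in rates]
    mean_log = sum(log_rates) / len(log_rates)
    return [lr - mean_log for lr in log_rates]


# Toy rates: 100x lower than, equal to, and 100x higher than the average.
toy_rates = [0.01, 1.0, 100.0]
relative = relative_log10_rates(toy_rates)
print(relative)  # approximately [-2, 0, 2]: brown, average, blue
```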

Map of effective migrations in Europe

Results (see maps):

  1. Global diversity patterns correlate with topographical features
  2. In Western Eurasia, EEMS reveals fine-scale migration surface features

Discussion: EEMS maps are an intuitive and direct way to visualize geographically referenced genetic data.

Dense sampling (Western Eurasian panel) in particular yields high resolution and accuracy, but the method works well both at a global scale (FST=0.06) and within Western Eurasia alone (FST=0.01).

EEMS maps are able to predict genetic differences reasonably well, but hunter-gatherer populations and admixed populations were a priori excluded.

Discovered via Eurogenes. Full image via Reddit.