The importance of fine-scale studies for integrating palaeogenomics and archaeology


Short review (behind paywall) The importance of fine-scale studies for integrating paleogenomics and archaeology, by Krishna R. Veeramah, Current Opinion in Genetics & Development (2018) 53:83-89.

Abstract (emphasis mine):

There has been an undercurrent of intellectual tension between geneticists studying human population history and archaeologists for almost 40 years. The rapid development of paleogenomics, with geneticists working on the very material discovered by archaeologists, appears to have recently heightened this tension. The relationship between these two fields thus far has largely been of a multidisciplinary nature, with archaeologists providing the raw materials for sequencing, as well as a scaffold of hypotheses based on interpretation of archaeological cultures from which the geneticists can ground their inferences from the genomic data. Much of this work has taken place in the context of western Eurasia, which is acting as testing ground for the interaction between the disciplines. Perhaps the major finding has not been any particular historical episode, but rather the apparent pervasiveness of migration events, some apparently of substantial scale, over the past ∼5000 years, challenging the prevailing view of archaeology that largely dismissed migration as a driving force of cultural change in the 1960s. However, while the genetic evidence for ‘migration’ is generally statistically sound, the description of these events as structured behaviours is lacking, which, coupled with often over simplistic archaeological definitions, prevents the use of this information by archaeologists for studying the social processes they are interested in. In order to integrate paleogenomics and archaeology in a truly interdisciplinary manner, it will be necessary to focus less on grand narratives over space and time, and instead integrate genomic data with other form of archaeological information at the level of individual communities to understand the internal social dynamics, which can then be connected amongst communities to model migration at a regional level. 
A smattering of recent studies have begun to follow this approach, resulting in inferences that are not only helping ask questions that are currently relevant to archaeologists, but also potentially opening up new avenues of research.

Interesting excerpts (emphasis mine, reference numbers removed for clarity):

There are two major, somewhat intertwined, problems that currently exist.

First, archaeologists are not critiquing whether the migrations identified by paleogenomics using sophisticated population genetic machinery are actually occurring. Instead, the technical criticism arrives in terms of how these migrations are being ascribed to specific cultures. In many paleogenomic papers, there is a tendency (and often an analytical and technical need) to associate samples with particular archaeological cultures, for which all samples are then treated as possessing some kind homogenous and pervasive social identity that is bound in space and time. The major critiques of this thus far have been directed to those studies examining Corded-Ware and Bell-Beaker-related individuals and their potential relationship to the Yamnaya [Vander Linden (2016), Heyd (2017), Furholt (2017)], but are applicable to many other ‘migration’ scenarios described in the recent literature. This is compounded by the use of sometimes small numbers of samples to represent certain cultures from a particular geographic area as representatives of the entire culture at a supra-regional level. Yet often these archaeological cultures such as Corded-Ware and Bell-Beaker themselves show considerable variability in space and time, and even within cemeteries, which is not factored into the genetic analysis.

From a population geneticists point of view, this kind of simplification is somewhat understandable and will often likely have very little impact on the final analysis, given that the primary goal is usually to use ancient samples to better understand modern genetic variation. Though there may be a specific historical interest in some of these past events, I would argue that the aim for most population geneticists at a higher level is to try and fit modern patterns of genetic variation using the simplest models possible that take into account past demographic events (for example fitting f-statistics using the ADMIXTUREGRAPH approach), as this is how we are trained. Although sharing an archaeological culture may not mean that a set of individuals are part of the same homogeneous social group in reality, this approach may be a good enough heuristic to find broad genetic connections compared to another group represented by a different culture, which can then ultimately help understand and model modern human population structure. However, for an archaeologists interested in the ancient individuals themselves and their social identity, this lumping is unsatisfactory, where sophisticated narratives of the individual migrants and their ancient communities are the intended goal.
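For readers unfamiliar with the f-statistics mentioned in the excerpt, here is a toy sketch of the f4 statistic, which is the average over SNPs of the product of two allele-frequency differences. The function name and the input layout (one list of allele frequencies per population) are my own assumptions for illustration. Real analyses rely on dedicated software and block-jackknife standard errors, which this sketch does not attempt.

```python
def f4(freq_a, freq_b, freq_c, freq_d):
    """f4(A, B; C, D): mean over SNPs of (a - b) * (c - d).

    Each argument is a list of allele frequencies (0..1), one per SNP,
    in the same SNP order for all four populations.
    """
    products = [
        (a - b) * (c - d)
        for a, b, c, d in zip(freq_a, freq_b, freq_c, freq_d)
    ]
    return sum(products) / len(products)


# Made-up frequencies: A and B are identical, so f4 is zero whatever C and D are.
print(f4([0.5, 0.5], [0.5, 0.5], [0.1, 0.9], [0.2, 0.3]))
```

A value near zero is what one expects when the pair (A, B) and the pair (C, D) share no drift, which is why the statistic is used to test simple tree-like models before fitting more complex ones.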

From the paper. Barplot showing cumulative number of ancient Eurasian genomes published on a yearly basis up to 8th July 2018. Includes samples undergoing both whole genome shotgun and SNP capture sequencing.

The second related problem is that ‘migration’ in the sense used currently in the paleogenomics literature lacks sufficient detail to be of much use for an archaeologists attempting to disentangle the complex social dynamics within and between communities. To truly understand the role of migration as a social process and its contribution towards cultural changes, it is necessary to describe it as a structured behaviour, rather than treating it as an explanatory ‘black box’. Are the migrations occurring as a result of short range waves-of-advance movements, or as long-distance movements via leapfrogging models or stream migrations along established routes dependent on key kinship networks. Are there return migrants, and are some subset of individuals more predisposed to migration driving the signals? Although such models were implemented in past studies (even with classical markers [1]) and are part of the population genetics literature, they are lacking in the current paleogenomics literature when discussing migration. The finding that there is an increase of 12.3% of ancestry type X in population A compared to the preceding population B that is suggestive of a migration, is not particularly useful for examining these kind of models. It is also unclear to what degree standard population genetic parameters estimated from genomic data such as effective population size, Ne, and gene flow are relevant to models studied in archaeology, given they reflect (somewhat undefined) long-term population sizes and average rates of movements over time, rather than reflecting any kind of reality of census size and mobility in the ancient communities the archaeologists are actually attempting to study.

The text goes on to talk about ways of studying fine-grained social dynamics of local cultures, such as:

define levels of genetic relatedness, but also in terms of material culture, age, sex, stress and activity indicators, stable isotopes for diet reconstruction (nitrogen, d13C and d15N, carbon, 13C/12C) and strontium and oxygen isotopes for mobility (87Sr/86Sr, d18O). Where possible, sites should be examined over multiple generations. In addition it will be incredibly useful to characterize the impact of disease in these communities, which is also proving to be a highly fruitful realm for paleogenomics.

I would say that the main problem lies not in the obvious limitations of palaeogenomics in terms of identifying prehistoric ethnolinguistic communities and their evolution, which is why it is just another tool to complement archaeology and linguistics. The main problem is the narrow understanding that some people have of the inherent limitations of palaeogenomics (especially when it suits their interests) when publicizing simplistic conclusions based on these tools and their results. And I am not referring only to amateurs.


Human dietary evolution in central Germany, and relationship of Únětice to Corded Ware and Bell Beaker cultures


Open access 4000 years of human dietary evolution in central Germany, from the first farmers to the first elites, by Münster et al. PLOS One (2018).

Excerpts (emphasis mine):

This study of human diet between the early stages of the farming lifestyle and the Early Bronze Age in the MES, based on carbon and nitrogen isotope analyses, is amongst the most comprehensive of its kind. Our results show that human dietary behaviour has changed significantly throughout the study period. A distinct increase in the proportion of animal protein in the human diet can be identified over time, a trend which only the people from the BBC did not follow. The results of the stable isotope analyses are consistent with epidemiological data on caries frequency, which indicate the highest proportions of carbohydrates in the human diet in the EN and the lowest in the EBA [19]. These findings may have been due to an increased consumption of either meat or dairy products. Although meat and dairy consumption cannot be distinguished by means of stable isotope data or caries frequency, molecular-genetic analyses of lactase persistence argue against an increased consumption of fresh milk [9]. However, although approximately 70% of the world population has a lactose intolerance, most of them can tolerate dairy foods or lactose-containing foods without developing symptoms [128]. It therefore comes as no surprise that the use of processed milk, i.e. dairy products, appears to have set in early on in the Neolithic period [99]. Unarguably, there was an increasing stabilisation of the supply of meat and secondary animal products throughout the Neolithic. The data dynamics overall argue against an equal availability of animal-derived protein to all sections of the various populations, which attests to early processes of specialisation, individualisation and hierarchisation. Moreover, population-genetic processes are also reflected in the development of human dietary habits. From the 4th millennium BC onwards, groups moved into the MES from the north, sometimes accompanied by violence [6,29], and fundamental demographic changes took place in the FN with the arrival of CWC groups from the north-eastern steppes and the BBC from south-western Europe [6,7]. This former pastoral steppe component, in particular, may have been responsible for the fact that animal-based foodstuffs reached their highest importance in the FN and EBA. Differences in the consumption of animal-derived products between the sexes resulted in significantly lower δ15N values and less access to animal protein in females. Besides behavioural choices as to what food to consume, numerous other nutritional and gender-specific factors must certainly be taken into account when assessing the subsistence and nutritional balance of individuals. In the future, analysis of single amino acids of nitrogen and the compound-specific carbon isotope analysis of lipids and bone mineral may help providing more detailed and nuanced insight on aspects of human diet, such as protein sources in complex foodwebs, nutritional stress and disease [129–131]. They should become a standard in isotope studies and applied more often and routinely.

Overview of investigated sites and archaeological chronology of Neolithic and Early Bronze Age central Germany. The Stroke Ornamented Culture and Michelsberg Culture are not represented in our sample due to low rate of anthropological findings. Chronology after Schwarz in [29].

Regarding specifically the differences between Corded Ware (CWC) and Bell Beaker (BBC) cultures in Saxony-Anhalt, a region already known to show a resurgence of the previous population after the Únětice period:

Based on isotope data from collagen [104], a diet with a high protein content from meat or dairy products has been postulated for CWC groups from south-western Germany, though researchers there were also unable to distinguish between the two sources of protein. The consumption of fresh milk and the consumption of dairy products such as cheese, yoghurt and kefir may also be erroneously dated to the same period and associated with lactase persistence. A newly reported genome-wide SNP dataset from 230 West Eurasians dating from between 6,500 and 300 cal. BC [9] has shown, like earlier studies [105], that no notable increase in lactase persistence in Europe appears to have occurred prior to 2,000 BC. It was and is a fact that milk is not a natural foodstuff for adult consumption, unless one is prepared to negate the numerous symptoms of lactose intolerance, including abdominal pain, bloating, flatulence, diarrhoea, asthma and others. Cultural evolution in conjunction with natural selection has made it possible for us to use milk and its secondary products as a source of protein and energy. Whilst the continuous increase in animal protein in the diet of the Neolithic populations of the MES from the LBK to the Early Bronze Age can undoubtedly partly be traced back to an intensified use of secondary animal products over the course of the Neolithic, it is difficult to estimate how great a contribution this made to the increase in δ15N values. Judging from molecular-genetic data on lactase persistence, however, the consumption of fresh milk, at least, appears to have first begun to have an impact on the protein balance of individuals around 4,000 years ago [9].

NOTE. Regarding lactase persistence, we now know that Ukraine_Eneolithic sample I6561, of haplogroup R1a-Z93 (hence probably related to the later expansion of the Corded Ware culture) is the nearest sample to the population that might have expanded the 13910*T lactase persistence allele in Northern Europe.

Sex-specific differences in stable carbon and nitrogen isotope values in humans.

[After the massive influx of the CWC into central Europe in the FN] The dietary profile once again exhibits an increase in the mean δ15N values, to 10.1 ± 1.0 ‰. The BBC, which spread somewhat later throughout north and central Europe (with the arrival of the CWC jointly making up Event C) and whose origins are presumed to have been in south-western Europe, constitutes an exception, not just from the point of view of genetics. In contrast to the general diachronic trend consisting of raised δ15N values in the cultural groups examined, the BBC exhibited a nutritional decrease in mean δ15N values to 9.7 ± 0.7 ‰. The divergence between the CWC and the BBC to be seen in their funerary rites, despite their chronological and sometimes also territorial coexistence, is thus also visible in their dietary habits. Comparative examinations of CWC sites in southern Germany have shown that their mean δ15N values were, in fact, comparable to those of the CWC in the MES (δ13C: -19.9 ± 0.6 ‰, δ15N: 10.8 ± 0.7 ‰, n = 32), despite exhibiting significant variation between and even within the sites, thus pointing to the diverging subsistence strategies of different communities [104]. The UC, which followed the CWC in the MES, bore close affinities to its forerunner in terms of its population genetics, thus supporting the hypothesis that the BBC only had a minimal genetic impact on the UC [6,7]. The close genetic links between the UC and the CWC, however, are also seen in very similar mean nitrogen values which, at 10.4 ± 0.7 ‰, were the highest in the overall sample. Moreover, a striking aspect in the evaluation of the mean δ15N values over time is a clear tendency towards rising standard deviations (S4 Fig). It is highly likely that this reflects increased social differentiation in society at the end of the Neolithic and in the Early Bronze Age. 
Socioeconomic advancement led to differences in status within communities and even to the formation of an elite, the differences applying to numerous facets of life, including dietary habits [60].

Chronological development of the distribution of δ15N-values according to the different archaeological periods. Numbers of individuals are displayed in parentheses.

I think the overstudied region of Saxony-Anhalt and the Tollense valley region may not be exactly where the Proto-Balto-Slavic homeland actually formed, but they are certainly showing interesting hints as to how (and approximately where) it might have happened…


Patterns of genetic differentiation and the footprints of historical migrations in the Iberian Peninsula

Open access preprint (which I announced already) at bioRxiv Patterns of genetic differentiation and the footprints of historical migrations in the Iberian Peninsula, by Bycroft et al. (2018).

Abstract (emphasis mine):

Genetic differences within or between human populations (population structure) has been studied using a variety of approaches over many years. Recently there has been an increasing focus on studying genetic differentiation at fine geographic scales, such as within countries. Identifying such structure allows the study of recent population history, and identifies the potential for confounding in association studies, particularly when testing rare, often recently arisen variants. The Iberian Peninsula is linguistically diverse, has a complex demographic history, and is unique among European regions in having a centuries-long period of Muslim rule. Previous genetic studies of Spain have examined either a small fraction of the genome or only a few Spanish regions. Thus, the overall pattern of fine-scale population structure within Spain remains uncharacterised. Here we analyse genome-wide genotyping array data for 1,413 Spanish individuals sampled from all regions of Spain. We identify extensive fine-scale structure, down to unprecedented scales, smaller than 10 Km in some places. We observe a major axis of genetic differentiation that runs from east to west of the peninsula. In contrast, we observe remarkable genetic similarity in the north-south direction, and evidence of historical north-south population movement. Finally, without making particular prior assumptions about source populations, we show that modern Spanish people have regionally varying fractions of ancestry from a group most similar to modern north Moroccans. The north African ancestry results from an admixture event, which we date to 860 – 1120 CE, corresponding to the early half of Muslim rule. Our results indicate that it is possible to discern clear genetic impacts of the Muslim conquest and population movements associated with the subsequent Reconquista.

“(a) Binary tree showing the inferred hierarchical relationships between clusters. The colours and points correspond to each cluster as shown on the map, and the length of the coloured rectangles is proportional to the number of individuals assigned to that cluster. We combined some small clusters (Methods) and the thick black branches indicate the clades of the tree that we visualise in the map. We have labeled clusters according to the approximate location of most of their members, but geographic data was not used in the inference. (b) Each individual is represented by a point placed at (or close to) the centroid of their grandparents’ birthplaces. On this map we only show the individuals for whom all four grandparents were born within 80km of their average birthplace, although the data for all individuals were used in the fineSTRUCTURE inference. The background is coloured according to the spatial densities of each cluster at the level of the tree where there are 14 clusters (see Methods). The colour and symbol of each point corresponds to the cluster the individual was assigned to at a lower level of the tree, as shown in (a). The labels and boundaries of Spain’s Autonomous Communities are also shown.”
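The placement rule described in panel (b) of the caption can be sketched as follows. This is only an illustration under my own assumptions: the preprint does not say here how the centroid or the distances were computed, so I use a plain average of coordinates and the haversine great-circle distance, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def centroid(points):
    """Plain average of latitudes and longitudes (an assumption, see above)."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def plot_position(grandparents, max_km=80.0):
    """Return the centroid of the birthplaces if every grandparent was born
    within max_km of it; otherwise return None (individual not shown)."""
    c = centroid(grandparents)
    if all(haversine_km(c, g) <= max_km for g in grandparents):
        return c
    return None


# Three grandparents from one place and one from far away: not plotted.
mixed = [(42.88, -8.54)] * 3 + [(37.39, -5.98)]
print(plot_position(mixed))
```

Note that, as the caption says, the filter only affects which individuals are drawn on the map; all individuals still enter the clustering itself.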

Some interesting excerpts:

Our results further imply that north west African-like DNA predominated in the migration. Moreover, admixture mainly, and perhaps almost exclusively, occurred within the earlier half of the period of Muslim rule. Within Spain, north African ancestry occurs in all groups, although levels are low in the Basque region and in a region corresponding closely to the 14th-century ‘Crown of Aragon’. Therefore, although genetically distinct this implies that the Basques have not been completely isolated from the rest of Spain over the past 1300 years.

NOTE. I must add here that the Expulsion of the Moriscos is known to have been quite successful in the old Crown of Aragon, deeply affecting its economy, in contrast with the territories of the Crown of Castile, where they either formed less sizeable communities, or were dispersed and eventually Christianized and integrated with local communities. For example, thousands of Moriscos from Granada were dispersed following the War of the Alpujarras (1568–1571) into different regions of the Crown of Castile, and many could not be later expelled due to the locals’ resistance to follow the expulsion edict.

Perhaps surprisingly, north African ancestry does not reflect proximity to north Africa, or even regions under more extended Muslim control. The highest amounts of north African ancestry found within Iberia are in the west (11%) including in Galicia, despite the fact that the region of Galicia as it is defined today (north of the Miño river), was never under Muslim rule and Berber settlements north of the Douro river were abandoned by. This observation is consistent with previous work using Y-chromosome data. We speculate that the pattern we see is driven by later internal migratory flows, such as between Portugal and Galicia, and this would also explain why Galicia and Portugal show indistinguishable ancestry sharing with non-Spanish groups more generally. Alternatively, it might be that these patterns reflect regional differences in patterns of settlement and integration with local peoples of north African immigrants themselves, or varying extents of the large-scale expulsion of Muslim people, which occurred post-Reconquista and especially in towns and cities.

We estimated ancestry profiles for each point on a fine spatial grid across Spain (Methods). Gray crosses show the locations of sampled individuals used in the estimation. Map shows the fraction contributed from the donor group ‘NorthMorocco’.

Overall, the pattern of genetic differentiation we observe in Spain reflects the linguistic and geopolitical boundaries present around the end of the time of Muslim rule in Spain, suggesting this period has had a significant and long-term impact on the genetic structure observed in modern Spain, over 500 years later. In the case of the UK, similar geopolitical correspondence was seen, but to a different period in the past (around 600 CE). Noticeably, in these two cases, country-specific historical events rather than geographic barriers seem to drive overall patterns of population structure. The observation that fine-scale structure evolves at different rates in different places could be explained if observed patterns tend to reflect those at the ends of periods of significant past upheaval, such as the end of Muslim rule in Spain, and the end of the Anglo-Saxon and Danish Viking invasions in the UK.

Certain people want to believe (well into the 21st century) in ideal ancestral populations and ancient ethnolinguistic identifications linked to their own (or their own country’s dominant) ancestral components and Y-DNA haplogroup.

We are nevertheless seeing how it is mainly the most recent relevant geopolitical events and late internal migratory flows that have shaped the genetic structure (including Y-DNA haplogroup composition) of modern regions and countries, regardless of their populations’ actual language or ethnic identification, whether (pre)historical or modern.

Another surprise for many, I guess.


We are all special, which also means that none of us is


Adam Rutherford writes You’re Descended from Royalty and So Is Everybody Else – Anybody you can name from ancient history is in your family tree, which I discovered via John Hawks’ new post The surprising connectedness of human genealogies over centuries.


One way to think of it is to accept that everyone of European descent should have billions of ancestors at a time in the 10th century, but there weren’t billions of people around then, so try to cram them into the number of people that actually were. The math that falls out of that apparent impasse is that all of the billions of lines of ancestry have coalesced into not just a small number of people, but effectively literally everyone who was alive at that time. So, by inference, if Charlemagne was alive in the ninth century, which we know he was, and he left descendants who are alive today, which we also know is true, then he is the ancestor of everyone of European descent alive in Europe today.
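The arithmetic behind that apparent impasse can be sketched in a few lines. This is a back-of-the-envelope illustration only: the generation length (30 years) and the population figure (30 million) are my own rough assumptions, not numbers taken from Rutherford's text.

```python
def ancestor_slots(generations: int) -> int:
    """Number of positions in a family tree n generations back (two parents each)."""
    return 2 ** generations


def generations_until_overflow(population: int) -> int:
    """Smallest number of generations at which tree slots exceed the population."""
    generations = 0
    while ancestor_slots(generations) <= population:
        generations += 1
    return generations


# Assumed values, for illustration only (not from the quoted text).
YEARS_PER_GENERATION = 30
ASSUMED_POPULATION = 30_000_000

g = generations_until_overflow(ASSUMED_POPULATION)
print(g, "generations, about", g * YEARS_PER_GENERATION, "years back")
```

Once the slots in the tree outnumber the people who were alive, the same individuals must fill many slots at once, which is why the lines of ancestry coalesce.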

Since most of this blog’s posts support academic disciplines looking for answers to the Indo-European question, and constantly give reasons against modern genetic (and phylogenetic) identification, I think it is worth at least a quick read for anyone interested in the field.

I recently referred to the interesting series of posts by Graham Coop on this matter.

Featured image: Europe around 800 – the map is public domain from the Historical Atlas (New York, 1911)


Genetic landscapes showing human genetic diversity aligning with geography


New preprint at bioRxiv, Genetic landscapes reveal how human genetic diversity aligns with geography, by Peter, Petkova, and Novembre (2017).


Summarizing spatial patterns in human genetic diversity to understand population history has been a persistent goal for human geneticists. Here, we use a recently developed spatially explicit method to estimate “effective migration” surfaces to visualize how human genetic diversity is geographically structured (the EEMS method). The resulting surfaces are “rugged”, which indicates the relationship between genetic and geographic distance is heterogenous and distorted as a rule. Most prominently, topographic and marine features regularly align with increased genetic differentiation (e.g. the Sahara desert, Mediterranean Sea or Himalaya at large scales; the Adriatic, inter-island straits in near Oceania at smaller scales). We also see traces of historical migrations and boundaries of language families. These results provide visualizations of human genetic diversity that reveal local patterns of differentiation in detail and emphasize that while genetic similarity generally decays with geographic distance, there have regularly been factors that subtly distort the underlying relationship across space observed today. The fine-scale population structure depicted here is relevant to understanding complex processes of human population history and may provide insights for geographic patterning in rare variants and heritable disease risk.

“Regional patterns of genetic diversity. a: scale bar for relative effective migration rate. Posterior effective migration surfaces for b: Western Eurasia (WEA) e: Central/Eastern Eurasia (CEA) g: Africa (AFR) h: Southern African hunter-gatherers (SAHG) and k: Southeast Asian (SEA) analysis panels. ‘X’ marks locations of samples noted as displaced or recently admixed, ‘H’ denotes Hunter-Gatherer populations (both ‘X’ and ‘H’ samples are omitted from the EEMS model fit); in panel g, red circles indicate Nilo-Saharan speakers and in panel h, ‘B’ denotes Bantu-speaking populations. Approximate location of troughs are shown with dashed lines (see Extended Data Figure 4). PCA plots: c: WEA d: Europeans in WEA f: CEA i: SAHG j: AFR l: SEA. Individuals are displayed as grey dots. Large dots reflect median PC position for a sample; with colors reflecting geography matched to the corresponding EEMS figure. In the EEMS plots, approximate sample locations are annotated. For exact locations, see annotated Extended Data Figure 4 and Table S1. Features discussed in the main text and supplement are labeled. FST values per panel emphasize the low absolute levels of differentiation.”

Among ‘effective migration surfaces’ (or potential past migration routes), the Pontic-Caspian steppe and its most direct connection with the Carpathian basin, the Danubian plains, appear, maybe paradoxically, as a constant ‘trough’ (below average migration rate) in all maps.

After all, we could have agreed that this region should a priori be thought of as the route of many migrations from the steppe and Asia into Central Europe (and thus of ‘effective migration’) in prehistoric, proto-historic and historic times, such as Suvorovo-Novodanilovka (Pre-Anatolian), Yamna (Late Indo-European), probably Srubna, Scythian-Cimmerian, Sarmatian, Huns, Goths, Avars, Slavs, Mongols.

It most likely (at least partially) represents a rather recent historical barrier to admixture, involving successive Byzantine, South Slavic, and Ottoman spheres of influence positioned against Balto-Slavic societies of Eastern Europe.

Location of troughs in West Eurasia (below average migration rate in more than 95% of MCMC iterations) are given in brown. Sample locations and EEMS grid are displayed for the West Eurasian analysis panel. FST values are provided per panel to emphasize the low absolute levels of differentiation.
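The trough criterion in this caption (below average migration rate in more than 95% of MCMC iterations) can be sketched as a simple count over posterior samples. The input layout is my own assumption for illustration: one list per MCMC iteration, holding one log migration rate per grid cell, already expressed relative to the overall mean. This is not the EEMS output format or API, just the logic of the rule.

```python
def trough_cells(log_rate_samples, threshold=0.95):
    """Indices of grid cells whose relative log migration rate is below
    average (< 0) in more than `threshold` of the posterior samples.

    log_rate_samples: list of MCMC iterations, each a list with one
    log migration rate per cell, relative to the overall mean (assumed layout).
    """
    n_iter = len(log_rate_samples)
    n_cells = len(log_rate_samples[0])
    troughs = []
    for cell in range(n_cells):
        below = sum(1 for iteration in log_rate_samples if iteration[cell] < 0)
        if below / n_iter > threshold:
            troughs.append(cell)
    return troughs


# Made-up posterior samples for three cells over twenty iterations.
samples = [[-1.0, 0.5, -0.2]] * 18 + [[-0.5, 0.3, 0.4]] * 2
print(trough_cells(samples))
```

Under this rule a cell that is below average in, say, 90% of iterations is not drawn as a trough, so the brown areas on the map are deliberately conservative.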

Featured image, from the article: “Large-scale patterns of population structure. a: EEMS posterior mean effective migration surface for Afro-Eurasia (AEA) panel. ‘X’ marks locations of samples excluded as displaced or recently admixed. ‘H’ marks locations of excluded hunter-gatherer populations. Regions and features discussed in the main text are labeled. Approximate locations of troughs are annotated with dashed lines (see Extended Data Figure 4). b: PCA plot of AEA panel: Individuals are displayed as grey dots, colored dots reflect median of sample locations; with colors reflecting geography and matching with the EEMS plot. Locations displayed in the EEMS plot reflect the position of populations after alignment to grid vertices used in the model (see methods).”

Images and text available under a CC-BY-NC-ND 4.0 International License.

Discovered via Razib Khan’s blog.


Islands across the Indonesian archipelago show complex patterns of admixture


An open access article Complex patterns of admixture across the Indonesian archipelago, by Hudjashov et al. (2017), has appeared in Molecular Biology and Evolution, and clarifies further the Austronesian (AN) expansion.


Indonesia, an island nation as large as continental Europe, hosts a sizeable proportion of global human diversity, yet remains surprisingly under-characterized genetically. Here, we substantially expand on existing studies by reporting genome-scale data for nearly 500 individuals from 25 populations in Island Southeast Asia, New Guinea and Oceania, notably including previously unsampled islands across the Indonesian archipelago. We use high-resolution analyses of haplotype diversity to reveal fine detail of regional admixture patterns, with a particular focus on the Holocene. We find that recent population history within Indonesia is complex, and that populations from the Philippines made important genetic contributions in the early phases of the Austronesian expansion. Different, but interrelated processes, acted in the east and west. The Austronesian migration took several centuries to spread across the eastern part of the archipelago, where genetic admixture postdates the archeological signal. As with the Neolithic expansion further east in Oceania and in Europe, genetic mixing with local inhabitants in eastern Indonesia lagged behind the arrival of farming populations. In contrast, western Indonesia has a more complicated admixture history shaped by interactions with mainland Asian and Austronesian newcomers, which for some populations occurred more than once. Another layer of complexity in the west was introduced by genetic contact with maritime travelers from South Asia and strong demographic events in isolated local groups.

Among its results (emphasis is mine):

Most eastern Indonesian populations show traces of admixture that appear to reflect an expansion of AN speakers (Figure 4B, S3). There is a striking similarity between inferred events – each admixed population includes both a Philippine non-Kankanaey and western Indonesian-like source likely representing Holocene movements of Asian farming groups, as well as a Papuan-like source representing local indigenous ancestry. One reason for the lack of clear Taiwanese sources may be because the aboriginal populations of Taiwan were heavily affected by post-AN movements from mainland East Asia, most recently sinicization by Han Chinese, and thus no longer depict the ancestral AN gene pool (Mörseburg, et al. 2016). However, this notable pattern could equally be explained by the dominance of language and culture transfers during early phases of the Neolithic expansion from Taiwan into the Philippines, followed by people with predominantly Philippine ancestry driving later demic diffusion into the Indonesian archipelago. Interestingly, Mörseburg, et al. (2016), by using a different sample set and genotype-based analytical toolkit, indicated that the Kankanaey ethnic group from the Philippines is likely the closest living proxy of the source population that gave rise to the AN expansion. We did not detect this population among sources of admixture in eastern Indonesia, and therefore suggest that the place of individual Philippine groups in the AN expansion needs to be further addressed by better sampling in the Philippine archipelago.

Sumba and Flores, the two westernmost islands to the east of Wallace’s line, display a high proportion of Java and Bali surrogates in their AN admixing source. This suggests that the AN movement into eastern Indonesia, especially for Sumba and Flores, had earlier experienced some degree of genetic contact with western Indonesian groups. In contrast, the sources of AN admixture in Lembata, Alor, Pantar and Timor are dominated by Sulawesi (Figure 4B, S3, Table S3, S5). This generally agrees with expectations from the geography of the region, whereby AN groups exiting the southern Philippines were likely funneled into at least two streams, including a western path through Borneo and a central path through Sulawesi (Blust 2014).

Point estimates of genetic admixture times in eastern Indonesia lie within a narrow timeframe ranging between ca 185 BCE to 360 CE or 75 to 56 generations ago (95% CI 510 BCE – 475 CE or 87–52 generations) (Figure 4B, Table S3). These inferred dates are younger than some previous estimates (120–200 generations ago) (Xu, et al. 2012; Sanderson, et al. 2015; Sedghifar, et al. 2015). A major analysis of admixture in Indonesia estimated the date of AN contact in the eastern part of archipelago to be around 500 to 600 CE (ca 50 generations, CI estimates between 58–42 generations ago) (Lipson, et al. 2014), surprisingly young given the archaeological evidence. However, the study pooled a very small sample of genetically heterogeneous eastern Indonesian islands including, for example, Flores and Alor. As we show here (Figure 2, 4, 5, S3, Table S3, S5, S6), while the wave of AN speakers left a common genetic trace across the whole of eastern Indonesia, the details and dates of this contact vary considerably not only between islands (e.g., Flores and Alor), but also within individual islands (e.g., Flores Rampasasa vs. Flores Bama). The genetic dates, which were obtained here by denser geographical sampling of 8 eastern islands, a much larger number of individuals (28 per island on average) and a greater number of SNPs, are up to 30 generations older, predating the Common Era in many cases.

It therefore took migrants at least half a millennium to proceed from islands around Wallace’s line to the easternmost sampled part of eastern Indonesia. Nevertheless, observed dates for AN contact in eastern Indonesia are still approximately a millennium younger than the earliest Neolithic archaeological evidence in the region, and two explanations seem most likely here. First, the AN migration may have involved several waves of people leaving Taiwan, spanning multiple generations, which would bias date estimates later than the first arrival of the Neolithic archeological assemblage (Sedghifar, et al. 2015). Second, there may have been a substantial time gap between the spread of culture and technological traditions, and the beginning of extensive genetic contact between incoming farming groups and native inhabitants in Indonesia (Lansing, et al. 2011). The lack of considerable admixture with Papuan groups was recently noted in ancient Lapita individuals from Remote Oceania, whose genomes are mostly Asian and carry little to no Papuan ancestry, suggesting limited contact as they moved through Melanesia to previously uninhabited islands in the Pacific (Skoglund, et al. 2016). A lag in admixture between local and incoming Neolithic groups has also been observed in Europe, where hunter-gatherer and farming populations initially co-existed for nearly a thousand years without substantial genetic interaction (Malmström, et al. 2015).
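The generation counts quoted above can be turned into approximate calendar dates with simple arithmetic. The following is a minimal sketch, not the authors' procedure: the generation time (29 years) and the reference year (2000) are illustrative assumptions that the excerpt does not state, and the function names are hypothetical.

```python
def generations_to_year(generations, years_per_generation=29.0, reference_year=2000):
    """Convert 'generations ago' to an astronomical year.

    years_per_generation and reference_year are assumed values,
    not taken from the article.
    """
    return reference_year - generations * years_per_generation


def format_year(year):
    """Format an astronomical year as BCE/CE (there is no year 0: 0 means 1 BCE)."""
    y = round(year)
    return f"{y} CE" if y >= 1 else f"{1 - y} BCE"


if __name__ == "__main__":
    # Point estimates and CI bounds quoted in the excerpt, in generations ago.
    for gens in (87, 75, 56, 52):
        print(gens, "generations ago is roughly", format_year(generations_to_year(gens)))
```

Changing the assumed generation time by a single year shifts dates of this age by more than half a century, so the calendar dates are only as precise as that assumption.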

Ancestral genomic components in regional populations. For every K, the modal solution with the highest number of ADMIXTURE runs is shown; individual ancestry proportions were averaged across all runs from the same mode and the number of runs (out of 50) assigned to the presented solution is shown in parentheses. Average cross validation statistics were calculated across all runs from the same mode (insert). The minimum cross-validation score is observed at K=9. Note major ancestry components in Indonesia and ISEA – Papuan (light purple), mainland Asian (light yellow) and AN (light blue) – as well as major differences in the distribution of these three ancestries between eastern and western Indonesia. Populations from the Philippines and Flores are abbreviated as ‘Ph.’ and ‘Fl.’, respectively.

Featured images are taken from the article.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited.


Effective migration in Western Eurasia reveals fine-scale migration surface features


Interesting poster from SMBE 2017, Maps of effective migration as a summary of global human genetic diversity, by Benjamin Peter, Desislava Petkova, Matthew Stephens & John Novembre, of the JNPopGen group of the University of Chicago.

You can read the full poster in the original PDF, or as a compressed image. The following are important excerpts:

Aim: To answer the following questions:

  • Which regions have high/low effective migration?
  • How well is human genetic diversity explained by this pure isolation-by-distance model?
  • How does the explanatory performance of EEMS compare to PCA?

Method: The poster uses the method proposed by Petkova et al. (2016) to fit a map of time-averaged (effective) migration rates to geographically referenced samples, and merges data from 24 different studies (8740 individuals from 469 populations) to assess human genetic diversity on global and continental scales.

  1. Basic workflow:
    • Merge data, remove duplicated & related individuals.
    • Remove hunter-gatherer and recently admixed populations. Their locations are still indicated with (H) and (X), respectively.
  2. EEMS analysis
    • Calculate genetic distance matrix between all individuals.
    • Fit migration map to data using EEMS MCMC algorithm.
  3. Comparison to PCA: Standard PCA was run using flashpca (Abraham & Inouye 2014). The authors compare the correlation of the genetic distance induced by the first ten PCs with the fitted EEMS distance.
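The distance-matrix and PCA-comparison steps above can be sketched with NumPy alone. This is an illustration under stated assumptions, not the poster's pipeline: it uses random toy genotypes, a plain SVD in place of flashpca, and it correlates PC-induced distances with the observed genetic distances (the fitted EEMS distances would require running EEMS itself). All function names are hypothetical.

```python
import numpy as np


def pairwise_sq_dist(X):
    """Squared Euclidean distance between all pairs of rows of X."""
    sq = np.sum(X ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.maximum(D, 0.0)  # clip tiny negatives caused by rounding


def pca_scores(G, n_pcs):
    """Project individuals onto the first n_pcs principal components via SVD."""
    Gc = G - G.mean(axis=0)  # center each SNP
    U, S, _ = np.linalg.svd(Gc, full_matrices=False)
    return U[:, :n_pcs] * S[:n_pcs]


def upper_triangle_correlation(D1, D2):
    """Pearson correlation between the off-diagonal entries of two distance matrices."""
    iu = np.triu_indices_from(D1, k=1)
    return np.corrcoef(D1[iu], D2[iu])[0, 1]


# Toy data (assumption): 40 individuals x 500 SNPs coded as 0/1/2 allele counts.
rng = np.random.default_rng(0)
G = rng.integers(0, 3, size=(40, 500)).astype(float)

n_snps = G.shape[1]
D_gen = pairwise_sq_dist(G) / n_snps                  # observed genetic distance
D_pca = pairwise_sq_dist(pca_scores(G, 10)) / n_snps  # distance induced by 10 PCs
r = upper_triangle_correlation(D_gen, D_pca)
print(f"correlation between genetic and 10-PC distances: {r:.3f}")
```

With all components retained, the PC-induced distances reproduce the genetic distances exactly, so the correlation measures how much pairwise structure the leading components capture.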

Interpretation: A continuous habitat is approximated by a discrete grid (light gray). A Bayesian model is used to infer the most likely migration rates, which are given on a log scale relative to the average (blue = 100x higher, brown = 100x lower).
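To make the colour scale concrete, here is a small sketch that converts a value on the log scale into a fold change relative to the average rate. It assumes the scale is base 10, which is consistent with the 100x extremes quoted above but is not stated explicitly in the excerpt. The function name is hypothetical.

```python
import math


def fold_change(log10_relative_rate):
    """Fold change relative to the average migration rate (base-10 log assumed)."""
    return 10.0 ** log10_relative_rate


# +2 would be the blue extreme (100x higher), -2 the brown extreme (100x lower).
for value in (-2, -1, 0, 1, 2):
    print(value, fold_change(value))
```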

Map of effective migrations in Europe

Results (see maps):

  1. Global diversity patterns correlate with topographical features
  2. In Western Eurasia, EEMS reveals fine-scale migration surface features

Discussion: EEMS maps are an intuitive and direct way to visualize geographically referenced genetic data.

Dense sampling (Western Eurasian panel) in particular yields high resolution and accuracy, but the method works well both at a global scale (FST=0.06) and within Western Eurasia alone (FST=0.01).

EEMS maps predict genetic differences reasonably well, but hunter-gatherer populations and admixed populations were excluded a priori.

Discovered via Eurogenes. Full image via Reddit.