Islands across the Indonesian archipelago show complex patterns of admixture


The open access article Complex patterns of admixture across the Indonesian archipelago, by Hudjashov et al. (2017), has appeared in Molecular Biology and Evolution, and further clarifies the Austronesian (AN) expansion.


Indonesia, an island nation as large as continental Europe, hosts a sizeable proportion of global human diversity, yet remains surprisingly under-characterized genetically. Here, we substantially expand on existing studies by reporting genome-scale data for nearly 500 individuals from 25 populations in Island Southeast Asia, New Guinea and Oceania, notably including previously unsampled islands across the Indonesian archipelago. We use high-resolution analyses of haplotype diversity to reveal fine detail of regional admixture patterns, with a particular focus on the Holocene. We find that recent population history within Indonesia is complex, and that populations from the Philippines made important genetic contributions in the early phases of the Austronesian expansion. Different, but interrelated processes, acted in the east and west. The Austronesian migration took several centuries to spread across the eastern part of the archipelago, where genetic admixture postdates the archeological signal. As with the Neolithic expansion further east in Oceania and in Europe, genetic mixing with local inhabitants in eastern Indonesia lagged behind the arrival of farming populations. In contrast, western Indonesia has a more complicated admixture history shaped by interactions with mainland Asian and Austronesian newcomers, which for some populations occurred more than once. Another layer of complexity in the west was introduced by genetic contact with maritime travelers from South Asia and strong demographic events in isolated local groups.

Among its results (emphasis is mine):

Most eastern Indonesian populations show traces of admixture that appear to reflect an expansion of AN speakers (Figure 4B, S3). There is a striking similarity between inferred events – each admixed population includes both a Philippine non-Kankanaey and western Indonesian-like source likely representing Holocene movements of Asian farming groups, as well as a Papuan-like source representing local indigenous ancestry. One reason for the lack of clear Taiwanese sources may be because the aboriginal populations of Taiwan were heavily affected by post-AN movements from mainland East Asia, most recently sinicization by Han Chinese, and thus no longer depict the ancestral AN gene pool (Mörseburg, et al. 2016). However, this notable pattern could equally be explained by the dominance of language and culture transfers during early phases of the Neolithic expansion from Taiwan into the Philippines, followed by people with predominantly Philippine ancestry driving later demic diffusion into the Indonesian archipelago. Interestingly, Mörseburg, et al. (2016), by using a different sample set and genotype-based analytical toolkit, indicated that the Kankanaey ethnic group from the Philippines is likely the closest living proxy of the source population that gave rise to the AN expansion. We did not detect this population among sources of admixture in eastern Indonesia, and therefore suggest that the place of individual Philippine groups in the AN expansion needs to be further addressed by better sampling in the Philippine archipelago.

Sumba and Flores, the two westernmost islands to the east of Wallace’s line, display a high proportion of Java and Bali surrogates in their AN admixing source. This suggests that the AN movement into eastern Indonesia, especially for Sumba and Flores, had earlier experienced some degree of genetic contact with western Indonesian groups. In contrast, the sources of AN admixture in Lembata, Alor, Pantar and Timor are dominated by Sulawesi (Figure 4B, S3, Table S3, S5). This generally agrees with expectations from the geography of the region, whereby AN groups exiting the southern Philippines were likely funneled into at least two streams, including a western path through Borneo and a central path through Sulawesi (Blust 2014).

Point estimates of genetic admixture times in eastern Indonesia lie within a narrow timeframe ranging between ca 185 BCE to 360 CE or 75 to 56 generations ago (95% CI 510 BCE – 475 CE or 87–52 generations) (Figure 4B, Table S3). These inferred dates are younger than some previous estimates (120–200 generations ago) (Xu, et al. 2012; Sanderson, et al. 2015; Sedghifar, et al. 2015). A major analysis of admixture in Indonesia estimated the date of AN contact in the eastern part of archipelago to be around 500 to 600 CE (ca 50 generations, CI estimates between 58–42 generations ago) (Lipson, et al. 2014), surprisingly young given the archaeological evidence. However, the study pooled a very small sample of genetically heterogeneous eastern Indonesian islands including, for example, Flores and Alor. As we show here (Figure 2, 4, 5, S3, Table S3, S5, S6), while the wave of AN speakers left a common genetic trace across the whole of eastern Indonesia, the details and dates of this contact vary considerably not only between islands (e.g., Flores and Alor), but also within individual islands (e.g., Flores Rampasasa vs. Flores Bama). The genetic dates, which were obtained here by denser geographical sampling of 8 eastern islands, a much larger number of individuals (28 per island on average) and a greater number of SNPs, are up to 30 generations older, predating the Common Era in many cases.

It therefore took migrants at least half a millennium to proceed from islands around Wallace’s line to the easternmost sampled part of eastern Indonesia. Nevertheless, observed dates for AN contact in eastern Indonesia are still approximately a millennium younger than the earliest Neolithic archaeological evidence in the region, and two explanations seem most likely here. First, the AN migration may have involved several waves of people leaving Taiwan, spanning multiple generations, which would bias date estimates later than the first arrival of the Neolithic archeological assemblage (Sedghifar, et al. 2015). Second, there may have been a substantial time gap between the spread of culture and technological traditions, and the beginning of extensive genetic contact between incoming farming groups and native inhabitants in Indonesia (Lansing, et al. 2011). The lack of considerable admixture with Papuan groups was recently noted in ancient Lapita individuals from Remote Oceania, whose genomes are mostly Asian and carry little to no Papuan ancestry, suggesting limited contact as they moved through Melanesia to previously uninhabited islands in the Pacific (Skoglund, et al. 2016). A lag in admixture between local and incoming Neolithic groups has also been observed in Europe, where hunter-gatherer and farming populations initially co-existed for nearly a thousand years without substantial genetic interaction (Malmström, et al. 2015).
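The conversion between generations and calendar dates in the excerpts above is simple arithmetic. Here is a minimal Python sketch of it, assuming a convention sometimes used with haplotype-based admixture dating: 28 years per generation, counted back from 1950, with one extra generation added. These parameter values and the function name are my assumptions for illustration; the excerpts do not state which convention the authors used.

```python
def generations_to_year(generations, years_per_generation=28, reference_year=1950):
    """Convert an admixture date in generations ago to an approximate calendar year.

    Negative results are years BCE (ignoring the missing year 0).
    Both default values are assumptions for illustration only.
    """
    return reference_year - years_per_generation * (generations + 1)


# Rounded point estimates quoted in the excerpt (75 and 56 generations ago).
for g in (75, 56):
    year = generations_to_year(g)
    era = "BCE" if year < 0 else "CE"
    print(f"{g} generations ago -> ca. {abs(year)} {era}")
```

With these assumed values, the rounded point estimates land near, but not exactly on, the quoted 185 BCE and 360 CE; the published dates were presumably computed from unrounded estimates.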

Ancestral genomic components in regional populations. For every K, the modal solution with the highest number of ADMIXTURE runs is shown; individual ancestry proportions were averaged across all runs from the same mode and the number of runs (out of 50) assigned to the presented solution is shown in parentheses. Average cross validation statistics were calculated across all runs from the same mode (insert). The minimum cross-validation score is observed at K=9. Note major ancestry components in Indonesia and ISEA – Papuan (light purple), mainland Asian (light yellow) and AN (light blue) – as well as major differences in the distribution of these three ancestries between eastern and western Indonesia. Populations from the Philippines and Flores are abbreviated as ‘Ph.’ and ‘Fl.’, respectively.

Featured images are taken from the article.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact


When linguistics does not seem to be a science


An interesting essay by Arika Okrent has appeared in Aeon – Is linguistics a science? It concerns the central position of Chomsky’s Universal Grammar in modern linguistics, and revolves around a story from Tom Wolfe’s book The Kingdom of Speech (2016): Everett’s discovery of the Pirahã culture’s (and language’s) emphasis on the here and now, shown in not embedding one phrase inside another, the simple kinship system, the lack of numbers, and the absence of fiction or creation myths. Some excerpts of the essay:

This looks suspiciously like defiance of a central feature of the scientific archetype, one first put forward by the philosopher Karl Popper: theories are not scientific unless they have the potential to be falsified. If you claim that recursion is the essential feature of language, and if the existence of a recursionless language does not debunk your claim, then what could possibly invalidate it?


In an interview in 2007, Everett said he emailed Chomsky: ‘What is a single prediction that universal grammar makes that I could falsify? How could I test it?’ According to Everett, Chomsky replied to say that universal grammar doesn’t make any predictions; it’s a field of study, like biology.


By contrast, good theories or hypotheses are those that allow you to search for contrary evidence. Thus Albert Einstein’s theory of general relativity made a very specific prediction about the effect of gravity on light, which could be subsequently tested during the solar eclipse of 1919. Unlike astrology or Freudianism, relativity could be contradicted. It was possible to conceive of an observation that would conflict with one’s expectations (although the eclipse ultimately vindicated Einstein). The capacity to be disproved is what makes general relativity scientific.


In Chomsky’s formulation, we are not just after a set of abstract rules that account for the things we can see and hear, but one that explains why they are the way they are. In the late 1970s, Chomsky began to refer to this method of enquiry as the ‘Galilean style’


Chomsky’s Galilean vision was that our intuitive judgments about language stem from an innate language faculty, a universal grammar underlying the human capacity for language. His project is to determine the essential nature of that universal grammar – not the nature of language, but the nature of the human capacity for language. The distinction is a subtle one.

The pseudoscience claims about linguistics (in this case, Chomsky’s Universal Grammar), and the common answer that authors give to such criticism of linguistic abstractions (asserting that they can be “neither right nor wrong” but only “fecund or sterile”), reminded me of an old XKCD comic which sums up this line of reasoning quite well:

Two more studies on the genetic history of East Asia: Han Chinese and Thailand


A comprehensive map of genetic variation in the world’s largest ethnic group – Han Chinese, by Chiang et al. (2017).

It is believed – based on uniparental markers from modern and ancient DNA samples and array-based genome-wide data – that Han Chinese originated in the Central Plain region of China during prehistoric times, expanding with agriculture and technology northward and southward, to become the largest Chinese ethnic group.


As are most non-European populations around the globe, the Han Chinese are relatively understudied in population and medical genetics studies. From low-coverage whole-genome sequencing of 11,670 Han Chinese women we present a catalog of 25,057,223 variants, including 548,401 novel variants that are seen at least 10 times in our dataset. Individuals from our study come from 19 out of 22 provinces across China, allowing us to study population structure, genetic ancestry, and local adaptation in Han Chinese. We identify previously unrecognized population structure along the East-West axis of China and report unique signals of admixture across geographical space, such as European influences among the Northwestern provinces of China. Finally, we identified a number of highly differentiated loci, indicative of local adaptation in the Han Chinese. In particular, we detected extreme differentiation among the Han Chinese at MTHFR, ADH7, and FADS loci, suggesting that these loci may not be specifically selected in Tibetan and Inuit populations as previously suggested. On the other hand, we find that Neandertal ancestry does not vary significantly across the provinces, consistent with admixture prior to the dispersal of modern Han Chinese. Furthermore, contrary to a previous report, Neandertal ancestry does not explain a significant amount of heritability in depression. Our findings provide the largest genetic data set so far made available for Han Chinese and provide insights into the history and population structure of the world’s largest ethnic group.

Using Shanghai individuals as representatives, shared drift between Chinese and ancient humans is computed by calculating outgroup f3 statistics of the form f3(Mbuti; X, Y), with ancient individuals separated into approximately Palaeolithic, Mesolithic, Neolithic, and Chalcolithic-Medieval times. It is found that modern Chinese individuals show greater shared drift with pre-Neolithic hunter-gatherers than with Neolithic farmers (Featured image from the article).
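For readers unfamiliar with the statistic: an outgroup f3 of the form f3(O; X, Y) averages, across SNPs, the product of the allele-frequency differences between the outgroup O and each of X and Y, so larger values mean more drift shared by X and Y since their split from the outgroup. Below is a minimal Python sketch using the naive estimator, without the finite-sample bias correction that real tools apply. The function name and the toy frequencies are hypothetical and are not taken from the article.

```python
def outgroup_f3(p_out, p_x, p_y):
    """Naive outgroup f3: mean of (p_out - p_x) * (p_out - p_y) across SNPs.

    Each argument is a list of allele frequencies (0.0 to 1.0), one per SNP,
    in the same SNP order. No bias correction is applied.
    """
    if not p_out or not (len(p_out) == len(p_x) == len(p_y)):
        raise ValueError("frequency lists must be non-empty and of equal length")
    products = [(o - x) * (o - y) for o, x, y in zip(p_out, p_x, p_y)]
    return sum(products) / len(products)


# Toy allele frequencies at three SNPs (made up for illustration).
outgroup = [0.1, 0.2, 0.9]
pop_x = [0.5, 0.6, 0.3]
pop_y = [0.6, 0.5, 0.2]
print(outgroup_f3(outgroup, pop_x, pop_y))
```

In an analysis like the one described, X would be fixed (here, the Shanghai individuals) and Y would run over the ancient samples, so that the resulting values can be ranked.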

EDIT (17/7/2017): Davidski at Eurogenes shares an interesting view on these kinds of results:

These sorts of estimates always look way off. And I doubt that it’s largely the result of the Silk Road, which linked China to the Near East and Mediterranean rather than to Northern Europe. More likely it reflects gene flow from the Pontic-Caspian steppe in Eastern Europe during the Bronze and Iron ages, via the Afanasievo, Andronovo, and other closely related steppe peoples

New insights from Thailand into the maternal genetic history of Mainland Southeast Asia, by Kutanan et al. (2017)


Tai-Kadai (TK) is one of the major language families in Mainland Southeast Asia (MSEA), with a concentration in the area of Thailand and Laos. Our previous study of 1,234 mtDNA genome sequences supported a demic diffusion scenario in the spread of TK languages from southern China to Laos as well as northern and northeastern Thailand. Here we add an additional 560 mtDNA sequences from 22 groups, with a focus on the TK-speaking central Thai people and the Sino-Tibetan speaking Karen. We find extensive diversity, including 62 haplogroups not reported previously from this region. Demic diffusion is still a preferable scenario for central Thais, emphasizing the extension and expansion of TK people through MSEA, although there is also some support for an admixture model. We also tested competing models concerning the genetic relationships of groups from the major MSEA languages, and found support for an ancestral relationship of TK and Austronesian-speaking groups.

Potential Afroasiatic Urheimat near Lake Megachad


The publication of new ancient DNA samples from Africa is near, according to people at the SMBE meeting. As reported, a group led by Pontus Skoglund has analysed new samples (complementing the study by Carina Schlebusch), so we will have ancient samples of Africans from 300 to 6,000 years ago. They have been compared to data from modern African populations, and among their likely conclusions (to be published):

  • Several thousand years ago, likely Tanzanian herders migrated far and wide, reaching Southern Africa centuries before the first farmers.
  • West Africans were likely early contributors to the gene pool of sub-Saharan Africans.
  • One ancient African herder showed influence from even farther abroad, with 38% of their DNA coming from outside Africa. 9-22% of the DNA of modern farmers, including the southern Khoe-San, comes from East Africans and Eurasian herders.
  • More recent farmers, in samples up to 500 years old, did have Bantu DNA in their genomes, but the ancient hunter-gatherers predated the spread of the Bantu.

Razib Khan, asked about the Afroasiatic homeland by David Reich, has taken this opportunity to publish his own hypothesis on the expansion of Afroasiatic, given the known Admixture analyses, using Y-DNA phylogeography, and with reasonable assumptions. He concludes that Afroasiatic expansion might also be associated with the western expansion of E1b1b subclades from a Levantine (“Natufian”) homeland.

I think it is necessary to remind everyone of the many problems unsolved by Indo-European studies – a much older discipline (and with more research published) than Afroasiatic studies. It is already quite revealing that we still can’t trace Proto-Semitic back to its homeland, and that Proto-Semitic is probably as old as Late Proto-Indo-European. We are talking, then, about an ancient proto-language – Afroasiatic – possibly older than Middle Indo-European (or Indo-Hittite), and whose dialects are still not well studied, except for the Semitic and Egyptian branches. Linguistic guesstimates or phylogenetic speculation date the proto-language (and thus the homeland) within a wide range, from 15,000 to 6,000 years ago.

There is an obvious trend (probably driven by Semitic and Egyptian researchers) to place the Afroasiatic Homeland near one of the many proposed Semitic homelands, i.e. in East Africa. This is similar to the trend seen in the first half of the 20th century in Indo-European studies, with most proposals locating the Proto-Indo-European homeland in Europe. European languages were the best known, and only the perceived antiquity of Vedic Sanskrit made some propose South Asian origins for the proto-language. However, it was only careful interpretation of linguistic finds, combined with archaeological data, that eventually yielded the Kurgan hypothesis, which has since been refined.

A model for the homeland and expansion of Afroasiatic, from Wikipedia

Razib Khan’s proposal makes sense in that it fits what others have proposed before, i.e. an east African or Middle Eastern Afroasiatic homeland, and that it links it with the expansion of farming. However, we have to keep in mind that until 5,000 years ago the Sahara was not the desert we know: it had certain important green corridors, humid areas between megalakes. The Sahara might not have been exactly green 10,000 to 5,000 years ago (roughly the time when Afroasiatic must have been spoken), but it had certain regions that allowed for an east-west migration. However, it also allowed for a west-east migration, and – perhaps more importantly – for a sizeable population expansion in central Saharan territory. To forget that is to allow for potentially wrong assumptions to be made.

What we expect from the next papers on ancient African DNA samples are the results of certain (more recent) population – and thus potentially ethnolinguistic – movements, but they probably won’t solve the question of the Afroasiatic homeland, which has an older time span than the samples studied. There is a wide void in African prehistory – compared with Near Eastern history – and this research will be closing that gap, just like European samples are helping close the gap in the prehistory of western, northern, and eastern Europe, compared to the history of the eastern Mediterranean regions.

Diachronic map of Paleolithic migrations of R1b lineages in Europe and Africa

I already wrote, regarding the potential ethnolinguistic link between Indo-European and Afroasiatic, that a close look at the migration of R1b-V88 lineages from Europe (through southern Italy?) into the Sahara – through the Fezzan-Chad-Chotts, and Chad-Chotts-Ahnet-Moyer megalake green corridors – could have been the key to the successful expansion of Afrasians.

Interesting aspects to take into account are the distribution of R1b-V88 lineages, compared to the location of Chadic languages (probably the most divergent and least known of the group) and to the potential North Afroasiatic group (composed of Egyptian, Berber, and Semitic) and South Afroasiatic group (made up of Cushitic and Omotic). Chadic has been argued to be connected variously to North Afroasiatic, or to the Berber branch, but the Northern group has also been argued to be connected with Cushitic, with Omotic as an independent branch. Also interesting would then be the potential connection between Indo-European (or Indo-Uralic) and Afroasiatic.

Modern distribution of haplogroup R1b, from Wikipedia

We could speculatively place the potential primary Afroasiatic homeland in the south-central Sahara, near the Megachad lake (i.e. near the peak of R1b-V88 lineages), with a secondary homeland in eastern Africa (as in the map above) – and maybe a tertiary homeland (of North Afroasiatic) in the Middle East, associated with the expansion of “Natufians” and E1b1b subclades. The identification of the spread of Afroasiatic languages with the expansion of R1b-V88 lineages needs an anthropological context (linguistic and archaeological) that is obviously lacking today.

It is important to keep all possibilities in sight when reviewing genetic analyses.


EDIT (16/7/2017): Added link to Neby’s post on a potential Semitic homeland, and Nature article on Schlebusch and Skoglund research.

My European Family: The First 54,000 years, by Karin Bojs


I have recently read the book My European Family: The First 54,000 years (2015), by Karin Bojs, a well-known Swedish science journalist and former science editor of Dagens Nyheter.

It is written in a fresh, dynamic style, offers a general introduction to genetics, archaeology, and their relation to language, and was written at a time of great change (2015) for the disciplines involved.

The book is informed, it shows a balanced exercise between responsible science journalism and entertaining content, and it is at times nuanced, going beyond the limits of popular science books. It is not written for scholars, although you might learn – as I did – interesting details about researchers and institutions of the anthropological disciplines involved. It contains, for example, interviews with known academics, which she uses to share details about their personalities and careers, which give – in my opinion – a much needed context to some of their publications.

Since I am clearly biased against some of the findings and research papers which are nevertheless considered mainstream in the field (like the identification of haplogroup R1a with the Proto-Indo-European expansion, or the concept of steppe admixture), I asked my wife (who knew almost nothing about genetics, or Indo-European studies) to read it and write a summary, if she liked it. She did. So much, that I have convinced her to read The Horse, the Wheel, and Language: How Bronze-Age Riders from the Eurasian Steppes Shaped the Modern World (2007), by David Anthony.

Here is her summary of the book, translated from Spanish:

The book is divided into three main parts: The Hunters, The Farmers, and The Indo-Europeans, and each has in turn chapters which introduce and break down information in an entertaining way, mixing it with accounts of her interactions and personal genealogical quest.

Part one, The Hunters, offers intriguing accounts about the direct role music had in the development of the first civilizations, the first mtDNA analyses of dogs (Savolainen), and the discovery of the author’s Saami roots. Explanations about the first DNA studies and their value for archaeological studies are clear and comprehensible for any non-specialized reader. Interviews help give a close view of investigations, like that of Frederic Plassard’s in Les Combarelles cave.

Part two, The Farmers, begins with her travel to Cyprus, and arouses the interest of the reader with her description of the circular houses, her notes on the Basque language, the new papers and theories related to DNA analyses, the theory of the decision of cats to live with humans, the first beers, and the houses built over graves. Karin Bojs analyses the subgroup H1g1 of her grandmother Hilda, and how it belonged to the first migratory wave into Central Europe. This interest in her grandmother’s origins leads her to a conference in Pilsen about the first farmers in Europe, where she learns firsthand of the results of studies by János Jakucs, and of studies of nuclear DNA. Later on she interviews Guido Brandt and Joachim Burger, with whom she talks about haplogroups U, H, and J.

The chapter on Ötzi and the South Tyrol Museum of Archaeology (Bolzano) introduces the reader to the first prehistoric individual whose DNA was analysed, belonging to haplogroup G2a4, but also revealing other information on the Iceman, such as his lactose intolerance.

Part three, dealing with the origin of Indo-Europeans, begins with the difficulties that researchers have in locating the origin of horse domestication (which probably happened in western Kazakhstan, in the Russian steppe between the rivers Volga and Don). She mentions studies by David Anthony and on the Yamna culture, and its likely role in the diffusion of Proto-Indo-European. In an interview with Mallory in Belfast, she recalls the potential interest of far-right extremists in genetic studies (and early links of the Journal of Indo-European Studies to certain ideology), as well as controversial statements of Gimbutas, and her potentially biased vision as a refugee from communist Europe. During the interview, Mallory had a copy of the latest genetic paper sent to Nature Magazine by Haak et al., not yet published, for review, but he didn’t share it.

Then haplogroups R1a and R1b are introduced as the most common in Europe. She visits the Halle State Museum of Prehistory (where the Nebra sky disk is exhibited), and later Krakow, where she interviews Slawomir Kadrow, dealing with the potential creation of the Corded Ware culture from a mix of Funnelbeaker and Globular Amphorae cultures. New studies of ancient DNA samples, published in the meantime, indicate that Corded Ware individuals share about 75% of their ancestry with Yamna.

In the following chapters there is a broad review of all studies published to date, as well as individuals studied in different parts of Europe, stressing the importance of ships for the expansion of R1b lineages (Hjortspring boat).

The concluding chapter is dedicated to the Vikings, and is used to demystify them as aggressive warmongers, sketching their relevance as founders of the Russian state.

To sum up, it is a highly documented book, written in a clear style, and is capable of awakening the reader’s interest in genetic and anthropological research. The author enthusiastically looks for new publications and information from researchers, but is at the same time critical of them, often showing her own personal reactions to new discoveries, all of which offers a complex personal dynamic often shared by the reader, who stays engaged with her first-person account for the full length of the book.

Mayte Batalla (July 2017)

DISCLAIMER: The author sent me a copy of the book (a translation into Spanish), so there is a potential conflict of interest in this review. She didn’t ask for a review, though, and it was my wife who did it.

Effective migration in Western Eurasia reveals fine-scale migration surface features


Interesting poster from SMBE 2017, Maps of effective migration as a summary of global human genetic diversity, by Benjamin Peter, Desislava Petkova, Matthew Stephens & John Novembre, of the JNPopGen group of the University of Chicago.

You can read the full poster in the original PDF, or as a compressed image. The following are important excerpts:

Aim: To answer the following questions:

  • Which regions have high/low effective migration?
  • How well is human genetic diversity explained by this pure isolation-by-distance model?
  • How does the explanatory performance of EEMS compare to PCA?

Method: It uses the method proposed by Petkova et al. (2016) to fit a map of time-averaged (effective) migration rates to geographically referenced samples, and merges data from 24 different studies (8740 individuals from 469 populations) to assess human genetic diversity at global and continental scales.

  1. Basic workflow:
    • Merge data, remove duplicated & related individuals.
    • Remove hunter-gatherer and recently admixed populations. Their locations are still indicated with (H) and (X), respectively.
  2. EEMS analysis
    • Calculate genetic distance matrix between all individuals.
    • Fit migration map to data using the EEMS MCMC algorithm.
  3. Comparison to PCA: Standard PCA was run with flashpca (Abraham & Inouye 2014); the authors compare the correlation of the genetic distance induced by the first ten PCs with the fitted EEMS distance.
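The distance-based steps of the workflow above (building a pairwise genetic distance matrix, then correlating one set of distances with another, such as observed versus fitted) can be sketched in plain Python. This is only an illustration under my own assumptions, not the EEMS or flashpca code: genotypes are coded as 0/1/2 allele counts with no missing data, dissimilarity is the mean squared genotype difference, and all function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def dissimilarity_matrix(genotypes):
    """Mean squared genotype difference between every pair of individuals.

    genotypes: list of per-individual lists of 0/1/2 allele counts,
    all of the same length (one entry per SNP, no missing data).
    """
    n, m = len(genotypes), len(genotypes[0])
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        value = sum((a - b) ** 2 for a, b in zip(genotypes[i], genotypes[j])) / m
        d[i][j] = d[j][i] = value
    return d


def upper_triangle(matrix):
    """Flatten the entries above the diagonal into a list of pairwise values."""
    return [matrix[i][j] for i, j in combinations(range(len(matrix)), 2)]


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Three toy individuals genotyped at four SNPs (made up for illustration).
toy = [[0, 0, 2, 2], [0, 1, 2, 2], [2, 2, 0, 0]]
observed = dissimilarity_matrix(toy)
# Stand-in for distances fitted by a model; here simply a rescaled copy.
fitted = [[2 * v for v in row] for row in observed]
print(pearson(upper_triangle(observed), upper_triangle(fitted)))
```

In practice the fitted distances would come from the EEMS model or from the leading principal components, and a higher correlation with the observed matrix would indicate a better summary of the data.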

Interpretation: A continuous habitat is approximated by a discrete grid (light gray). A Bayesian model is used to infer the most likely migration rates, which are given on a log scale relative to the average (blue = 100x higher, brown = 100x lower).

Map of effective migrations in Europe

Results (see maps):

  1. Global diversity patterns correlate with topographical features
  2. In Western Eurasia, EEMS reveals fine-scale migration surface features

Discussion: EEMS maps are an intuitive and direct way to visualize geographically referenced genetic data.

Dense sampling (Western Eurasian panel) in particular yields high resolution and accuracy, but the method works well both at a global scale (FST=0.06) and within Western Eurasia alone (FST=0.01).

EEMS maps predict genetic differences reasonably well, but hunter-gatherer populations and admixed populations were excluded a priori.

Discovered via Eurogenes. Full image via Reddit.