The genetic makings of South Asia – IVC as Proto-Dravidian


Review (behind paywall) The genetic makings of South Asia, by Metspalu, Mondal, and Chaubey, Current Opinion in Genetics & Development (2018) 53:128-133.

Interesting excerpts (emphasis mine):

(…) the spread of agriculture in Europe was a result of the demic diffusion of early Anatolian farmers, it was discovered that the spread of agriculture to South Asia was mediated by a genetically completely different farmer population in the Zagros mountains in contemporary Iran (IF). The ANI-ASI cline itself was interpreted as a mixture of three components genetically related to Iranian agriculturalists, Onge and Early and Middle Bronze Age Steppe populations (Steppe_EMBA).

The first ever autosomal aDNA from South Asia comes from Northern Pakistan (Swat Valley, early Iron Age). This study presented altogether 362 aDNA samples from the broad South and Central Asia and contributes substantially to our understanding of the evolutionary past of South and Central Asia. The study redefines the three genetic strata that form the basis of the Indian Cline. The Indus Periphery (IP) component is composed of (varying proportions of): first, IF, second, Ancient Ancestral South Asians (AASI), which represents an ancient branch of human genetic variation in Asia arising from a population split contemporaneous with the splits of East Asian, Onge and Australian Aboriginal ancestors and third, West_Siberian Hunter gatherers (WS_HG).

The authors argue that IP could have formed the genetic base of the Indus Valley Civilization (IVC). Upon the collapse of the IVC IP contributes to the formation of both ASI and ANI. ASI is formed as IP admixes further with AASI. ANI in turn forms when IP admixes with the incoming Middle and Late Bronze Age Steppe (Steppe_MLBA) component, (rather than the Steppe_EMBA groups suggested earlier)

A sketch of the peopling history of South Asia. Depicting the full complexity of available reconstructions is not attempted. Placing of population labels does not indicate precise geographic location or range of the population in question. Rather we aim to highlight the essentials of the recent advancements in the field. We divide the scenario into three time horizons: Panels (a) before 10 000 BCE (pre agriculture era.); (b) 10 000 BCE to 3000 BCE (agriculture era) and (c) 3000 BCE to prehistoric era/modern era. (iron age).

Dating of the arrival of the Austro-Asiatic speakers in South Asia-based on Y chromosome haplogroup O2a1-M95 expansion estimates yielded dates between 3000 and 2000 BCE [30]. However, admixture LD decay-based approach on genome-wide data suggests the admixture between South Asian and incoming Austro-Asiatic speakers occurred slightly later between 1800 and 0 BCE (Tätte et al. submitted). It is interesting that while the mtDNA variants of the Mundas are completely South Asian, the Y chromosome variation is dominated at >60% by haplogroup O2a which is phylogeographically nested in East Asian-specific paternal lineages.
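The admixture LD decay approach mentioned in the excerpt can be illustrated with a small sketch. It assumes the common single-pulse model in which admixture-induced LD between two markers decays as A·exp(-n·d), where n is the number of generations since admixture and d the genetic distance in Morgans. The data points, the 29-year generation time and the sampling year are all assumptions chosen for illustration; none of them comes from Tätte et al.

```python
import math

def fit_generations(distances_morgans, ld_values):
    """Least-squares fit of ln(LD) = ln(A) - n * d; returns n (generations)."""
    xs = list(distances_morgans)
    ys = [math.log(v) for v in ld_values]
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return -slope  # the decay rate equals the number of generations

def generations_to_year(generations, years_per_generation=29, sampling_year=2000):
    """Convert generations before sampling into a calendar year.

    Negative values are years BCE (the missing year 0 is ignored).
    Both default parameters are assumptions, not values from the preprint.
    """
    return sampling_year - generations * years_per_generation

# Synthetic, noise-free LD curve generated with 100 generations of decay.
true_n = 100
distances = [0.005 * k for k in range(1, 21)]          # 0.5 cM to 10 cM
ld = [0.02 * math.exp(-true_n * d) for d in distances]

n_hat = fit_generations(distances, ld)
year = generations_to_year(n_hat)
print(f"estimated generations: {n_hat:.1f}, calendar year: {year:.0f}")
```

With real data the curve is noisy and the date interval depends strongly on the generation time assumed, which is one reason such estimates are reported as broad ranges.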

In India, the speakers of Tibeto-Burman (TB) languages live in the Seven Sisters States in Northeast India and in the very north of the country. Genetically they show a clear East Asian origin and around 20% of subsequent admixture with South Asians within the last 1000 years. The genetic flavour of East Asia in TB is different from that in Munda speakers as the best surrogates for the East Asian admixing component are contemporary Han Chinese.

I found the simplistic migration maps especially interesting to illustrate ancient population movements. The emergence of EHG is supposed to involve a WHG:ANE cline, though, and this isn’t clear from the map. Also, there is new information on what may be at the origin of WHG and Anatolian hunter-gatherers.

From the recent Reich’s session on South Asia at ISBA 8:

– Tale of three clines, with clear indication that “Indus Periphery” samples drawn from an already-cosmopolitan and heterogeneous world of variable ASI & Iranian ancestry. (I know how some people like to pore over these pictures – so note red dots = just dummy data for illustration.)
– Some more certainty about primary window of steppe ancestry injection into S. Asia: 2000-1500 BC
Alexander M. Kim

Featured image: map of South Asian languages from


Munda admixture happened probably during the ANI-ASI mixture


Preprint The genetic legacy of continental scale admixture in Indian Austroasiatic speakers, by Tätte et al. bioRxiv (2018).

Interesting excerpts:

Studies analysing mtDNA and Y chromosome markers have revealed a sex-specific pattern of admixture of Southeast and South Asian ancestry components for Munda speakers. While close to 100% of mtDNA lineages present in Mundas match those in other Indian populations, around 65% of their paternal genetic heritage is more closely related to Southeast Asian than South Asian variation. Such a contrasting distribution of maternal and paternal lineages among the Munda speakers is a classic example of the ‘father tongue hypothesis’. However, the temporality of this expansion is contentious. Based on Y-STR data the coalescent time of Indian O2a-M95 haplogroup was estimated to be >10 KYA. Recently, the reconstructed phylogeny of 8.8 Mb region of Y chromosome data showed that Indian O2a-M95 lineages coalesce within a clade nested within East/Southeast Asian within the last ~5-7 KYA. This date estimate sets the upper boundary for the main episode of gene flow of Y chromosomes from Southeast Asia to India.

Supplementary Figure S4. First two components of principal component analysis (PCA). Individuals and population medians (circles) are marked with abbreviations from population names. Different colours represent populations from different geographic areas and/or linguistic groups as shown on the legend on the right. For the full names of populations see Supplementary Table S1. PCA was performed using software EIGENSOFT 6.1.42 on the whole filtered dataset (1072 individuals), previously LD pruned as described in the title of Supplementary Figure S1. The first two principal components describe 5.13% and 2.57% of total variance.

Admixture proportions suggest a novel scenario

Regardless of which West Asian population we used, we found that Munda speakers can be described on average as a mixture of ~19% Southeast Asian, 15% West Asian and 66% Onge (South Asian) components. Alternatively, the West and South Asian components of Munda could be modelled using a single South Asian population (Paniya), accounting on average to 77% of the Munda genome. When rescaling the West and South Asian (Onge) components to 1 to explore the Munda genetic composition prior to the introduction of the Southeast Asian component, we note that the West Asian component is lower (~19%) in Munda compared to Paniya (27%) (Supplementary Table S4: *Average_Lao=0). Consistently with qpGraph analyses in Narasimhan et al. (2018), this may point to an initial admixture of a Southeast Asian substrate with a South Asian substrate free of any West Asian component, followed by the encounter of the resulting admixed population with a Paniya-like population. Such a scenario would imply an inverse relationship between the Southeast and West Asian relative proportions in Munda or, in other words, the increase of Southeast Asian component should cause a greater reduction of the West Asian compared to the reduction in the South Asian component in Munda.
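The rescaling step described in this excerpt is simple arithmetic, and a short sketch makes it explicit. It uses the average proportions quoted above (19% Southeast Asian, 15% West Asian, 66% Onge); the function name is hypothetical and the per-population values in Supplementary Table S4 are not reproduced here.

```python
def rescale_without(components, dropped):
    """Drop one ancestry component and renormalise the rest to sum to 1."""
    kept = {name: share for name, share in components.items() if name != dropped}
    total = sum(kept.values())
    return {name: share / total for name, share in kept.items()}

# Average Munda proportions as quoted in the excerpt.
munda = {"Southeast Asian": 0.19, "West Asian": 0.15, "Onge": 0.66}

rescaled = rescale_without(munda, "Southeast Asian")
west_share = rescaled["West Asian"]  # 0.15 / (0.15 + 0.66)
print(f"West Asian share before Southeast Asian input: {west_share:.1%}")
```

The rescaled West Asian share comes out at about 19%, matching the figure in the excerpt and sitting below the 27% reported for Paniya.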

The distribution of genetic components (K=13) based on the global ADMIXTURE analysis (Supplementary Figure S1, S2, S3) for a subset of populations on a map of South and Southeast Asia. The circular legend in the bottom left corner shows the ancestral components corresponding to the colours on pie charts. The sector sizes correspond to population median.

Dating the admixture event

In this study, we have replicated a result previously reported in Chaubey et al. (2011) that the Mundas lack one ancestral component (k2) that is characteristic to Indian Indo-European and Dravidian speaking populations. If this component came to India through one of the Indo-Aryan migrations then it would be fair to presume that the Munda admixture happened before this component reached India or at least before it spread all over the country. However, the admixture time computed here, falls in the exact same timeframe as the ANI-ASI mixture has been estimated to have happened in India through which the k2 component probably spread. Therefore, we propose that if the Munda admixture happened at the same time, it is possible for it to have happened in the eastern part of the country, east of Bangladesh, and later when populations from East Asia moved to the area, the Mundas migrated towards central India. Such a scenario, which may be further clarified by ancient DNA analyses, seems to be further supported by the fact that Mundas harbor a smaller fraction of West Asian ancestry compared to contemporary Paniya (Supplementary Table S4) and cannot therefore be seen as a simple admixture product of Southern Indian populations with incoming Southeast Asian ancestries.

Image from Damgaard et al. (2018). A summary of the four qpAdm models fitted for South Asian populations. For each modern South Asian population, we fit different models with qpAdm to explain their ancestry composition using ancient groups and present the first model that we could not reject in the following priority order: 1. Namazga_CA + Onge, 2. Namazga_CA + Onge + Late Bronze Age Steppe, 3. Namazga_CA + Onge + Xiongnu_IA (East Asian proxy), and 4. Turkmenistan_IA + Xiongnu_IA. Xiongnu_IA were used here to represent East Asian ancestry. We observe that while South Asian Dravidian speakers can be modeled as a mixture of Onge and Namazga_CA, an additional source related to Late Bronze Age steppe groups is required for IE speakers. In Tibeto-Burman and Austro-Asiatic speakers, an East Asian rather than a Steppe_MLBA source is required.
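The selection rule in this caption (report the first model, in a fixed priority order, that cannot be rejected) can be sketched as follows. The p-values are invented for illustration and the 0.05 rejection threshold is an assumption, since the caption does not state the cut-off used; nothing here calls qpAdm itself.

```python
# Source sets in the priority order given in the caption.
MODELS = [
    ("Namazga_CA", "Onge"),
    ("Namazga_CA", "Onge", "Late Bronze Age Steppe"),
    ("Namazga_CA", "Onge", "Xiongnu_IA"),
    ("Turkmenistan_IA", "Xiongnu_IA"),
]

def first_unrejected(p_values, alpha=0.05):
    """Return the first model whose fit is not rejected, or None.

    p_values: one p-value per model, in the same order as MODELS.
    alpha: rejection threshold (assumed here, not taken from the paper).
    """
    for model, p in zip(MODELS, p_values):
        if p >= alpha:
            return model
    return None

# Hypothetical p-values for three populations.
dravidian_like = first_unrejected([0.31, 0.40, 0.12, 0.001])
ie_like = first_unrejected([0.001, 0.22, 0.02, 0.001])
unmodelled = first_unrejected([0.001, 0.01, 0.02, 0.03])
print(dravidian_like, ie_like, unmodelled, sep="\n")
```

Because the order is fixed, a population that fits the simple two-source model is never assigned a more complex one, even if the latter would also fit.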

Linguistics and genome-wide data

(…) by and large, the linguistic classification justifies itself but Kharia and Juang do not fit in this simplification perfectly.

Once again, with the current level of detail in genetic studies, there is often no clear dialectal division possible for certain groups without fine-scale population studies, and the help from linguistics and archaeology.

Featured image from open access paper by Chaubey et al. (2011).


Modelling of prehistoric dispersal of rice varieties in India points to a north-western origin


New paper (behind paywall), A tale of two rice varieties: Modelling the prehistoric dispersals of japonica and proto-indica rices, by Silva et al., The Holocene (2018).

Interesting excerpts (emphasis mine):


Our empirical evidence comes from the Rice Archaeological Database (RAD). The first version of this database was used for a synthesis of rice dispersal by Fuller et al. (2010), a slightly expanded dataset (version 1.1) was used to model the dispersal of rice, land area under wet rice cultivation and associated methane emissions from 5000–1000 BP (Fuller et al., 2011). The present dataset (version 2) was used in a previous analysis of the origins of rice domestication (Silva et al., 2015). The database records sites and chronological phases within sites where rice has been reported, including whether rice was identified from plant macroremains, phytoliths or impressions in ceramics. Ages are recorded as the start and end date of each phase, and a median age of the phase is then used for analysis. Dating is based on radiocarbon evidence (…)

Modelling framework

Our approach expands on previous efforts to model the geographical origins, and subsequent spread, of japonica rice (Silva et al., 2015). The methodology is based on the explicit modelling of dispersal hypotheses using the Fast Marching algorithm, which computes the cost-distance of an expanding front at each point of a discrete lattice or raster from the source(s) of diffusion (Sethian, 1996; Silva and Steele, 2012, 2014). Sites in the RAD database are then queried for their cost-distance, the distance from the source(s) of dispersal along the cost-surface that represents the hypothesis being modelled (see Connolly and Lake, 2006; Douglas, 1994; Silva et al., 2015; Silva and Steele, 2014 for more on this approach) and, together with the site’s dating, used for regression analysis. (…)
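To make the modelling framework more concrete, here is a minimal sketch of the two steps described: computing a cost-distance from a source over a raster, then regressing site ages on that cost-distance. It is not the authors' implementation: Dijkstra's algorithm on a 4-connected grid is swapped in for the Fast Marching algorithm (both produce an accumulated-cost surface, but Fast Marching approximates a continuous front), and the grid, site locations and ages are invented.

```python
import heapq

def cost_distance(cost, sources):
    """Accumulated cost from the nearest source on a 4-connected grid.

    cost[r][c] is the cost of entering cell (r, c); sources is a list of
    (row, col) cells where the dispersal starts.
    """
    rows, cols = len(cost), len(cost[0])
    dist = [[float("inf")] * cols for _ in range(rows)]
    heap = []
    for r, c in sources:
        dist[r][c] = 0.0
        heapq.heappush(heap, (0.0, r, c))
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist[r][c]:
            continue  # stale queue entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + cost[nr][nc]
                if nd < dist[nr][nc]:
                    dist[nr][nc] = nd
                    heapq.heappush(heap, (nd, nr, nc))
    return dist

def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept of ys on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Toy cost surface: the two cells with cost 9 act as a barrier (e.g. mountains).
surface = [
    [1, 1, 1, 1],
    [1, 9, 9, 1],
    [1, 1, 1, 1],
]
dist = cost_distance(surface, sources=[(0, 0)])

# Hypothetical sites: (row, col, median age in years BP).
sites = [(0, 1, 5500), (0, 3, 4500), (2, 3, 3500)]
xs = [dist[r][c] for r, c, _ in sites]
ys = [age for _, _, age in sites]
slope, intercept = linear_fit(xs, ys)
print(f"years per cost unit: {slope:.0f}, predicted age at source: {intercept:.0f} BP")
```

Competing hypotheses differ only in the cost surface and the source cells, so the same regression can be rerun per hypothesis and the fits compared.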

Predicted arrival times of the non-shattering rice variety (japonica or the hybrid indica) across southern Asia based on best-fitting model H2. Included are also sites with known presence of non-shattering spikelet bases (see text).

Model and results

The ‘Inner Asia Mountain Corridor’ hypothesis (H2) therefore predicts japonica rice to arrive first in northwest India via a route that starts in the Yellow river valley, travels west via the well-known Hexi corridor, then just south of the Inner Asian Mountains and thence to India.

The results also show that the addition of the Inner Asia Mountain Corridor significantly improves the model’s fit to the data, particularly model H2 where rice is introduced to the Indian subcontinent exclusively via a trade route that circumvents the Tibetan plateau. This agrees with independent archaeological evidence that sees millets spread westwards along this corridor perhaps as early as 3000 BC (e.g. Boivin et al., 2012; Kohler-Schneider and Canepelle, 2009; Rassamakin, 1999) and certainly by 2500–2000 BC (Frachetti et al., 2010; Spengler 2015; Stevens et al., 2016), that is, in the same time frame as that predicted for rice in model H2. The arrival of western livestock (sheep, cattle) into central China, 2500–2000 BC (Fuller et al., 2011; Yuan and Campbell, 2009), and wheat, ca. 2000 BC (Betts et al., 2014; Flad et al., 2010; Stevens et al., 2016; Zhao, 2015), add evidence for the role of the Inner Asia Mountain Corridor for domesticated species dispersal in this period.


Through a combination of explicit spatial modelling and simulation, we have demonstrated the high likelihood that dispersal of rice via traders in Central Asia introduced japonica rice into South Asia. Only slightly less likely is a combination of introduction via two routes including a Central Asia to Pakistan/northwestern India route as well as introduction to northeastern India directly from China/Myanmar. However, there is a very low probability that current archaeological evidence for rice fits with a single introduction of japonica into India via the northeast. We have also simulated the minimum amount of archaeobotanical sampling from the Neolithic (to Bronze Age) period in the regions of northeastern India and Myanmar that will be necessary to strengthen support for the combined introduction (model H3) or a single Central Asian introduction (model H2).


Mitogenomes show continuity of Neolithic populations in Southern India

New paper (behind paywall) Neolithic phylogenetic continuity inferred from complete mitochondrial DNA sequences in a tribal population of Southern India, by Sylvester et al. Genetica (2018).

This paper studied complete mtDNA genomes of 113 unrelated individuals from the Melakudiya, a Dravidian-speaking tribal population from the Kodagu district of Karnataka, Southern India.

Some interesting excerpts (emphasis mine):

Autosomal genetic evidence indicates that most of the ethnolinguistic groups in India have descended from a mixture of two divergent ancestral populations: Ancestral North Indians (ANI) related to People of West Eurasia, the Caucasus, Central Asia and the Middle East, and Ancestral South Indians (ASI) distantly related to indigenous Andaman Islanders (Reich et al. 2009). It is presumed that proto-Dravidian language, most likely originated in Elam province of South Western Iran, and later spread eastwards with the movement of people to the Indus Valley and later the subcontinent India (McAlpin et al. 1975; Cavalli-Sforza et al. 1988; Renfrew 1996; Derenko et al. 2013). West Eurasian haplogroups are found across India and harbor many deep-branching lineages of Indian mtDNA pool, and most of the mtDNA lineages of Western Eurasian ancestry must have a recent entry date less than 10 Kya (Kivisild et al. 1999a). The frequency of these lineages is specifically found among the higher caste groups of India (Bamshad et al. 1998, 2001; Basu et al. 2003) and many caste groups are direct descendants of Indo-Aryan immigrants (Cordaux et al. 2004). These waves of various invasions and subsequent migrations resulted in major demographic expansions in the region, which added new languages and cultures to the already colonized populations of India. Although previous genetic studies of the maternal gene pools of Indians had revealed a genetic connection between Iranian populations and the Arabian Peninsula, likely the result of both ancient and recent gene flow (Metspalu et al. 2004; Terreros et al. 2011).


Haplogroup HV14

mtDNA haplogroup HV14 has prominence in North/Western Europe, West Eurasia, Iran, and South Caucasus to Central Asia (Malyarchuk et al. 2008; Schonberg et al. 2011; Derenko et al. 2013; De Fanti et al. 2015). Although Palanichamy identified haplogroup HV14a1 in three Indian samples (Palanichamy et al. 2015), it is restricted to limited unknown distribution. In the present study, by the addition of considerable sequences from the Melakudiya population, a unique novel subclade designated as HV14a1b was found with a high frequency (43%) allowed us to reveal the earliest diverging sequences in the HV14 tree prior to the emergence of HV14a1b in Melakudiya. (…) The coalescence age for haplogroup HV14 in this study is dated ~ 16.1 ± 4.2 kya and the founder age of haplogroup HV14 in Melakudiya tribe, which is represented by a novel clade HV14a1b is ~ 8.5 ± 5.6 kya

Maximum Parsimonious tree of complete mitogenomes constructed using 38 sequences from Melakudiya tribe and 11 previously published sequences belonging to haplogroup HV14 [Supplementary file Table S2] Suffixes @ indicate back mutation, a plus sign (+) an insertion. Control region mutations are underlined, and synonymous transitions are shown in normal font and non-synonymous mutations are shown in bold font. Coalescence ages (Kya) for complete coding region are shown in normal font and synonymous transitions are shown in Italics

Haplogroup U7a3a1a2

The coalescence age of haplogroup U7a3a1a2 dates to ~ 13.3 ± 4.0 kya. (…)

Although, haplogroup U7 has its origin from the Near East and is widespread from Europe to India, the phylogeny of Melakudiya tribe with subclade U7a3a1a2 clusters with populations of India (caste and tribe) and neighboring populations (Irwin et al. 2010; Ranaweera et al. 2014; Sahakyan et al. 2017), hint about the in-situ origin of the subclade in India from Indo-Aryan immigrants.

I am not a native English speaker, but this paper looks like it needs a revision by one.

Also – without comparison with ancient DNA – it is not enough to show coalescence age to prove an origin of haplogroup expansion in the Neolithic instead of later bottlenecks. However, since we are talking about mtDNA, it is likely that their analysis is mostly right.

Finally, it is one thing to prove that the origin of the Indus Valley Civilization lies (in part) in peoples from the Iranian plateau, and to show with ASI ancestry that they are probably the origin of the Proto-Dravidian expansion; it is quite another to prove an Elamo-Dravidian connection.

Since that group is not really accepted in linguistics, it is like talking about proving – through that Iran Neolithic ancestry – a Sumero-Dravidian, or a Hurro-Dravidian connection…


Shared ancestry of ancient Eurasian hepatitis B virus diversity linked to Bronze Age steppe


Ancient hepatitis B viruses from the Bronze Age to the Medieval period, by Mühlemann et al., Nature (2018) 557:418–423.

NOTE. You can read the PDF at Dalia Pokutta’s account.

Abstract (emphasis):

Hepatitis B virus (HBV) is a major cause of human hepatitis. There is considerable uncertainty about the timescale of its evolution and its association with humans. Here we present 12 full or partial ancient HBV genomes that are between approximately 0.8 and 4.5 thousand years old. The ancient sequences group either within or in a sister relationship with extant human or other ape HBV clades. Generally, the genome properties follow those of modern HBV. The root of the HBV tree is projected to between 8.6 and 20.9 thousand years ago, and we estimate a substitution rate of 8.04 × 10⁻⁶ to 1.51 × 10⁻⁵ nucleotide substitutions per site per year. In several cases, the geographical locations of the ancient genotypes do not match present-day distributions. Genotypes that today are typical of Africa and Asia, and a subgenotype from India, are shown to have an early Eurasian presence. The geographical and temporal patterns that we observe in ancient and modern HBV genotypes are compatible with well-documented human migrations during the Bronze and Iron Ages [1,2]. We provide evidence for the creation of HBV genotype A via recombination, and for a long-term association of modern HBV genotypes with humans, including the discovery of a human genotype that is now extinct. These data expose a complexity of HBV evolution that is not evident when considering modern sequences alone.

Geographical distribution of analysed samples and modern genotypes. a (featured image), Distribution of modern human HBV genotypes. Genotypes relevant to this Letter are shown in colour. Coloured shapes indicate the locations of the HBV-positive samples included for further analysis. b (above this text), Locations of analysed Bronze Age samples are shown as circles and Iron Age and later samples are shown as triangles. Coloured markers indicate HBV-positive samples. Ancient genotype A samples are found in regions in which genotype D predominates today, and HBV-DA27 is of subgenotype D5 which today is found almost exclusively in India.

Interesting excerpts:

We find genotype A in south-western Russia by 4.3 ka (in samples RISE386 and RISE387) in individuals belonging to the Sintashta culture, and in a Hungarian sample (DA195) from the Scythian culture. The western Scythians are related to the Bronze Age cultures of western steppe populations [2] and their shared ancestry suggests that the modern genotype A may descend from this ancient Eurasian diversity and not, as previously hypothesized, from African ancestors [29,30]. This is also consistent with the phylogeny (Fig. 2), as well as the fact that the three oldest ancient genotype A sequences (HBV-DA195, HBV-RISE386 and HBV-RISE387) lack the six-nucleotide insertion found in the youngest (HBV-DA119) and in all modern genotype A sequences. The ancestors of subgenotypes A1 and A3 could have been carried into Africa subsequently, via migration from western Eurasia [31].

The ancient HBV genotype D sequences were all found in Central Asia. HBV-DA27, found in Kazakhstan and dated to 1.6 ka, falls basal to the modern subgenotype D5 sequences that today are found in the Paharia tribe from eastern India [32]. DA27 and the Paharia people in India are linked by their East Asian ancestry [2,33].

Dated maximum clade credibility tree of HBV. A log-normal relaxed clock and coalescent exponential population prior were used. Grey horizontal bars indicate the 95% HPD interval of the age of the node. Larger numbers on the nodes indicate the median age and 95% HPD interval of the age (in parentheses) under a strict clock and Bayesian skyline tree prior. Clades of genotypes C (except clade C4), E, F, G and H are collapsed and shown as dots. The figure includes a possible tenth genotype, J, based on a single human isolate. Taxon names for ancient samples indicate era (BA, Bronze Age; IA, Iron Age or later), sample name, sample age in years, ISO 3166 three-letter abbreviation of country of sequence origin, and region of sequence origin. Taxon names for modern samples indicate human genotype or subgenotype or host species if non-human, GenBank accession number, sample age in years, ISO 3166 three-letter abbreviation of country of sequence origin, and region of sequence origin.

(…) Despite the age of the samples and the imperfect diagnostic test, our dataset contained a high proportion of HBV-positive individuals. The actual ancient prevalence during the Bronze Age and thereafter might have been higher, reaching or exceeding the prevalence typically found in contemporary indigenous populations [5]. This clearly establishes the potential of HBV as powerful proxy tool for research into human spread and interactions. The data from ancient genomes reveal aspects of complexity in HBV evolution that are not apparent when only modern sequences are considered. They show the existence of ancient HBV genotypes in locations incongruent with their present-day distribution, contradicting previously suggested geographical or temporal origins of genotypes or sub-genotypes; evidence for the creation of genotype A via recombination and the emergence of the genotype outside Africa; at least one now-extinct human genotype; ancient genotype-level localized diversity; and demonstrate that the viral substitution rate obtained from modern heterochronously sampled sequences is probably misleading. Together, these findings suggest that the difficulty in formulating a coherent theory for the origin and spread of HBV may be due to genetic evidence of an earlier evolutionary scenario being overwritten by relatively recent alterations, as has previously been suggested in the context of recombination [24].


Demographic history and genetic adaptation in the Himalayan region

Open access Demographic history and genetic adaptation in the Himalayan region inferred from genome-wide SNP genotypes of 49 populations, by Arciero et al. Mol. Biol. Evol (2018), accepted manuscript (msy094).

Abstract (emphasis mine):

We genotyped 738 individuals belonging to 49 populations from Nepal, Bhutan, North India or Tibet at over 500,000 SNPs, and analysed the genotypes in the context of available worldwide population data in order to investigate the demographic history of the region and the genetic adaptations to the harsh environment. The Himalayan populations resembled other South and East Asians, but in addition displayed their own specific ancestral component and showed strong population structure and genetic drift. We also found evidence for multiple admixture events involving Himalayan populations and South/East Asians between 200 and 2,000 years ago. In comparisons with available ancient genomes, the Himalayans, like other East and South Asian populations, showed similar genetic affinity to Eurasian hunter-gatherers (a 24,000-year-old Upper Palaeolithic Siberian), and the related Bronze Age Yamnaya. The high-altitude Himalayan populations all shared a specific ancestral component, suggesting that genetic adaptation to life at high altitude originated only once in this region and subsequently spread. Combining four approaches to identifying specific positively-selected loci, we confirmed that the strongest signals of high-altitude adaptation were located near the Endothelial PAS domain-containing protein 1 (EPAS1) and Egl-9 Family Hypoxia Inducible Factor 1 (EGLN1) loci, and discovered eight additional robust signals of high-altitude adaptation, five of which have strong biological functional links to such adaptation. In conclusion, the demographic history of Himalayan populations is complex, with strong local differentiation, reflecting both genetic and cultural factors; these populations also display evidence of multiple genetic adaptations to high-altitude environments.

Population samples analysed in this study. A. Map of South and East Asia, highlighting the four regions examined, and the colour assigned to each. B. Samples from the Tibetan Plateau. C. Samples from Nepal. D. Samples from Bhutan and India. The circle areas are proportional to the sample sizes. The three letter population codes in B-D are defined in supplementary table S1.

Relevant excerpts:

Genetic affinity to ancestral populations

We explored the genetic affinity between the Himalayan populations and five ancient genomes using f3-outgroup statistics. Himalayans show greater affinity to Eurasian hunter-gatherers (MA-1, a 24,000-year-old Upper Palaeolithic Siberian), and the related Bronze Age Yamnaya, than to European farmers (5,500-4,800 years ago; Fig. 5A) or to European hunter-gatherers (La Braña, 7,000 years ago; Fig. 5B), like other South and East Asian populations. We further explored the affinity of Himalayan populations by comparing them with the 45,000-year-old Upper Palaeolithic hunter-gatherer (Ust’-Ishim) and each of MA-1, La Braña, or Yamnaya. Himalayan individuals cluster together with other East Asian populations and show equal distance from Ust’-Ishim and the other ancient genomes, probably because Ust’-Ishim belongs to a much earlier period of time (supplementary fig. S15). We also explored genetic affinity between modern Himalayan populations and five ancient Himalayans (3,150–1,250 years old) from Nepal. The ancient individuals cluster together with modern Himalayan populations in a worldwide PCA (supplementary fig. S16), and the f3-outgroup statistics show modern high-altitude populations have the closest affinity with these ancient Himalayans, suggesting that these ancient individuals could represent a proxy for the first populations residing in the region (supplementary fig. S17 and supplementary table S4). Finally, we explored the genetic affinity of Himalayan samples with the archaic genomes of Denisovans and Neanderthals (Skoglund and Jakobsson 2011), and found that they show a similar sharing pattern with Denisovans and Neanderthals to the other South and East Asian populations. Individuals belonging to four Nepalese, one Cambodian, and three Chinese populations show the highest Denisovan sharing (after populations from Australia and Papua New Guinea) but these values are not significantly greater than other South and East Asian populations (supplementary figs. S18 and S19).
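For readers unfamiliar with the f3-outgroup statistic used in this excerpt, a minimal sketch follows. It computes f3(O; A, B) as the average over SNPs of (o - a)(o - b), where o, a and b are allele frequencies in the outgroup and the two test populations; a larger value means A and B share more drift since splitting from the outgroup. This is the plain estimator without the finite-sample correction or block-jackknife standard errors that real analyses apply, and all frequencies below are invented.

```python
def f3_outgroup(outgroup, pop_a, pop_b):
    """Naive f3(O; A, B): mean over SNPs of (o - a) * (o - b).

    Each argument is a list of allele frequencies, one per SNP, in the
    same SNP order. No sample-size correction is applied.
    """
    if not (len(outgroup) == len(pop_a) == len(pop_b)):
        raise ValueError("frequency lists must have the same length")
    products = [(o - a) * (o - b) for o, a, b in zip(outgroup, pop_a, pop_b)]
    return sum(products) / len(products)

# Hypothetical allele frequencies at four SNPs.
outgroup = [0.10, 0.50, 0.90, 0.30]
test_pop = [0.40, 0.20, 0.60, 0.70]
close_ancient = [0.45, 0.25, 0.55, 0.65]  # drifts together with test_pop
far_ancient = [0.15, 0.45, 0.85, 0.35]    # stays near the outgroup

f3_close = f3_outgroup(outgroup, test_pop, close_ancient)
f3_far = f3_outgroup(outgroup, test_pop, far_ancient)
print(f"f3 with close ancient genome: {f3_close:.5f}")
print(f"f3 with far ancient genome:   {f3_far:.5f}")
```

Ranking ancient genomes by this value for a given modern population is the kind of comparison summarised in the paper's Fig. 5.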

Genetic structure of the Himalayan region populations from analyses using unlinked SNPs. A. PCA of the Himalayan and HGDP-CEPH populations. Each dot represents a sample, coded by region as indicated. The Himalayan region samples lie between the HGDP-CEPH East Asian and South Asian samples on the right-hand side of the plot. B. PCA of the Himalayan populations alone. Each dot represents a sample, coded by country or region as indicated. Most samples lie on an arc between Bhutanese and Nepalese samples; Toto (India) are seen as extreme outlier in the bottom left corner, while Dhimal (Nepal) and Bodo (India) also form outliers.

NOTE. The variance explained in the PCA graphics seems to be too high. This happened recently also with the Damgaard et al. (2018) papers (see here the comment by Iosif Lazaridis).

Similarities and differences between high-altitude Himalayan

The most striking example is provided by the Toto from North India, an isolated tribal group with the lowest genetic diversity of the Himalayan populations examined here, indicated by the smallest long-term Ne (supplementary fig. S5), and a reported census size of 321 in 1951 (Mitra 1951), although their numbers have subsequently increased. Despite this extreme substructure, shared common ancestry among the high-altitude populations (Fig. 2C and Fig. 3) can be detected, and the Nepalese in general are distinguished from the Bhutanese and Tibetans (Fig. 2C) and they also cluster separately (Fig. 3). In a worldwide context, they share an ancestral component with South Asians (supplementary fig. S2). On the other hand, the Tibetans do not show detectable population substructure, probably due to a much more recent split in comparison with the other populations (Fig. 2C and supplementary fig. S6). The genetic similarity between the high-altitude populations, including Tibetans, Sherpa and Bhutanese, is also supported by their clustering together on the phylogenetic tree, the PCA generated from the co-ancestry matrix generated by fineSTRUCTURE (supplementary fig. S10 and S11), the lack of statistical significance for most of the D-statistics tests (Yoruba, Han; high-altitude Himalayan 1, high-altitude Himalayan 2), and the absence of correlation between the increased genetic affinity to lowland East Asians and the spatial location of the Himalayan populations (supplementary figs. S12 and S13). Together, these results suggest the presence of a single ancestral population carrying advantageous variants for high-altitude adaptation that separated from lowland East Asians, and then spread and diverged into different populations across the Himalayan region. (…)

Recent admixture events

Genetic structure of the Himalayan region populations from analyses using unlinked SNPs. C. ADMIXTURE (K values of 2 to 6, as indicated) analysis of the Himalayan samples. Note that most increases in the value of K result in single population being distinguished. Population codes in C are defined in supplementary table S1.

Himalayan populations show signatures of recent admixture events, mainly with South and East Asian populations as well as within the Himalayan region itself. Newar and Lhasa show the oldest signature of admixture, dated to between 2,000 and 1,000 years ago. Majhi and Dhimal display signatures of admixture within the last 1,000 years. Chetri and Bodo show the most recent admixture events, between 500 and 200 years ago (Fig. 4, supplementary tables S3). The comparison between the genetic tree and the linguistic association of each Himalayan population highlights the agreement between genetic and linguistic sub-divisions, in particular in the Bhutanese and Tibetan populations. Nepalese populations show more variability, with genetic sub-clusters of populations belonging to different linguistic affiliations (Fig. 3B). Modern high-altitude Himalayans show genetic affinity with ancient genomes from the same region (supplementary fig. S17), providing additional support for the idea of an ancient high-altitude population that spread across the Himalayan region and subsequently diverged into several of the present-day populations. Furthermore, Himalayan populations show a similar pattern of allele sharing with Denisovans as other South-East Asian populations (supplementary fig. S18 and S19). Overall, geographical isolation, genetic drift, admixture with neighbouring populations and linguistic subdivision played important roles in shaping the genetic variability we see in the Himalayan region today.


Yet another Bayesian phylogenetic tree – now for Dravidian


Open access A Bayesian phylogenetic study of the Dravidian language family, by Kolipakam et al. (including Bouckaert and Gray), Royal Society Open Science (2018).

Abstract (emphasis mine):

The Dravidian language family consists of about 80 varieties (Hammarström H. 2016 Glottolog 2.7) spoken by 220 million people across southern and central India and surrounding countries (Steever SB. 1998 In The Dravidian languages (ed. SB Steever), pp. 1–39: 1). Neither the geographical origin of the Dravidian language homeland nor its exact dispersal through time are known. The history of these languages is crucial for understanding prehistory in Eurasia, because despite their current restricted range, these languages played a significant role in influencing other language groups including Indo-Aryan (Indo-European) and Munda (Austroasiatic) speakers. Here, we report the results of a Bayesian phylogenetic analysis of cognate-coded lexical data, elicited first hand from native speakers, to investigate the subgrouping of the Dravidian language family, and provide dates for the major points of diversification. Our results indicate that the Dravidian language family is approximately 4500 years old, a finding that corresponds well with earlier linguistic and archaeological studies. The main branches of the Dravidian language family (North, Central, South I, South II) are recovered, although the placement of languages within these main branches diverges from previous classifications. We find considerable uncertainty with regard to the relationships between the main branches.

MCC tree summary of the posterior probability distribution of the tree sample generated by the analysis with the relaxed covarion model with relative mutation rates estimated. Node bars give the 95% highest posterior density (HPD) limits of the node heights. Numbers over branches give the posterior probability of the node to the right (range 0–1). Colour coding of the branches gives subgroup affiliation: red, South I; blue, Central; purple, North; yellow, South II.
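The node bars in that caption are highest posterior density intervals: the narrowest range containing a given share of the posterior samples for a node height. A minimal sketch of how such an interval can be computed from a list of samples follows. The function name and the ages are hypothetical (not values from the paper), and this is a generic sample-based estimate, not necessarily how the authors' software computes it.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Return the narrowest interval containing at least `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    # Number of consecutive sorted samples the interval has to cover.
    k = max(1, math.ceil(mass * n))
    # Slide a window of k samples over the sorted list and keep the narrowest.
    best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]


# Hypothetical posterior samples of a root age in years before present.
ages = [4100, 4300, 4400, 4500, 4500, 4600, 4700, 4900, 5200, 6500]

# With only ten toy samples an 80% interval is used so that the long
# right tail (6500) is visibly excluded.
low, high = hpd_interval(ages, mass=0.8)
print(low, high)
```

Unlike a symmetric percentile interval, the HPD shifts toward the dense part of a skewed posterior, which is why wide node bars on such trees are often lopsided around the plotted node height.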

With every new paper using these revamped pseudoscientific linguistic methods popular in the early 2000s, including glottochronology, Swadesh lists, phylogenetic trees, mutation rates, etc., I feel a little more like Sergeant Murtaugh…

Featured image, from the article: “Map of the Dravidian languages in India, Pakistan, Afghanistan and Nepal adapted from Ethnologue [2]. Each polygon represents a language variety (language or dialect). Colours correspond to subgroups (see text). The three large South I languages, Kannada, Tamil and Malayalam are light red, while the smaller South I languages are bright red. Languages present in the dataset used in this paper are indicated by name, with languages with long (950 + years) literatures in bold.”


Archaeological and anthropological studies on the Harappan cemetery of Rakhigarhi, India


New open access paper Archaeological and anthropological studies on the Harappan cemetery of Rakhigarhi, India, by Shinde, Kim, Woo, et al., PLOS ONE (2018) 13(2): e0192299.


An insufficient number of archaeological surveys has been carried out to date on Harappan Civilization cemeteries. One case in point is the necropolis at Rakhigarhi site (Haryana, India), one of the largest cities of the Harappan Civilization, where most burials within the cemetery remained uninvestigated. Over the course of the past three seasons (2013 to 2016), we therefore conducted excavations in an attempt to remedy this data shortfall. In brief, we found different kinds of graves co-existing within the Rakhigarhi cemetery in varying proportions. Primary interment was most common, followed by the use of secondary, symbolic, and unused (empty) graves. Within the first category, the atypical burials appear to have been elaborately prepared. Prone-positioned interments also attracted our attention. Since those individuals are not likely to have been social deviants, it is necessary to reconsider our pre-conceptions about such prone-position burials in archaeology, at least in the context of the Harappan Civilization. The data presented in this report, albeit insufficient to provide a complete understanding of Harappan Civilization cemeteries, nevertheless does present new and significant information on the mortuary practices and anthropological features at that time. Indeed, the range of different kinds of burials at the Rakhigarhi cemetery do appear indicative of the differences in mortuary rituals seen within Harappan societies, therefore providing a vivid glimpse of how these people respected their dead.

Harappan sites where skeletons were discovered (indicated by dots). Red dot: Rakhigarhi site; dashed dot: skeletons from non-cemetery area; black dots: cemetery sites other than Rakhigarhi.

This is a must-read for anyone who wants to analyze the upcoming Rakhigarhi samples in detail. Those samples should tell us more about the Neolithic population of the Indian subcontinent before the migration of Indo-Iranian peoples.