The genetic makings of South Asia – IVC as Proto-Dravidian


Review (behind paywall) The genetic makings of South Asia, by Metspalu, Mondal, and Chaubey, Current Opinion in Genetics & Development (2018) 53:128-133.

Interesting excerpts (emphasis mine):

(…) the spread of agriculture in Europe was a result of the demic diffusion of early Anatolian farmers, it was discovered that the spread of agriculture to South Asia was mediated by a genetically completely different farmer population in the Zagros mountains in contemporary Iran (IF). The ANI-ASI cline itself was interpreted as a mixture of three components genetically related to Iranian agriculturalists, Onge and Early and Middle Bronze Age Steppe populations (Steppe_EMBA).

The first ever autosomal aDNA from South Asia comes from Northern Pakistan (Swat Valley, early Iron Age). This study presented altogether 362 aDNA samples from the broad South and Central Asia and contributes substantially to our understanding of the evolutionary past of South and Central Asia. The study redefines the three genetic strata that form the basis of the Indian Cline. The Indus Periphery (IP) component is composed of (varying proportions of): first, IF, second, Ancient Ancestral South Asians (AASI), which represents an ancient branch of human genetic variation in Asia arising from a population split contemporaneous with the splits of East Asian, Onge and Australian Aboriginal ancestors and third, West_Siberian Hunter gatherers (WS_HG).

The authors argue that IP could have formed the genetic base of the Indus Valley Civilization (IVC). Upon the collapse of the IVC, IP contributes to the formation of both ASI and ANI. ASI is formed as IP admixes further with AASI. ANI in turn forms when IP admixes with the incoming Middle and Late Bronze Age Steppe (Steppe_MLBA) component (rather than the Steppe_EMBA groups suggested earlier).

A sketch of the peopling history of South Asia. Depicting the full complexity of available reconstructions is not attempted. Placing of population labels does not indicate precise geographic location or range of the population in question. Rather we aim to highlight the essentials of the recent advancements in the field. We divide the scenario into three time horizons: Panels (a) before 10 000 BCE (pre agriculture era.); (b) 10 000 BCE to 3000 BCE (agriculture era) and (c) 3000 BCE to prehistoric era/modern era. (iron age).

Dating of the arrival of the Austro-Asiatic speakers in South Asia-based on Y chromosome haplogroup O2a1-M95 expansion estimates yielded dates between 3000 and 2000 BCE [30]. However, admixture LD decay-based approach on genome-wide data suggests the admixture between South Asian and incoming Austro-Asiatic speakers occurred slightly later between 1800 and 0 BCE (Tätte et al. submitted). It is interesting that while the mtDNA variants of the Mundas are completely South Asian, the Y chromosome variation is dominated at >60% by haplogroup O2a which is phylogeographically nested in East Asian-specific paternal lineages.
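To make the LD decay-based dating mentioned in this excerpt more concrete, here is a minimal sketch of the general idea, not the method of the cited (unpublished) study. It assumes that admixture LD decays as exp(-n·d), where n is the number of generations since admixture and d is the genetic distance in Morgans, and it fits n with a simple log-linear regression. The function name, the synthetic curve, and the 29-year generation time are all my own assumptions for illustration.

```python
import numpy as np

def admixture_date_from_ld_decay(distance_morgans, weighted_ld, generation_years=29.0):
    """Estimate generations since admixture from an LD decay curve.

    Assumes a single pulse of admixture and a curve of the form
    A * exp(-n * d) with no constant offset (a simplification).
    generation_years is an assumed generation time, not a value from the paper.
    """
    d = np.asarray(distance_morgans, dtype=float)
    y = np.asarray(weighted_ld, dtype=float)
    # log(y) = log(A) - n * d, so the fitted slope is -n.
    slope, _intercept = np.polyfit(d, np.log(y), 1)
    generations = -slope
    return generations, generations * generation_years

# Synthetic, noise-free curve for an admixture event 100 generations ago.
d = np.linspace(0.005, 0.20, 40)      # genetic distance in Morgans
ld = 0.02 * np.exp(-100 * d)          # hypothetical amplitude of 0.02
generations, years = admixture_date_from_ld_decay(d, ld)
print(generations, years)
```

Real data are noisy and usually need a non-linear fit with an offset term plus a jackknife for confidence intervals, which is why published estimates come as ranges rather than single dates.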

In India, the speakers of Tibeto-Burman (TB) languages live in the Seven Sisters States in Northeast India and in the very north of the country. Genetically they show a clear East Asian origin and around 20% of subsequent admixture with South Asians within the last 1000 years. The genetic flavour of East Asia in TB is different from that in Munda speakers as the best surrogates for the East Asian admixing component are contemporary Han Chinese.

I found the simplistic migration maps especially interesting to illustrate ancient population movements. The emergence of EHG is supposed to involve a WHG:ANE cline, though, and this isn’t clear from the map. Also, there is new information on what may be at the origin of WHG and Anatolian hunter-gatherers.

From Reich’s recent session on South Asia at ISBA 8:

– Tale of three clines, with clear indication that “Indus Periphery” samples drawn from an already-cosmopolitan and heterogeneous world of variable ASI & Iranian ancestry. (I know how some people like to pore over these pictures – so note red dots = just dummy data for illustration.)
– Some more certainty about primary window of steppe ancestry injection into S. Asia: 2000-1500 BC
Alexander M. Kim

Featured image: map of South Asian languages from


Reconstruction of Y-DNA phylogeny helps also reconstruct Tibeto-Burman expansion


New paper (behind paywall) Reconstruction of Y-chromosome phylogeny reveals two neolithic expansions of Tibeto-Burman populations by Wang et al. Mol Genet Genomics (2018).

Interesting excerpts:

Archeological studies suggest that a subgroup of ancient populations of the Miaodigou culture (~ 6300–5500 BP) moved westward to the upper stream region of the Yellow River and created the Majiayao culture (~ 5400–4900 BP) (Liu et al. 2010), which was proposed to be the remains of direct ancestors of Tibeto-Burman populations (Sagart 2008). On the other hand, Han populations, the other major descendant group of the Yang-Shao culture (~ 7000–5500 BP), are composed of many other sub-lineages of Oα-F5 and extremely low frequencies of D-M174 (Additional files 1: Figure S1; Additional files 2: Table S1). Therefore, we propose that Oα-F5 may be one of the dominant paternal lineages in ancient populations of Yang-Shao culture and its successors.

In this study, we demonstrated that both sub-lineages of D-M174 and Oα-F5 are founding paternal lineages of modern Tibeto-Burman populations. The genetic patterns suggested that the ancestor group of modern Tibeto-Burman populations may be an admixture of two distinct ancient populations. One of them may be hunter–gatherer populations who survived on the plateau since the Paleolithic Age, represented by varied sub-lineages of D-M174. The other one was comprised of farmers who migrated from the middle Yellow River basin, represented by sub-lineages of Oα-F5. In general, the genetic evidence in this study supports the conclusion that the appearance of the ancestor group of Tibeto-Burman populations was triggered by the Neolithic expansion from the upper-middle Yellow River basin and admixture with local populations on the Tibetan Plateau (Su et al. 2000).

Simplified phylogenetic tree showing sample locations. The size of the circle for each sampling location corresponds to the number of samples

Two neolithic expansion origins of Tibeto‑Burman populations

We also observed significant differences in the paternal gene pool of different subgroups of Tibeto-Burman populations. Haplogroup D-M174 contributed ~ 54% in a sampling of 2354 Tibetan males throughout the Tibetan Plateau (Qi et al. 2013). Previous studies have also found high frequencies of D-M174 in other populations on the Tibetan Plateau (Shi et al. 2008), including Sherpa (Lu et al. 2016) and Qiang (Wang et al. 2014). In contrast, haplogroup D-M174 is rare or absent from Tibeto-Burman populations from Northeast India and Burma (Shi et al. 2008). In populations of the Ngwi-Burmese language subgroup, the average frequencies of haplogroup D-M174 are ~ 5% (Dong et al. 2004; Peng et al. 2014). Furthermore, we found that lineage Oα1c1b-CTS5308 is mainly found in Tibeto-Burman populations from the Tibetan Plateau. In contrast, lineage Oα1c1a-Z25929 was found in Tibeto-Burman populations from Northeast India, Burma, and the Yunnan and Hunan provinces of China (Additional files 1: Figure S1; Additional files 2: Table S1). In general, enrichment of lineage Oα1c1b-CTS5308 and high frequencies of D-M174 can be found in most Tibeto-Burman populations on the Tibetan Plateau and adjacent regions, whereas Tibeto-Burman populations from other regions tend to have lineage Oα1c1a-Z25929 and a little to no percentage of D-M174.

The inconsistent pattern we observed in the paternal gene pool of modern Tibeto-Burman populations suggested that there may be two distinct ancestor groups (Fig. 3). The proposed migration routes shown in Fig. 3 are somewhat different from those proposed by Su et al. (2000). According to our age estimation, most of the D1a2a-P47 samples belong to sub-lineage PH116, a young lineage that emerged ~ 2500 years ago (95% CI 1915–3188 years). On the other hand, continuous differentiation can be observed on a phylogenetic tree of lineages D1a1a1a1-PH4979 and D1a1a1a2-Z31591 since 6000 years ago. Therefore, we proposed that a group of ancient populations may have moved to the upper basin of the Yellow River and admixed intensively with local populations with high frequencies of haplogroup D-M174, including its sub-lineage D1a2a-P47 (Fig. 3). This ancestor group eventually gave birth to modern Tibeto-Burman populations on the Tibetan Plateau and adjacent regions. The other ancestor group moved toward the southwest and finally reached South East Asia (Burma and other locations) and the northeastern part of India (Fig. 3). This ancestor group may have had no or a minor admixture of D-M174 in their paternal gene pool.

Two proposed ancestor groups and migration routes for Tibeto-Burman populations

Long‑term admixture before expansion to a high‑altitude region

It is interesting to investigate the time gap between the appearance of Neolithic cultures in the northeastern part of the Tibetan Plateau and the final phase of human expansion across the Tibetan Plateau. The Majiayao culture (~ 5400–4900 BP) is the earliest Neolithic culture in the northeastern part of the Tibetan Plateau (Liu et al. 2010). However, previous archeological study has suggested that the final phase of diffusion into the high-altitude area of the Tibetan Plateau occurred at approximately 3.6 kya (Chen et al. 2015). Our genetic evidence in this study is consistent with this scenario based on archeological evidence. Based on Y-chromosome analysis in this study, many unique lineages of Tibeto-Burman populations emerged between 6000 years ago and 2500 years ago (Additional files 3: Table S2). The most recent common age of D1a2-PH116, a sub-lineage that spread throughout the Tibetan Plateau, is only 2500 years ago.

We propose that there may be two important factors for the observed age gap. First, living in a high-altitude environment may require some crucial physical characteristics that were lacking from Neolithic immigrants from the middle Yellow River Basin. Intense genetic admixture with local people who had survived on the Tibetan Plateau since the Paleolithic Age may have actually guaranteed the expansion of humans across the Tibetan Plateau. Therefore, a long period of admixture, lasting from 5.4 to 3.6 kya, may be necessary for the appearance of a population with beneficial genetic variants that was genetically adapted to the high-altitude environment. Second, technological innovations, such as the domestication of wheat and highland barley (Chen et al. 2015), establishment of yak pastoralism (Rhode et al. 2007), and introduction of other culture elements in the Bronze Age (Ma et al. 2016), are also important factors that facilitated permanent settlements with large population sizes in the high-altitude area of the Tibetan Plateau.


Indo-European and Central Asian admixture in Indian population, dependent on ethnolinguistic and geodemographic divisions


Preprint paper at BioRxiv, Dissecting Population Substructure in India via Correlation Optimization of Genetics and Geodemographics, by Bose et al. (2017), a mixed group from Purdue University and IBM TJ Watson Research Center. A rather simple paper, which is nevertheless interesting in its approach to the known multiple Indian demographic divisions, and in its short reported methods and results.


India represents an intricate tapestry of population substructure shaped by geography, language, culture and social stratification operating in concert. To date, no study has attempted to model and evaluate how these evolutionary forces have interacted to shape the patterns of genetic diversity within India. Geography has been shown to closely correlate with genetic structure in other parts of the world. However, the strict endogamy imposed by the Indian caste system, and the large number of spoken languages add further levels of complexity. We merged all publicly available data from the Indian subcontinent into a data set of 835 individuals across 48,373 SNPs from 84 well-defined groups. Bringing together geography, sociolinguistics and genetics, we developed COGG (Correlation Optimization of Genetics and Geodemographics) in order to build a model that optimally explains the observed population genetic sub-structure. We find that shared language rather than geography or social structure has been the most powerful force in creating paths of gene flow within India. Further investigating the origins of Indian substructure, we create population genetic networks across Eurasia. We observe two major corridors towards mainland India; one through the Northwestern and another through the Northeastern frontier with the Uygur population acting as a bridge across the two routes. Importantly, network, ADMIXTURE analysis and f3 statistics support a far northern path connecting Europe to Siberia and gene flow from Siberia and Mongolia towards Central Asia and India.
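The abstract does not spell out how COGG is computed, so the following is only my reading of what "correlation optimization" could look like in its simplest form, not the authors' implementation. It relies on a standard fact: among all linear combinations of a set of features, the ordinary least-squares fit has the highest in-sample Pearson correlation with the target. Here the target stands in for a genetic principal component and the features for numeric geodemographic variables; categorical variables such as language or caste would need to be one-hot encoded first. All names and the synthetic data are hypothetical.

```python
import numpy as np

def max_correlation_weights(features, target):
    """Weights of the linear feature combination best correlated with target.

    Illustrative stand-in for a correlation-optimization step; uses
    least squares on mean-centered data.
    """
    X = np.asarray(features, dtype=float)
    y = np.asarray(target, dtype=float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    weights, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    r = np.corrcoef(Xc @ weights, yc)[0, 1]
    return weights, float(r)

# Synthetic example: 200 individuals, 3 numeric covariates, 1 genetic PC.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 3))
target = 2.0 * features[:, 0] - features[:, 1] + rng.normal(scale=0.5, size=200)

weights, r = max_correlation_weights(features, target)
# Correlation of each covariate on its own, for comparison.
single = [abs(np.corrcoef(features[:, j], target)[0, 1]) for j in range(3)]
print(r, max(single))
```

Comparing the combined correlation with the single-covariate values is the kind of contrast that lets one say that one factor (language, in the paper's conclusion) explains more structure than another.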

Among the most interesting results (emphasis mine):

Our meta-analysis of the ADMIXTURE output shows that the IE and DR populations across castes shared very high ancestry, indicating the autochthonous origin of the caste system in India (Figure 2). f3 statistics show that most of the castes and tribes in India are admixed, with contributions from other castes and/or tribes, across languages affiliations (Supplementary Table 4 and Supplementary Note). The geographically isolated Tibeto-Burman tribes and the Dravidian speaking tribes appear to be the most isolated in India. Linear Discriminant Analysis on the normalized data set clearly supports genetic stratification by castes and languages in the Indian sub-continent.


Our meta-analysis of the ADMIXTURE plot in Figure 4A quantifies the ADMIXTURE results (darker colors indicate higher pairwise shared ancestry). Indian populations show a greater proportion of shared ancestry with the so-called Indian Northwestern Frontier populations, namely the tribal populations spanning Afghanistan and Pakistan. Central Asian populations share higher degrees of ancestry with IE and DR Forward castes. Uygurs share high degrees of ancestry with Indian populations.


f3 statistics (all negative Z-scores are shown) indicate Chinese and Siberian ancestry contributing to the Tibeto-Burman tribal speakers. On the other hand, the Mongols and the Europeans have contributed significant amounts of ancestry to the Indo-European and Tibeto-Burman forward castes. F3 statistics also show that the Central Asians are an admixed population with signs of admixture from Caucasus and other parts of Europe.
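For readers unfamiliar with the f3 test referred to in these excerpts, here is a simplified sketch of the statistic, not the pipeline used in the paper. f3(C; A, B) is the average over SNPs of (c - a)(c - b), where a, b and c are allele frequencies in the three populations; a significantly negative value suggests that C is admixed between populations related to A and B. The version below omits the finite-sample correction used by dedicated tools, and its jackknife uses equal-sized contiguous SNP blocks rather than blocks defined by genomic position. Function names and data are hypothetical.

```python
import numpy as np

def f3(c, a, b):
    """Naive f3(C; A, B): mean over SNPs of (c - a) * (c - b)."""
    c, a, b = (np.asarray(x, dtype=float) for x in (c, a, b))
    return float(np.mean((c - a) * (c - b)))

def f3_zscore(c, a, b, n_blocks=10):
    """Z-score from a delete-one-block jackknife over contiguous SNP blocks."""
    c, a, b = (np.asarray(x, dtype=float) for x in (c, a, b))
    per_snp = (c - a) * (c - b)
    total, n = per_snp.sum(), per_snp.size
    # Estimate recomputed with each block left out in turn.
    leave_one_out = np.array(
        [(total - blk.sum()) / (n - blk.size) for blk in np.array_split(per_snp, n_blocks)]
    )
    spread = np.sum((leave_one_out - leave_one_out.mean()) ** 2)
    std_err = np.sqrt((n_blocks - 1) / n_blocks * spread)
    return float(per_snp.mean() / std_err)

# Synthetic allele frequencies: C is an exact 50:50 mixture of A and B.
rng = np.random.default_rng(1)
a = rng.uniform(0.05, 0.95, size=1000)
b = rng.uniform(0.05, 0.95, size=1000)
c = 0.5 * a + 0.5 * b

print(f3(c, a, b), f3_zscore(c, a, b))  # admixed target: negative
print(f3(a, c, b))                      # source as target: positive
```

The sign flip between the two calls is the point: only the mixed population gives a negative f3, which is why the excerpts report negative Z-scores as evidence of admixture.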

Among the results for proportions of shared ancestry between Indians and Eurasians (FIG. 4), there is an obvious influence of European admixture (Caucasus, and Southern, Central, and Northern EU), potentially from the Yamna-Corded Ware expansion, in IE_ForwardCaste, which is lessened in IE_BackwardCaste and also in IE_Tribal, while DR_ForwardCaste shows again more admixture than IE_Tribal, but diminishing with lower castes and quite low in DR_Tribal.

Ancestry from Central Asia is strong with a similar pattern, which hints at the influence of Sintashta, Andronovo, and BMAC in the expansion of the Steppe component, even more than a later Turkic component.

On the other hand, the influence from Turkey is difficult to assess, given the complex genetic history of Anatolia, but the map contained in Fig. 6 doesn’t feel right, not only from a genetic viewpoint, but also from linguistic and archaeological points of view. It is typical of maps built from admixture analyses alone, which go wrong because they do not take anthropological theories into account.

Quite interesting, then, is the effect of admixture on these different ethnolinguistic groups, Indo-European and Dravidian, which points to an initially greater expansion of Indo-European speakers and a later resurgence of Dravidian languages.

Featured image contains simplified origin and data of samples studied, from the article.