The genetic makings of South Asia – IVC as Proto-Dravidian


Review (behind paywall) The genetic makings of South Asia, by Metspalu, Mondal, and Chaubey, Current Opinion in Genetics & Development (2018) 53:128-133.

Interesting excerpts (emphasis mine):

(…) the spread of agriculture in Europe was a result of the demic diffusion of early Anatolian farmers, it was discovered that the spread of agriculture to South Asia was mediated by a genetically completely different farmer population in the Zagros mountains in contemporary Iran (IF). The ANI-ASI cline itself was interpreted as a mixture of three components genetically related to Iranian agriculturalists, Onge and Early and Middle Bronze Age Steppe populations (Steppe_EMBA).

The first ever autosomal aDNA from South Asia comes from Northern Pakistan (Swat Valley, early Iron Age). This study presented altogether 362 aDNA samples from the broad South and Central Asia and contributes substantially to our understanding of the evolutionary past of South and Central Asia. The study redefines the three genetic strata that form the basis of the Indian Cline. The Indus Periphery (IP) component is composed of (varying proportions of): first, IF, second, Ancient Ancestral South Asians (AASI), which represents an ancient branch of human genetic variation in Asia arising from a population split contemporaneous with the splits of East Asian, Onge and Australian Aboriginal ancestors and third, West_Siberian Hunter gatherers (WS_HG).

The authors argue that IP could have formed the genetic base of the Indus Valley Civilization (IVC). Upon the collapse of the IVC, IP contributes to the formation of both ASI and ANI. ASI is formed as IP admixes further with AASI. ANI in turn forms when IP admixes with the incoming Middle and Late Bronze Age Steppe (Steppe_MLBA) component (rather than the Steppe_EMBA groups suggested earlier).

A sketch of the peopling history of South Asia. Depicting the full complexity of available reconstructions is not attempted. The placement of population labels does not indicate the precise geographic location or range of the population in question. Rather, we aim to highlight the essentials of the recent advancements in the field. We divide the scenario into three time horizons: (a) before 10 000 BCE (pre-agricultural era); (b) 10 000 BCE to 3000 BCE (agricultural era); and (c) 3000 BCE to the prehistoric/modern era (Iron Age).

Dating of the arrival of the Austro-Asiatic speakers in South Asia, based on Y chromosome haplogroup O2a1-M95 expansion estimates, yielded dates between 3000 and 2000 BCE [30]. However, an admixture LD decay-based approach on genome-wide data suggests that the admixture between South Asians and incoming Austro-Asiatic speakers occurred slightly later, between 1800 and 0 BCE (Tätte et al. submitted). It is interesting that while the mtDNA variants of the Mundas are completely South Asian, their Y chromosome variation is dominated (>60%) by haplogroup O2a, which is phylogeographically nested in East Asian-specific paternal lineages.
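Admixture LD decay dating, mentioned in the excerpt above, rests on the idea that linkage disequilibrium created by a single admixture pulse decays exponentially with genetic distance, at a rate set by the number of generations since the admixture. Tools such as ALDER fit a curve of this shape to weighted LD computed from real genotypes. The following is only a minimal sketch, assuming a single pulse, no background LD, noise-free synthetic values and 28 years per generation (all assumptions for illustration; this is not the method or data of Tätte et al.):

```python
import math
from typing import List


def expected_admixture_ld(d_morgans: float, generations: float, amplitude: float = 1e-3) -> float:
    """Idealized admixture LD at genetic distance d (in Morgans) for a single
    admixture pulse `generations` ago; the background LD term is assumed to be zero."""
    return amplitude * math.exp(-generations * d_morgans)


def estimate_generations(distances: List[float], ld_values: List[float]) -> float:
    """Fit ln(LD) = ln(A) - n * d by ordinary least squares and return n."""
    ys = [math.log(v) for v in ld_values]
    mean_x = sum(distances) / len(distances)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances, ys))
    sxx = sum((x - mean_x) ** 2 for x in distances)
    return -sxy / sxx


def generations_to_years(generations: float, years_per_generation: float = 28.0) -> float:
    """Convert generations into years before the sampled individuals lived
    (28 years per generation is an assumed value)."""
    return generations * years_per_generation


# Hypothetical, noise-free LD curve for a pulse 120 generations ago,
# evaluated between 0.5 and 20 cM (converted to Morgans).
distances = [cm / 100 for cm in (0.5, 1, 2, 5, 10, 15, 20)]
ld = [expected_admixture_ld(d, generations=120) for d in distances]
n_hat = estimate_generations(distances, ld)
print(round(n_hat), generations_to_years(round(n_hat)))
```

With real data the curve is noisy, has a non-zero background term and is fitted by non-linear least squares, so the log-linear fit here is a simplification chosen to keep the example dependency-free.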

In India, the speakers of Tibeto-Burman (TB) languages live in the Seven Sister States of Northeast India and in the very north of the country. Genetically they show a clear East Asian origin and around 20% of subsequent admixture with South Asians within the last 1000 years. The genetic flavour of East Asia in TB is different from that in Munda speakers, as the best surrogates for the East Asian admixing component are contemporary Han Chinese.

I found the simple migration maps especially useful for illustrating ancient population movements. The emergence of EHG is supposed to involve a WHG:ANE cline, though, and this isn't clear from the map. Also, there is new information on what may lie at the origin of WHG and Anatolian hunter-gatherers.

From Reich’s recent session on South Asia at ISBA 8:

– Tale of three clines, with clear indication that “Indus Periphery” samples drawn from an already-cosmopolitan and heterogeneous world of variable ASI & Iranian ancestry. (I know how some people like to pore over these pictures – so note red dots = just dummy data for illustration.)
– Some more certainty about primary window of steppe ancestry injection into S. Asia: 2000-1500 BC
Alexander M. Kim

Featured image: map of South Asian languages from


Munda admixture happened probably during the ANI-ASI mixture


Preprint The genetic legacy of continental scale admixture in Indian Austroasiatic speakers, by Tätte et al. bioRxiv (2018).

Interesting excerpts:

Studies analysing mtDNA and Y chromosome markers have revealed a sex-specific pattern of admixture of Southeast and South Asian ancestry components in Munda speakers. While close to 100% of mtDNA lineages present in Mundas match those in other Indian populations, around 65% of their paternal genetic heritage is more closely related to Southeast Asian than South Asian variation. Such a contrasting distribution of maternal and paternal lineages among the Munda speakers is a classic example of the ‘father tongue hypothesis’. However, the timing of this expansion is contentious. Based on Y-STR data, the coalescent time of Indian O2a-M95 haplogroup was estimated to be >10 KYA. Recently, the phylogeny reconstructed from an 8.8 Mb region of the Y chromosome showed that Indian O2a-M95 lineages form a clade nested within East/Southeast Asian lineages and coalesce within the last ~5-7 KYA. This date estimate sets the upper boundary for the main episode of gene flow of Y chromosomes from Southeast Asia to India.

Supplementary Figure S4. First two components of principal component analysis (PCA). Individuals and population medians (circles) are marked with abbreviations from population names. Different colours represent populations from different geographic areas and/or linguistic groups as shown on the legend on the right. For the full names of populations see Supplementary Table S1. PCA was performed using software EIGENSOFT 6.1.42 on the whole filtered dataset (1072 individuals), previously LD pruned as described in the title of Supplementary Figure S1. The first two principal components describe 5.13% and 2.57% of total variance.

Admixture proportions suggest a novel scenario

Regardless of which West Asian population we used, we found that Munda speakers can be described on average as a mixture of ~19% Southeast Asian, 15% West Asian and 66% Onge (South Asian) components. Alternatively, the West and South Asian components of Munda could be modelled using a single South Asian population (Paniya), accounting on average for 77% of the Munda genome. When rescaling the West and South Asian (Onge) components to 1 to explore the Munda genetic composition prior to the introduction of the Southeast Asian component, we note that the West Asian component is lower (~19%) in Munda compared to Paniya (27%) (Supplementary Table S4: *Average_Lao=0). Consistent with qpGraph analyses in Narasimhan et al. (2018), this may point to an initial admixture of a Southeast Asian substrate with a South Asian substrate free of any West Asian component, followed by the encounter of the resulting admixed population with a Paniya-like population. Such a scenario would imply an inverse relationship between the Southeast and West Asian relative proportions in Munda; in other words, an increase in the Southeast Asian component should cause a greater reduction in the West Asian component than in the South Asian component.
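The rescaling step and the proposed two-step scenario can be checked with simple arithmetic. In this minimal sketch, the first function reproduces the rescaling (0.15 / (0.15 + 0.66) ≈ 0.185, the ~19% quoted above). The second is a hypothetical two-step mixing model: the function names and the parameters `s` and `q` are invented for illustration, and only the 27% West Asian share for Paniya comes from the excerpt. It merely shows why, under this scenario, extra Southeast Asian ancestry lowers the West Asian share proportionally more than the South Asian one; it is not the qpAdm/qpGraph modelling used by the authors.

```python
from typing import Dict, Tuple


def rescale_without(shares: Dict[str, float], dropped: str) -> Dict[str, float]:
    """Drop one ancestry component and rescale the remaining ones to sum to 1."""
    kept = {name: value for name, value in shares.items() if name != dropped}
    total = sum(kept.values())
    return {name: value / total for name, value in kept.items()}


def two_step_mixture(s: float, q: float, paniya_west: float = 0.27) -> Tuple[float, float, float]:
    """Hypothetical two-step model.

    Step 1: X = s * Southeast Asian + (1 - s) * South Asian (no West Asian ancestry).
    Step 2: Munda = q * X + (1 - q) * Paniya-like (paniya_west West Asian, rest South Asian).
    Returns the (Southeast Asian, West Asian, South Asian) proportions.
    """
    southeast = q * s
    west = (1 - q) * paniya_west
    south = q * (1 - s) + (1 - q) * (1 - paniya_west)
    return southeast, west, south


munda = {"Southeast Asian": 0.19, "West Asian": 0.15, "Onge": 0.66}
pre_sea = rescale_without(munda, "Southeast Asian")
print(round(pre_sea["West Asian"], 3))  # 0.185, i.e. the ~19% quoted above

# A larger contribution of the admixed population X (more Southeast Asian ancestry)
# shrinks the West Asian share by a larger factor than the South Asian share.
low = two_step_mixture(s=0.5, q=0.2)
high = two_step_mixture(s=0.5, q=0.4)
print(high[1] / low[1], high[2] / low[2])
```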

The distribution of genetic components (K=13) based on the global ADMIXTURE analysis (Supplementary Figure S1, S2, S3) for a subset of populations on a map of South and Southeast Asia. The circular legend in the bottom left corner shows the ancestral components corresponding to the colours on pie charts. The sector sizes correspond to population medians.

Dating the admixture event

In this study, we have replicated a result previously reported in Chaubey et al. (2011)7 that the Mundas lack one ancestral component (k2) that is characteristic of Indian Indo-European and Dravidian speaking populations. If this component came to India through one of the Indo-Aryan migrations, then it would be fair to presume that the Munda admixture happened before this component reached India, or at least before it spread all over the country. However, the admixture time computed here falls in the exact same timeframe as the ANI-ASI mixture, through which the k2 component probably spread, has been estimated to have happened in India. Therefore, we propose that if the Munda admixture happened at the same time, it may have happened in the eastern part of the country, east of Bangladesh, and later, when populations from East Asia moved to the area, the Mundas migrated towards central India. Such a scenario, which may be further clarified by ancient DNA analyses, seems to be supported by the fact that Mundas harbor a smaller fraction of West Asian ancestry than contemporary Paniya (Supplementary Table S4), and therefore cannot be seen as a simple admixture product of Southern Indian populations with incoming Southeast Asian ancestries.

Image from Damgaard et al. (2018). A summary of the four qpAdm models fitted for South Asian populations. For each modern South Asian population, we fit different models with qpAdm to explain their ancestry composition using ancient groups and present the first model that we could not reject in the following priority order: 1. Namazga_CA + Onge, 2. Namazga_CA + Onge + Late Bronze Age Steppe, 3. Namazga_CA + Onge + Xiongnu_IA (East Asian proxy), and 4. Turkmenistan_IA + Xiongnu_IA. Xiongnu_IA were used here to represent East Asian ancestry. We observe that while South Asian Dravidian speakers can be modeled as a mixture of Onge and Namazga_CA, an additional source related to Late Bronze Age steppe groups is required for IE speakers. In Tibeto-Burman and Austro-Asiatic speakers, an East Asian rather than a Steppe_MLBA source is required.
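The model-selection rule in this caption (report the first model, in a fixed priority order, that cannot be rejected) can be written out as a short loop. This is a minimal sketch, assuming that "could not reject" means a qpAdm p-value at or above a conventional 0.05 threshold. The threshold, the function name and the p-values are hypothetical, not taken from Damgaard et al. A real analysis would also check that the fitted admixture weights are plausible (between 0 and 1), which is omitted here.

```python
from typing import Dict, List, Optional

# Priority order of the four models, as listed in the caption above.
MODEL_PRIORITY: List[str] = [
    "Namazga_CA + Onge",
    "Namazga_CA + Onge + Late Bronze Age Steppe",
    "Namazga_CA + Onge + Xiongnu_IA",
    "Turkmenistan_IA + Xiongnu_IA",
]


def first_unrejected_model(p_values: Dict[str, float], alpha: float = 0.05) -> Optional[str]:
    """Return the first model, in priority order, whose p-value is not below alpha.

    Models missing from p_values are treated as not fitted; returns None if all are rejected.
    """
    for model in MODEL_PRIORITY:
        p = p_values.get(model)
        if p is not None and p >= alpha:
            return model
    return None


# Hypothetical p-values for one population: the two-way model is rejected,
# so the model adding a Late Bronze Age steppe source is reported, even though
# a lower-priority model has a higher p-value.
example = {
    "Namazga_CA + Onge": 0.001,
    "Namazga_CA + Onge + Late Bronze Age Steppe": 0.32,
    "Namazga_CA + Onge + Xiongnu_IA": 0.45,
}
print(first_unrejected_model(example))
```

Because the order encodes a preference for simpler sources first, the selected model is not necessarily the one with the best fit.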

Linguistics and genome-wide data

(…) by and large, the linguistic classification justifies itself but Kharia and Juang do not fit in this simplification perfectly.

Once again, at the current level of detail of genetic studies, a clear dialectal division is often not possible for certain groups without fine-scale population studies and the help of linguistics and archaeology.

Featured image from open access paper by Chaubey et al. (2011).


Mitogenomes show continuity of Neolithic populations in Southern India

New paper (behind paywall) Neolithic phylogenetic continuity inferred from complete mitochondrial DNA sequences in a tribal population of Southern India, by Sylvester et al. Genetica (2018).

This paper reports complete mtDNA genomes of 113 unrelated individuals from the Melakudiya, a Dravidian-speaking tribe from the Kodagu district of Karnataka, Southern India.

Some interesting excerpts (emphasis mine):

Autosomal genetic evidence indicates that most of the ethnolinguistic groups in India have descended from a mixture of two divergent ancestral populations: Ancestral North Indians (ANI) related to People of West Eurasia, the Caucasus, Central Asia and the Middle East, and Ancestral South Indians (ASI) distantly related to indigenous Andaman Islanders (Reich et al. 2009). It is presumed that proto-Dravidian language, most likely originated in Elam province of South Western Iran, and later spread eastwards with the movement of people to the Indus Valley and later the subcontinent India (McAlpin et al. 1975; Cavalli-Sforza et al. 1988; Renfrew 1996; Derenko et al. 2013). West Eurasian haplogroups are found across India and harbor many deep-branching lineages of Indian mtDNA pool, and most of the mtDNA lineages of Western Eurasian ancestry must have a recent entry date less than 10 Kya (Kivisild et al. 1999a). The frequency of these lineages is specifically found among the higher caste groups of India (Bamshad et al. 1998, 2001; Basu et al. 2003) and many caste groups are direct descendants of Indo-Aryan immigrants (Cordaux et al. 2004). These waves of various invasions and subsequent migrations resulted in major demographic expansions in the region, which added new languages and cultures to the already colonized populations of India. Although previous genetic studies of the maternal gene pools of Indians had revealed a genetic connection between Iranian populations and the Arabian Peninsula, likely the result of both ancient and recent gene flow (Metspalu et al. 2004; Terreros et al. 2011).


Haplogroup HV14

mtDNA haplogroup HV14 has prominence in North/Western Europe, West Eurasia, Iran, and South Caucasus to Central Asia (Malyarchuk et al. 2008; Schonberg et al. 2011; Derenko et al. 2013; De Fanti et al. 2015). Although Palanichamy identified haplogroup HV14a1 in three Indian samples (Palanichamy et al. 2015), it is restricted to limited unknown distribution. In the present study, by the addition of considerable sequences from the Melakudiya population, a unique novel subclade designated as HV14a1b was found with a high frequency (43%) allowed us to reveal the earliest diverging sequences in the HV14 tree prior to the emergence of HV14a1b in Melakudiya. (…) The coalescence age for haplogroup HV14 in this study is dated ~ 16.1 ± 4.2 kya and the founder age of haplogroup HV14 in Melakudiya tribe, which is represented by a novel clade HV14a1b is ~ 8.5 ± 5.6 kya

Maximum Parsimonious tree of complete mitogenomes constructed using 38 sequences from Melakudiya tribe and 11 previously published sequences belonging to haplogroup HV14 [Supplementary file Table S2] Suffixes @ indicate back mutation, a plus sign (+) an insertion. Control region mutations are underlined, and synonymous transitions are shown in normal font and non-synonymous mutations are shown in bold font. Coalescence ages (Kya) for complete coding region are shown in normal font and synonymous transitions are shown in Italics
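Coalescence ages such as ~16.1 ± 4.2 kya are commonly obtained with the rho statistic: the average number of mutations from the root of a clade to each of its sampled sequences, multiplied by a calibrated mutation rate. This is a minimal sketch, assuming a strict linear clock, hypothetical mutation counts and one mutation per 3,624 years for complete mitogenomes. That rate is widely used, but the excerpt does not say which method or calibration the authors applied. The standard error and corrections for purifying selection are left out.

```python
from typing import List

# Assumed calibration: one mutation per 3,624 years across the complete mitogenome.
# The excerpt does not state which calibration Sylvester et al. actually used.
YEARS_PER_MUTATION = 3624


def rho(mutations_from_root: List[int]) -> float:
    """Rho statistic: mean number of mutations between the clade root and each sampled sequence."""
    return sum(mutations_from_root) / len(mutations_from_root)


def coalescence_age_years(mutations_from_root: List[int],
                          years_per_mutation: float = YEARS_PER_MUTATION) -> float:
    """Convert rho into an age, assuming a strict, linear molecular clock."""
    return rho(mutations_from_root) * years_per_mutation


# Hypothetical clade of five sequences with 3, 5, 4, 4 and 4 mutations from the root.
counts = [3, 5, 4, 4, 4]
print(rho(counts), coalescence_age_years(counts))
```

Note that such an age dates the most recent common ancestor of the sampled lineages, which is not necessarily the time at which the lineage expanded in a given population.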

Haplogroup U7a3a1a2

The coalescence age of haplogroup U7a3a1a2 dates to ~ 13.3 ± 4.0 kya. (…)

Although, haplogroup U7 has its origin from the Near East and is widespread from Europe to India, the phylogeny of Melakudiya tribe with subclade U7a3a1a2 clusters with populations of India (caste and tribe) and neighboring populations (Irwin et al. 2010; Ranaweera et al. 2014; Sahakyan et al. 2017), hint about the in-situ origin of the subclade in India from Indo-Aryan immigrants.

I am not a native English speaker, but this paper looks like it needs a revision by one.

Also, without comparison with ancient DNA, a coalescence age alone is not enough to prove that a haplogroup expanded in the Neolithic rather than after a later bottleneck. However, since we are talking about mtDNA, their analysis is likely mostly right.

Finally, it is one thing to prove that the origin of the Indus Valley Civilization lies (in part) in peoples from the Iranian plateau, and to show, through their admixture with ASI ancestry, that they are probably at the origin of the Proto-Dravidian expansion; it is a completely different thing to prove an Elamo-Dravidian connection.

Since that language grouping is not really accepted in linguistics, this would be like claiming to prove, through that Iran Neolithic ancestry, a Sumero-Dravidian or a Hurro-Dravidian connection…