Intense but irregular NWIE and Indo-Iranian contacts show Uralic disintegrated in the West

Open access PhD thesis Indo-Iranian borrowings in Uralic: Critical overview of sound substitutions and distribution criterion, by Sampsa Holopainen, University of Helsinki (2019), under the supervision of Forsberg, Saarikivi, and Kallio.

Interesting excerpts (emphasis mine):

The gap between Russian and Western scholarship

Many scholars in the Soviet Union and later the Russian Federation also have researched this topic over the last five decades. Notably the eminent Eugene Helimski dealt with this topic in several articles: his 1992 article (republished in Helimski 2000) on the emergence of Uralic consonantal stems used Indo-Iranian and other Indo-European loans as key evidence, and it was one of the first serious attempts to stratify the loanwords, paying attention to the non-initial syllables as well. Helimski (1997b) discusses Indo-Iranian loanwords more generally, but it is especially notable for the introduction of the “Andronovo Aryan” idea: Helimski argues that some loanwords in Ob-Ugric and Permic are derived from an unattested, third branch of Indo-Iranian. Helimski’s idea has been supported by at least Mikhail Zhivlov in a 2013 article, but otherwise it has not received wide acceptance. Helimski was also known for his criticism (see especially Helimski 2001) of Jorma Koivulehto’s etymological work: although the main targets of Helimski’s criticism were Koivulehto’s writings on Proto-Indo-European and Germanic borrowings (which fitted poorly with Helimski’s ideas of the Nostratic roots of Proto-Uralic and his other theories on Uralic linguistic prehistory), also some of his Indo-Iranian ideas received unnecessarily sharp criticism in Helimski (2001).

Vladimir Napol’skikh is another important Russian scholar who has written on several occasions about Indo-Iranian–Uralic contacts. His 2014 article is notable for its criticism of Helimski’s Andronovo Aryan theory and his arguments in favour of Indo-Aryan loanwords. Napol’skikh also considered some of the traditional Indo-Iranian loanwords to be borrowings from Tocharian (see below) in some of his earlier works, an idea which has been criticized by Kallio (2004) and Widmer (2002) and which Napol’skikh himself has since dropped in later publications (2010, 2014), where many of these alleged Tocharian loans are again considered Indo-Iranian.

One of the main characteristics of Russian research is that the earliest Indo-European loanwords are usually considered to represent an inheritance from the Nostratic proto-language (Helimski [2001]; Kassian, Zhivlov & Starostin [2015]), an idea which is not widely accepted by scholars of Uralic in the West. Although this often does not concern the Indo-Iranian loanwords at all, or it concerns only a part of them, the works of Jorma Koivulehto, who dealt with both earlier Indo-European and Indo-Iranian loans, receive so much criticism from the Russian scholars that his important ideas are often totally rejected or left unmentioned in Russian research.

This kind of rejection of central etymological research literature can be considered one of the most pressing problems in Uralic loanword studies, and it leaves a regrettable gap between Russian and Western European scholars in this perspective.

Semantics

Among the Indo-Iranian loanwords in Uralic, one can easily mention examples that follow the classification of semantic change as described above. For widening or generalization, vasara ‘hammer’ is a good example: the Indo-Iranian original denotes ‘the weapon of the god Indra’ in Indic and ‘the weapon of the god Mithra’ in Avestan, whereas Finnish ‘hammer’ (and the Mordvin meaning ‘axe’) are more general meanings of tools. Fi huhta is a good example of narrowing: Iranian *tsuxta- means simply ‘burned’, whereas in Finnic huhta means specifically ‘a burned patch used in slash-and-burn agriculture’. Metonymy has taken place in Mordvin, where čuvto denotes simply ‘tree’; this probably developed through the meaning ‘wood burned for agriculture’. Khanty (South) wǟrəs denotes ‘horse’s mane’, but its Iranian original probably had a more general meaning of hair (cf. Avestan varəsa- ‘hair of human and animal, mostly hair of the head’).

An interesting example of degeneration is the etymology of Finnic orja ‘slave’, probably borrowed from the Indo-Iranian ethnonym *(H)ārya- ‘Aryan’ (for the original semantics of this word, see the entry *orja in Chapter 2). A similar development is seen in English slave which is etymologically connected to the ethnonym Slav.

Distribution as a criterion in the dating of loanwords

(…) some of the Indo-Iranian loans seem to have a wide distribution, but upon a closer look it becomes clear that they include phonological irregularities, which can only be explained by assuming that they are parallel loans. The ability to recognize parallel borrowings is extremely important in Uralic loanword studies, and it has been developed with success in the research of Germanic and Baltic loanwords (see Junttila 2015).

Interestingly, K. Häkkinen (1983: 207) argues that although words disappear from languages, the most basic words often remain stable and are maintained for longer periods. Although this is probably true, here the notion of “basicness” is something that is open to different interpretations. Many central concepts in culture and livelihoods are often described with prestige words that are borrowed, and these central words can be very easily replaced. In determining the age of the loanwords one has to always keep in mind that a reflex of a very early cultural borrowing from Indo-Iranian to Proto-Uralic/Proto- West Uralic etc. can easily have been lost in some daughter language, if a later prestige loan for the same concept has been borrowed from some later contact language (such as from some form of Germanic or Baltic into Finnic or from some Turkic language into Udmurt, Mari or Mordvin).

In Uralic linguistics the common loanword layers shared by some intermediary proto-language have often been seen as giving support to the reconstruction of these stages, but K. Häkkinen (100–108) considers this problematic. It should also be noted that the distribution of Indo-Iranian loanwords very rarely matches the assumed taxonomic divisions: there are some loanwords confined to the Finno-Permic, Finno-Volgaic or Ugric languages, but very few loanwords that would be Finno-Permic, Finno-Volgaic or Ugric in the way that the word is found in all the languages that belong to the branch.

Consonants

Laryngeals

There are only very few possible examples of a consonantal substitution of the word-initial laryngeal. It seems probable that the word-initial laryngeal, if it was retained, was not substituted in any way in Uralic. *karšV (> Fi karhu), an uncertain etymology, is the only possible example.

(…) Even if *k was a result of laryngeal hardening, the development would probably be earlier than Proto-Indo-Iranian, meaning that by the time the word was borrowed, the Indo-Iranian word simply had the stop *k that was regularly substituted by Uralic *k.

Evidence for Andronovo Aryan and Indo-Aryan loanwords?

None of the loanwords have to be considered as Andronovo Aryan or Proto-Indo-Aryan based on the criteria that were presented in the Introduction. The Uralic palatal affricate *ć or sibilant *ś can in all cases be explained from Proto-Indo-Iranian *ć, and there is no need to assume that it should reflect Andronovo Aryan *ć or PIA *ś. In the etymological material of this study, no further positive evidence was found for the distinction of PU *ś and *ć as substitutions of the Proto-Indo-Iranian affricates. This means that at least in word-initial position there probably was no difference between *ć and *ś, and even though we do not know what this sound was phonetically, it is safe to assume that Uralic words showing *ś reflect a sound substitution of Indo-Iranian *ć and *Ʒ́.

Regarding the distribution of the etymologies within Indo-Iranian, all the loanwords which cannot be from Iranian because of the lack of attested Iranian cognates have a more or less secure Proto-Indo-Iranian etymology, and nothing prevents us from assuming that these words reflect Proto-Indo-Iranian borrowings. It is also possible that some words with solid Proto-Indo-Iranian etymologies were present in Iranian but were lost before the first Old Iranian texts were composed.

List of Indo-European and Indo-Iranian Etymologies

Pre-Indo-Iranian

*ertä ‘side’, *kekrä ‘wheel’, *kečrä ‘spindle’, *mekši ‘bee’, (*meti ‘honey’), *ońća ‘part’, (*orpa ‘orphan’), *peijas ‘feast’, *pejmä ‘milk’, Pre-P *pertä ‘wing’, *repä ‘fox’, *rećmä ‘rope’, *sejti ‘bridge’

Proto-Indo-Iranian

*aćtara ‘whip’, *anti/onta, *ora ‘awl’, *orja ‘slave; south’, (*orpa ‘orphan’), *pośi ‘penis’, *śaŋka ‘handle’, Pre-Md *śaγa ‘goat’, *śarwi ‘horn’, *śaδa- ‘to rain’, *śara- ‘shit’, *śi̮ta ‘hundred’, Pre-P *śVta ‘hundred’, *śasra ‘thousand’, *śišta ‘wax’, *śoma- ‘sad’, *waćara ‘hammer’, *woraći ‘boar’

Ambiguous early loans (can be either from PII or PI)

*ajša ‘shaft’, *asVra ‘lord’, *iha ‘yearning, passion’, *ihta ‘lust’, *jama ‘twin’, *jawi/jowa (> Mo juv) ‘awn’, *jawi (> PS *jäə̑) ‘flour’, *ji̮ni ‘way, path’, *juma ‘god’, *kana- ‘to dig’, *kara- ‘to dig’, *kata- ‘to graze’, *kertä- ‘to bind’, *ki̮ntaw ‘tree stump’, *kürtńV ‘iron’, PKh *kǟrtV ‘iron’, *kärtä ‘iron’, *martas ‘dead’, *ńātV- ‘to help’, *pakas ‘god’, *para ‘good’, Kh pĕnt ‘way’, PMs *pē̮ńtV ‘brother-in-law’, *pora ‘old’, *poči- ‘to boil’, Pre-P *porta ‘vessel’, *puntaksi ‘bottom’, Pre-Ma *pänti- ‘to bind’, PMa *pärća ‘ear of corn’, *pätäri- ‘to flee’, *saγi- ‘to get, obtain’, *sampas ‘pillar’, *saŋka ‘old’, *sara ‘lake’, *sasara ‘sister’, *säptä ‘seven’, *tajwas ‘sky’, *takra ‘piece of flesh’, *tarna ‘grass’, *tojwV ‘wish’, *toraksi ‘through’, *tora- ‘to fight’, *täjV ‘milk’, *täjinV ‘cow’, *täši, *uška ‘bull’, *wakša- (> PS *wåtå-) ‘to grow’, *wajna- ‘to see’, *wojna- ‘to see’, *wiša ‘venom’, *wi̮rna ‘wool’, *wärkä ‘kidney’, PS *wǝ̑rkǝ̑ ‘wolf’, *wirtV- ‘to hold, raise’, *äŋkärä ‘coal’

List of uncertain Indo-Iranian etymologies

PFi *aiwa (← Germanic ?), Ma *arša ‘mane’, PMs *ǟrV ‘fire’, *aštira ‘barren earth’, POUg *ćakV ‘hammer’, *ćara- ‘brown; ? to dawn’, *ćero ‘hill-top’, *ćerti ‘group’, *itä- ‘to appear’, Pre-Fi *karšV ‘bear’, PMs *kīrV ‘iron’, *kota ‘chum’, Pre-Sa *kupa ‘pit’, PFi *kärsä ‘snout’, *maksa- ‘to pay’, PFi *mana-, PUg ? *mańći, Ma marij ‘Mari; man; husband’, *mē̮ja ‘wedding’, *mykkä ‘dumb’, PP *oč ‘corn’, *orpV ‘relative’, PFi *paksu ‘thick’, *peji- ‘to milk’, *pi̮ŋka ‘psychedelic mushroom’, POUg *porV ‘phratry’, Pre-Sa *poti ‘against’, Pre-Fi *šatas ‘germ’, *sentü- ‘to be born’, *šerä- ‘to wake up’, Ms šVšwǝŋ ‘hare’, PUg *śeŋkV ‘nail’, Pre-Sa *soma/sami ‘some’, PP *sur ‘beer’, PFi *süte- ‘to hit’ (< ? *sewči-), Hu szekér ‘wagon’, Kh ʌīkər ‘Narte’, PUg *taja- ‘secret’, Pre-Fi *terni ‘young’, *terwV ‘healthy’, ? *towkV ‘spring’, PWU *utarV ‘udder’ (← Germanic ?; Mari *waδar ← II), *waŋka ‘hook’, Mo E v́eŕges, M vərǵas ‘wolf’

Etymologies that were probably borrowed from another Indo-European source (PIE, PBSl, Germanic, Baltic)

*aisa ‘shaft’ ← Balto-Slavic, PFi *aiwa (← Germanic ?), *apV ‘help’ ← Germanic, *jewä ‘grain’ ← Balto-Slavic, Ma karaš etc. ‘honeycomb’ ← Baltic, (*meti ‘honey’ ← ? PIE,) Fi *ojas ‘shaft’ ← Slavic, *ola ← Baltic, *oŋki ← Germanic, *porćas ← Balto-Slavic, Pre-Sa *porta ‘vessel’ ← Germanic, *salV ‘salt’ (cannot be reconstructed for PU, various later parallel loans), *śi̮lkaw ← Balto-Slavic, *sammu- ← Germanic, *śuka ← Balto-Slavic, Mari *šŭžar ← Baltic/Balto-Slavic or Slavic, *tejniš ‘pregnant animal’ ← Baltic/Balto-Slavic, PWU *utarV ‘udder’ (? ← Germanic)

Early loans into differentiated branches

Proto-West Uralic

Only in Finnic:

*aćnas ‘voracious’, *iha ‘wish’, *ihta ‘lust’, PFi *isV ‘appetite’, *martas ‘dead’, *očra ‘barley’, *peijas ‘feast’, *pejmä ‘milk’, *pe̮rna ‘spleen’, *sampas ‘pillar’, *sooja ‘shelter’, *tajwas ‘sky’, *takra ‘piece of flesh’, *terwV ‘healthy’, *tojwV ‘wish’

All of these words, with the exception of *sooja ‘shelter’, were clearly borrowed into Early Proto-Finnic (Pre-Finnic) at the latest. Formally most of the loans could be from PII or PI.

Only in Saami:

*kata- ‘to graze’, *kertä- ‘to bind’, *pora ‘old’, *wojna- ‘to see’

All of the loans were acquired before the Saami vowel changes. Formally all could be either from Proto-Indo-Iranian or Proto-Iranian.

Only in Finnic and Saami:

*asma ‘voracious’, *jama ‘twin’, *kekrä ‘wheel’, *mača ‘insect’

Of these, *mača is from Proto-Iranian and *jama is ambiguous. As the -sm- in asma does not point to Proto-Indo-Iranian *ć, this is probably an Iranian loan too. It is possible that these words were borrowed into Proto-West Uralic, as there is no general support for a Finno-Saamic proto-language today. As the cognates within Finnic and Saami are regular, there is no need to assume parallel borrowings. *kekrä has to be from Proto-Indo-Iranian.

NOTE. Based on the discussion of stages of borrowing from Indo-Iranian, and of the distribution of *kekrä among Uralic dialects in particular, Holopainen probably means Pre-Indo-Iranian for this example.

Only in Mordvin and/or Finnic and/or Saami (can point to a borrowing into Proto-West Uralic):

*ji̮ni ‘way’, *kečrä ‘spindle’, *rećmä ‘rope’, *śaŋka, *waćara ‘hammer’, *warsa ‘foal’, *wasa ‘calf’, *woraći ‘pig’

Based on phonological criteria, these loans do not form a chronologically coherent layer, but probably their modern distribution is accidental (their original distribution can have been wider). *kečrä ‘spindle’ and *rećmä ‘rope’ are from Pre-II, *śaŋka, *waćara and *woraći from PII, *warsa and *wasa from later Iranian (Alanic). *ji̮ni is ambiguous. Also the loans confined to Finnic and Saami mentioned above probably were borrowed into Proto-West Uralic, as it is a more convincing taxonomic entity than Proto-Finno-Saamic.
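NOTE. The distribution categories used in these lists amount to a simple set comparison, which can be sketched as below. This is only an illustration, not the method of the thesis: the function name `distribution_label` and the label strings are my own assumptions, and the example distributions are limited to what the lists and the Semantics section above state (*sooja only in Finnic, *kata- only in Saami, *jama in Finnic and Saami, *waćara in Finnic and Mordvin).

```python
# Hypothetical sketch: label a loanword by the set of branches attesting it.
# Branch names and labels follow the headings above; nothing here is
# taken from the thesis itself beyond the four example distributions.
WEST_URALIC = frozenset({"Finnic", "Saami", "Mordvin"})


def distribution_label(branches):
    """Return the listing category for a set of attesting branches."""
    found = frozenset(branches)
    if not found:
        raise ValueError("a word needs at least one attesting branch")
    if found == {"Finnic"}:
        return "only in Finnic"
    if found == {"Saami"}:
        return "only in Saami"
    if found == {"Finnic", "Saami"}:
        return "only in Finnic and Saami"
    if found <= WEST_URALIC:
        # Any other combination confined to Mordvin, Finnic and Saami
        return "West Uralic distribution"
    return "wider distribution"


examples = {
    "*sooja": {"Finnic"},
    "*kata-": {"Saami"},
    "*jama": {"Finnic", "Saami"},
    "*waćara": {"Finnic", "Mordvin"},
}

labels = {word: distribution_label(b) for word, b in examples.items()}
for word, label in labels.items():
    print(word, "->", label)
```

A label of this kind says nothing about chronology by itself: as the excerpt stresses, the modern distribution may be accidental, so the phonological criteria still decide the layer.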

Proto-Mari-Permic

Only in Mordvin, Finnic and/or Saami and Mari:

*juma ‘god’

This loan can be either from PII or PI. As it is obvious that these four branches do not form any taxonomical entity (Salminen 2002; J. Häkkinen 2009), it is only logical that there are no other loanwords with a “Finno-Volgaic” distribution.

Only in Mari:

*kVrtnV ‘metal’ (← PII, PI or later), Pre-Ma *pänti- ‘to bind’, PMa *pärća ‘ear of corn’, *si̮rńa ‘gold’ (← Old Iranian)

Only very few early Indo-Iranian loans can be found in Mari and in no other Uralic language. It is unclear what the reason for this is. It is, of course, possible that some uncertain loanwords like marij ‘man; Mari’ turn out to be correct after all, but even that does not make the number of loans in Mari very high. The situation has to be explained either with loss of vocabulary and replacement by later loans (from Turkic, and also perhaps from Permic) or with Mari’s location on the periphery at the time of the later contacts with the Iranian languages. Agyagási (2019: 254–258) argues that the current area where Mari is spoken was formed only relatively late, after the Mongol invasion in the High Middle Ages. If this is indeed correct, and Mari was spoken in more northern areas before that, it can be assumed that Pre-Mari had only sporadic contacts with the Iranian languages after it split off from Proto-Uralic.

Only in Permic (early loans; for later loans confined to Permic)

*a(č)wa ‘stallion’, PP *ju ‘awn’, *kertä ‘house’, *kärtä ‘metal’, *kada- ~ *gada- ‘to steal’, *karka ‘chicken’, *parśa ~ *barśa ‘mane’, *parta ‘knife’, *pertä ‘wing’, *poči- ‘to boil’, *porta ‘vessel’, *dura ‘long’, *domV ‘to tame’, PP *śumi̮s ‘band’, PP *šud ‘luck’, *uška ‘bull’, *wi̮rna ‘wool’, *wirä ‘man, husband’, *äŋkärä ‘coal’

The number of loanwords in Permic is relatively high, and many of these can be considered to be Iranian loanwords. Technically many loans are ambiguous, but some of the words were borrowed late due to historical reasons (‘iron’), and some were borrowed into a Pre-Permic which already had a phonological system that was different from Proto-Uralic (*šud- has d which cannot reflect PU *δ).

It is probable that the Permic languages were in continuous contact with the Indo-Iranian languages from the time they split from Proto-Uralic until the early mediaeval era.

Proto-Ugro-Samoyedic

Only in Khanty and Mansi (regular cases):

POUg *ēräɣ ‘song’, POUg *eträ ‘clear sky’, POUg *mɔ̈ŋki ‘forest-spirit’, *ńātV- ‘to help’, *päčäɣ ‘reindeer’

The number of these etymologies is so low that it is very difficult to determine whether these words were borrowed into Proto-Ob-Ugric or some earlier proto-language, such as Proto-Ugric.

Only in Khanty and/or Mansi and/or Hungarian (regular cases):

*säptä ‘seven’ (Khanty + Hungarian regular), *sara ‘lake’

There are so few convincing loanwords with a “Ugric” distribution that they provide very little evidence. Either of these loans could be from Proto-Indo-Iranian or Proto-Iranian, if we assume that *s > *h was a common Iranian sound change. Both loans were acquired

Only in Samoyed:

*jäwi (> PS *jäə̑), PS *pulə̑ ~ *pi̮lə̑ ‘bridge’, *täjki ‘spear’, PS *wǝ̑rkə̑ ‘wolf’, Pre-S *täši (> PS *tät), *wakša- (> PS *wåtå) ‘to grow’

Of these, only *wåtå- has to be a very early loan because of *s > *t. *jäwi (> PS *jäə̑) and PS *wə̑rkə̑ were possibly acquired before the Proto-Samoyed vowel developments, making them probably early loanwords too. Formally all of them could be either from PII or PI. *pulə̑ ~ *pi̮lə̑ could have been borrowed into Proto-Samoyed (with Iranian *u corresponding to Samoyed *u), and because of the *l the word is probably from a relatively late, Middle Iranian language.

The following loanwords have a distribution with a cognate in both Samoyed and some other branch:

*śaδa- ‘to rain’, *tora- ‘to fight’ (also *itä-, which is more uncertain, belongs here)

Pan-Uralic loans

The following loanwords have a distribution with regular cognates with at least one Ugric branch and some other branch, which points to early borrowing. Although formally *kana- and *kara- are ambiguous, they are probably from Proto-Indo-Iranian because of their distribution. The rest of the loans are from Pre-II or PII.

*kana- ‘to dig’, *kara- ‘to dig’, *meti ‘honey’, *mekši ‘bee’, *orpV ‘orphan’, *ora ‘awl’, *peji- ‘to milk’, *pätäri- ‘to flee’, *śara- ‘shit’, *śoma- ‘sad’

The following loanwords are found in at least two non-adjacent branches of Uralic (the ones listed in the above categories are not counted). As there are no widely accepted criteria for a word to be considered “Uralic”, all of these could be considered loanwords into Proto-Uralic, in this case probably from Proto-Indo-Iranian or Pre-Indo-Iranian.

*ajša ‘shaft’, *anti/onta ‘grass’, *ertä ‘side’, *ki̮ntaw ‘tree stump’, *mertä ‘human’, *orja ‘slave’, *para ‘good’, *počaw ‘reindeer’, *puntaksi ‘bottom’, *saγi- ‘to get, obtain’, *repä ‘fox’, *si̮ŋka ‘old’, *sasara ‘sister’, *sejti ‘bridge’, *śišta ‘wax’, *tarna ‘grass’, *toraksi ‘through’, *wiša ‘venom’

Discussion about the distribution and its impact on Uralic taxonomy

(…) there are Proto-Iranian loanwords which were borrowed simultaneously into several early branches of Uralic, making it likely that Uralic had split into several branches by the time of these contacts.

Also the fact that many of the Proto-Indo-Iranian loanwords either show a restricted distribution (such as West Uralic *waćara, *woraći) or irregular correspondences (*asVra, *śasra, *śi̮ta) can point to the conclusion that Proto-Uralic was fragmenting by the time when contacts with Proto-Indo-Iranian took place.

The earlier, Pre-Indo-Iranian loanwords usually show a wider distribution and regular sound correspondences. Although the number of these earliest loans is quite small, based on their distribution and regular correspondences it can be assumed that the Pre-Indo-Iranian stage (after RUKI, *l > *r and the merger of velars and labiovelars but before the merger of non-high vowels) was concurrent with Proto-Uralic, with the changes leading to Proto-Indo-Iranian happening after the dispersal of Proto-Uralic.

The distribution of loanwords reinforces the old idea that Samoyed is a lexical outlier, as only a few convincing Indo-Iranian etymologies for Proto-Uralic words (*śaδa- ‘to rain’, *tora- ‘to fight’) have a convincing reflex in Samoyed. However, the fact that such etymologies exist means rather that the situation is due to lexical loss in Samoyed, and that the earliest contact occurred before Samoyed split off from Proto-Uralic.

There are very few loanwords that have a Ugric distribution (being found in at least one Ob-Ugric branch and Hungarian), and likewise rather few in Ob-Ugric. The few loans that have a distribution confined to Ugric were borrowed before the change *s > *θ took place. This means that the Ugric distribution does not mean much from the point of view of chronology or taxonomy, as the words were borrowed into a language that was still identical to Proto-Uralic. Even some loans borrowed into Khanty and Mansi have to be so early.

Impacts on dating and the location of the contact zones

Because of the very limited number of convincing etymologies found only in Finnic or Saami, it is probable that there were not (extensive) contacts with Pre-Finnic or Pre-Saami after the split of Proto-West Uralic.

The great number of loanwords of varying ages in Permic inevitably points to the conclusion that the pre-form of the Permic branch had been constantly spoken in an area that was adjacent to the Iranian languages. The different layers of loanwords in Permic clearly point to chronological differences in the donor languages, but it also seems that Permic was in contact with various forms of Iranian and not with different diachronic stages of the same language.

In general, the words that have been borrowed are typical cultural words, and the contacts between Indo-Iranian and Uralic seem to have been a typical contact situation in which a culturally less-advanced language group borrows various cultural terms from a more “advanced” group. The words in various loanword layers related to horse and cattle breeding show obvious cultural influence in the field of domesticated animals, and the borrowing of some names of grains points to agricultural influence from the Indo-Iranians on the speakers of Uralic.

Needless to say, many of the borrowings I listed in A Song of Sheep and Horses suffer from the same ailment attributed to Indo-Europeanists in general:

With slight exaggeration one can agree with the remark by Koivulehto (1999a: 209–210) that the Indo-Europeanists often use outdated sources or are simply uninterested in the topic. The problem is further complicated by the various and often obsolete views expressed in even relatively modern Uralicist works, such as those of Rédei (1986c; 1988) or Katz (2003); (…) Mallory & Adams (2006) adequately refer to the importance of the early loanwords, but they use mostly Rédei’s outdated reconstructions and stratigraphy in support of their theories.

I need to review all related texts with this thesis and the works recently published by Kümmel, as well as the recent book of the Leiden school on Indo-Uralic.

Also, does anyone know the (traditional?) reason for the resistance to the Indo-Uralic concept among Uralicists? Maybe it’s a reaction against the Nostraticist and Siberian views of Uralic espoused by the Soviets?

Related

Yamnaya replaced Europeans, but admixed heavily as they spread to Asia

Recent papers The formation of human populations in South and Central Asia, by Narasimhan, Patterson et al. Science (2019) and An Ancient Harappan Genome Lacks Ancestry from Steppe Pastoralists or Iranian Farmers, by Shinde et al. Cell (2019).

NOTE. For direct access to Narasimhan, Patterson et al. (2019), visit this link courtesy of the first author and the Reich Lab.

I am no longer on holiday, and the amount of information in the paper is huge, with many complex issues raised rather than solved by the new samples and analyses, so I will stick to the Indo-European question, especially to some details that have changed since the publication of the preprint. For a summary of its previous findings, see the book series A Song of Sheep and Horses, in particular the sections from A Clash of Chiefs where I discuss languages and regions related to Central and South Asia.

I have updated the maps of the Prehistory Atlas, and included the most recently reported mtDNA and Y-DNA subclades. I will try to update the Eurasian PCA and related graphics, too.

NOTE. Many subclades from this paper have been reported by Kolgeh (download), Pribislav and Principe at Anthrogenica on this thread. I have checked some for comparison, but wherever my calls contradict their analyses, mine are likely to be the wrong ones. I will upload my spreadsheets and link to them from this page whenever I find the time.

Ancestry clines (1) before and (2) after the advent of farming. Colour modified from the original to emphasize the CHG cline: notice the apparent relevance of forest-steppe groups in the formation of this CHG mating network from which Pre-Yamnaya peoples emerged.

Indo-Europeans

I think the Narasimhan, Patterson et al. (2019) paper is well-balanced, and unexpectedly centered – as it should be – on the spread of Yamnaya-related ancestry (now Western_Steppe_EMBA) as the marker of Proto-Indo-European migrations, which stretched ca. 3000 BC “from Hungary in the west to the Altai mountains in the east”, spreading later Indo-European dialects after admixing with local groups, from the Atlantic to South Asia.

I. Afanasievo

I.1. East or West PIE?

I expected Afanasievo to show (1) R1b-L23(xZ2103, xL51) and (2) R1b-L51 lineages, apart from (3) the known R1b-Z2103 ones, pointing thus to an ancestral PIE community before the typical Yamnaya bottlenecks, and with R1b-L51 supporting a connection with North-West Indo-European. The presence of some samples of hg. Q pointed in this direction, too.

However, Afanasievo samples show overwhelmingly R1b-Z2103 subclades (all except for those with low coverage), all apparently under R1b-Z2108 (formed ca. 3500 BC, TMRCA ca. 3500 BC), like most samples from East Yamnaya.

This necessarily shifts the split and spread of R1b-L23 lineages to Khvalynsk/early Repin-related expansions, in line with what TMRCA suggested, and what advances by Anthony (2019) and Khokhlov (2018) on future samples from the Reich Lab suggest.

Given the almost indistinguishable ancestry of Afanasievo and Early Yamnaya, population genomics seems to offer little as yet to support the idea that Pre-Tocharians were more closely related to North-West Indo-Europeans than to Graeco-Aryans, as is proposed in linguistics based on the few traits shared between them and the lack of innovations proper to the Graeco-Aryan community.

NOTE. A new issue of Wekʷos contains an abstract from a relevant paper by Blažek on vocabulary for ‘word’, including the common NWIE *wrdʰo-/wordʰo-, but also a new (for me, at least) Northern Indo-European one: *rēki-/*rēkoi̯-, shared by Slavic and Tocharian.

The fact that bottlenecks happened around the time of the late Repin expansion suggests that we might be able to see different clans based on the predominant lineages developing around the Don-Volga area in the 4th millennium BC. The finding of Pre-R1b-L51 in Lopatino (see below), and of a Catacomb sample of hg. R1b-Z2103(Z2105-) in the North Caucasus steppe near Novoaleksandrovskij, also supports a star-like phylogeny of R1b-L23 stemming from the Don-Volga area.

NOTE. Interestingly, a dismissal of a common trunk between Tocharian and North-West Indo-European would mean that shared similarities between such disparate groups could be traced back to a Common Late PIE trunk, and not to a shared (western) Repin community. For an example of such a ‘pure’ East-West dialectal division, see the diagram of Adams & Mallory (2007) at the end of the post. It would thus mean a fatal blow to Kortlandt’s Indo-Slavonic group among other hypothetical groupings (remade versions of the ancient Centum-Satem division), as well as to certain assumptions about laryngeal survival or tritectalism that usually accompany them. Still, I don’t think this is the case, so the question will remain a linguistic one, and maybe, given a sufficient number of samples, some similarities will be found that differentiate Northern Indo-Europeans from the East Yamna/Catacomb-Poltavka-Balkan_EBA group.

Y-chromosome haplogroups of Afanasievo samples and neighbouring groups. See full maps.

I.2. Expansion or resurgence of hg. Q1b?

Haplogroup Q1b-Y6802(xY6798) seems to be the main lineage that expanded with Afanasievo, or resurged in their territory. It’s difficult to tell, because the three available samples are from the same family and belong to a later period.

NOTE. I have finally put some order into the chaos of Q1a vs. Q1b subclades in my spreadsheet and in the maps. Because of the change in nomenclature from ISOGG 2016 to 2017, many samples reported under Q1 subclades in papers prepared during the 2017-2018 period, without specific SNP calls, were impossible to define with certainty. By checking some of them I could determine the specific standard used.

In favour of the presence of this haplogroup in the Pre-Yamnaya community are:

  • The statement by Anthony (2019) that Q1a [hence maybe Q1b in the new ISOGG nomenclature] represented a significant minority among an R1b-rich community.
  • The sample found in a Sintashta WSHG outlier (see below), of hg. Q1b-Y6798, and the sample from Lola, of hg. Q1b-L717, are thus from other lineage(s) separated thousands of years from the Afanasievo subclade, but might be related to the Khvalynsk expansion, like R1b-V1636 and R1b-M269 are.

These are the data that suggest multiple resurgence events in Afanasievo, rather than expanding Q1b lineages with late Repin:

  • Overwhelming presence of R1b in early Yamnaya and Afanasievo samples; one Q1(xQ1b) sample reported in Khvalynsk.
  • The three Q1b samples appear only later, although wide CI for radiocarbon dates, different sites, and indistinguishable ancestry may preclude a proper interpretation of the only available family.
    • Nevertheless, ancestry seems unimportant in the case of Afanasievo, since the same ancestry is found up to the Iron Age in a community of varied haplogroups.
  • Another sample of hg. Q1b-Y6802(xY6798) is found in Aigyrzhal_BA (ca. 2120 BC), with Central_Steppe_EMBA (WSHG-related) ancestry; however, this clade formed and expanded ca. 14000 BC.
  • The whole Altai – Baikal area seems to be a Q1b-L54 hotspot, although admittedly many subclades separated very early from each other, so they might be found throughout North Eurasia during the Neolithic.
  • One Afanasievo sample is reported as of hg. C in Shin (2017), and the same haplogroup is reported by Hollard (2014) for the only available sample of early Chemurchek to date, from Kulala ula, North Altai (ca. 2400 BC).
Y-chromosome haplogroups of late Afanasievo – early Chemurchek samples and neighbouring groups. See full maps.

I.3. Agricultural substrate

Evidence of continuous contacts of Central_Steppe_MLBA populations with BMAC from ca. 2100 BC on – visible in the appearance of Steppe ancestry among BMAC samples and BMAC ancestry among Steppe pastoralists – supports the close interaction between Indo-Iranian pastoralists and BMAC agriculturalists as the origin of the Asian agricultural substrate found in Proto-Indo-Iranian, hence likely related to the language of the Oxus Civilization.

Similar to the European agricultural substrate adopted by West Yamnaya settlers (both NWIE and Palaeo-Balkan speakers), Tocharian shows a few substrate terms in common with Indo-Iranian, which can be explained by contacts in different dialectal stages through phonetic reconstruction alone.

The recent Hermes et al. (2019) supports the early integration of pastoralism and millet cultivation in Central Asia (ca. 2700 BC or earlier), with the spread of agriculture to the north – through the Inner Asian Mountain Corridor – being thus unrelated to the Indo-Iranian expansions, which might support independent loans.

However, compared to the huge number of parallel shared loans between NWIE and Palaeo-Balkan languages in the European substratum, Indo-Iranians seem to have been the first borrowers of vocabulary from Asian agriculturalists, while Proto-Tocharian shows just one certain related word, with phonetic similarities that warrant an adoption from late Indo-Iranian dialects.

chemurchek-sintashta-bmac
Y-chromosome haplogroups of Sintashta, Central Asia, and neighbouring groups in the Early Bronze Age. See full maps.

The finding of hg. (pre-)R1b-PH155 in a BMAC sample from Dzharkutan (to the west of Xinjiang), together with hg. R1b in a sample from Central Mongolia previously reported by Shin (2017), supports the widespread presence of this lineage to the east and west of Xinjiang, which means it might have become incorporated into Indo-Iranian migrants to the Xiaohe horizon, into the Afanasievo-Chemurchek-derived groups, or into the latter from the former. In other words, the Island Biogeography Theory with its explanation of founder effects might be, after all, applicable to the whole Xinjiang area, not only during the Chemurchek – Tianshan-Beilu – Xiaohe interaction.

Of course, there is no need for too complicated models of haplogroup resurgence events in Central and South Asia, seeing how the total amount of hg. R1a-L657 (today prevalent among Indo-Aryan speakers from South Asia) among ancient Western/Central_Steppe_MLBA-related samples amounts to a total of 0, and that many different lineages survived in the region. Similar cases of haplogroup resurgence and Y-DNA bottleneck events are also found in the Central and Eastern Mediterranean, and in North-Eastern Europe. From the paper:

[It] could reflect stronger ecological or cultural barriers to the spread of people in South Asia than in Europe, allowing the previously established groups more time to adapt and mix with incoming groups. A second difference is the smaller proportion of Steppe pastoralist-related ancestry in South Asia compared with Europe, its later arrival by ~500 to 1000 years, and a lower (albeit still significant) male sex bias in the admixture (…).

Y-chromosome haplogroups of samples from the Srubna-Andronovo and Andronovo-related horizon, Xiaohe, late BMAC, and neighbouring groups. See full maps.

II. R1b-Beakers replaced R1a-CWC peoples

II.1. R1a-M417-rich Corded Ware

Newly reported Corded Ware samples from Radovesice show hg. R1a-M417, at least some of them xZ645, ‘archaic’ lineages shared with the early Bergrheinfeld sample (ca. 2650 BC) and with the coeval Esperstedt family, hence supporting that it eventually became the typical Western Corded Ware lineage(s), probably dominating over the so-called A-horizon and the Single Grave culture in particular. On the other hand, R1a-Z645 was typical of bottlenecks among expanding Eastern Corded Ware groups.

Interestingly, it is supported once again that known bottlenecks under hg. R1a-M417 happened during the Corded Ware expansion, evidenced also by the remarkably high variability of male lineages among early Corded Ware samples. Similarly, these Corded Ware samples from Bohemia form part of the typical ‘Central European’ cluster in the PCA, which excludes once again not only the ‘official’ Esperstedt outlier I1540, but also the known outlier with Yamnaya ancestry.

NOTE. The fact that Esperstedt is closely related geographically and in terms of ancestry to later Únětice samples further complicates the assumption that Únětice is a mixture of Bell Beakers and Corded Ware, being rather an admixture of incoming Bell Beakers with post-Yamnaya vanguard settlers who admixed with Corded Ware (see more on the expansion of Yamnaya ancestry). In other words, Únětice is rather an admixture of Yamnaya+EEF with Yamnaya+(CWC+EEF).

Y-chromosome haplogroups of samples from Catacomb, Poltavka, Balkan EBA, and Bell Beaker, as well as neighbouring groups. See full maps.

On Ukraine_Eneolithic I6561

If the bottlenecks are as straightforward as they appear, with a star-like phylogeny of R1a-M417 starting with the Pre-Corded Ware expansion, then what is happening with the Alexandria sample, so precisely radiocarbon dated to ca. 4045-3974 BC? The reported hg. R1a-M417 was fully compatible, while R1a-Z645 could be compatible with its date, but the few positive SNPs I got in my analysis point indeed to a potential subclade of R1a-Z94, and I trust more experienced hobbyists in this ‘art’ of ascertaining the SNPs of ancient samples, and they report hg. R1a-Z93 (Z95+, Y26+, Y2-).

Seeing how Y-DNA bottlenecks worked in Yamnaya-Afanasievo and in Corded Ware and related groups, and if this sample really is so deep within R1a-Z93 in a region that should be more strongly affected by the known Neolithic Y-chromosome bottlenecks and forest-steppe ecotone, someone from the lab responsible for this sample should check its date once again, before more people keep chasing their tails with an individual that (based on its derived SNPs’ TMRCA) might actually be dated to the Bronze Age, where it could make much more sense in terms of ancestry and position in the PCA.

EDIT (14 SEP 2019): … and with the fact that he is the first individual to show the genetic adaptation for lactase persistence (13910-T), which is only found later among Bell Beakers, and much later in Sintashta and related Steppe_MLBA peoples (see comments below).

This is also evidenced by the other Ukraine_Eneolithic (likely a late Yamnaya) sample of hg. R1b-Z2103 from Dereivka (ca. 2800 BC), who – despite being in a similar territory 1,000 years later – shows a wholly diluted Yamnaya ancestry under typically European HG ancestry, even more so than other late Sredni Stog samples from Dereivka of ca. 3600-3400 BC, suggesting a decrease in Steppe ancestry rather than an increase – which is supposedly what should be expected based on the ancestry from Alexandria…

Like the reported Chalcolithic individual of Hajji Firuz, who showed an apparently incompatible subclade and Yamnaya ancestry at least some 1,000 years before he should have, and turned out to be from the Iron Age (see below), this may be another case of wrong radiocarbon dating.

NOTE. It would be interesting, if this turns out to be another Hajji Firuz-like error, to check how well different ancestry models worked in whose hands exactly, and if anyone actually pointed out that this sample was derived, and not ancestral, to many different samples that were used in combination with it. It would also be a great control to check if those still supporting a Sredni Stog origin for PIE would shift their preference even more to the north or west, depending on where the first “true” R1a-M417 samples popped up. Such a finding now could be thus a great tool to discover whether haplogroup-based bias plays a role in ancestry magic as related to the Indo-European question, i.e. if it really is about “pure statistics”, or there is something else to it…

II.2. R1b-L51-rich Bell Beakers

The overwhelming majority of R1b-L51 lineages in Radovesice during the Bell Beaker period, just after the sampled Corded Ware individuals from the same site, further strengthens the hypothesis of an almost full replacement of R1a-M417 lineages from Central Europe up to southern Scandinavia after the arrival of Bell Beakers.

Yet another R1b-L151* sample has popped up in Central Europe, in the individual classified as Bilina_BA (ca. 2200-800 BC), which clusters with Bell Beakers from Bohemia, with the outlier from Turlojiškė, and with Early Slavs, suggesting once again that a group of central-east European Beakers represented the Pre-Proto-Balto-Slavic community before their spread and admixture events to the east.

The available ancient distribution of R1b-L51*, R1b-L52* or R1b-L151* is getting thus closer to the most likely origin of R1b-L51 in the expansion of East Bell Beakers, who trace their paternal ancestors to Yamnaya settlers from the Carpathian Basin:

NOTE. Some of these are from other sources, and some are samples I have checked in a hurry, so I may have missed some derived SNPs. If you send me a corrected SNP call to dismiss one of these, or more ‘archaic’ samples, I’ll correct the map accordingly. See also maps of modern distribution of R1b-M269 subclades.

r1b-l51-ancient-europe
Distribution of ‘archaic’ R1b-L51 subclades in ancient samples, overlaid over a map of Yamnaya and Bell Beaker migrations. In blue, Yamnaya Pre-L51 from Lopatino (not shown) and R1b-L52* from BBC Augsburg. In violet, R1b-L51 (xP312,xU106) from BBC Prague and Poland. In maroon, hg. R1b-L151* from BBC Hungary, BA Bohemia, and (not shown) a potential sample from BBC at Mondelange, which is certainly xU106, maybe xP312. Interestingly, the earliest sample of hg. R1b-U106 (a lineage more typical of northern Europe) has been found in a Bell Beaker from Radovesice (ca. 2350 BC), between two of these ‘archaic’ R1b-L51 samples; and a sample possibly of hg. R1b-ZZ11+ (ancestral to DF27 and U152) was found in a Bell Beaker from Quedlinburg, Germany (ca. 2290 BC), to the north-west of Bohemia. The oldest R1b-U152 are logically from Central Europe, too.

III. Proto-Indo-Iranian

Before the emergence of Proto-Indo-Iranian, it seems that Pre-Proto-Indo-Iranian-speaking Poltavka groups were subjected to pressure from Central_Steppe_EMBA-related peoples coming from the (south-?)east, such as those found sampled from Mereke_BA. Their ‘kurgan’ culture was dated correctly to approximately the same date as Poltavka materials, but their ancestry and hg. N2(pre-N2a) – also found in a previous sample from Botai – point to their intrusive nature, and thus to difficulties in the Pre-Proto-Indo-Iranian community to keep control over the previous East Yamnaya territory in the Don-Volga-Ural steppes.

We know that the region does not show genetic continuity with a previous period (or was not under this ‘eastern’ pressure) because of an Eastern Yamnaya sample from the same site (ca. 3100 BC) showing typical Yamnaya ancestry. Before Yamnaya, it is likely that Pre-Yamnaya ancestry formed through admixture of EHG-like Khvalynsk with a North Caspian steppe population similar to the Steppe_Eneolithic samples from the North Caucasus Piedmont (see Anthony 2019), so we can also rule out some intermittent presence of a Botai/Kelteminar-like population in the region during the Khvalynsk period.

It is very likely, then, that this competition for the same territory – coupled with the known harsher climate of the late 3rd millennium BC – led Poltavka herders to their known joint venture with Abashevo chiefs in the formation of the Sintashta-Potapovka-Filatovka community of fortified settlements. Supporting these intense contacts of Poltavka herders with Central Asian populations, late ‘outliers’ from the Volga-Ural region show admixture with typical Central_Steppe_MLBA populations: one in Potapovka (ca. 2220 BC), of hg. R1b-Z2103; and four in the Sintashta_MLBA_o1 cluster (ca. 2050-1650 BC), with two samples of hg. R1b-L23 (one R1b-Z2109), one Q1b-L56(xL53), one Q1b-Y6798.

central-steppe-pastoralists
Outlier analysis reveals ancient contacts between sites. We plot the average of principal component 1 (x axis) and principal component 2 (y axis) for the West Eurasian and All Eurasian PCA plots (…). In the Middle to Late Bronze Age Steppe, we observe, in addition to the Western_Steppe_MLBA and Central_Steppe_MLBA clusters (indistinguishable in this projection), outliers admixed with other ancestries. The BMAC-related admixture in Kazakhstan documents northward gene flow onto the Steppe and confirms the Inner Asian Mountain Corridor as a conduit for movement of people.

Similar to how the Sintashta_MLBA_o2 cluster shows an admixture with central steppe populations and hg. R1a-Z645, the WSHG ancestry in those outliers from the o1 cluster of typically (or potentially) Yamnaya lineages shows that Poltavka-like herders survived well after centuries of Abashevo-Poltavka coexistence and admixture events, supporting the formation of a Proto-Indo-Iranian community from the local language as pronounced by the incomers, who dominated as elites over the fortified settlements.

The Proto-Indo-Iranian community likely formed thus in situ in the Don-Volga-Ural region, from the admixture of locals of Yamnaya ancestry with incomers of Corded Ware ancestry – represented by the ca. 67% Yamnaya-like ancestry and ca. 33% ancestry from the European cline. Their community formed thus ca. 1,000 years later than the expansion of Late PIE ca. 3500 BC, and expanded (some 500 years after that) a full-fledged Proto-Indo-Iranian language with the Srubna-Andronovo horizon, further admixing with ca. 9% of Central_Steppe_EMBA (WSHG-related) ancestry in their migration through Central Asia, as reported in the paper.
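As a rough illustration of the arithmetic behind those figures, the following minimal sketch shows what a later ca. 9% contribution does to the earlier ca. 67% / ca. 33% split. It assumes a single admixture pulse that dilutes the existing components proportionally, which is a simplification and not the model used in the paper; the function name `dilute` and the dictionary labels are hypothetical.

```python
def dilute(components, incoming_name, incoming_share):
    """Scale existing ancestry shares by (1 - incoming_share) and add the new source.

    Assumption: one admixture pulse, proportional dilution of all prior components.
    """
    if not 0.0 <= incoming_share <= 1.0:
        raise ValueError("incoming_share must be between 0 and 1")
    mixed = {name: share * (1.0 - incoming_share) for name, share in components.items()}
    mixed[incoming_name] = mixed.get(incoming_name, 0.0) + incoming_share
    return mixed


# Approximate proportions quoted in the text (ca. 67% / ca. 33%, then ca. 9%).
sintashta_like = {"Yamnaya-like": 0.67, "European cline": 0.33}
andronovo_like = dilute(sintashta_like, "Central_Steppe_EMBA", 0.09)

for name, share in andronovo_like.items():
    print(f"{name}: {share:.1%}")
```

Under that assumption the Yamnaya-like share drops to roughly 61% and the European-cline share to roughly 30%, so the relative weight of the two original components stays the same.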

IV. Armenian

The sample from Hajji Firuz, of hg. R1b-Z2103 (xPF331), has been – as expected – re-dated to the Iron Age (ca. 1193-1019 BC), hence it may offer – together with the samples from the Levant and their Aegean-like ancestry rapidly diluted among local populations – yet another proof of how the Late Bronze Age upheaval in Europe was the cause of the Armenian migration to the Armenoid homeland, where they thrived under the strong influence from Hurro-Urartian.

middle-east-armenia-y-dna
Y-chromosome haplogroups of the Middle East and neighbouring groups during the Late Bronze Age / Iron Age. See full maps.

Indus Valley Civilization and Dravidian

A surprise came from the analysis reported by Shinde et al. (2019) of an Iran_N-related IVC ancestry which may have split earlier than 10000 BC from a source common to Iran hunter-gatherers of the Belt Cave.

For the controversial Elamo-Dravidian hypothesis of the Muscovite school, this difference in ancestry between both groups (IVC and Iran Neolithic) seems to be a death blow, if population genomics was even needed for that. Nevertheless, I guess that a full rejection of a recent connection will come down to more recent and subtle population movements in the area.

EDIT (12 SEP): Apparently, Iosif Lazaridis is not so sure about this deep splitting of ‘lineages’ as shown in the paper, so we may be talking about different contributions of AME+ANE/ENA, which means the Elamo-Dravidian game is afoot; at least in genomics:

I shared the idea that the Indus Valley Civilization was linked to the Proto-Dravidian community, so I’m inclined to support this statement by Narasimhan, Patterson, et al. (2019), even if based only on modern samples and a few ancient ones:

The strong correlation between ASI ancestry and present-day Dravidian languages suggests that the ASI, which we have shown formed as groups with ancestry typical of the Indus Periphery Cline moved south and east after the decline of the IVC to mix with groups with more AASI ancestry, most likely spoke an early Dravidian language.

india-steppe-indus-valley-andamanese-ancestry
Natural neighbour interpolation of qpAdm results – Maximum A Posteriori Estimate from the Hierarchical Model (estimates used in the Narasimhan, Patterson et al. 2019 figures) for Central_Steppe_MLBA-related (left), Indus_Periphery_West-related (center) and Andamanese_Hunter-Gatherer-related ancestry (right) among sampled modern Indian populations. In blue, peoples of IE language; in red, Dravidian; in pink, Tibeto-Burman; in black, unclassified. See full image.

I am wary of this sort of simplistic correlation with modern speakers, because we have seen what happened with the wrong assumptions about modern Balto-Slavic and Finno-Ugric speakers and their genetic profile (see e.g. here or here). In fact, I just can’t distinguish the social stratification of the different tribal groups – with their endogamous rules under the varna and jati systems – in the ancestry maps of modern India as well as those with deep knowledge of South Asian history can. The pattern of ancestry and language distribution combined with the findings of ancient populations seems in principle straightforward, though.

Conclusion

The message to take home from Shinde et al. (2019) is that genomic data is fully at odds with the Anatolian homeland hypothesis – including the latest model by Heggarty (2014)* – whose relevance is still overvalued today, probably due in part to the shift of OIT proponents to more reasonable Out-of-Iran models, apparently more fashionable as a vector of Indo-Aryan languages than Eurasian steppe pastoralists?
*The authors listed this model erroneously as Heggarty (2019).

The paper seems to play with the occasional reference to Corded Ware as a vector of expansion of Indo-European languages, even after accepting the role of Yamnaya as the most evident population expanding Late PIE to western Europe – and the different ancestry that spread with Indo-Iranian to South Asia 1,000 years later. However, the most cringe-worthy aspect is the sole citation of the debunked, pseudoscientific glottochronological method used by Ringe, Warnow, and Taylor (2002) to support the so-called “steppe homeland”, a paper and dialectal scheme which keeps being referenced in papers of the Reich Lab, probably as a consequence of its use in Anthony (2007).

On the other hand, these are the equivalent simplistic comments in Narasimhan, Patterson et al. (2019):

The Steppe ancestry in South Asia has the same profile as that in Bronze Age Eastern Europe, tracking a movement of people that affected both regions and that likely spread the unique features shared between Indo-Iranian and Balto-Slavic languages. (…), which despite their vast geographic separation share the “satem” innovation and “ruki” sound laws.

mallory-adams-tree
Indo-European dialectal relationships, from Mallory and Adams (2006).

The only academic closely related to linguistics from the list of authors, as far as I know, is James P. Mallory, who has supported a North-West Indo-European dialect (including Balto-Slavic) for a long time – recently associating its expansion with Bell Beakers – opposed thus to a Graeco-Aryan group which shared certain innovations, “Satemization” not being one of them. Not that anyone needs to be a linguist to dismiss any similarities between Balto-Slavic and Indo-Iranian beyond this phonetic trend, mind you.

Even Anthony (2019) supports now R1b-rich Pre-Yamnaya and Yamnaya communities from the Don-Volga region expanding Middle and Late Proto-Indo-European dialects.

So how does the underlying Corded Ware ancestry of eastern Europe (where Pre-Balto-Slavs eventually spread to from Bell Beaker-derived groups) and of the highly admixed (“cosmopolitan”, according to the authors) Sintashta-Potapovka-Filatovka in the east relate to the similar-but-different phonetic trends of two unrelated IE dialects?

If only there was a language substrate that could (as Shinde et al. put it) “elegantly” explain this similar phonetic evolution, solving at the same time the question of the expansion of Uralic languages and their strong linguistic contacts with steppe peoples. Say, Eneolithic populations of mainly hunter-fisher-gatherers from the North Pontic forest-steppes with a stronger connection to metalworking

Related

Yamnaya ancestry: mapping the Proto-Indo-European expansions

steppe-ancestry-expansion-europe

The latest papers from Ning et al. Cell (2019) and Anthony JIES (2019) have offered some interesting new data, supporting once more what could be inferred since 2015, and what was evident in population genomics since 2017: that Proto-Indo-Europeans expanded under R1b bottlenecks, and that the so-called “Steppe ancestry” referred to two different components, one – Yamnaya or Steppe_EMBA ancestry – expanding with Proto-Indo-Europeans, and the other one – Corded Ware or Steppe_MLBA ancestry – expanding with Uralic speakers.

The following maps are based on formal stats published in the papers and supplementary materials from 2015 until today, mainly on Wang et al. (2018 & 2019), Mathieson et al. (2018) and Olalde et al. (2018), and others like Lazaridis et al. (2016), Lazaridis et al. (2017), Mittnik et al. (2018), Lamnidis et al. (2018), Fernandes et al. (2018), Jeong et al. (2019), Olalde et al. (2019), etc.

NOTE. As in the Corded Ware ancestry maps, the selected reports in this case are centered on the prototypical Yamnaya ancestry vs. other simplified components, so everything else refers to simplistic ancestral components widespread across populations that do not necessarily share any recent connection, much less a language. In fact, most of the time they clearly didn’t. They can be interpreted as “EHG that is not part of the Yamnaya component”, or “CHG that is not part of the Yamnaya component”. They can’t be read as “expanding EHG people/language” or “expanding CHG people/language”, at least no more than maps of “Steppe ancestry” can be read as “expanding Steppe people/language”. Also, remember that I have left the default behaviour for color classification, so that the highest value (i.e. 1, or white colour) could mean anything from 10% to 100% depending on the specific ancestry and period; that’s what the legend is for… But, fere libenter homines id quod volunt credunt.
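To make the caveat about the colour scale concrete, here is a minimal sketch of per-map rescaling. It assumes the mapping tool simply divides each value by the map's own maximum, which is a guess at the "default behaviour" and not a description of the actual software; the function name and the proportions are hypothetical and not taken from any paper.

```python
def rescale_to_max(values):
    """Divide every ancestry proportion by the map's maximum, so the top colour is 1."""
    top = max(values)
    if top == 0:
        return [0.0 for _ in values]
    return [v / top for v in values]


# Hypothetical proportions, invented for illustration only.
map_a = [0.00, 0.05, 0.10]  # a component that peaks at 10%
map_b = [0.00, 0.50, 1.00]  # a component that peaks at 100%

print(rescale_to_max(map_a))
print(rescale_to_max(map_b))
```

Both maps end up with the same colour values, even though the brightest point stands for 10% in one and 100% in the other, which is why the legend has to be read for each map separately.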

Sections:

  1. Neolithic or the formation of Early Indo-European
  2. Eneolithic or the expansion of Middle Proto-Indo-European
  3. Chalcolithic / Early Bronze Age or the expansion of Late Proto-Indo-European
  4. European Early Bronze Age and MLBA or the expansion of Late PIE dialects

1. Neolithic

Anthony (2019) agrees with the most likely explanation of the CHG component found in Yamnaya, as derived from steppe hunter-fishers close to the lower Volga basin. The ultimate origin of this specific CHG-like component that eventually formed part of the Pre-Yamnaya ancestry is not clear, though:

The hunter-fisher camps that first appeared on the lower Volga around 6200 BC could represent the migration northward of un-admixed CHG hunter-fishers from the steppe parts of the southeastern Caucasus, a speculation that awaits confirmation from aDNA.

neolithic-chg-ancestry
Natural neighbor interpolation of CHG ancestry among Neolithic populations. See full map.

The typical EHG component that eventually formed part of Pre-Yamnaya ancestry came from the Middle Volga Basin, most likely close to the Samara region, as shown by the sampled Samara hunter-gatherer (ca. 5600-5500 BC):

After 5000 BC domesticated animals appeared in these same sites in the lower Volga, and in new ones, and in grave sacrifices at Khvalynsk and Ekaterinovka. CHG genes and domesticated animals flowed north up the Volga, and EHG genes flowed south into the North Caucasus steppes, and the two components became admixed.

neolithic-ehg-ancestry
Natural neighbor interpolation of EHG ancestry among Neolithic populations. See full map.

To the west, in the Dnieper-Dniester area, WHG became the dominant ancestry after the Mesolithic, at the expense of EHG, revealing a likely mating network reaching to the north into the Baltic:

Like the Mesolithic and Neolithic populations here, the Eneolithic populations of Dnieper-Donets II type seem to have limited their mating network to the rich, strategic region they occupied, centered on the Rapids. The absence of CHG shows that they did not mate frequently if at all with the people of the Volga steppes (…)

neolithic-whg-ancestry
Natural neighbor interpolation of WHG ancestry among Neolithic populations. See full map.

North-West Anatolia Neolithic ancestry, typical of expanding Early European farmers, is found up to the border of the Dniester, as Anthony (2007) had predicted.

neolithic-anatolia-farmer-ancestry
Natural neighbor interpolation of Anatolia Neolithic ancestry among Neolithic populations. See full map.

2. Eneolithic

From Anthony (2019):

After approximately 4500 BC the Khvalynsk archaeological culture united the lower and middle Volga archaeological sites into one variable archaeological culture that kept domesticated sheep, goats, and cattle (and possibly horses). In my estimation, Khvalynsk might represent the oldest phase of PIE.

(…) this middle Volga mating network extended down to the North Caucasian steppes, where at cemeteries such as Progress-2 and Vonyuchka, dated 4300 BC, the same Khvalynsk-type ancestry appeared, an admixture of CHG and EHG with no Anatolian Farmer ancestry, with steppe-derived Y-chromosome haplogroup R1b. These three individuals in the North Caucasus steppes had higher proportions of CHG, overlapping Yamnaya. Without any doubt, a CHG population that was not admixed with Anatolian Farmers mated with EHG populations in the Volga steppes and in the North Caucasus steppes before 4500 BC. We can refer to this admixture as pre-Yamnaya, because it makes the best currently known genetic ancestor for EHG/CHG R1b Yamnaya genomes.

From Wang et al. (2019):

Three individuals from the sites of Progress 2 and Vonyuchka 1 in the North Caucasus piedmont steppe (‘Eneolithic steppe’), which harbour EHG and CHG related ancestry, are genetically very similar to Eneolithic individuals from Khvalynsk II and the Samara region. This extends the cline of dilution of EHG ancestry via CHG-related ancestry to sites immediately north of the Caucasus foothills

eneolithic-pre-yamnaya-ancestry
Natural neighbor interpolation of Pre-Yamnaya ancestry among Eneolithic populations. See full map. This map corresponds roughly to the map of Khvalynsk-Novodanilovka expansion, and in particular to the expansion of horse-head pommel-scepters (read more about Khvalynsk, and specifically about horse symbolism).

NOTE. Unpublished samples from Ekaterinovka have been previously reported as within the R1b-L23 tree. Interestingly, although the Varna outlier is a female, the Balkan outlier from Smyadovo shows two positive SNP calls for hg. R1b-M269. However, its poor coverage makes its most conservative haplogroup prediction R-M343.

The formation of this Pre-Yamnaya ancestry sets this Volga-Caucasus Khvalynsk community apart from the rest of the EHG-like population of eastern Europe.

eneolithic-ehg-ancestry
Natural neighbor interpolation of non-Pre-Yamnaya EHG ancestry among Eneolithic populations. See full map.

Anthony (2019) seems to rely on ADMIXTURE graphics when he writes that the late Sredni Stog sample from Alexandria shows “80% Khvalynsk-type steppe ancestry (CHG&EHG)”. While this seems the most logical conclusion of what might have happened after the Suvorovo-Novodanilovka expansion through the North Pontic steppes (see my post on “Steppe ancestry” step by step), formal stats have not confirmed that.

In fact, analyses published in Wang et al. (2019) rejected that Corded Ware groups are derived from this Pre-Yamnaya ancestry, a reality that had been already hinted at in Narasimhan et al. (2018), when Steppe_EMBA showed a poor fit for expanding Srubna-Andronovo populations. Hence the need to consider the whole CHG component of the North Pontic area separately:

eneolithic-chg-ancestry
Natural neighbor interpolation of non-Pre-Yamnaya CHG ancestry among Eneolithic populations. See full map. You can read more about population movements in the late Sredni Stog and closer to the Proto-Corded Ware period.

NOTE. Fits for WHG + CHG + EHG in Neolithic and Eneolithic populations are taken in part from Mathieson et al. (2019) supplementary materials (download Excel here). Unfortunately, while data on the Ukraine_Eneolithic outlier from Alexandria abounds, I don’t have specific data on the so-called ‘outlier’ from Dereivka compared to the other two analyzed together, so these maps of CHG and EHG expansion are possibly showing a lesser distribution to the west than the real one ca. 4000-3500 BC.

eneolithic-whg-ancestry
Natural neighbor interpolation of WHG ancestry among Eneolithic populations. See full map.

Anatolia Neolithic ancestry clearly spread to the east into the north Pontic area through a Middle Eneolithic mating network, most likely opened after the Khvalynsk expansion:

eneolithic-anatolia-farmer-ancestry
Natural neighbor interpolation of Anatolia Neolithic ancestry among Eneolithic populations. See full map.
eneolithic-iran-chl-ancestry
Natural neighbor interpolation of Iran Chl. ancestry among Eneolithic populations. See full map.

Regarding Y-chromosome haplogroups, Anthony (2019) insists on the evident association of Khvalynsk, Yamnaya, and the spread of Pre-Yamnaya and Yamnaya ancestry with the expansion of elite R1b-L754 (and some I2a2) individuals:

eneolithic-early-y-dna
Y-DNA haplogroups in West Eurasia during the Early Eneolithic in the Pontic-Caspian steppes. See full map, and see culture, ADMIXTURE, Y-DNA, and mtDNA maps of the Early Eneolithic and Late Eneolithic.

3. Early Bronze Age

Data from Wang et al. (2019) show that Corded Ware-derived populations do not have good fits for Eneolithic_Steppe-like ancestry, no matter the model. In other words: Corded Ware populations show not only a higher contribution of Anatolia Neolithic ancestry (ca. 20-30% compared to the ca. 2-10% of Yamnaya); they show a different EHG + CHG combination compared to the Pre-Yamnaya one.

eneolithic-steppe-best-fits
Supplementary Table 13. P values of rank=2 and admixture proportions in modelling Steppe ancestry populations as a three-way admixture of Eneolithic steppe, Anatolian_Neolithic and WHG using 14 outgroups.
Left populations: Test, Eneolithic_steppe, Anatolian_Neolithic, WHG.
Right populations: Mbuti.DG, Ust_Ishim.DG, Kostenki14, MA1, Han.DG, Papuan.DG, Onge.DG, Villabruna, Vestonice16, ElMiron, Ethiopia_4500BP.SG, Karitiana.DG, Natufian, Iran_Ganj_Dareh_Neolithic.

Yamnaya Kalmykia and Afanasievo show the closest fits to the Eneolithic population of the North Caucasian steppes, rejecting thus sizeable contributions from Anatolia Neolithic and/or WHG, as shown by the SD values. Both probably show then a Pre-Yamnaya ancestry closest to the late Repin population.
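The reasoning here is that a component whose estimated share is small relative to its standard error cannot be told apart from zero. A minimal sketch of that check follows, assuming a normal approximation with a ±1.96 standard-error interval; this is an illustrative convention and not necessarily how the paper evaluates its models. The function names and the numbers are hypothetical and not taken from the supplementary tables.

```python
def interval(estimate, se, z=1.96):
    """Return the (low, high) bounds of estimate ± z standard errors."""
    return estimate - z * se, estimate + z * se


def includes_zero(estimate, se, z=1.96):
    """True if the interval around the estimate contains 0, i.e. the share may be nil."""
    low, high = interval(estimate, se, z)
    return low <= 0.0 <= high


# Hypothetical (estimate, standard error) pairs, invented for illustration only.
small_farmer_share = (0.03, 0.02)
large_farmer_share = (0.25, 0.03)

print(includes_zero(*small_farmer_share))  # interval reaches below zero
print(includes_zero(*large_farmer_share))  # interval stays well above zero
```

A group behaving like the first case would be compatible with no contribution at all from that source, while one behaving like the second would clearly need it.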

wang-eneolithic-steppe-caucasus-yamnaya
Modelling results for the Steppe and Caucasus cluster. Admixture proportions based on (temporally and geographically) distal and proximal models, showing additional AF ancestry in Steppe groups and additional gene flow from the south in some of the Steppe groups as well as the Caucasus groups. See tables above. Modified from Wang et al. (2019). Within a blue square, Yamnaya-related groups; within a cyan square, Corded Ware-related groups. Green background behind best p-values. In red circle, SD of AF/WHG ancestry contribution in Afanasievo and Yamnaya Kalmykia, with ranges that almost include 0%.

EBA maps include data from Wang et al. (2018) supplementary materials, specifically unpublished Yamnaya samples from Hungary that appeared in analysis of the preprint, but which were taken out of the definitive paper. Their location among Yamnaya settlers from Hungary is speculative, although most uncovered kurgans in Hungary are concentrated in the Tisza-Danube interfluve.

eba-yamnaya-ancestry
Natural neighbor interpolation of Pre-Yamnaya ancestry among Early Bronze Age populations. See full map. This map corresponds roughly with the known expansion of late Repin/Yamnaya settlers.

The Y-chromosome bottleneck of elite males from Proto-Indo-European clans under R1b-L754 and some I2a2 subclades, already visible in the Khvalynsk sampling, became even more noticeable in the subsequent expansion of late Repin/early Yamnaya elites under R1b-L23 and I2a-L699:

chalcolithic-early-y-dna
Y-DNA haplogroups in West Eurasia during the Yamnaya expansion. See full map and maps of cultures, ADMIXTURE, Y-DNA, and mtDNA of the Early Chalcolithic and Yamnaya Hungary.

Maps of CHG, EHG, Anatolia Neolithic, and probably WHG show the expansion of these components among Corded Ware-related groups in North Eurasia, apart from other cultures close to the Caucasus:

NOTE. For maps with actual formal stats of Corded Ware ancestry from the Early Bronze Age to the modern times, you can read the post Corded Ware ancestry in North Eurasia and the Uralic expansion.

eba-chg-ancestry
Natural neighbor interpolation of non-Pre-Yamnaya CHG ancestry among Early Bronze Age populations. See full map.
eba-ehg-ancestry
Natural neighbor interpolation of non-Pre-Yamnaya EHG ancestry among Early Bronze Age populations. See full map.
eba-whg-ancestry
Natural neighbor interpolation of WHG ancestry among Early Bronze Age populations. See full map.
eba-anatolia-farmer-ancestry
Natural neighbor interpolation of Anatolia Neolithic ancestry among Early Bronze Age populations. See full map.
eba-iran-chl-ancestry
Natural neighbor interpolation of Iran Chl. ancestry among Early Bronze Age populations. See full map.

4. Middle to Late Bronze Age

The following maps show the most likely distribution of Yamnaya ancestry during the Bell Beaker-, Balkan-, and Sintashta-Potapovka-related expansions.

4.1. Bell Beakers

The amount of Yamnaya ancestry is probably overestimated among populations where Bell Beakers replaced Corded Ware. A map of Yamnaya ancestry among Bell Beakers gets trickier for the following reasons:

  • Expanding Repin peoples of Pre-Yamnaya ancestry must have had admixture through exogamy with late Sredni Stog/Proto-Corded Ware peoples during their expansion into the North Pontic area, and Sredni Stog in turn probably had some Pre-Yamnaya admixture, too (although they don’t appear in the simplistic formal stats above). This is supported by the increase of Anatolia farmer ancestry in more western Yamnaya samples.
  • Later, Yamnaya admixed through exogamy with Corded Ware-like populations in Central Europe during their expansion. Even samples from the Middle to Upper Danube and around the Lower Rhine will probably show increasing contributions of Steppe_MLBA, at the same time as they show an increasing proportion of EEF-related ancestry.
  • To complicate things further, the late Corded Ware Esperstedt family (from ca. 2500 BC or later) shows, in turn, what seems like a recent admixture with Yamnaya vanguard groups, with the sample of highest Yamnaya ancestry being the paternal uncle of other individuals (all of hg. R1a-M417), suggesting that there might have been many similar Central European mating networks from the mid-3rd millennium BC on, of (mainly) Yamnaya-like R1b elites displaying a small proportion of CW-like ancestry admixing through exogamy with Corded Ware-like peoples who already had some Yamnaya ancestry.
mlba-yamnaya-ancestry
Natural neighbor interpolation of Yamnaya ancestry among Middle to Late Bronze Age populations (Esperstedt CWC site close to BK_DE, label is hidden by BK_DE_SAN). See full map. You can see how this map correlates with the map of Late Copper Age migrations and Yamnaya into Bell Beaker expansion.

NOTE. Terms like “exogamy”, “male-driven migration”, and “sex bias”, are not only based on the Y-chromosome bottlenecks visible in the different cultural expansions since the Palaeolithic. Despite the scarce sampling available in 2017 for analysis of “Steppe ancestry”-related populations, it appeared to show already a male sex bias in Goldberg et al. (2017), and it has been confirmed for Neolithic and Copper Age population movements in Mathieson et al. (2018) – see Supplementary Table 5. The analysis of male-biased expansion of “Steppe ancestry” in CWC Esperstedt and Bell Beaker Germany is, for the reasons stated above, not very useful to distinguish their mutual influence, though.

Based on data from Olalde et al. (2019), Bell Beakers from Germany are the closest sampled ones to expanding East Bell Beakers, and those close to the Rhine – i.e. French, Dutch, and British Beakers in particular – show a clear excess “Steppe ancestry” due to their exogamy with local Corded Ware groups:

Only one 2-way model fits the ancestry in Iberia_CA_Stp with P-value>0.05: Germany_Beaker + Iberia_CA. Finding a Bell Beaker-related group as a plausible source for the introduction of steppe ancestry into Iberia is consistent with the fact that some of the individuals in the Iberia_CA_Stp group were excavated in Bell Beaker associated contexts. Models with Iberia_CA and other Bell Beaker groups such as France_Beaker (P-value=7.31E-06), Netherlands_Beaker (P-value=1.03E-03) and England_Beaker (P-value=4.86E-02) failed, probably because they have slightly higher proportions of steppe ancestry than the true source population.
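The pass/fail logic in the excerpt above can be sketched in a few lines. This is only an illustration, assuming the 0.05 cutoff stated in the quote; the P-values are copied from the excerpt, Germany_Beaker is left out because its exact P-value is not given there, and the function name `model_fits` is hypothetical.

```python
# P-values of 2-way models Iberia_CA + <Beaker group> for Iberia_CA_Stp,
# as quoted above from Olalde et al. (2019).
# Germany_Beaker is omitted: the excerpt only says its P-value is > 0.05.
P_VALUES = {
    "France_Beaker": 7.31e-06,
    "Netherlands_Beaker": 1.03e-03,
    "England_Beaker": 4.86e-02,
}

ALPHA = 0.05  # threshold stated in the excerpt


def model_fits(p_value, alpha=ALPHA):
    """Hypothetical helper: a model is kept only if its P-value exceeds alpha."""
    return p_value > alpha


# Collect the models that fail the threshold.
rejected = sorted(name for name, p in P_VALUES.items() if not model_fits(p))
print(rejected)
```

Note that England_Beaker fails only narrowly (0.0486 against 0.05), while the French and Dutch models fail by orders of magnitude.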

olalde-iberia-chalcolithic

The exogamy with Corded Ware-like groups in the Lower Rhine Basin seems at this point undeniable, as is the origin of Bell Beakers around the Middle-Upper Danube Basin from Yamnaya Hungary.

To avoid this excess “Steppe ancestry” showing up in the maps, since Bell Beakers from Germany pack the most Yamnaya ancestry among East Bell Beakers outside Hungary (ca. 51.1% “Steppe ancestry”), I equated this maximum with BK_Scotland_Ach (which shows ca. 61.1% “Steppe ancestry”, highest among western Beakers), and applied a simple rule of three for “Steppe ancestry” in Dutch and British Beakers.
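A minimal sketch of how such a rule of three might look, assuming the adjustment is a plain proportional rescaling in which the BK_Scotland_Ach value (ca. 61.1%) is mapped onto the German Beaker maximum (ca. 51.1%). The function name and the example input are hypothetical, and the exact procedure used for the maps may differ.

```python
GERMANY_MAX = 51.1   # ca. % "Steppe ancestry" in Bell Beakers from Germany (from the text)
SCOTLAND_MAX = 61.1  # ca. % "Steppe ancestry" in BK_Scotland_Ach (from the text)


def rescale_steppe(observed_pct):
    """Hypothetical helper: rule of three, so that 61.1% maps onto 51.1%."""
    return observed_pct * GERMANY_MAX / SCOTLAND_MAX


# The western maximum is mapped onto the German maximum.
scotland_adjusted = rescale_steppe(SCOTLAND_MAX)

# A made-up Dutch or British Beaker value (not a reported figure) shrinks by the same ratio.
example_adjusted = rescale_steppe(50.0)
print(round(scotland_adjusted, 1), round(example_adjusted, 1))
```

Under this assumption every adjusted value is simply the reported one multiplied by 51.1/61.1, so the ranking of western Beaker groups relative to each other is unchanged.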

NOTE. Formal stats for “Steppe ancestry” in Bell Beaker groups are available in Olalde et al. (2018) supplementary materials (PDF). I didn’t apply this adjustment to Bk_FR groups because of the R1b Bell Beaker sample from the Champagne/Alsace region reported by Samantha Brunel that will pack more Yamnaya ancestry than any other sampled Beaker to date, hence probably driving the Yamnaya ancestry up in French samples.

The most likely outcome in the following years, when Yamnaya and Corded Ware ancestry are investigated separately, is that Yamnaya ancestry will be much lower the farther away a group is from the Middle and Lower Danube region, similar to the case in Iberia, so the map above probably overestimates this component in most Beakers to the north of the Danube. Even the late Hungarian Beaker samples, who pack the highest Yamnaya ancestry (up to 75%) among Beakers, likely represent a back-migration of Moravian Beakers, and will probably show a contribution of Corded Ware ancestry due to the exogamy with local Moravian groups.

Despite this decreasing admixture as Bell Beakers spread westward, the explosive expansion of Yamnaya R1b male lineages (in the words of David Reich) and the radical replacement of local ones – whether derived from Corded Ware or Neolithic groups – shows the true extent of the North-West Indo-European expansion in Europe:

chalcolithic-late-y-dna
Y-DNA haplogroups in West Eurasia during the Bell Beaker expansion. See full map and see maps of cultures, ADMIXTURE, Y-DNA, and mtDNA of the Late Copper Age and of the Yamnaya-Bell Beaker transition.

4.2. Palaeo-Balkan

Data on Palaeo-Balkan movements are still scarce, although it is known that:

  1. Yamnaya ancestry appears among Mycenaeans, with the Yamnaya Bulgaria sample being its best current ancestral fit;
  2. the emergence of steppe ancestry and R1b-M269 in the eastern Mediterranean was associated with Ancient Greeks;
  3. Thracians, Albanians, and Armenians also show R1b-M269 subclades and “Steppe ancestry”.

4.3. Sintashta-Potapovka-Filatovka

Interestingly, Potapovka is the only Corded Ware-derived culture that shows good fits for Yamnaya ancestry, despite having replaced Poltavka in the region under the same Corded Ware-like (Abashevo) influence as Sintashta.

This proves that there was a period of admixture in the Pre-Proto-Indo-Iranian community between CWC-like Abashevo and Yamnaya-like Catacomb-Poltavka herders in the Sintashta-Potapovka-Filatovka community, probably more easily detectable in this group because of the specific temporal and geographic sampling available.

srubnaya-yamnaya-ehg-chg-ancestry
Supplementary Table 14. P values of rank=3 and admixture proportions in modelling Steppe ancestry populations as a four-way admixture of distal sources EHG, CHG, Anatolian_Neolithic and WHG using 14 outgroups.
Left populations: Steppe cluster, EHG, CHG, WHG, Anatolian_Neolithic
Right populations: Mbuti.DG, Ust_Ishim.DG, Kostenki14, MA1, Han.DG, Papuan.DG, Onge.DG, Villabruna, Vestonice16, ElMiron, Ethiopia_4500BP.SG, Karitiana.DG, Natufian, Iran_Ganj_Dareh_Neolithic.

Srubnaya ancestry shows a best fit with non-Pre-Yamnaya ancestry, i.e. with different CHG + EHG components – possibly because the more western Potapovka (ancestral to Proto-Srubnaya Pokrovka) also showed good fits for it. Srubnaya shows poor fits for Pre-Yamnaya ancestry probably because Corded Ware-like (Abashevo) genetic influence increased during its formation.

On the other hand, more eastern Corded Ware-derived groups like Sintashta and its more direct offshoot Andronovo show poor fits with this model, too, but their fits are still better than those including Pre-Yamnaya ancestry.

mlba-ehg-ancestry
Natural neighbor interpolation of non-Pre-Yamnaya EHG ancestry among Middle to Late Bronze Age populations. See full map.
mlba-chg-ancestry
Natural neighbor interpolation of non-Pre-Yamnaya CHG ancestry among Middle to Late Bronze Age populations. See full map.
mlba-anatolia-farmer-ancestry
Natural neighbor interpolation of Anatolia Neolithic ancestry among Middle to Late Bronze Age populations. See full map.
mlba-iran-chl-ancestry
Natural neighbor interpolation of Iran Chl. ancestry among Middle to Late Bronze Age populations. See full map.

NOTE. For maps with actual formal stats of Corded Ware ancestry from the Early Bronze Age to the modern times, you should read the post Corded Ware ancestry in North Eurasia and the Uralic expansion instead.

The bottleneck of Proto-Indo-Iranians under R1a-Z93 was not yet complete by the time when the Sintashta-Potapovka-Filatovka community expanded with the Srubna-Andronovo horizon:

early-bronze-age-y-dna
Y-DNA haplogroups in West Eurasia during the European Early Bronze Age. See full map and see maps of cultures, ADMIXTURE, Y-DNA, and mtDNA of the Early Bronze Age.

4.4. Afanasevo

At the end of the Afanasevo culture, at least three samples show hg. Q1b (ca. 2900-2500 BC), which seemed to point to a resurgence of local lineages, despite continuity of the prototypical Pre-Yamnaya ancestry. On the other hand, Anthony (2019) makes this cryptic statement:

Yamnaya men were almost exclusively R1b, and pre-Yamnaya Eneolithic Volga-Caspian-Caucasus steppe men were principally R1b, with a significant Q1a minority.

Since the only available samples from the Khvalynsk community are R1b (x3), Q1a(x1), and R1a(x1), it seems strange that Anthony would talk about a “significant minority”, unless Q1a (potentially Q1b in the newer nomenclature) will pop up in some more individuals of those ca. 30 new to be published. Because he also mentions I2a2 as appearing in one elite burial, it seems Q1a (like R1a-M459) will not appear under elite kurgans, although it is still possible that hg. Q1a was involved in the expansion of Afanasevo to the east.

middle-bronze-age-y-dna
Y-DNA haplogroups in West Eurasia during the Middle Bronze Age. See full map and see maps of cultures, ADMIXTURE, Y-DNA, and mtDNA of the Middle Bronze Age and the Late Bronze Age.

Okunevo, which replaced Afanasevo in the Altai region, shows a majority of hg. Q1b, but also some R1b-M269 samples typical of Afanasevo, suggesting partial genetic continuity.

NOTE. Other sampled Siberian populations clearly show a variety of Q subclades that likely expanded during the Palaeolithic, such as Baikal EBA samples from Ust’Ida and Shamanka with a majority of Q1b, and hg. Q reported from Elunino, Sagsai, Khövsgöl, and also among peoples of the Srubna-Andronovo horizon (the Krasnoyarsk MLBA outlier), and in Karasuk.

From Damgaard et al. Science (2018):

(…) in contrast to the lack of identifiable admixture from Yamnaya and Afanasievo in the CentralSteppe_EMBA, there is an admixture signal of 10 to 20% Yamnaya and Afanasievo in the Okunevo_EMBA samples, consistent with evidence of western steppe influence. This signal is not seen on the X chromosome (qpAdm P value for admixture on X 0.33 compared to 0.02 for autosomes), suggesting a male-derived admixture, also consistent with the fact that 1 of 10 Okunevo_EMBA males carries a R1b1a2a2 Y chromosome related to those found in western pastoralists. In contrast, there is no evidence of western steppe admixture among the more eastern Baikal region Bronze Age (~2200 to 1800 BCE) samples.

This Yamnaya ancestry has also recently been found to be the best fit for the Iron Age population of Shirenzigou in Xinjiang – where Tocharian languages were attested centuries later – despite the haplogroup diversity acquired during their evolution, likely through an intermediate Chemurchek culture (see a recent discussion on the elusive Proto-Tocharians).

Haplogroup diversity seems to be common in Iron Age populations all over Eurasia, most likely due to the spread of different types of sociopolitical structures where alliances played a more relevant role in the expansion of peoples. A well-known example of this is the spread of Akozino warrior-traders in the whole Baltic region under a partial N1a-VL29-bottleneck associated with the emerging chiefdom-based systems under the influence of expanding steppe nomads.

early-iron-age-y-dna
Y-DNA haplogroups in West Eurasia during the Early Iron Age. See full map and see maps of cultures, ADMIXTURE, Y-DNA, and mtDNA of the Early Iron Age and Late Iron Age.

Surprisingly, then, Proto-Tocharians from Shirenzigou pack up to 74% Yamnaya ancestry, in spite of the 2,000 years that separate them from the demise of the Afanasevo culture. They show more Yamnaya ancestry than any other population by that time, being thus a sort of Late PIE fossils not only in their archaic dialect, but also in their genetic profile:

shirenzigou-afanasievo-yamnaya-andronovo-srubna-ulchi-han

The recent intrusion of Corded Ware-like ancestry, as well as the variable admixture with Siberian and East Asian populations, both point to the known intense Old Iranian and Old/Middle Chinese contacts. The scarce Proto-Samoyedic and Proto-Turkic loans in Tocharian suggest a rather loose, probably more distant connection with East Uralic and Altaic peoples from the forest-steppe and steppe areas to the north (read more about external influences on Tocharian).

Interestingly, both R1b samples, MO12 and M15-2 – likely of Asian R1b-PH155 branch – show a best fit for Andronovo/Srubna + Hezhen/Ulchi ancestry, suggesting a likely connection with Iranians to the east of Xinjiang, who later expanded as the Wusun and Kangju. How they might have been related to Huns and Xiongnu individuals, who also show this haplogroup, is yet unknown, although Huns also show hg. R1a-Z93 (probably most R1a-Z2124) and Steppe_MLBA ancestry, earlier associated with expanding Iranian peoples of the Srubna-Andronovo horizon.

All in all, it seems that prehistoric movements explained through the lens of genetic research fit perfectly well the linguistic reconstruction of Proto-Indo-European and Proto-Uralic.

Related

Corded Ware ancestry in North Eurasia and the Uralic expansion

uralic-clines-nganasan

Now that it has become evident that Late Repin (i.e. Yamnaya/Afanasevo) ancestry was associated with the migration of R1b-L23-rich Late Proto-Indo-Europeans from the steppe in the second half of the 4th millennium BC, there’s still the question of how R1a-rich Uralic speakers of Corded Ware ancestry expanded, and how they spread their languages throughout North Eurasia.

Modern North Eurasians

I have been collecting information from the supplementary data of the latest papers on modern and ancient North Eurasian peoples, including Jeong et al. (2019), Saag et al. (2019), Sikora et al. (2018), or Flegontov et al. (2019), and I have tried to add up their information on ancestral components and their modern and historical distributions.

Fortunately, the current obsession with simplifying ancestry components into three or four general, atemporal groups, and the common use of the same ones across labs, make it very simple to merge data and map them.

Corded Ware ancestry

There is no doubt about the prevalent ancestry among Uralic-speaking peoples. A map isn’t needed to realize that, because ancient and modern data – like those recently summarized in Jeong et al. (2019) – prove it. But maps sure help visualize their intricate relationship better:

natural-modern-srubnaya-ancestry
Natural neighbor interpolation of Srubnaya ancestry among modern populations. See full map.
kriging-modern-srubnaya-ancestry
Kriging interpolation of Srubnaya ancestry among modern populations. See full map.

Interestingly, the regions with higher Corded Ware-related ancestry are in great part coincident with (pre)historical Finno-Ugric-speaking territories:

uralic-languages-modern
Modern distribution of Uralic languages, with ancient territory (in the Common Era) labelled and delimited by a red line. For more information on the ancient territory see here.

Edit (29/7/2019): Here is the full Steppe_MLBA ancestry map, including Steppe_MLBA (vs. Indus Periphery vs. Onge) in modern South Asian populations from Narasimhan et al. (2018), apart from the ‘Srubnaya component’ in North Eurasian populations. ‘Dummy’ variables (with 0% ancestry) have been included to the south and east of the map to avoid weird interpolations of Steppe_MLBA into Africa and East Asia.
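The effect of such ‘dummy’ points can be illustrated with a small sketch. It uses simple inverse-distance weighting as a stand-in, not the natural neighbor or kriging interpolation used for the maps, and it treats longitude/latitude as flat coordinates for brevity. All coordinates and percentages below are made-up values for illustration, not data from any paper.

```python
import math


def idw(points, lon, lat, power=2.0):
    """Inverse-distance-weighted estimate at (lon, lat) from (lon, lat, value) points."""
    num = 0.0
    den = 0.0
    for plon, plat, value in points:
        d = math.hypot(lon - plon, lat - plat)
        if d == 0.0:
            return value  # query sits exactly on a data point
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


# Hypothetical sampled populations: (lon, lat, % ancestry).
samples = [(30.0, 55.0, 40.0), (50.0, 55.0, 30.0)]

# Hypothetical 'dummy' points with 0% ancestry placed far south of the samples.
dummies = [(30.0, 0.0, 0.0), (50.0, 0.0, 0.0)]

query = (40.0, 5.0)  # a location far from any real sample

without_dummies = idw(samples, *query)          # extrapolates the northern values
with_dummies = idw(samples + dummies, *query)   # pulled towards 0% by the dummies
print(round(without_dummies, 1), round(with_dummies, 1))
```

Without the zero-valued points, the estimate far to the south simply repeats the average of the northern samples; with them, it drops close to 0%, which is the behaviour the edit note describes.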

modern-steppe-mlba-ancestry2
Natural neighbor interpolation of Steppe MLBA-like ancestry among modern populations. See full map.

Anatolia Neolithic ancestry

Also interesting are the patterns of non-CWC-related ancestry, in particular the apparent wedge created by expanding East Slavs, which seems to reflect the intrusion of central(-eastern) European ancestry into Finno-Permic territory.

NOTE. Read more on Balto-Slavic hydrotoponymy, on the cradle of Russians as a Finno-Permic hotspot, and about Pre-Slavic languages in North-West Russia.

natural-modern-lbk-en-ancestry
Natural neighbor interpolation of LBK EN ancestry among modern populations. See full map.
kriging-modern-lbk-en-ancestry
Kriging interpolation of LBK EN ancestry among modern populations. See full map.

WHG ancestry

The cline(s) between WHG, EHG, ANE, Nganasan, and Baikal HG are also simplified when some of them are excluded, in this case EHG, which is thus represented in part by WHG, and in part by more eastern ancestries (see below).

modern-whg-ancestry
Natural neighbor interpolation of WHG ancestry among modern populations. See full map.
kriging-modern-whg-ancestry
Kriging interpolation of WHG ancestry among modern populations. See full map.

Arctic, Tundra or Forest-steppe?

Data on Nganasan-related vs. ANE vs. Baikal HG/Ulchi-related ancestry is difficult to map properly, because both ancestry components are usually reported as mutually exclusive, when they are in fact clearly related in an ancestral cline formed by different ancient North Eurasian populations from Siberia.

When it comes to ascertaining the origin of the multiple CWC-related clines among Uralic-speaking peoples, the question is thus how to properly distinguish the proportions of WHG-, EHG-, Nganasan-, ANE or BaikalHG-related ancestral components in North Eurasia, i.e. how did each dialectal group admix with regional groups which formed part of these clines east and west of the Urals.

The truth is, one ought to test specific ancient samples for each “Siberian” ancestry found in the different Uralic dialectal groups, but the simplistic “Siberian” label somehow gets a pass in many papers (see a recent example).

Below are qpAdm results with best fits for Ulchi ancestry, Afontova Gora 3 ancestry, and Nganasan ancestry, but some populations show good fits for more than one of them, with similar proportions, so selecting one necessarily simplifies the distribution of the others.

Ulchi ancestry

modern-ulchi-ancestry
Natural neighbor interpolation of Ulchi ancestry among modern populations. See full map.
kriging-modern-ulchi-ancestry
Kriging interpolation of Ulchi ancestry among modern populations. See full map.

ANE ancestry

natural-modern-ane-ancestry
Natural neighbor interpolation of ANE ancestry among modern populations. See full map.
kriging-modern-ane-ancestry
Kriging interpolation of ANE ancestry among modern populations. See full map.

Nganasan ancestry

modern-nganasan-ancestry
Natural neighbor interpolation of Nganasan ancestry among modern populations. See full map.
kriging-modern-nganasan-ancestry
Kriging interpolation of Nganasan ancestry among modern populations. See full map.

Iran Chalcolithic

A simplistic Iran Chalcolithic-related ancestry is also seen in the Altaic cline(s) which (like Corded Ware ancestry) expanded from Central Asia into Europe – apart from its historical distribution south of the Caucasus:

modern-iran-chal-ancestry
Natural neighbor interpolation of Iran Chalcolithic ancestry among modern populations. See full map.
kriging-modern-iran-neolithic-ancestry
Kriging interpolation of Iran Chalcolithic ancestry among modern populations. See full map.

Other models

The first question I imagine some would like to know is: what about other models? Do they show the same results? Here is the simplistic combination of ancestry components published in Damgaard et al. (2018) for the same or similar populations:

NOTE. As you can see, their selection of EHG vs. WHG vs. Nganasan vs. Natufian vs. Clovis is of little use, but it corroborates the results from other papers, and shows some interesting patterns in combination with those above.

EHG

damgaard-modern-ehg-ancestry
Natural neighbor interpolation of EHG ancestry among modern populations, data from Damgaard et al. (2018). See full map.
damgaard-kriging-ehg-ancestry
Kriging interpolation of EHG ancestry among modern populations. See full map.

Natufian ancestry

damgaard-modern-natufian-ancestry
Natural neighbor interpolation of Natufian ancestry among modern populations, data from Damgaard et al. (2018). See full map.
damgaard-kriging-natufian-ancestry
Kriging interpolation of Natufian ancestry among modern populations. See full map.

WHG ancestry

damgaard-modern-whg-ancestry
Natural neighbor interpolation of WHG ancestry among modern populations, data from Damgaard et al. (2018). See full map.
damgaard-kriging-whg-ancestry
Kriging interpolation of WHG ancestry among modern populations. See full map.

Baikal HG ancestry

damgaard-modern-baikalhg-ancestry
Natural neighbor interpolation of Baikal hunter-gatherer ancestry among modern populations, data from Damgaard et al. (2018). See full map.
damgaard-kriging-baikal-hg-ancestry
Kriging interpolation of Baikal HG ancestry among modern populations. See full map.

Ancient North Eurasians

Once the modern situation is clear, relevant questions are, for example, whether EHG-, WHG-, ANE, Nganasan-, and/or Baikal HG-related meta-populations expanded or became integrated into Uralic-speaking territories.

When did these admixture/migration events happen?

How did the ancient distribution or expansion of Palaeo-Arctic, Baikalic, and/or Altaic peoples affect the current distribution of the so-called “Siberian” ancestry, and of hg. N1a, in each specific population?

NOTE. A little excursus is necessary, because the calculated repetition of a hypothetic opposition “N1a vs. R1a” doesn’t make this dichotomy real:

  1. There was not a single ethnolinguistic community represented by hg. R1a after the initial expansion of Eastern Corded Ware groups, or by hg. N1a-L392 after its initial expansion in Siberia.
  2. Different subclades became incorporated in different ways into Bronze Age and Iron Age communities, most of them without an ethnolinguistic change. For example, N1a subclades became incorporated into North Eurasian populations of different languages, reaching Uralic- and Indo-European-speaking territories of north-eastern Europe during the late Iron Age, at a time when their ancestral origin or language in Siberia was impossible to ascertain. Just like the mix found among Proto-Germanic peoples (R1b, R1a, and I1)* or among Slavic peoples (I2a, E1b, R1a)*, the mix of many Uralic groups showing specific percentages of R1a, N1a, or Q subclades* reflects more or less recent admixture or acculturation events with little impact on their languages.

*other typically northern and eastern European haplogroups are also represented in early Germanic (N1a, I2, E1b, J, G2), Slavic (I1, G2, J) and Finno-Permic (I1, R1b, J) peoples.

ananino-culture-new
Map of archaeological cultures in north-eastern Europe ca. 8th-3rd centuries BC. [The Mid-Volga Akozino group not depicted] Shaded area represents the Ananino cultural-historical society. Fading purple arrows represent likely stepped movements of subclades of haplogroup N for centuries (e.g. Siberian → Ananino → Akozino → Fennoscandia [N-VL29]; Circum-Arctic → forest-steppe [N1, N2]; etc.). Blue arrows represent eventual expansions of Uralic peoples to the north. Modified image from Vasilyev (2002).

The problem with mapping the ancestry of the available sampling of ancient populations is that we lack proper temporal and regional transects. The maps that follow include cultures roughly divided into either “Bronze Age” or “Iron Age” groups, although the difference between samples may span up to 2,000 years.

NOTE. Rough estimates for more external groups (viz. Sweden Battle Axe/Gotland_A for the NW, Srubna from the North Pontic area for the SW, Arctic/Nganasan for the NE, and Baikal EBA/“Ulchi-like” for the SE) have been included to offer a wider interpolated area using data already known.

Bronze Age

Similar to modern populations, the selection of best fit “Siberian” ancestry between Baikal HG vs. Nganasan, both potentially ± ANE (AG3), is an oversimplification that needs to be addressed in future papers.

Corded Ware ancestry

bronze-age-corded-ware-ancestry
Natural neighbor interpolation of Srubnaya ancestry among Bronze Age populations. See full map.

Nganasan-like ancestry

bronze-age-nganasan-like-ancestry
Natural neighbor interpolation of Nganasan-like ancestry among Bronze Age populations. See full map.

Baikal HG ancestry

bronze-age-baikal-hg-ancestry
Natural neighbor interpolation of Baikal Hunter-Gatherer ancestry among Bronze Age populations. See full map.

Afontova Gora 3 ancestry

bronze-age-afontova-gora-ancestry
Natural neighbor interpolation of Afontova Gora 3 ancestry among Bronze Age populations. See full map.

Iron Age

Corded Ware ancestry

Interestingly, the moderate expansion of Corded Ware-related ancestry from the south during the Iron Age may be related to the expansion of hg. N1a-VL29 into the chiefdom-based system of north-eastern Europe, including Ananyino/Akozino and later expanding Akozino warrior-traders around the Baltic Sea.

NOTE. The samples from Levänluhta are centuries older than those from Estonia (and Ingria), and those from Chalmny Varre are modern ones, so this region has to be read as a south-west to north-east distribution from the Iron Age to modern times.

iron-age-corded-ware-ancestry
Natural neighbor interpolation of Srubnaya ancestry among Iron Age populations. See full map.

Baikal HG-like ancestry

The fact that this Baltic N1a-VL29 branch belongs in a group together with typically Avar N1a-B197 supports the Altaic origin of the parent group, which is possibly related to the expansion of Baikalic ancestry and Iron Age nomads:

iron-age-baikal-ancestry
Natural neighbor interpolation of Baikal HG ancestry among Iron Age populations. See full map.

Nganasan-like ancestry

The dilution of Nganasan-like ancestry in an Arctic region featuring “Siberian” ancestry and hg. N1a-L392 at least since the Bronze Age supports the integration of hg. N1a-Z1934, sister clade of Ugric N1a-Z1936, into populations west and east of the Urals with the expansion of Uralic languages to the north into the Tundra region (see here).

The integration of N1a-Z1934 lineages into Finnic-speaking peoples after their migration to the north and east, and the displacement or acculturation of Saami from their ancestral homeland, coinciding with known genetic bottlenecks among Finns, is yet another proof of this evolution:

iron-age-nganasan-ancestry
Natural neighbor interpolation of Nganasan ancestry among Iron Age populations. See full map.

WHG ancestry

Similarly, WHG ancestry doesn’t seem to be related to important population movements throughout the Bronze Age, which excludes the multiple North Eurasian populations that will be found along the clines formed by WHG, EHG, ANE, Nganasan, Baikal HG ancestry as forming part of the Uralic ethnogenesis, although they may be relevant to follow later regional movements of specific populations.

iron-age-whg-ancestry
Natural neighbor interpolation of WHG ancestry among Iron Age populations. See full map.

Conclusion

It seems natural that people used to look at maps of haplogroup distribution from the 2000s, coupled with modern language distributions, and would try to interpret them in a certain way, reaching thus the wrong conclusions whose consequences are especially visible today when ancient DNA keeps contradicting them.

In hindsight, though, assuming that Balto-Slavs expanded with Corded Ware and hg. R1a, or that Uralians expanded with “Siberian” ancestry and hg. N1a, was as absurd as looking at maps of ancestry and haplogroup distribution of ancient and modern Native Americans, trying to divide them into “Germanic” or “Iberian”…

The evolution of each specific region and cultural group of North Eurasia is far from being clear. However, the general trend speaks clearly in favour of an ancient, Bronze Age distribution of North Eurasian ancestry and haplogroups that have decreased, diluted, or become incorporated into expanding Uralians of Corded Ware ancestry, occasionally spreading with inter-regional expansions of local groups.

Given the relatively recent push of Altaic and Indo-European languages into ancestral Uralic-speaking territories, only the ancient Corded Ware expansion remains compatible with the spread of Uralic languages into their historical distribution.

Related

Sea Peoples behind Philistines were Aegeans, including R1b-M269 lineages

New open access paper Ancient DNA sheds light on the genetic origins of early Iron Age Philistines, by Feldman et al. Science Advances (2019) 5(7):eaax0061.

Interesting excerpts (modified for clarity, emphasis mine):

Here, we report genome-wide data from human remains excavated at the ancient seaport of Ashkelon, forming a genetic time series encompassing the Bronze to Iron Age transition. We find that all three Ashkelon populations derive most of their ancestry from the local Levantine gene pool. The early Iron Age population was distinct in its high genetic affinity to European-derived populations and in the high variation of that affinity, suggesting that a gene flow from a European-related gene pool entered Ashkelon either at the end of the Bronze Age or at the beginning of the Iron Age. Of the available contemporaneous populations, we model the southern European gene pool as the best proxy for this incoming gene flow. Last, we observe that the excess European affinity of the early Iron Age individuals does not persist in the later Iron Age population, suggesting that it had a limited genetic impact on the long-term population structure of the people in Ashkelon.

philistines-pca
Ancient genomes (marked with color-filled symbols) projected onto the principal components inferred from present-day west Eurasians (gray circles). The newly reported Ashkelon populations are annotated in the upper corner.

Genetic discontinuity between the Bronze Age and the early Iron Age people of Ashkelon

In comparison to ASH_LBA, the four ASH_IA1 individuals from the following Iron Age I period are, on average, shifted along PC1 toward the European cline and are more spread out along PC1, overlapping with ASH_LBA on one extreme and with the Greek Late Bronze Age “S_Greece_LBA” on the other. Similarly, genetic clustering assigns ASH_IA1 with an average of 14% contribution from a cluster maximized in the Mesolithic European hunter-gatherers labeled “WHG” (shown in blue in Fig. 2B) (15, 22, 26). This component is inferred only in small proportions in earlier Bronze Age Levantine populations (2 to 9%).

In agreement with the PCA and ADMIXTURE results, only European hunter-gatherers (including WHG) and populations sharing a history of genetic admixture with European hunter-gatherers (e.g., as European Neolithic and post-Neolithic populations) produced significantly positive f4-statistics (Z ≥ 3), suggesting that, compared to ASH_LBA, ASH_IA1 has additional European-related ancestry.
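As a rough illustration of how the Z ≥ 3 cutoff in this excerpt is read: the Z-score of an f4-statistic is the estimate divided by its standard error. The sketch below uses hypothetical numbers and hypothetical test-population labels, not values from Feldman et al. (2019), and the helper names are mine.

```python
def f4_z_score(f4_estimate: float, standard_error: float) -> float:
    """Z-score of an f4-statistic: the estimate divided by its standard error."""
    if standard_error <= 0:
        raise ValueError("standard_error must be positive")
    return f4_estimate / standard_error


def is_significantly_positive(z: float, threshold: float = 3.0) -> bool:
    """Apply the Z >= 3 cutoff quoted in the excerpt."""
    return z >= threshold


if __name__ == "__main__":
    # Hypothetical (estimate, standard error) pairs, for illustration only.
    hypothetical_tests = {
        "Test_population_A": (0.0012, 0.0003),
        "Test_population_B": (0.0004, 0.0003),
    }
    for label, (estimate, se) in hypothetical_tests.items():
        z = f4_z_score(estimate, se)
        print(f"{label}: Z = {z:.2f}, significant = {is_significantly_positive(z)}")
```

With these made-up inputs only the first test population would pass the cutoff; the second has a positive estimate that is too small relative to its standard error.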

We find that the PC1 coordinates positively correlate with the proportion of WHG ancestry modeled in the Ashkelon individuals, suggesting that WHG reasonably tag a European-related ancestral component within the ASH_IA1 individuals.

philistines-admixture
We plot the ancestral proportions of the Ashkelon individuals inferred by qpAdm using Iran_ChL, Levant_ChL, and WHG as sources ±1 SEs. P values are annotated under each model. In cases when the three-way model failed (χ² P < 0.05), we plot the fitting two-way model. The WHG ancestry is necessary only in ASH_IA1.

The best supported one (χ² P = 0.675) infers that ASH_IA1 derives around 43% of ancestry from the Greek Bronze Age “Crete_Odigitria_BA” (43.1 ± 19.2%) and the rest from the ASH_LBA population.

(…) only the models including “Sardinian,” “Crete_Odigitria_BA,” or “Iberia_BA” as the candidate population provided a good fit (χ² P = 0.715, 49.3 ± 8.5%; χ² P = 0.972, 38.0 ± 22.0%; and χ² P = 0.964, 25.8 ± 9.3%, respectively). We note that, because of geographical and temporal sampling gaps, populations that potentially contributed the “European-related” admixture in ASH_IA1 could be missing from the dataset.
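To get a feeling for how wide these estimates are, one can turn each "estimate ± 1 SE" into an approximate 95% interval. This is only a back-of-the-envelope sketch: it assumes a normal approximation (estimate ± 1.96 × SE) and clips the result to 0–100%, neither of which is stated in the paper. The numbers are the ones quoted in the excerpt above.

```python
from typing import Dict, Tuple

# Admixture estimates and standard errors (in %) as quoted in the excerpt.
QUOTED_MODELS: Dict[str, Tuple[float, float]] = {
    "Sardinian": (49.3, 8.5),
    "Crete_Odigitria_BA": (38.0, 22.0),
    "Iberia_BA": (25.8, 9.3),
}


def approx_interval(estimate: float, se: float, z: float = 1.96) -> Tuple[float, float]:
    """Approximate interval estimate +/- z*SE, clipped to the 0-100% range.

    The normal approximation is an assumption made for this illustration.
    """
    low = max(0.0, estimate - z * se)
    high = min(100.0, estimate + z * se)
    return low, high


if __name__ == "__main__":
    for source, (estimate, se) in QUOTED_MODELS.items():
        low, high = approx_interval(estimate, se)
        print(f"{source}: {estimate:.1f}% (approx. 95% interval {low:.1f}-{high:.1f}%)")
```

Under this approximation the three intervals overlap, and the one for Crete_Odigitria_BA runs from 0% to roughly 81%, which is why the proportions alone say little about which of the three proxies is closest to the real source.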

The transient impact of the “European-related” gene flow on the Ashkelon gene pool

The ASH_IA2 individuals are intermediate along PC1 between the ASH_LBA ones and the earlier Bronze Age Levantines (Jordan_EBA/Lebanon_MBA) in the west Eurasian PCA (Fig. 2A). Notably, despite being chronologically closer to ASH_IA1, the ASH_IA2 individuals position closer, on average, to the earlier Bronze Age individuals.

philistines-y-dna
See more information on Y-DNA SNP calls, including ASH067 as R1b-M269 (xL151).

The transient excess of European-related genetic affinity in ASH_IA1 can be explained by two scenarios. The early Iron Age European-related genetic component could have been diluted by either the local Ashkelon population to the undetectable level at the time of the later Iron Age individuals or by a gene flow from a population outside of Ashkelon introduced during the final stages of the early Iron Age or the beginning of the later Iron Age.

By modeling ASH_IA2 as a mixture of ASH_IA1 and earlier Bronze Age Levantines/Late Period Egyptian, we infer a range of 7 to 38% of contribution from ASH_IA1, although no contribution cannot be rejected because of the limited resolution to differentiate between Bronze Age and early Iron Age ancestries in this model.

Hg. R1b-M269 and the Aegean

I already predicted this relationship between Philistines and Aegeans (Greeks in particular) months ago, based on linguistics, archaeology, and phylogeography, although it was (and still is) unclear whether these paternal lineages might instead have come from other nearby populations descended from Common Anatolians, given the known intense contacts between Helladic and West Anatolian groups.

luwian-civilization-sea-peoples
The alternative view: The Sea Peoples can be traced back to the Aegean, so they could also have consisted of Luwian petty kingdoms, who had formed an alliance and attacked Hatti from the south.

The deduction process for the Greek connection was quite simple:

Palaeo-Balkan populations

We know that R1b-Z2103 expanded with Yamna, including West Yamna settlers: they appear in Vučedol, which means they formed part of the earliest expansion waves of Yamna settlers into the Carpathian Basin, and they also appear scattered among Bell Beakers (apart from dominating East Yamna and Afanasevo), which suggests that they were possibly one of the most successful lineages during the late Repin/early Yamna expansion.

The “Steppe ancestry” associated with I2a-L699 samples among Balkan BA peoples may have also been associated with recent Bronze Age expansions, and this haplogroup’s presence among modern Balkan peoples may also suggest that it expanded with Palaeo-Balkan languages. Nevertheless, we don’t know which specific lineages and “Steppe ancestry” they represent, sadly.

These samples may well be related to remnants of previous Balkan populations like Cernavodă or Ezero, because there has been no peer-reviewed attempt at distinguishing Khvalynsk-/Novodanilovka- from Sredni Stog- from Yamnaya-related populations (see here), and some groups that are associated with this ancestry, like Corded Ware, are known to be culturally distinct from Yamna.

In any case, Proto-Greeks from the southern Balkans (say, Sitagroi IV and related groups) are probably going to show, based on Palaeo-Balkan substrate and Pre-Greek substrate and on the available Mycenaean samples, a process of decreasing proportion of R1b-Z2103 lineages relative to local ones, and a relatively similar cline of Yamna:EEF ancestry from northern to southern areas, at least in the periods closest to the Yamna expansion.

NOTE. The finding of “archaic” R1b-L389 (R1b-V1636) and R1a-M198 subclades among modern Greeks and the likely Neolithic origin of these paternal lineages around the Caucasus suggest that their presence in Greece may be from any of the more recent migrations that have happened between Anatolia and the Balkans, especially during the Common Era, rather than Indo-Anatolian migrations; probably very recently.

-chalcolithic-late-balkans
Bronze Age cultures in the Balkans and the Aegean. See full map including ancient samples with Y-DNA, mtDNA, and ADMIXTURE.

Minoans and haplogroup J

In the Aegean, it is already evident that the population changed language partly through cultural diffusion, probably through elite domination of Proto-Greek speakers. Whether that happened before the invasion into the Greek Peninsula or after it is unclear, as we discussed recently, because we only have one reported Y-chromosome haplogroup among Mycenaeans, and it is J (probably continuing earlier lineages).

Now we have more samples from the so-called Emporion 2 cluster in Olalde et al. (2019), which shows Mycenaean-like eastern Mediterranean ancestry and 3 (out of 3) samples of haplogroup J, which – given the origin of the colony in Phocaea – may be interpreted as the prevalence of West Anatolian-like ancestry and lineages in the eastern part of the Aegean (and possibly thus south Peloponnese), in line with the modern situation.

NOTE. It does not seem likely that those R or R1b-L23 samples from the Emporion 1 cluster are R1b-Z2103, based on their West European-like ancestry, although they still may be, because – as we know – ancestry (unlike haplogroup) changes too easily to interpret it as an ancestral ethnolinguistic marker.

anatolia-greek-aegean
PCA of ancient samples related to the Aegean, with Minoans, Mycenaeans (including the Emporion 2 cluster in the background), Anatolia N-Ch.-BA, and Levantine BA-LBA populations, including Tel Shadud samples. See more PCAs of ancient Eurasian populations.

Greeks and haplogroup R1b-M269

Therefore, while the presence of R1b-Z2103 among ancient Balkan peoples connected to the Yamna expansion is clear, one might ask if R1b-Z2103 really spread up to the Peloponnese by the time of the Mycenaean Civilization. That has only one indirect answer, and it’s most likely yes.

We already had some R1b-Z2103 among Thracians and around the Armenoid homeland, which offers another clue to the migration of these lineages from the Balkans. The distribution of different “archaic” R1b-Z2103 subclades among modern Balkan populations and around the Aegean offered more support to this conclusion.

But now we have two interesting ancient populations that bear witness to the likely intrusion of R1b-M269 with Proto-Greeks:

An Ancient Greek of hg. R1b

A single ancient sample supports the increase in R1b-Z2103 among Greeks during the “Dorian” invasions that triggered the Dark Ages and the phenomenon of the Aegean Sea Peoples. It comes from a Greek lab study, showing R1b1b (i.e. R1b-P297 in the old nomenclature) as the only Y-chromosome haplogroup obtained from the sampling of the Gulf of Amvrakia (Ambracian Gulf) ca. 470-30 BC, i.e. before the Roman foundation of Nikopolis, hence from people likely from Anaktorion in Ancient Acarnania, of Corinthian origin.

ancient-greeks-y-dna-mtdna

Even with the few data available – and with the caution necessary for this kind of study from non-established labs, which may be subject to many different kinds of errors – one could argue that the western Greek areas, which received different waves of migrants from the north and show a higher frequency of R1b-Z2103 in modern times, were probably more heavily admixed with R1b-Z2103 than southern and eastern areas, which were always dominated by Greek-speaking populations more heavily admixed with locals.

The Dorian invasion and the Greek Dark Ages may thus account for a renewed influx of R1b-Z2103 lineages accompanying the dialects that would eventually help form the Hellenic Koiné. In a sense, it is only natural that demographically stronger populations around the Bronze Age Aegean would suffer a limited (male) population replacement with the succeeding invasions, starting with a higher genetic impact in the north-west and diminishing as they progressed to the south and the east, coupled with stepped admixture events with local populations.

This would therefore be the late equivalent of what happened at the end of the 3rd millennium BC, with Mycenaeans and their genetic continuity with Minoans.

pre-greek-ssos
Distribution of Pre-Greek place-names ending in -ssos/-ssa or -sos/-sa. See original images and more on the south/east cline distribution of Pre-Greek place-names here.

Sea peoples of hg. R1b-M269

Thanks to Wang et al. (2018) supplementary materials we knew that one of the two Levantine LBA II samples from Tel Shadud (final 13th–early 11th c. BC) published in van den Brink (2017) was of hg. R1b-M269 – in fact, the one interpreted as a Canaanite official residing at this site and emulating selected funerary aspects of Egyptian mortuary culture.

Both analyzed samples, this elite individual and a commoner of hg. J buried nearby, were genetically similar and indistinguishable from local populations, though:

Principal Components Analysis of L112 and L126 was carried out within the framework described in Lazaridis et al. (2016). This analysis showed that the two individuals cluster genetically, with similar estimated proportions of ancestry from diverse West Eurasian ancestral sources. These results are consistent with the hypothesis that they derive from the same population, or alternatively that they derive from two quite closely related populations.

We know that ancestry changes easily within a few generations, so there was not much information to go on, except for the fact that – being R1b-M269 – this individual could trace his paternal ancestor at some point to Proto-Indo-Europeans.

One might think that, because many haplogroups in this spreadsheet were wrong, this is also wrong; nevertheless, many haplogroups are correctly identified by Yleaf, and finding R1b-M269 in the Levant after the expansion of Sea Peoples could not be that surprising, because they were most likely related to populations of the Aegean Sea. Any other related hg. R1b (R1b-M73, R1b-V88, even R1b-V1636) wouldn’t fit as well as R1b-M269.

sea-peoples-egypt-rameses-iii

However, the early expansion of Proto-Indo-Aryans into the Middle East, as well as the later expansion of Armenians from the Balkans through Anatolia and of West Iranians from the east may have all potentially been related to this sample. But still, the previous linguistic and archaeological theories concerning the Philistines and the expansion of Sea Peoples in the Levant made this sample a likely (originally) Greek “Dorian” lineage, rather than the other (increasingly speculative) alternatives.

In any case, it was obvious to anyone – that is, to anyone with a minimum knowledge of how population genomics works – that just the two samples from van den Brink (2017) couldn’t be used to get to any conclusions about the ancestral origin of these individuals (or their differences) beyond Levantine peoples, because their ancestry was essentially (i.e. statistically) the same as the other few available ancient samples from nearby regions and similar periods.

If anything, the PCA suggested an origin of the R1b sample closer to Aegean populations relative to the J individual (see PCA above), and amateur models should also have supported this, albeit without any possible confirmation (as with the ASH_IA2 cluster in this paper). However, if you have followed online discussions of the Tel Shadud R1b-M269 sample since it was first mentioned on Eupedia months ago – including another wave of misguided speculation based on the ancestry of both individuals triggered by a discussion on this blog – you have yet more proof of how misleading ancestry analyses can be in the wrong hands.

NOTE. This is the Nth proof (and that only in 2019) of how it’s best to just avoid amateur analyses and interpretations altogether, as I did in the recent publication of the books. All those who didn’t take into account whatever was commented about the ancestry of these samples haven’t lost a single bit of relevant information on Levantine peoples, and have had more time for useful reads, compared to those dedicated to endless void speculation, once again gone awfully wrong, as does everything related to cocky ancient DNA crackpottery 😉

bronze-age-late-aegean
Late Bronze Age population movements in the Eastern Mediterranean and the Middle East. See full map including ancient DNA samples with Y-DNA, mtDNA, and ADMIXTURE.

Admittedly, though, even accepting the evident Mediterranean origin of this lineage, one could have argued that this sample may have been of R1b-L151 subclade, if one were inclined to support the theory that Italic peoples were behind Sea Peoples expanding east – and consequently that the ancestors of Etruscans had migrated eastward into the Aegean (e.g. into Lemnos), so that it could be asserted that Tyrsenian might have been a remnant language of an ancient population of northern Italy.

Philistines

Fortunately, some of the samples recovered in Feldman et al. (2019) that could be analyzed (those of the cluster ASH_IA1) offer a very specific time frame where European ancestry appeared (ca. 1250 BC) before it subsequently became fully diluted (as seen in cluster ASH_IA2) among the prevalent Levantine ancestry of the area.

Also fortunately, this precise cluster shows another R1b-M269 sample, likely R1b-Z2103 (because it is probably xL151), and this sample together with others from the same cluster prove that the ancestry related to the original southern European incomers was:

  1. Recent, related thus to LBA population movements, as expected; and
  2. More closely related to coeval Aegeans, including Mycenaeans with Steppe-related ancestry.

NOTE. I say “fortunately” because, as you can imagine if you have dealt with amateurish discussions long enough, without this cluster with evident Aegean ancestry and the R1b-M269 (Z2103) sample precisely associated with it, some would once again enter endless comment loops created by ancestry magicians, showing how Aegean peoples were not behind Sea Peoples, or not behind Philistines, or not behind the R1b-M269 among Philistines, depending on their specific agendas.

aegean-sea-peoples
Map of the Sea People invasions in the Aegean Sea and Eastern Mediterranean at the end of the Late Bronze Age (blue arrows). Some of the major cities impacted by the raids are denoted with historical dates. Inland invasions are represented by purple arrows. From Kaniewski et al. (2011).

The results of the paper don’t settle the exact origin of all Sea Peoples (not even that of the Philistines), but they make it quite clear that most of those forming this seafaring confederation must have come from sites around the Aegean Sea. This thus supports the traditional origin attributed to them, including a hint at the likely expansion of Eastern Mediterranean ancestry and lineages into the Italian Peninsula precisely from the Aegean, as some oral communications have already disclosed.

As an indirect conclusion from the findings in this paper, then, we can now more confidently support that Tyrsenian speakers most likely expanded into the Apennines and the Alps originally from a Tyrsenian-speaking LBA population from Lemnos, due to the social unrest in the whole Aegean region, and might have become heavily admixed with local Italic peoples quite quickly, as happened with Philistines, resulting in yet another case of language expansion through (the simplistically called) elite domination.

Conclusion

Even more interesting than these specific findings, this paper confirms yet another hypothesis based on phylogeography, and proves once again two important starting points for ancient DNA interpretation that I have discussed extensively in this blog:

  • The rare R1b-M269 Y-chromosome lineage of Tel Shadud offered ipso facto the most relevant clue about the ancestral geographical origin of this Canaanite elite male’s paternal family, most likely from the north-west based on ancient phylogeography, which indirectly – in combination with linguistics and archaeology – supported the ancestral ethnolinguistic identification of Philistines with the Aegean and thus with (a population closest to) Ancient Greeks.
  • Ancestry analyses are often fully unreliable when assessing population movements, especially when few samples from incomplete temporal-geographical transects are assessed in isolation, because – unlike paternal (and maternal) haplogroups – ancestry might change fully within a few generations, depending on the particular anthropological setting. Their investigation is thus bound by many limitations – of design, statistical, and anthropological (i.e. archaeological and linguistic) – which are quite often not taken into account.
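The second point can be put in numbers with a toy model. It is an assumption made purely for illustration, not something taken from the papers discussed here: an incoming male line whose members, in every generation, have children with partners of fully local ancestry. The expected genome-wide share of the incoming ancestry then halves each generation, while the Y-chromosome haplogroup is passed on from father to son unchanged.

```python
def expected_autosomal_fraction(generations: int, initial_fraction: float = 1.0) -> float:
    """Expected share of incoming genome-wide ancestry after `generations`
    of mating exclusively with partners of fully local ancestry.

    Toy model: the expected share halves with every generation.
    """
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return initial_fraction * 0.5 ** generations


if __name__ == "__main__":
    for g in (0, 1, 3, 5, 10):
        share = expected_autosomal_fraction(g)
        print(f"generation {g:2d}: expected incoming ancestry {share:.3%}, paternal lineage unchanged")
```

Under this simplification the expected incoming share is 1/32 (about 3%) after five generations, which is the kind of quick dilution that makes a single sample's ancestry a weak guide to where its paternal line came from. Real populations will of course deviate from it, depending on how many newcomers arrived and whom they married.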

These cornerstones of ancient DNA interpretation have been already demonstrated to be valid not only for Levantine populations, as in this case, but also for Balkan peoples, for Bell Beakers, for steppe populations (like Khvalynsk, Sredni Stog, Yamna, Corded Ware), for Basques, for Balto-Slavs, for Ugrians and Samoyeds, and for many other prehistoric peoples.

I rest my case.

Bronze Age cultures in the Tarim Basin and the elusive Proto-Tocharians

andronovo-xiaohe-horizon

Master’s thesis Shifting Memories: Burial Practices and Cultural Interaction in Bronze Age China: A study of the Xiaohe-Gumugou cemeteries in the Tarim Basin, by Yunyun Yang, Uppsala University, Department of Archaeology and Ancient History (2019).

Summary excerpts, mainly from the conclusions (emphasis mine):

Both the Xiaohe and the Gumugou groups are suggested as possibly originating from southern Siberia or Central Asia and being related to Afanasievo and Andronovo people (Han 1986, 1994; Li et al. 2010, 2015). But a latest research suggest that the Xiaohe males are genetic distinct from the Afanasievo males, considering the paternal lineages (Hollard et al. 2018). From genetic evidence, it is suggested that southern Siberia and Central Asia were dominated by Europeans during the Bronze Age. Southern Siberia was predominant by Europeans since the Bronze Age as a result of eastward migration of Kurgan people (Keyser et al. 2009). Central Asia started to have an eastern Eurasian maternal lineage that coexisted with the previous western maternal lineage from around 700 BCE (Lalueza-Fox et al. 2004). Based on the research mentioned above, we can conclude as that the Xiaohe and the Gumugou people possibly came from the southern Siberia or Central Asia.

Origin of the Xiaohe horizon

There are two hypotheses about the origins of the Xiaohe horizon. The “steppe hypothesis” assumes that the early settlers (Gumugou people) of the Tarim Basin came from the Afanasievo culture in the Minusinsk Basin-Altai Mountains regions (Kuz’mina et al. 2008; Mallory et al. 2008). The “oasis hypothesis” argues that the early settlers were related to the spreading of the oasis-based agricultural groups from the Bactria and Margiana parts of the southern Central Asia area (Chen et al. 1995). Both hypotheses mainly relied on the use of some materials such as animal cattle, sheep/goats, camel hair, and plant wheat, whose origins were bound to western traditions. But these proofs cannot provide enough support to claim that the Xiaohe horizon cultures were from Afanasievo or BMAC cultures, except for telling there were possible cultural connections or interactions among them. What’s more, there were no horses or potteries in the Xiaohe horizon.

It is worth noting that Ephedra plant is commonly thought as a strong candidate of the Soma or Haoma sacred drink for the ancient Indians or Iranians. Soma is the name recorded in the Vedic Brahmanism religious literature Rigveda, Haoma in the Zoroastrianism Avesta, and indicates as a ritual drink from plant juice. The reason to address Ephedra plant to Soma-Haoma drink is mainly because of its ephedrine, which works on muscle strength, low blood pressure, (and asthma) to make people get rid of tiredness (Houben 2013). Furthermore, it is thought that Ephedra with anti-fatigue function gives gods or the dead immortality, longevity, and resurrection (Mahdihassan 1987). From a mobile consideration of Vedic Aryans perspective, it is thought Vedic Aryans made use of Ephedra, cannabis and poppy to produce Soma drink in Margiana, only Ephedra in Bactria and in Indian mountains area, but other substitutes in Indian plains (Shah 2014). From the Ephedra perspective, it is agreeable that the Xiaohe-Gumugou people were related to the Indo-Aryan peoples (Mallory et al. 1997; Wang 2017).

gumugou-xiaohe
The distribution map of the sites in the Xiaohe cultural horizon.

Burial customs

Both the Xiaohe and the Gumugou groups maintained similar burial customs, but we can distinguish a developing process from the slight diverse ways of the Gumugou cemetery to the highly consistent and advanced technology in making coffins of the Xiaohe cemetery. In terms of the dressing, the dead wore a felt cap, a pair of leather boots, a bracelet twined on the right wrist, and was wrapped in a big felt mantle. The dead in the Xiaohe cemetery also wore a loin-cloth. Commonly, both cemeteries contained burials goods of Ephedra twigs, grains of wheat and millet, grass-made baskets, animal ears (such as calf ears), and livestock. Wooden coffins in the two cemeteries were constructed in a similar way, by assembling two side-planks, two end-boards, a lid consisting of a few short straight boards, and covered with livestock hide (mainly cattle hide in the Xiaohe cemetery and sheep/goats hide in the Gumugou cemetery).

Considering the similar and continuous burial behaviours in the two cemeteries, it can be assumed that both the Xiaohe and the Gumugou societies were stable and consistent. The Xiaohe cemetery had both the special clay-lid wooden coffins and the normal coffins in its early phase (burial layers 4th-5th), then turned to be stable and consistent with the normal coffins (burial layers 1st-3rd), and have developed better construction of the boat-shape coffins. The Gumugou cemetery contained two main burial patterns, type I; the sun-radiating-spokes burials and type II; the normal burials, which coexisted during the same time. Burials of type II were similar but not limited to strict rules. Burials in both the Xiaohe and the Gumugou cemetery were fairly heterogeneous, and the clay-lid wooden coffins in the Xiaohe cemetery and the sun-radiating-spokes burials in the Gumugou cemetery only took up in a small percentage of each cemetery. These special burial types could indicate special roles of the dead in their related societies. Either the dead had high social positions or possibly they actually had a different ancestry origin. It is argued here that the latter is something that is quite possible, considering the mixed populations in the two cemeteries.

The sun-radiating-spokes burials share some features with a similar type of grave, constructed of circular stone kerbs of the stone-pit graves. The sun-radiating-spokes burials might represent an adaption to the local desert environment, which had better access to wood rather than stones. Circular stone kerbs with stone-pit in centre were widely seen in Bronze Age Afanasievo and Andronovo burials, and also in the late Bronze Age and early Iron Age burials along the Tian Shan. The present study suggests a high possibility that the six males buried in the sun-radiating-spokes graves came from the contemporary parallel Andronovo horizon, and kept some of their own ancestry memories in an adapted way.

xinjiang-afanasievo-andronovo-bmac-tian-shan
An assumption of the spreading/expansion routes of the stone burial construction.

Societies

Although the Xiaohe and Gumugou societies were stable and consistent, it does not mean that the societies were isolated, and we can see strong indications of them being open to the outside. With time, the Xiaohe population were getting even more diverse origins, as newcomers kept joining the group from outside. However, the burial behaviours in the Xiaohe cemetery did not change as a consequence of these additions. This suggests that the newcomers inherited the local burial customs, and strongly indicates that they became part of the community and adopted the new social identity, possibly through marriage. As a result, the diverse populations can well explain the coexistence of different cultural elements in the burials, e.g. cattle, sheep/goats, camel hair (from Central Asia), grains of wheat (from the west) and millet (from the east), etc.

The Xiaohe and the Gumugou societies were similar, but the Xiaohe society developed to a more advanced level both in economy and in social structure. First, the oasis-based economic system of the Xiaohe and the Gumugou had similar husbandry, but later this was developed to different extent. Both societies mainly relied on livestock, and while the Xiaohe people favoured cattle, the Gumugou people favoured sheep/goats. The two societies also developed agriculture, which can be seen from the grains of wheat and millet. It has been shown that grains of wheat are bread wheat. The Xiaohe people also cooked porridge with millet and milk, and had dairy products.

From these evidences, we can assume that the Xiaohe people have developed a stronger economic level. Secondly, the Xiaohe society had more distinguished gender roles, resulting in different social roles for men and women in terms of work and religions. The female and male dead were buried in a distinguished way with loin-cloths and wooden monuments. Sexual identity on a social level refers to how people consider and expect different genders to act and behave under the social and cultural framework. In the Xiaohe society, men carried out hunting tasks (creatures like vultures, badgers, lizards, snakes); women were associated to the rebirth of lives. To synthesize, a possible relation between the Xiaohe and the Gumugou societies is that they represent two parallel groups who shared similar economic systems because of the similar environment, or that there is a chronological difference where the Gumugou people may have existed earlier. The absolute dating information from the two cemeteries is insufficient to rule out the second situation.

tarim-basin-regions
The area division of the Tarim Basin and its surroundings (The division is made based on the mountain ranges including Altai Mountains, Tian Shan, and Kunlun Mountains, and also the distribution of ancient cemeteries in the whole Xinjiang generally.)

Surroundings

To place the Xiaohe horizon in the larger context of the Bronze Age burials in its surroundings, the hypothesis presented in this study is that the Xiaohe-Gumugou people might possibly represent a parallel to the Andronovo groups, with an eastward migration, that developed their own societies and ethnicities in the Tarim Basin with some ancestral memories still preserved. Considering the location and the geographical features of Xinjiang, the Altai Mountains and the Tian Shan left open access from the Eurasian Steppe to the Dzungarian Basin. The Hami Basin-the Balikun Grassland was the first intersection area to combine the possible western and eastern cultural influences. To pass by the Turpan Basin and enter into the Tarim Basin, there were two possible routes, one northern route along the southern edge of Tian Shan, and one southern route along the northern edge of Kunlun Mountains.

In the early Bronze Age, the burials in Xinjiang had some clear typical geographic features that distinguish them from their surroundings. But from the late Bronze Age to the early Iron Age, the tradition with circular kerbs of stones with stone-pits burials expanded along the southern edge of the Tian Shan, which was a major shift of burial practice that possibly could be linked to the expansion of the Andronovo horizon or a general nomadic expansion.

Although there were no horses or wagons found in the Xiaohe burials, the wooden horse-hoof objects were an indication of horses, which did not exist in their daily lives anymore, but possibly were related to some settlers’ ancestral memories of their nomadic origins. However, it was more important for them to assimilate to the common social identities of their new group. After people died, it was preferred to be buried in the communal cemetery. Even if the dead bodies were lost, wooden substitutes will be used in graves to represent the dead, since they believed in afterlife and thought that the end of the death is rebirth.

Comments

While the results of Li et al. (2010, 2015) of Xiaohe mummies regarding Y-chromosome haplogroups – showing mostly R1a(xZ93) – and radiocarbon dates of the samples are yet to be confirmed, Proto-Tocharians are known to have had contacts with Samoyeds, early Indo-Iranians (in turn in contact with the BMAC language), then into Common Tocharian with ancient Iranians, and then Indo-Aryan and Iranian languages again (for more on this, see Gerd Carling’s publications).

The connection of the Tocharian branch with Afanasevo is essentially indisputable today, like that of Late Proto-Indo-European with late Repin/early Yamna, even more so than it was just 10 years ago, thanks to the most recent genetic investigation. The common genetic stock of Yamna and Afanasevo – as well as that of East Bell Beakers and Palaeo-Balkan peoples – fits perfectly earlier predictions based on the linguistic estimates of the separation and evolution of the diverse language communities, and the tentative attribution to Eurasian steppe-related cultures.

early-bronze-age-tocharian-chemurchek
Tentative identification of language groups among Early Bronze Age cultures. Pre-/Proto-Tocharian is traditionally associated with Chemurchek. See full image.

The trail leading from Afanasevo to Common Tocharians, on the other hand, seems to be trickier, as it is for many other Indo-European-speaking groups from Europe and Asia, whose precise evolution until their historical attestation is often unclear. Nevertheless, the eventual presence of diverse haplogroups among historical Tocharians – whether they coincide with ancient DNA recovered from BMAC, South India, Andronovo, or Bronze Age Tian Shan populations – will only be relevant to understand the genetic evolution of the speakers of Tocharian during its different stages.

If the genetic trail backwards from known Tocharians to (earlier) unknown Common Tocharians, and forwards from known Pre-Tocharians to (later) unknown Proto-Tocharians leads unequivocally to these populations from the Xiaohe cultural horizon, this paper shows one of the mechanisms through which peoples of the Andronovo cultural horizon (or, more precisely, male lines derived from it) may have become integrated into a Tocharian-speaking population, not dissimilar to what happened in the steppes between Uralic-speaking Abashevo and Pre-Proto-Indo-Iranian-speaking Catacomb-Poltavka to form the Proto-Indo-Iranian-speaking Sintashta-Potapovka-Filatovka culture.

As we have discussed in this blog many times over, to solve this ethnolinguistic identification of prehistoric cultures one needs to investigate ancient DNA in combination with linguistic guesstimates and the Indo-European homeland problem from a wide anthropological perspective. People not understanding this simple concept are bound to end up in some comical Tocharo-Indo-Iranian grouping related to Corded Ware ancestry from Andronovo, similar to the Celto-Ibero-Basques of elevated CEU BA ancestry and hg. R1b-P312 to the south of the Pyrenees during the Iron Age from Olalde et al. (2019), and to the Balto-Finno-Slavs of hg. R1a-Z283 and elevated “Steppe ancestry” in the BA-IA East Baltic from Saag et al. (2019).


Balto-Slavic accentual mobility: an innovation in contact with Balto-Finnic


Some very specific prosodic innovations affected the Balto-Slavic linguistic community, probably at a time when it already showed internal dialectal differences. Whether those innovations were related to archaic remnants stemming from the parent Proto-Indo-European language, and whether that disintegrating community included different dialects, remains an object of active debate.

“Archaic” Balto-Slavic?

The main question about Balto-Slavic is whether this concept represents a single community, or rather a continuum formed by two (Baltic and Slavic) or possibly three (East Baltic, West Baltic, Slavic) neighbouring communities speaking closely related Northern European dialects, which just happened to evolve very close to each other, i.e. in cultures that were closer to each other than they were to Germanic or Balto-Finnic.

In my opinion, their similarities warrant the reconstruction of a single original central-east European community since the dissolution of Bell Beakers, speaking a North-West Indo-European dialect, and most internal differences between Baltic and Slavic may be explained as innovations. The precise identification of a Proto-Balto-Slavic community remains elusive, although the Unetice-Iwno-Mierzanowice triangle remains the best bet, with Trzciniec showing what seems like an Early Slavic-like population reaching up to the East Baltic.

Bell Beaker expansion in eastern Europe and around the Baltic.

The reconstruction of a common Balto-Slavic proto-language is known to range from difficult to impossible, depending on who you ask, not least because of the differences that are discussed in this post, which have been a battlefield of Balticists’ and Slavicists’ own making for decades. The old tenet that Balto-Slavic had inherited some traits directly from PIE is – in contrast with e.g. the Italo-Celtic concept – still surprisingly alive today.

Take, for example, these internal differences and supposedly archaic traits:

  • The ruKi rule, where Baltic shows mostly *is, *us, and Slavic shows *, *; or the different output of Satemization in Baltic compared to Slavic (and both compared to Indo-Iranian). Nevertheless, the Satemization trends in Balto-Slavic and Indo-Iranian are usually explained together and taken as a sign of a traditional three-velar system for PIE.
    • If you consider Satemization as a late trend in Balto-Slavic, affecting each dialect in a different way, and thus Balto-Slavic phonetic evolution as clearly distinct from the Indo-Iranian trend, rejecting tritectalism, this problem is solved. This would also solve the impossible Indo-Slavonic problem, and the paradox of Balto-Slavic sharing a genetic phylum with Germanic and Italo-Celtic.
    • If you, however, conflate these differences and North-West Indo-European features with an ad hoc explanation of a hypothetical Centum dialect called Temematic, which intends to solve their (in Holzer’s words) unlösbaren inconsistencies, you essentially add a whole new inconsistency without solving the previous ones. For a full rebuttal of Holzer’s Temematic etymologies, see Matasović (2014).
  • Kortlandt’s reconstruction of a PIE 3rd singular *-e (Baltic from *-et, Slavic from *-eti) and 3rd plural *-o, which would have been replaced independently in other Indo-European dialects (by *-eti, *-onti), is reminiscent of his own reconstruction of laryngeals almost up to the attestation of all Indo-European dialects, including Baltic. If you consider these traits an innovation, this artificially created problem is immediately solved.
  • Genitive plural Pre-Baltic *-ōm vs. Pre-Slavic *-ŏm is another commonly cited example. However, I would place this difference among other similar differences found within other related IE dialects, hence a common phonetic innovation (see e.g. below for the classicist view of unstable obliques).
  • Kortlandt’s reconstruction of oblique cases in *-m-, shared with Germanic, as stemming from a common Middle PIE *-mus (based essentially on Old Lithuanian *-mus and on a non-existent equivalent Anatolian formation), hence different from those in *-bʰ-. While you can argue for any number of more reasonable alternatives, the most often cited one takes the ins.-dat. pl. *-bʰ- as a common NWIE innovation based on ins. sg. *bʰi-, and the forms in *-m- (including ins. sg.) as a Northern European phonetic innovation. The simplest, most elegant explanation I’ve read to date (I think by Rémy Viredaz) is the similar bilabial change of Giacobo/Giacomo in Italian…

As you can see, some Balto-Slavicists could have written whole books about how their object of study holds the key to solve problems on common Proto-Indo-European paradigms, some of which wouldn’t need solving if they hadn’t been started by Balto-Slavicists themselves…

While all of these “archaic” traits are easily dismissed without further ado (except for some understandable damaged pride among academics), there is one especially pervasive idea among those willing to find the white whale of laryngeal remnants in Indo-European languages (see here for other examples of dubious laryngeal remains).

The prophecy before the battle, Józef Ryszkiewicz, 1890. Or, how to conjure laryngeal remnants in Balto-Slavic.

Accentual development in contact

Whichever position one prefers, the general argument is that the Balto-Slavic accentual system is non-trivial for the classification of both dialects into a common branch. However, that would only be completely true if it were a common innovation, but not so much if it were a natural laryngeal evolution.

In fact, the broken tone preserving a PIE laryngeal, as proposed by Kortlandt – continuing Meillet’s idea of synchronous PIE-PBS developments – was always very difficult to accept. Even the rising pronunciation is not original, and represents a shift of the accent on the initial syllable in Latvian…

In my opinion, the derivation of a modern phenomenon from a PIE laryngeal must always raise a red flag (see below on archaisms vs. innovations in IE languages). As you can see from my take on the fable in Balto-Slavic, which uses Kortlandt’s reconstruction, I preferred not to take into account the reconstructed accents. The fable remains thus a model of what could have been a common Proto-Balto-Slavic, unlike other reconstructions, which are much less tentative.

NOTE. You could argue that accents may be reconstructed in spite of the wrong theory behind them, but this is not true; at least not of all reconstructed accents, some of which require further assumptions. Think about it this way: I wouldn’t take into account a reconstruction of Germanic accent which used Danish glottalized tone for a hypothetical Proto-Germanic laryngeal, even if most accents seemed correct at first sight. The truth is, I didn’t want to dedicate time to go through each reconstructed word and its explanation, so it was easier to delete them all, even though that’s not an actual solution, either. You will find the same doubts in the description of Balto-Slavic evolution in my old Modern Indo-European grammar. The introduction to IE dialects was partially copied from Wikipedia (which, in the case of Balto-Slavic, essentially summarized data from Kortlandt), but in the grammar I just tried to keep the basics, and not very successfully, because you need a comprehensive and coherent description of a language’s evolution. That’s how messed up the question was, and how it still is, even though 15 years of research have passed…

Despite the idea of an “archaic Balto-Slavic”, especially prevalent among older researchers, the current trend is to consider Balto-Slavic prosodic changes as a natural innovation, even among those who would artificially reconstruct laryngeal remnants up to late Balto-Slavic stages.

NOTE. You can read more about the Proto-Indo-European laryngeal loss and vocalism. While the presence of certain laryngeals up to Late PIE is certain, the loss in many environments is also generally agreed upon. This is especially true of a hypothetical Indo-Slavonic branch, like that supported by Kortlandt: even those supporting multiple laryngeal loss events must admit that Indo-Iranian showed no laryngeals before its disintegration, whether they put this loss as an internal Proto-Indo-Iranian evolution, or they place it earlier. Tocharian attests to an evolution similar to the rest of Late PIE dialects (hence to a quite early laryngeal loss trend), and Balkan dialects (supposedly splitting before Indo-Slavonic) also lost laryngeals in a similar way, except for initial ones, which show vocalic output instead of full loss.

So, where does a laryngeal loss fit in this “Indo-Slavonic” scheme, exactly? Before the Tocharian split? Before the Balkan split? After the Balkan split but before the full loss in Indo-Iranian? And where exactly does this group belong regarding Corded Ware, and where does Germanic? No idea (but you can read Kortlandt try fitting his model with Gimbutas’ “Kurgan peoples”). Because it is one thing to reconstruct Proto-Greek, or Proto-Celtic, or Proto-Italic forms without laryngeals and to put them in relation with a purely theoretical three-laryngeal PIE, and quite another to reconstruct laryngeals (including in environments where they had already been lost in Tocharian) up to Proto-Baltic and Proto-Slavic, which seems more than just a bit of a stretch…

Indo-European dialectal relationships, from Mallory and Adams (2006).

Thomas Olander offered a summary of the current positions regarding the Balto-Slavic accentual system recently in Indo-European heritage in the Balto-Slavic accentuation system (2013), which also contains a summary of his Mobility Law, to explain this phenomenon as a common Pre-Baltic and Pre-Slavic innovation.

Andersen, an advocate of different Baltic and Slavic dialects developing in contact with Satem dialects, suggested in The Satem Languages of the Indo-European Northwest. First Contacts? (2009), partially based on Olander’s initial proposal, that Baltic and Slavic accentual mobility arose as a result of contact with languages with fixed word-initial ictus: the accent was lost in the word-final mora in pre-Proto-Baltic and, independently, in pre-Proto-Slavic. Hence, the central innovation, the accent loss

technically is not a shared Slavic and Baltic innovation. On the contrary. It shows that the speakers of the Pre-Slavic and Pre-Baltic dialects formed bilingual communities with speakers of contact dialects that were of the same prosodic type, viz. had fixed initial ictus but no free accent.

In the meantime, Olander (2019) has collected further real-world examples of this same phenomenon:

Prosodic features are known to be susceptible to contact influence (Salmons 1992:1 and passim). While it does not directly influence the evaluation of the Mobility Law as a non-trivial innovation, it is interesting that most of the alleged parallels are indeed considered to be contact-induced changes due to influence from languages with an ictus on the word-initial syllable (Andersen 2009: 11-14; Rinkevičius 2013): Balto-Fennic in the case of the Karelian and (perhaps through Latvian as an intermediary) Žemaitian dialects, and Hungarian in the case of the Slavonian dialects (for Karelian see Jakobson 1938/2002: 239; Veenker 1967: 74; Thomason & Kaufman 1988: 122, 241; Salmons 1992: 41-42; for Žemaitian see Zinkevičius 1966: 45-46; for Slavonian see Ivić 1958: 287).

I am not aware of any hypotheses on a contact-induced origin for Greek prosodic innovations, but it is at least worth noting that there is agreement on significant substrate influence on Greek. While we may speculate that these substrate language(s) had word-initial ictus like Balto-Fennic and Hungarian, we do not have any actual information about the prosodic system(s) (thus even Beekes 2014: 9, who in other respects provides a fairly detailed picture of the substrate).

The parallels from other speech varieties show that an accent loss of the type suggested for a pre-stage of Baltic and Slavic is a type of prosodic change that has occurred several times in different various systems. In the context of the present paper this means that the sound law itself cannot be classified as a non-trivial innovation; it may have taken place in already differentiated dialects or languages. Also, the parallels suggest that a loss of the accent may be the result of influence from languages with fixed word-initial ictus.

At a time when even linguists agree that substrate/contact languages have to be related to specific ethnolinguistic groups (see here for Germanic), it is surprising that Olander stops short of naming the substrate behind Pre-Baltic and Pre-Slavic as Late Uralic in general, or Balto-Finnic in particular.

NOTE. Not least because Olander is part of the Homeland Timeline map project of the Copenhagen group (their website is not working right now), and they placed Volosovo as Uralians expanding with Netted Ware in contact with the Baltic during the Bronze Age… So what’s to doubt about Balto-Slavic – Balto-Finnic contacts, exactly? Maybe if Balto-Finnic was the substrate language behind Balto-Slavic (as it was in Germanic), it would mean that Uralic languages were previously spoken in territories that later became Germanic- and Balto-Slavic-speaking?

Still image from the Copenhagen Timeline Map (accessed one year ago), showing in green Volosovo hunter-gatherers who, according to the map, later expand to the north-east with Netted Ware…

Archaism vs. Innovation

If we tried to describe these trends of explaining peculiar traits in recent Indo-European dialects as archaism vs. innovation from a purely theoretical point of view, we could roughly distinguish two different positions (with infinite variants, of course) among academics – just like we could find people more inclined to leftist or rightist trends when speaking about the economy. When it comes to linguistics, which is the least messed-up field where one can describe Indo-European and Indo-Europeans, I think we can find two alternative basic tenets:

  • One idea would hold that the oldest attested dialects – and those with an older guesstimated proto-language – are the gold standard as to what the original situation may have been, and about what could be described as an archaism. For example, Ancient Greek and Mycenaean or Vedic Sanskrit for old dialects; Tocharian or Italic dialects for those with quite old guesstimates, each for different reasons; and Anatolian for both, as an old dialect that is also attested early.
  • NOTE. Nevertheless, the phonology of Anatolian inscriptions is often difficult to ascertain, and its ancient dialectal nature stemming from a Middle PIE stage may still be disputed by some. The archaic nature of Tocharian is perhaps less generally accepted than that of Anatolian, but I would say there is general consensus on the matter today.

  • The other general idea would hold that the most isolated dialects are those which may hold the key to the oldest Indo-European traits, somehow hidden from external influences and areal contacts, and thus from generalized innovative trends that have affected the best known ancient dialects. In that sense, languages like Slavic, Baltic, Albanian, or Armenian – as well as some fragmentary Balkan dialects – are quite common objects of study for those seeking to reveal exceptional PIE traits.

I think the education system in Southern Europe and South Asia is that of formal classicists. In eastern Europe, I’d reckon the education system – especially in regions that were never connected to the Graeco-Roman tradition – favours linguistics as the study of one’s own and related proto-languages. For northern Europe, I would say it’s 50/50, especially in Scandinavia, depending on whether classicists or linguists dominate the departments of Indo-European. For example, while Germany or Austria would maybe lean more toward the classics, Copenhagen’s obsession with Germanic as the most archaic IE branch is well known…

A 17th-century birch bark manuscript of Pāṇini’s grammar treatise from Kashmir. Image from Wikipedia.

Both positions, when blindly accepted, are bound to fail at some point or another:

  • If you take Classical Sanskrit, Classical Greek, or Classical Latin as an example of Proto-Indo-European, you are bound to make radical mistakes when reconstructing the parent language, more so if you disregard the oldest attested layers of the languages. An interesting view of the so-called Adradists at the Complutense University of Madrid – apart from their famous 9-laryngeal reconstruction – is that Middle PIE had only 5 cases, with a general (unstable) oblique one in Late PIE that later evolved into the attested 5 to 8 cases in the different dialects. That is, in my opinion, a fairly typical classicist error, which would be easily addressed by taking into account the oldest stages, like those attested in Mycenaean and in Old Latin, instead of focusing on classical grammar. The 8-case system is, in fact, one of the few true Balto-Slavic archaisms, supported by external comparanda.
  • On the other hand, if you take Albanian, Armenian, Baltic or Slavic, or even phonetically dubious data like those from some Anatolian inscriptions, you can eventually argue for anything. And I really mean anything; you are leaving the logic door wide open for any crazy-ass opinion about Proto-Indo-European based on traits found in modern languages: From how many velars evolved (if at all, because you may find all of them in Luwian, or still living in Albanian or in Armenian…) and their nature as ejective consonants in Late PIE (based on Armenian or Germanic); to how many laryngeals and when these laryngeals disappeared (if they actually did disappear, because some may even find them in Modern Lithuanian, in Armenian, or in Danish…); etc. Once you believe your own romantic view of some modern language(s) retaining traits from five thousand years ago, there is no stopping that; not for you, but not for anyone else, either.

NOTE. Among the funniest consequences of this type of ‘worldview’, where one assumes that – one’s own interpretations of – modern dialects are as reliable as (or even more so than) ancient ones, and that Indo-European dialects somehow split at the same time from the parent language (so there was one common “full laryngeal” language, and then all attested dialects evolved from it), are some of the theories that you can easily find posted on Facebook’s group on Proto-Indo-European. Let’s just say, for the sake of simplicity, that you can compare English ‘sunrise’ with Spanish ‘sonrisa’ “smile” all you want, and assert that both reveal a common origin in PIE *sup- hence from the Sun and the smile going “up” or something, but no explanation of how you reached that conclusion makes up for the fact that this comparison should never have been started at all. Now replace English and Spanish with Armenian, Slavic, and/or Albanian, invent some new IE sound law, throw one or two laryngeals in the mix, and somehow this might get a pass among certain linguists…

The Celebration of Svetovid on Rügen, Alphonse Mucha, The Slav Epic. Image from Wikipedia. Were Early Slavs some among a selected few romantic peoples to keep the “true” Indo-European language and traditions? Of course not.

While no one can deny the value of different Indo-European branches for the reconstruction of the parent language, no matter how recently they were attested, the only reasonable solution whenever a difficult case arises is to trust ancient dialects more than recent ones. Using data from fringe theories based on recent dialects to build a Proto-Indo-European paradigm, especially when there is contradictory data from ancient IE dialects, is flawed for two reasons:

  1. Languages attested later – especially after periods of population movements and contacts – would show, in general, a greater degree of change. Preferring Old Slavic or Classical Armenian to reconstruct Indo-European over ancient dialects like Ancient Greek, Vedic Sanskrit, or ancient Italic dialects is, in a way, like taking Byzantine Greek, Pali, or Old French as models, respectively.
  2. Classical languages are indeed modified due to the action of grammarians, but once standardized these “languages behind a state” (or religion) are less prone to change, due to the transmission of oral (and written) literature, education, commerce, etc. Languages left to unorganized tribes are less constrained in their evolution, and their internal (substrate) and external (contact) influences are greater and (what’s worse) unknown.

Baltic and Slavic, like Albanian or Armenian, are dialects attested very recently, which may have undergone complex internal and external influences we may never fully understand. Confronted with controversial or inexplicable traits compared to ancient branches like Greek, Indo-Iranian, or Italo-Celtic (especially if they fit with other Indo-European dialects), the conservative solution that will be right most of the time (and I mean 99.9999% of cases) is to assume they represent an innovation over Late PIE.

The fact that some researchers still use these recent dialects as a blank canvas instead, in order to propose unending new ideas about how to reconstruct IE proto-languages, or even older common PIE stages, is shocking. Not “R1a/Steppe” vs. “N1c/Siberian” haplogroup+ancestry bullshit-level shocking, but still unacceptable in a serious academic environment.

The only reason why Balto-Slavicists have failed so many times in this “unsolvable” question that seems to be Proto-Balto-Slavic reconstruction, apart from the known differences between Baltic and Slavic, is precisely the fixation of many with their object of study as a model for other IE languages (and thus for PIE), instead of taking the rest as a model for the reconstruction of Balto-Slavic (or of Proto-Baltic and Proto-Slavic).

Repeating ad nauseam the popular concept of Balto-Slavic (or Baltic and Slavic) being among the most archaic IE dialects, or the slowest evolving IE dialects, and cheap nationalist slogans of the sort, does not help this aim, and just reading or hearing that should make anyone cringe instantly. Not less than reading or hearing about Sanskrit being essentially equal to PIE, or spoken in the Indus Valley 10,000 years ago. Because we are not living in the 19th century, mind you.


A Song of Sheep and Horses, revised edition, now available as printed books


As I said 6 months ago, 2019 is a tough year to write a blog, because this was going to be a complex regional election year and therefore a time of political promises, hence tenure offers too. Now the preliminary offers have been made, elections have passed, but the timing has slightly shifted toward 2020. So I may have the time, but not really any benefit of dedicating too much effort to the blog, and a lot of potential benefit of dedicating any time to evaluable scientific work.

On the other hand, I saw some potential benefit for publishing texts with ISBNs, hence the updates to the text and the preparation of these printed copies of the books, just in case. While Spain’s accreditation agency has some hard rules for becoming a tenured professor, especially for medical associates (whose years of professional experience are almost worthless compared to published peer-reviewed papers), it is quite flexible in assessing one’s merits.

However, regional and/or autonomous entities are not, and need an official identifier and preferably printed versions to evaluate publications, such as an ISBN for books. I thus took some time about a month ago to update the texts and supplementary materials, and to publish a printed copy of the books with Amazon. The first copies have arrived, and they look good.


Corrections and Additions

Titles
I have changed the names and order of the books, as I intended for the first publication – as some of you may have noticed when the linguistic book was referred to as the third volume in some parts. In the first concept I just wanted to emphasize that the linguistic work had priority over the rest. Now the whole series and the linguistic volume don’t share the same name, and I hope this added clarity is for the better, despite the linguistic volume being the third one.

Uralic dialects
I have changed the nomenclature for Uralic dialects, as I said recently. I haven’t really modified anything deeper than that, because – unlike adding new information from population genomics – this would require me to thoroughly research the most recent publications on Uralic comparative grammar, and I just can’t begin with that right now.

Anyway, the use of terms like Finno-Ugric or Finno-Samic is as correct now for the reconstructed forms as it was before the change in nomenclature.


Mediterranean
The most interesting recent genetic data has come from Iberia and the Mediterranean. Lacking direct data from the Italian Peninsula (and thus from the emergence of the Etruscan and Rhaetian ethnolinguistic community), it is becoming clearer how some quite early waves of Indo-Europeans and non-Indo-Europeans expanded and shrank – at least in West Iberia, West Mediterranean, and France.

Finno-Ugric
Some of the main updates to the text have been made to the sections on Finno-Ugric populations, because some interesting new genetic data (especially Y-DNA) have been published in the past months. This is especially true for Baltic Finns and for Ugric populations.


Balto-Slavic
Consequently, and somewhat unsurprisingly, the Balto-Slavic section has been affected by this; e.g. by the identification of Early Slavs likely with central-eastern populations dominated by (at least some subclades of) hg. I2a-L621 and E1b-V13.

Maps
I have updated some cultural borders in the prehistoric maps, and the maps with Y-DNA and mtDNA. I have also added one new version of the Early Bronze Age map, to better reflect the most likely location of Indo-European languages in the Early European Bronze Age.

As those in software programming will understand, major changes in the files that are used for maps and graphics come with an increasing risk of additional errors, so I would not be surprised if some major ones were found (I already spotted three of them). Feel free to communicate these errors in any way you see fit.

European Early Bronze Age: tentative language map based on linguistics, archaeology, and genetics.

SNPs
I have selected more conservative SNPs in certain controversial cases.

I have also deleted most SNP-related footnotes and replaced them with the marking of each individual tentative SNP, leaving only those footnotes that give important specific information, because:

  • My way of referencing tentative SNP authors did not make it clear which samples were tentative, if there was more than one.
  • It was probably not necessary to see four names repeated 100 times over.
  • Often I don’t really know if the person I have listed as author of the SNP call is the true author – unless I saw the full SNP data posted directly – or just someone who reposted the results.
  • Sometimes there is more than one author of SNPs for a certain sample, but I might have added just one for all.

More than 6000 ancient DNA samples compiled to date.

For a centralized file to host the names of those responsible for the unofficial/tentative SNPs used in the text – and to correct them if necessary – readers will eventually be able to use Phylogeographer’s tool for ancient Y-DNA, for which they use (partly) the same data I compiled, adding Y-Full’s nomenclature and references. You can see another map tool in ArcGIS.

NOTE. As I say in the text, if the final working map tool does not deliver the names, I will publish another supplementary table to the text, listing all tentative SNPs with their respective author(s).

If you are interested in ancient Y-DNA and you want to help develop comprehensive and precise maps of ancient Y-DNA and mtDNA haplogroups, you can contact Hunter Provyn at Phylogeographer.com. You can also find more about phylogeography projects at Iain McDonald’s website.

Graphics
I have also added more samples to both the “Asian” and the “European” PCAs, and to the ADMIXTURE analyses, too.

I previously used certain samples prepared by amateurs from BAM files (like Botai, Okunevo, or Hittites), and the results were obviously less than satisfactory – hence my criticism of the lack of publication of prepared files by the most famous labs, especially the Copenhagen group.

Fortunately for all of us, most published datasets are free, so we don’t have to reinvent the wheel. I criticized genetic labs for not releasing all data, so now it is time for praise, at least for one of them: thank you to all responsible at the Reich Lab for this great merged dataset, which includes samples from other labs.

NOTE. I would like to make my tiny contribution here, for beginners interested in working with these files, so I will update – whenever I have time – the “How To” sections of this blog for PCAs, PCA3d, and ADMIXTURE.

Detail of the PCA of European Iron Age populations. See full versions.

ADMIXTURE
For unsupervised ADMIXTURE in the maps, a K=5 is selected based on the CV, giving a kind of visual WHG : NWAN : CHG/IN : EHG : ENA, but with Steppe ancestry “in between”. Higher K gave worse CV, which I guess depends on the many ancient and modern samples selected (and on the fact that many samples are repeated from different sources in my files, because I did not have time to filter them all individually).
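For readers who want to reproduce this kind of selection, the following is a minimal sketch in Python of picking K by lowest cross-validation error. It assumes ADMIXTURE was run once per K with its `--cv` option and that the logs contain lines of the form `CV error (K=5): 0.51`. The log text and its error values below are invented for illustration and are not the values behind the maps; the function names are hypothetical.

```python
import re

# Hypothetical log excerpt in the format ADMIXTURE prints when run with --cv.
# The error values are invented for illustration only.
EXAMPLE_LOG = """\
CV error (K=3): 0.5210
CV error (K=4): 0.5142
CV error (K=5): 0.5098
CV error (K=6): 0.5121
CV error (K=7): 0.5163
"""

# Matches one CV line and captures K and the error value.
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9.]+)")


def parse_cv_errors(log_text):
    """Return {K: cv_error} for every CV line found in the log text."""
    return {int(k): float(err) for k, err in CV_LINE.findall(log_text)}


def best_k(cv_errors):
    """Return the K with the lowest cross-validation error."""
    if not cv_errors:
        raise ValueError("no CV error lines found")
    return min(cv_errors, key=cv_errors.get)


cv = parse_cv_errors(EXAMPLE_LOG)
print(best_k(cv))
```

In practice the logs of all runs would be concatenated (or read one by one) before parsing. A lowest-CV choice is only a statistical criterion, so it can legitimately differ from the K that produces the most interpretable components.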

I found an interesting component shared by Central European populations in K=7 to K=9 (from CEU Bell Beakers to Denmark LN to Hungarian EBA to Iberia BA, in a sort of “CEU BBC ancestry” potentially related to North-West Indo-Europeans), but still, I prefer to go for a theoretically more correct visualization instead of cherry-picking the ‘best-looking’ results.

Since I made fun of the search for “Siberian ancestry” in coloured components in Tambets et al. 2018, I have to be consistent and avoid doing the same here…

qpAdm
In the first publication (in January) and subsequent minor revisions until March, I trusted analyses and ancestry estimates reported by amateurs in 2018, which I used for the text, adding my own interpretations. Most of them have been refuted in papers from 2019, as you probably know if you have followed this blog (see very recent examples here, here, or here), compelling me to delete or change them again, and again, and again. I don’t have experience from previous years, but the current pattern must evidently have been repeated many times over, or else we would still be talking about such previous analyses as being confirmed today…

I wanted to be one step ahead of peer-reviewed publications in the books, but I now prefer to go for something safe in the book series, rather than having one potentially interesting prediction – which may or may not be right – and ten huge mistakes that I would have helped to endlessly redistribute among my readers (online and now in print) based on some cherry-picked pairwise comparisons. This is especially true when predictions of “Steppe”- and/or “Siberian”-related ancestry have been published, which, for some reason, seem to go horribly wrong most of the time.

I am sure whole books can be written about why and how this happened (and how this is going to keep happening), based on psychology and sociology, but the reasons are irrelevant, and that would be a futile effort, like writing books about glottochronology and its intermittent popularity due to misunderstood scientific trends. The most efficient way to deal with this problem is to avoid such information altogether, because – as you can see in the current revised text – it wouldn’t really add anything essential to the content of these books, anyway.


Official site of the book series:
A Song of Sheep and Horses: eurafrasia nostratica, eurasia indouralica