In order to improve the phylogeography of the male-specific genetic traces of Greek and Phoenician colonizations on the Northern coasts of the Mediterranean, we performed a geographically structured sampling of seven subclades of haplogroup J in Turkey, Greece and Italy. We resequenced 4.4 Mb of Y-chromosome in 58 subjects, obtaining 1079 high quality variants. We did not find a preferential coalescence of Turkish samples to ancestral nodes, contradicting the simplistic idea of a dispersal and radiation of Hg J as a whole from the Middle East. Upon calibration with an ancient Hg J chromosome, we confirmed that signs of Holocenic Hg J radiations are subtle and date mainly to the Bronze Age. We pinpointed seven variants which could potentially unveil star clusters of sequences, indicative of local expansions. By directly genotyping these variants in Hg J carriers and complementing with published resequenced chromosomes (893 subjects), we provide strong temporal and distributional evidence for markers of the Greek settlement of Magna Graecia (J2a-L397) and Phoenician migrations (rs760148062). Our work generated a minimal but robust list of evolutionarily stable markers to elucidate the demographic dynamics and spatial domains of male-mediated movements across and around the Mediterranean, in the last 6,000 years.
Two features of our tree are at odds with the simplistic idea of a dispersal of Hg J as a whole from the Middle East towards Greece and Italy and an accompanying radiation26. First, there is little evidence of sudden diversification between 15 and 5 kya, a period of likely population increase and pressure for range expansion, due to the Agricultural revolution in the Fertile Crescent. Second, within each subclade, lineages currently sampled in Turkey do not show up as preferentially ancestral. Both findings are replicated and reinforced by examining the previous landmark studies. Our Turkish samples do not coalesce preferentially to ancestral nodes when mapped onto these studies’ trees.
Additional relevant information on the entire Hg J comes from the discontinuous distribution of J2b-M12. The northern fringe of our sample is enriched in the J2b-M241 subclade, which reappears in the gulf of Bengal38,45, with low frequencies in the intervening Iraq46 and Iran47. No J2b-M12 carriers were found among 35 modern Lebanese, as contrasted to one of two ancient specimens from the same region35.
In summary, a first conclusion of our sequencing effort and merge with available data is that the phylogeography of Hg J is complex and hardly explained by the presence of a single population harbouring the major lineages at the onset of agriculture and spreading westward. A unifying explanation for all the above inconsistencies could be a centre of initial radiation outside the area here sampled more densely, i.e. the Caucasus and regions North of it, from which different Hg J subclades may have later reached mainland Italy, Greece and Turkey, possibly following different routes and times. Evidence in this direction comes from the distribution of J2a-M410 (refs. 45, 48) and the early- (ref. 49) or mid-Holocene (ref. 50) southward spread of J1.
The lineage defined by rs779180992, belonging to J2b-M205, and dated at 4–4.5 kya, has a radically different distribution, with derived alleles in Continental Italy, Greece and Northern Turkey, and two instances in a Palestinian and a Jew. The interpretation of the spread of this lineage is not straightforward. Tentative hypotheses are linked to Southward movements that occurred in the Balkan Peninsula from the Bronze Age29,53, through the Roman occupation and later54.
The slightly older (5.6–6.3 kya) branch 98 lineage displays a similar trend of an Eastward positioning of derived alleles, with the notable difference of being present in Sardinia, Crete, Cyprus and Northern Egypt. This feature and the low frequency of the parental J2a-M92 lineage in the Balkans27 call for an explanation different from the above.
Finally, we explored the distribution of J2a-L397 and three derived lineages within it. J2a-L397 is tightly associated with a typical DYS445 6-repeat allele. This has been hypothesized as a marker of the Greek colonizations in the Mediterranean55, based on its presence in Greek Anatolia and Provence (France), a region with attested Iron Age Greek contribution. All of our chromosomes in this clade were characterized also by DYS391(9), confirming their Anatolian Greek signature. We resolved the J2a-L397 clade to an unprecedented precision, with three internal markers which allow a finer discrimination than STRs. The ages of the three lineages (2.0–3.0 kya) are compatible with the beginning of the Greek colonial period, in the 8th century BCE. The three subclades have different distributions (Fig. 2B), with two (branches 57, 59) found both East and West to Greece, and one only in Italy (branch 58). As to Mediterranean Islands, J2a-L397 was found in Cyprus56 and Crete43. Its presence as one of the three branches 57–59 will represent an important test. In Italy all three variants were found mainly along the Western coast (18/25), which hosted the preferred Greek trade cities. The finding of all three differentiated lineages in Locri excludes a local founder effect of a single genealogy. Interestingly, an important Greek colony was established in this location, with continuity of human settlement until modern times. The sample composed of the same subjects displayed genetic affinities with Eastern Greece and the Aegean also at autosomal markers57. In summary, the distributions of branches 57–59 mirror the variety of the cities of origin and geographic ranges during the phases of the colonization process58.
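The dating argument in the excerpt above can be checked with some quick arithmetic. This is only a minimal sketch, not anything taken from the paper: the reference year used to convert "kya" into calendar years, and the helper names `bce_to_kya` and `overlaps`, are my own assumptions for illustration.

```python
# Assumption (not from the paper): "kya" is counted back from roughly 2000 CE.
REFERENCE_YEAR_CE = 2000


def bce_to_kya(year_bce):
    """Convert a BCE year to thousands of years before the reference year.

    Approximate: the missing year zero is ignored.
    """
    return (year_bce + REFERENCE_YEAR_CE) / 1000.0


def overlaps(a, b):
    """Return True if the closed intervals a and b, given as (low, high), intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


# Ages of branches 57-59 as quoted in the text (2.0-3.0 kya).
lineage_age_kya = (2.0, 3.0)

# The 8th century BCE runs from 800 to 701 BCE.
eighth_century_bce_kya = (bce_to_kya(701), bce_to_kya(800))

# Is the start of the Greek colonial period inside the dated range?
compatible = overlaps(lineage_age_kya, eighth_century_bce_kya)

# Share of Italian carriers found along the western coast (18 out of 25).
west_coast_share = 18 / 25

print(compatible, west_coast_share)
```

Under these assumptions the 8th century BCE falls at about 2.7–2.8 kya, inside the 2.0–3.0 kya range, and the western coast accounts for 72% of the Italian carriers. The interval is wide, though, so the overlap is compatible with the Greek colonial period without pinning the lineages to it.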
So, there you have it: further evidence that haplogroup J and CHG-related ancestry in the Mediterranean were mainly driven by different (and late) expansions of historic peoples.
This post should probably read “Consequences of Narasimhan et al. (2018),” too, since there seems to be enough data and materials published by the Copenhagen group in Nature and Science to make a proper interpretation of the data that will appear in their corrected tables.
The finding of late Khvalynsk/early Yamna migrations, identified with early LPIE migrants almost exclusively of R1b-L23 subclades, is probably one of the most interesting results in the recent papers regarding the Indo-European question.
Although there are still few samples to derive fully-fledged theories, they begin to depict a clearer idea of waves that shaped the expansion of Late Proto-Indo-European migrants in Eurasia during the 4th millennium BC, i.e. well before the expansion of North-West Indo-European, Palaeo-Balkan, and Indo-Iranian languages.
Late Khvalynsk expansions and archaic Late PIE
Like Anatolian, Tocharian has been described as having a more archaic nature than the rest of Late PIE. However, Pre-Tocharian belongs to the Late PIE trunk, clearly distinguishable phonetically and morphologically from Anatolian.
It is especially remarkable that – even though it expanded into Asia – it has more in common with North-West Indo-European, hence its classification (together with NWIE) as part of a Northern group, unrelated to Graeco-Aryan.
The linguistic supplement by Kroonen et al. accepts that peoples from the Afanasevo culture (ca. 3000-2500 BC) are the most likely ancestors of Tocharians.
NOTE. For those equating the Tarim Mummies (of R1a-Z93 lineages) with Tocharians, you have this assertion from the linguistic supplement, which I support:
An intermediate stage has been sought in the oldest so-called Tarim Mummies, which date to ca. 1800 BCE (Mallory and Mair 2000; Wáng 1999). However, also the language(s) spoken by the people(s) who buried the Tarim Mummies remain unknown, and any connection between them and the Afanasievo culture on the one hand or the historical speakers of Tocharian on the other has yet to be demonstrated (cf. also Mallory 2015; Peyrot 2017).
New samples of late Khvalynsk origin
These are the recent samples that could, with more or less certainty, correspond to migration waves from late Khvalynsk (or early Yamna), from oldest to most recent:
The Namazga III samples from the Late Eneolithic period (in Turkmenistan), dated ca. 3360-3000 BC (one of haplogroup J), potentially showing the first wave of EHG-related steppe ancestry into South Asia. Not related to Indo-Iranian migrations.
NOTE. A proper evaluation with further samples from Narasimhan et al. (2018) is necessary, though, before we can assert a late Khvalynsk origin of this ancestry.
Afanasevo samples, dated ca. 3081-2450 BC, with all samples dated before ca. 2700 BC uniformly of R1b-Z2103 subclades, sharing a common genetic cluster with Yamna, showing together the most likely genomic picture of late Khvalynsk peoples.
NOTE 1. Anthony (2007) put this expansion from Repin ca. 3300-3000 BC, while his most recent review (2015) of his own work put its completion ca. 3000-2800. While the migration into Afanasevo may have lasted some time, the wave of migrants (based on the most recent radiocarbon dates) must be set at least before ca. 3100 BC from Khvalynsk.
NOTE 2. I proposed that we could find R1b-L51 in Afanasevo, presupposing the development of R1b-L51 and R1b-Z2103 lineages with separating clans, and thus with dialectal divisions. While finding this is still possible within Khvalynsk regions, it seems we will have a division of these lineages already ca. 4250-4000 BC, which would require a closer follow-up of the different inner late Khvalynsk groups and their samples. For the moment, we don’t have a clear connection through lineages between North-West Indo-European groups and Tocharian.
Subsequent and similar migration waves are probably to be suggested from the new sample of Karagash, beyond the Urals (attributed to the Yamna culture, hence maintaining cultural contacts after the migration waves), of R1b-Z2103 subclade, ca. 3018-2887 BC, potentially connected then to the event that caused the expansion of Yamna migrants westward into the Carpathians at the same time. Not related to Indo-Iranian migrations.
The isolated Darra-e Kur sample, without cultural attribution, ca. 2655 BC, of R1b-L151 lineage. Not related to Indo-Iranian migrations.
The Hajji Firuz samples: I4243 dated ca. 2326 BC, of haplogroup I1b, with a clear inflow of steppe ancestry; and I2327 (probably to be dated to the late 3rd millennium BC or after that), of R1b-Z2103 lineage. Not related to Indo-Iranian migrations.
NOTE. A new radiocarbon dating of I2327 is expected, to correct the currently available date of 5900-5000 BC. Since it clusters nearer to Chalcolithic samples from the site than I4243 (from the same archaeological site), it is possible that both are part of similar groups receiving admixture around this period, or maybe I2327 is from a later period, coinciding with the Iron Age sample F38 from Iran (Broushaki et al. 2016), with which it closely clusters. Also, the finding of EHG-related ancestry in Maykop samples dated ca. 3700-3000 BC (maybe with R1b-L23 subclades) offers another potential source of migrants for this Iranian group.
NOTE. Samples from Narasimhan et al. (2018) still need to be published in corrected tables, which may change the actual subclades shown here.
These late Khvalynsk / early Yamna migration waves into Asia are quite early compared to the Indo-Iranian migrations, whose ancestors can only be first identified with Volga-Ural groups of Yamna/Poltavka (ca. 3000-2400 BC), with its fully formed language expanding only with MLBA waves ca. 2300-1200 BC, after mixing with incoming Abashevo migrants.
While the authors apparently forget to reference the previous linguistic theories whereby Tocharian is more archaic than the rest of Late PIE dialects, they refer to the ca. 1,000-year gap between Pre-Tocharian and Proto-Indo-Iranian migrations, and thus their obvious difference:
The fact that Tocharian is so different from the Indo-Iranian languages can only be explained by assuming an extensive period of linguistic separation.
Potential linguistic substrates in the Middle East
A few words about relevant substrate language proposals.
What Gordon Whittaker proposes is a North-West Indo-European-related substratum in Sumerian language and texts ca. 3500 BC, which may explain some non-Sumerian, non-Semitic word forms. It is just one of many theories concerning this substratum.
This is a summary of his findings from his latest writing on the subject (a chapter of a book on Indo-European phonetics, from the series Copenhagen Studies in Indo-European):
In Sumerian and Akkadian vocabulary, the cuneiform writing system, and the names of deities and places in Southern Mesopotamia a body of lexical material has been preserved that strongly suggests influence emanating from a superstrate of Indo-European origin. This Indo-European language, which has been given the name Euphratic, is, at present, attested only indirectly through the filters of Sumerian and Akkadian. The attestations consist of words and names recorded from the mid-4th millennium BC (Late Uruk period) onwards in texts and lexical lists. In addition, basic signs that originally had a recognizable pictorial structure in proto-cuneiform preserve (at least from the early 3rd millennium on) a number of phonetic values with no known motivation in Sumerian lexemes related semantically to the items depicted. This suggests that such values are relics from the original logographic values for the items depicted and, thus, that they were inherited from a language intimately associated with the development of writing in Mesopotamia. Since specialists working on proto-cuneiform, most notably Robert K. Englund of the Cuneiform Digital Library Initiative, see little or no evidence for the presence of Sumerian in the corpus of archaic tablets, the proposed Indo-European language provides a potential solution to this problem. It has been argued that this language, Euphratic, had a profound influence on Sumerian, not unlike that exerted by Sumerian and Akkadian on each other, and that the writing system was the primary vehicle of this influence. The phonological sketch drawn up here is an attempt to chart the salient characteristics of this influence, by comparing reconstructed Indo-European lexemes with similarly patterned ones in Sumerian (and, to a lesser extent, in Akkadian).
His original model, based on phonetic values in basic proto-cuneiform signs, is quite imaginative and a very interesting read, if you have the time. His Academia.edu account hosts most of his papers on the subject.
We could speculate about the potential expansion of this substrate language with the commercial contacts between Uruk and Maykop (as I did), now probably more strongly supported because of the EHG found in Maykop samples.
NOTE. We could also put it in relation with the Anatolian language of Mari, but this would require a new reassessment of its North-West Indo-European nature.
Nevertheless, this theory is far from being mainstream, anywhere. At least today.
NOTE. The proposal remains hypothetical, because of the flaws in the Indo-European parallels – similar to Koch’s proposal of Indo-European in Tartessian inscriptions. A comprehensive critical approach to the theory is found in Sylvie Vanséveren’s A “new” ancient Indo-European language? On assumed linguistic contacts between Sumerian and Indo-European “Euphratic”, in JIES (2008) 36:3&4.
References to Gutian are popping up related to the Hajji Firuz samples of the mid-3rd millennium.
The hypothesis was put forward by Henning (1978) in purely archaeological terms.
This is the relevant excerpt from the book:
(…) Comparativists have asserted that, in spite of its late appearance, Tokharian is a relatively archaic form of Indo-European.3 This claim implies that the speakers of this group separated from their Indo-European brethren at a comparatively early date. They should accordingly have set out on their migrations rather early, and should have appeared within the Babylonian sphere of influence also rather early. Earlier, at any rate, than the Indo-Iranians, who spoke a highly developed (therefore probably later) form of Indo-European. Moreover, as some of the Indo-Iranians after their division into Iranians and Indo-Aryans4 appeared in Mesopotamia about 1500 B.C., we should expect the Proto-Tokharians about 2000 B.C. or even earlier.
If, armed with these assumptions as our working hypothesis, we look through the pages of history, we find one nation – one nation only – that perfectly fulfills all three conditions, which, therefore, entitles us to recognize it as the “Proto-Tokharians”. Its name was Guti; the initial is also spelled with q (a voiceless back velar or pharyngeal), but the spelling with g is the original one. The closing -i is part of the name, for the Akkadian case-endings are added to it, nom. Gutium etc. Guti (or Gutium, as some scholars prefer) was valid for the nation, considered as an entity, but also for the territory it occupied.
The text goes on to follow the invasion of Babylonia by the Guti, and further eastward expansions supposedly connected with these, to form the attested Tocharians.
Among the Gutian rulers is one Elulumesh, whose name is evidently Akkadian Elulum slightly “Gutianized” by the Gutian case(?) ending -eš.40 This Gutian ruler Elulum is obviously the same man whom we find participating in the scramble for power after the death of Shar-kali-sharrii; his name appears there in Sumerian form without mimation as Elulu.
The Gutian dynasty, from ca. 22nd c. BC appears as follows:
I don’t think we could derive a potential relation to any specific Indo-European branch from this simple suffix repeated in Gutian rulers, though.
The hypothesis of the Tocharian-like nature of the Guti (apart from the obvious error of considering them as the ancestors of Tocharians) has not been tested in later works. It was cited e.g. by Gamkrelidze and Ivanov (1995) to advance their Armenian homeland, and by Mallory and Adams in their Encyclopedia (1997).
It lies therefore in the obscurity of undeveloped archaeological-linguistic hypotheses, and its connection with the attested R1b-Z2103 samples from Iran is not (yet) warranted.
This is part I of two posts on the most recent data concerning the earliest known Indo-European migrations.
Anatolian in Armi
I am reading in forums about “Kroonen’s proposal” of Anatolian in the 3rd millennium. That is false. The Copenhagen group (in particular the authors of the linguistic supplement, Kroonen, Barjamovic, and Peyrot) are merely referencing Archi (2011. “In Search of Armi”. Journal of Cuneiform Studies 63: 5–34) in turn using transcriptions from Bonechi (1990. “Aleppo in età arcaica; a proposito di un’opera recente”. Studi Epigrafici e Linguistici sul Vicino Oriente Antico 7: 15–37.), who asserted the potential Anatolian origin of the terms. This is what Archi had to say about this:
Most of these personal names belong to a name-giving tradition different from that of Ebla; Arra-ti/tulu(m) is attested also at Dulu, a neighbouring city-state (Bonechi 1990b: 22–25).28 We must, therefore, deduce that Armi belonged to a marginal, partially Semitized linguistic area different from the ethno-linguistic region dominated by Ebla. Typical are masculine personal names ending in -a-du: A-la/li-wa-du/da, A-li/lu-wa-du, Ba-mi-a-du, La-wadu, Mi-mi-a-du, Mu-lu-wa-du. This reminds one of the suffix -(a)nda, -(a)ndu, very productive in the Anatolian branch of Indo-European (Laroche 1966: 329). Elements such as ali-, alali-, lawadu-, memi-, mula/i- are attested in Anatolian personal names of the Old Assyrian period (Laroche 1966: 26–27, 106, 118, 120).
This was used by Archi to speculatively locate the state of Armi, in or near Ebla territory, which could correspond with the region of modern north-western Syria:
The onomastic tradition of Armi, so different from that of Ebla and her allies (§ 5), obliges us to locate this city on the edges of the Semitized area and, thus, necessarily north of the line running through Hassuwan – Ursaum – Irritum – Harran. If Armi were to be found at Banat-Bazi, it would have represented an anomaly within an otherwise homogenous linguistic scenario.34
Taken as a whole, the available information suggests that Armi was a regional state, which enjoyed a privileged relationship with Ebla: the exchange of goods between the two cities was comparable only to that between Ebla and Mari. No other state sent so many people to Ebla, especially merchants, lú-kar. It is only a hypothesis that Armi was the go-between for Ebla and for the areas where silver and copper were extracted.
This proposal is similar to the one used to support Indo-Aryan terminology in Mittanni (ca. 16th-14th c. BC), so the scarce material should not pose a problem to those previously arguing about the ‘oldest’ nature of Indo-Aryan.
NOTE. On the other hand, the theory connecting ‘mariannu’, a term dated to 1761 BC (referenced also in the linguistic supplement), and put in relation with PIIr. *arya–, seems too hypothetical for the moment, although there is a clear expansion of Aryan-related terms in the Middle East that could support one or more relevant eastern migration waves of Indo-Aryans from Asia.
Potential routes of Anatolian migration
Once we have accepted that Anatolian is not Late PIE (and that only needed a study of Anatolian archaisms, not the terminology from Armi), we can move on to explore the potential routes of expansion.
On the Balkan route
A current sketch of the dots connecting Khvalynsk with Anatolia is as follows.
Then we have Cernavoda I (ca. 3850-3550 BC), a culture potentially derived from the earlier expansion of Suvorovo chiefs, as shown in cultural similarities with preceding cultures and Yamna, and also in the contacts with the North Pontic steppe cultures (read a recent detailed post on this question).
We also have proof of genetic inflow from the steppe into populations of cultures near those suggested to be heirs of those dominated by Suvorovo chiefs, from the 5th millennium BC (in Varna I ca. 4630 BC, and Smyadovo ca. 4500 BC, see image below).
If these neighbouring Balkan peoples of ca. 4500 BC are taken as proxies for Proto-Anatolians, then it becomes quite clear why Old Hittite samples dated 3,000 years after this migration event of elite chiefs could show no or almost no ancestry from Europe (for this question, read my revision of Lazaridis’ preprint).
NOTE. A full account of the crisis in the lower Danube, as well as the Suvorovo-Novodanilovka intrusion, is available in Anthony (2007).
The southern Balkans and Anatolia
The later connection of Cernavoda II-III and related cultures (and potentially Ezero) with Troy, on the other hand, is still blurry. But, even if a massive migration of Common Anatolian is found to happen from the Balkans into Anatolia in the late 4th / beginning of the 3rd millennium, the people responsible for this expansion could show a minimal trace of European ancestry.
Earlier third millennium cal BCE is the period of development of interconnected Early Bronze Age societies in Eurasia, which economic and social structures expressed variants of pre-state political structures, named in the specialized literature tribes and chiefdoms. In this work new arguments will be added to the chiefdom model of third millennium cal BC societies of Yunatsite culture in the Central Balkans from the perspectives of the interrelations between Dubene (south central Bulgaria) and Troy (northwest Turkey) wealth expression.
Possible explanations of the similarity in the wealth expression between Troy and Yunatsite chiefdoms is the direct interaction between the political elite. However, the golden and silver objects in the third millennium cal BCE in the Eastern Mediterranean are most of all an expression of economic wealth. This is the biggest difference between the early state and chiefdoms in the third millennium cal BCE in Eurasia and Africa. The literacy and the wealth expression in the early states was politically centralized, while the absence of literacy and wider distribution of the wealth expression in the chiefdoms of the eastern Mediterranean are indicators, that wider distribution of wealth and the existed stable subsistence layers prevented the formation of states and the need to regulate the political systems through literacy.
The only way to link Common Anatolians to their Proto-Anatolian (linguistic) ancestors would therefore be to study preceding cultures and their expansions, until a proper connecting route is found, as I said recently.
These late commercial contacts in the south-eastern Balkans (Nikolova also offers a simplified presentation of data, in English) are yet another proof of how Common Anatolian languages may have further expanded into Anatolia.
NOTE. One should also take into account the distribution of modern R1b-M269* and L23* subclades (i.e. those not belonging to the most common subclades expanding with Yamna), which seem to peak around the Balkans. While those may just belong to founder effects of populations preceding Suvorovo or related to Yamna migrants, the Balkans is a region known to have retained Y-DNA haplogroup diversity, in contrast with other European regions.
On a purely linguistic aspect, there are strong Hattic and Hurrian influences on Anatolian languages, representing a unique layer that clearly differentiates them from LPIE languages, pointing also to different substrates behind each attested Common Anatolian branch or individual language:
Phonetic changes, like the appearance of /f/ and /v/.
Split ergativity: Hurrian is ergative, Hattic probably too.
Increasing use of enclitic pronoun and particle chains after first stressed word: in Hattic after verb, in Hurrian after nominal forms.
Almost obligatory use of clause initial and enclitic connectors: e.g. semantic and syntactic identity of Hattic pala/bala and Hittite nu.
It seems that the Danish group is now taking a stance in favour of a Maykop route (from the linguistic supplement):
The period of Proto-Anatolian linguistic unity can now be placed in the 4th millennium BCE and may have been contemporaneous with e.g. the Maykop culture (3700–3000 BCE), which influenced the formation and apparent westward migration of the Yamnaya and maintained commercial and cultural contact with the Anatolian highlands (Kristiansen et al. 2018).
In fact, they have data to support this:
The EHG ancestry detected in individuals associated with both Yamnaya (3000–2400 BCE) and the Maykop culture (3700–3000 BCE) (in prep.) is absent from our Anatolian specimens, suggesting that neither archaeological horizon constitutes a suitable candidate for a “homeland” or “stepping stone” for the origin or spread of Anatolian Indo-European speakers to Anatolia. However, with the archaeological and genetic data presented here, we cannot reject a continuous small-scale influx of mixed groups from the direction of the Caucasus during the Chalcolithic period of the 4th millennium BCE.
It will not be surprising to find not only EHG, but also R1b-L23 subclades there. In my opinion, though, the most likely source of EHG ancestry in Maykop (given the different culture shown in other steppe groups) is exogamy.
The question will still remain: was this a Proto-Anatolian-speaking group?
My opinion in this regard – again, without access to the study – is that you would still need to propose:
A break-up of Anatolian ca. 4500 BC represented by some early group migrating into the Northern Caucasus area.
For this group – who were closely related linguistically and culturally to early Khvalynsk – to remain isolated in or around the Northern Caucasus, i.e. somehow ‘hidden’ from the evolving LPIE speakers in late Khvalynsk/early Yamna peoples.
Then appear as Old Hittites without showing EHG ancestry (even though they show it in the period 3700-3000 BC), near the region of the Armi state, where Anatolian was supposedly spoken already in the mid-3rd millennium.
Not a very convincing picture, right now, but indeed possible.
Also, we have R1b-Z2103 lineages and clear steppe ancestry in the region probably ca. 2500 BC with Hajji Firuz, which is most likely the product of the late Khvalynsk migration waves that we are seeing in the recent papers.
These migrations are then related to early LPIE-speaking migrants spreading after ca. 3300 BC – that also caused the formation of early Yamna and the expansion of Tocharian-related migrants – , which leaves almost no space for an Anatolian expansion, unless one supports that the former drove the latter.
NOTE. In any case, if the Caucasus route turned out to be the actual Anatolian route, I guess this would be a way as good as any other to finally kill their Indo-European – Corded Ware theory, for obvious reasons.
On the North Iranian homeland
A few thoughts for those equating CHG ancestry in IE speakers (and especially now in Old Hittites) with an origin in North Iran, due to a recent comment by David Reich:
In the paper it is clearly stated that there is no Neolithic Iranian ancestry in the Old Hittite samples.
Ancestry is not people, and it is certainly not language. The addition of CHG ancestry to the Eneolithic steppe need not mean a population or linguistic replacement. Although it could have been. But this has to be demonstrated with solid anthropological models.
NOTE. On the other hand, if you find people who considered (at least until de Barros Damgaard et al. 2018) steppe (ancestry/PCA) = Indo-European, then you should probably confront them about why CHG in Hittites and the arrival of CHG in steppe groups is now not to be considered the same, i.e. why CHG / Iran_N ≠ PIE.
Since there has been no serious North Iranian homeland proposal made for a while, it is difficult to delineate a modern sketch, and I won’t spend the time with that unless there is some real anthropological model and genetic proof of it. I guess the Armenian homeland hypothesis proposed by Gamkrelidze and Ivanov (1995) would do, but since it relies on outdated data (some of which appears also in Gimbutas’ writings), it would need a full revision.
NOTE. Their theory of glottalic consonants (or ejectives) relied on the ‘archaism’ of Hittite, Germanic, and Armenian. As you can see (unless you live in the mid-20th century) this is not very reasonable, since Hittite is attested quite late and after heavy admixture with Middle Eastern peoples, and Germanic and Armenian are some of the latest attested (and more admixed, phonetically changed) languages.
This would be a proper answer, indeed, for those who would accept this homeland due to the reconstruction of ‘ejectives’ for these languages. Evidently, there is no need to posit a homeland near Armenia to propose a glottalic theory. Kortlandt is a proponent of a late and small expansion of Late PIE from the steppe, and still proposes a reconstruction of ejectives for PIE. But this was the main reason Gamkrelidze and Ivanov proposed that homeland, and in that sense it is obviously flawed.
Those claiming a relationship of the North Iranian homeland with such EHG ancestry in Maykop, or with the hypothetical Proto-Euphratic or Gutian, are obviously not understanding the implications of finding steppe ancestry coupled with (likely) early Late PIE migrants in the region in the mid-4th millennium.
There is a lot of interesting data here; I will try to analyse its main implications, if only superficially, in sections.
Anatolia_EBA from Ovaören, and Anatolia_MLBA (this including Assyrian and Old Hittite samples), all from Kalehöyük, show almost no change in Y-DNA lineages (three samples J2a, one G2a), and therefore an origin of these people in common with CHG and Iranian Neolithic populations is likely. No EHG ancestry is found. The PCA cluster is only somewhat closer to European samples, and not to EHG populations.
NOTE. Hittite is attested only in the late first half of the 2nd millennium, although the authors cite (in the linguistic supplement) potential evidence from the palatial archives of the ancient city of Ebla in Syria to argue that Indo-European languages may have been already spoken in the region in the late 3rd millennium BCE.
Regarding the Assyrian samples (one J2a) from Ovaören:
Layer V of GT-137 was the richest in terms of architectural finds and dates to the Early Bronze Age II. In this layer, 2 different structures and a well were uncovered. The well was filled with stones, pottery, and human skeletons (Figs. S2 and S3). In total, skeletons belonging to 22 individuals, including adults, young adults, and children, must belong to the disturbed Early Bronze Age II graves adjacent to the well (103). Pottery and stones found below the skeletons demonstrate that the water well was consciously filled and closed. The fill consists of dumped stones, sherds and skeletons, and the closing stones demonstrate that the water well was consciously filled and cancelled.
Regarding the site most likely associated with the emergence of Old Hittite (two samples J2a1, one G2a2b1), this is what we know:
The Middle Bronze Age at Kaman-Kalehöyük represented by stratum IIIc yields material remains (seals and ceramics) contemporary with the international trade system managed by expatriate Assyrian merchants evidenced at the nearby site of Kültepe/Kanesh. It is therefore also referred to as belonging to the “Assyrian Colony Period” (98). The stratum has revealed three burned architectural units, and it has been suggested that the seemingly site-wide conflagration might be connected to a destruction event linked with the emergence of the Old Hittite state (99). (…) Omura (100) suggests that the rooms could belong to a public building, and that it might even be a small trade center based on the types of artifacts recovered. Omura (100) has concluded that the evidence from the first complex indicates a battle between 2 groups took place at the site. It is possible that a group died inside the buildings, mostly perishing in the fire, while another group died in the courtyard.
The PCA (Fig. 2B) indicates that all the Anatolian genome sequences from the Early Bronze Age (~2200 BCE) and Late Bronze Age (~1600 BCE) cluster with a previously sequenced Copper Age (~3900-3700 BCE) individual from Northwestern Anatolia and lie between Anatolian Neolithic (Anatolia_N) samples and CHG samples but not between Anatolia_N and EHG samples.
(…) we are not able to reject a two-population qpAdm model in which these groups derive ~60% of their ancestry from Anatolian farmers and ~40% from CHG-related ancestry (p-value = 0.5). This signal is not driven by Neolithic Iranian ancestry.
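To make the meaning of those proportions concrete, here is a minimal sketch of what a two-source mixture implies for a single allele frequency. This is not qpAdm (which fits proportions from f4-statistics across many sites); the function name and the allele frequencies are hypothetical and chosen only for illustration, while the 60/40 split is the approximate one quoted above.

```python
def mixed_frequency(freq_by_source, proportions):
    """Expected allele frequency in a population mixed from several sources."""
    if abs(sum(proportions.values()) - 1.0) > 1e-9:
        raise ValueError("admixture proportions must sum to 1")
    # Weighted average of the source frequencies.
    return sum(proportions[src] * freq_by_source[src] for src in proportions)


# Approximate proportions from the quoted two-population model.
proportions = {"Anatolia_N": 0.60, "CHG": 0.40}

# Hypothetical allele frequencies at one site (NOT real data).
hypothetical_freqs = {"Anatolia_N": 0.10, "CHG": 0.50}

expected = mixed_frequency(hypothetical_freqs, proportions)
print(round(expected, 2))  # 0.6 * 0.10 + 0.4 * 0.50 = 0.26
```

A model "not being rejected" (p-value = 0.5) only means the observed data are compatible with such a mixture; it does not exclude other source combinations.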
NOTE. Anatolian Iron Age samples, from the Hellenistic period, which was obviously greatly influenced by different, later Indo-European migrations, do show a change in PCA.
Regarding CHG ancestry:
Ancient DNA findings suggest extensive population contact between the Caucasus and the steppe during the Copper Age (~5000-3000 BCE) (1, 2, 42). Particularly, the first identified presence of Caucasian genomic ancestry in steppe populations is through the Khvalynsk burials (2, 47) and that of steppe ancestry in the Caucasus is through Armenian Copper Age individuals (42). These admixture processes likely gave rise to the ancestry that later became typical of the Yamnaya pastoralists (7), whose IE language may have evolved under the influence of a Caucasian language, possibly from the Maykop culture (50, 55). This scenario is consistent with both the “Copper Age steppe” (4) and the “Caucasian” models for the origin of the Proto-Anatolian language (56).
The CHG specific ancestry and the absence of EHG-related ancestry in Bronze Age Anatolia would be in accordance with intense cultural interactions between populations in the Caucasus and Anatolia observed during the late 5th millennium BCE that seem to come to an end in the first half of the 4th millennium BCE with the village-based egalitarian Kura-Araxes society (59, 60), thus preceding the emergence and dispersal of Proto-Anatolian.
Our results indicate that the early spread of IE languages into Anatolia was not associated with any large-scale steppe-related migration, as previously suggested (61). Additionally, and in agreement with the later historical record of the region (62), we find no correlation between genetic ancestry and exclusive ethnic or political identities among the populations of Bronze Age Central Anatolia, as has previously been hypothesized (63).
The Anatolian question
There is no steppe ancestry or R1b-M269 lineages near early historic Hittites. Yet.
Nevertheless, we already know about potentially similar cases:
N1c lineages and Siberian ancestry arrived late in North-East Europe, modifying the ancestry of North-East European groups – with each region showing its own different late waves of N lineages or Siberian ancestry. Even after the known bottlenecks and the subsequent expansion of recently arrived haplogroups and ancestry, there was not much cultural (or ethnolinguistic) impact.
So there seems to be no theoretical problem in accepting either of the following:
That neither steppe ancestry nor R1b-M269 subclades, already diminished in Bulgaria in the mid-5th millennium, did reach Anatolia, but only those Common Anatolian-speaking Aegean groups over whose ancestors Proto-Anatolians (marked by incoming EHG ancestry) would have previously dominated in the Balkans.
That steppe ancestry and R1b-M269 subclades did in fact arrive in the Aegean, but EHG was further diluted among the CHG-related population by the time of the historic Anatolian-speaking peoples in central Anatolia. Or, the most likely option, that their traces have not yet been found. Probably the western Luwian peoples, near Troy, were genetically closer to Common Anatolians.
What we can assert right now is that Proto-Anatolian must have separated quite early for this kind of data to show up. This should mean an end to the Late PIE origin of Anatolian, if there was some lost soul from the mid-20th century still rooting for this.
As I said in my review of Lazaridis’ latest preprint, we will have to wait for the appropriate potential routes of expansion of Proto-Anatolian to be investigated. As he answered, the lack of EHG poses a problem for steppe expansion into Anatolia, but there is still no better alternative model proposed.
This is what the authors have to say:
Our findings are thus consistent with historical models of cultural hybridity and “Middle Ground” in a multi-cultural and multi-lingual but genetically homogeneous Bronze Age Anatolia (68, 69). Current linguistic estimations converge on dating the Proto-Anatolian split from residual PIE to the late 5th or early 4th millennia BCE (58, 70) and place the breakup of Anatolian IE inside Turkey prior to the mid-3rd millennium (53, 71,72).
We cannot at this point reject a scenario in which the introduction of the Anatolian IE languages into Anatolia was coupled with the CHG-derived admixture prior to 3700 BCE, but note that this is contrary to the standard view that PIE arose in the steppe north of the Caucasus (4) and that CHG ancestry is also associated with several non-IE-speaking groups, historical and current. Indeed, our data are also consistent with the first speakers of Anatolian IE coming to the region by way of commercial contacts and small-scale movement during the Bronze Age. Among comparative linguists, a Balkan route for the introduction of Anatolian IE is generally considered more likely than a passage through the Caucasus, due, for example, to greater Anatolian IE presence and language diversity in the west (73). Further discussion of these options is given in the archaeological and linguistic supplementary discussions (48, 49).
If you are asking yourselves why the Danish school (of Allentoft, Kristiansen, and Kroonen, co-authors of this paper) was not so fast to explain the findings the same way they proposed their infamous Indo-European – steppe ancestry association (i.e. ancestry = language, ergo CHG = PIE in this case), and resorted to mainstream anthropological models instead to explain the incongruence, I can think of two main reasons:
The possibility of having an early PIE around the Caucasus, potentially closely related not only to Uralic to the north, but also to Caucasian languages, Sumerian, Afroasiatic, Elamo-Dravidian, etc. could be a good reason for those excited with these few samples to begin dealing with macro-language proposals, such as Eurasiatic and Nostratic. If demonstrated to be true, a Northern Iranian origin of Middle PIE would also help relieve a little bit the pressure that some are feeling about the potentially male-driven Indo-European continuity (even if not “autochthonous”) associated with the expansion of R1b-L23 subclades.
Interesting data from an early East Yamna offshoot at Karagash, ca. 3018-2887 BC, of R1b-Z2106 lineage, which shows some ancestry, lineage, and cultural continuity in Sholpan, ca. 2620-2468 BC, in Kazakhstan.
On the formation of Yamna and its CHG contribution, from the supplementary material:
An admixture event, where Yamnaya is formed from a CHG population related to KK1 [=Kotias, dated ca. 7800 BC] and an ANE population related to Sidelkino and Botai. We inferred 54% of the Yamnaya ancestry to come from CHG and the remaining 46% to come from ANE.
A split event, where the CHG component of Yamnaya splits from KK1. The model inferred this time at 27 kya (though we note the larger models in Sections S2.12.4 and S2.12.5 inferred a more recent split time [see below graphic]).
A split event, where the ANE component of Yamnaya splits from Sidelkino. This was inferred at about 11 kya.
A split event, where the ANE component of Yamnaya splits from Botai. We inferred this to occur 17 kya. Note that this is above the Sidelkino split time, so our model infers Yamnaya to be more closely related to the EHG Sidelkino, as expected.
An ancestral split event between the CHG and ANE ancestral populations. This was inferred to occur around 40 kya.
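The events listed above can be written down as plain data, which makes the authors' reasoning easy to check: the more recent a split, the closer the relationship. The following is only a sketch that encodes the quoted numbers; the dictionary layout, labels, and function name are my own assumptions, not anything taken from the paper's software.

```python
# Inferred admixture proportions for Yamnaya, as quoted above.
YAMNAYA_ADMIXTURE = {"CHG": 0.54, "ANE": 0.46}

# Inferred split times in thousands of years ago (kya), as quoted above.
# Keys are (lineage, population it splits from); labels are illustrative.
SPLIT_TIMES_KYA = {
    ("Yamnaya CHG component", "KK1"): 27,
    ("Yamnaya ANE component", "Sidelkino"): 11,
    ("Yamnaya ANE component", "Botai"): 17,
    ("CHG ancestors", "ANE ancestors"): 40,
}


def closest_relative(component, split_times):
    """Return the population whose split from `component` is most recent."""
    candidates = {
        pop: kya for (comp, pop), kya in split_times.items() if comp == component
    }
    return min(candidates, key=candidates.get)


# The two ancestry proportions should account for all of the ancestry.
assert abs(sum(YAMNAYA_ADMIXTURE.values()) - 1.0) < 1e-9

print(closest_relative("Yamnaya ANE component", SPLIT_TIMES_KYA))  # Sidelkino
```

With a split at 11 kya against 17 kya for Botai, the ANE side of Yamnaya comes out closer to the EHG Sidelkino, which is the point the quoted note makes.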
On the expansion of domestication
CHG is not found in Botai, no gene flow from Yamna is found in its samples, and they are more related to East Asians, while Yamna is related to West Eurasians:
The lack of evidence of admixture between Botai horse herders and western steppe pastoralists is consistent with these latter migrating through the central steppe but not settling until they reached the Altai to the east (4). More significantly, this lack of admixture suggests that horses were domesticated by hunter-gatherers not previously familiar with farming, as were the cases for dogs (38) and reindeer (39). Domestication of the horse thus may best parallel that of the reindeer, a food animal that can be milked and ridden, which has been proposed to be domesticated by hunters via the “prey path” (40); indeed anthropologists note similarities in cosmological beliefs between hunters and reindeer herders (41). In contrast, most animal domestications were achieved by settled agriculturalists (5).
NOTE. I am not sure, but they seem to hint that there were separate events of horse domestication and horse-riding technique by the Botai and Yamna populations, due to the lack of genetic contribution from the latter to the former. I guess they did not take into account farming spreading to the steppe without genetic contribution beyond the Dnieper… In fact, the superiority in horse-riding shown by the expanding Yamna peoples – as they state – should also serve to suggest from where the original technique expanded.
On the expansion of Yamna, and the different expansion of Steppe MLBA (with Indo-Iranian speakers) into Asia, further supporting Narasimhan et al. (2018), they have this to say:
However, direct influence of Yamnaya or related cultures of that period is not visible in the archaeological record, except perhaps for a single burial mound in Sarazm in present-day Tajikistan of contested age (44, 45). Additionally, linguistic reconstruction of proto-culture coupled with the archaeological chronology evidences a Late (~2300-1200 BCE) rather than Early Bronze Age (~3000-2500 BCE) arrival of the Indo-Iranian languages into South Asia (16, 45, 46). Thus, debate persists as to how and when Western Eurasian genetic signatures and IE languages reached South Asia.
Samples from the Namazga region (current Turkmenistan) from the Iron Age show an obvious influence from steppe MLBA (ca. 2300-1200 BC), and not steppe EBA (i.e. Yamna), population, in contrast with samples from the Chalcolithic (ca. 3300 BC), which don’t show this influence. This helps distinguish prior contacts with Iran Neolithic from the actual steppe population that expanded Indo-Iranian into Asia.
Very interesting therefore the Namazga CA sample (ca. 855 BC), of R1a-Z93 subclade, showing the sign of immigrant Indo-Aryans in the region. For more on this we will need an evaluation in common with the corrected data from Narasimhan et al. (2018), and all, including de Barros (Nature 2018), in combination with statistical methods to ascertain differences between early Indo-Aryans and Iranians.
Siberian peoples and N1c lineages
We have already seen how the paper on Eurasian steppe samples tries to assign Uralic to Neolithic peoples east of the Urals. The association with Okunevo is unlikely, since most are of haplogroup Q1a2, but they seem to suggest (combining both papers) that they accompanied N lineages from Siberian hunter-gatherers (present e.g. in Botai or Shamanka II, during the Early Neolithic), and formed part of (or suffered from) different demic diffusion waves:
These serial changes in the Baikal populations are reflected in Y-chromosome lineages (Fig. 5A; figs. S24 to S27, and tables S13 and S14). MA1 carries the R haplogroup, whereas the majority of Baikal_EN males belong to N lineages, which were widely distributed across Northern Eurasia (29), and the Baikal_LNBA males all carry Q haplogroups, as do most of the Okunevo_EMBA as well as some present-day Central Asians and Siberians.
NOTE. Also interesting to see no R1a in Baikal hunter-gatherers after ca. 3500 BC, and a prevalence of N lineages as supported in a previous paper on the Kitoi culture, which some had questioned in the past.
In fact, the only N1c1 sample comes from Ust’Ida Late Neolithic, 180km to the north of Lake Baikal, apparently before the expansion of Q1a2a lineages during the EBA period. While this sample may be related to those expanded later in Finno-Ugric territory (although it may only be related to those expanded much later with Yakuts), other samples are not clearly from those found widely distributed among North-East Europeans only after the Iron Age, or – as in the case of Shamanka II (N1c2), they are clearly not of the same haplogroup.
Regarding Y-DNA data, once again almost 100% of samples from late Khvalynsk/Yamna and derived cultures (like Afanasevo and Bell Beaker) are R1b-L23, no single R1a-M417 lineage found, and few expected by now, if any, within Late Proto-Indo-European territory.
While they claim to take Y-DNA into account to assess migrations – as they do for example with Asian cultures – , their previous model of a Yamna “R1a-R1b community” remains oddly unchanged, and they even insist on it in the supplementary materials, as they do in their parallel Nature paper.
They have also expressly mitigated the use of ancestral components to assess populations, citing the ancestral and modern association of CHG ancestry with different ethnolinguistic groups in the Middle East, to dismiss any rushed conclusions on the origin of Anatolian, and consequently of Middle PIE. And they did so evidently because it did not fit the anthropological data that is mainstream today (supporting a Balkan route), which is the right thing to do.
However, they have apparently not stopped to reconsider the links of CWC and steppe ancestry to ancestral and modern Uralic peoples – although they expressly mention the strong connection with modern Karelians in the supplementary material.
Also, after Narasimhan et al. (2018), there is a clear genetic continuity with East Yamna (in ancestry as in R1b-L23 subclades), so their interpretations about Indo-Iranian in this paper and especially de Barros (Nature 2018) – regarding the Abashevo -> Sintashta/Srubna/Andronovo connection – come, again, too late.
Here is my translation of the reported summary (emphasis mine):
Khokhlov, A.A. Preliminary results of anthropological and genetic studies of materials of the Volga-Ural region of the Neolithic-Early Bronze Age by an international group of scientists.
In his report, A. A. Khokhlov introduced the scientific circle to the still unpublished data of the new Eneolithic burial ground Yekaterinovskiy Cape, which combines both the Mariupol and Khvalynsk features, and is dated to the fourth quarter of the V millennium BC. All samples analyzed had a Uraloid anthropological type, the chromosome of all samples belonged to haplogroup R1b1a2 (R-P312/S116), and to haplogroup R1b1a1a2a1a1c2b2b1a2. mtDNA to haplogroups U2, U4, U5. In the Khvalynsk burial grounds (first half of the IV millennium BC), the anthropological material differs in a greater variety. In addition to the Uraloid substratum, European wide-faced and southern European variants are recorded. To the samples are added haplogroup R1a1, O1a1, I2a2 to mtDNA T2a1b, H2a1.
So, first of all:
This is a reported summary of an oral communication, and it was written in a forum by a user. Unlike many out there, though, this one uses his real name, apparently attended the conference, and is himself a Russian of self-reported haplogroup R1a1a, so he probably has no interest in reporting this if it is not true. Any errors it contains may be his own rather than part of the original communication, since he says he wrote it by hand.
Something is obviously off with the haplogroup nomenclature. There has recently been mixing of standards, with some papers reporting R1b1a2-M269 (which is supposed to be now ISOGG V88), and most using R1b1a1a2-M269. What I had never seen is both standards used at the same time, as in this report, so I guess it’s another error of transcription.
It is doubtful that we would be talking about that recent referenced subclade of U106, but it can’t be a surprise to finally find L51 subclades alongside Z2103 in Proto-Indo-European territory. Also, the summary must obviously refer to Q1a1, not O1a1, and probably to the first half of the V (and not IV) millennium BC.
NOTE. Since Khokhlov, like Anthony, is an anthropologist, and this is an archaeological conference, we could suppose – if the report is truthful to what he said or what could be read in the summary – that this is the best he can do to report genetic material that was not assessed by him, but by a specialized lab, because it is not his field. I think the relevant data is nevertheless useful until we have the official publication.
From this report of archaeological works, we know there were 60 Early Eneolithic burials excavated in 2013, dating to the period between S’yezzhe and Khvalynsk. 15 more burials were excavated in 2017, and there are to date already around 93 reported burials, with ongoing excavations.
Assuming that what the report conveys is more or less correct in the basics, let’s derive some simple conclusions from the data:
The presence of some samples uniformly of R1b-L23 subclades that early will mean an end to the question of when this haplogroup dominated over the Khvalynsk population, and probably also when it appeared (rather early during this culture’s formation), since it would mean R1b-L23 subclades were widespread already by the end of the 5th millennium.
I can only guess that CHG ancestry will be found in these samples, based indirectly on what is reported in anthropological terms, and what appears later in Yamna and Afanasevo samples. This will contradict some recent comments suggesting an admixture driven by males from the south, and especially a Maykop -> Khvalynsk migration as a source of this component, placing the admixture at earlier times, and/or driven by exogamy. Therefore we can reject the formation of Middle PIE outside of Khvalynsk, and also the expansion of Proto-Anatolian from Maykop (unless Maykop itself is proposed as a steppe offshoot).
Khvalynsk was probably dominated by R1b-L23 subclades already ca. 4250-4000 BC, which – combined with earlier, more diverse Eneolithic samples from the region (dated ca. 5000-4500 BC) – would support an expansion of these subclades just before this time, in the mid-5th millennium BC, as I proposed based on ancient samples and TMRCAs of modern haplogroups. It is now more likely then that I was right in linking the expansion of R1b-M269 and early R1b-L23 lineages as chiefs with the spread of horse riding from early Khvalynsk, and thus associated also with the split and migration of the Proto-Anatolian community, probably with expanding Suvorovo-Novodanilovka chiefs.
NOTE. While the presence of R1b-P312 and R1b-U106 subclades that early does not seem likely based on their estimated formation dates (in turn based on modern descendants), this is not the first time that such estimations have been proven wrong with ancient samples (viz. the “late” Z93 subclade from Eneolithic Ukraine sample I6561). Also, we already have one sample labelled U106 supposedly expanding with Indo-Iranians, and a sample of an early L51 subclade in Central Asia potentially linked to Afanasevo migrants in the infamous tables of Narasimhan et al. (2018), which help support its early presence in the North Caspian area. Some of these younger subclades seem (based on TMRCAs and forming dates of modern haplogroups) more like a wrong ‘excessive-subclade-reporting fest’, probably due to the use of a certain software for inferences of Y-SNP calls from scarce material, but who knows.
EDIT (2 MAY 2018): A commenter in the forum cast doubts on the actual dates of the site, citing the reservoir effect in Khvalynsk which may show earlier radiocarbon dates than the actual ones. Since this is an international team well versed in archaeological remains of this region, and there have been already many samples and remains assessed before and after these dates, it is not very likely that they did not take such problems of radiocarbon dating into account when reporting the findings…
The publication of this and more data in a book is supposedly due for the summer, so let’s wait for the officially reported haplogroups, and for the corrected tables in Narasimhan et al. (2018), to draw the necessary detailed conclusions.
This post was emailed to subscribers of this blog on the 1st of May immediately after publication, with our Newsletter. If you want to keep up to date with the latest interesting information instantly (few mails will be submitted a month, if any), subscribe now.
EDIT (May 2017) The answer I received from the group to my questions regarding these samples can be read here.
User Camulogène Rix at Anthrogenica posted an interesting excerpt of Reich’s new book in a thread on ancient DNA studies in the news (emphasis mine):
Ancient DNA available from this time in Anatolia shows no evidence of steppe ancestry similar to that in the Yamnaya (although the evidence here is circumstantial as no ancient DNA from the Hittites themselves has yet been published). This suggests to me that the most likely location of the population that first spoke an Indo-European language was south of the Caucasus Mountains, perhaps in present-day Iran or Armenia, because ancient DNA from people who lived there matches what we would expect for a source population both for the Yamnaya and for ancient Anatolians. If this scenario is right the population sent one branch up into the steppe-mixing with steppe hunter-gatherers in a one-to-one ratio to become the Yamnaya as described earlier- and another to Anatolia to found the ancestors of people there who spoke languages such as Hittite.
The thread has since logically become a trolling hell, and it seems not to be working right for hours now.
This new idea based on ancestral components suffers thus from the same essential methodological problems, which equate it – yet again – to pure speculation:
It is a conclusion based on the genomic analysis of few individuals from distant regions and different periods, and – maybe more disturbingly – on the lack of steppe ancestry in the few samples at hand.
Wait, what? Steppe ancestry? So they are trying to derive potential genetic connections among specific prehistoric cultures with a poorly depicted genetic sketch, based on previous flawed concepts (instead of on anthropological disciplines), which seems a rather long stretch for any scientist, whether they are content with seeing themselves as barbaric scientific conquerors of academic disciplines or not. In other words, statistics is also science (in fact, the main one to assert anything in almost any scientific field), and you cannot overcome essential errors (design, sampling, hypothesis testing) merely by using a priori correct statistical methods. Results obtained this way constitute a statistical fallacy.
Even if the sampling and hypothesis testing were fine, to derive anthropological models from genomic investigation is completely wrong. Ancestral component ≠ population.
To include not only potential migrations, but also languages spoken by these potential migrants? It’s sad that we have a need to repeat it, but if ancestral component ≠ population, how could ancestral component = language?
The Proto-Indo-European-speaking community
This is what we know about the formation of a Proto-Indo-European community (i.e. a community speaking a reconstructible Proto-Indo-European language) in the Pontic-Caspian steppe, which is based on linguistic reconstruction and guesstimates, tracing archaeological cultures backwards from cultures known to have spoken ancient (proto-)languages, and helping both disciplines with anthropological models (for which ancient genomics is only helping select certain details) of migration or – rarely – cultural diffusion:
ca. 4500 BC. Khvalynsk probably speaking Middle Proto-Indo-European expands, most likely including Suvorovo-Novodanilovka chiefs into the North Pontic steppe, and probably expanding R1b-M269 lineages for the first time.
ca. 4000 BC. Separated communities develop, including North Pontic cultures probably gradually dominated by R1a-Z645 (potentially speaking Proto-Uralic); and Khvalynsk (and Repin) cultures probably dominated by R1b-L23 lineages, most likely developing a Late Proto-Indo-European already separated from Proto-Anatolian.
ca. 3500 BC. A Proto-Corded Ware population dominated by R1a-Z645 expands to the north, and slightly later an early Yamna community develops from Late Khvalynsk and Repin, expanding to the west of the Don River, and to the east into Afanasevo. This is most likely the period of reduction of variability and expansion of subclades of R1a-Z645 and R1b-L23 that we expect to see with more samples.
For those willingly lost in a myriad of new dreams boosted by the shallow comment contained in David Reich’s paragraph on CHG ancestry, even he does not doubt that the origin of Late Proto-Indo-European lies in Yamna, to the north of the Caucasus, based on Anthony’s (2007) account:
Inner genetic flow among steppe cultures in close contact.
Potentially stable seasonal exchange systems during the Eneolithic among certain steppe groups with settlements of the Northern Caucasus, which may have included bidirectional exogamy practices.
Just to be clear, an expansion of Proto-Anatolian to the south, through the Caucasus, cannot be discarded today. It will remain a possibility until Maykop and more Balkan Chalcolithic and Anatolian-speaking samples are published.
However, an original Early Proto-Indo-European community south of the Caucasus seems to me highly unlikely, based on anthropological data, which should drive any conclusion. From what I could read, here are the rather simplistic arguments used:
Gimbutas and Maykop: Maykop was thought to be (in Gimbutas’ times) a rather late archaeological culture, directly connected to a Transcaucasian Copper Age culture ca. 2400-2300 BC. It has been demonstrated in recent years that this culture is substantially older, and even then language guesstimates for a Late PIE / Proto-Anatolian would not fit a migration to the north. While our ignorance may certainly be used to derive far-fetched conclusions about potential migrations from and to it, using Gimbutas (or any archaeological theory until the 1990s) today does not make any sense. Still less if we think that she favoured a steppe homeland.
NOTE. It seems that the Reich Lab may have already access to Maykop samples, so this suggested Proto-Indo-European – Maykop connection may have some real foundation. Regardless, we already know that intense contacts happened, so there will be no surprise (unless Y-DNA shows some sort of direct continuity from one to the other).
Gamkrelidze & Ivanov: they argued for an Armenian homeland (and are thus at the origin of yet another autochthonous continuity theory), but they did so to support their glottalic theory, i.e. merely to support what they saw as favouring their linguistic model (with Armenian being the most archaic dialect). The glottalic theory is supported today – as far as I know – mainly by Kortlandt, Jagodziński, or (Nostraticist) Bomhard, but even they most likely would not need to argue for an Armenian homeland. In fact, their support of a Graeco-Aryan group (also supported by Gamkrelidze & Ivanov) would be against this, at least in archaeological terms.
Colin Renfrew and the Anatolian homeland: This conceptual umbrella of language spreading with farming everywhere has changed so much and so many times in the past 20 years, with so many glottochronological and archaeological estimates circulating, that you can support anything by now using them. Mostly used today for abstract models of long-lasting language contacts, cultural diffusion, and constellation analogies. Anyway, he strives to keep up-to-date information to revise the model, that much is certain:
Glottochronology, phylogenetic trees, Swadesh list analysis, statistical estimates, psychics, pyramid power, and healing crystals: no, please, no.
In principle, unlike many other recent autochthonous continuity theories, I doubt there can be much racial-based opposition anywhere in the world to an origin of Proto-Indo-European in the Middle East, where the oldest civilizations appeared – apart, obviously, from modern Northeast and Northwest Caucasian, Kartvelian, or Semitic speakers, who may in turn have to revisit their autochthonous continuity theories radically…
In fact, Proto-Anatolian and Common Anatolian speakers need not share any ancestral component, PCA cluster, or any other statistical parameter related to steppe populations, not even the same Y-DNA haplogroups, given that approximately three thousand years might have passed between their split from an Indo-Hittite community and the first attested Anatolian-speaking communities…We must carefully follow their tracks from Anatolia ca. 1500 BC to the steppe ca. 4500 BC, otherwise we risk creating another mess like the Corded Ware one.
In my opinion, the substantial contribution of EHG ancestry and R1a-M417 lineages to the Pontic-Caspian steppe (probably ca. 6500 BC) from Central or East Eurasia is the most recent sizeable genomic event in the region, and thus the best candidate for the community that expanded a language ancestral to Proto-Indo-European – whether you call it Pre-Proto-Indo-European, Pre-Indo-Uralic, or Eurasiatic, depending on your preferences.
An early (and substantial) contribution of CHG ancestry in Khvalynsk relative to North Pontic cultures, if it is found with new samples, may actually be a further proof of the Caucasian substrate of Proto-Indo-European proposed by Kortlandt (or Bomhard) as contributing to the differentiation of Middle PIE from Uralic. Genomics could thus help support, again, traditional disciplines in accepting or rejecting academic controversial theories.
In the case of an Early PIE (or Indo-Uralic) homeland, genomic data is scarce. But all traditional anthropological disciplines point to the Pontic-Caspian steppe, so we should stick to it, regardless of the informal suggestion written by a renowned geneticist in one paragraph of a book conceived as an introduction to the field.
It seems we are not learning much from the hundreds of peer-reviewed, statistically (superficially, at least) sound genetic papers whose anthropological conclusions have been proven wrong by now. A lot of people should be spending their time learning about the complex, endless methods at hand in this kind of research – not just bioinformatics – instead of fruitlessly speculating about wild unsubstantiated proposals.
As a final note, I would like to remind some in the discussion, who seem to dismiss the identification of CHG with Proto-Indo-European by supporting a “R1a-R1b” community for PIE, of their previous commitment to ancestral components in identifying peoples and languages, and thus their support for Reich’s (and his group’s) fundamental premises.
You cannot have it both ways. At least David Reich is being consistent.
The paper presents the result of analysis of charred food on the interior part of the vessels from the graves of the East Manych and West Manych Catacomb archaeological cultures (2500–2350 cal bc). The phytolith and pollen analyses identified pollen of wild steppe plants and phytoliths of domesticated gramineous plants determined as barley phytoliths. Direct 14C dating of one of the samples demonstrates that barley spikelets and stems were used in funeral rites by local steppe communities. However, there are no data suggesting that steppe inhabitants of the Lower Don Region were engaged in agriculture in the mid-3000 bc. Supposedly, barley could have reached the steppes through seasonal migrations of mobile pastoralists to the south, use of North Caucasus grasslands in the economic system of seasonal moves and exchange with local people. Nevertheless, presence of carbonized barley seeds in the occupation layers at North Caucasus settlements of 4000–3000 bc requires confirmation by direct 14C dating of such samples.
The results of studies of the chemical and microbiological properties of the soils buried under the barrows of the Eneolithic, Bronze, and Middle Ages periods of the southeast of the Russian Plain are presented. It was shown that the climate of the region in the Eneolithic period (4200–4100 BC) and in the Middle Ages (700 years ago) was more humid in comparison to the present time. The third millennium BC was characterized by a gradual increase of the climate aridity. Its peak was at the end of the III millennium BC. The number and biomass of microbial cells was maximal in soils buried in periods of high atmospheric humidity (4200–4100 and 3000–2800 BC) and sharply decreased during the aridization period in the second half of the III millennium BC. In general, the variability of indicators of microbocenosis conditions of desert–steppe buried soils of all ages from the burial mounds correlated with the centuries-old dynamics of the climate.
It is well known that access to more food – as in favorable crops and cattle feeding – may cause demographic explosions, and the second article – together with recent genomic data – may be yet another proof of that.
Until now, pastoralism seemed to be the main subsistence economy for most steppe groups. It seems, though, that earlier Eneolithic contacts of certain steppe groups with settlements of the Northern Caucasus might have served not just to obtain prestige goods but – if proper radiocarbon dating confirms it – also essential goods, and may have involved more stable seasonal exchange systems.
Such stable economic exchanges might have therefore included bidirectional exogamy practices, justifying the sizeable genomic contribution from the Caucasus.
At this point this is just another good theory to take into account.
I will not post details of Klejn’s model of North-South Proto-Indo-European expansion – which is explained in the article, and relies on the north-south cline of ‘steppe admixture’ in the modern European population – since it is based on marginal anthropological methods and theories, including glottochronological dates, and archaeological theories from the Russian school (mainly Zalyzniak), which are obviously not mainstream in the field of Indo-European Studies, and (paradoxically) on the modern distribution of ‘steppe admixture’…
The most interesting aspects of the article are the reactions to the criticism, some of which can be used from the point of view of the Indo-European demic diffusion model, too. It is sad, however, that they didn’t choose to respond earlier to Heyd’s criticism (or to Heyd’s model, which is essentially also that of Mallory and Anthony), instead of just waiting for proponents of the least interesting models to react…
The answer by Haak et al.:
Klejn mischaracterizes our paper as claiming that practitioners of the Corded Ware culture spoke a language ancestral to all European Indo-European languages, including Greek and Celtic. This is incorrect: we never claim that the ancestor of Greek is the language spoken by people of the Corded Ware culture. In fact, we explicitly state that the expansion of steppe ancestry might account for only a subset of Indo-European languages in Europe. Klejn asserts that ‘a source in the north’ is a better candidate for the new ancestry manifested in the Corded Ware than the Yamnaya. While it is indeed the case that the present-day people with the greatest affinity to the Corded Ware are distributed in north-eastern Europe, a major part of the new ancestry of the Corded Ware derives from a population most closely related to Armenians (Haak et al., 2015) and hunter-gatherers from the Caucasus (Jones et al., 2015). This ancestry has not been detected in any European hunter-gatherers analysed to date (Lazaridis et al., 2014; Skoglund et al., 2014; Haak et al., 2015; Fu et al., 2016), but made up some fifty per cent of the ancestry of the Yamnaya. The fact that the Corded Ware traced some of its ancestry to the southern Caucasus makes a source in the north less parsimonious.
In our study, we did not speculate about the date of Proto-Indo-European and the locations of its speakers, as these questions are unresolved by our data, although we do think the genetic data impose constraints on what occurred. We are enthusiastic about the potential of genetics to contribute to a resolution of this longstanding issue, but this is likely to require DNA from multiple, as yet unsampled, ancient populations.
Klejn’s response to that:
Allegedly, I had accused the authors of tracing all Indo-European languages back to Yamnaya, whereas they did not trace all of them but only a portion! Well, I shall not reproach the authors for their ambiguous language: it remains the case that (beginning with the title of the first article) their qualifications are lost and their readers have understood them as presenting the solution to the whole question of the origins of Indo-European languages.
(…) they had in view not the Proto-Indo-European before the separation of the Hittites, but the language that was left after the separation. Yet, this was still the language ancestral to all the remaining Indo-European languages, and the followers of Sturtevant and Kloekhorst call only this language Proto-Indo-European (while they call the initial one Indo-Hittite). The majority of linguists (specialists in Indo-European languages) is now inclined to this view. True, the breakup of this younger language is several hundred years more recent (nearly a thousand years later according to some glottochronologies) than the separation of Anatolian languages, but it is still around a thousand years earlier than the birth of cultures derived from Yamnaya.
More than that, I analysed in my criticism both possibilities — the case for all Indo-European languages spreading from Yamnaya and the case for only some of them spreading from Yamnaya. In the latter case, it is argued that only the languages of the steppes, the Aryan (Indo-Iranian) are descended from Yamnaya, not the languages of northern Europe. Together with many scholars, I am in agreement with the last possibility. But, then, what sense can the proposed migration of the Yamnaya culture to the Baltic region have? It would bring the Indo-Iranian proto-language to that region! Yet, there are no traces of this language on the coasts of the Baltic!
My main concern is that, to my mind, one should not directly apply conclusions from genetics to events in the development of language because there is no direct and inevitable dependence between events in the life of languages, culture, and physical structure (both anthropological and genetic). They can coincide, but often they all follow divergent paths. In each case the supposed coincidence should be proved separately.
The authors’ third objection concerns the increase of the genetic similarity of European population with that of the Yamnaya culture. This increases in the north of Europe and is weak in the south, in the places adjacent to the Yamnaya area, i.e. in Hungary. This gradient is clearly expressed in the modern population, but was present already in the Bronze Age, and hence cannot be explained by shifts that occurred in the Early Iron Age and in medieval times. However, the supposed migration of the Yamnaya culture to the west and north should imply a gradient in just the opposite direction!
Regarding the arguments of Kristiansen and colleagues:
[They argue that] in two early burials of the Corded Ware culture (one in Germany, the other in Poland) some single attributes of Yamnaya origin have been found.
(…) if this is the full extent of Yamnaya infiltration into central Europe—two burials (one for each country) from several thousands (and from several hundreds of early burials)—then it hardly amounts to large-scale migration.
Quite recently we have witnessed the success of a group of geneticists from Stanford University and elsewhere (Poznik et al., 2016). They succeeded in revealing varieties of Y-chromosome connected with demographic expansions in the Bronze Age. Such expansion can give rise to migration. Among the variants connected with this expansion is R1b, and this haplogroup is typical for the Yamnaya culture. But what bad luck! This haplogroup connected with expansion is indicated by the clade L11, while the Yamnaya burials are associated with a different clade, Z2103, that is not marked by expansion. It is now time to think about how else the remarkable results reached by both teams of experienced and bright geneticists may be interpreted.
Regarding the work of Heyd,
(…) with regard to the barrow burials of the third millennium BC in the basin of the Danube, although they have been assigned to the Yamnaya culture, I would consider them as also belonging to another, separate culture, perhaps a mixed culture: its burial custom is typical of the Yamnaya, but its pottery is absolutely not Yamnaya, but local Balkan with imports of distinctive corded beakers (Schnurbecher). I would not be surprised if Y-chromosome haplogroups of this population were somewhat similar to those of the Yamnaya, while mitochondrial groups were indigenous. As yet, geneticists deal with great blocks of populations and prefer to match them to very large and generalized cultural blocks, while archaeology now analyses more concrete and smaller cultures, each of which had its own fate.
Iosif Lazaridis shares more thoughts on the discussion in his Twitter account:
As we mentioned in Haak, Lazaridis et al. (2015), the Yamnaya are the best proximate source for the new ancestry that first appears with the Corded Ware in central Europe, as it has the right mix of both ANE (related to Native Americans, MA1, and EHG), but also Armenian/Caucasus/Iran-like southern component of ancestry. The Yamnaya is a westward expansive culture that bears exactly the two new ancestral components (EHG + Caucasus/Iran/Armenian-like).
As for the Y-chromosome, it was already noted in Haak, Lazaridis et al. (2015) that the Yamnaya from Samara had Y-chromosomes which belonged to R-M269 but did not belong to the clade common in Western Europe (p. 46 of supplement). Also, not a single R1a in Yamnaya unlike Corded Ware (R1a-dominated). But Yamnaya samples = elite burials from eastern part of the Yamnaya range. Both R1a/R1b found in Eneolithic Samara and EHG, so in conclusion Yamnaya expansion still the best proximate source for the post-3,000 BCE population change in central Europe. And since 2015 steppe expansion detected elsewhere (Cassidy et al. 16, Martiniano et al. 17, Mittnik et al. 17, Mathieson et al. 17, Lazaridis et al. 2016 (South Asia) and …?…
I love the smell of new wording in the morning… viz. Yamnaya best proximate source for Corded Ware, Corded Ware might account for only a subset of Indo-European languages, Corded Ware representing Aryan languages (probably Klejn misinterprets what the authors mean, i.e. some kind of Indo-Slavonic or Germano-Balto-Slavic group)…
We shall expect more and more ambiguous rewording and more adjustments of previous conclusions as new papers and new criticisms appear.
Featured image from the article: Distribution of the ‘Yamnaya’ genetic component in the populations of Europe (data taken from Haak et al., 2015). The intensity of the colour corresponds to the contribution of this component in various modern populations