Sea Peoples behind Philistines were Aegeans, including R1b-M269 lineages

New open access paper, Ancient DNA sheds light on the genetic origins of early Iron Age Philistines, by Feldman et al., Science Advances (2019) 5(7):eaax0061.

Interesting excerpts (modified for clarity, emphasis mine):

Here, we report genome-wide data from human remains excavated at the ancient seaport of Ashkelon, forming a genetic time series encompassing the Bronze to Iron Age transition. We find that all three Ashkelon populations derive most of their ancestry from the local Levantine gene pool. The early Iron Age population was distinct in its high genetic affinity to European-derived populations and in the high variation of that affinity, suggesting that a gene flow from a European-related gene pool entered Ashkelon either at the end of the Bronze Age or at the beginning of the Iron Age. Of the available contemporaneous populations, we model the southern European gene pool as the best proxy for this incoming gene flow. Last, we observe that the excess European affinity of the early Iron Age individuals does not persist in the later Iron Age population, suggesting that it had a limited genetic impact on the long-term population structure of the people in Ashkelon.

Ancient genomes (marked with color-filled symbols) projected onto the principal components inferred from present-day west Eurasians (gray circles). The newly reported Ashkelon populations are annotated in the upper corner.

Genetic discontinuity between the Bronze Age and the early Iron Age people of Ashkelon

In comparison to ASH_LBA, the four ASH_IA1 individuals from the following Iron Age I period are, on average, shifted along PC1 toward the European cline and are more spread out along PC1, overlapping with ASH_LBA on one extreme and with the Greek Late Bronze Age “S_Greece_LBA” on the other. Similarly, genetic clustering assigns ASH_IA1 with an average of 14% contribution from a cluster maximized in the Mesolithic European hunter-gatherers labeled “WHG” (shown in blue in Fig. 2B) (15, 22, 26). This component is inferred only in small proportions in earlier Bronze Age Levantine populations (2 to 9%).

In agreement with the PCA and ADMIXTURE results, only European hunter-gatherers (including WHG) and populations sharing a history of genetic admixture with European hunter-gatherers (e.g., as European Neolithic and post-Neolithic populations) produced significantly positive f4-statistics (Z ≥ 3), suggesting that, compared to ASH_LBA, ASH_IA1 has additional European-related ancestry.
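For readers unfamiliar with the statistic quoted above, here is a minimal sketch of how an f4-statistic and its Z-score can be computed from allele frequencies, assuming a plain delete-one-block jackknife over equal-sized contiguous SNP blocks. The function names and the toy frequencies are hypothetical illustrations, not the paper's data or pipeline.

```python
import math

def f4(a, b, c, d):
    """f4(A, B; C, D): mean over SNPs of (a - b) * (c - d), on allele frequencies."""
    return sum((pa - pb) * (pc - pd) for pa, pb, pc, pd in zip(a, b, c, d)) / len(a)

def f4_block_jackknife(a, b, c, d, n_blocks):
    """Return (estimate, standard error, Z) from a delete-one-block jackknife."""
    if n_blocks < 2:
        raise ValueError("need at least two blocks")
    size = len(a) // n_blocks      # equal-sized contiguous blocks
    used = size * n_blocks         # trailing SNPs beyond this are ignored
    pops = [list(p[:used]) for p in (a, b, c, d)]
    estimate = f4(*pops)
    leave_one_out = []
    for k in range(n_blocks):
        lo, hi = k * size, (k + 1) * size
        kept = [p[:lo] + p[hi:] for p in pops]   # drop block k
        leave_one_out.append(f4(*kept))
    mean_loo = sum(leave_one_out) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum((t - mean_loo) ** 2 for t in leave_one_out)
    se = math.sqrt(variance)
    z = estimate / se if se > 0 else float("inf")
    return estimate, se, z

# Toy, made-up allele frequencies at 8 SNPs (NOT taken from the paper).
TEST_POP = [0.6, 0.7, 0.6, 0.8, 0.6, 0.7, 0.5, 0.7]   # stands in for a test group
REF_POP  = [0.5] * 8                                   # stands in for a baseline group
X_POP    = [0.4, 0.5, 0.4, 0.5, 0.4, 0.5, 0.4, 0.5]   # candidate related population
OUTGROUP = [0.3] * 8

if __name__ == "__main__":
    est, se, z = f4_block_jackknife(TEST_POP, REF_POP, X_POP, OUTGROUP, n_blocks=4)
    print(f"f4 = {est:.5f}, SE = {se:.5f}, Z = {z:.2f}")
```

A positive value with Z ≥ 3 is read as the test group sharing more alleles with X than the baseline group does. Real analyses use hundreds of thousands of SNPs and blocks defined by genomic distance rather than by position in a list.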

We find that the PC1 coordinates positively correlate with the proportion of WHG ancestry modeled in the Ashkelon individuals, suggesting that WHG reasonably tag a European-related ancestral component within the ASH_IA1 individuals.

We plot the ancestral proportions of the Ashkelon individuals inferred by qpAdm using Iran_ChL, Levant_ChL, and WHG as sources ±1 SEs. P values are annotated under each model. In cases when the three-way model failed (χ2P < 0.05), we plot the fitting two-way model. The WHG ancestry is necessary only in ASH_IA1.

The best supported one (χ2P = 0.675) infers that ASH_IA1 derives around 43% of ancestry from the Greek Bronze Age “Crete_Odigitria_BA” (43.1 ± 19.2%) and the rest from the ASH_LBA population.

(…) only the models including “Sardinian,” “Crete_Odigitria_BA,” or “Iberia_BA” as the candidate population provided a good fit (χ2P = 0.715, 49.3 ± 8.5%; χ2P = 0.972, 38.0 ± 22.0%; and χ2P = 0.964, 25.8 ± 9.3%, respectively). We note that, because of geographical and temporal sampling gaps, populations that potentially contributed the “European-related” admixture in ASH_IA1 could be missing from the dataset.
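To get a feeling for how wide these estimates are, the sketch below turns the quoted "estimate ± 1 SE" values into rough 95% intervals. It assumes a simple normal approximation (estimate ± 1.96 × SE), which is my own simplification and not something the paper reports; the dictionary labels are just convenience names.

```python
# Quoted qpAdm estimates: (percentage of ancestry, standard error).
ESTIMATES = {
    "Crete_Odigitria_BA (first model quoted)": (43.1, 19.2),
    "Sardinian": (49.3, 8.5),
    "Crete_Odigitria_BA (second model quoted)": (38.0, 22.0),
    "Iberia_BA": (25.8, 9.3),
}

def normal_interval(estimate, se, z=1.96):
    """Rough 95% interval under a normal approximation (an assumption)."""
    return estimate - z * se, estimate + z * se

if __name__ == "__main__":
    for label, (estimate, se) in ESTIMATES.items():
        low, high = normal_interval(estimate, se)
        print(f"{label}: {estimate:.1f}% (approx. {low:.1f}% to {high:.1f}%)")
```

Under that approximation, the first Cretan model spans roughly 5% to 81%, and the second one includes zero, which is in line with the authors' own caution about the limited resolution of the available samples.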

The transient impact of the “European-related” gene flow on the Ashkelon gene pool

The ASH_IA2 individuals are intermediate along PC1 between the ASH_LBA ones and the earlier Bronze Age Levantines (Jordan_EBA/Lebanon_MBA) in the west Eurasian PCA (Fig. 2A). Notably, despite being chronologically closer to ASH_IA1, the ASH_IA2 individuals position closer, on average, to the earlier Bronze Age individuals.

See more information on Y-DNA SNP calls, including ASH067 as R1b-M269 (xL151).

The transient excess of European-related genetic affinity in ASH_IA1 can be explained by two scenarios. The early Iron Age European-related genetic component could have been diluted by either the local Ashkelon population to the undetectable level at the time of the later Iron Age individuals or by a gene flow from a population outside of Ashkelon introduced during the final stages of the early Iron Age or the beginning of the later Iron Age.

By modeling ASH_IA2 as a mixture of ASH_IA1 and earlier Bronze Age Levantines/Late Period Egyptian, we infer a range of 7 to 38% of contribution from ASH_IA1, although no contribution cannot be rejected because of the limited resolution to differentiate between Bronze Age and early Iron Age ancestries in this model.

Hg. R1b-M269 and the Aegean

I already predicted this relationship between Philistines and Aegeans (Greeks in particular) months ago, based on linguistics, archaeology, and phylogeography, although it was (and still is) unclear whether these paternal lineages might instead have come from other nearby populations descended from Common Anatolians, given the known intense contacts between Helladic and West Anatolian groups.

The alternative view: The Sea Peoples can be traced back to the Aegean, so they could also have consisted of Luwian petty kingdoms, who had formed an alliance and attacked Hatti from the south.

The deduction process for the Greek connection was quite simple:

Palaeo-Balkan populations

We know that R1b-Z2103 expanded with Yamna, including West Yamna settlers: they appear in Vučedol, which means they formed part of the earliest expansion waves of Yamna settlers into the Carpathian Basin, and they also appear scattered among Bell Beakers (apart from dominating East Yamna and Afanasevo), which suggests that they were possibly one of the most successful lineages during the late Repin/early Yamna expansion.

The “Steppe ancestry” associated with I2a-L699 samples among Balkan BA peoples may have also been associated with recent Bronze Age expansions, and this haplogroup’s presence among modern Balkan peoples may also suggest that it expanded with Palaeo-Balkan languages. Nevertheless, we don’t know which specific lineages and “Steppe ancestry” they represent, sadly.

These samples may well be related to remnants of previous Balkan populations like Cernavodă or Ezero, because there has been no peer-reviewed attempt at distinguishing Khvalynsk-/Novodanilovka- from Sredni Stog- from Yamnaya-related populations (see here), and some groups that are associated with this ancestry, like Corded Ware, are known to be culturally distinct from Yamna.

In any case, Proto-Greeks from the southern Balkans (say, Sitagroi IV and related groups) are probably going to show, based on Palaeo-Balkan substrate and Pre-Greek substrate and on the available Mycenaean samples, a process of decreasing proportion of R1b-Z2103 lineages relative to local ones, and a relatively similar cline of Yamna:EEF ancestry from northern to southern areas, at least in the periods closest to the Yamna expansion.

NOTE. The finding of “archaic” R1b-L389 (R1b-V1636) and R1a-M198 subclades among modern Greeks and the likely Neolithic origin of these paternal lineages around the Caucasus suggest that their presence in Greece may be from any of the more recent migrations that have happened between Anatolia and the Balkans, especially during the Common Era, rather than Indo-Anatolian migrations; probably very very recently.

Bronze Age cultures in the Balkans and the Aegean. See full map including ancient samples with Y-DNA, mtDNA, and ADMIXTURE.

Minoans and haplogroup J

In the Aegean, it is already evident that the population changed language partly through cultural diffusion, probably through elite domination of Proto-Greek speakers. Whether that happened before the invasion into the Greek Peninsula or after it is unclear, as we discussed recently, because we only have one reported Y-chromosome haplogroup among Mycenaeans, and it is J (probably continuing earlier lineages).

Now we have more samples from the so-called Emporion 2 cluster in Olalde et al. (2019), which shows Mycenaean-like eastern Mediterranean ancestry and 3 (out of 3) samples of haplogroup J, which – given the origin of the colony in Phocaea – may be interpreted as the prevalence of West Anatolian-like ancestry and lineages in the eastern part of the Aegean (and possibly thus south Peloponnese), in line with the modern situation.

NOTE. It does not seem likely that those R or R1b-L23 samples from the Emporion 1 cluster are R1b-Z2103, based on their West European-like ancestry, although they still may be, because – as we know – ancestry (unlike haplogroup) changes too easily to interpret it as an ancestral ethnolinguistic marker.

PCA of ancient samples related to the Aegean, with Minoans, Mycenaeans (including the Emporion 2 cluster in the background) Anatolia N-Ch.-BA and Levantine BA-LBA populations, including Tel Shadud samples. See more PCAs of ancient Eurasian populations.

Greeks and haplogroup R1b-M269

Therefore, while the presence of R1b-Z2103 among ancient Balkan peoples connected to the Yamna expansion is clear, one might ask if R1b-Z2103 really spread up to the Peloponnese by the time of the Mycenaean Civilization. That has only one indirect answer, and it’s most likely yes.

We already had some R1b-Z2103 among Thracians and around the Armenoid homeland, which offers another clue at the migration of these lineages from the Balkans. The distribution of different “archaic” R1b-Z2103 subclades among modern Balkan populations and around the Aegean offered more support to this conclusion.

But now we have two interesting ancient populations that bear witness to the likely intrusion of R1b-M269 with Proto-Greeks:

An Ancient Greek of hg. R1b

A single ancient sample supports the increase in R1b-Z2103 among Greeks during the “Dorian” invasions that triggered the Dark Ages and the phenomenon of the Aegean Sea Peoples. It comes from a Greek lab study, showing R1b1b (i.e. R1b-P297 in the old nomenclature) as the only Y-chromosome haplogroup obtained from the sampling of the Gulf of Amvrakia ca. 470-30 BC, i.e. before the Roman foundation of Nikopolis, hence from people likely from Anaktorion in Ancient Acarnania, of Corinthian origin.


Even with the few data available – and with the caution necessary for this kind of study from non-established labs, which may be subject to many different kinds of errors – one could argue that the western Greek areas, which received different waves of migrants from the north and show a higher distribution of R1b-Z2103 in modern times, were probably more heavily admixed with R1b-Z2103 than southern and eastern areas, which were always dominated by Greek-speaking populations more heavily admixed with locals.

The Dorian invasion and the Greek Dark Ages may thus account for a renewed influx of R1b-Z2103 lineages accompanying the dialects that would eventually help form the Hellenic Koiné. In a sense, it is only natural that demographically stronger populations around the Bronze Age Aegean would suffer a limited (male) population replacement with the succeeding invasions, starting with a higher genetic impact in the north-west and diminishing as they progressed to the south and the east, coupled with stepped admixture events with local populations.

This would therefore be the late equivalent of what happened at the end of the 3rd millennium BC, with Mycenaeans and their genetic continuity with Minoans.

Distribution of Pre-Greek place-names ending in -ssos/-ssa or -sos/-sa. See original images and more on the south/east cline distribution of Pre-Greek place-names here.

Sea peoples of hg. R1b-M269

Thanks to Wang et al. (2018) supplementary materials we knew that one of the two Levantine LBA II samples from Tel Shadud (final 13th–early 11th c. BC) published in van den Brink (2017) was of hg. R1b-M269 – in fact, the one interpreted as a Canaanite official residing at this site and emulating selected funerary aspects of Egyptian mortuary culture.

Both analyzed samples, this elite individual and a commoner of hg. J buried nearby, were genetically similar and indistinguishable from local populations, though:

Principal Components Analysis of L112 and L126 was carried out within the framework described in Lazaridis et al. (2016). This analysis showed that the two individuals cluster genetically, with similar estimated proportions of ancestry from diverse West Eurasian ancestral sources. These results are consistent with the hypothesis that they derive from the same population, or alternatively that they derive from two quite closely related populations.

We know that ancestry changes easily within a few generations, so there was not much information to go on, except for the fact that – being R1b-M269 – this individual could trace his paternal ancestor at some point to Proto-Indo-Europeans.

One might think that, because many haplogroups in this spreadsheet were wrong, this is also wrong; nevertheless, many haplogroups are correctly identified by Yleaf, and finding R1b-M269 in the Levant after the expansion of Sea Peoples could not be that surprising, because they were most likely related to populations of the Aegean Sea. Any other related hg. R1b (R1b-M73, R1b-V88, even R1b-V1636) wouldn’t fit as well as R1b-M269.


However, the early expansion of Proto-Indo-Aryans into the Middle East, as well as the later expansion of Armenians from the Balkans through Anatolia and of West Iranians from the east may have all potentially been related to this sample. But still, the previous linguistic and archaeological theories concerning the Philistines and the expansion of Sea Peoples in the Levant made this sample a likely (originally) Greek “Dorian” lineage, rather than the other (increasingly speculative) alternatives.

In any case, it was obvious to anyone – that is, to anyone with a minimum knowledge of how population genomics works – that just the two samples from van den Brink (2017) couldn’t be used to get to any conclusions about the ancestral origin of these individuals (or their differences) beyond Levantine peoples, because their ancestry was essentially (i.e. statistically) the same as the other few available ancient samples from nearby regions and similar periods.

If anything, the PCA suggested an origin of the R1b sample closer to Aegean populations relative to the J individual (see PCA above), and this should also have been supported by amateur models, without any possible confirmation (as with the ASH_IA2 cluster in this paper). However, if you have followed online discussions of the Tel Shadud R1b-M269 sample since it was first mentioned on Eupedia months ago (including another wave of misguided speculation based on the ancestry of both individuals, triggered by a discussion on this blog), you have yet more proof of how misleading ancestry analyses can be in the wrong hands.

NOTE. This is the Nth proof (and that only in 2019) of how it’s best to just avoid amateur analyses and interpretations altogether, as I did in the recent publication of the books. All those who didn’t take into account whatever was commented about the ancestry of these samples haven’t lost a single bit of relevant information on Levantine peoples, and have had more time for useful reads, compared to those dedicated to endless void speculation, once again gone awfully wrong, as does everything related to cocky ancient DNA crackpottery 😉

Late Bronze Age population movements in the Eastern Mediterranean and the Middle East. See full map including ancient DNA samples with Y-DNA, mtDNA, and ADMIXTURE.

Admittedly, though, even accepting the evident Mediterranean origin of this lineage, one could have argued that this sample may have been of R1b-L151 subclade, if one were inclined to support the theory that Italic peoples were behind Sea Peoples expanding east – and consequently that the ancestors of Etruscans had migrated eastward into the Aegean (e.g. into Lemnos), so that it could be asserted that Tyrsenian might have been a remnant language of an ancient population of northern Italy.


Fortunately, some of the samples recovered in Feldman et al. (2019) that could be analyzed (those of the cluster ASH_IA1) offer a very specific time frame where European ancestry appeared (ca. 1250 BC) before it subsequently became fully diluted (as seen in cluster ASH_IA2) among the prevalent Levantine ancestry of the area.

Also fortunately, this precise cluster shows another R1b-M269 sample, likely R1b-Z2103 (because it is probably xL151), and this sample together with others from the same cluster prove that the ancestry related to the original southern European incomers was:

  1. Recent, related thus to LBA population movements, as expected; and
  2. More closely related to coeval Aegeans, including Mycenaeans with Steppe-related ancestry.
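The reasoning behind "likely R1b-Z2103 (because it is probably xL151)" can be sketched with a deliberately simplified haplogroup tree. The tree below keeps only the nodes mentioned in this post plus R1b-L51 (the branch leading to L151) and omits intermediate and minor branches; the function names are hypothetical.

```python
# Simplified child -> parent map (intermediate markers and minor branches omitted).
PARENT = {
    "R1b-L23": "R1b-M269",
    "R1b-Z2103": "R1b-L23",
    "R1b-L51": "R1b-L23",
    "R1b-L151": "R1b-L51",
}

def lineage(clade):
    """Return the clade followed by all of its ancestors up to the root."""
    path = [clade]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def compatible_clades(derived, ancestral):
    """Clades at or below `derived` that are not at or below `ancestral`."""
    all_clades = set(PARENT) | set(PARENT.values())
    return sorted(
        clade for clade in all_clades
        if derived in lineage(clade) and ancestral not in lineage(clade)
    )

if __name__ == "__main__":
    # A call of "R1b-M269 (xL151)": derived for M269, ancestral for L151.
    print(compatible_clades("R1b-M269", "R1b-L151"))
```

In this toy tree the call leaves R1b-Z2103 as the only named terminal branch, alongside basal M269, L23 or L51 (xL151) possibilities. That is why the assignment is "likely" rather than certain, and why it rests on phylogeography as much as on the SNP calls themselves.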

NOTE. I say “fortunately” because, as you can imagine if you have dealt with amateurish discussions long enough, without this cluster with evident Aegean ancestry and the R1b-M269 (Z2103) sample precisely associated to it, some would enter again in endless comment loops created by ancestry magicians, showing how Aegean peoples were not behind Sea Peoples, or not behind Philistines, or not behind the R1b-M269 among Philistines, depending on their specific agendas.

Map of the Sea People invasions in the Aegean Sea and Eastern Mediterranean at the end of the Late Bronze Age (blue arrows). Some of the major cities impacted by the raids are denoted with historical dates. Inland invasions are represented by purple arrows. From Kaniewski et al. (2011).

The results of the paper don’t solve the question of the exact origin of all Sea Peoples (not even that of Philistines), but it is quite clear that most of those forming this seafaring confederation must have come from sites around the Aegean Sea. This supports thus the traditional origin attributed to them, including a hint at the likely expansion of Eastern Mediterranean ancestry and lineages into the Italian Peninsula precisely from the Aegean, as some oral communications have already disclosed.

As an indirect conclusion from the findings in this paper, then, we can now more confidently support that Tyrsenian speakers most likely expanded into the Apennines and the Alps originally from a Tyrsenian-speaking LBA population from Lemnos, due to the social unrest in the whole Aegean region, and might have become heavily admixed with local Italic peoples quite quickly, as happened with Philistines, resulting in yet another case of language expansion through (the simplistically called) elite domination.


Even more interesting than these specific findings, this paper confirms yet another hypothesis based on phylogeography, and proves once again two important starting points for ancient DNA interpretation that I have discussed extensively in this blog:

  • The rare R1b-M269 Y-chromosome lineage of Tel Shadud offered ipso facto the most relevant clue about the ancestral geographical origin of this Canaanite elite male’s paternal family, most likely from the north-west based on ancient phylogeography, which indirectly – in combination with linguistics and archaeology – supported the ancestral ethnolinguistic identification of Philistines with the Aegean and thus with (a population closest to) Ancient Greeks.
  • Ancestry analyses are often fully unreliable when assessing population movements, especially when few samples from incomplete temporal-geographical transects are assessed in isolation, because – unlike paternal (and maternal) haplogroups – ancestry might change fully within a few generations, depending on the particular anthropological setting. Their investigation is thus bound by many limitations – of design, statistical, and anthropological (i.e. archaeological and linguistic) – which are quite often not taken into account.
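How quickly an incoming ancestry component can fade while a paternal lineage stays what it is can be illustrated with a back-of-the-envelope dilution model. This is a sketch under strong assumptions: a constant share of local newcomers joins the group every generation, the starting share of roughly 43% is borrowed from the qpAdm estimate quoted above, and the 25% influx per generation is a purely hypothetical figure.

```python
def diluted_share(initial_share, local_influx, generations):
    """Share of incoming ancestry left after constant per-generation dilution."""
    return initial_share * (1.0 - local_influx) ** generations

def generations_until_below(initial_share, local_influx, threshold):
    """Number of generations until the share first drops below the threshold."""
    generations = 0
    share = initial_share
    while share >= threshold:
        share *= 1.0 - local_influx
        generations += 1
    return generations

if __name__ == "__main__":
    start = 0.43    # about 43%, from the quoted estimate
    influx = 0.25   # hypothetical: a quarter of each generation's ancestry is local
    for g in (0, 2, 4, 6, 8, 10):
        print(g, round(diluted_share(start, influx, g), 3))
    print(generations_until_below(start, influx, threshold=0.05))
```

Under these made-up parameters the incoming share falls below 5% within eight generations, i.e. in about two centuries, whereas the Y-chromosome haplogroup of a surviving male line does not change at all over the same span.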

These cornerstones of ancient DNA interpretation have been already demonstrated to be valid not only for Levantine populations, as in this case, but also for Balkan peoples, for Bell Beakers, for steppe populations (like Khvalynsk, Sredni Stog, Yamna, Corded Ware), for Basques, for Balto-Slavs, for Ugrians and Samoyeds, and for many other prehistoric peoples.

I rest my case.


Balto-Slavic accentual mobility: an innovation in contact with Balto-Finnic


Some very specific prosodic innovations affected the Balto-Slavic linguistic community, probably at a time when it already showed internal dialectal differences. Whether those innovations were related to archaic remnants stemming from the parent Proto-Indo-European language, and whether that disintegrating community included different dialects, remains an object of active debate.

“Archaic” Balto-Slavic?

The main question about Balto-Slavic is whether this concept represents a single community, or it was rather a continuum formed by two (Baltic and Slavic) or possibly three (East Baltic, West Baltic, Slavic) neighbouring communities, speaking closely related Northern European dialects, which just happened to evolve very close to each other, i.e. in cultures that were closer to each other than they were to Germanic or Balto-Finnic.

In my opinion, their similarities warrant the reconstruction of a single original central-east European community since the dissolution of Bell Beakers, speaking a North-West Indo-European dialect, and most internal differences between Baltic and Slavic may be explained as innovations. The precise identification of a Proto-Balto-Slavic community remains elusive, although the Unetice-Iwno-Mierzanowice triangle remains the best bet, with Trzciniec showing what seems like an Early Slavic-like population reaching up to the East Baltic.

Bell Beaker expansion in eastern Europe and around the Baltic.

The reconstruction of a common Balto-Slavic proto-language is known to range from difficult to impossible, depending on who you ask, not least because of the differences discussed in this post, which have been a battlefield of Balticists’ and Slavicists’ own making for decades. The old tenet that Balto-Slavic had inherited some traits directly from PIE is – in contrast with e.g. the Italo-Celtic concept – still surprisingly alive today.

Take, for example, these internal differences and supposedly archaic traits:

  • The ruKi rule, where Baltic shows mostly *is, *us, and Slavic shows *iš, *uš; or the different output of Satemization in Baltic compared to Slavic (and both compared to Indo-Iranian). Nevertheless, the Satemization trends in Balto-Slavic and Indo-Iranian are usually explained together and taken as a sign of a traditional three-velar system for PIE.
    • If you consider Satemization as a late trend in Balto-Slavic, affecting each dialect in a different way, and thus Balto-Slavic phonetic evolution as clearly distinct from the Indo-Iranian trend, rejecting tritectalism, this problem is solved. This would also solve the impossible Indo-Slavonic problem, and the paradox of Balto-Slavic sharing a genetic phylum with Germanic and Italo-Celtic.
    • If you, however, conflate these differences and North-West Indo-European features with an ad hoc explanation of a hypothetical Centum dialect called Temematic, which is intended to solve their (in Holzer’s words) unlösbaren (“unsolvable”) inconsistencies, you essentially add a whole new inconsistency without solving the previous ones. For a full rebuttal of Holzer’s Temematic etymologies, see Matasović (2014).
  • Kortlandt’s reconstruction of a PIE 3rd singular *-e (Baltic from *-et, Slavic from *-eti) and 3rd plural *-o, which would have been replaced independently in other Indo-European dialects (by *-eti, *-onti), is reminiscent of his own reconstruction of laryngeals almost up to the attestation of all Indo-European dialects, including Baltic. If you consider these traits an innovation, this artificially created problem is immediately solved.
  • Genitive plural Pre-Baltic *-ōm vs. Pre-Slavic *-ŏm is another commonly cited example. However, I would place this difference among other similar differences found within other related IE dialects, hence a common phonetic innovation (see e.g. below for the classicist view of unstable obliques).
  • Kortlandt’s reconstruction of oblique cases in *-m-, shared with Germanic, as stemming from a common Middle PIE *-mus (based essentially on Old Lithuanian *-mus and on a non-existent equivalent Anatolian formation), hence different from those in *-bʰ-. While you can argue for any number of more reasonable alternatives, the most often cited one takes the ins.-dat. pl. *-bʰ- as a common NWIE innovation based on ins. sg. *bʰi-, and forms in *-m- (including ins. sg.) as a Northern European phonetic innovation. The simplest, most elegant explanation I’ve read to date (I think by Rémy Viredaz) is the similar bilabial change of Giacobo/Giacomo in Italian…

As you can see, some Balto-Slavicists could have written whole books about how their object of study holds the key to solve problems on common Proto-Indo-European paradigms, some of which wouldn’t need solving if they hadn’t been started by Balto-Slavicists themselves…

While all of these “archaic” traits are easily dismissed without further ado (except for some understandable damaged pride among academics), there is one especially pervasive idea among those willing to find the white whale of laryngeal remnants in Indo-European languages (see here for other examples of dubious laryngeal remains).

The prophecy before the battle, Józef Ryszkiewicz, 1890. Or, how to conjure laryngeal remnants in Balto-Slavic.

Accentual development in contact

Whichever position one prefers, the general argument is that the Balto-Slavic accentual system is non-trivial for the classification of both dialects into a common branch. However, that would only be completely true if it were a common innovation, but not so much if it were a natural laryngeal evolution.

In fact, the broken tone preserving a PIE laryngeal, as proposed by Kortlandt – continuing Meillet’s idea of synchronous PIE-PBS developments – was always very difficult to accept. Even the rising pronunciation is not original, and represents a shift of the accent on the initial syllable in Latvian…

In my opinion, the derivation of a modern phenomenon from a PIE laryngeal must always raise a red flag (see below on archaisms vs. innovations in IE languages). As you can see from my take of the fable in Balto-Slavic, which uses Kortlandt’s reconstruction, I preferred not to take into account the reconstructed accents. The fable remains thus a model of what could have been a common Proto-Balto-Slavic, unlike other reconstructions, which are much less tentative.

NOTE. You could argue that accents may be reconstructed in spite of the wrong theory behind them, but this is not true; at least not of all reconstructed accents, some of which require further assumptions. Think about it this way: I wouldn’t take into account a reconstruction of Germanic accent which used Danish glottalized tone for a hypothetical Proto-Germanic laryngeal, even if most accents seemed correct at first sight. The truth is, I didn’t want to dedicate time to go through each reconstructed word and its explanation, so it was easier to delete them all, even though that’s not an actual solution, either. You will find the same doubts in the description of Balto-Slavic evolution in my old Modern Indo-European grammar. The introduction to IE dialects was partially copied from Wikipedia (which, in the case of Balto-Slavic, essentially summarized data from Kortlandt), but in the grammar I just tried to keep the basics, and not very successfully, because you need a comprehensive and coherent description of a language’s evolution. That’s how messed up the question was, and how it still is, even though 15 years of research have passed…

Despite the idea of an “archaic Balto-Slavic”, especially prevalent among older researchers, the current trend is to consider Balto-Slavic prosodic changes as a natural innovation, even among those who would artificially reconstruct laryngeal remnants up to late Balto-Slavic stages.

NOTE. You can read more about the Proto-Indo-European laryngeal loss and vocalism. While the presence of certain laryngeals up to Late PIE is certain, the loss in many environments is also generally agreed upon. This is especially true of a hypothetical Indo-Slavonic branch, like that supported by Kortlandt: even those supporting multiple laryngeal loss events must admit that Indo-Iranian showed no laryngeals before its disintegration, whether they put this loss as an internal Proto-Indo-Iranian evolution, or they place it earlier. Tocharian attests to an evolution similar to the rest of Late PIE dialects (hence to a quite early laryngeal loss trend), and Balkan dialects (supposedly splitting before Indo-Slavonic) also lost laryngeals in a similar way, except for initial ones, which show vocalic output instead of full loss.

So, where does a laryngeal loss fit in this “Indo-Slavonic” scheme, exactly? Before the Tocharian split? Before the Balkan split? After the Balkan split but before the full loss in Indo-Iranian? And where exactly does this group belong regarding Corded Ware, and where does Germanic? No idea (but you can read Kortlandt try fitting his model with Gimbutas’ “Kurgan peoples”). Because it is one thing to reconstruct Proto-Greek, or Proto-Celtic, or Proto-Italic forms without laryngeals and to put them in relation with a purely theoretical three-laryngeal PIE, and quite another to reconstruct laryngeals (including in environments where they were already lost in Tocharian) up to Proto-Baltic and Proto-Slavic, which seems more than just a bit of a stretch…

Indo-European dialectal relationships, from Mallory and Adams (2006).

Thomas Olander offered a summary of the current positions regarding the Balto-Slavic accentual system recently in Indo-European heritage in the Balto-Slavic accentuation system (2013), which also contains a summary of his Mobility Law, to explain this phenomenon as a common Pre-Baltic and Pre-Slavic innovation.

Andersen, an advocate of different Baltic and Slavic dialects developing in contact with Satem dialects, suggested in The Satem Languages of the Indo-European Northwest. First Contacts? (2009), partially based on Olander’s initial proposal, that Baltic and Slavic accentual mobility arose as a result of contact with languages with fixed word-initial ictus: the accent was lost in the word-final mora in pre-Proto-Baltic and, independently, in pre-Proto-Slavic. Hence, the central innovation, the accent loss

technically is not a shared Slavic and Baltic innovation. On the contrary. It shows that the speakers of the Pre-Slavic and Pre-Baltic dialects formed bilingual communities with speakers of contact dialects that were of the same prosodic type, viz. had fixed initial ictus but no free accent.
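The mechanism described above (an accent on the word-final mora is lost, and the resulting unaccented form falls in with the fixed initial ictus of the contact languages) can be sketched as a toy rule. This is a heavy simplification for illustration only: word forms are reduced to a mora count plus the index of the accented mora, and the default initial ictus for unaccented forms is treated here as an assumption of the sketch.

```python
from typing import Optional

def lose_final_accent(n_morae: int, accent: Optional[int]) -> Optional[int]:
    """Accent loss: an accent on the word-final mora is dropped (None = unaccented)."""
    if accent is not None and accent == n_morae - 1:
        return None
    return accent

def surface_ictus(accent: Optional[int]) -> int:
    """Assumed default: unaccented forms receive an ictus on the first mora."""
    return 0 if accent is None else accent

# Hypothetical paradigm: two forms of the same three-mora word (0-based mora index).
PARADIGM = {
    "form with final accent": (3, 2),
    "form with medial accent": (3, 1),
}

if __name__ == "__main__":
    for label, (n_morae, accent) in PARADIGM.items():
        after = lose_final_accent(n_morae, accent)
        print(label, "-> ictus on mora", surface_ictus(after))
```

The form that used to carry a final accent ends up with an initial ictus, while the other form keeps its accent where it was, so the paradigm becomes mobile. Because the rule needs nothing beyond contact with initial-stress languages, it could have applied independently in Pre-Baltic and Pre-Slavic.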

In the meantime, Olander (2019) has found out about more real-world examples of this same phenomenon:

Prosodic features are known to be susceptible to contact influence (Salmons 1992:1 and passim). While it does not directly influence the evaluation of the Mobility Law as a non-trivial innovation, it is interesting that most of the alleged parallels are indeed considered to be contact-induced changes due to influence from languages with an ictus on the word-initial syllable (Andersen 2009: 11-14; Rinkevičius 2013): Balto-Fennic in the case of the Karelian and (perhaps through Latvian as an intermediary) Žemaitian dialects, and Hungarian in the case of the Slavonian dialects (for Karelian see Jakobson 1938/2002: 239; Veenker 1967: 74; Thomason & Kaufman 1988: 122, 241; Salmons 1992: 41- 42; for Žemaitian see Zinkevičius 1966: 45- 46; for Slavonian see Ivić 1958: 287).

I am not aware of any hypotheses on a contact-induced origin for Greek prosodic innovations, but it is at least worth noting that there is agreement on significant substrate influence on Greek. While we may speculate that these substrate language(s) had word-initial ictus like Balto-Fennic and Hungarian, we do not have any actual information about the prosodic system(s) (thus even Beekes 2014: 9, who in other respects provides a fairly detailed picture of the substrate).

The parallels from other speech varieties show that an accent loss of the type suggested for a pre-stage of Baltic and Slavic is a type of prosodic change that has occurred several times in different various systems. In the context of the present paper this means that the sound law itself cannot be classified as a non-trivial innovation; it may have taken place in already differentiated dialects or languages. Also, the parallels suggest that a loss of the accent may be the result of influence from languages with fixed word-initial ictus.

In this time when even linguists agree that substrate/contact languages have to be related to specific ethnolinguistic groups (see here for Germanic), the fact that Olander stops short of naming this substrate behind Pre-Baltic and Pre-Slavic as being Late Uralic in general, or Balto-Finnic in particular, is surprising.

NOTE. Not the least because Olander is part of the Homeland Timeline map project of the Copenhagen group (their website is not working right now), and they placed Volosovo as Uralians expanding with Netted Ware in contact with the Baltic during the Bronze Age…So what’s to doubt about Balto-Slavic – Balto-Finnic contacts, exactly? Maybe if Balto-Finnic was the substrate language behind Balto-Slavic (as it was in Germanic), it would mean that Uralic languages were previously spoken in territories that became later Germanic- and Balto-Slavic-speaking?

Still image from the Copenhagen Timeline Map (accessed one year ago), showing in green Volosovo hunter-gatherers who, according to the map, later expand to the north-east with Netted Ware…

Archaism vs. Innovation

If we tried to describe these trends of explaining peculiar traits in recent Indo-European dialects as archaism vs. innovation from a purely theoretical point of view, we could roughly distinguish two different positions (with infinite variants, of course) among academics – just like we could find people more inclined to leftist or rightist trends when speaking about economy. When it comes to linguistics, which is the least messed-up field where one can describe Indo-European and Indo-Europeans, I think we can find two alternative basic tenets:

  • One idea would hold that the oldest attested dialects – and those with an older guesstimated proto-language – are the gold standard as to what the original situation may have been, and about what could be described as an archaism. For example, Ancient Greek and Mycenaean or Vedic Sanskrit for old dialects; Tocharian, or Italic dialects for those with quite old guesstimates, each for different reasons; and Anatolian for both, old dialect and attested early.
  • NOTE. Nevertheless, the phonology of Anatolian inscriptions is often difficult to ascertain, and its ancient dialectal nature stemming from a Middle PIE stage may still be disputed by some. The archaic nature of Tocharian seems to be maybe less generally accepted than that of Anatolian, but I would say there is general consensus on the matter today.

  • The other general idea would support that the most isolated dialects are those which may hold the key to the oldest Indo-European traits, somehow hidden from external influences and areal contacts, and thus from generalized innovative trends that have affected the best known ancient dialects. In that sense, languages like Slavic, Baltic, Albanian, or Armenian – as well as some Balkan fragmentary dialects – are quite common aims of study to reveal exceptional PIE traits.

I think the education system in Southern Europe and South Asia is that of formal classicists. In eastern Europe, I’d reckon the education system – especially in regions that were never connected to the Graeco-Roman tradition – favours linguistics as a study of one’s own and related proto-languages. For northern Europe, I would say it’s 50/50, especially in Scandinavia, depending on whether classicists or linguists dominate the departments of Indo-European. For example, while Germany or Austria would maybe lean more toward the classics, Copenhagen’s obsession with Germanic as the most archaic IE branch is well known…

A 17th-century birch bark manuscript of Pāṇini’s grammar treatise from Kashmir. Image from Wikipedia.

Both positions, when blindly accepted, are bound to fail at some point or another:

  • If you take Classical Sanskrit, Classical Greek, or Classical Latin as an example of Proto-Indo-European, you are bound to make radical mistakes when reconstructing the parent language, more so if you disregard the oldest attested layers of the languages. An interesting view of the so-called Adradists at the Complutense University of Madrid – apart from their famous 9-laryngeal reconstruction – is that Middle PIE had only 5 cases, with a general (unstable) oblique one in Late PIE that later evolved into the attested 5 to 8 cases in the different dialects. That is, in my opinion, a fairly typical classicist error, which would be easily addressed by taking into account the oldest stages, like those attested in Mycenaean and in Old Latin, instead of focusing on classical grammar. The 8-case system is, in fact, one of the few true Balto-Slavic archaisms, supported by external comparanda.
  • On the other hand, if you take Albanian, Armenian, Baltic or Slavic, or even phonetically dubious data like those from some Anatolian inscriptions, you can eventually argue for anything. And I really mean anything; you are leaving the logic door wide open for any crazy-ass opinion about Proto-Indo-European based on traits found in modern languages: From how many velars evolved (if at all, because you may find all of them in Luwian, or still living in Albanian or in Armenian…) and their nature as ejective consonants in Late PIE (based on Armenian or Germanic); to how many laryngeals and when these laryngeals disappeared (if they actually did disappear, because some may even find them in Modern Lithuanian, in Armenian, or in Danish…); etc. Once you believe your own romantic view of some modern language(s) retaining traits from five thousand years ago, there is no stopping that; not for you, but not for anyone else, either.

NOTE. One of the funniest consequences of this type of ‘worldview’ – where one assumes that (one’s own interpretations of) modern dialects are as reliable as (or even more reliable than) ancient ones, and that Indo-European dialects somehow split at the same time from the parent language (so there was one common “full laryngeal” language, and then all attested dialects evolved from it) – can be seen in some of the theories that you can easily find posted on Facebook’s group on Proto-Indo-European. Let’s just say, for the sake of simplicity, that you can compare English ‘sunrise’ with Spanish ‘sonrisa’ “smile” all you want, and assert that both reveal a common origin in PIE *sup- hence from the Sun and the smile going “up” or something, but no explanation as to how you reached that conclusion makes up for the fact that this comparison shouldn’t have even started at all. Now replace English and Spanish with Armenian, Slavic, and/or Albanian, invent some new IE sound law, throw one or two laryngeals in the mix, and somehow this might get a pass among certain linguists…

The Celebration of Svetovid on Rügen, Alphonse Mucha, The Slav Epic. Image from Wikipedia. Were Early Slavs some among a selected few romantic peoples to keep the “true” Indo-European language and traditions? Of course not.

While no one can deny the value of different Indo-European branches for the reconstruction of the parent language, no matter how recently they were attested, the only reasonable solution whenever a difficult case arises is to trust ancient dialects more than recent ones. Using data from fringe theories based on recent dialects to build a Proto-Indo-European paradigm, especially when there is contradictory data from ancient IE dialects, is flawed for two reasons:

  1. Languages attested later – especially after periods of population movements and contacts – would show, in general, a greater degree of change. Preferring Old Slavic or Classical Armenian to reconstruct Indo-European over ancient dialects like Ancient Greek, Vedic Sanskrit, or ancient Italic dialects is, in a way, like taking Byzantine Greek, Pali, or Old French as models, respectively.
  2. Classical languages are indeed modified due to the action of grammarians, but once standardized these “languages behind a state” (or religion) are less prone to change, due to the transmission of oral (and written) literature, education, commerce, etc. Languages left to unorganized tribes are less constrained in their evolution, and their internal (substrate) and external (contact) influences are greater and (what’s worse) unknown.

Baltic and Slavic, like Albanian or Armenian, are dialects attested very recently, which may have undergone complex internal and external influences we may never fully understand. Confronted with controversial or inexplicable traits compared to ancient branches like Greek, Indo-Iranian, or Italo-Celtic (especially if they fit with other Indo-European dialects), the conservative solution that will be right most of the time (and I mean 99.9999% of cases) is to assume they represent an innovation over Late PIE.

The fact that some researchers still use these recent dialects as a blank canvas instead, in order to propose unending new ideas about how to reconstruct IE proto-languages, or even older common PIE stages, is shocking. Not “R1a/Steppe” vs. “N1c/Siberian” haplogroup+ancestry bullshit-level shocking, but still unacceptable in a serious academic environment.

The only reason why Balto-Slavicists have failed so many times in this “unsolvable” question that seems to be Proto-Balto-Slavic reconstruction, apart from the known differences between Baltic and Slavic, is precisely the fixation of many with their object of study as a model for other IE languages (and thus for PIE), instead of taking the rest as a model for the reconstruction of Balto-Slavic (or of Proto-Baltic and Proto-Slavic).

Repeating ad nauseam the popular concept of Balto-Slavic (or Baltic and Slavic) being among the most archaic IE dialects, or the slowest evolving IE dialects, and cheap nationalist slogans of the sort, does not help this aim, and just reading or hearing that should make anyone cringe instantly. Not less than reading or hearing about Sanskrit being essentially equal to PIE, or spoken in the Indus Valley 10,000 years ago. Because we are not living in the 19th century, mind you.


A Song of Sheep and Horses, revised edition, now available as printed books


As I said 6 months ago, 2019 is a tough year to write a blog, because this was going to be a complex regional election year and therefore a time of political promises, hence tenure offers too. Now the preliminary offers have been made, elections have passed, but the timing has slightly shifted toward 2020. So I may have the time, but not really any benefit of dedicating too much effort to the blog, and a lot of potential benefit of dedicating any time to evaluable scientific work.

On the other hand, I saw some potential benefit for publishing texts with ISBNs, hence the updates to the text and the preparation of these printed copies of the books, just in case. While Spain’s accreditation agency has some hard rules for becoming a tenured professor, especially for medical associates (whose years of professional experience are almost worthless compared to published peer-reviewed papers), it is quite flexible in assessing one’s merits.

However, regional and/or autonomous entities are not, and need an official identifier and preferably printed versions to evaluate publications, such as an ISBN for books. I thus took some time about a month ago to update the texts and supplementary materials, to publish a printed copy of the books with Amazon. The first copies have arrived, and they look good.


Corrections and Additions

I have changed the names and order of the books, as I intended for the first publication – as some of you may have noticed when the linguistic book was referred to as the third volume in some parts. In the first concept I just wanted to emphasize that the linguistic work had priority over the rest. Now the whole series and the linguistic volume don’t share the same name, and I hope this added clarity is for the better, despite the linguistic volume being the third one.

Uralic dialects
I have changed the nomenclature for Uralic dialects, as I said recently. I haven’t really modified anything deeper than that, because – unlike adding new information from population genomics – this would require me to do thorough research of the most recent publications on Uralic comparative grammar, and I just can’t begin with that right now.

Anyway, the use of terms like Finno-Ugric or Finno-Samic is as correct now for the reconstructed forms as it was before the change in nomenclature.


The most interesting recent genetic data has come from Iberia and the Mediterranean. Lacking direct data from the Italian Peninsula (and thus from the emergence of the Etruscan and Rhaetian ethnolinguistic community), it is becoming clearer how some quite early waves of Indo-Europeans and non-Indo-Europeans expanded and shrank – at least in West Iberia, West Mediterranean, and France.

Some of the main updates to the text have been made to the sections on Finno-Ugric populations, because some interesting new genetic data (especially Y-DNA) have been published in the past months. This is especially true for Baltic Finns and for Ugric populations.


Consequently, and somehow unsurprisingly, the Balto-Slavic section has been affected by this; e.g. by the identification of Early Slavs likely with central-eastern populations dominated by (at least some subclades of) hg. I2a-L621 and E1b-V13.

I have updated some cultural borders in the prehistoric maps, and the maps with Y-DNA and mtDNA. I have also added one new version of the Early Bronze Age map, to better reflect the most likely location of Indo-European languages in the Early European Bronze Age.

As those in software programming will understand, major changes in the files that are used for maps and graphics come with an increasing risk of additional errors, so I would not be surprised if some major ones would be found (I already spotted three of them). Feel free to communicate these errors in any way you see fit.

European Early Bronze Age: tentative language map based on linguistics, archaeology, and genetics.

I have selected more conservative SNPs in certain controversial cases.

I have also deleted most SNP-related footnotes and replaced them with the marking of each individual tentative SNP, leaving only those footnotes that give important specific information, because:

  • My way of referencing tentative SNP authors did not make it clear which samples were tentative, if there were more than one.
  • It was probably not necessary to see four names repeated 100 times over.
  • Often I don’t really know if the person I have listed as author of the SNP call is the true author – unless I saw the full SNP data posted directly – or just someone who reposted the results.
  • Sometimes there are more than one author of SNPs for a certain sample, but I might have added just one for all.
More than 6000 ancient DNA samples compiled to date.

For a centralized file to host the names of those responsible for the unofficial/tentative SNPs used in the text – and to correct them if necessary – readers will eventually be able to use Phylogeographer’s tool for ancient Y-DNA, for which they use (partly) the same data I compiled, adding Y-Full’s nomenclature and references. You can see another map tool in ArcGIS.

NOTE. As I say in the text, if the final working map tool does not deliver the names, I will publish another supplementary table to the text, listing all tentative SNPs with their respective author(s).

If you are interested in ancient Y-DNA and you want to help develop comprehensive and precise maps of ancient Y-DNA and mtDNA haplogroups, you can contact Hunter Provyn. You can also find more about phylogeography projects at Iain McDonald’s website.

I have also added more samples to both the “Asian” and the “European” PCAs, and to the ADMIXTURE analyses, too.

I previously used certain samples prepared by amateurs from BAM files (like Botai, Okunevo, or Hittites), and the results were obviously less than satisfactory – hence my criticism of the lack of publication of prepared files by the most famous labs, especially the Copenhagen group.

Fortunately for all of us, most published datasets are free, so we don’t have to reinvent the wheel. I criticized genetic labs for not releasing all data, so now it is time for praise, at least for one of them: thank you to all responsible at the Reich Lab for this great merged dataset, which includes samples from other labs.

NOTE. I would like to make my tiny contribution here, for beginners interested in working with these files, so I will update – whenever I have time – the “How To” sections of this blog for PCAs, PCA3d, and ADMIXTURE.

Detail of the PCA of European Iron Age populations. See full versions.

For unsupervised ADMIXTURE in the maps, a K=5 is selected based on the CV, giving a kind of visual WHG : NWAN : CHG/IN : EHG : ENA, but with Steppe ancestry “in between”. Higher K gave worse CV, which I guess depends on the many ancient and modern samples selected (and on the fact that many samples are repeated from different sources in my files, because I did not have time to filter them all individually).
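
For beginners who want to reproduce this kind of choice of K, here is a minimal sketch in Python of picking the K with the lowest cross-validation error. It assumes that the logs contain the `CV error (K=...)` lines that ADMIXTURE prints when run with `--cv`; the function names and the numbers in the example are hypothetical and only there for illustration.

```python
import re

# Shape of the CV line expected in each ADMIXTURE log (run with --cv),
# e.g. "CV error (K=5): 0.4378"
CV_LINE = re.compile(r"CV error \(K=(\d+)\): ([0-9.]+)")


def parse_cv_errors(log_text):
    """Return {K: cv_error} for every CV line found in the log text."""
    return {int(k): float(err) for k, err in CV_LINE.findall(log_text)}


def best_k(cv_errors):
    """Return the K with the lowest cross-validation error."""
    return min(cv_errors, key=cv_errors.get)


# Hypothetical log excerpt: these values are invented, not my real results.
example_logs = """
CV error (K=3): 0.4521
CV error (K=4): 0.4410
CV error (K=5): 0.4378
CV error (K=6): 0.4395
CV error (K=7): 0.4432
"""

errors = parse_cv_errors(example_logs)
print("K with lowest CV error:", best_k(errors))
```

In practice one would concatenate the log files of all runs (one per K) and pass the text to `parse_cv_errors`. A higher K with a slightly worse CV error can still show interesting components, which is exactly the temptation discussed next.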

I found some interesting component shared by Central European populations in K=7 to K=9 (from CEU Bell Beakers to Denmark LN to Hungarian EBA to Iberia BA, in a sort of “CEU BBC ancestry” potentially related to North-West Indo-Europeans), but still, I prefer to go for a theoretically more correct visualization instead of cherry-picking the ‘best-looking’ results.

Since I made fun of the search for “Siberian ancestry” in coloured components in Tambets et al. 2018, I have to be consistent and prefer to avoid doing the same here…

In the first publication (in January) and subsequent minor revisions until March, I trusted analyses and ancestry estimates reported by amateurs in 2018, which I used for the text adding my own interpretations. Most of them have been refuted in papers from 2019, as you probably know if you have followed this blog (see very recent examples here, here, or here), compelling me to delete or change them again, and again, and again. I don’t have experience from previous years, although the current pattern must have been evidently repeated many times over, or else we would be still talking about such previous analyses as being confirmed today…

I wanted to be one step ahead of peer-reviewed publications in the books, but I prefer now to go for something safe in the book series, rather than having one potentially interesting prediction – which may or may not be right – and ten huge mistakes that I would have helped to endlessly redistribute among my readers (online and now in print) based on some cherry-picked pairwise comparisons. This is especially true when predictions of “Steppe“- and/or “Siberian“-related ancestry have been published, which, for some reason, seem to go horribly wrong most of the time.

I am sure whole books can be written about why and how this happened (and how this is going to keep happening), based on psychology and sociology, but the reasons are irrelevant, and that would be a futile effort; like writing books about glottochronology and its intermittent popularity due to misunderstood scientific trends. The most efficient way to deal with this problem is to avoid such information altogether, because – as you can see in the current revised text – they wouldn’t really add anything essential to the content of these books, anyway.


Official site of the book series:
A Song of Sheep and Horses: eurafrasia nostratica, eurasia indouralica

Ancient Sardinia hints at Mesolithic spread of R1b-V88, and Western EEF-related expansion of Vasconic


New preprint Population history from the Neolithic to present on the Mediterranean island of Sardinia: An ancient DNA perspective, by Marcus et al. bioRxiv (2019)

Interesting excerpts (emphasis mine, edited for clarity):

On the high frequency of R1b-V88

Our genome-wide data allowed us to assign Y haplogroups for 25 ancient Sardinian individuals. More than half of them consist of R1b-V88 (n=10) or I2-M223 (n=7).

Francalacci et al. (2013) identified three major Sardinia-specific founder clades based on present-day variation within the haplogroups I2-M26, G2-L91 and R1b-V88, and here we found each of those broader haplogroups in at least one ancient Sardinian individual. Two major present-day Sardinian haplogroups, R1b-M269 and E-M215, are absent.

Compared to other Neolithic and present-day European populations, the number of identified R1b-V88 carriers is relatively high.

(…)ancient Sardinian mtDNA haplotypes belong almost exclusively to macro-haplogroups HV (n = 16), JT (n = 17) and U (n = 9), a composition broadly similar to other European Neolithic populations.

Geographic and temporal distribution of R1b-V88 Y-haplotypes in ancient European samples. We plot the geographic position of all ancient samples inferred to carry R1b-V88 equivalent markers. Dates are given as years BCE (means of calibrated 2σ radiocarbon dates). Multiple V88 individuals with similar geographic positions are vertically stacked. We additionally color-code the status of the R1b-V88 subclade R1b-V2197, which is found in most present-day African R1b-V88 carriers.

On the origin of a Vasconic-like Paleosardo with the Western EEF

(…) the Neolithic (and also later) ancient Sardinian individuals sit between early Neolithic Iberian and later Copper Age Iberian populations, roughly on an axis that differentiates WHG and EEF populations and embedded in a cluster that additionally includes Neolithic British individuals. This result is also evident in terms of absolute genetic differentiation, with low pairwise FST ~ 0.005 ± 0.002 between Neolithic Sardinian individuals and Neolithic western mainland European populations. Pairwise outgroup-f3 analysis shows a very similar pattern, with the highest values of f3 (i.e. most shared drift) being with Neolithic and Copper Age Iberia, gradually dropping off for temporally and geographically distant populations.

In explicit admixture models (using qpAdm, see Methods) the southern French Neolithic individuals (France-N) are the most consistent with being a single source for Neolithic Sardinia (p ~ 0.074 to reject the model of one population being the direct source of the other); followed by other populations associated with the western Mediterranean Neolithic Cardial Ware expansion.

Principal Components Analysis based on the Human Origins dataset. A: Projection of ancient individuals’ genotypes onto principal component axes defined by modern Western Eurasians (gray labels).

Pervasive Western Hunter-Gatherer ancestry in Iberian/French/Sardinian population

Similar to western European Neolithic and central European Late Neolithic populations, ancient Sardinian individuals are shifted towards WHG individuals in the top two PCs relative to early Neolithic Anatolians. Admixture analysis using qpAdm infers that ancient Sardinian individuals harbour HG ancestry (~ 17%) that is higher than early Neolithic mainland populations (including Iberia, ~ 8%), but lower than Copper Age Iberians (~ 25%) and about the same as Southern French Middle-Neolithic individuals (~ 21%).

Principal Components Analysis based on the Human Origins dataset. B: Zoom into the region most relevant for Sardinian individuals.

Continuity from Sardinia Neolithic through the Nuragic

We found several lines of evidence supporting genetic continuity from the Sardinian Neolithic into the Bronze Age and Nuragic times. Importantly, we observed low genetic differentiation between ancient Sardinian individuals from various time periods.

A qpAdm analysis, which is based on simultaneously testing f-statistics with a number of outgroups and adjusts for correlations, cannot reject a model of Neolithic Sardinian individuals being a direct predecessor of Nuragic Sardinian individuals (…) Our qpAdm analysis further shows that the WHG ancestry proportion, in a model of admixture with Neolithic Anatolia, remains stable at ~17% throughout three ancient time-periods.

Present-day genetic structure in Sardinia reanalyzed with aDNA. A: Scatter plot of the first two principal components trained on 1577 present-day individuals with grand-parental ancestry from Sardinia. Each individual is labeled with a location if at least 3 of the 4 grandparents were born in the same geographical location (“small” three-letter abbreviations); otherwise with “x” or, if grand-parental ancestry is missing, with “?”. We calculated median PC values for each Sardinian province (large abbreviations). We also projected each ancient Sardinian individual on to the top two PCs (gray points). B/C: We plot f-statistics that test for admixture of modern Sardinian individuals (grouped into provinces) when using Nuragic Sardinian individuals as one source population. Uncertainty ranges depict one standard error (calculated from block bootstrap). Karitiana are used in the f-statistic calculation as a proxy for ANE/Steppe ancestry (Patterson et al., 2012).

Steppe influx in Modern Sardinians

While contemporary Sardinian individuals show the highest affinity towards EEF-associated populations among all of the modern populations, they also display membership with other clusters (Fig. 5). In contrast to ancient Sardinian individuals, present-day Sardinian individuals carry a modest “Steppe-like” ancestry component (but generally less than continental present-day European populations), and an appreciable broadly “eastern Mediterranean” ancestry component (also inferred at a high fraction in other present-day Mediterranean populations, such as Sicily and Greece).


Arrival of steppe ancestry with R1b-P312 in the Mediterranean: Balearic Islands, Sicily, and Iron Age Sardinia


New preprint The Arrival of Steppe and Iranian Related Ancestry in the Islands of the Western Mediterranean by Fernandes, Mittnik, Olalde et al. bioRxiv (2019)

Interesting excerpts (emphasis in bold; modified for clarity):

Balearic Islands: The expansion of Iberian speakers

Mallorca_EBA dates to the earliest period of permanent occupation of the islands at around 2400 BCE. We parsimoniously modeled Mallorca_EBA as deriving 36.9 ± 4.2% of her ancestry from a source related to Yamnaya_Samara; (…). We next used qpAdm to identify “proximal” sources for Mallorca_EBA’s ancestry that are more closely related to this individual in space and time, and found that she can be modeled as a clade with the (small) subset of Iberian Bell Beaker culture associated individuals who carried Steppe-derived ancestry (p=0.442).

Suppl. Materials: The model used was with Bell_Beaker_Iberia_highsteppe, a group of outliers from Iberia buried in a Bell Beaker mortuary context who unlike most individuals from this context in that region had high proportions of Steppe ancestry (p=0.442).

Our estimates of Steppe ancestry in the two later Balearic Islands individuals are lower than the earlier one: 26.3 ± 5.1% for Formentera_MBA and 23.1 ± 3.6% for Menorca_LBA, but the Middle to Late Bronze Age Balearic individuals are not a clade relative to non-Balearic groups. Specifically, we find that f4(Mbuti.DG, X; Formentera_MBA, Menorca_LBA) is positive when X=Iberia_Chalcolithic (Z=2.6) or X=Sardinia_Nuragic_BA (Z=2.7). While it is tempting to interpret the latter statistic as suggesting a genetic link between peoples of the Talaiotic culture of the Balearic islands and the Nuragic culture of Sardinia, the attraction to Iberia_Chalcolithic is just as strong, and the mitochondrial haplogroup U5b1+16189+@16192 in Menorca_LBA is not observed in Sardinia_Nuragic_BA but is observed in multiple Iberia_Chalcolithic individuals. A possible explanation is that both the ancestors of Nuragic Sardinians and the ancestors of Talaiotic people from the Balearic Islands received gene flow from an unsampled Iberian Chalcolithic-related group (perhaps a mainland group affiliated to both) that did not contribute to Formentera_MBA.
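
To make the statistic in the excerpt above more concrete, here is a rough sketch of how an f4 value and its Z-score can be computed from allele frequencies. It is not the pipeline used in the paper: the function names are mine, the jackknife uses equal-sized contiguous blocks of SNPs as a simplifying assumption (real tools define blocks by genomic distance and weight them), and the data are simulated.

```python
import numpy as np


def f4(a, b, c, d):
    """f4(A, B; C, D): mean over SNPs of (a - b) * (c - d)."""
    a, b, c, d = (np.asarray(x, dtype=float) for x in (a, b, c, d))
    return float(np.mean((a - b) * (c - d)))


def f4_with_z(a, b, c, d, n_blocks=20):
    """Return (f4, Z), with Z from a delete-one block jackknife.

    Simplification: blocks are equal-sized runs of consecutive SNPs.
    """
    a, b, c, d = (np.asarray(x, dtype=float) for x in (a, b, c, d))
    per_snp = (a - b) * (c - d)
    estimate = per_snp.mean()
    blocks = np.array_split(per_snp, n_blocks)
    total, n = per_snp.sum(), per_snp.size
    # Estimate recomputed with each block left out in turn
    loo = np.array([(total - blk.sum()) / (n - blk.size) for blk in blocks])
    se = np.sqrt((n_blocks - 1) / n_blocks * np.sum((loo - loo.mean()) ** 2))
    return float(estimate), float(estimate / se)


# Simulated frequencies (hypothetical): X and B share extra drift,
# which O and A lack.
rng = np.random.default_rng(0)
base = rng.uniform(0.1, 0.9, 1000)
shared = rng.normal(0.0, 0.05, 1000)
pop_o, pop_a = base, base
pop_x, pop_b = base + shared, base + shared

est, z = f4_with_z(pop_o, pop_x, pop_a, pop_b)
print(f"f4 = {est:.5f}, Z = {z:.1f}")
```

With this ordering, f4(O, X; A, B) comes out positive when X shares more drift with B than with A, which is how the positive values for X=Iberia_Chalcolithic and X=Sardinia_Nuragic_BA are read in the excerpt.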

This sample, like another one in El Argar, is of hg. R1b-P312. So there you are, the data that connects the Proto-Iberian expansion (replacing IE-speaking Bell Beakers) to the Iberian Chalcolithic population, signaled by the increase in Iberian Chalcolithic ancestry after the arrival of Bell Beakers, most likely connected originally to the Argaric and post-Argaric expansions during the MBA.

PCA with previously published ancient individuals (non-filled symbols), projected onto variation from present-day populations (gray squares).

Steppe in Sardinia IA: Phocaeans from Italy?

Most Sardinians buried in a Nuragic Bronze Age context possessed uniparental haplogroups found in European hunter-gatherers and early farmers, including Y-haplogroup R1b1a[xR1b1a1a] which is different from the characteristic R1b1a1a2a1a2 spread in association with the Bell Beaker complex. An exception is individual I10553 (1226-1056 calBCE) who carried Y-haplogroup J2b2a, previously observed in a Croatian Middle Bronze Age individual bearing Steppe ancestry, suggesting the possibility of genetic input from groups that arrived from the east after the spread of first farmers. This is consistent with the evidence of material culture exchange between Sardinians and mainland Mediterranean groups, although genome-wide analyses find no significant evidence of Steppe ancestry so the quantitative demographic impact was minimal.

Another interesting piece of data: these (Mesolithic) remnant R1b-V88 lineages are closely related to the Italian Peninsula, the most likely region of expansion of these lineages into Africa, in turn possibly connected to the expansion of Proto-Afroasiatic.

We detect definitive evidence of Iranian-related ancestry in an Iron Age Sardinian I10366 (391-209 calBCE) with an estimate of 11.9 ± 3.7% Iran_Ganj_Dareh_Neolithic related ancestry, while rejecting the model with only Anatolian_Neolithic and WHG at p=0.0066 (Supplementary Table 9). The only model that we can fit for this individual using a pair of populations that are closer in time is as a mixture of Iberia_Chalcolithic (11.9 ± 3.2%) and Mycenaean (88.1 ± 3.2%) (p=0.067). This model fits even when including Nuragic Sardinians in the outgroups of the qpAdm analysis, which is consistent with the hypothesis that this individual had little if any ancestry from earlier Sardinians.
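
For readers wondering where such two-source proportions come from, the following is a simplified sketch of the idea behind this kind of modeling. It is a least-squares f4-ratio toy on simulated, noise-free frequencies, not the actual qpAdm algorithm (which also handles correlated statistics, standard errors and model p-values); all population arrays and function names are hypothetical.

```python
import numpy as np


def f4(p1, p2, p3, p4):
    """f4(P1, P2; P3, P4) as the mean over SNPs of (p1 - p2) * (p3 - p4)."""
    return float(np.mean((p1 - p2) * (p3 - p4)))


def two_source_proportion(target, src1, src2, outgroup_pairs):
    """Least-squares share of src1 in target under a two-source mixture.

    If target = alpha * src1 + (1 - alpha) * src2, then for any outgroup
    pair (O1, O2): f4(O1, O2; target, src2) = alpha * f4(O1, O2; src1, src2).
    """
    x = np.array([f4(o1, o2, src1, src2) for o1, o2 in outgroup_pairs])
    y = np.array([f4(o1, o2, target, src2) for o1, o2 in outgroup_pairs])
    return float(np.dot(x, y) / np.dot(x, x))


# Simulated allele frequencies; nothing here is real data.
rng = np.random.default_rng(1)
n_snps = 5000
src1 = rng.uniform(0.05, 0.95, n_snps)
src2 = rng.uniform(0.05, 0.95, n_snps)
# Outgroups differentially related to the two sources
out_a = 0.5 * src1 + 0.5 * rng.uniform(0.05, 0.95, n_snps)
out_b = 0.5 * src2 + 0.5 * rng.uniform(0.05, 0.95, n_snps)
out_c = rng.uniform(0.05, 0.95, n_snps)
pairs = [(out_a, out_b), (out_a, out_c), (out_b, out_c)]

# Target built as an exact 11.9% / 88.1% mixture, with no sampling noise
target = 0.119 * src1 + 0.881 * src2
alpha = two_source_proportion(target, src1, src2, pairs)
print(f"estimated share of source 1: {alpha:.3f}")
```

Because the simulated target is an exact mixture, the proportion is recovered almost perfectly. With real genotypes, sampling noise and the choice of outgroups are what produce the ± ranges quoted in the excerpt.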

Proportions of ancestry using a distal qpAdm framework on an individual basis (a), and based on qpWave clusters

Sicily EBA: The Lusitanian/Ligurian connection?

(…) While a previously reported Bell Beaker culture-associated individual from Sicily had no evidence of Steppe ancestry, (…) we find evidence of Steppe ancestry in the Early Bronze Age by ~2200 BCE. In distal qpAdm, the outlier Sicily_EBA11443 is parsimoniously modeled as harboring 40.2 ± 3.5% Steppe ancestry, and the outlier Sicily_EBA8561 is parsimoniously modeled as harboring 23.3 ± 3.5% Steppe ancestry. (…) The presence of Steppe ancestry in Early Bronze Age Sicily is also evident in Y chromosome analysis, which reveals that 4 of the 5 Early Bronze Age males had Steppe-associated Y-haplogroup R1b1a1a2a1a2 (Online Table 1). Two of these were Y-haplogroup R1b1a1a2a1a2a1 (Z195) which today is largely restricted to Iberia and has been hypothesized to have originated there 2500-2000 BCE. This evidence of west-to-east gene flow from Iberia is also suggested by qpAdm modeling where the only parsimonious proximate source for the Steppe ancestry we found in the main Sicily_EBA cluster is Iberians.

What’s this? An ancestral connection between Sicel Elymian and Galaico-Lusitanian or Ligurian (based on an origin in NE Iberia)? Impossible to say, especially if the languages of these early settlers were replaced later by non-Indo-European speakers from the eastern Mediterranean, and by Indo-European speakers from the mainland closely related to Proto-Italic during the LBA, but see below.

Regarding the comment on R1b-Z195, it is associated with modern Iberians, as DF27 in general, due to founder effects beyond the Pyrenees. It is a very old subclade, split directly from DF27 roughly at the same time as it split from the parent P312, i.e. it can be found anywhere in Europe, and it almost certainly accompanied the expansion of Celts from Central Europe under the subclade R1b-M167/SRY2627.

The connection is thus strong only because of the qpAdm modeling, since R1b-DF27 and subclade R1b-Z195 are certainly lineages expanded quite early, most likely with Yamna settlers in Hungary and East Bell Beakers.

In this case, if stemming from Iberia, it is most likely of subclade R1b-Z220 – or another Z195 (xM167) lineage – originally associated with the Old European substrate found in topo-hydronymy in Iberia, whose most likely remnants attested during the Iron Age were Lusitanians.

Left: Modern distribution of R1b-Z195 (YFull estimate 2700 BC); Right: Modern distribution of DF27. Both include later founder effects within Iberia, so the increase in the Basque country and the Crown of Aragon and the decrease in Portugal can safely be ignored. Contour maps of the derived allele frequencies of the SNPs analyzed in Solé-Morata et al. (2017).

We detect Iranian-related ancestry in Sicily by the Middle Bronze Age 1800-1500 BCE, consistent with the directional shift of these individuals toward Mycenaeans in PCA. Specifically, two of the Middle Bronze Age individuals can only be fit with models that in addition to Anatolia_Neolithic and WHG, include Iran_Ganj_Dareh_Neolithic. The most parsimonious model for Sicily_MBA3125 has 18.0 ± 3.6% Iranian-related ancestry (p=0.032 for rejecting the alternative model of Steppe rather than Iranian-related ancestry), and the most parsimonious model for Sicily_MBA has 14.9 ± 3.9% Iranian-related ancestry (p=0.037 for rejecting the alternative model).

The modern southern Italian Caucasus-related signal identified in Raveane et al. (2018) is plausibly related to the same Iranian-related spread of ancestry into Sicily that we observe in the Middle Bronze Age (and possibly the Early Bronze Age).

The non-Indo-European Sicanians and Elymians were possibly then connected to eastern Mediterranean groups before the expansion of the Sea Peoples.

For the Late Bronze Age group of individuals, qpAdm documented Steppe-related ancestry, modeling this group as 80.2 ± 1.8% Anatolia_Neolithic, 5.3 ± 1.6% WHG, and 14.5 ± 2.2% Yamnaya_Samara. Our modeling using sources more closely related in space and time also supports Sicily_LBA having Minoan-related ancestry or being derived from local preceding populations or individuals with ancestries similar to those of Sicily_EBA3123 (p=0.527), Sicily_MBA3124 (p=0.352), and Sicily_MBA3125 (p=0.095).
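As a quick sanity check on the quoted proportions, the three-way qpAdm model for the Late Bronze Age group can be summed and given a rough interval. This is only a minimal sketch: the dictionary and function names are my own, and the ±1.96 standard error interval assumes a normal approximation that the paper does not state in this excerpt.

```python
# Point estimates and standard errors (in %) as quoted for the Sicily LBA group.
SICILY_LBA = {
    "Anatolia_Neolithic": (80.2, 1.8),
    "WHG": (5.3, 1.6),
    "Yamnaya_Samara": (14.5, 2.2),
}


def total_percent(model):
    """Sum the point estimates; a well-formed mixture model should total ~100%."""
    return sum(estimate for estimate, _ in model.values())


def rough_interval(estimate, std_err, z=1.96):
    """Approximate interval as estimate +/- z * SE (assumed normal approximation)."""
    return (estimate - z * std_err, estimate + z * std_err)


if __name__ == "__main__":
    print(f"Total: {total_percent(SICILY_LBA):.1f}%")
    for source, (estimate, std_err) in SICILY_LBA.items():
        low, high = rough_interval(estimate, std_err)
        print(f"{source}: {estimate}% (approx. {low:.1f} to {high:.1f}%)")
```

Under that assumption the Steppe-related (Yamnaya_Samara) estimate stays clearly above zero, whereas the WHG interval is comparatively wide relative to its point estimate.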

This increase in Steppe-related ancestry in a western site during the LBA most likely represents either an expansion from the Aegean or – maybe more likely, given the archaeological finds – a regional population similar to Sicily EBA re-emerging or rather being displaced from the eastern part of the island because of a westward movement from nearby Calabria.

Whether this population sampled spoke Indo-European or not at this time is questionable, since the Iron Age accounts show non-IE Elymians in this region.

Actually, Elymians seem to have spoken Indo-European, which fits well with the increase in steppe ancestry.

EDIT (21 MAR): Interesting about a proposed incoming Minoan-like ancestry is the potential origin of the Iran Neolithic-related ancestry that is going to appear in Central Italy during the LBA. This could then be potentially associated with Tyrsenians passing through the area, although the traditional description may be more compatible with an arrival of Sea Peoples from the Adriatic.

Sad to read this:

This manuscript is dedicated to the memory of Sebastiano Tusa of the Soprintendenza del Mare in Palermo, who would have been an author of this study had he not tragically died in the crash of Ethiopia Airlines flight 302 on March 10.


Happy new year 2019…and enjoy our new books!


Sorry for the last weeks of silence, I have been rather busy lately. I am having more projects going on, and (because of that) I also wanted to finish a project I have been working on for many months already.

I have therefore decided to publish a provisional version of the text, in the hope that it will be useful in the following months, when I won’t be able to update it as often as I would like to:

EDIT (20 JAN 2019): For those of you who are more comfortable reading in your native language, I have placed some links to automatic translations by Google Translate. They might work especially well for the texts of A Game of Clans & A Clash of Chiefs.

Don’t forget to check out the maps included in the supplementary materials: I have added Y-DNA, mtDNA, and ADMIXTURE data using GIS software. The PCA graphics are also important to follow the main text.

NOTE. The files are hosted on my server; I have also uploaded them to ResearchGate, in case the websites are too slow.

I would have preferred to wait for a thorough revision of the section on archaeology and the linguistic sections on Uralic, but I doubt I will have time when the reviews come, so it was either now or maybe next December…

I say so in the introduction, but it is evident that certain aspects of the book are tentative to say the least: the farther back we go from Late Proto-Indo-European, the less clear are many aspects. Also, linguistically I am not convinced about Eurasiatic or Nostratic, although they do have a certain interest when we try to offer a comprehensive view of the past, including ethnolinguistic identities.

I cannot be an expert in everything, and these books cover a lot. I am bound to publish many corrections as new information appears and more reviews are sent. For example, just days ago (before SNP calls of Wang et al. 2018 were published) some paragraphs implied that AME might have expanded Nostratic from the Middle East. Now it does not seem so, and I changed them just before uploading the text. That’s how tentative certain routes are, and how much all of this may change. And that only if we accept a Nostratic phylum…

NOTE. Since the first book I wrote was the linguistic one, and I have spent the last months updating the archaeology + genetics part, now many of you will probably understand 1) why I am so convinced about certain language relationships and 2) how I used many posts to clarify certain ideas and receive comments. Many posts probably offer a good timeline of what I worked with, and when.


I did not add this section to the books, because they are still not ready for print, but I think this is due somewhere now. It is impossible to reference all who have directly or indirectly contributed to this, so this is a list of those I feel have played an important role.

I am indebted to the following people (which does not mean that they share my views, obviously):

First and foremost, to Fernando López-Menchero, for having the patience to review in detail many parts on Indo-European linguistics, knowing that I won’t accept many of his comments anyway. The additional information he offers is invaluable, but I didn’t want to turn this into a huge linguistic encyclopaedia with unending discussions of tiny details of each reconstructed word. I think it is already too big as it is.

I would not have thought about doing this if it were not for the interest of Wekwos (Xavier Delamarre) in publishing a full book about the Indo-European demic diffusion model (in the second half of 2017, I think). It was they who suggested that I extend the content, when all I had done until then was write an essay and draw some maps in my free time between depositing the PhD thesis and defending it.

Sadly, as much as I would like to publish a book with a professional publisher, I don’t think ancient DNA lends itself to the traditional format, so my requests (mainly to have free licenses and to be able to review the text at will, as new genetic papers are published) were logically not acceptable. Also, the main aim of all volumes, especially the linguistic one, is the teaching of essentials of Late Proto-Indo-European and related languages, and this objective would be thwarted by selling each volume for $50-70 and only in printed format. I prefer a wider distribution.

At first I didn’t think much of this proposal, because I do not benefit from this kind of publication in my scientific field, but with time my interest in writing a whole, comprehensive book on the subject grew to the point where it was already an ongoing project, probably by the start of 2018.

I would not have been in contact with Wekwos if it were not for user Camulogène Rix at Anthrogenica, so thanks for that and for the interest in this work.

I would not have thought of writing this either if not for the spontaneous support (with an unexpected phone call!) of a professor of the Complutense University of Madrid, Ángel Gómez Moreno, who is interested in this subject – as is his wife, a professor of Classics more closely associated with Indo-European studies, and who helped me with a search for Indo-Europeanists.

EDIT (1 JAN 2019): I remembered that Karin Bojs sent me her book after reading the demic diffusion model. I may have also thought about writing a whole book back then, but mid-2017 is probably too early for the project.

Professor Kortlandt is still to review the text, but he contributed to both previous essays in some very interesting ways, so I hope he can help me improve the parts on Uralic, and maybe alternative accounts of expansion for Balto-Slavic, depending on the time depth that he would consider warranted according to the Temematic hypothesis.

The maps are evidently (for those who are interested in genetics) in part the result of the effort of the late Jean Manco: As you can see from the maps including Y-DNA and mtDNA samples, I have benefitted from her way of organising data and publishing it. Similarly, the work of Iain McDonald in assessing the potential migration routes of R1b and R1a in Europe with the help of detailed maps was behind my idea for the first maps, and consequently behind these, too.

I should thank all people responsible for the release of free datasets to work with, including the Reich and Jena labs, the Veeramah Lab, and also researchers from the Max Planck Institute or the Mainz Palaeogenetics group, who didn’t mind sharing datasets with me.

Readers of this blog with interesting comments have also been essential for the improvement of the texts. You can probably see some of your many contributions there. I may not answer many comments, because I am always busy (and sometimes I just don’t have anything interesting to say), but I try to read all of them.

EDIT (1 JAN 2019) I think I should mention at least Chetan, Egg, or Robert George; but then I would leave out old europe, Sgr Ganesh, or Tileman Ehlen; and if I include them I would leave out others…

Users of other sites, like Anthrogenica, whose particular points of view and deep knowledge of some very specific aspects are sometimes very useful. In particular, user Anglesqueville helped me to fix some issues with the merging of datasets to obtain the PCAs and ADMIXTURE, and prepared some individual samples to merge them.

Even when I don’t post anything, Google Analytics keeps sending me messages about increasing user fidelity (returning users), and stats haven’t really changed (which probably means more people are reading old posts), so thank you for that.

I hope you enjoy the books.

Happy new year!

R1a-Z280 lineages in Srubna; and first Palaeo-Balkan R1b-Z2103?


Scythian samples from the North Pontic area are far more complex than what could be seen at first glance. From the new Y-SNP calls we have now thanks to the publications at Molgen (see the spreadsheet) and in Anthrogenica threads, I think this is the basis to work with:

NOTE. I understand that writing a paper requires a lot of work, and probably statistical methods are the main interest of authors, editors, and reviewers. But it is difficult to comprehend how any user of open source tools can instantly offer a more complex assessment of the samples’ Y-SNP calls than professionals working on these samples for months. I think that, by now, it should be clear to everyone that Y-DNA is often as important as (sometimes even more important than) statistical tools to infer certain population movements, since admixture can change within a few generations of male-biased migrations, whereas haplogroups can’t…


Srubna-Andronovo samples are as homogeneous as they always were, dominated by R1a-Z645 subclades and CWC-related (steppe_MLBA) ancestry.

The appearance of one (possibly two) R-Z280 lineages in this mixed Srubna-Alakul region of the southern Urals and this early (1880-1690 BC, hence rather Pokrovka-Alakul) points to the admixture of R1a-Z93 and R1a-Z280 already in Abashevo, which also explains the wide distribution of both subclades in the forest zones of Central Asia.

If Abashevo is the cornerstone of the Indo-Iranian / Uralic community, as it seems, the genetic admixture would initially be quite similar, undergoing in the steppes a reduction to haplogroup R1a-Z93 (obviously not complete), at the same time as it expanded to the west with Pokrovka and Srubna, and to the east with Petrovka and Andronovo. To the north, similar reductions will probably be seen following the Seima-Turbino phenomenon.

NOTE. Another R1a-Z280 has been found in the recent sample from Bronze Age Poland (see spreadsheet). As it appears right now in ancient and modern DNA, there seems to be a different distribution between subclades:

  • R1a-Z280 (formed ca. 2900 BC, TMRCA ca. 2600 BC) appears mainly distributed today to the east, in the forest and steppe regions, with the most ‘successful’ expansions possibly related to the spread of Abashevo- and Battle Axe-related cultures (Indo-Iranian and Uralic alike).
  • R1a-M458 (formed ca. 2700 BC, TMRCA ca. 2700 BC) appears mainly distributed to the north, from central Europe to the east – but not in the steppe in aDNA, with the most ‘successful’ expansions to the west.

M458 lineages seem thus to have expanded in the steppe in sizeable numbers only after the Iranian expansions (see a map of modern R1a distributions), i.e. possibly with the expansion of Slavs, which supports the model whereby cultures from central-east Europe (like Trzciniec and Lusatian), accompanied mainly by M458 lineages, were responsible for the expansion of Proto-Balto-Slavic (and later Proto-Slavic).

The finding of haplogroup R1a-Z93, among them one Z2123, is no surprise at this point after other similar Srubna samples. As I said, the early Srubna expansion is most likely responsible for the Szólád Bronze Age sample (ca. 2100-1700 BC), and for the Balkans BA sample (ca. 1750-1625 BC) from Merichleri, due to incursions along the central-east European steppe.

Map of decorated bone/antler bridle cheek-pieces and whip handle equivalents. They are often local translations that remained faithful to the originals (from data in Piggott, 1965; Kristiansen & Larsson, 2005; David, 2007). Image from Vandkilde (2014).


Cimmerian samples from the west show signs of continuity with R1a-Z93 lineages. Nevertheless, the sample of haplogroup Q1a-Y558, together with the ‘Pre-Scythian’ sample of haplogroup N (of the Mezőcsát Culture) in Hungary ca. 980-830 BC, as well as their PCA, seem to depict an origin of these Pre-Scythian peoples in populations related to the eastern Central Asian steppes, too.

NOTE. I will write more on different movements (unrelated to Uralic expansions) from Central and East Asia to the west accompanied by Siberian ancestry and haplogroup N in the post on Ugric-Samoyedic expansions.


The Scythian of Z2123 lineage ca. 375-203 BC from the Volga (in Mathieson et al. 2015), together with the sample scy193 from Glinoe (probably also R1a-Z2123), without a date, as well as their common Steppe_MLBA cluster, suggest that Scythians, too, were at first probably quite homogeneous as is common among pastoralist nomads, and came thus from the Central Asian steppes.

The reduction in haplogroup variability among East Iranian peoples seems supported by the three new Late Sarmatian samples of haplogroup R1a-Z2124.

Approximate location of Glinoe and Glinoe Sad (with Starosilya to the south, in Ukrainian territory):

This initial expansion of Scythians does not mean that one can dismiss the western samples as non-Scythians, though, because ‘Scythian’ is a cultural attribution, based on materials. Confirming the diversity among western Scythians, here is a session at the recent ISBA 8:

Genetic continuity in the western Eurasian Steppe broken not due to Scythian dominance, but rather at the transition to the Chernyakhov culture (Ostrogoths), by Järve et al.

The long-held archaeological view sees the Early Iron Age nomadic Scythians expanding west from their Altai region homeland across the Eurasian Steppe until they reached the Ponto-Caspian region north of the Black and Caspian Seas by around 2,900 BP. However, the migration theory has not found support from ancient DNA evidence, and it is still unclear how much of the Scythian dominance in the Eurasian Steppe was due to movements of people and how much reflected cultural diffusion and elite dominance. We present new whole-genome results of 31 ancient Western and Eastern Scythians as well as samples pre- and postdating them that allow us to set the Scythians in a temporal context by comparing the Western Scythians to samples before and after within the Ponto-Caspian region. We detect no significant contribution of the Scythians to the Early Iron Age Ponto-Caspian gene pool, inferring instead a genetic continuity in the western Eurasian Steppe that persisted from at least 4,800–4,400 cal BP to 2,700–2,100 cal BP (based on our radiocarbon dated samples), i.e. from the Yamnaya through the Scythian period.

(…) Our results (…) support the hypothesis that the Scythian dominance was cultural rather than achieved through population replacement.
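Since the abstract gives dates in cal BP while most of this post uses BC, a small conversion helps to compare them. This is a minimal sketch with a function name of my own, assuming the usual convention that BP counts back from 1950 CE and ignoring the one-year offset caused by the absence of a year zero, which is irrelevant at this resolution.

```python
BP_REFERENCE_YEAR = 1950  # "Present" in cal BP is conventionally 1950 CE.


def cal_bp_to_bc(age_bp):
    """Approximate calendar year BC for an age in cal BP (no year-zero correction)."""
    if age_bp <= BP_REFERENCE_YEAR:
        raise ValueError("age falls in the CE era; this helper only handles BC dates")
    return age_bp - BP_REFERENCE_YEAR


if __name__ == "__main__":
    # Ranges quoted in the abstract: Yamnaya-period and Scythian-period samples.
    for older, younger in [(4800, 4400), (2700, 2100)]:
        print(f"{older}-{younger} cal BP ~ {cal_bp_to_bc(older)}-{cal_bp_to_bc(younger)} BC")
```

With this convention the continuity described spans roughly 2850-2450 BC to 750-150 BC, and the arrival in the Ponto-Caspian region "by around 2,900 BP" corresponds to roughly 950 BC.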

Detail of the slide with admixture of Scythian groups in Ukraine:


The findings of those 31 samples seem to support what Krzewińska et al. (2018) found in a tiny region of Moldavia-south-western Ukraine (Glinoe, Glinoe Sad, and Starosilya).

The question, then, is as follows: if Scythian dominance was “cultural rather than achieved through population replacement”, where are the R1b-Z2103 lineages from? One possibility, as I said in the previous post, is that they represent pockets of Iranian R1b lineages in the steppes descended from eastern Yamna, given that this haplogroup appears in modern populations from a wide region surrounding the steppes.

The other possibility, which is what some have proposed since the publication of the paper, is that they are related to Thracians, and thus to Palaeo-Balkan populations. About the previously published Thracian individuals in Sikora et al. (2014):

Geographic origin of ancient samples and ADMIXTURE results. (A) Map of Europe indicating the discovery sites for each of the ancient samples used in this study. (B) Ancestral population clusters inferred using ADMIXTURE on the HGDP dataset, for k = 6 ancestral clusters. The width of the bars of the ancient samples was increased to aid visualization.

For the Thracian individuals from Bulgaria, no clear pattern emerges. While P192-1 still shows the highest proportion of Sardinian ancestry, K8 more resembles the HG individuals, with a high fraction of Russian ancestry.

Despite their different geographic origins, both the Swedish farmer gok4 and the Thracian P192-1 closely resemble the Iceman in their relationship with Sardinians, making it unlikely that all three individuals were recent migrants from Sardinia. Furthermore, P192-1 is an Iron Age individual from well after the arrival of the first farmers in Southeastern Europe (more than 2,000 years after the Iceman and gok4), perhaps indicating genetic continuity with the early farmers in this region. The only non-HG individual not following this pattern is K8 from Bulgaria. Interestingly, this individual was excavated from an aristocratic inhumation burial containing rich grave goods, indicating a high social standing, as opposed to the other individual, who was found in a pit.


The following are excerpts from A Companion to Ancient Thrace (2015), by Valeva, Nankov, and Graninger (emphasis mine):

Thracian settlements from the 6th c. BC on:

(…) urban centers were established in northeastern Thrace, whose development was linked to the growth of road and communication networks along with related economic and distributive functions. The early establishment of markets/emporia along the Danube took place toward the middle of the first millennium BCE (Irimia 2006, 250–253; Stoyanov in press). The abundant data for intensive trade discovered at the Getic village in Satu Nou on the right bank of the Danube provides another example of an emporion that developed along the main artery of communication toward the interior of Thrace (Conovici 2000, 75–76).

Undoubtedly the most prominent manifestation of centralization processes and stratification in the settlement system of Thrace arrives with the emergence of political capitals – the leading urban centers of various Thracian political formations.

Image from Volf at Vol_Vlad LiveJournal.

Their relationships with Scythians and Greeks

The Scythian presence south of the Danube must be balanced with a Thracian presence north of the river. We have observed Getae there in Alexander’s day, settled and raising grain. For Strabo the coastlands from the Danube delta north as far as the river and Greek city of Tyras were the Desert of the Getae (7.3.14), notable for its poverty and tracklessness beyond the great river. He seems to suggest also that it was here that Lysimachus was taken alive by Dromichaetes, king of the Getae, whose famous homily on poverty and imperialism only makes sense on the steppe beyond the river (7.3.8; cf. Diod. 21.12; further on Getic possessions above the Danube, Paus. 1.9 with Delev 2000, 393, who seems rather too skeptical; on poverty, cf. Ballesteros Pastor 2003). This was the kind of discourse more familiarly found among Scythians, proud and blunt in the strength of their poverty. However, as Herodotus makes clear, simple pastoralism was not the whole story as one advanced round into Scythia. For he observes the agriculture practiced north and west of Olbia. These were the lands of the Alizones and the people he calls the Scythian Ploughmen, not least to distinguish them from the Royal Scythians east of Olbia, in whose outlook, he says, these agriculturalist Scythians were their inferiors, their slaves (Hdt. 4.20). The key point here is that, as we began to see with the Getan grain-fields of Alexander’s day, there was scope for Thracian agriculturalists to maintain their lifestyles if they moved north of the Danube, the steppe notwithstanding. It is true that it is movement in the other direction that tends to catch the eye, but there are indications in the literary tradition and, especially, in the archaeological record that there was also significant movement northward from Thrace across the Danube and the Desert of the Getae beyond it.

Greek literary sources were not much concerned with Thracian migration into Scythia, but we should observe the occasional indications of that process in very different texts and contexts. At the level of myth, it is to be remembered that Amazons were regularly considered to be of Thracian ethnicity from Archaic times onward and so are often depicted in Thracian dress in Greek art (Bothmer 1957; cf. Sparkes 1997): while they are most familiar on the south coast of the Black Sea, east of Sinope, they were also located on the north coast, especially east of the Don (the ancient Tanais). Herodotus reports an origin-story of the Sauromatians there, according to which this people had been created by the union of some Scythian warriors with Amazons captured on the south coast and then washed up on the coast of Scythia (4.110). While the story is unhistorical, it is not without importance. First, it reminds us that passage north from the Danube was not the only way that Thracians, Thracian influence, and Thracian culture might find their way into Scythia. There were many more and less circuitous routes, especially by sea, that could bring Thrace into Scythia. Secondly, the myth offered some ideological basis for the Sauromatian settlement in Thrace that Strabo records, for Sauromatians might claim a Thracian origin through their Amazon forebears. Finally, rather as we saw that Heracles could bring together some of the peoples of the region, we should also observe that Ares, whose earthly home was located in Thrace by a strong Greek and Roman tradition, seems also to have been a deity of special significance and special cult among the Scythians. So much was appropriate, especially from a Classical perspective, in associations between these two peoples, whose fame resided especially in their capacity for war.

Scythians: cultures and findings (ca. 7th-4th/3rd c. BC). Greek colonies marked with concentric circles.

This broad picture of cultural contact, interaction, and osmosis, beyond simple conflict, provides the context for a range of archaeological discoveries, which – if examined separately – may seem to offer no more than a scatter of peculiarities. Here we must acknowledge especially the pioneering work of Melyukova, who has done most to develop thinking on Thracian–Scythian interaction. As she pointed out, we have a good example of Thracian–Scythian osmosis as early as the mid-seventh century bce at Tsarev Brod in northeastern Bulgaria, where a warrior’s burial combines elements of Scythian and Thracian culture (Melyukova 1965). For, while the manner of his burial and many of the grave goods find parallels in Scythia and not Thrace, there are also goods which would be odd in a Scythian burial and more at home in a Thracian one of this period (notably a Hallstatt vessel, an iron knife, and a gold diadem). Also interesting in this regard are several stone figures found in the Dobrudja which resemble very closely figures of this kind (baby) known from Scythia (Melyukova 1965, 37–38). They range in date from perhaps the sixth to the third centuries bce, and presumably were used there – as in Scythia – to mark the burials of leading Scythians deposited in the area. Is this cultural osmosis? We should probably expect osmosis to occur in tandem with the movement of artefacts, so that only good contexts can really answer such questions from case to case. However, the broad pattern is indicated by a range of factors. Particularly notable in this regard is the observable development of a Thraco-Scythian form of what is more familiar as “Scythian animal style,” a term which – it must be understood – already embraces a range of types as we examine the different examples of the style across the great expanse from Siberia to the western Ukraine. 
As Melyukova observes, Thrace shows both items made in this style among Scythians and, more numerous and more interesting, a Thracian tendency to adapt that style to local tastes, with observable regional distinctions within Thrace itself. Among the Getae and Odrysians the adaptation seems to have been at its height from the later fifth century to the mid-third century (Melyukova 1965, 38; 1979).

The absence of local animal style in Bulgaria before the fifth century bce confirms that we have cultural influences and osmosis at work here, though that is not to say that Scythian tradition somehow dominated its Thracian counterpart, as has been claimed (pace Melyukova 1965, 39; contrast Kitov 1980 and 1984). Of particular interest here is the horse-gear (forehead-covers, cheek-pieces, bridle fittings, and so on) which is found extensively in Romania and Bulgaria as well as in Scythia, both in hoarded deposits and in burials. This exemplifies the development of a regional animal style, not least in silver and bronze, which problematizes the whole issue of the place(s) of its production. Accordingly, the regular designation as “Thracian” of horse-gear from the rich fourth century Scythian burial of Oguz in the Ukraine becomes at least awkward and questionable (further, Fialko 1995). And let us be clear that this is no minor matter, nor even part of a broader debate about the shared development of toreutics among Thracians and Scythians (e.g., Kitov 1980 and 1984). A finely equipped horse of fine quality was a strong statement and striking display of wealth and the power it implied

(…) while Thracian pottery appears at Olbia, Scythian pottery among Thracians is largely confined to the eastern limits of what should probably be regarded as Getic territory, namely the area close to the west of the Dniester, from the sixth century bce. Rather exceptional then is the Scythian pottery noted at Istros, which has been explained as a consequence of the Scythian pursuit of the withdrawing army of Darius and, possibly, a continued Scythian grip on the southern Danube in its aftermath (Melyukova 1965, 34). The archaeology seems to show us, therefore, that the elite Thracians and Scythians were more open to adaptation and acculturation than were their lesser brethren.

Paleo-Balkan languages in Eastern Europe between 5th and 1st century BC. From Wikipedia.


(…) we see distinct peoples and organizations, for example as Sitalces’ forces line up against the Scythians. Much more striking, however, against that general background, are the various ways in which the two peoples and their elites are seen to interact, connect, and share a cultural interface. We see also in Scyles’ story how the Greek cities on the coast of Thrace and Scythia played a significant role in the workings of relationships between the two peoples. It is not simply that these cities straddled the Danube, but also that they could collaborate – witness the honors for Autocles, ca. 300 bce (SEG 49.1051; Ochotnikov 2006) – and were implicated with the interactions of the much greater non-Greek powers around them. At the same time, we have seen the limited reality of familiar distinctions between settled Thracians and nomadic Scythians and the limited role of the Danube too in dividing Thrace and Scythia. The interactions of the two were not simply matters of dynastic politics and the occasional shared taste for artefacts like horse-gear, but were more profoundly rooted in the economic matrix across the region, so that “Scythian” nomadism might flourish in the Dobrudja and “Thracian-style” agriculture and settlement can be traced from Thrace across the Danube as far as Olbia. All of that offers scant justification for the Greek tendency to run together Thracians and Scythians as much the same phenomenon, not least as irrational, ferocious, and rather vulgar barbarians (e.g., Plato, Rep. 435b), because such notions were the result of ignorance and chauvinism. However, Herodotus did not share those faults to any degree, so that we may take his ready movement from Scythians to Thracians to be an indication of the importance of interaction between the two peoples whom he had encountered not only as slaves in the Aegean world, but as powerful forces in their own lands (e.g., Hdt. 4.74, where Thracian usage is suddenly brought into his account of Scythian hemp). 
Similarly, Thucydides, who quite without need breaks off his disquisition on the Odrysians to remark upon political disunity among the Scythians (Thuc. 2.97, a favorite theme: cf. Hdt. 4.81; Xen., Cyr. 1.1.4). As we have seen throughout this discussion, there were many reasons why Thracians might turn the thoughts of serious writers to Scythians and vice versa.

It seems, following Sikora et al. (2014), that Thracian ‘common’ populations would have more Anatolian Neolithic ancestry compared to more ‘steppe-like’ samples. But there were important differences even between the two nearby samples published from Bulgaria, which may reflect the close interaction between Scythians and Thracians seen in Krzewińska et al. (2018), potentially visible in the differences between the Central, Southern and South-Central clusters (possibly related to different periods rather than peoples?).

If these R1b-Z2103 samples were descended from Thracian elites, this would be the first proof of Palaeo-Balkan populations showing mainly R1b-Z2103, as I expect. Their appearance together with haplogroup I2a2a1b1 (also found in Ukraine Neolithic and in the Yamna outlier from Bulgaria) seems to support this regional continuity, and thus a long-lasting cultural and ethnic border roughly around the Danube, similar to the one found in the northern Caucasus.

However, since these samples are some 2,500 years younger than the Yamna expansion to the south, and they are archaeologically Scythians, it is impossible to say. In any case, it would seem that the main expansion of R1a-Z645 lineages to the south of the Danube – and therefore those found among modern Greeks – was mediated by the Slavic expansions centuries later.

Modified image from Krzewińska et al. (2018), with added Y-DNA haplogroups to each defined Scythian cluster and Sarmatians. Principal component analysis (PCA) plot visualizing 35 Bronze Age and Iron Age individuals presented in this study and in published ancient individuals in relation to modern reference panel from the Human Origins data set. See image with population references.

In the Northern cluster there is a sample of haplogroup R1b-P312 which, given its position on the PCA (apparently even more ‘modern Celtic’-like than the Hallstatt_Bylany sample from Damgaard et al. 2018), could be the product of the previous eastward Hallstatt expansion, although potentially also of a recent one:

Especially important in the archaeology of this interior is the large settlement at Nemirov in the wooded steppe of the western Ukraine, where there has been considerable excavation. This settlement’s origins evidently owe nothing significant to Greek influence, though the early east Greek pottery there (from ca. 650 bce onward: Vakhtina 2007) and what seems to be a Greek graffito hint at its connections with the Greeks of the coast, especially at Olbia, which lay at the estuary of the River Bug on whose middle course the site was located (Braund 2008). The main interest of the site for the present discussion, however, is its demonstrable participation in the broader Hallstatt culture to its west and south (especially Smirnova 2001). Once we consider Nemirov and the forest steppe in connection with Olbia and the other locations across the forest steppe and coastal zone, together with the less obvious movements across the steppe itself, we have a large picture of multiple connectivities in which Thrace bulks large.

Early Iron Age cultures of the Carpathian basin ca. 7-6th century BC, including steppe-related groups. Ďurkovič et al. (2018).

While the above description of clear-cut R1a-Steppe and R1b-Balkans is attractive (and probably more reliable than admixture found in scattered samples of unclear dates), the true ancient genetic picture is more complicated than that:

  • There is nothing in the material culture of the published western Scythians to distinguish the supposed Thracian elites.
  • We have the sample I0575, an Early Sarmatian from the southern Urals (one of the few available) of haplogroup R1b-Z2106, which supports the presence of R1b-Z2103 lineages among Eastern Iranian-speaking peoples.
  • We also have DA30, a Sarmatian of I2b lineage from the central steppes in Kazakhstan (ca. 47 BC – 24 AD).
  • Other Sarmatian samples of haplogroup R remain undefined.
  • There is R1a-Z93 in a late Sarmatian-Hun sample, which complicates the picture of late pastoralist nomads further.

Therefore, the possibility of hidden pockets of Iranian peoples of R1b-Z2103 (maybe also R1b-P312) lineages remains open, and should not be discarded simply because of the prevalent haplogroups among modern populations, or because of the different clusters found, or else we risk obvious circular reasoning: “this sample is not (autosomally or in prevalent haplogroups) like those we already had from the steppe, ergo it is not from this or that steppe culture.” Hopefully, the upcoming paper by Järve et al. will help develop a clearer genetic transect of Iranian populations from the steppes.

All in all, the diversity among western Scythians represents probably one of the earliest difficult cases of acculturation to be studied with ancient DNA (obviously not the only one), since Scythians combine unclear archaeological data with limited and conflicting proto-historical accounts (also difficult to contrast with the wide confidence intervals of radiocarbon dates) with different evolving clusters and haplogroups – especially in border regions with strong and continued interactions of cultures and peoples.

With emerging complex cases like these during the Iron Age, I am happy to see that at least earlier expansions show clearer Y-DNA bottlenecks, or else genetics would only add more data to argue about potential cultural diffusion events, instead of solving questions about proto-language expansions once and for all…


Y-chromosome mixture in the modern Corsican population shows different migration layers


Open access Prehistoric migrations through the Mediterranean basin shaped Corsican Y-chromosome diversity, by Di Cristofaro et al. PLOS One (2018).

Interesting excerpts:

This study included 321 samples from men throughout Corsica; samples from Provence and Tuscany were added to the cohort. All samples were typed for 92 Y-SNPs, and Y-STRs were also analyzed.

Haplogroup R represented approximately half of the lineages in both Corsican and Tuscan samples (respectively 51.8% and 45.3%) whereas it reached 90% in Provence. Sub-clade R1b1a1a2a1a2b-U152 predominated in North Corsica whereas R1b1a1a2a1a1-U106 was present in South Corsica. Both SNPs display clinal distributions of frequency variation in Europe, the U152 branch being most frequent in Switzerland, Italy, France and Western Poland. Calibrated branch lengths from whole Y chromosome sequencing [44,45] and ancient DNA studies [46] both indicated that R1a and R1b diversification began relatively recently, about 5 Kya, consistent with Bronze Age and Copper Age demographic expansion. TMRCA estimations are concordant with such expansion in Corsica.

Spatial frequency maps for haplogroups with frequencies above 3%, their Y-STR based phylogenetic networks in Corsican populations (Blue: North, Green: West, Orange: South, Black: Center and Purple: East) and their TMRCA (in years, +/- SE).

Haplogroup G reached 21.7% in Corsica and 13.3% in Tuscany. Sub-clade G2a2a1a2-L91 accounted for 11.3% of all haplogroups in Corsica yet was not present in Provence or in Tuscany. Thirty-four out of the 37 G2a2a1a2-L91 displayed a unique Y-STR profile, illustrated by the star-like profile of STR networks (Fig 1). G2a2a1a2-L91 and G2a2a-PF3147(xL91xM286) show their highest frequency in present day Sardinia and southern Corsica compared to low levels from Caucasus to Southern Europe, encompassing the Near and Middle East [21,47–50]. Ancient DNA results from Early and Middle Neolithic samples reported the presence of haplogroup G2a-P15 [51–53], consistent with gene flow from the Mediterranean region during the Neolithic transition. Td expansion time estimated by STR for P15-affiliated chromosomes was estimated to be 15,082 +/- 2217 years ago [49]. Ötzi, the 5,300-year-old Alpine mummy, was derived for the L91 SNP [21]. A genetic relationship between G haplogroups from Corsica and Sardinia is further supported by DYS19 duplication, reported in North Sardinia [14], and observed in the southern part of Corsica in 9 out of 37 G2a2a1a2-L91 chromosomes and in 4 out of 5 G2a2a-PF3147(xL91xM286) chromosomes, 3 of which displayed an identical STR profile (S4 Table).

This lineage has a reported coalescent age estimated by whole sequencing in Sardinian samples of about 9,000 years ago. This could reflect common ancestors coming from the Caucasus and moving westward during the Neolithic period [48], whereas their continental counterparts would have been replaced by rapidly expanding populations associated with the Bronze Age [46,54,55]. Estimated TMRCA for L91 lineage in Corsica is 4529 +/- 853 years. G-L497 showed high frequencies in Corsica compared to Provence and Tuscany, and this haplogroup was common in Europe, but rare in Greece, Anatolia and the Middle East. Fifteen out of the 17 Corsican G2a2b2a1a1b-L497 displayed a unique Y-STR profile (S4 Table) with an estimated TMRCA of 6867 +/- 1294 years. Haplogroup G2a2b1-M406, associated with Impressed Ware Neolithic markers, along with J2a1-DYS445 = 6 and J2a1b1-M92 [22,49], had very low levels in Corsica. Conversely, G2a2b2a-P303 was highly represented and seemed to be independent of the G2a2b1-M406 marker. The 7 G2a2b2a-P303(xL497xM527) Corsican chromosomes displayed a unique Y-STR profile (S4 Table).
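To get a feel for how wide these TMRCA estimates are, here is a minimal sketch that turns the two values quoted above into rough 95% ranges. It is not part of the paper's method: it assumes the +/- figures are standard errors (as the figure caption states) and that a normal approximation is acceptable. The function name `approx_interval` and the multiplier 1.96 are my own choices for illustration.

```python
# Rough 95% ranges for the TMRCA estimates quoted above.
# ASSUMPTION: "+/-" is a standard error and the estimate is roughly normal,
# so the range is estimate +/- 1.96 * SE. This is an illustration only.

def approx_interval(tmrca_years, se_years, z=1.96):
    """Return (low, high) in years before present for a TMRCA +/- SE."""
    half_width = z * se_years
    return tmrca_years - half_width, tmrca_years + half_width

# Values as quoted in the excerpt (Corsican lineages).
estimates = {
    "G2a2a1a2-L91": (4529, 853),
    "G2a2b2a1a1b-L497": (6867, 1294),
}

for lineage, (tmrca, se) in estimates.items():
    low, high = approx_interval(tmrca, se)
    print(f"{lineage}: {tmrca} +/- {se} -> roughly {low:.0f} to {high:.0f} years")
```

Under these assumptions the two ranges overlap, so the point estimates alone should not be used to order these lineages in time.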

First and second axes of the PCA based on 12 Y-chromosome haplogroup frequencies in 83 west Mediterranean populations.

Haplogroup J, mainly represented by J2a1b-M67(xM92), displayed intermediate frequencies in Corsica compared to Tuscany and Provence. J2a1b-M67(xM92) derived STR network analysis displayed a quite homogeneous profile across the island with an estimated TMRCA of 2381 +/- 449 years (Fig 1) and individuals displaying M67 were peripheral compared to Northwestern Italians (S2 Fig). The haplogroup J2a1-Page55(xM67xM530), characteristic of non-Greek Anatolia [22], was found in the north-west of Corsica. Haplogroup J2a1-DYS445 = 6 was found in the north-west with DYS391 = 10 repeats, and in the far south with DYS391 = 9 repeats, the former was associated with Anatolian Greek samples, whereas the second was found in central Anatolia [22]. The 7 J2b2a-M241 displayed a unique Y-STR profile (S4 Table), they were only detected in the Cap Corse region, this sub-haplogroup shows frequency peaks in both the southern Balkans and northern-central Italy [56] and is associated with expansion from the Near East to the Balkans during Neolithic period [57].

Haplogroup E, mainly represented by E1b1b1a1b1a-V13, displayed intermediate frequencies in Corsica compared to Tuscany and Provence. E1b1b1a1b1a-V13 was thought to have initiated a pan-Mediterranean expansion 7,000 years ago starting from the Balkans [52] and its dispersal to the northern shore of the Mediterranean basin is consistent with the Greek Anatolian expansion to the western Mediterranean [22], characteristic of the region surrounding Alaria, and consistent with the TMRCA estimated in Corsica for this haplogroup. A few E1b1a-V38 chromosomes are also observed in the same regions as V13.