European hydrotoponymy (II): Basques and Iberians after Lusitanians and “Ligurians”


The first layer in the hydrotoponymy of Iberia is clearly Indo-European, not only in territories that were occupied by Indo-Europeans when the Romans arrived, but also in most of those occupied by non-Indo-Europeans.

Among Indo-European peoples, the traditional paradigm – still repeated in Wikipedia-like texts to this day – has been to classify their languages as “Pre-Celtic” despite the non-Celtic phonetics (especially the preserved initial p-), because the same toponyms appear in areas occupied by Celts (e.g. Parisii, Pictones, Pelendones, Palantia); or – even worse – simply as “Celtic”, because of the famous -briga and related components. This was already untenable at the end of the 20th century, and it is simply anachronistic today.

NOTE. Since Indo-Europeans and non-Indo-Europeans of Western Europe show strong Y-chromosome bottlenecks under R1b-P312 lineages, maps below show the evolution of cultural groups side by side with ADMIXTURE of ancient DNA samples instead. The map series on prehistoric migrations also contains Y-DNA and mtDNA maps.

Most excerpts below (emphasis mine) are translated from Spanish (see the original text here):

iberia-bell-beakers-steppe
Top Left: Arrival of Indo-European-speaking East Bell Beakers and likely disruption of the Basque-Iberian community (ca 2500 BC on). Top Right: corresponding (unsupervised) ADMIXTURE map of ancient DNA samples. Arrival of Central European ancestry (“Steppe ancestry”, roughly represented by the blue color), with other components still prevalent, roughly including Anatolia Neolithic (brown), WHG (red), and sporadically Northern African (violet). Notice the high proportion of Central European ancestry in central and north-western Iberia. See full maps including Y-DNA and mtDNA. Bottom: PCA of Bell Beaker and contemporaneous samples.

Palaeo-Indo-Europeans

While the non-Celtic Indo-European nature of Lusitanian is certain, the nature of the “Pre-Celtic” language spoken by peoples such as the Cantabri, Astures, Pelendones, Carpetani and Vettones is still being discussed, due to the scarcity of material to work with.

Galaico-Lusitanian

From Hacia una definición del lusitano, by Vallejo (2013):

It is certain that the geographical delimitation set by Tovar is still valid. It is basically determined by the known direct documents, that is, the traditionally accepted inscriptions (the classic ones of Lamas de Moledo, Arroyo de la Luz and Cabeço das Fráguas, in addition to the new ones from Arroyo and the recent one from Arronches; see Fig. 1), to which some others could be added: the new bilingual inscription from Viseu must necessarily be considered indigenous, because it contains terms that belong to the core of the language and not only onomastics (I refer to the connective igo and the appellatives deibabor and deibobor). By virtue of this new incorporation, we can also consider other texts as indigenous, although they do not include a common lexicon (see Fig. 1, inscriptions 7 to 22), on the assumption that many Lusitanian scribes were consciously mixing two linguistic registers (code switching), one to refer to the deities (for which they frequently used indigenous inflection) and another for anthroponyms (always with Latin inflection).

iberia-early-bronze-age
Left: Early Bronze Age cultures in Iberia (in red, likely Indo-European groups; in green, likely non-Indo-European groups). Right: Unsupervised ADMIXTURE of ancient DNA samples. See full maps including Y-DNA and mtDNA.

Firstly, it is striking that this geographical profile drawn by the texts corresponds almost exactly to the distribution of large series of anthroponyms and theonyms.* Among the abundant names of people we can highlight those with a large number of repetitions whose appearance is circumscribed to our region of study (see Fig. 2). Some of them are truly frequent and lack parallels outside it, such as the stem Tanc- / Tang- (of Tanginus) with no fewer than 130 attestations, or Tonc- / Tong- (of Tongius or Tongetamus) with 70. Others also show sufficiently representative figures, such as Camalus and Maelo (with 46 repetitions each), Celtius (with 29), Caturo or Sunua (with 23), Camira (with 22), Doquirus (with 20), Louesius (with 18), Al(l)ucquius (with 17) or Malge(i)nus (with 16). Given these quantities, it appears that these are not casual occurrences of names, taking into account that chance tends to be reduced to a minimum in the study of the Iberian Peninsula, since we can easily handle the entire peninsular corpus. In turn, Reue, Bandue, Nauiae and Crougiae are the theonyms that best represent the Lusitanian-Galician area, coinciding fundamentally (Fig. 3) with the picture that anthroponymy and texts had drawn, although with fewer examples.
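The attestation figures quoted in this excerpt can be tabulated to see how strongly the two most frequent stems dominate the series. The following is only a minimal sketch: the dictionary simply copies the counts given in the text (the Tanc- / Tang- figure is a lower bound there), and the names `ATTESTATIONS` and `ranked` are hypothetical helpers introduced for illustration, not part of Vallejo's work.

```python
# Attestation counts as quoted in the excerpt above.
# Assumption: "no fewer than 130" is treated as exactly 130 for illustration.
ATTESTATIONS = {
    "Tanc-/Tang-": 130,
    "Tonc-/Tong-": 70,
    "Camalus": 46,
    "Maelo": 46,
    "Celtius": 29,
    "Caturo": 23,
    "Sunua": 23,
    "Camira": 22,
    "Doquirus": 20,
    "Louesius": 18,
    "Al(l)ucquius": 17,
    "Malge(i)nus": 16,
}


def ranked(counts):
    """Return (name, count) pairs sorted by count, highest first.

    Python's sort is stable, so names with equal counts keep their input order.
    """
    return sorted(counts.items(), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    total = sum(ATTESTATIONS.values())
    for name, count in ranked(ATTESTATIONS):
        # Share of each name within this quoted series only, not the full corpus.
        print(f"{name:<14}{count:>4}  {count / total:6.1%}")
```

The percentages are shares within the quoted series only; they say nothing about the full peninsular corpus, which is not given in the excerpt.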

lusitanian-inscriptions-toponymy-anthroponymy-teonymy
Top left: Lusitanian (long and short) inscriptions; top right: Map of the distribution of statue-menhirs and south-western stelae, by Rodríguez-Corral (2014) [(1) stelae in Beira Alta and Tras-os-Montes (Portugal), and Orense (Galicia, Spain); (2) both in the same territory: northwestern statue-menhirs and southwestern stelae; (3) hybridization of both into the same material form (stela/stela-menhir from Pedra Alta)]; bottom left: Lusitanian theonymy; bottom right: Lusitanian anthroponymy.

* The other subdivision of onomastics, toponymy, presents difficulties for the elaboration of series, because of the few repetitions of segments once the universal element -briga has been eliminated.

It is not only these groups of names and roots that help us define a large northwestern area; as I have had occasion to mention elsewhere, some onomastic data that share a similar distribution can also be added: the ending -oi (with assimilated variants -oe / -ui) of the theonymic dative singular, the ending -bo of the dative plural, the presence of the noun-forming suffix -aiko-, in addition to other phonetic features such as the change e > ei in anthroponymy, the reduction ug > u, or the change w > b.

iberia-north-west-dna
Genetic isolation in modern north-western Iberia (northern Portugal / southern Galicia) is greater than in other Iberian regions, forming different ancestral clusters splitting before others (including Basques). Image from Bycroft et al. (2018). See explanatory video by Carracedo.

Astur-Cantabrian

From The concept of Onomastic Landscape: the case of the Astures, by Vallejo (2013):

(…) First of all, it seems that there is an independent onomastic area, which can be defined by a series of names and suffixes that are repeated there exclusively or predominantly. This area does not seem to correspond with what we know of the Lusitanian-Galician onomastics nor of the more coastal Asturian; it also differs from the Celtiberian area, with which it does not have features in common. In this way, and always in the conjectural terrain, we could find ourselves before an Indo-European non-Celtic language different from the Lusitanian language.

A peculiarity that will have to be investigated is the presence of an excessively wide border corridor, where the names of the southern Astures (Augustales) do not predominate, but neither do those of the northern Astures (Transmontanos). Similarly, we will have to see the scope of the hypothesis that there might have been a language perhaps differentiated from that spoken in the Lusitanian, Galician or Celtiberian zones; the lower documentary richness of the Transmontane Asturian zone makes it more difficult to guarantee that it is not the same linguistic area as the one we isolate among Asturian cities.

In any case, de Hoz, even taking into account the difficulty of an affirmation of this type, pointed out ambiguously that we could find ourselves in front of different languages. On the other hand, the absence of texts directly transmitted by this people leaves us without definitive confirmation of the argument that it is a linguistically differentiated region, but it does not invalidate it at all. These drawbacks require the suspension of the exact characterization of our area, awaiting advances in the field of epigraphy and methodology.


Non-Indo-Europeans

The following are mainly excerpts from Villar (2007, 2014):

villar-vascos
Lenguas, genes y culturas en la Prehistoria de Europa y Asia suroccidental (2007).

Basques

Anthroponymy

The information provided by place-names and hydronyms on the one hand and anthroponyms on the other is of undoubted historical value in both cases, but of different specific significance. Anthroponyms reflect the present situation at the moment when living people were using them. It is an aspect very sensitive to social changes of all kinds, reaching its highest level of instability when there is language change.

(…) the Pre-Roman anthroponymic inventory of the Basque Country and Navarre indicates that prior to the arrival of Romans the language spoken was Indo-European (reflected in the names used) in the territories of Caristii, Varduli and Autrigones, while in Vasconic territory (especially in the current Navarre) most of the speakers chose Iberian names. In the territories of the current Basque Country, only a negligible statistical proportion chose Basque names, whereas in Navarre it was a minority of the population. That’s how things were towards the 3rd century BC.

Hydro-Toponymy

Cities and rivers are not subject to the ephemeral life cycle of humans. Rivers have very long cycles that go far beyond the lifetime not only of individuals, but also of languages and cultures. Cities are also generally very stable, although social circumstances occasionally cause one to be abandoned or destroyed, while new ones are created from time to time. That means that the names of rivers and cities are not subject to fashions or frequent change. Nor does a language change imply a renewal of the previous hydronymy and toponymy.

Speakers of the new languages incorporated into a territory learn from the natives the hydronymic and toponymic system, producing what we call the “toponymic transmission”. (…) it requires a prolonged contact between the native population and the new occupants, which can only occur when the indigenous population is not annihilated quickly and radically.

iberia-middle-bronze-age
Top Left: Middle Bronze Age cultures in Iberia (in red, likely Indo-European groups; in green, likely non-Indo-European groups). Top Right: Unsupervised ADMIXTURE of ancient DNA samples. See full maps including Y-DNA and mtDNA. Bottom: PCA of Bronze Age groups.

The ancient onomastic data of the Basque Country and Navarre can be summarized as follows:

  • Ancient hydronymy, the longest lasting onomastic component, is not Basque, but Indo-European in its entirety.
  • The old toponymy, which follows it in durability, is also Indo-European in its entirety, except Pompaelo (now Pamplona) and Oiarso (now Oyarzun).
  • And anthroponymy, which reflects the language used at the time when those names were in use, is also massively Indo-European, although between 10 and 15% of anthroponyms are of Vasconic etymology.

(…) the existing data show that, while in Roman times in Hispania there were only a couple of place-names on the Pyrenean border and a dozen anthroponyms of Vasconic etymology, in Aquitaine there was an abundant anthroponymy of that etymology.

iberia-late-bronze-age
Left: Late Bronze Age cultures in Iberia (in red, likely Indo-European groups; in green, likely non-Indo-European groups). Right: Unsupervised ADMIXTURE of ancient DNA samples. See full maps including Y-DNA and mtDNA.

This set of facts is most compatible with a hypothesis that postulates a late infiltration of this type of population from Aquitaine, which at the time of the Roman conquest had only managed to establish a bridgehead, consisting of a small population center in Navarre and Alto Aragón and nothing else, except some isolated individuals in the current provinces of Álava, Vizcaya and Guipúzcoa. The almost complete absence of old place-names of Vasconic etymology would be explained in this way: Vasconic speakers, recently arrived and still in small numbers, would not have had the possibility of altering in depth the toponymic heritage prior to their arrival, which was Indo-European.

The idea of a late Vasconization of a part of those territories, in the Early Middle Ages or Late Antiquity, is not new. Already in the 1920s M. Gómez Moreno said about the modern Basque provinces, together with the district of Estella in Navarre, that “personal nomenclature allows comparisons of definitive value, proving that people of the Cantabrian-Asturian race [who for Gómez Moreno were Indo-European] lived there, without the slightest trace of perceptible Basqueness”. For him, the first Indo-European people to penetrate the peninsula would have been the Ligurians, who evolved into Cantabrians, Asturians, Vettones, Lusitanians, Turmogi, Vaccaei, Autrigones, Caristii and Varduli.

iberia-early-iron-age
Top Left: Pre-Roman cultures in Iberia (in red/brown, Indo-European groups; in pink, Greek; in yellow, Phoenician; in green, likely non-Indo-European groups; Tartessian is disputed). Top Right: Unsupervised ADMIXTURE of ancient DNA samples. See full maps including Y-DNA and mtDNA. Bottom: PCA of Iron Age groups.

Aquitaine

If, as we said above, Basque speakers began to enter the Iberian Peninsula from the other side of the Pyrenees only from the Roman Republican era, intensifying their presence in the following centuries, we must assume that they were already north of the Pyrenees before those dates. And, indeed, the existence of this abundant Vasconic anthroponymy shows that in the first centuries of our era, while Vasconic speakers in the Peninsula were very few in number, their population in Aquitaine was abundant.

In a provisional manner we can advance that [Aquitaine’s] hydronyms are also known in other places of Europe and easily compatible with Indo-European etymologies (Argantia, Aturis, Tarnes, Sigmanos); and among the place names there are also many that are compatible with non-Gallic Indo-European etymologies, or not necessarily Gallic (Curianum, Aquitania, Burdigala, Cadurci, Auscii, Eluii, Rutani, Cala- (gorris), Latusates, Cossion, Sicor, Oscidates, Vesuna, etc.).

In addition to those place names that we classify as generically Indo-European, there are not a few Celtic ones (Lugdunum, Mediolanum, Noviomagos, Segodunon, Bituriges, Petrucorii, Pinpedunni), several Latin ones (Aquae Augustae, Convenae, ad Sextum, Augusta), and even some Celto-Latin hybrids (Augustonemeton, Augustoriton). On the other hand, there are hardly any names, whether serial or non-serial, that have a reasonable possibility of being explained by Vasconic etymology (Anderedon could be one of them).

Consequently, the onomastic situation of Aquitaine is not compatible with the possibility that Vasconic is the “primordial element” there, either. On the contrary, it is compatible with the hypothesis that Vasconic speakers also arrived late in Aquitaine, when the hydro-toponymy was already established. They must have Vasconized all or part of the previous population, which to a large extent came to use Vasconic anthroponymy. But the previous toponymy remained, and the Vasconization process was probably soon interrupted by Celticization first, and Romanization later.

aquitanian-tribes-vascones
Aquitani and neighbouring tribes around the Pyrenees, as described by the Romans (ca. 1st c. BC). The Basque language likely expanded south and west of the Pyrenees into Indo-European-speaking territories during the Roman period. The term ‘Vascones’ only became applied to Basque-speaking tribes in medieval times. Map modified from image by Sémhur at Wikipedia.

A prediction in genetics

This is how Francisco Villar and co-authors from the University of Salamanca saw what would happen with the genetic studies of modern Basques in 2007, based on the similarity with neighbouring Iberians and French, and the late intrusion of the language in its current territory:

Unfortunately, linguistics does not have the means to establish the moment of that arrival in terms of absolute chronology. In any case, this hypothesis is not incompatible with some peculiarities in the frequency of certain genes of the Basque-speaking population. Indeed, today we tend to attribute these peculiarities to the joint action of genetic drift and isolation; to which perhaps we could add a bottleneck in the Vasconic founding population that would one day settle in Aquitaine.

villar-indoeuropeos
Indoeuropeos, iberos, vascos y sus parientes (2014).

Also Villar, in 2014:

In the hypothesis that I propose, future speakers of Basque would have settled initially in Aquitaine, where there would have been an inevitable genetic diffusion with pre-existing [first stage] populations. On the other hand, Basque speakers from Aquitaine would have started to arrive in the Basque Country and Navarre only from Roman times (only a couple of Vasconic toponyms, at least one of them of recent creation; scarce anthroponyms of Vasconic etymology). The part of those populations that mixed with the pre-existing Palaeo-Indo-Europeans (Indo-European names of rivers; general Indo-European toponymy) saw how the uniqueness of their haplogroups, if there was any, was diluted, making it difficult to distinguish from the general [Indo-European] background; being a minority, it could even have been lost as a result of adverse genetic drift.

Olalde et al. (2019) confirmed this hypothesis that modern Basques are quite similar to investigated Iron Age Indo-Europeans from Iberia (such as Celtiberians sampled from the Basque Country):

For the Iron Age, we document a consistent trend of increased ancestry related to Northern and Central European populations with respect to the preceding Bronze Age. The increase was 10 to 19% (95% confidence intervals given here and in the percentages that follow) in 15 individuals along the Mediterranean coast where non-Indo-European Iberian languages were spoken; 11 to 31% in two individuals at the Tartessian site of La Angorrilla in the southwest with uncertain language attribution; and 28 to 43% in three individuals at La Hoya in the north where Indo-European Celtiberian languages were likely spoken. This trend documents gene flow into Iberia during the Late Bronze Age or Early Iron Age, possibly associated with the introduction of the Urnfield tradition.

Modern Basques show therefore, paradoxically, an ancestry similar to recent Iron Age Indo-European invaders (quite likely the ancestors of Celtiberians), which confirms the hypothesis of bottlenecks/founder effects followed by a very recent isolation of its population:

(…) the genetic profile of present-day Basques who speak the only non-Indo-European language in Western Europe [] overlap genetically with Iron Age populations showing substantial levels of Steppe ancestry.

iberia-roman-period
Left: Roman period in Iberia. Right: Unsupervised ADMIXTURE of ancient DNA samples. See full maps including Y-DNA and mtDNA. Notice increase of steppe ancestry in the north, associated with the (Late Bronze Age / Early Iron Age) arrival of Central Europeans.

Iberians

Regarding the Iberian language, the circumstances of analysis are less favorable. However, we can observe in the ancient toponymy of typically Iberian areas (the Spanish Levant and Catalonia) a considerable proportion of toponymy of Indo-European etymology, often identical to that which F. Villar (2000) has called “Southern-Iberian-Pyrenean”. In fact, its presence in the Levant is nothing else but a continuation from Catalonia to the South along the Mediterranean coast. Here are some examples: Caluba, Sorobis, Uduba, Lesuros, Urce / Urci, Turbula, Arsi / Arse, Asterum, Cartalias, Castellona, Lassira, Lucentum, Saguntum, Trete, Calpe, Lacetani, Onusa, Palantia, Saetabis, Saetabicula, Sarna, Segestica, Sicana, Turia, Turicae, Turis.

Also compatible with an Indo-European etymology are Blanda, Sebelacum, Sucro, Tader, Sigarra, Mastia, Contestania, Liria, Lauro, Indibilis, Herna, Edeta, Dertosa, Cesetania, Cossetani, Celeret, Bernaba, Biscargis, (…)

Finally, in other place names there are Indo-European components in hybrid toponymic syntagms, such as:

  1. orc- / urc-: Orceiabar, Urcarailur, Urceatin, Urcebas, Urcecere, Urcescer, Urceticer.
  2. Il-: Iltukoite, Iluro (3), Ilurci, Ilorci, Ilurcis, Ilucia, Iliturgi, Ilarcurris, Iluberitani, etc.


Examples like these show that in Catalonia and the Spanish Levant the Iberian language is not the deepest identifiable substrate language, but that it took root there when an Indo-European language was already present, one that had created a considerable network of toponyms and hydronyms that we can still recognize, and over which Iberians settled as a superstrate. The pre-existence of an Indo-European language in the historically Iberian area is further corroborated by the fact that its ancient hydronyms are all Indo-European, with the exception of a single river whose name is supposed to be Iberian: the Iberus (Ebro), from which the country and its inhabitants obviously took their name. No doubt ib- was an appellative for “river”, so that in the language that created that hydronym the Iber would simply have been “the river”. But we will see in the body of this work that ib- is found in various places outside the Iberian Peninsula as an appellative for “river”, which will force us to rethink its supposed Iberian affiliation. In fact, the Iberus had another name, Elaisos, whose etymology is compatible with Indo-European. As we know with certainty that no other Indo-European peoples came to their territory after the Iberians and before the Romans, the Indo-European creators of that hydronymy must have been there before the Iberians. And its antiquity must be considerable because, as we have already said, the vast majority of its hydronyms (Alebus, Caluba, Lesuros, Palantia, Saetabis, Sigarra, Sucro, Tader, Turia, Uduba and Elaisos) belong to that anonymous Indo-European language that left neither written texts nor historical continuity.

inscriptions-celtiberians-iberians-hispania
Inscriptions in Iberia ca. 2nd–1st c. BC. Purple squares show Celtiberian inscriptions, blue circles show Iberian inscriptions. Image modified from Hesperia – Banco de datos de lenguas paleohispánicas.

Villar (2014):

A language that settles in a territory is not always able to eradicate the existing ones definitively. Even a political system as unitary and unifying as the Roman one was not able to eradicate the Basque language. And nowadays in Latin America, despite the crushing cultural dominance of Spanish, despite the means for the schooling of a modern society, and in spite of the media, a multitude of pre-Columbian languages are spoken that coexist with the language of culture, the only one that is written in those countries. In those situations, which can last for quite a long time, there are individuals who speak only the newly imposed language, others who speak only the language that has resisted disappearing, and others who speak both, in a broad framework of bilingualism. My proposal is that something similar must have happened in the Iberian territory when the Romans arrived: a language of culture, Iberian, diversified into more or less distant local dialects, coexisted with several previous languages, equally differentiated from the dialectal point of view. This explains the irruption in the Iberian texts of non-Iberian anthroponyms and, above all, the existence there of a Palaeo-Indo-European hydro-toponymy that had remained in use not only because it was transmitted to Iberian speakers, but also because its native users were still present.

Related

European hydrotoponymy (I): Old European substrate and its relative chronology


These first two posts on Old European hydro-toponymy contain excerpts mainly from Indoeuropeos, iberos, vascos y sus parientes, by Francisco Villar, Universidad de Salamanca (2014), but also from materials of Lenguas, genes y culturas en la Prehistoria de Europa y Asia suroccidental, by Villar et al., Universidad de Salamanca (2007). I can’t recommend both books highly enough for anyone interested in the history of Pre-Roman peoples in Iberia and Western Europe.

NOTE. Both books also contain detailed information on hydrotoponymy of other regions, like Northern Europe, the Aegean and the Middle East, with some information about Asia, apart from (outdated) genetic data, but their main aim is obviously the Prehistory of Iberia and neighbouring regions like France, Italy, or Northern Africa.

Here are only some excerpts (emphasis mine), translated from Spanish (see the original texts here), accompanied by images from both books.

villar-indoeuropeos
Indoeuropeos, iberos, vascos y sus parientes (2014).

Alteuropäisch and Krahe

The investigation of “Old European” or Alteuropäisch, popularized by Krahe, began precisely with the study of some toponyms and personal names spread all over Europe, previously considered “Ligurian” (by H. d’Arbois de Jubainville and C. Jullian) or “Illyrian” (by J. Pokorny), with which those linguistic groups – themselves poorly known – were given an excessive extension, based only on some lexical coincidences.

This is a comment made by the author about Krahe’s data and his opinions, which are frequently used against his compiled data. I find it paradoxically applicable to Villar’s own data and his tentative assignment of the relative linguistic chronology to an absolute one – including the expansion of a “Mesolithic” Indo-European vs. a “Neolithic” Basque / Iberian vs. a Bronze Age Celtic – when it is now clear that the sequence of events was much later than that:

A derogatory and globally disqualifying attitude towards everything that sounds like Alteuropäisch and Krahe is very widespread today, sometimes without the necessary discrimination between different hypotheses, or even between data and hypothesis. It is not fair that the version of H. Krahe and that of W. P. Schmid be disqualified in a single simplistic judgment as if they were the same thing. But it is a major mistake to reduce the value of the hydro-toponymic data of Europe by the mere fact that Krahe attributed an implausible historical explanation to them. The data are real and still need an adequate explanation within a real historical framework, despite the unfeasibility of Krahe’s explanation.

With that we reach a point that I want to highlight. Among those who are allergic to anything that involves deviating one iota from the Indo-European paradigm as a single event, an attitude gaining momentum holds that hydro-toponymy was introduced in the different regions of Europe and Southwest Asia by the same Indo-European languages that appear historically occupying their territory. H. Krahe had argued strongly against this possibility, so I will spare myself a deeper refutation here and limit myself to pointing out some difficulties that this position is forced to face.

salo-salano
Sala, Sala, Sala, Sala, Sala, Sala, Sala, Sala, Sala, Sala, Sala, Salaca/Salis, Salaceni, Salacia, Salacia, Salaeni, Salam, Salandona, Salangi, Salangi, Salaniana, Sãlantas, Salapa, Salapeni, Salaphitanum, Salapia / Salpia / Salapina palus / Salpe, Salar, Salara, Salarama, Salarbima, Salariga, Salars, Salas, Salat, Salauris, Salcitani, Sale, Sale, Sale, Sale stagnum, Salecon, Saleia, Salentina, Salentini, Salernum, Salerni, Sales, Sali, Salia, Salia, Salica, Salica, Salice, Salii, Salija, Salinẽlis, Salìnis, Salìnis, Salìnis, Salìnis, Salinsae, Salionca, Salius, Salō, Salō, Saloca, Salodurum, Salona, Salonae, Salonenica, Salonia, Saloniana, Salonime, Salonium, Salontia, Saluca, Salum, Salum, Salunatasi, Saluntum / Salluntum, Salùpis, Sãlupis, Salur, Salurnis, Selepitani, Sõlis.

The defenders of that alternative have to assume that the process of dialectalization, which before the migrations from the Urheimat was already separating the different Indo-European branches, affected the phonetics of each branch’s general vocabulary but left hydro-toponymy unaltered in its pre-dialectal phonetic state, along with a good part of the lexicon related to the concepts of “river, water” and the different qualities of water currents. For example, according to those sharing that opinion, the Hispanic Palantia in the area of the Vaccaei would in fact be Celtic, but in that name the loss of initial /p/ that characterizes Celtic would not have applied. Similarly, the hydro-toponymy of Germania is largely exempt from the Lautverschiebung, that of Greece from the loss of initial /s/, etc. These names not only fail to undergo the dialectal innovations corresponding to their zones, but sometimes present innovations different from the features of the dialect involved. For example, the word *mori “sea, standing water” is sometimes found in the hydro-toponymy of Gaul in the form *mari instead of the *mori proper to Celtic (Marantium, Marisanga, Marsus), which within the framework of the paradigm inevitably has to be interpreted as a non-Celtic innovation.

wako-wogo
Potential geographic relationship between a priori unrelated graphic-phonetic variants.

Names of this nature that appear in areas where a pre-Roman historical Indo-European language never existed remain unexplained, such as in North Africa, Arabia Felix or the Caucasus: Lake Pallantias in Libya; the Salat River in Mauritania Tingitana; Auso in Mauritania Caesariensis; the Alonta River in Georgia; the Abas River in Caucasian Albania; Salma and Salapeni in Arabia Felix; etc. Of course, for these cases it is always possible to deny any relationship of kinship between these forms and their European cognates, and attribute everything to random homophony. Thus, once again, the annoying comparative data are sacrificed on the sacred altar of the paradigm, despite the fact that they are so numerous and consistent that, were there no blind faith in the current dogma, they would be sufficient to articulate a new paradigm over them.

The choice of each Indo-Europeanist between the non-Indo-European and the Indo-European interpretation to explain the prehistoric toponymy of Europe is not motivated by the fact that they handle partial sets of hydronyms that are more favorable to one option or the other. On the contrary, frequently the same batch of materials is claimed by both trends as its own. An extreme example is that of Th. Vennemann, who considers simply as non-Indo-European (specifically Paleo-Basque) exactly the same material that H. Krahe used to support his Indo-European interpretation. Thus, the structure and linguistic characteristics of the studied material play little role in the choice of one or the other path, which is rather conditioned by convictions and adherence to a varied range of personal beliefs, traditional dogmas and scientific paradigms.

villar-vascos
Lenguas, genes y culturas en la Prehistoria de Europa y Asia suroccidental (2007).

The linguistic column

The sequence of languages that were successively spoken in any territory constitutes what by analogy [with the “geological column”] we could call its “ethno-linguistic column”.

Next I offer the list of the languages detected in the compositional (and to a lesser extent derivational) toponymic syntagms in which the appellatives ub-, up-, ab-, ap-, ur-, il-, igi, tuk, -ip – analyzed in this work – are involved.

From the interaction of the different strata in words and hybrid syntagms we can, therefore, establish the linguistic column in the Iberian Peninsula and its neighboring territories (Western Europe and Northern Africa) with the following sequence:

1. A first stratum of very old chronology, which in a previous publication I have proposed to call Palaeo-Indo-European [“arqueo-indoeuropeo”]. The toponymic elements belonging to this stratum dealt with throughout this text are abundant: kerso-, turso-, alawo-, lako-, mido-, silo-, tibo-, etc.

They always function as determinant toponyms combined with an appellative from some other language. This stratum never supplies the appellative “city” (or “river”) in hybrid syntagms. Its place names (determinants) are combined with appellatives from the following languages:

   a) Iberian in Iberia or Southern France: kiŕś-iltiŕ, tuŕś-iltiŕ, alaun-iltiŕte, lakunm ∙ -iltiŕte.

   b) The language of the igi in southern Iberia and perhaps Northern Africa: Cantigi, Saltigi, Sagigi, Sicingi.

   c) The southern language of the postponed -il: Mid-ili, Sil-ili, Tib-ili.

   d) The language of the postponed -ip: Lac-ipo, Ost-ipo, Vent-ipo.

   e) Celtic in Gaul: kerso-ialos > Cersolius > Cerseuil; Ibili-duros > Ibliodurus.

karo-karanto
Cariensi, Carantium, Carandonis, Carae, Caraca / Caracca, Carrinensis, Cariaca, Carneus, Carula, Carlae, Carieco, Cariocieco, Caricillum, Carona, Carnona, Caranta, Carantonus / Carantana, Caronte, Carantomum / Carantomium, Carronenses / Garronenses, Cares / Carus, Caranusca, Carona, Caro vicus, Carninia, Carus, Carnutes, Carnonis castrum, Carenses, Caralis / Carallis, Carni, Carnicum, Caraceni, Careia, Carici, Carant / Carrant, Carnonacae, Carontō, Cariolum, Caritani, Carinum, Carantani, Carnuntum, Cariniana Vallis, Cariones, Careotae, Caroia, Caria, Careum, Carnae, Caran, Carnasium, Carnus, Carneates, Carnium, Carenus, Karlasuwa, Carnias, Karahna, Karna, Cariuntis, Kariuna, Careotis, Karu, Caralitis, Carus, Carnasso, Cares, Carene, Caranum, Caria, Carina, Carura, Caralis, Coralis, Carana, Carnalis, Carinum, Carnus, Carium, Carnium, Carnus Carnuntus / Carnusii, Chariuntas, Carandra, Carna, Carana, Carine, Cariatae, Caralae, Carura, Carei, Carura, Caricum, Caranis, Caralia, Carustum, Carystus, Carastasei.

This first Palaeo-Indo-European layer also corresponds to:

Several Palaeo-Indo-European varieties that have ab-, ap-, ub-, up- as the appellative for «river». To them also belong numerous place names (balsa-, siko-, wol-, etc.) that act as first members of compounds in both monoglot and hybrid syntagms.

Palaeo-Indo-European varieties in which ur- is the appellative for “river”.

ab-hydronyms

2. The second stratum in decreasing order of antiquity is formed by the language of the place name igi “city”, although its presence is only verified with certainty in Iberia (especially in the south) and Northern Africa:

   a) It sets the igi name in compounds with Palaeo-Indo-European toponyms as in Salt-, Ast-, Olont-, Cant-, Aur- (Hispania) and Sagigi, Sicingi (Northern Africa).

   b) It works as the first place-name of the compound when the second is il: Igilium, Igilgili, Singili.

3. The third stratum is the language of the name il “city”:

   a) It puts the appellative il as determined in hybrid syntagms with Palaeo-Indo-European determinants: Mid-ili, Sil-ili, Tib-ili.

   b) It puts the appellative il as determined in hybrid syntagms with determinant toponyms in igi: Igilium, Igilgili, Singili.

   c) It puts the place names (determinants) in front of the name (determined) of the language -ip (Il-ipa, Il-ipula and Il-ipla).

il-toponyms

4. Fourth is the language of the name ip- “city”, which puts the name (determined) in syntagms with:

   a) Palaeo-Indo-European toponym (determinant): Lac-ipo, Ost-ipo, Vent-ipo.

   b) Toponym (determinant) il: Ilipa.

   c) Second generation hybrid toponym of Palaeo-Indo-European + il: Balsilippa.

   d) In the Balsilippa and Sicilippa conglomerates, the three strata appear in the expected sequence: Palaeo-Indo-European + il + ip.

ip-toponyms

5. In the fifth place of the sequence is the language of the tuk-:

   a) It puts the name tuk- in compounds in which the place-name is a Palaeo-Indo-European element: Acatucci (see Aduatuci in Germania).

   b) It puts the name tuk- “height, top” in compounds in which the place-name is an ip- fossilized as place-names: Iptuci, etc.

   c) On at least one occasion an ip-fossilized syntagm acts as a toponym opposite a Celtic name: Itucodon (<Iptuco-dunum).

NOTE. Even though Villar talks about this stratum -tuk in Germania (Aduatukus) and the British Isles (Itucodon), only one case is found in each territory.

tuk-variants

6. The last place is occupied by Celtic:

   a) In Itucodon it puts the name (dunum) in front of a complex toponym of two previous strata, ip- + tuk-; and in Ibliodurus it puts the name duro- in front of an equally complex toponym (< Ibili + duro).

   b) In bilbiliz it puts the case morpheme on a fossilized two-member toponym of a previous stratum, one of whose components is il-: Bilbil-iz.

linguistica-cronologia-hispania
[First column modified to include relative instead of absolute chronology]
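
The stratigraphic reasoning in the list above can be read as a small ordering problem: each hybrid compound is taken as "determinant from an older stratum + appellative from a younger one", and the column is whatever order satisfies all such pairs at once. The snippet below is only a sketch of that logic, not Villar's actual procedure; the stratum labels and the selection of example toponyms are simplifications chosen here for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# (older stratum supplying the determinant,
#  younger stratum supplying the appellative or morpheme,
#  example toponym from the list above).
# The labels are illustrative shorthand, not established terminology.
HYBRIDS = [
    ("Palaeo-IE", "igi", "Cant-igi"),
    ("Palaeo-IE", "il", "Mid-ili"),
    ("igi", "il", "Igilium"),
    ("Palaeo-IE", "ip", "Lac-ipo"),
    ("il", "ip", "Il-ipa"),
    ("ip", "tuk", "Iptuci"),
    ("tuk", "Celtic", "Itucodon"),
    ("il", "Celtic", "Bilbil-iz"),
]


def linguistic_column(hybrids):
    """Order strata from oldest to youngest so that every pair is respected."""
    sorter = TopologicalSorter()
    for older, younger, _example in hybrids:
        # In graphlib terms, the older stratum is a predecessor of the younger.
        sorter.add(younger, older)
    return list(sorter.static_order())


column = linguistic_column(HYBRIDS)
print(" > ".join(column))
```

With these pairs only one order is possible, and it matches the six-step sequence given in the text. If two compounds implied contradictory orders, `static_order()` would raise `graphlib.CycleError`, which is the computational counterpart of a stratigraphic inconsistency.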

A hard change of paradigm

It cost me more effort to accept that ub- is a dialectal variant of a known Indo-European word for “water, river”, of which three other variants were already known: ap-, ab-, up-. The obviousness of the phonetic correlation ap- / ab- // up- / ub-, together with the semantic link with rivers, which can be verified above all outside of Spain but is also present in our Peninsula, wore down my resistance little by little. And with it fell the first trench of the dogma, unshakable until that moment, that everything in the south of the Peninsula had to be non-Indo-European.

ub-ob-hydronyms

Along with this serial component, many other isolated place names were revealed as very likely of Indo-European etymology, both in the “Iberian” East and in the “Tartessian” South. So the ubiquity of Indo-European throughout the Peninsula began to impose itself on me painfully. I say painfully because I lacked a paradigm in which to fit the new perspective that was making its way into my mind, which was therefore suspended in nothing, without any theoretical support, leaving me with a feeling that I was losing my footing. And for a time I was reluctant to accept the profound implications that all of this entailed.

All il languages, in any of their locations, exhibit a compositional behavior in hybrid toponymic syntagms that place them all in an intermediate position between the clearly [first/second layer] strata, with place-names for their human settlements semantically derived from water realities (ur), and those clearly attributable to the [fifth layer] with appellations derived from settlements in heights (briga, dunum). But in that intermediate segment of the column there are three strata: 1) il, 2) ip-, 3) tuk-. In Andalusia there is an additional one: the igi stratum, of opaque semantics, which immediately precedes the il stratum.

or-ur-hydronyms
Hydronyms in -or-, -ur-.

To postulate that any of the toponymic strata of our column implies a new linguistic stratum, certain additional requirements must be met. One of them is that, in addition to the name in question, the languages involved should share other features that could not have been borrowed, such as the very precise order of elements in the compounds Toponym + Name coexisting with Name + Adjective. Or the sharing of additional lexical elements that are not usually subject to borrowing, such as the semantically basic adjectives beri «new» and bels «black».

Unfortunately, the toponymic method, like the Comparative Method itself, does not have the capacity to establish precise absolute chronologies. (…)

Linguistic chronology

old-european-hydro-toponymy
Old European hydrotoponymy. Baltic data compensated. Statistical method Kriging.

In Europe (Hispania, South of France, Germania, British Isles, Baltic) the oldest stratum that can be identified is an indeterminable number of palaeo-varieties of the Indo-European macro-family, which do not have a direct local relationship with historical Indo-European languages, to the extent that we can verify. In fact, we have seen that stratigraphic signs lead us to consider the main Indo-European pre-Roman language of Hispania, the Celtic language, as a stratum after the il language, which in turn is later than the peninsular Indo-European palaeo-varieties.

In North Africa there is also a Palaeo-Indo-European stratum present. But there is also a very old non-Indo-European stratum whose identity I cannot define through the material used. Nor has it been possible for me to establish the relative antiquity of one and the other on African soil.

Another of the languages involved, which has il- as an appellation for “city” in the Southwest of Hispania and North Africa, could have some kind of kinship with Basque on the one hand and the Iberian language on the other, but in the same indirect form that I have just pointed out for the Indo-European palaeo-varieties with respect to the historical Indo-European languages. Or in other words: the language(s) of the place-names referred to in this work would be palaeo-varieties of a linguistic family to which two known historical languages, Iberian and Basque, may have belonged, although we cannot establish a relation of direct affiliation either between those two historical languages or between any of them and the palaeo-varieties of the prehistoric toponymy.

linguistica-cronologia-africa
[First column modified to include relative instead of absolute chronology]

In general, Celtic does not have in its historical territories the onomastic behavior of an ancestral language, but that of an intrusive language, whose presence there is not only more recent than other Indo-European varieties, but also later than that of various non-Indo-European strata, which are themselves ranked between the oldest detected (Palaeo-Indo-European) and the last of the pre-Roman ones, which is Celtic itself. If we only detected two strata, the Indo-European and the Celtic ones, we could discuss whether it is possible that both are one and the same, so that what we define as “Celtic” is nothing other than the modern in situ evolution of Palaeo-Indo-European. But examples like those of kiŕśiltiŕ, kerso-ialos, Cirsa or Itucodon, among many others analyzed throughout this book, make it unlikely. And, in addition, the mediation of several strata in the column between the Palaeo-Indo-European language of Cirsa and Celtic, as well as the greater antiquity of the ip- and tuk- languages in Spanish, Gallic and British territory, defines Celtic as a new and more recent layer than the aforementioned ones, which burst into its historical sites during the Iron Age.

Because Archaeology continues to deny the existence of population movements of a size worthy of consideration in the Iron Age, it is necessary to accept that the Indo-European Problem remains intact. It is understandable that before this aporia, many minds who are uncomfortable living with doubts, prefer to adopt a creed (the traditional, the Neolithic or the continuist) and expose it as a certainty to their students in the classrooms or their colleagues in conferences and publications. It’s not my case. For me, with Voltaire, “le doute est désagréable, mais la certitude est ridicule”. Or with Manzoni: “E men male l’agitarsi nel dubbio, che riposar nell’errore”.

Continue reading on European hydrotoponymy (II): Basques, Iberians, and Etruscans after Old Europeans.

Related

Common pitfalls in human genomics and bioinformatics: ADMIXTURE, PCA, and the ‘Yamnaya’ ancestral component

invasion-from-the-steppe-yamnaya

Good timing for the publication of two interesting papers, that a lot of people should read very carefully:

ADMIXTURE

Open access A tutorial on how not to over-interpret STRUCTURE and ADMIXTURE bar plots, by Daniel J. Lawson, Lucy van Dorp & Daniel Falush, Nature Communications (2018).

Interesting excerpts (emphasis mine):

Experienced researchers, particularly those interested in population structure and historical inference, typically present STRUCTURE results alongside other methods that make different modelling assumptions. These include TreeMix, ADMIXTUREGRAPH, fineSTRUCTURE, GLOBETROTTER, f3 and D statistics, amongst many others. These models can be used both to probe whether assumptions of the model are likely to hold and to validate specific features of the results. Each also comes with its own pitfalls and difficulties of interpretation. It is not obvious that any single approach represents a direct replacement as a data summary tool. Here we build more directly on the results of STRUCTURE/ADMIXTURE by developing a new approach, badMIXTURE, to examine which features of the data are poorly fit by the model. Rather than intending to replace more specific or sophisticated analyses, we hope to encourage their use by making the limitations of the initial analysis clearer.

The default interpretation protocol

Most researchers are cautious but literal in their interpretation of STRUCTURE and ADMIXTURE results, as caricatured in Fig. 1, as it is difficult to interpret the results at all without making several of these assumptions. Here we use simulated and real data to illustrate how following this protocol can lead to inference of false histories, and how badMIXTURE can be used to examine model fit and avoid common pitfalls.

admixture-protocol
A protocol for interpreting admixture estimates, based on the assumption that the model underlying the inference is correct. If these assumptions are not validated, there is substantial danger of over-interpretation. The “Core protocol” describes the assumptions that are made by the admixture model itself (Protocol 1, 3, 4), and inference for estimating K (Protocol 2). The “Algorithm input” protocol describes choices that can further bias results, while the “Interpretation” protocol describes assumptions that can be made in interpreting the output that are not directly supported by model inference

Discussion

STRUCTURE and ADMIXTURE are popular because they give the user a broad-brush view of variation in genetic data, while allowing the possibility of zooming down on details about specific individuals or labelled groups. Unfortunately it is rarely the case that sampled data follows a simple history comprising a differentiation phase followed by a mixture phase, as assumed in an ADMIXTURE model and highlighted by case study 1. Naïve inferences based on this model (the Protocol of Fig. 1) can be misleading if sampling strategy or the inferred value of the number of populations K is inappropriate, or if recent bottlenecks or unobserved ancient structure appear in the data. It is therefore useful when interpreting the results obtained from real data to think of STRUCTURE and ADMIXTURE as algorithms that parsimoniously explain variation between individuals rather than as parametric models of divergence and admixture.

For example, if admixture events or genetic drift affect all members of the sample equally, then there is no variation between individuals for the model to explain. Non-African humans have a few percent Neanderthal ancestry, but this is invisible to STRUCTURE or ADMIXTURE since it does not result in differences in ancestry profiles between individuals. The same reasoning helps to explain why for most data sets—even in species such as humans where mixing is commonplace—each of the K populations is inferred by STRUCTURE/ADMIXTURE to have non-admixed representatives in the sample. If every individual in a group is in fact admixed, then (with some exceptions) the model simply shifts the allele frequencies of the inferred ancestral population to reflect the fraction of admixture that is shared by all individuals.
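
The point about shared admixture being invisible can be made concrete with toy numbers. Under the admixture model, an individual's expected allele frequency at a locus is the ancestry-weighted average of the ancestral frequencies. The sketch below uses made-up frequencies and the paper's few-percent Neanderthal example (fixed here at 2% purely for illustration) to show that "two sources, everyone 98% / 2%" and "one source with slightly shifted frequencies" predict exactly the same data. The function name and all values are assumptions of this example, not part of STRUCTURE or ADMIXTURE.

```python
import math


def expected_frequencies(q, P):
    """Expected allele frequency per locus for one individual:
    the sum over ancestral populations k of q[k] * P[k][locus]."""
    n_loci = len(P[0])
    return [sum(qk * Pk[locus] for qk, Pk in zip(q, P)) for locus in range(n_loci)]


# Toy allele frequencies at three loci (made-up numbers).
P_MAIN = [0.10, 0.50, 0.90]
P_ARCHAIC = [0.80, 0.20, 0.40]
SHARED = 0.02  # the same archaic fraction in every sampled individual

# Description A: two sources, every individual is 98% main / 2% archaic.
two_sources = expected_frequencies([1 - SHARED, SHARED], [P_MAIN, P_ARCHAIC])

# Description B: a single source whose frequencies absorb the shared fraction.
P_SHIFTED = [(1 - SHARED) * m + SHARED * a for m, a in zip(P_MAIN, P_ARCHAIC)]
one_source = expected_frequencies([1.0], [P_SHIFTED])

# Both descriptions predict the same frequencies for every individual.
indistinguishable = all(
    math.isclose(a, b) for a, b in zip(two_sources, one_source)
)
print(indistinguishable)
```

Because nothing varies between individuals, a method that explains between-individual variation has no reason to prefer description A, and will settle on the simpler description B.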

Several methods have been developed to estimate K, but for real data, the assumption that there is a true value is always incorrect; the question rather being whether the model is a good enough approximation to be practically useful. First, there may be close relatives in the sample which violates model assumptions. Second, there might be “isolation by distance”, meaning that there are no discrete populations at all. Third, population structure may be hierarchical, with subtle subdivisions nested within diverged groups. This kind of structure can be hard for the algorithms to detect and can lead to underestimation of K. Fourth, population structure may be fluid between historical epochs, with multiple events and structures leaving signals in the data. Many users examine the results of multiple K simultaneously but this makes interpretation more complex, especially because it makes it easier for users to find support for preconceptions about the data somewhere in the results.

In practice, the best that can be expected is that the algorithms choose the smallest number of ancestral populations that can explain the most salient variation in the data. Unless the demographic history of the sample is particularly simple, the value of K inferred according to any statistically sensible criterion is likely to be smaller than the number of distinct drift events that have practically impacted the sample. The algorithm uses variation in admixture proportions between individuals to approximately mimic the effect of more than K distinct drift events without estimating ancestral populations corresponding to each one. In other words, an admixture model is almost always “wrong” (Assumption 2 of the Core protocol, Fig. 1) and should not be interpreted without examining whether this lack of fit matters for a given question.

admixture-pitfalls
Three scenarios that give indistinguishable ADMIXTURE results. a Simplified schematic of each simulation scenario. b Inferred ADMIXTURE plots at K = 11. c CHROMOPAINTER inferred painting palettes.

Because STRUCTURE/ADMIXTURE accounts for the most salient variation, results are greatly affected by sample size in common with other methods. Specifically, groups that contain fewer samples or have undergone little population-specific drift of their own are likely to be fit as mixes of multiple drifted groups, rather than assigned to their own ancestral population. Indeed, if an ancient sample is put into a data set of modern individuals, the ancient sample is typically represented as an admixture of the modern populations (e.g., ref. 28,29), which can happen even if the individual sample is older than the split date of the modern populations and thus cannot be admixed.

This paper was already available as a preprint in bioRxiv (first published in 2016) and it is incredible that it needed to wait all this time to be published. I found it weird how reviewers focused on the “tone” of the paper. I think it is great to see files from the peer review process published, but we need to know who these reviewers were, to understand their whiny remarks… A lot of geneticists out there need to develop a thick skin, or else we are going to see more and more delays based on a perceived incorrect tone towards the field, which seems a rather subjective reason to force researchers to correct a paper.

PCA of SNP data

Open access Effective principal components analysis of SNP data, by Gauch, Qian, Piepho, Zhou, & Chen, bioRxiv (2018).

Interesting excerpts:

A potential hindrance to our advice to upgrade from PCA graphs to PCA biplots is that the SNPs are often so numerous that they would obscure the Items if both were graphed together. One way to reduce clutter, which is used in several figures in this article, is to present a biplot in two side-by-side panels, one for Items and one for SNPs. Another stratagem is to focus on a manageable subset of SNPs of particular interest and show only them in a biplot in order to avoid obscuring the Items. A later section on causal exploration by current methods mentions several procedures for identifying particularly relevant SNPs.

One of several data transformations is ordinarily applied to SNP data prior to PCA computations, such as centering by SNPs. These transformations make a huge difference in the appearance of PCA graphs or biplots. A SNPs-by-Items data matrix constitutes a two-way factorial design, so analysis of variance (ANOVA) recognizes three sources of variation: SNP main effects, Item main effects, and SNP-by-Item (S×I) interaction effects. Double-Centered PCA (DC-PCA) removes both main effects in order to focus on the remaining S×I interaction effects. The resulting PCs are called interaction principal components (IPCs), and are denoted by IPC1, IPC2, and so on. By way of preview, a later section on PCA variants argues that DC-PCA is best for SNP data. Surprisingly, our literature survey did not encounter even a single analysis identified as DC-PCA.
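
As an illustration of what "removing both main effects" means, the following sketch double-centers a tiny SNPs-by-Items matrix in plain Python. It covers only the centering step that precedes DC-PCA; the decomposition itself (an SVD of the centered matrix) is omitted, and the function name and toy data are assumptions of this example rather than anything taken from the paper.

```python
def double_center(matrix):
    """Subtract row (SNP) and column (Item) main effects, leaving only the
    SNP-by-Item interaction: x_ij - row_mean_i - col_mean_j + grand_mean."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    row_means = [sum(row) / n_cols for row in matrix]
    col_means = [
        sum(matrix[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)
    ]
    grand_mean = sum(row_means) / n_rows
    return [
        [matrix[i][j] - row_means[i] - col_means[j] + grand_mean
         for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Toy data: 3 SNPs (rows) by 4 Items (columns), coded 0/1.
snps_by_items = [
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
]
interaction = double_center(snps_by_items)

# After double centering, every row and every column sums to (numerically) zero.
row_sums = [sum(row) for row in interaction]
col_sums = [sum(row[j] for row in interaction) for j in range(4)]
```

Centering by SNPs only would zero the row sums but leave the Item main effects in the matrix, which is one reason the different PCA variants produce such different-looking graphs.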

The axes in PCA graphs or biplots are often scaled to obtain a convenient shape, but actually the axes should have the same scale for many reasons emphasized recently by Malik and Piepho [3]. However, our literature survey found a correct ratio of 1 in only 10% of the articles, a slightly faulty ratio of the larger scale over the shorter scale within 1.1 in 12%, and a substantially faulty ratio above 2 in 16% with the worst cases being ratios of 31 and 44. Especially when the scale along one PCA axis is stretched by a factor of 2 or more relative to the other axis, the relationships among various points or clusters of points are distorted and easily misinterpreted. Also, 7% of the articles failed to show the scale on one or both PCA axes, which leaves readers with an impressionistic graph that cannot be reproduced without effort. The contemporary literature on PCA of SNP data mostly violates the prohibition against stretching axes.

pca-how-to
DC-PCA biplot for oat data. The gradient in the CA-arranged matrix in Fig 13 is shown here for both lines and SNPs by the color scheme red, pink, black, light green, dark green.

The percentage of variation captured by each PC is often included in the axis labels of PCA graphs or biplots. In general this information is worth including, but there are two qualifications. First, these percentages need to be interpreted relative to the size of the data matrix because large datasets can capture a small percentage and yet still be effective. For example, for a large dataset with over 107,000 SNPs for over 6,000 persons, the first two components capture only 0.3693% and 0.117% of the variation, and yet the PCA graph shows clear structure (Fig 1A in [4]). Contrariwise, a PCA graph could capture a large percentage of the total variation, even 50% or more, but that would not guarantee that it will show evident structure in the data. Second, the interpretation of these percentages depends on exactly how the PCA analysis was conducted, as explained in a later section on PCA variants. Readers cannot meaningfully interpret the percentages of variation captured by PCA axes when authors fail to communicate which variant of PCA was used.

Conclusion

Five simple recommendations for effective PCA analysis of SNP data emerge from this investigation.

  1. Use the SNP coding 1 for the rare or minor allele and 0 for the common or major allele.
  2. Use DC-PCA; for any other PCA variant, examine its augmented ANOVA table.
  3. Report which SNP coding and PCA variant were selected, as required by contemporary standards in science for transparency and reproducibility, so that readers can interpret PCA results properly and reproduce PCA analyses reliably.
  4. Produce PCA biplots of both Items and SNPs, rather than merely PCA graphs of only Items, in order to display the joint structure of Items and SNPs and thereby to facilitate causal explanations. Be aware of the arch distortion when interpreting PCA graphs or biplots.
  5. Produce PCA biplots and graphs that have the same scale on every axis.
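
The first recommendation can be sketched as a simple recoding step. The example assumes a SNPs-by-Items matrix already coded 0/1, with one allele call per Item; real genotype data are often stored as 0/1/2 allele counts and would need a correspondingly adapted rule. The function name and toy data are hypothetical.

```python
def code_minor_as_one(snp_rows):
    """Recode each SNP (row) so that 1 marks the rarer allele.

    Rows where allele 1 is carried by more than half of the Items are flipped;
    exact ties are left unchanged.
    """
    recoded = []
    for row in snp_rows:
        frequency_of_one = sum(row) / len(row)
        if frequency_of_one > 0.5:
            recoded.append([1 - allele for allele in row])
        else:
            recoded.append(list(row))
    return recoded


raw = [
    [1, 1, 1, 0],  # allele 1 is the major allele here, so this row is flipped
    [0, 0, 1, 0],  # already minor = 1
    [1, 0, 1, 0],  # tie, left as is
]
coded = code_minor_as_one(raw)
print(coded)
```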

I read the referenced paper Biplots: Do Not Stretch Them!, by Malik and Piepho (2018), and even though it is not directly applicable to the most commonly available PCA graphs out there, it is a good reminder of the distorting effects of stretching. See, for example, the recent PCA in Krause-Kyora et al. (2018), where Corded Ware and BBC samples from Central Europe appear to cluster with samples from Yamna:

NOTE. This is related to a vertical distortion (i.e. horizontal stretching), but possibly also to the addition of some distant outlier sample/s.

pca-cwc-yamna-bbc
Principal Component Analysis (PCA) of the human Karsdorf and Sorsum samples together with previously published ancient populations projected on 27 modern day West Eurasian populations (not shown) based on a set of 1.23 million SNPs (Mathieson et al., 2015). https://doi.org/10.7554/eLife.36666.006

The so-called ‘Yamnaya’ ancestry

Every time I read papers like these, I remember commenters who kept swearing that genetics was the ultimate science that would solve anthropological problems, where unscientific archaeology and linguistics could not. Well, it seems that, like radiocarbon analysis, these promising developing methods still need a lot of refinement to achieve something meaningful, and that they mean nothing without traditional linguistics and archaeology… But we already knew that.

Also, if this is happening in most peer-reviewed publications, made by professional geneticists, in journals of high impact factor, you can only wonder how many more errors and misinterpretations can be found in the obscure market of so many amateur geneticists out there. Because amateur geneticist is a commonly used misnomer for people who are not geneticists (since they don’t have the most basic education in genetics), and some of them are not even ‘amateurs’ (because they are selling the outputs of bioinformatic tools)… It’s like calling healers ‘amateur doctors’.

NOTE. While everyone involved in population genetics is interested in knowing the truth, and we all have our confirmation (and other kinds of) biases, for those who get paid to tell people what they want to hear, and who have sold lots of wrong interpretations already, the incentives of ‘being right’ – and thus getting involved in crooked and paranoid behaviour regarding different interpretations – are as strong as the money they can win or lose by promoting themselves and selling more ‘product’.

As a reminder of how badly these wrong interpretations of genetic results – and the influence of the so-called ‘amateurs’ – can reflect on research groups, yet another turn of the screw by the Copenhagen group, in the oral presentations at Languages and migrations in pre-historic Europe (7-12 Aug 2018), organized by the Copenhagen University. The common theme seems to be that Bell Beaker and thus R1b-L23 subclades do represent a direct expansion from Yamna now, as opposed to being derived from Corded Ware migrants, as they supported before.

NOTE. Yes, the “Yamna → Corded Ware → Únětice / Bell Beaker” migration model is still commonplace in the Copenhagen workgroup. Yes, in 2018. Guus Kroonen had already admitted they were wrong, and it was already changed in the graphic representation accompanying a recent interview with Willerslev. However, since there is still no official retraction by anyone, it seems that each member has to reject the previous model in their own way, and at their own pace. I don’t think we can expect anyone at this point to accept responsibility for their wrong statements.

So their lead archaeologist, Kristian Kristiansen, in The Indo-Europeanization of Europé (sic):

kristiansen-migrations
Kristiansen’s (2018) map of Indo-European migrations

I love the newly invented arrows of migration from Yamna to the north to distinguish among dialects attributed by them to CWC groups, and the intensive use of materials from Heyd’s publications in the presentation, which means they understand he was right – except for the fact that they are used to support a completely different theory, radically opposed to the one defended in Heyd’s model.

Now, added to the Copenhagen group’s unending proposals of language expansions, some pearls from the oral presentation:

  • Corded Ware north of the Carpathians of R1a lineages developed Germanic;
  • R1b borugh [?] Italo-Celtic;
  • the increase in steppe ancestry on north European Bell Beakers mean that they “were a continuation of the Yamnaya/Corded Ware expansion”;
  • “Corded Ware groups [] stopped their expansion and took over the Bell Beaker package before migrating to England” [yep, it literally says that];
  • Italo-Celtic expanded to the UK and Iberia with Bell Beakers [I guess that included Lusitanian in Iberia, but not Messapian in Italy; or the opposite; or nothing like that, who knows];
  • 2nd millennium BC Bronze Age Atlantic trade systems expanded Proto-Celtic [yep, trade systems expanded the language];
  • 1st millennium BC expanded Gaulish with La Tène, including a “Gaulish version of Celtic to Ireland/UK” [hmmm, dat British Gaulish indeed].

You know, because, why the hell not? A logical, stable, consequential, no-nonsense approach to Indo-European migrations, as always.

Also, compare still more invented arrows of migrations, from Mikkel Nørtoft’s Introducing the Homeland Timeline Map, going against Kristiansen’s multiple arrows, and even against their own recent fantasy map series in showing that Bell Beakers stem from Yamna instead of CWC (or not, you never truly know what arrows actually mean):

corded-ware-migrations
Nørtoft’s (2018) maps of Indo-European migrations.

I really, really loved that perennial arrow of migration from Volosovo, ca. 4000-800 BC (3000+ years, no less!), representing Uralic?, like that, without specifics – which is like saying, “somebody from the eastern forest zone, somehow, at some time, expanded something that was not Indo-European to Finland, and we couldn’t care less, except for the fact that they were certainly not R1a”.

This and Kristiansen’s arrows are the most comical invented migration routes of 2018; and that is saying something, given the dozens of similar maps that people publish in forums and blogs each week.

NOTE. You can read a more reasonable account of how haplogroup R1b-L51 and R1a-Z645 subclades expanded, and which dialects most likely expanded with them.

We don’t know where these scholars of the Danish workgroup stand at this moment, or if they ever had (or intended to have) a common position beyond their persistent ideas of Yamnaya™ ancestral component = Indo-European and R1a must be Indo-European, because each new publication changes some essential aspects without expressly stating so, and thus makes everything still messier.

It’s hard to accept that this is a series of presentations made by professional linguists, archaeologists, and geneticists, as stated by the official website, and still harder to imagine that they collaborate within the same professional workgroup, which includes experienced geneticists and academics.

I propose the following video to close future presentations introducing innovative ideas like those above, to help the audience find the appropriate mood:

Related

Rhetoric of debates, discussions and arguments: Useful destructive criticism for scientific & academic research, reasons and personal opinions; the example of Proto-Indo-European language revival

Rhetoric (Wikipedia) is the art of harnessing reason, emotions and authority, through language, with a view to persuade an audience and, by persuading, to convince this audience to act, to pass judgement or to identify with given values. The word derives from PIE root wer-, ‘speak’, as in MIE zero-grade wrdhom, ‘word’, or full-grade werdhom, ‘verb’; from wrētōr, Gk. ρήτωρ (rhētōr), “orator” [built like e.g. wistōr (<*widtor), Gk. ἵστωρ (histōr), “a wise man, one who knows right, a judge” (from which ‘history’), from PIE root weid-, ‘see, know’]; from that noun is adj. wrētorikós, Gk. ρητορικός (rhētorikós), “oratorical, skilled in speaking”, and fem. wrētorikā, Gk. ρητορική (rhētorikē). According to Plato, rhetoric is the “art of enchanting the soul”.

When related to Proto-Indo-European language revival, as well as in modern scientific research of any discipline, discussions are sometimes interesting in light of historical rhetoric, as they might get really close to some classical (counter-)argumentative resources, however unknown they are to their users…

Sophists taught that every argument could be countered with an opposing argument, that an argument’s effectiveness derived from how “likely” it appeared to the audience (its probability of seeming true), and that any probability argument could be countered with an inverted probability argument. Thus, if it seemed likely that a strong, poor man were guilty of robbing a rich, weak man, the strong poor man could argue, on the contrary, that this very likelihood (that he would be a suspect) makes it unlikely that he committed the crime, since he would most likely be apprehended for the crime. They also taught and were known for their ability to make the weaker (or worse) argument the stronger (or better).

So, for example, if people generally think that evolution is very likely to have occurred, because of the scientific data available, one only has to say something like “God put those proofs there to confound people and prove their faith”. And, even if there is not a single reason why that person should be entitled to interpret the Bible that way, and to determine what ‘God thought’ when ‘inventing proofs of a false evolution’, there is in fact no need to give rational arguments: this very likelihood of evolution is in itself a proof of how good God is at cheating us…

Statistics was a discipline mostly unknown to sophists, but I’m sure they more or less imagined the typical bell curve that population beliefs and opinions follow. If interpreted the other way round, one could say that the more an idea is believed by people, the more likely it is that someone will come along with another, competing one. In fact, that’s natural evolution, too: without that universal trend that life has to differentiate itself from the normal, matter would never have changed and become more and more complex…

That trend is observed in research, too, as man is obviously another animal, and his intelligence another natural feature subject to the evolutionary machinery of nature. That’s why Occam’s razor is never a sufficient argument to close a research field or discard a hypothesis: you have e.g. Gimbutas’ theories (or Renfrew’s, if you like) – even though they are obviously not completely proven hypotheses – about some prehistoric speakers being successful in their conquests and migrations through Eurasia, which logically infer that what happened with the expansion of Indo-European languages is what has almost always happened in the known history of language expansion, using the most probable extrapolation from the facts we know. But you will still find competing hypotheses about an unlikely millennium-long, peaceful spread and mix of languages through and from Europe or Asia, based on some controversial facts and a good deal of imagination. And, even if such theories are far from what can generally be considered rational, they will certainly find supporters; and it’s not bad that such unlikely ideas emerge: science is built up thanks to the few marginal ideas that eventually prove true, alongside the millions that prove false and disappear, and some dozens that sadly manage to remain, like homeopathy or Esperanto-like conlanging, as I’ve said before. The same happens with the human body, which gained many advantages through mutation, but at the same time dragged some genetic illnesses along…

About Proto-Indo-European research, it’s more or less straightforward which hypotheses and theories are generally accepted, and which ones are minority views. Nevertheless, that doesn’t prevent renowned experts from accepting some marginal hypotheses in some aspects of PIE reconstruction, while keeping the general view on others; neither does it prevent renowned linguists and philologists from considering Proto-Indo-European, or comparative and historical grammar in general, an absurd endeavour: the ex-Dean of a southern Spanish University, a Latin professor, deems PIE an “invention”; in his words, “from Lat. pater, Gk. pater, and Eng. father, we say there is a language that said what, ‘pater’? pfff”; he obviously considers “language = written & renowned language system”; the problem with that thought is that if PIE becomes spoken (i.e. written too) and renowned, just as Old Latin became Classical Latin – instead of disappearing like the other Italic dialects – the whole reasoning becomes useless; so it’s also useless now. Two of the most famous Indo-Europeanists in Spain, F. Adrados (e.g. marginal supporter of Etruscan as an IE language) and Bernabé (e.g. marginal supporter of the Glottalic theory, I think), even if dedicated to Indo-European reconstruction, deemed PIE revival – in some news in the Spanish newspaper El Mundo – a “utopia”, but at the same time considered it possible for Greek and Latin (respectively) to become the EU’s official language: it’s not that they consider speaking PIE impossible, but only that there are “better” alternatives: better, I guess, for Romance or Greek speakers or philologists…

About Proto-Indo-European language revival for Europe, thus, it is as difficult to ascertain whether it is the most rational choice as it is to ascertain whether liberal thoughts are more rational than conservative ones. I have lived in other countries within the European Union, and have visited other parts of Spain where the spoken language is not Spanish; from that experience, the difference in attitudes I’ve found is overwhelming: when you speak English or German anywhere in Europe, the conversation is anything but fluent; also, if you speak English in the UK, German in Germany, French in France, or Czech in Czechia, even mastering the local language quite well, you’ll never get the same reaction as when a Catalan (from a Catalan-speaking region) speaks Spanish in, say, Galicia (a Galician-Portuguese speaking region), as both use a language (Spanish) common to both of them. That was also the idea behind the first Esperanto out there, probably Volapük, and it has been the idea behind every conlang trying to be THE International Auxiliary Language since then; and none has succeeded. That was also the idea behind Hebrew revival in Israel, for speakers of a hundred different languages living in the same territory: they, too, had other modern, common languages to choose from instead of an ancient, partially incomplete, and “difficult” (in Esperantist terms) one, and it succeeded.

Latin use in Europe, on the other hand, has been declining ever since the first Romance dialects developed, and had its latest official (i.e. legal) use in Europe, apart from the Catholic church, at the beginning of the XX century in Hungary – curiously enough, a non-Indo-European speaking country. Its revival has been proposed a thousand times since then, but Latin has never recovered its prestige, as Germanic-speaking countries have taken the lead in Western Europe, and Slavic-speaking countries in the East. It is hard to explain now why English-, German- or Polish-speaking peoples should learn and speak again the language of the Romans and the Roman Empire, with which they have little history in common…

The rest of the known language revivals, like Cornish or Manx, or even e.g. the partial revival (“sociolect”) of Katharevousa Greek, not to mention the so-called “revivals” – in fact “language revitalizations” – of Basque, Catalan, Breton, Ukrainian, etc., have been just regionally oriented language (or prestige + vocabulary) revivals with cultural or social purposes.

So, is Proto-Indo-European revival a “correct”, or “sufficiently rational” option, given the known facts? As an opinion, it is neither correct nor incorrect, as being “Indo-Europeanist for Europe” is like being leftist or conservative in politics; just as supporting Hebrew revival wasn’t (a hundred years ago) “sufficiently rational” in itself, and the controversy over its revival has never ended. But the reasons behind PIE revival can and should be questioned, just as the reasons behind the adoption of a conlang (i.e. the concepts of “better” and “easier” when applied to language) can and should be critically reviewed. For Proto-Indo-European, it comes down – I think – to two main questions:

1) Did Proto-Indo-European exist? i.e. can we confidently consider any proto-language something more than speculation or a mere unproven hypothesis? The answer is “it depends”. Proto-Indo-European was probably a language spoken by prehistoric people, as probable as any generally accepted scientific theory we can support without experimental proof, like theories on the Universe, its creation or development: they might prove wrong in the future, but – with the necessary abstraction and common sense – it’s not difficult to accept most individual premises and facts surrounding them. That might be said about proto-languages like Proto-Slavic (ca. 1 AD), Proto-Germanic (ca. 1000 BC), Proto-Greek or Proto-Indo-Iranian (ca. 2000 BC) or Proto-Indo-European, especially about its European or North-Western subbranch (ca. 2500-2000 BC); about proto-languages like ‘Proto-Eurasiatic’ or ‘Proto-Nostratic’, or ‘Proto-Indo-Tyrrhenian’, or ‘Proto-Thraco-Illyrian’, or ‘Proto-Indo-Uralic’, or ‘Proto-Italo-Celtic’ (or even Proto-Italic), or ‘Proto-Balto-Slavic’, and the hundred other proposed combinations, on the other hand, it is impossible to prove beyond doubt if and when they were languages at all.

2) Is the Proto-Indo-European reconstruction trustworthy enough to be “revived”? i.e. can we consider it a speakable language, or just a linguistic theoretical approach? Again, it depends, but here the answer is mostly mixed with political opinions. In light of Ancient Hebrew – a language that ceased to be spoken 2500 years ago – “revived” as a modern language by introducing thousands of newly coined terms – many of them of Indo-European origin – to the point that some want to name it “Israeli” instead of “Hebrew” (as we call MIE “European” or “Europaio” instead of “Indo-European”), I guess the answer is clearly yes, it’s possible: in any case, Indo-European languages have a continuous history of more than 4000 years, and modern terms need only (in most cases) a sound-law adjustment to be translated into PIE. Also, the other proto-languages with a strong scientific basis and a similar time span, like Proto-Uralic, Proto-Semitic or Proto-Dravidian, bear no possible comparison with Proto-Indo-European: while PIE is practically a fully reconstructed and well-known language without written texts to ‘confirm’ our knowledge, the rest are just experimental (mainly vocabulary-based) reconstructions. There are, thus, proto-languages and proto-languages, as there are well-known natural dead languages and poorly attested ones; PIE is therefore one of the few which might be called today a real, natural language, like Proto-Germanic, Proto-Slavic or Proto-Indo-Aryan.

However, anti-Europeanists (or, better, anti-Indo-Europeanists for the European Union) won’t find it difficult to say a simple “a proto-language is not enough to be revived, as Ancient Hebrew was written down and PIE wasn’t”, thus disguising their sceptical views on the politics behind the project as seemingly rational discussion. Others will also state, in light of our clear confrontation with conlangs, that “proto-language is nothing different from a conlang”, thus disguising their real interest in spreading their personal desire that a proto-language be similar to a conlang. One only has to say: “Classical Latin couldn’t be reconstructed by comparing Spanish, French and Italian” – when, in fact, the question should be something like “could the common, Late Vulgar Latin be reconstructed with a high degree of confidence, having just the writings of the first mediaeval Romance languages?” The answer is probably a simple “yes, and quite well”, until proven otherwise, but by expressing the first doubt one can easily turn the possible-reconstruction argument into an apparently unlikely one; enough to convince those who want to be convinced…

Thus, whereas some people consider PIE a natural language, confidently reconstructed, but impossible to speak today because of political matters, others just consider it another invention, nothing different from Esperanto, while Esperantists talk about it as a “worse” or “more difficult” alternative to it: you can nevertheless find all these opinions mixed together when it comes to destructive discussions, as the objective is not to defend one’s own rational, well-worked idea, but simply to destroy the appearance (or likelihood, in sophistic terms) of the rival’s idea. Be it anti-Europeanism, anti-Indo-European-reconstruction or anti-everything-other-than-Esperanto, you don’t have to defend your position: just repeat your known anti- clichés, and you’ve “won”. Apparently, at least.

Cicero noted what Greek rhetors already knew about ordinary debates, and how arguments should be made and countered so that no idea is left accepted. In that sense, discussions were (and are) generally so unnecessary that the Socratic Method still seems to be the best philosophical approach to discussions, even those concerning scientific (i.e. “most probable”) facts: instead of arriving at answers, non-expert (and often expert) discussion is used to break down the theories others hold, not “to go beyond the axioms and postulates we take for granted” and obtain better knowledge, as Greek philosophers put it, but just to destroy what others build up.

So, for example, we might get these general rules to counter any argument, even if it’s not only based on opinions, but also on generally accepted facts:

1) Demonstrate the falseness of a part of the rival’s argument; then, infer the falseness of the whole reasoning. For example, let’s say Gimbutas’ view is out-dated, or that we at Dnghu included something considered nowadays ‘wrong’ in our grammar: then PIE revival is also mistaken; nothing more to explain. Or, let’s say that Hebrew revival is not “equal” to a proto-language revival, and that therefore the comparison is ‘false’ – even if comparisons are there to compare similar cases, not “equal” cases, which would be absurd – then, the whole PIE revival project is ‘equivocal’ or ‘absurd’. That’s the view about PIE revival you can find in some comments made on American blogs out there.

2) You can also confirm a part of your rival’s argument and then, by doing so, carry that argument to its extreme, to the extent that its consequences are intolerable, and the paroxysm completely distorts your rival’s argument. That’s more or less what I usually do when confronting conlanging as a real option for the European Union, by saying “OK, let’s adopt the ‘better’ and ‘easier’ language: first Esperanto, then the ‘better’ and ‘easier’ Esperanzo, then Lojban, then Pilosofio, then Mazematio, etc. etc. ad infinitum” – so, as a conclusion, one might accept that “better” and “easier” are not actually good reasons to adopt a language; hence the arguments based on “better” and “easier” clichés are opinion, not ratio.

3) The most common now (and then, I guess, in spoken language) is personal discredit of your rival, from which you can infer that his argument is also corrupted. That is what some have done when lacking further arguments, calling me personally (and the Indo-European Language Association in general ?!) a “racist”, “nazi”, or “KKK-like” group; or trying to discredit me personally by saying I don’t master the English language; or that I misspelled or ‘was wrong’ in reconstructing this or that PIE name or noun; or even just because I am “an amateur” – thus suggesting we all have to be “language professionals” to propose a trustworthy PIE revival. A recent example of this is our latest Esperantist visitor, saying I am “close to being racist” because I propose PIE for the EU – thus obviously inviting readers to identify “language=race”, saying that “I propose one language = I propose one race = I am a racist”, and therefore if “I=racist” and “I propose PIE revival” => “PIE=x”. The whole reasoning is nonsense, but he is not the first – and won’t be the last – educated individual to say (and possibly believe) that…

4) The fourth is actually only a minor method derived from the third, used in desperate cases, which consists in taking a sensitive, emotional example of the consequences of generalizing the rival’s argument, to demonstrate the moral baseness of the one who defends it; then, if he is discredited, his argument is corrupted, too [see point 3]… That is what some desperate people do when saying that PIE revival for the EU is “bad” (or “worse”) for speakers of non-IE languages like the Finnish, Hungarian, Estonian, Basque or Maltese peoples. In fact, anyone who had taken a look at our website, or had made a quick search about me, would have found that I began this project of PIE revival to defend European languages (at least minority languages, as national or official languages are already well protected) against the European Union’s unofficial English imperium and official English-German-French triumvirate. Also, if we abandoned PIE revival, only some languages (the official, i.e. national ones, 25 today) would get EU support, while the rest would just die out or resist with some regional or private support. With Modern Indo-European, on the other hand, there will be only one official language supported by the European Union, and the rest will be really equal before each other and the Union, be it English, Maltese, Basque, Saami or Piedmontese. Nowadays, English is the language spoken in institutions, Maltese has an official status before the EU, while Saami is official in its country, Basque is only official in its territory, and Piedmontese, Asturian, Breton, and the majority of EU regional languages are only privately and locally defended. Nevertheless, one only has to say “supporting Indo-European is what Nazis did, PIE revival is racist and wants to destroy non-Indo-European peoples and cultures”; and there you are: nothing proven, nothing reasoned, but the simplest and most efficient FUD you can find to counter the thousand arguments in favour of this revival project.

However unnecessary and unfruitful it might seem, I still discuss – or even directly look for debate – because I benefit from such long, active pauses from my study, unlike those tiny passive TV or radio pauses I insert between study hours, especially in these stressful exam periods. Indeed I can find something to discuss on any website at any time, but I’m generally interested in debating these language policy options. Nevertheless, I find it difficult to understand why some people get mad (at me, the project, or even the association or the whole world), when in fact taking part in any discussion is freely accepted by all of us, and it’s me who puts new ideas and proposals on the table, and the others who just have to criticize them…

Something valuable for life I learned from psychology (possibly the only thing…) is about Chomsky’s reaction to Skinner’s comments: my professor (close to Freudian psychoanalysis), who told us the story – I hope I got it right, I cannot find it out there – thought it was Skinner who “won” the debate, by answering Chomsky’s criticism; Chomsky had in turn criticized Skinner’s work, Verbal Behavior, for its “scientistic”, not scientific, concept of the human mind. In fact, the younger Chomsky had just applied science to psychology (a need that psychology still has), simplifying the understanding of the mind with a strict cognitive view, and criticizing some traditional views that psychologists accepted as ‘normal’. Skinner and those who followed his behavioural school of thought overreacted, mostly out of the belief that Chomsky’s reasons were directed against their lives and professional choices, when in fact reason and opinion are on different planes. Chomsky, instead of joining the flame war (yes, trolling existed back in the 60s), did nothing. When asked years later why he didn’t reply as expected to all that criticism, he just said: “they missed the point”; he said what he had to say, criticized what he wanted, proposed an alternative, and left the discussion. And still, even without his answering, the cognitive revolution provoked a shift in American psychology from the 1950s through the 1970s, from being primarily behavioral to being primarily cognitive.

If you want to debate about opinions – be it PIE revival, Europeanism, general politics, Star Trek or the sex of angels – entering into unending criticisms and personal attacks, that’s OK; but you should do it if and when you want, as I only do it because I obtain something beneficial: having a good time, laughing a little bit, relaxing from study, thinking about interesting reasons that might appear for or against my views or ideas, etc. And you should do it to get something in (re)turn, be it that same stress relief I (and most people) get, or any other personal or professional benefit. If not, if maybe you are getting more stressed trying to “convince” me or others, to “make us change our minds” with great one-minute ‘reasons’, by discussing your opinions directly as if they were ‘true’, then you are clearly “missing the point” (using Chomsky’s words) with these discussions, and – as our latest Esperantist commenter (Mr. Janoski) puts it – “losing your time”, “trying to understand” something…