Population size potentially affecting rates of language change


Open access Population Size and the Rate of Language Evolution: A Test Across Indo-European, Austronesian, and Bantu Languages, by Greenhill et al. Front. Psychol. (2018) 9:576.

Summary (emphasis mine):

What role does speaker population size play in shaping rates of language evolution? There has been little consensus on the expected relationship between rates and patterns of language change and speaker population size, with some predicting faster rates of change in smaller populations, and others expecting greater change in larger populations. The growth of comparative databases has allowed population size effects to be investigated across a wide range of language groups, with mixed results. One recent study of a group of Polynesian languages revealed greater rates of word gain in larger populations and greater rates of word loss in smaller populations. However, that test was restricted to 20 closely related languages from small Oceanic islands. Here, we test if this pattern is a general feature of language evolution across a larger and more diverse sample of languages from both continental and island populations. We analyzed comparative language data for 153 pairs of closely-related sister languages from three of the world’s largest language families: Austronesian, Indo-European, and Niger-Congo. We find some evidence that rates of word loss are significantly greater in smaller languages for the Indo-European comparisons, but we find no significant patterns in the other two language families. These results suggest either that the influence of population size on rates and patterns of language evolution is not universal, or that it is sufficiently weak that it may be overwhelmed by other influences in some cases. Further investigation, for a greater number of language comparisons and a wider range of language features, may determine which of these explanations holds true.

Interesting excerpts:

Our analysis suggests that, as for Polynesian languages, smaller Indo-European languages have greater rates of word loss from basic vocabulary. This result is consistent with the claim that smaller populations are at greater risk of loss of language elements, and other aspects of culture, due to effects of incomplete sampling of variants over generations. However, we note that the relatively small sample size for this dataset complicates the interpretation of this result. Least squares regression after Welch & Waxman test has the same false positive rate but has much less power than Poisson regression when sample size is small (~ten or fewer pairs, Hua et al., 2015). This makes it difficult to interpret the inconsistent results of these two analyses, as they may be due to their difference in the statistical power. Hence, the negative relationship between rates of loss and population size for Indo-European languages would benefit from additional investigation. We do not find evidence for a negative relationship between population size and word loss rates in the Austronesian and Bantu groups. This finding suggests that either these datasets contain too few language variants to have sufficient power to detect rate differences, or that the increased loss rate in small populations is not a universal phenomenon, or that it is a relatively weak force in some language groups and thus may be overwhelmed by other social, linguistic or demographic factors.
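The "incomplete sampling of variants over generations" mentioned in this excerpt can be illustrated with a toy neutral-drift simulation. What follows is only a sketch under my own assumptions (each speaker holds one variant, each generation copies variants at random from the previous one, and all parameter values are made up); it is not the Welch & Waxman or Poisson regression analysis used by Greenhill et al., and the function name `simulate_word_loss` is hypothetical.

```python
import random


def simulate_word_loss(pop_size, n_variants=10, generations=50, seed=0):
    """Return how many of n_variants are lost after random copying.

    Assumption: every speaker carries exactly one variant, and each new
    generation samples its variants with replacement from the previous
    generation (a Wright-Fisher-style neutral model).
    """
    rng = random.Random(seed)
    # Start with the variants spread as evenly as possible across speakers.
    speakers = [i % n_variants for i in range(pop_size)]
    for _ in range(generations):
        # Incomplete sampling: some variants may not be copied at all.
        speakers = rng.choices(speakers, k=pop_size)
    surviving = len(set(speakers))
    return n_variants - surviving


if __name__ == "__main__":
    for size in (20, 200, 10000):
        print(size, "speakers ->", simulate_word_loss(size), "variants lost")
```

In such a model, loss is driven purely by chance, so it says nothing about the social, linguistic or demographic factors that the authors suggest may overwhelm the effect in real language families.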

Regarding potential drawbacks of the study:

[M]easuring speech community size is notoriously difficult. How exactly does one delimit a speech community (Crystal, 2008) and what degree of proficiency in a language is sufficient to be part of the community (Bloomfield, 1933)? This task is made harder as there are few national censuses that collect detailed speaker statistics. Further, speaker population size can change rapidly with many modern world languages (especially the Indo-European languages) experiencing rapid growth over the last few hundred years (Crystal, 2008), while others have experienced catastrophic declines (Bowern, 2010). For the same reasons, the difficulty of obtaining accurate population estimates is also a problem in biology. Furthermore, the relevant parameter for genetic change—the effective population size—is difficult to estimate directly, even when accurate census information is available (Wang et al., 2016). Likewise, there may be an important role played by population and network density—tight-knit networks may inhibit change, while loosely integrated speech communities (regardless of their size), may facilitate change (Granovetter, 1973; Milroy and Milroy, 1992). One way forward here is perhaps to simulate rates of change over a range of population sizes and network topologies (c.f. Reali et al., 2018).

As conclusions:

Firstly, we provide some evidence that rates of language change can be affected by demographic factors. Even if the effect is not universal, the finding of significant associations between population size and patterns of linguistic change in some languages urges caution for any analysis of language evolution that makes an assumption of uniform rates of change. These results also potentially provide a window on processes of language change in these lineages, providing further impetus to investigate the effect of number of speakers on patterns of language transmission and loss. A more detailed study of language change for a larger number of comparisons might clarify the relationship between population size and word loss rates, particularly within the Indo-European language family.

Secondly, we have shown that the significant patterns of language change identified in a previous study are not a universal phenomenon. Unlike the study of Polynesian languages, we did not find any significant relationships between word gain rate and population size, and the association between loss rates and population size was not evident for all language families analyzed. The lack of universal relationships suggests that it may be difficult to draw general conclusions about the influence of demographic factors on patterns and rates of language change. Many other factors have been proposed to influence rates of language change (Greenhill, 2014) including population density, social structure (Nettle, 1999; Labov, 2007; Ke et al., 2008; Trudgill, 2011), degree of contact, and connectedness with other languages (Matras, 2009; Bowern, 2010), degree of language diffusion within a speech community (Wichmann et al., 2008), degree of bilingualism or multilingualism (Lupyan and Dale, 2010; Bentz and Winter, 2013), language group diversity (Atkinson et al., 2008) and environmental factors such as habitat heterogeneity and latitude (Bowern, 2010; Blust, 2013; Amano et al., 2014). These factors might mediate or overwhelm the effect of speaker population size.

We find no evidence to support the hypothesis that uptake of new words should be faster in small populations, which is based on the assumption that new words can diffuse more efficiently through a smaller speaker population than a larger one (Nettle, 1999). Nor do we find support for the suggestion that large, widespread languages have a tendency to lose linguistic features at a greater rate (Lupyan and Dale, 2010). However, this latter hypothesis is predominantly expected to explain loss of complex linguistic morphology (such as case systems), which may be harder for non-native speakers to learn, rather than basic vocabulary studied here which may be comparatively easier for second language learners to acquire (but see Kempe and Brooks, 2018). Further, our results cannot be interpreted as confirmation of previous studies that suggest there is no effect of population size on rates (Wichmann and Holman, 2009). The detection of significant patterns in rates of lexical change with population size variation in the Polynesian and Indo-European languages, but the failure to identify similar patterns in the Bantu and Austronesian data, suggests that patterns of rates may need to be investigated on a case-by-case basis.


Fast life history as adaptive regional response to less hospitable and unstable Early Indo-Iranian territory


Another interesting paper, Life in the fast lane: Settled pastoralism in the Central Eurasian Steppe during the Middle Bronze Age, by Judd et al. (2017).

Abstract (emphasis mine):

We tested the hypothesis that the purported unstable climate in the South Urals region during the Middle Bronze Age (MBA) resulted in health instability and social stress as evidenced by skeletal response. The skeletal sample (n = 99) derived from Kamennyi Ambar 5 (KA-5), a MBA kurgan cemetery (2040-1730 cal. BCE, 2 sigma) associated with the Sintashta culture. Skeletal stress indicators assessed included cribra orbitalia, porotic hyperostosis, dental enamel hypoplasia, and tibia periosteal new bone growth. Dental disease (caries, abscess, calculus, and periodontitis) and trauma were scored. Results were compared to regional data from the nearby Samara Valley, spanning the Early to Late Bronze Age (EBA, LBA). Lesions were minimal for the KA-5 and MBA-LBA groups except for periodontitis and dental calculus. No unambiguous weapon injuries or injuries associated with violence were observed for the KA-5 group; few injuries occurred at other sites. Subadults (<18 years) formed the majority of each sample. At KA-5, subadults accounted for 75% of the sample with 10% (n = 10) estimated to be 14-18 years of age. Skeletal stress markers and injuries were uncommon among the KA-5 and regional groups, but a MBA-LBA high subadult mortality indicates elevated frailty levels and inability to survive acute illnesses. Following an optimal weaning program, subadults were at risk for physiological insult and many succumbed. Only a small number of individuals attained biological maturity during the MBA, suggesting that a fast life history was an adaptive regional response to a less hospitable and perhaps unstable environment.

Interesting excerpt:

The low frequencies of violence-related trauma contrast sharply to the epidemic of skeletal violence observed during the Iron Age (8th-2nd centuries BC) at other regional sites, notably Aymyrlyg (Murphy, 2003). The paucity of weapon-related injuries among the Bronze Age groups may be the outcome of many factors. While weapons and chariots did exist, they could have had multi-functional contexts aside from warfare. Individuals killed in warfare may not be present if bodies were abandoned on battlefields or disposed of where the individual died. Alternatively, warfare may have involved the capture of humans in addition to material resources, such as herds or weapons, leaving no skeletal trace of physical violence (Martin, Harrod, & Fields, 2010; Wilkinson, 1997). Trauma analysis is further complicated by the lack of soft tissue, which is the target for those attempting to kill or immobilize their opponent (Judd, 2008; Judd & Redfern, 2012), and it is possible that violence-related injuries or burns sustained from metallurgy were absent because only the soft tissue was affected. The skeletal evidence for trauma is minimal at KA-5 and its contemporary sites, which may be partially attributed to the less than desirable preservation of the collections. Based on the skeletal material available, internal or external social tensions resulting in altercations are not supported.

The lack of material or skeletal evidence for warfare has encouraged a more optimistic interpretation of Steppe community relations living with environmental instability. Herding camps, such as that at Peschanyi Dol, provided evidence for assorted groups utilizing the site based on the clay sources of ceramic sherds found in the camp’s trash pit (Anthony, Brown, Kuznetsov, & Mochalov, 2016a). Anthony et al. (2016a, 2016b) suggested that herders shifted according to a schedule that permitted several settlements to use prime camp sites. They proposed a cooperative region-wide organization of groups that worked together in three key activities: mining, summer herding, and winter wolf-dog rituals (Anthony et al., 2016a). A similar regional social arrangement may have existed in the KA-5 vicinity and accords with the livestock management models proposed by Stobbe and colleagues (2016).

Demographic distribution of KA-5 and Samara Valley samples

Using the available sampling, and based on the absence of skeletal stress markers (in combination with the high subadult mortality among Sintashta samples), the study concludes that the available data cannot support the traditional view that the MBA was a period of social strife.

Since the other Samara Valley samples do not follow a trend similar to Sintashta's, a homogeneous, long-term relationship with the environment is suggested for this culture, independent of climatic shift or unpredictability.

We already know that R1a-Z645 subclades, which expanded with the Corded Ware culture, appeared in a Poltavka cemetery rather early, which, coupled with the incomplete replacement that we see in Early Indo-Iranian communities, suggests a gradual expansion of its (mostly R1a-Z93) subclades among Proto-Indo-Iranians.

My limited, speculative proposal of how this lineage replacement took place was based precisely on this traditional description of partially isolated, warring communities:

The process by which this cultural assimilation happened in the Sintashta-Petrovka region, given the presupposed warring nature of their contacts, remains unclear. It is conceivable, in a region of highly fortified settlements, to think about alliances of different groups against each other, akin to the situation found in Bronze Age Europe: a minority of Abashevo chiefs and their families would dominate over certain fortified settlements and wage war against other, neighbouring tribes.

After a certain number of generations, the most successful settlements would have replaced the paternal lineages of the region – with only a slight drift to steppe admixture observed in PCA compared to Corded Ware –, while the majority of the population in these settlements – including females, commoners and slaves – retained the original Poltavka culture. R1b1a1a2a2-Z2103 lineages were mostly replaced in the region by haplogroup R1a1a1b2-Z93, as demonstrated by the later expansion of its subclades with Andronovo and Srubna cultures, and by present-day distribution of R1a1a1b2-Z93 lineages in Eurasia.

Now we see more proof for a likely bottleneck in a more peaceful (or, rather, cooperative) region, as recently described by Anthony. In fact, if you take a look at the sampling of the paper (which is obviously not randomised), Potapovka – coeval with Sintashta, but genetically more similar to the earlier Yamna and Poltavka – follows a less steep demographic distribution than Sintashta, with succeeding Srubna (which shows a marked shift toward the Corded Ware cluster) maintaining a similar demographic pattern…

I guess the answer probably lies somewhere between the two positions, war and environment; the main issue is which of them was the most important contributing factor. If we judged the whole picture solely by the samples studied in this paper, the answer would be the environment.

In any case, even though we like to see every single paternal lineage substitution in a territory as necessarily linked with a meaningful migration coupled with ethnolinguistic change, sometimes this is not the case. Examples include the replacement of R1b-L23 lineages in Proto-Balto-Slavic and Proto-Indo-Iranian communities by R1a-Z645 subclades; the replacement of R1a-Z645 by N1c-L392 subclades in Uralic-speaking territories; and the replacement of native lineages by R1b-L51 subclades among Basques.


Human dietary evolution in central Germany, and relationship of Únětice to Corded Ware and Bell Beaker cultures


Open access 4000 years of human dietary evolution in central Germany, from the first farmers to the first elites, by Münster et al. PLOS One (2018).

Excerpts (emphasis mine):

This study of human diet between the early stages of the farming lifestyle and the Early Bronze Age in the MES, based on carbon and nitrogen isotope analyses, is amongst the most comprehensive of its kind. Our results show that human dietary behaviour has changed significantly throughout the study period. A distinct increase in the proportion of animal protein in the human diet can be identified over time, a trend which only the people from the BBC did not follow. The results of the stable isotope analyses are consistent with epidemiological data on caries frequency, which indicate the highest proportions of carbohydrates in the human diet in the EN and the lowest in the EBA [19]. These findings may have been due to an increased consumption of either meat or dairy products. Although meat and dairy consumption cannot be distinguished by means of stable isotope data or caries frequency, molecular-genetic analyses of lactase persistence argue against an increased consumption of fresh milk [9]. However, although approximately 70% of the world population has a lactose intolerance, most of them can tolerate dairy foods or lactose-containing foods without developing symptoms [128]. It therefore comes as no surprise that the use of processed milk, i.e. dairy products, appears to have set in early on in the Neolithic period [99]. Unarguably, there was an increasing stabilisation of the supply of meat and secondary animal products throughout the Neolithic. The data dynamics overall argue against an equal availability of animal-derived protein to all sections of the various populations, which attests to early processes of specialisation, individualisation and hierarchisation. Moreover, population-genetic processes are also reflected in the development of human dietary habits.
From the 4th millennium BC onwards, groups moved into the MES from the north, sometimes accompanied by violence [6,29], and fundamental demographic changes took place in the FN with the arrival of CWC groups from the north-eastern steppes and the BBC from south-western Europe [6,7]. This former pastoral steppe component, in particular, may have been responsible for the fact that animal-based foodstuffs reached their highest importance in the FN and EBA. Differences in the consumption of animal-derived products between the sexes resulted in significantly lower δ15N values and less access to animal protein in females. Besides behavioural choices as to what food to consume, numerous other nutritional and gender-specific factors must certainly be taken into account when assessing the subsistence and nutritional balance of individuals. In the future, analysis of single amino acids of nitrogen and the compound-specific carbon isotope analysis of lipids and bone mineral may help providing more detailed and nuanced insight on aspects of human diet, such as protein sources in complex foodwebs, nutritional stress and disease [129–131]. They should become a standard in isotope studies and applied more often and routinely.

Overview of investigated sites and archaeological chronology of Neolithic and Early Bronze Age central Germany. The Stroke Ornamented Culture and Michelsberg Culture are not represented in our sample due to low rate of anthropological findings. Chronology after Schwarz in [29]. https://doi.org/10.1371/journal.pone.0194862.g001

Regarding specifically the differences between the Corded Ware (CWC) and Bell Beaker (BBC) cultures in Saxony-Anhalt, a region already known to show a resurgence of the previous population after the Únětice period:

Based on isotope data from collagen [104], a diet with a high protein content from meat or dairy products has been postulated for CWC groups from south-western Germany, though researchers there were also unable to distinguish between the two sources of protein. The consumption of fresh milk and the consumption of dairy products such as cheese, yoghurt and kefir may also be erroneously dated to the same period and associated with lactase persistence. A newly reported genome-wide SNP dataset from 230 West Eurasians dating from between 6,500 and 300 cal. BC [9] has shown, like earlier studies [105], that no notable increase in lactase persistence in Europe appears to have occurred prior to 2,000 BC. It was and is a fact that milk is not a natural foodstuff for adult consumption, unless one is prepared to negate the numerous symptoms of lactose intolerance, including abdominal pain, bloating, flatulence, diarrhoea, asthma and others. Cultural evolution in conjunction with natural selection has made it possible for us to use milk and its secondary products as a source of protein and energy. Whilst the continuous increase in animal protein in the diet of the Neolithic populations of the MES from the LBK to the Early Bronze Age can undoubtedly partly be traced back to an intensified use of secondary animal products over the course of the Neolithic, it is difficult to estimate how great a contribution this made to the increase in δ15N values. Judging from molecular-genetic data on lactase persistence, however, the consumption of fresh milk, at least, appears to have first begun to have an impact on the protein balance of individuals around 4,000 years ago [9].

NOTE. Regarding lactase persistence, we now know that Ukraine_Eneolithic sample I6561, of haplogroup R1a-Z93 (hence probably related to the later expansion of the Corded Ware culture) is the nearest sample to the population that might have expanded the 13910*T lactase persistence allele in Northern Europe.

Sex-specific differences in stable carbon and nitrogen isotope values in humans. https://doi.org/10.1371/journal.pone.0194862.g004

[After the massive influx of the CWC into central Europe in the FN] The dietary profile once again exhibits an increase in the mean δ15N values, to 10.1 ± 1.0 ‰. The BBC, which spread somewhat later throughout north and central Europe (with the arrival of the CWC jointly making up Event C) and whose origins are presumed to have been in south-western Europe, constitutes an exception, not just from the point of view of genetics. In contrast to the general diachronic trend consisting of raised δ15N values in the cultural groups examined, the BBC exhibited a nutritional decrease in mean δ15N values to 9.7 ± 0.7 ‰. The divergence between the CWC and the BBC to be seen in their funerary rites, despite their chronological and sometimes also territorial coexistence, is thus also visible in their dietary habits. Comparative examinations of CWC sites in southern Germany have shown that their mean δ15N values were, in fact, comparable to those of the CWC in the MES (δ13C: -19.9 ± 0.6 ‰, δ15N: 10.8 ± 0.7 ‰, n = 32), despite exhibiting significant variation between and even within the sites, thus pointing to the diverging subsistence strategies of different communities [104]. The UC, which followed the CWC in the MES, bore close affinities to its forerunner in terms of its population genetics, thus supporting the hypothesis that the BBC only had a minimal genetic impact on the UC [6,7]. The close genetic links between the UC and the CWC, however, are also seen in very similar mean nitrogen values which, at 10.4 ± 0.7 ‰, were the highest in the overall sample. Moreover, a striking aspect in the evaluation of the mean δ15N values over time is a clear tendency towards rising standard deviations (S4 Fig). It is highly likely that this reflects increased social differentiation in society at the end of the Neolithic and in the Early Bronze Age. 
Socioeconomic advancement led to differences in status within communities and even to the formation of an elite, the differences applying to numerous facets of life, including dietary habits [60].

Chronological development of the distribution of δ15N-values according to the different archaeological periods. Numbers of individuals are displayed in parentheses. https://doi.org/10.1371/journal.pone.0194862.g007

I think the overstudied region of Saxony-Anhalt and the Tollense valley region may not be exactly where the Proto-Balto-Slavic homeland actually formed, but they are certainly showing interesting hints as to how (and approximately where) it might have happened…


Basal Eurasians split off ca. 80kya and contributed ca. 10% to EEF


Efficiently inferring the demographic history of many populations with allele count data, by Kamm, Terhorst, Durbin, & Song (2018).


The sample frequency spectrum (SFS), or histogram of allele counts, is an important summary statistic in evolutionary biology, and is often used to infer the history of population size changes, migrations, and other demographic events affecting a set of populations. The expected multipopulation SFS under a given demographic model can be efficiently computed when the populations in the model are related by a tree, scaling to hundreds of populations. Admixture, back-migration, and introgression are common natural processes that violate the assumption of a tree-like population history, however, and until now the expected SFS could be computed for only a handful of populations when the demographic history is not a tree. In this article, we present a new method for efficiently computing the expected SFS and linear functionals of it, for demographies described by general directed acyclic graphs. This method can scale to more populations than previously possible for complex demographic histories including admixture. We apply our method to an 8-population SFS to estimate the timing and strength of a proposed “basal Eurasian” admixture event in human history. We implement and release our method in a new open-source software package momi2.
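To make the "histogram of allele counts" concrete, here is a minimal sketch of how an observed single-population SFS could be tabulated from a small matrix of haplotypes. The input format (lists of 0/1 values, with 1 marking the derived allele) and the function name `sample_frequency_spectrum` are my own assumptions for illustration; this does not use momi2 and is not how the package computes the expected multipopulation SFS.

```python
from collections import Counter


def sample_frequency_spectrum(haplotypes):
    """Tabulate the observed SFS for one population.

    haplotypes: list of equal-length lists of 0/1 values, one list per
    sampled chromosome, where 1 marks the derived allele at a site.
    Returns a list sfs where sfs[k] is the number of sites at which
    exactly k of the n sampled chromosomes carry the derived allele.
    """
    n = len(haplotypes)
    n_sites = len(haplotypes[0])
    # Derived-allele count at each site, then a histogram of those counts.
    counts = Counter(
        sum(hap[site] for hap in haplotypes) for site in range(n_sites)
    )
    return [counts.get(k, 0) for k in range(n + 1)]


if __name__ == "__main__":
    toy = [
        [0, 1, 1, 0],
        [0, 1, 0, 0],
        [1, 1, 0, 0],
    ]
    # Derived counts per site are 1, 3, 1, 0.
    print(sample_frequency_spectrum(toy))  # [1, 2, 0, 1]
```

With several populations the same idea applies, except that each site is described by a tuple of counts (one per population) rather than by a single number.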

Inferred model and bootstraps for the 11 population demography described in Section 4. In the foreground (blue) is our point estimate from maximum composite likelihood; in the background (gray) are 300 bootstrap reestimates, which were created by splitting the data into 100 equally sized contiguous blocks, resampling these blocks with replacement, and refitting the model. The y-axis is linear below 5 × 10^4, then follows a logarithmic scale above 5 × 10^4.
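The block bootstrap described in this caption (split into contiguous blocks, resample blocks with replacement, refit) can be sketched generically as follows. This is a simplified illustration under my own assumptions: the "model fit" is replaced by an arbitrary `estimator` function (the mean by default), the data are a plain list of per-site values, leftover items that do not fill a whole block are dropped, and the name `block_bootstrap` is hypothetical. The real analysis refits the full demographic model on each resample.

```python
import random
from statistics import mean


def block_bootstrap(data, n_blocks, n_reps, estimator=mean, seed=0):
    """Return n_reps re-estimates obtained by resampling contiguous blocks.

    data: sequence of per-site values in genomic order.
    n_blocks: number of equally sized contiguous blocks to cut data into.
    estimator: stand-in for "refitting the model" on resampled data.
    """
    block_len = len(data) // n_blocks
    if block_len == 0:
        raise ValueError("need at least one data point per block")
    # Contiguous blocks preserve the local correlation between nearby sites.
    blocks = [
        data[i * block_len:(i + 1) * block_len] for i in range(n_blocks)
    ]
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_reps):
        # Draw n_blocks blocks with replacement and concatenate them.
        resampled = [x for block in rng.choices(blocks, k=n_blocks) for x in block]
        estimates.append(estimator(resampled))
    return estimates


if __name__ == "__main__":
    values = [float(i % 7) for i in range(1000)]
    reps = block_bootstrap(values, n_blocks=100, n_reps=300)
    print(min(reps), max(reps))
```

The spread of the re-estimates then plays the role of the gray background curves in the figure, giving a rough picture of the uncertainty around the point estimate.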

Link to momi2 software package (GitHub).

Discovered via Iosif Lazaridis.

Featured image, from the article: “An example of a 3-population Moran model. The bottom of the graph corresponds to the present and the top to the past. Population 2 receives admixture from population 3 after splitting from population 1. Other features of the demography include archaic samples in population 1, and various size changes along the edges of this demography.”