David Reich on the influence of ancient DNA on Archaeology and Linguistics

An interesting interview has appeared in The Atlantic, Ancient DNA Is Rewriting Human (and Neanderthal) History, on the occasion of the publication of David Reich’s book Who We Are and How We Got Here: Ancient DNA and the New Science of the Human Past.

Some interesting excerpts (I have emphasized some of Reich’s words):

On the efficiency of the Reich Lab

Zhang: How much does it cost to process an ancient DNA sample right now?

Reich: In our hands, a successful sample costs less than $200. That’s only two or three times more than processing them on a present-day person. And maybe about one-third to one half of the samples we screen are successful at this point.

This is probably the most controversial assessment for the Twitterverse, since it puts the Reich Lab at the top of the publishing chain, but I don’t find this fact controversial at all.

Anyone interested in doing genetic studies has free datasets, papers, and bioinformatic tools at hand – thanks to his lab, mostly – to develop new methods and publish papers. Such secondary works probably won’t be published in journals with the highest impact factor, but what can you do, welcome to the scientific world…

Also, by the looks of it, every single researcher involved in recovering an archaeological sample is included as co-author of the papers, so there is a clear benefit for ‘local’ researchers collaborating with the Lab. Therefore, these researchers and their institutions are responsible for whatever unfair situation might be created by their exchange.

On Archaeology’s reaction to Kossinna and Nazi ideas

Zhang: You actually had German collaborators drop out of a study because of these exact concerns, right? One of them wrote, “We must(!) avoid … being compared with the so-called ‘siedlungsarchäologie Method’ from Gustaf Kossinna!”

Reich: Yeah, that’s right. I think one of the things the ancient DNA is showing is actually the Corded Ware culture does correspond coherently to a group of people. I think that was a very sensitive issue to some of our coauthors, and one of the coauthors resigned because he felt we were returning to that idea of migration in archaeology that pots are the same as people. There have been a fair number of other coauthors from different parts of continental Europe who shared this anxiety.

We responded to this by adding a lot of content to our papers to discuss these issues and contextualize them. Our results are actually almost diametrically opposite from what Kossina thought because these Corded Ware people come from the East, a place that Kossina would have despised as a source for them. But nevertheless it is true that there’s big population movements, and so I think what the DNA is doing is it’s forcing the hand of this discussion in archaeology, showing that in fact, major movements of people do occur. They are sometimes sharp and dramatic, and they involve large-scale population replacements over a relatively short period of time. We now can see that for the first time.

What the genetics is finding is often outside the range of what the archaeologists are discussing these days.

This is mostly true: Genomics offers a whole new dimension to assess exchanges among groups, and thus helps select among anthropological models of cultural diffusion. It offers another way of interpreting prehistoric cultural evolution and change, including the investigation of the potential languages of these cultures, ways of change and replacement, etc.

Also, he acknowledges that a lot of content is added to the papers in search of context, precisely to avoid simplistic assumptions and conclusions, so this is a reasonable way to look at the (often erroneous) cultural and linguistic context which accompanies most genetic papers, and even the new methods being developed to assess samples.

On the other hand, the fact that many in Archaeology didn’t want to discuss migrations does not mean that it was not discussed at all, as he seems to suggest.

On how Genomics fits with traditional disciplines

Zhang: I think at one point in your book you actually describe ancient DNA researchers as the “barbarians” at the gates of the study of history.

Reich: Yeah.

Zhang: Does it feel that way? Have you gotten into arguments with archaeologists over your findings?

Reich: I think archaeologists and linguists find it frustrating that we’re not trained in the language of archaeology and all these sensitivities like about Kossinna. Yet we have this really powerful tool which is this way of looking at things nobody has been able to look at before.

The point I was trying to make there was that even if we’re not always able to articulate the context of our findings very well, this is very new information, and a serious scholar really needs to take this on board. It’s dangerous. Barbarians may not talk in an educated and learned way but they have access to weapons and ways of looking at things that other people haven’t looked to. And time and again we’ve learned in the past that ignoring barbarians is a dangerous thing to do.

I think this is also mostly true: many academics find it frustrating to read these papers, most of which lack a minimal understanding of the topics being discussed.

For example, you can’t pretend to derive meaningful conclusions about Proto-Indo-Europeans knowing nothing about their language and the potential cultures associated with them (and why they were associated with them in the first place)…

I also agree with him that the study of ancient DNA is a very powerful tool. Everyone involved in Anthropology and Archaeology should be trained these days in Genomics – or, at least, they should have the opportunity to do so.

On the dangers of Genomics

Reich: (…) I know there are extremists who are interested in genealogy and genetics. But I think those are very marginal people, and there’s, of course, a concern they may impinge on the mainstream.

But if you actually take any serious look at this data, it just confounds every stereotype. It’s revealing that the differences among populations we see today are actually only a few thousand years old at most and that everybody is mixed. I think that if you pay any attention to this world, and have any degree of seriousness, then you can’t come out feeling affirmed in the racist view of the world. You have to be more open to immigration. You have to be more open to the mixing of different peoples. That’s your own history.

I guess David Reich does not frequent forums on human genetics linked to ethnolinguistic identification, or he would not think of ‘extremists’ as marginal people. Or else we have a different view of what defines an ‘extremist’…


I did not have the best of opinions about David Reich – or any other geneticist involved in publishing anthropological theories, for that matter. I have always had great respect for their scientific work, though.

If anything, this article shows that he knows his own (and his fellow geneticists’) limitations, and the dangers and limitations of Genomics as a whole, so I have more respect for him – and anyone involved with his Lab’s work – after reading this piece.

I would sum up his interview with his humbling sentence:

We should think we really don’t know what we’re talking about.

NOTE. Also on the occasion of the publication of his book, Nature has published the piece Sex, power and ancient DNA – Turi King hails David Reich’s thrilling account of mapping humans through time and place.

After buying Lalueza-Fox’s recent book ‘La forja genètica d’Europa’, I don’t really feel like buying another book on Genomics and migrations from a geneticist. If you have read Reich’s book, please share your impressions.

EDIT (19 MAR 2018): Razib Khan has written a ‘preview of a review’ that he intends to publish in National Review, and it seems the book might be worth it, after all.

EDIT (20 MAR 2018): The New York Times’ Carl Zimmer writes a review, David Reich Unearths Human History Etched in Bone. Seen first in Razib Khan’s Gene Expression blog.

Quantitative analysis of population-scale family trees with millions of relatives


The paper Quantitative analysis of population-scale family trees with millions of relatives, by Kaplanis, Gordon, Shor, et al. Science (2018) 359(6379), based on a study of genealogical information at Geni, is today news worldwide.


Family trees have vast applications in multiple fields from genetics to anthropology and economics. However, the collection of extended family trees is tedious and usually relies on resources with limited geographical scope and complex data usage restrictions. Here, we collected 86 million profiles from publicly-available online data shared by genealogy enthusiasts. After extensive cleaning and validation, we obtained population-scale family trees, including a single pedigree of 13 million individuals. We leveraged the data to partition the genetic architecture of longevity by inspecting millions of relative pairs and to provide insights into the geographical dispersion of families. We also report a simple digital procedure to overlay other datasets with our resource in order to empower studies with population-scale genealogical data.

While the article is behind a paywall, you can still read its preprint at bioRxiv.

Excerpts interesting for genetic genealogy (emphasis mine):

Assessment of theories of familial dispersion

Familial dispersion is a major driving force of various genetic, economical, and demographic processes (…)

First, we analyzed sex-specific migration patterns (21) to resolve conflicting results regarding sex bias in human migration (52). Our results indicate that females migrate more than males in Western societies but over shorter distances. The median mother-child distances were significantly larger (Wilcox, one-tailed, p < 10^−90) by a factor of 1.6x than father-child distances (Fig. 4A). This trend appeared throughout the 300 years of our analysis window, including in the most recent birth cohort, and was observed both in North American (Wilcox, one-tailed, p < 10^−23) and European duos (Wilcox, one-tailed, p < 10^−87). On the other hand, we found that the average mother-child distances (fig. S17) were significantly shorter than the father-child distances (t-test, p < 10^−90), suggesting that long-range migration events are biased toward males. Consistent with this pattern, fathers displayed a significantly (p < 10^−83) higher frequency than mothers to be born in a different country than their offspring (Fig. 4B). Again, this pattern was evident when restricting the data to North American or European duos. Taken together, males and females in Western societies show different migration distributions in which patrilocality occurs only in relatively local migration events and large-scale events that usually involve a change of country are more common in males than females.

An example of the genealogical and demographic information available on the website, with a real pedigree of ~6000 individuals. Green: profiles, red: marriages. The family tree spans about 7 generations

Next, we inspected the marital radius (the distance between mates’ places of birth) and its effect on the genetic relatedness of couples (21). The isolation by distance theory of Malécot predicts that increases in the marital radius should exponentially decrease the genetic relatedness of individuals (53). But the magnitude of these forces is also a function of factors such as taboos against cousin marriages (54).

We started by analyzing temporal changes in the birth locations of couples in our cohort. Prior to the Industrial Revolution (<1750), most marriages occurred between people born only 10km from each other (Fig. 4A [black line]). Similar patterns were found when analyzing European-born individuals (fig. S18) or North American-born individuals (fig. S19). After the beginning of the second Industrial Revolution (1870), the marital radius rapidly increased and reached ~100km for most marriages in the birth cohort in 1950. Next, we analyzed the genetic relatedness (IBD) of couples as measured by tracing their genealogical ties (Fig. 4C). Between 1650 and 1850, the average IBD of couples was relatively stable and on the order of ~4th cousins, whereas IBD exhibited a rapid decrease post-1850. Overall, the median marital radius for each year showed a strong correlation (R² = 72%) with the expected IBD between couples. Every 70km increase in the marital radius correlated with a decrease in the genetic relatedness of couples by one meiosis event (Fig. 4D). This correlation matches previous isolation by distance forces in continental regions (55). However, this trend was not consistent over time and exhibits three phases. For the pre-1800 birth cohorts, the correlation between marital distance and IBD was insignificant (p > 0.2) and weak (R² = 0.7%) (fig. S20A). Couples born around 1800-1850 showed a two-fold increase in their marital distance from 8km in 1800 to 19km in 1850. Marriages are usually about 20-25 years after birth and around this time (1820-1875) rapid transportation changes took place, such as the advent of railroad travel in most of Europe and the United States. However, the increase in marital distance was significantly (p < 10^−13) coupled with an increase in genetic relatedness, contrary to the isolation by distance theory (fig. S20B). Only for the cohorts born after 1850, did the data match (R² = 80%) the theoretical model of isolation by distance (fig. S20C).

Taken together, the data shows a 50-year lag between the advent of increased familial dispersion and the decline of genetic relatedness between couples. During this time, individuals continued to marry relatives despite the increased distance. From these results, we hypothesize that changes in 19th century transportation were not the primary cause for decreased consanguinity. Rather, our results suggest that shifting cultural factors played a more important role in the recent reduction of genetic relatedness of couples in Western societies.
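To get a feel for the units in the excerpt, here is a minimal Python sketch of what "one meiosis event" means for the relatedness of a couple. The function names are mine, the textbook halving rule (each meiosis halves the expected shared fraction) is a simplification, and the linear 70 km step is only a reading of the reported correlation, not a causal model: the paper itself notes that the trend did not hold in every period.

```python
# Illustrative sketch only: names and the linear extrapolation are assumptions,
# not the method used by Kaplanis et al.

KM_PER_MEIOSIS = 70.0  # from the quoted correlation: ~70 km per extra meiosis


def meioses_between_cousins(degree: int) -> int:
    """Meioses separating two n-th cousins through one shared ancestor."""
    return 2 * (degree + 1)


def expected_ibd_fraction(meioses: float, shared_ancestors: int = 2) -> float:
    """Expected fraction of the genome shared identical by descent.

    Each meiosis halves the expected contribution; full cousins share
    two common ancestors (a couple), hence the default of 2.
    """
    return shared_ancestors * 0.5 ** meioses


def meioses_after_radius_change(meioses: float, delta_km: float) -> float:
    """Naive linear reading of the reported trend (assumption)."""
    return meioses + delta_km / KM_PER_MEIOSIS


if __name__ == "__main__":
    fourth = meioses_between_cousins(4)           # 10 meioses
    print(expected_ibd_fraction(fourth))          # 1/512 of the genome
    wider = meioses_after_radius_change(fourth, 140.0)
    print(expected_ibd_fraction(wider))           # two more halvings
```

Under these assumptions, fourth cousins are separated by 10 meioses and share about 1/512 of their genome in expectation, and widening the marital radius by 140 km would cut that by a further factor of four.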

EDIT 3/2/2018: Added details of the article.

We are all special, which also means that none of us is


Adam Rutherford writes You’re Descended from Royalty and So Is Everybody Else – Anybody you can name from ancient history is in your family tree, which I discovered via John Hawks’ new post The surprising connectedness of human genealogies over centuries.


One way to think of it is to accept that everyone of European descent should have billions of ancestors at a time in the 10th century, but there weren’t billions of people around then, so try to cram them into the number of people that actually were. The math that falls out of that apparent impasse is that all of the billions of lines of ancestry have coalesced into not just a small number of people, but effectively literally everyone who was alive at that time. So, by inference, if Charlemagne was alive in the ninth century, which we know he was, and he left descendants who are alive today, which we also know is true, then he is the ancestor of everyone of European descent alive in Europe today.
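The arithmetic behind this argument is easy to sketch in Python. The population figure and the generation length below are illustrative placeholders of mine, not numbers from Rutherford's article; the point is only how quickly the nominal number of ancestors, doubling every generation, overtakes any plausible historical population.

```python
# Illustrative sketch: both constants below are assumptions for the example.
ASSUMED_POPULATION = 40_000_000      # placeholder for a medieval population
ASSUMED_YEARS_PER_GENERATION = 28    # placeholder generation length


def nominal_ancestors(generations: int) -> int:
    """Slots in the family tree at a given generation (2 parents, 4 grandparents, ...)."""
    return 2 ** generations


def generations_until_exceeds(population: int) -> int:
    """First generation at which the nominal ancestor count exceeds the population."""
    g = 0
    while nominal_ancestors(g) <= population:
        g += 1
    return g


if __name__ == "__main__":
    g = generations_until_exceeds(ASSUMED_POPULATION)
    print(f"{g} generations, roughly {g * ASSUMED_YEARS_PER_GENERATION} years back")
    print(f"slots 40 generations back: {nominal_ancestors(40):,}")
```

With these placeholder values the tree has more slots than people after 26 generations, well short of the time depth of Charlemagne, so the same individuals must fill many slots at once.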

Since most of this blog’s posts support academic disciplines looking for answers to the Indo-European question, and constantly give reasons against modern genetic (and phylogenetic) identification, I think it is worth at least a quick read for anyone interested in the field.

I recently referred to the interesting series of posts by Graham Coop on this matter.

Featured image: Europe around 800 – the map is public domain from the Historical Atlas (New York, 1911)


Genetic vs. genealogical ancestors and actual geographical constraints


Interesting post from Graham Coop, Where did your genetic ancestors come from?

An excerpt:

A thousand years back I’m descended from nearly everyone everywhere in Europe. I’m related to these individuals via millions of lines of descent back through my vast family tree. Yet the majority of the lines back through my pedigree trace to people living in the UK and Western Europe. Many lines trace back to more distant locations, but these are relatively few in number compared to those tracing back to closer to home. Ancestors along each of these lines are (roughly) equally likely to contribute to my genome. Therefore, most of my roughly 2600 genetic ancestors from 1000 years ago, who contributed the majority of my genome to me, will be random people living in the UK and western Europe at that time (who happened to leave descendants).

Looking back a few thousand years more, I’m a descendant of nearly everyone who ever lived almost everywhere in the world (at least those who left descendants, and many did). Yet most of the just over ~6000 individuals from that time who contributed the majority of my genome to me will mostly be found all over Western Eurasia. There’s nothing much special about these individuals who happen to be my genetic ancestors a few thousand years back. They’re likely not royalty. My genetic ancestors are just a random subset of all of my genealogical ancestors, they just happen to be my genetic ancestors due to the vagaries of meiosis and recombination.
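The gap between genealogical and genetic ancestors can be approximated with a short Python sketch. This is a Poisson-style back-of-the-envelope model in the spirit of Coop's companion post, not his simulation: I assume 22 autosomes and about 33 crossovers per meiosis (my round figure), so that each parental half of the genome is cut into roughly 22 + 33(k − 1) chunks at generation k, scattered at random over the 2^(k − 1) ancestors on that side.

```python
import math

AUTOSOMES = 22
ASSUMED_CROSSOVERS_PER_MEIOSIS = 33  # assumption: rough genome-wide average


def expected_genetic_ancestors(k: int) -> float:
    """Approximate expected number of genetic ancestors k generations back.

    Assumes chunks land on ancestors independently (Poisson approximation),
    so an ancestor contributes nothing with probability exp(-chunks/ancestors).
    """
    genealogical = 2 ** k
    ancestors_per_side = 2 ** (k - 1)
    chunks_per_side = AUTOSOMES + ASSUMED_CROSSOVERS_PER_MEIOSIS * (k - 1)
    # -expm1(-x) computes 1 - exp(-x) accurately when x is tiny.
    p_contributes = -math.expm1(-chunks_per_side / ancestors_per_side)
    return genealogical * p_contributes


if __name__ == "__main__":
    for k in (5, 10, 20, 40):
        print(k, 2 ** k, round(expected_genetic_ancestors(k)))
```

With 25-year generations, 1000 years is about 40 generations, for which this approximation gives about 2,618 genetic ancestors out of more than a trillion genealogical slots, close to the roughly 2600 quoted above.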

As always, a humbling example, e.g. for those looking at haplogroups in the distant past to make modern ethnolinguistic identifications.

Genetics in combination with genealogy poses a question akin to the Ship of Theseus paradox.

Featured image (from the article): Simulation of how much of your autosomal genome is present in each genealogical ancestor as we go back up the generations. Image explained in detail in the article How many genetic ancestors do I have?


From Adamic or the language of the Garden of Eden until the Tower of Babel: the confusion of tongues and the earliest dialects attested

No, I didn’t have a revelation today. I am just offering a little support for exactly what Dawkins and his Brights dislike, to show them that extreme actions cause extreme (re)actions. I’d like to play their radical game, too, offering some help in linguistics to those who have only naïve theories on the language of Eden.

These are the statements about the Adamic language and the Tower of Babel as Abrahamic texts, beliefs and traditions show:

  • Adamic was the language spoken by Adam and Eve in the Garden of Eden. Adamic is typically identified with either the language used by God to address Adam, or the language invented by Adam (Book of Genesis 2:19).
  • The Genesis is ambiguous on whether the language of Adam was preserved by Adam’s descendants until the confusion of tongues (Genesis 11:1-9), or if it began to evolve naturally even before Babel (Genesis 10:5), into what is usually called Chaldaic:
    1. Dante in his De Vulgari Eloquentia argues that the Adamic language is of divine origin and therefore unchangeable.
    2. In his Divina Commedia, however, Dante changes his view to the effect that the Adamic language was the product of Adam. This had the consequence that it could no longer be regarded as immutable, and hence Hebrew could not be regarded as identical with the language of Paradise.
  • Also, the nature of that original language remains controversial, interpretations showing many nationalist flavours:
    • Traditional Jewish exegesis such as Midrash (Genesis Rabbah 38) says that Adam spoke Old Hebrew or rather its linguistic ancestor Proto-Canaanite, because the names he gives Eve – “Isha” (Book of Genesis 2:23) and “Chava” (Genesis 3:20) – only make sense in Hebrew.
    • Traditional Christians based on Genesis 10:5 have assumed that the Japhetite, or Indo-European, languages are rather the direct descendants of the Adamic language, having separated before the confusion of tongues, by which also Hebrew was affected.
      1. Early Christian fathers claimed that Adam spoke Latin to explain why God would make it the liturgical language of his Church, although “Latin” here would be a loose way of referring to its ancestor, Proto-Italic or older Europe’s Indo-European.
      2. Modern traditional Catholics follow Anne Catherine Emmerick’s revelations (1790), which stated that the most direct descendants of the Adamic language were Bactrian, Zend and Indian languages (i.e., the Indo-Iranian languages), associating the Adamic language with the then-recent concept of the “common source” of these tongues, now known as Proto-Indo-European:

        This language was the pure Hebrew, or Chaldaic. The first tongue, the mother tongue, spoken by Adam, Shem, and Noah, was different, and it is now extant only in isolated dialects. Its first pure offshoots are the Zend, the sacred tongue of India, and the language of the Bactrians. In those languages, words may be found exactly similar to the Low German of my native place.

    • Many Muslim scholars, following the traditional Jewish identification of Pre-Hebrew as the Adamic language, hence classified within the Semitic language family (which includes the Ge’ez language used in the Book of Enoch), claim that Pre-Arabic – hence Proto-(West-)Semitic – is the original Adamic language. Most of them do not believe the Semitic languages were the direct descendants of the Adamic language, but rather trace them back to Abraham, instead of Noah and Adam.
  • The confusion of tongues is the initial fragmentation of human languages described in the Book of Genesis 11:1–9, as a result of the construction of the Tower of Babel.

    And the Lord said, Behold, the people is one, and they have all one language; and this they begin to do: and now nothing will be restrained from them, which they have imagined to do.

    Go to, let us go down, and there confound their language, that they may not understand one another’s speech.

    So the Lord scattered them abroad from thence upon the face of all the earth: and they left off to build the city.

    The language spoken by Noah and his descendants – whether the original Adamic language (either of divine origin or not) or the derived Chaldaic – split into seventy or seventy-two languages, according to the different traditions. The existence of only one language before Babel in Genesis 11:1

    And the whole earth was of one language, and of one speech

    has sometimes been interpreted as being in contradiction to Genesis 10:5

    Of these were the isles of the nations divided in their lands, every one after his tongue, after their families, in their nations.

    1. This issue only arises, however, if Genesis 10:5 is interpreted as taking place before and separate from the Tower of Babel story, instead of as an overview of events later described in detail in Genesis 11.
    2. It also necessitates that the reference to the earth being “divided” (Genesis 10:25) is taken to mean the division of languages, rather than a physical division of the earth (such as in the formation of continents).

So, to sum up, these are the facts known to us from comparative linguistics, related to those Abrahamic beliefs and interpretations and the biblical chronology:

  • Mainstream linguists – without any links to religion, just based on comparative grammar – have accepted some form or other of language superfamilies, from Eurasiatic and Afro-Asiatic < Nostratic < Borean < Proto-World language, which would correspond loosely to that common language of the Genesis that was spoken before it was (instantly?) “confounded” into different languages, hence the similar (or even worse) results obtained in reconstructing subgroupings (say Indo-Uralic, Ural-Altaic) than with a more global Nostratic or even Proto-World language.
  • Most of the earliest attested, reconstructed or (generally accepted) hypothetical languages, like Old Egyptian; (Semitic) Akkadian, Pre-Proto-Canaanite; (Indo-European) Europe’s Indo-European, Proto-Indo-Iranian, Proto-Greek, Common Anatolian; (Uralic) Proto-Finno-Ugric; (Sino-Tibetan) Proto-Sinitic; (Pre-)Proto-Dravidic; etc. can be traced back – depending on the archaeological findings and linguistic theories, inherently inexact – to ca. 2500 BC.
  • It is therefore odd that before that date everything is ‘more blurred’ (so to speak) in linguistic findings and reconstructions of older linguistic ancestors – as e.g. the hypothesized laryngeals (or their phonetic output) in Late Proto-Indo-European, or the difficult reconstruction of Proto-Semitic, not to mention Proto-Uralic or Proto-Sino-Tibetan. This is the strongest argument to support a theoretical instant split of a common (Chaldaic or Adamic) language into 70 or 72 derived languages, which we know from attested inscriptions, reconstructions or hypotheses, or which disappeared without a trace.
  • About their classification into language “families”, they might be related to the families based on consanguinity as described in the Bible, but identifications of those families by modern scholars have blurred the possible links (if any) between older language superfamilies and Noah’s sons; cf. Japhetic’s simplistic identification with Indo-European, or Semitic’s with “Semitic” languages. However, the more traditional identification of Japheth’s sons with “European” peoples (and therefore Eurasiatic languages), and Shem’s sons with (the old concept of) “Asian” peoples (hence with Afro-Asiatic languages) is more reasonable, leaving Ham’s sons with (at least) Austric and Dené-Caucasian languages (see Borean language tree).
  • Many biblical interpretations of the Adamic language share therefore mistakes inherent to the culturally-biased and simplistic views of many scholars, hence the identification of the original tongue as Proto-Semitic by Jews and Muslims, Proto-Indo-European by many Christians (since Rasmus Rask’s first description of it as “Japetisk”), Sanskrit or Indo-Iranian (Aryan) by Hinduism, etc. That has hindered a more rational interpretation of the Bible and other sacred texts in light of the newest academic findings.

To sum up, we cannot know if the Adamic language existed, or its nature; we don’t know if Chaldaic (the common language before Babel) was the same as Adamic, or if not, if it was global (Proto-World language) or local to the Middle East (Nostratic?) according to Genesis 10:5. We can, however, defend mainstream Abrahamic beliefs on the confusion of tongues and the Tower of Babel as possible (“probability” based on extrapolation has little to do with religion and even with social events that happened more than 4000 years ago) and that the descendants of Noah might have spoken a common language until the centuries on either side of 2500 BC:

All that notwithstanding any possible interpretations of Adamic or Chaldaic from Old Earth Creationists, who usually take the historical accounts of the Genesis (its literal interpretation) as real facts just from the Tower of Babel on, dismissing the rest of the biblical data from the Flood backwards, and indeed any timeline calculated with genealogies by Young Earth Creationists.