Humphrys genealogy

Genealogy research by Mark Humphrys.


Common ancestors of all humans - Genetics


Common ancestors of all humans (using genetics)



Summary

DNA studies can tell us about some interesting common ancestors (CAs), such as Mitochondrial Eve and Y chromosome Adam, but these CAs are much older than the most recent common ancestor (MRCA) of all living humans.

In fact, because genetic studies trace only the ancestry of DNA that actually gets inherited, every CA they find will be much older than the MRCA.



CAs using DNA


Most recent female-female line ancestor of all living humans ("Mitochondrial Eve")

Take all people alive today. Take all their mothers. This is a smaller set. Take all their mothers. This is a smaller set. And so on, until you get to 1 person ([Slatkin, 1999] says you must get to 1 person since mathematically this is "a pure death process that has an absorbing state at 1").
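The "take all their mothers" argument can be tried out as a small simulation. This is only a sketch under assumptions that are not in the text: a constant number of women per generation (the hypothetical `population_size`), with each mother drawn uniformly at random. It records how many distinct female-line ancestors remain at each step back.

```python
import random

def maternal_lineage_counts(population_size, rng):
    """Return the number of distinct female-line ancestors at each step back in time.

    Assumption (not from the text): every generation has `population_size` women,
    and each tracked ancestor's mother is drawn uniformly at random from them.
    """
    lineages = population_size
    counts = [lineages]
    while lineages > 1:
        # Each tracked ancestor picks a mother; ancestors sharing a mother merge.
        mothers = {rng.randrange(population_size) for _ in range(lineages)}
        lineages = len(mothers)
        counts.append(lineages)
    return counts

counts = maternal_lineage_counts(500, random.Random(0))
print(f"single female-line ancestor reached after {len(counts) - 1} generations")
```

The count can never go up, since a set of mothers is never larger than the set of children it came from, and 1 is the absorbing state.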

Our most recent female-female line ancestor is called "Mitochondrial Eve" since Mitochondrial DNA passes (almost) entirely through the female line and so may be used to estimate a date for her. Contrary to a lot of confused discussion, e.g. [Ayala, 1995], Mitochondrial Eve's existence is not in doubt. We can work it out from our armchair. What is in dispute is the date, which has been estimated at 100,000 to 200,000 years ago.

Also contrary to much confused discussion by paleontologists, no date for Mitochondrial Eve implies any sort of population bottleneck at that time. Mitochondrial Eve would have co-existed with huge numbers of male and female relations from whom we also descend. Indeed, [Ayala, 1995] points out that our inheritance from Mitochondrial Eve would be only 1 part in 400,000 of our DNA. The rest we inherit from her contemporaries. But he still spends half the paper attacking the idea of a small ancestral population - an idea that no one believes.


Most recent male-male ancestor ("Y chromosome Adam")

Similarly, by studying male-only DNA, we can try to get an estimate for "Y chromosome Adam". Here there is little to no variation, and much controversy about why. Estimates range from 15,000 to 270,000 years ago, depending on the model used.


All surnames (except one) will die out

Clearly if we had used surnames strictly in the male line since Y chromosome Adam, then we would all now bear his surname, despite him having millions of male contemporaries of different surnames. (In our thought experiment, that is. Of course surnames did not exist back then.)

As a result of thinking about Y chromosome Adam, we can see that if we use surnames strictly in the male line forever into the future, then not only will all hereditary titles die out, but all surnames except one will die out too.

The world does not of course strictly follow that surname rule, but the West approximately does, and surnames do go extinct. Without a mechanism for generating entirely new surnames from scratch (not belonging to either parent) the diversity of surnames can only decline. Neil Fraser nicely describes it as "a random walk - next to a cliff. The only force acting on the system is that once a name randomly stumbles to zero it is gone and can never recover."
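The "random walk next to a cliff" can be watched going forward in time. The following is a toy sketch, not a model of real populations: it assumes a constant number of men per generation (`num_men`, a hypothetical parameter), fathers chosen uniformly at random, strict male-line surnames and no new surnames ever created.

```python
import random

def surname_counts(num_men, generations, rng):
    """Count distinct surnames per generation under strict male-line inheritance.

    Assumptions (not from the text): constant population of men, each man's
    father chosen uniformly at random, and no mechanism for new surnames.
    """
    surnames = list(range(num_men))      # generation 0: every man has a unique surname
    counts = [num_men]
    for _ in range(generations):
        # Each man in the next generation inherits the surname of a random father.
        surnames = [rng.choice(surnames) for _ in range(num_men)]
        counts.append(len(set(surnames)))
    return counts

counts = surname_counts(200, 300, random.Random(0))
print(counts[0], counts[50], counts[-1])
```

Because a surname that reaches zero bearers cannot come back, the list of counts never increases.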


The MRCA is much more recent than Mitochondrial Eve or Y chromosome Adam

By following only the female-female or male-male paths, we ignore the billions of other ancestral paths we could follow, thus pushing the common ancestor much further back into the past. The MRCA in any line will be much more recent than Mitochondrial Eve or Y chromosome Adam.



DNA can't tell us the MRCA

DNA studies have a problem in telling us about the MRCA. As [Chang, 1999] notes, the MRCA will be much more recent than any MRCA that could ever be found in DNA studies, even if one were to study the ancestry of every single gene. The reason is that we are considering people who are simply ancestors, through any route, whether or not any of their genes actually survived the journey.


Random 1/2 of parent's DNA (t=1)

Consider that you only get 1/2 of your father's DNA, 1/2 of your mother's DNA (hence total size of DNA is constant). Which bits you get is somewhat random. For any given bit, you may not have inherited it at all. For any given marker, you may not possess that marker, even though you are your father's child in reality (i.e. in genealogy).

Example of how evidence can be lacking in the DNA

Here's an example. First remember that everyone has 2 copies of the genome, so that when you inherit a random 1/2 from your father you get a full genome, rather than, say, having gaps that the 1/2 from your mother has to cover for.

Say for one gene, your father's two copies are AB, your mother's are CD. You could end up with AC, your sibling could end up with BD. For this gene only, there is no genetic evidence of your recent common ancestry.
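The AB x CD example can be enumerated directly. This sketch assumes simple Mendelian inheritance (each child gets one of the father's two copies and one of the mother's, all four combinations equally likely) and that the two siblings are independent draws.

```python
from fractions import Fraction
from itertools import product

father, mother = "AB", "CD"
# Each child receives one copy from each parent; all four combinations are equally likely.
children = [f + m for f, m in product(father, mother)]

# All ordered (you, sibling) pairs, and those pairs with no copy in common.
pairs = list(product(children, repeat=2))
no_shared = [(a, b) for a, b in pairs if not set(a) & set(b)]
p_no_shared = Fraction(len(no_shared), len(pairs))

print(children, p_no_shared)   # ['AC', 'AD', 'BC', 'BD'] 1/4
```

So, for this one gene, two full siblings show no shared copy one time in four.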


Random average 1/4 of grandparent's DNA (t=2)

While you do get 1/2 of your father's DNA, this does not mean you get 1/4 of your grandfather's DNA. Your father's DNA is a mix of your grandfather and grandmother's DNA. But which bits of your father's DNA you get is somewhat random. On average you will get a somewhat random 1/4 of your grandfather's DNA, but you could get less or more.

Probability of inheriting no DNA from a grandparent

While I am sure that you can inherit probabilistically more or less than 1/4 from a grandparent, I am unsure of the details. Here's a sketch, but I need to do more reading on this. If you can point me to the answers, let me know.

If there are n events at which to choose between the copy your father got from your grandfather and the copy he got from your grandmother, the probability of you inheriting none of your grandfather's DNA through him (*) is:

$(1/2)^n$

(*) If you are your father's daughter. If his son, you must inherit the Y chromosome. We will ignore the special cases of the male-male and female-female lines. Admittedly these are hard to ignore with grandparents, since they are 2 of only 4 lines, but these 2 special lines can be ignored as we go back 10 generations or more.

[Chang, 1999, author's reply] discusses this extreme case. I'm not sure if n=23 here (the no. of chromosomes). If so, the probability of all grandmother, none from grandfather, would be $(1/2)^{23}$ = 1 in $2^{23}$ = 1 in 8.4 million.

If we allow for crossover, the probability of all grandmother, none from grandfather, is:

$(1/4)^n$

If n=23, $(1/4)^{23}$ = 1 in $2^{46}$ = 1 in 70 trillion.

Q. Is n=23?


Random average 1/8 of great-grandparent's DNA (t=3)

Similarly, you get on average a somewhat random inheritance of 1/8 of each great-grandparent's DNA, but could get less (or more).

Probability of inheriting no DNA from a great-grandparent

Again I am unsure of the following. Let's say your father's 4 grandparents had, as chromosome 1, c1 and d1 on one side, e1 and f1 on the other side. Excluding crossover, there is a choice between c1 and d1 on one side, and a choice between e1 and f1 on the other. Your father gets the two winners, and he passes on to you one of these. So there is probability 3/4 of not inheriting from one of the four. Probability of you inheriting no DNA from a particular great-grandparent is:
$(3/4)^n$

If n=23, the probability is $(3/4)^{23}$ = 1 in 747.

How does crossover affect this? If one great-grandparent is c, your father has 3/4 chance of getting either c, or c crossed with d. He then has 3/4 chance of passing this on, either as is or crossed over. So you have $(3/4)^2$ = 0.56 chance of inheriting some c, or $1 - (3/4)^2$ = 0.44 chance of inheriting none. So the chance of inheriting no DNA from a great-grandparent is:

$(1 - (3/4)^2)^n$

If n=23, the probability is $(0.44)^{23}$ = 1 in 181 million.
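The four figures quoted so far for grandparents and great-grandparents can be checked with a few lines of arithmetic. The sketch simply takes n = 23 as the text tentatively does; whether 23 is the right number of events remains the open question.

```python
n = 23  # assumed number of independent events (the text's tentative n = 23)

cases = {
    "grandparent, no crossover": (1 / 2) ** n,
    "grandparent, with crossover": (1 / 4) ** n,
    "great-grandparent, no crossover": (3 / 4) ** n,
    "great-grandparent, with crossover": (1 - (3 / 4) ** 2) ** n,
}
for label, p in cases.items():
    # Express each probability as "1 in N" to compare with the quoted figures.
    print(f"{label}: about 1 in {1 / p:,.0f}")
```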

Q. Is n=23?


Random average 1 part in $2^t$ of ancestor t generations ago

In general, for an ancestor of yours t generations ago, you inherit an average 1 part in $2^t$ of their DNA, but can inherit more (or less).

Probability of inheriting no DNA from an ancestor t generations ago

Again I am unsure of the following. For an ancestor t generations ago, what is the probability of inheriting none of their DNA? If there was no crossover this would be:
$(1 - (1/2)^{t-1})^n$
where to inherit, the DNA must get through (t-1) competitions, where, in each, the probability of getting through is 1/2. So the prob. of getting through all the competitions is $(1/2)^{t-1}$. If t=2 or 3 we can see this reduces to the probabilities listed above. This is always between 0 and 1. With finite n, this increases towards 1 as t (the number of generations back) increases.

If n=23, the probability depends on t. It equals 1/2 when:
$(1 - (1/2)^{t-1})^{23} = 1/2$
$1 - (1/2)^{t-1} \approx 0.97$
$(1/2)^{t-1} \approx 0.03$
$2^{t-1} \approx 33.7$
$t-1 \approx 5$
In other words, more than 6 generations back, the prob. of inheriting no DNA at all from one of your ancestors is more than 1/2.

But what about crossover? With crossover, the probability of inheriting none of the DNA of an ancestor at generation t is:

$(1 - (3/4)^{t-1})^n$
where to inherit, the DNA must get through (t-1) competitions, where, in each, the probability of some getting through is 3/4. So the prob. of getting through all the competitions is $(3/4)^{t-1}$. If t=2 or 3 we can see this reduces to the probabilities listed above. This is always between 0 and 1. With finite n, this increases towards 1 as t increases.

If n=23, the probability depends on t. It equals 1/2 when:
$(1 - (3/4)^{t-1})^{23} = 1/2$
$(3/4)^{t-1} \approx 0.03$
$t-1 \approx 12$
In other words, more than 13 generations back, the prob. of inheriting no DNA at all from one of your ancestors is more than 1/2. Note that at 13 generations back (c. 1500s - 1600s) you have 8192 ancestors.
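Both versions of the formula are easy to evaluate generation by generation. The sketch below implements only the simplified model described above (n independent events, each competition won with probability 1/2 without crossover or 3/4 with it). The function names are made up for illustration.

```python
def prob_no_dna(t, n=23, crossover=False):
    """Probability of inheriting nothing from one ancestor t generations back (t >= 2),
    under the simplified model in the text with n independent events."""
    survive = (3 / 4 if crossover else 1 / 2) ** (t - 1)
    return (1 - survive) ** n

def first_generation_above_half(n=23, crossover=False):
    """Smallest t at which the probability of inheriting nothing exceeds 1/2."""
    t = 2
    while prob_no_dna(t, n, crossover) <= 0.5:
        t += 1
    return t

print(first_generation_above_half(crossover=False))  # 7
print(first_generation_above_half(crossover=True))   # 14
```

Under these assumptions the probability first exceeds 1/2 at generation 7 without crossover and generation 14 with it, which agrees with "more than 6" and "more than 13" generations back.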

Q. Is n=23?


As n goes to infinity

Q. Is n=23?

For small n, it is easier (more probable) to inherit nothing from an ancestor. With a single event (n=1), the ancestor's DNA could easily lose that one event. With a large number of events, it is unlikely to lose them all. So for large n, it is harder to inherit nothing from an ancestor. As n goes to infinity, you must have inherited some DNA from the ancestor.

We can see that above, for any finite t, as n goes to infinity, the probability of not inheriting goes to zero.

Even with high n, probability of inheriting no DNA from ancestor is still high in historical times

But even for quite high n, the probability of inheriting no DNA from an ancestor is still high in historical times. For example, for n = 10,000, the probability of inheriting nothing from an ancestor is equal to 1/2 for:
$(1 - (3/4)^{t-1})^{10000} = 1/2$
$1 - (3/4)^{t-1} \approx 0.99993$
$(3/4)^{t-1} \approx 0.00007$
$t-1 \approx 33$
Hence, even for n=10000, once we go back more than 34 generations (i.e. before c.1000 AD) the probability of inheriting no DNA at all from an ancestor is greater than 1/2.
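The same calculation can be written in closed form for any n. This is only a rearrangement of the crossover formula above, under the same simplified model, and adds no new assumptions.

```latex
\left(1 - \left(\tfrac{3}{4}\right)^{t-1}\right)^{n} = \tfrac{1}{2}
\;\Longrightarrow\;
\left(\tfrac{3}{4}\right)^{t-1} = 1 - 2^{-1/n}
\;\Longrightarrow\;
t - 1 = \frac{\ln\left(1 - 2^{-1/n}\right)}{\ln\left(3/4\right)}
```

Plugging in n = 23 gives t - 1 of about 12.2, and n = 10,000 gives about 33.3, matching the rounded values above. Since $1 - 2^{-1/n}$ is close to $(\ln 2)/n$ for large n, the threshold generation grows only logarithmically with n.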

As n goes to infinity, inheritance goes to infinity

As n goes to infinity, not only does the probability of inheriting something from any ancestor go to 1, but the amount inherited from any ancestor goes to infinity. But for any finite n, the amount inherited is still only 1 part in $2^t$ of the ancestor's DNA.

Whatever the probabilities, you still only inherit 1 part in $2^t$

I am unsure of the probabilities above. But even if it is harder than I think to inherit no DNA from an ancestor, it is still true that, for all n, you inherit on average only 1 part in $2^t$.

Example - An MRCA with no evidence in the DNA

Imagine 16 people ($2^4$) with an MRCA 10 generations ago. Each has inherited on average 1 part in $2^{10}$ (1/1024) of the MRCA's DNA (and maybe less). Quite possibly these parts don't overlap. Take 16 samples of 1/1024 of the MRCA's genes. These samples may not overlap at any point. These 16 people are in reality descended from this MRCA, but there is nothing in their genes to show it.

For an MRCA 30 generations ago, you need $2^{30}$ people (about 1 billion) to be sure that their samples of 1 part in $2^{30}$ of the ancestor's DNA must overlap.
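How likely is it that the samples fail to overlap? A rough feel can be had from a birthday-problem style calculation. This is a toy model, not a claim about real genomes: it assumes the ancestor's DNA is cut into 2^t equal pieces and that each descendant independently inherits exactly one piece, chosen uniformly at random.

```python
from fractions import Fraction

def prob_no_overlap(descendants, t):
    """Chance that no two descendants hold the same piece of the ancestor's DNA.

    Toy assumption (not from the text): the ancestor's genome is cut into 2**t
    equal pieces and each descendant independently inherits exactly one piece,
    chosen uniformly at random.
    """
    pieces = 2 ** t
    p = Fraction(1)
    for already_taken in range(descendants):
        # The next descendant must land on a piece nobody else has yet.
        p *= Fraction(pieces - already_taken, pieces)
    return p

print(float(prob_no_overlap(16, 10)))
```

For 16 descendants and t = 10 the result is close to 0.9, so in this toy model the most likely outcome is that no two of the 16 share any piece of the MRCA's DNA.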

Your ancestors are related

In real life, the issue of detecting that you have actually inherited none of an ancestor's DNA is made more complex by the fact that your ancestors themselves are related to each other, and so may share DNA, and so it may look like you inherited DNA from one, when in fact you didn't.




Conclusion - Within historical times, you have ancestors from whom you have no DNA

Even though my model is simplified, I think the conclusion is probably true - that within historical times (3000 BC to 2000 AD) you have ancestors from whom you have inherited no DNA.

As I say, I need to do more reading on this. I'm sure this has been discussed before. There is some discussion of this in [Wiuf and Hein, 1999].




The "MRCA" for every gene

So what can DNA studies tell us? Above we looked at the "MRCA" for the Y chromosome. If you studied all genes in the genome, and found an "MRCA" for each gene, would all of them be much further back than the (true) MRCA? Let us define the terms:

CA1
Our original definition of a CA - just an ancestor, including those who left no genes at all to the present day.

CA2
CAs who have left any genes in different places on the genome in a minority of their descendants. A smaller number than CA1s.

CA3
CAs who have left the same gene in the same place in a minority of their descendants. A smaller number than CA2s.

CA4
CAs who have left just one gene in the same place in all of their descendants. A smaller number than CA3s. This is what DNA studies will calculate as the "MRCA" for this gene.

MMRCA
The most recent MRCA of any gene.

So the "real" CAs (the CA1s) outnumber the CAs of a gene (the CA4s), but do they vastly outnumber them? As genome size tends to infinity (i.e. n goes to infinity) it becomes impossible for an actual ancestor (CA1) not to be at least a partial genetic ancestor (CA2) as well. So the difference between CA1 and CA2 breaks down.

I used to say on this page:

"Not just that, but as genome size tends to infinity, the number of genes a CA1 must pass to you goes to infinity, and (I think) finding two in the same place in two descendants becomes more likely. So the difference between CA2 and CA3 breaks down. And with a finite number of individuals, finding two in the same place for all of them becomes more likely. So the difference between CA3 and CA4 breaks down."

but now we can see this is not so. (At least I put in "(I think)" in the correct place!) The difference between CA2 and CA3 does not break down. For any finite n, you are getting a larger inheritance from the ancestor alright, but it is still only 1 part in $2^t$, so for any 2 descendants it is quite possible that their samples do not overlap (for any reasonable size t). The probability of overlap depends on t, not on n.


Conclusion - All CAs in DNA studies will be much older than the MRCA

So my conclusion is that even with large genomes, the CA1s (true CAs) are not equal to the CA4s (gene study CAs). Even if you did the MRCA for every gene, even the best result (the most recent) could be much older than the true MRCA. [Chang, 1999] has a good discussion of why the true MRCA, following any line, will be much more recent than any gene-tracking MRCA. See also [Wiuf and Hein, 1999].




Archaeology can't tell us the MRCA

Archaeology is also of limited use in telling us about the MRCA. For instance, even the MRCAs found in DNA studies will exist much more recently than the paleontologists might imagine looking at the fossil evidence - for the simple reason that they are merely "statistical artefacts" of no real importance to the overall story of human evolution. It would be totally wrong, for example, to imagine that the CA lived in an important or influential place or culture. See [Dawkins and Jones, 1992].

For instance, [O'Connell, 1995] is confused about Mitochondrial Eve's relation to the fossil record - no date for Mitochondrial Eve, no matter how recent, could possibly contradict the fossil record studied by the paleontologists. This is based on the error of assuming that Mitochondrial Eve is important (see above).

One could even say that genealogy is the pursuit of statistical artefacts.



