Evolving Chimps are Messing Up Y-DNA Dating

January 28, 2010

We are only starting to understand the functionality of Y-DNA. New research on chimpanzees by Hughes et al. (2010) revealed considerable selective pressures that caused the Y chromosome to evolve more rapidly than the rest of the genome. The degree of similarity in orthologous MSY (male-specific region of the Y) sequences, i.e. of any Y sequence found in chimpanzees and humans that can be traced to a common ancestor, amounts to 98.3% nucleotide identity, slightly less than the value reported when comparing the rest of the chimpanzee and human genomes (98.8%). Deletions, insertions and substitutions can be observed, and the alignment of nucleotides doesn’t necessarily imply direct homology. I think it would be an error to assume these polymorphisms are without meaning and to presume no selective differences. More strikingly, however, more than 30% of the MSY sequences are not orthologous. Divergence also involves the so-called X-degenerate region, the realm of genes considered at least 40 million years old, which contains surviving relics of the ancient autosomes (non-sex chromosomes) from which the X and Y chromosomes co-evolved. The article suggests an extraordinary divergence between humans and chimpanzees:
“The chimpanzee MSY contains twice as many massive palindromes as the human MSY, yet it has lost large fractions of the MSY protein-coding genes and gene families present in the last common ancestor.”

Interspecies Handshake for the Blog.

Among other processes, it seems the differences relate to diverging demands posed by mating behaviour and sperm production. Obviously the effect of the changes was to enhance functionality. Promiscuous female behaviour caused sperm competition among males, which resulted in the chimps’ ability to produce sperm cells that reach an average speed of 0.7 km/h, against a mere 0.2 km/h for humans.
But how do these results interfere with previous assumptions about the Y chromosome that concern genetic dating? The dating system for Y-DNA haplogroups proposed by Karafet (2008) builds heavily on assumptions about the nature of the Y chromosome that now seem obsolete. Karafet based her dating system on what she thought to be correctly intertwined archaeological (paleoanthropological) and genetic results. She could have been wrong about the validity of both. In that case the entire foundation of her reasoning turns out to be circular, and the framework collapses.

To provide estimates of the age of the nodes, we chose to fix the time to the most recent common ancestor of CT (defined by P9.1, M168, and M294) at 70 thousand years ago (Kya), which is consistent with previous estimates from genetic and archaeological data (Lahr and Foley 1998; Hammer and Zegura 2002; Macaulay et al. 2005), and is the chronological approximation given in Jobling et al. (2004) (p250) for the first major human out-of-Africa dispersals.

So far the Y-chromosome was understood thus:
“[…] one-half consists of tandemly repeated SATELLITE DNA and the rest carries few genes, and most of it does not recombine. However, it is because of this disregard for the rules that the Y chromosome is such a superb tool for investigating recent human evolution from a male perspective.” (Jobling et al., 2004)
In a review, the Whitehead Institute for Biomedical Research came up with the following considerations, which redefine the human Y chromosome rather as a hotbed of evolutionary change:

Chimp and human Y chromosomes evolving faster than expected
CAMBRIDGE, Mass. (January 13, 2010) – Contrary to a widely held scientific theory that the mammalian Y chromosome is slowly decaying or stagnating, new evidence suggests that in fact the Y is actually evolving quite rapidly through continuous, wholesale renovation.
By conducting the first comprehensive interspecies comparison of Y chromosomes, Whitehead Institute researchers have found considerable differences in the genetic sequences of the human and chimpanzee Ys—an indication that these chromosomes have evolved more quickly than the rest of their respective genomes over the 6 million years since they emerged from a common ancestor. The findings are published online this week in the journal Nature.
“The region of the Y that is evolving the fastest is the part that plays a role in sperm production,” says Jennifer Hughes, first author on the Nature paper and a postdoctoral researcher in Whitehead Institute Director David Page’s lab. “The rest of the Y is evolving more like the rest of the genome, only a little bit faster.”

Apparently, exact knowledge on Y-DNA and how it works is still lacking:

Wes Warren, Assistant Director of the Washington University Genome Center, agrees. “This work clearly shows that the Y is pretty ingenious at using different tools than the rest of the genome to maintain diversity of genes,” he says. “These findings demonstrate that our knowledge of the Y chromosome is still advancing.”

Still, Karafet proposed a system of SNP dating based on freely mutating portions of Y-DNA, whose behaviour was assumed to be sufficiently predictable already. This must be wrong.
For now, the possibility of a wholesale verification of the (random) Y mutation rate by sequencing has not been fully exploited. We depend on assessments that concern selected microsatellite loci and assume average mutation rates all over the Y. Such average values could be retrieved by comparing relatives separated by an increasing number of documented generations. Comparing all base pairs is a painstaking exercise that so far has been done only for the euchromatic male-specific region, for about 10 Mb out of a total of about 30 Mb of Y-chromosome base pairs, and excluding ‘gaps in the reference sequence, highly repeated sections, and palindromes from our analysis’ (Xue et al., 2009).
“The Y chromosomes of two individuals separated by 13 generations were flow sorted and sequenced by Illumina (Solexa) paired-end sequencing to an average depth of 11× or 20×, respectively [4]. Candidate mutations were further examined by capillary sequencing in cell-line and blood DNA from the donors and additional family members. Twelve mutations were confirmed in ~10.15 Mb; eight of these had occurred in vitro and four in vivo. The latter could be placed in different positions on the pedigree and led to a mutation-rate measurement of 3.0 × 10⁻⁸ mutations/nucleotide/generation (95% CI: 8.9 × 10⁻⁹ – 7.0 × 10⁻⁸), consistent with estimates of 2.3 × 10⁻⁸ – 6.3 × 10⁻⁸ mutations/nucleotide/generation for the same Y-chromosomal region from published human-chimpanzee comparisons [5] depending on the generation and split times assumed.”
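The point estimate in the quoted abstract can be reproduced with simple arithmetic. The sketch below is an illustration only: it assumes the rate is the number of in vivo mutations divided by the product of sequenced bases and generations, and it uses the rounded figures from the quote (4 mutations, about 10.15 Mb, 13 generations). The function name is made up for this example.

```python
def pedigree_mutation_rate(mutations: int, bases: float, generations: int) -> float:
    """Mutations per nucleotide per generation from a pedigree comparison."""
    if bases <= 0 or generations <= 0:
        raise ValueError("bases and generations must be positive")
    return mutations / (bases * generations)


# Rounded figures taken from the Xue et al. (2009) abstract quoted above.
rate = pedigree_mutation_rate(mutations=4, bases=10.15e6, generations=13)
print(f"{rate:.2e} mutations/nucleotide/generation")
```

The result is close to 3.0 × 10⁻⁸. Note how few events carry the estimate: with only four mutations, one more or one less shifts the rate by a quarter, which is why the reported confidence interval is so wide.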
The human Y chromosome has 454 genes and 25,121,652 sequenced base pairs (~24 Mb), that is, the euchromatic part out of a total of 57,741,652 base pairs. Between 1% and 2% of the base pairs on average are coding, though estimates of the base pairs being expressed one way or another run up to 80% of the genome (i.e. the sequenced part). This would mean that 80% of these 24 Mb is not ‘junk’ and can’t be considered neutral. Such a huge part simply can’t be permitted to mutate without any restriction at the molecular level, since most mutations are deleterious. The occurrence of mutations at this magnitude is hard to accept for genes that are actually used, one way or another: we don’t even know to what degree slow-mutating STRs can truly be counted among the junk part of DNA. Coding DNA is fairly immutable: evolution is a slow process, and successful lineages could not be guaranteed to survive if their coding DNA mutated at the same rate as random mutations occur. At the least, the occurrence of spontaneous mutations is based on internal processes whose logic doesn’t necessarily depend on mere statistics, but rather on the local availability of structural variability potential, as supplied e.g. by unevenly distributed palindromic DNA. This detail alone could decrease the overall mutation count about five times on the assumption of just 20% junk DNA, and thus increase the current SNP dating accordingly by five times.
The mutations detected by Xue et al. in only one third of the Y chromosome indeed seem to approximate the values expected at a normal mutation rate over the whole chromosome, though this result may be biased by their own predefined expectations. Maybe Xue et al. just diagnosed an exaggerated accumulation of mutations within a region that included all the genetic areas they already knew to be highly prone to mutation. If this were the case, their investigation was useless, and we should rather direct our efforts toward distinguishing the behaviour and mutation rates of coding and non-coding DNA.
In her age calculations for Y-DNA haplogroups Karafet relies on the assumptions of others, e.g. Jobling et al. (2004). The latter just double the previous estimates concerning STRs (Pritchard et al., 1999) and SNPs (Thomson et al., 2000) because of the new insight of generation intervals of 35 years:
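The five-fold figure follows from simple proportionality. A minimal sketch, under this post’s own assumption that only the “junk” fraction of sites mutates freely while the published rate was applied to all sites. The function name is hypothetical, and the 70 kya input is the age Karafet fixed for the CT node, used here only as an example.

```python
def rescaled_age(published_age_kya: float, neutral_fraction: float) -> float:
    """Scale an SNP-based age when only `neutral_fraction` of sites mutate freely.

    Assumption (this post's argument, not an established method): the effective
    mutation rate equals the assumed rate times the neutral fraction, and an
    age estimate is inversely proportional to the rate.
    """
    if not 0 < neutral_fraction <= 1:
        raise ValueError("neutral_fraction must be in (0, 1]")
    return published_age_kya / neutral_fraction


# 20% freely mutating "junk" makes every date five times older.
print(rescaled_age(70.0, 0.20))
```

Under that assumption a 70 kya node would move to roughly 350 kya. The sketch only shows the direction and size of the correction argued for here; it does not test whether the assumption holds.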
“The appropriate choice of generation time for ancient populations remains unclear, but the use of 35 years would almost double the TMRCA estimates.”
Thus her study also depends heavily on older studies that make some curious assumptions on Chimp and Junk Y-DNA, including Jobling’s claim that Y-DNA “carries few genes”, leaving one-half of Y-DNA virtually without any potential coding blueprint.
Thomson et al. (2000): “For the ages of major events in these trees, an estimate for the mutation (single-nucleotide substitution) rate was needed. To obtain this rate, the number of substitutions was found between a chimpanzee sequence and a human sequence for the genomic region in question. From this information, the mutation rates per site per year for the three genes were estimated”.
Note that these mutation rates are now probably way too high, due to the positive selection that occurred among chimps. This study by Thomson was cited by Jobling as: “The largest study of Y-SNP variation free from ascertainment bias, which is based on DHPLC [denaturing high-performance liquid chromatography] and proposes a common ancestor ~59 kya.”
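The calibration Thomson et al. describe can be sketched as follows. This is an illustration under stated assumptions, not their actual procedure: the rate is taken as the per-site human-chimp divergence divided by twice the split time (both lineages accumulate changes), and every number in the example is a hypothetical placeholder. It shows why extra substitutions on the chimp lineage would inflate the rate and shrink the resulting ages.

```python
def rate_from_divergence(substitutions: int, sites: int, split_years: float) -> float:
    """Substitutions per site per year, assuming equal rates on both lineages."""
    return substitutions / (sites * 2.0 * split_years)


def age_from_rate(mean_mutations_to_root: float, sites: int, rate: float) -> float:
    """Years to the common ancestor, given the mean number of mutations to the root."""
    return mean_mutations_to_root / (sites * rate)


# Hypothetical numbers, chosen only to show the direction of the effect.
sites = 50_000
split_years = 6e6
neutral_rate = rate_from_divergence(600, sites, split_years)
inflated_rate = rate_from_divergence(900, sites, split_years)  # extra chimp-lineage changes

age_neutral = age_from_rate(3.0, sites, neutral_rate)
age_inflated = age_from_rate(3.0, sites, inflated_rate)
print(round(age_neutral), round(age_inflated))
```

With these placeholder inputs the same human variation dates to about 60,000 years under the lower rate and about 40,000 years under the inflated one.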
Karafet’s assumptions on STR are based on similar principles: […] our analysis has several modeling limitations. […] In particular, we ignore population structure. Second, we assume selective neutrality of the Y chromosome.[…] It would be of considerable biological interest if natural selection were shown to have been an important force on the human Y chromosome, but the value of the Y chromosome as a tool for interpreting human history would then be reduced (Pritchard et al.,1999).
Jobling: “The human population is so large that, even given the low average mutation rate of ~2 × 10⁻⁸ per base per generation, we expect recurrent mutations to occur at every base of the Y chromosome in each global generation.”
Of course this can’t be true for coding DNA, where only a few DNA configurations are viable.
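Jobling’s estimate above is the product of two numbers. A rough sketch: the rate of about 2 × 10⁻⁸ per base per generation comes from the quote, while the count of living males is a round assumption made here for illustration.

```python
def expected_new_mutations_per_site(males: float, rate_per_base_per_generation: float) -> float:
    """Expected number of new mutations at a single Y site in one global generation."""
    return males * rate_per_base_per_generation


# 3 billion males is an assumed round figure, not taken from the source.
expected = expected_new_mutations_per_site(males=3e9, rate_per_base_per_generation=2e-8)
print(expected)
```

Any value well above 1 supports the claim for freely mutating sites. It says nothing about how many of those new mutations survive selection, which is the objection raised here for coding DNA.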

Jobling referring to Kayser (2000): “Studies that use mutation rates in calculations […] often quote average rates, such as 3.17 × 10⁻³ per microsatellite per generation”. Note that this study by Kayser was cited by Jobling as: “Still the largest published study to measure the mutation rates at Y-chromosomal microsatellites using father–son pairs. This is more laborious than using deep-rooting pedigrees or sperm pools, but produces more reliable measurements.”
Jobling: “Sequencing of the chimpanzee genome is underway, and promises a cornucopia of information about the evolution of our own genome. Assembly of a chimp genome sequence using the human sequence as a framework will be straightforward for most chromosomes, but it might prove difficult for the Y chromosome because of its evolutionary lability. It is to be hoped that expenditure of effort on the Y chromosome will be comparable to that on other chromosomes, and that its reputation as a gene-poor junk-rich delinquent will not lead to a reluctance to include it wholeheartedly in the sequencing effort.”
The investigation of Hughes et al. shows that Y-DNA is neither subject to evolutionary lability nor does it substantiate claims of being a “gene-poor junk-rich delinquent.”

So what is it all about?
Chimps deviated from humans because of some peculiar selective pressures that concern sperm competition (Nascimento et al., 2008).
This deviation thus probably affected chimps rather than humanoids, which seems to be confirmed by what was already known in the seventies: gorilla sperm is more similar to human sperm than chimp sperm is (Seuánez, 1976).
This can only make sense if the Y-DNA of chimps deviated from the common stock rather than the Y-DNA of humans. The Gorilla-Chimp-Human group split off too early from the Orangutan to presume that a specific correlation of human DNA to Orangutan DNA would be worthwhile, unless gorillas, like chimps, deviated disproportionately from the common human lineage since the Orangutan split. In that case, even primitive “Orangutan-like” features would be more valuable than chimps for making age estimates, at least concerning the Y chromosome.

King Kong illustrates interspecies mating is hard to imagine.

Male Y-DNA developed rapidly, but this doesn’t prove ancestral males developed preferences for certain kinds of ancestral females. Actually, the strikingly low differences between humans and chimps at the X-chromosome level even allow both species to have evolved together for a much longer time than the differences on the Y (and on other chromosomes) suggest. Free mixing may apply even more to early humans, where evolutionary forces that concern sexuality remained weaker. If a certain group of early “chumans” (ancestral chimp-humans) developed a chimp-like sexual behaviour that caused females to be so promiscuous as to trigger male sperm competition, then mainstream “chuman” males just didn’t get a chance anymore to add to the gene pool of the most promiscuous group. On the other hand, males that had already developed better sperm strategies lost their competitive edge in mainstream communities where female behaviour was less explicit. The female chimp has an estrus cycle of about 34 to 35 days. While in heat, the bare skin on her bottom becomes pink and swollen, and she may mate with several males. When did the males develop their mating preferences? And when did females lose their attractiveness to one of the emerging species? Sperm behaviour may have been the prime cause of the split, since I don’t think humans are known for being particularly selective in finding a mating partner. That humans and chimps stopped mating and mixing may thus have been preceded by a lost sperm competition among males, rather than by cross-group infertility. Somehow early humans did not follow this sexual chimp culture, or else (in this view) the split wouldn’t have occurred, given the sexual advantage of chimp males. Maybe early human males became discouraged by the explicit promiscuity and swollen bottoms of the females demanding sperm competition, or the early chimp females became discouraged beforehand from showing their pink bottoms to the early human losers of the sperm competition around them.
Still, chimp females and chimp males could have entered the human gene pool for a longer period, unless of course the Y-DNA changes among chimps were also a response to a new chimp-female receptivity to a certain kind of chuman sperm. However, evidence of a shared female evolution, if any, tends to outweigh all potential evidence of hybridization.
Speciation does not happen if Panmixia outweighs Fixation. In a simple formula:

P = 1 – F
P – “factor of Panmixia”
F – “factor of Fixation”

In this case F(ixation) could have happened because of two concurrent reproduction strategies among males, without isolation. Panmixia could have been fully in place for female “chumans”, to the point that they might have remained indistinguishable from one another for some time. The view that early hominids may have been human-chimpanzee hybrids has no empirical support in the animal world. However, Panmixia does not necessarily imply hybridization at any stage. The same lack of empirical evidence makes the Multiregional Hypothesis so very hard to prove. There are no empirical data on animals that persistently violate biological barriers. Humans, however, are essentially different from animals in much of their behavior, and the uniqueness of humans implies that there are no other examples, and thus, by definition, no empirical data from animal observation. In my view, maximum Panmixia is likely a feature of incipient humans, and “maybe” of chumans as well. Fixation as an exclusive result of sexual behavior and sperm competition would thus be perfectly in line with the Multiregional view.
The only way to account for the accumulation of human Y mutations over the whole population is to assume that the Y evolved in a process of change that involved the regular replacement of the whole male population of the species, always departing from a single ancestor. There certainly have been quite a few mutations since the human-chimp split, and it just doesn’t make sense to imagine evolution as a one-step event. Parallel lineages may have occurred sometimes, though successful mutations only occur once, and most probably one at a time. Moreover, there is no reason to assume that each successful mutation on the Y implies the emergence of a new species and the extinction of the ancestral species. Thus the selective forces resulted not only in the continuous reconstruction of the Y (Hughes et al.), but also in the continuous reconstruction of the whole male population, departing each time from a single male ancestor, no matter how small the change and on what part of the Y the successful mutation occurred. The only precondition, of course, would be a real selective advantage. The main implication for the nature of Y-DNA is: much less junk than was ever assumed. The evolutionary changes of orthologous MSY sequences that were a “little bit faster” could confuse mutation rates even more. Actually, in my view evolving Y-DNA does not allow so much random change, except for the acceleration of decay. I suspect the existence of non-conventional mechanisms leading to successful mutations. Furthermore, diminished random change would inevitably slow down the formation of new stable markers that are truly “random”.

Male competition - Humans lost! (Travis the Ladykiller).

Let’s not be confused here about the word evolution. Evolution in this context implies adaptation and non-random change caused by natural selection. Most of all, true evolution implies non-neutral Y markers, not the statistical accumulation of variance or diversity in junk DNA. Obviously, this is not what population geneticists want when calculating their mutation rates, since neutrality is their explicit and prime assumption. In all the relevant papers this assumed neutrality is explicitly mentioned.
The variation of coding sites is very low, since mutations in coding DNA could create a tricky situation. Except for decay, evolutionary change of coding DNA is usually limited to a set of polymorphisms. For instance, if only 10 polymorphisms are viable and do about the same thing, then this is all we will ever see, no matter how much time passes. Evolution of polymorphisms is not infinite. That is why you can grow the eyes of a mouse on the legs of a fly, using genes that are essentially similar in all species. This probably means that new mutations rather originate from another source, somewhere else on the chromosome. There is no accepted theory on the emergence of successful mutations as far as I know, though there are theories on the coding potential of palindromic elements, inverted repeats that, like direct repeats, can also be tandem repeats. My guess is that to gain a competitive edge you’ll need an increased supply of these repeats, like chimps have, i.e. some kind of genetic lab where new configurations can be tested without compromising existing, i.e. functional, genes. Somehow these palindromes find their source in blueprints, and we don’t know yet how loosely related they really are to coding sites. We can definitely observe constraints on the variance of STRs; this could be one.
The effective mutation rate of sites subject to selection is lower than that of sites not subject to selection. Population geneticists may be quite used to dealing with this, even without the need for deeper understanding: HVR versus coding region is comparable to fast STRs versus slow STRs. This may be no big deal, but only if very little of the Y actually codes for proteins. We don’t know how much is coding; we are only starting to understand the functionality of Y-DNA, as the study also indicates. If slow STRs are indeed (loosely) linked to coding regions, and fast STRs to HVR, then non-neutrality should be an issue to consider. Non-coding parts may be closely associated with coding parts, and actually this is what the rapidly “evolving” chimp Y-DNA suggests:

“By comparing the MSYs of the two species we show that they differ radically in sequence structure and gene content, indicating rapid evolution during the past 6 million years. The chimpanzee MSY contains twice as many massive palindromes as the human MSY, yet it has lost large fractions of the MSY protein-coding genes and gene families present in the last common ancestor.”

Note that the “lost part” of chimp Y-DNA is a powerful indication of the one-sided nature of chimp evolution, apparently causing a considerable degree of collateral damage. Remarkably, gorilla DNA doesn’t attest such a loss of the ancestral state. The X-degenerate region of the Y chromosome has retained all 16 genes in gorillas and humans alike, while the chimpanzee has lost 4 of the 16 genes since the divergence of the two species (Goto et al., 2009).
“Indeed, at 6 million years of separation, the difference in MSY gene content in chimpanzee and human is more comparable to the difference in autosomal gene content in chicken and human, at 310 million years of separation.”
The impact of change on human Y evolution remains unclear in the study. There can’t be any doubt that genetic decay was a principal dynamic all along in the evolution of Y chromosomes, but chimp DNA shows us that “wholesale renovation is the paramount theme in the continuing evolution of chimpanzee, human and perhaps other older MSYs.”

The dynamics of change are so widely different between chimps and humans, that the massive chimpanzee ampliconic regions being 44% larger than in human must have some evolutionary advantage.

Previous models of Y-chromosome evolution treated the chromosome as a uniform, homogeneous substrate for evolutionary change. In fact, the evolution of ampliconic sequences has outpaced that of X-degenerate sequences
Unlike the human MSY, nearly all of the chimpanzee MSY palindromes exist in multiple copies, so that each palindrome arm has potential partners for both intra- and interpalindrome gene conversion (non-reciprocal transfer) – Hughes et al., 2010

Thus, the DNA of non-coding intron regions has a function after all, one that is all about evolution. We don’t know if polymorphisms of an allele are the direct result of mutations in coding DNA, or replacements that pop up from an associated palindrome “lab”. These polymorphisms should be equivalent or functional within the same range. Repeat polymorphisms, on the other hand, could be associated with the corresponding polymorphism of an allele blueprint. Thus the non-neutral behavior could extend to much more than the few identified genes. The existence of each polymorphism should thus depend on its “evolutionary” success, and less on the statistical probability of its occurrence.
In short, my point is that Y-DNA variation may be less “random” than normally assumed. Much of the “junk” exists for the specific need of genetic recycling. Moreover, “stable” regions assumed to be the source of valid marker SNPs are subject to decay rather than being the scene of mutational dynamics. This is quite different from the blanket assumptions currently applied to assess genetic variance and age. Note that this issue has never been addressed in mathematical assessments dealing with variance and age.
I wonder how much fast-evolving DNA could ever contribute to a really valuable deep look into the ancestry of the human species, so let’s consider “slow STRs” and evolution. It must be very worthwhile to evaluate the selective forces acting on the Y among humans during the last 100,000 years. I could not find any reliable article on the subject, only rumors. Someone proposed reduced sizes of the male reproductive organ among Y-DNA haplogroups CF(xIJK), though the poor results I remember from a study among e.g. Germans would rather suggest the opposite. It would be interesting, though, to investigate selective forces that would e.g. explain the low Y-DNA haplogroup G values over a wide area including Europe as reminiscent of pre-IJK conditions. Male human “sizes” definitely attest specific human selective forces (compared to other species), and there is definitely a lot of variation among humans, though I don’t have a clue about any relation to Y-DNA. So far the size of the male reproductive organ is a reliable Y-DNA predictor only for gorillas: merely 4 cm! There is no guarantee of any relationship, and maybe the Y only involves the properties of sperm (and maybe male behaviour as well, including preferences for older females among chimps) and thus should rather be considered utterly invisible.
Old Y-DNA could thus still be preserved as a very early geographic pattern of early human migrations, only that in the future we should compare with gorillas rather than chimps in this Y-DNA matter to retrieve valid estimates of SNP mutation rates. However, this latest chimp Y-DNA study makes it very tenuous to continue assuming, as Jobling did, that the Y chromosome can be regarded as a neutral locus. We simply can’t presume that selective pressures on human Y-DNA ended ages ago, e.g. just before the chimps deviated from the humanoid lineage. What we should know is that the Karafet assumptions are built on thin air.
The chimpanzee evolved because of the sins of Eve, who decided to be promiscuous and thus degraded Adam to the state of an ape. If so, we have to get accustomed to the idea that humans still occupy paradise and that the animals were thrown out instead. I never considered myself a descendant of Adam nor of Eve, though. Now I know why.

Forever Chimps, due to the sins of Eve.


  • J. F. Hughes et al. – Chimpanzee and human Y chromosomes are remarkably divergent in structure and gene content (2010)
  • Tatiana M. Karafet et al. – New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree (2008)
  • H. Goto et al. – Evolution of X-degenerate Y chromosome genes in greater apes: conservation of gene content in human and gorilla, but not chimpanzee (2009)
  • Yali Xue et al. – Human Y Chromosome Base-Substitution Mutation Rate Measured by Direct Sequencing in a Deep-Rooting Pedigree (2009)
  • Mark A. Jobling and Chris Tyler-Smith – The Human Y Chromosome: An Evolutionary Marker Comes Of Age (2004)
  • Jaclyn M. Nascimento et al. – The use of optical tweezers to study sperm competition and motility in primates (2008)
  • Russell Thomson et al. – Recent common ancestry of human Y chromosomes: Evidence from DNA sequence data (2000)
  • H. Seuánez – Fluorescent (F) bodies in the spermatozoa of man and the great apes (1976)
  • M. Kayser et al. – Characteristics and frequency of germline mutations at microsatellite loci from the human Y chromosome, as revealed by direct observation in father/son pairs (2000)
  • J. K. Pritchard, M. T. Seielstad, A. Perez-Lezaun and M. W. Feldman – Population growth of human Y chromosomes: a study of Y chromosome microsatellites (1999)
  • Gráinne McGuire et al. – Models of Sequence Evolution for DNA Sequences Containing Gaps (2001)

Recommended reading:

  • John Hawks Weblog – A low human mutation rate may throw everything out of whack

Note: I was notified that this article had been erroneously cited elsewhere to support claims in favor of lower Y-DNA based date estimates. For this reason I bolded the phrases indicating my view that (much) higher Y-DNA based date estimates should be considered instead.