Abstract
Genome-wide premortem DNA methylation patterns can be computationally reconstructed from high-coverage DNA sequences of ancient samples. Because DNA methylation is more conserved across species than across tissues, and ancient DNA is typically extracted from bones and teeth, previous works utilizing ancient DNA methylation maps focused on studying evolutionary changes in the skeletal system. Here we suggest that DNA methylation patterns in one tissue may, under certain conditions, be informative on DNA methylation patterns in other tissues of the same individual. Using the fact that tissue-specific DNA methylation builds up during embryonic development, we identified the conditions that allow for such cross-tissue inference and devised an algorithm that carries it out. We trained the algorithm on methylation data from extant species and reached high precisions of up to 0.92 for validation datasets. We then used the algorithm on archaic humans, and identified more than 1,850 positions for which we were able to observe differential DNA methylation in prefrontal cortex neurons. These positions are linked to hundreds of genes, many of which are involved in neural functions such as structural and developmental processes. Six positions are located in the neuroblastoma breakpoint family (NBPF) of genes, which probably played a role in human brain evolution. The algorithm we present here allows for the examination of epigenetic changes in tissues and cell types that are absent from the palaeontological record, and therefore provides new ways to study the evolutionary impacts of epigenetic changes.
Main
Changes in gene regulation often underlie phenotypic divergence1,2,3,4, making the identification of regulatory differences between archaic humans (Neanderthals and Denisovans) and anatomically modern ones a major goal of palaeogenetics5. Because RNA is rarely present in palaeontological remains6, changes in gene regulation must be inferred indirectly from the ancient DNA (aDNA) sequences themselves. Previous works showed that aDNA degradation signals can be harnessed to computationally reconstruct premortem genome-wide DNA methylation maps7,8,9. DNA methylation is a key epigenetic mark that strongly affects the activity level of regulatory regions such as promoters and enhancers10. Hence, its reconstruction provides information on premortem gene activity patterns in ancient individuals. Indeed, the reconstruction of DNA methylation in Neanderthals, a Denisovan and anatomically modern humans allowed us to identify regulatory differences between these human groups and associate them with phenotypic changes, opening the field of palaeoepigenetics9,11,12.
However, DNA methylation is tissue-specific to the extent that methylation patterns in two different tissues of the same organism are often more diverged than methylation patterns in the same tissue in two different species13,14. Although occasionally aDNA is extracted from soft tissues such as skin15 and liver16, this is limited to a small number of well-preserved and relatively young samples. Generally, bones and teeth are the main sources of aDNA. In particular, these tissues are the source of all archaic human aDNA. Accordingly, our previous palaeoepigenetic studies focused on the evolution of the skeletal system9,12, but could provide only limited insights into other systems such as the nervous system8.
Yet, DNA methylation in one tissue still carries information on DNA methylation in other tissues of the same individual. The reason is that DNA methylation is erased almost completely in the zygote, and then re-established in a cell type-specific manner during development10. When a DNA methylation change that separates human groups is established during early developmental stages, it may propagate to all descendant cell types, and be simultaneously manifested in multiple tissues.
Here we show that for a certain class of DNA methylation changes, we are able to use parsimony considerations17 to predict the timing of the change during development, rendering skeletal DNA methylation changes informative on the methylation state in other tissues. Testing our method on modern primate tissues, we show that we achieve precision of 0.7–0.92 in using DNA methylation changes in one tissue to predict corresponding changes in another tissue.
We applied our method to predict DNA methylation changes in prefrontal neurons between modern humans, archaic humans and chimpanzees. We found 71 genes that are associated with differential neuronal DNA methylation separating modern and archaic humans, as well as 870 genes that are associated with differential neuronal DNA methylation separating all humans from chimpanzees. We show that some of these genes are involved in neural functions. Moreover, we discovered six methylation changes in genes that carry the Olduvai domain, whose copy number is associated with an increase in brain volume and cognitive function, suggesting that these changes might be important in human brain evolution.
Results
Given the almost complete reset of DNA methylation patterns in the zygote as part of the epigenetic reprogramming process18, evolutionary differences in DNA methylation patterns between species are manifested as changes that build up during embryonic development. A methylation change that occurs in cells that are not fully differentiated would propagate to all descendant cell lineages, unless reverted by a later change. For example, if a site goes through an evolutionary methylation change that builds up at a time that precedes the split of the mesodermal and the ectodermal germ layers, the difference in methylation level would show up both in bone and in neurons (Fig. 1a).
Fig. 1: a, Schematic embryonic development trees. DNA methylation changes along the tree would propagate to all descendant lineages. Considering a pair of tissues, for example bone and brain, methylation changes that occurred earlier than their developmental split (right) are fundamental and will affect both tissues. Changes that occurred after the split (left) are tissue-specific and will affect only one of the tissues. b, Different triad phylogenies and paired tissues are considered by the algorithm for validation and test sets. Methylation data are missing for the target ‘t’ in tissue ‘p’, but available for tissue ‘a’ for all species. c, Identification of configurations, out of all 12 possible, where the methylation state in the target in tissue ‘p’ can be predicted. When prediction is possible, the type of methylation change, whether fundamental (F) or tissue-specific (TS), is denoted. Note that for clarity, methylation levels are considered as binary variables.
For any pair of tissues or cell types, we denote by ‘fundamental changes’ all those evolutionary changes to DNA methylation that were established before the developmental split between the two tissues or cell types. Evolutionary DNA methylation changes that occurred after this developmental time point would be called tissue-specific changes (Fig. 1a). Note that the terms tissue-specific and fundamental methylation changes should be always understood in the context of two specific tissues or cell types. For example, a methylation change affecting all endodermal tissues is still considered tissue-specific if we compare, for example, lung with bone. The basis of the algorithm we develop below is the fact that if we are able to determine, for two tissues or cell types, whether an evolutionary change in DNA methylation is either fundamental or tissue-specific, then the methylation level in one tissue or cell type would be informative on the methylation level in the other tissue or cell type.
The inference procedure
We consider a three-species phylogeny, in which the species are marked as ‘reference’ (superscript r), ‘target’ (superscript t) and ‘outgroup’ (superscript o). For these three species, we examine five DNA methylation maps in two tissues (or cell types). In one tissue, denoted ‘all’ (superscript a), DNA methylation data are available for all three species. In the other tissue, denoted ‘partial’ (superscript p), DNA methylation data are available for the reference and the outgroup only (Fig. 1b). Our goal is to predict the methylation levels in the target in tissue ‘partial’.
A useful abstraction of the problem, which allows for convenient conceptualization of the approach, is to imagine that the DNA methylation in each CpG position is a binary variable that can be either high or low. We dub the combination of binary methylation levels across the five available DNA methylation maps in a specific CpG position a configuration. There are exactly three possible combinations of methylation states that describe CpG positions with differential methylation across the three species in tissue ‘all’ (Fig. 1c). An evolutionary methylation change in tissue ‘all’ can occur along the branch leading to the reference (reference-derived), along the branch leading to the target (target-derived) or along either of the two branches that separate the outgroup from the reference and target (outgroup-specific). Each of these three combinations may be associated with any of four different combinations of methylation states in the tissue ‘partial’, leading to a total of 12 possible configurations that should be examined (Fig. 1c).
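The count of 12 can be checked by direct enumeration. The following Python sketch is illustrative only and is not the implementation used in this work. The function and variable names are hypothetical, and the polarity of each change is fixed arbitrarily (the lineage carrying the change is coded 1 and the other two 0), on the assumption that the mirror-image polarity is equivalent under the binary abstraction.

```python
from itertools import product

SPECIES = ("reference", "target", "outgroup")
CHANGE_NAME = {
    "reference": "reference-derived",
    "target": "target-derived",
    "outgroup": "outgroup-specific",
}

def enumerate_configurations():
    """List every binary configuration over the five available methylation maps."""
    configurations = []
    for changed in SPECIES:
        # Tissue 'all': the lineage carrying the change is 1, the other two are 0.
        tissue_all = {species: int(species == changed) for species in SPECIES}
        # Tissue 'partial': only the reference and the outgroup are observed.
        for ref_p, out_p in product((0, 1), repeat=2):
            tissue_partial = {"reference": ref_p, "outgroup": out_p}
            configurations.append((CHANGE_NAME[changed], tissue_all, tissue_partial))
    return configurations

configurations = enumerate_configurations()
print(len(configurations))  # 3 change types x 4 partial-tissue states
```

Each entry pairs one of the three change types in tissue ‘all’ with one of the four observable states of the reference and the outgroup in tissue ‘partial’.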
For some configurations, it is possible to determine the developmental timing of the methylation change using parsimony, whereas for others it is not. In a previous publication17 we outlined the procedure and demonstrated it in a particular configuration. Here we examine all possible configurations, identify those in which inference on the ‘target’ methylation can be made, develop a detailed algorithm to carry out the inference procedure and use it to predict DNA methylation levels in archaic human brains.
Of the 12 possible configurations, there are five for which parsimony considerations provide a prediction for the timing of the developmental change in methylation (Fig. 1c). In three configurations we predict that the change was tissue-specific, in two that it was fundamental, and in the remaining configurations the timing of the change cannot be determined. In tissue-specific configurations we end up predicting that the methylation change detected in tissue ‘all’ does not hold in tissue ‘partial’. These cases are less relevant for the current study, where our goal is to identify methylation changes in tissue ‘partial’. In the remainder of the article, therefore, we focus solely on the two configurations in which we conclude that the methylation change was fundamental. Note that because data are missing in tissue ‘partial’ of the ‘target’, we cannot make any predictions regarding developmental timing of the methylation change in target-derived changes.
Quantifying the performance
Based on the abstraction above, we developed an algorithm that accounts for the non-binary nature of DNA methylation and identifies CpG positions that are consistent with having gone through a fundamental methylation change (Methods). In these positions, we predict that the methylation levels in ‘target’ in tissue ‘partial’ are closer to those in the outgroup (for a reference-derived methylation change) or to those in the reference (for an outgroup-specific methylation change) (Fig. 1c).
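Under the binary abstraction, the prediction rule for the two fundamental configurations can be sketched as follows. This is a simplified illustration and not the algorithm described in Methods, which handles non-binary methylation levels. It assumes that a change is called fundamental when the reference–outgroup difference seen in tissue ‘all’ is replicated, in the same direction, in tissue ‘partial’; the function name and the encoding (0 for low, 1 for high) are hypothetical.

```python
def predict_target_partial(ref_all, tgt_all, out_all, ref_partial, out_partial):
    """Return (change type, predicted target state in 'partial'),
    or None when no fundamental change is called."""
    # The reference-outgroup difference observed in tissue 'all' must be
    # replicated, with the same direction, in tissue 'partial'.
    replicated = (
        ref_all != out_all
        and ref_partial == ref_all
        and out_partial == out_all
    )
    if not replicated:
        # Covers target-derived changes (ref_all == out_all) and every
        # configuration that is not called fundamental in this sketch.
        return None
    if tgt_all == out_all:
        # Reference-derived: the target is predicted to resemble the outgroup.
        return ("reference-derived", out_partial)
    # Otherwise tgt_all == ref_all, so the change is outgroup-specific:
    # the target is predicted to resemble the reference.
    return ("outgroup-specific", ref_partial)

print(predict_target_partial(1, 0, 0, 1, 0))
print(predict_target_partial(1, 1, 0, 1, 0))
```

The sketch deliberately returns no call for the tissue-specific and undetermined configurations, because only the two fundamental configurations are used in the remainder of the article.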
Eventually, we would like to apply our algorithm to a triad of species made of modern human (reference), archaic human (target) and chimpanzee (outgroup). However, lacking DNA methylation in non-skeletal tissues of archaic humans, we optimized the parameters and quantified the performance based on a triad of extant species with available DNA methylation data in multiple tissues. To resemble the triad that we will eventually use, we used three modern primate species: modern humans as the ‘reference’, chimpanzees as the ‘target’ and rhesus macaques as the ‘outgroup’.
We trained the algorithm for the above triad of extant species, using heart as the tissue ‘all’ and kidney as the tissue ‘partial’ (Methods, Supplementary Tables 1 and 2 and Extended Data Figs. 1 and 2). For predicting reference-derived fundamental changes, we obtained a precision of 0.85, compared with an average of 0.22 in permuted data. For predicting outgroup-specific fundamental changes, we obtained a precision of 0.77, compared with an average of 0.40 in permuted data (Fig. 2a). Elevated precision for permuted data when predicting outgroup-specific fundamental changes is expected and, in fact, inherent to our algorithm. This stems from the fact that the evolutionary distance between the outgroup and the reference and/or target is, by definition, longer than the evolutionary distance between the target and the reference, leading to a greater similarity in the DNA methylation patterns of the reference and the target, and hence to an increased probability of finding by chance configurations that resemble outgroup-specific fundamental changes (‘Discussion’). In any case, for both reference-derived changes and outgroup-specific changes, the algorithm shows training set precision which is significantly higher than random.
Fig. 2: a, Precision of the algorithm using the optimized parameters on observed data (blue) compared with permutations (orange) in the detection of outgroup-specific (left) and reference-derived (right) fundamental changes. b, Validation of the algorithm on tissues that were not used for training. Error bars denote 1 s.d.
We validated our algorithm using DNA methylation data from liver and lung in these three species, which are tissues on which the algorithm was not trained. In each comparison, the precision achieved using the real data significantly exceeded the precision that was achieved using permuted data. The observed precision in predicting fundamental changes was in the range 0.7–0.84, compared with 0.21–0.37 on permuted data (Fig. 2b and Supplementary Table 3). Flipping the roles of reference and target, namely picking modern human as the target and chimpanzees as the reference, yielded very similar performance estimations (Supplementary Table 3 and Extended Data Fig. 3). To evaluate the statistical significance of the precision levels we obtained, we performed 1,000 permutations; in each, we randomly shuffled the tags ‘reference’, ‘target’ and ‘outgroup’ across the samples, keeping the total number of reference, target and outgroup samples fixed. For reference-derived inference, a fraction of 0.004 of the permutations yielded higher precision than our algorithm, whereas this fraction was 0.021 for outgroup-specific inference. Combined, these results show that our algorithm achieves significantly higher precision than expected by chance, even when generalized to tissues and a species combination on which it was not trained.
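The permutation scheme described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in this work: `statistic` is a hypothetical callable that would rerun the detection on relabelled samples and return the resulting precision, and precision is taken here as the fraction of called positions that are confirmed in the held-out data.

```python
import random

def precision(true_positives, false_positives):
    """Fraction of called positions that turn out to be correct."""
    called = true_positives + false_positives
    if called == 0:
        raise ValueError("no positions were called")
    return true_positives / called

def fraction_of_permutations_above(labels, statistic, observed,
                                   n_permutations=1000, seed=0):
    """Shuffle species tags across samples and report the fraction of
    permutations whose statistic is strictly higher than the observed one."""
    rng = random.Random(seed)
    shuffled = list(labels)
    n_above = 0
    for _ in range(n_permutations):
        # Shuffling keeps the number of 'reference', 'target' and 'outgroup'
        # tags fixed; only their assignment to samples changes.
        rng.shuffle(shuffled)
        if statistic(shuffled) > observed:
            n_above += 1
    return n_above / n_permutations
```

Shuffling a fixed list of tags, rather than drawing tags at random, is what keeps the group sizes constant across permutations.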
Because of the need to achieve sufficient statistical power, reconstructing DNA methylation maps of aDNA requires the use of moving averages across windows of consecutive CpG positions8. Although we showed in the past that this smoothing does not substantially affect the detection of differentially methylated regions (DMRs)9, we nevertheless wished to examine the effect of smoothing on the precision of our algorithm. We therefore applied the algorithm to a smoothed version of the above data (Methods) and found that the precision remains very similar to that obtained for non-smoothed data, and that it is always significantly higher than the precision achieved on permuted data (Supplementary Table 3).
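As a rough illustration of what smoothing across consecutive CpG positions involves, the sketch below applies a centred moving average to a list of per-CpG methylation levels. The window width of five positions and the truncation of windows at the ends of the list are assumptions made for this example; the smoothing actually applied is specified in Methods.

```python
def smooth_methylation(levels, window=5):
    """Centred moving average over consecutive CpG positions.
    Windows are truncated at both ends of the list."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(levels)):
        neighbours = levels[max(0, i - half): i + half + 1]
        smoothed.append(sum(neighbours) / len(neighbours))
    return smoothed

print(smooth_methylation([0.9, 0.8, 0.1, 0.85, 0.9], window=3))
```

A wider window lowers the noise of each estimate at the cost of blurring sharp changes between neighbouring positions.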
Identifying differential methylation in neurons
The method we developed can be used to infer aDNA methylation in any tissue. Because the large volume and high complexity of the human brain stand out as hallmarks of modern humans, we decided to demonstrate the applicability of our method to this organ first. To this end, we applied the algorithm with bone serving as the tissue ‘all’ and neuronal cell lines from the prefrontal cortex as the tissue ‘partial’.
For bones, we considered published DNA methylation data (including data generated by us in previous works), data of four human femora measured by us for the current study and reconstructed DNA methylation maps in ancient samples, including maps we reconstructed in previous works10,17 and a new map of a Mesolithic anatomically modern human from Sweden19 that we reconstructed for this study (Methods and Supplementary Table 4). DNA methylation in prefrontal cortex cell lines was taken from published studies20,21. In total, neuronal DNA methylation data comprise 26 modern humans, 11 chimpanzees and 15 macaques, and bone DNA methylation data comprise 11 modern humans (5 measured, 6 reconstructed), 2 archaic humans, 7 chimpanzees and 10 macaques.
We first estimated the precision of our algorithm on a triad of extant species, composed of modern human as reference, chimpanzee as target, and rhesus macaque as outgroup. This yielded a precision of 0.81 in predicting outgroup-specific fundamental changes, and 0.92 in predicting reference-derived fundamental changes, both being significantly higher than the values achieved for random permutations (Fig. 3a).
We then applied the algorithm to the triad composed of modern human as ‘reference’, archaic human as ‘target’ and chimpanzee as ‘outgroup’, and identified 1,750 CpG positions presenting putative outgroup-specific fundamental changes, and 122 CpG positions presenting putative reference-derived fundamental changes. Of these, 1,131 CpG positions (64.6%), presenting putative outgroup-specific changes, are located in the promoter or gene body of 870 genes, and 80 CpG positions (61.1%), presenting putative reference-derived changes, are located in the promoter or gene body of 71 genes (Supplementary Table 5).
Archaic humans, which serve as ‘target’, comprise only two samples. To assess the possible reduction in power because of this small sample set, we repeated the analysis on the triad of extant species, but randomly downsampled the ‘target’ chimpanzee from 11 samples to only 2. We still detected 7 of the original 8 CpG positions associated with reference-derived changes, and 483 of the 531 original CpG positions associated with outgroup-specific changes. This suggests that the use of only two samples in the target set reduces the power of our analysis by approximately 10–12%.
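The reduction in power follows directly from the counts reported above. The short calculation below is a worked example only; the function name is hypothetical.

```python
def power_loss(detected_after, detected_before):
    """Fraction of originally detected positions lost after downsampling."""
    return 1 - detected_after / detected_before

reference_derived_loss = power_loss(7, 8)       # 1 of 8 positions lost
outgroup_specific_loss = power_loss(483, 531)   # 48 of 531 positions lost

print(f"{reference_derived_loss:.3f}")
print(f"{outgroup_specific_loss:.3f}")
```

The two fractions come to 12.5% and roughly 9%, in line with the approximate range quoted above.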
We accounted for batch effects and evaluated the false discovery rate (FDR) for our actual triad by performing 1,000 permutations, shuffling the tags ‘reference’, ‘target’ and ‘outgroup’ of the bone samples and counting the number of detected reference-derived and outgroup-specific changes. On average, we observed 0.49 reference-derived CpG positions (FDR = 0.005) and 68.6 outgroup-specific CpG positions (FDR = 0.044).
A considerable number of the CpG positions associated with fundamental changes are grouped in clusters. This observation probably reflects the known correlation between adjacent CpG positions, and indicates regional changes in methylation in regulatory regions, where one often finds a close grouping of multiple differentially methylated CpGs22,23. We hypothesized that even isolated CpG positions showing fundamental changes would represent regional methylation changes. Namely, that the adjacent CpG positions would show changes in DNA methylation compatible with the same fundamental changes, despite not crossing the significance threshold. To test this, we computed the average methylation in such adjacent CpG positions, and compared it with both the nearby isolated CpG positions showing a fundamental change and the average methylation level in the genome. As expected, we found that in 89.1% of the cases the methylation level in adjacent CpGs clustered with the neighbouring isolated CpG position, suggesting that even loci featuring an isolated CpG position with a fundamental change might, in fact, represent a regional methylation change.
CpG positions with fundamental methylation changes are expected to preferentially fall inside regions we detected in a previous work as differentially methylated across the same triad, using a subset of the bone samples used in the current study9. Indeed, 172 of the 1,750 outgroup-specific positions (9.8%) fall in the list of 2,031 DMRs that separate chimpanzees and humans (P < 10^−200, hypergeometric test), and 3 of the 122 reference-derived positions (2.5%) fall in the list of 873 anatomically modern human-derived DMRs (P = 2.8 × 10^−6). Likewise, 137 of the 870 (15.7%) genes associated with outgroup-specific changes overlap with the list of 1,185 differentially methylated genes separating chimpanzees and humans (P = 7.9 × 10^−40), and 5 of the 71 (7%) genes associated with reference-derived changes overlap with the list of 588 differentially methylated genes that have changed in modern humans (P = 4.8 × 10^−3). Notably, we do not expect a complete overlap, because we used an upgraded version of the reconstruction algorithm, including new filtering techniques and histogram matching24.
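For readers unfamiliar with the hypergeometric test, an overlap P value of this kind is the probability of seeing at least the observed number of shared items when one list is drawn at random from a common background. The sketch below uses only the Python standard library and is not the code used in this work. The background size (`population`) is not stated in this section, so the example call uses small made-up numbers rather than the counts above.

```python
from math import comb

def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    population: size of the background set (hypothetical; not given here)
    successes:  items in the background that belong to the first list
    draws:      size of the second list
    k:          observed overlap between the two lists
    """
    upper = min(successes, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(population, draws)

# Toy example: background of 10 items, 5 in the first list,
# 5 drawn for the second list, all 5 shared.
print(hypergeom_upper_tail(5, population=10, successes=5, draws=5))
```

For the gene-level comparison, for instance, `draws` would correspond to the 870 genes, `successes` to the 1,185 differentially methylated genes and `k` to the 137 shared genes, with `population` set to the number of genes in the background.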
Testing for enrichment in biological functions of our gene lists yielded no significant terms. This is expected, given that a fundamental methylation change is, by definition, a change that is shared across multiple tissues. Specifically, fundamental methylation changes that are shared between bone and brain could be associated with brain, bone or any other tissue (Fig. 1a). Therefore, these changes could be related to a wide range of functions, making it unlikely that specific functional enrichment would be observed. However, given the importance of the brain in human evolution, and the lack of methylation data from brain in archaic humans, we focus here on changes that might have a relation to the nervous system.
We focused on genes associated with a particularly large number of positions with fundamental changes (Fig. 4 and Supplementary Table 6). In this context, we identified 26 outgroup-specific changes and 3 reference-derived changes localized in the locus containing the genes ZNF707 and CCDC166. Notably, our knowledge about the functions of these genes is limited, with scant evidence pointing at potential relevance to cancer25,26. Nonetheless, given its classification as a zinc-finger protein, it is reasonable to hypothesize that ZNF707 probably possesses an as-yet undiscovered regulatory role.
Fig. 4: a, Genes featuring more than four outgroup-specific fundamental changes. Instances in which changes overlap multiple genes are consolidated in the same column. Methylation alterations overlapping genes associated with the Olduvai domain (NBPF8, NBPF9 and NBPF10) are collectively assembled in the column furthest right. b, Genes featuring more than two reference-derived fundamental changes, along with the reference-derived change that intersects with NBPF9, which harbours the Olduvai domain.
Next in terms of the number of fundamental changes is the gene ADAMTS2, which exhibits 15 outgroup-specific changes. This gene product regulates neuronal migration by cleaving reelin27, and has been implicated in schizophrenia because it activates dopaminergic signalling and exhibits overexpression in the blood of people during episodes of psychosis (a phenomenon that can be reversed by antipsychotic medications)28,29. Moreover, in the Human Phenotype Ontology30, ADAMTS2 shows associations with language impairment, intellectual disability, microcephaly, neurodevelopmental delay and other brain-related terms. The gene NF-YA harbours five outgroup-specific changes. NF-YA is a component of the NF-Y transcription factor, which protects neurons from cell death31. Inactivation of NF-Y has been linked to various forms of neuronal pathologies32,33. PIWIL1 exhibits seven outgroup-specific fundamental changes and regulates neuronal polarization and migration34. BRCA1 harbours five outgroup-specific fundamental changes. Apart from being one of the most well-studied tumour suppressors, BRCA1 plays a role in the survival of neural progenitors35. It is also expressed in the hippocampus, and has been observed to be under-expressed in patients with Alzheimer’s disease. Knocking down this gene in the dentate gyrus leads to learning and memory deficits, impaired synaptic plasticity and neuronal shrinkage36. Notably, hypomethylation in a specific region in BRCA1, which does not overlap any of the positions with fundamental changes, has been associated with Alzheimer’s disease37. Moreover, a variant of BRCA1 has been associated with intellectual disability38. AHRR, which also harbours five changes, represses aryl hydrocarbon receptor (AHR), an important mediator for many cellular events. The main hallmarks of brain ageing, including oxidative stress, neuroinflammation and neurogenesis, are affected by AHR, suggesting its function as a regulator for ageing in the brain, together with its effect on nervous system development39,40. The gene OTUB1, which encodes the deubiquitylase OTUB1, also exhibits five changes. This gene mediates neuronal survival after intracerebral haemorrhage41,42. The effects of this protein on neuroimmune response contribute to pathologies associated with brain diseases such as Parkinson’s disease and multiple sclerosis43,44.
Turning our attention to genes associated with reference-derived changes, of eight such genes with three altered positions (the highest number of such alterations in a gene), three show a relationship to the brain. Mutations in TMEM216 have been linked to syndromes that involve brain abnormalities45,46. TMEM216 regulates ciliogenesis, a process with substantial importance in brain development, and disruptions in TMEM216 function lead to impairments in various neurodevelopmental processes47. Another gene linked to ciliary function is CROCC. CROCC encodes rootletin, a protein involved in the formation of ciliary rootlets. Mutations in this gene in Drosophila have been associated with sensory deficits48. PSMC2 is a component of the 26S proteasome complex that plays a role in various neural processes, including synaptic plasticity49,50 and brain autophagy51. Notably, impairments in the 26S proteasome complex have been strongly associated with the presence of brain inclusions such as alpha-synuclein and tau tangles52.
Five CpG positions associated with outgroup-specific fundamental changes, and another one associated with reference-derived changes, are located in genes from the neuroblastoma breakpoint family (NBPF), inside the LOC100288142 locus (Fig. 4). NBPF proteins are enriched with a protein domain called the Olduvai domain that is key to human brain evolution. This domain, which is found almost exclusively in NBPF proteins, went through a human-specific increase in copy number, reaching approximately 300 copies. Chimpanzees show fewer than half that number of copies, and the number drops to several dozen copies in other Old World monkeys, and to around a single copy in non-primate mammals. The domain is absent in non-mammals53. Olduvai domain copy number variation is correlated with brain volume54,55, cognitive functions56 and autistic symptoms57,58,59. It is also negatively correlated with schizophrenia60. NBPF genes in general have been found to enhance neural stem cell proliferation61. Four of the five outgroup-specific fundamental changes are located in NBPF10, one of the NBPF genes with the highest number of human-specific Olduvai domain copies. NBPF10 is adjacent to NOTCH2NL, an additional gene linked to human brain volume62,63, and it has been suggested that the two evolved in tandem64.
To further explore potential regulatory effects of the differential methylation, we examined whether CpG positions with fundamental methylation changes tend to overlap enhancers associated with bones and neurons more than would be expected by chance (Methods). In bones, we observed a significant overlap for both outgroup-specific (298 intersecting positions; P = 9.04 × 10^−10) and reference-derived CpG positions (28 intersecting positions; P = 2.66 × 10^−4). In neurons, significant overlap was only evident for outgroup-specific CpG positions (81 intersecting positions; P = 5.37 × 10^−7), whereas no significant overlap was observed for reference-derived CpG positions (4 intersecting positions). Next, we examined genes that are associated with enhancers that overlap CpG positions with differential methylation. Although there are no such genes for reference-derived CpG positions, there are 330 and 480 genes for outgroup-specific CpG positions in bones and neurons, respectively. Enrichment analysis of these genes revealed similar patterns, despite representing two distinct tissues, highlighting organelle functions over tissue-specific ones (Supplementary Table 7). This aligns with expectations, considering that fundamental changes affect multiple tissues.
We next searched for motifs that overlap these enhancers and transcription factors that bind these motifs (Supplementary Table 8). In outgroup-specific changes we found enrichment of transcription factors that are related to brain development (GLI1, GLI2, ZIC1, NHLH2, NR2F2, ZIC2 and ZNF148; FDR = 2.3 × 10^−3) and specifically forebrain development (GLI1, GLI2, ZIC1, NHLH2 and NR2F2; FDR = 0.01).
Application to bulk prefrontal cortex
In addition to prefrontal neurons, DNA methylation is also available from bulk prefrontal cortex tissue from humans, chimpanzees and rhesus macaques (Methods and Extended Data Fig. 4)65,66. To examine whether the methylation changes we identified in neurons can be detected in whole tissues, we repeated the analysis with these bulk DNA methylation measurements.
We first evaluated the performance of our algorithm using the same triad of extant species as we used before, setting modern human as ‘reference’, chimpanzee as ‘target’ and rhesus macaque as ‘outgroup’, with bone serving as the tissue ‘all’ and bulk prefrontal cortex as the tissue ‘partial’. Unfortunately, there is a small number of samples of DNA methylation in bulk prefrontal cortex (three humans, three chimpanzees and two rhesus macaques) with relatively lower coverage (Supplementary Table 4), compromising the power of our approach to detect differentially methylated positions. Indeed, although the precision we achieved is significantly higher than for permuted data (P < 10^−4), we observed lower values than in previous analyses. We obtained a precision of 0.66 in predicting reference-derived changes (compared with a mean of 0.32 for permuted data) and a precision of 0.55 in predicting outgroup-specific changes (compared with a mean of 0.36 for permuted data) (Fig. 3b).
We then considered archaic humans as ‘target’ and chimpanzees as ‘outgroup’ and applied our algorithm to predict differential methylation in bulk brain tissue. We found 396 CpG positions associated with putative outgroup-specific changes and 53 CpG positions associated with putative reference-derived changes. The outgroup-specific CpG positions are associated with 248 genes and the reference-derived CpG positions are associated with 35 genes (Supplementary Table 9). Reassuringly, we found high consistency between these CpG positions and those identified using prefrontal neurons. In total, 156 of the 396 outgroup-specific CpG positions appear among the 1,750 positions identified using prefrontal neurons (P < 10^−300, hypergeometric test) and 8 of the 53 reference-derived CpG positions appear among the 122 positions identified using prefrontal neurons (P = 4.7 × 10^−44). Similarly, of the 248 genes associated with outgroup-specific changes in bulk prefrontal cortex, 116 were found in the corresponding list of 870 genes identified using prefrontal neurons (P < 10^−100) and 8 of the 35 genes associated with reference-derived changes were also identified using neurons (P = 3.54 × 10^−16).
Discussion
Based on the embryonic developmental timing of an evolutionary methylation change, we defined two families of changes: fundamental changes are those that occurred before the developmental split between a pair of tissues, whereas tissue-specific changes are those that occurred after the split. In this work, we developed an algorithm to detect CpG positions that underwent fundamental changes, because they result in DNA methylation differences across the examined species that are also replicated in the tissue ‘partial’ (Fig. 1c). Similar methodology can be used to detect tissue-specific changes as well, although these result in no change in DNA methylation across the species in the tissue ‘partial’ (Fig. 1c), hence potentially presenting more limited evolutionary interest. The main algorithmic modification that is required would be to use statistical tests for equivalence, such as the two one-sided t-tests (TOST) procedure67.
The algorithm can be used to determine the methylation state in non-skeletal tissues of archaic humans, as long as methylation data are available for this tissue in both modern humans and chimpanzees (or any other non-human ape, in the case that bone methylation data are available for this species as well). This ability to obtain information on DNA methylation in archaic tissues that are not accessible in the palaeontological record opens new ways to examine changes in gene regulation and their potential effect on evolutionary adaptations in humans.
To quantify performance, we tested the algorithm on three extant species, comprising modern humans, chimpanzees and rhesus macaques (Fig. 2a). The longer branches on this phylogenetic tree (median divergence time between modern humans and rhesus macaques is 28.9 million years ago (Ma), compared with the median divergence time between modern humans and chimpanzees of 6.4 Ma (ref. 68)) result in an underestimation of the performance of the algorithm, because the likelihood of independent reversal of the methylation state along each branch increases. We therefore estimate that our algorithm works with even higher precisions than those reported here. We used the algorithm to find genomic positions in which the methylation state can be determined in the brains of archaic humans. We focused on the brain because of its central role in human evolution and the potential implications of brain-related differences between human groups, and between humans and chimpanzees. However, the algorithm we present is general and can be used to determine the methylation state in other archaic tissues. In fact, brain is probably among the most challenging choices, given its large developmental distance from bone (Fig. 1a). A tissue that is developmentally closer to bone, such as muscle or heart, is likely to result in a larger number of positions whose methylation state can be determined.
We showed that we are able to detect two types of fundamental DNA methylation changes. Reference-derived changes are those in which DNA methylation levels in modern humans are different from those in both archaic humans and chimpanzees, representing methylation patterns that are unique to modern humans. Outgroup-specific changes are those in which DNA methylation levels in both modern and archaic humans are different from those in chimpanzees, representing methylation patterns that are shared across modern and archaic humans. In all analyses, we detect fewer reference-derived positions than outgroup-specific ones, although with higher precision. This is expected considering the much shorter evolutionary time span in which the methylation change could have occurred for reference-derived changes compared with outgroup-specific ones (Fig. 1b). Shorter distances mean fewer methylation change events, which result in lower numbers of detected events, but also in lower probability for reversal of methylation changes, and therefore higher precision of parsimony-based inference.
We present here evidence for a strong connection between genes associated with fundamental changes and neuronal functions. Among the most interesting findings is the identification of six fundamental changes in NBPF genes carrying Olduvai domains, which seem to be associated with human brain evolution. Whereas work on the association of Olduvai domains with brain evolution has focused thus far on their copy number, our findings suggest that methylation changes might also have a role; in particular, NBPF10 has four fundamental methylation changes and is also one of the genes with the largest number of Olduvai domains. That said, future research is needed to establish causality and validate the functional importance of these methylation changes for phenotypes.
The number of CpG positions that can be tested for differential methylation is constrained by the limited availability of DNA methylation data from primate bone, because part of the data was produced using the Illumina Infinium MethylationEPIC BeadChip methylation array (Supplementary Table 4), which covers approximately 3% of the CpG positions in the genome (although it is enriched for functional ones). As a result, our set of CpG positions associated with fundamental changes represents only a subset of the positions that show such an association. In addition, restrictions come from the fact that we considered a CpG position only if data were available in at least two samples from each group. This criterion resulted in a restricted set of CpGs available for analysis, ranging from 1% to 17% of autosomal CpGs, depending on the tissues selected for each analysis. These limitations underscore the need for an expanded dataset of bone methylation.
For reference-derived changes, we require that the mean methylation in the ‘target’ will be closer to the mean methylation of the ‘outgroup’ than to that of the ‘reference’, whereas for outgroup-specific changes we require the opposite. Addition of this requirement was important for reducing the level of false predictions, which result from the small number of samples in the ‘target’ compared with the other groups. However, when this additional requirement is used, the algorithm is unable to detect CpG positions that have gone through gradual methylation change (for example, that the ‘outgroup’ is unmethylated, the ‘target’ is partially methylated and the ‘reference’ is highly methylated). Such positions could potentially be detected as being simultaneously reference-derived and outgroup-specific methylation changes. However, mean methylation in the ‘target’ group will be closer to either the ‘outgroup’ or the ‘reference’. Therefore, the additional requirement will lead the algorithm to detect only one of the two types of fundamental changes, and information on gradual changes will be lost. We hope that in the future more methylation maps of archaic humans will be published, making use of the additional requirement unnecessary, and allowing for the detection of gradual fundamental methylation changes.
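The effect of the additional requirement on a gradually changing position can be illustrated with a small sketch. The function name and the numeric methylation levels are hypothetical, and treating an exact tie as compatible with neither call is our assumption.

```python
def closer_group(mean_ref, mean_target, mean_out):
    """Return which group the 'target' mean is closer to:
    'reference', 'outgroup' or 'tie'."""
    d_ref = abs(mean_target - mean_ref)
    d_out = abs(mean_target - mean_out)
    if d_out < d_ref:
        return "outgroup"   # compatible only with a reference-derived call
    if d_ref < d_out:
        return "reference"  # compatible only with an outgroup-specific call
    return "tie"            # compatible with neither call

# Gradual change: outgroup unmethylated, target partially methylated,
# reference highly methylated.
print(closer_group(mean_ref=0.9, mean_target=0.4, mean_out=0.0))  # outgroup
```

In this example the position can be reported only as a reference-derived change, even though the ‘target’ also differs from the ‘outgroup’, which is the loss of information on gradual changes described above.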
Overall, we present an algorithm to infer DNA methylation in non-skeletal archaic tissues. The algorithm can be applied to any triad of species or groups and could be useful whenever DNA methylation in a certain tissue is unavailable or difficult to obtain.
Methods
DNA methylation data
We collected and generated DNA methylation data from seven tissues and cell types (neurons, bone, kidney, heart, liver, lung and bulk brain) across three species (modern human, chimpanzee and rhesus macaque). In addition, we used reconstructed DNA methylation data in bones of modern and archaic humans. Data were collected only from autosomes, to avoid the unique methylation patterns that characterize sex chromosomes.
Bone DNA methylation of present-day individuals published in this study
DNA methylation maps from four femur head bones from present-day humans were generated using whole-genome bisulfite sequencing (WGBS). Patients were females with osteoarthritis. Samples were extracted during total hip replacement surgery and were taken from the healthy part of the bone. All four patients signed a form according to Helsinki approval 0178-13-HMO. DNA extraction and the WGBS protocol are described in a previous publication9 (Supplementary Table 4). In brief, DNA was extracted from bones using a QIAamp DNA Investigator kit (Qiagen, catalogue no. 56504) and DNA libraries were built with the Illumina TruSeq Sample Preparation kit. Bisulfite treatment was applied in two rounds using the EpiTect Bisulfite kit (Qiagen) and paired-end sequencing was performed on an Illumina HiSeq 2000 instrument. We aligned the reads to the hg19 genome assembly using Bismark v.0.23.0 with the following parameters: --bowtie2 --non_bs_mm -p 4 --multicore 4. We then used bismark_methylation_extractor to extract methylation calls with the following parameters: -p --parallel 4 --bedGraph. We did not filter CpG positions by coverage, as we used the default Bismark setting of including positions with a minimum coverage of 1×.
Previously published modern DNA methylation
Neuronal data include WGBS DNA methylation data measured in neurons isolated from prefrontal cortices of humans, chimpanzees and macaques. Data for humans were downloaded from Gene Expression Omnibus (GEO) accession number GSE107638 (ref. 20), taking only the 25 healthy controls. Data for chimpanzees and macaques were downloaded from GEO accession number GSE151768 (ref. 21). WGBS DNA methylation data for kidneys, hearts, livers and lungs of four specimens for each of these species were downloaded from GEO accession number GSE112356 (refs. 69,70). Bulk WGBS brain data that were generated from prefrontal cortices of three modern humans and three chimpanzees were downloaded from GEO accession number GSE37202 (ref. 66). Bulk WGBS data of two macaques were downloaded from GEO accession number GSE77124 (ref. 65). For bone, we used ten Illumina Infinium MethylationEPIC BeadChip (850K) methylation arrays from rhesus71 and data acquired by Gokhman et al.9 including one modern human and one chimpanzee WGBS map, one chimpanzee reduced representation bisulfite sequencing map and four chimpanzee 850K methylation arrays (Supplementary Table 4).
Most non-human DNA methylation data were already mapped to the human hg19 reference genome. However, this was not done for rhesus bulk brain samples, and we therefore mapped these data to hg19 using liftOver72. This mapped ~30% of the rhesus CpG positions to a corresponding CpG position in human. However, in a significantly enriched number of cases, the rhesus CpG positions were mapped to a position preceding a human CpG position by a single base (Extended Data Fig. 4). We considered these as valid mappings, ending up with ~40% of the rhesus CpG positions being successfully mapped to the human hg19 genome reference.
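The acceptance rule described above can be sketched as follows. Coordinates are treated as integers on a single chromosome, and `lifted_positions` and `human_cpgs` are hypothetical inputs; the sketch illustrates only the rule for accepting a lifted coordinate, not liftOver itself.

```python
def assign_to_human_cpg(lifted_positions, human_cpgs):
    """Map each lifted rhesus CpG coordinate to a human CpG coordinate.

    A lifted position is accepted if it coincides with a human CpG, or if it
    lies exactly one base before one; anything else is left unmapped (None).
    """
    human_cpgs = set(human_cpgs)
    assigned = {}
    for pos in lifted_positions:
        if pos in human_cpgs:
            assigned[pos] = pos
        elif pos + 1 in human_cpgs:
            assigned[pos] = pos + 1   # off-by-one case, treated as valid
        else:
            assigned[pos] = None
    return assigned

print(assign_to_human_cpg([100, 249, 400], human_cpgs=[100, 250, 900]))
# {100: 100, 249: 250, 400: None}
```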
Ancient DNA methylation maps
We used our newest version of the RoAM software tool24 to reconstruct the methylome of a Mesolithic anatomically modern human individual from Stora Karlsö, Sweden (~9,000 years ago) that had been previously sequenced19 (Supplementary Table 4). In addition, we used our previously published methylation maps of five ancient anatomically modern humans9, one Neanderthal and one Denisovan8, which at the time of writing are the only uracil DNA glycosylase-treated high-coverage archaic samples, two prerequisites for computational reconstruction of aDNA methylation17.
Batch effects
To minimize batch effects stemming from combining data from different sources, we took several measures. First, all data used for optimization and cross-validation were taken from the same laboratory69,70. Similarly, all the brain data, including both prefrontal cortex neurons and bulk brain tissue, were taken from the same laboratory20,21,65,66.
Most of the bone data were produced by us, either specifically for this paper or in previous work9. To address consistency between the modern bone samples and the ancient ones, we used histogram matching during the DNA methylation reconstruction, fitting the reconstructed histogram to that of previously published modern bone data9,73.
Finally, remaining batch effects are accounted for in the estimated false discovery rates of our method.
Detecting CpG positions with differential DNA methylation
The algorithm receives methylation data from three species and two tissues (Fig. 1b). Only CpG positions where, for each tissue and species, data are available from at least two samples were considered. In addition, only positions where the variance in methylation levels in each tissue and species was <0.08 were considered. This threshold was selected to guarantee that a minimum set of two samples will have at most a 40% difference in methylation level.
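The link between the variance threshold and the 40% figure follows if the variance is the unbiased sample variance (an assumption on our part, but the one consistent with the numbers quoted): for two samples with levels \(x_1\) and \(x_2\), the sample variance is \((x_1-x_2)^2/2\), so a variance below 0.08 corresponds to \(|x_1-x_2| < 0.4\). A minimal sketch of the two filters, with hypothetical names and toy values:

```python
from statistics import variance  # unbiased sample variance (divides by n - 1)

MIN_SAMPLES = 2
MAX_VARIANCE = 0.08

def passes_filter(levels_by_group):
    """levels_by_group maps (species, tissue) -> list of methylation levels
    in [0, 1] at one CpG position. The position is kept only if every group
    has at least two samples and a sample variance below the threshold."""
    for levels in levels_by_group.values():
        if len(levels) < MIN_SAMPLES:
            return False
        if variance(levels) >= MAX_VARIANCE:
            return False
    return True

position = {
    ("human", "heart"): [0.30, 0.60],            # difference 0.3, variance 0.045
    ("chimpanzee", "heart"): [0.10, 0.15, 0.20],
}
print(passes_filter(position))  # True
```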
Identifying reference-derived fundamental changes
Let us have methylation data for N CpG positions and let \(0\le {m}_{i,\;j}^{s,u}\le 1\) be the measured methylation level of sample j at CpG position i (i = 1, …, N), in species s and tissue u. We say that the methylation level at CpG position i is reference-derived in tissue ‘all’ if
Here, \({t}_{i}^{r,a}\) is the statistic of an equal-variance t-test at CpG position i in tissue ‘all’, contrasting the ‘reference’ with the ‘target’ and ‘outgroup’, and \({t}^{r,a}\) and \({\sigma }^{r,a}\) are the estimators of the mean and the standard deviation of the statistic across the N positions. \({k}_{a}^{{rd}}\) is a parameter.
Similarly, we say that CpG position i shows differential methylation between the ‘reference’ and the ‘outgroup’ in tissue ‘partial’ if
Here, \({t}_{i}^{{ro},p}\) is the statistic of an equal-variance t-test at CpG position i in tissue ‘partial’, contrasting the ‘reference’ with the ‘outgroup’, and \({t}^{{ro},p}\) and \({\sigma }^{{ro},p}\) are the estimators of the mean and the standard deviation of the statistic across the N positions. \({k}_{p}^{{rd}}\) is a parameter.
To identify reference-derived fundamental changes (left-hand column in Fig. 1c), we look for CpG positions i that satisfy:
Inequality (equation (1c)) guarantees that the methylation difference between the ‘reference’ and the ‘outgroup’ in both tissues is in the same direction.
We found it useful to add a fourth condition to equation (1a–c), verifying that the mean methylation in ‘target’ is closer to the mean methylation of the ‘outgroup’ than to the mean methylation of the ‘reference’. If \(\overline{{m}_{i}^{r,a}}\), \(\overline{{m}_{i}^{t,a}}\) and \(\overline{{m}_{i}^{o,a}}\) are the mean methylation across samples in tissue ‘all’ in position i of the ‘reference’, ‘target’ and ‘outgroup’, respectively, then
In the implementation, we kept this additional condition optional because the algorithm detects methylation changes in high precision even without it. However, it is advisable to use it when the number of samples in ‘target’ is small compared with the other groups, because it prevents mispredictions that arise as a result of the lower weight of the ‘target’ samples in the t-test. Because only two archaic human samples serve as ‘target’ in the current study, we used this additional condition throughout the paper. Positions that satisfy equation (1a–d) are those for which we predict that the methylation in ‘target’ in tissue ‘partial’ clusters with that in ‘outgroup’, and different from that in ‘reference’ (Fig. 1c).
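For orientation, conditions (1a)–(1d) can be written compactly as follows. This is a reconstruction from the symbol definitions above and should be read as an assumption, in particular regarding the centring of the statistics and the two-sided form of the thresholds:

\[
\begin{aligned}
\left|{t}_{i}^{r,a}-{t}^{r,a}\right| &> {k}_{a}^{rd}\,{\sigma }^{r,a} && \text{(1a)}\\
\left|{t}_{i}^{ro,p}-{t}^{ro,p}\right| &> {k}_{p}^{rd}\,{\sigma }^{ro,p} && \text{(1b)}\\
\left({t}_{i}^{r,a}-{t}^{r,a}\right)\left({t}_{i}^{ro,p}-{t}^{ro,p}\right) &> 0 && \text{(1c)}\\
\left|\overline{{m}_{i}^{t,a}}-\overline{{m}_{i}^{o,a}}\right| &< \left|\overline{{m}_{i}^{t,a}}-\overline{{m}_{i}^{r,a}}\right| && \text{(1d)}
\end{aligned}
\]

Read this way, (1a) and (1b) ask that the t statistic at position i is an outlier relative to the genome-wide distribution of the statistic in each tissue, (1c) asks that both outliers point in the same direction, and (1d) is the optional closeness requirement on the ‘target’ mean.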
Identifying outgroup-specific fundamental changes
We follow a similar approach to identify fundamental changes in which the methylation change in tissue ‘all’ is outgroup-specific (right-hand column in Fig. 1c). CpG positions that we predict have gone through outgroup-specific fundamental changes should satisfy:
Here, \({t}_{i}^{o,a}\) is the statistic of an equal-variance t-test at CpG position i in tissue ‘all’ contrasting the ‘outgroup’ with the ‘target’ and ‘reference’, \({t}^{o,a}\) and \({\sigma }^{o,a}\) are the estimators of the mean and the standard deviation of the statistic across the N positions, and \({k}_{a}^{{os}}\) and \({k}_{p}^{{os}}\) are parameters. Note that the t-test for tissue ‘partial’, equation (2b), is identical to the one for the reference-derived fundamental changes, equation (1b), because in both cases we wish to find a significant difference in tissue ‘partial’ between the ‘outgroup’ and the ‘reference’. Yet, we allow the parameters \({k}_{p}^{{rd}}\) and \({k}_{p}^{{os}}\) to be potentially different.
Here, too, we added a fourth condition that guarantees that the mean methylation in ‘target’ is closer to ‘reference’ than to ‘outgroup’,
In the implementation, this condition is optional, but we used it throughout this work.
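The two detection procedures can be sketched together in a few lines. This is an illustration under stated assumptions rather than the published MATLAB implementation: the statistics are centred and scaled by their mean and standard deviation across positions, thresholds are applied two-sidedly, and for the outgroup-specific case the sign of the ‘partial’ contrast is flipped so that both contrasts are oriented the same way before their directions are compared. All function and key names are hypothetical.

```python
from statistics import mean, stdev, variance

def pooled_t(a, b):
    """Equal-variance two-sample t statistic (assumes non-zero pooled variance)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (pooled * (1 / na + 1 / nb)) ** 0.5

def standardize(values):
    """Express each statistic in standard deviations from the mean across positions."""
    mu, sd = mean(values), stdev(values)
    return [(v - mu) / sd for v in values]

def detect(positions, k_all, k_partial, kind):
    """Return indices of positions predicted to carry a fundamental change.

    positions: list of dicts with keys 'ref_all', 'tgt_all', 'out_all',
    'ref_partial', 'out_partial' (lists of methylation levels in [0, 1]).
    kind: 'reference-derived' or 'outgroup-specific'.
    """
    if kind not in ("reference-derived", "outgroup-specific"):
        raise ValueError("unknown kind: " + kind)
    rd = kind == "reference-derived"
    # Tissue 'all': one group contrasted with the two others pooled together
    if rd:
        t_all = [pooled_t(p["ref_all"], p["tgt_all"] + p["out_all"]) for p in positions]
    else:
        t_all = [pooled_t(p["out_all"], p["tgt_all"] + p["ref_all"]) for p in positions]
    # Tissue 'partial': 'reference' against 'outgroup' in both cases
    t_par = [pooled_t(p["ref_partial"], p["out_partial"]) for p in positions]
    z_all, z_par = standardize(t_all), standardize(t_par)
    sign = 1 if rd else -1  # orient both contrasts the same way
    hits = []
    for i, p in enumerate(positions):
        d_ref = abs(mean(p["tgt_all"]) - mean(p["ref_all"]))
        d_out = abs(mean(p["tgt_all"]) - mean(p["out_all"]))
        strong = abs(z_all[i]) > k_all and abs(z_par[i]) > k_partial
        same_direction = sign * z_all[i] * z_par[i] > 0
        target_ok = d_out < d_ref if rd else d_ref < d_out  # optional fourth condition
        if strong and same_direction and target_ok:
            hits.append(i)
    return hits

# Toy data: 20 unchanged positions followed by one reference-derived change
flat = {key: [0.50, 0.52] for key in
        ("ref_all", "tgt_all", "out_all", "ref_partial", "out_partial")}
changed = {"ref_all": [0.90, 0.92], "tgt_all": [0.10, 0.12], "out_all": [0.10, 0.12],
           "ref_partial": [0.90, 0.92], "out_partial": [0.10, 0.12]}
print(detect([flat] * 20 + [changed], k_all=2, k_partial=2, kind="reference-derived"))  # [20]
```

In the toy data the changed position is also an outlier in the outgroup-specific contrast, and it is the fourth condition that prevents it from being reported as an outgroup-specific change.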
Measuring performance
Whenever the true value of the methylation in the ‘target’ at tissue ‘partial’ is known, we can use it to estimate the precision of our algorithm. To this end, we define CpG positions for which there is truly a reference-derived methylation change in tissue ‘partial’ as those that satisfy
Here, \({t}_{i}^{\,r,p}\) is the statistic of an equal-variance t-test at CpG position i in tissue ‘partial’, contrasting the ‘reference’ with the ‘target’ and ‘outgroup’, \({t}^{r,p}\) and \({\sigma }^{r,p}\) are the estimators of the mean and the standard deviation of the statistic across the N positions, and \({k}_{t}^{{rd}}\) is a parameter. Similarly, we define CpG positions for which there is truly an outgroup-specific methylation change in tissue ‘partial’ as those that satisfy
where \({t}_{i}^{o,p}\) is the statistic of an equal-variance t-test at CpG position i in tissue ‘partial’, contrasting the ‘outgroup’ with the ‘target’ and ‘reference’, \({t}^{o,p}\) and \({\sigma }^{o,p}\) are the estimators of the mean and the standard deviation of the statistic across the N positions, and \({k}_{t}^{{os}}\) is a parameter.
Let Sp be the set of CpG positions for which we predict a reference-derived fundamental change, namely, those positions that satisfy equation (1a–d). Let Se be the set of reference-derived fundamental changes that are considered ‘true’ changes, defined as the collection of positions that satisfy
where equation (5a) is identical to equation (1a) and (5b) is identical to equation (3). Then, the precision of our algorithm that predicts reference-derived fundamental changes is
where |S| is the size of set S.
Analogously, for outgroup-specific fundamental changes we define Sp as the set of positions that satisfy equation (2a–d), and Se as the set of positions that are taken as ‘true’ outgroup-specific fundamental change, namely the set of positions that satisfy
where equation (7a) is identical to equation (2a) and (7b) is identical to equation (4). Precision is defined as in equation (6).
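Precision in this sense is the usual fraction of predicted positions that also belong to the ‘true’ set, that is \(|{S}_{p}\cap {S}_{e}|/|{S}_{p}|\). A minimal sketch, assuming positions are represented by hashable identifiers and returning NaN when nothing is predicted (our choice):

```python
def precision(predicted, true_changes):
    """Fraction of predicted positions that are also in the 'true' set."""
    predicted, true_changes = set(predicted), set(true_changes)
    if not predicted:
        return float("nan")  # undefined when nothing is predicted
    return len(predicted & true_changes) / len(predicted)

print(precision({3, 8, 15, 42}, {3, 8, 42, 77}))  # 0.75
```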
Parameter estimation
Overall, the algorithms that predict reference-derived changes and outgroup-specific changes are independent of each other, and each respective set of parameters can be optimized separately. Each algorithm uses two parameters: \({k}_{a}^{{rd}}\) and \({k}_{p}^{{rd}}\) for predicting reference-derived changes (equation (1a–d)), and \({k}_{a}^{{os}}\) and \({k}_{p}^{{os}}\) for predicting outgroup-specific changes (equation (2a–d)). Moreover, the performance of each algorithm is determined using an additional parameter, either \({k}_{t}^{{rd}}\) for the reference-derived changes or \({k}_{t}^{{os}}\) for the outgroup-specific changes. All three parameters for each algorithm were optimized together, using permuted data as a baseline. Permuted data were generated by pairing a random CpG position in tissue ‘partial’ to each position in tissue ‘all’. For each analysis, 10,000 permuted datasets were generated. The optimized parameters were chosen as the values that gave the largest difference between the precision of the algorithm on the real data and its precision on permuted data.
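The permutation baseline can be sketched as follows. Two details are assumptions on our part: the random pairing is drawn as a permutation (each ‘partial’ position is used once), and the precision on permuted data is summarized by its mean across permuted datasets. Function names are hypothetical.

```python
import random

def permuted_pairing(n_positions, rng):
    """For each position in tissue 'all', return the index of a randomly chosen
    position in tissue 'partial' (a random permutation, so without reuse)."""
    partner = list(range(n_positions))
    rng.shuffle(partner)
    return partner

def precision_gap(real_precision, permuted_precisions):
    """Precision on the real data minus the mean precision on permuted data."""
    return real_precision - sum(permuted_precisions) / len(permuted_precisions)

rng = random.Random(0)  # fixed seed so the pairing is reproducible
pairing = permuted_pairing(5, rng)
print(sorted(pairing))  # [0, 1, 2, 3, 4]: every 'partial' position is used once
print(precision_gap(0.9, [0.2, 0.3, 0.1]))
```

The parameter triple that maximizes `precision_gap` over the scanned values would then be taken as the optimum.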
We performed the optimization process by using chimpanzee as ‘target’, modern humans as ‘reference’ and rhesus macaque as ‘outgroup’ and choosing heart as tissue ‘all’ and kidney as tissue ‘partial’. We scanned all values of the parameters in a grid of values from 2 to 4 using linear spacing of 0.2 for each parameter. Notably, for each value of the parameters, both in the reference-derived and the outgroup-specific comparisons, the precision of the algorithm when applied to the real data was higher than the average precision of the corresponding permuted data (Extended Data Fig. 1). Moreover, not a single permutation in any set of parameters achieved precision that exceeded that of the real data. The optimized values of the parameters are given in Supplementary Table 1.
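A grid from 2 to 4 in steps of 0.2 contains 11 values per parameter, so scanning three parameters jointly covers \(11^3 = 1{,}331\) combinations. A minimal sketch of such a scan, where `score` stands for any user-supplied objective (for example, the precision difference between real and permuted data) and is hypothetical:

```python
from itertools import product

# 2.0, 2.2, ..., 4.0; built from integer steps to avoid accumulating
# floating-point error
GRID = [round(2 + 0.2 * step, 1) for step in range(11)]

def grid_search(score):
    """Return the (k_all, k_partial, k_true) triple that maximizes score(...)."""
    return max(product(GRID, repeat=3), key=lambda ks: score(*ks))

print(len(GRID), len(GRID) ** 3)  # 11 1331
```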
Some of the optimized parameters take their value at the edge of the grid, pointing at the possibility of achieving even higher precision using values that are outside the current grid. However, we noticed that in these cases precision reaches a plateau, so that the improvement is minor when changing the value of the parameter. Combined with the fact that values of the parameters outside the grid decreased the number of detections and hence reduced the power of the analysis, we decided to bound the value of the parameters and not extend the grid (Extended Data Fig. 2). The set of optimized parameters is remarkably robust to the selection of the training dataset and changed only slightly when using other tissue combinations (Supplementary Table 2).
Smoothing data
Smoothing of methylation data in modern samples was performed in the cross-validation part only. It was done using a moving average with a fixed window size over CpG positions. We used a window size of 31 CpG positions, to match typical window sizes used for reconstructing ancient methylation8,9.
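A moving average over CpG positions can be sketched as below. The window is counted in CpG positions rather than base pairs, as in the text; truncating the window near the ends of the sequence and ignoring missing values are simplifying assumptions of this sketch.

```python
def smooth(levels, window=31):
    """Centred moving average over consecutive CpG positions.

    Near the ends, the window is truncated to the positions that exist
    (an assumption; other edge treatments are possible)."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    for i in range(len(levels)):
        lo, hi = max(0, i - half), min(len(levels), i + half + 1)
        smoothed.append(sum(levels[lo:hi]) / (hi - lo))
    return smoothed

print(smooth([0.0, 1.0, 0.0, 1.0, 0.0], window=3))
```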
Gene enrichment analysis
A CpG position with a predicted fundamental change is associated with a gene if it resides in the gene body or in the promoter region, defined as 5,000 bp upstream of the transcription start site to 1,000 bp downstream of it. Gene body coordinates and gene names were determined using the hg19 genome reference downloaded from UCSC genome assembly74.
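The association rule can be sketched as follows. The strand handling (upstream meaning lower coordinates on the ‘+’ strand and higher coordinates on the ‘−’ strand) and the inclusive boundaries are assumptions of this sketch, and the coordinates in the example are made up.

```python
def associated_with_gene(cpg_pos, tx_start, tx_end, strand,
                         upstream=5000, downstream=1000):
    """True if a CpG lies in the gene body or in the promoter window.

    tx_start < tx_end are genomic coordinates on the same chromosome as the
    CpG; strand is '+' or '-'. The promoter spans `upstream` bp upstream of
    the transcription start site (TSS) to `downstream` bp downstream of it."""
    if strand == "+":
        tss = tx_start
        promoter = (tss - upstream, tss + downstream)
    elif strand == "-":
        tss = tx_end
        promoter = (tss - downstream, tss + upstream)
    else:
        raise ValueError("strand must be '+' or '-'")
    in_body = tx_start <= cpg_pos <= tx_end
    in_promoter = promoter[0] <= cpg_pos <= promoter[1]
    return in_body or in_promoter

# Hypothetical gene spanning 10,000-20,000 on the '+' strand
print(associated_with_gene(6000, 10000, 20000, "+"))  # True: within 5 kb upstream
print(associated_with_gene(4000, 10000, 20000, "+"))  # False
```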
We used DAVID75,76 to test for enrichment of our gene lists in biological functions. We used all default datasets.
Enhancer enrichment analysis
Enhancer locations were downloaded from EnhancerAtlas77. For bone enhancers we used the osteoblast datasets. For neuron enhancers we combined data of embryonic stem cell neurons and cerebellum neurons. Enrichment was evaluated using a hypergeometric test followed by the Benjamini–Hochberg FDR procedure. EnhancerAtlas includes a comprehensive list of genes associated with each enhancer. We used DAVID to investigate functional enrichments for genes linked to enhancers that intersect with the detected positions. Further, we used MEME-ChIP78 to examine which of the positions that overlap enhancers also overlap binding motifs, and then used TOMTOM79 to identify transcription factors that bind these motifs.
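The Benjamini–Hochberg adjustment applied to the enrichment P values can be sketched as follows. This is the textbook step-up procedure written out in plain Python for illustration; the input values are arbitrary.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotone adjusted values
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

A test is then called significant at a given false discovery rate if its adjusted value falls below that rate.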
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
All sequencing and methylation data generated for this study were deposited in the National Center for Biotechnology Information’s Gene Expression Omnibus under GEO accession number GSE276666.
Code availability
The MATLAB code can be downloaded from http://carmelab.huji.ac.il/software.html.
Change history
05 September 2025
A Correction to this paper has been published: https://doi.org/10.1038/s41559-025-02865-7
References
King, M. C. & Wilson, A. C. Evolution at two levels in humans and chimpanzees. Science 188, 107–116 (1975).
Romero, I. G., Ruvinsky, I. & Gilad, Y. Comparative studies of gene expression and the evolution of gene regulation. Nat. Rev. Genet. 13, 505–516 (2012).
Wray, G. A. The evolutionary significance of cis-regulatory mutations. Nat. Rev. Genet. 8, 206–216 (2007).
Carroll, S. B. Evo-Devo and an expanding evolutionary synthesis: a genetic theory of morphological evolution. Cell 134, 25–36 (2008).
Colbran, L. L. et al. Inferred divergent gene regulation in archaic hominins reveals potential phenotypic differences. Nat. Ecol. Evol. 3, 1598–1606 (2019).
Fordyce, S. L. et al. Deep sequencing of RNA from ancient maize kernels. PLoS ONE 8, e50961 (2013).
Pedersen, J. S. et al. Genome-wide nucleosome map and cytosine methylation levels of an ancient human genome. Genome Res. 24, 454–466 (2014).
Gokhman, D. et al. Reconstructing the DNA methylation maps of the Neandertal and the Denisovan. Science 344, 523–527 (2014).
Gokhman, D. et al. Differential DNA methylation of vocal and facial anatomy genes in modern humans. Nat. Commun. 11, 1189 (2020).
Jones, P. A. Functions of DNA methylation: islands, start sites, gene bodies and beyond. Nat. Rev. Genet. 13, 484–492 (2012).
Mathov, Y., Batyrev, D., Meshorer, E. & Carmel, L. Harnessing epigenetics to study human evolution. Curr. Opin. Genet. Dev. 62, 23–29 (2020).
Gokhman, D. et al. Reconstructing Denisovan anatomy using DNA methylation maps. Cell 179, 180–192.e10 (2019).
Pai, A. A., Bell, J. T., Marioni, J. C., Pritchard, J. K. & Gilad, Y. A genome-wide study of DNA methylation patterns and gene expression levels in multiple human and chimpanzee tissues. PLoS Genet. 7, e1001316 (2011).
Böck, J. et al. Cell type and species-specific patterns in neuronal and non-neuronal methylomes of human and chimpanzee cortices. Cereb. Cortex 28, 3724–3739 (2018).
Palkopoulou, E. et al. Complete genomes reveal signatures of demographic and genetic declines in the woolly mammoth. Curr. Biol. 25, 1395–1400 (2015).
Seguin-Orlando, A. et al. Pros and cons of methylation-based enrichment methods for ancient DNA. Sci. Rep. 5, 11826 (2015).
Gokhman, D., Meshorer, E. & Carmel, L. Epigenetics: it’s getting old. Past meets future in paleoepigenetics. Trends Ecol. Evol. 31, 290–300 (2016).
Morgan, H. D., Santos, F., Green, K., Dean, W. & Reik, W. Epigenetic reprogramming in mammals. Hum. Mol. Genet. 14, R47–R58 (2005).
Günther, T. et al. Population genomics of Mesolithic Scandinavia: investigating early postglacial migration routes and high-latitude adaptation. PLoS Biol. 16, e2003703 (2018).
Berto, S. et al. Accelerated evolution of oligodendrocytes in the human brain. Proc. Natl Acad. Sci. USA 116, 24334–24342 (2019).
Jeong, H. et al. Evolution of DNA methylation in the human brain. Nat. Commun. 12, 2021 (2021).
Zhang, W., Spector, T. D., Deloukas, P., Bell, J. T. & Engelhardt, B. E. Predicting genome-wide DNA methylation using methylation marks, genomic position, and DNA regulatory elements. Genome Biol. 16, 14 (2015).
Affinito, O. et al. Nucleotide distance influences co-methylation between nearby CpG sites. Genomics 112, 144–150 (2020).
Mathov, Y. et al. RoAM: computational reconstruction of ancient methylomes and identification of differentially methylated regions. Preprint at bioRxiv https://doi.org/10.1101/2024.08.08.607143 (2024).
Chen, S. et al. A novel DNA methylation signature associated with lymph node metastasis status in early gastric cancer. Clin. Epigenetics 14, 18 (2022).
Kim, J. et al. Identification of candidate genes associated with susceptibility to ovarian clear cell adenocarcinoma using cis-eQTL analysis. J. Clin. Med. 9, 1137 (2020).
Yamakage, Y. et al. A disintegrin and metalloproteinase with thrombospondin motifs 2 cleaves and inactivates Reelin in the postnatal cerebral cortex and hippocampus, but not in the cerebellum. Mol. Cell. Neurosci. 100, 103401 (2019).
Crespo-Facorro, B., Prieto, C. & Sainz, J. Schizophrenia gene expression profile reverted to normal levels by antipsychotics. Int. J. Neuropsychopharmacol. 18, pyu066 (2015).
Ruso-Julve, F. et al. Dopaminergic control of ADAMTS2 expression through cAMP/CREB and ERK: molecular effects of antipsychotics. Transl. Psychiatry 9, 306 (2019).
Gargano, M. A. et al. The human phenotype ontology in 2024: phenotypes around the world. Nucleic Acids Res. 52, D1333–D1346 (2024).
Yamanaka, T. et al. NF-Y inactivation causes atypical neurodegeneration characterized by ubiquitin and p62 accumulation and endoplasmic reticulum disorganization. Nat. Commun. 5, 3354 (2014).
Yamanaka, T. et al. Differential roles of NF-Y transcription factor in ER chaperone expression and neuronal maintenance in the CNS. Sci. Rep. 6, 34575 (2016).
Yamanaka, T. et al. Mutant Huntingtin reduces HSP70 expression through the sequestration of NF-Y transcription factor. EMBO J. 27, 827–839 (2008).
Zhao, P. P. et al. Novel function of PIWIL1 in neuronal polarization and migration via regulation of microtubule-associated proteins. Mol. Brain 8, 39 (2015).
Pao, G. M. et al. Role of BRCA1 in brain development. Proc. Natl Acad. Sci. USA 111, E1240–E1248 (2014).
Suberbielle, E. et al. DNA repair factor BRCA1 depletion occurs in Alzheimer brains and impairs cognitive function in mice. Nat. Commun. 6, 8897 (2015).
Mano, T. et al. Neuron-specific methylome analysis reveals epigenetic regulation and tau-related dysfunction of BRCA1 in Alzheimer’s disease. Proc. Natl Acad. Sci. USA 114, E9645–E9654 (2017).
Algouneh, A. et al. Dual BRCA1 and BRCA2 pathogenic variants in an adolescent with syndromic intellectual disability. Clin. Case Rep. 10, e6202 (2022).
Gassmann, K. et al. Species-specific differential AhR expression protects human neural progenitor cells against developmental neurotoxicity of PAHs. Environ. Health Perspect. 118, 1571–1577 (2010).
Ojo, E. S. & Tischkau, S. A. The role of AhR in the hallmarks of brain aging: friend and foe. Cells 10, 2729 (2021).
Xie, L. et al. OTUB1 attenuates neuronal apoptosis after intracerebral hemorrhage. Mol. Cell. Biochem. 422, 171–180 (2016).
Shen, J. et al. Deubiquitylating enzyme OTUB1 facilitates neuronal survival after intracerebral hemorrhage via inhibiting NF-κB-triggered apoptotic cascades. Mol. Neurobiol. 61, 1726–1736 (2024).
Wang, X. et al. OTUB1 inhibits CNS autoimmunity by preventing IFN-γ-induced hyperactivation of astrocytes. EMBO J. 38, e100947 (2019).
Xing, X., Xu, F., Wang, Y. & Liu, H. Role of the OTUB1/IRF7/NOX4 axis in oxidative stress injury and inflammatory responses in mice with Parkinson’s disease. Psychogeriatrics 23, 32–44 (2023).
Edvardson, S. et al. Joubert Syndrome 2 (JBTS2) in Ashkenazi Jews is associated with a TMEM216 mutation. Am. J. Hum. Genet. 86, 93–97 (2010).
Valente, E. M. et al. Mutations in TMEM216 perturb ciliogenesis and cause Joubert, Meckel and related syndromes. Nat. Genet. 42, 619–625 (2010).
Guo, J. et al. Developmental disruptions underlying brain abnormalities in ciliopathies. Nat. Commun. 6, 7857 (2015).
Chen, J. V. et al. Rootletin organizes the ciliary rootlet to achieve neuron sensory function in Drosophila. J. Cell Biol. 211, 435–453 (2015).
Tai, H. C., Besche, H., Goldberg, A. L. & Schuman, E. M. Characterization of the brain 26S proteasome and its interacting proteins. Front. Mol. Neurosci. 3, 1236 (2010).
Yi, J. J. & Ehlers, M. D. Emerging roles for ubiquitin and protein degradation in neuronal function. Pharmacol. Rev. 59, 14–39 (2007).
Ugun-Klusek, A. et al. Continued 26S proteasome dysfunction in mouse brain cortical neurons impairs autophagy and the Keap1-Nrf2 oxidative defence pathway. Cell Death Dis. 8, e2531 (2017).
Myeku, N. et al. Tau-driven 26S proteasome impairment and cognitive dysfunction can be prevented early in disease by activating cAMP-PKA signaling. Nat. Med. 22, 46–53 (2015).
Popesco, M. C. et al. Human lineage-specific amplification, selection, and neuronal expression of DUF1220 domains. Science 313, 1304–1307 (2006).
Zimmer, F. & Montgomery, S. H. Phylogenetic analysis supports a link between DUF1220 domain number and primate brain expansion. Genome Biol. Evol. 7, 2083–2088 (2015).
Dumas, L. J. et al. DUF1220-domain copy number implicated in human brain-size pathology and evolution. Am. J. Hum. Genet. 91, 444–454 (2012).
Davis, J. M. et al. DUF1220 copy number is linearly associated with increased cognitive function as measured by total IQ and mathematical aptitude scores. Hum. Genet. 134, 67 (2015).
Davis, J. M. et al. DUF1220 dosage is linearly associated with increasing severity of the three primary symptoms of autism. PLoS Genet. 10, e1004241 (2014).
Davis, J. M., Searles Quick, V. B. & Sikela, J. M. Replicated linear association between DUF1220 copy number and severity of social impairment in autism. Hum. Genet. 134, 569 (2015).
Davis, J. M., Heft, I., Scherer, S. W. & Sikela, J. M. A third linear association between Olduvai (DUF1220) copy number and severity of the classic symptoms of inherited autism. Am. J. Psychiatry 176, 643–650 (2019).
Searles Quick, V. B., Davis, J. M., Olincy, A. & Sikela, J. M. DUF1220 copy number is associated with schizophrenia risk and severity: implications for understanding autism and schizophrenia as related diseases. Transl. Psychiatry 5, e697 (2015).
Keeney, J. G. et al. DUF1220 protein domains drive proliferation in human neural stem cells and are associated with increased cortical volume in anthropoid primates. Brain Struct. Funct. 220, 3053–3060 (2014).
Fiddes, I. T. et al. Human-specific NOTCH2NL genes affect Notch signaling and cortical neurogenesis. Cell 173, 1356–1369.e22 (2018).
Florio, M. et al. Evolution and cell-type specificity of human-specific genes preferentially expressed in progenitors of fetal neocortex. eLife 7, e32332 (2018).
Fiddes, I. T., Pollen, A. A., Davis, J. M. & Sikela, J. M. Paired involvement of human-specific Olduvai domains and NOTCH2NL genes in human brain evolution. Hum. Genet. 138, 715–721 (2019).
Mendizabal, I. et al. Comparative methylome analyses identify epigenetic regulatory loci of human brain evolution. Mol. Biol. Evol. 33, 2947–2959 (2016).
Zeng, J. et al. Divergent whole-genome methylation maps of human and chimpanzee brains reveal epigenetic basis of human regulatory evolution. Am. J. Hum. Genet. 91, 455–465 (2012).
Schuirmann, D. J. A comparison of the two one-sided tests procedure and the power approach for assessing the equivalence of average bioavailability. J. Pharmacokinet. Biopharm. 15, 657–680 (1987).
Kumar, S., Stecher, G., Suleski, M. & Hedges, S. B. TimeTree: a resource for timelines, timetrees, and divergence times. Mol. Biol. Evol. 34, 1812–1819 (2017).
Pavlovic, B. J., Blake, L. E., Roux, J., Chavarria, C. & Gilad, Y. A comparative assessment of human and chimpanzee iPSC-derived cardiomyocytes with primary heart tissues. Sci. Rep. 8, 15312 (2018).
Blake, L. E. et al. A comparison of gene expression and DNA methylation patterns across tissues and species. Genome Res. 30, 250–262 (2020).
Housman, G., Quillen, E. E. & Stone, A. C. Intraspecific and interspecific investigations of skeletal DNA methylation and femur morphology in primates. Am. J. Phys. Anthropol. 173, 34–49 (2020).
Hinrichs, A. S. et al. The UCSC Genome Browser Database: update 2006. Nucleic Acids Res. 34, D590–D598 (2006).
Barouch, A., Mathov, Y., Meshorer, E., Yakir, B. & Carmel, L. Reconstructing DNA methylation maps of ancient populations. Nucleic Acids Res. 52, 1602–1612 (2024).
Karolchik, D. et al. The UCSC Table Browser data retrieval tool. Nucleic Acids Res. 32, D493–D496 (2004).
Huang, D. W., Sherman, B. T. & Lempicki, R. A. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 37, 1–13 (2009).
Huang, D. W., Sherman, B. T. & Lempicki, R. A. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 4, 44–57 (2009).
Gao, T. & Qian, J. EnhancerAtlas 2.0: an updated resource with enhancer annotation in 586 tissue/cell types across nine species. Nucleic Acids Res. 48, D58–D64 (2020).
Machanick, P. & Bailey, T. L. MEME-ChIP: motif analysis of large DNA datasets. Bioinformatics 27, 1696–1697 (2011).
Gupta, S., Stamatoyannopoulos, J. A., Bailey, T. L. & Noble, W. S. Quantifying similarity between motifs. Genome Biol. 8, R24 (2007).
Acknowledgements
This publication was made possible through the support of a grant from the John Templeton Foundation (grant ID 61739 to L.C. and E.M.). The opinions expressed in this publication are those of the authors and do not necessarily reflect the views of the John Templeton Foundation. This study was also funded by the Israel Science Foundation (grant no. 2436/22 to L.C.) and by the Ministry of Innovation, Science & Technology (grant no. 1001584586 to L.C. and E.M.). T.M.-B. is supported by funding from the European Research Council under the European Union’s Horizon 2020 research and innovation programme (grant agreement no. 864203), PID2021-126004NB-100 (MICIIN/FEDER, UE), ‘Unidad de Excelencia María de Maeztu’, funded by the AEI (CEX2018-000792-M), NIH 1R01HG010898-01A1 and Secretaria d’Universitats i Recerca and CERCA Programme del Departament d’Economia i Coneixement de la Generalitat de Catalunya (GRC 2021 SGR 00177). L.C. is the Snyder Granadar Chair in Genetics. E.M. is the Arthur Gutterman Family Chair for Stem Cell Research. We wish to thank Avigal Be’er for the design of Fig. 1.
Author information
Contributions
Y.M., E.M. and L.C. designed the research. Y.M. conducted the analysis. M.N.-R., C.L., T.M.-B., A.K. and M.L. contributed to the production of data. N.G. helped collect data. E.M. and L.C. supervised the study. Y.M., E.M. and L.C. wrote the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Ecology & Evolution thanks Alison Barton and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Precision on training set as a function of the parameters ka and kp.
Precision of the algorithm in detecting a, outgroup-specific fundamental changes and b, reference-derived fundamental changes, for each value of the parameters ka and kp. Yellow, precision when running the algorithm on real data; blue, precision when running on permutations.
Extended Data Fig. 2 Precision versus number of detections.
Precision reaches a plateau at higher values of the parameters, whereas power decreases. a, Precision and the number of outgroup-specific changes detected as a function of the parameter Kp when the other parameters are fixed at their optimized values. b, A similar comparison for the parameter Ka while the rest of the parameters are fixed at their optimized values.
Extended Data Fig. 3 Precision on an independent dataset.
Validation of the algorithm by measuring precision on an independent dataset. To complement the results presented in the main text, here chimpanzees were selected as reference and modern humans as target.
Extended Data Fig. 4 Mapping of CpG positions in the rhesus genome to the human genome.
Percentage of CpG positions successfully mapped to a human CpG position from rhesus bulk brain samples. If a rhesus CpG position was originally mapped by liftOver to coordinate c, the figure shows the percentage of positions where the human reference genome has a CpG at coordinate c + shift.
Supplementary information
Supplementary Tables 1–9.
Table 1. Precision in predicting fundamental changes on validation datasets. ‘Smooth’ determines whether the DNA methylation signal was smoothed.
Table 2. List of DNA methylation samples used in this study. WGBS, whole-genome bisulfite sequencing; RRBS, reduced representation bisulfite sequencing.
Table 3. Optimized parameters for finding reference-derived and outgroup-specific DNA methylation changes.
Table 4. Optimized parameters on different tissue pairs. Rows, tissue all; columns, tissue partial.
Table 5. CpG positions associated with outgroup-specific (a) and reference-derived (b) fundamental DNA methylation changes when comparing bone to prefrontal neurons. The five rightmost columns report the average methylation level in the relevant samples.
Table 6. Genes with a large number of CpG positions associated with outgroup-specific (a) and reference-derived (b) fundamental changes. CGI, CpG island.
Table 7. DAVID output of enrichment analysis for genes that are targeted by enhancers intersecting with CpG positions associated with fundamental methylation changes. Count, number of genes associated with the term from our list; list-length, total number of detected genes; pop term, total number of human genes associated with this term; pop total, total number of human genes in the Gene Ontology database. Enrichment tests were done using DAVID75,76, accounting for multiple comparisons.
Table 8. MEME-ChIP output for motifs overlapping fundamental changes linked to enhancers, and the transcription factors that bind these motifs.
Table 9. CpGs and genes associated with outgroup-specific (a) and reference-derived (b) fundamental DNA methylation changes using bulk prefrontal cortex as tissue partial. The five rightmost columns report the average methylation level in the relevant samples.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Mathov, Y., Nissim-Rafinia, M., Leibson, C. et al. Inferring DNA methylation in non-skeletal tissues of ancient specimens. Nat Ecol Evol 9, 153–165 (2025). https://doi.org/10.1038/s41559-024-02571-w
DOI: https://doi.org/10.1038/s41559-024-02571-w