Main

Changes in gene regulation often underlie phenotypic divergence1,2,3,4, making the identification of regulatory differences between archaic humans (Neanderthals and Denisovans) and anatomically modern ones a major goal of palaeogenetics5. Because RNA is rarely present in palaeontological remains6, changes in gene regulation must be inferred indirectly from the ancient DNA (aDNA) sequences themselves. Previous works showed that aDNA degradation signals can be harnessed to computationally reconstruct premortem genome-wide DNA methylation maps7,8,9. DNA methylation is a key epigenetic mark that strongly affects the activity level of regulatory regions such as promoters and enhancers10. Hence, its reconstruction provides information on premortem gene activity patterns in ancient individuals. Indeed, the reconstruction of DNA methylation in Neanderthals, a Denisovan and anatomically modern humans allowed us to identify regulatory differences between these human groups and associate them with phenotypic changes, opening the field of palaeoepigenetics9,11,12.

However, DNA methylation is tissue-specific to the extent that methylation patterns in two different tissues of the same organism are often more diverged than methylation patterns in the same tissue in two different species13,14. Although occasionally aDNA is extracted from soft tissues such as skin15 and liver16, this is limited to a small number of highly conserved and relatively young samples. Generally, bones and teeth are the main sources of aDNA. In particular, these tissues are the source of all archaic human aDNA. Accordingly, our previous palaeoepigenetic studies focused on the evolution of the skeletal system9,12, but could provide only limited insights on other systems such as the nervous system8.

Yet, DNA methylation in one tissue still carries information on DNA methylation in other tissues of the same individual. The reason is that DNA methylation is erased almost completely in the zygote, and then re-established in a cell type-specific manner during development10. When a DNA methylation change that separates human groups is established during early developmental stages, it may propagate to all descendant cell types, and be simultaneously manifested in multiple tissues.

Here we show that for a certain class of DNA methylation changes, we are able to use parsimony considerations17 to predict the timing of the change during development, rendering skeletal DNA methylation changes informative on the methylation state in other tissues. Testing our method on modern primate tissues, we show that we achieve precision of 0.7–0.92 in using DNA methylation changes in one tissue to predict corresponding changes in another tissue.

We applied our method to predict DNA methylation changes in prefrontal neurons between modern humans, archaic humans and chimpanzees. We found 71 genes that are associated with differential neuronal DNA methylation separating modern and archaic humans, as well as 870 genes that are associated with differential neuronal DNA methylation separating all humans from chimpanzees. We show that some of these genes are involved in neural functions. Moreover, we discovered six methylation changes in genes that carry the Olduvai domain, whose copy number is associated with an increase in brain volume and cognitive function, suggesting that these changes might be important in human brain evolution.

Results

Given the almost complete reset of DNA methylation patterns in the zygote as part of the epigenetic reprogramming process18, evolutionary differences in DNA methylation patterns between species are manifested as changes that build up during embryonic development. A methylation change that occurs in cells that are not fully differentiated would propagate to all descendant cell lineages, unless reverted by a later change. For example, if a site goes through an evolutionary methylation change that builds up at a time that precedes the split of the mesodermal and the ectodermal germ layers, the difference in methylation level would show up both in bone and in neurons (Fig. 1a).

Fig. 1: Conceptual framework of the algorithm.

a, Schematic embryonic development trees. DNA methylation changes along the tree would propagate to all descendant lineages. Considering a pair of tissues, for example bone and brain, methylation changes that occurred earlier than their developmental split (right) are fundamental and will affect both tissues. Changes that occurred after the split (left) are tissue-specific and will affect only one of the tissues. b, Different triad phylogenies and paired tissues are considered by the algorithm for validation and test sets. Methylation data are missing for the target ‘t’ in tissue ‘p’, but available for tissue ‘a’ for all species. c, Identification of configurations, out of all 12 possible, where the methylation state in the target in tissue ‘p’ can be predicted. When prediction is possible, the type of methylation change, whether fundamental (F) or tissue-specific (TS), is denoted. Note that for clarity, methylation levels are considered as binary variables.

For any pair of tissues or cell types, we denote by ‘fundamental changes’ all those evolutionary changes to DNA methylation that were established before the developmental split between the two tissues or cell types. Evolutionary DNA methylation changes that occurred after this developmental time point would be called tissue-specific changes (Fig. 1a). Note that the terms tissue-specific and fundamental methylation changes should be always understood in the context of two specific tissues or cell types. For example, a methylation change affecting all endodermal tissues is still considered tissue-specific if we compare, for example, lung with bone. The basis of the algorithm we develop below is the fact that if we are able to determine, for two tissues or cell types, whether an evolutionary change in DNA methylation is either fundamental or tissue-specific, then the methylation level in one tissue or cell type would be informative on the methylation level in the other tissue or cell type.
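The pair-dependence of these definitions can be illustrated with a toy developmental tree. The following is a minimal sketch, not part of the algorithm itself: the tree is heavily simplified, and the node names and the helper `classify_change` are hypothetical.

```python
# Toy developmental tree (child -> parent). Simplified and illustrative only.
PARENT = {
    "ectoderm": "zygote",
    "mesoderm": "zygote",
    "endoderm": "zygote",
    "neuron": "ectoderm",
    "bone": "mesoderm",
    "heart": "mesoderm",
    "lung": "endoderm",
    "liver": "endoderm",
}


def lineage(node):
    """Return the node followed by all of its ancestors up to the root."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def classify_change(change_node, tissue_1, tissue_2):
    """Classify a change established in `change_node` for a pair of tissues.

    The change propagates to every descendant of `change_node`, so it is
    fundamental for the pair only if both tissues descend from that node.
    """
    in_1 = change_node in lineage(tissue_1)
    in_2 = change_node in lineage(tissue_2)
    if in_1 and in_2:
        return "fundamental"
    if in_1 or in_2:
        return "tissue-specific"
    return "not observed in either tissue"


print(classify_change("endoderm", "lung", "bone"))   # tissue-specific
print(classify_change("endoderm", "lung", "liver"))  # fundamental
print(classify_change("zygote", "bone", "neuron"))   # fundamental
```

The same endodermal change is therefore tissue-specific for the lung and bone pair but fundamental for the lung and liver pair.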

The inference procedure

We consider a three-species phylogeny, in which the species are marked as ‘reference’ (superscript r), ‘target’ (superscript t) and ‘outgroup’ (superscript o). For these three species, we examine five DNA methylation maps in two tissues (or cell types). In one tissue, denoted ‘all’ (superscript a), DNA methylation data are available for all three species. In the other tissue, denoted ‘partial’ (superscript p), DNA methylation data are available for the reference and the outgroup only (Fig. 1b). Our goal is to predict the methylation levels in the target in tissue ‘partial’.

A useful abstraction of the problem, which allows for convenient conceptualization of the approach, is to imagine that the DNA methylation in each CpG position is a binary variable that can be either high or low. We dub the combination of binary methylation levels across the five available DNA methylation maps in a specific CpG position a configuration. There are exactly three possible combinations of methylation states that describe CpG positions with differential methylation across the three species in tissue ‘all’ (Fig. 1c). An evolutionary methylation change in tissue ‘all’ can occur along the branch leading to the reference (reference-derived), along the branch leading to the target (target-derived) or along either of the two branches that separate the outgroup from the reference and target (outgroup-specific). Each of these three combinations may be associated with any of four different combinations of methylation states in the tissue ‘partial’, leading to a total of 12 possible configurations that should be examined (Fig. 1c).
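The count of 12 can be checked by direct enumeration. The sketch below assumes a fixed polarity (the changed lineage is coded 1 and the others 0, with the mirror-image pattern treated as equivalent); the variable names are illustrative and not taken from the Methods.

```python
from itertools import product

# Tissue 'all': binary states as (reference, target, outgroup).
# One polarity per change type; the mirror image is treated as equivalent.
ALL_PATTERNS = {
    "reference-derived": (1, 0, 0),
    "target-derived": (0, 1, 0),
    "outgroup-specific": (0, 0, 1),
}

# Tissue 'partial': only reference and outgroup are observed, giving
# four possible (reference, outgroup) combinations.
PARTIAL_STATES = list(product((0, 1), repeat=2))

configurations = [
    (change_type, all_states, partial_states)
    for change_type, all_states in ALL_PATTERNS.items()
    for partial_states in PARTIAL_STATES
]

print(len(configurations))  # 12
```

Each configuration holds five observed values (three in tissue ‘all’ and two in tissue ‘partial’), matching the five available methylation maps.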

For some configurations, it is possible to determine the developmental timing of the methylation change using parsimony, whereas for others it is not. In a previous publication17 we outlined the procedure and demonstrated it in a particular configuration. Here we examine all possible configurations, identify those in which inference on the ‘target’ methylation can be made, develop a detailed algorithm to carry out the inference procedure and use it to predict DNA methylation levels in archaic human brains.

Of the 12 possible configurations, there are five for which parsimony considerations provide a prediction for the timing of the developmental change in methylation (Fig. 1c). In three configurations we predict that the change was tissue-specific and in two that it was fundamental; in the remaining seven, the timing of the change cannot be determined. In tissue-specific configurations we end up predicting that the methylation change detected in tissue ‘all’ does not hold in tissue ‘partial’. These cases are less relevant for the current study, where our goal is to identify methylation changes in tissue ‘partial’. In the remainder of the article, therefore, we focus solely on the two configurations in which we conclude that the methylation change was fundamental. Note that because data are missing in tissue ‘partial’ of the ‘target’, we cannot make any predictions regarding developmental timing of the methylation change in target-derived changes.

Quantifying the performance

Based on the abstraction above, we developed an algorithm that accounts for the non-binary nature of DNA methylation and identifies CpG positions that are consistent with having gone through a fundamental methylation change (Methods). In these positions, we predict that the methylation levels in ‘target’ in tissue ‘partial’ are closer to those in the outgroup (for a reference-derived methylation change) or to those in the reference (for an outgroup-specific methylation change) (Fig. 1c).
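Under the binary abstraction, the prediction rule for fundamental changes can be sketched as follows. This is an illustration only: it assumes that a configuration counts as fundamental when the reference-versus-outgroup difference seen in tissue ‘all’ is replicated, in the same direction, in tissue ‘partial’. The function name is hypothetical, and the actual algorithm (Methods) works on continuous methylation levels with statistical thresholds.

```python
def predict_target_partial(all_states, partial_states):
    """Predict the binary target state in tissue 'partial'.

    all_states     -- (reference, target, outgroup) in tissue 'all'
    partial_states -- (reference, outgroup) in tissue 'partial'
    Returns (change_type, predicted_state), or None when this sketch
    makes no fundamental-change prediction.
    """
    ref_a, tgt_a, out_a = all_states
    ref_p, out_p = partial_states

    # Target-derived changes (reference == outgroup in 'all') are skipped:
    # the target is unobserved in 'partial', so timing cannot be inferred.
    if ref_a == out_a:
        return None

    # Assumption: the change is fundamental only if the difference between
    # reference and outgroup is replicated in 'partial' in the same direction.
    if (ref_p, out_p) != (ref_a, out_a):
        return None

    if tgt_a == out_a:
        # Reference-derived: the target is predicted to follow the outgroup.
        return ("reference-derived", out_p)
    # Otherwise the target equals the reference in 'all' (outgroup-specific),
    # and the target is predicted to follow the reference.
    return ("outgroup-specific", ref_p)


print(predict_target_partial((1, 0, 0), (1, 0)))  # ('reference-derived', 0)
print(predict_target_partial((0, 0, 1), (0, 1)))  # ('outgroup-specific', 0)
print(predict_target_partial((1, 0, 0), (0, 0)))  # None
```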

Eventually, we would like to apply our algorithm to a triad of species made of modern human (reference), archaic human (target) and chimpanzee (outgroup). However, lacking DNA methylation in non-skeletal tissues of archaic humans, we optimized the parameters and quantified the performance based on a triad of extant species with available DNA methylation data in multiple tissues. To resemble the triad that we will eventually use, we used three modern primate species: modern humans as the ‘reference’, chimpanzees as the ‘target’ and rhesus macaques as the ‘outgroup’.

We trained the algorithm for the above triad of extant species, using heart as the tissue ‘all’ and kidney as the tissue ‘partial’ (Methods, Supplementary Tables 1 and 2 and Extended Data Figs. 1 and 2). For predicting reference-derived fundamental changes, we obtained a precision of 0.85, compared with an average of 0.22 in permuted data. For predicting outgroup-specific fundamental changes, we obtained a precision of 0.77, compared with an average of 0.40 in permuted data (Fig. 2a). Elevated precision for permuted data when predicting outgroup-specific fundamental changes is expected and, in fact, inherent to our algorithm. This stems from the fact that the evolutionary distance between the outgroup and the reference and/or target is, by definition, longer than the evolutionary distance between the target and the reference, leading to a greater similarity in the DNA methylation patterns of the reference and the target, and hence to an increased probability of finding by chance configurations that resemble outgroup-specific fundamental changes (‘Discussion’). In any case, for both reference-derived changes and outgroup-specific changes, the algorithm shows training set precision which is significantly higher than random.

Fig. 2: Precision in detecting fundamental DNA methylation changes using endodermal and mesodermal tissues.

a, Precision of the algorithm using the optimized parameters on observed data (blue) compared with permutations (orange) in the detection of outgroup-specific (left) and reference-derived (right) fundamental changes. b, Validation of the algorithm on tissues that were not used for training. Error bars denote 1 s.d.

We validated our algorithm using DNA methylation data from liver and lung in these three species, which are tissues on which the algorithm was not trained. In each comparison, the precision achieved using the real data significantly exceeded the precision that was achieved using permuted data. The observed precision in predicting fundamental changes was in the range 0.7–0.84, compared with 0.21–0.37 on permuted data (Fig. 2b and Supplementary Table 3). Flipping the roles of reference and target, namely picking modern human as the target and chimpanzees as the reference, yielded very similar performance estimations (Supplementary Table 3 and Extended Data Fig. 3). To evaluate the statistical significance of the precision levels we obtain, we performed 1,000 permutations, in each of which we randomly shuffled the tags ‘reference’, ‘target’ and ‘outgroup’ across the samples, keeping the total number of reference, target and outgroup samples fixed. For reference-derived inference, a fraction of 0.004 of the permutations yielded higher precision than our algorithm, whereas this fraction was 0.021 for outgroup-specific inference. Combined, these results show that our algorithm achieves significantly higher precision than expected by chance, even when generalized to tissues and a species combination on which it was not trained.
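The label-shuffling test described above can be sketched generically. In this minimal example, `precision` follows the standard definition (true positives over all predicted positives), and `statistic` is a hypothetical callback standing in for a full run of the algorithm on relabelled samples; neither name comes from the Methods.

```python
import random


def precision(predicted, truth):
    """Fraction of predicted positions that are true changes: TP / (TP + FP)."""
    predicted = set(predicted)
    if not predicted:
        return 0.0
    return len(predicted & set(truth)) / len(predicted)


def permutation_fraction(observed, labels, statistic, n_perm=1000, seed=0):
    """Fraction of label shufflings whose statistic exceeds `observed`.

    `labels` holds one tag per sample ('reference', 'target' or 'outgroup').
    Shuffling the list keeps the number of samples per tag fixed.
    `statistic` maps a shuffled label list to a precision value; here it is
    a placeholder for re-running the detection with those labels.
    """
    rng = random.Random(seed)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if statistic(shuffled) > observed:
            exceed += 1
    return exceed / n_perm


print(precision([1, 2, 3, 4], [1, 2, 9]))  # 0.5
```

A returned fraction of 0.004, for example, would mean that 4 of 1,000 shufflings produced a higher precision than the observed labelling.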

Because of the need to achieve sufficient statistical power, reconstructing DNA methylation maps of aDNA requires the use of moving averages across windows of consecutive CpG positions8. Although we showed in the past that this smoothing does not substantially affect the detection of differentially methylated regions (DMRs)9, we nevertheless wished to examine the effect of smoothing on the precision of our algorithm. We therefore applied the algorithm to a smoothed version of the above data (Methods) and found that the precision remains very similar to that obtained for non-smoothed data, and that it is always significantly higher than the precision achieved on permuted data (Supplementary Table 3).
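For illustration, a moving average over consecutive CpG positions can be sketched as below. The window size and the handling of the edges (truncated windows) are assumptions made for this example only; the smoothing actually applied is specified in the Methods.

```python
def smooth(values, window):
    """Centred moving average over consecutive CpG positions.

    `window` is the number of positions averaged (assumed odd). Near the
    ends of the list the window is truncated to the positions available.
    """
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


# A sharp step in methylation becomes a gradual transition after smoothing.
print(smooth([0.0, 0.0, 1.0, 1.0, 1.0], window=3))
```

Smoothing blurs sharp boundaries between neighbouring CpG positions, which is why its effect on precision is worth checking separately.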

Identifying differential methylation in neurons

The method we developed can be used to infer aDNA methylation in any tissue. Because the large volume and high complexity of the human brain stand out as hallmarks of modern humans, we decided to demonstrate the applicability of our method to this organ first. To this end, we applied the algorithm with bone serving as the tissue ‘all’ and neuronal cell lines from the prefrontal cortex as the tissue ‘partial’.

For bones, we considered published DNA methylation data (including data generated by us in previous works), data of four human femora measured by us for the current study and reconstructed DNA methylation maps in ancient samples, including maps we reconstructed in previous works10,17 and a new map of a Mesolithic anatomically modern human from Sweden19 that we reconstructed for this study (Methods and Supplementary Table 4). DNA methylation in prefrontal cortex cell lines was taken from published studies20,21. In total, neuronal DNA methylation data comprise 26 modern humans, 11 chimpanzees and 15 macaques, and bone DNA methylation data comprise 11 modern humans (5 measured, 6 reconstructed), 2 archaic humans, 7 chimpanzees and 10 macaques.

We first estimated the precision of our algorithm on a triad of extant species, composed of modern human as reference, chimpanzee as target, and rhesus macaque as outgroup. This yielded a precision of 0.81 in predicting outgroup-specific fundamental changes, and 0.92 in predicting reference-derived fundamental changes, both being significantly higher than the values achieved for random permutations (Fig. 3a).

Fig. 3: Precision in detecting fundamental DNA methylation changes using brain and bone.

a,b, Precision of the algorithm in detecting fundamental changes affecting both bone and prefrontal cortex neurons (a) or bulk prefrontal cortex tissue (b) on observed data (blue) and permuted data (orange). Error bars denote 1 s.d.

We then applied the algorithm to the triad composed of modern human as ‘reference’, archaic human as ‘target’ and chimpanzee as ‘outgroup’, and identified 1,750 CpG positions presenting putative outgroup-specific fundamental changes, and 122 CpG positions presenting putative reference-derived fundamental changes. Of these, 1,131 CpG positions (64.6%), presenting putative outgroup-specific changes, are located in the promoter or gene body of 870 genes, and 80 CpG positions (61.1%), presenting putative reference-derived changes, are located in the promoter or gene body of 71 genes (Supplementary Table 5).

Archaic humans, which serve as ‘target’, comprise only two samples. To assess the possible reduction in power because of this small sample set, we repeated the analysis on the triad of extant species, but randomly downsampled the ‘target’ chimpanzee from 11 samples to only 2. We still detected 7 of the original 8 CpG positions associated with reference-derived changes, and 483 of the 531 original CpG positions associated with outgroup-specific changes. This suggests that the use of only two samples in the target set reduces the power of our analysis by approximately 10–12%.
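The arithmetic behind this estimate is simply the fraction of originally detected positions that is lost after downsampling, computed from the counts quoted above:

```python
# (positions still detected with 2 target samples, positions detected with 11)
retained = {
    "reference-derived": (7, 8),
    "outgroup-specific": (483, 531),
}

# Fraction of the original detections lost after downsampling.
loss = {name: 1 - found / total for name, (found, total) in retained.items()}

for name, value in loss.items():
    print(f"{name}: {value:.1%} of positions lost")
```

This gives a loss of 12.5% for reference-derived positions and about 9% for outgroup-specific positions, in line with the approximate range quoted.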

We accounted for batch effects and evaluated the false discovery rate (FDR) for our actual triad by performing 1,000 permutations, shuffling the tags ‘reference’, ‘target’ and ‘outgroup’ of the bone samples and counting the number of detected reference-derived and outgroup-specific changes. On average, we observed 0.49 reference-derived CpG positions (FDR = 0.005) and 68.6 outgroup-specific CpG positions (FDR = 0.044).

A considerable number of the CpG positions associated with fundamental changes are grouped in clusters. This observation probably reflects the known correlation between adjacent CpG positions, and indicates regional changes in methylation in regulatory regions, where one often finds a close grouping of multiple differentially methylated CpGs22,23. We hypothesized that even isolated CpG positions showing fundamental changes would represent regional methylation changes. Namely, that the adjacent CpG positions would show changes in DNA methylation compatible with the same fundamental changes, despite not crossing the significance threshold. To test this, we computed the average methylation in such adjacent CpG positions, and compared it with both the nearby isolated CpG positions showing a fundamental change and the average methylation level in the genome. As expected, we found that in 89.1% of the cases the methylation level in adjacent CpGs clustered with the neighbouring isolated CpG position, suggesting that even loci featuring an isolated CpG position with a fundamental change might, in fact, represent a regional methylation change.
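One possible reading of this comparison, given as a sketch only, is to ask whether the mean methylation of the adjacent CpG positions lies closer to the isolated position than to the genome-wide average. The function name, the distance criterion and the example values are assumptions, not the procedure used to obtain the 89.1% figure.

```python
def clusters_with_focal(adjacent_levels, focal_level, genome_mean):
    """Return True if adjacent CpGs resemble the isolated CpG more than
    they resemble the genome-wide average methylation level.

    adjacent_levels -- methylation levels (0 to 1) of neighbouring CpGs
    focal_level     -- level at the isolated CpG with a fundamental change
    genome_mean     -- average methylation level across the genome
    """
    adjacent_mean = sum(adjacent_levels) / len(adjacent_levels)
    return abs(adjacent_mean - focal_level) < abs(adjacent_mean - genome_mean)


# Hypothetical values: a hypomethylated isolated CpG in a methylated genome.
print(clusters_with_focal([0.25, 0.30, 0.20], focal_level=0.15, genome_mean=0.75))
```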

CpG positions with fundamental methylation changes are expected to preferentially fall inside regions we detected in a previous work as differentially methylated across the same triad, using a subset of the bone samples used in the current study9. Indeed, 172 of the 1,750 outgroup-specific positions (9.8%) fall in the list of 2,031 DMRs that separate chimpanzees and humans (P < 10^−200, hypergeometric test), and 3 of the 122 reference-derived positions (2.5%) fall in the list of 873 anatomically modern human-derived DMRs (P = 2.8 × 10^−6). Likewise, 137 of the 870 (15.7%) genes associated with outgroup-specific changes overlap with the list of 1,185 differentially methylated genes separating chimpanzees and humans (P = 7.9 × 10^−40), and 5 of the 71 (7%) genes associated with reference-derived changes overlap with the list of 588 differentially methylated genes that have changed in modern humans (P = 4.8 × 10^−3). Notably, we do not expect a complete overlap, because we used an upgraded version of the reconstruction algorithm, including new filtering techniques and histogram matching24.
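The overlap tests above are upper-tail hypergeometric tests. A minimal exact implementation is sketched below; because the size of the background set (the total number of positions or genes considered) is not stated in this section, the example uses small hypothetical numbers rather than the counts from the text.

```python
from fractions import Fraction
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """Exact P(X >= k) for a hypergeometric variable X.

    population -- size of the background set
    successes  -- background items carrying the annotation (e.g. in a DMR)
    draws      -- number of items in the tested list
    k          -- observed overlap between the tested list and the annotation
    """
    total = comb(population, draws)
    upper = min(successes, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return Fraction(tail, total)


# Hypothetical toy numbers: 20 background items, 5 annotated, 4 drawn, 2 overlap.
p_value = hypergeom_upper_tail(2, population=20, successes=5, draws=4)
print(p_value, float(p_value))  # 241/969, roughly 0.249
```

Exact rational arithmetic avoids underflow for very small probabilities, at the cost of speed for genome-scale counts.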

Testing for enrichment in biological functions of our gene lists yielded no significant terms. This is expected, given that a fundamental methylation change is, by definition, a change that is shared across multiple tissues. Specifically, fundamental methylation changes that are shared between bone and brain could be associated with brain, bone or any other tissue (Fig. 1a). Therefore, these changes could be related to a wide range of functions, making it unlikely that specific functional enrichment would be observed. However, given the importance of the brain in human evolution, and the lack of methylation data from brain in archaic human, we focus here on changes that might have a relation to the nervous system.

We focused on genes associated with a particularly large number of positions with fundamental changes (Fig. 4 and Supplementary Table 6). In this context, we identified 26 outgroup-specific changes and 3 reference-derived changes localized in the locus containing the genes ZNF707 and CCDC166. Notably, our knowledge about the functions of these genes is limited, with scant evidence pointing at potential relevance to cancer25,26. Nonetheless, given its classification as a zinc-finger protein, it is reasonable to hypothesize that ZNF707 probably possesses an as-yet undiscovered regulatory role.

Fig. 4: Genes featuring multiple fundamental DNA methylation changes.

a, Genes featuring more than four outgroup-specific fundamental changes. Instances in which changes overlap multiple genes are consolidated in the same column. Methylation alterations overlapping genes associated with the Olduvai domain (NBPF8, NBPF9 and NBPF10) are collectively assembled in the column furthest right. b, Genes featuring more than two reference-derived fundamental changes, along with the reference-derived change that intersects with NBPF9, which harbours the Olduvai domain.

Next in terms of the number of fundamental changes is the gene ADAMTS2, which exhibits 15 outgroup-specific changes. This gene product regulates neuronal migration by cleaving reelin27, and has been implicated in schizophrenia because it activates dopaminergic signalling and exhibits overexpression in the blood of people during episodes of psychosis (a phenomenon that can be reversed by antipsychotic medications)28,29. Moreover, in the Human Phenotype Ontology30, ADAMTS2 shows associations with language impairment, intellectual disability, microcephaly, neurodevelopmental delay and other brain-related terms. The gene NF-YA harbours five outgroup-specific changes. NF-YA is a component of the NF-Y transcription factor, which protects neurons from cell death31. Inactivation of NF-Y has been linked to various forms of neuronal pathologies32,33. PIWIL1 exhibits seven outgroup-specific fundamental changes and regulates neuronal polarization and migration34. BRCA1 harbours five outgroup-specific fundamental changes. Apart from being one of the most well-studied tumour suppressors, BRCA1 plays a role in the survival of neural progenitors35. It is also expressed in the hippocampus, and has been observed to be under-expressed in patients with Alzheimer’s disease. Knocking down this gene in the dentate gyrus leads to learning and memory deficits, impaired synaptic plasticity and neuronal shrinkage36. Notably, hypomethylation in a specific region in BRCA1, which does not overlap any of the positions with fundamental changes, has been associated with Alzheimer’s disease37. Moreover, a variant of BRCA1 has been associated with intellectual disability38. AHRR, which also harbours five changes, represses aryl hydrocarbon receptor (AHR), an important mediator for many cellular events. The main hallmarks of brain ageing, including oxidative stress, neuroinflammation and neurogenesis, are affected by AHR, suggesting its function as a regulator for ageing in the brain, together with its effect on the nervous system development39,40. The gene OTUB1, which encodes the deubiquitylase OTUB1, also exhibits five changes. This gene mediates neuronal survival after intracerebral haemorrhage41,42. The effects of this protein on neuroimmune response contribute to pathologies associated with brain diseases such as Parkinson’s disease and multiple sclerosis43,44.

Turning our attention to genes associated with reference-derived changes, of eight such genes with three altered positions (the highest number of such alterations in a gene), three show a relationship to the brain. Mutations in TMEM216 have been linked to syndromes that involve brain abnormalities45,46. TMEM216 regulates ciliogenesis, a process with substantial importance in brain development, and disruptions in TMEM216 function lead to impairments in various neurodevelopmental processes47. Another gene linked to ciliary function is CROCC. CROCC encodes rootletin, a protein involved in the formation of ciliary rootlets. Mutations in this gene in Drosophila have been associated with sensory deficits48. PSMC2 is a component of the 26S proteasome complex that plays a role in various neural processes, including synaptic plasticity49,50 and brain autophagy51. Notably, impairments in the 26S proteasome complex have been strongly associated with the presence of brain inclusions such as alpha-synuclein and tau tangles52.

Five CpG positions associated with outgroup-specific fundamental changes, and another one associated with reference-derived changes, are located in genes from the neuroblastoma breakpoint family (NBPF), inside the LOC100288142 locus (Fig. 4). NBPF proteins are enriched with a protein domain called the Olduvai domain that is key to human brain evolution. This domain, which is found almost exclusively in NBPF proteins, went through a human-specific increase in copy number, reaching approximately 300 copies. Chimpanzees show fewer than half that number of copies, and the number drops to several dozens of copies in other Old World monkeys, and to around a single copy in non-primate mammals. The domain is absent in non-mammals53. Olduvai domain copy number variation is correlated with brain volume54,55, cognitive functions56 and autistic symptoms57,58,59. It is also negatively correlated with schizophrenia60. NBPF genes in general have been found to enhance neural stem cell proliferation61. Four of the five outgroup-specific fundamental changes are located in NBPF10, one of the NBPF genes with the highest number of human-specific Olduvai domain copies. NBPF10 is adjacent to NOTCH2NL, an additional gene linked to human brain volume62,63, and it has been suggested that the two evolved in tandem64.

To further explore potential regulatory effects of the differential methylation, we examined whether CpG positions with fundamental methylation changes tend to overlap enhancers associated with bones and neurons more than would be expected by chance (Methods). In bones, we observed a significant overlap for both outgroup-specific (298 intersecting positions; P = 9.04 × 10^−10) and reference-derived CpG positions (28 intersecting positions; P = 2.66 × 10^−4). In neurons, significant overlap was only evident for outgroup-specific CpG positions (81 intersecting positions; P = 5.37 × 10^−7), whereas no significant overlap was observed for reference-derived CpG positions (4 intersecting positions). Next, we examined genes that are associated with enhancers that overlap CpG positions with differential methylation. Although there are no such genes for reference-derived CpG positions, there are 330 and 480 genes for outgroup-specific CpG positions in bones and neurons, respectively. Enrichment analysis of these genes revealed similar patterns, despite representing two distinct tissues, highlighting organelle functions over tissue-specific ones (Supplementary Table 7). This aligns with expectations, considering that fundamental changes affect multiple tissues.

We next searched for motifs that overlap these enhancers and transcription factors that bind these motifs (Supplementary Table 8). In outgroup-specific changes we found enrichment of transcription factors that are related to brain development (GLI1, GLI2, ZIC1, NHLH2, NR2F2, ZIC2 and ZNF148; FDR = 2.3 × 10^−3) and specifically forebrain development (GLI1, GLI2, ZIC1, NHLH2 and NR2F2; FDR = 0.01).

Application to bulk prefrontal cortex

In addition to prefrontal neurons, DNA methylation is also available from bulk prefrontal cortex tissue from humans, chimpanzees and rhesus macaque (Methods and Extended Data Fig. 4)65,66. To examine whether the methylation changes we identified in neurons can be detected in whole tissues, we repeated the analysis with these bulk DNA methylation measurements.

We first evaluated the performance of our algorithm using the same triad of extant species as we used before, setting modern human as ‘reference’, chimpanzee as ‘target’ and rhesus macaque as ‘outgroup’, with bone serving as the tissue ‘all’ and bulk prefrontal cortex as the tissue ‘partial’. Unfortunately, only a small number of DNA methylation samples are available for bulk prefrontal cortex (three humans, three chimpanzees and two rhesus macaques), and these have relatively low coverage (Supplementary Table 4), compromising the power of our approach to detect differentially methylated positions. Indeed, although the precision we achieved is significantly higher than for permuted data (P < 10^−4), we observed lower values than in previous analyses. We obtained a precision of 0.66 in predicting reference-derived changes (compared with a mean of 0.32 for permuted data) and a precision of 0.55 in predicting outgroup-specific changes (compared with a mean of 0.36 for permuted data) (Fig. 3b).

We then considered archaic humans as ‘target’ and chimpanzees as ‘outgroup’ and applied our algorithm to predict differential methylation in bulk brain tissue. We found 396 CpG positions associated with putative outgroup-specific changes and 53 CpG positions associated with putative reference-derived changes. The outgroup-specific CpG positions are associated with 248 genes and the reference-derived CpG positions are associated with 35 genes (Supplementary Table 9). Reassuringly, we found high consistency between these CpG positions and those identified using prefrontal neurons. In total, 156 of the 396 outgroup-specific CpG positions appear among the 1,750 positions identified using prefrontal neurons (P < 10^−300, hypergeometric test) and 8 of the 53 reference-derived CpG positions appear among the 122 positions identified using prefrontal neurons (P = 4.7 × 10^−44). Similarly, of the 248 genes associated with outgroup-specific changes in bulk prefrontal cortex, 116 were found in the corresponding list of 870 genes identified using prefrontal neurons (P < 10^−100) and 8 of the 35 genes associated with reference-derived changes were also identified using neurons (P = 3.54 × 10^−16).

Discussion

Based on the embryonic developmental timing of an evolutionary methylation change, we defined two families of changes: fundamental changes are those that occurred before the developmental split between a pair of tissues, whereas tissue-specific changes are those that occurred after the split. In this work, we developed an algorithm to detect CpG positions that underwent fundamental changes, because they result in DNA methylation differences across the examined species that are also replicated in the tissue ‘partial’ (Fig. 1c). Similar methodology can be used to detect tissue-specific changes as well, although these result in no change in DNA methylation across the species in the tissue ‘partial’ (Fig. 1c), hence potentially presenting more limited evolutionary interest. The main algorithmic modification that is required would be to use statistical tests for equivalence, such as the two one-sided t-tests (TOST) procedure67.

The algorithm can be used to determine the methylation state in non-skeletal tissues of archaic humans, as long as methylation data are available for this tissue in both modern humans and chimpanzees (or any other non-human ape, in the case that bone methylation data are available for this species as well). This ability to obtain information on DNA methylation in archaic tissues that are not accessible in the palaeontological record opens new ways to examine changes in gene regulation and their potential effect on evolutionary adaptations in humans.

To quantify performance, we tested the algorithm on three extant species, comprising modern humans, chimpanzees and rhesus macaques (Fig. 2a). The longer branches on this phylogenetic tree (median divergence time between modern humans and rhesus macaques is 28.9 million years ago (Ma), compared with the median divergence time between modern humans and chimpanzees of 6.4 Ma (ref. 68)) result in an underestimation of the performance of the algorithm, because the likelihood of independent reversal of the methylation state along each branch increases. We therefore estimate that our algorithm works with even higher precision than that reported here. We used the algorithm to find genomic positions in which the methylation state can be determined in the brains of archaic humans. We focused on the brain because of its central role in human evolution and the potential implications of brain-related differences between human groups, and between humans and chimpanzees. However, the algorithm we present is general and can be used to determine the methylation state in other archaic tissues. In fact, brain is probably among the most challenging choices, given its large developmental distance from bone (Fig. 1a). A tissue that is developmentally closer to bone, such as muscle or heart, is likely to result in a larger number of positions whose methylation state can be determined.

We showed that we are able to detect two types of fundamental DNA methylation changes. Reference-derived changes are those in which DNA methylation levels in modern humans are different from those in both archaic humans and chimpanzees, representing methylation patterns that are unique to modern humans. Outgroup-specific changes are those in which DNA methylation levels in both modern and archaic humans are different from those in chimpanzees, representing methylation patterns that are shared across modern and archaic humans. In all analyses, we detect fewer reference-derived positions than outgroup-specific ones, although with higher precision. This is expected considering the much shorter evolutionary time span in which the methylation change could have occurred for reference-derived changes compared with outgroup-specific ones (Fig. 1b). Shorter distances mean fewer methylation change events, which result in lower numbers of detected events, but also in lower probability for reversal of methylation changes, and therefore higher precision of parsimony-based inference.

We present here evidence for a strong connection between genes associated with fundamental changes and neuronal functions. Among the most interesting findings is the identification of six fundamental changes in NBPF genes carrying Olduvai domains, which seem to be associated with human brain evolution. Whereas work on the association of Olduvai domains with brain evolution has focused thus far on their copy number, our findings suggest that methylation changes might also have a role; in particular, NBPF10 has four fundamental methylation changes and is also one of the genes with the largest number of Olduvai domains. Nevertheless, future research is needed to establish causality and validate the functional importance of these methylation changes for phenotypes.

The number of CpG positions that can be tested for differential methylation is constrained by the limited availability of DNA methylation data from primate bone, because part of the data was produced using the Illumina Infinium MethylationEPIC BeadChip methylation array (Supplementary Table 4), which covers approximately 3% of the CpG positions in the genome (although these are enriched for functional positions). As a result, our set of CpG positions associated with fundamental changes represents only a subset of the positions that show such an association. A further restriction is that we considered a CpG position only if data were available in at least two samples from each group. This criterion resulted in a restricted set of CpGs available for analysis, ranging from 1% to 17% of autosomal CpGs, depending on the tissues selected for each analysis. These limitations underscore the need for an expanded dataset of bone methylation.

For reference-derived changes, we require that the mean methylation in the ‘target’ be closer to the mean methylation of the ‘outgroup’ than to that of the ‘reference’, whereas for outgroup-specific changes we require the opposite. Addition of this requirement was important for reducing the level of false predictions, which result from the small number of samples in the ‘target’ compared with the other groups. However, when this additional requirement is used, the algorithm is unable to detect CpG positions that have gone through gradual methylation change (for example, where the ‘outgroup’ is unmethylated, the ‘target’ is partially methylated and the ‘reference’ is highly methylated). Such positions could potentially be detected as being simultaneously reference-derived and outgroup-specific methylation changes. However, mean methylation in the ‘target’ group will be closer to either the ‘outgroup’ or the ‘reference’. Therefore, the additional requirement will lead the algorithm to detect only one of the two types of fundamental changes, and information on gradual changes will be lost. We hope that in the future more methylation maps of archaic humans will be published, making use of the additional requirement unnecessary, and allowing for the detection of gradual fundamental methylation changes.

Overall, we present an algorithm to infer DNA methylation in non-skeletal archaic tissues. The algorithm can be applied to any triad of species or groups and could be useful whenever DNA methylation in a certain tissue is unavailable or difficult to obtain.

Methods

DNA methylation data

We collected and generated DNA methylation data from seven tissues and cell types (neurons, bone, kidney, heart, liver, lung and bulk brain) across three species (modern human, chimpanzee and rhesus macaque). In addition, we used reconstructed DNA methylation data in bones of modern and archaic humans. Data were collected only from autosomes, to avoid the unique methylation patterns that characterize sex chromosomes.

Bone DNA methylation of present-day individuals published in this study

DNA methylation maps from four femur head bones from present-day humans were generated using whole-genome bisulfite sequencing (WGBS). Patients were females with osteoarthritis. Samples were extracted during total hip replacement surgery and were taken from the healthy part of the bone. All four patients signed a form according to Helsinki approval 0178-13-HMO. DNA extraction and the WGBS protocol are described in a previous publication9 (Supplementary Table 4). In brief, DNA was extracted from bones using a QIAamp DNA Investigator kit (Qiagen, catalogue no. 56504) and DNA libraries were built with the Illumina TruSeq Sample Preparation kit. Bisulfite treatment was applied in two rounds using the EpiTect Bisulfite kit (Qiagen) and paired-end sequencing was performed on an Illumina Hi-Seq 2000 instrument. We aligned the reads to the hg19 genome assembly using Bismark v.0.23.0 with the following parameters: --bowtie2 --non_bs_mm -p 4 --multicore 4. We then used bismark_methylation_extractor to extract methylation calls with the following parameters: -p --parallel 4 --bedGraph. We did not filter CpG positions by coverage, as we used the default Bismark setting of including positions with a minimum coverage of 1×.

Previously published modern DNA methylation

Neuronal data include WGBS DNA methylation data measured in neurons isolated from prefrontal cortices of humans, chimpanzees and macaques. Data for humans were downloaded from Gene Expression Omnibus (GEO) accession number GSE107638 (ref. 20), taking only the 25 healthy controls. Data for chimpanzees and macaques were downloaded from GEO accession number GSE151768 (ref. 21). WGBS DNA methylation data for kidneys, hearts, livers and lungs of four specimens for each of these species were downloaded from GEO accession number GSE112356 (refs. 69,70). Bulk WGBS brain data that were generated from prefrontal cortices of three modern humans and three chimpanzees were downloaded from GEO accession number GSE37202 (ref. 66). Bulk WGBS data of two macaques were downloaded from GEO accession number GSE77124 (ref. 65). For bone, we used ten Illumina Infinium MethylationEPIC BeadChip (850K) methylation arrays from rhesus71 and data acquired by Gokhman et al.9 including one modern human and one chimpanzee WGBS map, one chimpanzee reduced representation bisulfite sequencing map and four chimpanzee 850K methylation arrays (Supplementary Table 4).

Most non-human DNA methylation data were already mapped to the human hg19 reference genome. However, this was not done for rhesus bulk brain samples, and we therefore mapped these data to hg19 using liftOver72. This mapped ~30% of the rhesus CpG positions to a corresponding CpG position in human. However, in a significantly enriched number of cases, the rhesus CpG positions were mapped to a position preceding a human CpG position by a single base (Extended Data Fig. 4). We considered these as valid mappings, ending up with ~40% of the rhesus CpG positions being successfully mapped to the human hg19 genome reference.
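The single-base offset rule described above can be sketched as follows. This is an illustrative sketch rather than the pipeline used in the study: the function name `snap_to_human_cpg`, the `(chromosome, position)` tuple representation and the convention that a CpG is indexed by the coordinate of its cytosine are all assumptions.

```python
def snap_to_human_cpg(lifted, human_cpgs):
    """Return the human CpG matched to a lifted-over rhesus CpG, or None.

    lifted     -- (chromosome, position) produced by liftOver (hypothetical format)
    human_cpgs -- set of (chromosome, position) tuples for human CpG positions
    """
    chrom, pos = lifted
    if (chrom, pos) in human_cpgs:
        return (chrom, pos)        # exact hit on a human CpG
    if (chrom, pos + 1) in human_cpgs:
        return (chrom, pos + 1)    # lifted position precedes a human CpG by one base
    return None                    # no valid mapping under this rule
```

Under these assumptions, a lifted coordinate is accepted either when it coincides with a human CpG or when the next base does, and is otherwise discarded.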

Ancient DNA methylation maps

We used our newest version of the RoAM software tool24 to reconstruct the methylome of a Mesolithic anatomically modern human individual from Stora Karlsö, Sweden (~9,000 years ago) that had been previously sequenced19 (Supplementary Table 4). In addition, we used our previously published methylation maps of five ancient anatomically modern humans9, one Neanderthal and one Denisovan8, which at the time of writing are the only uracil DNA glycosylase-treated high-coverage archaic samples, two prerequisites for computational reconstruction of aDNA methylation17.

Batch effects

To minimize batch effects stemming from combining data from different sources, we took several measures. First, all data used for optimization and cross-validation were taken from the same laboratory69,70. Similarly, all the brain data, including both prefrontal cortex neurons and bulk brain tissue, were taken from the same laboratory20,21,65,66.

Most of the bone data were produced by us, either specifically for this paper or in previous work9. To address consistency between the modern bone samples and the ancient ones, we used histogram matching during the DNA methylation reconstruction, fitting the reconstructed histogram to that of previously published modern bone data9,73.

Finally, remaining batch effects are accounted for in the estimated false discovery rates of our method.

Detecting CpG positions with differential DNA methylation

The algorithm receives methylation data from three species and two tissues (Fig. 1b). Only CpG positions where, for each tissue and species, data are available from at least two samples were considered. In addition, only positions where the variance in methylation levels in each tissue and species was <0.08 were considered. This threshold was selected to guarantee that a minimum set of two samples will have at most a 40% difference in methylation level.
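The link between the variance threshold and the 40% bound can be checked directly. For two samples with methylation levels a and b, the unbiased sample variance is (a − b)²/2, so a difference of 0.4 gives 0.16/2 = 0.08. The sketch below applies both filters to a single CpG position. It is a minimal illustration that assumes the unbiased (n − 1) variance estimator, which the text does not state explicitly; the function name and the dictionary layout are hypothetical.

```python
from statistics import variance  # unbiased sample variance (divides by n - 1)

MIN_SAMPLES = 2       # at least two samples per tissue and species
MAX_VARIANCE = 0.08   # variance threshold quoted in the text


def passes_filter(groups):
    """Decide whether one CpG position is kept.

    groups -- dict mapping (species, tissue) to a list of methylation
              levels in [0, 1] at this position (hypothetical layout)
    """
    for levels in groups.values():
        if len(levels) < MIN_SAMPLES:
            return False
        if variance(levels) >= MAX_VARIANCE:
            return False
    return True
```

A position is kept only if every species and tissue combination passes both checks, so a single noisy or under-sampled group removes the position from the analysis.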

Identifying reference-derived fundamental changes

Let us have methylation data for N CpG positions and let \(0\le {m}_{i,\;j}^{s,u}\le 1\) be the measured methylation level of sample j at CpG position i (i = 1, …, N), in species s and tissue u. We say that the methylation level at CpG position i is reference-derived in tissue ‘all’ if

$$\left|{t}_{i}^{r,a}\right|\ge {t}^{r,a}+{k}_{a}^{{rd}}{\sigma }^{r,a}.$$

Here, \({t}_{i}^{r,a}\) is the statistic of an equal-variance t-test at CpG position i in tissue ‘all’, contrasting the ‘reference’ with the ‘target’ and ‘outgroup’, and \({t}^{r,a}\) and \({\sigma }^{r,a}\) are the estimators of the mean and the standard deviation of the statistic across the N positions. \({k}_{a}^{{rd}}\) is a parameter.

Similarly, we say that CpG position i shows differential methylation between the ‘reference’ and the ‘outgroup’ in tissue ‘partial’ if

$$\left|{t}_{i}^{{ro},p}\right|\ge {t}^{{ro},p}+{k}_{p}^{{rd}}{\sigma }^{{ro},p}.$$

Here, \({t}_{i}^{{ro},p}\) is the statistic of an equal-variance t-test at CpG position i in tissue ‘partial’, contrasting the ‘reference’ with the ‘outgroup’, and \({t}^{{ro},p}\) and \({\sigma }^{{ro},p}\) are the estimators of the mean and the standard deviation of the statistic across the N positions. \({k}_{p}^{{rd}}\) is a parameter.

To identify reference-derived fundamental changes (left-hand column in Fig. 1c), we look for CpG positions i that satisfy:

$$\left|{t}_{i}^{\,r,a}\right|\ge {t}^{r,a}+{k}_{a}^{{rd}}{\sigma }^{r,a}$$
(1a)
$$\left|{t}_{i}^{\,{ro},p}\right|\ge {t}^{{ro},p}+{k}_{p}^{{rd}}{\sigma }^{{ro},p},$$
(1b)
$${t}_{i}^{\,r,a}\cdot {t}_{i}^{\,{ro},p} > 0.$$
(1c)

Inequality (1c) guarantees that the methylation difference between the ‘reference’ and the ‘outgroup’ in both tissues is in the same direction.

We found it useful to add a fourth condition to equation (1a–c), verifying that the mean methylation in ‘target’ is closer to the mean methylation of the ‘outgroup’ than to the mean methylation of the ‘reference’. If \(\overline{{m}_{i}^{r,a}}\), \(\overline{{m}_{i}^{t,a}}\) and \(\overline{{m}_{i}^{o,a}}\) are the mean methylation across samples in tissue ‘all’ in position i of the ‘reference’, ‘target’ and ‘outgroup’, respectively, then we require

$$\left|\overline{{m}_{i}^{t,a}}-\overline{{m}_{i}^{r,a}}\right| > \left|\overline{{m}_{i}^{t,a}}-\overline{{m}_{i}^{o,a}}\right|.$$
(1d)

In the implementation, we kept this additional condition optional because the algorithm detects methylation changes with high precision even without it. However, it is advisable to use it when the number of samples in ‘target’ is small compared with the other groups, because it prevents mispredictions that arise as a result of the lower weight of the ‘target’ samples in the t-test. Because only two archaic human samples serve as ‘target’ in the current study, we used this additional condition throughout the paper. Positions that satisfy equation (1a–d) are those for which we predict that the methylation in ‘target’ in tissue ‘partial’ clusters with that in ‘outgroup’, and differs from that in ‘reference’ (Fig. 1c).

Identifying outgroup-specific fundamental changes

We follow a similar approach to identify fundamental changes in which the methylation change in tissue ‘all’ is outgroup-specific (right-hand column in Fig. 1c). CpG positions that we predict have gone through outgroup-specific fundamental changes should satisfy:

$$\left|{t}_{i}^{o,a}\right|\ge {t}^{o,a}+{k}_{a}^{{os}}{\sigma }^{o,a}$$
(2a)
$$\left|{t}_{i}^{{ro},p}\right|\ge {t}^{{ro},p}+{k}_{p}^{{os}}{\sigma }^{{ro},p},$$
(2b)
$${t}_{i}^{o,a}\cdot {t}_{i}^{{ro},p} > 0.$$
(2c)

Here, \({t}_{i}^{o,a}\) is the statistic of an equal-variance t-test at CpG position i in tissue ‘all’ contrasting the ‘outgroup’ with the ‘target’ and ‘reference’, \({t}^{o,a}\) and \({\sigma }^{o,a}\) are the estimators of the mean and the standard deviation of the statistic across the N positions, and \({k}_{a}^{{os}}\) and \({k}_{p}^{{os}}\) are parameters. Note that the t-test for tissue ‘partial’, equation (2b), is identical to the one for the reference-derived fundamental changes, equation (1b), because in both cases we wish to find a significant difference in tissue ‘partial’ between the ‘outgroup’ and the ‘reference’. Yet, we allow the parameters \({k}_{p}^{{rd}}\) and \({k}_{p}^{{os}}\) to be potentially different.

Here, too, we added a fourth condition that guarantees that the mean methylation in ‘target’ is closer to ‘reference’ than to ‘outgroup’,

$$\left|\overline{{m}_{i}^{t,a}}-\overline{{m}_{i}^{o,a}}\right| > \left|\overline{{m}_{i}^{t,a}}-\overline{{m}_{i}^{r,a}}\right|.$$
(2d)

In the implementation, this condition is optional, but we used it throughout this work.
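Conditions (1a–d) and (2a–d) can be sketched with a single function. The code below is a minimal illustration under stated assumptions and is not the published implementation: the function names and the list-of-lists data layout are hypothetical, the standard deviation across positions is taken as the sample standard deviation, and every t-statistic is oriented as ‘reference’ side minus ‘outgroup’ side so that the sign products in (1c) and (2c) compare like with like. Positions with zero pooled variance are assumed to have been removed beforehand.

```python
from math import sqrt
from statistics import mean, stdev


def pooled_t(x, y):
    """Equal-variance two-sample t statistic, oriented as mean(x) - mean(y)."""
    nx, ny = len(x), len(y)
    mx, my = mean(x), mean(y)
    ss = sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)
    pooled_var = ss / (nx + ny - 2)  # assumed > 0 (constant positions dropped)
    return (mx - my) / sqrt(pooled_var * (1 / nx + 1 / ny))


def fundamental_changes(ref_all, tgt_all, out_all, ref_part, out_part,
                        k_a, k_p, kind="reference-derived", use_fourth=True):
    """Return indices of positions satisfying (1a-d) or (2a-d).

    Each data argument is a list over CpG positions of lists of
    per-sample methylation levels (hypothetical layout).
    """
    n = len(ref_all)
    if kind == "reference-derived":
        # 'reference' versus pooled 'target' + 'outgroup' in tissue 'all'
        t_a = [pooled_t(ref_all[i], tgt_all[i] + out_all[i]) for i in range(n)]
    elif kind == "outgroup-specific":
        # pooled 'reference' + 'target' versus 'outgroup' (orientation assumed)
        t_a = [pooled_t(ref_all[i] + tgt_all[i], out_all[i]) for i in range(n)]
    else:
        raise ValueError(f"unknown kind: {kind}")
    # 'reference' versus 'outgroup' in tissue 'partial' (same test for both kinds)
    t_p = [pooled_t(ref_part[i], out_part[i]) for i in range(n)]

    thr_a = mean(t_a) + k_a * stdev(t_a)
    thr_p = mean(t_p) + k_p * stdev(t_p)

    hits = []
    for i in range(n):
        if abs(t_a[i]) < thr_a or abs(t_p[i]) < thr_p:   # (a) and (b)
            continue
        if t_a[i] * t_p[i] <= 0:                         # (c): same direction
            continue
        if use_fourth:                                   # optional (d)
            m_r, m_t, m_o = mean(ref_all[i]), mean(tgt_all[i]), mean(out_all[i])
            d_ref, d_out = abs(m_t - m_r), abs(m_t - m_o)
            if kind == "reference-derived" and not d_ref > d_out:
                continue
            if kind == "outgroup-specific" and not d_out > d_ref:
                continue
        hits.append(i)
    return hits
```

In this sketch the fourth condition is what separates the two kinds of change when the ‘target’ group is small: a position where the ‘target’ mean sits next to the ‘reference’ can still pass (1a–c), because the few ‘target’ samples carry little weight in the pooled t-test, but it is rejected by (1d) and accepted by (2d).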

Measuring performance

Whenever the true value of the methylation in the ‘target’ at tissue ‘partial’ is known, we can use it to estimate the precision of our algorithm. To this end, we define CpG positions for which there is truly a reference-derived methylation change in tissue ‘partial’ as those that satisfy

$$\left|{t}_{i}^{\,r,p}\right|\ge {t}^{r,p}+{k}_{t}^{{rd}}{\sigma }^{r,p}.$$
(3)

Here, \({t}_{i}^{\,r,p}\) is the statistic of an equal-variance t-test at CpG position i in tissue ‘partial’, contrasting the ‘reference’ with the ‘target’ and ‘outgroup’, \({t}^{r,p}\) and \({\sigma }^{r,p}\) are the estimators of the mean and the standard deviation of the statistic across the N positions, and \({k}_{t}^{{rd}}\) is a parameter. Similarly, we define CpG positions for which there is truly an outgroup-specific methylation change in tissue ‘partial’ as those that satisfy

$$\left|{t}_{i}^{\,o,p}\right|\ge {t}^{\,o,p}+{k}_{t}^{{os}}{\sigma }^{\,o,p},$$
(4)

where \({t}_{i}^{o,p}\) is the statistic of an equal-variance t-test at CpG position i in tissue ‘partial’, contrasting the ‘outgroup’ with the ‘target’ and ‘reference’, \({t}^{o,p}\) and \({\sigma }^{o,p}\) are the estimators of the mean and the standard deviation of the statistic across the N positions, and \({k}_{t}^{{os}}\) is a parameter.

Let \({S}_{p}\) be the set of CpG positions for which we predict a reference-derived fundamental change, namely, those positions that satisfy equation (1a–d). Let \({S}_{e}\) be the set of reference-derived fundamental changes that are considered ‘true’ changes, defined as the collection of positions that satisfy

$$\left|{t}_{i}^{r,a}\right|\ge {t}^{r,a}+{k}_{a}^{{rd}}{\sigma }^{r,a}$$
(5a)
$$\left|{t}_{i}^{r,p}\right|\ge {t}^{r,p}+{k}_{t}^{{rd}}{\sigma }^{r,p},$$
(5b)
$${t}_{i}^{r,a}\cdot {t}_{i}^{r,p} > 0,$$
(5c)

where equation (5a) is identical to equation (1a) and (5b) is identical to equation (3). Then, the precision of our algorithm that predicts reference-derived fundamental changes is

$${{\mathrm{Prc}}}=\frac{|{S}_{p}\cap {S}_{e}|}{|{S}_{p}|},$$
(6)

where |S| is the size of set S.

Analogously, for outgroup-specific fundamental changes we define \({S}_{p}\) as the set of positions that satisfy equation (2a–d), and \({S}_{e}\) as the set of positions that are taken as ‘true’ outgroup-specific fundamental changes, namely the set of positions that satisfy

$$\left|{t}_{i}^{o,a}\right|\ge {t}^{o,a}+{k}_{a}^{{os}}{\sigma }^{o,a},$$
(7a)
$$\left|{t}_{i}^{o,p}\right|\ge {t}^{o,p}+{k}_{t}^{{os}}{\sigma }^{o,p},$$
(7b)
$${t}_{i}^{o,a}\cdot {t}_{i}^{o,p} > 0,$$
(7c)

where equation (7a) is identical to equation (2a) and (7b) is identical to equation (4). Precision is defined as in equation (6).
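Equation (6) reduces to simple set arithmetic once the predicted and ‘true’ positions are known. The sketch below is illustrative only; the function name and the use of integer position indices are assumptions.

```python
def precision(predicted, true):
    """Equation (6): |S_p ∩ S_e| / |S_p| for two collections of CpG indices."""
    s_p, s_e = set(predicted), set(true)
    if not s_p:
        raise ValueError("precision is undefined when no positions are predicted")
    return len(s_p & s_e) / len(s_p)
```

For example, four predicted positions of which two belong to the ‘true’ set give a precision of 0.5.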

Parameter estimation

Overall, the algorithms that predict reference-derived changes and outgroup-specific changes are independent of each other, and each respective set of parameters can be optimized separately. Each algorithm uses two parameters: \({k}_{a}^{{rd}}\) and \({k}_{p}^{{rd}}\) for predicting reference-derived changes (equation (1a–d)), and \({k}_{a}^{{os}}\) and \({k}_{p}^{{os}}\) for predicting outgroup-specific changes (equation (2a–d)). Moreover, the performance of each algorithm is determined using an additional parameter, either \({k}_{t}^{{rd}}\) for the reference-derived changes or \({k}_{t}^{{os}}\) for the outgroup-specific changes. All three parameters for each algorithm were optimized together. Permuted data were generated by pairing a random CpG position in tissue ‘partial’ to each position in tissue ‘all’. For each analysis, 10,000 permuted datasets were generated. The optimized parameters were chosen as the values that gave the largest difference between the precision of the algorithm on the real data and its precision on permuted data.
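The optimization objective can be sketched as the gap between the precision on the real pairing and the precision on permuted pairings. This is a hedged illustration: `score` is a hypothetical callable that returns the precision for a given pairing of tissue ‘all’ rows with tissue ‘partial’ rows, the random pairing is implemented as a shuffle without replacement, and the permuted precisions are summarized by their mean. None of these three choices is specified in the text.

```python
import random
from statistics import mean


def precision_gap(score, all_rows, partial_rows, n_perm=10_000, seed=0):
    """Precision on the real data minus mean precision over permuted datasets.

    score        -- hypothetical callable (all_rows, partial_rows) -> precision
    all_rows     -- per-position data for tissue 'all'
    partial_rows -- per-position data for tissue 'partial', same order
    """
    rng = random.Random(seed)
    real = score(all_rows, partial_rows)
    permuted = []
    for _ in range(n_perm):
        shuffled = list(partial_rows)
        rng.shuffle(shuffled)  # re-pair each 'all' position with a random 'partial' position
        permuted.append(score(all_rows, shuffled))
    return real - mean(permuted)
```

Under these assumptions, the parameter set that maximizes the returned gap would be the one selected.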

We performed the optimization process by using chimpanzee as ‘target’, modern humans as ‘reference’ and rhesus macaque as ‘outgroup’ and choosing heart as tissue ‘all’ and kidney as tissue ‘partial’. We scanned all values of the parameters in a grid of values from 2 to 4 using linear spacing of 0.2 for each parameter. Notably, for each value of the parameters, both in the reference-derived and the outgroup-specific comparisons, the precision of the algorithm when applied to the real data was higher than the average precision of the corresponding permuted data (Extended Data Fig. 1). Moreover, not a single permutation in any set of parameters achieved precision that exceeded that of the real data. The optimized values of the parameters are given in Supplementary Table 1.
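A grid from 2 to 4 in steps of 0.2 contains 11 values per parameter, so scanning three parameters jointly covers 11³ = 1,331 combinations. The sketch below enumerates that grid and picks the best combination. The `objective` callable is hypothetical and stands for whatever quantity is being maximized, for instance the difference between real and permuted precision.

```python
from itertools import product


def grid(lo=2.0, hi=4.0, step=0.2):
    """Linearly spaced values from lo to hi inclusive, rounded to avoid float drift."""
    n = round((hi - lo) / step)
    return [round(lo + i * step, 10) for i in range(n + 1)]


def best_parameters(objective):
    """Exhaustively scan (k_a, k_p, k_t) and return the maximizing triple.

    objective -- hypothetical callable (k_a, k_p, k_t) -> value to maximize
    """
    values = grid()
    return max(product(values, repeat=3), key=lambda ks: objective(*ks))
```

An exhaustive scan is affordable at this grid size, which avoids any dependence on the starting point that an iterative optimizer would introduce.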

Some of the optimized parameters take their value at the edge of the grid, pointing at the possibility of achieving even higher precision using values that are outside the current grid. However, we noticed that in these cases precision reaches a plateau, so that the improvement is minor when changing the value of the parameter. Combined with the fact that values of the parameters outside the grid decreased the number of detections and hence reduced the power of the analysis, we decided to bound the value of the parameters and not extend the grid (Extended Data Fig. 2). The set of optimized parameters is remarkably robust to the selection of the training dataset and changed only slightly when using other tissue combinations (Supplementary Table 2).

Smoothing data

Smoothing of methylation data in modern samples was performed in the cross-validation part only. It was done using a moving average with a fixed window size over CpG positions. We used a window size of 31 CpG positions, to match typical window sizes used for reconstructing ancient methylation8,9.
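A moving average over CpG positions can be sketched as follows. The window is counted in CpG positions rather than base pairs, as in the text. How window edges and missing values were handled is not stated, so the truncation of the window at the ends of the sequence is an assumption, as is the function name.

```python
def smooth(levels, window=31):
    """Centered moving average over consecutive CpG positions.

    levels -- methylation levels ordered by genomic position
    window -- odd number of CpG positions per window (31 in the text)
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(levels)):
        lo = max(0, i - half)                 # truncate at the left edge (assumption)
        hi = min(len(levels), i + half + 1)   # truncate at the right edge (assumption)
        smoothed.append(sum(levels[lo:hi]) / (hi - lo))
    return smoothed
```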

Gene enrichment analysis

A CpG position with a predicted fundamental change is associated with a gene if it resides in the gene body or in the promoter region, defined as 5,000 bp upstream of the transcription start site to 1,000 bp downstream of it. Gene body coordinates and gene names were determined using the hg19 genome reference downloaded from UCSC genome assembly74.
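The association rule can be sketched for a single gene as shown below. This is an illustration only: it assumes one transcription start site per gene, located at the gene start on the plus strand and at the gene end on the minus strand, and it treats all intervals as inclusive. The function name and argument layout are hypothetical.

```python
def is_associated(cpg_pos, gene_start, gene_end, strand,
                  upstream=5000, downstream=1000):
    """True if a CpG lies in the gene body or in the promoter window.

    The promoter spans `upstream` bp before the TSS to `downstream` bp
    after it, measured in the direction of transcription.
    """
    if strand == "+":
        tss = gene_start
        promoter = (tss - upstream, tss + downstream)
    elif strand == "-":
        tss = gene_end
        promoter = (tss - downstream, tss + upstream)
    else:
        raise ValueError("strand must be '+' or '-'")
    in_body = gene_start <= cpg_pos <= gene_end
    in_promoter = promoter[0] <= cpg_pos <= promoter[1]
    return in_body or in_promoter
```

Making the window strand-aware matters because ‘upstream’ lies at higher coordinates for genes on the minus strand.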

We used DAVID75,76 to test for enrichment of our gene lists in biological functions. We used all default datasets.

Enhancer enrichment analysis

Enhancer locations were downloaded from EnhancerAtlas77. For bone enhancers we used the osteoblasts datasets. For neuron enhancers we combined data of embryonic stem cell neurons and cerebellum neurons. Enrichment was evaluated using a hypergeometric test followed by the Benjamini–Hochberg FDR procedure. EnhancerAtlas includes a comprehensive list of genes associated with each enhancer. We used DAVID to investigate functional enrichments for genes linked to enhancers that intersect with the detected positions. Further, we used MEME-ChIP78 to examine which of the positions that overlap enhancers also overlap binding motifs, and then used TOMTOM79 to identify transcription factors that bind these motifs.
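The two statistical steps named above, a hypergeometric test followed by Benjamini–Hochberg adjustment, can be sketched with the Python standard library. This is a generic illustration and not the code used in the study: the function names are hypothetical, and the choice of background set (the population size `M`) is not specified in the text and must be supplied by the user.

```python
from fractions import Fraction
from math import comb


def hypergeom_sf(k, M, K, n):
    """Upper tail P(X >= k): n draws without replacement from M items, K marked.

    Computed exactly with integers; probabilities below the smallest
    representable float are returned as 0.0.
    """
    upper = min(K, n)
    numerator = sum(comb(K, i) * comb(M - K, n - i)
                    for i in range(max(k, 0), upper + 1))
    return float(Fraction(numerator, comb(M, n)))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min         # enforce monotonicity
    return adjusted
```

In an enrichment setting, `K` would be the number of background positions overlapping enhancers, `n` the number of detected positions and `k` the number of detected positions that overlap enhancers; one p-value per enhancer dataset would then be passed to `benjamini_hochberg`.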

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.