Abstract
Contrary to expectations based on their higher cell numbers, larger and longer-lived species do not face dramatically increased risk of cancer. This strongly suggests that evolution has fashioned natural cancer resistance mechanisms, yet our knowledge remains limited on what these mechanisms might be. The cancer immunological surveillance hypothesis, proposed by Burnet and Thomas in the 1950s, highlights immunity as a key factor determining species-specific cancer resistance. Here we address the original, evolutionary interpretation of this hypothesis by investigating the relationship between cancer mortality risk and markers of efficient antigen presentation. Our results show that the expansion of the MHC class I gene complex, as well as increased selection for diversity at these genes is associated with sharply decreasing cancer mortality risk across mammals. This suggests that the efficient presentation of diverse peptides in somatic cells is important for cancer suppression across mammals, providing pioneering evidence that supports the cancer immunosurveillance hypothesis across species.
Introduction
Cancer is associated with mutations in genes that control the growth, division, or functioning of cells. These alterations are generally reflected by the phenotype of cells, including a set of tumor-specific or tumor-associated antigens exhibited at the cell surface that differ (in structure, distribution, or density) from those of healthy cells, making them immunogenic1. Such immunogenicity implies the recognition of tumor antigens by the adaptive immune system, potentially triggering the elimination of cells carrying mutations, often before tumor formation could take place. Critical to the functioning of this process is the major histocompatibility complex (MHC, in humans named human leukocyte antigen complex, or HLA)2. The MHC gene complex is present in all amniotes and encompasses some of the most variable functional genes described across these lineages1,3,4, with the number of duplicated genes being unusually high and their alleles exhibiting hyper-polymorphism5. Genes of the MHC code for membrane proteins whose primary role is antigen presentation in the immune response2. MHC class I (MHC I) molecules have a particularly important role in cancer defense6. These proteins are present on the surface of all nucleated cells in vertebrates4, and serve the function of presenting intracellular protein fragments to the immune system. In healthy cells, this implies the presentation of fragments of proteins normally present in cells, eliciting no eliminative response from the immune system, through mechanisms of self-tolerance7. In case of an intracellular infection (e.g., viruses, intracellular bacteria) or genetic alterations (e.g., cancer driver mutations), peptides unusual to healthy cells are transported and exhibited on the surface of cells by MHC I molecules. Abnormal antigens are then recognized by CD8 T lymphocytes and (unless T cell responses are restrained by immune checkpoints8) cells carrying these are destroyed by CD8 effector T cells1. 
In the case of cancer, this process is called T cell-mediated cancer immunosurveillance, and it is widely recognized as an important pathway that nips many cancers in the bud6. The cancer immunosurveillance hypothesis is not new, having its roots in the early 20th century work of Paul Ehrlich9, later being suggested by Lewis Thomas10, and further developed by Sir Macfarlane Burnet11. Thomas and Burnet explicitly framed immunological surveillance as an evolutionary hypothesis, suggesting that the need to recognize and eliminate tumor cells was the primary selective force leading to the emergence of adaptive immunity, a major evolutionary event that happened about 500 million years ago12. The role of adaptive immunity in cancer defense is now well established, with extensive experimental work demonstrating much higher rates of carcinogen-induced, as well as spontaneous tumors in mice lacking adaptive immunity as compared to wild-type mice13.
The efficiency of T cell-mediated cancer immunosurveillance is contingent on multiple factors, and the MHC I genotype could be one of these14. The generally observed high variability in MHC alleles within diverse animal taxa has been suggested to indicate the evolutionary advantage of MHC heterozygotes over homozygotes4,15,16. This is also supported by observations that codons maintaining the structural integrity of the MHC I molecules are highly conserved, while codons of the peptide-binding regions are extremely variable15,17. Heterozygote advantage might emerge due to superior protection against multiple infectious agents or mutations4,18. For instance, heterozygosity in MHC I genes at the level of individuals is associated with a larger repertoire of protein fragments that are able to bind to the peptide-binding groove of MHC I molecules3,18,19, potentially resulting in more efficient elimination of cells with abnormal intracellular proteins, such as those emerging due to infections or genetic damage16,19,20. In the case of cancer defenses, the importance of MHC class-I proteins is striking6,7. For instance, in human cancer patients, it was recently demonstrated that heterozygosity at HLA-I loci (and even nucleotide diversity of HLA-I alleles across fully heterozygous individuals) is a good predictor of the efficacy of checkpoint blockade immunotherapy19,20. These studies demonstrate that HLA-I genetic diversity plays a central role in efficient immunological surveillance of abnormal cells by increasing the repertoire of peptides presented on MHC I (the immunopeptidome). Moreover, abundant evidence suggests that suppression of MHC I expression and the resulting impairment in antigen presentation is a mechanism widely used by tumor cells to evade T cell-mediated cancer immunosurveillance7,21. The significance of MHC I in cancer prevention is evident not only at an individual level but also within populations of certain animals. 
For example, the suppression of classical MHC I molecules on the surfaces of cancer cells, and the presence of non-classical MHC molecules (which inherently lack variations between individuals) are believed to collectively contribute to the heightened vulnerability of Tasmanian devils to the clonally transmissible cancer known as devil facial tumor disease22,23,24, initiated by cancer cells spreading as allografts among individuals. Due to its prominent role in cancer defense, MHC-I-mediated immune surveillance is currently also the primary target mechanism for cancer vaccines25,26. Moreover, recent evidence has suggested that MHC I genotype predicts the mutational landscape of tumors in human cancer patients14. When compared across species, clear differences in MHC I genotype are evident regarding both loci number and allele diversity27,28, mostly believed to be shaped by pathogen-mediated balancing selection. Nevertheless, we have no understanding of how cross-species variability in MHC genetics is linked with cancer risk.
Cancer susceptibility varies across animals, with some species highlighted for their exceptionally low risk of cancer (e.g., naked mole rats29). Moreover, long-lived, large-bodied species are often highlighted for their surprisingly low risk of cancer compared to theoretical expectations based on their large cell numbers (due to their large size and continuous cell turnover throughout their long lives)30,31,32. This observation, known as “Peto’s paradox”, suggests that evolution has solved the problem of carcinogenesis in large-bodied and/or long-lived species33. A recent study has highlighted that a key mechanism through which this might happen is a slower rate at which the mutational burden of somatic tissues increases, at least in long-lived taxa, pushing cancer to mostly emerge in late age-classes34. Such deceleration in mutational accumulation during the evolution of long-lived species could occur through (1) a lowered rate at which mutations emerge (e.g., better protection of DNA), (2) an improved DNA repair and/or neutralization of damaged cells (e.g., by apoptosis/cellular senescence)35, (3) more efficient immunological surveillance and elimination of cells with non-silent mutations, or (4) a combination of these factors. In this study, we aimed to explore the third component, by testing whether the efficiency of cancer immunosurveillance (reflected by the complexity, defined here by higher gene copy number or allelic diversity of the MHC I gene complex) shapes cross-species variation in cancer mortality risk (CMR) or inherent cancer predisposition (reflected by body size or longevity) across mammals. We predict (1) decreasing cancer prevalence with increasing complexity or diversity of the MHC I gene complex across species; and (2) increasing complexity or diversity of the MHC I gene complex with increasing longevity and/or body mass in order to compensate for their increased inherent risk of cancer.
Here we built on our previous work30 in which we established a methodology to estimate CMR of hundreds of zoo-held mammalian species, using two metrics i.e., CMR and incidence of cancer mortality (ICM). CMR represents a simple measure of the prevalence of cancer mortality across deceased animals, while ICM is the incidence of cancer mortality at 90% adult longevity, thereby being controlled for age and censoring (see ref. 30 for details). In order to establish the genetic diversity at MHC I loci, we obtained sequences of classical MHC I genes from GenBank, particularly focusing on exons 2 and 3 (i.e., regions coding the extracellular α1 and α2 domains, including the peptide-binding region of the MHC I molecules). Given the heterogeneity in sampling intensity across species, traditional measures of MHC I diversity (e.g., allele diversity) could not be used here. Therefore, we estimated selection for MHC diversity as the ratio of non-synonymous (dN) to synonymous (dS) nucleotide substitution rates, a widely used measure of evolutionary pressure on the coded proteins36. Importantly, only about 25% of codons from the considered exons 2 and 3 code for the peptide-binding groove of the MHC I molecules37,38, responsible for direct interactions with antigens. These functional peptide-binding MHC residues are the most likely targets of positive diversifying selection4. Therefore, to focus on sites most likely involved in antigen recognition, we further estimated selection for MHC I diversity at a subset of codons showing the strongest evidence of positive selection. Specifically, we identified the 20 most positively selected sites (PSS) across the MHC I sequences from all considered species and used the estimates of the dN/dS ratio across this set of sites as a proxy of selection for functionally relevant (in terms of antigen binding properties) variation of MHC I molecules (hereafter referred to as selection for MHC I diversity at PSS). 
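As a minimal illustration of how a dN/dS ratio is conventionally read, the sketch below computes the ratio from two already estimated substitution rates and labels the implied selection regime. The function names and the tolerance parameter are assumptions introduced for illustration only; the sketch does not reproduce the sequence-based estimation procedure used in this study.

```python
def dn_ds_ratio(dn: float, ds: float) -> float:
    """Ratio of non-synonymous (dN) to synonymous (dS) substitution rates."""
    if ds <= 0:
        # The ratio is undefined without synonymous substitutions.
        raise ValueError("dS must be positive for dN/dS to be defined")
    return dn / ds


def selection_regime(omega: float, tol: float = 1e-9) -> str:
    """Conventional reading of dN/dS (hypothetical helper, not from the paper)."""
    if omega > 1 + tol:
        return "positive (diversifying) selection"
    if omega < 1 - tol:
        return "purifying selection"
    return "neutral evolution"


# Hypothetical rates, chosen only to show the three regimes.
for dn, ds in [(0.5, 0.25), (0.25, 0.5), (0.5, 0.5)]:
    omega = dn_ds_ratio(dn, ds)
    print(dn, ds, omega, selection_regime(omega))
```

Values above 1 indicate that amino acid-changing substitutions accumulate faster than silent ones, which is the pattern expected at peptide-binding sites under diversifying selection.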
Moreover, in order to measure the complexity of the adaptive immune system, we estimated the expansion of the MHC class I gene complex over the evolutionary timeline of mammals. We focused on a single gene family (i.e., a group of genes that share a common ancestor) that includes key genes of the MHC-I antigen-presenting system (AZGP1, MHC-A, MHC-B, MHC-C, MHC-E, and MHC-G) and we estimated the number of protein-coding genes belonging to this gene family in each mammalian species using OrthoFinder39.
Results
Selection for MHC I diversity and cancer mortality risk
We obtained MHC I sequence data to estimate selection for MHC I diversity in taxa where cancer mortality could also be assessed. Data were available for 28 species for cancer mortality risk (CMR) and 27 species for incidence of cancer mortality (ICM) (Fig. S1). Using this information, we then explored the link between selection for MHC I diversity and measures of cancer mortality. Species with zero CMR/ICM estimates were excluded from multivariate models (resulting in final sample sizes of 25 and 24, respectively), because zero cancer mortality estimates often reflect poor sampling (see ref. 30), while the excess of zeros skews the distribution of disease risk in a way that violates model assumptions. Moreover, the low number of species with no detected cancer cases (n = 3) did not allow us to run binomial models on cancer incidence. We thus analyzed non-zero CMR and ICM as a function of selection for MHC I diversity in phylogenetic logistic regressions. We detected negative correlations between selection for MHC I diversity and both CMR (β (SE) = −3.62 (2.13), p = 0.1029) and ICM (β (SE) = −5.89 (2.41), p = 0.0232; Table S1 and Fig. 1a, b). The coefficient of determination indicated that selection for MHC I diversity alone explains 24% and 35% of variation in CMR and ICM, respectively. Selection for MHC I diversity at PSS showed considerably stronger associations with CMR (β (SE) = −0.73 (0.21), p = 0.0021) and ICM (β (SE) = −0.64 (0.29), p = 0.0366) (Table S2 and Fig. S3). Selection for MHC I diversity at PSS explained 33% and 28% of cross-species variation in CMR and ICM, respectively.
Non-zero a CMR or b ICM were plotted against selection for MHC I diversity at the 20 most positively selected sites (PSS), expressed as the ratio of non-synonymous (dN) to synonymous (dS) nucleotide substitution rates. Points are proportional to the precision of cancer mortality risk estimates (as indicated by the total number of individuals with RDI). Slopes originate from phylogenetic models using a consensus tree (presented in Table S2) and both associations were statistically significant (p = 0.0034 and 0.0436, respectively). All models were repeated with 100 equally likely phylogenetic hypotheses and the distributions of p-values indicating the relationship between selection for MHC I diversity and c CMR or d ICM are shown. P = 0.05 is marked with a vertical red line. All statistical tests performed are two-sided and no correction for multiple testing was performed.
Expansion of the MHC I gene family and cancer mortality risk
Allelic diversity within a gene can be shaped by copy number variation, where distinct alleles at individual loci enhance population-level diversity and enable single organisms to harbor diverse allelic repertoires. This is especially important for MHC I, exhibiting striking inter-specific variation in locus number27,28,40 from e.g., only three classical loci in humans to up to over a dozen loci in some rodents41. Given this variability, we explored whether the expansion of the MHC I gene complex shows a similar negative correlation with cancer mortality across species.
To quantify MHC I expansion, we estimated the number of protein-coding genes in the MHC I gene family using OrthoFinder39. Data were obtained for 127 and 119 mammalian species with available cancer mortality data (CMR and ICM, respectively) and sequenced genomes. MHC I gene family size ranged from only one protein-coding gene in the Southern three-banded armadillo (Tolypeutes matacus) to 44 in the White-faced saki (Pithecia pithecia), with a mean of 14.60 genes across taxa. We then analyzed non-zero cancer mortality (CMR: n = 106; ICM: n = 100) as a function of MHC I gene family size in phylogenetic logistic regressions. Notably, two rodents, the naked mole rat (Heterocephalus glaber) and the Patagonian mara (Dolichotis patagonum), emerged as high-leverage outliers in our statistical analyses (Fig. 2). Both species exhibit an unusually low MHC I gene family size while also having exceptionally low cancer mortality. Including these species in our models altered model fits considerably, suggesting they may represent biologically distinct cases rather than generalizable trends. Indeed, the immune system of naked mole rats is known to be highly distinctive compared to other mammals (e.g., lack of natural killer cells, immune cell populations dominated by splenic immune cells: 56% compared to 8% in mice42, low frequency of T cells42,43). Although the immune system of the Patagonian mara remains largely unstudied, its unique placement in our analysis raised concerns about its comparability to other taxa in our database. Due to these factors, we excluded both species from our statistical models to ensure robust and generalizable conclusions. Nonetheless, these outliers are visualized in figures and their placement is discussed from an evolutionary perspective.
Following these exclusions, our results indicated a steeply declining cancer mortality with the expansion of the MHC I gene family across mammals (CMR: β (SE) = −0.51 (0.14), p = 0.0004; ICM: β (SE) = −0.47 (0.17), p = 0.0068; Table S3 and Fig. 2). The coefficient of determination indicated that MHC I gene family size alone explains 37% and 15% of the cross-species variance in CMR and ICM, respectively.
Non-zero a CMR or b ICM were plotted against the loge number of protein-coding genes within the MHC class I gene complex. Points are proportional to the precision of cancer mortality risk estimates (as indicated by the total number of individuals with RDI). Slopes originate from phylogenetic models using a consensus tree (presented in Table S3) and both associations were statistically significant (p = 0.0004 and 0.0068, respectively). Please note that the naked mole rat and the Patagonian mara, due to their unique immune features and/or their high leverage, were not considered in these analyses. All models were repeated with 100 equally likely phylogenetic hypotheses and the distributions of p-values indicating the relationship between the size of the MHC I gene family and c CMR or d ICM are shown. P = 0.05 is marked with a vertical red line. All statistical tests performed are two-sided and no correction for multiple testing was performed.
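The data preparation described in this section (dropping zero cancer mortality estimates, excluding the two high-leverage rodents, and taking the natural logarithm of gene family size) can be sketched as follows. This is an illustrative sketch only: the function name, the record layout and all numeric values in the example are hypothetical, and the phylogenetic logistic regression itself is not shown.

```python
import math

# Species excluded from the models as high-leverage outliers (named in the text).
EXCLUDED_SPECIES = {"Heterocephalus glaber", "Dolichotis patagonum"}


def prepare_for_model(records):
    """Filter and transform (species, gene_family_size, cancer_mortality) records.

    Keeps only species with non-zero cancer mortality that are not in the
    exclusion set, and returns the natural log of gene family size as predictor.
    """
    prepared = []
    for species, gene_family_size, cancer_mortality in records:
        if species in EXCLUDED_SPECIES:
            continue  # high-leverage outliers are shown in figures but not modeled
        if cancer_mortality <= 0:
            continue  # zero estimates often reflect poor sampling
        prepared.append((species, math.log(gene_family_size), cancer_mortality))
    return prepared


# Made-up example records; none of these values come from the study.
example = [
    ("Species A", 12, 0.08),
    ("Species B", 30, 0.00),
    ("Heterocephalus glaber", 3, 0.01),
]
print(prepare_for_model(example))
```

Only "Species A" survives the filters in this toy example, since "Species B" has a zero estimate and the naked mole rat belongs to the exclusion set.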
Confounding effects
Previous works have emphasized that cross-species variation in the number of MHC loci has co-evolved with body mass and longevity, at least in birds44. While CMR appears to be unrelated to body mass and adult life expectancy30 (but see ref. 45), these life-history traits might have the potential to confound associations detected between the complexity or diversity of MHC I and CMR. To explore this, we reran the above models, while including adult body mass and adult life expectancy as covariates in phylogenetic logistic regressions exploring the link between cancer mortality measures and selection for MHC I diversity or MHC I gene family size. Models containing these potential confounders provided results highly congruent with those of single-predictor models, both for selection for MHC I diversity (Table S4) and for MHC I gene family size (Table S5). These models thus confirmed the negative association of MHC I complexity and diversity with CMR across the mammalian phylogeny.
Our previous analyses of the cross-species variation in cancer risk across mammals have revealed an important role of the species’ diet30. We have shown that primarily consuming mammalian prey items significantly increases the risk of cancer mortality across mammalian species. Given the important role of mammal-based diet, we tested whether controlling for this effect in our analyses changes our conclusions regarding the role of MHC I in shaping cross-species variation in cancer mortality. The models indicated that the effects of selection for MHC I diversity (Table S6) and of MHC I gene family size (Table S7) on cancer mortality metrics remain robust and are not influenced by the consumption of mammalian prey across species.
The phylogenetic tree used in our analyses represents only one of the multiple hypotheses reflecting the potential evolutionary trajectory of the taxa included in these analyses. Therefore, to explore how sensitive our results are to phylogenetic uncertainty, we reran each model using 100 equally plausible phylogenies. Results on the association between CMR and selection for MHC I diversity or MHC I gene family size were highly consistent between alternative phylogenetic hypotheses (MHC I diversity: Fig. 1c, d; MHC I diversity at PSS: Fig. S3c, d; MHC I gene family size: Fig. 2c, d).
Evolution of MHC I in function of life-history traits
In order to explore the extent to which selection on MHC I is influenced by the evolutionary history of the studied species, we explored the phylogenetic signal of the estimated MHC I traits. The phylogenetic inertia of selection for MHC I diversity was low when measured across all sites (n = 28, λ = 0.11, P = 0.6842, significance marks the difference from λ = 0), but showed moderate phylogenetic signal when measured across PSS (n = 28, λ = 0.55, P = 0.0066). Similarly, MHC I gene family size showed a very strong phylogenetic signal (n = 127, λ = 0.91, p < 0.0001, Fig. S4). We then explored whether selection on MHC I is more intense in large or long-lived taxa and if MHC I could be a factor contributing to the better cancer defense of such species. To do this, we gathered information on average adult body mass, adult life expectancy30 (i.e., average time lived by individuals following their sexual maturity) and 80% adult longevity (post-sexual-maturity age when 80% of the adult population is dead, representing a more reliable longevity metric than maximum lifespan46), and we tested the effects of these traits on MHC I characteristics. To test the correlation between selection for MHC I diversity or MHC gene family size and life-history traits, we built single-predictor phylogenetic regressions. We used MHC I characteristics as dependent variables and life-history traits as explanatory variables in subsequent models. Selection for MHC I diversity was unrelated to adult body mass, but tended to increase with adult life expectancy (R2 = 0.16), and increased significantly with increasing 80% adult longevity (R2 = 0.33, Fig. 3a–c and Table S8). On the contrary, selection for MHC I diversity at PSS showed no association with any of the three life history traits (Fig. 3d–f and Table S8). Similarly, MHC I gene family size was unrelated to body mass, adult life expectancy and 80% adult longevity across the studied mammals (Fig. 4a–c and Table S8).
Selection for MHC I diversity a–c overall or d–f at the 20 most positively selected sites (PSS) in function of a, d adult body mass, b, e adult life expectancy, or c, f 80% adult longevity. Associations were tested using phylogenetic generalized least squares regressions. Models shown in (b, c, e, f) were weighted for (and points are proportional to) the precision of life expectancy and 80% adult longevity estimates (reflected by the log number of individuals with RDI). Slopes originate from phylogenetic regressions presented in Table S8. Selection for MHC I diversity only increased with 80% adult longevity (p = 0.0051). All statistical tests performed are two-sided and no correction for multiple testing was performed.
Loge MHC I gene family size in function of a adult body mass, b adult life expectancy, or c 80% adult longevity. Associations were tested using phylogenetic generalized least squares regressions, either a unweighted or b, c weighted for the number of deaths used to estimate adult life expectancy and 80% adult longevity. b, c points are proportional to the precision of longevity metrics. Slopes originate from phylogenetic regressions presented in Table S8. None of the presented associations reached statistical significance. All statistical tests performed are two-sided and no correction for multiple testing was performed.
Discussion
Our study highlights that diversity and complexity of the MHC I gene complex are key predictors of CMR across mammals. We show that the cross-species variation in the selection for MHC I diversity accounts for approximately one-third of the variation in CMR observed among different mammalian taxa, while MHC I gene family size explains nearly 37%. These pronounced inverse correlations of the MHC I diversity and complexity with CMR suggest a fundamental role of T cell-mediated cancer immunosurveillance in defining the intrinsic cancer vulnerability of various mammalian species. In bolstering the immunological surveillance hypothesis on a cross-species scale, our study underscores adaptive immunity as a cancer defense that is potentially general to the majority of jawed vertebrates, endowed with adaptive immunity. Furthermore, our study highlights a positive correlation between selection for MHC I diversity and species-specific 80% adult longevity. These findings empirically support the idea that the evolution of extended longevity has exerted selective pressure on adaptive immunity, at least in mammals.
Recent studies have highlighted that individual variation in MHC I genotype predicts the occurrence of specific oncogenic mutations14, as well as the occurrence of various cancers among patients47. Moreover, genetic diversity at MHC I loci determines the efficiency of checkpoint blockade immunotherapy in cancer patients19,20. These studies provided evidence that efficient immunological surveillance is a crucial component of the natural cancer defense across individuals of a single species. Our study corroborates these findings and extends on these conclusions on multiple fronts. First, we demonstrate that the diversity and complexity of MHC I is important in cancer defense not just between individuals of a species, but also correlates with the risk of cancer across a wide range of mammalian taxa. This implies that natural selection on immunity, especially on MHC I, might have been one of the key evolutionary mechanisms that shaped natural cancer defenses in mammals. This is particularly important, as it provides empirical support for the original evolutionary formulation of the immunosurveillance hypothesis, put forward over 60 years ago. Interestingly, while the molecular mechanisms underlying the recognition of cancer cells were not yet understood, Lewis Thomas clearly expressed his theory that cancer immunosurveillance relies on the recognition of tumor-specific neoantigens48. Our study provides support for this exact mechanism of tumor surveillance and supports the claims that the need for efficient cancer defense might have been among the primary selective factors leading to the emergence of adaptive immunity.
Second, our results demonstrate that selection for MHC I diversity tends to increase with adult life expectancy and increases significantly with 80% adult longevity, but varies independently of body mass across mammals. This suggests that MHC I diversity, and the ability of the immune system to recognize and respond to potential intracellular pathogens and cancerous cells, might have played a pivotal role in the evolution of slow life-histories. The importance of superior cancer immunosurveillance in species with extended longevity shown here is also congruent with findings of Cagan et al.34, showing that the rate of mutational accumulation in somatic tissues of mammals steeply decreases with increasing longevity, but varies independently of the body mass of the species. Consequently, it is possible that the slow accumulation of somatic mutations in long-lived taxa could be (at least partially) ensured by a more efficient immunological surveillance (mediated by high genetic diversity at the MHC I loci). Nonetheless, while selection for MHC I diversity at PSS and MHC I gene family size followed similar trends, neither exhibited significant correlations with body mass or longevity metrics. This suggests a more limited role of MHC I in the evolution of slow life-histories or may reflect compensatory mechanisms to MHC-based immunity. For instance, in species with limited MHC I diversity, alternative tumor suppressor mechanisms might compensate, such as reliance on innate immunity, or non-immune based defenses (e.g., early contact inhibition in naked mole rats)29.
Our results highlight the benefits of harboring high MHC I diversity in the protection against cancer. Nonetheless, most species do not exhibit extremely high MHC I diversities49. This paradox can be explained by T cell receptor depletion, which is a mechanism that causes a trade-off between MHC diversity and T cell receptor repertoire size41. The latter is important for the immune system, ensuring efficient antigen recognition, mounting effective immune responses, establishing immune memory, and maintaining self-tolerance, contributing to overall immune adaptability and responsiveness. T cell receptor repertoire diversity is shaped through positive and negative selection during the maturation of T cells in the thymus and it is inversely related to the number of expressed MHC molecules50. This mechanism suggests that evolution shapes a careful balance between efficient cancer resistance and self-tolerance, indicating a stabilizing selection on MHC I diversity.
While our results suggest that cancer contributes to the evolutionary pressure on MHC I, the latter plays a key role not just in cancer, but also in intracellular pathogen and parasite defense51,52. Moreover, several cancers are of pathogenic origin53. It is thus possible that the detected associations between selection for MHC I diversity and CMR are indirect and are mediated by selection on MHC I that targets pathogen defense. Exploring this possibility is especially important, since MHC diversity is often linked to pathogen pressure54 and long-lived, large animals are generally better protected from both cancers30,32 and pathogens53, making such indirect associations plausible. Separating these effects, however, requires high-quality information on the prevalence and diversity of a broad range of potential intracellular pathogens that might contribute to selection for MHC I diversity (e.g., viruses, intracellular bacteria, fungi, protozoa, haemosporidians). Yet, such information is difficult to obtain, especially in comparable quality across a wide range of species. It is thus essential that further studies explore the role of pathogen pressure on MHC I besides cancer risk, to clarify the nature of the primary selective pressure on MHC I during the evolutionary history of jawed vertebrates.
The extreme polymorphism of MHC I genes is thought to be maintained by a combination of selection mechanisms, notably heterozygote advantage, negative frequency-dependent selection, fluctuating selection, and mate choice55. These processes can act simultaneously, leave similar molecular footprints, and their relative contributions to the emergence and maintenance of MHC I diversity in natural animal populations remain debated. While our study cannot resolve this debate, it highlights cancer as a potential driver of MHC diversity, in addition to the well-recognized pathogen-driven selection. Future work, particularly using theoretical modeling, should explore whether, and through which evolutionary mechanisms, cancer might contribute to selection on MHC I genes. While most cancers do not co-evolve with hosts in the same way pathogens do, thereby limiting the applicability of some pathogen-driven evolutionary models, exceptions exist. Notably, transmissible cancers and cancers of pathogenic origin (e.g., viral-induced) may follow similar evolutionary dynamics as infectious agents, and thus could promote MHC I diversity through comparable selection mechanisms.
The curious case of some rodents
Our results indicate a very clear association between the expansion of the MHC I gene family and reduced cancer mortality, yet the naked mole rat and the Patagonian mara emerged as species that clearly did not follow the pattern of other mammalian taxa in this respect. For instance, naked mole rats exhibit a markedly simplified MHC-I system compared to what has been observed in most mammalian species, yet their cancer risk is also believed to be one of the lowest of all mammals. The naked mole rat genome contains only three MHC-I genes (two protein-coding and one pseudogene)42, compared to six functional protein-coding genes in humans and up to 22 in mice42. This reduction in MHC I is paired with a complete lack of natural killer cells and of genes that normally control the production of natural killer cells in other rodents42. This is remarkable, as natural killer cells generally rely on MHC-I interactions to detect infected or cancerous cells via the mechanism of missing-self recognition. These unique immune adaptations have been suggested to be a consequence of the underground lifestyle of this species and their consequent limited exposure to airborne viruses, which could have relaxed selective pressure from intracellular pathogens, allowing naked mole rats to prioritize alternative immune strategies42. Research dedicated to naked mole rats in recent decades has, however, also highlighted that this species has unique and highly efficient anti-cancer adaptations that are independent of the immune system56,57. We thus suggest that these alternative tumor-suppressor mechanisms might have contributed to the relaxed selection on MHC I in this species. Alternatively, the reduced MHC I-based immunosurveillance in naked mole rats might have increased the selective pressure for more efficient alternative tumor-suppressor mechanisms in this species, promoting the evolution of the already well-characterized mechanisms of tumor prevention in this taxon.
In either case, species exhibiting low MHC I gene copy number or allele diversity paired with limited cancer incidence might represent interesting target taxa of studies aiming to uncover new natural tumor-suppressor mechanisms across other mammals.
A solution to Peto’s paradox
Peto’s paradox has been a central topic of comparative oncology during the last couple of decades. Solving the paradox is of great importance, as it could clearly highlight natural cancer resistance mechanisms shaped by evolution that could potentially inspire human cancer therapies. Which physiological or genomic adaptations might solve the paradox has been a central topic for multiple comparative studies58,59, with little general insight gained so far. While studies have identified species-specific molecular or cellular mechanisms limiting oncogenesis29, it is extremely unlikely that each animal species possesses uniquely distinct anti-cancer mechanisms, and general tumor-suppressor modalities are expected to exist60. Importantly, our results indicate that adaptive immunity might be one such general (yet non-exclusive) mechanism of cancer resistance in mammals and potentially across jawed vertebrates, where the mechanism is identical across species, yet its efficiency is variable across taxa. This is also supported by the facts that both MHC I gene family expansion and selection for MHC I diversity are associated with low cancer mortality, while selection for MHC I diversity is also positively associated with longevity across mammals. Interestingly, however, selection for MHC I diversity was unrelated to body size, indicating that MHC I might have a limited role in the protection against cancer caused by the increase in body size during evolution.
Methods
Cancer mortality risk
We estimated CMR using data provided by the Zoological Information Management System (ZIMS), managed by Species360 (a non-profit organization custodian of zoo-aquaria data)61 following the methodology of Vincze et al.30. In short, we obtained a new data extraction of individual-level date of birth, date of death and cause of death (categorical data, including neoplasia as an option) from the husbandry module of ZIMS for animals that lived in zoos after the 1st of January 2010 (data extracted in June 2023). We excluded records of species that might have been subject to domestication (see ref. 30 for the list of excluded taxa), individuals below the age of sexual maturity (obtained from literature, see Source Data for species-specific resources) and all species where fewer than 20 individuals had relevant death information (RDI, information regarding the cause of death). Using the resulting database, we first estimated CMR as the simple ratio of the number of individuals who died of cancer to the total number of individuals with RDI in each species. Second, we estimated the cumulative ICM, a metric of CMR at 90% adult longevity, controlling for potential biases due to disregarding left-truncation (i.e., cancer before individuals enter the study) and right-censoring (individuals alive, thus with unknown fate at data extraction)30 using survival modeling. The database used to estimate CMR/ICM data for this study includes 80 species from 13 mammalian orders and contains information on 88,464 individuals, of which 34,128 were dead at the time of data extraction and 8503 had available RDI30.
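The simple CMR ratio and the minimum sample size filter described above can be illustrated with a minimal Python sketch. The record layout, the "neoplasia" label and the function name are illustrative assumptions and do not reflect the ZIMS data format or the R code used in this study.

```python
from collections import defaultdict

MIN_RDI = 20  # species with fewer individuals with RDI are excluded (from the text)


def cancer_mortality_risk(records, min_rdi=MIN_RDI):
    """Return {species: (cmr, n_rdi)} from per-individual death records.

    Each record is a (species, cause_of_death) tuple; cause_of_death is None
    when no relevant death information (RDI) is available. This layout is a
    hypothetical simplification for illustration.
    """
    n_rdi = defaultdict(int)
    n_cancer = defaultdict(int)
    for species, cause in records:
        if cause is None:  # no RDI: the individual does not enter the ratio
            continue
        n_rdi[species] += 1
        if cause == "neoplasia":
            n_cancer[species] += 1
    # Keep only species that reach the minimum number of individuals with RDI.
    return {sp: (n_cancer[sp] / n, n) for sp, n in n_rdi.items() if n >= min_rdi}


# Toy data: species A has 25 individuals with RDI, 5 of them cancer deaths;
# species B has only 3 individuals with RDI and is therefore dropped.
records = (
    [("A", "neoplasia")] * 5
    + [("A", "other")] * 20
    + [("A", None)] * 10
    + [("B", "neoplasia")] * 1
    + [("B", "other")] * 2
)
cmr = cancer_mortality_risk(records)
print(cmr)  # {'A': (0.2, 25)}
```

The number of individuals with RDI is returned alongside the ratio because it is later used (log-transformed) as a model weight.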
Selection for MHC I diversity
In order to estimate selection for MHC diversity, we relied on publicly available resources from the NCBI GenBank Nucleotide Database. As sequence data in GenBank may originate from different sources (e.g., from different populations with varying demographic history) and may have been collected under different sampling schemes (the number of genotyped individuals often remains unknown), they cannot be effectively used to quantify and compare standard measures of MHC diversity (e.g., allelic richness) across species. However, GenBank data provide a sample of species-specific MHC sequences, which can be used to detect the molecular signature of historical selection at these genes and to reliably quantify variation in selection patterns among species.
In order to infer selection patterns at the mammalian MHC class I, we first downloaded GenBank sequences of MHC class I exons 2 and 3 (i.e., the two exons coding for the peptide-binding groove in MHC I) for species with available estimates of CMR (last accessed in June 2023). In three taxa with exceptionally high numbers of sequences (e.g., >3700 available sequences for Macaca fascicularis), we downloaded 100 random sequences per exon. All sequences were aligned using Geneious software v. 10.0.5 (Biomatters Ltd., Auckland, New Zealand) and trimmed to our targeted exons. Identical allelic variants were removed from the alignments. The average length of alignments was 256.3 ± 2.9 [SE] bp and 261.9 ± 3.9 [SE] bp for exons 2 and 3, respectively, which covered 94–97% of full exon length.
To exclude the possible bias caused by the presence of non-classical MHC I sequences, we identified putative non-classical sequences based on the conservation of amino acid residues responsible for anchoring (docking) antigen peptide ends in the MHC peptide-binding groove. The anchor residues are highly conserved in classical MHC molecules across divergent vertebrate lineages62,63, but may experience radical amino-acid changes in non-classical genes. Here, we identified putative non-classical sequences based on the presence of at least one or two amino acid alterations at the anchor residues within exon 2 (n = 3 residues) and exon 3 (n = 7 residues), respectively. In total, we identified up to 12 putatively non-classical sequences per species (on average 4.4 ± 1.1 [SE] % of all sequences per species), and we excluded these from all further analyses. Species with fewer than four classical sequences per exon available were not considered in further analyses. After removal of putative non-classical sequences, consensus sequences from virtually all (96%) alignments better blast-matched alleles from classical (A-C) rather than nonclassical (E-G) human MHC I loci. However, we were still not able to distinguish which classical loci our sequences were derived from. Such information could be only retrieved for a couple of model species, for which the detailed architecture of the MHC region is well resolved, and this was generally not feasible for non-model species. Here, we thus assume that the retrieved sequences were derived from a spectrum of different loci, possibly both paralogs and orthologs. Nevertheless, we believe this is unlikely to bias our estimates, as (1) we used the same methodology in every species; and (2) generally, we retrieved exons from GenBank (not from genomic resources) that were already annotated, avoiding potential issues with manual annotations.
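The anchor-residue filter described above can be sketched as follows. This is a minimal illustration only: the anchor positions and expected residues below are placeholders invented for the example (the actual conserved positions are not listed in this section), and the function names are hypothetical. Only the thresholds (at least one alteration among three exon 2 anchors, at least two among seven exon 3 anchors) are taken from the text.

```python
# Placeholder anchor maps: {0-based position in the translated exon: expected residue}.
# These positions and residues are NOT the real MHC I anchor residues.
EXON2_ANCHORS = {2: "Y", 5: "Y", 8: "W"}  # n = 3 anchors
EXON3_ANCHORS = {0: "T", 1: "K", 2: "W", 3: "Y", 4: "Y", 5: "A", 6: "L"}  # n = 7 anchors

# Minimum number of altered anchors that flags a sequence (from the text).
MIN_CHANGES = {"exon2": 1, "exon3": 2}


def count_anchor_changes(protein, anchors):
    """Count anchor positions whose residue differs from the expected one."""
    return sum(1 for pos, expected in anchors.items() if protein[pos] != expected)


def is_putatively_nonclassical(protein, anchors, min_changes):
    """Flag a translated exon sequence with at least `min_changes` altered anchors."""
    return count_anchor_changes(protein, anchors) >= min_changes


# Toy translated sequences (hypothetical).
exon2_seqs = ["GSYSMYYFWT", "GSHSMYYFWT"]     # second has one altered anchor
exon3_seqs = ["TKWYYAL", "TKWYYAV", "TRWYYAV"]  # 0, 1 and 2 altered anchors

exon2_kept = [s for s in exon2_seqs
              if not is_putatively_nonclassical(s, EXON2_ANCHORS, MIN_CHANGES["exon2"])]
exon3_kept = [s for s in exon3_seqs
              if not is_putatively_nonclassical(s, EXON3_ANCHORS, MIN_CHANGES["exon3"])]
```

Each exon is filtered separately with its own threshold, which matches the per-exon alignments used in the subsequent analyses.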
Since recombination signals may bias estimates of nucleotide substitution rates (no unique tree topology can describe the evolutionary history of the entire sequence64), we first aimed to identify recombination breakpoints and recombinant sequences in our data. For small alignments (<50 seq), we used the Genetic Algorithm for Recombination Detection (GARD, Datamonkey web server65) and partitioned all available sequences into recombinant fragments for further analyses. Since GARD performed poorly for larger alignments (i.e., 50–100 seq), recombinant sequences in these alignments were detected using the RDP4 software66. Briefly, recombination events were identified based on the consensus from at least two different recombination detection algorithms implemented in RDP4. On average, 15.8 ± 3.4 [SE] % of sequences were identified as recombinant and were removed, resulting in final sample sizes below 100 sequences per exon in all species.
The average final size of alignments (classical sequences with no recombinants) was 22.4 ± 3.0 [SE] sequences per exon per species, but ranged from 4 to 91 sequences. To account for this uneven sampling across species (i.e., variation in the number of available sequences), we applied a standardized subsampling procedure for species with >10 available sequences per exon. Specifically, we estimated selection for MHC I diversity using ten randomly selected (subsampled) sequences per species. Subsampling was conducted using the fasta.sample function in the FastaUtils R package67 and iterated ten times per exon per species. In these cases, the estimates of nucleotide substitution rates (see below) were averaged across all iterations to obtain a representative measure of selection for each species.
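The subsampling scheme above (ten sequences, ten iterations, averaged) can be sketched in Python as follows. This is an illustration under stated assumptions, not the FastaUtils-based R workflow used in the study: `subsampled_estimate` is a hypothetical helper, and `proportion_variable_sites` is a toy stand-in for the selection estimate that does not compute dN/dS.

```python
import random
from statistics import mean


def subsampled_estimate(sequences, estimator, n_sub=10, n_iter=10, seed=0):
    """Average `estimator` over random subsamples of `sequences`.

    Alignments with more than `n_sub` sequences are subsampled without
    replacement `n_iter` times and the estimates are averaged; alignments
    with `n_sub` or fewer sequences are used whole.
    """
    if len(sequences) <= n_sub:
        return estimator(sequences)
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return mean(estimator(rng.sample(sequences, n_sub)) for _ in range(n_iter))


def proportion_variable_sites(seqs):
    """Toy estimator: share of alignment columns with more than one base."""
    variable = [len(set(column)) > 1 for column in zip(*seqs)]
    return sum(variable) / len(variable)


# Hypothetical aligned sequences: 4 for a poorly sampled species, 25 for a
# well sampled one.
small_alignment = ["ACGT", "ACGA", "ACCT", "ACGT"]
large_alignment = ["ACGT", "ACGA", "ACCT", "TCGT", "ACGG"] * 5

small_value = subsampled_estimate(small_alignment, proportion_variable_sites)
large_value = subsampled_estimate(large_alignment, proportion_variable_sites)
```

Averaging over repeated draws reduces the dependence of the estimate on any single random subsample while keeping the effective sample size equal across species.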
Following the elimination of potential recombination-related bias and subsampling, we estimated the relative rate of non-synonymous (dN) and synonymous (dS) base pair substitutions using Fast Unconstrained Bayesian AppRoximation68 (FUBAR, Datamonkey) as a primary estimate of selection for MHC I diversity. Although the dN/dS ratio is traditionally used as a measure of positive selection across lineages (different alleles are advantageous and become fixed in independent lineages)36, pathogen-mediated balancing (diversifying) selection acting within lineages tends to leave the same hallmark signature at the MHC genes69. Hence, species-specific dN/dS ratios have commonly been used to estimate the strength of historical selection for MHC diversity (e.g., refs. 70,71,72,73), where dN/dS > 1 indicates diversifying selection (faster accumulation and maintenance of non-synonymous than synonymous substitutions), dN/dS < 1 indicates purifying selection (slower accumulation of non-synonymous than synonymous substitutions), while dN/dS = 1 indicates no signature of selection (non-synonymous and synonymous substitutions accumulate at similar rates).
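The interpretation of the dN/dS ratio given above can be written as a small Python helper. The function name and the numerical tolerance used to decide equality with 1 are assumptions introduced for illustration; the study itself reports continuous dN/dS values rather than categories.

```python
def selection_regime(dn, ds, tol=1e-9):
    """Return (dN/dS, label) following the thresholds stated in the text."""
    if ds <= 0:
        raise ValueError("dS must be positive for the ratio to be defined")
    omega = dn / ds
    if abs(omega - 1.0) <= tol:
        return omega, "no signature of selection"
    if omega > 1.0:
        return omega, "diversifying selection"
    return omega, "purifying selection"


# Hypothetical rates chosen so that the ratios are 0.5, 1 and 3.
examples = [selection_regime(0.5, 1.0),
            selection_regime(1.0, 1.0),
            selection_regime(3.0, 1.0)]
```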
The dN/dS estimates were first computed for all available codons within each alignment. Nonetheless, only about 25% of codons from exons 2 and 3 code for the peptide-binding groove of the MHC I molecules37,38, which are truly responsible for direct interactions with antigens (so-called peptide binding region). Therefore, in order to confirm that the potentially detected associations are driven by functional (rather than structural) variation at the MHC, we re-extracted dN/dS ratios using only codons showing the strongest evidence of positive selection. Specifically, based on the values of Bayes factors, we identified the 20 most strongly positively selected sites (PSS) across the MHC I sequences from all considered species (10 PSS per exon) and used the estimates of the dN/dS ratio across this set of sites as a proxy of selection for functionally relevant variation (selection for MHC I diversity at PSS). Selection for MHC I diversity at PSS was calculated similarly to the overall measures of selection for MHC I diversity (across all sites), using subsampling of 10 sequences, iterated 10 times, and averaged. While both dN and dS were initially calculated separately for exons 2 and 3, we averaged the values across the two exons to obtain a single species-specific estimate of dN/dS ratio at the MHC class I. Selection for MHC I diversity could be calculated for a total of 28 mammalian taxa with available cancer risk estimates.
Our analyses revealed relatively low dN/dS values, especially when all MHC I codons were considered (mean = 0.41, n = 28). This might be attributed to the fact that only a few codons of the peptide binding region are directly responsible for interactions with antigens and, thus, are under positive diversifying selection, while codons of the structural or cross-membrane part of MHC molecules remain generally invariant62. Also, pathogen or cancer-driven selection may selectively target only a fraction of codons of the peptide-binding region, while some other residues may be evolutionarily conserved due to their specific functions in peptide binding, for example, being responsible for binding peptide ends74. In fact, the mean number of PSS was only 3.62 and 3.39 at exon 2 and exon 3, respectively, across the 28 species studied here (see values of PSS and NSS, respectively, in Source Data). In contrast, we have evidence for an average of 5.70 and 8.00 negatively selected codons at exons 2 and 3, respectively. Taking all this into account, there is a much stronger signature of negative than positive selection across the entire exon 2/3 sequence of the mammalian MHC-I, which results in low overall dN/dS ratios. At the same time, our data show strong evidence of positive diversifying selection at some codons, and the mean dN/dS of the 20 most strongly positively selected sites per species is 2.99 across the taxa explored here (see Source Data), providing support for strong diversifying selection acting at a fraction of MHC-I codons.
MHC I gene family size
Protein-coding sequences from wild mammals with known CMR/ICM estimates (n = 127) were obtained from genomes annotated with TOGA75 (Tool to infer Orthologs from Genome Alignments) via the Senckenberg data repository (https://genome.senckenberg.de/download/TOGA/). TOGA is a genome annotation software which uses similarities of intronic and intergenic regions to identify orthology components by comparison to a well-annotated reference genome (here, human, GRCh38/hg38). We used TOGA-annotated genomes because of their enhanced annotation completeness, especially of conserved mammalian genes75, and their annotation consistency, which improves comparability among the genomes. If multiple assemblies were available for a single species, only the one with the highest contig N50 score was retained. We also downloaded the human proteome from the Ensembl Genome Browser76 to identify genes of interest in the target species among the orthogroups. Only the longest transcript variant of each gene per genome was used for downstream analysis. To assign the protein sequences of each species to orthogroups, we used OrthoFinder v2.5.5 (ref. 39) with default settings. The sequence similarity search tool DIAMOND77 was used to perform a comprehensive comparison of the protein sequences. Subsequently, homologous genes were clustered into orthogroups defined as sets of genes originating from a single gene in the common ancestor of the analyzed species39. OrthoFinder assigned the protein sequences into 22,327 orthogroups, including one orthogroup (OG0000011) containing the human HLA class I genes (AZGP1, HLA-A, HLA-B, HLA-C, HLA-E, and HLA-G). We used the latter orthogroup as an estimate of MHC I gene family size and we used this variable as predictor of CMR/ICM in phylogenetic logistic regressions.
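The three bookkeeping steps described above (retaining the assembly with the highest contig N50, keeping the longest transcript per gene, and counting the genes of each species within one orthogroup) can be sketched in Python. All data structures, identifiers and function names below are hypothetical simplifications and do not reproduce the TOGA or OrthoFinder file formats.

```python
from collections import Counter


def best_assembly(assemblies):
    """assemblies: list of (assembly_id, contig_n50); keep the highest N50."""
    return max(assemblies, key=lambda item: item[1])[0]


def longest_transcripts(transcripts):
    """transcripts: iterable of (gene_id, transcript_id, protein_sequence).

    Return {gene_id: (transcript_id, protein_sequence)} keeping, for each
    gene, the transcript with the longest protein sequence.
    """
    best = {}
    for gene, transcript, protein in transcripts:
        if gene not in best or len(protein) > len(best[gene][1]):
            best[gene] = (transcript, protein)
    return best


def family_size(orthogroup_members):
    """orthogroup_members: iterable of (species, gene_id) in one orthogroup.

    Return a Counter mapping each species to its number of distinct genes.
    """
    return Counter(species for species, _ in set(orthogroup_members))


# Toy inputs (hypothetical identifiers and values).
assemblies = [("asm_v1", 1_200_000), ("asm_v2", 8_500_000)]
transcripts = [("geneA", "tx1", "MKV"), ("geneA", "tx2", "MKVLLA"), ("geneB", "tx3", "MA")]
members = [("species_1", "geneA"), ("species_1", "geneB"),
           ("species_2", "geneC"), ("species_1", "geneA")]

chosen = best_assembly(assemblies)
kept = longest_transcripts(transcripts)
sizes = family_size(members)
```

Under this sketch, the per-species count in the orthogroup of interest plays the role of the MHC I gene family size predictor.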
Phylogeny and statistical analyses
To account for the statistical non-independence of species due to common descent, we obtained a sample of 1000 equally plausible phylogenetic trees from the posterior distribution published by Upham et al.78, covering 5911 species. We then obtained a rooted consensus tree using the SumTrees program of the DendroPy Python library79. To explore the effect of phylogenetic uncertainty, all models were run using 100 equally likely phylogenetic hypotheses and the results were checked for consistency across these. All statistical analyses were carried out in the R statistical and programming environment, v.4.3.1 (ref. 80), using the R function gls from the package “nlme” and Brownian motion evolutionary models. Partial coefficients of determination were calculated using the function “R2.pred” from the R package rr2 (ref. 81), based on models presented in Tables S1 and S3. All statistical tests performed are two-tailed. No correction for multiple testing was performed. Data and R code needed to reproduce the analysis are publicly available (see the “Code availability” section below). Animal silhouettes used to visually represent mammalian orders were downloaded from PhyloPic (http://www.phylopic.org).
To test the association between CMR and MHC I, we built PGLS models. We used cancer mortality risk (CMR and ICM) as dependent variables and selection for MHC diversity (overall or at PSS), or MHC I gene family size as explanatory variables. Given that the precision of estimates of CMR is contingent on the underlying sample sizes, we controlled for this biasing effect by including the log number of individuals with RDI per species as model weight in all models of CMR.
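The core of a PGLS fit under Brownian motion can be illustrated with a self-contained Python sketch of the generalised least squares estimator, b = (X' V^-1 X)^-1 X' V^-1 y, where V holds the shared branch lengths between species. This is not the nlme-based R code used in the study: the four-species tree, the data values and the function names are toy assumptions, and the sample-size weights described above are deliberately left out.

```python
def solve(a, b):
    """Solve a @ x = b for a square matrix by Gaussian elimination with pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        partial = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - partial) / m[i][i]
    return x


def pgls(x, y, cov):
    """Return [intercept, slope] of y = b0 + b1 * x under covariance `cov`."""
    n = len(y)
    design = [[1.0] * n, list(x)]               # columns of X
    vinv_cols = [solve(cov, col) for col in design]  # V^-1 X, column by column
    # cov is symmetric, so X' V^-1 is the transpose of V^-1 X.
    xtx = [[sum(vc[k] * dc[k] for k in range(n)) for dc in design] for vc in vinv_cols]
    xty = [sum(vc[k] * y[k] for k in range(n)) for vc in vinv_cols]
    return solve(xtx, xty)


# Toy tree ((A:1,B:1):1,(C:1,D:1):1): under Brownian motion the covariance of
# two species is the branch length they share from the root.
cov = [[2.0, 1.0, 0.0, 0.0],
       [1.0, 2.0, 0.0, 0.0],
       [0.0, 0.0, 2.0, 1.0],
       [0.0, 0.0, 1.0, 2.0]]
x = [1.0, 2.0, 3.0, 4.0]           # hypothetical predictor values
y = [2.0 + 3.0 * xi for xi in x]   # response constructed with b0 = 2, b1 = 3
b0, b1 = pgls(x, y, cov)
```

Because the toy response is an exact linear function of the predictor, the estimator recovers the intercept and slope used to build it, which provides a simple check of the implementation.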
Confounding effects
To test the correlation between life-history traits and MHC diversity, we retrieved information on body mass from ZIMS and from literature resources30 and we estimated species-specific adult life expectancy and 80% adult longevity using the individual husbandry database obtained to estimate CMR/ICM. We calculated adult life expectancy as a simple average of post-sexual-maturity lifespan of dead individuals. For 80% adult longevity we calculated age-specific survival rates using the Kaplan–Meier procedure (survfit function in the R package survival82). Individuals older than their age at sexual maturity at the beginning of data registration were left-truncated at their age at this date; individuals reaching sexual maturity after this date were left-truncated at their age at sexual maturity. Individuals still alive at the time of data extraction were considered right-censored (samples per species varied from 94 to 8460 individuals), while known fate individuals were assigned as dead (n = 34,128), irrespective of whether RDI was available for the individual or not.
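The survival estimation described above can be illustrated with a minimal product-limit (Kaplan–Meier) sketch that handles left-truncation and right-censoring. It is not the survfit-based R code used in the study. The tuple layout and function names are hypothetical, and the reading of 80% adult longevity as the first age at which estimated survival falls to 0.2 or below is an assumption made for this example.

```python
def kaplan_meier(subjects):
    """Product-limit survival curve with delayed entry.

    subjects: list of (entry_age, exit_age, died) tuples with
    entry_age < exit_age. An individual is at risk at age t when
    entry_age < t <= exit_age; died is False for right-censored individuals.
    Returns [(age, survival)] at each distinct age at death.
    """
    death_ages = sorted({exit_age for _, exit_age, died in subjects if died})
    survival, curve = 1.0, []
    for t in death_ages:
        at_risk = sum(1 for entry, exit_age, _ in subjects if entry < t <= exit_age)
        deaths = sum(1 for _, exit_age, died in subjects if died and exit_age == t)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


def age_at_survival(curve, threshold):
    """First age at which survival is at or below `threshold` (None if never)."""
    for age, survival in curve:
        if survival <= threshold:
            return age
    return None


# Toy data (ages in years): entry is the left-truncation age, exit is the age
# at death or at censoring.
subjects = [(0, 2, True), (0, 4, True), (1, 5, True),
            (3, 6, False), (2, 8, True), (0, 10, True)]
curve = kaplan_meier(subjects)
longevity_80 = age_at_survival(curve, 0.2)
```

Individuals contribute to the risk set only after their entry age, which is how left-truncation avoids counting survival time that was never observed.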
We also performed multiple sensitivity analyses to explore the robustness of our results. (1) To test the potentially confounding effects of body mass and life expectancy on the relationship between CMR and selection for MHC I diversity, we introduced these two variables as covariates in PGLS models with CMR as dependent and MHC diversity as explanatory variables. (2) To check the robustness of our results to the phylogenetic hypothesis, we repeated single-predictor phylogenetic logistic regressions using a random sample of 100 trees provided by Upham et al.78. (3) Given that we have previously demonstrated that the consumption of mammalian prey items significantly increases the risk of cancer across mammals, we aimed to test whether this effect might confound our results. To do so, we obtained information on the diet of each species83 and we re-categorized mammal-based diet at two levels: never/rarely consumed or representing the primary/secondary food item of the species (see also ref. 1). Mammal consumption was introduced as a categorical explanatory variable to phylogenetic logistic regressions between cancer mortality and selection for MHC I diversity or MHC I gene family size.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
Aggregated data generated in this study are available in Source Data. Raw data used to estimate cancer risk (Species360 Data Use Approval Number 91663) are available under restricted access (Species360 is the custodian, not the owner of their members’ data), and access can be obtained through Research Request applications (form available at https://docs.google.com/forms/d/1znoy62VEkDlhAp_0RfEvF7Zsx03g4W5AlppJHqo3_WQ/viewform?edit_requested=true&pli=1). Source data are provided with this paper.
Code availability
Data and R code needed to reproduce the analysis are publicly available at https://github.com/OrsolyaVincze/VinczeEtalNatComm_MHC-cancer.
References
Dersh, D., Hollý, J. & Yewdell, J. W. A few good peptides: MHC class I-based cancer immunosurveillance and immunoevasion. Nat. Rev. Immunol. 21, 116–128 (2021).
Rock, K. L., Reits, E. & Neefjes, J. Present yourself! by MHC class I and MHC class II molecules. Trends Immunol. 37, 724–737 (2016).
Carrington, M. et al. HLA and HIV-1: heterozygote advantage and B*35-Cw*04 disadvantage. Science 283, 1748–1752 (1999).
Piertney, S. B. & Oliver, M. K. The evolutionary ecology of the major histocompatibility complex. Heredity 96, 7–21 (2006).
Robinson, J. et al. The IPD and IMGT/HLA database: allele variant databases. Nucleic Acids Res. 43, D423–D431 (2015).
Sari, G. & Rock, K. L. Tumor immune evasion through loss of MHC class-I antigen presentation. Curr. Opin. Immunol. 83, 102329 (2023).
Dhatchinamoorthy, K., Colbert, J. D. & Rock, K. L. Cancer immune evasion through loss of MHC class I antigen presentation. Front. Immunol. 12, 469 (2021).
Topalian, S. L., Drake, C. G. & Pardoll, D. M. Immune checkpoint blockade: a common denominator approach to cancer therapy. Cancer Cell 27, 450–461 (2015).
Ehrlich, P. Ueber den jetzigen Stand der Karzinomforschung. Ned. Tijdschr. Geneeskd. 5, 273–290 (1909).
Thomas, L. “Discussion”. In Cellular and Humoral Aspects of the Hypersensitive States, (ed. Lawrence, H. S.), 529–532 (Hoeber-Harper, New York, 1959).
Burnet, M. Cancer - a biological approach. Br. Med. J. 1, 841–847 (1957).
Burnet, F. M. The concept of immunological surveillance. Prog. Exp. Tumor Res. 13, 1–27 (1970).
Shankaran, V. et al. IFNγ and lymphocytes prevent primary tumour development and shape tumour immunogenicity. Nature 410, 1107–1111 (2001).
Marty, R. et al. MHC-I genotype restricts the oncogenic mutational landscape. Cell 171, 1272-1283.e15 (2017).
Hughes, A. L. & Nei, M. Pattern of nucleotide substitution at major histocompatibility complex class I loci reveals overdominant selection. Nature 335, 167–170 (1988).
Hughes, A. L. & Nei, M. Maintenance of MHC polymorphism. Nature 355, 402–403 (1992).
Siddle, H. V. et al. Transmission of a fatal clonal tumor by biting occurs due to depleted MHC diversity in a threatened carnivorous marsupial. Proc. Natl. Acad. Sci. USA 104, 16221–16226 (2007).
Doherty, P. C. & Zinkernagel, R. M. Enhanced immunological surveillance in mice heterozygous at the H-2 gene complex. Nature 256, 50–52 (1975).
Chowell, D. et al. Evolutionary divergence of HLA class I genotype impacts efficacy of cancer immunotherapy. Nat. Med. 25, 1715–1720 (2019).
Chowell, D. et al. Patient HLA class I genotype influences cancer response to checkpoint blockade immunotherapy. Science 359, 582–587 (2018).
Khong, H. T. & Restifo, N. P. Natural selection of tumor variants in the generation of “tumor escape” phenotypes. Nat. Immunol. 3, 999–1005 (2002).
Morris, K. M., Wright, B., Grueber, C. E., Hogg, C. & Belov, K. Lack of genetic diversity across diverse immune genes in an endangered mammal, the Tasmanian devil (Sarcophilus harrisii). Mol. Ecol. 24, 3860–3872 (2015).
Siddle, H. V., Marzec, J., Cheng, Y., Jones, M. & Belov, K. MHC gene copy number variation in Tasmanian devils: implications for the spread of a contagious cancer. Proc. R. Soc. B: Biol. Sci. 277, 2001–2006 (2010).
Caldwell, A. et al. The newly-arisen devil facial tumour disease 2 (DFT2) reveals a mechanism for the emergence of a contagious cancer. eLife 7, e35314 (2018).
Keskin, D. B. et al. Neoantigen vaccine generates intratumoral T cell responses in phase Ib glioblastoma trial. Nature 565, 234–239 (2018).
Rojas, L. A. et al. Personalized RNA neoantigen vaccines stimulate T cells in pancreatic cancer. Nature 618, 144–150 (2023).
Kelley, J., Walter, L. & Trowsdale, J. Comparative genomics of major histocompatibility complexes. Immunogenetics 56, 683–695 (2005).
Kaufman, J. Unfinished business: evolution of the MHC and the adaptive immune system of jawed vertebrates. Annu. Rev. Immunol. 36, 383–409 (2018).
Seluanov, A., Gladyshev, V. N., Vijg, J. & Gorbunova, V. Mechanisms of cancer resistance in long-lived mammals. Nat. Rev. Cancer 18, 433–441 (2018).
Vincze, O. et al. Cancer risk across mammals. Nature 601, 264–267 (2022).
Boddy, A. M. et al. Lifetime cancer prevalence and life history traits in mammals. Evol. Med. Public Health 2020, 187–195 (2020).
Abegglen, L. M. et al. Potential mechanisms for cancer resistance in elephants and comparative cellular response to DNA damage in humans. JAMA 314, 1850 (2015).
Peto, R. et al. Cancer and ageing in mice and men. Br. J. Cancer 32, 411–426 (1975).
Cagan, A. et al. Somatic mutation rates scale with lifespan across mammals. Nature 604, 517–524 (2022).
Bieuville, M., Tissot, T., Robert, A., Henry, P.-Y. & Pavard, S. Modeling of senescent cell dynamics predicts a late-life decrease in cancer incidence. Evol. Appl. 16, 609–624 (2023).
Kryazhimskiy, S. & Plotkin, J. B. The population genetics of dN/dS. PLoS Genet. 4, e1000304 (2008).
Bjorkman, P. J. et al. The foreign antigen binding site and T cell recognition regions of class I histocompatibility antigens. Nature 329, 512–518 (1987).
Saper, M. A., Bjorkman, P. J. & Wiley, D. C. Refined structure of the human histocompatibility antigen HLA-A2 at 2.6 Å resolution. J. Mol. Biol. 219, 277–319 (1991).
Emms, D. M. & Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 20, 238 (2019).
Bentkowski, P. & Radwan, J. Evolution of major histocompatibility complex gene copy number. PLoS Comput. Biol. 15, e1007015 (2019).
Migalska, M., Sebastian, A., Konczal, M., Kotlík, P. & Radwan, J. De novo transcriptome assembly facilitates characterisation of fast-evolving gene families, MHC class I in the bank vole (Myodes glareolus). Heredity 118, 348–357 (2017).
Hilton, H. G. et al. Single-cell transcriptomics of the naked mole-rat reveals unexpected features of mammalian immunity. PLoS Biol. 17, e3000528 (2019).
Lin, T. & Buffenstein, R. The unusual immune system of the naked mole-rat. In The Extraordinary Biology of the Naked Mole-Rat (eds. Buffenstein, R., Park, T. J. & Holmes, M. M.) 315–327 (Springer International Publishing, Cham, 2021).
Minias, P., Pikus, E., Whittingham, L. A. & Dunn, P. O. Evolution of copy number at the MHC varies across the avian tree of life. Genome Biol. Evol. 11, 17–28 (2019).
Compton, Z. T. et al. Cancer prevalence across vertebrates. Cancer Discov. OF1–OF18. https://doi.org/10.1158/2159-8290.CD-24-0573 (2024).
Ronget, V. & Gaillard, J. M. Assessing ageing patterns for comparative analyses of mortality curves: going beyond the use of maximum longevity. Funct. Ecol. 34, 65–75 (2020).
Wang, Q.-L. et al. Association of HLA diversity with the risk of 25 cancers in the UK biobank. eBioMedicine 92, 104588 (2023).
Ribatti, D. The concept of immune surveillance against tumors: the first theories. Oncotarget 8, 7175–7180 (2017).
Migalska, M., Sebastian, A. & Radwan, J. Major histocompatibility complex class I diversity limits the repertoire of T cell receptors. Proc. Natl. Acad. Sci. USA 116, 5021–5026 (2019).
Klein, L., Kyewski, B., Allen, P. M. & Hogquist, K. A. Positive and negative selection of the T cell repertoire: what thymocytes see and don’t see. Nat. Rev. Immunol. 14, 377–391 (2014).
Chen, Y., Williams, V., Filippova, M., Filippov, V. & Duerksen-Hughes, P. Viral carcinogenesis: factors inducing DNA damage and virus integration. Cancers 6, 2155–2186 (2014).
Sun, D. et al. Novel genomic insights into body size evolution in cetaceans and a resolution of Peto’s paradox. Am. Nat. 199, E28–E42 (2022).
Miller, M. R., White, A. & Boots, M. Host life span and the evolution of resistance characteristics. Evolution 61, 2–14 (2007).
Prugnolle, F. et al. Pathogen-driven selection and worldwide HLA class I diversity. Curr. Biol. 15, 1022–1027 (2005).
Radwan, J., Babik, W., Kaufman, J., Lenz, T. L. & Winternitz, J. Advances in the evolutionary understanding of MHC polymorphism. Trends Genet. 36, 298–311 (2020).
Takasugi, M. et al. Naked mole-rat very-high-molecular-mass hyaluronan exhibits superior cytoprotective properties. Nat. Commun. 11, 1–10 (2020).
Tian, X. et al. High-molecular-mass hyaluronan mediates the cancer resistance of the naked mole rat. Nature 499, 346–349 (2013).
Gorbunova, V., Seluanov, A., Zhang, Z., Gladyshev, V. N. & Vijg, J. Comparative genetics of longevity and cancer: insights from long-lived rodents. Nat. Rev. Genet. 15, 531–540 (2014).
Tollis, M., Schneider-Utaka, A. K. & Maley, C. C. The evolution of human cancer gene duplications across mammals. Mol. Biol. Evol. https://doi.org/10.1093/molbev/msaa125 (2020).
Corthay, A. Does the immune system naturally protect against cancer? Front. Immunol. 5, 197 (2014).
Conde, D. A. et al. Data gaps and opportunities for comparative and conservation biology. Proc. Natl. Acad. Sci. USA 116, 9658–9664 (2019).
Kaufman, J., Salomonsen, J. & Flajnik, M. Evolutionary conservation of MHC class I and class II molecules—different yet the same. Semin. Immunol. 6, 411–424 (1994).
Almeida, T. et al. A highly complex, MHC-linked, 350 million-year-old shark nonclassical class I lineage. J. Immunol. 207, 824–836 (2021).
Schierup, M. H. & Hein, J. Consequences of recombination on traditional phylogenetic analysis. Genetics 156, 879–891 (2000).
Weaver, S. et al. Datamonkey 2.0: a modern web application for characterizing selective and other evolutionary processes. Mol. Biol. Evol. 35, 773–777 (2018).
Martin, D. P., Murrell, B., Golden, M., Khoosal, A. & Muhire, B. RDP4: Detection and analysis of recombination patterns in virus genomes. Virus Evol. 1, vev003 (2015).
Salazar, G. FastaUtils: Utilities for DNA/RNA Sequence Processing (R package v. 0.0.0.9000, 2021).
Murrell, B. et al. FUBAR: a fast, unconstrained bayesian approximation for inferring selection. Mol. Biol. Evol. 30, 1196–1205 (2013).
Bernatchez, L. & Landry, C. MHC studies in nonmodel vertebrates: what have we learned about natural selection in 15 years? J. Evol. Biol. 16, 363–377 (2003).
Westerdahl, H., Wittzell, H. & von Schantz, T. Polymorphism and transcription of MHC class I genes in a passerine bird, the great reed warbler. Immunogenetics 49, 158–170 (1999).
Babik, W., Pabijan, M. & Radwan, J. Contrasting patterns of variation in MHC loci in the alpine newt. Mol. Ecol. 17, 2339–2355 (2008).
Nigenda-Morales, S., Flores-Ramírez, S., Urbán-R, J. & Vázquez-Juárez, R. MHC DQB-1 polymorphism in the Gulf of California fin whale (Balaenoptera physalus) population. J. Hered. 99, 14–21 (2008).
Cortázar-Chinarro, M., Meyer-Lucht, Y., Laurila, A. & Höglund, J. Signatures of historical selection on MHC reveal different selection patterns in the moor frog (Rana arvalis). Immunogenetics 70, 477–484 (2018).
Kaufman, J. et al. Different features of the MHC class I heterodimer have evolved at different rates. Chicken B-F and beta 2-microglobulin sequences reveal invariant surface residues. J. Immunol. 148, 1532–1546 (1992).
Kirilenko, B. M. et al. Integrating gene annotation with orthology inference at scale. Science 380, eabn3107 (2023).
Harrison, P. W. et al. Ensembl 2024. Nucleic Acids Res. 52, D891–D899 (2024).
Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12, 59–60 (2015).
Upham, N. S., Esselstyn, J. A. & Jetz, W. Inferring the mammal tree: species-level sets of phylogenies for questions in ecology, evolution, and conservation. PLoS Biol. 17, e3000494 (2019).
Sukumaran, J. & Holder, M. T. DendroPy: a Python library for phylogenetic computing. Bioinformatics 26, 1569–1571 (2010).
R Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, 2020).
Ives, A. R. R²s for correlated data: phylogenetic models, LMMs, and GLMMs. Syst. Biol. 68, 234–251 (2019).
Therneau, T. A package for survival analysis in R. R package version 3.8-3 https://CRAN.R-project.org/package=survival (2024).
Kissling, W. D. et al. Establishing macroecological trait datasets: digitalization, extrapolation, and validation of diet preferences in terrestrial mammals worldwide. Ecol. Evol. 4, 2913–2930 (2014).
Acknowledgements
We are grateful to more than 1200 zoos and aquariums, members of Species360, that record data in ZIMS, making this study possible. This research was supported by the National Scientific Research Fund OTKA K143421 (O.V.); the Romanian Ministry of Research, Innovation and Digitization (CNCS-UEFISCDI, project number PN-III-P1-1.1-TE-2021-0502) (O.V.); the Gordon and Betty Moore Foundation, grant GBMF9021 (T.P.); the Agence Nationale de la Recherche, grant ANR-23-CE02-0019 (COVER) (O.V., M.G.); and the Région Nouvelle-Aquitaine (Chaire d’excellence “Cancer et Biodiversité”) (M.G.); the South-Eastern Norway Regional Health Authority, grant 2022099 (A.C.), The Research Council of Norway through its Centres of Excellence scheme, project number 262613 (A.C.), the CNRS (IRP CANECEV) (F.T.), the HOFFMANN Family (F.T.), and by the Agence Nationale de la Recherche, grant ANR-23-CE13-0007 (EVOSEXCAN) (F.T.); the Estonian Research Council, grant PSG653 (T.S.).
Author information
Contributions
O.V. conceptualized the study; O.V., P.M., and L.M. curated data; O.V. performed statistical analyses and led the writing of the manuscript; all authors (O.V., P.M., A.C., F.C., J.F.L., L.M., T.M., J.H., D.A.C., S.P., A.M.D., B.U., F.T., A.M.B., C.C.M., D.C., T.S., T.P., M.G.) contributed to the interpretation of the results and writing of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Vincze, O., Minias, P., Corthay, A. et al. Immunological surveillance against cancer across mammals. Nat Commun 16, 10333 (2025). https://doi.org/10.1038/s41467-025-65286-x
DOI: https://doi.org/10.1038/s41467-025-65286-x






