Introduction

Physical exercise is a common prescription for promoting good health, as it is a primary preventive measure for a wide range of diseases1. Nonetheless, the specific parameters related to the optimal intensity and quantity of physical activity are yet to be fully characterized2. For instance, high-performance exercise may have distinct or even deleterious effects on the organism, such as in terms of cardiac function3, and an extensive exercise load significantly increases the risk of injury, including through inadequate recovery from substantial exertion or overtraining4,5,6,7. In professional sports such as soccer, where teams experience dozens of muscle injuries each season8, the incidence and burden of injuries have not decreased over the past 20 years despite preventive measures9. This issue poses significant challenges both for sports performance and financial outcomes10,11. Thus, biomarkers able to predict the biological impacts of intense exercise might be useful for the effective design of health-promoting interventions and could also help guide the implementation of injury prevention measures in elite sports organizations.

In this context, the biological age of an individual, understood as a general measure of physiological functioning taking into account chronological age12, can be used as a parameter for monitoring overall health status. An increased biological age would involve a deteriorated cellular, molecular and organismal state, which seems to be associated with a wide range of health issues. In this scenario, epigenetic clocks have recently been developed as tools to estimate biological age. These clocks rely on DNA methylation (DNAm) data to build mathematical models that are able to predict an “epigenetic” age, which is indicative of the overall health status13. Moreover, deviations between epigenetic and chronological age, known as “age acceleration or deceleration”, have been linked to diverse environmental, biological and social factors, such as physical activity14. In fact, it is well known that exercise can directly impact the DNA methylome in a dose- and tissue-dependent manner15,16. Although exercise has been related to biological age deceleration17, the type and intensity of the physical activity may play an important role in the final DNAm outcome15,18. For instance, epigenetic age measured at a limited number of CpG sites has already been shown to be accelerated in elite athletes19. Therefore, the question remains how prolonged high-intensity exercise specifically affects the genome-wide DNAm landscape and epigenetic age, and what implications these associations may have for health, including potential links between DNA methylation patterns and injury susceptibility. Understanding this could aid in monitoring both high-level performance outcomes and injury-related issues.

In this study we have analyzed the whole-genome DNAm patterns and epigenetic age parameters of a cohort of elite male and female soccer players using peripheral blood samples. We detected a signature that differentiates players with higher injury risk from those with lower risk. Additionally, although epigenetic age seemed to be unrelated to injuries, we could detect extensive differences in age-accelerated versus age-decelerated players, which were related to muscle and extracellular matrix pathways.

Methods

Participants

Peripheral blood samples from female (n = 37, median age: 23.2, range: 14.6–35.6) and male (n = 37, median age: 24.0, range: 15.9–32.6) top-performance soccer players were gathered over three seasons (2019–20, 2020–21 and 2021–22). All athletes belonged to Futbol Club Barcelona teams playing in the first divisions of the Spanish national leagues (women in the Primera División Femenina de España, or Liga F, and men in LaLiga EA Sports). Samples were processed to obtain peripheral blood mononuclear cells (PBMCs), cryopreserved and stored at the Biobank of the Hospital Clínic de Barcelona-Institut d’Investigacions Biomèdiques August Pi i Sunyer (HCB-IDIBAPS). To ensure consistency, samples were collected in pre-season (July) for all players, with the exception of 8 samples (3 male and 5 female) collected in September due to scheduling and availability constraints. The BMI of the players was within normal ranges across seasons (average = 22.2, SD = 1.8), and participants did not report alcohol or tobacco consumption.

Muscle, tendon and ligament injuries recording

This study followed the consensus guidelines on definitions and data collection procedures for football injury studies described by UEFA20. Only injuries that occurred during training or a match and caused absence from at least the following training session or match were included; these were classified as time-loss (TL) injuries. We recorded the history of muscle, tendon and ligament injuries of all players over their seasons at our club using a validated electronic medical record system (COR version 1.0; FCB, Spain). Injury diagnoses were made by the medical physicians of each team (FC Barcelona team doctors) during the evaluation period, following the criteria included in the FC Barcelona 2018 Muscle Injury Guide21. Injuries were classified using the OSICS-10 coding system (Orchard Sports Injury Classification System)22: muscle injuries were indicated by the code OSICS -M–, tendon injuries by the code OSICS -T– and ligament injuries by the code OSICS -J–. Injury severity (return to play) was based on a UEFA proposal20 and was determined as the number of days from injury occurrence until the player again played at least 60 min per match, ranging from mild (1–7 days) to moderate (8–28 days) and severe (> 28 days). In addition, specific injury variables were summarized into an “Injury score”, calculated as Injuries per season × Days out per season. A second score, “Injury score plus relapse”, was computed as Injury score × Relapses per injury. These composite scores provide a single metric that captures both the frequency and severity of injuries, which may allow a more comprehensive assessment of overall injury burden and facilitate comparisons between players.
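The severity bands and composite scores described above can be illustrated with a minimal sketch. The function and argument names are hypothetical and chosen only for this example; they are not taken from the medical record system used in the study.

```python
def severity(days_out: int) -> str:
    """Map days until return to play onto the severity bands quoted in the text."""
    if days_out < 1:
        raise ValueError("a time-loss injury implies at least 1 day of absence")
    if days_out <= 7:
        return "mild"        # 1-7 days
    if days_out <= 28:
        return "moderate"    # 8-28 days
    return "severe"          # > 28 days


def injury_score(injuries_per_season: float, days_out_per_season: float) -> float:
    """Injury score = Injuries per season x Days out per season."""
    return injuries_per_season * days_out_per_season


def injury_score_plus_relapse(
    injuries_per_season: float,
    days_out_per_season: float,
    relapses_per_injury: float,
) -> float:
    """Injury score plus relapse = Injury score x Relapses per injury."""
    return injury_score(injuries_per_season, days_out_per_season) * relapses_per_injury


# Illustrative values only, not data from the cohort.
print(severity(12))                               # moderate
print(injury_score(2, 15))                        # 30
print(injury_score_plus_relapse(2, 15, 0.5))      # 15.0
```

Read literally, the second formula yields 0 for a player without relapses; the text does not state whether any offset was applied, so none is assumed here.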

Ethics statement

The study was conducted according to the guidelines of the Declaration of Helsinki. Institutional board approval for the study was obtained from the local research ethics committee on Science and Ethics of the Barça Innovation Hub (Football Club Barcelona n˚ 2021FCB29), and from the Ethics Commission of the Consell Català de l’Esport (Code 012/CEICEGC/2021, Generalitat de Catalunya, Barcelona, Spain). All participants were informed of the risks and benefits of the study and gave written consent to participate. All personal information and results were anonymized.

DNA methylation arrays

From each PBMC sample stored at the Biobank of the HCB-IDIBAPS, DNA was extracted using a DNA extraction kit (Qiagen) following the manufacturer’s instructions. After Qubit quantification and DNA quality control, DNA methylation was measured using Infinium MethylationEPIC v2 BeadChip arrays (Illumina). The arrays were outsourced to the Human Genotyping Platform (CeGen) at CNIO (Madrid, Spain), where all the steps of sample preparation, bisulfite modification and array runs were performed.

DNA methylation data processing

Raw data in the form of IDAT files were processed using a pipeline based on the R/Bioconductor package SeSAMe (v1.20.0)23. Briefly, the SeSAMe preprocessing code “QCDPB” was used, which comprises quality and SNP masking (Q), channel inference (C), dye bias correction (D), detection p-value masking (P) and noob background correction (B)24. Probes with NA masking in one or more samples were filtered out. EPICv2 replicated probes were averaged across replicates. Self-reported sex was validated using the DNA methylation data with the estimateSex function from the R/Bioconductor package wateRmelon (v2.8.0)25. The R/Bioconductor package IlluminaHumanMethylationEPICv2anno.20a1.hg38 (v1.0.0) was used for array annotation. Additionally, the following probes were filtered out: “ch” probes (non-CpG), “rs” probes (SNP), “cl/ct” probes (control), “nv” probes (EPICv2 nucleotide variants), sex-chromosome probes, “Flagged probes” and “Probes with mapping inaccuracies” as described in the Infinium MethylationEPIC v2.0 release notes, EPICv2 off-target probes described by Peters et al.26, other multimapping and cross-reactive probes previously described27,28, probes with SNPs with MAF ≥ 0.01 at their CpG or SBE sites (dbSNP151), and probes not present in the IlluminaHumanMethylationEPICv2anno.20a1.hg38 annotation. Finally, experiment-specific conflicting probes were identified and removed using the gaphunter method (threshold = 0.2, outCutoff = 0.03) from the R/Bioconductor package minfi (v1.48.0)29. After all the filtering steps, the final number of probes in the dataset was 774,288.
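Two of the filtering steps above (removal of probes masked in any sample, followed by averaging of EPICv2 replicate probes) can be sketched as follows. This is not the SeSAMe pipeline itself. It assumes, for illustration only, that replicate probes share the identifier before the first underscore (for example `cg0001_TC11` and `cg0001_TC12`) and that masked values are represented as `None`; the function names are hypothetical.

```python
from collections import defaultdict


def drop_masked(betas: dict) -> dict:
    """Keep only probes with no masked (None) value in any sample."""
    return {
        probe: values
        for probe, values in betas.items()
        if all(v is not None for v in values)
    }


def average_replicates(betas: dict) -> dict:
    """Average beta values of probes sharing the same base identifier.

    Assumption: the base identifier is the text before the first underscore.
    """
    groups = defaultdict(list)
    for probe, values in betas.items():
        base_id = probe.split("_", 1)[0]
        groups[base_id].append(values)
    # Average sample by sample across the replicates of each base identifier.
    return {
        base_id: [sum(column) / len(column) for column in zip(*rows)]
        for base_id, rows in groups.items()
    }


# Toy matrix: probe -> beta values for two samples (illustrative values only).
toy = {
    "cg0001_TC11": [0.25, 0.5],
    "cg0001_TC12": [0.75, 0.5],
    "cg0002_BC21": [0.1, None],   # masked in one sample, so it is removed
    "cg0003_BC11": [0.9, 0.8],
}
clean = average_replicates(drop_masked(toy))
print(clean)
```

The order of the two steps follows the order in which they are listed in the text.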

DNA methylation age computation

Epigenetic age was estimated for several DNA methylation clocks: 1) the R/Bioconductor package methylclock (v1.8.0)30 was used to compute the “Horvath” universal clock31, the “Hannum” blood clock32, the “PhenoAge” clock33, the “DNAmTL” telomere clock34 and the “Zhang EN” and “Zhang BLUP” clocks35; 2) the DNA Methylation Age Calculator tool (https://dnamage.clockfoundation.org/, accessed May 2024) was used to compute the “GrimAge”36 and “GrimAge2”37 clocks. Epigenetic age acceleration (EAA) was defined as the residuals of adjusting the epigenetic age values for chronological age in a linear model.
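The EAA definition above amounts to taking the residuals of an ordinary least-squares fit of epigenetic age on chronological age. A minimal sketch, assuming a simple regression with chronological age as the only predictor and using a hypothetical function name:

```python
def epigenetic_age_acceleration(chron_age, epi_age):
    """Return residuals of the least-squares fit epi_age ~ chron_age.

    Positive residuals indicate a higher-than-expected epigenetic age.
    """
    n = len(chron_age)
    if n != len(epi_age) or n < 2:
        raise ValueError("need two equally long vectors with at least 2 samples")
    mean_x = sum(chron_age) / n
    mean_y = sum(epi_age) / n
    sxx = sum((x - mean_x) ** 2 for x in chron_age)
    if sxx == 0:
        raise ValueError("chronological age must vary across samples")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(chron_age, epi_age))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(chron_age, epi_age)]


# Illustrative values only: fitted line is epi = 1 + 1 * chron.
eaa = epigenetic_age_acceleration([20, 25, 30], [22, 24, 32])
print(eaa)  # approximately [1.0, -2.0, 1.0]
```

By construction the residuals sum to zero, so EAA is a relative measure within the cohort rather than an absolute difference between epigenetic and chronological age.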

Differential DNA methylation analyses

Prior to statistical analyses, β-values were logit-transformed to M-values38. The R/Bioconductor package limma (v3.58.1)39 was used for the differential analyses. Linear models were built fitting M-values as dependent variables and different groups as independent variables. Models were adjusted for sex. Differentially methylated CpGs (DMCs) were defined as those with p < 0.05 (FDR-adjusted or unadjusted, as specified for each analysis) and a β-value difference > 5% between the compared groups. A 5% β-value threshold has been previously used for differential studies involving PBMCs40 and is above the minimal threshold previously recommended for epigenome-wide association studies41. To explore the robustness of the findings, statistical power analyses were carried out using the R package pwr (v1.3-0), based on two-sample t-tests with unequal group sizes. In the injury-cluster differential analysis (nominal significance and > 5% β-value difference), 87.5% of CpGs showed statistical power exceeding 80%. In the clock epigenetic age acceleration analysis (FDR < 0.05 and > 5% β-value difference), between 59 and 67% of sites had power above 80% across the different clock comparisons.
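The logit transformation and the DMC thresholds described above can be sketched as follows. The moderated statistics computed by limma are not reproduced here: p-values are taken as given inputs. The sketch assumes that FDR refers to the Benjamini-Hochberg adjustment and clips β-values away from 0 and 1 with a small `eps`; both are assumptions of this example, as are the function names.

```python
import math


def beta_to_m(beta: float, eps: float = 1e-6) -> float:
    """Logit-transform a beta value: M = log2(beta / (1 - beta))."""
    b = min(max(beta, eps), 1 - eps)  # avoid division by zero at 0 or 1
    return math.log2(b / (1 - b))


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_dmcs(pvals, mean_beta_a, mean_beta_b, alpha=0.05, min_delta=0.05, use_fdr=True):
    """Indices of CpGs passing the significance and 5% beta-difference thresholds."""
    scores = benjamini_hochberg(pvals) if use_fdr else list(pvals)
    return [
        i
        for i, (s, a, b) in enumerate(zip(scores, mean_beta_a, mean_beta_b))
        if s < alpha and abs(a - b) > min_delta
    ]


# Illustrative values only.
p = [0.01, 0.04, 0.03, 0.5]
group_a = [0.30, 0.50, 0.40, 0.20]
group_b = [0.40, 0.42, 0.41, 0.50]
print(call_dmcs(p, group_a, group_b, use_fdr=True))   # [0]
print(call_dmcs(p, group_a, group_b, use_fdr=False))  # [0, 1]
```

The toy example shows how the nominal threshold retains more sites than the FDR threshold for the same effect-size filter.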

Pathway enrichment analyses

The R/Bioconductor package missMethyl (v1.36.0) was used for pathway enrichment analyses42. A selection of MSigDB databases was used, including Gene Ontology and Canonical Pathways gene sets43,44. The gsameth function was employed to estimate enrichments taking into account differences in the number of probes mapping to each gene. Because the EPICv1 annotation was used via the R/Bioconductor package IlluminaHumanMethylationEPICanno.ilm10b4.hg19 (v0.6.0), input CpGs were filtered for those present in the EPICv1 arrays.

Results

Differential DNA methylation analysis in soccer players with high versus low injury risk

We analyzed the methylation status of 774,288 CpG sites in the peripheral blood of 74 male and female elite soccer players with extensive characterization of injury parameters (Supplementary Table 1, Fig. 1A, Methods). We initially explored possible associations between injury parameters and DNAm profiles. As a first step, we aimed at identifying athletes with higher propensity for injury frequency and severity. Although injury susceptibility is important in elite sports, there is no clear consensus criterion for defining “injury-prone” players. Thus, we decided to employ an agnostic clustering method to avoid relying on arbitrary thresholds across different injury parameters. To do so, we used the normalized injury variables and scores (Supplementary Table 1) to perform hierarchical clustering. This analysis revealed two clear groups of athletes with different patterns of non-contact injuries (Fig. 1B). Most athletes belonged to cluster 1 (N = 63, 32 females, 31 males), which is associated with reduced injury frequency and duration. In contrast, cluster 2 (N = 11, 5 females, 6 males) identified players with significantly higher scores across various injury parameters (Wilcoxon adj. p < 0.01) and therefore represents a group of individuals with higher frequency and duration of injuries (Fig. 1C). Next, we performed a differential methylation analysis (DMA) to directly compare mean DNAm levels between the two groups across all of the CpG sites measured by the array. Although we did not detect strong differences between the two clusters, using a conservative approach we were able to identify a signature of 1,081 differentially methylated CpGs (DMCs) in cluster 2 as compared to cluster 1 (p < 0.05, methylation difference > 5%, 424 hypermethylated and 627 hypomethylated, Supplementary Table 2) (Fig. 1D). These CpG sites mapped to a total of 272 genes.
Although these genes did not seem to be significantly enriched in any particular biological pathway, we detected some interesting candidates. These include genes such as MYOM2 and VAMP545,46 related to skeletal muscle development and function, and RYR1, NPPA or CACNB247,48,49 involved in muscle contraction (Fig. 1E). In some of these genes, such as VAMP5 or NPPA, there was more than one differential CpG between the C1 and C2 clusters (Fig. 1E). This finding provides further assurance on the significance of the results and suggests that differential methylation associated with injuries may not only target specific CpGs but also larger genomic regions spanning various CpGs.
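The clustering step used to define the injury groups can be illustrated with a small sketch. The text does not specify the normalization, distance or linkage used, so this example assumes z-score scaling, Euclidean distance and average linkage, with the tree cut at two groups; these choices and all names are assumptions made only for illustration.

```python
import math


def zscore_columns(rows):
    """Scale each variable (column) to mean 0 and sample SD 1."""
    if len(rows) < 2:
        raise ValueError("need at least 2 subjects")
    scaled_columns = []
    for column in zip(*rows):
        mean = sum(column) / len(column)
        sd = math.sqrt(sum((v - mean) ** 2 for v in column) / (len(column) - 1))
        scaled_columns.append([(v - mean) / sd if sd > 0 else 0.0 for v in column])
    return [list(row) for row in zip(*scaled_columns)]


def two_clusters(rows):
    """Agglomerative clustering (average linkage, Euclidean) cut at 2 groups."""
    clusters = [[i] for i in range(len(rows))]

    def average_linkage(a, b):
        total = sum(math.dist(rows[i], rows[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 2:
        # Find and merge the closest pair of clusters.
        _, i, j = min(
            (average_linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Toy data: [injuries per season, days out per season] for five players.
toy = [[0, 2], [1, 5], [0, 3], [4, 60], [5, 75]]
print(sorted(two_clusters(zscore_columns(toy))))  # [[0, 1, 2], [3, 4]]
```

In the toy data the two players with many injuries and long absences separate from the other three, mirroring the injury-low versus injury-high split described in the text.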

Fig. 1

Genome-wide DNA methylation profiling of soccer players and injury profiling of the cohort. (A) Schematic of the study design. (B) Dendrogram showing the clustering of subjects into injury-low (cluster 1) and injury-high (cluster 2) groups. (C) Heatmap indicating the scaled value for the different injury variables used for the clustering. On the left, the adjusted p-values for Wilcoxon tests comparing cluster 1 to cluster 2 are shown. (D) Heatmap indicating scaled methylation values of 1,081 CpGs differential between C1 and C2 groups. (E) Boxplots of selected differential CpGs within genes associated with muscle development and function.

Epigenetic age acceleration is more prevalent in male players and is associated with a differential DNA methylation signature

In addition to epigenetic signatures, we also exploited DNAm information to estimate epigenetic age with a selection of 8 widely-used epigenetic clocks, including chronological age clocks (Horvath, Hannum, Zhang BLUP and Zhang EN), phenotypic clocks (PhenoAge, GrimAge, GrimAge2), and a telomere length estimator (DNAmTL) (see Methods section for further details). In general, the epigenetic age estimations were highly correlated with chronological age, while the DNAmTL clock showed an expected inverse correlation (Fig. 2A). All of the estimations were highly associated with chronological age (all F-test adj. p < 0.001, Fig. 2B, Supplementary Figure S1), but there was remaining unexplained variability (R^2 coefficients ranging from 0.38 to 0.88), suggesting that individuals showed deviations from the predicted epigenetic ages. Thus, we computed epigenetic age acceleration (EAA) as the residuals of a linear model fitting epigenetic age against chronological age. The calculated EAA showed variable levels of correlation between the clocks, with the GrimAge and GrimAge2 estimators being the most similar (Supplementary Figure S2). Then, we speculated that athletes in cluster 2 may have a higher EAA than those with less injury risk. However, the EAA across epigenetic clocks was comparable between the two clusters (Fig. 2C, all Wilcoxon adj. p > 0.05).

Fig. 2

Epigenetic clocks in male and female soccer players. (A) Heatmap displaying the correlation levels (Pearson coefficients) between chronological age and epigenetic age estimations. (B) Scatter plots showing the relationship between epigenetic age and chronological age for the Horvath and DNAmTL clocks. (C) Box plots describing the values of EAA for the Horvath, Hannum, PhenoAge, Zhang2019 BLUP, Zhang2019 EN, GrimAge, GrimAge2 and DNAmTL clocks, segregated by cluster 1 and cluster 2. (D) Box plots indicating the values of EAA for the Horvath and DNAmTL clocks, segregated by female and male athletes.

Although the level and intensity of injuries did not seem to be associated with EAA, we observed heterogeneity in the levels of EAA. Remarkably, part of this heterogeneity was caused by significant EAA differences between male and female athletes across most of the clocks, with male soccer players consistently showing higher EAA than female players (Wilcoxon test, adj. p ranging from 0.043 to < 0.001, Fig. 2D, Supplementary Figure S3). These findings are consistent with observations made in other non-athlete populations, whereby males are systematically older regarding biological age estimations50,51,52,53.

We further explored whether differences in EAA may reflect particular biological features of the athletes. Therefore, we decided to group athletes based on their EAA in order to compare subjects with positive EAA to those with negative EAA levels, i.e., athletes displaying higher-than-expected versus lower-than-expected epigenetic ages, respectively. We performed a DMA and detected widespread DNAm differences between the groups, adjusting for sex, across most epigenetic clocks (Fig. 3A, Supplementary Table 3, FDR < 0.05 and > 5% methylation difference), namely Hannum (N = 17,715 CpGs), PhenoAge (N = 20,103 CpGs), Zhang EN (N = 38,393 CpGs), GrimAge (N = 705 CpGs), GrimAge2 (N = 16,486 CpGs) and DNAmTL (N = 24,870 CpGs). Interestingly, we observed a strong bias towards hypomethylated CpGs in the positive EAA group (i.e. higher epigenetic age than chronological age) (Fig. 3A). The analysis of the epigenetic clock estimating telomere length (DNAmTL) predominantly showed hypermethylation in individuals with higher estimated telomere length, which is the expected reverse pattern. These results indicate that elite athletes with age acceleration (or lower telomere length) display general loss of DNAm in peripheral blood. To expand on the biological characterization of the alterations found, we mapped the DMCs to genes and performed pathway enrichment analyses. We detected significant (FDR < 0.05) enrichments against Gene Ontology pathways across most of the clocks, with several pathways being shared between various clocks (Fig. 3B, Supplementary Table 4). Remarkably, many of the shared gene sets were associated with cellular adhesion and cytoskeletal components (Fig. 3C). To explore this further, we also performed enrichments against the MSigDB Canonical Pathways database, which contains pathways describing extracellular matrix (ECM) functions. Notably, we observed strong enrichments for ECM-associated pathways across various clocks (Fig. 3D,E, Supplementary Table 4).
These gene sets contain multiple collagen, protease and integrin genes, suggesting that these gene families are particularly enriched in EAA-associated DNAm alterations. Indeed, numerous collagen genes displayed several DMCs across various clocks, such as COL23A1 (Fig. 3F), which particularly reflected a global loss of DNAm in epigenetically aged athletes.

Fig. 3

DNA methylation alterations and biological associations in epigenetically-accelerated elite soccer players. (A) Bar plots showing the number of DMCs (FDR < 0.05, methylation difference > 5%) between positive-EAA and negative-EAA athletes. (B) Upset plot indicating the numbers and intersections of significant (FDR < 0.05) Gene Ontology pathways found enriched for the DMCs. (C) Bubble plot displaying the Gene Ontology pathways found enriched in comparisons involving 3 or more clocks. (D) Upset plot showing the numbers and intersections of significant (FDR < 0.05) Canonical Pathways gene sets found enriched for the DMCs. (E) Bubble plot indicating the Canonical Pathways gene sets found enriched in comparisons involving 2 or more clocks. (F) Genome profile plots showing the methylation difference levels between positive-EAA and negative-EAA athletes defined by the PhenoAge (left) and Zhang2019 EN (right) clocks for all of the CpG sites associated with the COL23A1 gene. Statistically significant changes (FDR < 0.05, methylation difference > 5%) are marked in red.

Finally, we investigated whether additional covariates within our study cohort could confound or influence the analytical results, including sample collection date, playing position, and inferred regional ancestry (see Supplementary Table 1). We found no significant association between these variables and the previously defined injury clusters (all chi-squared adj. p > 0.05). Similarly, none of the specific injury-related variables differed significantly across any category of these groups (all sex-adjusted linear model adj. p val > 0.05).

In the case of the EAA analyses, collection date was significantly associated with EAA for the PhenoAge and Zhang BLUP clocks (sex-adjusted linear model adj. p val < 0.05). Consequently, we repeated all EAA analyses while adjusting for collection date, obtaining highly similar results in terms of the number and direction of changes (Supplementary Figure S4A-B) as well as in the enrichment of pathways related to cell adhesion, cytoskeletal organization, and extracellular matrix functions (Supplementary Figure S4C-F).

Discussion

Physical activity has been linked to good health and even to reduced biological aging14,17, but there is scarce data regarding the impact of sustained high-intensity, elite-level sports practice on the epigenome and the biological age, with some studies even suggesting potential detrimental effects19. In the present study, we have characterized the genome-wide DNAm levels of the peripheral blood of 74 elite soccer players with a tracked record of injury parameters. We aimed to identify DNA methylation alterations associated with increased injury risk and to characterize epigenetic age acceleration within this cohort. Overall, our findings revealed a distinct DNA methylation signature that differentiated players with elevated injury risk. Although epigenetic age acceleration was not linked to injury susceptibility, it exhibited pronounced sex-specific differences and was associated with widespread hypomethylation in individuals with accelerated epigenetic aging.

Although the DNA methylome of players with higher and lower frequency of injuries was overall similar, a conservative approach revealed a consistent signature of 1,081 DMCs. The DMCs targeted some genes involved in muscle activity, which is intriguing considering that we analyzed blood samples. Indeed, the inability to study the DNA methylome of muscle biopsies is one limitation of our study. This does not preclude the importance of DNA methylation changes triggered in situ by injuries, as has been shown in previous studies in mouse models54,55. Even though the cause and potential biological impact of the differential methylation in peripheral blood samples remain uncertain, the DNA methylation levels of individual genes or the overall signature could potentially serve as the basis for developing epigenetic biomarkers that predict injury predisposition. However, additional studies in larger series of soccer players, and eventually of elite players in other sports, should be undertaken to assess the reproducibility of our findings. In particular, the limited sample size in our study impeded the training and validation of predictive models of injury risk, which, if feasible, would provide valuable clinical insights.

Although our initial hypothesis was that players more susceptible to injuries may show an increased epigenetic age acceleration, which may account for their higher incidence of injury and their capacity to recover, we could not find any significant association. Instead, we observed that male athletes displayed higher levels of EAA than females, a finding consistent with results from other studies not related to exercise50,51,52,53. These differences have sometimes been linked to lifestyle factors, such as smoking prevalence in men56; however, our cohort of elite soccer players maintains overall healthy and controlled lifestyle habits regardless of sex. Therefore, it is tempting to speculate that sex-related differences in the professionalization of soccer players may in part account for the exacerbated EAA in male players57. From an earlier age, men undergo a higher training and competition load than women, which may lead to sex-related molecular differences, as previously described at the metabolic level58. In our cohort, however, there were no significant differences between male and female players regarding the years spent at the club (see Supplementary Table 1). Nonetheless, we acknowledge that our study does not capture lifetime training exposure, which limits further exploration of potential associations with epigenetic age. More broadly, the lack of exposure records represents a major limitation of our study. Exposure to football activity is a critical determinant of injury risk, as even players with high susceptibility will not sustain injuries without sufficient exposure.

To gain insights regarding possible epigenetic alterations in the blood of age-accelerated players, we performed differential analyses and observed widespread DNAm changes, remarkably biased towards DNAm loss. Interestingly, DNAm loss is also known to occur with post-adulthood aging in blood59, and therefore our observations suggest that biologically-aged athletes may display epigenetic alterations similar to those acquired with physiological aging. Remarkably, when we examined the genes affected by the EAA-associated DNAm alterations, these were enriched in extracellular matrix, cytoskeletal and collagen-related functions. It is interesting that we were able to detect muscle-related pathways in a different tissue, although the systemic nature of blood makes it a communication nexus for the entire organism. Indeed, it has been reported in many contexts that physical exercise induces DNAm alterations in peripheral blood60. However, the extent to which these changes have functional significance or are merely secondary effects of the stimulation remains to be explored, particularly because the relationship between DNAm and gene expression regulation is complex and context-dependent61,62. Nevertheless, we can speculate on several scenarios that are not mutually exclusive. First, some DNAm alterations may reflect genetic polymorphisms (meQTLs) in muscle-related genes that manifest across multiple tissues. Second, circulating signals from muscle, such as extracellular vesicles or cell-free DNA, could shape immune cell epigenetics. Finally, because immune cells contribute to muscle repair, inflammation, and matrix remodeling, shared epigenetic changes may naturally arise in both blood and muscle. Further studies, not only in peripheral blood but most importantly in muscle samples, will be needed to fully explore the possible functional impact of our results.

In this study, we identified DNAm alterations associated with both injury risk and epigenetic age acceleration. A mediation analysis could in principle clarify whether DNAm mediates a relationship between these variables. In our dataset, however, EAA was not associated with injury risk, and the overlap between injury-DMCs and EAA-DMCs was minimal. This prevents meaningful mediation analysis, particularly in light of limited sample size. In fact, our observations suggest that EAA and injury risk may reflect distinct biological processes in elite athletes, shaping DNAm through independent mechanisms.

Taken together, our study reveals a link between injuries and DNA methylation patterns, which may serve to predict injury susceptibility in elite soccer players. In this context, we envision that an integrative approach including epigenetics, germline genetics, metabolomics and external load parameters measured by Global Positioning System (GPS)63 may serve as the basis to develop accurate omics-based predictors of injury vulnerability. Furthermore, we observed that a subset of players, enriched in men, shows epigenetic age acceleration associated with global hypomethylation. Whether this phenomenon persists once the professional career of the players has ended, and whether it is associated with the negative effects of age acceleration observed in the general population, remains to be determined in future studies.