Introduction

Duchenne muscular dystrophy (DMD) is the most common form of muscular dystrophy1, caused by variants in the DMD gene that result in an absence of functional dystrophin2. Without dystrophin, skeletal muscles are repeatedly damaged and replaced by fibrofatty tissue over time3. Symptoms typically begin in early childhood with delayed motor milestones and include progressive muscle weakness. Untreated, most individuals lose the ability to walk by the age of 10 years, followed by loss of the ability to use the upper extremities4,5.

The current standard of care for DMD includes chronic corticosteroid (CS) treatment, such as prednisone or deflazacort. CS use has been shown to improve strength and delay loss of ambulation, and when combined with non-invasive ventilation, CS can increase life expectancy to 30–40 years of age4,5. CS dosing strategies vary between physicians and centers. Higher CS doses and daily administration slow motor decline to a greater extent than lower doses and intermittent dosing schedules6,7; however, higher cumulative doses increase the risk of adverse effects such as metabolic complications, bone fragility, and delayed puberty7,8.

Despite the use of CS, progress in understanding the biology of the disease, and large investment in pre-clinical research, there is currently no cure for DMD. Multiple investigational drugs have failed to show significant improvement in participants’ performance in clinical trials, and only a few disease-modifying drugs have received regulatory approval9,10,11,12. While drug trial failures may be related to limited drug potency, it has also become clear that inter-individual variation in disease trajectories and the limited sensitivity of outcome measures that depend on participant motivation complicate the detection of significant and clinically meaningful treatment effects in clinical trials. Additionally, participants are typically enrolled in clinical trials before the age of 8 years, during the early stages of disease progression, when minimal decline is expected within the duration of an interventional study. Therefore, the ability to objectively predict long-term clinical outcomes based on biological evidence would greatly facilitate the design and conduct of investigational drug trials.

Numerous serum biomarkers have been proposed for DMD13,14,15. However, the focus has mostly been on identifying cross-sectional differences between unaffected controls and individuals with DMD. Previous studies have examined the effects of CS use on biomarkers15, muscle damage biomarkers16,17,18, and biomarkers for monitoring response to (micro-)dystrophin restoration therapies19,20. However, serum biomarkers with prognostic or predictive value are currently lacking. Initial attempts to determine longitudinal trajectories of serum biomarkers related to disease progression18,21,22 have been limited by small sample sizes, incomplete clinical and follow-up data, and the limited number of proteins assessed. Therefore, discovery studies in large, well-characterized cohorts are needed to identify biomarkers that correlate with clinical performance and predict clinically meaningful disease milestones.

In this retrospective multicohort study, we aimed to identify serum proteins associated with clinical function and predictive of meaningful disease milestones in individuals with DMD, while accounting for the effects of age and corticosteroid treatment. Using the 7K SomaScan® assay, we analysed 702 longitudinal serum samples obtained from 153 individuals with DMD. We present a collection of serum proteins that are associated with, and predictive of, clinical function in DMD.

Results

Serum samples from 74 males with DMD from the Leiden University Medical Center (LUMC) cohort and 79 males with DMD from the University of Florida (UF) cohort were included in the study. In total, 693 of 702 samples (98.7%) passed quality control standards and were included in the analyses (see Methods). Table 1 gives details of participant and sample characteristics for both cohorts. LUMC participants were significantly younger than UF participants at the first sample visit (mean [SD] 8.4 [3.4] vs 10.9 [3.2] years, p < 0.001). An average of 4.3 serum samples per participant were analysed, with longer follow-up in the LUMC cohort (5.7 [3.6] vs 3.4 [2.5] years, p < 0.001). An intermittent CS regimen (10 days on/10 days off) was most common in the LUMC cohort, whereas daily dosing was most common in the UF cohort. Age at initiation of CS treatment was comparable across sites. Only a few individuals remained CS naïve for the entire study period (LUMC 4.1% vs UF 3.2%).

Table 1 Cohort characteristics

Serum protein associations with age and corticosteroid use

Given the progressive nature of DMD, we first identified proteins associated with age. Overall, 4796 probes (4436 proteins) were associated with age in the LUMC cohort and 2668 probes (2498 proteins) in the UF cohort (FDR < 0.05; Fig. 1a, b). A total of 2317 probes (2186 proteins) were shared between cohorts, of which 2251 probes showed concordant directional change (Fig. 1c, d). Multiple muscle proteins were negatively associated with age, including creatine kinase (CK-MM), myomesin 3 (MYOM3), titin (TTN), and troponin I2 fast skeletal type (TNNI2) (Fig. 1e–h). Other notable negative associations with age included proteins involved in fibrosis and the extracellular matrix, bone morphogenetic proteins, and proteins in protein synthesis pathways.
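The FDR thresholds above are applied across thousands of probes in each cohort. This section does not name the multiple-testing procedure, so the sketch below assumes Benjamini–Hochberg adjustment of per-probe p-values (for example, from the age term of each probe's mixed model) and then counts probes that are significant in both cohorts and whether their coefficients share a sign. All function names, probe IDs, and values are hypothetical illustrations, not the authors' pipeline.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order (assumed FDR method)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_probes(pvalues_by_probe, alpha=0.05):
    """Set of probe IDs whose adjusted p-value (FDR) is below alpha."""
    probes = list(pvalues_by_probe)
    fdr = benjamini_hochberg([pvalues_by_probe[p] for p in probes])
    return {p for p, q in zip(probes, fdr) if q < alpha}


def shared_and_concordant(sig_a, sig_b, coef_a, coef_b):
    """Probes significant in both cohorts, and the subset whose coefficients share a sign."""
    shared = sig_a & sig_b
    concordant = {p for p in shared if (coef_a[p] > 0) == (coef_b[p] > 0)}
    return shared, concordant


# Hypothetical per-probe p-values and age coefficients for two cohorts
sig_lumc = significant_probes({"probe_1": 0.001, "probe_2": 0.2, "probe_3": 0.01})
sig_uf = significant_probes({"probe_1": 0.004, "probe_2": 0.03, "probe_3": 0.002})
shared, concordant = shared_and_concordant(
    sig_lumc, sig_uf,
    {"probe_1": -0.3, "probe_2": 0.1, "probe_3": 0.2},
    {"probe_1": -0.1, "probe_2": 0.4, "probe_3": -0.2},
)
print(sorted(shared), sorted(concordant))  # ['probe_1', 'probe_3'] ['probe_1']
```

In this toy case probe_3 is significant in both cohorts but changes in opposite directions, so it counts as shared but not concordant, mirroring the distinction between the 2317 shared and 2251 concordant probes.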

Fig. 1: Associations of serum proteins with age in individuals with DMD.

a Volcano plots showing the strength and significance of the association with age in the LUMC cohort, and b in the UF cohort. For each probe (point), the −log10 FDR (y-axes) and the age coefficient (x-axes) from a linear mixed-effects model (LMEM) of probe expression on age are shown. The horizontal line represents the significance threshold, FDR < 0.05. c Venn diagram showing the overlap of proteins associated with age in the LUMC (orange) and UF (blue) cohorts. d Scatterplot of the age coefficients shown in (a, b) for the LUMC (x-axis) versus UF (y-axis) cohorts. Dark green dots show proteins significantly associated with age in both cohorts. e–h Trajectory plots of selected probes decreasing with age (CK-MM, MYOM3, TTN, and TNNI2), colored by site. i–l Trajectory plots of selected probes increasing with age (CNTN3, CNDP1, LEP, and TIMP4), colored by site. (RFU relative fluorescence units, DMD Duchenne muscular dystrophy, LUMC Leiden University Medical Center, UF University of Florida, CK-MM muscle creatine kinase, MYOM3 myomesin 3, TTN titin, TNNI2 troponin I2 fast skeletal type, CNTN3 contactin 3, CNDP1 carnosine dipeptidase 1, LEP leptin, TIMP4 tissue inhibitor of metalloproteinase 4). Source data are provided as a Source Data file.

Proteins positively associated with age included central nervous system proteins such as contactin 3 (CNTN3) and carnosine dipeptidase 1 (CNDP1), adipose tissue proteins such as leptin (LEP) and tissue inhibitor of metalloproteinase 4 (TIMP4), and various cytokines (Fig. 1i–l). Other noteworthy findings included opposing age associations for two insulin-like growth factor 1 (IGF1) probes, as well as divergent trajectories among insulin-like growth factor binding proteins, with IGFBP5 and IGFBP6 increasing and IGFBP1 and IGFBP2 decreasing with age. A full list of all significant concordant proteins and their coefficients, and whether they were previously reported15,21,23 or newly discovered, can be found in Supplementary Data 1. Pathway analysis of age-associated proteins highlighted sustained inflammation and pro-fibrotic processes related to disease progression in DMD (Supplementary Data 2).

Given that the majority of participants had chronic exposure to CS, we next identified proteins associated with CS exposure. Because a large number of proteins were associated with age, we included age as a covariate to isolate the effects of CS treatment. In the LUMC cohort, 846 probes (790 proteins) were significantly associated with CS use, compared with 396 probes (364 proteins) in the UF cohort (FDR < 0.05; Fig. 2a, b). Of these, 244 probes (227 proteins) were shared across sites (Fig. 2c, d). Among the significant probes, we observed previously reported proteins such as CD23 and matrix metalloproteinase-3 (MMP3)8, as well as newly identified proteins such as immunoglobulin lambda-like polypeptide 1 (IGLL1) and repulsive guidance molecule A (RGMA) (Fig. 2e–h).

Fig. 2: Associations of serum proteins with corticosteroid use in individuals with DMD.

Volcano plots showing the strength and significance of the associations between proteins and corticosteroid use in a the LUMC cohort and b the UF cohort. For each probe (point), the −log10 FDR (y-axes) and the CS-use coefficient (x-axes) from a linear mixed-effects model (LMEM) of probe expression on CS use and age are shown. A few probes are highlighted by arrows. The horizontal line represents the significance threshold, FDR < 0.05. c Venn diagram showing the overlap of proteins associated with corticosteroid use in the LUMC (orange) and UF (blue) cohorts. d Scatterplot of the CS-use coefficients shown in (a, b) for the LUMC (x-axis) and UF (y-axis) cohorts. Box plots of probe expression in CS-treated versus untreated patients (LUMC 405 samples, UF 79 samples) for two proteins previously reported in association with CS use (e, f) and two proteins identified in this study (g, h). For each box, the center line represents the median, and the lower and upper edges represent the 25th (Q1) and 75th (Q3) percentiles, respectively. Whiskers extend to the most extreme probe expression values within 1.5 × the interquartile range from Q1 and Q3. i Scatterplot showing the relationship between age coefficients (x-axis) and CS coefficients (y-axis) for probes significantly associated with CS treatment. Proteins with discordant age and CS coefficients were considered efficacy biomarkers, while those with concordant coefficients were considered safety biomarkers (shaded gray). Orange dots represent estimates for the LUMC cohort, and blue dots represent estimates for the UF cohort. (CS corticosteroids, DMD Duchenne muscular dystrophy, LUMC Leiden University Medical Center, UF University of Florida, RFU relative fluorescence units, MMP3 matrix metalloproteinase-3, IGLL1 immunoglobulin lambda like polypeptide 1, RGMA repulsive guidance molecule A). Source data are provided as a Source Data file.

To assess whether these associations potentially reflected CS efficacy or safety effects, we compared the direction of each protein's association with age and with CS. Proteins showing opposite trends (for example, declining with age as the disease progresses but increasing with CS treatment) were considered potential CS efficacy biomarkers, whereas those with concordant effects (for example, decreasing with both age and CS treatment) were considered potential safety biomarkers (Fig. 2i). A total of 44 proteins were identified as efficacy biomarkers, including previously identified proteins such as angiopoietin-2 (ANGPT2) and proteins related to lipoprotein transport, as well as newly identified proteins such as ADP-ribosyltransferase 3 (ART3) and RGMA (Fig. 2h; Supplementary Data 3). A total of 172 proteins were identified as potential safety biomarkers, including previously identified MMP3, afamin, IGFBP5, and proteins involved in SMAD signaling, as well as newly identified proteins including IGF binding proteins and metalloproteinases (Supplementary Data 4). Proteins associated with corticosteroid treatment were enriched for pathways related to cholesterol and lipid metabolism, as well as immunomodulation and inflammation (Supplementary Data 2).
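The efficacy/safety labeling above depends only on the signs of each protein's age and CS coefficients. A minimal sketch of that rule follows; the function names, the choice to label a zero coefficient as "undetermined", and the toy coefficients are assumptions for illustration and are not values from the study.

```python
def classify_cs_biomarker(age_coef, cs_coef):
    """Label a CS-associated probe from the signs of its age and CS coefficients.

    Opposite signs: CS moves the protein against its age (progression) trend -> "efficacy".
    Same sign: CS moves the protein in the same direction as progression -> "safety".
    """
    if age_coef == 0 or cs_coef == 0:
        return "undetermined"  # no direction to compare (assumed handling)
    return "efficacy" if (age_coef > 0) != (cs_coef > 0) else "safety"


def tally_labels(coefs_by_probe):
    """Count labels for a dict mapping probe ID -> (age_coef, cs_coef)."""
    counts = {"efficacy": 0, "safety": 0, "undetermined": 0}
    for age_coef, cs_coef in coefs_by_probe.values():
        counts[classify_cs_biomarker(age_coef, cs_coef)] += 1
    return counts


# Hypothetical coefficients: probe_a declines with age but rises with CS (efficacy);
# probe_b declines with both (safety); probe_c rises with both (safety).
toy = {"probe_a": (-0.4, 0.3), "probe_b": (-0.2, -0.5), "probe_c": (0.1, 0.6)}
print(tally_labels(toy))  # {'efficacy': 1, 'safety': 2, 'undetermined': 0}
```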

Serum protein associations with longitudinal motor performance

To identify proteins associated with clinical severity, we next evaluated associations with performance on common functional assessments used to monitor motor function in DMD, including the North Star Ambulatory Assessment (NSAA), 10-m run/walk test (10MRW), 6-min walk test (6MWT), and Performance of the Upper Limb 2.0 (PUL). Individual trajectories on these tests are shown in Fig. 3a–d; age and CS use were modeled as covariates in the analyses. In the LUMC cohort, 2005 probes (1882 proteins) showed a significant association with at least one motor function assessment, compared with 483 probes (454 proteins) in the UF cohort (FDR < 0.05). Of these, 318 probes (294 proteins) were shared across the two sites, with most associations found for lower limb functional tests (NSAA, 10MRW, and 6MWT). This is consistent with the larger number of data points and the greater degree of functional decline on these assessments compared with the PUL. Scatterplots of coefficients from the two cohorts show concordant directionality (Fig. 3e–h).

Fig. 3: Associations of serum proteins with performance on tests of motor function in individuals with DMD.

a–d Longitudinal trajectory plots of NSAA scores, 10MRW velocity, 6MWT distance, and PUL 2.0 scores from both cohorts. Error bands represent 95% confidence intervals. e–h Scatterplots of coefficients indicating the direction of protein associations with motor function test performance (subpanels) in the LUMC (x-axes) versus UF (y-axes) cohorts. Bold data points represent significant associations; the number of significant associations is listed at the top-left of each subpanel. i UpSetR plot of significant probes for each test of motor function. Bar plots at the bottom-left show the number of probes associated with each individual motor function test, colored by site. j Trajectory plots showing the relationship between ART3 levels (y-axes) and NSAA score (left panels) or 6MWT distance (right panels) by age for the LUMC cohort (top panels) and the UF cohort (bottom panels). The color scale (top-right of each panel) represents NSAA scores and 6MWT distances. (CS corticosteroids, DMD Duchenne muscular dystrophy, LUMC Leiden University Medical Center, NSAA North Star Ambulatory Assessment, PUL Performance of the Upper Limb 2.0, UF University of Florida, RFU relative fluorescence units). Source data are provided as a Source Data file.

To assess the generalizability of the models, we applied models trained on each cohort to predict clinical test results in the other cohort. Validation showed the highest accuracy for the NSAA, 10MRW, and 6MWT, with comparable Q² values across both cohorts for the 10MRW. Models trained on the LUMC cohort performed better for the 6MWT, whereas models trained on the UF cohort were more accurate for the NSAA. In the LUMC cohort, 122 probes (116 proteins) were significantly associated with, and accurately predicted, all three scales used to monitor lower limb function (NSAA, 10MRW, 6MWT), compared with 109 probes (105 proteins) in the UF cohort (Fig. 3i). Notably, ART3, RGMA, and dihydrolipoamide dehydrogenase (DLD) showed highly significant associations and accurate predictions across multiple motor function assessments (Fig. 3j). A list of all proteins with significant associations across functional assessments and cohorts, and whether they were previously reported15,21,23 or newly discovered, is provided in Supplementary Data 5. Pathway analysis showed that proteins associated with motor function were related to energy metabolism and skeletal muscle contraction (Supplementary Data 2).
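Q² summarizes how well predictions from a model trained in one cohort match the observed scores in the other. A minimal sketch, assuming the common definition Q² = 1 − PRESS/TSS computed on the external cohort and, for simplicity, a single-probe ordinary least-squares model; the study's models also included age and CS covariates and repeated measures, which this sketch omits. Function names and data are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def q_squared(y_obs, y_pred):
    """Q2 = 1 - PRESS / TSS on held-out (external) observations."""
    mean_obs = sum(y_obs) / len(y_obs)
    press = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))  # prediction error sum of squares
    tss = sum((o - mean_obs) ** 2 for o in y_obs)  # total sum of squares
    return 1.0 - press / tss


def cross_cohort_q2(train_x, train_y, test_x, test_y):
    """Train on one cohort, predict the other, and score with Q2."""
    a, b = fit_line(train_x, train_y)
    preds = [a + b * xi for xi in test_x]
    return q_squared(test_y, preds)


# Hypothetical example: a probe relates linearly to a motor score in both cohorts
train_x, train_y = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]
test_x, test_y = [4.0, 5.0], [9.0, 11.0]
print(cross_cohort_q2(train_x, train_y, test_x, test_y))  # 1.0
```

Under this definition Q² equals 1 for perfect external predictions, 0 when predictions are no better than the external cohort's mean, and is negative when they are worse.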

Prediction of clinical milestones

Events such as loss of the ability to ambulate (LoA), loss of the ability to reach overhead (OHR), and loss of the ability to bring the hand to the mouth (HTM) are clinically relevant disease milestones in DMD. Survival curves for these milestones are shown in Fig. 4a for each cohort. We tested whether the proteins associated with the NSAA, 10MRW, and 6MWT could predict LoA, and whether proteins associated with the PUL could predict loss of OHR and HTM. We identified 41 probes (36 proteins) significantly associated with one or more milestones (Fig. 4b). Three probes, targeting RGMA, ectonucleotide pyrophosphatase/phosphodiesterase family member 5 (ENPP5), and regulator of G protein signaling 21 (RGS21), were associated with two different milestones (FDR < 0.05). RGMA was associated with milestones in both cohorts, whereas the ENPP5 (LUMC) and RGS21 (UF) associations were cohort specific. Associations for delta-like non-canonical Notch ligand 1 (DLK1) and ART3 were confirmed by two independent probes: DLK1 was associated with loss of OHR in the UF cohort and ART3 with LoA in the LUMC cohort (Fig. 4c). The direction of the natural log hazard ratios (lnHR) was consistent across probes and outcomes. Notably, RGMA and anthrax toxin receptor 2 (ANTXR2) had large negative lnHRs, whereas receptor-type tyrosine-protein phosphatase delta (PTPRD) and neural cell adhesion molecule (NCAM) had large positive lnHRs. Kaplan–Meier plots for RGMA across all three milestones are presented for both cohorts (Fig. 4d). Supplementary Fig. 1 shows trajectory plots for RGMA, DLK1, ART3, and ANTXR2, while Supplementary Data 6 lists all proteins significantly predicting lower and upper limb milestones.
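The lnHRs above come from Cox proportional hazards models. As an illustration only, the sketch below estimates the lnHR for a single standardized protein measured at baseline, with no covariates, by Newton–Raphson on the Cox partial likelihood using the Breslow approximation for tied times. The study's actual model specification (covariates, time scale, handling of repeated measurements) is not described in this section, and all names and data here are hypothetical.

```python
import math


def cox_score_info(beta, times, events, x):
    """Score and information of the Cox log partial likelihood for one covariate (Breslow ties)."""
    score, info = 0.0, 0.0
    for i, (t_i, event_i) in enumerate(zip(times, events)):
        if not event_i:
            continue  # censored records contribute only through the risk sets
        # Risk set: everyone still event-free and under observation at t_i
        risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
        w = [math.exp(beta * x[j]) for j in risk]
        s0 = sum(w)
        s1 = sum(wj * x[j] for wj, j in zip(w, risk))
        s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
        mean = s1 / s0  # weighted covariate mean in the risk set
        score += x[i] - mean
        info += s2 / s0 - mean ** 2  # weighted covariate variance in the risk set
    return score, info


def cox_lnhr(times, events, x, max_iter=50, tol=1e-10):
    """Newton-Raphson estimate of lnHR per one-unit increase in x, with its standard error."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = cox_score_info(beta, times, events, x)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    _, info = cox_score_info(beta, times, events, x)
    return beta, 1.0 / math.sqrt(info)


# Hypothetical data: years from first visit to a milestone (event=1) or censoring (event=0),
# and a standardized protein level; lower levels tend to precede earlier events here.
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 1, 1, 0, 0, 0]
protein_z = [-2.0, -1.0, -1.5, 0.5, -0.5, 1.0, 2.0, 0.0]
ln_hr, se = cox_lnhr(times, events, protein_z)
print(f"lnHR = {ln_hr:.2f} (95% CI {ln_hr - 1.96 * se:.2f} to {ln_hr + 1.96 * se:.2f})")
```

A negative lnHR, as reported for RGMA and ANTXR2, means that lower protein levels are associated with a higher hazard of reaching the milestone.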

Fig. 4: Associations of serum proteins with clinical milestones.

LoA: LUMC n = 74, UF n = 78; OHR: LUMC n = 74, UF n = 76; HTM: LUMC n = 74, UF n = 77. a Survival curves showing the probability (y-axis) of not yet having reached LoA, loss of OHR, and loss of HTM (solid, dashed, and dotted lines) by age in each cohort. b UpSetR plot of probes significantly associated with each clinical milestone in the Cox model. Bar plots at the bottom-left show the number of probes associated with each individual milestone, color coded by site. c Forest plot showing the lnHR (x-axes) of protein probes (y-axes) significantly associated (FDR < 0.05) with at least one clinical milestone. Points represent the estimated lnHR; error bars represent 95% confidence intervals. Each panel shows the proteins associated with an individual milestone. d Survival curves for the three clinical milestones (columns) stratified by RGMA expression quartile at the first visit (colors), by years since first visit (x-axes). The first row shows curves for LUMC and the second row curves for UF. (HTM hand-to-mouth, LoA loss of ambulation, OHR overhead reach, UF University of Florida, LUMC Leiden University Medical Center). Source data are provided as a Source Data file.

From the many proteins associated with functional tests and disease milestones, we shortlisted those with the most consistent and strongest associations. Selection was based on significance across multiple clinical scales and milestones within and across cohorts, and on associations detected by more than one aptamer (Table 2). Lower levels of ANTXR2, ART3, euchromatic histone lysine methyltransferase 2 (EHMT2), and RGMA were linked to a higher risk of reaching disease milestones, with risk increasing by 136 to 981% for every one-unit decrease in ln-transformed, standardized expression. In contrast, CS treatment was associated with higher levels of these biomarkers, corresponding to a risk reduction of up to 90%. The greatest risk reduction was observed for loss of HTM; for LoA, risk reductions ranged from 22 to 39%. Additionally, we estimated the yearly increase in risk of reaching these milestones that can be tracked by monitoring these proteins in blood, which ranged from 43 to 215% per year depending on the protein.
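The percentages above follow from exponentiating lnHRs. Assuming each protein is ln-transformed and then z-scored across samples, a one-unit decrease corresponds to one standard deviation lower ln-expression, and the hazard changes by a factor of exp(−lnHR); for a binary factor such as CS use, the factor is exp(lnHR). The sketch below shows these conversions; the helper names are hypothetical, and the lnHRs it prints are only back-calculated from the reported percentage range, not taken from Table 2.

```python
import math
import statistics


def standardize_log(values):
    """ln-transform raw values (e.g. RFU) and z-score them across samples."""
    logs = [math.log(v) for v in values]
    mu, sd = statistics.mean(logs), statistics.stdev(logs)
    return [(v - mu) / sd for v in logs]


def pct_change_per_unit_decrease(ln_hr):
    """Percent change in hazard when the standardized predictor drops by one unit."""
    return (math.exp(-ln_hr) - 1.0) * 100.0


def pct_change_per_unit_increase(ln_hr):
    """Percent change in hazard for a one-unit rise (or a binary factor such as CS use)."""
    return (math.exp(ln_hr) - 1.0) * 100.0


def ln_hr_for_pct_increase_on_decrease(pct):
    """Inverse of pct_change_per_unit_decrease."""
    return -math.log(1.0 + pct / 100.0)


# A 136% to 981% risk increase per unit decrease implies lnHRs of about -0.86 to -2.38
print(round(ln_hr_for_pct_increase_on_decrease(136), 2),
      round(ln_hr_for_pct_increase_on_decrease(981), 2))  # -0.86 -2.38
# A hazard ratio of 0.1 corresponds to a 90% risk reduction
print(round(pct_change_per_unit_increase(math.log(0.1)), 1))  # -90.0
```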

Table 2 Top protein candidates for prediction of clinical milestones

Discussion

Large-scale serum biomarker discovery is now possible in DMD with the availability of high-throughput proteomics platforms such as SomaScan®. In this retrospective study, we had a unique opportunity to combine comprehensive clinical data from two large, independent international cohorts of individuals with DMD with serum levels of 6628 proteins. The availability of extensive longitudinal serum samples and clinical data allowed us not only to analyse protein signatures associated with age and corticosteroid use, but also to identify novel proteins that predict clinical function and motor milestones. Proteins such as RGMA predicted both lower and upper limb clinical milestones, including LoA and loss of HTM. Furthermore, several proteins were associated with an increased risk of a specific clinical milestone, such as ART3 for LoA. Replication across two independent cohorts supports the generalizability of our results.

Previous serum biomarker studies in dystrophinopathies have identified proteins that discriminate between individuals with and without DMD, as well as proteins associated with age. For instance, it is well established that serum creatine kinase (CK) protein levels and activity decrease with age, reflecting the progressive loss of the muscle mass from which CK leaks upon damage. A few studies have reported SomaScan® data in DMD, identifying additional age-related biomarkers such as LEP21, MYOM38, and complement C4-A23. Complementing these findings, we show that the decline of muscle proteins over time is accompanied by an increase in adipogenic markers, including LEP, growth hormone receptor (GHR), and adiponectin (ADIPOQ), along with proteins involved in complement activation and inflammation. These observations suggest that serum protein levels reflect the progressive replacement of muscle by adipose tissue. Future studies should assess whether serum protein levels are directly associated with muscle fat fraction measured by magnetic resonance imaging.

Analysis of CS treatment identified MMP3 and IGLL1 as the proteins most consistently associated with CS treatment in both daily and intermittently dosed patients. Previous research has also identified elevated MMP3 in individuals with DMD treated with CS24, whereas IGLL1 is novel in relation to DMD. However, recent research has shown that weekend CS dosing in adults with limb girdle muscular dystrophy (LGMD) reduces IGLL1 levels in conjunction with increased MMP325. Given the role of IGLL1 in B cells, its reduction with CS treatment may reflect the immunosuppressive effects of the drug. CS treatment had both normalizing and exacerbating effects on disease progression proteins. For instance, it exacerbated the changes in age-related biomarkers, including previously reported MMP3, afamin, and IGFBP58, as well as certain apolipoproteins such as APOA2, APOL1, and APOA5. In contrast, other apolipoproteins such as APOE4, APOC3, and APOE were normalized, suggesting that steroid treatment could on the one hand normalize dyslipidemia (with effects on APOE4, APOC3, and APOE) and on the other hand affect lipid metabolism and potentially cardiovascular health (with exacerbation of APOA2, APOL1, and APOA5 levels). A compensatory role for APOE is also supported by the more severe phenotype observed in mdx/ApoE double knockout mice26,27. Moreover, we found that ANGPT28, previously associated with disease progression, was elevated in individuals treated with CS compared with untreated individuals. Likewise, RGMA, DLK1, ANTXR2, and ART3, which decreased with disease progression in both the UF and LUMC cohorts, were increased with CS treatment.

A major strength of this study was the ability to identify proteins associated with clinical function. RGMA and ART3 were associated with patients’ performance on outcomes assessing both upper (PUL) and lower limb (NSAA, 6MWT, 10MRW) function. RGMA, ANTXR2, EHMT2, DLK1, and ART3 had large negative lnHRs for clinical milestones. To illustrate this finding, we stratified the population by RGMA levels and found that lower RGMA levels corresponded to earlier disease milestones (LoA and OHR).

RGMA belongs to the repulsive guidance molecule family of glycosylphosphatidylinositol (GPI)-anchored proteins and, according to Human Protein Atlas gene expression data, is primarily expressed in the central nervous system and muscle tissue. Initially recognized for its roles in neurogenesis, axonal guidance, and neuronal survival, RGMA has since been implicated in myogenesis28,29. It has been proposed to play a central role in regulating cellular hypertrophy and hyperplasia30. Furthermore, RGMA has been associated with several conditions, including spinal and bulbar muscular atrophy (SBMA)31, Parkinson’s disease, Alzheimer’s disease, multiple sclerosis, and cerebrovascular accidents, as well as with upper limb function measured by elbow flexion in DMD21,28. Importantly, SomaScan cannot discriminate between RGMA and RGMB, so future analyses should refine these associations using orthogonal validation methods.

ANTXR2, also known as capillary morphogenesis protein 2 (CMG2), plays an important role in cell–matrix interactions by binding collagen IV and laminin, suggesting involvement in extracellular matrix adhesion. It is expressed in various tissues, including muscle. Loss-of-function variants in ANTXR2 cause hyaline fibromatosis syndrome, and ANTXR2 knockout mice show collagen VI accumulation in the uterus32; given the importance of collagen VI in the muscle extracellular matrix, this suggests a potential role for ANTXR2 in muscle homeostasis.

ART3 (ADP-ribosyltransferase 3) is mainly expressed in skeletal muscle, according to the GTEx and Human Protein Atlas databases. Gene expression data from both humans and Chinese Meishan pigs showed that ART3 is primarily expressed in muscles rich in fast-twitch fibers33,34, which are more susceptible to damage in DMD. ART3 has been shown to decrease in serum across multiple dystrophies and myopathies25 and in SBMA31. In SBMA, decreased RGMA, myostatin, and ART3 levels correlated with higher thigh muscle fat fraction on MRI, akin to patterns observed in DMD31. Additionally, research in Wannanhua pigs suggests that ART3 is involved in intramuscular fat deposition35, further strengthening the biological rationale for the association with DMD identified in this study. To our knowledge, this is the first study to show a relationship between ART3 levels, CS status, and clinical function in DMD.

Finally, EHMT2 (euchromatic histone lysine methyltransferase 2) and DLK1 (delta-like homolog 1, also known as preadipocyte factor 1, Pref-1) showed notable associations. Both declined with age and were associated with disease milestones. EHMT2 was normalized by CS treatment. EHMT2 has previously been associated with renal fibrosis36, atrial fibrosis37, cardiomyocyte hypertrophy38, and high-fat-diet-induced obesity and hepatic insulin resistance39, all of which align with ongoing pathogenic processes in DMD. The reduction of EHMT2 with age may reflect the diminishing magnitude of these pathological processes as muscle mass is progressively lost. DLK1, a transmembrane protein involved in cell growth during development, is expressed at low levels in adults, primarily in endocrine tissues. It regulates the differentiation of multiple cell types, including adipocytes, and plays an important role in skeletal muscle biology during fetal development and postnatal growth40. Although the role of DLK1 in adult skeletal muscle regeneration is less clear, upregulated expression has previously been observed in DMD and Becker muscular dystrophy41. In contrast, reduced DLK1 expression in fibroadipogenic progenitors corresponded to increased adipogenic commitment42. The decline in DLK1 levels observed here may therefore be associated with increased adipogenic commitment in DMD.

One limitation of this study was the variability in cohort characteristics, due to differences in participant pools and standards of care. The LUMC samples came from participants seen in routine clinical care, who were generally younger and primarily treated with a 10 days on/10 days off CS regimen. No data were available on whether a patient was in the “on” or “off” phase of treatment at the time of sample collection. In contrast, the UF samples came from participants in a natural history research study, who were generally older and primarily on a daily CS regimen. Some individuals were co-enrolled in investigational drug trials and received either placebo or the investigational drug (11.6% of samples in the LUMC cohort and 29.2% of samples in the UF cohort). Daily steroid treatment most likely contributed to the slower motor decline and later occurrence of disease milestones observed in the UF cohort compared with the LUMC cohort, where intermittent treatment was mostly used. The smaller proportion of individuals reaching milestones and the slower motor decline in the UF cohort may explain why fewer probes were associated with clinical outcomes and milestones in that cohort. Because the CS dosing strategies coincided with cohort, we were unable to directly compare the impact of intermittent and daily CS. This comparison could be explored in prospective research, such as the FOR-DMD clinical trial. Additionally, we did not correct for steroid type, as a substantial number of patients switched between CS types during the study period. We also did not correct for CS dose, as site-specific effects, likely driven by the differences in treatment between the two cohorts, had a more substantial impact on the data than dose itself. Lastly, owing to the retrospective nature of this study, a number of data points were missing.
Despite these limitations, the relatively large patient population and number of samples enabled the identification and validation of multiple serum proteins significantly associated with function across both cohorts.

In conclusion, we identified proteins associated with clinically meaningful outcomes in individuals with DMD across two independent cohorts. RGMA, DLK1, ANTXR2, EHMT2, and ART3 emerged as potential prognostic biomarkers based on the strength of their associations with clinical milestones, the consistency of findings across scales, and their biological plausibility in relation to disease processes. A serum biomarker panel that accurately measures these proteins could link short-term changes in their levels to disease stabilization and a reduced mid- to long-term risk of decline.

These biomarkers could have significant potential for both clinical management and clinical trials. In clinical practice, they could act as prognostic tools, helping to predict individual disease trajectories, enabling earlier interventions, and improving patient monitoring. Additionally, they could support more personalized treatment strategies by identifying individuals at risk of reaching disease milestones, allowing for timely adjustments in care. In the context of clinical trials, these proteins could refine participant selection criteria, better stratify participants based on their likelihood of disease progression, and provide an alternative readout to monitor treatment effects. These findings open the door for serum biomarkers to play a critical role in both clinical care and research in DMD.

Methods

Study cohort, design, and outcomes

This was a retrospective, multicenter cohort study including serum samples and clinical data collected from individuals with DMD participating in research protocols at the Leiden University Medical Center (LUMC) and at the University of Florida (UF) between 2009 and 2022. We included 407 serum samples from 74 individuals aged 4–24 years at LUMC and 295 serum samples from 79 individuals aged 5–22 years at UF. Samples were included based on availability. At LUMC, blood samples were collected during annual outpatient clinic visits as part of routine clinical care, although the exact timing of blood collection was not standardized. At UF, blood sample collection was an optional addition for research participants enrolled in the ImagingDMD natural history study (NCT01484678), which included muscle magnetic resonance imaging and functional data collection. Blood samples were typically obtained at the conclusion of these visits. Written informed consent was obtained from all participants or their caregivers as described in protocol B22.013 at LUMC and IRB201500981 and IRB201700056 at UF, which were approved by the respective regulatory boards at both sites.

Clinical data were obtained at the same clinic or research visit as serum sample collection. Data included age at sample collection, CS use at the time of sample collection, and performance on tests of motor function. CS information was categorized by use (treated or untreated), type (deflazacort, prednisone, or other), and regimen (daily or intermittent, defined as 10 days on/10 days off or weekend dosing). Motor function tests included the North Star Ambulatory Assessment (NSAA), 10-m run/walk (10MRW) velocity, 6-min walk test (6MWT), and Performance of the Upper Limb 2.0 (PUL)43,44,45,46. Three disease milestones were recorded: age at loss of ambulation (LoA), age at loss of overhead reach (OHR), and age at loss of hand to mouth (HTM). LoA was defined as patient-reported inability to walk 5 m unaided at home in the LUMC cohort and as inability to walk 10 m unaided within 45 s in the UF cohort. Ages at loss of OHR and HTM were primarily derived from PUL scores and occasionally from patient-reported data. Cohort characteristics were described using means and standard deviations (SDs). Kaplan–Meier analysis was used to describe age at LoA, OHR, and HTM.
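Kaplan–Meier estimation of age at a milestone can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the study's analysis code: the function names `kaplan_meier` and `median_age` and the input layout (one age per individual plus a flag marking whether the milestone was observed or the individual was censored at last follow-up) are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def kaplan_meier(ages: Sequence[float], events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return (age, survival) pairs at each age where the milestone occurred.

    ages   -- age at the milestone, or age at last follow-up if censored
    events -- True if the milestone was observed, False if censored
    """
    if len(ages) != len(events):
        raise ValueError("ages and events must have the same length")
    data = sorted(zip(ages, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        age = data[i][0]
        n_events = n_leaving = 0
        # Pool everyone with this exact age (both events and censorings).
        while i < len(data) and data[i][0] == age:
            n_events += int(data[i][1])
            n_leaving += 1
            i += 1
        if n_events:
            # Product-limit step: multiply by the conditional survival at this age;
            # individuals censored at this age still count as at risk.
            survival *= 1.0 - n_events / n_at_risk
            curve.append((age, survival))
        n_at_risk -= n_leaving
    return curve


def median_age(curve: List[Tuple[float, float]]) -> Optional[float]:
    """First age at which estimated survival falls to 0.5 or below (None if never)."""
    for age, survival in curve:
        if survival <= 0.5:
            return age
    return None
```

For example, `kaplan_meier([9, 10, 10, 11, 12], [True, True, False, True, False])` steps down at ages 9, 10, and 11, and `median_age` of that curve is 11.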

Sample collection and proteomic analysis

Serum samples were collected according to standard phlebotomy procedures, allowed to clot for ~30 min, and centrifuged (2350 × g for 10 min at LUMC; 1000 × g for 15 min at UF). Serum was then frozen and stored long term at −80 °C. Aliquots of 150 µL were shipped simultaneously from both centers to SomaLogic (Boulder, Colorado, USA) and analysed on the SomaScan® proteomic platform as a single batch to avoid batch effects. SomaScan is a high-throughput proteomics platform that uses SOMAmers (Slow Off-rate Modified Aptamers), which are biotinylated DNA molecules designed to bind specifically to target proteins47. The SOMAmers were detected on a microarray, with fluorescence intensity (reported in relative fluorescence units, RFUs) correlating with the quantity of target protein present. This allows thousands of proteins to be measured simultaneously with high sensitivity and precision. The 7K SomaScan® platform used in this study included 7596 aptamers that detect 6628 proteins. SomaLogic's quality control procedures have been published previously48 and include pre-assay quality control, hybridization normalization, intraplate median signal normalization, plate scaling, calibration, and adaptive normalization. In addition, pooled matrix-matched samples were run alongside the clinical samples to quantify the quality of each assay. Nine samples (2 LUMC, 7 UF) did not pass SomaLogic's quality control standards and were excluded from further analysis, results, and data tables.

Identification of protein probes associated with motor function tests

Because the data were longitudinal, linear mixed-effects models (LMEMs) were constructed using the pymer4 package (version 0.7.8), a Python interface to the lme4 R package (version 1.1-31)49,50. For each site and probe, a baseline LMEM modeled the relationship between probe expression values and age, grouping samples by patient. Probe expression values were natural-log-transformed and standardized, and ages were standardized. The fixed-effect coefficients represented the strength and direction of the associations, and their p values indicated significance. Coefficients represent the change in transformed protein levels per unit change in standardized age. Next, to determine the effect of CS use, an LMEM was fitted with CS status (yes/no) as an additional covariate. Coefficients for CS use represent the change in transformed protein levels associated with CS use.
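The preprocessing described above, natural-log transformation followed by standardization, is shown below as a hedged sketch. Two details are assumptions not stated in the text: standardization uses the sample standard deviation, and it is applied separately within each site. The lme4-style formula strings illustrate one plausible model specification (a random intercept per patient); the column names `probe`, `age`, `cs`, and `patient` are hypothetical, and the actual random-effects structure is not specified above.

```python
import math
import statistics

# Hypothetical lme4-style formulas (column names are illustrative only); a random
# intercept per patient is an assumed reading of "grouping samples by patient".
BASELINE_FORMULA = "probe ~ age + (1 | patient)"
CS_FORMULA = "probe ~ age + cs + (1 | patient)"


def standardize(values):
    """Z-score values: subtract the mean and divide by the sample SD (n - 1)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD; using it here is an assumption
    return [(v - mean) / sd for v in values]


def log_standardize(rfu):
    """Natural-log-transform RFU values, then standardize the logged values."""
    if any(x <= 0 for x in rfu):
        raise ValueError("RFU values must be positive to take logarithms")
    return standardize([math.log(x) for x in rfu])
```

pymer4's `Lmer` class accepts lme4-style formula strings of this kind together with a pandas data frame; the sketch stops at preparing the inputs.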

For each site, motor function test, and protein, an LMEM was then used to predict scores from natural-log-transformed, standardized probe expression values, again grouping samples by patient, with standardized age and CS status included as covariates. Coefficients represent the change in clinical score per unit change in transformed protein levels. To assess the generalizability of each model, the fitted LMEM was used to predict scores on the same motor function test at the other site: LUMC models were applied to UF data to generate UF-predicted values, and UF models were applied to LUMC data to generate LUMC-predicted values. Predicted values were compared with the original values using the reconstruction accuracy:

$${{Q}}^{2}=1-\frac{\sum {\left({{\rm{predicted}}}\, {{\rm{values}}}-{{\rm{original}}} \,{{\rm{values}}}\right)}^{2}}{\sum {\left({{\rm{predicted}}}\, {{\rm{values}}}\right)}^{2}}$$
(1)
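Equation (1) translates directly into code. The sketch below implements the formula exactly as printed, with the sum of squared predicted values in the denominator (other definitions of Q² use the observed values, or their deviations from the mean, instead). The cross-site step is represented by a hypothetical `predict` callable fitted at one site and applied to the other site's data; a simple linear function stands in for the fitted LMEM.

```python
from typing import Callable, Sequence


def reconstruction_accuracy(predicted: Sequence[float], original: Sequence[float]) -> float:
    """Q^2 as in equation (1): 1 - sum((pred - orig)^2) / sum(pred^2)."""
    if len(predicted) != len(original):
        raise ValueError("predicted and original must have the same length")
    residual_ss = sum((p - o) ** 2 for p, o in zip(predicted, original))
    predicted_ss = sum(p ** 2 for p in predicted)
    if predicted_ss == 0:
        raise ValueError("all predictions are zero; Q^2 is undefined")
    return 1.0 - residual_ss / predicted_ss


def cross_site_q2(predict: Callable[[float], float],
                  other_site_x: Sequence[float],
                  other_site_scores: Sequence[float]) -> float:
    """Apply a model fitted at one site to the other site's data and score it."""
    predicted = [predict(x) for x in other_site_x]
    return reconstruction_accuracy(predicted, other_site_scores)


# Hypothetical fitted model: score = 20 + 5 * standardized protein level.
q2 = cross_site_q2(lambda x: 20 + 5 * x, [-1.0, 0.0, 1.0], [14.0, 21.0, 25.0])
```

Keeping the scoring function separate from the prediction step mirrors the two-way validation described above: the same `reconstruction_accuracy` call is used whether LUMC models score UF data or vice versa.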

To account for multiple hypothesis testing, Benjamini–Hochberg false discovery rates (FDRs) were computed in all analyses51, and FDR < 0.05 was considered significant. Significant probes whose validation Q² values were within the top 5% of Q² values (i.e., at or above the 95th percentile) were retained to predict clinical milestones.
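The two selection criteria above can be sketched as follows: Benjamini–Hochberg adjustment of the probe p values, then a cutoff at the 95th percentile of validation Q² values. The percentile method (linear interpolation via `statistics.quantiles` with `method="inclusive"`) and computing the percentile over all probes tested for a given motor function test are assumptions; the function `select_probes` and its dictionary inputs are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p values (FDRs), returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value to the smallest so that FDRs stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_probes(p_values: Dict[str, float], q2: Dict[str, float],
                  alpha: float = 0.05) -> List[str]:
    """Keep probes with FDR < alpha and Q^2 at or above the 95th percentile."""
    names = list(p_values)
    fdr = dict(zip(names, benjamini_hochberg([p_values[n] for n in names])))
    # With n=20 there are 19 cut points; the last one is the 95th percentile.
    cutoff = statistics.quantiles(list(q2.values()), n=20, method="inclusive")[-1]
    return [n for n in names if fdr[n] < alpha and q2[n] >= cutoff]
```

The adjustment matches the usual step-up definition (as in R's `p.adjust(..., method = "BH")`), so, for instance, p values of 0.01, 0.04, 0.03, and 0.5 become 0.04, 0.0533, 0.0533, and 0.5.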

Identification of protein probes associated with clinical milestones

For each site and clinical milestone, candidate probes for downstream analyses were selected as follows: for LoA, probes associated with NSAA, 10MRW, and/or 6MWT were considered; for loss of HTM and OHR, probes associated with PUL were considered. Then, for each candidate probe, a Cox proportional hazards model was constructed using the lifelines Python package (version 0.27)52 to predict that milestone, with natural-log-transformed, standardized probe expression values and CS status as covariates. Age was used as the time scale, and samples were grouped by patient.

P values were computed using log-rank tests and were FDR-adjusted to account for multiple hypothesis testing; FDR < 0.05 was considered significant.
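A one-covariate Cox proportional hazards fit can be sketched without lifelines by maximizing the Breslow partial likelihood with Newton–Raphson. This is a deliberate simplification of the model described above: it uses a single covariate and one row per individual, and it ignores CS status and the grouping of samples by patient. The `score_test_p` helper computes the score test of β = 0, which for a binary covariate without tied event times coincides with the log-rank test. All function names are hypothetical.

```python
import math


def _score_information(beta, times, events, x):
    """Score U(beta) and information I(beta) of the Breslow partial likelihood."""
    score = 0.0
    information = 0.0
    for t_i, e_i, x_i in zip(times, events, x):
        if not e_i:
            continue  # censored individuals enter only through the risk sets
        # Risk set: everyone whose time (here, age) is >= this event time.
        risk = [x_j for t_j, x_j in zip(times, x) if t_j >= t_i]
        w = [math.exp(beta * x_j) for x_j in risk]
        s0 = sum(w)
        mean_x = sum(wj * xj for wj, xj in zip(w, risk)) / s0
        second = sum(wj * xj * xj for wj, xj in zip(w, risk)) / s0
        score += x_i - mean_x
        information += second - mean_x ** 2
    return score, information


def fit_cox(times, events, x, max_iter=50, tol=1e-10):
    """Newton-Raphson estimate of beta; the hazard ratio per unit of x is exp(beta)."""
    beta = 0.0
    for _ in range(max_iter):
        score, information = _score_information(beta, times, events, x)
        if information <= 0:
            raise ValueError("no variation in x within the risk sets")
        step = score / information
        beta += step
        if abs(step) < tol:
            break
    return beta, math.exp(beta)


def score_test_p(times, events, x):
    """Score test of beta = 0, referred to a chi-square distribution with 1 df."""
    score, information = _score_information(0.0, times, events, x)
    chi2 = score ** 2 / information
    # Chi-square(1) upper tail: P(X > c) = erfc(sqrt(c / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))
```

In lifelines itself, covariates, grouping, and robust standard errors are handled by its own fitting API; the sketch only makes the partial-likelihood mechanics explicit.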

Changes in risk of achieving milestones

To compute changes in the risk of achieving milestones (Table 2) for a given probe, site, and milestone, the natural logarithm of the hazard ratio, ln H, was extracted from the Cox proportional hazards model described above. The coefficient for CS status, βCS, and the coefficient for age, βage, were obtained from the corresponding LMEM predicting probe expression from CS status and age. Changes in risk were computed as follows:

$$\Delta_{\mathrm{protein}}=\left(e^{-\ln H}-1\right)\cdot 100\%$$
(2)
$$\Delta_{\mathrm{CS}}=\left(1-e^{\beta_{\mathrm{CS}}\ln H}\right)\cdot 100\%$$
(3)
$$\Delta_{\mathrm{age}}=\left(1-e^{\beta_{\mathrm{age}}\ln H}\right)\cdot 100\%$$
(4)
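Equations (2) to (4) can be evaluated directly once ln H and the LMEM coefficients are available. In this sketch all three quantities are expressed on a 100% scale, as in equations (3) and (4); the function name `risk_changes` and the example inputs are hypothetical and do not correspond to any probe in Table 2.

```python
import math


def risk_changes(ln_h: float, beta_cs: float, beta_age: float) -> dict:
    """Percent changes in milestone risk, following equations (2)-(4).

    ln_h     -- log hazard ratio per unit of standardized, log-transformed protein
    beta_cs  -- LMEM coefficient: change in transformed protein with CS use
    beta_age -- LMEM coefficient: change in transformed protein per unit of standardized age
    """
    return {
        "protein": (math.exp(-ln_h) - 1.0) * 100.0,
        "cs": (1.0 - math.exp(beta_cs * ln_h)) * 100.0,
        "age": (1.0 - math.exp(beta_age * ln_h)) * 100.0,
    }
```

For example, a hazard ratio of 0.5 (ln H = ln 0.5) combined with βCS = 1 gives ΔCS = (1 − 0.5)·100% = 50%, and the same hazard ratio gives Δprotein = (2 − 1)·100% = 100%.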

Pathway analysis

Pathway analysis was performed using the enrichR package in R53.
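Over-representation analysis of the kind Enrichr reports rests on a Fisher exact (hypergeometric) test of whether a protein list overlaps a pathway more than expected by chance. The sketch below shows only that generic one-sided test; it is not the enrichR package's code, the function name `overrepresentation_p` is hypothetical, and the background size is an assumption left to the caller.

```python
from math import comb


def overrepresentation_p(overlap: int, list_size: int, set_size: int, background: int) -> float:
    """One-sided hypergeometric (Fisher exact) p value for pathway over-representation.

    overlap    -- proteins in both the input list and the pathway
    list_size  -- proteins in the input list
    set_size   -- proteins annotated to the pathway
    background -- total proteins in the universe being sampled from
    """
    if not (0 <= set_size <= background and 0 <= list_size <= background):
        raise ValueError("list and pathway sizes must lie within the background")
    # P(X >= overlap) where X counts pathway members in a random list of list_size.
    upper = min(list_size, set_size)
    tail = sum(comb(set_size, k) * comb(background - set_size, list_size - k)
               for k in range(overlap, upper + 1))
    return tail / comb(background, list_size)
```

With a background of 10 proteins, a 3-protein pathway, and a 3-protein input list, a complete overlap gives p = 1/120, while requiring no overlap gives p = 1.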

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.