Introduction

Complex chronic illnesses pose diagnosis, treatment and research challenges as affected individuals often harbour multiple comorbidities, a concern reflected in Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS), a disabling and persistent multi-systemic illness1. ME/CFS individuals exhibit a medley of symptoms including profound, unexplained fatigue, flu-like symptoms without the presence of an active viral pathogen, post-exertional malaise, unrefreshing sleep, muscle weakness or pain, vasomotor instability, sensory and cognitive problems1. Symptoms share similarities to frequently reported comorbid conditions, including but not limited to irritable bowel syndrome (IBS), postural orthostatic tachycardia syndrome (POTS), fibromyalgia (FM), allergies, migraine and depression2,3. It is often difficult to recognise whether these comorbidities precede disease onset or develop consequentially from the debilitating nature of the condition3, preventing or prolonging the time to receive a diagnosis.

Disease heterogeneity presents a paradoxical problem for case-control study designs as it is difficult to capture all symptom combinations in a single cohort, and furthermore, individual variation may dilute weak signals that cannot be statistically detected. Current ME/CFS studies employ tailored inclusion/exclusion criteria for both case and control groups to mitigate individual heterogeneity, which often results in limited sample sizes. Moreover, various ME/CFS diagnostic criteria1,2,3,4 are used for participant recruitment producing inconsistent results across studies. The lack of powered studies and result validation has made research into reproducible biomarkers a priority, in particular, the application of metabolomics5.

Observed at the intersection between genes and the environment, metabolomics provides a comprehensive snapshot of the small molecule landscape of an organism, making it an ideal technique to investigate internal biological perturbations caused by external factors such as lifestyle, diet, stress and disease. While ME/CFS metabolomics studies have produced an extensive list of potential plasma and serum biomarkers across various domains, including energy metabolism6,7,8,9,10, amino acid metabolism6,8,11,12,13,14,15, lipid metabolism9,10,13,15,16, urea cycle7,11 and oxidative stress6, a conclusive biomarker panel is yet to be verified.

In this study, we leverage UK Biobank (UKB) resources to investigate ME/CFS pathophysiology by drawing comparisons to comorbid conditions and other well-characterised diseases. Control groups in traditional case-control studies often represent a ‘healthy’ cohort. However, they underestimate the impact of comorbid illnesses in both ME/CFS and non-ME/CFS populations. Therefore, by using a large heterogenous ME/CFS cohort and various homogenous negative and positive control groups based on common comorbidities of ME/CFS, we sought to (1) identify discriminatory and shared blood metabolomic biomarkers for ME/CFS and comorbid conditions, (2) distinguish ME/CFS and individuals with overlapping comorbid conditions using machine learning and (3) characterise the altered biological pathways that underlie the ME/CFS nuclear magnetic resonance (NMR) metabolomics profile.

Methods

Study population

Details of the UKB (https://www.ukbiobank.ac.uk/) study design have been previously described17. Briefly, the UKB recruited over 500,000 participants, aged 39–70 years old to attend one of the 22 assessment centres across the UK between 2006 and 2010 on a volunteer basis. All participants provided written informed consent. The UKB has ethics approval from the Northwest Multi-centre Research Ethics Committee as a Research Tissue Bank, allowing researchers to operate under a unified ethical framework upon successful access application. This study was approved under UKB Project #79568, covering data access and use for secondary research as presented.

Data collection

At the assessment centres, participants completed a baseline assessment, including a touchscreen questionnaire and face-to-face interview, body composition and functional measurements and the collection of non-fasting plasma, urine and saliva17. Participants were invited back on three separate occasions after the initial visit: a repeated assessment visit (2012–13), which collected data similar to the baseline visit; an imaging visit, which initiated the UKB multi-modal imaging enhancement study on the brain, heart, bones and abdomen (2014–); and the first repeat imaging visit (2019–)18. Data are continuously being collected, returned and released in tranches to bona fide researchers, and participants who request to be withdrawn are removed.

Cohort definitions

Disease labels were self-reported in the verbal interview, which occurred at both the initial and repeat assessment visits. The interviewer asked each participant to list past or current serious illnesses or disabilities that a doctor had told them they had, and the response was verified with a trained nurse (Data field: 20002). All analyses were based on data and samples collected at the baseline assessment. The study population included a heterogenous ME/CFS cohort, seven homogenous comorbid cohorts (hypertension, depression, asthma, IBS, hay fever, hypothyroidism and migraine) and a non-diseased or ‘healthy’ cohort (C2) (Supplementary Table 1). A heterogenous cohort was defined as one whose participants presented with multiple, different medical conditions, whereas a homogenous cohort comprised participants reporting a single condition.

Metabolic biomarker profiling

The metabolic biomarkers were quantitated using high-throughput NMR with a protocol previously detailed19. Quality control was measured by using blind duplicates and internal control samples. Absolute concentrations of 168 biomarkers were provided (107 non-derivable biomarkers and 61 composite biomarkers), along with 81 biomarker ratios (inclusive of percentages) (Supplementary Data 1). The non-derivable biomarkers include apolipoproteins, albumin, lipoprotein subclasses, glycoprotein, lipids, fatty acids and low molecular weight metabolites (LMWM) such as amino acids, ketone bodies and glycolysis-related metabolites.

Data pre-processing

All biomarker values and baseline characteristics were processed prior to data analyses. Firstly, technical variation was removed from the biomarker features20, followed by the removal of outliers, defined as values outside median ± 4\(\times\)IQR for biomarkers21 and mean ± 5\(\times\)SD for baseline characteristics22. Ranked categorical variables such as ‘frequency of tiredness/lethargy in the last 2 weeks’ were encoded in an ordinal manner, whereas non-ranked variables such as ‘sex’ were encoded using the dummy (or one-hot encoding) method. Data field 6145 (illness, injury, bereavement, stress in last 2 years) allowed for multiple selections of various events and was binary encoded as Yes/No under a new feature called ‘Previous stressful event(s)’. Missing values were imputed using the median for continuous variables and 0 for categorical variables. Finally, biomarker data and continuous baseline characteristics were scaled by unit variance21. The dataset used to train the machine learning model was processed using the R package caret (v6.0.93)23, following a similar processing workflow, except that column-wise operations were performed after partitioning the data into 80% train and 20% test sets. The test set was processed separately, using the same parameters as the training data to prevent data leakage.
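The leakage-free processing described above can be illustrated with a minimal sketch for a single continuous feature. It is not the caret workflow used in the study: the function names are hypothetical, outliers are simply replaced by the training median (the text does not state how removed outliers were handled downstream), and unit variance scaling is assumed to mean centring on the training mean and dividing by the training SD.

```python
import statistics

def _clean(value, params):
    """Replace missing values and outliers with the training median (an assumption)."""
    if value is None or not (params["lower"] <= value <= params["upper"]):
        return params["median"]
    return value

def fit_params(train_values, k=4.0):
    """Learn outlier bounds, imputation value and scaling parameters from training data only."""
    observed = [v for v in train_values if v is not None]
    median = statistics.median(observed)
    q1, _, q3 = statistics.quantiles(observed, n=4)
    half_width = k * (q3 - q1)  # k x IQR around the median, as for biomarkers
    params = {"median": median, "lower": median - half_width, "upper": median + half_width}
    cleaned = [_clean(v, params) for v in train_values]
    params["mean"] = statistics.fmean(cleaned)
    params["sd"] = statistics.stdev(cleaned)
    return params

def transform(values, params):
    """Apply the training-set parameters to any split (train or test)."""
    return [(_clean(v, params) - params["mean"]) / params["sd"] for v in values]

# Toy data: 20 regular values, one missing value and one extreme value
train = [float(i) for i in range(1, 21)] + [None, 500.0]
test = [3.0, None, 900.0]
params = fit_params(train)
scaled_test = transform(test, params)  # the test set never influences the parameters
```

Because `fit_params` only ever sees the training split, the test split cannot shift the median, the outlier bounds or the scaling constants.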

Statistical analyses

For descriptive statistics, continuous variables were summarised by their medians, and categorical variables were presented as percentages. The Mann-Whitney U test and the Chi-square test of independence with Yates’ continuity correction were performed on unscaled continuous and categorical variables, respectively, to determine significant differences between two groups. Since unequal variance can inflate false positives when applying the Mann-Whitney U test, stringent post-hoc Bonferroni adjustments were applied24. Raw p values are presented throughout the article, with the significance threshold indicated in the main text or legend.

Biomarker associations and multiple testing correction

Logistic regression was used to estimate the odds ratios for biomarker associations with ME/CFS and comorbid cohorts against the C2 cohort. Odds ratios were adjusted for sex, age, cholesterol-lowering medication and fish oil supplements. We applied a Bonferroni threshold of P < 0.05/249 when identifying significant associations in ME/CFS only (accounting for the total number of biomarkers) and a less stringent Bonferroni adjustment of P < 0.05/8 (accounting for the number of medical conditions tested) when detecting overlapping and unique associations in multiple conditions with different sample sizes.
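As a quick arithmetic check of the two thresholds, the following sketch computes them directly. The function name is illustrative only.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests

# 249 biomarkers tested in ME/CFS only
biomarker_wise = bonferroni_threshold(0.05, 249)
# 8 medical conditions tested (ME/CFS plus seven comorbid cohorts)
trait_wise = bonferroni_threshold(0.05, 8)

print(f"{biomarker_wise:.2e}")  # 2.01e-04
print(f"{trait_wise:.2e}")      # 6.25e-03
```

These values match the thresholds quoted in the Results.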

Variance decomposition of baseline characteristics on biomarkers

Linear regression was first performed to identify associations between the baseline characteristics and biomarkers, adjusting for sex, age, cholesterol-lowering medication and fish oil supplements. Secondly, variance decomposition was performed to determine the variance in biomarker measurements explained by each baseline characteristic, both in the ME/CFS cohort only and in the entire study population, using the R package variancePartition (v1.24.1)25.

Sampling strategies

Class imbalance is a problem in machine learning as it can bias prediction to the majority class26. Different oversampling and undersampling strategies that were applicable for both continuous and categorical data were employed to generate nine different training sets (Supplementary Table 2). The first training set considered the original class distribution: 962 participants in the minority class (ME/CFS) and 66,545 participants in the majority class (rest of study population), exhibiting a 1:70 ratio. Four additional training sets were constructed with random undersampling of the majority class at different ratios: 1:1, 1:2, 1:4 and 1:20. A training set combining bootstrapping and random undersampling was also constructed27, with 20,000 participants in each class to match the number of participants in the majority class in the test set. Two algorithmic class imbalance strategies were also employed: cluster-based undersampling using k-prototypes28 and Synthetic Minority Oversampling Technique for Nominal and Continuous (SMOTE-NC)29, implemented in the R package RSBID v0.0.2.0. The final training set used a combination of SMOTE-NC and cluster-based undersampling. The test set remained at the original class distribution.
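Random undersampling of the majority class at a fixed ratio can be sketched as follows. This is an illustration rather than the study's implementation: the function name, the 0/1 label coding and the fixed seed are assumptions, and the labels are synthetic counts taken from the class sizes quoted above.

```python
import random

def undersample(labels, ratio, seed=0):
    """Keep every minority row (label 1) and `ratio` times as many majority rows (label 0).

    Returns the sorted row indices of the resampled training set.
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(majority), ratio * len(minority))
    kept_majority = rng.sample(majority, n_keep)  # sampling without replacement
    return sorted(minority + kept_majority)

# Synthetic labels mirroring the original training distribution (962 vs 66,545)
labels = [1] * 962 + [0] * 66545
for ratio in (1, 2, 4, 20):
    idx = undersample(labels, ratio)
    n_minority = sum(labels[i] for i in idx)
    print(f"1:{ratio} -> {n_minority} ME/CFS, {len(idx) - n_minority} non-ME/CFS")
```

Only the training set is resampled in this way; the test set keeps its original class distribution.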

Feature selection

Four feature sets were curated, comprising (1) all baseline characteristics and biomarkers (319 features); (2) all biomarkers (249 features); (3) baseline characteristics and biomarkers significantly associated with ME/CFS at trait-wise Bonferroni threshold (242 features) and (4) biomarkers significantly associated with ME/CFS at trait-wise Bonferroni threshold (197 features). Two feature selection methods were employed to remove irrelevant or correlated variables32: least absolute shrinkage and selection operator (LASSO) feature selection30 and forward feature selection31, implemented using the R package glmnet (v4.1.4)33 and the scikit-learn library, respectively.
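The idea behind forward feature selection can be shown with a generic greedy sketch. It is not the scikit-learn routine used in the study: `score_fn` is a hypothetical stand-in for a cross-validated metric such as AUC, and the toy gains below are invented purely for illustration.

```python
def forward_select(candidates, score_fn, max_features=None):
    """Greedy forward selection: repeatedly add the feature that most improves the score.

    Stops when no remaining feature improves the score or max_features is reached.
    """
    selected = []
    best_score = float("-inf")
    remaining = list(candidates)
    while remaining and (max_features is None or len(selected) < max_features):
        scored = [(score_fn(selected + [feature]), feature) for feature in remaining]
        top_score, top_feature = max(scored, key=lambda pair: pair[0])
        if top_score <= best_score:
            break  # no candidate improves on the current feature set
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected, best_score

# Toy scoring function: invented additive gains, not results from the study
toy_gain = {"fatigue": 0.10, "pain": 0.06, "age": 0.03, "noise": -0.01}

def toy_score(features):
    return 0.5 + sum(toy_gain[f] for f in features)

selected, score = forward_select(toy_gain.keys(), toy_score)
print(selected)  # ['fatigue', 'pain', 'age']
```

In practice the score would come from fitting a model on each candidate feature set, which is why forward selection tends to yield far fewer features than LASSO at the cost of many more model fits.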

Generating ME/CFS score with machine learning

Initial models were trained using penalised logistic regression with LASSO feature selection33 and 10-fold cross-validation on all feature and training sets. An additional model was trained using the original class distribution training set with class weights, which is another method to address class imbalance by assigning greater importance to the minority class. Subsequently, models meeting the performance criteria of recall >0.7 and AUC >0.8 were retrained using forward feature selection coupled with various machine learning algorithms including adaptive boosting, random forest, extreme gradient boosting, explainable boosted machine and light gradient boosted machine (LightGBM), optimised for both AUC and recall34. The predictive performance of all models was evaluated using an independent test set35. Finally, logistic regression was employed to determine effect sizes for features from models that met the performance criteria, and ME/CFS scores were computed for the entire study population using a weighted sum denoted by Eq. (1). A detailed explanation of the machine learning algorithms can be found in the Supplementary Methods and a summary of the workflow is provided in Supplementary Fig. 1.
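The two performance criteria can be made concrete with a small sketch. The function names are hypothetical, and the 0.5 classification threshold used for recall is an assumption, since the text does not state the decision threshold.

```python
def recall(y_true, y_pred):
    """Proportion of true cases (label 1) that were predicted as cases."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)

def auc(y_true, scores):
    """AUC as the probability that a random case outscores a random non-case (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def meets_criteria(y_true, scores, threshold=0.5):
    """Check recall > 0.7 and AUC > 0.8; the 0.5 threshold is an assumed default."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    return recall(y_true, y_pred) > 0.7 and auc(y_true, scores) > 0.8

y_true = [0, 0, 1, 1]
weak_scores = [0.1, 0.4, 0.35, 0.8]    # one case is ranked below a non-case
strong_scores = [0.1, 0.2, 0.7, 0.9]   # cases are perfectly separated
print(auc(y_true, weak_scores))             # 0.75
print(meets_criteria(y_true, weak_scores))    # False
print(meets_criteria(y_true, strong_scores))  # True
```

Requiring both metrics guards against models that rank well overall (high AUC) but still miss many ME/CFS cases at the chosen threshold (low recall).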

$$\text{ME/CFS score}={\beta }_{1}{X}_{1}+{\beta }_{2}{X}_{2}+\ldots +{\beta }_{n}{X}_{n}\qquad (1)$$

where \({X}_{n}\) is the unit-variance-scaled value of the nth feature and \({\beta }_{n}\) is the effect size of the nth feature.
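Eq. (1) is a plain weighted sum, which the following sketch computes for one participant. The function name and the example numbers are illustrative and are not values from the study.

```python
def me_cfs_score(scaled_features, effect_sizes):
    """Weighted sum of Eq. (1): sum over features of beta_i * X_i.

    scaled_features: unit-variance-scaled feature values X_1..X_n for one participant
    effect_sizes:    logistic regression effect sizes beta_1..beta_n, in the same order
    """
    if len(scaled_features) != len(effect_sizes):
        raise ValueError("features and effect sizes must have the same length")
    return sum(beta * x for beta, x in zip(effect_sizes, scaled_features))

# Illustrative values only
x = [1.0, -2.0, 0.5]
beta = [0.5, 0.25, 2.0]
print(me_cfs_score(x, beta))  # 1.0
```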

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Results

Study design and ME/CFS comorbidities

There were 274,353 participants with baseline NMR biomarker profiles in the UKB. Participants with self-reported ME/CFS comprised 0.4% of the original UKB cohort (2161 in 502,359 participants), which included 1194 ME/CFS participants with NMR metabolomics biomarker data36. The biomarker dataset comprised 249 measurements, including lipoprotein subclasses, lipids, fatty acids, and LMWM (Supplementary Data 1).

We found that 83% of the ME/CFS cohort presented with multiple illnesses, reporting 272 different comorbid conditions (Supplementary Fig. 2 and Supplementary Data 2). Comorbid conditions reported at a frequency >5% in ME/CFS were compared to the C1 cohort, defined as a non-ME/CFS population inclusive of patients with disease (but excluding ME/CFS) and healthy participants (Table 1), to identify comorbid conditions that were more prevalent in ME/CFS. Depression, asthma, IBS, hypothyroidism, hay fever and migraine were significantly increased in ME/CFS (Bonferroni threshold: P < \(4.54\times {10}^{-3}\)) and were chosen as positive control groups. The ‘unclassifiable’ category, while increased, was excluded because of the redundancy of comparing to an unknown cohort. Osteoarthritis was excluded as it is an age-related condition, and the UKB generally represents an older cohort, potentially producing confounding comparative associations. Furthermore, gastro-oesophageal reflux was excluded based on IBS already representing a comorbid condition in the gastrointestinal/abdominal classification. Although hypertension was not significantly different in ME/CFS and non-ME/CFS, it was included as a comorbid cohort because of the cardiovascular nature of this dataset. Ultimately, seven homogenous comorbid cohorts with participants who had only reported that single condition, and a non-diseased control group denoted as C2 were established as respective positive and negative control groups (Supplementary Table 1).

Table 1 Percentage of self-reported medical conditions in ME/CFS and C1 cohorts

Baseline characteristics

Demographics, clinical measurements, anthropometry, lifestyle, and symptoms are summarised in Table 2 for the study population, with a detailed breakdown provided in Supplementary Data 3. Standardised ME/CFS symptom and severity questionnaires including SF-36, and MFI-20 were unavailable for this study. Cognitive function, experience of pain, mental health, and digestive health online follow-ups were performed years after the donation of the biological samples. Therefore, only baseline characteristics collected at the initial assessment centre visit were incorporated into our analyses.

Table 2 Demographics and baseline characteristics of study population

The ME/CFS cohort comprised 74% females, consistent with a 3:1 female-to-male ratio previously reported37. The preponderance of females was also observed in depression, IBS, hypothyroidism, and migraine cohorts (67%, 67%, 87% and 77%, respectively). Overall, the ME/CFS cohort exhibited significantly different physical measurements (Bonferroni threshold accounting for the nine groups tested against ME/CFS: P < \(5.56\times {10}^{-3}\)). ME/CFS had lower hand-grip strength, an indicator of muscle fatigue38, compared to all cohorts except for hypothyroidism. Basal metabolic rate, the energy expenditure at rest39, was lower in ME/CFS than in all cohorts except IBS (which showed no significant difference) and the hypothyroidism and migraine cohorts (which were both lower than ME/CFS). Pulse rate was elevated in ME/CFS.

The UKB touchscreen questionnaire data showed that the ME/CFS cohort had a significantly higher proportion of participants reporting tiredness/lethargy (89%) and overall pain (81%), except for migraine (84%), attributed to 75% of the cohort experiencing headaches (Supplementary Data 3). All reported pain types were higher in ME/CFS, except stomach/abdominal pain, which was higher in IBS (37%) than in ME/CFS (18%). Compared to the depression cohort, ME/CFS patients reported significantly lower frequencies of depressed mood (39% in ME/CFS and 54% in depression) and mood swings (56% in ME/CFS and 78% in depression). There was no significant difference in mood swings between ME/CFS and IBS. Additionally, the distribution of the International Physical Activity Questionnaire (IPAQ) activity groups was different between all the cohorts. Individuals scoring in the low IPAQ group performed less than 600 MET minutes per week; energy expenditure was calculated by activity intensity and duration40.

ME/CFS metabolomic profile

Metabolome-wide association studies were performed to determine individual biomarker effects for each condition. There were 168 biomarkers associated with ME/CFS at P < \(2.01\times {10}^{-4}\), with Bonferroni threshold accounting for the number of total metabolites (Supplementary Data 4). The associations were spread across four biomarker types: 66 non-derived biomarkers with absolute concentrations, 37 composite biomarkers derived from the sum of two or more non-derived biomarkers, 42 relative biomarkers presented as percentages and 3 biologically relevant ratios41. Most of these associations were lipid compositions of triglyceride (TG), cholesteryl esters (CE), free cholesterol (FC), phospholipids (PL), total cholesterol (C), total lipids (L) in different lipoprotein subclasses (VLDL, IDL, LDL, HDL) and sizes (XS-XXL, average particle diameters are provided in Supplementary Data 1), and lipoprotein particle concentrations (P).

Figure 1 shows the associations of the non-derived lipoprotein measurements, lipids and LMWM in ME/CFS. The strongest biomarker association was total triglyceride to phosphoglyceride ratio (TG/PG), where a one standard deviation (SD) increase in the biomarker measurement was associated with 46% higher odds of having ME/CFS compared to the odds of not having ME/CFS (odds ratio (OR): 1.46, 95% confidence interval (CI): 1.38–1.56, P = \(3.95\times {10}^{-33}\)). VLDL size had the largest effect size among non-derived biomarkers (OR: 1.41, 95% CI: 1.32–1.50, P = \(1.26\times {10}^{-24}\)), followed by glycoprotein acetyls (GlycA), an NMR marker for systemic inflammation42 (OR: 1.39, 95% CI: 1.31–1.47, P = \(2.56\times {10}^{-28}\)). Conversely, HDL-CE exhibited the strongest inverse association where a 1-SD increase in the biomarker measurement decreased the odds of ME/CFS by 35% (OR: 0.65, 95% CI: 0.61–0.70, P = \(4.96\times {10}^{-32}\)).
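The relationship between a logistic regression coefficient, its odds ratio and the percentage change in odds quoted above can be sketched as follows. The function names are hypothetical, and the standard error in the example is an illustrative value, not one reported in the study.

```python
import math

def odds_ratio_with_ci(beta, se, z=1.96):
    """Convert a log-odds coefficient and its standard error to an OR with a 95% Wald CI."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def percent_change_in_odds(odds_ratio):
    """Percentage change in odds per 1-SD increase in the biomarker."""
    return (odds_ratio - 1.0) * 100.0

# Reported odds ratios for TG/PG and HDL-CE
print(round(percent_change_in_odds(1.46)))  # 46
print(round(percent_change_in_odds(0.65)))  # -35

# Illustrative only: a coefficient equal to ln(1.46) with an assumed standard error of 0.03
or_, lower, upper = odds_ratio_with_ci(math.log(1.46), 0.03)
```

Because the biomarkers were scaled to unit variance, each coefficient refers to a 1-SD change, which makes effect sizes comparable across biomarkers measured in different units.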

Fig. 1: ME/CFS display strong individual biomarker associations with lipoproteins, surface lipids and inflammatory markers.

Odds ratios and 95% confidence intervals are shown per 1-SD increment for non-derived lipoprotein concentrations, surface lipids, fatty acids (and ratios), and low molecular weight metabolites. Models were adjusted for sex, age, cholesterol-lowering medication, and fish oil use. An odds ratio >1 represents a positive association shown in red, and an odds ratio <1 represents an inverse association shown in blue. Asterisks indicate significant associations at P < \(2.01\times {10}^{-4}\) (Bonferroni cutoff for the total number of metabolites tested). See Supplementary Fig. 3 for composite and relative (%) associations.

ME/CFS lipoprotein associations included increased VLDL particle concentration and consequently all VLDL lipid components (FC, CE, PL and TG)43, triglycerides in all lipoprotein subclasses except for L-HDL-TG, ApoB and ApoB/ApoA1 ratio and inverse associations with HDL particle concentrations and ApoA1. Decreased levels of sphingomyelins, phosphatidylcholines and total cholines, higher levels of alanine, valine and glucose and total fatty acids were also associated with ME/CFS.

The median disease duration between reported ME/CFS onset and the blood sample donation day at the first assessment centre visit was 11.6 years. To assess potential changes in biomarker concentrations over time, we performed another round of association tests on 181 ME/CFS participants with a disease duration of <2 years (Supplementary Fig. 4). Six biomarkers out of the 168 significant associations remained significant and exhibited greater effects (HDL-C, HDL-CE, M-HDL-C, M-HDL-CE, TG by PG and XL-HDL-FC %). The lack of significant associations was most likely attributable to the reduced sample size (15% of the full ME/CFS cohort); however, the OR estimates remained comparable with the full cohort, especially for lipoproteins, ApoA1 and alanine. We also quantified the minimum strength of association that an unmeasured confounder would need in order to explain away each biomarker association, represented as E-values and determined from a sensitivity analysis44. In general, a larger E-value implies that an unknown confounder of considerable strength is required to weaken the biomarker association. For example, the observed odds ratio of 0.65 for HDL-CE could be deemed negligible in the presence of another confounder that was associated with HDL-CE or ME/CFS by an odds ratio of 2.43 (the E-value). While some E-values were not particularly large, they were all greater than their respective ORs. We present corresponding E-values in Supplementary Data 5 for objective evaluations of the biomarker associations identified from this retrospective study.
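For orientation, E-values are commonly computed with the closed-form expression \(E = RR + \sqrt{RR\,(RR-1)}\), taking the reciprocal first when the ratio is below 1. The sketch below assumes this standard formula and treats the odds ratio as an approximation of the risk ratio (reasonable when the outcome is rare); whether the sensitivity analysis in the study used exactly this form is an assumption.

```python
import math

def e_value(ratio):
    """E-value for a ratio estimate, using E = RR + sqrt(RR * (RR - 1)).

    Ratios below 1 are inverted first so that the formula applies to protective associations.
    Applying it directly to an odds ratio assumes the outcome is rare.
    """
    rr = 1.0 / ratio if ratio < 1.0 else ratio
    return rr + math.sqrt(rr * (rr - 1.0))

print(round(e_value(0.65), 2))  # 2.45
```

For the rounded HDL-CE odds ratio of 0.65 this gives about 2.45, close to the reported 2.43; the small gap would be consistent with the published odds ratio having been rounded, although that explanation is itself an assumption.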

Different lipid profiles observed in female and male ME/CFS participants

Association testing was also performed with males and females separated (Supplementary Data 6 and 7, Supplementary Fig. 5). In the ME/CFS female population, 62 different biomarker associations were identified, compared with 14 in ME/CFS males, and 94 were observed in both sexes at P < \(2.01\times {10}^{-4}\) (all significant biomarkers across the three ME/CFS groups were associated in the same direction). There were seven biomarkers that were not initially associated in the whole ME/CFS cohort but found in females only (polyunsaturated fatty acids (PUFA), linoleic acid, M-VLDL-C, L-LDL-P, M-LDL-L, M-LDL-PL, and L-VLDL-TG %), and four additional biomarkers identified in males only (IDL-C, IDL-CE, S-HDL-P and S-LDL-C%).

ME/CFS biomarker associations are highly pleiotropic

There were 234 pleiotropic biomarkers (those associated with two or more conditions), contributing to a total of 942 associations at P < \(6.25\times {10}^{-3}\) with trait-wise Bonferroni threshold to account for varying sample sizes (Fig. 2, Supplementary Fig. 6). Only XXL-VLDL-TG % was uniquely associated with ME/CFS (Supplementary Fig. 6), with the remaining 196 associations also present in other conditions. Hypertension associations exhibited 81% similarity with ME/CFS associations, depression (85%), asthma (73%), IBS (97%), hay fever (46%), hypothyroidism (88%) and migraine (89%).

Fig. 2: Overlapping associations reveal distinct trends for different biomarker groups in ME/CFS and comorbid conditions.

Forest plots compare the odds ratios and 95% confidence intervals shown per 1-SD increment for a selection of biomarkers in ME/CFS (red), hypertension (light blue), depression (lavender), asthma (brown), IBS (green), hay fever (salmon), hypothyroidism (blue) and migraine (purple). All models were adjusted for sex, age, cholesterol-lowering medication, and fish oil supplements. Filled points indicate statistically significant associations at P < \(6.25\times {10}^{-3}\) (Bonferroni threshold with trait-wise adjustment), and hollow points are non-significant. See Supplementary Fig. 6 for forest plots of the complete biomarker set.

Twenty-nine additional significant ME/CFS associations were identified at the trait-wise threshold, including total branched-chain amino acids (BCAA) and inverse associations with citrate, acetate, and acetone. We observed the same biomarker association directions in ME/CFS, hypothyroidism and migraine for HDL-P, HDL-PL, ApoA1, sphingomyelins, phosphoglyceride, phosphatidylcholines, and total cholines, while depression displayed opposite effects for the latter three biomarkers (Supplementary Data 8). Similarly, associations with Total-L, driven by LDL-L and LDL-PL, were unique to depression45. Associations observed in hypothyroidism that were not present in ME/CFS included creatinine and albumin (inverse), and the migraine cohort showed stronger inverse associations in the ketone bodies panel and positive associations for glutamine and glycine. IBS did not have any discriminatory LMWM and mostly had overlapping associations for relative measurements of FC and CE in the VLDL subclass.

Addressing comorbidities within ME/CFS

Thoroughly investigating the impact of comorbid conditions in ME/CFS requires stratifying the cohort into groups of isolated condition combinations, which can substantially reduce the sample size and the statistical power. For example, there were 211 ME/CFS individuals with a combination of depression and other comorbid conditions, and 24 individuals with depression only. We recognise that the other 265 comorbid conditions not analysed in this study may influence the biomarker associations. Therefore, we created another cohort with 354 ME/CFS individuals with or without hypertension, depression, asthma, IBS, hay fever, hypothyroidism, or migraine and performed association tests (Supplementary Fig. 7) and sensitivity analysis for this subset (Supplementary Data 9). Thirty-one of the initial 168 ME/CFS biomarker associations remained significant (P < \(2.01\times {10}^{-4}\)). SFA% and omega-3 were the only significant associations that produced greater odds ratios in the subset than in the full cohort. The lower odds ratios observed may be attributed to the reduced number of comorbid conditions reported by each individual, rather than the specific condition. The average number of comorbid conditions was 3.0 for the full cohort and 0.6 for the subset. This suggests that the burden of having several comorbid conditions might exacerbate ME/CFS symptoms (inclusive of symptoms from common comorbid conditions), reflecting a higher disease severity, leading to more pronounced biomarker signals in the full cohort.

Clinical predictors attributable to biomarker variation

We investigated the relationship between the NMR metabolomic biomarkers and baseline characteristics to identify risk factors and routine clinical markers that may be potential modifiable targets for treatment or management22. The maximum amount of variation explained by 61 baseline characteristics (Supplementary Fig. 8 and Supplementary Data 10) on 249 biomarkers was identified (Supplementary Data 11), and the top six most explainable biomarkers in ME/CFS are shown in Fig. 3. The largest drivers of biomarker variation were mostly consistent with established biological mechanisms including inflammation (C-reactive protein explaining 20.5% of biomarker variance; neutrophil count, 7.1%) via GlycA, kidney function (urate, 22.3%; cystatin C, 21.3%) via plasma creatinine, testosterone (16.2%) via plasma creatinine and serum urea (15.3%) via valine. These traditional blood biochemistry measurements similarly explained the same set of biomarkers in the entire study population (Supplementary Figs. 9 and 10 and Supplementary Data 12). We highlight the different contributions of white blood cell (WBC) leucocyte count and insulin-like growth factor 1 (IGF-1) between the two sample groups as potential antecedent-biomarker pairs that are worth exploring in ME/CFS. In the study population, WBC leucocyte count and IGF-1 explained 7.8% and 1.8% of the variation in GlycA and phosphatidylcholine, respectively. In the ME/CFS cohort, these factors contributed to 9.3% of the variation observed in the PUFA/MUFA (monounsaturated fatty acids) ratio and 4.9% in PUFA % (Supplementary Data 12).

Fig. 3: Variance decomposition of baseline characteristics for the top six most explainable NMR metabolomics biomarkers in ME/CFS cohort.

Variance decomposition results show the amount of variance explained by baseline characteristics significantly associated with ME/CFS (P < \(7.35\times {10}^{-4}\), Bonferroni cutoff adjusting for 68 baseline characteristics tested) on six NMR metabolomics biomarkers. Solid colours indicate positive associations of baseline characteristics with biomarker levels and patterned shading indicates inverse associations. The far-right column shows the maximum amount of explained variance for any biomarker by each baseline characteristic. The numbers on the y-axis on the right correspond to (1) XL-HDL-FC%, (2) PUFA/MUFA, (3) Creatinine, (4) GlycA, (5) XS-VLDL-PL %, (6) HDL-CE, (7) M-HDL-FC %, (8) S-HDL-CE, (9) ApoB/ApoA1, (10) HDL-P, (11) XXL-VLDL-C %, (12) S-HDL-C %, (13) XXL-VLDL-TG, (14) IDL-FC, (15) L-HDL-PL %, (16) Unsaturation, (17) Omega-3 %, (18) Lactate, (19) L-HDL-C %, (20) XS-VLDL-PL, (21) XS-VLDL-CE %, (22) PUFA %, (23) S-HDL-PL %, (24) Valine. Baseline characteristics are coloured to indicate physical measurements (red), lifestyle and environmental factors (light blue), health and medical history (indigo), psychological factors (green), blood count (orange) and blood biochemistry (grey).

In ME/CFS, GlycA variation was also explained by six additional baseline characteristics that were not directly linked to inflammation (Fig. 3): pulse rate, high IPAQ group, frequency of tiredness/lethargy, neck/shoulder pain, sleeplessness/insomnia and alkaline phosphatase. Additionally, lifestyle and environment, symptoms (health and medical history) and psychological factors did not contribute to large amounts of biomarker variation, with a 1.3% average explained variance.

Building an ME/CFS score with machine learning

The ability to comprehensively quantitate metabolites in a single run is one of the advantages of using NMR for metabolomics5, conveniently allowing biomarkers to be combined into a multi-variable disease score through machine learning46,47. We implemented a two-stage model training and selection workflow (Supplementary Fig. 1). In the first stage, we found that penalised logistic regression with LASSO models considering both biomarker data and baseline characteristics outperformed models using biomarker features only (Supplementary Fig. 11). The different training and feature sets (“Methods”) from models that achieved the performance criteria of recall >0.7 and area under the receiver operating characteristic curve (AUC) >0.8 were retrained in the second stage using forward feature selection coupled with adaptive boosting, random forest, extreme gradient boosting, explainable boosted machine, and LightGBM (Supplementary Methods). The final twelve models meeting the performance criteria (Supplementary Data 13) selected from both stages each had an even class distribution, obtained either by using class weights, random undersampling or bootstrapping (“Methods”). Across 5- and 10-fold cross-validation, these models achieved performance up to an AUC of 0.89 and recall (i.e. sensitivity) of 0.77, comparable to performance on the independent blind test set, providing confidence in the generalisability and robustness of the final models. Subsequently, an ME/CFS score was derived using a weighted sum of the important features from each model, with weights determined by logistic regression (Supplementary Fig. 12).

Models that employed LASSO feature selection had substantially more features (54–253 features) compared to forward feature selection models (6–28 features) (Supplementary Data 13). An ideal predictive model should achieve a balance between strong performance metrics and a concise set of features32. In this regard, the LightGBM model48 was chosen as the optimal model, selecting 19 baseline characteristics and nine NMR biomarkers (Supplementary Data 14), and achieving an AUC of 0.83 and a recall of 0.70 on the blind test set. Furthermore, the LightGBM score yielded an OR of 3.61 (CI: 3.45–3.78, \(P \approx 0\); Fig. 5c), which is ~2.5 times more strongly associated with ME/CFS than the top individual biomarker, TG/PG. While other forward feature selection models had slightly better performance metrics (Supplementary Data 13), models combining baseline characteristics and biomarker features were preferred over those using baseline characteristics only, so as to reduce the possibility of selecting too many subjective features. Additionally, scores that exhibited inverse, non-significant or weaker associations with comorbid groups were prioritised in the model selection process; the LightGBM score demonstrated this with hypertension, asthma and hay fever (Supplementary Fig. 12).
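To make the idea of forward feature selection concrete, the following is a minimal greedy sketch. It assumes a generic scoring callback; in the study the criterion was AUC from a retrained model, whereas here `toy_score`, the `TOY_GAIN` values and the per-feature penalty are hypothetical stand-ins that are not taken from the paper. Only the feature names echo those mentioned in the text.

```python
def forward_select(candidates, score_fn, min_gain=1e-9):
    """Greedy forward selection.

    Starting from an empty set, repeatedly add the single candidate that
    raises score_fn the most; stop once no candidate improves the score
    by more than min_gain.
    """
    selected = []
    best_score = score_fn(selected)
    remaining = list(candidates)
    while remaining:
        trials = [(score_fn(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(trials)
        if top_score - best_score <= min_gain:
            break
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected, best_score


# Hypothetical per-feature gains; a real run would refit a model each time.
TOY_GAIN = {"tiredness": 0.20, "leucine": 0.08, "Total-P": 0.05, "noise": 0.0}


def toy_score(features):
    """Stand-in for a validation metric, with a small cost per added feature."""
    return 0.5 + sum(TOY_GAIN[f] for f in features) - 0.01 * len(features)


chosen, final_score = forward_select(list(TOY_GAIN), toy_score)
print(chosen, round(final_score, 2))
```

Because each step keeps only the feature that improves the criterion most, the procedure tends to yield compact feature sets, which is in line with the 6–28 features reported for the forward feature selection models.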

Feature importance depicts three interpretations

Figure 4 shows three distinct feature importance measures (split importance, mean SHapley Additive exPlanations (SHAP) value and effect size), each offering unique insights into the 28 selected features (Supplementary Data 14). We found biomarker features (and continuous variables in general) had higher split importance, with leucine ranking first, indicating that these features were frequently used during the splitting process in decision tree training. In contrast, the mean SHAP value49 identified baseline characteristics, specifically frequency of tiredness/lethargy in the last 2 weeks, whole body pain, and age, as the top three features, followed by Total-P and S-LDL-TG. The direction of each feature’s impact was further analysed with SHAP plots (Supplementary Fig. 13), using unscaled data retrained with the LightGBM algorithm (which showed an insignificant drop in performance in the test set, Supplementary Data 13) to facilitate the interpretation of the features. We noticed similar trends for the variables in both training and test sets (Supplementary Fig. 13). In general, we observed that lower levels of Total-P and M-VLDL-P, and elevated levels of S-LDL-P and S-LDL-TG, favoured an ME/CFS prediction. For immature reticulocyte fraction, both high and low values were more likely to result in an ME/CFS prediction, while medium values led to non-ME/CFS predictions (Supplementary Fig. 13a). Acetone produced a conflicting plot, where increased concentrations led to both ME/CFS and non-ME/CFS predictions, contrasting with the inverse association observed as an individual biomarker.
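For readers unfamiliar with SHAP values, the sketch below computes exact Shapley values for a two-feature toy model and averages their magnitudes across samples. It is an illustration of the underlying definition only: it is not the tree-specific algorithm implemented in the SHAP library, and `toy_model`, the baseline and the samples are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one sample.

    Features outside a coalition are replaced by their baseline value.
    The cost grows as 2**n, so this only suits a handful of features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                with_i = [x[j] if (j in coalition or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in coalition else baseline[j]
                             for j in range(n)]
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi


def mean_abs_shap(model, samples, baseline):
    """Average magnitude of each feature's Shapley value across samples."""
    per_sample = [shapley_values(model, x, baseline) for x in samples]
    return [sum(abs(p[i]) for p in per_sample) / len(per_sample)
            for i in range(len(baseline))]


def toy_model(v):
    """Hypothetical model with an interaction between its two features."""
    return 2 * v[0] + v[1] + v[0] * v[1]


print(shapley_values(toy_model, [1, 3], [0, 0]))
print(mean_abs_shap(toy_model, [[1, 3], [2, 0]], [0, 0]))
```

The per-sample values sum to the difference between the model output for the sample and for the baseline. This measure reflects how much a feature moves individual predictions, whereas split importance only counts how often a feature is used to split a node, which is one reason the two rankings can disagree.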

Fig. 4: Contributions of the 28 scaled features selected by LightGBM model.

Feature importance from the independent blind test set was measured using split importance (green), mean SHAP value (orange) and effect size (determined by logistic regression, shown in purple). The features are arranged in the order chosen during forward feature selection, optimised for AUC. Split importance indicates the frequency with which a feature was used to split nodes, and mean SHAP value is represented as the magnitude of the average impact the feature has on the model output. Patterned bars in the effect size panel indicate a negative association; solid bars indicate a positive association. Detailed explanations of the 28 features selected are provided in Supplementary Data 14.

Effect sizes for four biomarkers changed direction as part of the score compared to their individual association: PUFA% changed to show a positive effect and S-LDL-P, L-VLDL-FC, and M-VLDL-P shifted to show a negative effect (Fig. 4). Adjusting for cholesterol-lowering medication and fish oil in the score showed a decrease in effect sizes without a change in direction (Supplementary Data 14). PUFA%, hip pain, and S-LDL-TG were no longer significant and Total-P gained significance at the Bonferroni threshold \(P < 1.78 \times 10^{-3}\). Additionally, the effect sizes of acetone and acetoacetate in the score decreased compared to their individual effects, potentially to distinguish migraine, given the strong individual ketone body associations observed in that cohort.
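The Bonferroni thresholds quoted in this section are consistent with dividing a family-wise significance level by the number of tests. The short sketch below reproduces that arithmetic. The significance level of 0.05 and the test counts (28 score features, and eight cohorts for the threshold quoted in the Fig. 5 legend) are our assumptions inferred from the reported values, not figures stated explicitly in the text.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Assumed inputs: alpha = 0.05, 28 features in the score, 8 cohorts compared.
score_threshold = bonferroni_threshold(0.05, 28)
cohort_threshold = bonferroni_threshold(0.05, 8)

print(f"{score_threshold:.3e}")   # about 1.786e-03, quoted in the text as 1.78e-03
print(f"{cohort_threshold:.3e}")  # 6.250e-03
```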

ME/CFS score distribution in other cohorts

The efficacy of the LightGBM score was evaluated by stratifying participants into 100 bins, representing ME/CFS score percentiles (Fig. 5a). The proportion of ME/CFS cases increased with the score, reaching 40% in the 100th percentile. However, a substantial proportion of C2 participants was also captured (43%), despite a decline in C2 numbers as the score increased. Hypertension made up 12% of the 100th percentile, exhibiting an inverse association with the score, while asthma and hay fever showed no association. Depression, IBS, hypothyroidism, and migraine had increasing observation rates but contributed less than 8.8% in any percentile, each constituting less than 5% at the 100th percentile.
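The stratification step can be sketched as follows, assuming percentiles are assigned by rank so that each bin holds a near-equal number of participants. The exact binning and tie-handling used in the study are not described here, so this is one plausible reading; the function names and toy data are hypothetical.

```python
from collections import Counter


def assign_percentiles(scores):
    """Map each score to a percentile bin from 1 to 100 by rank.

    Participants are sorted by score and split into 100 near-equal bins;
    ties are broken by input order (an assumption of this sketch).
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    bins = [0] * n
    for rank, idx in enumerate(order):
        bins[idx] = rank * 100 // n + 1
    return bins


def bin_composition(bins, cohorts, target_bin):
    """Share of each cohort label among participants in one percentile bin."""
    members = [c for b, c in zip(bins, cohorts) if b == target_bin]
    counts = Counter(members)
    return {c: k / len(members) for c, k in counts.items()}


# Toy population of 200 participants with increasing scores.
toy_scores = [i / 200 for i in range(200)]
toy_cohorts = ["ME/CFS" if (i >= 100 and i % 2 == 1) else "C2" for i in range(200)]
toy_bins = assign_percentiles(toy_scores)
print(bin_composition(toy_bins, toy_cohorts, 100))
```

Applied to the real score, the composition of the 100th bin corresponds to the proportions quoted above, for example 40% ME/CFS and 43% C2.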

Fig. 5: Stratified LightGBM score shows increased detection of ME/CFS cases in higher percentiles.

The LightGBM scores for the study population were stratified into percentiles, each comprising 927 participants. a Observed event frequencies are plotted for ME/CFS (red), hypertension (light blue), depression (lavender), asthma (brown), IBS (green), hay fever (salmon), hypothyroidism (blue), migraine (purple) and C2 (light grey) with respect to percentile. The colour legend of the cohorts is preserved throughout the figure, and the red vertical bars represent the top 10 percentiles. b The cumulative percentage of individuals in each cohort, capturing those in the ith percentile and above, where 0 < i ≤ 100. c The odds ratio of the LightGBM score, adjusted for gender and age, for each comorbid cohort. Coloured circles indicate a significant association at the Bonferroni threshold \(P < 6.25 \times 10^{-3}\). An odds ratio greater than 1 indicates a positive association and an odds ratio lower than 1 indicates an inverse association. d The LightGBM score distribution visualised as boxplots for ME/CFS (median = 96), migraine (73), depression (72), hypothyroidism (68), IBS (68), hypertension (51), asthma (51), C2 (47) and hay fever (46). e The median percentile for the comorbid and C2 cohorts combined (dark grey) was 50.

We also calculated the cumulative percentage of ME/CFS cases captured relative to the cohort size (Fig. 5b). The top five percentiles detected 56% of the ME/CFS cohort, the top 10 percentiles 67% and the top 25 percentiles 81%. The percentile distribution for ME/CFS overlapped with those of depression, IBS, hypothyroidism, and migraine (Fig. 5d, e).
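The cumulative capture statistic can be expressed as the fraction of a cohort falling in the top k percentile bins. The sketch below assumes each participant already has a percentile bin between 1 and 100; the function name and the toy bins are hypothetical and were chosen only so that the output is easy to verify by hand.

```python
def cumulative_capture(bins, is_case, top_k):
    """Fraction of all cases that fall in the top_k highest percentile bins."""
    cutoff = 100 - top_k  # bins strictly above this value are in the top k
    total_cases = sum(1 for c in is_case if c)
    captured = sum(1 for b, c in zip(bins, is_case) if c and b > cutoff)
    return captured / total_cases


# Ten hypothetical cases followed by four hypothetical controls.
toy_bins = [100, 100, 99, 98, 97, 96, 93, 85, 70, 30, 100, 92, 50, 10]
toy_case = [True] * 10 + [False] * 4

for k in (5, 10, 25):
    print(k, cumulative_capture(toy_bins, toy_case, k))
```

With the real scores, the same calculation at k = 5, 10 and 25 would correspond to the 56%, 67% and 81% reported for the ME/CFS cohort.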

The distribution of the LASSO score trained using class weights (“Methods”) was also evaluated to determine whether there were any advantages in training on the full dataset with more participants and features vs. a 1:1 class ratio and selected features (Supplementary Fig. 14). We observed a slightly higher proportion of ME/CFS cases captured in the top percentiles, and trends in detecting comorbid conditions changed marginally, but there were no prominent differences in the ability of the two scores to discriminate comorbidities.

False positive predictions were found in higher ME/CFS score percentiles

The LightGBM model had a false positive rate of 0.20 and a false negative rate of 0.30. Hence, we probed the participants who were incorrectly classified to determine whether they had a particular quality that resulted in their prediction. Most of the false positives were C2 individuals, and the majority occurred in the higher score percentiles (Supplementary Fig. 15b, d). From the comorbid cohort point of view, migraine and IBS individuals were more likely to be incorrectly classified at higher percentiles, but no distinct false positive patterns, i.e. false positives occurring in a specific percentile range, were observed (Supplementary Fig. 15c). The 69 false negatives were spread across the 1st–97th percentiles, with a maximum of three incorrect predictions found in the 83rd percentile.
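The relationship between these error rates and recall can be made explicit with a small sketch. Only the 69 false negatives and the two rates come from the text; the remaining confusion-matrix counts below are hypothetical values chosen so that the rates are reproduced under an assumed 1:1 class ratio, and should not be read as the study's actual counts.

```python
def error_rates(tp, fp, tn, fn):
    """Return (false positive rate, false negative rate, recall)."""
    fpr = fp / (fp + tn)
    fnr = fn / (fn + tp)
    return fpr, fnr, 1.0 - fnr


# Hypothetical counts: only fn = 69 is reported in the text.
fpr, fnr, rec = error_rates(tp=161, fp=46, tn=184, fn=69)
print(round(fpr, 2), round(fnr, 2), round(rec, 2))
```

The false negative rate is the complement of recall, so a rate of 0.30 is the same statement as the recall of 0.70 reported for the blind test set.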

Since the model and scores were trained on a heterogeneous ME/CFS cohort, we examined the contribution of all reported comorbid conditions in the ME/CFS cohort to the score. We found that ME/CFS individuals who presented with a greater number of comorbid conditions generally appeared in the higher percentiles (Supplementary Fig. 16a). The reported comorbid conditions are broken down in Supplementary Fig. 16b, noting that these conditions may co-exist.

ME/CFS score percentile suggestive of disease severity

We revisited the biomarker analysis with a stratified ME/CFS cohort based on score percentiles. ME/CFS participants were placed into three groups: high (96th–100th percentiles, n = 643), medium (81st–95th percentiles, n = 269), and low (1st–80th percentiles, n = 282) to assess the specificity of biomarker signals according to disease severity. While there are no formal objective strategies to classify disease severity, a high disease score would indicate extreme values for any of the 28 selected features, and hence represent an abnormal or more severely afflicted state. The high percentile group produced greater ORs compared to the full cohort for all associations except for 10 biomarkers (Supplementary Fig. 17, Supplementary Data 15). For example, ORs for TG/PG increased from 1.46 to 1.74 (\(P < 1.41 \times 10^{-41}\)) and for GlycA from 1.39 to 1.64 (\(P < 9.91 \times 10^{-38}\)). VLDL and HDL associations remained prominent in the medium percentile group, which exhibited a greater effect size for sphingomyelins than the high percentile group. All associations in the low percentile group were negligible, potentially representing a group of individuals who are able to function at almost full capacity and require a ‘stressor’ for molecular perturbations to be detectable50.
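The grouping rule above is simple enough to state as code. The sketch below encodes the three percentile ranges and checks that the reported group sizes add up to the full ME/CFS cohort; the function name is hypothetical, and the cohort total is taken from the female and male counts (882 and 312) given in the Discussion.

```python
def severity_group(percentile):
    """Assign the high / medium / low group used for the stratified analysis."""
    if not 1 <= percentile <= 100:
        raise ValueError("percentile must be between 1 and 100")
    if percentile >= 96:
        return "high"
    if percentile >= 81:
        return "medium"
    return "low"


# Group sizes as reported in the text.
GROUP_SIZES = {"high": 643, "medium": 269, "low": 282}
total = sum(GROUP_SIZES.values())

print(severity_group(97), severity_group(88), severity_group(40))
print(total, total == 882 + 312)
```

Note that the groups span very different percentile ranges (5, 15 and 80 percentiles), yet the high group alone contains more than half of the cohort, reflecting how strongly ME/CFS cases concentrate at the top of the score.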

Discussion

The UKB offers a wealth of data containing both historical and accruing datasets that are procured non-specifically so as not to bias towards a particular disease. This study showcases the utility of the UKB for hypothesis generation, result validation, and exploratory purposes, applied to ME/CFS research.

This metabolomics analysis presents a lipoprotein profile for ME/CFS, highlighting significant associations of the disease with VLDL subclasses and size. These findings point to a triglyceride and cholesterol transport problem, potentially arising from dysregulation of enzymes such as lipoprotein lipase (LPL). Interestingly, our retrospective analysis connects with a recent study revealing a 2-fold overexpression of microRNA-29a in ME/CFS50, which may inhibit LPL translation51,52. The resulting inhibition of LPL activity would lead to decreased clearance of VLDL particles and reduced degradation of circulating triglycerides. Surface lipids including total cholines, phosphatidylcholines, sphingomyelins, and phosphoglycerides were significantly decreased in the UKB ME/CFS cohort. These results are consistent with prior research10,13, suggesting potential membrane destabilisation, altered cell signalling and dysregulated immune cell function53. Our study contributes further evidence with a TG/PG association showing increased core lipid content relative to surface area, which may reduce membrane fluidity.

We identified distinct energy metabolism profiles for females (n = 882) and males (n = 312). Firstly, a non-gender-specific elevation of glucose was observed, followed by significant positive associations of alanine and various fatty acids at \(P < 2.0 \times 10^{-4}\), and inverse associations with ketone bodies at \(P < 6.25 \times 10^{-3}\), found in females. Alanine plays a key role in the metabolism of nitrogen-containing compounds and may be elevated in plasma due to increased demand for amino acid catabolism for ATP production, which was also supported by elevated levels of total BCAAs at the trait-wise significance threshold. The inclination towards amino acid metabolism over more efficient mechanisms such as carbohydrate metabolism aligns with previous serum/plasma11,13 and lymphoblast studies54. Our results extend this finding by proposing that this shift in energy metabolism is more prominent in females. Mechanistic animal models reveal significant sex differences in energy metabolism; under conditions of fasting, females prioritise lipid biosynthesis and preferentially utilise anaplerosis at the expense of amino acids, while males tend to slow down anabolic pathways55. These observations may speak to the female preponderance observed in ME/CFS and underscore the critical role of sex-based comparison in unravelling ME/CFS disease mechanisms37.

Additionally, we introduce ketolysis as another alternative pathway in this metabolic preference. In contrast, biomarker associations unique to males did not uncover any discrete alternative energy pathways, potentially due to sample size differences. Instead, they showed inverse associations of cholesteryl esters and total cholesterol in IDL and S-HDL, suggesting a cholesterol transfer problem during HDL maturation. However, we also observed an inverse association of XL-HDL-P in females, suggesting that, overall, HDL maturation problems may not be gender-specific but may inflict varied perturbations along the pathway, influencing HDL size in a gender-dependent manner56.

The fatty acid associations found in ME/CFS females have implications in inflammatory processes57. Fatty acids, especially PUFAs, share signalling roles with surface lipids as they can be incorporated into inflammatory cell membranes. Despite observing positive PUFA associations, PUFA% and PUFA/MUFA were inversely associated, modifying the fatty acid composition in the phospholipid membrane, and potentially influencing the function of inflammatory cells. Furthermore, fatty acids can interact with the neuroendocrine system58 by altering steroid hormone secretion, which, conversely, can also exert control over fatty acid metabolism. Cholesterol handling is of particular importance to steroid hormones given that it serves as the exclusive precursor for all steroidogenesis. The steroid hormone cortisol is currently the most reliable biomarker in ME/CFS research, with lower levels in ME/CFS evidenced at the level of meta-analysis59. This observation aligns with recent findings in Long Covid, a condition sharing similarities with ME/CFS, wherein reduced cortisol levels have also been identified as a major distinguishing feature60, potentially contributing to the pathogenesis61.

We did not detect any aromatic amino acid anomalies despite the recent identification of potential diagnostic biomarkers using Raman spectroscopy on peripheral blood mononuclear cells62. This discrepancy can be attributed to differences in biofluid characterisation, and the relative quantification of biomolecules from Raman spectroscopic peak bands, contrasting with the absolute quantification from NMR by Nightingale Health. Both our studies, however, revealed evidence of altered fatty acid and amino acid utilisation, albeit from different biomarkers.

This study’s strength lies in leveraging comorbid cohorts as positive control groups, allowing us to investigate biological mechanisms not only between ME/CFS and healthy individuals but also across various medical conditions. Although we did not find any prominent LMWM unique to ME/CFS, overlapping biomarker associations and those with opposite effect sizes offered robust evidence for establishing similarities and differences in potential pathologies of ME/CFS and comorbid conditions. ME/CFS shared the majority of its associations with the depression cohort. Nevertheless, we identified a clear biochemical distinction, with increased levels of total cholines, phosphatidylcholines and phosphoglycerides observed in depression63. Migraine and ME/CFS shared ketone body associations but exhibited opposite glucose associations, suggesting a similar biological mechanism that results in frequent headaches and migraines, while stemming from different underlying causes64. Establishing biomarkers that can differentiate between ME/CFS and common comorbidities, and not only healthy individuals, is crucial in assisting clinicians to make a more informed ME/CFS diagnosis. The concordance of amino acid associations observed in hypertension with the literature65 also provides confidence in the validity of the ME/CFS associations, as recruitment and experimental workflow were employed uniformly across all UKB participants. Furthermore, many ME/CFS biomarkers, such as GlycA and those related to VLDL and larger HDL particles, produced larger effects than in the other cohorts, which indicates a genuine perturbation in the metabolic pathways underpinning this condition.

The release of UKB NMR metabolomics data and the rising popularity of the Nightingale Health Platform have stimulated research endeavours aimed at using this 249-biomarker dataset in screening and risk prediction for type 2 diabetes46, dementia66, pneumonia47, all-cause mortality67 and other common diseases21,22,68. We developed a model and subsequent score tailored to estimate the likelihood of a disease event, i.e. a disease detection model, with the goal of progressing it towards a diagnostic tool when deep phenotyping data from biobanks with clinically diagnosed ME/CFS, such as UK ME/CFS Biobank69, DecodeME70 and All of Us71, become available for validation. LASSO regression, a commonly employed method for score generation, presents a challenge when working with this particular NMR biomarker dataset due to the multicollinearity of the lipoprotein subclass measurements, potentially leading to the determination of unstable coefficients72. To address this issue, previous studies have arbitrarily chosen a subset of features, such as using only clinically validated biomarkers for training47. Here, we employed a forward feature selection method to cover the entire feature space comprising NMR biomarkers and baseline characteristics to produce a multi-variable score consisting of a concise set of 28 features. The necessity for incorporating additional feature types beyond metabolomics in predicting multisystemic conditions has been shown, with the integration of other molecular entities markedly enhancing model performance73. We also demonstrated the importance of other molecular markers in the variance decomposition analysis, where NMR metabolomics biomarkers alone did not sufficiently explain lifestyle and medical history baseline characteristics. Therefore, as the UKB continues to generate data, the integration of multi-omics, such as genomics74,75 and proteomics, infectious disease markers, neurobiomarkers, other biofluids, and continuous wearables data would be highly attractive.

Comparing the significant individual biomarker associations and the biomarker features selected for the predictive model suggests that biomarkers distinguishing ME/CFS from healthy individuals are more likely to reflect the underlying biological mechanism of ME/CFS. However, to differentiate individuals suffering from other illnesses, additional biomarkers that may not solely be related to ME/CFS biochemistry may be essential. Feature importance analysis showed that ME/CFS individuals could possess extreme concentrations from both ends of the concentration ranges (as exemplified by the immature reticulocyte fraction). Therefore, some biomarkers may not strictly adhere to a consistent pattern of being increased or decreased when compared to what is considered a normal concentration, further alluding to the heterogeneity inherent in ME/CFS and explaining the inconsistent results reported across previous metabolomic studies76.

In this study, we evaluated our models using recall (or sensitivity). This metric prioritises prediction of the positive class (ME/CFS cases), as no such diagnostic tool for ME/CFS currently exists. Unfortunately, the models with good recall generally had poor precision resulting in many false positives (Supplementary Data 13). The majority of the false positives occurred in the higher ME/CFS score percentiles, where many ME/CFS individuals with multiple comorbidities were also located, emphasising the need for a strategy that can effectively distinguish individuals within this higher percentile group that exhibits similar symptom severities regardless of the condition. While the model was developed for exploratory purposes, it shows promise as a screening stage in a multi-diagnosis process for ME/CFS, and a refined score may be clinically utilised to assess a patient’s disease progression or treatment response, as one’s score ascends or descends in the percentiles.
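The trade-off between recall and precision noted above can be illustrated with a short calculation of the positive predictive value implied by a given recall, false positive rate and case prevalence. The recall (0.70) and false positive rate (0.20) are the values reported for the LightGBM model; the prevalence values are hypothetical and chosen only to show the effect of class balance.

```python
def precision_from_rates(recall, fpr, prevalence):
    """Positive predictive value implied by recall, FPR and case prevalence."""
    true_pos = recall * prevalence
    false_pos = fpr * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical prevalences: a balanced set and a rare-disease setting.
balanced = precision_from_rates(recall=0.70, fpr=0.20, prevalence=0.5)
rare = precision_from_rates(recall=0.70, fpr=0.20, prevalence=0.01)

print(round(balanced, 3), round(rare, 3))
```

Under these assumptions, the same model that is correct for roughly three in four positive calls on a balanced set would be correct for only a few in a hundred when cases are rare, which is why a high-recall score is better suited to a screening stage than to a stand-alone diagnosis.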

Limitations of our study include the use of self-reported medical conditions, in which some of the cases may be misdiagnosed. At the time of analysis, two data fields explicitly reported ME/CFS: the self-reported medical conditions during a verbal interview at the assessment centre (data field: 20002, code: 1842) and the experience of pain online follow-up (data field: 120010). Since we used baseline NMR metabolomics data, ME/CFS reported at the initial assessment centre visit was taken as the ground truth. This ME/CFS cohort reflected an older population, with an average age of 55, and the youngest participant was aged 40. ME/CFS can occur at any time across the lifespan, with two major onset peaks in adolescence and the late 30s, which are not captured in this study37. It should also be noted that an older cohort is more likely to present with multi-morbidities22. There were no comorbid presentations of POTS in the ME/CFS cohort, and only 2.5% reported FM. The underreporting of these diagnoses in the UKB is potentially due to the lack of a clear understanding of these conditions77,78. There are also inherent biases in the UKB as volunteer-based recruitment attracts a population of generally healthier individuals, resulting in milder ME/CFS cases for this study. Additionally, the UKB comprises predominantly white British individuals; therefore, replication is encouraged in other ethnic backgrounds. Finally, most of the ME/CFS cohort was taking medication and supplements (Supplementary Data 16). We decided not to remove these individuals as doing so may indirectly result, again, in a ‘healthier’ cohort with milder symptoms. Instead, biomarker associations were adjusted for cholesterol-lowering medication and fish oil supplementation, which were among the most highly consumed in ME/CFS compared to the control cohorts and may have affected biomarker concentrations.

We have initiated a detailed investigation into potential ME/CFS biomarkers and their biological relevance, validated previous metabolomics biomarkers and characterised ME/CFS in the UKB for future studies to integrate other data types, such as imaging data18, genomics79, proteomics80 and accelerometry data81. Most importantly, we emphasise the importance of considering comorbid conditions when assessing the efficacy of potential diagnostic statistical models, and we introduce methods for doing so in the context of heterogeneous pathological conditions such as ME/CFS.