Discriminating Myalgic Encephalomyelitis/Chronic Fatigue Syndrome and comorbid conditions using metabolomics in UK Biobank

Huang, Katherine; G. C. de Sá, Alex; Thomas, Natalie; Phair, Robert D.; Gooley, Paul R.; Ascher, David B.; Armstrong, Christopher W.

doi:10.1038/s43856-024-00669-7

Published: 26 November 2024

Communications Medicine volume 4, Article number: 248 (2024)

Abstract

Background

Diagnosing complex illnesses like Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS) is complicated by diverse symptomology and the presence of comorbid conditions. ME/CFS patients often present with multiple health issues. Incorporating comorbidities into research can therefore provide a more accurate understanding of the condition’s symptomatology and severity, and better reflect real-life patient experiences.

Methods

We performed association studies and machine learning on 1194 ME/CFS individuals with blood plasma nuclear magnetic resonance (NMR) metabolomics profiles, and seven exclusive comorbid cohorts: hypertension (n = 13,559), depression (n = 2522), asthma (n = 6406), irritable bowel syndrome (n = 859), hay fever (n = 3025), hypothyroidism (n = 1226), migraine (n = 1551) and a non-diseased control group (n = 53,009).

Results

We present a lipoprotein perspective on ME/CFS pathophysiology, highlighting gender-specific differences and identifying overlapping associations with comorbid conditions, specifically surface lipids, and ketone bodies from 168 significant individual biomarker associations. Additionally, we searched for, trained, and optimised a machine learning algorithm, resulting in a predictive model using 19 baseline characteristics and nine NMR biomarkers which could identify ME/CFS with an AUC of 0.83 and recall of 0.70. A multi-variable score was subsequently derived from the same 28 features, which exhibited ~2.5 times greater association than the top individual biomarker.

Conclusions

This study provides an end-to-end analytical workflow that explores the potential clinical utility that association scores may have for ME/CFS and other difficult to diagnose conditions.

Plain language summary

Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS) is an illness with severe fatigue without a known cause. Further symptoms of ME/CFS often overlap with other medical problems making diagnosis difficult. We wanted to find a way to easily identify people with this condition, so we used data from the UK Biobank to compare people with and without ME/CFS who had other medical problems. We developed a mathematical calculation, using 19 basic health factors and nine blood markers, which could classify ME/CFS and non-ME/CFS individuals correctly 83% of the time, and recognise this condition in individuals 70% of the time. This research could lead to a better way to diagnose ME/CFS and serve as an example for diseases lacking definite laboratory testing.

Introduction

Complex chronic illnesses pose diagnosis, treatment and research challenges as affected individuals often harbour multiple comorbidities, a concern reflected in Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS), a disabling and persistent multi-systemic illness¹. ME/CFS individuals exhibit a medley of symptoms including profound, unexplained fatigue, flu-like symptoms without the presence of an active viral pathogen, post-exertional malaise, unrefreshing sleep, muscle weakness or pain, vasomotor instability, sensory and cognitive problems¹. Symptoms share similarities to frequently reported comorbid conditions, including but not limited to irritable bowel syndrome (IBS), postural orthostatic tachycardia syndrome (POTS), fibromyalgia (FM), allergies, migraine and depression^2,3. It is often difficult to recognise whether these comorbidities precede disease onset or develop consequentially from the debilitating nature of the condition³, preventing or prolonging the time to receive a diagnosis.

Disease heterogeneity presents a paradoxical problem for case-control study designs as it is difficult to capture all symptom combinations in a single cohort, and furthermore, individual variation may dilute weak signals that cannot be statistically detected. Current ME/CFS studies employ tailored inclusion/exclusion criteria for both case and control groups to mitigate individual heterogeneity, which often results in limited sample sizes. Moreover, various ME/CFS diagnostic criteria^1,2,3,4 are used for participant recruitment producing inconsistent results across studies. The lack of powered studies and result validation has made research into reproducible biomarkers a priority, in particular, the application of metabolomics⁵.

Observed at the intersection between genes and the environment, metabolomics provides a comprehensive snapshot of the small molecule landscape of an organism, making it an ideal technique to investigate internal biological perturbations caused by external factors such as lifestyle, diet, stress and disease. While ME/CFS metabolomics studies have produced an extensive list of potential plasma and serum biomarkers across various domains, including energy metabolism^6,7,8,9,10, amino acid metabolism^6,8,11,12,13,14,15, lipid metabolism^9,10,13,15,16, urea cycle^7,11 and oxidative stress⁶, a conclusive biomarker panel is yet to be verified.

In this study, we leverage UK Biobank (UKB) resources to investigate ME/CFS pathophysiology by drawing comparisons to comorbid conditions and other well-characterised diseases. Control groups in traditional case-control studies often represent a ‘healthy’ cohort. However, they underestimate the impact of comorbid illnesses in both ME/CFS and non-ME/CFS populations. Therefore, by using a large heterogenous ME/CFS cohort and various homogenous negative and positive control groups based on common comorbidities of ME/CFS, we sought to (1) identify discriminatory and shared blood metabolomic biomarkers for ME/CFS and comorbid conditions, (2) distinguish ME/CFS and individuals with overlapping comorbid conditions using machine learning and (3) characterise the altered biological pathways that underlie the ME/CFS nuclear magnetic resonance (NMR) metabolomics profile.

Methods

Study population

Details of the UKB (https://www.ukbiobank.ac.uk/) study design have been previously described¹⁷. Briefly, the UKB recruited over 500,000 participants, aged 39–70 years old to attend one of the 22 assessment centres across the UK between 2006 and 2010 on a volunteer basis. All participants provided written informed consent. The UKB has ethics approval from the Northwest Multi-centre Research Ethics Committee as a Research Tissue Bank, allowing researchers to operate under a unified ethical framework upon successful access application. This study was approved under UKB Project #79568, covering data access and use for secondary research as presented.

Data collection

At the assessment centres, participants completed a baseline assessment, including a touchscreen questionnaire and face-to-face interview, body composition and functional measurements and the collection of non-fasting plasma, urine and saliva¹⁷. Participants were invited back on three separate occasions after the initial visit: a repeated assessment visit (2012–13), which collected similar data as the baseline visit, an imaging visit which initiated the UKB multi-modal imaging enhancement study on the brain, heart, bones and abdomen (2014–) and the first repeat imaging visit (2019–)¹⁸. Data are continuously being collected, returned and released in tranches to bona fide researchers and participants who request to be withdrawn are removed.

Cohort definitions

Disease labels were self-reported in the verbal interview, which occurred at both initial and repeat assessment visits. The interviewer asked the participant to list past or current serious illnesses or disabilities that had been informed to them by a doctor, and the response verified with a trained nurse (Data field: 20002). All analyses were based on data and samples collected at the baseline assessment. The study population included a heterogenous ME/CFS cohort, seven homogenous comorbid cohorts (hypertension, depression, asthma, IBS, hay fever, hypothyroidism and migraine) and a non-diseased or ‘healthy’ cohort (C2) (Supplementary Table 1). A heterogenous cohort was defined as the presentation of multiple, and different medical conditions and homogenous refers to the existence of one single condition.
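The cohort rules above can be sketched as a small lookup. This is only an illustration: the condition labels are hypothetical plain strings (the UK Biobank stores coded values in data field 20002), and it assumes that C2 corresponds to participants reporting no conditions at all, which the text implies but does not spell out.

```python
# Hypothetical condition labels; the UK Biobank stores coded values in data field 20002.
COMORBID_CONDITIONS = {
    "hypertension", "depression", "asthma", "IBS",
    "hay fever", "hypothyroidism", "migraine",
}


def assign_cohort(reported_conditions):
    """Map a participant's self-reported conditions to a study cohort.

    Returns "ME/CFS", "C2", one of the seven comorbid cohort names,
    or None when the participant fits none of these groups.
    """
    conditions = set(reported_conditions)
    if "ME/CFS" in conditions:
        return "ME/CFS"  # heterogeneous: any other conditions are allowed
    if not conditions:
        return "C2"  # assumption: non-diseased means no reported conditions
    if len(conditions) == 1 and conditions <= COMORBID_CONDITIONS:
        return next(iter(conditions))  # homogeneous: exactly one condition
    return None  # e.g. several non-ME/CFS conditions, or an unlisted one
```

Under these rules a participant reporting both asthma and migraine falls outside every cohort, which is what makes the comorbid groups "exclusive".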

Metabolic biomarker profiling

The metabolic biomarkers were quantitated using high-throughput NMR with a protocol previously detailed¹⁹. Quality control was measured by using blind duplicates and internal control samples. Absolute concentrations of 168 biomarkers were provided (107 non-derivable biomarkers and 61 composite biomarkers), along with 81 biomarker ratios (inclusive of percentages) (Supplementary Data 1). The non-derivable biomarkers include apolipoproteins, albumin, lipoprotein subclasses, glycoprotein, lipids, fatty acids and low molecular weight metabolites (LMWM) such as amino acids, ketone bodies and glycolysis-related metabolites.

Data pre-processing

All biomarker values and baseline characteristics were processed prior to data analyses. Firstly, technical variation was removed from the biomarker features²⁰. Outliers were then removed, defined as values outside median ± 4$\times$IQR for biomarkers²¹ and mean ± 5$\times$SD for baseline characteristics²². Ranked categorical variables such as ‘frequency of tiredness/lethargy in the last 2 weeks’ were encoded in an ordinal manner, whereas non-ranked variables such as ‘sex’ were encoded using the dummy (or one-hot encoding) method. Data field 6145 (illness, injury, bereavement, stress in last 2 years) allowed for multiple selections of various events and was binary encoded as Yes/No under a new feature called ‘Previous stressful event(s)’. Missing values were imputed using the median for continuous variables and 0 for categorical variables. Finally, biomarker data and continuous baseline characteristics were scaled to unit variance²¹. The dataset used to train the machine learning model was processed using the R package caret (v6.0.93)²³, following a similar workflow, except that column-wise operations were performed after partitioning the data into 80% train and 20% test sets. The test set was processed separately, using the parameters learned from the training data, to prevent data leakage.
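The leakage-safe ordering described above (learn parameters on the training split, then reuse them on the test split) can be illustrated for a single continuous biomarker. This is a minimal standard-library sketch, not the caret workflow the authors used. It assumes that outliers are treated as missing and then median-imputed, and that unit-variance scaling means dividing by the standard deviation without centring; neither detail is stated in the text.

```python
import statistics


def iqr_outlier_bounds(values, k=4.0):
    """Return (low, high) = median -/+ k * IQR for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    centre = statistics.median(values)
    spread = q3 - q1
    return centre - k * spread, centre + k * spread


def fit_preprocess(train_values):
    """Learn outlier bounds, imputation median and SD from training data only."""
    observed = [v for v in train_values if v is not None]
    low, high = iqr_outlier_bounds(observed)
    kept = [v for v in observed if low <= v <= high]
    return {
        "low": low,
        "high": high,
        "median": statistics.median(kept),
        "sd": statistics.stdev(kept),
    }


def apply_preprocess(values, params):
    """Replace missing/outlying values with the training median, then scale."""
    processed = []
    for v in values:
        if v is None or not (params["low"] <= v <= params["high"]):
            v = params["median"]  # assumption: outliers are imputed like missing values
        processed.append(v / params["sd"])  # assumption: scale without centring
    return processed
```

Calling `fit_preprocess` on the training values and passing the returned dictionary to `apply_preprocess` for both splits keeps test-set information out of the learned parameters.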

Statistical analyses

For descriptive statistics, continuous variables were summarised using their median, and categorical variables were shown as percentages. The Mann-Whitney U test and the Chi-square test of independence with Yates’ continuity correction were performed on unscaled continuous and categorical variables, respectively, to determine significant differences between two groups. Since unequal variance can inflate false positives when applying the Mann-Whitney U test, stringent post-hoc Bonferroni adjustments were applied²⁴. Raw p values are presented throughout the article, with the significance threshold indicated in the main text or legend.

Biomarker associations and multiple testing correction

Logistic regression was used to estimate the odds ratio for biomarker associations with ME/CFS and comorbid cohorts against the C2 cohort. Odds ratios were adjusted for sex, age, cholesterol-lowering medication and fish oil supplements. We applied a Bonferroni threshold of P < 0.05/249 when identifying significant associations in ME/CFS only (accounting for the total number of biomarkers) and a less stringent Bonferroni adjustment of P < 0.05/8 (accounting for the number of medical conditions tested) when detecting overlapping and unique associations in multiple conditions with different sample sizes.
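The two thresholds above follow directly from dividing the significance level by the number of tests. A short sketch of the arithmetic (the function names are illustrative, not from the authors' code):

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def is_significant(p_value, alpha, n_tests):
    """True when a raw p value passes the Bonferroni-adjusted threshold."""
    return p_value < bonferroni_threshold(alpha, n_tests)


biomarker_wise = bonferroni_threshold(0.05, 249)  # 249 biomarkers
trait_wise = bonferroni_threshold(0.05, 8)        # 8 medical conditions

print(f"{biomarker_wise:.2e}")  # 2.01e-04
print(f"{trait_wise:.2e}")      # 6.25e-03
```

These values match the thresholds quoted in the Results (P < 2.01$\times$10⁻⁴ and P < 6.25$\times$10⁻³).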

Variance decomposition of baseline characteristics on biomarkers

Linear regression was first performed to identify associations between the baseline characteristics and biomarkers, adjusting for sex, age, cholesterol-lowering medication and fish oil supplements. Variance decomposition was then performed, in the ME/CFS cohort only and in the entire study population, to determine the variance in biomarker measurements explained by each baseline characteristic, implemented in the R package variancePartition (v1.24.1)²⁵.

Sampling strategies

Class imbalance is a problem in machine learning as it can bias prediction to the majority class²⁶. Different oversampling and undersampling strategies that were applicable for both continuous and categorical data were employed to generate nine different training sets (Supplementary Table 2). The first training set considered the original class distribution: 962 participants in the minority class (ME/CFS) and 66,545 participants in the majority class (rest of study population), exhibiting a 1:70 ratio. Four additional training sets were constructed with random undersampling of the majority class at different ratios: 1:1, 1:2, 1:4 and 1:20. A training set combining bootstrapping and random undersampling was also constructed²⁷, with 20,000 participants in each class to match the number of participants in the majority class in the test set. Two algorithmic class imbalance strategies were also employed: cluster-based undersampling using k-prototypes²⁸ and Synthetic Minority Oversampling Technique for Nominal and Continuous (SMOTE-NC)²⁹, implemented in the R package RSBID v0.0.2.0. The final training set used a combination of SMOTE-NC and cluster-based undersampling. The test set remained at the original class distribution.
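Of the strategies listed above, random undersampling at a fixed ratio is the simplest to illustrate. The sketch below uses only the Python standard library and placeholder participant IDs; it is not the authors' implementation, and the seed and function name are assumptions made for the example.

```python
import random


def random_undersample(minority, majority, ratio, seed=0):
    """Keep every minority row and draw `ratio` majority rows per minority row.

    Sampling is without replacement; if the majority class is too small,
    all of it is kept.
    """
    rng = random.Random(seed)
    n_keep = min(len(majority), len(minority) * ratio)
    return list(minority), rng.sample(list(majority), n_keep)


# Placeholder IDs sized like the training set described above.
mecfs_ids = list(range(962))
other_ids = list(range(962, 962 + 66545))

for ratio in (1, 2, 4, 20):
    kept_minority, kept_majority = random_undersample(mecfs_ids, other_ids, ratio)
    print(ratio, len(kept_minority), len(kept_majority))
```

Only the training data would be resampled this way; as stated above, the test set stays at its original class distribution.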

Feature selection

Four feature sets were curated comprising of (1) all baseline characteristics and biomarkers (319 features); (2) all biomarkers (249 features); (3) baseline characteristics and biomarkers significantly associated with ME/CFS at trait-wise Bonferroni threshold (242 features) and (4) biomarkers significantly associated with ME/CFS at trait-wise Bonferroni threshold (197 features). Two feature selection methods were employed: least absolute shrinkage and selection operator (LASSO) feature selection³⁰ and forward feature selection³¹ to remove irrelevant or correlated variables³², implemented in R package glmnet (v4.1.4)³³ and using the scikit-learn library, respectively.
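Forward feature selection adds one feature at a time, keeping whichever addition improves a chosen score the most, and stops when no addition helps. The sketch below is a library-free illustration of that greedy loop, not the scikit-learn routine the authors used. The `score_fn` callback, the stopping tolerance and the toy feature gains are all hypothetical; in practice `score_fn` would be a cross-validated metric such as AUC or recall.

```python
def forward_select(features, score_fn, max_features=None, min_gain=1e-9):
    """Greedy forward selection.

    `score_fn(subset)` returns a number to maximise for a list of feature names.
    Returns the selected features (in order of selection) and the final score.
    """
    selected = []
    best_score = float("-inf")
    remaining = list(features)
    while remaining and (max_features is None or len(selected) < max_features):
        # Score every candidate subset formed by adding one more feature.
        candidates = [(score_fn(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(candidates, key=lambda pair: pair[0])
        if top_score - best_score < min_gain:
            break  # no remaining feature improves the score enough
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected, best_score


# Toy example with made-up additive gains (not results from the study).
toy_gain = {"tiredness": 0.20, "pain": 0.10, "age": 0.05, "noise": 0.0}


def toy_score(subset):
    return 0.5 + sum(toy_gain[f] for f in subset)


chosen, final_score = forward_select(list(toy_gain), toy_score)
print(chosen)  # ['tiredness', 'pain', 'age']
```

Because each step adds only the single best feature, this procedure tends to yield compact feature sets, consistent with the smaller models reported for forward selection in the Results.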

Generating ME/CFS score with machine learning

Initial models were trained using penalised logistic regression with LASSO feature selection³³ and 10-fold cross-validation on all feature and training sets. An additional model was trained using the original class distribution training set with class weights, which is another method to address class imbalance by assigning greater importance to the minority class. Subsequently, models meeting the performance criteria of recall >0.7 and AUC >0.8 were retrained using forward feature selection coupled with various machine learning algorithms, including adaptive boosting, random forest, extreme gradient boosting, explainable boosted machine and light gradient boosted machine (LightGBM), optimised for both AUC and recall³⁴. The predictive performance of all models was evaluated using an independent test set³⁵. Finally, logistic regression was employed to determine effect sizes for features from models that met the performance criteria, and ME/CFS scores were computed for the entire study population using a weighted sum denoted by Eq. (1). A detailed explanation of the machine learning algorithms can be found in the Supplementary Methods and a summary of the workflow is provided in Supplementary Fig. 1.

$$\mathrm{ME/CFS\ score} = \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_n X_n \tag{1}$$

where $X_n$ is the unit variance scaled nth feature and $\beta_n$ is the effect size of the nth feature.
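Eq. (1) is a plain weighted sum, which can be written in a few lines. In this sketch the effect sizes and feature values are invented for illustration; the real weights come from the logistic regression described above.

```python
def mecfs_score(betas, scaled_features):
    """Weighted sum of unit-variance-scaled features, as in Eq. (1)."""
    if len(betas) != len(scaled_features):
        raise ValueError("betas and scaled_features must have the same length")
    return sum(beta * x for beta, x in zip(betas, scaled_features))


# Hypothetical effect sizes and scaled feature values for one participant.
example_betas = [0.5, -0.2, 0.1]
example_features = [2.0, 1.0, -3.0]
print(round(mecfs_score(example_betas, example_features), 6))  # 0.5
```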

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Results

Study design and ME/CFS comorbidities

There were 274,353 participants with baseline NMR biomarker profiles in the UKB. Participants with self-reported ME/CFS comprised 0.4% of the original UKB cohort (2161 in 502,359 participants), which included 1194 ME/CFS participants with NMR metabolomics biomarker data³⁶. The biomarker dataset comprised 249 measurements, including lipoprotein subclasses, lipids, fatty acids, and LMWM (Supplementary Data 1).

We found that 83% of the ME/CFS cohort presented with multiple illnesses, reporting 272 different comorbid conditions (Supplementary Fig. 2 and Supplementary Data 2). Comorbid conditions reported at a frequency >5% in ME/CFS were compared to the C1 cohort, defined as a non-ME/CFS population inclusive of patients with disease (but excluding ME/CFS) and healthy participants (Table 1), to identify comorbid conditions that were more prevalent in ME/CFS. Depression, asthma, IBS, hypothyroidism, hay fever and migraine were significantly increased in ME/CFS (Bonferroni threshold: P < 4.54$\times$10⁻³) and were chosen as positive control groups. Unclassifiable, while increased, was excluded, due to the redundancy of comparing to an unknown cohort. Osteoarthritis was excluded as it is an age-related condition, and the UKB generally represents an older cohort, potentially producing confounding comparative associations. Furthermore, gastro-oesophageal reflux was excluded based on IBS already representing a comorbid condition in the gastrointestinal/abdominal classification. Although hypertension was not significantly different in ME/CFS and non-ME/CFS, it was included as a comorbid cohort because of the cardiovascular nature of this dataset. Ultimately, seven homogenous comorbid cohorts with participants who had only reported that single condition, and a non-diseased control group denoted as C2 were established as respective positive and negative control groups (Supplementary Table 1).

Table 1 Percentage of self-reported medical conditions in ME/CFS and C1 cohorts

Baseline characteristics

Demographics, clinical measurements, anthropometry, lifestyle, and symptoms are summarised in Table 2 for the study population, with a detailed breakdown provided in Supplementary Data 3. Standardised ME/CFS symptom and severity questionnaires including SF-36, and MFI-20 were unavailable for this study. Cognitive function, experience of pain, mental health, and digestive health online follow-ups were performed years after the donation of the biological samples. Therefore, only baseline characteristics collected at the initial assessment centre visit were incorporated into our analyses.

Table 2 Demographics and baseline characteristics of study population

The ME/CFS cohort comprised 74% females, consistent with a 3:1 female-to-male ratio previously reported³⁷. The preponderance of females was also observed in depression, IBS, hypothyroidism, and migraine cohorts (67%, 67%, 87% and 77%, respectively). Overall, the ME/CFS cohort exhibited significantly different physical measurements (Bonferroni threshold accounting for the nine groups tested against ME/CFS: P < 5.56$\times$10⁻³). ME/CFS had lower hand-grip strength, an indicator of muscle fatigue³⁸, compared to all cohorts except for hypothyroidism. Basal metabolic rate is the energy expenditure at rest³⁹ and was lower in ME/CFS except when compared to IBS (showed no significant difference) and hypothyroidism and migraine cohorts (which were both lower than ME/CFS). Pulse rate was elevated in ME/CFS.

The UKB touchscreen questionnaire data showed that the ME/CFS cohort had a significantly higher proportion of participants reporting tiredness/lethargy (89%) and overall pain (81%), except for migraine (84%), attributed to 75% of the cohort experiencing headaches (Supplementary Data 3). All reported pain types were higher in ME/CFS, except stomach/abdominal pain, which was higher in IBS (37%) than in ME/CFS (18%). Compared to the depression cohort, ME/CFS patients reported significantly lower frequencies of depressed mood (39% in ME/CFS and 54% in depression) and mood swings (56% in ME/CFS and 78% in depression). There was no significant difference in mood swings between ME/CFS and IBS. Additionally, the distribution of the International Physical Activity Questionnaire (IPAQ) activity groups was different between all the cohorts. Individuals scoring in the low IPAQ group performed less than 600 MET minutes per week; energy expenditure was calculated by activity intensity and duration⁴⁰.

ME/CFS metabolomic profile

Metabolomic-wide association studies were performed to determine individual biomarker effects for each condition. There were 168 biomarkers associated with ME/CFS at P < 2.01$\times$10⁻⁴, with Bonferroni threshold accounting for the number of total metabolites (Supplementary Data 4). The associations were spread across four biomarker types: 66 non-derived biomarkers with absolute concentrations, 37 composite biomarkers derived from the sum of two or more non-derived biomarkers, 42 relative biomarkers presented as percentages and 3 biologically relevant ratios⁴¹. Most of these associations were lipid compositions of triglyceride (TG), cholesteryl esters (CE), free cholesterol (FC), phospholipids (PL), total cholesterol (C), total lipids (L) in different lipoprotein subclasses (VLDL, IDL, LDL, HDL) and sizes (XS-XXL, average particle diameters are provided in Supplementary Data 1), and lipoprotein particle concentrations (P).

Figure 1 shows the associations of the non-derived lipoprotein measurements, lipids and LMWM in ME/CFS. The strongest biomarker association was total triglyceride to phosphoglyceride ratio (TG/PG), where a one standard deviation (SD) increase in the biomarker measurement was associated with 46% higher odds of having ME/CFS compared to the odds of not having ME/CFS (odds ratio (OR): 1.46, 95% confidence interval (CI): 1.38–1.56, P = 3.95$\times$10⁻³³). VLDL size had the largest effect size among non-derived biomarkers (OR: 1.41, 95% CI: 1.32–1.50, P = 1.26$\times$10⁻²⁴), followed by glycoprotein acetyls (GlycA), an NMR marker for systemic inflammation⁴² (OR: 1.39, 95% CI: 1.31–1.47, P = 2.56$\times$10⁻²⁸). Conversely, HDL-CE exhibited the strongest inverse association where a 1-SD increase in the biomarker measurement decreased the odds of ME/CFS by 35% (OR: 0.65, 95% CI: 0.61–0.70, P = 4.96$\times$10⁻³²).
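The percentage statements above are a direct transformation of the odds ratios. The sketch below shows that conversion, together with how an odds ratio and interval are commonly obtained from a logistic regression coefficient. The Wald-type interval is an assumption, since the text does not state how the confidence intervals were computed.

```python
import math


def odds_ratio_with_ci(beta, se, z=1.96):
    """Odds ratio and approximate 95% Wald interval from a log-odds coefficient.

    Assumption: a Wald interval; the article does not specify the CI method.
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def percent_change_in_odds(odds_ratio):
    """Percentage change in odds per 1-SD increase in the biomarker."""
    return (odds_ratio - 1.0) * 100.0


print(round(percent_change_in_odds(1.46)))  # 46  (TG/PG)
print(round(percent_change_in_odds(0.65)))  # -35 (HDL-CE)
```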

**Fig. 1: ME/CFS display strong individual biomarker associations with lipoproteins, surface lipids and inflammatory markers.**

ME/CFS lipoprotein associations included increased VLDL particle concentration and, consequently, all VLDL lipid components (FC, CE, PL and TG)⁴³; triglycerides in all lipoprotein subclasses except for L-HDL-TG; ApoB and the ApoB/ApoA1 ratio; and inverse associations with HDL particle concentrations and ApoA1. Decreased levels of sphingomyelins, phosphatidylcholines and total cholines, and higher levels of alanine, valine, glucose and total fatty acids, were also associated with ME/CFS.

The median disease duration, between reported ME/CFS onset and the blood sample donation day at the first assessment centre visit, was 11.6 years. To assess potential changes in biomarker concentrations over time, we performed another round of association tests on 181 ME/CFS participants with a disease duration of <2 years (Supplementary Fig. 4). Six biomarkers out of the 168 significant associations remained significant and exhibited greater effects (HDL-C, HDL-CE, M-HDL-C, M-HDL-CE, TG by PG and XL-HDL-FC %). The lack of significant associations was most likely attributable to the reduced sample size (15% of the full ME/CFS cohort). However, the OR estimates remained comparable with the full cohort, especially for lipoproteins, ApoA1 and alanine. We also quantified the minimum strength of association that an unmeasured confounder would need in order to explain away each biomarker association, represented as E-values and determined from a sensitivity analysis⁴⁴. In general, a larger E-value implies that an unknown confounder of considerable strength is required to weaken the biomarker association. For example, the observed odds ratio of 0.65 for HDL-CE could be deemed negligible in the presence of another confounder that was associated with HDL-CE or ME/CFS by an odds ratio of 2.43 (the E-value). While some E-values were not particularly large, they were all greater than their respective ORs. We present corresponding E-values in Supplementary Data 5 for objective evaluations of the biomarker associations identified from this retrospective study.
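For readers unfamiliar with E-values, the commonly used closed-form expression is E = RR + sqrt(RR × (RR − 1)), with protective estimates inverted first. The sketch below applies it while treating the odds ratio as an approximation to the risk ratio, which is an assumption that is usually considered reasonable for a rare outcome; the authors' exact procedure is described in their cited sensitivity analysis, not here.

```python
import math


def e_value(ratio):
    """E-value for a risk ratio (or an odds ratio used as its approximation).

    Estimates below 1 are inverted first so the formula is applied to a
    ratio of at least 1.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    if ratio < 1:
        ratio = 1.0 / ratio
    return ratio + math.sqrt(ratio * (ratio - 1.0))


print(round(e_value(0.65), 2))  # 2.45
```

With the rounded odds ratio of 0.65 this gives about 2.45, close to the reported 2.43; the small gap may reflect rounding of the published estimate or a different approximation.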

Different lipid profiles observed in female and male ME/CFS participants

Association testing was also performed with males and females separated (Supplementary Data 6 and 7, Supplementary Fig. 5). In the ME/CFS female population, 62 different biomarker associations were identified, 14 in ME/CFS males and 94 were observed in both genders at P < 2.01$\times$10⁻⁴ (all significant biomarkers across the three ME/CFS groups were associated in the same direction). There were seven biomarkers that were not initially associated in the whole ME/CFS cohort but found in females only (polyunsaturated fatty acids (PUFA), linoleic acid, M-VLDL-C, L-LDL-P, M-LDL-L, M-LDL-PL, and L-VLDL-TG %), and four additional biomarkers identified in males only (IDL-C, IDL-CE, S-HDL-P and S-LDL-C%).

ME/CFS biomarker associations are highly pleiotropic

There were 234 pleiotropic biomarkers (those associated with two or more conditions), contributing to a total of 942 associations at P < 6.25$\times$10⁻³ with trait-wise Bonferroni threshold to account for varying sample sizes (Fig. 2, Supplementary Fig. 6). Only XXL-VLDL-TG % was uniquely associated with ME/CFS (Supplementary Fig. 6), with the remaining 196 associations also present in other conditions. Hypertension associations exhibited 81% similarity with ME/CFS associations, depression (85%), asthma (73%), IBS (97%), hay fever (46%), hypothyroidism (88%) and migraine (89%).

**Fig. 2: Overlapping associations reveal distinct trends for different biomarker groups in ME/CFS and comorbid conditions.**

Twenty-nine additional significant ME/CFS associations were identified at the trait-wise threshold, including total branched chained amino acids (BCAA) and inverse associations with citrate, acetate, and acetone. We observed the same biomarker association directions in ME/CFS, hypothyroidism and migraine for HDL-P, HDL-PL, ApoA1, sphingomyelins, phosphoglyceride, phosphatidylcholines, and total cholines, while depression displayed opposite effects for the latter three biomarkers (Supplementary Data 8). Similarly, associations with Total-L, driven by LDL-L and LDL-PL, were unique to depression⁴⁵. Associations observed in hypothyroidism that were not present in ME/CFS included creatinine and albumin (inverse), and the migraine cohort showed stronger inverse associations in the ketone bodies panel and positive associations for glutamine and glycine. IBS did not have any discriminatory LMWM and mostly had overlapping associations for relative measurements of FC and CE in the VLDL subclass.

Addressing comorbidities within ME/CFS

Thoroughly investigating the impact of comorbid conditions in ME/CFS requires stratifying the cohort into groups of isolated condition combinations, which can substantially reduce the sample size and the statistical power. For example, there were 211 ME/CFS individuals with a combination of depression and other comorbid conditions, and 24 individuals with depression only. We recognise that the other 265 comorbid conditions not analysed in this study may influence the biomarker associations. Therefore, we created another cohort with 354 ME/CFS individuals with or without hypertension, depression, asthma, IBS, hay fever, hypothyroidism, or migraine and performed association tests (Supplementary Fig. 7) and sensitivity analysis for this subset (Supplementary Data 9). Thirty-one of the initial 168 ME/CFS biomarker associations remained significant (P < 2.01$\times$10⁻⁴). SFA% and omega-3 were the only significant associations that produced greater odds ratios in the subset than in the full cohort. The lower odds ratios observed may be attributed to the reduced number of comorbid conditions reported by each individual, rather than the specific condition. The average number of comorbid conditions was 3.0 for the full cohort and 0.6 for the subset. This suggests that the burden of having several comorbid conditions might exacerbate ME/CFS symptoms (inclusive of symptoms from common comorbid conditions), reflecting a higher disease severity and leading to more pronounced biomarker signals in the full cohort.

Clinical predictors attributable to biomarker variation

We investigated the relationship between the NMR metabolomic biomarkers and baseline characteristics to identify risk factors and routine clinical markers that may be potential modifiable targets for treatment or management²². The maximum amount of variation explained by 61 baseline characteristics (Supplementary Fig. 8 and Supplementary Data 10) on 249 biomarkers was identified (Supplementary Data 11), and the top six most explainable biomarkers in ME/CFS are shown in Fig. 3. The largest drivers of biomarker variation were mostly consistent with established biological mechanisms including inflammation (C-reactive protein explaining 20.5% of biomarker variance; neutrophil count, 7.1%) via GlycA, kidney function (urate, 22.3%; cystatin C, 21.3%) via plasma creatinine, testosterone (16.2%) via plasma creatinine and serum urea (15.3%) via valine. These traditional blood biochemistry measurements similarly explained the same set of biomarkers in the entire study population (Supplementary Figs. 9 and 10 and Supplementary Data 12). We highlight the different contributions of white blood cell (WBC) leucocyte count and insulin-like growth factor 1 (IGF-1) between the two sample groups as potential antecedent-biomarker pairs that are worth exploring in ME/CFS. In the study population, WBC leucocyte count and IGF-1 explained 7.8% and 1.8% of the variation in GlycA and phosphatidylcholine, respectively. In the ME/CFS cohort, these factors contributed to 9.3% of the variation observed in the PUFA/MUFA (monounsaturated fatty acids) ratio and 4.9% in PUFA % (Supplementary Data 12).

**Fig. 3: Variance decomposition of baseline characteristics for the top six most explainable NMR metabolomics biomarkers in ME/CFS cohort.**

In ME/CFS, GlycA variation was also explained by six additional baseline characteristics that were not directly linked to inflammation (Fig. 3): pulse rate, high IPAQ group, frequency of tiredness/lethargy, neck/shoulder pain, sleeplessness/insomnia and alkaline phosphatase. Additionally, lifestyle and environment, symptoms (health and medical history) and psychological factors did not contribute to large amounts of biomarker variation, with a 1.3% average explained variance.

Building an ME/CFS score with machine learning

The ability to comprehensively quantitate metabolites in a single run is one of the advantages of using NMR for metabolomics⁵, conveniently allowing biomarkers to be combined into a multi-variable disease score through machine learning^46,47. We implemented a two-stage model training and selection workflow (Supplementary Fig. 1). In the first stage, we found that penalised logistic regression with LASSO models considering both biomarker data and baseline characteristics outperformed models using biomarker features only (Supplementary Fig. 11). The different training and feature sets (“Methods”) from models that achieved the performance criteria of recall >0.7 and area under the receiver operator characteristic curve (AUC) >0.8 were retrained in the second stage using forward feature selection coupled with adaptive boosting, random forest, extreme gradient boosting, explainable boosted machine, and LightGBM (Supplementary Methods). The final twelve models meeting the performance criteria (Supplementary Data 13) selected from both stages each had an even class distribution, obtained either by using class weights, random undersampling or bootstrapping (“Methods”). Across 5- and 10-fold cross-validation, these models achieved performance up to an AUC of 0.89 and recall (i.e. sensitivity) of 0.77, comparable to performance on the independent blind test set, providing confidence in the generalisability and robustness of the final models. Subsequently, an ME/CFS score was derived using a weighted sum of the important features from each model, with weights determined by logistic regression (Supplementary Fig. 12).
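The two metrics used as selection criteria can be computed from first principles. The sketch below is a standard-library illustration with made-up labels and scores, not the authors' evaluation code: recall is the fraction of true cases that are flagged, and AUC is computed as the probability that a randomly chosen case receives a higher score than a randomly chosen non-case (ties count as half).

```python
def recall(y_true, y_pred):
    """Fraction of true positives among all actual positives (sensitivity)."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return true_pos / (true_pos + false_neg)


def roc_auc(y_true, scores):
    """AUC via pairwise comparison of positive and negative scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def meets_criteria(y_true, scores, threshold=0.5):
    """Check the recall > 0.7 and AUC > 0.8 criteria at a given cut-off."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    return recall(y_true, y_pred) > 0.7 and roc_auc(y_true, scores) > 0.8


# Made-up example: two cases, two non-cases.
labels = [1, 1, 0, 0]
model_scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(labels, model_scores))  # 0.75
```

The pairwise AUC is quadratic in sample size, so it suits small illustrations only; the 0.5 classification cut-off is likewise an assumption made for the example.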

Models that employed LASSO feature selection had substantially more features (54–253 features) compared to forward feature selection models (6–28 features) (Supplementary Data 13). An ideal predictive model should achieve a balance between strong performance metrics and a concise set of features³². In this regard, the LightGBM model⁴⁸ was chosen as the optimal model, selecting 19 baseline characteristics and nine NMR biomarkers (Supplementary Data 14), and achieving an AUC of 0.83 and a recall of 0.70 on the blind test set. Furthermore, the LightGBM score yielded an OR of 3.61 (CI: 3.45–3.78, P ≈ 0) (Fig. 5c), which is ~2.5 times more strongly associated with ME/CFS than the top individual biomarker, TG/PG. While other forward feature selection models had slightly better performance metrics (Supplementary Data 13), models with a combination of baseline characteristics and biomarker features were preferred over models with baseline characteristics only, so as to reduce the possibility of selecting too many subjective features. Additionally, scores that exhibited inverse, non-significant or weaker associations with comorbid groups were prioritised in the model selection process; the LightGBM score demonstrated this for hypertension, asthma and hay fever (Supplementary Fig. 12).
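The "~2.5 times" figure can be reproduced as a simple ratio of odds ratios. The sketch below assumes that the comparator is the full-cohort OR of 1.46 quoted for TG/PG in the severity analysis later in the Results; the text does not state explicitly how the ratio was computed, so this is one plausible reading rather than the authors' calculation.

```python
# Odds ratios quoted in the text
or_score = 3.61  # LightGBM ME/CFS score
or_tg_pg = 1.46  # TG/PG in the full cohort (assumed comparator)

ratio = or_score / or_tg_pg
print(f"score OR / TG/PG OR = {ratio:.2f}")  # about 2.47
```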

Feature importance depicts three interpretations

Figure 4 shows three distinct feature importance measures (split importance, mean SHapley Additive exPlanations (SHAP) value and effect size), each offering unique insights into the 28 selected features (Supplementary Data 14). We found biomarker features (and continuous variables in general) had higher split importance, with leucine ranking first, indicating that these features were frequently used during the splitting process in decision tree training. In contrast, the mean SHAP value⁴⁹ identified baseline characteristics, specifically frequency of tiredness/lethargy in the last 2 weeks, whole body pain, and age, as the top three features, followed by Total-P and S-LDL-TG. The direction of each feature's impact was further analysed with SHAP plots (Supplementary Fig. 13), using unscaled data retrained with the LightGBM algorithm (which showed an insignificant drop in performance in the test set, Supplementary Data 13) to facilitate the interpretation of the features. We noticed similar trends for the variables in both training and test sets (Supplementary Fig. 13). In general, we observed that lower levels of Total-P and M-VLDL-P, and elevated levels of S-LDL-P and S-LDL-TG, favoured an ME/CFS prediction. Immature reticulocyte fraction showed that both high and low concentrations were more likely to result in an ME/CFS prediction, while medium values led to non-ME/CFS predictions (Supplementary Fig. 13a). Acetone produced a conflicting plot, where increased concentrations led to both ME/CFS and non-ME/CFS predictions, contrasting with the inverse association observed as an individual biomarker.

**Fig. 4: Contributions of the 28 scaled features selected by LightGBM model.**

Effect sizes for four biomarkers changed direction as part of the score compared to their individual association: PUFA% changed to show a positive effect and S-LDL-P, L-VLDL-FC, and M-VLDL-P shifted to show a negative effect (Fig. 4). Adjusting for cholesterol-lowering medication and fish oil in the score showed a decrease in effect sizes without a change in direction (Supplementary Data 14). PUFA%, hip pain, and S-LDL-TG were no longer significant and Total-P gained significance at the Bonferroni threshold P < 1.78 × 10⁻³. Additionally, the effect sizes of acetone and acetoacetate in the score decreased compared to their individual effects, potentially to distinguish migraine due to the strong individual ketone body associations observed in that cohort.
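The Bonferroni threshold quoted above is consistent with dividing a family-wise error rate of 0.05 by the 28 features in the score. Both the value of alpha and the choice of 28 as the number of tests are assumptions inferred from the reported figure, not stated in this section; the helper name is hypothetical.

```python
ALPHA = 0.05     # assumed family-wise error rate
N_FEATURES = 28  # features in the LightGBM score

threshold = ALPHA / N_FEATURES
print(f"Bonferroni threshold = {threshold:.3e}")  # ~1.786e-03, reported truncated as 1.78 x 10^-3


def is_significant(p_value, threshold=threshold):
    """True if a feature's P value passes the Bonferroni-corrected threshold."""
    return p_value < threshold
```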

ME/CFS score distribution in other cohorts

The efficacy of the LightGBM score was evaluated by stratifying each participant into 100 bins, representing ME/CFS score percentiles (Fig. 5a). ME/CFS cases increased proportionally, reaching 40% in the 100th percentile. However, a substantial amount of C2 was also captured (43%), despite observing a decline in C2 numbers as the score increased. Hypertension made up 12% of the 100th percentile, exhibiting an inverse association with the score, while asthma and hay fever showed no association. Depression, IBS, hypothyroidism, and migraine had increasing observation rates but contributed less than 8.8% in a percentile, each constituting less than 5% at the 100th percentile.

**Fig. 5: Stratified LightGBM score shows increased detection of ME/CFS cases in higher percentiles.**

We also calculated the cumulative percentage of ME/CFS cases captured relative to the cohort size (Fig. 5b). The top five percentiles captured 56% of the ME/CFS cohort, the top 10 percentiles 67% and the top 25 percentiles 81%. The percentile distribution for ME/CFS overlapped with depression, IBS, hypothyroidism, and migraine (Fig. 5d, e).
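The percentile stratification and the cumulative capture calculation can be sketched as follows. This is an illustrative rank-based binning under the assumption that participants are split into 100 equal-sized bins by score; the function names are hypothetical, and ties are broken by input order, which may differ from the approach used in the study.

```python
def score_percentiles(scores):
    """Assign each participant a percentile bin (1-100) by rank of score."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * 100 // n + 1  # rank 0 -> bin 1, rank n-1 -> bin 100
    return bins


def captured_in_top(bins, labels, top_k):
    """Fraction of all cases (label 1) that fall in the top_k percentile bins."""
    case_bins = [b for b, y in zip(bins, labels) if y == 1]
    in_top = sum(1 for b in case_bins if b > 100 - top_k)
    return in_top / len(case_bins)


# Toy example: 200 participants, cases at both ends of the score range
toy_scores = list(range(200))
toy_labels = [1 if (s >= 190 or s < 10) else 0 for s in toy_scores]
toy_bins = score_percentiles(toy_scores)
print(captured_in_top(toy_bins, toy_labels, top_k=5))
```

Calling `captured_in_top` with `top_k` set to 5, 10 and 25 on the real score would give the quantities plotted in the cumulative curve.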

The distribution of the LASSO score trained using class weights (“Methods”) was also evaluated to confirm whether there were any advantages in training on the full dataset with more participants and features vs. a 1:1 class ratio and selected features (Supplementary Fig. 14). We observed a slightly higher proportion of ME/CFS cases captured in the top percentiles, and trends in detecting comorbid conditions changed marginally, but with no prominent differences in the ability to discriminate comorbidities.

False positive predictions were found in higher ME/CFS score percentiles

The LightGBM model had a false positive rate of 0.20 and a false negative rate of 0.30. Hence, we probed the participants who were incorrectly classified to determine whether they had a particular quality that resulted in their prediction. Most of the false positives were C2 individuals, and the majority occurred in the higher score percentiles (Supplementary Fig. 15b, d). From the comorbid cohort point of view, migraine and IBS individuals were more likely to be incorrectly classified at higher percentiles, but no distinct false positive patterns were observed, i.e. false positives occurring in a specific percentile range (Supplementary Fig. 15c). The 69 false negatives were spread across the 1st–97th percentiles, with a maximum of three incorrect predictions found in the 83rd percentile.
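For reference, the error rates above relate to confusion-matrix counts as sketched below. Only the 69 false negatives come from the text; the remaining counts are hypothetical values chosen so that the rates match the reported 0.20 and 0.30, and they should not be read as the study's actual confusion matrix.

```python
def error_rates(tp, fp, tn, fn):
    """Recall, false negative rate and false positive rate from confusion-matrix counts."""
    return {
        "recall": tp / (tp + fn),
        "false_negative_rate": fn / (tp + fn),
        "false_positive_rate": fp / (fp + tn),
    }


# fn=69 is reported; tp, fp and tn are illustrative assumptions
rates = error_rates(tp=161, fp=46, tn=184, fn=69)
print(rates)
```

Recall and the false negative rate always sum to one, which is why a false negative rate of 0.30 corresponds to the recall of 0.70 reported for the blind test set.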

Since the model and scores were trained on a heterogeneous ME/CFS cohort, we examined the contribution of all reported comorbid conditions in the ME/CFS cohort to the score. We found that ME/CFS individuals who presented with a greater number of comorbid conditions generally appeared in the higher percentiles (Supplementary Fig. 16a). The reported comorbid conditions are broken down in Supplementary Fig. 16b, noting that these conditions may co-exist.

ME/CFS score percentile suggestive of disease severity

We revisited the biomarker analysis with a stratified ME/CFS cohort based on score percentiles. ME/CFS participants were placed into three groups: high (96th–100th percentiles, n = 643), medium (81st–95th percentiles, n = 269), and low (1st–80th percentiles, n = 282) to assess the specificity of biomarker signals according to disease severity. While there are no formal objective strategies to classify disease severity, a high disease score would indicate extreme values for any of the 28 selected features, and hence represent an abnormal or more afflicted state. The high percentile group produced greater ORs compared to the full cohort for all associations except for 10 biomarkers (Supplementary Fig. 17, Supplementary Data 15). For example, ORs for TG/PG increased from 1.46 to 1.74 (P < 1.41 × 10⁻⁴¹) and GlycA increased from 1.39 to 1.64 (P < 9.91 × 10⁻³⁸). VLDL and HDL associations remained prominent in the medium percentile group, which exhibited a greater effect size for sphingomyelins than the high percentile group. All associations in the low percentile group were negligible, potentially representing a group of individuals that are able to function at almost full capacity and require a ‘stressor’ for molecular perturbations to be detectable⁵⁰.
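The three-group stratification can be written as a small lookup on the score percentile. The cut-offs are those given in the text; the function name and the error handling are illustrative assumptions.

```python
def severity_group(percentile):
    """Map an ME/CFS score percentile (1-100) to the high/medium/low group used in the text."""
    if not 1 <= percentile <= 100:
        raise ValueError("percentile must be between 1 and 100")
    if percentile >= 96:
        return "high"    # 96th-100th percentiles
    if percentile >= 81:
        return "medium"  # 81st-95th percentiles
    return "low"         # 1st-80th percentiles


print([severity_group(p) for p in (12, 80, 81, 95, 96, 100)])
```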

Discussion

The UKB offers a wealth of data containing both historical and accruing datasets that are procured non-specifically so as not to bias towards a particular disease. This study showcases the utility of the UKB for hypothesis generation, result validation, and exploratory purposes, applied to ME/CFS research.

This metabolomics analysis presents a lipoprotein profile for ME/CFS, highlighting significant associations of the disease with VLDL subclasses and size. These findings point to a triglyceride and cholesterol transport problem, potentially arising from dysregulation of enzymes such as lipoprotein lipase (LPL). Interestingly, our retrospective analysis connects with a recent study revealing a 2-fold overexpression of microRNA-29a in ME/CFS⁵⁰, which may inhibit LPL translation^51,52. The resulting inhibition of LPL activity would lead to decreased clearance of VLDL particles and reduced degradation of circulating triglycerides. Surface lipids including total cholines, phosphatidylcholines, sphingomyelins, and phosphoglycerides were significantly decreased in the UKB ME/CFS cohort. These results are consistent with prior research^10,13, suggesting potential membrane destabilisation, altered cell signalling and dysregulated immune cell function⁵³. Our study contributes further evidence with a TG/PG association showing increased core lipid content relative to surface area, which may reduce membrane fluidity.

We identified distinct energy metabolism profiles for females (n = 882) and males (n = 312). Firstly, a non-gender-specific elevation of glucose was observed, followed by significant positive associations of alanine and various fatty acids at P < 2.0 × 10⁻⁴, and inverse associations with ketone bodies at P < 6.25 × 10⁻³, found in females. Alanine plays a key role in the metabolism of nitrogen-containing compounds and may be elevated in plasma due to increased demand for amino acid catabolism for ATP production, which was also supported by elevated levels of total BCAAs at the trait-wise significance threshold. The inclination towards amino acid metabolism over more efficient mechanisms such as carbohydrate metabolism aligns with previous serum/plasma^11,13 and lymphoblast studies⁵⁴. Our results extend this finding by proposing that this shift in energy metabolism is more prominent in females. Mechanistic animal models reveal significant sex differences in energy metabolism; females prioritise lipid biosynthesis and preferentially utilise anaplerosis at the expense of amino acids, while males tend to slow down anabolic pathways, under conditions of fasting⁵⁵. These observations may speak to the female preponderance observed in ME/CFS and underscore the critical role of sex-based comparison in unravelling ME/CFS disease mechanisms³⁷.

Additionally, we introduce ketolysis as another alternative pathway in this metabolic preference. In contrast, biomarker associations unique to males did not uncover any discrete alternative energy pathways, potentially due to sample size differences. Instead, they showed inverse associations of cholesteryl esters and total cholesterol in IDL and S-HDL, suggesting a cholesterol transfer problem during HDL maturation. However, we also observed an inverse association of XL-HDL-P in females, suggesting that, overall, HDL maturation problems may not be gender-specific, but inflict varied perturbations along the pathway, influencing HDL size in a gender-dependent manner⁵⁶.

The fatty acid associations found in ME/CFS females have implications in inflammatory processes⁵⁷. Fatty acids, especially PUFAs, share signalling responsibilities with surface lipids as they can be incorporated into inflammatory cell membranes. Despite observing positive PUFA associations, PUFA % and PUFA/MUFA were inversely associated, modifying the fatty acid composition in the phospholipid membrane, and potentially influencing the function of inflammatory cells. Furthermore, fatty acids can interact with the neuroendocrine system⁵⁸ by altering steroid hormone secretion, which, conversely, can also exert control over fatty acid metabolism. Cholesterol handling is of particular importance to steroid hormones given that it serves as the exclusive precursor for all steroidogenesis. The steroid hormone cortisol is currently the most reliable biomarker in ME/CFS research, with lower levels in ME/CFS evidenced at the level of meta-analysis⁵⁹. This observation aligns with recent findings in Long Covid, a condition sharing similarities with ME/CFS, wherein reduced cortisol levels have also been identified as a major distinguishing feature⁶⁰, potentially contributing to the pathogenesis⁶¹.

We did not detect any aromatic amino acid anomalies despite the recent identification of potential diagnostic biomarkers using Raman spectroscopy on peripheral blood mononuclear cells⁶². This discrepancy can be attributed to differences in biofluid characterisation, and to the relative quantification of biomolecules from Raman spectroscopic peak bands, contrasting with the absolute quantification from NMR by Nightingale Health. Both studies, however, revealed evidence of altered fatty acid and amino acid utilisation, albeit from different biomarkers.

This study’s strength lies in leveraging comorbid cohorts as positive control groups, allowing us to investigate biological mechanisms not only between ME/CFS and healthy individuals but also across various medical conditions. Although we did not find any prominent LMWM unique to ME/CFS, overlapping biomarker associations and those with opposite effect sizes offered robust evidence for establishing similarities and differences in potential pathologies of ME/CFS and comorbid conditions. ME/CFS shared the majority of its associations with the depression cohort. Nevertheless, we identified a clear biochemical distinction, with increased levels of total cholines, phosphatidylcholines and phosphoglycerides observed in depression⁶³. Migraine and ME/CFS shared ketone body associations but exhibited opposite glucose associations, suggesting a similar biological mechanism that results in frequent headaches and migraines, while stemming from different underlying causes⁶⁴. Establishing biomarkers that can differentiate between ME/CFS and common comorbidities, and not only healthy individuals, is crucial in assisting clinicians to make a more informed ME/CFS diagnosis. The concordance of amino acid associations observed in hypertension with the literature⁶⁵ also provides confidence in the validity of the ME/CFS associations, as recruitment and experimental workflow were employed uniformly across all UKB participants. Furthermore, many ME/CFS biomarkers, such as GlycA and those related to VLDL and larger HDL particles, produced larger effects than in the other cohorts, which indicates a genuine perturbation in the metabolic pathways underpinning this condition.

The release of UKB NMR metabolomics data and the rising popularity of the Nightingale Health Platform have stimulated research endeavours aimed at using this 249-biomarker dataset in screening and risk prediction for type 2 diabetes⁴⁶, dementia⁶⁶, pneumonia⁴⁷, all-cause mortality⁶⁷ and other common diseases^21,22,68. We developed a model and subsequent score tailored to estimate the likelihood of a disease event, i.e. a disease detection model, with the goal of progressing it towards a diagnostic tool when deep phenotyping data from biobanks with clinically diagnosed ME/CFS such as UK ME/CFS Biobank⁶⁹, DecodeME⁷⁰ and All of Us⁷¹ become available for validation. LASSO regression, a commonly employed method for score generation, presents a challenge when working with this particular NMR biomarker dataset due to the multicollinearity of the lipoprotein subclass measurements, potentially leading to the determination of unstable coefficients⁷². To address this issue, previous studies have arbitrarily chosen a subset of features, such as using only clinically validated biomarkers for training⁴⁷. Here, we employed a forward feature selection method to cover the entire feature space comprising NMR biomarkers and baseline characteristics to produce a multi-variable score consisting of a concise set of 28 features. The necessity for incorporating additional feature types beyond metabolomics in predicting multisystemic conditions has been shown, with the integration of other molecular entities markedly enhancing model performance⁷³. We also demonstrated the importance of other molecular markers in the variance decomposition analysis, where NMR metabolomics biomarkers alone did not sufficiently explain lifestyle and medical history baseline characteristics. Therefore, as the UKB continues to generate data, the integration of multi-omics, such as genomics^74,75 and proteomics, infectious disease markers, neurobiomarkers, other biofluids, and continuous wearables data would be highly attractive.

Comparing the significant individual biomarker associations and the biomarker features selected for the predictive model suggests that biomarkers distinguishing ME/CFS from healthy individuals are more likely to reflect the underlying biological mechanism of ME/CFS. However, to differentiate individuals suffering from other illnesses, additional biomarkers that may not solely be related to ME/CFS biochemistry may be essential. Feature importance analysis showed that ME/CFS individuals could possess extreme concentrations from both ends of the concentration ranges (as exemplified by the immature reticulocyte fraction). Therefore, some biomarkers may not strictly adhere to a consistent pattern of being increased or decreased when compared to what is considered a normal concentration, further alluding to the heterogeneity inherent in ME/CFS and explaining the inconsistent results reported across previous metabolomic studies⁷⁶.

In this study, we evaluated our models using recall (or sensitivity). This metric prioritises prediction of the positive class (ME/CFS cases), as no such diagnostic tool for ME/CFS currently exists. Unfortunately, the models with good recall generally had poor precision resulting in many false positives (Supplementary Data 13). The majority of the false positives occurred in the higher ME/CFS score percentiles, where many ME/CFS individuals with multiple comorbidities were also located, emphasising the need for a strategy that can effectively distinguish individuals within this higher percentile group that exhibits similar symptom severities regardless of the condition. While the model was developed for exploratory purposes, it shows promise as a screening stage in a multi-diagnosis process for ME/CFS, and a refined score may be clinically utilised to assess a patient’s disease progression or treatment response, as one’s score ascends or descends in the percentiles.
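The trade-off between recall and precision described above depends strongly on how rare the positive class is. The sketch below applies the reported recall (0.70) and false positive rate (0.20) to two prevalence settings: a balanced 1:1 ratio, and a case fraction derived from the ME/CFS and non-diseased control cohort sizes given in the Abstract. It assumes that these rates carry over unchanged to a population with a different class ratio and it ignores the comorbid cohorts, so the numbers are illustrative rather than results from the study.

```python
def precision(recall, false_positive_rate, prevalence):
    """Expected precision given recall, false positive rate and the fraction of true cases."""
    true_pos = recall * prevalence
    false_pos = false_positive_rate * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


RECALL, FPR = 0.70, 0.20             # reported for the LightGBM model
N_CASES, N_CONTROLS = 1194, 53009    # cohort sizes from the Abstract

balanced = precision(RECALL, FPR, prevalence=0.5)
skewed = precision(RECALL, FPR, prevalence=N_CASES / (N_CASES + N_CONTROLS))

print(f"precision at 1:1 class ratio:    {balanced:.2f}")
print(f"precision at cohort-level ratio: {skewed:.2f}")
```

Under these assumptions precision drops from roughly 0.78 at a 1:1 ratio to below 0.10 when cases make up about 2% of participants, which is why a recall-oriented model is better suited to a screening stage than to a stand-alone diagnosis.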

Limitations of our study include the use of self-reported medical conditions, in which some of the cases may be misdiagnosed. At the time of analysis, two data fields explicitly reported ME/CFS: the self-reported medical conditions during a verbal interview at the assessment centre (data field: 20002, code: 1842) and the experience of pain online follow-up (data field: 120010). Since we used baseline NMR metabolomics data, ME/CFS cases reported at the initial assessment centre visit were taken as the ground truth. This ME/CFS cohort reflected an older population, with an average age of 55, and the youngest participant was aged 40. ME/CFS can occur at any time across the lifespan, with two major onset peaks in adolescence and the late 30s, which are not captured in this study³⁷. It should also be noted that an older cohort is more likely to present with multi-morbidities²². There were no comorbid presentations of POTS in the ME/CFS cohort, and only 2.5% reported FM. The underreporting of these diagnoses in the UKB is potentially due to the lack of a clear understanding of these conditions^77,78. There are also inherent biases in the UKB as volunteer-based recruitment attracts a population of generally healthier individuals, resulting in milder ME/CFS cases for this study. Additionally, the UKB comprises predominantly white British individuals; therefore, replication is encouraged in other ethnic backgrounds. Finally, most of the ME/CFS cohort was taking medication and supplements (Supplementary Data 16). We decided not to remove these individuals as it may indirectly result, again, in a ‘healthier’ cohort with milder symptoms. Instead, biomarker associations were adjusted for cholesterol-lowering medication and fish oil supplementation, which were among the most highly consumed in ME/CFS compared to the control cohorts and may have affected biomarker concentrations.

We have initiated a detailed investigation into potential ME/CFS biomarkers and their biological relevance, validated previous metabolomics biomarkers and characterised ME/CFS in the UKB for future studies to integrate other data types, such as imaging data¹⁸, genomics⁷⁹, proteomics⁸⁰ and accelerometry data⁸¹. Most importantly, we emphasise the importance of considering comorbid conditions when assessing the efficacy of potential diagnostic statistical models and we introduce methods for doing so in the context of heterogeneous pathological conditions such as ME/CFS.

Data availability

All data analysis results and source data have been made available in the Supplementary Data. The underlying data are open access through an application to the UK Biobank and any further materials and methods can be accessed upon request to the corresponding author.

References

Clayton, E. W. Beyond myalgic encephalomyelitis/chronic fatigue syndrome: an IOM report on redefining an illness. JAMA 313, 1101–1102 (2015).
Carruthers, B. M. et al. Myalgic encephalomyelitis: International Consensus Criteria. J. Intern. Med. 270, 327–338 (2011).
Carruthers, B. M. et al. Myalgic Encephalomyelitis/Chronic Fatigue Syndrome. J. Chronic Fatigue Syndr. 11, 7–115 (2003).
Fukuda, K. et al. The chronic fatigue syndrome: a comprehensive approach to its definition and study. International Chronic Fatigue Syndrome Study Group. Ann. Intern. Med. 121, 953–959 (1994).
Wishart, D. S. Emerging applications of metabolomics in drug discovery and precision medicine. Nat. Rev. Drug Discov. 15, 473–484 (2016).
Armstrong, C. W., McGregor, N. R., Butt, H. L. & Gooley, P. R. Metabolism in chronic fatigue syndrome. Adv. Clin. Chem. 66, 121–172 (2014).
Yamano, E. et al. Index markers of chronic fatigue syndrome with dysfunction of TCA and urea cycles. Sci. Rep. 6, 34990 (2016).
Germain, A., Ruppert, D., Levine, S. M. & Hanson, M. R. Prospective biomarkers from plasma metabolomics of Myalgic Encephalomyelitis/Chronic Fatigue Syndrome implicate redox imbalance in disease symptomatology. Metabolites 8, 90 (2018).
Nagy-Szakal, D. et al. Insights into Myalgic Encephalomyelitis/Chronic Fatigue Syndrome phenotypes through comprehensive metabolomics. Sci. Rep. 8, 10056 (2018).
Che, X. et al. Metabolomic evidence for peroxisomal dysfunction in Myalgic Encephalomyelitis/Chronic Fatigue Syndrome. Int. J. Mol. Sci. 23, 7906 (2022).
Armstrong, C. W. et al. NMR metabolic profiling of serum identifies amino acid disturbances in chronic fatigue syndrome. Clin. Chim. Acta 413, 1525–1531 (2012).
Fluge, O. et al. Metabolic profiling indicates impaired pyruvate dehydrogenase function in myalgic encephalopathy/chronic fatigue syndrome. JCI Insight 1, e89376 (2016).
Naviaux, R. K. et al. Metabolic features of chronic fatigue syndrome. Proc. Natl. Acad. Sci. USA 113, E5472–E5480 (2016).
Germain, A., Ruppert, D., Levine, S. M. & Hanson, M. R. Metabolic profiling of a myalgic encephalomyelitis/chronic fatigue syndrome discovery cohort reveals disturbances in fatty acid and lipid metabolism. Mol. Biosyst. 13, 371–379 (2017).
Hoel, F. et al. A map of metabolic phenotypes in patients with myalgic encephalomyelitis/chronic fatigue syndrome. JCI Insight 6, e149217 (2021).
Germain, A., Barupal, D. K., Levine, S. M. & Hanson, M. R. Comprehensive circulatory metabolomics in ME/CFS reveals disrupted metabolism of acyl lipids and steroids. Metabolites 10, 34 (2020).
Sudlow, C. et al. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 12, e1001779 (2015).
Littlejohns, T. J. et al. The UK Biobank imaging enhancement of 100,000 participants: rationale, data collection, management and future directions. Nat. Commun. 11, 2624 (2020).
Soininen, P., Kangas, A. J., Wurtz, P., Suna, T. & Ala-Korpela, M. Quantitative serum nuclear magnetic resonance metabolomics in cardiovascular epidemiology and genetics. Circ. Cardiovasc. Genet. 8, 192–206 (2015).
Ritchie, S. C. et al. Quality control and removal of technical variation of NMR metabolic biomarker data in ~120,000 UK Biobank participants. Sci. Data 10, 64 (2023).
Julkunen, H. et al. Atlas of plasma NMR biomarkers for health and disease in 118,461 individuals from the UK Biobank. Nat. Commun. 14, 604 (2023).
Pietzner, M. et al. Plasma metabolites to profile pathways in noncommunicable disease multimorbidity. Nat. Med. 27, 471–479 (2021).
Kuhn, M. Building predictive models in R using the Caret Package. J. Stat. Softw. 28, 1–26 (2008).
Lee, S. & Lee, D. K. What is the proper way to apply the multiple comparison test? Korean J. Anesthesiol. 71, 353–360 (2018).
Hoffman, G. E. & Schadt, E. E. variancePartition: interpreting drivers of variation in complex gene expression studies. BMC Bioinform. 17, 483 (2016).
He, H. & Garcia, E. A. Learning from Imbalanced Data. IEEE Trans. Knowl. Data Eng. 21, 1263–1284 (2009).
Lunardon, N., Menardi, G. & Torelli, N. ROSE: a package for binary imbalanced learning. R J. 6, 79 (2014).
Yen, S.-J. & Lee, Y.-S. Cluster-based sampling approaches to imbalanced data distributions. In Data Warehousing and Knowledge Discovery (eds Tjoa, A. M., & Trujillo, J.) (Springer Berlin Heidelberg, 2006).
Chawla, N. V., Bowyer, K. W., Hall, L. O. & Kegelmeyer, W. P. SMOTE: synthetic minority over-sampling technique. J. Artif. Int. Res. 16, 321–357 (2002).
Tibshirani, R. Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. Ser. B 58, 267–288 (1996).
Jović, A., Brkić, K., & Bogunović, N. A review of feature selection methods with applications. In Proc. 2015 38th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO) (IEEE, 2015).
Chen, R.-C., Dewi, C., Huang, S.-W. & Caraka, R. E. Selecting critical features for data classification based on machine learning methods. J. Big Data 7, 52 (2020).
Friedman, J. H., Hastie, T. & Tibshirani, R. Regularization paths for generalized linear models via Coordinate Descent. J. Stat. Softw. 33, 1–22 (2010).
Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn Res. 12, 2825–2830 (2011).
Raschka, S. Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning (arXiv, 2018).
Lim, E. J. et al. Systematic review and meta-analysis of the prevalence of chronic fatigue syndrome/myalgic encephalomyelitis (CFS/ME). J. Transl. Med. 18, 100 (2020).
Thomas, N., Gurvich, C., Huang, K., Gooley, P. R. & Armstrong, C. W. The underlying sex differences in neuroendocrine adaptations relevant to Myalgic Encephalomyelitis Chronic Fatigue Syndrome. Front. Neuroendocrinol. 66, 100995 (2022).
Jakel, B. et al. Hand grip strength and fatigability: correlation with clinical parameters and diagnostic suitability in ME/CFS. J. Transl. Med. 19, 159 (2021).
de Carvalho, C. & Caramujo, M. J. The various roles of fatty acids. Molecules 23, 2583 (2018).
The IPAQ Group. Guidelines for the data processing and analysis of the “International Physical Activity Questionnaire” (2005).
Wurtz, P. et al. Quantitative serum nuclear magnetic resonance metabolomics in large-scale epidemiology: a primer on -omic technologies. Am. J. Epidemiol. 186, 1084–1096 (2017).
Connelly, M. A., Otvos, J. D., Shalaurova, I., Playford, M. P. & Mehta, N. N. GlycA, a novel biomarker of systemic inflammation and cardiovascular disease risk. J. Transl. Med. 15, 219 (2017).
Ala-Korpela, M., Zhao, S., Järvelin, M.-R., Mäkinen, V.-P. & Ohukainen, P. Apt interpretation of comprehensive lipoprotein data in large-scale epidemiology: disclosure of fundamental structural and metabolic relationships. Int. J. Epidemiol. 51, 996–1011 (2021).
VanderWeele, T. J. & Ding, P. Sensitivity analysis in observational research: introducing the E-value. Ann. Intern. Med. 167, 268–274 (2017).
Parekh, A., Smeeth, D., Milner, Y. & Thure, S. The role of lipid biomarkers in major depression. Healthcare 5, 5 (2017).
Bragg, F. et al. Predictive value of circulating NMR metabolic biomarkers for type 2 diabetes risk in the UK Biobank study. BMC Med. 20, 159 (2022).
Julkunen, H., Cichonska, A., Slagboom, P. E., Wurtz, P. & Nightingale Health UK Biobank Initiative. Metabolic biomarker profiling for identification of susceptibility to severe pneumonia and COVID-19 in the general population. eLife 10, e63033 (2021).
Ke, G. et al. LightGBM: a highly efficient gradient boosting decision tree. In Proc. Neural Information Processing Systems (Curran Associates Inc., 2017).
Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. In Proc. 31st International Conference on Neural Information Processing Systems (Curran Associates Inc., 2017).
Nepotchatykh, E. et al. Profile of circulating microRNAs in myalgic encephalomyelitis and their relation to symptom severity, and disease pathophysiology. Sci. Rep. 10, 19620 (2020).
Charriere, S. & Moulin, P. Multiple miRNA regulation of lipoprotein lipase. In Handbook of Nutrition, Diet, and Epigenetics (eds Patel, V. B. & Preedy, V. R.) (Springer International Publishing, 2019).
Chen, T. et al. MicroRNA-29a regulates pro-inflammatory cytokine secretion and scavenger receptor expression by targeting LPL in oxLDL-stimulated dendritic cells. FEBS Lett. 585, 657–663 (2011).
Lee, M., Lee, S. Y. & Bae, Y.-S. Functional roles of sphingolipids in immunity and their implication in disease. Exp. Mol. Med. 55, 1110–1130 (2023).
Missailidis, D. et al. Dysregulated provision of oxidisable substrates to the mitochondria in ME/CFS lymphoblasts. Int. J. Mol. Sci. 22, 2046 (2021).
Della Torre, S. et al. Short-term fasting reveals amino acid metabolism as a major sex-discriminating factor in the liver. Cell Metab. 28, 256–267.e255 (2018).
Pascot, A. et al. HDL particle size: a marker of the gender difference in the metabolic risk profile. Atherosclerosis 160, 399–406 (2002).
Calder, P. C. Omega-3 fatty acids and inflammatory processes: from molecules to man. Biochem. Soc. Trans. 45, 1105–1115 (2017).
Bhathena, S. J. Relationship between fatty acids and the endocrine and neuroendocrine system. Nutr. Neurosci. 9, 1–10 (2006).
Nijhof, S. L. et al. The role of hypocortisolism in chronic fatigue syndrome. Psychoneuroendocrinology 42, 199–206 (2014).
Klein, J. et al. Distinguishing features of long COVID identified through immune profiling. Nature 623, 139–148 (2023).
Yavropoulou, M. P., Tsokos, G. C., Chrousos, G. P. & Sfikakis, P. P. Protracted stress-induced hypocortisolemia may account for the clinical and immune manifestations of Long COVID. Clin. Immunol. 245, 109133 (2022).
Xu, J. et al. Developing a blood cell-based diagnostic test for Myalgic Encephalomyelitis/Chronic Fatigue Syndrome using peripheral blood mononuclear cells. Adv. Sci. 10, e2302146 (2023).
Riley, C. A. & Renshaw, P. F. Brain choline in major depression: a review of the literature. Psychiatry Res. Neuroimaging 271, 142–153 (2018).
Gross, E. C., Klement, R. J., Schoenen, J., D’Agostino, D. P. & Fischer, D. Potential protective mechanisms of ketone bodies in migraine prevention. Nutrients 11, 811 (2019).
Poggiogalle, E. et al. Amino acids and hypertension in adults. Nutrients 11, 1459 (2019).
Zhang, X. et al. Plasma metabolomic profiles of dementia: a prospective study of 110,655 participants in the UK Biobank. BMC Med. 20, 252 (2022).
Deelen, J. et al. A metabolic profile of all-cause mortality risk identified in an observational study of 44,168 individuals. Nat. Commun. 10, 3346 (2019).
Buergel, T. et al. Metabolomic profiles predict individual multidisease outcomes. Nat. Med. 28, 2309–2320 (2022).
Lacerda, E. M. et al. The UK ME/CFS Biobank: a disease-specific biobank for advancing clinical research into Myalgic Encephalomyelitis/Chronic Fatigue Syndrome. Front. Neurol. 9, 1026 (2018).
Devereux-Cooke, A. et al. DecodeME: community recruitment for a large genetics study of myalgic encephalomyelitis / chronic fatigue syndrome. BMC Neurol. 22, 269 (2022).
Denny, J. C. et al. The “All of Us” Research Program. N. Engl. J. Med. 381, 668–676 (2019).
Ranganathan, P., Pramesh, C. S. & Aggarwal, R. Common pitfalls in statistical analysis: logistic regression. Perspect. Clin. Res. 8, 148–151 (2017).
Wang, K. et al. Sequential multi-omics analysis identifies clinical phenotypes and predictive biomarkers for long COVID. Cell Rep. Med. 4, 101254 (2023).
Dibble, J. J., McGrath, S. J. & Ponting, C. P. Genetic risk factors of ME/CFS: a critical review. Hum. Mol. Genet. 29, R117–R124 (2020).
Das, S., Taylor, K., Kozubek, J., Sardell, J. & Gardner, S. Genetic risk factors for ME/CFS identified using combinatorial analysis. J. Transl. Med. 20, 598 (2022).
Huth, T. K., Eaton-Fitch, N., Staines, D. & Marshall-Gradisnik, S. A systematic review of metabolomic dysregulation in Chronic Fatigue Syndrome/Myalgic Encephalomyelitis/Systemic Exertion Intolerance Disease (CFS/ME/SEID). J. Transl. Med. 18, 198 (2020).
Häuser, W. & Fitzcharles, M. A. Facts and myths pertaining to fibromyalgia. Dialogues Clin. Neurosci. 20, 53–62 (2018).
Roerink, M. E. et al. Postural orthostatic tachycardia is not a useful diagnostic marker for chronic fatigue syndrome. J. Intern. Med. 281, 179–188 (2017).
Halldorsson, B. V. et al. The sequences of 150,119 genomes in the UK Biobank. Nature 607, 732–740 (2022).
Sun, B. B. et al. Plasma proteomic associations with genetics and health in the UK Biobank. Nature 622, 329–338 (2023).
Khurshid, S. et al. Wearable accelerometer-derived physical activity and incident disease. NPJ Digit. Med. 5, 131 (2022).


Acknowledgements

This research was approved by the UK Biobank under the application number 79568 and was funded by the Open Medicine Foundation and the Mason Foundation. We thank the participants of the UK Biobank for their contribution to the resource, and researchers contributing to returned datasets. D.B.A. was supported by The National Health and Medical Research Council of Australia (GNT1174405), and The Victorian Government’s Operational Infrastructure Support Program.

Author information

Authors and Affiliations

Department of Biochemistry and Pharmacology, Bio21 Molecular Science and Biotechnology Institute, University of Melbourne, Parkville, VIC, Australia
Katherine Huang, Natalie Thomas, Paul R. Gooley & Christopher W. Armstrong
School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane City, QLD, Australia
Alex G. C. de Sá & David B. Ascher
Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
Alex G. C. de Sá & David B. Ascher
Baker Department of Cardiometabolic Health, University of Melbourne, Parkville, VIC, Australia
Alex G. C. de Sá & David B. Ascher
Integrative Bioinformatics, Inc., Mountain View, CA, USA
Robert D. Phair

Authors

Katherine Huang
Alex G. C. de Sá
Natalie Thomas
Robert D. Phair
Paul R. Gooley
David B. Ascher
Christopher W. Armstrong

Contributions

K.H. and C.W.A. designed the study concept; K.H. performed the data analysis and visualisations; A.G.C.d.S. and K.H. performed the machine learning analysis; D.B.A. guided the machine learning sections; N.T., R.D.P., K.H. and C.W.A. provided biological insight to the results; P.R.G. and C.W.A. guided the overall project and K.H. wrote the manuscript. All authors contributed to the review process and editing. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Christopher W. Armstrong.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Communications Medicine thanks Marcos Lacasa and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. [Peer review reports are available].

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Peer Review File
Supplementary Information
Description of Additional Supplementary Files
Supplementary Data 1–16
Reporting Summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article

Cite this article

Huang, K., G. C. de Sá, A., Thomas, N. et al. Discriminating Myalgic Encephalomyelitis/Chronic Fatigue Syndrome and comorbid conditions using metabolomics in UK Biobank. Commun Med 4, 248 (2024). https://doi.org/10.1038/s43856-024-00669-7


Received: 27 December 2023
Accepted: 06 November 2024
Published: 26 November 2024
DOI: https://doi.org/10.1038/s43856-024-00669-7

