Introduction

Frailty affects approximately 24% of the global population aged 50 years and older1,2. It is a state of decreased physiological reserve stemming from dysregulated multisystem processes. Core mechanisms include chronic inflammation (inflammaging), characterized by elevated cytokines like IL-6, which drive muscle catabolism and endocrine dysfunction. This is compounded by mitochondrial dysfunction, cellular senescence, and disrupted anabolic signaling (e.g., GH/IGF-1 axis), culminating in sarcopenia and impaired stress resistance3,4. Frailty is a risk factor for chronic diseases, such as cardiovascular diseases (CVDs), cerebrovascular diseases, chronic obstructive pulmonary disease (COPD), kidney diseases, liver cancer, and colorectal cancer5,6. Therefore, many clinical guidelines advocate for routine monitoring of frailty7,8. There are two classic models for defining frailty: the frailty phenotype and frailty index (FI). The frailty phenotype is characterized by unintentional weight loss, reduced grip strength, slow walking speed, poor endurance, low energy consumption, and reduced physical activity. In contrast, the FI reflects cumulative health deficits associated with age. Studies comparing these models suggest that the FI more accurately predicts the risk of adverse events, such as falls and hospitalization, than the frailty phenotype9,10. However, the measures used to calculate the FI are not easily obtained. Therefore, more accurate and convenient methods are needed to measure frailty and identify individuals at a greater risk of frailty-related diseases.

Frailty is linked to disturbances in amino acid and fatty acid metabolism11. Although studies have shown that metabolic disorders in patients with frailty impact the prognosis of chronic diseases, such as type 2 diabetes, CVDs, and metabolic-associated steatotic liver disease (MASLD)12,13,14, there is no consensus on the management and application of blood metabolites in this population. Consequently, it is difficult for clinicians to optimize patient management by controlling metabolites at an early stage. Diet plays a crucial role in metabolism, and high-fat, high-sugar diets significantly increase the risk of metabolic-related diseases, including diabetes, MASLD, and CVDs15,16,17,18. Diet quality also affects the progression of frailty19,20,21. A healthy diet can improve the prognosis of various chronic diseases, such as upper gastrointestinal cancer, kidney disease, COPD, and colorectal cancer22,23,24,25. However, it remains unclear whether a healthy diet can modulate metabolic status and improve the prognosis of frail populations at higher risk of chronic diseases.

This study aimed to explore novel frailty subtypes based on metabolite profiles, identify frail groups at higher risk of chronic diseases, and investigate whether a healthy diet can improve the prognosis of these groups.

Results

Baseline characteristics of the study population

From the UK Biobank cohort of 502,401 individuals, 160,407 individuals were included in the analysis after applying exclusion criteria. Of the included participants, 28,196 individuals were classified as frail.

The demographic and clinical characteristics of the participants are summarized in Table 1. Frail participants were older, more likely to be women, and less often White. They also had higher rates of smoking, lower alcohol consumption, and higher average systolic blood pressure (SBP), average diastolic blood pressure (DBP), body mass index (BMI), alanine transaminase (ALT), aspartate transaminase (AST), alkaline phosphatase (ALP), triglyceride, and HbA1c levels.

Table 1 Baseline characteristics of the study population

Identification of key frailty features and novel subtypes

To identify metabolic features associated with frailty, standard Z-score normalization was applied to the 251 metabolic biomarkers, which were then analyzed using the CatBoost algorithm. The SHAP explainer was applied to the best-trained model, and the resulting SHAP values were used to rank the top 20 most influential features (Fig. 1a, Supplementary Fig. 2). GlycA (glycoprotein acetylation) was the most important metabolite, followed by LA/FA (percentage of linoleic acid to total fatty acids). The 11 metabolites with the highest SHAP values were selected for further analysis using the XGBoost model.

Fig. 1: Key frailty features and novel subtypes identified using machine learning.

a SHAP values indicate the impact of each feature in the CatBoost model. The horizontal axis shows SHAP values, reflecting feature influence; the vertical axis ranks features by importance. b ROC curves of the XGBoost model in the training and test sets, demonstrating predictive performance based on the top 11 key features. The AUC value reflects prediction accuracy. c PCA visualization of frail participants, with clustering results from K-means analysis plotted on a two-dimensional plane. PC1 and PC2 represent the first and second principal components, respectively. ROC receiver operating characteristic, AUC area under the curve, PCA principal component analysis, PC principal component.

Following hyperparameter tuning and five-fold cross-validation, the XGBoost model achieved an AUC of 0.785 in the training set and 0.757 in the test set, indicating acceptable discriminative performance (Fig. 1b). PCA was performed on the 11 selected metabolic features, with PC1 and PC2 accounting for 44.54% and 16.63% of the variance, respectively.

Using PC1 and PC2, K-means clustering analysis was performed. The optimal number of clusters was determined to be four based on elbow and silhouette coefficient plots (Fig. 1c, Supplementary Fig. 1). The resulting NMI of 0.864 and ARI of 0.881 demonstrate that the clustering is highly stable. Frail participants were subsequently stratified into four novel subtypes for further analysis.

Baseline characteristics and metabolic profiles of novel frailty subtypes

Table 2 summarizes the demographic, anthropometric, and clinical characteristics of the four identified frailty subtypes. Participants in subtypes III and IV were older, predominantly men, and more likely to be White. These subtypes were also associated with lower SBP and higher BMI levels. Distinct patterns among the subtypes emerged based on standardized cluster variables (Fig. 2, Supplementary Fig. 5, Supplementary Table 11).

  1. Subtype I: Comprising 8409 participants, this group exhibited extremely low levels of GlycA and Val (valine), the highest ratios of LA/FA, and elevated proportions of docosahexaenoic acid to total fatty acids (DHA/FA). They also had higher ratios of polyunsaturated to monounsaturated fatty acids (PUFA/MUFA). Subtype I was named low GlycA and Val-related frailty (LGVF).
  2. Subtype II: This group, with 6971 participants, was characterized by the highest concentrations of Alb (albumin) and LA, and was termed high Alb and LA-related frailty (HALF).
  3. Subtype III: Including 7477 participants, this subtype demonstrated the lowest levels of Alb and LA and was named low Alb and LA-related frailty (LALF).
  4. Subtype IV: With 5339 participants, this group displayed high levels of GlycA, elevated Val concentrations, higher proportions of monounsaturated fatty acids to total fatty acids (MUFA/FA), and notably low LA/FA ratios. It was named high GlycA and Val-related frailty (HGVF).

Fig. 2: Radar chart illustrating the characteristics of the novel frailty subtypes.

Cluster means for each variable were standardized by calculating cohort means and standard deviations and converting the cluster means to z-scores. This standardization eliminated numerical differences between variables, facilitating comparisons. Radar charts display z-scores for each subtype, highlighting distinctive patterns across variables. a Subtype I, low GlycA and Val related frailty (LGVF); b Subtype II, high Alb and LA related frailty (HALF); c Subtype III, low Alb and LA related frailty (LALF); d Subtype IV, high GlycA and Val related frailty (HGVF); e Non-frail.

Table 2 Baseline characteristics of the novel frailty subtypes

The metabolic profile of the non-frail participants closely resembled that of subtype I.

Associations between novel frailty subtypes and chronic diseases

With a median follow-up time of 13.8 years, Kaplan–Meier curves illustrated differences in the cumulative incidence rates of 13 chronic diseases and all-cause mortality among the four novel frailty subtypes and non-frail participants (Supplementary Fig. 3). The highest cumulative incidence rates were observed for coronary artery disease, type 2 diabetes, COPD, and all-cause mortality. Subtypes I and II displayed cumulative incidence trajectories similar to each other, as did subtypes III and IV. Non-frail participants exhibited the lowest cumulative incidence rates across all outcomes; in contrast, subtypes III and IV had the highest cumulative risks of chronic diseases.

A multivariate Cox proportional hazards model was employed to examine the association between novel frailty subtypes and the outcomes. After adjusting for age, sex, ethnicity, current smoking and drinking, SBP, and DBP, most of the results remained significant (Fig. 3, Supplementary Tables 5, 6). Across all outcomes, the non-frail group consistently had the lowest risk.

Fig. 3: Comparisons of disease risks by novel frailty subtypes using multivariate Cox proportional hazards models, adjusted for age, sex, ethnicity, current smoking, current drinking, SBP, and DBP.

Significance level: P < 0.05.

Compared with participants in subtype I, individuals in subtypes III and IV had elevated risks for most outcomes (the 95% CIs for kidney cancer and AAA in both subtypes, and for SLD in subtype III, included 1):

  1. Coronary artery disease (subtype III: HR [95% CI] 1.13 [1.04, 1.24]; subtype IV: HR [95% CI] 1.15 [1.04, 1.26])
  2. Heart failure (subtype III: HR [95% CI] 1.22 [1.09, 1.36]; subtype IV: HR [95% CI] 1.18 [1.05, 1.33])
  3. MACE (subtype III: HR [95% CI] 1.19 [1.04, 1.35]; subtype IV: HR [95% CI] 1.29 [1.12, 1.48])
  4. MI (subtype III: HR [95% CI] 1.35 [1.21, 1.51]; subtype IV: HR [95% CI] 1.33 [1.18, 1.50])
  5. Type 2 diabetes (subtype III: HR [95% CI] 2.24 [2.01, 2.49]; subtype IV: HR [95% CI] 3.00 [2.69, 3.34])
  6. MASLD (subtype III: HR [95% CI] 1.42 [1.20, 1.68]; subtype IV: HR [95% CI] 1.85 [1.56, 2.20])
  7. COPD (subtype III: HR [95% CI] 1.11 [1.02, 1.21]; subtype IV: HR [95% CI] 1.15 [1.05, 1.26])
  8. SLD (subtype III: HR [95% CI] 1.23 [0.94, 1.60]; subtype IV: HR [95% CI] 1.45 [1.10, 1.91])
  9. PAD (subtype III: HR [95% CI] 1.23 [1.09, 1.39]; subtype IV: HR [95% CI] 1.21 [1.06, 1.38])
  10. ESRD (subtype III: HR [95% CI] 1.83 [1.31, 2.56]; subtype IV: HR [95% CI] 2.49 [1.77, 3.49])
  11. Kidney cancer (subtype III: HR [95% CI] 1.05 [0.69, 1.60]; subtype IV: HR [95% CI] 1.16 [0.74, 1.81])
  12. Lung cancer (subtype III: HR [95% CI] 1.25 [1.03, 1.53]; subtype IV: HR [95% CI] 1.38 [1.11, 1.72])
  13. AAA (subtype III: HR [95% CI] 1.16 [0.89, 1.50]; subtype IV: HR [95% CI] 1.29 [0.97, 1.70])
  14. All-cause mortality (subtype III: HR [95% CI] 1.19 [1.10, 1.29]; subtype IV: HR [95% CI] 1.25 [1.14, 1.36])

Associations between healthy diet and chronic diseases in the high-risk frailty group

Based on the previous analysis, subtypes I and II had a lower risk of chronic diseases than subtypes III and IV. Therefore, we categorized subtypes I and II as the low-risk frailty group and subtypes III and IV as the high-risk frailty group. Statistical analysis revealed significant differences in prognosis between these two groups, with the high-risk frailty group exhibiting poorer outcomes (Fig. 4, Supplementary Tables 7, 8).

Fig. 4: Cumulative incidence rates of outcomes stratified by low-risk frailty group and high-risk frailty group.

Kaplan–Meier curves were used to estimate the cumulative incidence of events over time. The log-rank test was used to evaluate statistical differences between curves. a Coronary artery disease. b Heart failure. c MACE. d MI. e Type 2 diabetes. f MASLD. g COPD. h SLD. i PAD. j ESRD. k Kidney cancer. l Lung cancer. m AAA. n All-cause mortality. Significance level: P < 0.05. MACE major adverse cardiovascular events, MI myocardial infarction, MASLD metabolic dysfunction-associated steatotic liver disease, COPD chronic obstructive pulmonary disease, SLD severe liver disease, PAD peripheral artery disease, ESRD end-stage renal disease, AAA abdominal aortic aneurysm.

We then explored the association between a healthy diet and prognosis in the high-risk frailty group to determine whether a healthy diet could improve their outcomes. In the high-risk frailty group, after adjusting for confounding factors, participants with a healthy diet had lower risks of several outcomes than those with an unhealthy diet (Fig. 5, Supplementary Table 9). HRs and 95% CIs are as follows:

  1. Coronary artery disease (HR [95% CI]: 0.83 [0.73, 0.94])
  2. Heart failure (HR [95% CI]: 0.86 [0.74, 0.99])
  3. MACE (HR [95% CI]: 0.98 [0.83, 1.15])
  4. MI (HR [95% CI]: 0.98 [0.86, 1.12])
  5. MASLD (HR [95% CI]: 0.93 [0.77, 1.13])
  6. COPD (HR [95% CI]: 0.81 [0.72, 0.91])
  7. SLD (HR [95% CI]: 0.82 [0.58, 1.14])
  8. PAD (HR [95% CI]: 0.83 [0.71, 0.98])
  9. ESRD (HR [95% CI]: 0.61 [0.41, 0.91])
  10. Kidney cancer (HR [95% CI]: 0.96 [0.56, 1.65])
  11. Lung cancer (HR [95% CI]: 0.71 [0.53, 0.96])
  12. AAA (HR [95% CI]: 0.84 [0.60, 1.18])
  13. All-cause mortality (HR [95% CI]: 0.87 [0.78, 0.97])

Fig. 5: The impact of a healthy diet on the risk of chronic diseases and all-cause mortality in the high-risk frailty group.

HR values were obtained from multivariate Cox proportional hazards regression, adjusted for age, sex, ethnicity, current smoking, current drinking, SBP, and DBP. Significance level: P < 0.05.

Healthy diet was associated with an increased risk of type 2 diabetes in the high-risk frailty group, but this association was not significant (P > 0.05).

Subgroup analysis

Furthermore, subgroup analyses were performed based on age and sex, adjusting for factors such as age, sex, ethnicity, current smoking status, current alcohol consumption status, SBP, and DBP. The associations between subtypes III and IV and the risks of coronary artery disease, heart failure, MACE, MI, type 2 diabetes, MASLD, COPD, SLD, and all-cause mortality were more pronounced in women under 60 years of age than in other populations. In contrast, the associations between subtypes III and IV and the risk of PAD were more pronounced in individuals aged 60 and older. The association between subtypes III and IV and the risk of lung cancer was also more pronounced in individuals under 60 years of age. No interactions were observed for kidney cancer or AAA (Supplementary Table 10).

Discussion

This study identified 11 metabolic features associated with frailty from approximately 160,000 individuals in the UK Biobank using 251 NMR biomarkers and classified four novel frailty subtypes through cluster analysis. Each subtype displayed distinct metabolic and clinical characteristics. Subtypes I and II had a lower risk of chronic diseases than subtypes III and IV. Consequently, subtypes I and II were grouped into the low-risk frailty group, while subtypes III and IV were grouped into the high-risk frailty group. Furthermore, adherence to a healthy diet was associated with a lower risk of several chronic diseases in the high-risk frailty group, offering insights to inform personalized clinical decision-making.

Previous studies have applied various methods to categorize frail populations. For instance, research involving approximately 6000 participants identified clustering patterns of multidimensional health issues in older adults, resulting in four subtypes of geriatric frailty26. Another study, conducted in an Asian cohort, explored the relationship between frailty and intrinsic capacity, identifying subgroups with distinct outcomes over a year27. Linzy et al. used a data-driven approach to identify three frailty subtypes—NCF, MTF, and RTF—each exhibiting varying degrees and rates of neurocognitive decline, with MTF showing the steepest trajectory28. Similarly, Okoye et al. identified four clusters of patients with heart failure based on frailty, comorbidities, and B-type natriuretic peptide levels29. In contrast to these studies, our research is the first to conduct a cluster analysis on a large UK-based cohort spanning all ages. We identified subtypes with unique metabolic profiles and examined their associations with 13 chronic diseases and all-cause mortality, contributing to the advancement of personalized medicine.

Among the 11 differential metabolites, GlycA emerged as the most significant. The association between GlycA and frailty can be mechanistically explained through its link to inflammatory pathways, particularly the interleukin-6 (IL-6) signaling axis, which is a known driver of muscle atrophy—a hallmark of frailty. GlycA, a composite biomarker of acute-phase glycoproteins, is associated with various inflammatory markers, including IL-630. The IL-6 pathway is critical in regulating muscle metabolism, where chronic elevation can drive muscle wasting. In experimental models, such as in mice with colon cancer, elevated IL-6 levels have been observed, and IL-6 inhibition was shown to prevent cancer-induced muscle mass loss31. IL-6 exerts its effects by binding to the glycoprotein 130 (GP130) receptor, activating Janus kinases (JAKs) and the signal transducer and activator of transcription 3 (STAT3) pathway. This signaling cascade has been linked to muscle atrophy in numerous contexts. Prolonged IL-6 elevation in cultured myotubes and skeletal muscle cells can lead to increased expression of mitochondrial fission proteins (DRP-1 and FIS-1), which are implicated in muscle atrophy and cellular stress responses31. Thus, elevated GlycA levels in our high-risk subtypes (particularly HGVF) may reflect activation of this IL-6-mediated proteolytic pathway, contributing to the frailty phenotype and its associated adverse outcomes.

Linoleic acid (LA), a polyunsaturated fatty acid (PUFA), showed high SHAP values. Mendelian randomization studies suggest that elevated PUFA levels may prevent frailty32, possibly owing to LA’s antioxidant and anti-inflammatory properties33,34. LA’s influence on metabolic syndrome and related diseases may partly explain the improved prognosis observed in frail individuals with high LA concentrations. Subtype II, characterized by high Alb levels, aligns with research highlighting Alb’s mediating role in frailty and in-hospital mortality among patients with COPD, possibly via inflammatory mechanisms35,36. Conversely, subtype III, with low Alb levels, exhibited a poorer prognosis. This finding reinforces the association between frailty and hypoalbuminaemia, as demonstrated in older adults as well as surgical and hospitalized patients36. Subtype IV was distinguished by elevated Val levels, which may negatively affect frailty by inhibiting muscle synthesis, a hallmark of frailty linked to poor outcomes37,38,39. Restricting Val intake has shown the potential to improve frailty conditions in preclinical models38.

It is worth noting that the metabolic profiles defining our frailty subtypes are not formed in isolation but are profoundly influenced by a constellation of behavioral, social, and clinical factors. Socioeconomic status (SES) is a fundamental determinant of health, shaping dietary patterns, access to nutrient-rich foods, and exposure to chronic stress, all of which can directly modulate systemic inflammation (e.g., GlycA levels) and fatty acid metabolism40,41. Consequently, the adverse metabolic signatures observed in our high-risk subtypes (III and IV) may be partially driven by socioeconomic disparities. Furthermore, medication use represents a critical, often necessary, confounder in metabolic studies. For instance, statins drastically alter cholesterol and lipoprotein metabolism42, while metformin and anti-inflammatory drugs can influence insulin sensitivity and inflammatory pathways43. The distinct metabolite levels we observed (e.g., in LA/FA, GlycA) could therefore reflect both the underlying pathophysiology of frailty and the metabolic effects of treatments for its associated comorbidities. Lastly, physical activity is a powerful modulator of the metabolome, influencing energy substrate utilization, insulin sensitivity, and inflammation44. Sedentary behavior, often more prevalent in frail individuals, can lead to ectopic fat accumulation and reduced mitochondrial function, thereby contributing to the pro-inflammatory characteristic of our high-risk subtypes. In summary, while we identified distinct metabolite-driven frailty subtypes, their manifestation is likely orchestrated by a complex interplay between biology, behavior (diet and exercise), social determinants, and clinical management.

Given the limitations of the frailty phenotype and FI45, integrating novel frailty subtypes based on metabolomics and clinical data into routine practice could enhance dynamic monitoring and stratified management of frailty. For the high-risk frailty group, promoting adherence to healthy dietary programs could actively reduce the risk of chronic conditions, such as coronary artery disease, heart failure, MI, MACE, MASLD, SLD, ESRD, COPD, PAD, AAA, lung cancer, kidney cancer, and all-cause mortality. This comprehensive strategy may alleviate the burden of frailty and improve clinical outcomes. In detail, first, for individuals identified as high-risk frailty (Subtypes III and IV), clinicians should prioritize them for intervention. The metabolic characteristics of these patients, such as systemic inflammation (high GlycA), amino acid metabolism dysregulation (high Val), and fatty acid composition imbalance (low LA/FA, high MUFA/FA), provide clear therapeutic targets for intervention. Second, personalized management plans should be developed for this high-risk group. Our research suggests that active lifestyle interventions, particularly nutritional therapies, are most beneficial for them. Based on the results of this study, we propose the following specific dietary recommendations: For Subtype III (LALF: low albumin, low linoleic acid): Encourage the intake of high-quality protein (to boost serum albumin levels) and foods rich in Omega-6 PUFA (such as linoleic acid) (e.g., soybean oil, sunflower oil, nuts) to improve the fatty acid profile. For Subtype IV (HGVF: high GlycA, high valine): The intervention should focus on anti-inflammatory diets (e.g., consuming more Omega-3 rich fish, reducing processed foods and saturated fats) and consider limiting foods high in branched-chain amino acids (BCAAs) (e.g., certain red meats and dairy products) to reduce inflammation and regulate amino acid metabolism.

The limitations of this study should be considered when interpreting the results. First, the UK Biobank cohort predominantly consists of individuals of European descent, with approximately 95% of participants identified as White, as detailed in Table 1. This European bias limits the generalizability of the findings to other racial or ethnic groups. Therefore, caution is warranted when applying these results to more diverse populations, and further validation in Asian and African cohorts is essential to assess the robustness and applicability of these findings across different genetic and environmental contexts. Second, although the NMR metabolomics platform by Nightingale Health offers a comprehensive and standardized metabolite assessment, it does not capture the full blood metabolome and includes a limited range of metabolites. Additionally, some biomarkers lack disease specificity, necessitating further exploration of their links to frailty. Third, randomized controlled trials are essential to validate the clinical utility and impact of these newly defined frailty subtypes. Future research should focus on evaluating their effectiveness and feasibility across different medical settings and populations, considering factors such as cost-effectiveness, resource allocation, and clinical acceptance. Finally, we acknowledge that the absence of spatial transcriptomics and single-cell RNA sequencing data limits our mechanistic understanding of frailty subtypes and chronic diseases. These techniques are crucial for exploring cellular heterogeneity and intercellular interactions. Future research should focus on validating mechanisms such as liver-muscle crosstalk, using these advanced methods to provide deeper insights and identify potential therapeutic targets.

In summary, our study elucidates metabolites associated with frailty, highlights the potential of novel frailty subtypes in managing frail individuals with unhealthy metabolism, and underscores the role of a healthy diet in mitigating the risk of chronic diseases and promoting health.

Methods

Study population

The UK Biobank was approved by the North West Research Ethics Committee (REC reference: 21/NW/0157), and all participants provided written informed consent. This community-based cohort study included about 500,000 volunteers from England, Scotland, and Wales. Baseline sociodemographic, lifestyle, and health-related data and blood samples were collected between March 2006 and October 2010.

Participants with missing data on 10 or more frailty-related items, missing metabolite information, prefrailty, or covariates with >20% missing values were excluded. Finally, 160,407 participants were included, with a median follow-up time of 13.8 years (Fig. 6).

Fig. 6: Flowchart of participant enrollment.

Measurement of frailty

Frailty was assessed using the FI, which reflects the accumulation of health deficits. This index is based on multiple indicators across various physiological and psychological domains, including symptoms, diagnosed diseases, and disabilities46. From the UK Biobank, 49 items were selected to construct an FI (Supplementary Table 1)47. Each participant’s health deficits were assessed, and the total number of health deficits was divided by 49 to calculate the FI, which ranged from 0 to 1. Participants with an FI ≤ 0.10 were classified as non-frail, those with an FI of 0.10–0.21 as pre-frail, and those with an FI > 0.21 as frail48.
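The FI calculation and cut-offs described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names are hypothetical, each deficit is assumed to be scored on a 0 to 1 scale, and an FI of exactly 0.10 or 0.21 is assumed to fall in the lower category, as the stated thresholds imply.

```python
from typing import Sequence

N_ITEMS = 49          # number of deficit items reported in the text
NON_FRAIL_MAX = 0.10  # FI <= 0.10 -> non-frail
PRE_FRAIL_MAX = 0.21  # 0.10 < FI <= 0.21 -> pre-frail; FI > 0.21 -> frail


def frailty_index(deficits: Sequence[float]) -> float:
    """Sum of deficit scores divided by the number of items (49)."""
    if len(deficits) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} items, got {len(deficits)}")
    # Assumption: every deficit is coded between 0 (absent) and 1 (present).
    if any(not 0.0 <= d <= 1.0 for d in deficits):
        raise ValueError("each deficit score must lie in [0, 1]")
    return sum(deficits) / N_ITEMS


def classify(fi: float) -> str:
    """Map an FI value to the three categories used in the text."""
    if fi <= NON_FRAIL_MAX:
        return "non-frail"
    if fi <= PRE_FRAIL_MAX:
        return "pre-frail"
    return "frail"


# Example: 11 of 49 deficits present gives FI = 11/49, which exceeds 0.21.
example = [1.0] * 11 + [0.0] * 38
print(classify(frailty_index(example)))
```

With 49 items, 11 fully present deficits are the smallest count that crosses the frail threshold, since 10/49 is below 0.21.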

Metabolite analysis

The UK Biobank’s NMR metabolomic data included 251 metabolomic biomarkers (170 absolute concentrations and 81 derived ratios) from the EDTA plasma samples of approximately 280,000 participants. These biomarkers include clinically validated indicators, such as cholesterol, fatty acids, amino acids, and inflammation markers, as well as emerging biomarkers, such as lipoprotein subclasses. For this study, all 251 metabolic biomarkers were analyzed (For more details, refer to Class 220 of the UK Biobank and Supplementary Table 2).

Selection of differential metabolites

The CatBoost algorithm was used for feature selection, with frailty status (frail vs. non-frail) as the binary outcome. The dataset was split into training and validation sets in a 7:3 ratio, and optimal model parameters were determined through five-fold cross-validation. SHAP values were then calculated to interpret the model outputs and quantify feature importance. This method is grounded in cooperative game theory and fairly allocates the contribution of each feature to the final prediction by considering all possible combinations of features. Features were ranked by their mean absolute SHAP value, where a higher value indicates a greater overall influence on the model’s prediction, and the top 11 metabolites were selected as differential metabolites for further analysis. The direction and magnitude of each feature’s effect were also interpreted: a positive SHAP value indicates that the feature increases the predicted risk of frailty, while a negative value suggests a protective effect. The dispersion of SHAP values reflects the consistency and strength of each feature’s influence across the population. An XGBoost model was then fitted to the 11 differential metabolites, with hyperparameter tuning and five-fold cross-validation to ensure performance and reliability, and its accuracy in identifying frailty status was assessed using the AUC.
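To make the game-theoretic idea concrete, the sketch below computes exact Shapley values by brute force over all feature coalitions and then ranks features by mean absolute value. It is an illustration of the definition only: the toy value function, its coefficients, and the function names are hypothetical, and practical SHAP implementations for tree models use much faster algorithms than this enumeration.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, List, Sequence


def shapley_values(players: Sequence[str],
                   value: Callable[[FrozenSet[str]], float]) -> Dict[str, float]:
    """Exact Shapley values: the weighted average marginal contribution of
    each player over every coalition of the remaining players."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


def rank_by_mean_abs(rows: List[Dict[str, float]]) -> List[str]:
    """Rank features by mean absolute attribution across participants."""
    features = rows[0].keys()
    mean_abs = {f: sum(abs(r[f]) for r in rows) / len(rows) for f in features}
    return sorted(mean_abs, key=mean_abs.get, reverse=True)


def toy_value(coalition: FrozenSet[str]) -> float:
    """Hypothetical model output when only `coalition` features are known."""
    v = 0.0
    v += 0.3 if "GlycA" in coalition else 0.0
    v += 0.1 if "LA_FA" in coalition else 0.0
    # Interaction term: its credit is shared equally by GlycA and Val.
    v += 0.2 if {"GlycA", "Val"} <= coalition else 0.0
    return v


phi = shapley_values(["GlycA", "Val", "LA_FA"], toy_value)
print(phi)
```

The attributions sum to the difference between the full-coalition value and the empty-coalition value, which is the "fair allocation" property referred to in the text.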

Determination of cluster number

PCA was performed on the differential metabolites for dimensionality reduction, and the optimal number of clusters was determined using an elbow plot and silhouette coefficient. The elbow plot calculates the sum of squared errors for different cluster numbers and identifies the “elbow” position, indicating the optimal number of clusters. The silhouette coefficient combines the cohesion and separation of clusters, with values ranging from −1 to 1; a higher value indicates better clustering performance. K-means clustering analysis was then performed. We generated 100 bootstrap samples and applied K-means clustering to each sample. The stability of the clusters was assessed using the Adjusted Rand Index (ARI) and Normalized Mutual Information (NMI), with values greater than 0.8 indicating strong stability.
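As an illustration of the stability metric, the following sketch computes the Adjusted Rand Index between two labelings of the same participants, for example the original K-means assignment and the assignment obtained from one bootstrap run. The function name is hypothetical, and how the bootstrap labelings were paired with the reference clustering is not described in the text, so only the metric itself is shown.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """ARI between two labelings; 1.0 means identical partitions,
    values near 0 mean agreement no better than chance."""
    if len(a) != len(b):
        raise ValueError("labelings must have the same length")
    n = len(a)
    if n < 2:
        raise ValueError("need at least two samples")
    # Number of sample pairs placed together in both, in a, and in b.
    pairs_joint = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = pairs_a * pairs_b / comb(n, 2)  # agreement expected by chance
    max_index = (pairs_a + pairs_b) / 2
    if max_index == expected:  # degenerate case, e.g. one cluster in both
        return 1.0
    return (pairs_joint - expected) / (max_index - expected)


# Cluster labels are arbitrary, so a pure relabeling still scores 1.0.
reference = [0, 0, 1, 1, 2, 2]
relabeled = [1, 1, 2, 2, 0, 0]
print(adjusted_rand_index(reference, relabeled))
```

Because the index depends only on which participants are grouped together, it is unaffected by K-means assigning different cluster numbers on different runs.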

Definition of outcome events

Thirteen chronic diseases and all-cause mortality were considered as outcomes. The chronic diseases were coronary artery disease, heart failure, MACE, MI, type 2 diabetes, MASLD, COPD, SLD, PAD, ESRD, kidney cancer, lung cancer, and AAA. Outcomes were identified using International Classification of Diseases, 10th Revision (ICD-10) codes, and individuals with a given outcome before baseline were excluded. Prognoses of the different subtypes were compared using Cox proportional hazards models and Kaplan–Meier survival curves.
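For readers unfamiliar with the Kaplan–Meier estimator, a minimal sketch follows. It assumes right-censored follow-up data given as a time and an event flag per participant; the function name and the example data are hypothetical, and competing risks are ignored. The cumulative incidence plotted in such curves corresponds to one minus the survival probability.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    events[i] is True if participant i had the outcome at times[i],
    False if follow-up was censored at times[i].
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Group everyone who leaves the risk set at time t.
        while i < len(data) and data[i][0] == t:
            n_events += 1 if data[i][1] else 0
            n_removed += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_removed
    return curve


# Hypothetical follow-up times in years; False marks censoring.
curve = kaplan_meier([2, 3, 3, 5, 8], [True, True, False, True, False])
for t, s in curve:
    print(f"t={t}: survival={s:.2f}, cumulative incidence={1 - s:.2f}")
```

Participants censored at the same time as an event are counted as still at risk for that event, which is the usual convention.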

Dietary intake assessment

A touchscreen questionnaire completed by participants at baseline was used to collect data on the consumption frequency of 12 foods over the past year: beef, lamb, pork, processed meat, oily fish, non-oily fish, fresh fruit, dried fruit, raw vegetables, cooked vegetables, grains, and bread. New data fields were created for the intake of (1) red meat, (2) total fish, (3) total vegetable, (4) total fruit, (5) whole grain, and (6) refined grain. Red meat intake was derived from the sum of beef, lamb, and pork; total fish intake was derived from the sum of oily and non-oily fish. Total vegetable intake combined cooked and salad/raw vegetables; total fruit intake considered fresh and dried fruit. Whole and refined grains were categorized based on bread and other grain types consumed. Participants’ food intake was categorized into seven groups: red meat, processed meat, total fish, total fruit, total vegetables, whole grains, and refined grains. Portion sizes were defined for each food item, and weekly consumption data for bread and grain were converted into daily consumption data22. For more information on the diet, please refer to Supplementary Table 4.

Healthy diet score estimation

We used seven dietary factors and cut-off values based on recommendations for cardiometabolic health: increasing fruit, vegetable, whole grain, and fish intake and reducing red meat, processed meat, and refined grain intake49,50. A healthy diet score was calculated using these components: total fruit ≥3 servings/day; total vegetables ≥3 servings/day; total fish ≥2 servings/week; processed meat ≤1 serving/week; red meat ≤1.5 servings/week; whole grains ≥3 servings/day; and refined grains ≤1.5 servings/day. Each favorable dietary factor received 1 point, with scores ranging from 0 to 7. Participants were classified into unhealthy (score <4) and healthy (score ≥4) diet categories (Supplementary Table 3).
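The scoring rule can be expressed compactly as below. This is a sketch under stated assumptions: the dictionary keys and function names are hypothetical, and intakes are assumed to be already converted to the units given in the text (servings per day or per week). With seven components worth one point each, the score runs from 0 to 7.

```python
from typing import Dict, Tuple

# component -> (direction, cut-off), using the thresholds listed in the text
CRITERIA: Dict[str, Tuple[str, float]] = {
    "total_fruit": (">=", 3.0),       # servings/day
    "total_vegetables": (">=", 3.0),  # servings/day
    "total_fish": (">=", 2.0),        # servings/week
    "processed_meat": ("<=", 1.0),    # servings/week
    "red_meat": ("<=", 1.5),          # servings/week
    "whole_grains": (">=", 3.0),      # servings/day
    "refined_grains": ("<=", 1.5),    # servings/day
}


def healthy_diet_score(intake: Dict[str, float]) -> int:
    """One point for each component that meets its cut-off (0 to 7)."""
    score = 0
    for name, (direction, cutoff) in CRITERIA.items():
        value = intake[name]
        met = value >= cutoff if direction == ">=" else value <= cutoff
        score += int(met)
    return score


def diet_category(score: int) -> str:
    """Scores of 4 or more are classed as a healthy diet."""
    return "healthy" if score >= 4 else "unhealthy"


# Hypothetical participant meeting four of the seven targets.
participant = {
    "total_fruit": 3.5, "total_vegetables": 2.0, "total_fish": 2.0,
    "processed_meat": 0.5, "red_meat": 2.0, "whole_grains": 1.0,
    "refined_grains": 1.0,
}
score = healthy_diet_score(participant)
print(score, diet_category(score))
```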

Covariate assessment

Covariate data were collected through self-completed touchscreen questionnaires or oral interviews. These included age, sex, ethnicity (White or other), current smoking and drinking status, BMI, SBP, DBP, and levels of ALT, AST, ALP, cholesterol, direct bilirubin, total bilirubin, triglycerides, and HbA1c. Covariates with less than 20% missing values were imputed using the mice package in R.

All analyses were conducted using R (v4.1.3) and Python (v3.8.19). Statistical significance was defined as P < 0.05. For multiple comparisons across the outcomes, we applied False Discovery Rate (FDR) correction using the Benjamini-Hochberg method to control for false positives.
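The Benjamini-Hochberg adjustment mentioned above can be sketched as follows. The function name and the example P values are hypothetical; the procedure itself is the standard step-up adjustment, in which each sorted P value is multiplied by m/rank and monotonicity is enforced from the largest value downwards.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return BH-adjusted P values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest so that each adjusted
    # value never exceeds the adjusted value of a larger raw P value.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P values for four outcomes.
raw = [0.01, 0.04, 0.03, 0.20]
print([round(q, 4) for q in benjamini_hochberg(raw)])
```

An outcome is then declared significant when its adjusted value is below the chosen FDR level, here 0.05.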