Introduction

Stroke remains one of the leading causes of long-term disability worldwide, with approximately 50% of survivors experiencing persistent functional impairment and 15–30% developing severe disabilities1. Growing attention has been directed toward the impact of stroke on skeletal muscle health, as neurological deficits and reduced mobility can accelerate muscle atrophy, impair strength, and decrease physical performance2. Sarcopenia, defined as the progressive and generalized loss of skeletal muscle mass, strength, and function, is highly prevalent among older adults3. Stroke-associated sarcopenia (SAS) represents a subtype of secondary sarcopenia characterized by muscle degeneration resulting from stroke-related neurological injury, inflammation3, and long-term physical inactivity4. SAS not only increases the risk of falls, fractures, and adverse clinical outcomes but also delays rehabilitation progress and imposes a substantial burden on healthcare systems5.

Currently, the clinical identification of SAS primarily relies on muscle strength testing and physical performance assessments. However, these approaches may fail to capture early or subtle muscle deterioration, limiting their utility for timely risk stratification and early interventions6. Consequently, accurate predictive tools are needed to facilitate early recognition of high-risk patients and guide individualized rehabilitation strategies.

In recent years, several studies have attempted to develop predictive models for stroke-related sarcopenia using traditional or basic machine-learning approaches. A Chinese study using a logistic regression model reported an AUC of 0.835. More recently, another study applying logistic regression, random forest, and XGBoost yielded only modest performance, with AUCs of 0.805, 0.796, and 0.780, respectively7. Although these studies provide preliminary evidence supporting the feasibility of predictive modeling, they share several limitations: most rely on a single modeling strategy, lack external validation, and provide limited interpretability, making it difficult for clinicians to understand how individual predictors contribute to model outputs. These methodological constraints restrict the clinical applicability of current models. To address these gaps, the present study constructs predictive models for SAS risk using five mainstream machine learning (ML) algorithms, systematically compares their predictive performance, incorporates external cohort validation, and applies SHAP-based interpretability analysis to elucidate the relative contribution of each feature. By enhancing model transparency and generalizability, this approach aims to support early screening and tailored intervention strategies for stroke patients at high risk of developing sarcopenia.

Methods

Study population

A convenience sampling method was used to recruit stroke patients from two tertiary hospitals in Kunming, China, between October 2024 and April 2025. The inclusion criteria were as follows: (1) diagnosis of hemorrhagic or ischemic stroke, including both first-ever and recurrent cases; (2) age ≥ 60 years; (3) clinically stable condition; (4) provision of written informed consent. The exclusion criteria were as follows: (1) pre-existing sarcopenia before admission, defined by a score > 4 on the SARC-F questionnaire8; (2) severe coma, intellectual disability, psychiatric disorders, or cognitive impairment preventing cooperation with body composition analysis; (3) history of major psychiatric illness or severe systemic disease; (4) severe comorbidities such as heart failure, renal failure, malignancy, or end-stage organ disease; (5) severe upper limb spasticity or pain that interfered with grip strength testing.

Assessment tools

General information questionnaire

A self-designed questionnaire was developed by the research team based on the study objectives. Data collected included age, sex, smoking status, alcohol use, duration of bed rest, body mass index (BMI), presence of anxiety or depression, stroke type (ischemic or hemorrhagic), history of stroke, comorbidities (e.g., diabetes, hypertension, coronary artery disease), pulmonary infections, fracture history, fall history, dysphagia, aphasia, muscle weakness, nasogastric tube use, and laboratory data (serum albumin, total protein, C-reactive protein, serum calcium, urea, creatinine, uric acid, hemoglobin, and triglycerides).

Diagnosis of sarcopenia

Sarcopenia was diagnosed according to the criteria proposed by the Asian Working Group for Sarcopenia9. Diagnostic criteria included: (1) Low muscle mass: appendicular skeletal muscle mass index (SMI) ≤ 7.0 kg/m² for men and < 5.4 kg/m² for women; (2) Low muscle strength: handgrip strength < 28 kg for men and < 18 kg for women; (3) Poor physical performance: 6-m gait speed < 1.0 m/s. A diagnosis of sarcopenia was made if criterion (1) was met, along with either (2), (3), or both.
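The decision rule above can be written as a short function. This is a minimal sketch, not the study's own code: the `Assessment` class and `has_sarcopenia` function are hypothetical names, and the cutoffs and comparison operators are copied from the criteria as stated in the text.

```python
from dataclasses import dataclass

# Cutoffs taken from the diagnostic criteria stated in the text.
SMI_CUTOFF = {"male": 7.0, "female": 5.4}      # kg/m^2
GRIP_CUTOFF = {"male": 28.0, "female": 18.0}   # kg
GAIT_SPEED_CUTOFF = 1.0                        # m/s over a 6-m walk


@dataclass
class Assessment:
    """Hypothetical container for one patient's measurements."""
    sex: str              # "male" or "female"
    smi: float            # appendicular skeletal muscle mass index, kg/m^2
    grip_strength: float  # kg
    gait_speed: float     # m/s


def has_sarcopenia(a: Assessment) -> bool:
    """Criterion (1) must hold, together with criterion (2), (3), or both."""
    if a.sex == "male":
        low_mass = a.smi <= SMI_CUTOFF["male"]    # text uses "<=" for men
    else:
        low_mass = a.smi < SMI_CUTOFF["female"]   # text uses "<" for women
    low_strength = a.grip_strength < GRIP_CUTOFF[a.sex]
    poor_performance = a.gait_speed < GAIT_SPEED_CUTOFF
    return low_mass and (low_strength or poor_performance)
```

For example, a man with an SMI of 6.5 kg/m², grip strength of 25 kg, and gait speed of 1.2 m/s would be classified as sarcopenic (low mass plus low strength), whereas the same man with a grip strength of 30 kg would not.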

National Institutes of Health Stroke Scale (NIHSS)

The NIHSS, developed by Brott et al.10 in 1989, was used to assess the severity of neurological impairment in stroke patients. The scale includes 11 items, covering level of consciousness, visual fields, facial palsy, limb movements, language, attention, and other domains. Items are scored from 0 up to a maximum of 2, 3, or 4 depending on the item, giving a total possible score ranging from 0 to 42. Higher scores indicate more severe neurological deficits. In this study, the Cronbach’s α coefficient of the scale was 0.939, indicating high internal consistency. The NIHSS is widely used in both clinical and research settings for stroke assessment.

Glasgow Coma Scale (GCS)

The Glasgow Coma Scale (GCS), developed by Teasdale and Jennett in 1974, was used to assess consciousness and neurological function. The scale comprises three components: eye-opening, verbal response, and motor response. Each component is scored separately (eye-opening: 1–4, verbal response: 1–5, motor response: 1–6), yielding a total score range of 3–15. Higher scores indicate a more alert state of consciousness. The GCS demonstrated good reliability and validity in the stroke population in this study11.

Activities of daily living (ADL) scale

The ADL scale, originally developed by Katz et al. in 1963, was employed to evaluate the basic self-care ability of participants. The scale includes six items: feeding, dressing, toileting, bathing, mobility, and transferring. Each item is scored based on the degree of independence: 1 (complete dependence), 2 (partial dependence), or 3 (complete independence). The total score ranges from 6 to 18, with higher scores indicating greater functional independence. In this study, the Cronbach’s α coefficient was 0.978, suggesting excellent internal consistency12.
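The internal-consistency figures reported for the scales are Cronbach's α coefficients. As an illustration of how such a coefficient is computed from an item-score matrix, the following sketch uses the standard formula α = k/(k−1) · (1 − Σ item variances / variance of total score). The function name `cronbach_alpha` and the example data are hypothetical and are not taken from the study.

```python
import numpy as np


def cronbach_alpha(scores) -> float:
    """Cronbach's alpha for a (participants x items) score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                                # number of items
    item_variances = scores.var(axis=0, ddof=1)        # sample variance per item
    total_variance = scores.sum(axis=1).var(ddof=1)    # variance of summed score
    return (k / (k - 1)) * (1.0 - item_variances.sum() / total_variance)


# Hypothetical example: three items that always agree give alpha = 1.
perfectly_consistent = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
alpha = cronbach_alpha(perfectly_consistent)
```

Values close to 1, such as those reported here, indicate that the items vary together across participants.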

Data collection and quality control

Data were collected through a combination of paper-based questionnaires and in-person clinical assessments conducted at participating hospitals. Each hospital designated a trained investigator responsible for overseeing the data collection process. The questionnaire included items assessing demographic characteristics, medical history, lifestyle factors, and relevant clinical symptoms. Clinical assessments involved standardized measurements such as blood pressure, height, weight, and other relevant parameters, depending on the study’s objectives. Standardized instructions were provided to all participants prior to questionnaire completion to ensure responses were informed, voluntary, and anonymous. Ethical approval was obtained from the institutional review boards of all participating hospitals, and written informed consent was obtained from each participant. Upon completion, all questionnaires were returned to the central research team. Data were independently reviewed and double-entered into a secured database to minimize input errors. Invalid or incomplete questionnaires—such as those with excessively short completion times, missing data, or patterned/duplicate responses—were identified and excluded from the final analysis.

Machine learning model parameters

To ensure reproducibility, the hyperparameters of all machine learning models used in this study (Logistic Regression, Decision Tree, Random Forest, Naive Bayes, and Gradient Boosting) are summarized in Supplementary Table 1. All models were implemented using scikit-learn with default settings unless otherwise specified. The max_iter parameter of Logistic Regression was set to 1000 to ensure convergence. Additionally, confusion matrices were generated for all five machine-learning models to evaluate the distribution of true positives, true negatives, false positives, and false negatives in both the training and external validation cohorts. To further examine potential multicollinearity among the selected predictors, pairwise correlations were evaluated, and no strong correlations were identified (all |r| < 0.7).
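A configuration along these lines might look like the sketch below. Several details are assumptions rather than facts from the text: the Naive Bayes variant is taken to be `GaussianNB`, Gradient Boosting is taken to be `GradientBoostingClassifier`, and the random seed is arbitrary. The helper names `build_models` and `max_abs_pairwise_correlation` are hypothetical. Only `max_iter=1000` and the 0.7 correlation cutoff come from the description above.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

SEED = 42  # assumption: the text does not report a seed


def build_models():
    """Five scikit-learn classifiers with default settings unless noted."""
    return {
        "LR": LogisticRegression(max_iter=1000),          # max_iter as stated in the text
        "DT": DecisionTreeClassifier(random_state=SEED),
        "RF": RandomForestClassifier(random_state=SEED),
        "NB": GaussianNB(),                               # assumed NB variant
        "GB": GradientBoostingClassifier(random_state=SEED),  # assumed GB class
    }


def max_abs_pairwise_correlation(X) -> float:
    """Largest absolute Pearson correlation between two different columns."""
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    off_diagonal = corr[~np.eye(corr.shape[0], dtype=bool)]
    return float(np.abs(off_diagonal).max())


# Synthetic stand-in for the predictor matrix (not study data).
X_demo = np.random.default_rng(SEED).normal(size=(100, 4))
no_strong_correlation = max_abs_pairwise_correlation(X_demo) < 0.7
```

Fixing `random_state` for the tree-based models is one way to make repeated runs reproducible. Whether the original analysis did so is not stated.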

Results

General characteristics of participants and prevalence of sarcopenia

A total of 456 questionnaires were collected, of which 425 were deemed valid, yielding a valid response rate of 93.2%. Among the 425 included stroke patients, 145 (34.1%) were diagnosed with stroke-related sarcopenia (SAS) based on the AWGS diagnostic criteria, while the remaining 280 (65.9%) were classified as non-sarcopenic. Detailed demographic and clinical characteristics are presented in Table 1.

Table 1 Comparison of general characteristics between sarcopenia and non-sarcopenia groups in stroke patients.

For model development, the dataset was randomly divided into a modeling cohort and an external validation cohort using stratified sampling based on the outcome variable. The modeling cohort consisted of 280 participants, while the external validation cohort included 145 participants. Comparison of baseline demographic and clinical characteristics between the two cohorts showed no significant differences across major variables, indicating good balance and comparability between the training and validation sets (Table 2). This ensured that the subsequent evaluation of model performance was conducted on a representative and unbiased cohort.
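A stratified random split of this size can be sketched with scikit-learn's `train_test_split`. The feature matrix below is synthetic placeholder data, the seed is an assumption, and only the cohort sizes (280 and 145) and the overall case count (145 of 425) are taken from the text.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

n_total, n_cases = 425, 145
y = np.zeros(n_total, dtype=int)
y[:n_cases] = 1                          # 1 = sarcopenia, 0 = no sarcopenia
X = rng.normal(size=(n_total, 12))       # synthetic stand-in for the predictors

# stratify=y keeps the outcome proportion approximately equal in both parts.
X_model, X_valid, y_model, y_valid = train_test_split(
    X, y, test_size=145, stratify=y, random_state=0
)

prevalence_model = y_model.mean()
prevalence_valid = y_valid.mean()
```

With stratification on the outcome, the sarcopenia proportion in each part stays within about one percentage point of the overall 34.1%.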

Table 2 Comparison of baseline characteristics between the modeling cohort and the external validation cohort.

Feature selection for stroke-related sarcopenia risk factors

To identify key predictors of stroke-related sarcopenia, a random forest (RF) algorithm was employed to rank the importance of all candidate variables. Features with importance scores exceeding the mean importance across all candidate variables (0.031) were selected for model inclusion. This cutoff was chosen to balance variable interpretability and model complexity in practical applications. Based on the importance ranking, 12 key features were retained for model development: BMI, serum albumin, age, uric acid, hemoglobin, creatinine, calcium ions, NIHSS score, total protein, triglycerides, CRP, and urea. The results of feature selection are illustrated in Fig. 1. Correlation analysis confirmed that no excessive multicollinearity existed among the selected predictors.
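Mean-importance thresholding of this kind can be sketched as follows. The data are synthetic, the number of candidate features (32) and the forest settings are assumptions, and the code is an illustration of the technique rather than the study's pipeline.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data: 280 samples, 32 candidate features (assumed count).
X, y = make_classification(
    n_samples=280, n_features=32, n_informative=6, n_redundant=0, random_state=0
)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importances are normalized to sum to 1,
# so their mean equals 1 / n_features.
importances = forest.feature_importances_
threshold = importances.mean()

# Keep the features whose importance exceeds the mean.
selected = np.flatnonzero(importances > threshold)
```

Because the importances sum to 1, the mean threshold is simply the reciprocal of the number of candidate variables. A reported cutoff of 0.031 would therefore be consistent with roughly 32 candidates, although the exact count is not stated in the text.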

Fig. 1 Feature importance ranking of predictors in the random forest model for sarcopenia in stroke patients.

Development and evaluation of predictive models for stroke-related sarcopenia

In this study, five machine learning algorithms were applied to construct predictive models for assessing the risk of sarcopenia in stroke patients. All models were trained using the modeling cohort and evaluated in an independent validation cohort to assess generalizability. As shown in Fig. 2, the RF and GB models demonstrated the best performance in the training set, with area under the ROC curve (AUC) values of 0.967 and 0.943, respectively, both outperforming the other models. Notably, the RF model achieved superior performance across several key classification metrics, including F1-score, accuracy, and recall, indicating strong overall predictive capability. Although the GB model exhibited a slightly higher AUC than RF in the validation set, the RF model demonstrated more consistent and robust performance across both datasets. Therefore, the Random Forest model was selected as the optimal predictive model for stroke-related sarcopenia in this study. The detailed performance metrics of all models are summarized in Table 3. In addition, to provide a more comprehensive evaluation of model performance, the confusion matrices for all five algorithms in both the modeling and external validation cohorts have been included in the Supplementary Materials (Figs. S1 and S2). These matrices illustrate the classification behavior of each model and help identify potential misclassification patterns.
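The metrics compared here (AUC, accuracy, recall, F1-score, and the confusion-matrix counts) can be computed for any fitted classifier as in the sketch below. The `evaluate` helper is a hypothetical name, the data are synthetic, and the split sizes only mimic the cohort sizes in the text.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split


def evaluate(model, X, y):
    """Discrimination and classification metrics for a fitted binary model."""
    proba = model.predict_proba(X)[:, 1]   # predicted probability of class 1
    pred = model.predict(X)                # hard class labels
    tn, fp, fn, tp = confusion_matrix(y, pred, labels=[0, 1]).ravel()
    return {
        "auc": roc_auc_score(y, proba),
        "accuracy": accuracy_score(y, pred),
        "recall": recall_score(y, pred),
        "f1": f1_score(y, pred),
        "tn": int(tn), "fp": int(fp), "fn": int(fn), "tp": int(tp),
    }


# Synthetic stand-in data with a 280 / 145 split.
X, y = make_classification(n_samples=425, n_features=12, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=145, stratify=y, random_state=0
)

rf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
train_metrics = evaluate(rf, X_train, y_train)
valid_metrics = evaluate(rf, X_valid, y_valid)
```

Metrics computed on the data used for fitting are generally optimistic for flexible models such as random forests, which is why the held-out figures carry more weight when comparing algorithms.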

Fig. 2 ROC curve comparison of five machine learning models for predicting stroke-related sarcopenia.

Table 3 Comparison of performance metrics of five machine learning models in the training and validation sets.

Interpretability analysis of the stroke-related sarcopenia prediction model

Contribution of key features to model prediction

To enhance the interpretability of the model, SHAP analysis was applied to the RF model to assess the importance and directional impact of each feature in predicting stroke-related sarcopenia. The distribution of SHAP values for the 12 selected features is presented in Fig. 3. The results indicated that lower values of BMI and serum albumin were consistently associated with higher SHAP values, highlighting the critical role of poor nutritional status in sarcopenia risk. Similarly, higher SHAP values were observed in older individuals, reinforcing age as a strong independent risk factor. While some variables exhibited relatively lower average contributions, they still demonstrated substantial SHAP values in certain individuals, suggesting their potential relevance in personalized risk assessment.

Fig. 3 SHAP summary plot of key variables in the random forest model.

Relationships between key variables and model predictions

Figure 4 presents SHAP dependence plots for the top six continuous variables ranked by feature importance: BMI, serum albumin, age, uric acid, creatinine, and hemoglobin. The x-axis represents the actual observed values of each variable, while the y-axis indicates the corresponding SHAP values, which reflect each variable’s magnitude and direction of influence on the model’s prediction. Figure 4A, B reveal a generally negative correlation between SHAP values and both BMI and serum albumin levels, suggesting that lower values of these variables increase the predicted risk of sarcopenia and highlighting their close association with poor nutritional status. As shown in Fig. 4C, age exhibits a clear positive correlation with SHAP values, indicating that older age is a strong positive predictor of sarcopenia risk. In Fig. 4D, E, increases in uric acid and creatinine levels are associated with declining SHAP values, possibly reflecting underlying metabolic dysfunction or impaired muscle metabolism. Finally, Fig. 4F shows that lower hemoglobin levels correspond to higher SHAP values, implying that anemia may contribute to an elevated likelihood of sarcopenia in stroke patients.

Fig. 4 SHAP dependence plots of key features for predicting stroke-related sarcopenia.

SHAP-based interpretations of individual predictions

Figure 5 displays SHAP force plots for two individual stroke patients, illustrating the contribution and direction of each feature in the prediction of sarcopenia risk. These visualizations offer a detailed, patient-specific explanation of how the model arrives at its final output. In the left panel, the baseline prediction probability was 0.347, which increased to 0.82 after accounting for multiple contributing factors, leading the model to classify the patient as high-risk. Among these features, BMI (17.8) emerged as the most influential positive contributor (SHAP +0.37), followed by hemoglobin, age, and uric acid, all of which provided additional upward influence on the prediction. In contrast, creatinine and total protein had modest negative contributions, slightly lowering the risk score. In the right panel, the final prediction decreased from the baseline to 0.02, primarily due to negative contributions from BMI, age, uric acid, and creatinine, resulting in a low-risk classification. Overall, the SHAP force plots clearly illustrate how individual variables influence the model’s prediction trajectory, making explicit both the direction and magnitude of each variable’s effect. These findings highlight the model’s strong interpretability and clinical transparency at the individual patient level.
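A force plot rests on the additivity of Shapley values: the baseline (average) prediction plus the sum of the per-feature contributions equals the prediction for that patient. The sketch below illustrates this property with a brute-force Shapley computation on a toy model. It is not the tree-based explainer typically used for random forests. The `toy_risk` function, its coefficients, the background sample, and the patient values (other than the BMI of 17.8 quoted above) are all hypothetical.

```python
from itertools import combinations
from math import factorial

import numpy as np


def toy_risk(data):
    """Hypothetical risk model over columns [BMI, age, albumin]."""
    z = -0.3 * (data[:, 0] - 22) + 0.05 * (data[:, 1] - 70) - 0.1 * (data[:, 2] - 38)
    return 1.0 / (1.0 + np.exp(-z))


def exact_shapley(predict, x, background):
    """Exact Shapley values by enumerating all feature subsets (small n only)."""
    n = len(x)

    def value(subset):
        # Fix the features in `subset` to the patient's values and average
        # the prediction over the background sample for the remaining ones.
        data = background.copy()
        idx = list(subset)
        if idx:
            data[:, idx] = x[idx]
        return predict(data).mean()

    base = value(())
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                phi[i] += weight * (value(subset + (i,)) - value(subset))
    return base, phi


rng = np.random.default_rng(0)
background = np.column_stack([
    rng.normal(23, 3, 50),   # BMI
    rng.normal(70, 6, 50),   # age
    rng.normal(38, 4, 50),   # serum albumin
])
x = np.array([17.8, 78.0, 33.0])   # hypothetical high-risk patient

base, phi = exact_shapley(toy_risk, x, background)
prediction = toy_risk(x[None, :])[0]   # equals base + phi.sum()
```

In this toy example the low BMI pushes the prediction above the baseline, mirroring the direction of the BMI contribution described for the left panel.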

Fig. 5 SHAP force plots for individual prediction of stroke-related sarcopenia.

Discussion

Elevated risk of sarcopenia among stroke patients

The findings of this study revealed that 34.1% of stroke patients met the diagnostic criteria for sarcopenia, a prevalence notably higher than that observed in patients with cardiovascular disease13, diabetes14, or respiratory disease15. This suggests that stroke survivors represent a particularly high-risk population for sarcopenia, consistent with the findings reported by Yao16. The underlying mechanisms contributing to sarcopenia in stroke patients are multifactorial. They may include reduced physical activity due to motor dysfunction, acute-phase systemic inflammation, and inadequate nutritional intake. In addition, stroke predominantly affects the elderly, who typically have lower baseline muscle reserves and are more likely to suffer from comorbid chronic conditions such as coronary artery disease and hypertension, further accelerating muscle mass loss17,18.

Moreover, the lack of effective rehabilitation interventions during the post-stroke recovery period can exacerbate the progression of sarcopenia. Therefore, early identification of high-risk individuals is essential to improve rehabilitation outcomes and enhance quality of life. Against this background, our study aimed to apply machine learning algorithms to develop robust prediction models and to identify the key factors contributing to stroke-related sarcopenia, thereby providing a theoretical foundation for personalized risk stratification and targeted intervention strategies.

Performance of machine learning models in predicting stroke-related sarcopenia

To enhance model efficiency and minimize the influence of redundant variables, an RF algorithm was employed to perform initial feature selection19. Compared with traditional feature selection methods, RF is capable of capturing complex non-linear interactions among variables and demonstrates strong robustness to outliers and noise, while preserving model interpretability. Through this process, a total of 12 key variables were identified as strongly associated with sarcopenia: BMI, serum albumin, age, uric acid, serum creatinine, hemoglobin, calcium ion, NIHSS score, triglycerides, CRP, total protein, and urea. Based on these selected features, five machine learning models were developed: logistic regression (LR), decision tree (DT), RF, Naïve Bayes (NB), and gradient boosting (GB). All models were trained using five-fold cross-validation and subsequently evaluated on an independent validation set to assess their generalizability. Among the five models, the RF model demonstrated superior performance in both the training and validation cohorts, with higher accuracy, recall, and F1 scores, alongside better robustness and interpretability. As such, the RF model was selected as the optimal predictive tool in this study for identifying sarcopenia risk in stroke patients.
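Five-fold cross-validation within the modeling cohort can be sketched as follows. The data are synthetic, and the use of stratified, shuffled folds with AUC as the scoring metric is an assumption, since the text does not specify how the folds were formed or scored.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the 280-patient modeling cohort with 12 predictors.
X, y = make_classification(
    n_samples=280, n_features=12, n_informative=6, random_state=0
)

# Assumed fold scheme: stratified on the outcome and shuffled with a fixed seed.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# One AUC per fold: each fold is held out once while the model is fit on the rest.
fold_auc = cross_val_score(
    RandomForestClassifier(random_state=0), X, y, cv=cv, scoring="roc_auc"
)
mean_auc = fold_auc.mean()
```

The spread of the five fold-level scores gives a rough indication of how stable a model's performance is before it is applied to the held-out validation cohort.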

In addition, we observed inconsistent performance of the Naïve Bayes (NB) model, which showed relatively poor discrimination in the training cohort but performed noticeably better in the external validation cohort. This discrepancy is largely attributable to the class imbalance between the two datasets, as the modeling set contained a lower proportion of sarcopenia cases than the validation set. Because NB relies on prior probability estimation, its performance is more susceptible to shifts in class distribution. Furthermore, the selected predictors did not exhibit strong intercorrelations, indicating a low risk of multicollinearity and ensuring that the divergent performance of NB was not caused by feature redundancy. The confusion matrices generated for both the training and validation sets (Supplementary Figs. S1 and S2) provided additional insight into class-level errors. The NB model exhibited higher false-negative rates in the training cohort but yielded more balanced predictions in the external validation dataset. These findings highlight the importance of examining misclassification patterns, rather than relying solely on global metrics such as accuracy or AUC, when assessing model robustness.

In comparison with previously published models, the predictive performance of our machine-learning framework demonstrates several notable advantages. A Chinese study using a logistic regression model reported an AUC of 0.835, while another study applying both logistic regression and decision tree algorithms achieved AUCs of 0.959 and 0.892, respectively. Although these models demonstrated acceptable discriminative ability, they relied on single modeling approaches and lacked external validation. More recently, a study employing random forest and XGBoost reported relatively modest AUCs of 0.796 and 0.780, suggesting limited generalizability of these basic machine-learning applications7. In contrast, our RF model maintained consistently strong and balanced predictive performance in both the training and external validation cohorts, demonstrating improved robustness over existing models. Furthermore, by incorporating SHAP-based interpretability, our study provides transparent and clinically meaningful insights into feature contributions, an aspect largely absent in previous research, thereby reinforcing the novelty and practical applicability of our approach.

Analysis of key predictive factors for stroke-related sarcopenia

SHAP analysis based on the RF model identified BMI, serum albumin, age, uric acid, serum creatinine, hemoglobin, calcium ion, NIHSS score, triglycerides, CRP, total protein, and urea as the most important variables contributing to sarcopenia prediction. These features span multiple physiological domains, including nutritional status, metabolic function, inflammatory activity, and overall physiological condition.

In terms of nutrition-related factors, lower levels of serum albumin and total protein suggest a high likelihood of malnutrition, which is a well-established contributor to muscle wasting and sarcopenia20. A low BMI reflects underweight status, which is often accompanied by loss of both fat and lean muscle mass, ultimately impairing muscle strength and physical function21. Interestingly, the model also flagged some individuals with a high BMI as being at elevated risk for sarcopenia, suggesting the presence of sarcopenic obesity, a condition in which excess body fat may mask underlying skeletal muscle loss22. This highlights the limitation of using BMI alone for clinical risk assessment and underscores the need for integrating body composition analysis to improve diagnostic precision.

From a metabolic perspective, lower cholesterol and triglyceride levels, both indicative of reduced energy reserves or disrupted lipid metabolism, may negatively impact muscle protein synthesis23. Inflammatory and metabolic markers such as CRP, urea, creatinine, and uric acid also ranked highly in model importance. Elevated CRP levels, a classical marker of systemic inflammation, are known to inhibit muscle protein synthesis and promote catabolism24. Uric acid, creatinine, and urea levels reflect renal and muscle metabolic by-products, suggesting possible impairment in muscle metabolism25,26. Among the hematological and electrolyte indicators, low hemoglobin levels were associated with increased sarcopenia risk, likely due to compromised oxygen delivery to muscle tissue and reduced endurance capacity27. Additionally, calcium, a critical mediator of neuromuscular signaling, may contribute to muscle dysfunction if depleted28.

Age, a non-modifiable risk factor, was identified as a strong positive predictor of sarcopenia in the model, reaffirming that older adults are particularly vulnerable29. This association may be linked to age-related declines in hormone levels, persistent low-grade inflammation, and reduced physical activity. Importantly, SHAP force plots revealed distinct variable contribution pathways between high-risk and low-risk individuals, reinforcing the model’s effectiveness in capturing personalized risk profiles30. This interpretability enhances clinical applicability and provides a foundation for tailoring targeted interventions.

In summary, the key variables identified in this study offer a multidimensional perspective on the pathogenesis of stroke-related sarcopenia and support the development of personalized risk stratification, early intervention, and rehabilitation strategies in clinical practice.

Conclusions

This study developed five machine learning models to predict the risk of sarcopenia among stroke patients. Among these, the RF model demonstrated superior performance across multiple evaluation metrics compared with the LR, DT, NB, and GB models. The application of machine learning offers a novel and efficient approach to identifying high-risk individuals for stroke-related sarcopenia and provides insights into the underlying pathophysiological mechanisms. These findings may support the development of personalized prevention strategies and precision rehabilitation plans.

However, several limitations should be acknowledged. First, the sample size was relatively limited, which may affect the generalizability of the model. Second, although the model incorporated a broad range of physiological and biochemical indicators, certain potentially relevant variables—such as body composition metrics, hormone levels, and physical activity data—were not included. Future research should focus on expanding the sample size, enriching the feature set, and conducting external validation across multiple clinical centers and geographic regions. Such efforts will help to further enhance the model’s robustness, generalizability, and clinical utility.