Machine learning models predict risk of lower extremity deep vein thrombosis in hospitalized patients with spontaneous intracerebral hemorrhage

Qiu, Weizhi; Cui, Penglei; Li, Shaojie; Tang, Zhenzhou; Chen, Jiani; Wang, Jiayin; Li, Yasong

doi:10.1038/s41598-025-10905-2


Article
Open access
Published: 10 July 2025


Scientific Reports volume 15, Article number: 24932 (2025)


Abstract

Lower extremity deep vein thrombosis (DVT) is an important complication of spontaneous intracerebral hemorrhage. We aimed to develop a risk assessment model to predict lower extremity DVT during hospitalization in patients with spontaneous intracerebral hemorrhage. In this retrospective study, the data were randomly divided into a training set and a test set in a 7:3 ratio. Feature selection was performed in the training set, using the Boruta and LASSO algorithms to screen for significant predictors. Five machine learning algorithms were used to construct prediction models, and discrimination was evaluated with ROC curves. To validate the models, we constructed calibration curves and compared calibration using the Brier score. Finally, the clinical value of the models was assessed by decision curve analysis (DCA), and the “black box” model was interpreted with SHAP. The training and test sets did not differ significantly in individual variables. Screening by the LASSO and Boruta algorithms yielded 15 and 7 potentially relevant variables, respectively, and the variables common to both gave six significant predictors of DVT. The performance of the five machine learning algorithms in DVT prediction was then evaluated in the test set. The results suggest that the LGBM model has significant advantages in predicting DVT after intracerebral hemorrhage. We developed a model to predict the risk of lower extremity DVT during hospitalization in patients with spontaneous intracerebral hemorrhage, and this model can accurately identify high-risk patients.


Introduction

Intracerebral hemorrhage (ICH) accounts for approximately 10–30% of all strokes, and spontaneous ICH is the most prevalent subtype, primarily caused by cerebral small vessel disease¹. As a life-threatening form of acute stroke, spontaneous ICH affects approximately 2 million people worldwide each year², and its incidence increases significantly with age, with a particularly high risk in people over 85 years of age³. Despite advances in diagnosis and treatment, mortality remains high and most survivors have difficulty returning to independent living, which makes ICH a major health challenge.

Venous thromboembolism (VTE) encompasses both deep vein thrombosis (DVT) and pulmonary embolism (PE)⁴. DVT is a common complication in hospitalized patients and is associated with higher complication rates, mortality, and healthcare costs⁵. The risk of VTE in patients with ICH is approximately two to four times higher than in patients with acute ischemic stroke. According to retrospective studies from several large databases, VTE occurs in approximately 2–4% of ICH patients during hospitalization, and symptomatic DVT in 1–2%⁶. Lower extremity DVT after spontaneous ICH may have serious consequences. The most dangerous is pulmonary embolism, which can trigger respiratory distress or even death. In addition, DVT may exacerbate the patient’s activity limitation, delay recovery, and increase the length of hospitalization and healthcare costs. Anticoagulant medications used to treat DVT may also increase the risk of rebleeding, especially in patients with a history of ICH. In conclusion, the occurrence of DVT significantly affects the prognosis and quality of life of patients.

Various risk assessment models (RAMs) have been created to evaluate the likelihood of VTE. Among these, the model developed by Caprini in 1991, based on data from 538 patients, is the most widely used⁷. The Caprini score is calculated by evaluating multiple risk factors and summing their scores. Each risk factor is assigned a fixed score based on its severity or frequency⁸. The total score obtained from the Caprini RAM reflects the overall likelihood of a VTE event, but it does not distinguish between the risks of PE and DVT. Over 200 studies have evaluated the Caprini RAM, consistently confirming that higher scores are associated with an increased risk of VTE⁹. The Padua RAM is a risk-assessment score comprising 11 risk factors, with a total score ranging from 0 to 20, which can help differentiate between high-risk and low-risk medical patients for VTE. Appropriate thromboprophylaxis for high-risk patients during hospitalization can prevent thromboembolic events in the long term with a lower risk of bleeding¹⁰. The Autar DVT Scale was developed based on Virchow’s triad: venous stasis, coagulation activation, and venous wall damage. It consists of seven risk factor subcategories and is applied in various clinical settings¹¹. The study conducted by Orak et al. showed, by ROC analysis, an AUC of 0.61 ± 0.05 for the Autar scale and 0.53 ± 0.06 for the Padua scale. In addition, the accuracy rates of Autar and Padua were 24% and 14%, respectively. Although the sensitivity of both scales was high, the specificity was low (Autar: 14%, Padua: 3%). Comparing the AUCs of the two scales with the log-rank test (p = 0.192) did not show any significant difference. Further analysis revealed a positive likelihood ratio of 1.17 for Autar and 1.03 for Padua¹¹.

Developing a model to predict the risk of DVT in hospitalized patients with spontaneous cerebral hemorrhage is of significant importance. Although several RAMs have been developed to predict the risk of VTE, these models fail to fully take into account the specific clinical characteristics of patients with cerebral hemorrhage. Most of the existing models such as Caprini, Padua, and Autar are based on data from hospitalized patients in general and may not accurately predict the risk of thrombosis during hospitalization in patients with cerebral hemorrhage. The limitations of these models, especially in the context of neurological injury, activity limitations, and other factors, highlight the need to develop specific VTE risk assessment tools for patients with intracerebral hemorrhage. In recent years, the application of machine learning in clinical risk prediction has demonstrated significant potential. It can automatically identify the most critical variables from a large set of clinical features—a process analogous to how clinicians prioritize key indicators when assessing patient risk. Unlike traditional scoring systems that rely on manually assigned weights and predefined rules, machine learning uses a data-driven approach to quantify the actual contribution of each variable to the outcome, thereby improving the accuracy and objectivity of predictive models.

Clinical scoring systems such as the Caprini risk score are built upon years of clinical experience and offer a clear and interpretable structure. However, they have limitations when it comes to capturing complex interactions among variables. With the advancement of artificial intelligence, comparing—and even integrating—these traditional tools with machine learning methods holds promise for achieving a better balance between interpretability and predictive performance, ultimately enhancing the efficiency and quality of clinical risk assessment and decision-making.

Materials and methods

This retrospective study collected data on 900 patients with spontaneous intracerebral hemorrhage (sICH) admitted to the Department of Neurosurgery from January 2015 to May 2022, identified from the electronic medical record system of the Second Affiliated Hospital of Fujian Medical University. Due to the retrospective nature of the study, the Ethics Committee of the Second Affiliated Hospital of Fujian Medical University, Quanzhou, Fujian Province, China waived the need to obtain informed consent (Ethics Number: 2024-677). Relevant data on ICH were extracted from medical records. Inclusion criteria: (1) age > 18 years; (2) meeting the diagnostic criteria for sICH established by the Cerebrovascular Disease Group of the Chinese Academy of Neurology, with the diagnosis consistent with a cranial CT scan; and (3) a time interval of < 24 h between the onset of intracerebral hemorrhage symptoms and the first cranial CT examination. Exclusion criteria: (1) secondary cerebral hemorrhage, such as that due to trauma, hemorrhagic transformation of cerebral infarction, cerebral vascular malformation, or brain tumor; (2) severe comorbidities in other organs such as the heart, liver, or kidneys; (3) prolonged bed rest before hospitalization; (4) surgery, interventions, or other surgical instrumentation before cranial CT review; and (5) incomplete data. After admission, each patient was examined and treated by a specialized neurologist according to ICH treatment guidelines.

Data collection

We collected the patients’ demographic characteristics, condition at admission, laboratory test results, and imaging data. Demographic characteristics included age, gender, smoking, and alcohol consumption; condition at admission covered hypertension, history of diabetes mellitus, Glasgow Coma Scale (GCS) score, temperature, systolic blood pressure, and diastolic blood pressure; laboratory investigations included leukocytes, hemoglobin, platelets, glucose, uric acid, activated partial thromboplastin time, prothrombin time, fibrinogen, and D-dimer; and imaging studies focused on hematoma volume, hemorrhage location, and midline shift. A midline shift > 5 mm was considered significant. The treatment plan for surgical patients was determined by neurosurgeons based on patient-specific clinical circumstances and relevant guidelines for the management of spontaneous intracerebral hemorrhage (sICH).

Definition of outcome events

In this study, all hospitalized patients underwent lower extremity color Doppler ultrasound screening upon admission, irrespective of clinical symptoms or elevated D-dimer levels. During hospitalization, the presence of clinical symptoms such as swelling, pitting edema, erythema, tenderness, or visible superficial veins, combined with elevated D-dimer levels, was considered highly suggestive of in-hospital deep vein thrombosis (DVT) and prompted confirmation by color Doppler ultrasound. In-hospital DVT was the outcome event.

Sample size estimation

To prevent overfitting and maintain the accuracy of a predictive model, it is crucial to ensure an adequate sample size during its development. The sample size was calculated using the formula \(N = \left(\frac{1.96 \times \sqrt{\varphi(1-\varphi)}}{\delta}\right)^{2}\), where φ denotes the expected outcome ratio (φ = 0.20) and δ is the set margin of error (δ = 0.05)¹². Based on this formula, a minimum of 246 patients is required for the model development process. The training set provided an adequate sample size for the development of the model.
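
As a check on the arithmetic, the formula can be evaluated directly. The short Python sketch below is illustrative only; the function name `min_sample_size` is an assumption and not part of the study's code.

```python
import math


def min_sample_size(phi: float, delta: float, z: float = 1.96) -> int:
    """Minimum N for estimating an outcome proportion phi within +/- delta.

    Implements N = (z * sqrt(phi * (1 - phi)) / delta) ** 2, rounded up.
    """
    if not 0 < phi < 1:
        raise ValueError("phi must lie strictly between 0 and 1")
    if delta <= 0:
        raise ValueError("delta must be positive")
    n = (z * math.sqrt(phi * (1 - phi)) / delta) ** 2
    return math.ceil(n)


# Values stated in the text: phi = 0.20, delta = 0.05
print(min_sample_size(0.20, 0.05))  # 246
```

With φ = 0.20 and δ = 0.05 the unrounded value is about 245.86, which rounds up to the 246 patients quoted above.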

Statistical analysis

Categorical variables were analyzed using either the chi-square test or Fisher’s exact test, with the results presented as percentages. Continuous variables with a normal distribution were expressed as mean ± standard deviation and analyzed using the t-test. Variables that did not follow a normal distribution were reported as quartiles and analyzed using nonparametric tests. We randomly split the data into training and test sets in a ratio of 7:3. Feature selection was carried out in the training set to identify the most pertinent predictors for model development, ensuring that no relevant factors were overlooked. To identify significant predictors, we employed both the Boruta algorithm¹³ and the Least Absolute Shrinkage and Selection Operator (LASSO) algorithm¹⁴, generating two distinct sets of relevant features. To ensure that only the most pertinent and stable variables entered the predictive model, we selected the predictors common to both sets as the final features for model construction. This strategy aims to improve the accuracy and generalization of the model while minimizing the risk of overfitting or of including irrelevant predictor variables. Since collinearity can make it difficult to evaluate the individual contribution of each feature to the outcome, we employed a pairwise Pearson correlation matrix to examine the relationships between clinical features¹⁵. To construct the DVT prediction model, we applied five ML algorithms: Random Forest (RF), Logistic Regression (LR), Support Vector Machine (SVM), Light Gradient Boosting Machine (LGBM), and Extreme Gradient Boosting (XGB). A data resampling technique, the synthetic minority over-sampling technique combined with random under-sampling (SMOTE + RUS), was used to deal with class imbalance. The DVT prediction models were then developed on the training dataset.
On the test dataset, the discriminative performance of the five machine learning models was evaluated using receiver operating characteristic (ROC) curves, and the area under the curve (AUC) was calculated for each model. In addition, to further examine the performance of the models, we constructed calibration curves and performed internal validation using the Bootstrap method with B = 1000 repetitions, and the Brier score was used to compare the calibration of the models. Finally, we used decision curve analysis (DCA) to assess the validity and practical value of these predictive models in clinical applications. On this basis, we further evaluated the models using accuracy, prevalence, recall, F1 score, Matthews correlation coefficient (MCC), area under the receiver operating characteristic curve (AUROC), precision, and specificity. With these combined metrics, the best model could be identified. The SHAP (Shapley Additive Explanations) method was used for the interpretation and visualization of the prediction models. SHAP explains these complex “black box” models by calculating the contribution and influence of each feature on the final prediction.
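
To make the two headline evaluation measures concrete, the following Python sketch computes the AUC (as the probability that a randomly chosen DVT case receives a higher predicted risk than a randomly chosen non-case) and the Brier score (mean squared difference between predicted probability and observed outcome). It is a minimal illustration on made-up labels and probabilities, not the study's pipeline; the function names and the toy data are assumptions.

```python
def roc_auc(y_true, y_score):
    """AUC via pairwise comparison of positives and negatives (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical toy data (1 = DVT, 0 = no DVT); not taken from the study.
labels = [1, 0, 1, 0, 1]
probs = [0.9, 0.2, 0.6, 0.7, 0.8]

print(round(roc_auc(labels, probs), 4))      # 0.8333
print(round(brier_score(labels, probs), 4))  # 0.148
```

A higher AUC indicates better discrimination, while a lower Brier score indicates predicted probabilities that sit closer to the observed outcomes.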

Results

Baseline characteristics

This study initially included 900 participants who met the primary diagnostic criteria. After chart review, we excluded 23 patients who had undergone surgery before admission, 65 patients with symptoms present for more than 24 h, and 1 patient with missing data. Ultimately, 811 eligible participants were included for analysis (Fig. 1). In addition, Table 1 demonstrates a comparison of baseline characteristics between the training set and the testing set. No significant differences were observed between the training and testing sets for almost all variables.

Table 1 Baseline characteristics of participants.


Screening of feature variables

For preliminary modeling, this study used data from all patients in the training cohort for feature selection and model development. Variables were screened using LASSO regression combined with the Boruta algorithm. First, the optimal regularization parameter λ was determined by ten-fold cross-validation of the LASSO regression, and λ.min, the value at which the cross-validated deviance is minimal, was chosen. At this value of λ, the model contained 15 nonzero coefficients, corresponding to the LASSO-screened variables (Fig. 2A, B). We also screened all features with the Boruta algorithm to assess the importance of each feature and identify those that contribute significantly to DVT after intracerebral hemorrhage (Fig. 2C, D). Through this process, seven variables were identified as significant features (Fig. 2E). Taking the variables common to the 15 LASSO-selected and 7 Boruta-selected candidates yielded six significant factors associated with DVT, which were treated as independent predictors: age, leukocytes, admission systolic blood pressure, surgical condition, hemorrhage volume, and midline shift. In addition, we performed Spearman correlation analysis on these features to assess collinearity between the variables (Fig. 2F). None of the correlation coefficients between features exceeded 0.8, indicating that there was no significant collinearity between the variables.
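
The two screening steps described above, intersecting the LASSO and Boruta selections and then checking pairwise correlations against the 0.8 cut-off, can be sketched in a few lines of Python. This is an illustration under stated assumptions: the six shared predictor names come from the text, but the `lasso_only_*` and `boruta_only_*` entries are placeholders (the text does not list the variables that only one method selected), and the numeric columns are invented toy data.

```python
import math
from itertools import combinations

# The six predictors named in the text.
shared = {"age", "leukocytes", "admission_sbp", "surgery",
          "hematoma_volume", "midline_shift"}
# Placeholder names for variables chosen by only one method (hypothetical).
lasso_selected = shared | {f"lasso_only_{i}" for i in range(1, 10)}  # 15 total
boruta_selected = shared | {"boruta_only_1"}                          # 7 total
final_features = lasso_selected & boruta_selected                     # intersection


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def collinear_pairs(columns, threshold=0.8):
    """Feature pairs whose absolute Spearman correlation exceeds the threshold."""
    return [(a, b) for a, b in combinations(columns, 2)
            if abs(spearman(columns[a], columns[b])) > threshold]


# Invented toy columns: x2 increases monotonically with x1, x3 does not.
toy = {"x1": [1, 2, 3, 4, 5], "x2": [2, 4, 6, 8, 11], "x3": [3, 1, 4, 1, 5]}

print(len(final_features))   # 6
print(collinear_pairs(toy))  # [('x1', 'x2')]
```

An empty list from `collinear_pairs` would correspond to the situation reported above, where no pair of features exceeded the 0.8 threshold.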

Testing of machine learning models

Model training and testing were based on the selected features described above. In the test set, we evaluated five ML algorithms for DVT prediction. The LGBM, XGBoost, RF, SVM, and LR models obtained AUCs of 0.998, 0.981, 0.839, 0.754, and 0.729, respectively; LGBM thus had the highest AUC among the five models (Fig. 3A). Calibration curves were used to assess the reliability of the predicted probabilities. Among the five machine learning models, the LGBM curve lay closest to the diagonal and achieved the lowest Brier score (Fig. 3B). DCA was performed on the test dataset to compare the net benefit of the best model against alternative clinical decision strategies. Across threshold probabilities, the LGBM model showed a higher net benefit than the other models (Fig. 3C).
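
For readers unfamiliar with decision curve analysis, the quantity plotted at each threshold probability p_t is the net benefit, conventionally defined as TP/n − (FP/n) × p_t/(1 − p_t), and it is compared with the "treat all" and "treat none" (net benefit 0) strategies. The Python sketch below applies that standard definition to invented data; the function names and the toy labels and probabilities are assumptions, not the study's results.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of treating every patient regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical toy data (1 = DVT, 0 = no DVT); not taken from the study.
labels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
probs = [0.9, 0.1, 0.8, 0.3, 0.2, 0.6, 0.4, 0.1, 0.2, 0.7]

print(round(net_benefit(labels, probs, 0.5), 3))        # 0.2
print(round(net_benefit_treat_all(labels, 0.5), 3))     # -0.4
```

A decision curve repeats this calculation over a range of thresholds; a model is clinically useful at a given threshold when its net benefit exceeds both alternatives.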

Model evaluation

To further evaluate the performance of the models, this study compared common performance metrics of different machine learning prediction models. The results of the study revealed that the LGBM model showed the best prediction results among all models (Fig. 3D). Compared with other models, LGBM excelled in several key metrics, including accuracy (0.9433), recall (0.9785), F1 score (0.9504), MCC (0.8863), and AUROC (0.9881) (Supplementary Table 1). These results suggested that the LGBM model had significant advantages in predicting DVT after intracerebral hemorrhage. Therefore, LGBM was selected as the optimal model for predicting DVT after cerebral hemorrhage in this study.
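
The metrics listed above all derive from the four cells of a confusion matrix. The Python sketch below shows the standard definitions; the counts are hypothetical and were chosen only to illustrate the calculation, so they do not reproduce the values in Supplementary Table 1.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "precision": precision,
        "f1": 2 * precision * recall / (precision + recall),
        # MCC is set to 0 when any marginal total is zero.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Unlike accuracy, the MCC uses all four cells symmetrically, which is why it is often reported alongside accuracy when classes are imbalanced.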

Description of the LGBM model using the SHAP

The SHAP algorithm was employed to assess the contribution of each predictor variable to the outcomes predicted by the LGBM model. The variable importance plot ranked the predictors in descending order of their importance (Fig. 4A). Hematoma volume had the strongest predictive value for all predictive ranges, followed by systolic blood pressure, age, leukocytes, surgical treatment, and midline shift. Meanwhile, SHAP values were applied to reveal DVT risk factors to detect the positive or negative relationship of predictors with the target outcome. As shown in Fig. 4A, the horizontal position shows whether the effect of the value is associated with a higher or lower prediction, and the color shows whether the variable is high (red) or low (yellow) for that observation. It can be seen that an increase in mean hematoma volume has a positive effect and drives the prediction toward DVT. To further enhance the visualization, a single-sample force plot for the first patient with lower-extremity DVT in this database is presented in Fig. 4B. The SHAP values represent the features that influence the prediction for each individual patient, as well as the specific contribution of each feature to the prediction of the outcome. As shown in Fig. 4B, the red features (on the left side) represent factors that are linked to a higher risk of DVT, while the blue features correspond to factors associated with a lower risk of DVT. In this patient’s case, surgery, age, white blood cell count, and hematoma volume were the main risk factors. The length of the arrows helps to visualize the magnitude of each feature’s impact on the predicted outcome; the longer the arrow, the greater the impact of the feature on the prediction.
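
The additive reading of a force plot, in which per-feature contributions sum to the difference between the patient's prediction and a reference value, follows from the definition of Shapley values. The Python sketch below computes exact Shapley values by brute-force enumeration for a small hand-written scoring function. It is a conceptual illustration only: the `toy_score` function, its coefficients, the patient values, and the single-point baseline are all assumptions, and the SHAP library used with tree models such as LGBM relies on a faster tree-specific algorithm and a background dataset rather than this enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are set to their baseline value.
    Cost grows as 2**n, so this is only practical for a few features.
    """
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                total += weight * (value(set(s) | {i}) - value(set(s)))
        phi.append(total)
    return phi


def toy_score(z):
    """Hypothetical score with an interaction term; not the study's model."""
    volume, sbp, surgery = z
    return 0.02 * volume + 0.001 * sbp + 0.1 * surgery + 0.005 * volume * surgery


patient = [40, 180, 1]    # hematoma volume (mL), systolic BP (mmHg), surgery (yes)
reference = [10, 140, 0]  # hypothetical baseline patient

phi = shapley_values(toy_score, patient, reference)
print([round(v, 3) for v in phi])                               # [0.675, 0.04, 0.225]
print(round(sum(phi), 3))                                       # 0.94
print(round(toy_score(patient) - toy_score(reference), 3))      # 0.94
```

The contributions sum exactly to the gap between the patient's score and the baseline score, which is the property that lets a force plot be read as a decomposition of a single prediction.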

Discussion

Patients with spontaneous ICH are at a higher risk of developing DVT of the lower extremities, primarily due to factors such as prolonged bed rest and limited mobility during hospitalization. Studies have shown that the incidence of DVT in ICH patients during hospitalization is approximately 20–40%¹. These patients are often unable to perform normal limb activities because of neurological impairment, which leads to blood stasis and increases the risk of DVT. Currently, there are relatively few risk prediction models for lower extremity DVT in patients with spontaneous ICH, and most of the available risk assessment models (e.g., Caprini, Padua, and Autar) were developed in the general hospitalized patient population or in a specific disease group, such as patients with acute ischemic stroke. Although these models have some validity in predicting VTE risk, they have not been optimized for the clinical characteristics of patients with intracerebral hemorrhage. A DVT risk prediction model developed for patients with intracerebral hemorrhage would allow a more accurate assessment of a patient’s risk of developing a thrombus during hospitalization and thereby support individualized clinical prophylaxis. This would help improve the effectiveness of thromboprophylaxis, reduce unnecessary therapeutic interventions, lower the risk of bleeding complications, and ultimately improve the long-term prognosis of patients.

To predict the risk of lower extremity DVT in patients with spontaneous intracerebral hemorrhage, we retrospectively analyzed data from the Second Affiliated Hospital of Fujian Medical University and used the Boruta and LASSO algorithms to screen independent predictors: age, leukocytes, admission systolic blood pressure, surgical condition, hemorrhage volume, and midline shift. Based on these factors, we applied five machine learning algorithms (RF, LR, SVM, LGBM, and XGB) to construct DVT prediction models. The AUCs of the LGBM, XGBoost, RF, SVM, and LR models were 0.998, 0.981, 0.839, 0.754, and 0.729, respectively. Discrimination was evaluated by ROC curves, and calibration was evaluated using calibration curves with Bootstrap internal validation (B = 1000); the LGBM model showed the best Brier score and lay close to the diagonal. DCA assessed the clinical value of the models, and the LGBM showed good results in terms of accuracy (0.9433), recall (0.9785), F1 score (0.9504), MCC (0.8863), and AUROC (0.9881). Therefore, the LGBM was selected as the optimal model for predicting DVT after intracerebral hemorrhage in this study. The application of SHAP values further elucidates the extent to which several important contributors influence the risk of DVT.

This study identified six independent predictors, which are mechanistically and clinically plausible. Intracerebral hemorrhage can trigger a systemic inflammatory response that activates leukocytes and releases cytokines and chemotactic factors, thereby promoting a thrombotic tendency. Inflammatory mediators such as tumor necrosis factor (TNF-α) and interleukins (IL-6, IL-8) activate endothelial cells, promote their secretion of procoagulant substances, and enhance blood coagulation activity¹⁶. In particular, TNF-α and CD40 ligands (TNF family membrane glycoproteins on platelets) induce tissue factor (TF) expression on endothelial cells and monocytes through a nuclear factor κB-dependent mechanism¹⁷. The protein C pathway is particularly sensitive to down-regulation by IL-1β and TNF-α. IL-1β limits protein C activation and promotes thrombin accumulation by inhibiting transcription of thrombomodulin (TM) and the endothelial protein C receptor (EPCR)¹⁸. In addition, IL-1β causes EPCR shedding and generates soluble EPCR (sEPCR), which inhibits protein C activation¹⁹. TNF-α acts directly on endothelial cells, inducing the release of tissue-type plasminogen activator (tPA) and urokinase-type plasminogen activator while also increasing plasminogen activator inhibitor-1 (PAI-1), which inhibits fibrinolysis and leads to incomplete fibrin removal²⁰. Studies have shown that leukocyte activation and aggregation triggered by the inflammatory response is one of the important factors in deep vein thrombosis²¹. Thrombosis depends not only on the traditional Virchow’s triad (slowing of blood flow, alteration of blood composition, vascular damage) but is also closely related to the systemic inflammatory response²¹. In particular, the aggregation and activation of leukocytes increases the risk of thrombosis in the veins, especially in areas of slower venous blood flow, such as the lower extremities²². The intense systemic inflammatory response induced by massive bleeding further increases the probability of thrombosis by activating the coagulation system and inhibiting fibrinolysis²³. In addition, bleeding-induced blood pressure fluctuations and other circulatory stress responses may alter blood flow dynamics, thereby exacerbating the risk of DVT²⁴.

Abnormal systolic blood pressure, especially elevated systolic blood pressure in the acute phase, is strongly associated with the risk of thrombosis²⁵. High systolic blood pressure increases intravascular blood pressure, leading to hemodynamic changes. During acute intracerebral hemorrhage, patients are often in a state of high blood pressure fluctuation. High systolic blood pressure increases the turbulence of blood flow and increases the contact of blood with the vessel wall, which makes it easier to trigger thrombus formation²⁶. High systolic blood pressure in the acute phase leads to mechanical damage to the vascular endothelium, and endothelial damage is the basis for thrombosis. After endothelial cell injury, exposed collagen and other substances can activate platelets and contribute to thrombus formation. This injury is particularly pronounced in the low-flow regions of lower extremity veins, increasing the risk of DVT. High systolic blood pressure is often accompanied by slower blood flow, especially in the lower extremity veins, and when blood flow is slow, clotting factors and platelets in the blood tend to aggregate and form thrombi²⁷. High systolic blood pressure may also lead to the formation of microthrombi, which tend to accumulate in the veins where blood flow is slow, eventually leading to DVT²⁸. High systolic blood pressure may lead to an increase in total systemic vascular resistance, which further affects venous return in the lower extremities, causing blood pooling and increasing the probability of venous thrombosis²⁹. After intracerebral hemorrhage, the patient’s activity is limited, and with abnormal systolic blood pressure, the blood flow is even slower, which makes it very easy to form a thrombus. The clotting ability of the blood tends to increase with age, and the activity of coagulation factors in the blood of the elderly may increase, while the activity of the fibrinolytic system may decrease³⁰. 
This change makes the blood more likely to clot, increasing the risk of thrombosis. Among older adults, the venous blood vessels may show some degree of sclerosis and reduced elasticity, and the elasticity and contractility of the vessel walls are reduced, leading to slower blood flow and increased stagnation time of blood in the veins, which increases the chances of thrombus formation³¹. In addition, patients with significant midline shifts usually require surgical intervention, and these patients are often required to be bedridden for long periods after surgery, resulting in limited limb movement, especially in the lower extremities, where blood flow becomes poor. Similarly, this state significantly increases the risk of lower extremity thrombosis in postoperative patients due to obstruction of venous return.

Machine learning has shown great potential for disease diagnosis and prognosis assessment by recognizing relationships in complex clinical data³². It can incorporate general clinical conditions, past medical history, imaging features, and biochemical characteristics, resulting in an ‘information gain’ compared to models that use only one of these data categories³³. SHAP values quantify the contribution of each variable to the predicted outcome, with positive SHAP values indicating a risk-promoting effect and negative values representing a protective effect. This approach not only improves the interpretability and transparency of the model but also supports individualized risk assessment. In recent years, such models have increasingly been deployed as web applications, which may in the future enable real-time identification of high-risk groups, promote early intervention, and thus improve the clinical prognosis of patients³⁴,³⁵.

Our study has several strengths. We collected not only clinical examination and laboratory data but also included imaging information to validate and cross-support the patient’s diagnosis from multiple perspectives. Combining imaging data with clinical and laboratory data helps provide more accurate analysis and results. Meanwhile, collecting data from multiple variable perspectives lays the foundation for subsequent big data analysis and mining of potential health patterns or disease prediction models. This not only provides accurate medical solutions for current patients but also informs the treatment of similar patients in the future. By simultaneously considering multiple potential confounders and applying statistical methods to control and adjust for them, it is possible to reduce bias and ensure the accuracy of study results. For predictive analytics (e.g., using machine learning or regression modeling), these multivariate data can help build more robust predictive models. In addition, by combining machine learning and SHAP explanatory modeling, the proportion of the contribution of each factor in the model can be revealed more clearly.

However, this study still has some limitations. First, it only collected data at the time of admission and did not analyze changes in patient condition or prognosis throughout treatment. Longitudinal data (e.g., follow-up data) could provide more comprehensive information to help better understand disease progression, treatment outcomes, and long-term patient health trends. In addition, this study did not include patients with intracerebral hemorrhage with small bleeding volumes, who are usually admitted by the Department of Neurology. Therefore, there may be some selection bias in the study population, which may limit the generalizability of the results to other groups. The present study made every effort to include variables related to DVT, but some variables were ultimately excluded from the analysis because of a high proportion of missing values. One example is the NIHSS score: patients with higher NIHSS scores are more likely to experience severe neurological deficits, including impaired motor function of the limbs, which substantially increases their risk of developing lower extremity deep vein thrombosis. Although the SHAP approach offers an intuitive explanation of high-risk factors for DVT in ICH patients and helps clinicians understand key predictor variables, improvements in model performance and generalizability are needed to better serve diverse clinical settings. Future studies should prioritize optimization of model utility and development of user-friendly interfaces that allow clinicians to access risk prediction results in real time.

Conclusion

Using retrospective analysis and machine learning, we developed a model to predict the risk of lower extremity deep vein thrombosis in hospitalized patients with spontaneous intracerebral hemorrhage. The model can identify high-risk patients and support clinical decision-making. By integrating multidimensional data such as patients’ clinical characteristics, laboratory test results, and imaging data, it predicts the probability of DVT for each individual patient. For patients at high risk of DVT, early interventions, including mechanical prophylaxis, bedside rehabilitation training, and prophylactic anticoagulation, can be implemented to achieve personalized management and early prevention, ultimately improving patient outcomes.

Data availability

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Funding

This study was supported by Quanzhou Science and Technology Bureau guiding project (Grant number: 2023N064S) and Joint funds for the innovation of science and technology, Fujian province (Grant number: 2023Y9239).

Author information

Weizhi Qiu, Penglei Cui and Shaojie Li contributed equally to this work.

Authors and Affiliations

Department of Neurosurgery, The Second Affiliated Hospital of Fujian Medical University, Quanzhou, 362000, Fujian, People’s Republic of China
Weizhi Qiu, Penglei Cui, Shaojie Li, Zhenzhou Tang, Jiani Chen, Jiayin Wang & Yasong Li

Contributions

All authors made a significant contribution to the work reported, whether that is in the conception, study design, execution, acquisition of data, analysis and interpretation, or in all these areas; took part in drafting, revising or critically reviewing the article; gave final approval of the version to be published; have agreed on the journal to which the article has been submitted; and agree to be accountable for all aspects of the work.

Corresponding author

Correspondence to Yasong Li.

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethical statements

Ethical Approval and Consent to Participate: The study protocol was approved by the Ethics Committee of The Second Affiliated Hospital of Fujian Medical University. This study was conducted in accordance with the Declaration of Helsinki. Due to the retrospective nature of the study, the Ethics Committee of The Second Affiliated Hospital of Fujian Medical University, Quanzhou, Fujian Province, China waived the need to obtain informed consent (Ethics Number: 2024-677). Additionally, this research exclusively used previously collected medical record information from which all personally identifiable information had been removed, ensuring no risk to the subjects and no adverse effects on their rights and health.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Supplementary Material 2

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Qiu, W., Cui, P., Li, S. et al. Machine learning models predict risk of lower extremity deep vein thrombosis in hospitalized patients with spontaneous intracerebral hemorrhage. Sci Rep 15, 24932 (2025). https://doi.org/10.1038/s41598-025-10905-2

Received: 22 February 2025
Accepted: 07 July 2025
Published: 10 July 2025
DOI: https://doi.org/10.1038/s41598-025-10905-2