Clinical subtypes identification and feature recognition of sepsis leukocyte trajectories based on machine learning

Miao, ShengHui; Liu, YiJing; Li, Min; Yan, Jing

doi:10.1038/s41598-025-96718-9

Article
Open access
Published: 10 April 2025

ShengHui Miao¹^na1,
YiJing Liu²^na1,
Min Li¹ &
…
Jing Yan³

Scientific Reports volume 15, Article number: 12291 (2025)

Abstract

Sepsis is a highly variable condition, and tracking leukocyte patterns may offer insights for tailored treatment and prognosis. We used the MIMIC-IV database to analyze patients diagnosed with Sepsis-3 within 24 h of ICU admission. Latent class mixed models (LCMM) were applied to leukocyte trajectories to identify sepsis subtypes. The primary outcome was 28-day all-cause mortality, with secondary outcomes including the need for life-support therapies. Associations between leukocyte trajectories and outcomes were assessed using multivariate regression, and findings were externally validated with the eICU database. An XGBoost model was used to identify the baseline characteristics of high-mortality-risk sepsis subgroups, so that subgroup allocation could be predicted at ICU admission, and the SHAP method was applied to interpret the contribution of each variable to the model. Among 7410 sepsis patients, eight distinct leukocyte trajectory subtypes were identified. Among those subtypes, patients with persistently high leukocyte levels had the poorest prognosis (HR 3.00; 95% CI 2.48–3.62) and a significantly greater need for life-support therapies; patients with persistently low leukocyte levels had a higher risk of death (HR 1.68; 95% CI 1.24–2.27) but were less likely to receive invasive mechanical ventilation. Incorporating early ICU baseline variables into an XGBoost algorithm enabled effective prediction of high-mortality-risk subgroups (AUC > 0.8). The SHAP method revealed distinct early clinical characteristics between the hyperinflammatory subtypes (classes 4, 7, and 8) and the hypoinflammatory subtype (class 1). In ICU-admitted sepsis patients, eight leukocyte trajectories were identified; these are key independent predictors of prognosis, distinct from single leukocyte measurements. High-mortality-risk subgroups exhibit distinct clinical characteristics at ICU admission, providing valuable insights for their prediction and personalized early intervention.

Introduction

Sepsis is currently defined as life-threatening organ dysfunction resulting from a dysregulated host response to infection^1,2. It represents a major global health and economic challenge, contributing to approximately 19.7% of global deaths^3,4. However, sepsis is a highly heterogeneous condition, with patients exhibiting varied responses to the same treatments and different patterns of organ dysfunction, which significantly impacts prognosis. Identifying distinct clinical subtypes of sepsis is therefore essential for advancing personalized treatment approaches and improving prognostic accuracy^5,6. Uncontrolled systemic inflammation is a central feature of sepsis and plays a critical role in the development of organ dysfunction. Consequently, assessing a patient’s inflammatory status is a key component in managing the disease. Research has shown that clinical subtypes of sepsis can be differentiated based on patients’ inflammatory responses^7,8,9,10. One of the most commonly used markers of inflammation is leukocyte count. Tracking the trajectory of leukocyte levels over time may offer deeper insights into a patient’s treatment response compared to a single static measurement. This suggests that leukocyte trajectories could be valuable for identifying distinct sepsis subtypes and guiding more precise treatment strategies^11,12. The objective of this study was to develop, evaluate and predict sepsis subtypes. The first goal was to determine whether distinct leukocyte trajectory-based subtypes in patients with sepsis can be identified through the electronic health records. The second goal was to understand whether those different subtypes are associated with the patterns of biomarkers and clinical outcomes. The third goal was to determine whether the high-risk mortality subtypes can be identified using patient baseline characteristics and early-stage clinical features upon ICU admission.

Methods

Data sources: This study utilized data from two large public databases. Development cohort: Data were obtained from the Medical Information Mart for Intensive Care IV (MIMIC-IV) database, version 3.0. This dataset includes de-identified electronic health records from 364,627 patients hospitalized at Beth Israel Deaconess Medical Center between 2008 and 2019¹³. Validation cohort: The eICU Collaborative Research Database, a multi-center U.S. database, provided de-identified health data from over 200,000 ICU admissions occurring between 2014 and 2015¹⁴.

Study population: As illustrated in Fig. 1, this study included ICU patients diagnosed with sepsis based on Sepsis 3.0 criteria, defined as a suspected infection accompanied by an increase in the SOFA score of 2 points or more¹. In the database, suspected infection was identified through the administration of intravenous antibiotics and the collection of blood cultures. The exclusion criteria were: (1) patients under 18 years of age; (2) patients with multiple ICU admissions; (3) ICU stays shorter than 4 days; (4) patients diagnosed with sepsis more than 24 h after ICU admission; (5) patients with AIDS; (6) patients with leukemia; and (7) patients lacking sufficient leukocyte count data for model construction (i.e., at least one leukocyte count recorded within each of the following time intervals: 0–24 h, 24–48 h, 48–72 h, and 72–96 h post-ICU admission).
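The exclusion criteria above, including the leukocyte coverage rule in criterion (7), can be sketched as a simple eligibility check. This is a minimal illustration, not the authors' extraction code: the function and argument names are hypothetical, and treating each window as half-open (start inclusive, end exclusive) is an assumption, since the paper does not state how boundary times were handled.

```python
from typing import Iterable

# Assumed window edges, in hours after ICU admission (half-open intervals).
WINDOWS = [(0, 24), (24, 48), (48, 72), (72, 96)]


def has_full_wbc_coverage(hours_since_admission: Iterable[float]) -> bool:
    """Return True if at least one leukocyte count falls in every 24 h window."""
    covered = set()
    for t in hours_since_admission:
        for i, (start, end) in enumerate(WINDOWS):
            if start <= t < end:
                covered.add(i)
    return len(covered) == len(WINDOWS)


def is_eligible(age_years, n_icu_admissions, icu_los_days,
                sepsis_onset_hours, has_aids, has_leukemia, wbc_hours):
    """Apply the seven exclusion criteria; all argument names are illustrative."""
    return (
        age_years >= 18                 # (1) adults only
        and n_icu_admissions == 1       # (2) no multiple ICU admissions
        and icu_los_days >= 4           # (3) ICU stay of at least 4 days
        and sepsis_onset_hours <= 24    # (4) sepsis diagnosed within 24 h
        and not has_aids                # (5)
        and not has_leukemia            # (6)
        and has_full_wbc_coverage(wbc_hours)  # (7)
    )
```

A patient with counts at 2, 30, 50, and 80 h passes criterion (7), whereas one with counts at 2, 3, 50, and 80 h does not, because the 24–48 h window is empty.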

Outcomes: Primary outcome: 28-day all-cause mortality. Secondary outcomes: The use of life-support treatments within 7 days, including vasopressors, invasive mechanical ventilation, and continuous renal replacement therapy (CRRT).

Baseline characteristics

As detailed in Table 1, we collected demographic data (age, sex, ICU type, race, weight), SOFA scores, and information on chronic comorbidities (chronic heart failure, myocardial infarction, chronic lung disease, chronic kidney disease, chronic liver disease, rheumatologic disease, diabetes, and malignancy). Baseline vital signs (temperature, heart rate, respiratory rate, mean arterial pressure, and SpO₂) and laboratory values (e.g., hemoglobin, leukocyte count, platelet count, albumin, liver enzymes, bicarbonate, blood urea nitrogen, creatinine, APTT, INR, pH, arterial blood gases, serum sodium, and potassium) were also collected.

Table 1 Baseline characteristics.

Outliers were handled using a capping method, with values above the 99th percentile (P99) replaced by P99 and values below the 1st percentile (P1) replaced by P1 (Fig. S1). Missing data for all baseline variables did not exceed 30% (Fig. S2), and multiple imputation was performed using the MICE package¹⁵. For numerical variables, the worst values within the first 96 h of ICU admission were used. For variables with both upper and lower bounds (e.g., serum sodium, serum potassium), we selected the value furthest from the normal range using a custom SQL aggregation function (“Farthest”; see supplementary materials). For variables with only an upper bound (e.g., SOFA score, lactate, liver enzymes), the maximum value was chosen, while the minimum value was used for variables with only a lower bound (e.g., PaO₂, PaO₂/FiO₂).
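The capping rule and the "farthest from normal" selection described above can be illustrated as follows. This is a sketch under stated assumptions: the study used a custom SQL aggregation function whose exact definition is in its supplementary materials, so the Python stand-in below (distance measured from the nearest bound of the normal range) is only one plausible reading. The linear-interpolation percentile and the sodium reference range used in the usage note are also assumptions.

```python
def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 100]) of an already sorted list."""
    if not sorted_vals:
        raise ValueError("empty input")
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def cap_outliers(values, lower_q=1.0, upper_q=99.0):
    """Replace values below P1 with P1 and values above P99 with P99."""
    s = sorted(values)
    p_lo, p_hi = percentile(s, lower_q), percentile(s, upper_q)
    return [min(max(v, p_lo), p_hi) for v in values]


def farthest_from_normal(values, normal_low, normal_high):
    """Pick the value lying farthest outside [normal_low, normal_high].

    Values inside the range have distance 0; on ties the first value wins.
    """
    def distance(v):
        if v < normal_low:
            return normal_low - v
        if v > normal_high:
            return v - normal_high
        return 0.0

    return max(values, key=distance)
```

For example, with an illustrative sodium range of 135–145 mmol/L, `farthest_from_normal([138, 128, 150], 135, 145)` selects 128, because it lies 7 units below the range while 150 lies only 5 units above it.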

Statistical analysis

We compared and described baseline characteristics across patient groups. Continuous variables were reported as mean (standard deviation) or median (interquartile range), with differences assessed using the t-test or Mann–Whitney U test, respectively. Categorical variables were presented as counts (percentages), and group comparisons were made using the chi-square test.

In our study, latent class mixed models (LCMM) were employed to analyze longitudinal time-series data for the identification of distinct latent subgroups¹⁶. We applied LCMM to classify patients in the training cohort based on their leukocyte count trajectories during the first 96 h of ICU admission. Models with 2–10 classes were tested, and the optimal model was selected by minimizing the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). To test whether the selected model was the best choice for this study, we conducted a multi-dimensional evaluation. Each patient was assigned to the subgroup with the highest posterior probability, and posterior probabilities were used to evaluate the accuracy of these assignments. We also employed Vuong's likelihood ratio test (VLMR) to compare the goodness-of-fit of models with different numbers of classes. Additionally, we calculated the Mean Entropy and Normalized Entropy to evaluate the model's classification stability and determinacy.
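The model-selection step can be sketched with the standard definitions AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where k is the number of estimated parameters and L the maximized likelihood. The snippet is illustrative only: taking n as the number of patients is an assumption, the helper names are hypothetical, and the log-likelihoods in the usage note are invented rather than taken from Table S1.

```python
import math


def aic(log_lik, n_params):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_lik


def bic(log_lik, n_params, n_subjects):
    """Bayesian Information Criterion: k ln n - 2 ln L."""
    return n_params * math.log(n_subjects) - 2 * log_lik


def select_n_classes(fits, n_subjects):
    """Return the class counts minimizing AIC and BIC, respectively.

    fits maps a candidate class count to (log_likelihood, n_parameters).
    """
    best_aic = min(fits, key=lambda g: aic(*fits[g]))
    best_bic = min(fits, key=lambda g: bic(fits[g][0], fits[g][1], n_subjects))
    return best_aic, best_bic
```

With invented fits `{2: (-1000.0, 8), 3: (-980.0, 12), 4: (-978.0, 16)}` and 500 subjects, both criteria favor three classes: the fourth class improves the log-likelihood too little to offset its extra parameters.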

The same model selection method was applied to the validation cohort, and the model was subjected to the same multi-dimensional evaluation after subtype classification. The best classification model was consistently obtained in both the development and validation cohorts. To assess the impact of different leukocyte trajectories on 28-day mortality, Cox regression models and Kaplan–Meier survival curves were constructed. Additionally, logistic regression was performed to examine the association between subgroups and the use of vasopressors, mechanical ventilation, CRRT within 7 days, and 28-day mortality. To establish the independent effect of subgroup classification on outcomes, multivariate regression analyses were conducted, adjusting for all baseline variables. We further developed an XGBoost model for early prediction of the high-mortality-risk subtypes at ICU admission. The candidate predictive factors included demographics, comorbidities, SOFA score, laboratory indicators, and vital signs. After confirming that the model had good predictive ability, Shapley additive explanations (SHAP) were used to assess the predictive contribution of variables for high-mortality subtypes.
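As a reminder of what a Kaplan–Meier survival curve computes, the product-limit estimator can be written in a few lines. This is a generic textbook sketch in Python, not the R code used in the study; the function name and the toy follow-up data in the usage note are invented for illustration.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up time for each patient (e.g. days since ICU admission)
    events: 1 if death was observed at that time, 0 if the patient was censored
    Returns a list of (time, survival probability) at each distinct death time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve
```

For five hypothetical patients with times `[5, 10, 10, 28, 28]` and events `[1, 1, 0, 0, 0]`, survival drops to 0.8 at day 5 (one death among five at risk) and to about 0.6 at day 10 (one death among four at risk); the censored patients only reduce the number at risk.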

All statistical analyses were performed using R software (version 4.3.3), and LCMM models were constructed with the “lcmm” package¹⁷.

Results

In the development cohort, 7410 patients were included to build the classification model. Based on Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) comparisons, the model with eight groups provided the best fit (Table S1). The results of the multi-dimensional evaluation of this model were as follows (Fig. S5, Tables S2 and S3). The posterior probabilities ranged from 0.71 to 0.90, exceeding the acceptance threshold of 0.7 and indicating an acceptable model fit. The VLMR test comparing the 8-class and 7-class models was significant (p < 0.001), whereas the comparison between the 8-class and 9-class models was not (p = 0.11). The 8-class model exhibited a lower Mean Entropy (0.483) and a higher Normalized Entropy (0.768). Together, these performance metrics identified the 8-class model as the optimal classification model.
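The posterior-probability and entropy checks can be illustrated with a short sketch. The paper does not spell out its formulas for Mean Entropy and Normalized Entropy, so the code below uses one commonly reported form, the relative entropy 1 − Σᵢ Σₖ (−pᵢₖ ln pᵢₖ) / (N ln K), where values near 1 indicate well-separated classes. Whether this matches the authors' definition is an assumption, and all function names are hypothetical.

```python
import math


def assign_classes(posteriors):
    """Assign each patient to the class with the highest posterior probability."""
    return [max(range(len(p)), key=p.__getitem__) for p in posteriors]


def mean_posterior_by_class(posteriors):
    """Average posterior probability of the assigned class, per class."""
    sums, counts = {}, {}
    for p, k in zip(posteriors, assign_classes(posteriors)):
        sums[k] = sums.get(k, 0.0) + p[k]
        counts[k] = counts.get(k, 0) + 1
    return {k: sums[k] / counts[k] for k in sums}


def relative_entropy(posteriors):
    """1 - (total classification entropy) / (N ln K); 1 means perfect separation."""
    n = len(posteriors)
    k = len(posteriors[0])
    total = -sum(p * math.log(p) for row in posteriors for p in row if p > 0)
    return 1.0 - total / (n * math.log(k))
```

Under this reading, a per-class mean posterior above 0.7 corresponds to the acceptance threshold quoted above: perfectly crisp assignments give a relative entropy of 1, and completely ambiguous ones give 0.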

For external validation, 7564 patients were analyzed to build the classification model. The model with eight leukocyte trajectory groups again had the lowest AIC and BIC values (Table S1). The model was subjected to the same multi-dimensional evaluation (Fig. S5, Tables S2 and S3). Posterior probabilities ranged from 0.72 to 0.90. The VLMR test comparing the 8-class and 7-class models was significant (p < 0.001), whereas the comparison between the 8-class and 9-class models was not (p = 0.09). The 8-class model exhibited a lower Mean Entropy (0.524) and a higher Normalized Entropy (0.748). These metrics further validated the 8-class model as the optimal classification model. Further statistical analysis revealed significant differences in clinical characteristics and outcomes across the eight subtypes, in both the development and validation cohorts, supporting the model's external validity.

As shown in Fig. 2, the early leukocyte trajectories after ICU admission and the proportion of patients in each group in both cohorts were as follows:

Class 1 (red, stable, low, 2.5%/2.9%): Consistently low leukocyte levels.

Class 2 (yellow, stable, normal, 18.7%/17.7%): Stable leukocyte levels within the normal range.

Class 3 (yellow-green, stable, high, 36.6%/29.2%): Slightly elevated leukocyte levels with minimal fluctuations.

Class 4 (green, stable, very high, 5.9%/7.7%): Persistently high leukocyte levels.

Class 5 (light blue, decreasing, high, 15.6%/18.2%): Elevated leukocyte levels with a decreasing trend.

Class 6 (dark blue, increasing, high, 15.8%/16.1%): Slightly elevated leukocyte levels with an increasing trend.

Class 7 (purple, decreasing, very high, 2.3%/3.7%): Extremely high leukocyte levels that rapidly decreased.

Class 8 (magenta, increasing, very high, 2.7%/4.5%): Initially normal leukocyte levels that sharply increased.

Class 3 was the most prevalent (36.6%/29.2%), followed by Class 2 (18.7%/17.7%), Class 5 (15.6%/18.2%), and Class 6 (15.8%/16.1%), which were nearly equal in proportion. These results suggest that most sepsis patients have elevated leukocyte levels, with varying degrees of fluctuation.

Relationship Between Classifications and Outcomes:

In the development cohort, 1698 patients died within 28 days, while the validation cohort had 1269 deaths. A Cox regression model (Fig. 3 and Table S6) and survival curves (Fig. 4) were generated, with Class 2 (the group with stable, normal leukocyte levels) serving as the reference group due to its lowest mortality risk. In contrast, Class 4 (persistently very high leukocyte levels) had the highest mortality risk (HR 3.00; 95% CI 2.48–3.62; p < 0.001), followed by Class 7 (rapidly decreasing, very high leukocyte levels, HR 2.08; 95% CI 1.56–2.77; p < 0.001), Class 8 (sharply increasing, very high leukocyte levels, HR 1.80; 95% CI 1.35–2.39; p < 0.001), and Class 1 (consistently low leukocyte levels, HR 1.68; 95% CI 1.24–2.27; p < 0.001). These associations remained significant even after adjusting for baseline variables, including static leukocyte measurements (Fig. 5/Table S8).
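For readers less familiar with how such intervals arise, a hazard ratio and its Wald-type 95% confidence interval are usually obtained from a Cox coefficient β and its standard error as HR = exp(β) and exp(β ± 1.96 × SE). The sketch below assumes that the reported intervals are of this type, which the paper does not state explicitly, and works backwards from the Class 4 estimate as a consistency check.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def hr_with_ci(beta, se):
    """Hazard ratio and Wald 95% CI from a log-hazard coefficient."""
    return (math.exp(beta),
            math.exp(beta - Z_95 * se),
            math.exp(beta + Z_95 * se))


def se_from_ci(lower, upper):
    """Recover the standard error of log(HR) from a reported 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_95)


# Class 4, development cohort: HR 3.00 (95% CI 2.48-3.62)
se = se_from_ci(2.48, 3.62)
hr, lower, upper = hr_with_ci(math.log(3.00), se)
```

Rebuilding the interval around log(3.00) with the recovered standard error returns bounds that round to 2.48 and 3.62, so the reported interval is symmetric on the log scale, as a Wald interval would be.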

The logistic regression model further explored the relationship between leukocyte subtypes and the need for life-support therapies. Subtypes with higher leukocyte levels, particularly Class 4, Class 7, and Class 8, were associated with increased use of life-support treatments, mirroring the trends observed in the Cox model. However, the relationship between subtype classification and life-support use was less pronounced in the multivariate regression model, suggesting that this association may not be fully independent (Fig. 5/Table S9).

Interestingly, despite Class 1 having a higher mortality risk in the Cox model, its use of life-support therapies was not significantly different from Class 2 (the reference group). Class 1 was even associated with a lower need for invasive mechanical ventilation (development cohort: OR 0.56; 95% CI 0.40–0.79; p < 0.001; validation cohort: OR 0.57; 95% CI 0.42–0.77; p < 0.001), a finding that persisted after multivariate adjustment (Fig. 5).

When examining leukocyte trajectory trends, the impact on outcomes was not entirely dependent on the trajectory pattern. For instance, Class 5 (decreasing, high leukocyte levels) and Class 6 (increasing, high leukocyte levels), which had similar overall leukocyte levels within the first 4 days, showed comparable 28-day mortality risks (development cohort: HR 1.44; 95% CI 1.21–1.71 versus HR 1.40; 95% CI 1.18–1.66; validation cohort: HR 1.71; 95% CI 1.39–2.10 versus HR 1.94; 95% CI 1.58–2.39). A similar trend was observed between Class 7 and Class 8 (development cohort: HR 2.08; 95% CI 1.56–2.77 versus HR 1.80; 95% CI 1.35–2.39; validation cohort: HR 2.60; 95% CI 1.95–3.46 versus HR 2.12; 95% CI 1.60–2.82).

Subtype Reproducibility and Prediction:

We further trained an XGBoost model to predict subtypes based on patient characteristics upon ICU admission. To evaluate the predictive performance of the model, we calculated the AUC values for the four high-mortality classifications (Class 1, Class 4, Class 7, and Class 8) in both the development cohort (Dev-Cohort) and validation cohort (Val-Cohort) (Fig. S6). The results showed that all AUC values exceeded 0.8, with similar AUCs between the validation cohort (purple curve) and the development cohort (green curve). Other performance metrics (Table S15) indicated that for Class 1 and 4, PPV was greater than 0.7, while for Class 7 and 8, PPV was close to 0.7. However, considering other performance metrics, NPV exceeded 0.8, accuracy exceeded 0.8, and balanced accuracy was above 0.75 across all classifications in both cohorts, demonstrating strong predictive performance of the model.
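The performance metrics quoted above follow from one-vs-rest comparisons, in which the target class is labeled 1 and all other classes 0. The sketch below shows the standard definitions of AUC (as the probability that a positive case is ranked above a negative one), PPV, NPV, accuracy, and balanced accuracy. It is a generic illustration with hypothetical function names, not the evaluation code used for Fig. S6 or Table S15.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def _ratio(num, den):
    """Safe division; undefined ratios are returned as NaN."""
    return num / den if den else float("nan")


def confusion_metrics(labels, predictions):
    """PPV, NPV, accuracy and balanced accuracy from binary labels and predictions."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    sensitivity = _ratio(tp, tp + fn)
    specificity = _ratio(tn, tn + fp)
    return {
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "accuracy": _ratio(tp + tn, len(pairs)),
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }
```

Because the high-mortality classes are small (2.3% to 7.7% of patients), accuracy and NPV can be high even for a weak classifier, which is why balanced accuracy and PPV are worth reading alongside them.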

When the trained XGBoost model was used to predict high-mortality-risk subtypes, SHAP values showed that each subtype has distinct clinical features (Fig. 6). Variables such as lactate, bicarbonate, platelet count, albumin, and PaO₂ had a significant impact on predicting subtypes related to clinical outcomes. Class 1 (consistently low leukocyte levels) was characterized by lower hemoglobin, platelet count, and creatinine, as well as a lower prevalence of cancer and liver disease. In contrast, low albumin, high platelet count, high creatinine, and high bilirubin made a significant contribution to predicting Class 4 (persistently high leukocyte levels). Low albumin, higher blood urea nitrogen (BUN), high SOFA score, and younger age were stronger predictors of Class 7 (extremely high leukocyte levels that rapidly decreased), while low albumin, low lactate, high heart rate, low pH, high BUN, and high platelet count were more characteristic of Class 8 (initially normal leukocyte levels that sharply increased).
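To make the idea behind SHAP concrete, the sketch below computes exact Shapley values by brute force for a tiny model. It is not the algorithm used by tree-based SHAP explainers for XGBoost, which exploit the tree structure. Here features outside a coalition are simply replaced by a single baseline value, and the two-feature "risk score" in the usage note is an invented toy model, not anything fitted in the study.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one instance.

    model:    callable taking a list of feature values and returning a number
    x:        feature values of the instance being explained
    baseline: reference values used for features outside a coalition
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of coalitions of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi
```

For the toy score `0.5 * lactate - 0.8 * albumin`, explaining a patient with values `[4.0, 2.5]` against a baseline of `[2.0, 3.5]` attributes 1.0 to lactate and 0.8 to albumin. The two contributions sum to the difference between the patient's score and the baseline score, which is the additivity property that SHAP plots rely on.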

Discussion

In this study, we applied a machine learning approach, Latent Class Mixed Models (LCMM), to analyze dynamic time-series data and identify potential leukocyte trajectory subtypes in sepsis patients. In the results, we observed that the 8-class model was selected as the optimal classification model in the development cohort based on the minimum AIC and BIC values (Table S1). A multidimensional evaluation of the classification performance of this optimal model (Fig. S5, Tables S2, S3) indicated that the 8-class model was the best choice: in both cohorts, the posterior probability exceeded the acceptable threshold of 0.7, demonstrating the robustness of the model.

Furthermore, the VLMR test confirmed that the 8-class model had a significantly better fit than the 7-class model (p < 0.001). However, there was no significant difference between the 8-class and 9-class models, suggesting that the 8-class model was sufficiently effective without the risk of overfitting due to an excessive number of classes. The 8-class model also exhibited a lower Mean Entropy and higher Normalized Entropy, indicating higher classification stability and lower uncertainty.

Further data analysis showed that the regression analyses in the validation cohort aligned with the results from the training cohort. Notably, patients with persistently elevated leukocyte levels had the poorest clinical outcomes, a finding that remained consistent even after adjusting for baseline variables, including static leukocyte measurements. Additionally, we developed a multivariable prediction model to identify high-mortality-risk subtypes at ICU admission. The predictive performance metrics (Table S15) indicated that the model effectively predicted high-mortality subphenotypes from baseline characteristics. SHAP values further demonstrated that the impact of different combinations of feature variables on high-mortality subphenotype classification was highly stable (Fig. 6).

In critical care medicine, syndromes are commonly used to categorize patient groups in both clinical practice and research. However, as our understanding of disease complexity deepens, there is a growing recognition of the need for precision medicine. Sepsis, a highly heterogeneous condition, exemplifies this challenge, as identifying distinct clinical subtypes is essential for tailoring treatment and improving prognostic assessments¹⁰. For instance, Bhavani et al. identified four sepsis subtypes based on vital sign trajectories, revealing differences in prognosis and fluid therapy response⁸. Similarly, sepsis subtypes have been defined using organ failure trajectories derived from SOFA scores. Van Amstel et al. explored the relationships between different sepsis classification methods, finding little overlap, except for some similarities between Mars2 and SRS1 in terms of host response biomarkers (p = 0.079–0.424)⁵.

A similar study previously examined leukocyte trajectories in septic shock patients, analyzing 917 cases and identifying seven distinct subgroups. Consistent with our findings, the subgroup with the highest mortality in that study (subgroup five) closely resembled our Class 4 trajectory, which was strongly associated with poor outcomes¹⁸. However, a notable difference in our study is the identification of a low leukocyte subgroup (Class 1), which we interpret as an immunosuppressive phenotype. This subgroup had unique baseline characteristics, including lower values of platelet count, hemoglobin, lactate and creatinine, as well as a relatively lower prevalence of cancer and liver disease (see Table S10). The SHAP values indicate that low values of these feature variables have a stable impact on the classification of this subphenotype (Fig. 6). Significantly fewer patients in this group received invasive mechanical ventilation than in other groups (see Table S9). The profile of the Class 1 subgroup, characterized by a low leukocyte trajectory, relatively stable metabolism, and normal renal function, suggests that these patients lack typical acute immune responses and are less likely to develop respiratory failure. Alternatively, these patients may have opted for a more conservative approach, avoiding invasive interventions like intubation^19,20,21,22.

It’s important to note that immunosuppression in these patients may not be entirely attributable to their comorbidities but could also be a consequence of sepsis itself, highlighting the need for close attention to this phenotype in clinical practice^23,24.

We compared Class 5 and Class 6, as well as Class 7 and Class 8, which had similar areas under the trajectory curve (indicating comparable average leukocyte levels), but showed opposite trends following ICU admission. Despite these contrasting trends, there was no statistically significant difference in direct mortality risk between these groups, and logistic regression analysis confirmed similar findings. This lack of difference may reflect the timing of infection onset: Classes 5 and 7, which displayed a decreasing trend in leukocyte levels, likely had infections prior to ICU admission. In these cases, inflammation may have been better controlled after ICU admission, resulting in a marked reduction in leukocyte levels.
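The notion of "similar areas under the trajectory curve" can be made concrete with the trapezoidal rule: dividing the area by the observation period gives a time-weighted average leukocyte level. The sketch below is illustrative only. The paper does not describe how the areas were computed, and the two trajectories in the usage note are invented mirror images rather than class means from Fig. 2.

```python
def trajectory_area(hours, wbc):
    """Trapezoidal area under a leukocyte trajectory (count x hours)."""
    if len(hours) != len(wbc) or len(hours) < 2:
        raise ValueError("need at least two matching time points")
    area = 0.0
    for i in range(1, len(hours)):
        area += (hours[i] - hours[i - 1]) * (wbc[i] + wbc[i - 1]) / 2.0
    return area


def time_weighted_mean(hours, wbc):
    """Average level over the observation period implied by the trapezoidal area."""
    return trajectory_area(hours, wbc) / (hours[-1] - hours[0])
```

A decreasing trajectory `[20, 16, 12, 8]` and an increasing trajectory `[8, 12, 16, 20]`, both sampled at 0, 32, 64, and 96 h, have identical areas and the same time-weighted mean of 14, which mirrors the situation described for Classes 5 and 6 and for Classes 7 and 8.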

Class 4, representing 5.9% of the development cohort and 7.7% of the validation cohort, exhibited the highest mortality risk. This aligns with clinical observations that persistently elevated leukocyte levels are often associated with severe, hard-to-control infections⁴. Interestingly, despite Classes 7 and 8 displaying higher peak leukocyte levels than Class 4, their outcomes were relatively better. This trend echoes findings from a study by Xu Wang et al., which examined procalcitonin trajectories in sepsis patients during the first 7 days of ICU admission. They found that patients with persistently low procalcitonin levels had worse outcomes compared to those with higher levels but a decreasing trend, paralleling the results in our study²⁵.

Using the XGBoost algorithm combined with SHAP method, we captured the clinical characteristics of high-mortality risk subgroups (Class 1, 4, 7, and 8) at ICU admission. By plotting ROC curves and calculating AUC values, we evaluated the model’s performance. Overall, the model demonstrated high AUCs in the development set (> 0.82) and maintained good generalization ability in the validation set (AUC range: 0.818–0.878), indicating strong discriminative power across different white blood cell trajectory subtypes, excellent predictive performance, and consistent external performance—reflecting good robustness and external validity.

SHAP interpretation method revealed that lactate and albumin were the most influential variables in determining white blood cell trajectory subtypes. Patients with lower albumin and higher lactate levels were more likely to belong to high-risk subgroups. Platelet count (Plt), blood urea nitrogen (BUN), and bicarbonate also showed substantial contributions across several classes, suggesting that coagulation status, renal function, and acid-base balance play important roles in sepsis subtype classification. Additionally, respiratory rate (RR) and PaO₂ contributed notably in specific subtypes, highlighting oxygenation status as a key feature in certain groups²⁶.

Class 1 (persistently low pattern) exhibits distinct clinical characteristics and represents a hypoinflammatory subtype of sepsis, suggesting a potential state of immunosuppression. From a theoretical perspective, immunostimulatory therapies may be beneficial for this group of patients. Common strategies include immunostimulatory cytokines and growth factors (such as GM-CSF, G-CSF, and IL-7), intravenous immunoglobulin (IVIG), mesenchymal stem cells (MSCs), and immune checkpoint inhibitors (e.g., PD-1 inhibitors)²⁷.

The remaining three high-mortality risk subgroups exhibited overall white blood cell levels significantly above the normal range and shared similar clinical characteristics. At ICU admission, hypoalbuminemia emerged as their most prominent feature, accompanied by elevated lactate levels and reduced bicarbonate concentrations (base levels), indicating early hyperlactatemia, poor tissue perfusion, and metabolic acidosis.

Studies have shown that hypoalbuminemia in sepsis is associated with increased albumin clearance, and early albumin resuscitation may improve outcomes^28,29. For these patients, timely albumin supplementation and fluid resuscitation may have unique therapeutic benefits. Additionally, Class 4 and Class 8 were characterized by increased respiratory rate and tachycardia, suggesting a stronger early stress response upon ICU admission in these subgroups.

This study has several limitations. First, it was difficult to assess differences in treatment responses among the identified subtypes, which limits our understanding of how these classifications may inform therapeutic strategies. Future prospective studies are needed to validate the clinical utility of these classifications. Second, to ensure model fit and accuracy, we included only patients who stayed in the ICU for more than 4 days and were diagnosed with sepsis within 24 h of admission. The effectiveness of this classification for excluded patients, such as those with shorter ICU stays or later sepsis diagnoses, remains unclear. Third, we did not collect data on other inflammatory markers, such as C-reactive protein (CRP), procalcitonin (PCT), or heparin-binding protein (HBP), which limits a more comprehensive evaluation of the inflammatory status in these patients. Finally, the retrospective nature of this study may limit the applicability of these findings in prospective clinical settings.

Conclusion

Using the Latent Class Mixed Model (LCMM), we identified eight distinct sepsis subtypes based on leukocyte trajectories within the first 96 h of ICU admission. These subtypes exhibited significant differences in clinical outcomes and organ support requirements, proving to be independent prognostic indicators for sepsis, beyond static leukocyte measurements. External validation with an independent cohort confirmed the robustness of these findings. The XGBoost prediction model, built on baseline characteristics at ICU admission, was able to predict high-mortality phenotypes. The hyperinflammatory and hypoinflammatory subtypes exhibit distinct clinical characteristics in the early phase of ICU admission. Further research is needed to explore the clinical relevance of these subtypes, particularly their potential overlap and interaction with existing sepsis classifications, to enhance personalized treatment strategies.

Data availability

Our data were obtained from the MIMIC-IV 2.2 and eICU-CRD databases, which are available on PhysioNet (https://physionet.org); no additional permission was required.

Abbreviations

AIC: Akaike information criterion
BIC: Bayesian information criterion
CCU: Coronary care unit
CHF: Chronic heart failure
CI: Confidence interval
CKD: Chronic kidney disease
CRP: C-reactive protein
CRRT: Continuous renal replacement therapy
HBP: Heparin-binding protein
HR: Hazard ratio
ICU: Intensive care unit
LCMM: Latent class mixed models
MAP: Mean arterial pressure
MIMIC: Medical Information Mart for Intensive Care database
MV: Mechanical ventilation
PCT: Procalcitonin
SD: Standard deviation
SHAP: Shapley additive explanations
SICU: Surgical intensive care unit
SOFA: Sequential organ failure assessment
SQL: Structured query language
VLMR: Vuong's likelihood ratio test
WBC: White blood cell

Acknowledgements

None.

Funding

This work was sponsored by grants from Zhejiang Provincial Clinical Research Center for Critical Care Medicine.

Author information

ShengHui Miao and YiJing Liu contributed equally to this work.

Authors and Affiliations

The Fourth Affiliated Hospital, International Institutes of Medicine, Zhejiang University School of Medicine, Yiwu, 322000, China
ShengHui Miao & Min Li
Department of Second Clinical Medical College, Zhejiang Chinese Medicine University, Hangzhou, 310053, Zhejiang, China
YiJing Liu
Zhejiang Hospital, Zhejiang University School of Medicine, Lingyin Road 12, Hangzhou, 310013, Zhejiang, China
Jing Yan

Authors

ShengHui Miao
YiJing Liu
Min Li
Jing Yan

Contributions

ShengHui Miao and YiJing Liu co-led this study. Jing Yan and Min Li conceptualized the research aims, planned the analyses, and guided the literature review. ShengHui Miao extracted data from the MIMIC-IV database and performed the statistical analysis. ShengHui Miao and YiJing Liu drafted the initial manuscript. Jing Yan provided feedback. All authors reviewed and approved the final manuscript.

Corresponding author

Correspondence to Jing Yan.

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethics approval and consent to participate

The use of the database was approved by the Institutional Review Boards of MIT and Beth Israel Deaconess Medical Center. Since the database is anonymized and contains standardized data, separate ethics approval was not required, in accordance with the Declaration of Helsinki. Therefore, this study is exempt from the need for an ethical approval statement and informed consent. All researchers involved in the study have passed the required ethics examination and are qualified to access the database.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Supplementary Material 2

Supplementary Material 3

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Miao, S., Liu, Y., Li, M. et al. Clinical subtypes identification and feature recognition of sepsis leukocyte trajectories based on machine learning. Sci Rep 15, 12291 (2025). https://doi.org/10.1038/s41598-025-96718-9

Received: 16 November 2024
Accepted: 31 March 2025
Published: 10 April 2025
Version of record: 10 April 2025
DOI: https://doi.org/10.1038/s41598-025-96718-9
