Introduction

Pulmonary hypertension (PH) is a severe, progressive pulmonary vascular disorder marked by elevated pulmonary vascular pressure and resistance due to persistent abnormal contraction and remodeling of pulmonary vessels. This condition can lead to varying degrees of right heart failure and mortality1. Approximately 1% of the global population is impacted by PH.

In critically ill patients, PH is commonly observed and can exacerbate the progression toward death by contributing to right heart failure and other complications. Despite the availability of various treatments targeting pathways such as endothelin, nitric oxide, prostacyclin, and Activin/BMP signaling, the prognosis for PH patients remains poor.

Several predictive models have been established to assess the prognosis of patients with PH. However, these models often differ from our study in terms of the patient population and predicted outcomes. For example, many rely on invasive right heart catheterization measurements2, which have limited utility in clinical settings, especially in the ICU. Additionally, previous studies have frequently overlooked comprehensive physiological and clinical changes, a critical consideration in real-world scenarios.

Several biomarkers might predict the clinical outcome of patients with PPH. For instance, the red blood cell distribution width (RDW) is a laboratory indicator that may offer improved predictive value for the survival of PPH patients3,4. Some evidence suggests a positive correlation between the blood urea nitrogen (BUN) level and the duration of hospitalization or ICU stay among patients with PPH, indicating its potential utility as a novel prognostic indicator for PPH5. However, there is still no comprehensive and straightforward model for evaluating the clinical prognosis of patients with PPH in the ICU.

The purpose of this study is to develop and validate a predictive model for mortality in patients with PPH admitted to the ICU. By identifying key risk factors through readily accessible clinical indicators, such as demographic information, vital signs, serum biochemical markers, complications, and severity scores, we aim to enable early detection and intervention. This approach is intended to help these patients receive heightened attention and tailored treatments that delay disease progression and improve survival outcomes.

Methods

Data source

This retrospective observational cohort study combined data from the Medical Information Mart for Intensive Care III (MIMIC-III)6 and MIMIC-IV7 databases, which were randomly split in a 7:3 ratio for model development and internal validation. External validation was performed using data from the eICU Collaborative Research Database (eICU-CRD)8. The MIMIC-III database contains health-related information on more than 40,000 patients admitted to the Beth Israel Deaconess Medical Center (BIDMC) ICU between 2001 and 2012, and the MIMIC-IV database contains data on nearly 300,000 patients from 2008 to 2019. The eICU-CRD is a multicenter database that includes data on over 200,000 ICU admissions from hospitals across the USA between 2014 and 2015. The databases include a wide range of variables, such as demographic information, laboratory results, medication usage, and vital signs. Our research team completed the preliminary work required to use the databases and acquired the relevant certification (certificate number: 46205578). The data in MIMIC and eICU-CRD were de-identified. Due to the retrospective nature of the study, the Institutional Review Boards of the Massachusetts Institute of Technology and Beth Israel Deaconess Medical Center waived the need for informed consent. We confirm that all methods used in this study were conducted in strict accordance with the relevant guidelines and regulations.
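The 7:3 random split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `split_cohort`, the seed value, and the rounding rule for the development-set size are all assumptions.

```python
import random

def split_cohort(patient_ids, dev_fraction=0.7, seed=42):
    """Randomly split patient IDs into development and internal validation sets.

    dev_fraction=0.7 mirrors the 7:3 ratio in the text; the seed is arbitrary
    and only makes the (hypothetical) split reproducible.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_dev = round(len(ids) * dev_fraction)
    return ids[:n_dev], ids[n_dev:]

# Toy example with 100 made-up patient IDs
development, internal_validation = split_cohort(range(100))
```

Splitting at the patient level, rather than at the ICU-stay level, keeps all records of one patient in the same set.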

Patient selection

The eligibility criteria were as follows. (1) The patients were diagnosed with PPH, as identified using International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes. In the MIMIC and eICU-CRD databases, the ICD-9-CM code for PPH was 4160. (2) The patients were older than 18 years. The exclusion criteria were (1) no record of ICU hospitalization and (2) incomplete blood test data, such as missing routine blood or biochemical test results. The flow chart of patient screening is illustrated in Fig. 1.

Fig. 1 Flowchart of patient selection.
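The inclusion and exclusion criteria can be expressed as a simple filter. The sketch below is illustrative only: the record layout (`icd9_codes`, `age`, `icu_stays`, `labs`) and the `REQUIRED_LABS` tuple are hypothetical names, with the laboratory items taken from the variables listed under Data extraction.

```python
# Hypothetical list of laboratory items that must be present (assumption)
REQUIRED_LABS = ("wbc", "hemoglobin", "platelets", "rdw", "bun",
                 "creatinine", "glucose", "potassium", "sodium")

def is_eligible(patient):
    """Return True if a patient record meets the stated selection criteria."""
    has_pph = "4160" in patient["icd9_codes"]      # ICD-9-CM code used for PPH
    is_adult = patient["age"] > 18                 # older than 18 years
    has_icu_stay = patient["icu_stays"] > 0        # excluded if no ICU record
    labs_complete = all(patient["labs"].get(name) is not None
                        for name in REQUIRED_LABS)  # excluded if labs are missing
    return has_pph and is_adult and has_icu_stay and labs_complete

# Made-up record for illustration
example = {
    "icd9_codes": ["4160", "4280"],
    "age": 63,
    "icu_stays": 1,
    "labs": {name: 1.0 for name in REQUIRED_LABS},
}
```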

Data extraction

The compressed files for the MIMIC and eICU-CRD databases were downloaded and loaded into PostgreSQL 14. The Navicat 15 visual management tool was used to run the queries that extracted the data needed for the study, including demographic information, namely sex, age, and race; vital signs, namely heart rate (HR), respiratory rate (RR), percutaneous oxygen saturation (SpO₂), temperature, systolic blood pressure (SBP), diastolic blood pressure (DBP), and mean arterial pressure (MAP); laboratory examinations, namely white blood cell count (WBC), hemoglobin, platelets, red cell distribution width (RDW), blood urea nitrogen (BUN), creatinine, glucose, serum potassium, and serum sodium; complications, namely coronary heart disease, diabetes mellitus, hypertension, and atrial fibrillation; and scoring systems, namely the Sequential Organ Failure Assessment (SOFA) and the Simplified Acute Physiology Score II (SAPS II). The baseline values of vital signs, laboratory test results, and scoring system data obtained within 24 h of admission to the ICU were analyzed. Data from only the first ICU stay were included for patients who were admitted to the ICU multiple times during a single hospitalization.
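The two extraction rules at the end of this paragraph (first ICU stay per hospitalization, and measurements within 24 h of ICU admission) can be sketched as below. The field names `hadm_id`, `stay_id`, and `intime` are used here only as illustrative keys, and the functions are assumptions rather than the queries actually run by the authors.

```python
from datetime import datetime, timedelta

def first_icu_stays(stays):
    """Keep only the earliest ICU stay for each hospital admission."""
    first = {}
    for stay in stays:
        key = stay["hadm_id"]
        if key not in first or stay["intime"] < first[key]["intime"]:
            first[key] = stay
    return list(first.values())

def within_first_24h(measure_time, icu_intime):
    """True if a measurement was taken within 24 h of ICU admission."""
    return icu_intime <= measure_time <= icu_intime + timedelta(hours=24)

# Made-up stays: admission 1 has two ICU stays, admission 2 has one
stays = [
    {"hadm_id": 1, "stay_id": "a", "intime": datetime(2020, 1, 5)},
    {"hadm_id": 1, "stay_id": "b", "intime": datetime(2020, 1, 2)},
    {"hadm_id": 2, "stay_id": "c", "intime": datetime(2020, 3, 1)},
]
kept = first_icu_stays(stays)
```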

Outcome variables

The outcome variables of this study were 12-month, 24-month, 36-month and in-hospital mortality following admission to the ICU.

Statistical analysis

The research data were merged and statistically analyzed using STATA 17.0, R 4.3.2, and DecisionLinnc1.0. DecisionLinnc1.0 is a platform that integrates multiple programming language environments and enables data processing and data analysis through a visual interface9. Continuous variables with a normal distribution are expressed as mean ± standard deviation and were compared by Student’s t test, while non-normally distributed variables are presented as median (interquartile range) and were analyzed with the Mann–Whitney U test. Categorical data are presented as n (%) and were analyzed by the χ² test. Univariate and multivariate Cox regression analyses were conducted on the development group to identify factors associated with patient mortality. Only variables with P < 0.1 in the univariate Cox regression or P < 0.05 in the multivariate Cox regression were carried forward for further variable selection. Similar to previous methods10, stepwise Cox regression was then conducted using forced entry, backward elimination11, and bidirectional stepwise12 selection methods. From the resulting candidates, the clinical prediction model with the lowest Akaike information criterion (AIC) value was selected13. For other statistical comparisons, P < 0.05 indicated statistical significance. The predictive ability of the model in the development group was assessed using the C-index and calibration curves, while the clinical benefit was evaluated through decision curve analysis (DCA). The C-index, calibration curves, and DCA were also used to assess the stability of the model in the validation group. Finally, a nomogram was drawn to visualize the prediction model.
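To make the C-index concrete, the sketch below computes Harrell's concordance index for right-censored data in plain Python. It is a simplified illustration and not the routine used in the study: pairs with tied follow-up times are ignored, and the function name and argument layout are assumptions.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times:       follow-up time for each patient
    events:      1 if death was observed, 0 if the patient was censored
    risk_scores: model output, where a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is only usable if the earlier time is an observed death
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # earlier death had the higher risk score
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Made-up data: three patients, the second one censored at month 4
c = harrell_c_index(times=[2, 4, 6], events=[1, 0, 1], risk_scores=[3.0, 1.0, 2.0])
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking of patients by risk.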

Results

Patient characteristics

A total of 420 eligible patients were enrolled in the study. The development group included 260 patients, the internal validation group had 104 patients, and the external validation group comprised 56 patients. As shown in Table 1, the SAPS II scores were significantly higher in the internal validation group than in the development group (P < 0.05), while the other indicators showed no statistically significant differences between the two groups. As shown in Table 2, compared with the external validation group, the development group had significantly higher age, percentage of male patients, SpO₂, SBP, prevalence of coronary artery disease and atrial fibrillation, and SAPS II, and significantly lower RR, temperature, hemoglobin, glucose, sodium, and RDW (P < 0.05). In the development group, the in-hospital mortality rate was 13.84%, rising to 31.54% at 12 months, 40.00% at 24 months, and 46.54% at 36 months. In the internal validation group, the in-hospital mortality rate was 12.50%, increasing to 35.58% at 12 months, 39.42% at 24 months, and 43.27% at 36 months. The in-hospital mortality rate in the external validation group was 16.07%; however, follow-up data on mortality at 12, 24, and 36 months were lacking for this group, as shown in Table 3.

Table 1 Baseline characteristics between development group and internal validation group.
Table 2 Baseline characteristics between development group and external validation group.
Table 3 Mortality at different time points for the development, internal validation, and external validation groups.

Univariate and multivariate Cox regression analysis

Table 4 displays the results of the univariate and multivariate Cox regression analyses conducted on the development group. Age, RR, SpO₂, RDW, BUN, creatinine, and SAPS II were identified as significant prognostic factors for patients with PPH in the univariate analysis (P < 0.05). In the multivariate analysis, the P-values for RR, SAPS II, and RDW were 0.001, 0.002, and < 0.001, respectively, indicating that RR, RDW, and SAPS II are independent predictors of mortality in these patients.

Table 4 Cox regression analysis for mortality in the development group.

Development of the model

Subsequently, stepwise Cox regressions were conducted to develop the model, as presented in Table 5. Using forced entry based on the univariate Cox regression results, eight variables (RDW, RR, SAPS II, age, glucose, BUN, creatinine, and SpO₂) were included, resulting in an AIC of 1353.706. Backward elimination selected RDW, RR, and SAPS II, resulting in an AIC of 1351.564. Bidirectional stepwise selection identified RDW, RR, SAPS II, age, and glucose, achieving an AIC of 1349.946. When selection was instead based on the multivariate Cox regression results, RDW, RR, and SAPS II were consistently selected by all three methods (AIC = 1351.564). We finalized the model with RDW, RR, SAPS II, age, and glucose because it had the lowest AIC value. The final model demonstrated a C-index of 0.736 (95% CI: 0.694–0.778) in the development group and 0.696 (95% CI: 0.622–0.769) in the internal validation group.

Table 5 Comparison of various methods in Cox regression for the variable selection.
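The AIC-based choice among the candidate models can be reproduced from the values quoted in the text. The snippet below is a small illustration; the dictionary labels are shorthand introduced here, not names from Table 5.

```python
# AIC values as reported in the text for each candidate variable set
candidates = {
    "forced entry (8 variables)": 1353.706,
    "backward elimination (RDW, RR, SAPS II)": 1351.564,
    "bidirectional stepwise (RDW, RR, SAPS II, age, glucose)": 1349.946,
}

# The model with the lowest AIC is preferred
best = min(candidates, key=candidates.get)

# Difference in AIC between each candidate and the best one
delta_aic = {name: round(aic - candidates[best], 3)
             for name, aic in candidates.items()}
```

Here `delta_aic` shows that the three-variable model trails the selected five-variable model by 1.618 AIC units and the eight-variable model by 3.760.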

Accuracy of mortality prediction of the model

As shown in Fig. 2, the area under the curve (AUC) at 12 months, 24 months, and 36 months in the development group was 0.810 (95% CI, 0.754–0.866), 0.787 (95% CI, 0.731–0.844), and 0.776 (95% CI, 0.719–0.832), respectively. Correspondingly, the AUC at 12 months, 24 months, and 36 months in the internal validation group was 0.771 (95% CI, 0.680–0.861), 0.755 (95% CI, 0.663–0.846), and 0.718 (95% CI, 0.620–0.816), respectively. The calibration curves for the development group (Fig. 3A-C) and internal validation group (Fig. 3D-F) demonstrated a high level of consistency between the predicted and the actual probability of death events.

Fig. 2 ROC curves of 12-, 24-, and 36-month mortality in the development (A–C) and internal validation (D–F) groups.

Fig. 3 Calibration curves of 12-, 24-, and 36-month mortality in the development (A–C) and internal validation (D–F) groups.

Clinical net benefit

The DCA results are shown in Fig. 4. The y-axis denotes the net benefit derived from using the model to predict the risk of death, while the x-axis denotes the threshold probability used by the prediction model. The black line represents the assumption that all patients survive to the specified time point, whereas the solid green line represents the assumption that all patients have died by that time point. The red curve represents the net benefit of using the prediction model across different threshold probabilities, illustrating the model’s performance relative to assuming that all patients either survive or die. Where the red curve lies above both the black and green lines over a range of threshold probabilities, the model offers a greater net benefit than either extreme assumption for predicting mortality risk. Thus, the current model shows an excellent positive net benefit.

Fig. 4 Decision curve analysis for predicting 12-, 24-, and 36-month mortality in the development group (A–C) and internal validation group (D–F).
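The quantity plotted on the y-axis of a decision curve can be sketched with the standard net benefit formula for a binary outcome, NB = TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability. This is a simplified illustration under that assumption: it ignores censoring, so it is not the time-dependent calculation behind the 12-, 24-, and 36-month curves, and the function names are introduced here.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of flagging patients whose predicted risk is >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)

def net_benefit_all_die(y_true, threshold):
    """Reference strategy that treats every patient as high risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# The opposite reference strategy (assume every patient survives) has net benefit 0.

# Made-up outcomes (1 = died) and predicted risks
y = [1, 1, 0, 0]
p = [0.9, 0.8, 0.3, 0.1]
nb_model = net_benefit(y, p, threshold=0.2)
nb_all = net_benefit_all_die(y, threshold=0.2)
```

In this toy case the model's net benefit (0.4375) exceeds that of both reference strategies (0.375 and 0) at a threshold of 0.2.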

External validation

In the external validation (Fig. 5), the AUC was 0.951 (95% CI, 0.892–1.000), which was higher than that of the development group (0.885; 95% CI, 0.841–0.929) and the internal validation group (0.839; 95% CI, 0.748–0.930). In addition, the DCA demonstrated an excellent positive net benefit.

Fig. 5 ROC curves of in-hospital mortality in the development (A), internal validation (B), and external validation (C) groups. Decision curve analysis for predicting in-hospital mortality in the development (D), internal validation (E), and external validation (F) groups.
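For a binary outcome such as in-hospital mortality, the AUC equals the probability that a randomly chosen patient who died received a higher predicted risk than a randomly chosen survivor. The sketch below computes it by direct pairwise comparison; it is an illustration with made-up data, not the software routine used in the study, and it does not compute confidence intervals.

```python
def roc_auc(y_true, y_score):
    """AUC by pairwise comparison of scores between deaths and survivors."""
    died = [s for y, s in zip(y_true, y_score) if y == 1]
    survived = [s for y, s in zip(y_true, y_score) if y == 0]
    if not died or not survived:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for d in died:
        for s in survived:
            if d > s:
                wins += 1.0   # death ranked above survivor
            elif d == s:
                wins += 0.5   # ties count as half
    return wins / (len(died) * len(survived))

# Made-up example: one death is ranked below one survivor
auc = roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1])
```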

Nomogram

As illustrated in Fig. 6, the nomogram’s rating ruler displays the individual score for each variable. These scores are summed to give the total score, which is then used to locate the corresponding point on the prediction probability line at the bottom of the nomogram, indicating the survival probability at the given time interval. For instance, a patient with PPH who is 60 years old (15 points), has an RDW of 20% (50 points), a glucose level of 150 mg/dL (7.5 points), a respiratory rate of 20 breaths per minute (27.5 points), and a SAPS II of 40 (27.5 points) has a total score of 127.5, yielding a predicted 12-month survival probability of 0.42, regarded as low-risk according to the cutoff.

Fig. 6 Nomogram of the final model. The nomogram includes age, RDW, glucose, RR, and SAPS II. RR, respiratory rate; RDW, red cell distribution width; SAPS II, simplified acute physiology score II.
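The arithmetic of the worked nomogram example can be checked in a few lines. The point values are those quoted in the text; the dictionary keys are labels introduced here for readability, and the mapping from total score to survival probability is read from the nomogram itself and is not reproduced.

```python
# Points read from the nomogram for the example patient in the text
points = {
    "age 60 years": 15.0,
    "RDW 20%": 50.0,
    "glucose 150 mg/dL": 7.5,
    "RR 20 breaths/min": 27.5,
    "SAPS II 40": 27.5,
}

total_points = sum(points.values())
print(total_points)  # 127.5
```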

Heatmap

Figure 7 displays the risk score analysis of patients with PPH in the development group using the final model. The risk score from the final model was used to categorize patients into high- and low-risk groups (Fig. 7A). Figure 7B illustrates the relationship between survival status and survival time among PPH patients. Figure 7C displays the heatmap of the distribution of the five variables that make up the final model. The distributions of age, RR, RDW, glucose, and SAPS II closely aligned with the risk score, supporting their inclusion as risk factors. These variables exhibited predictive accuracy in assessing the prognosis of patients with PPH and potentially influence disease progression.

Fig. 7 Risk score analysis of PPH patients in the development group based on the final model. (A) Distribution of risk scores per patient, (B) Relationships between survival status and survival months of PPH patients ranked by risk score, (C) Heatmap of variables. Colors from blue to red indicate increasing levels from low to high. SAPS II, simplified acute physiology score II; RR, respiratory rate; RDW, red cell distribution width.
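A risk score of this kind is typically the linear predictor of the Cox model, that is, the sum of each coefficient multiplied by the patient's value for that variable. The sketch below illustrates the idea under explicit assumptions: the coefficients shown are placeholders rather than the fitted values (which are not given in the text), and the median is used as the cutoff only because the text does not state how the cutoff was chosen.

```python
from statistics import median

def linear_predictor(coefs, patient):
    """Cox-style risk score: sum of coefficient * covariate value."""
    return sum(coefs[name] * patient[name] for name in coefs)

def split_by_cutoff(scores, cutoff=None):
    """Label each score 'high' or 'low'; defaults to a median cutoff (assumption)."""
    if cutoff is None:
        cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]

# Placeholder coefficients for illustration only; NOT the study's estimates
placeholder_coefs = {"age": 0.01, "rr": 0.05, "rdw": 0.10,
                     "glucose": 0.002, "saps_ii": 0.02}

# Made-up patients
patients = [
    {"age": 50, "rr": 16, "rdw": 14.0, "glucose": 100, "saps_ii": 25},
    {"age": 75, "rr": 26, "rdw": 19.0, "glucose": 180, "saps_ii": 50},
    {"age": 60, "rr": 20, "rdw": 16.0, "glucose": 130, "saps_ii": 35},
    {"age": 82, "rr": 30, "rdw": 21.0, "glucose": 210, "saps_ii": 60},
]
scores = [linear_predictor(placeholder_coefs, p) for p in patients]
groups = split_by_cutoff(scores)
```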

Discussion

PPH is a chronic and progressive respiratory disease. Although advancements in treatment have raised survival rates in recent years, the economic burden imposed on the global community by PPH remains significant14,15. This disease not only poses physical harm to patients but also profoundly impacts their psychological well-being and overall quality of life, particularly for those who require ICU admission16.

The precise identification of prognostic factors is essential for clinicians to assess survival rates and tailor treatment decisions accordingly. There are no user-friendly tools to accurately forecast the prognosis of ICU patients with PPH. There is a pressing need to determine which factors influence the prognosis of ICU patients with PPH and develop a corresponding clinical prognostic model.

A comprehensive analysis of the development group was conducted using Cox regression, leading to the construction of a prognostic risk prediction model for PPH. Interestingly, the variables selected through backward elimination based on the univariate Cox analysis (RR, RDW, and SAPS II) were exactly the same as those selected through forced entry, backward elimination, and bidirectional stepwise selection based on the multivariate Cox analysis. This agreement may suggest that a model composed of RR, RDW, and SAPS II has great potential to be the final model. However, as shown in Table 5, the model established using bidirectional stepwise selection based on the univariate Cox analysis had the lowest AIC value. Therefore, on methodological grounds, we chose this model as the final model. The final model incorporates five key indices, namely, age, RR, RDW, glucose, and SAPS II, which were identified as adverse risk factors for the survival of PPH patients. The model performed well in terms of discrimination, calibration, and clinical applicability, enabling accurate prognostic prediction for PPH patients. Similar methods have been extensively employed to assess the accuracy of prognostic prediction models17,18.

As with many cardiovascular diseases, PPH has higher prevalence and mortality rates among elderly individuals19. Young PPH patients, aged 18 to 45 years, have a higher five-year survival rate than older age groups (46–64, 65–74, and ≥ 75 years)20. Our univariate Cox analysis likewise corroborated the significance of age as a risk factor for mortality in PPH patients.

The RR serves as a fundamental physiological parameter for assessing patient condition and prognosis. It predicts mortality in individuals with acute respiratory distress syndrome and earthquake trauma, with a higher respiratory frequency being linked to an increased risk of death21,22. Although few studies have focused on patients with PPH, our multivariate Cox analysis validated the importance of the RR as an independent risk factor for the prognosis of patients with PPH, as the RR was positively correlated with the mortality rate in these patients. In clinical practice, it is imperative to keep the patient’s respiratory rate within a reasonable range, as values that are too high or too low may indicate an elevated risk of mortality. A study conducted in Germany illustrated this point, revealing a U-shaped relationship between mortality and respiratory frequency on admission in patients with community-acquired pneumonia23.

RDW reflects variation in red blood cell volume, so it is frequently employed alongside other laboratory markers for the identification of hematological disorders, including anemia and bone marrow dysfunction24. RDW fluctuates in numerous chronic inflammatory conditions, such as diabetes, cardiovascular disease, infection, and cancer, and has predictive and prognostic significance for cardiovascular disease and overall mortality in the general population25. RDW is also an independent prognostic indicator for adverse outcomes in patients with PAH26,27,28,29. The utility of RDW across diverse patient populations underscores its significance in facilitating early detection, risk assessment, and therapeutic management. Our multivariate Cox analysis revealed that RDW was an independent risk factor for death in PPH patients, consistent with earlier findings.

Moreover, the literature suggests a heightened risk of PAH in individuals with type 2 diabetes30. Despite the limited existing research on the impact of blood glucose levels on the prognosis of patients with PPH, this study revealed that elevated blood glucose was a significant risk factor for mortality in patients with PPH, as determined through univariate Cox analysis.

SAPS II, a commonly utilized tool for evaluating the severity and mortality risk in ICU patients, incorporates various factors, such as age, vital signs, physiological parameters, and disease status, to provide clinicians with a quantitative means of predicting mortality risk31. SAPS II has demonstrated efficacy in predicting mortality among ICU patients in clinical settings. It exhibits a robust capacity to forecast mortality in patients with sepsis, making it a valuable tool for prognosis and identification within this patient population32. While research on patients with PPH is limited, our multivariate Cox analysis indicated that SAPS II was a significant predictor of mortality in this population, with higher scores correlating with poorer prognoses.

The heatmap in Fig. 7C not only demonstrated the differences in the levels of SAPS II, RR, glucose, RDW, and age between the high-risk and low-risk groups but also highlighted the relationships between these variables. Patients in the high-risk group generally presented with higher SAPS II scores, elevated RR, increased RDW and glucose levels, and greater age. There may be interactions among these variables. For instance, higher SAPS II scores reflect severe physiological dysfunction, which may lead to increased metabolic stress and a subsequent elevation in glucose levels33. Additionally, respiratory distress, indicated by elevated RR, could act synergistically with metabolic disturbances such as hyperglycemia and with inflammation, reflected by increased RDW, to exacerbate the severity of the condition. Similarly, advancing age is often associated with decreased physiological reserve and metabolic imbalance, as evidenced by elevated SAPS II, glucose, and RDW. The concurrent elevation of these variables collectively represented the overall severity of the disease and its adverse impact on prognosis, further supporting the model’s ability to capture the interplay of these risk factors and their influence on survival outcomes.

This research was innovative in that it developed the first clinical prognostic model for patients with PPH. Despite the high incidence of PPH cases annually, there is limited research on it. Although some machine learning studies have constructed predictive models for patients with pulmonary hypertension, most of these focus on diagnostic prediction models rather than prognostic models34,35. Thus, the prognostic model developed in this study is the first of its kind and underscores the significance of our findings. Our method involved variable selection using the Cox stepwise regression method to create six models, the best of which was chosen based on the AIC value, to ensure its stability.

Our study has some limitations. First, the data were sourced from the MIMIC and eICU databases. The MIMIC database covers ICU admissions from 2001 to 2019, and the eICU database spans from 2014 to 2015, thus excluding the most recent four years. Given the advancements in medical treatment and their potential impact on patient prognosis, this might have skewed the predictive outcomes. Second, this study was retrospective, which makes it harder to control confounding factors than in prospective studies and may affect any inference of causality. Third, the model was developed solely on data from the MIMIC database, which lacked comprehensive information such as pulmonary artery pressure. Fourth, the eICU dataset lacks long-term follow-up data, preventing us from externally validating the predictive performance of our model for mortality at 12, 24, and 36 months. Finally, the relatively small sample size may introduce selection bias. Future research should incorporate data from multiple centers to enhance the validity and generalizability of the findings.

Conclusion

The model developed and validated here accurately predicted mortality at 12, 24, and 36 months, as well as in-hospital mortality after ICU admission in PPH patients, based on their age, RR, RDW, glucose, and SAPS II. The resulting nomogram offers valuable prognostic insights for patients with PPH and can help clinicians tailor individualized treatment strategies.