Background

Malignant pleural effusion (MPE) can arise from primary malignant tumors of the pleura or from metastases of tumors originating elsewhere in the body. The pathogenesis of MPE is multifactorial, involving direct invasion of the pleura by tumor cells, lymphatic obstruction, and increased vascular permeability1,2,3. These pathophysiological changes lead to fluid accumulation in the pleural cavity, often accompanied by symptoms such as dyspnea and chest pain that significantly impair patients’ daily activities. Median survival for patients with MPE is typically limited to 3 to 12 months4,5,6.

Various therapeutic approaches are available for MPE, including ultrasound-guided thoracentesis, indwelling pleural catheter placement, chemical pleurodesis, and intrapleural chemotherapy7,8. However, these treatments are often associated with complications such as chest pain, pneumonia, empyema, pneumothorax, catheter-related infections, and bone marrow suppression7,8,9,10. Moreover, there is no conclusive evidence indicating that any particular treatment offers the greatest survival benefit for patients.

Lung cancer is the most common cause of MPE, accounting for approximately one-third of all cases, followed by breast cancer (23%) and lymphoma (10%)11. For lung cancer patients with MPE, accurate prognostic prediction is crucial for tailoring treatment strategies and improving quality of life. This study aims to leverage machine learning techniques to enhance the accuracy of prognostic predictions for these patients, providing clinicians with a reliable tool to optimize patient management and outcomes.

Methods

Patients

This retrospective study collected data from patients treated at the Affiliated Hospital of North Sichuan Medical College and Guang’an People’s Hospital between January 1, 2013 and December 31, 2021. Patients from the Affiliated Hospital of North Sichuan Medical College were divided into a training cohort and an internal testing cohort in a 7:3 ratio. The cohort from Guang’an People’s Hospital during the same period served as external testing cohort 1. To enhance the robustness of model validation, we additionally included patients from Dazhou Central Hospital treated between January 1, 2013 and December 31, 2021 as external testing cohort 2, and patients treated at the Affiliated Hospital of North Sichuan Medical College between January 1, 2023 and December 31, 2023 as a temporal external testing cohort. The study population comprised lung cancer patients who had MPE at initial diagnosis. Diagnostic criteria for MPE were: (a) histopathologically confirmed lung cancer (both non-small cell lung cancer and small cell lung cancer), and (b) MPE confirmed through cytological or pathological evaluation of pleural fluid or pleural tissue. In the absence of cytological or pathological confirmation, MPE was also diagnosed if one of the following conditions was met: (1) histopathologically confirmed lung cancer with no alternative explanation for pleural effusion, or (2) diagnosed lung cancer with pleural effusion and imaging studies (CT, PET-CT, MRI, or ultrasound) revealing pleural nodules, masses, or irregular thickening indicative of pleural metastasis. Exclusion criteria were: (a) lung cancer patients without MPE at initial diagnosis, (b) patients with other types of malignant tumors, (c) individuals under 18 years of age, (d) cases with incomplete clinical information, and (e) patients lost to follow-up. This study was approved by the Ethics Committee of the Affiliated Hospital of North Sichuan Medical College (Approval Number: 2023ER423-1).
All methods were performed in accordance with the relevant guidelines and regulations, including the Declaration of Helsinki. Informed consent was waived because of the retrospective nature of the study, in which data were sourced from the hospitals’ electronic medical records and electronic health check databases. All data were anonymized to protect the privacy of participants. Details of the screening process and study flow are presented in Fig. 1.
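As an illustration only, the 7:3 division of the single-center data can be sketched as follows. The study's analyses were run in R, so this Python version is not the authors' code. The function name `stratified_split`, the random seed, the choice to split each outcome class separately, and the outcome counts in `labels` are all assumptions made for the sketch; only the 7:3 ratio and the total of 228 patients come from the text.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_idx, test_idx), splitting each outcome class separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)  # random assignment within the class
        cut = round(len(members) * train_fraction)
        train_idx.extend(members[:cut])
        test_idx.extend(members[cut:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical outcome labels (1 = died within one year, 0 = survived).
# The class balance is invented; only the total of 228 matches the study.
labels = [1] * 100 + [0] * 128
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Splitting within each outcome class keeps the event rate similar in the training and internal testing cohorts, which matters when cohorts are this small.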

Fig. 1

Overall flowchart of the study.

Machine learning

In the training cohort, variables with a p-value < 0.05 in univariate logistic regression (LR) were entered into multivariate LR analysis to identify independent predictors. Variables with a p-value < 0.05 in the multivariate analysis were retained for model development. Nine commonly used machine learning algorithms were applied to predict patient prognosis: decision tree (DT), random forest (RF), extreme gradient boosting (XGBoost), elastic net (Enet), radial-kernel support vector machine (rSVM), multilayer perceptron (MLP), LR, light gradient boosting machine (LightGBM), and k-nearest neighbors (KNN). Each model was trained using 5-fold cross-validation in the training cohort, and its performance was then assessed in the training, internal testing, and external testing cohorts.
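The 5-fold cross-validation step can be pictured with the minimal sketch below, which only generates the training and validation index sets. It is written in Python for illustration and does not reproduce the authors' R workflow. The helper name `k_fold_indices`, the seed, and the sample size of 160 are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (training, validation) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = sorted(folds[i])
        training = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield training, validation


# Hypothetical training cohort of 160 patients.
splits = list(k_fold_indices(160))
for training, validation in splits:
    print(len(training), len(validation))
```

Each patient appears in exactly one validation fold, so every observation contributes once to the cross-validated performance estimate.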

Statistical analysis

Model performance was evaluated using receiver operating characteristic (ROC) curves, with sensitivity (recall), specificity, and the area under the curve (AUC) calculated. Calibration curves were used to assess the agreement between predicted and observed probabilities. Clinical utility was assessed using decision curve analysis (DCA), which quantifies the net benefit of using the model compared with alternative strategies across a range of threshold probabilities. The best-performing machine learning model was used to develop a nomogram for risk stratification. Risk scores were calculated for each patient, and patients were classified into four risk groups (low, moderate, high, and very high) using three score cut-offs: the cut-off giving the highest sensitivity, the cut-off giving the highest specificity, and the cut-off maximizing combined sensitivity and specificity. Survival curves were generated to illustrate the prognoses of the different risk groups. All statistical analyses were conducted using R software (version 4.4.1).
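For readers less familiar with these metrics, the sketch below computes AUC, sensitivity and specificity at a cut-off, and the DCA net benefit, defined as TP/n − (FP/n) × pt/(1 − pt) at threshold probability pt. It is a plain-Python illustration rather than the R code used in the study, and the six outcome and probability values are invented toy data.

```python
def auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)  # ties count half
    return wins / (len(pos) * len(neg))


def confusion_counts(y_true, scores, cutoff):
    """Return (tp, fp, tn, fn) when scores >= cutoff are called positive."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= cutoff)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= cutoff)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < cutoff)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < cutoff)
    return tp, fp, tn, fn


def sensitivity_specificity(y_true, scores, cutoff):
    tp, fp, tn, fn = confusion_counts(y_true, scores, cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def net_benefit(y_true, scores, threshold):
    """Decision-curve net benefit at a threshold probability (0 < threshold < 1)."""
    tp, fp, _, _ = confusion_counts(y_true, scores, threshold)
    n = len(y_true)
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Toy data, not from the study: 1 = event, scores = predicted probabilities.
y = [0, 0, 1, 0, 1, 1]
p = [0.1, 0.3, 0.35, 0.6, 0.7, 0.9]

model_auc = auc(y, p)
sens, spec = sensitivity_specificity(y, p, 0.5)
nb = net_benefit(y, p, 0.5)
print(round(model_auc, 3), round(sens, 3), round(spec, 3), round(nb, 3))
```

A decision curve repeats the `net_benefit` calculation over a grid of thresholds and compares the model with the treat-all and treat-none strategies.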

Results

Feature selection

Clinical data from 228 patients were collected at the Affiliated Hospital of North Sichuan Medical College, based on the inclusion and exclusion criteria. These data were divided into a training cohort and an internal testing cohort in a 7:3 ratio (Supplementary Table 1). An additional 79 patients from Guang’an People’s Hospital were included as the external testing cohort 1. To further validate the model’s generalizability, we established two additional validation cohorts: 160 patients from Dazhou Central Hospital constituted external testing cohort 2, and 150 patients treated at the Affiliated Hospital of North Sichuan Medical College between January 1, 2023 and December 31, 2023 were collected to establish a temporal external testing cohort.

In the training cohort, univariate LR analysis was conducted on all features (Supplementary Table 2). Features with a p-value less than 0.05 were identified, including the total volume of pleural effusion, appearance of pleural fluid, smoking history, location of pleural effusion, high-sensitivity C-reactive protein in pleural fluid, pulmonary disease history, pathological subtype of lung cancer, treatment regimen, presence of pericardial effusion, and white blood cell count. These features were further analyzed using multivariate LR, and those with a p-value < 0.05 were retained for model construction. The final selected features included treatment regimen, presence of pericardial effusion, and total volume of pleural effusion (Table 1). These three variables were then used in the development of the machine learning models.

Performance of 9 machine learning prediction models

Nine machine learning models (DT, RF, XGBoost, rSVM, MLP, Enet, LR, LightGBM, and KNN) demonstrated good predictive performance across the training cohort (Fig. 2A), internal testing cohort (Fig. 2B), and external testing cohort 1 (Fig. 2C). The area under the curve (AUC) exceeded 0.80 for every model, indicating good discrimination (Table 2).

Fig. 2

Evaluation of nine machine learning models for predicting patient prognosis. (A)-(C). ROC curves of the machine learning models in the training cohort (A), internal testing cohort (B), and external testing cohort 1 (C). (D)-(F). Calibration curves of the machine learning models in the training cohort (D), internal testing cohort (E), and external testing cohort 1 (F). (G)-(I). DCA of the machine learning models in the training cohort (G), internal testing cohort (H), and external testing cohort 1 (I). Abbreviations: ROC, receiver operating characteristic; DCA, decision curve analysis.

Table 1 Results of multivariate logistic regression analysis
Table 2 Performance of 9 machine learning prediction models across three cohorts.

Among the nine models, the LR model showed the most consistent calibration across all three cohorts (Fig. 2D, E and F). DCA further indicated that the LR model provided the greatest net benefit for patients (Fig. 2G, H and I). Considering its performance in the training cohort, internal testing cohort, and external testing cohort 1, the LR model was therefore selected as the optimal model.

Development of the nomogram and patient risk assessment

The nomogram, developed from the LR model identified as the optimal machine learning approach, enables individualized risk quantification (Fig. 3A). To validate its stability, we evaluated the nomogram in both external testing cohort 2 and the temporal external testing cohort. The nomogram demonstrated robust predictive accuracy in both validation sets (Fig. 3B), achieving AUCs of 0.962 (95% CI: 0.937–0.987) in external testing cohort 2 and 0.949 (95% CI: 0.918–0.981) in the temporal external testing cohort. Calibration curves confirmed close agreement between predicted and observed outcomes in both cohorts (Fig. 3C, D). Decision curve analysis indicated good clinical utility, with net benefit thresholds spanning 0.01–0.95 (Fig. 3E) and 0.02–0.96 (Fig. 3F).
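As background on how a nomogram relates to an LR model, the relationship can be written as below. This is a generic sketch, not the fitted model: the coefficients are not reported in this section, so only symbols are used. Here x_1, x_2, x_3 stand for the three retained predictors, the predicted probability refers to the modeled one-year outcome, and the points formula is the common convention of scaling the predictor with the widest effect range to 100 points. Whether the authors' software used exactly this scaling is an assumption.

```latex
\[
\hat{p} \;=\; \frac{1}{1 + \exp\!\left[-\left(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3\right)\right]}
\]
\[
\mathrm{points}_j(x_j) \;=\; 100 \cdot
\frac{\beta_j x_j - \min_{x}\,\beta_j x}
     {\max_{k}\left(\max_{x}\,\beta_k x - \min_{x}\,\beta_k x\right)},
\qquad
\mathrm{total\ points} \;=\; \sum_{j=1}^{3} \mathrm{points}_j(x_j)
\]
```

Under this convention the total score is a linear rescaling of the LR linear predictor, so ranking patients by total points is equivalent to ranking them by predicted probability.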

Fig. 3

Development, validation, and risk stratification of the nomogram. (A) A nomogram with three predictive variables developed from the LR model for predicting 1-year survival probability in patients. (B) ROC curves of the nomogram in external testing cohort 2 and temporal external testing cohort. (C)-(D). Calibration curves of the nomogram in external testing cohort 2 (C) and temporal external testing cohort (D). (E)-(F). DCA of the nomogram in external testing cohort 2 (E) and temporal external testing cohort (F). (G) Risk stratification based on nomogram scores: All patients were scored using the nomogram. Patients in the training cohort were classified into four risk groups—low, medium, high, and very high—using thresholds for highest sensitivity, highest specificity, and combined sensitivity and specificity. The proportions of two distinct survival outcome categories within each risk group are displayed. (H) Survival curves of patients across different risk groups (excluding those from the temporal external testing cohort, as follow-up data for this cohort, collected from January 1, 2023, to December 31, 2023, did not meet the minimum 2-year follow-up requirement). Abbreviations: LR, logistic regression; ROC, receiver operating characteristic; DCA, decision curve analysis.

Through the nomogram’s point allocation system, we calculated comprehensive risk scores for all patients. These continuous scores were stratified into four clinically actionable categories using thresholds derived from the training cohort’s score distribution: low-risk (≤ 19 points), moderate-risk (19–88 points), high-risk (88–162 points), and very high-risk (≥ 162 points), as demonstrated in the training cohort’s risk profile visualization (Fig. 3G).
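The stratification rule can be expressed as a small function. It is sketched here in Python for illustration, not taken from the authors' code. The cut-offs 19, 88, and 162 come from the text. Because the reported ranges share their endpoints, the assignment of a score of exactly 88 to the high-risk group is an assumption of this sketch, and the function name `risk_group` is hypothetical.

```python
def risk_group(points):
    """Map a nomogram total score to one of the four reported risk groups."""
    if points <= 19:
        return "low"
    if points < 88:  # assumption: a score of exactly 88 is treated as high risk
        return "moderate"
    if points < 162:
        return "high"
    return "very high"  # the text gives >= 162 points for this group


# Example scores chosen for illustration only.
example_groups = {score: risk_group(score) for score in (10, 19, 50, 120, 162, 180)}
for score, group in example_groups.items():
    print(score, group)
```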

In the low-risk group (≤ 19 points), consisting of 60 patients, one patient survived less than one year, giving a one-year mortality rate of 1.67%. In the moderate-risk group (19–88 points), comprising 245 patients, 56 survived less than one year, a one-year mortality rate of 22.86%. In the high-risk group (88–162 points), consisting of 201 patients, 169 survived less than one year, a one-year mortality rate of 84.08%. In the very high-risk group (≥ 162 points), all 111 patients died within one year, a one-year mortality rate of 100%.
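The reported percentages follow directly from the group counts, as the short check below shows. It is written in Python for illustration and uses only numbers stated in the text. The group sizes also sum to the 617 patients across the four source cohorts (228 + 79 + 160 + 150).

```python
# (deaths within one year, patients in group), as reported in the text.
groups = {
    "low": (1, 60),
    "moderate": (56, 245),
    "high": (169, 201),
    "very high": (111, 111),
}

# One-year mortality rate per group, in percent, rounded to two decimals.
rates = {
    name: round(100 * deaths / total, 2)
    for name, (deaths, total) in groups.items()
}
total_patients = sum(total for _, total in groups.values())
cohort_total = 228 + 79 + 160 + 150  # cohort sizes reported in the Results

for name, rate in rates.items():
    print(f"{name}: {rate}%")
print(total_patients, cohort_total)
```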

Significant survival differences were observed among the risk groups (Fig. 3H), with median survival times of 27 months, 22 months, 8 months, and 2 months for the low-risk, moderate-risk, high-risk, and very high-risk groups, respectively.

Discussion

With advancements in medical technology and a deeper understanding of the pathogenesis of MPE, novel therapies and drugs are continually being developed. For example, research is exploring the use of tumor-derived microparticles to encapsulate chemotherapeutic drugs, allowing for targeted tumor cell destruction while simultaneously activating immune responses to enhance therapeutic efficacy12,13,14. Additionally, newly identified immunotherapeutic targets, signaling pathways, immunomodulatory cytokines, and non-coding RNAs delivered via exosomes hold promise for advancing immunotherapy in MPE15. Despite the availability of various treatment options, no consensus has been reached on the most effective approach. Therefore, further investigation into the mechanisms of MPE formation, early prognostic assessment, and personalized treatment strategies is essential.

This study found that the treatment regimen (targeted therapy, other treatment, or no treatment), presence of pericardial effusion, and total volume of pleural effusion were significantly associated with patients’ one-year survival rate. Targeted therapy, by specifically targeting molecular markers of tumors, can more precisely inhibit tumor growth and spread16,17,18, thereby reducing the incidence of malignant pleural effusion. Due to its high efficacy and relatively low side effects, targeted therapy offers superior long-term outcomes compared to other treatment options, such as chemotherapy and radiotherapy19. In the cases examined in this study, cancer cells typically spread to the pleura and pericardium via the lymphatic system or bloodstream, leading to the formation of pleural and pericardial effusions20,21. The coexistence of these conditions, along with larger volumes of malignant pleural effusion, often indicates advanced cancer progression and greater therapeutic challenges. Pericardial effusion directly affects cardiac function, potentially causing heart failure, hypotension, and circulatory collapse22,23,24. Meanwhile, pleural effusion exerts a compressive effect on the lungs, impairing respiratory function, oxygen exchange, and overall cardiopulmonary health7,20. As the volume of effusion increases, these compressive effects intensify. Large amounts of pleural effusion can further compress the heart, restricting its function6. Similarly, pericardial effusion can compress the lungs, limiting pulmonary expansion, reducing ventilation capacity, and exacerbating hypoxemia. Together, these factors contribute to poor patient prognosis.

This study identified three critical variables (treatment regimen, pericardial effusion, and total pleural effusion volume) for prognostic prediction in lung cancer patients with MPE. The LR model showed the best performance, with AUCs of 0.885 (training cohort), 0.954 (internal testing cohort), and 0.920 (external testing cohort 1). The nomogram achieved AUCs of 0.962 (external testing cohort 2) and 0.949 (temporal external testing cohort), with excellent calibration and clinical utility (net benefit thresholds: 0.01–0.95 and 0.02–0.96, respectively). The nomogram stratifies patients into four risk groups: low-risk (≤ 19 points), moderate-risk (19–88 points), high-risk (88–162 points), and very high-risk (≥ 162 points). It thereby provides an effective tool for predicting one-year mortality and assessing overall survival prognosis.

The treatment of lung cancer with MPE involves both local therapy (symptom relief) and systemic therapy (control of primary tumors and metastases)25. The goal of local therapy is primarily to alleviate dyspnea, reduce effusion recurrence, and improve quality of life. The main local treatment modalities include therapeutic thoracentesis, pleurodesis, hyperthermic intrathoracic chemotherapy, intrapleural drug perfusion, and indwelling pleural catheter. These approaches, which focus solely on managing MPE without targeting the primary tumor, were categorized as “untreated” in our study. In contrast, systemic therapy aims to inhibit tumor progression and prolong patient survival, encompassing systemic chemotherapy, targeted therapy, immune checkpoint inhibitors, and anti-angiogenic therapy. Among these, our study found that targeted therapy provided the greatest benefit to patients compared to all other treatment modalities. Based on the nomogram developed in our study and the risk stratification of patients (low-risk, moderate-risk, high-risk, and very high-risk groups), specific treatment recommendations can be made. For patients in the low-risk or moderate-risk groups, regular systemic therapy is strongly recommended, as these groups achieved median survival times of 27 months and 22 months, respectively. For patients in the high-risk group, long-term systemic therapy may be considered based on their financial situation, although the potential benefit is relatively limited, with a median survival time of only 8 months. For patients in the very high-risk group, the focus should be on reducing MPE and improving quality of life, given their short median survival time of just 2 months.

Despite the strong predictive performance of the nomogram, this study has several limitations. First, the analysis was limited to patients’ laboratory and basic clinical data, lacking an evaluation of lung cancer imaging features. Incorporating tumor imaging characteristics might further enhance the nomogram’s predictive accuracy. Second, although external validation was performed, the external dataset was relatively small. Therefore, additional validation with larger and more diverse samples from various regions is required to enhance the generalizability and reliability of the model.

Conclusion

In this study, we comprehensively assessed the predictive performance of nine machine learning models for prognostication in lung cancer patients with MPE. Our analysis revealed that the LR model, based on key predictors such as treatment regimen, presence of pericardial effusion, and total pleural effusion volume, demonstrated the best performance in terms of accuracy, calibration, and clinical utility. The nomogram derived from this model enables effective risk stratification, offering clinicians a practical tool for early prognostic assessment and intervention.