Abstract
The lymph node ratio (LNR) may be a more reliable prognostic indicator than N stage for non-small cell lung cancer (NSCLC). This study used data collected between 2010 and 2019 from the Surveillance, Epidemiology, and End Results (SEER) 17 registry research database. The cohort comprised patients diagnosed with stage I-IIIA NSCLC who had undergone surgical resection and had at least one lymph node examined. Overall survival (OS) was predicted from LNR using XGBoost, a tree-based ensemble algorithm, and prognostic LNR thresholds were determined by interpreting the model with SHAP (SHapley Additive exPlanations). Smooth-fitting curves were also used to determine LNR cutoff values in relation to mortality. Trend tests and Cox proportional hazards regression models were applied to evaluate the association between LNR and OS. A total of 61,990 patients met the inclusion criteria. Based on the smooth-fitting curves and the SHAP dependence plot, the cohort was stratified into three groups: low (LNR < 0.1), medium (0.1 ≤ LNR < 0.4), and high (LNR ≥ 0.4). Kaplan-Meier curves showed that patients in the low LNR group had better OS than those in the medium and high LNR groups (P < 0.001). This trend was evident in all subgroup analyses. Kaplan-Meier curves stratified by LNR group demonstrated greater discriminatory power than those stratified by N stage. In patients with NSCLC, an elevated LNR is associated with reduced OS, and this relationship holds across all stratified cohorts. LNR is a dependable prognostic marker for OS in patients with stage I-IIIA NSCLC undergoing radical surgery.
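The three-group stratification reported in the abstract can be illustrated with a minimal sketch. It assumes the conventional definition of LNR as the number of positive lymph nodes divided by the number of lymph nodes examined; the function and parameter names (`lymph_node_ratio`, `lnr_group`, `positive_nodes`, `examined_nodes`) are hypothetical and are not taken from the study's code. Only the cutoffs 0.1 and 0.4 come from the abstract.

```python
def lymph_node_ratio(positive_nodes: int, examined_nodes: int) -> float:
    """Return LNR, assumed here to be positive nodes / examined nodes."""
    # The cohort required at least one examined lymph node.
    if examined_nodes < 1:
        raise ValueError("at least one lymph node must be examined")
    if not 0 <= positive_nodes <= examined_nodes:
        raise ValueError("positive nodes must be between 0 and examined nodes")
    return positive_nodes / examined_nodes


def lnr_group(lnr: float, low_cut: float = 0.1, high_cut: float = 0.4) -> str:
    """Assign the abstract's groups: low (<0.1), medium (0.1 to <0.4), high (>=0.4)."""
    if lnr < low_cut:
        return "low"
    if lnr < high_cut:
        return "medium"
    return "high"


if __name__ == "__main__":
    # Illustrative (made-up) node counts, not patient data.
    for positive, examined in [(0, 12), (2, 10), (5, 8)]:
        ratio = lymph_node_ratio(positive, examined)
        print(positive, examined, round(ratio, 3), lnr_group(ratio))
```

The boundaries are half-open, so a patient with an LNR of exactly 0.1 falls in the medium group and one with exactly 0.4 falls in the high group, matching the inequalities stated in the abstract.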
Acknowledgements
We thank the participants for the time and energy they contributed during the data collection phase of the SEER project.
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethics declarations
Review and approval by an ethics committee were not required for this study because of the difficulty of identifying individual patients in the SEER database.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Xiao, Y., Xu, X., Chen, W. et al. XGBoost and SHAP based lymph node ratio thresholds for predicting overall survival in stage I to IIIA NSCLC. Sci Rep (2026). https://doi.org/10.1038/s41598-026-47993-7
DOI: https://doi.org/10.1038/s41598-026-47993-7


