Abstract
Bone metastasis (BM) is common in high-grade lung neuroendocrine tumors (NETs). This study aimed to use multiple machine learning algorithms to explore the significant factors associated with synchronous BM in these patients. Patients diagnosed with high-grade lung NETs were extracted from the SEER 17 registries. The age-standardized incidence rate (ASIR) was calculated. All patients were randomly divided into a training cohort and a validation cohort (8:2). Eight machine learning algorithms were used to construct predictive models for synchronous BM in the training cohort, and the optimal model was selected for further validation. Shapley Additive Explanations (SHAP) were used to interpret the importance of each variable in the optimal model. In addition, Kaplan–Meier (KM) survival analysis was performed to evaluate survival in patients with synchronous BM. From 2010 to 2021, the ASIR of synchronous BM in small cell lung cancer (SCLC) decreased (from 1.52 to 1.16 per 100,000 person-years, APC −2.3, 95% CI −3.3 to −1.3, P < 0.001). No significant change was found for large cell neuroendocrine carcinoma (LCNEC). Approximately 23% of patients had synchronous BM at diagnosis. The stochastic gradient boosting (GBM) model, developed using ten-fold cross-validation, showed the best overall predictive performance across the training and validation cohorts. The SHAP analysis indicated that liver metastasis had the greatest impact on synchronous BM. The median cancer-specific survival of patients with bone metastasis was 8 months. No survival difference was found between LCNEC and SCLC. The incidence of SCLC with synchronous BM showed a slight but statistically significant decrease over the last decade. These patients experienced poor survival. The selected GBM model could help identify patients at high risk of BM among those with high-grade lung NETs.
Introduction
High-grade neuroendocrine tumors (NETs) of the lung consist of large cell neuroendocrine carcinoma (LCNEC) and small cell lung cancer (SCLC)1. High-grade lung NETs originate from pulmonary neuroendocrine cells and exhibit significant heterogeneity compared with other lung cancers2,3. Compared with typical carcinoid (TC) and atypical carcinoid (AC), high-grade NETs exhibit distinct molecular alterations and significantly higher Ki67 proliferation indices. Specifically, the Ki67 index is generally < 5% in TC and 5–20% in AC, whereas in high-grade NETs it often exceeds 50% and sometimes reaches 80–100%, reflecting their markedly higher proliferative activity. Molecularly, TC and AC frequently harbor MEN1 mutations and show relatively low chromosomal instability. In contrast, high-grade NETs commonly exhibit TP53 and RB1 mutations, as well as alterations such as MYC amplification and NOTCH pathway inactivation. These differences are clinically significant: while TC and AC typically follow an indolent course, LCNEC and SCLC are more aggressive and prone to early metastasis4. A population-based study revealed that patients with SCLC and LCNEC had much worse overall survival (OS) than those with AC and TC (5-year OS rate: 5%, 17%, 64%, and 84%, respectively)5.
Bone is a common site of metastasis in lung cancer. One study found that bone ranked as the second most common metastatic site in both SCLC and LCNEC6. The occurrence of bone metastasis (BM) has a significant adverse effect on survival7,8,9. Additionally, skeletal-related events caused by BM substantially impair quality of life10,11. However, BM is difficult to detect early, before obvious symptoms or enlarged lesions appear12.
Machine learning has become a popular tool in medical research in recent years. Machine learning techniques can integrate abundant data and address the complexity and variability of tumorigenesis, progression, and prognosis. Chen et al. developed a decision tree model for predicting survival in SCLC patients with BM, which showed good performance13. Currently, no machine learning models have been developed for predicting synchronous BM in patients with high-grade lung NETs. To address this knowledge gap, this study applied multiple machine learning algorithms to explore significant factors associated with synchronous BM in patients with high-grade lung NETs, using data from the Surveillance, Epidemiology, and End Results (SEER) database.
Methods
Study population
Data from the SEER 17 registries (November 2023 submission), which cover approximately 26.5% of the US population, were used to identify eligible patients for this study. The inclusion and exclusion criteria are detailed in Fig. 1. In brief, patients diagnosed with lung cancer (classified as SCLC, LCNEC, or other NSCLC) and synchronous BM between 2010 and 2021 were enrolled for the epidemiological analysis. Patients diagnosed with SCLC or LCNEC between 2010 and 2015 were enrolled for model development and survival analysis.
Epidemiology analysis
The age-standardized incidence rate (ASIR) of synchronous BM for SCLC, LCNEC, and other NSCLC was calculated using SEER*Stat (version 8.4.3), standardized to the 2000 US standard population, and is presented as cases per 100,000 person-years. The annual percentage change (APC) in ASIR over the study period was quantified using the National Cancer Institute’s (NCI’s) Joinpoint Regression Program (version 4.9.0.0). A two-sided P value < 0.05 was considered statistically significant.
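The two quantities above can be illustrated with a minimal sketch. It assumes direct standardization (age-specific rates weighted by standard-population shares) and a single-segment log-linear trend, which is the model a joinpoint analysis reduces to when no joinpoint is selected. The function names, the three age bands, and all numbers are hypothetical and are not SEER data. Only the starting value of 1.52 is borrowed from the reported SCLC ASIR, to give the synthetic series a realistic scale.

```python
import math

def age_standardized_rate(cases, person_years, std_weights, per=100_000):
    """Direct standardization: weight each age-specific rate by its
    share of the standard population, then scale to `per` person-years."""
    total_w = sum(std_weights)
    return per * sum(
        (w / total_w) * (c / py)
        for c, py, w in zip(cases, person_years, std_weights)
    )

def annual_percent_change(years, rates):
    """APC from an ordinary least-squares fit of ln(rate) on calendar year:
    APC = (exp(slope) - 1) * 100."""
    n = len(years)
    mean_x = sum(years) / n
    logs = [math.log(r) for r in rates]
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    slope = sxy / sxx
    return (math.exp(slope) - 1.0) * 100.0

# Hypothetical three-age-band population (illustrative values only).
cases = [2, 30, 120]
person_years = [1_000_000, 800_000, 400_000]
std_weights = [0.60, 0.30, 0.10]
asir = age_standardized_rate(cases, person_years, std_weights)

# Synthetic series that falls by exactly 2.3% per year from 2010 to 2021.
years = list(range(2010, 2022))
rates = [1.52 * (1 - 0.023) ** (y - 2010) for y in years]
apc = annual_percent_change(years, rates)

print(f"ASIR = {asir:.3f} per 100,000 person-years")
print(f"APC  = {apc:.2f}% per year")
```

A series constructed to fall by a constant percentage each year recovers that percentage as the APC, which is why the APC is read as an average yearly relative change rather than an absolute one.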
Model construction and evaluation
In this study, we used eight machine learning algorithms to construct models for predicting synchronous BM. The algorithms included extreme gradient boosting (XGB), decision tree (DT), stochastic gradient boosting (GBM), logistic regression (LR), naive Bayes (NB), neural network (NN), multilayer perceptron (MLP), and random forest (RF).
All patients were randomly divided into the training cohort and validation cohort with a split of 8:2. Univariate and multivariate logistic regression were used to select variables (P < 0.05) for inclusion in model construction. Eight machine learning models were constructed by using the training cohort. Optimal hyperparameters for each algorithm were determined using a random search strategy combined with tenfold cross-validation. The performance of all eight models was comprehensively assessed using the area under the receiver operating characteristic curve (AUC), precision, recall, F1 score, and accuracy. The optimal model, defined as the one exhibiting the best overall performance across these evaluation metrics, was subsequently selected for further validation through calibration curves and decision curve analysis (DCA).
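The evaluation metrics listed above can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the authors' caret workflow: the 0.5 classification threshold, the function names, and the eight example predictions are all hypothetical. AUC is computed from its rank interpretation, as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
def classification_metrics(y_true, y_prob, threshold=0.5):
    """Precision, recall, F1 and accuracy at a fixed probability threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(y_true)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}

def auc(y_true, y_prob):
    """AUC as the fraction of positive/negative pairs ranked correctly
    (ties count as half a correct ranking)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(
        1.0 if pp > pn else 0.5 if pp == pn else 0.0
        for pp in pos for pn in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = synchronous BM) and predicted probabilities.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.4, 0.6, 0.7, 0.2, 0.3, 0.1, 0.5]

metrics = classification_metrics(y_true, y_prob)
model_auc = auc(y_true, y_prob)
print(metrics, round(model_auc, 3))
```

AUC is threshold-free, whereas precision, recall, F1, and accuracy depend on the chosen cut-off, so a model can rank well by AUC and still score poorly on the other four metrics at a given threshold.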
Shapley Additive Explanations (SHAP) were used to interpret and visualize the selected model. The importance of each variable in the optimal model was calculated.
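To make the idea behind SHAP concrete, the sketch below computes exact Shapley values by brute force for a toy risk score. It is an illustration only: tree-based models are normally explained with much faster dedicated algorithms, and the toy model, its coefficients, the three binary features, and the background rows are all hypothetical rather than taken from the fitted GBM.

```python
from itertools import permutations

def shapley_values(model, x, background):
    """Exact Shapley values for one instance `x`.

    The value of a coalition is the mean prediction when the coalition's
    features are fixed to x and the remaining features are taken from
    each background row in turn."""
    n = len(x)

    def value(coalition):
        total = 0.0
        for row in background:
            mixed = [x[i] if i in coalition else row[i] for i in range(n)]
            total += model(mixed)
        return total / len(background)

    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        coalition = set()
        before = value(coalition)
        for i in order:
            coalition.add(i)
            after = value(coalition)
            phi[i] += after - before  # marginal contribution of feature i
            before = after
    base = value(set())  # average prediction over the background
    return base, [p / len(orders) for p in phi]

# Hypothetical toy risk score on three binary features:
# [liver_metastasis, advanced_n_stage, younger_age]
def toy_model(features):
    liver, n_stage, young = features
    return 0.10 + 0.30 * liver + 0.10 * n_stage + 0.05 * liver * young

background = [[0, 0, 0], [1, 0, 1], [0, 1, 0], [1, 1, 1]]
patient = [1, 1, 1]

base, phi = shapley_values(toy_model, patient, background)
print("baseline:", round(base, 4))
print("contributions:", [round(p, 4) for p in phi])
```

The contributions sum to the difference between the patient's prediction and the baseline, which is the additivity property that lets a force plot stack per-feature bars from the baseline to the final output.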
Survival analysis
The main survival outcome was cancer-specific survival (CSS), which was defined as the follow-up time from diagnosis to death due to SCLC or LCNEC.
Statistical methods
The SEER data were obtained through the NCI’s SEER*Stat software (version 8.4.3; https://seer.cancer.gov/seerstat/). Model construction and evaluation were performed using the ‘caret’ package in R (version 4.3.3; https://www.r-project.org). Survival analysis was performed using the Kaplan–Meier method and the log-rank test.
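The Kaplan–Meier estimate used for the survival analysis can be sketched in a few lines. The example assumes the usual convention for cancer-specific survival, in which deaths from the cancer are events and all other patients are censored at their last follow-up. The function names and the ten follow-up times are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate.

    times  : follow-up in months
    events : 1 = cancer-specific death, 0 = censored
    Returns a list of (time, survival) pairs at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve

def median_survival(curve):
    """Earliest time at which estimated survival is 0.5 or lower."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached

# Hypothetical follow-up data for ten patients.
times = [2, 3, 5, 8, 8, 10, 12, 15, 20, 24]
events = [1, 1, 0, 1, 1, 1, 0, 1, 0, 1]

curve = kaplan_meier(times, events)
median = median_survival(curve)
print(curve)
print("median survival (months):", median)
```

Censored patients leave the risk set without lowering the curve, so the median reflects only the timing of observed cancer-specific deaths among those still under follow-up.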
Results
Incidence of synchronous bone metastasis in lung cancer
From 2010 to 2021, a total of 67,963 patients diagnosed with lung cancer and synchronous BM were identified. Among these, LCNEC and SCLC accounted for 0.9% and 17.4% of synchronous BM cases, respectively. The ASIR of LCNEC with synchronous BM showed no significant change (from 0.05 to 0.06 per 100,000 person-years; APC −0.1, 95% confidence interval [CI] −3.8 to 3.7, P = 0.946; Fig. 2A). The ASIRs of SCLC with synchronous BM and of other NSCLC with synchronous BM both decreased (SCLC, from 1.52 to 1.16, APC −2.3, 95% CI −3.3 to −1.3, P < 0.001; other NSCLC, from 7.24 to 5.92, APC −1.9, 95% CI −2.4 to −1.3, P < 0.001; Fig. 2A). Subgroup analysis by sex revealed that the decrease was more pronounced in male patients for both SCLC (APC −3.1 vs. −1.5, P < 0.001) and other NSCLC (APC −2.8 vs. −0.8, P < 0.001; Fig. 2B).
Variable selection
A total of 21,809 patients diagnosed with SCLC or LCNEC between 2010 and 2015 were enrolled for model construction. Among them, 23% had synchronous BM at diagnosis, and the proportion was higher in SCLC than in LCNEC (23.4% vs. 17.2%). Notably, the difference in liver metastasis between the two groups (55% vs. 22%, P < 0.001) was larger than the differences in brain metastasis (21% vs. 15%, P < 0.001) and lung metastasis (22% vs. 12%, P < 0.001) (Table 1).
With a split ratio of 8:2, all patients were divided into the training cohort (N = 17,448) and the validation cohort (N = 4361). No significant differences in baseline characteristics were found between the two cohorts (Table S1). Univariate and multivariate logistic regression analyses were performed in the training cohort to select the variables for model construction. The analyses revealed that nine variables (age, race, sex, T stage, N stage, marital status, brain metastasis, liver metastasis, and lung metastasis) were significantly associated with synchronous BM (P < 0.05, Table 2). These factors were therefore included in the subsequent machine learning analyses.
Model construction, validation and explanation
A total of eight machine learning algorithms were used to construct models. In the training cohort, the top three algorithms were RF (AUC 0.742), GBM (AUC 0.733), and NN (AUC 0.725) (Fig. 3A). In the validation cohort, however, the AUC of RF fell to 0.683, whereas the GBM model achieved an AUC of 0.723 (Fig. 3B). Additionally, precision, recall, F1 score, and accuracy were evaluated for each model (Table 3). The GBM model consistently demonstrated superior performance across all metrics and was therefore selected as the optimal algorithm for constructing the predictive model. The calibration plots of GBM showed high consistency between predicted and observed values (Fig. 3C,D). The decision curves of GBM demonstrated its potential to yield significant net clinical benefit (Fig. 3E,F).
The SHAP analysis provided detailed interpretations of the model’s predictions. Yellow indicated a higher impact on synchronous BM: liver metastasis, advanced N stage, younger age, lung metastasis, male sex, brain metastasis, advanced T stage, and married status all had a positive impact on synchronous BM (Fig. 4A). Among these variables, liver metastasis showed the strongest association with BM in the GBM model (Fig. 4B). Additionally, liver metastasis and age had the most significant synergistic impact (Fig. 4C). In patients with liver metastasis, higher SHAP values were found in younger patients. In contrast, in patients without liver metastasis, higher SHAP values were found in older patients. Figure 4D illustrates an example of an individual patient who did not present with synchronous BM. The SHAP force plot provides an interpretable breakdown of how each feature contributed to the model’s prediction for this patient. Notably, the absence of liver metastasis, N0 stage, and the absence of lung metastasis all contributed negatively to the overall SHAP value, reducing the predicted risk of bone metastasis. These features are represented as red bars pointing to the left, indicating their negative influence on the output. The patient’s predicted probability was below the average baseline prediction (E[f(x)] = 1.23), supporting the model’s accurate classification.
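For readers unfamiliar with decision curve analysis, the net benefit plotted in a decision curve is conventionally computed at each threshold probability as the true-positive rate per patient minus the false-positive rate per patient weighted by the odds of the threshold. The sketch below applies that standard definition to hypothetical data. The function names, the ten example patients, and the 0.25 threshold are assumptions for illustration and do not reproduce Fig. 3E,F.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on patients whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: act on every patient regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)

# Hypothetical outcomes (1 = synchronous BM) and model probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_prob = [0.8, 0.6, 0.2, 0.5, 0.3, 0.1, 0.1, 0.2, 0.05, 0.4]

threshold = 0.25
nb_model = net_benefit(y_true, y_prob, threshold)
nb_all = net_benefit_treat_all(y_true, threshold)
nb_none = 0.0  # acting on nobody yields zero net benefit by definition

print(f"model: {nb_model:.4f}  treat-all: {nb_all:.4f}  treat-none: {nb_none}")
```

A model is considered useful at a given threshold when its net benefit exceeds both reference strategies, which is the comparison a decision curve displays across the full range of thresholds.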
Survival analysis
For the survival analysis, all patients with high-grade NETs of the lung were classified into four groups: patients with no metastasis, patients with BM only, patients with BM and any other site-specific metastasis (brain, lung, or liver), and patients with other site-specific metastasis but no bone metastasis. The median CSS of patients with bone metastasis only was 8 months, which was worse than that of patients without metastasis but better than that of patients with other site-specific metastasis or with multiple metastases (Fig. 5A). No difference was found between patients with LCNEC and those with SCLC (with BM only) (Fig. 5B).
Discussion
In this study, we found a decreasing incidence of synchronous BM among patients with SCLC in the last decade. Further, we developed an interpretable GBM model to predict the risk of BM in patients with high-grade lung NETs. Based on each patient’s clinical features, the model calculated a corresponding total SHAP value, where a higher value indicated a greater individual risk of developing BM, as illustrated in the example shown in Fig. 4D.
Bone is a common metastatic site in lung cancer. Previous studies found that the occurrence of BM in SCLC ranged from 16.7 to 40.4%13,14,15. In this study, we found a rate of 23% for SCLC with BM in the SEER database. Additionally, we found a rate of 17.2% for LCNEC with BM, which is higher than a previously reported rate8. The reason for these differing rates among studies is unclear, but variations in inclusion and exclusion criteria may contribute. The incidence of synchronous BM decreased in SCLC and other NSCLC in the last decade, which is consistent with the overall decreasing incidence of lung cancer in the USA16. Additionally, the increased uptake of lung cancer screening may have contributed to the decreased incidence of synchronous BM. Screening is associated with earlier-stage diagnoses and improved survival, and warrants wider adoption17.
In this study, we compared the predictive value of eight machine learning algorithms and found that the GBM model had the best performance for predicting synchronous BM. Among the nine variables included in the GBM model, liver metastasis exerted the strongest positive influence on the prediction of synchronous BM. This finding suggests a strong association between BM and liver metastasis in patients with high-grade lung NETs. The underlying mechanism requires further research. Interestingly, we found that the impact of age on BM occurrence differed by liver metastasis status. Among patients with liver metastasis, younger age had a positive impact, whereas among those without liver metastasis, younger age had less impact. These findings suggest a more aggressive natural history in a subset of young patients with high-grade lung NETs.
Multiple mechanisms have been associated with BM in patients with SCLC. One study showed that plasma hyaluronan expression was positively correlated with BM12. The binding of hyaluronan to CD44 could regulate migration and invasion, resulting in metastasis18,19. Pang’s study indicated that dickkopf1 was an important regulator of BM in vivo20. Another study suggested that annexin A1 was involved in BM in SCLC21. However, research on the mechanisms of BM in LCNEC is lacking.
A study using the SEER database found significant differences in demographic and clinical characteristics between high-grade SCLC and high-grade LCNEC, and found that LCNEC patients had better survival22. In this study, we found no statistically significant difference in median CSS between SCLC and LCNEC patients who had synchronous BM only. The median CSS of these patients was only 8 months, indicating a poor prognosis and highlighting the unmet clinical need for more effective treatment strategies. However, the optimal treatment paradigm for high-grade lung NETs with BM has not been well studied. In recent years, immunotherapy has emerged as a promising treatment for high-grade lung NETs23. Additionally, combining immunotherapy with other treatments, such as chemotherapy, radiotherapy, and anti-angiogenesis therapy, could extend survival24. Moreover, nanomaterials are emerging as a new method to enhance the efficacy of immunotherapy25,26. Additional research is required to explore the efficacy of these treatment paradigms for patients with high-grade lung NETs and bone metastasis or other metastases.
The GBM model developed in this study demonstrated superior performance in predicting synchronous BM in patients with high-grade lung NETs. By incorporating routinely available clinical variables from the SEER database and applying SHAP for interpretability, the model provided both individualized risk scores and insights into key predictors such as liver metastasis, age, and N stage. This transparency supports risk-adapted clinical decisions, such as early imaging for high-risk patients. However, real-world implementation poses several challenges. The absence of molecular and genetic data may limit the model’s biological precision. Additionally, practical barriers—such as integration into electronic health records, lack of external validation, concerns about interpretability, and data security—remain significant27. Model performance may also degrade over time without continuous monitoring and retraining28. Effective deployment will require robust infrastructure, regular updates, and intuitive interfaces. Prospective, multi-institutional validation is essential. Despite these limitations, interpretable models hold strong potential to enhance personalized care in high-grade lung NETs.
We acknowledge a few limitations in this analysis. Firstly, the GBM model requires external validation to confirm its predictive performance and the generalizability of the results. Secondly, because of the retrospective nature of the SEER database, some important variables, such as Ki67 and smoking status, are lacking. These variables could be integrated into the model in the future to improve its predictive power.
Conclusion
In this study, we found that the incidence of SCLC with synchronous BM showed a slight but statistically significant decrease in the last decade. The selected GBM model could help identify patients at high risk of BM among those with high-grade lung NETs. Additionally, we found that liver metastasis was strongly associated with bone metastasis. The survival of patients with high-grade lung NETs and BM was poor.
Data availability
The datasets for this study can be obtained from the corresponding author upon any reasonable request.
References
Randhawa, S., Trikalinos, N. & Patterson, G. A. Neuroendocrine tumors of the lung. Thorac. Surg. Clin. 31(4), 469–476 (2021).
Hendifar, A. E., Marchevsky, A. M. & Tuli, R. Neuroendocrine tumors of the lung: Current challenges and advances in the diagnosis and management of well-differentiated disease. J. Thorac. Oncol. 12(3), 425–436 (2017).
Filosso, P. L. et al. Knowledge of pulmonary neuroendocrine tumors: Where are we now?. Thorac. Surg. Clin. 24(3), ix–xii (2014).
Iyoda, A., Azuma, Y. & Sano, A. Neuroendocrine tumors of the lung: Clinicopathological and molecular features. Surg. Today 50(12), 1578–1584 (2020).
Shah, S. et al. Incidence and survival outcomes in patients with lung neuroendocrine neoplasms in the United States. Cancers (Basel) 13(8), 1753 (2021).
Huang, L. et al. Incidence, survival comparison, and novel prognostic evaluation approaches for stage iii–iv pulmonary large cell neuroendocrine carcinoma and small cell lung cancer. BMC Cancer 23(1), 312 (2023).
Liu, C., Yi, J. & Jia, J. Diagnostic and prognostic nomograms for bone metastasis in small cell lung cancer. J. Int. Med. Res. 49(10), 3000605211050735 (2021).
Yang, Q. et al. Clinicopathological characteristics and prognostic factors of pulmonary large cell neuroendocrine carcinoma: A large population-based analysis. Thorac. Cancer. 10(4), 751–760 (2019).
Ma, H. et al. A clinical nomogram for predicting cancer-specific survival in pulmonary large-cell neuroendocrine carcinoma patients: A population-based study. Int. J. Gen. Med. 14, 7299–7310 (2021).
Oster, G. et al. Natural history of skeletal-related events in patients with breast, lung, or prostate cancer and metastases to bone: a 15-year study in two large US health systems. Support Care Cancer 21(12), 3279–3286 (2013).
Silva, S. C., Wilson, C. & Woll, P. J. Bone-targeted agents in the treatment of lung cancer. Ther. Adv. Med. Oncol. 7(4), 219–228 (2015).
Zhao, C. et al. Hyaluronic acid correlates with bone metastasis and predicts poor prognosis in small-cell lung cancer patients. Front. Endocrinol. (Lausanne). 12, 785192 (2021).
Chen, Q. et al. Deep learning of bone metastasis in small cell lung cancer: A large sample-based study. Front. Oncol. 13, 1097897 (2023).
Uei, H. & Tokuhashi, Y. Prognostic factors in patients with metastatic spine tumors derived from lung cancer-a novel scoring system for predicting life expectancy. World J. Surg. Oncol. 16(1), 131 (2018).
Cetin, K., Christiansen, C. F., Jacobsen, J. B., Norgaard, M. & Sorensen, H. T. Bone metastasis, skeletal-related events, and mortality in lung cancer patients: A Danish population-based cohort study. Lung Cancer 86(2), 247–254 (2014).
Siegel, R. L., Miller, K. D., Wagle, N. S. & Jemal, A. Cancer statistics, 2023. CA Cancer J. Clin. 73(1), 17–48 (2023).
Edwards, D. M. et al. Impact of lung cancer screening on stage migration and mortality among the national Veterans Health Administration population with lung cancer. Cancer-Am. Cancer Soc. 130(17), 2910–2917 (2024).
Spadea, A. et al. Evaluating the efficiency of hyaluronic acid for tumor targeting via CD44. Mol. Pharm. 16(6), 2481–2493 (2019).
Wang, X. et al. Expression of CD44 standard form and variant isoforms in human bone marrow stromal cells. Saudi Pharm J. 25(4), 488–491 (2017).
Pang, H. et al. The biological effects of dickkopf1 on small cell lung cancer cells and bone metastasis. Oncol. Res. 25(1), 35–42 (2017).
Chen, P. et al. Annexin A1 is a potential biomarker of bone metastasis in small cell lung cancer. Oncol. Lett. 21(2), 141 (2021).
Wang, J., Ye, L., Cai, H. & Jin, M. Comparative study of large cell neuroendocrine carcinoma and small cell lung carcinoma in high-grade neuroendocrine tumors of the lung: A large population-based study. J. Cancer. 10(18), 4226–4236 (2019).
Weber, M. M. & Fottner, C. Immune checkpoint inhibitors in the treatment of patients with neuroendocrine neoplasia. Oncol. Res. Treat. 41(5), 306–312 (2018).
Lahiri, A. et al. Lung cancer immunotherapy: Progress, pitfalls, and promises. Mol. Cancer 22(1), 40 (2023).
Chiang, C. S. et al. Combination of fucoidan-based magnetic nanoparticles and immunomodulators enhances tumour-localized immunotherapy. Nat. Nanotechnol. 13(8), 746–754 (2018).
Zhang, Q. et al. Biomimetic magnetosomes as versatile artificial antigen-presenting cells to potentiate T-cell-based anticancer therapy. ACS Nano 11(11), 10724–10732 (2017).
Verma, A. A. et al. Implementing machine learning in medicine. CMAJ 193(34), E1351–E1357 (2021).
Spies, N. C., Farnsworth, C. W., Wheeler, S. & McCudden, C. R. Validating, implementing, and monitoring machine learning solutions in the clinical laboratory safely and effectively. Clin. Chem. 70(11), 1334–1343 (2024).
Acknowledgements
The authors are grateful to all the staff in the National Cancer Institute (USA) for their contribution to the SEER program.
Contributions
Author contributions: (I) Conception and design: Tao Liu, Jin Yang (II) Administrative support: Jin Yang (III) Provision of study materials or patients: Tao Liu, Zongyun He, Zhe Chen (IV) Collection and assembly of data: Zongyun He, Zhe Chen, Haibing Tao (V) Data analysis and interpretation: Bo Lan, Zongyun He, Zhe Chen, Haibing Tao, Tao Liu, Jin Yang (VI) Manuscript writing: All authors (VII) Final approval of manuscript: All authors.
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethics approval
As this study is a retrospective analysis of a public dataset, ethical approval was not required.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Lan, B., He, Z., Chen, Z. et al. Machine learning for synchronous bone metastasis risk prediction in high grade lung neuroendocrine carcinoma. Sci Rep 15, 24637 (2025). https://doi.org/10.1038/s41598-025-09762-w