Predictive model for the 5-year survival status of osteosarcoma patients based on the SEER database and XGBoost algorithm

Jiang, Jiuzhou; Pan, Hao; Li, Mobai; Qian, Bao; Lin, Xianfeng; Fan, Shunwu

doi:10.1038/s41598-021-85223-4

Download PDF

Article
Open access
Published: 10 March 2021

Predictive model for the 5-year survival status of osteosarcoma patients based on the SEER database and XGBoost algorithm

Jiuzhou Jiang^1,2^na1,
Hao Pan³^na1,
Mobai Li^1,2,
Bao Qian^1,2,
Xianfeng Lin^1,2 &
…
Shunwu Fan^1,2

Scientific Reports volume 11, Article number: 5542 (2021) Cite this article

7407 Accesses
58 Citations
Metrics details

Subjects

Abstract

Osteosarcoma is the most common bone malignancy, with the highest incidence in children and adolescents. Survival rate prediction is important for improving prognosis and planning therapy. However, there is still no prediction model with a high accuracy rate for osteosarcoma. Therefore, we aimed to construct an artificial intelligence (AI) model for predicting the 5-year survival of osteosarcoma patients by using extreme gradient boosting (XGBoost), a large-scale machine-learning algorithm. We identified cases of osteosarcoma in the Surveillance, Epidemiology, and End Results (SEER) Research Database and excluded substandard samples. The study population was 835 and was divided into the training set (n = 668) and validation set (n = 167). Characteristics selected via survival analyses were used to construct the model. Receiver operating characteristic (ROC) curve and decision curve analyses were performed to evaluate the prediction. The accuracy of the prediction model was excellent both in the training set (area under the ROC curve [AUC] = 0.977) and the validation set (AUC = 0.911). Decision curve analyses proved the model could be used to support clinical decisions. XGBoost is an effective algorithm for predicting 5-year survival of osteosarcoma patients. Our prediction model had excellent accuracy and is therefore useful in clinical settings.

Development of nomogram and discussion of radiotherapy effect for osteosarcoma survival

Article Open access 05 January 2023

Predicting diagnosis and survival of bone metastasis in breast cancer using machine learning

Article Open access 25 October 2023

Improved gated recurrent unit-based osteosarcoma prediction on histology images: a meta-heuristic-oriented optimization concept

Article Open access 01 April 2025

Introduction

Osteosarcoma is the most common bone malignancy, with the highest incidence in children and adolescents^1,2,3. Osteosarcoma is the eighth most common cancer among childhood cancers¹. The incidence rate of childhood and adolescent osteosarcoma ranges between 4 and 7 per million persons per year among different ethnicities¹. The 5-year survival rate is usually used for evaluating treatments or risk factors^1,2,3,4,5. In the 1950s, the 5-year overall survival (OS) rate of patients with osteosarcoma was 22%⁶, but it has increased to 55–70% owing to the advancements in medicine in recent years^1,3,7,8,9.

The Surveillance, Epidemiology, and End Results (SEER) program, sponsored by the National Cancer Institute (NCI), is a system of population-based cancer registries that currently covers approximately 28% of the US population from geographically defined areas¹⁰. Survival prediction models for osteosarcoma patients have been constructed previously^11,12,13. However, the results of these studies have not been very satisfactory and they did not use data from the SEER database. Hence, further studies for better prediction models are needed.

For preparing prediction models for cancer, artificial intelligence (AI) models—constructed by machine learning (ML) algorithms—are common. However, most models are based on traditional ML algorithms created in the last century, including back propagation neural network (BPNN), multi-layer perceptron (MLP), decision tree, support vector machine (SVM), and Bayesian network¹⁴.

Extreme gradient boosting (XGBoost) is a large-scale machine-learning algorithm that was first officially published in 2016¹⁵. It is an improvement over the gradient boosting decision tree (GBDT). A single decision tree is a simple and weak classifier, but a tree ensemble model could be much better, such as the random forest¹⁶ and GBDT¹⁷. XGBoost is constructed by iterations for minimizing the loss of function¹⁵. Compared with GBDT, XGBoost uses a technique called ‘feature sub-sampling’, which is used in random forest to prevent over-fitting¹⁵. The XGBoost algorithm has been used widely in industries but rarely in medical research. Compared with traditional ML algorithms, XGBoost is more novel and complex. An important advantage of XGBoost over traditional ML algorithms is having random seeds that make the model better by repetitive operation even if the parameters are not changed. On comparison, SVM is not good at dealing with a problem with many samples and variables¹⁸, and the Bayesian network is easily and quickly trained, but is not complex enough.

Therefore, in the current study, we built an AI classifier by using the XGBoost algorithm to predict the 5-year survival of osteosarcoma patients, and aimed to construct a better AI prediction model. We extracted the samples from the SEER program database to train and cross-validate our prediction model. Additionally, to compare XGBoost to other traditional ML algorithms, we also built two other models by using SVM and the Bayesian network, which are common and representative ML algorithms in medical research. Receiver operating characteristic (ROC) analysis, area under the ROC curve (AUC) and decision curve analysis (DCA) of cross-validation were used for the evaluation of these three different models.

Results

Characteristics of the study population

The overall survival curve for 2694 osteosarcoma patients from the SEER program database declined much rapidly before the 5-year cut-off, compared with a slow downward trend in patient survival after 5 years (Fig. 1). Thus, predicting 5-year survival of osteosarcoma patients is of clinical value for treatment planning systems. We performed exclusion as shown in the flow chart (Fig. 2). Finally, 835 patients were included in our study. The study population was randomly divided into a training set (n = 668; 80%) and a validation set (n = 167; 20%).

There was no significant difference between the training and validation sets considering the 15 characteristics except primary tumor number (Table 1). The most common primary tumor sites were the arms and legs i.e., limbs (81.89% and 76.05% in the training and validation sets, respectively); few patients had local lymphatic metastasis (2.99% in the training and validation sets, both). Patients were more likely to have distant metastasis (21.21% and 19.76% in the training and validation sets, respectively). Most patients underwent surgery at the anatomical location (90.57% in the training set and 88.62% in the validation set) (Table 1).

Table 1 Characteristics of 835 patients included in the study.

Full size table

Survival analyses

According to the Kaplan–Meier survival curves (Fig. 3) and log-rank tests for categorical variables, sex (P = 0.060), chemotherapy (P = 296) and primary tumor number (P = 0.500) were not significant factors influencing survival, but anatomical location (P < 0.001), histological grade (P = 0.001), tumor extension (P < 0.001), radiation (P < 0.001), local lymphatic metastasis (P < 0.001), distant metastasis (P < 0.001) and surgery (P < 0.001) significantly affected patient survival (Table 2). In the Cox proportional hazards regression model, the hazard ratio (HR) was used for evaluating the relationship between the corresponding variable and patient survival. Age (HR 1.682, 95% CI 1.538–1.840; P < 0.001) and tumor size (HR 1.266, 95% CI 1.185–1.353; P < 0.001) were significantly related to patient survival (Table 2). But year of diagnosis was not (HR 1.010, 95% CI 0.895–1.141; P = 0.867).

Table 2 Survival analysis (Log-rank test and Cox regression model) evaluating the influence of characteristics for osteosarcoma patient survival.

Full size table

We selected following characteristics into model construction: anatomical location, histological grade, tumor extension, radiation, local lymphatic metastasis, distant metastasis, surgery, age and tumor size. These characteristics were significantly in the survival analyses. In addition, we take chemotherapy into our model as it is an important predictor of survival.

Model evaluation

Our model was tested and adjusted repeatedly, and the parameters were confirmed for the best model. The details of our model are shown in the supplementary materials (Material S1). To determine the accuracy of our models, we performed cross-validations. The ROC curves of the predictions for the training set (n = 668) and the validation set (n = 167) were constructed and the corresponding AUC was calculated. The XGBoost model had a better performance in the training set (AUC = 0.977, 95% confidence interval [CI] 0.968–0.986), compared with SVM (AUC = 0.817, 95% CI 0.785–0.852) and the Bayesian network (AUC = 0.817, 95% CI 0.785–0.849) (Fig. 4a.). In the validation set, the accuracy of the XGBoost model for predicting survival was higher (AUC = 0.911, 95% CI 0.865–0.956) than SVM (AUC = 0.801, 95% CI 0.726–0.876) and the Bayesian network (AUC = 0.781, 95% CI 0.689–0.873) (Fig. 4b). Our XGBoost model was better in predicting the 5-year survival of osteosarcoma patients as the AUC was over 0.9 in cross-validation (in both sets), compared to the other models.

Decision curves of the three models were constructed in our study (Fig. 5). The y-axis of the decision curve represents the net benefit, a decision analytic measure judging whether clinical decisions have more benefit than harm. Each point on the x-axis represents a threshold probability that differentiates between patients with 5-year survival and those without. The decision curve of XGBoost was greater than that of the other two models because the net benefit was the highest for most of the thresholds.

Discussion

Survival prediction for patients with malignancy is usually difficult but important, as it influences treatment planning and patient decision¹⁹. Compared with the empirical prediction from clinicians, our prediction model gives a more reliable choice for predicting the 5-year survival status of osteosarcoma patients. When clinicians prepare the plan for interventional or long-term therapy for patients, the expected survival time could be an influencing factor. Considering this, our prediction model could help prepare a reasonable therapy plan for personalized medicine.

Several survival prediction models have been used for osteosarcoma patients, including those based on nomograms (constructed by regression models)¹³, tomography images¹², or the ML algorithm¹¹. A 1-year survival prediction model using the Bayesian network was constructed in 2017¹¹, with an AUC of 0.767. However, this was a single-center study. Moreover, the 1-year survival rate of osteosarcoma patients is much higher than 5-year survival rate (Fig. 1), and is therefore not as meaningful as the 5-year survival. Furthermore, a 5-year survival prediction model for predicting the survival of patents with high-grade osteosarcoma was prepared using radiomics of tomography images¹². It was an innovative model, with an AUC of 0.86 in the training cohort and 0.84 in the validation cohort. However, this model used radiomics of tomography images to calculate a radiomics score for each patient and developed a multiple logistic regression model using radiomics score with the addition of several other characteristics. Logistic regression is a regular algorithm that can be replaced by a more complex algorithm. Thus, compared to these two studies, our study was a multicenter study and used a more accurate and stable algorithm to construct the prediction model. Therefore, our AI model based on XGBoost had a higher accuracy in predicting the 5-year survival of osteosarcoma patients (AUC = 0.977 and 0.911 in the training and validation sets, respectively); the accuracy of a prediction model is considered the most important quality¹⁴.

All the characteristics in our model were related to osteosarcoma patient prognosis. Histological grade and tumor extension influence survival time of patients. The histological grade of cancer is an indicator of the differentiation of tumor cells, and the tumor extension is used to express the degree of cancer progression^20,21. Moreover, age, tumor site, metastasis, therapy, and tumor size are important prognostic factors for osteosarcoma patients^6,7,9. In most previous prognostic models, age and tumor size were usually transformed to classified variables^11,12,13. The use of the method for transforming variables could help calculate the risk for different kinds of patients and help list the risk in a table. In our prediction model, we preferred to calculate the 5-year survival probability of a specific patient. This gives a more detailed and personalized prediction, which provides medical plans as detailed and customized as possible rather than similar medical plans for a class of patients. Personalized medicine and precision medicine have been focus areas in recent years, both of which are based on large omics, molecular diagnostics, and high-throughput technologies^22,23,24. Additionally, AI is an important tool for personalized medicine^25,26, and our AI-based prediction model could help in personal therapy planning, thereby assisting in personalized medicine. For example, a clinician could not decide to recommend a patient to perform surgery or not. He could use our model with the variable “Surgery” as “yes” and “no”. Comparing the results given by the two conditions could help for his decision.

XGBoost has outstanding performance for processing large-scale and high-dimensional data²⁷. However, for the first time, this algorithm has been used to construct prediction models for osteosarcoma patient survival. As XGBoost is good at dealing with complex problems, it is suitable for most other types of complex classification problems^27,28,29.

Our study had some advantages. First, the SEER database provided complete information of patients covering widespread areas. Second, our AI model could provide personalized survival prediction for patients, thereby providing individualized therapy. Finally, our AI model can be used to determine survival for more osteosarcoma patients because all the information used for predicting survival is easily accessible and our model can be optimized as a software-based or web-based tool.

However, the study has some limitations. First, our study was retrospective; prospective randomized clinical trials will be needed to provide high-level evidence for clinical application. Second, we could not acquire the socioeconomic status, obviously related to patient survival, and the incidence of pathologic fractures, an important prognostic factor for osteosarcoma. Finally, in the SEER data, “no” and “unknown” combined in one category in chemotherapy and radiation. We could not ignore the underreporting of chemotherapy and radiation.

In conclusion, we used the XGBoost algorithm to construct an AI model predicting the 5-year survival of osteosarcoma patients. Age, primary tumor site, histological grade, tumor extension, tumor size, local lymphatic metastasis, distant metastasis, radiation, chemotherapy and surgery were the characteristics contributing to the model. Our AI prediction model had excellent accuracy according to ROC analyses. As the clinical value of the model was confirmed considering DCA, we believe the developed AI model could be used as a clinical tool for helping clinicians in making better treatment decisions for osteosarcoma patients¹.

Materials and methods

Study population

We identified all cases of osteosarcoma listed in the SEER Research Database (2004–2014). The accession number is 10467-Nov 2018. There were 2694 cases and all were confirmed histologically as osteosarcoma. SEER*Stat Software (version 8.3.5) was used to extract these cases. We constructed a survival curve for the 2694 patients to evaluate the overall survival of osteosarcoma patients. However, most of the cases were excluded according to our inclusion and exclusion criteria. The inclusion criteria were as follows: (a) complete information about survival and follow-up available, (b) diagnosis of osteosarcoma as the primary malignant tumor. The exclusion criteria were as follows: (a) death due to other causes; (b) alive but survival < 5 years at the follow-up cut-off date; (c) information about tumor site, grade, tumor size, metastasis or therapy unavailable.

Variable selection

After comprehensive analyses for prognostic factors of osteosarcoma considering our clinical knowledge and previous studies^{7,8,9,30,31,32,33}, we selected 15 characteristics to be evaluated, including patient information (age, sex and year of diagnosis) and survival information (survival period and status at the follow-up cut-off date). Moreover, tumor information including the anatomical location, histological grade, tumor extension, tumor size, primary tumor number, local lymphatic metastasis, distant metastasis, radiation, chemotherapy and surgery was also taken into consideration.

We performed survival analyses using the patient and tumor information to determine the characteristics that significantly influenced patient survival. These analyses were performed before the exclusion of patients who alive but survival < 5 years at the follow-up cut-off date.

Construction of the prediction model

Our prediction model was based on XGBoost, a scalable tree boosting system. The model was trained using the training set and tested using the validation set to determine model accuracy. Before running the training program, a response variable was obtained for survival information. It reflected the survival status of patients at 5 years, in which 1 = survival and 0 = death. One-hot encoding was performed for the three multi-classified variables (anatomical location, histological grade, and tumor extension). Normalization was performed for the two continuous variables (age and tumor size).

Bagging (bootstrap aggregating) and boosting are ensemble learning methods that can integrate decision trees to reduce the model error³⁴. XGBoost combines the advantages of these two methods and effectively reduces the bias-related error and variance-related error of the model (Fig. 6). In our prediction model, the number of ensemble decision trees was 30 and the maximum depth of each tree was 12. This was calculated via repeated tries to get the best accuracy and avoid overfitting. The outcomes of XGBoost were continuous outputs between 0 and 1, which represented the probability of the corresponding patient survival for > 5 years.

Model evaluation

ROC curves were constructed for prediction in the training and validation sets. The AUC was used to evaluate the performance of our model. An AUC value between 0.5 and 1.0 is an important statistical property to evaluate binary classifiers³⁵.

DCA that evaluates and compares prediction models incorporating clinical consequences was another way to evaluate our model³⁶. Compared with traditional measures such as AUC that only represents the predictive accuracy, DCA give information about the clinical value of models³⁷. In our study, decision curves were constructed to calculate the net benefit across different threshold probabilities of our prediction.

For comparing XGBoost with other ML classifiers, we constructed two other prediction models, respectively, based on SVM and the Bayesian network.

Statistical analyses

The Mann–Whitney U test and chi-squared test were used to compare continuous variables and categorical variables, respectively. Kaplan–Meier survival analysis and log-rank test were performed to analyze the relationship between categorical variables and patient survival. A multivariate Cox proportional hazards regression model was constructed to analyze the relationship between continuous variables and patient survival. These test and analyses were performed using SPSS 25.0 software (IBM, Armonk, NY). R Version 3.4.4 (R Foundation for Statistical Computing, Vienna, Austria) was used to construct, train, and validate the prediction models with “xgboost” package. The decision curve analysis was also performed using R Version 3.4.4. A P-value of < 0.05 was considered statistically significant.

Ethical considerations

We obtained permission to access the files of SEER database. The personal identifying information was not involved in this study so that the informed consent was not required. This study was reviewed and approved by the Medical Ethic Committee of Sir Run Run Shaw hospital affiliated to Medical College of Zhejiang University. And the study approval number is SRRSH2017092101.

Ethical approval

Medical Ethic Committee of Sir Run Run Shaw hospital affiliated to Medical College of Zhejiang University waived the informed consent off because all the information of patients were accessed from SEER database (https://seer.cancer.gov/data/). We declare that all methods were performed in accordance with the relevant guidelines and regulations (Declaration of Helsinki).

References

Mirabello, L., Troisi, R. J. & Savage, S. A. Osteosarcoma incidence and survival rates from 1973 to 2004: Data from the surveillance, epidemiology, and end results program. Cancer 115, 1531–1543. https://doi.org/10.1002/cncr.24121 (2009).
Article PubMed Google Scholar
Ottaviani, G. & Jaffe, N. The epidemiology of osteosarcoma. Cancer Treat Res. 152, 3–13. https://doi.org/10.1007/978-1-4419-0284-9_1 (2009).
Article PubMed Google Scholar
Friebele, J. C., Peck, J., Pan, X., Abdel-Rasoul, M. & Mayerson, J. L. Osteosarcoma: A meta-analysis and review of the literature. Am. J. Orthop. (Belle Mead N.J.) 44, 547–553 (2015).
Google Scholar
Anderson, M. E. Update on survival in osteosarcoma. Orthop. Clin. N. Am. 47, 283–292. https://doi.org/10.1016/j.ocl.2015.08.022 (2016).
Article Google Scholar
Nathan, S. S. & Healey, J. H. Demographic determinants of survival in osteosarcoma. Ann. Acad. Med. Singap. 41, 390–399 (2012).
PubMed Google Scholar
Coventry, M. B. & Dahlin, D. C. Osteogenic sarcoma; a critical analysis of 430 cases. J. Bone Jt. Surg. Am. 39, 741–757 (1957).
Article Google Scholar
Mialou, V. et al. Metastatic osteosarcoma at diagnosis: Prognostic factors and long-term outcome—The French pediatric experience. Cancer 104, 1100–1109. https://doi.org/10.1002/cncr.21263 (2005).
Article PubMed Google Scholar
Boon, E. et al. Impact of chemotherapy on the outcome of osteosarcoma of the head and neck in adults. Head Neck 39, 140–146. https://doi.org/10.1002/hed.24556 (2017).
Article PubMed Google Scholar
Wang, W. et al. Survival and prognostic factors in Chinese patients with osteosarcoma: 13-year experience in 365 patients treated at a single institution. Pathol. Res. Pract. 213, 119–125. https://doi.org/10.1016/j.prp.2016.11.009 (2017).
Article PubMed Google Scholar
Noone, A. M. et al. Comparison of SEER treatment data with medicare claims. Med. Care 54, e55–e64. https://doi.org/10.1097/MLR.0000000000000073 (2016).
Article PubMed PubMed Central Google Scholar
Nandra, R., Parry, M., Forsberg, J. & Grimer, R. Can a Bayesian belief network be used to estimate 1-year survival in patients with bone sarcomas?. Clin. Orthop. Relat. Res. 475, 1681–1689. https://doi.org/10.1007/s11999-017-5346-1 (2017).
Article PubMed PubMed Central Google Scholar
Wu, Y. et al. Survival prediction in high-grade osteosarcoma using radiomics of diagnostic computed tomography. EBioMedicine 34, 27–34. https://doi.org/10.1016/j.ebiom.2018.07.006 (2018).
Article PubMed PubMed Central Google Scholar
Kim, M. S. et al. Prognostic nomogram for predicting the 5-year probability of developing metastasis after neo-adjuvant chemotherapy and definitive surgery for AJCC stage II extremity osteosarcoma. Ann. Oncol. 20, 955–960. https://doi.org/10.1093/annonc/mdn723 (2009).
Article CAS PubMed Google Scholar
Kazem, M. A. Predictive models in cancer management: A guide for clinicians. Surgeon 15, 93–97. https://doi.org/10.1016/j.surge.2016.06.002 (2017).
Article PubMed Google Scholar
Chen, T. Q. & Guestrin, C. XGBoost: A scalable tree boosting system. In Kdd'16: Proc. 22nd Acm Sigkdd International Conference on Knowledge Discovery and Data Mining, 785–794. https://doi.org/10.1145/2939672.2939785 (2016).
Svetnik, V. et al. Random forest: A classification and regression tool for compound classification and QSAR modeling. J. Chem. Inf. Comput. Sci. 43, 1947–1958. https://doi.org/10.1021/ci034160g (2003).
Article CAS PubMed Google Scholar
Natekin, A. & Knoll, A. Gradient boosting machines, a tutorial. Front. Neurorobot. https://doi.org/10.3389/fnbot.2013.00021 (2013).
Article PubMed PubMed Central Google Scholar
Khadse, V., Mahalle, P. N. & Biraris, S. V. An empirical comparison of supervised machine learning algorithms for internet of things data. In 2018 Fourth International Conference on Computing Communication Control and Automation (Iccubea) (2018).
Cheon, S. et al. The accuracy of clinicians’ predictions of survival in advanced cancer: A review. Ann. Palliat. Med. 5, 22. https://doi.org/10.3978/j.issn.2224-5820.2015.08.04 (2016).
Article ADS PubMed Google Scholar
Edge, S. B., American Joint Committee on Cancer & American Cancer Society. AJCC Cancer Staging Handbook: From the AJCC Cancer Staging Manual 7th edn. (Springer, 2010).
Google Scholar
Meister, P. et al. Osteosarcoma—Histological-evaluation and grading. Arch. Orthop. Trauma Surg. 94, 91–98. https://doi.org/10.1007/Bf00433573 (1979).
Article CAS PubMed Google Scholar
Di Sanzo, M. et al. Clinical applications of personalized medicine: A new paradigm and challenge. Curr. Pharm. Biotechnol. 18, 194–203. https://doi.org/10.2174/1389201018666170224105600 (2017).
Article CAS PubMed Google Scholar
Dudley, J. T., Listgarten, J., Stegle, O., Brenner, S. E. & Parts, L. Personalized medicine: From genotypes, molecular phenotypes and the quantified self, towards improved medicine. In Pacific Symposium on Biocomputing 2015 (Psb), 342–346 (2015).
Alyass, A., Turcotte, M. & Meyre, D. From big data analysis to personalized medicine for all: Challenges and opportunities. Bmc Med. Genomics. https://doi.org/10.1186/s12920-015-0108-y (2015).
Article PubMed PubMed Central Google Scholar
Moon, H. et al. Ensemble methods for classification of patients for personalized medicine with high-dimensional data. Artif. Intell. Med. 41, 197–207. https://doi.org/10.1016/j.artmed.2007.07.003 (2007).
Article PubMed Google Scholar
Dilsizian, S. E. & Siegel, E. L. Artificial intelligence in medicine and cardiac imaging: Harnessing big data and advanced computing to provide personalized medical diagnosis and treatment. Curr. Cardiol. Rep. https://doi.org/10.1007/s11886-013-0441-8 (2014).
Article PubMed Google Scholar
Torlay, L., Perrone-Bertolotti, M., Thomas, E. & Baciu, M. Machine learning-XGBoost analysis of language networks to classify patients with epilepsy. Brain Inform. 4, 159–169. https://doi.org/10.1007/s40708-017-0065-7 (2017).
Article CAS PubMed PubMed Central Google Scholar
Zhong, J. et al. XGBFEMF: An XGBoost-based framework for essential protein prediction. IEEE Trans. Nanobiosci. 17, 243–250. https://doi.org/10.1109/TNB.2018.2842219 (2018).
Article Google Scholar
Chen, Y. et al. Classification of short single-lead electrocardiograms (ECGs) for atrial fibrillation detection using piecewise linear spline and XGBoost. Physiol. Meas. 39, 104006. https://doi.org/10.1088/1361-6579/aadf0f (2018).
Article PubMed Google Scholar
Clark, J. C. M., Dass, C. R. & Choong, P. F. M. A review of clinical and molecular prognostic factors in osteosarcoma. J. Cancer Res. Clin. Oncol. 134, 281–297. https://doi.org/10.1007/s00432-007-0330-x (2008).
Article CAS PubMed Google Scholar
Davis, A. M., Bell, R. S. & Goodwin, P. J. Prognostic factors in osteosarcoma—A critical-review. J. Clin. Oncol. 12, 423–431. https://doi.org/10.1200/Jco.1994.12.2.423 (1994).
Article CAS PubMed Google Scholar
Bacci, G. et al. Prognostic factors for osteosarcoma of the extremity treated with neoadjuvant chemotherapy—15-year experience in 789 patients treated at a single institution. Cancer 106, 1154–1161. https://doi.org/10.1002/cncr.21724 (2006).
Article PubMed Google Scholar
Kim, W. et al. Postmetastasis survival in high-grade extremity osteosarcoma: A retrospective analysis of prognostic factors in 126 patients. J. Surg. Oncol. 117, 1223–1231. https://doi.org/10.1002/jso.24963 (2018).
Article CAS PubMed Google Scholar
Ditzler, G., LaBarck, J., Ritchie, J., Rosen, G. & Polikar, R. Extensions to online feature selection using bagging and boosting. IEEE Trans. Neural Netw. Learn. Syst. 29, 4504–4509. https://doi.org/10.1109/Tnnls.2017.2746107 (2018).
Article PubMed Google Scholar
Fawcett, T. An introduction to ROC analysis. Pattern Recogn. Lett. 27, 861–874. https://doi.org/10.1016/j.patrec.2005.10.010 (2006).
Article Google Scholar
Vickers, A. J. & Elkin, E. B. Decision curve analysis: A novel method for evaluating prediction models. Med. Decis. Making 26, 565–574. https://doi.org/10.1177/0272989x06295361 (2006).
Article PubMed PubMed Central Google Scholar
Vickers, A. J., Van Calster, B. & Steyerberg, E. W. Net benefit approaches to the evaluation of prediction models, molecular markers, and diagnostic tests. Br. Med. J. https://doi.org/10.1136/bmj.i6 (2016).
Article Google Scholar

Download references

Acknowledgements

The authors thank the SEER team for help regarding clinical data acquisition. The SEER database is publicly available and we obtained the data use agreement for SEER database. This research was funded by the National Natural Science Fundation of China, Grant Numbers 81702143, 81772387 and 81472064.

Author information

These authors contributed equally: Jiuzhou Jiang and Hao Pan.

Authors and Affiliations

Department of Orthopaedic Surgery, Sir Run Run Shaw Hospital, Medical College of Zhejiang University, Hangzhou, China
Jiuzhou Jiang, Mobai Li, Bao Qian, Xianfeng Lin & Shunwu Fan
Key Laboratory of Musculoskeletal System Degeneration and Regeneration Translational Research of Zhejiang Province, Hangzhou, China
Jiuzhou Jiang, Mobai Li, Bao Qian, Xianfeng Lin & Shunwu Fan
Department of Orthopaedics, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
Hao Pan

Authors

Jiuzhou Jiang
View author publications
Search author on:PubMed Google Scholar
Hao Pan
View author publications
Search author on:PubMed Google Scholar
Mobai Li
View author publications
Search author on:PubMed Google Scholar
Bao Qian
View author publications
Search author on:PubMed Google Scholar
Xianfeng Lin
View author publications
Search author on:PubMed Google Scholar
Shunwu Fan
View author publications
Search author on:PubMed Google Scholar

Contributions

Conceptualization, J.J. and X.L.; Methodology, J.J., H.P. and M.L.; Software, J.J. and B.Q.; Investigation, M.L.; Resources, X.L. and S.F.; Data curation, H.P.; Formal analysis, M.L. and B.Q.; Validation, S.F.; Writing—original draft preparation, J.J. and H.P.; Writing—review and editing, J.J., H.P. and X.L; Visualization, B.Q.; Supervision, S.F.

Corresponding authors

Correspondence to Xianfeng Lin or Shunwu Fan.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Jiang, J., Pan, H., Li, M. et al. Predictive model for the 5-year survival status of osteosarcoma patients based on the SEER database and XGBoost algorithm. Sci Rep 11, 5542 (2021). https://doi.org/10.1038/s41598-021-85223-4

Download citation

Received: 06 September 2020
Accepted: 26 February 2021
Published: 10 March 2021
DOI: https://doi.org/10.1038/s41598-021-85223-4

This article is cited by

Radiomics and deep learning model based on X-ray imaging for the assisted diagnosis of early Legg-Calvé-Perthes disease
- Dian Zhang
- Ya-nan Li
- Wan-liang Guo
BMC Musculoskeletal Disorders (2025)
Identification of relevant features using SEQENS to improve supervised machine learning models predicting AML treatment outcome
- Pedro Pons-Suñer
- François Signol
- Eva Barragán
BMC Medical Informatics and Decision Making (2025)
Development of a machine learning-based model to predict prognosis of alpha-fetoprotein-positive hepatocellular carcinoma
- Bingtian Dong
- Hua Zhang
- Chaoxue Zhang
Journal of Translational Medicine (2024)
Retrospective Analysis of Radiofrequency Ablation in Patients with Small Solitary Hepatocellular Carcinoma: Survival Outcomes and Development of a Machine Learning Prognostic Model
- Qi-fan He
- Yue Xiong
- Zhong-hua Chen
Current Medical Science (2024)
A comparative analysis of deep learning-based location-adaptive threshold method software against other commercially available software
- Daebeom Park
- Eun-Ah Park
- Whal Lee
The International Journal of Cardiovascular Imaging (2024)

Subjects

Abstract

Similar content being viewed by others

Introduction

Results

Characteristics of the study population

Survival analyses

Model evaluation

Discussion

Materials and methods

Study population

Variable selection

Construction of the prediction model

Model evaluation

Statistical analyses

Ethical considerations

Ethical approval

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding authors

Ethics declarations

Competing interests

Additional information

Publisher's note

Supplementary Information

Rights and permissions

About this article

Cite this article

Share this article

This article is cited by

Search

Quick links