Abstract
Upper gastrointestinal cancer (UGC) sometimes metastasizes to the splenic hilum lymph node (SHLN). However, surgical removal of the SHLN is technically difficult, and the risk of postoperative complications is high. Although there are models that predict SHLN metastasis, they usually provide only point estimates of risk and give no information on the uncertainty of the prediction. To address this issue, we aimed to develop a Bayesian logistic regression model called Bayes-SHLNM. The performance of the Bayesian models was compared with that of a frequentist logistic regression (FLR) model as a benchmark, and the posterior probability distribution (PPD) was shown for each individual case. The performance of Bayes-SHLNM was equivalent to that of the FLR model, and the PPD for each case visualized the uncertainty of the prediction. These results indicate that the Bayes-SHLNM model has the potential to be used as a decision support system in clinical settings where uncertainty is high.
Introduction
Gastric cancer ranks fifth in terms of incidence and fourth in terms of cancer-related mortality worldwide1. The principle of curative treatment for gastric cancer is surgical resection with appropriate regional lymphadenectomy, as excessive resection causes high morbidity and shortens life expectancy2,3,4,5,6,7. According to the Japanese Gastric Cancer Classification, the splenic hilar lymph node (SHLN) is a regional lymph node of gastric cancer in the upper third of the stomach. SHLN metastasis has been reported at rates of 2.8–27.9% in previous studies8,9,10,11,12,13. Although a Japanese phase III trial comparing splenectomy and spleen preservation demonstrated survival non-inferiority, that trial was limited to upper advanced gastric cancer not invading the greater curvature6. Gastric cancer invading the greater curvature shows a high frequency of metastasis and a high therapeutic value index at the SHLNs; thus, the Japanese Gastric Cancer Treatment Guidelines recommend splenic hilar nodal dissection for these tumors2. Moreover, our retrospective study showed that the frequency of metastasis and the therapeutic value index at the SHLNs were high even for gastric cancer without invasion of the greater curvature, especially when the tumor was located on the posterior wall and the histology was of the undifferentiated type14. Thus, SHLN dissection with splenectomy is widely performed in Japan. Splenectomy has several disadvantages, particularly a high incidence rate of postoperative morbidity (approximately 20–30%), which has been reported to offset the survival benefit in certain cases in randomized controlled clinical trials and retrospective studies15,16,17,18,19. The importance of developing and selecting an appropriate surgical approach based on the oncological status of the cancer has intensified20,21. Furthermore, uniform treatment strategies typically applied to healthy patients are no longer feasible in patients with complex comorbidities or significant frailty22,23,24.
At present, there is a clear lack of tools and definitive criteria to support this decision-making, and these need to be developed. Surgeons and healthcare staff decide on the best operation for each patient based on the oncological profile and background of the patient. This complex decision-making process highlights the urgent need for tools that visualize and facilitate the sharing of critical information, thereby enhancing effective communication and enabling comprehensive decision-making in healthcare settings.
Machine learning techniques have been advancing rapidly in recent years and are being implemented in the field of medical oncology25,26,27,28,29,30,31,32,33. A key feature of machine learning is its data-driven nature, in which algorithms analyze large amounts of data in order to discover patterns and rules. Another key feature is its adaptability, which allows models to be updated and improve their accuracy as new data becomes available. These features make machine learning a powerful tool for addressing complex medical challenges, where data-driven insights and adaptability are critical for improving predictive performance and tailoring clinical decisions. Various machine learning models for predicting lymph node metastasis (LNM) in gastric cancer have been developed, mainly focusing on the eligibility for endoscopic resection in early gastric cancer and the prognosis of advanced gastric cancer (AGC)34,35,36. Notably, no models have been developed that attempt to change the surgical plans or extent of lymph node dissection37,38,39,40. Traditional machine learning models based on frequentist approaches fail to meet the clinical practice requirements because of their inability to predict uncertainty. Even in high-performance models, it is difficult to change clinical decision-making processes. The Bayesian approach allows prior information and beliefs to be incorporated into statistical inferences, thus providing a powerful tool for decision-making under uncertainty. Unlike traditional frequentist methods, the Bayesian framework provides intuitive interpretability by expressing results as probabilities. For example, clinicians can better understand statements such as “there is a 95% probability that the odds ratio is between 2.0 and 3.0,” which directly conveys the uncertainty associated with a prediction. 
This probabilistic interpretation makes the Bayesian approach particularly useful for complex clinical decisions where actionable insights must account for uncertainty. Another key advantage of Bayesian methods is their ability to quantify uncertainty through the posterior distributions of model parameters, thereby providing a comprehensive understanding of prediction confidence. Bayesian methods are inherently adaptive, allowing models to be dynamically updated as new data becomes available. By leveraging these strengths - intuitive interpretability, robust uncertainty quantification, and adaptability - the Bayesian approach offers a unique advantage over traditional methods in addressing the challenges of clinical decision-making. In the medical field, Bayesian inference has demonstrated effectiveness in various settings, including diagnosis, treatment planning, and epidemiological studies41,42,43. In this context, uncertainty may play an important role in determining the need for SHLN dissection.
In this study, we developed a Bayesian logistic regression model called the Bayesian prediction of SHLN metastasis (Bayes-SHLNM) to predict SHLN metastasis, using data from patients who underwent total gastrectomy and splenectomy between 2000 and 2012. The primary distinction between Bayesian and frequentist modeling approaches lies in how they handle uncertainty and parameter estimation. Frequentist models, such as the traditional logistic regression model (frequentist logistic regression [FLR]), provide single-point parameter estimates, leading to a single predictive model. By contrast, Bayesian methods treat parameters as random variables with distributions, resulting in a collection of models rather than a single model (Fig. 1). This ensemble of models captures the uncertainty inherent in the parameter estimates. The key advantage of the Bayesian approach is that it yields the posterior probability distribution (PPD) for SHLN metastasis risk. This distribution offers a richer and more comprehensive view of the predictive uncertainty than the point estimates provided by frequentist models. The Bayesian models were benchmarked against the FLR model to evaluate their performance. The PPD for each prediction was visualized to demonstrate the uncertainty and range of possible outcomes, which can be crucial for clinical decision-making in high-risk and uncertain scenarios.
This figure illustrates the different approaches of frequentist and Bayesian logistic regression models in predicting the probability of SHLN metastasis in patients who underwent total gastrectomy with splenectomy. The frequentist model (top pathway) uses clinical, tumor, lymph node location, and pathological information to generate a single-point probability estimate (e.g., 72%) for metastasis. In contrast, the Bayesian model (bottom pathway) incorporates the same data sources but outputs a posterior probability distribution (PPD), offering a more comprehensive view of the uncertainty associated with the prediction. The PPD allows for visualization of the range of possible outcomes and highlights the degree of uncertainty, which is critical for informed clinical decision-making in scenarios with high risk and uncertainty.
In this study, we aimed to develop a predictive model that supports decision-making by focusing on the prediction uncertainty obtained through Bayesian inference, and to visualize the individual PPD of SHLN metastasis in upper gastrointestinal cancer (UGC) based on clinicopathologic characteristics.
Results
Study population selection process
The patient selection flowchart is shown in Fig. 2. Between January 2000 and December 2012, a total of 5957 patients underwent gastrectomy with nodal dissection. Of these, 798 patients underwent total gastrectomy with splenectomy for primary gastric cancer. In total, 35 patients diagnosed with pT0 or pT1, 169 patients diagnosed with pStage IV, and 1 patient who underwent R1 or R2 resection were excluded. The final study population comprised 593 patients.
Flowchart depicting the selection process for the study population. From January 2000 to December 2012, a total of 5957 patients underwent gastrectomy with nodal dissection. Of these, 798 patients underwent total gastrectomy with splenectomy for primary gastric cancer. Exclusions were made for 35 patients diagnosed with pT0 or pT1 disease, 169 patients diagnosed with pStage IV disease, and 1 patient who underwent R1 or R2 resection. The final study population consisted of 593 patients.
Baseline characteristics
The study cohort comprised 593 patients. Male sex was predominant, and 15.2% of the patients received neoadjuvant chemotherapy. Half of the tumors predominantly invaded the upper gastric body, and 35.8% had greater curvature invasion (GCI). The predominant histology (histology 1) was poorly differentiated adenocarcinoma non-solid type (por2), and signet-ring cell carcinoma (sig) was the second most common histological component (histology 2). SHLN metastasis (#10), the prediction target, was present in 8.1% of the patients. The most frequent LNMs were found along the lesser curvature (#1, #3). LNMs adjacent to the SHLN (#4sb, #4d, #11d) were found in 7.3–12% of the patients. The cohort characteristics are summarized in Table 1.
Comparison of the performance of the Bayes-SHLNM and frequentist logistic regression models
Table 2 shows the 5-fold cross-validation (5fCV) performance of the Bayesian and FLR models. Among the four Bayesian models, the Bayes-SHLNM model showed superior performance in terms of the receiver operating characteristic area under the curve (ROC AUC) (0.83), precision-recall AUC (PRAUC, 0.35), and F1 score (0.31), with comparable results to those of the FLR model (Fig. 3). These findings indicate that the Bayes-SHLNM is a robust alternative to the FLR model.
a The receiver operating characteristic-area under the curve (AUC) compares the true positive rate (sensitivity) and false positive rate for each model. The Bayes-SHLNM model achieves the highest AUC of 0.83 [95% confidence interval (CI), 0.74–0.91]. The shaded areas around each curve represent the 95% confidence intervals for each model. b The precision-recall curve evaluates model performance with respect to precision and recall. The Bayes-SHLNM performed best among four Bayesian models, with an AUC of 0.35 [95% CI, 0.14–0.56], which was comparable with the FLR model (AUC = 0.37 [95% CI, 0.13–0.61]). The shaded regions represent the 95% CIs.
The results of the Bayes-SHLNM and FLR model predictions obtained from 5fCV are summarized in Table 3 and Supplementary Table 1. When the tumors were divided into two categories based on whether the tumor had GCI, which is recommended as an indication for SHLN dissection, the models predicted positive results equally in both categories in approximately 20% of cases, whereas tumors without GCI were predicted as negative precisely in 99% of cases. The results of Bayes-SHLNM and FLR were found to be similar; however, for cases without GCI, Bayes-SHLNM performed slightly better, whereas for cases with GCI, FLR demonstrated slightly higher accuracy.
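The metrics reported in this subsection (ROC AUC, precision or positive predictive value, negative predictive value, and F1 score) can be illustrated with a minimal sketch. It is not the evaluation code used in the study: the function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made only for illustration, and the cross-validation loop and confidence intervals are omitted.

```python
from typing import Dict, Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive case receives a
    higher score than a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion_metrics(
    y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Precision (PPV), recall, NPV, and F1 at a fixed decision threshold."""
    y_pred = [int(s >= threshold) for s in scores]
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    npv = tn / (tn + fn) if tn + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {"precision": precision, "recall": recall, "npv": npv, "f1": f1}


# Hypothetical, imbalanced toy data (2 positives among 10 cases); these are
# NOT values from the study cohort.
toy_labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
toy_scores = [0.05, 0.10, 0.20, 0.30, 0.40, 0.55, 0.60, 0.15, 0.70, 0.35]
toy_auc = roc_auc(toy_labels, toy_scores)
toy_metrics = confusion_metrics(toy_labels, toy_scores)
```

With a rare outcome such as SHLN metastasis, a high negative predictive value can coexist with a modest precision and F1 score, which is the pattern reported above.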
The Bayes-SHLNM model posterior probability distribution for individual patients
Six representative cases are shown in Figs. 4 and 5, where the Bayes-SHLNM model provided a PPD with prediction and uncertainty. Figures 4 and 5 show the individual posterior distributions of the probability of SHLN metastasis inferred using the Bayes-SHLNM model. Non-GCI (NonG)-Case 1 (Fig. 4a) was a 61-year-old man with AGC without GCI (U, Post, Type 1, 60 mm, por2 > sig, pT4a [SE]) and LNM (#1, #3, #4sa, #4sb, #11d). According to the Japanese Gastric Cancer Association (JGCA) guidelines (version 6), tumors that do not invade the greater curvature line are strongly recommended not to undergo SHLN dissection, including splenectomy. However, this model can provide an opportunity to reconsider whether such a patient, who is at risk of SHLN metastasis, should undergo SHLN dissection. WithG-Case 1 (Fig. 5a) was a 69-year-old man with AGC, GCI (UML, Circ, Type 4, 210 mm, por2 > sig, pT4a [SE]), and LNM (#1, #2, #3, #4sa, #4sb, #5, #6, #7, #9, #11p, #11d). NonG-Case 2 (Fig. 4b) was a 56-year-old woman with AGC without GCI (U, Less, Type 2, 60 mm, tub2, pT3 [SS]) and LNM (#3). The posterior distributions of these two cases support decisions based on the JGCA guidelines. Tumors that invade the greater curvature line are weakly recommended to undergo SHLN dissection, including splenectomy. WithG-Case 2 (Fig. 5b) was a 66-year-old woman with AGC with GCI (U, Ant, Type 2, 45 mm, tub2, pT3 [SS]) and no LNM. The posterior distribution of this case supports reconsideration of the decision based on the JGCA guidelines. This model can provide useful information for reaching a consensus on whether a patient should undergo splenectomy.
a NonG-Case 1: A 61-year-old man with advanced gastric cancer (AGC) without greater curvature invasion (U, Post, Type 1, 60 mm, por2 > sig, pT4a [SE]) and lymph node metastasis (LNM) (#1, #3, #4sa, #4sb, #11d). The Bayes-SHLNM model predicted a posterior distribution with a mean probability of 0.502 for SHLN metastasis. Although the Japanese Gastric Cancer Association (JGCA) guidelines strongly discourage SHLN dissection, this model allows for reconsideration based on the probability of metastasis. b NonG-Case 2: A 56-year-old woman with AGC without greater curvature invasion (U, Less, Type 2, 60 mm, tub2, pT3 [SS]) and LNM (#3). The model predicted a low probability of SHLN metastasis (p-mean = 0.025), consistent with the strong recommendation against splenectomy in the JGCA guidelines. c NonG-Case 3: A 79-year-old man with AGC without greater curvature invasion (U, Ant, Type 2, 100 mm, tub2 > por1, pT3 [SS]) and LNM (#1, #3, #7, #9, #11p). Despite a positive prediction by the Bayes-SHLNM model, the uncertainty in the posterior distribution suggests reconsideration of splenectomy, especially considering the advanced age of the patient.
a WithG-Case 1: A 69-year-old man with advanced gastric cancer (AGC) with greater curvature invasion (UML, Circ, Type 4, 210 mm, por2 > sig, pT4a [SE]) and lymph node metastasis (LNM) (#1, #2, #3, #4sa, #4sb, #5, #6, #7, #9, #11p, #11d). The Bayes-SHLNM model predicted a posterior probability distribution with a mean of 0.674 for SHLN metastasis, consistent with the recommendation for splenectomy in the JGCA guidelines. b WithG-Case 2: A 66-year-old woman with AGC with greater curvature invasion (U, Ant, Type 2, 45 mm, tub2, pT3 [SS]) and no LNM. The posterior distribution shows a mean probability of 0.023 for SHLN metastasis, conflicting with the weak recommendation for splenectomy in the JGCA guidelines. c WithG-Case 3: A 55-year-old man with AGC with greater curvature invasion (M, Gre, Type 5, 77 mm, por2 > tub2, pT2 [MP]) and LNM (#3, #4d). Although the Bayes-SHLNM model predicted “negative” for SHLN metastasis, the posterior distribution shows uncertainty with a mean of 0.085 and a probability range of 0–0.2 in the 95% highest density interval. This uncertainty suggests that splenectomy should be reconsidered.
However, Figs. 4c and 5c show two cases in which considering the uncertainty might not change the decision, because the predicted uncertainty is too large to be acceptable as a basis for clinical judgment. In such cases, clinicians can rely on the JGCA guidelines, the patient’s wishes, or institutional policies. One (Fig. 5c, WithG-Case 3) was a 55-year-old man with AGC, GCI (M, Gre, Type 5, 77 mm, por2 > tub2, pT2 [MP]), and LNM (#3, #4d). The Bayes-SHLNM model predicted “negative” but left room to consider treating the case as positive, because the mean of the posterior distribution was 0.075, with a 95% highest density interval (HDI) of 0–0.2. The other patient (Fig. 4c, NonG-Case 3) was a 79-year-old man with AGC without GCI (U, Ant, Type 2, 100 mm, tub2 > por1, pT3 [SS]) and LNM (#1, #3, #7, #9, #11p). The Bayes-SHLNM model predicted “positive,” but the uncertainty shown by the PPD left room to reconsider and not perform a splenectomy because of the patient’s age. All individual PPDs of the 5fCV models are shown in Supplementary Figure 1.
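The per-patient summaries used in this subsection (posterior mean and 95% HDI of the predicted probability) can be sketched as follows. This is a minimal illustration, not the study's implementation: the function names are hypothetical, and the "posterior draws" are random stand-ins generated only so that the example runs, not draws from the fitted Bayes-SHLNM model.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def posterior_risk_samples(
    x: Sequence[float],
    coef_draws: Sequence[Sequence[float]],
    intercept_draws: Sequence[float],
) -> List[float]:
    """One predicted probability per posterior draw of the parameters.
    The resulting list is a sample from the patient's PPD."""
    return [
        sigmoid(b0 + sum(b * xi for b, xi in zip(beta, x)))
        for beta, b0 in zip(coef_draws, intercept_draws)
    ]


def hdi(samples: Sequence[float], mass: float = 0.95) -> Tuple[float, float]:
    """Highest density interval: the narrowest interval that contains
    at least `mass` of the samples."""
    if not samples or not 0.0 < mass <= 1.0:
        raise ValueError("need samples and 0 < mass <= 1")
    s = sorted(samples)
    n = len(s)
    k = min(n, max(1, math.ceil(mass * n - 1e-9)))  # samples inside interval
    start = min(range(n - k + 1), key=lambda i: s[i + k - 1] - s[i])
    return s[start], s[start + k - 1]


# Stand-in posterior draws for a hypothetical 3-feature model (assumption:
# a real analysis would take these from the sampler output instead).
rng = random.Random(0)
n_draws = 2000
coef_draws = [[rng.gauss(1.0, 0.5), rng.gauss(0.0, 0.3), rng.gauss(0.8, 0.6)]
              for _ in range(n_draws)]
intercept_draws = [rng.gauss(-3.0, 0.4) for _ in range(n_draws)]

patient = [1.0, 0.0, 1.0]  # hypothetical encoded features for one patient
ppd = posterior_risk_samples(patient, coef_draws, intercept_draws)
ppd_mean = sum(ppd) / len(ppd)
ppd_low, ppd_high = hdi(ppd, 0.95)
```

A narrow interval around a low or high mean supports a clear decision, whereas a wide interval signals that the prediction alone should not drive the choice.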
Posterior distribution of the regression coefficient parameters
Figure 6 shows the posterior distributions of the 47 regression coefficients in the Bayes-SHLNM model trained using 593 cases. Notably, the 95% HDIs of both the #4sb and #4sa coefficients lay entirely above 0, suggesting a significant positive influence on the model. Tumor location in the greater curvature, tumor size, predominant histological por2, LNM #11d, and LNM #12a tended to be positive parameters.
This figure presents the posterior distributions of the 47 regression coefficients obtained from the Bayes-SHLNM model, which was trained on data from 593 cases. The regression coefficients indicate the impact of various clinical, tumor-related, histological, and lymph node metastasis (LNM) factors on the probability of SHLN metastasis in UGC. a Clinical features: Include variables, such as age, sex, and neoadjuvant chemotherapy. b Tumor location: Describes the anatomical site of the tumor, including the upper third, anterior wall, and involvement of the greater curvature. c Tumor features: • Macroscopic types (0–5): Different macroscopic types of the tumor. • Tumor size: Represents the maximum tumor diameter, standardized as Z-scores. • Pathological T stage: Refers to the depth of tumor invasion into the gastric wall, discretized into six values corresponding to T1a (1), T1b (2), T2 (3), T3 (4), T4a (5), and T4b (6). d Histological features: The histological subtypes are classified as follows: • Papillary adenocarcinoma (pap). • Tubular adenocarcinoma (tub): ○ (1) Well differentiated (tub1). ○ (2) Moderately differentiated (tub2). • Poorly differentiated adenocarcinoma (por): ○ (1) Solid type (por1). ○ (2) Non-solid type (por2). • Signet-ring cell carcinoma (sig). • Mucinous adenocarcinoma (muc). e LNM (#1–#12a): Represents LNM across various nodal stations. Both coefficients for #4sb and #4sa showed values exceeding 0 within the 95% highest density interval, indicating their significant positive influence on predicting SHLNM. Additionally, the location of the tumor in the greater curvature, tumor size (Z-score), non-solid type of poorly differentiated adenocarcinoma (por2), LNM #11d, and LNM #12a exhibited positive parameter values, suggesting a higher likelihood of SHLN metastasis in the presence of these factors.
Discussion
In the present study, we developed Bayesian models to predict SHLN metastasis in UGC using data from 593 patients with UGC who underwent total gastrectomy with splenectomy (TGS). To the best of our knowledge, this is the first report of a machine-learning model using Bayesian techniques for the prediction of SHLNM. The Bayes-SHLNM model performed comparably to the FLR model, with a mean ROC AUC of 0.83 and an F1 score of 0.3 by 5fCV. When UGC was divided into two categories based on whether the UGC had GCI, which has been widely used as an indication for SHLN dissection in Japan, the positive predictive values, or precisions, were 20% regardless of GCI, whereas the negative predictive values were 99% in UGCs without GCI and 91% in UGCs with GCI2. Moreover, the Bayes-SHLNM model provided PPDs that expressed the uncertainty in the prediction of individual cases. These results suggest that the Bayes-SHLNM model has the potential to help in clinical decision-making regarding whether SHLN dissection should be performed for UGC.
The Bayes-SHLNM model is comparable with the frequentist model. Previous models for predicting LNM in gastric cancer have focused on the decision between surgery and endoscopic resection using regional LN metastasis prediction based on a frequentist approach. The reported performance of these models was as follows: AUC, 0.69–0.94; F1, 0.29–0.31; precision, 8–21%; and negative predictive value, 99%34,35,36. Given the well-established benefits of gastric surgery, the false negative rate of LNM prediction models for early gastric cancer is unacceptable. However, in our study, considering the wide range of safety and feasibility of TGS in real-world data, the acceptable levels of false negatives and false positives differed for each cohort. In this study, we developed a prediction model with a level of predictive performance comparable with that of a frequentist model, together with a range of uncertainty for these predictions.
The strength of this model is that it provides a posterior probability density distribution for each case; that is, uncertainty is accounted for. The benefit of considering uncertainty in decision-making has been reported in the literature, especially for situations that may occur infrequently but have significant negative effects if overlooked in unbalanced and limited data43,44. This model provides an effective individual indicator that can be used to evaluate and discuss the pros and cons of performing the invasive treatment, SHLN dissection, with knowledge of the advantages of performing the dissection and the disadvantages of developing complications in the case.
In addition, incorporating uncertainty into decision-making has several practical benefits in clinical settings. First, uncertainty estimates allow clinicians to identify cases in which predictions are highly reliable versus those in which additional diagnostic testing or consultation may be needed. For example, if the model predicts SHLN metastasis with high certainty, it can streamline the decision to perform dissection. Conversely, when the model’s uncertainty is high, it highlights the need for further investigation or a more cautious approach.
Secondly, quantifying uncertainty facilitates better communication between clinicians and patients. By presenting predictions as probability distributions rather than definitive outcomes, clinicians can transparently discuss the risks and benefits, enabling shared decision-making and helping patients feel more informed and involved in their care. Incorporating these advantages into clinical workflows ensures that uncertainty not only informs decision-making but also enhances robustness, balancing the risks and benefits of invasive interventions such as SHLN dissection. We believe that this feature can serve as a powerful communication tool, fostering better collaboration among clinicians and between clinicians and patients. In situations with significant uncertainty, this method allows for the effective sharing of complex information and helps all parties understand the risks and also the probabilities involved. Through improved communication, clinicians and patients can work together to select informed and acceptable treatment options tailored to the unique circumstances of each case.
Furthermore, although the Bayesian model has demonstrated its ability to provide uncertainty estimates, comparisons with other machine learning models, such as random forests, remain an important direction for future work. Random forests and similar models can generate prediction intervals based on the variance in the data, providing insight into data uncertainty. However, these intervals primarily reflect data variability and do not account for model uncertainty, which is a critical component of clinical decision-making. By contrast, Bayesian methods provide a more comprehensive framework by quantifying both data and model uncertainty, which can be particularly valuable in situations with limited or imbalanced data.
Expanding this study to include comparisons with these models would provide a broader perspective on the strengths and limitations of Bayesian approaches. Such comparisons would require significant additional work in order to ensure fair and rigorous evaluations as well as thorough validation using external datasets. Although this is beyond the scope of this study, we acknowledge the value of this approach and suggest that it is an important avenue for future research. These efforts would help further establish the robustness and clinical utility of Bayesian models compared to other predictive modeling techniques.
Hyperparameter selection plays a critical role in both frequentist and Bayesian modeling approaches; however, the methodologies are fundamentally different. In frequentist models, hyperparameters such as regularization strength are treated as fixed values determined by cross-validation or optimization techniques. In our study, we used Optuna for efficient hyperparameter tuning in the frequentist logistic regression to ensure optimal model performance.
In contrast, Bayesian models treat hyperparameters as probabilistic variables and often assign prior distributions to directly incorporate uncertainty into the modeling process. In this study, several prior distributions of Bayesian model parameters were evaluated to assess their impact on predictive performance and uncertainty quantification. The horseshoe performed the best, demonstrating superior accuracy and robustness. The horseshoe prior is particularly advantageous because of its ability to handle sparsity and mitigate overfitting, making it a natural choice for our dataset, which has a moderate number of predictors relative to the sample size.
This distinction between the frequentist and Bayesian approaches highlights the flexibility and robustness of Bayesian models, particularly in contexts where quantifying uncertainty is critical. By incorporating hyperparameter uncertainty into posterior distributions, Bayesian models provide a more comprehensive framework for prediction and decision-making than frequentist methods that rely on fixed hyperparameter values.
We selected 47 parameters as the coefficients of logistic regression and examined the training models with four different prior distributions for regularization. The horseshoe prior model performed the best and exhibited the strongest regularization among the four models analyzed. This result can be attributed to correlations among various explanatory factors, such as tumor morphology, size, location, histology, and LNM sites. Because clinical decisions are usually based on a combination of all these factors, we decided not to omit categories arbitrarily, but to adjust the regularization to address overfitting and multicollinearity. Figure 6 shows the posterior distributions of the 47 parameters. Only for #4sa and #4sb did the 95% HDI exclude zero. These items are consistent with the factors identified in previous statistical methodologies. Previous studies have identified independent risk factors for SHLN metastasis, such as type 4 macroscopic type, larger and more deeply invading tumors, certain regional lymph node metastases (#4sa, #4sb, #7, or #11), and undifferentiated-type histology, which is consistent with our results45,46. In our study, the pathological T-factors and #7 were not significant, which can be explained by the exclusion of pStage IV, including #16LN and CY positivity, in our cohort.
For several parameters, the 95% uncertainty range included zero. During the experimental phase, we explored models that included selected parameter subsets. However, given the limited dataset used in our experiments, it was challenging to completely eliminate subjectivity in the feature selection methods. Therefore, we decided not to perform feature selection. Instead, we chose to include all 47 parameters and relied on the regularization provided by the horseshoe prior to address overfitting and multicollinearity. We believe that this approach provides a more comprehensive representation of the data and is more consistent with the multifactorial nature of clinical decision-making. We also recognize that feature selection may become a more viable approach as larger datasets become available for future studies. This could help further reduce multicollinearity and improve model interpretability without compromising predictive accuracy.
This study has some limitations. First, this was a retrospective study involving a certain patient group from a single high-volume institution. Following the publication of the JGCA guidelines in Japan, splenectomy with SHLN dissection was not performed in patients with GC without GCI. Consequently, a larger sample size is not expected in the future. Furthermore, the low rate of SHLN metastasis leads to imbalanced data, which results in the suboptimal performance of traditional machine learning methods. However, the Bayesian approach adopted in this study shows promise for addressing this issue. Second, the model was not validated using an external cohort. Therefore, further validation using a different cohort is required. Third, we could not verify the accuracy of the PPD. However, the most important factor is its usefulness in clinical decision making. Future prospective studies are required. In addition, integrating explainability into a model remains challenging. In general, deterministic models tend to align well with explainability. Incorporating explainability into a Bayesian model, which visualizes uncertainty, is an important future direction. Addressing this challenge could further enhance the clinical utility of the model and improve its practical adoption. Finally, the validity and interpretability of the uncertainty bounds provided by the Bayesian model remain unclear. To assess the accuracy of PPD and address potential model misspecifications, future research will require both prospective studies and also simulation-based analyses. These analyses examine the calibration of the Bayesian model under various scenarios, including cases involving misspecifications. In addition, validation using external datasets will provide further insight into the robustness of uncertainty estimates in different contexts. These efforts are critical for ensuring the reliability and practical applicability of the Bayesian approach in clinical settings.
In conclusion, the Bayes-SHLNM model demonstrates a performance equivalent to that of the traditional FLR while providing individual PPDs. It demonstrates potential contributions to decision-making processes and suggests promising prospects for personalized precision medicine.
Methods
Setting and ethical approval
All methods were performed in accordance with the ethical guidelines for medical and health research involving human subjects. Informed consent was obtained from all patients. This retrospective cohort study was approved by the Institutional Review Board of the National Cancer Center (2016-496, 2017-077). The study was conducted in accordance with the principles of the Declaration of Helsinki.
Datasets
In the present study, we used a cohort previously reported by Yura et al. 47 This study involved a retrospective review of the clinical records of 593 patients diagnosed with advanced gastric cancer classified as stages T2–T4. These stages are based on the depth of tumor invasion into the stomach wall: T2 indicates invasion into the muscle layer, T3 indicates invasion into the connective tissue beneath the outer membrane, and T4 indicates invasion into the membrane itself or adjacent structures. All of the patients had tumors located in the upper third of the stomach and underwent total gastrectomy (complete removal of the stomach) combined with splenectomy (removal of the spleen) and extensive lymph node dissection called D2 dissection. D2 dissection involves the removal of all the regional lymph nodes around the stomach, including those near the major blood vessels supplying the stomach. Surgeries were performed between January 2000 and December 2012 at the National Cancer Center Hospital in Japan. Importantly, all patients underwent curative surgery (referred to as R0 resection), indicating that no visible or microscopic tumors remained after surgery. Resected specimens were examined and evaluated according to the Japanese Classification of Gastric Carcinoma48,49.
Criteria for patient selection
The following criteria were used to select the study population:
Initial Pool: Patients who underwent gastrectomy with nodal dissection between January 2000 and December 2012
Inclusion Criteria:
Patients who underwent total gastrectomy with splenectomy for primary gastric cancer
Exclusion Criteria:
1. Patients diagnosed with pT0 or pT1 disease
2. Patients diagnosed with pStage IV disease (#16 LN metastasis, positive cytology)
3. Patients who underwent R1 or R2 resection
Criteria for variable selection
The explanatory variables for the model were selected from a comprehensive list of items reported to be clinically relevant for predicting SHLN metastasis in gastric cancer. The selected variables encompassed a range of clinical and pathological factors that could be predicted by preoperative examination. Clinical data included age, sex, whether neoadjuvant chemotherapy was administered, and whether the tumor invaded the greater curvature. Pathological data were classified according to the Japanese Classification of Gastric Carcinoma, including tumor location, cross-sectional area, macroscopic type, tumor size, predominant histological type, secondary predominant histological type, third predominant histological type, and metastasis to regional lymph nodes (numbers 1–12a) (Fig. 7). Common types were individually categorized, whereas special types were collectively classified as “special (sp)” due to their low incidence48,49.
Fig. 7 (caption): This figure is modified from reference56. Regional lymph node metastasis is evaluated based on diagnostic imaging findings, such as those from endoscopic ultrasound and CT.
Software and the basic structure of the model development
We designed our model using Python 3.10 and PyMC (version 5.9.2)50. The outcome variable, SHLN metastasis, was binary (0 for absence, 1 for presence). The explanatory variables were incorporated through logistic regression, and the outcome was assumed to follow a Bernoulli distribution. We selected a non-informative prior for our Bayesian model and applied four prior distributions for regularization: normal, Student’s t, Laplace, and horseshoe priors51. Details of the horseshoe prior are provided in Supplementary Figures 2, 3. Additionally, for performance benchmarking, the FLR model was developed using the scikit-learn module and tuned with the hyperparameter optimization framework “Optuna” (version 3.5.0)52,53.
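For orientation, the model structure described above can be written compactly as follows. This is a sketch in generic notation: the symbols $y_i$, $\mathbf{x}_i$, $\alpha$, $\boldsymbol{\beta}$, $\sigma$, $\nu$, $b$, $\lambda_j$, $\tau$, and $\tau_0$ are assumptions introduced here for illustration, the horseshoe prior is shown in its standard hierarchical form51, and the scale hyperparameters actually used are not stated in this section.

```latex
\begin{aligned}
y_i \mid p_i &\sim \operatorname{Bernoulli}(p_i), \qquad i = 1,\dots,n,\\
\operatorname{logit}(p_i) &= \alpha + \mathbf{x}_i^{\top}\boldsymbol{\beta},\\[4pt]
\text{Normal:}\quad \beta_j &\sim \mathcal{N}(0,\sigma^{2}),\\
\text{Student's } t\text{:}\quad \beta_j &\sim t_{\nu}(0,\sigma),\\
\text{Laplace:}\quad \beta_j &\sim \operatorname{Laplace}(0,b),\\
\text{Horseshoe:}\quad \beta_j \mid \lambda_j,\tau &\sim \mathcal{N}(0,\lambda_j^{2}\tau^{2}),\quad
\lambda_j \sim \mathrm{C}^{+}(0,1),\quad \tau \sim \mathrm{C}^{+}(0,\tau_0).
\end{aligned}
```

Here $y_i$ is the SHLN metastasis indicator for case $i$, $\mathbf{x}_i$ is its vector of standardized explanatory variables, and $\mathrm{C}^{+}$ denotes the half-Cauchy distribution.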
Selection and normalization of explanatory variables
Continuous variables, such as age, tumor size, and pathological T category, were standardized by applying Z-score normalization, which rescales each variable to a mean of 0 and a standard deviation of 1. Categorical variables were transformed by one-hot encoding into a format suitable for the model input. The one-hot encoded variables were then also standardized using Z-score normalization.
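The preprocessing described above can be illustrated with a minimal, standard-library-only sketch. The toy values, the variable names (`age`, `macroscopic_type`), and the use of the population standard deviation are assumptions made for illustration and are not taken from the study cohort or its code.

```python
from statistics import fmean, pstdev


def z_score(values):
    """Rescale a list of numbers to mean 0 and standard deviation 1."""
    mu = fmean(values)
    sd = pstdev(values)  # population standard deviation (an assumption)
    if sd == 0:
        raise ValueError("cannot standardize a constant column")
    return [(v - mu) / sd for v in values]


def one_hot(labels):
    """Return (categories, columns): one 0/1 indicator column per category."""
    categories = sorted(set(labels))
    columns = {c: [1.0 if lab == c else 0.0 for lab in labels] for c in categories}
    return categories, columns


# Hypothetical toy data, not taken from the study cohort.
age = [52, 61, 70, 78, 64]
macroscopic_type = ["type2", "type3", "type3", "type4", "type2"]

age_z = z_score(age)                                   # continuous variable
categories, indicators = one_hot(macroscopic_type)     # categorical variable
indicators_z = {c: z_score(col) for c, col in indicators.items()}
```

Standardizing the indicator columns as well puts every coefficient on a comparable scale, which matters when a single regularizing prior is shared across all coefficients.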
Sampling and inferred posterior probability distribution
In this study, PyMC was used to estimate the PPD of the Bayesian model. We employed the No-U-Turn Sampler algorithm to perform Markov Chain Monte Carlo (MCMC) sampling54. A total of 5000 samples were collected. The initial 2000 samples were discarded as burn-in to ensure a more accurate estimation of the distribution after convergence. Four independent chains were run to ensure sample diversity. The target acceptance rate was set to 0.99 to achieve efficient and accurate sampling.
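How sampler output of this kind is turned into a per-case distribution can be sketched as follows. The example does not run the No-U-Turn Sampler: `fake_chain` is a hypothetical stand-in that draws random numbers in place of real posterior draws, and it is assumed here (the text does not say) that the 5000 draws and 2000 burn-in draws are counted per chain. Only the burn-in removal, chain pooling, and inverse-logit steps are illustrated.

```python
import math
import random

N_CHAINS, N_DRAWS, N_BURN_IN = 4, 5000, 2000  # assumed to be per-chain counts


def fake_chain(seed, n_draws, n_coef):
    """Stand-in for sampler output: one (intercept, coefficients) pair per draw."""
    rng = random.Random(seed)
    return [
        (rng.gauss(-1.0, 0.3), [rng.gauss(0.5, 0.2) for _ in range(n_coef)])
        for _ in range(n_draws)
    ]


def pool_chains(chains, n_burn_in):
    """Drop the first n_burn_in draws of every chain, then concatenate the rest."""
    pooled = []
    for chain in chains:
        pooled.extend(chain[n_burn_in:])
    return pooled


def case_ppd(x, draws):
    """Distribution of the predicted probability for one case with covariates x."""
    ppd = []
    for intercept, coefs in draws:
        eta = intercept + sum(b * xi for b, xi in zip(coefs, x))
        ppd.append(1.0 / (1.0 + math.exp(-eta)))  # inverse logit
    return ppd


chains = [fake_chain(seed, N_DRAWS, n_coef=2) for seed in range(N_CHAINS)]
draws = pool_chains(chains, N_BURN_IN)
ppd = case_ppd([0.8, -1.2], draws)    # hypothetical standardized covariates
prediction = sum(ppd) / len(ppd)      # mean of the distribution as point prediction
```

Each retained draw yields one probability, so a single case is described by a whole distribution rather than one number.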
Evaluation of the model’s performance using the internal cross-validation method
To compare the model performance, we used a stratified five-fold cross-validation (5fCV) approach. This process involved creating five separate models, each tested on a different data subset to evaluate its performance. In the training datasets, after samples were drawn by MCMC sampling, the samples from all chains obtained from posterior sampling were combined for each case as a PPD. Their means were used as predictions and compared with the observed SHLNM55. For decision-making, the thresholds were calculated using the following procedure:
1. From the PPD of the training datasets for each fold, the mean was extracted as the predicted probability for each case.
2. These predicted probabilities were used to construct an ROC curve, and the Youden index was applied to determine the optimal threshold for classification. This threshold was used as the decision boundary and is referred to as the Yi threshold.
During testing, the models predicted outcomes, where a positive result was indicated if the mean posterior probability was above this threshold. These predictions were then compared with the actual outcomes, with the performance assessed using metrics, such as the ROC AUC, PRAUC, sensitivity, specificity, precision, and F1 score. This process was repeated for each of the five created datasets, and the resulting average of each value was calculated to assess the overall effectiveness of our model.
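The threshold selection and test-time scoring described above can be sketched in plain Python. The function names and toy values are hypothetical, and two details are assumptions: a case is called positive when its probability is at or above the threshold, and when several cut-offs tie on the Youden index the lowest one is kept.

```python
def youden_threshold(y_true, y_prob):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best_j, best_t = -1.0, None
    for t in sorted(set(y_prob)):  # every observed probability is a candidate
        tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= t)
        tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < t)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:  # strict comparison keeps the lowest tied cut-off
            best_j, best_t = j, t
    return best_t, best_j


def classification_metrics(y_true, y_prob, threshold):
    """Sensitivity, specificity, precision, and F1 at a fixed threshold."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for y, h in zip(y_true, y_pred) if y == 1 and h == 1)
    tn = sum(1 for y, h in zip(y_true, y_pred) if y == 0 and h == 0)
    fp = sum(1 for y, h in zip(y_true, y_pred) if y == 0 and h == 1)
    fn = sum(1 for y, h in zip(y_true, y_pred) if y == 1 and h == 0)
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    prec = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
    return {"sensitivity": sens, "specificity": spec, "precision": prec, "f1": f1}


def roc_auc(y_true, y_prob):
    """ROC AUC as the share of positive/negative pairs ranked correctly."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical mean posterior probabilities for one fold.
train_y = [0, 0, 0, 1, 0, 1, 1, 1]
train_p = [0.05, 0.10, 0.20, 0.30, 0.35, 0.60, 0.70, 0.90]
test_y = [0, 1, 0, 1, 1, 0]
test_p = [0.10, 0.40, 0.50, 0.80, 0.20, 0.25]

threshold, j_stat = youden_threshold(train_y, train_p)  # fitted on training data
metrics = classification_metrics(test_y, test_p, threshold)
auc = roc_auc(test_y, test_p)
```

The threshold is derived from the training fold only and then applied unchanged to the held-out fold, so the test metrics are not influenced by the test labels.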
In addition to our Bayesian model, FLR models were developed for benchmarking purposes. We applied the same model development, testing, and evaluation process to the FLR model as to our main model, allowing a direct comparison of how well each model predicted SHLN metastasis.
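The stratified five-fold split shared by both models can be sketched as below. This is an illustrative standard-library implementation, not the splitting routine used in the study; the function name `stratified_k_fold`, the seed, and the toy labels are assumptions.

```python
import random


def stratified_k_fold(y, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs with class proportions kept per fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)  # deal each class round-robin across folds
    for f in range(k):
        test_idx = sorted(folds[f])
        train_idx = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train_idx, test_idx


# Hypothetical labels: 10 positive and 40 negative cases.
y = [1] * 10 + [0] * 40
splits = list(stratified_k_fold(y, k=5))
```

Because positive cases are relatively rare, dealing each class separately prevents a fold from ending up with very few or no positives, which would make sensitivity and the ROC AUC unstable.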
Evaluation of the utility of posterior probability distribution
Uncertainty was assessed using individual PPDs, and the feasibility of the model for clinical implementation was examined. The 95% HDI range, mean, and median of the PPDs were calculated, and the PPDs were displayed as kernel density plots. Cases were visualized according to whether GCI was present, which is an important indicator for clinical decision-making according to the JGCA guideline criteria2. We present cases in which the model prediction itself could change the clinical decision, as well as cases in which the uncertainty of the prediction could aid decision-making.
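The summary statistics named above can be computed from the samples of a single PPD as in the following sketch. The `hdi` helper is a simple sorted-window implementation written for illustration, and the Beta-distributed toy samples are an assumption standing in for a real case-level PPD.

```python
import math
import random
from statistics import fmean, median


def hdi(samples, mass=0.95):
    """Narrowest interval that contains a fraction `mass` of the samples."""
    s = sorted(samples)
    n = len(s)
    k = math.ceil(mass * n)  # number of samples the interval must cover
    if k < 2 or k > n:
        raise ValueError("not enough samples for the requested mass")
    # Slide a window of k consecutive sorted samples and keep the narrowest.
    start = min(range(n - k + 1), key=lambda i: s[i + k - 1] - s[i])
    return s[start], s[start + k - 1]


# Hypothetical skewed PPD for one case (stand-in for pooled posterior samples).
rng = random.Random(42)
ppd = [rng.betavariate(2, 8) for _ in range(4000)]

lower, upper = hdi(ppd, mass=0.95)
summary = {
    "mean": fmean(ppd),
    "median": median(ppd),
    "hdi_95": (lower, upper),
    "hdi_width": upper - lower,
}
```

Two cases with the same mean can have very different HDI widths, and it is this width that conveys how much confidence the point prediction deserves.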
Uncertainty evaluation of Bayesian regression coefficients
Using all the training cohorts, we developed a final Bayesian logistic regression model and examined the posterior distributions of the parameters using a 95% HDI. Coefficients whose 95% HDI did not include zero were defined as significant.
Data availability
The data used in this study are not publicly available due to patient privacy concerns but are available from the corresponding author upon reasonable request.
Code availability
Code is available upon request from the corresponding authors.
References
Sung, H. et al. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J. Clin. 71, 209–249, https://doi.org/10.3322/caac.21660 (2021).
Japanese Gastric Cancer Association. Japanese Gastric Cancer Treatment Guidelines 2021 (6th edition). Gastric Cancer 26, 1–25, https://doi.org/10.1007/s10120-022-01331-8 (2023).
Sasako, M. et al. D2 lymphadenectomy alone or with para-aortic nodal dissection for gastric cancer. N. Engl. J. Med. 359, 453–462, https://doi.org/10.1056/NEJMoa0707035 (2008).
Sasako, M. et al. Left thoracoabdominal approach versus abdominal-transhiatal approach for gastric cancer of the cardia or subcardia: a randomised controlled trial. Lancet Oncol. 7, 644–651, https://doi.org/10.1016/S1470-2045(06)70766-5 (2006).
Kurokawa, Y. et al. Bursectomy versus omentectomy alone for resectable gastric cancer (JCOG1001): a phase 3, open-label, randomised controlled trial. Lancet Gastroenterol. Hepatol. 3, 460–468, https://doi.org/10.1016/s2468-1253(18)30090-6 (2018).
Sano, T. et al. Randomized Controlled Trial to Evaluate Splenectomy in Total Gastrectomy for Proximal Gastric Carcinoma. Ann. Surg. 265, 277–283, https://doi.org/10.1097/sla.0000000000001814 (2017).
Cuschieri, A. et al. Postoperative morbidity and mortality after D1 and D2 resections for gastric cancer: preliminary results of the MRC randomised controlled surgical trial. Lancet 347, 995–999, https://doi.org/10.1016/s0140-6736(96)90144-0 (1996).
Sasada, S. et al. Frequency of lymph node metastasis to the splenic hilus and effect of splenectomy in proximal gastric cancer. Anticancer Res. 29, 3347–3351 (2009).
Kunisaki, C. et al. Impact of splenectomy in patients with gastric adenocarcinoma of the cardia. J. Gastrointest. Surg. 11, 1039–1044, https://doi.org/10.1007/s11605-007-0186-z (2007).
Zhu, G. L. et al. Splenic hilar lymph node metastasis independently predicts poor survival for patients with gastric cancers in the upper and/or the middle third of the stomach. J. Surg. Oncol. 105, 786–792, https://doi.org/10.1002/jso.22149 (2012).
Ishikawa, S. et al. Pattern of lymph node involvement in proximal gastric cancer. World J. Surg. 33, 1687–1692, https://doi.org/10.1007/s00268-009-0083-6 (2009).
Huang, C. M. et al. A 346 case analysis for laparoscopic spleen-preserving no.10 lymph node dissection for proximal gastric cancer: a single center study. PLoS One 9, e108480, https://doi.org/10.1371/journal.pone.0108480 (2014).
Shin, S. H. et al. Clinical significance of splenic hilar lymph node metastasis in proximal gastric cancer. Ann. Surg. Oncol. 16, 1304–1309, https://doi.org/10.1245/s10434-009-0389-5 (2009).
Nishino, M. et al. Possible candidates for splenic hilar nodal dissection among patients with upper advanced gastric cancer without invasion of the greater curvature. Gastric Cancer 26, 460–466, https://doi.org/10.1007/s10120-023-01370-9 (2023).
Galizia, G. et al. Modified versus standard D2 lymphadenectomy in total gastrectomy for nonjunctional gastric carcinoma with lymph node metastasis. Surgery 157, 285–296, https://doi.org/10.1016/j.surg.2014.09.012 (2015).
Csendes, A. et al. A prospective randomized study comparing D2 total gastrectomy versus D2 total gastrectomy plus splenectomy in 187 patients with gastric carcinoma. Surgery 131, 401–407, https://doi.org/10.1067/msy.2002.121891 (2002).
Bonenkamp, J. J. et al. Randomised comparison of morbidity after D1 and D2 dissection for gastric cancer in 996 Dutch patients. Lancet 345, 745–748, https://doi.org/10.1016/s0140-6736(95)90637-1 (1995).
Kodera, Y. et al. Identification of risk factors for the development of complications following extended and superextended lymphadenectomies for gastric cancer. Br. J. Surg. 92, 1103–1109, https://doi.org/10.1002/bjs.4979 (2005).
Otsuji, E., Yamaguchi, T., Sawai, K., Ohara, M. & Takahashi, T. End results of simultaneous splenectomy in patients undergoing total gastrectomy for gastric carcinoma. Surgery 120, 40–44, https://doi.org/10.1016/s0039-6060(96)80239-x (1996).
Kinoshita, T. et al. Laparoscopic splenic hilar lymph node dissection for proximal gastric cancer using integrated three-dimensional anatomic simulation software. Surg. Endosc. 30, 2613–2619, https://doi.org/10.1007/s00464-015-4511-4 (2016).
Kinoshita, T. & Okayama, T. Is splenic hilar lymph node dissection necessary for proximal gastric cancer surgery? Ann. Gastroenterol. Surg. 5, 173–182, https://doi.org/10.1002/ags3.12413 (2021).
Feng, M. A. et al. Geriatric assessment in surgical oncology: a systematic review. J. Surg. Res 193, 265–272, https://doi.org/10.1016/j.jss.2014.07.004 (2015).
Huisman, M. G., Kok, M., de Bock, G. H. & van Leeuwen, B. L. Delivering tailored surgery to older cancer patients: Preoperative geriatric assessment domains and screening tools – A systematic review of systematic reviews. Eur. J. Surgical Oncol. (EJSO) 43, 1–14, https://doi.org/10.1016/j.ejso.2016.06.003 (2017).
Puts, M. T. et al. An update on a systematic review of the use of geriatric assessment for older adults in oncology. Ann. Oncol. 25, 307–315, https://doi.org/10.1093/annonc/mdt386 (2014).
Hamamoto, R. et al. Application of Artificial Intelligence Technology in Oncology: Towards the Establishment of Precision Medicine. Cancers (Basel) 12, 3532, https://doi.org/10.3390/cancers12123532 (2020).
Yamada, M. et al. Development of a real-time endoscopic image diagnosis support system using deep learning technology in colonoscopy. Sci. Rep. 9, 14465, https://doi.org/10.1038/s41598-019-50567-5 (2019).
Jinnai, S. et al. The Development of a Skin Cancer Classification System for Pigmented Skin Lesions Using Deep Learning. Biomolecules 10, 1123, https://doi.org/10.3390/biom10081123 (2020).
Hamamoto, R. et al. Introducing AI to the molecular tumor board: one direction toward the establishment of precision medicine using large-scale cancer clinical and biological information. Exp. Hematol. Oncol. 11, 82, https://doi.org/10.1186/s40164-022-00333-7 (2022).
Asada, K. et al. Uncovering Prognosis-Related Genes and Pathways by Multi-Omics Analysis in Lung Cancer. Biomolecules 10, 524, https://doi.org/10.3390/biom10040524 (2020).
Kobayashi, K., Miyake, M., Takahashi, M. & Hamamoto, R. Observing deep radiomics for the classification of glioma grades. Sci. Rep. 11, 10942, https://doi.org/10.1038/s41598-021-90555-2 (2021).
Asada, K. et al. Integrated Analysis of Whole Genome and Epigenome Data Using Machine Learning Technology: Toward the Establishment of Precision Oncology. Front Oncol 11, 666937, https://doi.org/10.3389/fonc.2021.666937 (2021).
Takahashi, S. et al. A New Era of Neuro-Oncology Research Pioneered by Multi-Omics Analysis and Machine Learning. Biomolecules 11, 565, https://doi.org/10.3390/biom11040565 (2021).
Kawaguchi, R. K. et al. Assessing Versatile Machine Learning Models for Glioma Radiogenomic Studies across Hospitals. Cancers (Basel) 13, 3611, https://doi.org/10.3390/cancers13143611 (2021).
Hayashi, T. et al. A discrimination model by machine learning to avoid gastrectomy for early gastric cancer. Ann. Gastroenterological Surg. 7, 913–921, https://doi.org/10.1002/ags3.12714 (2023).
Zhu, H. et al. Preoperative prediction for lymph node metastasis in early gastric cancer by interpretable machine learning models: A multicenter study. Surgery 171, 1543–1551, https://doi.org/10.1016/j.surg.2021.12.015 (2022).
Lee, H. D. et al. Development and Validation of Models to Predict Lymph Node Metastasis in Early Gastric Cancer Using Logistic Regression and Gradient Boosting Machine Methods. Cancer Res Treat. 55, 1240–1249, https://doi.org/10.4143/crt.2022.1330 (2023).
Zhang, A. Q. et al. Computed tomography-based deep-learning prediction of lymph node metastasis risk in locally advanced gastric cancer. Front Oncol 12, 969707, https://doi.org/10.3389/fonc.2022.969707 (2022).
Dong, D. et al. Deep learning radiomic nomogram can predict the number of lymph node metastasis in locally advanced gastric cancer: an international multicenter study. Annals of Oncology 31, 912–920, https://doi.org/10.1016/j.annonc.2020.04.003 (2020).
Lu, T. et al. Comparison of Machine Learning and Logic Regression Algorithms for Predicting Lymph Node Metastasis in Patients with Gastric Cancer: A two-Center Study. Technology in Cancer Research & Treatment 23, https://doi.org/10.1177/15330338231222331 (2024).
HajiEsmailPoor, Z., Tabnak, P., Baradaran, B., Pashazadeh, F. & Aghebati-Maleki, L. Diagnostic performance of CT scan–based radiomics for prediction of lymph node metastasis in gastric cancer: a systematic review and meta-analysis. Frontiers in Oncology 13, https://doi.org/10.3389/fonc.2023.1185663 (2023).
Giovagnoli, A. The Bayesian Design of Adaptive Clinical Trials. Int J Environ Res Public Health 18, https://doi.org/10.3390/ijerph18020530 (2021).
Ashby, D. Bayesian statistics in medicine: a 25 year review. Stat. Med 25, 3589–3631, https://doi.org/10.1002/sim.2672 (2006).
Troiani, J. S. & Carlin, B. P. Comparison of Bayesian, classical, and heuristic approaches in identifying acute disease events in lung transplant recipients. Stat. Med 23, 803–824, https://doi.org/10.1002/sim.1651 (2004).
Fanconi, C., de Hond, A., Peterson, D., Capodici, A. & Hernandez-Boussard, T. A Bayesian approach to predictive uncertainty in chemotherapy patients at risk of acute care utilization. EBioMedicine 92, 104632, https://doi.org/10.1016/j.ebiom.2023.104632 (2023).
Li, P. et al. Laparoscopic spleen-preserving splenic hilar lymphadenectomy in 108 consecutive patients with upper gastric cancer. World J. Gastroenterol. 20, 11376–11383, https://doi.org/10.3748/wjg.v20.i32.11376 (2014).
Aoyagi, K. et al. Prognosis of metastatic splenic hilum lymph node in patients with gastric cancer after total gastrectomy and splenectomy. World J. Hepatol. 2, 81–86, https://doi.org/10.4254/wjh.v2.i2.81 (2010).
Yura, M. et al. The Therapeutic Survival Benefit of Splenic Hilar Nodal Dissection for Advanced Proximal Gastric Cancer Invading the Greater Curvature. Ann. Surgical Oncol. 26, 829–835, https://doi.org/10.1245/s10434-018-07122-9 (2018).
Japanese Gastric Cancer Association. Japanese classification of gastric carcinoma: 3rd English edition. Gastric Cancer 14, 101–112, https://doi.org/10.1007/s10120-011-0041-5 (2011).
Nakamura, T. et al. History of the lymph node numbering system in the Japanese Classification of Gastric Carcinoma since 1962. Surg. Today 52, 1515–1523, https://doi.org/10.1007/s00595-021-02395-2 (2021).
Abril-Pla, O. et al. PyMC: a modern, and comprehensive probabilistic programming framework in Python. PeerJ Computer Sci. 9, e1516, https://doi.org/10.7717/peerj-cs.1516 (2023).
Carvalho, C. M., Polson, N. G. & Scott, J. G. Handling sparsity via the horseshoe. Artificial intelligence and statistics, 73-80 (2009).
Akiba, T., Sano, S., Yanase, T., Ohta, T. & Koyama, M. Optuna: A Next-generation Hyperparameter Optimization Framework. 2623–2631, https://doi.org/10.1145/3292500.3330701 (2019).
Pedregosa, F. et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Hoffman, M. D. & Gelman, A. The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. J. Mach. Learn. Res. 15, 1593–1623 (2014).
Youden, W. J. Index for rating diagnostic tests. Cancer 3, 32–35 (1950).
Japanese Gastric Cancer Association. Japanese Classification of Gastric Carcinoma - 2nd English Edition. Gastric Cancer 1, 10–24, https://doi.org/10.1007/s101209800016 (1998).
Acknowledgements
We thank all members of R. Hamamoto’s laboratory for providing valuable advice and a comfortable environment.
Author information
Contributions
K.I. was responsible for conceptualization, methodology, investigation, data curation, formal analysis, and writing of the original draft. S.T. was responsible for the conceptualization, methodology, formal analysis, and writing (review and editing). N.K. was responsible for the methodology and writing (review and editing). K. Takasawa was responsible for the conceptualization, methodology, formal analysis, and writing (review and editing). K. Takeda was responsible for the methodology, formal analysis, and writing (review and editing). K.M. was responsible for the methodology and writing (review and editing). M.N. was responsible for the methodology, data curation, and writing (review and editing). T.H. was responsible for the methodology, data curation, and writing (review and editing). Y.Y. was responsible for the methodology, data curation, and writing (review and editing). S.M. was responsible for the methodology and writing (review and editing). T.Y. was responsible for the methodology, data curation, and writing (review and editing). R.H. was responsible for the funding acquisition, conceptualization, methodology, and writing (original draft). All authors confirm that they have full access to all data in the study and accept responsibility for the submission for publication. All the authors have read and approved the final version of this manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Ishizu, K., Takahashi, S., Kouno, N. et al. Establishment of a machine learning model for predicting splenic hilar lymph node metastasis. npj Digit. Med. 8, 93 (2025). https://doi.org/10.1038/s41746-025-01480-x
DOI: https://doi.org/10.1038/s41746-025-01480-x