Introduction

Gastric cancer ranks fifth in incidence and fourth in cancer-related mortality worldwide1. The principle of curative treatment for gastric cancer is surgical resection with appropriate regional lymphadenectomy, because excessive resection causes high morbidity and shortens life expectancy2,3,4,5,6,7. According to the Japanese Classification of Gastric Carcinoma, the splenic hilar lymph node (SHLN) is a regional lymph node for gastric cancer in the upper third of the stomach. The reported incidence of SHLN metastasis ranges from 2.8% to 27.9%8,9,10,11,12,13. Although a Japanese phase III trial comparing splenectomy with spleen preservation demonstrated non-inferior survival, that trial was limited to advanced gastric cancer of the upper stomach that did not invade the greater curvature6. Because gastric cancer invading the greater curvature shows a high frequency of SHLN metastasis and a high therapeutic value index for SHLN dissection, the Japanese Gastric Cancer Treatment Guidelines recommend splenic hilar nodal dissection for these tumors2. Moreover, our previous retrospective study showed a high frequency of SHLN metastasis and a high therapeutic value index even for gastric cancer without invasion of the greater curvature, particularly when the tumor was located on the posterior wall and was of undifferentiated histology14. Thus, SHLN dissection with splenectomy is widely performed in Japan. However, splenectomy has several disadvantages, particularly a high rate of postoperative morbidity (approximately 20–30%), which has been reported to offset the survival benefit in certain cases in randomized controlled trials and retrospective studies15,16,17,18,19. Developing and selecting an appropriate surgical approach based on the oncological status of the cancer has therefore become increasingly important20,21. Furthermore, uniform treatment strategies typically applied to otherwise healthy patients are no longer feasible for patients with complex comorbidities or significant frailty22,23,24.
However, tools and definitive criteria to support this decision-making are still lacking. Surgeons and other healthcare staff choose the most appropriate operation for each patient based on the oncological profile and background of that patient. This complex decision-making process highlights the urgent need for tools that visualize critical information and facilitate its sharing, thereby enhancing communication and enabling comprehensive decision-making in healthcare settings.

Machine learning techniques have advanced rapidly in recent years and are being implemented in medical oncology25,26,27,28,29,30,31,32,33. A key feature of machine learning is its data-driven nature: algorithms analyze large amounts of data to discover patterns and rules. Another key feature is adaptability, which allows models to be updated and their accuracy improved as new data become available. These features make machine learning a powerful tool for complex medical challenges in which data-driven insights and adaptability are critical for improving predictive performance and tailoring clinical decisions. Various machine learning models for predicting lymph node metastasis (LNM) in gastric cancer have been developed, mainly focusing on eligibility for endoscopic resection in early gastric cancer and on the prognosis of advanced gastric cancer (AGC)34,35,36. However, no models have been developed to inform surgical planning or the extent of lymph node dissection37,38,39,40. Traditional machine learning models based on frequentist approaches fall short of clinical requirements because they do not quantify the uncertainty of their predictions; consequently, even high-performing models rarely change clinical decision-making. The Bayesian approach allows prior information and beliefs to be incorporated into statistical inference, thus providing a powerful tool for decision-making under uncertainty. Unlike traditional frequentist methods, the Bayesian framework offers intuitive interpretability by expressing results as probabilities. For example, clinicians can readily understand a statement such as “there is a 95% probability that the odds ratio is between 2.0 and 3.0,” which directly conveys the uncertainty associated with an estimate.
This probabilistic interpretation makes the Bayesian approach particularly useful for complex clinical decisions in which actionable insights must account for uncertainty. Another key advantage of Bayesian methods is that they quantify uncertainty through the posterior distributions of model parameters, thereby providing a comprehensive picture of prediction confidence. Bayesian methods are also inherently adaptive: the posterior obtained from current data can serve as the prior when new data become available. By leveraging these strengths (intuitive interpretability, robust uncertainty quantification, and adaptability), the Bayesian approach offers a distinct advantage over traditional methods in addressing the challenges of clinical decision-making. In the medical field, Bayesian inference has proven effective in various settings, including diagnosis, treatment planning, and epidemiological studies41,42,43. In this context, uncertainty may play an important role in determining the need for SHLN dissection.
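
The adaptivity described above can be illustrated with the simplest conjugate model. The sketch below is not the Bayes-SHLNM logistic regression: it tracks a single metastasis rate with a Beta prior, and the flat Beta(1, 1) prior and the three batches of counts are hypothetical values chosen for illustration. It shows that updating batch by batch, with each posterior serving as the next prior, gives exactly the same posterior as analysing all the data at once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Beta(alpha, beta) belief about a single metastasis rate."""

    alpha: float
    beta: float

    def update(self, positives: int, negatives: int) -> "BetaBelief":
        # Conjugacy: a Beta prior combined with binomial data gives a Beta posterior.
        return BetaBelief(self.alpha + positives, self.beta + negatives)

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


# Hypothetical batches of (SHLN-positive, SHLN-negative) patients arriving over time.
batches = [(3, 47), (5, 45), (2, 48)]
prior = BetaBelief(1.0, 1.0)  # flat prior (an assumption for illustration)

# Sequential updating: each posterior becomes the prior for the next batch.
sequential = prior
for positives, negatives in batches:
    sequential = sequential.update(positives, negatives)

# Batch updating: all data analysed at once.
pooled = prior.update(sum(p for p, _ in batches), sum(n for _, n in batches))

print(sequential, round(sequential.mean, 4))
```

With a richer model such as a logistic regression, the posterior is no longer available in closed form, but the same principle applies.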

In this study, we developed a Bayesian logistic regression model, the Bayesian prediction of SHLN metastasis (Bayes-SHLNM), to predict SHLN metastasis using data from patients who underwent total gastrectomy with splenectomy between 2000 and 2012. The primary distinction between Bayesian and frequentist modeling lies in how they handle uncertainty and parameter estimation. Frequentist models, such as traditional logistic regression (frequentist logistic regression [FLR]), provide single-point parameter estimates, leading to a single predictive model. By contrast, Bayesian methods treat parameters as random variables with distributions, resulting in a collection of models rather than a single model (Fig. 1). This ensemble captures the uncertainty inherent in the parameter estimates. The key advantage of the Bayesian approach is that it yields a posterior probability distribution (PPD) of the SHLN metastasis risk, which offers a richer view of predictive uncertainty than the point estimates provided by frequentist models. The Bayesian models were benchmarked against the FLR model to evaluate their performance. The PPD of each prediction was visualized to show the uncertainty and range of possible outcomes, which can be crucial for clinical decision-making in high-risk, uncertain scenarios.
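
The "collection of models" can be made concrete as follows. Each posterior draw of the coefficient vector defines one logistic regression model; applying every draw to the same patient yields a sample from that patient's distribution of predicted probability. In the minimal sketch below, hypothetical independent Gaussian draws for an intercept and two standardized features stand in for real sampler output; the coefficient values and the patient's features are illustrative only.

```python
import math
import random
import statistics


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


rng = random.Random(0)

# Hypothetical posterior draws of (intercept, beta_1, beta_2). In a real analysis
# these would come from an MCMC sampler, not from independent Gaussians.
posterior_draws = [
    (rng.gauss(-2.5, 0.3), rng.gauss(0.8, 0.25), rng.gauss(0.4, 0.3))
    for _ in range(4000)
]


def predictive_probabilities(x1: float, x2: float) -> list[float]:
    """Apply every sampled model to one patient: one probability per draw."""
    return [sigmoid(b0 + b1 * x1 + b2 * x2) for b0, b1, b2 in posterior_draws]


# A hypothetical patient with two standardized features.
probs = predictive_probabilities(x1=1.2, x2=-0.5)
mean_p = statistics.fmean(probs)
sd_p = statistics.stdev(probs)
print(f"posterior mean = {mean_p:.3f}, posterior SD = {sd_p:.3f}")
```

A frequentist fit would collapse this to a single number; a histogram of `probs` is a per-patient distribution of the kind visualized for individual cases.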

Fig. 1: Comparison of frequentist and Bayesian (Bayes-splenic hilum lymph node metastasis [SHLNM]) models in predicting metastasis probability for SHLN dissection in upper gastrointestinal cancer.

This figure illustrates the different approaches of frequentist and Bayesian logistic regression models in predicting the probability of SHLN metastasis in patients who underwent total gastrectomy with splenectomy. The frequentist model (top pathway) uses clinical, tumor, lymph node location, and pathological information to generate a single-point probability estimate (e.g., 72%) for metastasis. In contrast, the Bayesian model (bottom pathway) incorporates the same data sources but outputs a posterior probability distribution (PPD), offering a more comprehensive view of the uncertainty associated with the prediction. The PPD allows for visualization of the range of possible outcomes and highlights the degree of uncertainty, which is critical for informed clinical decision-making in scenarios with high risk and uncertainty.

In this study, we aimed to develop a predictive model that supports decision-making by quantifying prediction uncertainty through Bayesian inference, and to visualize the individual PPD of SHLN metastasis in upper gastrointestinal cancer (UGC) based on clinicopathological characteristics.

Results

Study population selection process

The patient selection flowchart is shown in Fig. 2. Between January 2000 and December 2012, a total of 5957 patients underwent gastrectomy with nodal dissection. Of these, 798 patients underwent total gastrectomy with splenectomy for primary gastric cancer. In total, 35 patients diagnosed with pT0 or pT1, 169 patients diagnosed with pStage IV, and 1 patient who underwent R1 or R2 resection were excluded. The final study population comprised 593 patients.

Fig. 2: Study population selection process.

Flowchart depicting the selection process for the study population. From January 2000 to December 2012, a total of 5957 patients underwent gastrectomy with nodal dissection. Of these, 798 patients underwent total gastrectomy with splenectomy for primary gastric cancer. Exclusions were made for 35 patients diagnosed with pT0 or pT1 disease, 169 patients diagnosed with pStage IV disease, and 1 patient who underwent R1 or R2 resection. The final study population consisted of 593 patients.

Baseline characteristics

The study cohort comprised 593 patients. Most patients were male, and 15.2% received neoadjuvant chemotherapy. Half of the tumors predominantly involved the upper gastric body, and 35.8% had greater curvature invasion (GCI). The most common predominant histological type (histology 1) was poorly differentiated adenocarcinoma, non-solid type (por2), and signet-ring cell carcinoma (sig) was the most common secondary histological component (histology 2). SHLN (#10) metastasis, the prediction target, was present in 8.1% of patients. LNM was most frequent along the lesser curvature (#1, #3). LNMs at stations adjacent to the SHLN (#4sb, #4d, #11d) were found in 7.3–12% of patients. The cohort characteristics are summarized in Table 1.

Table 1 Patient cohort characteristics for training datasets (n = 593)

Comparison of the performance of the Bayes-SHLNM and frequentist logistic regression models

Table 2 shows the 5-fold cross-validation (5fCV) performance of the Bayesian and FLR models. Among the four Bayesian models, the Bayes-SHLNM model performed best in terms of the receiver operating characteristic area under the curve (ROC AUC, 0.83), precision-recall AUC (PRAUC, 0.35), and F1 score (0.31), and its results were comparable with those of the FLR model (Fig. 3). These findings indicate that the Bayes-SHLNM model is a viable alternative to the FLR model.
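
The metrics above are computed on each held-out fold and then averaged across the five folds. The sketch below shows one way to compute them for a single fold; it is not the study's evaluation code. Two assumptions are flagged: PRAUC is estimated here by average precision (one common step-wise estimator; trapezoidal integration of the precision-recall curve is another), and the 0.5 threshold used for F1 is hypothetical, because the operating threshold is not stated in this section. The labels and scores are invented.

```python
def roc_auc(y_true, scores):
    """ROC AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve (ties broken by sort order)."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / sum(y_true)


def f1_at(y_true, scores, threshold=0.5):
    """F1 score after dichotomizing scores at a (hypothetical) threshold."""
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


# Toy fold: labels and predicted probabilities are invented.
y_toy = [0, 0, 1, 0, 1, 0, 0, 0, 1, 0]
scores_toy = [0.10, 0.40, 0.35, 0.05, 0.80, 0.20, 0.60, 0.15, 0.55, 0.30]
print(roc_auc(y_toy, scores_toy), average_precision(y_toy, scores_toy), f1_at(y_toy, scores_toy))
```

With a positive rate of about 8%, PRAUC is far more sensitive to false positives than ROC AUC, which is consistent with the gap between the two values reported above.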

Fig. 3: Performance of four Bayesian models (Bayes-splenic hilum lymph node metastasis [SHLNM], Bayesian least absolute shrinkage and selection operator, basic, and Student-T models) and one frequentist model (frequentist logistic regression model) based on the average results from 5-fold cross-validation, along with the 95% confidence intervals.

a Receiver operating characteristic (ROC) curves plot the true positive rate (sensitivity) against the false positive rate for each model. The Bayes-SHLNM model achieves the highest AUC of 0.83 [95% confidence interval (CI), 0.74–0.91]. The shaded areas around each curve represent the 95% CIs. b Precision-recall curves evaluate model performance with respect to precision and recall. The Bayes-SHLNM model performed best among the four Bayesian models, with an AUC of 0.35 [95% CI, 0.14–0.56], comparable with that of the FLR model (AUC = 0.37 [95% CI, 0.13–0.61]). The shaded regions represent the 95% CIs.

Table 2 Performance of 5-fold cross-validation for the Bayesian logistic regression and FLR models

The predictions of the Bayes-SHLNM and FLR models obtained from 5fCV are summarized in Table 3 and Supplementary Table 1. When the tumors were divided into two categories according to the presence of GCI, the currently recommended indication for SHLN dissection, the precision (positive predictive value) of the models was approximately 20% in both categories, whereas the negative predictive value for tumors without GCI was 99%. The results of Bayes-SHLNM and FLR were similar; for cases without GCI, Bayes-SHLNM performed slightly better, whereas for cases with GCI, FLR demonstrated slightly higher accuracy.
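
The subgroup figures above come from stratifying the confusion matrix by GCI status: precision (positive predictive value) is TP / (TP + FP) and negative predictive value is TN / (TN + FN), each computed within a stratum. The sketch below assumes out-of-fold predictions have already been dichotomized; the records are hypothetical and do not reproduce Table 3.

```python
from collections import defaultdict


def subgroup_ppv_npv(records):
    """records: iterable of (subgroup, y_true, y_pred) with binary 0/1 labels."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for group, y, pred in records:
        if pred:
            counts[group]["tp" if y else "fp"] += 1
        else:
            counts[group]["fn" if y else "tn"] += 1
    summary = {}
    for group, c in counts.items():
        called_pos = c["tp"] + c["fp"]
        called_neg = c["tn"] + c["fn"]
        summary[group] = {
            "ppv": c["tp"] / called_pos if called_pos else float("nan"),  # precision
            "npv": c["tn"] / called_neg if called_neg else float("nan"),
        }
    return summary


# Hypothetical out-of-fold predictions: (GCI status, true SHLN status, predicted status).
records = [
    ("GCI", 1, 1), ("GCI", 0, 1), ("GCI", 0, 1), ("GCI", 0, 0), ("GCI", 1, 0),
    ("no GCI", 1, 1), ("no GCI", 0, 1), ("no GCI", 0, 0), ("no GCI", 0, 0),
]
summary = subgroup_ppv_npv(records)
print(summary)
```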

Table 3 Prediction results of the Bayes-SHLNM model for AGC with or without GCI

The Bayes-SHLNM model posterior probability distribution for individual patients

Six representative cases are shown in Figs. 4 and 5, which display individual posterior distributions of the probability of SHLN metastasis inferred using the Bayes-SHLNM model, conveying both the prediction and its uncertainty. Non-GCI (NonG)-Case 1 (Fig. 4a) was a 61-year-old man with AGC without GCI (U, Post, Type 1, 60 mm, por2 > sig, pT4a [SE]) and LNM (#1, #3, #4sa, #4sb, #11d). According to the Japanese Gastric Cancer Association (JGCA) guidelines (version 6), SHLN dissection, including splenectomy, is strongly discouraged for tumors that do not invade the greater curvature line. However, this model provides an opportunity to reconsider whether SHLN dissection is warranted for such a patient. WithG-Case 1 (Fig. 5a) was a 69-year-old man with AGC with GCI (UML, Circ, Type 4, 210 mm, por2 > sig, pT4a [SE]) and LNM (#1, #2, #3, #4sa, #4sb, #5, #6, #7, #9, #11p, #11d). NonG-Case 2 (Fig. 4b) was a 56-year-old woman with AGC without GCI (U, Less, Type 2, 60 mm, tub2, pT3 [SS]) and LNM (#3). The posterior distributions of these two cases support decisions based on the JGCA guidelines. For tumors that invade the greater curvature line, SHLN dissection, including splenectomy, is weakly recommended. WithG-Case 2 (Fig. 5b) was a 66-year-old woman with AGC with GCI (U, Ant, Type 2, 45 mm, tub2, pT3 [SS]) and no LNM. The posterior distribution of this case supports reconsidering the guideline-based decision. Thus, the model can provide useful information for reaching a consensus on whether a patient should undergo splenectomy.

Fig. 4: Posterior probability distributions of splenic hilum lymph node (SHLN) metastasis predicted by the Bayes-SHLNM model for three NonG cases.

a NonG-Case 1: A 61-year-old man with advanced gastric cancer (AGC) without greater curvature invasion (U, Post, Type 1, 60 mm, por2 > sig, pT4a [SE]) and lymph node metastasis (LNM) (#1, #3, #4sa, #4sb, #11d). The Bayes-SHLNM model predicted a posterior distribution with a mean probability of 0.502 for SHLN metastasis. Although the Japanese Gastric Cancer Association (JGCA) guidelines strongly discourage SHLN dissection for such tumors, this model allows the decision to be reconsidered based on the probability of metastasis. b NonG-Case 2: A 56-year-old woman with AGC without greater curvature invasion (U, Less, Type 2, 60 mm, tub2, pT3 [SS]) and LNM (#3). The model predicted a low probability of SHLN metastasis (posterior mean = 0.025), consistent with the JGCA recommendation against SHLN dissection for tumors without greater curvature invasion. c NonG-Case 3: A 79-year-old man with AGC without greater curvature invasion (U, Ant, Type 2, 100 mm, tub2 > por1, pT3 [SS]) and LNM (#1, #3, #7, #9, #11p). Despite a positive prediction by the Bayes-SHLNM model, the uncertainty in the posterior distribution suggests that splenectomy could be reconsidered, especially given the advanced age of the patient.

Fig. 5: Posterior probability distributions of splenic hilum lymph node (SHLN) metastasis predicted by the Bayes-SHLNM model for three cases with greater curvature invasion (WithG).

a WithG-Case 1: A 69-year-old man with advanced gastric cancer (AGC) with greater curvature invasion (UML, Circ, Type 4, 210 mm, por2 > sig, pT4a [SE]) and lymph node metastasis (LNM) (#1, #2, #3, #4sa, #4sb, #5, #6, #7, #9, #11p, #11d). The Bayes-SHLNM model predicted a posterior probability distribution with a mean of 0.674 for SHLN metastasis, consistent with the JGCA guideline recommendation for splenectomy. b WithG-Case 2: A 66-year-old woman with AGC with greater curvature invasion (U, Ant, Type 2, 45 mm, tub2, pT3 [SS]) and no LNM. The posterior distribution shows a mean probability of 0.023 for SHLN metastasis, which conflicts with the JGCA guidelines' weak recommendation for splenectomy in tumors with greater curvature invasion. c WithG-Case 3: A 55-year-old man with AGC with greater curvature invasion (M, Gre, Type 5, 77 mm, por2 > tub2, pT2 [MP]) and LNM (#3, #4d). Although the Bayes-SHLNM model predicted “negative” for SHLN metastasis, the posterior distribution shows uncertainty, with a mean of 0.085 and a 95% highest density interval of 0–0.2. This uncertainty suggests that splenectomy could be reconsidered.

However, Figs. 4c and 5c show two cases in which the prediction alone may not settle the decision once uncertainty is considered. In such cases, the prediction is too uncertain to serve as the sole basis for clinical judgment, and clinicians can instead rely on the JGCA guidelines, patient preferences, or institutional policies. One patient (Fig. 5c, WithG-Case 3) was a 55-year-old man with AGC with GCI (M, Gre, Type 5, 77 mm, por2 > tub2, pT2 [MP]) and LNM (#3, #4d). The Bayes-SHLNM model predicted “negative,” but the PPD left room to reconsider treating the case as positive, because the posterior mean was 0.075 and the 95% highest density interval (HDI) spanned 0–0.2. The other patient (Fig. 4c, NonG-Case 3) was a 79-year-old man with AGC without GCI (U, Ant, Type 2, 100 mm, tub2 > por1, pT3 [SS]) and LNM (#1, #3, #7, #9, #11p). The Bayes-SHLNM model predicted “positive,” but the uncertainty in the PPD left room to reconsider omitting splenectomy in view of the patient’s age. All individual PPDs from the 5fCV models are shown in Supplementary Figure 1.
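
A 95% HDI is the narrowest interval containing 95% of the posterior mass. From posterior draws, a common estimator sorts the samples and picks the shortest window that spans 95% of them. The sketch below is illustrative and not the study's code: the Beta(1.5, 17) draws are hypothetical, chosen to give a right-skewed distribution close to zero, which is exactly where the HDI differs most from an equal-tailed interval.

```python
import math
import random


def hdi(samples, mass=0.95):
    """Narrowest interval containing `mass` of the draws (suited to unimodal posteriors)."""
    xs = sorted(samples)
    k = int(math.floor(mass * len(xs)))  # index span covering `mass` of the draws
    start = min(range(len(xs) - k), key=lambda i: xs[i + k] - xs[i])
    return xs[start], xs[start + k]


def equal_tailed(samples, mass=0.95):
    """Central interval with (1 - mass) / 2 of the draws cut from each tail."""
    xs = sorted(samples)
    k = int(math.floor(mass * len(xs)))
    start = (len(xs) - k) // 2
    return xs[start], xs[start + k]


rng = random.Random(7)
# Hypothetical right-skewed posterior of one patient's metastasis probability.
draws = [rng.betavariate(1.5, 17.0) for _ in range(4000)]
lo, hi = hdi(draws)
et_lo, et_hi = equal_tailed(draws)
print(f"mean={sum(draws) / len(draws):.3f}  95% HDI=({lo:.3f}, {hi:.3f})  "
      f"equal-tailed=({et_lo:.3f}, {et_hi:.3f})")
```

Because the equal-tailed window is one of the candidates the HDI search considers, the HDI is never wider; for a skewed posterior it shifts toward the mode and starts closer to zero.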

Posterior distribution of the regression coefficient parameters

Figure 6 shows the posterior distributions of the 47 regression coefficients in the Bayes-SHLNM model trained on the 593 cases. Notably, the 95% HDIs of both the #4sb and #4sa coefficients lay entirely above 0, indicating a credible positive association with SHLN metastasis. Tumor location on the greater curvature, tumor size, predominant por2 histology, LNM at #11d, and LNM at #12a also tended to have positive coefficients.
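
The coefficient summaries above can be derived from posterior draws as in the sketch below, which reports the posterior mean, the fraction of draws above zero (P(β > 0)), and whether the 95% HDI excludes zero. P(β > 0) is included as a complementary summary; it is an assumption that such a quantity would be useful here, not something the study reports. The draws are hypothetical: the names echo Fig. 6, but the values are invented.

```python
import math
import random


def hdi(samples, mass=0.95):
    """Narrowest interval containing `mass` of the draws (unimodal posterior assumed)."""
    xs = sorted(samples)
    k = int(math.floor(mass * len(xs)))
    start = min(range(len(xs) - k), key=lambda i: xs[i + k] - xs[i])
    return xs[start], xs[start + k]


def summarize_coefficients(draws_by_name, mass=0.95):
    """Posterior mean, P(beta > 0), and whether the HDI excludes zero, per coefficient."""
    table = {}
    for name, draws in draws_by_name.items():
        lo, hi = hdi(draws, mass)
        table[name] = {
            "mean": sum(draws) / len(draws),
            "p_positive": sum(d > 0 for d in draws) / len(draws),
            "hdi_excludes_zero": lo > 0 or hi < 0,
        }
    return table


rng = random.Random(1)
# Hypothetical draws; the names echo Fig. 6, but the values are invented.
draws_by_name = {
    "LNM_4sb": [rng.gauss(1.2, 0.4) for _ in range(4000)],
    "LNM_11d": [rng.gauss(0.5, 0.4) for _ in range(4000)],
    "age": [rng.gauss(0.0, 0.2) for _ in range(4000)],
}
table = summarize_coefficients(draws_by_name)
for name, row in table.items():
    print(name, row)
```

A coefficient such as the hypothetical `LNM_11d` can have most of its mass above zero while its 95% HDI still includes zero, which matches the distinction between coefficients that merely "tended to be positive" and those whose HDI lay entirely above 0.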

Fig. 6: Posterior distributions of the 47 regression coefficients in the Bayes-splenic hilum lymph node metastasis (SHLNM) model for SHLNM prediction in upper gastrointestinal cancer (UGC).

This figure presents the posterior distributions of the 47 regression coefficients obtained from the Bayes-SHLNM model, which was trained on data from 593 cases. The regression coefficients indicate the impact of various clinical, tumor-related, histological, and lymph node metastasis (LNM) factors on the probability of SHLN metastasis in UGC. a Clinical features: Includes variables such as age, sex, and neoadjuvant chemotherapy. b Tumor location: Describes the anatomical site of the tumor, including the upper third, anterior wall, and involvement of the greater curvature. c Tumor features: • Macroscopic types (0–5): Different macroscopic types of the tumor. • Tumor size: Represents the maximum tumor diameter, standardized as Z-scores. • Pathological T stage: Refers to the depth of tumor invasion into the gastric wall, discretized into six values corresponding to T1a (1), T1b (2), T2 (3), T3 (4), T4a (5), and T4b (6). d Histological features: The histological subtypes are classified as follows: • Papillary adenocarcinoma (pap). • Tubular adenocarcinoma (tub): (1) Well differentiated (tub1). (2) Moderately differentiated (tub2). • Poorly differentiated adenocarcinoma (por): (1) Solid type (por1). (2) Non-solid type (por2). • Signet-ring cell carcinoma (sig). • Mucinous adenocarcinoma (muc). e LNM (#1–#12a): Represents LNM across various nodal stations. The coefficients for both #4sb and #4sa had 95% highest density intervals lying entirely above 0, indicating a credible positive influence on predicting SHLNM. Additionally, the location of the tumor on the greater curvature, tumor size (Z-score), non-solid type of poorly differentiated adenocarcinoma (por2), LNM #11d, and LNM #12a exhibited positive parameter values, suggesting a higher likelihood of SHLN metastasis in the presence of these factors.

Discussion

In the present study, we developed Bayesian models to predict SHLN metastasis in UGC using data from 593 patients with UGC who underwent total gastrectomy with splenectomy (TGS). To the best of our knowledge, this is the first report of a machine learning model using Bayesian techniques to predict SHLN metastasis. The Bayes-SHLNM model performed comparably to the FLR model, with a mean ROC AUC of 0.83 and an F1 score of 0.31 in 5fCV. When UGC was divided into two categories according to the presence of GCI, which has been widely used as an indication for SHLN dissection in Japan, the positive predictive values (precisions) were approximately 20% regardless of GCI, whereas the negative predictive values were 99% in UGCs without GCI and 91% in UGCs with GCI2. Moreover, the Bayes-SHLNM model produced PPDs that conveyed the uncertainty of each individual prediction. These results suggest that the Bayes-SHLNM model has the potential to support clinical decisions on whether SHLN dissection should be performed for UGC.

The Bayes-SHLNM model performs comparably with the frequentist model. Previous models for predicting LNM in gastric cancer have focused on the choice between surgery and endoscopic resection, using frequentist predictions of regional LNM. These models achieved AUCs of 0.69–0.94, F1 scores of 0.29–0.31, precisions of 8–21%, and negative predictive values of 99%34,35,36. Given the well-established benefits of gastric surgery, false negatives are hardly acceptable in LNM prediction models for early gastric cancer. In our setting, however, the safety and feasibility of TGS vary widely in real-world practice, so the acceptable levels of false negatives and false positives differ among patient groups. In this study, we developed a prediction model whose predictive performance is comparable with that of a frequentist model and that additionally provides a range of uncertainty for each prediction.

The strength of this model is that it provides a posterior probability distribution for each case; that is, it accounts for uncertainty. The benefit of considering uncertainty in decision-making has been reported in the literature, especially for events that occur infrequently but have serious consequences if overlooked, and for imbalanced, limited data43,44. This model provides an individual indicator for weighing the pros and cons of an invasive procedure such as SHLN dissection, balancing the expected benefit of dissection against the risk of complications in each case.

In addition, incorporating uncertainty into decision-making has several practical benefits in clinical settings. First, uncertainty estimates allow clinicians to identify cases in which predictions are highly reliable versus those in which additional diagnostic testing or consultation may be needed. For example, if the model predicts SHLN metastasis with high certainty, it can streamline the decision to perform dissection. Conversely, when the model’s uncertainty is high, it highlights the need for further investigation or a more cautious approach.
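
The first point can be operationalized with a simple rule on one patient's posterior draws: compute the posterior probability that the metastasis risk exceeds an action threshold, and call the case "certain" only when that probability is very high or very low. The sketch below is hypothetical throughout: the 10% threshold, the 95% confidence level, the band labels, and the three patients' Beta-distributed draws are illustrative choices, and selecting them is a clinical policy question that this study does not settle.

```python
import random


def triage(prob_draws, risk_threshold=0.10, confidence=0.95):
    """Sort one patient into a certainty band from posterior draws of metastasis risk.

    risk_threshold and confidence are hypothetical policy values, not study results.
    """
    p_above = sum(p > risk_threshold for p in prob_draws) / len(prob_draws)
    if p_above >= confidence:
        return "high risk, high certainty", p_above
    if p_above <= 1 - confidence:
        return "low risk, high certainty", p_above
    return "uncertain: consider further work-up or discussion", p_above


rng = random.Random(2)
# Hypothetical posterior draws for three patients.
patients = {
    "A": [rng.betavariate(30, 30) for _ in range(4000)],   # centred near 0.5, narrow
    "B": [rng.betavariate(2, 100) for _ in range(4000)],   # centred near 0.02, narrow
    "C": [rng.betavariate(1.5, 15) for _ in range(4000)],  # mean near 0.09, wide
}
for name, draws in patients.items():
    label, p_above = triage(draws)
    print(name, label, round(p_above, 3))
```

Patient C illustrates the case in which a point estimate near the threshold hides substantial uncertainty, so the rule flags it for discussion rather than forcing a yes-or-no call.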

Second, quantifying uncertainty facilitates communication between clinicians and patients. By presenting predictions as probability distributions rather than definitive outcomes, clinicians can discuss risks and benefits transparently, enabling shared decision-making and helping patients feel more informed and involved in their care. Incorporating these advantages into clinical workflows allows uncertainty not only to inform decision-making but also to make it more robust, balancing the risks and benefits of invasive interventions such as SHLN dissection. We believe that this feature can serve as a powerful communication tool, fostering collaboration among clinicians and between clinicians and patients. In situations with substantial uncertainty, this method allows complex information to be shared effectively and helps all parties understand the risks and probabilities involved. Through improved communication, clinicians and patients can work together to select informed and acceptable treatment options tailored to the circumstances of each case.

Furthermore, although the Bayesian model has demonstrated its ability to provide uncertainty estimates, comparisons with other machine learning models, such as random forests, remain an important direction for future work. Random forests and similar models can generate prediction intervals based on the variance in the data, providing insight into data uncertainty. However, these intervals primarily reflect data variability and do not account for model uncertainty, which is a critical component of clinical decision-making. By contrast, Bayesian methods provide a more comprehensive framework by quantifying both data and model uncertainty, which can be particularly valuable in situations with limited or imbalanced data.

Expanding this study to include comparisons with these models would provide a broader perspective on the strengths and limitations of Bayesian approaches. Such comparisons would require significant additional work in order to ensure fair and rigorous evaluations as well as thorough validation using external datasets. Although this is beyond the scope of this study, we acknowledge the value of this approach and suggest that it is an important avenue for future research. These efforts would help further establish the robustness and clinical utility of Bayesian models compared to other predictive modeling techniques.

Hyperparameter selection plays a critical role in both frequentist and Bayesian modeling approaches; however, the methodologies are fundamentally different. In frequentist models, hyperparameters such as regularization strength are treated as fixed values determined by cross-validation or optimization techniques. In our study, we used Optuna for efficient hyperparameter tuning in the frequentist logistic regression to ensure optimal model performance.

In contrast, Bayesian models treat hyperparameters as random variables and often assign them prior distributions, directly incorporating uncertainty into the modeling process. In this study, several prior distributions for the Bayesian model parameters were evaluated to assess their impact on predictive performance and uncertainty quantification. The horseshoe prior performed best, demonstrating superior accuracy and robustness. The horseshoe prior is particularly advantageous because it handles sparsity and mitigates overfitting, making it a natural choice for our dataset, which has a moderate number of predictors relative to the sample size.
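
The horseshoe prior is commonly written as β_j ~ N(0, λ_j² τ²), with local scales λ_j ~ C⁺(0, 1) and a global scale τ ~ C⁺(0, τ₀), where C⁺ denotes the half-Cauchy distribution. The sketch below draws from this prior only; it does not fit Bayes-SHLNM, and the global scale τ₀ = 0.1 is a hypothetical choice because the study's hyperprior settings are not given in this section. It also checks the property behind the name: with τ = 1 and unit noise scale, the shrinkage factor κ_j = 1 / (1 + λ_j²) follows Beta(1/2, 1/2), whose mass piles up near 0 (coefficient left almost unshrunk) and near 1 (coefficient pulled to zero). In practice, the posterior under such a prior would be sampled with an MCMC tool such as Stan or PyMC.

```python
import math
import random


def half_cauchy(rng, scale=1.0):
    """Half-Cauchy(0, scale) draw: absolute value of an inverse-CDF Cauchy draw."""
    return abs(scale * math.tan(math.pi * (rng.random() - 0.5)))


def horseshoe_coefficients(n_coef, tau_scale, rng):
    """One joint draw of n_coef coefficients from the horseshoe prior."""
    tau = half_cauchy(rng, tau_scale)  # global shrinkage shared by all coefficients
    # Each coefficient gets its own heavy-tailed local scale lambda_j.
    return [rng.gauss(0.0, tau * half_cauchy(rng)) for _ in range(n_coef)]


rng = random.Random(3)
coefs = horseshoe_coefficients(47, tau_scale=0.1, rng=rng)  # tau_scale is hypothetical

# With tau = 1, kappa = 1 / (1 + lambda^2) ~ Beta(1/2, 1/2): U-shaped shrinkage.
kappas = [1.0 / (1.0 + half_cauchy(rng) ** 2) for _ in range(20000)]
near_zero = sum(k < 0.1 for k in kappas) / len(kappas)  # little shrinkage
near_one = sum(k > 0.9 for k in kappas) / len(kappas)   # strong shrinkage
middle = sum(0.45 < k < 0.55 for k in kappas) / len(kappas)
print(f"kappa<0.1: {near_zero:.3f}  kappa>0.9: {near_one:.3f}  0.45<kappa<0.55: {middle:.3f}")
```

The U shape is what lets the horseshoe shrink noise coefficients strongly while leaving a few large signals nearly untouched, which is the sparsity behaviour described above.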

This distinction between the frequentist and Bayesian approaches highlights the flexibility and robustness of Bayesian models, particularly in contexts where quantifying uncertainty is critical. By incorporating hyperparameter uncertainty into posterior distributions, Bayesian models provide a more comprehensive framework for prediction and decision-making than frequentist methods that rely on fixed hyperparameter values.

We used 47 parameters as logistic regression coefficients and trained models with four different hyperprior distributions for regularization. The horseshoe prior model performed best and exhibited the strongest regularization among the four models. This result may be attributable to correlations among the explanatory factors, such as tumor morphology, size, location, histology, and LNM sites. Because clinical decisions are usually based on a combination of all these factors, we chose not to omit categories arbitrarily but to adjust the regularization to address overfitting and multicollinearity. Figure 6 shows the posterior distributions of the 47 parameters. Only the 95% HDIs of #4sa and #4sb excluded 0. These items are consistent with the factors identified using conventional statistical methods. Previous studies have identified independent risk factors for SHLN metastasis, such as macroscopic type 4, larger and more deeply invasive tumors, certain regional lymph node metastases (#4sa, #4sb, #7, or #11), and undifferentiated histology, which is consistent with our results45,46. In our study, the pathological T factor and #7 were not significant, which may be explained by the exclusion of pStage IV disease, including #16 LN metastasis and positive cytology, from our cohort.

The 95% HDIs of several parameters included zero. During the experimental phase, we explored models that included selected parameter subsets. However, given the limited dataset, it was difficult to eliminate subjectivity from feature selection. Therefore, we decided not to perform feature selection; instead, we included all 47 parameters and relied on the regularization provided by the horseshoe prior to address overfitting and multicollinearity. We believe that this approach represents the data more comprehensively and is more consistent with the multifactorial nature of clinical decision-making. We also recognize that feature selection may become a more viable approach as larger datasets become available, which could further reduce multicollinearity and improve model interpretability without compromising predictive accuracy.

This study has some limitations. First, this was a retrospective study of a specific patient group from a single high-volume institution. Since the publication of the JGCA guidelines, splenectomy with SHLN dissection has not been performed in Japan for patients with gastric cancer without GCI; consequently, a substantially larger sample is not expected in the future. Furthermore, the low rate of SHLN metastasis leads to imbalanced data, which results in suboptimal performance of traditional machine learning methods; the Bayesian approach adopted in this study shows promise for addressing this issue. Second, the model was not validated using an external cohort; therefore, further validation in a different cohort is required. Third, we could not verify the accuracy of the PPD. Although its usefulness in clinical decision-making is the most important factor, future prospective studies are required. Fourth, integrating explainability into the model remains challenging. Deterministic models generally lend themselves more readily to explainability methods, and incorporating explainability into a Bayesian model that visualizes uncertainty is an important future direction that could further enhance the clinical utility and practical adoption of the model. Finally, the validity and interpretability of the uncertainty bounds provided by the Bayesian model remain unclear. To assess the accuracy of the PPD and address potential model misspecification, future research will require both prospective studies and simulation-based analyses. Such analyses would examine the calibration of the Bayesian model under various scenarios, including misspecified ones. In addition, validation using external datasets will provide further insight into the robustness of the uncertainty estimates in different contexts. These efforts are critical for ensuring the reliability and practical applicability of the Bayesian approach in clinical settings.
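
One form of the simulation-based calibration check proposed above is a coverage study: repeatedly draw a true parameter, simulate a cohort from it, fit the model, and count how often the nominal 95% interval contains the truth. The sketch below illustrates the idea on a deliberately simple conjugate Beta-binomial model for a single metastasis rate, not on Bayes-SHLNM. The Beta(2, 20) prior, the cohort size of 100, and the 200 simulated cohorts are hypothetical settings. When the data-generating process matches the model, roughly 95% of intervals should cover the truth; generating theta or y differently from what the model assumes is one way to probe misspecification.

```python
import random


def interval_covers(theta, y, n, a, b, rng, n_draws=1000, mass=0.95):
    """Does the Monte Carlo central interval of the Beta posterior contain theta?"""
    post = sorted(rng.betavariate(a + y, b + n - y) for _ in range(n_draws))
    lo = post[round((1 - mass) / 2 * n_draws)]
    hi = post[round((1 + mass) / 2 * n_draws) - 1]
    return lo <= theta <= hi


def coverage(n_sims=200, n_patients=100, a=2.0, b=20.0, seed=4):
    """Fraction of simulated cohorts whose 95% interval contains the true rate."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        theta = rng.betavariate(a, b)  # true rate drawn from the prior
        y = sum(rng.random() < theta for _ in range(n_patients))  # simulated positives
        hits += interval_covers(theta, y, n_patients, a, b, rng)
    return hits / n_sims


cov = coverage()
print(f"empirical coverage of nominal 95% intervals: {cov:.3f}")
```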

In conclusion, the Bayes-SHLNM model demonstrates performance comparable to that of traditional FLR while providing individual PPDs. It has the potential to contribute to decision-making processes and suggests promising prospects for personalized precision medicine.

Methods

Setting and ethical approval

All methods were performed in accordance with the ethical guidelines for medical and health research involving human subjects. Informed consent was obtained from all patients. This retrospective cohort study was approved by the Institutional Review Board of the National Cancer Center (2016-496, 2017-077). The study was conducted in accordance with the principles of the Declaration of Helsinki.

Datasets

In the present study, we used a cohort previously reported by Yura et al.47. This study involved a retrospective review of the clinical records of 593 patients with advanced gastric cancer of pathological T category T2–T4. The T category reflects the depth of tumor invasion into the stomach wall: T2 indicates invasion into the muscle layer (muscularis propria), T3 indicates invasion into the connective tissue beneath the outer membrane (subserosa), and T4 indicates invasion of the outer membrane (serosa) itself or of adjacent structures. All patients had tumors located in the upper third of the stomach and underwent total gastrectomy (complete removal of the stomach) combined with splenectomy (removal of the spleen) and an extended lymph node dissection called D2 dissection. D2 dissection involves the removal of all the regional lymph nodes around the stomach, including those along the major blood vessels supplying the stomach. The surgeries were performed between January 2000 and December 2012 at the National Cancer Center Hospital in Japan. Importantly, all patients underwent curative (R0) resection, indicating that no macroscopic or microscopic residual tumor remained after surgery. Resected specimens were examined and evaluated according to the Japanese Classification of Gastric Carcinoma48,49.

Criteria for patient selection

The following criteria were used to select the study population:

Initial Pool: Patients who underwent gastrectomy with nodal dissection between January 2000 and December 2012

Inclusion Criteria:

Patients who underwent total gastrectomy with splenectomy for primary gastric cancer

Exclusion Criteria:

1. Patients diagnosed with pT0 or pT1 disease

2. Patients diagnosed with pStage IV disease (e.g., No. 16 lymph node metastasis or positive peritoneal cytology)

3. Patients who underwent R1 or R2 resection

Criteria for variable selection

The explanatory variables for the model were selected from a comprehensive list of items reported to be clinically relevant for predicting SHLN metastasis in gastric cancer. The selected variables encompassed clinical and pathological factors that can be estimated by preoperative examination. Clinical data included age, sex, administration of neoadjuvant chemotherapy, and tumor invasion of the greater curvature. Pathological data, classified according to the Japanese Classification of Gastric Carcinoma, included tumor location, cross-sectional (circumferential) location, macroscopic type, tumor size, the predominant, second most predominant, and third most predominant histological types, and metastasis to regional lymph nodes (Nos. 1–12a) (Fig. 7). Common histological types were categorized individually, whereas special types were grouped into a single “special (sp)” category because of their low incidence48,49.

Fig. 7: Lymph node numbering system defined by the Japanese Classification of Gastric Carcinoma (15th edition).

Modified from reference56. Regional lymph node metastasis is evaluated based on diagnostic imaging findings, such as those from endoscopic ultrasound and CT.

Software and basic structure of model development

We developed our model using Python 3.10 and PyMC (version 5.9.2)50. The outcome variable, SHLN metastasis, was binary (0 for absence, 1 for presence), and the explanatory variables were linked to it through logistic regression; that is, we assumed that the outcome followed a Bernoulli distribution. We selected a non-informative prior for our Bayesian model and applied four regularizing prior distributions: normal, Student’s t, Laplace, and horseshoe priors51. Details of the horseshoe prior are provided in Supplementary Figs. 2 and 3. For performance benchmarking, an FLR model was developed using scikit-learn and tuned with the hyperparameter optimization framework Optuna (version 3.5.0)52,53.
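For readers who prefer notation, the model described in this paragraph can be written compactly as below. The symbols (x_ij for the j-th standardized explanatory variable of patient i, coefficients β_j, and scale hyperparameters σ, ν, s, b, τ_0) are notation introduced here; the hyperparameter values are not reported in this section, so they are left symbolic. Assigning the non-informative prior to the intercept β_0 and the regularizing priors to the coefficients β_j is our reading of the text, and the horseshoe line follows its commonly used formulation, which may differ from the parameterization in Supplementary Figs. 2 and 3.

$$
y_i \sim \mathrm{Bernoulli}(p_i), \qquad \operatorname{logit}(p_i) = \log\frac{p_i}{1-p_i} = \beta_0 + \sum_{j=1}^{J} \beta_j x_{ij}
$$

with one of the following priors on each coefficient β_j, where C⁺ denotes the half-Cauchy distribution:

$$
\begin{aligned}
&\text{normal:} && \beta_j \sim \mathcal{N}(0, \sigma^2)\\
&\text{Student's } t: && \beta_j \sim t_\nu(0, s)\\
&\text{Laplace:} && \beta_j \sim \mathrm{Laplace}(0, b)\\
&\text{horseshoe:} && \beta_j \mid \lambda_j, \tau \sim \mathcal{N}(0, \lambda_j^2 \tau^2), \quad \lambda_j \sim \mathrm{C}^{+}(0, 1), \quad \tau \sim \mathrm{C}^{+}(0, \tau_0)
\end{aligned}
$$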

Selection and normalization of explanatory variables

Continuous and numerically coded variables, such as age, tumor size, and pathological T category, were standardized using Z-score normalization, which rescales each variable to a mean of 0 and a standard deviation of 1. Categorical variables were transformed by one-hot encoding into a format suitable for model input, and the resulting indicator variables were also standardized using Z-score normalization.
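The preprocessing described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the helper names `zscore` and `one_hot`, the toy ages, and the histology labels are hypothetical, and the sketch uses the population standard deviation (NumPy's default `ddof=0`), which the text does not specify.

```python
import numpy as np

def zscore(column):
    """Rescale a numeric column to mean 0 and standard deviation 1 (ddof=0)."""
    column = np.asarray(column, dtype=float)
    centered = column - column.mean()
    sd = column.std()
    # Guard against constant columns, which would otherwise divide by zero.
    return centered / sd if sd > 0 else centered

def one_hot(values, categories):
    """Return a 0/1 indicator matrix with one column per listed category."""
    values = np.asarray(values)
    return (values[:, None] == np.asarray(categories)[None, :]).astype(float)

# Hypothetical toy data; variable names and category levels are illustrative only.
age = [62, 70, 55, 81]
histology = ["tub1", "por", "sig", "por"]

age_z = zscore(age)
hist_onehot = one_hot(histology, ["tub1", "por", "sig"])
# Standardize each indicator column as well, as described for one-hot variables.
hist_z = np.column_stack([zscore(hist_onehot[:, j]) for j in range(hist_onehot.shape[1])])

# Final design matrix: one standardized column per explanatory variable.
X = np.column_stack([age_z, hist_z])
```

Within cross-validation, a common precaution (not stated in the text) is to compute each mean and standard deviation on the training fold only and reuse them for the test fold, so that no test information leaks into preprocessing.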

Sampling and inferred posterior probability distribution

In this study, PyMC was used to estimate the PPD of the Bayesian model. We employed the No-U-Turn Sampler (NUTS) algorithm to perform Markov chain Monte Carlo (MCMC) sampling54. A total of 5000 samples were collected, of which the initial 2000 were discarded as burn-in so that the distribution was estimated only from samples drawn after convergence. Four independent chains were run to ensure sample diversity. The target acceptance rate was set to 0.99 to achieve stable and accurate sampling.
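The handling of the sampler output (discarding burn-in, pooling chains, and taking the posterior mean per case) can be sketched with NumPy as below. This is a hedged illustration: the "draws" are simulated Beta variates standing in for real posterior samples, and `n_cases` is hypothetical. In PyMC itself, warm-up draws are usually requested through the `tune` argument of `pm.sample` (alongside `draws`, `chains`, and `target_accept`) and discarded automatically; whether the 5000 samples in the text include the 2000 burn-in draws is an assumption made here.

```python
import numpy as np

rng = np.random.default_rng(0)

N_CHAINS, N_TOTAL, N_BURN = 4, 5000, 2000  # values taken from the text
n_cases = 3  # hypothetical number of patients

# Stand-in for sampler output: per-chain draws of each case's predicted
# probability of SHLN metastasis (simulated here, NOT real posterior draws).
raw = rng.beta(2, 8, size=(N_CHAINS, N_TOTAL, n_cases))

# Discard the first N_BURN draws of every chain as burn-in.
kept = raw[:, N_BURN:, :]

# Pool the retained draws from all chains into one sample per case,
# giving an array of shape (chains * retained draws, cases): the PPD.
ppd = kept.reshape(-1, n_cases)

# The mean of each case's PPD serves as its point prediction.
pred_prob = ppd.mean(axis=0)
```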

Evaluation of the model’s performance using the internal cross-validation method

To compare model performance, we used a stratified 5fCV approach. This process involved creating five separate models, each tested on a different data subset. In each training dataset, after MCMC sampling, the posterior samples from all chains were combined for each case to form its PPD. The PPD means were taken as predictions and compared with the observed SHLNM55. For decision-making, the thresholds were calculated using the following procedure:

1. From the PPD of the training datasets for each fold, the mean was extracted as the predicted probability for each case.

2. These predicted probabilities were used to construct an ROC curve, and the Youden index was applied to determine the optimal threshold for classification. This threshold was used as the decision boundary and is referred to as the Yi threshold.
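Step 2 can be sketched as a direct search over candidate cutoffs for the maximum of Youden's J (sensitivity + specificity − 1). This NumPy version is illustrative, not the authors' implementation: the function name and toy data are hypothetical, candidate cutoffs are taken to be the observed predicted probabilities, and a case is called positive at or above the cutoff (a convention choice). With scikit-learn, the same value is usually obtained from `sklearn.metrics.roc_curve` by taking the threshold that maximizes `tpr - fpr`.

```python
import numpy as np

def youden_threshold(y_true, scores):
    """Return (cutoff, J) maximizing Youden's J = sensitivity + specificity - 1.

    Candidate cutoffs are the observed scores; a case is called positive
    when its score is >= the cutoff.
    """
    y = np.asarray(y_true, dtype=bool)
    s = np.asarray(scores, dtype=float)
    best_j, best_t = -np.inf, None
    for t in np.unique(s):
        pred = s >= t
        sens = np.mean(pred[y])      # true-positive rate among positives
        spec = np.mean(~pred[~y])    # true-negative rate among negatives
        j = sens + spec - 1
        if j > best_j:
            best_j, best_t = j, t
    return best_t, best_j

# Hypothetical training-fold outcomes and posterior-mean probabilities.
y = [0, 0, 1, 0, 1, 1, 0, 0]
p = [0.05, 0.10, 0.40, 0.20, 0.35, 0.80, 0.30, 0.15]
yi_threshold, j = youden_threshold(y, p)
```

Here all positives score at least 0.35 and all negatives score below it, so the search returns 0.35 with J = 1.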

During testing, a case was classified as positive if its mean posterior probability was above this threshold. These predictions were then compared with the actual outcomes, and performance was assessed using metrics such as the ROC AUC, PRAUC, sensitivity, specificity, precision, and F1 score. This process was repeated for each of the five folds, and each metric was averaged across folds to assess the overall effectiveness of our model.
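The fold construction and per-fold evaluation can be sketched as follows. This is a simplified illustration under stated assumptions: `stratified_folds` and `binary_metrics` are hypothetical helpers, model fitting is omitted, the probabilities are placeholders rather than PPD means, and a fixed threshold of 0.5 stands in for the Yi threshold that would be derived from each training fold. PRAUC is omitted for brevity, and ROC AUC is computed as the Mann-Whitney probability that a positive case outscores a negative one.

```python
import numpy as np

def stratified_folds(y, k=5, seed=0):
    """Assign each case to one of k folds, keeping the outcome ratio similar."""
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    fold = np.empty(len(y), dtype=int)
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))
        # Deal the shuffled cases of this class round-robin across folds.
        fold[idx] = np.arange(len(idx)) % k
    return fold

def binary_metrics(y_true, prob, threshold):
    """Threshold-based metrics plus ROC AUC for one test fold."""
    y = np.asarray(y_true, dtype=bool)
    p = np.asarray(prob, dtype=float)
    pred = p >= threshold
    tp, fp = np.sum(pred & y), np.sum(pred & ~y)
    fn, tn = np.sum(~pred & y), np.sum(~pred & ~y)
    sens = tp / (tp + fn) if tp + fn else np.nan
    spec = tn / (tn + fp) if tn + fp else np.nan
    prec = tp / (tp + fp) if tp + fp else np.nan
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else np.nan
    # ROC AUC: share of positive-negative pairs ranked correctly (ties = 1/2).
    diff = p[y][:, None] - p[~y][None, :]
    auc = np.mean((diff > 0) + 0.5 * (diff == 0))
    return {"auc": auc, "sensitivity": sens, "specificity": spec,
            "precision": prec, "f1": f1}

# Hypothetical data: 20 cases, 5 positives, perfectly separated placeholder scores.
y = np.array([1] * 5 + [0] * 15)
prob = np.where(y == 1, 0.6, 0.1)
folds = stratified_folds(y, k=5)
per_fold = [binary_metrics(y[folds == f], prob[folds == f], threshold=0.5)
            for f in range(5)]
# Average each metric across the five folds.
summary = {name: np.mean([m[name] for m in per_fold]) for name in per_fold[0]}
```

Stratification matters here because SHLN metastasis is rare: dealing each outcome class separately guarantees that every test fold contains positive cases.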

In addition to our Bayesian model, FLR models were developed for benchmarking. We applied the same model development, testing, and evaluation process to the FLR model as to our main model, enabling a direct comparison of how well each model predicted SHLN metastasis.

Evaluation of the utility of posterior probability distribution

Uncertainty was assessed using individual PPDs, and the feasibility of the model for clinical implementation was examined. The 95% HDI, mean, and median of each PPD were calculated, and the PPDs were displayed as kernel density plots. Cases were stratified by the presence or absence of GCI, an important indicator for clinical decision-making under the JGCA guideline criteria2. We present cases in which the model predictions themselves could change the clinical decision, as well as cases in which the uncertainty of the prediction could aid decision-making.
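The per-case summaries can be sketched as below. The `hdi` function computes the narrowest interval containing 95% of the pooled draws, which is appropriate for unimodal distributions; ArviZ offers an equivalent routine, `arviz.hdi` (with a `hdi_prob` argument), which is likely what a PyMC workflow would use. The PPD here is simulated from a Beta distribution purely for illustration and does not represent any patient in the study.

```python
import numpy as np

def hdi(samples, prob=0.95):
    """Narrowest interval containing `prob` of the samples (unimodal assumption)."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = len(x)
    m = int(np.floor(prob * n))        # index span covered by the interval
    widths = x[m:] - x[: n - m]        # width of every candidate interval
    i = int(np.argmin(widths))
    return x[i], x[i + m]

# Simulated PPD for one hypothetical case (NOT study data).
rng = np.random.default_rng(0)
ppd = rng.beta(3, 12, size=12000)

summary = {
    "mean": float(ppd.mean()),
    "median": float(np.median(ppd)),
    "hdi_95": hdi(ppd, 0.95),
}
```

For a skewed PPD like this one, the mean, median, and HDI bounds differ noticeably, which is why reporting all three alongside the density plot conveys more than a single point estimate.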

Uncertainty evaluation of Bayesian regression coefficients

Using the entire training cohort, we developed a final Bayesian logistic regression model and examined the posterior distributions of its parameters using 95% HDIs. Coefficients whose 95% HDI did not include zero were defined as significant.
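The significance rule can be sketched as a check on whether each coefficient's 95% HDI excludes zero. In this hedged example, the coefficient names and posterior draws are hypothetical (simulated normal variates, not estimates from this study), and the compact `hdi` helper computes the narrowest 95% interval of the pooled draws.

```python
import numpy as np

def hdi(samples, prob=0.95):
    """Narrowest interval containing `prob` of the samples."""
    x = np.sort(np.asarray(samples, dtype=float))
    m = int(np.floor(prob * len(x)))
    i = int(np.argmin(x[m:] - x[: len(x) - m]))
    return x[i], x[i + m]

def hdi_excludes_zero(coef_draws, names, prob=0.95):
    """Map each coefficient name to True if its HDI lies entirely above or below 0.

    coef_draws: array of shape (n_draws, n_coefs), pooled over chains.
    """
    flags = {}
    for j, name in enumerate(names):
        lo, hi = hdi(coef_draws[:, j], prob)
        flags[name] = bool(lo > 0 or hi < 0)
    return flags

rng = np.random.default_rng(1)
# Hypothetical draws: one coefficient clearly positive, one centred on zero.
draws = np.column_stack([rng.normal(0.8, 0.2, 12000),
                         rng.normal(0.0, 0.3, 12000)])
flags = hdi_excludes_zero(draws, ["greater_curvature_invasion", "age"])
```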