Introduction

Post-acute pancreatitis diabetes mellitus (PPDM-A) is a secondary form of diabetes and the second most common type of new-onset diabetes in adults1. Patients with PPDM-A exhibit more pronounced fluctuations in their blood glucose levels and face greater difficulty managing their condition than patients with type 2 diabetes2,3, exposing them to a higher risk of death. The medication approaches used to treat these two forms of diabetes also differ. According to the American Diabetes Association (ADA), the progression of PPDM-A is associated with a 33-fold higher risk of pancreatic cancer development4. Identifying individuals at higher risk of PPDM-A is thus vital for the effective prevention and management of this disease and its sequelae. In prior studies, radiomics models based on imaging data have shown promise as tools for PPDM-A prediction5. As these machine learning models tend to behave as “black boxes”, however, determining the basis for particular predictions in specific patient populations can often be difficult. Radiomics approaches entail the extraction of a wide array of quantitative features from computed tomography (CT) or other imaging datasets, after which predictive models are developed with machine learning or deep learning algorithms that process these features6. The inherent complexity of the resultant models typically precludes any intuitive understanding of their internal operations. As a result, this lack of decision-making transparency tends to limit the clinical implementation of these models despite their predictive accuracy. Model interpretation is also very challenging, as the medical data used for modeling tend to be complex and ambiguous7. The SHAP (SHapley Additive exPlanations) technique has been developed in an effort to clarify the basis for machine learning model-based predictions. 
SHAP is based on the game theory concept of Shapley values, enabling the quantification of individual feature contributions to the prediction of a given model and providing a visual, interpretable explanation of model outcomes8. In this study, CT-derived radiomics features were thus used for the development and validation of a machine learning model for the prediction of PPDM-A to provide a tool that can aid the accurate detection of this disease at an early time point. To ensure the interpretability of the developed radiomics model, SHAP analysis was also performed.

Materials and methods

Study participants

The present study was approved by the Medical Ethics Committee of the Affiliated Hospital of North Sichuan Medical College, which waived the requirement for informed consent owing to the retrospective design of the study (2022ER320-1). All procedures performed in this study were in accordance with the Declaration of Helsinki (as revised in 2013).

This study ultimately enrolled 271 patients who had undergone contrast-enhanced CT scans within 7 days of an initial bout of acute pancreatitis (AP) between March 2017 and June 2023 (Fig. 1). These patients were separated into two groups based on whether or not they developed diabetes following AP, including a PPDM-A group (n = 109) who underwent long-term follow-up for diabetes through June 2024, and a non-PPDM-A group (n = 162) for whom follow-up was performed through medical record review and by telephone through June 2024 to confirm that they had not developed diabetes during this period9. These patients were further randomized into training (n = 189) and testing (n = 82) cohorts at a 7:3 ratio. Clinical and demographic features collected from these patients included age, sex, alcohol intake, smoking history, hypertension, fatty liver, disease characteristics (modified computed tomography severity index, severity, presence/absence of necrosis), laboratory indices (hypersensitive C-reactive protein, lipase, pancreatic amylase, serum amylase), and local complications (walled-off necrosis, pancreatic pseudocyst, acute peripancreatic fluid collection10).
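The 7:3 allocation described above can be sketched as follows. This is a minimal illustration in Python, not the procedure actually used in the study: the patient identifiers, the seed value, and the absence of stratification by PPDM-A status are all assumptions, as none of these details is reported.

```python
import random


def split_cohort(patient_ids, train_fraction=0.7, seed=42):
    """Shuffle patient IDs reproducibly and split them into two cohorts.

    The seed is a hypothetical value chosen for reproducibility of this
    sketch; the study does not report one.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    # Round down so that 271 patients give 189 training cases.
    n_train = int(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


# Hypothetical identifiers 1..271 standing in for the enrolled patients.
train_ids, test_ids = split_cohort(range(1, 272))
print(len(train_ids), len(test_ids))  # 189 82
```

Rounding the training size down reproduces the reported cohort sizes (189 and 82) from the 271 enrolled patients.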

Fig. 1

Flowchart of patient recruitment in this study. AP acute pancreatitis, CECT contrast-enhanced computed tomography, PPDM-A post-acute pancreatitis diabetes mellitus.

PPDM-A group inclusion criteria (1) Diagnosed AP10; (2) No prior history of diabetes; (3) Diabetes diagnosed 3+ months after AP onset, as defined by one or more of the following: Fasting blood glucose ≥ 7.0 mmol/L, 2-h postprandial blood glucose ≥ 11.1 mmol/L (200 mg/dL), HbA1c ≥ 6.5%, typical symptoms of diabetes together with random blood glucose levels ≥ 11.1 mmol/L (200 mg/dL)4,11.

Exclusion criteria (1) Acutely exacerbated chronic pancreatitis; (2) Any history of prediabetes or diabetes; (3) Blood glucose levels that were elevated while hospitalized or within 3 months following discharge; (4) Other comorbid conditions impacting pancreatic function (such as pancreatic resection, pancreatic cystic fibrosis, pancreatic tumors, pancreatic trauma, fibrous pancreatic disease, or hemochromatosis12); (5) Women who were pregnant or lactating; (6) Unclear imaging data or scanning parameters that were inconsistent; (7) < 18 or > 80 years of age; or (8) Incomplete clinical data or loss to follow-up.

Non-PPDM-A group inclusion criteria (1) First-episode AP; (2) No evidence of abnormal blood glucose levels over the course of follow-up.

Exclusion criteria (1) Acutely exacerbated chronic pancreatitis; (2) Any history of malignant tumors; (3) Unclear imaging data or scanning parameters that were inconsistent; (4) Incomplete clinical data or loss to follow-up; (5) Women who were pregnant or lactating; (6) < 18 or > 80 years of age.
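The glycemic thresholds listed in the PPDM-A group inclusion criteria can be expressed as a simple rule. The sketch below is illustrative only: the function and parameter names are hypothetical, glucose values are assumed to be in mmol/L and HbA1c in percent, and missing measurements are represented as None.

```python
def meets_diabetes_criteria(fasting_glucose=None, two_hour_glucose=None,
                            hba1c=None, random_glucose=None,
                            typical_symptoms=False):
    """Return True if at least one of the listed diagnostic criteria is met."""
    if fasting_glucose is not None and fasting_glucose >= 7.0:
        return True
    if two_hour_glucose is not None and two_hour_glucose >= 11.1:
        return True
    if hba1c is not None and hba1c >= 6.5:
        return True
    # Random glucose only counts together with typical symptoms.
    if (typical_symptoms and random_glucose is not None
            and random_glucose >= 11.1):
        return True
    return False


def qualifies_for_ppdm_a_group(months_since_ap_onset, prior_diabetes, **labs):
    """Apply the timing and history conditions before the glycemic thresholds."""
    if prior_diabetes or months_since_ap_onset < 3:
        return False
    return meets_diabetes_criteria(**labs)


print(qualifies_for_ppdm_a_group(6, False, hba1c=6.8))            # True
print(qualifies_for_ppdm_a_group(2, False, fasting_glucose=8.1))  # False
```

The remaining exclusion criteria (for example, age limits or comorbid pancreatic conditions) are not modeled here.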

Image acquisition

A 64-slice Siemens Somatom Force CT scanner and a 64-slice Philips Brilliance iCT scanner were used for CT scanning in helical mode, covering the area from the diaphragm to the lower edge of the pancreas. The scanning parameters included: tube voltage = 120 kV, automated mAs settings, pitch factor = 1, slice thickness = 0.625 mm × 64, reconstruction slice thickness = 5 mm, slice spacing = 5 mm, tube rotation time = 0.50 s, and matrix size = 334 × 334. A high-pressure injector was used for the injection of the non-ionic contrast agent iopamidol via the antecubital vein at 3.0–3.5 mL/s with an overall dose of 1.5 mL/kg, acquiring images at 25–30 s (arterial phase) and 45–50 s (venous phase) following contrast injection.

Radiomics feature selection

Two radiologists (5+ years of experience) blinded to patient clinical details manually outlined the pancreas on plain, arterial, and venous phase CT images with 3D Slicer (v5.0.3, https://www.slicer.org), avoiding major structures including the common bile duct, splenic artery/vein, and pancreaticoduodenal artery/vein when delineating the region of interest (ROI) consisting of the entirety of the pancreas (Fig. 2). Laplacian of Gaussian filtering and wavelet transform filtering were used to process these images in 3D Slicer for smoothing, edge enhancement, and noise reduction. The “PyRadiomics” package (v3.0.1; https://www.slicer.org/) was used to extract 1223 features from each ROI. Before these features were extracted, all images were resampled to a voxel resolution of 1 mm × 1 mm × 1 mm to ensure consistency.

Fig. 2

Illustration of the delineation of the pancreatic ROI in acute pancreatitis patients. A patient with mild acute pancreatitis and slight pancreatic enlargement (a). Blurred peripancreatic fat spaces with strip-like exudates and bilateral perirenal fascia thickening are visible. No dilation of the main pancreatic duct is observed. The region of interest (ROI) is outlined layer by layer along the pancreatic edge, avoiding blood vessels and the common bile duct. A case of acute necrotizing pancreatitis with infection and walled-off necrosis (b). A large cystic low-density lesion (approximately 14.7 × 7.2 cm) with gas density is observed, with significant enhancement of the cyst wall on contrast-enhanced imaging. Residual normal pancreatic tissue is seen at the pancreatic head and tail, with no abnormal density or enhancement. The peripancreatic fat space is poorly defined. The ROI is outlined along the cyst boundary (including walled-off necrosis area), while avoiding critical structures such as blood vessels, the common bile duct, gastrointestinal tract, peritoneum, and mesentery. ROI region of interest.

To limit the effects of any subjective biases, the ROIs for 97 patients selected at random were independently delineated by the two radiologists participating in this study, and intraclass correlation coefficients (ICCs) were then used to assess agreement between these observers.
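As an illustration of the agreement analysis, the sketch below computes a two-way random-effects, absolute-agreement, single-rater ICC, often written ICC(2,1), for one feature measured by several readers. The choice of this ICC form is an assumption, since the specific variant is not stated, and the feature values are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1) from a table with one row per subject and one column per rater."""
    n = len(ratings)        # number of subjects (patients)
    k = len(ratings[0])     # number of raters (readers)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical values of two features, each measured by two readers.
stable_feature = [[10.2, 10.0], [11.5, 11.9], [9.8, 9.5], [14.1, 14.4], [12.3, 12.0]]
unstable_feature = [[10.2, 13.0], [11.5, 9.1], [9.8, 12.2], [14.1, 10.5], [12.3, 11.0]]

for name, table in [("stable", stable_feature), ("unstable", unstable_feature)]:
    icc = icc_2_1(table)
    print(name, round(icc, 3), "retained" if icc > 0.75 else "discarded")
```

Under the 0.75 cut-off used in this study, the first hypothetical feature would be retained and the second discarded.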

Z-score normalization was performed for all radiomics features exhibiting an ICC > 0.75, thereby ensuring that each had a mean value of 0 and a standard deviation of 1 as a means of eliminating the effects of scale-related differences among the different features. Feature selection was then performed through the variance threshold, K-best, and Least Absolute Shrinkage and Selection Operator (LASSO) methods. The detailed radiomics process is presented in Fig. 3.
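The normalization step can be sketched as below for a single feature. This is a minimal example that assumes the population standard deviation is used as the scaling factor (the text does not specify population versus sample); the feature values are hypothetical.

```python
from statistics import fmean, pstdev


def zscore(values):
    """Rescale one feature to mean 0 and (population) standard deviation 1."""
    mu = fmean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("constant feature cannot be z-scored")
    return [(v - mu) / sigma for v in values]


# Hypothetical raw values of one radiomics feature across five patients.
raw = [1520.0, 980.0, 2210.0, 1340.0, 1775.0]
scaled = zscore(raw)
print([round(v, 3) for v in scaled])
```

In practice the mean and standard deviation would typically be estimated on the training cohort and then applied unchanged to the testing cohort.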

Fig. 3

Workflow of the radiomics process in this study. GLCM gray level co-occurrence matrix, GLDM gray level dependence matrix, GLRLM gray level run-length matrix, GLSZM gray level size zone matrix, LASSO least absolute shrinkage and selection operator, NGTDM neighboring gray-tone difference matrix.

Model construction and evaluation

The XGBoost algorithm was used to construct a radiomics model, the performance of which was evaluated via five-fold cross-validation to minimize overfitting and ensure model generalizability. The predictive performance of the model was assessed based on the accuracy, sensitivity, specificity, F1 score, and area under the curve (AUC) values. The clinical net benefit afforded by the model was assessed through a decision curve analysis (DCA).
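For reference, the performance metrics named above can be computed from predicted probabilities as in the following sketch. The labels, scores, and the 0.5 classification threshold are hypothetical; the AUC is computed here through its rank interpretation rather than by integrating an ROC curve.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity and F1 from binary labels.

    Assumes both classes are present, so no denominator is zero.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


def auc(y_true, scores):
    """Probability that a random positive case outscores a random negative case.

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = PPDM-A) and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.35, 0.4, 0.2, 0.1, 0.6, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc_value = auc(y_true, scores)
print(metrics, auc_value)
```

Unlike the other four metrics, the AUC does not depend on the chosen classification threshold.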

Model interpretability

The influence of individual features on the predictions made by the developed model was assessed using the R shapviz package (v 4.3.2, http://www.r-project.org/) to calculate SHAP values, which enable the quantification of how these features affect model outputs, thereby improving interpretability13. SHAP values were visualized to support efforts to clarify the effects of particular features on model predictions.
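To make the idea behind SHAP values concrete, the sketch below computes exact Shapley values for a toy three-feature function by averaging each feature's marginal contribution over all feature orderings. It is an illustration only: the toy function and inputs are hypothetical, absent features are replaced by a single reference point rather than a background dataset, and the analysis in this study was performed with the R shapviz package rather than with this code.

```python
from itertools import permutations


def shapley_values(model, x, reference):
    """Exact Shapley values of instance x relative to a reference input."""
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(reference)
        previous_output = model(current)
        for j in order:
            current[j] = x[j]                 # switch feature j to its real value
            output = model(current)
            phi[j] += output - previous_output  # marginal contribution of j
            previous_output = output
    return [total / len(orderings) for total in phi]


def toy_model(features):
    """Hypothetical model with one additive term and one interaction term."""
    a, b, c = features
    return 2 * a + b * c


x = [1, 2, 3]
reference = [0, 0, 0]
phi = shapley_values(toy_model, x, reference)
print(phi)  # [2.0, 3.0, 3.0]
print(sum(phi), toy_model(x) - toy_model(reference))
```

The contributions sum to the difference between the prediction and the reference output, and the interaction term b × c is shared equally between the two features involved.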

Statistical analysis

SPSS 26.0 (IBM Corp, NY, USA) was used for all statistical analyses. After assessing the normality of continuous data with the Kolmogorov–Smirnov test, skewed data were reported as medians with interquartile ranges and compared using the Mann–Whitney U test, while normally distributed data were reported as means ± standard deviations and compared using independent sample t-tests. Categorical data were reported as numbers or percentages and compared with chi-square tests. The analysis of radiomics features and associated modeling was performed with R (v 4.3.2, http://www.r-project.org/), and P < 0.05 was regarded as being significant.

Results

Patient information

In total, this study enrolled 271 patients with AP (177 males, 94 females). The clinical characteristics are summarized in Table 1. Of the 22 clinical characteristics measured, only infection status differed significantly between the PPDM-A group and the non-PPDM-A group (P < 0.001).

Table 1 Patient demographic and clinical characteristics stratified according to PPDM-A status.

Radiomics model establishment and evaluation

For each patient, 3669 radiomics features were extracted, of which 3078 were retained following ICC analysis, including 881, 1142, and 1055 features from the plain, arterial, and portal venous phase scans, respectively. The variance threshold and K-best methods respectively reduced the number of features by 117 and 235, yielding a list of 2726 features. Using the LASSO algorithm with 10-fold cross-validation, the λ parameter was tuned to its optimal value, and 5 features were retained for model construction, including 1 each from the plain and arterial phase scans and 3 from the venous phase scans (Supplementary Table S1). The model developed with the XGBoost algorithm yielded AUC values of 0.947 (95% CI 0.915–0.979) and 0.901 (95% CI 0.838–0.964) in the training and test cohorts, respectively (Fig. 4A,B, Supplementary Table S2). DCA (Fig. 5A,B) demonstrated that the radiomics model provides a high clinical net benefit across a range of threshold probabilities. Specifically, the model outperforms the “Treat All” strategy in terms of net benefit, suggesting its potential for improving clinical decision-making.
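The net benefit compared in a decision curve analysis can be sketched as follows, using the standard definition in which false positives are weighted by the odds of the threshold probability. The labels and predicted probabilities are hypothetical and do not reproduce the curves in Fig. 5.

```python
def net_benefit(y_true, probs, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold.

    The threshold must be strictly below 1.
    """
    n = len(y_true)
    tp = sum(1 for t, p in zip(y_true, probs) if p >= threshold and t == 1)
    fp = sum(1 for t, p in zip(y_true, probs) if p >= threshold and t == 0)
    odds = threshold / (1 - threshold)
    return tp / n - (fp / n) * odds


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of treating every patient regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical outcomes (1 = PPDM-A) and model probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
probs = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2, 0.1, 0.4, 0.35, 0.05]

for threshold in (0.2, 0.5):
    print(threshold,
          round(net_benefit(y_true, probs, threshold), 3),
          round(net_benefit_treat_all(y_true, threshold), 3))
```

A model is considered useful at a given threshold probability when its net benefit exceeds both the treat-all value and zero (the treat-none strategy).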

Fig. 4

ROC curves in (a) training and (b) test cohorts. ROC receiver operating characteristic.

Fig. 5

DCA of the established models in the (a) training and (b) test cohorts. The grey curve represents the “Treat All Patients” strategy, where all patients are treated regardless of their predicted risk. Net Benefit on the y-axis reflects the model’s clinical utility, with higher values indicating greater benefit from the predictive model. DCA decision curve analysis.


Model interpretation

The five most influential features in the model according to the SHAP global bar chart (Fig. 6A) were glszm-LargeAreaEmphasis, glrlm-ShortRunLowGrayLevelEmphasis, glszm-ZoneEntropy, glszm-ZonePercentage, and firstorder-Maximum. The global importance of these features was assessed based on their average SHAP values across all samples. The waterfall plot (Fig. 6C) generated for the third patient reveals how each feature shifts the prediction away from the baseline prediction value of the model [E[f(x)] = 0.395], yielding the final prediction value [f(x) = 0.272]. For instance, the glrlm-ShortRunLowGrayLevelEmphasis feature increased the prediction score for the patient by 0.488 relative to baseline. Beeswarm plots (Fig. 6B) further highlight the effects of each feature on prediction probability values, with positive and negative effects represented in orange and purple, respectively. Positive and negative SHAP values respectively suggest increased and decreased risk of PPDM-A7,14. Features including glszm-LargeAreaEmphasis, glszm-ZoneEntropy, and glszm-ZonePercentage had a positive influence on the prediction of PPDM-A incidence, whereas glrlm-ShortRunLowGrayLevelEmphasis and firstorder-Maximum had a negative impact. These SHAP interactions are summarized in Fig. 6D, with features along the diagonal highlighting their individual effects on predictions, while areas away from the diagonal highlight the effects of combinations of features. Wider subplots are indicative of stronger interactions among features15. The established heatmap (Fig. 6E) highlights the significant interactions that exist between glrlm-ShortRunLowGrayLevelEmphasis and glszm-LargeAreaEmphasis.
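Because SHAP values are additive, the values reported for the waterfall plot constrain the contributions that are not stated explicitly. The short check below uses only the three numbers given above; it assumes that all of them are expressed on the same scale and that the five selected features are the only inputs to the model.

```python
baseline = 0.395      # E[f(x)], reported baseline prediction value
prediction = 0.272    # f(x), reported final prediction for the third patient
srlgle_shap = 0.488   # reported contribution of glrlm-ShortRunLowGrayLevelEmphasis

# Additivity: f(x) = E[f(x)] + sum of all SHAP values.
total_shift = prediction - baseline
remaining_features = total_shift - srlgle_shap

print(round(total_shift, 3))         # -0.123
print(round(remaining_features, 3))  # -0.611
```

Under these assumptions, the other four features must jointly lower the prediction by about 0.611 for this patient, more than offsetting the single positive contribution.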

Fig. 6

SHAP analysis results for the radiomics model for PPDM-A prediction using XGBoost. In the global bar chart (a), variables at the top contributed more to the model and exhibited higher predictive power than those positioned at the bottom. The beeswarm plot (b) illustrates the contribution of each feature to the model’s prediction, with each point representing the SHAP value of a sample in the dataset. The SHAP waterfall plot (c) shows the individual-level interpretability of the radiomics model. Red bars indicate an increased predictive value and blue bars indicate a decreased predictive value. The interaction summary plot (d) and heatmap (e) display both individual and interaction SHAP values, with diagonal points showing the individual SHAP values for each feature, and off-diagonal points revealing the interaction effects between feature pairs.

Discussion

In this study, PPDM-A was predicted using a newly developed CT radiomics-based model, which achieved high AUC values in both the training and test cohorts. SHAP analyses were also implemented to improve the interpretability of this model. Radiomics approaches are thus an effective means of predicting PPDM-A risk, while SHAP can enhance the applicability of these methods by improving the overall transparency of the model decision-making process.

PPDM-A incidence has reportedly almost doubled over the past decade, and the annual incidence rate is reportedly around 2.8%16. Zhang et al.17 used an interpretable clinical index-based machine learning model in a prior study, but the AUC of this model (0.819) indicates that simple clinical models have limited value in predicting PPDM-A. By extracting radiomics features from only either enhanced or non-contrast-enhanced CT images, Zhong et al.18 and Hu et al.5 potentially overlooked key pieces of data. In the present study, data from both image types were combined, thereby capturing a wider range of anatomical and hemodynamic features, yielding greater predictive accuracy while overcoming the limitations imposed by reliance on a single imaging strategy. The resultant model, which was based on five highly reliable texture features, exhibits excellent accuracy and generalizability. Additionally, we found a significant difference in infection status between the PPDM-A and non-PPDM-A groups. Infection can alter pancreatic texture and tissue characteristics, including local blood flow, potentially introducing bias. To mitigate this, we excluded major blood vessels during segmentation and used careful feature selection methods, such as variance threshold, K-best, and LASSO, to minimize the impact of irrelevant features. Repeated cross-validation analyses showed that infection did not significantly affect the performance of the model. While infection could be a confounding factor, our model remained stable and reliable. Future studies should better control for this variable to further validate our findings.

SHAP was herein leveraged to interpret the radiomics-based predictions of PPDM-A risk for the first time. In AP, damage to the pancreas results in the destruction of islet cells, reduced insulin secretion, and higher levels of insulin resistance19,20,21. These shifts can be reflected by altered pancreatic morphology, density, and texture visible on CT images. Among the texture features incorporated into the model in this study, those derived from the GLSZM (Gray-Level Size Zone Matrix) were particularly important. The GLSZM is a texture matrix that emphasizes the continuity of pixel values within two-dimensional regions. By computing characteristic parameters such as SZE (Small Zone Emphasis) and LZE (Large Zone Emphasis), it analyzes the heterogeneity of image textures and the distribution of region sizes6. The most significant predictor of PPDM-A in this study, according to SHAP analysis, was glszm-LargeAreaEmphasis, likely owing to the ability of this feature to detect extensive pancreatic injury and inflammation22. Indeed, this feature reflects the overall pathological state of the pancreas because it captures extensive gray-level variation. As a measure of textural complexity and gray-level distribution, ZoneEntropy suggests that PPDM-A patients exhibit greater pancreatic heterogeneity, consistent with the damage in these patients being more severe than that in patients without PPDM-A23.
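The GLSZM features discussed above can be illustrated on a toy two-dimensional image. The sketch below groups pixels of equal gray level into 8-connected zones and then computes LargeAreaEmphasis, ZonePercentage, and ZoneEntropy from the standard definitions. It is a simplified illustration and not the PyRadiomics implementation: the images are hypothetical, no gray-level discretization or masking is applied, and three-dimensional connectivity is not handled.

```python
from collections import Counter
from math import log2


def glszm_zones(image):
    """Return (gray_level, zone_size) for every 8-connected zone in a 2D image."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    zones = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            level = image[r][c]
            stack = [(r, c)]
            seen[r][c] = True
            size = 0
            while stack:  # flood fill over equal-valued neighbours
                y, x = stack.pop()
                size += 1
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and not seen[ny][nx]
                                and image[ny][nx] == level):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
            zones.append((level, size))
    return zones


def glszm_features(image):
    zones = glszm_zones(image)
    n_zones = len(zones)
    n_pixels = sum(size for _, size in zones)
    counts = Counter(zones)  # P(i, j): number of zones with level i and size j
    return {
        "LargeAreaEmphasis": sum(n * size ** 2
                                 for (_, size), n in counts.items()) / n_zones,
        "ZonePercentage": n_zones / n_pixels,
        "ZoneEntropy": sum(-(n / n_zones) * log2(n / n_zones)
                           for n in counts.values()),
    }


# Hypothetical images: one homogeneous, one maximally fragmented.
homogeneous = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
fragmented = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(glszm_features(homogeneous))
print(glszm_features(fragmented))
```

In this toy setting the homogeneous image gives a large LargeAreaEmphasis with zero ZoneEntropy, whereas the fragmented image gives the opposite pattern, which is the sense in which these features capture zone size and heterogeneity.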

In prior studies, both disease severity and pancreatic necrosis have been identified as being independently associated with the risk of endocrine dysfunction after AP19,24,25, with higher incidence following episodes of severe AP relative to mild AP (39% vs. 14%). This aligns well with the findings of the present study, emphasizing the reliability of the established radiomics model. In the present model, morphological changes including diameter and volume were not significant, indicating that AP-related morphological shifts may not be good predictors of PPDM-A incidence. The established SHAP waterfall plots offer a detailed overview of the contributions of specific features to PPDM-A risk predictions, affording a higher degree of insight and specificity than many comparable models, which suffer from a lack of interpretability.

The predictive model developed herein was generated using the XGBoost classifier, which enhanced predictive accuracy by employing ensemble learning to combine multiple weak models (individual decision trees) into a single model with greater power26. Relative to other classifiers, XGBoost is capable of automatically capturing interactions among features through the examination of multiple features during decision tree splits, thereby providing a higher degree of insight relative to reliance on any single feature. The SHAP interaction plots generated herein highlight strong interactions between the LargeAreaEmphasis and glrlm-ShortRunLowGrayLevelEmphasis features, the latter of which serves as a measure of small region uniformity within CT images. Lesion areas may thus harbor both low-gray-level small regions and extensive areas of uniformity. These detailed insights can provide a more thorough understanding of how this predictive model functions, leading to greater overall accuracy.
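The principle of combining weak learners can be sketched with a deliberately simplified form of gradient boosting. The example below fits one-split decision stumps to the residuals of the running prediction on a single hypothetical feature with a squared-error loss. It is not XGBoost, which additionally uses second-order gradient information, regularization, and multi-feature trees; the data, number of rounds, and learning rate are illustrative assumptions.

```python
from statistics import fmean


def mse(ys, preds):
    return fmean((y - p) ** 2 for y, p in zip(ys, preds))


def fit_stump(xs, residuals):
    """Find the single threshold split that best fits the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # keep both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean, right_mean = fmean(left), fmean(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, n_rounds=20, learning_rate=0.3):
    """Add stumps one at a time, each fitted to the current residuals."""
    base = fmean(ys)
    preds = [base] * len(xs)
    stumps = []
    history = [mse(ys, preds)]
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
        history.append(mse(ys, preds))

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    return predict, history


# Hypothetical one-dimensional data with a step-like trend.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0, 5.2, 4.8]
predict, history = boost(xs, ys)
print(round(history[0], 4), round(history[-1], 4))
```

Each stump alone is a poor predictor, but the training error falls as stumps are added, which is the sense in which an ensemble of weak models yields a stronger one.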

Limitations

This study has some limitations. First, the retrospective nature of this study contributed to a relatively limited sample size, highlighting a need for future validation in larger cohorts of patients. Second, these data were all derived from a single center and may be susceptible to selection bias, such that further multicenter validation of the robustness and generalizability of this model is essential. Third, this model did not incorporate clinical parameters, and certain key pieces of clinical data may thus have been overlooked. In the future, researchers should seek to combine radiomics features and clinical variables to establish more comprehensive machine learning-based models capable of predicting PPDM-A risk, thereby informing the understanding of this disease and aiding its diagnosis.

Conclusion

In conclusion, a robust, interpretable machine learning model based on CT radiomics features was herein developed as a tool to predict PPDM-A incidence. These results highlight the value of pancreatic radiomics features as biomarkers that can be leveraged to objectively predict PPDM-A, allowing for individualized clinical decision-making such that high-risk patients can receive appropriate interventions at an early time point. The generated SHAP values also ensure that this model is interpretable by clinicians, thereby ensuring the greater overall reliability of this radiomics model.