Abstract
Accurate diagnosis of skin lesions remains challenging due to their morphological variability and the limitations of conventional diagnostic methods. In this study, we developed an explainable artificial intelligence (AI) framework that integrates three-dimensional total body photography (3D TBP) images with structured clinical data to classify and assess the risk of six common skin-lesion types. Using the ISIC 2024 dataset comprising 1,075 patients, 41 clinical and lesion-specific features were extracted and analyzed. A multinomial logistic-regression model was implemented for decision support, and model interpretability was assessed using Shapley Additive Explanations (SHAP) and Class Activation Maps (CAM). The clinical-only XGBoost model achieved moderate accuracy (basal cell carcinoma 78.6%, nevus 72.6%), while CNNs trained on 3D TBP images achieved 87.1% accuracy for nevus. The multimodal fusion model substantially outperformed unimodal approaches, achieving recall and F1 scores above 95% and Area Under the Curve (AUC) values exceeding 0.95 (0.98 for nevus and actinic keratosis), and ranked among the top-performing entries in the ISIC 2024 challenge (partial false-positive rate = 0.1734). The integrated scoring system, visualized through nomograms, identified key predictors such as visual_classifier and tbp_lv_symm_2axis. This interpretable multimodal AI framework enhances diagnostic accuracy and risk stratification, offering a transparent and clinically actionable tool for precision dermatology and early detection of skin cancer.
Introduction
Skin lesions represent a major global health concern, contributing substantially to healthcare burdens and reduced quality of life worldwide1,2. While benign lesions, such as nevi, generally have limited malignant potential, precancerous conditions like solar and actinic keratoses can progress to squamous cell carcinoma (SCC) if left untreated3,4,5,6. Among malignant skin cancers, basal cell carcinoma (BCC), SCC, and melanoma are the most prevalent and clinically challenging due to their variable morphology, diverse prognostic outcomes, and frequent diagnostic overlap7,8,9,10,11. Conventional clinical approaches, including the ABCDE rule, are limited by subjective interpretation and interobserver variability12,13. Although imaging modalities such as reflectance confocal microscopy (RCM) and optical coherence tomography (OCT) enhance lesion visualization, their widespread clinical adoption is constrained by high cost, limited resolution, and operational complexity14,15,16.
Recent advances in artificial intelligence (AI) and deep learning (DL) have shown strong potential in automating skin-lesion classification and improving diagnostic accuracy17,18,19,20,21. Convolutional neural networks (CNNs) have achieved dermatologist-level performance in several studies, yet their clinical translation remains limited by challenges in generalization across imaging modalities and the lack of interpretability22,23,24,25. For AI-based systems to be trusted in medical settings, transparency in model decision-making is essential, allowing clinicians to validate algorithmic reasoning and integrate AI predictions into patient care26,27.
Three-dimensional Total Body Photography (3D TBP) represents an emerging imaging modality that addresses many limitations of traditional two-dimensional imaging. Using synchronized high-resolution DSLR arrays, 3D TBP provides a comprehensive mapping of the skin surface, enabling detailed visualization of lesion morphology, color distribution, and anatomical context28,29,30,31,32. This technology facilitates longitudinal lesion tracking and quantitative assessment of lesion burden, creating a rich multimodal data source for computational analysis. However, its integration with clinical data in interpretable AI frameworks remains underexplored.
In this study, we propose an explainable multimodal AI framework that integrates 3D TBP imaging features with structured clinical parameters to improve diagnostic accuracy and interpretability for skin-lesion classification. The framework employs transfer learning on CNNs pre-trained with dermoscopic datasets, combined with a multinomial logistic-regression scoring model for six-class lesion prediction. To enhance transparency, Shapley Additive Explanations (SHAP) and Class Activation Maps (CAM) are used to visualize key features and spatial attention. This approach aims to bridge advanced computational modeling with clinical reasoning, offering a transparent and clinically applicable AI system to support early detection and personalized management of skin lesions.
Methods
Data acquisition
The skin lesion data used in this study were obtained from the ISIC 2024 dataset, which includes records from patients who underwent 3D TBP between 2015 and 202431,33. This international dataset comprises thousands of cases collected from institutions across six continents, including Memorial Sloan Kettering Cancer Center (USA), Hospital Clínic de Barcelona (Spain), the University of Queensland (Australia), the Medical University of Vienna (Austria), and University Hospital Basel (Switzerland). The dataset primarily focuses on atypical or clinically significant lesions that have been biopsied or monitored short-term using digital dermoscopy, thereby reducing the selection bias common in conventional dermoscopic databases. A total of 41 clinical and lesion-specific features were extracted retrospectively, such as approximate age (age_approx), lesion size (clin_size_long_diam_mm), and TBP-derived parameters (e.g., tbp_lv_A, tbp_lv_Bext) (Supplementary Table S1). Among the extracted features, tbp_lv_eccentricity and tbp_lv_symm_2axis quantify lesion eccentricity and border asymmetry, respectively—metrics that serve as discriminative indicators in dermatologic assessment34,35. The 3D total-body photography (TBP) derived images preserve detailed morphological attributes that align closely with established clinical diagnostic criteria. For instance, lesion asymmetry—a fundamental component of the ABCD (Asymmetry, Border, Color, Diameter) rule—is captured through precise delineation of shape irregularities and border contours. Concurrently, rigorous color and contrast calibration within the imaging system resolves pigmentation variations, enabling the detection of multichromatic patterns often indicative of malignancy.
By preserving these clinically relevant morphological and chromatic features, the standardized TBP data enable the predictive model to leverage visual cues analogous to those utilized in direct clinical examination, thereby enhancing both interpretability and diagnostic consistency.
Overview of the proposed explainable multimodal AI framework. ROC, receiver operating characteristic; SHAP, Shapley additive explanations; CAM, class activation map; XGBoost, extreme gradient boosting; CatBoost, categorical boosting; SVM, support vector machine.
The dataset includes six major lesion categories: invasive melanoma (n = 157), basal cell carcinoma (n = 163), squamous cell carcinoma (n = 73), nevus (n = 443), benign not otherwise specified (Benign NOS, n = 200), and solar or actinic keratosis (n = 39). To address class imbalance, data augmentation techniques—including random rotation, horizontal flipping, and color variation—were applied to all lesion types except nevus. These methods were selected to increase data diversity without altering diagnostic labels, thereby improving the model’s ability to learn underrepresented lesion characteristics. Alternative strategies such as class weighting and oversampling were considered; however, data augmentation was chosen to minimize overfitting while maintaining the natural variability of lesion presentations. The overall study workflow is illustrated in Fig. 1.
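The augmentation strategy described above (random rotation, horizontal flipping, and color variation) can be sketched with Pillow, one of the libraries used in this study. The rotation bounds and color-jitter range below are illustrative assumptions, not the study's exact settings.

```python
import random
from PIL import Image, ImageEnhance

def augment(img, seed=None):
    """Apply the described augmentations: random rotation, horizontal
    flip, and a mild color (saturation) variation. Parameter ranges
    are illustrative."""
    rng = random.Random(seed)
    img = img.rotate(rng.uniform(-30, 30), expand=False)   # random rotation
    if rng.random() < 0.5:                                 # horizontal flip
        img = img.transpose(Image.Transpose.FLIP_LEFT_RIGHT)
    factor = rng.uniform(0.8, 1.2)                         # color variation
    return ImageEnhance.Color(img).enhance(factor)

# Example: augment a dummy 128 x 128 RGB lesion crop
dummy = Image.new("RGB", (128, 128), (180, 120, 100))
aug = augment(dummy, seed=0)
```

Because the transforms preserve image size and mode, augmented crops can be fed to the same model input pipeline as the originals.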
Study design
To evaluate clinical parameters, we trained various machine learning algorithms—including logistic regression, decision trees, random forests, gradient boosting, XGBoost, CatBoost, and support vector machines (SVM)—and selected the most accurate and robust model, XGBoost. For image data, CNNs were employed, with transfer learning based on the HAM10000 dataset. The CNN architecture comprised three convolutional blocks with convolutional layers, activation functions, and pooling operations, tailored for six-class skin lesion classification. To accommodate 3D TBP images, two additional Conv2D layers were appended during fine-tuning for enhanced feature extraction36, although these single-modality approaches remained constrained by limited diagnostic accuracy.
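A minimal PyTorch sketch of the described architecture follows: three convolutional blocks (convolution, activation, pooling) with two additional Conv2D layers appended for fine-tuning on 3D TBP crops. Channel widths and the pooling head are illustrative assumptions, not the study's exact configuration.

```python
import torch
import torch.nn as nn

class LesionCNN(nn.Module):
    """Sketch of a six-class lesion classifier: three conv blocks plus
    two fine-tuning Conv2D layers. Channel sizes are illustrative."""
    def __init__(self, n_classes=6):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # Two additional Conv2D layers appended during fine-tuning
        self.finetune = nn.Sequential(
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, n_classes)
        )

    def forward(self, x):
        return self.head(self.finetune(self.backbone(x)))

# Forward pass on a dummy batch of two 128 x 128 RGB crops
logits = LesionCNN()(torch.randn(2, 3, 128, 128))
```

The output is one logit per class; softmax over these logits yields the six-class probability vector used later in the fusion step.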
Multimodal fusion strategy
We developed a multimodal fusion framework designed to integrate image-derived and clinical information into a unified predictive model (Fig. 2). First, a deep learning network trained on 3D total-body photography (TBP) images generated six-class probability outputs, which served as structured visual feature vectors. Concurrently, clinical metadata—specifically approximate age and longitudinal lesion diameter—were compiled into clinical feature vectors. To ensure scale comparability and model stability, these distinct feature sets were concatenated and standardized before being processed by an XGBoost classifier.
This late-fusion strategy synthesizes the visual representations extracted by the convolutional neural network (CNN) with complementary clinical variables, allowing both phenotypic and demographic data to jointly inform decision-making. The prediction process is formally expressed as:
\(\:\widehat{y}=XGBoost\left(\left[{f}_{image};{f}_{clinical}\right]\right)\)
where \(\:{f}_{image}\in\:{R}^{6}\) represents the CNN-derived six-class probability vector, \(\:{f}_{clinical}\) denotes the clinical feature set, \(\:\left[\cdot\:;\cdot\:\right]\) denotes concatenation, and \(\:\widehat{y}\) is the final prediction. Validated through interpretability analyses—including SHAP, CAM, and nomogram assessment—this multimodal approach significantly outperformed unimodal models, demonstrating an effective strategy for integrating heterogeneous data sources in dermatologic analysis.
Multimodal fusion workflow.
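The fusion step described above can be sketched end to end on synthetic data: a six-class probability vector stands in for the CNN output, clinical features (age and lesion diameter) are concatenated with it, the joint vector is standardized, and a boosted-tree classifier is fitted. Scikit-learn's GradientBoostingClassifier is used here as a stand-in for XGBoost; all data are synthetic.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import GradientBoostingClassifier  # stand-in for XGBoost

rng = np.random.default_rng(0)
n = 200
f_image = rng.dirichlet(np.ones(6), size=n)       # CNN six-class probabilities
f_clinical = np.column_stack([
    rng.uniform(20, 90, n),                       # age_approx
    rng.uniform(1, 15, n),                        # clin_size_long_diam_mm
])
y = np.arange(n) % 6                              # synthetic six-class labels

X = np.hstack([f_image, f_clinical])              # concatenate modalities
X = StandardScaler().fit_transform(X)             # standardize for scale comparability
clf = GradientBoostingClassifier(n_estimators=20, random_state=0).fit(X, y)
probs = clf.predict_proba(X[:5])                  # fused six-class predictions
```

The standardization step matters because the probability features lie in [0, 1] while clinical variables span much wider numeric ranges.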
Scoring system
In this study, a multinomial logistic regression model was employed to estimate the probabilities of six distinct skin lesion categories, providing a transparent and interpretable framework for classification while retaining an intuitive means of evaluating performance. To ensure model stability and reduce redundancy among predictors, multicollinearity was evaluated using the Variance Inflation Factor (VIF). For each feature \(\:{X}_{i}\), a linear regression model was fitted with all other features as independent variables, and the coefficient of determination (\(\:{{R}_{i}}^{2}\)) was calculated to quantify the extent to which \(\:{X}_{i}\) could be explained by the other predictors. The VIF was then computed as \(\:VI{F}_{i}=\frac{1}{1-{{R}_{i}}^{2}}\). Features with VIF values exceeding 10 were considered highly collinear and were excluded to prevent instability in coefficient estimation. Under the assumption of minimal multicollinearity among the remaining features, the cumulative logistic function for the probability of class membership at threshold k was subsequently defined as:
\(\:logit\left(P\right(Y\le\:k\mid\:X\left)\right)={{\beta\:}_{0}}^{\left(k\right)}+\sum\:_{i}{{\beta\:}_{i}}^{\left(k\right)}{X}_{i}\)
where \(\:logit\left(P\right(Y\le\:k\mid\:X\left)\right)\:\) represents the cumulative logistic function for category k, \(\:{{\beta\:}_{0}}^{\left(k\right)}\) is the intercept term for category k, \(\:{{\beta\:}_{i}}^{\left(k\right)}\) is the regression coefficient for the i-th feature in category k, and \(\:{X}_{i}\) is the value of the i-th input feature. To derive the predicted probabilities for each of the six categories, the model uses the following formula:
\(\:P\left(Y=k\mid\:X\right)=P\left(Y\le\:k\mid\:X\right)-P\left(Y\le\:k-1\mid\:X\right)\), with \(\:P\left(Y\le\:0\mid\:X\right)=0\) and \(\:P\left(Y\le\:6\mid\:X\right)=1\).
This modeling approach provides a balance between predictive performance and interpretability, enabling detailed analyses of feature importance across multiple categories. The estimated coefficients and their class-specific effects underscore the model’s capacity to capture the influence of individual features on classification outcomes.
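The probability step of the scoring model can be sketched numerically: cumulative logits give P(Y ≤ k | X), and category probabilities are successive differences. For simplicity this sketch shares one coefficient vector across thresholds (the proportional-odds simplification, which guarantees monotone cumulative probabilities); all coefficients are synthetic, not the fitted model's.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 5))            # 4 lesions, 5 retained features
beta = rng.normal(size=5)              # shared feature coefficients (sketch)
beta0 = np.sort(rng.normal(size=5))    # ordered intercepts for thresholds k = 1..5

cum = sigmoid(beta0 + (X @ beta)[:, None])                  # P(Y <= k), k = 1..5
cum = np.hstack([np.zeros((4, 1)), cum, np.ones((4, 1))])   # P(Y<=0)=0, P(Y<=6)=1
probs = np.diff(cum, axis=1)           # P(Y = k) for the six categories
```

Each row of `probs` is a valid six-category distribution; in the study the analogous probabilities feed the nomogram-based scoring panel.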
Model training and implementation
The model was trained for 200 epochs using a learning rate of 1 × 10⁻⁴, a batch size of 128, and the Adam optimizer, with cross-entropy loss applied for multi-class classification. All experiments were performed on an NVIDIA RTX 4070 GPU (8 GB RAM) with CUDA acceleration to enhance computational efficiency. The dataset, stored in CSV format, was preprocessed and converted to tensor form using the torchvision.transforms.ToTensor() function, and each image was resized to 128 × 128 × 3 pixels to match the model’s input specifications. Training comprised standard forward and backward propagation followed by parameter updates, with loss and accuracy recorded at each epoch to monitor convergence. The model was implemented in the PyTorch framework, utilizing key libraries including torch, numpy, pandas, Pillow, matplotlib, seaborn, and scikit-learn for data handling, visualization, and performance evaluation.
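A minimal sketch of the described training setup follows: Adam at a learning rate of 1 × 10⁻⁴ with cross-entropy loss, forward and backward propagation, then a parameter update. The model, batch size, and epoch count here are placeholders (a single linear layer, batch of 8, 2 epochs) rather than the study's CNN, batch of 128, and 200 epochs.

```python
import torch
from torch import nn, optim

# Placeholder model standing in for the CNN; input matches 128 x 128 x 3 crops
model = nn.Sequential(nn.Flatten(), nn.Linear(128 * 128 * 3, 6))
opt = optim.Adam(model.parameters(), lr=1e-4)   # Adam, lr = 1e-4 as in the paper
loss_fn = nn.CrossEntropyLoss()                 # multi-class cross-entropy

x = torch.randn(8, 3, 128, 128)                 # dummy batch of resized TBP crops
y = torch.randint(0, 6, (8,))                   # six-class labels

for epoch in range(2):                          # 200 epochs in the actual study
    opt.zero_grad()
    loss = loss_fn(model(x), y)                 # forward pass
    loss.backward()                             # backward propagation
    opt.step()                                  # parameter update
final = float(loss)                             # logged each epoch to monitor convergence
```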
Statistical analysis
In this study, we analyzed clinical and demographic characteristics (age, sex) along with lesion-specific imaging and texture features across skin lesion subtypes. Statistical significance was evaluated via p-values. A multimodal fusion model, combining clinical parameters and 3D imaging features, was developed, and its performance was evaluated using receiver operating characteristic (ROC) curves and confusion matrices. To enhance interpretability, SHAP analysis identified influential features, and a nomogram was constructed for intuitive visualization of prediction outcomes and feature contributions37,38. This integrated approach demonstrates robust diagnostic accuracy and transparency, highlighting its potential for practical clinical decision support.
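The ROC evaluation used throughout the results can be illustrated with scikit-learn on synthetic scores standing in for model outputs; the score distribution below is an assumption for demonstration only.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 300)    # benign (0) vs. malignant (1) labels
# Synthetic scores: malignant cases shifted upward so the classes separate
scores = np.clip(y_true * 0.4 + rng.uniform(0, 0.6, 300), 0, 1)

fpr, tpr, thresholds = roc_curve(y_true, scores)   # points on the ROC curve
auc = roc_auc_score(y_true, scores)                # area under the curve
```

Plotting `fpr` against `tpr` yields the curves shown for each lesion class; `auc` summarizes discrimination, with values near 1 indicating strong separation.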
Results
Descriptive analysis of clinical covariates
We evaluated 41 clinical and lesion-specific features—encompassing demographics (age_approx, sex), lesion morphology (e.g., tbp_lv_area_perim_ratio), and color/texture metrics (e.g., tbp_lv_Bext, tbp_lv_C)—in 1,075 patients (median age 60–65 years) undergoing 3D TBP. Lesions were classified into six diagnostic categories (invasive melanoma, BCC, SCC, nevus, Benign NOS, solar/actinic keratosis). Baseline comparisons showed no significant between-group differences for age (P = 0.3907), sex (P = 0.4272), lesion size (clin_size_long_diam_mm; P = 0.3371), TBP intensity/color indices (A/B/C/H/L and “ext” variants), geometry (eccentricity, minor axis), symmetry (symm_2axis), spatial coordinates, area, or perimeter (all P ≥ 0.18). The only feature demonstrating significant separation was tbp_lv_nevi_confidence (overall 45.7 ± 43.5; highest in nevi 71.0 ± 36.9; lowest in SCC 2.7 ± 12.2; P = 0.0480). DNN lesion confidence was high overall but not discriminatory (P = 0.3738). To mitigate class imbalance, all non-nevus categories underwent targeted augmentation (random rotation, horizontal flip, color jitter), and features were standardized prior to modeling.
Diagnostic accuracy using clinical-only model
We evaluated the diagnostic performance of machine learning models trained solely on clinical covariates (Table 1). The XGBoost model achieved 78.6% accuracy for BCC, although 10.7% of BCC cases were misclassified as nevi, introducing a risk of delayed treatment. Nevus classification accuracy was 72.6%, with 22.1% of nevi erroneously labeled as BCC, potentially prompting unnecessary interventions. For benign NOS lesions, the classifier attained 60.0% accuracy, misassigning 25.0% to BCC and 12.5% to nevi, indicative of overlapping clinical features. Invasive melanoma detection was markedly lower at 43.8%, reflecting a substantial false-negative rate. Actinic keratosis and SCC were correctly identified in only 12.5% and 16.7% of cases, respectively, with 75.0% of SCC lesions misdiagnosed as BCC, underscoring the model’s limited sensitivity for high-risk tumor detection.
Acc, accuracy; Pre, precision; Rec, recall; PR, precision and recall. XGBoost, Extreme Gradient Boosting; CatBoost, Categorical Boosting; SVM, Support Vector Machine.
Evaluation of 3D TBP-only model performance
A six-class CNN was trained exclusively on 3D total body photography (TBP) images to evaluate lesion classification performance. The adapted model achieved 87.10% accuracy for nevi, 75.34% for Benign NOS, and 71.88% for invasive melanoma. Classification accuracies for basal cell carcinoma (54.05%) and squamous cell carcinoma (60.32%) were moderate, indicating elevated misclassification rates. Actinic/solar keratosis was correctly identified in 65.62% of cases, with 15.62% mislabeled as SCC. These findings demonstrate that domain adaptation of a pre-trained image model alone yields variable performance across lesion types, reflecting morphological and textural overlaps, and underscore the necessity of incorporating multimodal data fusion to restore and enhance diagnostic robustness.
Model performance and clinical interpretability
The multimodal feature set, integrating CNN-derived 3D TBP embeddings with clinical covariates, demonstrated excellent performance in classifying six skin-lesion categories under five-fold cross-validation (Table 2). ROC analysis showed high discriminative ability, with area under the curve (AUC) values exceeding 0.95 across all classes and reaching 0.98 for nevus and actinic keratosis (Fig. 3a). These findings confirm the robustness of the multimodal fusion approach and its ability to generalize across lesion types with diverse morphological characteristics.
Performance and interpretability of the multimodal fusion model. (a) Five-fold cross-validation ROC curves using the XGBoost classifier; (b) SHAP feature importance for multimodal data; (c) CAM visualizations of representative skin-lesion images. ROC, receiver operating characteristic.
To enhance interpretability, SHapley Additive Explanations (SHAP) and Class Activation Mapping (CAM) were used to identify key predictors and visualize their clinical relevance (Fig. 3b–c). The SHAP summary plot illustrates how total-body photography (TBP)–derived numerical features and clinical parameters contribute to the classification of six skin lesion types. The most influential predictors corresponded closely to established dermatological diagnostic criteria, including asymmetry, border irregularity, and pigment variation. For basal cell carcinoma (BCC), tbp_lv_symm_2axis and tbp_lv_H captured lesion asymmetry and color heterogeneity, features consistent with its irregular morphology. In benign lesions, tbp_lv_norm_color reflected uniform pigmentation and smooth texture typical of nonmalignant presentations. For nevi, tbp_lv_deltaA and tbp_lv_deltaB quantified subtle differences in color saturation and brightness between the lesion core and periphery, aligning with symmetric and evenly pigmented patterns.
In squamous cell carcinoma, tbp_lv_Lext and visual_classifier emphasized keratinized, irregular surfaces, while in invasive melanoma, tbp_lv_deltaB, tbp_lv_z, and mel_thick_mm represented lesion depth and pigment heterogeneity—hallmarks of malignancy. Actinic keratosis showed coarse texture and uneven coloration, consistent with its scaly surface morphology. These TBP-derived features, particularly tbp_lv_z, tbp_lv_norm_color, tbp_lv_deltaA, and tbp_lv_deltaB, effectively capture clinically meaningful visual attributes essential for dermatologic assessment. CAM visualizations further validated these findings, highlighting greater depth and pigment heterogeneity in malignant lesions compared with the uniformity of benign lesions. Detailed feature definitions are provided in Supplementary Table S1.
To benchmark our framework against the ISIC 2024 binary classification challenge, we collapsed the six lesion categories into benign and malignant groups; the fusion model maintained strong performance (Fig. 4; Table 3), achieving a partial false-positive rate (pFPR) of 0.17343 at a fixed sensitivity ≤ 0.20. This result ranks immediately behind the top five challenge teams (best pFPR = 0.17210–0.17264), underscoring the model’s competitive generalizability. Collectively, these outcomes validate the multimodal fusion approach as a versatile, transparent, and reliable clinical decision–support tool.
Performance evaluation of the fusion model. (a) ROC curve of the fused data; (b) Radar chart showing overall model performance.
Integrated scoring panel for clinical diagnostic decision support
A multinomial logistic regression model was developed to predict the probabilities of six skin lesion categories—basal cell carcinoma (BCC), benign NOS, nevus, squamous cell carcinoma (SCC), invasive melanoma, and solar/actinic keratosis. To ensure the independence of predictors, we systematically evaluated the feature set for multicollinearity. Initial inspection via a hierarchical clustering heatmap (Fig. 5a) revealed predominantly weak correlations, with distinct clusters limited to specific structural descriptors. Quantitative assessment using the Variance Inflation Factor (VIF) identified six features (tbp_lv_Lext, tbp_lv_L, tbp_lv_B, tbp_lv_A, tbp_lv_deltaL, tbp_lv_Aext) exceeding the threshold of 10. Pairwise scatterplot analysis confirmed significant linear dependencies among these variables (Fig. 5b). Following the exclusion of these highly collinear features, a re-evaluation yielded VIF values consistently below the conventional threshold (maximum = 1.7), confirming that the final feature set was free from significant redundancy and suitable for robust predictive modeling (Fig. 5c).
Multicollinearity analysis of model features. (a) Correlation matrix with hierarchical clustering; (b) Pairwise scatterplots of key correlated features; (c) Variance inflation factor (VIF) distribution after removing highly collinear variables.
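The VIF screening described above can be sketched directly from its definition: each feature is regressed on all the others, and VIF_i = 1 / (1 − R_i²). The synthetic data below deliberately include one nearly duplicated column to show the exclusion rule at the threshold of 10.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def vif(X):
    """VIF_i = 1 / (1 - R_i^2), regressing feature i on all other features."""
    X = np.asarray(X, dtype=float)
    out = []
    for i in range(X.shape[1]):
        others = np.delete(X, i, axis=1)
        r2 = LinearRegression().fit(others, X[:, i]).score(others, X[:, i])
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(1)
a = rng.normal(size=500)
b = rng.normal(size=500)
# Column 2 nearly duplicates column 0, creating strong collinearity
X = np.column_stack([a, b, a + 0.05 * rng.normal(size=500)])
v = vif(X)
keep = v <= 10          # features with VIF > 10 are excluded
```

Only the independent feature survives the screen, mirroring how the six highly collinear TBP variables were removed before refitting.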
The multinomial logistic regression model was developed to estimate the probabilities of six skin lesion categories—basal cell carcinoma (BCC), benign NOS, nevus, squamous cell carcinoma (SCC), invasive melanoma, and solar or actinic keratosis—based on selected image-derived and clinical features. As illustrated in Fig. 6, each variable contributes a corresponding score, which is summed to generate a total score and mapped to the probability of class membership. Among the predictors, the visual_classifier exhibited the strongest association with lesion classification (β = 3.503, OR = 24.68, 95% CI: 13.27–45.90, P = 0.0021), indicating that the CNN-derived output provided the most influential contribution. Other features such as tbp_lv_Aext (OR = 0.80, P = 0.0058) demonstrated moderate effects, while tbp_lv_symm_2axis (OR = 36.15) showed a wide confidence interval, reflecting uncertainty in its effect size estimation.
In the final multivariable model, the strongest positive predictors included visual_classifier, tbp_lv_symm_2axis, and tbp_lv_color_std_mean, each contributing substantially to the cumulative log-odds of classification. Variables with coefficients near zero (e.g., tbp_lv_L, tbp_lv_x) had minimal influence, whereas predictors with negative coefficients (e.g., tbp_lv_areaMM2, tbp_lv_eccentricity) decreased the probability of higher-risk lesion classification. Age and spatial coordinates (x, y) exerted negligible effects (OR ≈ 1), suggesting limited diagnostic relevance.
Nomogram-based scoring system for skin-lesion diagnosis.
The scoring system derived from this model provided an interpretable and transparent framework for evaluating lesion risk. Compared with deep learning models, the multinomial logistic regression approach offers clear coefficient-based interpretability while maintaining good predictive performance. After screening for multicollinearity using the variance inflation factor (VIF) and removing highly correlated features, the model coefficients remained stable, confirming its robustness and suitability for quantitative scoring across multiple lesion types.
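The coefficient-based interpretability noted above rests on a simple mapping: a fitted log-odds coefficient β converts to an odds ratio as OR = exp(β), with a 95% confidence interval from its standard error. The sketch below uses illustrative numbers, not the study's fitted values.

```python
import numpy as np

# Illustrative coefficient (beta) and standard error (se), not fitted values
beta, se = 1.2, 0.3
odds_ratio = np.exp(beta)                 # OR = exp(beta)
ci_low = np.exp(beta - 1.96 * se)         # lower bound of 95% CI
ci_high = np.exp(beta + 1.96 * se)        # upper bound of 95% CI
```

Large standard errors widen this interval multiplicatively, which is why predictors such as tbp_lv_symm_2axis can show a high odds ratio alongside a wide confidence interval.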
Discussion
This study demonstrates that integrating structured clinical data with 3D TBP images within an explainable artificial intelligence (XAI) framework substantially improves diagnostic performance for skin-lesion classification. The proposed multimodal model achieved recall and F1 scores above 95% and AUC values exceeding 0.95 across all lesion categories, with a peak AUC of 0.98 for nevus and actinic keratosis. In contrast, unimodal models—XGBoost based on clinical data and CNNs based on 3D TBP alone—showed lower accuracy (72.6–87.1%). Benchmarking against the ISIC 2024 Challenge (pFPR = 0.1734) confirmed strong generalizability. These results highlight the model’s capacity to leverage both morphological and contextual cues, addressing diagnostic challenges caused by lesion variability and data imbalance.
The use of SHAP and CAM provided transparent interpretation of model predictions. These techniques revealed that the network focused on clinically relevant features such as color variation and border irregularity, aligning its decision-making with established dermatological heuristics. The multinomial logistic regression model enhanced clinical interpretability by quantifying class-specific risk probabilities. Nomogram visualization identified the visual_classifier and tbp_lv_symm_2axis as the primary determinants of diagnosis, yielding odds ratios of 33.28 and 6.37, respectively. These findings highlight the framework’s potential as a transparent clinical decision-support tool. While wide confidence intervals were observed for certain variables—likely reflecting sample variability—the inclusion of these features aligns with established dermatologic markers validated in prior literature. Future studies utilizing expanded, multicenter datasets are necessary to refine these effect estimates and further improve the stability and clinical applicability of multimodal systems.
Clinically, this multimodal framework offers practical applications in early detection, triage, and large-scale screening of skin cancer. It can support non-specialist clinicians in identifying high-risk lesions for referral, particularly in resource-limited settings, and enhance diagnostic consistency across clinical environments. The system’s transparent decision pathway fosters clinician trust and facilitates its integration into routine dermatologic workflows, promoting precision medicine and improved patient outcomes.
This study has limitations inherent to the exclusive reliance on the ISIC 2024 dataset. While the dataset is extensive, it contains selection and acquisition biases typical of data not derived from controlled clinical trials. Variations in imaging hardware, lighting, and camera angles across participating institutions introduce inconsistencies in image quality and lesion presentation. Furthermore, the limited sample size for specific lesion types, such as squamous cell carcinoma (SCC), may affect model stability. Although data augmentation was employed to address class imbalance, this approach may result in optimistic performance estimates. Consequently, the high AUC values observed likely reflect internal consistency within the training data rather than fully generalizable clinical performance. To address these issues, future research must incorporate larger, standardized, and more balanced multicenter datasets.
Furthermore, the current model has not been validated using real-world total-body photography (TBP) or independent external cohorts. Translating this approach to clinical practice presents challenges related to patient positioning, background complexity, and lighting heterogeneity, all of which may influence predictive accuracy. Therefore, rigorous prospective validation across diverse medical centers, devices, and patient populations is essential. Establishing the model’s robustness under these varied demographic and environmental conditions is a critical prerequisite for clinical deployment.
Conclusion
In summary, the proposed multimodal, interpretable AI framework integrates 3D TBP imaging and clinical data to achieve high diagnostic accuracy (AUC > 0.95; F1 > 95%) across six skin-lesion types while maintaining transparency through SHAP and CAM analyses. By bridging deep learning and clinical reasoning, the system provides a reliable and interpretable decision-support tool for dermatology, with potential to improve early detection, triage, and individualized patient care in real-world settings.
Data availability
The datasets generated and/or analysed during the current study are available in the ISIC repository, https://challenge2024.isic-archive.com/.
References
Hay, R. J. et al. The global burden of skin disease in 2010: an analysis of the prevalence and impact of skin conditions. J. Invest. Dermatol. 134 (6), 1527–1534 (2014).
Dehkharghani, S. et al. The economic burden of skin disease in the United States. J. Am. Acad. Dermatol. 48 (4), 592–599 (2003).
Pampena, R. et al. A meta-analysis of nevus-associated melanoma: prevalence and practical implications. J. Am. Acad. Dermatol. 77 (5), 938–945 (2017).
Lee, E. H., Nehal, K. S. & Disa, J. J. Benign and premalignant skin lesions. Plast. Reconstr. Surg. 125 (5), 188e–198e (2010).
Callen, J. P., Bickers, D. R. & Moy, R. L. Actinic keratoses. J. Am. Acad. Dermatol. 36 (4), 650–653 (1997).
Leffell, D. J. The scientific basis of skin lesions. J. Am. Acad. Dermatol. 42 (1), S18–S22 (2000).
MacKie, R. M., Hauschild, A. & Eggermont, A. M. M. Epidemiology of invasive cutaneous melanoma. Ann. Oncol. 20 (Suppl. 6), vi1–vi7 (2009).
Hosny, K. M. et al. Explainable deep inherent learning for multi-classes skin lesion classification. Appl. Soft Comput. 159, 111624 (2024).
Rubin, A. I., Chen, E. H. & Ratner, D. Basal-cell carcinoma. N. Engl. J. Med. 353 (21), 2262–2269 (2005).
Alam, M. & Ratner, D. Cutaneous squamous-cell carcinoma. N. Engl. J. Med. 344 (13), 975–983 (2001).
Qian, S. et al. Skin lesion classification using CNNs with grouping of multi-scale attention and class-specific loss weighting. Comput. Methods Programs Biomed. 226, 107166 (2022).
Hammoud, S. et al. Evaluating the use of 3D skin models as simulation-based educational tools among nursing students: A quasi-experimental study. Nurse Educ. Today. 146, 106519 (2024).
Naveed, A. et al. PCA: progressive class-wise attention for skin lesions diagnosis. Eng. Appl. Artif. Intell. 127, 107417 (2024).
Cataldo, A. et al. Integrating microwave reflectometry and deep learning imaging for in-vivo skin lesions diagnostics. Measurement 235, 114911 (2024).
Lundberg, S. M. et al. Explainable machine-learning predictions for the prevention of hypoxaemia during surgery. Nat. Biomed. Eng. 2 (10), 749–760 (2018).
Kurtansky, N. R. et al. The SLICE-3D dataset: 400,000 skin lesion image crops extracted from 3D TBP for skin cancer detection. Sci. Data 11 (1), 884 (2024).
Chyad, M. et al. Exploring adversarial deep learning for fusion in multi-color channel skin detection applications. Inform. Fusion. 114, 102632 (2025).
Brancaccio, G. et al. Artificial intelligence in skin cancer diagnosis: a reality check. J. Invest. Dermatol. 144 (3), 492–499 (2024).
Groh, M. et al. Deep learning-aided decision support for diagnosis of skin disease across skin tones. Nat. Med. 30 (2), 573–583 (2024).
Fayyad, J., Alijani, S. & Homayoun, N. Empirical validation of conformal prediction for trustworthy skin lesions classification. Comput. Methods Programs Biomed. 253, 108231 (2024).
Mir, A. N. & Danish Raza Rizvi. Advancements in deep learning and explainable artificial intelligence for enhanced medical image analysis: A comprehensive survey and future directions. Eng. Appl. Artif. Intell. 158, 111413 (2025).
Mir, A., Nazir, D. R., Rizvi & Md Rizwan, A. Enhancing histopathological image analysis: an explainable vision transformer approach with comprehensive interpretation methods and evaluation of explanation quality. Eng. Appl. Artif. Intell. 149, 110519 (2025).
Mir, A. N., Rizvi, D. R. & Nissar, I. Enhancing medical image report generation using a self-boosting multimodal alignment framework. Health Inf. Sci. Syst. 13 (1), 58 (2025).
Marchetti, M. A. et al. 3D Whole-body skin imaging for automated melanoma detection. J. Eur. Acad. Dermatol. Venereol. 37 (5), 945–950 (2023).
Jordan, M. I. & Mitchell, T. M. Machine learning: Trends, perspectives, and prospects. Science 349, 255–260 (2015).
Zhang, L. & Li, J. Prospects for the application of artificial intelligence in geriatrics. J. Transl. Intern. Med. 12 (6), 531–533 (2025).
Chen, K., Li, J. & Li, L. Artificial intelligence for disease X: progress and challenges. J. Transl. Intern. Med. 12 (6), 534–536 (2025).
Prabha, A. et al. Design of intelligent diabetes mellitus detection system using hybrid feature selection based XGBoost classifier. Comput. Biol. Med. 136, 104664 (2021).
Volinsky-Fremond, S. et al. Prediction of recurrence risk in endometrial cancer with multimodal deep learning. Nat. Med. 30 (7), 1962–1973 (2024).
Heal, C. F. et al. Accuracy of clinical diagnosis of skin lesions. Br. J. Dermatol. 159 (3), 661–668 (2008).
Liu, Y. et al. A deep learning system for differential diagnosis of skin diseases. Nat. Med. 26 (6), 900–908 (2020).
Gandhi, A. et al. Multimodal sentiment analysis: A systematic review of history, datasets, multimodal fusion methods, applications, challenges and future directions. Inform. Fusion. 91, 424–444 (2023).
Gellrich, F. et al. Comparison of the efficacy of skin examination using 3D total body photography to clinical and dermoscopic examination. EJC Skin Cancer 2, 100264 (2024).
Damian, F. et al. Feature selection of non-dermoscopic skin lesion images for nevus and melanoma classification. Computation 8 (2), 41 (2020).
Almaraz-Damian, J. A. et al. Melanoma and nevus skin lesion classification using handcraft and deep learning feature fusion via mutual information measures. Entropy 22 (4), 484 (2020).
Gunning, D. et al. XAI—Explainable artificial intelligence. Sci. Robot. 4 (37), eaay7120 (2019).
Premaladha, J. & Ravichandran, K. S. Novel approaches for diagnosing melanoma skin lesions through supervised and deep learning algorithms. J. Med. Syst. 40, 1–12 (2016).
Dascalu, A. & David, E. O. Skin cancer detection by deep learning and sound analysis algorithms: A prospective clinical study of an elementary dermoscope. EBioMedicine 43, 107–113 (2019).
Acknowledgements
We gratefully acknowledge the support of the Laboratory of 3D Scene Visualization and Intelligent Education in Hunan Province (Grant No. 2023TP1038) and the Research Center for Innovative Development of Teacher Education in the New Era.
Funding
This work was supported by the Scientific Research Fund of the Hunan Provincial Education Department (grant no. 23A0643, 23C0430, and 24A0080).
Author information
Contributions
Zheng Wang, Mengnan Tai and Hongyang Fu: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Data curation, Writing – original draft. Zheng Wang, Mengnan Tai, Hui Hu, Hao Yuan, Chong Wang and Jianglin Zhang: Investigation, Resources, Data curation, Writing – review & editing, Revising the paper. Hongyang Fu and Jianglin Zhang: Software, Revising the paper.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Wang, Z., Tai, M., Hu, H. et al. Explainable multimodal AI for skin lesion risk prediction via 3D imaging and clinical data. Sci Rep 15, 45139 (2025). https://doi.org/10.1038/s41598-025-33536-z