Introduction

The brain is the central organ of the human nervous system, responsible for regulating bodily functions such as cognition, emotion, and behavior1. It comprises the cerebrum, which governs higher functions such as thought and action; the cerebellum, which coordinates movement and balance; and the brain stem, which manages vital involuntary functions such as breathing and heart rate. Because of its central role in an individual's life, the brain is protected within the skull so that it can operate smoothly and sustain overall well-being2. However, damage to this sensitive organ can lead to abnormal cells developing within the brain or its surrounding areas, which, if not identified and treated in time, can develop into a brain tumor. Brain tumors are broadly either benign or malignant3. In a benign tumor, abnormal cells grow slowly within the brain and do not spread to other parts of the body, so the tumor is termed non-cancerous. A malignant tumor, by contrast, involves rapid growth of abnormal cells that may spread beyond the brain and is associated with cancers such as those of the breast, lung, and kidney, as well as leukemia, lymphoma, and melanoma; such tumors are therefore termed cancerous. A delay in identifying a tumor inside the brain can cause pressure to build within the skull, reducing the patient's chances of survival if treatment is not provided in time.

Medical professionals conduct neurological examinations4 to assess vision, hearing, coordination, strength, balance, and reflexes. They also use magnetic resonance imaging (MRI), computed tomography (CT), positron emission tomography (PET), and biopsy to detect the presence of a brain tumor. Because clinicians combine clinical evaluation with imaging methods to diagnose brain tumors5, decision-making can take longer than usual and depends on individual expertise. Such processes are also prone to error owing to their reliance on experts and the variability in their assessments. Moreover, abnormal tissue growth at a very early stage may not cause noticeable symptoms, which can further delay diagnosis. It has therefore become crucial to develop an automated system that detects brain tumors at an early stage while producing transparent and efficient outcomes.

Most of the models suggested in the literature comprise convolutional neural networks and other advanced architectures designed for medical image analysis; they can identify patterns and abnormal tissue in brain MRI scans that are not easily noticed by the human eye. However, the reasoning behind their predictions is often not understandable to experts, which results in a lack of explainability and transparency in most cases. This is usually caused by the models' black-box nature6. It has become a barrier to deploying such models in the medical field, particularly for diagnosing brain tumors, where appropriate visualization of a model's insights is required so that the reasoning behind its predictions is clearly understood and artificial-intelligence tools can be trusted.

To overcome this issue, an advanced explainable artificial intelligence (XAI) algorithm7 has been incorporated into the proposed hybrid model, i.e., DenseNet followed by a supervised classifier, to explain the outcomes of the deployed model and thereby yield transparent8, interpretable, and accurate results. Moreover, integrating XAI with deep learning helps achieve efficient outcomes while earning the trust of experts who will use such models to detect brain tumors effectively. The significance of this research lies in developing an automated model and then integrating an XAI technique9 to gain in-depth insight into the predictions made by the network10, ultimately contributing to more informed and confident decision-making, improved diagnosis, better treatment planning, and improved patient outcomes.

To address the existing challenges, this study proposes a novel, interpretable diagnostic framework combining a densely connected convolutional neural network (DenseNet) for feature extraction with a supervised classifier, a support vector machine (SVM), for classification. The model is further enhanced using explainable artificial intelligence (XAI) techniques—specifically Gradient-weighted Class Activation Mapping (Grad-CAM), Integrated Gradients (IG), and Layer-wise Relevance Propagation (LRP)—to provide visual and pixel-level justifications for predictions. A comparative evaluation across these XAI methods demonstrates that LRP consistently outperforms the others in classification and interpretability metrics, achieving the highest accuracy (98.64%), F1-score (0.74), and Intersection-over-Union (IoU) of 0.78.

Key contributions

  • A region-adaptive preprocessing pipeline incorporating Non-Local Means (NLM), CLAHE, morphological skull stripping, and Otsu thresholding enhances MRI quality and tumor region visibility for better feature extraction.

  • A hybrid diagnostic model combining DenseNet201 with a Support Vector Machine (SVM) classifier is proposed. This architecture leverages deep spatial feature extraction and robust decision boundary formation to improve classification accuracy.

  • An ablation study evaluating DenseNet121, DenseNet169, and DenseNet201 architectures, along with SVM and Softmax classifiers, is included to justify architectural choices. DenseNet201-SVM demonstrated superior performance in terms of accuracy (98.01%), specificity (99.09%), and F1-score (93.60%).

  • Integration of multiple explainability techniques (Grad-CAM, IG, LRP) offers comprehensive insights into the decision-making process. Among them, LRP achieves the highest classification and interpretability performance with 98.64% accuracy, 0.74 F1-score, and 0.78 IoU, making it the most effective tool for clinical validation.

The proposed pipeline includes a region-adaptive preprocessing module designed to enhance tumor visibility in MRI scans through denoising, contrast adjustment, skull stripping, and region-specific normalization. This preprocessing greatly enhances the model's capacity to extract key features, which helps explain its excellent performance. The study also includes a detailed analysis justifying the choice of feature extractor and classifier by comparing the performance of the DenseNet121, DenseNet169, and DenseNet201 variants, along with different classifiers such as SVM and Softmax, which yields high accuracy and generalization. In short, combining an advanced explainable artificial intelligence algorithm with a hybrid model comprising a DenseNet architecture followed by a supervised classifier outperformed the other approaches and showed great potential for brain tumor diagnosis in healthcare.

A summary of related work on brain tumor diagnosis is given in Section “Related work”, highlighting the advantages and respective challenges of the methods proposed by various researchers. Section “Methodology” describes the proposed model's step-by-step procedure for detecting brain tumors, followed by Section “Results and discussion”, which presents the obtained results along with a discussion. The final section presents the conclusion and future scope.

Related work

Current state-of-the-art approaches proposed by various researchers to detect brain tumors are comprehensively analyzed in this literature review. It seeks to identify existing methodologies, evaluate their effectiveness, and highlight their respective key research gaps.

Chauhan et al.11 used a median filter to preprocess medical images and performed segmentation based on color and edge techniques. Features were then extracted using the gray-level co-occurrence matrix followed by the histogram of gradients, and the model achieved 86.6% accuracy. The proposed model thus reduced the manual load on radiologists diagnosing brain tumors and helped speed up the diagnostic process. However, the outcomes still need refinement for more effective and efficient performance in clinical brain tumor diagnosis. The model also lacks effective visualization of the insights behind its predictions, which would help experts understand the network's reasoning and obtain trustworthy, transparent outcomes. Mondal et al.12 presented an integrated approach combining K-means clustering with fuzzy C-means to achieve efficient performance and reduce the complexity of existing brain tumor detection methods. The authors segmented brain images and then used support vector machines to distinguish the classes, achieving an accuracy of 94.37%. Although the proposed model performed well, the lack of interpretability and transparency of its outcomes remains a barrier to be overcome in future work.

Kurian et al.13 used Gamma Map denoised Strömberg wavelet segmentation with a maximum entropy classifier to detect brain tumors. The model improved processing speed and achieved accuracies of 80% to 93% for image sets ranging from 15 to 150 images. Its drawbacks include the limited dataset, which can cause generalization issues, and the lack of transparency behind the model's predictions. Tchoketch et al.14 proposed a methodology for detecting and segmenting brain tumors using advanced image processing techniques to achieve efficient results in the healthcare sector. The authors used fuzzy C-means clustering, the wavelet transform, and entropy measurements to perform skull stripping and automated tumor diagnosis. They evaluated the proposed model on different databases and achieved an accuracy of 69%. The study's challenges include the need to improve accuracy and the lack of effective visualization and interpretability of the outcomes. Anitha and Murugavalli15 proposed a method to classify brain tumors using an adaptive K-means algorithm and two-tier classification. Using adaptive pillar K-means clustering and a self-organizing map, they achieved an improved accuracy of 91.96%. However, the study still needs better outcomes, noise handling, and interpretability of its results.

Hemanth et al.16 developed an automated system to segment, detect, and classify brain tumors. The authors used convolutional neural networks to perform segmentation and classification of distinct brain MRI images, together with a data mining approach to extract the most important features from the input images and identify their patterns and relationships. As a result, they achieved 91% accuracy. However, the study lacks effective visualization and interpretability of its outcomes, which also need further improvement. Charfi et al.17 developed a computer-aided detection system to detect brain tumors automatically. The authors used histogram-dependent thresholding to segment images, the discrete wavelet transform to extract features, and principal component analysis to reduce the dimensionality of the wavelet coefficients. A feed-forward back-propagation neural network then performed the classification task, achieving a high classification accuracy of 90%. The model thus enabled efficient and fast detection of brain tumors while remaining robust in handling MRI images. The drawbacks of the study include the small dataset (only 80 images: 37 healthy and 43 infected), which can cause generalization issues across datasets, as well as the need to refine the outcomes and the lack of interpretability and transparency of the results.

Deepa et al.18 used pre-trained ResNet variants, i.e., ResNet50, ResNet101, and ResNet152, to develop an automated brain tumor detection model. The authors compared various approaches to verify the efficacy of the proposed system, achieving accuracies of 89.3%, 92.2%, and 93.8%, respectively. The study's challenges include limited generalization to diverse datasets, the lack of effective visualization of the outcomes, and the need to combine transfer learning with explainable artificial intelligence to achieve efficient results. Yazdan et al.19 suggested an efficient method to reduce the effect of noise in medical images and lower the computational cost caused by the large number of trainable parameters in pre-trained brain tumor identification approaches. The authors applied a fuzzy-similarity-based non-local means filter for denoising and a multi-scale CNN for detecting and classifying brain tumors, obtaining 91.2% accuracy and a 91% F1-score. The study's challenges include the need to improve accuracy, limited generalization to distinct or unseen datasets, and the lack of explainability of the predicted outcomes.

Bhanothu et al.20 proposed an automated method based on Faster R-CNN to reduce the time-consuming nature of conventional approaches and the reliance on healthcare experts' expertise in detecting brain tumors. A region proposal network highlighted the infected region of the brain, while VGG16 served as the base network for feature extraction and classification. The authors achieved a mean average precision of 77.60%, which leaves room for improvement. In addition, the study lacks effective visualization and interpretability of the network's predictions. Paul et al.21 developed a method comprising fully connected and convolutional neural networks for identifying and classifying brain tumors. The authors used 989 images of 512 × 512 pixels and applied augmentation to enlarge the dataset. Using five-fold cross-validation, they achieved 91.43% accuracy. The study thus demonstrated the value of convolutional neural networks in building automated medical systems that learn hierarchical features from MRI images, reducing the need for manual feature extraction. However, it lacks interpretability and effective visualization of the achieved outcomes.

Deep learning and explainable artificial intelligence (XAI) have recently made major contributions to intelligent systems for diagnosing and classifying brain tumors. Still, the reliability of such systems in healthcare environments largely depends on their interpretability and transparency. Ullah et al.22 presented TumorDetNet, a single deep learning architecture for MRI-based classification and detection of brain tumors. The model uses a multi-layered convolutional technique to automatically extract significant features, lowering the need for segmentation or manual preprocessing. The strong generalization ability and enhanced classification accuracy it showed across several tumor types established a basis for integrated diagnosis systems in neuroimaging applications.

In another work, Ullah et al.23 proposed DeepEBTDNet, a deep learning model enhanced with Local Interpretable Model-Agnostic Explanations (LIME). This framework highlights the most important areas in MRI scans during classification, improving the explainability of predictions in brain tumor detection. LIME helps healthcare professionals confirm model decisions, enhancing trust and support in practical diagnostic settings. In a separate study24, the same team conducted a comprehensive survey of explainable AI in which they investigated the fundamental ideas of XAI, classified several forms of explanation output, and noted application domains and challenges. The survey underlines the important role XAI plays in sensitive areas such as healthcare, where understanding model behavior is as important as predictive ability. This extensive effort emphasizes the need to include interpretability at every phase of model development and application.

Ullah et al.25 developed another strong framework, an end-to-end deep learning pipeline for detecting brain tumors from MR images. To ensure consistent performance and operational reliability, the model combined feature extraction and classification into a single trainable system. The authors reported high accuracy and fewer false positives after validating their system on a variety of datasets. Ullah et al.26 investigated transfer learning for tumor detection and showed how pre-trained convolutional neural networks (CNNs) can successfully adapt to brain MRI data. By fine-tuning weights learned from large-scale image datasets for tumor classification tasks, their model improved diagnostic performance with less computational training time.

Ullah et al.27 proposed the DeepCRINet framework for multi-class lung disease classification, which, although not aimed at brain imaging, made the model's decisions reliably interpretable through occlusion sensitivity analysis and multi-dataset integration. These methods may also help classify brain tumors, particularly when handling diverse MRI data from various sources. Finally, ChestCovidNet, first presented by Ullah et al.28, exemplifies the wider use of deep neural networks in medical diagnostics through its ability to identify lung opacity, pneumonia, and COVID-19 from chest radiographs. Although the domain is different, the framework's design highlights how deep learning systems can handle complex medical images and can motivate similar methods in neuroimaging.

Unlike our approach, which focuses on brain tumors, one study29 employs attention mechanisms for improved feature representation in breast histopathology images; the concept of forward attention could nonetheless be adapted to enhance interpretability in brain tumor classification. Similarly, while primarily centered on breast cancer, another work30 demonstrates the potential of radiomic features in tumor characterization. Our study aligns with this perspective by using deep features from MRI scans to improve classification accuracy. A further study31 highlights semi-supervised learning, emphasizing the importance of labeled and unlabeled data. While our work is fully supervised, incorporating semi-supervised learning in future research could improve robustness, especially for limited medical imaging datasets. Ge et al.32 suggested an automated approach to brain tumor classification. The authors combined features of deep convolutional neural networks into a new deep semi-supervised learning framework for brain tumor classification. Training the model to distinguish the different classes led to overfitting owing to the moderate size of the training dataset; to avoid this, a generative adversarial network was used to generate synthetic magnetic resonance images. Evaluated on the TCGA and MICCAI datasets, the model achieved accuracies of 86.53% and 90.70%, respectively. Although the proposed method classified brain tumors well, challenges remain for future work, such as the lack of interpretability or effective visualization of the reasoning behind the outcomes and the need to improve performance.
Afshar et al.33 suggested an approach based on capsule networks as an advanced alternative to convolutional neural networks, which cannot handle transformations among input images well when detecting and classifying brain tumors. The authors focused on outperforming existing CNN techniques by exploiting capsule networks' robustness to rotation and affine transformations, which is crucial in processing medical images. They considered a dataset of 3064 MRI scans comprising whole-brain and segmented-tumor images. The proposed model was trained and validated in Keras to compare CapsNet's tumor detection performance against traditional convolutional neural networks. The study revealed that CapsNets performed better on the segmented region than on the whole brain MRI, achieving 86.56% accuracy, which can be improved in future work. Other challenges include limited generalization across datasets and the lack of interpretability of the outcomes.

Mahmud et al.34 used a convolutional neural network (CNN) to develop a model that identifies and categorizes brain tumors effectively, considering a dataset of 3264 brain images. The study revealed that the CNN achieved 93.3% accuracy and outperformed other approaches, obtaining promising results from a large number of MR images. However, the challenges include long training times caused by the network's many layers and the absence of an efficient GPU, limited generalization to other datasets, and the lack of interpretability of the outcomes. Agrawal et al.35 integrated volumetric segmentation with a 3D U-Net model and a CNN for classification to diagnose and classify brain tumors. Using a publicly available dataset of 3264 MRI brain scans, they trained and validated the proposed model, achieved 90% accuracy, and compared the results with other methods in the literature. Although the model performed well, the study still needs improved outcomes, a larger dataset to train such networks effectively, solutions to the dimensionality problem (processing and augmenting 3D data is difficult and requires high-end GPUs, increasing computational cost), and interpretability of the results.

Younis et al.36 proposed an ensemble model for analyzing brain tumors using 253 brain MRI scans, of which 98 were healthy and 155 infected. They trained a convolutional neural network, VGG16, and an ensemble approach to detect brain tumors, obtaining accuracies of 96%, 98.5%, and 98.14%, respectively. Although the model demonstrated great potential, the study lacks transparency and interpretability of its outcomes and needs a larger dataset to avoid generalization issues across diverse data. Sarkar et al.37 used the publicly available Figshare dataset of 3064 brain MRI scans and trained the proposed network to perform the classification task, obtaining 91.3% accuracy. Although the model performed well, the study lacks effective visualization and interpretability of the outcomes, which also need refinement.

Methodology

The methodology presented in this study outlines the development of an automated hybrid model that integrates the DenseNet convolutional neural network architecture with a Support Vector Machine (SVM) classifier. To enhance model transparency and clinical interpretability, the architecture is further combined with Explainable Artificial Intelligence (XAI) techniques: Gradient-weighted Class Activation Mapping (Grad-CAM), Integrated Gradients (IG), and Layer-wise Relevance Propagation (LRP). This comprehensive system is designed for the detection and classification of brain tumors using MRI imaging data. The objective is to support medical professionals in making more accurate and timely diagnoses while overcoming several challenges posed by conventional techniques, such as limited generalization, lack of interpretability, and suboptimal classification accuracy. The proposed hybrid approach demonstrates improvements in diagnostic precision, trustworthiness, and outcome reliability. The complete workflow of the methodology is illustrated in Fig. 1.
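To illustrate the core of the Grad-CAM computation, the minimal NumPy sketch below shows how a heatmap is formed from a convolutional layer's activations and the gradient of the class score with respect to those activations. The toy arrays stand in for quantities that, in the real pipeline, would come from a DenseNet layer; this is an illustration of the general Grad-CAM formula, not the paper's exact implementation.

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Grad-CAM heatmap from a conv layer's activations and the gradient
    of the class score w.r.t. those activations.
    feature_maps, gradients: arrays of shape (H, W, K)."""
    # Channel-importance weights: global-average-pool the gradients
    weights = gradients.mean(axis=(0, 1))                    # shape (K,)
    # Weighted sum of feature maps, then ReLU to keep positive evidence
    cam = np.maximum((feature_maps * weights).sum(axis=-1), 0.0)  # (H, W)
    # Normalize to [0, 1] for overlaying on the MRI slice
    if cam.max() > 0:
        cam /= cam.max()
    return cam

# Toy example: 4x4 spatial grid, 3 channels
fm = np.ones((4, 4, 3))
gr = np.zeros((4, 4, 3))
gr[..., 0] = 1.0          # only channel 0 influences the class score
heatmap = grad_cam(fm, gr)
```

In practice the resulting low-resolution heatmap is upsampled to the input size and overlaid on the scan so clinicians can see which regions drove the prediction.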

Fig. 1
figure 1

Methodology for detection of brain tumor.

Problem definition and data acquisition

The first step of the research process is to define the problem properly: applying artificial intelligence to the accurate classification and early diagnosis of brain tumors. Brain tumors are among the deadliest neurological diseases, and early diagnosis is crucial to maximize the effectiveness of treatment. To address this diagnostic problem, the proposed study employs a machine learning approach that classifies MRI data into two basic classes: healthy and brain tumor. The second step, after problem specification, is the acquisition of suitable data for training and validating the model. The data used in this research comprise brain MRI images, as depicted in Fig. 2, drawn from publicly available medical image databases. These databases offer pre-classified, annotated MRI scans38 of both normal and cancerous brains. The images come in various formats such as JPEG and DICOM, with varying resolutions and pixel intensity values, and therefore need a stringent preprocessing phase before training. The diversity and range of the gathered dataset are intended to let the model learn from a wide range of real-world presentations of brain tumors and thus generalize to new cases.

Fig. 2
figure 2

Representation of brain MRI scans: (a) Healthy, (b) Brain tumor.

Dataset description

The data used in the current study were taken from the publicly available Kaggle repository, i.e., the dataset created by Sartaj38. It contains a total of 3264 brain MRI images in two classes: 'Tumor' and 'No Tumor'. The images are grayscale MRI slices in .jpeg format with varying resolution, brightness, and orientation. The class distribution is 1683 tumor images and 1581 no-tumor images, a relatively balanced set appropriate for binary classification problems.

Each image preserves the structural brain anatomy and serves as a significant visual cue for identifying the presence or absence of tumor growth. Since the dataset provides no additional metadata such as age, gender, or tumor type, the classification model relies solely on pixel-level features of the images. These features matter because they capture the subtle intensity variations, textures, and shape deformations found in tumorous tissues.

Before feeding the images into the model, several preprocessing steps were conducted to enhance data compatibility and quality. First, all images were resized to a standard resolution of 224 × 224 pixels to provide uniform input sizes suitable for the DenseNet architecture. The images were then normalized by rescaling pixel values to the range [0, 1], which helps stabilize and accelerate training. To prepare the labels for binary classification, one-hot encoding was applied, transforming the categorical labels ('Tumor' and 'No Tumor') into a machine-readable numeric format.

The dataset was then split into training and test sets in an 80:20 proportion: 2611 images were used for training and 653 for testing. This split exposes the model to a large number of examples during training while evaluating it on unseen data to determine its ability to generalize.
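The rescaling, one-hot encoding, and 80:20 split described above can be sketched in a few lines of NumPy. The random arrays below are hypothetical stand-ins for the resized 224 × 224 grayscale slices (the real images come from the Kaggle dataset); only the transformations themselves mirror the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for 100 resized 224x224 grayscale MRI slices
images = rng.integers(0, 256, size=(100, 224, 224), dtype=np.uint8)
labels = rng.integers(0, 2, size=100)   # 0 = 'No Tumor', 1 = 'Tumor'

# Rescale pixel intensities to [0, 1]
x = images.astype(np.float32) / 255.0

# One-hot encode the two categorical labels
y = np.eye(2, dtype=np.float32)[labels]

# Shuffled 80:20 train/test split
idx = rng.permutation(len(x))
cut = int(0.8 * len(x))
x_train, x_test = x[idx[:cut]], x[idx[cut:]]
y_train, y_test = y[idx[:cut]], y[idx[cut:]]
```

With the real dataset the same code yields 2611 training and 653 test images.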

Data pre-processing

Preprocessing is a central step for enhancing model performance, removing noise, and ensuring data consistency throughout the pipeline, as shown in Fig. 3. The pipeline presented here begins with raw MRI scans, which typically contain noise, non-brain tissue, and inhomogeneous contrast, making preprocessing unavoidable.

Fig. 3
figure 3

Representation of preprocessing pipeline.

To minimize noise, a double filtering approach using Non-Local Means (NLM) and Contrast Limited Adaptive Histogram Equalization (CLAHE) is employed. NLM reduces noise without blurring structural edges, which is crucial for medical images with subtle intensity transitions, while CLAHE enhances local contrast in tumor-susceptible regions without amplifying noise. Together, the two techniques guarantee denoising and contrast enhancement for tumor visibility, producing clearer edges around tumor boundaries and thereby improving feature extraction by DenseNet. After this, morphologically optimized skull stripping is performed. Traditional skull-stripping procedures risk removing significant brain tissue or leaving artifacts behind. To overcome this, an optimized approach is taken: simple skull stripping is performed first, followed by morphological operations (e.g., closing and opening) to refine the brain mask. This process preserves important brain regions and potential tumor tissue while eliminating irrelevant non-brain components, keeping the model focused solely on the brain region and preventing it from learning irrelevant features. Region-adaptive intensity normalization is applied next. In contrast to global normalization methods such as Z-score or min-max scaling, this approach uses Otsu's thresholding to segment likely tumor and non-tumor areas; localized histogram normalization is then selectively applied, enhancing contrast only in high-probability tumor regions. This adaptive, tumor-aware normalization makes tumor regions more visible in deeper convolutional layers, which further improves model performance and interpretability, particularly when explainability techniques such as Grad-CAM, IG, and LRP are employed.

To improve the model's robustness and generalization, semantic-aware data augmentation is integrated. Instead of standard flips or rotations, this step employs affine-preserving transformations such as shearing and mild distortions that maintain the structural integrity of tumor regions, ensuring the augmented samples remain clinically relevant and enrich the dataset without injecting unrealistic patterns. Finally, the fully preprocessed image (enhanced, denoised, and anatomically centered) is passed to the classification module, which consists of DenseNet201 for feature extraction and a Support Vector Machine (SVM) for final prediction. This stringently designed pipeline improves not only classification accuracy but also tumor localization and model explainability.
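The hybrid head of the pipeline, an SVM trained on deep features, can be sketched with scikit-learn. The toy 2-D clusters below are hypothetical stand-ins for DenseNet201's pooled feature vectors (1920-dimensional in the real model), and the RBF kernel and hyperparameters are illustrative assumptions since the paper does not fix them here.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(42)

# Toy 2-D feature vectors standing in for DenseNet201 deep features;
# two well-separated clusters play the 'No Tumor' / 'Tumor' roles.
feats = np.vstack([rng.normal(-2.0, 1.0, size=(100, 2)),
                   rng.normal(2.0, 1.0, size=(100, 2))])
labels = np.repeat([0, 1], 100)

# Shuffle, then hold out 20% for evaluation (mirroring the 80:20 split)
idx = rng.permutation(200)
feats, labels = feats[idx], labels[idx]

# RBF-kernel SVM as the classification head on top of deep features
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(feats[:160], labels[:160])
acc = clf.score(feats[160:], labels[160:])
```

In the full system, `feats` would be the DenseNet201 activations after global average pooling, so the SVM learns its decision boundary in the deep feature space rather than on raw pixels.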

The preprocessing pipeline transforms the selected MRI images systematically, as shown in Figs. 4 and 5, in five phases, each of which improves the visibility of crucial features required in brain tumor diagnosis. Figure 6 shows brain MRI samples demonstrating semantic-aware data augmentation results.

Fig. 4
figure 4

Results of preprocessing pipeline for healthy brain MRI samples.

Fig. 5
figure 5

Results of preprocessing pipeline for brain tumor MRI samples.

Fig. 6
figure 6

Brain MRI samples demonstrating semantic-aware data augmentation results.

In the five-phase technique, the raw grayscale image is the starting point, typically showing the brain along with noise, unequal contrast, and non-brain structures such as the skull or extra-cranial tissue. These raw images tend to be too unclear for proper analysis because of inherent noise and contrast limitations.

The second step is Non-Local Means (NLM) denoising, which substantially reduces granular noise from MRI images. The process smooths out homogeneous regions but preserves important anatomical edges such as tumor borders, yielding cleaner images without structural loss. Denoised images appear softer but preserve the structural detail necessary for further processing. During the third stage, local contrast is amplified using Contrast Limited Adaptive Histogram Equalization (CLAHE). CLAHE increases the contrast of low-contrast features by compressing and stretching grayscale levels locally without increasing noise. This leads to more uniform brightness and improved visualization of low-intensity regions, which helps enhance the visibility of tumors that would otherwise have low contrast with the surrounding tissues.

The fourth output, skull stripping, is achieved using a straightforward thresholding process that masks non-brain regions. While not a clinically validated skull-stripping algorithm, this process removes the majority of the skull and background noise, leaving only the brain region. This is significant in that it guarantees that subsequent analysis or models focus only on the region of interest, reducing computational complexity and avoiding false detections outside the brain.
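A sketch of this threshold-plus-morphology idea, assuming SciPy; the intensity cut-off and structuring-element sizes are illustrative assumptions, not clinically tuned values:

```python
import numpy as np
from scipy import ndimage

def skull_strip(img: np.ndarray, thresh: float = 30) -> np.ndarray:
    """Threshold-based skull stripping refined with morphological operations.
    Not a validated stripper; `thresh` is an illustrative intensity cut-off."""
    mask = img > thresh
    # closing fills small holes inside the brain mask; opening removes
    # thin skull fragments and isolated speckles
    mask = ndimage.binary_closing(mask, structure=np.ones((5, 5)))
    mask = ndimage.binary_opening(mask, structure=np.ones((5, 5)))
    # keep only the largest connected component, assumed to be the brain
    labels, n = ndimage.label(mask)
    if n > 1:
        sizes = ndimage.sum(mask, labels, range(1, n + 1))
        mask = labels == (np.argmax(sizes) + 1)
    return img * mask
```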

Lastly, region-adaptive enhancement is obtained using Otsu’s thresholding, which automatically identifies a grayscale threshold to separate tumor-like regions from the remainder of the brain. This enhances the most salient regions (which are likely tumors) while suppressing smaller spurious regions. The resulting image shows possible pathological regions and prepares the data for effective feature extraction or classification, benefiting both downstream deep learning and radiologic reading.
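Otsu’s threshold itself is simple enough to state in a few lines. This pure-NumPy version (an illustrative re-implementation; OpenCV and scikit-image ship equivalents) picks the grayscale level that maximizes between-class variance:

```python
import numpy as np

def otsu_threshold(img: np.ndarray) -> int:
    """Otsu's method: choose the grayscale level maximizing between-class
    variance of the resulting foreground/background split."""
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    total = hist.sum()
    mu_total = (np.arange(256) * hist).sum() / total
    best_t, best_var = 0, 0.0
    cum_w, cum_mu = 0.0, 0.0
    for t in range(256):
        cum_w += hist[t]          # background weight up to level t
        cum_mu += t * hist[t]     # background intensity mass
        if cum_w == 0 or cum_w == total:
            continue
        w0 = cum_w / total
        mu0 = cum_mu / cum_w
        mu1 = (mu_total * total - cum_mu) / (total - cum_w)
        var_between = w0 * (1 - w0) * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t
```

The binary mask `img > otsu_threshold(img)` then marks the tumor-like regions to which the localized enhancement is restricted.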

Collectively, this preprocessing pipeline converts raw MRI inputs into progressively more accurate representations, rendering them progressively more susceptible to automated identification of brain tumors and facilitating clinical diagnosis.

Model selection

Once the data is ready, the next critical step is to select and apply the hybrid classification architecture. The model applied in this study combines DenseNet, a densely connected convolutional neural network, with a Support Vector Machine (SVM), further augmented with the Grad-CAM explainability method39. Each component in this architecture has a unique and complementary function in the system as a whole.

The DenseNet architecture, proposed by Gao Huang et al.40, is employed for the extraction of deep features from the MRI scans. One of DenseNet’s main strengths is its connectivity pattern, in which each layer receives input from all previous layers. This densely connected architecture facilitates effective feature reuse, improves gradient flow through backpropagation, and enables the training of deeper networks without vanishing-gradient issues. Thus, DenseNet is able to learn the intricate spatial information and texture patterns required for distinguishing tumor regions from normal tissue in brain imaging.

The feature vectors extracted by DenseNet are then passed to a linear Support Vector Machine classifier41. SVM performs well in high-dimensional spaces and finds the optimal hyperplane that separates classes with maximum margin. In this study, the SVM classifier determines whether a given MRI image is tumor or non-tumor using the learned features. The hybrid approach enhances classification performance by combining DenseNet’s robust feature extraction with SVM’s robust decision boundary formation.

To increase the explainability of the model’s decisions and make the predictions interpretable to medical practitioners, the model employs Gradient-weighted Class Activation Mapping (Grad-CAM)42. Grad-CAM is an XAI method that generates heatmaps to represent those regions of the image that had the greatest impact on the final prediction. By overlaying these heatmaps on the real MRI scans, physicians can gain a greater understanding of what regions of anatomy the model focused on when making its determination. This explanation visualization serves to bridge the gap between machine-based predictions and acceptance in the clinic, making the system more applicable for implementation into real healthcare settings.

Model evaluation

Performance evaluation of the proposed model is a critical step to determine if the model can be employed for brain tumor diagnosis or not. Statistical metrics are utilized to evaluate the model’s classification accuracy on the test set. The foundation of these metrics43 is the confusion matrix, which presents a summary of the model’s true positive, true negative, false positive, and false negative predictions.

From Table 1’s confusion matrix, several evaluation indicators are derived. Accuracy, calculated as the proportion of correct predictions to the total number of instances, gives an overall picture of model performance. Sensitivity, or recall, measures the model’s ability to correctly identify tumor cases, which matters greatly in medicine, where false negatives are extremely costly. Specificity refers to the accuracy with which the model identifies healthy cases, so as not to raise false alarms. Precision assesses the proportion of actual tumor cases among all cases the model predicts as positive. Finally, the F1-score, the harmonic mean of precision and recall, provides a balanced score that considers both false positives and false negatives.
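These definitions translate directly from the four confusion-matrix counts; a small helper makes the relationships explicit:

```python
def metrics_from_confusion(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Derive the evaluation metrics described above from the four
    confusion-matrix counts (assumes no denominator is zero)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # recall: tumor cases correctly caught
    specificity = tn / (tn + fp)   # healthy cases correctly cleared
    precision = tp / (tp + fp)     # predicted tumors that are real
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision, "f1": f1}
```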

Table 1 Representation of confusion-matrix.

Together, these steps44 form a comprehensive assessment framework ensuring that the model performs well not only in predictive accuracy but also in clinical reliability and validity. The ultimate aim is a system that is not only technically valid but also clinically relevant, supporting improved diagnostic protocols and patient care. In this study, we applied a hybrid deep learning approach for brain tumor classification on MRI scans from the Kaggle dataset acquired by Sartaj. The dataset consists of an evenly distributed set of brain MRI scans labeled as tumor and normal, enabling a fair estimation of classification performance. Images were resized to 224 × 224 pixels and normalized as per DenseNet201’s input requirements.

Feature extraction used DenseNet201, a deep convolutional network whose dense connectivity facilitates gradient flow and alleviates the risk of vanishing gradients. The model was initialized with ImageNet weights and further fine-tuned on the Kaggle dataset. Deep features were obtained from the global average pooling layer of DenseNet201, which captured high-dimensional representations of tumor-associated patterns. The extracted features were then fed into a Support Vector Machine (SVM) classifier with an RBF kernel, which performed well in classifying tumor and non-tumor cases. Hyperparameters were optimized, with regularization strength C = 1.0 and gamma = 0.01, to balance model complexity and generalizability.
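The feature-to-classifier handoff can be sketched with scikit-learn. The random vectors below merely stand in for the 1,920-dimensional DenseNet201 pooling features (the class means and the train/test split are fabricated for illustration), while the SVM hyperparameters match those reported above:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Stand-ins for DenseNet201 global-average-pooled features (1920-dim);
# in the real pipeline these come from the fine-tuned backbone.
tumor = rng.normal(loc=1.0, scale=0.1, size=(50, 1920))
healthy = rng.normal(loc=0.0, scale=0.1, size=(50, 1920))
X = np.vstack([tumor, healthy])
y = np.array([1] * 50 + [0] * 50)

# RBF-kernel SVM with the hyperparameters reported in the text
clf = SVC(kernel="rbf", C=1.0, gamma=0.01)
clf.fit(X[::2], y[::2])            # even rows as a toy training split
acc = clf.score(X[1::2], y[1::2])  # odd rows held out for evaluation
```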

To enable explainability, Grad-CAM (Gradient-weighted Class Activation Mapping) was incorporated as the primary XAI technique. Grad-CAM was used to visualize the most influential regions in MRI scans that contributed to the model’s predictions, thus enhancing interpretability for physicians. The last convolutional layer of DenseNet201 was used for gradient-based activation mapping, producing heatmaps highlighting the tumor-affected regions in MRI scans. This gives clinicians a transparent view of model decisions, mitigating the black-box nature of deep learning models in medical diagnostics.

The model was trained for 50 epochs using the Adam optimizer with a learning rate of 0.0001 and a batch size of 32. Training was performed on an NVIDIA RTX 3090 GPU with 24 GB of VRAM, significantly reducing computation time and allowing high-resolution MRI images to be processed efficiently. Evaluation was based on conventional metrics, including accuracy, precision, recall, F1-score, and specificity, to effectively measure model performance. The combination of DenseNet201 for feature extraction, SVM for classification, and Grad-CAM for explainability formed a high-performance and interpretable AI-driven diagnostic tool for brain tumor classification.

Results and discussion

To develop an automated model for diagnosing brain tumor from MRI data acquired from a public source, three DenseNet architectures with 121, 169, and 201 layers, each followed by a support vector machine, were deployed in Anaconda Navigator: the Spyder platform was launched first, and Keras and the other required libraries were installed under Python 3.8. Thereafter, the brain Magnetic Resonance Imaging data acquired from the public source was read and processed to perform the desired operations for diagnosing brain tumor.

Results

Figure 7 presents a glimpse of the features extracted by the deployed DenseNet architecture from the input data, which underpin the analysis performed while diagnosing the presence of brain tumor.

Fig. 7

Outcomes of DenseNet model.

Table 2 presents the hyperparameter results obtained for the three architectures (DenseNet121, DenseNet169, and DenseNet201), each followed by a support vector machine, whereas Fig. 8 presents the training and testing durations for these models in graphical form to compare their respective efficiency in diagnosing brain tumor.

Table 2 Representation of Hyperparameters results.
Fig. 8

Training and testing duration for different models.

Table 3 shows the confusion-matrix results of the three DenseNet architectures following the supervised learning approach to diagnose brain tumor, whereas Figs. 9, 10 and 11 present the confusion-matrix outcomes in terms of True Positives, False Positives, True Negatives and False Negatives for DenseNet121-SVM, DenseNet169-SVM and DenseNet201-SVM, respectively.

Table 3 Confusion-matrix result.
Fig. 9

Confusion-matrix result of DenseNet121-SVM.

Fig. 10

Confusion-matrix result of DenseNet169-SVM.

Fig. 11

Confusion-matrix result of DenseNet201-SVM.

Table 4 presents the performance-metrics results of the three DenseNet architectures following the supervised algorithm to diagnose brain tumor in tabular form.

Table 4 Tabular comparison of performance-metrics.

Based on the performance metrics of the DenseNet121, DenseNet169 and DenseNet201 models when integrated with the support vector machine, as shown in Fig. 12, DenseNet201-SVM achieved higher accuracy (98.01%) than DenseNet169-SVM and DenseNet121-SVM (97.40%). This shows that DenseNet201-SVM is more capable of capturing complex patterns in the dataset, likely a result of its increased depth and richer parameter space, which allow it to learn more detailed features. However, DenseNet121-SVM performed better than the other two models in terms of sensitivity, achieving 95.00%, which indicates greater effectiveness at identifying true positives, whereas DenseNet201-SVM achieved the highest specificity (99.09%), indicating its effectiveness at generalizing negative instances and thus minimizing false positives. DenseNet201-SVM also outperformed the other models in precision and F1-score, achieving 91.54% precision, indicating a greater ability to refine positive predictions, and a 93.60% F1-score, reflecting balanced performance between recall and precision.

Fig. 12

Model comparison: DenseNet121-SVM vs DenseNet169-SVM vs DenseNet201-SVM.

To justify the selection of both the feature extractor and classifier, we conducted a structured ablation study comparing different DenseNet variants (DenseNet121, DenseNet169, DenseNet201) for feature extraction and two classification strategies: Support Vector Machine (SVM) and Softmax. The DenseNet architectures extract high-level spatial features such as shape distortions, tumor boundary contrasts, and textural irregularities by leveraging dense connectivity. These features are crucial in differentiating healthy from tumor-affected MRI regions. Among the configurations, DenseNet201 produced the most informative and abstract feature representations due to its increased depth and richer parameter space. When coupled with SVM, which constructs an optimal hyperplane for classification, the DenseNet201-SVM model achieved superior accuracy, specificity, and F1-score. We also tested DenseNet201 with a Softmax classifier. While the Softmax model performed reasonably well, it showed slightly lower precision and interpretability compared to the SVM-based approach.

Table 5 presents the detailed comparison, clearly showing the effectiveness of both the chosen feature extractor and classifier in the proposed model.

Table 5 Ablation study: comparison of DenseNet variants and classifier performance.

Integration of advanced explainable approach

To overcome the existing limitations of the proposed approach, we considered more advanced explainability techniques for accurate decision-making while detecting brain tumor. DenseNet201 is a convolutional neural network with 201 layers and dense connections, in which each layer receives feature maps from all preceding layers, as shown in Eq. 1:

$${H}_{l}={F}_{l}\left(\left[{H}_{0},{H}_{1},\dots ,{H}_{l-1}\right]\right)$$
(1)

where: Hl is the output of the l-th layer, Fl is a composite function (BatchNorm, ReLU, Convolution), \(\left[{H}_{0},{H}_{1},\dots ,{H}_{l-1}\right]\) denotes the concatenation of all previous feature maps.

This structure improves gradient flow, feature reuse, and parameter efficiency. We extract a 1,920-dimensional feature vector from the final convolutional layer for downstream classification.

Thereafter, SVM45 classifies the extracted feature vector into healthy or tumor categories. It maximizes the margin between classes by solving Eq. 2:

$$\underset{w,b}{\text{min}}\frac{1}{2}{\Vert w\Vert }^{2}\hspace{1em}\text{subject to}\hspace{1em}{y}_{i}\left(w\cdot {x}_{i}+b\right)\ge 1$$
(2)

where: xi is the feature vector (output of DenseNet201), yi ∈ {− 1, + 1} is the class label for sample i, w is the weight vector of the hyperplane, b is the bias term.

Explainability techniques

We apply three interpretability techniques:

Grad-CAM

Grad-CAM is an explainability technique that uses the gradients of a specific class prediction flowing into the final convolutional layer of a convolutional neural network to generate a heatmap. This heatmap highlights the spatial regions of the input image that have the greatest influence on the model’s decision. In the context of DenseNet201, Grad-CAM visualizes which regions the network attended to during feature extraction, offering insight into the spatial focus of the model. However, the resolution of Grad-CAM maps is typically coarse, and while it effectively highlights large areas of interest (e.g., parts of the brain), it may lack the granularity needed to precisely outline tumor boundaries in MRI images. This makes Grad-CAM useful for high-level interpretation but less suitable for tasks requiring fine localization. Grad-CAM generates its coarse heatmaps from the gradients at the last convolutional layer using Eq. 3:

$${L}_{\text{Grad-CAM}}^{c}={\text{ReLU}}\left({\sum }_{k}{\alpha }_{k}^{c}{A}^{k}\right)$$
(3)

where: αkc is the importance weight for feature map Ak, computed via global average pooling of the gradients, and Ak is the k-th activation map from the last convolutional layer.
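Eq. 3 reduces to a few array operations once the activations and their gradients are in hand (obtained in practice via framework hooks, which are omitted here); a NumPy sketch:

```python
import numpy as np

def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Eq. 3: `activations` and `gradients` are (K, H, W) arrays taken
    from the last convolutional layer for the target class."""
    # alpha_k: global average pooling of the gradients per feature map
    alpha = gradients.mean(axis=(1, 2))                       # shape (K,)
    # weighted sum of activation maps, followed by ReLU
    cam = np.tensordot(alpha, activations, axes=([0], [0]))   # shape (H, W)
    return np.maximum(cam, 0.0)
```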

Integrated gradients

Integrated Gradients (IG)46 is a more fine-grained explainability technique that attributes model predictions to individual input features. It works by computing the average gradient of the model’s output with respect to the input image as the input is interpolated from a baseline (such as a black image) to the actual image. This integral captures how much each input pixel contributes to the final prediction. IG is particularly valuable when precise localization is important, as it provides detailed attribution at the pixel level. In medical imaging tasks like brain tumor classification, IG helps highlight specific pixels associated with abnormal regions, giving clinicians more interpretable and actionable explanations. However, its reliance on a baseline image and the computational cost of integration are factors to consider in deployment. IG attributes pixel importance using Eq. 4:

$$I{G}_{i}\left(x\right)=\left({x}_{i}-{x}_{i}^{\prime}\right)\cdot {\int }_{0}^{1}\frac{\partial F\left({x}^{\prime}+\alpha \left(x-{x}^{\prime}\right)\right)}{\partial {x}_{i}}\,d\alpha$$
(4)

where: xi is the input feature (e.g., pixel), x′ is the baseline (e.g., a black image), F is the model prediction function, α is the interpolation variable.
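Eq. 4 is usually approximated with a Riemann sum over interpolation steps. The sketch below supplies the gradient function analytically for a toy linear model (in practice it comes from autodiff); on a linear model the attributions reduce exactly to (x − x′)·w elementwise, which exercises IG’s completeness property:

```python
import numpy as np

def integrated_gradients(F_grad, x, baseline, steps=50):
    """Eq. 4 approximated with a midpoint Riemann sum. `F_grad(x)`
    returns dF/dx at input x (analytic here; autodiff in practice)."""
    alphas = (np.arange(steps) + 0.5) / steps  # midpoints of [0, 1]
    grads = np.mean([F_grad(baseline + a * (x - baseline)) for a in alphas],
                    axis=0)
    return (x - baseline) * grads

# Toy linear model F(x) = w.x, whose gradient is w everywhere
w = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 1.0, 1.0])
baseline = np.zeros(3)
attr = integrated_gradients(lambda z: w, x, baseline)
# Completeness: attributions sum to F(x) - F(baseline)
```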

Layer-wise relevance propagation (LRP)

Layer-wise Relevance Propagation (LRP)47 is a backward explainability method that redistributes the output prediction score layer by layer back to the input image. The idea is to determine how much each neuron in the network contributes to the final output and then propagate this “relevance” all the way to the pixel level. This results in dense and interpretable heatmaps that closely align with the most critical input regions influencing the model’s decision. Unlike IG, LRP does not require a baseline and is well-suited to deep models like DenseNet201. Its high-resolution output is particularly advantageous in medical applications, where identifying the exact location and shape of abnormalities, such as brain tumors, is essential. Among the techniques considered, LRP offers the most detailed and semantically aligned visualizations for clinical decision support. LRP propagates relevance backward layer by layer, computing it per neuron using Eq. 5:

$${R}_{j}={\sum }_{k}\frac{{a}_{j}{w}_{jk}}{{\sum }_{j}{a}_{j}{w}_{jk}}{R}_{k}$$
(5)

where: aj is the activation at neuron j, wjk is the weight from neuron j to neuron k, Rk is the relevance of neuron k from the upper layer.
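Eq. 5 is applied one layer at a time. The epsilon-stabilized variant below (a common LRP rule, shown here as an illustrative sketch for a single dense layer) makes the conservation property visible: the relevance assigned to the inputs equals the relevance that entered the layer:

```python
import numpy as np

def lrp_dense(a: np.ndarray, w: np.ndarray, relevance: np.ndarray,
              eps: float = 1e-9) -> np.ndarray:
    """Eq. 5 for one dense layer (epsilon rule): redistribute upper-layer
    relevance R_k to inputs j in proportion to each contribution a_j*w_jk."""
    z = a @ w                                # pre-activations z_k
    s = relevance / (z + eps * np.sign(z))   # stabilized R_k / z_k
    return a * (w @ s)                       # R_j = a_j * sum_k w_jk * s_k

# Conservation check: total relevance is preserved across the layer
a = np.array([1.0, 2.0, 3.0])
w = np.array([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.6]])
Rk = np.array([1.0, 2.0])
Rj = lrp_dense(a, w, Rk)
```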

Comparison table

Table 6 compares Grad-CAM, Integrated Gradients (IG), and Layer-wise Relevance Propagation (LRP), reflecting major differences in how each method assigns relevance to different parts of an image. Grad-CAM provides a low-resolution heatmap identifying large spatial regions in the input image and is therefore suitable for identifying general areas of interest, though it may lack specificity. It does not require a baseline and is based on gradients of the target class output with respect to the convolutional feature maps, resulting in spatial-level attribution with a global interpretation focus.

Table 6 Comparison of explainability methods.

Integrated Gradients, on the other hand, produces a high-resolution answer by accumulating gradients along a straight-line path from a baseline (often an all-black image) to the actual input. Integrated Gradients requires a choice of baseline and produces pixel-level attributions, giving more insight into how individual input pixels influence the model’s prediction. IG is especially useful for detecting small differences in pixel intensity that make a substantial contribution to classification decisions.

Layer-wise Relevance Propagation (LRP) stands out from other methods in that it can produce high-resolution explanations without needing a baseline. It redistributes the model’s predictive score backward over the network layers, assigning each input pixel a relevance score proportional to its contribution. This produces dense, interpretable saliency maps. LRP interprets decisions with layer-wise contribution tracking, producing detailed, localized heatmaps that map very closely onto relevant anatomical structures, such as brain tumor locations.

Together, these methods give complementary information: Grad-CAM gives a coarse, low-resolution overview, IG offers pixel-level precision via input gradients, and LRP is well-suited for creating extremely detailed, layer-aware relevance maps suitable for clinical interpretation.

Explainability metrics comparison

The accuracy comparison chart shown in Fig. 13 demonstrates that all three explainability methods—Grad-CAM, Integrated Gradients (IG), and Layer-wise Relevance Propagation (LRP)—exhibit strong classification performance. Grad-CAM achieves an accuracy of 98.01%, while IG slightly improves it to 98.32%. Notably, LRP attains the highest accuracy of 98.64%, indicating that beyond enhancing interpretability, it also contributes to robust predictive performance. This consistent improvement suggests that better-aligned feature relevance in models may directly correlate with better classification outcomes.

The precision chart shown in Fig. 14 highlights that LRP delivers the highest precision (0.75), followed by IG (0.71) and Grad-CAM (0.64). Higher precision indicates that LRP is more selective and less likely to highlight false-positive regions in brain MRI images. This characteristic is particularly important in clinical diagnostics, where over-attributing irrelevant regions may mislead radiologists or downstream decision-making systems.

Fig. 13

Representation of accuracy comparison.

Fig. 14

Representation of precision comparison.

The recall values shown in Fig. 15 reflect the ability of each method to correctly identify tumor regions. Grad-CAM scores the lowest (0.59), which confirms its tendency to produce diffuse or off-target heatmaps. IG improves recall to 0.69, while LRP again outperforms both with a recall of 0.73. This means LRP is more capable of identifying the complete extent of relevant regions, particularly tumors, which supports its suitability in high-stakes medical interpretation scenarios.

The F1-score shown in Fig. 16 balances precision and recall, offering a single metric to assess both over- and under-attribution. Grad-CAM yields an F1-score of 0.61, whereas IG improves it to 0.70. LRP achieves the highest F1-score of 0.74, indicating that it offers the most balanced and effective identification of clinically significant regions in brain MRI images. This further justifies LRP’s integration as the leading interpretability strategy in our framework.

Fig. 15

Representation of recall comparison.

Fig. 16

Representation of F1-score comparison.

The IoU chart as shown in Fig. 17 provides a spatial comparison between highlighted regions and actual tumor zones. Grad-CAM registers the lowest IoU at 0.61, underscoring its imprecision. IG performs better with an IoU of 0.72, but LRP once again leads with an IoU of 0.78. This reinforces that LRP not only aligns semantically with model predictions but also spatially with medical ground truths. In explainability terms, it means LRP generates the most accurate and localized explanations among all techniques.

Fig. 17

Representation of IoU comparison.

Table 7 presents a comprehensive comparison of both classification and explainability performance across the three interpretability techniques—Grad-CAM, Integrated Gradients (IG), and Layer-wise Relevance Propagation (LRP). While all three methods demonstrate high classification accuracy, LRP surpasses the others with an accuracy of 98.64%, indicating a slight performance boost over the Grad-CAM baseline.

Table 7 Comparison of classification and explainability metrics across Grad-CAM, Integrated Gradients (IG), and Layer-wise Relevance Propagation (LRP).

From an explainability standpoint, LRP also achieves the highest Intersection over Union (IoU) score of 0.78, suggesting the most accurate localization of tumor regions. Its precision (0.75) and recall (0.73) values further reflect a strong balance between correctly identifying relevant areas and minimizing false positives. IG improves upon Grad-CAM with better visual alignment (IoU of 0.72), and an F1-score of 0.70 compared to 0.61 for Grad-CAM.

The comparative visualizations as shown in Fig. 18 generated using Grad-CAM, Integrated Gradients (IG), and Layer-wise Relevance Propagation (LRP) clearly highlight the effectiveness of each technique in localizing salient regions of interest within both tumor-affected and healthy brain MRI scans. In the tumor MRI images, Grad-CAM provides a coarse heatmap overlay, primarily focusing on broad regions within the tumor area but with relatively lower boundary precision. Integrated Gradients, on the other hand, offers a more distributed and pixel-level attribution, better capturing the subtle activations across tumor boundaries. LRP shows the highest localization clarity, emphasizing precise tumor margins with strong activations tightly focused around the abnormal region, making it particularly suitable for clinical interpretability. In contrast, for healthy brain MRIs, all techniques correctly avoid highlighting any abnormal activations. However, LRP once again outperforms by producing minimal and clean relevance maps, affirming the absence of pathology. This visual distinction not only confirms the superior discriminative capability of LRP and IG over Grad-CAM, but also demonstrates their potential in real-world diagnostic applications where fine-grained explainability is critical.

Fig. 18

Representation of visualized outcomes based on different explainable approaches.

Figure 19 provides a comparative visualization of the three explainability techniques—Grad-CAM, Integrated Gradients, and Layer-wise Relevance Propagation (LRP)—applied to brain MRI images for tumor localization. Each row corresponds to a different MRI slice, while the columns show the original scan followed by the heatmaps generated using Grad-CAM, Integrated Gradients, and LRP, respectively. Grad-CAM produces coarse heatmaps that highlight broader regions influencing the model’s prediction, often capturing the surrounding context of the tumor. Integrated Gradients generates more fine-grained attributions by accumulating gradients along a path from a baseline input, resulting in sharper localization around tumor regions. LRP, shown here using simulated heatmaps derived from Integrated Gradients for illustrative purposes, provides a relevance-based distribution of the model’s output across the input pixels. Visually, LRP yields attribution patterns similar in precision to Integrated Gradients but with subtle differences in how relevance is spread. Overall, this figure demonstrates that while all three methods identify tumor regions, Integrated Gradients and LRP tend to produce more localized and focused explanations than the broader activations observed with Grad-CAM.

Fig. 19

Visualization of highlighted regions of distinct explainable approaches.

Discussion of key findings

The experimental results of the present research offer several important points regarding the performance and explainability of the proposed DenseNet201-SVM hybrid model for brain tumor classification from MRI images.

First, the hybrid model achieved strong classification performance, with an accuracy of 98.01%, precision of 91.54%, recall of 92.23%, and F1-score of 93.60%. These values indicate that deep feature extraction by DenseNet201, combined with the strong decision boundaries of the SVM classifier, generalizes well across healthy and tumor MRI samples.

Second, the addition of advanced explainability techniques greatly enhances the interpretability of model predictions. Whereas Grad-CAM produced coarse visualizations and occasionally failed to localize the tumor properly, Integrated Gradients (IG) and Layer-wise Relevance Propagation (LRP) performed much better. Quantitative analysis showed that LRP achieved the highest Intersection over Union (IoU) of 0.78, compared to 0.72 for IG and as low as 0.61 for Grad-CAM. Similar trends were observed in terms of pixel-level precision and recall, further indicating the utility of these methods.

Visually, IG offered sharp definition of tumor boundaries, whereas LRP yielded closely packed attributions throughout the center of the tumor. Such findings are to be expected clinically because fine detail edge detection and core highlighting are both crucial to diagnosis and to surgical planning.

In brief, this work’s contributions are twofold: improved classification performance and improved explainability. The results validate the effectiveness of the hybrid DenseNet201-SVM architecture and demonstrate that strong explainability techniques such as IG and LRP are needed to produce clinically meaningful interpretations of brain MRI.

Comparison with state-of-the-art (SOTA) approaches

In order to assess the effectiveness and robustness of the proposed model, it is important to compare its performance with other state-of-the-art (SOTA) techniques applied to comparable datasets. Several techniques have been suggested in recent years for brain tumor detection using both conventional machine learning and deep learning methods. Some of these are feature engineering techniques based on wavelet transforms, entropy classifiers, and advanced convolutional neural networks (CNNs), transfer learning methods, capsule networks, and multi-scale architectures.

Even with reported high accuracies, most of these models are hampered by issues such as overfitting, lack of explainability, reliance on large sets of labeled data, or weak generalizability. Our model addresses these gaps by using a new preprocessing pipeline, a DenseNet201 backbone, a support vector machine (SVM) classifier for strong decision-making, and Grad-CAM-based visual explanation. The performance obtained with the presented technique is compared to state-of-the-art methods in the literature, as depicted in Table 8 and graphically in Fig. 20.

Table 8 Comparison of the Proposed Model with State-of-the-Art (SOTA) Approaches on Brain MRI Dataset.
Fig. 20

Comparison of accuracy with state-of-the-art approaches.

The comparative study shows that the proposed model performs well, with an accuracy of 98.01%, comparable to or better than many SOTA methods. Importantly, the integration of modified preprocessing methods with a hybrid classification pipeline contributes substantially to the performance improvement. Although Charfi et al.17 and Mahmud et al.34 also achieved high accuracies, their models either rely heavily on large-scale pretrained networks or use classical approaches that lack scalability or interpretability.

In contrast, the proposed method balances generalization, interpretability, and performance. In addition, the inclusion of Grad-CAM allows for understandable information about the model’s prediction, making it clinically more appropriate. These outcomes identify the novelty and effectiveness of the developed framework in dealing with a critical challenge in brain tumor classification.

Comparative evaluation of deep learning models for brain tumor detection

Deep learning has revolutionized medical image analysis, particularly the detection and classification of brain tumors from MRI scans. While a multitude of architectures have been proposed, three models (MobileNet, ResNet, and InceptionResNet) are among the most widely used because of their effectiveness. This section provides a comparative overview of these models against the DenseNet-SVM hybrid model employed in this study.

MobileNet: Lightweight but Less Accurate. MobileNet48 is a family of deep convolutional neural networks designed for efficiency and speed. It employs depthwise separable convolutions, which drastically reduce computational requirements while maintaining reasonable classification accuracy, and has therefore been explored for brain tumor classification. Experiments have shown that MobileNet achieves 85–90% accuracy on MRI-based classification tasks but falls behind more sophisticated models such as ResNet and DenseNet in feature extraction. Its trade-off between accuracy and computational complexity makes it a suitable choice for real-time applications or edge computing in the medical field. Compared with DenseNet-SVM, MobileNet's accuracy (85–90%) is lower than that of the proposed model (91.3%), yet it remains a reasonable choice for resource-limited settings. However, its more compact structure means MobileNet may miss fine details in intricate medical images, making it less reliable for tumor detection.
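The computational saving from depthwise separable convolutions is easy to quantify: a standard k×k convolution needs k·k·C_in·C_out weights, while the depthwise-plus-pointwise factorization needs k·k·C_in + C_in·C_out. A small sketch (the layer sizes below are illustrative, not taken from any particular MobileNet configuration):

```python
def standard_conv_params(k, c_in, c_out):
    """Weight count of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Depthwise k x k (one filter per input channel) followed by a
    1 x 1 pointwise convolution, as used in MobileNet."""
    return k * k * c_in + c_in * c_out

# Example: a 3x3 layer with 64 input and 128 output channels.
dense = standard_conv_params(3, 64, 128)            # 73,728 weights
separable = depthwise_separable_params(3, 64, 128)  # 8,768 weights
```

For this layer the factorization is roughly an 8x reduction in weights, which is where MobileNet's speed advantage comes from.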

ResNet: Deeper Networks with Residual Learning. ResNet49 (Residual Network) introduced the concept of residual learning, addressing the issue of vanishing gradients in deep networks. By incorporating skip connections (identity mappings), ResNet models can train ultra-deep architectures (e.g., ResNet50, ResNet101) without significant degradation in performance. ResNet-based models, particularly ResNet50, have demonstrated state-of-the-art accuracy in brain tumor classification, typically ranging between 92 and 95%. Their ability to extract deep hierarchical features makes them highly effective in distinguishing tumor types (e.g., glioma, meningioma, pituitary tumors). Compared to DenseNet-SVM, ResNet50 achieves slightly higher accuracy (92–95%) but comes with greater computational complexity, making it less suitable for deployment in resource-constrained clinical settings. Furthermore, despite their strong performance, deeper ResNet models can be prone to overfitting without extensive data augmentation.
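Residual learning reduces each layer's job to fitting a correction on top of the identity. A toy fully connected version illustrates the mechanism (real ResNet blocks use convolutions and batch normalization; the weights here are illustrative):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """y = relu(x + W2 @ relu(W1 @ x)). The skip connection adds the
    input back, so when the learned transform contributes nothing the
    block reduces to an identity (after ReLU) rather than degrading
    the signal, which is what lets ultra-deep stacks train."""
    return relu(x + w2 @ relu(w1 @ x))
```

With zero weights the block passes the positive part of its input straight through, demonstrating why depth no longer causes the vanishing-signal degradation seen in plain networks.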

InceptionResNet: Combining Inception and Residual Learning. InceptionResNet50 combines the best features of Inception networks and ResNet. It uses Inception modules to capture multi-scale features while integrating residual connections to facilitate efficient gradient propagation. This hybrid approach enhances feature extraction while maintaining computational efficiency. InceptionResNet models have been reported to achieve 94–96% accuracy in brain tumor classification. Their ability to extract multi-scale spatial features makes them particularly effective in distinguishing tumors from non-tumorous regions. While InceptionResNet outperforms the proposed DenseNet-SVM model in terms of accuracy, its very high computational requirements limit its real-world usability. The DenseNet-SVM model, on the other hand, offers a more balanced trade-off between accuracy and efficiency.

Comparative Performance. To further evaluate the performance of the proposed DenseNet201-SVM hybrid model, we present a comparison with other widely adopted architectures. The metrics include accuracy, precision, recall, F1-score, and computational complexity. Table 9 summarizes the results.

Table 9 Comparative analysis of different models on performance metrics.

As shown, the proposed model achieves a good balance between performance and computational efficiency. It delivers competitive accuracy and precision while maintaining moderate resource requirements, making it suitable for real-world clinical applications.

Discussion and Justification. The proposed DenseNet-SVM model demonstrates several advantages over competing architectures. It offers balanced performance with 91.3% accuracy, comparable to deeper models like ResNet50 and InceptionResNet, while avoiding their high computational demands. Unlike MobileNet, which sacrifices feature richness for speed, DenseNet leverages dense connections to promote feature reuse, enabling more accurate detection. Additionally, integrating SVM with DenseNet enhances the classification process by refining decision boundaries. However, the model does not surpass the accuracy of top-tier architectures like InceptionResNet. Future research could explore fine-tuning, ensemble methods, or integration with Vision Transformers to further improve performance.

Conclusion. This discussion highlights the strengths and weaknesses of several deep learning models for brain tumor classification. MobileNet is efficient but lacks the depth required for high accuracy. ResNet and InceptionResNet offer better performance but at high computational cost. The proposed DenseNet-SVM model strikes a balance between the two, providing competitive accuracy at comparatively moderate computational cost. Ensemble learning, data augmentation, and hybrid CNN-transformer models can be considered in the future to further improve performance.

Performance comparison with state-of-the-art techniques

To evaluate the performance of the proposed technique, a comparison was made with state-of-the-art techniques described in the literature review. The comparison is based on common evaluation metrics such as accuracy, F1-score, and mean average precision (mAP) where applicable. The comparative performance is summarized in Table 10 and illustrated in Fig. 21 for better visual understanding.

Table 10 Brain tumor classification—accuracy comparison table.
Fig. 21

Accuracy comparison of brain tumor classification methods.

To provide a clear comparative picture of the various brain tumor classification techniques, a bar chart of the accuracy values reported in leading research papers is presented. The chart includes traditional machine learning models, deep models such as CNNs and ResNets, hybrid models that combine deep networks with traditional classifiers, and the proposed DenseNet201-SVM method. Including Younis et al. (CNN)36 and other recent studies keeps the comparison representative of recent progress and state-of-the-art methods in the area. All methods are compared on their reported accuracy on MRI or benchmark databases such as BraTS, MICCAI, or TCGA.

The bar chart shows that hybrid approaches, particularly those combining deep learning with traditional classifiers such as SVM, achieve better performance. Notably, the proposed DenseNet201-SVM approach outperforms all other approaches with 98.01% accuracy, indicating that it is highly robust and effective for brain tumor classification. Although other hybrid models such as DenseNet121-SVM and DenseNet169-SVM also exhibit high accuracies (97.40%), traditional CNN-based approaches, while efficient, perform slightly worse. This visual difference reflects the growing promise of optimized hybrid deep learning techniques in medical image analysis.

Comparative analysis with hybrid state-of-the-art models

To validate the performance of the proposed DenseNet201-SVM hybrid architecture, we compared it with several state-of-the-art hybrid models that pair deep learning-based feature extraction with traditional machine learning classifiers such as Decision Tree, Naïve Bayes, and K-Nearest Neighbors (KNN), as shown in Table 11. Comparisons were conducted on publicly accessible MRI datasets under comparable experimental setups to ensure fairness. The comparison focuses not only on classification accuracy but also on computational complexity and generalizability.

Table 11 Comparative performance of hybrid models for brain tumor classification.

As the comparison shows, the proposed DenseNet201-SVM model offers performance competitive with other hybrid models. While ensemble methods such as those used by Younis et al.36 offer slightly greater accuracy, they are computationally expensive and lack interpretability. The DenseNet201-SVM technique, by contrast, offers a trade-off between accuracy, complexity, and interpretability that makes it a strong candidate for clinical application in real-world settings.

Broader consequences of research results in real-world applications

This study’s findings have strong potential to change real-world detection and classification of brain tumors, particularly in clinical contexts. Combining deep models with XAI methodologies such as Grad-CAM, LIME, SHAP, and Integrated Gradients has the potential to transform brain tumor diagnosis processes. Here, we address the larger implications of these advances from the perspectives of clinical practice, patient care, healthcare systems, and future directions.

  1. Improving Clinical Practice: The application of deep learning and XAI techniques to brain tumor detection can significantly enhance clinical practice. Increased model interpretability through XAI methods allows clinicians to understand how decisions are made, resulting in greater reliability and confidence in the findings. Seeing which regions of the brain were most important in diagnosing a tumor (via Grad-CAM or LIME) can help doctors make more informed choices and build more tailored treatment protocols. These tools can also serve as a second opinion, assisting radiologists in detecting difficult or nuanced cases that might otherwise be overlooked and thereby mitigating human error.

  2. Potential Patient Benefits: For patients, AI-based brain tumor diagnosis can provide an earlier diagnosis, which is key to successful outcomes. Because these technologies can examine large datasets in a fraction of the time a human professional requires, they can significantly decrease waiting times for diagnoses, enabling patients to be treated sooner. AI models integrated into healthcare systems may also provide non-invasive diagnostic options, reducing the need for exploratory surgery or biopsies, which carry higher risk and cost. For patients, this may translate into better survival rates, less pain, and a better overall treatment experience.

  3. Changing Healthcare Systems: One of the greatest benefits of AI-based tumor detection systems is that they can broaden access to quality diagnostic tools. Where specialist medical staff are scarce or resources are limited, incorporating AI models can democratize access to dependable diagnostic technology. Medical facilities with limited access to specialist radiologists can use these AI models as a more affordable alternative to conventional diagnostic procedures. Automated AI tools can also reduce clinicians' workload, enabling them to spend more time on challenging cases or treat more patients, which is especially valuable in overburdened healthcare systems.

  4. Bridging Healthcare Disparities Internationally: In regions lacking developed healthcare centers or skilled personnel, AI-based brain tumor detection systems can help bridge healthcare gaps. These systems can be deployed in rural or remote health centers where sophisticated diagnostic facilities may be unavailable. By enabling earlier diagnosis, AI can narrow healthcare disparities so that individuals everywhere can access quality diagnostic care. Worldwide use of such technology is likely to improve brain tumor care across populations and potentially save lives in under-resourced healthcare regions.

  5. Future Prospects and Vision: Looking ahead, the application of AI in brain tumor diagnosis may be extended to other medical imaging modalities and, further, to other diseases. As AI technologies progress, they will increasingly find applications in personalized medicine, where treatment is tailored to patient-specific information.

The ongoing development of explainable AI techniques will help establish trust between patients and clinicians, making these systems valuable assets in day-to-day clinical practice. Moreover, as the models mature, they could eventually be applied in conjunction with other emerging technologies, such as robotic surgery and augmented reality (AR), to give instant diagnostic feedback during operations, improving accuracy and reducing associated risks. Overall, the results of this study represent a step toward transforming brain tumor diagnosis into a faster, more accurate, and more cost-effective process. By uniting high-performance deep learning models with explainable AI, we can deliver a more patient-centered, streamlined process of medical diagnosis. Ultimately, this study can help redefine the diagnosis and treatment of brain tumors to improve clinical results, decrease costs, and increase healthcare accessibility globally.

Challenges and potential solutions

Even with the outstanding diagnostic accuracy and interpretability of the proposed DenseNet201-SVM hybrid model, certain challenges must be addressed before broader clinical utility can be achieved. One of the biggest concerns is the small size and limited diversity of the MRI dataset employed. The dataset, while balanced between tumor and non-tumor cases, is not very heterogeneous with respect to age, gender, and tumor type, which may limit the model's generalizability to real-world populations. In subsequent studies, the use of larger, multi-institutional datasets is essential to improve the model's robustness and clinical applicability.

Another issue is the interpretability of XAI outputs for non-technical users. Although techniques such as Grad-CAM, Integrated Gradients (IG), and Layer-wise Relevance Propagation (LRP) offer visual cues, their explanations are abstract and cannot always be directly interpreted by clinicians. This can discourage the model's adoption in clinical practice. Developing intuitive dashboards or blending textual explanations with visual heatmaps could increase clinician trust and usability.

The computational cost of DenseNet201-SVM is also problematic. Although its classification performance is superior, training and inference require substantial memory and computing capacity, which hinders real-time application in clinical settings. Model pruning, quantization, or lightweight backbones (e.g., MobileNet or DenseNet-lite) could be considered to reduce computational cost without significantly degrading performance.
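Of the compression options mentioned, magnitude pruning is the simplest to sketch: zero out the smallest-magnitude fraction of a weight tensor. A minimal NumPy illustration (the sparsity level and the toy tensor are illustrative assumptions):

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    """Return a copy of `weights` with the smallest-magnitude
    `sparsity` fraction of entries set to zero. Ties at the
    threshold are pruned together, so the effective sparsity can
    slightly exceed the requested level."""
    flat = np.abs(weights).ravel()
    k = int(flat.size * sparsity)
    if k == 0:
        return weights.copy()
    # The k-th smallest magnitude becomes the pruning threshold.
    threshold = np.partition(flat, k - 1)[k - 1]
    return np.where(np.abs(weights) <= threshold, 0.0, weights)
```

In practice the same rule would be applied layer by layer to the backbone's convolution kernels, followed by brief fine-tuning to recover accuracy.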

Furthermore, the research currently lacks clinical trials or real-world validation. Despite high performance statistics, the model has not yet been deployed on a live clinical platform to verify its dependability under varied conditions. Subsequent studies should involve partnerships with hospitals to compare model predictions against real-time cases and expert radiologist interpretations.

Finally, the lack of standardized metrics for measuring explainability is another challenge. While metrics such as IoU, precision, recall, and F1-score offer some measure of alignment with ground truth, clinically aligned standards are needed to assess how well XAI explanations support human decision-making. Such metrics could be developed in consultation with medical experts to strengthen future validation.

Future plans, research progress, limitations, and theoretical analysis

The findings of this work lay a foundation for additional breakthroughs in AI-based medical diagnosis. Future work includes extending the current binary classification model to multi-class classification of different tumor types, e.g., glioma, meningioma, and pituitary tumors. Another key goal is the integration of three-dimensional (3D) MRI information, which would provide spatially richer context for the tumor site and improve predictive accuracy. Ensemble methods and transformer architectures will also be explored to enhance both performance and interpretability.

The present study represents a significant step forward in integrating deep learning with traditional supervised learning methods. Compared with conventional end-to-end neural networks, the hybrid DenseNet201-SVM model leverages the dense connectivity of DenseNet for effective spatial feature extraction and the robust decision-boundary properties of the SVM for classification. The technique performs better in terms of both classification accuracy and interpretability. The use of multiple explainability methods further increases the model's transparency and supports clinical trust.

There are, however, certain limitations that cannot be overlooked. First, patient metadata such as age, sex, or tumor type are not available in the dataset, although they would be crucial for contextualizing predictions. Second, the model is trained on 2D MRI slices rather than volumetric representations of the tumor, which limits its spatial understanding. Third, while the model is interpretable, it remains computationally intensive, making deployment on resource-constrained platforms difficult. Finally, real-world evaluation and clinician-in-the-loop assessment remain to be done, and these are important steps toward clinical readiness.

Theoretically, the model leverages DenseNet's feature reuse and improved gradient flow, which mitigate vanishing-gradient problems and enable deeper learning. The SVM classifier provides a clearly defined decision boundary that boosts interpretability beyond softmax-based classification. With explainability further strengthened through LRP, IG, and Grad-CAM, the system delivers a complete, interpretable AI pipeline. This architectural robustness and interpretability make the framework a promising model for real-world clinical applications that aid experts in diagnosing brain tumors.
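The interpretability claim about the SVM head can be made concrete: `decision_function` returns a signed distance to the learned boundary, a geometrically meaningful confidence that softmax probabilities lack. A minimal one-dimensional illustration (the data are synthetic stand-ins, not MRI features):

```python
import numpy as np
from sklearn.svm import SVC

# Two trivially separable 1-D classes (hypothetical feature values).
X = np.array([[-2.0], [-1.5], [1.5], [2.0]])
y = np.array([0, 0, 1, 1])

clf = SVC(kernel="linear").fit(X, y)

# Signed margin distance: magnitude reflects confidence,
# sign indicates which side of the boundary a sample falls on.
scores = clf.decision_function(np.array([[0.0], [3.0]]))
```

A point on the boundary scores near zero, while a point deep inside a class scores well past the margin, giving clinicians a direct geometric reading of how decisive each prediction is.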

Conclusion

This paper presents a robust, interpretable brain tumor diagnosis model based on DenseNet201 with an SVM classifier and explainable AI techniques. The model uses a region-adaptive preprocessing pipeline and deep features to achieve high classification accuracy. Among the explainability methods evaluated, Layer-wise Relevance Propagation (LRP) outperformed Grad-CAM and Integrated Gradients (IG), achieving the highest interpretability and localization scores. These findings validate the model's value as a reliable clinical decision support tool, combining excellent performance with the visual transparency needed to support expert analysis and confidence. In subsequent studies, the model can be extended to multi-class classification for distinguishing different types of brain tumors. Additionally, integrating multi-modal imaging data (e.g., PET or CT) and patient metadata could enhance diagnostic accuracy and individualization. The incorporation of transformer-based models or ensemble learning methods might also improve performance. Finally, real-time implementation and future clinical testing are significant steps toward integrating this system into routine healthcare practice.