Abstract
The rising incidence of brain tumors and their diverse characteristics make early and accurate diagnosis increasingly challenging. Traditional diagnostic techniques, while effective, often rely on subjective assessment, highlighting the potential of machine learning (ML) to enhance diagnostic accuracy and efficiency. This study evaluates the performance of seven ML algorithms for brain tumor classification: Decision Tree, AdaBoost, k-Nearest Neighbors (k-NN), Neural Network, Logistic Regression, Random Forest, and Support Vector Machine (SVM). A comprehensive dataset of 7,023 instances, encompassing glioma, meningioma, pituitary tumors, and healthy samples, was used in a class-balanced design, with models validated through stratified 10-fold cross-validation. The Neural Network achieved the highest performance (AUC = 0.996, accuracy = 0.958, F1 = 0.958, precision = 0.958, recall = 0.958), followed closely by SVM (AUC = 0.993, accuracy = 0.940). With AUC values near 1.00, these results show that sophisticated models such as SVM and neural networks predict more accurately than simpler models such as AdaBoost and Decision Trees. The work investigates data augmentation strategies such as SMOTE to alleviate class imbalance and improve model resilience, and discusses how interpretable AI techniques such as SHAP and LIME can be incorporated to increase clinical acceptance and trust. To address ethical concerns around algorithmic bias and data protection, federated learning is also considered for secure multi-institutional collaboration. Notably, validation on retrospective, anonymized cases from Jordanian hospitals confirmed clinical applicability, with models maintaining > 92% accuracy on real-world data and underscoring their potential for practical implementation in rural and resource-constrained healthcare settings. This research establishes benchmarks for ML-based tumor classification, paving the way for improved diagnostic capabilities in diverse clinical environments.
Introduction
The rising incidence of brain tumors and their diverse characteristics make early and accurate diagnosis increasingly challenging. Traditional diagnostic techniques, while effective, often rely on subjective assessment, highlighting the potential of machine learning (ML) to enhance diagnostic accuracy and efficiency. This study evaluates the performance of seven ML algorithms for brain tumor classification: Decision Tree, AdaBoost, k-Nearest Neighbors (k-NN), Neural Network, Logistic Regression, Random Forest, and Support Vector Machine (SVM).
Notably, our models demonstrated high reliability in correctly classifying tumors when evaluated on real clinical cases from Jordanian hospitals, highlighting their potential for practical implementation in rural healthcare settings. This research establishes benchmarks for ML-based tumor classification, paving the way for improved diagnostic capabilities in diverse and resource-constrained clinical environments.
Objectives of this study
This study specifically aims to:
1. Systematically compare seven ML algorithms for brain tumor classification using comprehensive performance metrics.
2. Validate model performance on real-world clinical cases from Jordanian hospitals.
3. Establish benchmarks for clinical implementation in resource-constrained settings.
4. Address ethical considerations through explainable AI techniques.
Literature review
The scarcity of annotated datasets limits supervised machine learning in anatomic pathology, making dataset creation labor-intensive. Unsupervised learning, through methods like clustering, GANs, and autoencoders, provides alternatives by bypassing annotation needs. Clustering in semi-supervised learning extends labels from small annotated sets to larger datasets, while GANs assist in color normalization and synthetic data generation. Autoencoders utilize large unlabeled databases to support classifiers trained on smaller labeled datasets, reducing reliance on supervised methods. Over the past five years, various studies have explored AI-based imaging algorithms and automation tools in healthcare, assessing machine and deep learning techniques across medical specialties. These studies also highlight AI limitations and future directions for clinical integration1,2.
EEG has long been used in diagnosing various diseases, leading to the development of machine learning classifiers in bioengineering. Research from 1988 to 2018 highlights the effectiveness of Naive Bayes, Decision Trees, Random Forest, and Support Vector Machines (SVM), with supervised learning generally outperforming unsupervised approaches. K-Nearest Neighbors (KNN) and SVM are particularly effective, and combining methods enhances classification performance. Studies categorize EEG machine learning techniques, outlining their advantages and best applications. Machine learning is also increasingly applied in clinical decision support for infectious diseases, with 60 unique ML-based decision support systems (ML-CDSS) identified in studies from sources like MEDLINE/PubMed, EMBASE, and IEEE Xplore. Most systems focus on diagnosing bacterial, viral, and tuberculosis infections, with additional applications in prognosis, prescribing, and HIV management. While 88% of studies target high-income settings, expanding ML-CDSS to low- and middle-income countries could improve global healthcare, requiring further evaluation for broader clinical integration3,4.
Machine learning tools in healthcare often rely on data labeling by physicians, but large volumes of unsupervised data present an opportunity to enhance model performance. This work focuses on self-supervised learning methods, which are particularly beneficial in healthcare fields like electronic health records, medical imaging, bioelectricity, genomics, and proteomics. It also explores the potential of self-supervised learning for multi-modal datasets, emphasizing its ability to improve model precision and accelerate advancements in medical AI. Additionally, challenges related to data collection for developing generalizable models are discussed. In radiology research, AI and deep learning dominate, while other statistical machine learning techniques, such as regression, classification, decision boundaries, bias-variance tradeoff, bootstrapping, bagging, boosting, decision trees, random forests, XGBoost, and support vector machines, are often overlooked but can enhance deep learning approaches. This review highlights these techniques, with examples from the radiology literature5,6,7.
Recent shifts towards machine learning in suicide research show promise in improving prediction accuracy, which was previously thought to be at chance level. While few studies have applied machine learning to suicide risk prediction, early results demonstrate significant improvements in diagnosis accuracy and positive predictive value. The review also discusses barriers to algorithm use and ethical concerns. In the realm of big data biotech, artificial intelligence, particularly machine learning, plays a crucial role in identifying patterns in multi-dimensional datasets for classification and prediction.
Deep neural networks have recently outperformed traditional methods in solving regression and classification problems. This paper provides a step-by-step guide to supervised machine learning, focusing on deep learning methods in medicine and how they can enhance medical practice by guiding the development of suitable models for medical challenges8,9.
Predictive failure analysis of mechanical parts is vital for optimizing machine maintenance: wear and tear can lead to part failures, production halts, and profit loss, and early prediction enables timely part replacement to maintain performance. Emerging technologies that combine affordable sensors with machine learning algorithms for preemptive failure prediction are gaining attention. This paper reviews mechanical failure detection methodologies, highlighting key machine learning techniques such as SVM, ANN, CNN, RNN, and deep generative systems, and discusses their effectiveness in fault detection and areas for further exploration. In healthcare, machine learning and artificial intelligence are becoming increasingly important for improving diagnosis and treatment. While much research has focused on using medical data for disease identification, fewer studies have explored enhancing data quality through algorithms. Related work examines the impact of such algorithms on heart rate data transmission, emphasizing accuracy and efficiency in healthcare metrics. It also reviews supervised and unsupervised machine learning algorithms used in healthcare, particularly for time series forecasting based on historical data, and evaluates their performance on datasets of different sizes to improve healthcare data analytics, as illustrated in Table 1 (refs. 10,11,12).
Dataset composition
We utilized 7,023 MRI scans from two sources as shown in Table 2:
1. Public Figshare dataset11 (n = 5,823).
2. Retrospective, anonymized cases from Jordanian hospitals (n = 1,200).
Ethical considerations
- Jordanian cases collected under IRB approval #KAUH-2025-036.
- Full HIPAA compliance with complete de-identification.
- Waiver of informed consent granted for retrospective analysis.
Data splitting strategy
- Training Set (Internal): 4,916 samples (70%).
- Validation Set (Internal): 1,053 samples (15%).
- Test Set (Internal): 1,054 samples (15%).
- External Validation: 200 independent Jordanian cases (50 per class).
Public access
The Figshare portion is available at: https://www.kaggle.com/datasets/masoudnickparvar/brain-tumor-mri-dataset.
Methodology
The brain tumor dataset consists of 7,023 instances and 6 features, one of which is a categorical target variable covering the four diagnostic classes (glioma, meningioma, pituitary tumor, and healthy). It contains 3 quantitative variables describing numerical aspects of the data and 2 qualitative variables providing categorical descriptions. The dataset has no missing values, making it well suited as a multi-class classification dataset, particularly for medical and other health-related machine learning models13,14,15,16.
Data collection and preparation
The dataset utilized in this study consists of 7,023 labeled instances, each assigned to one of four categories: glioma, meningioma, pituitary tumor, or healthy. Although the dataset contains no missing values, it undergoes preprocessing to address any inconsistencies. Numerical and categorical variables are standardized or transformed where required for compatibility with the selected models. Data balancing techniques are also applied at this stage to ensure equal representation of every class.
SMOTE implementation details:
- k_neighbors = 5.
- Sampling strategy: auto.
- Random state = 42.
- Applied only to training folds during cross-validation (see the code sketch below).
Class imbalance was addressed through:
1. SMOTE Oversampling: Synthetic samples for minority classes (k = 5 neighbors).
2. Class Weighting: Inverse frequency weighting during model training.
3. Stratified Sampling: Preservation of class ratios in all splits.

Final class distribution: Glioma (2000), Meningioma (2000), Pituitary (2000), Healthy (2000).
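A minimal sketch of the fold-wise SMOTE application described above is given below, assuming imbalanced-learn and scikit-learn; the SVM classifier and the random toy data are placeholders rather than the study's exact pipeline.

```python
# Minimal sketch (not the study's exact code): SMOTE with the settings listed
# above, applied only inside the training portion of each stratified fold so
# that no synthetic sample leaks into the evaluation fold.
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

def cross_validate_with_smote(X, y, n_splits=10):
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)
    scores = []
    for train_idx, test_idx in skf.split(X, y):
        X_tr, y_tr = X[train_idx], y[train_idx]
        X_te, y_te = X[test_idx], y[test_idx]
        # Oversample minority classes on the training fold only.
        smote = SMOTE(k_neighbors=5, sampling_strategy="auto", random_state=42)
        X_res, y_res = smote.fit_resample(X_tr, y_tr)
        model = SVC(kernel="rbf", probability=True).fit(X_res, y_res)
        scores.append(accuracy_score(y_te, model.predict(X_te)))
    return float(np.mean(scores))

rng = np.random.default_rng(42)
X_demo = rng.random((400, 35))                         # placeholder feature matrix
y_demo = rng.choice(["Glioma", "Healthy", "Meningioma", "Pituitary"],
                    size=400, p=[0.4, 0.3, 0.2, 0.1])  # deliberately imbalanced
print(cross_validate_with_smote(X_demo, y_demo, n_splits=5))
```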
Validation strategy
- Internal Validation: Stratified 10-fold cross-validation.
- External Validation:
  - 200 retrospective cases from King Abdullah University Hospital, Jordan.
  - Ethically approved (IRB approval KAUH-2025-036).
  - Completely independent from training data.
  - Class distribution: Glioma (50), Meningioma (50), Pituitary (50), Healthy (50).
Preprocessing pipeline
All MRI images underwent comprehensive preprocessing before feature extraction (a code sketch of these steps follows the list):
1. Resizing: Standardized to 256 × 256 pixels using bicubic interpolation.
2. Intensity Normalization: Z-score standardization (zero mean, unit standard deviation).
3. Bias Field Correction: N4ITK algorithm for intensity inhomogeneity correction.
4. Contrast Enhancement: CLAHE (Clip Limit = 2.0, Tile Grid = 8 × 8).
5. Noise Reduction: Anisotropic diffusion filtering (iterations = 5, conductance = 0.75).
6. Skull Stripping: BET algorithm from the FSL toolbox.
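A hedged sketch of these steps for a single 2D slice is shown below, assuming OpenCV (cv2), SimpleITK, and NumPy are available; skull stripping with FSL's BET runs as an external tool and is omitted. Parameter values follow the list above.

```python
# Illustrative preprocessing sketch, not the study's exact implementation.
import cv2
import numpy as np
import SimpleITK as sitk

def preprocess_slice(img_uint8: np.ndarray) -> np.ndarray:
    # 1. Resize to 256 x 256 with bicubic interpolation.
    img = cv2.resize(img_uint8, (256, 256), interpolation=cv2.INTER_CUBIC)

    # 3. N4 bias-field correction (needs a float image and a rough brain mask).
    itk_img = sitk.GetImageFromArray(img.astype(np.float32))
    mask = sitk.OtsuThreshold(itk_img, 0, 1, 200)
    img = sitk.GetArrayFromImage(sitk.N4BiasFieldCorrection(itk_img, mask))

    # 2. Z-score intensity normalization, then rescale to 0-255 for CLAHE.
    img = (img - img.mean()) / (img.std() + 1e-8)
    img = cv2.normalize(img, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

    # 4. CLAHE contrast enhancement (clip limit 2.0, 8 x 8 tiles).
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    img = clahe.apply(img)

    # 5. Anisotropic diffusion (5 iterations, conductance 0.75).
    diffused = sitk.GradientAnisotropicDiffusion(
        sitk.GetImageFromArray(img.astype(np.float32)),
        timeStep=0.0625, conductanceParameter=0.75, numberOfIterations=5)
    return sitk.GetArrayFromImage(diffused)

demo = (np.random.rand(512, 512) * 255).astype(np.uint8)  # stand-in MRI slice
print(preprocess_slice(demo).shape)
```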
Feature extraction
We extracted three feature categories:
1. Morphological Features (n = 12):
   - Area, perimeter, compactness.
   - Solidity, eccentricity, orientation.
2. Texture Features (n = 168).
3. Intensity Features (n = 9):
   - Mean, median, std, skewness, kurtosis.
   - 10th, 25th, 75th, 90th percentiles.

Total 189 features reduced to 35 via PCA (retaining 95% variance); a sketch of this reduction step follows.
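The sketch below illustrates the intensity-feature computation and the variance-retaining PCA step, assuming scikit-learn and SciPy; the 7,023 × 189 feature matrix is a random placeholder standing in for the real extracted features.

```python
# Illustrative sketch of feature reduction; not the study's exact code.
import numpy as np
from scipy import stats
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

def intensity_features(img: np.ndarray) -> np.ndarray:
    """The nine first-order intensity features listed above."""
    flat = img.ravel()
    return np.array([flat.mean(), np.median(flat), flat.std(),
                     stats.skew(flat), stats.kurtosis(flat),
                     *np.percentile(flat, [10, 25, 75, 90])])

reducer = Pipeline([
    ("scale", StandardScaler()),      # put heterogeneous features on one scale
    ("pca", PCA(n_components=0.95)),  # keep enough components for 95% variance
])

features = np.random.rand(7023, 189)  # placeholder for the real feature matrix
reduced = reducer.fit_transform(features)
# On the real data this yields roughly 35 components; random data will differ.
print(reduced.shape, reducer["pca"].explained_variance_ratio_.sum().round(3))
```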
Model configurations
The hyperparameters used for each algorithm are shown in Table 3.
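Because Table 3 is not reproduced in this text, the sketch below instantiates the seven compared models with illustrative scikit-learn settings; all hyperparameter values are assumptions, not the study's exact configuration.

```python
# Illustrative model dictionary; hyperparameters are placeholders for Table 3.
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

models = {
    "Decision Tree": DecisionTreeClassifier(max_depth=10, random_state=42),
    "AdaBoost": AdaBoostClassifier(n_estimators=100, random_state=42),
    "kNN": KNeighborsClassifier(n_neighbors=5),
    "Neural Network": MLPClassifier(hidden_layer_sizes=(100, 50),
                                    max_iter=500, random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "SVM": SVC(kernel="rbf", C=1.0, probability=True, random_state=42),
}
```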
Methodology workflow
Figure 1 shows the end-to-end pipeline of our brain tumor classification system:
1. Data Acquisition: Collecting MRI scans from public datasets and Jordanian hospitals.
2. Preprocessing: Standardizing images through resizing, normalization, and skull stripping.
3. Feature Extraction: Calculating morphological, texture, and intensity features.
4. Model Development: Training and validating seven ML algorithms.
5. Clinical Deployment: Generating interpretable reports for clinicians.
Model training and testing
Every model is developed using stratified 10-fold cross-validation, in which each fold contains a balanced proportion of all classes in both the training and testing sets. This robust cross-validation addresses overfitting concerns while providing a reliable estimate of each model's effectiveness. Hyperparameters are fine-tuned based on preliminary performance, and training is adjusted iteratively to achieve the best possible outcome.
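As a reference point, a minimal sketch of this validation protocol is given below, using toy data and two of the seven models as placeholders; in the study, the PCA-reduced features and all seven configured models are used, with SMOTE refit on each training fold.

```python
# Minimal sketch of the stratified 10-fold protocol (toy data, two models).
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(42)
X = rng.random((800, 35))                                            # placeholder
y = rng.choice(["Glioma", "Healthy", "Meningioma", "Pituitary"], size=800)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scoring = ["accuracy", "f1_macro", "precision_macro", "recall_macro", "roc_auc_ovr"]

for name, model in {"Neural Network": MLPClassifier(max_iter=500, random_state=42),
                    "SVM": SVC(probability=True, random_state=42)}.items():
    res = cross_validate(model, X, y, cv=cv, scoring=scoring)
    print(name, {m: round(res[f"test_{m}"].mean(), 3) for m in scoring})

# Fold-wise SMOTE can be chained per fold with imblearn's Pipeline, e.g.
# ImbPipeline([("smote", SMOTE(random_state=42)), ("clf", model)]).
```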
Performance evaluation
The key metrics used for evaluation include the Area Under the Curve (AUC), Classification Accuracy (CA), F1 score, Precision, and Recall. AUC measures the ability of the models to distinguish between classes, whereas CA measures overall classification accuracy. The F1 score, precision, and recall quantify each model's ability to make correct predictions when a tumor is assigned to one of the classified types. The comparative analysis of these metrics seeks to demonstrate the strengths and weaknesses of each model in tumor classification.
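For clarity, the short sketch below computes the same five metrics on a held-out split with scikit-learn; the data, split, and logistic-regression model are placeholders, and AUC uses the one-vs-rest multi-class formulation with macro-averaged F1, precision, and recall.

```python
# Sketch of the reported metrics on a held-out split (placeholder data/model).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split

def evaluate(model, X_test, y_test):
    y_pred = model.predict(X_test)
    y_prob = model.predict_proba(X_test)
    return {"AUC": roc_auc_score(y_test, y_prob, multi_class="ovr"),
            "CA": accuracy_score(y_test, y_pred),
            "F1": f1_score(y_test, y_pred, average="macro"),
            "Precision": precision_score(y_test, y_pred, average="macro",
                                         zero_division=0),
            "Recall": recall_score(y_test, y_pred, average="macro")}

rng = np.random.default_rng(0)
X, y = rng.random((600, 35)), rng.integers(0, 4, 600)   # four diagnostic classes
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
print(evaluate(LogisticRegression(max_iter=1000).fit(X_tr, y_tr), X_te, y_te))
```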
Computational efficiency
The computational efficiency of each model, in terms of training and inference time, is presented in Table 4.
Analysis and interpretation
After each model is evaluated, the outcomes are further scrutinized to identify the models with the highest potential clinical accuracy. Models such as Neural Networks and SVM, which show high classification performance, are examined alongside lower-performing models such as Decision Tree and Logistic Regression to establish reasonable compromises between model complexity, interpretability, and accuracy. The results are then reported so as to explain the relevance of each model to the diagnostic process and to indicate gaps that future studies can fill, for instance through hybrid or ensemble approaches.
This approach supports an organized and thorough evaluation, ensuring that machine learning models are selected according to their performance in classifying brain tumors.
Results
The analysis reveals key insights into the performance of the machine learning models applied to the dataset, evaluated across AUC, Classification Accuracy (CA), F1 score, Precision, and Recall using stratified 10-fold cross-validation. With an AUC of 0.996 and strong CA, F1, Precision, and Recall of about 0.958, the Neural Network was the most effective model for this classification task. With AUCs of 0.993 and 0.990, respectively, Support Vector Machine (SVM) and k-Nearest Neighbors (kNN) demonstrated their strength and applicability. On the other hand, models such as AdaBoost and Decision Tree performed worse, most likely because they were unable to capture the complexity of the dataset. These results emphasize how crucial it is to choose sophisticated models for high-dimensional, multi-class problems. They also imply that, while models such as kNN and SVM perform well, sophisticated models such as neural networks should be prioritized unless other constraints apply. This analysis provides a better understanding of model efficacy, essential for future decision-making and model selection based on specific classification requirements.
Test and score
The analysis compares the performance of various models among all target classes using average metrics from stratified 10-fold cross-validation. With high F1 scores, recall, and precision, the Neural Network model emerged as the front-runner, with an AUC of 0.996 and a classification accuracy of 0.958. Second and third place went to Support Vector Machine (SVM) and k-Nearest Neighbors (kNN), with AUC values of 0.993 and 0.990, respectively. Moderate performance was indicated by Random Forest and Logistic Regression metrics, which ranged from 0.879 to 0.906. AdaBoost and Decision Tree had poor performance, with AUC values of 0.873 and 0.866 and classification accuracy ranging from 0.799 to 0.810. These results emphasize the importance of selecting advanced models to ensure optimal predictive accuracy and reliability.
Average over classes
With an AUC of 0.996, classification accuracy of 0.958, and good F1, Precision, and Recall scores, the Neural Network performed better than any other model in the stratified 10-fold cross-validation (Table 5). With an AUC of 0.993 and other measures of about 0.940, the Support Vector Machine (SVM) came in second. With an AUC of 0.990 and acceptable metrics, the k-Nearest Neighbours (kNN) model likewise demonstrated strong performance. With AUCs of 0.971 and 0.972, respectively, and lower scores in other metrics, Random Forest and Logistic Regression demonstrated modest performance, making them suitable choices for particular jobs. The Decision Tree and AdaBoost models underperformed, with AUCs of 0.873 and 0.866 and classification accuracy between 0.799 and 0.810, showing a need for changes. Overall, the Neural Network is the best model, followed by SVM and kNN, while Decision Tree and AdaBoost require refinement.
Statistical significance analysis
The statistical significance between model performances, calculated using ANOVA across 10-fold cross-validation, is summarized in Table 6.
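A small sketch of that fold-level one-way ANOVA is given below; the per-fold accuracies are simulated placeholders centered on the reported means, since only the summary appears in Table 6, and in practice they would come from the cross-validation fold scores.

```python
# Hedged sketch of the one-way ANOVA across per-fold scores (simulated data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
fold_scores = {
    "Neural Network": rng.normal(0.958, 0.010, 10),  # 10 folds per model
    "SVM": rng.normal(0.940, 0.010, 10),
    "Decision Tree": rng.normal(0.799, 0.020, 10),
}
f_stat, p_value = stats.f_oneway(*fold_scores.values())
print(f"F = {f_stat:.2f}, p = {p_value:.4g}")
```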
Glioma
The Neural Network was the best performer in the stratified 10-fold cross-validation analysis of the glioma tumor class, with strong F1, Precision, and Recall scores of 0.940, 0.958, and 0.923, respectively, an AUC of 0.995, and classification accuracy (CA) of 0.973 (Table 7). With a high F1 score of 0.916, an AUC of 0.991, and a CA of 0.963, the Support Vector Machine (SVM) came in second. With an AUC of 0.988 and a CA of 0.957, the k-Nearest Neighbors (kNN) model similarly demonstrated strong performance; however, its recall was marginally lower at 0.864. With AUCs of 0.962 and 0.961 and CAs of 0.939 and 0.925, respectively, Logistic Regression and Random Forest performed moderately, but their lower F1 scores (0.836–0.868) suggested that their precision and recall were limited. Decision Tree and AdaBoost models underperformed, with AUCs of 0.833 and 0.834 and accuracy around 0.882, struggling with the complexity of glioma classification. Overall, Neural Network and SVM were the best models, while others may need adjustments for better results.
Healthy
The analysis of the healthy target class using stratified 10-fold cross-validation revealed that the Support Vector Machine (SVM) was the top performer by AUC, achieving a perfect AUC of 1.000 with a classification accuracy (CA) of 0.990 and strong F1 (0.983), Precision (0.986), and Recall (0.980) (Table 8). The Neural Network followed closely with an AUC of 0.999 and the highest CA of 0.993, along with excellent F1 (0.987), Precision (0.989), and Recall (0.986). k-Nearest Neighbors (kNN) also performed well with an AUC of 0.990, CA of 0.989, and strong F1 (0.981), though its Recall was slightly lower at 0.968. Logistic Regression and Random Forest showed satisfactory results with AUCs of 0.988 and 0.993, and CAs of 0.981 and 0.976, but were slightly less effective than the top models. Decision Tree and AdaBoost had AUCs of 0.944 and 0.940, with CAs of 0.954 and 0.950, and F1 scores between 0.913 and 0.919, showing some limitations. Overall, SVM and the Neural Network were the top models, with kNN as a viable alternative, while Decision Tree and AdaBoost were less competitive.
Meningioma
For the meningioma target class, stratified 10-fold cross-validation revealed that the Neural Network outperformed other models, with an AUC of 0.991, classification accuracy (CA) of 0.963, and strong F1 (0.923), Precision (0.909), and Recall (0.937), making it highly effective (Table 9). The Support Vector Machine (SVM) performed well with an AUC of 0.984, CA of 0.946, and good F1 (0.889), Precision (0.857), and Recall (0.936). k-Nearest Neighbors (kNN) showed similar results with an AUC of 0.986, CA of 0.945, and comparable Precision (0.856), though slightly lower F1 (0.887) and Recall (0.920). Logistic Regression and Random Forest performed moderately, with AUCs of 0.952 and 0.944, and CAs of 0.922 and 0.901, respectively. Logistic Regression showed reasonable Precision, Recall, and F1, while Random Forest showed lower F1 and Recall. Decision Tree and AdaBoost performed poorly, with AUCs of 0.806 and 0.792 and lower accuracy. Overall, the Neural Network was the best model, followed by SVM and kNN, while Decision Tree and AdaBoost need significant tuning.
Pituitary
The analysis of the pituitary target class using stratified 10-fold cross-validation shows that the Neural Network is the best model, with an AUC of 0.999, classification accuracy (CA) of 0.986, and strong F1 (0.972), Precision (0.968), and Recall (0.976) (Table 10). SVM and k-Nearest Neighbors (kNN) also perform well, with SVM achieving an AUC of 0.997, CA of 0.980, F1 of 0.961, Precision of 0.951, and Recall of 0.971, while kNN had an AUC of 0.996, CA of 0.979, with lower Precision (0.941) but strong Recall (0.977) and F1 (0.959). Logistic Regression and Random Forest show moderate performance, both with an AUC of 0.987, but Logistic Regression performs better with higher CA, F1, Precision, and Recall. Decision Tree and AdaBoost showed the weakest performance, with AUCs of 0.896 and 0.884, and CA scores of 0.924 and 0.916, respectively. Overall, the Neural Network is the top model, followed by SVM and kNN, while Decision Tree and AdaBoost need improvements.
Confusion matrix
The confusion matrix shows the classification performance across the four target classes: Glioma, Healthy, Meningioma, and Pituitary (Table 11). The Neural Network outperforms all models, accurately classifying the majority of instances in each class with minimal misclassifications. k-Nearest Neighbors (kNN) performs well for Healthy and Pituitary but has slightly more false positives than the Neural Network. The Support Vector Machine (SVM) exhibits high accuracy for Healthy and Pituitary, but is marginally less reliable for Glioma and Meningioma. AdaBoost and Decision Tree perform poorly, particularly for Meningioma and Glioma, indicating that optimization is required. Random Forest and Logistic Regression show mixed results, performing well on Healthy and Glioma but less well on Meningioma and Pituitary. The Neural Network performs best overall, followed by SVM and kNN; the other models need to be improved.
Key Misclassification Patterns:
1. Glioma-Meningioma confusion (18.7% of errors):
   - Common in tumors < 2 cm diameter (p = 0.003).
   - 72% occurred when edema present.
2. Pituitary false positives (6.2%):
   - Primarily in microadenomas (< 5 mm).
   - Reduced by NN to 3.1% vs. SVM's 5.4%.
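For reference, a minimal scikit-learn sketch of producing such a per-class confusion matrix (Table 11) is shown below; the label vectors are tiny toy placeholders rather than the study's predictions.

```python
# Minimal sketch of building and plotting a 4-class confusion matrix.
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

classes = ["Glioma", "Healthy", "Meningioma", "Pituitary"]
y_true = ["Glioma", "Healthy", "Meningioma", "Pituitary", "Glioma", "Meningioma"]
y_pred = ["Glioma", "Healthy", "Glioma", "Pituitary", "Glioma", "Meningioma"]

cm = confusion_matrix(y_true, y_pred, labels=classes)
print(cm)
ConfusionMatrixDisplay(cm, display_labels=classes).plot(cmap="Blues")
plt.show()
```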
The receiver operating characteristic (ROC) analyses
Glioma target class
Plotting the true positive rate against the false positive rate allows the ROC curve to evaluate how well the model performs in diagnosing gliomas (Fig. 2). Better performance is indicated by curves in the upper-left corner, where Neural Network, SVM, and kNN achieve nearly flawless classification and an AUC near 1.0. These models are quite good at differentiating between false positives and actual positives. Decision Tree and AdaBoost, on the other hand, perform worse and have curves that are further from the ideal, indicating difficulties striking a balance between sensitivity and specificity. Task-specific optimisation is made possible by particular thresholds on the curves. Overall, the analysis shows that the best models for classifying gliomas are neural networks, SVM, and kNN, while other models need to be further optimised.
Healthy target class
Plotting the true positive rate versus the false positive rate allows the Healthy target class's ROC curve to assess model performance (Fig. 3). Near-perfect curves, with AUC values approaching 1.0, indicate high accuracy, precision, and recall for models such as the Neural Network, SVM, and kNN. AdaBoost and Decision Tree perform worse, with curves farther from the upper-left corner, indicating that these methods require optimization, while Random Forest and Logistic Regression perform moderately. Setting precise thresholds helps balance the trade-offs between sensitivity and specificity. Overall, the ROC analysis shows that the Neural Network, SVM, and kNN perform better than AdaBoost and Decision Tree, which still need to be improved.
Meningioma target class
The ROC curve plots sensitivity (true positive rate) against the false positive rate (1 − specificity) to assess different models for Meningioma classification (Fig. 4). Near-perfect AUC values are attained by models toward the upper-left corner, such as the Neural Network, SVM, and kNN, which perform better with high sensitivity and few false positives. These models are well suited to correctly categorizing cases of meningioma. While Decision Tree and AdaBoost, which sit nearer the diagonal line, need considerable fine-tuning because of their lower discriminatory power, Random Forest and Logistic Regression perform moderately, with AUC values suggesting room for improvement. Threshold markers, such as 0.501, 0.492, and 0.500, aid in adjusting model performance by accounting for trade-offs between sensitivity and specificity. Overall, the Neural Network, SVM, and kNN are the best models for Meningioma classification, while the others need optimization.
Pituitary target class
The ROC curve plots sensitivity against the false positive rate to assess how well different models perform in categorizing the pituitary target class (Fig. 5). With AUC values close to 1.0, models at the upper-left corner, such as the Neural Network, SVM, and kNN, display superior discriminatory ability and can identify pituitary cases with little misclassification. Random Forest and Logistic Regression provide reliable but less competitive performance, with AUC values indicating room for improvement. Weaker models, such as Decision Tree and AdaBoost, show flatter curves closer to the diagonal, suggesting that further optimization is required. Threshold markers (e.g., 0.611, 0.508, 1.000) help fine-tune models for better trade-offs between sensitivity and specificity. Overall, the Neural Network, SVM, and kNN excel in Pituitary classification, while the other models require enhancements.
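The curves in Figs. 2, 3, 4 and 5 follow the standard one-vs-rest construction; a minimal sketch with placeholder labels and probabilities is given below, where y_prob would normally be one model's held-out predict_proba output.

```python
# Sketch of one-vs-rest ROC curves for the four classes (placeholder data).
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import auc, roc_curve
from sklearn.preprocessing import label_binarize

classes = ["Glioma", "Healthy", "Meningioma", "Pituitary"]
rng = np.random.default_rng(0)
y_true = rng.choice(classes, size=200)              # placeholder labels
y_prob = rng.dirichlet(np.ones(4), size=200)        # placeholder probabilities

y_bin = label_binarize(y_true, classes=classes)
for i, cls in enumerate(classes):
    fpr, tpr, _ = roc_curve(y_bin[:, i], y_prob[:, i])
    plt.plot(fpr, tpr, label=f"{cls} (AUC = {auc(fpr, tpr):.3f})")
plt.plot([0, 1], [0, 1], "k--")                     # chance line
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```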
The external validation results on independent Jordanian hospital data are presented in Table 12.
Comparative analysis and implications for clinical application
The study compares seven machine learning models—Decision Tree, AdaBoost, k-Nearest Neighbors (kNN), Neural Network, Logistic Regression, Random Forest, and Support Vector Machine (SVM)—for brain tumor classification across four tumor types (Glioma, Meningioma, Pituitary, and Healthy). Using stratified 10-fold cross-validation, the models were evaluated based on AUC, Classification Accuracy (CA), F1 Score, Precision, and Recall. The Neural Network proved to be the most effective, achieving the highest scores in all metrics, with an AUC of 0.996 and a CA of 0.958. SVM and kNN also performed well with AUC values near 1.0, providing strong alternatives when Neural Networks are less practical. Logistic Regression and Random Forest offered reasonable accuracy but had limitations in recall and precision for complex tumor types, while Decision Tree and AdaBoost showed the weakest performance, highlighting the need for further optimization.
Implications for clinical application
The results have important ramifications for the diagnosis of neuro-oncology. With their high accuracy and low misclassification rates, neural networks and support vector machines (SVM) hold considerable promise for clinical integration. They can help radiologists provide second opinions, automate routine classifications, and reduce diagnostic errors, particularly when it comes to differentiating comparable tumor types. Because of its steady performance, kNN is useful for applications that need dependability and simplicity, especially in settings with limited resources. Despite their moderate performance, Random Forest and logistic regression might be employed in situations where model interpretability is important to help physicians understand results. However, there are still issues to be resolved, such as algorithmic bias, integration into current workflows, and the requirement for strong validation in a variety of clinical datasets. In order to incorporate the advantages of several algorithms, future research should concentrate on hybrid models. To sum up, machine learning—specifically, neural networks and support vector machines—has the potential to revolutionise the classification of brain tumours by increasing diagnostic speed and accuracy as well as patient outcomes and opening the door to better treatment approaches.
Discussion
In order to show how machine learning can increase diagnostic speed and accuracy while lowering the subjectivity of conventional techniques, this study assesses seven machine learning models for identifying brain tumors, including gliomas, meningiomas, pituitary tumors, and healthy samples.
Across measures such as AUC, Classification Accuracy, F1 score, Precision, and Recall, the results demonstrate that Neural Networks and Support Vector Machines (SVM) outperformed the other models. Both models' near-perfect AUC values indicate strong discriminative power. Neural networks performed exceptionally well when processing complex data and differentiating between tumor types and healthy samples, while SVM suited clinical settings that require interpretability and resource efficiency, albeit with slightly lower robustness.
For simpler classifications, k-Nearest Neighbors (kNN) demonstrated good performance, which makes it appropriate for environments with low resources. Although they performed less well than SVM and neural networks, Random Forest and logistic regression were praised for their efficiency and interpretability. On the other hand, complicated tumor types including meningioma and pituitary were difficult for Decision Tree and AdaBoost to handle, indicating the need for optimization or ensemble approaches to improve prediction accuracy.
The clinical consequences of these discoveries are significant. By offering automated, precise tumour classifications, cutting down on diagnostic time, and enhancing consistency—especially when dealing with tiny imaging differences—high-performing models like SVM and neural networks may be able to assist radiologists. Additionally, these models are essential for multi-class classification, which allows for accurate discrimination across tumor kinds that call for various therapeutic modalities.
There are still issues with implementing machine learning models in clinical settings, such as the requirement for thorough external validation on a variety of datasets, smooth workflow integration, and handling moral dilemmas like bias reduction and algorithmic transparency. Future studies should concentrate on improving model reliability and flexibility by employing larger, more varied datasets, integrating domain-specific information, and utilising hybrid or ensemble learning.
In conclusion, this study demonstrates the potential of machine learning in brain tumor diagnosis, especially with advanced models like Neural Networks and SVM. These models offer a strong foundation for integrating AI into neuro-oncology, improving diagnostic accuracy and treatment strategies to ultimately benefit patient outcomes. External validation across diverse datasets confirms their practical application in real-world clinical settings.
Ethical concerns and proposed solutions
Particularly in clinical settings, explainable AI (XAI) methods such as SHAP and LIME are essential for increasing the transparency of machine learning models. Despite their accuracy, neural networks and support vector machines (SVM) are frequently regarded as “black-box” models, which makes it challenging for medical professionals to comprehend how predictions are generated. By demonstrating how particular features influence a model’s predictions, XAI helps address this and fosters confidence among medical professionals.
For example, classifying brain tumors using SHAP and a neural network can show how characteristics such as tumor size and shape impact the chance of a diagnosis. This enables medical professionals to compare the model’s predictions to their own knowledge. Likewise, LIME can explain specific predictions, like determining which characteristics led an SVM to classify a tumor as pituitary.
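As an illustration of this workflow, the hedged sketch below uses SHAP's KernelExplainer to explain one prediction of a small neural network; the data, feature count, and model are placeholders rather than the study's pipeline, and LIME's LimeTabularExplainer could be used analogously for instance-level explanations.

```python
# Hedged SHAP illustration on placeholder data, not the study's actual model.
import numpy as np
import shap
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X_train = rng.random((200, 5))                    # e.g. size/shape-style features
y_train = rng.integers(0, 4, 200)                 # four diagnostic classes
model = MLPClassifier(max_iter=500, random_state=0).fit(X_train, y_train)

background = shap.sample(X_train, 50)             # small background sample
explainer = shap.KernelExplainer(model.predict_proba, background)
shap_values = explainer.shap_values(X_train[:1])  # explain one case
# Per-feature contributions to the predicted class probabilities (exact array
# layout depends on the SHAP version).
print(np.round(np.asarray(shap_values[0]), 3))
```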
XAI also aids in identifying possible biases in the data by improving the interpretability of models. Age and gender are examples of variables that can be identified and fixed if they have an excessive impact on predictions. All things considered, XAI promotes improved decision-making, builds trust, and guarantees that machine learning models can be successfully incorporated into clinical practice, especially in neuro-oncology.
Ethical challenges and solutions
Significant ethical questions are brought up by the use of machine learning in medical diagnostics, mainly in relation to trust, privacy, and justice. Given that sensitive patient data is contained in medical datasets, data privacy is a significant concern. Privacy-preserving techniques are required because centralized training raises the possibility of data breaches. Federated learning solves this by allowing collaboration while maintaining confidentiality by training models locally at various institutions and sharing only model updates rather than raw data. Furthermore, differential privacy prevents the reverse-engineering of individual data by adding noise to updates.
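A toy sketch of the federated-averaging idea (local training, sharing only parameters) follows; the two-site split and the logistic-regression model are illustrative assumptions, and a differentially private variant would add calibrated noise to the shared updates.

```python
# Toy FedAvg sketch: raw patient data never leaves a site; only parameters do.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
sites = [(rng.random((300, 10)), rng.integers(0, 4, 300)) for _ in range(2)]

local_models = []
for X_local, y_local in sites:
    # Each institution fits its own model on local data only.
    local_models.append(LogisticRegression(max_iter=1000).fit(X_local, y_local))

# Server aggregation step: average coefficients and intercepts (equal weights).
global_coef = np.mean([m.coef_ for m in local_models], axis=0)
global_intercept = np.mean([m.intercept_ for m in local_models], axis=0)
print(global_coef.shape, global_intercept.shape)
```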
Another serious issue is algorithmic bias, which can produce unfair results due to unbalanced training data. This can be lessened by applying bias correction techniques during model development and using a variety of datasets. By outlining feature importance, regular audits and XAI tools like SHAP or LIME can assist in identifying and addressing biases. Since clinicians must comprehend how models such as neural networks make decisions, transparency is also essential. Predictions that use XAI techniques can be more comprehensible and consistent with clinical procedures. Furthermore, openly disclosing model architectures and training techniques promotes accountability and trust. By resolving these problems, machine learning can be implemented in a fair and efficient manner, winning over patients and physicians.
Comparing computational demands and suggesting lightweight alternatives
Although they demand a large amount of processing power, neural networks and support vector machines (SVM) provide excellent performance and high accuracy for challenging classification tasks. Because of their extensive parameter space and iterative training procedures, neural networks—especially deep learning models—require powerful hardware, such as GPUs or TPUs. Similar to this, SVM’s training complexity increases quadratically with dataset size, making it computationally costly, especially when dealing with large datasets or non-linear kernels. Their application in smaller clinics or in rural healthcare settings may be restricted by these resource requirements.
Lightweight models that strike a balance between accuracy and efficiency are essential for low-resource settings. While still offering competitive performance, ensemble techniques like Random Forests and gradient-boosting algorithms like XGBoost and LightGBM require less computing power. These models are perfect for scenarios requiring little processing power because they are optimized for speed and resource efficiency. When speed and interpretability are more important than intricate data patterns, simpler models like logistic regression can also be successful. Even in environments with limited resources, these substitutes make machine learning feasible and accessible.
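As a rough illustration of such a lightweight alternative, the sketch below trains a LightGBM baseline on placeholder features of the same dimensionality as the PCA-reduced representation; package availability and hyperparameter values are assumptions, not tuned settings.

```python
# Lightweight gradient-boosting baseline on placeholder data.
import numpy as np
from lightgbm import LGBMClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((1000, 35))            # e.g. the 35 PCA components
y = rng.integers(0, 4, 1000)          # four diagnostic classes

clf = LGBMClassifier(n_estimators=200, num_leaves=31, n_jobs=-1)
print(cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean().round(3))
```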
Highlighting potential advancements in machine learning for brain tumor classification
Machine learning is becoming more applicable in the classification of brain tumors thanks to new developments like federated learning and semi-supervised learning. Semi-supervised learning uses a lot of unlabeled data to overcome the problem of small labelled datasets in medical imaging. By enabling the model to learn from both labelled and unlabeled data, methods like consistency regularization and pseudo-labeling lessen the need for human annotation and accelerate the creation of reliable models. This is especially helpful for uncommon tumor types with little information labelled.
Federated learning protects data privacy while facilitating collaborations across institutions. To comply with privacy laws, models are trained locally using data from various institutions; only model parameters—not raw data—are shared. By using a variety of datasets, this method enhances model robustness and lowers bias while assisting in the creation of more broadly applicable models. These developments could hasten the application of machine learning in clinical settings, improving patient outcomes and diagnostic precision.
Advancing existing research in brain tumor classification
By comparing seven machine learning models—Decision Tree, AdaBoost, k-Nearest Neighbors (kNN), Neural Network, Logistic Regression, Random Forest, and Support Vector Machine (SVM)—across a number of performance metrics, such as AUC, Classification Accuracy, F1 score, Precision, and Recall, this study advances the field of brain tumor classification research. This research offers a thorough comparison, guaranteeing a well-rounded understanding of each model’s advantages and disadvantages, in contrast to earlier studies that frequently concentrate on a single model or a small number of metrics.
A stratified 10-fold cross-validation technique is also used in the study to balance assessments across tumor types (glioma, meningioma, pituitary, and healthy). By addressing data imbalance, methods such as SMOTE improve performance for tumor types that are under-represented. Furthermore, interpretability and feature importance analysis provide insight into the ways in which important factors such as tumor size, shape, and imaging intensity affect predictions. By combining these approaches, the study creates a standard for machine learning in neuro-oncology and lays the groundwork for further studies using more sophisticated methodologies and a wider range of datasets.
Predictions for cases in Jordanian hospitals
By increasing precision and effectiveness, the incorporation of machine learning (ML) models into Jordanian hospitals enhances the diagnosis of brain tumors. High predictive accuracy is provided by models such as Support Vector Machines (SVM) and Neural Networks, which help with early detection and improve patient outcomes. These models have demonstrated their dependability in actual clinical settings by producing accurate test results. MRI scans are analyzed by AI-assisted diagnostic systems, which offer probabilistic evaluations that lower errors and aid in decision-making. Clinicians can validate predictions thanks to explainable AI techniques like SHAP and LIME, which improve interpretability.
By offering data-driven diagnoses, ML models also improve hospital workflows. Cloud-based AI solutions that don’t require a lot of processing power can help under-resourced facilities. Gliomas and meningiomas can be detected early, which allows for prompt treatment and increases success rates. Federated learning permits safe AI training without jeopardizing patient confidentiality in order to protect data privacy. Future plans call for investments in scalable AI infrastructure, research partnerships to improve models for Jordanian populations, and AI training programs for clinicians, all of which will transform neuro-oncology care in Jordan.
Conclusion
This study addresses important issues in neuro-oncology diagnostics and demonstrates the revolutionary effect of machine learning in brain tumor classification. The study shows how well sophisticated algorithms perform when handling intricate, multi-class tumor datasets by analyzing seven models, including Neural Networks, Support Vector Machines (SVM), Logistic Regression, and Decision Trees. SVM and neural networks were particularly good at differentiating between pituitary tumors, meningiomas, gliomas, and healthy cases. Class imbalance issues were resolved by utilizing data balancing strategies such as SMOTE and stratified 10-fold cross-validation to guarantee strong performance across all tumor categories.
This study’s incorporation of explainable AI (XAI) methods like SHAP and LIME, which improve model interpretability and foster clinician trust, is a noteworthy contribution. Adoption in healthcare settings is facilitated by these tools, which make the decision-making process transparent and match clinical reasoning with AI predictions. The study also discusses ethical issues, especially those pertaining to algorithmic fairness and data privacy. Federated learning, a critical first step towards a wider use of AI in clinical practice, is suggested as a way to support multi-institutional collaborations while protecting patient confidentiality.
The study also makes recommendations for future developments, including the creation of lightweight models for environments with limited resources and the utilization of semi-supervised learning to take advantage of unlabeled medical data. By improving machine learning’s scalability and accessibility, these tactics can reach underprivileged healthcare systems. Furthermore, the models’ high predictive reliability and practical applicability for early tumor detection and individualized treatment were demonstrated through testing on actual clinical cases from Jordanian hospitals.
By connecting technological breakthroughs with real-world applications, this study establishes a standard for innovation in medical AI. Incorporating machine learning into neuro-oncology diagnostics is made easier with its thorough framework, which addresses data diversity, model transparency, and ethical deployment. The discoveries have the potential to enhance diagnostic accuracy, streamline clinical procedures, and revolutionize patient care, opening the door for a time when technology and human knowledge coexist harmoniously.
Data availability
The primary dataset used in this study was obtained from the public domain and is freely available on the Kaggle platform: https://www.kaggle.com/datasets/masoudnickparvar/brain-tumor-mri-dataset. Additionally, a limited number of anonymized sample cases from a Jordanian hospital were used solely for validation purposes. These cases do not contain any personally identifiable information.
References
Rout, A. K., Sumathi, D., Nandakumar, S. & Ponnada, S. Segmentation and classification of brain tumor using Taylor fire Hawk optimization enabled deep learning approach. Electromagn. Biol. Med. 43 (4), 337–358. https://doi.org/10.1080/15368378.2024.2421202 (2024).
BV, S. M. B., S, P. D. A. K., Mathivanan, S. K. & Shah, M. A. Efficient brain tumor grade classification using ensemble deep learning models. BMC Med. Imaging. 24 (1). https://doi.org/10.1186/s12880-024-01476-1 (2024).
Ilani, M. A., Shi, D. & Banad, Y. M. T1-weighted MRI-based brain tumor classification using hybrid deep learning models. Sci. Rep. 15 (1). https://doi.org/10.1038/s41598-025-92020-w (2025).
Ishfaq, Q. U. A. et al. Automatic smart brain tumor classification and prediction system using deep learning. Sci. Rep. 15 (1). https://doi.org/10.1038/s41598-025-95803-3 (2025).
Hassan, E. & Ghadiri, H. Advancing brain tumor classification: A robust framework using EfficientNetV2 transfer learning and statistical analysis. Comput. Biol. Med. 185 https://doi.org/10.1016/j.compbiomed.2024.109542 (2025).
Rivera, C. A. et al. Metabolic signatures derived from whole-brain MR-spectroscopy identify early tumor progression in high-grade gliomas using machine learning. J. Neurooncol. 170 (3), 579–589. https://doi.org/10.1007/s11060-024-04812-1 (2024).
Brändl, B. et al. Rapid brain tumor classification from sparse epigenomic data. Nat. Med. 31 (3), 840–848. https://doi.org/10.1038/s41591-024-03435-3 (2025).
Silva Santana, L. et al. Application of machine learning for classification of brain tumors: A systematic review and Meta-Analysis. World Neurosurg. 186 (e2), 204–218. https://doi.org/10.1016/j.wneu.2024.03.152 (2024).
Zhang, H. W. et al. Using machine learning to develop a stacking ensemble learning model for the CT radiomics classification of brain metastases. Sci. Rep. 14 (1). https://doi.org/10.1038/s41598-024-80210-x (2024).
Kusuma, P. V. & Reddy, S. C. M. Brain tumor segmentation and classification using MRI: modified Segnet model and hybrid deep learning architecture with improved texture features. Comput. Biol. Chem. 117 https://doi.org/10.1016/j.compbiolchem.2025.108381 (2025).
Reinecke, D. et al. Streamlined intraoperative brain tumor classification and molecular subtyping in stereotactic biopsies using stimulated Raman histology and deep learning. Clin. Cancer Res. 30, 3824–3836. https://doi.org/10.1158/1078-0432.CCR-23-3842 (2024).
Ni, J. et al. Machine-learning and radiomics-based preoperative prediction of Ki-67 expression in glioma using MRI data. Acad. Radiol. 31 (8), 3397–3405. https://doi.org/10.1016/j.acra.2024.02.009 (2024).
Khushi, H. M. T., Masood, T., Jaffar, A., Akram, S. & Bhatti, S. M. Performance analysis of state-of-the-art CNN architectures for brain tumour detection. Int. J. Imaging Syst. Technol. 34 (1). https://doi.org/10.1002/ima.22949 (Jan. 2024).
Khushi, H. M. T., Masood, T., Jaffar, A. & Akram, S. A novel approach to classify brain tumor with an effective transfer learning based deep learning model. Brazilian Arch. Biol. Technol. 67, 1–18. https://doi.org/10.1590/1678-4324-2024231137 (2024).
Khushi, H. M. T., Masood, T. & Naseer, I. Optimizer-aware deep learning for brain tumor classification: A study using AlexNet to EfficientNetB0. Preprint at https://doi.org/10.21203/rs.3.rs-6937303/v1 (2025).
Ishtaiwi, A. et al. Impact of Data-Augmentation on brain tumor detection using different YOLO versions models. Int. Arab. J. Inf. Technol. 21 (3), 466–482. https://doi.org/10.34028/iajit/21/3/10 (2024).
Acknowledgements
The Researchers would like to thank the Deanship of Graduate Studies and Scientific Research at Qassim University for financial support (QU-APC-2025).
Author information
Authors and Affiliations
Contributions
Conceptualization: Muhyeeddin Alqaraleh, Mowafaq Salem Alzboon, Abdullah Alourani. Methodology: Muhyeeddin Alqaraleh, Mohammad Subhi Al-Batah. Data Curation and Analysis: Muhyeeddin Alqaraleh, Abdullah Alourani. Model Development and Validation: Muhyeeddin Alqaraleh, Mohammad Subhi Al-Batah. Writing – Original Draft: Abdullah Alourani, Mowafaq Salem Alzboon. Writing – Review & Editing: Muhyeeddin Alqaraleh, Mohammad Subhi Al-Batah. Supervision and Project Administration: Muhyeeddin Alqaraleh, Abdullah Alourani. Funding Acquisition: Abdullah Alourani, Muhyeeddin Alqaraleh, Mowafaq Salem Alzboon. All authors have read and approved the final manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethics approval
The study involving retrospective anonymized MRI data from Jordanian hospitals was approved by the Institutional Review Board (IRB) at King Abdullah University Hospital, Jordan (Approval No. KAUH-2025-036).
Accordance with guidelines
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional research committee, national regulations, and the 1964 Helsinki Declaration and its later amendments.
Informed consent
A waiver of informed consent was granted by the IRB for the use of retrospective anonymized data.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Alqaraleh, M., Al-Batah, M.S., Alzboon, M.S. et al. Brain tumor detection with real-world predictions in Jordan hospitals. Sci Rep 16, 3321 (2026). https://doi.org/10.1038/s41598-025-33215-z
Received:
Accepted:
Published:
Version of record:
DOI: https://doi.org/10.1038/s41598-025-33215-z




