Introduction

Cancer, a major cause of non-accidental mortality in recent years, is a condition in which abnormal cell growth hinders the functioning of normal cells. Cancers are classified into many types according to the part of the body affected. Leukemia, one of the most common cancers, affects the bone marrow and blood1, resulting in abnormal, i.e. excessive or immature, production of White Blood Cells (WBC) that disturbs the proper functioning of the immune system. Based on the growth rate of the affected cells, leukemia is classified into two types, acute and chronic: acute leukemia exhibits a faster growth rate, whereas chronic leukemia grows comparatively slowly. Based on the type of cells affected, leukemia is further categorized as lymphocytic leukemia, where lymphocytes are affected, and myeloid leukemia, where monocytes and granulocytes are affected. The four major categories of leukemia are therefore Acute Lymphocytic Leukemia (ALL), Acute Myeloid Leukemia (AML), Chronic Lymphocytic Leukemia (CLL), and Chronic Myeloid Leukemia (CML)2. ALL is further categorized into Early Pre-B, Pre-B, and Pro-B ALL based on the precursor present. There are many causes for the occurrence of the disease, including genetic factors, radiation exposure, transmitted diseases, etc.

ALL, a major pediatric cancer, becomes dreadful if left untreated because of its rapid proliferation, affecting not only the blood but also other vital organs. According to the World Health Organization (WHO), 487,294 leukemia cases were reported in 2022, of which 227,206 were from Asia and 107,748 from Europe3. The worldwide mortality for the same year was 305,405 cases, with 158,144 in Asia and 63,839 in Europe4. Early diagnosis of ALL and its subtypes is paramount for effective treatment planning and leads to an increase in survival rate. Among the available detection tests, such as flow cytometry, cytogenetic analysis, fluorescence in situ hybridization, and immunophenotyping, microscopic examination of Peripheral Blood Smear (PBS) images is considered the standard initial screening technique5 due to its minimally invasive nature and cost efficiency. Manual examination of PBS, even when done by skilled experts, faces many challenges: limited availability of experts across the globe, opinion variation among experts leading to inconsistent results, and a process that is time consuming, labor intensive, and error prone6. Further, experts' opinion is highly influenced by the initial staining techniques and smear preparation methods, leading to misclassification. To overcome these drawbacks, an automated detection system is required for the early diagnosis of ALL, one that provides more accurate, faster, and consistent classification even on noisy input data while using comparatively inexpensive hardware with low computational complexity, thereby enabling effective treatment planning.

The key contributions of this work are as follows:

  • A novel segmentation algorithm, the Multilevel Hierarchical Marker-Based Watershed Algorithm, is proposed to obtain the desired ROI from preprocessed PBS images containing overlapping cells.

  • Handcrafted features are extracted from the segmented ALL images to enhance system performance.

  • A novel nature-inspired metaheuristic, the Enhanced Glowworm Swarm Optimization algorithm, is introduced to select optimal features, ensuring the removal of redundant and irrelevant features and elevating the model's efficiency.

  • A comparative analysis of selected popular machine learning classifiers using important evaluation metrics to assess their performance.

  • The developed system is benchmarked against other nature-inspired optimization algorithms (ABC, PSO, and EHO) to validate its effectiveness.

  • A machine learning (ML) based approach designed to handle limited quantities of medical data effectively while maintaining a balance between classification accuracy and computational complexity, deployable on lightweight devices and assisting medical practitioners across the globe.

The manuscript is organised into five sections. A review of the existing literature is presented in section two, and the proposed methodology and its underlying processes are detailed in section three. Results of the proposed model and the relevant analysis are exhibited in section four. The conclusion, along with future scope, is presented in section five.

Literature review

In the literature, a wide variety of methodologies have been proposed for the detection of leukemia. An automatic approach to classify leukemia proposed in7 utilized expectation maximization for segmentation, principal component selection for significant feature selection, and sparse representation for classification on the Acute Lymphoblastic Leukemia Image Database version 2 (ALL-IDB2) dataset. The system achieved an accuracy of 94% but used only a limited dataset and focused on binary classification. The automatic optical image analysis system put forward in8 employed a dataset from the department of hematology of Jimma Medical Center. The preprocessed images are segmented using k-means clustering, watershed and morphological operations, and classified by a Support Vector Machine (SVM). Analysis of the developed system on a publicly available dataset is missing.

An integrated system for leukemia detection developed in9 employed adaptive neuro-fuzzy inference, Principal Component Analysis (PCA), a Genetic Algorithm, Particle Swarm Optimization (PSO), and Group Method of Data Handling. The Adaptive Neuro-Fuzzy Inference System classified ALL and AML after extensive preprocessing. The major limitation of this model is its dependency on Complete Blood Count (CBC) values, which are prone to outliers that affect the model's accuracy. Sorayya Rezayi et al.10 used novel Convolutional Neural Networks (CNN) to classify leukemia and obtained 82% test accuracy on a dataset from a Codalab competition. Performance of the proposed binary classification system was compared with two other deep learning models, Visual Geometry Group (VGG16) and ResNet-50, and it was concluded that a larger dataset could improve system performance.

The immature leukocyte detection and classification system proposed in11 employed image format conversion, multi-Otsu thresholding, and morphological operations for segmentation, and Random Forest (RF) for classification, yielding accuracies of 92.99% and 93.45% for detection and classification respectively. The developed system focused on AML classification and failed to segment overlapping cells. The leukemia detection system developed in12 utilized the AlexNet model to detect AML and outperformed LeNet-5 with an accuracy of 88.9%. Zeinab et al.13 followed traditional Machine Learning (ML) procedures: preprocessing, segmentation using marker-controlled watershed with global-local contrast enhancement, and k-means clustering. An SVM used the fifty most relevant features from a diverse dataset to obtain an accuracy of 91.025%. The ALL detection system suggested in14 explored eight variants of EfficientNet to extract features and classified ALL with an ensemble of logistic regression, random forest, and SVM to obtain an accuracy of 98.5%. The ALL prediction system15 used ResNet and VGG16 to obtain features from the preprocessed images of the Chinese National Medical Centre (C-NMC) leukemia dataset. An ensemble soft voting classifier yielded an accuracy of 87.4%, whereas SVM outperformed it with an accuracy of 90% but failed to classify ALL subtypes. A system with three major phases16, preprocessing, feature extraction, and classification, was explored to classify normal and abnormal images of the C-NMC leukemia dataset. Hyperparameters of the CNN were optimized by a fuzzy approach to obtain an accuracy of 99.99%, but the time required by the system to provide accurate results was not explicitly reported. Automated systems with DenseNet20117 and DenseNet169 with a flipped attention block18 utilized for the classification of ALL-IDB2 images yielded accuracies of 94.6% and 97.94% respectively.

The DL-based feature engineering process put forward in19 compares the performance of Bayesian-optimized SVM and subspace discriminant ensemble learning classifiers with features from PSO, PCA, and hybrid PSO-PCA, and found that the Bayesian-optimized SVM trained with features from hybrid PSO-PCA performed best with an accuracy of 97.4%. A web-based platform20 employed ResNet 152 as the base learner for a weighted ensemble learning algorithm. Despite consuming more training time and increased computational overhead, the model achieved an accuracy of 99.95% in classifying the types of ALL. A diagnostic model21 utilized traditional image processing techniques to detect leukemia and a ResNet50-SVM hybrid model to classify the types of ALL. The model obtained an accuracy of 99%, given sufficient quantity and quality of images in the dataset.

Algorithms proposed for optimization have been inspired by different sources such as social behavior, physics, biology, chemistry, music, and sports. Bilal Alatas et al.22 proposed two intelligent optimization algorithms inspired by the laws of reflection and refraction, Ray Optimization (RO) and Optics Inspired Optimization (OIO). Evaluation of these methods on real-world engineering problems and benchmark functions provided evidence that the RO method outperformed the latter. The sports-based League Championship Algorithm23 was proposed to elevate the performance of teams competing over multiple weeks by enhancing their tactics over time. The physics-inspired OIO24 models a wavy mirror as the search space, where concave and convex mirrors are considered valleys and peaks; it handles local optima and slow convergence, thereby enhancing system performance. Three optimization algorithms25, Grey Wolf Optimization (GWO), OIO, and Chaos Based Optics Inspired Optimization (CBOIO), were compared on deception detection problems, and CBOIO showed higher efficacy than the other two. El-Sayed M et al.26 proposed the Greylag Goose Optimization algorithm, inspired by the V-shaped formation of geese during migration that reduces the air resistance experienced by the following birds. The model provided better results on different cases, including pressure vessel design and tension spring design. Mona Ahmed Yassen et al.27 developed a renewable energy source prediction system with the aid of ML and DL algorithms. A review of ML algorithms28 discussed the feasibility of improved public health forecasting and Zika virus monitoring.

Most of the machine learning based automated systems proposed in the literature focus on binary classification of leukemia and fail to classify its types, and many do not use publicly available datasets. Segmentation of overlapping cells has to be addressed in order to obtain a proper ROI and increase classification accuracy. Although some deep learning models are capable of yielding higher accuracy, they require a large amount of data, which is less feasible for medical image datasets such as leukemia. Further, deep learning methods are computationally expensive and resource hungry, making them less deployable in resource-limited environments. The effectiveness of optimization algorithms inspired by different phenomena is highly problem dependent; they can be slow in high-dimensional search spaces such as leukemia feature sets, and their search efficiency is reduced by the lack of an adaptive step size. Hence this work focuses on the development of an ML model that works on publicly available datasets to achieve high-accuracy multiclass classification by considering segmentation of overlapping cells and selection of optimal features by an enhanced swarm-based algorithm.

Proposed methodology

Development of a system for the automatic detection of leukemia involves a sequence of processes: dataset acquisition, preprocessing of the acquired dataset, segmentation, extraction of features from the segmented data, selection of the most important and relevant features, classification of the disease, and finally evaluation of the developed system. A pictorial representation of the processes involved in the proposed system is presented in Fig. 1.

Fig. 1. Steps involved in the development of an automatic system for the classification of leukemia present in blood microscopic images.

Dataset acquisition

Among the different methods available to determine the presence of leukemia, the microscopic view of PBS is chosen as the modality due to its minimally invasive nature and comparatively low cost and time requirements. The publicly available dataset employed for the proposed model contains 3256 PBS images obtained from 89 patients, of whom 25 are healthy and the remaining 64 are malignant cases. It was collected from the bone marrow laboratory of Taleqani Hospital in Tehran, Iran. Skilled laboratory staff of the hospital categorized images as benign based on the presence of hematogones, normal B-lymphocyte precursors, which do not demand any therapeutic intervention. The malignant images are further grouped into Early Pre-B, Pre-B, and Pro-B ALL images. The images, of 224 × 224 dimensions, were obtained with a Zeiss camera mounted on the microscope29. Table 1 outlines the dataset details.

Table 1 Details of the dataset with sample images for each class of ALL.

Pre-processing

Images in the dataset may contain noise due to a variety of reasons such as poor illumination, vibration, changes that occur during the initial staining of slides, and other environmental factors. This is handled by preprocessing: median filtering and Contrast Limited Adaptive Histogram Equalization (CLAHE) are deployed to elevate image quality and obtain effective results in the subsequent stages. The median filter reduces noise present in the images while preserving their edges. The filtered image is then subjected to a special variant of Adaptive Histogram Equalization, Contrast Limited AHE, to increase local contrast and enhance the visibility of important image features. Over-amplification of noise artifacts is avoided by this sequence of preprocessing steps. The effectiveness of the preprocessing is evident from the increase in the Peak Signal to Noise Ratio (PSNR). Figure 2 represents the results of the preprocessing module.

Fig. 2. Preprocessed images: (a) original, (b) median filter, (c) CLAHE.
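A minimal sketch of this preprocessing stage, assuming OpenCV, is given below; the kernel size, clip limit, tile grid, and file name are illustrative assumptions, since the exact settings are not listed here.

```python
import cv2
import numpy as np

def preprocess(bgr_image: np.ndarray) -> np.ndarray:
    # 1. Median filtering: removes impulse noise while preserving cell edges.
    denoised = cv2.medianBlur(bgr_image, ksize=3)

    # 2. CLAHE on the L channel of the LAB representation so that the colour
    #    of the stained cells is left untouched.
    lab = cv2.cvtColor(denoised, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    l_eq = clahe.apply(l)
    return cv2.cvtColor(cv2.merge((l_eq, a, b)), cv2.COLOR_LAB2BGR)

image = cv2.imread("pbs_sample.jpg")          # hypothetical file name
enhanced = preprocess(image)
print("PSNR:", cv2.PSNR(image, enhanced))     # quality check reported in the text
```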

Segmentation

The proposed novel segmentation method, the Multilevel Hierarchical Marker-Based Watershed Algorithm (MHMW), combines multilevel Otsu's thresholding, morphological operations, contour detection, and the watershed algorithm to obtain the desired region of interest. The sequence of steps involved in MHMW is detailed as follows.

Step 1: Multilevel thresholding.

This step is needed because the peripheral blood smear contains different constituent elements30. Otsu's method is employed to separate the given input image into classes based on pixel intensity levels. The classes are separated by the optimal threshold value (T) that maximizes the between-class variance.

$$Opt\;Th={argmax}_{T}\left(Var\left(T\right)\right)$$
(1)
$$Var\left(T\right)={P}_{1}\left(T\right)\cdot {P}_{2}\left(T\right)\cdot {\left({m}_{1}\left(T\right)-{m}_{2}\left(T\right)\right)}^{2}$$
(2)
where \({P}_{1}\left(T\right)\), \({P}_{2}\left(T\right)\) are the probabilities of regions 1 and 2, and \({m}_{1}\left(T\right)\), \({m}_{2}\left(T\right)\) are their corresponding means.

Multilevel thresholding is carried out by combining the effects of several binary-level thresholdings.

$${\omega }_{t}^{2}=\sum {\omega }_{k}^{2}$$
(3)
where \(k=1,2,\dots ,n\) represents the count of binary classes.

The image resulting from the above thresholding process still contains a few artifacts and noise components that need not be processed further.

Step 2: Morphological closing and contour detection.

Studies have found that particles with an area of less than 80 pixels are non-WBC and hence can be eliminated from the following processing steps31. The image is therefore cleaned by morphological operations32. Morphological closing, used to fill small gaps and smooth the obtained image, creates a uniform representation of the cells. Individual cells in the image are then detected with the help of contours, which represent the boundaries of the connected regions.

Step 3: Hierarchical marker selection watershed algorithm.

The image is then subjected to watershed segmentation. This topology-based segmentation is able to handle the overlapping cells present in PBS images. It uses markers as seed points and floods the basins until the basins corresponding to different markers meet at the watershed lines. The position of the markers is very important for effective extraction of the ROI33. The traditional watershed algorithm suffers from over-segmentation, which is addressed here by employing a modified watershed algorithm. Hierarchical marker selection is incorporated into the traditional watershed algorithm, with background and foreground markers defined explicitly to enhance segmentation quality. The distance transform computes the internal distances using the Euclidean metric.

$$De\left(k,l\right)=\sqrt{{\left({l}_{1}-{l}_{2}\right)}^{2}+{\left({m}_{1}-{m}_{2}\right)}^{2}}$$
(4)
where \({l}_{1},{l}_{2},{m}_{1},{m}_{2}\) represent the coordinate values of the two points.

The desired region of interest is extracted by the MHMW algorithm through a novel sequence of steps: a cleaned binary image is generated by multilevel thresholding followed by morphological operations, and the watershed algorithm is then applied with hierarchical marker selection, where markers are defined with respect to the foreground and background likelihoods using the distance transform and thresholding. Figure 3 shows the outcomes of the proposed segmentation module. The input image is converted into a binary image in which the different regions of the PBS are separated based on intensity levels. The segmented images, the final result of the proposed MHMW algorithm, show the ROI isolated from the background.

Fig. 3. Results of the proposed segmentation algorithm.
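The sketch below outlines the MHMW pipeline described above using OpenCV and scikit-image. The 80-pixel area filter follows the text; treating the darkest Otsu class as the nuclei, the kernel size, and the 0.5 distance-transform threshold for foreground markers are illustrative assumptions rather than the authors' exact settings.

```python
import cv2
import numpy as np
from skimage.filters import threshold_multiotsu

def mhmw_segment(bgr_image: np.ndarray) -> np.ndarray:
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)

    # Step 1: multilevel Otsu thresholding; keep the darkest class (assumed nuclei).
    thresholds = threshold_multiotsu(gray, classes=3)
    binary = (gray < thresholds[0]).astype(np.uint8) * 255

    # Step 2: morphological closing, then drop contours smaller than 80 px.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel, iterations=2)
    contours, _ = cv2.findContours(closed, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    mask = np.zeros_like(closed)
    for c in contours:
        if cv2.contourArea(c) >= 80:
            cv2.drawContours(mask, [c], -1, 255, thickness=-1)

    # Step 3: hierarchical markers from the distance transform, then watershed.
    sure_bg = cv2.dilate(mask, kernel, iterations=3)
    dist = cv2.distanceTransform(mask, cv2.DIST_L2, 5)
    _, sure_fg = cv2.threshold(dist, 0.5 * dist.max(), 255, cv2.THRESH_BINARY)
    sure_fg = sure_fg.astype(np.uint8)
    unknown = cv2.subtract(sure_bg, sure_fg)

    _, markers = cv2.connectedComponents(sure_fg)
    markers = markers + 1           # background marker becomes 1
    markers[unknown == 255] = 0     # unknown region to be flooded
    markers = cv2.watershed(bgr_image, markers)

    segmented = bgr_image.copy()
    segmented[markers <= 1] = 0     # keep only the separated cell regions
    return segmented
```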

Feature extraction

The characteristics of the blast cells are pivotal for effective detection of the classes of leukemia. Extracting the vital features improves the accuracy of the system by avoiding the use of redundant data34. Among the different categories of features considered in medical image processing, the distinctive features possessed by blast cells, namely morphological features, HSV (Hue, Saturation, Value) histogram, correlogram, geometrical features, frequency-based features, and Gray Level Co-occurrence Matrix (GLCM) features, are extracted from the segmented images. Figure 4 presents a detailed overview of the extracted features, with the number of features in each category shown in brackets, reaching a total of 320 features.

Fig. 4. Feature set extracted from PBS images.
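For illustration, the snippet below extracts a few of the feature groups named in Fig. 4 (geometric, HSV histogram, and GLCM texture) with OpenCV and scikit-image; the full 320-feature set covers further categories, and the bin counts and GLCM settings here are assumptions, not the authors' exact configuration.

```python
import cv2
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def extract_features(segmented_bgr: np.ndarray, cell_mask: np.ndarray) -> np.ndarray:
    feats = []

    # Geometric descriptors of the largest cell contour.
    contours, _ = cv2.findContours(cell_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    c = max(contours, key=cv2.contourArea)
    area, perimeter = cv2.contourArea(c), cv2.arcLength(c, True)
    circularity = 4 * np.pi * area / (perimeter ** 2 + 1e-6)
    feats += [area, perimeter, circularity]

    # HSV colour histogram (8 bins per channel), normalised over the cell mask.
    hsv = cv2.cvtColor(segmented_bgr, cv2.COLOR_BGR2HSV)
    for ch in range(3):
        hist = cv2.calcHist([hsv], [ch], cell_mask, [8], [0, 256]).ravel()
        feats += list(hist / (hist.sum() + 1e-6))

    # GLCM texture features on the grey-level image.
    gray = cv2.cvtColor(segmented_bgr, cv2.COLOR_BGR2GRAY)
    glcm = graycomatrix(gray, distances=[1], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    for prop in ("contrast", "homogeneity", "energy", "correlation"):
        feats += list(graycoprops(glcm, prop).ravel())

    return np.asarray(feats, dtype=np.float32)
```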

Feature selection

The effectiveness of the proposed multiclass classification model can be improved by reducing the irrelevant and redundant features. Enhanced Glowworm Swarm Optimization is utilized for feature selection because of its ability to adapt to a dynamic environment, and its decentralized behavior enhances the robustness and scalability of the model. Images of ALL exhibit a high degree of variability in their features due to cell irregularities, staining variations, etc. Further, the high-dimensional dataset shows variation in feature importance across images. The proposed optimization algorithm strives to maintain a balance between exploration and exploitation.

Enhanced glowworm swarm optimization

Glowworm Swarm Optimization (GSO) is a nature-inspired metaheuristic algorithm that relies on the bioluminescence behavior of glowworms35. Glowworms are members of Lampyridae, a family of beetles that can emit light naturally; this luciferin-based glow is used to attract mates and prey. The intensity of the glow changes from normal to high in accordance with the state of the glowworm. The local resolution range is the region within which the agents, i.e. the glowworms, have their view. In the search space, agents aim to occupy regions with larger objective values. Local decision domains are established when a neighbor with higher luciferin is detected, and the domains extend as the number of high-luciferin neighbors increases. An agent's position changes at each iteration, and its fitness is proportional to its luciferin intensity level. The process progresses through three phases: the Luciferin Update Phase (LUP), the Motion Phase (MP), and the Decision Threshold Phase (DTP)36. At the beginning of the process the glowworms are dispersed randomly.

Luciferin Update Phase (LUP): All agents of the optimization algorithm start with the same luciferin value \({lc}_{gw}\), even though it later varies based on environmental factors; gw indexes the agents in the search space. Agents tend to move towards neighbors with higher \({lc}_{gw}\). Decay of the glow is modeled by withdrawing a portion of the luciferin. At each iteration the luciferin value of glowworm \({lc}_{gw}\) is updated following the rule below.

$${lc}_{gw }\left(x+1\right)=\left(1-\rho \right){lc}_{gw }\left(x\right)+\gamma {j}_{gw }\left(x+1\right)$$
(5)
where \(\rho\) is the luciferin decay constant in the range 0 to 1, \(\gamma\) is the luciferin enhancement constant, and \({j}_{gw}\) is the objective function value of glowworm gw at that location.

The luciferin level \({lc}_{gw}\) measures the quality of the solution over time \(x\). It depends on the values of \(\rho\), \(\gamma\), and \({j}_{gw}\), which represent the glowworm's ability to forget old information, the scaling of the fitness function, and the objective function value at the glowworm's location, respectively. The agents then proceed to the next phase.

Motion Phase (MP): The motion pattern of the agents in the glowworm swarm is influenced at each iteration by the behavior of their neighbors. Each glowworm in the swarm uses a probabilistic rule to decide its migration towards the neighbor exhibiting the brighter glow; the greater the luciferin value, the greater the attraction. The probability \({P}_{gwn}\) of glowworm gw moving towards its neighbor n is given by Eq. (6).

$${P}_{gwn}=\frac{{L}_{n}\left(x\right)-{L}_{gw}\left(x\right)}{\sum_{h\in {k}_{gw}\left(d\right)}\left({L}_{h}\left(x\right)-{L}_{gw}\left(x\right)\right)}$$
(6)

Here \(n\in {k}_{gw}\left(d\right)\), where \({k}_{gw}\left(d\right)\) represents the set of neighbors of glowworm gw. The preference \({P}_{gwn}\) of a glowworm moving towards its neighbour is influenced by its own luciferin value \({L}_{gw}\left(x\right)\), that of the neighbor \({L}_{n}\left(x\right)\), and the luciferin values \({L}_{h}\left(x\right)\) of all neighbors h.

Decision threshold phase (DTP): The decision radius and the sensory radius associated with the agents are represented in Fig. 5. At each iteration the radius changes, enabling the search to reach the global optimum rather than a local optimum, so functions with multiple peaks can be handled effectively by glowworm swarm optimization. The glowworm positions are updated adaptively in accordance with Eq. (7).

Fig. 5. Glowworm position with decision and sensory radius.

$${t}_{gw}\left(x+1\right)={t}_{gw}\left(x\right)+s\left(i\right)\left(\frac{{t}_{n}\left(x\right)-{t}_{gw}\left(x\right)}{\left\| {t}_{n}\left(x\right)-{t}_{gw}\left(x\right)\right\| }\right)$$
(7)

The current position of the glowworm \({t}_{gw}\left(x\right)\), i.e. the starting point, is updated to \({t}_{gw}\left(x+1\right)\) based on the Euclidean distance between its current position and that of its neighbor \({t}_{n}\left(x\right)\), and the step size \(s\left(i\right)\).

After random initialization of the agents with values in the 0–1 range, the fitness of each agent is computed and the best value is determined. The glowworms in the swarm are then moved, and the fitness corresponding to the new position is evaluated and compared with the previous best fitness value and location; the stored values are replaced if the new solution is better than the previous state. The performance of GSO is enhanced by adaptively varying the step size \(s\left(i\right)\) in accordance with Eq. (8).

$$s\left(i\right)={s}_{0}\left(1-\frac{i}{I}\right)$$
(8)
where \({s}_{0}\) is the initial step size, \(i\) the current iteration count, and \(I\) the maximum number of iterations.

Iterations continue until the termination condition is reached, yielding the best solution. Hyperparameters of EGSO such as the number of agents, maximum iterations, and the luciferin decay and enhancement constants are tuned in the ranges 20–40, 50–150, 0.3–0.5, and 0.5–0.7 respectively. Careful tuning of the number of agents and iterations maintains a balance between finding better features and execution time, and the precisely tuned values facilitate a dynamic search. The step size and radius range are made adaptive, promoting exploration in the initial stages and exploitation in the later stages. The flow of the EGSO implementation is represented in Fig. 6.

Fig. 6. Flow diagram of enhanced glowworm swarm optimization.
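A compact sketch of this feature-selection loop is given below, assuming NumPy and scikit-learn. It follows Eqs. (5)–(8): the luciferin update, the probabilistic move towards a brighter neighbour, and the adaptive step size, with SVM classification accuracy as the fitness, as stated later in the manuscript. The decision-radius mechanism is simplified to "all brighter agents" for brevity, and the parameter values are illustrative picks from the reported tuning ranges.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def fitness(position, X, y, threshold=0.5):
    mask = position > threshold                      # continuous position -> feature subset
    if not mask.any():
        return 0.0
    return cross_val_score(SVC(kernel="rbf"), X[:, mask], y, cv=3).mean()

def egso_select(X, y, agents=30, iters=100, rho=0.4, gamma=0.6, s0=0.1, seed=0):
    rng = np.random.default_rng(seed)
    dim = X.shape[1]
    pos = rng.random((agents, dim))                  # random initialisation in [0, 1]
    luc = np.full(agents, 5.0)                       # equal initial luciferin
    best_pos, best_fit = pos[0].copy(), -np.inf

    for it in range(iters):
        fit = np.array([fitness(p, X, y) for p in pos])
        luc = (1 - rho) * luc + gamma * fit          # Eq. (5): luciferin update
        if fit.max() > best_fit:
            best_fit, best_pos = fit.max(), pos[fit.argmax()].copy()

        step = s0 * (1 - it / iters)                 # Eq. (8): adaptive step size
        for g in range(agents):
            nbrs = np.where(luc > luc[g])[0]         # brighter neighbours (radius simplified)
            if nbrs.size == 0:
                continue
            p = luc[nbrs] - luc[g]                   # Eq. (6): move probabilities
            n = rng.choice(nbrs, p=p / p.sum())
            d = pos[n] - pos[g]
            pos[g] += step * d / (np.linalg.norm(d) + 1e-12)   # Eq. (7): position update
            pos[g] = np.clip(pos[g], 0.0, 1.0)

    return best_pos > 0.5                            # boolean mask of selected features

# selected = egso_select(X_train, y_train)           # X_train: (n_samples, 320) feature matrix
```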

Performance comparison

The effectiveness of the proposed model is compared with three other popular nature-inspired optimization algorithms: Particle Swarm Optimization, Artificial Bee Colony Optimization, and Elephant Herd Optimization.

Particle swarm optimization

Particle swarm optimization, one of the most widely used evolutionary computation methods, is based on the flocking behavior of birds searching for food without a leader. The birds of the swarm, represented as particles in the population, are first initialized randomly, and each particle is associated with a velocity and a position. The particles aim to find the best solution in the search space and register it as their own best position (PBest); the global best (GBest) is the best solution found by the whole swarm37.
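For reference, a single update step of the canonical PSO velocity and position rules might look as follows; the inertia weight and acceleration constants are illustrative, as they are not listed in the manuscript.

```python
import numpy as np

def pso_step(pos, vel, pbest, gbest, w=0.7, c1=1.5, c2=1.5, rng=None):
    rng = rng or np.random.default_rng()
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)  # velocity update
    return pos + vel, vel                                              # position update
```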

Artificial bee colony optimization

The foraging behavior of honey bees for nectar is exploited to determine the optimum solution in the search space. The self-organizing, intelligent behavior of three kinds of bees, scout, onlooker, and employed bees, inspired this algorithm. The food source count and its abandonment correspond to the solution count and limit, whereas the maximum generation count and the position and quantity of a food source correspond to the maximum cycle count and the fitness respectively38. In the employed bee phase, nearby food sources having high nectar quality and quantity are determined. The solution with the highest probability is considered best in the onlooker bee phase, and once a food source becomes exhausted the algorithm moves into the scout bee phase, repeating the searching process. ABC algorithms perform well in exploration but not in exploitation.

Elephant Herd optimization algorithm

Elephants, huge social creatures, live in groups containing many clans (groups of females and their calves) headed by a matriarch. Global optimization is achieved by mimicking the herding nature of elephants through clan-updating and separating operators. The position of every elephant in a clan other than the fittest one is updated39, and individuals with poor fitness are removed by the separating operator, which is inspired by the departure of male elephants from the clan once they attain puberty.

Classification

Classifiers play a vital role in categorizing the given leukemia images. The potency of a classifier lies in its ability to provide highly accurate results with reduced computational cost and error. The selected features are given to the most popular machine learning classifiers, Decision Tree (DT), RF, Multi-Layer Perceptron (MLP), kernels of SVM, and Naive Bayes (NB), to categorize the images into the benign, Early, Pro-B, and Pre-B types of acute lymphocytic leukemia.

Decision Tree

Classification by the Decision Tree proceeds by recursively partitioning the available data and building a predictive model in each partition40. The final decision of the algorithm is obtained by tracing a guided path through the hierarchical structure, represented graphically41. The most vital features are assigned to the nodes of the tree, which bifurcates by making decisions that focus on maximizing classification accuracy.

Random Forest

Random Forest, an ensemble learning algorithm, relies on majority voting over the findings of its decision trees. It can accomplish exemplary performance on multiclass classification problems42. The problem of overfitting is effectively handled by this algorithm, in addition to its capability of handling high-dimensional datasets. The generalization capability and classification accuracy are boosted by introducing randomness in the selection of samples and feature subsets43.

Multilayer perceptron

A Multilayer Perceptron is a neural network containing an input layer, a few hidden layers, and an output layer connected in a feed-forward fashion. Each node receives information from the preceding layer and feeds the nodes in the following layer. Correlations in the given dataset can be effectively processed by the hidden layers. Classification accuracy relies mainly on the network structure and training methodology44. The network's computation is carried out by Eq. (9).

$$a=\varnothing \left(\sum_{i}\left({W}_{i}{X}_{i}+b\right)\right)$$
(9)
where \({X}_{i}\) is the input, \({W}_{i}\) the weight, \(\varnothing\) the activation function, and \(b\) the bias.

Support vector machine

The support vector machine, a non-parametric and non-linear classifier, classifies the given dataset by settling on a hyperplane, an optimal boundary that strives to maximize the margin between the different classes of data45. The classification process involves various kernels, linear, polynomial, Radial Basis Function (RBF), and sigmoid, for projecting nonlinear data into a high-dimensional space. A wide range of hyperplanes can separate the designated data, but SVM selects the one that imparts the maximum margin46. The verdict on class separation is based on Eq. (10).

$$\rho =\begin{cases}0, & {w}^{T}x+\varphi <0\\ 1, & {w}^{T}x+\varphi \ge 0\end{cases}$$
(10)
where \(\rho\) is the predicted class, \(\varphi\) the bias, \(x\) the new instance, and \(w\) the weight vector.

Naive Bayes

Classification by Naive Bayes relies on Bayes' theorem to estimate the probability of each outcome47. This simple yet powerful algorithm performs classification under the assumption that the features are independent of each other; the existence or nonexistence of a particular feature does not influence the probability estimate in this statistical algorithm48. The classification accuracy of the algorithm does not shrink even in the presence of irrelevant features. Estimation is based on Eq. (11). Table 2 lists the hyperparameter values of the selected classifiers.

Table 2 Hyper parameter values utilized in the chosen classifiers.
$$p\left(x\mid y\right)=\frac{p\left(y\mid x\right)p\left(x\right)}{p\left(y\right)}$$
(11)
where \(p\left(x\right)\) is the prior probability, \(p\left(y\right)\) the probability of y, \(p\left(x\mid y\right)\) the posterior probability, and \(p\left(y\mid x\right)\) the likelihood.
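A sketch of this classification stage, assuming scikit-learn, is shown below; the hyperparameter values are placeholders standing in for those listed in Table 2, and X_train, X_test, y_train, y_test denote the EGSO-selected feature matrices and the four-class labels.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

classifiers = {
    "Decision Tree": DecisionTreeClassifier(max_depth=10),
    "Random Forest": RandomForestClassifier(n_estimators=100),
    "MLP": MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=500),
    "SVM (linear)": SVC(kernel="linear"),
    "SVM (poly)": SVC(kernel="poly", degree=3),
    "SVM (RBF)": SVC(kernel="rbf"),
    "SVM (sigmoid)": SVC(kernel="sigmoid"),
    "Naive Bayes": GaussianNB(),
}

# Train each classifier on the selected features and report test accuracy.
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    print(f"{name}: {accuracy_score(y_test, clf.predict(X_test)):.4f}")
```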

Result and discussion

The evaluation of the proposed automated system has been carried out with the aid of various benchmark metrics. For the multiclass classification of ALL, the dataset was split into two parts, 80% of the data for training the proposed model and the remaining 20% for testing. The dataset is imbalanced; with an imbalanced dataset the model may learn to ignore the minority classes because of data scarcity, leading to biased results. This is handled by oversampling, enabling the model to generalize over all classes. The increase in data volume raises the computation time, creating a tradeoff, but this is effectively reduced by EGSO's feature selection. The Early Pre-B class, having 788 training images, is the majority class, and the minority classes are oversampled to improve model learning; 652 images are used for testing the proposed model. The most popular ML algorithms, Decision Tree, Random Forest, Multilayer Perceptron, Naive Bayes, and four kernels of SVM (linear, polynomial, radial basis function, and sigmoid), are employed for classification. The confusion matrix and Receiver Operating Characteristic (ROC) curve of the classifiers before optimal feature selection by EGSO are given in Fig. 7.

Fig. 7. Confusion matrix and ROC curve of classifiers without EGSO.
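The split-and-oversample step described above could be realised as follows, assuming scikit-learn and imbalanced-learn's RandomOverSampler; the manuscript does not name the specific oversampling implementation, and features and labels are assumed variable names.

```python
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import RandomOverSampler

# 80/20 stratified split of the extracted feature matrix and the four ALL classes.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.20, stratify=labels, random_state=42)

# Oversample only the training portion so the test set stays untouched.
X_train, y_train = RandomOverSampler(random_state=42).fit_resample(X_train, y_train)
```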

The confusion matrix, one of the most powerful assessment tools, and the quantities calculated from its entries are exploited to judge the efficiency of the proposed automated detection and classification system. The True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN) values of the chosen ML classifiers for the multiclass task are exhibited in Table 3, where each entry corresponds to one of the classes of ALL. For instance, for the decision tree classifier, a TP value of 101 indicates that 101 benign images are correctly predicted as benign. Large values for true predictions and small values for false predictions indicate that a classifier is more accurate in classifying the different classes of ALL. RF performs better than the other classifiers, whereas the sigmoid-kernel SVM performs worst.

Table 3 True and False prediction values without EGSO algorithm for feature selection.

The confusion matrix and ROC curve of the proposed model, where the most vital features are selected with the aid of EGSO, are presented in Fig. 8. The entries of the confusion matrix provide evidence of a marked increase in the correct prediction of benign and malignant classes and a decrease in false predictions. A One-vs-Rest approach is used to obtain the receiver operating characteristic curves of the multiclass classifiers, with Area Under Curve (AUC) values ranging from 0.75 to 1.

Fig. 8. Confusion matrix and ROC curve of classifiers with EGSO.

The TP, TN, FP, and FN values of the proposed model using the MHMW algorithm and EGSO are represented in Table 4. A few instances of type 1 and type 2 misclassification errors occurred during classification, with the model producing a minimal number of False Positives and False Negatives. In multiclass classification, an image incorrectly classified as any of the other classes is marked as FP, and the model's inability to recognize an image belonging to a particular class is counted as FN. The proposed model effectively handles the root causes, noisy input data and the presence of redundant or irrelevant features, resulting in significant error reduction. All the algorithms show an increase in true predictions and a decrease in false predictions, with Random Forest providing the highest number of true and the lowest number of false predictions, leading to effective treatment planning.

Table 4 True and False prediction values with EGSO algorithm for feature selection.

An exhaustive comparison of the classifiers with and without the use of EGSO in terms of accuracy and precision is represented in Fig. 9. The F1 score and kappa value comparison of the classifiers with and without EGSO is exhibited in Fig. 10, and Fig. 11 presents the comparative analysis of the Matthews correlation coefficient (MCC) and AUC ROC for the selected classifiers. Random Forest exhibited the highest performance, with accuracy, precision, and F1 score of 98.23%, 98.25%, and 98.22%, and Kappa, MCC, and AUC ROC values of 97.58%, 97.59%, and 0.99 respectively.

Fig. 9. Comparison of classifiers with and without EGSO with respect to accuracy and precision.

Fig. 10. Comparative analysis of F1 score and kappa values of classifiers with and without EGSO.

Fig. 11. Comparison of MCC and ROC AUC for classifiers with and without EGSO.

Based on their accuracy, the classifiers are ranked from best to least as Random Forest (98.23%) > MLP (97.43%) > SVM with linear kernel (97.35%) > SVM with RBF kernel (95.24%) > SVM with polynomial kernel (94.97%) > Decision Tree (93.55%) > Naive Bayes (90.21%) > SVM with sigmoid kernel (67.97%). Even though the sigmoid kernel of SVM performs the worst, all of its performance parameters increased, reflecting the impact of EGSO in choosing the features.

The increase in accuracy is evidence of the correctness of the model in multiclass classification. The high precision (0.9825) shows that the positive cases identified by the model are truly positive, facilitating proper treatment planning. The harmonic balance between precision and recall in the proposed model is demonstrated by the high F1-score (0.9822). The high MCC (0.9759) and Kappa (0.9758) values validate the model's performance in handling false cases, avoiding unnecessary panic among healthy individuals and ensuring proper treatment for leukemia patients according to the affected class. The AUC ROC values, ranging from 0.86 to 0.99, show that all the classifiers have strong discrimination power in separating the positive and negative classes. From the comparison graphs it is inferred that all chosen classifiers show improved performance when EGSO is used for feature selection, demonstrating its ability to select the most vital features. The fitness of the metaheuristic algorithm is evaluated using the classification accuracy of the support vector machine; hence the decision tree and random forest classifiers exhibit only slight variation in their results with and without feature selection. The developed model slightly under-performs, leading to misclassification, when the SVM sigmoid kernel is used to classify the high-dimensional leukemia data; stacking the different SVM kernels is proposed for further error reduction.

The specificity, Jaccard index, and error rate comparison of the selected classifiers in the task of classifying the four classes of ALL, with and without EGSO, is represented in Fig. 12. The increase in specificity and Jaccard index for all classifiers after the involvement of EGSO proves their ability to identify the correct ALL classes, and the enhancement of the model's performance is further confirmed by the reduction in the classifiers' error rates.

Fig. 12. Specificity, Jaccard index, and error rate comparison of classifiers with and without EGSO.
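The reported metrics can be reproduced from the predictions as sketched below, assuming scikit-learn; since scikit-learn offers no direct multiclass specificity function, specificity is derived per class from the confusion matrix, and best_model is an assumed variable for the trained classifier.

```python
import numpy as np
from sklearn.metrics import (confusion_matrix, cohen_kappa_score,
                             matthews_corrcoef, jaccard_score, accuracy_score)

def per_class_specificity(y_true, y_pred):
    cm = confusion_matrix(y_true, y_pred)
    tn = cm.sum() - cm.sum(axis=0) - cm.sum(axis=1) + np.diag(cm)  # TN per class
    fp = cm.sum(axis=0) - np.diag(cm)                              # FP per class
    return tn / (tn + fp)

y_pred = best_model.predict(X_test)                    # e.g. the Random Forest
print("Error rate :", 1 - accuracy_score(y_test, y_pred))
print("Kappa      :", cohen_kappa_score(y_test, y_pred))
print("MCC        :", matthews_corrcoef(y_test, y_pred))
print("Jaccard    :", jaccard_score(y_test, y_pred, average="macro"))
print("Specificity:", per_class_specificity(y_test, y_pred).mean())
```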

A game theory based approach, SHapley Additive exPlanations (SHAP)49, is explored to ensure fairness, interpretability, and consistency of the model. It assigns an importance score to each feature, facilitating both global and local views and ensuring transparency and trust in the diagnostic results. Figure 13 exhibits the top 20 features, arranged in increasing order of their impact on the model's performance. High and low feature values are represented by red and blue respectively, and their spread indicates the model's predicted probability. Feature 6, with a wide spread, has the highest importance, followed by feature 151.

Fig. 13. Feature importance by SHAP value.
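A minimal SHAP usage sketch is given below, assuming the shap package, a shap version whose TreeExplainer returns one array of SHAP values per class, and a trained Random Forest held in the assumed variable rf_model; it produces a beeswarm summary similar in spirit to Fig. 13.

```python
import shap

explainer = shap.TreeExplainer(rf_model)           # rf_model: trained Random Forest
shap_values = explainer.shap_values(X_test)        # one array of SHAP values per ALL class
# Beeswarm summary for one class (index 0 here), showing the 20 most influential features.
shap.summary_plot(shap_values[0], X_test, max_display=20)
```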

A comparison of the proposed model with the three chosen nature-inspired algorithms is represented in Table 5. All the metaheuristic methods specified in the manuscript run for up to 100 iterations on the identical fitness function, with 30 candidate solutions in each iteration, and the fitness of all the optimization algorithms is evaluated using the classification accuracy of SVM. From the 320 extracted features, EGSO selected the 153 most vital features while consuming considerably less optimization time (511 s). RF provided the best performance among the selected classifiers under all the chosen optimization algorithms, with EGSO providing the highest accuracy.

Table 5 Performance comparison of optimization algorithm.

In a large-scale application, computational cost becomes a vital parameter. For the proposed model it depends on various factors such as the number of iterations, population size, and data dimensionality. The other optimization algorithms, ABC and EHO, incorporate elaborate search processes requiring more iterations, whereas the proposed EGSO uses adaptive tuning to reduce the iteration count. Even though this increases the per-iteration cost compared with PSO, it is compensated by faster convergence. Optimization algorithms exhibit a tradeoff between accuracy and optimization time: ABC and EHO need more iterations to produce accurate results, and PSO facilitates faster convergence at the risk of premature convergence. The proposed model strives to balance optimization time and quality of feature selection. The comparative performance of the metaheuristic algorithms and the ROC curves are represented in Fig. 14 and Fig. 15 respectively, where the proposed method selected the minimum number of optimal features in comparatively less time.

Fig. 14. Comparison of metaheuristic algorithms.

Fig. 15. (a), (b), and (c) represent the ROC curves of classifiers with PSO, ABC, and EHO algorithms for feature selection respectively.

The performance of the proposed system is validated on an additional dataset, ALL-IDB2, consisting of 260 images of 257 × 257 resolution and 24-bit color depth, of which 130 are normal and the remaining 130 are malignant50. The evaluation metric values for binary classification, presented in Table 6, show an enhancement in the model's behavior with the utilization of EGSO. RF and SVM with RBF kernel produced the highest classification accuracy, followed by DT and SVM with polynomial kernel.

Table 6 Performance comparison of the proposed model with ALL-IDB2 dataset.

Table 7 presents a comparative analysis of the proposed system with models presented in the literature. To maintain fairness, only literature using the same dataset has been selected for comparison. For both binary and multiclass classification, the proposed model gives higher accuracy than most of the other models owing to effective feature selection. A few DL models20,21 exhibited slightly higher accuracy at the expense of increased computational complexity and expensive hardware requirements, and they focused on a single dataset without validating results on other datasets. Further, the color grouping method used in21 has the potential limitation of requiring a sufficient number of high-quality RGB images; it is also mentioned that the model's efficiency will drop if this requirement is not satisfied. The proposed system, tested on two different datasets, strives to maintain the balance between accuracy and computational complexity even with a limited amount of data.

Table 7 Comparative analysis of the proposed model.

For real-world deployment the proposed model should generalize well. Even though many hospitals across the globe face data sharing constraints, the results of the developed system have been validated on two different publicly available datasets. Further, the quality of the input images differs owing to a variety of factors such as microscope settings, staining variations, and deviations in imaging device parameters, but the proposed model is able to handle this through its preprocessing techniques. To facilitate the integration of the proposed system with existing diagnostic tools, ML techniques are preferred, since they do not require expensive hardware and are comparatively less computationally complex. One potential limitation of the model is the use of a single modality for diagnosis. Its performance could be further enhanced by multimodal data fusion techniques, where data from different modalities such as CBC, next generation sequencing, chromosomal abnormalities, and gene expression data are combined to obtain more discriminative features as well as to increase the model's accuracy and robustness to noise. Fusion can be either early or late, depending on whether the raw data from the different modalities or their classification results are combined.

Conclusion

Early and effective detection of the appropriate class of Acute Lymphocytic Leukemia leads to proper treatment planning and speedy recovery. The proposed model utilized 3256 peripheral blood smear images acquired from 89 patients. Image enhancement was carried out with a sequence of preprocessing steps, a median filter followed by contrast limited adaptive histogram equalization. The images were then segmented using the MHMW algorithm, which facilitates extraction of the ROI even in the presence of overlapping cells, and the most robust features from the pool of 320 extracted features were selected by the Enhanced Glowworm Swarm Optimization algorithm. The proposed model aimed at multiclass classification of the Benign, Early, Pro-B, and Pre-B classes using the most popular machine learning classifiers. The ability of the classifiers, analyzed with benchmark performance metrics such as accuracy, precision, F1 score, kappa, MCC, AUC-ROC, specificity, Jaccard index, and error rate, shows that Random Forest yields the best performance with an accuracy of 98.23% compared to the other selected algorithms. Most of the classifiers achieved enhanced multiclass classification with the involvement of EGSO. Comparison with three popular nature-inspired optimization algorithms, PSO, ABC, and EHO, proved that the proposed high-performance model can be effectively used in the early screening of the types of ALL.

Future scope

In future, the developed system can be extended to classify other types of leukemia, AML, CLL, CML, and their subtypes, by performing multi-task learning, enabling the model to learn features common across the types while optimizing the classification tasks independently. Further, the model has to be trained to learn invariant features, making it robust to variations in multi-institutional datasets. It is also planned to improve system performance through multimodal data analysis.