Abstract
The difficulty of selecting features efficiently in histopathology image analysis remains unresolved. Furthermore, the majority of current approaches have approached feature selection as a single objective issue. This research presents an enhanced multi-objective whale optimisation algorithm-based feature selection technique as a solution. To mine optimal feature sets, the suggested technique makes use of a unique variation known as the enhanced multi-objective whale optimisation algorithm. To verify the optimisation capability, the suggested variation has been evaluated on 10 common multi-objective CEC2009 benchmark functions. Furthermore, by comparing five classifiers in terms of accuracy, mean number of selected features, and calculation time, the effectiveness of the suggested strategy is verified against three other feature-selection techniques already in use. The experimental findings show that, when compared to the other approaches under consideration, the suggested method performed better on the assessed parameters.
Introduction
In disease diagnosis, pathologists perform microscopic examinations of histopathological samples to identify signs of infection. During this analysis, the primary focus is on tissue structure, cell count, and cell shape. Moreover, this manual examination demands expertise, which makes it an expensive, subjective, and tedious process1. Consequently, its automation is fundamental for quick and impartial diagnosis2. To do this, histopathological images are captured through microscope-mounted cameras and then analyzed by computer-assisted histopathological image analysis methods. Figure 1 illustrates some sample images of different tissues, taken at 40x magnification level3. The intricate structure of histopathological images presents a challenging environment, even for classification tasks. Therefore, this paper presents a novel method for the efficient classification of histopathological images.
Representative histopathological images of different tissues, taken at 40x magnification level3.
Further, the success of any classification-based method is predominantly dependent on the quality of features4. In the literature, numerous methods are presented to efficiently extract the features, which are broadly classified as traditional and learning-based. The first category corresponds to feature descriptors which are extracted based on statistical computation5. Scale-invariant feature transform (SIFT)6, histogram of oriented gradients (HOG)7, and speeded-up robust features (SURF)8 are some of the common examples of traditional feature extraction methods. These methods are quite effective for natural image analysis. However, they produce a large number of irrelevant features in a complex environment. Moreover, features generated by traditional methods are generally non-transferable. In contrast, learning-based methods employ various machine learning models to obtain features9. Deep neural network (DNN) models, auto-encoders, and restricted Boltzmann machines are some of the commonly used models. The literature shows that DNN-based solutions are effective in analyzing complex and diverse environments10, like histopathological images. Masci et al.11 presented a feature extraction method by initializing a convolutional neural network (CNN) with a convolutional auto-encoder. Xu et al.9 extracted features from stromal and epithelial tissue by employing a deep CNN. A comprehensive survey on various DNN-based feature extraction methods can be found in12,13. Generally, DNN-based feature extraction methods find global features by learning directly from low-level features, resulting in the generation of better feature descriptors. However, these feature descriptors may include redundant and irrelevant features, which will degrade the performance of the classifier14,15,16,17. Therefore, this paper presents a new feature selection method for efficiently classifying histopathological images.
In general, the selection of relevant features is a combinatorial optimization problem, as there are \(2^n\) possible feature subsets for n features18,19. When the feature space is large, an exhaustive search for the correct combination of features is impractical. To speed up this search, researchers have employed meta-heuristic-based solutions20,21,22. Generally, swarm-based algorithms are optimization models that simulate nature’s behaviour mathematically23. In the literature, these algorithms have successfully solved various complex and non-linear real-world problems in different domains such as biomedical image analysis, wireless sensor networks, software engineering, sentiment analysis, etc.24,25,26,27,28,29. Some of the widely used meta-heuristic algorithms are differential evolution30, genetic algorithm31, particle swarm optimization32, gravitational search algorithm33, artificial bee colony34, cuckoo search35, and military-dog-based algorithm36. Mostly, meta-heuristic algorithms optimize a single objective function37,38,39. However, this presents only a single perspective for solving the problem40,41,42. In contrast, multi-objective meta-heuristic algorithms are better, as multiple criteria are considered while performing optimization43. In the literature, researchers have presented several algorithms for solving multi-objective optimization problems. Some of the widely used methods are the Pareto-archived evolution strategy (PAES), the non-dominated sorting genetic algorithm (NSGA-II), and the strength Pareto evolutionary algorithm (SPEA)44. Abdollahzadeh and Gharehchopogh45 employed three optimization algorithms to select the best feature set. In their approach, the first solution multiplies the Harris Hawks Optimization (HHO) algorithm, the second solution multiplies the Fruitfly Optimization Algorithm (FOA), and the third solution hybridizes these two solutions to improve the efficacy.
Li et al.46 introduced a multi-objective binary grey wolf optimization for feature selection using Pearson correlation-guided mutation, uniform initialization, and three distinct mutation strategies for efficient search. Jiao et al.47 presented a multiform framework for multiobjective feature selection, leveraging auxiliary single-objective tasks in a multitask environment to enhance evolutionary search by utilizing promising feature subsets. Xue et al.48 proposed a multi-objective evolutionary algorithm with interval-based initialization and self-adaptive crossover for large-scale feature selection, improving initial population distribution and reducing decision space similarity. Bai et al.49 introduced a joint multiobjective optimization approach for high-dimensional data classification using a neural network classifier and a non-iterative training algorithm, ensuring good performance and fast learning. Zhang et al.50 proposed a multi-objective optimization algorithm combining Equilibrium Optimizer (EO) and NSGA-III for feature selection in high-dimensional data. It uses S-shaped, V-shaped, and U-shaped transfer functions for binary coding and optimizes the binary search space through an external archive and clustering strategy. Recently, Aziz et al.51 presented the multi-objective whale optimization algorithm (MOWOA), which mimics the hunting behavior of humpback whales52. MOWOA consists mainly of three phases: prey encircling, bubble-net attacking strategy, and searching for new prey. Although MOWOA has shown a number of merits, such as optimal Pareto fronts and balanced exploration and exploitation, it sometimes suffers from poor population diversity, resulting in reduced exploration. Therefore, this paper introduces a new variant of MOWOA, improved multi-objective WOA (IMOWOA), to improve population diversity.
The contributions of this paper are twofold:
1. Novel variant of multi-objective WOA: A novel variant of the multi-objective WOA is proposed, termed improved multi-objective WOA (IMOWOA). In the proposed IMOWOA, the archive set is partitioned into different priority sets based on non-dominated sorting and crowding distance. This prioritization ensures better management of solutions, leading to improved population diversity and enhanced exploration capabilities.
2. New feature-selection method: A new feature-selection method is developed, which uses IMOWOA to select optimal features for the efficient classification of histopathological images. The performance of IMOWOA is validated on ten CEC2009 multi-objective benchmark problems, including bi-objective and tri-objective test problems. To compare the results, three parameters, namely maximum spread (MS), inverted generational distance (IGD), and spacing (SP), are considered. The results are validated against three existing multi-objective meta-heuristic algorithms: multi-objective evolutionary algorithm (MOEA/D), multi-objective particle swarm optimization (MOPSO), and MOWOA.
The proposed feature-selection method is investigated on a publicly available histopathological dataset with four tissue classes: epithelial, connective, muscular, and nervous. For comparison, three existing feature selection methods are considered. The features selected by the considered methods are classified using five well-known classifiers: LDA, SVM, kNN, ZeroR, and RF. The results are compared in terms of accuracy, mean number of selected features, and computational time.
The remainder of the paper is organized as follows. Section “Multi-objective whale optimization algorithm (MOWOA)” briefly describes the multi-objective WOA. The proposed approach, along with the proposed variant, is illustrated in section “Proposed method”. Section “Results and discussion” discusses the experimental analysis of the IMOWOA and the proposed method. Finally, Section “Conclusion” presents the conclusion of the paper.
Multi-objective whale optimization algorithm (MOWOA)
MOWOA is a multi-objective version of WOA52, which is inspired by humpback whales. Whales are the largest animals on Earth, and the WOA literature notes that they have about twice as many spindle cells as adult humans. The optimization process is designed using the hunting analogy of the whales. There are mainly three phases of hunting: encircling the prey, bubble-net attacking strategy, and searching for new prey. The conceptual and mathematical descriptions of these phases are provided below.
1. Encircle the prey: This is the first phase of hunting. As the location of the prey in the search domain is initially unknown, WOA treats the current best position as the overall best solution and updates its further movement accordingly, as per Eq. (1):
$$\begin{aligned} \vec {X}(t+1)=\vec {X}_b(t) - (2\vec {a} \cdot \vec {r} - \vec {a}) \cdot \mid 2 \cdot \vec {r} \cdot \vec {X}_b(t) - \vec {X}(t)\mid \end{aligned}$$(1)
where \(\vec {X}_b(t)\) is the best position at iteration t, \(\vec {r}\) is a random vector with components in (0,1), the components of \(\vec {a}\) decrease from 2 to 0 over the iterations, and \(\vec {X}(t+1)\) represents the location of the whale at the next step.
2. Bubble-net attacking strategy: In this phase, along with the encircling strategy, the spiral movement of the whales is also considered, so that the prey is searched in an equally likely manner. The probability (\(p_r\)) of using the spiral or the shrinking encircling phase is \(50\%\) each. The mathematical formulation of bubble-net attacking is provided in Eq. (2):
$$\begin{aligned} \vec {X}(t+1)={\left\{ \begin{array}{ll} (\vec {X}_b(t)-\vec {X}(t))\cdot e^{bl}\cdot \cos (2\pi l)+ \vec {X}_b(t) & p_r\ge 0.5\\ \text {Eq. (1)} & p_r <0.5 \end{array}\right. } \end{aligned}$$(2)
where b is a constant that defines the shape of the spiral and \(l \in [-1,1]\) is a random number.
3. New prey search: The above two phases are responsible for exploitation, whereas this phase explores the search domain to find new prey. The searching behavior is purely random and can be formulated by Eq. (3):
$$\begin{aligned} \vec {X}(t+1)=\vec {X}_{r}(t) - (2\vec {a} \cdot \vec {r} - \vec {a}) \cdot \mid 2 \cdot \vec {r} \cdot \vec {X}_r(t) - \vec {X}(t)\mid \end{aligned}$$(3)
where \(\vec {X}_{r}\) is a randomly generated solution at iteration t.
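The three phases above can be sketched as a single position update. This is a minimal illustration, not the authors' implementation. Several details are assumptions borrowed from the original WOA description rather than from this text: the spiral term uses cos(2πl), the choice between encircling and random search is made with the test |A| < 1, the two random coefficients are drawn independently as scalars per whale, and the random solution is picked from the current population. The function name `woa_update` and the default `b=1.0` are hypothetical.

```python
import numpy as np

def woa_update(X, X_best, t, max_iter, rng, b=1.0):
    """One WOA position update for a population X of shape (n_whales, dim)."""
    n_whales, _ = X.shape
    a = 2.0 * (1.0 - t / max_iter)      # decreases linearly from 2 to 0
    X_new = np.empty_like(X, dtype=float)
    for i in range(n_whales):
        p_r = rng.random()
        if p_r >= 0.5:
            # Spiral movement around the best position (bubble-net phase).
            l = rng.uniform(-1.0, 1.0)
            X_new[i] = (X_best - X[i]) * np.exp(b * l) * np.cos(2.0 * np.pi * l) + X_best
            continue
        A = 2.0 * a * rng.random() - a  # coefficient (2a*r - a)
        C = 2.0 * rng.random()          # coefficient 2*r
        if abs(A) < 1.0:
            target = X_best             # shrinking encirclement (exploitation)
        else:
            target = X[rng.integers(n_whales)]  # assumed: random whale (exploration)
        X_new[i] = target - A * np.abs(C * target - X[i])
    return X_new

rng = np.random.default_rng(0)
X = rng.random((5, 3))
X_next = woa_update(X, X_best=X[0], t=10, max_iter=100, rng=rng)
```

Because a shrinks to 0, |A| falls below 1 more often in later iterations, so the update moves from exploration towards exploitation.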
The above-defined WOA can be extended to multi-objective problems by including constructs such as non-dominance checks between solutions, an archive set, and selection methods; the result is MOWOA. In multi-objective optimization, solutions are ranked by Pareto optimality, based on non-dominated solutions. Further, an archive set is maintained as a repository of these solutions. The rules for updating the archive set are as follows:
- Rule 1: Initially the archive set is empty, so all the non-dominated solutions are inserted into it.
- Rule 2: Let the archive set consist of S non-dominated solutions. If a new non-dominated solution is not better than these S solutions and is dominated by some solution in the archive set, then the archive set is left unchanged.
- Rule 3: If the new solutions and the solutions in the archive set are mutually non-dominated, then the new solutions are added to the archive set.
- Rule 4: If any new non-dominated solution is better than one or more solutions in the archive set, then the better solution replaces them.
- Rule 5: If the archive set is full, then the most crowded solutions are removed from the archive to keep the set diverse and to give new solutions a chance.
Moreover, a roulette-wheel method is used to maintain the diversity and distribution of the solutions in the archive set.
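The archive rules can be illustrated with a small sketch. It assumes that all objectives are minimised and that the archive has a fixed capacity (`max_size`, a hypothetical parameter). The crowding measure used here, the distance to the nearest neighbour in objective space, is a simple stand-in and not necessarily the mechanism used in MOWOA. The roulette-wheel selection step is not shown.

```python
import numpy as np

def dominates(f_a, f_b):
    """True if objective vector f_a Pareto-dominates f_b (all objectives minimised)."""
    f_a, f_b = np.asarray(f_a, dtype=float), np.asarray(f_b, dtype=float)
    return bool(np.all(f_a <= f_b) and np.any(f_a < f_b))

def update_archive(archive, candidate, max_size):
    """Return the archive (list of objective tuples) after offering one candidate."""
    # Rule 2: a candidate dominated by an archive member is rejected.
    if any(dominates(member, candidate) for member in archive):
        return archive
    # Rule 4: members dominated by the candidate are replaced by it.
    archive = [m for m in archive if not dominates(candidate, m)]
    # Rules 1 and 3: a mutually non-dominated candidate is added.
    archive.append(tuple(candidate))
    # Rule 5: if capacity is exceeded, drop the most crowded member
    # (assumed here: the one closest to its nearest neighbour).
    if len(archive) > max_size:
        F = np.asarray(archive, dtype=float)
        dist = np.linalg.norm(F[:, None, :] - F[None, :, :], axis=2)
        np.fill_diagonal(dist, np.inf)
        archive.pop(int(np.argmin(dist.min(axis=1))))
    return archive

archive = []
for f in [(1.0, 5.0), (2.0, 2.0), (5.0, 1.0), (3.0, 3.0)]:
    archive = update_archive(archive, f, max_size=10)
# (3.0, 3.0) is dominated by (2.0, 2.0) and is therefore not stored.
```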
Proposed method
This paper introduces a novel variant of multi-objective WOA termed as improved multi-objective WOA (IMOWOA). Further, a new IMOWOA-based feature-selection method is introduced to extract relevant features for the efficient classification of images, especially histopathological images. The overall flow diagram of the proposed method is depicted in Fig. 2.
IMOWOA
In the existing MOWOA, only one archive set is maintained in each iteration, which may sometimes result in poor population diversity and less exploration. However, in the proposed IMOWOA, although there is always one archive set, it is partitioned into different priority sets. This partitioning ensures better management of the solutions, leading to improved population diversity and enhanced exploration capabilities. The archive set is updated by including non-dominated solutions from different priority sets, which allows for a more diverse and comprehensive search of the solution space. The detailed steps for updating the archive set are as follows:
1. Let the population P contain N individual solutions. Find the non-dominated set of individual solutions from P.
2. Store the non-dominated individual solutions in \(priority\_set_i\) (i=1) and delete these solutions from P.
3. Again, find the non-dominated individual solutions from the remaining population P and add these solutions to \(priority\_set_i\), where \(i=2, 3, 4, \cdots\)
4. Repeat the above process until no solution is left in the population P.
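The partitioning steps above amount to repeatedly peeling off the current non-dominated front. The following sketch shows one straightforward way to do this, assuming all objectives are minimised; the function name `priority_sets` is hypothetical and the quadratic dominance check is chosen for clarity rather than speed.

```python
import numpy as np

def priority_sets(F):
    """Split the rows of F (n_solutions x n_objectives) into successive
    non-dominated fronts. Returns a list of index lists; the first list
    plays the role of priority set 1."""
    F = np.asarray(F, dtype=float)
    remaining = list(range(len(F)))
    sets = []
    while remaining:
        # A solution belongs to the current front if no remaining
        # solution dominates it.
        front = [
            i for i in remaining
            if not any(
                np.all(F[j] <= F[i]) and np.any(F[j] < F[i])
                for j in remaining if j != i
            )
        ]
        sets.append(front)
        remaining = [i for i in remaining if i not in front]
    return sets

F = [[1, 5], [2, 2], [5, 1], [3, 3], [4, 4]]
sets = priority_sets(F)
```

In this example the first three points are mutually non-dominated, (3, 3) is dominated only by (2, 2), and (4, 4) is dominated by (3, 3), so the points fall into three priority sets.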
Once the archive set is partitioned into the different priority sets, then the population (\(P'\)) for the next iteration is updated using the following steps:
1. Add the solutions of \(priority\_set_i\) (i=1) to \(P'\).
2. If \(\mid P' \mid <N\) and the number of solutions in \(priority\_set_i\) (i = 2, 3, ...) does not exceed \(N-\mid P'\mid\), then add all solutions of \(priority\_set_i\) to \(P'\).
3. Otherwise, find the less crowded solutions in \(priority\_set_i\) and add them to \(P'\) until it contains N solutions.
Once \(P'\) is filled with N solutions, the above procedure stops. The resulting \(P'\) is the newly generated archive set. Owing to the use of non-dominated sorting and crowding distance, the new archive set is more diverse and has good exploration capability.
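A possible reading of this filling procedure is sketched below. It assumes that a whole priority set is copied when it fits into the remaining slots, and that otherwise the least crowded members are taken, where "less crowded" is measured with the NSGA-II style crowding distance. The text does not give the exact crowding formula, so that choice, along with the names `crowding_distance` and `next_population`, is an assumption.

```python
import numpy as np

def crowding_distance(F):
    """Crowding distance of each row of F (one front); larger means less crowded."""
    n, m = F.shape
    d = np.zeros(n)
    for k in range(m):
        order = np.argsort(F[:, k])
        d[order[0]] = d[order[-1]] = np.inf   # boundary solutions are always kept
        span = F[order[-1], k] - F[order[0], k]
        if span > 0 and n > 2:
            d[order[1:-1]] += (F[order[2:], k] - F[order[:-2], k]) / span
    return d

def next_population(F, sets, N):
    """Fill P' with N solution indices, one priority set after another."""
    F = np.asarray(F, dtype=float)
    chosen = []
    for front in sets:
        free = N - len(chosen)
        if free <= 0:
            break
        if len(front) <= free:
            chosen.extend(front)              # the whole set fits
        else:
            d = crowding_distance(F[front])   # keep the least crowded members
            keep = np.argsort(-d, kind="stable")[:free]
            chosen.extend(front[i] for i in keep)
    return chosen

F = [[1, 5], [2, 2], [5, 1], [3, 3], [4, 4]]
sets = [[0, 1, 2], [3], [4]]                  # priority sets as index lists
p_next = next_population(F, sets, N=4)
```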
IMOWOA for feature selection
A novel IMOWOA-based feature-selection (IMOWOA-FS) method is proposed for the optimal selection of relevant features. First, the considered images are processed through a convolutional neural network to obtain the relevant features. The convolutional neural network’s architecture is illustrated in Fig. 3. Next, the proposed variant, an improved multi-objective whale optimization algorithm (IMOWOA), operates on the extracted features to attain the optimal feature set. In the proposed variant, the population is initialized with real values between 0 and 1. Each extracted feature represents one dimension of the data. Further, a sigmoid function maps each component of an individual to 0 or 1. Equation (4) presents the formulation of the sigmoid function for the value \(x_i^j\) of the \(i{\text {th}}\) individual at the \(j{\text {th}}\) dimension.
Furthermore, if the resultant value \(\bigg (x_i^j\bigg )\) of the \(i{\text {th}}\) individual corresponds to ‘1’, then the corresponding extracted feature is selected; otherwise, it is discarded. Based on the set of selected features, the fitness of individuals is computed by considering two objective functions, as presented in Eqs. (5) and (6). Equation (5) corresponds to the maximization of the classification accuracy on the selected features, while Eq. (6) depicts the minimization of the number of features to be selected.
In Eq. (5), ‘k’ denotes the number of classes, while TP denotes the correctly identified instances of class ‘k’. Further, Eq. (6) considers the ratio of selected features (SF) to total extracted features (TEF). The position of each individual is updated using the proposed IMOWOA. Finally, the optimal feature set trains the considered classifier to infer labels for the input image.
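The sigmoid mapping and the two objectives described above can be sketched as follows. The exact binarisation rule is not spelled out in the text, so the sketch assumes the stochastic rule common in binary swarm optimisers: a feature is selected when a uniform random number falls below the sigmoid of the position value. Accuracy is taken as the sum of per-class true positives divided by the number of instances, which is one reading of the description. The function names are hypothetical, and the predictions are supplied by the caller because the classifier is outside the scope of this example.

```python
import numpy as np

def select_features(position, rng):
    """Map a real-valued position vector to a 0/1 feature mask."""
    prob = 1.0 / (1.0 + np.exp(-np.asarray(position, dtype=float)))  # sigmoid
    # Assumed rule: select feature j with probability sigmoid(x_j).
    return (rng.random(prob.shape) < prob).astype(int)

def objectives(mask, y_true, y_pred):
    """Return (accuracy to maximise, feature ratio SF/TEF to minimise)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    # Sum of true positives over all classes, divided by the number of instances.
    tp = sum(int(np.sum((y_true == c) & (y_pred == c))) for c in np.unique(y_true))
    accuracy = tp / len(y_true)
    ratio = float(np.sum(mask)) / len(mask)
    return accuracy, ratio

rng = np.random.default_rng(0)
mask = select_features([0.9, 0.1, 0.5, 0.7], rng)
# Hypothetical labels and predictions, for illustration only.
acc, ratio = objectives(mask, y_true=[0, 0, 1, 1], y_pred=[0, 1, 1, 1])
```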
In multi-objective optimization, it is common to treat objectives with equal importance to ensure a balanced exploration of the solution space. The IMOWOA inherently balances the maximization of accuracy and minimization of the number of selected features through its non-dominated sorting and crowding distance mechanisms. This balance ensures that solutions are not biased towards either objective, promoting a diverse set of optimal solutions that achieve a good trade-off between accuracy and feature selection. This approach is beneficial because it allows the algorithm to explore a wide range of potential solutions without prematurely converging on solutions that overly prioritize one objective over the other. This is particularly important in feature selection tasks, where both accuracy and the number of selected features significantly impact the performance and interpretability of the classifier. By maintaining this balance, IMOWOA ensures that the selected features are not only relevant and contribute to high classification accuracy but are also minimal, reducing computational complexity and enhancing the model’s generalizability.
Simultaneous optimization of maximization and minimization objectives
The proposed IMOWOA handles both maximization and minimization objectives simultaneously through the following steps:
- Non-Dominated Sorting: During each iteration, the algorithm sorts the population based on non-domination. Solutions that are not dominated by any other solution in terms of both objectives (maximization of accuracy and minimization of the number of selected features) are considered non-dominated.
- Pareto Front Construction: The non-dominated solutions form the Pareto front, representing the best trade-offs between the objectives. These solutions are stored in the archive set.
- Crowding Distance Calculation: To maintain diversity among the Pareto front solutions, a crowding distance metric is calculated for each solution. This metric ensures a diverse spread of solutions by favoring those in less crowded regions of the objective space.
- Selection for Next Iteration: The next generation of solutions is selected based on their dominance rank and crowding distance. Solutions with better trade-offs (higher accuracy and fewer selected features) are given higher priority.
By balancing these two objectives through non-dominated sorting and crowding distance, IMOWOA effectively handles the simultaneous optimization of maximization and minimization objectives, leading to a diverse set of optimal solutions.
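One common way to compare a maximised objective with a minimised one is to rewrite the former as a minimisation, for example by using 1 − accuracy. The sketch below takes this route, which is an assumption about the mechanics rather than something stated in the text, and extracts the Pareto front from a handful of hypothetical (accuracy, feature-ratio) pairs.

```python
import numpy as np

def pareto_front(accuracy, ratio):
    """Indices of non-dominated solutions when accuracy is maximised and
    the feature ratio is minimised."""
    # Turn the maximisation into a minimisation so both columns are minimised.
    F = np.column_stack([1.0 - np.asarray(accuracy, dtype=float),
                         np.asarray(ratio, dtype=float)])
    front = []
    for i in range(len(F)):
        dominated = any(
            np.all(F[j] <= F[i]) and np.any(F[j] < F[i])
            for j in range(len(F)) if j != i
        )
        if not dominated:
            front.append(i)
    return front

# Hypothetical values, for illustration only.
accuracy = [0.60, 0.55, 0.50, 0.58]
ratio = [0.20, 0.10, 0.15, 0.25]
front = pareto_front(accuracy, ratio)
```

Here the first solution has the highest accuracy and the second uses the fewest features, so both stay on the front; the other two are each beaten on both objectives by one of them.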
Time complexity of IMOWOA
In the proposed algorithm, the time complexity depends on the determination of the non-dominated individuals and the update of the individuals. The computational complexity of identifying the non-dominated solutions is O(\(N^2\)*L), where N represents the number of individuals and L the number of objective functions. Updating the individuals costs O(N*D), where D is the dimension of the feature space of each individual. Therefore, the overall complexity of the proposed algorithm is O(\(N^2\)*L + N*D), which reduces to O(\(N^2\)*L) when D does not exceed N*L.
Results and discussion
The performance of the proposed IMOWOA is discussed in sections “Performance analysis of proposed IMOWOA” and “Performance analysis of feature selection technique”. First, the proposed IMOWOA is validated on ten CEC-2009 multi-objective benchmark problems in section “Performance analysis of proposed IMOWOA”; then, in section “Performance analysis of feature selection technique”, it is used for selecting relevant features from histopathological image datasets. All of the experiments were run in MATLAB 2017a on a machine with a 2.90 GHz Core i3 processor and 16 GB of RAM for a fair analysis.
Performance analysis of proposed IMOWOA
To test the efficiency of the proposed IMOWOA, 7 bi-objective (\(UF_1\)–\(UF_7\)) and 3 tri-objective (\(UF_8\)–\(UF_{10}\)) test benchmarks53,54 have been considered. The details of the benchmark functions are presented in Figs. 4 and 5. It can be observed from the figures that the considered benchmark functions contain distinct multi-objective search regions with non-convex, convex, multi-modal, and discontinuous Pareto fronts, which are considered among the hardest test problems in the literature. Maximum spread (MS), inverted generational distance (IGD), and spacing (SP) are commonly used to validate the efficacy of multi-objective methods since they consider the standard deviation and mean values. The IGD value assesses convergence, while MS and SP evaluate the coverage of the considered approach44,55,56. IGD is an improved version of Coello’s generational distance (GD)55,57 and is computed using Eq. (7), whereas the values of SP and MS are computed using Eqs. (8) and (9).
In Eq. (7), N denotes the number of true Pareto-optimal solutions (PS), whereas \({dist_a}\) is the Euclidean distance between the \(a{\text {th}}\) true Pareto-optimal solution and the nearest computed Pareto-optimal solution (POS) in the reference set.
In Eq. (8), dist’ denotes the mean of all \(dist_a\), N denotes the total number of optimum PS found, and \(dist_a=\min _{i \ne a}\bigg (|f_1^i (\textbf{x})- f_1^a (\textbf{x})|+|f_2^i (\textbf{x})- f_2^a (\textbf{x})|\bigg )\) for all \(i, a=1,2,3, \cdots , N\).
In Eq. (9), \({det{()}}\) calculates the Euclidean distance, \(x_j\) and \(y_j\) are the maximum and minimum values in the \(j{\text {th}}\) objective, and t represents the total count of objectives.
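For readers who want to compute the three metrics, the following sketch implements them in the forms commonly found in the multi-objective optimisation literature, which match the symbol descriptions above. These forms are assumptions, since the exact equations are not reproduced here: IGD is the square root of the summed squared distances divided by the number of true Pareto points, SP is the sample standard deviation of nearest-neighbour Manhattan distances, and MS is the square root of the summed per-objective ranges (some papers square the ranges instead).

```python
import numpy as np

def igd(true_front, obtained):
    """Inverted generational distance: smaller means better convergence."""
    T = np.asarray(true_front, dtype=float)
    A = np.asarray(obtained, dtype=float)
    # Distance from every true Pareto point to its nearest obtained point.
    d = np.linalg.norm(T[:, None, :] - A[None, :, :], axis=2).min(axis=1)
    return np.sqrt(np.sum(d ** 2)) / len(T)

def spacing(obtained):
    """Spacing: smaller means more evenly distributed solutions."""
    A = np.asarray(obtained, dtype=float)
    m = np.abs(A[:, None, :] - A[None, :, :]).sum(axis=2)  # Manhattan distances
    np.fill_diagonal(m, np.inf)                            # ignore self-distance
    d = m.min(axis=1)
    return np.sqrt(np.sum((d.mean() - d) ** 2) / (len(A) - 1))

def maximum_spread(obtained):
    """Maximum spread: larger means wider coverage of the front."""
    A = np.asarray(obtained, dtype=float)
    return np.sqrt(np.sum(A.max(axis=0) - A.min(axis=0)))

# Hypothetical fronts, for illustration only.
true_front = [[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]
obtained = [[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]
scores = igd(true_front, obtained), spacing(obtained), maximum_spread(obtained)
```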
The SP, MS, and IGD values of all the considered approaches have been compared to assess the efficacy quantitatively. To investigate the results, the proposed IMOWOA is compared to MOPSO56, MOGWO53, and MOEA/D58. Each algorithm has been run 10 times, and the mean values of the results are used to ensure a fair analysis and reduce the impact of random variation. The number of iterations (tit) and size of the population (N) are fixed to 1000 and 50, respectively, while the rest of the parameters are taken from the relevant literature. The mean, standard deviation, median, best, and worst values of IGD, MS, and SP of the proposed IMOWOA and other considered methods are compared to assess the efficiency of the proposed approach. The IGD, SP, and MS values produced by the proposed IMOWOA and other techniques are reported in Tables 1, 2, and 3. Table 1 shows that for more than 90% of benchmark problems, the proposed IMOWOA achieves the best IGD values. IGD values are normally used as a benchmark for comparing the convergence of different methods. So, from the results listed in Table 1, it can be inferred that the proposed IMOWOA shows better convergence. There are a few benchmark problems for which other algorithms return the best IGD values. MOEA/D produces the best IGD values for UF3, UF6, and UF7, while for the benchmark UF8, MOPSO outperforms the proposed and other techniques. Thus, the experimental analysis indicates that the proposed IMOWOA performs more consistently than the other methods.
Besides, in Tables 2 and 3, the MS and SP values are also examined. As MOEA/D is not executed for the tri-objective functions \(UF_8\), \(UF_9\), and \(UF_{10}\), the proposed IMOWOA is validated only against MOPSO and MOGWO on these benchmark problems. From the tables, it can be seen that the proposed IMOWOA has superior convergence and coverage. Though there are a few discontinuities on the Pareto optimal front achieved by the proposed IMOWOA, its overall front coverage is wider than that of MOEA/D, MOPSO, and MOGWO for a few benchmarks. Moreover, the proposed IMOWOA’s Pareto optimal solutions are nearer to the optimal Pareto front and fairly distributed for bi- and tri-objective problems. Hence, the statistical findings support the efficiency of the IMOWOA.
Additionally, a non-parametric Friedman test59 was performed to determine the best-performing method. This test ranks algorithms based on their performance across all datasets60. The highest-performing model is assigned a rank of 1, with subsequent ranks assigned to lower-performing models. If two algorithms perform equally, their ranks are averaged. The Friedman test produced a p-value of 2.54E–25, significantly below the threshold (0.05), indicating substantial differences in performance among the compared methods. Table 4 presents the rankings obtained from the Friedman test, showing that the proposed method is significantly superior to other considered methods. Besides, the convergence plots are shown in Figs. 6 and 7. It can be observed from the convergence plots that the proposed method converges more quickly than the other methods. Thus, the above analysis demonstrates the efficacy of the proposed method.
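The ranking step of the Friedman test can be sketched as follows. The score matrix is hypothetical (rows are benchmark problems, columns are algorithms, lower is better), and the statistic is the basic chi-square form without tie correction. The p-value, which would come from a chi-square distribution with k − 1 degrees of freedom, is not computed here.

```python
import numpy as np

def average_ranks(row):
    """Rank values in ascending order (1 = best); tied values share the mean rank."""
    row = np.asarray(row, dtype=float)
    order = np.argsort(row, kind="stable")
    ranks = np.empty(len(row))
    ranks[order] = np.arange(1, len(row) + 1)
    for v in np.unique(row):
        tie = row == v
        ranks[tie] = ranks[tie].mean()
    return ranks

def friedman_statistic(scores):
    """Mean rank per algorithm and the Friedman chi-square statistic."""
    scores = np.asarray(scores, dtype=float)
    n, k = scores.shape                      # n problems, k algorithms
    R = np.vstack([average_ranks(r) for r in scores])
    mean_rank = R.mean(axis=0)
    chi2 = 12.0 * n / (k * (k + 1)) * np.sum(mean_rank ** 2) - 3.0 * n * (k + 1)
    return mean_rank, chi2

# Hypothetical error values for three algorithms on three problems.
scores = [[0.10, 0.20, 0.30],
          [0.20, 0.30, 0.40],
          [0.05, 0.06, 0.07]]
mean_rank, chi2 = friedman_statistic(scores)
```

In this toy example the first algorithm is best on every problem, so it receives a mean rank of 1.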
Performance analysis of feature selection technique
A histological tissue image dataset containing four classes of histopathological tissue images, namely nervous, connective, epithelial, and muscular tissues, has been used to investigate the IMOWOA feature selection (FS) approach. The images in the dataset have been taken from publicly accessible sources61,62. Each image category consists of 101 images with separate staining processes. The dataset parameters are listed in Table 5, whereas Table 6 gives the number of selected features for the different algorithms. Stratified random sampling is employed to partition the entire dataset into testing and training sets for classification. To elucidate the efficacy of the proposed IMOWOA-based feature selection, three state-of-the-art techniques, namely differential evolution (DE), Jaya algorithm (JA), and adaptive Jaya algorithm (AJA), are considered. To demonstrate the significance of feature selection, all the considered methods were first tested on the original dataset without feature selection and on the dataset with feature selection (Table 7). Further, the mean selected features and accuracy of all the considered algorithms are computed using five well-known classifiers: KNN, LDA, ZeroR, RF, and SVM. The findings of all the methods are tabulated in Table 8. The accuracy of ZeroR is the baseline for all the approaches. It can be seen from the tables that the IMOWOA-based FS approach yields the smallest number of selected features from the set of features extracted by AlexNet. It eliminates 93.4 percent of features, followed by AJA’s 91.6 percent, JA’s 90.8 percent, and DE’s 88.5 percent.
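Stratified random sampling, as mentioned above, keeps the class proportions the same in the training and testing sets. A minimal sketch is given below. The test fraction of 0.2 is a hypothetical value, since the split ratio is not stated in the text, and the function name `stratified_split` is likewise illustrative.

```python
import numpy as np

def stratified_split(labels, test_fraction, rng):
    """Return (train_idx, test_idx) with the same class proportions in both."""
    labels = np.asarray(labels)
    train_idx, test_idx = [], []
    for c in np.unique(labels):
        # Shuffle the indices of class c, then cut off the test share.
        idx = rng.permutation(np.flatnonzero(labels == c))
        n_test = int(round(test_fraction * len(idx)))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.array(train_idx), np.array(test_idx)

# Four tissue classes with 101 images each, as in the dataset description.
labels = np.repeat(np.arange(4), 101)
rng = np.random.default_rng(0)
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, rng=rng)
```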
The accuracy obtained by the classifiers on the selected features has been used to assess the relevance of the chosen features. The table shows that all of the investigated classifiers achieve their highest accuracy with the features selected by the novel IMOWOA. The accuracy achieved by the SVM, i.e., 56.50%, is the best among all the considered classifiers. Thus, the above analysis shows that the IMOWOA with SVM outperforms the other FS approaches.
In addition, the computational time of the considered approaches is compared in Table 9. The table shows that the IMOWOA has the minimum execution time for all the classifiers. Thus, it can be claimed that the obtained features are non-redundant and relevant, resulting in improved accuracy without compromising a classifier’s computing speed. Therefore, the experimental analysis indicates that the developed IMOWOA-based feature selection method selects fewer features while achieving a high level of accuracy.
Furthermore, the classification performance of the considered feature selection approaches using SVM has been examined independently for each of the four categories. The confusion matrices of all of the investigated categories are shown in Fig. 8. In the figure, MT, NT, CT, and ET correspond to muscular, nervous, connective, and epithelial tissue, respectively. The IMOWOA-based technique has greater than 50% accuracy for MT, ET, and CT images. However, it achieves only 48% accuracy for NT, which is still the maximum among all the algorithms tested for this class. In contrast, AJA, JA, and DE attain greater than 50% accuracy on CT and MT images. Besides, the confusion matrices have been analysed using F-measure, precision, recall, and specificity, as shown in Table 10. The results show that IMOWOA outperforms the other approaches. IMOWOA’s overall classification accuracy is 59.65 percent, which is the best among the considered methods.
Moreover, for better visualization of the performance of all considered techniques, radar charts are drawn for each category of the dataset separately. The radar charts depict the values of four parameters, namely recall, precision, F1-score, and G-mean. The method with the larger area is considered the best performer. Figure 9 represents the radar charts for connective, epithelial, muscle, and nervous tissues, respectively. It can be seen from the figures that the area of the developed IMOWOA is larger for each category than that of the other considered techniques.
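The per-class measures used in the confusion-matrix analysis and the radar charts can be derived from a confusion matrix as sketched below. The sketch assumes that rows hold true classes and columns hold predicted classes, and it takes G-mean as the square root of recall times specificity, which is one common definition and may differ from the one used for the figures. The example matrix is hypothetical.

```python
import numpy as np

def per_class_metrics(cm):
    """Per-class precision, recall, specificity, F1 and G-mean, plus overall accuracy.
    Assumes every class is predicted at least once (no zero denominators)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp        # true class missed
    fp = cm.sum(axis=0) - tp        # predicted class wrong
    tn = cm.sum() - tp - fn - fp
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": 2 * precision * recall / (precision + recall),
        "g_mean": np.sqrt(recall * specificity),  # assumed definition
        "accuracy": tp.sum() / cm.sum(),
    }

# Hypothetical two-class confusion matrix, for illustration only.
metrics = per_class_metrics([[8, 2],
                             [1, 9]])
```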
Comparison with other multi-objective feature selection techniques
To provide a comprehensive evaluation, we compare the proposed IMOWOA with other multi-objective feature selection techniques. We evaluate the performance of IMOWOA, NSGA-II63, SPEA244, and MODE64 on the same histopathological image dataset. The experiments are conducted using five well-known classifiers, namely LDA, SVM, kNN, ZeroR, and RF. The results are compared in terms of accuracy, mean number of selected features, and computational time.
Table 11 provides a comparative analysis of the feature selection methods, including IMOWOA, NSGA-II, SPEA2, and MODE, using the SVM classifier. The table shows that IMOWOA achieves the highest accuracy with a balanced number of selected features and computational time, indicating its effectiveness in feature selection.
Table 12 shows the classification accuracy of the investigated techniques across different classifiers. IMOWOA consistently achieves higher accuracy compared to NSGA-II, SPEA2, and MODE, demonstrating its robustness and superior performance in feature selection.
Table 13 presents the computational time for each method across different classifiers. IMOWOA exhibits competitive computational times, highlighting its efficiency in feature selection tasks without compromising on performance.
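Because the methods are compared on three criteria at once (accuracy, mean number of selected features, and computational time), one way to read such tables is through Pareto dominance: a method is dominated if another is at least as good on every criterion and strictly better on at least one. The sketch below illustrates this check only; it is not the comparison procedure used in this paper, and the method names and numbers are hypothetical placeholders, not values from Tables 11 to 13.

```python
def dominates(a, b):
    """True if cost vector a is no worse than b everywhere and better somewhere.

    Both vectors hold costs, so lower is better in every position.
    """
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def non_dominated(results):
    """Return the names of methods that no other method dominates.

    results maps a name to (accuracy, n_features, seconds); accuracy is
    converted to an error rate so that all three criteria are minimized.
    """
    costs = {name: (1.0 - acc, n_feat, secs)
             for name, (acc, n_feat, secs) in results.items()}
    return [name for name, cost in costs.items()
            if not any(dominates(other, cost)
                       for other_name, other in costs.items()
                       if other_name != name)]

# Hypothetical results: (accuracy, mean number of selected features, seconds).
results = {
    "method_a": (0.60, 120, 30.0),
    "method_b": (0.55, 150, 40.0),
    "method_c": (0.58, 100, 35.0),
    "method_d": (0.50, 90, 20.0),
}

print(non_dominated(results))
```

Here `method_b` is dropped because `method_a` is better on all three criteria, while the remaining three methods each represent a different trade-off.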
Conclusion
In this paper, a novel feature selection method, improved multi-objective WOA-based feature selection (IMOWOA-FS), has been introduced for histopathological image classification. The proposed method finds the optimal features by using a novel variant, termed improved multi-objective WOA (IMOWOA). To validate its optimization ability, IMOWOA is tested on seven bi-objective and three tri-objective CEC2009 benchmarks, and the results are compared against three other state-of-the-art methods, namely MOPSO, MOEA/D, and MOWOA. The results show that IMOWOA outperforms the other considered methods on 90% of the benchmarks. Furthermore, IMOWOA-FS has been employed for the classification of histopathological images on a publicly available dataset of tissue images. The results are validated using five classifiers, namely SVM, LDA, RF, kNN, and ZeroR, and compared in terms of classification accuracy, mean number of selected features, and computation time. The results show that IMOWOA-FS outperforms the other considered methods. However, it is essential to acknowledge the limitations of IMOWOA, such as its computational complexity, parameter sensitivity, and difficulty in escaping local optima. Additionally, while IMOWOA has shown effectiveness in feature selection for histopathological images, its performance in other domains has not been extensively tested. Addressing these drawbacks through hybrid approaches, adaptive parameter control, and parallel computing can enhance its scalability and robustness. In the future, IMOWOA-FS can be explored on other datasets, such as plant leaf disease and X-ray image datasets. Moreover, other classifiers can be explored to improve the accuracy and computation time. Recognizing these limitations provides a valuable starting point for future research, paving the way for more efficient and generalized optimization algorithms.
Data availability
The data are publicly available at https://web.archive.org/web/20200701054442/https://www.lab.anhb.uwa.edu.au/mb140/.
Abbreviations
- CNN: Convolutional neural network
- FS: Feature selection
- HOG: Histogram of oriented gradients
- KNN: K-nearest neighbor
- LDA: Latent Dirichlet allocation
- MOEA/D: Multi-objective evolutionary algorithm based on decomposition
- MOGWO: Multi-objective grey wolf optimizer
- MOPSO: Multi-objective particle swarm optimization
- MOWOA: Multi-objective whale optimization algorithm
- RF: Random forest
- SIFT: Scale-invariant feature transform
- SURF: Speeded up robust features
- SVM: Support vector machine
References
Vu, T. H., Mousavi, H. S., Monga, V., Rao, G. & Rao, U. A. Histopathological image classification using discriminative feature-oriented dictionary learning. IEEE Trans. Med. Imaging 35(3), 738–751 (2015).
Gupta, V. & Bhavsar, A. Breast cancer histopathological image classification: Is magnification important? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 17–24 (2017).
Srinivas, U., Mousavi, H. S., Monga, V., Hattel, A. & Jayarao, B. Simultaneous sparsity model for histopathological image representation and classification. IEEE Trans. Med. Imaging 33(5), 1163–1179 (2014).
Mittal, H., Saraswat, M., Bansal, J. C. & Nagar, A. Fake-face image classification using improved quantum-inspired evolutionary-based feature selection method. In 2020 IEEE Symposium Series on Computational Intelligence (SSCI), 989–995 (2020), IEEE.
Gutiérrez, R., Rueda, A. & Romero, E. Learning semantic histopathological representation for basal cell carcinoma classification. In Medical Imaging 2013: Digital Pathology, vol. 8676, 86760 (2013). International Society for Optics and Photonics.
Lowe, D. G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision 60(2), 91–110 (2004).
Dalal, N. & Triggs, B. Histograms of oriented gradients for human detection. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), vol. 1, 886–893 (2005), IEEE.
Bay, H., Tuytelaars, T. & Van Gool, L. Surf: Speeded up robust features. In European Conference on Computer Vision, 404–417 (2006). Springer.
Xu, J., Luo, X., Wang, G., Gilmore, H. & Madabhushi, A. A deep convolutional neural network for segmenting and classifying epithelial and stromal regions in histopathological images. Neurocomputing 191, 214–223 (2016).
Pal, R. & Saraswat, M. Enhanced bag of features using alexnet and improved biogeography-based optimization for histopathological image analysis. In 2018 Eleventh International Conference on Contemporary Computing (IC3), 1–6 (2018), IEEE.
Masci, J., Meier, U., Cireşan, D. & Schmidhuber, J. Stacked convolutional auto-encoders for hierarchical feature extraction. In International Conference on Artificial Neural Networks, 52–59 (2011). Springer.
Yao, X., Wang, X., Wang, S. -H. & Zhang, Y. -D. A comprehensive survey on convolutional neural network in medical image analysis. Multimed. Tools Appl. 1–45 (2020).
Alom, M. Z., Taha, T. M., Yakopcic, C., Westberg, S., Sidike, P., Nasrin, M. S., Van Esesn, B. C., Awwal, A. A. S. & Asari, V. K. The history began from alexnet: A comprehensive survey on deep learning approaches. arXiv preprint arXiv:1803.01164 (2018).
Mittal, H., Saraswat, M. & Pal, R. Histopathological image classification by optimized neural network using igsa. In International Conference on Distributed Computing and Internet Technology, 429–436 (2020). Springer.
Pandey, A. C., Rajpoot, D. S. & Saraswat, M. Feature selection method based on hybrid data transformation and binary binomial cuckoo search. J. Ambient. Intell. Humaniz. Comput. 11(2), 719–738 (2020).
Pandey, A. C. & Rajpoot, D. S. Feature selection method based on grey wolf optimization and simulated annealing. Recent Adv. Comput. Sci. Commun. (Former. Recent Patents Comput. Sci.) 14(2), 635–646 (2021).
Kulhari, A., Pandey, A., Pal, R. & Mittal, H. Unsupervised data classification using modified cuckoo search method. In 2016 Ninth International Conference on Contemporary Computing (IC3), 1–5 (2016), IEEE.
Mittal, H. & Saraswat, M. Classification of histopathological images through bag-of-visual-words and gravitational search algorithm. In Soft Computing for Problem Solving, 231–241. Springer (2019).
Mittal, H. & Saraswat, M. A new fuzzy cluster validity index for hyper-ellipsoid or hyper-spherical shape close clusters with distant centroids. IEEE Trans. Fuzzy Syst. 29, 3249–3258 (2020).
Pal, R. & Saraswat, M. Histopathological image classification using enhanced bag-of-feature with spiral biogeography-based optimization. Appl. Intell. 49(9), 3406–3424 (2019).
Pandey, A. C., Rajpoot, D. S. & Saraswat, M. Data clustering using hybrid improved cuckoo search method. In 2016 Ninth International Conference on Contemporary Computing (IC3), 1–6 (2016), IEEE.
Tripathi, A. K., Sharma, K. & Bala, M. A novel clustering method using enhanced grey wolf optimizer and mapreduce. Big Data Res. 14, 93–100 (2018).
Mittal, H., Tripathi, A., Pandey, A. C. & Pal, R. Gravitational search algorithm: A comprehensive analysis of recent variants. Multimed. Tools Appl. 80(5), 7581–7608 (2021).
Mittal, H. & Saraswat, M. An automatic nuclei segmentation method using intelligent gravitational search algorithm based superpixel clustering. Swarm Evol. Comput. 45, 15–32 (2019).
Tripathi, A. K., Mittal, H., Saxena, P. & Gupta, S. A new recommendation system using map-reduce-based tournament empowered whale optimization algorithm. Complex Intell. Syst. 7(1), 297–309 (2021).
Mittal, H., Pandey, A. C., Pal, R. & Tripathi, A. A new clustering method for the diagnosis of COVID19 using medical images. Appl. Intell. 51(5), 2988–3011 (2021).
Singh, R., Mittal, H. & Pal, R. Optimal keyframe selection-based lossless video-watermarking technique using IGSA in LWT domain for copyright protection. Complex Intell. Syst. 1–24 (2021).
Pandey, A. C., Rajpoot, D. S. & Saraswat, M. Hybrid step size based cuckoo search. In 2017 Tenth International Conference on Contemporary Computing (IC3), 1–6 (2017), IEEE.
Pandey, A. C. & Rajpoot, D. S. Spam review detection using spiral cuckoo search clustering method. Evol. Intell. 12(2), 147–164 (2019).
Saraswat, M., Arya, K. & Sharma, H. Leukocyte segmentation in tissue images using differential evolution algorithm. Swarm Evol. Comput. 11, 46–54 (2013).
Katoch, S., Chauhan, S. S. & Kumar, V. A review on genetic algorithm: Past, present, and future. Multimed. Tools Appl. 80(5), 8091–8126 (2021).
Bansal, J. C., Singh, P., Saraswat, M., Verma, A., Jadon, S. S. & Abraham, A. Inertia weight strategies in particle swarm optimization. In 2011 Third World Congress on Nature and Biologically Inspired Computing, 633–640 (2011), IEEE.
Mittal, H. & Saraswat, M. An optimum multi-level image thresholding segmentation using non-local means 2d histogram and exponential kbest gravitational search algorithm. Eng. Appl. Artif. Intell. 71, 226–235 (2018).
Gupta, M., Parmar, G., Gupta, R. & Saraswat, M. Discrete wavelet transform-based color image watermarking using uncorrelated color space and artificial bee colony. Int. J. Comput. Intell. Syst. 8(2), 364–380 (2015).
Pandey, A. C., Rajpoot, D. S. & Saraswat, M. Twitter sentiment analysis using hybrid cuckoo search method. Inf. Process. Manag. 53(4), 764–779 (2017).
Tripathi, A. K. et al. A parallel military-dog-based algorithm for clustering big data in cognitive industrial internet of things. IEEE Trans. Ind. Inf. 17(3), 2134–2142 (2020).
Mittal, H. & Saraswat, M. An image segmentation method using logarithmic kbest gravitational search algorithm based superpixel clustering. Evol. Intel. 14(3), 1293–1305 (2021).
Pandey, A. C., Pal, R. & Kulhari, A. Unsupervised data classification using improved biogeography based optimization. Int. J. Syst. Assur. Eng. Manag. 9(4), 821–829 (2018).
Pandey, A. C., Tripathi, A. K., Pal, R., Mittal, H. & Saraswat, M. Spiral salp swarm optimization algorithm. In 2019 4th International Conference on Information Systems and Computer Networks (ISCON), 722–727 (2019), IEEE.
Pandey, A. C., Kulhari, A. & Shukla, D. S. Enhancing sentiment analysis using roulette wheel selection based cuckoo search clustering method. J. Ambient Intell. Humaniz. Comput. 13, 1–29 (2021).
Pandey, A. C. & Tikkiwal, V. A. Stance detection using improved whale optimization algorithm. Complex Intell. Syst. 7(3), 1649–1672 (2021).
Kohli, S., Kaushik, M., Chugh, K. & Pandey, A. C. Levy inspired enhanced grey wolf optimizer. In 2019 Fifth International Conference on Image Information Processing (ICIIP), 338–342 (2019), IEEE.
Saraswat, M. & Arya, K. Supervised leukocyte segmentation in tissue images using multi-objective optimization technique. Eng. Appl. Artif. Intell. 31, 44–52 (2014).
Zitzler, E. & Thiele, L. Multiobjective evolutionary algorithms: A comparative case study and the strength pareto approach. IEEE Trans. Evol. Comput. 3(4), 257–271 (1999).
Abdollahzadeh, B. & Gharehchopogh, F. S. A multi-objective optimization algorithm for feature selection problems. Eng. Comput. 38(Suppl 3), 1845–1863 (2022).
Li, X. et al. Multi-objective binary grey wolf optimization for feature selection based on guided mutation strategy. Appl. Soft Comput. 145, 110558 (2023).
Jiao, R., Xue, B. & Zhang, M. Benefiting from single-objective feature selection to multiobjective feature selection: A multiform approach. IEEE Trans. Cybern. 53(12), 7773–7786 (2022).
Xue, Y., Cai, X. & Neri, F. A multi-objective evolutionary algorithm with interval based initialization and self-adaptive crossover operator for large-scale feature selection in classification. Appl. Soft Comput. 127, 109420 (2022).
Bai, L., Li, H., Gao, W., Xie, J. & Wang, H. A joint multiobjective optimization of feature selection and classifier design for high-dimensional data classification. Inf. Sci. 626, 457–473 (2023).
Zhang, M. et al. Multi-objective optimization algorithm based on clustering guided binary equilibrium optimizer and NSGA-III to solve high-dimensional feature selection problem. Inf. Sci. 648, 119638 (2023).
Abd El Aziz, M., Ewees, A. A. & Hassanien, A. E. Multi-objective whale optimization algorithm for content-based image retrieval. Multimed. Tools Appl. 77(19), 26135–26172 (2018).
Mirjalili, S. & Lewis, A. The whale optimization algorithm. Adv. Eng. Softw. 95, 51–67 (2016).
Mirjalili, S., Saremi, S., Mirjalili, S. M. & Coelho, L. D. S. Multi-objective grey wolf optimizer: A novel algorithm for multi-criterion optimization. Expert Syst. Appl. 47, 106–119 (2016).
Zhang, Q., Zhou, A., Zhao, S., Suganthan, P. N., Liu, W. & Tiwari, S. Multiobjective optimization test instances for the CEC 2009 special session and competition (2008).
Van Veldhuizen, D. A. & Lamont, G. B. Multiobjective Evolutionary Algorithm Research: A History and Analysis (Technical report, Citeseer, 1998).
Coello, C. A. C., Pulido, G. T. & Lechuga, M. S. Handling multiple objectives with particle swarm optimization. IEEE Trans. Evol. Comput. 8(3), 256–279 (2004).
Fonseca, C. M., Knowles, J. D., Thiele, L. & Zitzler, E. A tutorial on the performance assessment of stochastic multiobjective optimizers. In Third International Conference on Evolutionary Multi-Criterion Optimization (EMO 2005), vol. 216, 240 (2005).
Zhang, Q. & Li, H. MOEA/D: A multiobjective evolutionary algorithm based on decomposition. IEEE Trans. Evol. Comput. 11(6), 712–731 (2007).
Friedman, M. The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J. Am. Stat. Assoc. 32(200), 675–701 (1937).
Li, M.-W., Xu, D.-Y., Geng, J. & Hong, W.-C. A hybrid approach for forecasting ship motion using CNN-GRU-AM and GCWOA. Appl. Soft Comput. 114, 108084 (2022).
Blue histology. https://web.archive.org/web/20200701054442/https://www.lab.anhb.uwa.edu.au/mb140/. [Online; accessed 19-July-2021] (2018).
Sirinukunwattana, K. et al. Locality sensitive deep learning for detection and classification of nuclei in routine colon cancer histology images. IEEE Trans. Med. Imaging 35(5), 1196–1206 (2016).
Deb, K., Agrawal, S., Pratap, A. & Meyarivan, T. A fast elitist non-dominated sorting genetic algorithm for multi-objective optimization: Nsga-ii. In Parallel Problem Solving from Nature PPSN VI: 6th International Conference, Paris, France, September 18–20, 2000 Proceedings 6, 849–858 (2000). Springer.
Yuan, Y., Ong, Y.-S., Gupta, A. & Xu, H. Objective reduction in many-objective optimization: Evolutionary multiobjective approaches and comprehensive analysis. IEEE Trans. Evol. Comput. 22(2), 189–210 (2017).
Acknowledgements
I would like to express my deep and sincere gratitude to my research supervisor, Dr. Kapil Sharma, and co-supervisor, Dr. Manju Bala, for giving me the opportunity to carry out this research and for their invaluable guidance.
Funding
No funding.
Author information
Contributions
All authors contributed equally to this work.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Sharma, R., Sharma, K. & Bala, M. Efficient feature selection for histopathological image classification with improved multi-objective WOA. Sci Rep 14, 25163 (2024). https://doi.org/10.1038/s41598-024-75842-y
DOI: https://doi.org/10.1038/s41598-024-75842-y