Abstract
Laryngeal cancer exhibits a notable global health burden, with later-stage detection contributing to a low survival rate. Laryngeal cancer diagnosis on throat region images is a pivotal application of computer vision (CV) and medical image diagnosis in the medical sector. It includes detecting and analysing abnormal or cancerous tissue from the larynx, an integral part of the vocal and respiratory systems. Computer-aided systems make use of artificial intelligence (AI) through deep learning (DL) and machine learning (ML) models, including convolutional neural networks (CNN), for automated disease diagnosis and detection. Various DL and ML approaches are executed to categorize the extracted features as healthy or cancerous tissue. This article introduces an automated Laryngeal Cancer Diagnosis using the Dandelion Optimizer Algorithm with Ensemble Learning (LCD-DOAEL) method on biomedical throat region images. The LCD-DOAEL method aims to investigate the images of the throat region for the presence of laryngeal cancer. In the LCD-DOAEL method, the Gaussian filtering (GF) approach is applied to eliminate the noise in the biomedical images. Besides, the complex and intrinsic feature patterns can be extracted by the MobileNetV2 model. Meanwhile, the DOA carries out the hyperparameter selection of the MobileNetV2 architecture. Finally, an ensemble of three classifiers, namely bidirectional long short-term memory (BiLSTM), regularized extreme learning machine (ELM), and backpropagation neural network (BPNN) models, is utilized for the classification process. A comprehensive set of simulations is conducted on the biomedical image dataset to highlight the efficient performance of the LCD-DOAEL technique. The comparison analysis of the LCD-DOAEL method exhibited a superior accuracy outcome of 97.54% over other existing techniques.
Introduction
Laryngeal cancer accounts for nearly 1% to 2% of cancers globally; its incidence is rising in some countries while declining in others. The highest incidence is in Europe, while mortality and incidence rates peak in Africa1. Hypopharyngeal cancer accounts for nearly 0.5% to 1%, with a noted rising rate among women in numerous countries. Above 95% of hypopharyngeal and laryngeal cancers are squamous cell carcinomas (SCC), driven mainly by alcohol and tobacco intake2. In contrast to oropharyngeal cancer, infection with the human papillomavirus does not play a significant part3,4. In modern diagnosis, false cancer detection also exists, which is frequently related to the natural anxiety of experts to avoid overlooking cancer at its initial stages5. To help with this, researchers attempt to enhance pre-treatment estimates and offer intraoperative data on the pathologies. To assess the histopathology of an abnormality, the study of endoscopic images may be the most accessible technique. Such images are utilized for lesion recognition and classification, as proposed in this paper6. However, standard imaging modalities such as white-light endoscopy (WLE) usually deliver restricted data about the laryngeal tissue. Narrow-band imaging (NBI) offers a promising alternative to attain this goal7.
NBI is an optical technology that improves the expert's ability to discover and analyze lesions during endoscopic inspection8. It allows improved tissue characterisation with a filtered-spectrum illumination method that enhances the absorbance and scattering of light in tissue9. Moreover, studies indicate no significant differences in the accuracy of distinguishing malignant and benign lesions between NBI and WLE for AI10. CNNs are known for their essential benefit in processing large-scale images and have developed into a central point of study in many scientific areas and in medicine11. However, laryngeal SCC still needs to be explored in AI research. Deep convolutional neural networks (DCNN) have shown an extraordinary ability to diagnose many diseases, such as interstitial lung disease and breast tumours12. In the present scenario, significant development has been made in AI studies on head and neck cancer. Researchers have commonly used many AI models, furthering innovation in medical diagnosis and treatment13.
This article introduces an automated Laryngeal Cancer Diagnosis using the Dandelion Optimizer Algorithm with Ensemble Learning (LCD-DOAEL) method on biomedical throat region images. The LCD-DOAEL method aims to investigate the throat region images for the presence of laryngeal cancer. In the LCD-DOAEL method, the Gaussian filtering (GF) approach is applied to eliminate the noise in the biomedical images. Besides, the complex and intrinsic feature patterns can be extracted by the MobileNetV2 model. Meanwhile, the DOA carries out the hyperparameter selection of the MobileNetV2 architecture. Finally, an ensemble of three classifiers, namely bidirectional long short-term memory (BiLSTM), regularized extreme learning machine (ELM), and backpropagation neural network (BPNN) models, is utilized for the classification process. A comprehensive set of simulations is conducted on the biomedical image dataset to highlight the efficient performance of the LCD-DOAEL technique. The significant contributions of the LCD-DOAEL technique are listed as follows:
-
The LCD-DOAEL method utilizes the GF model to improve the quality of biomedical images by efficiently mitigating noise. This preprocessing step enhances the clarity of the input data and facilitates more precise and dependable feature extraction. By lessening noise artefacts, GF improves the overall accuracy and precision of subsequent image evaluation tasks.
-
The MobileNetV2 technique is implemented to extract complex, intrinsic feature patterns from biomedical images, utilizing its optimization for mobile and embedded vision applications. This model ensures computational efficiency and robustly captures features significant for precise biomedical image evaluation, thereby improving diagnostic abilities and research results.
-
The DOA method optimizes MobileNetV2's hyperparameters, such as learning rate and batch size, refining architecture settings to improve model performance. This technique ensures that MobileNetV2 operates at peak effectiveness, enhancing its accuracy and adaptability in biomedical image processing tasks and thereby advancing diagnostic precision and research abilities.
-
The novelty of the LCD-DOAEL technique lies in incorporating BiLSTM, regularized ELM, and BPNN classifiers for biomedical image classification. This ensemble integrates the merits of sequential learning, efficient learning, and DL paradigms to improve classification accuracy and robustness. By employing diverse learning models, the ensemble aims to exceed the performance limitations of individual models, offering promising enhancements in complex biomedical image evaluation for more accurate diagnostics and research insights.
Literature survey
Alrowais et al.14 present a Laryngeal Cancer Detection and Classification utilizing the Aquila Optimizer Algorithm with DL (LCDC-AOADL) approach on neck area images. The InceptionV3 model is used for the feature extraction procedure. Furthermore, the AOA is employed to tune the hyperparameters of the DBN technique, resulting in enhanced recognition rates. In15, a Deep Ensemble Learning (EL) model utilizing CNN and an image segmentation method is presented. The main aim of Kwon et al.16 is to improve classification accuracy by comparing the results gained by utilizing decision tree EL, which is mainly used to enhance the classification accuracy for smaller datasets, with excellent outcomes for the analysis of glottal cancer. Sahoo et al.17 proposed a novel and effective DL-based Mask RCNN technique for classifying laryngeal cancer and its signs using an image dataset. In18, a hybridization of handcrafted and deep features is developed in an initial laryngeal cancer classification structure. The handcrafted features utilizing first-order statistics (STAT) and Local Binary Patterns (LBP), and deep features from DenseNet201 utilizing transfer learning (TL), are extracted from the endoscopic narrow-band images of the larynx and merged into a combined feature set. The Recursive Feature Elimination with RF (RFE-RF) technique selects the optimum features. In19, a deep attention network (DAN)-based UNet with a colour normalization process (CN-DA-UNet) is developed to attain end-to-end segmentation of the glottal field. Initially, the original image is treated by colour normalization to decrease the harmful impacts of low contrast and considerable variances in colour among different images. Next, the normalized image is fed into the developed DA-UNet for feature extraction.
In20, an image enhancement model that can recognize image structure information is presented. The technique mainly depends on the standard cycle-consistency image translation design, which is altered into a U-shaped residual block. Simultaneously, a multi-scale cross-layer adaptation model (CLA) is designed, which fuses features of different strengths through feature transfer to attain richer image information. Pan et al.21 propose a reverse attention network with a hybrid transformer (RANT) and integrate CNN and transformer sequentially to capture the global dependency features according to the low-level spatial information. Then, the receptive field block module (RRM) links different-scale features in a cascade to progressively mine the target. Lastly, the segmentation outcomes are enhanced by a convolutional conditional random field (ConvCRF). Gharehchopogh et al.22 introduce CQFFA, an enhanced Firefly Algorithm (FFA) incorporating twelve chaotic maps and a Quasi-Oppositional (QO)-Based Learning mechanism. In23, the authors employ the enhanced African Vultures Optimization Algorithm (AVOA) with three thresholding criteria (Kapur's entropy, Tsallis entropy, and Otsu's method) for multi-threshold image segmentation. The model also integrates the Quantum Rotation Gate (QRG) to improve population diversity and escape local optima while using the Association Strategy (AS) for efficient solution retrieval and accelerated search for optimal outcomes. Gharehchopogh and Khargoush24 present an asymmetric clustering approach using the Interactive Autodidactic School (IAS) model. The IAS employing the Chebyshev chaotic function (CCF) demonstrates superior performance compared to other variants and metaheuristic algorithms in simulations.
The existing studies propose novel models such as DL with ensemble models, decision tree EL for smaller datasets, and advanced Mask RCNN models for precise classification and segmentation. Challenges across these models usually comprise dataset size limitations, computational complexity, reliance on precise image preprocessing, and variability in performance across diverse image qualities and conditions. The incorporation of handcrafted and deep features also presents risks in terms of feature engineering and generalizability. Furthermore, models integrating attention mechanisms and hybrid transformer architectures focus on improving classification accuracy but may suffer from training complexity and limited interpretability. These studies propose various methods for laryngeal cancer detection from neck area images. Yet, there is a notable research gap in ensuring scalability and robustness across numerous datasets, addressing computational complexity, meeting precise image preprocessing needs, and efficiently combining handcrafted and deep features. Table 1 summarises the existing studies on laryngeal cancer diagnosis.
The proposed method
This section introduces an automated LCD-DOAEL method for biomedical throat region images. The LCD-DOAEL method aims to investigate the throat region images for the presence of laryngeal cancer. To accomplish this, the LCD-DOAEL technique comprises different sub-processes, such as GF-based preprocessing, MobileNetv2-based feature extractor, DOA-based hyperparameter tuning, and ensemble learning process. Figure 1 depicts the entire procedure of the LCD-DOAEL technique.
Image preprocessing
In the first stage of the LCD-DOAEL technique, the GF approach is applied to eliminate the noise in the biomedical images. GF is a popular image-processing algorithm intended to blur or smooth an image25. The underlying idea behind GF is to convolve the image with a Gaussian kernel, a 2D distribution characterized by a bell-shaped curve. The convolution operation involves sliding the Gaussian kernel over each image pixel, with the neighbouring pixels contributing to a weighted average at each position according to the Gaussian distribution. GF is highly efficient in reducing high-frequency noise in the image while preserving edges and details. The extent of blurring or smoothing is controlled by the variance of the Gaussian kernel: a larger variance yields a smoother, broader bell curve and thus more substantial blurring.
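As a concrete illustration of this step, the sketch below builds a normalized 2D Gaussian kernel and slides it over a noisy synthetic image. The kernel size and variance are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    """Build a normalized 2D Gaussian kernel (the bell-shaped curve)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return kernel / kernel.sum()  # weights sum to 1 -> weighted average

def gaussian_filter(image, size=5, sigma=1.0):
    """Slide the kernel over every pixel (edge-padded convolution)."""
    k = gaussian_kernel(size, sigma)
    pad = size // 2
    padded = np.pad(image, pad, mode="edge")
    out = np.zeros_like(image, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + size, j:j + size] * k)
    return out

# Noisy synthetic "image": smoothing should lower the variance
rng = np.random.default_rng(0)
img = np.full((32, 32), 100.0) + rng.normal(0, 20, (32, 32))
smoothed = gaussian_filter(img, size=5, sigma=1.0)
```

Raising `sigma` broadens the bell curve and increases the amount of smoothing, at the cost of softer edges.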
MobileNetv2 feature extractor
The MobileNetv2 model is employed for the feature extraction process. CNN is the foundation of modern image classification technology. It is a kind of NN model in which convolution is applied within a layer rather than general matrix multiplication26. In contrast to an NN that treats all components of the source image independently as input, convolution considers neighbourhood pixels, considerably improving network efficiency. MobileNet is based on a simplified design that constructs lightweight deep networks using depthwise separable convolutional layers. The basis of the MobileNet architecture is the factorization of the typical convolutional layer into a depthwise convolution (DWC) and a 1×1 convolution called a pointwise convolution (PWC). The DWC in MobileNet applies one filter to each input channel. Then, the outcome of the DWC is combined across channels by the PWC using 1×1 convolutions. A typical convolution both filters the input and combines it into a new set of outputs in a single step, whereas the depthwise separable convolution splits this into two layers, one for filtering and another for combining. This leads to a considerable reduction in model size and computation. Figure 2 demonstrates the framework of MobileNetv2.
As the second version of MobileNet, MobileNetV2 is a highly efficient and lightweight DL method that satisfies the requirements of edge computing systems and mobile devices and excels in resource-constrained environments. MobileNetV2 is intended to balance model size and computational efficiency. Built around the "bottleneck" layer, this model primarily consists of depthwise separable convolutions. This layer significantly decreases computational complexity and model parameters, optimizing accuracy while retaining model capacity. MobileNetV2 is also defined by the efficient approach of "inverted residuals". This paradigm strikes a balance between a linear bottleneck and lightweight expansion, improving the adaptability and efficiency of the model. Furthermore, the model variant used here integrates a "squeeze-and-excitation" module that enhances its capability for capturing crucial features by re-weighting the channel-wise feature responses. The proposed model concludes with an FC layer sized to the number of classes and the 'LogSoftmax' activation function for class prediction. The architecture of MobileNetV2 can accommodate various constraints and applications, which makes it an essential component in DL and CV.
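The parameter savings from factoring a standard convolution into a DWC plus a 1×1 PWC can be checked with simple arithmetic. The layer shape below (3×3 kernel, 32→64 channels) is an arbitrary example, not a layer from the paper's network.

```python
# Parameter counts for one conv layer with a k x k kernel,
# c_in input channels and c_out output channels (biases ignored).
def standard_conv_params(k, c_in, c_out):
    # Each of the c_out filters spans all c_in channels.
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    depthwise = k * k * c_in          # one k x k filter per input channel
    pointwise = 1 * 1 * c_in * c_out  # 1x1 convolution mixes channels
    return depthwise + pointwise

# Example layer: 3x3 kernel, 32 -> 64 channels
std = standard_conv_params(3, 32, 64)        # 9 * 32 * 64 = 18432
sep = depthwise_separable_params(3, 32, 64)  # 288 + 2048 = 2336
reduction = std / sep                        # roughly 8x fewer parameters
```

The same factor applies to multiply-accumulate operations, which is why the factorized layers suit mobile and embedded targets.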
Hyperparameter tuning using DOA mode
In this phase, the hyperparameter selection of MobileNetv2 is performed by the design of the DOA. Metaheuristic techniques derive behaviours from natural processes. DO is a bio-inspired optimization method that employs swarm intelligence (SI) to deal with continuous optimization problems27. DO was introduced with inspiration from the wind-blown behaviour of the dandelion plant. Seeds move in three phases: ascending, descending, and settling at a random position in the landing phase. The DO method characterizes these three phases with mathematical representations and searches for optimum solutions by imitating these behaviours. The mathematical stages of the DO technique are detailed below.
-
1.
Initial population: Generate an initial population randomly.
$$Population=\left[\begin{array}{lll}{D}_{1}^{1}& \dots & {D}_{1}^{Dim}\\ \vdots & \ddots & \vdots \\ {D}_{pop}^{1}& \dots & {D}_{pop}^{Dim}\end{array}\right]$$(1)Here, pop refers to the population size and \(Dim\) the dimension of the variables. Every candidate solution is randomly generated between the lower limit (\({L}_{B}\)) and upper limit (\({U}_{B}\)) of the specified problem. The symbol “rand” denotes a function whose values are randomly drawn from \([\text{0,1}]\). The \({i}^{th}\) individual \({D}_{i}\) can be stated as follows.
$${D}_{i}=rand\times \left({U}_{B}-{L}_{B}\right)+{L}_{B}$$(2) -
2.
Evaluation of fitness values: The fitness value of every individual is computed for the given problem. The individual with the best fitness value is regarded as the elite. The initial best candidate solution is precisely given by:
$${D}_{elite}=D\left(find\left({f}_{best}=f\left({D}_{i}\right)\right)\right)$$(3) -
3.
Ascension stage: In this stage, the positions of individuals are updated using the FF values. Under the impact of parameters like air humidity and wind speed, dandelion seeds rise to various heights. Here, the weather can be separated into two conditions.
Case 1: On a clear day, wind speeds follow a lognormal distribution \(\text{ln }Y\sim N\left(\mu , {\sigma }^{2}\right)\). The new location of the seeds is computed as specified in Eq. (4).
$${D}_{(t+1)}={D}_{t}+\delta \times {v}_{x}\times {v}_{y}\times \text{ ln }Y\times \left({D}_{s}-{D}_{t}\right)$$(4)Here, \({D}_{t}\) denotes the location of the dandelion seed at the \({t}^{th}\) iteration, \({D}_{s}\) denotes a randomly selected location within the search range at the \({t}^{th}\) iteration, and \({v}_{x}\) and \({v}_{y}\) denote the lift component coefficients arising from the separated eddy action on the dandelion. \(\delta\) denotes a coefficient in the range \([\text{0,1}]\) that decreases nonlinearly and approaches zero.
The lognormal distribution assumed in Eq. (4) uses \(\mu =0\) and \({\sigma }^{2}=1\) and is given by Eq. (5):
$$\text{ln }Y=\left\{\begin{array}{l}\frac{1}{y\sqrt{2\pi }}\text{exp}\left[-\frac{1}{2{\sigma }^{2}}{\left(\text{ln }y\right)}^{2}\right] y>0\\ 0 y\le 0\end{array}\right.$$(5)In every iteration, an adaptive parameter \(\gamma\) is employed to control the length of the search over the total iteration count \(T\). In the DO technique, the \(y\) value is drawn uniformly from \([\text{0,1}]\), and \(\gamma\) can be described as:
$$\gamma =rand*\left(\frac{1}{{T}^{2}}{t}^{2}-\frac{2}{T}t+1\right)$$(6)Case 2: On a day marked by rainfall, the ascent of dandelion seeds is hindered by humidity, air resistance, and other factors. Accordingly, the seeds stay near their current position, and this behaviour is determined by Eq. (7):
$${D}_{(t+1)}={D}_{t}\times \left( 1-rand \times p\right)$$(7)Here, \(p\) is a parameter employed to regulate the local search region of the dandelion, computed as specified in Eq. (8). This value is updated at every iteration depending on the maximum number of iterations and the current iteration.
$$p=\left(\frac{{t}^{2}-2t+1}{{T}^{2}-2T+1}+1\right)$$(8)Here, \(T\) denotes the maximal number of iterations, and \(t\) signifies the current iteration.
-
4.
Descent phase: Individuals descend from the height reached in the ascension stage, and their location is updated.
$${D}_{t+1}={D}_{t}-\alpha \times {\beta }_{t}\times \left({D}_{mea{n}_{-}t}-\alpha \times {\beta }_{t}\times {D}_{t}\right)$$(9)Here, \({D}_{mean\_t}\) refers to the mean position of the population at the \({t}^{th}\) iteration, and \({\beta }_{t}\) signifies Brownian motion, a random number drawn from the standard normal distribution.
-
5.
Landing location determination: Seeds settle in a random position determined by weather and wind conditions in their new location. Utilizing the population's evolution, the global best solution is approached as represented in Eq. (10).
$${D}_{t+1}={D}_{elite}+levy\left(\lambda \right)\times \alpha \times \left({D}_{elite}-{D}_{t}\times \sigma \right)$$(10)Here, \({D}_{elite}\) is the optimum position of the dandelion seed at the \({t}^{th}\) iteration. \(levy\left(\lambda \right)\) symbolizes the Levy flight operation and is computed by Eq. (11):
$$levy\left(\lambda \right)=s\times \frac{w\times \sigma }{{\left|t\right|}^{\frac{1}{\beta }}}$$(11)where \(\beta\) is a random number in \([\text{0,2}]\), \(s\) is a constant equal to 0.01, and \(w\) and \(t\) are random numbers drawn from \([\text{0,1}]\). \(\sigma\) is computed as:
$$\sigma =\left(\frac{\Gamma \left(1+\beta \right)\times sin\left(\frac{\pi \beta }{2}\right)}{\Gamma \left(\frac{1+\beta }{2}\right)\times \beta \times {2}^{\left(\frac{\beta -1}{2}\right)}}\right)$$(12) -
6.
Repopulation: A new population is formed from the updated positions.
-
7.
Stopping conditions: Steps 2–7 are repeated until the stopping criterion is met.
-
8.
Best value: The individual with the optimum fitness value is considered the optimum solution.
The DOA method derives an FF to attain better classification accuracy. The FF produces a value that depicts the quality of the candidate solution; here, the reduction of the classifier error rate is assumed as the FF.
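The steps above can be sketched as a minimal DO loop. The code below is a simplified, hedged reading of Eqs. (1)-(12) on a toy sphere objective standing in for the classifier error; the decay schedules, the 50/50 weather split, folding the lift coefficients into the lognormal draw, and taking \(\sigma=1\) in the landing step are all simplifying assumptions, not the paper's exact configuration.

```python
import math
import numpy as np

rng = np.random.default_rng(1)

def fitness(x):
    # Stand-in objective (sphere function); in LCD-DOAEL this would be the
    # classification error of MobileNetV2 under candidate hyperparameters.
    return float(np.sum(x ** 2))

def levy(dim, beta=1.5, s=0.01):
    # Levy flight step, Eqs. (11)-(12)
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)))
    w, t = rng.random(dim), rng.random(dim) + 1e-12
    return s * w * sigma / np.abs(t) ** (1 / beta)

pop, dim, T = 20, 5, 60
LB, UB = -5.0, 5.0
D = rng.random((pop, dim)) * (UB - LB) + LB       # Eq. (2): initialization
fit = np.array([fitness(d) for d in D])
elite = D[fit.argmin()].copy()                    # Eq. (3): initial elite
initial_best = fit.min()

for t in range(1, T + 1):
    delta = (1 - t / T) ** 2                      # simplified decay toward 0
    alpha = rng.random() * (t * t / T ** 2 - 2 * t / T + 1)  # Eq. (6)-style
    for i in range(pop):
        if rng.random() < 0.5:                    # clear day: ascend, Eq. (4)
            lnY = rng.lognormal(0.0, 1.0)         # lift terms folded in
            Ds = rng.random(dim) * (UB - LB) + LB
            D[i] = D[i] + delta * lnY * (Ds - D[i])
        else:                                     # rainy day, Eqs. (7)-(8)
            p = (t * t - 2 * t + 1) / (T * T - 2 * T + 1) + 1
            D[i] = D[i] * (1 - rng.random() * p)
        beta_t = rng.normal(0.0, 1.0, dim)        # Brownian motion
        D_mean = D.mean(axis=0)
        D[i] = D[i] - alpha * beta_t * (D_mean - alpha * beta_t * D[i])  # Eq. (9)
        D[i] = elite + levy(dim) * alpha * (elite - D[i])  # Eq. (10), sigma=1
        D[i] = np.clip(D[i], LB, UB)
    fit = np.array([fitness(d) for d in D])
    if fit.min() < fitness(elite):                # greedy elite update
        elite = D[fit.argmin()].copy()
```

Because the elite is replaced only when a strictly better candidate appears, the best fitness is non-increasing over iterations.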
Ensemble learning
Finally, the classification process utilizes an ensemble of three classifiers: BPNN, regularized ELM, and the BiLSTM model.
BPNN model
BPNN is a computational model developed by simulating the functional mode of biological neurons28. It is generally made up of input, hidden, and output layers, and its operation primarily comprises forward propagation and backpropagation (BP). During forward propagation, each input sample is first fed into the neurons of the input layer, then transferred to the neurons of the following layer through a specific logical correlation, as expressed in Eq. (14), and finally to the neurons of the output layer.
This logical correlation comprises the bias \({b}_{i}\) of the neurons and the weight \({w}_{ij}\) connecting the \({i}^{th}\) neuron of the \({k}^{th}\) layer to the \({j}^{th}\) neuron of the \((k+1)^{th}\) layer. The values obtained at the \((k+1)^{th}\) layer are passed through the activation function before being forwarded to the \((k+2)^{th}\) layer. During BP, the estimated value \(\widehat{y}\) acquired through the above-mentioned logical correlation is compared with the experimental value \(y\), and the loss function is computed. Thereby, the gradients of the loss function are obtained. Finally, the weights and biases are continuously updated by employing the gradient descent technique until the optimum weight vector and bias of every layer are attained.
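A minimal numpy sketch of forward propagation, backpropagation, and gradient-descent weight updates is shown below on a toy XOR task; the layer sizes, learning rate, and mean-squared-error loss are illustrative choices, not the paper's BPNN configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: XOR, a classic task that needs a hidden layer
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer: weights w1, w2 and biases b1, b2
w1 = rng.normal(0, 1, (2, 8)); b1 = np.zeros(8)
w2 = rng.normal(0, 1, (8, 1)); b2 = np.zeros(1)

lr = 0.5
losses = []
for _ in range(2000):
    # Forward propagation: input -> hidden -> output
    h = sigmoid(X @ w1 + b1)
    out = sigmoid(h @ w2 + b2)
    losses.append(float(np.mean((out - y) ** 2)))
    # Backpropagation: gradients of the MSE loss through the sigmoids
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ w2.T) * h * (1 - h)
    # Gradient-descent updates of weights and biases
    w2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    w1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)
```

The recorded loss trace should shrink as the updates repeatedly step against the gradient.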
Regularized ELM model
As an SLFN, ELM features \(M\) training samples29. \(\left\{\left({x}_{j}, {t}_{j}\right),j=1, \cdots , M\right\},{x}_{j}=\{{x}_{1}, {x}_{2}, \cdots , {x}_{m}{\}}^{T},{t}_{j}=\{{t}_{1}, {t}_{2}, \cdots , {t}_{n}{\}}^{T}\), where \({x}_{j}\) and \({t}_{j}\) indicate the input and output vectors of the \({j}^{th}\) sample, correspondingly. The activation function is \(g\left(w, b, x\right)\), and the number of hidden-layer (HL) nodes is \(L\). The architecture of ELM comprises \(n\) output neurons, \(m\) input neurons, and \(L\) hidden neurons:
In Eq. (15), \({\beta }_{i}=[{\beta }_{i1}, {\beta }_{i2}, \cdots , {\beta }_{in}{]}^{T}\) shows the connecting weight vector from the \({i}^{th}\) hidden neuron to the output layer, \({W}_{i}=\{{W}_{i1}, {W}_{i2}, \cdots , {W}_{im}{\}}^{T}\) signifies the connecting weight vector from the input layer to the \({i}^{th}\) hidden neuron, and \({b}_{i}\) represents the bias of the \({i}^{th}\) hidden node; each of them is produced at random.
Substituting Eq. (17) into Eq. (16), the solution is attained by least squares and singular value decomposition:
A regularization coefficient enhances the structural stability of ELM and produces the regularized ELM (RELM):
In Eq. (19), \(I\) shows the unit matrix, and \(C\) indicates the regularization factor.
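The closed-form RELM solution described above can be sketched as follows, with the regularized output weights computed as \(\beta = (I/C + H^{T}H)^{-1}H^{T}T\); the hidden-layer size \(L\), regularization factor \(C\), tanh activation, and toy regression target are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def relm_train(X, T, L=50, C=100.0):
    """Regularized ELM: random hidden layer, closed-form output weights."""
    m = X.shape[1]
    W = rng.normal(0, 1, (m, L))   # random input weights (never trained)
    b = rng.normal(0, 1, L)        # random hidden biases
    H = np.tanh(X @ W + b)         # hidden-layer output matrix
    # beta = (I/C + H^T H)^{-1} H^T T  -- the regularized solution
    beta = np.linalg.solve(np.eye(L) / C + H.T @ H, H.T @ T)
    return W, b, beta

def relm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy regression target: a smooth nonlinear function of two inputs
X = rng.uniform(-1, 1, (200, 2))
T = np.sin(X[:, :1] * 3) + X[:, 1:2] ** 2
W, b, beta = relm_train(X, T)
pred = relm_predict(X, W, b, beta)
mse = float(np.mean((pred - T) ** 2))
```

Only `beta` is solved for; the random hidden layer is fixed, which is what makes ELM training a single linear solve rather than an iterative procedure.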
BiLSTM model
The BiLSTM plays an important role where each element in the input signal fuses the corresponding information from the past and the future30, producing better output in such cases. Consider the linear model \(fd \left(si, vx\right)={\sum }_{bc=1}^{hd}s{i}_{bc}v{x}_{bc}\), where \(hd\) denotes the input size and \(si\), \(vx\), and \(fd(\cdot )\) indicate the input, the weights, and the network output, respectively. The BiLSTM model has two LSTM layers. One LSTM layer is trained with the input series in the forward direction. The input series is given in reverse order to train the additional LSTM layer in the backward direction. This input sequence contains the real and imaginary parts of the training data. The LSTM is used to resolve the gradient problems that arise in RNNs on long data series.
Here, \({R}_{vr},{ R}_{rv},{ R}_{ti}, \text{and } {R}_{ho}\) are the weight matrices on the input state. The weight matrices from the prior short-term state \(g{d}_{mn-1}\) are represented as \({V}_{vr},{ V}_{rv},{ V}_{ti}, \text{and } {V}_{ho}\). The variables \(h{a}_{vr}, h{a}_{rv}, h{a}_{ti}, \text{and } h{a}_{ho}\) represent the biases. The long-term state \(c{d}_{mn}\) is represented as follows
Lastly, the output \({f}_{mn}\) is formulated by:
where \(c{d}_{mn-1}\) is a variable denoting the prior long-term state.
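The paper does not state how the three classifiers' outputs are fused, so the sketch below assumes a simple soft-voting rule that averages class probabilities across BiLSTM, RELM, and BPNN; the probability values are made up for illustration.

```python
import numpy as np

def soft_vote(prob_list, weights=None):
    """Average class probabilities from several classifiers and
    return the winning class per sample plus the averaged matrix."""
    probs = np.stack(prob_list)        # (n_models, n_samples, n_classes)
    avg = np.average(probs, axis=0, weights=weights)
    return avg.argmax(axis=1), avg

# Illustrative outputs of BiLSTM, RELM and BPNN for 3 samples, 4 classes
p_bilstm = np.array([[0.7, 0.1, 0.1, 0.1],
                     [0.2, 0.5, 0.2, 0.1],
                     [0.1, 0.1, 0.2, 0.6]])
p_relm   = np.array([[0.6, 0.2, 0.1, 0.1],
                     [0.1, 0.6, 0.2, 0.1],
                     [0.2, 0.1, 0.1, 0.6]])
p_bpnn   = np.array([[0.8, 0.1, 0.05, 0.05],
                     [0.3, 0.4, 0.2, 0.1],
                     [0.1, 0.2, 0.1, 0.6]])
labels, avg = soft_vote([p_bilstm, p_relm, p_bpnn])
```

Passing `weights` would implement a weighted ensemble, e.g. favouring the classifier with the best validation accuracy.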
Experimental validation
The experimental validation inspects the LC detection outcomes of the LCD-DOAEL technique on the throat image dataset31, containing 1320 samples with four classes, as illustrated in Table 2. Figure 3 shows sample images. The proposed technique is simulated using Python 3.6.5 on a PC with an i5-8600K CPU, GeForce 1050Ti 4 GB GPU, 16 GB RAM, 250 GB SSD, and 1 TB HDD. The parameter settings are as follows: learning rate 0.01, ReLU activation, 50 epochs, dropout 0.5, and batch size 5.
The confusion matrices formed by the LCD-DOAEL method on 80%TRAPH/20%TESPH and 70%TRAPH/30%TESPH are demonstrated in Fig. 4. The experimental value inferred the effective detection and classification of the above four class labels. The confusion matrices illustrate the classification performance across diverse phases and classes. In the training phase (80%), classes He, Hbv, IPCL, and Le portray varying levels of correct and misclassified instances, reflecting the model's performance during training. The testing phase (20% and 30%) further evaluates the model's ability to generalize to new data, with similar patterns observed across the predicted classes. Overall, the matrices depict the distribution of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN), providing insights into the classifier's accuracy and errors across diverse datasets and phases of evaluation.
The LC detection results of the LCD-DOAEL technique with 80%TRAPH/20%TESPH are reported in Table 3 and Fig. 5. The experimental outcome demonstrates the proficient recognition outcomes of the LCD-DOAEL technique under four classes. With 80%TRAPH, the LCD-DOAEL method offers an average \(acc{u}_{y}\) of 96.21%, \(pre{c}_{n}\) of 92.53%, \(rec{a}_{l}\) of 92.42%, \({F}_{score}\) of 92.40%, and \(AU{C}_{score}\) of 94.97%. Additionally, with 20%TESPH, the LCD-DOAEL method provides an average \(acc{u}_{y}\) of 97.54%, \(pre{c}_{n}\) of 95.03%, \(rec{a}_{l}\) of 95.03%, \({F}_{score}\) of 94.91%, and \(AU{C}_{score}\) of 96.70%.
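Metrics of this kind can be reproduced from a confusion matrix as follows. The matrix values below are illustrative, not the paper's actual results, and the per-class (one-vs-rest) accuracies are macro-averaged, which is one common convention.

```python
import numpy as np

def per_class_metrics(cm):
    """Macro-averaged accuracy, precision, recall and F-score from a
    multi-class confusion matrix (rows = true class, cols = predicted)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp          # predicted as class c but wrong
    fn = cm.sum(axis=1) - tp          # class c missed
    tn = cm.sum() - tp - fp - fn
    acc = (tp + tn) / cm.sum()        # per-class one-vs-rest accuracy
    prec = tp / (tp + fp)
    rec = tp / (tp + fn)
    f1 = 2 * prec * rec / (prec + rec)
    return acc.mean(), prec.mean(), rec.mean(), f1.mean()

# Illustrative 4-class confusion matrix (He, Hbv, IPCL, Le)
cm = [[60,  2,  2,  2],
      [ 3, 58,  2,  3],
      [ 1,  2, 61,  2],
      [ 2,  1,  2, 61]]
acc, prec, rec, f1 = per_class_metrics(cm)
```

This one-vs-rest accuracy convention explains why the reported average accuracy can exceed the average precision and recall on the same predictions.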
The \(acc{u}_{y}\) curves for training (TRA) and validation (VAL) given in Fig. 6 for the LCD-DOAEL method with 80%TRAPH/20%TESPH provide meaningful insight into its performance over different epochs. There is a continuous refinement in TRA \(acc{u}_{y}\) and TES \(acc{u}_{y}\) as epochs increase, which indicates the model's proficiency in recognizing and learning patterns from both datasets. The increasing trend in TES \(acc{u}_{y}\) underscores the model's adaptability to the TRA data and its proficiency in making accurate predictions on unseen data, highlighting strong generalizability.
Figure 7 provides a detailed review of TRA and TES loss values for the LCD-DOAEL technique with 80%TRAPH/20%TESPH over various epochs. The TRA loss continuously minimizes as the model improves its weights to lessen the classifier error rate on both datasets. The loss curve demonstrates the model's alignment with the TRA data, highlighting its ability to effectively capture patterns in both datasets. The consistent improvement of parameters in the LCD-DOAEL technique is intended to reduce inconsistencies between predictions and actual TRA labels.
The findings affirm that the LCD-DOAEL model with 80%TRAPH/20%TESPH constantly obtains high PR values over all the classes regarding the PR curve given in Fig. 8. These outcomes emphasize the model's ability to discriminate between various classes, highlighting its effectiveness in accurately detecting classes.
Further, ROC curves produced by the LCD-DOAEL method with 80%TRAPH/20%TESPH are presented in Fig. 9, illustrating its ability to distinguish between class labels. This curve offers valuable insights into how TPR and FPR tradeoffs differ over diverse classifier epochs and thresholds. The outcomes emphasize the model's accurate classification outcomes on different classes, highlighting its efficiency in addressing various classification problems.
The LC detection outcomes of the LCD-DOAEL method with 70%TRAPH/30%TESPH are shown in Table 4 and Fig. 10. The outcome illustrates the proficient detection outcome of the LCD-DOAEL method on four different classes. With 70%TRAPH, the LCD-DOAEL method provides average \(acc{u}_{y}\) of 96.16%, \(pre{c}_{n}\) of 92.36%, \(rec{a}_{l}\) of 92.35%, \({F}_{score}\) of 92.33%, and \(AU{C}_{score}\) of 94.90%. Furthermore, with 30%TESPH, the LCD-DOAEL method provides an average \(acc{u}_{y}\) of 95.96%, \(pre{c}_{n}\) of 91.99%, \(rec{a}_{l}\) of 91.86%, \({F}_{score}\) of 91.91%, and \(AU{C}_{score}\) of 94.58%.
The \(acc{u}_{y}\) curves for TRA and VAL given in Fig. 11 for the LCD-DOAEL method with 70%TRAPH/30%TESPH offer meaningful insight into its performance over different epochs. Notably, there is a continuing improvement in TRA \(acc{u}_{y}\) and TES \(acc{u}_{y}\) as epochs increase, which indicates the model's proficiency in learning and recognizing patterns from both datasets. The increasing trend in TES \(acc{u}_{y}\) highlights the model's adaptability to the TRA dataset and its ability to make correct predictions on unseen data, highlighting strong generalizability.
Figure 12 shows a comprehensive review of the TR and TS loss values for the LCD-DOAEL method with 70%TRAPH/30%TESPH over different epochs. The TR loss continuously lessens as the model refines its weights to lessen the classifier error rate on both datasets. The loss curve demonstrates the model's alignment with the TR data, emphasising its proficiency in effectively capturing patterns in two datasets. Noteworthy is the continuous improvement of the parameter in the LCD-DOAEL technique, aimed at reducing inconsistencies between predictions and actual TR labels.
The outcomes affirm that the LCD-DOAEL method with 70%TRAPH/30%TESPH constantly obtains maximum PR values across the class labels regarding the PR curve presented in Fig. 13. These outcomes emphasize the model's effective capability to discriminate between various classes, highlighting its efficiency in correctly detecting classes.
Further, Fig. 14 shows ROC curves produced by the LCD-DOAEL technique with 70%TRAPH/30%TESPH, which demonstrates its ability to distinguish between classes. These curves offer valuable insights into how the tradeoff between TPR and FPR differs over diverse thresholds and classification epochs. The outcomes highlight the model's accurate classification outcomes on different classes, underscoring its efficiency in addressing different classification problems.
Table 5 and Fig. 15 highlight the comparative analysis of the LCD-DOAEL technique with recent models14. The results indicate that the DCNN, VGG-19, VGG-16, and AlexNet techniques fail to reach good results. Simultaneously, the Xception and ResNet approaches attain moderate performance. Meanwhile, the LCDC-AOADL technique gains near-optimal performance. However, the LCD-DOAEL technique outperforms the other models with an enhanced \(acc{u}_{y}\) of 97.54%, \(pre{c}_{n}\) of 95.03%, \(rec{a}_{l}\) of 95.03%, and \({F}_{score}\) of 94.91%.
Table 6 and Fig. 16 present a comparative computational time (CT) analysis of the LCD-DOAEL method against existing approaches. The experimental outcome shows that the DCNN, VGG-19, VGG-16, and AlexNet approaches perform poorly. Simultaneously, the Xception and ResNet techniques obtain moderate outcomes, while the LCDC-AOADL technique attains near-optimal performance. However, the LCD-DOAEL method performs more efficiently than the other techniques, with the lowest CT of 0.53 s.
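A per-model CT comparison of this kind is typically obtained by averaging wall-clock time over repeated forward passes after a warm-up call. The sketch below illustrates the measurement pattern only; `dummy_model` is a hypothetical stand-in for a real classifier, not part of the paper's method.

```python
import time

def mean_inference_time(model_fn, batch, repeats=10):
    """Average wall-clock time of one call to model_fn, after a warm-up run.

    The warm-up call absorbs one-off costs (caching, lazy initialization)
    so they do not inflate the measured average.
    """
    model_fn(batch)                      # warm-up, result discarded
    t0 = time.perf_counter()
    for _ in range(repeats):
        model_fn(batch)
    return (time.perf_counter() - t0) / repeats

def dummy_model(batch):
    """Hypothetical stand-in: any callable taking a batch works here."""
    return [sum(row) for row in batch]

batch = [[0.1] * 224 for _ in range(32)]   # toy batch shaped like image rows
ct = mean_inference_time(dummy_model, batch)
print(f"mean CT: {ct * 1000:.3f} ms")
```

`time.perf_counter` is preferred over `time.time` for such measurements because it is monotonic and has the highest available resolution.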
Thus, the LCD-DOAEL technique can enhance cancer detection in throat region images.
Conclusion and future work
This article introduces an automated LCD-DOAEL method for biomedical throat region images. The LCD-DOAEL method aims to investigate images of the throat region for the presence of laryngeal cancer. To accomplish this, it comprises several sub-processes: GF-based preprocessing, MobileNetV2-based feature extraction, DOA-based hyperparameter tuning, and an ensemble learning process. Initially, the GF approach is applied to eliminate noise in the biomedical images. The complex and intrinsic feature patterns are then extracted by the MobileNetV2 model, while the DOA model carries out the hyperparameter selection of the MobileNetV2 architecture. Finally, an ensemble of three classifiers, namely BPNN, regularized ELM, and BiLSTM, performs the classification. A comprehensive set of simulations conducted on the biomedical image dataset highlights the effective performance of the LCD-DOAEL method, which exhibited a superior accuracy of 97.54% over other existing techniques. The LCD-DOAEL method nevertheless faces several potential limitations and areas for future improvement. First, the efficiency of GF may vary with the complexity and variability of noise in biomedical images, justifying robustness analysis across several datasets. Second, while MobileNetV2 is effective, its application to highly detailed or varied biomedical images may require additional adaptation or augmentation. Third, optimizing hyperparameters with DOA improves performance but may need continual refinement as datasets evolve. Future research could explore hybrid or alternative filtering models, incorporate more advanced neural network architectures, and enhance ensemble training techniques to attain even higher classification accuracy and generalization in biomedical image analysis.
Moreover, examining methods to reduce bias and discrepancy trade-offs among the ensemble classifiers could further optimize performance in challenging diagnostic scenarios.
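The paper does not spell out the fusion rule used to combine the three classifiers; as one common and minimal possibility, simple majority voting over per-sample labels can be sketched as follows (assumed purely for illustration, not confirmed as the authors' scheme):

```python
from collections import Counter

def majority_vote(pred_lists):
    """Fuse per-sample predictions from several classifiers by majority vote.

    pred_lists: one prediction list per classifier (e.g. BPNN, RELM, BiLSTM).
    On a tie, Counter.most_common keeps insertion order, so the label seen
    first (i.e. from the earliest-listed classifier) wins.
    """
    fused = []
    for sample_preds in zip(*pred_lists):
        counts = Counter(sample_preds)
        fused.append(counts.most_common(1)[0][0])
    return fused

# Illustrative per-classifier predictions for four samples.
bpnn   = ["cancer",  "healthy", "cancer", "healthy"]
relm   = ["cancer",  "cancer",  "cancer", "healthy"]
bilstm = ["healthy", "cancer",  "cancer", "healthy"]
print(majority_vote([bpnn, relm, bilstm]))
# → ['cancer', 'cancer', 'cancer', 'healthy']
```

Weighted voting or averaging of class probabilities are natural alternatives when the base classifiers output calibrated scores rather than hard labels.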
Data availability
The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.
References
Young, G. O. Synthetic structure of industrial plastics. In Plastics, 2nd ed Vol. 3 (ed. Peters, J.) 15–64 (McGraw-Hill, 1964).
Bur, A. M. et al. Interpretable computer vision to detect and classify structural laryngeal lesions in digital flexible laryngoscopic images. Otolaryngol.-Head Neck Surg. 169, 1564–1572 (2023).
Raoof, S. S., Jabbar, M. A. & Fathima, S. A. Lung cancer prediction using machine learning: A comprehensive approach. In Proc. 2nd Int. Conf. Innov. Mech. Ind. Appl. (ICIMIA), 108–115 (2020).
Raoof, S. S., Jabbar, M. A. & Fathima, S. A. Lung cancer prediction using feature selection and recurrent residual convolutional neural network (RRCNN). In Machine Learning Methods for Signal, Image and Speech Processing, 23–46 (River Publishers, 2022).
Jabbar, M. A. Breast cancer data classification using ensemble machine learning. Eng. Appl. Sci. Res. 48(1), 65–72 (2021).
Wellenstein, D. J., Woodburn, J., Marres, H. A. M. & van den Broek, G. B. Detection of laryngeal carcinoma during endoscopy using artificial intelligence. Head Neck 45(9), 2217–2226 (2023).
Huang, P. et al. A ViT-AMC network with adaptive model fusion and multiobjective optimization for interpretable laryngeal tumor grading from histopathological images. IEEE Trans. Med. Imag. 42(1), 15–28 (2023).
Bhattacharya, D. et al. Learning robust representation for laryngeal cancer classification in vocal folds from narrow-band images. In Med. Imag. Deep Learn. (2022).
Meyer-Veit, F., Rayyes, R., Gerstner, A. O. H. & Steil, J. Hyperspectral wavelength analysis with U-Net for larynx cancer detection. In Proc. Eur. Symp. Artif. Neural Netw. (ESANN), Comput. Intell. Mach. Learn., Bruges, Belgium (2022).
Timurzieva, A., Kotov, V., Popadyuk, V. & Ganshin, I. Rapid diagnosis of laryngeal cancer using Raman fluorescence spectroscopy. J. Clin. Physiol. Pathol. 1(1), 21–27 (2022).
Gharehchopogh, F. S., Ghafouri, S., Namazi, M. & Arasteh, B. Advances in manta ray foraging optimization: A comprehensive survey. J. Bionic Eng. 21(2), 953–990 (2024).
Sharma, S., Khodadadi, N., Saha, A. K., Gharehchopogh, F. S. & Mirjalili, S. Non-dominated sorting advanced butterfly optimization algorithm for multi-objective problems. J. Bionic Eng. 20, 819–843 (2022).
Khodadadi, N., Soleimanian Gharehchopogh, F. & Mirjalili, S. MOAVOA: A new multi-objective artificial vultures optimization algorithm. Neural Comput. Appl. 34(23), 20791–20829 (2022).
Alrowais, F. et al. Laryngeal cancer detection and classification using aquila optimization algorithm with deep learning on throat region images. IEEE Access. 11, 115306–115315 (2023).
Bhattacharjee, R., Devi, K. S. & Vijaykanth, S. Detecting laryngeal cancer lesions from endoscopy images using deep ensemble model. In 2023 International Conference on Signal Processing, Computation, Electronics, Power and Telecommunication (IConSCEPT), 1–6 (IEEE, 2023).
Kwon, I. et al. Diagnosis of early glottic cancer using laryngeal image and voice based on ensemble learning of convolutional neural network classifiers. J. Voice https://doi.org/10.1016/j.jvoice.2022.07.007 (2022).
Sahoo, P. K., Mishra, S., Panigrahi, R., Bhoi, A. K. & Barsocchi, P. An improvised deep-learning-based mask R-CNN model for laryngeal cancer detection using CT images. Sensors 22(22), 8834 (2022).
Joseph, J. S., Vidyarthi, A. & Singh, V. P. An improved approach for initial stage detection of laryngeal cancer using effective hybrid features and ensemble learning method. Multimed. Tools Appl. 1–23 (2023).
Ding, H., Cen, Q., Si, X., Pan, Z. & Chen, X. Automatic glottis segmentation for laryngeal endoscopic images based on U-Net. Biomed. Signal Process. Control 71, 103116 (2022).
Pan, X., Ma, M., Bai, W. & Zhang, S. PISDGAN: Perceive image structure and details for laryngeal image enhancement. Biomed. Signal Process. Control 80, 104307 (2023).
Pan, X., Bai, W., Ma, M. & Zhang, S. RANT: A cascade reverse attention segmentation framework with a hybrid transformer for laryngeal endoscope images. Biomed. Signal Process. Control 78, 103890 (2022).
Gharehchopogh, F. S., Nadimi-Shahraki, M. H., Barshandeh, S., Abdollahzadeh, B. & Zamani, H. Cqffa: A chaotic quasi-oppositional farmland fertility algorithm for solving engineering optimization problems. J. Bionic Eng. 20(1), 158–183 (2023).
Gharehchopogh, F. S. & Ibrikci, T. An improved African vultures optimization algorithm using different fitness functions for multi-level thresholding image segmentation. Multimed. Tools Appl. 83(6), 16929–16975 (2024).
Gharehchopogh, F. S. & Khargoush, A. A. A chaotic-based interactive autodidactic school algorithm for data clustering problems and its application on COVID-19 disease detection. Symmetry 15(4), 894 (2023).
Abuya, T. K., Rimiru, R. M. & Okeyo, G. O. An image denoising technique using wavelet-anisotropic Gaussian filter-based denoising convolutional neural network for CT images. Appl. Sci. 13(21), 12069 (2023).
Hossen, M. M. et al. A reliable and robust deep learning model for effective recyclable waste classification. IEEE Access. 12, 13809–13821 (2024).
Zhang, B. et al. Dynamic community detection method of a social network based on node embedding representation. Mathematics 10(24), 4738 (2022).
Zheng, P., Wang, L., Ji, Y., Zeng, Y. & Chen, X. Backpropagation neural network modeling for a pulse tube refrigerator with passive displacer. Appl. Therm. Eng. 211, 118464 (2022).
Li, J., Zhang, X., Yao, Y., Qi, Y. & Peng, L. Regularized extreme learning machine based on remora optimization algorithm for printed matter illumination correction. IEEE Access 12, 3718–3735 (2024).
Kondepogu, V. & Bhattacharyya, B. Hybrid AE and Bi-LSTM-aided sparse multipath channel estimation in OFDM systems. IEEE Access 12, 7952–7965 (2024).
Acknowledgments
The authors extend their appreciation to the Deanship of Research and Graduate Studies at King Khalid University for funding this work through the Large Research Project under grant number RGP2/13/45; to Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia, through Researchers Supporting Project number (PNURSP2024R716); and to King Saud University, Riyadh, Saudi Arabia, through Researchers Supporting Project number (RSPD2024R787). The authors also extend their appreciation to the Deanship of Scientific Research at Northern Border University, Arar, KSA, for funding this research work through project number “NBU-FFR-2024-451-09”. This study is partially funded by the Future University in Egypt (FUE).
Author information
Contributions
Conceptualization: Sarah A. Alzakari. Data curation and formal analysis: Mashael Maashi, Abeer A. K. Alharbi. Investigation and methodology: Saad Alahmari, Ahmed Sayed. Project administration, resources, and supervision: Saad Alahmari, Abeer A. K. Alharbi. Validation and visualization: Munya A. Arasi, Ahmed Sayed. Writing—original draft: Saad Alahmari. Writing—review and editing: Saad Alahmari. All authors have read and agreed to the published version of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
Cite this article
Alzakari, S.A., Maashi, M., Alahmari, S. et al. Towards laryngeal cancer diagnosis using Dandelion Optimizer Algorithm with ensemble learning on biomedical throat region images. Sci Rep 14, 19713 (2024). https://doi.org/10.1038/s41598-024-70525-0