Abstract
Laryngeal cancer exhibits a notable global health burden, with later-stage detection contributing to a low survival rate. Laryngeal cancer diagnosis on throat region images is a pivotal application of computer vision (CV) and medical image diagnosis in the medical sector. It includes detecting and analysing abnormal or cancerous tissue from the larynx, an integral part of the vocal and respiratory systems. Computer-aided systems make use of artificial intelligence (AI) through deep learning (DL) and machine learning (ML) models, including convolutional neural networks (CNN), for automated disease diagnosis and detection. Various DL and ML approaches are executed to categorize the extracted features as healthy or cancerous tissue. This article introduces an automated Laryngeal Cancer Diagnosis using the Dandelion Optimizer Algorithm with Ensemble Learning (LCD-DOAEL) method on biomedical throat region images. The LCD-DOAEL method aims to investigate the images of the throat region for the presence of laryngeal cancer. In the LCD-DOAEL method, the Gaussian filtering (GF) approach is applied to eliminate the noise in the biomedical images. Besides, the complex and intrinsic feature patterns can be extracted by the MobileNetV2 model. Meanwhile, the DOA carries out the hyperparameter selection of the MobileNetV2 architecture. Finally, an ensemble of three classifiers, namely bidirectional long short-term memory (BiLSTM), regularized extreme learning machine (ELM), and backpropagation neural network (BPNN) models, is utilized for the classification process. A comprehensive set of simulations is conducted on the biomedical image dataset to highlight the efficient performance of the LCD-DOAEL technique. The comparison analysis of the LCD-DOAEL method exhibited a superior accuracy outcome of 97.54% over other existing techniques.
Introduction
Laryngeal cancer accounts for nearly 1% to 2% of cancers globally; its incidence is rising in some countries while declining in others. The highest incidence is in Europe, while mortality and incidence rates peak in Africa1. Hypopharyngeal cancer accounts for nearly 0.5% to 1%, with a noted rising rate among women in numerous countries. Above 95% of hypopharyngeal and laryngeal cancers are squamous cell carcinomas (SCC), driven mainly by alcohol and tobacco intake2. In contrast to oropharyngeal cancer, infection with the human papillomavirus does not play a significant part3,4. In modern diagnosis, false cancer detection also exists, which is frequently related to the natural anxiety of experts to avoid overlooking cancer at its initial stages5. To help with this, researchers attempt to enhance pre-treatment estimates and offer intraoperative data on the pathologies. To assess the histopathology of an abnormality, the study of endoscopic images may be the most accessible technique. Such images are utilized for lesion recognition and classification, as proposed in this paper6. However, standard imaging modalities such as white-light endoscopy (WLE) usually deliver restricted data about the laryngeal tissue. Narrow-band imaging (NBI) offers a promising alternative to attain this goal7.
NBI is an optical technology that improves the expert's ability to discover and analyze lesions during endoscopic inspection8. It allows improved tissue characterisation with a filtered-spectrum illumination method that enhances the absorbance and scattering of light in tissue9. Moreover, studies indicate no significant differences in the accuracy of distinguishing malignant and benign lesions between NBI and WLE for AI10. CNNs are known for their essential benefit in processing large-scale images and have developed into a central point of study in many scientific areas and in medicine11. However, laryngeal SCC still needs to be explored in AI research. Deep convolutional neural networks (DCNN) have shown an extraordinary ability to diagnose many diseases, such as interstitial lung disease and breast tumours12. In the present scenario, significant development has been made in AI studies on head and neck cancer. Researchers have commonly used many AI models, furthering innovation in medical diagnosis and treatment13.
This article introduces an automated Laryngeal Cancer Diagnosis using the Dandelion Optimizer Algorithm with Ensemble Learning (LCD-DOAEL) method on biomedical throat region images. The LCD-DOAEL method aims to investigate the throat region images for the presence of laryngeal cancer. In the LCD-DOAEL method, the Gaussian filtering (GF) approach is applied to eliminate the noise in the biomedical images. Besides, the complex and intrinsic feature patterns can be extracted by the MobileNetV2 model. Meanwhile, the DOA carries out the hyperparameter selection of the MobileNetV2 architecture. Finally, an ensemble of three classifiers, namely bidirectional long short-term memory (BiLSTM), regularized extreme learning machine (ELM), and backpropagation neural network (BPNN) models, is utilized for the classification process. A comprehensive set of simulations is conducted on the biomedical image dataset to highlight the efficient performance of the LCD-DOAEL technique. The significant contributions of the LCD-DOAEL technique are listed as follows:
-
The LCD-DOAEL method utilizes the GF model to improve the quality of biomedical images by efficiently mitigating noise. This preprocessing step enhances the clarity of the input data and facilitates more precise and dependable feature extraction. By lessening noise artefacts, GF improves the overall accuracy and precision of subsequent image evaluation tasks.
-
The MobileNetV2 technique is implemented to extract complex, intrinsic feature patterns from biomedical images, utilizing its optimization for mobile and embedded vision applications. This model ensures computational efficiency and robustly captures features significant for precise biomedical image evaluation, thereby improving diagnostic abilities and research results.
-
The DOA method optimizes MobileNetV2's hyperparameters, such as learning rate and batch size, refining architecture settings to improve model performance. This technique ensures that MobileNetV2 operates at peak effectiveness, enhancing its accuracy and adaptability in biomedical image processing tasks and thereby advancing diagnostic precision and research abilities.
-
The novelty of the LCD-DOAEL technique lies in incorporating BiLSTM, regularized ELM, and BPNN classifiers for biomedical image classification. This ensemble integrates the merits of sequential learning, efficient learning, and DL paradigms to improve classification accuracy and robustness. By employing diverse learning models, the ensemble aims to exceed the performance limitations of individual models, offering promising enhancements in complex biomedical image evaluation for more accurate diagnostics and research insights.
Literature survey
Alrowais et al.14 present a Laryngeal Cancer Detection and Classification utilizing the Aquila Optimizer Algorithm with DL (LCDC-AOADL) approach on neck area images. The InceptionV3 model is used for the feature extraction procedure. Furthermore, the AOA is employed to tune the hyperparameters of the DBN technique, resulting in enhanced recognition rates. In15, a Deep Ensemble Learning (EL) model utilizing CNN and an image segmentation method is presented. The main aim of Kwon et al.16 is to improve classification accuracy by comparing the results gained by utilizing decision tree EL, which is mainly used to enhance the classification accuracy for smaller datasets, with excellent outcomes for the analysis of glottal cancer. Sahoo et al.17 proposed a novel and effective DL-based Mask RCNN technique for classifying laryngeal cancer and its signs using an image dataset. In18, a hybridization of handcrafted and deep features is developed in an initial laryngeal cancer classification structure. The handcrafted features utilizing first-order statistics (STAT) and Local Binary Patterns (LBP), and deep features from DenseNet201 utilizing transfer learning (TL), are extracted from the endoscopic narrow-band images of the larynx and merged into a combined feature set. The Recursive Feature Elimination with RF (RFE-RF) technique selects the optimum features. In19, a deep attention network (DAN)-based UNet with a colour normalization process (CN-DA-UNet) is developed to attain end-to-end segmentation of the glottal field. Initially, the original image is treated by colour normalization to decrease the harmful impacts of low contrast and considerable variances in colour among different images. Next, the normalized image is fed into the developed DA-UNet for feature extraction.
In20, an image enhancement model that can recognize image structure information is presented. The technique mainly depends on the standard cycle-consistency image translation design, which is altered into a U-shaped residual block. Simultaneously, a multi-scale cross-layer adaptation model (CLA) is designed, which fuses features of different strengths through feature transfer to attain richer image information. Pan et al.21 propose a reverse attention network with a hybrid transformer (RANT) and integrate CNN and transformer sequentially to capture the global dependency features according to the low-level spatial information. Then, the receptive field block module (RRM) links different-scale features in a cascade to progressively mine the target. Lastly, the segmentation outcomes are enhanced by a convolutional conditional random field (ConvCRF). Gharehchopogh et al.22 introduce CQFFA, an enhanced Firefly Algorithm (FFA) incorporating twelve chaotic maps and a Quasi-Oppositional (QO)-Based Learning mechanism. In23, the authors employ the enhanced African Vultures Optimization Algorithm (AVOA) with three thresholding criteria (Kapur's entropy, Tsallis entropy, and Otsu's method) for multi-threshold image segmentation. The model also integrates the Quantum Rotation Gate (QRG) to improve population diversity and escape local optima while using the Association Strategy (AS) for efficient solution retrieval and accelerated search for optimal outcomes. Gharehchopogh and Khargoush24 present an asymmetric clustering approach using the Interactive Autodidactic School (IAS) model. The IAS employing the Chebyshev chaotic function (CCF) demonstrates superior performance compared to other variants and metaheuristic algorithms in simulations.
The existing studies propose novel models such as DL with ensemble models, decision tree EL for smaller datasets, and advanced Mask RCNN models for precise classification and segmentation. Challenges across these models usually comprise dataset size limitations, computational complexity, reliance on precise image preprocessing, and variability in performance across diverse image qualities and conditions. The incorporation of handcrafted and deep features also presents risks in terms of feature engineering and generalizability. Furthermore, models integrating attention mechanisms and hybrid transformer architectures focus on improving classification accuracy but may suffer from training complexity and limited interpretability. These studies propose various methods for laryngeal cancer detection from neck area images. Yet, there is a notable research gap in ensuring scalability and robustness across numerous datasets, addressing computational complexity, meeting precise image preprocessing needs, and efficiently combining handcrafted and deep features. Table 1 summarises the existing studies on laryngeal cancer diagnosis.
The proposed method
This section introduces an automated LCD-DOAEL method for biomedical throat region images. The LCD-DOAEL method aims to investigate the throat region images for the presence of laryngeal cancer. To accomplish this, the LCD-DOAEL technique comprises different sub-processes, such as GF-based preprocessing, MobileNetv2-based feature extractor, DOA-based hyperparameter tuning, and ensemble learning process. Figure 1 depicts the entire procedure of the LCD-DOAEL technique.
Image preprocessing
In the first stage of the LCD-DOAEL technique, the GF approach is applied to eliminate the noise in the biomedical images. GF is a popular image-processing algorithm intended to blur or smooth an image25. The underlying idea behind GF is to convolve the image with a Gaussian kernel, a 2D distribution characterized by a bell-shaped curve. The convolution operation involves sliding the Gaussian kernel over each image pixel, with the neighbouring pixels contributing to a weighted average at each position according to the Gaussian distribution. GF is highly efficient in reducing high-frequency noise in the image while preserving edges and details. The extent of blurring or smoothing is controlled by the variance of the Gaussian kernel: a larger variance yields a smoother, broader bell curve and thus more substantial blurring.
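As a concrete illustration of this step, the sketch below builds a normalized 2D Gaussian kernel and slides it over a noisy synthetic image. The kernel size and variance are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    """Build a normalized 2D Gaussian kernel (the bell-shaped curve)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return kernel / kernel.sum()  # weights sum to 1 -> weighted average

def gaussian_filter(image, size=5, sigma=1.0):
    """Slide the kernel over every pixel (edge-padded convolution)."""
    k = gaussian_kernel(size, sigma)
    pad = size // 2
    padded = np.pad(image, pad, mode="edge")
    out = np.zeros_like(image, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + size, j:j + size] * k)
    return out

# Noisy synthetic "image": smoothing should lower the variance
rng = np.random.default_rng(0)
img = np.full((32, 32), 100.0) + rng.normal(0, 20, (32, 32))
smoothed = gaussian_filter(img, size=5, sigma=1.0)
```

Raising `sigma` broadens the bell curve and increases the amount of smoothing, at the cost of softer edges.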
MobileNetv2 feature extractor
The MobileNetv2 model is employed for the feature extraction process. CNN is the foundation of modern image classification technology. It is a kind of NN model in which convolution is applied within a layer rather than general matrix multiplication26. In contrast to an NN that treats all components of the source image independently as input, convolution considers neighbourhood pixels, considerably improving network efficiency. MobileNet is based on a simplified design that constructs lightweight deep networks using depthwise separable convolutional layers. The basis of the MobileNet architecture is the factorization of the typical convolutional layer into a depthwise convolution (DWC) and a 1×1 convolution called a pointwise convolution (PWC). The DWC in MobileNet applies one filter to each input channel. Then, the outcome of the DWC is combined across channels by the PWC using 1×1 convolutions. A typical convolution both filters the input and combines it into a new set of outputs in a single step, whereas the depthwise separable convolution splits this into two layers, one for filtering and another for combining. This leads to a considerable reduction in model size and computation. Figure 2 demonstrates the framework of MobileNetv2.
As the second version of MobileNet, MobileNetV2 is a highly efficient and lightweight DL method that satisfies the requirements of edge computing systems and mobile devices and excels in resource-constrained environments. MobileNetV2 is intended to balance model size and computational efficiency. Built around the "bottleneck" layer, this model primarily consists of depthwise separable convolutions. This layer significantly decreases computational complexity and model parameters, optimizing accuracy while retaining model capacity. MobileNetV2 is also defined by the efficient approach of "inverted residuals". This paradigm strikes a balance between a linear bottleneck and lightweight expansion, improving the adaptability and efficiency of the model. Furthermore, the model variant used here integrates a "squeeze-and-excitation" module that enhances its capability for capturing crucial features by re-weighting the channel-wise feature responses. The proposed model concludes with an FC layer sized to the number of classes and the 'LogSoftmax' activation function for class prediction. The architecture of MobileNetV2 can accommodate various constraints and applications, which makes it an essential component in DL and CV.
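The parameter savings from factoring a standard convolution into a DWC plus a 1×1 PWC can be checked with simple arithmetic. The layer shape below (3×3 kernel, 32→64 channels) is an arbitrary example, not a layer from the paper's network.

```python
# Parameter counts for one conv layer with a k x k kernel,
# c_in input channels and c_out output channels (biases ignored).
def standard_conv_params(k, c_in, c_out):
    # Each of the c_out filters spans all c_in channels.
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    depthwise = k * k * c_in          # one k x k filter per input channel
    pointwise = 1 * 1 * c_in * c_out  # 1x1 convolution mixes channels
    return depthwise + pointwise

# Example layer: 3x3 kernel, 32 -> 64 channels
std = standard_conv_params(3, 32, 64)        # 9 * 32 * 64 = 18432
sep = depthwise_separable_params(3, 32, 64)  # 288 + 2048 = 2336
reduction = std / sep                        # roughly 8x fewer parameters
```

The same factor applies to multiply-accumulate operations, which is why the factorized layers suit mobile and embedded targets.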
Hyperparameter tuning using DOA mode
In this phase, the hyperparameter selection of MobileNetv2 is performed by the design of the DOA. Metaheuristic techniques derive behaviours from natural processes. DO is a bio-inspired optimization method that employs swarm intelligence (SI) to deal with continuous optimization problems27. DO was introduced with inspiration from the wind-blown behaviour of the dandelion plant. Seeds move in three phases: ascending, descending, and settling at a random position in the landing phase. The DO method characterizes these three phases with mathematical representations and searches for optimum solutions by imitating these behaviours. The mathematical stages of the DO technique are detailed below.
-
1.
Initial population: Generate an initial population randomly.
$$Population=\left[\begin{array}{lll}{D}_{1}^{1}& \dots & {D}_{1}^{Dim}\\ \vdots & \ddots & \vdots \\ {D}_{pop}^{1}& \dots & {D}_{pop}^{Dim}\end{array}\right]$$(1)Here, pop refers to the population size and \(Dim\) the dimension of the variables. Every candidate solution is randomly generated between the lower limit (\({L}_{B}\)) and upper limit (\({U}_{B}\)) of the specified problem. The symbol “rand” denotes a function whose values are randomly drawn from \([\text{0,1}]\). The \({i}^{th}\) individual \({D}_{i}\) can be stated as follows.
$${D}_{i}=rand\times \left({U}_{B}-{L}_{B}\right)+{L}_{B}$$(2) -
2.
Evaluation of fitness values: The fitness value of every individual is computed for the given problem. The individual with the best fitness value is regarded as the elite. The initial best candidate solution is precisely given by:
$${D}_{elite}=D\left(find\left({f}_{best}=f\left({D}_{i}\right)\right)\right)$$(3) -
3.
Ascension stage: In this stage, the positions of individuals are updated using the FF values. Under the impact of parameters like air humidity and wind speed, dandelion seeds rise to various heights. Here, the weather can be separated into two conditions.
Case 1: On a clear day, wind speeds follow a lognormal distribution \(\text{ln }Y\sim N\left(\mu , {\sigma }^{2}\right)\). The new location of the seeds is computed as specified in Eq. (4).
$${D}_{(t+1)}={D}_{t}+\delta \times {v}_{x}\times {v}_{y}\times \text{ ln }Y\times \left({D}_{s}-{D}_{t}\right)$$(4)Here, \({D}_{t}\) denotes the location of the dandelion seed at the \({t}^{th}\) iteration, \({D}_{s}\) denotes a randomly selected location within the search range at the \({t}^{th}\) iteration, and \({v}_{x}\) and \({v}_{y}\) denote the lift component coefficients arising from the separated eddy action on the dandelion. \(\delta\) denotes a coefficient in the range \([\text{0,1}]\) that decreases nonlinearly and approaches zero.
The lognormal distribution assumed in Eq. (4) uses \(\mu =0\) and \({\sigma }^{2}=1\) and is given by Eq. (5):
$$\text{ln }Y=\left\{\begin{array}{l}\frac{1}{y\sqrt{2\pi }}\text{exp}\left[-\frac{1}{2{\sigma }^{2}}{\left(\text{ln }y\right)}^{2}\right] y>0\\ 0 y\le 0\end{array}\right.$$(5)In every iteration, an adaptive parameter \(\gamma\) is employed to control the length of the search over the total iteration count \(T\). In the DO technique, the \(y\) value is drawn uniformly from \([\text{0,1}]\), and \(\gamma\) can be described as:
$$\gamma =rand*\left(\frac{1}{{T}^{2}}{t}^{2}-\frac{2}{T}t+1\right)$$(6)Case 2: On a day marked by rainfall, the ascent of dandelion seeds is hindered by humidity, air resistance, and other factors. Accordingly, the seeds stay near their current position, and this behaviour is determined by Eq. (7):
$${D}_{(t+1)}={D}_{t}\times \left( 1-rand \times p\right)$$(7)Here, \(p\) is a parameter employed to regulate the local search region of the dandelion, computed as specified in Eq. (8). This value is updated at every iteration depending on the maximum number of iterations and the current iteration.
$$p=\left(\frac{{t}^{2}-2t+1}{{T}^{2}-2T+1}+1\right)$$(8)Here, \(T\) denotes the maximal number of iterations, and \(t\) signifies the current iteration.
-
4.
Descent phase: Individuals descend from the height reached in the ascension stage, and their location is updated.
$${D}_{t+1}={D}_{t}-\alpha \times {\beta }_{t}\times \left({D}_{mea{n}_{-}t}-\alpha \times {\beta }_{t}\times {D}_{t}\right)$$(9)Here, \({D}_{mean\_t}\) refers to the mean position of the population at the \({t}^{th}\) iteration, and \({\beta }_{t}\) signifies Brownian motion, a random number drawn from the standard normal distribution.
-
5.
Landing location determination: Seeds settle in a random position determined by weather and wind conditions in their new location. Utilizing the population's evolution, the global best solution is approached as represented in Eq. (10).
$${D}_{t+1}={D}_{elite}+levy\left(\lambda \right)\times \alpha \times \left({D}_{elite}-{D}_{t}\times \sigma \right)$$(10)Here, \({D}_{elite}\) is the optimum position of the dandelion seed at the \({t}^{th}\) iteration. \(levy\left(\lambda \right)\) symbolizes the Levy flight operation and is computed by Eq. (11):
$$levy\left(\lambda \right)=s\times \frac{w\times \sigma }{{\left|t\right|}^{\frac{1}{\beta }}}$$(11)where \(\beta\) is a random number in \([\text{0,2}]\), \(s\) is a constant equal to 0.01, and \(w\) and \(t\) are random numbers drawn from \([\text{0,1}]\). \(\sigma\) is computed as:
$$\sigma =\left(\frac{\Gamma \left(1+\beta \right)\times sin\left(\frac{\pi \beta }{2}\right)}{\Gamma \left(\frac{1+\beta }{2}\right)\times \beta \times {2}^{\left(\frac{\beta -1}{2}\right)}}\right)$$(12) -
6.
Repopulation: A new population is formed from the updated positions.
-
7.
Stopping conditions: Steps 2–7 are repeated until the stopping criterion is met.
-
8.
Best value: The individual with the optimum fitness value is considered the optimum solution.
The DOA method derives an FF to attain better classification accuracy. The FF produces a value that depicts the quality of the candidate solution; here, the reduction of the classifier error rate is assumed as the FF.
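The steps above can be sketched as a minimal DO loop. The code below is a simplified, hedged reading of Eqs. (1)-(12) on a toy sphere objective standing in for the classifier error; the decay schedules, the 50/50 weather split, folding the lift coefficients into the lognormal draw, and taking \(\sigma=1\) in the landing step are all simplifying assumptions, not the paper's exact configuration.

```python
import math
import numpy as np

rng = np.random.default_rng(1)

def fitness(x):
    # Stand-in objective (sphere function); in LCD-DOAEL this would be the
    # classification error of MobileNetV2 under candidate hyperparameters.
    return float(np.sum(x ** 2))

def levy(dim, beta=1.5, s=0.01):
    # Levy flight step, Eqs. (11)-(12)
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)))
    w, t = rng.random(dim), rng.random(dim) + 1e-12
    return s * w * sigma / np.abs(t) ** (1 / beta)

pop, dim, T = 20, 5, 60
LB, UB = -5.0, 5.0
D = rng.random((pop, dim)) * (UB - LB) + LB       # Eq. (2): initialization
fit = np.array([fitness(d) for d in D])
elite = D[fit.argmin()].copy()                    # Eq. (3): initial elite
initial_best = fit.min()

for t in range(1, T + 1):
    delta = (1 - t / T) ** 2                      # simplified decay toward 0
    alpha = rng.random() * (t * t / T ** 2 - 2 * t / T + 1)  # Eq. (6)-style
    for i in range(pop):
        if rng.random() < 0.5:                    # clear day: ascend, Eq. (4)
            lnY = rng.lognormal(0.0, 1.0)         # lift terms folded in
            Ds = rng.random(dim) * (UB - LB) + LB
            D[i] = D[i] + delta * lnY * (Ds - D[i])
        else:                                     # rainy day, Eqs. (7)-(8)
            p = (t * t - 2 * t + 1) / (T * T - 2 * T + 1) + 1
            D[i] = D[i] * (1 - rng.random() * p)
        beta_t = rng.normal(0.0, 1.0, dim)        # Brownian motion
        D_mean = D.mean(axis=0)
        D[i] = D[i] - alpha * beta_t * (D_mean - alpha * beta_t * D[i])  # Eq. (9)
        D[i] = elite + levy(dim) * alpha * (elite - D[i])  # Eq. (10), sigma=1
        D[i] = np.clip(D[i], LB, UB)
    fit = np.array([fitness(d) for d in D])
    if fit.min() < fitness(elite):                # greedy elite update
        elite = D[fit.argmin()].copy()
```

Because the elite is replaced only when a strictly better candidate appears, the best fitness is non-increasing over iterations.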
Ensemble learning
Finally, the classification process utilizes an ensemble of three classifiers: BPNN, regularized ELM, and the BiLSTM model.
BPNN model
BPNN is a computational model developed by simulating the functional mode of biological neurons28. It is generally made up of input, hidden, and output layers, and its operation primarily comprises forward propagation and backpropagation (BP). During forward propagation, each input sample is first fed into the neurons of the input layer, then transferred to the neurons of the following layer through a specific logical correlation, as expressed in Eq. (14), and finally to the neurons of the output layer.
This logical correlation comprises the bias \({b}_{i}\) of the neurons and the weight \({w}_{ij}\) connecting the \({i}^{th}\) neuron of the \({k}^{th}\) layer to the \({j}^{th}\) neuron of the \((k+1)^{th}\) layer. The values obtained at the \((k+1)^{th}\) layer are passed through the activation function before being forwarded to the \((k+2)^{th}\) layer. During BP, the estimated value \(\widehat{y}\) acquired through the above-mentioned logical correlation is compared with the experimental value \(y\), and the loss function is computed. Thereby, the gradients of the loss function are obtained. Finally, the weights and biases are continuously updated by employing the gradient descent technique until the optimum weight vector and bias of every layer are attained.
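A minimal numpy sketch of forward propagation, backpropagation, and gradient-descent weight updates is shown below on a toy XOR task; the layer sizes, learning rate, and mean-squared-error loss are illustrative choices, not the paper's BPNN configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: XOR, a classic task that needs a hidden layer
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer: weights w1, w2 and biases b1, b2
w1 = rng.normal(0, 1, (2, 8)); b1 = np.zeros(8)
w2 = rng.normal(0, 1, (8, 1)); b2 = np.zeros(1)

lr = 0.5
losses = []
for _ in range(2000):
    # Forward propagation: input -> hidden -> output
    h = sigmoid(X @ w1 + b1)
    out = sigmoid(h @ w2 + b2)
    losses.append(float(np.mean((out - y) ** 2)))
    # Backpropagation: gradients of the MSE loss through the sigmoids
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ w2.T) * h * (1 - h)
    # Gradient-descent updates of weights and biases
    w2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    w1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)
```

The recorded loss trace should shrink as the updates repeatedly step against the gradient.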
Regularized ELM model
As an SLFN, ELM features \(M\) training samples29. \(\left\{\left({x}_{j}, {t}_{j}\right),j=1, \cdots , M\right\},{x}_{j}=\{{x}_{1}, {x}_{2}, \cdots , {x}_{m}{\}}^{T},{t}_{j}=\{{t}_{1}, {t}_{2}, \cdots , {t}_{n}{\}}^{T}\), where \({x}_{j}\) and \({t}_{j}\) indicate the input and output vectors of the \({j}^{th}\) sample, correspondingly. The activation function is \(g\left(w, b, x\right)\), and the number of hidden-layer (HL) nodes is \(L\). The architecture of ELM comprises \(n\) output neurons, \(m\) input neurons, and \(L\) hidden neurons:
In Eq. (15), \({\beta }_{i}=[{\beta }_{i1}, {\beta }_{i2}, \cdots , {\beta }_{in}{]}^{T}\) shows the connecting weight vector from the \({i}^{th}\) hidden neuron to the output layer, \({W}_{i}=\{{W}_{i1}, {W}_{i2}, \cdots , {W}_{im}{\}}^{T}\) signifies the connecting weight vector from the input layer to the \({i}^{th}\) hidden neuron, and \({b}_{i}\) represents the bias of the \({i}^{th}\) hidden node; each of them is produced at random.
Substituting Eq. (17) into Eq. (16), the solution is attained by least squares and singular value decomposition:
A regularization coefficient enhances the structural stability of ELM and produces the regularized ELM (RELM):
In Eq. (19), \(I\) shows the unit matrix, and \(C\) indicates the regularization factor.
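The closed-form RELM solution described above can be sketched as follows, with the regularized output weights computed as \(\beta = (I/C + H^{T}H)^{-1}H^{T}T\); the hidden-layer size \(L\), regularization factor \(C\), tanh activation, and toy regression target are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def relm_train(X, T, L=50, C=100.0):
    """Regularized ELM: random hidden layer, closed-form output weights."""
    m = X.shape[1]
    W = rng.normal(0, 1, (m, L))   # random input weights (never trained)
    b = rng.normal(0, 1, L)        # random hidden biases
    H = np.tanh(X @ W + b)         # hidden-layer output matrix
    # beta = (I/C + H^T H)^{-1} H^T T  -- the regularized solution
    beta = np.linalg.solve(np.eye(L) / C + H.T @ H, H.T @ T)
    return W, b, beta

def relm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy regression target: a smooth nonlinear function of two inputs
X = rng.uniform(-1, 1, (200, 2))
T = np.sin(X[:, :1] * 3) + X[:, 1:2] ** 2
W, b, beta = relm_train(X, T)
pred = relm_predict(X, W, b, beta)
mse = float(np.mean((pred - T) ** 2))
```

Only `beta` is solved for; the random hidden layer is fixed, which is what makes ELM training a single linear solve rather than an iterative procedure.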
BiLSTM model
The BiLSTM plays an important role where each element in the input signal fuses the corresponding information from the past and the future30, producing better output in such cases. Consider the linear model \(fd \left(si, vx\right)={\sum }_{bc=1}^{hd}s{i}_{bc}v{x}_{bc}\), where \(hd\) denotes the input size and \(si\), \(vx\), and \(fd(\cdot )\) indicate the input, the weights, and the network output, respectively. The BiLSTM model has two LSTM layers. One LSTM layer is trained with the input series in the forward direction. The input series is given in reverse order to train the additional LSTM layer in the backward direction. This input sequence contains the real and imaginary parts of the training data. The LSTM is used to resolve the gradient problems that arise in RNNs on long data series.
Here, \({R}_{vr},{ R}_{rv},{ R}_{ti}, \text{and } {R}_{ho}\) are the weight matrices on the input state. The weight matrices from the prior short-term state \(g{d}_{mn-1}\) are represented as \({V}_{vr},{ V}_{rv},{ V}_{ti}, \text{and } {V}_{ho}\). The variables \(h{a}_{vr}, h{a}_{rv}, h{a}_{ti}, \text{and } h{a}_{ho}\) represent the biases. The long-term state \(c{d}_{mn}\) is represented as follows
Lastly, the output \({f}_{mn}\) is formulated by:
where \(c{d}_{mn-1}\) is a variable denoting the prior long-term state.
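The paper does not state how the three classifiers' outputs are fused, so the sketch below assumes a simple soft-voting rule that averages class probabilities across BiLSTM, RELM, and BPNN; the probability values are made up for illustration.

```python
import numpy as np

def soft_vote(prob_list, weights=None):
    """Average class probabilities from several classifiers and
    return the winning class per sample plus the averaged matrix."""
    probs = np.stack(prob_list)        # (n_models, n_samples, n_classes)
    avg = np.average(probs, axis=0, weights=weights)
    return avg.argmax(axis=1), avg

# Illustrative outputs of BiLSTM, RELM and BPNN for 3 samples, 4 classes
p_bilstm = np.array([[0.7, 0.1, 0.1, 0.1],
                     [0.2, 0.5, 0.2, 0.1],
                     [0.1, 0.1, 0.2, 0.6]])
p_relm   = np.array([[0.6, 0.2, 0.1, 0.1],
                     [0.1, 0.6, 0.2, 0.1],
                     [0.2, 0.1, 0.1, 0.6]])
p_bpnn   = np.array([[0.8, 0.1, 0.05, 0.05],
                     [0.3, 0.4, 0.2, 0.1],
                     [0.1, 0.2, 0.1, 0.6]])
labels, avg = soft_vote([p_bilstm, p_relm, p_bpnn])
```

Passing `weights` would implement a weighted ensemble, e.g. favouring the classifier with the best validation accuracy.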
Experimental validation
The experimental validation inspects the LC detection outcomes of the LCD-DOAEL technique on the throat image dataset31, containing 1320 samples with four classes, as illustrated in Table 2. Figure 3 shows sample images. The proposed technique is simulated using Python 3.6.5 on a PC with an i5-8600K CPU, GeForce 1050Ti 4 GB GPU, 16 GB RAM, 250 GB SSD, and 1 TB HDD. The parameter settings are as follows: learning rate 0.01, ReLU activation, 50 epochs, dropout 0.5, and batch size 5.
The confusion matrices formed by the LCD-DOAEL method on 80%TRAPH/20%TESPH and 70%TRAPH/30%TESPH are demonstrated in Fig. 4. The experimental value inferred the effective detection and classification of the above four class labels. The confusion matrices illustrate the classification performance across diverse phases and classes. In the training phase (80%), classes He, Hbv, IPCL, and Le portray varying levels of correct and misclassified instances, reflecting the model's performance during training. The testing phase (20% and 30%) further evaluates the model's ability to generalize to new data, with similar patterns observed across the predicted classes. Overall, the matrices depict the distribution of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN), providing insights into the classifier's accuracy and errors across diverse datasets and phases of evaluation.
The LC detection results of the LCD-DOAEL technique with 80%TRAPH/20%TESPH are reported in Table 3 and Fig. 5. The experimental outcome demonstrates the proficient recognition outcomes of the LCD-DOAEL technique under four classes. With 80%TRAPH, the LCD-DOAEL method offers an average \(acc{u}_{y}\) of 96.21%, \(pre{c}_{n}\) of 92.53%, \(rec{a}_{l}\) of 92.42%, \({F}_{score}\) of 92.40%, and \(AU{C}_{score}\) of 94.97%. Additionally, with 20%TESPH, the LCD-DOAEL method provides an average \(acc{u}_{y}\) of 97.54%, \(pre{c}_{n}\) of 95.03%, \(rec{a}_{l}\) of 95.03%, \({F}_{score}\) of 94.91%, and \(AU{C}_{score}\) of 96.70%.
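Metrics of this kind can be reproduced from a confusion matrix as follows. The matrix values below are illustrative, not the paper's actual results, and the per-class (one-vs-rest) accuracies are macro-averaged, which is one common convention.

```python
import numpy as np

def per_class_metrics(cm):
    """Macro-averaged accuracy, precision, recall and F-score from a
    multi-class confusion matrix (rows = true class, cols = predicted)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp          # predicted as class c but wrong
    fn = cm.sum(axis=1) - tp          # class c missed
    tn = cm.sum() - tp - fp - fn
    acc = (tp + tn) / cm.sum()        # per-class one-vs-rest accuracy
    prec = tp / (tp + fp)
    rec = tp / (tp + fn)
    f1 = 2 * prec * rec / (prec + rec)
    return acc.mean(), prec.mean(), rec.mean(), f1.mean()

# Illustrative 4-class confusion matrix (He, Hbv, IPCL, Le)
cm = [[60,  2,  2,  2],
      [ 3, 58,  2,  3],
      [ 1,  2, 61,  2],
      [ 2,  1,  2, 61]]
acc, prec, rec, f1 = per_class_metrics(cm)
```

This one-vs-rest accuracy convention explains why the reported average accuracy can exceed the average precision and recall on the same predictions.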
The \(acc{u}_{y}\) curves for training (TRA) and validation (VAL) given in Fig. 6 for the LCD-DOAEL method with 80%TRAPH/20%TESPH provide meaningful insight into its performance over different epochs. There is a continuous refinement in TRA \(acc{u}_{y}\) and TES \(acc{u}_{y}\) as epochs increase, which indicates the model's proficiency in recognizing and learning patterns from both datasets. The increasing trend in TES \(acc{u}_{y}\) underscores the model's adaptability to the TRA data and its proficiency in making accurate predictions on unseen data, highlighting strong generalizability.
Figure 7 provides a detailed review of TRA and TES loss values for the LCD-DOAEL technique with 80%TRAPH/20%TESPH over various epochs. The TRA loss continuously minimizes as the model improves its weights to lessen the classifier error rate on both datasets. The loss curve demonstrates the model's alignment with the TRA data, highlighting its ability to effectively capture patterns in both datasets. The consistent improvement of parameters in the LCD-DOAEL technique is intended to reduce inconsistencies between predictions and actual TRA labels.
The findings affirm that the LCD-DOAEL model with 80%TRAPH/20%TESPH constantly obtains high PR values over all the classes regarding the PR curve given in Fig. 8. These outcomes emphasize the model's ability to discriminate between various classes, highlighting its effectiveness in accurately detecting classes.
Further, ROC curves produced by the LCD-DOAEL method with 80%TRAPH/20%TESPH are presented in Fig. 9, illustrating its ability to distinguish between class labels. This curve offers valuable insights into how TPR and FPR tradeoffs differ over diverse classifier epochs and thresholds. The outcomes emphasize the model's accurate classification outcomes on different classes, highlighting its efficiency in addressing various classification problems.
The LC detection outcomes of the LCD-DOAEL method with 70%TRAPH/30%TESPH are shown in Table 4 and Fig. 10. The outcome illustrates the proficient detection outcome of the LCD-DOAEL method on four different classes. With 70%TRAPH, the LCD-DOAEL method provides average \(acc{u}_{y}\) of 96.16%, \(pre{c}_{n}\) of 92.36%, \(rec{a}_{l}\) of 92.35%, \({F}_{score}\) of 92.33%, and \(AU{C}_{score}\) of 94.90%. Furthermore, with 30%TESPH, the LCD-DOAEL method provides an average \(acc{u}_{y}\) of 95.96%, \(pre{c}_{n}\) of 91.99%, \(rec{a}_{l}\) of 91.86%, \({F}_{score}\) of 91.91%, and \(AU{C}_{score}\) of 94.58%.
The \(acc{u}_{y}\) curves for TRA and VAL given in Fig. 11 for the LCD-DOAEL method with 70%TRAPH/30%TESPH offer meaningful insight into its performance over different epochs. Notably, there is a continuing improvement in TRA \(acc{u}_{y}\) and TES \(acc{u}_{y}\) as epochs increase, which indicates the model's proficiency in learning and recognizing patterns from both datasets. The increasing trend in TES \(acc{u}_{y}\) highlights the model's adaptability to the TRA dataset and its ability to make correct predictions on unseen data, highlighting strong generalizability.
Figure 12 shows a comprehensive review of the TR and TS loss values for the LCD-DOAEL method with 70%TRAPH/30%TESPH over different epochs. The TR loss continuously lessens as the model refines its weights to lessen the classifier error rate on both datasets. The loss curve demonstrates the model's alignment with the TR data, emphasising its proficiency in effectively capturing patterns in two datasets. Noteworthy is the continuous improvement of the parameter in the LCD-DOAEL technique, aimed at reducing inconsistencies between predictions and actual TR labels.
The outcomes affirm that the LCD-DOAEL method with 70%TRAPH/30%TESPH constantly obtains maximum PR values across the class labels regarding the PR curve presented in Fig. 13. These outcomes emphasize the model's effective capability to discriminate between various classes, highlighting its efficiency in correctly detecting classes.
Further, Fig. 14 shows ROC curves produced by the LCD-DOAEL technique with 70%TRAPH/30%TESPH, which demonstrates its ability to distinguish between classes. These curves offer valuable insights into how the tradeoff between TPR and FPR differs over diverse thresholds and classification epochs. The outcomes highlight the model's accurate classification outcomes on different classes, underscoring its efficiency in addressing different classification problems.
Table 5 and Fig. 15 highlight the comparative analysis of the LCD-DOAEL technique with recent models14. The results indicate that the DCNN, VGG-19, VGG-16, and AlexNet techniques fail to reach good results. Simultaneously, the Xception and ResNet approaches attain moderate performance. Meanwhile, the LCDC-AOADL technique gains near-optimal performance. However, the LCD-DOAEL technique outperforms the other models with an enhanced \(acc{u}_{y}\) of 97.54%, \(pre{c}_{n}\) of 95.03%, \(rec{a}_{l}\) of 95.03%, and \({F}_{score}\) of 94.91%.
Table 6 and Fig. 16 present a comparative computational time (CT) analysis of the LCD-DOAEL method against existing approaches. The experimental outcome shows that the DCNN, VGG-19, VGG-16, and AlexNet approaches perform poorly. Simultaneously, the Xception and ResNet techniques obtain moderate outcomes, while the LCDC-AOADL technique attains near-optimal performance. However, the LCD-DOAEL method performs more efficiently than the other techniques, with the lowest CT of 0.53 s.
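A per-model CT comparison of this kind is typically obtained by averaging wall-clock time over repeated forward passes after a warm-up call. The sketch below illustrates the measurement pattern only; `dummy_model` is a hypothetical stand-in for a real classifier, not part of the paper's method.

```python
import time

def mean_inference_time(model_fn, batch, repeats=10):
    """Average wall-clock time of one call to model_fn, after a warm-up run.

    The warm-up call absorbs one-off costs (caching, lazy initialization)
    so they do not inflate the measured average.
    """
    model_fn(batch)                      # warm-up, result discarded
    t0 = time.perf_counter()
    for _ in range(repeats):
        model_fn(batch)
    return (time.perf_counter() - t0) / repeats

def dummy_model(batch):
    """Hypothetical stand-in: any callable taking a batch works here."""
    return [sum(row) for row in batch]

batch = [[0.1] * 224 for _ in range(32)]   # toy batch shaped like image rows
ct = mean_inference_time(dummy_model, batch)
print(f"mean CT: {ct * 1000:.3f} ms")
```

`time.perf_counter` is preferred over `time.time` for such measurements because it is monotonic and has the highest available resolution.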
Thus, the LCD-DOAEL technique can enhance cancer detection in throat region images.
Conclusion and future work
This article introduces an automated LCD-DOAEL method for biomedical throat region images. The LCD-DOAEL method aims to investigate images of the throat region for the presence of laryngeal cancer. To accomplish this, it comprises several sub-processes: GF-based preprocessing, MobileNetV2-based feature extraction, DOA-based hyperparameter tuning, and an ensemble learning process. Initially, the GF approach is applied to eliminate noise in the biomedical images. The complex and intrinsic feature patterns are then extracted by the MobileNetV2 model, while the DOA model carries out the hyperparameter selection of the MobileNetV2 architecture. Finally, an ensemble of three classifiers, namely BPNN, regularized ELM, and BiLSTM, performs the classification. A comprehensive set of simulations conducted on the biomedical image dataset highlights the effective performance of the LCD-DOAEL method, which exhibited a superior accuracy of 97.54% over other existing techniques. The LCD-DOAEL method nevertheless faces several potential limitations and areas for future improvement. First, the efficiency of GF may vary with the complexity and variability of noise in biomedical images, justifying robustness analysis across several datasets. Second, while MobileNetV2 is effective, its application to highly detailed or varied biomedical images may require additional adaptation or augmentation. Third, optimizing hyperparameters with DOA improves performance but may need continual refinement as datasets evolve. Future research could explore hybrid or alternative filtering models, incorporate more advanced neural network architectures, and enhance ensemble training techniques to attain even higher classification accuracy and generalization in biomedical image analysis.
Moreover, examining methods to reduce bias and discrepancy trade-offs among the ensemble classifiers could further optimize performance in challenging diagnostic scenarios.
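The paper does not spell out the fusion rule used to combine the three classifiers; as one common and minimal possibility, simple majority voting over per-sample labels can be sketched as follows (assumed purely for illustration, not confirmed as the authors' scheme):

```python
from collections import Counter

def majority_vote(pred_lists):
    """Fuse per-sample predictions from several classifiers by majority vote.

    pred_lists: one prediction list per classifier (e.g. BPNN, RELM, BiLSTM).
    On a tie, Counter.most_common keeps insertion order, so the label seen
    first (i.e. from the earliest-listed classifier) wins.
    """
    fused = []
    for sample_preds in zip(*pred_lists):
        counts = Counter(sample_preds)
        fused.append(counts.most_common(1)[0][0])
    return fused

# Illustrative per-classifier predictions for four samples.
bpnn   = ["cancer",  "healthy", "cancer", "healthy"]
relm   = ["cancer",  "cancer",  "cancer", "healthy"]
bilstm = ["healthy", "cancer",  "cancer", "healthy"]
print(majority_vote([bpnn, relm, bilstm]))
# → ['cancer', 'cancer', 'cancer', 'healthy']
```

Weighted voting or averaging of class probabilities are natural alternatives when the base classifiers output calibrated scores rather than hard labels.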
Data availability
The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.
References
Young, G. O. Synthetic structure of industrial plastics. In Plastics, 2nd ed Vol. 3 (ed. Peters, J.) 15–64 (McGraw-Hill, 1964).
Bur, A. M. et al. Interpretable computer vision to detect and classify structural laryngeal lesions in digital flexible laryngoscopic images. Otolaryngol.-Head Neck Surg. 169, 1564–1572 (2023).
Raoof, S. S., Jabbar, M. A. & Fathima, S. A. Lung cancer prediction using machine learning: A comprehensive approach. In Proc. 2nd Int. Conf. Innov. Mech. Ind. Appl. (ICIMIA), 108–115 (2020).
Raoof, S. S., Jabbar, M. A. & Fathima, S. A. Lung cancer prediction using feature selection and recurrent residual convolutional neural network (RRCNN). In Machine Learning Methods for Signal, Image and Speech Processing, 23–46 (River Publishers, 2022).
Jabbar, M. A. Breast cancer data classification using ensemble machine learning. Eng. Appl. Sci. Res. 48(1), 65–72 (2021).
Wellenstein, D. J., Woodburn, J., Marres, H. A. M. & van den Broek, G. B. Detection of laryngeal carcinoma during endoscopy using artificial intelligence. Head Neck 45(9), 2217–2226 (2023).
Huang, P. et al. A ViT-AMC network with adaptive model fusion and multiobjective optimization for interpretable laryngeal tumor grading from histopathological images. IEEE Trans. Med. Imag. 42(1), 15–28 (2023).
Bhattacharya, D. et al. Learning robust representation for laryngeal cancer classification in vocal folds from narrow-band images. In Med. Imag. Deep Learn. (2022).
Meyer-Veit, F., Rayyes, R., Gerstner, A. O. H. & Steil, J. Hyperspectral wavelength analysis with U-Net for larynx cancer detection. In Proc. Eur. Symp. Artif. Neural Netw. (ESANN), Comput. Intell. Mach. Learn., Bruges, Belgium (2022).
Timurzieva, A., Kotov, V., Popadyuk, V. & Ganshin, I. Rapid diagnosis of laryngeal cancer using Raman fluorescence spectroscopy. J. Clin. Physiol. Pathol. 1(1), 21–27 (2022).
Gharehchopogh, F. S., Ghafouri, S., Namazi, M. & Arasteh, B. Advances in manta ray foraging optimization: A comprehensive survey. J. Bionic Eng. 21(2), 953–990 (2024).
Sharma, S., Khodadadi, N., Saha, A. K., Gharehchopogh, F. S. & Mirjalili, S. Non-dominated sorting advanced butterfly optimization algorithm for multi-objective problems. J. Bionic Eng. 20, 819–843 (2022).
Khodadadi, N., Soleimanian Gharehchopogh, F. & Mirjalili, S. MOAVOA: A new multi-objective artificial vultures optimization algorithm. Neural Comput. Appl. 34(23), 20791–20829 (2022).
Alrowais, F. et al. Laryngeal cancer detection and classification using aquila optimization algorithm with deep learning on throat region images. IEEE Access. 11, 115306–115315 (2023).
Bhattacharjee, R., Devi, K. S. & Vijaykanth, S. Detecting laryngeal cancer lesions from endoscopy images using deep ensemble model. In 2023 International Conference on Signal Processing, Computation, Electronics, Power and Telecommunication (IConSCEPT), 1–6 (IEEE, 2023).
Kwon, I. et al. Diagnosis of early glottic cancer using laryngeal image and voice based on ensemble learning of convolutional neural network classifiers. J. Voice https://doi.org/10.1016/j.jvoice.2022.07.007 (2022).
Sahoo, P. K., Mishra, S., Panigrahi, R., Bhoi, A. K. & Barsocchi, P. An improvised deep-learning-based mask R-CNN model for laryngeal cancer detection using CT images. Sensors 22(22), 8834 (2022).
Joseph, J. S., Vidyarthi, A. & Singh, V. P. An improved approach for initial stage detection of laryngeal cancer using effective hybrid features and ensemble learning method. Multimed. Tools Appl. 1–23 (2023).
Ding, H., Cen, Q., Si, X., Pan, Z. & Chen, X. Automatic glottis segmentation for laryngeal endoscopic images based on U-Net. Biomed. Signal Process. Control 71, 103116 (2022).
Pan, X., Ma, M., Bai, W. & Zhang, S. PISDGAN: Perceive image structure and details for laryngeal image enhancement. Biomed. Signal Process. Control 80, 104307 (2023).
Pan, X., Bai, W., Ma, M. & Zhang, S. RANT: A cascade reverse attention segmentation framework with a hybrid transformer for laryngeal endoscope images. Biomed. Signal Process. Control 78, 103890 (2022).
Gharehchopogh, F. S., Nadimi-Shahraki, M. H., Barshandeh, S., Abdollahzadeh, B. & Zamani, H. Cqffa: A chaotic quasi-oppositional farmland fertility algorithm for solving engineering optimization problems. J. Bionic Eng. 20(1), 158–183 (2023).
Gharehchopogh, F. S. & Ibrikci, T. An improved African vultures optimization algorithm using different fitness functions for multi-level thresholding image segmentation. Multimed. Tools Appl. 83(6), 16929–16975 (2024).
Gharehchopogh, F. S. & Khargoush, A. A. A chaotic-based interactive autodidactic school algorithm for data clustering problems and its application on COVID-19 disease detection. Symmetry 15(4), 894 (2023).
Abuya, T. K., Rimiru, R. M. & Okeyo, G. O. An image denoising technique using wavelet-anisotropic Gaussian filter-based denoising convolutional neural network for CT images. Appl. Sci. 13(21), 12069 (2023).
Hossen, M. M. et al. A reliable and robust deep learning model for effective recyclable waste classification. IEEE Access. 12, 13809–13821 (2024).
Zhang, B. et al. Dynamic community detection method of a social network based on node embedding representation. Mathematics 10(24), 4738 (2022).
Zheng, P., Wang, L., Ji, Y., Zeng, Y. & Chen, X. Backpropagation neural network modeling for a pulse tube refrigerator with passive displacer. Appl. Therm. Eng. 211, 118464 (2022).
Li, J., Zhang, X., Yao, Y., Qi, Y. & Peng, L. Regularized extreme learning machine based on remora optimization algorithm for printed matter illumination correction. IEEE Access 12, 3718–3735 (2024).
Kondepogu, V. & Bhattacharyya, B. Hybrid AE and Bi-LSTM-aided sparse multipath channel estimation in OFDM systems. IEEE Access 12, 7952–7965 (2024).
Acknowledgments
The authors extend their appreciation to the Deanship of Research and Graduate Studies at King Khalid University for funding this work through the Large Research Project under grant number RGP2/13/45; to Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia, through Researchers Supporting Project number (PNURSP2024R716); and to King Saud University, Riyadh, Saudi Arabia, through Researchers Supporting Project number (RSPD2024R787). The authors also extend their appreciation to the Deanship of Scientific Research at Northern Border University, Arar, KSA, for funding this research work through project number “NBU-FFR-2024-451-09”. This study is partially funded by the Future University in Egypt (FUE).
Author information
Contributions
Conceptualization: Sarah A. Alzakari. Data curation and formal analysis: Mashael Maashi, Abeer A. K. Alharbi. Investigation and methodology: Saad Alahmari, Ahmed Sayed. Project administration, resources, and supervision: Saad Alahmari, Abeer A. K. Alharbi. Validation and visualization: Munya A. Arasi, Ahmed Sayed. Writing—original draft: Saad Alahmari. Writing—review and editing: Saad Alahmari. All authors have read and agreed to the published version of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
Cite this article
Alzakari, S.A., Maashi, M., Alahmari, S. et al. Towards laryngeal cancer diagnosis using Dandelion Optimizer Algorithm with ensemble learning on biomedical throat region images. Sci Rep 14, 19713 (2024). https://doi.org/10.1038/s41598-024-70525-0