Introduction

Lung cancer, most commonly adenocarcinoma, is one of the most lethal forms of cancer, and its mortality rate is very high. Only about 45 in 100 people (45 percent) survive lung cancer for one year or more1,2,3. Furthermore, only about 20 in 100 people (20 percent) survive for five years or more, and only about 10 in 100 people (10 percent) survive for ten years or more2,3. Scientists around the world have been working for decades to find a cure or vaccine for this disease, but progress has been slow; the most effective option remains catching the disease at a nascent stage, because when it is detected late it is difficult to treat and the survival rate is quite low4,5. This study aims to identify an effective method for detecting lung cancer in CT scan images using convolutional neural networks and other deep learning models6,7,8. We do this with the help of several CNN model frameworks, including a baseline CNN, VGG16, InceptionV3, ResNet101V2, MobileNet, Xception, and DenseNet, and we aim to detect potentially fatal lung nodules as early as possible by estimating their probability9,10,11. Medical practitioners perform various diagnostic procedures, including clinical assessments, CT scan analysis, positron emission tomography (PET), and needle biopsy. However, such invasive methods carry considerable risk and cause patients significant anxiety. CT imaging is regarded as the most suitable technique for detecting lung cancer, and low-dose CT is preferred because it exposes the patient to less radiation than standard-dose CT12. Results have also shown that cancer-related deaths were lower among people screened with low-dose CT than among those screened with chest radiographs. In this approach, images are divided into thinner slices, enabling better detail, and scans with a slice thickness greater than 2.5 mm were discarded. This left a total of 888 CT scans, with 36,378 annotations by radiologists13. Only annotations of nodules greater than or equal to 3 mm were considered relevant. Nodules annotated by different readers that lay closer together than the sum of their radii were merged, and the merged annotations were averaged.

Lung cancer is one of the most difficult challenges in the field of oncology and a significant contributor to global deaths. Early and precise recognition of lung cancer is essential for effective treatment and better outcomes. Among the various diagnostic tools available, computed tomography (CT) has become a cornerstone of lung cancer detection and treatment. Nonetheless, manually interpreting CT images remains challenging and susceptible to variability, highlighting the necessity for a reliable, automated, and more efficient approach to the initial detection of lung cancer.

In the past few years, advancements in deep learning have fundamentally transformed medical image analysis, yielding promising outcomes for enhanced diagnostic precision. This study aims to use the capabilities of deep learning, combined with optimization and fuzzy image enhancement strategies, to create an efficient method for early detection using CT images9. Lung cancer detection comes with challenges, including the subtle and heterogeneous nature of lesions, anatomical variation between patients, and the presence of noise and artifacts in CT scans. Traditional techniques often rely on manual interpretation by radiologists, which is time-consuming and prone to error. In addition, the vast volume of imaging data generated in clinical settings requires efficient and scalable solutions that meet diagnostic requirements4,13. Convolutional Neural Networks (CNNs) are adept at extracting layered features from unprocessed pixel data, rendering them highly effective for image analysis applications. By training CNNs on annotated CT scan images, we can take advantage of their ability to automatically extract discriminative features that are indicative of lung cancer. However, indiscriminate application of CNNs to entire CT volumes can lead to suboptimal results due to the presence of non-cancerous structures. To address this problem, image segmentation techniques can be used to delineate regions of interest in the lung, effectively guiding the CNN's focus toward suspicious lesions1,10. The segmentation process involves semantically dividing the CT images into significant regions such as the lung parenchyma, blood vessels, and lesions. Depending on the complexity of this task, different segmentation algorithms can be used, including thresholding, region growing, and deep learning-based methods. Once regions of interest are identified, CNNs are applied for feature extraction and classification of potentially cancerous lesions. Recent advancements in transformer-based models have significantly improved medical image segmentation. For instance,14 employs a mixed-transformer with semantic segmentation and triplet pre-processing to fuse MRI and PET data for early multi-class Alzheimer's diagnosis. Similarly, XAI-RACapsNet15 integrates an explainable capsule network with O-net ROI segmentation to enhance breast cancer detection in mammography images. Additionally,16 leverages a deep dual patch attention mechanism and adversarial learning for accurate epileptic seizure prediction. These studies highlight the growing impact of transformer-based and attention-driven models in achieving more accurate, interpretable, and disease-specific segmentation results. This synergistic approach not only increases the sensitivity and specificity of lung cancer detection, but also reduces false positives and improves computational efficiency3,17. In this paper, we present a comprehensive framework for deep learning-based cancer detection that includes both theoretical foundations and practical implementation. We begin by reviewing existing methodologies and challenges in lung cancer detection, laying the groundwork for our proposed approach. Subsequently, we discuss the theoretical foundations of deep learning and clarify the principles of CNNs and their application in medical image analysis2,18.
We then describe our methodology for integrating segmentation and CNNs for lung cancer detection, considering technical nuances and design considerations. In addition, we present results demonstrating the efficiency and robustness of our approach, validated on a diverse CT image dataset. Through this research, we aim to advance the state of the art in lung cancer detection and offer a scalable and reliable solution for clinical practice. By leveraging the synergy between deep learning and image segmentation6, our objective is to provide healthcare professionals with tools that facilitate early diagnosis and personalized treatment for each patient, ultimately improving patient health.

In this study, we propose an innovative deep learning framework. Our ensemble learning approach involves evaluating different pre-trained CNN models for their feature extraction capabilities, along with employing optimization techniques to calculate optimal weights for classifying lung cancer. The main contributions include the following.

  • Six pre-trained convolutional neural network (CNN) architectures are modified and fine-tuned through transfer learning techniques, utilizing a publicly available lung cancer dataset. These CNN variants, which feature different architectural components, are evaluated as base classifiers.

  • The top three CNN models in terms of performance are selected to develop an ensemble model using a weighted averaging approach, ensuring robustness, stability, and improved classification performance. The combination of multiple CNN models helped reduce individual model errors.

  • A weighted ensemble learning approach is introduced to enhance the performance of the selected CNN models.

  • Various optimization algorithms, including the Flower Pollination Algorithm (FPA), Artificial Bee Colony (ABC), Particle Swarm Optimization (PSO), Bayesian Optimization (BO), and Ant Colony Optimization (ACO), are evaluated for optimizing the ensemble weights. Based on the performance results, FPA is selected for weight optimization, leading to the development of the proposed FPA-based weighted ensemble model.

  • An experimental comparison of various CNN models with the ensemble model has been conducted to assess the effectiveness of the proposed methodology. The proposed model exhibited a notable enhancement over the top-performing individual CNN model, affirming its efficacy.

  • The proposed model demonstrates improved overall performance, an optimal balance between precision and recall, and enhanced generalization capability for classifying lung cancer.

The rest of the paper is organized as follows. The 'Related work' section reviews the literature. 'Materials and methods' covers the methodology, including the dataset, the CNN models used, the ensemble approach, and the optimization technique (FPA) used to find the optimal weights for the ensemble. 'Proposed system design' describes the proposed methodology. The 'Experimental results and discussion' section presents the platform and evaluation metrics used to conduct and assess the experiments, along with the results and discussion. The final section concludes the paper.

Related work

Lung cancer (adenocarcinoma) was first properly described by physicians in the mid-19th century, yet it remained quite rare into the early decades of the twentieth century. With industrialization, air pollution increased, and with it the incidence of lung cancer; in addition, extensive smoking carries a high risk of lung cancer19,20. The first computer-aided detection (CAD) systems for lung nodules were developed in the late 1980s, but they were not very effective in detecting lung cancer. Later, new technologies such as graphics processing units and convolutional neural networks were combined for the detection of lung cancer21. Tan et al.22 present a review of techniques for semantic medical image segmentation. Many prominent scientists have worked, and are still working, to find a cure and vaccine for lung cancer and to detect it at a nascent stage, so that it can be caught early, prevented from spreading further, and the patient's life saved. Setio et al.23 proposed a fully 3D convolutional neural network classifier to reduce false positives in lung nodule classification; 3D classification exploits the full CT volume and reduces the probability of a wrong decision. Xu et al.24 proposed an image fusion technique using a dual-gain video stream. Ding and Liao et al. used a 3D Faster R-CNN to decrease false positives in nodule detection, speeding up nodule classification, together with a dual-path network to prevent wrong analysis of the lung nodule. Hammad et al.25 proposed a myocardial infarction detection model. Jiang Hongyang is also one of the prominent researchers who have worked on early lung cancer detection26. He built a group-based pulmonary nodule detection approach that uses multiple techniques together with the Frangi filter to improve performance19; two sets of images are combined and a four-channel 3D CNN learns the features annotated by the radiologists. These are just a few examples; many more scientists, radiologists, and physicians have worked day and night to find a suitable and effective way to properly detect lung cancer and save the thousands of lives lost to it. Sedik et al.27 proposed a model for the detection of coronavirus. Chen et al.28 proposed a model for emotion detection. In recent years, there has been an increase in research at the intersection of deep learning and medical imaging aimed at finding efficient ways to detect adenocarcinoma early and to diagnose and treat lung cancer efficiently and more cost-effectively. Shell scripting29 played a major role in allowing us to run multiple models in parallel. Gao et al.30 studied the effect of threat detection on image segmentation. Hammad et al.31 proposed a model for arrhythmia detection using deep learning.

Several studies have explored deep learning approaches for lung cancer classification using CT scan images. Venkatesh et al.32 used a combination of K-Means clustering and CNN on public and private datasets, achieving an impressive accuracy of 99.967% with an MSE33 of 0.031. Rehan Raza et al.34 utilized EfficientNetB1-B4 with transfer learning on the IQ-OTH/NCCD dataset, reporting an accuracy of 99.01% and an AUC of 0.99. Similarly, Sachikanta Dash et al.35 applied EfficientNet with an autoencoder on the same dataset, achieving an accuracy of 98.98% and an AUC of 0.9872. Murthy et al.36 introduced a fuzzy-based Efficient Residual Network for classification using the LIDC-IDRI dataset, achieving 93.2% accuracy, a 94.8% true positive rate, and 92.6% precision. Sultana et al.37 experimented with MobileNetV2, VGG19, and ResNet50 on chest CT and PET-CT images, reaching an accuracy of 98.67%. Nahed Tawfik et al.38 combined the CLAHE algorithm with Xception and EfficientNet on a public CT dataset, achieving a specificity of 99.68% and an accuracy of 99.03%. Mohammad Q. Shatnawi et al.39 tested ConvNeXt, VGG16, and ResNet50 on a public CT dataset, obtaining 99% accuracy and 99.2% precision. Tolgahan Gulsoy et al.40 proposed FocalNeXt on the IQ-OTH/NCCD dataset, achieving high sensitivity (99.78%), recall (99.36%), F1 score (99.56%), and an overall accuracy of 99.81%. Flyckt et al.41 applied Dynamic Ensemble Selection, achieving an AUC of \(0.77\pm 0.01\) using standard blood tests and patient history data and incorporating multimodal analysis for improved prediction accuracy. Zia et al.42 proposed a Dual Attention CNN that improved performance through channel and spatial attention mechanisms, particularly excelling at identifying small nodules with a 92% detection rate for early-stage cancers. Shah et al.43 introduced a Deep Ensemble 2D CNN approach for lung nodule detection, utilizing multiple CNNs and the LUNA16 dataset to achieve enhanced accuracy in cancer screening; the proposed architecture combines three distinct CNNs with varying configurations, demonstrating 95% accuracy and outperforming baseline methods.

Although these studies demonstrate the effectiveness of various deep learning models, there is a need to improve classification performance by utilizing ensemble techniques. The proposed research aims to introduce a weighted average ensemble44 based on the flower pollination algorithm (FPA) to improve classification accuracy. By optimally integrating multiple models, the ensemble approach seeks to achieve better generalization and robustness in the classification of lung cancer. Table 1 summarizes recent work on the detection of lung cancer.

Table 1 Comparison of various approaches for lung cancer detection.

Materials and methods

Description of dataset

We use the chest CT-scan image dataset45, which is publicly available for research. The dataset consists of CT scan images classified according to the presence and type of cancer: normal lungs without cancer, adenocarcinoma, large cell carcinoma, and squamous cell carcinoma. Adenocarcinoma is the most common form of lung cancer, accounting for about 30% of all cases. Large cell carcinoma represents approximately 10–15% of non-small cell lung cancers. Squamous cell carcinoma, closely linked to smoking, accounts for approximately 30% of non-small cell lung cancers. In contrast, normal CT scans serve as a baseline for healthy lung tissue, helping to distinguish between malignant and non-malignant cases in diagnostic imaging. The dataset includes 1000 CT scans in .jpg and .png formats, covering three cancer types: adenocarcinoma (338 scans), large cell carcinoma (187 scans), and squamous cell carcinoma (260 scans), along with 215 scans of normal lungs. Each input image is resized to \(224\times 224\) pixels, and its pixel intensities are normalized to the range 0 to 1 by dividing each pixel value by 255.0, reducing computational complexity and ensuring consistency in data processing. As part of data pre-processing, the training images undergo augmentation using the ImageDataGenerator to enhance model generalization. The augmentation includes rotations of up to 10 degrees, width and height shifts of up to 20%, shearing, zooming, and horizontal flipping, ensuring variability in the dataset. These transformations help prevent overfitting by exposing the model to diverse variations in the input data. However, no augmentation is applied to the validation and test images; they are only rescaled, to maintain consistency during evaluation. This approach ensures that the model learns robust features while being fairly evaluated on unseen data. Figure 1 shows the distribution of data for training, validation, and testing, and Fig. 2 shows sample images of the four categories.
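For illustration, a minimal Keras-style sketch of this preprocessing and augmentation pipeline is given below; the directory paths and the shear/zoom magnitudes are assumptions, since the text does not specify them.

```python
# Sketch of the preprocessing/augmentation described above (assumed settings noted).
from tensorflow.keras.preprocessing.image import ImageDataGenerator

IMG_SIZE = (224, 224)   # images resized to 224x224 pixels
BATCH_SIZE = 64

# Training data: rescale to [0, 1] and apply the augmentations listed in the text.
train_gen = ImageDataGenerator(
    rescale=1.0 / 255,        # divide each pixel value by 255.0
    rotation_range=10,        # rotations of up to 10 degrees
    width_shift_range=0.2,    # width shifts of up to 20%
    height_shift_range=0.2,   # height shifts of up to 20%
    shear_range=0.2,          # shearing (magnitude assumed)
    zoom_range=0.2,           # zooming (magnitude assumed)
    horizontal_flip=True,
)

# Validation/test data: rescaling only, no augmentation.
eval_gen = ImageDataGenerator(rescale=1.0 / 255)

# Hypothetical directory layout; class subfolders hold the four categories.
train_data = train_gen.flow_from_directory(
    "data/train", target_size=IMG_SIZE, batch_size=BATCH_SIZE, class_mode="categorical")
val_data = eval_gen.flow_from_directory(
    "data/valid", target_size=IMG_SIZE, batch_size=BATCH_SIZE, class_mode="categorical")
test_data = eval_gen.flow_from_directory(
    "data/test", target_size=IMG_SIZE, batch_size=BATCH_SIZE,
    class_mode="categorical", shuffle=False)
```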

Fig. 1. Distribution of images for training, testing and validation.

Fig. 2. Sample images from dataset.

CNNs

Convolutional Neural Networks (CNNs) are widely used in image-related applications46. These networks comprise several distinct layers: input, convolution, activation, pooling, and fully connected layers. Convolution layers extract features from the input data, and activation functions such as ReLU introduce nonlinearity into the network. Pooling layers reduce the number of parameters, thereby lessening the likelihood of overfitting. Finally, fully connected layers are responsible for making predictions, typically via a softmax classifier.

In experimental studies, pre-trained CNN models47 based on the architecture of ResNet, MobileNet, DenseNet, VGG, and Inception have been evaluated. These models leverage unique building blocks such as residual connections, dense blocks, inception modules, and separable convolutions, enabling them to learn complex features effectively. The distinct design of each architecture helps to optimize performance for various image processing tasks.

Flower pollination algorithm (FPA) for optimizing ensemble weights

The flower pollination algorithm (FPA) is a nature-inspired optimization method that emulates the flower pollination process48. This study uses FPA to identify the optimal weights for a weighted ensemble made up of three main models: \(m_1, m_2, m_3\). The aim is to discover the weight combination that maximizes classification accuracy. The optimization involves the following phases, and the complete process is shown in Algorithm 1.

Phase 1: Initialization A population of candidate weight vectors \(W_i\) is initialized, where \(i = 1, 2, \dots , n\), and each weight vector is represented as:

$$\begin{aligned} W_i = \{w_1, w_2, w_3\}, \quad w_j \in [0,1] \text { and } \sum w_j = 1. \end{aligned}$$
(1)

Here, \(w_1, w_2, w_3\) are the weights assigned to the models \(m_1, m_2,\) and \(m_3\), respectively. The sum constraint ensures a valid probability distribution.

Phase 2: Global pollination Global pollination enables exploration using Lévy flights:

$$\begin{aligned} W_i^{t+1} = W_i^t + \lambda \cdot L(\beta ) \cdot (W_i^t - W^*), \end{aligned}$$
(2)

where \(W^*\) is the best weight vector found so far, \(L(\beta )\) represents the Lévy flight step size, and \(\lambda\) is the scaling factor. This phase allows significant exploration in the search space to avoid local optima.

Phase 3: Local pollination Local pollination mimics self-pollination within similar solutions:

$$\begin{aligned} W_i^{t+1} = W_i^t + \epsilon \cdot (W_j^t - W_k^t), \end{aligned}$$
(3)

where \(W_j^t\) and \(W_k^t\) are two randomly selected weight vectors, and \(\epsilon\) is a random value from \(U(0,1)\). This mechanism enables fine-tuning of weight adjustments.

Phase 4: Switching probability A probability \(p \in [0,1]\) determines whether global or local pollination is performed:

$$\begin{aligned} \text {If } rand < p, \text { perform global pollination; otherwise, perform local pollination.} \end{aligned}$$
(4)

This balance between exploration and exploitation ensures efficient optimization.

Phase 5: Fitness evaluation The fitness function \(f(W)\) evaluates the classification performance of the weighted ensemble using:

$$\begin{aligned} y_{\text {ensemble}} = w_1 m_1 + w_2 m_2 + w_3 m_3. \end{aligned}$$
(5)

The weights are optimized to maximize accuracy or other performance metrics. The best weight vector \(W^*\) is updated iteratively as:

$$\begin{aligned} W^* = \arg \max _{i} f(W_i), \quad \forall i \in \{1, 2, \dots , n\}. \end{aligned}$$
(6)

Phase 6: Stopping criterion The optimization stops when a predefined maximum number of iterations \(T\) is reached or when the change in the fitness function falls below a tolerance threshold.

Algorithm 1. FPA-based weight optimization for ensemble.
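For illustration, the phase-by-phase procedure above can be sketched in NumPy as follows. The population size, iteration count, scaling factor, and switching probability are assumed values, and the fitness function is simply the accuracy of the weighted ensemble (Eq. 5) on held-out predictions; this is a sketch of the optimization loop, not the exact implementation used in the experiments.

```python
# Minimal sketch of FPA-based ensemble weight optimization (Phases 1-6).
# Population size, iteration count and Levy parameters are assumed values.
import numpy as np
from math import gamma

def levy_step(beta=1.5, size=3):
    """Draw a Levy-flight step L(beta) using Mantegna's algorithm."""
    sigma = (gamma(1 + beta) * np.sin(np.pi * beta / 2) /
             (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = np.random.normal(0, sigma, size)
    v = np.random.normal(0, 1, size)
    return u / np.abs(v) ** (1 / beta)

def project(w):
    """Keep weights in [0, 1] and normalise them to sum to 1 (Eq. 1)."""
    w = np.clip(w, 0.0, 1.0)
    s = w.sum()
    return w / s if s > 0 else np.full_like(w, 1.0 / len(w))

def fitness(w, probs, y_true):
    """Accuracy of the weighted ensemble y = w1*m1 + w2*m2 + w3*m3 (Eq. 5)."""
    ensemble = sum(wi * p for wi, p in zip(w, probs))
    return np.mean(np.argmax(ensemble, axis=1) == y_true)

def fpa_optimize(probs, y_true, n_pop=20, n_iter=100, p_switch=0.8, lam=0.01):
    pop = [project(np.random.rand(3)) for _ in range(n_pop)]            # Phase 1
    best = max(pop, key=lambda w: fitness(w, probs, y_true))
    for _ in range(n_iter):
        for i in range(n_pop):
            if np.random.rand() < p_switch:                              # Phase 4
                cand = pop[i] + lam * levy_step() * (pop[i] - best)      # Phase 2: global
            else:
                j, k = np.random.choice(n_pop, 2, replace=False)
                cand = pop[i] + np.random.rand() * (pop[j] - pop[k])     # Phase 3: local
            cand = project(cand)
            if fitness(cand, probs, y_true) > fitness(pop[i], probs, y_true):
                pop[i] = cand                                            # Phase 5
        best = max(pop, key=lambda w: fitness(w, probs, y_true))         # Eq. 6
    return best                                                          # Phase 6 on iteration budget

# probs = [softmax outputs of m1, m2, m3 on validation data]; y_true = integer labels
```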

Transfer learning

Transfer learning involves employing a model already trained on one task to address another related challenge. This method capitalizes on the insights gained from the initial task to enhance learning efficiency in the new scenario. Through transfer learning, the proposed framework attains superior generalization and robustness49. The pretrained base classifiers, originally trained on the ImageNet dataset50, underwent fine-tuning, with their classification layers adjusted to fit the particular class structure.
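As a hedged illustration, fine-tuning one ImageNet-pretrained base classifier (VGG16 shown as an example) with its classification head replaced for the four classes in this dataset might look as follows; the added dense-layer size, dropout rate, and freezing strategy are illustrative assumptions rather than reported settings.

```python
# Sketch of transfer learning for one base classifier (VGG16 shown as an example).
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models

NUM_CLASSES = 4  # adenocarcinoma, large cell carcinoma, squamous cell carcinoma, normal

# Load the ImageNet-pretrained feature extractor without its original classifier.
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # assumed: freeze the pretrained layers during initial training

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation="relu"),             # assumed head size
    layers.Dropout(0.5),                               # assumed dropout rate
    layers.Dense(NUM_CLASSES, activation="softmax"),   # new classification layer
])
```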

Ensemble of CNN models

Ensemble learning involves the aggregation of predictions from multiple models trained on the same dataset, leading to improved predictive accuracy51. The fundamental concept is to strategically integrate base models to form a more robust final model. Using ensemble methods helps reduce model variance and error and often results in superior performance compared to individual models alone. When applied to deep CNN architectures, ensemble methods take advantage of the feature extraction capabilities of each model, thus improving generalization performance. Standard ensemble techniques include bagging, stacking, voting, and prediction averaging. In particular, the average ensemble is widely utilized for classification tasks. Our methodology accounts for each model's significance by employing a weighted ensemble approach instead of the conventional average ensemble52. Figure 3 illustrates the weighted average ensemble method, which combines predictions from multiple models by assigning varying weights according to their performance. The process includes the following steps:

  1. Train multiple individual CNN models on the lung cancer dataset.

  2. Each model generates its predictions for the test data.

  3. Assign weights to the models based on predefined criteria, such as accuracy or confidence.

  4. Compute the final prediction as a weighted sum of the individual model predictions.

Mathematically, the final prediction of the ensemble \({\hat{y}}\) is given by:

$$\begin{aligned} {\hat{y}} = \sum _{i=1}^{N} w_i \cdot {\hat{y}}_i \end{aligned}$$
(7)

where:

  • \(N\) is the number of models,

  • \(w_i\) is the weight assigned to the \(i\)-th model, ensuring \(\sum _{i=1}^{N} w_i = 1\),

  • \({\hat{y}}_i\) is the prediction of the \(i\)-th model.

This approach ensures that models with higher reliability contribute more to the final decision, improving overall accuracy.
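A minimal sketch of Eq. (7) applied to softmax outputs is shown below; the probability values and weights are hypothetical.

```python
# Sketch of the weighted-average ensemble prediction in Eq. (7).
import numpy as np

# Softmax outputs of N = 3 models for one image over 4 classes (hypothetical values).
preds = np.array([
    [0.70, 0.10, 0.15, 0.05],   # model 1
    [0.60, 0.20, 0.10, 0.10],   # model 2
    [0.80, 0.05, 0.10, 0.05],   # model 3
])
weights = np.array([0.2, 0.3, 0.5])    # w_i, constrained to sum to 1

y_hat = weights @ preds                # weighted sum of the model predictions
predicted_class = int(np.argmax(y_hat))
print(y_hat, predicted_class)
```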

Performance measurement metrics

Various performance metrics are used to evaluate deep learning models. The confusion matrix is used to calculate accuracy, precision, recall, and F1 score53. The mathematical formulations for these performance metrics are given by Eqs. (8)–(11):

$$\begin{aligned} & \text {Accuracy} = \frac{TP + TN}{TP + FN + FP + TN} \end{aligned}$$
(8)
$$\begin{aligned} & \text {Precision} = \frac{TP}{TP + FP} \end{aligned}$$
(9)
$$\begin{aligned} & \text {Recall} = \frac{TP}{TP + FN} \end{aligned}$$
(10)
$$\begin{aligned} & \text {F1-score} = \frac{2 \times \text {Precision} \times \text {Recall}}{\text {Precision} + \text {Recall}} \end{aligned}$$
(11)

where \(TP\): true positive; \(TN\): true negative; \(FP\): false positive; \(FN\): false negative54.
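For reference, Eqs. (8)–(11) can be computed directly from these counts, as in the following sketch (binary case with hypothetical counts).

```python
# Sketch: computing Eqs. (8)-(11) from confusion-matrix counts.
def metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts, for illustration only.
print(metrics(tp=90, tn=85, fp=10, fn=5))
```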

Fig. 3. Weighted average ensemble.

Proposed system design

Pre-trained CNN models

In CNN models, convolutional filters facilitate the identification and learning of image features without the need for preliminary feature extraction. This research has analyzed the performance of six pre-trained CNN architectures to evaluate the efficacy of CNN model ensembles. The examined architectures are DenseNet121, MobileNet, Xception, VGG16, InceptionV3, and ResNet101V2. These models are evaluated to showcase their capabilities in various classification tasks.

FPA algorithm-based model weighting

Flower Pollination Algorithm (FPA) is an optimization method derived from the natural flower pollination process. This technique utilizes the concepts of both local and global pollen transfer. Local pollination involves self-pollination or interactions between nearby flowers, while global pollination occurs via distant pollen transfer facilitated by long-range pollinators. The aim of the FPA is to thoroughly investigate the search space and determine the optimal solutions by achieving a balance between exploration and exploitation. In the proposed model, the initial set of solutions is represented as a population of flowers, each characterized by a set of weights. These weights undergo iterative adjustment according to the FPA principles. Model performance is assessed through fitness values based on accuracy measures. The algorithm generates new candidate solutions using combined local and global pollination strategies, choosing the top solutions based on fitness scores. Through continuous iterations, the most effective solutions are preserved, while ineffective ones are eliminated. After multiple generations, the final optimal solution is identified as the best result discovered. The FPA-refined weights are then integrated into the ensemble classifier, enhancing its overall performance.

Proposed FPA based weighted ensemble classifier

The study presented involves the development of a lung cancer detection strategy utilizing deep-ensemble learning. This approach integrates several individual classifiers to formulate a prediction model that is both more precise and reliable. The typical procedure for creating an ensemble classifier consists of selecting base classifiers, training these classifiers, constructing the ensemble, and assessing its performance, as detailed below.

  1. Best three classifier selection: In this study, six pre-trained models were applied to the lung cancer dataset and evaluated using 5-fold cross-validation to ensure robustness and reduce overfitting. Each model was trained and validated across five different splits of the dataset, and the performance metrics were averaged over all folds. These models encompass diverse architectures to effectively capture different data attributes. Based on the mean performance across the folds, the three leading models were selected for ensemble development, with the flower pollination algorithm (FPA) used to determine their optimal weights.

  2. Ensemble construction: The ensemble prediction is obtained by integrating the outputs of the top three CNN models using a weighted average approach, where the weights are determined by FPA optimization.

  3. Ensemble evaluation: The ensemble's performance is assessed on the testing data using the relevant metrics.

Figure 4 illustrates the weighted ensemble model based on the Flower Pollination Algorithm (FPA) for lung cancer detection using CT images, with further details provided in Algorithm 2.

Fig. 4. FPA-weighted ensemble architecture for lung cancer detection.

Algorithm 2. Proposed system: weighted ensemble with Flower Pollination Algorithm (FPA).

Experimental results and discussion

The research involved a series of experimental setups. Initially, the top three convolutional neural network (CNN) architectures were selected for their high classification efficacy. To build the ensemble model, a weighted averaging strategy was implemented, and various optimization techniques55 were used to determine the optimal ensemble weights. Analysis through confusion matrices and performance graphs indicated that this framework yields superior outcomes. The subsequent section evaluates the effectiveness of the proposed methodology compared to existing studies on lung cancer detection using pre-trained CNN models. In the experiments, an initial learning rate of 0.01 and a mini-batch size of 64 were employed, with Stochastic Gradient Descent serving as the optimization method. Training was completed after 100 epochs, with no indication of overfitting. Categorical cross-entropy loss was used for the classification layer to train the weighted ensemble model and the comparison models. All experiments were conducted on the Kaggle GPU platform.
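A hedged sketch of this training configuration is given below; `model`, `train_data`, and `val_data` are assumed to be defined as in the earlier sketches.

```python
# Sketch of the training configuration described above.
from tensorflow.keras.optimizers import SGD

model.compile(
    optimizer=SGD(learning_rate=0.01),   # initial learning rate 0.01
    loss="categorical_crossentropy",     # categorical cross-entropy loss
    metrics=["accuracy"],
)

history = model.fit(
    train_data,               # generators built with batch_size=64 (see earlier sketch)
    validation_data=val_data,
    epochs=100,               # training completed after 100 epochs
)
```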

Results

Results of pretrained CNN models

Table 2 compares the performance of the six CNN models for the lung cancer detection task based on accuracy, precision, recall, and F1 score. Among the models, VGG16 and InceptionV3 delivered the highest performance, with accuracies of 94.6% and 94.0%, respectively, along with excellent precision, recall, and F1 score values, indicating a robust ability to classify images accurately. MobileNet also performed well, with an accuracy of 91.7% and consistent precision and recall. DenseNet121 achieved moderate results with an accuracy of 86.7%, while Xception showed the lowest overall performance, with an accuracy of 80.6%. ResNet101V2 offered a balanced trade-off with a strong accuracy of 92.7%, suggesting that it is reliable for high-performance classification tasks. For a visual representation of the classification results, the confusion matrices of all CNN models at the end of the testing process are presented in Fig. 6.

Table 2 Performance and computational comparison of CNN models.

Based on the classification results of the pre-trained CNN models shown in Table 2 and illustrated in Fig. 5, VGG16, InceptionV3, and ResNet101V2 emerge as the leading models, achieving accuracies of 94.6%, 94.0%, and 92.7%, respectively. A comparison of execution time and model size shows that MobileNet has the smallest model size (19.1 MB) and the fastest inference time (6.72 s), ResNet101V2 and Xception have the highest training times (1326.3 s and 1314.86 s, respectively), and VGG16 is the largest model at 192 MB, requiring more storage and computational power.

Fig. 5. Accuracy comparison of pre-trained CNN models.

Fig. 6. Confusion matrices of CNN models (0: adenocarcinoma, 1: large cell carcinoma, 2: normal, 3: squamous cell carcinoma).

Performance comparison with other optimization algorithm

In ensemble learning, numerous methodologies are available to determine the optimal weights for classifiers, typically using optimization strategies to improve the prediction accuracy of the ensemble model. To determine the most effective ensemble weights, a variety of optimization techniques were evaluated, including the Flower Pollination Algorithm (FPA), Particle Swarm Optimization (PSO), Bayesian Optimization (BO), Artificial Bee Colony (ABC), and Ant Colony Optimization (ACO). Table 3 displays the performance metrics of these optimization methods when applied to the classification of lung cancer, with the associated confusion matrices depicted in Fig. 7. The ensemble weights were fine-tuned to enhance the model's effectiveness, ensuring that each classifier's contribution was balanced. Among the algorithms evaluated, FPA was identified as the most effective in refining the ensemble weights, as detailed in Table 3. The ensemble model employing FPA weights achieved the highest classification accuracy for detecting lung cancer, at 98.2%, exceeding the other methods presented in Table 3.

Table 3 Performance metrics of the optimization methods.
Table 4 Ensemble weights (w1—ResNet101V2 weight, w2—InceptionV3 weight, w3—VGG16 weight).

The ensemble weights of the proposed model were derived from the optimal weights identified by the FPA. As shown in Table 4, these weights formed a combination that enhanced the ensemble model’s predictive performance to its maximum capability. The weights, determined using various optimization methods, collectively sum to 1. Specifically, the FPA algorithm assigned the following weights: 0.17642552 for ResNet101V2, 0.3318414 for InceptionV3, and 0.49173307 for VGG16. Notably, the VGG16 model received the highest weight because its prediction results, derived from softmax layer outputs, typically surpass those of the other models.

Fig. 7. Confusion matrices of proposed FPA-weighted ensemble model (0: adenocarcinoma, 1: large cell carcinoma, 2: normal, 3: squamous cell carcinoma).

Proposed FPA ensemble model

Among the various CNN architectures, VGG16, InceptionV3, and ResNet101V2 stand out, with classification accuracies of 94.6%, 94.0%, and 92.7%, respectively. However, the FPA-based ensemble method surpasses these models, achieving an accuracy of 98.2% on the test dataset. As detailed in Table 3, FPA fine-tunes the ensemble weights to achieve maximum accuracy. The optimal weight combination utilizing FPA, shown in Table 4, yields better classification results. Figs. 8 and 9 represent the accuracy and loss metrics per epoch for the three top CNN models throughout training. There is minimal variation between training and validation accuracy and loss values, demonstrating the model’s ability to avoid overfitting and its capacity to generalize effectively to new data. Figure 10 shows the AUC(ROC) and AUC(PR) comparison of the top 3 base models with the proposed FPA-weighted ensemble.

Fig. 8. Accuracy per epoch values of the top 3 CNN models during training.

Fig. 9. Loss per epoch values of the top 3 CNN models during training.

Fig. 10. (a) ROC curve, (b) P–R curve between top 3 models and proposed FPA-based ensemble.

Various CNN models tend to focus on unique patterns or data features during their training processes. By combining the predictions of multiple CNN models, the ensemble approach achieves higher performance compared to each model working in isolation. This enhanced performance is primarily due to the ensemble’s ability to effectively generalize to novel, unseen instances. Figure 11 presents the test results for the three highest-performing CNN models along with the FPA-weighted ensemble model.

Fig. 11. Comparison between the proposed FPA-weighted ensemble and the top 3 CNN models.

The following key observations can be made:

  1. The FPA-based weighted ensemble model surpasses standalone CNN models by reaching an accuracy rate of 98.2%. This result underscores the enhanced predictive capability of the ensemble model overall.

  2. The proposed ensemble model achieves a precision of 98.4%, which is notably superior to the top-performing individual CNN classifier, VGG16, which registers a precision of 95.0%. This increased precision of the ensemble model suggests a reduction in false-positive errors.

  3. The proposed model demonstrates a recall rate of 98.6%, surpassing the performance of the standalone CNN models. This increased recall indicates that the ensemble model is more effective in identifying a greater number of positive cases relative to the individual classifiers.

  4. The proposed model achieves an F1 score of 98.5%, demonstrating a more effective balance between precision and recall compared to the CNN models in isolation.

In summary, the proposed FPA weighted Ensemble Model outperforms individual CNN models in terms of accuracy, precision, recall, and F1 score. This model demonstrates superior overall performance, effectively balancing precision and recall. The enhanced performance can be attributed to the ensemble methodology, which combines various classifiers, thus increasing precision compared to the use of single classifiers.

Comparison of proposed model with existing work

This subsection evaluates the effectiveness of the proposed FPA-based Ensemble technique relative to other approaches using the lung cancer dataset. As illustrated in Table 3, the FPA-based ensemble model achieves an impressive classification accuracy of 98.84% for the detection of lung cancer, surpassing the results of other studies reported in the literature. Furthermore, Fig. 12 shows that this proposed ensemble model consistently outperforms all other comparable studies using similar datasets.

Fig. 12. Benchmarking results.

Discussion

This study introduces a novel FPA-based weighted ensemble model (Fig. 4) to detect and classify Lung Cancer. The model was evaluated using datasets that contain images of lung cancer classes (Fig. 2).

Convolutional neural networks (CNNs) are widely recognized for their efficacy in solving image-based problems, offering satisfactory performance because of their robust feature extraction capabilities. To assess the effectiveness of the proposed model, a comparative analysis of six pre-trained CNN models was performed. In particular, VGG16 achieved the highest accuracy of 94.6% (Table 2), primarily attributable to its deep stack of small convolutional filters, which improves both feature extraction and adaptability.

An FPA-based weighted ensemble model was then designed to demonstrate how combining multiple deep CNN models can leverage the individual feature extraction strengths of each model, resulting in enhanced generalization. This ensemble integrates the predictions of the top three CNN models, VGG16, InceptionV3, and ResNet101V2, which achieved classification accuracies of 94.6%, 94.0%, and 92.7%, respectively (Fig. 5).

The key advantage of the proposed FPA-based weighted ensemble approach lies in its ability to combine the strengths of multiple high-performing CNN architectures while mitigating their individual weaknesses. Traditional ensemble techniques often rely on static or equally distributed weights, which may not reflect the true contribution of each model to classification performance. By using the Flower Pollination Algorithm (FPA) for weight optimization, the proposed method adaptively assigns importance to each model based on its effectiveness, leading to improved accuracy and robustness. Furthermore, this technique improves model generalization and reduces overfitting, as the diversity among CNN architectures such as VGG16, InceptionV3, and ResNet101V2 ensures richer feature representation and better decision boundaries. This makes the model highly suitable for real-world medical imaging applications where diagnostic reliability is paramount.

Performance results validate that the proposed ensemble model not only outperforms individual CNNs but also presents a scalable and robust approach to automated lung cancer detection. These results underscore the potential of integrating metaheuristic optimization with deep learning to improve classification performance in medical diagnostics.

TOPSIS analysis

To assess the effectiveness of the CNN models in lung cancer detection, we conducted a TOPSIS analysis to rank the classifiers across all evaluation metrics. TOPSIS is a multi-criteria decision analysis method used to rank and select alternatives56. This study applies TOPSIS to compare the various Convolutional Neural Network (CNN) models based on four performance criteria: accuracy, precision, recall, and F1 score.

Step 1: Constructing the decision matrix The decision matrix \(D\) consists of alternatives (models) and criteria:

$$\begin{aligned} D = \begin{bmatrix} d_{11} & d_{12} & \dots & d_{1n} \\ d_{21} & d_{22} & \dots & d_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ d_{m1} & d_{m2} & \dots & d_{mn} \end{bmatrix} \end{aligned}$$

where \(d_{ij}\) represents the performance of the model \(i\) under criterion \(j\).

Step 2: normalization using L2 vector normalization Each element of the matrix is normalized using:

$$\begin{aligned} r_{ij} = \frac{d_{ij}}{\sqrt{\sum _{i=1}^{m} d_{ij}^2}} \end{aligned}$$
(12)

This ensures that all criteria are dimensionless and comparable.

Step 3: determine the ideal best and ideal worst The ideal best values (\(A^+\)) and ideal worst values (\(A^-\)) are calculated as:

$$\begin{aligned} & A^+ = \{ \max r_{ij} \mid j \in J^+, \min r_{ij} \mid j \in J^- \} \end{aligned}$$
(13)
$$\begin{aligned} & A^- = \{ \min r_{ij} \mid j \in J^+, \max r_{ij} \mid j \in J^- \} \end{aligned}$$
(14)

where \(J^+\) are benefit criteria (higher is better) and \(J^-\) are cost criteria (lower is better).

Step 4: calculate Euclidean distances The separation distances from the best ideal and the worst ideal are computed as:

$$\begin{aligned} & D_i^+ = \sqrt{\sum _{j=1}^{n} (r_{ij} - A_j^+)^2} \end{aligned}$$
(15)
$$\begin{aligned} & D_i^- = \sqrt{\sum _{j=1}^{n} (r_{ij} - A_j^-)^2} \end{aligned}$$
(16)

where

  • \(D_i^+\) is the distance from the best ideal

  • \(D_i^-\) is the distance from the worst ideal.

Step 5: compute the TOPSIS score (relative closeness) The relative closeness is calculated as:

$$\begin{aligned} C_i = \frac{D_i^-}{D_i^+ + D_i^-} \end{aligned}$$
(17)

where \(0 \le C_i \le 1\). A higher \(C_i\) indicates a better alternative.

Step 6: rank the alternatives Models are ordered by their TOPSIS scores from highest to lowest. The model achieving the largest \(C_i\) is assigned the top rank.
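A compact NumPy sketch of Steps 1–6 is given below, treating all four criteria as benefit criteria; the metric values are hypothetical and shown only to illustrate the computation.

```python
# Sketch of the TOPSIS ranking procedure (Steps 1-6); values are hypothetical.
import numpy as np

# Step 1: decision matrix (rows = models, cols = accuracy, precision, recall, F1).
D = np.array([
    [0.946, 0.950, 0.944, 0.947],
    [0.940, 0.941, 0.939, 0.940],
    [0.927, 0.930, 0.925, 0.927],
])

# Step 2: L2 vector normalisation (Eq. 12).
R = D / np.sqrt((D ** 2).sum(axis=0))

# Step 3: ideal best and ideal worst (all criteria are benefit criteria here).
A_pos, A_neg = R.max(axis=0), R.min(axis=0)

# Step 4: Euclidean distances to the ideals (Eqs. 15-16).
d_pos = np.sqrt(((R - A_pos) ** 2).sum(axis=1))
d_neg = np.sqrt(((R - A_neg) ** 2).sum(axis=1))

# Step 5: relative closeness (Eq. 17); Step 6: rank from highest score down.
C = d_neg / (d_pos + d_neg)
ranking = np.argsort(-C)
print(C, ranking)
```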

TOPSIS analysis evaluates CNN models based on their performance metrics by computing their Euclidean distances from the ideal best \(\left( A^+\right)\) and ideal worst \(\left( A^-\right)\) solutions. From Table 5 it is found that the FPA-weighted model achieves the highest relative closeness (1.000), which means that it is closest to the ideal solution. The ABC and BO weighted models follow with \(0.900540\) and \(0.817904\), respectively, making them the second and third best performers. Xception, with the highest \(D_i^+\) (\(0.106076\)) and the lowest \(D_i^-\) (\(0.000\)), ranks last. Figure 13 shows the comparison of TOPSIS analysis for all CNN models considered in this study. This analysis helps to select the most balanced model considering multiple performance criteria rather than relying solely on accuracy.

Table 5 TOPSIS analysis results for CNN models.
Figure 13. TOPSIS analysis for CNN models.

Conclusion

Lung cancer remains one of the most life-threatening diseases and contributes significantly to global mortality. Early and accurate detection using Computed Tomography (CT) imaging is critical for improving patient outcomes. This study introduces an FPA-based weighted ensemble model that integrates three robust pre-trained CNN architectures: VGG16, ResNet101V2, and InceptionV3, where the flower pollination algorithm (FPA) optimally determines the weights of the ensemble. The primary innovation lies in the synergistic integration of diverse CNN models with an evolutionary optimization technique, which not only enhances classification performance but also ensures robustness across varying data characteristics. The proposed model surpasses the performance of individual CNN classifiers, achieving a remarkable accuracy of 98.2%, precision of 98.4%, recall of 98.6%, and an F1 score of 98.5%. These results demonstrate the effectiveness of deep learning ensembles, particularly when optimized with metaheuristic techniques like FPA, in improving diagnostic reliability.

Despite the promising results, the limited dataset size (1,000 CT scans) may introduce biases and affect the generalizability of the model. Future work should focus on validating the approach using larger and more diverse datasets to ensure robustness across populations. Additionally, integrating attention mechanisms and hybrid optimization strategies could further boost accuracy and interpretability, making the framework more applicable in real-world clinical settings. The proposed model can also serve as a decision-support tool for radiologists, aiding early diagnosis, reducing errors, and supporting effective treatment planning, thus improving patient prognosis and reducing healthcare burden.