Introduction

Modern lung cancer prognosis relies on machine learning algorithms applied to rich patient data. By incorporating elements such as imaging, genetics, and medical records, such models achieve high precision in anticipating disease occurrence; by enabling early intervention and customized treatment, they hold the potential to improve patient outcomes in the challenging realm of lung cancer. Early detection is vital for successful treatment1, and one goal is to validate a Lung Cancer Prediction CNN, initially trained on US screening data, on an independent European multicenter dataset2. This research explores early-stage lung cancer diagnosis using CNNs, aiming to enhance diagnostic accuracy and save lives by improving early detection through reliable machine learning models3. While CNN models have demonstrated high accuracy on controlled datasets, their applicability to real-world scenarios with diverse patient populations remains a critical challenge. Addressing this requires validation across multiple datasets and strategies to ensure robustness against unseen variations in imaging data. One method utilizes Convolutional Neural Networks (CNNs) to categorize tumors in lung disease screening from tomography images, harnessing spatial invariance for effective feature extraction4. Using end-to-end learning, this method automatically extracts self-learned features, comparing favorably with conventional approaches and traditional computer-aided diagnosis systems5.

Prior work has developed a 1D CNN model for NSCLC staging and grading, leveraging the TCIA NSCLC Radiogenomics Collection; a hybrid MSER-SURF model integrates tumor features with clinical data for comprehensive analysis6. Despite the highly imbalanced and heterogeneous nodule detection problem, a candidate proposal approach combined with a 3D CNN achieves effective lung nodule detection and cancer prediction at near-human performance7. A fast CNN-RNN model for NSCLC AJCC staging surpasses traditional ANNs through deeper layers, with comparisons against standard ML algorithms and related studies8. For individuals with lung cancer, advanced deep learning models have been crafted to predict survival outcomes, addressing both classification and regression tasks; feature importance analysis elucidates the factors impacting survival periods9. LungNet, an innovative hybrid model, blends a 22-layer Convolutional Neural Network (CNN) with wearable sensor-based MIoT data and CT scans for heightened diagnostic precision10.

Leveraging CNN-based disease detection, related research explores the fusion of blockchain for secure data handling in food safety and lung cancer prediction11. A preliminary study underscores CNNs’ feasibility and promising efficacy in evaluating lung cancer’s T-parameter, offering rapid T1-T2 or T3-T4 probabilities; ongoing research targets a robust TNM staging algorithm12.

Another innovative approach employs a 2D CNN with Taguchi optimization for automated lung cancer recognition from CT images, as outlined in reference13. Through 36 experiments and 8 control factors, it meticulously refines the CNN parameters, significantly improving classification accuracy. Accuracy has also been enhanced by integrating convolutional and bidirectional recurrent neural networks into a novel deep learning model, leveraging the NSCLC Radiogenomics dataset of 211 subjects, as detailed in reference14. An intelligent medical system, detailed in reference15, incorporates CNN-assisted diagnosis and decision-making, utilizing sensors to analyze NSCLC patients’ records and ensuring accuracy through transfer learning and dynamic sampling techniques.

The paper is structured as follows: this section provides the introduction. Section “Literature review” surveys the scholarly landscape. Section “Proposed methodology” details the proposed methodology, offering a step-by-step roadmap. Section “Result and discussion” presents the results and discusses their implications. The “Conclusion” section synthesizes the study’s findings and underscores their significance in the broader research domain.

Literature review

In 2024, Islam et al.16 provided a comprehensive evaluation of the latest advancements in using deep learning to enhance medical imaging data. The review covers a wide range of applications, popular datasets, evaluation criteria, and both traditional and innovative augmentation techniques. To assess the current state of the field and identify its primary challenges, the study analyses prior research findings and proposes avenues for future research that could address these concerns. The review is a valuable resource for academics and practitioners in medical image analysis.

In 2024, Saha et al.17 presented VER-Net, a transfer learning model for lung cancer detection on CT scans. Combining the VGG19, EfficientNetB0, and ResNet101 backbones, the model produces four lung cancer classifications via image preparation, data augmentation, and hyperparameter tuning. The results imply that VER-Net is highly accurate in spotting lung cancer and could be extended to other diseases visible on CT scans. This work highlights the efficient application of transfer learning in medical image processing, especially for increasing diagnostic accuracy.

In 2024, Riku Klén et al.18 examined the influence of several image augmentation techniques on the accuracy of a Convolutional Neural Network (CNN) trained for binary classification. The CNN was trained on 11 medical datasets, mostly comprising X-rays, ultrasound (US) images, PET scans, and MRI scans, centred on cancer and lung diseases. Analysing seven widely used augmentation techniques, the article compares CNN predictions with and without augmentation. The results imply that for both the US and PET datasets, augmentation does not produce statistically significant differences, whereas Gaussian blur was the most successful augmentation technique for X-rays and MRI images, enhancing model performance. These results show that the imaging modality affects the effectiveness of augmentation strategies.

In 2024, Kukreja and Sabharwal19 noted that improving the survival rates of lung cancer, a major and potentially deadly disease, depends on early identification. Their paper proposes a Convolutional Neural Network (CNN) to classify three different histological image types: benign, adenocarcinoma, and squamous cell carcinoma. The aim of the work is to evaluate the accuracy of the CNN model against other techniques, thereby enhancing lung cancer diagnosis. The method is novel in that CNNs had not previously been applied to the categorisation of these specific images. Through accurate and timely classification of cancers, this approach may improve the efficacy of therapy and potentially avert deaths. CNNs can simplify the diagnostic process, lower costs, and improve overall accuracy in lung cancer detection.

In 2024, Zhang et al.20 demonstrated that applying Convolutional Neural Networks (CNNs) to Computed Tomography (CT) scans has led to notable progress in automating lung cancer diagnosis, resulting in enhanced accuracy in detection and analysis. Notwithstanding this progress, obstacles remain, such as limited interpretability, data variability, and difficulties in generalising. The paper proposes a new method that integrates a CNN with DenseNet using data fusion and mobile edge computing to enhance the identification and classification of lung cancer. Data fusion improves model reliability by integrating data from multiple sources, while mobile edge computing reduces the model’s processing time to near real-time. To further improve DenseNet’s classification accuracy, the approach introduces an enhanced predictive sparse decomposition (PSD) method to acquire sparse features from medical images.

In 2023, Lulu Gai et al.21 tested Vision Transformers (ViTs) and Convolutional Neural Networks (CNNs) for lung cancer identification on CT data. The paper used a collection of 212 medical images and, to overcome dataset limitations, both models exploited self-supervised learning, transfer learning, and a sharpness-aware minimiser; training required no lung nodule annotations. CNNs outperformed ViTs in self-supervised learning, with an AUC of 98.1% and a recall of 93.4%. While both architectures showed potential, CNNs outperformed ViTs, especially on smaller datasets. The study shows that CNNs can identify early-stage lung cancer, supporting their clinical utility.

In 2023, Syeda Reeha et al.22 proposed an ensemble model combining several deep learning architectures, including BEiT, DenseNet, and a sequential CNN, addressing the pressing demand for efficient lung cancer detection and classification. To improve prediction accuracy, the combined strategy uses AND, OR, Weighted Box Fusion, and Boosting. Using the Chest CT-Scan Images Dataset, the research shows that the ensemble model substantially beats single-model techniques. The study highlights how combining diverse classifiers overcomes individual limitations and improves overall performance.

Given the available data and resources, the study demonstrates the potential of ensemble methods as a potent tool for enhancing the precision and efficiency of lung cancer diagnosis.

In 2023, R. Raza et al.23 presented Lung-EffNet, a new transfer learning-based model for lung cancer classification based on the EfficientNet architecture with additional top layers for better performance. Lung-EffNet sorts lung scans into benign, malignant, or normal categories; five EfficientNet variants (B0–B4) were evaluated on the IQ-OTH/NCCD dataset. Lung-EffNet, especially with EfficientNetB1, showed better accuracy, efficiency, and faster training than other pre-trained CNN architectures, making it well suited to large-scale clinical deployment and automated lung cancer diagnosis from CT images.

In 2023, Nandita Gautam et al.24 provided a new ensemble deep learning model, essential for early detection of lung cancer, that correctly labels the severity of lung nodules from CT scan images. It combines three advanced convolutional neural network (CNN) models: ResNet-152, DenseNet-169, and EfficientNet-B7. The weight optimisation approach combines ROC-AUC and F1-scores to improve accuracy. It surpassed recent methodologies and successfully reduced the frequency of false negatives, demonstrating promise for significantly improving lung cancer diagnosis and patient outcomes.

In 2022, Dritsas and Trigka25 used machine learning to craft effective models for early identification of individuals at high risk of lung cancer, enabling timely interventions to mitigate long-term complications. Emphasizing the efficacy of Rotation Forest, the article highlighted its robust performance under thorough evaluation with established metrics such as precision, recall, F-measure, accuracy, and area under the curve.

In 2021, Tsou et al.26 introduced a machine learning model that precisely detected lung cancer based on the analysis of participants’ exhaled breath, presenting a non-invasive and radiation-free diagnostic system. The research successfully showcased a novel diagnostic approach by integrating deep learning algorithms and VOC analysis, minimizing environmental interference. Ongoing efforts focused on developing standardized, automated breath sampling protocols, aiming to simplify the collection process and ensure sample quality.

While prior studies focus on augmentation and model optimization, this research uniquely addresses memory overfitting using Differential Augmentation, demonstrating superior performance in lung cancer detection.

Problem statement

The research problem focuses on enhancing the accuracy and efficiency of lung cancer detection using deep learning techniques. Although improved, current models struggle with generalisation, data variability, and interpretability across imaging modalities. Optimising model performance, especially for early detection, while addressing limits on dataset variety and augmentation remains difficult.

Generalizability challenges

Although the CNN + DA model performs well on the controlled dataset, its generalizability to real-world scenarios remains to be validated. Challenges include unseen variations in imaging protocols, patient demographics, and artifacts in clinical datasets. Future studies will explore strategies such as transfer learning and fine-tuning on multi-center datasets to mitigate these challenges and ensure applicability in diverse clinical settings. To increase medical image analysis models’ resilience, scalability, and clinical utility, data fusion, mobile edge computing, and ensemble methods must be explored. Table 1 summarizes the objectives and limitations of previous research, highlighting both their goals and shortcomings.

Table 1 Aims and drawbacks of previous research.

Research gap

While deep learning has revolutionized numerous domains, including medical imaging for lung cancer detection, a persistent research gap remains concerning memory overfitting. Despite various efforts to address general overfitting, the nuanced challenge of models memorizing training data rather than learning to generalize has not been comprehensively addressed. Many existing solutions focus on broader overfitting issues, often overlooking the subtle yet critical aspect of memory overfitting. This gap underscores the need for dedicated research into strategies and methodologies to combat memory overfitting, ensuring that models remain robust and reliable, especially in critical applications like medical diagnostics.

Proposed methodology

EfficientNetB0

EfficientNetB023 was selected as a baseline due to its balance of computational efficiency and performance, making it suitable for resource-constrained medical imaging tasks, and it has become a standard benchmark in image classification. Its compound scaling method scales the depth, width, and input resolution of the network in balanced measure for better performance across all three axes. The architecture is built from Mobile Inverted Bottleneck Convolution (MBConv) blocks, which serve as efficient feature extractors with a small computational load while keeping the model lightweight. Additionally, EfficientNetB0 incorporates Squeeze-and-Excitation (SE) blocks to enhance the network’s ability to capture important features by modelling interdependencies between channels. The model27 uses the Swish activation function, whose smooth non-linearity contributes to better performance than traditional activation functions like ReLU. With around 5.3 million parameters, EfficientNetB0 is compact yet highly effective, suiting applications from mobile devices to large-scale image classification tasks.
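For concreteness, the sketch below shows how such an EfficientNetB0 baseline could be assembled in Keras; the input size, frozen backbone, and three-class head are illustrative assumptions, not the exact configuration used in this study.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# EfficientNetB0 backbone with pretrained ImageNet features (assumed setup).
base = tf.keras.applications.EfficientNetB0(
    include_top=False,            # drop the ImageNet classification head
    weights="imagenet",
    input_shape=(224, 224, 3),
)
base.trainable = False            # freeze the backbone for initial training

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),        # Eq. (6): one value per channel
    layers.Dense(3, activation="softmax"),  # benign / malignant / normal
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```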

Convolutional operation in MBConv block

The convolutional operation is the core of EfficientNetB0, especially within the MBConv blocks, as shown in Eq. (1):

$$Z_{ij}^{k}=\sum_{m=1}^{M}\sum_{n=1}^{N}X_{i+m-1,\,j+n-1}\cdot W_{mn}^{k}+b^{k}$$
(1)

Where:

\(\:{Z}_{ij}^{k}\) is the output of the convolution at position (i, j) for the kth filter.

\(\:X\) is the input feature map.

\(W\) is the convolution filter of size M × N.

\(\:{b}^{k\:}\) is the bias for the k-th filter.

Depthwise convolution in MBConv block

In the MBConv block, depthwise convolution is applied to each input channel separately, reducing the computational cost, as shown in Eq. (2):

$$Z_{ij}^{c}=\sum_{m=1}^{M}\sum_{n=1}^{N}X_{i+m-1,\,j+n-1,\,c}\cdot W_{mn}^{c}+b^{c}$$
(2)

Where:

\(Z_{ij}^{c}\) is the output for the cth channel after depthwise convolution.

\(X_{i+m-1,\,j+n-1,\,c}\) is the input at channel c.

\(W_{mn}^{c}\) is the depthwise filter for channel c.

\(b^{c}\) is the bias for channel c.
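To illustrate the computational saving of Eq. (2) over Eq. (1), the hedged snippet below compares the parameter counts of a standard convolution and a depthwise convolution on a 32-channel feature map; the tensor shapes are arbitrary examples.

```python
import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal((1, 64, 64, 32))   # a batch with 32-channel feature maps

standard = layers.Conv2D(32, kernel_size=3, padding="same")
depthwise = layers.DepthwiseConv2D(kernel_size=3, padding="same")
_ = standard(x)    # call the layers once so their weights are built
_ = depthwise(x)

print(standard.count_params())   # 9,248 = 3*3*32*32 weights + 32 biases
print(depthwise.count_params())  # 320   = 3*3*32 weights    + 32 biases
```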

Squeeze-and-excitation (SE) block

The SE block models channel-wise dependencies using two main operations, Squeeze and Excite, shown in Eqs. (3) and (4).

Squeeze Operation:

$$S_{c}=\frac{1}{H\times W}\sum_{i=1}^{H}\sum_{j=1}^{W}Z_{ij}^{c}$$
(3)

Where:

\(S_{c}\) is the squeezed feature for channel c.

\(H\) and \(W\) are the height and width of the feature map.

Excite Operation:

$$e_{c}=\sigma\left(W_{2}\cdot \text{ReLU}\left(W_{1}\cdot S_{c}\right)\right)$$
(4)

Where:

\(e_{c}\) is the excitation weight for channel c.

\(W_{1}\) and \(W_{2}\) are the weights of the fully connected layers.

\(\sigma\) is the sigmoid activation function.

Recalibration:

$$\widehat{Z}_{ij}^{c}=e_{c}\cdot Z_{ij}^{c}$$
(5)

Where:

\(\widehat{Z}_{ij}^{c}\) is the recalibrated feature map after applying the excitation weight.
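A minimal functional-style sketch of Eqs. (3)–(5) is given below; the reduction ratio of 4 is an assumed value (EfficientNet itself uses a per-block ratio).

```python
import tensorflow as tf
from tensorflow.keras import layers

def se_block(z, reduction=4):
    """Squeeze-and-Excitation: squeeze (Eq. 3), excite (Eq. 4), recalibrate (Eq. 5)."""
    c = z.shape[-1]
    s = layers.GlobalAveragePooling2D()(z)                  # squeeze: S_c
    e = layers.Dense(c // reduction, activation="relu")(s)  # W1 + ReLU
    e = layers.Dense(c, activation="sigmoid")(e)            # W2 + sigmoid: e_c
    e = layers.Reshape((1, 1, c))(e)
    return layers.Multiply()([z, e])                        # recalibration: e_c * Z
```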

Global average pooling

Global average pooling reduces the feature map’s spatial dimensions to one value per channel, as shown in Eq. (6):

$$\text{GAP}_{c}=\frac{1}{H\times W}\sum_{i=1}^{H}\sum_{j=1}^{W}\widehat{Z}_{ij}^{c}$$
(6)

Where:

\(\text{GAP}_{c}\) is the global average pooled value for channel c.

\(H\) and \(W\) are the height and width of the feature map.

Compound scaling method

EfficientNet uses a compound scaling method to scale the depth d, width w, and input resolution r of the network, as shown in Eq. (7):

$$d=\alpha^{\phi},\quad w=\beta^{\phi},\quad r=\gamma^{\phi}$$
(7)

Where:

\(\alpha\), \(\beta\) and \(\gamma\) are per-axis scaling constants determined by a small grid search.

\(d, w, r\) are the resulting depth, width, and resolution scaling factors, and \(\phi\) is the compound coefficient controlling overall model size.

These equations provide a mathematical framework for understanding how the EfficientNetB0 architecture processes and transforms input data, emphasizing both feature extraction and efficiency through its innovative design.
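As a worked illustration of Eq. (7), the function below applies the compound scaling rule using the coefficients reported in the original EfficientNet paper (α = 1.2, β = 1.1, γ = 1.15); treating them as defaults here is an assumption for the sketch.

```python
def compound_scale(phi, alpha=1.2, beta=1.1, gamma=1.15):
    """Eq. (7): depth, width and resolution multipliers for compound coefficient phi."""
    return alpha ** phi, beta ** phi, gamma ** phi

print(compound_scale(0))  # (1.0, 1.0, 1.0)  -> EfficientNetB0 itself
print(compound_scale(1))  # (1.2, 1.1, 1.15) -> roughly EfficientNetB1
```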

Avoiding memorization overfitting using CNN with the DA algorithm

Deep learning, mainly through Convolutional Neural Networks, has exhibited transformative potential in diverse applications, ranging from natural language processing to medical imaging. However, as with any powerful tool, challenges persist. A primary concern in deep learning applications is overfitting: models excel on training data but stumble when confronted with unfamiliar instances. “Memory overfitting,” a particular manifestation, occurs when a model memorizes training instances rather than discerning underlying data patterns. This phenomenon is especially perilous in domains like medical diagnostics, where the margin for error is minimal.

Addressing this research gap requires not merely refining existing models but innovative strategies that fundamentally alter the training dynamics. One promising avenue is the integration of CNNs with Differential Augmentation. Differential Augmentation (DA) applies transformations such as random hue adjustments (± 10°), saturation changes (0.8–1.2), brightness scaling (0.9–1.1), and contrast modifications (0.85–1.15). For instance, a lung CT scan can be augmented to simulate varying imaging conditions, enhancing the model’s ability to generalize across datasets. These augmentations are applied randomly during training to ensure diverse representations of the training data. By diversifying the training input, DA ensures that models are exposed to a broader spectrum of data variations, discouraging memorization and promoting genuine learning.
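A hedged sketch of these transformations using TensorFlow’s image ops is shown below; the exact op choices are assumptions (for example, tf.image.random_brightness adds an offset rather than scaling, so the stated 0.9–1.1 range is only approximated).

```python
import tensorflow as tf

def differential_augment(image):
    """Apply the DA transforms with the ranges stated above.
    `image` is assumed to be a float tensor scaled to [0, 1]."""
    image = tf.image.random_hue(image, max_delta=10 / 360)    # hue shift of ±10°
    image = tf.image.random_saturation(image, 0.8, 1.2)       # saturation 0.8–1.2
    image = tf.image.random_brightness(image, max_delta=0.1)  # ≈ 0.9–1.1 brightness
    image = tf.image.random_contrast(image, 0.85, 1.15)       # contrast 0.85–1.15
    return tf.clip_by_value(image, 0.0, 1.0)
```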

This research explores the synergies of combining CNN with DA, aiming to mitigate memory overfitting. Through this integrative approach, the aspiration is to develop models that are not only accurate but also robust and generalizable, ensuring their reliability in real-world diagnostic scenarios.

Building on the foundational understanding of memory overfitting and its implications, the next logical step is developing and refining a novel algorithm that seamlessly integrates CNNs with Differential Augmentation. Recognizing the individual strengths of both CNNs and DA provides a unique vantage point28. CNNs, with their hierarchical feature extraction capabilities, excel at capturing intricate patterns in data. On the other hand, DA, with its dynamic data augmentation strategies, offers a solution to the data scarcity and variety issue, inherently combating overfitting.

The development of a new CNN with DA algorithm requires a multi-faceted approach. Initially, the architecture of the CNN must be tailored to be receptive to dynamic augmentations. This means layers and nodes must be optimized not just for feature extraction, but also for variability tolerance. The algorithm should be designed to be adaptive, learning not just from the features of the training data but also from the variations introduced by DA.

Furthermore, the DA component17 must be sophisticated enough to introduce meaningful augmentations. It’s not just about changing the data; it’s about enhancing it in ways that genuinely challenge and expand the model’s understanding. This ensures that the CNN doesn’t stagnate or fixate on specific data patterns but continuously evolves its understanding, mirroring the dynamic nature of real-world data. Lastly, the integration of CNN and DA must be seamless. As shown in Fig. 1, the algorithm should allow feedback loops, where insights from the CNN’s performance can inform and refine the DA strategies and vice versa. This creates a symbiotic relationship between data processing and data enhancement, leading to a model that constantly learns, adapts, and improves.

Developing a new CNN with a DA algorithm represents an exciting frontier in deep learning research. By addressing the persistent challenge of memory overfitting head-on, this endeavour promises to set new benchmarks in accuracy, reliability, and generalizability in deep learning models.

To develop a new CNN with a Differential Augmentation (DA) algorithm to mitigate memory overfitting, it’s important to understand the mathematical foundation behind both CNNs and DA.

Convolutional layer operations

The Convolutional Neural Network (CNN) is primarily based on the convolution operation, which extracts features from the input data, as shown in Eq. (8).

Convolution Operation:

$$\text{Feature Map}\left(Z_{ij}^{k}\right)=\sum_{m=1}^{M}\sum_{n=1}^{N}X_{i+m-1,\,j+n-1}\cdot W_{mn}^{k}+b^{k}$$
(8)

Where:

\(X\) is the input feature map.

\(W\) is the convolution kernel (filter) of size M × N.

\(b^{k}\) is the bias for the kth filter.

\(Z_{ij}^{k}\) is the output feature map at position (i, j) for the kth filter.

Activation function

An activation function is used to add non-linearity after convolution. A commonly used function is ReLU, shown in Eq. (9).

ReLU Activation Function:

$$A\left(Z_{ij}^{k}\right)=\max\left(0,Z_{ij}^{k}\right)$$
(9)

Pooling operation

Pooling layers reduce the dimensionality of the feature maps, typically using max pooling, as shown in Eq. (10).

Max Pooling:

$$P_{ij}^{k}=\max\left\{Z_{m,n}^{k}:m\in\left[i,i+s-1\right],\,n\in\left[j,j+s-1\right]\right\}$$
(10)

Where:

\(\:\text{s}\) is the size of the pooling filter.

\(\:{\text{P}}_{\text{i}\text{j}}^{\text{k}}\:\)is the pooled output for the kth feature map.

Fully connected layer

One or more fully connected layers receive the flattened output from the convolutional and pooling layers, as shown in Eq. (11).

Fully Connected Layer Operation:

$$y=W\cdot x+b$$
(11)

Where:

\(\:\text{W}\) is the weight matrix.

\(\:\text{x}\) is the input vector.

\(\:\text{b}\) is the bias.

Softmax function

For classification tasks, the output layer often uses a softmax function to convert logits into probabilities, as shown in Eq. (12).

Softmax Function:

$$p\left(y=i\mid x\right)=\frac{e^{Z_{i}}}{\sum_{j=1}^{K}e^{Z_{j}}}$$
(12)

Where:

\(Z_{i}\) is the logit for class i.

\(\:\text{K}\) is the number of classes.

Loss function

For classification tasks, the cross-entropy loss is typically used, as shown in Eq. (13).

Cross-Entropy Loss:

$$\:\mathcal{L}=-\sum\:_{i=1}^{K}{y}_{i}\text{log}(p\left(y=i|x\right))$$
(13)

Where:

\(\:{\text{y}}_{\text{i}}\) is the ground truth label (one-hot encoded).

\(\:\text{p}\left(\text{y}=\text{i}|\text{x}\right)\) is the predicted probability.

Differential augmentation (DA)

The Differential Augmentation (DA) technique is designed to improve the model’s ability to generalize by introducing diverse transformations, such as hue, brightness, saturation, and contrast adjustments. This simulates real-world variations in imaging conditions, which are common in clinical environments.

The augmentation function can be expressed as shown in Eq. (14):

$$DA\left(X\right)=\left\{T_{1}\left(X\right),T_{2}\left(X\right),\dots,T_{n}\left(X\right)\right\}$$
(14)

Where:

\(\:\text{X}\) is the original input data.

\(\:{\text{T}}_{\text{i}}\left(\text{X}\right)\) represents different augmentation transformations.

Regularization terms

To combat overfitting, regularization techniques such as L2 regularization can be employed, as shown in Eq. (15).

L2 Regularization:

$$\mathcal{L}_{total}=\mathcal{L}+\lambda\sum_{k}\sum_{i,j}\left(W_{i,j}^{k}\right)^{2}$$
(15)

Where:

\(\:\mathcal{L}\) is the original loss function.

\(\:{\uplambda\:}\) is the regularization strength.

\(\:{\text{W}}_{\text{i},\text{j}}^{\text{k}}\) are the weights of the kth layer.
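In Keras, Eq. (15) corresponds to attaching an L2 kernel regularizer to a layer, as in the brief sketch below; the layer width and λ = 1e-4 are assumed values, not ones reported in this study.

```python
from tensorflow.keras import layers, regularizers

# The L2 penalty on this layer's weights is added to the training loss (Eq. 15).
dense = layers.Dense(128, activation="relu",
                     kernel_regularizer=regularizers.l2(1e-4))  # lambda = 1e-4
```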

Feedback mechanism for DA-CNN integration

To ensure a feedback loop between CNN performance and DA strategies, a performance feedback adjustment can be applied, as shown in Eq. (16):

$$T_{i+1}\left(X\right)=T_{i}\left(X\right)+\alpha\cdot\Delta_{T}\cdot\mathcal{L}_{total}$$
(16)

Where:

\(T_{i+1}\left(X\right)\) is the adjusted augmentation for the next iteration.

\(\:{\upalpha\:}\) is a learning rate for DA adjustment.

By integrating these mathematical components, a CNN-DA algorithm can be developed that dynamically adapts to data variations, reducing memory overfitting and enhancing model robustness.
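One way Eq. (16) could be realized is sketched below, treating the augmentation transform as a scalar strength nudged by the loss trend; the Δ_T proxy, learning rate, and clipping range are all assumptions rather than the exact implementation.

```python
def adjust_da_strength(strength, total_loss, prev_loss, alpha=0.01):
    """Feedback step in the spirit of Eq. (16): raise the augmentation
    strength while the total loss is rising, lower it as the loss falls."""
    delta_t = total_loss - prev_loss           # proxy for the update direction
    strength += alpha * delta_t * total_loss   # T_{i+1} = T_i + alpha * delta * L
    return min(1.0, max(0.0, strength))        # keep the strength within [0, 1]
```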

Fig. 1
figure 1

Flow diagram for CNN with DA.

Data collection and loading

Efficient data collection and loading are crucial components in any machine learning endeavour. This involves gathering relevant datasets from diverse sources, ensuring data integrity, and employing effective loading mechanisms. Advanced methods include real-time streaming integration and automated preprocessing to maintain data quality and facilitate seamless integration into machine learning pipelines, optimizing model performance.

Data preprocessing

Data preprocessing involves vital steps such as normalization and label encoding. Images were resized to 256 × 256 pixels, normalized to the [0, 1] range by dividing by 255, and labels were one-hot encoded. Hyperparameters were set as follows: learning rate of 0.001, batch size of 32, and the Adam optimizer. Image augmentation included rotations of ± 15° and random cropping to simulate variability. Normalization ensures standardized scales for numerical features, enhancing model convergence. Label encoding transforms categorical labels into numerical values, facilitating algorithmic comprehension. These preprocessing techniques, summarized in Table 2, lay the foundation for robust machine learning models, enhancing accuracy and efficiency in the analysis of diverse datasets.

Table 2 Preprocessing steps applied in this study for enhanced reproducibility.
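A short sketch of these preprocessing steps in TensorFlow is given below; the helper name and the tf.data usage are illustrative assumptions, while the image size and batch size follow the values stated above.

```python
import tensorflow as tf

def preprocess(image, label, num_classes=3):
    image = tf.image.resize(image, (256, 256))   # resize to 256 x 256 pixels
    image = tf.cast(image, tf.float32) / 255.0   # normalize to the [0, 1] range
    label = tf.one_hot(label, num_classes)       # one-hot encode the label
    return image, label

# Example pipeline with the stated batch size of 32:
# ds = ds.map(preprocess).shuffle(1024).batch(32)
```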

Normalization

Normalization is a pivotal data preprocessing technique that standardizes numerical features, ensuring a consistent scale across variables. By rescaling data to a common range, often between 0 and 1, normalization mitigates the influence of disparate magnitudes, promoting fair comparisons and optimal model convergence in machine learning applications.

Encode label

Label encoding is a crucial step in data preprocessing, particularly for categorical variables in machine learning. It involves assigning numerical values to categorical labels, enabling algorithms to interpret and process them effectively. This transformation enhances model performance by converting non-numeric information into a format suitable for mathematical computations and analysis.

CNN- model architecture

Convolutional layer

In deep learning, and more especially in convolutional neural networks, the convolutional layer is the primary layer. This layer uses filters, or kernels, that slide over the input data to extract repeated local features while maintaining spatial connectivity. Every filter responds to certain structures within the image, improving the network’s ability to detect more complex patterns. By capturing hierarchical features, convolutional layers are effective in image processing, enabling the model to identify edges, textures, and higher-level representations. Strides and padding control the spatial dimensions, influencing the layer’s receptive field. Convolutional layers contribute significantly to the success of tasks like image recognition, object detection, and semantic segmentation in various machine learning applications.

Max pooling layer

Essential in CNNs, the max-pooling layer downsamples and extracts features by partitioning the input into non-overlapping regions and selecting the maximum value from each. This process retains essential information while reducing spatial dimensions, aiding computational efficiency and mitigating overfitting. By improving translation invariance, max-pooling strengthens the network’s resistance to changes in spatial orientation. While downsampling, it preserves dominant features, contributing to hierarchical feature learning. Max-pooling layers are instrumental in creating spatial hierarchies, improving the network’s capacity to recognize and generalize complex patterns in diverse data, particularly in image-related tasks.

Flatten layer

The flatten layer is an essential part of many neural networks and comes after the convolutional and pooling layers. Its main role is to reshape the multidimensional input into a single one-dimensional array that can feed the fully connected layers. This flattening is pivotal in transitioning from spatial hierarchies to a format suitable for traditional neural network architectures. By converting multidimensional data into a linear structure, the flatten layer facilitates comprehensive feature learning and abstraction. It plays a crucial role in tasks like image classification, enabling the neural network to extract and process intricate patterns for accurate predictions and decision-making.

Dense layer

Integral to neural networks, the dense layer, often termed fully connected, stands as a foundational building block in their architecture. Neurons in this layer are interconnected, forming a dense matrix that allows each neuron to connect to every neuron in the preceding and succeeding layers. This extensive connectivity enables comprehensive feature learning and abstraction. Dense layers are particularly important when complex patterns must be learned from the data fed to the model, as in image classification, image generation, and natural language processing. In dense layers, the weights and biases are trained to make accurate predictions based on the learned patterns.

Activation layer

The activation layer is pivotal in neural networks for introducing non-linearity, enhancing model complexity and expressive power. It follows each neuron’s output in a neural network layer, applying an activation function like ReLU, Sigmoid, or Tanh. This non-linear transformation, illustrated in Fig. 2, enables the network to grasp intricate patterns and relationships inherent in the data, facilitating more nuanced and accurate learning. Activation layers enable the model to capture complex features and perform sophisticated tasks such as image recognition or natural language understanding, ensuring the network’s capacity for more nuanced and powerful representations.
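The layer stack described above could be assembled in Keras as in the following sketch; the filter counts, kernel sizes, and dense width are assumptions, not the authors’ exact configuration.

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu",   # convolutional layer
                  input_shape=(256, 256, 3)),
    layers.MaxPooling2D((2, 2)),                   # max-pooling layer
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                              # flatten layer
    layers.Dense(128, activation="relu"),          # dense layer + activation
    layers.Dense(3, activation="softmax"),         # benign / malignant / normal
])
```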

Fig. 2
figure 2

CNN model Architecture.

Differential augmentation

Differential Augmentation (DA) applies hue, saturation, brightness, and contrast transformations that enhance model performance by creating diverse yet realistic variations in the training data. Changing the hue shifts colors along the spectrum, simulating different lighting conditions and training the model under different color tones, though excessive shifts may distort important color-based features. Saturation adjustment increases or decreases color intensity, improving the model’s ability to distinguish colors in vivid or muted conditions, but overdoing it may produce unrealistic results. Brightness adjustment strongly helps the model identify features under low-light conditions and stabilizes performance across exposure levels. Contrast changes mimic different exposure levels, helping the model adapt to high- or low-contrast environments; however, extreme corrections, whether very bright or very dark, can obscure fine details and diminish training efficacy. In aggregate, these methods enrich the training data, provide a more general view of the data distribution, and reduce overfitting, improving learning across tasks when augmentations are applied carefully enough to preserve the realism of the data.

Adjust hue

In image processing29, hue adjustment is a vital technique that alters pixel color representation by shifting values along the color spectrum for nuanced visual changes. This technique is valuable for tasks like color correction, image enhancement, and artistic transformations, providing flexibility in altering the overall color appearance without affecting other image characteristics.

Adjust saturation

Saturation adjustment is a key image processing operation that involves altering the intensity of colors. By manipulating the saturation levels, one can enhance or desaturate the colors in an image. This adjustment is valuable for fine-tuning the visual impact of images, contributing to artistic modifications and color correction processes.

Adjust brightness

Brightness adjustment alters the overall luminance level of an image. Modifying brightness can improve visibility, resolve exposure issues, or create artistic effects, inherently enhancing visual quality and image clarity across different applications.

Adjust contrast

Contrast adjustment is a vital image processing operation that involves modifying the difference in brightness between the light and dark areas of an image. By enhancing or reducing contrast, one can bring out details, improve visibility, or achieve artistic effects. This adjustment contributes to optimizing image quality and visual impact.

Algorithm: CNN with Differential Augmentation (DA) for Lung Cancer Detection

Data Collection and Loading.

Select dataset folder D containing labelled images.

Data Preprocessing.

For each image Ii​ in D:

  • Normalize: \(I_i^{norm} = I_i / 255.0\).

  • Encode label: li​= Encode(li​) where li​ is the label of Ii​.

Split D into training set T and validation set V with a ratio (e.g., 80:20).

Model Architecture.

Construct a CNN model M with layers:

  • Convolution (f, k) where f is the number of filters and k is the kernel size.

  • Max-pooling: P(k) where k is the pooling size.

  • Flatten: F().

  • Dense: D(n), where n is the number of neurons.

  • Activation: Use ReLU for hidden layers and Softmax for output.

Loss function L: \(L(\hat{y}, y) = -\sum_{i} y_{i} \log(\hat{y}_{i})\), where y is the actual label and \(\hat{y}\) is the predicted label.

Differential Augmentation.

For the image I:

  • Adjust hue: I′=Hue(I, h) where h is a random hue factor.

  • Adjust saturation: I′=Saturation(I, smin​, smax​) where smin​ and smax​ are random saturation factors.

  • Adjust brightness: I′=Brightness(I, b) where b is a random brightness factor.

  • Adjust contrast: I′=Contrast(I, cmin​, cmax​) where cmin​ and cmax​ are random contrast factors.

Training.

  • Train M using T with DA applied. For each epoch:

  • For each batch B in T.

  • Apply DA to B to get B′.

  • Update M using B′ to minimize L.

Evaluation.

Evaluate M using V to compute accuracy and loss.

Definitions

  • D: Entire dataset.

  • Ii​: Image at index i.

  • li​: Label of image Ii​.

  • T: Training dataset.

  • V: Validation dataset.

  • M: CNN model.

  • L: Loss function.

  • \(\hat{y}\): Predicted label by the model.

  • y: True label.
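Putting the algorithm together, a minimal custom training step might look like the sketch below, applying DA to each batch B to obtain B′ before the gradient update; it reuses the differential_augment function sketched earlier, and the optimizer and loss choices are assumptions consistent with the stated hyperparameters.

```python
import tensorflow as tf

loss_fn = tf.keras.losses.CategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)  # stated learning rate

@tf.function
def train_step(model, images, labels):
    images = differential_augment(images)    # B -> B': apply DA to the batch
    with tf.GradientTape() as tape:
        preds = model(images, training=True)
        loss = loss_fn(labels, preds)        # cross-entropy loss L
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```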

Training

Differential Augmentation is a technique that applies varied transformations to individual instances within image batches during training. The primary reasons for its adoption are:

Diversity in Training Data: DA introduces variations in training samples, thereby enriching the dataset. This ensures that models are exposed to various data perspectives, improving generalization.

\(I\prime =DA\left( I \right),\)

Where I is the original image and I′ is the augmented image.

Differential Augmentation applied the following transformations: hue adjustments within ± 10 degrees, saturation variations between 0.8 and 1.2, brightness scaling between 0.9 and 1.1, and contrast modifications from 0.85 to 1.15. Each augmentation was randomly applied during training, ensuring diverse representations of input images.

Mitigation of Overfitting: By introducing randomness and variability in the training process, DA minimizes the risk of models memorizing exact training data, a phenomenon known as memory overfitting.

Effective Utilization of Limited Data: Especially in domains like medical imaging, where data can be scarce, DA artificially expands the dataset, providing more training samples without collecting new data.

DA’s impact on computational time

While DA introduces variability in the training data, it also adds a computational overhead due to the augmentation process. However, this overhead is often offset by the reduction in training epochs required to achieve convergence, thanks to the more prosperous and more diverse dataset.

Let’s consider the computational time without and with DA:

  • Without DA: \(T_{no\text{-}DA} = E \times T_{epoch}\).

  • With DA: \(T_{DA} = (E - \Delta E) \times (T_{epoch} + T_{aug})\).

Where E is the number of epochs, \(T_{epoch}\) is the time taken per epoch, \(\Delta E\) is the reduction in epochs due to DA, and \(T_{aug}\) is the additional time per epoch introduced by DA.

In many scenarios \(T_{DA} < T_{no\text{-}DA}\), making DA computationally efficient in the long run.
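For illustration, with hypothetical values E = 50 epochs, \(T_{epoch}\) = 60 s, \(\Delta E\) = 10, and \(T_{aug}\) = 5 s, training without DA takes 50 × 60 = 3000 s, whereas training with DA takes (50 − 10) × (60 + 5) = 2600 s; the per-epoch augmentation overhead is more than recovered by the earlier convergence.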

Neural network functioning with DA

When neural networks, particularly CNNs, are trained with DA, they learn to recognize features invariant to the augmentations applied. This means they become adept at identifying pertinent features regardless of changes in hue, brightness, contrast, or other boosts introduced by DA.

Mathematically, if a feature extractor in a neural network is denoted by F, then:

\(F\left(I\right) \approx F\left(DA\left(I\right)\right)\)

This equation signifies that the essential features extracted from the original and augmented images should be comparable.

Evaluating neural networks with DA

Evaluating the performance of neural networks trained with DA usually involves testing the model on non-augmented data (or sometimes mildly augmented data) to ensure real-world applicability. Performance metrics such as accuracy, precision, recall, and F1-score collectively offer a holistic assessment of the model’s effectiveness and capabilities.

\(\text{Accuracy} = \text{Number of Correct Predictions}/\text{Total Predictions}\)

In essence, DA acts as a regularizer, enhancing neural networks’ robustness and generalization capabilities and making them more suitable for diverse real-world applications.

The lung cancer detection system, designed to revolutionize the diagnostic process, prominently features an integrated Convolutional Neural Network (CNN) augmented with Differential Augmentation. When a new set of lung scans enters the system, they undergo a series of pre-processing steps. These steps ensure image normalization for consistent intensity values, resizing to a uniform dimension fit for the CNN, and encoding labels into numerical values for supervised training. Once pre-processed, the data is directed into the system’s heart: the CNN with DA model. This model’s sophisticated architecture comprises convolutional layers designed for hierarchical feature extraction, pooling layers for dimensionality reduction while retaining pivotal information, and dense layers fine-tuned for classification.

DA’s role, crucial in the training phase, is to apply random yet meaningful transformations to images in each training epoch. This dynamic augmentation guarantees that the CNN consistently encounters varied representations of data, preventing over-reliance on specific training samples and promoting genuine feature recognition. Following intensive training, the model undergoes meticulous evaluation with an independent test dataset. Key metrics, including sensitivity, specificity, accuracy and F1-score, assess its diagnostic efficacy.

Once its efficiency and reliability are ascertained, the model is integrated into the system’s operational environment. Here, healthcare professionals interact with a user-centric interface, uploading lung scans, initiating diagnostic assessments, and receiving predictions complemented by confidence scores. The system is designed for continuous learning to maintain the model’s relevance and adaptability. As new medical findings emerge and additional data becomes available, the model undergoes periodic retraining, ensuring its alignment with the latest lung cancer diagnostics. By melding the powers of CNN and DA, this system stands as a beacon of hope for timely, accurate, and efficient lung cancer detection, aiming for better patient outcomes and streamlined healthcare processes.

XAI using LIME approach

This research introduces a method designed to boost the reliability of lung cancer image classification through Explainable AI (XAI) techniques. The analysis uses Image LIME30, a LIME variant designed for image data, to reveal the features that drive model decisions. The main goal is to improve the transparency and reliability of model outputs while examining which features lead to accurate lung cancer image classification.

LIME (Local Interpretable Model-agnostic Explanations)31 produces understandable explanations for the output predictions of any classification system. LIME fits an interpretable surrogate model that mimics the behaviour of the underlying machine learning model within a local region around a prediction32. By studying how the prediction changes as the input of a specific test example is perturbed, LIME identifies the key input features33. This allows LIME to generate explanations that are meaningful for individual predictions, even when only a few variables are locally significant compared to the global context.

In the second stage of our methodology, LIME is employed to visualize and interpret the model’s decision-making process at a local level. This is done by highlighting the key regions or features within an image that significantly contribute to the model’s predictions. Our experiment applied LIME to reveal the priority features the model uses when interpreting a specific test image. Subsequently, we used LIME-generated feature heatmaps to mask the images, revealing only the most critical 6, 8, 10, and 12 features. This process lets users follow the model’s decision-making steps, making the classification procedure more trustworthy and interpretable.
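A hedged sketch of this step with the lime library is shown below; test_image and predict_fn are assumed stand-ins for a test CT image and the trained model’s probability function, and num_features would be varied over 6, 8, 10, and 12 as described.

```python
from lime import lime_image
from skimage.segmentation import mark_boundaries

explainer = lime_image.LimeImageExplainer()
explanation = explainer.explain_instance(
    test_image,           # H x W x 3 array (assumed test CT image)
    predict_fn,           # e.g. model.predict, mapping images to class probabilities
    top_labels=1,
    num_samples=1000,     # perturbed samples generated around the instance
)
temp, mask = explanation.get_image_and_mask(
    explanation.top_labels[0],
    positive_only=True,
    num_features=10,      # keep only the 10 most influential regions
    hide_rest=True,       # mask everything else out
)
overlay = mark_boundaries(temp / 255.0, mask)  # visualize the retained features
```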

The final stage provides a thorough comparison of models, incorporating both the successful predictions from the initial phase and the LIME-based feature extraction from the second phase. Feature relevance in lung cancer images is evaluated using the Intersection over Union (IoU) similarity metric. LIME visualizations uncover which distinct features each model depends on when making its predictions, and extraction efficiency is analysed by monitoring how well model-selected features match the genuine image features present in the data. The IoU score quantitatively compares the binary masked image generated by the model with the ground truth image, measuring the degree of overlap between the selected features. This evaluation measures how well the model selects the features that lead to correct lung cancer detection.

The IoU metric assesses the similarity of two sets by dividing the area of intersection between the ground truth image (GT) and the masked binary image (MB) by the area of their union. The metric ranges from 0 for no overlap to 1 for perfect overlap. The formula for the IoU value is shown in Eq. (17):

$$IoU\left(G_{T},M_{B}\right)=\frac{\sum_{j=1}^{N}\sum_{i=1}^{M}\left(G_{T}\left(i,j\right)\cap M_{B}\left(i,j\right)\right)}{\sum_{j=1}^{N}\sum_{i=1}^{M}\left(G_{T}\left(i,j\right)\cup M_{B}\left(i,j\right)\right)}$$
(17)

Here, \(G_{T}(i, j)\) denotes the ground truth image and \(M_{B}(i, j)\) the masked binary image. The equation offers a quantitative way to determine the degree of image overlap, enabling assessment of the model’s feature extraction.
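Eq. (17) reduces to a simple ratio of binary-mask areas; a minimal NumPy sketch is shown below (the empty-union convention is an assumption).

```python
import numpy as np

def iou(gt_mask, pred_mask):
    """Eq. (17): IoU of a ground truth mask and a LIME-derived binary mask."""
    gt = gt_mask.astype(bool)
    mb = pred_mask.astype(bool)
    union = np.logical_or(gt, mb).sum()
    if union == 0:
        return 1.0  # both masks empty: treated as perfect overlap (assumption)
    return np.logical_and(gt, mb).sum() / union
```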

Result and discussion

This paper delves into designing and implementing a state-of-the-art lung cancer prediction system using Python and the deep learning frameworks TensorFlow and Keras. TensorFlow, an open-source machine learning library, is renowned for its flexibility and scalability, making it apt for complex tasks such as medical image analysis. Keras, a high-level neural networks API, provides an intuitive interface for building and training deep learning models, streamlining the process without compromising capability. This research intertwines these technologies to create a robust lung cancer prediction system; the goal of combining Convolutional Neural Networks with innovative data augmentation techniques is high accuracy in detecting and classifying lung cancer from medical images. The paper traces the development journey from data preprocessing and model architecture design to training, evaluation, and deployment. Additionally, it underscores the significance of integrating AI into healthcare, elucidating how Python, TensorFlow, and Keras stand as the pillars supporting this transformative endeavour.

Dataset

For this study, the lung cancer dataset from the Iraq-Oncology Teaching Hospital/National Center for Cancer Diseases (IQ-OTH/NCCD), publicly available on Kaggle34, was utilized. Published in 2019, this dataset includes CT scans from 110 individuals, covering benign, malignant, and normal cases. The participants vary in gender, age, and level of education. Of the 110 patients, 40 had malignant lung tumours, 15 had benign (non-cancerous) tumours, and 55 had normal lungs. Each patient contributed roughly 80–200 CT slices of 1 mm thickness.

The dataset comprises 1,097 CT images organized into three distinct groups: benign, malignant, and normal. The benign class contains 120 images, the malignant class 561 images, and the normal class 416 images. Lung tumours are divided into benign (non-cancerous) and malignant (cancerous) types, with further differentiation between tumour and mass-forming lesions. The data were split 80% for training and 20% for testing.

The discussion surrounding dataset limitations, particularly the relatively small size of the IQ-OTH/NCCD dataset and potential biases in class distribution, is crucial for understanding the implications of the results obtained from the CNN + DA model. The IQ-OTH/NCCD dataset, while providing high-quality CT scans from a diverse cohort of lung cancer patients, consists of only 1,097 images. This limited sample size may restrict the model’s ability to capture the full spectrum of tumor variations and complexities present in a broader population. Consequently, the model’s performance may be influenced by the specific characteristics of the training data, potentially leading to overfitting on the limited examples it has encountered.

Moreover, the class distribution within the dataset presents another layer of complexity. The dataset includes 561 images of malignant tumors, 120 images of benign tumors, and 416 images of normal lungs, resulting in an imbalanced representation of classes. Such imbalances can skew the model’s learning process, as it may become biased towards the more prevalent class (malignant tumors) while underrepresenting the nuances of less frequent classes (benign tumors and normal lungs). This bias can lead to a higher rate of false negatives for benign cases, which is particularly concerning in clinical settings where accurate differentiation between benign and malignant conditions is critical for patient management.

To mitigate these challenges, further studies should consider augmenting the dataset with additional images from diverse sources to enhance its size and balance. Synthetic data generation and transfer learning from larger, more diverse sources would improve both the robustness and the generalization performance of the model. Furthermore, thorough analyses of class-specific performance metrics will provide deeper insights into how well the model performs across categories, giving researchers a better understanding of its diagnostic capability. Addressing these dataset limitations will make the findings more applicable to clinical scenarios and enhance the reliability of the CNN + DA model for lung cancer diagnosis.

This work also used the LC25000 dataset35, which consists of histopathological images of lung and colon cancer collected from the Kaggle public platform, to assess the proposed enhancements. The dataset, assembled by Andrew Borkowski and his team at James Hospital in Tampa, Florida, includes 25,000 images evenly split among five classes, three of lung tissue and two of colon tissue, with 5,000 images per class to maintain a uniform data distribution. The five classes are lung_aca (lung adenocarcinoma), lung_bnt (benign lung tissue), lung_scc (lung squamous cell carcinoma), colon_aca (colon adenocarcinoma), and benign colonic tissue. More than 95% of all cases of colon cancer are adenocarcinomas, which typically arise from undetected large-intestine polyps.

Lung adenocarcinoma is a type of lung cancer that involves glandular cells, occurs in the lungs and alveoli, and makes up more than 40% of all lung cancer cases. Lung squamous cell carcinoma accounts for more than 30% of lung cancer cases and develops in the bronchi. The last two categories are benign tissues of the colon and lung, which are non-cancerous and do not metastasize; however, their nature should be confirmed through biopsy, and they may have to be removed.

This work also drew on the “A Large-Scale CT and PET/CT Dataset for Lung Cancer Diagnosis (Lung-PET-CT-Dx)” from The Cancer Imaging Archive (TCIA), an open-access project developed to facilitate cancer research by sharing open-access cancer imaging data36. The Lung-PET-CT-Dx dataset contains 251,135 PET-CT images of lung cancer patients, 106,676 preoperative PET-CT images, and 667,246 preoperative diagnostic CT images. The data were supplied by Harbin Medical University’s Second Affiliated Hospital in Harbin, Heilongjiang Province, China, and are based on patients with lung cancer diagnosed by biopsy as one of four main histopathological types. Radiology annotations indicating the current tumor location are provided for every CT/PET-CT image. Tumor lesions were manually outlined with rectangular bounding boxes using the LabelImg tool; the annotation process involved five academic thoracic radiologists, with one drawing each bounding box and the other four confirming it. In this assessment, only CT images with a resolution of 1 mm were analysed, rejecting all other resolutions for methodological consistency.

The NLST37 was a randomized, multicenter trial performed by the Lung Screening Study (LSS) in collaboration with the American College of Radiology Imaging Network (ACRIN) to compare low-dose spiral CT screening against chest radiography with respect to lung-cancer-specific mortality in a high-risk population. The primary objective was to evaluate whether low-dose helical CT screening could decrease lung cancer mortality among high-risk persons compared to chest radiography. The study recruited more than 53,000 high-risk individuals aged 55 to 74 with a substantial smoking history between 2002 and 2004. Slightly more than half of these participants were scanned using low-dose CT and the rest using chest X-rays. Participants received three annual screening examinations through 2007, with annual surveys through 2009.

The low-dose CT arm screened 26,732 participants, and lung cancer was confirmed in 1,083 patients (~ 4%) during follow-up. At each of the three screening time points (T0, T1, T2), a significant proportion of participants screened positive for potentially malignant findings, characterized by non-calcified nodules or masses ≥ 4 mm in diameter or other suspicious radiographic findings. Findings were positive in 27% of examinations at T0, 26% at T1, and 16% at T2. Cancer confirmation among the screen-positive patients during screening or follow-up was 9% at T0, 6% at T1, and 8% at T2, respectively.

Table 3 provides a comparison of datasets, outlining their strengths and limitations. It highlights key features and suitability for specific applications, offering insights into their optimal use in various scenarios.

Table 3 Comparison of datasets: strengths and limitations.
Fig. 3
figure 3

Sample Image and the image with DA.

Figure 3 illustrates a sample image alongside its augmented version using DA techniques. This comparison highlights the transformations applied to enhance data diversity and model robustness.

Figure 4 shows the training and validation loss curves after training the model with lung cancer images. It highlights the loss reduction over epochs, indicating improved performance and generalization.

Training and evaluating the CNN model with DA

Fig. 4 Training and validation loss curves after training the model with lung cancer images.

Evaluation of the CNN with differential augmentation (DA) algorithm

The presented graph comprehensively evaluates the CNN model augmented with Differential Augmentation (DA) across multiple training epochs. The graph juxtaposes the training and validation performance, giving insights into the model’s learning dynamics and ability to generalize.

Training vs. Validation Curves: The graph showcases two sets of curves: one representing the model’s performance on the training data and the other on the validation data. This distinction allows us to gauge the model’s progression in terms of learning and its capacity to perform on unseen data.

Accuracy Trends: A noticeable observation is the trajectory of the accuracy curves for both training and validation datasets. As the epochs progress, the accuracy tends to increase, indicating the model’s improving capability to classify lung cancer images correctly. Any significant divergence between these curves might hint at overfitting or underfitting, crucial aspects to monitor in model evaluation.

Loss Trends: Complementing the accuracy, the loss curves depict how the model’s predictions deviate from the actual labels across epochs. A declining trend in the loss signifies that the model is refining its predictions, aligning them closely with the true labels. The proximity of the training and validation loss curves is an indicator of the model’s consistent performance across both datasets.

Role of DA: Integrating Differential Augmentation (DA) likely contributes to the observed trends. By continuously introducing varied representations of the training images, DA ensures the model does not over-rely on specific patterns, leading to enhanced generalization. This is evident from the synchronized progression of training and validation curves, indicating a reduced propensity for overfitting.

Concluding Remarks: The graph is a testament to the efficacy of integrating DA with CNN for lung cancer image classification. The consistent trends across epochs and the close alignment of training and validation metrics indicate a well-trained, robust model poised for real-world applications.
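For readers reproducing these plots, the sketch below shows one way to draw the training and validation accuracy/loss curves from a Keras History object. The `history` variable and the metric names are assumptions based on a standard `model.fit()` call, not the authors’ released code.

```python
# A minimal sketch of how the curves in Fig. 4 can be produced,
# assuming `history` comes from a Keras model.fit() call.
import matplotlib.pyplot as plt

def plot_history(history):
    """Plot training vs. validation accuracy and loss across epochs."""
    fig, (ax_acc, ax_loss) = plt.subplots(1, 2, figsize=(10, 4))

    # Accuracy trends: both curves rising together suggests healthy learning.
    ax_acc.plot(history.history["accuracy"], label="train accuracy")
    ax_acc.plot(history.history["val_accuracy"], label="validation accuracy")
    ax_acc.set_xlabel("epoch"); ax_acc.set_ylabel("accuracy"); ax_acc.legend()

    # Loss trends: a widening gap between the curves would hint at overfitting.
    ax_loss.plot(history.history["loss"], label="train loss")
    ax_loss.plot(history.history["val_loss"], label="validation loss")
    ax_loss.set_xlabel("epoch"); ax_loss.set_ylabel("loss"); ax_loss.legend()

    plt.tight_layout()
    plt.show()
```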

The CNN + DA model’s high accuracy, sensitivity, and specificity suggest strong potential for clinical applications. However, its utility in real-world settings requires further validation using diverse datasets and testing in clinical workflows. Additionally, robustness against imaging artifacts and varying scanner types must be ensured.

To predict diseases accurately from the input shown in Fig. 5, the model analyzes patterns in the image through the features it learned during training. It processes these features to identify potential disease outcomes, enhancing diagnostic precision.

Lung cancer prediction using CNN with DA model

Fig. 5 Disease prediction by the model once an input image is provided.

Model Forward Pass: The pre-processed image is passed through the CNN. As it traverses through the convolutional layers, essential features indicative of lung conditions are extracted. These hierarchical features, ranging from basic edges to complex patterns, are then processed by the subsequent layers.

Prediction Generation: The final layers of the CNN, typically dense or fully connected layers, take the extracted features and generate a prediction. This usually takes the form of class probabilities when the task is to categorize the lung condition; the class with the highest probability becomes the model’s prediction for the input image.

Post-processing: Post-processing might be applied once the raw prediction is obtained. This could involve converting numerical labels to their categorical counterparts (e.g., ‘Adenocarcinoma’, ‘Normal’) or applying a threshold to determine the final diagnosis.

Output Display: The prediction and any associated confidence scores or probabilities are then presented to the user. In a healthcare setting, this information assists medical professionals in making informed decisions about the patient’s condition and subsequent treatment.
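The four steps above can be condensed into a short inference routine. The sketch below is a hypothetical illustration: the `model`, the class names, and the preprocessing are assumptions, not the paper’s released code.

```python
# A minimal sketch of the inference steps described above.
import numpy as np

CLASS_NAMES = ["Benign", "Malignant", "Normal"]  # hypothetical label order

def predict_case(model, image):
    """Run one pre-processed CT image through the CNN and post-process
    the raw output into a categorical label plus a confidence score."""
    x = np.expand_dims(image, axis=0)           # add a batch dimension
    probs = model.predict(x, verbose=0)[0]      # forward pass -> class probabilities
    idx = int(np.argmax(probs))                 # class with the highest probability
    return CLASS_NAMES[idx], float(probs[idx])  # label and its confidence

# Example usage: label, confidence = predict_case(model, preprocessed_image)
```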

Hyperparameter tuning using random search

Hyperparameter Tuning using Random Search38 involves randomly sampling hyperparameter combinations from predefined ranges, training the model with each combination, and evaluating its performance. This method is efficient, as it does not exhaustively test all possibilities. By focusing on random subsets, it identifies optimal configurations, such as learning rate, batch size, and dropout, to enhance model accuracy and generalization.

The initial training parameters for the models were set as follows: a learning rate of 0.001, a batch size of 32, and a dropout rate of 0.3. After applying Random Search for hyperparameter tuning, these values were optimized to improve the performance of each model. The learning rate was fine-tuned to values between 0.0001 and 0.0005, with lower learning rates proving more effective for ResNet and EfficientNetB0. The batch size of 32 remained optimal for all four models. The dropout rate was optimized to 0.2 for the hybrid CNN model, while ResNet required a slightly higher rate of 0.4 to prevent overfitting, and DenseNet and EfficientNetB0 retained a balanced dropout of 0.3. These tuned parameters significantly enhanced the accuracy and F1-score of each model.
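As a concrete illustration of the procedure, the sketch below samples random configurations over the three hyperparameters discussed. The search space, trial count, and `build_and_train` helper are hypothetical stand-ins for the actual training routine.

```python
# A minimal, hypothetical sketch of Random Search over learning rate,
# batch size, and dropout; build_and_train() stands in for the real
# training routine and is assumed to return validation accuracy.
import random

SEARCH_SPACE = {
    "learning_rate": [1e-4, 2e-4, 5e-4, 1e-3],
    "batch_size":    [16, 32, 64],
    "dropout":       [0.2, 0.3, 0.4],
}

def random_search(build_and_train, n_trials=20, seed=0):
    """Sample random configurations and keep the one with the best
    validation accuracy; far cheaper than an exhaustive grid search."""
    rng = random.Random(seed)
    best_cfg, best_acc = None, -1.0
    for _ in range(n_trials):
        cfg = {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}
        val_acc = build_and_train(**cfg)  # train one model, score on validation set
        if val_acc > best_acc:
            best_cfg, best_acc = cfg, val_acc
    return best_cfg, best_acc
```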

Table 4 outlines the training parameters used for the models, providing details such as learning rate, batch size, and dropout rate that are essential for reproducing the training process.

Table 4 Training parameters.

Comparison of performance metrics for CNN with DA (hybrid novel model), DenseNet, ResNet, and EfficientNetB0

The results across different datasets demonstrate that the CNN + DA (Novel) model consistently outperforms other models, particularly when evaluated on the primary IQ-OTH/NCCD dataset (Table 5). This dataset highlights the strength of CNN + DA, achieving the highest metrics in Precision (97.57%), Recall (97.67%), F1-Score (98.78%), and Accuracy (98.78%) with relatively low standard deviations. These results indicate the model’s superior ability to predict lung cancer accurately while maintaining stability and reliability. Compared to other models like DenseNet, ResNet, and EfficientNetB0, CNN + DA demonstrates clear dominance, particularly in this primary dataset.

To evaluate the accuracy of the four models (CNN + DA (Novel), DenseNet, ResNet, and EfficientNetB0), a post-hoc analysis was conducted using Tukey’s HSD test, shown in Table 6. The analysis revealed significant differences in accuracy between most models. The CNN + DA model demonstrated statistically significant superiority over all other models, with mean accuracy differences of − 4.61, − 5.58, and − 5.97 when compared to DenseNet, ResNet, and EfficientNetB0, respectively (p < 0.0001 for all comparisons). DenseNet also showed significantly higher accuracy than EfficientNetB0 and ResNet, with mean differences of − 1.36 and − 0.97, respectively (p < 0.001). However, no significant difference was observed between EfficientNetB0 and ResNet (p = 0.3655), indicating similar performance between these models. These results position CNN + DA as the most accurate model, statistically outperforming both advanced architectures like DenseNet and baseline models like ResNet and EfficientNetB0; the statistical validation underscores the robustness of the novel architecture.
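For reference, a Tukey HSD comparison of this kind can be run with `statsmodels`. In the sketch below, the per-run accuracies are illustrative placeholders, not the values behind Table 6.

```python
# A hedged sketch of a Tukey HSD comparison across the four models.
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical accuracies from repeated runs of each model (e.g., CV folds).
accuracies = np.array([98.7, 98.8, 98.9,   # CNN + DA
                       94.1, 94.2, 94.3,   # DenseNet
                       93.1, 93.2, 93.3,   # ResNet
                       92.7, 92.8, 92.9])  # EfficientNetB0
models = (["CNN+DA"] * 3 + ["DenseNet"] * 3 +
          ["ResNet"] * 3 + ["EfficientNetB0"] * 3)

# Pairwise comparisons with family-wise error controlled at alpha = 0.05.
result = pairwise_tukeyhsd(endog=accuracies, groups=models, alpha=0.05)
print(result.summary())
```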

When tested on the LC25000 dataset (Table 7), CNN + DA also delivers commendable results, achieving the highest Precision (93.02%) and ROC (92.69%). However, DenseNet marginally surpasses CNN + DA in Recall (93.12%), F1-Score (93.16%), and Accuracy (93.22%). Despite this, the lower standard deviations in CNN + DA’s performance metrics suggest greater consistency, making it a competitive option for this dataset. Based on the statistical results, CNN + DA demonstrates the strongest overall performance across the evaluated metrics (precision, recall, F1-score, accuracy, and ROC) when compared to DenseNet, ResNet, and EfficientNetB0. The mean differences in accuracy between CNN + DA and the other models are statistically significant, with p-values well below the typical significance threshold (0.05), indicating strong evidence against the null hypothesis of no difference. Specifically, CNN + DA significantly outperforms DenseNet (mean difference of 1.80), ResNet (mean difference of 3.03), and EfficientNetB0 (mean difference of 3.88), all with low p-values, as shown in Table 8. In contrast, DenseNet and ResNet show no significant difference in accuracy, suggesting more similar performance. These results highlight CNN + DA’s superior ability to generalize across the dataset, likely due to the combination of a powerful CNN architecture and data augmentation, which enhances learning and robustness.

Similarly, the CNN + DA model excels on the Large-Scale CT and PET/CT dataset (Table 9), achieving the highest scores across all metrics, including F1-Score (98.78%) and Accuracy (98.78%). The robustness of the model is again evident from its smaller standard deviations, further solidifying its position as the best-performing and most reliable model for this dataset.

The Tukey’s HSD post-hoc test for model accuracy comparisons on the Large-Scale CT and PET/CT dataset (Table 10) reveals that CNN + DA significantly outperforms DenseNet, ResNet, and EfficientNetB0, with mean accuracy differences of 4.68, 5.78, and 6.14, respectively, all with adjusted p-values well below 0.05, indicating strong statistical significance. In contrast, no significant differences were found between DenseNet, ResNet, and EfficientNetB0, with p-values greater than 0.05, suggesting similar performance levels among these models. These results highlight CNN + DA as the most effective model for this dataset, while the other models show comparable performance but are significantly less accurate than CNN + DA.

On the National Lung Screening Trial (NLST) dataset (Table 11), the performance of CNN + DA remains strong, with the highest Precision (94.23%) and Recall (95.22%). However, ResNet slightly surpasses CNN + DA in terms of F1-Score (94.63%) and Accuracy (96.23%). This suggests that while CNN + DA generally performs well across datasets, there are scenarios where other models, like ResNet, may provide slightly better outcomes for specific metrics.

The results from the Tukey’s HSD post-hoc test and performance metrics on the National Lung Screening Trial (NLST) dataset reveal that CNN + DA significantly outperforms DenseNet and ResNet, with mean accuracy differences of 2.24 and 3.27, respectively, both with p-values below 0.05, confirming rejection of the null hypothesis (Table 12). However, no significant difference in accuracy was found between CNN + DA and EfficientNetB0 (mean difference of 0.13, p = 0.85), indicating comparable performance. Additionally, DenseNet and ResNet show no significant difference from each other (mean difference of 1.02, p = 0.18), while their comparisons with EfficientNetB0 yield significant accuracy differences (mean differences of − 2.11 and − 3.13, respectively, p < 0.05). These findings highlight CNN + DA as the most effective model, followed by DenseNet and ResNet, with EfficientNetB0 being the least effective in terms of accuracy.

The IQ-OTH/NCCD dataset was selected for this research due to its unique strengths in providing high-quality CT scans of lung cancer patients across benign, malignant, and normal categories. This makes it well-suited for developing models that generalize effectively across diverse patient populations. Unlike other datasets such as LC25000, which focuses on histopathological images, or Lung-PET-CT-Dx, which emphasizes lung cancer subtypes, the IQ-OTH/NCCD dataset offers a real-world representation of lung conditions. Its clear annotations and balanced class distribution further enhance its suitability for lung cancer diagnosis.

Despite its relatively small size of 1,097 images, the dataset compensates with high-quality imaging (1 mm CT slice thickness) and diverse demographic representation, allowing for detailed tumor detection and improved generalizability. Additionally, the superior performance of the CNN + DA model on this dataset underscores its appropriateness for evaluating lung cancer prediction models. These factors collectively establish IQ-OTH/NCCD as the optimal choice for this research.

Table 5 compares the performance of various models, including CNN + DA (Novel), DenseNet, ResNet, and EfficientNetB0, using the IQ-OTH/NCCD dataset. The table highlights key performance metrics along with supporting statistical measures, showcasing the effectiveness of each model.

Table 5 Comparison of models with performance metrics and supporting statistical measures using the IQ-OTH/NCCD dataset.
Table 6 Tukey’s HSD post-hoc test for model accuracy comparisons using the IQ-OTH/NCCD dataset.
Table 7 Comparison of models with performance metrics and supporting statistical measures using the LC25000 dataset.
Table 8 Tukey’s HSD post-hoc test for model accuracy comparisons using the LC25000 dataset.
Table 9 Comparison of models with performance metrics and supporting statistical measures using the Large-Scale CT and PET/CT dataset.
Table 10 Tukey’s HSD post-hoc test for model accuracy comparisons using the Large-Scale CT and PET/CT dataset.
Table 11 Comparison of models with performance metrics and supporting statistical measures using the National Lung Screening Trial (NLST) dataset.
Table 12 Tukey’s HSD post-hoc test for model accuracy comparisons using the National Lung Screening Trial (NLST) dataset.

Figure 6 illustrates the comparative performance metrics of CNN + DA, DenseNet, ResNet, and EfficientNetB0. The results highlight the superiority of the proposed model across key evaluation metrics.

Fig. 6 Comparative performance metrics for CNN + DA, DenseNet, ResNet, and EfficientNetB0, illustrating the superiority of the proposed model.

The Precision-Recall (PR) curves shown in Fig. 7 highlight the performance of the different models (CNN + DA, DenseNet, ResNet, and EfficientNetB0) across the datasets (IQ-OTH/NCCD, LC25000, CT/PET-CT, and NLST). The CNN + DA model consistently demonstrates superior precision and recall, reflected in its curves being positioned closest to the top-right corner for all datasets. DenseNet, ResNet, and EfficientNetB0 exhibit progressively lower performance, with EfficientNetB0 typically yielding the lowest precision-recall values. These trends reinforce the robustness and generalizability of the CNN + DA model across diverse datasets.
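Curves of this kind are commonly generated with scikit-learn. The following minimal sketch assumes ground-truth labels `y_true` and per-model probability scores for the malignant class; both are placeholders rather than the study’s data.

```python
# A minimal sketch of drawing per-model PR curves, as in Fig. 7.
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve

def plot_pr_curves(y_true, model_scores):
    """model_scores: dict mapping a model name to its predicted
    probabilities for the positive (malignant) class."""
    for name, scores in model_scores.items():
        precision, recall, _ = precision_recall_curve(y_true, scores)
        plt.plot(recall, precision, label=name)
    plt.xlabel("Recall")
    plt.ylabel("Precision")
    plt.legend()
    plt.show()
```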

Fig. 7 Precision-recall curves comparing CNN + DA, DenseNet, ResNet, and EfficientNetB0, showing the better balance between precision and recall achieved by the proposed model.

The evaluation of the proposed CNN model with Data Augmentation (CNN + DA) is conducted through multiple analytical techniques, including correlation heatmaps, residual analysis, homoscedasticity assessment, and Q-Q plots. These evaluations provide deep insights into the model’s predictive performance, error distributions, and generalization capability.

The model’s performance was evaluated using various metrics, including precision, recall, F1-score, accuracy, and ROC. The correlation heatmap (Fig. 8) reveals a high degree of correlation (close to 1) between all performance metrics, indicating consistent performance across different evaluation aspects. This strong positive correlation suggests that the model is effectively capturing the underlying patterns in the data and exhibiting balanced performance.

The histogram of residuals (Fig. 9) displays a near-normal distribution, suggesting that the model’s residuals are randomly distributed and do not exhibit any systematic patterns. This observation aligns with the normality assumption of residual analysis, providing further evidence of model adequacy.

The plot of residuals against fitted values (Fig. 10), commonly used to check homoscedasticity, shows a scattered distribution of points without any discernible trend or pattern. This indicates that the variance of the residuals is consistent across fitted values, supporting the homoscedasticity assumption.

Furthermore, the Q-Q plot of accuracy (Fig. 11) shows a linear relationship between the ordered values and the theoretical quantiles, indicating that the distribution of the accuracy values is close to normal. This finding suggests that the model’s performance is consistent and does not deviate significantly from normality.

Overall, the analysis of these plots and metrics indicates that the model performs well, exhibiting high performance scores, normally distributed residuals, and consistent behavior across different evaluation aspects. These results suggest that the model successfully uncovers the underlying patterns in the data and generates accurate predictions.
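The three diagnostics described above (residual histogram, residuals vs. fitted values, and Q-Q plot) can be produced with a few lines of matplotlib and SciPy. In this hedged sketch, `residuals` and `fitted` are assumed inputs from the underlying regression, not the study’s stored values.

```python
# A minimal sketch of the residual diagnostics shown in Figs. 9-11.
import matplotlib.pyplot as plt
import scipy.stats as stats

def residual_diagnostics(residuals, fitted):
    fig, axes = plt.subplots(1, 3, figsize=(12, 4))

    # Histogram: a roughly bell-shaped spread supports the normality assumption.
    axes[0].hist(residuals, bins=20)
    axes[0].set_title("Residual histogram")

    # Residuals vs. fitted: no visible trend supports homoscedasticity.
    axes[1].scatter(fitted, residuals, s=10)
    axes[1].axhline(0, linestyle="--")
    axes[1].set_title("Residuals vs. fitted")

    # Q-Q plot: points along the reference line indicate near-normal residuals.
    stats.probplot(residuals, dist="norm", plot=axes[2])
    axes[2].set_title("Q-Q plot")

    plt.tight_layout()
    plt.show()
```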

Fig. 8 Feature correlation heatmap for attribute dependency analysis.

Fig. 9 Residual histogram for model adequacy assessment.

Fig. 10 Residuals vs. fitted values plot for homoscedasticity check.

Fig. 11 Q-Q plot of accuracy for normality assessment.

The proposed CNN with Differential Augmentation (CNN + DA) model demonstrates superior performance compared to established architectures such as DenseNet and EfficientNetB0, primarily due to its approach to mitigating memory overfitting and enhancing robustness against data variability. Memory overfitting, a common challenge in deep learning, occurs when models memorize training data rather than generalizing from it. The CNN + DA model addresses this issue by integrating Differential Augmentation (DA), which applies diverse transformations such as hue, brightness, saturation, and contrast adjustments during training. Exposure to a broader spectrum of data variations discourages memorization and promotes genuine learning, leading to improved generalization.

Furthermore, the dynamic nature of DA allows the CNN + DA model to adapt to real-world variations in imaging conditions, enhancing its robustness and performance across diverse patient populations and imaging protocols. The architecture is designed to leverage hierarchical feature extraction, enabling it to identify critical features indicative of lung cancer regardless of the variations introduced by DA.

Validation across multiple datasets, including the IQ-OTH/NCCD dataset, LC25000, and the Large-Scale CT and PET/CT dataset, underscores the model’s ability to generalize effectively, achieving an accuracy of 98.78%, significantly higher than that of DenseNet and EfficientNetB0. Statistical analyses, including Tukey’s HSD post-hoc tests, confirm the significance of this performance advantage, with mean accuracy differences that are statistically significant (p < 0.0001). These findings position the CNN + DA model as a valuable tool for clinical applications in lung cancer detection, with the potential for further exploration of its adaptability to other medical imaging tasks, thereby advancing deep learning methodologies for complex diagnostic challenges.
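To make the augmentation step concrete, the following is a minimal sketch, assuming a TensorFlow pipeline, of the photometric transforms named above. The transform ranges are illustrative assumptions rather than the paper’s tuned values.

```python
# A hedged sketch of the hue/brightness/saturation/contrast transforms,
# assuming 3-channel images with pixel values in [0, 1].
import tensorflow as tf

def differential_augment(image):
    """Apply randomized photometric transforms to one training image."""
    image = tf.image.random_hue(image, max_delta=0.05)
    image = tf.image.random_brightness(image, max_delta=0.1)
    image = tf.image.random_saturation(image, lower=0.9, upper=1.1)
    image = tf.image.random_contrast(image, lower=0.9, upper=1.1)
    return tf.clip_by_value(image, 0.0, 1.0)  # keep pixel values valid

# Example usage in a tf.data pipeline:
# dataset = dataset.map(lambda x, y: (differential_augment(x), y))
```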

Error analysis on four datasets

Misclassification rates characterize the possible errors and give a clear picture of the model’s weak points. Models with relatively high error rates in our experiments (e.g., 7.36% for EfficientNetB0 on the IQ-OTH/NCCD dataset) misclassify more often, especially on images that differ only slightly from one another; differentiating such similar appearances is difficult for the model, for example in early-stage disease or noisy data. Even the proposed CNN + DA, with the lowest error rate of 1.22%, still occasionally misclassifies subjects, particularly in individual-specific scenarios the model has not learned well enough. Error rates also vary across datasets: EfficientNetB0 achieves a comparatively low error rate of 3.77% on the NLST dataset but a high error rate of 8.86% on the LC25000 dataset, suggesting that the architecture struggles to generalize across different data distributions and image qualities. High error rates stem from problems such as overlapping classes, low-quality input data, and class imbalance, where minority classes are misclassified. Reducing these errors brings many advantages: more accurate diagnoses across varied conditions, better generalization to diverse situations and scenarios, and improved patient care through fewer mistakes. Targeted measures, such as improving input data quality, controlling class imbalance, and modifying model structures, can reduce errors to levels acceptable for practical applications.

Ethical and practical implications

The use of the CNN with Differential Augmentation (DA) model for lung cancer detection raises certain ethical and practical issues. Data privacy and security are important considerations, since patient data must be shared and encrypted appropriately. Mitigating bias through diverse datasets promotes fairness to patients, while explainable AI enhances transparency for clinicians. Technical considerations include verifying the proposed model’s actual performance on different real-world datasets and handling differences in imaging so that it is generally applicable. Seamless integration into clinical workflows with user-friendly interfaces is crucial for adoption. These steps are essential so that the CNN + DA model can be transformed into a reliable and accurate diagnostic tool that supports early diagnosis and better treatment, while avoiding ethical and practical pitfalls in its implementation.

Results for XAI using LIME

The evaluation of the Convolutional Neural Network (CNN) models using the Intersection over Union (IoU) metric is presented in two contexts: benign and malignant cases. Table 13 (benign case) and Table 14 (malignant case) compare the performance of four models: DenseNet, ResNet, EfficientNetB0, and CNN + DA (CNN with Data Augmentation). In Table 13, CNN + DA achieves the highest average IoU score of 0.41, indicating superior performance in identifying relevant features across different feature sets (6, 8, 10, and 12 features). EfficientNetB0 follows with an average IoU of 0.38, while DenseNet and ResNet score 0.37 and 0.33, respectively. Similarly, in Table 14 (malignant case), CNN + DA again leads with an average IoU of 0.38, demonstrating consistent feature identification. DenseNet and EfficientNetB0 follow with average IoU scores of 0.33 and 0.31, respectively, while ResNet trails with 0.29. Figures 12 and 13 visualize the corresponding LIME heatmaps and feature-based masking, complementing the tabular data. The results highlight CNN + DA as the most effective model in both benign and malignant cases, followed by EfficientNetB0 and DenseNet, owing to its ability to consistently align predictions with ground truth across varying feature sets.
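A LIME-plus-IoU evaluation of this kind might look like the sketch below. The `model`, the input image, and the ground-truth tumor mask are assumptions for illustration; the call pattern follows the `lime` package’s image explainer.

```python
# A hedged sketch of explaining one prediction with LIME and scoring
# the explanation mask against a ground-truth mask via IoU.
import numpy as np
from lime import lime_image

def lime_iou(model, image, gt_mask, num_features=8):
    """Explain one prediction with LIME and compute Intersection over
    Union between the explanation mask and a ground-truth tumor mask."""
    explainer = lime_image.LimeImageExplainer()
    explanation = explainer.explain_instance(
        image.astype("double"), model.predict,
        top_labels=1, num_samples=1000)
    _, mask = explanation.get_image_and_mask(
        explanation.top_labels[0], positive_only=True,
        num_features=num_features, hide_rest=False)

    pred = mask.astype(bool)               # LIME's important-region mask
    gt = gt_mask.astype(bool)              # annotated tumor region
    intersection = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return intersection / union if union else 0.0
```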

Fig. 12 Visualization of lung image analysis: original image of a benign case, LIME heatmaps, and feature-based masking for 6, 8, 10, and 12 important features.

Table 13 IoU scores of the selected pretrained models for the benign case.
Fig. 13 Visualization of lung image analysis: original image of a malignant case, LIME heatmaps, and feature-based masking for 6, 8, 10, and 12 important features.

Table 14 IoU scores of the selected pretrained models for the malignant case.

Performance comparison of existing works with proposed work

The proposed CNN with Differential Augmentation (CNN + DA) model achieves an accuracy of 98.78%, outperforming existing state-of-the-art approaches in lung cancer diagnosis. Prior works have explored various deep learning models and augmentation techniques, yet gaps remain in addressing memory overfitting and achieving robust generalization. Notably, Islam et al. (2024)16 developed a combination of GANs and VAEs with 94% accuracy, and Saha et al. (2024)17 used VER-Net, a transfer-learning-based method, with 91% accuracy.

Traditional Convolutional Neural Networks (CNNs) by Rainio and Klén (2024)18 and Kukreja and Sabharwal (2024)19 demonstrated accuracies of 92.6% and 96.11%, respectively, with no explicit focus on mitigating overfitting.

Advanced ensemble and hybrid models have shown promise, with Quasar et al. (2023)22 achieving 96.34% accuracy using a combination of BEiT, DenseNet, and Sequential CNN, while Raza et al. (2023)23 and Zhang et al. (2024)20 reported accuracies of 96.10% and 96% using Lung-EffNet and DenseNet-CNN Integration, respectively. Similarly, Gautam et al. (2023)24 utilized an ensemble of ResNet-152, DenseNet-169, and EfficientNet-B7, achieving 97.23%, and Dritsas and Trigka (2022)25 employed Rotation Forest for a comparable accuracy of 97.1%. However, these models either relied on complex architectures or lacked tailored strategies to prevent memory overfitting.

In contrast, the CNN + DA model integrates Differential Augmentation to introduce diverse transformations, such as hue, brightness, and contrast adjustments, directly addressing overfitting and enhancing generalization. Compared to other works, the CNN + DA model exhibits superior accuracy, proving its reliability and robustness for clinical applications in lung cancer detection. Its innovative approach bridges the gaps left by prior models, solidifying its contribution as a novel and highly effective diagnostic tool. Table 15 compares the performance of existing methods with the proposed approach, highlighting improvements in accuracy, efficiency, or other key metrics.

Table 15 Performance comparison of existing works with proposed work.

Limitations

The primary dataset used (IQ-OTH/NCCD), consisting of only 1,097 images, is relatively small for training deep learning models, which limits the model’s ability to generalize across a broader spectrum of lung cancer variations and patient demographics. The dataset’s class distribution is also imbalanced, with more malignant cases than benign or normal ones, potentially biasing the model toward the majority class. While the CNN + DA model performs well on this controlled dataset, its performance in real-world clinical environments, which involve diverse imaging protocols, patient demographics, and potential artifacts, has yet to be validated; the model might struggle with unseen variations in imaging conditions common in clinical settings. Additionally, the integration of Differential Augmentation (DA) during training introduces extra computational overhead, increasing both training time and resource requirements, which could be a challenge in resource-constrained environments. Finally, the model’s performance is highly dependent on input data quality; noisy or low-quality images could lead to misclassifications, particularly when distinguishing between benign and malignant tumors.

Conclusion

Lung cancer remains a significant global health challenge, requiring innovative solutions to improve diagnostic accuracy. This study addresses the persistent problem of memory overfitting in deep learning models, which impairs their ability to generalize across diverse datasets. To improve the quality of lung cancer detection models, this research designs a new method that merges a Differential Augmentation technique with a Convolutional Neural Network.

The effectiveness of the proposed CNN + DA model has been demonstrated: it attains a prediction accuracy of 98.78%, placing it ahead of various existing methods, such as DenseNet, ResNet, and EfficientNetB0. By leveraging targeted augmentation techniques such as hue, brightness, and contrast adjustments, the model effectively reduces overfitting, improving its reliability in diverse imaging scenarios. Statistical analyses, including Tukey’s HSD post-hoc tests, confirm the significance of its higher accuracy and generalization.

As shown in this research, incorporating Differential Augmentation with CNNs effectively addresses problems present in traditional models. The conclusions point to the need for models that not only solve problems precisely but are also robust to adversarial perturbations and can scale to a wider range of tasks. Future work can further explore the adaptability of this approach to other domains and datasets, expanding its utility in advancing deep learning research for complex diagnostic tasks.