Abstract
Breast cancer is a leading cause of death worldwide, and the COVID-19 pandemic further disrupted its diagnosis and treatment services. The absence of a rapid, efficient, and accurate diagnostic tool remains a pressing issue for this severe disease, and existing machine learning, deep learning, and transfer learning models still suffer from limited diagnostic accuracy and classification errors. This paper presents a new model combining EfficientNetB0 and ResNet50 to improve the classification of breast histopathology images into IDC and non-IDC classes. The implementation resizes all images to a standard size of 128 × 128 pixels and then normalizes them to improve model learning. EfficientNetB0 is selected for its efficient yet effective performance, while ResNet50 employs deep residual connections to overcome the vanishing gradient problem. The proposed model, which incorporates characteristics of both architectures, proves resilient and accurate in classification. It demonstrates superior performance with an accuracy of 94%, a Mean Absolute Error (MAE) of 0.0628, and a Matthews Correlation Coefficient (MCC) of 0.8690. These results outperform previous baselines and show that the model achieves a good trade-off between precision and recall. The comparison with related works demonstrates the superiority of the proposed ensemble approach in terms of accuracy and complexity, making it efficient for practical breast cancer diagnosis and screening.
Introduction
Cancer is a leading public health problem across the world and ranks second as the cause of mortality in the United States. The COVID-19 pandemic worsened this situation by delaying cancer diagnosis and treatment through the shutdown of health facilities, disruptions to employment and health insurance, and patients' avoidance of facilities for fear of infection. While the most severe impact occurred in mid-2020, the healthcare sector has still not fully recovered from the pandemic. For instance, at the Massachusetts General Hospital, surgical oncology operated at only 72% of its 2019 level in the second half of 2020 and just 84% in 2021, the lowest recovery rate among all surgical specialties1. Malignant tumors of the breast have become the most prevalent form of cancer today, with one of the highest mortality rates2. Worryingly, recent years have shown a tendency toward increasing incidence in younger people3. Beyond its health consequences, breast cancer also imposes psychological burdens on affected individuals. Initially, investigation into the causes of breast cancer was limited to genetic and environmental aspects4. However, these factors have not fully explained the origins of the disease. Novel bacterial DNA sequencing technologies have shown that the gut microbiota is connected with the progression of numerous types of cancer, including breast cancer5. Such evidence indicates that gut microbiota can contribute to the development of breast cancer6, expanding the range of pathogenic factors beyond the traditional genetic and environmental causes. Cancer is classified in two main ways: by histopathology, which relates to the type of tissue affected, and by primary site, the initial location of the cancer within the body. Histopathology is one of the most widely used classification systems across the many types of cancer7, and classification by site of origin is the second most common. The ICD-O-3 classifications provide a globally standardized terminology for naming these conditions, and their histological classification reflects the consensus of oncologists. Primary cancers can be grouped into approximately six major groups in line with the histological classification8.
Prominent risk factors for breast cancer include female sex, age, family history of breast cancer, genetic factors such as mutations in the BRCA1 and BRCA2 genes, a history of breast conditions or cancers, high breast density, and radiation exposure. Other factors include alcohol intake, being overweight or obese, physical inactivity, and use of hormone replacement therapy (HRT). Awareness of these risks is important so that early signs of onset are recognized and preventive measures can be employed promptly9.
Breast cancer is treated most effectively when detected early and when affected persons receive treatment on time. The American Cancer Society (ACS) estimated 287,850 new cases and 43,780 deaths in the US for 202210; for 2023, the ACS estimated 300,590 new cases and 43,170 deaths1. Screening programs that employ mammography (MG) and ultrasonography (US) aim at early diagnosis of breast cancer by detecting neoplasms such as microcalcifications, architectural distortions, and solid masses11. Early-stage treatments are often less invasive and have higher success rates, improving patient survival and quality of life. However, the growing number of cases and the risk of false results have spurred the development of advanced technologies to enhance diagnosis. The diagnostic process can be time-consuming and challenging because minor details must be detected in high-resolution images. Recent advancements in artificial intelligence (AI) for computer vision have introduced algorithms that assist physicians in identifying, delineating, and categorizing malignant lesions in medical images12. Numerous deep learning (DL) models for breast cancer imaging, both commercial and publicly available, have been developed and have demonstrated strong performance in recent comparison studies13. While these methods show promising results, many studies lack comprehensive evaluations of metrics such as accuracy, precision, recall, Mean Absolute Error (MAE), Matthews Correlation Coefficient (MCC), Cohen's Kappa, Log Loss, and Hamming Loss, as well as of the computational resources required for implementation. To overcome these limitations, a novel ensemble model combining the EfficientNetB0 and ResNet50 transfer learning techniques is proposed. This model merges the efficient design of EfficientNetB0 with ResNet50's optimized residual structures, using a 128 × 128 input size, normalization transforms, and training inspired by transformer principles. The combination of EfficientNetB0's efficiency and scalability with ResNet50's depth and accuracy creates a robust, high-performing solution that significantly enhances image classification through this complementary approach.
The complexity of tissue architecture and the potential for human error in manual histological image analysis pose considerable challenges to the correct diagnosis of breast cancer. While routine screening modalities such as mammography and ultrasound play a vital role in early detection, they are prone to false positives and negatives, which may lead to delayed or incorrect diagnoses. Advances in artificial intelligence have enabled the development of deep learning algorithms that have demonstrated significant effectiveness in medical image classification by improving diagnostic precision. Convolutional neural networks (CNNs) have shown particular promise in histopathological image analysis by capturing intricate morphological patterns that may not be easily discernible through traditional methods. To improve classification performance while ensuring computational efficiency, an ensemble model integrating EfficientNetB0 and ResNet50 has been introduced. EfficientNetB0 employs compound scaling to optimize network depth, width, and resolution, thereby achieving higher performance with reduced computational overhead. In parallel, ResNet50's deep residual learning framework enhances feature extraction and mitigates the vanishing gradient problem, facilitating more effective learning in deep networks. By leveraging the complementary strengths of these architectures, the proposed method aims to improve diagnostic accuracy while addressing the computational constraints associated with deep learning applications in medical imaging.
The major objectives of the proposed work are given below:
- Establish the connection between breast cancer risk factors and histopathological imaging by analyzing their influence on image classification.
- Develop an advanced deep learning-based histopathological image classification model using EfficientNetB0 and ResNet50.
- Enhance diagnostic accuracy and reliability in breast cancer detection by mitigating false positives/negatives through an ensemble learning approach.
- Assess the impact of model selection on classification performance using key evaluation metrics such as accuracy, precision, recall, MCC, and Hamming Loss.
- Optimize computational efficiency and scalability for real-world clinical applications while maintaining high classification performance.
Literature review
Neurological disorders, such as strokes and cerebral vascular occlusions, pose significant global health challenges due to their high mortality and long-term disability rates. Early diagnosis, especially within the first hours, is crucial for improving patient outcomes. While MRI has advanced, traditional methods struggle to capture the full complexity of brain lesions. Deep learning, particularly in MRI-based studies, has shown great promise in detecting and segmenting brain anomalies. One review examines 61 studies from 2020 to 2024, comparing CNN and Vision Transformer (ViT)-based approaches and discussing dataset diversity, data privacy, and model explainability challenges. It highlights advanced models such as U-Net variants and transformers as promising tools for improving diagnostic accuracy and facilitating personalized treatment, and underscores the need for ethically secure frameworks, diverse datasets, and better model interpretability to transform clinical practice in managing neurological disorders14. Feature extraction algorithms have been used in numerous studies to classify breast cancer histopathology images, reflecting the growing adoption of machine learning in biomedical engineering. Kowal et al.15 segmented nuclei in 500 images from fine-needle biopsies and then extracted 42 morphological, topological, and textural features. Three different classifiers trained on these features were then used to classify the images as benign or malignant. Deep learning models can automatically extract complex, high-level features from images, in contrast to typical machine learning techniques that depend on hand-crafted features16. As a result, deep learning approaches have recently been used in many studies to classify histological images of breast cancer17, with numerous approaches employed both with and without pretrained models.
Histopathology image recognition has been given priority in medical image processing because pathological testing is the gold standard for diagnosing breast cancer. One such work uses the Bioimaging 2015 dataset as the foundation for a two-stage nuclei segmentation method that distinguishes carcinoma from non-carcinoma. First, stain separation is used to extract nuclei from breast cancer images, and then marker-based watershed segmentation is applied. Texture features are extracted using the local binary pattern method, while colour features are obtained through colour auto-correlation. These features are then combined and fed to a support vector machine for classification. The method achieved a recognition accuracy of 91.67% on the Bioimaging 2015 dataset and 92.50% on the ICIAR 2018 dataset. Although it shows promising results, the accuracy could still be improved, especially compared with other advanced deep learning models in the field, and the computational complexity of the two-stage segmentation process may limit its practicality for real-time applications18.
The computational efficiency of the proposed ensemble model, integrating EfficientNetB0 and ResNet50, surpasses that of19, which employed the InceptionResNetV2 model for breast cancer classification. EfficientNetB0 utilizes compound scaling to optimize resource allocation, while ResNet50’s deep residual learning architecture mitigates gradient vanishing, leading to improved training stability. This combination not only enhances classification accuracy but also reduces computational overhead, making it more feasible for practical deployment in real-time diagnostic settings.
The proposed work is computationally superior to20, primarily due to the combined use of EfficientNetB0 and ResNet50, which offers a more efficient balance of high performance and reduced computational overhead. EfficientNetB0 optimizes network scaling, allowing higher accuracy with fewer resources, while ResNet50's residual learning enables training deep models without excessive computational cost. In contrast,20 relies on YOLO-based models, which, despite their high performance, are computationally intensive, particularly with larger image sizes and complex architectures such as the Swin Transformer.
Compared to21, the proposed work offers superior computational efficiency by leveraging the combination of EfficientNetB0 and ResNet50. EfficientNetB0 enhances performance with optimized scaling while minimizing resource consumption, and ResNet50's deep residual learning contributes to more efficient training. In contrast, models such as the Vision Transformer in21 are more computationally intensive, especially for medical image classification tasks.
The work of22 focuses on deep learning models for automating cervical cancer diagnosis using Pap smear images, achieving an accuracy of 99.48% with Vision Transformer (ViT)-based models and EfficientNetV2-Small. While these models show high diagnostic accuracy, their computational complexity may require more resources, potentially limiting their scalability. In contrast, the proposed model combines EfficientNetB0 and ResNet50, offering a more efficient approach with reduced computing overhead. It achieves an accuracy of 94%, with precision, recall, and F1-score all at 94%, while maintaining a good balance between accuracy and computational efficiency, making it more suitable for practical and scalable breast cancer screening applications.
The ensemble model combining EfficientNetB0 and ResNet50 is optimized for image classification tasks, providing high accuracy and computational efficiency, making it ideal for applications such as breast cancer detection. In contrast, Mask R-CNN23, designed for instance segmentation, is more complex and computationally intensive due to its dual task of object detection and pixel-level segmentation, making it less suitable for tasks focused on classification alone. The proposed ensemble model’s efficiency and specialized focus on classification make it more suitable for the intended task.
The proposed ensemble model combining EfficientNetB0 and ResNet50 is optimized for image classification, offering high accuracy and computational efficiency for tasks like breast cancer detection. In contrast, the model in24 focuses on leukemia detection and is tailored to specific medical data rather than general image classification, while the DenseNet and EfficientNet-B5 combination in25, though promising, lacks the complementary depth offered by ResNet50.
The proposed EfficientNetB0-ResNet50 ensemble surpasses the traditional machine learning models of26 by leveraging advanced CNN architectures optimized for high-resolution medical image classification. EfficientNetB0 ensures computational efficiency through compound scaling, while ResNet50's residual learning enhances feature extraction and prevents vanishing gradients. This combination results in superior diagnostic accuracy, robustness, and generalization for breast cancer classification.
EfficientNetB0 and ResNet50 together provide a more balanced approach by optimizing depth, width, and resolution while maintaining computational efficiency. Unlike the complex pipeline of27, with its multiple preprocessing steps and model integrations, this ensemble model simplifies feature extraction while preventing vanishing gradients. This ensures higher accuracy, robustness, and faster real-time classification in breast cancer detection.
One of the drawbacks of applying deep learning to HER-2/neu scoring in clinical practice is that accuracy varies with the type of imaging28. ERBB2, a protein associated with some kinds of breast cancer, must be scored accurately for appropriate management of the disease. One study used DenseNet201, GoogleNet, MobileNet_v2, and Vision Transformer models to develop an end-to-end deep learning-based system for HER-2/neu scoring from whole-slide images (WSI); of the assessed models, the Vision Transformer achieved the highest accuracy of 92%, although specific challenges in translating this automated scoring to clinical practice persist29. Likewise, breast cancer diagnosis relies on cell pathology and is both cumbersome and prone to subjectivity. Another paper proposed a CNN model based on Inceptionv3, applying transfer learning to distinguish between benign and malignant pathology images with an accuracy above 92% on the BreaKHis dataset, although computational considerations and the need for better accuracy remain major problems30. Early diagnosis is critical in breast cancer, but conventional mammography suffers from errors and imprecise tumor localization; a hybrid approach of U-Net and YOLO achieved an accuracy of 93% and an AUC of 98% in detecting and localizing lesions in mammography images, though its high computational complexity may hinder implementation31. In another work, an EfficientNet deep learning model was presented to detect and classify breast cancer cases, trained and evaluated on the CBIS-DDSM mammography dataset; the accuracy and AUC of the model were 0.75 and 0.83, respectively, illustrating how deep learning methods can enhance the identification and diagnosis of breast cancer32. A further study proposed a deep convolutional neural network, trained on the ResNet50 model, to classify breast anomalies including calcification, tumors, asymmetry, and malignancy; with an accuracy of 88%, it showed considerable promise for enhancing breast cancer detection and therapy33. After reviewing the existing work in this area, it is evident that current methods are not only computationally expensive but also leave gaps in accuracy. The proposed model addresses these issues by achieving 94% accuracy, significantly improving upon previous approaches and offering a more reliable solution for diagnosing the disease.
Proposed methodology
Figure 1 presents the combined ensemble model derived from EfficientNetB0 and ResNet50 for classifying breast histopathology images as either IDC or non-IDC. Below is a detailed explanation of each part of the model:
Tuning of parameters
Taking into account best practices in deep learning, a heuristic-based hyperparameter selection method was used. The batch size of 16 avoids excessive memory usage while preserving generalization, striking a compromise between consistent gradient updates and computational performance. For fine-tuning pretrained models, a learning rate of 1e-4 works well, ensuring gradual convergence without overshooting the optimum. A momentum of 0.9 speeds up training by reducing oscillations and enhancing optimization stability. The weight decay of 1e-4 serves as a regularization term that penalizes large weight values and reduces overfitting. An image size of 128 was used to maximize efficiency while preserving computational viability. These choices ensure robust model generalization and effective training dynamics.
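As a concrete illustration, the following PyTorch sketch sets up this configuration. The optimizer choice (SGD with momentum) is an assumption inferred from the reported momentum and weight-decay values, and the backbone is only a stand-in so the call is runnable:

```python
import torch
import torchvision

# Stand-in backbone; the actual ensemble is defined later in the paper.
backbone = torchvision.models.resnet50(weights=None)

# Hyperparameters reported in the text; SGD with momentum is our assumption.
optimizer = torch.optim.SGD(
    backbone.parameters(),
    lr=1e-4,           # gradual convergence when fine-tuning pretrained weights
    momentum=0.9,      # dampens oscillations and accelerates training
    weight_decay=1e-4, # penalizes large weights to reduce overfitting
)
batch_size = 16        # trade-off between gradient stability and memory usage
image_size = 128       # input resolution used throughout the paper
```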
Breast histopathology images
Breast histopathology images are the inputs to the model. These images are taken from tissue samples and play a major role in identifying possible abnormalities such as Invasive Ductal Carcinoma (IDC). The raw images are processed within the model pipeline. Each input image can be represented as a matrix, as in Eq. 1:

$$I \in \mathbb{R}^{H \times W \times C} \quad (1)$$

where H is the height, W is the width, and C is the number of color channels.
Image normalization
Before the images are fed into the deep-learning models, they undergo an essential preprocessing step known as image normalization. This process, utilizing transformations from the torchvision library, standardizes the images by adjusting their mean and standard deviation to specific values: mean = [0.485, 0.456, 0.406] and standard deviation = [0.229, 0.224, 0.225]. The images are also resized to 128 × 128 pixels to maintain consistency across the dataset. As a result, the breast histopathology images are normalized and ready for input into the EfficientNetB0 and ResNet50 models.
Each input image I is normalized by subtracting the mean and dividing by the standard deviation for each channel. If \(I_c(x,y)\) represents the pixel value at location (x, y) in channel c (where c can be R, G, or B for RGB images), the normalization can be expressed as in Eq. 2:

$$\hat{I}_c(x,y) = \frac{I_c(x,y) - \mu_c}{\sigma_c} \quad (2)$$

where \(\mu_c\) is the mean and \(\sigma_c\) is the standard deviation for channel c. Additionally, the image is resized to 128 × 128 pixels.
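A minimal torchvision pipeline implementing this preprocessing step might look as follows; the transform names are the torchvision API, while the variable name `preprocess` is ours:

```python
from torchvision import transforms

# Resize to 128 x 128, convert to a tensor (scales pixels to [0, 1]),
# then standardize with the channel statistics given in the text.
preprocess = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```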
EfficientNetB0 model
The first model in the ensemble is EfficientNetB0, a convolutional neural network (CNN) recognized for its balance of efficiency and performance. Its architecture consists of several key modules:
Module-1
This module begins with Depthwise Convolutional 2D layers, which lower computational costs by separating spatial and depth convolutions. Next, Batch Normalization is applied to stabilize and speed up training, followed by an activation function to introduce non-linearity. The Depthwise Convolution and Batch Normalization are performed using the formulas in Eqs. 3, 4 and 5:

$$y = W_{dw} * x \quad (3)$$

$$\hat{x} = \frac{x - \mu}{\sqrt{\sigma^2 + \epsilon}} \quad (4)$$

$$y = \gamma \hat{x} + \beta \quad (5)$$

where \(W_{dw}\) denotes the depthwise convolutional filters, ∗ is the convolution operation, and x represents the input feature map (tensor) to the depthwise convolution layer; µ and σ² are the mean and variance, γ and β are learned parameters, and ϵ is a small constant.
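The sketch below illustrates one such Module-1 building block in PyTorch; the kernel size, channel count, and SiLU (swish) activation are illustrative assumptions rather than the exact EfficientNetB0 configuration:

```python
import torch.nn as nn

# Depthwise convolution (groups == in_channels, Eq. 3), batch
# normalization (Eqs. 4-5), and a non-linearity, as in Module-1.
def depthwise_block(channels: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(channels, channels, kernel_size=3, padding=1,
                  groups=channels, bias=False),  # one filter per channel
        nn.BatchNorm2d(channels),                # normalize, scale, shift
        nn.SiLU(),                               # swish-style activation
    )
```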
Module-2
This module contains Global Average Pooling (GAP), described in Eq. 6, which reduces the spatial dimensions of the feature maps. This is followed by rescaling and two successive Conv2D layers, as in Eq. 7:

$$\text{GAP}(x) = \frac{1}{H \times W}\sum_{i=1}^{H}\sum_{j=1}^{W} x_{i,j} \quad (6)$$

$$y = W_{2D} * x \quad (7)$$

where H and W are the height and width of the feature map, \(W_{2D}\) are the convolutional filters, and x represents the input feature map (tensor) to the GAP and Conv2D layers.
Module-3
This module follows a pattern similar to Module-1, applying Zero Padding as indicated in Eq. 8 to preserve the spatial dimensions. It consists of several Depthwise Conv2D, Batch Normalization, and activation layers.

$$y = \text{ZeroPadding}(x, \text{padding}) \quad (8)$$

where padding specifies the amount of padding and x represents the input feature map that undergoes zero padding.
Module-4
The key ingredient of this module is element-wise multiplication between feature maps, followed by Conv2D and Batch Normalization layers to refine the features. The procedure is given in Eqs. 9 and 10:

$$z = x \odot y \quad (9)$$

$$\hat{z} = \gamma \cdot \frac{z - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta \quad (10)$$

where x is a feature map (tensor) of shape (H, W, C), y is another feature map (tensor) of the same shape, and ⊙ denotes element-wise multiplication. This operation enhances or modulates features by scaling each spatial location and channel independently. In Eq. 10, µ is the mean of the batch (computed over the batch dimension), σ² is the variance of the batch, ϵ is a small constant to prevent division by zero, and γ is a learnable scaling parameter in Batch Normalization (with β the corresponding learnable shift).
Module-5
The last module in EfficientNetB0 is similar to the previous one and includes a Conv2D layer, a batch normalization layer, and a dropout layer. Dropout, illustrated in Eq. 11, randomly deactivates units during training to prevent overfitting:

$$y = x \odot m, \quad m \sim \text{Bernoulli}(p) \quad (11)$$

where Bernoulli(p) generates a dropout mask m with dropout probability p.
Output
The output of these modules is termed Output-1; it will later be merged with ResNet50's output.
ResNet50 model
The second model in the ensemble is ResNet50, a deep residual network. ResNet50 is a variant of the ResNet family that successfully trains deep networks through residual connections, which help avoid vanishing gradient issues. The architecture is composed of multiple stages:
Stage 1
Zero padding is applied to the input to preserve dimensions, followed by a Conv layer and a Batch Norm layer. A ReLU activation function is then applied to introduce non-linearity:

$$y = \text{ReLU}(\text{BN}(\text{Conv}(\text{ZeroPadding}(x, \text{padding}))))$$

where x is the input feature map (tensor) of shape (H, W, C) and padding is the amount of zero-padding added around the spatial dimensions H and W.
Stage 2 to 5
These stages consist of convolution blocks (Conv Blocks) and identity blocks (ID Blocks), each composed of several convolution layers, batch normalization, and ReLU activations. The identity blocks add shortcut (skip) connections that bypass intermediate layers, enhancing the flow of gradients during backpropagation. Each block can be represented as \(y = \mathcal{F}(x) + x\), where x is the input feature map and \(\mathcal{F}\) denotes the block's stacked convolutional transformation.
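A simplified identity block, illustrating the residual connection \(y = \mathcal{F}(x) + x\), could be written as follows; the real ResNet50 blocks use a 1×1–3×3–1×1 bottleneck, which this sketch abbreviates:

```python
import torch
import torch.nn as nn

class IdentityBlock(nn.Module):
    """Two conv-BN layers with a shortcut connection (simplified)."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The shortcut (+ x) lets gradients bypass the block, which is
        # what mitigates the vanishing gradient problem.
        return torch.relu(self.body(x) + x)
```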
Average pooling and flattening
Before the two networks are merged, each backbone's final feature maps are reduced by average pooling and flattened into one-dimensional vectors. Average pooling computes the mean over the spatial dimensions:

$$\text{AvgPool}(x) = \frac{1}{H \times W}\sum_{i=1}^{H}\sum_{j=1}^{W} x_{i,j}$$

where x is the input feature map (tensor) of shape (H, W, C), H is the height of the feature map, W is the width, \(x_{i,j}\) is the value of the feature map at spatial location (i, j), and C is the number of channels. The pooled features are then flattened, e.g., with x.view(N, −1), where N is the batch size and −1 automatically infers the correct number of elements (in this case C, the number of channels). The resulting vectors, Output-1 from EfficientNetB0 and Output-2 from ResNet50, are then ready for concatenation.
Ensemble model construction
The outputs from EfficientNetB0 (Output-1) and ResNet50 (Output-2) are concatenated to form a unified feature vector. This concatenation is done using the torch.cat function, which merges the outputs along a specified dimension. The combined feature vector represents the final ensemble model, which leverages the strengths of both EfficientNetB0 and ResNet50.
combined = torch.cat((Output-1, Output-2), dim)

where dim specifies the dimension along which the outputs are concatenated.
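A minimal sketch of this construction is shown below, assuming torchvision backbones with their default classification heads stripped and a single linear layer added on top; the head is our assumption, as the paper specifies only the concatenation:

```python
import torch
import torch.nn as nn
import torchvision

class EnsembleModel(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.effnet = torchvision.models.efficientnet_b0(weights="IMAGENET1K_V1")
        self.resnet = torchvision.models.resnet50(weights="IMAGENET1K_V1")
        self.effnet.classifier = nn.Identity()  # Output-1: 1280-dim features
        self.resnet.fc = nn.Identity()          # Output-2: 2048-dim features
        self.head = nn.Linear(1280 + 2048, num_classes)  # illustrative head

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out1 = self.effnet(x)                    # Output-1
        out2 = self.resnet(x)                    # Output-2
        combined = torch.cat((out1, out2), dim=1)  # unified feature vector
        return self.head(combined)               # logits for IDC / NON-IDC
```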
Softmax prediction
The combined output of the ensemble model is passed through a softmax layer, which generates probabilities for the two possible classes, IDC and NON-IDC. The softmax function scales the output values into a probability distribution over the classes:

$$p_i = \frac{e^{z_i}}{\sum_{j} e^{z_j}}$$

where \(z_i\) represents the logits for class i. Finally, the model classifies each input image as either IDC (represented by [1, 0]) or Non-IDC (represented by [0, 1]) based on the highest probability:

$$\text{prediction} = \arg\max_i \, p_i$$

where the prediction corresponds to the class with the highest probability.
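In code, this final step reduces to a softmax followed by an argmax; the logits below are placeholder values:

```python
import torch

logits = torch.tensor([[0.3, 2.1]])   # placeholder logits for one image
probs = torch.softmax(logits, dim=1)  # probability distribution over classes
pred = torch.argmax(probs, dim=1)     # 0 -> IDC [1, 0], 1 -> NON-IDC [0, 1]
```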
Justification about models selection
To capitalize on their complementary advantages in feature extraction, computational efficiency, and overall performance, EfficientNetB0 and ResNet50 were chosen. The vanishing gradient issue is lessened by ResNet50's deep residual learning framework, which enables deeper networks to train efficiently while capturing the high-level semantic characteristics essential for classification tasks. In contrast, EfficientNetB0 achieves high performance with fewer parameters by optimizing width, depth, and resolution through compound scaling. Additionally, its Squeeze-and-Excitation (SE) blocks improve feature recalibration, which lowers computing costs while increasing representational power. By merging these two models, the ensemble gains both computational efficiency and deep feature extraction: EfficientNetB0 guarantees lightweight but efficient representation learning, while ResNet50 offers a robust foundation for hierarchical feature learning. Because of this balance, the model can generalize effectively across various data distributions without incurring undue computational cost. Due to particular trade-offs, several other architectures, such as DenseNet, Inception, and MobileNet, were considered but ultimately rejected. Although DenseNet encourages feature reuse, its extensive concatenation results in a large processing burden. Inception networks, although good at extracting multi-scale features, have more parameters and can be difficult to optimize. MobileNet, despite being efficiency-optimized, prioritizes speed over deep feature extraction, making it less appropriate for identifying complex patterns. These considerations led to the selection of the EfficientNetB0 and ResNet50 combination.
Applied example
Figure 2 illustrates the preprocessing of IDC and NON-IDC breast histopathology images, where each image is resized to 128 × 128 pixels. Given that each image has three color channels, the resulting image tensor has dimensions of 128 × 128 × 3. All images are first scaled to the [0, 1] range and then normalized using the mean and standard deviation values specified in the proposed model. The normalized dataset is then used for training. After training, IDC and NON-IDC images are evaluated with the trained model, as depicted in Fig. 3. In the final layer, two probabilities are computed using the softmax function. For IDC images, the first probability exceeds the second, resulting in a prediction vector of [1, 0], as highlighted in green in the upper portion of Fig. 3. Conversely, for NON-IDC images, the second probability is higher, yielding a prediction vector of [0, 1], as shown in green in the lower portion of Fig. 3.
Results and discussion
The dataset utilized in this study comprises breast histopathology images that have been resized to 128 × 128 pixels to ensure uniformity in model input dimensions. No data augmentation was performed, as the selected images were chosen from diverse dimensions and orientations, ensuring a balanced representation of both classes. This balanced distribution minimizes bias and enhances the model’s ability to generalize across different histopathological patterns.
The dataset used, referred to as Dataset-10262, is sourced from34 and contains a total of 1,717 breast histopathology images. The dataset is organized into two folders named ‘0’ and ‘1’. The folder labeled ‘0’ contains all images related to NON-IDC, with each image labeled with ‘class0’ at the end of the filename, e.g., ‘10262_idx5_x51_y951_class0’. The folder labeled ‘1’ contains all images related to IDC, with each image labeled with ‘class1’ at the end of the filename, e.g., ‘10262_idx5_x701_y1201_class1’. In the programming process, the folder ‘0’ was renamed to ‘NON-IDC’ and the folder ‘1’ to ‘IDC’. However, during the experimental programming, label encoding was applied, which caused the class label for IDC to be encoded as ‘0’ and for NON-IDC as ‘1’, due to alphabetical order. Throughout the remainder of this paper, ‘0’ will refer to IDC and ‘1’ will refer to NON-IDC.
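This behavior is reproduced by torchvision's ImageFolder, which assigns class indices in alphabetical order; the root path below is a placeholder for the renamed dataset folders:

```python
from torchvision import datasets, transforms

# 'IDC' sorts before 'NON-IDC', so IDC receives index 0.
dataset = datasets.ImageFolder(
    root="data/train",  # placeholder; contains subfolders IDC/ and NON-IDC/
    transform=transforms.Compose([
        transforms.Resize((128, 128)),
        transforms.ToTensor(),
    ]),
)
print(dataset.class_to_idx)  # {'IDC': 0, 'NON-IDC': 1}
```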
Table 1 shows the distribution of images within the dataset used for training and testing the classification model that distinguishes between IDC (Invasive Ductal Carcinoma) and NON-IDC (non-cancerous) breast histopathology images. The dataset is divided into two classes: IDC, labeled '0', and NON-IDC, labeled '1'.
For the training phase, the dataset includes 400 IDC images and 553 NON-IDC images. During the testing phase, 354 IDC images and 500 NON-IDC images are used to evaluate the model's performance. This distribution ensures that the model is trained and tested on a balanced representation of both classes, facilitating accurate prediction of IDC and NON-IDC cases.
Sample data from the training and testing sets are presented in Table 2.
The chosen model is an ensemble of EfficientNetB0 and ResNet50, trained carefully with parameters tuned to enhance its efficiency. A batch size of 16 (the number of samples passed through the model before the parameters are updated), 20 epochs (complete passes through the entire training set), and a learning rate of 1e-4 (controlling the step size of each iteration toward the minimum of the loss function) created favorable conditions for high accuracy. The training process also used a momentum of 0.9, which accelerates gradient vectors in the required directions for faster convergence, and a weight decay of 1e-4 to reduce overfitting during optimization. The two backbones, ResNet50 (model_name1) and EfficientNetB0 (model_name2), were configured to classify images of 128 × 128 pixels; larger inputs and deeper models with more parameters take longer to compute, so 128 × 128 was chosen as the optimal size. The model distinguishes between two classes, IDC (0) and NON-IDC (1). As the epochs progressed, the model's loss decreased from 0.028 to a remarkable 0.0015, demonstrating effective convergence and robust learning capabilities, as illustrated in Fig. 4.
Prediction of IDC and NON-IDC based on proposed model
Table 3 shows samples of processed IDC and NON-IDC images. Serial numbers 1 to 5 belong to the IDC class, represented by the vector [1, 0]; numbers 1 to 4 were correctly predicted as [1, 0], while Serial No. 5 was predicted as [0, 1] (NON-IDC). Serial numbers 6 to 10 belong to the NON-IDC class, represented by the vector [0, 1]; numbers 6, 7, 9, and 10 were correctly predicted as [0, 1], while Serial No. 8 was predicted as [1, 0] (IDC).
Figure 5 illustrates all true and false predictions. The bars representing true classes are higher compared to those representing false classes.
Statistical results
The confusion matrix presented in Fig. 6 displays the performance of a classification model in distinguishing between two classes: “IDC” and “NON-IDC.” In this matrix, for the “IDC” class, the model made 268 correct predictions, meaning it accurately identified these cases as “IDC” when they were indeed “IDC” (True Positive). However, it also made 36 incorrect predictions where it labeled cases as “IDC” when they should have been “NON-IDC” (False Positive).
For the “NON-IDC” class, the model correctly identified 448 cases as “NON-IDC” (True Negative), but it mistakenly classified 12 cases as “NON-IDC” when they were actually “IDC” (False Negative).
Accuracy, precision, recall, and the F1-score are computed using Eqs. 22, 23, 24, and 25:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \quad (22)$$

$$\text{Precision} = \frac{TP}{TP + FP} \quad (23)$$

$$\text{Recall} = \frac{TP}{TP + FN} \quad (24)$$

$$F1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \quad (25)$$

where TP denotes true positives, TN true negatives, FP false positives, and FN false negatives.
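These formulas can be checked directly against the confusion matrix reported in Fig. 6, treating IDC as the positive class; note that the per-class figures in Table 4 may differ slightly because they average over both classes:

```python
# Counts taken from the confusion matrix in Fig. 6.
TP, FP, FN, TN = 268, 36, 12, 448

accuracy = (TP + TN) / (TP + TN + FP + FN)                 # ~0.937, i.e., 94%
precision = TP / (TP + FP)                                 # ~0.882
recall = TP / (TP + FN)                                    # ~0.957
f1 = 2 * precision * recall / (precision + recall)         # ~0.918

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```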
The proposed model achieved 94% accuracy, as shown in Table 4. These results are based on the test data.
Figure 7 illustrates that the precision, recall, and F1-score for the classes IDC and NON-IDC are exceptionally high.
Mistakenly predicted classes
The proposed model misclassified 36 IDC cases as NON-IDC and 12 NON-IDC cases as IDC, indicating some misinterpretations despite its overall high accuracy. Upon examining the incorrectly predicted images, it was found that some images from both classes shared structural similarities, which likely led to the misclassification. These images exhibited textures and patterns that made it difficult for the model to distinguish between malignant and benign tissue. Examples of these misclassifications are shown in Fig. 8, highlighting the challenge. To improve the model's accuracy, future work should focus on better capturing these subtle textures and patterns through advanced texture analysis techniques, which may help the model differentiate between IDC and NON-IDC cases more effectively.
Statistical test
Two tests were applied to the Ensemble Model: McNemar’s Test to evaluate misclassification patterns between actual and predicted classes, and Paired t-Test to assess the statistical significance of the mean differences between them. Both tests help analyze the consistency and reliability of the model’s predictions.
McNemar’s test (p = 0.0009)
McNemar’s test is used to determine whether there is a significant difference in the misclassification patterns of a model. It evaluates whether the errors made by the model follow a systematic pattern or occur randomly. In this case, the p-value of 0.0009 is much lower than the common significance threshold of 0.05. This means that the ensemble model’s misclassifications are not due to chance but rather follow a structured pattern. The significance of this test result suggests that the model’s predictions meaningfully differ from actual values, indicating a systematic influence of the model on the classification process. This implies that the model has learned distinct features and patterns from the data, rather than making arbitrary errors.
Paired t-test (p = 0.0005)
The paired t-test is used to compare the mean differences between the actual and predicted class labels. It assesses whether the model’s predictions significantly deviate from the actual values in a structured way. The p-value of 0.0005 is very small, indicating that there is a statistically significant difference between the predicted and actual classes. This suggests that the model consistently makes predictions that differ from the true values in a meaningful way. However, this does not necessarily indicate that the model is highly accurate—only that it is not randomly guessing. The result confirms that the model’s classification approach has a clear effect, reinforcing the idea that it is learning patterns rather than making arbitrary decisions.
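A sketch of both tests using scipy and statsmodels is shown below; y_true and y_pred are placeholder arrays standing in for the actual and predicted labels (0 = IDC, 1 = NON-IDC):

```python
import numpy as np
from scipy.stats import ttest_rel
from statsmodels.stats.contingency_tables import mcnemar

y_true = np.array([0, 0, 0, 1, 1, 1, 0, 1])  # placeholder actual labels
y_pred = np.array([0, 1, 0, 1, 1, 0, 0, 1])  # placeholder predicted labels

# 2x2 contingency table of agreements and disagreements for McNemar's test.
table = np.array([
    [np.sum((y_true == 0) & (y_pred == 0)), np.sum((y_true == 0) & (y_pred == 1))],
    [np.sum((y_true == 1) & (y_pred == 0)), np.sum((y_true == 1) & (y_pred == 1))],
])
print(mcnemar(table, exact=True).pvalue)  # McNemar's test on discordant pairs
print(ttest_rel(y_true, y_pred).pvalue)   # paired t-test on label differences
```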
ROC and precision-recall curve
Figure 9 presents the ROC curve, which shows the trade-off between the true positive rate (sensitivity) and the false positive rate across various classification thresholds. The Area Under the Curve (AUC) of 0.93 signifies a strong discriminatory ability of our model. Specifically, the AUC indicates a 93% probability that the model will rank a randomly chosen positive instance higher than a randomly chosen negative instance. This high AUC suggests that the model effectively minimizes false positives while maximizing true positives at different thresholds, indicating reliable classification performance. Figure 10 shows the Precision-Recall (PR) curve, which is particularly valuable for imbalanced datasets. This curve illustrates the trade-off between precision (the accuracy of positive predictions) and recall (the ability to identify all actual positive instances). The high Area Under the PR curve (AUPRC) of 0.96 indicates a strong balance between precision and recall. This suggests that the model is not only proficient at identifying a significant proportion of actual positive instances but also maintains a low rate of false positive predictions within those identified instances. In practical terms, this means that the model’s positive predictions are highly reliable, minimizing the likelihood of incorrectly classifying negatives as positives.
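Both curves and their areas can be computed with scikit-learn as sketched below; y_score stands in for the softmax probability assigned to the positive class, with placeholder values:

```python
import numpy as np
from sklearn.metrics import (roc_curve, precision_recall_curve,
                             roc_auc_score, average_precision_score)

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])                 # placeholder labels
y_score = np.array([0.1, 0.4, 0.8, 0.9, 0.65, 0.3, 0.7, 0.2])  # placeholder scores

fpr, tpr, _ = roc_curve(y_true, y_score)              # points for the ROC curve
prec, rec, _ = precision_recall_curve(y_true, y_score)  # points for the PR curve
print("ROC AUC:", roc_auc_score(y_true, y_score))        # paper reports 0.93
print("AUPRC:", average_precision_score(y_true, y_score))  # paper reports 0.96
```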
Comprehensive evaluation of model performance metrics
The performance of the proposed model can be evaluated using several key metrics as shown in Table 5, each of which offers a different perspective on the model’s accuracy and reliability.
Mean absolute error (MAE)
MAE measures the average magnitude of errors between the predicted and actual values, disregarding their direction. This metric provides a straightforward indication of how close the predictions are to the true values, with lower values signifying better accuracy. For the proposed model, an MAE of 0.0628 suggests that the predictions are, on average, very close to the actual values, indicating strong predictive performance.
Cohen’s kappa
Cohen’s Kappa is a statistical measure that assesses the level of agreement between the predicted and actual classes, taking into account the possibility of agreement occurring by chance. A Kappa value of 0.8671 reflects a high level of agreement, significantly beyond what could be expected by random chance, demonstrating that the model’s predictions are highly reliable.
Matthews correlation coefficient (MCC)
MCC is a comprehensive metric that evaluates the quality of binary classifications by considering all elements of the confusion matrix: true positives, true negatives, false positives, and false negatives. The MCC value of 0.8690 for this model indicates a strong positive correlation between the predicted and actual classes, underscoring the model’s effectiveness in making accurate classifications.
Log loss
Log loss, also known as logarithmic loss, quantifies the accuracy of the model by penalizing incorrect classifications, particularly those made with high confidence. A Log Loss of 0.3140 indicates that while the model is generally accurate, there is still room for improvement in the confidence of its predictions.
Hamming loss
Hamming loss measures the fraction of labels that are incorrectly classified, making it the complement of accuracy in a multi-label classification setting. The proposed model’s Hamming Loss of 0.0628 implies that only about 6.28% of the labels were misclassified, further supporting the model’s overall strong performance.
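All five metrics in Table 5 are available in scikit-learn, as sketched below with placeholder labels; log loss additionally requires predicted probabilities:

```python
import numpy as np
from sklearn.metrics import (mean_absolute_error, cohen_kappa_score,
                             matthews_corrcoef, log_loss, hamming_loss)

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])                   # placeholder labels
y_pred = np.array([0, 1, 1, 1, 1, 0, 0, 0])                   # placeholder predictions
y_proba = np.array([0.2, 0.7, 0.9, 0.8, 0.6, 0.1, 0.4, 0.3])  # placeholder P(class 1)

print("MAE:", mean_absolute_error(y_true, y_pred))       # paper: 0.0628
print("Cohen's Kappa:", cohen_kappa_score(y_true, y_pred))  # paper: 0.8671
print("MCC:", matthews_corrcoef(y_true, y_pred))         # paper: 0.8690
print("Log Loss:", log_loss(y_true, y_proba))            # paper: 0.3140
print("Hamming Loss:", hamming_loss(y_true, y_pred))     # paper: 0.0628
```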
Comparison with similar studies
For comparison, evaluation metrics such as accuracy, precision, recall, F1-score, and AUC-score are used to assess the models’ performance. Accuracy measures the overall correctness of the model, precision evaluates how well the model identifies positive instances, recall focuses on how well the model detects all actual positive instances, F1-score balances precision and recall, and AUC-score measures the model’s ability to discriminate between classes. These metrics provide a comprehensive view of the model’s effectiveness and robustness in the given task.
Across the studies summarized in Table 6, the ensemble model consistently outperforms individual models in both accuracy and computational efficiency, making it more suitable for real-world applications. For instance, Study18 achieved 91.67% and 92.50% accuracy with a two-stage nuclei segmentation method, but its high computational complexity hinders real-time use, whereas the ensemble model reached 94% accuracy with more efficient computation. Similarly, Study10 showed that ResNet50 achieved 92.2% accuracy, but its computational demands limit real-time application; the ensemble model surpasses this with 94% accuracy and optimization for resource-constrained devices. In Study28, the InceptionResNetV2 model achieved 91% accuracy in breast cancer risk prediction but struggled to adapt to diverse imaging techniques, while the ensemble model provides higher accuracy and broader versatility. In Study29, the Vision Transformer achieved 92.6% accuracy for patch classification, yet integrating it into clinical workflows remains challenging; the ensemble model, with 94% accuracy, offers a more seamless solution. Study30 highlighted an Inceptionv3-based approach that exceeded 92% accuracy but required fine-tuning and was computationally demanding, whereas the ensemble model improves upon this in both accuracy and efficiency. Study31 showed that U-Net and YOLO achieved 93.0% accuracy for tumor detection but were computationally expensive, while the ensemble model offers higher accuracy (94%) with less resource intensity. Combining EfficientNetB0 and ResNet50, the model achieved superior results with 94% accuracy, an MAE of 0.0628, and an MCC of 0.8690, surpassing previous studies in both diagnostic precision and computational efficiency. In particular, it outperforms19, which reported 91% accuracy, and a meta-learning approach with ResNet50 that achieved 88.2–88.9% accuracy, highlighting the improved classification performance and robustness of the proposed model over existing methods. The proposed work achieves 94% accuracy, 94% precision, 92% recall, and 93% F1-score, comparable to20's 94% precision, 92% recall, and 93% F1-score, while demonstrating superior computational efficiency and thus greater suitability for practical deployment in medical diagnostics. It also outperforms21, which reports 88.6% accuracy, 90.1% precision, 87.4% recall, and 88.7% F1-score, demonstrating better accuracy and efficiency and greater practicality for medical diagnosis.
The computational efficiency of the proposed work is superior to10 due to the use of EfficientNetB0, which reduces computational overhead through compound scaling, and ResNet50's deep residual connections, which limit network complexity. The proposed model requires fewer resources while maintaining high performance, making it more efficient for practical breast cancer diagnosis and screening than the multi-model approach used in10. It is likewise more efficient than18: EfficientNetB0 optimizes network depth, width, and resolution with reduced computational overhead, and the integration of ResNet50's residual learning limits the complexity of deep networks, in contrast to the two-stage segmentation and feature extraction method of18, which involves multiple complex steps and higher computational requirements. The proposed work also outperforms28,29, and30 computationally by leveraging EfficientNetB0's compound scaling to optimize depth, width, and resolution, thereby reducing the number of parameters and FLOPs (floating point operations) compared to the deeper, more complex architectures used in those studies. Additionally, the integration of ResNet50 mitigates the vanishing gradient problem through residual connections, enabling effective learning with fewer layers and lower computational cost. The result is a model that is more efficient in both memory usage and processing time, without compromising performance.
Conclusion
In this research, we proposed an improved ensemble model built from EfficientNetB0 and ResNet50 to classify breast histopathology images into IDC and Non-IDC types. The model attains 94% accuracy and a 96% AUC score, surpassing traditional methods in both discriminative power and computational speed.
The proposed methodology subjects the breast histopathology images to rigorous preprocessing. The images are resized to 128 × 128 pixels and normalized with fixed mean and standard deviation values to make the data more uniform and to improve model learning. This preparation enables the deep learning algorithms to read the images and identify features indicative of IDC.
EfficientNetB0 and ResNet50 are incorporated into the ensemble model so as not to rely on a single network. EfficientNetB0 performs well with a relatively small number of parameters, making it a well-balanced model, while the residual connections in ResNet50 help address vanishing gradients in deeper architectures. The ensemble approach integrates these models' outputs, coupling EfficientNetB0, a more efficient and scalable model, with ResNet50, a deeper and more accurate one. This leads to a more effective, high-performance solution for image classification.
The model’s performance is evaluated through various metrics, including accuracy, precision, recall, F1-score, and several others. The ensemble model achieved a Mean Absolute Error (MAE) of 0.0628, indicating high predictive accuracy, and a Cohen’s Kappa of 0.8671, reflecting strong agreement between predicted and actual classes. The Matthews Correlation Coefficient (MCC) of 0.8690 further underscores the model’s effectiveness in making accurate classifications. While the Log Loss of 0.3140 suggests there is room for improvement in prediction confidence, the Hamming Loss of 0.0628 demonstrates a minimal rate of misclassification.
In comparison with similar studies, the ensemble model's performance is exemplary. It surpasses individual models and previous benchmarks in accuracy and computational efficiency. For instance, while benchmark models achieved accuracy rates of around 75–93%, the ensemble model's 94% accuracy and efficient use of computational resources present a significant advancement in breast cancer diagnosis. This makes the proposed model not only a valuable tool for precise cancer detection but also a practical solution for real-world applications, where both accuracy and efficiency are critical.
Limitations and future works
The proposed work achieved high classification performance; however, it has some limitations that must be acknowledged. The study utilized Dataset-10262, consisting of 1,717 breast histopathology images resized to 128 × 128 pixels to maintain uniformity. While this dataset provided a balanced representation of IDC and NON-IDC cases, its size and diversity may limit the model's generalizability. The training set included 400 IDC and 553 NON-IDC images, while the testing set comprised 354 IDC and 500 NON-IDC images. No data augmentation techniques were applied, which may have restricted the model's exposure to a broader range of variations in histopathological patterns.
The proposed model is based on an ensemble of EfficientNetB0 and ResNet50, leveraging their complementary strengths in feature extraction and computational efficiency. While this combination has demonstrated superior classification performance, other deep learning architectures such as DenseNet, Vision Transformers, or hybrid CNN-transformer models and Grad-CAM or SHAP analysis could be explored in future work. These models have unique technical aspects and require parameter tuning, such as adjusting depth, width scaling, or attention mechanisms, to optimize their performance for histopathology image classification. Furthermore, validating the model on larger, more diverse datasets from different medical institutions can enhance its generalizability. Expanding the training set, incorporating data augmentation techniques, and employing domain adaptation strategies will further improve the robustness and applicability of the model for real-world clinical settings.
The generalizability of the proposed model is a critical aspect when evaluating its real-world applicability. While the model demonstrates high classification performance on the Dataset-10262 histopathology images, its effectiveness on external datasets remains an important consideration. The lack of explicit external validation may introduce potential limitations, as models trained on a specific dataset may not always generalize well to unseen data due to domain shifts, variations in imaging techniques, or differences in histopathological staining protocols. To ensure robustness, future work should focus on testing the model on independent datasets from different medical institutions, incorporating diverse imaging conditions and patient demographics. Additionally, techniques such as domain adaptation, transfer learning with fine-tuning on external datasets, and cross-validation using multiple independent datasets can further validate the model's reliability in broader clinical settings. This step is essential for assessing the model's adaptability and ensuring its practical deployment in real-world diagnostic applications.
Data availability
The data used to support the findings of this study are available at https://www.kaggle.com/datasets/paultimothymooney/breast-histopathology-images.
Change history
22 September 2025
A Correction to this paper has been published: https://doi.org/10.1038/s41598-025-20244-x
References
Siegel, R. L., Miller, K. D., Wagle, N. S. & Jemal, A. Cancer statistics, 2023. CA Cancer J. Clin. 73 (1), 17–48. https://doi.org/10.3322/caac.21763 (2023).
Sung, H. et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 71 (3), 209–249 (2021). https://doi.org/10.3322/caac.21660
Bidoli, E. et al. Worldwide age at onset of female breast cancer: A 25-Year Population-Based Cancer registry study. Sci. Rep. 9 (1). https://doi.org/10.1038/s41598-019-50680-5 (2019).
Swaminathan, H., Saravanamurali, K. & Yadav, S. A. Extensive review on breast cancer: its etiology, progression, prognostic markers, and treatment. Med. Oncol. 40 (8) (2023). https://doi.org/10.1007/s12032-023-02111-9
Papakonstantinou, A. et al. The conundrum of breast cancer and microbiome - A comprehensive review of the current evidence. Cancer Treat. Rev. 111 https://doi.org/10.1016/j.ctrv.2022.102470 (2022).
He, C., Liu, Y., Ye, S., Yin, S. & Gu, J. Changes of intestinal microflora of breast cancer in premenopausal women. Eur. J. Clin. Microbiol. Infect. Dis. 40 (3), 503–513. https://doi.org/10.1007/s10096-020-04036-x (2021).
Tomasetti, C., Li, L. & Vogelstein, B. Stem cell divisions, somatic mutations, cancer etiology, and cancer prevention. Science 355 (6331), 1330–1334 (2017). https://doi.org/10.1126/science.aaf9011
Morton Cuthrell, K. & Tzenios, N. Breast cancer: updated and deep insights. Int. Res. J. Oncol. 6 (1), 104–118 (2023). Available: https://www.researchgate.net/publication/371069531
National Cancer Institute. Breast cancer risk in American women (2014). https://www.cancer.gov/types/breast/risk-fact-sheet
Rejon Kumar, R. et al. Transforming breast Cancer identification: an In-Depth examination of advanced machine learning models applied to histopathological images. J. Comput. Sci. Technol. Stud. 6 (1), 155–161. https://doi.org/10.32996/jcsts.2024.6.1.16 (2024).
Marmot, M. et al. The benefits and harms of breast cancer screening: an independent review. Lancet 380 (9855), 1778–1786. https://doi.org/10.1016/S0140-6736(12)61611-0 (2012).
Asgari Taghanaki, S., Abhishek, K., Cohen, J. P., Cohen-Adad, J. & Hamarneh, G. Deep semantic segmentation of natural and medical images: a review. Artif. Intell. Rev. 54 (1), 137–178. https://doi.org/10.1007/s10462-020-09854-1 (2021).
Allen, B., Agarwal, S., Coombs, L., Wald, C. & Dreyer, K. 2020 ACR data science Institute artificial intelligence survey. J. Am. Coll. Radiol. 18 (8), 1153–1159. https://doi.org/10.1016/j.jacr.2021.04.002 (2021).
Bayram, B., Kunduracioglu, I., Ince, S. & Pacal, I. A systematic review of deep learning in MRI-based cerebral vascular occlusion-based brain diseases. Neuroscience 568, 76–94. https://doi.org/10.1016/j.neuroscience.2025.01.020 (2025).
Kowal, M., Filipczuk, P., Obuchowicz, A., Korbicz, J. & Monczak, R. Computer-aided diagnosis of breast cancer based on fine needle biopsy microscopic images. Comput. Biol. Med. 43 (10), 1563–1572. https://doi.org/10.1016/j.compbiomed.2013.08.003 (2013).
Bengio, Y., Courville, A. & Vincent, P. Representation learning: A review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 35 (8), 1798–1828. https://doi.org/10.1109/TPAMI.2013.50 (2013).
Spanhol, F. A., Oliveira, L. S., Petitjean, C. & Heutte, L. Breast cancer histopathological image classification using convolutional neural networks. Proc. Int. Jt. Conf. Neural Networks (IJCNN) 2560–2567 (2016). https://doi.org/10.1109/IJCNN.2016.7727519
Hu, H. et al. Breast cancer histopathological images recognition based on two-stage nuclei segmentation strategy. PLoS One 17 (4) (2022). https://doi.org/10.1371/journal.pone.0266973
Işık, G. & Paçal, İ. Few-shot classification of ultrasound breast cancer images using meta-learning algorithms. Neural Comput. Appl. 36 (20), 12047–12059. https://doi.org/10.1007/s00521-024-09767-y (2024).
Coşkun, D. et al. A comparative study of YOLO models and a transformer-based YOLOv5 model for mass detection in mammograms. Turkish J. Electr. Eng. Comput. Sci. 31 (7), 1294–1313. https://doi.org/10.55730/1300-0632.4048 (2023).
Pacal, İ. Deep learning approaches for classification of breast cancer in ultrasound (US) images. Iğdır Üniversitesi Fen Bilimleri Enstitüsü Dergisi 12 (4), 1917–1927 (2022). https://doi.org/10.21597/jist.1183679
Pacal, I. Investigating deep learning approaches for cervical cancer diagnosis: a focus on modern image-based models. Eur. J. Gynaecol. Oncol. 46 (1), 125–141. https://doi.org/10.22514/ejgo.2025.012 (2025).
Hassan, E., El-Rashidy, N. & Talaa, M. Review: mask R-CNN models. Nile J. Commun. Comput. Sci. 3 (1), 17–27. https://doi.org/10.21608/njccs.2022.280047 (2022).
Hassan, E., Saber, A. & Elbedwehy, S. Knowledge distillation model for acute lymphoblastic leukemia detection: exploring the impact of nesterov-accelerated adaptive moment Estimation optimizer. Biomed. Signal. Process. Control. 94 https://doi.org/10.1016/j.bspc.2024.106246 (2024).
Saber, A., Elbedwehy, S., Awad, W. A. & Hassan, E. An optimized ensemble model based on meta-heuristic algorithms for effective detection and classification of breast tumors. Neural Comput. Appl. https://doi.org/10.1007/s00521-024-10719-9 (2024).
Haq, I. et al. Exploring machine learning classifiers for breast Cancer classification. KSII Trans. Internet Inf. Syst. 18 (4), 860–880. https://doi.org/10.3837/tiis.2024.04.003 (2024).
Idress, W. M. et al. Hybrid segmentation and 3D imaging: comprehensive framework for breast cancer patient segmentation and classification based on digital breast tomosynthesis. Biomed. Signal. Process. Control. 100 https://doi.org/10.1016/j.bspc.2024.106992 (2025).
Humayun, M., Khalil, M. I., Almuayqil, S. N. & Jhanjhi, N. Z. Framework for detecting breast Cancer risk presence using deep learning. Electron 12 (2). https://doi.org/10.3390/electronics12020403 (2023).
Kabir, S. et al. The utility of a deep learning-based approach in Her-2/neu assessment in breast cancer. Expert Syst. Appl. 238 https://doi.org/10.1016/j.eswa.2023.122051 (2024).
Xiao, M., Li, Y., Yan, X., Gao, M. & Wang, W. Convolutional neural network classification of cancer cytopathology images: taking breast cancer as an example. ACM Int. Conf. Proceeding Ser. 145–149 (2024). https://doi.org/10.1145/3653946.3653968
Rahman, M. M. et al. Breast Cancer detection and localizing the mass area using deep learning. Big Data Cogn. Comput. 8 (7), 80. https://doi.org/10.3390/bdcc8070080 (2024).
Gengtian, S., Bing, B. & Guoyou, Z. EfficientNet-based deep learning approach for breast cancer detection with mammography images. 8th Int. Conf. Comput. Commun. Syst. (ICCCS) 972–977 (2023). https://doi.org/10.1109/ICCCS57501.2023.10151156
Nalifabegam, J., Ganeshbabu, C., Askarali, N., Natarajan, A. & Maheshwari, P. Cancer classification revolution: employing advanced deep CNNs for multi-class detection of breast irregularities. Proc. 3rd Int. Conf. Smart Technol. Commun. Robot. (STCR) (2023). https://doi.org/10.1109/STCR59085.2023.10396886
Mooney, P. Breast histopathology images. Kaggle. https://www.kaggle.com/datasets/paultimothymooney/breast-histopathology-images
Acknowledgements
This work is supported by the research fund of the University of Johannesburg, South Africa.
Funding
This work is supported by the research fund of the University of Johannesburg, South Africa.
Author information
Authors and Affiliations
Contributions
Tariq Shahzad and Tehseen Mazhar performed original draft writing, software, and methodology; Sheikh Muhammad Saqib and Tehseen Mazhar performed investigation, design, and conceptualization; Khmaies Ouahada, Tariq Shahzad, and Tehseen Mazhar prepared the related work section and managed results and discussion; Tehseen Mazhar and Sheikh Muhammad Saqib performed rewriting, methodology design, and visualization.
Corresponding authors
Ethics declarations
Consent for publication
Not Applicable.
Competing interests
The authors declare no competing interests.
Ethical approval and consent to participate
There is no need for approval because data is available at https://www.kaggle.com/datasets/paultimothymooney/breast-histopathology-images.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The original online version of this Article was revised: In the original version of this Article, Tariq Shahzad, Tehseen Mazhar, Sheikh Muhammad Saqib and Khmaies Ouahada were incorrectly added as equally contributing authors. In addition, the Author Contributions section contained an error. Full information regarding the corrections made can be found in the correction for this Article.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Shahzad, T., Mazhar, T., Saqib, S.M. et al. Transformer-inspired training principles based breast cancer prediction: combining EfficientNetB0 and ResNet50. Sci Rep 15, 13501 (2025). https://doi.org/10.1038/s41598-025-98523-w
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41598-025-98523-w