Introduction

Cancer is a serious global health issue. Breast cancer (BC) is considered the most widespread and deadliest cancer arising in women’s breast tissue, and death rates from BC are extremely high compared with other cancer types1. BC is the second leading cause of cancer-related death in women worldwide. Even with the swift developments in medicine, histopathological image (HI) analysis remains the prevalent method for analyzing BC2. The automated and accurate classification of high-resolution HIs is the keystone of, and the obstacle to, other intensive studies, namely nuclei localization, mitosis detection, and gland segmentation3. Furthermore, histopathological analysis provides more accurate information to identify cancer and measure its impact on the adjacent tissue. Early identification raises the possibility of effective treatment and survival4. Automated BC diagnosis through examining HIs therefore plays an important role for patients and their diagnosis. Moreover, the analysis result can be influenced by the expertise level of the pathologists involved5. Hence, computer-aided diagnosis (CAD) of HIs plays a major part in BC diagnosis as well as prediction. Recent developments in image processing and machine learning (ML) have assisted the growth of CAD systems to detect and diagnose BC from HIs quickly and with high precision6. The CAD system examines the HIs of the specimen tissue, identifies the histopathological structures related to non-cancerous and cancerous conditions, and classifies the HIs correspondingly into benign and malignant classes.

The major difficulties in BC classification of HIs arise from the natural intricacy of HIs, namely cell overlapping, slight variations among images, and irregular color distribution7. At present, deep learning (DL) approaches have seen significant development and accomplished extraordinary outcomes in the image processing and computer vision (CV) domains. DL methods can autonomously extract features, learn from data spontaneously, and grasp complex abstract representations of data8. DL overcomes the difficulties of conventional feature extraction and has been effectively implemented in CV, biomedical science, and various other domains, encouraging numerous researchers to apply it to HI classification9. Convolutional neural networks (CNNs) are a broadly utilized DL type that performs well on image classification and feature extraction, which has laid the basis for applying CNNs to HI classification10. BC poses a threat if not detected at an initial stage, often leading to poor outcomes. Early and accurate detection of cancerous tissues is significant for improving survival rates. Histological examination provides critical insights but is labor-intensive and prone to human error. Incorporating computational techniques and advanced imaging can improve diagnostic accuracy. DL also provides the potential to capture intrinsic patterns in tissue samples, supporting faster and more reliable clinical decisions. Developing such automated systems can ultimately enhance patient care and reduce the burden on healthcare professionals.

In this manuscript, a Leveraging Medical Imaging for Early Breast Cancer Detection Using Deep Learning (LMIBCD-DL) model is proposed. The main contributions of this paper are summarized as follows:

  • The LMIBCD-DL approach presents a reliable tool for supporting automated BC diagnosis in medical applications.

  • The Wiener filter (WF) is used for the image pre-processing step to effectively reduce noise while preserving important tissue structures.

  • The squeeze-and-excitation ResNet (SE-ResNet) model is applied as the feature extractor to capture rich and discriminative representations from complex histopathological patterns.

  • The BiLSTM network is used for the BC detection process.

  • The comparative results demonstrate better performance over existing methodologies across different measures.

Literature survey on BC detection

Ma et al.11 proposed LMCNet, a lightweight BC multi-classification network. The model incorporates Parallel Convolution (PAConv), Depthwise Separable Convolution (DSC), and a Spatial and Channel Attention Enhancement (SAE) module within a refactored ShuffleNetV2 framework to mitigate computational complexity while maintaining high recognition performance. Yusuf et al.12 developed an Enhanced Shallow Convolutional Neural Network (ES-CNN) technique for efficient multi-class classification of Breast HI (BHI). The method also utilizes magnification- and patient-dependent architectural design to mitigate computational utilization and training time while maintaining high classification accuracy. Natarajan et al.13 developed an advanced forecasting model named the Dynamic Harris Hawks Optimization GRU (DHH-GRU) to predict BC. The model integrates the Gated Recurrent Unit (GRU) for capturing sequential dependencies and the Dynamic Harris Hawks Optimization (DHH) approach for parameter optimization, assisted by Fast Fourier Transform (FFT) and Principal Component Analysis (PCA) for feature extraction and dimensionality reduction. Peta and Koppu14 developed an explainable DL technique by incorporating Adaptive Unsharp Mask Filtering (AUMF) for noise reduction and the Explainable Soft Attentive EfficientNet (ESAE-Net) for precise tumor classification. The model also employs Gradient-Weighted Class Activation Mapping (Grad-CAM), Shapley Additive Explanations (SHAP), Contextual Importance and Utility (CIU), and Local Interpretable Model-Agnostic Explanations (LIME) for visualizing and explaining model predictions. Aldakhil et al.15 introduced a technique incorporating the Efficient Channel-Spatial Attention Network (ECSAnet) with conventional ML, including Decision Tree (DT) and Logistic Regression (LR) methods. The model also utilizes attention mechanisms (AMs) to improve feature extraction across spatial and channel dimensions. 
In Xing et al.16, a weight-based recursive hierarchical bootstrapping RIME approach (WHRIME) is introduced. A weighting technique based on solution-quality variances is employed to uphold population diversity and improve convergence precision. Obayya et al.17 created an AOA with a DL-based histopathological BC classification (AOADL-HBCC) method. This method integrates Median Filtering (MF) for noise removal, contrast enhancement for improved image quality, feature extraction using the Arithmetic Optimization Algorithm (AOA) with SqueezeNet, and classification through a Deep Belief Network (DBN) optimized with Adamax to achieve high diagnostic accuracy.

Xie et al.18 recommended a Single HI Super-Resolution Classification network (SHISRCNet) that encompasses two models: Super-Resolution (SR) and Classification (CF) models. Yu et al.19 precisely anticipate the tunnel boring machine (TBM) advance rate by employing a hybrid DL technique incorporating an attention mechanism (AM). The model also comprises Residual Network (ResNet)-based feature extraction and Long Short-Term Memory (LSTM)-based sequence modeling. Khan, Asif, and Bilal20 proposed a technique employing a hybrid deep dense learning model that integrates ResNet50, EfficientNetB1, the proposed ProDense block, and the Vision Transformer L16 (ViT-L16) methods. The model also utilizes deep transfer learning (DTL), transformer-based attention, and a stacked ensemble technique for extracting high-value features and improving classification performance. Addo et al.21 proposed a BC Histopathology Image Convolutional Network (BCHI-CovNet) methodology by integrating multiscale Depth-Wise Separable Convolution (DWSC), an additional pooling module, and a multi-head self-attention (MHSA) mechanism for extracting discriminative features while mitigating computational complexity. Thatha et al.22 developed an advanced methodology for BC diagnosis by incorporating DenseNet-41 for complex spatial feature extraction. The model also achieves robust and sequential feature learning by using AlexNet integrated with GRU networks. Also, the Hippopotamus Optimization Algorithm (HOA) is employed for optimization. Balasubramanian et al.23 improved BC diagnosis by utilizing ensemble DL methods that integrate diverse CNNs for robust feature extraction. The method also utilizes an image patching technique and achieves high classification accuracy across both whole-slide images (WSIs) and microscopic histopathology images. 
Chitta, Sharma, and Yandrapalli24 introduced a hybrid DL technique integrating a customized EfficientNetV2 and a modified ViT, combining the convolutional capabilities of EfficientNetV2 with the long-range dependency modeling of ViT. The methodology thus utilizes the merits of both CNN and transformer architectures for precise BC detection. Ashraf, Alam, and Sakib25 improved BC classification by incorporating self-supervised contrastive learning with ResNet architectures for robust feature extraction. Also, a lightweight hybrid model integrating ResNet50 and the Inception module is used to enhance computational efficiency while maintaining high classification accuracy. Chikkala et al.26 presented a model utilizing a Bidirectional Recurrent Neural Network (BRNN). The approach also integrates ResNet50-based transfer learning (TL), GRU, residual collaborative branches, and AMs for extracting discriminative features effectively. Rahman et al.27 improved BC HI classification by employing an Attention-Guided Deep Broad Convolutional Neural Network (ADBNet) technique. The model also integrates a Deep Broad Block (DBB) for handling multi-magnification images and a Generative Adversarial Network (GAN) integrated with diffusion for dataset augmentation. Azmoodeh-Kalati et al.28 improved the diagnosis of BC by utilizing CNNs, specifically EfficientNetV1 and EfficientNetV2, for binary classification of HIs. Data augmentation and TL are used to address limited annotated data, while Grad-CAM improves model interpretability. Ensemble learning (EL) additionally improves classification performance, ensuring robust and reliable predictions. Hameed et al.29 developed the Adversarial Residual Vision Transformer (ARiViT) model by integrating Residual Learning (RL), Vision Transformers (ViTs), CNNs, and GANs to handle noisy, low-quality, and complex tumor images for reliable detection. 
Hameed, Zameer, and Raja30 improved automated skin cancer detection by employing CNNs for local feature extraction and ViTs for capturing global contextual relationships in dermoscopic images. The study utilized the International Skin Imaging Collaboration (ISIC) dataset for experimentation. Hameed et al.31 enhanced a model utilizing RL and ViT (ReLViT) for improved feature representation, CNNs for local pattern extraction, GANs for image quality enhancement, and YOLO for precise tumor localization. Table 1 summarizes the existing studies on BC HI evaluation.

Table 1 Summary of existing studies on BC HI analysis, emphasizing utilized techniques, models, datasets, and key contributions.

The existing studies exhibit limitations due to dependence on high-resolution images, which restricts performance on low-resolution (LR) inputs. Also, although techniques such as AOADL-HBCC, SHISRCNet, and ADBNet improve feature extraction, they often involve high computational cost and intricate architectures, making deployment on resource-constrained devices challenging. Additionally, while some techniques effectively incorporate multi-magnification handling, noise robustness, and interpretability through EL, AM, and TL models to enhance accuracy, optimization-based methods (HOA, DHH) improve convergence but may suffer from slow training on large-scale datasets. Moreover, GAN-based augmentation strategies are not properly explored for balancing dataset diversity with realistic feature preservation. Overall, a research gap remains in simultaneously addressing efficiency, interpretability, multi-magnification adaptability, and high-accuracy classification in BC HI analysis.

Materials and methods

In this manuscript, a novel LMIBCD-DL approach is introduced. The LMIBCD-DL approach presents a reliable tool for supporting automated BC diagnosis in medical applications. To accomplish this, the LMIBCD-DL model comprises image pre-processing, feature extraction, BC detection, and parameter tuning stages. Figure 1 specifies the overall workflow of the LMIBCD-DL model.

Fig. 1
figure 1

Overall workflow of the LMIBCD-DL model.

Noise reduction through wiener filter

Initially, the LMIBCD-DL method applies the WF in the image preprocessing step to effectively reduce noise while preserving crucial tissue structures32. This method is chosen as it effectively mitigates noise without blurring significant tissue information that is crucial in histopathological assessment. Additionally, the WF adapts to local image variance, unlike the median filter (MF) or simple smoothing techniques. Furthermore, the WF ensures that meaningful patterns are captured and efficiently preserves the edges and fine structures that carry diagnostic information. The WF also provides a balanced trade-off between noise removal and structural fidelity compared with more intricate denoising models. Thus, it improves the reliability of downstream DL methods by providing cleaner, higher-quality input.

The WF is an effective noise reduction method applied in image preprocessing and is particularly efficient at improving the quality of medical images such as CT scans, MRIs, and mammograms for cancer recognition. It works on a statistical basis, aiming to minimize the mean square error (MSE) between the original and the restored image. Unlike simpler filters, the WF adapts to local image variation, maintaining edges and fine details while eliminating Gaussian noise. This is important in cancer imaging, where accurate visualization of small anomalies such as microcalcifications or tumors is essential. By increasing image clarity, the WF advances the performance of the subsequent feature extraction and classification phases. It additionally aids in decreasing false negatives and false positives in the diagnostic procedure. Overall, it plays an important role in constructing a strong and noise-resilient BC detection pipeline.
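The adaptive behavior described above can be sketched in a few lines of NumPy. This is a minimal illustration of the classic local-mean/local-variance Wiener formulation, where the pixelwise gain shrinks toward the local mean wherever the local variance approaches the estimated noise power; the 3 × 3 window and the noise estimate (mean of the local variances) are illustrative assumptions, not values from this study.

```python
import numpy as np

def local_mean(img, k):
    """Mean over a k x k neighborhood (naive sliding window, reflect-padded)."""
    pad = k // 2
    p = np.pad(img, pad, mode="reflect")
    out = np.empty_like(img, dtype=float)
    H, W = img.shape
    for i in range(H):
        for j in range(W):
            out[i, j] = p[i:i + k, j:j + k].mean()
    return out

def wiener_filter(img, k=3):
    """Adaptive Wiener filter: shrink each pixel toward its local mean in
    proportion to how much the local variance exceeds the noise power
    (here estimated as the mean of the local variances)."""
    img = img.astype(float)
    mu = local_mean(img, k)
    var = np.maximum(local_mean(img ** 2, k) - mu ** 2, 0.0)
    noise = var.mean()
    gain = np.maximum(var - noise, 0.0) / np.maximum(var, 1e-12)
    return mu + gain * (img - mu)
```

In flat regions the gain is near zero (strong smoothing), while at edges the large local variance keeps the gain near one, which is why fine structures survive the filtering.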

SE-ResNet for TL

For feature extraction, the SE-ResNet model is employed to capture rich and discriminative representations from complex histopathological patterns. This model enables the network to adaptively recalibrate channel-wise feature responses and enhances the focus on the most informative regions of the intricate HI. Moreover, richer and more discriminative representations are efficiently captured by this model, and it assists in training deeper networks without suffering from vanishing gradients, unlike conventional CNNs. The model also demonstrates superior feature learning at minimal computational cost. Thus, the technique provides a robust and effective model for extracting meaningful patterns from high-resolution histopathology data. Figure 2 depicts the architecture of the SE-ResNet model.

Fig. 2
figure 2

Architecture of SE-ResNet model.

The residual neural network comprises residual and direct mapping parts33. The residual learning (RL) module mainly contains batch normalization (BN), ReLU, and convolutional (Conv) layers. It carries out Conv operations on local parts of the input image, employing Conv kernels of diverse sizes and traversing the image with a specific stride to create several feature mappings. Let \(\:{x}^{a}\) denote the input, where \(\:a\in\:Z[1,\:n]\); feature extraction proceeds over the Conv region. The mathematical model of the Conv procedure is specified in Eq. (1):

$$\:{x}^{acov}=\left(\sum\:{x}^{a}\text{*}k\right)$$
(1)

Here \(\:{x}^{acov}\) signifies the outcome of the matrix Conv computed at the \(\:a\) th input pixel; \(\:{x}^{cov}\) comprises the \(\:{x}^{acov}\) values; \(\:{x}^{a}\) represents the point matrix of the input pixel; and \(\:k\) is the Conv kernel that links the input to the output feature mapping. The Conv output dimension \(\:{x}^{acov}\) is specified in Eq. (2), where the input image is \(\:Q\times\:Q\), the Conv kernel size is \(\:F\times\:F\), the padding is \(\:P\), and the stride is \(\:S\).

$$\:{x}^{acov}=[\left(Q-F+2P\right)/S]+1$$
(2)
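Eq. (2) is simple integer arithmetic, and a one-line helper makes the relationship concrete; the function name is illustrative, not from the paper.

```python
def conv_output_size(Q, F, P, S):
    """Spatial output size of a convolution per Eq. (2):
    floor((Q - F + 2P) / S) + 1."""
    return (Q - F + 2 * P) // S + 1
```

For example, a 224-pixel input with a 7-pixel kernel, padding 3, and stride 2 yields a 112-pixel output, the familiar first-layer reduction in ResNet-style networks.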

The BN layer mainly refines the features produced by the Conv layer, retaining only the salient aspects, which is vital to improving classification precision. Using the learnable parameters \(\:\gamma\:\) and \(\:\beta\:\), the BN layer rescales the feature distribution.

$$\:\left\{\begin{array}{l}\mu\:=\frac{1}{{m}_{batch}}{\varSigma\:}_{{m}_{batch}}{x}_{r}^{cov}\\\:{\sigma\:}^{2}=\frac{1}{{m}_{batch}}{\varSigma\:}_{{m}_{batch}}({x}_{r}^{cov}-\mu\:{)}^{2}\\\:{x}^{n}=\frac{{x}_{r}^{cov}-\mu\:}{\sqrt{{\sigma\:}^{2}+\epsilon\:}}\\\:{y}^{n}=\gamma\:{x}^{n}+\beta\:\end{array}\right.$$
(3)

Here \(\:{x}^{n}\) and \(\:{y}^{n}\) represent the input and output of the \(\:n\) th observation in the mini-batch, \(\:{m}_{batch}\) indicates the mini-batch size, \(\:\epsilon\:\) signifies a small constant close to zero ensuring numerical stability, \(\:\beta\:\) refers to the bias parameter, and \(\:\gamma\:\) refers to the scaling parameter. The activation function introduces non-linearity: a feature-map value exceeding the threshold is passed to the output by the function, thereby accomplishing feature extraction. The presented method utilizes the rectified linear unit (ReLU) activation function, described as:

$$\:\sigma\:\left(x\right)=\text{m}\text{a}\text{x}\left(x,0\right)$$
(4)
$$\:{x}_{r}^{cov}=\sigma\:\left({x}_{p}^{cov}\right)$$
(5)
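The BN recipe of Eq. (3) and the ReLU of Eq. (4) can be restated directly in NumPy; this is an illustrative sketch of the formulas, with γ, β, and ε exposed as plain arguments.

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Batch normalization per Eq. (3): normalize each feature over the
    mini-batch (axis 0), then apply the learnable scale/shift."""
    mu = x.mean(axis=0)
    var = ((x - mu) ** 2).mean(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

def relu(x):
    """ReLU per Eq. (4): sigma(x) = max(x, 0)."""
    return np.maximum(x, 0.0)
```

After `batch_norm`, each feature column has (near-)zero mean and unit variance before the γ/β rescaling, which is what stabilizes training of the deeper residual stack.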

The ResNet combines Conv elements, allowing it to extract features by directly integrating shallow features into the extraction of deep features.

$$\:{x}^{cn}={x}^{a}\text{*}{J}_{1\times\:1}+F\left({x}^{a},\:{W}^{n}\right)$$
(6)

The residual block is separated into dual segments: the residual and direct mapping parts. The direct mapping corresponds to \(\:{x}^{a}\text{*}{J}_{1\times\:1}\), where \(\:{x}_{a}^{mc}\) represents the output of the residual block, \(\:F\left(x,W\right)\) is the residual part comprising two or three Conv operations, and \(\:{J}_{1\times\:1}\) depicts a 1 × 1 Conv kernel. In cases where the dimensions of \(\:{x}^{a}\) and \(\:{x}_{a}^{mc}\) differ, a 1 × 1 Conv module is needed and combined into the shortcut branch. The outcome of one residual block, signified as \(\:{x}_{a}^{1c}\), acts as the input for the succeeding residual block. The output of every residual block is depicted by \(\:{x}_{a}^{mc}\), where \(\:m\) depicts the residual block count.
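The structure of Eq. (6) can be illustrated with dense matrices standing in for the Conv operations; this is a structural sketch only, and the weight shapes and the optional 1 × 1 projection `J` are illustrative assumptions.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, W2, J=None):
    """Eq. (6): output = shortcut(x) + F(x, W).
    W1, W2 stand in for the stacked Conv operations of the residual branch;
    J plays the role of the 1x1 projection, used only when input and
    output dimensions differ (J=None means an identity shortcut)."""
    residual = relu(x @ W1) @ W2          # F(x, W): the residual branch
    shortcut = x if J is None else x @ J  # identity or 1x1 projection
    return shortcut + residual
```

Because the shortcut passes `x` through unchanged, a block whose residual branch outputs zero reduces exactly to the identity, which is what makes very deep stacks trainable.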

$$\:{y}^{mc}=\frac{1}{H\times\:W}{\sum\:}_{i=1}^{H}{\sum\:}_{k=1}^{W}{x}_{b}^{mc}\left(i,k\right)$$
(7)

Here \(\:{y}^{mc}\) denotes the 1D vector, \(\:{x}_{b}^{mc}\) represents the element positioned on the \(\:b\) th feature mapping (with \(\:1\le\:b\le\:512\)) inside the output of the last residual block, and \(\:m\) signifies the overall count of feature mappings output by the residual blocks. As the layer count in a conventional NN rises, learning the identity function becomes progressively challenging, frequently inducing sub-optimal training results and network degradation. In the following, \(\:C\) denotes the channel count of the feature map whose importance is assessed.

$$\:{F}_{s}=\frac{1}{H\times\:W}{\sum\:}_{i=1}^{H}{\sum\:}_{k=1}^{W}{x}_{a}^{mc}\left(i,k\right)$$
(8)

Here \(\:H\times\:W\) indicates the dimension of the feature map, \(\:{F}_{s}\) signifies the response after the compression (squeeze) operation, and \(\:{x}_{a}^{mc}(i,k)\) represents the pixel value at position \(\:(i,k)\) in a channel of the feature map with shape \(\:H\times\:W\times\:C\).

The excitation phase of the SE module first reduces the channel dimension \(\:C\) through the first fully connected (FC) layer. This reduction not only improves the model’s capability to distinguish vital features and suppress noise but also reduces the parameter count of the FC layers, thus enhancing the module’s efficiency. Afterward, dimension \(\:C\) is restored through the ReLU activation function followed by a second FC layer. By multiplying the resulting weights with the original channel features, the SE module creates reweighted features that determine the final output.

$$\:{F}_{e}=sig\left({M}_{2}\sigma\:\left({M}_{1}{F}_{s}\right)\right)$$
(9)

Here \(\:{M}_{1}\) and \(\:{M}_{2}\) refer to the first and second FC layers, \(\:sig\) represents the Sigmoid function, \(\:{F}_{e}\) depicts the excitation response, and \(\:\sigma\:\) denotes the ReLU activation function.
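The full SE recipe of Eqs. (7)–(9) — squeeze (global average pooling), excitation (two FC layers with ReLU and Sigmoid), and channel recalibration — can be sketched as below; the reduction ratio implied by the shapes of `M1` and `M2` is an assumption for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_block(feat, M1, M2):
    """Squeeze-and-excitation per Eqs. (7)-(9).
    feat: (H, W, C) feature map; M1: (C, C//r); M2: (C//r, C),
    where r is the (assumed) channel-reduction ratio."""
    # squeeze, Eq. (7)/(8): global average pooling -> one value per channel
    F_s = feat.mean(axis=(0, 1))
    # excitation, Eq. (9): FC -> ReLU -> FC -> Sigmoid -> channel weights
    F_e = sigmoid(np.maximum(F_s @ M1, 0.0) @ M2)
    # recalibration: scale every channel by its learned weight in (0, 1)
    return feat * F_e
```

The sigmoid keeps every channel weight strictly between 0 and 1, so the block can only attenuate uninformative channels, never amplify them unboundedly.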

BC detection using BiLSTM method

Additionally, the BiLSTM network is employed for the BC detection process34. This method efficiently captures contextual dependencies in sequential data and allows the model to comprehend patterns from diverse perspectives. Unlike a standard LSTM, this model processes data in both the forward and backward directions. This bidirectional learning also improves the recognition of subtle structures that may indicate malignancy. Compared with conventional classifiers, BiLSTM handles the intricate feature sequences extracted by SE-ResNet, resulting in higher detection accuracy. The model’s capability to retain long-term dependencies also ensures that crucial tissue patterns are not missed. Thus, BiLSTM provides a robust and reliable approach for accurate BC detection from the extracted features. Figure 3 represents the framework of Bi-LSTM.

Fig. 3
figure 3

Architecture of Bi-LSTM method.

The LSTM neural network is a kind of RNN that can transfer data from the preceding state to the existing state utilizing memory cells. LSTM evolved to tackle the vanishing and exploding gradient problems produced by long-term dependencies. Through its forget gate, cell state, and input and output gates, the LSTM cell selectively retains valuable data and discards unimportant information during training to resolve these problems.

$$\:{F}_{t}=\sigma\:({X}_{t}{W}_{xf}+{H}_{t-1}{W}_{hf}+{b}_{f})$$
(10)
$$\:{I}_{t}=\sigma\:\left({X}_{t}{W}_{xi}+{H}_{t-1}{W}_{hi}+{b}_{i}\right)$$
(11)
$$\:{\stackrel{\sim}{C}}_{t}=tanh\left({X}_{t}{W}_{xc}+{H}_{t-1}{W}_{hc}+{b}_{c}\right)$$
(12)
$$\:{C}_{t}={F}_{t}\odot\:{C}_{t-1}+{I}_{t}\odot\:{\stackrel{\sim}{C}}_{t}$$
(13)
$$\:{O}_{t}=\sigma\:\left({X}_{t}{W}_{xo}+{H}_{t-1}{W}_{ho}+{b}_{o}\right)$$
(14)
$$\:{H}_{t}={O}_{t}\odot\:tanh\left({C}_{t}\right)$$
(15)

Here \(\:tanh\) represents the activation function of the output layer; \(\:{X}_{t}\) indicates the sequence of input; \(\:W\) refers to the weighted vector; \(\:b\) refers to the biased vector and \(\:\sigma\:\) denotes the sigmoid function.
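A single cell step implementing Eqs. (10)–(15) can be written directly in NumPy. The dictionary-based weight layout is an illustrative assumption; the gate arithmetic follows the equations above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM cell step per Eqs. (10)-(15).
    W holds the input/hidden weight matrices per gate (keys "xf", "hf",
    "xi", "hi", "xc", "hc", "xo", "ho"); b holds the gate biases."""
    f = sigmoid(x_t @ W["xf"] + h_prev @ W["hf"] + b["f"])   # forget gate, Eq. (10)
    i = sigmoid(x_t @ W["xi"] + h_prev @ W["hi"] + b["i"])   # input gate, Eq. (11)
    c_tilde = np.tanh(x_t @ W["xc"] + h_prev @ W["hc"] + b["c"])  # candidate, Eq. (12)
    c = f * c_prev + i * c_tilde                             # cell state, Eq. (13)
    o = sigmoid(x_t @ W["xo"] + h_prev @ W["ho"] + b["o"])   # output gate, Eq. (14)
    h = o * np.tanh(c)                                       # hidden state, Eq. (15)
    return h, c
```

Because the output gate and tanh are both bounded, every hidden-state component stays inside (-1, 1), which keeps the recurrence numerically stable over long sequences.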

To utilize contextual information in time-series data, the Bi-LSTM network is developed on top of the LSTM network. It comprises dual LSTM networks for forward and backward propagation. Forward propagation processes each element of the input sequence in order, passing information to succeeding elements, while backward propagation reads the input sequence in the opposite order. Past and upcoming contextual relations are thus acquired concurrently by the bidirectional propagation model.

$$\:{h}_{f}=LSTM\left({X}_{t},{h}_{t-1}^{f}\right)$$
(16)
$$\:{h}_{b}=LSTM({X}_{t},{h}_{t-1}^{b})$$
(17)
$$\:{h}_{t}={w}_{t}{h}_{f}+{v}_{t}{h}_{b}+{b}_{t}$$
(18)

Here \(\:{h}_{b}\) and \(\:{h}_{f}\) refer to the backward and forward outputs, \(\:{\nu\:}_{t}\) and \(\:{w}_{t}\) indicate the backward and forward output weights, and \(\:{b}_{t}\) is the hidden-layer bias. \(\:{h}_{t}\) signifies the projected value, \(\:{X}_{t}\) represents the signal sequence, \(\:{h}_{t-1}^{b}\) refers to the backward output at the preceding moment, and \(\:{h}_{t-1}^{f}\) indicates the forward output at the preceding moment. Algorithm 1 demonstrates the Bi-LSTM technique.
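Eqs. (16)–(18) amount to a forward pass, a backward pass over the reversed sequence, and a weighted merge of the two hidden states. The sketch below shares one set of LSTM weights between the two directions for brevity, whereas a real Bi-LSTM learns separate parameters per direction; the stacked-gate weight layout is also an assumption.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, Wx, Wh, b):
    """Single LSTM step; gates stacked as [f, i, g, o] in one weight matrix."""
    z = x @ Wx + h @ Wh + b
    d = h.shape[0]
    f, i = sigmoid(z[:d]), sigmoid(z[d:2 * d])
    g, o = np.tanh(z[2 * d:3 * d]), sigmoid(z[3 * d:])
    c = f * c + i * g
    return o * np.tanh(c), c

def bilstm(seq, Wx, Wh, b, w_t, v_t, b_t):
    """Bi-LSTM per Eqs. (16)-(18): run the sequence forward and (reversed)
    backward, re-align the backward outputs, then merge with weights."""
    d = Wh.shape[0]
    def run(xs):
        h, c, out = np.zeros(d), np.zeros(d), []
        for x in xs:
            h, c = lstm_step(x, h, c, Wx, Wh, b)
            out.append(h)
        return out
    h_f = run(seq)               # Eq. (16): past context
    h_b = run(seq[::-1])[::-1]   # Eq. (17): future context, re-aligned
    # Eq. (18): h_t = w_t * h_f + v_t * h_b + b_t
    return [w_t * hf + v_t * hb + b_t for hf, hb in zip(h_f, h_b)]
```

Each output position thus sees both what preceded it and what follows it, which is the property the detection stage exploits on the SE-ResNet feature sequence.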

Algorithm 1
figure a

Bi-LSTM model.

Evaluation metrics and performance assessment

The experimental evaluation of the LMIBCD-DL approach is conducted on the BreakHis dataset35. Class imbalance is addressed by utilizing data augmentation and weighted loss functions, ensuring that the benign and malignant classes are learned effectively and thereby enhancing classification performance and robustness. The technique is simulated using Python 3.6.5 on a PC with an i5-8600K CPU, a GeForce GTX 1050 Ti 4 GB GPU, 16 GB RAM, a 250 GB SSD, and a 1 TB HDD. Parameters include a learning rate of 0.01, ReLU activation, 50 epochs, 0.5 dropout, and a batch size of 5. The dataset comprises 100X and 200X magnification subsets, with 2081 samples in the 100X subset and 2013 samples in the 200X subset, each containing two classes as shown in Table 2. Figure 4 illustrates sample images.
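The metrics reported in the following tables (accuracy, sensitivity, specificity, F1, and MCC) all derive from binary confusion-matrix counts; a minimal helper makes the definitions explicit. How the per-class averages in the tables were computed is not specified here, so only the raw binary definitions are shown.

```python
import math

def metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity, F1, and MCC from the counts of
    a binary confusion matrix (tp/fp/tn/fn)."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    sens = tp / (tp + fn)            # recall / true-positive rate
    spec = tn / (tn + fp)            # true-negative rate
    prec = tp / (tp + fp)
    f1 = 2 * prec * sens / (prec + sens)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return acc, sens, spec, f1, mcc
```

MCC is the most informative of the five under class imbalance, since it only approaches 1 when both classes are predicted well, which is why it accompanies accuracy throughout the tables.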

Table 2 Details of dataset.
Fig. 4
figure 4

Sample images (a) benign and (b) malignant.

Figure 5 depicts the confusion matrices created by the LMIBCD-DL approach with 100X Dataset under 80:20 and 70:30 of training phase (TRAPA) and testing phase (TESPA). The outcomes specify that the LMIBCD-DL method has effectively recognized benign and malignant samples under all classes.

Fig. 5
figure 5

Confusion matrices on 100X Dataset (a-b) 80%TRAPA and 20%TESPA and (c-d) 70%TRAPA and 30%TESPA.

Table 3; Fig. 6 indicate the BC detection results of the LMIBCD-DL approach with 100X Dataset under 80%TRAPA and 20%TESPA. The results exemplify that the LMIBCD-DL approach properly recognized varied classes. With 80%TRAPA, the LMIBCD-DL approach presents average \(\:acc{ur}_{y}\), \(\:sen{s}_{y}\), \(\:spe{c}_{y}\), \(\:{F1}_{score}\), and MCC of 98.38%, 98.38%, 98.38%, 98.59%, and 97.18%, correspondingly. Also, with 20%TESPA, the LMIBCD-DL method achieves average \(\:acc{ur}_{y}\), \(\:sen{s}_{y}\), \(\:spe{c}_{y}\), \(\:{F1}_{score}\), and MCC of 97.90%, 97.90%, 97.90%, 98.31%, and 96.64%, respectively.

Table 3 BC detection of LMIBCD-DL approach with 100X dataset under 80%TRAPA and 20%TESPA.
Fig. 6
figure 6

Average of LMIBCD-DL approach with 100X Dataset under 80%TRAPA and 20%TESPA.

Table 4; Fig. 7 show the BC detection results of the LMIBCD-DL method with 100X Dataset under 70%TRAPA and 30%TESPA. The outcomes demonstrate that the LMIBCD-DL method correctly recognized different classes. On 70%TRAPA, the LMIBCD-DL model provides an average \(\:acc{ur}_{y}\), \(\:sen{s}_{y}\), \(\:spe{c}_{y}\), \(\:{F1}_{score}\), and MCC of 98.25%, 98.25%, 98.25%, 98.34%, and 96.67%, respectively. Also, with 30%TESPA, the LMIBCD-DL model provides average \(\:acc{ur}_{y}\), \(\:sen{s}_{y}\), \(\:spe{c}_{y}\), \(\:{F1}_{score}\), and MCC of 98.72%, 98.72%, 98.72%, 98.64%, and 97.27%, correspondingly.

Table 4 BC detection of LMIBCD-DL approach with 100X dataset under 70%TRAPA and 30%TESPA.
Fig. 7
figure 7

Average of LMIBCD-DL approach with 100X Dataset under 70%TRAPA and 30%TESPA.

In Fig. 8, the TRA \(\:acc{ur}_{y}\) (TRAAY) and validation \(\:acc{ur}_{y}\) (VLAAY) outcomes of the LMIBCD-DL methodology with 100X Dataset under 70%TRAPA and 30%TESPA are shown. The figure emphasized that the TRAAY and VLAAY values illustrate an upward trend, reflecting the efficiency of the LMIBCD-DL approach with outstanding performance across numerous iterations.

Fig. 8
figure 8

\(\:Accu{r}_{y}\) curve of LMIBCD-DL method with 100X Dataset under 70%TRAPA and 30%TESPA.

In Fig. 9, the TRA loss (TRALSS) and VLA loss (VLALSS) graphs of the LMIBCD-DL approach with 100X Dataset under 70%TRAPA and 30%TESPA are depicted. It is signified that the TRALSS and VLALSS values elucidate downward tendencies, representing the capability of the LMIBCD-DL model to stabilize a trade-off among data fitting as well as generalization.

Fig. 9
figure 9

Loss curve of LMIBCD-DL method with 100X Dataset under 70%TRAPA and 30%TESPA.

Figure 10 exhibits the confusion matrices created by the LMIBCD-DL method with 200X Dataset under 80%:20% and 70%:30% of TRAPA/TESPA. The results denote that the LMIBCD-DL approach effectively recognized benign and malignant samples under all classes.

Fig. 10
figure 10

Confusion matrices on 200X Dataset (a-b) 80%TRAPA and 20%TESPA and (c-d) 70%TRAPA and 30%TESPA.

Table 5; Fig. 11 display the BC detection outcomes of the LMIBCD-DL approach with 200X Dataset under 80%TRAPA and 20%TESPA. The outcomes represent that the LMIBCD-DL approach properly identified diverse classes. With 80%TRAPA, the LMIBCD-DL model presents an average \(\:acc{ur}_{y}\), \(\:sen{s}_{y}\), \(\:spe{c}_{y}\), \(\:{F1}_{score}\), and MCC of 95.03%, 95.03%, 95.03%, 95.62%, and 91.30%, correspondingly. Also, with 20%TESPA, the LMIBCD-DL model presents an average \(\:acc{ur}_{y}\), \(\:sen{s}_{y}\), \(\:spe{c}_{y}\), \(\:{F1}_{score}\), and MCC of 97.72%, 97.72%, 97.72%, 98.01%, and 96.03%, correspondingly.

Table 5 BC detection of LMIBCD-DL approach with 200X dataset under 80%TRAPA and 20%TESPA.
Fig. 11
figure 11

Average of LMIBCD-DL approach with 200X Dataset under 80%TRAPA and 20%TESPA.

Table 6; Fig. 12 indicate the BC detection results of the LMIBCD-DL approach with 200X Dataset under 70%TRAPA and 30%TESPA. The results exemplify that the LMIBCD-DL approach properly recognized varied classes. With 70%TRAPA, the LMIBCD-DL approach presents average \(\:acc{ur}_{y}\), \(\:sen{s}_{y}\), \(\:spe{c}_{y}\), \(\:{F1}_{score}\), and MCC of 98.54%, 98.54%, 98.54%, 98.27%, and 96.55%, respectively. Also, with 30%TESPA, the LMIBCD-DL method presents average \(\:acc{ur}_{y}\), \(\:sen{s}_{y}\), \(\:spe{c}_{y}\), \(\:{F1}_{score}\), and MCC of 97.97%, 97.97%, 97.97%, 97.68%, and 95.38%, correspondingly.

Table 6 BC detection of LMIBCD-DL approach with 200X dataset under 70%TRAPA and 30%TESPA.
Fig. 12
figure 12

Average of LMIBCD-DL approach with 200X Dataset under 70%TRAPA and 30%TESPA.

In Fig. 13, the TRAAY and VLAAY results of the LMIBCD-DL approach with 200X Dataset under 70%TRAPA and 30%TESPA are proven. The outcome emphasized that the TRAAY and VLAAY values reveal a growing trend, indicating the proficiency of the LMIBCD-DL framework with effective performance among some iterations.

Fig. 13
figure 13

\(\:Accu{r}_{y}\) curve of LMIBCD-DL approach with 200X Dataset under 70%TRAPA and 30%TESPA.

In Fig. 14, the TRALSS and VLALSS graphs of the LMIBCD-DL approach with 200X Dataset under 70%TRAPA and 30%TESPA are portrayed. It is noted that the TRALSS and VLALSS values illuminate a reducing trend, demonstrating the effectiveness of the LMIBCD-DL method in stabilizing a trade-off among data fitting and generalization.

Fig. 14
figure 14

Loss curve of LMIBCD-DL approach with 200X Dataset under 70%TRAPA and 30%TESPA.

In Table 7; Fig. 15, a detailed comparison study of the LMIBCD-DL model is reported under the BreakHis dataset17,19,20,21,35,36. The outcomes reveal that the ResNet-LSTM, ViT-L16, and BCHI-CovNet models attained lower \(\:acc{ur}_{y}\) of 94.17%, 82.46%, and 89.62%, correspondingly. Furthermore, the VGG16, InceptionV3, ResNet-50, MobileNetV3, and DTLRO-HCBC models show ineffectual detection results with the lowest \(\:acc{ur}_{y}\) of 80.20%, 81.74%, 82.23%, 89.07%, and 93.60%, respectively. Meanwhile, the AOADL-HBCC model exhibits considerable performance with an \(\:acc{ur}_{y}\) of 96.82%, \(\:sen{s}_{y}\) of 82.09%, \(\:spe{c}_{y}\) of 95.17%, and \(\:{F1}_{score}\) of 81.20%. Furthermore, the ImageNet + VGG16 (IVNet) technique accomplishes reasonable outcomes with an \(\:acc{ur}_{y}\) of 97.05%, \(\:sen{s}_{y}\) of 82.90%, \(\:spe{c}_{y}\) of 92.51%, and \(\:{F1}_{score}\) of 85.69%. Finally, the LMIBCD-DL technique demonstrates superior performance with an increased \(\:acc{ur}_{y}\) of 98.72%, \(\:sen{s}_{y}\) of 98.72%, \(\:spe{c}_{y}\) of 98.72%, and \(\:{F1}_{score}\) of 98.64%.

Table 7 Comparative study of LMIBCD-DL approach with existing models under the BreakHis dataset17,19,20,21,35,36.
Fig. 15
figure 15

Comparative study of LMIBCD-DL approach with existing models under the BreakHis dataset.

Table 8; Fig. 16 indicate the comparison analysis of the LMIBCD-DL approach with existing methods under the Breast Histopathology Images dataset22,37. The Bi-LSTM model attained an \(\:acc{ur}_{y}\) of 92.50%, \(\:sen{s}_{y}\) of 92.39%, \(\:spe{c}_{y}\) of 92.28%, and \(\:{F1}_{score}\) of 92.35%, while GRU illustrated improvement with an \(\:acc{ur}_{y}\) of 95.69%, \(\:sen{s}_{y}\) of 93.58%, \(\:spe{c}_{y}\) of 94.48%, and \(\:{F1}_{score}\) of 93.12%. AlexNet highlighted higher performance with an \(\:acc{ur}_{y}\) of 97.53%, \(\:sen{s}_{y}\) of 97.43%, \(\:spe{c}_{y}\) of 97.33%, and \(\:{F1}_{score}\) of 96.23%, surpassing VGGNet which achieved an \(\:acc{ur}_{y}\) of 93.68%, \(\:sen{s}_{y}\) of 92.23%, \(\:spe{c}_{y}\) of 93.26%, and \(\:{F1}_{score}\) of 91.41%. GoogleNet attained an \(\:acc{ur}_{y}\) of 94.60%, \(\:sen{s}_{y}\) of 93.34%, \(\:spe{c}_{y}\) of 95.54%, and \(\:{F1}_{score}\) of 93.65%, while ResNet additionally exhibited improved results with an \(\:acc{ur}_{y}\) of 96.41%, \(\:sen{s}_{y}\) of 94.67%, \(\:spe{c}_{y}\) of 96.81%, and \(\:{F1}_{score}\) of 95.70%. XceptionNet provided high performance with an \(\:acc{ur}_{y}\) of 97.74%, \(\:sen{s}_{y}\) of 96.71%, \(\:spe{c}_{y}\) of 97.83%, and \(\:{F1}_{score}\) of 97.65%. The LMIBCD-DL methodology attained the optimum results with an \(\:acc{ur}_{y}\) of 98.72%, \(\:sen{s}_{y}\) of 98.72%, \(\:spe{c}_{y}\) of 98.72%, and \(\:{F1}_{score}\) of 98.64%, emphasizing superior performance over all other techniques in the diagnosis of BC using HI.

Table 8 Comparison assessment of the LMIBCD-DL approach with existing methods under the Breast Histopathology Images dataset22,37.
Fig. 16
figure 16

Comparison assessment of the LMIBCD-DL approach with existing methods under the Breast Histopathology Images dataset.

Table 9 and Fig. 17 present the computational time (CT) analysis of the LMIBCD-DL method against existing techniques under the BreakHis dataset. The LMIBCD-DL method attained the fastest performance with a CT of 2.07 s, illustrating remarkable efficiency. MobileNetV3 also depicted quick processing with a CT of 3.98 s, followed closely by ViT-L16 at 4.11 s and VGG16 at 4.29 s. InceptionV3 required 5.66 s, while ResNet-50 took 6.18 s. BCHI-CovNet and DTLRO-HCBC achieved CTs of 6.57 s and 7.70 s, respectively, with the AOADL-HBCC approach slightly higher at 7.71 s. ResNet-LSTM recorded 7.28 s, and ImageNet integrated with VGG16 (IVNet) required the longest CT of 7.94 s. Thus, the LMIBCD-DL approach showed the highest efficiency, while deep hybrid models generally required longer CTs.
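Per-sample computational times of the kind reported here are typically obtained by averaging repeated timed forward passes over a test set. A generic sketch (the `predict` callable is a stand-in for any model's inference function; it is not the paper's implementation):

```python
import time

def mean_inference_time(predict, inputs, repeats=5):
    """Average wall-clock time per call of `predict` over `inputs`, in seconds.

    Repeating the full pass `repeats` times smooths out transient
    system load before averaging.
    """
    start = time.perf_counter()
    for _ in range(repeats):
        for x in inputs:
            predict(x)
    elapsed = time.perf_counter() - start
    return elapsed / (repeats * len(inputs))

# toy stand-in for a model's forward pass
t = mean_inference_time(lambda x: x * 2, list(range(100)))
```

In practice, a few warm-up passes are discarded first so that one-time costs (model loading, cache population) do not inflate the average.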

Table 9 CT evaluation of the LMIBCD-DL methodology with existing models under the BreakHis dataset.
Fig. 17
figure 17

CT evaluation of the LMIBCD-DL methodology with existing models under the BreakHis dataset.

Table 10 Error assessment of the LMIBCD-DL approach with existing models under the BreakHis dataset.
Fig. 18
figure 18

Error assessment of the LMIBCD-DL approach with existing models under the BreakHis dataset.

Table 11 indicates the ablation performance of the LMIBCD-DL approach under the BreakHis dataset. BiLSTM incorporated with BAS, without feature extraction but with parameter tuning, achieved an \(accur_{y}\) of 98.22%, \(sens_{y}\) of 97.92%, \(spec_{y}\) of 98.13%, and \(F1_{score}\) of 98.05%. The variant integrating SE-ResNet with BiLSTM achieved an \(accur_{y}\) of 97.47%, \(sens_{y}\) of 97.42%, \(spec_{y}\) of 97.60%, and \(F1_{score}\) of 97.46%. A standard BiLSTM with bidirectional processing achieved an \(accur_{y}\) of 96.70%, \(sens_{y}\) of 96.70%, \(spec_{y}\) of 96.94%, and \(F1_{score}\) of 96.80%, while a unidirectional LSTM attained an \(accur_{y}\) of 96.16%, \(sens_{y}\) of 96.09%, \(spec_{y}\) of 96.15%, and \(F1_{score}\) of 96.08%. Finally, the complete LMIBCD-DL model attained the highest results with an \(accur_{y}\) of 98.72%, \(sens_{y}\) of 98.72%, \(spec_{y}\) of 98.72%, and \(F1_{score}\) of 98.64%. Thus, the evaluation confirms that incorporating the LMIBCD-DL model's components and optimized parameter tuning significantly improves diagnostic performance compared to simpler LSTM-based architectures.

Table 11 Ablation study analysis of the LMIBCD-DL approach under the BreakHis dataset.

Conclusion

This paper designs and develops the LMIBCD-DL approach, which presents a reliable tool for supporting automated BC diagnosis in medical applications. Initially, the LMIBCD-DL method applies WF in the image preprocessing step to effectively reduce noise while preserving important tissue structures. For feature extraction, the SE-ResNet model is employed to capture rich and discriminative representations from complex histopathological patterns. Additionally, the BiLSTM method is utilized for the BC detection process. The comparative results demonstrated a superior accuracy of 98.72% over existing methodologies across different measures on the BreakHis dataset. The limitations include the relatively small dataset size, which may limit the generalization of the findings and restrict applicability to more diverse diagnostic scenarios. Furthermore, the model was not evaluated on multi-class BC subtypes, and external validation, which is crucial to ensure robustness, is lacking. Moreover, real-world clinical variability, such as differences in staining protocols and imaging devices, has not been fully addressed. Future work may concentrate on evaluating the model on larger, multi-class datasets, performing rigorous external validation, and assessing performance in real-world clinical scenarios. Integration with clinical workflows and evaluation in prospective studies could help bridge the gap between research and practical deployment. Such efforts will improve the reliability and applicability of automated BC detection systems.
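The feature-extraction stage summarized above relies on SE-ResNet's channel recalibration. The squeeze-and-excitation step can be sketched in NumPy as below; this is a generic illustration of the mechanism, not the paper's exact implementation, and the layer sizes and random weights are placeholders:

```python
import numpy as np

def squeeze_excite(features, w1, w2):
    """Recalibrate channel responses of a (C, H, W) feature map.

    Squeeze: global average pooling collapses each channel to a scalar.
    Excite:  two fully connected layers (ReLU then sigmoid) turn those
             scalars into per-channel gates in (0, 1) that rescale the map.
    """
    z = features.mean(axis=(1, 2))            # squeeze       -> (C,)
    s = np.maximum(w1 @ z, 0.0)               # FC + ReLU     -> (C // r,)
    gates = 1.0 / (1.0 + np.exp(-(w2 @ s)))   # FC + sigmoid  -> (C,)
    return features * gates[:, None, None]    # channel-wise rescaling

# illustrative shapes: 8 channels, reduction ratio r = 2
rng = np.random.default_rng(0)
fmap = rng.standard_normal((8, 4, 4))
w1 = rng.standard_normal((4, 8)) * 0.1
w2 = rng.standard_normal((8, 4)) * 0.1
out = squeeze_excite(fmap, w1, w2)
```

Because the gates are sigmoid outputs, the block can only attenuate channels, letting the network emphasize the most discriminative histopathological responses before they are passed to the BiLSTM classifier.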