Abstract
Breast cancer diagnosis from histopathology images remains challenging due to two intertwined factors: severe class imbalance, where malignant cases represent a small minority of samples, and the need to integrate discriminative features across multiple spatial scales. Existing methods typically address imbalance and multi-scale fusion separately, leading to biased or redundant representations. We propose CMAF-Net, a theoretically grounded architecture that unifies information bottleneck principles with margin-based learning to jointly tackle these challenges. CMAF-Net employs a dual-branch CNN-Transformer backbone fused through a Cross-Modal Attention Fusion block, which implements temperature-controlled attention and redundancy minimization to preserve complementary local and global features. At the classification level, we introduce an Adaptive Class-Balanced Focal Loss that operationalizes margin theory under imbalance, enforcing larger margins for minority classes while dynamically adapting to feature distributions. Extensive experiments on the IDC dataset show that CMAF-Net achieves 94.92% sensitivity and 95.52% balanced accuracy, outperforming state-of-the-art baselines by up to 8.6% on malignant detection. Under extreme 99:1 imbalance, CMAF-Net maintains 76.45% sensitivity, demonstrating graceful degradation where competing methods fail catastrophically. Cross-dataset evaluation on BreakHis confirms robust zero-shot transfer across four magnifications with average sensitivity of 95.61%. Ablation studies and information-theoretic analyses validate the contributions of each component, while computational profiling shows CMAF-Net achieves superior accuracy-efficiency trade-offs compared to prior fusion networks. Beyond breast cancer, our framework establishes a principled template for information-theoretic fusion under class imbalance, with implications for rare disease detection, clinical decision support, and broader multi-modal learning tasks.
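The loss described above combines three published ingredients: effective-number class weights (Cui et al., 2019), focal down-weighting of easy examples (Lin et al., 2017), and label-distribution-aware margins that grow as class frequency shrinks (Cao et al., 2019). The snippet below is a minimal NumPy sketch of such a loss under those assumptions — the function name and the hyperparameters `beta`, `gamma`, and `margin_scale` are illustrative, and the exact CMAF-Net formulation (including its adaptive schedule) may differ.

```python
import numpy as np

def adaptive_cb_focal_loss(logits, targets, class_counts,
                           beta=0.9999, gamma=2.0, margin_scale=0.5):
    """Sketch of a class-balanced focal loss with per-class margins.

    Illustrative only: combines effective-number class weights,
    focal modulation, and LDAM-style margins proportional to
    n_c^(-1/4); not the authors' exact implementation.
    """
    logits = np.asarray(logits, dtype=float)
    targets = np.asarray(targets)
    counts = np.asarray(class_counts, dtype=float)

    # Effective-number re-weighting: w_c ∝ (1 - beta) / (1 - beta^n_c),
    # normalized so the class weights average to 1.
    weights = (1.0 - beta) / (1.0 - beta ** counts)
    weights = weights / weights.sum() * len(counts)

    # Rarer classes get a larger margin subtracted from their true-class
    # logit, enforcing a wider decision boundary for minority classes.
    margins = margin_scale / counts ** 0.25
    rows = np.arange(len(targets))
    adjusted = logits.copy()
    adjusted[rows, targets] -= margins[targets]

    # Numerically stable log-softmax, then the focal modulation.
    z = adjusted - adjusted.max(axis=1, keepdims=True)
    log_pt = (z - np.log(np.exp(z).sum(axis=1, keepdims=True)))[rows, targets]
    pt = np.exp(log_pt)
    focal = (1.0 - pt) ** gamma * (-log_pt)
    return float((weights[targets] * focal).mean())
```

Under a 99:1 imbalance, the minority class in this sketch receives both a larger weight and a larger margin; raising `margin_scale` strictly increases the loss on correctly classified samples, which is the margin-enforcement effect the abstract refers to.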
Data availability
The IDC and BreakHis datasets used in this study are publicly available. The complete source code, trained weights, and experiment scripts will be released publicly on GitHub upon acceptance: https://github.com/wizzydredd/CMAF-Net
References
Siegel, R. L., Miller, K. D., Wagle, N. S. & Jemal, A. Cancer statistics, 2023. CA Cancer J. Clin. 73(1), 17–48. https://doi.org/10.3322/caac.21763 (2023).
Campanella, G. et al. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nature Medicine 25(8), 1301–1309 (2019).
Madabhushi, A. & Lee, G. Image analysis and machine learning in digital pathology: Challenges and opportunities. Medical Image Anal. 33, 170–175 (2016).
He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778 (2016).
Dosovitskiy, A. et al. An image is worth 16x16 words: Transformers for image recognition at scale. International Conference on Learning Representations (2021).
Dai, Z., Liu, H., Le, Q. V. & Tan, M. Coatnet: Marrying convolution and attention for all data sizes. Adv. Neural Inform. Processing Syst. 34, 3965–3977 (2021).
Chawla, N. V., Bowyer, K. W., Hall, L. O. & Kegelmeyer, W. P. Smote: synthetic minority over-sampling technique. J. Artificial Intell. Res. 16, 321–357 (2002).
Cui, Y., Jia, M., Lin, T.-Y., Song, Y. & Belongie, S. Class-balanced loss based on effective number of samples. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 9268–9277 (2019).
Lin, T.-Y., Goyal, P., Girshick, R., He, K. & Dollár, P. Focal loss for dense object detection. Proceedings of the IEEE International Conference on Computer Vision, 2980–2988 (2017).
Zhang, Z., Xu, M., Zhang, W. & Li, Q. Information fusion for multi-scale data: Survey and challenges. Information Fusion 89, 391–417 (2023).
Tishby, N., Pereira, F. C. & Bialek, W. The information bottleneck method. arXiv preprint physics/0004057 (2000).
Cao, K., Wei, C., Gaidon, A., Arechiga, N. & Ma, T. Learning imbalanced datasets with label-distribution-aware margin loss. Advances in Neural Information Processing Systems 32 (2019).
Baltrušaitis, T., Ahuja, C. & Morency, L.-P. Multimodal machine learning: A survey and taxonomy. IEEE Trans. Pattern Anal. Mach. Intell. 41(2), 423–443 (2019).
Zhao, Y. et al. A comprehensive survey on deep learning based data fusion methods in smart healthcare systems. Information Fusion 108, 102361 (2024).
Gao, J., Li, P., Chen, Z. & Zhang, J. A survey on deep learning for multimodal data fusion. Neural Computation 32(5), 829–864 (2020).
Ramachandram, D. & Taylor, G. W. Deep multimodal learning: A survey on recent advances and trends. IEEE Signal Processing Magazine 34(6), 96–108 (2017).
Shamshad, F., Khan, S., Zamir, S. W., Khan, M. H., Hayat, M., Khan, F. S. & Fu, H. Transformers in medical imaging: A survey. Medical Image Analysis, 102802 (2023).
Zhou, S. K. et al. A review of deep learning in medical imaging: Imaging traits, technology trends, case studies with progress highlights, and future promises. Proceedings of the IEEE 109(5), 820–838 (2021).
Huang, Y. et al. What makes multi-modal learning better than single (provably). Adv. Neural Inform. Processing Syst. 34, 10944–10956 (2021).
Zhang, Y., Liu, H. & Hu, Q. Transfuse: Fusing transformers and CNNs for medical image segmentation. International Conference on Medical Image Computing and Computer-Assisted Intervention, 14–24 (2021). Springer.
Chen, C.-F. R., Fan, Q. & Panda, R. Crossvit: Cross-attention multi-scale vision transformer for image classification. Proceedings of the IEEE/CVF International Conference on Computer Vision, 357–366 (2021).
Liu, J. et al. Multi-level feature fusion network combining attention mechanisms for polyp segmentation. Information Fusion 104, 102195 (2024).
Cai, Z. et al. Dafnet: A novel dynamic adaptive fusion network for medical image classification. Information Fusion 126, 103507 (2026).
Nagrani, A., Yang, S., Arnab, A., Schmid, C. & Sun, C. Attention bottlenecks for multimodal fusion. Adv. Neural Inform. Processing Syst. 34, 14200–14213 (2021).
Jaegle, A., Gimeno, F., Brock, A., Vinyals, O., Zisserman, A. & Carreira, J. Perceiver: General perception with iterative attention. International Conference on Machine Learning, 4651–4664 (2021). PMLR.
Zhou, T., Fu, H., Zhang, Y., Zhang, C., Lu, X., Shen, J. & Shao, L. Multimodal learning in clinical imaging: A comprehensive survey. Medical Image Analysis, 102859 (2023).
Wang, H. et al. Tinyvit-lightgbm: A lightweight and smart feature fusion framework for iomt-based cancer diagnosis. Information Fusion 125, 105253 (2025).
Liu, J., Zhang, Y., Chen, J.-N., Xiao, J., Lu, Y., Landman, B. A., Yuan, Y., Yuille, A., Tang, Y. & Zhou, Z. Clip-driven universal model for organ segmentation and tumor detection. Proceedings of the IEEE/CVF International Conference on Computer Vision, 21152–21164 (2023).
Johnson, J. M. & Khoshgoftaar, T. M. Survey on deep learning with class imbalance. J. Big Data 6(1), 1–54 (2019).
Mullick, S. S., Datta, S. & Das, S. Generative adversarial minority oversampling. Proceedings of the IEEE/CVF International Conference on Computer Vision, 1695–1704 (2019).
Zhang, H., Xu, H., Tian, X., Jiang, J. & Ma, J. Deep learning-based methods for medical image fusion: A review. Comput. Biol. Med. 136, 104664 (2021).
Chlap, P. et al. A review of medical image data augmentation techniques for deep learning applications. J. Med. Imaging Radiation Oncology 65(5), 545–563 (2021).
Menon, A. K., Jayasumana, S., Rawat, A. S., Jain, H., Veit, A. & Kumar, S. Long-tail learning via logit adjustment. International Conference on Learning Representations (2021).
Li, X., Sun, X., Meng, Y., Liang, J., Wu, F. & Li, J. Dice loss for data-imbalanced NLP tasks. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 465–476 (2020).
Kini, G. R., Paraskevas, O., Oymak, S. & Thrampoulidis, C. Label-imbalanced and group-sensitive classification under overparameterization. Adv. Neural Inform. Processing Syst. 34, 18970–18983 (2021).
Menon, A., Rawat, A. S., Reddi, S. & Kumar, S. Statistical consistency and convergence of label noise learning under class-conditional noise models. J. Mach. Learn. Res. 22(159), 1–53 (2021).
Collell, G., Prelec, D. & Patil, K. R. Unbiased loss functions for imbalanced classification. Pattern Recognition 131, 108881 (2022).
Shwartz-Ziv, R. & Tishby, N. Opening the black box of deep neural networks via information. arXiv preprint arXiv:1703.00810 (2017).
Alemi, A. A., Fischer, I., Dillon, J. V. & Murphy, K. Deep variational information bottleneck. International Conference on Learning Representations (2017).
Saxe, A. M. et al. On the information bottleneck theory of deep learning. J. Statistical Mech.: Theory and Experiment 2019(12), 124020 (2019).
Goldfeld, Z. & Polyanskiy, Y. The information bottleneck problem and its applications in machine learning. IEEE J. Selected Areas Inform. Theory 1(1), 19–38 (2020).
Geiger, B. C. & Kubin, G. Information-theoretic perspective on generalization and memorization in machine learning. IEEE Transactions on Information Theory (2021).
Federici, M., Dutta, A., Forré, P., Kushman, N. & Akata, Z. Learning robust representations via multi-view information bottleneck. International Conference on Learning Representations (2020).
Wang, S. et al. Multi-view information bottleneck for medical image analysis. Medical Image Anal. 85, 102765 (2023).
Pluim, J. P., Maintz, J. A. & Viergever, M. A. Mutual-information-based registration of medical images: A survey. IEEE Trans. Med. Imaging 22(8), 986–1004 (2003).
Guo, Y., Wu, J., Li, L. & Gao, X. Mutual information-based multimodal image registration: A review. Neurocomputing 492, 644–663 (2022).
Li, X., Chen, H., Qi, X., Dou, Q., Fu, C.-W. & Heng, P.-A. Information fusion for multi-modality medical image segmentation: A survey. Artificial Intelligence in Medicine, 102547 (2023).
Elton, D.C. Self-explaining neural networks: A review. arXiv preprint arXiv:2105.05837 (2021)
Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S. & Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. Proceedings of the IEEE/CVF International Conference on Computer Vision, 10012–10022 (2021).
Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A. & Jégou, H. Training data-efficient image transformers & distillation through attention. International Conference on Machine Learning, 10347–10357 (2021). PMLR.
Chen, J., Lu, Y., Yu, Q., Luo, X., Adeli, E., Wang, Y., Lu, L., Yuille, A. L. & Zhou, Y. Transunet: Transformers make strong encoders for medical image segmentation. arXiv preprint arXiv:2102.04306 (2021).
Graham, B., El-Nouby, A., Touvron, H., Stock, P., Joulin, A., Jégou, H. & Douze, M. Levit: A vision transformer in convnet’s clothing for faster inference. Proceedings of the IEEE/CVF International Conference on Computer Vision, 12259–12269 (2021).
Wu, H., Xiao, B., Codella, N., Liu, M., Dai, X., Yuan, L. & Zhang, L. Cvt: Introducing convolutions to vision transformers. Proceedings of the IEEE/CVF International Conference on Computer Vision, 22–31 (2021).
Mehta, S. & Rastegari, M. Mobilevit: Light-weight, general-purpose, and mobile-friendly vision transformer. International Conference on Learning Representations (2022).
Srinidhi, C. L., Ciga, O. & Martel, A. L. Deep neural network models for computational histopathology: A survey. Med. Image Anal. 67, 101813 (2021).
Dimitriou, N., Arandjelović, O. & Caie, P. D. Deep learning for whole slide image analysis: an overview. Front. Med. 6, 264 (2019).
Spanhol, F. A., Oliveira, L. S., Petitjean, C. & Heutte, L. A dataset for breast cancer histopathological image classification. IEEE Trans. Biomed. Eng. 63(7), 1455–1462 (2016).
Yan, R. et al. Breast cancer histopathological image classification using a hybrid deep neural network. Methods 173, 52–60 (2020).
Tellez, D. et al. Quantifying the effects of data augmentation and stain color normalization in convolutional neural networks for computational pathology. Med. Image Anal. 58, 101544 (2019).
Ilse, M., Tomczak, J. & Welling, M. Attention-based deep multiple instance learning. International Conference on Machine Learning, 2127–2136 (2018). PMLR.
Li, B., Li, Y. & Eliceiri, K. W. Dual-stream multiple instance learning network for whole slide image classification with self-supervised contrastive learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 14318–14328 (2021).
Brown, G., Wyatt, J., Harris, R. & Yao, X. Diversity creation methods: a survey and categorisation. Information fusion 6(1), 5–20 (2005).
Cover, T. M. & Thomas, J. A. Elements of Information Theory 2nd edn. (John Wiley & Sons, Hoboken, New Jersey, 2006).
Macenko, M., Niethammer, M., Marron, J. S., Borland, D., Woosley, J. T., Guan, X., Schmitt, C. & Thomas, N. E. A method for normalizing histology slides for quantitative analysis. 2009 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, 1107–1110 (2009). IEEE.
Wang, X., Girshick, R., Gupta, A. & He, K. Attention mechanisms in computer vision: A comprehensive survey. IEEE Transactions on Pattern Analysis and Machine Intelligence (2023).
Xiong, Y. et al. Nyströmformer: A nyström-based algorithm for approximating self-attention. Proceed. AAAI Conf. Artificial Intell. 35(16), 14138–14148 (2021).
Loshchilov, I. & Hutter, F. Decoupled weight decay regularization. International Conference on Learning Representations (2019).
Foret, P., Kleiner, A., Mobahi, H. & Neyshabur, B. Sharpness-aware minimization for efficiently improving generalization. International Conference on Learning Representations (2021).
Dao, T., Fu, D., Ermon, S., Rudra, A. & Ré, C. Flashattention: Fast and memory-efficient exact attention with io-awareness. Adv. Neural Inform. Process. Syst. 35, 16344–16359 (2022).
Cruz-Roa, A., Basavanhally, A., González, F., Gilmore, H., Feldman, M., Ganesan, S., Shih, N., Tomaszewski, J. & Madabhushi, A. Automatic detection of invasive ductal carcinoma in whole slide images with convolutional neural networks. Medical Imaging 2014: Digital Pathology 9041, 904103 (2014).
Yang, Y., Zha, S., Wang, J. & Zhang, Z. A survey on long-tailed visual recognition. Int. J. Computer Vision 130(7), 1837–1872 (2022).
Huang, G., Liu, Z., Van Der Maaten, L. & Weinberger, K. Q. Densely connected convolutional networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 4700–4708 (2017).
Tan, M. & Le, Q. Efficientnet: Rethinking model scaling for convolutional neural networks. International Conference on Machine Learning, 6105–6114 (2019). PMLR.
Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T. & Xie, S. A convnet for the 2020s. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 11976–11986 (2022).
Ding, J., Xue, N., Xia, G.-S., Dai, D. & Yang, M.Y. Hrfnet: High-resolution feature network for dense prediction. arXiv preprint arXiv:2108.07697 (2021)
Joze, H.R.V., Shaban, A., Iuzzolino, M.L. & Koishida, K. Mmtm: Multimodal transfer module for cnn fusion. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 13289–13299 (2020)
Ren, J. et al. Balanced meta-softmax for long-tailed visual recognition. Adv. Neural Inform. Process. Syst. 33, 4175–4186 (2020).
Acknowledgements
The authors gratefully acknowledge the financial support that made this research possible; the funding sources are detailed in the Funding section.
Funding
This work was supported by the National Natural Science Foundation of China (Grant No. U22B2061), the Institute of Information & Communications Technology Planning & Evaluation (IITP) - Information Technology Research Center (ITRC) grant funded by the Ministry of Science and ICT, Republic of Korea (Grant No. IITP-2025-RS-2024-00437191), and by the Deanship of Scientific Research, King Khalid University, Saudi Arabia (Grant No. RGP2/314/45).
Author information
Authors and Affiliations
Contributions
W.X.A. conceived and designed the study, developed the methodology, curated the data, performed the formal analysis, and wrote the original draft of the manuscript. W.X.A., W.C., L.K., W.A., F.S., M.A.A.-a., Y.H.G., and A.A. contributed to writing, reviewing, and editing the manuscript. W.X.A., W.A., and A.A. curated the data, while W.X.A. and L.K. carried out the formal analyses. L.K. and F.S. contributed to visualization, and F.S. conducted the experiments. W.C., W.A., and Y.H.G. contributed to validation, and W.C. provided supervision. W.C., M.A.A.-a., and Y.H.G. acquired funding, and A.A. managed the project. All authors reviewed and approved the final manuscript.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Ativi, W.X., Chen, W., Kwao, L. et al. CMAF-Net: cross-modal attention fusion with information-theoretic regularization for imbalanced breast cancer histopathology. Sci Rep (2026). https://doi.org/10.1038/s41598-025-32794-1
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41598-025-32794-1