Abstract
Although UNet has proven successful in a wide range of medical image segmentation tasks, its capacity to capture global context is restricted by the finite receptive field inherent in convolutional operations. Transformers, by contrast, can capture long-range dependencies, so integrating a transformer into UNet can alleviate its limited receptive field. However, transformers typically rely heavily on large-scale pre-training and struggle to capture local features. To address these challenges, we propose SimEANet, a network with an encoder-decoder structure and a hybrid CNN-Transformer architecture. We design an enhanced ResNet as the shallow feature extractor of the encoder, introduce the SimEA transformer as the encoder backbone, and use improved cascaded upsampling processors to produce the segmentation result. The performance of SimEANet is validated through rigorous testing on two publicly accessible datasets. Extensive experiments demonstrate the competitiveness of our approach, which achieves average Dice Similarity Coefficients (DSC) of 82.35% and 91.85% on the two datasets. SimEANet notably enhances performance in multi-organ segmentation tasks, achieving an advanced level of segmentation accuracy.
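The Dice Similarity Coefficient (DSC) reported above measures the overlap between a predicted mask P and a ground-truth mask G as 2|P ∩ G| / (|P| + |G|). A minimal NumPy sketch of the metric (the function name `dice_coefficient` and the smoothing term `eps` are our illustration, not taken from the paper):

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-6):
    """Dice Similarity Coefficient between two binary masks.

    DSC = 2 * |P intersect G| / (|P| + |G|); eps avoids division
    by zero when both masks are empty.
    """
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

# Toy example: 4 predicted pixels, 9 ground-truth pixels, overlap of 4
pred = np.zeros((4, 4), dtype=np.uint8)
target = np.zeros((4, 4), dtype=np.uint8)
pred[1:3, 1:3] = 1
target[1:4, 1:4] = 1
print(round(dice_coefficient(pred, target), 4))  # → 0.6154 (= 2*4 / (4+9))
```

In multi-organ segmentation the metric is typically computed per organ class and then averaged, which is how the reported average DSC values are usually obtained.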
Acknowledgements
This work is supported in part by the National Natural Science Foundation of China (Nos. 61977018, 62394330 and 62394334).
Author information
Authors and Affiliations
Contributions
Yi Shang: research design and conceptualization, data collection and processing, experiments, and drafting of the paper. Fu Fang Li: provided the resources and support necessary for the research, supervised the entire research process, and participated in the review and revision of the final version of the paper. Wei Xiang Zhang: formal analysis, data curation, and writing, review and editing.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Shang, Y., Li, F.F. & Zhang, W.X. A novel hybrid model of simplified and external attention coupled with enhanced CNN for medical image segmentation. Sci Rep (2026). https://doi.org/10.1038/s41598-026-43416-9
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41598-026-43416-9


