Abstract
Although UNet has proven successful in a wide range of medical image segmentation tasks, its capacity to capture global context is restricted by the finite receptive field inherent in convolutional operations. Transformers, by contrast, can capture long-range dependencies, so integrating a transformer into UNet can alleviate its limited receptive field. However, transformers typically rely heavily on large-scale pre-training and struggle to capture local features. To address these challenges, we propose SimEANet, a network with an encoder-decoder structure and a hybrid CNN-Transformer architecture. We design an enhanced ResNet as the shallow feature extractor of the encoder, introduce the SimEA transformer as the encoder backbone, and use improved cascaded upsampling processors to produce the segmentation result. The performance of SimEANet is validated through rigorous testing on two publicly accessible datasets. Extensive experiments demonstrate the competitiveness of our approach, which achieves average Dice Similarity Coefficients (DSC) of 82.35% and 91.85% on the two datasets. SimEANet notably enhances performance in multi-organ segmentation tasks, achieving an advanced level of segmentation accuracy.
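The Dice Similarity Coefficient (DSC) reported above measures the overlap between a predicted mask P and a ground-truth mask G as 2|P ∩ G| / (|P| + |G|). A minimal NumPy sketch of the metric (the function name `dice_coefficient` and the smoothing term `eps` are our illustration, not taken from the paper):

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-6):
    """Dice Similarity Coefficient between two binary masks.

    DSC = 2 * |P intersect G| / (|P| + |G|); eps avoids division
    by zero when both masks are empty.
    """
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

# Toy example: 4 predicted pixels, 9 ground-truth pixels, overlap of 4
pred = np.zeros((4, 4), dtype=np.uint8)
target = np.zeros((4, 4), dtype=np.uint8)
pred[1:3, 1:3] = 1
target[1:4, 1:4] = 1
print(round(dice_coefficient(pred, target), 4))  # → 0.6154 (= 2*4 / (4+9))
```

In multi-organ segmentation the metric is typically computed per organ class and then averaged, which is how the reported average DSC values are usually obtained.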
Acknowledgements
This work is supported in part by the National Natural Science Foundation of China (Nos. 61977018, 62394330 and 62394334).
Author information
Authors and Affiliations
Contributions
Yi Shang: research design and conceptualization, data collection and processing, experiments, and drafting of the paper. Fu Fang Li: provided the resources and support necessary for the research, supervised the entire research process, and participated in the review and revision of the final version of the paper. Wei Xiang Zhang: formal analysis, data curation, and writing, review and editing.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Shang, Y., Li, F.F. & Zhang, W.X. A novel hybrid model of simplified and external attention coupled with enhanced CNN for medical image segmentation. Sci Rep (2026). https://doi.org/10.1038/s41598-026-43416-9
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41598-026-43416-9


