Abstract
Infrared and visible image fusion aims to synthesize a more informative result by extracting and integrating the complementary salient features of two heterogeneous modalities. Recent research has shown that capturing explicit self-similarity and implicit cross-correlation with attention mechanisms offers several advantages, and this direction has attracted considerable interest. However, exploring the complementary relationships more comprehensively and quantitatively optimizing the degree of interaction between the two kinds of attention remain challenging. In this paper, a novel infrared and visible image fusion method built on a double-attention mechanism is proposed. Specifically, our approach extracts intra- and inter-attention features from the source images through a two-step feature extraction strategy and integrates them with an intra-attention block in the feature fusion stage. In addition, an adaptive interaction loss term is devised to optimally regulate the interaction between the two kinds of attention. In these ways, salient infrared targets and visible texture details can be integrated more effectively. In the experiments, the proposed method was compared with seven state-of-the-art methods on the TNO and RoadScene datasets; the comprehensive subjective and objective comparisons demonstrate its superiority. Finally, a thorough experiment and discussion on the interaction of intra- and inter-information is presented to further validate and analyze the effectiveness of our work.
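The paper's exact formulation of the adaptive interaction loss is not reproduced in this excerpt. As a rough illustration only, the PyTorch sketch below shows one common way such a term could let training learn the trade-off between an intra-attention term and an inter-attention term; the class name, the learnable weight `raw_alpha`, and the L1 distances are all hypothetical choices, not the authors' method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdaptiveInteractionLoss(nn.Module):
    """Hypothetical sketch: balance intra- and inter-attention terms
    with a single learnable interaction weight squashed into (0, 1)."""

    def __init__(self):
        super().__init__()
        # Unconstrained scalar; sigmoid maps it into (0, 1) below.
        self.raw_alpha = nn.Parameter(torch.zeros(1))

    def forward(self, fused, intra_feat, inter_feat):
        alpha = torch.sigmoid(self.raw_alpha)
        # Pull the fused features toward both attention branches,
        # letting optimization decide the relative emphasis.
        intra_term = F.l1_loss(fused, intra_feat)
        inter_term = F.l1_loss(fused, inter_feat)
        return alpha * intra_term + (1.0 - alpha) * inter_term
```

In such a setup, `raw_alpha` would be optimized jointly with the network parameters, so the emphasis placed on intra- versus inter-attention features is tuned by the data rather than fixed by hand.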
Data availability
The datasets generated and/or analysed during the current study are available from the corresponding author on reasonable request.
Code availability
The code is available from the corresponding author on reasonable request.
Acknowledgements
We would like to thank Professor Liu Zheng for providing the fusion quality objective assessment toolbox.
Funding
This work was supported by the National Natural Science Foundation of China under Project Numbers 61274021 and 61902282.
Author information
Contributions
Z.W. designed and implemented the fusion framework, conducted experiments, and performed data curation. Y.H. and B.Z. conceived the research idea and supervised the project. Z.W. and Y.H. wrote the manuscript. All authors reviewed, edited, and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Wang, Z., Hu, Y. & Zhang, B. Infrared-visible image fusion with double-attention mechanism and adaptive interaction loss. Sci Rep (2026). https://doi.org/10.1038/s41598-026-45802-9
DOI: https://doi.org/10.1038/s41598-026-45802-9