Infrared-visible image fusion with double-attention mechanism and adaptive interaction loss
  • Article
  • Open access
  • Published: 03 April 2026


  • Ziqian Wang¹,
  • Yanxiang Hu¹ &
  • Bo Zhang¹

Scientific Reports (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Engineering
  • Mathematics and computing

Abstract

Infrared and visible image fusion aims to synthesize a more informative result by extracting and integrating the complementary salient features of two heterogeneous modalities. Recent research has shown that capturing explicit self-similarity and implicit cross-correlation with attention mechanisms offers several advantages, and this direction has attracted significant interest. However, exploring the complementary relationships comprehensively and quantitatively optimizing the degree of interaction between the two kinds of attention remain challenging. In this paper, a novel infrared and visible image fusion method based on a double-attention mechanism is proposed. Specifically, our approach extracts intra- and inter-attention features from the source images through a two-step feature extraction strategy and integrates them with an intra-attention block in the feature fusion stage. Additionally, an adaptive interaction loss term is devised to optimally regulate the interaction between the two kinds of attention. In this way, salient infrared targets and visible texture details can be integrated more effectively. In the experiments, the proposed method was compared with seven state-of-the-art methods on the TNO and RoadScene datasets; comprehensive subjective and objective comparisons demonstrate its superiority. In addition, a thorough experiment and discussion on the interaction of intra- and inter-attention information is presented to further validate and analyze the effectiveness of our work.
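The full methodology is not included in this advance-online version, but the abstract's description is concrete enough to sketch. The following minimal PyTorch example is a hedged illustration of the two-step intra/inter-attention idea and an interaction-style loss, not the authors' implementation: the names (AttentionBlock, DoubleAttentionFusion, adaptive_interaction_loss), the choice of max-based intensity and gradient terms, and the scalar weight alpha are all assumptions introduced here for illustration.

```python
# Illustrative sketch only -- not the authors' released code. It shows one
# plausible reading of the abstract's "double attention": intra-attention
# (self-attention within a modality), inter-attention (cross-attention
# between modalities), intra-attention fusion, and a toy interaction loss
# with an adjustable trade-off weight.
import torch
import torch.nn as nn


class AttentionBlock(nn.Module):
    """Pre-norm multi-head attention with a residual connection.

    Called with query == context it acts as intra- (self-) attention;
    with query != context it acts as inter- (cross-) attention.
    """

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, query, context):
        q, kv = self.norm(query), self.norm(context)
        out, _ = self.attn(q, kv, kv)
        return query + out


class DoubleAttentionFusion(nn.Module):
    """Hypothetical two-step extraction followed by intra-attention fusion."""

    def __init__(self, dim: int = 64):
        super().__init__()
        self.intra_ir = AttentionBlock(dim)   # step 1: self-similarity within IR
        self.intra_vis = AttentionBlock(dim)  # step 1: self-similarity within visible
        self.inter_ir = AttentionBlock(dim)   # step 2: IR queries attend to visible
        self.inter_vis = AttentionBlock(dim)  # step 2: visible queries attend to IR
        self.fuse = AttentionBlock(dim)       # fusion: intra-attention on joint tokens

    def forward(self, ir_tokens, vis_tokens):
        ir = self.intra_ir(ir_tokens, ir_tokens)
        vis = self.intra_vis(vis_tokens, vis_tokens)
        ir_cross = self.inter_ir(ir, vis)
        vis_cross = self.inter_vis(vis, ir)
        joint = torch.cat([ir_cross, vis_cross], dim=1)  # concatenate token sequences
        return self.fuse(joint, joint)


def adaptive_interaction_loss(fused, ir, vis, alpha):
    """Toy stand-in for an adaptive interaction loss.

    An intensity term pulls the fused image toward the brighter (salient)
    source pixel; a texture term pulls its gradients toward the sharper
    source texture. `alpha` is the interaction weight, which could be
    learned or scheduled rather than fixed.
    """
    intensity = torch.mean((fused - torch.maximum(ir, vis)) ** 2)

    def grad(x):  # finite differences along height and width
        return x[..., 1:, :] - x[..., :-1, :], x[..., :, 1:] - x[..., :, :-1]

    (fh, fw), (ih, iw), (vh, vw) = grad(fused), grad(ir), grad(vis)
    texture = (torch.mean((fh.abs() - torch.maximum(ih.abs(), vh.abs())).abs()) +
               torch.mean((fw.abs() - torch.maximum(iw.abs(), vw.abs())).abs()))
    return alpha * intensity + (1.0 - alpha) * texture
```

For feature maps flattened to token sequences, e.g. ir_tokens = torch.randn(2, 256, 64) and vis_tokens of the same shape, DoubleAttentionFusion(dim=64) returns a (2, 512, 64) fused sequence that a decoder could map back to an image; the loss would then be evaluated on the reconstructed image against both sources.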

Data availability

The datasets generated and/or analysed during the current study are available from the corresponding author on reasonable request.

Code availability

The code is available from the corresponding author on reasonable request.


Acknowledgements

We would like to thank Professor Liu Zheng for providing the fusion quality objective assessment toolbox.

Funding

This work was supported by the National Natural Science Foundation of China under Project Numbers 61274021 and 61902282.

Author information

Authors and Affiliations

  1. College of Computer and Information Engineering, Tianjin Normal University, Tianjin, 300387, China

    Ziqian Wang, Yanxiang Hu & Bo Zhang

Contributions

Z.W. designed and implemented the fusion framework, conducted experiments, and performed data curation. Y.H. and B.Z. conceived the research idea and supervised the project. Z.W. and Y.H. wrote the manuscript. All authors reviewed, edited, and approved the final manuscript.

Corresponding author

Correspondence to Yanxiang Hu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article


Cite this article

Wang, Z., Hu, Y. & Zhang, B. Infrared-visible image fusion with double-attention mechanism and adaptive interaction loss. Sci Rep (2026). https://doi.org/10.1038/s41598-026-45802-9


  • Received: 14 January 2026

  • Accepted: 23 March 2026

  • Published: 03 April 2026

  • DOI: https://doi.org/10.1038/s41598-026-45802-9


Keywords

  • Infrared and visible image fusion
  • Cross-modal inter-attention
  • Adaptive training loss
  • Transformer