Introduction

Image fusion is an emerging concept owing to the growing requirements of diverse image processing applications, especially medical-aided diagnosis, video surveillance, and remote sensing1. Image fusion is developing rapidly with various imaging sensors and the accessibility of a huge range of imaging techniques like Computed Tomography (CT) and Magnetic Resonance Imaging (MRI). It has opened up opportunities for the healthcare community in efficient decision-making and patient treatment2. In addition, the major goal of image fusion encompasses several requirements: the fused image should be reliable and robust, inconsistencies or artifacts must be eliminated, and salient information in any of the inputs must not be discarded2. On the other hand, the major issues of image fusion research include establishing similarity across modalities, since the data can be statistically uncorrelated and completely different, efficient feature representation for every modality, and image noise3. In addition to these requirements in real-time applications, image fusion-guided disease prognosis and diagnosis have been formulated to assist medical professionals in decision-making, as human interpretation of clinical images is restricted owing to its subjectivity4. The major purpose of multimodal clinical image fusion is to obtain superior-quality information by combining the complementary information from various source images5.

Image fusion can be performed in various ways, including multi-focus, multi-temporal, multi-modal, and multi-view fusion techniques6. Among these approaches, multi-modal image fusion is the most essential; it is carried out on images gathered by different sensors and is particularly helpful in getting precise results in the medical field7. Generally, multi-modal image fusion combines both the complementary and the supplementary information of the source images8. It results in final fused images that are free from redundant and random information. It minimizes the storage space, storing one final fused image instead of two individual images9. The amalgamation of two different modalities results in precise localization or detection of abnormalities. The fundamental features of multi-modal image fusion are listed here. It reduces uncertainty, as the joint information from various sensors minimizes the vagueness related to the decision or sensing process10. The temporal and spatial coverage is extended for better performance. Fusion requires condensed representations to give complete information about the images11. Multi-modal image fusion increases system efficiency by reducing the redundancy in different measurements. It enhances reliability and reduces noise12.

Generally, image fusion is conducted via different approaches, namely transform domain and spatial domain methods. High pass filtering, Intensity Hue Saturation (IHS), the Brovey method, etc., are performed in the spatial domain. The limitations of these methods motivate the adoption of transform domain approaches. Some popular transforms used for the image fusion process include Contourlet (CT)13, Curvelet (CVT)14, Stationary Wavelet (SWT)12, Discrete Wavelet (DWT)11, DTCWT15, and the Non-Subsampled Contourlet Transform (NSCT)16. Compared with spatial domain methods, transform domain techniques achieve higher efficiency in terms of image fusion. On the other hand, a reliable, accurate, and appropriate image fusion approach that is simply interpretable is needed for several classes of images in diverse domains to obtain superior image fusion performance17. Remaining challenges include high computation time, uncontrollable acquisition conditions, and errors found in fusing the images. Thus, there is a need for an innovative multi-modal image fusion approach that integrates two medical images by adopting transform domain techniques.

The innovations suggested in this paper are listed here.

  • To recommend a new multimodal image fusion model with intelligent approaches, namely ODTCWT and an adaptive weighted average fusion strategy, along with a hybrid nature-inspired approach, for performing medical image fusion for better localization of abnormalities and disease diagnosis from the gathered images.

  • To propose a novel ODTCWT and adaptive weighted average fusion strategy for efficient image fusion using a new hybrid nature-inspired algorithm termed PF-HBSSO, which retains the salient information from the input images and formulates the objective as the maximization of fused mutual information.

  • To implement the PF-HBSSO algorithm by combining the Honey Badger Algorithm (HBA) and the Squirrel Search Algorithm (SSA), optimizing the ODTCWT filter coefficients and the weights of the adaptive weighted average fusion model, thereby increasing the convergence rate and maximizing the fused image quality.

The remainder of this paper is organized as follows. Part II discusses existing works. Part III recommends the innovative model for multimodal image fusion. Part IV specifies the generation of low and high frequency coefficients using ODTCWT. Part V derives the fusion of the high and low frequency coefficients by the proposed heuristic algorithm. Part VI estimates the results and Part VII concludes this paper.

Study on existing works

Literature review

Research work based on deep learning models

In 2021, Zuo et al.18 have presented a new automated multi-modal medical image fusion strategy using classifier-based feature synthesis with a deep multi-fusion scheme. They used a pre-trained autoencoder to analyze the fusion strategy through a multi-cascade fusion decoder and a feature classifier. Public datasets were used for analyzing the image fusion results. The Parameter-Adaptive Pulse Coupled Neural Network (PAPCNN) was applied to the low and high-frequency coefficients. This image fusion was especially used for classifying brain diseases via the final fused images.

In 2022, Sun et al.19 have suggested a new deep MFNet using LiDAR data and multimodal VHR aerial images for performing multimodal image fusion. Multimodal learning and an attention strategy were utilized for adaptively fusing the intramodal and intermodal features. A multilevel feature fusion module, pyramid dilation blocks, and a multimodal fusion strategy were implemented. The proposed network adopted the adaptive fusion of multimodal features, enhanced the effects of global-to-local contextual fusion, and improved the receptive field. Moreover, the network was optimized using a multiscale supervision training strategy. The ablation studies and simulation outcomes have confirmed the supreme performance of the recommended MFNet.

In 2021, Fu et al.20 have proposed a novel multimodal biomedical image fusion approach through a deep Convolutional Neural Network (CNN) and a rolling guidance filter. The VGG model was applied for enhancing the image details and edges. Here, the rolling guidance filter was intended for extracting the detail and base images. The fusion was done on the perceptual, detail, and base images with three diverse fusion methods. They then chose the image decomposition constraints by simulation for getting suitable structure and texture images. In addition, a normalization operation was applied to the perceptual images to eradicate noise and feature variations. Finally, the approach has shown superior fusion outcomes and achieved better performance in terms of various objective measures.

In 2022, Goyal et al.21 have collected images from standard sources, where NSCT was used for extracting the features. Next, a Siamese Convolutional Neural Network (sCNN) was applied for getting the significant features by weighted fusion. In order to eradicate noise, a new FOTGV method was introduced. Finally, the combination of the NSCT + sCNN + FOTGV strategies has helped in enhancing the image fusion and also exhibited higher performance in both quantitative and visual analysis.

In 2022, Venkatesan and Ragupathyan22 have suggested a medical image fusion approach for fusing MRI and CT images in a recommended healthcare model. To get both spectral and spatial domain features, a hybrid technique integrating a Deep Neural Network and DWT was suggested, achieving a higher accuracy rate than traditional approaches. Performance enhancement was noticed in terms of standard deviation and average entropy for the designed DWT-CNN fusion approach compared with other wavelet transform methods. Superior efficiency in image fusion was observed, achieving a considerable fusion performance rate.

In 2018, Bernal et al.23 have suggested a supervised deep multimodal fusion model for automated human activity and egocentric action recognition to monitor and assist patients. This model collected video data using a body-mounted or egocentric camera and motion data gathered with wearable sensors. The performance was estimated on a public multimodal dataset to analyze the efficiency. They used a CNN-LSTM architecture for performing the multimodal fusion to obtain the results regarding automated human activity and egocentric action recognition.

Research work based on machine learning algorithms

In 2021, Duan et al.24 have recommended a new regional medical multimodal image fusion adopting a Genetic Algorithm (GA)-derived optimized approach. A weighted averaging technique was recommended for averaging the source clinical images. Next, a fast Linear Spectral Clustering (LSC) superpixel technique was used for obtaining homogeneous regions and preserving the detailed information of the images, which segmented the average images and obtained the superpixel labels. The most significant regions were chosen to produce a decision map. The efficiency of the designed fusion approach was estimated via various experimental evaluations. Finally, the performance estimation of GA-based image fusion has shown the superiority of the final fused images over others.

Research work based on image processing techniques

In 2022, Kong et al.25 have implemented a new medical image fusion approach via Side Window Filtering (SWF) and the Gradient Domain-Guided Filter Random Walk (GDGFRW) in the Framelet Transform (FT) domain. Initially, FT was applied to the standard multimodal images for getting the residual and approximate illustrations. Then, a new GDGFRW, built to interpret the sub-bands, was used for integrating the superiority of the gradient domain and GFRW, and the fusion was done by SWF. Next, inverse FT was performed on the fused sub-bands to obtain the final fused images. The approach addressed the fusion issues and outperformed recent representative methods regarding objective estimation and subjective visual efficiency.

Comparative analysis of the existing techniques and proposed model

In 2023, Zhang et al.26 have implemented a novel fusion approach using Infrared-To-Visible Object Mapping (IVOMFuse) for extracting the target region from the infrared image. Further, Expectation-Maximization (EM) was employed to tune the probabilities in the target region. The fused image was attained by combining PCA and an average fusion strategy. The final validation was performed on the TNO, CVC14, and RoadScene datasets. In 2022, Zhou et al.27 have suggested a differential image registration model termed robust image fusion to assist thermal anomaly detection (Re2FAD). The fusion strategy was effectively applied to enhance the accuracy. In 2023, Gu et al.28 have implemented an improved end-to-end image fusion approach (FSGAN) to enhance image fusion. Here, an auxiliary network was employed to enhance the performance, validated with diverse experiments.

Owing to the heterogeneous nature of the data, multimodal image fusion is challenging with respect to misalignment and non-linear relationships between the input data26. Also, decomposition-based methods are not highly preferred in fusion models27. Moreover, there is the complexity of discovering better multimodal images with fusion quality estimation in the suggested image fusion approaches28. To eradicate the drawbacks of the existing techniques, an effective intelligent fusion method is implemented in the multimodal image fusion model. By considering the decomposition model, the multimodal image fusion is enhanced by analyzing the texture details and smoothing the layers. It has the ability to maximize image quality and detect performance effectively. Diverse implementation outcomes are reported, and the recommended framework ensures more reliable outcomes.

Problem specification

In multimodal image fusion, it is very challenging to perform the multi-scale analysis that aims to analyze the feature maps extracted using the shearlet domain. The strengths and weaknesses of the existing models are given in Table 1. LSC and GA24 are very efficient in both the objective evaluation and the visual effects in the segmentation of medical images. However, when the region count increases, the fusion efficiency may be reduced and the running time of the image fusion increases. PAPCNN18 preserves accurate and detailed information in the fusion results. On the other hand, it does not completely utilize the fusion layer and decoding layer, which is observed through the quantification evaluations. Deep MFNet19 attains better performance regarding visualization and quantification in the quantitative and qualitative evaluations. Yet, it does not consider multi-scale decomposition in encoding and decoding to obtain further performance enhancement. The VGG network20 produces the final fused images by combining three informative images: the fused base image, the detail image, and the perceptual image. Still, it yields results with increased color distortion and fusion noise, without considering the fusion quality. sCNN21 is trained with the concatenated features, which carry huge significance. However, it is time-consuming and cannot perform region-based medical image fusion. DWT-CNN22 is efficient in capturing the high-level association among the modalities and obtains the feature descriptors from the spatiotemporal regions. Yet, it may fail to preserve shift-invariance. GDGFRW25 has the ability to understand the temporal patterns of behavior over data modalities, which are hidden through the overriding of an individual modality. Still, it is unable to perform multi-focus image fusion. RNN23 efficiently utilizes the location and hand presence as significant cues for automatically classifying the images. Yet, it does not support practical, feedback, and in-device inference in the fusion. Hence, it is important to develop an enhanced multimodal image fusion model with a superior optimization strategy.

Table 1 Strengths and weaknesses of existing multimodal image fusion models.

An intelligent model for multimodal image fusion

Collection of dataset

This multimodal image fusion approach gathers medical images for performing the fusion, which helps healthcare systems in better treatment planning and decision making. The data is available from https://www.med.harvard.edu/aanlib/home.htm. The proposed model considers MRI and Single Photon/Positron Emission Computed Tomography (SPECT/PET) images. The resolution of each image is 256 × 256. In total, 66 images are considered for the evaluation. The gathered images are known as \(C_{a}\), where \(a = 1,2,3, \cdots ,A\), and the final number of obtained images is denoted as \(A\). The sample images of MRI and SPECT/PET are visualized in Fig. 1.

Figure 1
figure 1

Sample images in terms of MRI and SPECT/PET.
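As a minimal illustration of this data setup, the sketch below loads one registered MRI/SPECT pair as arrays; the file names are hypothetical placeholders for slices exported from the atlas, and the grayscale conversion is a simplification.

```python
import numpy as np
from PIL import Image

# Hypothetical file names for one registered pair; both modalities are
# assumed to be co-registered 256 x 256 slices exported from the atlas.
mri = np.asarray(Image.open("mri_slice.png").convert("L"), dtype=np.float64)
spect = np.asarray(Image.open("spect_slice.png").convert("L"), dtype=np.float64)
assert mri.shape == (256, 256) and spect.shape == (256, 256)

# Normalize to [0, 1] so that the fusion weights act on a common scale.
mri, spect = mri / 255.0, spect / 255.0
```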

Proposed multimodal image fusion model

Imaging technology in healthcare applications needs a huge amount of information, which creates an additional requirement for medical image fusion. Image fusion is further split into single-modal and multimodal fusion. Various researchers have focused on designing multi-modal image fusion owing to several complications in the information offered by single-modal fusion. Multimodal image fusion comprises both physiological and anatomical data, which makes disease detection easier. The various modalities in the medical field are SPECT, PET, MRI, CT, etc. They offer medical information regarding the human body's structural properties, such as soft tissue. Different imaging approaches preserve diverse characteristics of the same body part. Hence, the purpose of image fusion is to obtain a superior perceived experience, fusion quality, and contrast. Good image fusion results must satisfy constraints such as avoiding degradations like noise and misregistration in the images. In classical approaches, the fusion effects are enhanced, but issues including feature information extraction and color distortion are not addressed effectively. Thus, there is a need to utilize innovative and intelligent approaches for medical image fusion, which remains a major complication in this research area. Finally, it is concluded that there is a huge requirement for medical image fusion with multimodal medical images for obtaining better functional and structural information about the same part, so that the fused images become high-quality, information-preserving images. Consequently, the multimodal image fusion model is designed with intelligent approaches in this paper, which promotes medical image fusion. The visual representation of the developed model is depicted in Fig. 2.

Figure 2
figure 2

Architectural representation of multimodal image fusion with developed framework.

A new multimodal image fusion approach is recommended here, especially for the medical field, with the help of a multi-resolution transform combined with an optimization strategy. Firstly, the source medical images are collected from benchmark sources. The next process is to decompose both images using the ODTCWT to acquire the low and high frequency coefficients. Here, the filter coefficients of DTCWT are tuned using the recommended PF-HBSSO algorithm. The decomposition helps in distinguishing the frequencies of the images to get the texture as well as the smooth layer. The individual processing of both frequency parts helps in better preservation of the images. Next, the fusion of the high-frequency coefficients is performed by the adaptive weighted average fusion technique, where the weights are optimized using the same PF-HBSSO algorithm to achieve the optimal fused results. Consequently, the low-frequency coefficients are fused using the standard average fusion technique. At last, the fused image is retained using inverse ODTCWT to maximize the fused mutual information and ensure the quality of the fused images.

Generating low and high frequency coefficients by optimized dual-tree complex wavelet transform

Optimization concept by PF-HBSSO

In this proposed multimodal image fusion approach, a new heuristic algorithm is recommended with the adoption of two recently developed algorithms, SSA29 and HBA30. The suggested model uses this new PF-HBSSO algorithm for maximizing the performance of image fusion regarding the maximization of fused mutual information. The PF-HBSSO algorithm optimizes the filter coefficients in DT-CWT and also the weights used for the weighted average fusion of the high frequency coefficients. This innovation increases the efficiency of the designed model over other approaches, which is detailed in the result section. Here, HBA is chosen for performance enhancement owing to its vast range of features, such as its skill in sustaining the trade-off between the exploitation and exploration phases, good diversity, convergence speed, statistical significance in handling complex optimization problems, and utilization of dynamic search schemes. Conversely, it faces complications in handling local optima. Hence, SSA is adopted into this mechanism for better performance owing to its higher efficiency and its capability to produce better solutions faster, even for critical and high-dimensional optimization problems.

A new PF-HBSSO algorithm is recommended in this paper by modifying the random parameter \(b\) used in the HBA technique, where this random parameter \(b\) is assigned in the range [0, 1] in conventional HBA. In contrast, the same parameter \(b\) is implemented in PF-HBSSO by taking the probability of fitness-based solutions, as shown in Eq. (1).

$$b = \frac{\sqrt P }{\alpha }$$
(1)

Here, \(P\) denotes the population size and \(\alpha\) is specified as the ability of the individuals to reach food (\(\alpha\) ≥ 1), as given in traditional HBA. Moreover, in the recommended PF-HBSSO, \(\alpha\) is found by counting how many fitness values are less than the mean fitness, which yields a better reach of optimal solutions at a higher convergence rate.

Consequently, this new parameter \(b\) is used for updating the solutions from either HBA or SSA with the following conditions. If \(b < 0.5\) holds, the solution updating is carried out via the digging phase of HBA; otherwise, the solution updating is performed using SSA. Here, the solution updating in SSA is carried out by formulating case 1 with the condition of total flying squirrels. Hence, it produces a higher convergence rate and stays free from local optima with superior outcomes, which increases the efficiency of image fusion.
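For illustration, a minimal NumPy sketch of Eq. (1) and this switching rule is given below; the function and variable names are illustrative (the paper's implementation was in MATLAB).

```python
import numpy as np

def switch_parameter(fitness):
    """Eq. (1): b = sqrt(P) / alpha, where P is the population size and
    alpha counts the individuals whose fitness is below the mean (>= 1)."""
    P = len(fitness)
    alpha = max(1, int(np.sum(fitness < np.mean(fitness))))
    return np.sqrt(P) / alpha

# Switching rule inside one PF-HBSSO iteration (population of 10, as in
# the validation setting):
fitness = np.random.rand(10)
b = switch_parameter(fitness)
phase = "HBA digging phase" if b < 0.5 else "SSA case 1 update"
```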

Initially, the population of search individual is created based on HBA, as derived in Eq. (2).

$$z_{j} = l_{j} + b_{1} \left( {u_{j} - l_{j} } \right)$$
(2)

In Eq. (2), a random number in the range [0, 1] is specified as \(b_{1}\), the upper and lower bounds of the search range are termed \(u_{j}\) and \(l_{j}\), and the position of the \(j^{th}\) individual, referring to a candidate solution, is denoted \(z_{j}\). The whole population forms the matrix in Eq. (3), and the \(j^{th}\) position of a honey badger is specified in Eq. (4).

$$z = \begin{bmatrix} z_{11} & z_{12} & \cdots & z_{1S} \\ z_{21} & z_{22} & \cdots & z_{2S} \\ \vdots & \vdots & \ddots & \vdots \\ z_{P1} & z_{P2} & \cdots & z_{PS} \end{bmatrix}$$
(3)
$$z_{j} = \left[ {z_{j}^{1} ,z_{j}^{2} , \cdots ,z_{j}^{S} } \right]$$
(4)

The size of the population is denoted \(P\). In the next stage, the parameter \(b\) is computed by Eq. (1) and the condition \(b < 0.5\) is verified. While \(b < 0.5\) is satisfied, the solution updating is carried out with HBA.
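A sketch of the population initialization of Eq. (2) follows; the 80/2 split of the 82-gene chromosome into filter coefficients and fusion weights is an assumption for illustration.

```python
import numpy as np

def init_population(P, S, lower, upper, rng=None):
    """Eq. (2): z_j = l_j + b1 * (u_j - l_j); the rows form the matrix of Eq. (3)."""
    rng = np.random.default_rng() if rng is None else rng
    b1 = rng.random((P, S))              # random numbers in [0, 1]
    return lower + b1 * (upper - lower)  # shape (P, S)

# Assumed chromosome layout: 80 filter coefficients in [-20, 20] followed by
# the two fusion weights in [0, 1] (length 82, as in the validation setting).
lower = np.concatenate([np.full(80, -20.0), np.zeros(2)])
upper = np.concatenate([np.full(80, 20.0), np.ones(2)])
z = init_population(P=10, S=82, lower=lower, upper=upper)
```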

HBA has two phases for updating its solutions, namely the digging phase and the honey phase. However, the PF-HBSSO algorithm formulates only the digging phase; the honey phase is replaced by SSA.

The digging phase is executed while the honey badger moves in a cardioid shape, as shown in Eq. (5). Here, the position is updated through several movement patterns that help to sort out better solutions during the optimization. The updated position is generally suited for global optimization, in which the multivariate objective is maximized or minimized to find the optimal solution, helping to attain the desired and reliable outcomes. This position update mechanism helps to avoid premature convergence and maximizes the algorithm's efficiency and effectiveness.

$$z_{new} = z_{prey} + D \times \alpha \times It \times z_{prey} + D \times b_{3} \times \delta \times h_{j} \times \left| {\cos \left( {2\pi b_{4} } \right) \times \left[ {1 - \cos \left( {2\pi b_{5} } \right)} \right]} \right|$$
(5)
$$D = \left\{ {\begin{array}{*{20}c} 1 & {if\,b_{6} \le 0.5} \\ { - 1} & {else} \\ \end{array} } \right.$$
(6)
$$It = b_{2} \times \frac{CS}{{4\pi \cdot h_{j}^{2} }}$$
(7)
$$CS = \left( {y_{j} - y_{j + 1} } \right)^{2}$$
(8)
$$h_{j} = z_{prey} - z_{j}$$
(9)
$$\delta = G \times \exp \left( {\frac{ - i}{{i_{\max } }}} \right)$$
(10)

The variables \(b_{2}\), \(b_{3}\), \(b_{4}\), \(b_{5}\), and \(b_{6}\) denote random numbers in the interval [0, 1]. Also, the global prey position is denoted \(z_{prey}\), the flag utilized for modifying the search direction is \(D\), the term \(It\) specifies the smell intensity, the distance between the individual and the prey is \(h_{j}\), the concentration strength is \(CS\), the density factor is \(\delta\), and the constant term is \(G\), where \(i_{\max }\) denotes the maximum number of iterations and \(i\) the current iteration.
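A per-individual sketch of the digging-phase update (Eqs. 5-10) is given below; the constants alpha and G are illustrative defaults rather than values from this paper, and a small epsilon guards the division in Eq. (7).

```python
import numpy as np

def digging_phase(z_j, z_prey, y_j, y_next, i, i_max, alpha=2.0, G=6.0, rng=None):
    """HBA digging-phase update for one individual (Eqs. 5-10);
    y_j and y_next are consecutive fitness values."""
    rng = np.random.default_rng() if rng is None else rng
    b2, b3, b4, b5, b6 = rng.random(5)
    D = 1.0 if b6 <= 0.5 else -1.0                 # Eq. (6): direction flag
    h_j = z_prey - z_j                             # Eq. (9): distance to the prey
    CS = (y_j - y_next) ** 2                       # Eq. (8): concentration strength
    It = b2 * CS / (4.0 * np.pi * (np.dot(h_j, h_j) + 1e-12))  # Eq. (7): smell intensity
    delta = G * np.exp(-i / i_max)                 # Eq. (10): density factor
    # Eq. (5): cardioid-shaped movement around the prey position.
    return (z_prey + D * alpha * It * z_prey
            + D * b3 * delta * h_j
            * abs(np.cos(2 * np.pi * b4) * (1 - np.cos(2 * np.pi * b5))))
```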

Then, the proposed PF-HBSSO verifies the \(b < 0.5\) condition; if it is not satisfied, the solutions are updated by formulating case 1 of SSA with the condition of total flying squirrels.

The location of the search individuals is updated while they move from the acorn trees to the hickory nut tree, as derived in Eq. (11).

$$z_{act}^{new} = \left\{ {\begin{array}{*{20}c} {z_{act}^{old} + r_{G} \times K_{C} \times \left( {z_{hit}^{i} - z_{act}^{i} } \right)} & {if\,v_{1} \ge \rho_{dg} } \\ {random\,position} & {otherwise} \\ \end{array} } \right.$$
(11)
$$r_{G} = \frac{{y_{G} }}{\tan \left( \phi \right) \times o}$$
(12)
$$\tan \left( \phi \right) = \frac{Q}{X}$$
(13)
$$Q = \frac{1}{{2\chi e^{2} srCr_{dd} }}$$
(14)
$$X = \frac{1}{{2\chi e^{2} srCr_{ll} }}$$
(15)

In the aforementioned equations, the constant terms are correspondingly derived as \(\chi\), \(o\), \(y_{G}\), \(e\), \(sr\), \(Cr_{ll}\), and \(Cr_{dd}\); the lift force is represented by \(X\), \(Q\) specifies the drag force, the gliding angle is \(\tan \left( \phi \right)\), the random gliding distance is \(r_{G}\), the random function \(v_{1}\) is computed in the interval \(\left[ {0,1} \right]\), the gliding constant is \(K_{C}\), the position of the search individual that reached the hickory nut tree is \(z_{hit}^{i}\), and the position of the search individual on the acorn tree is \(z_{act}^{i}\). Moreover, the predator presence probability \(\rho_{dg}\) plays an essential role in updating the positions of individuals.
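A matching sketch of the SSA case 1 update (Eq. 11) is shown below; the predator probability, gliding constant, and gliding-distance range are typical SSA settings rather than values from this paper.

```python
import numpy as np

def ssa_acorn_update(z_act, z_hit, lower, upper, rho_dg=0.1, K_C=1.9, rng=None):
    """SSA case 1 (Eq. 11): glide towards the hickory nut tree, or relocate
    randomly when a predator is present (v1 < rho_dg)."""
    rng = np.random.default_rng() if rng is None else rng
    if rng.random() >= rho_dg:                 # v1 >= rho_dg: safe gliding
        r_G = rng.uniform(0.5, 1.11)           # illustrative stand-in for Eq. (12)
        return z_act + r_G * K_C * (z_hit - z_act)
    # Predator present: jump to a random position within the bounds.
    return lower + rng.random(z_act.shape) * (upper - lower)
```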

Finally, the search individuals are updated and the optimal solutions are attained for enhancing the efficiency of image fusion. Here, the new parameter update for integrating the two familiar algorithms gives better efficiency, as exhibited in the results.

The pseudo-code of PF-HBSSO is shown in Algorithm 1.

figure a

Algorithm 1: PF-HBSSO

Hybrid heuristic algorithms have exhibited significant promise in recent days and thus, this paper also recommends a hybrid algorithm to increase the efficiency of image fusion. The applications of the suggested PF-HBSSO model include solving unimodal, multimodal, and multi-dimensional optimization problems, system control, machine design, and engineering planning. The flowchart of the designed PF-HBSSO is visualized in Fig. 3.

Figure 3
figure 3

Flowchart of the designed PF-HBSSO algorithm.

Optimized DT-CWT-based image decomposition

In the recommended model, the decomposition of both MRI and SPECT/PET images is done using ODT-CWT, where the newly recommended PF-HBSSO optimizes the filter coefficients of DTCWT for obtaining better fusion effects that promote the medical diagnosis approach. The gathered images \(C_{a}\) are given to ODT-CWT, where the decomposition of the images yields the low frequency and high frequency coefficients.

DT-CWT31 is a well-established approach in image fusion, where masks are used for extracting information from the decomposed structure. It is an extended version of DWT, processed by executing two parallel trees. It is useful in eradicating aliasing effects and achieves shift invariance. It helps reveal visual sensitivity and comprises real and imaginary coefficient trees. The gathered images \(C_{a}\) are given to ODT-CWT for getting the low \(L\) and high \(H\) frequency coefficients, as in Eq. (16).

$$\left( {L,H} \right) = ODT - CWT\left( {C_{a} } \right)$$
(16)
$$L_{FU} = \phi_{L} \left( {L_{1} ,L_{2} } \right)$$
(17)
$$H_{FU} = \phi_{H} \left( {H_{1} ,H_{2} } \right)$$
(18)

Here, the fusion rules for high and low frequency coefficients are correspondingly known as \(\phi_{H}\), and \(\phi_{L}\), which are optimized by PF-HBSSO algorithm.

Finally, ODT-CWT offers ideal reconstruction over the traditional wavelet transform, yielding a better multimodal image fusion approach for medical images, as explained in the upcoming sections.
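As a sketch of this decomposition step, the open-source Python dtcwt package can produce the low- and high-frequency coefficients of Eq. (16); its default filter banks stand in here for the PF-HBSSO-optimized coefficients.

```python
import dtcwt  # open-source DT-CWT implementation

# Reusing the normalized mri/spect arrays loaded earlier.
transform = dtcwt.Transform2d()            # default near-symmetric/Q-shift filters
pyr_mri = transform.forward(mri, nlevels=3)
pyr_spect = transform.forward(spect, nlevels=3)

L1, H1 = pyr_mri.lowpass, pyr_mri.highpasses      # Eq. (16) for source 1
L2, H2 = pyr_spect.lowpass, pyr_spect.highpasses  # Eq. (16) for source 2
# Each highpass level holds 6 complex orientation subbands,
# e.g. H1[0].shape == (128, 128, 6) for a 256 x 256 input.
```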

The framework of ODT-CWT for image decomposition using recommended PF-HBSSO algorithm is expressed in Fig. 4.

Figure 4
figure 4

Illustration of ODT-CWT for image decomposition using recommended PF-HBSSO algorithm

High frequency and low frequency image fusion by proposed heuristic algorithm

Developed objective model

The implemented multimodal image fusion approach aims to improve the performance rate with the help of PF-HBSSO algorithm. Here, the PF-HBSSO algorithm is used for optimizing the frequency coefficients in DT-CWT and also the weights used for the weighted average fusion method for fusing the high frequency coefficients. This model considers the major goal as the maximization of fused mutual information as equated in Eq. (19).

$$OF = \mathop {\arg \max }\limits_{{\left\{ {\phi_{H} ,\phi_{L} ,W_{1} ,W_{2} } \right\}}} \left( {FMI} \right)$$
(19)

Here, the high and low frequency fusion rules \(\phi_{H}\) and \(\phi_{L}\) are optimized using the PF-HBSSO algorithm, and the weights used for the weighted average fusion method, represented as \(W_{1}\) and \(W_{2}\), are also optimized by PF-HBSSO. The range of \(\phi_{H}\) and \(\phi_{L}\) is assigned as [-20, 20] and \(W_{1}\), \(W_{2}\) are assigned in the range [0, 1]. The optimal tuning of the coefficients results in better image decomposition, whereas the weight optimization in the weighted average fusion increases the performance of the high frequency fusion method. The term \(FMI\) represents the fused mutual information, which is determined between the fused image and the source images, as derived here.

$$FMI = FMI_{IF} + FMI_{IS}$$
(20)
$$FMI_{IF} = \sum\limits_{ig = 0}^{Ys} {\sum\limits_{jg = 0}^{Qs} {gt_{{FiAi_{igjg} }} } } \log_{2} \left( {\frac{{gt_{{FiAi_{igjg} }} }}{{gt_{{Fi_{igjg} }} gt_{{Ai_{igjg} }} }}} \right)$$
(21)
$$FMI_{IS} = \sum\limits_{ig = 0}^{Ys} {\sum\limits_{jg = 0}^{Qs} {gt_{{GiAi_{igjg} }} } } \log_{2} \left( {\frac{{gt_{{GiAi_{igjg} }} }}{{gt_{{Gi_{igjg} }} gt_{{Ai_{igjg} }} }}} \right)$$
(22)

In the aforementioned equations, the joint histograms between the fused image and the two source images are correspondingly specified as \(gt_{{FiAi_{igjg} }}\) and \(gt_{{GiAi_{igjg} }}\), the column and row sizes of the image are correspondingly \(Qs\) and \(Ys\), and \(gt_{{Fi_{igjg} }}\), \(gt_{{Gi_{igjg} }}\), and \(gt_{{Ai_{igjg} }}\) accordingly denote the normalized histograms of source image 1, source image 2, and the fused image. A higher mutual information value represents superior quality of the fused images.
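A minimal NumPy sketch of this objective, computing the mutual information from joint histograms as in Eqs. (20)-(22), is given below; the bin count and function names are illustrative.

```python
import numpy as np

def mutual_information(img_a, img_b, bins=256):
    """MI of two images from their joint histogram (the form of Eqs. 21-22)."""
    joint, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(), bins=bins)
    gt_ab = joint / joint.sum()                  # joint probabilities
    gt_a = gt_ab.sum(axis=1, keepdims=True)      # marginal of img_a
    gt_b = gt_ab.sum(axis=0, keepdims=True)      # marginal of img_b
    nz = gt_ab > 0                               # skip log2(0) terms
    return float(np.sum(gt_ab[nz] * np.log2(gt_ab[nz] / (gt_a @ gt_b)[nz])))

def fused_mutual_information(src1, src2, fused):
    """Eq. (20): FMI = MI(fused, source 1) + MI(fused, source 2); PF-HBSSO
    searches the coefficients and weights that maximize this value (Eq. 19)."""
    return mutual_information(fused, src1) + mutual_information(fused, src2)
```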

High frequency optimization by adaptive weighted average fusion

The recommended model obtains the high frequency coefficients of the two different imaging modalities using ODT-CWT. These coefficients are fused using the adaptive weighted average fusion strategy, which is modeled in Eq. (23).

$$HBi = W_{1} \times Fi_{H} + W_{2} \times Bi_{H}$$
(23)

Here, the fused high frequency coefficients are known as \(HBi\), the weights utilized for fusing the high frequency coefficients of the images are correspondingly specified as \(W_{1}\) and \(W_{2}\), which are optimized by the PF-HBSSO algorithm in the range [0, 1], and the high frequency coefficients of the two source images are given as \(Fi_{H}\) and \(Bi_{H}\). The high frequency coefficients are fused via this recommended adaptive weighted average fusion scheme by optimizing the weights utilized for the fusion process. The high frequency coefficients carry the edge information, so their fusion increases the image fusion quality; the superiority of this frequency information must be maintained to increase the contrast in the final fused images.
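Applied to the DT-CWT pyramids from the decomposition sketch, Eq. (23) amounts to a weighted sum over every level and orientation subband; the weights are fixed here for illustration, whereas PF-HBSSO would search them in [0, 1].

```python
# Eq. (23) on the highpass pyramids H1, H2 of the two sources.
W1, W2 = 0.6, 0.4   # illustrative values; optimized by PF-HBSSO in the model
H_fused = tuple(W1 * h1 + W2 * h2 for h1, h2 in zip(H1, H2))
```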

Finally, the fused high frequency coefficients of the two source images are attained as \(HBi\), which are further given to the reconstruction process to get the final fused images. A sample representation of the adaptive weighted average fusion for fusing the high frequency images is given in Fig. 5.

Figure 5
figure 5

Fusion process of high frequency by adaptive weighted fusion model.

Low frequency optimization by average fusion

In this suggested image fusion model, average fusion is performed on the low frequency coefficients of the two source images, which preserves the local information and thereby improves the image fusion. This process is derived in Eq. (24).

$$LBi = \frac{{\left( {Fi_{L} + Bi_{L} } \right)}}{2}$$
(24)

The average fusion method takes the average of the two different modal source images. Averaging is the simplest technique to implement, where the average of the corresponding pixels from the input low-frequency coefficients of the medical images is taken as the intensity of the output pixel. The averaging operation is useful in reducing the bad information and enhancing the good information from the images by taking a mean image. Although this approach is not prominent in image fusion, it is helpful in fusing the low-frequency coefficients. The final fused low frequency coefficients of the two source images are attained as \(LBi\). Figure 6 represents the average fusion-based low frequency fusion model.

Figure 6
figure 6

Fusion process of low frequency by average fusion model.

Image reconstruction by inverse ODT-CWT

In this designed multimodal image fusion approach, the final fused images are attained using inverse ODT-CWT from both MRI and SPECT/PET images, which is derived in Eq. (25).

$$Fu_{a} = IODT - CWT\left( {HBi,LBi} \right)$$
(25)

Here, the final fused images \(Fu_{a}\) are attained using inverse ODT-CWT by combining both the low frequency fused coefficients \(LBi\) and the high frequency fused coefficients \(HBi\). At last, the final fused images are attained via the image reconstruction stage, ensuring their higher quality regarding the maximization of fused mutual information.
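Continuing the earlier sketches, the fused low-frequency coefficients of Eq. (24) and the inverse transform of Eq. (25) can be illustrated with the same dtcwt objects.

```python
import dtcwt
import numpy as np

L_fused = (L1 + L2) / 2.0                          # Eq. (24): average fusion
# Eq. (25): rebuild a pyramid from the fused coefficients and invert it.
fused = transform.inverse(dtcwt.Pyramid(L_fused, H_fused))
fused = np.clip(fused, 0.0, 1.0)                   # keep intensities in range
```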

Experimental analysis

Validation setting

The recommended multimodal image fusion framework was implemented in MATLAB 2020a and evaluated with different quantitative measures. It was compared against heuristic techniques like the Dragonfly Algorithm (DA)32, Grey Wolf Optimizer (GWO)33, HBA30, and SSA29, and traditional transform approaches such as PCA34, DWT35, IHS36, DCT37, CWT38, NSCT39, and DT-CWT28. The experimentation used a population size of 10, a chromosome length of 82, and a maximum iteration count of 10. Recent methods like the Spatial-Frequency Information Integration Network (SFINet)40, Channel Attention dual adversarial Balancing network (CABnet)41, DUSMIF42, and Dense-ResNet43 are also compared with the developed model.

Validation metrics

  1. (a)

    SSIM: The SSIM measure helps to evaluate the local patterns of different pixel intensities. It is formulated in Eq. (26).

    $$SSIM\left( {RC_{a} ,Fu_{a} } \right) = \frac{{\left( {2\mu_{{RC_{a} }} \mu_{{Fu_{a} }} + Vn_{1} } \right)\left( {2\sigma_{{RC_{a} Fu_{a} }} + Vn_{2} } \right)}}{{\left( {\mu_{{RC_{a} }}^{2} + \mu_{{Fu_{a} }}^{2} + Vn_{1} } \right)\left( {\sigma_{{RC_{a} }}^{2} + \sigma_{{Fu_{a} }}^{2} + Vn_{2} } \right)}}$$
    (26)

    Here, the constants are represented as \(Vn_{1}\) and \(Vn_{2}\); SSIM is measured between the two images \(\left( {RC_{a} ,Fu_{a} } \right)\); the average of \(Fu_{a}\) is termed \(\mu_{{Fu_{a} }}\) and the average of \(RC_{a}\) is termed \(\mu_{{RC_{a} }}\). The covariance of \(RC_{a}\) and \(Fu_{a}\) is termed \(\sigma_{{RC_{a} Fu_{a} }}\), and the variances of \(RC_{a}\) and \(Fu_{a}\) are \(\sigma_{{RC_{a} }}^{2}\) and \(\sigma_{{Fu_{a} }}^{2}\), respectively.

  2. (b)

    BRISQUE: The no-reference quality score of the fused image \(Fu_{a}\) is computed as score = brisque(\(Fu_{a}\)).

  3. (c)

    Entropy: It is used to measure the information content of a fused image. A high entropy value indicates that the fused image has rich information content; it is given in Eq. (27) and implemented in the sketch following this list.

    $$Ent = - \sum\limits_{ig = 0}^{Ng} {hh_{{Fu_{a} }} } \left( {ig} \right)\log_{2} hh_{{Fu_{a} }} \left( {ig} \right)$$
    (27)

    Here, the term \(hh_{{Fu_{a} }}\) is specified as the probability of the gray levels of the fused image.

  4. (d)

    PSNR: The PSNR is expressed in Eq. (28).

    $$PSNR = 20\log_{10} \left( {\frac{Ng}{RMSE}} \right)$$
    (28)

    Here, the number of gray levels is denoted as \(Ng\).

  5. (e)

    RMSE: It is formulated in Eq. (29).

    $$RMSE = \sqrt {\frac{1}{Ys \times Qs}\sum\limits_{ig = 1}^{Ys} {\sum\limits_{jg = 1}^{Qs} {\left( {RC_{a} \left( {ig,jg} \right) - Fu_{a} \left( {ig,jg} \right)} \right)^{2} } } }$$
    (29)

    Here, \(RC_{a}\) is the reference image, \(Fu_{a}\) is the fused image, and the intensity values of the reference and fused images are accordingly represented as \(RC_{a} \left( {ig,jg} \right)\) and \(Fu_{a} \left( {ig,jg} \right)\).

  6. (f)

    Standard Deviation: It is used for measuring the fusion performance, where a larger standard deviation indicates better fusion results. The STD is described in Eq. (30).

    $$STD = \left( {\frac{1}{Ys \times Qs}\sum\limits_{ig = 1}^{Ys} {\sum\limits_{jg = 1}^{Qs} {\left( {Fu_{a} \left( {ig,jg} \right) - \hat{\mu }} \right)^{2} } } } \right)^{\frac{1}{2}}$$
    (30)

    In Eq. (30), the mean value of the image is given as \(\hat{\mu }\).
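The scalar metrics above can be sketched in NumPy as follows, assuming images normalized to [0, 1]; this is an illustration rather than the exact evaluation code (the experiments were run in MATLAB, where brisque() and ssim() are available directly).

```python
import numpy as np

def entropy(fused, bins=256):
    """Eq. (27): Shannon entropy of the fused image's gray-level histogram."""
    hh, _ = np.histogram(fused, bins=bins, range=(0.0, 1.0))
    p = hh / hh.sum()
    p = p[p > 0]                                 # drop empty bins
    return float(-np.sum(p * np.log2(p)))

def rmse(ref, fused):
    """Eq. (29): root mean squared error against the reference image."""
    return float(np.sqrt(np.mean((ref - fused) ** 2)))

def psnr(ref, fused, peak=1.0):
    """Eq. (28) with the peak gray level (1.0 for normalized images)."""
    return float(20.0 * np.log10(peak / rmse(ref, fused)))

def std_dev(fused):
    """Eq. (30): standard deviation of intensities around the image mean."""
    return float(np.sqrt(np.mean((fused - fused.mean()) ** 2)))
```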

Experimental images

Some of the resultant images attained using the different techniques and the proposed model are shown in Fig. 7. Recent image fusion techniques like the Unified and Unsupervised end-to-end image fusion network (U2Fusion)37, the Information Gate Network for multimodal medical image fusion (IGNFusion)38, and the Fast, Lightweight Image Fusion Network (FLFuse-Net) are compared in the image fusion model. This resultant analysis shows that the developed model attains effective outcomes.

Figure 7
figure 7

Resultant images attained using the existing methods and the proposed model.

Estimation over heuristic approaches

The effectiveness of the achieved fused images is analyzed over various techniques, as given in Fig. 8. From the evaluation, it is clearly shown that the designed model has exhibited higher performance than traditional approaches. For example, while comparing with the recent heuristic algorithms, the recommended ODT-CWT-based PF-HBSSO algorithm shows superior effectiveness over the traditional methods.

Figure 8
figure 8

Efficiency analysis on designed model over heuristic approaches for (a) Fused mutual information, (b) Standard deviation, (c) Entropy, (d) Brisque, (e) SSIM, (f) PSNR, (g) RMSE and (h) SNR.

Estimation over transform approaches

The effectiveness of the designed image fusion model is estimated over traditional transform domain approaches as listed in Fig. 9. The final fused images are compared over conventional approaches using standard statistical measures to illustrate the efficiency of the multi-modal image fusion approach.

Figure 9
figure 9

Efficiency analysis on designed model over transform domain approaches for (a) Fused mutual information, (b) Standard deviation, (c) Entropy, (d) Brisque, (e) SSIM, (f) PSNR, (g) RMSE and (h) SNR.

Comparative estimation of image fusion over heuristic algorithms

The comparative estimation of the designed image fusion over various optimization algorithms is given in terms of various performance metrics in Tables 2, 3, 4, 5, 6, 7, 8 and 9. The investigation clearly exhibits superior performance on both the positive measures and the error measures. The positive measures show superior performance regarding image fusion quality, whereas the error measures indicate performance enhancement through lower error rates in image fusion. Hence, the designed model exhibits better performance and is thus more applicable for medical applications.

Table 2 Efficiency estimation on multi-modal image fusion over heuristic algorithms for fused mutual information.
Table 3 Efficiency estimation on multi-modal image fusion over heuristic algorithms for standard deviation.
Table 4 Efficiency estimation on multi-modal image fusion over heuristic algorithms for entropy.
Table 5 Efficiency estimation on multi-modal image fusion over heuristic algorithms for Brisque.
Table 6 Efficiency estimation on multi-modal image fusion over heuristic algorithms for SSIM.
Table 7 Efficiency estimation on multi-modal image fusion over heuristic algorithms for PSNR.
Table 8 Efficiency estimation on multi-modal image fusion over heuristic algorithms for RMSE.
Table 9 Efficiency estimation on multi-modal image fusion over heuristic algorithms for SNR.

Comparative estimation on image fusion over transform algorithms

The comparative analysis of the designed image fusion approach over various transform approaches for various performance metrics is given in Tables 10, 11, 12, 13, 14, 15, 16 and 17. By analyzing the values, the designed fusion model exhibits better performance and clearly shows its superiority over the traditional methods.

Table 10 Efficiency estimation on multi-modal image fusion over transform algorithms for fused mutual information.
Table 11 Efficiency estimation on multi-modal image fusion over transform algorithms for standard deviation.
Table 12 Efficiency estimation on multi-modal image fusion over transform algorithms for entropy.
Table 13 Efficiency estimation on multi-modal image fusion over transform algorithms for Brisque.
Table 14 Efficiency estimation on multi-modal image fusion over transform algorithms for SSIM.
Table 15 Efficiency estimation on multi-modal image fusion over transform algorithms for PSNR.
Table 16 Efficiency estimation on multi-modal image fusion over transform algorithms for RMSE.
Table 17 Efficiency estimation on multi-modal image fusion over transform algorithms for SNR.

Comparative analysis of the developed model with recent methods

The comparative analysis of the developed model against recent methods in the multimodal image fusion approach is shown in Table 18. This table analysis is performed on the entropy measure, which helps to show the potential of the designed framework. Recent methods like SFINet, CABnet, DUSMIF, and Dense-ResNet are compared over different images. In this experimental evaluation, the developed PF-HBSSO-DT-CWT model shows values 29.2%, 29.13%, 26.25%, and 30.9% lower than SFINet, CABnet, DUSMIF, and Dense-ResNet when validating image 6. The performance evaluation shows that the developed model attains a lower entropy value, which helps reduce the dimensionality problem that occurs between the data points. Overall, the developed model yields better data-quality outcomes for making reliable predictions in the multimodal image fusion approach.

Table 18 Performance analysis of multimodal image fusion model with recent methods.

Conclusion

A multimodal image fusion model was recommended in this paper via heuristic-derived transform approaches. Initially, the medical source images were acquired from a standard public dataset, and then decomposition was done using the ODTCWT to acquire the low frequency and high frequency coefficients. Here, the filter coefficients of DTCWT were tuned with PF-HBSSO to enhance the fusion quality. Then, the fusion of the high-frequency coefficients was performed with the adaptive weighted average fusion technique, where the weights were optimized using the same PF-HBSSO algorithm to achieve the optimal fused results. Similarly, the low-frequency coefficients were fused by average fusion. Finally, the fused coefficients were given to inverse ODTCWT for image reconstruction. The experiments have demonstrated that the recommended multimodal image fusion method attains superior efficiency over the conventional image fusion approaches. For example, the SNR of the designed PF-HBSSO-DTCWT achieved a higher rate than the traditional approaches for image 8, being 35.2%, 36.5%, 72.9%, 27.7%, 51.6%, 53%, and 51% higher than PCA, DWT, IHS, DCT, CWT, NSCT, and DT-CWT. However, the proposed model has to be improved in the future in terms of statistical analysis to eradicate the performance overlap with the traditional heuristic algorithms.