Introduction

Digitalization has transformed healthcare through the wider use of medical imaging technology and electronic health records. Medical imaging, which spans modalities such as X-ray, MRI, CT, and ultrasound, has become an indispensable part of modern diagnostics and supports accurate, evidence-based clinical decisions. However, the growing reliance on the internet and cloud servers to store and distribute such sensitive data introduces new security threats, including unauthorized access, data alteration, and loss of intellectual property. As a result, data integrity, authenticity, and privacy protection are central concerns for healthcare institutions across the globe1,2. Several techniques have been developed to secure medical data, including cryptography, steganography, and digital watermarking. Cryptography primarily protects data during transmission through encryption, while steganography conceals information within other media to prevent its detection. However, neither technique guarantees content-level integrity of medical images once they have been decrypted or accessed. Digital watermarking, in contrast, embeds authentication or ownership information directly into the medical image itself, allowing verification of integrity and origin, as well as tamper detection, without requiring the original image. Therefore, this study adopts a watermarking-based approach to achieve both authentication and integrity preservation within the medical imaging workflow.

Digital watermarking has been proposed as a viable solution to the security problem in computational medical imaging because it enables the insertion of invisible data, such as patient identification information, a timestamp, or a hospital logo, into medical images. The embedded data acts as a covert signature that can be used to establish ownership, detect tampering, and maintain the authenticity of medical content during storage and transmission. Traditional watermarking methods, which usually operate in the spatial domain or in transform domains such as DCT, DWT, and SVD, have been shown to provide reasonable trade-offs between imperceptibility and robustness under multiple image manipulations3.

However, these classical methods often lack adaptability to modern adversarial attacks and are not easily scalable to large-scale or real-time applications. Consequently, deep learning techniques, particularly CNNs, have gained traction in the watermarking domain due to their powerful feature learning capabilities and superior resilience to diverse image transformations. CNNs can effectively learn spatial correlations and encode watermarks in complex, high-dimensional feature spaces, making the watermark more robust and imperceptible under compression, noise, and geometric attacks4. Spatial-domain methods operate directly on pixel intensities through filtering, contrast adjustment, or morphological operations. They are simple, intuitive, and effective for local enhancement and noise removal. However, they are highly sensitive to noise, lack robustness against geometric or compression attacks, and may distort image quality during watermark embedding. Frequency-domain methods, such as DWT, DCT, or FFT, analyze global image characteristics like texture and periodicity, providing better robustness to noise, compression, and geometric transformations. Yet, they can lose spatial precision and may introduce artifacts after inverse transformation. Hybrid spatial/frequency approaches integrate the strengths of both domains, preserving spatial details while maintaining high robustness and imperceptibility.

Moreover, these methods are especially effective in medical image watermarking, where maintaining diagnostic quality and resistance to attacks are equally crucial. Recent advances have demonstrated the significant potential of deep learning for enhancing watermark security in medical images. A. R and Malathy proposed a multi-phase watermarking approach utilizing deep learning to embed both visible and invisible watermarks in chest X-ray images, achieving a PSNR as high as 46 dB, which indicates minimal distortion of the host image5. Similarly, Arevalo-Ancona et al. introduced a zero-watermarking strategy that uses a context encoder to extract essential features from medical images and merge them with watermark information, enabling robust authentication without altering the original pixel values6. Another crucial application of watermarking in healthcare is protecting the intellectual property of deep learning models: with the increased deployment of AI in medical diagnostics, ensuring model ownership has become vital. Researchers such as Zhang et al. and Gupta et al. have explored techniques to embed watermarks directly into training data or model weights, enabling reliable ownership verification and resistance to model extraction attacks7,8. While not specific to medical imaging, such methods are increasingly being adapted to healthcare AI systems. A notable challenge in medical watermarking is maintaining diagnostic quality: since even slight alterations to medical images can lead to incorrect diagnoses, watermarking schemes must preserve high image fidelity.

Zero-watermarking techniques, which do not embed the watermark into the image but rely on extracted features for verification, have been proposed to address this concern9. These methods often leverage hybrid architectures, combining traditional transforms such as NSST and SVD with deep learning feature extractors such as AlexNet to balance security and diagnostic precision. Furthermore, adversarial considerations are becoming increasingly relevant. Apostolidis and Papakostas highlighted that watermarking can paradoxically act as an adversarial attack, significantly degrading deep learning model performance when improperly applied; their study demonstrated over 50% performance degradation in DenseNet and MobileNet models tested on watermarked CT, MRI, and X-ray images10. This underscores the importance of designing watermarking systems that are not only secure but also adversarially robust. Watermarking strategies have also diversified in terms of content. Beyond embedding patient identifiers, researchers encode QR codes, hospital logos, and even biometric markers into medical images to strengthen authentication protocols. Zhang, Gao, and Yang developed a GAN-based algorithm that embeds patient-specific QR codes into images with extraction accuracy above 99%, further enhancing privacy protection11. Such innovations align with the growing regulatory emphasis on patient confidentiality and data traceability under regulations such as HIPAA and GDPR.

In addition, integrating deep learning with hardware acceleration has been explored to enable real-time watermarking for low-latency applications in telemedicine and on edge devices. Lee et al. proposed a dedicated ASIC-based processor that implements deep watermarking algorithms for high-resolution images, emphasizing energy efficiency and speed12. As remote diagnostics and mobile health devices become ubiquitous, such developments will be critical for scaling secure watermarking infrastructures. Survey literature continues to document the rapid evolution of watermarking technologies. Luo, Tan, and Cai provided a comprehensive overview of deep watermarking architectures, highlighting the encoder-decoder structures that dominate recent models and their robustness across multiple attack vectors13. Similarly, Singh and Singh cataloged many learning-based watermarking strategies, pointing out the growing role of adversarial training and dual-domain embedding in enhancing resilience14. Despite substantial progress, several research challenges remain. Ensuring watermark robustness under extreme noise and compression, maintaining imperceptibility in high-resolution diagnostic images, and enabling fast extraction on constrained devices are active areas of exploration. Moreover, as watermarking systems become more intelligent, methods for evading them are also advancing, prompting a parallel rise in studies on watermark attack and defense mechanisms15. The main contributions of our work are summarized as follows:

  1. The proposed methodology introduces a hybrid embedding scheme that leverages both DWT and DCT to imperceptibly embed two distinct watermarks, a binary hospital logo and a structured QR code, into medical images. This integration achieves high imperceptibility and improved frequency-domain robustness, providing strong resistance to compression and filtering attacks.

  2. A novel lightweight CNN decoder is developed to extract embedded watermarks efficiently, even under geometric and signal-based distortions. Despite its compact design, the decoder achieves high reconstruction accuracy, making it suitable for real-time and resource-constrained medical environments.

  3. An optimized loss function combining MSE, SSIM, and Sobel edge loss is designed to preserve both pixel-level and structural fidelity during extraction. This hybrid loss facilitates accurate recovery of visual and edge details, enhancing watermark resilience against noise, tampering, and adversarial distortions.

The novelty of the proposed scheme lies in its integrated hybrid watermarking framework, which combines DWT-DCT frequency embedding with a lightweight CNN-based decoder, a configuration rarely explored in medical image authentication. Unlike traditional watermarking methods that rely solely on spatial or frequency operations, this approach combines transform-domain robustness with deep learning-based adaptive extraction, enabling accurate watermark recovery even under severe geometric and signal degradations. Furthermore, a customized loss function incorporating MSE, SSIM, and Sobel edge loss ensures perceptual fidelity and structural preservation, addressing a key limitation of prior works. In this work, we explore a multi-watermarking scheme for medical images. The remainder of this paper is structured as follows. Section II reviews basic concepts and related work, including existing watermarking schemes, watermarking schemes for medical image authentication, and deep learning-based extraction methods. Section III presents the proposed framework, including the design of the QR code-based and hospital logo-based watermarks, DWT-DCT embedding, and extraction using a lightweight CNN. Section IV describes the experimental setup, datasets, evaluation metrics, and results, including robustness, imperceptibility, and ablation tests. Section V concludes the paper and outlines future work.

Related work

Medical image watermarking has been an active area of research for over two decades, driven by the critical need to ensure data integrity, authenticity, and secure transmission of diagnostic images. Over time, researchers have explored various techniques combining transform-based and deep learning methods to improve watermark robustness, imperceptibility, and resilience against attacks. One of the foundational methods for embedding watermarks in medical images involves the use of DWT and DCT16. These two transforms are widely appreciated for their ability to distribute watermark information across frequency and spatial domains, and their combination provides a favorable trade-off between imperceptibility and robustness, especially for diagnostic imaging, where visual fidelity is paramount. A recent study by Alia et al. used a hybrid DWT-DCT method combined with Particle Swarm Optimization (PSO) to optimize embedding locations, achieving high PSNR and Normalized Cross-Correlation (NCC) values even under denoising attacks17. Building on such transform-based schemes, several researchers have proposed integrating deep learning models, particularly CNNs, to improve watermark extraction accuracy under complex distortions18. In one notable work, Sharma and Chandrasekaran combined a 3-level DWT decomposition with a CNN-based blind extraction mechanism, demonstrating improved robustness against both adversarial and image processing attacks. The authors emphasized that their CNN model could extract full-size watermark images under increased payload without training on each individual image, a crucial advantage in dynamic medical applications19.

While CNNs offer robustness, model complexity and inference time remain barriers to deploying these solutions on edge devices or in real-time clinical settings. This has prompted exploration into lightweight CNN architectures that preserve performance while reducing computational burden. Yang proposed a lightweight model for medical image classification using DCT-based frequency features and a small artificial neural network, which delivered results close to conventional CNNs while being 15 times faster in training20. Although focused on classification, the efficiency gains shown are promising for watermark extraction tasks, especially in low-resource environments. Hybrid schemes that fuse deep networks with classic transforms are also gaining traction. Nawaz et al. proposed a zero-watermarking scheme combining DWT, DCT, and ResNet101, a deep feature extractor. Their method uses perceptual hashing and logical key encryption to create a feature vector that is XORed with the encrypted watermark. Despite its robustness to geometric and conventional attacks, the deep backbone ResNet101 introduces computational overhead, making it less ideal for lightweight, embedded applications21. Closer to our work, an innovative approach was introduced by Bagheri et al., where Mask R-CNN was employed to determine optimal embedding locations for watermarks in sub-blocks of medical images.

The model guided DWT-DCT embedding based on region-of-interest analysis, significantly improving robustness and transparency. Although not lightweight, this work underscores the value of semantic guidance in watermark placement22. Another relevant work by R and Malathy presents a deep neural network architecture for watermarking medical images in visible and invisible formats. Using DWT for embedding and optimizing with Adam and SGD, they achieved strong PSNR scores and an extraction accuracy of 91.43% under noise attacks. Although the network is not explicitly lightweight, its integration of classic transforms and deep learning aligns well with the goal of robust watermarking in medical contexts5. A comparative study by Wong et al. further evaluated hybrid methods such as DWT-DCT-SVD against deep learning techniques such as RivaGAN. Their analysis concluded that, while deep learning models like RivaGAN are robust under attack, traditional hybrid methods still outperform them in imperceptibility, which is especially vital for medical image integrity. These findings support the rationale behind our hybrid method, which retains transform-domain imperceptibility while leveraging deep learning for extraction23. Recent literature has also begun to evaluate how watermarking affects downstream tasks such as classification. For instance, Mohammed et al. developed a lightweight CNN for medical image classification and then tested the influence of watermarking on model performance using RDWT and Arnold Cat Map encryption. Their results showed minimal accuracy loss, reinforcing the viability of integrating watermarking into AI workflows without compromising clinical performance24. Chacko and Chacko proposed a DCT-based watermarking method enhanced by a custom CNN and Harris Hawks Optimization (HHO) for selecting embedding regions. Their model demonstrated high imperceptibility and robustness, achieving a PSNR of 46 dB and strong resilience against noise and cropping attacks25. 
This work aligns well with the goal of high-fidelity watermark embedding guided by lightweight deep models. A detailed comparison of existing medical image watermarking techniques is presented in Table 1, highlighting their methodologies, advantages, and limitations relative to the proposed framework.

Table 1 Comparative analysis of medical watermarking techniques.

Proposed methodology

The proposed methodology introduces a dual-watermarking framework designed to ensure medical image authenticity and integrity while maintaining diagnostic quality. The system embeds both a machine-readable QR code and a hospital logo watermark using a hybrid DWT-DCT embedding process, followed by extraction through a lightweight CNN decoder trained with an enhanced loss function. This integration of frequency-domain embedding and deep learning-based extraction provides high imperceptibility, robustness, and real-time efficiency, making it well suited to secure medical imaging applications. The developed model consists of three main phases:

Phase 1: The medical image is embedded with a watermark, consisting of a hospital logo and QR code, utilizing a hybrid DWT and DCT approach to produce a watermarked image.

Phase 2: Attacks, including Gaussian noise, salt-and-pepper noise, and JPEG compression, are applied to the watermarked images to simulate third-party tampering or common distortions.

Phase 3: The watermarks are extracted using a lightweight CNN with a U-Net-style architecture, recovering the original watermark for evaluation.
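Phase 2 can be illustrated with a minimal NumPy sketch of two of the listed distortions. The noise parameters (`sigma`, `density`) are hypothetical defaults, not values reported in this study, and JPEG compression is omitted because it requires an image codec (for example, Pillow's JPEG encoder).

```python
import numpy as np

def gaussian_noise(img, sigma=10.0, rng=None):
    """Add zero-mean Gaussian noise (std `sigma`, hypothetical default) and clip to [0, 255]."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = img.astype(np.float64) + rng.normal(0.0, sigma, size=img.shape)
    return np.clip(noisy, 0.0, 255.0)

def salt_and_pepper(img, density=0.05, rng=None):
    """Set a fraction `density` of pixels to 0 (pepper) or 255 (salt), half each on average."""
    rng = np.random.default_rng() if rng is None else rng
    out = img.astype(np.float64).copy()
    u = rng.random(img.shape)
    out[u < density / 2] = 0.0                       # pepper
    out[(u >= density / 2) & (u < density)] = 255.0  # salt
    return out

# Usage: attack a flat mid-grey test image with a fixed seed.
rng = np.random.default_rng(0)
cover = np.full((256, 256), 128.0)
noisy = gaussian_noise(cover, sigma=10.0, rng=rng)
speckled = salt_and_pepper(cover, density=0.05, rng=rng)
print(float(np.std(noisy - cover)), float(np.mean(speckled != cover)))
```

Working in floating point and clipping only at the end keeps the attacked image in the same [0, 255] range as the watermarked input, so it can be fed directly to the extractor.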

Watermarks are embedded into the images using a semi-fragile watermarking approach. Rather than a single-domain embedding, a hybrid DWT + DCT method is adopted to ensure robustness against minor distortions while maintaining sensitivity to tampering. Figure 1 illustrates the watermark embedding phase of this research. Embedding is performed as a preprocessing step, while the CNN model handles extraction. The embedding process includes the following steps:

Step 1: Selection of a medical image as input and conversion to grayscale if needed.

Step 2: Selection of a hospital logo and QR code watermark for embedding.

Step 3: Application of a 1-level DWT (Haar wavelet) to decompose the image into subbands (LL, LH, HL, HH); the watermarks are embedded into the DCT coefficients of the HL subband (hospital logo) and the LH subband (QR code) using a scaling factor \(\alpha = 0.1\).

Step 4: Reconstruction of the watermarked image using inverse DCT and inverse DWT, ensuring imperceptibility.

Step 5: Watermarked images are stored in designated folders for training and testing.

A lightweight CNN model, structured as an encoder-decoder (U-Net-style), is designed for watermark extraction from the watermarked image. The CNN takes the watermarked image as input and predicts the original watermark as output. The model is trained to identify the watermark pattern despite noise or distortions, leveraging a custom loss function combining MSE, SSIM, and Sobel edge loss to ensure high fidelity. The encoder processes the watermarked image through convolutional layers to extract features, while skip connections preserve spatial information. The decoder reconstructs the watermark through upsampling and convolutional layers. The watermarked image, after being subjected to noise attacks, is passed through the CNN for extraction. The extracted watermark is then compared to the original using metrics such as PSNR, SSIM, and optionally BER, as shown in Fig. 1.
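The quality measures named here, together with the normalized correlation (NC) used when tuning the embedding factor, can be sketched in NumPy as follows. The NC formula is one common definition and the 0.5 binarization threshold is an assumption; neither is specified in the text.

```python
import numpy as np

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB; returns inf for identical inputs."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(peak ** 2 / mse))

def ber(reference_bits, extracted_bits):
    """Bit error rate: fraction of positions where two binary watermarks disagree."""
    return float(np.mean(reference_bits.astype(bool) != extracted_bits.astype(bool)))

def nc(reference, extracted):
    """Normalized correlation between two watermarks (one common definition)."""
    a = reference.astype(np.float64).ravel()
    b = extracted.astype(np.float64).ravel()
    return float(np.dot(a, b) / np.sqrt(np.dot(a, a) * np.dot(b, b)))

# Usage: a hypothetical decoder output in [0, 1], binarized at 0.5 before computing BER.
rng = np.random.default_rng(1)
wm = (rng.random((64, 64)) > 0.5).astype(np.float64)
decoded = np.clip(wm + rng.normal(0.0, 0.1, wm.shape), 0.0, 1.0)
print(psnr(wm, decoded, peak=1.0), ber(wm, decoded > 0.5), nc(wm, decoded))
```

For watermarks scaled to [0, 1], `peak=1.0` must be passed to PSNR; using 255 would inflate the score by about 48 dB.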

Fig. 1 Flowchart of the embedding procedure.

Implementation and performance metrics

QR code preparation

Our methodology employs a dual watermarking approach, integrating a hospital logo for institutional identity verification and a QR code that encodes structured metadata such as patient information. The logo ensures visual authenticity, while the QR code enables robust, machine-readable data retrieval. This dual approach enhances security, traceability, and compatibility with medical imaging systems. A QR code is a 2D matrix barcode whose data are protected by Reed–Solomon error correction based on finite-field polynomial algebra. Let the structured input be the string M = "ID:00786;DATE:2025-03-05;MODALITY: MRI". This message is encoded into a binary data stream using a mode indicator, a character count indicator, and alphanumeric-mode encoding. In alphanumeric mode, characters are encoded in pairs using the following mapping:

$$\text{CHAR}(i) = 45 \cdot V_1 + V_2$$
(1)

where \(V_1\) and \(V_2\) are the integer values (0 to 44) of the two characters in the pair, taken from the alphanumeric table defined by the QR Code standard ISO/IEC 18004, and 45 is the base of alphanumeric encoding. Each pair is written as an 11-bit value, and a final unpaired character is written with 6 bits. After the input string is converted into a sequence of codewords, error correction is applied using Reed–Solomon encoding over the Galois field \(GF(2^8)\). Let n be the total number of codewords, k the number of data codewords, and \(t = n - k\) the number of error correction codewords. The message is expressed as a polynomial:

$$m(x) = m_0 + m_1 x + m_2 x^2 + \cdots + m_{k-1} x^{k-1}$$
(2)

The error correction polynomial \(r(x)\) is then computed by multiplying \(m(x)\) by \(x^t\) and reducing the product modulo the generator polynomial \(gp(x)\):

$$r(x) = m(x) \cdot x^{t} \bmod gp(x)$$
(3)

where \(gp(x)\) is defined, following the QR Code convention, as:

$$gp(x) = (x - \alpha^{0})(x - \alpha^{1}) \cdots (x - \alpha^{t-1})$$
(4)

Here, \(\alpha\) is a primitive element of \(GF(2^8)\). The resulting remainder \(r(x)\) yields the Reed–Solomon parity symbols, which are appended to the data codewords to form the complete encoded message. Following error correction, the complete codeword stream is placed into a 2D matrix, for example of size \(21 \times 21\) for a Version 1 QR code. This matrix also contains finder and timing patterns in predefined regions, with alignment patterns added from Version 2 onward. To break up large uniform regions and patterns that could hinder readability, one of eight standard mask patterns is applied to the data modules according to:

$$f(x, y) = \begin{cases} 1, & \text{if the mask condition holds} \\ 0, & \text{otherwise} \end{cases}$$
(5)

The data bits \(b(x, y)\) are modified accordingly:

$$b'(x, y) = b(x, y) \oplus f(x, y)$$
(6)
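Eqs. (5) and (6) leave the mask condition abstract. The sketch below, assuming a NumPy grid of 0/1 modules and a boolean map of function-pattern modules (finder, timing, and similar regions that masking must skip), shows three of the eight standard conditions; choosing the best mask with the standard's penalty rules is not shown, and the function-pattern map in the usage is a hypothetical stand-in.

```python
import numpy as np

# Three of the eight QR mask conditions from ISO/IEC 18004 (i = row, j = column).
MASK_CONDITIONS = {
    0: lambda i, j: (i + j) % 2 == 0,
    1: lambda i, j: i % 2 == 0,
    2: lambda i, j: j % 3 == 0,
}

def apply_mask(modules, is_function, pattern=0):
    """Compute b'(x, y) = b(x, y) XOR f(x, y), leaving function-pattern modules untouched."""
    i, j = np.indices(modules.shape)
    f = MASK_CONDITIONS[pattern](i, j) & ~is_function
    return modules ^ f.astype(modules.dtype)

# Usage on a random 21 x 21 (Version 1) module grid; only the top-left 9 x 9
# finder region is marked as function modules in this hypothetical map.
rng = np.random.default_rng(2)
modules = rng.integers(0, 2, size=(21, 21), dtype=np.uint8)
is_function = np.zeros((21, 21), dtype=bool)
is_function[:9, :9] = True
masked = apply_mask(modules, is_function, pattern=0)
```

Because XOR is its own inverse, a reader recovers the original data bits by applying the same mask again.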

This masking process yields a more balanced distribution of dark and light modules, which is essential for robust QR code recognition under varying imaging conditions. After matrix generation and masking, the final QR code is rendered as a binary image. For use in watermark embedding, the QR image is resized to match the resolution of the target embedding region, and each pixel value is mapped to the bipolar set {-1, 1} as follows:

$$Q_f(x, y) = 2 \cdot Q(x, y) - 1$$
(7)

where \(Q(x, y) \in \{0, 1\}\) is the binary QR matrix and \(Q_f(x, y) \in \{-1, 1\}\) is the transformed value used in the embedding process. Algorithm 1 summarizes QR watermark creation.
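Eq. (1) can be made concrete with a short Python sketch of alphanumeric-mode bit encoding. Note that the alphanumeric set contains only digits, upper-case letters, the space, and `$%*+-./:`; the semicolon in M is not part of it, so a standard encoder would have to place M (or at least the segment containing the semicolon) in byte mode. The sketch omits terminator bits, padding, and codeword splitting, and the function name is illustrative.

```python
# The 45-character QR alphanumeric set; a character's value is its index (0-44).
ALPHANUMERIC = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"

def encode_alphanumeric(text, version=1):
    """Return mode indicator + character count + data bits for alphanumeric mode."""
    values = [ALPHANUMERIC.index(ch) for ch in text]  # ValueError for unsupported chars
    count_bits = 9 if version <= 9 else (11 if version <= 26 else 13)
    bits = ["0010", format(len(values), f"0{count_bits}b")]  # mode 0010, then count
    for k in range(0, len(values) - 1, 2):
        bits.append(format(45 * values[k] + values[k + 1], "011b"))  # Eq. (1), 11 bits
    if len(values) % 2:
        bits.append(format(values[-1], "06b"))  # trailing single character, 6 bits
    return "".join(bits)

# Usage: a metadata fragment that lies entirely inside the alphanumeric set.
print(encode_alphanumeric("MODALITY:MRI"))
```

Packing two characters into 11 bits (45 x 45 = 2025 < 2048) is what makes alphanumeric mode denser than byte mode, which spends 8 bits per character.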

Algorithm 1 QR Watermark Creation.
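The Reed–Solomon step of Eqs. (2) to (4) can be checked with a minimal pure-Python sketch. It assumes the QR field \(GF(2^8)\) built from the primitive polynomial \(x^8 + x^4 + x^3 + x^2 + 1\) (0x11D) with \(\alpha = 2\) and generator roots \(\alpha^0, \ldots, \alpha^{t-1}\) as in the QR standard, and it stores polynomials highest-degree coefficient first, the order in which QR codewords are read (the reverse of the index order written in Eq. (2)). Block interleaving for larger versions is not shown.

```python
PRIM_POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1, used by QR codes

# Exponent and logarithm tables for GF(2^8) with alpha = 2.
EXP = [0] * 512
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= PRIM_POLY
for _i in range(255, 512):
    EXP[_i] = EXP[_i - 255]

def gf_mul(a, b):
    """Multiply two field elements via log tables (addition in GF(2^8) is XOR)."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

def poly_mul(p, q):
    """Multiply polynomials given highest-degree coefficient first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] ^= gf_mul(a, b)
    return out

def generator_poly(t):
    """gp(x) = (x - alpha^0)...(x - alpha^(t-1)); subtraction equals addition here."""
    g = [1]
    for i in range(t):
        g = poly_mul(g, [1, EXP[i]])
    return g

def rs_parity(data, t):
    """Remainder of m(x) * x^t divided by gp(x): the t parity codewords (Eq. (3))."""
    g = generator_poly(t)
    rem = list(data) + [0] * t  # m(x) * x^t
    for i in range(len(data)):
        coef = rem[i]
        if coef:
            for j in range(1, len(g)):  # g[0] == 1 cancels rem[i] itself
                rem[i + j] ^= gf_mul(g[j], coef)
    return rem[len(data):]

def poly_eval(p, x):
    """Horner evaluation of p at x over GF(2^8)."""
    y = 0
    for c in p:
        y = gf_mul(y, x) ^ c
    return y

# Usage: 16 example data codewords with t = 10 parity codewords (the Version 1-M layout).
data = [32, 91, 11, 120, 209, 114, 220, 77, 67, 64, 236, 17, 236, 17, 236, 17]
parity = rs_parity(data, 10)
codeword = data + parity
```

A quick correctness check follows from the construction: the full codeword polynomial is a multiple of \(gp(x)\), so it evaluates to zero at every root \(\alpha^i\), \(0 \le i < t\).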

Embedding process

The integration of dual watermarks, a hospital logo and a QR code, into medical images is achieved through a hybrid DWT and DCT embedding pipeline designed to ensure imperceptibility, robustness, and compatibility with the lightweight CNN-based extraction model. The process begins with a grayscale medical image \(I \in \mathbb{R}^{H \times W}\), where H and W denote the height and width, respectively. Two watermarks are prepared: a hospital logo \(W_l \in \{0, \ldots, 255\}^{N_l \times M_l}\) and a QR code \(W_q \in \{0, \ldots, 255\}^{N_q \times M_q}\). Both are resized to \(\frac{H}{4} \times \frac{W}{4}\) so that each fits the low-frequency quadrant of the DCT coefficients of an \(\frac{H}{2} \times \frac{W}{2}\) DWT subband, where they are embedded in Eqs. (11) and (12). The resizing function for each watermark is defined as:

$$W_r[i, j] = W\left[\left\lfloor i \cdot \frac{N}{H/4} \right\rfloor,\ \left\lfloor j \cdot \frac{M}{W/4} \right\rfloor\right], \quad i \in \left[0, \frac{H}{4}\right),\ j \in \left[0, \frac{W}{4}\right)$$
(8)
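Eq. (8) is a nearest-neighbour resampling. The NumPy sketch below (the function name and the logo size are hypothetical) shows how the index arithmetic maps each output pixel back to a source pixel; integer floor division reproduces the floor operation exactly.

```python
import numpy as np

def resize_nearest(watermark, out_h, out_w):
    """Nearest-neighbour resize following Eq. (8): row i samples floor(i * N / out_h)."""
    n, m = watermark.shape
    rows = (np.arange(out_h) * n) // out_h
    cols = (np.arange(out_w) * m) // out_w
    return watermark[np.ix_(rows, cols)]

# Usage: fit a hypothetical 100 x 120 logo to H/4 x W/4 for a 256 x 256 cover image.
H, W = 256, 256
logo = np.random.default_rng(3).integers(0, 256, size=(100, 120))
logo_r = resize_nearest(logo, H // 4, W // 4)
```

Nearest-neighbour sampling never creates intermediate grey levels, so a binary logo or QR code stays binary after resizing.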

A 1-level DWT using the Haar wavelet is applied to the cover image I, decomposing it into four subbands:

$$\left[LL, (LH, HL, HH)\right] = \text{DWT}_2(I, \text{Haar})$$
(9)

Each subband has dimensions \(\frac{H}{2} \times \frac{W}{2}\). To balance robustness and imperceptibility, the hospital logo \(W_{r,l}\) is embedded into the HL subband, which captures vertical edge details, while the QR code \(W_{r,q}\) is embedded into the LH subband, which captures horizontal edge details. This dual-subband approach leverages the distinct frequency characteristics of the two subbands to minimize interference between the watermarks. The DCT is then applied to the selected subbands:

$$D_{HL} = \text{DCT}_2(HL), \quad D_{LH} = \text{DCT}_2(LH)$$
(10)

where \(\text{DCT}_2\) denotes the 2D discrete cosine transform with orthonormal normalization. The watermarks are normalized to [0, 1] as \(W'_{r,l} = W_{r,l}/255\) and \(W'_{r,q} = W_{r,q}/255\). Embedding is performed by modifying the DCT coefficients with a scaling factor \(\alpha = 0.1\):

$$D'_{HL}[i, j] = D_{HL}[i, j] + \alpha \cdot W'_{r,l}[i, j], \quad i < \frac{H}{4},\ j < \frac{W}{4}$$
(11)
$$D'_{LH}[i, j] = D_{LH}[i, j] + \alpha \cdot W'_{r,q}[i, j], \quad i < \frac{H}{4},\ j < \frac{W}{4}$$
(12)

The modified subbands are obtained via inverse DCT:

$$HL' = \text{IDCT}_2(D'_{HL}), \quad LH' = \text{IDCT}_2(D'_{LH})$$
(13)

where \(\text{IDCT}_2\) is the 2D inverse DCT. The watermarked image is reconstructed by applying the inverse DWT to the modified and unmodified subbands:

$$I_w = \text{IDWT}_2\left(\left[LL, (LH', HL', HH)\right], \text{Haar}\right)$$
(14)

The pixel values of \(I_w\) are clipped to the valid range [0, 255]:

$$I_w[i, j] = \max\left(0, \min\left(255, I_w[i, j]\right)\right)$$
(15)

This yields the final watermarked image \(I_w \in [0, 255]^{H \times W}\), containing both the hospital logo and the QR code. The dual embedding allows the CNN model to be trained to extract either watermark from \(I_w\), using the pairs \((I_w, W_{r,l})\) and \((I_w, W_{r,q})\) for supervised learning. The process maintains semi-fragile properties: the watermarks resist minor distortions but are sensitive to significant tampering. Algorithm 2 summarizes watermark embedding.
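The embedding chain of Eqs. (9) to (15) can be sketched in NumPy. To stay dependency-free, this illustrative version hand-rolls an orthonormal one-level Haar DWT and an orthonormal DCT-II in matrix form instead of calling a wavelet library, so subband signs and naming may differ from libraries such as PyWavelets. It assumes H and W are divisible by 4 and that both watermarks have already been resized to H/4 x W/4; the function names are hypothetical, not the authors' code.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix C, so that C @ x is the DCT of x and C @ C.T = I."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c

def dct2(x):
    """2D orthonormal DCT-II applied along columns and rows."""
    return dct_matrix(x.shape[0]) @ x @ dct_matrix(x.shape[1]).T

def idct2(y):
    """Inverse of dct2 (transposes invert an orthonormal transform)."""
    return dct_matrix(y.shape[0]).T @ y @ dct_matrix(y.shape[1])

def haar_dwt2(x):
    """One-level orthonormal 2D Haar DWT on 2x2 blocks [[a, b], [c, d]]."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    ll = (a + b + c + d) / 2
    lh = (a + b - c - d) / 2  # row differences: horizontal edges
    hl = (a - b + c - d) / 2  # column differences: vertical edges
    hh = (a - b - c + d) / 2  # diagonal detail
    return ll, (lh, hl, hh)

def haar_idwt2(ll, details):
    """Exact inverse of haar_dwt2."""
    lh, hl, hh = details
    out = np.empty((2 * ll.shape[0], 2 * ll.shape[1]))
    out[0::2, 0::2] = (ll + lh + hl + hh) / 2
    out[0::2, 1::2] = (ll + lh - hl - hh) / 2
    out[1::2, 0::2] = (ll - lh + hl - hh) / 2
    out[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return out

def embed_dual(img, logo, qr, alpha=0.1):
    """Embed the logo in DCT(HL) and the QR code in DCT(LH), following Eqs. (9)-(15)."""
    img = img.astype(np.float64)
    h, w = img.shape
    qh, qw = h // 4, w // 4
    if h % 4 or w % 4 or logo.shape != (qh, qw) or qr.shape != (qh, qw):
        raise ValueError("expects H, W divisible by 4 and watermarks of shape (H/4, W/4)")
    ll, (lh, hl, hh) = haar_dwt2(img)                      # Eq. (9)
    d_hl, d_lh = dct2(hl), dct2(lh)                        # Eq. (10)
    d_hl[:qh, :qw] += alpha * (logo / 255.0)               # Eq. (11)
    d_lh[:qh, :qw] += alpha * (qr / 255.0)                 # Eq. (12)
    out = haar_idwt2(ll, (idct2(d_lh), idct2(d_hl), hh))   # Eqs. (13)-(14)
    return np.clip(out, 0.0, 255.0)                        # Eq. (15)

# Usage with a random cover image and random binary watermarks.
rng = np.random.default_rng(5)
cover = rng.integers(0, 256, size=(256, 256)).astype(np.float64)
logo = (rng.random((64, 64)) > 0.5) * 255.0
qr = (rng.random((64, 64)) > 0.5) * 255.0
marked = embed_dual(cover, logo, qr, alpha=0.1)
```

Because both transforms are orthonormal, the squared pixel-domain change before clipping equals \(\alpha^2\) times the summed squares of the normalized watermark values, and clipping can only reduce it, so the distortion at small \(\alpha\) is tiny. The result is kept in floating point; rounding it to 8-bit integers for storage adds quantization error that can dominate such a small change.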

Algorithm 2 Watermark Embedding.

To control the embedding strength and balance imperceptibility against robustness, an embedding factor α is introduced; it determines the magnitude of the watermark values added to the transformed subband coefficients. Through iterative testing on validation images, α was varied between 0.01 and 0.1, and α = 0.05 was selected as the optimal value, achieving a PSNR of 68.75 dB and an NC of 1.0 and thus ensuring both visual fidelity and watermark resilience. Traditional DWT–DCT watermarking methods often suffer from high computational cost and vulnerability to geometric attacks. The proposed framework addresses these challenges with a lightweight CNN decoder that replaces complex inverse transforms during extraction, significantly reducing computation time. The enhanced loss function, which combines MSE, SSIM, and edge terms, improves structural preservation, enabling robust watermark recovery under rotation and cropping while maintaining high imperceptibility and efficiency.

Lightweight CNN

In this study, we developed a lightweight CNN decoder designed specifically to extract embedded watermarks from medical images. The model architecture is intentionally compact, comprising a streamlined encoder-decoder structure with few convolutional layers and reduced filter dimensions. The input to the network is a single-channel image of size \(256 \times 256 \times 1\), corresponding to the grayscale watermarked image. Rather than targeting complex feature extraction or classification, the primary objective is efficient and accurate recovery of both the QR code and hospital logo watermarks with minimal computational overhead. Despite its simplicity, the model achieves reliable extraction performance and maintains a low memory footprint, making it well suited to real-time deployment on medical edge devices or embedded systems where processing power and storage are constrained. The structure of the lightweight CNN is given in Table 2.

Table 2 CNN architecture for watermark extraction.

The proposed lightweight CNN has a shallow design, with a depth of three and only 18 layers in total. This compact structure significantly reduces computational complexity and model size, making it suitable for deployment in resource-constrained medical systems.

Loss function design

To improve the robustness and perceptual quality of watermark extraction, a custom loss function is formulated from three complementary components: MSE, SSIM, and Sobel edge loss. The MSE component ensures pixel-wise fidelity by minimizing the average squared difference between the predicted watermark \(\widehat{W}\) and the ground-truth watermark \(W\):

$$\mathcal{L}_{\text{MSE}} = \frac{1}{N} \sum_{i=1}^{N} \left(W_i - \widehat{W}_i\right)^2$$
(16)

To preserve structural information, SSIM is employed, which compares the luminance, contrast, and structure of the predicted and target watermarks. The SSIM loss is defined as:

$$\mathcal{L}_{\text{SSIM}} = 1 - \text{SSIM}\left(W, \widehat{W}\right)$$
(17)

Sobel edge loss is introduced to encourage the preservation of edge-level features, which are critical for accurate recovery of watermark shapes, especially in QR codes and hospital logos. Gradient maps of both the predicted and true watermarks are computed using Sobel filters, and the edge loss is defined as:

$$\mathcal{L}_{\text{Edge}} = \frac{1}{N} \sum_{i=1}^{N} \left\| \nabla_{\text{Sobel}}(W_i) - \nabla_{\text{Sobel}}(\widehat{W}_i) \right\|^2$$
(18)

The final enhanced loss function is a weighted combination of the three components:

$$\mathcal{L}_{\text{enhanced}} = \lambda_1 \cdot \mathcal{L}_{\text{MSE}} + \lambda_2 \cdot \mathcal{L}_{\text{SSIM}} + \lambda_3 \cdot \mathcal{L}_{\text{Edge}}$$
(19)

where the weights \(\lambda_1\), \(\lambda_2\), and \(\lambda_3\) were empirically set to 0.4, 0.4, and 0.2, respectively. This hybrid loss guides the lightweight CNN not only to reconstruct watermarks with pixel-level accuracy but also to retain their structural and edge integrity, thereby enhancing resilience to noise and common image distortions.
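As a value-level illustration of Eqs. (16) to (19), the NumPy sketch below evaluates the enhanced loss for watermarks scaled to [0, 1] with the weights 0.4, 0.4, and 0.2. It simplifies SSIM to a single global window (standard SSIM averages over local, usually Gaussian-weighted, windows) and computes Sobel gradients without padding; a training implementation would express the same terms in an automatic-differentiation framework. All names are hypothetical.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T

def filter3x3(img, kernel):
    """3x3 cross-correlation over the 'valid' region (no padding)."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for di in range(3):
        for dj in range(3):
            out += kernel[di, dj] * img[di:di + h - 2, dj:dj + w - 2]
    return out

def global_ssim(x, y, data_range=1.0):
    """Single-window SSIM over the whole image (a simplification of windowed SSIM)."""
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx**2 + my**2 + c1) * (vx + vy + c2))

def enhanced_loss(w, w_hat, lambdas=(0.4, 0.4, 0.2)):
    """lambda1 * MSE + lambda2 * (1 - SSIM) + lambda3 * Sobel edge loss."""
    w, w_hat = w.astype(np.float64), w_hat.astype(np.float64)
    l_mse = np.mean((w - w_hat) ** 2)                              # Eq. (16)
    l_ssim = 1.0 - global_ssim(w, w_hat)                           # Eq. (17)
    gx = filter3x3(w, SOBEL_X) - filter3x3(w_hat, SOBEL_X)
    gy = filter3x3(w, SOBEL_Y) - filter3x3(w_hat, SOBEL_Y)
    l_edge = np.mean(gx ** 2 + gy ** 2)                            # Eq. (18)
    l1, l2, l3 = lambdas
    return l1 * l_mse + l2 * l_ssim + l3 * l_edge                  # Eq. (19)

# Usage: compare a binary target with itself and with a noisy prediction.
rng = np.random.default_rng(4)
target = (rng.random((64, 64)) > 0.5).astype(np.float64)
noisy_prediction = np.clip(target + rng.normal(0.0, 0.2, target.shape), 0.0, 1.0)
print(enhanced_loss(target, target), enhanced_loss(target, noisy_prediction))
```

The edge term squares the full gradient-vector difference (horizontal and vertical Sobel responses), so it penalizes blurred or shifted module boundaries that MSE alone weights weakly.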

Model training

Once the lightweight CNN decoder and enhanced loss function are defined, the model is trained in a supervised manner on the previously prepared dataset of watermarked medical images and their corresponding original watermarks. Training uses the Adam optimizer, which combines the advantages of the Adaptive Gradient Algorithm (AdaGrad) and Root Mean Square Propagation (RMSProp). The update rule for the model parameters \(\theta\) at iteration t is:

$$\theta_t = \theta_{t-1} - \eta \cdot \frac{\widehat{m}_t}{\sqrt{\widehat{v}_t} + \epsilon}$$
(20)

where \(\eta\) is the learning rate, \(\widehat{m}_t\) and \(\widehat{v}_t\) are the bias-corrected first and second moment estimates of the gradient, and \(\epsilon\) is a small constant that prevents division by zero. To prevent overfitting and improve generalization, early stopping is employed: performance is monitored on a validation set, and training halts automatically if the model fails to improve for a predefined number of epochs. Several complementary strategies were adopted to further mitigate overfitting. Data augmentation, including random rotations, horizontal flips, and small intensity variations, was applied to increase dataset diversity and improve generalization. Dropout layers with a rate of 0.3 were incorporated into the CNN decoder to reduce neuron co-adaptation during training. Together with early stopping, these measures yielded stable convergence without overfitting, as reflected by the close alignment of the training and validation loss curves. In addition, the best-performing model weights are preserved with a checkpointing strategy, in which the model achieving the highest validation PSNR is stored for inference. This ensures that only the best parameters, as judged by watermark extraction quality, are retained.

Following training, the inference stage assesses the lightweight CNN's ability to extract embedded watermarks from unseen watermarked medical images. This stage uses the optimal model parameters \(\theta^{*}\), saved during training at the checkpoint corresponding to the minimum validation loss. Given an unseen watermarked image \(I_{\text{unseen}} \in \mathbb{R}^{256 \times 256 \times 1}\) containing both the QR code and hospital logo watermarks embedded via the hybrid method, the trained model performs a forward pass to predict the embedded watermark:

$$y_{\text{pred}} = f_{\text{CNN}}\left(I_{\text{unseen}} \mid \theta^{*}\right)$$
(21)

where \(f_{\text{CNN}}\) denotes the forward propagation function of the trained CNN decoder, comprising the convolutional, pooling, and upsampling layers described in Table 1. The output \(y_{\text{pred}} \in \mathbb{R}^{64 \times 64 \times 1}\) is the predicted watermark image, which is resized and aligned with the resolution of the original embedded watermark. To ensure compatibility with the trained model, the input \(I_{\text{unseen}}\) is first normalized to the range \([-1, 1]\) using the transformation:

$$I_{\text{norm}} = 2 \cdot \frac{I_{\text{unseen}} - I_{\min}}{I_{\max} - I_{\min}} - 1$$
(22)

where \(I_{\min} = 0\) and \(I_{\max} = 255\) are the minimum and maximum grayscale pixel values. During testing, inference is performed with a batch size of 1 (one image per forward pass) to keep computation efficient. To enhance visual clarity and prepare the output for metric evaluation, the predicted watermark \(y_{\text{pred}}\) is binarized by thresholding, yielding \(y_{\text{bin}}\), which improves readability for QR decoding and logo verification:

$$y_{\text{bin}}[i,j] = \begin{cases} 255, & \text{if } y_{\text{pred}}[i,j] \ge \tau \\ 0, & \text{otherwise} \end{cases} \qquad \text{with } \tau = 128$$
(23)

This binarized output enables robust comparison against the original watermark and facilitates quantitative evaluation.
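The pre- and post-processing around the forward pass in Eqs. (21) to (23) can be sketched in NumPy as follows. This is an illustrative sketch rather than the authors' code: the function names are hypothetical, and the binarization step assumes the decoder output has already been rescaled to the 0 to 255 range, which the threshold \(\tau = 128\) implies but the text does not state explicitly.

```python
import numpy as np

I_MIN, I_MAX = 0.0, 255.0  # 8-bit grayscale range used in Eq. (22)
TAU = 128.0                # binarization threshold used in Eq. (23)


def normalize_input(img_u8: np.ndarray) -> np.ndarray:
    """Eq. (22): map pixel values from [0, 255] to [-1, 1]."""
    img = img_u8.astype(np.float32)
    return 2.0 * (img - I_MIN) / (I_MAX - I_MIN) - 1.0


def to_model_batch(img_u8: np.ndarray) -> np.ndarray:
    """Add batch and channel axes: (256, 256) -> (1, 256, 256, 1), i.e. batch size 1."""
    return normalize_input(img_u8)[np.newaxis, :, :, np.newaxis]


def binarize(y_pred: np.ndarray, tau: float = TAU) -> np.ndarray:
    """Eq. (23): pixels >= tau become 255, all others 0 (assumes y_pred in [0, 255])."""
    return np.where(y_pred >= tau, 255, 0).astype(np.uint8)
```

With a trained Keras model (here a hypothetical `decoder`), the prediction would be obtained with `decoder.predict(to_model_batch(img))`, squeezed to 64 × 64, rescaled to [0, 255], and then passed to `binarize`.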

Performance evaluation measurements

To assess the effectiveness of the proposed watermarking framework and the performance of the lightweight CNN decoder, multiple quantitative evaluation metrics are employed. For the fidelity and imperceptibility of the watermarking scheme, two widely accepted image quality measures are used: Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM). PSNR evaluates the pixel-level distortion between the original medical image \(I\) and the watermarked image \(I^{\text{WM}}\), defined as:

$$\text{PSNR}\left(I, I^{\text{WM}}\right) = 10 \cdot \log_{10}\left(\frac{\left(\text{MAX}_I\right)^2}{\text{MSE}\left(I, I^{\text{WM}}\right)}\right)$$
(24)

where \(\text{MAX}_I = 255\) for 8-bit grayscale images and MSE denotes the mean squared error. A higher PSNR indicates lower distortion and better imperceptibility of the embedded watermark. To assess perceptual similarity, the SSIM index is calculated between the original watermark \(W\) and the extracted watermark \(\widehat{W}\):

$$\text{SSIM}\left(W, \widehat{W}\right) = \frac{\left(2\mu_W \mu_{\widehat{W}} + C_1\right)\left(2\sigma_{W\widehat{W}} + C_2\right)}{\left(\mu_W^2 + \mu_{\widehat{W}}^2 + C_1\right)\left(\sigma_W^2 + \sigma_{\widehat{W}}^2 + C_2\right)}$$
(25)

where \(\mu_W\) and \(\mu_{\widehat{W}}\), \(\sigma_W^2\) and \(\sigma_{\widehat{W}}^2\), and \(\sigma_{W\widehat{W}}\) denote the local means, variances, and covariance of \(W\) and \(\widehat{W}\), respectively, and \(C_1, C_2\) are stabilizing constants. SSIM values closer to 1 indicate higher structural similarity and visual fidelity. For robustness evaluation, particularly under noise attacks, the bit error rate (BER) quantifies the proportion of erroneous bits in the extracted watermark:

$$\text{BER} = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\left[W_i \neq \widehat{W}_i\right]$$
(26)

where \(\mathbf{1}[\cdot]\) is the indicator function and \(n\) is the total number of bits in the watermark.
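Eqs. (24) and (26) translate directly into NumPy; a minimal sketch follows. The function names are hypothetical, and BER is returned as a fraction (multiply by 100 for the percentages reported in the results). SSIM (Eq. (25)) is usually computed with a windowed library routine such as scikit-image's `structural_similarity`, so it is omitted from this sketch.

```python
import numpy as np


def psnr(original: np.ndarray, watermarked: np.ndarray, max_i: float = 255.0) -> float:
    """Eq. (24): peak signal-to-noise ratio in dB."""
    # Cast before subtracting so uint8 inputs do not wrap around.
    diff = original.astype(np.float64) - watermarked.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0.0:
        return float("inf")  # identical images: no distortion
    return float(10.0 * np.log10(max_i ** 2 / mse))


def ber(w_bits, w_hat_bits) -> float:
    """Eq. (26): fraction of watermark bits that differ after extraction.

    Nonzero pixels (e.g. 255 in a binarized watermark) count as bit 1.
    """
    w_bits = np.asarray(w_bits).astype(bool)
    w_hat_bits = np.asarray(w_hat_bits).astype(bool)
    if w_bits.shape != w_hat_bits.shape:
        raise ValueError("watermarks must have the same shape")
    return float(np.mean(w_bits != w_hat_bits))
```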

Experimental results and discussion

Experimental materials and setup

To evaluate the performance of the proposed watermarking framework and lightweight CNN decoder, a dataset of medical images was prepared from publicly available and verifiable sources. The host images were drawn primarily from open-access datasets such as the Brain Tumor MRI dataset33, which is compiled from the Figshare, SARTAJ, and Br35H sources and comprises 7,023 MRI images in four classes: glioma, meningioma, pituitary, and no tumor. All images were preprocessed to a fixed resolution of \(256 \times 256\) pixels and converted to single-channel grayscale to match the input requirements of the CNN model. Two types of watermarks were embedded: (i) a binary hospital logo representing institutional ownership, and (ii) a structured QR code encoding metadata such as patient ID, scan type, and date. The QR code was generated dynamically with Python libraries and resized to \(64 \times 64\) pixels for embedding. The framework was implemented in Python 3.9 with TensorFlow 2.13 and Keras as the deep learning backend. Image processing and transformation operations used OpenCV, NumPy, PyWavelets, and SciPy. Experiments were run on a machine with an 11th-generation Intel Core i7 CPU, 16 GB RAM, and an NVIDIA GeForce RTX 3050 GPU (4 GB), running Windows 11 (64-bit); GPU acceleration was used for efficient model training and inference.

Robustness analysis

The proposed watermarking framework was evaluated under four conventional attacks, each applied at increasing intensity to assess robustness across a range of degradation conditions (Table 3). Under salt-and-pepper noise, the model maintained acceptable extraction fidelity: PSNR decreased from 43.65 dB to 36.80 dB as the noise density increased from 5% to 30%, while BER rose correspondingly from 0.21% to 0.87%. Gaussian noise caused similar progressive degradation, with PSNR dropping from 45.15 dB to 37.55 dB as the noise variance increased from 5% to 30%, and BER increasing to 0.71%. The method was considerably more resilient to JPEG compression, retaining a PSNR of 48.00 dB and a BER of only 0.05% at a quality factor of 90. Even at a quality factor of 30, PSNR and SSIM remained at 42.10 dB and 0.954, respectively, confirming robustness to frequency-domain compression. Median filtering further highlighted the method's stability, yielding the highest PSNR (50.12 dB) and a near-zero BER (0.02%) with a 3 × 3 kernel; even with a more aggressive 9 × 9 kernel, PSNR remained above 45 dB and BER was contained at 0.18%.

Table 4 presents the performance of the framework under various geometric distortions. The model showed moderate resilience to rotation, with PSNR dropping from 44.22 dB at +5° to 41.10 dB at +15° and BER increasing from 0.23% to 0.46%. Scaling was better tolerated, with high SSIM (> 0.96) and low BER (< 0.3%) at both 110% and 90% scaling factors. Translation attacks (pixel shifts) had minimal impact, with PSNR above 42 dB and BER under 0.5%. Cropping caused the most severe degradation, particularly with asymmetric crop patterns, pushing BER as high as 0.79%. Horizontal flipping and shearing moderately affected the watermark structure, yielding BER values of 0.52% and 0.36%, respectively.

Figures 2 and 3 illustrate the robustness of the framework under a range of conventional and geometric attacks, showing that watermark integrity is preserved under distortion. Figure 2(a) shows the results under salt-and-pepper (S&P) noise at four increasing intensity levels: 5%, 10%, 20%, and 30%. Despite the impulse noise, both the extracted QR code and the hospital logo remain clearly visible, indicating high resilience to this type of distortion. Figure 2(b) shows the results of median filtering with kernel sizes of 3 × 3, 5 × 5, 7 × 7, and 9 × 9. The extracted watermarks retain their visual clarity as the smoothing strength increases, reflecting the method's tolerance to denoising operations commonly used in pre-processing pipelines. Figure 3(a) evaluates the framework against rotation by +5°, +15°, and −10°. Although the watermarked images are rotated, the extracted QR code and logo remain structurally intact, suggesting a degree of robustness to affine transformations. Finally, Fig. 3(b) shows cropping attacks, including a 10% crop from all sides and a 15% crop from the left and bottom of the image. Although some visual loss occurs because spatial regions are missing, the core structure of both the QR code and the logo remains recoverable, reflecting the semi-fragile nature of the watermarking approach. Overall, these figures support the effectiveness of the proposed dual watermarking system under challenging conditions and its applicability to medical image authentication and integrity verification.
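The intensity sweeps for two of the conventional attacks can be reproduced with a short NumPy sketch. The exact attack implementations used in the experiments are not specified in the text, so the following choices are assumptions: salt-and-pepper density is the fraction of pixels replaced (about half set to 0, half to 255), and median filtering uses edge-replicated borders. The function names and the random stand-in image are hypothetical; in practice the watermarked MRI slice would be attacked and each result passed to the trained decoder.

```python
import numpy as np


def salt_and_pepper(img: np.ndarray, density: float, rng: np.random.Generator) -> np.ndarray:
    """Replace a fraction `density` of pixels with 0 (pepper) or 255 (salt)."""
    out = img.copy()
    hit = rng.random(img.shape) < density  # pixels to corrupt
    salt = rng.random(img.shape) < 0.5     # about half of them become white
    out[hit & salt] = 255
    out[hit & ~salt] = 0
    return out


def median_filter(img: np.ndarray, k: int) -> np.ndarray:
    """k x k median filter (k odd) with edge-replicated borders."""
    if k % 2 == 0:
        raise ValueError("kernel size must be odd")
    padded = np.pad(img, k // 2, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (k, k))
    return np.median(windows, axis=(-2, -1)).astype(img.dtype)


# Hypothetical stand-in for a 256 x 256 watermarked image.
rng = np.random.default_rng(0)
watermarked = rng.integers(0, 256, size=(256, 256), dtype=np.uint8)

# Intensity levels reported for the S&P and median-filter experiments.
attacked = {f"S&P {round(p * 100)}%": salt_and_pepper(watermarked, p, rng)
            for p in (0.05, 0.10, 0.20, 0.30)}
attacked.update({f"Median {k}x{k}": median_filter(watermarked, k) for k in (3, 5, 7, 9)})
```

Keeping each attack as a pure function of the image and its intensity makes it straightforward to sweep levels and tabulate PSNR, SSIM, and BER per level.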
The proposed method addresses limitations of conventional DWT–DCT schemes, achieving a PSNR of 68.75 dB and an NC of 1.0 under geometric attacks while reducing the computation time to 0.38 s for a 256 × 256 image, a clear efficiency improvement over conventional hybrid approaches.

Table 3 Watermark extraction performance under conventional attacks.
Table 4 Watermark extraction performance under geometric attacks.

Imperceptibility analysis

Table 5 provides a comparative evaluation of the proposed watermarking method against several state-of-the-art techniques in terms of PSNR, embedding domain, underlying technique, and NC. The proposed method achieves the highest PSNR, 68.75 dB, surpassing both spatial- and frequency-domain baselines and indicating superior imperceptibility under ideal conditions. Spatial-domain methods such as36,37 use LSB substitution and suffer from lower robustness and embedding fidelity, as reflected by their relatively modest PSNR values of 51.14 dB and 53.65 dB, respectively, and the slightly reduced NC in34. Frequency-domain methods such as24,35 use DWT and hybrid DWT–DCT–SVD frameworks, improving robustness but still falling short in peak fidelity. All compared methods except36 match the proposed approach's NC of 1.0, showing strong watermark integrity, but the proposed method's higher PSNR indicates better imperceptibility without sacrificing robustness. In addition, the minimal Kullback–Leibler divergence and Jensen–Shannon distance indicate close distributional similarity between the original and watermarked images.

Table 5 Comparative analysis with existing techniques.

Loss function analysis

To assess the individual contribution of each loss component in the training pipeline, an ablation study was conducted with different loss configurations. Table 6 and Fig. 4 report the impact on PSNR, SSIM, and BER under standard distortion conditions. With MSE alone, the model achieved moderate extraction accuracy, with a PSNR of 58.42 dB. Adding SSIM improved structural recovery, increasing PSNR and SSIM while reducing BER. Incorporating the edge loss further sharpened edges, which is vital for QR code and logo reconstruction. The full combination (MSE + SSIM + Edge) achieved the best results, with a PSNR of 64.87 dB and the lowest BER.

In digital watermarking, achieving an optimal balance among imperceptibility, robustness, and capacity remains a fundamental challenge. In the proposed DWT–DCT hybrid model, embedding the watermark in the mid-frequency coefficients provides high robustness against compression and filtering attacks while maintaining strong imperceptibility, as evidenced by PSNR values between 64.87 and 68.75 dB. The lightweight CNN decoder further improves extraction accuracy without compromising computational efficiency, enabling real-time use in medical imaging systems. However, as robustness increases, particularly against geometric distortions such as rotation and cropping, a minor loss in perceptual quality may occur because of stronger frequency-domain embedding. This trade-off was deliberately tuned by adjusting the embedding strength and the loss weights (λ₁ = 0.4, λ₂ = 0.4, λ₃ = 0.2) to balance resilience and visual fidelity. The experimental outcomes thus indicate that the framework achieves a favorable balance between robustness and imperceptibility, which is essential for diagnostic-grade medical image authentication.

As shown in Fig. 5, both training and validation losses decreased steadily until convergence, indicating stable learning dynamics without overfitting. The model reached its minimum validation loss of 0.092 at epoch 15, after which the loss plateaued, triggering early stopping to prevent performance degradation. This behavior reflects effective regularization and appropriate model capacity for the dataset. The final optimized model parameters \(\theta^{*}\) correspond to the checkpoint with the lowest validation loss, supporting consistent watermark extraction quality on unseen medical images.

Fig. 2

Proposed watermarking under conventional attacks.

Training and validation accuracy

In the proposed dual-watermarking framework, the lightweight CNN decoder is designed to accurately reconstruct the embedded watermarks, namely the QR code and hospital logo, from distorted medical images. To evaluate the network's learning effectiveness and generalization, both training and validation accuracy are monitored throughout optimization. The accuracy metric quantifies the decoder's ability to reproduce the ground-truth watermark with high fidelity and serves as a direct indicator of extraction robustness; conversely, the error rate, defined as \(1 - \text{accuracy}\), represents the residual reconstruction error. As shown in Fig. 6, the model reaches its peak validation accuracy near the early-stopping epoch (approximately epoch 15), indicating rapid convergence and good generalization. This performance is attributed to the hybrid loss function, whose MSE, SSIM, and Sobel edge components together enable the CNN to retain both structural integrity and fine-grained texture during watermark recovery.

Fig. 3

Proposed watermarking under geometric attacks.

Table 6 Ablation results of the proposed model.
Fig. 4

Comparison of the proposed method with state-of-the-art methods.

Fig. 5

Training and validation loss curve with early stopping indicator.

Fig. 6

Training and validation accuracy of lightweight CNN.

Comparative payload analysis

The proposed dual-watermarking framework embeds two independent watermarks, a hospital logo and a QR code, within a 256 × 256 medical image, achieving a total payload of 2048 bits, which corresponds to a payload density of 0.03125 bpp. Although it operates at a slightly lower bpp than the 512 × 512 approaches of28, the proposed system maintains significantly higher visual fidelity (PSNR 64.87 dB) and structural integrity (SSIM 0.954), owing to optimized DWT–DCT coefficient selection and CNN-based decoding, as shown in Table 7.
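The stated payload density follows directly from dividing the total payload by the number of host pixels:

$$\text{bpp} = \frac{2048\ \text{bits}}{256 \times 256\ \text{pixels}} = \frac{2048}{65{,}536} = \frac{1}{32} = 0.03125$$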

Table 7 Comparative payload.

Computational efficiency

The proposed dual-watermarking framework operates efficiently on 256 × 256 medical images, achieving an average processing time of 0.182 s per image, an approximately 55% reduction in computational cost compared with28, as shown in Table 8. This improvement is primarily attributed to the lightweight CNN decoder (0.65 MB), which reduces the memory footprint and accelerates watermark extraction without sacrificing accuracy.

Table 8 Computational efficiency.

Limitations of the proposed scheme

Although the proposed dual-watermarking framework demonstrates high robustness, imperceptibility, and efficient watermark recovery, certain limitations remain. First, the model’s robustness was evaluated primarily on standard geometric and signal-based attacks; performance under severe compression or real-time transmission noise in clinical environments may require further validation. Second, the current implementation relies on a fixed DWT-DCT configuration, which may not generalize optimally to all medical imaging modalities such as CT or PET scans. Additionally, while the lightweight CNN decoder offers low computational complexity, its performance might degrade with higher-resolution or multi-channel medical images due to memory constraints. Finally, the scheme focuses on grayscale images; extending it to multi-spectral and volumetric medical data could further enhance its applicability. To overcome current limitations, future work will focus on integrating adaptive frequency-domain configurations to enhance robustness across diverse modalities. Incorporating realistic transmission noise models will strengthen reliability in clinical environments. The lightweight CNN will be extended for 3D and color medical images, while attention-based modules may further improve extraction precision and efficiency.

Conclusion

In this paper, we proposed a lightweight and robust dual watermarking framework for medical image authentication that uses a hybrid DWT–DCT embedding scheme and deep learning-based extraction. The method embeds a QR code carrying structured metadata and a hospital logo into medical images, supporting both machine-readable and visual verification. A shallow CNN decoder was trained with a custom loss function combining MSE, SSIM, and Sobel edge losses to ensure accurate and structurally consistent watermark recovery. Experimental results showed strong imperceptibility and robustness against common distortions such as salt-and-pepper noise, median filtering, cropping, and rotation. Watermarked images maintained high quality, with PSNR values between 64.87 dB and 68.75 dB and NC scores up to 1.0. The CNN model's small footprint (0.65 MB) and fast inference make it well suited to real-time and embedded applications. Additionally, the semi-fragile nature of the system enables tamper detection while preserving the embedded watermark content. In future work, we plan to add tamper localization by integrating spatial attention mechanisms into the CNN, allowing manipulated regions to be identified. Extending the framework to 3D and volumetric medical image authentication could further strengthen its clinical relevance, particularly in radiology and oncology imaging. Another promising direction is combining watermarking with AI-based anomaly detection to enable simultaneous image authentication and tampering localization, ensuring comprehensive integrity verification in clinical workflows.