Table 1 Comparative analysis of deepfake detection approaches in healthcare.

From: Enhancing tumor deepfake detection in MRI scans using adversarial feature fusion ensembles

| Ref. | Problem statement | Model | Methodology | Benchmark dataset | Outcome | Strengths | Weaknesses | Research gap |
|---|---|---|---|---|---|---|---|---|
| Ref5 (2025) | Detect deepfake videos by accurately capturing subtle spatial and temporal forgery artifacts while mitigating interference from natural facial motion | GC-ConsFlow | Dual-stream architecture: a GCAF stream using a global grouped context aggregation (GGCA) module for spatial feature enhancement via XceptionNet, and an FGTC stream leveraging optical-flow residuals and gradient-based features to capture temporal inconsistencies | FaceForensics++ (evaluated under various compression levels) | Outperforms state-of-the-art detectors (94.82% on DF, 87.21% on F2F, 93.83% on FS, 78% on NT in ablation studies) | Robust detection under heavy compression; effective fusion of spatial and temporal cues; mitigates noise from natural motion | Increased computational overhead from the dual-stream design; dependence on accurate optical-flow estimation | Addresses the need for an integrated approach that captures both spatial and temporal forgery traces, a gap in many single-stream detectors |
| Ref6 (2025) | Improve the generalization of deepfake detectors by accounting for varying forgery quality in training data and avoiding overfitting to easily detected artifacts | Quality-centric framework | Twofold quality assessment: static quality via ArcFace cosine similarity between fake and real images, and dynamic quality from loss-based model feedback; combined via curriculum learning and enhanced by Frequency Data Augmentation (FreDA) to upgrade low-quality fakes | FaceForensics++, Celeb-DF, DFDC-P | Approximately 10% improvement in generalization over baseline models | Differentiates samples by forgery quality; novel curriculum learning strategy; FreDA enhances the realism of low-quality samples | Increased training complexity; sensitivity to quality-score parameter settings; additional computational cost for quality evaluation | Fills the gap of heterogeneous forgery quality in training data, enabling detectors to generalize better across unseen deepfake techniques |
| Ref7 (2024) | Vulnerability to diffusion-model-based medical deepfakes | DiffuDetect | Latent-space analysis of diffusion-generated anomalies | Synthetic MRI (Stable Diffusion) | 90.1% precision | State-of-the-art against diffusion-based fakes | Requires large synthetic datasets | Untested on clinical-grade scans |
| Ref8 (2024) | Privacy risks in federated deepfake detection | FedSecure | Federated learning with differential privacy | DECATHLON (multi-institutional MRI) | 85.7% accuracy | Privacy-preserving; scalable across hospitals | Reduced detection performance (5–8% drop) | Trade-off between privacy and accuracy |
| Ref9 (2024) | Explainability gaps in medical deepfake detection | XAI-Med | Saliency maps + Grad-CAM for interpretable predictions | BraTS, CheXpert | 83.6% accuracy | Clinically interpretable outputs | Lower performance than black-box models | Limited adversarial robustness |
| Ref10 (2023) | Generalization gaps in detecting GAN-generated tumor manipulations | GAN-Defender | GAN discriminator repurposed for detection | TCIA, BraTS | 88.9% F1-score | Effective against GAN-based deepfakes | Fails on non-GAN synthetic methods (e.g., diffusion models) | Narrow focus on GAN-generated artifacts |
| Ref11 (2023) | Poor sensitivity to 3D spatial inconsistencies in volumetric scans | 3D-CNN + LSTM | Spatio-temporal analysis of 3D MRI sequences | ADNI, OASIS-3D | 86.2% AUC | Captures 3D contextual and temporal features | Computationally intensive; lacks 2D compatibility | Limited real-time applicability |
| Ref12 (2023) | Weakness in multi-modal deepfake detection (CT + MRI) | FusionNet | Cross-modal attention with contrastive learning | TCIA, MSD-Liver | 88.4% accuracy | Robust to multi-modal manipulations | Limited adversarial training | No defense against gradient-based attacks |
| Ref13 (2023) | Real-time detection latency in clinical workflows | LightDetect | Quantized MobileNetV3 with knowledge distillation | FastMRI, IXI | 89.0% accuracy (real-time) | Low latency (< 50 ms per scan) | Accuracy drops on high-resolution scans | Unsuitable for high-precision tasks |
| Ref14 (2022) | Detection of synthetic tumors in MRI scans with domain-specific challenges | MedNet (ResNet variant) | Transfer learning with attention mechanisms | Private MRI dataset (1,200 scans) | 87.5% accuracy | Domain-specific tuning for medical images | Limited adversarial robustness; narrow dataset diversity | No integration of handcrafted features |
| Ref15 (2021) | Privacy leakage through synthetic ECG generation using GANs | DeepFake ECG | Conditional GANs trained on ECG traces | Synthetic ECG dataset | Visual indistinguishability from real ECGs | Addresses privacy issues via data simulation | Not tested for adversarial robustness | Impact on downstream clinical analytics unexplored |
| Ref16 (2020) | Deepfake detection in video using deep ensemble-based feature extraction | DeepFakeStack | Deep ensemble + multimodal feature learning | FaceForensics++, DFDC | 92.5% accuracy | Combines spectral, spatial, and temporal features | Resource-intensive training | Requires domain-tuned hyperparameter optimization |
| Proposed AFFETDS | Detecting subtle tumor insertions/removals resistant to adversarial attacks | ResNet50 + HOG + SVM ensemble | Adversarial training (PGD/FGSM), hybrid feature fusion, weighted voting | TCIA + ADNI (1,378 MRI scans) | 91.5% accuracy, 0.80 AUC | Combines adversarial robustness, feature diversity, and computational efficiency | Limited to brain MRI; untested on multi-modal data (CT/X-ray) | Requires extension to multi-modal imaging and clinical deployment validation |
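The weighted-voting fusion listed for the proposed AFFETDS row can be sketched as follows. This is a minimal illustration only, not the authors' pipeline: the "deep" (ResNet50-like) and "handcrafted" (HOG-like) feature vectors here are synthetic stand-ins, and the fusion weights are assumed values, not those reported in the paper.

```python
# Illustrative sketch of a weighted soft-voting ensemble over two feature
# streams (deep + handcrafted), as described for the proposed AFFETDS.
# All data and weights below are synthetic/hypothetical stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Labels: 200 real (0) and 200 tampered (1) scans.
y = np.concatenate([np.zeros(200), np.ones(200)]).astype(int)

# Synthetic stand-ins for ResNet50 embeddings and HOG descriptors.
deep_feats = rng.normal(loc=y[:, None] * 0.8, scale=1.0, size=(400, 32))
hog_feats = rng.normal(loc=y[:, None] * 0.6, scale=1.0, size=(400, 16))

# One classifier per feature stream.
clf_deep = LogisticRegression(max_iter=1000).fit(deep_feats, y)
clf_hog = SVC(probability=True).fit(hog_feats, y)

# Weighted soft voting: fuse the two probability streams.
w_deep, w_hog = 0.6, 0.4  # assumed weights, not from the paper
p_fake = (w_deep * clf_deep.predict_proba(deep_feats)[:, 1]
          + w_hog * clf_hog.predict_proba(hog_feats)[:, 1])
pred = (p_fake >= 0.5).astype(int)

acc = (pred == y).mean()
print(f"ensemble training accuracy: {acc:.3f}")
```

In a full pipeline the per-stream classifiers would also be trained on adversarially perturbed inputs (PGD/FGSM) before fusion; the voting step itself is unchanged by that choice.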