Table 1. Comparison of existing approaches and the proposed model for image forgery detection.
Drawback addressed | Existing approaches | Proposed model |
---|---|---|
Localization | Approaches such as LBP + DCT (e.g., ref. 3) achieve high accuracy but lack localization and computational efficiency | EfficientNetV2B0 backbone with SE-attention and Fused MBConv enhances feature extraction for better localization |
Adaptability to unseen forgeries | Gabor wavelets and LPQ with NMF (e.g., ref. 4) show strong rotation and scale invariance but are limited by handcrafted texture descriptors | Deep learning-based approach with transfer learning and Focal Loss improves adaptability to unseen forgeries |
Deep semantic understanding | Statistical methods such as GLCM and BDCT (e.g., ref. 5) perform well under noise but lack deep semantic understanding | The model integrates deep learning features, enhancing semantic understanding for forgery detection |
Adversarial robustness | Hybrid systems such as CNN-DWT (e.g., ref. 8) lack robustness against adversarial attacks and do not localize forgeries | The proposed model is robust to adversarial attacks, enhancing generalization and localization across datasets |
Real-time applicability | Vision Transformer (ViT) with SAM (e.g., ref. 11) incurs high computational overhead, limiting real-time applicability | The proposed model is optimized for real-time detection with minimal latency using a lightweight architecture |
Robustness under compression | ResNet-50 with multi-scale loss (e.g., ref. 12) improves boundary attention but struggles under compression | EfficientNetV2B0 backbone with SE-attention and Focal Loss ensures robustness under compression and diverse manipulations |
Generalization across datasets | Methods such as CLAHE-boosted CNN + SVM (e.g., ref. 9) struggle with cross-dataset generalization | The model outperforms 42 state-of-the-art methods, showing high generalization across diverse datasets and manipulation types |
Complexity and latency | Complex models such as the ResNet-ViT hybrid (e.g., ref. 19) are computationally expensive with high latency | The proposed model reduces complexity with a lightweight architecture while achieving high performance for real-time use |
Handling subtle manipulations | MAC-Net (e.g., ref. 18) is limited to splicing cases and struggles with subtle manipulations | The proposed model excels at detecting subtle manipulations using SE-attention and multi-resolution feature extraction |
Rotation and scale invariance | Gabor wavelets + LPQ (e.g., ref. 4) provide rotation and scale invariance but are limited by handcrafted features | The proposed model’s deep learning approach allows better feature extraction, improving performance on rotated and scaled images |
Boundary awareness | ME-Net (e.g., ref. 13) enhances edge localization but introduces complexity | EfficientNetV2B0 with SE-attention blocks improves boundary detection without additional complexity |
Cross-resolution generalization | Models such as ConvNeXtFF (e.g., ref. 15) are constrained by a fixed input resolution and high resource demands | The proposed model generalizes across multiple image resolutions and manipulation types effectively |
Handling noisy data | CSR-Net (e.g., ref. 17) introduces spline-based regression but is sensitive to threshold and curve-fitting choices | The model handles noisy data effectively by integrating SE-attention and Focal Loss for robust learning |
Generalization to various manipulation types | LBRT (e.g., ref. 16) is limited to copy-move detection and struggles with subtle manipulations | The proposed model handles multiple forgery types (splicing, copy-move, hybrid) and performs well under varying conditions |
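
Several rows of Table 1 credit Focal Loss with improving robustness on hard or noisy samples. The following is a minimal sketch of that idea, assuming the standard binary formulation FL(p_t) = -α_t (1 - p_t)^γ log(p_t). The function name `binary_focal_loss` is hypothetical, and the values γ = 2 and α = 0.25 are commonly used defaults, not settings reported in the table.

```python
import math


def binary_focal_loss(y_true, p_pred, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss over a batch (illustrative sketch only).

    y_true: iterable of 0/1 labels (assumed here: 1 = forged, 0 = authentic).
    p_pred: iterable of predicted probabilities for class 1.
    gamma, alpha: assumed defaults, not values taken from the paper.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Clip to avoid log(0).
        p = min(max(p, eps), 1.0 - eps)
        # p_t is the probability assigned to the true class.
        p_t = p if y == 1 else 1.0 - p
        # alpha_t weights the positive class by alpha, the negative by 1 - alpha.
        alpha_t = alpha if y == 1 else 1.0 - alpha
        # (1 - p_t)^gamma shrinks the contribution of well-classified samples.
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(y_true)


# A confidently correct prediction contributes far less than a hard one.
easy = binary_focal_loss([1], [0.95])
hard = binary_focal_loss([1], [0.30])
print(easy < hard)
```

With γ = 0 and α = 0.5 the expression reduces to half the ordinary binary cross-entropy, which makes the modulating factor (1 - p_t)^γ the only ingredient that shifts emphasis toward hard examples.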