Table 1. Comparison of existing approaches and the proposed model for image forgery detection.

From: Multi-resolution transfer learning for tampered image classification using SE-enhanced fused-MBConv and optimized CNN heads

| Drawback addressed | Existing approaches | Proposed model |
|---|---|---|
| Localization | Approaches such as LBP + DCT [3] achieve high accuracy but lack localization and computational efficiency | An EfficientNetV2B0 backbone with SE-attention and Fused-MBConv blocks strengthens feature extraction for better localization |
| Adaptability to unseen forgeries | Gabor wavelets and LPQ with NMF [4] show strong rotation and scale invariance but are limited by handcrafted texture descriptors | A deep-learning approach combining transfer learning with Focal Loss improves adaptability to unseen forgeries |
| Deep semantic understanding | Statistical methods such as GLCM and BDCT [5] perform well under noise but lack deep semantic understanding | The model integrates deep learned features, strengthening semantic understanding for forgery detection |
| Adversarial robustness | Hybrid systems such as CNN-DWT [8] lack robustness against adversarial attacks and lack localization | The proposed model is robust to adversarial attacks, improving generalization and localization across datasets |
| Real-time applicability | A Vision Transformer (ViT) with SAM [11] incurs high computational overhead, limiting real-time applicability | A lightweight architecture optimizes the model for real-time detection with minimal latency |
| Robustness under compression | ResNet-50 with a multi-scale loss [12] improves boundary attention but struggles under compression | An EfficientNetV2B0 backbone with SE-attention and Focal Loss provides robustness under compression and diverse manipulations |
| Generalization across datasets | Methods such as a CLAHE-boosted CNN + SVM [9] struggle with cross-dataset generalization | The model outperforms 42 state-of-the-art methods, showing strong generalization across diverse datasets and manipulation types |
| Complexity and latency | Complex models such as a ResNet-ViT hybrid [19] are computationally expensive and have high latency | A lightweight architecture reduces complexity while maintaining high performance for real-time use |
| Handling subtle manipulations | MAC-Net [18] is limited to splicing and struggles with subtle manipulations | SE-attention and multi-resolution feature extraction enable the model to detect subtle manipulations |
| Rotation and scale invariance | Gabor wavelets + LPQ [4] provide rotation and scale invariance but are limited by handcrafted features | Learned deep features improve feature extraction and performance on rotated and scaled images |
| Boundary awareness | ME-Net [13] enhances edge localization but increases model complexity | EfficientNetV2B0 with SE-attention blocks improves boundary detection without additional complexity |
| Cross-resolution generalization | Models such as ConvNeXtFF [15] are constrained by a fixed input resolution and high resource demands | The proposed model generalizes effectively across multiple image resolutions and manipulation types |
| Handling noisy data | CSR-Net [17] introduces spline-based regression but is sensitive to threshold and curve-fitting choices | The model handles noisy data effectively by combining SE-attention with Focal Loss for robust learning |
| Generalization to various manipulation types | LBRT [16] is limited to copy-move detection and struggles with subtle manipulations | The proposed model handles multiple forgery types (splicing, copy-move, hybrid) and performs well under varying conditions |
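Several rows of Table 1 credit SE-attention, meaning the squeeze-and-excitation (SE) block, for better localization, boundary detection, and handling of subtle manipulations. A minimal pure-Python sketch of the mechanism follows. It assumes a C × H × W feature map stored as nested lists, a ReLU bottleneck as in the original SE formulation, and hand-supplied weights. The reduction ratio, activation, and placement used inside the paper's EfficientNetV2B0 backbone are not given in this table, and `squeeze_excite` is a hypothetical helper name.

```python
import math
from typing import List

Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def squeeze_excite(feature_map: List[Matrix],
                   w1: Matrix, b1: List[float],
                   w2: Matrix, b2: List[float]) -> List[Matrix]:
    """Apply SE channel recalibration to a C x H x W feature map (sketch).

    w1: (C/r) x C reduction weights, b1: C/r biases   (hypothetical shapes)
    w2: C x (C/r) expansion weights, b2: C biases
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]
    # Excitation, step 1: bottleneck fully connected layer followed by ReLU.
    hidden = [max(0.0, sum(w * zi for w, zi in zip(row, z)) + b)
              for row, b in zip(w1, b1)]
    # Excitation, step 2: expand back to C values and squash to (0, 1) gates.
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden)) + b)
             for row, b in zip(w2, b2)]
    # Scale: multiply every spatial position of channel c by gate c.
    return [[[v * g for v in row] for row in ch]
            for ch, g in zip(feature_map, gates)]


# Usage: two 2 x 2 channels, reduction to a single hidden unit (r = 2).
fm = [[[1.0, 1.0], [1.0, 1.0]],
      [[2.0, 2.0], [2.0, 2.0]]]
out = squeeze_excite(fm, w1=[[1.0, 1.0]], b1=[-1.0],
                     w2=[[1.0], [-1.0]], b2=[0.0, 0.0])
```

In a trained network `w1`, `b1`, `w2`, and `b2` are learned. Because each gate lies in (0, 1), the block reweights channels relative to one another, which is how SE-attention is intended to emphasize the channels that carry the most useful cues.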
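Focal Loss is credited in the table for adaptability to unseen forgeries, robustness under compression, and handling noisy data. It scales each sample's cross-entropy by (1 − p_t)^γ so that confidently classified images contribute little to the loss. The sketch below implements the standard binary form, FL(p_t) = −α_t (1 − p_t)^γ log(p_t), for an authentic (0) versus tampered (1) label. The defaults γ = 2 and α = 0.25 are the values commonly used since the original focal-loss paper. They are assumed here because the table does not state the paper's settings, and the helper names are hypothetical.

```python
import math
from typing import Sequence


def binary_focal_loss(p: float, y: int, gamma: float = 2.0,
                      alpha: float = 0.25, eps: float = 1e-7) -> float:
    """Focal loss for one prediction (sketch; gamma/alpha are assumed defaults).

    p: predicted probability that the image is tampered (class 1)
    y: ground-truth label, 1 = tampered, 0 = authentic
    """
    p = min(max(p, eps), 1.0 - eps)           # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p            # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def mean_focal_loss(probs: Sequence[float], labels: Sequence[int],
                    gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Average focal loss over a batch of predictions."""
    losses = [binary_focal_loss(p, y, gamma, alpha) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)


# Usage: an easy and a harder tampered example.
easy = binary_focal_loss(0.9, 1)
hard = binary_focal_loss(0.6, 1)
```

With γ = 2, a tampered image predicted at p = 0.9 keeps only (1 − 0.9)² = 1% of its α-weighted cross-entropy, whereas a harder example at p = 0.6 keeps (1 − 0.6)² = 16%. Training therefore concentrates on ambiguous images. Setting γ = 0 recovers α-weighted cross-entropy.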