Table 1 Overview of studies assessing the performance of deep learning models in medical imaging.

From: Vision language models versus machine learning models performance on polyp detection and classification in colonoscopy images

| First author, year | VLM/Model | Field/Task | Modality | Performance/Contribution |
|---|---|---|---|---|
| Pecal, 2021 [4] | YOLOv3 + CSPNet, SiLU | Gastroenterology, polyp detection | Colonoscopy | Improved YOLOv3/YOLOv4 with higher precision and recall; validated on large datasets, enhancing clinical usability. |
| Karaman, 2023b [5] | YOLOv5 + ABC (artificial bee colony) optimization | Gastroenterology, polyp detection | Colonoscopy | ABC-tuned hyperparameters and activation functions; outperformed baseline YOLOv5 in accuracy and speed. |
| Karaman, 2023a [6] | Scaled-YOLOv4 + ABC | Gastroenterology, polyp detection | Colonoscopy | First systematic YOLO optimization; +3% mAP and +2% F1 across multiple variants. |
| Pecal and Karaboga, 2021 [7] | YOLOv4 + CSPNet, Mish, ensemble | Gastroenterology, polyp detection | Colonoscopy | State-of-the-art detection (precision 96%, recall 97%, F1 96%) with real-time applicability. |
| Narasimha Raju, 2025 [9] | Hybrid CNN (ResNet-50, DenseNet-201, VGG-16) + Transformer + multi-class SVM + Grad-CAM | Colorectal cancer (CRC), multi-class lesion detection | Colonoscopy | Accuracy 98%, F1 0.98, precision 97%, recall 99%. Addressed class imbalance, interpretability, and spatial complexity with explainable heatmaps; sets a new benchmark for clinically interpretable AI-assisted colonoscopy. |
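Two rows report precision, recall, and F1 together. Because F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R), those figures can be roughly cross-checked. The short Python sketch below does this. The helper name `f1_from_precision_recall` is hypothetical, and the check assumes each reported F1 was computed from the same aggregate precision and recall; that may not hold if a study reports macro-averaged per-class scores, as a multi-class study might.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall: F1 = 2PR / (P + R)."""
    if precision + recall == 0:
        return 0.0  # convention: F1 is 0 when both precision and recall are 0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as fractions, taken from Table 1.
# Assumption: the reported F1 is derived from these aggregate values.
reported = {
    "Pecal and Karaboga, 2021": (0.96, 0.97, 0.96),
    "Narasimha Raju, 2025": (0.97, 0.99, 0.98),
}

for study, (p, r, f1_reported) in reported.items():
    f1 = f1_from_precision_recall(p, r)
    print(f"{study}: computed F1 = {f1:.3f}, reported F1 = {f1_reported:.2f}")
```

Both computed values (about 0.965 and 0.980) round to the reported F1 at two decimal places, so the reported precision, recall, and F1 in these rows are mutually consistent under the stated assumption.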