Table 1. Summary of existing literature on non-PPE detection.

From: Automated non-PPE detection on construction sites using YOLOv10 and transformer architectures for surveillance and body worn cameras with benchmark datasets

| Reference | Target PPE | Dataset | Method | Potential limitation compared to proposed method |
| --- | --- | --- | --- | --- |
| Fang et al. [13] | Helmet | Custom dataset (not publicly available) | Faster R-CNN | Lower inference speed and detection accuracy |
| Gu et al. [14] | Helmet | Custom dataset (not publicly available) | Faster R-CNN with multi-scale training | Lower generalization, limited small-object detection |
| Yang et al. [15] | Helmet | Custom dataset (not publicly available) | YOLOv3 with DarkNet53 | Inferior accuracy and weaker performance on small objects |
| Yan and Wang [16] | Helmet | Custom dataset (not publicly available) | YOLOv3 with DarkNet53 | Less robust detection performance and slower inference |
| Shen et al. [17] | Helmet | Custom dataset (not publicly available) | VGG16-based face detector (face); DenseNet-based classifier (helmet) | Two-stage method, slower inference, and limited scalability |
| Nath [18] | Helmet and vest | Custom dataset (not publicly available) | YOLOv3 (worker detection); VGG16, ResNet, Xception (helmet and vest classifiers) | Two-stage classification, lower real-time detection capability |
| Wang et al. [19] | Helmet and vest | Custom dataset (not publicly available) | YOLOv5x and YOLOv5s | Lower AP, reduced accuracy on challenging small-scale objects |
| Lee et al. [20] | Helmet and vest | Custom dataset (not publicly available) | Mask R-CNN with MobileNetV3 | Slower inference speed due to complex instance segmentation |
| Nguyen et al. [21] | Helmet, mask, glove, vest, shoes | Custom dataset (not publicly available) | SHO-based YOLOv5 | Moderate detection accuracy and limited robustness to object scale |
| Park et al. [22] | Helmet, mask, glove, vest, shoes | Custom dataset (not publicly available) | YOLOv8, Swin Transformer, Axial Transformer | Slightly slower inference speed and lower overall accuracy |