Table 5. Class-wise performance of augmented models using different architectures.

From: Automated non-PPE detection on construction sites using YOLOv10 and transformer architectures for surveillance and body worn cameras with benchmark datasets

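| Architecture | Metric | Non-helmet | Non-mask | Non-vest | Non-glove | Non-shoes | mAP |
|---|---|---|---|---|---|---|---|
| ViT | AP50 | 91.48 | 87.13 | 86.26 | 84.51 | 82.77 | 86.43 |
| ViT | AP50:95 | 67.54 | 64.32 | 63.68 | 62.39 | 61.11 | 63.81 |
| ViT | IOU | 86.17 | 82.07 | 81.25 | 79.6 | 77.96 | 81.41 |
| ViT | AP_S | 62.82 | 59.83 | 59.23 | 58.03 | 56.84 | 59.35 |
| ViT | AP_M | 88.3 | 84.09 | 83.25 | 81.57 | 79.89 | 83.42 |
| ViT | AP_L | 95.39 | 90.85 | 89.94 | 88.12 | 86.3 | 90.12 |
| Swin Transformer | AP50 | 92.35 | 87.95 | 87.07 | 85.32 | 83.56 | 87.25 |
| Swin Transformer | AP50:95 | 68.29 | 65.04 | 64.39 | 63.09 | 61.79 | 64.52 |
| Swin Transformer | IOU | 87.39 | 83.23 | 82.39 | 80.73 | 79.06 | 82.56 |
| Swin Transformer | AP_S | 64.5 | 61.43 | 60.82 | 59.59 | 58.36 | 60.94 |
| Swin Transformer | AP_M | 89.9 | 85.61 | 84.76 | 83.05 | 81.33 | 84.93 |
| Swin Transformer | AP_L | 96.95 | 92.33 | 91.41 | 89.56 | 87.71 | 91.59 |
| PVT | AP50 | 91.5 | 87.15 | 86.28 | 84.53 | 82.79 | 86.45 |
| PVT | AP50:95 | 66.55 | 63.38 | 62.74 | 61.48 | 60.21 | 62.87 |
| PVT | IOU | 85.46 | 81.39 | 80.58 | 78.95 | 77.32 | 80.74 |
| PVT | AP_S | 62.2 | 59.23 | 58.64 | 57.46 | 56.27 | 58.76 |
| PVT | AP_M | 88.2 | 83.95 | 83.13 | 81.48 | 79.8 | 83.33 |
| PVT | AP_L | 95.17 | 90.64 | 89.73 | 87.92 | 86.1 | 89.91 |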
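
The mAP values in Table 5 appear to be the unweighted mean of the five class-wise values in each row. The source does not state this, so it is an assumption. The short Python sketch below checks it against the AP50 rows. The names `ap50`, `reported_map`, and `mean_ap` are illustrative and do not come from the article.

```python
# Class-wise AP50 values copied from Table 5, in the order:
# non-helmet, non-mask, non-vest, non-glove, non-shoes.
ap50 = {
    "ViT": [91.48, 87.13, 86.26, 84.51, 82.77],
    "Swin Transformer": [92.35, 87.95, 87.07, 85.32, 83.56],
    "PVT": [91.5, 87.15, 86.28, 84.53, 82.79],
}

# mAP column as reported in Table 5 for the AP50 rows.
reported_map = {"ViT": 86.43, "Swin Transformer": 87.25, "PVT": 86.45}


def mean_ap(values):
    """Unweighted mean over classes, rounded to two decimals (assumed convention)."""
    return round(sum(values) / len(values), 2)


for name, values in ap50.items():
    # Print the recomputed mean next to the reported mAP for comparison.
    print(f"{name}: recomputed={mean_ap(values):.2f} reported={reported_map[name]:.2f}")
```

The same check can be run on any other metric row by swapping in that row's class-wise values and reported mAP.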