Table 5. DL classification results. Frame-level predictions are aggregated per video by majority voting.

Initial weights | Model | Accuracy | Positive precision | Positive recall | Positive F1 | Negative precision | Negative recall | Negative F1
---|---|---|---|---|---|---|---|---
Supervised | ResNet50 | 0.81 | 0.91 | 0.70 | 0.79 | 0.75 | 0.93 | 0.83
Supervised | ViT | 0.82 | 0.80 | 0.87 | 0.83 | 0.85 | 0.77 | 0.81
DINO | ResNet50 | 0.81 | 0.92 | 0.70 | 0.79 | 0.75 | 0.94 | 0.83
DINO | ViT | 0.89 | 0.94 | 0.84 | 0.89 | 0.85 | 0.95 | 0.90
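
The per-video aggregation named in the caption can be illustrated with a minimal sketch. It assumes each frame prediction arrives as a `(video_id, label)` pair with labels 0 (negative) and 1 (positive). The function name `aggregate_by_majority` and the tie-breaking rule (ties go to the positive class) are hypothetical choices made for illustration; the source does not state how ties are resolved.

```python
from collections import Counter, defaultdict


def aggregate_by_majority(frame_preds):
    """Collapse frame-level labels into one label per video by majority vote.

    frame_preds: iterable of (video_id, label) pairs, label in {0, 1}.
    Returns a dict mapping video_id -> winning label.
    """
    votes = defaultdict(Counter)
    for video_id, label in frame_preds:
        votes[video_id][label] += 1

    # Assumption: a tie is resolved toward the larger label (positive = 1).
    return {
        video_id: max(counts.items(), key=lambda kv: (kv[1], kv[0]))[0]
        for video_id, counts in votes.items()
    }


frames = [("a", 1), ("a", 0), ("a", 1), ("b", 0), ("b", 0), ("b", 1)]
video_labels = aggregate_by_majority(frames)
```

Here `video_labels` holds one label per video, which is the level at which the accuracy, precision, recall, and F1 values in the table are reported.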