Table 3 ResNet vs. ViT: top training accuracy and loss range in nearby epochs.
From: Gun identification from gunshot audios for secure public places using transformer learning
Rifle vs Handgun shot audio dataset | Top accuracy (%) | Loss range in nearby epochs |
---|---|---|
ResNet50+MFCC+MelSpectrogram | 93.87 | 0.0004–0.0400 |
ViT-32+MFCC+MelSpectrogram | 93.87 | 0.2768–1.538 |