Table 3 ResNet vs ViT: top training accuracy and loss range in nearby epochs.

From: Gun identification from gunshot audios for secure public places using transformer learning

| Rifle vs Handgun shot audio dataset | Top accuracy (%) | Loss range in nearby epochs |
| --- | --- | --- |
| ResNet50+MFCC+MelSpectrogram | 93.87 | 0.0004–0.0400 |
| ViT-32+MFCC+MelSpectrogram | 93.87 | 0.2768–1.538 |