Table 6 Time consumption evaluation results.
From: Multimodal anomaly detection in complex environments using video and audio fusion
| Model | Training time (min) | Inference time (ms/frame) |
|---|---|---|
| AlexNet | 1.2 | 20 |
| CNN + LSTM | 1.8 | 30 |
| 3D CNN | 2.4 | 25 |
| Proposed model | 0.6 | 15 |
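The figures in Table 6 can be restated as throughput and relative cost with simple arithmetic. The sketch below is illustrative only. It copies the table values into a hypothetical `TABLE_6` dictionary, converts per-frame latency to frames per second with a hypothetical `frames_per_second` helper, and uses a hypothetical `speedups` helper to express each baseline's times as multiples of the proposed model's. The frames-per-second conversion assumes frames are processed one at a time, with no batching or pipelining, which the table does not state.

```python
# Values copied from Table 6: model -> (training time in min, inference time in ms/frame).
# TABLE_6 and the helpers below are hypothetical names used only for illustration.
TABLE_6 = {
    "AlexNet": (1.2, 20.0),
    "CNN + LSTM": (1.8, 30.0),
    "3D CNN": (2.4, 25.0),
    "Proposed model": (0.6, 15.0),
}


def frames_per_second(ms_per_frame: float) -> float:
    """Convert per-frame latency to throughput, assuming sequential (unbatched) processing."""
    return 1000.0 / ms_per_frame


def speedups(table: dict, reference: str = "Proposed model") -> dict:
    """For each other model, return (training-time ratio, inference-time ratio) vs. the reference.

    A ratio of 2.0 means that model takes twice as long as the reference.
    """
    ref_train, ref_infer = table[reference]
    return {
        name: (train / ref_train, infer / ref_infer)
        for name, (train, infer) in table.items()
        if name != reference
    }


if __name__ == "__main__":
    for name, (_, infer) in TABLE_6.items():
        print(f"{name:15s} {frames_per_second(infer):6.1f} fps")
    for name, (t_ratio, i_ratio) in speedups(TABLE_6).items():
        print(f"{name:15s} training x{t_ratio:.1f}, inference x{i_ratio:.2f}")
```

Under that assumption, 15 ms/frame corresponds to roughly 66.7 frames per second for the proposed model, against 50 frames per second for AlexNet at 20 ms/frame. AlexNet's reported training time is twice that of the proposed model.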