Table 6 Time consumption evaluation results.

From: Multimodal anomaly detection in complex environments using video and audio fusion

Model           Training time (min)  Inference time (ms/frame)
AlexNet         1.2                  20
CNN + LSTM      1.8                  30
3D CNN          2.4                  25
Proposed model  0.6                  15
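
The per-frame latencies in Table 6 can be easier to compare as throughput and as ratios against the proposed model. The following is a minimal sketch of that arithmetic, not part of the original evaluation. The names `timings`, `frames_per_second`, and `summarize` are hypothetical, and the conversion assumes frames are processed one at a time with no batching or pipelining.

```python
# Values copied from Table 6: (training time in min, inference time in ms/frame).
timings = {
    "AlexNet": (1.2, 20),
    "CNN + LSTM": (1.8, 30),
    "3D CNN": (2.4, 25),
    "Proposed model": (0.6, 15),
}


def frames_per_second(ms_per_frame):
    """Convert a per-frame latency in milliseconds to frames per second.

    Assumption: one frame is processed at a time (no batching).
    """
    return 1000.0 / ms_per_frame


def summarize(timings, reference="Proposed model"):
    """Return throughput and time ratios of each model against the reference."""
    ref_train, ref_infer = timings[reference]
    summary = {}
    for model, (train_min, infer_ms) in timings.items():
        summary[model] = {
            "fps": frames_per_second(infer_ms),
            # How many times longer than the reference this model takes.
            "train_ratio": train_min / ref_train,
            "infer_ratio": infer_ms / ref_infer,
        }
    return summary


summary = summarize(timings)
for model, row in summary.items():
    print(
        f"{model}: {row['fps']:.1f} fps, "
        f"training x{row['train_ratio']:.2f}, "
        f"inference x{row['infer_ratio']:.2f}"
    )
```

Under the no-batching assumption, 15 ms/frame corresponds to roughly 66.7 frames per second for the proposed model, against 50 for AlexNet at 20 ms/frame. The ratios only restate the table's figures and say nothing about hardware or dataset size, which the table does not report.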