Table 8 Complexity comparison of different models.

From: Multimodal anomaly detection in complex environments using video and audio fusion

| Model | Parameters (M) | Floating-point operations (FLOPs) | Average processing time (ms/frame) | AUC (UCSD Ped2) | AUC (Avenue) | AUC (ShanghaiTech) | AUC (UCF-Crime) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| STADNet | 8.5 | 150 | 35 | 0.95 | 0.93 | 0.93 | 0.91 |
| 2D CNN | 6.2 | 120 | 40 | 0.85 | 0.81 | 0.82 | 0.79 |
| LSTM | 9.0 | 180 | 50 | 0.88 | 0.86 | 0.84 | 0.83 |
| Traditional methods (e.g., optical flow) | 2.0 | 60 | 30 | 0.75 | 0.70 | 0.72 | 0.68 |

Note: Floating-point operations (FLOPs) is the number of floating-point operations required for one forward pass and measures the computational complexity of the model; higher FLOPs mean greater complexity and usually more computational resources. Average processing time (ms/frame) is the mean time the model takes to process one video frame, in milliseconds; lower values mean faster processing and better suitability for real-time video analysis. AUC (dataset) is the Area Under Curve (AUC) obtained by the model on the named dataset; a higher AUC indicates better anomaly detection performance, i.e., normal and abnormal events are distinguished more accurately.
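As a rough illustration of two of the metrics defined in the note above, the following minimal sketch shows how an AUC value and a frames-per-second figure could be computed. It assumes that AUC means the area under the ROC curve, computed here through its pairwise-ranking interpretation; the source does not state which curve or which evaluation code was used. The function names and the toy labels and scores are hypothetical and are not taken from the paper. Only the ms/frame values come from the table.

```python
def auc_from_scores(labels, scores):
    """ROC AUC via its ranking interpretation (an assumption, see lead-in).

    Returns the fraction of (abnormal, normal) pairs in which the abnormal
    sample receives the higher anomaly score; ties count as half a win.
    labels: 1 = abnormal, 0 = normal. scores: higher = more anomalous.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one normal and one abnormal sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def frames_per_second(ms_per_frame):
    """Convert an average processing time in ms/frame to throughput in frames/s."""
    return 1000.0 / ms_per_frame


# Hypothetical toy data, for illustration only (not from the paper).
toy_labels = [0, 0, 1, 1, 0, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
toy_auc = auc_from_scores(toy_labels, toy_scores)

# Average processing times (ms/frame) as reported in the table.
reported_ms_per_frame = {
    "STADNet": 35,
    "2D CNN": 40,
    "LSTM": 50,
    "Traditional methods": 30,
}
throughput = {name: frames_per_second(ms) for name, ms in reported_ms_per_frame.items()}

if __name__ == "__main__":
    print(f"toy AUC: {toy_auc:.3f}")
    for name, fps in throughput.items():
        print(f"{name}: {fps:.1f} frames/s")
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a small illustration; a sort-based or library implementation would be the more usual choice on full datasets.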