Introduction

In recent years, surveillance cameras have been deployed everywhere, and unusual events are frequently captured in their footage. People pay more attention to unusual events than to normal events, but analyzing events in surveillance footage on a 24/7 basis is time and memory consuming. Therefore, video analytics systems have to be built for surveillance video to differentiate usual and unusual events. An unusual event is an abnormal event that needs to be distinguished from normal events; however, an unusual event is not necessarily a suspicious event from the surveillance point of view1,2,3. Exploring normal and abnormal events in surveillance video has become an important task in recent years due to increased crime rates. Such tasks are also a key focus area in surveillance applications such as traffic monitoring, vehicle congestion, traffic violations, vehicle intrusion, accidents, terrorism, etc. The above problems can be handled using keyframe extraction techniques4,5, object detection and tracking techniques6,7, unusual event detection techniques8,9,10,11,12,13, etc.

In the recent past, many surveillance-based applications have been developed using machine and deep learning approaches14,15,16,17,18,19. These approaches have been built for detecting unusual events using hand-crafted features, statistical features, etc. Although noteworthy innovations have been achieved in unusual event detection20,21,22,23, problems remain in handling repetitive frames and in accurately detecting unusual events. Since surveillance cameras record events continuously, many repetitive frames appear in the frame sequence. To remove these repetitive frames and reduce memory storage, key frames have to be extracted. Key frames are the representative frames of the original video, and they can be extracted based on user needs.

To deal with repetitive frames and unusual event detection, this paper introduces the Key Frame Extraction based Abnormal Vehicle Identification (KFEAVI) technique, which uses a statistical feature extraction technique and a constrained angular second moment technique. The statistical feature extraction technique extracts key frames in a statistical way using beta distribution estimation, and it handles both gradual and abrupt content changes in frames well. The constrained angular second moment method is applied to locate vehicles and identify abnormal vehicle movement. Our contributions are summarized as follows.

  1. The KFEAVI methodology is developed for extracting key frames and identifying abnormal vehicles.

  2. Beta distribution estimation is applied to the frame sequence to analyze the content and to find the similarity between frames.

  3. CASM is developed for identifying vehicles and their locations using the Vehicle Degree Vector.

  4. To achieve abnormal vehicle identification using keyframes, the beta distribution and CASM methodologies are fused.

Related work

Many unusual event detection works have been carried out, and considerable progress has been made in the area of surveillance systems. In this section, some of these works are reviewed. Kosmopoulos and Chatzis24 applied pixel-wise computation on two-dimensional surveillance frame sequences using a holistic visual behaviour understanding technique. Yen and Wang25 presented an abnormal event detection method called the Social Force Model (SFM) for crowded scenarios. The SFM is developed using optical flow computation; every individual is treated as a moving particle, and the SFM models the interaction force between every pair of particles. Zhang et al.26 extended the SFM into the Social Attribute-Aware Force Model (SAFM). Similarly, Chaker et al.27 developed a social network model using an unsupervised approach for unusual event detection in crowded scenarios. Amraee et al.28 developed a feature descriptor using the histogram of oriented gradients (HOG) and a Gaussian Mixture Model (GMM) to detect unusual events in the frame sequence. Hung et al.29 extracted SIFT features using a Spatial Pyramid Matching Kernel (SPMK) based BoW model for describing crowd motion. Sandhan et al.30 developed a method for unusual event detection using an unsupervised learning approach, built on the general perception that normal events occur often while abnormal events occur rarely. Tziakos et al.31 explored projection subspace association to handle unusual event detection for both labelled and unlabelled data in a supervised learning approach. The authors noted that the labelled information about normal events was limited, possibly because not all kinds of normal events were learned in the training stage; therefore, the performance of unusual event detection is poor in many cases. Shi et al.32 developed a model for unusual event detection using spatio-temporal co-occurrence Gaussian mixture models (STCOG). Yin et al.33 developed a method to increase the information content of the motion feature vector based on crowd density. Mahadevan et al.34 developed a normal behaviour model using dynamic texture features to form a mixture model for detecting unusual events. Singh and Mohan32 developed an approach for unusual activity recognition based on a graph formulation method and a graph kernel SVM. Gu et al.35 introduced a method that exploits the GMM and particle entropy to represent the crowd distribution in crowded scenarios. In general, the size of a crowd changes according to the number of people appearing in it. To handle such random crowd sizes and motion patterns in unusual situations, Lee et al.36 developed a Human Motion Analysis Method (HMAM) built on statistical information and entropy computation. Optical flow computation over low-level features can reflect the relative distance of multiple moving objects in a scene at two different moments; this analysis is helpful in detecting unusual events in surveillance video.

This paper introduces the Key Frame Extraction based Abnormal Vehicle Identification (KFEAVI) technique, which combines a statistical feature extraction technique with a constrained angular second moment technique to identify abnormal vehicles from extracted key frames.

Proposed KFEAVI methodology

The KFEAVI considers the surveillance video as the input. Let V be the input surveillance video, described by M frames. The frame sequence can be denoted as F = {f1, f2, …, fM}, where j = 1 to M indexes frame fj. A frame fj may contain redundant information. To find keyframe information in the frame sequence, the statistical distribution of frame colors is analyzed over the frame sequence. Furthermore, the keyframes are used to find abnormal vehicles using CASM. The workflow of the KFEAVI is shown in Fig. 1.

Fig. 1. The workflow of the KFEAVI.

Statistical feature extraction

For efficient feature representation, this work uses a statistical feature representation model based on the Beta Distribution (BD) over the color information of the frame sequence. In general, corresponding regions of adjacent frames have different content wherever object motion causes either slow or rapid change. These differences have to be measured for efficient key frame extraction. Therefore, each input frame is divided uniformly into non-overlapping regions. Let rk be the regions, where k = 1 to K. The BD is computed on each region using the Standard Deviation (SD) st of rk.

Standard deviation computation

The SD is computed on the image using color content. Let pij be a pixel of rk, as arranged in Eq. 1.

$$r_k=\begin{bmatrix} p_{11} & p_{12} & p_{13} & \cdots & p_{1n} \\ p_{21} & p_{22} & p_{23} & \cdots & p_{2n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ p_{m1} & p_{m2} & p_{m3} & \cdots & p_{mn} \end{bmatrix},$$
(1)

where p is a pixel of rk, m is the number of rows of rk, and n is the number of columns of rk, respectively. The SD is computed by Eq. 2.

$$s_t=\sqrt{\frac{\sum\limits_{i=1}^{m}\sum\limits_{j=1}^{n}\left(p_{ij}-\mu\right)^{2}}{N}}$$
(2)

where pij is an individual pixel value, µ is the mean value of rk computed over all its pixels, and N = m × n is the total number of pixels in rk.
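To make the region-wise computation concrete, the following sketch partitions a grayscale frame into a uniform grid of non-overlapping regions and evaluates Eq. 2 in each region. This is an illustrative Python sketch (the original work is implemented in MATLAB), and the grid size and function name are hypothetical choices, since the paper does not fix the number of regions K.

```python
import numpy as np

def region_std(frame, grid=(4, 4)):
    """Per-region standard deviation s_t (Eq. 2) of a grayscale frame.

    `frame` is a 2-D array of pixel intensities; `grid` gives the
    number of non-overlapping regions per axis (a hypothetical
    choice -- the paper does not specify K)."""
    f = frame.astype(float) / 255.0      # normalize to [0, 1] for the BD step
    h, w = f.shape
    rh, rw = h // grid[0], w // grid[1]
    stds = np.empty(grid)
    for i in range(grid[0]):
        for j in range(grid[1]):
            region = f[i * rh:(i + 1) * rh, j * rw:(j + 1) * rw]
            # population SD over all N = m x n pixels of the region
            stds[i, j] = np.sqrt(np.mean((region - region.mean()) ** 2))
    return stds
```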

Beta distribution estimation on region

The Beta Distribution (BD) is employed to model the pixel intensity distribution within a given region. As a continuous probability distribution defined on the interval [0, 1], the BD is well-suited for capturing various shapes and characteristics of intensity variations across frame sequences. By computing the BD over the normalized pixel values of rk, the model can effectively represent subtle changes in regional appearance. These distributional shapes can then be compared across consecutive frames to measure similarity and identify potential anomalies in the scene. The BD is computed using Eq. 3.

$$BD=\frac{1}{B(\alpha,\beta)}\,s_t^{\alpha-1}\left(1-s_t\right)^{\beta-1}$$
(3)

The term B(α, β) is estimated using Eq. 4. The values of α and β must be greater than 0.

$$B(\alpha,\beta)=\frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha+\beta)}$$
(4)

Parameters estimation

In beta distribution analysis, α and β are the two shape parameters that control the shape of the BD. Estimating these parameters produces a correctly fitted probability distribution curve for the input data. The parameters are defined using Eqs. 5 and 6, where st² denotes the variance of the region.

$$\alpha=\left(\frac{1-\mu}{s_t^{2}}-\frac{1}{\mu}\right)\mu^{2}$$
(5)
$$\beta =\alpha \left( {\frac{1}{\mu } - 1} \right)$$
(6)
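The parameter estimation and density evaluation can be sketched as follows in Python, assuming the method-of-moments reading of Eqs. 5 and 6 in which st² is the regional variance of the normalized pixel values; the function names are illustrative. The BD value of a region is then obtained by evaluating the density at x = st, following Eq. 3.

```python
from math import gamma

def beta_params(mu, var):
    """Shape parameters alpha and beta from the regional mean `mu`
    and variance `var` of normalized pixel values (Eqs. 5-6)."""
    alpha = ((1.0 - mu) / var - 1.0 / mu) * mu ** 2
    beta = alpha * (1.0 / mu - 1.0)
    return alpha, beta

def beta_density(x, alpha, beta):
    """Beta density (Eq. 3), with B(alpha, beta) from Eq. 4."""
    b = gamma(alpha) * gamma(beta) / gamma(alpha + beta)
    return x ** (alpha - 1) * (1.0 - x) ** (beta - 1) / b
```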

Key frames extraction

In the BD computation, every rk gets a BD value. For extracting key frames, the corresponding BDs are used to measure similarity via a distance measure D. Let ABD be a region of the current frame and BBD be the corresponding region of the consecutive frame. In this process, each region's BD value is subtracted from the BD value of its corresponding region, as given in Eq. 7.

$$D=\sqrt{\left(A_{BD}-B_{BD}\right)^{2}}$$
(7)

Finally, the D values of the regions of the frame are aggregated. Let DT be the total similarity value of the current frame. For efficient keyframe extraction, a threshold value η is applied to DT. If the DT value is greater than the η value, the frame is termed a key frame, as given in Eq. 8. The threshold value η is set experimentally.

$$Key\;Frame=\begin{cases} yes, & \text{if } D_T>\eta \\ no, & \text{otherwise} \end{cases}$$
(8)
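A compact sketch of the key frame decision follows, assuming the per-region BD values of the current and previous frame have been collected into arrays; η is the experimentally set threshold.

```python
import numpy as np

def is_key_frame(bd_curr, bd_prev, eta):
    """Key frame test (Eqs. 7-8) over per-region BD values."""
    d = np.sqrt((bd_curr - bd_prev) ** 2)   # Eq. 7, region-wise distance
    d_total = d.sum()                       # aggregate over all regions
    return d_total > eta                    # Eq. 8
```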

Foreground vehicles identification

The output key frames are used to find the abnormal event frames. Let Ke be the set of key frames, where K1, K2, K3, …, KN are the individual key frames. These frames consist of background and foreground pixels. Usually, most actions are performed by the foreground objects, and those actions are mainly used to find abnormal events. Therefore, background subtraction is carried out on Ke to identify the foreground pixels. This work uses the background model6 to separate the foreground vehicles from the background objects using a reference frame.

The foreground vehicles are segmented from the foreground regions by applying the connected component analysis algorithm37 on Ke. This analysis connects each foreground pixel with its upper, lower, left, and right neighbours. Let Qc be the foreground vehicles, where c = 1 to w, as given in Eq. 9.

$$Q_c=\left\{q_1,\,q_2,\,q_3,\,\ldots,\,q_w\right\}$$
(9)
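A minimal Python sketch of this stage is given below. Simple reference-frame differencing stands in for the cited background model6, and 4-connected labelling from scipy.ndimage approximates the cited connected component algorithm37; the threshold and minimum-area values are hypothetical tuning parameters.

```python
import numpy as np
from scipy import ndimage

def foreground_vehicles(key_frame, background, thresh=30, min_area=200):
    """Segment candidate vehicles Q_c from a key frame (Eq. 9)."""
    # foreground mask: pixels that differ strongly from the reference frame
    diff = np.abs(key_frame.astype(int) - background.astype(int))
    mask = diff > thresh
    # 4-connected component labelling (up/down/left/right neighbours)
    labels, num = ndimage.label(mask)
    vehicles = []
    for c in range(1, num + 1):
        ys, xs = np.nonzero(labels == c)
        if ys.size >= min_area:                  # drop small noise blobs
            vehicles.append((xs.mean(), ys.mean()))  # centroid (x, y)
    return vehicles
```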

Abnormal vehicle identification using CASM method

The Qc is subjected to spatial analysis to find abnormal events. The spatial information of Qc is obtained by introducing the Constrained Angular Second Moment (CASM) method. For this analysis, a Vehicle Degree Vector (VDV) is created as a one-dimensional vector Vf, where f = 1 to n. In Vf, vehicle occurrences are denoted by the value '1' and the remaining cells are denoted by '0'. A vector of the same size is used over the whole frame sequence. An example VDV is shown in Fig. 2.

Fig. 2. An example of the VDV.

In the example VDV, four vehicles are present in the frame. Therefore, four '1' values are filled into the VDV and the remaining cells are filled with '0' values. In this work, the CASM method is used to measure the degree of vehicle occurrence under distance constraints using the VDV. To find an abnormal vehicle, the vehicle interactions need to be analysed, since a small interaction gap may create problems such as injuries and accidents. The interaction is calculated using a distance measure D over the vehicles' locations, as given in Eq. 10.

$$D(q_c,q_{c+1})=\sqrt{\left(x_c-x_{c+1}\right)^{2}+\left(y_c-y_{c+1}\right)^{2}}$$
(10)

where D is the distance between the vehicles, and (xc, yc) and (xc+1, yc+1) are the locations of qc and qc+1, respectively. The distance between two vehicles is expected to remain at the η level, and the CASM method uses this same η value. Based on the distance value, the Vehicle Occurrence (VO) is computed, as given in Eq. 11.

$$VO=\sum\limits_{f=1}^{n}\left(V_f\right)^{2}\quad\text{if } D>\eta$$
(11)

In the CASM methodology, an AVI can be detected within a frame when the spatial distance between consecutive vehicle positions violates the minimal threshold η. A vehicle whose spacing does not satisfy the threshold η is treated as 'isolated' and denoted by a '0' in the one-dimensional vector Vf. This vector helps model the dynamic pattern of vehicle presence and spatial behavior. A decrease in VO in the current frame reflects abrupt spatial dislocation of vehicles, which may indicate an AVI.
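A sketch of the VO computation is given below in Python, following the convention of Eq. 11 that only consecutive vehicle pairs whose spacing exceeds η contribute to VO; the fixed vector length n and the handling of the last vehicle are illustrative assumptions.

```python
import numpy as np

def vehicle_occurrence(centroids, eta, n):
    """Build the VDV and compute VO (Eqs. 10-11) for one key frame.

    `centroids` holds the (x, y) locations of the detected vehicles
    Q_c, `eta` is the distance threshold, and `n` is the fixed VDV
    length used across the frame sequence (n >= len(centroids))."""
    vdv = np.zeros(n, dtype=int)
    for c in range(len(centroids) - 1):
        x, y = centroids[c]
        nx, ny = centroids[c + 1]
        d = np.hypot(x - nx, y - ny)   # Eq. 10: Euclidean distance
        if d > eta:                    # adequately spaced pair counts
            vdv[c] = 1                 # too-close ("isolated") entries stay 0
    return int(np.sum(vdv ** 2))       # Eq. 11
```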

To avoid overgeneralization, we limit the scope of AVI detection to the first instances of abnormal vehicle interactions, such as sudden deceleration, collision risk, or disorganized vehicle flow. The AVI frames are treated as spatial inconsistencies, as illustrated in Fig. 3.

Fig. 3. Example image of abnormal vehicle identification.

Experimental results

The proposed KFEAVI work is implemented in MATLAB using a car dataset. The Car Accident Detection Dataset (CADP)38 consists of several car video clips, namely Video 1 (V1), Video 2 (V2), Video 3 (V3), etc. For the experiments, this work considers five car video clips. All video clips were captured on various roads using surveillance cameras, and each clip has a duration of 3 to 15 min. For experimental purposes, an average of 300 frames was taken from each video clip. To estimate the performance of the proposed KFEAVI work, several existing algorithms are compared with the proposed method.

Performance evaluation

To evaluate the performance of the proposed KFEAVI work, precision, recall, and the F-score are used, as defined in Eqs. 12, 13, and 14. True Positive (TP), False Positive (FP), and False Negative (FN) are defined as follows. Regarding the ground truth annotation, the AVI frames were manually labelled based on visual assessment of abnormal interactions. Specifically, a frame is labelled as a ground truth AVI if a vehicle that was present in the previous frame disappears in the current frame. Only the first AVI frame is labelled, to avoid over-counting prolonged AVIs. A True Positive (TP) is when the proposed KFEAVI algorithm detects an AVI frame that matches a ground truth AVI. A False Positive (FP) occurs when KFEAVI detects an AVI not present in the ground truth. A False Negative (FN) occurs when a ground truth AVI is not detected by KFEAVI.

$$F\text{-}score=2 \times \frac{Precision \times Recall}{Precision+Recall}$$
(12)
$$Precision=\frac{TP}{TP+FP}$$
(13)
$$Recall=\frac{TP}{TP+FN}$$
(14)
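For completeness, a direct Python transcription of Eqs. 12-14 (helper name illustrative):

```python
def evaluation_scores(tp, fp, fn):
    """Precision, recall, and F-score (Eqs. 12-14) from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score
```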
Table 1 Experimental results on KFEAVI.
Table 2 Performance evaluation on KFEAVI.

Table 1 reports the numbers of TP, FP, and FN observed during the experiments. Based on this analysis, precision, recall, and F-score were estimated using Eqs. 12, 13, and 14; the results are shown in Table 2. The proposed KFEAVI work achieved the best results for all video clips. The results reveal that video clip V1 achieved a lower result than video clips V2, V3, V4, and V5: overlapping vehicles appear more frequently in V1, which slightly increases the FP rate. A snapshot of the AVI results is shown in Fig. 4.

Fig. 4. Experimental results on KFEAVI.

Comparative analysis

To evaluate the performance of the proposed KFEAVI, the following existing vehicle identification algorithms, viz., PC1, PC2, PC3, SVM, PC1 + SVM + GSP(OR), Automated Vehicle Monitoring System (AVMS), Intelligent Video Analytics Model (IVAM), Leveraging Convolutional Neural Networks (LCNN), and YOLOv8, are compared with the proposed KFEAVI work using three video clips (Tables 3 and 4).

Table 3 Comparative analysis for abnormal vehicles interaction using F-score Measure.
Table 4 Comparative analysis with deep learning using F-score Measure.

Ablation study

A stepwise ablation study is carried out to estimate the effectiveness of the proposed KFEAVI method. The study starts with the original frame sequence as the baseline and then integrates the two sub-modules: KFEAVI using the original frame sequence and KFEAVI using keyframes. For fair validation, samples of the same size are used throughout the ablation study. The results are tabulated in Table 5.

Table 5 Ablation study on KFEAVI.

As observed in Table 5, the proposed KFEAVI incorporating both CASM and beta distribution based keyframe extraction obtains the best results, and each module makes a reasonable contribution to the AVI task. Excluding CASM decreased precision, recall, and F-score by 21%, 26%, and 23%, respectively. Excluding both the keyframe extraction module and CASM decreased precision, recall, and F-score by 37%, 33%, and 35%, respectively.

Discussion

Tables 3 and 4 demonstrate that the proposed KFEAVI method outperforms existing algorithms in terms of abnormal vehicle interaction (AVI) detection. This improvement is primarily due to KFEAVI’s ability to eliminate duplicate frames by effectively identifying keyframes based on content variation. Through this strategy, frames containing meaningful vehicle activity are isolated, enabling efficient statistical feature extraction and increasing the number of true positive (TP) detections across video clips.

In the AVI module, the Constrained Angular Second Moment (CASM) method is employed to assess the degree of vehicle occurrence under distance constraints. This approach facilitates more accurate detection of vehicle count and position within the frame. The combined use of CASM and spatial distance analysis strengthens KFEAVI's capability to identify abnormal interactions between vehicles.

Despite its strong performance, the proposed method still encounters false positives (FP) and false negatives (FN), especially in scenarios involving overlapping vehicles, where spatial analysis alone may not suffice. The ablation study confirms that KFEAVI achieves significantly higher accuracy when using selected keyframes as opposed to the full frame sequence, which contains substantial redundant information.

Limitation and future work

One key limitation of the current framework is the absence of temporal modeling, which restricts its ability to detect speed-related anomalies such as sudden stops or overspeeding. These behaviors often manifest over a sequence of frames and are challenging to capture using spatial analysis alone. Incorporating temporal features, such as vehicle displacement and motion patterns, would allow the system to handle complex scenarios like overlapping vehicles more effectively. Therefore, integrating temporal analysis into the AVI module will be a crucial direction for future research to enhance performance in diverse and dynamic surveillance environments.

Conclusions

This paper introduced the KFEAVI work, which extracts key frames for identifying abnormal vehicles. The KFEAVI uses a statistical feature extraction technique and a constrained angular second moment technique. It handles both gradual and abrupt content changes well while extracting key frames, and the constrained angular second moment method is applied to identify abnormal vehicle movement. To evaluate the performance of KFEAVI, several algorithms were compared with the proposed method. The experimental results reveal that the proposed KFEAVI achieved better results in terms of F-score. In the case of overlapping vehicles in the input frames, KFEAVI gives lower accuracy since the technique does not consider temporal information. Therefore, KFEAVI can be extended to find abnormal events with temporal analysis.