Introduction

One of the significant building blocks of any smart city is the intelligent transportation system (ITS). Indeed, information and communication technologies (ICT) can benefit road infrastructure1. Technology continues to evolve even as advanced ITS solutions are deployed2. Testing of these technologies on public roads has already started in several nations, and serious efforts continue to mandate and regulate these near-future technologies3. Various new applications will be enabled as the penetration of interconnected and autonomous vehicles in traffic surges. Unmanned aerial vehicles (UAVs), or drones, were long utilized by the military for many services. In recent times4, there has been a drastic increase in the use of UAVs in other sectors, such as delivery of goods and services, precision agriculture, and security and surveillance. Automation of the entire transportation system cannot be attained through vehicle automation alone5. Indeed, other elements of the road and end-to-end transportation system, namely rescue teams, support teams, road surveys, and traffic police, should also be automated. These elements can be automated with reliable and smart UAVs6. Because the transportation system is complex, detecting vehicles in drone images is significant: it can help locate vehicles stuck in disaster zones or rugged terrain and support traffic and parking-lot management7.

Vehicular detection from drone images complements on-road vehicular detection and is advantageous for driver assistance models8. Moreover, vehicle detection is the initial step in several traffic surveillance tasks. With the current rise of deep learning (DL) and artificial intelligence (AI), there has been an increasing trend toward using convolutional neural networks (CNNs) to extract information from image and video streams9. However, although CNNs are regarded as a promising method for semantic segmentation, classification, and detection, aerial images have various peculiarities that distinguish them from traditional kinds of images10. For instance, objects are viewed from multiple viewpoints and altitudes; therefore, a single class has several representation patterns to be learned11. In addition, various classes may share similar appearances, particularly at higher altitudes. ITSs are substantial for developing smart cities, offering transformative merits for managing road infrastructure and traffic flow12. With the swift enhancement of technology, there is a growing requirement for integrating novel techniques such as UAV imagery and deep reinforcement learning for vehicle recognition13. As urban areas become more complex, efficient and adaptive ITS solutions are crucial to optimize traffic management, improve safety, and increase overall transportation effectiveness14. The continuous evolution of these technologies drives the requirement for rigorous testing and refinement to confirm their practical applicability and regulatory compliance in real-world settings15.

Contribution of the study

This study presents a flying foxes optimization with deep learning-based vehicle detection and classification model on aerial images (FFODL-VDCAI) technique for ITS applications. The main objective of the FFODL-VDCAI technique is to automatically and accurately classify vehicles that exist in aerial images. Three primary processes are involved in the presented FFODL-VDCAI technique. Initially, the FFODL-VDCAI approach utilizes YOLO-GD (GhostNet and Depthwise convolution) for vehicle detection, where YOLO-GD uses a lightweight GhostNet in place of the backbone network of YOLO-v4 and replaces conventional convolutions with depthwise separable and pointwise convolutions. Next, the FFO technique is used to tune the hyperparameters of the GhostNet model. Finally, a deep Q-network (DQN)-based reinforcement learning technique is used to classify detected vehicles effectively. A comprehensive simulation analysis of the FFODL-VDCAI methodology is conducted on a UAV image dataset. The significant contributions of the FFODL-VDCAI methodology are listed below:

  • The FFODL-VDCAI technique presents a vehicle detection technique utilizing YOLO integrated with GD for improved accuracy and effectiveness in UAV imagery. The model optimizes vehicle detection by using advanced DL methods. The contribution lies in combining YOLO with GD to refine the detection capabilities.

  • The FFO approach is employed to fine-tune the hyperparameters of the vehicle detection method, significantly enhancing its performance and adaptability. Utilizing FFO improves the precision and effectiveness of the technique in vehicle detection. The key contribution is the application of FFO for optimizing hyperparameters, which enhances the overall efficiency of the method.

  • The FFODL-VDCAI model incorporates the DQN approach to classify recognized vehicles, implementing reinforcement learning to improve the classification accuracy. By employing DQN, the model dynamically enhances its capability to discriminate between diverse kinds of vehicles, paving the way to more precise and reliable classification.

  • The FFODL-VDCAI approach combines YOLO for vehicle detection, FFO for hyperparameter tuning, and DQN for classification into a unified framework, integrating cutting-edge models to improve performance across all phases. The novelty lies in its seamless incorporation of these advanced methodologies, optimizing every phase (detection, tuning, and classification) to significantly improve vehicle detection and classification from UAV imagery.

Related works

Zhao et al.16 present a lightweight detection methodology that depends on an enhanced form of YOLO-v5. A coordinate attention mechanism is introduced to improve the network's feature extraction and its detection and recognition capability. Non-maximum suppression is utilized to mitigate false detections and omissions when identifying congested targets. In17, the authors introduced a new vehicle detection method termed PVIDNet, a traffic control method related to the Brazilian Traffic Code, and a lightweight proposal technique for the PVIDNet method leveraging an activation function to decrease the execution time of the presented method. Jagannathan et al.18 devised a novel approach to classify vehicle types. A Gaussian mixture model and adaptive histogram equalization (AHE) are applied to enhance the quality of the gathered vehicle images and to discover vehicles in the denoised images. Afterwards, the Weber Local Descriptor and the Steerable Pyramid Transform extract feature vectors from the identified vehicles. Lastly, the extracted features are presented as input for vehicle classification. Pustokhina et al.19 modelled a DL-based vehicle license plate recognition (VLPR) method named the OKM-CNN approach. This technique has three stages: license plate (LP) detection and localization, performed through Connected Component Analysis (CCA) and Improved Bernsen Algorithm (IBA) methods; LP segmentation, using the OKM clustering approach; and LP number recognition, utilizing the CNN method. Han et al.20 designed a vehicle-detection approach with a CNN-based object detector. The authors devised the DRFBNet300 technique, which includes a Deeper Receptive Field Block (DRFB) element that enriches the feature maps to find smaller objects in drone images. Finally, a Split Image Processing (SIP) technique is used to enhance accuracy.

In21, the focus is on developing an automated vehicle detection approach for drone images. Firstly, vehicle datasets for target recognition are built. Afterwards, a novel YOLO-v3 vehicle recognition configuration is devised according to the characteristics of drone imagery: the targeted vehicles are dense and small. Lastly, the presented structure is tested with three datasets: VEDAI, CAR, and COWC. Moshayedi et al.22 introduced a new low-altitude vehicle speed detector system utilizing drones for RS. To this aim, the authors first determined the optimal field of view of the Raspberry Pi camera in outdoor and indoor situations by altering its angle and height; then, two MobileNet-SSD DL model variants were embedded in the Raspberry Pi 4B processor and tested with physical cars at various speeds. At last, the authors applied the system in a real environment by changing the angle and height. Wang23 devises a vehicle image detection approach utilizing DL on drone video. First, HSV colour-space brightness translation operations are executed on the new samples to increase sample diversity and adaptability to various lighting conditions. Lyu et al.24 propose an improved Faster R-CNN method for recognizing small deer in thermal images, utilizing a Feature Pyramid Network (FPN) and various residual networks (ResNet18 to ResNet152) to enhance the accuracy of feature extraction and detection. Ewers et al.25 aim to improve drone search missions in wilderness areas using deep reinforcement learning, optimizing flight paths based on a probability distribution map to crucially enhance search effectiveness and efficiency compared with conventional methodologies. In26, a novel reinforcement learning-controlled Grey Wolf Optimization-Archimedes Optimization Algorithm (QGA) model is evaluated on 22 benchmark functions and applied to determine optimal, collision-free UAV flight paths in a 3D environment. Kumar et al.27 evaluate a UAV-based spraying system in cotton fields by employing imaging methods such as a Laser Droplet Analyzer and ImageJ, optimize it with response surface methodology, and utilize a hybrid GWO-ANN technique for predictive evaluation of deposition. Makrigiorgis et al.28 present the AirCam-RTM framework, which integrates road segmentation and vehicle detection.

Existing drone-based vehicle detection and classification techniques encounter various challenges, including restricted scalability across diverse environments and high computational complexity. Models such as the improved YOLO-v5, PVIDNet, and DRFBNet300 may face limitations with real-world discrepancies in traffic scenarios, image quality, and computational needs. OKM-CNN and deep reinforcement learning models also struggle with license plate discrepancies, image quality, and dependence on probability maps. Moreover, UAV-based spraying systems may have problems adapting to various field conditions and ensuring precise deposition. These limitations underscore the requirement for more versatile and efficient solutions capable of handling several environmental conditions, image qualities, and lighting scenarios.

The proposed model

This paper introduces an automated FFODL-VDCAI methodology for the vehicle detection and classification process in the ITS platform. The proposed FFODL-VDCAI methodology involves three main procedures: YOLO-GD-based vehicle detection, FFO-based hyperparameter tuning, and DQN-based vehicle classification. Figure 1 illustrates the workflow of the FFODL-VDCAI methodology.

Fig. 1
figure 1

Workflow of FFODL-VDCAI approach.

Stage I: YOLO-GD based vehicle detection

Primarily, the FFODL-VDCAI technique uses the YOLO-GD model for vehicle detection purposes. The network mainly includes feature extraction, result prediction, and feature fusion29. GhostNet and depthwise convolution are selected in the YOLO-GD model for their efficiency and performance merits. GhostNet's design lessens computational overhead by producing more feature maps with fewer parameters, improving both speed and accuracy. Depthwise convolution, utilized in place of standard convolutions, further mitigates computational complexity by applying filters separately to every input channel, thus reducing the number of operations and parameters. This integration allows YOLO-GD to attain high detection accuracy while maintaining a lightweight and efficient architecture crucial for real-time vehicle detection tasks. Figure 2 illustrates the structure of the YOLO-GD model.

Fig. 2
figure 2

Structure of YOLO-GD model.

YOLO-GD adopts a lightweight feature extraction model in which depthwise and pointwise convolutional layers replace the standard convolution function, efficiently minimizing the computational overhead. GhostNet replaces the CSPDarknet53 backbone in the feature extraction phase. GhostNet aims to provide feature maps cheaply: a primary convolution produces intrinsic feature maps, and cheap linear transformations of these intrinsic maps then produce the additional ghost feature maps, extracting the crucial information in the new features at the lowest overhead.
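As a minimal, hedged illustration (not the authors' implementation), the following PyTorch sketch shows how a standard convolution can be replaced by the depthwise-plus-pointwise pair described above; channel sizes and activation choices are assumptions:

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise + pointwise pair standing in for a standard k x k convolution."""
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1):
        super().__init__()
        # Depthwise: one filter per input channel (groups=in_ch).
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size, stride,
                                   padding=kernel_size // 2, groups=in_ch, bias=False)
        # Pointwise: 1 x 1 convolution that mixes channels.
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))
```

For a 3 × 3 kernel, this reduces the multiply count per spatial position from roughly in_ch · out_ch · 9 to in_ch · 9 + in_ch · out_ch, which is the overhead reduction the text refers to.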

The GhostNet backbone is mainly built from \(G_{-}\)bottleneck blocks, where \(s\) shows the stride size and "×" denotes how many times a block is repeated. Each \(G_{-}\)bottleneck primarily consists of two Ghost modules. If \(stride = 1\), the first Ghost module is exploited as an expansion layer to increase the channel count, and the second reduces the number of channels to match the shortcut path. The input and output of the Ghost modules are interconnected by a shortcut. ReLU non-linearity and batch normalization (\(BN\)) are utilized after the first layer, whereas only BN is used after the second layer. When \(stride = 2\), the shortcut path exploits a depthwise convolution with \(stride = 2\) for down-sampling and a pointwise convolution layer for channel adjustment.
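A simplified sketch of the Ghost module and G-bottleneck follows, assuming the standard GhostNet formulation (half the output channels from a primary convolution, half from a cheap depthwise "ghost" transform); the layer sizes and ratios are illustrative, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    """Half the output channels come from a primary 1 x 1 convolution, half
    from a cheap depthwise 'ghost' transform of those intrinsic maps."""
    def __init__(self, in_ch, out_ch, relu=True):
        super().__init__()
        init_ch = out_ch // 2
        act = nn.ReLU(inplace=True) if relu else nn.Identity()
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, init_ch, 1, bias=False),
            nn.BatchNorm2d(init_ch), act)
        self.cheap = nn.Sequential(
            nn.Conv2d(init_ch, init_ch, 3, padding=1, groups=init_ch, bias=False),
            nn.BatchNorm2d(init_ch), act)

    def forward(self, x):
        intrinsic = self.primary(x)
        return torch.cat([intrinsic, self.cheap(intrinsic)], dim=1)

class GhostBottleneck(nn.Module):
    """stride=1: two Ghost modules with an identity shortcut.
    stride=2: depthwise down-sampling in between; the shortcut then uses a
    stride-2 depthwise conv plus a pointwise conv for channel adjustment."""
    def __init__(self, in_ch, mid_ch, out_ch, stride=1):
        super().__init__()
        self.expand = GhostModule(in_ch, mid_ch)                 # ReLU + BN
        self.down = (nn.Conv2d(mid_ch, mid_ch, 3, stride=2, padding=1,
                               groups=mid_ch, bias=False)
                     if stride == 2 else nn.Identity())
        self.project = GhostModule(mid_ch, out_ch, relu=False)  # BN only
        if stride == 1 and in_ch == out_ch:
            self.shortcut = nn.Identity()
        else:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, in_ch, 3, stride=stride, padding=1,
                          groups=in_ch, bias=False),
                nn.BatchNorm2d(in_ch),
                nn.Conv2d(in_ch, out_ch, 1, bias=False),
                nn.BatchNorm2d(out_ch))

    def forward(self, x):
        return self.project(self.down(self.expand(x))) + self.shortcut(x)
```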

During the feature fusion and result prediction phases, spatial pyramid pooling \(\left( {SPP} \right)\) is inserted at the network output to enlarge the receptive field of the network and extract spatial features of dissimilar dimensions. \(SPP\) increases the robustness of the method to object variability and spatial layout:

$$ SPP = C\left( {f^{5 \times 5} MaxPool\left( F \right) + f^{9 \times 9} MaxPool\left( F \right) + f^{13 \times 13} MaxPool\left( F \right) + F} \right). $$
(1)

where \(F\) denotes the feature map, \(C\) symbolizes the concatenation function, \(f^{5 \times 5}\) means \(5 \times 5\) filters, and \(MaxPool\) represents the max-pooling function. The Path Aggregation Network (PANet) integrates features among the three resultant network layers, capturing geometric data from the bottom of the network and contour data from the top of the network. The YOLO head predicts confidence, class, and coordinate data simultaneously by setting the filter count of the convolution function. The \(3 \times 3\) convolutional layers are exchanged with \(1 \times 1\) pointwise and \(3 \times 3\) depthwise separable convolution layers to minimize the overhead.
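The SPP operation in Eq. (1) can be sketched directly, assuming stride-1 max pooling with "same" padding so that all four tensors keep the spatial size of \(F\):

```python
import torch
import torch.nn as nn

class SPP(nn.Module):
    """Eq. (1): max-pool F at 5x5, 9x9 and 13x13 (stride 1, 'same' padding)
    and concatenate the three results with F along the channel axis."""
    def __init__(self, pool_sizes=(5, 9, 13)):
        super().__init__()
        self.pools = nn.ModuleList(
            nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)
            for k in pool_sizes)

    def forward(self, f):
        return torch.cat([f] + [pool(f) for pool in self.pools], dim=1)

# A 512-channel map becomes 4 * 512 = 2048 channels; spatial size is unchanged.
x = torch.randn(1, 512, 13, 13)
print(SPP()(x).shape)  # torch.Size([1, 2048, 13, 13])
```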

Utilizing its advanced architecture, this technique integrates various mechanisms for addressing false alarms. The use of a lightweight feature extraction network, namely GhostNet, together with depthwise separable convolutions, assists in mitigating computational overhead and enhancing feature extraction accuracy. GhostNet's ghost feature maps and the efficient employment of G-bottleneck layers assist in extracting significant features with minimal redundancy, thereby enhancing detection precision. Furthermore, incorporating Spatial Pyramid Pooling (SPP) improves the capacity of the approach to handle objects at several scales and spatial layouts, which assists in discriminating between true objects and background noise. The PANet additionally refines feature integration and spatial data processing across diverse layers, paving the way to more precise object localization and classification. By optimizing these procedures and reducing computational redundancies, YOLO-GD efficiently mitigates the likelihood of false alarms in vehicle recognition.

Stage II: hyperparameter tuning using the FFO model

The FFO methodology is exploited to fine-tune the parameter values of the GhostNet model. The flying fox is the largest bat species30. The FFO method is appropriate for hyperparameter tuning due to its capability to effectively explore and exploit the search space, paving the way to optimal settings for complex techniques. FFO replicates the foraging behaviour of flying foxes, employing local and global search strategies to avert local minima and improve convergence. Its adaptability and robustness make it efficient for tuning hyperparameters in high-dimensional spaces, where conventional techniques may face computational complexity and convergence issues. Compared with other approaches, such as grid search or random search, FFO presents a more systematic and intelligent technique for hyperparameter optimization, resulting in enhanced model performance and reduced computational costs. Its balance between exploration and exploitation ensures that the hyperparameter tuning procedure is comprehensive and effective, making it a valuable tool for improving the accuracy and robustness of ML techniques. Figure 3 specifies the architecture of the FFO technique.

Fig. 3
figure 3

Overall structure of FFO technique.

Flying foxes' movement in space relies on observation of the environment, since they cannot echolocate. They return to their habitat tree after feeding at night. Flying foxes seek the coolest tree to rest on to protect themselves from rising morning heatwaves. Often, those that are first to claim a tree with an adequate temperature can be suffocated by other members crowding around them and die.

The FFO technique starts with an arbitrary set of initial positions for the flying foxes. Each position is represented by the vector \(x = \left( {x_{1} , \ldots ,x_{m} } \right)\), which has \(m\) dimensions. Then, the objective function assesses the solution at that position. Consequently, the flying foxes search for the coolest tree to ensure survival in intense heat.

Since flying foxes look for the nearest cool tree or follow each other's paths, it is assumed that once the habitat tree cannot provide a sufficiently low temperature, a flying fox moves to a different tree to escape the high temperature:

$$ x_{i,j}^{t + 1} = x_{i,j}^{t} + a \cdot rand \cdot \left( {cool_{j} - x_{i,j}^{t} } \right) $$
(2)

where \(x_{i}^{0} \sim U\left( {x_{{{\text{min}}}} ,x_{{\text{max }}} } \right)\), \(x_{i,j}^{t}\) denotes the \(j\)-th component of the \(i\)-th flying fox at iteration \(t\), \(a\) represents a constant value, \(rand \sim U\left( {0,1} \right)\), and \(cool_{j}\) is the \(j\)-th component of the position of the FF located in the tree with the lowest temperature. Equation (2) is applied if \(\left| {f\left( {cool} \right) - f\left( {x_{i} } \right)} \right| > \frac{{\delta_{1} }}{2}\), where \(cool\) denotes the position vector of the flying fox located in the coolest place, i.e., the best solution, and the \(\delta_{1}\) parameter equals the longest possible distance at which two flying foxes are considered near one another. Once a flying fox approaches the tree with minimum temperature \(\left( {\left| {f\left( {cool} \right) - f\left( {x_{i} } \right)} \right| \le \frac{{\delta_{1} }}{2}} \right)\), it searches the nearby space to prevent suffocation:

$$ nx_{i,j}^{t + 1} = x_{i,j}^{t} + rand_{1,j} \cdot \left( {cool_{j} - x_{i,j}^{t} } \right) + rand_{2,j} \cdot \left( {x_{{R_{1} j}}^{t} - x_{{R_{2} j}}^{t} } \right) $$
(3)
$$ x_{i,j}^{t + 1} = \left\{ {\begin{array}{*{20}l} {nx_{i,j}^{t + 1} ,} \hfill & {if\;{ }j = k{ }\;or\;{ }rnd_{j} \ge pa} \hfill \\ {x_{i,j}^{t} ,} \hfill & {otherwise} \hfill \\ \end{array} } \right. $$
(4)

where \(rand \sim U\left( {0,1} \right)\), \(rnd_{j}\) denotes a random number within \([0,1]\), \(x_{{R_{1} }}^{t}\) and \(x_{{R_{2} }}^{t}\) denote two arbitrary members of the existing population, and \(pa\) is a probability constant. Finally, \(k\) is randomly selected in \(\left\{ {1,2, \ldots ,m} \right\}\) and guarantees that a minimum of one component of \(nx_{i,j}^{t + 1}\) is carried into \(x_{i,j}^{t + 1}\), ensuring that there is no duplication between the original and the new solutions.

If a flying fox finds a tree with a lower temperature, the new position is accepted as its solution; if not, it returns to its current position.
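A hedged NumPy sketch of the movement rules in Eqs. (2)-(4) follows; the constants a, delta1, and pa are illustrative values, and f stands for the objective function:

```python
import numpy as np

rng = np.random.default_rng(0)

def ffo_move(x, cool, f, pop, a=0.5, delta1=1.0, pa=0.85):
    """One movement step for a single flying fox x (Eqs. 2-4)."""
    m = x.size
    if abs(f(cool) - f(x)) > delta1 / 2:
        # Eq. (2): far from the coolest tree, move toward it.
        return x + a * rng.random(m) * (cool - x)
    # Eqs. (3)-(4): near the coolest tree, search nearby to avoid suffocation.
    r1, r2 = pop[rng.choice(len(pop), size=2, replace=False)]
    nx = x + rng.random(m) * (cool - x) + rng.random(m) * (r1 - r2)
    keep_new = rng.random(m) >= pa        # rnd_j >= pa in Eq. (4)
    keep_new[rng.integers(m)] = True      # j == k: at least one component moves
    return np.where(keep_new, nx, x)
```

The candidate returned here is then accepted only if it improves the objective, per the paragraph above.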

Several causes result in the deaths of flying foxes. For example, a flying fox might end up in a very remote area with high temperatures while seeking the coolest tree. In that case, it is replaced: a replacement list \(\left( {RL} \right)\) holds the \(NL\) best unique solutions found so far. An arbitrary integer \(n \in \left[ {2,NL} \right]\) is generated, and the position of the new flying fox is given as follows:

$$ x_{i,j}^{t + 1} = \frac{{\Sigma_{k = 1}^{n} RL_{k,j}^{t} }}{n} $$
(5)

In Eq. (5), \(RL_{k}^{t}\) denotes the \(k\text{th}\) FF in \(RL\) at iteration \(t\). Equation (5) increases the probability of identifying a suitable area.

Also, they might die from being suffocated by other members of the population:

$$ pD = \frac{nc - 1}{{population\; size}} $$
(6)

In Eq. (6), \(nc\) denotes the count of FFs whose objective function value is close to the optimum result. Dead flying foxes are also replaced through genetic crossover, which mates two flying foxes. The first step involves arbitrarily choosing two parents from the population, ensuring that they are distinct:

$$ offspring1 = L \cdot R_{1} + \left( {1 - L} \right) \cdot R_{2} $$
(7)
$$ offspring2 = L \cdot R_{2} + \left( {1 - L} \right) \cdot R_{1} $$

where \(R_{1}\) and \(R_{2}\) denote distinct, arbitrarily chosen population members, and \(L\) is a random value generated in \([0,1]\).
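The replacement mechanisms in Eqs. (5) and (7) can be sketched as follows (a hedged sketch: RL and pop are arrays of position vectors, and the choice between the two replacement routes is left to the surrounding algorithm):

```python
import numpy as np

rng = np.random.default_rng(1)

def replace_from_list(RL):
    """Eq. (5): average n randomly chosen replacement-list entries, n in [2, NL]."""
    n = rng.integers(2, len(RL) + 1)
    rows = RL[rng.choice(len(RL), size=n, replace=False)]
    return rows.mean(axis=0)

def crossover(pop):
    """Eq. (7): blend two distinct random parents with a random weight L."""
    i, j = rng.choice(len(pop), size=2, replace=False)
    L = rng.random()
    offspring1 = L * pop[i] + (1 - L) * pop[j]
    offspring2 = L * pop[j] + (1 - L) * pop[i]
    return offspring1, offspring2
```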

The FFO technique uses a fitness function to achieve better classification results. It takes a positive value, with smaller values indicating better efficacy of a candidate solution. Here, the fitness function is the classification error rate to be minimized, as presented in Eq. (8).

$$ \begin{aligned} fitness\left( {x_{i} } \right) & = Classifier\;Error\;Rate\left( {x_{i} } \right) \\ & = \frac{no. of\; misclassified\; instances}{{Total\; no. of\; instances}} \times 100 \\ \end{aligned} $$
(8)
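In the hyperparameter-tuning setting, the fitness of Eq. (8) is simply the validation error of the detector trained with a candidate's hyperparameters. A minimal sketch follows, in which train_and_evaluate is a hypothetical helper (not from the paper) that trains the GhostNet-based detector with the given settings and returns validation predictions:

```python
import numpy as np

def fitness(candidate, val_labels):
    """Eq. (8): classification error rate (%) of a hyperparameter candidate."""
    lr, momentum = candidate                               # illustrative hyperparameters
    preds = train_and_evaluate(lr=lr, momentum=momentum)   # hypothetical helper
    misclassified = np.sum(preds != val_labels)
    return 100.0 * misclassified / len(val_labels)         # smaller is better
```

FFO then minimizes this function over the candidate positions, with each flying fox encoding one hyperparameter configuration.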

Stage III: DQN-based vehicle classification

Finally, the DQN technique is used to classify detected vehicles. A DQN is an end-to-end RL agent that exploits a deep neural network (DNN) to map the connection between states and actions, corresponding to the Q-table (QT) in Q-learning (QL)31. The DQN approach is selected for classification tasks because it can learn and make decisions directly from high-dimensional input data utilizing DL models. Unlike conventional classification approaches, DQN implements reinforcement learning to optimize policies through experience replay and target networks, which enhances learning stability and performance. Its capability for handling complex environments and dynamically adapting to new data makes it specifically efficient for tasks needing complex classification strategies. Furthermore, the robustness of the DQN model to noisy data and its capacity to learn from past interactions give it a crucial edge over conventional classifiers.

The QL technique produces a table for computing the expected future rewards for every state and action; in particular, the rows indicate states and the columns denote actions. The DQN agent exploits a CNN to recognize the local spatial correlations present in successive game frames. One severe insufficiency of traditional QL agents lies in the QT they exploit to map the relationship between actions and states: QL agents with enormous state and action spaces suffer from the curse of dimensionality. Hence, the DQN exploits a deep CNN to approximate the optimal Q-function (QF) instead of using a QT.

Experience replay eliminates the correlations among observations and smooths the training series by randomizing the data, thereby modifying the data distribution. Iterative updating minimizes the correlation between the target and the \({\text{Q}}\) values by periodically updating the Q-values toward the target values. First, experience replay stores the agent's experience at each step of the process to construct a group of memories with a specific number of experiences. The Q-network is then trained by updating the parameters \(\theta_{{\text{i}}}\) at iteration \({\text{i}}\) to minimize the MSE in the Bellman equation. Thus, the loss function \(L_{i} \left( {\theta_{i} } \right)\), which changes at each iteration \(i\), is expressed as follows:

$$ L_{i} = E_{s,a,r} \left[ {\left( {E_{{s^{\prime}}} \left[ {y|s,{ }a} \right] - Q\left( {s,{ }a;\theta_{i} } \right)} \right)^{2} } \right] $$
(9)

Differentiating the loss function with respect to the weights yields:

$$ \nabla_{{\theta_{i} }} L\left( {\theta_{i} } \right) = E_{{s,a,r,s^{\prime}}} \left[ {\left( {r + \gamma \mathop {\max }\limits_{{a^{\prime}}} Q\left( {s^{\prime},{ }a^{\prime};\theta_{i}^{ - } } \right) - Q\left( {s,{ }a;\theta_{i} } \right)} \right)\nabla_{{\theta_{i} }} Q\left( {s,{ }a;\theta_{i} } \right)} \right] $$
(10)

The target optimizer is essential to the definition and differentiation procedure of the QF. Moreover, the familiar QL update can be recovered by updating the weights after every time-step, replacing the expectation with a single sample, and setting \(\theta_{{\text{i}}}^{ - } = \theta_{{{\text{i}} - 1}}\). Fig. 4 represents the architecture of the DQN.

Fig. 4
figure 4

Structure of DQN.

In the DQN training process, two variations of QL are introduced to ensure that the trained DNN model does not diverge32. In particular, besides experience replay, QL employs a separate target network to generate the targets in the QL update task, and these alterations improve the reliability of DQN. This method introduces a delay between the moment a Q-value is updated and the moment the update affects the targets, an effect that reduces the probability of divergence or oscillations in the DNN parameters.
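A compact PyTorch sketch of the update described by Eqs. (9) and (10), assuming a replay buffer that yields (state, action, reward, next state, done) batches; network architecture and hyperparameters are placeholders, not the paper's configuration:

```python
import torch
import torch.nn as nn

def dqn_update(q_net, target_net, optimizer, batch, gamma=0.99):
    """One DQN step: regress Q(s, a; theta) toward r + gamma * max_a' Q(s', a'; theta^-)."""
    s, a, r, s2, done = batch
    with torch.no_grad():                      # targets come from the frozen network
        y = r + gamma * target_net(s2).max(dim=1).values * (1.0 - done)
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(q, y)        # MSE of the Bellman error, Eq. (9)
    optimizer.zero_grad()
    loss.backward()                            # gradient step of Eq. (10)
    optimizer.step()
    return loss.item()

def sync_target(q_net, target_net):
    """Periodically copy theta -> theta^- to stabilize the targets."""
    target_net.load_state_dict(q_net.state_dict())
```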

Results and discussion

This section tests the vehicle detection and classification results of the FFODL-VDCAI technique on the Stanford and PSU datasets33,34. Tables 1 and 2 illustrate the details of the Stanford and PSU datasets. Each scene is captured utilizing a 4K camera mounted on a 3DR Solo quadcopter hovering approximately 80 m above various intersections on a university campus. The videos, processed for distortion and stabilization, have a specified resolution and comprise annotated targets with classes and their trajectories in time and space. Moreover, images were obtained employing a 3DR Solo drone equipped with a GoPro Hero 4 camera in an outdoor environment at a PSU parking lot. Videos recorded by the drone were utilized for extracting frames, which were manually labelled. Images not containing cars were excluded from the dataset, and the training/testing split was performed randomly. Figure 5 shows sample UAV images. Figure 6 demonstrates the original and detected images.

Table 1 Stanford dataset description.
Table 2 PSU dataset description.
Fig. 5
figure 5

Sample UAV images.

Fig. 6
figure 6

Images (a) original, (b) detected.

Table 3 and Fig. 7 examine the average precision (AP) results of the FFODL-VDCAI technique on the Stanford and PSU datasets35. The outcome indicates that the FFODL-VDCAI technique achieves enhanced performance on both databases. For instance, on the Stanford database, the FFODL-VDCAI technique reaches a higher AP of 20.12%, whereas the Faster RCNN, YOLO-v3, and YOLO-v4 models obtain lower APs of 19.30%, 13.50%, and 17.50%, respectively. Besides, on the PSU database, the FFODL-VDCAI technique attains a superior AP of 95.43%, whereas the Faster RCNN, YOLO-v3, and YOLO-v4 approaches obtain lower APs of 71%, 91.90%, and 94.30%, correspondingly.

Table 3 AP analysis of the FFODL-VDCAI approach on two datasets.
Fig. 7
figure 7

AP outcome of FFODL-VDCAI approach on two datasets.

Table 4 and Fig. 8 provide a comparative average recall (AR) examination of the FFODL-VDCAI technique with other methods on the Stanford database. The outcome implies that the YOLOv3 (320 × 320) model performs poorly with the lowest AR values. Along with that, the Faster R-CNN (Inceptionv2), Faster R-CNN (Resnet50), and YOLOv4 (320 × 320) models obtain closer AR values. However, the FFODL-VDCAI technique reaches effectual results with increased \(AR^{max = 1}\), \(AR^{max = 10} ,\) and \(AR^{max = 100}\) values of 17.45%, 20.01%, and 19.95%, respectively.

Table 4 AR outcome of the FFODL-VDCAI approach with other models on the Stanford dataset.
Fig. 8
figure 8

AR analysis of the FFODL-VDCAI approach on the Stanford datasets.

In Table 5 and Fig. 9, a comparative AR examination of the FFODL-VDCAI technique with existing methodologies on the PSU dataset is provided. The performance of the various methodologies is compared based on their AR at different thresholds. Faster R-CNN with Inceptionv2 achieved \(AR^{max = 1}\), \(AR^{max = 10} ,\) and \(AR^{max = 100}\) values of 6.20, 41.50, and 70.80, respectively. Faster R-CNN with ResNet50 had scores of 6.40, 41.50, and 67.20, respectively. YOLOv3 (320 × 320) recorded values of 6.00, 42.20, and 81.00, while YOLOv4 (320 × 320) reached 6.80, 47.10, and 95.50. The FFODL-VDCAI model outperformed the others with higher \(AR^{max = 1}\), \(AR^{max = 10} ,\) and \(AR^{max = 100}\) values of 7.97%, 48.45%, and 97.16%, demonstrating superior performance across all thresholds. These experimental values imply that the YOLOv3 (320 × 320) technique yields the lowest \(AR^{max = 1}\) value, while the Faster R-CNN (Inceptionv2), Faster R-CNN (Resnet50), and YOLOv4 (320 × 320) models obtain closer AR values.

Table 5 AR outcome of FFODL-VDCAI approach with other models on the PSU dataset.
Fig. 9
figure 9

AR analysis of the FFODL-VDCAI approach on the PSU datasets.

Table 6 and Fig. 10 examine the average IoU (AIoU) outcomes of the FFODL-VDCAI technique on the Stanford and PSU datasets. The results show that the FFODL-VDCAI technique achieves enhanced performance on both databases. For instance, on the Stanford database, the FFODL-VDCAI methodology achieves a superior AIoU of 92.03%, whereas the Faster RCNN, YOLOv3, and YOLOv4 approaches obtain lower AIoUs of 48.80%, 82.50%, and 90.40%, correspondingly. Moreover, on the PSU dataset, the FFODL-VDCAI approach attains a higher AIoU of 96.15%, whereas the Faster RCNN, YOLO-v3, and YOLO-v4 models reach lower AIoUs of 95.50%, 92.80%, and 91.30%, correspondingly.

Table 6 AIoU outcome of FFODL-VDCAI approach with other models on two datasets.
Fig. 10
figure 10

AIoU analysis of FFODL-VDCAI approach on two datasets.

In Table 7 and Fig. 11, a comparative AP analysis of the FFODL-VDCAI approach with existing models on the PSU dataset is provided. The experimental values indicate that the Faster R-CNN (Resnet50) approach obtains the weakest performance with the lowest AP values. At the same time, the Faster R-CNN (Inceptionv2), YOLOv3 (320 × 320), and YOLOv4 (320 × 320) models obtain closer AP values. However, the FFODL-VDCAI methodology reaches effectual results with maximal small, medium, and large values of 0.98, 0.74, and 0.82, respectively.

Table 7 AP outcome of the FFODL-VDCAI approach with other models on the PSU dataset.
Fig. 11
figure 11

AP outcome of the FFODL-VDCAI approach on the PSU dataset.

Table 8 and Fig. 12 provide a comparative AP outcome of the FFODL-VDCAI approach with existing models on the Stanford dataset. The experimental values imply that the Faster R-CNN (Resnet50) method reaches minimal performance with the lowest AP values. It is followed by the Faster R-CNN (Inception-v2), YOLOv3 (320 × 320), and YOLOv4 (320 × 320) models, which obtain closer AP values. However, the FFODL-VDCAI technique gains effectual outcomes with increased small, medium, and large values of 0.10, 0.15, and 0.69.

Table 8 AP outcome of the FFODL-VDCAI approach with other models on the Stanford dataset.
Fig. 12
figure 12

AP analysis of FFODL-VDCAI approach on Stanford dataset.

These results confirm that the FFODL-VDCAI technique achieves better performance than existing models in vehicle detection and classification.

Conclusion

This article presents an automated FFODL-VDCAI technique for the vehicle detection and classification process in the ITS environment. The presented FFODL-VDCAI technique involves three main procedures: YOLO-GD-based vehicle detection, FFO-based hyperparameter tuning, and DQN-based vehicle classification. Here, YOLO-GD uses a lightweight GhostNet in place of the backbone network of YOLOv4 and replaces conventional convolutions with depthwise separable and pointwise convolutions. Next, the FFO technique is exploited to tune the hyperparameters of the GhostNet model. Finally, the DQN method is used to classify the identified vehicles effectively. An extensive simulation analysis is performed on UAV image datasets to validate the enhanced vehicle classification outcomes of the FFODL-VDCAI technique. The comprehensive validation of the FFODL-VDCAI methodology exhibited superior AIoU values of 96.15% and 92.03% on the PSU and Stanford datasets, respectively.

The FFODL-VDCAI model, while advancing vehicle detection and classification, encounters multiple limitations. The YOLO-GD-based detection may face difficulty with accuracy in highly congested or occluded scenarios. Computational demands and the sensitivity of the optimization to initial conditions could limit the FFO-based hyperparameter tuning. Furthermore, the DQN-based classification may suffer from long training times and extensive labelled-data requirements. Another limitation of the FFODL-VDCAI model is its potential difficulty scaling to massive datasets with many vehicle types, which may impact its robustness and generalization. Future studies should concentrate on improving detection accuracy in challenging environments, optimizing the hyperparameter tuning procedure for efficiency, and enhancing the generalization abilities of the classification method. Exploring hybrid methods and transfer learning methodologies could also be beneficial in addressing these limitations. Future work should also develop scalable solutions for various vehicle types and environmental conditions. Moreover, integrating real-time processing capacities and exploring the incorporation of multi-modal data sources could improve the technique's performance and applicability in practical scenarios. Additionally, vehicle classification effectiveness can be improved by utilizing ensemble DL methods to combine diverse classifiers, thereby enhancing accuracy and robustness through several learning perspectives.