Fig. 2: VespAI model architecture and functionality.
From: VespAI: a deep learning-based system for the detection of invasive hornets

a Illustration of the motion detection and video pre-filtering process performed by ViBe (ref. 50). This ensures that the system remains passive until motion is detected and that only ‘hornet-sized’ objects, determined from a known reference range for each species (Fig. S1), are extracted from videos and passed on to the detection algorithm. b Diagram detailing the algorithm for hornet detection, classification, and confidence assignment. This model is built on YOLOv5s architecture, utilising a ResNet-50 (ref. 53) backbone with a PANet (ref. 71) neck, and applies a single F-CNN to the whole image to rapidly detect and classify hornets. To optimise performance, the algorithm downscales images to a resolution of 640 × 640 pixels and applies letterboxing during detection. Class predictions and detection confidence values between 0 and 1 are then attached to an associated bounding box that is projected back onto the original image, as detailed in the diagram. c Examples of successful detections in a range of common scenarios, including target saturation and overlap, class co-occurrence, and the presence of non-target insects. Dashed boxes denote the discrete modules: ViBe motion detection and background subtraction, YOLOv5s object detection and classification, and example outputs when these processes are combined.
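
The letterboxing and back-projection steps described in panel b can be illustrated with a minimal sketch. It assumes a 1920 × 1080 source frame and centred padding; the function names `letterbox_params` and `box_to_original` are hypothetical and are not taken from VespAI or YOLOv5, whose own implementation may differ in detail (for example, in how padding is rounded).

```python
def letterbox_params(width, height, target=640):
    """Return (scale, pad_x, pad_y) for fitting a width x height frame
    into a target x target square while preserving the aspect ratio."""
    scale = min(target / width, target / height)
    new_w, new_h = round(width * scale), round(height * scale)
    # Remaining space is split evenly between the two sides (assumption).
    pad_x = (target - new_w) / 2
    pad_y = (target - new_h) / 2
    return scale, pad_x, pad_y


def box_to_original(box, scale, pad_x, pad_y, width, height):
    """Map an (x1, y1, x2, y2) box from letterboxed coordinates back
    to the original frame, clamping to the frame boundaries."""
    x1, y1, x2, y2 = box
    def clamp(value, upper):
        return min(max(value, 0.0), float(upper))
    return (
        clamp((x1 - pad_x) / scale, width),
        clamp((y1 - pad_y) / scale, height),
        clamp((x2 - pad_x) / scale, width),
        clamp((y2 - pad_y) / scale, height),
    )


# Assumed example: a 1920 x 1080 frame and one detection in 640 x 640 space.
scale, pad_x, pad_y = letterbox_params(1920, 1080)
original_box = box_to_original((320, 320, 480, 400), scale, pad_x, pad_y, 1920, 1080)
```

Here the frame is scaled by one third to 640 × 360 and padded by 140 pixels above and below, so removing the padding and dividing by the scale recovers coordinates in the original frame.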