Figure 1
From: Detecting common coccinellids found in sorghum using deep learning models

Generic architecture for object detection approaches. A modern object detection network consists of three main components: (1) a backbone network that extracts features from a given input image; (2) a neck that collects and combines features from different layers; and (3) a head that detects and classifies objects of interest. One-stage detectors use a dense prediction head to address the detection (bounding box regression) and classification tasks simultaneously, while two-stage detectors decouple the two tasks and use a sparse prediction head to classify previously identified regions of interest (RoIs).
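
The data flow described in this caption can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the models used in the article: feature maps are small 2-D lists, "feature extraction" is 2x2 average pooling, and every function name (`backbone`, `neck`, `dense_head`, `propose_rois`, `sparse_head`) and the score threshold are hypothetical. It only shows how a dense head predicts at every location while a sparse head classifies a short list of RoIs.

```python
from typing import List, Tuple

Grid = List[List[float]]          # toy stand-in for a 2-D feature map
Box = Tuple[int, int, int, int]   # (x0, y0, x1, y1) in grid cells


def downsample(grid: Grid) -> Grid:
    """Halve the resolution with 2x2 average pooling (toy feature extraction)."""
    h, w = len(grid) // 2, len(grid[0]) // 2
    return [[(grid[2 * i][2 * j] + grid[2 * i][2 * j + 1]
              + grid[2 * i + 1][2 * j] + grid[2 * i + 1][2 * j + 1]) / 4
             for j in range(w)] for i in range(h)]


def backbone(image: Grid) -> List[Grid]:
    """Return feature maps from two successive layers (fine, then coarse)."""
    fine = downsample(image)
    coarse = downsample(fine)
    return [fine, coarse]


def neck(features: List[Grid]) -> Grid:
    """Combine layers: upsample the coarse map and add it to the fine map."""
    fine, coarse = features
    return [[fine[i][j] + coarse[i // 2][j // 2] for j in range(len(fine[0]))]
            for i in range(len(fine))]


def classify(value: float) -> str:
    """Hypothetical classifier: a fixed threshold on the fused feature value."""
    return "object" if value > 1.0 else "background"


def dense_head(fused: Grid) -> List[Tuple[Box, str]]:
    """One-stage: emit a box and a class for every grid location at once."""
    return [((j, i, j + 1, i + 1), classify(v))
            for i, row in enumerate(fused) for j, v in enumerate(row)]


def propose_rois(fused: Grid, threshold: float = 1.0) -> List[Box]:
    """Two-stage, step 1: keep only locations likely to contain an object."""
    return [(j, i, j + 1, i + 1)
            for i, row in enumerate(fused) for j, v in enumerate(row)
            if v > threshold]


def sparse_head(fused: Grid, rois: List[Box]) -> List[Tuple[Box, str]]:
    """Two-stage, step 2: classify only the previously identified RoIs."""
    return [(box, classify(fused[box[1]][box[0]])) for box in rois]


# An 8x8 toy image with a bright 4x4 patch in the top-left corner.
image = [[1.0 if i < 4 and j < 4 else 0.0 for j in range(8)] for i in range(8)]
fused = neck(backbone(image))
dense = dense_head(fused)                        # one prediction per location
sparse = sparse_head(fused, propose_rois(fused)) # one prediction per RoI
print(len(dense), len(sparse))
```

In this sketch the dense head produces a prediction for all 16 cells of the fused map, most of them background, whereas the sparse head only sees the 4 proposed regions. Real detectors replace each toy function with learned convolutional layers, but the division of labor between backbone, neck, and head is the same.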