Table 7 Comparison of Faster-RCNN<sup><a data-track="click" data-track-action="reference anchor" data-track-label="link" data-test="citation-ref" aria-label="Reference 19" title="Ren, S. et al. Faster r-cnn: towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 28, 859 (2015)." href="/articles/s41598-024-81201-8#ref-CR19" id="ref-link-section-d599720680e6764">19</a></sup> and Cascade-RCNN<sup><a data-track="click" data-track-action="reference anchor" data-track-label="link" data-test="citation-ref" aria-label="Reference 62" title="Cai, Z. & Vasconcelos, N. Cascade r-cnn: Delving into high quality object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 6154–6162 (2018)." href="/articles/s41598-024-81201-8#ref-CR62" id="ref-link-section-d599720680e6768">62</a></sup> after using the normal model instead of the backbone part. Average precision (AP) is reported at different IoU thresholds. Experiments were conducted on the VisDrone 2019 dataset.