Table 4. Comparison with state-of-the-art methods on the SAIOD dataset.

From: A hybrid ResNet50-vision transformer model with an attention mechanism for aerial image classification

Method                                    Accuracy (%)

Pretrained models
  AlexNet [12, 27]                        85.92
  SqueezeNet [12, 28]                     88.52
  GoogleNet [12, 29]                      89.40
  ResNet-50 [12, 20]                      86.40

Transformer-based models
  ViT [12, 22]                            90.00
  Swin-transformer [12, 30]               90.40
  SwinSight Net (Pradhan et al. [12])     93.16

YOLO-based models
  Pradhan et al. [13]                     95.33

Proposed models
  Proposed model with cross attention     95.52
  Proposed model based on MHA             95.80

Note: Significant values are in bold.