Table 7 Effect of different deep learning models on the evaluation criteria values for UAV images.

From: Remote sensing image description based on word embedding and end-to-end deep learning

| Model | Features | Fusion P | Fusion R | Fusion F | Word P | Word R | Word F |
|---|---|---|---|---|---|---|---|
| CNN [25] | C | 0.8614 | 0.8391 | 0.8501 | 0.8749 | 0.8432 | 0.8588 |
| Bi-LSTM [28] | B-L | 0.8597 | 0.8324 | 0.8509 | 0.8802 | 0.8466 | 0.8631 |
| Attention [29] | A | 0.8564 | 0.8312 | 0.8436 | 0.8713 | 0.8422 | 0.8565 |
| Dense Net [31] | D-N | 0.8658 | 0.8414 | 0.8534 | 0.8679 | 0.8435 | 0.8606 |
| Attention-CNN-Bi-LSTM | A-C-B | 0.8732 | 0.8419 | 0.8574 | 0.8766 | 0.8421 | 0.8592 |
| Attention-CNN-IndRNN | A-C-I | 0.8751 | 0.8441 | 0.8593 | 0.8708 | 0.8411 | 0.8557 |
| CNN_LSTM | C-L | 0.8649 | 0.8307 | 0.8475 | 0.8692 | 0.8411 | 0.8549 |
| CNN-Bi-LSTM | C-B | 0.8677 | 0.8324 | 0.8497 | 0.8701 | 0.8420 | 0.8558 |
| CNN-IndRNN | C-I | 0.8607 | 0.8369 | 0.8486 | 0.8655 | 0.8382 | 0.8516 |
| End-to-end | E2E | 0.8837 | 0.8563 | 0.8698 | 0.9234 | 0.8911 | 0.9069 |
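
The P, R and F values in Table 7 are presumably precision, recall and F-measure; the table itself does not define them, so this reading is an assumption. Under the further assumption that F is the balanced F1 score (the harmonic mean of P and R), the sketch below shows how an F value could be recomputed from a row's P and R. The function name `f_measure` and the `beta` parameter are illustrative and do not come from the source.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta=1 gives the harmonic mean of P and R.

    Assumption: the F column in Table 7 is the balanced F1 score.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        # Both inputs are zero; define the score as 0 to avoid division by zero.
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# P and R taken from the End-to-end (E2E) row of Table 7.
e2e_fusion_f = f_measure(0.8837, 0.8563)
e2e_word_f = f_measure(0.9234, 0.8911)
print(f"E2E fusion F: {e2e_fusion_f:.4f}")
print(f"E2E word F:   {e2e_word_f:.4f}")
```

For the End-to-end row, the recomputed values agree with the tabulated F values to within rounding in the fourth decimal place. Not every row necessarily reproduces this way, so the F1 reading should be treated as a working assumption rather than the authors' stated definition.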