Table 3 Comparison of methods [MB: megabytes; M: millions; S: seconds per image (A6000 GPU time)].

From: Regressive vision transformer for dog cardiomegaly assessment

| Method | Depth | Size (MB) | Parameters (M) | Training time (S) | Inference time (S) |
|---|---|---|---|---|---|
| GoogleNet [24] | 22 | 27 | 5.631 | 0.0643 | 0.0433 |
| VGG16 [25] | 16 | 528 | 134.310 | 0.0857 | 0.0500 |
| ResNet50 [26] | 50 | 96 | 23.004 | 0.0700 | 0.0450 |
| DenseNet201 [27] | 201 | 77 | 20.037 | 0.0986 | 0.0550 |
| Inceptionv3 [28] | 48 | 89 | 24.377 | 0.8038 | 0.0513 |
| Xception [29] | 71 | 85 | 37.916 | 0.0957 | 0.0517 |
| InceptionResnetV2 [30] | 164 | 209 | 54.325 | 0.1029 | 0.0533 |
| NasnetLarge [31] | 533 | 332 | 84.769 | 1.7736 | 0.4317 |
| EfficientNetB7 [32] | 438 | 256 | 63.818 | 2.7429 | 0.5083 |
| Vision transformer [17] | 225 | 327.366 | 85.817 | 0.1271 | 0.0817 |
| CONVT [33] | 208 | 327.226 | 85.780 | 0.0950 | 0.0667 |
| Beit_large [34] | 369 | 1354.662 | 304.662 | 7.7307 | 2.2033 |
| RVT | 340 | 74.965 | 19.626 | 4.3943 | 0.9200 |

  1. The inference time covers both the validation and test datasets. The parameter counts are the trainable parameters on our dataset, so they differ from the parameter counts of the original models.
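
The table does not say how the Size (MB) column was measured. As a rough consistency check only, the sketch below assumes that each trainable parameter is stored as a 32-bit float (4 bytes) and that MB means 2^20 bytes; both are assumptions, not statements from the paper. The helper name `float32_size_mib` is hypothetical. Under these assumptions the estimate comes close to the listed sizes for the Vision transformer, CONVT, and RVT rows; other rows need not follow the same rule.

```python
def float32_size_mib(params_millions: float) -> float:
    """Estimate weight storage in MiB, assuming 4-byte float32 parameters."""
    bytes_per_param = 4  # assumption: weights stored as float32
    return params_millions * 1_000_000 * bytes_per_param / (1024 ** 2)


# Parameter counts (in millions) copied from the table above.
params = {"Vision transformer": 85.817, "CONVT": 85.780, "RVT": 19.626}

# Estimated size per method, rounded to two decimals for comparison
# with the Size (MB) column.
estimates = {name: round(float32_size_mib(p), 2) for name, p in params.items()}

for name, size in estimates.items():
    print(f"{name}: ~{size} MiB")
```

Any remaining gap between an estimate and the listed size could come from rounding of the parameter count or from stored values that are not trainable parameters; the table alone does not settle this.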