Table 7. Five-fold cross-validation results of the best-performing model for each method.

From: Development of approach to an automated acquisition of static street view images using transformer architecture for analysis of Building characteristics

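| Task | Metric | Method | Fold 1 | Fold 2 | Fold 3 | Fold 4 | Fold 5 | Mean |
|---|---|---|---|---|---|---|---|---|
| Whole building façade | F1 score | Swin Transformer | 90.22 | 90.09 | 90.22 | 90.22 | 90.00 | 90.15 |
| Whole building façade | F1 score | ViT | 88.93 | 88.79 | 88.93 | 88.93 | 88.69 | 88.85 |
| Whole building façade | F1 score | PVT | 88.35 | 88.20 | 88.35 | 88.35 | 88.09 | 88.27 |
| Whole building façade | F1 score | MobileViT | 87.43 | 87.28 | 87.43 | 87.43 | 87.17 | 87.35 |
| Whole building façade | F1 score | Axial Transformer | 86.96 | 86.80 | 86.96 | 86.96 | 86.69 | 86.87 |
| Whole building façade | Accuracy | Swin Transformer | 91.78 | 91.62 | 91.78 | 91.78 | 91.51 | 91.69 |
| Whole building façade | Accuracy | ViT | 90.58 | 90.39 | 90.58 | 90.58 | 90.26 | 90.48 |
| Whole building façade | Accuracy | PVT | 89.99 | 89.78 | 89.99 | 89.99 | 89.64 | 89.88 |
| Whole building façade | Accuracy | MobileViT | 89.12 | 88.91 | 89.12 | 89.12 | 88.76 | 89.01 |
| Whole building façade | Accuracy | Axial Transformer | 88.69 | 88.48 | 88.69 | 88.69 | 88.34 | 88.58 |
| First story | F1 score | Swin Transformer | 89.74 | 89.59 | 89.74 | 89.74 | 89.49 | 89.66 |
| First story | F1 score | ViT | 88.29 | 88.14 | 88.29 | 88.29 | 88.03 | 88.21 |
| First story | F1 score | PVT | 87.69 | 87.53 | 87.69 | 87.69 | 87.41 | 87.60 |
| First story | F1 score | MobileViT | 86.74 | 86.57 | 86.74 | 86.74 | 86.46 | 86.65 |
| First story | F1 score | Axial Transformer | 86.12 | 85.95 | 86.12 | 86.12 | 85.83 | 86.03 |
| First story | Accuracy | Swin Transformer | 92.29 | 92.14 | 92.29 | 92.29 | 92.04 | 92.21 |
| First story | Accuracy | ViT | 90.92 | 90.77 | 90.92 | 90.92 | 90.66 | 90.84 |
| First story | Accuracy | PVT | 90.56 | 90.40 | 90.56 | 90.56 | 90.28 | 90.47 |
| First story | Accuracy | MobileViT | 89.64 | 89.47 | 89.64 | 89.64 | 89.36 | 89.55 |
| First story | Accuracy | Axial Transformer | 89.26 | 89.09 | 89.26 | 89.26 | 88.97 | 89.17 |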
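
The Mean column appears to be the arithmetic mean of the five fold scores, rounded to two decimals. A minimal sketch of that check for three of the whole-building-façade F1 rows follows; the names `FOLD_SCORES` and `fold_mean` are illustrative assumptions and do not come from the article.

```python
from statistics import mean

# Fold scores copied from the table (whole building façade, F1 score).
# The dictionary name and layout are illustrative, not the authors' code.
FOLD_SCORES = {
    "Swin Transformer": [90.22, 90.09, 90.22, 90.22, 90.00],
    "ViT": [88.93, 88.79, 88.93, 88.93, 88.69],
    "PVT": [88.35, 88.20, 88.35, 88.35, 88.09],
}


def fold_mean(scores):
    """Return the arithmetic mean of the fold scores, rounded to two decimals."""
    return round(mean(scores), 2)


# Print one recomputed mean per method for comparison with the Mean column.
for method, scores in FOLD_SCORES.items():
    print(f"{method}: {fold_mean(scores):.2f}")
```

The same function can be applied to any other row of the table to compare the recomputed value with the reported mean.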