Table 5 Statistics of model performance based on F1 score and accuracy.

From: Development of approach to an automated acquisition of static street view images using transformer architecture for analysis of Building characteristics

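| Task (Metric) | Model | Min | 25% | Median | Mean | 75% | Max | Std |
|---|---|---|---|---|---|---|---|---|
| Whole building façade (F1 score) | Swin Transformer | 87.34 | 88.61 | 89.73 | 89.66 | 90.14 | 90.22 | 0.92 |
| Whole building façade (F1 score) | ViT | 85.67 | 86.81 | 88.03 | 87.95 | 88.56 | 88.93 | 1.01 |
| Whole building façade (F1 score) | PVT | 84.52 | 86.17 | 87.36 | 87.28 | 88.00 | 88.35 | 1.10 |
| Whole building façade (F1 score) | MobileViT | 83.82 | 85.25 | 86.46 | 86.39 | 87.13 | 87.43 | 1.12 |
| Whole building façade (F1 score) | Axial Transformer | 83.12 | 84.78 | 86.04 | 85.97 | 86.63 | 86.96 | 1.15 |
| Whole building façade (Accuracy) | Swin Transformer | 88.27 | 89.43 | 90.54 | 90.49 | 91.47 | 91.78 | 1.17 |
| Whole building façade (Accuracy) | ViT | 86.45 | 88.33 | 89.43 | 89.34 | 90.14 | 90.58 | 1.38 |
| Whole building façade (Accuracy) | PVT | 85.32 | 87.74 | 88.90 | 88.86 | 89.85 | 89.99 | 1.51 |
| Whole building façade (Accuracy) | MobileViT | 84.33 | 86.61 | 87.75 | 87.72 | 88.90 | 89.12 | 1.54 |
| Whole building façade (Accuracy) | Axial Transformer | 83.77 | 86.18 | 87.26 | 87.19 | 88.32 | 88.69 | 1.49 |
| First story (F1 score) | Swin Transformer | 88.12 | 89.24 | 90.37 | 90.29 | 90.97 | 89.74 | 1.08 |
| First story (F1 score) | ViT | 86.84 | 87.97 | 89.09 | 88.96 | 89.71 | 88.29 | 1.12 |
| First story (F1 score) | PVT | 85.92 | 87.08 | 88.20 | 88.07 | 88.82 | 87.69 | 1.18 |
| First story (F1 score) | MobileViT | 85.12 | 86.33 | 87.58 | 87.42 | 88.15 | 86.74 | 1.20 |
| First story (F1 score) | Axial Transformer | 84.47 | 85.72 | 86.87 | 86.74 | 87.52 | 86.12 | 1.22 |
| First story (Accuracy) | Swin Transformer | 89.14 | 90.32 | 91.27 | 91.23 | 92.12 | 92.29 | 1.08 |
| First story (Accuracy) | ViT | 87.77 | 89.13 | 90.25 | 90.19 | 90.97 | 90.92 | 1.12 |
| First story (Accuracy) | PVT | 86.90 | 88.53 | 89.67 | 89.62 | 90.33 | 90.56 | 1.18 |
| First story (Accuracy) | MobileViT | 86.12 | 87.72 | 88.88 | 88.82 | 89.53 | 89.64 | 1.20 |
| First story (Accuracy) | Axial Transformer | 85.45 | 87.03 | 88.18 | 88.13 | 88.87 | 89.26 | 1.22 |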
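
The columns of Table 5 (Min, 25%, Median, Mean, 75%, Max, Std) are standard descriptive statistics over a set of scores. As a rough illustration of how such a row could be produced, the sketch below summarizes one list of scores using Python's standard library. It rests on assumptions not stated in the source: quartiles use linear interpolation (the "inclusive" method) and Std is the sample standard deviation. The function name `summarize` and the example scores are hypothetical placeholders, not values or code from the study.

```python
import statistics


def summarize(scores):
    """Return the Table 5 style summary statistics for one list of scores."""
    # Quartiles with linear interpolation between data points (assumed method).
    q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return {
        "Min": min(scores),
        "25%": q1,
        "Median": median,
        "Mean": statistics.mean(scores),
        "75%": q3,
        "Max": max(scores),
        # Sample standard deviation (n - 1 denominator); an assumption.
        "Std": statistics.stdev(scores),
    }


# Hypothetical placeholder scores, NOT results from the paper.
example_scores = [80.0, 82.0, 84.0, 86.0, 88.0]
summary = summarize(example_scores)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions, Max can never fall below the 75% value for the same set of scores, which makes this kind of summary a simple consistency check when reading the rows.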