Table 1 Comparison of spinal CT models on the MultiSpine benchmark

From: Structure-aware multi-task learning with domain generalization for robust vertebrae analysis in spinal CT

| Method | Segmentation: Dice (%) | Identification: ID Acc (%) | Lesion detection: Precision (%) | Lesion detection: Recall (%) | Lesion detection: AP (%) |
|---|---|---|---|---|---|
| nnU-Net [7] | 77.5 ± 0.4 | 60.7 ± 2.2 | 62.4 ± 1.5 | 58.2 ± 1.5 | 60.6 ± 1.5 |
| UNETR [19] | 88.1 ± 0.3 | 79.5 ± 1.0 | 65.8 ± 1.2 | 64.3 ± 1.2 | 65.8 ± 1.2 |
| TransBTS [46] | 88.3 ± 0.3 | 80.1 ± 1.0 | 66.4 ± 1.2 | 64.8 ± 1.2 | 66.3 ± 1.2 |
| H2Former [22] | 88.6 ± 0.3 | 81.3 ± 0.9 | 66.9 ± 1.2 | 65.7 ± 1.2 | 66.9 ± 1.2 |
| Scribformer [23] | 88.7 ± 0.3 | 82.4 ± 0.9 | 67.4 ± 1.1 | 68.2 ± 1.1 | 67.8 ± 1.1 |
| BLDS [47] | 84.4 ± 0.4 | 70.5 ± 1.6 | 60.8 ± 1.7 | 55.3 ± 1.7 | 57.4 ± 1.7 |
| Dense-U-Net [6] | 82.2 ± 0.4 | 75.6 ± 1.3 | 58.7 ± 1.9 | 52.9 ± 1.9 | 55.4 ± 1.9 |
| VerFormer [48] | 83.9 ± 0.3 | 82.4 ± 0.9 | 66.3 ± 1.1 | 67.4 ± 1.1 | 67.2 ± 1.1 |
| Tao et al. [49] | 84.5 ± 0.3 | 83.3 ± 0.8 | 65.9 ± 1.2 | 66.8 ± 1.2 | 66.2 ± 1.2 |
| VerteFormer [50] | 86.0 ± 0.3 | 84.2 ± 0.7 | 67.9 ± 1.1 | 67.4 ± 1.1 | 67.7 ± 1.1 |
| VertDetect [51] | 80.5 ± 0.6 | 80.6 ± 1.0 | 58.6 ± 1.9 | 52.3 ± 1.9 | 55.2 ± 1.9 |
| Spineclue [52] | 80.3 ± 0.6 | 80.4 ± 1.0 | 60.4 ± 1.7 | 55.6 ± 1.7 | 57.3 ± 1.7 |
| Ortho2D [53] | 78.4 ± 0.7 | 78.5 ± 1.1 | 55.4 ± 2.0 | 50.7 ± 2.0 | 52.6 ± 2.0 |
| VertNet [54] | 82.3 ± 0.5 | 82.6 ± 0.8 | 60.5 ± 1.6 | 58.2 ± 1.6 | 59.5 ± 1.6 |
| VertebraFormer (ours) | 89.3 ± 0.2 | 85.6 ± 0.6 | 71.1 ± 1.0 | 69.0 ± 0.9 | 68.7 ± 1.1 |

  1. For each metric, we report the mean ± the 95% BCa (bias-corrected and accelerated) bootstrap confidence interval, computed via patient-level resampling (10,000 replicates).
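The interval procedure named in the footnote can be illustrated with a minimal sketch. It is not the authors' code: it assumes each metric is summarized as the mean of one score per patient (which may not hold for AP), and the function name `bca_interval`, its parameters, and the sample scores are hypothetical. Resampling list entries with replacement stands in for patient-level resampling.

```python
import random
from statistics import NormalDist


def bca_interval(per_patient, n_boot=10_000, alpha=0.05, seed=0):
    """Return (mean, lower, upper) using a BCa bootstrap interval for the mean.

    per_patient holds one score per patient (an assumption of this sketch),
    so drawing entries with replacement resamples whole patients.
    """
    rng = random.Random(seed)
    n = len(per_patient)
    theta_hat = sum(per_patient) / n

    # Bootstrap distribution of the mean under patient-level resampling.
    boot = sorted(sum(rng.choices(per_patient, k=n)) / n for _ in range(n_boot))

    norm = NormalDist()
    # Bias correction z0: fraction of replicates below the point estimate,
    # clamped so inv_cdf never receives exactly 0 or 1.
    below = sum(b < theta_hat for b in boot)
    p0 = min(max(below / n_boot, 1 / (n_boot + 1)), n_boot / (n_boot + 1))
    z0 = norm.inv_cdf(p0)

    # Acceleration a from leave-one-patient-out (jackknife) means.
    total = sum(per_patient)
    jack = [(total - x) / (n - 1) for x in per_patient]
    jack_mean = sum(jack) / n
    num = sum((jack_mean - j) ** 3 for j in jack)
    den = 6.0 * sum((jack_mean - j) ** 2 for j in jack) ** 1.5
    a = num / den if den > 0 else 0.0

    def adjusted_quantile(q):
        # Map the nominal quantile q to its bias- and skew-adjusted level.
        z = norm.inv_cdf(q)
        adj = norm.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))
        idx = min(max(int(adj * n_boot), 0), n_boot - 1)
        return boot[idx]

    return theta_hat, adjusted_quantile(alpha / 2), adjusted_quantile(1 - alpha / 2)


# Hypothetical per-patient Dice scores, for illustration only.
dice = [0.91, 0.88, 0.93, 0.85, 0.90, 0.87, 0.92, 0.89, 0.86, 0.94, 0.84, 0.90]
m, lo, hi = bca_interval(dice)
print(f"mean = {m:.3f}, 95% BCa CI = [{lo:.3f}, {hi:.3f}]")
```

A BCa interval is generally asymmetric around the mean, so a single ± value as shown in the table presumably summarizes its half-width; the sketch returns both endpoints and leaves that choice open.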