Table 1 Comparison of spinal CT models on the MultiSpine benchmark

Method	Segmentation: Dice (%)	Identification: ID Acc (%)	Lesion detection: Precision (%)	Lesion detection: Recall (%)	Lesion detection: AP (%)
nnU-Net⁷	77.5 ± 0.4	60.7 ± 2.2	62.4 ± 1.5	58.2 ± 1.5	60.6 ± 1.5
UNETR¹⁹	88.1 ± 0.3	79.5 ± 1.0	65.8 ± 1.2	64.3 ± 1.2	65.8 ± 1.2
TransBTS⁴⁶	88.3 ± 0.3	80.1 ± 1.0	66.4 ± 1.2	64.8 ± 1.2	66.3 ± 1.2
H2Former²²	88.6 ± 0.3	81.3 ± 0.9	66.9 ± 1.2	65.7 ± 1.2	66.9 ± 1.2
Scribformer²³	88.7 ± 0.3	82.4 ± 0.9	67.4 ± 1.1	68.2 ± 1.1	67.8 ± 1.1
BLDS⁴⁷	84.4 ± 0.4	70.5 ± 1.6	60.8 ± 1.7	55.3 ± 1.7	57.4 ± 1.7
Dense-U-Net⁶	82.2 ± 0.4	75.6 ± 1.3	58.7 ± 1.9	52.9 ± 1.9	55.4 ± 1.9
VerFormer⁴⁸	83.9 ± 0.3	82.4 ± 0.9	66.3 ± 1.1	67.4 ± 1.1	67.2 ± 1.1
Tao et al.⁴⁹	84.5 ± 0.3	83.3 ± 0.8	65.9 ± 1.2	66.8 ± 1.2	66.2 ± 1.2
VerteFormer⁵⁰	86.0 ± 0.3	84.2 ± 0.7	67.9 ± 1.1	67.4 ± 1.1	67.7 ± 1.1
VertDetect⁵¹	80.5 ± 0.6	80.6 ± 1.0	58.6 ± 1.9	52.3 ± 1.9	55.2 ± 1.9
Spineclue⁵²	80.3 ± 0.6	80.4 ± 1.0	60.4 ± 1.7	55.6 ± 1.7	57.3 ± 1.7
Ortho2D⁵³	78.4 ± 0.7	78.5 ± 1.1	55.4 ± 2.0	50.7 ± 2.0	52.6 ± 2.0
VertNet⁵⁴	82.3 ± 0.5	82.6 ± 0.8	60.5 ± 1.6	58.2 ± 1.6	59.5 ± 1.6
VertebraFormer (ours)	89.3 ± 0.2	85.6 ± 0.6	71.1 ± 1.0	69.0 ± 0.9	68.7 ± 1.1

For each metric, we report the mean ± the 95% bias-corrected and accelerated (BCa) bootstrap confidence interval, computed via patient-level resampling (10,000 replicates).
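As an illustration of the interval described above, the following is a minimal sketch of a patient-level BCa bootstrap, not the authors' evaluation code. It assumes each patient contributes a list of per-structure scores (for example, per-vertebra Dice) and that the statistic is the pooled mean. The names `pooled_mean` and `bca_interval` are hypothetical. Whole patients are resampled with replacement, so all scores from one patient stay together in every replicate.

```python
import random
from statistics import NormalDist


def pooled_mean(patients):
    """Mean over every score of every patient in the (re)sample."""
    scores = [s for patient in patients for s in patient]
    return sum(scores) / len(scores)


def bca_interval(patients, statistic=pooled_mean, n_replicates=10_000,
                 confidence=0.95, seed=0):
    """Return (estimate, lower, upper) using a patient-level BCa bootstrap.

    `patients` is a list with one entry per patient; each entry is that
    patient's list of scores (an assumed data layout for this sketch).
    """
    rng = random.Random(seed)
    n = len(patients)
    estimate = statistic(patients)

    # Resample whole patients with replacement, never individual scores.
    replicates = sorted(
        statistic(rng.choices(patients, k=n)) for _ in range(n_replicates)
    )

    # Bias correction z0: where the point estimate falls among the replicates.
    # The fraction is clamped so the inverse normal CDF stays finite.
    norm = NormalDist()
    below = sum(r < estimate for r in replicates)
    frac = min(max(below / n_replicates, 1 / (n_replicates + 1)),
               n_replicates / (n_replicates + 1))
    z0 = norm.inv_cdf(frac)

    # Acceleration from leave-one-patient-out (jackknife) estimates.
    jack = [statistic(patients[:i] + patients[i + 1:]) for i in range(n)]
    jack_mean = sum(jack) / n
    num = sum((jack_mean - j) ** 3 for j in jack)
    den = 6.0 * sum((jack_mean - j) ** 2 for j in jack) ** 1.5
    accel = num / den if den > 0 else 0.0

    def adjusted_index(q):
        # Map a nominal quantile to its BCa-adjusted position in `replicates`.
        z = norm.inv_cdf(q)
        p = norm.cdf(z0 + (z0 + z) / (1.0 - accel * (z0 + z)))
        return min(n_replicates - 1, max(0, int(p * n_replicates)))

    alpha = (1.0 - confidence) / 2.0
    lower = replicates[adjusted_index(alpha)]
    upper = replicates[adjusted_index(1.0 - alpha)]
    return estimate, lower, upper
```

Calling `bca_interval(patients)` with one list of scores per patient returns the point estimate and the two interval endpoints. BCa endpoints are generally not symmetric about the estimate, so condensing them into a single ± value requires a further convention that this sketch does not assume.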