Table 10 Unified model vs. equivalent three-model pipeline (matched latency)

From: Structure-aware multi-task learning with domain generalization for robust vertebrae analysis in spinal CT

| Method | Segmentation Dice | ID accuracy | Lesion AP | Latency on A6000 (ms) | Peak memory (GB) |
| --- | --- | --- | --- | --- | --- |
| VertebraFormer (All-in-one) | 89.3% (±0.2) | 85.6% (±0.6) | 68.7% (±1.1) | ~72 | ~13.9 |
| 3-Model Pipeline | 86.5% (±0.4) | 81.0% (±0.8) | 60.5% (±1.2) | ~75 | ~15.8 |

  1. VertebraFormer is compared with a modular pipeline (UNETR for segmentation, a ResNet classifier for ID, and a YOLO-based lesion detector). The pipeline's model sizes and resolution were reduced so that its total runtime on an RTX A6000 roughly matches VertebraFormer's ~72 ms. All metrics are reported on the MultiSpine test set, with 95% confidence intervals given in parentheses. Peak memory is the maximum GPU memory usage. The unified model achieves higher accuracy on all three tasks.
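The comparison in this table can be checked with a few lines of arithmetic. The sketch below is illustrative only: the numbers are transcribed from the table, the dictionary keys and function names are hypothetical, and it assumes the ± values are half-widths of the 95% confidence intervals, which the note implies but does not state explicitly.

```python
# Values transcribed from Table 10 as (mean %, +/- value).
# Assumption: the +/- value is the half-width of the 95% confidence interval.
unified = {"dice": (89.3, 0.2), "id_acc": (85.6, 0.6), "lesion_ap": (68.7, 1.1)}
pipeline = {"dice": (86.5, 0.4), "id_acc": (81.0, 0.8), "lesion_ap": (60.5, 1.2)}


def gain(metric):
    """Absolute gain of the unified model, in percentage points."""
    return round(unified[metric][0] - pipeline[metric][0], 1)


def intervals_overlap(metric):
    """True if the two reported intervals share at least one value."""
    (mean_u, half_u), (mean_p, half_p) = unified[metric], pipeline[metric]
    return (mean_u - half_u) <= (mean_p + half_p) and (mean_p - half_p) <= (mean_u + half_u)


gains = {m: gain(m) for m in unified}
overlaps = {m: intervals_overlap(m) for m in unified}

# Approximate resource differences (unified minus pipeline), from the "~" values.
latency_diff_ms = 72 - 75
memory_diff_gb = round(13.9 - 15.8, 1)

print(gains)
print(overlaps)
print(latency_diff_ms, memory_diff_gb)
```

Under these assumptions, the unified model gains 2.8, 4.6, and 8.2 percentage points on Dice, ID accuracy, and lesion AP respectively, and none of the paired intervals overlap. It also runs about 3 ms faster and uses about 1.9 GB less peak memory. Non-overlapping intervals are only a rough indication of a reliable difference, not a formal significance test.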