Table 10 Unified model vs. an equivalent three-model pipeline (approximately matched latency)

Method	Segmentation Dice	ID accuracy	Lesion AP	Latency on RTX A6000 (ms)	Peak GPU memory (GB)
VertebraFormer (All-in-one)	89.3% (±0.2)	85.6% (±0.6)	68.7% (±1.1)	~72	~13.9
3-Model Pipeline	86.5% (±0.4)	81.0% (±0.8)	60.5% (±1.2)	~75	~15.8

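VertebraFormer is compared with a modular pipeline consisting of UNETR for segmentation, a ResNet classifier for ID, and a YOLO-based lesion detector. Both systems are constrained to a total runtime of roughly 72 ms on an RTX A6000. For the pipeline, this was achieved by reducing the sizes and input resolution of its models, and its measured latency is ~75 ms. All metrics are computed on the MultiSpine test set, and the values in parentheses are 95% confidence intervals. Memory is peak GPU usage. The unified model scores higher on all three tasks (+2.8 points Dice, +4.6 points ID accuracy, +8.2 points lesion AP) while using less peak memory (~13.9 GB vs. ~15.8 GB).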
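The per-task gaps quoted above can be reproduced from the table with a few lines of arithmetic. The sketch below also runs an informal check of whether the two 95% confidence intervals overlap. It assumes that each "±" value is the symmetric half-width of a 95% CI, as the caption indicates. The names `RESULTS` and `compare` are hypothetical and do not come from the paper. Non-overlapping intervals are only a conservative heuristic. Overlapping intervals would not by themselves rule out a significant difference.

```python
# Values copied from Table 10: (mean in %, assumed 95% CI half-width in points)
RESULTS = {
    "Segmentation Dice": {"unified": (89.3, 0.2), "pipeline": (86.5, 0.4)},
    "ID accuracy": {"unified": (85.6, 0.6), "pipeline": (81.0, 0.8)},
    "Lesion AP": {"unified": (68.7, 1.1), "pipeline": (60.5, 1.2)},
}


def compare(unified, pipeline):
    """Return (gap in points, whether the two CIs overlap)."""
    (mu_u, hw_u), (mu_p, hw_p) = unified, pipeline
    gap = round(mu_u - mu_p, 1)
    # Two intervals overlap if the larger lower bound is <= the smaller upper bound.
    overlap = max(mu_u - hw_u, mu_p - hw_p) <= min(mu_u + hw_u, mu_p + hw_p)
    return gap, overlap


if __name__ == "__main__":
    for task, r in RESULTS.items():
        gap, overlap = compare(r["unified"], r["pipeline"])
        print(f"{task}: +{gap} points, CIs overlap: {overlap}")
```

For all three tasks, the unified model's interval lies entirely above the pipeline's. For example, the Dice intervals are [89.1, 89.5] and [86.1, 86.9].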
