Table 1 Overview of image processing pipeline, model specifications, training details, and demographic analysis. Chi-squared tests were used to assess differences in sex distribution across datasets, and one-way ANOVA was used to evaluate age differences.
| | Image classifier | Vertebra detection | Landmark detection |
|---|---|---|---|
Model | Vision transformer (ViT-B/16) | YOLOv8x | Sequential CNN (6-stage) |
Training inputs | X-ray image, X-ray class | X-ray image, vertebra bounding boxes | X-ray image, Gaussian heatmaps |
Prediction outputs | X-ray class confidence | Vertebra boxes & confidence | Landmark confidence maps |
Image size | 320 × 320 pixels | 1280 × 1280 pixels | 640 × 640 pixels (pelvis); 224 × 224 pixels (vertebrae) |
Batch size | 24 | 10 | 10 (pelvis); 45 (vertebrae) |
Trainable parameters | ~ 86 M | ~ 11 M | ~ 31 M |
Training hardware | 4 × Tesla K80 | 2 × Tesla V100 | 2 × Tesla V100 |
Total images | 52,772 | 9,875 | 25,249 |
Training images | 36,939 | 6,912 | 17,674 |
Validation images | 10,554 | 1,975 | 5,051 |
Test images | 5,279 | 988 | 2,524 |
Imaging centers | 391 | 275 | 384 |
% Preoperative imaging | 99.34% | 95.55% | 99.29% |
Sex (% female) | Train: 52.9%; Validation: 53.7%; Test: 52.9% (p = 0.689) | Train: 54.3%; Validation: 52.9%; Test: 54.1% (p = 0.804) | Train: 53.3%; Validation: 53.2%; Test: 51.2% (p = 0.274) |
Age in years, mean ± SD (range) | Train: 63.9 ± 11.6 (14–94); Validation: 63.8 ± 11.6 (14–95); Test: 64.0 ± 11.5 (14–94) (p = 0.677) | Train: 64.3 ± 11.6 (18–95); Validation: 63.8 ± 11.5 (26–92); Test: 64.8 ± 11.9 (29–92) (p = 0.206) | Train: 65.2 ± 11.6 (15–96); Validation: 65.4 ± 11.8 (15–95); Test: 65.9 ± 11.3 (15–94) (p = 0.055) |
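
The image counts in Table 1 imply a data split that the table does not state explicitly. The short sketch below recomputes the training, validation, and test shares from the reported counts and checks that each partition sums to the reported total. The names `COUNTS` and `split_fractions` are illustrative and not taken from the study; only the numbers come from the table.

```python
# Image counts copied from Table 1: (total, training, validation, test).
COUNTS = {
    "Image classifier": (52_772, 36_939, 10_554, 5_279),
    "Vertebra detection": (9_875, 6_912, 1_975, 988),
    "Landmark detection": (25_249, 17_674, 5_051, 2_524),
}


def split_fractions(total, train, val, test):
    """Return the train/validation/test shares of the total.

    Raises ValueError if the three partitions do not add up to the total.
    (Hypothetical helper written for this illustration.)
    """
    if train + val + test != total:
        raise ValueError("partition sizes do not add up to the total")
    return tuple(n / total for n in (train, val, test))


for model, counts in COUNTS.items():
    train_f, val_f, test_f = split_fractions(*counts)
    print(f"{model}: train {train_f:.1%}, validation {val_f:.1%}, test {test_f:.1%}")
```

For all three models the shares round to 70%, 20%, and 10%, which is consistent with a roughly 70/20/10 train/validation/test split; how the split was drawn (for example per image or per patient) cannot be inferred from the counts alone.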