Introduction

Idiopathic scoliosis (IS) is a prevalent condition in pediatric orthopedics, manifesting primarily during puberty, characterized as a three-dimensional (3D) spinal deformity for which no specific underlying cause of the malformation can be identified1,2,3. IS affects approximately 2–3% of adolescents globally and not only leads to physical discomfort but also has profound psychological and social implications4,5,6. Early and accurate diagnosis is crucial for determining the appropriate treatment strategy, which can range from conservative bracing to surgical intervention in severe cases7,8.

Accurate measurement of spinal curvature severity, particularly in the sagittal and coronal planes, is essential for guiding treatment strategies9,10. Traditionally, these measurements, which are critical for monitoring disease progression and evaluating treatment effectiveness, have been obtained using radiography. Although X-ray imaging provides detailed images of spinal alignment, its repeated use raises significant health concerns owing to ionizing radiation exposure11,12. The biplanar low-dose X-ray device system significantly reduces radiation doses; however, it still involves ionizing radiation, potentially limiting its use in regular follow-up assessments13.

Magnetic resonance imaging is a radiation-free alternative; however, it is hampered by high costs, limited accessibility, and the challenge of capturing the entire spine in the coronal plane14. Therefore, ultrasound techniques have become popular for scoliosis evaluation, particularly for measuring the coronal and sagittal angles. These techniques eliminate the risk of exposure to ionizing radiation, making them advantageous for adolescents requiring frequent monitoring15,16,17,18,19. Validation studies have shown that ultrasound imaging is a reliable method for measuring spinal curvature in children and adults, with high intra- and inter-rater reliabilities and a strong correlation with the commonly used Cobb method15,16.

Recent advances have included the introduction of 3D ultrasound systems capable of assessing scoliosis in both the coronal and sagittal planes16,19. Research exploring coronal-sagittal curvature in adolescent IS supports the reliability of ultrasound methods17,19,20. These studies validated the accuracy of various coronal ultrasound angles based on anatomical landmarks and found strong correlations with traditional Cobb angles without significant differences in reliability or validity16,17,19,20.

Manual ultrasound measurements for scoliosis assessment have notable limitations in efficiency and data management. They require landmarks to be identified vertebra by vertebra, which prolongs the procedure: each measurement takes approximately 20 min. This time burden accumulates in adolescents who require serial dynamic monitoring. Furthermore, manual ultrasound measurements lack standardized data archiving protocols; original ultrasound images and manual annotations are frequently stored in fragmented formats, compromising data traceability during follow-up evaluations or inter-institutional consultations. Together, these deficiencies in operational efficiency and informatics infrastructure substantially constrain the scalability of manual measurement techniques in clinical practice16,21. Semiautomatic methods have shown promise in overcoming these limitations and in facilitating the integration of ultrasound into scoliosis diagnosis, which may improve orthotic treatment outcomes during spinal development22,23. This study presents a new method for measuring the Cobb angle based on automated 3D ultrasonography, which can reduce the errors associated with scoliosis monitoring.

However, the stability and accuracy of current automated ultrasound systems have not yet been fully validated clinically. Research gaps exist regarding the correlation of their measurements with the gold standard X-ray Cobb angle, inter-operator reproducibility, and reliability in long-term monitoring. In this study, we assessed the accuracy and reliability of ultrasonography for automatically measuring the sagittal and coronal angles in IS. By comparing ultrasonographic results with X-ray measurements, we advocate for the integration of ultrasonography into the standard diagnostic protocol for IS, aiming to reduce the health risks associated with frequent X-ray exposure and enhance the accessibility of premier diagnostic services across diverse healthcare settings.

The study is aimed at assessing the reliability and validity of 3D automatic ultrasound imaging in patients with idiopathic scoliosis aged 8–25 years. This will be achieved by comparing ultrasound-derived measurements, specifically the ultrasound curve angle (UCA) and ultrasound lamina angle (USLA), with conventional radiographic parameters, including the Cobb angle, thoracic kyphosis (TK) and lumbar lordosis (LL).

Methods

Trial design

This cross-sectional study was conducted at our hospital with approval from the ethics committee of the First Affiliated Hospital of Zhejiang Chinese Medical University (Zhejiang Provincial Hospital of Chinese Medicine) (Reference Number: 2024-KS-467-01, 2024.01.22). All procedures were performed in compliance with relevant laws and institutional guidelines, and the study was registered with ClinicalTrials.gov (NCT06510049). All participants or their legal guardians provided written informed consent. We prospectively enrolled 90 participants between February 2024 and June 2024. All participants and/or their legal guardians consented to the publication of identifying information/images in online open-access publications.

Participants

Participants who fulfilled the selection criteria were enrolled at the Scoliosis Treatment Centre of the First Affiliated Hospital of Zhejiang Chinese Medical University. Inclusion criteria: Patients aged 8–25 years with a diagnosis of IS who were deemed able to adequately cooperate with the standardized ultrasound examination procedures. Exclusion criteria: Patients were excluded if they met any of the following criteria: (1) Known syndromic or secondary causes of scoliosis (e.g., Marfan syndrome, history of ankylosing spondylitis, and spinal neurofibromatosis); (2) Significant concurrent medical conditions that could confound spinal assessment or study outcomes (e.g., uncontrolled endocrine disorders, severe cardiopulmonary disease, and active malignancy); (3) Major psychiatric disorders that could impair the ability to understand study procedures or provide reliable feedback (e.g., schizophrenia and severe cognitive impairment); and (4) Any other condition determined by the investigators to potentially interfere significantly with the safe conduct of the study or the accurate interpretation of its results (e.g., severe obesity precluding adequate ultrasound imaging and documented non-cooperation).

Blinding

A single-blind method was employed to minimize bias and ensure objectivity. To reduce measurement error, both the ultrasound operator and the X-ray measurement physician were blinded to the participant’s imaging data. This measure was designed to prevent any influence from previous results and enhance measurement precision and impartiality. The outcome assessors were also blinded.

Intervention

3D ultrasound measurements

The Scolioscan 3D ultrasound imaging system (Model SCN803, Telefield Medical Imaging Ltd, Hong Kong, China), equipped with a linear probe (central frequency, 7.5 MHz; width, 7.5 cm), was used for spine image acquisition via free-hand scanning. The software used for image acquisition and processing was Scolio3D (Version 0.1.12.2, Telefield Medical Imaging Ltd, Hong Kong, China). Patients stood with their arms resting naturally and were supported by 4-pole supporters attached to a chest/hip board to maintain their posture. A separate screen displayed the real-time position of the moving probe relative to the predefined spinal levels (C7, T1, L4, and S1). Patients were instructed to breathe normally during scanning, after which optimal coronal and sagittal volume projection images (VPI) were automatically generated by the system for subsequent assessments (Fig. 1). The automatic 3D ultrasound measurement technique uses a deep-learning model that incorporates a dual-branch network to detect candidate key points and perform vertebral segmentation on both coronal and sagittal ultrasound images. Additionally, two physicians with more than 5 years of experience performed 3D ultrasound measurements.

Fig. 1

Ultrasound measurement.

X-ray image acquisition

For spinal X-ray imaging, the patients stood upright. For anteroposterior radiographs, patients stood against the proprietary full-spine glass support backboard in an anteroposterior stance, with their backs touching the board and hands at their sides (Fig. 2a). For the lateral radiographs, patients were positioned laterally, with their arms held straight in front, shoulder-width apart, palms down, and touching the backboard (Fig. 2b). The radiology staff adjusted the exposure parameters, guided patients to breathe steadily while remaining still for image clarity, and reviewed the images to ensure the visibility of all spinal components and assessment accuracy. One radiology staff member with more than 5 years of experience performed X-ray imaging.

Fig. 2

X-ray image acquisition (a) Anteroposterior radiographs (b) Lateral radiographs.

Acquisition time

Ultrasound and X-ray examinations were performed on the day of each patient’s visit. The interval between each ultrasound measurement was ≤ 3 min.

Angle measurement

Two physicians with more than 5 years of experience assessed coronal and sagittal X-ray images. Discrepancies of ≥ 5° in measurements prompted a review by a third physician. The final result was determined by averaging the measurement from the third physician and the measurement from whichever of the first two physicians was closer to the third physician’s value (Figs. 3a and 4a).
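
The adjudication rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' software: the function name `final_xray_angle` and its arguments are hypothetical, and averaging the two readings when they differ by less than 5° is an assumption, since the text only specifies what happens when the discrepancy is ≥ 5°.

```python
from statistics import mean

# From the text: a discrepancy of >= 5 degrees triggers review by a third physician.
DISCREPANCY_THRESHOLD_DEG = 5.0


def final_xray_angle(reader1, reader2, reader3=None):
    """Return the final X-ray angle in degrees under the adjudication rule."""
    if abs(reader1 - reader2) < DISCREPANCY_THRESHOLD_DEG:
        # Assumption: the text does not state how agreeing readings are
        # combined; the mean of the two is used here for illustration only.
        return mean([reader1, reader2])
    if reader3 is None:
        raise ValueError("a third reading is required when readers differ by >= 5 degrees")
    # Keep whichever of the first two readings is closer to the third reading,
    # then average it with the third reading (as described in the text).
    closer = min((reader1, reader2), key=lambda angle: abs(angle - reader3))
    return mean([closer, reader3])
```

For example, readings of 20° and 28° differ by 8°, so a third reading is needed; if it is 26°, the closer first reading is 28° and the final value is the mean of 28° and 26°.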

Fig. 3

(a) Coronal Cobb angle as measured on an X-ray coronal image (b) Four automated ultrasound coronal VPI, each showcasing the UCA measurements, annotated with yellow numbers, where the yellow-line segment marks the starting point of the UCA measurement, the red-line segments mark the automatically identified thoracic segments, and the blue-line segments delineate the lumbar segments. UCA: ultrasound curve angle, VPI: volume projection image.

Fig. 4

(a) TK/LL angle measured on the sagittal X-ray image (b) Eight sagittal plane images from four ultrasound scans, each scan yielding left and right vertebral lamina images. Yellow numbers indicate the USLA for each image, with the final USLA representing the average of the automatically measured left and right sagittal lamina angles. Red line segments denote the USLA of the thoracic and lumbar vertebrae. LL: lumbar lordosis, TK: thoracic kyphosis, USLA: ultrasound lamina angle, VPI: volume projection image.

The 3D ultrasound imaging system was operated by two trained professionals, each of whom conducted two scans. The scan interval did not exceed 3 min. Ultrasound curve angle (UCA) measurements were taken in the thoracic and lumbar segments of the coronal spine ultrasound images (Fig. 3b), and ultrasound lamina angle (USLA) measurements were taken from the thoracic kyphosis (T5–T12) and lumbar lordosis (L1–L5) segments of the sagittal spine ultrasound images (Fig. 4b). These angles were subsequently compared to those from X-ray images, serving as the clinical “gold standard” to validate the accuracy and reliability of ultrasound imaging.

Outcome

The general patient characteristics collected at baseline included age (years), sex (female or male), weight (kg), height (cm), body mass index (BMI, kg/m²)24, and major curve location (thoracic, thoracolumbar, or lumbar)25.

The primary outcomes of this study focus on evaluating the reliability and validity of 3D automatic ultrasound imaging for measuring coronal and sagittal spinal angles in patients with IS, with conventional X-ray imaging as the reference standard17.

Reliability of 3D automatic ultrasound measurements: The reliability of 3D automatic ultrasound measurements was evaluated through intra-operator and inter-operator consistency. Intra-operator reliability refers to the consistency of repeated UCA and USLA measurements obtained by the same operator using 3D automatic ultrasound. Inter-operator reliability reflects the consistency of UCA and USLA measurements obtained by two different operators using 3D automatic ultrasound.

The validity of 3D automatic ultrasound measurements depends on their correlation and agreement with X-ray measurements. For the coronal plane, this includes comparing the ultrasound curve angle (UCA) with the gold-standard Cobb angle; for the sagittal plane, it involves comparing the ultrasound lamina angle (USLA) with sagittal X-ray measurements.

Subgroups were defined according to sex (male or female), BMI (abnormal or normal weight), and Cobb angle (< 20° or ≥ 20°).

Because normal BMI ranges differ by age and sex in children and adults, we categorized the patients into normal and abnormal weight subgroups according to the BMI standards for Chinese children and adults (Supplementary Table S1). The abnormal weight categories were underweight, overweight, and obese26,27,28,29.

Safety monitoring and adverse events

All expected and unexpected adverse events from this study were recorded and monitored, ensuring a rapid channel for managing emergencies or abnormal sensations requiring clinician intervention.

Data analyses

IBM SPSS Statistics for Windows, version 23.0 (IBM Corp., Armonk, NY, USA) was used for data analysis. Continuous data are described as mean ± standard deviation (x ± s), and categorical data as frequency, rate, or composition ratio. UCA and USLA measurements of the thoracic and lumbar curves were compared using paired t-tests for Operators 1 and 2. Statistical significance was set at P < 0.05 for all tests30.

The intraclass correlation coefficient (ICC) is commonly employed to assess the consistency of results obtained using different measurement methods or observers on the same study participants. In this study, the ICC was used to determine the concordance between the ultrasound angle and X-ray results. ICCs < 0.5, 0.5–0.75, 0.75–0.9, and > 0.90 indicate poor, moderate, good, and excellent reliability, respectively17,31.
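
As an illustration of how an ICC can be computed and mapped onto the categories above, the following Python sketch uses the two-way random-effects, absolute-agreement, single-measurement form, often written ICC(2,1). The choice of this form is an assumption: the text does not state which ICC model was selected in SPSS. The function names are hypothetical, and assigning a value of exactly 0.5 or 0.75 to the higher category is also an assumption, because the stated ranges share those boundaries.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    `ratings` holds one row per participant and one column per operator
    (or per measurement method).
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least 2 participants")
    k = len(ratings[0])
    if k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a rectangular table with at least 2 columns")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: participants, operators, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def classify_icc(icc):
    """Map an ICC onto the reliability categories used in this study."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:   # assumption: exactly 0.5 counts as moderate
        return "moderate"
    if icc <= 0.90:  # the text reserves "excellent" for values > 0.90
        return "good"
    return "excellent"
```

Under these thresholds, an ICC of 0.916 is classified as excellent, 0.808 as good, and 0.656 as moderate.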

The Bland–Altman method provides a visual assessment of the differences between quantitative evaluations by showing a residual-like plot of the difference in the observed pairs of readings against their mean values32. This method was used to compare X-ray and ultrasound angles.
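
The quantities plotted in a Bland–Altman analysis can be sketched as follows. This is a minimal illustration under stated assumptions rather than the analysis pipeline used in the study: the function name `bland_altman` is hypothetical, differences are taken as ultrasound minus X-ray (the direction is not specified in the text), and the 95% limits of agreement use the conventional mean difference ± 1.96 standard deviations. The default `acceptable` value of 5° reflects the allowable error of ± 5° stated in this section.

```python
from statistics import mean, stdev


def bland_altman(xray, ultrasound, acceptable=5.0):
    """Return bias, 95% limits of agreement, and the points to plot."""
    if len(xray) != len(ultrasound) or len(xray) < 2:
        raise ValueError("need two equally sized samples with at least 2 pairs")

    # Assumption: difference = ultrasound - X-ray.
    diffs = [u - x for x, u in zip(xray, ultrasound)]
    # Horizontal axis of the plot: mean of each pair of readings.
    pair_means = [(u + x) / 2 for x, u in zip(xray, ultrasound)]

    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    lower = bias - 1.96 * sd
    upper = bias + 1.96 * sd

    return {
        "bias": bias,
        "lower_loa": lower,
        "upper_loa": upper,
        # True only if both limits of agreement fall inside +/- `acceptable`.
        "within_acceptable": lower >= -acceptable and upper <= acceptable,
        "points": list(zip(pair_means, diffs)),
    }
```

Plotting `points` with horizontal lines at `bias`, `lower_loa`, `upper_loa`, and ± `acceptable` reproduces the layout of a standard Bland–Altman figure.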

Pearson’s correlation was employed to assess the linear relationship between two continuous variables. When a change in one variable is related to a proportional change in the other variable, the two variables have a linear relationship. In this study, this method was also used to compare the X-ray and ultrasound angles. Furthermore, subgroups were defined according to sex (male or female), BMI (abnormal or normal weight), and Cobb angle (< 20° or ≥ 20°). Correlation coefficients (R) and regression equation analyses were performed for these subgroups31.
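
For readers who want to reproduce the correlation and regression quantities, a minimal Python sketch is given below. The function name `pearson_and_regression` is hypothetical, and treating the X-ray angle as the independent variable and the ultrasound angle as the dependent variable is an assumption; the text does not state the direction of the regression.

```python
from math import sqrt
from statistics import mean


def pearson_and_regression(x, y):
    """Return (r, r_squared, slope, intercept) for the line y = slope * x + intercept."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally sized samples with at least 2 pairs")

    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")

    r = sxy / sqrt(sxx * syy)    # Pearson correlation coefficient
    slope = sxy / sxx            # least-squares slope
    intercept = my - slope * mx  # least-squares intercept
    return r, r * r, slope, intercept
```

Applying the same function to each subgroup (for example, patients with Cobb angles < 20° and ≥ 20°) yields the subgroup-specific coefficients and regression equations.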

The clinically allowable error limit was set at ± 5°. Differences within this limit were considered acceptable and indicative of good agreement between the two methods33,34.

Results

Baseline measurements

Of the 90 patients initially enrolled, 10 were excluded for various reasons: three because of inadequate and uneven ultrasound coupler application resulting in suboptimal imaging, two due to blurred radiological images impairing measurement accuracy, and five due to a coronal Cobb angle < 10°. The baseline characteristics of the remaining 80 patients are presented in Table 1.

Table 1 Baseline characteristics.

Reliability

Coronal measurements

Intra-operator reliability: For Operator 1, the ICCs for the thoracic and lumbar segments were 0.944 (95% confidence interval [CI]: 0.904–0.968, P = 0.450) and 0.953 (95% CI: 0.922–0.972, P = 0.397), respectively (Supplementary Table S2). For Operator 2, the ICCs were 0.973 (95% CI: 0.953–0.984, P = 0.861) and 0.920 (95% CI: 0.867–0.952, P = 0.422) for the thoracic and lumbar segments, respectively, that is, higher than Operator 1 in the thoracic segment but lower in the lumbar segment (Supplementary Table S3).

Inter-operator reliability: The ICC values were 0.916 (95% CI: 0.859–0.951) and 0.808 (95% CI: 0.701–0.879) for the thoracic and lumbar segments, respectively. The mean UCAs recorded by Operator 1 were 22.08 ± 10.47 and 20.84 ± 8.77 for the thoracic and lumbar segments, respectively. Operator 2 recorded slightly higher mean UCA values of 23.04 ± 11.32 and 21.67 ± 8.43 for the thoracic and lumbar segments, respectively (P-values: thoracic = 0.122, lumbar = 0.221), indicating high reliability in the coronal plane, especially for the thoracic segments (Supplementary Table S4).

Sagittal measurements

Intra-operator reliability: The ICC values for Operator 1 for the thoracic and lumbar segments were 0.914 (95% CI: 0.866–0.945, P = 0.378) and 0.875 (95% CI: 0.805–0.920, P = 0.234), respectively (Supplementary Table S2). Operator 2 reported ICC values of 0.930 (95% CI: 0.890–0.955, P = 0.829) and 0.919 (95% CI: 0.874–0.948, P = 0.857) for the thoracic and lumbar segments, respectively, indicating high reliability in sagittal plane measurements for both operators, with Operator 2 displaying slightly higher reliability scores (Supplementary Table S3).

Inter-operator reliability: The ICC values were 0.772 (95% CI: 0.634–0.857) and 0.656 (95% CI: 0.505–0.767) for the thoracic and lumbar segments, respectively. The mean USLA for Operator 1 was 26.52 ± 9.26 in the thoracic segment and 15.88 ± 7.90 in the lumbar segment, compared to Operator 2’s measurements of 28.95 ± 9.64 and 14.00 ± 8.54 in the thoracic and lumbar segments, respectively (P-values: thoracic < 0.001, lumbar = 0.014) (Supplementary Table S4).

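The paired t-tests showed no significant difference between repeated scans by either operator in the coronal or sagittal measurements (Supplementary Tables S2 and S3). Between operators, the UCA did not differ significantly, whereas the USLA differed significantly in both the thoracic and lumbar segments (Supplementary Table S4).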

Validity

Coronal measurements: The correlation coefficients (R) for the thoracic and lumbar segments were 0.895 and 0.869 (Table 2), respectively, with R² values of 0.802 and 0.755, indicating strong linear correlations (Fig. 5a). The Bland–Altman plots showed excellent agreement between the ultrasound and radiographic measurements for both segments (Fig. 6a and b).

Table 2 Comparison of ultrasound and X-ray measurements of the thoracic or lumbar curves.

Fig. 5

Correlations (R, R²) and regression equations between the radiographic angles and UCA/USLA are shown for the thoracic (blue) and lumbar (red) curves in both coronal (a) and sagittal (b) planes. UCA: ultrasound curve angle, USLA: ultrasound lamina angle.

Fig. 6

Bland–Altman plots showing the differences between the Cobb angle and UCA for thoracic (a) and lumbar (b) segments in the coronal plane and between the TK/LL angle and USLA for thoracic (c) and lumbar (d) segments in the sagittal plane. The blue centreline represents the mean difference. The blue dashed line indicates the 95% limits of agreement. The red dashed line indicates the clinically acceptable limit. LL: lumbar lordosis, TK: thoracic kyphosis, UCA: ultrasound curve angle, USLA: ultrasound lamina angle.

Sagittal measurements: The R values for the thoracic and lumbar spine segments were 0.405 and 0.112 (Table 2), respectively, with R² values of 0.164 and 0.013, indicating weak linear correlations (Fig. 5b). The Bland–Altman plots revealed a substantial degree of agreement between USLA measurements and TK/LL assessments in the thoracic and lumbar spine segments. However, the differences between the two methods exceeded the clinically acceptable limits (Fig. 6c and d).

Subgroup validity

In the thoracic and lumbar coronal segments, higher correlation coefficients (R and R²) were observed in patients with abnormal weight and in those with Cobb angles ≥ 20°. The correlation coefficients also differed by sex. In the thoracic coronal segments, higher correlation coefficients (R and R²) were noted in male patients. In the lumbar coronal segments, higher correlation coefficients (R and R²) were noted in female patients (Supplementary Tables S5 and S6, Supplementary Figures S1 and S2).

In the sagittal thoracic and lumbar segments, the correlation coefficients differed according to sex, BMI, and Cobb angle. In the thoracic sagittal segments, higher correlation coefficients (R and R²) were noted in female patients, those with normal weight, and those with Cobb angles ≥ 20°. In the lumbar sagittal segments, higher correlation coefficients (R and R²) were noted in male patients, those with normal weight, and those with Cobb angles < 20° (Supplementary Tables S7 and S8, Supplementary Figures S3 and S4).

Adverse events and adverse reactions

Four adverse events were observed during the ultrasound measurements: three cases of dizziness and one case of dizziness with cold sweats due to hypoglycemia induced by excessive hunger. One case of dizziness was caused by excessive stress. The difference in risk between ultrasound and X-ray (4 [5%] of 80 vs. 0 [0%] of 80) was not analyzed because of the low number of reported adverse events. Eleven adverse reactions were recorded: six patients experienced dizziness, three experienced dizziness with cold sweats, and two collapsed. No adverse events or adverse reactions were observed during X-ray image acquisition. Most adverse reactions were mild; however, overall adverse reactions were more common among patients who underwent ultrasound than among those who underwent X-ray (11 [14%] of 80 vs. 0 [0%] of 80).

Discussion

In this study, automated 3D ultrasound imaging demonstrated excellent reliability and validity for Cobb angle measurement in IS, providing a radiation-free option for routine follow-up, especially in children. Our findings are consistent with those of previous manual or semi-automated studies35,36,37. Alongside other recent automated measurement work19,31,32,33,34,35,36,37,38,39, they demonstrate that improved image segmentation and landmark detection can cut processing time and operator input, paving the way for low-dose, clinic-ready surveillance. Therefore, integrating automated ultrasound into long-term scoliosis management is a promising future direction.

In a cohort of 80 patients with IS (Cobb 10–52.5°, age 8–25 years), automated 3D ultrasound achieved intra- and inter-operator ICCs of 0.875–0.973 and 0.656–0.916, respectively, in both coronal and sagittal planes. Although slightly below the figures reported by Brink et al. (ICC ≥ 0.84)17 and Lee et al. (intra ≥ 0.973, inter ≥ 0.925; sagittal ICC ≥ 0.91)16,19, these values remain clinically acceptable and demonstrate non-inferiority to manual techniques. Our study demonstrated a strong agreement between automated 3D ultrasound and radiographic assessments in the coronal plane, with correlation coefficients of up to 0.895. This aligns with previous studies: De Reuver et al. reported R² = 0.968 (thoracic) and 0.923 (lumbar)40; Brink et al. found R² = 0.987 (thoracic) and 0.970 (thoracolumbar)17; and Lee et al. reported R² = 0.893 and 0.884 using a customized VPI technique19. These results validate the clinical utility of automated ultrasound for scoliosis assessment and confirm its comparability to manual and semi-automated methods, supporting a shift toward safer and more precise diagnostics.

Additionally, although our fully automated selector for sagittal VPI frames improves efficiency, it can overlook subtleties that manual selection captures. Consistent with this limitation, our sagittal correlations were lower than those reported previously: Li et al. found R² = 0.549 and 0.657 for TK and LL, respectively21, while Lee et al. reported R² ≥ 0.574 (thoracic) and 0.635 (lumbar)16; in our cohort, thoracic values exceeded those of lumbar. Several factors may explain this discrepancy, including sub-optimal beam incidence caused by LL, soft-tissue attenuation that obscures laminar contours, limited algorithmic recognition of sagittal landmarks, and differing landmark definitions between USLA and radiography16,19,36,40. We used USLA and fixed TK (T5–T12) and LL (L1–L5), a factor that may have amplified sagittal discrepancies. Although Bland–Altman plots showed overall agreement between ultrasound and X-ray, the limits of agreement exceeded accepted clinical thresholds33. Future studies should refine probe hardware and positioning, harmonize sagittal landmark definitions, enhance landmark-extraction algorithms, and implement stricter operator training to improve sagittal accuracy.

Measurement precision was influenced by heterogeneity in BMI, sex, and curve severity within our cohort. Subgroup analysis showed that ultrasonographic accuracy varied systematically across these variables. In the coronal plane, sex and abnormal weight affected performance, with superior correlations possibly linked to differences in soft-tissue contrast or spinal loading. Curves ≥ 20° produced the strongest correlations, underscoring the utility of ultrasound in detecting more pronounced deformities. Sagittal results were less consistent, which may reflect probe-fit limitations or the modest sample size. Overall, these findings highlight the need for larger, stratified cohorts and adaptive imaging algorithms tailored to patient morphology, particularly when assessing complex or severe scoliosis.

Safety assessments revealed only minor adverse effects—consistent with Brink et al.17 and far less concerning than cumulative X-ray exposure—reinforcing ultrasound’s value as a reliable, real-time modality for continuous scoliosis monitoring. Recognizing its inherent differences from radiography (Supplementary Table S9), future research should focus on identifying and mitigating these reactions while further refining ultrasonography to support timely treatment adjustments and optimize patient outcomes.

This study’s limitations include modest sample size, demographic variability, operator dependence, and differing measurement techniques, all of which may affect generalizability and measurement accuracy. Variations in X-ray correlation, difficulty assessing mild curvatures, and the lack of longitudinal follow-up also impact clinical applicability.

Future research should prioritize improving sagittal measurement accuracy and analyzing how patient and procedural factors influence automated measurements. Standardizing techniques, refining automated algorithms, and integrating machine learning will further strengthen measurement reliability.

In conclusion, automated 3D ultrasonography demonstrates high reliability and validity in the coronal plane, with consistent but more variable performance in the sagittal plane. While ultrasound cannot fully replace X-ray, it is a safer and more effective option for monitoring conservative treatment and enhancing scoliosis management.