Introduction

Cervical spondylosis is a harmful disease that can occur in any population and seriously affects health and quality of life. Its diagnosis and treatment usually require a combination of imaging and clinical symptoms. Cervical spondylosis is a degenerative disease in which degeneration of different parts of the cervical spine causes clinical manifestations such as neck pain, stiffness, headaches, numbness, tingling, and weakness in the arms or legs. In the World Health Organization (WHO)’s “Global Top Ten Persistent Diseases”, cervical spondylosis ranks second. Globally, as many as 900 million people have cervical spondylosis. The prevalence of cervical spondylosis was 13.76% among 3900 selected participants [1], and it differed significantly among the urban, suburban, and rural populations (13.07%, 15.97%, and 12.25%, respectively). With changes in lifestyle and working environments, cervical spondylosis is no longer confined to middle-aged and elderly people. In 2017, East Asia and Andean Latin America had the highest and lowest age-standardized incidence rates, at 1029.0 (910.5 to 1166.1) and 624.0 (550.3 to 708.3) per 100,000 population, respectively [2].

Compared to diseases with explicit lesions, such as cancer, hydroncus, or fracture, cervical spondylosis is a chronic, primarily non-traumatic and progressive condition without explicit lesions, which is usually diagnosed by combining appearance with professional metric measurements in clinical practice. According to the affected structures and clinical manifestations, cervical spondylosis can be caused by different types of degeneration, such as degeneration of the cervical vertebrae curvature, vertebral instability, and degeneration of the intervertebral disks and intervertebral joints in the cervical spine. These changes compress or stimulate the adjacent spinal cord, nerve roots, and vertebral arteries, leading to symptoms of arm pain, numbness, neck pain, weakness, incontinence, and impairment of gait (see Supplementary Table 1 in the Supplementary Information).

In particular, the pathological forms of cervical degeneration are diverse, and the appearance of different degenerative phenomena on medical imaging varies correspondingly. The treatment plans for different types of cervical spondylosis also vary greatly, so accurate preoperative imaging is vital for identifying the cause of compression as early as possible and choosing an appropriate surgical plan.

Common examinations for cervical spondylosis include X-rays, computed tomography (CT), and magnetic resonance imaging (MRI). CT is suitable for displaying changes in the spinal structure, but it exposes the body to relatively high doses of radiation. MRI shows the fine details of soft tissues, but it is time-consuming and expensive; in addition, it may be contraindicated for patients with metal in their body. Compared with CT and MRI, X-ray imaging is cheaper, more convenient, and faster. More importantly, based on clinical experience, orthopedic experts have defined sagittal plane parameters for cervical spine X-rays, which have become key factors in determining surgical options for cervical spondylosis and are widely used clinically. For example, if the X-ray shows cervical kyphosis, anterior surgery is suggested to improve the curvature; if it shows cervical spinal canal stenosis, the surgeon is prompted to choose posterior spinal canal enlargement surgery to relieve spinal cord compression; and if it shows cervical segmental instability, metal internal fixation is indicated to enhance stability.

In general, failure to correctly identify the sagittal factors may lead to wrong-procedure surgery and poor prognosis. However, an accurate diagnosis of cervical spondylosis requires extensive clinical experience. On the one hand, medical resources are unevenly distributed in most countries, and senior doctors are concentrated in top-class hospitals in large cities. Because the experience of physicians at primary hospitals is limited, the efficiency of their diagnosis and treatment of cervical spondylosis cannot be guaranteed. On the other hand, senior doctors have long been overloaded, and the resources for on-site guidance at primary hospitals are limited. Thus, a way to transfer the experience accumulated over the years at tertiary and top-class hospitals in treating cervical spondylosis to primary hospitals is urgently needed and would benefit the majority of patients. In addition to on-site guidance, a deep learning model trained under the supervision of expert doctors’ experience could improve diagnostic accuracy in primary hospitals while significantly saving diagnostic time for all hospitals.

Existing deep learning-based studies on the assessment of cervical degeneration can be broadly categorized into localization-based and semantic segmentation-based approaches. Among them, some localization-based works use X-ray images to determine the presence of certain types of degeneration, such as ossification of the posterior longitudinal ligament [3] or spinal stenosis [4], or the degree of spinal cord compression [5]. However, multiple factors lead to degeneration of the cervical spine, and the assessment of the overall situation is still mostly based on manual measurements [6,7]. Moreover, methods for localizing body structures such as cervical vertebrae [8,9] mainly focus on the center of the vertebral body, ignore the edges and corner points, and lack further assessment. Alternatively, some semantic segmentation-based studies [10,11] have performed meticulous semantic segmentation of cervical spine structures, such as bones or disks; these rely heavily on large-scale annotated data, and the segmented edges of vertebral segments are usually not clear enough to meet doctors’ precision requirements. In recent years, feature point detection methods have become popular for medical diagnostic tasks involving the head [12], hand [13], lower limb [14], and hip bone [15]. However, in contrast to landmark detection in scenarios where bones are easy to differentiate, the cervical vertebrae are highly similar to each other and are prone to projection artifacts and structural overlap. In general, previous deep learning methods have addressed cervical analysis from particular aspects, such as vertebral detection, spinal stenosis, and intervertebral disc degeneration, but models for separate degenerative factors still cannot meet the practical requirements of doctors. One intuitive idea would be to combine all those models in one system. However, because the different models are designed with different architectures and trained with different data, directly combining them is difficult.

In this work, we propose a multi-modal collaboration framework for cervical spondylosis diagnosis, trained with multi-task learning and oriented to two clinical requirements: a lower annotation workload for doctors and fine-grained measurements of multiple types of degeneration. According to the patient's symptoms (see Supplementary Table 1 in the Supplementary Information) and expert knowledge, cervical vertebrae curvature, vertebral instability, spinal stenosis, and intervertebral stenosis are selected for the diagnosis of cervical spondylosis. Because different degenerative factors are best assessed with different medical imaging modalities, each degenerative factor is diagnosed with its most suitable modality in the proposed framework. The architecture combines cascade and ensemble networks: the location and diagnosis modules form a cascade, and the diagnoses of the different degenerative diseases form an ensemble. Note that all the models were trained with imaging data of the same patients, each of whom has both X-ray and MRI data. The multi-modal diagnostic results complement one another to support doctors’ needs, reflecting real-world clinical practice in which doctors integrate findings from different imaging modalities to address distinct aspects of the diagnostic process. The proposed method can produce a comprehensive report detailing potential cervical degenerations and their respective positions, fulfilling the pragmatic requirements of medical practitioners more effectively.

In summary, we designed our multimodal diagnostic framework for cervical spondylosis around the clinical requirements of doctors. Typical cervical spondylosis indicators, such as the Cobb angle, SVA value, intervertebral foramen stenosis, and spinal stenosis, must be explicitly provided for diagnosis, and different indicators are best acquired through different data modalities. We therefore design our framework around multimodal collaboration rather than fusion into a single overall result. The key contribution of our work to doctors lies not only in providing results from a multimodal perspective, but also in offering the potential for mutual confirmation among different indicators through a multitask learning approach.

Results

Patient characteristics and dataset composition

This retrospective study collected cervical spine X-ray and MRI scans of 1500 patients (mean age ± standard deviation: 54 ± 10.28 years; men: 909, 60.60%; women: 591, 39.40%) with cervical spondylosis at the Third Hospital of Peking University from January 2016 to December 2018. During data preparation, some patients were excluded due to inappropriate labeling, such as missing vertebral bodies or spinous processes, leaving a total of 1151 patients. Data collection was pre-approved by the Institutional Review Board and excluded minors as well as patients with instrumentation or other medical conditions (e.g., spinal tumors, infections, trauma, or scoliosis). The data were annotated by three radiologists and randomly divided into a training set (693), a validation set (228), and a testing set (230). For the vast majority of cases, the three annotators’ results were largely consistent; in cases of disagreement, the annotation of the more authoritative expert prevailed. All data were manually labeled by radiologists and validated against each other. All labels followed the four-point method for vertebrae on lateral and flexion-extension cervical spine radiographs. Each radiologist was blinded to patient demographics and clinical history. For each patient, the dataset contained one lateral cervical spine radiograph in each of three positions: neutral, flexion, and extension.

Vertebral detection on X-ray Images

In our case, the vertebral bodies are represented as quadrangles, so detecting a vertebral body is equivalent to detecting the four vertices of its quadrangle. Detection performance is reported as the distribution of errors between predicted and annotated vertices in three views (flexion, neutral, and extension) in Fig. 1a–c. In most cases, from C3 to C7, the mean error is below 1 mm. The upper margin of C2 and the lower margin of C7 have larger variance: the mean error of the former is slightly over 1 mm, and that of the latter is close to 1 mm. Several outliers with larger errors are also observed, and C2 and C7 contain more outliers than the other vertebrae.
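The vertex-level error described above can be sketched as follows. This is a minimal illustration rather than the evaluation code used in the study: it assumes vertices are given as (x, y) pixel coordinates in matching order and that a single pixel spacing (mm per pixel) is known for the radiograph. The function name `vertex_errors_mm`, the parameter `pixel_spacing_mm`, and the example coordinates are hypothetical.

```python
import math


def vertex_errors_mm(predicted, annotated, pixel_spacing_mm):
    """Return the Euclidean error in mm for each predicted/annotated vertex pair."""
    if len(predicted) != len(annotated):
        raise ValueError("predicted and annotated must have the same length")
    return [
        math.hypot(px - ax, py - ay) * pixel_spacing_mm
        for (px, py), (ax, ay) in zip(predicted, annotated)
    ]


# Hypothetical example: the four vertices of one vertebral body, 0.2 mm per pixel.
predicted = [(100.0, 50.0), (140.0, 52.0), (141.0, 80.0), (99.0, 79.0)]
annotated = [(100.0, 50.0), (143.0, 56.0), (141.0, 80.0), (99.0, 79.0)]

errors = vertex_errors_mm(predicted, annotated, pixel_spacing_mm=0.2)
mean_error = sum(errors) / len(errors)  # mean error over the four vertices, in mm
```

Collecting such per-vertex errors over all test radiographs gives the distributions that a box plot per vertex would summarize.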

Fig. 1: Box diagram of vertebral body detection C2-7.

(Best viewed in color.) In the boxplots, the central line signifies the median. The box encompasses the interquartile range (IQR), defined by the 25th (Q1) and 75th (Q3) percentiles. The whiskers show the range of data within 1.5 × IQR from the quartiles, and any points beyond are plotted as individual outliers. Fig. 1a–c illustrates the error distribution between the predicted and annotated vertices of the vertebral bodies, with 0 being the minimum error value. The horizontal axis represents a total of 24 vertices for the six vertebrae of C2-7, where U and D denote the upper and lower (down) margins of the vertebral bodies, and A and P denote the anterior and posterior margins.

Cervical Vertebrae Curvature (CVC) analysis on X-ray images

The C2-7 Cobb Angle (CA) was analyzed following the definition in [6,7,16,17], i.e., the angle between the lower end plate of C2 and the lower end plate of C7, where the vertebral bodies are detected with the model in subsection 2.1. The results are illustrated in Fig. 2, where Fig. 2a–c show the numerical analysis of Cobb angle estimation. The mean error of our method is 2.42°, with a median of 1.84°; over 80% of samples were predicted with an error under 4°, outliers over 8° accounted for less than 1.5% (see Fig. 2a), and the first and third quartiles were close to 1° and 3° (see Fig. 2b). Fig. 2c shows the bias distribution between our predicted results and the gold standard (of Cobb angle) with Bland-Altman analysis; the 95% limits of agreement (± 1.96 SD) were [−6.49°, 6.31°]. In Bland-Altman analysis, if most of the points fall within the 95% limits of agreement (i.e., between the two lines at mean ± 1.96 × standard deviation) and their maximum difference is clinically acceptable, then the two compared methods can be considered to have good consistency and to be interchangeable. In this work, our model and the radiologists can be considered to have good consistency, since most points fall within the 95% limits of agreement.
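As an illustration of the definition above, the angle between the lower end plates of C2 and C7 can be computed directly from the detected quadrangle vertices. The sketch below is not the authors' implementation. It assumes each end plate is the line through the posterior-inferior and anterior-inferior vertices, and it returns a signed angle whose sign depends on image orientation, so which sign corresponds to kyphosis is an assumption to be fixed per dataset. All function and parameter names are hypothetical.

```python
import math


def signed_angle_deg(u, v):
    """Signed angle in degrees that rotates direction u onto direction v."""
    cross = u[0] * v[1] - u[1] * v[0]
    dot = u[0] * v[0] + u[1] * v[1]
    return math.degrees(math.atan2(cross, dot))


def cobb_angle_c2_c7(c2_post_inf, c2_ant_inf, c7_post_inf, c7_ant_inf):
    """Angle between the lower end plates of C2 and C7.

    Each end plate is taken as the line through the posterior-inferior and
    anterior-inferior vertices of the detected vertebral quadrangle.
    """
    c2_dir = (c2_ant_inf[0] - c2_post_inf[0], c2_ant_inf[1] - c2_post_inf[1])
    c7_dir = (c7_ant_inf[0] - c7_post_inf[0], c7_ant_inf[1] - c7_post_inf[1])
    return signed_angle_deg(c2_dir, c7_dir)
```

Using `atan2` on the cross and dot products avoids the instability of `acos` for nearly parallel end plates and keeps the sign needed to distinguish lordosis from kyphosis.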

Fig. 2: Cobb angle results (best viewed in color).

a Histogram of Cobb angle errors. The right vertical axis represents the sample distribution across error ranges. b Violin plot representing the density of the error distribution. Note that the errors in (a) and (b) are absolute values. c Bland-Altman analysis. Of the three horizontal lines, the middle one is the mean of the differences (ordinate) and the other two are the mean ± 1.96 × standard deviation (± 1.96 SD), called the 95% limits of agreement. d The classification accuracy distribution of Cobb angle analysis. According to [18], we take a kyphosis angle exceeding 13° (i.e., a Cobb angle < −13°) as the guideline for clinical suggestion. Crossing −13° is therefore treated as a binary classification of cervical kyphosis: a sample is counted as misclassified if the predicted and gold-standard Cobb angles fall on opposite sides of −13° (red points are misclassified). e A visualization example of the Cobb angle calculation.

Fig. 2d illustrates the accuracy of clinical recommendations according to [18]. The Cobb angle is a crucial metric for assessing cervical spine curvature, reflecting both lordosis (forward curvature) and kyphosis (backward curvature) of the cervical spine. Using a multivariate logistic regression model, Suda et al. [18] demonstrated that when the local kyphosis angle exceeds 13°, meaning the Cobb angle is less than −13°, successful surgery becomes challenging; therefore, addressing the kyphosis should be prioritized. Conversely, when the local kyphosis angle is less than 13°, indicating a Cobb angle greater than −13°, the likelihood of surgical success is higher. We have therefore established a clinical threshold of −13° for the Cobb angle as a guideline: if the Cobb angle is less than −13°, surgery is not recommended, whereas if it is greater than −13°, surgical intervention may be considered. The accuracy distribution shows that the proposed method gives a 100% accurate suggestion of cervical kyphosis for patients whose gold-standard Cobb angle is under −20° or over −10°. For patients with a Cobb angle in [−20°, −10°], the accuracy is about 75.4%.
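The threshold rule used for this evaluation is simple enough to state in code. The following sketch only illustrates the rule as described in the text (a sample is misclassified when the predicted and gold-standard Cobb angles fall on opposite sides of −13°). The function names and the example values are hypothetical and are not data from the study.

```python
KYPHOSIS_THRESHOLD_DEG = -13.0  # clinical threshold stated in the text


def is_kyphotic(cobb_deg, threshold=KYPHOSIS_THRESHOLD_DEG):
    """True when the Cobb angle is below the kyphosis threshold."""
    return cobb_deg < threshold


def is_misclassified(predicted_deg, gold_deg, threshold=KYPHOSIS_THRESHOLD_DEG):
    """True when prediction and gold standard fall on opposite sides of the threshold."""
    return is_kyphotic(predicted_deg, threshold) != is_kyphotic(gold_deg, threshold)


def classification_accuracy(pairs, threshold=KYPHOSIS_THRESHOLD_DEG):
    """Fraction of (predicted, gold) pairs on the same side of the threshold."""
    correct = sum(not is_misclassified(p, g, threshold) for p, g in pairs)
    return correct / len(pairs)


# Hypothetical (predicted, gold-standard) Cobb angles in degrees.
example_pairs = [(-12.0, -14.0), (-15.0, -20.0), (5.0, 3.0), (-13.5, -12.0)]
example_accuracy = classification_accuracy(example_pairs)
```

Because only samples near the threshold can flip sides under a small measurement error, accuracy is expected to be lowest for gold-standard angles close to −13°, consistent with the lower accuracy reported for the [−20°, −10°] range.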

The C2-7 SVA value was analyzed following the definition in [19], i.e., the distance from the center vertical line of C2 to the posterior superior angle of C7. The results are illustrated in Fig. 3, where Fig. 3a–c show the numerical analysis of SVA estimation. The mean error of our method is 0.65 mm, with a median of 0.50 mm; over 80% of samples were predicted with an error under 1 mm, outliers over 2.5 mm accounted for less than 4% (see Fig. 3a), and the first and third quartiles were close to 0.3 mm and 0.8 mm (see Fig. 3b). Fig. 3c shows the bias distribution between our predicted results and the gold standard (of SVA value) with Bland-Altman analysis; the 95% limits of agreement (± 1.96 SD) were [−1.58 mm, 1.78 mm].
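The Bland-Altman limits of agreement quoted for the Cobb angle and the SVA value follow from the mean and standard deviation of the paired differences. The sketch below shows the standard calculation. It assumes the sample standard deviation is used (the text does not say whether the sample or population form was applied), and the function name is hypothetical.

```python
import statistics


def bland_altman_limits(predicted, reference):
    """Return (bias, lower_limit, upper_limit) for paired measurements.

    bias is the mean of the differences; the limits are bias -/+ 1.96 * SD.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    differences = [p - r for p, r in zip(predicted, reference)]
    bias = statistics.mean(differences)
    sd = statistics.stdev(differences)  # sample standard deviation (assumption)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```

For instance, limits of [−1.58 mm, 1.78 mm] imply a bias of about 0.10 mm, the midpoint of the two limits.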

Fig. 3: SVA value results (best viewed in color).

a Histogram of SVA value errors. The right vertical axis represents the sample distribution across error ranges. b Violin plot representing the density of the error distribution. Note that the errors in (a) and (b) are absolute values. c Bland-Altman analysis. The dots represent the measurement results of the whole test set. Of the three horizontal lines, the middle one is the mean of the differences (ordinate) and the other two are the mean ± 1.96 × standard deviation, called the 95% limits of agreement. d The classification accuracy distribution of SVA value analysis. According to [20], we consider an SVA over 40 mm or less than 15 mm as indicative of Cervical Deformity (CD) and use these values as the guidelines for clinical suggestion. Crossing 40 mm or 15 mm is therefore treated as the classification of CD: a sample is counted as misclassified if the predicted and gold-standard SVA values fall on opposite sides of 40 mm or 15 mm (red points are misclassified). e A visualization example of SVA value calculation.

SVA (Sagittal Vertical Axis) is a crucial factor for measuring cervical spine curvature. According to recent research [20], a C2–C7 SVA of more than 40 mm can be considered indicative of Cervical Deformity (CD). Therefore, we set the SVA threshold at 40 mm as the clinical assessment criterion. An SVA of more than 40 mm is generally considered a cervical curvature imbalance, which is also considered a kyphosis deformity; thus, surgical intervention may be implemented to correct the cervical spine deformity. In addition, according to decades of experience at Peking University Third Hospital, an SVA of less than 15 mm can also serve as a clinical assessment criterion. The accuracy distribution shows that the proposed method gives a 100% accurate suggestion of cervical kyphosis for patients whose gold-standard SVA value is under 10 mm or over 20 mm. For patients with an SVA value in [10 mm, 20 mm], the accuracy is about 90.3%.
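A minimal sketch of the SVA measurement and the two thresholds discussed above is given below. It makes several labeled assumptions that are not stated in the text: the C2 center is approximated by the centroid of its four detected vertices, the image vertical axis coincides with the plumb line, the SVA is reported as an unsigned horizontal offset, and a single pixel spacing converts pixels to millimeters. All names are hypothetical.

```python
def c2_c7_sva_mm(c2_vertices, c7_posterior_superior, pixel_spacing_mm):
    """Horizontal distance (mm) from the vertical line through the C2 center
    to the posterior-superior corner of C7.

    The C2 center is approximated by the centroid of its vertices (assumption).
    """
    c2_center_x = sum(x for x, _ in c2_vertices) / len(c2_vertices)
    return abs(c7_posterior_superior[0] - c2_center_x) * pixel_spacing_mm


def suggests_cervical_deformity(sva_mm, upper_mm=40.0, lower_mm=15.0):
    """Apply the two criteria from the text: SVA over 40 mm or under 15 mm."""
    return sva_mm > upper_mm or sva_mm < lower_mm


# Hypothetical pixel coordinates and spacing.
c2 = [(10.0, 0.0), (20.0, 0.0), (10.0, 10.0), (20.0, 10.0)]
sva = c2_c7_sva_mm(c2, c7_posterior_superior=(65.0, 100.0), pixel_spacing_mm=0.5)
```

Keeping the measurement and the threshold rule in separate functions makes it easy to evaluate numeric error and classification accuracy independently, as the text does.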

Vertebral instability analysis on X-ray images

The spondylolisthesis angle was estimated as the absolute difference between two views in the angle of the extension line of the posterior edge of the adjacent vertebral body. The results for the spondylolisthesis angle are illustrated in Fig. 4b; the errors vary across vertebral segments, with a trend toward larger errors from C2-C3 to C6-C7, where the mean error rises from 2° to 3.6°.

Fig. 4: Vertebral Instability results.

In the boxplots, the central line signifies the median. The box encompasses the interquartile range (IQR), defined by the 25th (Q1) and 75th (Q3) percentiles. The whiskers show the range of data within 1.5 × IQR from the quartiles, and any points beyond are plotted as individual outliers. a Box plot of spondylolisthesis distance. The y-axis represents the error in millimeters. The x-axis represents the spondylolisthesis distance between different vertebral segments. b Box plot of spondylolisthesis angle. The x-axis represents the spondylolisthesis angle between different vertebrae. The y-axis represents the error in degrees. c Radar chart of the classification accuracy of clinical suggestions based on spondylolisthesis angle and distance, with the guidelines set according to [21].

The spondylolisthesis distance is estimated as the absolute difference between two views in the distance between 1) the extension line of the posterior edge of the upper vertebra and 2) the parallel line passing through the posterior superior corner of the lower vertebra. The results for the spondylolisthesis distance are illustrated in Fig. 4a; the errors vary across vertebral segments. Among all the segments, C5-C6 has the smallest mean error, about 0.4 mm, while C2-C3 has the largest, about 0.6 mm.

We evaluate vertebral instability based on the spondylolisthesis angle and distance. According to the guideline in [21], a cervical vertebral segment is defined as unstable when there is more than 3.5 mm of anterior or posterior translation or more than an 11° difference in spondylolisthesis angle. Therefore, we conducted the vertebral instability evaluation with classification criteria of 3.5 mm in spondylolisthesis distance and 11° in spondylolisthesis angle: when the spondylolisthesis distance exceeds 3.5 mm or the spondylolisthesis angle exceeds 11°, the segment is considered unstable, and anterior cervical discectomy and fusion may be required.
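The composite criterion above can be written as a short rule. The sketch below is illustrative only: it assumes that the per-view measurements (for example, in flexion and extension) are already available and that the instability measure is their absolute difference, as the text describes. Function names are hypothetical.

```python
def view_difference(flexion_value, extension_value):
    """Absolute difference of a measurement between two views."""
    return abs(flexion_value - extension_value)


def is_unstable(distance_mm, angle_deg,
                distance_threshold_mm=3.5, angle_threshold_deg=11.0):
    """Composite criterion from the text: unstable if either threshold is exceeded."""
    return distance_mm > distance_threshold_mm or angle_deg > angle_threshold_deg


# Hypothetical segment: translation of 1.0 mm in flexion and 5.0 mm in extension,
# posterior-edge angle of 6.0 degrees in flexion and 10.0 degrees in extension.
segment_unstable = is_unstable(
    view_difference(1.0, 5.0),   # 4.0 mm, exceeds 3.5 mm
    view_difference(6.0, 10.0),  # 4.0 degrees, below 11 degrees
)
```

Because the two conditions are joined with a logical OR, a segment is flagged as soon as either measure crosses its threshold, which is why the accuracy of each measure can also be reported separately.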

The classification results are illustrated in Fig. 4c as a radar chart, which makes comparison more convenient. The classification accuracy based on spondylolisthesis distance outperforms that based on spondylolisthesis angle on each segment; the overall accuracies for spondylolisthesis angle and distance are 80.5% and 99.4%, respectively.

Spinal stenosis analysis on X-ray images

The Sagittal Diameter of Vertebral Body (SDVB) was estimated as the distance from the midpoint of the anterior edge of the vertebral body to the midpoint of the posterior edge. The results for SDVB are illustrated in Fig. 5b. The mean errors of our model are 1.67 mm for C3, 0.94 mm for C4, 0.77 mm for C5, 0.45 mm for C6, and 0.70 mm for C7.

Fig. 5: Spinal Stenosis results.

In the boxplots, the central line signifies the median. The box encompasses the interquartile range (IQR), defined by the 25th (Q1) and 75th (Q3) percentiles. The whiskers show the range of data within 1.5 × IQR from the quartiles, and any points beyond are plotted as individual outliers. a Box plot of Sagittal Diameter of Vertebral Canal (SDVC). The y-axis represents the error in millimeters. b Box plot of Sagittal Diameter of Vertebral Body (SDVB). The y-axis represents the error in millimeters. c Box plot of Pavlov value: SDVC/SDVB.

The Sagittal Diameter of Vertebral Canal (SDVC) was estimated as the shortest distance from the midpoint of the posterior edge of the vertebral body to the lamina. The results for SDVC are illustrated in Fig. 5a. The mean errors of our model are 2.46 mm for C3, 2.43 mm for C4, 2.14 mm for C5, 1.71 mm for C6, and 2.60 mm for C7.

The Pavlov value was estimated as the ratio SDVC/SDVB. The results for the Pavlov value are illustrated in Fig. 5c. The mean errors of our model are 0.09 for C3, 0.11 for C4, 0.09 for C5, 0.08 for C6, and 0.10 for C7.

The Pavlov value is significant in assessing developmental cervical canal stenosis. When the Pavlov value of three consecutive vertebral segments is less than 0.75, developmental cervical spinal stenosis is diagnosed. Wang et al. [22] attempted to categorize patients into two groups, Developmental Canal Stenosis (DCS) and Non-Developmental Canal Stenosis (NDCS), based on whether the Pavlov value is less than 0.75. Thus, crossing 0.75 is a crucial clinical determinant: three consecutive vertebral segments with a Pavlov value less than 0.75 are considered indicative of developmental canal stenosis, and posterior cervical surgery may be considered.
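The rule of three consecutive segments below 0.75 can be sketched as follows. This is only an illustration of the criterion as stated in the text. It assumes the Pavlov values are supplied in anatomical order (for example C3 to C7), and the function names are hypothetical.

```python
def pavlov_value(sdvc_mm, sdvb_mm):
    """Pavlov value: canal sagittal diameter divided by vertebral body diameter."""
    return sdvc_mm / sdvb_mm


def has_developmental_canal_stenosis(pavlov_values, threshold=0.75, run_length=3):
    """True if at least `run_length` consecutive segments fall below the threshold.

    pavlov_values must be ordered anatomically, e.g. C3, C4, C5, C6, C7.
    """
    run = 0
    for value in pavlov_values:
        run = run + 1 if value < threshold else 0
        if run >= run_length:
            return True
    return False
```

For example, values of 0.90, 0.70, 0.60, 0.74, 0.90 for C3 to C7 satisfy the criterion at C4 to C6, whereas 0.70, 0.70, 0.80, 0.70, 0.70 do not, because the run is interrupted at C5.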

Intervertebral degeneration analysis on MR images

The results of our model in analyzing intervertebral disc degeneration (i.e., normal or bulge vs. protrusion or extrusion) are illustrated in Fig. 6a, with an accuracy of 76.3% and an Area Under the Curve (AUC) of 0.844. All the doctors' results lay under the ROC curve of our model, demonstrating that our model could achieve better sensitivity or specificity in a similar scenario. In the more detailed radar chart, our model outperformed the clinical doctors and radiologists in accuracy and F1 score, while the clinical doctor outperformed the radiologist and our model in specificity.
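The indicators compared in the radar charts are standard functions of the binary confusion matrix. The sketch below uses the usual definitions and is not specific to this study. It assumes both classes are present and at least one positive prediction exists, so that no denominator is zero, and the function name and example counts are hypothetical.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity, precision and F1 from confusion counts."""
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # positive predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts for one reader.
example_metrics = diagnostic_metrics(tp=40, fp=10, tn=30, fn=20)
```

A reader can lead in specificity while trailing in accuracy and F1, as reported above, because specificity depends only on the negative cases.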

Fig. 6: Intervertebral Degeneration Results.

a Intervertebral disc degeneration analysis. b Spinal stenosis. c Left foramen stenosis. d Right foramen stenosis. The comparison between our model's predictions and the annotations by different doctors is illustrated, including a clinical doctor and junior and senior radiologists; the gold standard is annotated by the experts. In establishing the gold standard, all cases were de-identified by removing personal information, and the experts were blinded to both the assessments made by other doctors and the predictions from the AI model. Furthermore, all cases were presented in a randomized order. The results of our model are illustrated with an ROC curve, and the results of the doctors are illustrated as the corresponding points (best viewed in color). The x-axis is 1 − specificity, and the y-axis is sensitivity. The radar chart includes five evaluation indicators: accuracy, sensitivity, specificity, precision, and F1 score.

The results of spinal stenosis analysis are shown in Fig. 6b, with an AUC of 0.925 and an accuracy of 87.6%, which is better than both the clinical doctors and the radiologists. In the more detailed radar chart, our model outperformed the clinical doctors and radiologists in accuracy, sensitivity, and F1 score, while the clinical doctor outperformed the radiologist and our model in precision.

The results of intervertebral foramen stenosis analysis are illustrated in Fig. 6c, d; the AUCs for left and right intervertebral foramen stenosis were 0.849 and 0.845, very close to the AUC for intervertebral disc degeneration. On the ROC plot, the points of the clinical doctor were above the curve of our model, and the points of both junior and senior radiologists were under it. In the radar chart, our model performed better in accuracy and F1 score, while the clinical doctor performed better in precision and specificity. According to our statistical analysis, assessing intervertebral degeneration usually takes a radiologist about 3 minutes, whereas our model takes less than 2 s.
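Whether a reader's operating point lies above or below the model's ROC curve, as discussed above, can be checked by interpolating the curve at the reader's false positive rate. The sketch below is a generic illustration and not the analysis code of the study. It assumes the ROC curve is given as false and true positive rates sorted by increasing false positive rate and uses linear interpolation between ROC points. All names and the example curve are hypothetical.

```python
def auc_trapezoid(fpr, tpr):
    """Area under an ROC curve by the trapezoidal rule (fpr sorted ascending)."""
    area = 0.0
    for i in range(1, len(fpr)):
        area += (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2.0
    return area


def sensitivity_on_curve(fpr, tpr, x):
    """Linearly interpolated sensitivity of the ROC curve at false positive rate x."""
    for i in range(1, len(fpr)):
        x0, x1 = fpr[i - 1], fpr[i]
        if x0 <= x <= x1:
            if x1 == x0:  # vertical step in the curve
                return max(tpr[i - 1], tpr[i])
            t = (x - x0) / (x1 - x0)
            return tpr[i - 1] + t * (tpr[i] - tpr[i - 1])
    raise ValueError("x is outside the range of fpr")


def reader_below_curve(fpr, tpr, reader_fpr, reader_sensitivity):
    """True if the reader's point lies strictly below the model's ROC curve."""
    return reader_sensitivity < sensitivity_on_curve(fpr, tpr, reader_fpr)


# Hypothetical three-point ROC curve.
example_fpr = [0.0, 0.5, 1.0]
example_tpr = [0.0, 0.8, 1.0]
example_auc = auc_trapezoid(example_fpr, example_tpr)
```

A point below the curve means the model can reach higher sensitivity at the same specificity as that reader, and a point above the curve means the opposite.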

The patient level diagnostic analysis

The patient-level results are illustrated in Fig. 7 as a radar chart, together with a visualization of each degenerative factor. Note that these results are calculated per patient rather than per image as in the results above, and a degeneration correctly detected in any sequence is regarded as a correct diagnosis. Each axis in the radar chart represents the diagnostic accuracy of the corresponding degenerative factor. The overall average accuracy was 87.6%, and the diagnostic accuracy of the individual diseases was in the range of [75.1%, 98.5%].
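One way to read the patient-level rule is sketched below. It is an interpretation, not the authors' implementation: it assumes a patient is predicted positive for a degenerative factor when any of that patient's sequences is predicted positive, and that accuracy is the agreement between this aggregated prediction and the patient-level label. The function names, patient identifiers, and example data are hypothetical.

```python
def patient_level_predictions(sequence_predictions):
    """Aggregate per-sequence predictions: positive if any sequence is positive."""
    return {pid: any(preds) for pid, preds in sequence_predictions.items()}


def patient_level_accuracy(sequence_predictions, patient_labels):
    """Fraction of patients whose aggregated prediction matches the label."""
    predicted = patient_level_predictions(sequence_predictions)
    correct = sum(predicted[pid] == label for pid, label in patient_labels.items())
    return correct / len(patient_labels)


# Hypothetical per-sequence predictions and patient-level gold-standard labels.
example_sequences = {"p1": [False, True], "p2": [False, False], "p3": [True]}
example_labels = {"p1": True, "p2": False, "p3": False}
example_patient_accuracy = patient_level_accuracy(example_sequences, example_labels)
```

Aggregating with `any` tends to raise patient-level sensitivity relative to image-level sensitivity, which is one reason per-patient and per-image accuracies are not directly comparable.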

Fig. 7: The framework of patient level diagnostic analysis and the patient level diagnostic results.

Each axis of the radar chart represents the diagnosis of one factor: developmental cervical stenosis (Pavlov ratio), cervical spine curvature (Cobb angle), vertebral stability (spondylolisthesis angle and distance), sagittal plane balance (SVA), spinal stenosis, left foramen stenosis, right foramen stenosis, intervertebral disc degeneration, and the overall cervical spondylosis diagnosis.

A real diagnosis case of cervical spondylosis is also illustrated in Fig. 8, which shows all the diseases diagnosed as positive. In this case, the detected cervical spine degeneration comprised spinal stenosis at C4-5, instability at C5-6, cervical disc herniation at C3-C6, and spinal canal stenosis at C4-5. To some extent, the C3-C6 cervical disc herniation is a direct cause of the spinal canal stenosis. As the degenerative changes in the patient's cervical spine progress, multiple disks at C3-C6 gradually herniate, reducing the effective space within the spinal canal, thereby forming cervical spinal stenosis and ultimately leading to compression of the cervical spinal cord or nerve roots. At the same time, spinal canal stenosis is also a manifestation of the compression of the spinal cord by the C3-C6 cervical disc herniation, so the two findings corroborate each other.

Fig. 8: A visualized diagnostic report of a real case.

This figure illustrates all conditions diagnosed as positive in the case. In this case, the detected cervical spine degeneration comprised spinal stenosis at C4-5, instability at C5-6, cervical disc herniation at C3-C6, and spinal canal stenosis at C4-5.

The external validation

We also conducted external validation with patients from Beijing Haidian Hospital; the results are illustrated in Fig. 9. The overall average accuracy is 85.8%, which is very close to that at Peking University Third Hospital (87.6%). Compared with the validation within Peking University Third Hospital, the performance of both our model and the clinical doctors was somewhat lower than that in Fig. 6. In particular, the AUC for intervertebral disc degeneration is 0.808, and the AUCs for left and right foramen stenosis were 0.707 and 0.705 (Fig. 9c, d), where the points of the clinical doctor were near the ROC curve of our model, with the clinical doctor slightly outperforming our model.

Fig. 9: External validation results.

a Intervertebral disc degeneration analysis. b Spinal stenosis. c Left foramen stenosis. d Right foramen stenosis. e Overall results. The comparison between our model's predictions and the annotations by different doctors is illustrated, including clinical doctors and junior and senior radiologists; the gold standard is annotated by the experts. In establishing the gold standard, all cases were de-identified by removing personal information, and the experts were blinded to both the assessments made by other doctors and the predictions from the AI model. Furthermore, all cases were presented in a randomized order. The results of our model are illustrated with an ROC curve, and the results of the doctors are illustrated as the corresponding points (best viewed in color). The x-axis is 1 − specificity, and the y-axis is sensitivity. The radar chart includes five evaluation indicators: accuracy, sensitivity, specificity, precision, and F1 score.

Discussion

In spinal surgery, the Cobb angle is used to describe the alignment and curvature of the spine. It can describe angles in both the sagittal and coronal planes, depending on the morphological characteristics of different types of spinal diseases. For cervical spondylosis, the common change in spinal angle is a reduction in the lordotic angle in the sagittal plane, whereas for thoracolumbar scoliosis, the common change is an increase in angle in the coronal plane [23]. Therefore, the ‘cervical Cobb angle’ often refers to the angle between the extension line of the lower endplate of the C2 vertebra and the extension line of the lower endplate of the C7 vertebra in the sagittal plane [24]. If the C7 vertebra is not clearly displayed on the patient’s X-ray, the C6 vertebra can be used instead [25]. Similarly, the Cobb angle has also been applied and studied on cervical sagittal CT [26]. In previous works, Sardjono et al. [27] reported an automatic Cobb angle determination and tested its accuracy in 36 patients, with a mean difference of 3.3°. Wang et al. [28] proposed a model called MVE-Net, which combined anterior-posterior (AP) and lateral (LAT) view X-rays to automatically calculate the Cobb angle; the average absolute errors on the AP and LAT views were 7.81° and 6.26°, respectively. Pan et al. [7] proposed an automated method for detecting the Cobb angle on chest X-rays, with a mean absolute difference of 3.32° between computer-aided and manual methods. In contrast, our model is trained and evaluated on data from a larger number of patients, and the Mean Absolute Difference (MAD) of our method is 2.42°, with a median of 1.84°, both lower than the errors reported in these previous works.

Sagittal Vertical Axis (SVA) is an important parameter for measuring sagittal plane balance. However, research on the automatic detection of SVA values using deep learning methods is currently relatively scarce. Weng et al.29 introduced an automated method based on ResUNet for detecting the SVA value. The results showed a median absolute error for SVA of 1.183 ± 0.166 mm. In comparison, the mean difference of our method is 0.65 mm, with a median of 0.5 mm.

Most existing methods for the automated measurement of Cobb angles and SVA values primarily rely on accuracy metrics, and there is a lack of in-depth exploration of their clinical relevance and implications. Differing from existing methods, we go beyond the automated calculation of SVA and Cobb angles by setting thresholds to provide more refined clinical recommendations. Cobb angle and SVA are both vital indicators for assessing cervical vertebrae curvature. While the Cobb angle assesses cervical spine curvature balance, SVA focuses on sagittal plane balance. According to the research7, there is a higher correlation between SVA (Sagittal Vertical Axis) and the Neck Disability Index (NDI). Our model exhibits errors in SVA measurements that are smaller than those in Cobb angle measurements. Overall, combining Cobb angle and SVA provides a more robust diagnostic tool for assessing cervical vertebral curvature.

Vertebral instability can be detected with the spondylolisthesis angle and distance. In general, diagnosis based on the spondylolisthesis distance is more accurate than diagnosis based on the spondylolisthesis angle, which mirrors the trend observed for the Cobb angle and the SVA value. The results suggest that a deep learning model is more sensitive to distance measurements than to angle measurements in the diagnosis of cervical spondylosis. In contrast to the composite criterion in21 (used in this paper), some works30,31 considered a single criterion, ‘20% of displacement’. In30, Xiao et al. obtained an accuracy near 90%. In our work, if only this single criterion is considered, the diagnostic accuracy with the spondylolisthesis distance is over 99%, which is better than that in30.

The sagittal diameter of the vertebral canal (SDVC) is an important indicator for diagnosing cervical spinal canal stenosis. However, the measurement of the SDVC is influenced by the magnification factor due to the X-ray projection distance, whereas the Pavlov ratio (SDVC/SDVB) is not affected by this factor, because the magnification applies equally to the numerator and the denominator. Therefore, using the Pavlov ratio as an indicator to assess cervical spinal canal stenosis is more reasonable. Furthermore, research conducted by Suk et al.32 has shown a high correlation between the Pavlov ratio and cervical spinal canal stenosis. Hence, our model’s ability to automatically calculate the Pavlov ratio to aid in assessing cervical spinal canal stenosis is meaningful.

It is apparent that for the Pavlov ratio, our model exhibits lower errors, indicating higher reliability in our results. Existing literature on artificial intelligence-based automatic detection of Pavlov values is limited, and our findings are pioneering in this regard.

As widely recognized, cervical intervertebral disc degeneration represents one of the most prevalent cervical spine disorders in clinical medicine, with sagittal MRI playing a crucial role in its diagnosis. We used a deep learning-based model for the classification of intervertebral disc degeneration. Experimental results demonstrate that our model performs much better than two junior radiologists, surpasses the performance of senior radiologists, and serves as an effective diagnostic aid for medical professionals. However, due to various factors such as limited datasets, the majority of published articles primarily focus on auxiliary diagnosis of lumbar intervertebral discs33,34,35,36, with limited research dedicated to cervical intervertebral discs. For instance, Oktay et al.33 conducted intervertebral disc segmentation and then employed a Support Vector Machine (SVM) to classify lumbar intervertebral discs into normal and abnormal categories. Mbarki et al.34 utilized a CNN-based model for feature extraction in lumbar intervertebral disc degeneration and subsequently employed a VGG-based model for classification. The work35 combined information from sagittal and axial MRI images of the lumbar spine and classified it into appropriate classes (healthy, bulge, central, right or left herniation for axial view, and healthy, L4/L5, L5/S1 level of herniation in sagittal view) using a convolutional neural network (CNN). Adibatti et al.36 similarly focused on locating, segmenting, and classifying lumbar intervertebral discs with deep learning models. All of these lumbar intervertebral disc methods depend on segmentation results. Classifying intervertebral discs from segments that tightly enclose the lesion can reduce the difficulty of the task; however, it also requires external models and additional labor for model training and segment annotation. In contrast, our method does not require external segmentation.

Methods

This study was approved by the Peking University Third Hospital Medical Science Research Ethics Committee, which waived the requirement for informed consent. This waiver was granted as our study was a retrospective analysis of pre-existing imaging data. The research is strictly non-commercial and uses fully anonymized data with all personal identifiers permanently removed. The study was performed in accordance with national and international guidelines. The proposed method adopts a cascade and ensemble architecture, in which a detection module and a diagnostic module are cascaded (see Fig. 7). Different degenerative indicators are integrated in an ensemble module and contribute separate loss terms during training of the deep learning model. This section introduces the proposed framework in two parts: detection, and degeneration diagnosis in multi-task learning.

Detection

The input to the vertebral body detection model is the image \(I\in {{\mathbb{R}}}^{C\times W\times H}\), where C, W, and H denote the number of channels, width, and height of the input image. The output is the locations of K keypoints, where K = 24 (i.e., the four corner points of each vertebral body from C2 to C7), denoted as \({\left\{{p}_{k}\right\}}_{k=1}^{K},{p}_{k}=\left({x}_{k},{y}_{k}\right)\). Since the input radiographic image is a grayscale image, the number of channels C is 1.

Previous keypoint detection models37 usually follow a similar pipeline: they first extract the image features for the k-th keypoint and then predict a heatmap \({\widehat{{{{\rm{H}}}}}}_{k}\in {{{{\bf{R}}}}}^{w\times h}\) for the k-th keypoint, where w × h denotes the size of the output heatmap. The pixel position pk of the k-th keypoint is then obtained by argmax, and finally the keypoints of different instances are grouped.

The ground truth pixel position is usually converted into a heatmap, denoted as Hk, which is generated by a 2D Gaussian function with a standard deviation of 1. The loss for heatmap supervision is computed using the Mean Squared Error (MSE) between the ground-truth heatmap H and the predicted one \(\widehat{H}\). The calculation is performed as follows:

$$MSE(H,\widehat{H})=\frac{1}{{{{\bf{N}}}}}\,{\sum }_{i=1}^{{{{\bf{N}}}}}{({h}_{i}-\widehat{{h}_{i}})}^{2}$$
(1)

where N is the number of points in the heatmap, hi and \({\widehat{h}}_{i}\) represent the ground-truth probability and the predicted probability of the i-th pixel.

The loss function of heatmap supervision L can be formulated as:

$$L={L}_{heatmap}=MSE(H,\widehat{H})$$
(2)
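The heatmap supervision in Eqs. (1) and (2) can be illustrated with a minimal NumPy sketch. The function names, the 64 × 48 heatmap size, and the keypoint positions are illustrative assumptions and are not taken from the authors' code; only the Gaussian with a standard deviation of 1 and the per-pixel MSE follow the text.

```python
import numpy as np

def gaussian_heatmap(x, y, width, height, sigma=1.0):
    """Render a 2D Gaussian centred on the keypoint (x, y), in pixel units."""
    cols = np.arange(width)[None, :]    # x index of every column
    rows = np.arange(height)[:, None]   # y index of every row
    return np.exp(-((cols - x) ** 2 + (rows - y) ** 2) / (2.0 * sigma ** 2))

def mse(h_true, h_pred):
    """Eq. (1): mean of the squared per-pixel differences."""
    return float(np.mean((h_true - h_pred) ** 2))

# Hypothetical example: the prediction is off by one pixel along x.
H = gaussian_heatmap(x=10, y=20, width=64, height=48)
H_hat = gaussian_heatmap(x=11, y=20, width=64, height=48)
loss = mse(H, H_hat)
```

A perfect prediction gives a loss of exactly zero, and any displacement of the predicted peak gives a strictly positive loss.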

Heatmaps are typically generated at lower resolutions through down-sampling to balance computational efficiency and receptive field coverage, which leads to localization errors. For medical images in particular, where most targets are relatively small, minimizing localization errors is critical. Our keypoint detection module therefore generates the heatmap on the highest-resolution feature map, inspired by HigherHRNet38, which is built on HRNet39,40. The high-resolution output obtained by deconvolution upsampling is added to form a high-resolution feature pyramid that learns a multi-scale perceptual representation. The heatmap loss \({L}_{heatmap}^{{\prime} }\) contains multi-resolution supervision and can be expressed as:

$${L}_{heatmap}^{{\prime} }={\sum }_{n=1}^{N}{{{\rm{MSE}}}}\left({H}_{nk},{\widehat{H}}_{nk}\right)$$
(3)

where N is the number of layers of the feature pyramid (not the pixel count N of Eq. (1)), while Hnk and \({\widehat{H}}_{nk}\) denote the ground-truth and predicted heatmaps of the k-th keypoint at the n-th layer.
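As a rough illustration of Eq. (3), the sketch below sums one MSE term per pyramid level. The two levels, their 128 × 128 and 256 × 256 sizes, and the random targets are hypothetical placeholders chosen only to make the example runnable.

```python
import numpy as np

def multi_resolution_heatmap_loss(targets, predictions):
    """Eq. (3): sum the per-level MSE over the pyramid levels."""
    return float(sum(np.mean((t - p) ** 2) for t, p in zip(targets, predictions)))

rng = np.random.default_rng(0)
# Two hypothetical levels, each holding 24 keypoint channels.
targets = [rng.random((24, 128, 128)), rng.random((24, 256, 256))]
# Every predicted pixel is off by 0.1, so each level contributes 0.01.
predictions = [t + 0.1 for t in targets]
loss = multi_resolution_heatmap_loss(targets, predictions)
```

Because the terms are summed rather than averaged, each additional level adds its own supervision signal to the total.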

Degeneration diagnosis in multi-task learning

In this section, we propose a Multi-task Learning Mechanism (MLM) module, which combines the metric measurement task with the keypoint detection task in an attempt to improve the overall performance without adding additional annotations. In our multi-task learning framework, we employ the Differentiable Spatial to Numerical Transform (DSNT)41 to convert heatmaps into coordinate representations through a differentiable spatial expectation operation, enabling end-to-end backpropagation of errors from multiple clinical indicators to the keypoints detection module. Given the predicted heatmap \(\widehat{H}\in {{\mathbb{R}}}^{h\times w}\), we apply the softmax function for normalization to ensure the results conform to a probability distribution:

$${\widehat{Z}}_{i,j}=\frac{exp({\widehat{H}}_{i,j})}{{\Sigma }_{k=1}^{h}{\Sigma }_{l=1}^{w}exp({\widehat{H}}_{k,l})}$$
(4)

We then construct two h × w matrices X and Y, where \({X}_{ij}=\frac{2j-(w+1)}{w}\) and \({Y}_{ij}=\frac{2i-(h+1)}{h}\). The matrices X and Y encode normalized coordinate values, where each element contains the x- (for X) or y-coordinate (for Y) respectively, such that the top-left corner of the image is at (−1,−1) and the bottom-right is at (1,1). The predicted coordinates \((\widehat{x},\widehat{y})\) are derived through the Frobenius inner product between the normalized heatmap \(\widehat{Z}\) and coordinate matrices X, Y.

$$(\widehat{x},\widehat{y})=DSNT(\widehat{Z})=[{\langle \widehat{Z},X\rangle }_{F},\,{\langle \widehat{Z},Y\rangle }_{F}]$$
(5)

where \({\langle \cdot ,\cdot \rangle }_{F}\) denotes the Frobenius inner product, which is equivalent to the scalar dot product of vectorized matrices. Essentially, this operation computes a weighted average of the coordinate grid with respect to the heatmap’s probability distribution, yielding the mathematical expectation coordinates under this distribution. Since both the softmax and expectation operations are differentiable, gradients can be backpropagated seamlessly.
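The DSNT step can be sketched in NumPy as follows. The sketch follows the softmax normalization of Eq. (4) and the grid definitions of X and Y given above; subtracting the maximum before exponentiation and the helper `to_pixels` are additions made here for numerical stability and readability, and the 32 × 48 heatmap is a hypothetical example.

```python
import numpy as np

def dsnt(heatmap):
    """Softmax-normalise an h x w heatmap and return its expected (x, y)."""
    h, w = heatmap.shape
    z = np.exp(heatmap - heatmap.max())   # max subtraction: stability only (assumption)
    z /= z.sum()
    j = np.arange(1, w + 1)               # 1-based column index
    i = np.arange(1, h + 1)               # 1-based row index
    X = np.tile((2 * j - (w + 1)) / w, (h, 1))              # X_ij depends on j only
    Y = np.tile(((2 * i - (h + 1)) / h)[:, None], (1, w))   # Y_ij depends on i only
    # Frobenius inner products <Z, X> and <Z, Y>
    return float((z * X).sum()), float((z * Y).sum())

def to_pixels(x_norm, y_norm, w, h):
    """Invert the grid definition to recover 0-based pixel coordinates."""
    return (x_norm * w + (w + 1)) / 2 - 1, (y_norm * h + (h + 1)) / 2 - 1

# Hypothetical heatmap with a sharp peak at row 5, column 12.
heatmap = np.zeros((32, 48))
heatmap[5, 12] = 50.0
x_norm, y_norm = dsnt(heatmap)
px, py = to_pixels(x_norm, y_norm, w=48, h=32)
```

For a sharply peaked heatmap the expectation coincides with the argmax location, while a flat heatmap maps to the image centre (0, 0) in normalized coordinates.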

Although DSNT can replace the heatmap output, in order to preserve the multi-resolution heatmap of HigherHRNet, we use the predicted coordinates generated by DSNT for the metric measurement task only. The final loss function LOurs contains two parts: loss Lmultitask for the DSNT-based metric measurement task and loss \({L}_{heatmap}^{{\prime} }\) for the multi-resolution heatmap keypoint detection task, represented as:

$${L}_{Ours}={L}_{multi-task}+{L}_{heatmap}^{{\prime} }={\sum }_{m=1}^{M}{\lambda }_{m}{{{\rm{MSE}}}}\left({{{{\rm{Q}}}}}_{m},{\widehat{{{{\rm{Q}}}}}}_{m}\right)+{\sum }_{n=1}^{N}{{{\rm{MSE}}}}\left({H}_{nk},{\widehat{H}}_{nk}\right)$$
(6)
$${{{\rm{Q}}}}=[{{{{\bf{Q}}}}}_{CA},{{{{\bf{Q}}}}}_{SD},{{{{\bf{Q}}}}}_{SA},{{{{\bf{Q}}}}}_{SDVB},{{{{\bf{Q}}}}}_{SDVC}]$$
(7)

Here, Qm is computed from the coordinates \({{{{\bf{P}}}}}^{{\prime} }\) obtained by DSNT mapping, λm denotes the weighting parameter of the m-th metric measurement task, and M denotes the number of metric measurement tasks. \(\widehat{{{{{\rm{Q}}}}}_{m}}\) denotes the result of the m-th metric measurement computed with the ground-truth coordinates. The following explains how to obtain the metric measurement Qm from the coordinate matrix \({{{{\bf{P}}}}}^{{\prime} }\):

$${{{{\bf{P}}}}}^{{\prime} }={\left[\begin{array}{cccccccc}{x}_{1} & {y}_{1} & {x}_{2} & {y}_{2} & {x}_{3} & {y}_{3} & {x}_{4} & {y}_{4}\\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ {x}_{21} & {y}_{21} & {x}_{22} & {y}_{22} & {x}_{23} & {y}_{23} & {x}_{24} & {y}_{24}\end{array}\right]}_{6\times 8}$$
(8)

Points p1 to p4 represent the antero-superior, antero-inferior, postero-superior, and postero-inferior corner points of the C2 vertebra, and so on for each subsequent vertebra down to C7 (p21 to p24).
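The row and column layout of Eq. (8) can be made concrete with a small lookup helper. This is a sketch under the assumption that row r holds the four corners of the (r + 1)-th vertebra from C2; the function name `corner` and the self-describing test matrix are hypothetical.

```python
import numpy as np

def corner(P, k):
    """Return (x_k, y_k) for the 1-based keypoint index k from the 6 x 8 matrix."""
    row, slot = divmod(k - 1, 4)   # row 0..5 is C2..C7, slot 0..3 is the corner
    return P[row, 2 * slot], P[row, 2 * slot + 1]

# Hypothetical matrix whose entries encode their own index: x_k = 10k, y_k = 10k + 1.
P = np.array(
    [[10 * k + axis for k in range(4 * r + 1, 4 * r + 5) for axis in (0, 1)]
     for r in range(6)],
    dtype=float,
)
x22, y22 = corner(P, 22)   # antero-inferior corner of C7, stored at P[5, 2] and P[5, 3]
```

With this layout, keypoint p22 resolves to row 5 and columns 2 and 3, which matches the indices used for the C7 lower endplate in Eq. (10).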

Assessing cervical curvature relies on two indicators, the C2-7 Cobb angle and the C2-7 SVA. Specifically, the corner points of the lower edges of the C2 and C7 vertebrae are used to calculate the Cobb angle. These correspond to the following values in the coordinate matrix \({{{{\bf{P}}}}}^{{\prime} }\):

$${p}_{2x}=\left({{{{\bf{P}}}}}_{[0,2]}^{{\prime} }\right),{p}_{2y}=\left({{{{\bf{P}}}}}_{[0,3]}^{{\prime} }\right),{p}_{4x}=\left({{{{\bf{P}}}}}_{[0,6]}^{{\prime} }\right),{p}_{4y}=\left({{{{\bf{P}}}}}_{[0,7]}^{{\prime} }\right)$$
(9)
$${p}_{22x}=\left({{{{\bf{P}}}}}_{[5,2]}^{{\prime} }\right),{p}_{22y}=\left({{{{\bf{P}}}}}_{[5,3]}^{{\prime} }\right),{p}_{24x}=\left({{{{\bf{P}}}}}_{[5,6]}^{{\prime} }\right),{p}_{24y}=\left({{{{\bf{P}}}}}_{[5,7]}^{{\prime} }\right)$$
(10)

The detailed formula of QCA is:

$$k1=\left({p}_{2y}-{p}_{4y}\right)/\left({p}_{2x}-{p}_{4x}+\delta \right),\,k2=\left({p}_{22y}-{p}_{24y}\right)/\left({p}_{22x}-{p}_{24x}+\delta \right),\,\delta=1e-9$$
(11)
$${{{{\bf{Q}}}}}_{CA}=\arctan ((k1-k2)/(1+k1\times k2))\times \left(18{0}^{\circ }/\pi \right)$$
(12)
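A minimal NumPy sketch of the Cobb angle computation from the corner coordinates is given here. It assumes P is the 6 × 8 matrix of Eq. (8) and combines the two endplate slopes with the standard tangent-difference identity, tan(a − b) = (tan a − tan b)/(1 + tan a tan b). The function names and the toy coordinates are illustrative and are not taken from the authors' code.

```python
import numpy as np

DELTA = 1e-9  # same guard against a zero denominator as in Eq. (11)

def slope(pa, pb):
    """Slope of the line through two (x, y) points."""
    return (pa[1] - pb[1]) / (pa[0] - pb[0] + DELTA)

def angle_between_deg(k1, k2):
    """Signed angle between two lines with slopes k1 and k2, in degrees."""
    return float(np.degrees(np.arctan((k1 - k2) / (1.0 + k1 * k2))))

def cobb_angle(P):
    p2, p4 = (P[0, 2], P[0, 3]), (P[0, 6], P[0, 7])     # lower endplate of C2
    p22, p24 = (P[5, 2], P[5, 3]), (P[5, 6], P[5, 7])   # lower endplate of C7
    return angle_between_deg(slope(p2, p4), slope(p22, p24))

# Hypothetical case: horizontal C2 endplate, C7 endplate tilted by 20 degrees.
P = np.zeros((6, 8))
P[0, 2:4] = (0.0, 10.0)
P[0, 6:8] = (10.0, 10.0)
P[5, 2:4] = (0.0, 100.0)
P[5, 6:8] = (10.0, 100.0 + 10.0 * np.tan(np.radians(20.0)))
angle = cobb_angle(P)
```

The sign of the result depends on the image coordinate convention (y usually grows downward), so the magnitude is the quantity to compare against a clinical threshold.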

Similarly, the SVA calculation depends on the coordinates of the four corners of C2, as well as the postero-superior corner p23 of C7:

$${p}_{1x}=\left({{{{\bf{P}}}}}_{[0,0]}^{{\prime} }\right),{p}_{1y}=\left({{{{\bf{P}}}}}_{[0,1]}^{{\prime} }\right),{p}_{2x}=\left({{{{\bf{P}}}}}_{[0,2]}^{{\prime} }\right),{p}_{2y}=\left({{{{\bf{P}}}}}_{[0,3]}^{{\prime} }\right)$$
(13)
$${p}_{3x}=\left({{{{\bf{P}}}}}_{[0,4]}^{{\prime} }\right),{p}_{3y}=\left({{{{\bf{P}}}}}_{[0,5]}^{{\prime} }\right),{p}_{4x}=\left({{{{\bf{P}}}}}_{[0,6]}^{{\prime} }\right),{p}_{4y}=\left({{{{\bf{P}}}}}_{[0,7]}^{{\prime} }\right)$$
(14)
$${p}_{23x}=\left({{{{\bf{P}}}}}_{[5,4]}^{{\prime} }\right)$$
(15)

\({p}^{{\prime} }\) is the coordinates of the diagonal intersection of the C2 vertebrae, and the details of the calculation of the QSVA are as follows:

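$${p}_{x}^{{\prime} }=\frac{\left({p}_{1x}{p}_{4y}-{p}_{1y}{p}_{4x}\right)\left({p}_{2x}-{p}_{3x}\right)-\left({p}_{1x}-{p}_{4x}\right)\left({p}_{2x}{p}_{3y}-{p}_{2y}{p}_{3x}\right)}{\left({p}_{1x}-{p}_{4x}\right)\left({p}_{2y}-{p}_{3y}\right)-\left({p}_{1y}-{p}_{4y}\right)\left({p}_{2x}-{p}_{3x}\right)}$$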
(16)
$${{{{\bf{Q}}}}}_{SVA}=\left|{p}_{x}^{{\prime} }-{p}_{23x}\right|$$
(17)
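The SVA measurement can be sketched as follows. The sketch computes the C2 centre with the standard two-line intersection formula applied to the diagonals p1-p4 and p2-p3, then takes the horizontal offset to p23. It assumes P is the 6 × 8 matrix of Eq. (8); the function names and the toy coordinates are hypothetical, and the result is in pixel units.

```python
import numpy as np

def line_intersection(a, b, c, d):
    """Intersection of line a-b with line c-d; each point is an (x, y) pair."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = a, b, c, d
    den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)   # zero if lines are parallel
    det_ab = x1 * y2 - y1 * x2
    det_cd = x3 * y4 - y3 * x4
    px = (det_ab * (x3 - x4) - (x1 - x2) * det_cd) / den
    py = (det_ab * (y3 - y4) - (y1 - y2) * det_cd) / den
    return px, py

def sva(P):
    p1, p2 = P[0, 0:2], P[0, 2:4]   # antero-superior, antero-inferior corners of C2
    p3, p4 = P[0, 4:6], P[0, 6:8]   # postero-superior, postero-inferior corners of C2
    centre_x, _ = line_intersection(p1, p4, p2, p3)   # the two diagonals of C2
    return float(abs(centre_x - P[5, 4]))             # P[5, 4] is p23x

# Hypothetical case: C2 is a unit square centred at x = 1.5, p23 lies at x = 4.
P = np.zeros((6, 8))
P[0] = [1, 1, 1, 2, 2, 1, 2, 2]
P[5, 4] = 4.0
value = sva(P)
```

Converting the pixel offset to millimetres would additionally require the pixel spacing of the radiograph, which is not modelled in this sketch.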

Vertebral instability can be assessed using the spondylolisthesis angle and spondylolisthesis distance. Both calculations depend on the coordinates of the posterior corners of the adjoining vertebrae, as in the example of C5-6:

$${p}_{15x}=\left({{{{\bf{P}}}}}_{[3,4]}^{{\prime} }\right),{p}_{15y}=\left({{{{\bf{P}}}}}_{[3,5]}^{{\prime} }\right),{p}_{16x}=\left({{{{\bf{P}}}}}_{[3,6]}^{{\prime} }\right),{p}_{16y}=\left({{{{\bf{P}}}}}_{[3,7]}^{{\prime} }\right)$$
(18)
$${p}_{19x}=\left({{{{\bf{P}}}}}_{[4,4]}^{{\prime} }\right),{p}_{19y}=\left({{{{\bf{P}}}}}_{[4,5]}^{{\prime} }\right),{p}_{20x}=\left({{{{\bf{P}}}}}_{[4,6]}^{{\prime} }\right),{p}_{20y}=\left({{{{\bf{P}}}}}_{[4,7]}^{{\prime} }\right)$$
(19)

The value QSD of the spondylolisthesis distance is given by:

$$A={p}_{16y}-{p}_{15y},B={p}_{15x}-{p}_{16x},C=\left({p}_{15y}-{p}_{16y}\right)\times {p}_{15x}+\left({p}_{16x}-{p}_{15x}\right)\times {p}_{15y}$$
(20)
$${{{{\bf{Q}}}}}_{SD}=\left|A\times {p}_{20x}+B\times {p}_{20y}+C\right|/\sqrt{{A}^{2}+{B}^{2}}$$
(21)
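The spondylolisthesis distance is a point-to-line distance: the line Ax + By + C = 0 passes through the two posterior corners of the upper vertebra, and the distance is measured to the postero-inferior corner of the lower vertebra. A minimal sketch, with a hypothetical function name and toy coordinates:

```python
import numpy as np

def point_line_distance(q, a, b):
    """Perpendicular distance from point q to the line through a and b."""
    A = b[1] - a[1]
    B = a[0] - b[0]
    C = b[0] * a[1] - a[0] * b[1]   # chosen so that a and b satisfy Ax + By + C = 0
    return float(abs(A * q[0] + B * q[1] + C) / np.hypot(A, B))

# Hypothetical case: the posterior wall of C5 is the vertical line x = 50,
# and the postero-inferior corner of C6 has slipped 3 pixels backwards.
p15, p16 = (50.0, 100.0), (50.0, 120.0)
p20 = (53.0, 150.0)
q_sd = point_line_distance(p20, p15, p16)
```

With a = p15 and b = p16 the coefficients A and B equal those of Eq. (20), and C reduces to p16x × p15y − p15x × p16y.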

The value QSA of spondylolisthesis angle is given by:

$${k}_{1}=\left({p}_{15y}-{p}_{16y}\right)/\left({p}_{15x}-{p}_{16x}+\delta \right),{k}_{2}=\left({p}_{19y}-{p}_{20y}\right)/\left({p}_{19x}-{p}_{20x}+\delta \right),\delta=1e-9$$
(22)
$${{{{\bf{Q}}}}}_{SA}=\arctan (({k}_{1}-{k}_{2})/(1+{k}_{1}\times {k}_{2}))\times \left(18{0}^{\circ }/\pi \right)$$
(23)

The assessment of spinal stenosis relies on the Pavlov value calculated as:

$${{{\bf{Pavlov}}}}={{{{\bf{Q}}}}}_{SDVC}/{{{{\bf{Q}}}}}_{SDVB}$$
(24)

SDVB is the length of the line segment joining the anterior and posterior mid-points of the vertebra, using the C3 vertebra as an example:

$${p}_{5x}=\left({{{{\bf{P}}}}}_{[1,0]}^{{\prime} }\right),{p}_{5y}=\left({{{{\bf{P}}}}}_{[1,1]}^{{\prime} }\right),{p}_{6x}=\left({{{{\bf{P}}}}}_{[1,2]}^{{\prime} }\right),{p}_{6y}=\left({{{{\bf{P}}}}}_{[1,3]}^{{\prime} }\right)$$
(25)
$${p}_{7x}=\left({{{{\bf{P}}}}}_{[1,4]}^{{\prime} }\right),{p}_{7y}=\left({{{{\bf{P}}}}}_{[1,5]}^{{\prime} }\right),{p}_{8x}=\left({{{{\bf{P}}}}}_{[1,6]}^{{\prime} }\right),{p}_{8y}=\left({{{{\bf{P}}}}}_{[1,7]}^{{\prime} }\right)$$
(26)

The QSDVB of C3 was calculated as:

$${p}_{anterio{r}_{x}}=\left({p}_{5x}+{p}_{6x}\right)/2,\,{p}_{anterio{r}_{y}}=\left({p}_{5y}+{p}_{6y}\right)/2$$
(27)
$${p}_{posterio{r}_{x}}=\left({p}_{7x}+{p}_{8x}\right)/2,\,{p}_{posterio{r}_{y}}=\left({p}_{7y}+{p}_{8y}\right)/2$$
(28)
$${{{{\bf{Q}}}}}_{SDVB}=distance\left({p}_{anterior},\,{p}_{posterior}\right)=\sqrt{{\left({p}_{anterio{r}_{x}}-{p}_{posterio{r}_{x}}\right)}^{2}+{\left({p}_{anterio{r}_{y}}-{p}_{posterio{r}_{y}}\right)}^{2}}$$
(29)

QSDVC is the shortest distance from the posterior midpoint of the vertebra to the anterior edge of the cervical spinous process, which is represented by the point set M obtained from the segmentation model. Using C3 as an example, it can be expressed as:

$${{{{\bf{Q}}}}}_{SDVC}={\min }_{p\in {{{\bf{M}}}}}\,distance\left({p}_{posterior},\,p\right)$$
(30)
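The Pavlov ratio for a single vertebra can be sketched as follows. The sketch assumes that the four corners are ordered as in Eq. (8) and that M has already been reduced to an n × 2 array of (x, y) boundary points; how the segmentation mask is converted to such points is not specified in the text, and the function name and toy values are hypothetical.

```python
import numpy as np

def pavlov_ratio(corners, canal_points):
    """corners: 4 x 2 array ordered antero-superior, antero-inferior,
    postero-superior, postero-inferior. canal_points: n x 2 array of (x, y)
    points standing in for the set M."""
    corners = np.asarray(corners, dtype=float)
    canal_points = np.asarray(canal_points, dtype=float)
    anterior = corners[:2].mean(axis=0)      # Eq. (27): anterior midpoint
    posterior = corners[2:].mean(axis=0)     # Eq. (28): posterior midpoint
    sdvb = np.linalg.norm(anterior - posterior)                     # Eq. (29)
    sdvc = np.linalg.norm(canal_points - posterior, axis=1).min()   # Eq. (30)
    return float(sdvc / sdvb)

# Hypothetical C3: a 20-pixel-deep vertebral body and a canal boundary whose
# closest point lies 16 pixels behind the posterior midpoint.
c3 = [[0, 0], [0, 10], [20, 0], [20, 10]]
boundary = [[36, 5], [40, 0], [38, 12]]
ratio = pavlov_ratio(c3, boundary)
```

Because both diameters are measured on the same image, a uniform magnification scales them equally and leaves the ratio unchanged.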

All of the above metrics Q = [QCA, QSD, QSA, QSDVB, QSDVC] jointly participate in multi-task training and optimization for X-rays. Our multi-task framework is implemented as an extension of the Faster R-CNN model42. In the Faster R-CNN model, the two tasks of classification and localization are jointly learned as two branches after several shared CNN layers. In our implementation (see Supplementary Fig. 1 in the Supplementary Information), each metric in Q is implemented as a branch as follows:

$${L}_{multi-task}={\sum }_{m=1}^{M}{\lambda }_{m}{{{\rm{MSE}}}}\left({{{{\rm{Q}}}}}_{m},{\widehat{{{{\rm{Q}}}}}}_{m}\right)$$
(31)
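Eq. (31) is a weighted sum of per-metric MSE terms, which can be sketched as below. The dictionary layout, the metric values, and the weights λm are hypothetical, since the text does not report the weights that were used.

```python
import numpy as np

def multi_task_loss(predicted, target, weights):
    """Eq. (31): sum over metrics of lambda_m times the MSE of that metric."""
    total = 0.0
    for name, lam in weights.items():
        diff = np.asarray(predicted[name], dtype=float) - np.asarray(target[name], dtype=float)
        total += lam * float(np.mean(diff ** 2))
    return total

# Hypothetical batch of two images with two of the metrics in Q.
predicted = {"CA": [12.0, 8.0], "SD": [1.5, 0.5]}
target = {"CA": [10.0, 8.0], "SD": [1.0, 0.5]}
weights = {"CA": 1.0, "SD": 2.0}   # assumed values of lambda_m
loss = multi_task_loss(predicted, target, weights)
```

The weights let metrics with different units and ranges, such as degrees for angles and pixels for distances, contribute comparably to the gradient.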

The implementation for processing X-ray images is divided into two stages. The first detection stage detects the coordinates of cervical vertebral key points, while the second prediction stage predicts multiple degenerative indicators. Our detection module extracts the feature map from the input images and then generates the heatmap on the highest resolution feature map, inspired by the HRNet39,40 and HigherHRNet38. The detection module is a parallel architecture that consists of several branches, where one branch focuses on high-resolution features, while others progressively downsample the input to create low-resolution representations (see Supplementary Fig. 1 in the Supplementary Information). The high-resolution branch consists of several residual blocks that utilize 3 × 3 convolutions, batch normalization, and ReLU activation. Alongside the high-resolution branch, there are low-resolution branches at different scales (1/2, 1/4 the original resolution), and the downsampling is implemented by the strided convolutions. The architecture also employs a fusion mechanism to combine high-resolution and low-resolution feature maps, using upsampling and convolution operations.

After obtaining the feature map through the above backbone network, we use a deconvolution upsampling operation and several residual convolutional layers to obtain the final high-resolution heatmaps. We then use the Differentiable Spatial to Numerical Transform (DSNT)41 to get the mapped pixel coordinates from the normalized heatmap prediction, and group the keypoints of different instances. Finally, the predicted values of the various degenerative indicators are calculated by the linear layers based on the coordinates obtained by the DSNT mapping.

The diagnosis of intervertebral disc degeneration on MRI could also be implemented as a branch after several shared CNN layers. However, considering the visual differences between X-ray and MRI images, we separately train a different CNN model for MRI images (see Supplementary Fig. 2 in the Supplementary Information).

Evaluation metrics

To verify the capability of our model in diagnosing cervical spondylosis, we evaluate our method from two aspects: measurement error and recognition performance. The measurement error is the Mean Absolute Error (MAE) of angles and distances, calculated between the predictions of our model and the gold standard annotated by the experts. Recognition performance is evaluated through accuracy, precision, sensitivity, specificity and F1 score.
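For reference, the listed metrics can be computed from binary labels as in the sketch below. These are the standard textbook definitions; the function names and the toy labels are illustrative, and the sketch does not guard against empty classes (zero denominators).

```python
import numpy as np

def classification_metrics(y_true, y_pred):
    """Standard binary metrics from ground-truth and predicted labels (1 = positive)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = int(np.sum(y_true & y_pred))
    tn = int(np.sum(~y_true & ~y_pred))
    fp = int(np.sum(~y_true & y_pred))
    fn = int(np.sum(y_true & ~y_pred))
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }

def mean_absolute_error(pred, gold):
    """MAE between predicted and gold-standard measurements."""
    return float(np.mean(np.abs(np.asarray(pred, dtype=float) - np.asarray(gold, dtype=float))))

# Hypothetical labels: 3 true positives, 3 true negatives, 1 false positive, 1 false negative.
metrics = classification_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1])
mae = mean_absolute_error([10.0, 12.5], [11.0, 12.0])
```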

Implementation details

The width of the original X-ray images ranges from 908 to 2973 pixels, and the height ranges from 1125 to 3032 pixels. With reference to43,44, we resized all images uniformly to 512 × 512 pixels and performed normalization operations to standardize input distributions. Regarding the hyperparameter configuration for model training, we set the batch size to 32. The model was initialized with the pre-trained HigherHRNet38 weights provided by MMPose45, and trained for 300 epochs using the Adam optimizer with a fixed learning rate of 1.5 × 10−3.

Statistics and reproducibility

This retrospective study was conducted on a cohort of 1151 patients. The dataset was randomly partitioned into training (n = 693), validation (n = 228), and testing (n = 230) sets. No statistical method was used to predetermine the sample size, and no data were excluded following initial eligibility screening. The reproducibility of the analysis is facilitated by the publicly archived code.
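A random partition with the reported set sizes could be produced along the following lines. This is only a sketch: the seed, the function name, and the use of patient indices are assumptions, and the authors' actual splitting procedure is defined by their archived code.

```python
import numpy as np

def split_indices(n_total, n_train, n_val, seed=0):
    """Shuffle patient indices once and cut them into three disjoint sets."""
    order = np.random.default_rng(seed).permutation(n_total)
    return order[:n_train], order[n_train:n_train + n_val], order[n_train + n_val:]

# 1151 patients in total; the remaining 230 form the test set.
train, val, test = split_indices(1151, 693, 228)
```

Splitting by patient rather than by image keeps all images of one patient in the same set, which avoids leakage between training and testing.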

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.