Automated diagnostic of cervical spondylosis on multimodal medical images with a multi-task deep learning model

Song, Xinhang; Li, Yuan; Ouyang, Hanqiang; Yao, Meiyi; Zhao, Fangbo; Yu, Kecen; Tian, Shuai; Xing, Xiaoying; Zhou, Yan; Zhang, Enlong; Ni, Ming; Yuan, Yuan; Li, Dasheng; Lang, Ning; Jiang, Liang; Yuan, Huishu; Jiang, Shuqiang

doi:10.1038/s41467-026-69023-w

Article
Open access
Published: 05 February 2026

Xinhang Song ORCID: orcid.org/0000-0002-0895-1076¹^na1,
Yuan Li²^na1,
Hanqiang Ouyang^3,4,5^na1,
Meiyi Yao^1,6,
Fangbo Zhao^1,6,
Kecen Yu^1,6,
Shuai Tian²,
Xiaoying Xing²,
Yan Zhou²,
Enlong Zhang²,
Ming Ni²,
Yuan Yuan²,
Dasheng Li⁷,
Ning Lang^2,8,
Liang Jiang^3,4,5,
Huishu Yuan² &
…
Shuqiang Jiang ORCID: orcid.org/0000-0002-1596-4326^1,6

Nature Communications volume 17, Article number: 2392 (2026)

Abstract

Cervical spondylosis is one of the most common degenerative diseases, seriously affecting life quality. Unlike diseases with explicit lesions like cancer, hydroncus, or fracture, the degeneration of the cervical spine cannot be explicitly detected from the appearance of medical images, requiring extensive experience of doctors to interpret subtle clues. However, the extremely high incidence of cervical spondylosis coincides with a serious shortage of experienced doctors and uneven distribution of medical resources, hindering early diagnosis. We propose a cascade-ensemble deep learning framework for cervical spondylosis diagnosis. The framework integrates vertebral body detection and degenerative diagnosis through a cascading architecture, and jointly trains an ensemble of degenerative indicators in a multi-task learning manner. We demonstrate that deep learning models are more sensitive to distance and position based indicators than angle based ones. In intervertebral stenosis analysis, our method achieves comparable performance to senior radiologists and clinicians, with much faster diagnostic speed.

Introduction

Cervical spondylosis is a harmful disease that can occur in any population and seriously affects health and quality of life. Its diagnosis and treatment usually require a combination of imaging and clinical symptoms. It is a degenerative disease in which degeneration of different parts of the cervical spine causes clinical manifestations such as neck pain, stiffness, headaches, numbness, tingling, and weakness in the arms or legs. In the World Health Organization (WHO)’s “Global Top Ten Persistent Diseases”, cervical spondylosis ranks second. Globally, as many as 900 million people have cervical spondylosis. In one survey of 3900 selected participants, the prevalence of cervical spondylosis was 13.76%¹, and it differed significantly among the urban, suburban, and rural populations (13.07%, 15.97%, and 12.25%, respectively). With changes in lifestyle and working environments, cervical spondylosis is no longer confined to middle-aged and elderly people. The East Asia and Andean Latin America regions had the highest and lowest age-standardized incidence rates, with 1029.0 (910.5 to 1166.1) and 624.0 (550.3 to 708.3) per 100,000 population in 2017, respectively².

Unlike diseases with explicit lesions, such as cancer, hydroncus, or fracture, cervical spondylosis is a chronic, primarily non-traumatic, and progressive condition that is usually diagnosed in clinical practice by combining image appearance with professional metric measurements. Depending on the affected structures and clinical manifestations, cervical spondylosis can be caused by different types of degeneration, such as degeneration of the cervical vertebrae curvature, vertebral instability, and degeneration of the intervertebral disks and intervertebral joints of the cervical spine. These changes compress or stimulate the adjacent spinal cord, nerve roots, and vertebral arteries, leading to symptoms such as arm pain, numbness, neck pain, weakness, incontinence, and gait impairment (see Supplementary Table 1 in the Supplementary Information).

In particular, the pathological forms of cervical degeneration are diverse, and the appearance of the different degenerative phenomena on medical imaging varies accordingly. The therapeutic schedule also varies greatly among cervical spondylosis types, so accurate preoperative imaging is vital for identifying the cause of compression as early as possible and choosing an appropriate surgical plan.

Common examinations for cervical spondylosis include X-rays, computed tomography (CT), and magnetic resonance imaging (MRI). CT is suitable for displaying changes in the spinal structure, but it exposes the body to relatively high doses of radiation. MRI shows the fine details of soft tissues, but it is time-consuming and expensive; in addition, it may be contraindicated for patients with metal in their body. Compared with CT and MRI, X-ray radiography is cheaper, more convenient, and faster. More importantly, based on clinical experience, orthopedic experts have defined sagittal plane parameters for cervical spine X-rays, which have become key factors in determining surgical options for cervical spondylosis and are widely used clinically. For example, if the X-ray shows cervical kyphosis, anterior surgery is suggested to improve the curvature; if it shows cervical spinal canal stenosis, the surgeon is prompted to choose posterior spinal canal enlargement surgery to relieve spinal cord compression; if it shows cervical segmental instability, metal internal fixation is indicated to enhance stability.

In general, failure to correctly identify the sagittal factors may lead to wrong-procedure surgery and poor prognosis. However, an accurate diagnosis of cervical spondylosis requires extensive clinical experience. On the one hand, the distribution of medical resources in most countries is uneven, and senior doctors are concentrated in the top-class hospitals of large cities. Because physicians at primary hospitals have limited experience, the quality and efficiency of their diagnosis and treatment of cervical spondylosis cannot be guaranteed. On the other hand, senior doctors have long been overloaded, and the resources available for on-site guidance at primary hospitals are limited. There is therefore an urgent need to transfer the experience accumulated over the years at tertiary and top-class hospitals in treating cervical spondylosis to primary hospitals, which would benefit the majority of patients. In addition to on-site guidance, a deep learning model trained with expert doctors’ experience as supervision could be a solution that improves diagnostic accuracy in primary hospitals while significantly saving diagnostic time for all hospitals.

Existing deep learning-based studies on the assessment of cervical degeneration can be broadly categorized into localization-based and semantic segmentation-based approaches. Among them, some localization-based works use X-ray images to determine the presence of certain types of degeneration, such as ossification of the posterior longitudinal ligament³ or spinal stenosis⁴, or the degree of spinal cord compression⁵. However, multiple factors lead to the degeneration of the cervical spine, and the assessment of the overall situation is still mostly based on manual measurements^6,7. Moreover, the methods^8,9 for localizing body structures, such as cervical vertebrae, mainly focus on the center of the vertebral body, but ignore the edges and corner points and lack further assessment. Alternatively, some semantic segmentation-based studies^10,11 have performed meticulous semantic segmentation of cervical spine structures, such as bones or disks. These methods rely heavily on large-scale annotated data, and the predicted edges of vertebral segments are usually not clear enough to meet the precision required by doctors. In recent years, feature point detection methods have become popular for medical diagnostic tasks involving the head¹², hand¹³, lower limb¹⁴, and hip bone¹⁵. However, in contrast to landmark detection in scenarios where the bones are easy to tell apart, the cervical vertebrae are highly similar to each other and are prone to projection artifacts and structural overlap. In general, previous deep learning methods have addressed cervical analysis from particular aspects, such as vertebral detection, spinal stenosis, or intervertebral disc degeneration, but such models for separate degenerative factors still cannot meet the practical requirements of doctors. One intuitive idea would be to combine all those models in one system. However, because the models are designed with different architectures and trained with different data, directly combining them is difficult.

In this work, we propose a multi-modal collaboration framework for cervical spondylosis diagnosis, trained with multi-task learning and designed around two clinical requirements: a low annotation workload for doctors and fine-grained measurements of multiple types of degeneration. According to the patient’s symptoms (see Supplementary Table 1 in the Supplementary Information) and expert knowledge, the cervical vertebrae curvature, vertebral instability, spinal stenosis, and intervertebral stenosis are selected for the diagnosis of cervical spondylosis. Because different degenerative factors are best observed in different medical imaging modalities, each degenerative factor is diagnosed with its most suitable modality in the proposed framework. The architecture combines cascade and ensemble networks: the location and diagnosis modules are arranged in a cascade, and the diagnosis of the different degenerative diseases is implemented as an ensemble network. Note that all the models were trained with imaging data from the same patients, each of whom has both X-ray and MRI data. The multi-modal diagnostic results complement each other to support the doctors’ needs, reflecting real-world clinical practice in which doctors integrate findings from different imaging modalities to address distinct aspects of the diagnostic process. The proposed method can produce a comprehensive report detailing the potential cervical degenerations and their respective positions, which better fulfills the practical requirements of medical practitioners.

In summary, we designed our multimodal diagnostic framework for cervical spondylosis around the clinical requests of doctors. Typical cervical spondylosis indicators, such as the Cobb angle, SVA value, intervertebral foramen stenosis, and spinal stenosis, must be explicitly provided for diagnosis, and different indicators are best acquired through different data modalities. We therefore design our framework with multimodal collaboration instead of fusion into one overall result. The key contribution of our work to doctors lies not only in providing results from a multimodal perspective, but also in offering the potential for mutual confirmation among different indicators through a multi-task learning approach.

Results

Patient characteristics and dataset composition

This retrospective study collected cervical spine X-ray and MRI scans of 1500 patients with cervical spondylosis (mean age ± standard deviation: 54 ± 10.28 years; men: 909, 60.60%; women: 591, 39.40%) at the Third Hospital of Peking University from January 2016 to December 2018. Data collection was pre-approved by the Institutional Review Board and excluded minors as well as patients with instrumentation or other medical conditions (e.g., spinal tumors, infections, trauma, or scoliosis). During data preparation, some patients were excluded because of inappropriate labeling, such as missing vertebral bodies or spinous processes, leaving a total of 1151 patients. The dataset contained one cervical spine radiograph of each patient in each of three lateral positions: neutral, flexion, and extension. All data were manually labeled by three radiologists and validated against each other, using the four-point method for the vertebrae on the lateral and flexion-extension cervical spine radiographs. Each radiologist was blinded to patient demographics and clinical history. For the vast majority of cases, the three annotators’ results were largely consistent; in cases of disagreement, the annotation from the more authoritative expert prevailed. The patients were randomly divided into a training set (693), a validation set (228), and a testing set (230).

Vertebral detection on X-ray Images

In our case, the vertebral bodies are represented as quadrangles, so detecting a vertebral body is equivalent to detecting the four vertices of its quadrangle. The detection performance is reported as the distribution of errors between the predicted vertices and the annotated vertices in three views (flexion, neutral, and extension) in Fig. 1a–c. In most cases, from C3 to C7, the mean error is below 1 mm. The upper margin of C2 and the lower margin of C7 have larger variance: the mean error of the former is slightly over 1 mm, and that of the latter is close to 1 mm. Several outliers with larger errors are also observed, and C2 and C7 contain more outliers than the other vertebrae.
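The per-vertex error reported above can be sketched as a Euclidean distance converted to millimeters. This is a minimal illustration, not the authors' evaluation code: the function names, the example coordinates, and the `mm_per_pixel` scale are assumptions introduced here.

```python
import math


def vertex_errors_mm(pred, truth, mm_per_pixel):
    """Distance in mm between each predicted and annotated vertex.

    pred and truth are lists of (x, y) pixel coordinates in the same order;
    mm_per_pixel is an assumed, image-specific pixel spacing.
    """
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same number of vertices")
    return [
        math.hypot(px - tx, py - ty) * mm_per_pixel
        for (px, py), (tx, ty) in zip(pred, truth)
    ]


def mean_error_mm(pred, truth, mm_per_pixel):
    """Mean vertex error in mm over one quadrangle (or any vertex list)."""
    errors = vertex_errors_mm(pred, truth, mm_per_pixel)
    return sum(errors) / len(errors)


# Hypothetical quadrangle for one vertebra (pixel coordinates).
predicted = [(100.0, 200.0), (140.0, 202.0), (141.0, 230.0), (101.0, 228.0)]
annotated = [(103.0, 204.0), (140.0, 202.0), (141.0, 233.0), (101.0, 224.0)]
print(mean_error_mm(predicted, annotated, mm_per_pixel=0.2))
```

Collecting these values per vertebra and per view would give the distributions summarized in the box plots.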

**Fig. 1: Box diagram of vertebral body detection C2-7.**

Cervical Vertebrae Curvature (CVC) analysis on X-ray images

The C2-7 Cobb Angle (CA) was measured following the protocol in^6,7,16,17, i.e., as the angle between the lower end plate of C2 and the lower end plate of C7, where the vertebral bodies are detected with the model described in the vertebral detection subsection above. The results for the Cobb angle are illustrated in Fig. 2, where Fig. 2a–c show the numerical analysis of the Cobb angle estimation. The mean error of our method is 2.42°, with a median of 1.84°. Over 80% of samples were predicted with an error under 4°, and outliers with an error over 8° accounted for less than 1.5% (see Fig. 2a). The first and third quartiles were close to 1° and 3° (see Fig. 2b). Fig. 2c shows the distribution of the bias between our predicted results and the gold standard (of the Cobb angle) with Bland-Altman analysis; the 95% limits of agreement (± 1.96 SD) were [−6.49°, 6.31°]. In Bland-Altman analysis, if most of the points fall within the 95% limits of agreement (i.e., between the two lines at mean ± 1.96 × standard deviation) and their maximum difference is clinically acceptable, then the two compared methods can be considered to have good consistency and can be used interchangeably. In this work, our model and the radiologists can be considered to have good consistency, since most points fall within the 95% limits of agreement.
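The angle between the two lower end plates can be sketched from the detected corner points as follows. This is a minimal sketch under stated assumptions: each end plate is given as a (posterior corner, anterior corner) pair, and the sign of the result depends on the image orientation, so the mapping of sign to lordosis or kyphosis is not fixed here. The function names are hypothetical.

```python
import math


def endplate_direction(posterior, anterior):
    """Direction vector of an end plate, from posterior to anterior corner."""
    return (anterior[0] - posterior[0], anterior[1] - posterior[1])


def cobb_angle_deg(c2_lower, c7_lower):
    """Signed angle in degrees from the C2 lower end plate to the C7 lower end plate.

    Each argument is ((x_posterior, y_posterior), (x_anterior, y_anterior)).
    The sign convention is an assumption and depends on the image axes.
    """
    ux, uy = endplate_direction(*c2_lower)
    vx, vy = endplate_direction(*c7_lower)
    cross = ux * vy - uy * vx  # proportional to the sine of the angle
    dot = ux * vx + uy * vy    # proportional to the cosine of the angle
    return math.degrees(math.atan2(cross, dot))


# Hypothetical end plates: C2 horizontal, C7 tilted by 45 degrees.
c2 = ((0.0, 0.0), (1.0, 0.0))
c7 = ((0.0, 0.0), (1.0, 1.0))
print(cobb_angle_deg(c2, c7))
```

Using `atan2` on the cross and dot products keeps the sign of the angle and avoids the numerical issues of `acos` near 0° and 180°.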

**Fig. 2: Cobb angle results (Best viewed in color).**

Fig. 2d illustrates the accuracy of clinical recommendations according to¹⁸. The Cobb angle is a crucial metric for assessing cervical spine curvature. It can reflect the lordosis (forward curvature) and kyphosis (backward curvature) of the cervical spine. Using a multivariate logistic regression model, Suda et al.¹⁸ demonstrated that successful surgery becomes challenging when the local kyphosis angle exceeds 13°, meaning the Cobb angle is less than −13°. In that case, addressing the kyphosis should be prioritized. Conversely, when the local kyphosis angle is less than 13°, indicating a Cobb angle higher than −13°, the likelihood of surgical success is higher. We have therefore established a clinical threshold of −13° for the Cobb angle as a guideline: if the Cobb angle is less than −13°, surgery is not recommended, while if it is greater than −13°, surgical intervention may be considered. The accuracy distribution shows that the proposed method gives a 100% accurate suggestion regarding cervical kyphosis for patients with a Cobb angle (gold standard) under −20° or over −10°. For patients with a Cobb angle in [−20°, −10°], the accuracy is about 75.4%.

The C2-7 SVA value was measured following the protocol in¹⁹, i.e., as the distance from the center vertical line of C2 to the posterior superior angle of C7. The results for the SVA value are illustrated in Fig. 3, where Fig. 3a–c show the numerical analysis of the SVA estimation. The mean error of our method is 0.65 mm, with a median of 0.50 mm. Over 80% of samples were predicted with an error under 1 mm, and outliers with an error over 2.5 mm accounted for less than 4% (see Fig. 3a). The first and third quartiles were close to 0.3 mm and 0.8 mm (see Fig. 3b). Fig. 3c shows the distribution of the bias between our predicted results and the gold standard (of the SVA value) with Bland-Altman analysis; the 95% limits of agreement (± 1.96 SD) were [−1.58 mm, 1.78 mm].
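The Bland-Altman limits of agreement used for both the Cobb angle and the SVA value are the mean difference ± 1.96 standard deviations of the paired differences. The sketch below shows that calculation on made-up numbers; the function names are hypothetical, and the use of the sample standard deviation is an assumption, since the source does not state which estimator was used.

```python
import statistics


def bland_altman_limits(predicted, reference):
    """Return (bias, lower limit, upper limit) of the 95% limits of agreement."""
    diffs = [p - r for p, r in zip(predicted, reference)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (assumption)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def fraction_within(predicted, reference, lower, upper):
    """Fraction of paired differences that fall inside [lower, upper]."""
    diffs = [p - r for p, r in zip(predicted, reference)]
    return sum(lower <= d <= upper for d in diffs) / len(diffs)


# Made-up measurements, for illustration only.
model_values = [1.0, 2.0, 3.0, 4.0]
expert_values = [0.0, 2.0, 4.0, 4.0]
bias, low, high = bland_altman_limits(model_values, expert_values)
print(bias, low, high, fraction_within(model_values, expert_values, low, high))
```

A bias near zero together with narrow limits indicates that the two measurement methods agree closely; whether the limits are narrow enough remains a clinical judgment.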

**Fig. 3: SVA value results (Best viewed in color).**

SVA (Sagittal Vertical Axis) is a crucial factor for measuring cervical spine curvature. According to recent research²⁰, a C2–C7 SVA of more than 40 mm can be considered indicative of Cervical Deformity (CD). We therefore set the SVA threshold at 40 mm as the clinical assessment criterion. An SVA of more than 40 mm is generally considered a cervical curvature imbalance, which is also regarded as a kyphosis deformity; thus, surgical intervention may be used to correct the cervical spine deformity. In addition, according to decades of experience at Peking University Third Hospital, an SVA of less than 15 mm can also serve as a clinical assessment criterion. The accuracy distribution shows that the proposed method gives a 100% accurate suggestion regarding cervical kyphosis for patients with an SVA value (gold standard) under 10 mm or over 20 mm. For patients with an SVA value in [10 mm, 20 mm], the accuracy is about 90.3%.
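As a rough illustration of the SVA measurement and the 40 mm criterion, the sketch below takes the C2 center as the centroid of its four detected vertices and measures the horizontal offset to the posterior superior corner of C7. The centroid definition, the assumption that the image vertical axis is the plumb line, the `mm_per_pixel` scale, and all function names are assumptions introduced here.

```python
def centroid(quad):
    """Centroid of a list of (x, y) vertices."""
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def sva_mm(c2_quad, c7_posterior_superior, mm_per_pixel):
    """Horizontal distance in mm from the vertical line through the C2 center
    to the posterior superior corner of C7 (assumes an upright image)."""
    center_x, _ = centroid(c2_quad)
    return abs(center_x - c7_posterior_superior[0]) * mm_per_pixel


def exceeds_sva_threshold(sva, threshold_mm=40.0):
    """True if the SVA is above the clinical criterion (40 mm by default)."""
    return sva > threshold_mm


# Hypothetical landmarks in pixel coordinates.
c2_vertices = [(100.0, 50.0), (140.0, 50.0), (140.0, 80.0), (100.0, 80.0)]
c7_corner = (60.0, 300.0)
value = sva_mm(c2_vertices, c7_corner, mm_per_pixel=0.25)
print(value, exceeds_sva_threshold(value))
```

The threshold is a parameter so that the 15 mm criterion mentioned above can be checked with the same function.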

Vertebral instability analysis on X-ray images

The spondylolisthesis angle was estimated as the absolute difference, between the two views, of the angle formed by the extension lines of the posterior edges of adjacent vertebral bodies. The results for the spondylolisthesis angle are illustrated in Fig. 4a. The errors vary between vertebral segments and tend to increase from C2-C3 to C6-C7, with mean errors ranging from 2° to 3.6°.

**Fig. 4: Vertebral Instability results.**

The spondylolisthesis distance was estimated as the absolute difference, between the two views, of the distance between (1) the extension line of the posterior edge of the upper vertebra and (2) the parallel line passing through the posterior superior corner of the lower vertebra. The results for the spondylolisthesis distance are illustrated in Fig. 4b. The errors vary between vertebral segments: segment C5-C6 has the smallest mean error, about 0.4 mm, while segment C2-C3 has the largest, about 0.6 mm.

We evaluate vertebral instability based on the spondylolisthesis angle and distance. According to the guideline in²¹, a cervical vertebral segment is defined as unstable when there is more than 3.5 mm of anterior or posterior translation or more than an 11° difference in spondylolisthesis angle. We therefore evaluated vertebral instability with classification criteria of 3.5 mm in spondylolisthesis distance and 11° in spondylolisthesis angle. That is, when the spondylolisthesis distance exceeds 3.5 mm or the spondylolisthesis angle exceeds 11°, the segment is considered unstable, and anterior cervical discectomy and fusion may be required.
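The composite criterion above reduces to a simple rule per segment. The sketch below applies it to hypothetical measurements; the function name, the dictionary layout, and the example values are illustrative assumptions.

```python
def is_unstable(distance_mm, angle_deg, max_distance_mm=3.5, max_angle_deg=11.0):
    """Flag a segment as unstable if either measurement exceeds its threshold."""
    return abs(distance_mm) > max_distance_mm or abs(angle_deg) > max_angle_deg


# Hypothetical per-segment measurements: (distance in mm, angle in degrees).
segments = {
    "C2-C3": (1.2, 4.0),
    "C3-C4": (3.8, 6.5),   # distance above 3.5 mm
    "C4-C5": (2.0, 12.3),  # angle above 11 degrees
    "C5-C6": (3.5, 11.0),  # exactly at both thresholds, not above them
}
unstable = [name for name, (d, a) in segments.items() if is_unstable(d, a)]
print(unstable)
```

Evaluating the distance rule and the angle rule separately, by setting the other threshold to infinity, would give the kind of single-criterion accuracy figures compared in the text.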

The classification results are illustrated in Fig. 4c as a radar chart, which makes comparison more convenient. The classification accuracy with spondylolisthesis distance is higher than that with spondylolisthesis angle on every segment; the overall accuracies with spondylolisthesis angle and distance are 80.5% and 99.4%, respectively.

Spinal stenosis analysis on X-ray images

The Sagittal Diameter of Vertebral Body (SDVB) was estimated as the distance from the midpoint of the anterior edge of the vertebral body to the midpoint of the posterior edge. The results for SDVB are illustrated in Fig. 5a. The mean errors of our model are as follows: 1.67 mm for C3, 0.94 mm for C4, 0.77 mm for C5, 0.45 mm for C6, and 0.70 mm for C7.

The Sagittal Diameter of Vertebral Canal (SDVC) was estimated as the shortest distance from the midpoint of the posterior edge of the vertebral body to the lamina. The results for SDVC are illustrated in Fig. 5b. The mean errors of our model are as follows: 2.46 mm for C3, 2.43 mm for C4, 2.14 mm for C5, 1.71 mm for C6, and 2.60 mm for C7.

The Pavlov value was estimated as the ratio SDVC/SDVB. The results for the Pavlov ratio are illustrated in Fig. 5c. The mean errors of our model are as follows: 0.09 for C3, 0.11 for C4, 0.09 for C5, 0.08 for C6, and 0.10 for C7.

The Pavlov value is significant in assessing developmental cervical canal stenosis. When the Pavlov value of three consecutive vertebral segments is less than 0.75, developmental cervical spinal stenosis is diagnosed. In²², Wang et al. categorized patients into two groups, Developmental Canal Stenosis (DCS) and Non-Developmental Canal Stenosis (NDCS), according to whether the Pavlov value is less than 0.75. Thus, a Pavlov value of 0.75 is a crucial clinical determinant: three consecutive vertebral segments with a Pavlov value less than 0.75 are considered indicative of developmental canal stenosis, and posterior cervical surgery may be considered.
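The Pavlov ratio and the three-consecutive-segments rule can be sketched as follows. The function names and the example ratios are hypothetical; the 0.75 threshold and the run length of three come from the text above.

```python
def pavlov_ratio(sdvc_mm, sdvb_mm):
    """Pavlov ratio: canal sagittal diameter divided by body sagittal diameter."""
    if sdvb_mm <= 0:
        raise ValueError("SDVB must be positive")
    return sdvc_mm / sdvb_mm


def has_developmental_stenosis(ratios, threshold=0.75, run_length=3):
    """True if at least run_length consecutive levels have a ratio below threshold.

    ratios must be ordered by vertebral level, e.g. C3, C4, C5, C6, C7.
    """
    run = 0
    for ratio in ratios:
        run = run + 1 if ratio < threshold else 0
        if run >= run_length:
            return True
    return False


# Hypothetical ratios for C3 to C7.
print(has_developmental_stenosis([0.80, 0.70, 0.72, 0.74, 0.90]))  # C4-C6 below 0.75
print(has_developmental_stenosis([0.70, 0.80, 0.70, 0.70, 0.76]))  # no run of three
```

Because the ratio divides two lengths measured on the same image, any uniform magnification cancels out, which is why the ratio is preferred over the raw canal diameter.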

Intervertebral degeneration analysis on MR images

The results of our model in analyzing intervertebral disc degeneration (i.e., normal or bulge vs. protrusion or extrusion) are illustrated in Fig. 6a, with an accuracy of 76.3% and an Area Under the Curve (AUC) of 0.844. All the operating points of the doctors lie under the ROC curve of our model, demonstrating that our model can achieve better sensitivity or specificity in a similar scenario. In the more detailed radar chart, our model outperformed the clinical doctors and radiologists in accuracy and F1 score, while the clinical doctor outperformed the radiologist and our model in specificity.

**Fig. 6: Intervertebral Degeneration Results.**

The results of the spinal stenosis analysis are shown in Fig. 6b, with an AUC of 0.925 and an accuracy of 87.6%, which is better than both the clinical doctors and the radiologists. In the more detailed radar chart, our model outperformed the clinical doctors and radiologists in accuracy, sensitivity, and F1 score, while the clinical doctor outperformed the radiologist and our model in precision.
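The measures compared on the radar charts (accuracy, sensitivity, specificity, precision, and F1 score) all derive from the binary confusion matrix. The sketch below computes them from paired labels; the function names and the example labels are made up for illustration, and returning 0.0 for an undefined ratio is a convention chosen here.

```python
def _ratio(numerator, denominator):
    """Return 0.0 when the metric is undefined (zero denominator)."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, precision, and F1 for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = _ratio(tp, tp + fp)
    sensitivity = _ratio(tp, tp + fn)
    return {
        "accuracy": _ratio(tp + tn, len(y_true)),
        "sensitivity": sensitivity,
        "specificity": _ratio(tn, tn + fp),
        "precision": precision,
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Made-up labels: 1 = stenosis present, 0 = absent.
truth = [1, 1, 1, 0, 0, 0, 0, 1]
preds = [1, 1, 0, 0, 0, 1, 0, 1]
print(binary_metrics(truth, preds))
```

A reader's single (sensitivity, specificity) pair corresponds to one point in ROC space, which is how individual doctors can be placed relative to the model's ROC curve.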

The results of the intervertebral foramen stenosis analysis are illustrated in Fig. 6c, d. The AUCs for left and right intervertebral foramen stenosis were 0.849 and 0.845, which are very close to the AUC for intervertebral disc degeneration. In the ROC comparison, the points of the clinical doctor were above the curve of our model, and the points of both junior and senior radiologists were under it. In the radar chart, our model performed better in accuracy and F1 score, while the clinical doctor performed better in precision and specificity. According to our statistical analysis, assessing intervertebral degeneration usually takes a radiologist about 3 min, whereas it takes our model less than 2 s.

The patient level diagnostic analysis

The patient-level results are illustrated in Fig. 7 as a radar chart, together with a visualization of each degenerative factor. Note that these results are calculated per patient, rather than per image as in the results above: a degeneration correctly detected in any sequence is regarded as a correct diagnosis. Each axis in the radar chart represents the diagnostic accuracy for the corresponding degenerative factor. The overall average accuracy was 87.6%, and the diagnostic accuracy for the individual diseases ranged from 75.1% to 98.5%.
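One way to read the patient-level rule is sketched below: a patient is predicted positive for a factor if any of their images or sequences is predicted positive, and the prediction is counted as correct when it matches the patient's label. This interpretation, in particular the handling of negative patients, is an assumption made for illustration and is not confirmed by the source; the function name and data layout are hypothetical.

```python
def patient_level_accuracy(image_predictions, patient_labels):
    """Accuracy over patients for one degenerative factor.

    image_predictions maps patient id -> list of per-image (or per-sequence)
    boolean predictions; patient_labels maps patient id -> boolean label.
    """
    if not patient_labels:
        raise ValueError("patient_labels must not be empty")
    correct = 0
    for patient_id, label in patient_labels.items():
        # Any positive image makes the patient-level prediction positive.
        predicted = any(image_predictions.get(patient_id, []))
        correct += int(predicted == label)
    return correct / len(patient_labels)


# Made-up predictions for four patients.
predictions = {
    "p1": [False, True],
    "p2": [False, False],
    "p3": [True, False],
    "p4": [False],
}
labels = {"p1": True, "p2": False, "p3": False, "p4": True}
print(patient_level_accuracy(predictions, labels))
```

Repeating the calculation once per degenerative factor would give one value per axis of a radar chart.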

**Fig. 7: The framework of patient level diagnostic analysis and the patient level diagnostic results.**

A real diagnosis case of cervical spondylosis is illustrated in Fig. 8, where all the diseases diagnosed as positive are shown. In this case, the cervical spine degeneration was detected as spinal stenosis in C4-5, instability in C5-6, cervical disc herniation in C3-C6, and spinal canal stenosis in C4-5. To some extent, the C3-C6 cervical disc herniation is a direct cause of the spinal canal stenosis. As the degenerative changes in the patient’s cervical spine progress, multiple disks at C3-C6 gradually herniate, reducing the effective space within the spinal canal, thereby forming cervical spinal stenosis and ultimately leading to compression of the cervical spinal cord or nerve roots. At the same time, the spinal canal stenosis is also a manifestation of the compression of the spinal cord by the C3-C6 cervical disc herniation, so the two findings corroborate each other.

**Fig. 8: A visualized diagnostic report of a real case.**

The external validation

We also conducted an external validation with patients from Beijing Haidian Hospital; the results are illustrated in Fig. 9. The overall average accuracy is 85.8%, which is very close to that at Peking University Third Hospital (87.6%). Compared to the validation within Peking University Third Hospital, the performance of both our model and the clinical doctors was somewhat lower than that in Fig. 6. In particular, the AUC for intervertebral disc degeneration is 0.808, and the AUCs for left and right foramen stenosis were 0.707 and 0.705 in Fig. 9c, d, where the points of the clinical doctor were near the ROC curve of our model. The clinical doctor slightly outperforms our model.

**Fig. 9: External validation results.**

Discussion

In spinal surgery, the Cobb angle is used to describe the alignment and curvature of the spine. It can describe angles in both the sagittal and coronal planes, depending on the morphological characteristics of different types of spinal diseases. For cervical spondylosis, the common change in spinal angle is a reduction in the lordotic angle in the sagittal plane, whereas for thoracolumbar scoliosis, the common change is an increase in angle in the coronal plane²³. Therefore, the ‘cervical Cobb angle’ often refers to the angle between the extension line of the lower endplate of the C2 vertebra and the extension line of the lower endplate of the C7 vertebra in the sagittal plane²⁴. If the C7 vertebra is not clearly displayed on the patient’s X-ray, the C6 vertebra can be used instead²⁵. The Cobb angle has also been applied and studied on cervical sagittal CT²⁶. In previous works, Sardjono et al.²⁷ reported an automatic Cobb angle determination and tested its accuracy in 36 patients, with a mean difference of 3.3°. Wang et al.²⁸ proposed a model called MVE-Net, which combined Anterior-posterior (AP) and Lateral (LAT) view X-rays to automatically calculate the Cobb angle; the average absolute errors on the AP and LAT views were 7.81° and 6.26°, respectively. Pan et al.⁷ proposed an automated method for detecting the Cobb angle on chest X-rays, with a mean absolute difference of 3.32° between the computer-aided and manual methods. In contrast, our model is trained and evaluated on data from a larger number of patients, and the Mean Absolute Difference (MAD) of our method is 2.42°, with a median of 1.84°, which is lower than the errors reported in these previous works.

Sagittal Vertical Axis (SVA) is an important parameter for measuring sagittal plane balance. However, research on the automatic detection of SVA values using deep learning methods is currently scarce. Weng et al.²⁹ introduced an automated method based on ResUNet for detecting the SVA value, reporting a median absolute error for SVA of 1.183 ± 0.166 mm. By comparison, the mean difference of our method is 0.65 mm, with a median of 0.5 mm.

Most existing methods for the automated measurement of Cobb angles and SVA values are evaluated primarily with accuracy metrics, with little in-depth exploration of their clinical relevance and implications. In contrast, we go beyond the automated calculation of SVA and Cobb angles by setting thresholds to provide more refined clinical recommendations. The Cobb angle and SVA are both vital indicators for assessing cervical vertebrae curvature: the Cobb angle assesses cervical spine curvature balance, while SVA focuses on sagittal plane balance. According to⁷, there is a higher correlation between SVA and the Neck Disability Index (NDI). The errors of our model in SVA measurements are smaller than those in Cobb angle measurements. Overall, combining the Cobb angle and SVA provides a more robust diagnostic tool for assessing cervical vertebral curvature.

Vertebral instability can be detected with the spondylolisthesis angle and distance. In general, diagnosis with the spondylolisthesis distance is more accurate than diagnosis with the spondylolisthesis angle, which mirrors the trend observed for the Cobb angle and SVA value. These results suggest that a deep learning model is more sensitive to distance measurements than to angle measurements in cervical spondylosis diagnosis. In contrast to the composite criterion in²¹ (used in this paper), some works^30,31 used a single criterion of ‘20% of displacement’. In³⁰, Xiao et al. obtained an accuracy near 90%. In our work, if only a single criterion is considered, the diagnostic accuracy with the spondylolisthesis distance is over 99%, which is better than that in³⁰.

The sagittal diameter of the vertebral canal (SDVC) is an important indicator for diagnosing cervical spinal canal stenosis. However, the measurement of the SDVC is influenced by the magnification factor arising from the X-ray projection distance, whereas the Pavlov ratio (SDVC/SDVB) is not affected by this factor. Using the Pavlov ratio to assess cervical spinal canal stenosis is therefore more reasonable. Furthermore, research conducted by Suk et al.³² has shown a high correlation between the Pavlov ratio and cervical spinal canal stenosis. Hence, our model’s ability to automatically calculate the Pavlov ratio to aid in assessing cervical spinal canal stenosis is meaningful.

For the Pavlov ratio, our model exhibits low errors (mean errors of 0.08 to 0.11 across C3 to C7), indicating that its results are reliable. Existing literature on artificial intelligence-based automatic detection of Pavlov values is limited, and to our knowledge our findings are among the first in this regard.

As is widely recognized, cervical intervertebral disc degeneration is one of the most prevalent cervical spine disorders in clinical medicine, and sagittal MRI plays a crucial role in its diagnosis. We used a deep learning-based model to classify intervertebral disc degeneration. Experimental results demonstrate that our model performs much better than two junior radiologists, surpasses the performance of senior radiologists, and serves as an effective diagnostic aid for medical professionals. However, due to factors such as limited datasets, most published articles focus on the auxiliary diagnosis of lumbar intervertebral discs^33,34,35,36, with limited research dedicated to cervical intervertebral discs. For instance, Oktay et al.³³ conducted intervertebral disc segmentation and then employed a Support Vector Machine (SVM) to classify lumbar intervertebral discs into normal and abnormal categories. Mbarki et al.³⁴ utilized a CNN-based model for feature extraction in lumbar intervertebral disc degeneration and subsequently employed a VGG-based model for classification. Sustersic et al.³⁵ combined information from sagittal and axial MRI images of the lumbar spine and classified the discs into classes (healthy, bulge, central, right or left herniation for the axial view, and healthy, L4/L5, L5/S1 level of herniation for the sagittal view) using a convolutional neural network (CNN). Adibatti et al.³⁶ similarly focused on locating, segmenting, and classifying lumbar intervertebral discs with deep learning models. All of these lumbar intervertebral disc methods depend on segmentation results. Classifying intervertebral discs from segments cropped closely around the lesion can reduce the difficulty of the task; however, such methods require external models and incur labor costs for model training and segment annotation. In contrast, our method does not require external segmentation.

Methods

This study was approved by the Peking University Third Hospital Medical Science Research Ethics Committee, which waived the requirement for informed consent. This waiver was granted as our study was a retrospective analysis of pre-existing imaging data. The research is strictly non-commercial and uses fully anonymized data with all personal identifiers permanently removed. The study was performed in accordance with national and international guidelines.

The proposed method adopts a cascade and ensemble architecture, in which the detection module and the diagnostic module are cascaded (see Fig. 7). Different degenerative indicators are integrated in an ensemble module and contribute different loss terms to the training of the deep learning model. This section introduces the proposed framework in two parts: detection, and degeneration diagnosis in multi-task learning.

Detection

The input to the vertebral body detection model is the image I ∈ R^{C × W × H}, where C, W, and H denote the number of channels, width, and height of the input image. The output is the locations of K = 24 keypoints (the four corner points of each vertebral body from C2 to C7), denoted as ${\left\{{p}_{k}\right\}}_{k=1}^{K},{p}_{k}=\left({x}_{k},{y}_{k}\right)$. Since the input radiographic image is a grayscale image, the number of channels C is 1.

Previous keypoint detection models³⁷ usually follow a similar pipeline: they first extract the image features for the k-th keypoint and then predict a heatmap ${\widehat{{{{\rm{H}}}}}}_{k}\in {{{{\bf{R}}}}}^{w\times h}$ for it, where w × h denotes the size of the output heatmap. The pixel position p_k of the k-th keypoint is then obtained by taking the argmax of the heatmap, and finally the keypoints are grouped into instances.

The ground truth pixel position is usually converted into a heatmap, denoted as H_k, which is generated by a 2D Gaussian function with a standard deviation of 1. The loss for heatmap supervision is computed using the Mean Squared Error (MSE) between the ground-truth heatmap H and the predicted one $\widehat{H}$. The calculation is performed as follows:

$$MSE(H,\widehat{H})=\frac{1}{{{{\bf{N}}}}}\,{\sum }_{i=1}^{{{{\bf{N}}}}}{({h}_{i}-\widehat{{h}_{i}})}^{2}$$

(1)

where N is the number of points in the heatmap, h_i and ${\widehat{h}}_{i}$ represent the ground-truth probability and the predicted probability of the i-th pixel.

The loss function of heatmap supervision L can be formulated as:

$$L={L}_{heatmap}=MSE(H,\widehat{H})$$

(2)
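
As a minimal illustrative sketch of Eqs. (1) and (2), not the released implementation, the following pure-Python code builds a ground-truth heatmap from a keypoint position and computes the MSE against a prediction. The function names `gaussian_heatmap` and `mse` are hypothetical, and the Gaussian is assumed to be unnormalized with a peak value of 1, which the text does not specify.

```python
import math


def gaussian_heatmap(width, height, cx, cy, sigma=1.0):
    """Ground-truth heatmap: 2D Gaussian centred on keypoint (cx, cy).

    Assumption: unnormalized Gaussian with peak value 1.0 at the keypoint.
    Returned as a list of rows, indexed as heatmap[y][x].
    """
    return [
        [
            math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
            for x in range(width)
        ]
        for y in range(height)
    ]


def mse(h_true, h_pred):
    """Mean squared error over all N pixels of two equally sized heatmaps (Eq. 1)."""
    flat_true = [v for row in h_true for v in row]
    flat_pred = [v for row in h_pred for v in row]
    if len(flat_true) != len(flat_pred):
        raise ValueError("heatmaps must have the same number of pixels")
    n = len(flat_true)
    return sum((t - p) ** 2 for t, p in zip(flat_true, flat_pred)) / n
```

A perfect prediction gives a loss of zero, and any deviation from the target heatmap increases it.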

Heatmaps are typically generated at lower resolutions through down-sampling to balance computational efficiency and receptive field coverage, which leads to localization errors. For medical images in particular, where most targets are relatively small, minimizing localization errors is critical. Inspired by HigherHRNet³⁸, which is built on HRNet^39,40, our keypoint detection module generates the heatmap on the highest-resolution feature map. The high-resolution output obtained by deconvolution upsampling is added to form a high-resolution feature pyramid that learns a multi-scale perceptual representation. The heatmap loss ${L}_{heatmap}^{{\prime} }$ contains multi-resolution supervision and can be expressed as:

$${L}_{heatmap}^{{\prime} }={\sum }_{n=1}^{N}{{{\rm{MSE}}}}\left({H}_{nk},{\widehat{H}}_{nk}\right)$$

(3)

where N is the number of layers of the feature pyramid (not the number of pixels as in Eq. (1)), while H_nk and ${\widehat{H}}_{nk}$ denote the ground-truth and predicted heatmaps of the k-th keypoint at the n-th layer.

Degeneration diagnosis in multi-task learning

In this section, we propose a Multi-task Learning Mechanism (MLM) module, which combines the metric measurement task with the keypoint detection task in an attempt to improve the overall performance without requiring additional annotations. In our multi-task learning framework, we employ the Differentiable Spatial to Numerical Transform (DSNT)⁴¹ to convert heatmaps into coordinate representations through a differentiable spatial expectation operation, enabling end-to-end backpropagation of errors from multiple clinical indicators to the keypoint detection module. Given the predicted heatmap $\widehat{H}\in {{\mathbb{R}}}^{h\times w}$, we apply the softmax function for normalization to ensure the results conform to a probability distribution:

$${\widehat{Z}}_{i,j}=\frac{exp({\widehat{H}}_{i,j})}{{\Sigma }_{k=1}^{h}{\Sigma }_{l=1}^{w}exp({\widehat{H}}_{k,l})}$$

(4)

We then construct two h × w matrices X and Y, where ${X}_{ij}=\frac{2j-(w+1)}{w}$ and ${Y}_{ij}=\frac{2i-(h+1)}{h}$. The matrices X and Y encode normalized coordinate values: each element of X contains the x-coordinate and each element of Y the y-coordinate of the corresponding pixel, such that the top-left corner of the image is at (−1,−1) and the bottom-right corner is at (1,1). The predicted coordinates $(\widehat{x},\widehat{y})$ are derived through the Frobenius inner products of the normalized heatmap $\widehat{Z}$ with the coordinate matrices X and Y:

$$(\widehat{x},\widehat{y})=DSNT(\widehat{Z})=[{\langle \widehat{Z},X\rangle }_{F},\,{\langle \widehat{Z},Y\rangle }_{F}]$$

(5)

where 〈⋅, ⋅〉_F denotes the Frobenius inner product, which is equivalent to the scalar dot product of vectorized matrices. Essentially, this operation computes a weighted average of the coordinate grid with respect to the heatmap’s probability distribution, yielding the mathematical expectation coordinates under this distribution. Since both the softmax and expectation operations are differentiable, gradients can be backpropagated seamlessly.
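
The softmax normalization and the expectation over the coordinate grids can be sketched in a few lines of pure Python. This is an illustrative example rather than the released implementation: the function name `dsnt` is hypothetical, and subtracting the peak value before exponentiation is a standard numerical-stability step that does not change the softmax output.

```python
import math


def dsnt(heatmap):
    """Map an h x w heatmap (list of rows) to normalized (x, y) coordinates.

    Step 1: softmax over all pixels, giving a probability map Z.
    Step 2: expectation of the grids X_ij = (2j - (w + 1)) / w and
            Y_ij = (2i - (h + 1)) / h under Z, with 1-based i and j.
    """
    h = len(heatmap)
    w = len(heatmap[0])
    peak = max(max(row) for row in heatmap)  # for numerical stability only
    exps = [[math.exp(v - peak) for v in row] for row in heatmap]
    total = sum(sum(row) for row in exps)

    x_hat = 0.0
    y_hat = 0.0
    for i in range(1, h + 1):
        for j in range(1, w + 1):
            z = exps[i - 1][j - 1] / total
            x_hat += z * (2 * j - (w + 1)) / w
            y_hat += z * (2 * i - (h + 1)) / h
    return x_hat, y_hat
```

A uniform heatmap maps to the image centre (0, 0), while a sharply peaked heatmap maps to the normalized coordinate of the pixel that carries the peak.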

Although DSNT can replace the heatmap output, in order to preserve the multi-resolution heatmap of HigherHRNet, we use the predicted coordinates generated by DSNT for the metric measurement task only. The final loss function L_Ours contains two parts: loss L_multi−task for the DSNT-based metric measurement task and loss ${L}_{heatmap}^{{\prime} }$ for the multi-resolution heatmap keypoint detection task, represented as:

$${L}_{Ours}={L}_{multi-task}+{L}_{heatmap}^{{\prime} }={\sum }_{m=1}^{M}{\lambda }_{m}{{{\rm{MSE}}}}\left({{{{\rm{Q}}}}}_{m},{\widehat{{{{\rm{Q}}}}}}_{m}\right)+{\sum }_{n=1}^{N}{{{\rm{MSE}}}}\left({H}_{nk},{\widehat{H}}_{nk}\right)$$

(6)

$${{{\rm{Q}}}}=[{{{{\bf{Q}}}}}_{CA},{{{{\bf{Q}}}}}_{SD},{{{{\bf{Q}}}}}_{SA},{{{{\bf{Q}}}}}_{SDVB},{{{{\bf{Q}}}}}_{SDVC}]$$

(7)

Here, Q_m is computed from the coordinates ${{{{\bf{P}}}}}^{{\prime} }$ obtained by the DSNT mapping, λ_m denotes the weighting parameter of the m-th metric measurement task, and M denotes the number of metric measurement tasks. $\widehat{{{{{\rm{Q}}}}}_{m}}$ denotes the result of the m-th metric measurement computed from the ground-truth coordinates. The following explains how each metric measurement Q_m is obtained from the coordinate matrix ${{{{\bf{P}}}}}^{{\prime} }$:

$${{{{\bf{P}}}}}^{{\prime} }={\left[\begin{array}{cccccccc}{x}_{1} & {y}_{1} & {x}_{2} & {y}_{2} & {x}_{3} & {y}_{3} & {x}_{4} & {y}_{4}\\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ {x}_{21} & {y}_{21} & {x}_{22} & {y}_{22} & {x}_{23} & {y}_{23} & {x}_{24} & {y}_{24}\end{array}\right]}_{6\times 8}$$

(8)

Points p_1 to p_4 represent the antero-superior, antero-inferior, postero-superior, and postero-inferior corner points of the C2 vertebra, respectively, and each subsequent row holds the corresponding four corner points of the next vertebra, from C3 to C7.

Assessing cervical curvature relies on two indicators, the C2-7 Cobb angle and the C2-7 SVA. In particular, the corner points of the lower edges of the C2 and C7 vertebrae are used to calculate the Cobb angle. They correspond to the following entries of the coordinate matrix ${{{{\bf{P}}}}}^{{\prime} }$:
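
The layout of the 6 × 8 matrix in Eq. (8) implies a simple mapping from a keypoint index k to its matrix entries. The sketch below is illustrative; the helper names `corner_indices` and `corner` are hypothetical, and the matrix is assumed to be stored as a list of six rows with zero-based indexing, as in the bracketed subscripts used in the equations of this section.

```python
def corner_indices(k):
    """Return (row, x_col, y_col) of keypoint p_k (k = 1..24) in the 6 x 8 matrix.

    Row 0 holds C2 (p_1..p_4), row 1 holds C3 (p_5..p_8), ..., row 5 holds C7.
    Within a row the order is antero-superior, antero-inferior,
    postero-superior, postero-inferior, each stored as an (x, y) pair.
    """
    if not 1 <= k <= 24:
        raise ValueError("k must be between 1 and 24")
    row, position = divmod(k - 1, 4)
    return row, 2 * position, 2 * position + 1


def corner(matrix, k):
    """Return the (x, y) coordinates of keypoint p_k from the coordinate matrix."""
    row, x_col, y_col = corner_indices(k)
    return matrix[row][x_col], matrix[row][y_col]
```

For example, `corner_indices(2)` gives row 0 and columns 2 and 3, the antero-inferior corner of C2, and `corner_indices(24)` gives row 5 and columns 6 and 7, the postero-inferior corner of C7.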

$${p}_{2x}=\left({{{{\bf{P}}}}}_{[0,2]}^{{\prime} }\right),{p}_{2y}=\left({{{{\bf{P}}}}}_{[0,3]}^{{\prime} }\right),{p}_{4x}=\left({{{{\bf{P}}}}}_{[0,6]}^{{\prime} }\right),{p}_{4y}=\left({{{{\bf{P}}}}}_{[0,7]}^{{\prime} }\right)$$

(9)

$${p}_{22x}=\left({{{{\bf{P}}}}}_{[5,2]}^{{\prime} }\right),{p}_{22y}=\left({{{{\bf{P}}}}}_{[5,3]}^{{\prime} }\right),{p}_{24x}=\left({{{{\bf{P}}}}}_{[5,6]}^{{\prime} }\right),{p}_{24y}=\left({{{{\bf{P}}}}}_{[5,7]}^{{\prime} }\right)$$

(10)

The detailed formula of Q_CA is:

$$k1=\left({p}_{2y}-{p}_{4y}\right)/\left({p}_{2x}-{p}_{4x}+\delta \right),\,k2=\left({p}_{22y}-{p}_{24y}\right)/\left({p}_{22x}-{p}_{24x}+\delta \right),\,\delta=1e-9$$

(11)

$${{{{\bf{Q}}}}}_{CA}=\arctan ((k1-k2)/(1+k1\times k2))\times \left(18{0}^{\circ }/\pi \right)$$

(12)
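
A minimal sketch of the Cobb angle computation from the four lower-edge corner points is given below. It is illustrative rather than the released implementation: the function names are hypothetical, and the angle is obtained with the standard identity tan θ = (k1 − k2)/(1 + k1·k2) for the angle between two lines with slopes k1 and k2. The result is signed, and the perpendicular case k1·k2 = −1 is not handled.

```python
import math

DELTA = 1e-9  # small constant from Eq. (11) that avoids division by zero


def slope(p, q, delta=DELTA):
    """Slope of the line through points p and q, each an (x, y) tuple."""
    return (p[1] - q[1]) / (p[0] - q[0] + delta)


def angle_between_lines_deg(k1, k2):
    """Signed angle in degrees between two lines with slopes k1 and k2.

    Uses tan(theta) = (k1 - k2) / (1 + k1 * k2); raises ZeroDivisionError
    when the lines are exactly perpendicular (k1 * k2 == -1).
    """
    return math.degrees(math.atan((k1 - k2) / (1.0 + k1 * k2)))


def cobb_angle_deg(p2, p4, p22, p24):
    """C2-7 Cobb angle from the lower edges of C2 (p2, p4) and C7 (p22, p24)."""
    return angle_between_lines_deg(slope(p2, p4), slope(p22, p24))
```

With a horizontal C2 lower edge and a C7 lower edge tilted by 30 degrees, the magnitude of the returned angle is 30 degrees up to the effect of the small constant δ.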

Similarly, the SVA calculation depends on the coordinates of the four corners of C2, as well as the postero-superior corner p₂₃ of C7:

$${p}_{1x}=\left({{{{\bf{P}}}}}_{[0,0]}^{{\prime} }\right),{p}_{1y}=\left({{{{\bf{P}}}}}_{[0,1]}^{{\prime} }\right),{p}_{2x}=\left({{{{\bf{P}}}}}_{[0,2]}^{{\prime} }\right),{p}_{2y}=\left({{{{\bf{P}}}}}_{[0,3]}^{{\prime} }\right)$$

(13)

$${p}_{3x}=\left({{{{\bf{P}}}}}_{[0,4]}^{{\prime} }\right),{p}_{3y}=\left({{{{\bf{P}}}}}_{[0,5]}^{{\prime} }\right),{p}_{4x}=\left({{{{\bf{P}}}}}_{[0,6]}^{{\prime} }\right),{p}_{4y}=\left({{{{\bf{P}}}}}_{[0,7]}^{{\prime} }\right)$$

(14)

$${p}_{23x}=\left({{{{\bf{P}}}}}_{[5,4]}^{{\prime} }\right)$$

(15)

${p}^{{\prime} }$ denotes the intersection point of the diagonals of the C2 vertebra, and Q_SVA is calculated as follows:

$${p}_{x}^{{\prime} }=\left({p}_{4x}\times {p}_{2y}-{p}_{1y}\times {p}_{3x}\right)/\left({p}_{1x}\times {p}_{2y}-{p}_{1y}\times {p}_{2x}\right)$$

(16)

$${{{{\bf{Q}}}}}_{SVA}=\left|{p}_{x}^{{\prime} }-{p}_{23x}\right|$$

(17)
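
The geometric idea behind the SVA, the horizontal offset between the centre of C2 and the postero-superior corner of C7, can be sketched as follows. This is an illustrative example that uses the general determinant formula for the intersection of two lines to locate the crossing point of the C2 diagonals; the function names are hypothetical and the code is not taken from the released implementation.

```python
def line_intersection(a, b, c, d):
    """Intersection of the line through a and b with the line through c and d.

    All points are (x, y) tuples. Raises ValueError for parallel lines.
    """
    den = (a[0] - b[0]) * (c[1] - d[1]) - (a[1] - b[1]) * (c[0] - d[0])
    if den == 0:
        raise ValueError("lines are parallel")
    det_ab = a[0] * b[1] - a[1] * b[0]
    det_cd = c[0] * d[1] - c[1] * d[0]
    x = (det_ab * (c[0] - d[0]) - (a[0] - b[0]) * det_cd) / den
    y = (det_ab * (c[1] - d[1]) - (a[1] - b[1]) * det_cd) / den
    return x, y


def sva(p1, p2, p3, p4, p23_x):
    """Horizontal distance between the C2 diagonal intersection and p_23.

    p1..p4 are the antero-superior, antero-inferior, postero-superior and
    postero-inferior corners of C2, so the diagonals are p1-p4 and p2-p3.
    """
    centre_x, _ = line_intersection(p1, p4, p2, p3)
    return abs(centre_x - p23_x)
```

For a square C2 with corners at (0, 0), (0, 2), (2, 0) and (2, 2), the diagonals cross at x = 1, so a C7 postero-superior corner at x = 4 gives an SVA of 3 in the same units.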

Vertebral instability can be assessed using the spondylolisthesis angle and spondylolisthesis distance. Both calculations depend on the coordinates of the posterior corners of the adjoining vertebrae, as in the example of C5-6:

$${p}_{15x}=\left({{{{\bf{P}}}}}_{[3,4]}^{{\prime} }\right),{p}_{15y}=\left({{{{\bf{P}}}}}_{[3,5]}^{{\prime} }\right),{p}_{16x}=\left({{{{\bf{P}}}}}_{[3,6]}^{{\prime} }\right),{p}_{16y}=\left({{{{\bf{P}}}}}_{[3,7]}^{{\prime} }\right)$$

(18)

$${p}_{19x}=\left({{{{\bf{P}}}}}_{[4,4]}^{{\prime} }\right),{p}_{19y}=\left({{{{\bf{P}}}}}_{[4,5]}^{{\prime} }\right),{p}_{20x}=\left({{{{\bf{P}}}}}_{[4,6]}^{{\prime} }\right),{p}_{20y}=\left({{{{\bf{P}}}}}_{[4,7]}^{{\prime} }\right)$$

(19)

The value Q_SD of the spondylolisthesis distance is given by:

$$A={p}_{16y}-{p}_{15y},B={p}_{15x}-{p}_{16x},C=\left({p}_{15y}-{p}_{16y}\right)\times {p}_{15x}+\left({p}_{16x}-{p}_{15x}\right)\times {p}_{15y}$$

(20)

$${{{{\bf{Q}}}}}_{SD}=\left|A\times {p}_{20x}+B\times {p}_{20y}+C\right|/\sqrt{{A}^{2}+{B}^{2}}$$

(21)
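
The spondylolisthesis distance is a point-to-line distance: the line Ax + By + C = 0 passes through the two posterior corners of the upper vertebra, and the distance is measured from a posterior corner of the lower vertebra. A minimal sketch follows; the function name `point_to_line_distance` is hypothetical, and C is written in the equivalent form x_b·y_a − x_a·y_b so that the line passes through both corner points.

```python
import math


def point_to_line_distance(p, a, b):
    """Distance from point p to the infinite line through points a and b.

    All points are (x, y) tuples. The line is written as A*x + B*y + C = 0.
    """
    coef_a = b[1] - a[1]
    coef_b = a[0] - b[0]
    coef_c = b[0] * a[1] - a[0] * b[1]
    norm = math.hypot(coef_a, coef_b)
    if norm == 0.0:
        raise ValueError("a and b must be distinct points")
    return abs(coef_a * p[0] + coef_b * p[1] + coef_c) / norm
```

For the C5-6 example, `a` and `b` would be p_15 and p_16, and `p` would be p_20.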

The value Q_SA of spondylolisthesis angle is given by:

$${k}_{1}=\left({p}_{15y}-{p}_{16y}\right)/\left({p}_{15x}-{p}_{16x}+\delta \right),{k}_{2}=\left({p}_{19y}-{p}_{20y}\right)/\left({p}_{19x}-{p}_{20x}+\delta \right),\delta=1e-9$$

(22)

$${{{{\bf{Q}}}}}_{SA}=\arctan (({k}_{1}-{k}_{2})/(1+{k}_{1}\times {k}_{2}))\times \left(18{0}^{\circ }/\pi \right)$$

(23)

The assessment of spinal stenosis relies on the Pavlov value calculated as:

$${{{\bf{Pavlov}}}}={{{{\bf{Q}}}}}_{SDVC}/{{{{\bf{Q}}}}}_{SDVB}$$

(24)

SDVB is the length of the line segment joining the anterior and posterior mid-points of the vertebra, using the C3 vertebra as an example:

$${p}_{5x}=\left({{{{\bf{P}}}}}_{[1,0]}^{{\prime} }\right),{p}_{5y}=\left({{{{\bf{P}}}}}_{[1,1]}^{{\prime} }\right),{p}_{6x}=\left({{{{\bf{P}}}}}_{[1,2]}^{{\prime} }\right),{p}_{6y}=\left({{{{\bf{P}}}}}_{[1,3]}^{{\prime} }\right)$$

(25)

$${p}_{7x}=\left({{{{\bf{P}}}}}_{[1,4]}^{{\prime} }\right),{p}_{7y}=\left({{{{\bf{P}}}}}_{[1,5]}^{{\prime} }\right),{p}_{8x}=\left({{{{\bf{P}}}}}_{[1,6]}^{{\prime} }\right),{p}_{8y}=\left({{{{\bf{P}}}}}_{[1,7]}^{{\prime} }\right)$$

(26)

The Q_SDVB of C3 was calculated as:

$${p}_{anterio{r}_{x}}=\left({p}_{5x}+{p}_{6x}\right)/2,\,{p}_{anterio{r}_{y}}=\left({p}_{5y}+{p}_{6y}\right)/2$$

(27)

$${p}_{posterio{r}_{x}}=\left({p}_{7x}+{p}_{8x}\right)/2,\,{p}_{posterio{r}_{y}}=\left({p}_{7y}+{p}_{8y}\right)/2$$

(28)

$${{{{\bf{Q}}}}}_{SDVB}=distance\left({p}_{anterior},\,{p}_{posterior}\right)=\sqrt{{\left({p}_{anterio{r}_{x}}-{p}_{posterio{r}_{x}}\right)}^{2}+{\left({p}_{anterio{r}_{y}}-{p}_{posterio{r}_{y}}\right)}^{2}}$$

(29)

Q_SDVC is the shortest distance from the posterior midpoint of the vertebra to the anterior cervical spinous process M, where the matrix M (a set of points) is obtained from the segmentation model. Taking C3 as an example, it can be expressed as:

$${{{{\bf{Q}}}}}_{SDVC}=min(distance\left({p}_{posterior},\,p\in {{{\bf{M}}}}\right))$$

(30)
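
The SDVB, SDVC and Pavlov ratio computations for a single vertebra can be sketched as follows. This is an illustrative example, not the released implementation: the function names are hypothetical, and M is assumed to be available as a plain list of (x, y) points produced by the segmentation model.

```python
import math


def midpoint(p, q):
    """Midpoint of two (x, y) points."""
    return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)


def distance(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def sdvb(ant_sup, ant_inf, post_sup, post_inf):
    """Distance between the anterior and posterior mid-points of the vertebra."""
    return distance(midpoint(ant_sup, ant_inf), midpoint(post_sup, post_inf))


def sdvc(post_sup, post_inf, m_points):
    """Shortest distance from the posterior mid-point to the points in M."""
    if not m_points:
        raise ValueError("M must contain at least one point")
    posterior = midpoint(post_sup, post_inf)
    return min(distance(posterior, p) for p in m_points)


def pavlov_ratio(corners, m_points):
    """Pavlov ratio SDVC / SDVB for one vertebra.

    corners = (antero-superior, antero-inferior, postero-superior,
    postero-inferior), the same order as one row of the coordinate matrix.
    """
    ant_sup, ant_inf, post_sup, post_inf = corners
    return sdvc(post_sup, post_inf, m_points) / sdvb(
        ant_sup, ant_inf, post_sup, post_inf
    )
```

Because both diameters are measured on the same image, any uniform magnification of the coordinates leaves the returned ratio unchanged.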

All of the above metrics Q = [Q_CA, Q_SD, Q_SA, Q_SDVB, Q_SDVC] jointly participate in multi-task training and optimization for X-rays. Our multi-task framework is implemented as an extension of the Faster R-CNN model⁴². In the Faster R-CNN model, the two tasks of classification and localization are jointly learned as two branches after several shared CNN layers. In our implementation (see Supplementary Fig. 1 in the Supplementary Information), each metric in Q is implemented as a branch, with the following loss:

$${L}_{multi-task}={\sum }_{m=1}^{M}{\lambda }_{m}{{{\rm{MSE}}}}\left({{{{\rm{Q}}}}}_{m},{\widehat{{{{\rm{Q}}}}}}_{m}\right)$$

(31)
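
The weighted sum in Eq. (31) can be sketched as below. The example is illustrative: the function names, the dictionary-based interface and the weight values used in the test are hypothetical, since the values of λ_m are not given in this section.

```python
def mse(values, reference):
    """Mean squared error between two equally long sequences of numbers."""
    if len(values) != len(reference) or not values:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum((v - r) ** 2 for v, r in zip(values, reference)) / len(values)


def multi_task_loss(measured, reference, weights):
    """Weighted sum of per-metric MSE terms, one term per metric name.

    measured:  metric name -> values computed from DSNT coordinates
    reference: metric name -> values computed from ground-truth coordinates
    weights:   metric name -> lambda_m (hypothetical values)
    """
    return sum(
        weights[name] * mse(measured[name], reference[name]) for name in weights
    )
```

Each metric contributes its own error term, so a large weight on one indicator steers the shared keypoint detector towards the corner points that this indicator depends on.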

The implementation for processing X-ray images is divided into two stages. The first detection stage detects the coordinates of cervical vertebral key points, while the second prediction stage predicts multiple degenerative indicators. Our detection module extracts the feature map from the input images and then generates the heatmap on the highest resolution feature map, inspired by the HRNet^39,40 and HigherHRNet³⁸. The detection module is a parallel architecture that consists of several branches, where one branch focuses on high-resolution features, while others progressively downsample the input to create low-resolution representations (see Supplementary Fig. 1 in the Supplementary Information). The high-resolution branch consists of several residual blocks that utilize 3 × 3 convolutions, batch normalization, and ReLU activation. Alongside the high-resolution branch, there are low-resolution branches at different scales (1/2, 1/4 the original resolution), and the downsampling is implemented by the strided convolutions. The architecture also employs a fusion mechanism to combine high-resolution and low-resolution feature maps, using upsampling and convolution operations.

After obtaining the feature map through the above backbone network, we use a deconvolution upsampling operation and several residual convolutional layers to obtain the final high-resolution heatmaps. We then use the Differentiable Spatial to Numerical Transform (DSNT)⁴¹ to obtain the mapped pixel coordinates from the normalized heatmap prediction and group the keypoints of different instances. Finally, the predicted values of the various degenerative indicators are calculated by the linear layers from the coordinates obtained by the DSNT mapping.

The diagnosis of intervertebral disc degeneration on MRI could also be implemented as a branch after several shared CNN layers. However, considering the visual differences between X-ray and MRI images, we separately train a different CNN model for MRI images (see Supplementary Fig. 2 in the Supplementary Information).

Evaluation metrics

To verify the capability of our model in diagnosing cervical spondylosis, we evaluate our method from two aspects: measurement error and recognition performance. The measurement error consists of the Mean Absolute Error (MAE) of angles and distances, calculated between the predictions of our model and the gold standard annotated by the expert. Recognition performance is evaluated through accuracy, precision, sensitivity, specificity and F1 score.
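
For reference, these metrics can be computed as in the following minimal sketch. The function names are hypothetical, the classification task is assumed to be binary with labels 0 and 1, and a ratio with an empty denominator is reported as 0.0 by convention in this sketch.

```python
def mean_absolute_error(predicted, reference):
    """MAE between predicted and gold-standard measurements."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, sensitivity, specificity and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2.0 * precision * sensitivity, precision + sensitivity),
    }
```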

Implementation details

The width of the original X-ray images ranges from 908 to 2973 pixels, and the height ranges from 1125 to 3032 pixels. With reference to^43,44, we resized all images uniformly to 512 × 512 pixels and performed normalization operations to standardize input distributions. Regarding the hyperparameter configuration for model training, we set the batch size to 32. The model was initialized with the pre-trained HigherHRNet³⁸ weights provided by MMPose⁴⁵, and trained for 300 epochs using the Adam optimizer with a fixed learning rate of 1.5 × 10⁻³.

Statistics and reproducibility

This retrospective study was conducted on a cohort of 1151 patients. The dataset was randomly partitioned into training (n = 693), validation (n = 228), and testing (n = 230) sets. No statistical method was used to predetermine the sample size, and no data were excluded following initial eligibility screening. The reproducibility of the analysis is facilitated by the publicly archived code.
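
A random partition with these set sizes can be sketched as follows. The example is illustrative only: the function name `split_cohort` and the seed value are hypothetical, and the split is assumed to be made at the patient level.

```python
import random


def split_cohort(patient_ids, n_train=693, n_val=228, n_test=230, seed=0):
    """Randomly partition patient identifiers into train, validation and test sets."""
    ids = list(patient_ids)
    if len(ids) != n_train + n_val + n_test:
        raise ValueError("set sizes must add up to the cohort size")
    random.Random(seed).shuffle(ids)  # fixed seed makes the split repeatable
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]
    return train, val, test
```

Fixing the seed makes the partition repeatable across runs, and the three sets are disjoint by construction.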

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The datasets used in this study were provided by Beijing Third Hospital. The datasets are not yet public due to data privacy laws and data transfer agreements from the corresponding centers; access can be obtained through the corresponding author upon request, subject to ethical review. Source data for the figures are provided with this paper.

Code availability

Code and model files are available on the GitHub page: https://github.com/isiay/Diagnostic-analysis-of-multimodal-cervical-degenerative-diseases⁴⁶.

References

1. Wei Lv, Y. et al. The prevalence and associated factors of symptomatic cervical spondylosis in chinese adults: a community-based cross-sectional study. BMC Musculoskeletal Disorders 19 https://api.semanticscholar.org/CorpusID:52190367 (2018).
2. Kazeminasab, S. et al. Neck pain: global epidemiology, trends and risk factors. BMC Musculoskelet. Disord. 23, 1–13 (2022).
3. Tamai, K. et al. A deep learning algorithm to identify cervical ossification of posterior longitudinal ligaments on radiography. Sci. Rep. https://doi.org/10.1038/s41598-022-06140-8 (2022).
4. Lee, G. W., Shin, H. & Chang, M. C. Deep learning algorithm to evaluate cervical spondylotic myelopathy using lateral cervical spine radiograph. BMC Neurol. 22, 147 (2022).
5. Fujinaka, A., Mekata, K., Takizawa, H. & Kudo, H. Segmentation of cervical intervertebral disks in videofluorography by cnn, multi-channelization and feature selection. Int. J. Comput. Assist. Radiol. Surg. 15, 901–908 (2020).
6. Gawroska, A. Lower cervical spine instability among children: the meaning of radiological assessment in contact sports (judo) qualifications. EPOS https://doi.org/10.1594/ecr2017/C-2105 (2016).
7. Pan, Z. et al. Débridement and reconstruction improve postoperative sagittal alignment in kyphotic cervical spinal tuberculosis. Clin. Orthop. Relat. Res. 475, 2084–2091 (2017).
8. Cui, Z. et al. Vertnet: Accurate vertebra localization and identification network from CT images. https://doi.org/10.1007/978-3-030-87240-3_27 (2021).
9. Daenzer, S. et al. Volhog: a volumetric object recognition approach based on bivariate histograms of oriented gradients for vertebra detection in cervical spine MRI. Med. Phys. 41, 504–513 (2014).
10. Bae, H.-J. et al. Fully automated 3d segmentation and separation of multiple cervical vertebrae in ct images using a 2d convolutional neural network. Comput. Methods Programs Biomed. 184, 105119 (2020).
11. Zheng, H.-D. et al. Deep learning-based high-accuracy quantitation for lumbar intervertebral disc degeneration from MRI. Nat. Commun. 13, 841 (2022).
12. Qian, J. et al. Cephann: a multi-head attention network for cephalometric landmark detection. IEEE Access 8, 112633–112641 (2020).
13. Payer, C., Štern, D., Bischof, H. & Urschler, M. Integrating spatial configuration into heatmap regression-based CNNs for landmark localization. Med. image Anal. 54, 207–219 (2019).
14. Pei, Y. et al. Automated measurement of hip–knee–ankle angle on the unilateral lower limb x-rays using deep learning. Phys. Eng. Sci. Med. 44, 53–62 (2021).
15. Liu, C. et al. Misshapen pelvis landmark detection with local-global feature learning for diagnosing developmental dysplasia of the hip. IEEE Trans. Med. Imaging 39, 3944–3954 (2020).
16. Ailon, T. et al. Outcomes of operative treatment for adult cervical deformity: a prospective multicenter assessment with 1-year follow-up. Neurosurgery 83, 1031–1039 (2018).
17. Li, J. et al. Clinical characteristics and radiology measurement on degenerative lower cervical instability. Chin. J. Spine Spinal Cord. 8, 18–21 (1998).
18. Suda, K. et al. Local kyphosis reduces the surgical outcomes of expansive open-door laminoplasty for cervical spondylotic myelopathy. Spine 28, 1258–1262 (2003).
19. Jackson, R. & McManus, A. Radiographic analysis of sagittal plane alignment and balance in standing volunteers and patients with low back pain matched for age, sex, and size. a prospective controlled clinical study. Spine 19, 1611–1618 (1994).
20. Nagoshi, N. et al. Changes in cervical spinal alignment after thoracolumbar corrective surgery in adult patients with adolescent idiopathic scoliosis. Spine 45, 877–883 (2020).
21. Maeda, T. et al. Soft-tissue damage and segmental instability in adult patients with cervical spinal cord injury without major bone injury. Spine 37, E1560–1566 (2012).
22. Wang, Z., Leng, J., Liu, J. & Liu, Y. Morphological study of the posterior osseous structures of subaxial cervical spine in a population from northeastern china. J. Orthop. Surg. Res. 10, 53 (2015).
23. Silber, J. S., Lipetz, J. S., Hayes, V. M. & Lonner, B. S. Measurement variability in the assessment of sagittal alignment of the cervical spine: a comparison of the gore and cobb methods. Clin. Spine Surg. 17, 301–305 (2004).
24. Kong, C. et al. The ratio of c2–c7 Cobb angle to t1 slope is an effective parameter for the selection of posterior surgical approach for patients with multisegmental cervical spondylotic myelopathy. J. Orthop. Sci. 25, 953–959 (2020).
25. Zhang, J., Buser, Z., Abedi, A., Dong, X. & Wang, J. C. Can c2-6 cobb angle replace c2-7 cobb angle?: an analysis of cervical kinetic magnetic resonance images and x-rays. Spine 44, 240–245 (2019).
26. Wang, C. et al. Deep learning model for measuring the sagittal Cobb angle on cervical spine computed tomography. BMC Med. Imaging 23, 196 (2023).
27. Sardjono, T. et al. Automatic Cobb angle determination from radiographic images. Spine 38, 1256–1262 (2013).
28. Wang, L. et al. Accurate automated Cobb angles estimation using a multi-view extrapolation net. Med. image Anal. 58, 101542 (2019).
29. Weng, C.-H. et al. Artificial intelligence for automatic measurement of sagittal vertical axis using the ResNet framework. J. Clin. Med. 8, 1826 (2019).
30. Xiao, Q. et al. Application of TVD-net for sagittal alignment and instability measurements in cervical spine radiographs. Med. Phys. 50, 4182–4196 (2023).
31. Girard, V. et al. Post-traumatic lower cervical spine instability: Arthrodesis clinical and radiological outcomes at 5 years. Orthop. Traumatol.: Surg. Res. 100, 385–388 (2014).
32. Suk, K.-S. et al. Reevaluation of the pavlov ratio in patients with cervical myelopathy. Clin. Orthop. Surg. 1, 6–10 (2009).
33. Oktay, A. B., Albayrak, N. B. & Akgul, Y. S. Computer aided diagnosis of degenerative intervertebral disc diseases from lumbar mr images. Comput. Med. Imaging Graph. 38, 613–619 (2014).
34. Mbarki, W. et al. Lumbar spine discs classification based on deep convolutional neural networks using axial view MRI. Interdiscip. Neurosurg. 22, 100837 (2020).
35. Sustersic, T. et al. A deep learning model for automatic detection and classification of disc herniation in magnetic resonance images. IEEE J. Biomed. Health Inform. 26, 6036–6046 (2022).
36. Adibatti, S., Sudhindra, K. & Manisha Shivaram, J. Segmentation and classification of intervertebral disc using capsule stacked autoencoder. Biomed. Signal Process. Control 86, 105311 (2023).
37. Pfister, T., Charles, J., Zisserman, A. & IEEE. Flowing convnets for human pose estimation in videos https://doi.org/10.1109/ICCV.2015.222 (2015).
38. Cheng, B. et al. Higherhrnet: Scale-aware representation learning for bottom-up human pose estimation. In Proc. of the IEEE/CVF conference on computer vision and pattern recognition, 5386–5395 (2020).
39. Sun, K., Xiao, B., Liu, D. & Wang, J. Deep high-resolution representation learning for human pose estimation. In Proc. of the IEEE/CVF conference on computer vision and pattern recognition, 5693–5703 (2019).
40. Wang, J. et al. Deep high-resolution representation learning for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 43, 3349–3364 (2021).
41. Nibali, A., He, Z., Morgan, S. & Prendergast, L. Numerical coordinate regression with convolutional neural networks. Preprint at arXiv https://doi.org/10.48550/arXiv.1801.07372 (2018).
42. Ren, S., He, K., Girshick, R. B. & Sun, J. Faster R-CNN: towards real-time object detection with region proposal networks. In Cortes, C., Lawrence, N. D., Lee, D. D., Sugiyama, M. & Garnett, R. (eds.) Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7-12, 2015, Montreal, Quebec, Canada, 91–99 https://proceedings.neurips.cc/paper/2015/hash/14bfa6bb14875e45bba028a21ed38046-Abstract.html (2015).
43. Tanno, R. et al. Collaboration between clinicians and vision–language models in radiology report generation. Nat. Med. 31, 599–608 (2025).
44. Niu, C. et al. Medical multimodal multitask foundation model for lung cancer screening. Nat. Commun. 16, 1523 (2025).
45. Contributors, M. Openmmlab pose estimation toolbox and benchmark. https://github.com/open-mmlab/mmpose (2020).
46. Song, X. et al. isiay/diagnostic-analysis-of-multimodal-cervical-degenerative-diseases: Automated diagnostic of cervical spondylosis release https://doi.org/10.5281/zenodo.17532045 (2025).


Acknowledgments

This work was supported by the National Key Research and Development Program of China under Grant 2023YFC2410703 to H.Y., in part by the National Natural Science Foundation of China under Grant 62125207 to S.J., 62032022 to X.S., 82371921 to N.L., 82102638 to H.O., 8257071728, 82171927 to H.Y., in part by the Beijing Natural Science Foundation under Grant Z190020, 7232132 to S.J., JQ22012 to X.S., L252150 to H.O., 7212126 to H.Y., in part by the Beijing Nova Program under Grant 20250484965 to S.J., in part by the Beijing New Health Industry Development Foundation under Grant XM2020-02-006, XM2022-02-002 to H.Y., and in part by Key Clinical Projects of Peking University Third Hospital No. BYSYZD2024014 to E.Z. We would like to thank Huacheng Pang, Qizheng Wang, Yongye Chen, Ke Liu, and Weili Zhao for the data annotation.

Author information

These authors contributed equally: Xinhang Song, Yuan Li, Hanqiang Ouyang.

Authors and Affiliations

State Key Laboratory of AI Safety, Institute of Computing Technology, Chinese Academy of Sciences (CAS), Beijing, China
Xinhang Song, Meiyi Yao, Fangbo Zhao, Kecen Yu & Shuqiang Jiang
Department of Radiology, Peking University Third Hospital, 49 North Garden Road, Haidian District, Beijing, China
Yuan Li, Shuai Tian, Xiaoying Xing, Yan Zhou, Enlong Zhang, Ming Ni, Yuan Yuan, Ning Lang & Huishu Yuan
Department of Orthopaedics, Peking University Third Hospital, 49 North Garden Road, Haidian District, Beijing, China
Hanqiang Ouyang & Liang Jiang
Engineering Research Center of Bone and Joint Precision Medicine, 49 North Garden Road, Haidian District, Beijing, China
Hanqiang Ouyang & Liang Jiang
Beijing Key Laboratory of Spinal Disease Research, 49 North Garden Road, Haidian District, Beijing, China
Hanqiang Ouyang & Liang Jiang
University of Chinese Academy of Sciences, Beijing, China
Meiyi Yao, Fangbo Zhao, Kecen Yu & Shuqiang Jiang
Department of Radiology, Beijing Haidian Section of Peking University Third Hospital (Beijing Haidian Hospital), Beijing, China
Dasheng Li
State Key Laboratory of Vascular Homeostasis and Remodeling, Peking University Third Hospital, Beijing, China
Ning Lang

Authors

Xinhang Song
View author publications
Search author on:PubMed Google Scholar
Yuan Li
View author publications
Search author on:PubMed Google Scholar
Hanqiang Ouyang
View author publications
Search author on:PubMed Google Scholar
Meiyi Yao
View author publications
Search author on:PubMed Google Scholar
Fangbo Zhao
View author publications
Search author on:PubMed Google Scholar
Kecen Yu
View author publications
Search author on:PubMed Google Scholar
Shuai Tian
View author publications
Search author on:PubMed Google Scholar
Xiaoying Xing
View author publications
Search author on:PubMed Google Scholar
Yan Zhou
View author publications
Search author on:PubMed Google Scholar
Enlong Zhang
View author publications
Search author on:PubMed Google Scholar
Ming Ni
View author publications
Search author on:PubMed Google Scholar
Yuan Yuan
View author publications
Search author on:PubMed Google Scholar
Dasheng Li
View author publications
Search author on:PubMed Google Scholar
Ning Lang
View author publications
Search author on:PubMed Google Scholar
Liang Jiang
Huishu Yuan
Shuqiang Jiang

Contributions

S.J., H.Y., L.J., and N.L. conceived the project. X.S., Y.L., and H.O. conceived and designed the study. Y.L., H.O., S.T., X.X., Y.Z., E.Z., M.N., Y.Y., and D.L. collected and analyzed the data. X.S., M.Y., F.Z., and K.Y. performed data processing, conducted the experiments and drafted the manuscript. All authors provided edits to the manuscript.

Corresponding authors

Correspondence to Ning Lang, Liang Jiang, Huishu Yuan or Shuqiang Jiang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks the anonymous reviewers for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Reporting Summary

Transparent Peer Review file

Source data

Source Data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Song, X., Li, Y., Ouyang, H. et al. Automated diagnostic of cervical spondylosis on multimodal medical images with a multi-task deep learning model. Nat Commun 17, 2392 (2026). https://doi.org/10.1038/s41467-026-69023-w

Received: 28 November 2023
Accepted: 13 January 2026
Published: 05 February 2026
Version of record: 12 March 2026
DOI: https://doi.org/10.1038/s41467-026-69023-w