Background & Summary

Magnetic resonance imaging (MRI) is essential for the diagnosis of cervical spine diseases including degenerative spondylosis, spinal infection, and spinal tumors1,2,3,4. Recently, there has been considerable interest in quantitative analysis of MRI to improve and standardize radiologic assessment of cervical spine diseases, with most efforts focusing on degenerative spondylosis2,5. Unfortunately, many such analyses require detailed anatomic segmentation, which is prohibitively time-consuming to perform manually.

Deep learning-enabled automated segmentation could greatly facilitate cervical spine MRI analysis tasks such as quantification of degenerative disc disease. However, only a limited number of studies in the literature have explored automated methods for cervical spine MRI segmentation6,7,8,9 or total spine segmentation on MRI that included the cervical spine10,11,12. To the best of our knowledge, there is no publicly available cervical spine MRI dataset with comprehensive vertebral body and intervertebral disc segmentations to develop and evaluate these methods.

Here we present the Duke University Cervical Spine MRI Segmentation Dataset (CSpineSeg)13, a publicly available MRI dataset comprising 1,255 sagittal T2-weighted cervical spine MRIs from 1,232 patients along with semantic segmentations of vertebral bodies and intervertebral discs. Approximately 40% of the data were manually annotated and verified by expert radiologists with experience in spine imaging. A deep learning model for segmentation of vertebral bodies and intervertebral discs was trained and evaluated on the manually annotated data. The best-performing model was used to generate segmentations for the remaining unannotated data.

CSpineSeg13 differs from existing related spinal image segmentation datasets in that it focuses entirely on MRI rather than CT14,15,16. While automated segmentation models already exist for CT, these models are not easily translated to MRI. Given the importance of MRI for evaluating cervical spine diseases, particularly degenerative spondylosis, CSpineSeg13 fills an important, unmet need in cervical spine imaging research. In addition to the source imaging data, we also provide expert manual segmentations of relevant vertebral anatomy and a pre-trained model that can be used to automatically segment additional studies.

Methods

This retrospective study was approved by the Institutional Review Board (IRB) of Duke University (Protocol Number: Pro00106785) with a waiver of informed consent.

Data collection

From December 2019 to November 2020, we initially identified 1,326 MRI examinations with the study description “MRI CERVICAL SPINE WITHOUT CONTRAST” via a systematic search of the electronic health record (EHR) database at Duke University. MRI exams were downloaded from the institutional imaging archive and manually reviewed for accuracy. To retrieve sequential cervical spine MRIs within a specific date range, we used two DICOM fields, ‘Study Description’ and ‘Study Date’, to construct a targeted query, which was executed through the vendor neutral archive (VNA) application programming interface (API) hosted by Duke University. The ‘Study Date’ tag was used to restrict the study date range. Each examination includes multiple DICOM series. We programmatically selected sagittal T2-weighted series without fat saturation for inclusion in the dataset: candidate series were first identified by filtering for series descriptions containing the string “sag T2”, and the designated radiologist then manually verified the sequence type. Exclusion criteria included missing or incomplete sagittal T2-weighted imaging and exams with a field of view not specifically focused on the cervical spine (e.g. combined cervicothoracic MRI). Figure 1 shows the detailed eligibility criteria for the dataset. 1,255 MRI examinations from 1,232 patients were included in the final dataset. Associated demographic data were retrieved from the EHR using the Duke Enterprise Data Unified Content Explorer (DEDUCE) (https://doi.org/10.1016/j.jbi.2014.07.006).
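For readers who wish to reproduce a comparable selection step on their own archives, the sketch below illustrates the filtering logic (study description, study date range, and a “sag T2” series-description filter) applied to locally downloaded DICOM headers with pydicom. It is not the institutional VNA API query, and the folder path is hypothetical.

```python
# Illustrative filtering of locally downloaded DICOM headers with pydicom; this mirrors the
# selection logic described above but is NOT the institutional VNA API query.
from pathlib import Path
import pydicom

STUDY_DESCRIPTION = "MRI CERVICAL SPINE WITHOUT CONTRAST"
DATE_RANGE = ("20191201", "20201130")      # DICOM StudyDate values are YYYYMMDD strings
dicom_root = Path("downloaded_exams")      # hypothetical local folder of DICOM files

candidate_series = set()
for dcm_path in dicom_root.rglob("*.dcm"):
    ds = pydicom.dcmread(dcm_path, stop_before_pixels=True)   # read headers only
    if ds.get("StudyDescription", "") != STUDY_DESCRIPTION:
        continue
    if not (DATE_RANGE[0] <= ds.get("StudyDate", "") <= DATE_RANGE[1]):
        continue
    # Candidate sagittal T2 series; the sequence type is still verified manually.
    if "sag t2" in str(ds.get("SeriesDescription", "")).lower():
        candidate_series.add(ds.SeriesInstanceUID)

print(f"{len(candidate_series)} candidate sagittal T2 series identified")
```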

Fig. 1

The eligibility criteria for the Dataset.

Patient demographics

Patient demographics are presented in Table 1. The average age at the exam level was 55 ± 17 years. The youngest patient was 18 years old and the oldest was over 90. 557 (45%) patients were male and 675 (55%) were female. 795 (65%) patients were Caucasian/White, 342 (28%) were Black or African American, 24 (2%) were Asian, 9 (<1%) were American Indian or Alaskan Native, and 1 (<1%) was Native Hawaiian or Other Pacific Islander. 58 (5%) patients were Hispanic and 1,141 (93%) were non-Hispanic.

Table 1 Demographic statistics of the dataset, including 1,255 MRI exams from 1,232 patients.

Ground-truth annotation

We selected 491/1,255 exams (39%) from 481/1,232 patients (39%) for manual semantic segmentation using a pseudo-random approach (exams were selected by medical record number in alphabetical order). Manual segmentations were performed by six board-certified radiologists (five with fellowship training in neuroradiology and one with fellowship training in musculoskeletal radiology) and one post-doctoral researcher without medical training. The post-doctoral researcher completed the first draft of the annotations, which were subsequently reviewed and revised by one of the radiologists.

Segmentation was performed using the publicly available ITK-SNAP software tool. Semantic segmentation labels included vertebral bodies (label value 1) and intervertebral discs (label value 2), as demonstrated in Fig. 2. Vertebral bodies were defined as all portions of the vertebral body visible on imaging, including any osteophytes but excluding the pedicles and posterior elements. Annotators were instructed to segment the vertebral bodies from at least C2 to T1 and to include the entire vertebral body up to the junction with the pedicle as well as the uncovertebral joints. Segmentation of C1 was intentionally excluded because it lacks a vertebral body (corpus). Intervertebral discs were defined as all visible disc space, including any disc bulge/herniation or intradiscal fluid/gas. The intervertebral disc segmentations were to include the entire disc, with associated herniations, disc-osteophyte complexes, and adjacent vertebral endplates. The posterior longitudinal ligament was not included in the disc segmentations. Fusion across two or more vertebral segments was treated as a single “block” vertebral segment. Surgical hardware was excluded from the segmentations. The segmentation process was anatomy-focused rather than disease-focused.

Fig. 2

An example of a sagittal T2-weighted MRI with human annotations of vertebral bodies (red) and intervertebral discs (green), and a zoomed-in view of the C5-C7 region.

The series descriptors identified in the annotated exams were used to select the unannotated series for automated labeling. All final manual segmentations were reviewed and approved by one of the six radiologist annotators. The mid-sagittal slice of every unannotated exam was reviewed for quality and consistency by one of the six experts.

Segmentation experiments and evaluation

Manually annotated cervical spine exams were randomly split into a development set (391/491, 80%) and a test set (100/491, 20%). The development set was used to train and validate a deep learning segmentation model using the pre-processing pipelines and five-fold cross-validation implemented in nnU-Net17. Both 2D and 3D U-Net models were trained for 1,000 epochs without early stopping. We evaluated the 2D and 3D models separately as well as an ensemble of both, which averages the softmax probabilities of the two model outputs17. Evaluation was performed on the test set with the Dice similarity coefficient (DSC) as the primary metric, reported as mean and standard deviation. The best-performing segmentation model was applied to the remaining unannotated data. Non-parametric Mann-Whitney U tests18 were applied to compare the non-normally distributed DSC scores of the different model configurations, with the confidence level set at 95%.
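As a concrete illustration of the evaluation, the sketch below shows how a per-label DSC and a Mann-Whitney U comparison between two configurations could be computed; the per-case score arrays are placeholders rather than values from this study.

```python
# Minimal sketch of the evaluation metrics: per-label Dice similarity coefficient and a
# Mann-Whitney U comparison of two configurations. Score arrays below are placeholders.
import numpy as np
from scipy.stats import mannwhitneyu

def dice(pred: np.ndarray, truth: np.ndarray, label: int) -> float:
    """DSC = 2|A intersect B| / (|A| + |B|) for a single label value."""
    p, t = pred == label, truth == label
    denom = p.sum() + t.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(p, t).sum() / denom

# Compare per-case macro-average DSC of two configurations (e.g. 3D model vs. ensemble).
dsc_3d = np.array([0.91, 0.93, 0.90])        # placeholder per-case scores
dsc_ensemble = np.array([0.92, 0.93, 0.91])  # placeholder per-case scores
stat, p_value = mannwhitneyu(dsc_3d, dsc_ensemble, alternative="two-sided")
print(f"U = {stat:.1f}, p = {p_value:.3f}")  # difference not significant if p > 0.05
```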

Data Records

The Duke University Cervical Spine MRI Segmentation Dataset (CSpineSeg)13 is publicly available to facilitate cervical spine imaging research. The data are hosted on the Medical Imaging and Data Resource Center (MIDRC, https://doi.org/10.60701/H6K0-A61V), a medical imaging data commons funded by the National Institute of Biomedical Imaging and Bioengineering (NIBIB). MRI data adhere to the DICOM format and underwent thorough de-identification via the RSNA DICOM anonymizer tool (Version 16, https://github.com/RSNA/mirc.rsna.org/Anonymizer-installer.jar), with dates modified in a way that preserves the temporal interval between scans for the same patient. The de-identified data were strictly reviewed and verified by the institution. Users can download the full dataset via https://doi.org/10.60701/H6K0-A61V under the title “Duke-CspineSeg”. Four download links to compressed files are provided: Structured Data TSVs, MRI Image Files, Segmentation Files, and Annotation Files. The file structure is shown in Fig. 3.

Fig. 3

The file directory tree of the dataset folder, with an example of the case “593973-000003” in the dataset. “DukeCSpineSeg_structured” stores the dataset metadata in six tab-delimited files. “DukeCSpineSeg_imaging_files” stores the DICOM files for each study of this patient; the original DICOM files are saved in a compressed “.zip” file, and after decompressing it users can access the 14 original DICOM files for this series of the study. “DukeCSpineSeg_annotation” stores the NIfTI file converted from the 14 DICOM files, and “DukeCSpineSeg_segmentation” stores the annotation masks in NIfTI format.

Structured Data TSVs

This download link contains six structured metadata files in tab-delimited format. Specifically, demographic information (e.g. age_at_index, sex, race, ethnicity) is available in Clinical_manifest_RSNA_20250321.tsv, and the imaging parameters (e.g. echo time, repetition time, slice thickness, spacing_between_slices, manufacturer, etc.) are available in mr_series_RSNA_20250321.tsv. MIDRC generates the codebook for the recorded metadata using its in-house data dictionary model (available at https://github.com/uc-cdis/midrc_dictionary) and a visual tabular representation (available on the MIDRC data portal, https://data.midrc.org/DD).
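For example, the two TSVs named above can be inspected with standard tools such as pandas; the column names shown follow the fields mentioned above, but the full column set should be checked against the MIDRC data dictionary.

```python
# Sketch of loading the structured metadata TSVs with pandas; only the columns named in the
# text are referenced here, and the full column set is defined by the MIDRC data dictionary.
import pandas as pd

clinical = pd.read_csv("Clinical_manifest_RSNA_20250321.tsv", sep="\t")
series = pd.read_csv("mr_series_RSNA_20250321.tsv", sep="\t")

print(clinical[["age_at_index", "sex", "race", "ethnicity"]].head())
print(series.columns.tolist())   # imaging parameters such as echo time, repetition time, etc.
```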

MRI image files

Users can access the de-identified DICOM files in this folder. The directory follows the structure “./case_image/{Patient_ID}/{Study_Instance_UID}/{Series_Instance_UID}.zip”, where Patient_ID, Study_Instance_UID, and Series_Instance_UID are taken from the attributes stored in the de-identified DICOM files. After unzipping a file, users obtain the sequence of DICOM files for that series.
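A minimal sketch of reading one downloaded series is shown below; the study and series identifiers in the path are placeholders, and the slices are assumed to be sortable by InstanceNumber.

```python
# Sketch of reading one downloaded series: unzip the series archive and load the slices
# with pydicom. The study/series identifiers in the path are placeholders.
import zipfile
from pathlib import Path
import pydicom

series_zip = Path("case_image/593973-000003/STUDY_UID/SERIES_UID.zip")  # placeholder UIDs
out_dir = series_zip.with_suffix("")
with zipfile.ZipFile(series_zip) as zf:
    zf.extractall(out_dir)

# Load every extracted file as DICOM and sort the slices by InstanceNumber.
slices = [pydicom.dcmread(p) for p in out_dir.rglob("*") if p.is_file()]
slices.sort(key=lambda ds: int(ds.InstanceNumber))
print(f"Loaded {len(slices)} slices with matrix size {slices[0].pixel_array.shape}")
```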

Annotation files

This is a flat directory that stores the original MRIs in NIfTI format, converted using dcm2niix19. Each converted NIfTI file is named from the de-identified DICOM attributes Patient_ID, Accession_Number, Series Number, and Instance Number, using the following format: “{Patient_ID}/Study-MR-{Accession_Number}/Series-{Series_Number}/{Instance_Number}.nii.gz”.

Segmentation files

This is a flat directory that stores the segmentation mask for each NIfTI image in the Annotation Files. The segmentation files are stored in NIfTI format, with the suffix “_SEG” appended to the original file name before the “.nii.gz” extension.
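The pairing between images and masks can be checked programmatically, for example with nibabel; the case name below is a placeholder, and only the “_SEG” suffix convention is taken from the description above.

```python
# Sketch of pairing an image NIfTI with its segmentation mask using nibabel; the case
# name is a placeholder, and "_SEG" follows the naming convention described above.
import nibabel as nib
import numpy as np

img = nib.load("DukeCSpineSeg_annotation/CASE.nii.gz")        # placeholder file name
seg = nib.load("DukeCSpineSeg_segmentation/CASE_SEG.nii.gz")  # matching mask

image = img.get_fdata()
mask = seg.get_fdata().astype(np.uint8)
assert image.shape == mask.shape        # image and mask share the same voxel grid
labels, counts = np.unique(mask, return_counts=True)
# Expected labels: 0 = background, 1 = vertebral bodies, 2 = intervertebral discs
print(dict(zip(labels.tolist(), counts.tolist())))
```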

Technical Validation

Deep learning segmentation performance

We report the segmentation model performance in Table 2. All three model configurations performed well. The ensembled outputs reached the best performance for segmentation of vertebral bodies (DSC = 0.929), intervertebral discs (DSC = 0.904), and the macro-average of both labels (DSC = 0.916), although the differences from the other two configurations were not statistically significant (P > 0.05, non-parametric tests). Performance was robust across cross-validation folds (Supplementary Materials A). Figure 4 shows the DSC distributions for vertebral bodies and intervertebral discs. The segmentation performance was comparable to previously published work on vertebrae segmentation of the lumbar spine20,21,22.

Table 2 Mean ± standard deviation of Dice similarity coefficients for vertebral bodies, intervertebral discs, and their macro-averages for three nnU-Net model configurations.
Fig. 4

Boxplots of the DSC distributions for vertebral bodies (median DSC = 0.950) and intervertebral discs (median DSC = 0.917). Medians are shown as orange horizontal lines.

Limitations and future work

CSpineSeg13 has the following limitations. First, the labeling of vertebral bodies and intervertebral discs is binary, without classification of vertebral level (i.e. C1-C7). Future work can address level classification using either manual effort or automated methods, for example connected component analysis. We have briefly explored the connected component method and provide preliminary results for per-intervertebral-disc segmentation based on the middle slice of the test cases (Supplementary Materials B.); a minimal sketch of this idea is given below, and we hope the method can be further refined in future studies. Second, the automatically labeled data generated by this model have not undergone manual review and should be considered “weakly” labeled; their use is at the user's discretion. Third, the provided annotations are anatomical rather than pathological. We did not consider the presence or absence of cervical spine pathology in this dataset, and individual exams are not labeled for pathology. Future label extensions, such as adding pathological labels by expert review, analyzing Modic changes23, or grading intervertebral disc degeneration using the Pfirrmann grading system24, are feasible for downstream clinical analysis. Fourth, the manual annotations of each exam were verified by one of the six board-certified readers. This workflow reflects a common radiology exam evaluation process but lacks an evaluation of multi-rater variability. We are happy to address any annotation issues discovered in the future.
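The sketch below illustrates the connected-component idea for deriving per-level disc instances from the mid-sagittal slice of a binary mask. It is not the exact supplementary implementation; the slice axis, minimum component size, and ordering are assumptions.

```python
# Minimal sketch of per-level disc instances via 2D connected components on the
# mid-sagittal slice. Slice axis, size threshold, and ordering axis are assumptions.
import numpy as np
from scipy import ndimage

def discs_by_level(mask_volume: np.ndarray, disc_label: int = 2, min_size: int = 20):
    """Return mid-sagittal disc components ordered roughly superior-to-inferior."""
    mid_slice = mask_volume[mask_volume.shape[0] // 2]      # assumes sagittal slices on axis 0
    components, n = ndimage.label(mid_slice == disc_label)  # 2D connected components
    discs = []
    for i in range(1, n + 1):
        component = components == i
        if component.sum() < min_size:                       # drop tiny spurious components
            continue
        centroid = ndimage.center_of_mass(component)
        discs.append((centroid, component))
    discs.sort(key=lambda item: item[0][0])                  # sort along assumed cranio-caudal axis
    return [component for _, component in discs]
```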

Usage Notes

To download the dataset from MIDRC, users must register and log into their account. Users are required to strictly follow the MIDRC Data Use Agreement (https://www.midrc.org/midrc-data-use-agreement) when accessing and using the de-identified data. The DICOM files of one series can be visualized in ITK-SNAP by importing the series folder. To correctly visualize the segmentation annotation, we recommend loading the MRI NIfTI as the main image and its associated segmentation NIfTI as the segmentation in ITK-SNAP, in which Label 1 (red) represents vertebral bodies and Label 2 (green) represents intervertebral discs. We used nnU-Net V2 (https://github.com/MIC-DKFZ/nnUNet/tree/master/nnunetv2) for this work. Previously trained model weights can be accessed via Google Drive upon reasonable request.
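Assuming the shared weights have been installed into a local nnU-Net v2 results folder, inference on additional NIfTI images could be run roughly as sketched below; the dataset ID and configuration are placeholders and should be taken from the instructions accompanying the weights.

```python
# Hedged sketch of nnU-Net v2 inference on new NIfTI images; the dataset ID (501) and
# configuration below are placeholders, not values fixed by this work.
import subprocess

subprocess.run(
    [
        "nnUNetv2_predict",
        "-i", "new_cases_nifti",          # input folder (nnU-Net naming, e.g. *_0000.nii.gz)
        "-o", "predicted_segmentations",  # output folder for predicted masks
        "-d", "501",                      # placeholder dataset ID
        "-c", "3d_fullres",               # or "2d" / an ensemble, depending on released weights
    ],
    check=True,
)
```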

Both the test set and the weakly labeled set were collected from our institution; thus, we believe that the high performance of the trained segmentation model on the test set (DSC > 90%) is indicative of the quality of the weakly labeled data. However, owing to limited radiologist availability, the weakly labeled data were not evaluated by our annotation team, and we recommend additional manual review to refine them. This process can be accelerated by using an active learning framework25. Additionally, by combining both weakly and strongly labeled data, semi-supervised learning approaches can be applied to refine the segmentation models26,27,28.