Background & Summary

The widespread use of imaging methods has contributed to a 2.4-fold increase in the incidence of thyroid cancer over the last 30 years, the fastest increase of any cancer type1,2, while the mortality rate for thyroid cancer has remained unchanged or declined3. This has created a need for accurate diagnosis of thyroid cancer. Two methods are commonly used to distinguish benign from malignant thyroid nodules: non-invasive ultrasonography (US)4 and invasive fine-needle aspiration biopsy (FNAB)5. Ultrasound is currently the first clinical choice for thyroid nodule screening because it is free of ionizing radiation, easy to operate, and provides a rapid diagnostic work-up6,7,8,9,10. In practice, however, the ultrasound features of benign and malignant thyroid nodules overlap, and nodules often have blurred margins and irregular shapes11,12, making manual feature extraction and annotation difficult even for experienced clinicians. In addition, the operator’s clinical experience and differing diagnostic criteria can interfere with the physician’s assessment of thyroid nodules13,14. The sensitivity of US for diagnosing thyroid cancer ranges from only 27% to 63%15,16. In clinical practice, FNAB or surgical resection is recommended for nodules with a Thyroid Imaging Reporting and Data System (TI-RADS) stage >317. FNAB, however, causes trauma to patients and incurs additional costs. Moreover, FNAB is not always conclusive and depends on accurate ultrasound localization of nodules: it fails to provide a definitive diagnosis in at least 20% of patients, and repeated FNABs may still not yield definitive results18,19,20,21. Therefore, to address the rapidly increasing patient demand and reduce the burden on healthcare services, computer-aided diagnosis has been introduced to improve diagnostic performance by automating image analysis and providing a robust and reliable diagnosis22.

Deep learning (DL) is a subset of machine learning (ML) and artificial intelligence (AI) that can automatically extract features from images with complex hierarchical structures23. DL models such as convolutional neural networks have achieved better diagnostic results than experienced radiologists in thyroid imaging24,25,26. In practice, however, thyroid ultrasonography locates the region of interest by reading dynamic images, and the number of frames per examination is usually large and uneven. Different nodules of the same patient may carry different labels, and annotating every image is a tedious and time-consuming task. Furthermore, previous DL-based methods have typically been trained using TI-RADS reports as image labels, after which histological examination is still required to obtain the final diagnosis. Given the difficulty and inconsistency of annotating dynamic US images, FNAB remains the gold standard for nodule diagnosis even after ultrasound assessment. We have therefore built a thyroid nodule US dataset with direct histological diagnostic labels, aiming to explore the relationship between ultrasound images and videos and the histological diagnosis.

Methods

Subject characteristics

Thyroid nodule ultrasound images from 842 cases were collected at the Second Affiliated Hospital of Jiaxing University in China between 2019 and 2022 (see Fig. 1). The dataset was prepared according to the following inclusion criteria: (1) hemi- or total thyroidectomy, (2) maximum nodule diameter 2.5 cm, (3) examination by conventional US and real-time elastography (RTE) within 1 month before surgery, and (4) no previous thyroid surgery or percutaneous thermotherapy. This study received approval from the institutional review board of the Second Affiliated Hospital of Jiaxing University (No. 2022ZFYJ295-01). The requirement of obtaining written informed consent from patients was waived because the retrospective data collection did not affect the standard diagnostic procedures and all data were anonymized before being entered into the database. The ethics committee issued a waiver of consent and approved the open publication of the dataset.

Fig. 1 Thyroid image instances.

Image acquisition

Ultrasound acquisitions were performed using a portable machine equipped with a 3.5 MHz probe; an Esaote MyLab system was used at the Second Affiliated Hospital of Jiaxing University. During each acquisition, the operator positioned the probe over the thyroid nodule. The original images were cropped to remove sensitive patient information. Histological labels were assigned according to the histological diagnosis report of the corresponding tissue slides.

Data Records

All ultrasound images, in JPG format, for each case with a pathological diagnosis annotation (benign or malignant) are available in the public figshare repository27. The JPG images and their associated metadata in .csv format are stored in .zip files within the repository. The dataset file structure is shown in Fig. 2; demographic and histological label information can be matched to images using the corresponding case name in the .csv files.
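No loader is prescribed for the repository, but pairing images with their histological labels is straightforward. The sketch below is illustrative only: it assumes, hypothetically, that the .csv contains `case` and `label` columns and that each JPG file name begins with its case name (e.g. `case001_03.jpg`); the actual column names and file naming convention should be checked against the released files.

```python
import csv
from pathlib import Path
from collections import defaultdict

def load_labels(csv_path):
    """Map each case name to its histological label (benign/malignant).
    Column names 'case' and 'label' are assumed for illustration."""
    labels = {}
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            labels[row["case"]] = row["label"]
    return labels

def group_images_by_case(image_dir):
    """Group JPG files by case, assuming file names start with the
    case name followed by an underscore, e.g. 'case001_03.jpg'."""
    cases = defaultdict(list)
    for p in Path(image_dir).glob("*.jpg"):
        case = p.stem.split("_")[0]
        cases[case].append(p)
    return cases
```

With these two helpers, all images of one patient and the patient-level label can be retrieved together, which matches the per-case organization of the dataset.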

Fig. 2 Overview of the datasets structure.

Technical Validation

To validate the dataset proposed in this study, we introduced a novel dual attention-guided deep learning framework, ThyUS2Path, and assessed its performance against two state-of-the-art multiple instance learning (MIL) baselines, Meanpool and Maxpool. MIL is a type of weakly supervised learning in which training instances are arranged in sets (bags) and a label is provided for the entire set rather than for the individual instances. The dataset was organized into two batches: (1) Batch 1, consisting of 6,005 thyroid images from 601 patients, and (2) Batch 2, consisting of 2,503 thyroid images from 241 patients. The subject demographics of these datasets are shown in Table 1. Model training was conducted using a 5-fold cross-validation strategy. For Batch 1, 90% of the patients were allocated to the training-validation set, while the remaining 10% were reserved for testing. The training-validation set was further split using 5-fold cross-validation, with each fold comprising 4,380 images for training and 1,077 images for validation, ensuring no overlap between validation sets. Finally, 548 images were used for internal testing, and all 2,503 images from Batch 2 were used for external independent validation.
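The two baselines and the patient-level splitting described above can be sketched in plain Python. This is a simplified illustration under assumed names (per-image scores, a 0.5 decision threshold), not the implementation used in this study:

```python
import random

def mean_pool(instance_scores):
    """Bag score = average of the per-image scores (Meanpool baseline)."""
    return sum(instance_scores) / len(instance_scores)

def max_pool(instance_scores):
    """Bag score = maximum per-image score (Maxpool baseline)."""
    return max(instance_scores)

def classify_bag(instance_scores, pool=max_pool, threshold=0.5):
    """A patient (bag) is called malignant if the pooled score exceeds
    the threshold; all images share the single bag-level label."""
    return "malignant" if pool(instance_scores) > threshold else "benign"

def patient_level_split(patient_ids, test_frac=0.1, n_folds=5, seed=0):
    """Split at the patient level so that images from one patient never
    appear in more than one partition (avoids train/test leakage)."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    n_test = int(len(ids) * test_frac)
    test_ids, trainval_ids = ids[:n_test], ids[n_test:]
    folds = [trainval_ids[i::n_folds] for i in range(n_folds)]
    return trainval_ids, test_ids, folds
```

Splitting by patient rather than by image is the essential point: because each patient contributes multiple images, an image-level split would leak near-duplicate frames of the same nodule across partitions.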

Table 1 Patient characteristics in Batch1 dataset and Batch2 dataset.

Our model is composed of three interconnected modules (Fig. 3):

1) Backbone Network: We adapted the state-of-the-art ResNet-34 as our backbone network to extract relevant features from each thyroid image. Specifically, we removed the final classification layer, i.e., the fully connected layer with 1,000 neurons. The backbone network processes the thyroid images from each patient to automatically extract tumor-related nodule patterns.

2) Dual Attention Feature Aggregation Module: The features extracted by the backbone network are passed through a dual attention module (Fig. 3). This module comprises two sub-modules: spatial attention and instance attention. The input to the dual attention module is the output of the final convolutional layer of the backbone network. First, the spatial attention sub-module filters the input features of the multiple images from each patient along the spatial dimension to capture subtle relationships between adjacent regions within each image; importance scores are generated for each region to quantify its contribution to the final prediction. These spatially filtered features are then processed by the instance attention sub-module, which assigns attention scores to weight the different images from each case. This process aggregates the features into a patient-level thyroid nodule representation.

3) Fully Connected Layer Classifier: The final module is a fully connected layer with 2 neurons, which converts the patient-level features generated by the dual attention module into a histological diagnosis prediction for the patient.
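The three modules above can be sketched schematically as follows. This is a simplified NumPy illustration under assumed tensor shapes, with hypothetical learned parameters (`w_spatial`, `w_instance`, `w_fc`, `b_fc`); it is not the authors' ThyUS2Path implementation, which operates on ResNet-34 feature maps:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax over the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatial_attention(features, w_spatial):
    """features: (n_images, n_regions, d) backbone feature maps, flattened
    over the spatial grid. Each region receives an importance score and
    regions are aggregated into one vector per image."""
    scores = softmax(features @ w_spatial, axis=1)   # (n_images, n_regions, 1)
    return (features * scores).sum(axis=1)           # (n_images, d)

def instance_attention(features, w_instance):
    """Weight the images of one patient and aggregate them into a single
    patient-level representation."""
    scores = softmax(features @ w_instance, axis=0)  # (n_images, 1)
    return (features * scores).sum(axis=0)           # (d,)

def classify(patient_feature, w_fc, b_fc):
    """2-neuron fully connected head -> benign/malignant probabilities."""
    return softmax(patient_feature @ w_fc + b_fc)
```

The key design choice mirrored here is the two-stage aggregation: attention first pools spatial regions within each image, then pools across a patient's images, so the classifier sees one vector per patient regardless of how many frames were acquired.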

Fig. 3 Overview of the proposed deep learning framework in this study. (a,b) The training strategy, where multiple thyroid images from each patient are input into ThyUS2Path to predict the histological label. (c) The structure of the framework, which is composed of three main components: the backbone network, the dual-attention module, and the fully connected classifier.

We utilized 6,005 thyroid images from 601 patients to develop the deep learning framework. The dataset characteristics are summarized in Fig. 4. As depicted in Fig. 4a, the training dataset included 601 patients, comprising 218 benign cases and 383 malignant cases, each with a corresponding histological label. Since each patient may have multiple thyroid images, a statistical overview of all patients is provided in Fig. 4b. Representative thyroid nodule images from malignant and benign patients are shown for comparison in Fig. 1.

Fig. 4 Performance comparison between ThyUS2Path and state-of-the-art MIL-based methods on the internal test set.

Due to variations introduced by different operators in thyroid ultrasonography, images from different cohorts can differ in appearance, so ensuring the generalizability of computational ultrasonography algorithms to real-world clinical data is essential. To address this, we collected 2,503 thyroid nodule images from a separate cohort as an external test dataset. As shown in Table 2 and Fig. 5, the results on this dataset were promising, with AUROCs (area under the receiver operating characteristic curve) ranging from 0.70 to 0.80 and AUPRCs (area under the precision-recall curve) from 0.78 to 0.83, demonstrating that ThyUS2Path generalized well to heterogeneous real-world data. The two MIL-based baselines also yielded promising results, further supporting the reliability of our dataset. These findings can be used to evaluate the generalization performance of different deep learning algorithms for thyroid nodule identification.
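For readers reproducing these figures, AUROC can be computed directly from patient-level scores without plotting a curve. A minimal sketch using the rank-based (Mann-Whitney) formulation, not tied to the evaluation code used in this study:

```python
def auroc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic:
    the probability that a randomly chosen positive is scored higher
    than a randomly chosen negative (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

This pairwise form is equivalent to integrating the ROC curve and is convenient for small external test sets; library implementations (e.g. scikit-learn's `roc_auc_score`) should give the same value.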

Table 2 Model performance of five-fold cross validation on the external test set.
Fig. 5 ROC and PR curves of ThyUS2Path and state-of-the-art MIL-based methods on the external test set.

Limitations

The datasets and model have some limitations. First, because the dataset comes from retrospective cohorts, some images may suffer from quality issues such as low resolution. Second, we validated the dataset on only three models; a more comprehensive evaluation should include a broader range of models to establish a more robust baseline. Despite these limitations, the dataset still provides valuable insights into thyroid nodule identification and can serve as a foundation for future research.