Background & Summary

The widespread use of imaging methods has contributed to a 2.4-fold increase in the incidence of thyroid cancer over the last 30 years, the fastest increase of any cancer type1,2, while the mortality rate for thyroid cancer has remained unchanged or declined3. This has created a need for accurate diagnosis of thyroid cancer. Two methods are commonly used to distinguish benign from malignant thyroid nodules: non-invasive ultrasonography (US)4 and invasive fine-needle aspiration biopsy (FNAB)5. Ultrasound is currently the first clinical choice for thyroid nodule screening because it is free of ionizing radiation, easy to operate, and provides a rapid diagnostic work-up6,7,8,9,10. In practice, however, the ultrasound features of benign and malignant thyroid nodules overlap, and nodules often have blurred margins and irregular shapes11,12, making manual feature extraction and annotation difficult even for experienced clinicians. In addition, the operator’s clinical experience and differing diagnostic criteria can interfere with the physician’s assessment of thyroid nodules13,14. The sensitivity of US for diagnosing thyroid cancer ranges from only 27% to 63%15,16. In clinical practice, FNAB or surgical resection is recommended for nodules with a Thyroid Imaging Reporting and Data System (TI-RADS) stage >317. FNAB, however, causes trauma to patients and incurs additional costs. Moreover, FNAB is not always conclusive and depends on accurate ultrasound localization of nodules: it fails to provide a definitive diagnosis in at least 20% of patients, and repeated FNABs may still not yield definitive results18,19,20,21. Therefore, to address the rapidly increasing patient demand and reduce the burden on healthcare services, computer-aided diagnosis has been introduced to improve diagnostic performance by automating image analysis and providing a robust and reliable diagnosis22.

Deep learning (DL) is a subset of machine learning (ML) and artificial intelligence (AI) that can automatically extract features from images with complex hierarchical structures23. DL models such as convolutional neural networks have achieved better diagnostic results than experienced radiologists in thyroid imaging24,25,26. In practice, however, thyroid ultrasonography locates the region of interest by reading dynamic images, and the number of frames per examination is usually large and uneven. Different nodules of the same patient may carry different labels, and annotating every image is a tedious and time-consuming task. Furthermore, previous DL-based methods have typically been trained using TI-RADS reports as image labels, after which histological examination is still required to obtain the final diagnosis. Given the difficulty and inconsistency of annotating dynamic US images, FNAB remains the gold standard for nodule diagnosis even after ultrasound assessment. We have therefore built a thyroid nodule US dataset with direct histological diagnostic labels, aiming to explore the relationship between ultrasound images and videos and the histological diagnosis.

Methods

Subject characteristics

Thyroid nodule ultrasound images from 842 cases were collected at the Second Affiliated Hospital of Jiaxing University in China between 2019 and 2022 (see Fig. 1). The dataset was prepared according to the following inclusion criteria: (1) hemi- or total thyroidectomy, (2) maximum nodule diameter 2.5 cm, (3) examination by conventional US and real-time elastography (RTE) within 1 month before surgery, and (4) no previous thyroid surgery or percutaneous thermotherapy. This study received approval from the institutional review board of the Second Affiliated Hospital of Jiaxing University (No. 2022ZFYJ295-01). The requirement of obtaining written informed consent from patients was waived because the retrospective data collection did not affect the standard diagnostic procedures and all data were anonymized before being entered into the database. The ethics committee issued a waiver of consent and approved the open publication of the dataset.

Fig. 1 Thyroid image instances.

Image acquisition

Ultrasound acquisitions were performed using a portable machine equipped with a 3.5 MHz probe; an Esaote MyLab system was used at the Second Affiliated Hospital of Jiaxing University. During each acquisition, the operator positioned the probe over the thyroid nodule. The original images were cropped to remove sensitive patient information. Histological labels were assigned according to the histological diagnosis report of the corresponding tissue slides.

Data Records

All ultrasound images, in JPG format, for each case with a pathological diagnosis annotation (benign or malignant) are available in the public figshare repository27. The JPG images and their associated metadata in .csv format are stored in .zip files within the repository. The dataset file structure is shown in Fig. 2; demographic and histological label information can be matched to images using the corresponding case name in the .csv files.
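No loader is prescribed for the repository, but pairing images with their histological labels is straightforward. The sketch below is illustrative only: it assumes, hypothetically, that the .csv contains `case` and `label` columns and that each JPG file name begins with its case name (e.g. `case001_03.jpg`); the actual column names and file naming convention should be checked against the released files.

```python
import csv
from pathlib import Path
from collections import defaultdict

def load_labels(csv_path):
    """Map each case name to its histological label (benign/malignant).
    Column names 'case' and 'label' are assumed for illustration."""
    labels = {}
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            labels[row["case"]] = row["label"]
    return labels

def group_images_by_case(image_dir):
    """Group JPG files by case, assuming file names start with the
    case name followed by an underscore, e.g. 'case001_03.jpg'."""
    cases = defaultdict(list)
    for p in Path(image_dir).glob("*.jpg"):
        case = p.stem.split("_")[0]
        cases[case].append(p)
    return cases
```

With these two helpers, all images of one patient and the patient-level label can be retrieved together, which matches the per-case organization of the dataset.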

Fig. 2 Overview of the datasets structure.

Technical Validation

To validate the dataset proposed in this study, we introduced a novel dual attention-guided deep learning framework, ThyUS2Path, and assessed its performance against two state-of-the-art multiple instance learning (MIL) baselines, Meanpool and Maxpool. MIL is a type of weakly supervised learning in which training instances are arranged in sets (bags) and a label is provided for the entire set rather than for the individual instances. The dataset was organized into two batches: (1) Batch 1, consisting of 6,005 thyroid images from 601 patients, and (2) Batch 2, consisting of 2,503 thyroid images from 241 patients. The subject demographics of these datasets are shown in Table 1. Model training was conducted using a 5-fold cross-validation strategy. For Batch 1, 90% of the patients were allocated to the training-validation set, while the remaining 10% were reserved for testing. The training-validation set was further split using 5-fold cross-validation, with each fold comprising 4,380 images for training and 1,077 images for validation, ensuring no overlap between validation sets. Finally, 548 images were used for internal testing, and all 2,503 images from Batch 2 were used for external independent validation.
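The two baselines and the patient-level splitting described above can be sketched in plain Python. This is a simplified illustration under assumed names (per-image scores, a 0.5 decision threshold), not the implementation used in this study:

```python
import random

def mean_pool(instance_scores):
    """Bag score = average of the per-image scores (Meanpool baseline)."""
    return sum(instance_scores) / len(instance_scores)

def max_pool(instance_scores):
    """Bag score = maximum per-image score (Maxpool baseline)."""
    return max(instance_scores)

def classify_bag(instance_scores, pool=max_pool, threshold=0.5):
    """A patient (bag) is called malignant if the pooled score exceeds
    the threshold; all images share the single bag-level label."""
    return "malignant" if pool(instance_scores) > threshold else "benign"

def patient_level_split(patient_ids, test_frac=0.1, n_folds=5, seed=0):
    """Split at the patient level so that images from one patient never
    appear in more than one partition (avoids train/test leakage)."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    n_test = int(len(ids) * test_frac)
    test_ids, trainval_ids = ids[:n_test], ids[n_test:]
    folds = [trainval_ids[i::n_folds] for i in range(n_folds)]
    return trainval_ids, test_ids, folds
```

Splitting by patient rather than by image is the essential point: because each patient contributes multiple images, an image-level split would leak near-duplicate frames of the same nodule across partitions.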

Table 1 Patient characteristics in Batch1 dataset and Batch2 dataset.

Our model is composed of three interconnected modules (Fig. 3):

1) Backbone Network: We adapted the state-of-the-art ResNet-34 as our backbone network to extract relevant features from each thyroid image. Specifically, we removed the final classification layer, i.e., the fully connected layer with 1,000 neurons. The backbone network processes the thyroid images from each patient to automatically extract tumor-related nodule patterns.

2) Dual Attention Feature Aggregation Module: The features extracted by the backbone network are passed through a dual attention module (Fig. 3). This module comprises two sub-modules: spatial attention and instance attention. The input to the dual attention module is the output of the final convolutional layer of the backbone network. First, the spatial attention sub-module filters the input features of the multiple images from each patient along the spatial dimension to capture subtle relationships between adjacent regions within each image; importance scores are generated for each region to quantify its contribution to the final prediction. These spatially filtered features are then processed by the instance attention sub-module, which assigns attention scores to weight the different images from each case. This process aggregates the features into a patient-level thyroid nodule representation.

3) Fully Connected Layer Classifier: The final module is a fully connected layer with 2 neurons, which converts the patient-level features generated by the dual attention module into a histological diagnosis prediction for the patient.
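The three modules above can be sketched schematically as follows. This is a simplified NumPy illustration under assumed tensor shapes, with hypothetical learned parameters (`w_spatial`, `w_instance`, `w_fc`, `b_fc`); it is not the authors' ThyUS2Path implementation, which operates on ResNet-34 feature maps:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax over the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatial_attention(features, w_spatial):
    """features: (n_images, n_regions, d) backbone feature maps, flattened
    over the spatial grid. Each region receives an importance score and
    regions are aggregated into one vector per image."""
    scores = softmax(features @ w_spatial, axis=1)   # (n_images, n_regions, 1)
    return (features * scores).sum(axis=1)           # (n_images, d)

def instance_attention(features, w_instance):
    """Weight the images of one patient and aggregate them into a single
    patient-level representation."""
    scores = softmax(features @ w_instance, axis=0)  # (n_images, 1)
    return (features * scores).sum(axis=0)           # (d,)

def classify(patient_feature, w_fc, b_fc):
    """2-neuron fully connected head -> benign/malignant probabilities."""
    return softmax(patient_feature @ w_fc + b_fc)
```

The key design choice mirrored here is the two-stage aggregation: attention first pools spatial regions within each image, then pools across a patient's images, so the classifier sees one vector per patient regardless of how many frames were acquired.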

Fig. 3 Overview of the proposed deep learning framework in this study. (a,b) The training strategy, where multiple thyroid images from each patient are input into ThyUS2Path to predict the histological label. (c) The structure of the framework, which is composed of three main components: the backbone network, the dual-attention module, and the fully connected classifier.

We utilized 6,005 thyroid images from 601 patients to develop the deep learning framework. The dataset characteristics are summarized in Fig. 4. As depicted in Fig. 4a, the training dataset included 601 patients, comprising 218 benign cases and 383 malignant cases, each with a corresponding histological label. Since each patient may have multiple thyroid images, a statistical overview of all patients is provided in Fig. 4b. Representative thyroid nodule images from malignant and benign patients are shown for comparison in Fig. 1.

Fig. 4 Performance comparison between ThyUS2Path and state-of-the-art MIL-based methods on the internal test set.

Due to variations introduced by different operators in thyroid ultrasonography, images from different cohorts can differ in appearance, so ensuring the generalizability of computational ultrasonography algorithms to real-world clinical data is essential. To address this, we collected 2,503 thyroid nodule images from a separate cohort as an external test dataset. As shown in Table 2 and Fig. 5, the results on this dataset were promising, with AUROCs (area under the receiver operating characteristic curve) ranging from 0.70 to 0.80 and AUPRCs (area under the precision-recall curve) from 0.78 to 0.83, demonstrating that ThyUS2Path generalized well to heterogeneous real-world data. The two MIL-based baselines also yielded promising results, further supporting the reliability of our dataset. These findings can be used to evaluate the generalization performance of different deep learning algorithms for thyroid nodule identification.
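For readers reproducing these figures, AUROC can be computed directly from patient-level scores without plotting a curve. A minimal sketch using the rank-based (Mann-Whitney) formulation, not tied to the evaluation code used in this study:

```python
def auroc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic:
    the probability that a randomly chosen positive is scored higher
    than a randomly chosen negative (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

This pairwise form is equivalent to integrating the ROC curve and is convenient for small external test sets; library implementations (e.g. scikit-learn's `roc_auc_score`) should give the same value.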

Table 2 Model performance of five-fold cross validation on the external test set.
Fig. 5 ROC and PR curves of ThyUS2Path and state-of-the-art MIL-based methods on the external test set.

Limitations

The datasets and model have some limitations. First, because the dataset comes from retrospective cohorts, some images may suffer from quality issues such as low resolution. Second, we validated the dataset on only three models; a more comprehensive evaluation should include a broader range of models to establish a more robust baseline. Despite these limitations, the dataset still provides valuable insights into thyroid nodule identification and can serve as a foundation for future research.