Explainable multimodal deep learning for predicting thyroid cancer lateral lymph node metastasis using ultrasound imaging

Shen, Pengcheng; Yang, Zheyu; Sun, Jingjing; Wang, Yun; Qiu, Cheng; Wang, Yirou; Ren, Yongyong; Liu, Sheng; Cai, Wei; Lu, Hui; Yao, Siqiong

doi:10.1038/s41467-025-62042-z

Article
Open access
Published: 01 August 2025

Pengcheng Shen¹^na1,
Zheyu Yang²^na1,
Jingjing Sun³^na1,
Yun Wang⁴,
Cheng Qiu⁵,
Yirou Wang⁶,
Yongyong Ren ORCID: orcid.org/0000-0001-9217-3483^1,7,
Sheng Liu⁸,
Wei Cai ORCID: orcid.org/0000-0002-4765-3370²,
Hui Lu ORCID: orcid.org/0000-0001-8347-0830^1,7,9 &
…
Siqiong Yao ORCID: orcid.org/0000-0001-6968-1586^1,7

Nature Communications volume 16, Article number: 7052 (2025)

Abstract

Preoperative prediction of lateral lymph node metastasis is clinically crucial for guiding surgical strategy and prognosis assessment, yet precise prediction methods are lacking. We therefore develop Lateral Lymph Node Metastasis Network (LLNM-Net), a bidirectional-attention deep-learning model that fuses multimodal data (preoperative ultrasound images, radiology reports, pathological findings, and demographics) from 29,615 patients and 9836 surgical cases across seven centers. Integrating nodule morphology and position with clinical text, LLNM-Net achieves an Area Under the Curve (AUC) of 0.944 and 84.7% accuracy in multicenter testing, outperforming human experts (64.3% accuracy) and surpassing previous models by 7.4%. Here we show tumors within 0.25 cm of the thyroid capsule carry >72% metastasis risk, with middle and upper lobes as high-risk regions. Leveraging location, shape, echogenicity, margins, demographics, and clinician inputs, LLNM-Net further attains an AUC of 0.983 for identifying high-risk patients. The model is thus a promising tool for preoperative screening and risk stratification.

Introduction

Thyroid cancer is a prevalent malignancy worldwide, with an increasing incidence reported globally^1,2. Cervical lymph node metastasis, encompassing both central and lateral compartments, is a critical factor affecting patient prognosis: it has an incidence of 20–50%³ and increases the risk of mortality by 46%^4,5. The central compartment is widely recognized as the station for lymph node metastasis⁶, and numerous related studies have been conducted^7,8,9. In contrast, studies on lateral compartment metastasis are scarce, primarily due to: (1) the complexity of the anatomical structure¹⁰, with the distribution pathways and connection patterns of lymphatic vessels around the thyroid varying among individuals¹¹; (2) the dispersed distribution of lateral lymph nodes, complicating statistical analysis and potentially leading to omissions¹²; (3) limitations of research methods: lateral lymph nodes are often located in deep cervical tissues, and their small size and depth make accurate detection and research challenging, with ultrasound sensitivity in predicting lateral lymph node metastasis (LLNM) being only 62%³; and (4) difficulty in sample acquisition: biopsy of lateral lymph nodes requires a high level of clinician expertise and poses certain risks to patients, limiting the scale and depth of related studies¹³. Furthermore, central compartment lymph nodes are routinely dissected, whereas prophylactic dissection of lateral lymph nodes is not typically considered standard procedure in some countries and regions, leading to a significant shortage of available samples for research³.

Preoperative prediction of Lateral Lymph Node Metastasis (LLNM) is crucial for surgical planning and prognostic management in thyroid cancer. A positive LLNM result typically indicates that the tumor has begun to spread more extensively, prompting physicians to adopt a more aggressive treatment strategy, which may include expanding the surgical field and considering postoperative radiotherapy or chemotherapy¹⁴. Literature^15,16 has shown that LLNM is associated with a worse prognosis; the recurrence rate is significantly higher compared to patients with central compartment lymph node metastasis (60% vs. 30%, P = 0.007). Disease-free survival and average recurrence time are also markedly shorter (30 months vs. 52 months, P = 0.035, and 7 months vs. 44 months, P = 0.004, respectively)¹⁵. Therefore, effectively predicting LLNM enables physicians to develop more appropriate treatment plans, reduce the risk of cancer progression due to missed dissections, and more accurately assess patient prognosis, providing more comprehensive support and care^3,16. In some cases, patients may present with skip metastases—negative central compartment and positive lateral compartment^17,18 —which are prone to being missed during preoperative evaluation and surgery¹⁹. Moreover, the prognosis of skip metastases varies among different tumor types^20,21, suggesting that clinicians should consider the specific biological characteristics of the tumor and the anatomical pathways of the lymphatic system¹³. Currently, preoperative lymph node biopsy is the standard method for evaluating LLNM³; however, ultrasound-guided fine-needle aspiration has limitations, such as inaccurate or missed punctures^3,22. 
Given the particular importance of LLNM, there is an urgent need for preoperative evaluation methods that can effectively predict LLNM, assisting clinicians in determining the nature of the disease and taking appropriate measures, thereby contributing significantly to improving patient survival rates.

Current research indicates that cervical lymph node metastasis is closely associated with the histological morphology and location of the primary tumor^6,23,24. Specifically, it often results from the growth and spread of the primary tumor (thyroid nodule), with different tumor types exhibiting varying metastatic tendencies. For instance, abnormally enlarged tumors may be an important sign of lymph node metastasis²⁵, while tumors with abnormal morphology or texture may also suggest the possibility of metastasis^6,26. Moreover, the growth location of the primary tumor affects the invasion pathway of tumor cells into the lymphatic system, increasing the likelihood of superior pole metastasis^23,24 and influencing the risk and prognosis of lymph node metastasis in different regions^3,15.

Ultrasound imaging, known for its non-invasive, real-time, and convenient features, is one of the most common diagnostic methods for thyroid cancer. It helps physicians detect early tumor abnormalities such as increased size, irregular shape, and abnormal internal structure, thereby playing a predictive role in LLNM^27,28,29. However, it suffers from low inter-organ contrast and poor image quality, and evaluation results depend heavily on the physician's experience. Deep learning techniques can enhance tumor recognition by learning image features such as tumor morphology, size, and calcification^6,30,31,32,33. Techniques like foreground-background algorithms and graph convolutional networks statistically analyze positional information⁷, assisting physicians in preoperative diagnosis and prognosis assessment, including tumor malignancy grading, subtype evaluation, and prediction of cervical lymph node metastasis^34,35. Yet, there is a lack of large-scale cohort studies and efficient intelligent tools for precise analysis of LLNM^16,27,36,37,38, with conclusions often lacking qualitative/quantitative explanations^23,24.

Here we present LLNM-Net, a bidirectional attention architecture that integrates multimodal data for preoperative LLNM prediction. As illustrated in Fig. 1a, we employ foreground-optimized segmentation^39,40 and Central Point Distance Transformation (CPDT)⁴¹ to extract tumor morphology and precise location. Our Thyroid Multimodal Deep Learning (TMDL) transformer⁴² (Fig. 1b) fuses imaging features with clinical reports and demographic data via bidirectional attention exchange^43,44,45. We generate 3D risk heatmaps through diffeomorphic registration and perform attention-based gradient analysis to interpret metastasis mechanisms. Evaluated on 39,451 patients from seven institutions (Fig. 2), LLNM-Net provides quantitative preoperative assessment to guide surgical planning and prognosis management.

**Fig. 1: Network architecture of LLNM-Net.**

**Fig. 2: Flow diagram summarizing the inclusion of patients.**

Results

Data description

We collected pathological diagnoses, preoperative ultrasound images, radiology reports, and demographic information from a cohort of 39,451 patients (Table 1). Notably, the median age was 43 years, with female patients outnumbering male patients by more than twofold, and approximately 91% of the cohort identified as Han ethnicity. The most represented categories in the Kwak Thyroid Imaging Reporting and Data System (Kwak-TIRADS)⁴⁶ were 4B (48%) and 4C (28%). Patients with thyroid nodules smaller than 10 mm accounted for 73%, and the rate of LLNM-positive patients was 52%. The subtypes collected included 35,804 cases of papillary thyroid carcinoma (PTC), 2845 cases of follicular thyroid carcinoma (FTC), and 802 cases of medullary thyroid carcinoma (MTC). We trained the model using 80% of the 29,615 patients from two hospitals, with the remaining 20% used for model validation. The external test sets comprised 9836 patients from five multicenter sites. More detailed information can be found in Tables S1 and S2.

Table 1 Characteristics of the patients at baseline^a

Prediction performance of models and human experts

Figure 3a shows that LLNM-Net exhibits significantly superior predictive performance, with an Area Under the Curve (AUC) of 0.948 (95% CI: 0.946–0.950) on the validation set. Furthermore, the AUC on the external test set is 0.944 (95% CI: 0.942–0.945), with an accuracy of 0.847 (95% CI: 0.840–0.854). In the comparative test, the accuracy of LLNM-Net (0.875) is significantly higher than that of human experts (0.643; paired t-test, t = 1.998, P = 0.0473) (Fig. 3a, b). Additionally, the high specificity and PPV demonstrated in the results can more effectively prevent missed diagnoses and enhance the screening performance for LLNM. The accuracy of the segmentation module is presented in Table S4, and the accuracy results for FTC and MTC can be found in Table S5. Comparative experimental results with the latest published AI methods are provided in Table S7.

**Fig. 3: Prediction performance of model and human.**

The study recruited 108 imaging experts, including 42 senior radiologists (with over five years of clinical experience, 21 female and 21 male) and 66 junior radiologists (with three to five years of clinical experience, 33 female and 33 male). These physicians demonstrated high accuracy in diagnosing the malignancy of thyroid nodules (Fig. 3b), with an average accuracy of 0.883, specificity of 0.899, and sensitivity of 0.868. However, in the experiments predicting LLNM classification, both senior and junior physicians exhibited lower predictive capabilities, with an accuracy of 0.643, specificity of 0.642, and sensitivity of 0.644 (Fig. 3b, c). This indicates that the ability to predict LLNM based on traditional clinical knowledge and subjective experience is limited, whereas artificial intelligence models can learn important factors contributing to LLNM from a large number of features.
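
The accuracy, sensitivity, and specificity figures quoted here all derive from confusion-matrix counts. The arithmetic can be sketched as follows; the counts are hypothetical and chosen only to illustrate the formulas, and the function name `binary_metrics` is illustrative rather than taken from the study's code.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # share of LLNM-positive cases detected
        "specificity": tn / (tn + fp),  # share of LLNM-negative cases cleared
    }


# Hypothetical counts (not the study's data).
metrics = binary_metrics(tp=64, fp=36, tn=64, fn=36)
print(metrics)
```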

Qualitative and quantitative analysis for predicting LLNM

Figure 4 provides qualitative and quantitative analyses for predicting LLNM in the locational dimension. Figure 4a displays the visualization results of nodule locational information. We defined the central point and calculated the minimum distance from all points within the nodule region, obtaining the location feature image through the transformation φ. The model learns iteratively and computes the attention heatmap through gradient-weighted calculations⁴⁷, with the heatmap indicating areas identified by the model that have a greater impact on LLNM. Figure 4b presents a statistical analysis of the minimum distance from the nodule to the thyroid capsule. The results show that as the minimum distance decreases, the probability of LLNM correspondingly increases. When the minimum distance is less than 0.25 cm, the average probability of LLNM occurrence exceeds 72%.

**Fig. 4: Qualitative and quantitative assessment of nodule locational information.**

Figure 4c illustrates the three-dimensional risk heatmap model for LLNM occurrence in thyroid regions, statistically analyzed based on the diffeomorphic affine transformation method^48,49. This model is publicly available at: https://snowinbio.github.io/LLNM-Net/. The results indicate that the upper central region of the left lobe of the thyroid, as well as the medial and lateral regions of the upper part of the right lobe, are key areas of concern for LLNM occurrence, likely closely related to lymphatic metastasis pathways. Figure 4d showcases an example application of the heatmap model. When provided with the growth location of a thyroid nodule, the model can automatically generate the probability value for LLNM occurrence at that location.

In Fig. 5, we present an example of LLNM-Net. The model takes both imaging data and user information as inputs and automatically outputs a multi-feature score. It also provides the contribution ratio of each feature, as illustrated by the Sankey diagram on the right. This multi-feature metastatic score helps physicians understand how the model predicts metastasis. To demonstrate that our interpretation aligns with clinical knowledge, we compared it with TIRADS. It can be concluded that there is a correlation between tumor malignancy and LLNM, as shown in Figs. S1, S2.

**Fig. 5: An illustrative example of LLNM-Net results.**

Figure 6a shows the contribution ratio of all features to LLNM, indicating that the growth location of the thyroid nodule is the most significant factor, accounting for 48.8%. This is followed by morphological features, which account for 29.9%. Among the morphological features, shape and internal morphology contribute 7.5% and 64.1%, respectively, suggesting that clinical attention should focus on nodules with abnormal shapes or significant enlargement. Text information from clinical reports contributes 19.7% overall. Gender is an important factor in demographics (accounting for 56.7% in demographics). Figure 6b displays the attention heatmaps for LLNM-negative and LLNM-positive cases, showing that the network focuses on the edges and various internal texture features. Figure S3 illustrates examples of the association between imaging reports and internal morphological features.

**Fig. 6: Attention analysis and decision curve analysis.**

The decision curve analysis (Fig. 6c) illustrates the clinical benefit of each strategy (the benefit of effective lateral lymph node dissection (LLND) minus the cost of LLND). Under different cost-benefit ratio settings, our model-based strategy outperforms the naïve strategies of performing lateral lymph node (LLN) fine-needle aspiration (FNA) for all or no patients. Figure 6d shows the clinical impact of our model-based strategy within our cohort. At a threshold of 0.5, the strategy improves the identification of 47.4% of patients who should undergo LLN FNA, with only 5.3% of LLNM-negative patients undergoing unnecessary LLN FNA. Additionally, we conducted a reverse cognitive test to evaluate the interpretability of our model. The results indicate that clinicians’ understanding of the model outcomes improved by 25.4% compared to general AI (Fig. S4).
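
Decision curves are usually built from a net-benefit value computed at each probability threshold. The sketch below uses the standard net-benefit formula, in which false positives are weighted by the odds of the threshold. Whether the authors used exactly this weighting is an assumption, and the labels and probabilities are hypothetical.

```python
def net_benefit(y_true, y_prob, threshold):
    """Standard net benefit: TP/N - FP/N * threshold / (1 - threshold)."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the naive strategy that refers every patient."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical labels (1 = LLNM positive) and model probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_prob = [0.9, 0.8, 0.4, 0.1, 0.6, 0.2, 0.3, 0.7]

nb_model = net_benefit(y_true, y_prob, threshold=0.5)
nb_all = net_benefit_treat_all(y_true, threshold=0.5)
nb_none = 0.0  # referring nobody yields neither benefit nor cost
print(nb_model, nb_all, nb_none)
```

Repeating the calculation over a grid of thresholds and plotting the three values against the threshold gives the decision curve.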

Predicting high-risk lymph node metastasis patients

The model is capable of predicting the stage of lateral lymph node metastasis. Based on clinical guidelines^3,50,51,52, we classified lymph node metastasis into three stages according to the number and size of metastatic nodes:

Stage 1: Low risk. Five or fewer micro-metastases (<0.2 cm in largest dimension).
Stage 2: Medium risk. More than five metastatic lymph nodes, and any metastatic lymph node <3 cm in largest dimension.
Stage 3: High risk. Any metastatic lymph node >3 cm in largest dimension.
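
The three stages can be read as a simple rule on the list of metastatic node sizes. The sketch below is one possible reading, not the authors' labeling code. It assumes that any case not matching Stage 1 or Stage 3 falls into Stage 2, and that a patient with no metastatic nodes lies outside the scheme (returned as 0). Both choices are assumptions, since the text does not spell them out.

```python
def llnm_stage(node_sizes_cm):
    """Map the largest dimensions (cm) of metastatic lateral nodes to a stage."""
    if not node_sizes_cm:
        return 0  # assumption: no metastatic nodes, outside the three stages
    if any(size > 3.0 for size in node_sizes_cm):
        return 3  # high risk: any node larger than 3 cm
    if len(node_sizes_cm) <= 5 and all(size < 0.2 for size in node_sizes_cm):
        return 1  # low risk: five or fewer micro-metastases
    return 2      # assumption: all remaining cases count as medium risk


print(llnm_stage([0.1, 0.15]))  # two micro-metastases
print(llnm_stage([0.1] * 6))    # more than five nodes, all under 3 cm
print(llnm_stage([0.5, 3.5]))   # one node over 3 cm
```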

Table 2 demonstrates that the model exhibits good predictive performance, achieving an average AUC of 0.971 in external test sets. This suggests that LLNM-Net can precisely identify individuals at medium to high risk, offering physicians reference advice for FNA testing and ensuring regular follow-up for low-risk individuals, thereby optimizing the allocation of medical resources.

Table 2 Performance of multi-level risk of lymph node metastasis^a

Application of LLNM-Net in clinical practice

Figure 7 illustrates the traditional clinical guidelines³ and guidelines with LLNM-Net. In Fig. 7a, when patients undergo thyroid imaging, physicians assess the malignancy of nodules based on subjective experience and decide whether FNA is necessary. If needed, a cervical lymph node ultrasound examination is considered to evaluate the likelihood of LLNM and consider whether to perform FNA for the lateral lymph nodes. Then the decision to proceed with thyroidectomy and LLND is made based on the FNA results. However, whether the physician conducts a cervical ultrasound examination depends on subjective experience, leading to potential missed diagnoses. Incomplete coverage of the detection area during the ultrasound examination can also result in missed diagnoses, affecting the accuracy of FNA results for lymph nodes and influencing the decision to perform LLND.

**Fig. 7: Clinical application of LLNM-Net.**

In Fig. 7b, the integration of LLNM-Net aids in preoperative LLNM risk prediction, helping doctors determine whether to conduct cervical ultrasound examinations, thereby reducing missed diagnoses. During cervical ultrasound examinations, LLNM-Net highlights patients at risk of LLNM, decreasing the rate of missed diagnoses. It also recommends FNA for lymph nodes for medium and high risk LLNM patients, reducing missed diagnoses caused by incomplete ultrasound examination areas or physician judgment errors. In summary, LLNM-Net helps improve the diagnostic and treatment process for LLNM.

In this diagnostic study, we developed an interpretable multimodal deep learning model that can be implemented as an AI support system for LLNM risk assessment based on thyroid ultrasound images. This model provides qualitative and quantitative clinical explanations for predictions based on the fusion transformer method. This study addresses the lack of effective methods for preoperative diagnosis of LLNM, providing clinical insights while accurately screening high-risk populations, and significantly improving patient survival rates and societal welfare.

Discussion

We developed LLNM-Net, achieving precise preoperative prediction of LLNM and demonstrating strong generalizability across five external centers, with predictive performance reaching an AUC of 0.944 (95% CI: 0.942–0.945). Currently, ultrasound has a low sensitivity of 0.623 for detecting lateral neck regions. Our model’s AUC improved by 7.4% compared to the best existing model, and its accuracy was 20.4% higher than that of human physicians. In high-risk population screening, the AUC reached 0.971. Furthermore, compared to general AI scores, our model improved clinicians’ understanding of the results by 25.4%. This significantly addresses the gap in LLNM research, enhances preoperative predictions for more precise treatments, and guides physicians in early patient stratification for closer monitoring and treatment, thereby improving patient quality of life and survival rates. This is crucial for enhancing fairness in AI-based clinical diagnosis.

Our study also offers guidance on which regions deserve attention during clinical examination. Nodules located in the upper middle region of the left lobe and the upper middle region of the right lobe are at high risk of LLNM, suggesting that clinicians should focus on these regions. This may be closely related to lymphatic metastasis pathways. Statistics show that when the minimum distance between the nodule and the capsule is less than 0.25 cm, the average probability of LLNM exceeds 72%, indicating a correlation between the minimum distance, growth region, and LLNM. We quantified the risk areas and feature contributions for each patient.

We addressed the challenge of small foregrounds and high intra-class variance in backgrounds in ultrasound images by using an attention-based foreground optimization segmentation network for precise nodule segmentation. To tackle the multi-scale issue in ultrasound images, we proposed a CPDT method to accurately extract the precise location information of thyroid nodules. To efficiently integrate ultrasound images and clinical information, we designed a multimodal deep learning approach based on a bidirectional attention exchange mechanism, extracting local interconnected information between report text and imaging features and uncovering clinicians’ latent findings. This end-to-end multimodal feature extraction method can be widely applied for efficient tumor detection, growth location analysis, and nature assessment, with the potential to solve most imaging diagnostic tasks in various diseases.

In traditional diagnostic processes, physicians rely on subjective experience to decide whether to perform cervical ultrasound examinations, leading to potential missed diagnoses. During the cervical ultrasound examination, incomplete coverage of the detection area may also result in missed diagnoses, affecting the accuracy of FNA results for lymph nodes and influencing the decision for LLND. Using LLNM-Net can optimize this process by helping reduce the missed detection rate of cervical ultrasounds and minimizing missed diagnoses caused by incomplete examination areas, thus improving clinical guidelines.

There are still some limitations in our study. First, the number of collected cases and disease types is limited. In the future, we plan to collect more extensive data from a broader population, including more subtypes, countries, and regions, to enhance LLNM-Net’s generalizability and applicability. Second, the actual clinical benefits of our model have not yet been validated. We plan to design prospective experiments for validation and explore the model’s real-world effectiveness across different institutions and regions.

Methods

Ethics approval

All clinical data, including demographics, operative procedures, pathology, and complications, were retrospectively collected. This study was approved by the local Ethics Committee and the Institutional Review Board (IRB) of Ruijin Hospital, Shanghai Jiao Tong University School of Medicine Hospital, and undertaken according to the Declaration of Helsinki. Informed consent from patients with thyroid cancer and controls was exempted by the IRB because of the retrospective nature of this study.

Data collection

We conducted a retrospective analysis by gathering preoperative thyroid ultrasound images, radiological reports, and clinical information from patients undergoing thyroidectomy. The criteria for patient inclusion in our study were as follows: (a) patients must be 18 years of age or older, (b) they should have undergone thyroid ultrasound examination with clear ultrasound images available, (c) a diagnosis of thyroid malignant nodule following thyroidectomy, (d) patients were required to have undergone central lymph node dissection with a total of at least 5 lymph nodes removed, and (e) there must be a pathologic assessment of FNA for LLNM. Exclusion criteria were as follows: missing pathological reports, surgery not on thyroid, and patients who had received preoperative treatment. To maintain a high standard of image quality, we implemented rigorous control measures, which involved excluding cases with poor image quality, one image with multifocal lesions, and images with measuring lines. Concurrently, as part of our data collection, we gathered extensive patient demographic information, radiological reports, Kwak Thyroid Imaging Reporting and Data Systems (Kwak-TIRADS) grade⁴⁶, postoperative pathology results, and details regarding LLNM. Each patient’s data includes two clear ultrasound images from different orientations, a complete ultrasound report, and clinical information. Sex information was collected through self-reporting. However, the primary objective of this study is to predict lateral lymph node metastasis in thyroid cancer, and no differential results were found for sex characteristics, so no further differentiation is made.

We collated patient data from seven hospitals to form training and test cohorts spanning from January 2015 to May 2021. Figure 2 delineates the process of patient inclusion and exclusion. Furthermore, the training cohort from two hospitals was subdivided into a training set and a validation set, while the test cohort from five additional hospitals was designated as external test sets. This methodical strategy guaranteed that our study population was a representative sample of diverse individuals across various geographical and ethnic strata in China. For patient privacy protection, the researchers were granted access solely to anonymized data.

For the classification of patients as LLNM positive, we selected individuals in whom at least one positive lymph node was identified among those excised during surgery. In terms of meticulous quality control for data annotation, we implemented a two-step process:

(1) Differentiation of malignant nodules: All malignant nodules were diagnosed based on pathological reports. Independent ultrasound physicians with over 5 years of experience were assigned to reassess the images. In cases where discrepancies between their evaluation and the original report were identified, we sought expert judgment to resolve the differences.
(2) Pathological annotation of nodules with LLNM: In managing patients with multiple nodules, determining which nodule metastasized to the lymph nodes posed a challenge during the annotation process. To address this complexity, three ultrasound radiologists were engaged to meticulously compare the ultrasound images with the corresponding pathological reports for each patient. Their objective was to select the images of nodules most likely to have metastasized, taking into account factors such as nodule location and degree of malignancy.

Data quality control principles

We obtained preoperative thyroid ultrasound images from seven hospitals. To analyze the ultrasound images, we first removed all patient, institution, and device information from the images. Then we trained an image cropping model to crop images from different institutions and devices, applying a standardized brightness range to achieve uniform images. To preserve the morphology of the nodules (particularly the aspect ratio), we did not use any scaling methods throughout the process. There are two types of clinical text data: unstructured imaging reports (containing the physician’s expertise) and structured demographic data (age and gender). We set the maximum length for imaging report data to 50 characters: if the report length exceeded 50 characters, we used only the first 50 characters; otherwise, we applied zero-padding to meet the length requirement.
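
The fixed-length rule for report text can be sketched as follows. The 50-character limit and zero-padding come from the description above. The character-to-id vocabulary, the `UNK_ID` fallback, and the function name `encode_report` are hypothetical, since the text does not say how characters are mapped to model inputs.

```python
MAX_LEN = 50  # maximum report length stated in the text
PAD_ID = 0    # value used for zero-padding
UNK_ID = 1    # hypothetical id for characters missing from the vocabulary


def encode_report(report, vocab):
    """Keep the first MAX_LEN characters, map them to ids, and zero-pad."""
    ids = [vocab.get(ch, UNK_ID) for ch in report[:MAX_LEN]]
    return ids + [PAD_ID] * (MAX_LEN - len(ids))


# Hypothetical character vocabulary built from one example phrase.
phrase = "heterogeneous echogenicity"
vocab = {ch: i + 2 for i, ch in enumerate(sorted(set(phrase)))}

short_ids = encode_report(phrase, vocab)     # shorter than 50: padded with zeros
long_ids = encode_report(phrase * 4, vocab)  # longer than 50: truncated
print(len(short_ids), len(long_ids))
```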

LLNM-Net architecture

To effectively predict LLNM preoperatively, we have developed the LLNM-Net. This model combines segmentation, distance transformation, and intra-model attention exchange modules to achieve an integrated analysis of the tumor’s morphological and locational information. It also incorporates demographic information and clinical reports to provide a comprehensive prediction of LLNM. Additionally, it performs qualitative and quantitative analysis of the metastasis mechanism through attention-based feature analysis (Fig. 1).

Figure 1a shows the process of extracting independent features within the model. Ultrasound images of thyroid nodules present challenges such as small detection targets (foreground) and high intra-class variance in the background. We employed a foreground optimization segmentation network^39,40 based on an attention mechanism to achieve precise segmentation of thyroid nodules, simultaneously extracting morphological features such as internal morphology, edges, echogenicity differences, and shape. To address the issue of multi-scale input in ultrasound images, we proposed a CPDT method⁴¹ to accurately extract the precise location information of thyroid nodules.

The ultrasound report contains verbal descriptions by medical experts regarding nodule characteristics, such as “normal size and volume,” “heterogeneous echogenicity,” and “diffuse changes”^53,54,55. We designed a TMDL transformer⁴² based on a bidirectional attention exchange mechanism^43,44,45 to efficiently integrate imaging features, report text information, and patient demographic data (Fig. 1b). The TMDL consists of two embedding layers, four bidirectional attention blocks, and twelve self-attention blocks. The embedding layers convert inputs into image and text tokens, which are then processed through the bidirectional attention blocks. In these blocks, attention exchange is used to compute intermodal attention among tokens across different modalities, uncovering potential local interconnections between report text and imaging features, providing advantages over non-integrated models. The computed multimodal representations are then fed into the twelve self-attention blocks for efficient learning.
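
One way to picture the attention exchange is as scaled dot-product cross-attention applied in both directions, so that image tokens attend to text tokens and text tokens attend to image tokens. The sketch below is a deliberately simplified illustration under that assumption. It omits learned projections, multiple heads, normalization, and the intra-modal attention a real block would include, and all function names and token values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context):
    """Each query becomes a softmax-weighted average of the context tokens."""
    dim = len(queries[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in context]
        weights = softmax(scores)
        outputs.append(
            [sum(w * tok[j] for w, tok in zip(weights, context)) for j in range(dim)]
        )
    return outputs


def bidirectional_block(image_tokens, text_tokens):
    """Exchange information both ways, then add it back as a residual."""
    img_ctx = cross_attention(image_tokens, text_tokens)  # image attends to text
    txt_ctx = cross_attention(text_tokens, image_tokens)  # text attends to image
    new_img = [[a + b for a, b in zip(t, c)] for t, c in zip(image_tokens, img_ctx)]
    new_txt = [[a + b for a, b in zip(t, c)] for t, c in zip(text_tokens, txt_ctx)]
    return new_img, new_txt


# Hypothetical 2-dimensional tokens: three image tokens, two text tokens.
image_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text_tokens = [[0.5, 0.5], [1.0, -1.0]]
fused_img, fused_txt = bidirectional_block(image_tokens, text_tokens)
print(len(fused_img), len(fused_txt))
```

Stacking several such blocks and feeding their outputs to ordinary self-attention blocks mirrors the layout described above, although the real TMDL layers are learned.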

We conducted qualitative and quantitative analyses of key factors related to LLNM. Using a flexible diffeomorphic registration method⁴⁸, we created a risk heatmap from a three-dimensional perspective showing the likelihood of LLNM occurrence in different thyroid regions. Additionally, we used attention-based gradient-weighted calculations⁴⁷ to analyze the relationship between various features and the prediction outcomes.

Data feature extraction process

For the input image data $x$, we used an optimized YOLO-v8 model^39,40 for segmentation, obtaining the nodule label ${l}_{n}$, and the U-Net++ network^56,57 to obtain the thyroid label ${l}_{t}$. Based on the label ${l}_{n}$ and image $x$, we applied a cropping operation to extract the texture feature ${x}_{t}$ and calculated the length and width to obtain the shape⁵⁸ feature ${x}_{s}$. We also derived the echogenicity feature ${x}_{e}$ by calculating the difference in the mean echo values inside and outside the nodule boundary^30,59. Using a random mosaic method⁶⁰, we minimized the influence of nodule morphology on edge blurring, independently extracting the edge feature ${x}_{m}$. Finally, we obtained the merged label ${l}_{t}-{l}_{n}$.
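
Two of these features, shape and echogenicity difference, reduce to simple arithmetic on the nodule mask. The sketch below works on a toy image and is not the authors' implementation. It assumes that length and width are read off the mask's bounding box and that "outside" means every pixel not covered by the mask, whereas the original method may use a narrower band around the boundary. Function names are illustrative.

```python
def shape_feature(mask):
    """Bounding-box height and width (in pixels) of a binary nodule mask."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return rows[-1] - rows[0] + 1, cols[-1] - cols[0] + 1


def echo_difference(image, mask):
    """Mean gray value inside the nodule minus mean gray value outside it."""
    inside, outside = [], []
    for img_row, mask_row in zip(image, mask):
        for value, flag in zip(img_row, mask_row):
            (inside if flag else outside).append(value)
    return sum(inside) / len(inside) - sum(outside) / len(outside)


# Toy 4x5 example: a darker (hypoechoic) nodule on a brighter background.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
image = [[40 if flag else 100 for flag in row] for row in mask]

height, width = shape_feature(mask)
x_e = echo_difference(image, mask)
print(height, width, x_e)
```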

We designed the CPDT method to convert positional features. For a point ${l}_{i}$ within the nodule region, where ${l}_{i}\in {l}_{n}$, and the point ${l}_{j}$ within the thyroid region, where ${l}_{j}\in {l}_{t}$, the following equation applies:

$$d_{i}\left(l_{i},l_{t}\right)=\min_{l_{j}\in l_{t},\,l_{i}}\left(\sqrt{\left(p_{l_{j}}-p_{l_{i}}\right)^{2}+\left(q_{l_{j}}-q_{l_{i}}\right)^{2}}\right)\tag{1}$$

Where $p$ and $q$ represent the horizontal and vertical coordinates of a point in the image. ${d}_{i}$ represents the minimum distance from point ${l}_{i}$ to the thyroid capsule.

We have defined the central point $C$, which is the point within the thyroid region that has the maximum distance to the thyroid capsule. The maximum distance, ${d}_{\max }$, from any point inside the thyroid to the capsule is defined as follows:

$${d}_{\max }\left({l}_{t}\right)={\max }_{{l}_{k}\in {l}_{t}}\,{d}_{k}\left({l}_{k},{l}_{t}\right)$$

(2)

When evaluating the risk of metastasis based solely on distance metrics, the point $C$ has the minimum risk of distance-related metastasis. Meanwhile, points on and beyond the thyroid capsule can be considered to have the maximum risk of distance-related metastasis. Therefore, we designed the distance transformation $\varphi$ to represent the risk of distance-related metastasis for a given point:

$$\varphi \left({l}_{t},{l}_{i}\right)=1-{d}_{i}\left({{l}_{i},l}_{t}\right)/{d}_{\max }\left({l}_{t}\right)$$

(3)

Through $\varphi$, the grayscale value of each point in ${l}_{n}$ is replaced by a value derived from its minimum distance to the thyroid capsule, thereby representing the relative locational information of the nodule and thyroid. Ultimately, we obtained the locational information ${x}_{l}$. By calculating the distance ratio between the nodule region and point $C$, we can extract information about the different positions of the nodule within the thyroid region. For instance, a nodule on the left side of point $C$ and a nodule on the right side yield different ${x}_{l}$, even if their distance to the edge of the thyroid is the same.
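As a rough illustration of Eqs. (1)–(3), the following NumPy sketch computes a location map from two binary masks. It is not the released CPDT code: the function name `cpdt_location_feature` is hypothetical, and it assumes that every pixel outside the thyroid mask counts as "on or beyond the capsule", so distances are measured to the nearest background pixel. The brute-force pairwise distance is only practical for small masks.

```python
import numpy as np

def cpdt_location_feature(thyroid_mask, nodule_mask):
    """Sketch of x_l: phi = 1 - d_i / d_max on nodule pixels, 0 elsewhere."""
    thyroid = np.asarray(thyroid_mask).astype(bool)
    nodule = np.asarray(nodule_mask).astype(bool)
    inside = np.argwhere(thyroid)    # (K, 2) pixel coordinates inside the thyroid
    outside = np.argwhere(~thyroid)  # (B, 2) pixels assumed to lie on/beyond the capsule
    if inside.size == 0 or outside.size == 0:
        raise ValueError("mask must contain both thyroid and background pixels")
    # Eq. (1): minimum Euclidean distance from each thyroid pixel to the capsule.
    diff = inside[:, None, :] - outside[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)
    # Eq. (2): the largest such distance, attained at the central point C.
    d_max = dist.max()
    # Eq. (3): distance-related risk, 0 at C and approaching 1 near the capsule.
    phi = np.zeros(thyroid.shape, dtype=float)
    phi[thyroid] = 1.0 - dist / d_max
    # Keep the transformed values only on the nodule region.
    return np.where(nodule & thyroid, phi, 0.0)

# Toy example: a 7x7 "thyroid" inside a 9x9 image with a small nodule near the top edge.
thyroid = np.zeros((9, 9), dtype=bool)
thyroid[1:8, 1:8] = True
nodule = np.zeros_like(thyroid)
nodule[1:3, 3:6] = True
x_l = cpdt_location_feature(thyroid, nodule)
```

For full-resolution images, an efficient Euclidean distance transform (for example `scipy.ndimage.distance_transform_edt`) would be the usual replacement for the pairwise computation.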

TMDL module

In practice, we pass multimodal input data (i.e., medical images and clinical text information) to the TMDL module to compute prediction logits, with binary cross-entropy as the loss function. TMDL is a unified Transformer module. Its structure mainly includes: two initial embedding layers that embed tokens from input images and text, respectively; four stacked bidirectional multimodal attention blocks that learn intermediate representations of fused features by capturing interactions between tokens from the same modality and different modalities; 12 stacked self-attention blocks that learn the overall multimodal representation and enhance its discriminative power; and a classification head for generating prediction logits.

In TMDL, the multimodal input data consists of eight components: the image-derived location ${x}_{l}$, texture ${x}_{t}$ and margin ${x}_{m}$ features, together with the imaging report ${x}_{r}$, echogenicity ${x}_{e}$, shape ${x}_{s}$, and each patient’s sex ${x}_{{sex}}$ and age ${x}_{{age}}$. We combine ${x}_{l}$, ${x}_{t}$ and ${x}_{m}$ and pass them through a convolutional layer, which generates a series of visual tokens. Next, we add standard learnable 1D positional embeddings^61,62 and dropout to each visual token, resulting in a series of image patch tokens ${{Tokens}}_{{Image}}\left(3n\right)$, where $n$ is the length of a single image patch. At the same time, we use a tokenization encoder to encode each word in ${x}_{r}$. Specifically, we use a pre-trained BERT model⁶² to generate embedding feature vectors for each word in ${x}_{r}$, producing a series of word tokens ${{Tokens}}_{{Text}}\left(m\right)$, where $m$ is the maximum length set for the text. We linearly project ${x}_{{sex}}$, ${x}_{{age}}$, ${x}_{e}$ and ${x}_{s}$ to obtain encoded feature vectors ${{Tokens}}_{{Sex}}$, ${{Tokens}}_{{Age}}$, ${{Tokens}}_{e}$ and ${{Tokens}}_{s}$. We then concatenate $\{{{Tokens}}_{{Text}}\left(m\right),\,{{Tokens}}_{{Sex}},\,{{Tokens}}_{{Age}},\,{{Tokens}}_{e},\,{{Tokens}}_{s}\}$ to generate a series of clinical text tokens ${{Tokens}}_{T}(m+4)$. In practice, we set $m$ to 50.

The combined tokens are fed into four stacked bidirectional multimodal attention blocks. Assume that the input to the first bidirectional multimodal attention block consists of ${{Tokens}}_{I}^{l}$ and ${{Tokens}}_{T}^{l}$, where $l\,\left(=0\right)$ denotes the layer index, ${{Tokens}}_{I}^{0}={{Tokens}}_{{Image}}\left(3n\right)$ represents the set of image patch tokens, and ${{Tokens}}_{T}^{0}={{Tokens}}_{T}(m+4)$ represents the set of clinical text tokens. In the bidirectional multimodal attention block, the process of generating the query, key, and value matrices for each modality is as follows:

$${Q}_{I}^{l},{K}_{I}^{l},{V}_{I}^{l}={LP}\left({Norm}\left({{Tokens}}_{I}^{l}\right)\right)$$

(4)

$${Q}_{T}^{l},{K}_{T}^{l},{V}_{T}^{l}={LP}\left({Norm}\left({{Tokens}}_{T}^{l}\right)\right)$$

(5)

Where ${LP}\left(\cdot \right)$ and ${Norm}\left(\cdot \right)$ represent linear projection and layer normalization, respectively. The forward pass within the bidirectional multimodal attention block can be summarized as follows:

$${{{{\mathcal{T}}}}}_{I}^{l}={Attention}\left({Q}_{I}^{l},{K}_{I}^{l},{V}_{I}^{l}\right)+\alpha {Attention}\left({Q}_{I}^{l},{K}_{T}^{l},{V}_{T}^{l}\right)$$

(6)

$${{{{\mathcal{T}}}}}_{T}^{l}={Attention}\left({Q}_{T}^{l},{K}_{T}^{l},{V}_{T}^{l}\right)+\alpha {Attention}\left({Q}_{T}^{l},{K}_{I}^{l},{V}_{I}^{l}\right)$$

(7)

Here, ${Attention}({Q}_{I}^{l},{K}_{I}^{l},{V}_{I}^{l})$ and ${Attention}({Q}_{T}^{l},{K}_{T}^{l},{V}_{T}^{l})$ capture intra-modal connections within the image and text modalities, respectively. ${Attention}({Q}_{I}^{l},{K}_{T}^{l},{V}_{T}^{l})$ and ${Attention}({Q}_{T}^{l},{K}_{I}^{l},{V}_{I}^{l})$ explore inter-modal connections between the image and text. Next, the intra-modal and inter-modal connections are encoded into latent representations ${{{{\mathcal{T}}}}}_{I}^{l}$ and ${{{{\mathcal{T}}}}}_{T}^{l}$. After some preliminary experiments, we set $\alpha$ to 1.0. ${Attention}\left(Q,K,V\right)$ combines two matrix multiplications with a scaled ${softmax}$ operation:

$${Attention}\left(Q,K,V\right)={softmax}\left(\frac{Q{K}^{\intercal}}{\sqrt{{d}_{k}}}\right)V$$

(8)

Here, $\intercal$ denotes the matrix transpose operator, and ${d}_{k}$ is a scaling hyperparameter, which we set to 64. We then introduce residual learning and pass the resulting ${{{{\mathcal{T}}}}}_{I}^{l}$ and ${{{{\mathcal{T}}}}}_{T}^{l}$ to the next normalization layer and MLP:

$${{Tokens}}_{I}^{l+1}={MLP}\left({Norm}\left({{{{\mathcal{T}}}}}_{I}^{l}\right)\right)+{{Tokens}}_{I}^{l}$$

(9)

$${{Tokens}}_{T}^{l+1}={MLP}\left({Norm}\left({{{{\mathcal{T}}}}}_{T}^{l}\right)\right)+{{Tokens}}_{T}^{l}$$

(10)

${{Tokens}}_{I}^{l+1}$ and ${{Tokens}}_{T}^{l+1}$ are passed as inputs to the next bidirectional multimodal attention block, producing ${{Tokens}}_{I}^{l+2}$ and ${{Tokens}}_{T}^{l+2}$. This operation is repeated until the fourth layer, generating ${{Tokens}}_{I}^{l+4}$ and ${{Tokens}}_{T}^{l+4}$. We then concatenate ${{Tokens}}_{I}^{l+4}$ and ${{Tokens}}_{T}^{l+4}$ to form a unified sequence of tokens, which is passed to the subsequent self-attention blocks. We also use 12 attention heads⁴² in the bidirectional multimodal attention and self-attention blocks. This multi-head mechanism allows the model to perform attention operations simultaneously across multiple representation subspaces and subsequently aggregate the results.
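The forward pass in Eqs. (4)–(10) can be sketched in NumPy as follows. This is a single-head illustration under several assumptions that are not taken from the released code: random weights, a ReLU two-layer MLP, a token width of 64 equal to ${d}_{k}$, and a hypothetical patch count $n=16$. The function and parameter names (`bidirectional_block`, `Wq_i`, and so on) are made up for the example.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attention(q, k, v):
    # Eq. (8): softmax(Q K^T / sqrt(d_k)) V
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v

def mlp(x, w1, w2):
    # Norm -> linear -> ReLU -> linear (ReLU is an assumption of this sketch)
    return np.maximum(layer_norm(x) @ w1, 0.0) @ w2

def bidirectional_block(tok_i, tok_t, p, alpha=1.0):
    # Eqs. (4)-(5): per-modality query, key and value projections
    qi, ki, vi = (layer_norm(tok_i) @ p[name] for name in ("Wq_i", "Wk_i", "Wv_i"))
    qt, kt, vt = (layer_norm(tok_t) @ p[name] for name in ("Wq_t", "Wk_t", "Wv_t"))
    # Eqs. (6)-(7): intra-modal attention plus alpha-weighted inter-modal attention
    lat_i = attention(qi, ki, vi) + alpha * attention(qi, kt, vt)
    lat_t = attention(qt, kt, vt) + alpha * attention(qt, ki, vi)
    # Eqs. (9)-(10): MLP on the normalized latent plus a residual connection
    new_i = mlp(lat_i, p["W1_i"], p["W2_i"]) + tok_i
    new_t = mlp(lat_t, p["W1_t"], p["W2_t"]) + tok_t
    return new_i, new_t

rng = np.random.default_rng(0)
d = 64  # token width; also used as d_k in this single-head sketch

def init_params():
    p = {f"W{r}_{s}": rng.normal(0.0, 0.02, (d, d)) for r in "qkv" for s in "it"}
    for s in "it":
        p[f"W1_{s}"] = rng.normal(0.0, 0.02, (d, 4 * d))
        p[f"W2_{s}"] = rng.normal(0.0, 0.02, (4 * d, d))
    return p

tokens_image = rng.normal(size=(3 * 16, d))  # 3n image tokens with a hypothetical n = 16
tokens_text = rng.normal(size=(50 + 4, d))   # m + 4 clinical text tokens with m = 50
for _ in range(4):                           # four stacked bidirectional blocks
    tokens_image, tokens_text = bidirectional_block(tokens_image, tokens_text, init_params())
unified = np.concatenate([tokens_image, tokens_text], axis=0)
```

The concatenated `unified` array plays the role of the token sequence handed to the self-attention blocks; a multi-head version would split the projections into 12 subspaces and concatenate the per-head outputs.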

Finally, we apply average pooling to the unified tokens generated from the last self-attention block to obtain the overall multimodal representation used for predicting LLNM. This representation is passed through a two-layer MLP to produce the final prediction logits. During the training phase, we compute the binary cross-entropy ${loss}$ between these logits and the lymph node metastasis labels, as given by the following formula:

$${loss}=-\frac{1}{N}{\sum }_{i=1}^{N}\left[{Y}_{i}\cdot \log \left(P\left({Y}_{i}\right)\right)+\left(1-{Y}_{i}\right)\cdot \log \left(1-P\left({Y}_{i}\right)\right)\right]$$

(11)

Here, $N$ represents the number of samples in the training set, ${Y}_{i}$ denotes the label of sample $i$, and $P\left({Y}_{i}\right)$ is the probability predicted by LLNM-Net. Each patient has two sets of imaging data, acquired from different directions, that share the same clinical information. Each set yields its own loss value, so there are two loss values per patient. We apply average pooling to these values, taking the mean, and then pass it to the two-layer MLP and ${loss}$ function.
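Eq. (11) can be written in a few lines of NumPy. The sketch below is illustrative only; the function name and the clipping constant `eps`, which guards against $\log (0)$, are assumptions rather than details from the paper.

```python
import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean binary cross-entropy between labels Y_i and predicted probabilities P(Y_i)."""
    y = np.asarray(y_true, dtype=float)
    # Clip probabilities away from 0 and 1 so the logarithms stay finite.
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

# Toy labels and predicted probabilities for four samples.
loss = binary_cross_entropy([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.3])
```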

Model interpretation method

We used a standard attention analysis method for feature analysis. For each layer in LLNM-Net, we computed the average attention weights across multiple heads. Considering the residual connections, we added an identity matrix to each attention matrix and normalized the resulting weight matrix. Next, we recursively multiplied the weight matrices from different layers of LLNM-Net. Finally, we obtained an attention map that includes the similarity between each input token and the CLS token. Since the CLS token is used for diagnostic prediction, these similarities indicate the correlation between the input tokens and the prediction outcome, which can then be used for visualization. We used Grad-CAM++⁴⁷ to visualize the model parameters.
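The layer-wise procedure described above (average the heads, add the identity, normalize, multiply across layers) can be sketched as follows. The function name, the random attention matrices, and the choice of index 0 for the CLS token are assumptions made for illustration.

```python
import numpy as np

def attention_rollout(attn_per_layer):
    """attn_per_layer: list of arrays shaped (heads, tokens, tokens), first layer first."""
    n_tokens = attn_per_layer[0].shape[-1]
    rollout = np.eye(n_tokens)
    for attn in attn_per_layer:
        a = attn.mean(axis=0)                  # average attention weights over heads
        a = a + np.eye(n_tokens)               # identity term for the residual connection
        a = a / a.sum(axis=-1, keepdims=True)  # re-normalize each row to sum to 1
        rollout = a @ rollout                  # accumulate across layers
    return rollout

# Toy input: 4 layers, 12 heads, 6 tokens, with row-normalized random attention.
rng = np.random.default_rng(0)
layers = []
for _ in range(4):
    a = rng.random((12, 6, 6))
    layers.append(a / a.sum(axis=-1, keepdims=True))
rollout = attention_rollout(layers)
cls_relevance = rollout[0, 1:]  # relevance of each input token, assuming CLS sits at index 0
```

Each row of `rollout` remains a probability distribution over input tokens, so the CLS row can be reshaped onto the image patches or report words for visualization.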

To provide a qualitative interpretation of position, we employed a symmetric diffeomorphism-based algorithm⁴⁸ for registration. Thyroid images are not spatially aligned because of individual variability and differences in image acquisition, so we needed to map all thyroid data onto a unified standardized template. We denote the registration process by $R$. It relates an atlas feature matrix ${x}_{l}$ to a target feature matrix ${x}_{l}^{R}$, both expressed as functions ${x}_{l},{x}_{l}^{R}:\Omega \to {\mathbb{R}}$ on the domain $\Omega$ of the feature matrix. The algorithm posits a diffeomorphism ${\mathfrak{d}}$ on $\Omega$ that connects these feature matrices so that ${x}_{l}^{R}={x}_{l}\circ {{\mathfrak{d}}}^{-1}$. This diffeomorphism is the end point ${\mathfrak{d}}={{\mathfrak{D}}}_{1}$ of a curve ${{\mathfrak{D}}}_{t},\,t\in \left[0,1\right]$, that satisfies the ordinary differential equation (o.d.e.):

$$\frac{d{{\mathfrak{D}}}_{t}}{{dt}}={v}_{t}\left({{\mathfrak{D}}}_{t}\right),\,{{\mathfrak{D}}}_{0}={Id},\,t\in \left[0,1\right]$$

(12)

In this context, ${{\mathfrak{D}}}_{0}={Id}$ represents the identity transformation, while ${v}_{t}$ is the time-varying, smooth velocity field, defined as ${v}_{t}:\Omega \to {\mathbb{R}},\,t\in \left[0,1\right]$. The diffeomorphism is obtained by integrating the velocity field: ${\mathfrak{d}}={{\mathfrak{D}}}_{1}={{\mathfrak{D}}}_{0}+{\int }_{0}^{1}{v}_{t}\left({{\mathfrak{D}}}_{t}\right){dt}$ with ${{\mathfrak{D}}}_{0}={Id}$. We determine the optimal ${v}_{t}$ by solving the standard Large Deformation Diffeomorphic Metric Matching (LDDMM)⁶³ equation:

$$\hat{v}=\underset{v:\,\frac{d{{\mathfrak{D}}}_{t}}{{dt}}={v}_{t}\left({{\mathfrak{D}}}_{t}\right)}{\arg \min }\left({\int }_{0}^{1}{\left\Vert L{v}_{t}\right\Vert }_{{L}^{2}}^{2}{dt}+\frac{1}{{\sigma }^{2}}{\left\Vert {x}_{l}\circ {{\mathfrak{d}}}^{-1}-{x}_{l}^{R}\right\Vert }_{{L}^{2}}^{2}\right)$$

(13)

Where $L$ is the smoothness operator defined by $L=-\alpha {\nabla }^{2}+\gamma \,{Id}$, ${\nabla }^{2}$ is the Laplacian operator, and ${Id}$ is the identity operator. We used linear interpolation for image transformation. Mutual information served as the optimization metric during the registration process, and the final evaluation index was the mean square error (MSE):

$${MSE}=\frac{1}{{MN}}{\sum }_{q=1}^{M}{\sum }_{p=1}^{N}{\left[{x}_{l}\left(p,q\right)-{x}_{l}^{R}\left(p,q\right)\right]}^{2}$$

(14)

where $M$ and $N$ represent the row and column dimensions of the matrix, respectively.

We calculated the sum of ${x}_{l}^{R}\left({Meta}\right)$ for all metastatic patients and subtracted the sum of ${x}_{l}^{R}\left({Non}\right)$ for all non-metastatic patients. Then we scaled the matrix to the 0–1 range, obtaining the metastasis risk distribution map ${{Risk}}_{{Meta}}$.

$${{Risk}}_{{Meta}}={Norm}\left(\sum {x}_{l}^{R}\left({Meta}\right)-\sum {x}_{l}^{R}\left({Non}\right)\right)$$

(15)
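A minimal sketch of Eq. (15), assuming the registered location maps are already stacked as arrays of shape (patients, rows, columns) and that ${Norm}$ is min-max scaling to the 0–1 range. The function name and the toy arrays are hypothetical.

```python
import numpy as np

def metastasis_risk_map(registered_meta, registered_non):
    """Sum of metastatic maps minus sum of non-metastatic maps, min-max scaled to [0, 1]."""
    diff = np.sum(registered_meta, axis=0) - np.sum(registered_non, axis=0)
    span = diff.max() - diff.min()
    if span == 0:
        # A constant difference map carries no spatial contrast.
        return np.zeros_like(diff, dtype=float)
    return (diff - diff.min()) / span

# Toy data: two metastatic and one non-metastatic registered 2x2 maps.
meta = np.array([[[1.0, 0.0], [0.0, 0.0]],
                 [[1.0, 1.0], [0.0, 0.0]]])
non = np.array([[[0.0, 0.0], [0.0, 1.0]]])
risk = metastasis_risk_map(meta, non)
```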

Model evaluation and radiologist competing test

We evaluated the performance of the predictive model using the AUC of the receiver operating characteristic (ROC) curve, as well as its sensitivity, specificity, accuracy, negative predictive value (NPV), and positive predictive value (PPV). To compare the predictive effectiveness of AI and human experts for LLNM, we designed a test for ultrasound physicians. Male and female experts were recruited in equal numbers. Sex information was collected through self-reporting; however, the sex of the experts was not used as a variable in the analysis of this study. All participating physicians were required to complete two tasks:

Task 1: Physicians were asked to diagnose thyroid cancer based on 200 ultrasound images, which included 100 benign and 100 malignant nodules.
Task 2: Physicians were required to predict LLNM based on 200 cases using ultrasound images, imaging reports, and clinical information, including 100 LLNM-positive and 100 LLNM-negative cases.

The purpose of Task 1 was to assess the participating physicians’ expertise in detecting thyroid lesions on ultrasound images. Task 2 was designed to evaluate the physicians’ ability to predict LLNM using a combination of images and clinical information.

Statistical analysis

We used bootstrapping to estimate 95% confidence intervals (CIs) for the classification performance metrics: AUC, sensitivity, specificity, accuracy, NPV and PPV. Specifically, we performed n-out-of-n bootstrap sampling with replacement at the image level on our datasets. For each bootstrap sample (100 samples), we calculated and retained the performance metrics specific to that sample. This process was carried out 1000 times. We then took the 2.5th and 97.5th percentiles of each metric’s empirical distribution as the 95% CI. All computations and statistical analyses were conducted using Python, version 3.9 (Python Software Foundation).
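The percentile bootstrap can be sketched as below. This is one plausible reading of the procedure, not the authors' script: it draws 1000 n-out-of-n resamples with replacement and takes the 2.5th and 97.5th percentiles. The function names, the seed, and the toy predictions are assumptions.

```python
import numpy as np

def bootstrap_ci(y_true, y_pred, metric, n_boot=1000, seed=0):
    """Percentile 95% CI for metric(y_true, y_pred) from n-out-of-n resampling."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    rng = np.random.default_rng(seed)
    n = len(y_true)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample n indices with replacement
        stats[b] = metric(y_true[idx], y_pred[idx])
    lower, upper = np.percentile(stats, [2.5, 97.5])
    return float(lower), float(upper)

def accuracy(y_true, y_pred):
    return float(np.mean(y_true == y_pred))

# Toy predictions: 100 images, 10 of them misclassified (accuracy 0.9).
labels = np.array([1] * 50 + [0] * 50)
preds = labels.copy()
preds[:5] = 0
preds[-5:] = 1
ci_low, ci_high = bootstrap_ci(labels, preds, accuracy)
```

Any other metric with the same `(y_true, y_pred)` signature, such as sensitivity or NPV, can be passed in place of `accuracy`.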

NPV is the probability that a person testing negative for a disease truly does not have the disease. In other words, it’s the percentage of negative results that are correct. The formula for NPV is:

$${NPV}=\frac{{True\; negative}}{{True\; negative}+{False\; negative}}$$

(16)

PPV is the probability that a person testing positive for a disease truly has the disease. It’s the percentage of positive results that are correct. The formula for PPV is:

$${PPV}=\frac{{True\; positive}}{{True\; positive}+{False\; positive}}$$

(17)

Sensitivity (also referred to as the true positive rate or recall) is the proportion of positives that are correctly identified, calculated as follows:

$${Sensitivity}/{TPR}=\frac{{True\; positive}}{{True\; positive}+{False\; negative}}$$

(18)

Specificity (also known as the true negative rate), which measures the proportion of correctly identified negatives, was calculated as follows:

$${Specificity}/{TNR}=\frac{{True\; negative}}{{True\; negative}+{False\; positive}}$$

(19)
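Eqs. (16)–(19), together with accuracy, follow directly from the four confusion-matrix counts. The sketch below uses a hypothetical helper name and toy labels, and it assumes both classes appear among the labels and predictions so that no denominator is zero.

```python
import numpy as np

def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity, PPV, NPV and accuracy from binary labels and predictions."""
    t = np.asarray(y_true).astype(bool)
    p = np.asarray(y_pred).astype(bool)
    tp = int(np.sum(t & p))    # true positives
    tn = int(np.sum(~t & ~p))  # true negatives
    fp = int(np.sum(~t & p))   # false positives
    fn = int(np.sum(t & ~p))   # false negatives
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }

# Toy example: 4 positive and 6 negative cases with one error in each class.
metrics = confusion_metrics([1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
                            [1, 1, 1, 0, 1, 0, 0, 0, 0, 0])
```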

AUC, the area under the ROC curve, measures the entire two-dimensional area underneath the ROC curve from (0,0) to (1,1), and was calculated as:

$$A={\int }_{0}^{1}{TPR}({FP}{R}^{-1}(x)){dx}$$

(20)

Where ${FPR}=1-{TNR}=1-{Specificity}$. Given two bounding boxes ${b}_{1}$ and ${b}_{2}$, their intersection over union (IoU) was computed as:

$${IOU}\left({b}_{1},{b}_{2}\right)=\frac{{{{\rm{A}}}}\left({b}_{1}\cap {b}_{2}\right)}{{{{\rm{A}}}}({b}_{1}\cup {b}_{2})}=\frac{{{{\rm{A}}}}\left({b}_{1}\cap {b}_{2}\right)}{{{{\rm{A}}}}\left({b}_{1}\right)+{{{\rm{A}}}}({b}_{2})-{{{\rm{A}}}}({b}_{1}\cap {b}_{2})}$$

(21)

Where A(·) denotes the area of a shape. The calculation of IoU could therefore be formulated as a problem involving the computation of the area of each spherical rectangle and the intersection of two spherical rectangles.
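Both quantities above are easy to check numerically. The sketch below approximates the integral in Eq. (20) with the trapezoidal rule over the empirical ROC curve and evaluates Eq. (21) for planar, axis-aligned boxes given as `(x_min, y_min, x_max, y_max)`. The function names and the planar box format are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def roc_auc(y_true, scores):
    """Area under the empirical ROC curve via the trapezoidal rule."""
    y = np.asarray(y_true).astype(bool)
    s = np.asarray(scores, dtype=float)
    thresholds = np.unique(s)[::-1]  # distinct scores, from highest to lowest
    # Start at (0, 0), then add one ROC point per threshold; the lowest one gives (1, 1).
    tpr = np.array([0.0] + [np.mean(s[y] >= t) for t in thresholds])
    fpr = np.array([0.0] + [np.mean(s[~y] >= t) for t in thresholds])
    return float(np.sum((fpr[1:] - fpr[:-1]) * (tpr[1:] + tpr[:-1]) / 2.0))

def box_iou(b1, b2):
    """IoU of two axis-aligned boxes (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(b1[2], b2[2]) - max(b1[0], b2[0]))
    inter_h = max(0.0, min(b1[3], b2[3]) - max(b1[1], b2[1]))
    inter = inter_w * inter_h
    area1 = (b1[2] - b1[0]) * (b1[3] - b1[1])
    area2 = (b2[2] - b2[0]) * (b2[3] - b2[1])
    union = area1 + area2 - inter
    return inter / union if union > 0 else 0.0

auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
iou = box_iou((0, 0, 2, 2), (1, 1, 3, 3))
```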

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

All data supporting the research findings in this study are available in the article and its supplementary information files. The minimum dataset required to interpret, verify, and extend the study results on patients with lateral lymph node metastasis has been deposited in Hugging Face with the access link: https://huggingface.co/datasets/Snowinbio/LLNM_Multimodal_dataset. This includes: (1) Pre-processed cropped imaging data (ultrasound images with anonymized metadata). (2) Corresponding ultrasound imaging reports for patients, including professional descriptions of the images by physicians, characteristics of nodules, etc. (3) Clinical characteristic information of patients, including age and sex. The source data file containing detailed LLNM-Net outputs and key evaluation metrics can also be obtained via the following link: https://github.com/Snowinbio/LLNM-Net/blob/main/Source%20Data.xlsx. Due to ethical restrictions and patient confidentiality agreements, the full dataset (such as raw imaging data, detailed imaging reports, and patient clinical records) cannot be publicly available. This is because even after de-identification, detailed patient clinical records and high-resolution imaging data may still pose a risk of re-identification due to the unique characteristics of thyroid cancer cases. Researchers wishing to access additional data for non-commercial academic purposes may submit a formal application to the corresponding author. Applications will be reviewed by the institutional ethics committee and data custodians. The applicable conditions are as follows: (1) Purpose: The data may only be used for research purposes consistent with the original study objectives. (2) Access restrictions: Requestors must sign a data use agreement prohibiting re-identification or redistribution. (3) Data retention: Approved data will be available for 2 years from the date of publication. 
The data for each figure in this study are included in the “Source Data” section, with the file name Source Data.xlsx. This file can also be downloaded from the following link: https://github.com/Snowinbio/LLNM-Net/blob/main/Source%20Data.xlsx. Source data are provided with this paper.

Code availability

The primary code for the project is accessible at: https://github.com/Snowinbio/LLNM-Net.git. Installation instructions are provided in the repository. We have provided a permanent reference⁶⁴ for the specific code version used in this study. The code is released under the Apache License 2.0, which permits free use, modification, and redistribution under its terms. The model implementation is built upon multiple publicly available open-source projects. We have retained all original license information and copyright notices in the corresponding source files. Specifically, we acknowledge the following contributions: IRENE (Apache 2.0): https://github.com/RL4M/IRENE, YOLOv8 by Ultralytics (AGPL-3.0): https://huggingface.co/Ultralytics/YOLOv8.

References

Bray, F. et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 74, 229–263 (2024).
Miranda-Filho, A. et al. Thyroid cancer incidence trends by histology in 25 countries: a population-based study. Lancet Diab Endocrinol. 9, 225–234 (2021).
Haugen, B. R. et al. 2015 American Thyroid Association management guidelines for adult patients with thyroid nodules and differentiated thyroid cancer: the American Thyroid Association guidelines task force on thyroid nodules and differentiated thyroid cancer. Thyroid 26, 1–133 (2016).
Zaydfudim, V., Feurer, I. D., Griffin, M. R. & Phay, J. E. The impact of lymph node involvement on survival in patients with papillary and follicular thyroid carcinoma. Surgery 144, 1070–1077 (2008).
Smith, V. A., Sessions, R. B. & Lentsch, E. J. Cervical lymph node metastasis and papillary thyroid carcinoma: does the compartment involved affect survival? Experience from the SEER database. J. Surg. Oncol. 106, 357–362 (2012).
Yu, J. et al. Lymph node metastasis prediction of papillary thyroid carcinoma based on transfer learning radiomics. Nat. Commun. 11, 1–10 (2020).
Yao, S. et al. Thyroid cancer central lymph node metastasis risk stratification based on homogeneous positioning deep learning. Research 7, 0432 (2024).
Wang, Y. et al. Risk factors and a prediction model of lateral lymph node metastasis in CN0 papillary thyroid carcinoma patients with 1–2 central lymph node metastases. Front. Endocrinol. 12, 716728 (2021).
Lin, Y., Cui, N., Li, F., Wang, Y. & Wang, B. The model for predicting the central lymph node metastasis in cN0 papillary thyroid microcarcinoma with Hashimoto’s thyroiditis. Front. Endocrinol. 15, 1330896 (2024).
Machens, A., Hauptmann, S. & Dralle, H. Prediction of lateral lymph node metastases in medullary thyroid cancer. Br. J. Surg. 95, 586–591 (2008).
Mohebati, A. & Shaha, A. R. Anatomy of thyroid and parathyroid glands and neurovascular relations. Clin. Anat. 25, 19–31 (2012).
Greene, F. L. et al. AJCC cancer staging handbook: TNM classification of malignant tumors. (Springer Science & Business Media, 2002).
Liddy, W., Bonilla-Velez, J., Triponez, F., Kamani, D. & Randolph G. 31 - Principles in Thyroid Surgery. In: Surgery of the Thyroid and Parathyroid Glands (Third Edition) (ed Randolph GW). (Elsevier, 2021).
Ito, Y. et al. Preoperative ultrasonographic examination for lymph node metastasis: usefulness when designing lymph node dissection for papillary microcarcinoma of the thyroid. World J. Surg. 28, 498–501 (2004).
de Meer, S. G. A. et al. Not the number but the location of lymph nodes matters for recurrence rate and disease-free survival in patients with differentiated thyroid cancer. World J. Surg. 36, 1 (2012).
Ruan, J. et al. Lateral lymph node metastasis in papillary thyroid microcarcinoma: a study of 5241 follow-up patients. Endocrine 83, 414–421 (2024).
Hu, D. et al. Risk factors for and prediction model of skip metastasis to lateral lymph nodes in papillary thyroid carcinoma. World J. Surg. 44, 1498–1505 (2020).
Feng, J.-W. et al. Predictive factors for lateral lymph node metastasis and skip metastasis in papillary thyroid carcinoma. Endocr. Pathol. 31, 67–76 (2020).
Wu, X., Li, B., Zheng, C. & He, X. Risk factors for skip metastasis in patients with papillary thyroid microcarcinoma. Cancer Med. 12, 7560–7566 (2023).
Liu, J., Liu, Q., Wang, Y., Xia, Z. & Zhao, G. Nodal skip metastasis is associated with a relatively poor prognosis in thoracic esophageal squamous cell carcinoma. Eur. J. Surg. Oncol 42, 1202–1205 (2016).
Prenzel, K. L. et al. Skip metastasis in nonsmall cell lung carcinoma. Cancer 100, 1909–1917 (2004).
Tee, Y. Y., Lowe, A. J., Brand, C. A. & Judson, R. T. Fine-needle aspiration may miss a third of all malignancy in palpable thyroid nodules: a comprehensive literature review. Ann. Surg. 246, 714–720 (2007).
Qubain, S. W., Nakano, S., Baba, M., Takao, S. & Aikou, T. Distribution of lymph node micrometastasis in pN0 well-differentiated thyroid carcinoma. Surgery 131, 249–256 (2002).
Back, K., Kim, J. S., Kim, J.-H. & Choe, J.-H. Superior located papillary thyroid microcarcinoma is a risk factor for lateral lymph node metastasis. Ann. Surg. Oncol. 26, 3992–4001 (2019).
Ywata de Carvalho, A., Kohler, H. F., Gomes, C. C., Vartanian, J. G. & Kowalski, L. P. Predictive factors for recurrence of papillary thyroid carcinoma: analysis of 4,085 patients. Acta Otorhinolaryngol. Ital. 41, 236–242 (2021).
Liu, J. et al. Follicular variant of papillary thyroid carcinoma. Cancer 107, 1255–1264 (2006).
Sheng, L. et al. Predicting factors for central or lateral lymph node metastasis in conventional papillary thyroid microcarcinoma. Am. J. Surg. 220, 334–340 (2020).
Huang, J. et al. Developing and validating a multivariable machine learning model for the preoperative prediction of lateral lymph node metastasis of papillary thyroid cancer. Gland Surg. 12, 101–109 (2023).
Yao, J. et al. DeepThy-Net: a multimodal deep learning method for predicting cervical lymph node metastasis in papillary thyroid cancer. Adv. Intell. Syst. 4, 2200100 (2022).
Yao, S. et al. Human understandable thyroid ultrasound imaging AI report system—a bridge between AI and clinicians. iScience 26, 106530 (2023).
Yao, S. et al. Enhancing the fairness of AI prediction models by Quasi-Pareto improvement among heterogeneous thyroid nodule population. Nat. Commun. 15, 1958 (2024).
Ren, Y. et al. BMAP: a comprehensive and reproducible biomedical data analysis platform. bioRxiv, 2024.2007.2015.603507 (2024).
Deng, L., Wu, Y., Ren, Y. & Lu, H. Autonomous self-evolving research on biomedical data: the DREAM paradigm. Adv. Sci. 12, 2417066 (2025).
Ha, E. J. et al. Artificial intelligence model assisting thyroid nodule diagnosis and management: a multicenter diagnostic study. J. Clin. Endocrinol. Metab. 109, 527–535 (2024).
Zhou, L.-Q. et al. Deep learning predicts cervical lymph node metastasis in clinically node-negative papillary thyroid carcinoma. Insights Imaging 14, 222 (2023).
Tong, Y. et al. Ultrasound-based radiomic nomogram for predicting lateral cervical lymph node metastasis in papillary thyroid carcinoma. Acad. Radiol. 28, 1675–1684 (2021).
Xing, Z. et al. Thyroid cancer neck lymph nodes metastasis: Meta-analysis of US and CT diagnosis. Eur. J. Radiol. 129, 109103 (2020).
Dai, F. et al. Improving AI models for rare thyroid cancer subtype by text guided diffusion models. Nat. Commun. 16, 4449 (2025).
Zheng, Z., Zhong, Y., Wang, J. & Ma, A. Foreground-aware relation network for geospatial object segmentation in high spatial resolution remote sensing imagery. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2020).
Redmon, J. You only look once: Unified, real-time object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2016).
Borgefors, G. Distance transformations in digital images. Comput. Vis. Graph. Image Process. 34, 344–371 (1986).
Vaswani, A. Attention is all you need. Advances in Neural Information Processing Systems (NeurIPS, 2017).
Zhou, H.-Y. et al. A transformer-based representation-learning model with unified processing of multimodal input for clinical diagnostics. Nat. Biomed. Eng. 7, 743–755 (2023).
Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics (2019).
Kolesnikov, A. et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In: International Conference on Learning Representations (2021).
Kwak, J. Y. et al. Thyroid imaging reporting and data system for us features of nodules: a step in establishing better stratification of cancer risk. Radiology 260, 892–899 (2011).
Selvaraju, R. R. et al. Grad-cam: Visual explanations from deep networks via gradient-based localization (2017).
Avants, B. B., Tustison, N. & Song, G. Advanced normalization tools (ANTS). Insight J. 2, 1–35 (2009).
Tustison, N. J. et al. The ANTsX ecosystem for quantitative biological and medical imaging. Sci. Rep. 11, 1–13 (2021).
Ito, Y. et al. Prognosis of patients with papillary thyroid carcinoma having clinically apparent metastasis to the lateral compartment. Endocr. J. 56, 759–766 (2009).
Zaydfudim, V., Feurer, I. D., Griffin, M. R. & Phay, J. E. The impact of lymph node involvement on survival in patients with papillary and follicular thyroid carcinoma. Surgery 144, 1070–1078 (2008).
Ricarte-Filho, J. et al. Papillary thyroid carcinomas with cervical lymph node metastases can be stratified into clinically relevant prognostic categories using oncogenic BRAF, the number of nodal metastases, and extra-nodal extension. Thyroid 22, 575–584 (2012).
Ramundo, V. et al. Is thyroid nodule location associated with malignancy risk? Ultrasonography 38, 231–235 (2019).
Zhang, F. et al. Thyroid nodule location on ultrasonography as a predictor of malignancy. Endocr. Pract. 25, 131–137 (2019).
Jasim, S., Baranski, T. J., Teefey, S. A. & Middleton, W. D. Investigating the effect of thyroid nodule location on the risk of thyroid cancer. Thyroid 30, 401–407 (2020).
Ronneberger, O., Fischer, P. & Brox, T. U-Net: convolutional networks for biomedical image segmentation. In: Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015 (eds Navab N., Hornegger J., Wells W. M., Frangi A. F.) (2015).
Zhou, Z., Siddiquee, M. M. R., Tajbakhsh, N. & Liang, J. UNet++: redesigning skip connections to exploit multiscale features in image segmentation. IEEE Trans. Med. Imaging 39, 1856–1867 (2020).
Alexander, E. K. et al. Thyroid nodule shape and prediction of malignancy. Thyroid 14, 953–958 (2004).
Xie, C., Cox, P., Taylor, N. & LaPorte, S. Ultrasonography of thyroid nodules: a pictorial review. Insights Imaging 7, 77–86 (2016).
Xu, P., Ding, J., Zhang, H. & Huang, H. Discernible image mosaic with edge-aware adaptive tiles. Comput. Vis. Media 5, 45–58 (2019).
Zhang, K. et al. Deep-learning models for the detection and incidence prediction of chronic kidney disease and type 2 diabetes from retinal fundus images. Nat. Biomed. Eng. 5, 533–545 (2021).
Akselrod-Ballin, A. et al. Predicting breast cancer by applying deep learning to linked health records and mammograms. Radiology 292, 331–342 (2019).
Beg, M. F., Miller, M. I., Trouvé, A. & Younes, L. Computing large deformation metric mappings via geodesic flows of diffeomorphisms. Int. J. Comput. Vis. 61, 139–157 (2005).
Shen, P. Explainable multimodal deep learning for predicting thyroid cancer lateral lymph node metastasis using ultrasound imaging: LLNM-Net. v1.0.0 edn. Zenodo (2025).


Acknowledgements

Funding. This work was partially supported by the National Natural Science Foundation of China (grant No. 62406191), Shanghai Municipal Education Commission (No.2024AIYB010), the Fundamental Research Funds for the Central Universities (YG2025LC03), the Interdisciplinary Program of Shanghai Jiao Tong University (YG2024QNA02), the National Natural Science Foundation of China (82403192) and the Shanghai Anticancer Association (SACA-CY23C07), the Startup for Young Faculty (SFYF) at SJTU(24×010500175), the Science and Technology Commission of Shanghai Municipality (STCSM) (Grant No. 23JS1400800&23JS1400700), SJTU Transmed Awards Research (STAR) Grant No. 20210106 (HL) and Neil Shen’s SJTU Medical Research Fund. The content of this article does not reflect the view of the funding sources.

Author information

These authors contributed equally: Pengcheng Shen, Zheyu Yang, Jingjing Sun.

Authors and Affiliations

Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, PR China
Pengcheng Shen, Yongyong Ren, Hui Lu & Siqiong Yao
Department of General Surgery, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, PR China
Zheyu Yang & Wei Cai
Department of Ultrasound, Shanghai Fourth People’s Hospital, School of Medicine, Tongji University, Shanghai, PR China
Jingjing Sun
Hepatobiliary Pancreatic Center, Xuzhou Central Hospital, Xuzhou, Jiangsu Province, PR China
Yun Wang
Medical college, Nantong University, Nantong, Jiangsu, PR China
Cheng Qiu
Department of Endocrinology and Metabolism, Shanghai Children’s Medical Center, School of Medicine, Shanghai Jiao Tong University, Shanghai, PR China
Yirou Wang
SJTU-Yale Joint Center of Biostatistics and Data Science, Technical Center for Digital Medicine, National Center for Translational Medicine, Shanghai Jiao Tong University, Shanghai, PR China
Yongyong Ren, Hui Lu & Siqiong Yao
Department of Thyroid and Breast Surgery, Shanghai Fourth People’s Hospital, School of Medicine, Tongji University, Shanghai, PR China
Sheng Liu
Institute of Bioinformatics, Shanghai Academy of Experimental Medicine, Shanghai, PR China
Hui Lu

Authors

Pengcheng Shen
Zheyu Yang
Jingjing Sun
Yun Wang
Cheng Qiu
Yirou Wang
Yongyong Ren
Sheng Liu
Wei Cai
Hui Lu
Siqiong Yao

Contributions

P.S., S.Y., H.L. and W.C. developed the concept for the manuscript. P.S. and S.Y. drafted the manuscript, designed the model and analyzed the data. P.S. and Z.Y. contributed to data cleaning. W.C., S.L., Z.Y., J.S., Yun Wang, Q.C., Yirou Wang and Y.R. provided medical data and clinical advice.

Corresponding authors

Correspondence to Sheng Liu, Wei Cai, Hui Lu or Siqiong Yao.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Qicheng Lao and Nikita Pozdeyev for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Peer Review File

Reporting Summary

Source data

Source Data (XLSX)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Shen, P., Yang, Z., Sun, J. et al. Explainable multimodal deep learning for predicting thyroid cancer lateral lymph node metastasis using ultrasound imaging. Nat Commun 16, 7052 (2025). https://doi.org/10.1038/s41467-025-62042-z

Received: 29 October 2024
Accepted: 09 July 2025
Published: 01 August 2025
Version of record: 01 August 2025
DOI: https://doi.org/10.1038/s41467-025-62042-z
