Introduction

Thyroid cancer is the most common endocrine malignancy, and its incidence has risen rapidly in recent years. Global cancer statistics for 2020 report 586,202 new cases of thyroid cancer (3% of all new cancer cases, ranking 9th in incidence) and 43,646 deaths (0.4% of all cancer deaths)1,2. Although fine-needle aspiration biopsy (FNAB) is central to the evaluation of thyroid tumors, it can yield false-negative results, especially for large thyroid nodules3. Molecular tests that detect mutations in the BRAF, NRAS, and HRAS genes have helped overcome some limitations of FNAB, but their sensitivity remains unclear4. Therefore, identifying molecular markers that influence thyroid cancer development is essential for primary diagnosis and prognostic assessment.

Early studies of oncogenes focused primarily on protein-coding genes. However, protein-coding sequences make up only about 2% of the human genome, whereas non-coding RNAs, such as long non-coding RNAs (lncRNAs), pseudogene transcripts, and small RNAs, make up the majority of transcripts5. lncRNAs are RNA transcripts longer than 200 nucleotides that do not encode proteins6. lncRNAs that contain microRNA response elements (MREs) can bind microRNAs (miRNAs) and act as negative regulators of miRNA activity7. Recent advances in whole-genome sequencing technologies have revealed that lncRNAs play critical regulatory roles in human diseases and cancer development8,9. Weighted gene co-expression network analysis (WGCNA) has been widely used in genetics, clinical disease, and cancer research, and it supports the exploration of large-scale datasets and the screening of biomarkers10,11. The competitive endogenous RNA (ceRNA) hypothesis, proposed in 2011, holds that diverse RNA transcripts can act as ceRNAs that compete for shared miRNAs through their binding sites, preventing those miRNAs from binding messenger RNAs (mRNAs) and thereby regulating mRNA expression12. Studies of ceRNA networks have recently increased, particularly in high-incidence cancers such as breast, gastric, and colorectal cancers, as well as in other diseases13,14.

In ceRNA gene regulatory networks, the complex regulatory relationships among lncRNAs, miRNAs, and target genes may shape gene expression and thereby affect the biological characteristics of tumors. At the cellular level, the environment in which tumor cells grow and develop is complex: cells such as immune cells and epithelial cells are linked to fibroblasts, adipocytes, and the extracellular matrix through chemokines and cytokines15. Immune cells have complex and varied communication patterns that are critical for tumor proliferation, migration, and immunosuppression, and the interaction between tumor cells and infiltrating immune cells may be regulated by ceRNA networks. Cross-talk between cancer cells and the tumor microenvironment (TME) has complex effects on gene expression at the transcriptional and translational levels. A deeper understanding of how the TME influences cancer development and targeted immunotherapy can inform the clinical management of cancer and the development of novel anticancer therapies. Immunotherapy acts against tumor cells by modulating the activity of immune molecules, and identifying effective immune-related markers for individualized therapy offers a new approach to developing tumor prevention programs.

Previous studies have shown that in lung adenocarcinoma, ZIC2 may be modulated by several distinct lncRNA-mediated ceRNA mechanisms, and its expression directly influences the level of immune cell infiltration16. In the ceRNA regulatory network of renal cancer, KCNN4 is a crucial regulatory factor closely associated with the quiescence of regulatory T cells (Tregs) and mast cells, and its abnormal expression increases the likelihood of immune evasion17. STRA6 can accelerate the infiltration of associated fibroblasts and promote epithelial-mesenchymal transition. As a prognostic marker for BRAF-mutant tumors, it may drive thyroid carcinogenesis by activating oncogenic pathways and shaping an immunosuppressive tumor microenvironment18. These studies motivate exploring potential co-expression relationships between ceRNA regulatory axes and immune cells in thyroid cancer.

The main analytical workflow of this study is shown in Fig. 1. Using sample information and gene expression data from the TCGA, GTEx, and GEO datasets, we screened for key thyroid cancer genes with WGCNA and differential expression analysis. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were used to characterize the potential biological functions of these genes and the pathways involved. A protein-protein interaction (PPI) network was built to describe the interactions among them. We then predicted targeting relationships between genes and constructed differential ceRNA regulatory networks specific to thyroid cancer. Survival analysis identified genes that affect overall survival in thyroid cancer patients, and these genes were used to construct a prognostic ceRNA sub-network. The Cibersort algorithm was used to estimate the proportions of immune cells in cancerous thyroid tissue, and key genes in the ceRNA regulatory networks were correlated with these immune cells. Finally, based on prognostic genes and prognosis-associated immune cells in the ceRNA network, we constructed a model to predict the prognostic risk of cancer patients, providing a reference for exploring the complex pathogenesis of thyroid cancer.

Fig. 1

Flow chart of ceRNA network construction in thyroid cancer.

Results

Differential expression analysis

Based on the TCGA and GTEx databases, 4,791 differentially expressed mRNAs were identified between cancerous and normal thyroid tissues. Among these, 2,017 genes were upregulated, and 2,774 genes were downregulated in cancer tissues (Fig. 2a). The GEO dataset (GSE33630) identified 925 differentially expressed mRNAs, with 480 upregulated and 425 downregulated in cancer tissues (Fig. 2b). After intersecting the two datasets, 489 differentially expressed mRNAs were identified, of which 325 were upregulated and 164 were downregulated. Additionally, 2,686 differentially expressed lncRNAs were identified in cancer tissues, with 1,153 upregulated and 1,533 downregulated (Fig. 2c). Finally, 162 differentially expressed miRNAs were identified, with 121 upregulated and 41 downregulated (Fig. 2d).
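The up/down calls and the cross-dataset intersection described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline. This section does not state the cut-offs used, so several details are hypothetical assumptions: the thresholds (|log2FC| > 1, adjusted P < 0.05), the requirement that a gene change in the same direction in both datasets, and the gene names and values.

```python
"""Sketch: call up/down-regulated genes and intersect two datasets.

Assumed thresholds (hypothetical, not stated in the text): |log2FC| > 1 and adjusted P < 0.05.
"""

LOG2FC_CUTOFF = 1.0  # hypothetical fold-change threshold
PADJ_CUTOFF = 0.05   # hypothetical adjusted-P threshold


def call_direction(log2fc, padj):
    """Return 'up', 'down', or None for a single gene."""
    if padj >= PADJ_CUTOFF or abs(log2fc) <= LOG2FC_CUTOFF:
        return None
    return "up" if log2fc > 0 else "down"


def classify(results):
    """Map gene -> direction for genes passing both thresholds.

    `results` is a dict: gene -> (log2fc, padj).
    """
    calls = {}
    for gene, (log2fc, padj) in results.items():
        direction = call_direction(log2fc, padj)
        if direction is not None:
            calls[gene] = direction
    return calls


def intersect_consistent(calls_a, calls_b):
    """Keep genes called in both datasets with the same direction (an assumption)."""
    shared = calls_a.keys() & calls_b.keys()
    return {g: calls_a[g] for g in shared if calls_a[g] == calls_b[g]}


# Toy values, not data from the study.
tcga_gtex = {"GENE_A": (2.1, 1e-6), "GENE_B": (-1.5, 1e-4), "GENE_C": (0.4, 1e-3)}
geo = {"GENE_A": (1.8, 1e-3), "GENE_B": (1.2, 0.01), "GENE_D": (-2.0, 1e-5)}

overlap = intersect_consistent(classify(tcga_gtex), classify(geo))
print(overlap)
```

GENE_B is significant in both toy datasets but changes in opposite directions, so the consistency rule drops it.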

Fig. 2

Volcano Plot of Differentially Expressed RNAs. Red dots indicate upregulated expression in cancer, green dots indicate downregulated expression, and black dots indicate no significant difference in expression. (a) 4,791 mRNAs (2,017 upregulated, 2,774 downregulated). (b) 925 mRNAs (480 upregulated, 425 downregulated). (c) 2,686 lncRNAs (1,153 upregulated, 1,533 downregulated). (d) 162 miRNAs (121 upregulated, 41 downregulated).

Weighted gene co-expression analysis

After removing one outlier sample (GTEX-SUCS-0026-SM-5CHQG) and filtering out low-expression genes, 337 normal and 512 thyroid cancer samples were included in the analysis. WGCNA identified co-expressed mRNAs in thyroid tissue using a soft-thresholding power of 10 (Fig. S1), at which the scale-free topology fit index (R²) was 0.86, indicating that the network approximated a scale-free distribution. Co-expressed genes were clustered into modules, and modules with a correlation higher than 0.75 were merged, yielding 10 mRNA co-expression modules. The cyan, green-yellow, and magenta modules, containing a total of 5,905 mRNAs, were most strongly correlated with the thyroid tissue traits (Fig. 3a/b). The same analysis was applied to lncRNAs with a soft-thresholding power of 3 (Fig. S2), yielding 6 co-expression modules. The turquoise module, with 4,089 lncRNAs, was most strongly correlated with the thyroid tissue traits (Fig. 3c/d). Intersecting the mRNAs and lncRNAs identified by both differential expression analysis and WGCNA yielded 372 mRNAs and 1,661 lncRNAs, which were used to construct the ceRNA regulatory network (Fig. 3e/f).
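A minimal sketch of the soft-thresholding step follows. It assumes an unsigned network (adjacency a_ij = |cor(x_i, x_j)|^β, the WGCNA default). The scale-free fit is computed in the spirit of WGCNA's `scaleFreeFitIndex`: connectivities are grouped into 10 equal-width bins, and the fit is the signed R² of log10 p(k) against log10 k. It is plain Python for illustration, not the WGCNA package. The random expression matrix is made up and will not itself be scale-free.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def adjacency(expr, beta):
    """Unsigned soft-threshold adjacency: |cor|**beta, zero on the diagonal."""
    n = len(expr)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            adj[i][j] = adj[j][i] = abs(pearson(expr[i], expr[j])) ** beta
    return adj


def scale_free_fit(connectivity, n_bins=10):
    """Signed R^2 of log10 p(k) vs log10 k (positive when the slope is negative)."""
    lo, hi = min(connectivity), max(connectivity)
    width = (hi - lo) / n_bins or 1.0
    bins = [[] for _ in range(n_bins)]
    for k in connectivity:
        bins[min(int((k - lo) / width), n_bins - 1)].append(k)
    xs = [math.log10(sum(b) / len(b)) for b in bins if b]  # mean k per bin
    ys = [math.log10(len(b) / len(connectivity)) for b in bins if b]  # p(k)
    if len(xs) < 3:
        return float("nan")
    r = pearson(xs, ys)
    return -math.copysign(r * r, r)  # R^2 carries the opposite sign of the slope


random.seed(0)
genes = [[random.gauss(0, 1) for _ in range(30)] for _ in range(40)]  # 40 genes x 30 samples
for beta in (1, 3, 10):
    k = [sum(row) for row in adjacency(genes, beta)]
    print(beta, round(sum(k) / len(k), 3), round(scale_free_fit(k), 2))
```

Raising β shrinks weak correlations toward zero and lowers mean connectivity. In practice, the smallest β whose fit index reaches a chosen level is usually picked.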

Functional enrichment analysis and protein-protein interaction analysis

GO enrichment analysis covered three categories: biological process (BP), cellular component (CC), and molecular function (MF). At the BP level, the overlapping mRNAs were mainly enriched in extracellular matrix organization, extracellular structure organization, external encapsulating structure organization, and response to transforming growth factor beta. At the CC level, they were enriched in collagen-containing extracellular matrix, secretory granule lumen, cytoplasmic vesicle lumen, and vesicle lumen. At the MF level, they were enriched in extracellular matrix structural constituent, heparin binding, sulfur compound binding, and glycosaminoglycan binding (Fig. 4a). KEGG analysis showed that the overlapping mRNAs were mainly involved in several pathways:

- protein digestion and absorption
- small cell lung cancer
- cytokine-cytokine receptor interaction
- transcriptional misregulation in cancer
- complement and coagulation cascades
- malaria
- the p53 signaling pathway
- thyroid cancer
- the AGE-RAGE signaling pathway in diabetic complications (Fig. 4b)

The PPI network constructed from the overlapping mRNAs contained 122 nodes and 192 edges (Fig. 4c). MCODE identified 5 key gene clusters in the network, involving 32 node genes (Fig. S3a-e). The MCC algorithm of the CytoHubba plugin identified the top 10 hub genes in the network: TYMS, RRM2, UBE2C, KIAA0101, TOP2A, TPX2, TK1, CKS2, FN1, and ITGA3. All of these genes were also present in the five key gene clusters (Fig. S3f).
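GO and KEGG over-representation is commonly assessed with a one-sided hypergeometric test followed by Benjamini-Hochberg correction. The text does not name the tool or test used, so the sketch below is an assumption. The 20,000-gene background and the term counts are hypothetical; only the 372-gene query size echoes the overlap reported above.

```python
from math import comb


def hypergeom_sf(k, universe, annotated, selected):
    """P(X >= k) for the hypergeometric distribution.

    X counts term-annotated genes among `selected` genes drawn without
    replacement from `universe` genes, `annotated` of which carry the term.
    """
    upper = min(annotated, selected)
    hits = sum(
        comb(annotated, i) * comb(universe - annotated, selected - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(universe, selected)


def benjamini_hochberg(pvalues):
    """BH-adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical terms: (query genes annotated to the term, background genes annotated).
terms = {"term_1": (25, 300), "term_2": (5, 400), "term_3": (2, 150)}
raw = {t: hypergeom_sf(k, 20000, size, 372) for t, (k, size) in terms.items()}
adjusted = dict(zip(raw, benjamini_hochberg(list(raw.values()))))
for term in terms:
    print(term, f"{raw[term]:.3g}", f"{adjusted[term]:.3g}")
```

Python's arbitrary-precision integers keep the binomial coefficients exact, so only the final division is rounded.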

Fig. 3

WGCNA identifies gene modules associated with phenotypic traits. (a) Dendrogram of mRNA clustering and module color division, with branches indicating mRNA clustering and each color representing a co-expression module. (b,d) Heatmap of the correlation between module eigengenes and traits. The number above each color block is the correlation coefficient, and the brackets below show the P-value. The color of the blocks reflects the magnitude of the correlation coefficient, with the legend on the right showing the range from red to blue. (c) Dendrogram of lncRNA clustering and module color division, with branches indicating lncRNA clustering and each color representing a co-expression module. (e) Venn diagram for mRNA, showing a total of 372 mRNAs: 230 upregulated and 142 downregulated. (f) Venn diagram for lncRNA, showing a total of 1,661 lncRNAs: 525 upregulated and 1,136 downregulated.

Fig. 4

GO/KEGG Functional Enrichment Analysis and PPI Networks. (a) Bar plot of GO. The horizontal axis represents the number of mRNAs, and the vertical axis represents the GO entry. Pink, green, and blue indicate the enrichment analysis results for BP, MF, and CC, respectively. (b) Bar plot of KEGG. The horizontal axis represents the number of mRNAs, and the vertical axis represents the pathway names. Colors from blue to red indicate the enrichment results from low to high significance. (c) PPI networks. Each node represents a gene, with orange indicating upregulated genes and purple indicating downregulated genes. The size of each node indicates the strength of interaction with other nodes, and the thickness of the line between nodes indicates the interaction strength.
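The hub genes above were ranked with CytoHubba's Maximal Clique Centrality (MCC). For a node v, MCC(v) is the sum of (|C| - 1)! over the maximal cliques C that contain v. The sketch below enumerates maximal cliques with the Bron-Kerbosch algorithm (with pivoting) and accumulates that sum. It illustrates the published definition and is not CytoHubba's own code. The four-node graph is a toy example, not the study's PPI network.

```python
from math import factorial


def maximal_cliques(adj):
    """Bron-Kerbosch with pivoting; `adj` maps node -> set of neighbours."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)  # r cannot be extended: it is maximal
            return
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def mcc_scores(adj):
    """MCC(v) = sum over maximal cliques C containing v of (|C| - 1)!."""
    scores = {v: 0 for v in adj}
    for clique in maximal_cliques(adj):
        for v in clique:
            scores[v] += factorial(len(clique) - 1)
    return scores


# Toy graph: triangle A-B-C plus a pendant edge A-D.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("A", "D")]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

scores = mcc_scores(graph)
print(scores)
```

A node whose neighbours share no edges belongs only to 2-cliques, so its MCC reduces to its degree. Nodes inside dense cliques are rewarded factorially.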

Construction of the ceRNA network

To construct the ceRNA regulatory network, we retained RNAs that satisfied the conditions implied by the ceRNA mechanism. First, miRNA expression changed in the opposite direction to, and was negatively correlated with, that of its target mRNAs and lncRNAs. Second, mRNAs targeted by the same miRNA were positively correlated with the corresponding lncRNAs. Using Starbase and LncACTdb, we identified 40 lncRNAs and 26 miRNAs with targeting relationships among the 525 up-regulated lncRNAs and 41 down-regulated miRNAs. Using Starbase and TargetScanHuman, we found that 59 mRNAs and 10 miRNAs had targeting relationships among the 230 up-regulated mRNAs and 41 down-regulated miRNAs. After filtering out pairs without the expected correlations, we constructed a ceRNA regulatory network comprising 16 up-regulated lncRNAs, 7 down-regulated miRNAs, and 43 up-regulated mRNAs (Fig. 5a). We applied the same analysis to the 1,136 down-regulated lncRNAs, 121 up-regulated miRNAs, and 142 down-regulated mRNAs. This yielded a ceRNA network with 16 down-regulated lncRNAs, 11 up-regulated miRNAs, and 13 down-regulated mRNAs (Fig. 5b). Pearson correlation coefficients between the expression levels of these RNAs are given in Supplementary Table 1.
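The correlation rules above can be expressed as a filter over candidate lncRNA-miRNA-mRNA triples. In this sketch, the predicted targets and the Pearson correlations are supplied as plain dictionaries; such targets would come from databases like Starbase, LncACTdb, or TargetScanHuman. Several details are simplifications for illustration:

- The miRNA names, gene names, and r values are hypothetical.
- Any P-value threshold is omitted.
- The opposite-direction requirement is assumed to be enforced beforehand by the choice of up- and down-regulated input sets.

```python
def cerna_triples(lnc_targets, mrna_targets, corr):
    """Return sorted (lncRNA, miRNA, mRNA) triples that satisfy the ceRNA rules.

    lnc_targets:  dict miRNA -> set of lncRNAs predicted to bind it
    mrna_targets: dict miRNA -> set of mRNAs predicted to be targeted by it
    corr:         dict frozenset({a, b}) -> Pearson r between a and b
    Rules: r(miRNA, lncRNA) < 0, r(miRNA, mRNA) < 0, r(lncRNA, mRNA) > 0.
    """

    def has_sign(a, b, sign):
        r = corr.get(frozenset((a, b)))
        return r is not None and r * sign > 0

    triples = []
    for mir in sorted(lnc_targets.keys() & mrna_targets.keys()):  # shared miRNAs only
        for lnc in sorted(lnc_targets[mir]):
            if not has_sign(mir, lnc, -1):
                continue
            for mrna in sorted(mrna_targets[mir]):
                if has_sign(mir, mrna, -1) and has_sign(lnc, mrna, +1):
                    triples.append((lnc, mir, mrna))
    return triples


# Hypothetical predictions and correlations.
lnc_targets = {"miR-x": {"LNC1", "LNC2"}}
mrna_targets = {"miR-x": {"GENE1", "GENE2"}}
corr = {
    frozenset(("miR-x", "LNC1")): -0.45,
    frozenset(("miR-x", "LNC2")): 0.10,   # wrong sign: LNC2 is dropped
    frozenset(("miR-x", "GENE1")): -0.38,
    frozenset(("miR-x", "GENE2")): -0.20,
    frozenset(("LNC1", "GENE1")): 0.52,
    frozenset(("LNC1", "GENE2")): -0.05,  # wrong sign: this pair is dropped
}
print(cerna_triples(lnc_targets, mrna_targets, corr))
```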

Fig. 5

ceRNA Networks. (a) Network comprising 16 up-regulated lncRNAs, 7 down-regulated miRNAs, and 43 up-regulated mRNAs. (b) Network comprising 16 down-regulated lncRNAs, 11 up-regulated miRNAs, and 13 down-regulated mRNAs. Circles represent mRNAs, diamonds represent lncRNAs, and rounded rectangles represent miRNAs. Red indicates genes up-regulated in thyroid cancer, and blue indicates down-regulated genes.

Survival analysis

Survival analysis of the expression levels of the 372 overlapping mRNAs in cancer patients identified 38 mRNAs significantly associated with overall survival. These were used to construct a prognostic ceRNA regulatory sub-network. Target prediction with the Starbase and TargetScanHuman databases showed that 13 of these mRNAs were targeted by 15 differentially expressed miRNAs and were associated with patient prognosis (Table 1). Kaplan-Meier survival curves show the relationship between these 13 targeted mRNAs and overall survival in thyroid cancer patients (Fig. S4).
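The per-gene survival comparison can be sketched with a Kaplan-Meier estimator. Splitting patients at the median expression of a gene is an assumption made for illustration, because this section does not state how high and low expression were defined. The follow-up data are hypothetical.

```python
import statistics


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimates at each distinct event time.

    times:  follow-up times; events: 1 = death observed, 0 = censored.
    Returns a list of (time, S(time)) pairs.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # deaths and censored patients both leave the risk set
    return curve


def split_by_median(expression):
    """Hypothetical grouping: 1 = above-median expression, 0 = otherwise."""
    median = statistics.median(expression)
    return [1 if x > median else 0 for x in expression]


# Hypothetical follow-up (months), event indicators, and expression of one mRNA.
times = [5, 12, 20, 26, 31, 40, 44, 60]
events = [1, 1, 0, 1, 0, 1, 0, 0]
expr = [8.1, 7.4, 6.9, 7.9, 5.2, 6.0, 5.5, 4.8]
groups = split_by_median(expr)
for g in (1, 0):
    idx = [i for i, x in enumerate(groups) if x == g]
    print(g, kaplan_meier([times[i] for i in idx], [events[i] for i in idx]))
```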

Table 1 Thirteen mRNAs targeted by miRNAs and associated with overall survival in thyroid cancer patients.

Construction of the prognosis-associated ceRNA subnetwork and the prognostic risk model

The targeting relationships between miRNAs and prognostic mRNAs obtained in the previous step were compiled. Based on Starbase and LncACTdb predictions, 63 differentially expressed lncRNAs were found to target the 15 miRNAs, and these pairs were further screened for the expected regulatory relationships and correlations. This yielded a prognostic ceRNA regulatory sub-network of 3 mRNAs, 4 miRNAs, and 14 lncRNAs (Fig. 6a). In thyroid cancer tissues, the mRNAs and lncRNAs in the prognostic ceRNA sub-network were up-regulated, while the miRNAs were down-regulated (Fig. S5). The expression levels of the up-regulated mRNAs and lncRNAs were positively correlated with each other, whereas the lncRNAs and mRNAs were negatively correlated with the down-regulated miRNAs (Fig. S6). Pearson correlation coefficients for the prognosis-related RNAs in the ceRNA regulatory network are given in Supplementary Table 2.

Clinical information for 498 thyroid cancer patients was collated and summarized (Table 2). Patients were divided by tumor stage into Stage I-II and Stage III-IV groups, and the association between each RNA in the ceRNA regulatory network and clinical stage was analyzed. Three mRNAs (CTSH, GALNT7, PRR15) and 11 lncRNAs (e.g., GAS5, CDKN2B-AS1, LINC00511) were up-regulated in tumor tissues, and their higher expression corresponded with more advanced tumor stages. Among the miRNAs, the down-regulated hsa-miR-363-3p and hsa-miR-20b-5p showed lower expression in advanced tumor stages. Two miRNAs (hsa-miR-195-5p, hsa-miR-138-5p) and three lncRNAs (AC005562.1, AC005083.1, ERICD) were not significantly associated with clinical stage (Fig. 6b).

Combining these results, the 14 lncRNAs and 3 mRNAs were used to construct a risk assessment model for thyroid cancer prognosis. Univariate Cox regression analysis identified 10 genes associated with prognosis. Multivariate Cox regression narrowed these to three RNAs for the risk assessment model: PRR15 (mRNA), HCP5 (lncRNA), and DUXAP8 (lncRNA). In the multivariate analysis, PRR15 (HR = 0.797) and HCP5 (HR = 0.665) were protective genes, with higher expression linked to better prognosis in thyroid cancer patients. Conversely, DUXAP8 (HR = 1.855) was a risk gene, with high expression associated with poor prognosis. Risk scores for thyroid cancer patients were calculated using the following formula (Table 3):

$$\text{Risk score} = (\text{PRR15 expression value}) \times (-0.226) + (\text{HCP5 expression value}) \times (-0.409) + (\text{DUXAP8 expression value}) \times 0.618$$
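The risk-score formula and the 1.9 cut-off can be applied directly. The sketch below also checks that exp(coefficient) reproduces the reported hazard ratios (0.797, 0.665, 1.855) to within rounding of the published coefficients. Three points are assumptions:

- Expression values are on the same scale used to fit the model, which is not stated here.
- The patient values are hypothetical.
- A score exactly equal to 1.9 is treated as low risk.

```python
import math

# Coefficients from the risk-score formula.
COEFFICIENTS = {"PRR15": -0.226, "HCP5": -0.409, "DUXAP8": 0.618}
CUTOFF = 1.9  # cut-off reported in the text


def risk_score(expression):
    """Linear predictor: sum of coefficient * expression value."""
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


def risk_group(score, cutoff=CUTOFF):
    # Assumption: only scores strictly above the cut-off count as high risk.
    return "high" if score > cutoff else "low"


# In a Cox model, HR = exp(coefficient) for a one-unit increase in expression.
hazard_ratios = {g: math.exp(c) for g, c in COEFFICIENTS.items()}
print({g: round(hr, 3) for g, hr in hazard_ratios.items()})

patient = {"PRR15": 1.0, "HCP5": 0.5, "DUXAP8": 4.0}  # hypothetical expression values
score = risk_score(patient)
print(round(score, 4), risk_group(score))
```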

A risk-score cut-off of 1.9 divided patients into high-risk and low-risk groups. Kaplan-Meier survival curves showed significantly worse survival in the high-risk group (P = 0.046) (Fig. 7a). ROC analysis showed moderate to good predictive performance, with AUCs of 0.681 at 1 year, 0.803 at 3 years, and 0.69 at 5 years (Fig. 7b). The model was internally validated on TCGA data by randomly dividing the patients into two test groups: test group 1 (350 samples) and test group 2 (148 samples), each with survival time and status. Applying the optimal cut-off of 1.9 to the two validation sets yielded 70 high-risk and 280 low-risk samples in group 1, and 33 high-risk and 115 low-risk samples in group 2. Overall survival differed significantly between the high-risk and low-risk groups in test group 1 (P = 1.048 × 10⁻⁴, Fig. 7c). The ROC analysis for test group 1 gave AUCs of 0.809 at 1 year and 0.662 at 3 years for predicting thyroid cancer outcomes (Fig. 7d). Survival also differed significantly between the high-risk and low-risk groups across both test groups (P = 5.771 × 10⁻³, Fig. 7e), with AUCs of 0.705 at 1 year and 0.761 at 5 years (Fig. 7f). Univariate (HR = 1.181, P < 0.001) and multivariate (HR = 1.103, P = 0.011) Cox analyses further indicated that the risk score based on the 3 RNAs could serve as an independent prognostic factor (Fig. 8).
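The P values comparing high-risk and low-risk Kaplan-Meier curves are typically obtained with a log-rank test. The text does not name the test, so that is an assumption here. The sketch below implements the standard two-group log-rank statistic with 1 degree of freedom. Its P value uses the identity P(χ²₁ > x) = erfc(sqrt(x/2)). The toy follow-up data are hypothetical.

```python
import math


def logrank_test(times, events, groups):
    """Two-group log-rank test; returns (chi-square, P) with 1 degree of freedom.

    events: 1 = death, 0 = censored; groups: 1 or 0 (e.g., high vs low risk).
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    o_minus_e = 0.0
    var = 0.0
    for t in event_times:
        n = sum(1 for ti in times if ti >= t)  # total at risk
        n1 = sum(1 for ti, g in zip(times, groups) if ti >= t and g == 1)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(times, events, groups) if ti == t and e and g == 1)
        o_minus_e += d1 - d * n1 / n  # observed minus expected deaths in group 1
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var if var > 0 else 0.0
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p


# Hypothetical follow-up; group 1 = high risk.
times = [2, 3, 4, 6, 7, 9, 10, 12]
events = [1, 1, 0, 1, 0, 1, 0, 0]
groups = [1, 1, 1, 1, 0, 0, 0, 0]
print(logrank_test(times, events, groups))
```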

To validate the expression levels of these key RNAs, quantitative PCR (qPCR) was performed on two thyroid cancer cell lines (TPC-1 and B-CPAP) and the normal thyroid epithelial cell line Nthy-ori 3-1. The expression levels of DUXAP8, PRR15, and HCP5 were significantly higher in both TPC-1 and B-CPAP cells than in Nthy-ori 3-1 cells (Fig. 9). These findings provide preliminary in vitro support for the proposed roles of PRR15, HCP5, and DUXAP8 in thyroid cancer pathogenesis and for their potential as prognostic markers.
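The figure legend states only that qPCR data were normalized to GAPDH. The sketch below assumes the widely used 2^-ΔΔCt (Livak) method for relative expression. The Ct values are hypothetical.

```python
from statistics import mean


def fold_change(ct_target, ct_ref, ct_target_ctrl, ct_ref_ctrl):
    """Relative expression by 2^-ddCt, using the mean of replicate Ct values.

    dCt = Ct(target) - Ct(reference gene, e.g., GAPDH) in each cell line;
    ddCt = dCt(sample) - dCt(control cell line).
    """
    d_ct_sample = mean(ct_target) - mean(ct_ref)
    d_ct_control = mean(ct_target_ctrl) - mean(ct_ref_ctrl)
    return 2 ** -(d_ct_sample - d_ct_control)


# Hypothetical triplicate Ct values (not data from the study).
fc = fold_change(
    ct_target=[25.1, 24.9, 25.0], ct_ref=[18.0, 18.1, 17.9],            # cancer cell line
    ct_target_ctrl=[27.0, 27.2, 26.8], ct_ref_ctrl=[18.0, 18.0, 18.0],  # Nthy-ori 3-1
)
print(round(fc, 2))
```

Each cycle of difference corresponds to a two-fold change, assuming roughly 100% amplification efficiency for both target and reference.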

Table 2 Basic clinical information for thyroid cancer patients.
Table 3 Statistical values of thyroid cancer prognostic risk assessment model.
Fig. 6

Construction of the prognostic ceRNA sub-network and the prognostic model. (a) Prognostic ceRNA sub-network constructed from 3 mRNAs, 4 miRNAs, and 14 lncRNAs. Each rectangle represents a gene, and its size indicates its degree of connectivity. (b) Relationship between RNA expression in the prognostic ceRNA sub-network and the clinical stage of thyroid cancer patients. Each column of the box plot represents a gene, with the gene name at the top. Genes associated with clinical stage are marked with asterisks: *** P < 0.001, ** P < 0.01, * P < 0.05. Genes with no significant association with clinical stage are unmarked. Green boxes indicate patients in stages I-II, and red boxes indicate patients in stages III-IV.

Fig. 7

(a) Kaplan-Meier survival curve and (b) ROC curves of the prognostic risk assessment model; (c) Kaplan-Meier survival curve and (d) ROC curves of test group 1; (e) Kaplan-Meier survival curve and (f) ROC curves of test group 2. In the survival plots, the blue curve shows the survival rate of the low-risk group and the red curve that of the high-risk group. The area under the ROC curve (AUC) indicates the discriminative ability of the prediction model. The blue, green, and red curves show the model's accuracy in predicting the survival of thyroid cancer patients at 1, 3, and 5 years.

Analysis of the content and differences of 22 immune cells

The Cibersort algorithm estimated the content of immune cells in thyroid tissue. It showed high levels of memory B cells, plasma cells, resting memory CD4 T cells, M0 macrophages, and M2 macrophages, suggesting extensive immune infiltration (Fig. 10a). Differences in immune cell content between normal and cancerous thyroid tissue were assessed with the Wilcoxon rank-sum test. The content of 15 immune cell types differed significantly (P < 0.05) between cancerous and normal tissues:

- memory B cells
- plasma cells
- CD8 T cells
- naive CD4 T cells
- resting memory CD4 T cells
- follicular helper T cells
- regulatory T cells (Tregs)
- resting NK cells
- activated NK cells
- M0 macrophages
- M2 macrophages
- resting dendritic cells
- activated dendritic cells
- resting mast cells
- activated mast cells (Fig. 10b)
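The Wilcoxon rank-sum (Mann-Whitney U) test used above compares the ranks of immune cell fractions between the two tissue types. The sketch below uses a normal approximation with the usual tie correction and no continuity correction. It is an illustration, not the authors' code. Software results can differ for small samples; R's `wilcox.test`, for example, uses an exact distribution for small samples without ties and applies a continuity correction by default otherwise. The fractions are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks with ties given the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank of the tied block i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided rank-sum test; returns (U statistic for x, approximate P)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    tie_sizes = {}
    for r in ranks:  # tied values share one average rank
        tie_sizes[r] = tie_sizes.get(r, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_sizes.values())
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        return u, 1.0
    z = (u - n1 * n2 / 2) / math.sqrt(var)
    return u, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical immune cell fractions in normal and tumor samples.
normal = [0.02, 0.05, 0.03, 0.04, 0.01, 0.03]
tumor = [0.06, 0.08, 0.05, 0.09, 0.07, 0.05]
u, p = rank_sum_test(normal, tumor)
print(u, round(p, 4))
```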

Fig. 8

Forest plot of the regression analysis of factors associated with thyroid cancer prognosis. (a) Univariate regression analysis. (b) Multivariate regression analysis. Solid squares indicate HR values: red squares are risk factors (HR > 1) and green squares are protective factors (HR < 1). Horizontal lines indicate 95% confidence intervals.

Fig. 9

Relative expression analysis in diverse cell lines. qPCR analysis of PRR15 (a), HCP5 (b), and DUXAP8 (c) expression in thyroid cancer cell lines (TPC-1, B-CPAP) and a normal thyroid cell line (Nthy-ori 3-1). Data are normalized to GAPDH expression and presented as mean ± SD. *p < 0.05, **p < 0.01, ***p < 0.001.

Fig. 10

Content and differences of 22 immune cell types in thyroid samples. (a) Heatmap. The 66 normal samples are shown in blue at the top, and the 174 cancer samples in red. The color scale from blue to red indicates low to high immune cell content. (b) Violin plot comparing immune cell content between normal thyroid tissue and tumor tissue. The horizontal axis shows the 22 immune cell types, and the vertical axis shows their content in each sample. Normal tissue is shown in blue and cancerous tissue in red.

Survival information and immune cell content were integrated for the thyroid cancer samples. Immune cells with P < 0.05 in univariate Cox regression were entered into a multivariate Cox regression. This analysis showed that resting NK cells, monocytes, M0 macrophages, and activated mast cells were associated with thyroid cancer prognosis (Fig. 11a). Kaplan-Meier curves for the risk groups defined by the multivariate Cox model showed a poorer prognosis in the high-risk group (Fig. 11b). ROC analysis indicated that the model was a good predictor of patient prognosis, with AUC values of 0.905, 0.963, and 0.969 for 1-year, 3-year, and 5-year survival, respectively (Fig. 11c).
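The 1-, 3-, and 5-year AUCs come from time-dependent ROC analysis. The sketch below computes a naive cumulative/dynamic AUC at a horizon t:

- Cases are patients who died by t.
- Controls are patients still under follow-up after t.
- The AUC is the fraction of case-control pairs in which the case has the higher risk score, with ties counting one half.

It simply drops patients censored before t. Estimators such as those in the R package timeROC instead re-weight these patients by the inverse probability of censoring, so values from this sketch will differ from published ones. The data are hypothetical.

```python
def time_dependent_auc(times, events, scores, horizon):
    """Naive cumulative/dynamic AUC at `horizon` (censoring before t is ignored)."""
    cases = [s for ti, e, s in zip(times, events, scores) if ti <= horizon and e]
    controls = [s for ti, s in zip(times, scores) if ti > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at this horizon")
    concordant = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                concordant += 1.0
            elif c == k:
                concordant += 0.5  # tied scores count as half-concordant
    return concordant / (len(cases) * len(controls))


# Hypothetical follow-up (years), events, and model risk scores.
times = [0.5, 1.5, 2.0, 3.5, 4.0, 6.0, 7.5, 8.0]
events = [1, 1, 0, 1, 0, 1, 0, 0]
scores = [2.4, 2.1, 1.0, 1.9, 0.8, 1.2, 0.6, 0.9]
for horizon in (1, 3, 5):
    print(horizon, round(time_dependent_auc(times, events, scores, horizon), 3))
```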

Fig. 11

Construction of an immune cell-based prognostic model. (a) Forest plot of the multivariate Cox regression analysis. (b) Kaplan-Meier survival curves. (c) ROC curves.

Immune cell correlations and drug sensitivity analysis

Pearson correlation analysis was performed between immune cell content and the prognosis-related genes in the prognostic ceRNA sub-network (Fig. 12a). DUXAP8 was significantly positively correlated with activated mast cells (R = 0.31, P < 0.001, Fig. 12b) and monocytes (R = 0.33, P < 0.001, Fig. 12c). HCP5 was significantly negatively correlated with resting NK cells (R = -0.23, P = 0.0026, Fig. 12d). These modest but significant correlations suggest that immune cell infiltration is associated with the expression of prognosis-related genes, pointing to a potential role in thyroid cancer progression.
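Each R and P pair above comes from correlating a gene's expression with an immune cell fraction across samples. The sketch below computes Pearson's r and an approximate two-sided P value via Fisher's z-transform, z = atanh(r) * sqrt(n - 3). R's `cor.test` uses a t-based test instead, so P values can differ slightly. The paired values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_p(r, n):
    """Approximate two-sided P value for H0: rho = 0 (Fisher z-transform)."""
    if n < 4:
        raise ValueError("need at least 4 samples")
    if abs(r) >= 1.0:
        return 0.0
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical paired values for one gene and one immune cell type.
expr = [5.1, 6.3, 4.8, 7.2, 6.9, 5.5, 8.0, 6.1]
cells = [0.02, 0.05, 0.01, 0.06, 0.04, 0.03, 0.08, 0.05]
r = pearson_r(expr, cells)
print(round(r, 3), fisher_p(r, len(expr)))
```

With hundreds of samples, even correlations of 0.2 to 0.3 reach small P values, which is why effect size and significance should be read together.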

To explore therapeutic implications, drug sensitivity analysis was conducted for thyroid cancer patients in the TCGA dataset, revealing significant differences in drug sensitivity between the high-risk and low-risk groups. For sorafenib, IC50 values were significantly lower in the high-risk group than in the low-risk group (P = 0.0039), indicating greater sensitivity among high-risk patients (Fig. S8). Conversely, for sunitinib, IC50 values were significantly lower in the low-risk group than in the high-risk group (P < 0.001), suggesting greater sensitivity in low-risk patients (Fig. S8). These findings indicate that patients in different prognostic risk groups may respond differently to targeted therapies. The prognostic risk assessment model could therefore help guide personalized treatment strategies and resistance prevention for thyroid cancer patients.

Fig. 12

Analysis of co-expression between immune cells and key RNAs of the ceRNA networks. (a) Co-expression heatmap. Significant correlations between immune cells and gene expression are marked with asterisks: *** P < 0.001, ** P < 0.01, * P < 0.05. The color gradient from red to blue indicates correlation from high to low. (b) Scatter plot of DUXAP8 expression versus activated mast cells. (c) Scatter plot of DUXAP8 expression versus monocytes. (d) Scatter plot of HCP5 expression versus resting NK cells. The horizontal axis represents gene expression and the vertical axis represents immune cell content. Each point represents a sample, and the line indicates the linear relationship between the two.

Discussion

The pathogenesis of thyroid cancer is complex, and its causative factors and the reasons for its high incidence remain unknown. Tumor screening has traditionally relied on radiography, endoscopy, and ultrasonography to locate the tumor. However, these methods are time-consuming and costly when applied in the early stages of tumor screening. Mining large-sample data at the molecular genetic level is an equally important approach to tumor screening, and molecular markers can improve early diagnosis and prognostic assessment for patients19. In addition to genetic markers, studies suggest that immune cell infiltration in the tumor microenvironment significantly affects the immune response and patient prognosis. Targeting complex ceRNA regulatory networks may help silence cancer-promoting pathways at the molecular level20. Immune cells form the cellular basis of immunotherapy, and immune infiltration may provide new strategies for cancer treatment21. However, few studies have comprehensively analyzed both the RNA molecules in ceRNA networks and immune cell infiltration.

In this study, differential ceRNA regulatory networks were constructed using large-scale gene expression data from thyroid tissues in the TCGA, GTEx, and GEO databases. Differential expression analysis and WGCNA identified mRNAs, lncRNAs, and miRNAs associated with thyroid carcinogenesis. The ceRNA networks contained 32 lncRNAs (16 up-regulated and 16 down-regulated), 18 miRNAs (11 up-regulated and 7 down-regulated), and 56 mRNAs (43 up-regulated and 13 down-regulated). Clinical data from TCGA were analyzed to identify mRNAs affecting overall survival, yielding a prognostic ceRNA sub-network of 21 RNAs (3 mRNAs, 4 miRNAs, and 14 lncRNAs). In this sub-network, 3 mRNAs and 11 lncRNAs, including CTSH, GALNT7, GAS5, and CDKN2B-AS1, were positively correlated with tumor stage, whereas hsa-miR-363-3p and hsa-miR-20b-5p were down-regulated in later clinical stages. Several genes in the prognostic ceRNA sub-network have been studied experimentally. For instance, GALNT7 acts as a ceRNA targeting the EGFR/PI3K pathway with hsa-miR-30b-5p, inhibiting papillary thyroid carcinoma progression22. GAS5 sponges hsa-miR-221-3p; when highly expressed, it induces cell cycle arrest and inhibits follicular thyroid cancer cell proliferation23. TNRC6C-AS1 acts as a ceRNA for hsa-miR-129-5p and regulates UNC5B in thyroid cancer, affecting cell proliferation, migration, and invasion24. High ZFAS1 expression is associated with papillary thyroid cancer progression through the hsa-miR-590-3p/HMGA2 regulatory axis25. hsa-miR-363-3p inhibits thyroid cancer development by targeting NOB1, and hsa-miR-20b inhibits the MAPK/ERK signaling pathway by targeting SOS1 and ERK226,27. These studies highlight the critical impact of abnormal gene expression on thyroid cancer cell proliferation, invasion, and migration, and they illustrate the complexity of ceRNA regulation.

Genes in the ceRNA regulatory network were analyzed using univariate and multivariate Cox regression, and a prognostic risk assessment model comprising PRR15 (mRNA), HCP5 (lncRNA), and DUXAP8 (lncRNA) was developed. DUXAP8 was identified as a risk factor for patient survival, with high expression significantly associated with poor prognosis. Regression analysis incorporating age, gender, and clinical stage showed that the risk score could serve as an independent prognostic factor for thyroid cancer. Previous studies identified PRR15 as a key gene in breast cancer development28, although its role in cancer prognosis has not been extensively explored. HCP5, when highly expressed in thyroid cancer tissues, promotes cancer cell proliferation, migration, invasion, and angiogenesis29. Abnormal HCP5 expression is linked to overall survival in melanoma and ovarian cancer patients30. High DUXAP8 expression is significantly associated with poor prognosis in hepatocellular and pancreatic cancer patients, consistent with the present findings31. However, the prognostic risk assessment model requires validation in large-scale clinical trials. qPCR confirmed elevated PRR15, HCP5, and DUXAP8 expression in thyroid cancer cell lines (TPC-1, B-CPAP) compared with normal thyroid cells (Nthy-ori 3-1), supporting their involvement in thyroid cancer. Although PRR15 and HCP5 were protective in the TCGA data, their increased expression in cell lines may reflect context-dependent functions within the tumor microenvironment. The elevated DUXAP8 expression in cancer cells is consistent with its role as a risk gene. Future studies should investigate the specific mechanisms by which these genes contribute to thyroid cancer pathogenesis.

This study also analyzed immune cell infiltration in thyroid tissue samples. Previous studies have shown that monocytes and M0 macrophages promote cancer development32. Mast cells enhance cancer progression by releasing cytokines and modifying the interstitial space of thyroid cancer33. The role of NK cells in tumors, however, is less well documented and requires further research15. Correlation analysis between prognosis-related immune cells and prognosis-related genes in the ceRNA network showed that DUXAP8 was positively correlated with activated mast cells and monocytes, whereas HCP5 was negatively correlated with resting NK cells. This analysis revealed significant associations involving four regulatory axes: DUXAP8 (lncRNA)-hsa-miR-20b-PRR15, DUXAP8-hsa-miR-363-3p-GALNT7, HCP5 (lncRNA)-hsa-miR-20b-PRR15, and HCP5-hsa-miR-138-5p-CTSH. These associations suggest that ceRNA networks and immune cell-related mechanisms may act together in thyroid cancer prognosis. Few studies have experimentally validated the potential co-expression of ceRNAs and immune cells in thyroid cancer.

Compared with established prognostic tools such as TNM staging and BRAF mutation status, our prognostic model, which integrates the ceRNA components PRR15, HCP5, and DUXAP8, may offer complementary predictive information. The TNM system categorizes patients primarily by tumor size and metastasis and often lacks specificity for individual prognosis. Our model, with AUC values between 0.681 and 0.803, may help stratify patients further by incorporating molecular and immune profiles. Additionally, while BRAF mutations are significant predictors of aggressive disease, our model's focus on ceRNA dynamics provides new insights into the role of the tumor microenvironment in thyroid cancer progression.

The degree of immune cell infiltration is regulated by the intrinsic signaling pathways of tumor cells and by gene regulatory networks. Immunotherapy acts against tumor cells by modulating the activity of immune molecules, and identifying effective immune-related markers for individualized therapy offers a new approach to developing tumor prevention programs. Previous studies have found that an lncRNA regulates CXCL9 expression by targeting miR-1269a in patients with gastric cancer, and this CXCL9 regulatory axis is positively correlated with molecules related to T cell and NK cell activation and function34. A prognostic risk assessment model constructed from ceRNA regulatory networks associated with chromosomal instability in lung adenoma showed that lower-risk patients had greater infiltration of B cells, CD4+ T cells, CD8+ T cells, neutrophils, macrophages, and myeloid dendritic cells35. In a related review of thyroid cancer, BRAF V600E was shown to reduce the expression of TSP-1 in undifferentiated thyroid cancer cells, which is associated with enhanced proliferation, collagen adhesion, migration, and invasion of thyroid cancer cells. Activated immune cells in the thyroid cancer microenvironment help stabilize newly formed blood vessels by secreting PDGFRβ, VEGF, and other factors36. Other studies have shown that PDGF-BB, considered the most potent lymphangiogenic factor, promotes lymphatic metastasis in PTC through stronger binding to PDGF receptor α. PDGF-BB is overexpressed in many cancer types of epithelial origin and affects cancer-related angiogenesis by up-regulating VEGF-A expression37. In co-expression analyses of ceRNAs and immune cells in other cancers, HCP5 was associated with CD4+ T cells, CD8+ T cells, neutrophils, macrophages, B cells, and dendritic cells in liver cancer38, and DUXAP8 was positively correlated with M2 macrophages and negatively correlated with activated mast cells, CD8+ T cells, and follicular helper T cells39. However, such associations have rarely been reported in studies of thyroid cancer invasiveness. Genes and their regulatory axes in the thyroid cancer ceRNA regulatory network may act synergistically with immune cells to influence tumor development and patient prognosis.

Several limitations of this study must be acknowledged. First, the samples obtained from public databases were limited in number and had incomplete clinicopathological parameters. Second, the ceRNA prognostic risk assessment model constructed in this study was verified only with internal TCGA data; more clinical samples should be included in subsequent studies to confirm its accuracy. In addition, the key pathways through which ceRNA regulatory networks act on immune cells remain to be comprehensively explored. Owing to the uncertainty of biomarkers and the complexity of the experimental procedures, few studies have experimentally verified the potential co-expression relationships between ceRNA regulatory axes and immune cells in thyroid cancer, and these relationships need to be confirmed by rigorous biological experiments in future work.

To our knowledge, this is the first study to combine transcriptomic data from the TCGA, GTEx, and GEO databases on a large scale to analyze thyroid cancer and normal tissues. Including normal thyroid samples from GTEx increased the number of normal controls and improved the study's accuracy. Differential analysis using the large TCGA sample set, together with validation of mRNA expression levels in the GEO dataset, enhanced the reliability of the results. This study combined WGCNA results with differential expression analysis to identify lncRNA and mRNA modules associated with clinical features. Based on these findings, a ceRNA regulatory network of differentially expressed RNAs was constructed, and prognostic analysis was then used to develop a ceRNA sub-network and a prognostic risk assessment model. By integrating immune infiltration analysis, we hypothesized a potential association between the ceRNA network and immune cells, providing novel insights into thyroid cancer pathogenesis. By applying bioinformatics algorithms to analyze ceRNA regulation and immune infiltration in thyroid cancer, this study offers a better understanding of the roles of key genes and immune cells and may help inform personalized cancer treatment.

Methods

Data acquisition and differential expression analysis

Transcriptomic data, sample phenotype files, and gene annotation files were obtained from the UCSC Xena database (http://xena.ucsc.edu/). These included normal tissue samples from the GTEx database (https://www.gtexportal.org/) and thyroid tissue samples from the TCGA database (https://portal.gdc.cancer.gov/)40. Expression data for 14,833 lncRNAs and 19,706 mRNAs were obtained from 338 normal and 512 cancerous thyroid tissue samples. The GEO dataset GSE33630 (https://www.ncbi.nlm.nih.gov/geo/) provided expression profiles of 21,755 mRNAs in 60 cancerous and 45 normal thyroid tissues.

Clinical information on thyroid cancer patients, including age, gender, survival time, survival status, and tumor stage, was retrieved from the TCGA database together with miRNA high-throughput sequencing expression data. miRNA names were annotated using the miRBase database (http://mirbase.org/)41, yielding expression data for 2,190 miRNAs in 59 normal and 514 cancerous thyroid samples.

Thyroid tissue samples and gene expression data were collated and normalized in R (version 4.1). High-throughput sequencing data were analyzed with the “edgeR” package42, and microarray (chip) data were analyzed with the “limma” package43. lncRNAs, miRNAs, and mRNAs were considered significantly differentially expressed at |log2 fold change (FC)| > 1 and a false discovery rate (FDR)-adjusted P value < 0.05.
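The significance filter above can be sketched in Python. edgeR and limma compute the raw P values themselves; this minimal sketch assumes those P values and log2 fold changes are already available and only shows a Benjamini-Hochberg adjustment (the same procedure as R's `p.adjust(method = "BH")`, assumed here to be the FDR method) followed by the |log2FC| > 1 and FDR < 0.05 cut-offs. The gene names and numbers are hypothetical.

```python
from typing import Dict, List, Tuple


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone and <= 1.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(results: Dict[str, Tuple[float, float]],
                lfc_cutoff: float = 1.0,
                fdr_cutoff: float = 0.05) -> List[str]:
    """results maps gene -> (log2FC, raw P); keep |log2FC| > cutoff and FDR < cutoff."""
    genes = list(results)
    fdr = benjamini_hochberg([results[g][1] for g in genes])
    return [g for g, q in zip(genes, fdr)
            if abs(results[g][0]) > lfc_cutoff and q < fdr_cutoff]


# Hypothetical example: gene -> (log2FC, raw P value)
example = {
    "GENE_A": (2.5, 0.001),
    "GENE_B": (0.4, 0.0001),   # significant, but fold change too small
    "GENE_C": (-1.8, 0.01),
    "GENE_D": (3.0, 0.2),      # large fold change, but not significant
}
print(select_degs(example))  # ['GENE_A', 'GENE_C']
```

Both conditions must hold, so a gene with a tiny P value but a small fold change (GENE_B) is excluded, as is a gene with a large but non-significant change (GENE_D).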

Weighted gene co-expression analysis

Weighted gene co-expression network analysis (WGCNA) of lncRNAs and mRNAs in thyroid tissue was performed using the “WGCNA” package in R44. Gene and sample quality were assessed, and missing entries were checked. A soft-thresholding power was determined to construct an adjacency matrix, and connectivity was calculated to generate a topological overlap matrix. Based on co-expression similarity, genes were clustered into co-expression modules, each labeled by a color. Pearson correlation analysis was then performed to evaluate the relationship between the modules and thyroid tissue features, and the genes in modules highly correlated with sample features were extracted. The lncRNAs and mRNAs identified by both WGCNA and differential expression analysis were included in the subsequent analysis.
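The adjacency and topological overlap steps can be illustrated with a small pure-Python sketch. It assumes an unsigned network, in which adjacency is |Pearson r| raised to the soft-thresholding power β, and uses the standard topological overlap formula TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 − a_ij), where l_ij = Σ_u a_iu·a_uj and k_i is the connectivity of gene i. The power of 6 and the toy expression values are hypothetical, since the chosen power is not stated in the text; the WGCNA package itself additionally clusters genes on 1 − TOM and typically cuts the dendrogram dynamically to define modules.

```python
import math
from typing import List

Matrix = List[List[float]]


def pearson(x: List[float], y: List[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def unsigned_adjacency(expr: Matrix, power: int) -> Matrix:
    """expr has one row per gene (columns are samples); a_ij = |r_ij| ** power."""
    g = len(expr)
    adj = [[0.0] * g for _ in range(g)]  # diagonal kept at 0 for the sums below
    for i in range(g):
        for j in range(i + 1, g):
            adj[i][j] = adj[j][i] = abs(pearson(expr[i], expr[j])) ** power
    return adj


def topological_overlap(adj: Matrix) -> Matrix:
    g = len(adj)
    k = [sum(row) for row in adj]  # connectivity of each gene
    tom = [[1.0] * g for _ in range(g)]
    for i in range(g):
        for j in range(g):
            if i != j:
                # Neighbours shared by genes i and j, weighted by adjacency.
                shared = sum(adj[i][u] * adj[u][j] for u in range(g))
                tom[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
    return tom


# Hypothetical toy data: genes 0 and 1 are perfectly co-expressed.
expr = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 1, 3, 2]]
tom = topological_overlap(unsigned_adjacency(expr, power=6))
dissimilarity = [[1 - v for v in row] for row in tom]  # input to module clustering
```

Raising |r| to a power suppresses weak correlations (here |r| = 0.4 becomes about 0.004), which is the point of soft thresholding.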

Functional enrichment analysis and protein-protein interaction analysis

The overlapping mRNAs, defined as the intersection of differentially expressed mRNAs and module mRNAs identified by WGCNA, were subjected to functional enrichment analysis. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses were conducted using the “clusterProfiler” package45,46. Interactions between the proteins encoded by the overlapping mRNAs were retrieved from STRING (https://string-db.org/), retaining those with interaction scores greater than 0.7. The protein-protein interaction (PPI) network was visualized in Cytoscape (version 3.7). Densely connected regions of the network were identified with the MCODE plugin, and important hub proteins were identified with the CytoHubba plugin.
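Two calculations behind this step can be sketched briefly. Over-representation analysis of the kind offered by clusterProfiler rests on a one-sided hypergeometric test: given N background genes, K of them annotated to a term, and n query genes of which k carry the annotation, the P value is P(X ≥ k). The second function ranks proteins by degree after applying the 0.7 score cut-off; degree is only one of several ranking methods available in CytoHubba, and the text does not say which was used, so treat it as an illustrative assumption. All gene and protein names are hypothetical.

```python
from collections import Counter
from math import comb
from typing import Iterable, List, Set, Tuple


def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N background, K annotated, n drawn)."""
    upper = min(K, n)
    hits = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1))
    return hits / comb(N, n)


def term_enrichment_p(query: Set[str], term: Set[str], background: Set[str]) -> float:
    """One-sided over-representation P value of a term among the query genes."""
    query, term = query & background, term & background
    return hypergeom_upper_tail(len(query & term), len(background), len(term), len(query))


def hubs_by_degree(edges: Iterable[Tuple[str, str, float]],
                   min_score: float = 0.7, top: int = 3) -> List[str]:
    """edges are (protein_a, protein_b, combined score scaled to 0-1)."""
    degree: Counter = Counter()
    for a, b, score in edges:
        if score > min_score:  # keep only high-confidence interactions
            degree[a] += 1
            degree[b] += 1
    ranked = sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))
    return [protein for protein, _ in ranked[:top]]


# Hypothetical PPI edges; C-D falls below the 0.7 cut-off and is ignored.
edges = [("A", "B", 0.9), ("A", "C", 0.8), ("A", "D", 0.95),
         ("B", "C", 0.75), ("C", "D", 0.5)]
print(hubs_by_degree(edges, top=2))  # ['A', 'B']
```

In practice the per-term P values would then be multiple-testing adjusted, as in the differential expression step.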

Construction of the ceRNA network

Target relationships between miRNAs and mRNAs were independently predicted using TargetScanHuman (https://www.targetscan.org/vert_80/) and starBase v2.0 (https://starbase.sysu.edu.cn/), and their intersection was retained47. Target relationships between lncRNAs and miRNAs were compiled from starBase v2.0 and LncACTdb 3.0 (http://bio-bigdata.hrbmu.edu.cn/LncACTdb/)48. These relationships were then integrated to link mRNAs, miRNAs, and lncRNAs through shared miRNAs. According to the ceRNA hypothesis, miRNAs regulate mRNA expression by inhibiting translation, while coding and non-coding RNAs interact by competing for miRNA binding. Consequently, the expression of lncRNAs and mRNAs acting as ceRNAs should be positively correlated, while the expression of the miRNAs that bind them should be negatively correlated with both. Correlation analysis of RNA expression in the ceRNA network was performed to identify targeting relationships consistent with this hypothesis, and these relationships were imported into Cytoscape (version 3.7) to visualize the ceRNA networks.
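The correlation criterion of the ceRNA hypothesis can be expressed as a simple filter over candidate lncRNA-miRNA-mRNA triplets. This sketch assumes the triplets have already been assembled from the database predictions and keeps a triplet only when the lncRNA and mRNA are positively correlated and the miRNA is negatively correlated with both. The optional minimum |r| and all expression values are hypothetical; the text does not state a correlation threshold, and a real analysis would normally also require the correlations to be statistically significant.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def cerna_consistent(triplets: List[Tuple[str, str, str]],
                     expr: Dict[str, Sequence[float]],
                     min_abs_r: float = 0.0) -> List[Tuple[str, str, str]]:
    """Keep (lncRNA, miRNA, mRNA) triplets whose correlation signs fit the ceRNA hypothesis."""
    kept = []
    for lnc, mir, mrna in triplets:
        r_lnc_mrna = pearson(expr[lnc], expr[mrna])   # ceRNAs should rise together
        r_mir_lnc = pearson(expr[mir], expr[lnc])     # miRNA should oppose the lncRNA
        r_mir_mrna = pearson(expr[mir], expr[mrna])   # ... and the mRNA
        if (r_lnc_mrna > min_abs_r and r_mir_lnc < -min_abs_r
                and r_mir_mrna < -min_abs_r):
            kept.append((lnc, mir, mrna))
    return kept


# Hypothetical expression values across five tumor samples.
expr = {
    "LNC1": [1, 2, 3, 4, 5],
    "MRNA1": [2, 3, 4, 5, 7],
    "MIR_DOWN": [5, 4, 3, 2, 1],   # anti-correlated with both ceRNAs
    "MIR_UP": [1, 2, 3, 4, 6],     # positively correlated: violates the hypothesis
}
triplets = [("LNC1", "MIR_DOWN", "MRNA1"), ("LNC1", "MIR_UP", "MRNA1")]
print(cerna_consistent(triplets, expr))  # [('LNC1', 'MIR_DOWN', 'MRNA1')]
```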

Survival analysis

Survival data from thyroid cancer patients were compiled, retaining samples with both gene expression data and survival information. Univariate Cox regression was used to identify genes significantly associated with survival, and survival curves were analyzed using the Kaplan-Meier method. X-tile software (version 3.6.1) was used to determine the optimal cut-off value of the risk score; each sample was then assigned to a high-risk or low-risk group according to this cut-off to verify the prognostic model. All samples were also randomly divided into two test groups at a ratio of 7:3.
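A minimal Kaplan-Meier (product-limit) estimator and a reproducible 7:3 random split are sketched below, assuming event = 1 for death and 0 for censoring. The study relied on R packages and X-tile, so this is only an illustration of the underlying calculation; the seed and the toy survival data are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, S(t)) at each distinct event time; events: 1 = death, 0 = censored."""
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, removed = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk  # conditional survival past time t
            curve.append((t, surv))
        at_risk -= removed  # deaths and censored cases both leave the risk set
    return curve


def split_70_30(sample_ids: Sequence[str], seed: int = 1) -> Tuple[List[str], List[str]]:
    """Shuffle with a fixed seed and split into ~70% and ~30% groups."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    cut = round(0.7 * len(ids))
    return ids[:cut], ids[cut:]


print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
# approximately [(1, 0.8), (2, 0.6), (3, 0.3)]
```

Censored patients contribute to the risk set up to their last follow-up, which is why the drop at time 2 is 1/4 rather than 2/4.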

Construction of the prognostic risk model

Survival data were compiled, excluding samples with missing information or unknown tumor pathological stage. Univariate Cox proportional hazards regression analysis was used to identify genes in the ceRNA regulatory network associated with overall survival. Statistically significant genes were then selected for multivariate Cox proportional hazards regression analysis. Patients were divided into high-risk and low-risk groups based on the median risk score, and survival curves were plotted to compare these groups. The equation for the prognostic risk score model is:

$$\mathrm{Risk\ Score}=\sum_{i=1}^{n}\beta(i)\times\mathrm{gene\ expression}(i)$$
(1)

where i indexes the n genes in the model, β is the multivariate Cox regression coefficient of gene i, and gene expression(i) is the expression level of gene i. This process was performed using the “survival” and “survminer” packages. The performance of the prognostic model was assessed with the “timeROC” package, using the area under the time-dependent ROC curve (AUC) to evaluate predictive power.
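Equation (1) and the median split can be sketched as follows. The coefficients are hypothetical placeholders (negative for the two genes described as protective in the Discussion, positive for DUXAP8); the fitted Cox coefficients are not reported in this section. Samples lying exactly at the median are assigned to the low-risk group, which is also an assumption.

```python
from statistics import median
from typing import Dict

# Hypothetical Cox coefficients; NOT the values fitted in the study.
COEFS = {"PRR15": -0.5, "HCP5": -0.3, "DUXAP8": 0.8}


def risk_score(expression: Dict[str, float], coefs: Dict[str, float] = COEFS) -> float:
    """Equation (1): sum over model genes of beta(i) * expression(i)."""
    return sum(beta * expression[gene] for gene, beta in coefs.items())


def median_split(scores: Dict[str, float]) -> Dict[str, str]:
    """Label samples above the median risk score 'high', the rest 'low'."""
    cut = median(scores.values())
    return {s: ("high" if v > cut else "low") for s, v in scores.items()}


samples = {  # hypothetical normalized expression per sample
    "S1": {"PRR15": 1.0, "HCP5": 1.0, "DUXAP8": 2.0},
    "S2": {"PRR15": 2.0, "HCP5": 2.0, "DUXAP8": 0.5},
    "S3": {"PRR15": 0.5, "HCP5": 0.5, "DUXAP8": 1.0},
    "S4": {"PRR15": 1.0, "HCP5": 0.0, "DUXAP8": 0.0},
}
scores = {s: risk_score(e) for s, e in samples.items()}
groups = median_split(scores)  # S1, S3 -> "high"; S2, S4 -> "low"
```

Because protective genes carry negative coefficients, high PRR15 or HCP5 expression lowers the score, while high DUXAP8 expression raises it.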

RT-qPCR

TPC-1, B-CPAP, and Nthy-ori 3-1 cells were cultured in RPMI 1640 medium supplemented with 10% FBS and passaged upon reaching approximately 70% confluency. After trypsinization and resuspension, cells were seeded in T25 flasks and grown to 80% confluency before RNA extraction. Total RNA was extracted using TriQuick Reagent (Solarbio, R1100) and used as the template for cDNA synthesis with the HiScript II kit (Vazyme, R223-01). Quantitative PCR (qPCR) was performed using primers synthesized by Azenta, with GAPDH as the internal reference, and relative gene expression was calculated using the 2^(−ΔΔCt) method. The primer sequences were as follows: PRR15 sense (5′-CAGGTGCTGGAGTTCAAGTG-3′) and antisense (5′-TCCACAGGGTCCTTGTTGAT-3′); HCP5 sense (5′-CAGCTACGTGCTACCTGAAG-3′) and antisense (5′-GTCCATGTCGATGGTCTTG-3′); DUXAP8 sense (5′-CAGGTGCTCAAGGTCTCTGA-3′) and antisense (5′-TCCAGGTACGGTCAAGATCC-3′).
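The 2^(−ΔΔCt) calculation can be sketched as below, assuming GAPDH as the reference gene (as stated) and the normal Nthy-ori 3-1 line as the calibrator. The Ct values are hypothetical technical replicates, not measurements from this study.

```python
from statistics import mean
from typing import Sequence


def fold_change_ddct(target_sample: Sequence[float], ref_sample: Sequence[float],
                     target_calibrator: Sequence[float], ref_calibrator: Sequence[float]) -> float:
    """Relative expression 2^(-ddCt) from replicate Ct values (Livak method)."""
    d_ct_sample = mean(target_sample) - mean(ref_sample)              # normalize to GAPDH
    d_ct_calibrator = mean(target_calibrator) - mean(ref_calibrator)  # same for normal cells
    dd_ct = d_ct_sample - d_ct_calibrator
    return 2 ** (-dd_ct)


# Hypothetical Ct values: target gene vs GAPDH in a cancer line and in Nthy-ori 3-1.
fc = fold_change_ddct([22.0, 22.2, 21.8], [18.0, 18.1, 17.9],
                      [25.0, 25.1, 24.9], [18.0, 17.9, 18.1])
print(fc)  # 8.0 (target about 8-fold higher in the cancer line)
```

A ΔΔCt of −3 means the target crossed the threshold three cycles earlier after normalization, i.e. roughly 2^3 = 8 times more template, assuming near-100% amplification efficiency.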

Analysis of immune cell infiltration

CIBERSORT is a machine learning algorithm that uses ν-support vector regression to deconvolve bulk gene expression profiles against reference transcriptional signatures of immune cells49, thereby estimating the abundance of immune cell types in each tissue sample. The algorithm was applied to gene expression profiles from 850 thyroid tissue samples with the number of permutations set to 1000, and only samples with a CIBERSORT P < 0.05 were retained for further analysis. The Wilcoxon rank-sum test was used to compare immune cell abundance between normal and cancerous thyroid tissues, with P < 0.05 considered statistically significant.
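The group comparison in this step can be illustrated with a Wilcoxon rank-sum test. This pure-Python sketch uses the normal approximation with tie and continuity corrections, which corresponds to R's `wilcox.test(x, y, exact = FALSE)`; for small samples without ties R would by default compute an exact P value instead. The immune-cell fractions in the usage line are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_rank_sum(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided test; returns (W statistic as reported by R, P value)."""
    nx, ny = len(x), len(y)
    pooled = list(x) + list(y)
    n = nx + ny
    w = sum(average_ranks(pooled)[:nx]) - nx * (nx + 1) / 2
    ties = sum(t ** 3 - t for t in Counter(pooled).values())  # tie correction term
    sigma = math.sqrt(nx * ny / 12 * ((n + 1) - ties / (n * (n - 1))))
    if sigma == 0:  # every value identical: no evidence of a difference
        return w, 1.0
    diff = w - nx * ny / 2
    correction = math.copysign(0.5, diff) if diff != 0 else 0.0  # continuity correction
    z = (diff - correction) / sigma
    return w, math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail


# Hypothetical mast-cell fractions in normal vs tumor samples.
w, p = wilcoxon_rank_sum([0.01, 0.02, 0.03], [0.04, 0.05, 0.06])
```

With only three samples per group the P value is about 0.08, illustrating why group sizes matter for reaching P < 0.05.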

Statistical methods

The log-rank test was used for Kaplan-Meier survival analysis. Pearson correlation was applied to analyze the expression levels of mRNAs, lncRNAs, and miRNAs in cancer samples. The Wilcoxon rank-sum test was used to examine differences in gene expression levels at different clinical stages and to compare immune cell content in normal and cancerous thyroid tissues. All computational analyses were performed using R.
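For reference, the two-group log-rank statistic used with the Kaplan-Meier curves can be computed as below: at each event time the observed deaths in one group are compared with those expected under equal hazards, and the squared summed difference divided by its variance is referred to a chi-square distribution with 1 degree of freedom. This is a minimal sketch on hypothetical data; the study itself performed the test in R.

```python
import math
from typing import Sequence, Tuple


def logrank_test(times: Sequence[float], events: Sequence[int],
                 groups: Sequence[int]) -> Tuple[float, float]:
    """Two-group log-rank test. events: 1 = death; groups: 0/1 labels. Returns (chi2, P)."""
    o_minus_e, variance = 0.0, 0.0
    for t in sorted({tt for tt, e in zip(times, events) if e}):
        at_risk = [g for tt, g in zip(times, groups) if tt >= t]
        n, n1 = len(at_risk), sum(at_risk)  # total and group-1 patients at risk
        d = sum(1 for tt, e in zip(times, events) if tt == t and e)
        d1 = sum(1 for tt, e, g in zip(times, events, groups) if tt == t and e and g == 1)
        o_minus_e += d1 - d * n1 / n  # observed minus expected deaths in group 1
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df


# Hypothetical data: group 1 (first three patients) dies earlier.
chi2, p = logrank_test([1, 2, 3, 4, 5, 6], [1] * 6, [1, 1, 1, 0, 0, 0])
```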

Drug sensitivity analysis

Patients were categorized into high-risk and low-risk groups according to the prognostic risk scoring model. The “pRRophetic” R package was used to estimate the half-maximal inhibitory concentration (IC50) of chemotherapy drugs for each patient as a measure of drug sensitivity in the two groups. The Wilcoxon rank-sum test was used to compare IC50 values between the groups, with P < 0.05 considered statistically significant.

Conclusions

Using data from the TCGA, GTEx, and GEO databases together with WGCNA, we identified differentially expressed genes and constructed a specific ceRNA regulatory network for thyroid cancer. Survival analysis established a prognostic ceRNA sub-network consisting of 3 mRNAs, 4 miRNAs, and 14 lncRNAs, and prognostic models were developed for RNAs in this sub-network. An analysis incorporating immune cell infiltration revealed that DUXAP8 and its ceRNA axes were correlated with activated mast cells and monocytes, while HCP5 and its ceRNA axes were correlated with resting NK cells. This comprehensive analysis of prognosis-related genes in the ceRNA regulatory network, and of the impact of immune cell infiltration on thyroid cancer prognosis, provides valuable insights for studying the pathogenesis, diagnosis, and treatment of thyroid cancer.