Introduction

Papillary thyroid carcinoma (PTC) accounts for approximately 85% of differentiated thyroid tumors1,2. Although PTC generally carries a favorable prognosis, with a five-year survival rate exceeding 90%, outcomes worsen significantly in aggressive cases. Recurrence occurs in nearly 30% of patients, and recurrent disease can result in mortality rates of up to 60%3,4,5. Despite extensive research, the genetic basis of PTC remains incompletely understood, with only a limited number of gene mutations identified to date6.

The competitive endogenous RNA (ceRNA) hypothesis provides a framework to explain tumorigenesis, based on the observation that many cellular transcripts contain microRNA (miRNA) response element (MRE) sequences7,8. Within this system, ceRNA components—including miRNAs, pseudogenes, long noncoding RNAs (lncRNAs), circular RNAs (circRNAs), and mRNAs9,10—compete for binding to shared MREs, thereby regulating the expression of corresponding mRNAs. Recent studies have shown that pseudogenes, lncRNAs, and circRNAs contribute to cancer development through miRNA sponging mechanisms11,12,13,14,15. Thus, maintaining balance among ceRNA transcripts is essential, and their dysregulation can drive pathological processes, including oncogenesis. Moreover, transcriptional regulation of ceRNA network components by specific transcription factors (TFs) may further illuminate mechanisms of cancer progression.

However, there remains a lack of research integrating ceRNA and TF networks to identify key causal genes in PTC. To address this gap, we performed RNA-seq transcriptome analysis to identify overlapping differentially expressed genes (DEGs) across various PTC stages and subsequently highlighted hub genes for further validation. Next, we constructed ceRNA and DE-TF networks and validated candidate genes using RT-qPCR. Finally, we performed downstream analyses, including receiver operating characteristic (ROC) curve evaluation and survival analysis, to assess the diagnostic and prognostic significance of candidate genes within these networks. A schematic representation of the entire investigation is shown in Fig. 1.

Fig. 1

Schematic representation of multilayer gene analysis in papillary thyroid carcinoma (PTC).

Materials and methods

Expression data

RNA-seq expression data for thyroid cancer (THCA) were obtained from The Cancer Genome Atlas (TCGA) portal (https://portal.gdc.cancer.gov/). Data from the TCGA-THCA project were retrieved using the TCGAbiolinks package16, yielding 571 thyroid tissue samples, including both PTC and normal tissues. After excluding eight metastatic samples and two with ambiguous pathological staging, 561 samples remained, comprising 502 PTC tissues and 59 normal tissues. Among the PTC cohort, 283 cases were stage I, 52 stage II, 112 stage III, and 55 stage IV. Because circRNA expression data are not available in TCGA, the GSE171011 dataset was used to evaluate circRNA expression. However, the samples in GSE171011 were not staged, limiting the ability to assess circRNA expression across different PTC stages.

Data filtering and normalization

Data filtering was applied to remove low-quality samples, excluding those with total read counts below 50% of the median library size. The remaining data were processed using the edgeR package17. Raw counts were organized into a DGEList object, and lowly expressed genes were filtered out using the filterByExpr function to improve statistical reliability while preserving biological relevance. To account for differences in library composition, normalization was performed using the weighted trimmed mean of M-values (TMM) method. The TMM-normalized data were then prepared for linear modeling with the voom function, which transformed the counts into log₂-counts per million (logCPM) and assigned precision weights to each observation based on the mean–variance relationship of the RNA-seq data. This transformation addressed non-constant variance and produced a matrix of normalized, weighted logCPM values that was subsequently used for differential expression analysis18.
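The sample filter and the log-CPM transformation described above can be illustrated with a minimal sketch. It is not the edgeR/voom implementation: TMM factors are taken as given (defaulting to 1.0), precision weights are omitted, and the function names and toy counts are assumptions introduced here. The 0.5 and 1 offsets follow the log-CPM definition used by voom.

```python
import math
from statistics import median

def filter_low_depth_samples(counts, min_fraction=0.5):
    """Drop samples whose library size is below min_fraction of the median.

    counts maps a sample name to its list of raw gene counts.
    """
    lib_sizes = {sample: sum(values) for sample, values in counts.items()}
    cutoff = min_fraction * median(lib_sizes.values())
    return {s: v for s, v in counts.items() if lib_sizes[s] >= cutoff}

def voom_log_cpm(sample_counts, norm_factor=1.0):
    """log2 counts per million with voom-style offsets.

    norm_factor stands in for a TMM factor that would come from edgeR.
    """
    effective_lib = sum(sample_counts) * norm_factor
    return [math.log2((c + 0.5) / (effective_lib + 1.0) * 1e6)
            for c in sample_counts]

# Toy data (hypothetical): S3 has a very small library and is removed.
toy = {"S1": [100, 300, 600], "S2": [90, 310, 600], "S3": [10, 20, 30]}
kept = filter_low_depth_samples(toy)
log_cpm = {s: voom_log_cpm(v) for s, v in kept.items()}
```

In the actual pipeline, filterByExpr additionally removes lowly expressed genes before normalization; that step is not reproduced in this sketch.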

Identifying DEGs across all stages of PTC

Normalized expression data were analyzed for differential expression using the Limma package19. Multiple testing correction was performed using the Benjamini–Hochberg procedure to control the false discovery rate (FDR). To identify stage-specific DEGs, contrast matrices were generated, and each PTC stage (I, II, III, and IV) was compared individually against normal thyroid tissue. Protein-coding DEGs (DEmRNAs) were defined as transcripts with a log₂ fold change (logFC) greater than 1.0 and an FDR < 0.05. Noncoding RNAs—including miRNAs, lncRNAs, and pseudogenes—were designated as DEmiRs, DElncs, and DEpseudogenes, respectively, and filtered using an FDR < 0.05 irrespective of logFC. After identifying stage-specific RNAs, intersections across the four stages were examined to identify overlapping molecules. Using comparable approaches, consistently deregulated noncoding RNAs (DElncs, DEmiRs, and DEpseudogenes) shared among all PTC stages were determined. Differentially expressed circRNAs (DEcircs) were defined as transcripts with logFC > 1.0 and FDR < 0.05.
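As an illustration of the selection logic (not the limma workflow itself), the following sketch applies a Benjamini–Hochberg adjustment, filters by the stated cutoffs, and intersects the stage-specific sets. Gene names, P values, and the two-stage toy input are hypothetical; the study compared four stages. The logFC filter is applied literally as "greater than 1.0", as written above.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted P values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

def select_degs(results, logfc_cutoff=1.0, fdr_cutoff=0.05, use_logfc=True):
    """results maps gene -> (logFC, raw P value).

    Set use_logfc=False for the noncoding classes, which were filtered
    on FDR alone.
    """
    genes = list(results)
    fdr = benjamini_hochberg([results[g][1] for g in genes])
    return {g for g, q in zip(genes, fdr)
            if q < fdr_cutoff and (not use_logfc or results[g][0] > logfc_cutoff)}

# Hypothetical per-stage results (tumor stage versus normal tissue).
stage_results = {
    "I": {"GENE_A": (2.1, 0.001), "GENE_B": (1.5, 0.002),
          "GENE_C": (0.3, 0.001), "GENE_D": (1.8, 0.9)},
    "II": {"GENE_A": (1.2, 0.003), "GENE_B": (0.8, 0.001),
           "GENE_C": (1.1, 0.5), "GENE_D": (2.0, 0.002)},
}
per_stage = {stage: select_degs(res) for stage, res in stage_results.items()}
shared = set.intersection(*per_stage.values())
```

With these toy values only GENE_A passes both cutoffs in every stage, so it is the sole member of the shared set.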

Selection of hub genes via network analysis

DEmRNAs were used to construct a protein–protein interaction (PPI) network with a minimum interaction confidence score of 0.4 (ref. 20). Important nodes were prioritized using complementary centrality measures spanning local, semilocal, and global topology. Maximal clique centrality (MCC) was applied as a local measure given its strong performance in detecting biologically meaningful hubs21. The Integrated Value of Influence (IVI) served as an integrative index, computed with the Influential R package, to capture multiple topological dimensions by combining key local (degree centrality, ClusterRank), semilocal (neighborhood connectivity, local H-index), and global (betweenness centrality, collective influence) metrics in a bias-mitigating manner22. Candidate genes were defined as the overlap between the top 10% ranked by MCC and the top 10% ranked by IVI.

To detect densely connected modules, the MCODE plugin in Cytoscape23 was used, and the highest-scoring cluster was selected for downstream analysis. Hub genes were defined at the intersection of the MCODE-derived cluster and the candidate gene set.
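The two-step set logic for hub gene selection can be sketched as follows. The scores and gene labels are toy values, the MCC and IVI scores are assumed to have been computed elsewhere, and rounding the 10% cutoff up is an assumption made here rather than a detail given in the text.

```python
import math

def top_fraction(scores, fraction=0.10):
    """Return the genes ranked in the top `fraction` by score."""
    n_top = max(1, math.ceil(fraction * len(scores)))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:n_top])

# Hypothetical centrality scores for 20 nodes of a PPI network.
mcc = {f"G{i}": float(i) for i in range(20)}
ivi = {f"G{i}": float(i) for i in range(20)}
ivi["G18"] = 0.5  # G18 ranks high by MCC but low by IVI

# Step 1: candidate genes = overlap of the two top-10% lists.
candidates = top_fraction(mcc) & top_fraction(ivi)

# Step 2: hub genes = candidates that also fall in the top MCODE cluster.
mcode_cluster_1 = {"G19", "G18", "G3"}  # hypothetical cluster membership
hubs = candidates & mcode_cluster_1
```

Here G18 is dropped at step 1 because it is not in the IVI top 10%, which shows why the overlap can be smaller than either ranked list.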

Verification of hub genes by RNA-seq and microarray data analysis

Gene Expression Profiling Interactive Analysis (GEPIA) was used to validate the expression levels of PTC hub genes, leveraging RNA-seq data from the TCGA-THCA project24. To increase the number of normal tissue samples to 337, additional normal tissue data were incorporated from the Genotype-Tissue Expression (GTEx) portal (https://www.gtexportal.org). Differential expression analysis was performed using the Limma package in a two-step process. First, a single-dataset analysis compared PTC tumors with normal tissues, both obtained from the TCGA-THCA database. Second, a combined analysis compared PTC tissues from TCGA-THCA with pooled normal tissues from both the TCGA and GTEx databases. Genes with a logFC > 1.0 and P < 0.05 were considered GEPIA-validated. Additional validation was conducted using microarray data from the Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov). Datasets involving treatments or therapies were excluded. Seven datasets were included for analysis: GSE3678, GSE9115, GSE27155, GSE33630, GSE53157, GSE60542, and GSE138198 (Table S1). Data were normalized using log2 transformation and quantile normalization. Hierarchical clustering and principal component analysis (PCA) were performed to identify outliers. Metadata from individual datasets were integrated to create a unified microarray dataset, with batch effects corrected using the SVA25 and Limma packages. DEGs between PTC and normal tissues were identified using Limma (FDR < 0.05). Hub genes were defined as those overlapping between the DEGs from microarray analysis and the genes validated via GEPIA.

Assessment of confounding factors

The recount package was employed to assess potential confounding effects of sex and age on hub gene expression levels26. Following integration of TCGA and GTEx data, batch effects were identified and corrected using the SVA package, thereby enhancing analytic precision across cohorts. To evaluate the influence of sex, four comparisons were performed: i) PTC tissues from female patients versus corresponding normal female tissues, ii) PTC tissues from male patients versus corresponding normal male tissues, iii) all PTC tissues versus all normal tissues, irrespective of sex, and iv) direct comparison of hub gene expression between male and female subjects. For the age-related analysis, hub gene expression was compared between two age groups. The median ages of PTC patients in the TCGA/GTEx dataset and in the fresh tissue samples were 46 and 46.5 years, respectively. Accordingly, PTC tissues were divided into two groups: patients aged 46 years or older and those younger than 46 years.

Identification of ceRNA network components

In addition to hub genes, regulatory molecules targeting these genes (miRNAs, lncRNAs, pseudogenes, and circRNAs) were identified. Four databases (miRDB27, RNAInter28, miRTarBase29, and miRWalk30) were used to predict miRNA targets. lncRNAs and pseudogenes were identified using the RNAInter, StarBase31, and miRcode32 databases. MiRNA selection followed two criteria: first, a hub gene had to be targeted by miRNA(s) present in at least two of the databases; second, the identified miRNAs were required to overlap with DEmiRs. MiRNAs meeting both criteria were considered common miRs and used to predict the corresponding lncRNAs, pseudogenes, and circRNAs. Common lncRNAs were defined as those overlapping between predicted lncRNAs and DElncs, while common pseudogenes were those overlapping between predicted pseudogenes and DEpseudogenes. To identify DEcircs potentially interacting with the common miRs, the IntaRNA tool33 was employed to assess circRNA–miRNA interactions. CircRNAs with predicted interaction energies below −7 kcal/mol were incorporated into the ceRNA network.
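The two miRNA criteria and the circRNA energy cutoff amount to simple counting and set operations, sketched here. Apart from miR-3671, all identifiers and energies are hypothetical placeholders, and the database contents are toy data rather than real query results.

```python
from collections import Counter

def consensus_mirnas(predictions, min_databases=2):
    """miRNAs predicted for a hub gene by at least `min_databases` sources."""
    counts = Counter(m for mirs in predictions.values() for m in set(mirs))
    return {m for m, n in counts.items() if n >= min_databases}

# Toy predictions for one hub gene (hypothetical contents).
predictions = {
    "miRDB": {"miR-3671", "miR-toy-1"},
    "RNAInter": {"miR-3671", "miR-toy-2"},
    "miRTarBase": {"miR-toy-2"},
    "miRWalk": set(),
}
de_mirs = {"miR-3671", "miR-toy-1"}  # hypothetical DEmiR list

# Criterion 1 (>= 2 databases) and criterion 2 (overlap with DEmiRs).
common_mirs = consensus_mirnas(predictions) & de_mirs

# circRNA filter: keep interactions with energy below -7 kcal/mol.
circ_energies = {"circ_toy_1": -13.5, "circ_toy_2": -6.9, "circ_toy_3": -7.0}
retained_circs = {c for c, e in circ_energies.items() if e < -7.0}
```

Note that a strict "below" comparison excludes an energy of exactly −7 kcal/mol; whether the original analysis treated the boundary this way is not stated.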

Coexpression analysis and construction of a ceRNA network

Normalized expression values of genes within the ceRNA network were used to construct a coexpression network. Data analysis was performed using the Hmisc package34, while the corrplot package35 was employed to generate graphical representations of the correlation matrix. Visualization of the ceRNA network, including hub genes, common miRs, common lncRNAs, common pseudogenes, and DEcircRNAs, was carried out using Cytoscape software.

Pan-cancer analysis of the ceRNA network

To assess the expression levels of ceRNA network components across various cancer types, the TIMER 2.0 resource was utilized36. Statistical significance was determined using the Wilcoxon test, with significance levels indicated by asterisks as follows: *P < 0.05; **P < 0.01; ***P < 0.001.

DE-TF network construction

The Open Access Repository of Transcriptional Interactions (ORTI) was used to identify experimentally validated transcription factors (TFs)37. Two separate networks were constructed: one involving TFs associated with hub genes and another related to ceRNA network genes. A differentially expressed TF (DE-TF) network was created by identifying components of the TF networks that overlapped with DEmRNAs. Visualization and identification of interactions within the DE-TF network were facilitated using the ggalluvial package.

Interstage analysis of the ceRNA/DE-TF network components

RNA-seq data from TCGA and GTEx were analyzed using GEPIA. The expression levels of genes within the ceRNA and DE-TF networks were evaluated across the four clinical stages of PTC. Statistical significance was defined as a P value of the F statistic (Pr(> F)) less than 0.05.

Development of a prognostic model for the components of the ceRNA and DE-TF networks

To develop a robust prognostic model and mitigate the risk of overfitting, a comprehensive internal validation was performed using a stratified tenfold cross-validation (CV) approach38,39. To this end, the patient cohort was partitioned into ten folds, and the analytical pipeline was iterated ten times, each time training on nine folds and testing on the held-out fold. The pipeline utilized univariable Cox regression analysis to identify genes with P values below 0.05. Further evaluation of significant genes was conducted using the least absolute shrinkage and selection operator (LASSO) model, which identifies the most stable predictors. LASSO applies a penalty that improves model accuracy by shrinking some coefficients to zero, thereby eliminating non-significant predictors and resulting in a model that retains only a subset of predictors with nonzero coefficients. The optimal penalty parameter (λ) was determined through tenfold cross-validation, based on the minimum partial likelihood deviance. LASSO analysis was performed using the glmnet package. The key genes identified from the LASSO model were then used to construct the risk assessment model, with risk scores calculated by formula (1):

$$\text{Risk score} = \sum_{k = 1}^{m} \beta_{k} \times E_{k} \tag{1}$$

In this context, m represents the total number of genes identified by the LASSO model, β_k denotes the risk coefficient of gene k, and E_k indicates the expression level of gene k in the risk model. Patients were then stratified into high-risk and low-risk groups using the median risk score as the cutoff. Kaplan‒Meier survival analysis was performed via the survminer package40 to compare the survival rates between these groups, and the log-rank test was used for statistical evaluation. A time-dependent ROC analysis was conducted to assess the accuracy of the risk model predictions. The area under the curve (AUC) values were computed at 12, 36, and 60 months following PTC diagnosis using the timeROC package41. The aggregated out-of-sample predictions were used to provide an unbiased assessment of the model's performance and confirm the signature's stability. After addressing the risk of overfitting, the entire patient cohort was utilized to develop a gene prognostic signature.
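Formula (1) and the median split can be illustrated with a short sketch. The gene names, coefficients, and expression values are hypothetical, and assigning scores strictly above the median to the high-risk group is an assumption, since the handling of ties at the cutoff is not specified.

```python
from statistics import median

def risk_score(expression, coefficients):
    """Sum of beta_k * E_k over the genes retained by the LASSO model."""
    return sum(beta * expression[gene] for gene, beta in coefficients.items())

def stratify_by_median(scores):
    """Label each patient high or low risk relative to the median score."""
    cutoff = median(scores.values())
    return {p: ("high" if s > cutoff else "low") for p, s in scores.items()}

# Hypothetical LASSO coefficients and per-patient expression values.
coefficients = {"GENE_1": 0.5, "GENE_2": -0.25}
patients = {
    "P1": {"GENE_1": 4.0, "GENE_2": 2.0},
    "P2": {"GENE_1": 2.0, "GENE_2": 4.0},
    "P3": {"GENE_1": 6.0, "GENE_2": 0.0},
    "P4": {"GENE_1": 1.0, "GENE_2": 2.0},
}
scores = {p: risk_score(expr, coefficients) for p, expr in patients.items()}
groups = stratify_by_median(scores)
```

For the toy data the scores are 1.5, 0.0, 3.0, and 0.0, giving a median of 0.75, so P1 and P3 fall in the high-risk group.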

Identification of independent prognostic factors and the construction of a nomogram

Cox proportional hazards regression analysis was performed in two phases to identify independent prognostic factors for PTC. Initially, univariable Cox regression was conducted on a range of clinicopathological variables, including patient age, sex, TNM staging, tumor size, nodal invasion, and the SERN model. Variables with a P value less than 0.05 in the univariable analysis were subsequently included in the multivariable Cox regression model. A nomogram was then developed using the results of the multivariable Cox regression to estimate overall survival (OS) rates at 12, 36, and 60 months post-PTC diagnosis. The nomogram's performance was assessed by calibration curves generated from 1,000 bootstrap resamples. Discriminatory ability was further evaluated using time-dependent ROC curve analysis. All nomogram construction and calibration analyses were performed using the rms package in R.

Evaluation of the diagnostic performance of genes

ROC curve analysis was conducted to evaluate the diagnostic performance of components within the ceRNA and DE-TF networks using RNA-seq data. A robust tenfold CV strategy was implemented via the caret package to reliably estimate the classification performance of each gene and reduce the risk of model overfitting. For each gene, the RNA-seq dataset was randomly partitioned into 10 equal subsets; nine were used for training and one for validation in each iteration. This procedure was repeated so that every subset served once as the validation set, and results from all ten test folds were aggregated using the pROC package42. Genes with an AUC greater than 0.6 were selected as candidates. A pooled ROC curve was created, and AUC values with 95% confidence intervals (CIs) were calculated using the DeLong method. The optimal classification threshold was established by maximizing the Youden index (J = Sensitivity + Specificity − 1). Sensitivity and specificity metrics were also reported for each gene. To experimentally validate these findings, ROC curve analysis using RT-qPCR data from fresh tissue samples was performed for three selected ceRNA network components: GAS5, PKMYT1, and circMET. This validation cohort comprised paired samples independent from those used in the RNA-seq dataset. ΔCt values were used as predictors to assess each gene's discriminatory power between PTC and normal tissues. AUC values and their 95% CIs were estimated via the non-parametric trapezoidal rule and DeLong method, respectively. Due to the limited sample size in this cohort, cross-validation was not applied. All statistical analyses and graphical representations were carried out using GraphPad Prism (version 10.2.3.402; GraphPad Software, San Diego, CA, USA). Results with P values less than 0.05 were considered statistically significant.
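A minimal sketch of the trapezoidal AUC and the Youden-index threshold is given below. It does not reproduce pROC or GraphPad Prism, omits the DeLong confidence interval, and assumes that higher predictor values indicate tumor tissue; the labels and scores are toy values.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR, threshold) points; labels are 1 (tumor) or 0 (normal)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0, float("inf"))]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
        points.append((fp / n_neg, tp / n_pos, t))
    return points

def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0, _), (x1, y1, _) in zip(points, points[1:]))

def youden_optimal(points):
    """Threshold maximizing J = sensitivity + specificity - 1 = TPR - FPR."""
    fpr, tpr, threshold = max(points[1:], key=lambda p: p[1] - p[0])
    return threshold, tpr, 1.0 - fpr

# Toy predictor values (hypothetical): four tumors, four normals.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.35, 0.5, 0.4, 0.2, 0.1]
points = roc_points(labels, scores)
auc = auc_trapezoid(points)
threshold, sensitivity, specificity = youden_optimal(points)
```

For ΔCt predictors, where lower values correspond to higher expression, the direction of the comparison would need to be reversed for upregulated genes.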

Creation of a multi-gene diagnostic panel

A multi-gene panel was developed to assess whether combining multiple genes could improve diagnostic accuracy. This was achieved by augmenting the previously described tenfold CV framework with a nested feature selection process. Within each iteration, a backward stepwise logistic regression algorithm was applied to the training set (nine folds) to identify an optimal subset of predictive genes. The resulting multi-gene model was then used to generate predictions on the corresponding held-out test fold. The overall performance of the final multi-gene panel was determined by aggregating the out-of-sample predictions across all folds. The optimal diagnostic threshold was selected by maximizing the Youden index. Sensitivity and specificity at this cutoff were reported. AUC values with 95% CIs were calculated using the DeLong method. Statistical significance was set at P < 0.05.
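The fold structure underlying this procedure can be sketched as below. Only the partitioning is shown: the backward stepwise logistic regression fitted on each training set is not implemented, and stratifying the folds by class is an assumption made for this panel. The class sizes are those of the TCGA cohort reported earlier (502 PTC, 59 normal).

```python
import random

def stratified_folds(labels, k=10, seed=0):
    """Assign sample indices to k folds, balancing each class across folds."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds

def train_test_splits(folds):
    """Yield (train, test) index lists; each fold is held out exactly once."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

labels = ["PTC"] * 502 + ["normal"] * 59
folds = stratified_folds(labels)
splits = list(train_test_splits(folds))
```

Feature selection would be run inside each training set only, so that the held-out fold never influences which genes enter the model; pooling the ten sets of test-fold predictions then gives the out-of-sample ROC curve.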

Ethics statement

Informed consent was obtained from all participants and their legal guardians. All subjects were fully informed of the study's objectives, methods, funding sources, the institutional affiliations of the researchers, anticipated benefits, potential risks, and arrangements for post-study care. Participants were also notified of their right to withdraw from the study at any time. The study protocol was approved by the Iran National Committee for Ethics in Biomedical Research and the Ethics Committee of the Research Institute for Endocrine Sciences, Isfahan University of Medical Sciences, Isfahan, Iran (IR.UI.REC.1403.006). All experiments involving human subjects were conducted in compliance with the principles of the Declaration of Helsinki (7th edition) and its subsequent amendments.

Tissue acquisition and pathological affirmation

Thirty paired samples were collected from patients undergoing thyroidectomy at Sina Hospital, Isfahan Province, Iran. Samples were strategically selected to encompass a broad range of clinicopathological characteristics, thereby ensuring that the cohort represented the spectrum of disease observed in the broader patient population. The study included 16 low-aggressive PTC tissues (stage I), 14 high-aggressive PTC tissues (stages II, III, and IV), and 30 corresponding normal tissues adjacent to PTC tumors, providing a balanced representation of low- and high-aggressive PTC cases and effectively capturing clinical heterogeneity. Inclusion criteria required a confirmed histopathological diagnosis of PTC, sufficient tissue for molecular analysis, and no history of chemotherapy or radiotherapy. Fresh tissue samples were immediately submerged in RNAlater RNA Stabilization Reagent (Qiagen, Hilden, Germany) according to the manufacturer's instructions. Samples were incubated overnight at 4 °C, then rinsed with normal saline to remove residual RNAlater, and stored at −70 °C until total RNA extraction.

Tissue pathology

Histopathological examination of the paraffin-embedded tissues was conducted by a specialist pathologist. Tumor staging was performed according to the 7th edition of the American Joint Committee on Cancer tumor-node-metastasis (TNM) staging system. Patient information and tumor pathological characteristics are detailed in Table S2.

Blood biochemistry

Venous blood samples were collected from PTC patients prior to surgery. Blood calcium levels were measured using a calcium diagnostic kit (Pars-Azmoun Company, Tehran, Iran) according to the manufacturer's protocol. Calcium concentrations were determined by colorimetric assay using a biochemical autoanalyzer (BT 1500, Biotecnica, Italy). Based on these measurements, PTC tissues were categorized into two groups: a low-calcium group (LCG) and a normal-calcium group (NCG), with classification criteria based on clinical reference ranges established by the American Association for Clinical Chemistry. Specifically, NCG was defined as blood calcium levels between 8.5 and 10.5 mg/dL, while LCG was defined as levels below 8.5 mg/dL. To ensure accuracy and reproducibility, each sample was analyzed in duplicate, and internal quality control samples were included in each assay run. The coefficient of variation for the calcium assay was maintained below 3%. Comparative analysis of mean expression levels for GAS5, PKMYT1, and circMET genes was performed between the LCG/NCG PTC tissues and their matched adjacent normal tissues using Student's t-test. Additional analyses were conducted to compare gene expression between LCG and NCG PTC tissues.

RNA extraction and quality control

Total RNA was extracted from RNAlater-treated tissues using a one-step RNA extraction reagent (BIOBASIC, Canada) in accordance with the manufacturer's instructions. The integrity and purity of the extracted RNAs were evaluated via agarose gel electrophoresis and spectrophotometry. RNA fragmentation was assessed via gel electrophoresis. Protein and salt/phenol contaminations were identified by calculating the absorbance ratios of 260/280 and 260/230, respectively.

Primer design

Primers targeting mature miR-3671 were designed using the stem–loop methodology43. Divergent primers specific for circMET (hsa_circ_0082002) were generated with the CircInteractome web tool44, and their specificity was further assessed using circPrimer2 (ref. 45). Primers for GAS5, PKMYT1, SYMPK, and RNU6 were designed using Beacon Designer 8.1 (Premier Biosoft International, Palo Alto, CA, USA). All primers were subsequently evaluated with Oligo7 software (Molecular Biology Insights, Colorado Springs, USA) to minimize the potential for secondary structures, such as primer dimers, stem loops, and hairpins. Additionally, the NCBI Primer-BLAST tool was used to confirm the specificity of each primer at the genomic level. RNU6 was used as the reference gene for miRNA normalization, whereas SYMPK served as the reference gene for circRNAs, lncRNAs, and mRNAs. Table S3 summarizes the sequences of all primers.

cDNA synthesis

Total RNA (1,000 ng) was reverse transcribed using the AddScript cDNA Synthesis Kit (AddBio, Canada) with random hexamers. For miR-3671 cDNA synthesis, specific RT primers were first heated to 85 °C for 5 min and then cooled prior to reverse transcription. For circMET cDNA synthesis, a branched rolling-circle amplification approach was used, employing a primer spanning the backsplice junction46.

Reverse transcription quantitative polymerase chain reaction (RT‒qPCR)

Expression levels of ceRNA network genes were analyzed using the Rotor-Gene platform and SYBR Green qPCR Master Mix (Amplicon, Denmark). All RT-qPCR experiments were conducted in duplicate. Relative gene expression was calculated using the 2^−ΔΔCt method47. Analyses were performed to compare gene expression between low-aggressive tumors (stage I) and normal tissues, high-aggressive tumors (stages II, III, and IV) and normal tissues, as well as between low- and high-aggressive tumors. Statistical significance of expression differences was assessed using REST 2009 v2.0.13 software48. Boxplots visualizing gene expression levels across groups were generated using GraphPad Prism.
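The relative quantification can be illustrated with a small worked example. The Ct values are hypothetical, duplicates are averaged before the calculation, and the sketch assumes equal amplification efficiency for the target and reference genes, as the 2^−ΔΔCt method does.

```python
from statistics import mean

def delta_ct(target_cts, reference_cts):
    """ΔCt = mean Ct of the target minus mean Ct of the reference gene."""
    return mean(target_cts) - mean(reference_cts)

def fold_change(tumor_target, tumor_ref, normal_target, normal_ref):
    """Relative expression in tumor versus normal tissue: 2 ** (-ΔΔCt)."""
    ddct = delta_ct(tumor_target, tumor_ref) - delta_ct(normal_target, normal_ref)
    return 2.0 ** (-ddct)

# Hypothetical duplicate Ct readings for one tumor/normal pair.
fc = fold_change(
    tumor_target=[24.0, 24.0], tumor_ref=[20.0, 20.0],    # ΔCt = 4
    normal_target=[26.0, 26.0], normal_ref=[20.0, 20.0],  # ΔCt = 6
)
```

Here ΔΔCt is 4 − 6 = −2, so the target is expressed four times higher in the tumor than in the paired normal tissue.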

Experimental validation of the most influential component of the ceRNA network

Experimental validation of the in silico-constructed ceRNA network was performed using RT-qPCR on paired clinical PTC tissues and their adjacent normal counterparts. The analysis examined elements of the five-layer ceRNA network axis, specifically the central hub gene PKMYT1, its regulator miR-3671, and the miRNA sponges GAS5 lncRNA and circMET. LINC02910 lncRNA and the FOLH1B pseudogene were excluded due to undetectable expression in fresh tissues. For RT-qPCR data, Student's t-test was used for pairwise group comparisons, preceded by assessment of data normality using the Shapiro–Wilk test. Homogeneity of variance was evaluated using Levene's test. Three types of pairwise comparisons were performed: high-aggressive PTC tissues versus adjacent normal tissues, low-aggressive PTC tumors versus adjacent normal tissues, and high-aggressive versus low-aggressive PTC tumors.

Results

Differentially expressed RNA molecules were identified

We identified DEmRNAs, DElncs, DEmiRs, and DEpseudogenes in four stages of PTC by analyzing RNA-seq data (Figure S1). When stage I PTC tissues were compared with normal tissues, 1425 DEmRNAs, 28 DEmiRs, 4041 DElncs, and 1295 DEpseudogenes were identified. Compared with normal tissue, stage II PTC contained 903 DEmRNAs, 22 DEmiRs, 3066 DElncs, and 968 DEpseudogenes. Differential expression analysis between stage III PTC and normal tissues identified 1590 DEmRNAs, 27 DEmiRs, 4065 DElncs, and 1364 DEpseudogenes. Analysis of stage IV PTC tissues revealed 1815 DEmRNAs, 26 DEmiRs, 3947 DElncs, and 1239 DEpseudogenes that were differentially expressed relative to normal tissues. Intersecting the four stage-specific sets identified molecules dysregulated in every stage: 771 DEmRNAs (Figure S1a), 19 DEmiRs (Figure S1b), 2593 DElncs (Figure S1c), and 751 DEpseudogenes (Figure S1d).

Twenty-three hub genes were recognized

Utilizing IVI, MCC, and MCODE, twenty-three hub genes were identified (Figure S2). A PPI network was constructed using 771 DEmRNAs (Figure S2a), with 661 nodes and 2148 edges. The top ten percent of nodes ranked by MCC and by IVI shared 55 genes, which were taken as candidate genes (Figure S2b). MCODE identified 25 clusters, with cluster 1 attaining the highest score of 19.818 (Table S4). MCODE cluster 1 comprised 23 genes, all of which were also candidate genes and were therefore designated hub genes (Figure S2c). Hub genes exhibited greater interconnectivity compared to other candidate genes (Figure S2d).

Six of the 23 hub genes were validated via GEPIA

Six of the 23 hub genes were significantly upregulated in both the single and combined analyses (Figure S3). These genes included protein kinase membrane associated tyrosine/threonine 1 (PKMYT1, Figures S3a-b), E2F transcription factor 1 (E2F1, Figures S3c-d), thymidine kinase 1 (TK1, Figures S3e-f), antisilencing function 1B histone chaperone (ASF1B, Figures S3g-h), chromatin licensing and DNA replication factor 1 (CDT1, Figures S3i-j), and RecQ-mediated genome instability 2 (RMI2, Figures S3k-l).

GEPIA-validated hub genes were reinforced by microarray metadata analysis

Quality control of microarray metadata was done (Figure S4). Sample distributions before and after batch effect removal are shown in boxplots (Figures S4a and S4b, respectively) and PCA plots (Figures S4c and S4d, respectively). After removing outliers and addressing batch effects, a total of 267 samples remained, comprising 86 normal samples, 103 PTC samples, and 78 samples representing other THCA subtypes. Overall, 4,669 genes were found to be upregulated in PTC compared to normal tissues adjacent to PTC tumors. Two datasets, GSE27155 and GSE9115, were excluded from the microarray meta-analysis due to concerns regarding data quality and redundancy.

RNA-seq data were used to investigate the expression of hub genes across different PTC stages, while the microarray meta-analysis focused on PTC tissues without detailed staging information (Table 1). RNA-seq analysis revealed significant overexpression of GEPIA-validated hub genes across all four PTC stages relative to normal tissues. Importantly, all six hub genes validated by GEPIA were also identified as DEGs in the microarray meta-analysis, independent of detailed pathological annotation.

Table 1 Assessment of hub gene expression through the analysis of RNA-seq data and microarray metadata.

The expression of hub genes was not influenced by the sex or age of the patients

The influence of sex on hub gene expression is illustrated in Figure S5. Hub genes were consistently identified as DEGs in PTC tissues compared to normal tissues, irrespective of sex (Figure S5a). Both male and female PTC tissues exhibited significant differential expression of hub genes relative to matched normal tissues (Figures S5b and S5c, respectively). However, no significant differences in hub gene expression were observed when directly comparing PTC tissues from female versus male patients (Figure S5d).

Similarly, no differences in hub gene expression were found across PTC tissues from different age groups (Figure S6), indicating that age does not significantly impact the expression levels of hub genes.

PKMYT1 and RMI2 were the hub genes used for the construction of the ceRNA network

A ceRNA network was constructed centered on two hub genes: PKMYT1 and RMI2 (Fig. 2). The PKMYT1 axis consists of five layers: PKMYT1, miRNAs, lncRNAs, circRNAs, and pseudogenes. The RMI2 axis comprises three layers: RMI2, miRNAs, and circRNAs.

Fig. 2

Construction of the ceRNA network. (a) miR-3671 is a microRNA that targets the PKMYT1 hub gene, as indicated in both the RNAInter and miRDB databases. It is also identified as a DEmiR. (b) AGAP11, LINC02910, and GAS5 are three lncRNAs identified by RNAInter as potential sponges for miR-3671, thereby influencing the regulation of PKMYT1. These lncRNAs are also present among DElncs. (c) FOLH1B is identified by RNAInter as a pseudogene that can act as a sponge for miR-3671. It also exists among DEpseudogenes. (d) miR-616-5p is a microRNA that targets the RMI2 hub gene, as identified in the RNAInter, miRWalk, and miRDB databases. It is also present among DEmiRs. (e) TSPEAR-AS2 is an lncRNA proposed by RNAInter to act as a sponge for miR-616-5p and regulate RMI2. This lncRNA is also present among DElncs. (f) RNAInter-predicted pseudogenes do not overlap with DEpseudogenes, indicating that no pseudogene is currently identified as a sponge for miR-616-5p. (g) GAS5, LINC02910, and FOLH1B demonstrate positive correlations with PKMYT1, whereas PKMYT1 exhibits a negative correlation with miR-3671. RMI2 exhibits negative correlations with miR-616-5p and TSPEAR-AS2. (h) The resulting ceRNA network consists of two hub genes (PKMYT1 and RMI2), two miRNAs (miR-3671 and miR-616-5p), two lncRNAs (GAS5 and LINC02910), twenty-three circRNAs, and one pseudogene (FOLH1B). Red circles denote circRNAs, orange arrows indicate pseudogenes, blue ovals represent miRNAs, brown hexagons signify lncRNAs, and light brown squares correspond to mRNA.
ceRNA: competitive endogenous RNA; DEmiRs: differentially expressed microRNAs; DElncs: differentially expressed long noncoding RNAs; DEpseudogenes: differentially expressed pseudogenes; PKMYT1: protein kinase membrane associated tyrosine/threonine 1; RMI2: recQ mediated genome instability 2; GAS5: growth arrest specific 5; AGAP11: ArfGAP with a GTPase domain, ankyrin repeat, and PH domain 11; LINC02910: long intergenic nonprotein-coding RNA 2910; FOLH1B: folate hydrolase 1B; miR-3671: microRNA 3671; miR-616-5p: microRNA 616-5p; TSPEAR-AS2: TSPEAR antisense RNA 2.

Within the PKMYT1 axis, miR-3671 emerged as a key regulator (Fig. 2a). RNAInter predicted six lncRNAs interacting with miR-3671, three of which overlapped with DElncs (Fig. 2b): growth arrest-specific 5 (GAS5), ArfGAP with a GTPase domain, ankyrin repeat, and PH domain 11 (AGAP11), and long intergenic nonprotein-coding RNA 2910 (LINC02910). Neither StarBase nor miRcode identified additional lncRNAs. Two pseudogenes (folate hydrolase 1B: FOLH1B and NBPF member 8: NBPF8) were predicted via RNAInter; of these, FOLH1B was the most prominent and was also present among the 751 DEpseudogenes (Fig. 2c).

For the RMI2 axis, miR-616-5p was identified as a common regulator overlapping between predicted miRNAs and DEmiRs (Fig. 2d). Among eight lncRNAs predicted by RNAInter, only TSPEAR antisense RNA 2 (TSPEAR-AS2) overlapped with DElncs (Fig. 2e). No common pseudogene sponges were identified for miR-616-5p (Fig. 2f).

Correlation analysis was performed to finalize network components (Fig. 2g), based on the ceRNA hypothesis: miRNAs should be negatively correlated with their target hub genes, while miRNA sponges should be positively correlated with them. The candidate PKMYT1 axis comprised miR-3671 and four sponges (GAS5, LINC02910, AGAP11, and FOLH1B), while the candidate RMI2 axis included miR-616-5p and one sponge (TSPEAR-AS2). AGAP11 and TSPEAR-AS2 were then excluded because they were negatively correlated with their respective hub genes, leaving GAS5, LINC02910, and FOLH1B as sponges in the PKMYT1 axis and no lncRNA sponge in the RMI2 axis.
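
The sign rules used in this filtering step can be illustrated with a minimal Python sketch. It assumes Pearson correlation (the correlation method is not stated in this section) and uses hypothetical expression values and placeholder sponge names; `pearson` and `filter_axis` are illustrative helpers, not functions from the study's pipeline.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def filter_axis(hub_expr, mirna_expr, sponge_exprs):
    """Apply the ceRNA sign rules to one regulatory axis.

    Returns (mirna_ok, kept_sponges): mirna_ok is True when the miRNA is
    negatively correlated with the hub gene; kept_sponges lists the sponges
    that are positively correlated with the hub gene.
    """
    mirna_ok = pearson(mirna_expr, hub_expr) < 0
    kept = [name for name, expr in sponge_exprs.items()
            if pearson(expr, hub_expr) > 0]
    return mirna_ok, kept

# Toy expression values (hypothetical, NOT data from the study).
hub = [1.0, 2.0, 3.0, 4.0, 5.0]
mirna = [5.1, 3.9, 3.2, 2.0, 1.1]
sponges = {
    "sponge_A": [0.9, 2.1, 2.8, 4.2, 5.0],  # rises with the hub gene
    "sponge_B": [4.8, 4.1, 3.0, 2.2, 0.9],  # falls as the hub gene rises
}

mirna_ok, kept = filter_axis(hub, mirna, sponges)
print(mirna_ok, kept)
```

In this toy case the miRNA passes the negative-correlation check and only `sponge_A` is retained, mirroring how a negatively correlated sponge candidate would be dropped from an axis.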

Regarding circRNAs, eleven DEcircs were shown to interact with miR-3671, including four novel circRNAs (Table S5), notably novel_circ_011147, which displayed the strongest binding affinity. Among known DEcircs, hsa_circ_0003571 had the highest interaction energy (-13.51 kcal/mol), while hsa_circ_001640 presented the most interaction sites with miR-3671. Seventeen DEcircs were found to interact with miR-616-5p, including four novel circRNAs (Table S6). Novel_circ_010456 and hsa_circ_000013 exhibited the strongest affinity (-8.81 kcal/mol and -12.54 kcal/mol, respectively), while hsa_circ_000609 had the most interaction sites with miR-616-5p.

A ceRNA network containing two hub genes (PKMYT1 and RMI2), two miRNAs (miR-3671 and miR-616-5p), two lncRNAs (GAS5 and LINC02910), twenty-three circRNAs, and one pseudogene (FOLH1B) was constructed (Fig. 2h). Five circRNAs (novel_circ_004588, novel_circ_013288, circ_0007444, circ_001209, and hsa_circ_000609) were found to have binding sites for both miR-3671 and miR-616-5p.
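
The circRNA counts reported above are mutually consistent, which a short set-based sketch can confirm: eleven miR-3671 binders and seventeen miR-616-5p binders that share five members give twenty-three distinct circRNAs. Only the five shared names come from the text; the remaining identifiers are placeholders, since the full lists are in Tables S5 and S6.

```python
# circRNAs reported to have binding sites for both miRNAs (names from the text).
shared = {
    "novel_circ_004588", "novel_circ_013288",
    "circ_0007444", "circ_001209", "hsa_circ_000609",
}

# Placeholder IDs stand in for circRNAs that bind only one miRNA;
# the real identifiers are listed in Tables S5 and S6.
mir3671_only = {f"mir3671_only_{i}" for i in range(11 - len(shared))}
mir616_only = {f"mir616_only_{i}" for i in range(17 - len(shared))}

mir3671_circs = shared | mir3671_only   # 11 DEcircs interacting with miR-3671
mir616_circs = shared | mir616_only     # 17 DEcircs interacting with miR-616-5p

network_circs = mir3671_circs | mir616_circs   # distinct circRNAs in the network
both = mir3671_circs & mir616_circs            # circRNAs bridging the two axes

print(len(network_circs), len(both))
```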

Validation analyses of ceRNA network genes (excluding circRNAs) were completed both with and without PTC tumor stage annotations (Table 2). All network components were identified as DEGs across various PTC stages and in comparisons of normal versus cancerous tissues, regardless of pathological classification. Other hub genes did not meet the miRNA selection criteria (Figures S7a-d).

Table 2 RNA-seq data analysis in the ceRNA network with and without tumor staging.

Deregulation of ceRNA network genes was observed in the pan-cancer analysis

The expression levels of ceRNA network genes across various cancer types are shown in Figure S8. PKMYT1 was significantly overexpressed in tumors of all types compared to adjacent normal tissues (Figure S8a). GAS5 also demonstrated significant upregulation in multiple cancers, including THCA (Figure S8b). RMI2 was consistently overexpressed across all cancer types evaluated, including THCA (Figure S8c). Both FOLH1B and LINC02910 exhibited significantly increased expression in THCA and several other cancers (Figures S8d and S8e, respectively).

Fourteen TFs were incorporated into the DE-TF network construction

To identify potential regulators, two TF networks were constructed, one for ceRNA components and one for hub genes, followed by a DE-TF network restricted to those regulators that were also among the DEmRNAs (Fig. 3). The initial TF network comprised 11 experimentally validated TFs targeting ceRNA genes (Fig. 3a). Robust interactions among hub genes at both the protein and mRNA levels (Fig. 3b, c) supported the construction of a second TF network comprising 28 members (Fig. 3d). By intersecting both networks with the DEmRNAs, a DE-TF network containing 14 TFs was established (Fig. 3e). These included androgen receptor (AR), E2F transcription factors 1, 3, and 4 (E2F1, E2F3, E2F4), fos proto-oncogene, AP-1 transcription factor subunit (FOS), forkhead box A1 (FOXA1), hes family bHLH transcription factor 1 (HES1), lymphoid enhancer binding factor 1 (LEF1), nuclear factor of activated T cells 1 (NFATC1), nuclear receptor subfamily 4, group A member 1 (NR4A1), Sp3 transcription factor (SP3), signal transducer and activator of transcription 6 (STAT6), thyroid hormone receptor beta (THRB), and tumor protein P53 (TP53). The constructed DE-TF network revealed several regulatory relationships. AR regulated the lncRNAs GAS5 and LINC02910, and FOXA1 also regulated LINC02910. NFATC1 was identified as a regulator of the pseudogene FOLH1B. E2F1, which is subject to autoregulation, was additionally regulated by seven TFs: TP53, THRB, NR4A1, LEF1, HES1, E2F4, and E2F3. ASF1B was regulated by four TFs: E2F1, E2F3, E2F4, and FOS. CDT1 and TK1 were under the control of E2F1 and E2F4, respectively. STAT6 was identified as a regulator of RMI2.
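
The intersection step can be pictured as filtering TF-to-target edges so that only differentially expressed regulators remain. The sketch below restates a subset of the relationships described above as edges and adds one hypothetical regulator, `TF_X`, purely to show the filter removing a TF that is not among the DEmRNAs; `build_de_tf_network` is an illustrative helper, not part of the study's workflow.

```python
from collections import defaultdict

# (TF, target) edges. Most entries restate relationships from the text;
# "TF_X" is a hypothetical non-DE regulator added only to show the filtering.
edges = [
    ("AR", "GAS5"), ("AR", "LINC02910"), ("FOXA1", "LINC02910"),
    ("NFATC1", "FOLH1B"), ("STAT6", "RMI2"),
    ("E2F1", "ASF1B"), ("E2F3", "ASF1B"), ("E2F4", "ASF1B"), ("FOS", "ASF1B"),
    ("E2F1", "CDT1"), ("E2F4", "TK1"),
    ("TF_X", "PKMYT1"),
]

# The 14 TFs named in the text as also being differentially expressed mRNAs.
de_tfs = {
    "AR", "E2F1", "E2F3", "E2F4", "FOS", "FOXA1", "HES1", "LEF1",
    "NFATC1", "NR4A1", "SP3", "STAT6", "THRB", "TP53",
}

def build_de_tf_network(edges, de_tfs):
    """Keep only edges whose regulator is differentially expressed."""
    network = defaultdict(set)
    for tf, target in edges:
        if tf in de_tfs:
            network[tf].add(target)
    return dict(network)

de_tf_network = build_de_tf_network(edges, de_tfs)
for tf in sorted(de_tf_network):
    print(tf, "->", sorted(de_tf_network[tf]))
```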

Fig. 3

Development of the DE-TF network. (a) An initial TF network identifies 11 experimentally validated transcription factors that target ceRNA genes. (b) The protein–protein interaction network illustrates the connections between two ceRNA components, PKMYT1 and RMI2, and other hub genes at the protein level. (c) Two ceRNA components, PKMYT1 and RMI2, exhibit strong correlations with other hub genes at the mRNA level. (d) A second TF network identifies 28 experimentally validated transcription factors that regulate hub gene expression. (e) A DE-TF network consists of 14 transcription factors that are included among the DEmRNAs. DE-TF: differentially expressed transcription factor; TFs: transcription factors; DEmRNAs: differentially expressed messenger RNAs; ceRNA: competitive endogenous RNA; PKMYT1: protein kinase membrane associated tyrosine/threonine 1; RMI2: recQ mediated genome instability 2.

RNA-seq data revealed that GAS5, AR, and ASF1B were expressed differently during PTC stages

Stage-wise expression plots for one ceRNA network component and two DE-TF network components are shown in Figure S9. GAS5 expression (Figure S9a) was found to increase significantly with tumor aggressiveness (Pr(>F) = 3.11 × 10⁻⁵). ASF1B and AR also exhibited stage-dependent differential expression in PTC (Figures S9b and S9c, respectively). In contrast, other DE-TF network components demonstrated uniform expression levels across different PTC stages (data not shown).

A PTC prognostic model was developed containing four genes

A prognostic gene signature for PTC was developed and validated through the integration of ceRNA and DE-TF network components (Fig. 4). A tenfold cross-validation approach employing univariate Cox regression and LASSO regression identified 9 of 24 genes within the ceRNA and DE-TF networks (Table S7). Four genes—STAT6, E2F1, RMI2, and NR4A1—emerged as the most stable and consistently significant predictors of PTC survival. The Kaplan–Meier (KM) plot demonstrated a reduced OS rate for high-risk PTC relative to low-risk PTC (Fig. 4a, log-rank p = 0.039). Time-dependent ROC analysis revealed AUC values of 0.60 at 12 months, 0.75 at 36 months, and 0.58 at 60 months (Fig. 4b).

Fig. 4

Development and validation of a prognostic gene signature in PTC. Internal validation confirms the model's predictive accuracy through (a) the KM plot and (b) the time-dependent ROC curve. (c) A forest plot presents the findings of univariable Cox regression analysis for the entire cohort, emphasizing five candidate genes. (d) The tenfold CV method is employed to identify the optimal tuning parameter (lambda) in the LASSO regression model. (e) The LASSO coefficient profiles of the five candidate genes identified a prognostic gene signature comprising four genes. (f) The KM plot indicates significantly different survival outcomes between the high-risk and low-risk groups within the entire cohort, as determined by the prognostic gene signature. (g) The upper panel presents the risk score and survival status for each sample. The lower panel displays the mRNA expression levels of the prognostic gene signature. (h) Time-dependent ROC curves confirm the prognostic model's performance across the entire cohort at 12 months (red), 36 months (blue), and 60 months (yellow), with all AUC values surpassing 0.70. PTC: papillary thyroid cancer; KM: Kaplan-Meier; ROC: receiver operating characteristic; tenfold CV: tenfold cross validation; LASSO: least absolute shrinkage and selection operator; AUC: area under the curve.

After internal validation of the risk model, the complete patient cohort was evaluated to establish a gene prognostic signature. Out of 24 genes within the ceRNA and DE-TF networks, five candidates (RMI2, E2F1, ASF1B, NR4A1, and STAT6) were identified as prognostically significant through univariable Cox regression analysis (Fig. 4c). To determine the optimal gene combination, LASSO regression was applied to these five genes, resulting in the exclusion of ASF1B from further analysis (Fig. 4d, e). A risk score model, termed the SERN model, was subsequently developed, comprising STAT6, E2F1, RMI2, and NR4A1. Patients in the low-risk group demonstrated significantly better OS than those in the high-risk group (Fig. 4f). A strong association was observed between higher risk scores and increased mortality. In the risk formula (Eq. 1), negative coefficients for STAT6 (−1.32), E2F1 (−0.45), and RMI2 (−0.05) indicated their protective roles, with higher expression in low-risk patients. Conversely, the positive coefficient for NR4A1 (0.16) identified it as a risk gene, showing lower expression in the low-risk group (Fig. 4g). The SERN model demonstrated robust predictive accuracy, yielding AUC values of 80.43%, 84.82%, and 76.97% at 12, 36, and 60 months post-PTC diagnosis, respectively (Fig. 4h). The high sensitivity and specificity of this four-gene signature supported the SERN model as a reliable predictor of OS in PTC patients.
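
To make the role of the coefficients concrete, the following Python sketch computes a risk score as a weighted sum of gene expression values using the four coefficients reported above. Eq. 1 is not reproduced in this section, so the linear form is an assumption based on how LASSO-Cox risk scores are usually defined; the median cutoff, the patient identifiers, and the expression values are likewise hypothetical.

```python
from statistics import median

# Coefficients as reported in the text for the SERN model.
SERN_COEFFICIENTS = {"STAT6": -1.32, "E2F1": -0.45, "RMI2": -0.05, "NR4A1": 0.16}

def sern_risk_score(expression):
    """Weighted sum of expression values (assumed linear form of Eq. 1)."""
    return sum(coef * expression[gene] for gene, coef in SERN_COEFFICIENTS.items())

def split_by_median(scores):
    """Label each patient high- or low-risk relative to the median score.

    The median cutoff is an assumption; the section does not state the cutoff.
    """
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}

# Hypothetical expression values, NOT data from the study.
patients = {
    "P1": {"STAT6": 3.0, "E2F1": 2.0, "RMI2": 1.0, "NR4A1": 1.0},
    "P2": {"STAT6": 1.0, "E2F1": 1.0, "RMI2": 1.0, "NR4A1": 4.0},
    "P3": {"STAT6": 2.0, "E2F1": 1.5, "RMI2": 2.0, "NR4A1": 2.0},
}

scores = {pid: sern_risk_score(expr) for pid, expr in patients.items()}
groups = split_by_median(scores)
for pid in patients:
    print(pid, round(scores[pid], 3), groups[pid])
```

In this toy cohort, the patient with low STAT6 and high NR4A1 expression (P2) receives the highest score, which matches the direction of effect described for the protective and risk genes.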

Correlation analysis between the SERN model and clinicopathological features

Both univariable and multivariable Cox regression analyses were conducted to assess the independent prognostic value of the SERN model in PTC (Table 3). In univariable analysis, the SERN model was significantly associated with OS (HR = 1.43). Other clinicopathological features, including age (HR = 1.01), TNM stage (HR = 2.67), and tumor size (HR = 3.12), were also significantly associated with OS. Multivariable Cox analysis revealed that the SERN model retained a significantly increased hazard ratio (HR = 1.78, p < 0.001), with age remaining a significant factor (HR = 1.01). TNM stage and tumor size were no longer significant predictors in the multivariable context (p > 0.05). These findings indicate that the SERN model and patient age are independent prognostic factors for OS in PTC.

Table 3 Univariable and multivariable Cox regression analysis of prognostic factors.

A nomogram was developed to estimate the probability of OS at 12, 36, and 60 months following PTC diagnosis (Fig. 5). This tool was constructed based on the independent prognostic factors identified in multivariable Cox regression analysis, specifically incorporating the SERN model risk score and patient age as key predictors (Fig. 5a). The discriminative performance of the nomogram was assessed using time-dependent ROC curves, demonstrating excellent predictive accuracy: the AUC was 95.93% for 1-year OS, 96.85% for 3-year OS, and 94.46% for 5-year OS (Fig. 5b). Calibration plots further confirmed strong agreement between the nomogram's predicted survival probabilities and actual observed outcomes at 1, 3, and 5 years post-diagnosis (Fig. 5c–e).

Fig. 5

Assessment of the Predictive Nomogram. (a) A nomogram predicts OS at 12, 36, and 60 months in patients with PTC. (b) Time-dependent ROC curves evaluate the nomogram's discrimination at 1, 3, and 5 years. Calibration curves show the nomogram's performance at (c) 1 year, (d) 3 years, and (e) 5 years. The x-axis denotes the survival probability predicted by the nomogram, while the y-axis indicates the actual observed survival rate. The solid line illustrates the nomogram's performance, while the vertical bars denote the 95% CI. The dashed 45-degree line indicates a perfectly calibrated model, wherein predicted outcomes align precisely with actual outcomes. PTC: papillary thyroid cancer; OS: overall survival; ROC: receiver operating characteristic; 95% CI: 95% confidence interval.

Components of the ceRNA and DE-TF networks have potential for PTC diagnosis

A comprehensive tenfold CV analysis demonstrated that all genes within the ceRNA and DE-TF networks possessed significant diagnostic capability for distinguishing PTC from normal tissue (Table S8). In the ceRNA network, RMI2 emerged as the leading biomarker, achieving a cross-validated AUC of 0.88. Other network components, GAS5 (AUC = 0.86), PKMYT1 (AUC = 0.81), miR-3671 (AUC = 0.75), LINC02910 (AUC = 0.71), miR-616-5p (AUC = 0.73), and FOLH1B (AUC = 0.66), also displayed substantial diagnostic accuracy. DE-TF genes yielded comparably robust results, with E2F1 showing particularly strong performance (AUC = 0.92). Other prominent DE-TF biomarkers included CDT1 (AUC = 0.90), STAT6 (AUC = 0.89), and TP53 (AUC = 0.86), while the remaining DE-TF candidates were ranked by descending AUC values.
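
For readers less familiar with the metric, an AUC can be read as the probability that a randomly chosen tumor sample scores higher than a randomly chosen normal sample. The sketch below computes it by direct pairwise comparison on hypothetical expression values; it omits the tenfold cross-validation used in the study, and `auc` is an illustrative helper rather than the study's implementation.

```python
def auc(tumor_scores, normal_scores):
    """AUC as the fraction of (tumor, normal) pairs ranked correctly.

    Ties count as half a correct pair, which is equivalent to the
    Mann-Whitney U statistic divided by the number of pairs.
    """
    wins = 0.0
    for t in tumor_scores:
        for n in normal_scores:
            if t > n:
                wins += 1.0
            elif t == n:
                wins += 0.5
    return wins / (len(tumor_scores) * len(normal_scores))

# Hypothetical expression values for one gene, NOT data from the study.
tumor = [5.2, 6.1, 4.8, 7.0]
normal = [3.9, 5.0, 4.1]

print(round(auc(tumor, normal), 3))
```

Here 11 of the 12 tumor-normal pairs are ordered correctly, so the AUC is about 0.917; a value of 0.5 would indicate no discrimination.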

A gene panel including components of the ceRNA and DE-TF networks has enhanced diagnostic accuracy for PTC

A multi-gene panel was established to enhance diagnostic accuracy using backward stepwise logistic regression within a tenfold CV framework (Fig. 6). This process identified an optimal panel of eight genes: PKMYT1, LINC02910, GAS5, E2F1, NFATC1, STAT6, E2F3, and TK1. Among the ceRNA network components, GAS5 (AUC = 0.861), PKMYT1 (AUC = 0.817), and LINC02910 (AUC = 0.715) exhibited substantial diagnostic accuracy (Fig. 6a–c). Among the DE-TF network components, E2F1 (AUC = 0.929), STAT6 (AUC = 0.898), TK1 (AUC = 0.846), NFATC1 (AUC = 0.837), and E2F3 (AUC = 0.761) were identified as additional high-performing DEGs (Fig. 6d–h). The combined eight-gene panel outperformed each individual gene, reaching an AUC of 0.969 (Fig. 6i). At the optimal Youden index threshold, the panel yielded a sensitivity of 97.0% and a specificity of 91.5%.
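
The Youden index mentioned above is J = sensitivity + specificity − 1, and the optimal threshold is the cutoff that maximizes J. The following Python sketch scans candidate cutoffs over hypothetical panel scores; the scores, labels, and the helper `youden_optimal_threshold` are illustrative assumptions, not outputs of the eight-gene model.

```python
def youden_optimal_threshold(scores, labels):
    """Return the cutoff maximizing J = sensitivity + specificity - 1.

    A sample is called positive when its score is >= the cutoff.
    Labels are 1 for PTC and 0 for normal tissue. On ties in J the
    lowest cutoff is kept.
    """
    best = None
    for thr in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= thr)
        fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < thr)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < thr)
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= thr)
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        j = sensitivity + specificity - 1
        if best is None or j > best["J"]:
            best = {"threshold": thr, "sensitivity": sensitivity,
                    "specificity": specificity, "J": j}
    return best

# Hypothetical predicted probabilities and true labels, NOT study data.
scores = [0.95, 0.90, 0.80, 0.65, 0.60, 0.40, 0.30, 0.10]
labels = [1, 1, 1, 0, 1, 0, 0, 0]

best = youden_optimal_threshold(scores, labels)
print(best)
```

The reported sensitivity and specificity of a panel therefore depend on this choice of cutoff, whereas the AUC summarizes performance across all cutoffs.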

Fig. 6

Assessment of the effectiveness of PTC diagnosis utilizing a multi-gene panel. Backward stepwise logistic regression identifies an optimal eight-gene panel for diagnosing PTC, comprising PKMYT1, E2F1, NFATC1, STAT6, E2F3, LINC02910, GAS5, and TK1. (a–h) The ROC curves assess the diagnostic accuracy of each gene individually. (i) The multi-gene panel comprising eight genes demonstrates an enhanced AUC of 0.969. PTC: papillary thyroid cancer; ROC: receiver operating characteristic; AUC: area under the curve; PKMYT1: protein kinase membrane associated tyrosine/threonine 1; E2F1: E2F transcription factor 1; NFATC1: nuclear factor of activated T cells 1; STAT6: signal transducer and activator of transcription 6; E2F3: E2F transcription factor 3; LINC02910: long intergenic nonprotein-coding RNA 2910; GAS5: growth arrest specific 5; TK1: thymidine kinase 1.

RT-qPCR revealed a stage-specific expression pattern for PKMYT1 and GAS5

RNA-seq-derived hub genes were not identified as DEGs when fresh PTC tissues were compared with adjacent normal tissues without considering tumor aggressiveness (data not shown). This discrepancy was resolved by categorizing PTC tissues according to their aggressiveness (Fig. 7). Both PKMYT1 and GAS5 were significantly downregulated in low-aggressive tumors compared to normal tissues (Fig. 7a, d, respectively). In contrast, these genes were significantly upregulated in high-aggressive PTC tissues relative to normal tissues (Fig. 7b, e, respectively). A direct comparison between high- and low-aggressive tumors revealed notable upregulation of PKMYT1 and GAS5 in the high-aggressive subgroup (Fig. 7c, f, respectively). Among circRNAs, circMET was selected for further experimental validation because it had the highest logFC in the RNA-seq data (Table S9). circMET expression was significantly elevated in both low- and high-aggressive PTC tissues compared to adjacent normal tissues (Fig. 7g, h, respectively). However, no significant differential expression of circMET was observed between high- and low-aggressive tumors (Fig. 7i). For miR-3671, expression levels remained consistent across all PTC subgroups and normal tissues (data not shown).

Fig. 7

Differential expression of PKMYT1, GAS5, and circMET in relation to PTC aggressiveness. (a) PKMYT1 shows decreased expression in low-aggressive PTC tissues compared to normal thyroid tissue. (b) PKMYT1 shows increased expression levels in high-aggressive PTC tissues relative to normal thyroid tissue. (c) PKMYT1 shows increased expression levels in high-aggressive PTC tissues compared to low-aggressive PTC tissues. (d) GAS5 expression is diminished in low-aggressive PTC tissues compared to normal thyroid tissue. (e) GAS5 exhibits increased expression in high-aggressive PTC tissues relative to normal thyroid tissue. (f) GAS5 shows markedly elevated expression levels in high-aggressive PTC tissues compared to low-aggressive PTC tissues. (g) circMET shows elevated expression in low-aggressive PTC tissues compared to normal thyroid tissue. (h) circMET exhibits increased expression in high-aggressive PTC tissues relative to normal thyroid tissue. (i) circMET expression does not differ significantly between high-aggressive and low-aggressive PTC tissues. PTC: papillary thyroid cancer; PKMYT1: protein kinase membrane associated tyrosine/threonine 1; GAS5: growth arrest specific 5.

PKMYT1, GAS5, and circMET demonstrated diagnostic capabilities in clinical specimens

Diagnostic performance analysis demonstrated that the expression of PKMYT1, GAS5, and circMET could effectively predict PTC incidence and progression (Fig. 8). PKMYT1 exhibited strong diagnostic accuracy for distinguishing high-aggressive from low-aggressive PTC, with an AUC of 0.81 (Fig. 8a). GAS5 also showed high diagnostic accuracy, with an AUC of 0.87 for differentiating high-aggressive from low-aggressive PTC tumors (Fig. 8b). circMET demonstrated robust capability to distinguish low-aggressive PTC from normal thyroid tissues, achieving an AUC of 0.81 (Fig. 8c).

Fig. 8

Analysis of the diagnostic performance of PKMYT1, GAS5, and circMET. (a) PKMYT1 exhibits notable diagnostic accuracy in distinguishing high-aggressive from low-aggressive PTC tissues. (b) GAS5 demonstrates significant diagnostic accuracy in differentiating high-aggressive from low-aggressive PTC tissues. (c) circMET exhibits significant diagnostic effectiveness in distinguishing low-aggressive PTC tissues from normal thyroid tissues. PTC: papillary thyroid cancer; AUC: area under the curve; PKMYT1: protein kinase membrane associated tyrosine/threonine 1; GAS5: growth arrest specific 5.

Blood calcium levels influence the expression patterns of GAS5 in PTC tissues

Differential expression of GAS5 was observed in PTC patients based on their blood calcium levels (Fig. 9). In PTC tissues from the LCG, GAS5 was significantly downregulated compared to adjacent normal tissues (relative expression = 0.38, p = 0.02; Fig. 9a). In contrast, PTC tissues from the NCG exhibited significant upregulation of GAS5 relative to normal tissues (relative expression = 3.01, p = 0.01; Fig. 9b). Moreover, direct comparison between LCG and NCG PTC tissues revealed a significant downregulation of GAS5 in the LCG (relative expression = 0.34, p = 0.02; Fig. 9c). These findings suggest a regulatory role for Ca²⁺ in modulating GAS5 expression in PTC patients.

Fig. 9

Calcium-dependent differential expression of GAS5 in PTC tissues. (a) GAS5 expression is diminished in LCG PTC tissues compared to adjacent normal tissues. (b) GAS5 shows elevated expression in NCG PTC tissues compared to adjacent normal tissues. (c) GAS5 shows decreased expression in the LCG PTC tissues compared to the NCG PTC ones. LCG PTC: low-calcium level group papillary thyroid cancer; NCG PTC: normal-calcium level papillary thyroid cancer; GAS5: growth arrest specific 5.

Discussion

Numerous studies have highlighted the importance of ceRNAs in tumor development, emphasizing the interplay among lncRNAs, miRNAs, mRNAs, and circRNAs49,50,51,52. This study represents the first report of a ceRNA network in PTC comprising five distinct RNA types: mRNAs, miRNAs, lncRNAs, pseudogenes, and circRNAs. The network benefitted from a stringent selection process for integrated hub genes, ensuring that their expression was not influenced by confounding variables such as patient sex or age. All components of the ceRNA network were dysregulated in PTC, exhibiting differential expression between cancerous and normal tissues—a pattern consistently observed across all PTC stages for every component except circRNAs. The identified ceRNA network features two regulatory axes (a five-layer axis and a three-layer axis), both of which are presented here for the first time in PTC.

Within the five-layer axis, PKMYT1 emerged as a central oncogene, displaying overexpression in PTC as well as in various other cancers. This observation aligns with previous reports documenting PKMYT1 upregulation in triple-negative breast, prostate, ovarian, oral, liver, and gastric cancers53,54,55,56,57,58. Functional studies indicate that PKMYT1 promotes tumorigenesis and cancer progression; for example, it enhances growth in liver and ovarian cancer by inhibiting SIRT3 and GSK3β, the latter thereby activating the β-catenin/TCF pathway55,56. Together, these studies establish the oncogenic role of PKMYT1 and its association with tumorigenesis and progression.

In our study, regulation of PKMYT1 expression was attributed to miR-3671 and its sponges, which included FOLH1B, GAS5, LINC02910, and eleven DEcircs. While the regulatory roles of miR-3671 have been reported in gastric and breast cancers, its involvement in PTC has not been previously described59,60. We found miR-3671 to be downregulated, with concurrent upregulation of its sponges—FOLH1B, GAS5, LINC02910, and DEcircs—in PTC tissues. Notably, FOLH1B overexpression has, to date, only been associated with invasion and metastasis in prostate cancer61,62. According to the ceRNA network hypothesis, miRNA sponges are typically coexpressed with their target mRNAs, which matches our observations63,64.

There remain controversies regarding the roles of some of the lncRNAs identified. For instance, GAS5 has been proposed as a tumor suppressor in PTC and other cancers65,66,67,68, while LINC02910 has primarily been reported as downregulated only in NSCLC69. Gene polymorphisms can impact the expression and cell-specific functionality of GAS5 70; for example, rs55829688 has been shown to negatively regulate GAS5 expression in colorectal cancer71. Our pan-cancer RNA-seq analysis reinforces our findings, revealing overexpression of both GAS5 and LINC02910 in PTC. These findings are consistent with those of previous studies of GAS5 within the Iranian population72.

Among the eleven upregulated circRNAs identified as potent miR-3671 sponges, four were novel. Several have previously been implicated as oncogenes—circMET73, circ_0003571, and circ_00060974—influencing PTC metastasis to lymph nodes. We propose that the other circRNAs, not previously associated with PTC, may also serve as significant regulatory molecules due to their interaction with miR-3671.

Within the three-layer axis, RMI2 emerged as a central oncogene, exhibiting upregulation in PTC tumors as well as across a range of other cancer types. This finding is consistent with previous studies linking elevated RMI2 expression to tumor progression in cervical, lung, and prostate cancers75. RMI2 has been implicated in tumorigenesis through various mechanisms, including disruption of the TP53 pathway76 and activation of the PI3K/AKT pathway77.

Our analysis identified miR-616-5p—which was found to be downregulated in PTC tumors—as a regulator of RMI2. Prior research has shown that miR-616-5p plays a role in cancer initiation and progression, notably promoting ovarian and lung cancers by targeting the GSK3β pathway78. Additionally, decreased miR-616-5p expression has been associated with increased cell viability and reduced apoptosis in gastric cancer, mediated by downregulation of DUSP279.

This study is the first to demonstrate the simultaneous dysregulation of RMI2, miR-616-5p, LINC02910, miR-3671, and FOLH1B within a PTC-specific ceRNA network. Five out of twenty-three circRNAs identified were found to target both the PKMYT1 and RMI2 axes, highlighting potential crosstalk between these regulatory pathways.

A DE-TF network was established to clarify the regulation of hub genes and ceRNA network components. Of particular interest is E2F1, which functions not only as a component of the DE-TF network but also as one of the hub genes. Remarkably, E2F1 regulates both itself and other hub genes, with the exception of PKMYT1. As a self-regulating gene, E2F1 is further controlled by seven different transcription factors. Consistent with our findings, the overexpression of E2F1 has been documented in gastric cancer, lung cancer, and PTC80,81,82,83. For example, binding of E2F1 to the promoter of the lncRNA RGMB-AS1 has been shown to facilitate PTC metastasis and invasion84.

Beyond E2F1, E2F4 was identified as the transcription factor regulating the largest number of ceRNA network components. This observation is supported by prior studies reporting E2F4 upregulation in both liver and head and neck tumors, further implicating E2F4 as a potential oncogene that promotes tumorigenesis and progression85,86. E2F4 participates in the regulation of critical cellular processes, including cell proliferation, cell cycle progression, and inhibition of apoptosis85. Nevertheless, some studies have observed reduced E2F4 expression in breast cancer tissues, suggesting that it may exert tumor-suppressive effects in specific contexts87. Thus, the role of E2F4 in cancer biology appears to be highly dependent on tumor type and cellular environment.

All genes included in the ceRNA and DE-TF networks demonstrated diagnostic potential for PTC, with AUC values ranging from 0.64 to 0.92. The integration of eight genes (PKMYT1, LINC02910, GAS5, NFATC1, E2F3, STAT6, E2F1, and TK1) further improved diagnostic accuracy, yielding an AUC of 0.969 for PTC detection.

Identifying robust gene signatures for predicting patient outcomes is fundamental to contemporary oncology. Recent studies have successfully identified CIAPIN1 and DBNDD1 as markers of survival in invasive breast carcinoma88,89. Consistent with previous survival studies, our SERN prognostic model—comprising STAT6, E2F1, RMI2, and NR4A1—highlights the significance of these four genes as reliable prognostic markers for OS in patients with PTC. STAT6, E2F1, and RMI2 had negative regression coefficients, identifying them as protective factors; higher expression levels were associated with lower risk scores and improved survival outcomes. In contrast, NR4A1, with its positive coefficient, served as a risk factor, where greater expression correlated with diminished survival prospects.

In our in silico analysis, we focused on the five-layer regulatory axis of the ceRNA network and validated its components using RT-qPCR in clinical samples. This work illuminated the complex regulatory roles of PKMYT1 and GAS5 in PTC, revealing their stage-specific expression patterns and highlighting potential diagnostic applications. For the first time, experimental validation via RT-qPCR demonstrated a dual expression pattern for PKMYT1 and GAS5: both genes were downregulated in low-aggressive PTC tissues and significantly upregulated in high-aggressive PTC tissues. These findings suggest dynamic and complex regulatory roles for these genes during PTC progression. Additionally, circMET was found to be overexpressed in both low- and high-aggressive PTC tumors. A notable challenge in the RNA-seq data was the lack of pathological stage annotation, which hindered comprehensive analysis of circRNA expression across PTC stages. However, RT-qPCR analysis of fresh PTC tissues with known staging overcame this limitation, revealing that circMET expression was uniform across different PTC stages. Thus, in contrast to PKMYT1 and GAS5, circMET does not appear to serve as a stage-specific marker for PTC. Overall, these results confirm that all ceRNA network components—including circRNAs—exhibited differential expression between PTC and normal tissues, with PKMYT1 and GAS5 specifically associated with tumor aggressiveness.

The integration of RT-qPCR and RNA-seq data yielded unexpected results. While in silico analyses identified GAS5, circMET, and PKMYT1 as key modulators in PTC, initial RT-qPCR experiments did not detect significant deregulation of these genes when simply comparing PTC tissues to adjacent normal tissues. To resolve this discrepancy, we hypothesized that gene expression is correlated with PTC staging characteristics. Subsequent analyses confirmed distinct stage-specific expression patterns for these genes. Pairwise comparisons indicated that GAS5 and PKMYT1 exhibited differential expression between high-aggressive and low-aggressive tumors, as well as in comparisons of each group to normal tissues. In contrast, circMET did not show any significant difference between high-aggressive and low-aggressive tumors, although it was differentially expressed between tumor tissues (both high- and low-aggressive) and normal tissues. Rather than simply confirming RNA-seq predictions, our RT-qPCR results provided a significant enhancement, revealing a complex, stage-dependent expression profile not evident in the initial non-stratified analyses. Overall, these findings indicate that the dysregulation of these genes is closely linked to tumor aggressiveness in PTC.

Our results, corroborated by RNA-seq data analysis, highlight the significant potential of PKMYT1, GAS5, and circMET as valuable biomarkers for diagnosing and monitoring PTC. ROC curve analysis using RT-qPCR data demonstrated that both PKMYT1 and GAS5 possess strong diagnostic capabilities, particularly in distinguishing high-aggressive from low-aggressive PTC tissues. These findings emphasize the clinical relevance of these markers for stratifying tumor aggressiveness. In addition, circMET demonstrated robust diagnostic performance, with an AUC of 0.81 for distinguishing low-aggressive PTC from normal thyroid tissue. Collectively, these results suggest that PKMYT1 and its regulatory network components represent promising diagnostic biomarkers for PTC.

One of the most intriguing findings of our study is the calcium-dependent regulation of GAS5 expression in PTC tissues. Specifically, GAS5 expression was significantly reduced under low blood calcium conditions and upregulated when blood calcium levels were normal. This suggests a novel calcium-mediated regulatory mechanism involving GAS5 in PTC development. Such calcium dependency may partly explain the variability observed in GAS5 expression and offers important insights into its context-specific role in PTC progression. The discovery of calcium-dependent regulation of GAS5 introduces a new layer of complexity to our understanding of PTC biology and suggests a potential therapeutic target. Future research should focus on elucidating the precise mechanisms underlying stage-specific and calcium-mediated expression patterns of GAS5. Additionally, validation in larger, independent patient cohorts will be important to confirm the clinical utility of these findings as diagnostic biomarkers.

Conclusion

A novel ceRNA and DE-TF network comprising components with diagnostic potential for PTC was developed. The integration of these components resulted in an eight-gene model that increased the diagnostic performance for PTC to an AUC of 0.969. PKMYT1 and GAS5 exhibited differential expression patterns correlated with the stages of PTC. Together with circMET, these genes demonstrated diagnostic potential, providing valuable information about PTC progression and biomarker development. The calcium-dependent regulation of GAS5 further emphasizes the importance of exploring metabolic factors in PTC biology.