Abstract
Urinary pseudouridine levels have been proposed as diagnostic biomarkers for various malignancies; however, their association with colorectal cancer (CRC) remains unclear. This study investigates the molecular mechanisms underlying pseudouridine-related genes (PRGs) in CRC. The study incorporated a training cohort (TCGA-CRC), a validation cohort (GSE87211), a single-cell dataset (GSE200997), and PRGs retrieved from public databases. Quality control was performed on the single-cell dataset, followed by cell type annotation. Differentially expressed genes (DEGs) across distinct cell populations were identified. Weighted gene co-expression network analysis (WGCNA) was employed to screen module genes strongly correlated with PRG scores. DEGs between tumor and normal samples in the training cohort were also determined. Candidate genes were selected by intersecting DEGs from key cell types, tumor-normal comparisons, and WGCNA-derived module genes. A prognostic risk model was constructed using Cox regression analyses. Independent prognostic factors were identified through univariate and multivariate Cox analyses, integrating clinical parameters and risk scores, to establish a prognostic nomogram. Comparative analyses of mutation profiles, immune infiltration, and functional pathways were conducted between high- and low-risk groups, and molecular mechanisms of prognostic genes were explored. Additionally, pseudo-temporal trajectory analysis was applied to assess prognostic gene expression dynamics in key cell types. Seven cell types were annotated in the single-cell dataset, with T cells and epithelial cells representing predominant and functionally significant populations. A total of 116 candidate genes were identified by overlapping 4,762 DEGs from T cells, 4,525 DEGs from epithelial cells, 9,772 tumor-normal DEGs, and 2,990 module genes. 
A prognostic risk model incorporating three PRGs—BCL10, TAF1B, and WWTR1—was developed and validated across training and validation cohorts. Risk score, age, T stage, N stage, and tumor stage were recognized as independent prognostic factors for constructing the nomogram. Pseudo-temporal trajectory analysis revealed that TAF1B expression was relatively elevated at the terminal differentiation phase in epithelial cells. A pseudouridine-related prognostic model based on three PRGs was established and validated, offering a potential reference for CRC treatment and risk stratification.
Introduction
Colorectal cancer (CRC) is among the most prevalent malignancies worldwide and ranks as the fourth leading cause of cancer-related mortality. By 2030, its global incidence is projected to rise by 60%, with over 2.2 million new cases and approximately 1.1 million deaths annually1. CRC progression is primarily driven by the cumulative acquisition of genetic and epigenetic alterations in colonic epithelial cells2. Over the past decade, significant advances in cancer epigenetics have identified aberrant DNA methylation and dysregulated histone modifications as key contributors to CRC pathogenesis3,4,5. Brenner et al. reported that early-stage CRC diagnosis is associated with a five-year survival rate exceeding 90%6,7. However, the asymptomatic nature of early-stage CRC leads to delayed detection, with nearly half of patients diagnosed after hepatic metastases have developed, reducing the five-year survival rate to 14% in cases with distant metastases8. Current therapeutic strategies include surgical resection, radiotherapy, chemotherapy, and targeted therapies, though surgery remains the only potentially curative approach. In China, a considerable proportion of CRC cases are diagnosed at advanced stages, precluding optimal surgical intervention and significantly compromising survival outcomes9. The urgency of early detection and timely treatment in improving prognosis highlights the need for robust biomarkers. Developing novel, reliable biomarkers for early CRC diagnosis is crucial to enhancing treatment efficacy and alleviating disease burden.
RNA molecules undergo extensive post-transcriptional modifications, with over 100 distinct modifications identified to date10. Epitranscriptomic regulation, encompassing chemical modifications of both coding and non-coding RNAs11, plays a pivotal role in RNA metabolism, cellular homeostasis, and post-transcriptional gene regulation12. Alterations in these modifications are increasingly recognized as critical drivers of tumorigenesis. For instance, Lin et al. demonstrated that dysregulated RNA modifications selectively enhance the translation of oncogenic transcripts, contributing to hepatocellular carcinoma (HCC) progression13. Furthermore, RNA modifications and their regulatory factors significantly influence the tumor microenvironment (TME)14. Li et al. reported that m6A methylation facilitates non-small cell lung cancer (NSCLC) progression by modulating heterogeneous nuclear ribonucleoprotein A2/B1 (HNRNPA2B1)15. These findings underscore the strong association between aberrant RNA modifications and poor cancer prognosis, highlighting their potential as therapeutic targets. Pseudouridine (Ψ), the most abundant and first-identified RNA modification, arises from the enzymatic isomerization of uridine, in which the base becomes attached to the ribose through a carbon-carbon rather than a carbon-nitrogen bond16,17. Unlike many reversible RNA modifications, Ψ is stable in mammals and is excreted in urine as a metabolic byproduct18. Elevated urinary Ψ levels have been linked to multiple malignancies, including prostate, liver, gastric, and colorectal cancers, positioning Ψ as a promising biomarker19. Targeting dysregulated post-transcriptional modifications has also emerged as a potential therapeutic strategy in oncology20,21. Dyskerin pseudouridine synthase 1 (DKC1), a key enzyme in Ψ biosynthesis, plays a pivotal role in tumor cell proliferation, invasion, and metastasis across various cancers, including CRC22. 
Through its regulation of internal ribosome entry site (IRES)-mediated translation and precursor RNA processing, DKC1 modulates the expression of cancer-related genes, thereby promoting tumor progression and metastasis22,23. Its dysregulation is strongly correlated with poor prognosis24,25. Investigating the role of Ψ in CRC may provide deeper insights into disease pathogenesis, facilitating more precise molecular classification and personalized therapeutic interventions.
Single-cell RNA sequencing (scRNA-seq), an advanced high-throughput technology, enables genome-wide transcriptomic and epigenomic profiling at single-cell resolution26. This approach is essential for identifying clinically relevant tumor subpopulations and has become an indispensable tool for dissecting tumorigenesis and intratumoral heterogeneity27. Since its introduction in 200928, scRNA-seq has significantly advanced cancer research, addressing key challenges in CRC, glioblastoma, HCC, metastatic renal cell carcinoma, and breast and lung adenocarcinomas29,30,31,32,33.
This study integrates bioinformatics approaches to identify and characterize key pseudouridine-related genes (PRGs) in CRC. By leveraging single-cell data, it examines the functional relevance of PRGs, assesses their prognostic significance, and explores their roles in tumor biology and immune infiltration. These findings aim to provide a theoretical framework for understanding the contribution of PRGs to CRC pathogenesis, offering new perspectives for targeted therapeutic strategies and prognostic refinement in CRC management.
Materials and methods
Data source
The training dataset utilized in this study was The Cancer Genome Atlas (TCGA)-CRC cohort, obtained from the TCGA database (http://cancergenome.nih.gov/), comprising RNA sequencing data, clinical characteristics, and survival information from 606 CRC tissue samples and 48 adjacent normal tissue samples. The validation dataset GSE87211 (GPL13497), consisting of RNA expression profiles from 190 CRC tumor tissues with survival data, was retrieved from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/). Detailed clinical characteristics of both cohorts are provided in Tables S1 and S2. Additionally, the scRNA-seq dataset GSE200997 (GPL21697), containing 16 CRC tumor samples and 7 adjacent normal colon tissue samples, was acquired from the GEO database. A total of 18 PRGs were extracted from the Molecular Signatures Database (MSigDB, https://www.gsea-msigdb.org/gsea/msigdb/index.jsp) by querying the term “pseudouridine.”
Single-cell data analysis
For the single-cell dataset, raw data were processed using the CreateSeuratObject function from the Seurat package (v4.1.0)34. Quality control filtering excluded cells with nCount_RNA > 10,000, nFeature_RNA < 200, nFeature_RNA > 4,000, or mitochondrial gene content exceeding 20%. Normalization was performed using NormalizeData, followed by identification of highly variable genes using the FindVariableFeatures function with selection.method = vst and nfeatures = 2000. The top 2,000 highly variable genes were scaled with ScaleData before principal component analysis (PCA). The statistical significance of each principal component (PC) was assessed using JackStraw and ScoreJackStraw, and the variance explained by each PC was ranked. PCA inflection points and scree plots were generated to determine the optimal number of PCs for downstream analysis. Cells were clustered at a resolution of 0.5 and visualized using the UMAP algorithm, and cell populations were annotated based on established marker genes from the literature35. The predominant cell type within the dataset was designated as the core cell population. Gene expression patterns across clusters were visualized, and cluster-specific annotations were refined accordingly. To elucidate the biological functions associated with each cell type, all single-cell samples underwent gene set variation analysis (GSVA) using the ReactomeGSA package, enabling pathway enrichment analysis. Differential expression analysis between tumor and normal samples was conducted within each cell type using the FindMarkers function. Genes with a fold change exceeding 1.2 were retained, and the differential expression threshold was set at average |log2 fold change (FC)| > 0.5 and p < 0.05.
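The cell-level exclusion criteria above can be illustrated with a minimal Python sketch. It is not the Seurat workflow used in the study; the `CellQC` record, the `passes_qc` helper, and the toy cells are hypothetical and only mirror the stated thresholds.

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    """Per-cell QC metrics (hypothetical container, not a Seurat object)."""
    barcode: str
    n_count: int      # total counts per cell (nCount_RNA)
    n_feature: int    # detected genes per cell (nFeature_RNA)
    pct_mito: float   # percentage of counts from mitochondrial genes


def passes_qc(cell, max_count=10_000, min_feature=200,
              max_feature=4_000, max_pct_mito=20.0):
    """Return True if the cell is NOT excluded by any stated criterion."""
    if cell.n_count > max_count:
        return False
    if cell.n_feature < min_feature or cell.n_feature > max_feature:
        return False
    if cell.pct_mito > max_pct_mito:
        return False
    return True


# Toy cells with made-up values, for illustration only.
cells = [
    CellQC("cell_a", 5200, 1800, 4.1),
    CellQC("cell_b", 12500, 3900, 6.0),   # excluded: too many counts
    CellQC("cell_c", 900, 150, 3.2),      # excluded: too few detected genes
    CellQC("cell_d", 7000, 2500, 35.0),   # excluded: high mitochondrial content
]
kept = [c.barcode for c in cells if passes_qc(c)]
print(kept)
```

Values that sit exactly on a threshold are retained here, because the criteria are stated as strict inequalities.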
Acquisition of module genes
Weighted gene co-expression network analysis (WGCNA) was performed using the WGCNA package (v1.72-1)36 to identify gene modules most strongly associated with PRG scores. Prior to clustering, genes were pre-filtered by median absolute deviation (MAD), retaining the top 75%. Tumor samples from the training cohort were subjected to hierarchical clustering, and outliers were excluded to ensure analytical robustness. A soft-thresholding power (β) was chosen so that the co-expression network approximated a scale-free topology. Hierarchical clustering was then applied, and gene modules were identified and merged using the dynamic tree-cutting algorithm, with a minimum module size of 30 genes and a merging threshold of MEDissThres = 0.25. PRG scores for all samples were computed using the GSVA package37 and compared between tumor and normal tissues in the training cohort via the Wilcoxon test. The correlation between PRG scores and each module was analyzed, and modules with a significant correlation to GSVA scores (|r| > 0.3) were selected as key modules. The genes within these modules were designated as module genes.
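As a rough illustration of the module selection step, the sketch below correlates module eigengene values with a per-sample score and keeps modules with |r| > 0.3. It assumes a Pearson correlation and omits the significance test; the module names, eigengene values, and scores are hypothetical, and this is not the WGCNA package implementation.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


def select_key_modules(eigengenes, trait, r_cutoff=0.3):
    """Keep modules whose eigengene correlates with the trait at |r| > cutoff."""
    selected = {}
    for name, values in eigengenes.items():
        r = pearson(values, trait)
        if abs(r) > r_cutoff:
            selected[name] = r
    return selected


# Made-up per-sample PRG scores and module eigengenes (illustrative only).
prg_scores = [0.1, 0.4, 0.5, 0.9, 1.2]
eigengenes = {
    "module_a": [1, 2, 3, 4, 5],     # rises with the score
    "module_b": [5, 4, 3, 2, 1],     # falls with the score
    "module_c": [1, -1, 0, -1, 1],   # essentially uncorrelated
}
key_modules = select_key_modules(eigengenes, prg_scores)
print(sorted(key_modules))
```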
Identification of candidate genes
Differentially expressed genes (DEGs) between tumor and normal samples in the training cohort were identified using the DESeq2 package (v1.40.2), with selection criteria |log2FC| > 0.5 and p adj. < 0.05. The expression profiles of DEGs were visualized using volcano plots (ggplot2, v3.3.5) and heatmaps (pheatmap, v1.0.12). To identify candidate genes, the intersection of core cell DEGs, tumor-normal DEGs, and module genes was taken. A protein-protein interaction (PPI) network for the candidate genes was then constructed using the STRING database (http://www.string-db.org/) with a confidence threshold of 0.4.
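The candidate gene selection amounts to a set intersection, sketched below in Python. The sketch assumes that the T cell and epithelial cell DEG lists enter the intersection as separate sets; the toy gene lists (including the placeholder names GENE_X, GENE_Y, and GENE_Z) are hypothetical and far smaller than the real inputs.

```python
def intersect_gene_sets(*gene_lists):
    """Return the sorted genes shared by every supplied gene list."""
    sets = [set(genes) for genes in gene_lists]
    if not sets:
        return []
    return sorted(set.intersection(*sets))


# Toy inputs for illustration only; the real lists contain thousands of genes.
t_cell_degs = ["BCL10", "TAF1B", "WWTR1", "GENE_X"]
epithelial_degs = ["BCL10", "TAF1B", "WWTR1", "GENE_Y"]
tumor_normal_degs = ["BCL10", "TAF1B", "WWTR1", "GENE_X", "GENE_Y"]
module_genes = ["BCL10", "TAF1B", "WWTR1", "GENE_Z"]

candidates = intersect_gene_sets(
    t_cell_degs, epithelial_degs, tumor_normal_degs, module_genes
)
print(candidates)
```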
Construction and validation of prognostic risk model
Candidate prognostic genes significantly associated with CRC prognosis (p < 0.05) were initially screened through univariate Cox regression analysis. The least absolute shrinkage and selection operator (LASSO) method was applied via the glmnet package (v4.1-4)38 (family = “cox”) to refine feature gene selection. The genes selected through LASSO were subsequently used as inputs for multivariate Cox regression modeling, which was further optimized using stepwise selection to establish a final prognostic risk model comprising key prognostic genes and a risk score formula:
\[ \text{Risk score} = \sum_{i=1}^{n} \text{coef}(\text{gene}_i) \times \text{exp}(\text{gene}_i) \]
The proportional hazards (PH) assumption was tested to validate the risk model (p > 0.05), ensuring that the model met the Cox regression assumptions. The model’s predictive performance was evaluated and validated in both the training and validation cohorts, with samples stratified into high-risk and low-risk groups based on an optimal threshold. Receiver operating characteristic (ROC) curves were generated separately for the two datasets using the timeROC package, and Kaplan–Meier (K–M) survival curves were plotted to compare survival differences between risk groups. The area under the ROC curve (AUC), which accounts for the true positive rate (TPR) and false positive rate (FPR), was used to assess model performance across classification thresholds. An AUC ≤ 0.5 indicated no discriminative ability, equivalent to random guessing. An AUC > 0.5 and ≤ 1 suggested varying levels of classification accuracy, with performance improving as AUC approached 1. An AUC > 0.6 indicated reliable predictive accuracy with clinical reference value. PCA was conducted in both cohorts to evaluate the discriminative power of prognostic genes in CRC classification.
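To make the AUC interpretation concrete, the following sketch computes a plain binary AUC as the probability that a patient with an event has a higher risk score than a patient without one. This is a simplification: it ignores censoring and follow-up time, so it is not the time-dependent estimator provided by timeROC. The scores and event labels are made up.

```python
def auc_from_scores(scores, events):
    """Binary AUC: fraction of (event, non-event) pairs ranked correctly.

    Ties in score count as half a correctly ranked pair.
    """
    positives = [s for s, e in zip(scores, events) if e == 1]
    negatives = [s for s, e in zip(scores, events) if e == 0]
    if not positives or not negatives:
        raise ValueError("both event and non-event samples are required")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up risk scores and event indicators (1 = death observed by the horizon).
risk_scores = [0.9, 0.8, 0.4, 0.3, 0.2]
events = [1, 0, 1, 0, 0]
auc = auc_from_scores(risk_scores, events)
print(round(auc, 3))
```

In this toy example five of the six event/non-event pairs are ordered correctly, so the AUC is 5/6.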
Construction of independent prognostic model
Univariate Cox regression analysis was performed to identify independent prognostic factors, incorporating clinical variables such as age, TNM stage, tumor stage, radiotherapy status, and risk scores. Clinical factors with p < 0.01 were selected for multivariate Cox regression analysis. To validate the multivariate Cox model, scaled Schoenfeld residuals were employed to confirm compliance with the PH assumption. A nomogram was then constructed using the rms package based on independent prognostic factors, providing 1-, 4-, and 7-year survival predictions. Model accuracy was assessed through calibration curves, and Wilcoxon tests were used to analyze risk score differences across clinical subgroups.
Function and gene mutational analysis for two risk cohorts
We conducted KEGG and GO pathway analyses using the GSEA method based on a preranked gene list39,40,41. Specifically, differential expression analysis between high-risk and low-risk groups in the TCGA-CRC cohort was performed using the DESeq2 package42. Genes were ranked by differential expression magnitude and subjected to gene set enrichment analysis (GSEA) against the GO and KEGG databases. To keep the analysis objective, only the most significant GO terms and KEGG pathways were presented, selected on the basis of statistical significance (p < 0.05). Additionally, gene mutation profiles in the TCGA-CRC dataset were analyzed using the maftools package. The most frequently mutated genes in each risk group were visualized in a heatmap.
Immune infiltration analysis
The ESTIMATE algorithm from the Immunedeconv package was applied to score tumor samples in the training cohort, assessing stromal, immune, and ESTIMATE scores. Differences between high- and low-risk groups were evaluated using the Wilcoxon test (p < 0.05), followed by Spearman correlation analysis between each score and the risk score. To further investigate the involvement of prognostic genes in immune infiltration, CIBERSORT analysis was conducted via the GSVA package to estimate the abundance of 22 immune cell types in tumor samples. The distribution of immune cell subsets across risk groups was initially visualized, and differential immune cell abundances between the two cohorts were analyzed using the Wilcoxon test (p < 0.05). Additionally, the Spearman algorithm was used to correlate risk scores with immune cell infiltration levels in the TCGA-CRC dataset. In addition, we validated the immune infiltration results through single-cell analysis. Specifically, based on cell annotation results, we calculated the risk score using the AddModuleScore function from the R package ‘Seurat’. Using the median value of the risk score as a cutoff, we divided the cells into high-risk and low-risk groups, and then generated UMAP and violin plots. Based on the gene expression profiles in the single-cell data, we analyzed the expression differences of T cell activation-related genes between the high-risk and low-risk groups. T cell activation-related genes were identified from the literature and intersected with the single-cell data. Differential expression analysis was then performed on the intersected genes.
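The Spearman correlations between risk scores and infiltration scores can be sketched as a Pearson correlation computed on average ranks. The example below is a minimal illustration with made-up values; it does not compute a p-value and is not the R routine used in the study.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation (Pearson correlation of the ranks)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


# Made-up risk scores and stromal scores for four samples.
risk = [0.1, 0.5, 0.3, 0.9]
stromal = [10.0, 40.0, 20.0, 100.0]
rho = spearman(risk, stromal)
print(round(rho, 3))
```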
Immunotherapy response and drug sensitivity for two risk cohorts
To predict immune checkpoint blockade (ICB) therapy response, the immunophenoscore (IPS) was retrieved from The Cancer Immunome Atlas (TCIA, https://tcia.at/home), and differences in IPS scores between risk groups were analyzed using the Wilcoxon test (p < 0.05). The impact of immunotherapy response was further evaluated by comparing TIDE scores, Dysfunction scores, and Exclusion scores between risk groups in the training set via Wilcoxon tests. For chemotherapy drug sensitivity prediction, drug response data were obtained from the Genomics of Drug Sensitivity in Cancer (GDSC) database, and the half-maximal inhibitory concentration (IC50) of candidate drugs for patients in the TCGA-CRC dataset was estimated using the oncoPredict package. Inter-group IC50 differences were assessed using the Wilcoxon test, and drugs meeting the criteria p < 0.001 and mean IC50 < 0.1 were selected for further demonstration. These therapeutic responses were predicted through computational analysis.
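The drug selection rule (p < 0.001 and mean IC50 < 0.1) can be expressed as a simple filter. The sketch below assumes that "mean IC50" refers to the mean across all samples from both risk groups, which the text does not specify; the drug names, p-values, and IC50 values are hypothetical.

```python
from statistics import mean


def select_drugs(results, p_cutoff=0.001, ic50_cutoff=0.1):
    """Return drugs passing both the p-value and the mean IC50 cutoffs."""
    selected = []
    for drug, rec in results.items():
        # Assumption: the mean is taken over both risk groups combined.
        overall_mean = mean(rec["ic50_high"] + rec["ic50_low"])
        if rec["p_value"] < p_cutoff and overall_mean < ic50_cutoff:
            selected.append(drug)
    return sorted(selected)


# Hypothetical predicted IC50 values and Wilcoxon p-values per drug.
results = {
    "drug_A": {"p_value": 0.0002, "ic50_high": [0.02, 0.03], "ic50_low": [0.05, 0.06]},
    "drug_B": {"p_value": 0.0400, "ic50_high": [0.01, 0.02], "ic50_low": [0.03, 0.04]},
    "drug_C": {"p_value": 0.0001, "ic50_high": [1.50, 2.00], "ic50_low": [3.00, 2.50]},
}
selected_drugs = select_drugs(results)
print(selected_drugs)
```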
Functions and upstream regulation network for prognostic genes
To identify prognostic gene-associated regulatory networks, the GeneMANIA database was used to construct a gene interaction network centered on prognostic genes. Prognostic gene-associated miRNAs were retrieved from the miRDB database using the multiMiR package, while associated lncRNAs were obtained from the same database based on miRNA interactions. An mRNA-miRNA-lncRNA regulatory network was then constructed and visualized using Cytoscape.
Pseudo-temporal analysis
Pseudo-temporal analysis was conducted using the Monocle package (v2.24.1) to construct single-cell trajectory maps. Based on highly variable genes, cellular pseudo-temporal trajectories were separately established for T cells and epithelial cells using the DDRTree method for dimensionality reduction and cell ordering. The dynamic expression patterns of prognostic genes were visualized along pseudo-time in both cell types.
RNA extraction and quantitative real-time polymerase chain reaction (qRT-PCR)
Tissue samples were collected from 40 patients with CRC at Ningxia Medical University General Hospital during surgical procedures. Total RNA was extracted using TRIzol (Takara, China) and reverse transcribed into cDNA with PrimeScript RTase (Takara, China). Quantitative real-time PCR (qRT-PCR) was performed on a Bio-Rad CFX96 System (Bio-Rad, USA), with primer sequences detailed in Table S3.
Immunohistochemistry (IHC) validation of model genes
To assess protein-level expression differences, paraffin-embedded CRC tissue samples from the Pathology Department of Ningxia Medical University General Hospital were analyzed, including matched tumor and adjacent non-tumorous tissues (Table S4). The expression levels of BCL10, TAF1B, and WWTR1 in CRC and adjacent normal tissues were evaluated using immunohistochemistry (IHC). The primary antibodies used were BCL10 (Abcam, EP606Y), TAF1B (Abcam, ab238759), and WWTR1 (Abcam, CL0371). For each tissue section, at least five high-resolution images of randomly selected fields were captured using consistent microscope settings. These images were processed in ImageJ (v1.53e), with Color Deconvolution applied to separate DAB (brown) and hematoxylin (blue) signals. The DAB channel was thresholded to quantify positively stained areas, and the positive area and integrated optical density (IOD) were measured. The average values from multiple fields were used to derive the final quantitative expression levels. Statistical analyses were conducted in GraphPad Prism (v9.0.0) using paired t-tests or other appropriate statistical methods to compare protein expression levels between tumor and normal tissues.
siRNA transfection
Normal colonic epithelial cells (NCM460) and CRC cell lines (HT29, HCT116, SW480, and SW620) were obtained from the Cell Bank of the Chinese Academy of Sciences (Shanghai, China). Small interfering RNAs (siRNAs) targeting BCL10 and the corresponding negative control were purchased from Sangon Biotech (Shanghai, China). Cells were seeded into 6-well plates and transfected at 60–70% confluence using Lipofectamine 3000 (Thermo Fisher Scientific) according to the manufacturer’s instructions. The final siRNA concentration was 50 nM unless otherwise specified. After 6 h of transfection, the medium was replaced with fresh complete culture medium. Cells were harvested 48 h post-transfection for downstream analyses. Knockdown efficiency of BCL10 was assessed by qRT-PCR and WB.
Western blotting (WB)
Total protein was extracted from cultured cells using RIPA lysis buffer (Beyotime, Shanghai) supplemented with 1% protease and phosphatase inhibitor cocktail. Cell lysates were incubated on ice for 30 min and centrifuged at 12,000 × g for 15 min at 4 °C. The resulting supernatants were collected, and protein concentrations were determined using a BCA Protein Assay Kit (Beyotime). Equal amounts of protein (20–30 µg per lane) were separated by SDS-PAGE and transferred onto PVDF membranes (Millipore). Membranes were blocked with 5% non-fat milk in TBST (0.1% Tween-20) for 1 h at room temperature and then incubated overnight at 4 °C with primary antibodies. After three washes in TBST, membranes were incubated with HRP-conjugated secondary antibodies for 1 h at room temperature. Protein bands were visualized using an ECL chemiluminescence detection system (Thermo Fisher Scientific) and imaged on a Tanon 5200 platform under exposure conditions within the linear detection range. β-Actin was used as the internal loading control. Detailed information on primary and secondary antibodies is provided in Table S5.
Cell Counting Kit-8 (CCK-8) assay
Cells were collected, resuspended in complete culture medium, adjusted to a density of 5 × 10⁴ cells/mL, and seeded into 96-well plates at 100 µL per well, with five technical replicates per group. Plates were incubated overnight at 37 °C in a humidified 5% CO₂ atmosphere to allow cell attachment. At 0, 24, 48, 72 and 96 h, 10% CCK-8 working solution (Dojindo, Japan) was added to each well and incubated for 1 h at 37 °C in the dark. Absorbance at 450 nm was recorded using a microplate reader, and the mean values were used to construct cell viability and proliferation curves.
Wound-healing assay
Cells were seeded into 6-well plates at a density of 5 × 10⁵ cells/mL and cultured until a confluent monolayer formed. After washing with PBS, a linear scratch was generated using a 200 µL pipette tip. Detached cells and debris were removed by gentle PBS rinsing, and images were captured at 0 h. The cultures were then incubated for an additional 24 h before imaging again. Wound width reduction between 0 and 24 h was quantified in ImageJ (v1.53e) based on the scale bar, and the percentage of wound closure was calculated to evaluate cell migratory capacity.
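The percentage of wound closure is commonly computed as the relative reduction in wound width between the two time points. The sketch below assumes that definition, which the text does not spell out; the widths are made-up values in micrometers.

```python
def wound_closure_percent(width_0h, width_24h):
    """Percentage of the initial wound width that has closed after 24 h."""
    if width_0h <= 0:
        raise ValueError("initial wound width must be positive")
    return (width_0h - width_24h) / width_0h * 100.0


# Hypothetical measurements: 500 um at 0 h, 300 um at 24 h.
closure = wound_closure_percent(500.0, 300.0)
print(closure)
```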
Transwell assay
Cells were harvested, enzymatically digested, and resuspended in serum-free medium at a concentration of 5 × 10⁴ cells/100 µL. Migration assays were performed using uncoated Transwell chambers, whereas invasion assays utilized upper chambers precoated with Matrigel (Corning) and incubated at 37 °C for 2 h to allow gel solidification. The lower chambers were filled with 600 µL of complete medium containing 20% FBS as a chemoattractant. After 48 h of incubation, cells remaining on the upper membrane surface were gently removed. Cells that migrated or invaded to the lower surface were fixed with 4% paraformaldehyde for 20 min and stained with 0.1% crystal violet for 15 min. Images were captured from five randomly selected fields, and stained cells were quantified to evaluate migratory and invasive capacities.
Statistical analysis
All statistical analyses were performed using R software (v4.2.2).
Results
Identification of seven cell types in single-cell dataset
First, we performed single-cell analysis on the GSE200997 dataset after quality control (Fig. S1). Following PCA-based dimensionality reduction, 22 cell clusters were identified (Fig. S2,3). Marker gene expression patterns enabled the annotation of seven major cell types: T cells, B cells, epithelial cells, myeloid cells, fibroblasts, endothelial cells, and a subset of unclassified cells (Fig. 1A). The proportional distribution of these cell types across single-cell samples (Fig. 1B) indicated a predominance of T cells, followed by epithelial cells, highlighting their central roles. Epithelial cells and T cells are crucial components of the TME, influencing tumor progression, immune evasion, and treatment outcomes. GSVA was subsequently performed on all single-cell samples, revealing the 30 most significantly enriched pathways (Fig. 1C), among which T cells and epithelial cells were notably associated with TASK channels, hydroxycarboxylic acid-binding receptors, and ATP-sensitive potassium channels. Differential analysis between tumor and normal groups within each cell type showed that T cells and epithelial cells had 2,748 and 5,880 differentially expressed genes, respectively, of which 2,203 and 3,568 were upregulated and 545 and 2,312 were downregulated (Fig. 1D).
Recognition of BCL10, TAF1B, and WWTR1 as prognostic genes
To screen genes related to pseudouridine, we performed WGCNA based on large-scale RNA-Seq data. No outlier samples were identified through cluster analysis. The soft-thresholding power (β) was set to 24, the point at which the scale-free topology model fit index (R²) surpassed 0.9 (red line), ensuring that the network conformed to a scale-free topology (Fig. S4A). After hierarchical clustering and module merging, nine gene modules were identified (Fig. S4B). PRG scores were significantly elevated in tumor samples compared to normal tissues in the training set (Fig. 2A). Correlation analysis revealed that five modules (paleturquoise, darkolivegreen, midnightblue, skyblue, and grey60) exhibited significant associations with PRG scores, collectively encompassing 2,990 module genes (Fig. 2B). Differential expression analysis of the large-scale RNA-Seq data between tumor and normal groups within the training set identified 9,772 DEGs, including 5,204 upregulated and 4,568 downregulated genes (Fig. 2C, S5). By intersecting DEGs from core cell populations (4,762 from T cells and 4,525 from epithelial cells), 9,772 DEGs from tumor and normal samples, and 2,990 module genes, 116 candidate genes were identified (Fig. 2D).
Recognition of BCL10, TAF1B, and WWTR1 as prognostic genes. (A) Box plots comparing GSVA scores between normal and tumor samples. (B) Heatmap depicting correlations between gene modules and PRGs. (C) Volcano plot of 9772 differentially expressed genes. (D) Venn diagram illustrating the intersection of genes. (E) Forest plot from univariate Cox analysis for prognostic gene selection. (F) Selection of prognostic genes using Lasso Cox analysis. (G) Forest plot of the prognostic model.
Univariate Cox regression analysis of these 116 candidate genes identified 14 genes significantly associated with CRC prognosis (p < 0.05) (Fig. 2E). In the LASSO model, with the penalty parameter λ set to 0.0104603 on the basis of the partial-likelihood deviance, seven genes were selected for multivariate Cox regression analysis (Fig. 2F). Ultimately, three prognostic genes (BCL10, TAF1B, and WWTR1) were incorporated into the final multivariate Cox risk model (Fig. 2G).
In this formula, “coef” represents the regression coefficient of each gene, while “exp” denotes its expression level in the sample. Specifically, the coef values for BCL10, TAF1B, and WWTR1 were −0.25425, −0.55816, and 0.33448, respectively. In the PH assumption test, none of these genes reached statistical significance (p > 0.05), indicating that the risk model satisfied the PH assumption (Fig. S6).
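With the reported coefficients, the risk score of a sample is the coefficient-weighted sum of the three gene expression values. The sketch below illustrates this calculation; the expression values and the cutoff are hypothetical, since the optimal threshold used for stratification is not stated in the text.

```python
# Regression coefficients reported for the three prognostic genes.
COEFS = {"BCL10": -0.25425, "TAF1B": -0.55816, "WWTR1": 0.33448}


def risk_score(expression):
    """Sum of coef * expression over the three model genes."""
    return sum(COEFS[gene] * expression[gene] for gene in COEFS)


def stratify(score, cutoff):
    """Assign a sample to the high- or low-risk group at a given cutoff."""
    return "high" if score > cutoff else "low"


# Made-up expression values for one sample; the cutoff is a placeholder.
sample = {"BCL10": 2.0, "TAF1B": 1.0, "WWTR1": 3.0}
score = risk_score(sample)
group = stratify(score, cutoff=0.0)
print(round(score, 5), group)
```

Because BCL10 and TAF1B carry negative coefficients, higher expression of either gene lowers the score, whereas higher WWTR1 expression raises it.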
The high-risk group exhibits a worse prognosis
In both the training and validation sets, patients were stratified into high- and low-risk cohorts based on the optimal threshold (Fig. 3A-B). The risk score effectively distinguished CRC prognosis across both datasets (Fig. 3C-D). In the training set, the AUC values for 1-, 4-, and 7-year survival were 0.626, 0.654, and 0.635, respectively, suggesting moderate predictive accuracy with clinical relevance for CRC prognosis (Fig. 3E). Similarly, in the validation set, AUC values for 1-, 4-, and 7-year survival all exceeded 0.6, further supporting the model’s predictive capability (Fig. 3F). KM survival curves demonstrated a significant difference between high- and low-risk cohorts in both datasets (p < 0.05), with low-risk patients exhibiting markedly better survival outcomes (Fig. 3G-H). PCA confirmed distinct clustering patterns between risk groups in both datasets, underscoring the ability of the selected prognostic genes to effectively stratify patients based on risk (Fig. 3I-J).
Evaluation and Validation of the Prognostic Model. (A-B) Risk score distribution curves in training and validation cohorts. (C-D) Scatter plots showing the distribution of risk scores in training and validation cohorts. (E-F) ROC curve analysis for 1-, 4-, and 7-year survival predictions in training and validation cohorts. (G-H) Kaplan-Meier survival analysis comparing high- and low-risk groups in training and validation cohorts. (I-J) PCA visualization of risk group separation in training and validation cohorts.
Construction of a well-performing nomogram
Following univariate and multivariate Cox regression analyses and PH assumption verification, risk score, age, T stage, N stage, and tumor stage were identified as independent prognostic factors for constructing the final multivariate model. This model demonstrated strong predictive performance, with p < 0.0001 and a C-index of 0.77 (Fig. 4A-C). To enhance clinical applicability, a prognostic nomogram incorporating these independent prognostic factors was developed (Fig. 4D). The calibration curve showed that predicted survival probabilities at 1, 4, and 7 years closely aligned with actual survival outcomes, validating the model’s reliability (Fig. 4E). Furthermore, there was a significant difference in risk scores between subgroups of tumor stage and N stage (Fig. 4F-G).
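For readers unfamiliar with the C-index, the sketch below shows a simplified version of Harrell's concordance index: among comparable patient pairs, it is the fraction in which the patient with the shorter observed survival has the higher predicted risk. Pairs with tied survival times are ignored for brevity, and the survival data are made up; this is not the rms implementation.

```python
def concordance_index(times, events, risk):
    """Simplified Harrell's C-index.

    A pair (i, j) is comparable when patient i had an observed event and a
    strictly shorter survival time than patient j.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Made-up survival times (years), event indicators, and predicted risks.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.7, 0.8, 0.1]
c_index = concordance_index(times, events, risk)
print(c_index)
```

In this toy example four of the five comparable pairs are concordant, giving a C-index of 0.8.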
Construction and Validation of an Independent Prognostic Model. (A) Forest plot of univariate Cox analysis for clinical factors. (B) Forest plot of independent prognostic factors in the multivariate Cox model. (C) PH assumption test for the independent prognostic model. (D) Nomogram for individualized survival prediction based on independent prognostic factors. (E) Calibration curve assessing the accuracy of the nomogram. (F-G) Differences in risk scores across pathological feature subgroups. (*p < 0.05).
Differences between the two cohorts in functional and somatic mutations
In GO analysis, genes upregulated in the high-risk cohort were predominantly enriched in pathways related to collagen trimer formation, whereas those highly expressed in the low-risk cohort were significantly associated with DNA replication initiation (Fig. 5A). In KEGG pathway analysis, genes overexpressed in the high-risk cohort showed significant enrichment in the calcium signaling pathway, while those in the low-risk cohort were primarily enriched in the homologous recombination pathway (Fig. 5B). Mutation analysis in the training set revealed that missense mutations constituted the most frequent alteration type, with SNPs being the predominant variant form. Among SNP mutations, C > T transitions occurred at the highest frequency. On average, each sample carried 88 variants. The ten most frequently mutated genes were TTN, APC, MUC16, SYNE1, TP53, KRAS, FAT4, RYR2, PIK3CA, and CSMD3 (Fig. 5C). In both high-risk (n = 269) and low-risk (n = 307) cohorts, APC exhibited a consistently high mutation frequency, ranking among the six most frequently mutated genes (Fig. 5D). Notably, TP53 mutations were more prevalent in the high-risk cohort, whereas KRAS mutations were more frequent in the low-risk cohort.
Functional Enrichment and Mutation Analysis in High- and Low-Risk Groups. (A) GO enrichment analysis of DEGs between high- and low-risk groups. (B) KEGG enrichment analysis of DEGs between high- and low-risk groups. (C) Overview of gene mutation profiles from TCGA and GEO datasets. (D) Comparative mutation analysis of the top six most frequently mutated genes in high- and low-risk groups.
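Single-nucleotide substitution frequencies such as "C > T" are conventionally reported after collapsing the twelve possible substitutions into six classes, expressed relative to the pyrimidine (C or T) of the reference base pair. The sketch below illustrates this convention in Python; the function names and the toy variant list are assumptions made for illustration and do not reproduce the study's mutation calls.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def snv_class(ref, alt):
    """Collapse a substitution into one of six pyrimidine-referenced classes."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-nucleotide substitution: {ref}>{alt}")
    if ref in ("A", "G"):
        # Purine reference: report the change on the complementary strand.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def snv_spectrum(variants):
    """Count substitution classes over (ref, alt) pairs."""
    return Counter(snv_class(ref, alt) for ref, alt in variants)


# Toy (ref, alt) pairs, hypothetical.
toy_variants = [("C", "T"), ("G", "A"), ("C", "T"), ("A", "G"),
                ("T", "C"), ("C", "A"), ("G", "T")]

spectrum = snv_spectrum(toy_variants)
print(spectrum.most_common(1))  # [('C>T', 3)]
```

Under this convention a G > A change is counted as C > T, so the C > T class aggregates both strand orientations.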
Immune-related analysis and drug sensitivity
Stromal, immune, and ESTIMATE scores were significantly elevated in the high-risk cohort (p < 0.05), indicating a greater degree of stromal and immune cell infiltration compared to the low-risk cohort (Fig. 6A). Spearman correlation analysis demonstrated positive associations between risk scores and stromal (r = 0.57), immune (r = 0.37), and ESTIMATE scores (r = 0.51) (Fig. 6B-D). The composition of 22 immune cell types was analyzed across risk groups (Fig. 6E), revealing significant differences (p < 0.05) in 11 immune cell subsets, including resting and activated CD4 memory T cells, as well as regulatory T cells (Fig. 6F). Among these, regulatory T cells exhibited the strongest positive correlation with risk scores, whereas activated CD4 memory T cells showed the strongest negative correlation (Fig. 6G). Immune checkpoint analysis identified significant differences in IPS scores between risk cohorts for CTLA4/PD1 (Fig. 6H). Additionally, TIDE, dysfunction, and exclusion scores were significantly elevated in the high-risk cohort, suggesting a more immunosuppressive TME characterized by increased immune cell exclusion and impaired immune function (Fig. 6I). In the single-cell dataset, UMAP dimensionality reduction was used to show the distribution of cells with different risk scores (Fig. S7A). A violin plot showed significant differences in risk scores across immune cell types (Fig. S7B). Additionally, T cell activation-related genes exhibited significant differences between the high-risk and low-risk groups (Fig. S7C), suggesting that the two groups may differ in T cell activation characteristics. Drug sensitivity analysis revealed that predicted sensitivity to bortezomib, docetaxel, and daporinad differed significantly (p < 0.001) between risk groups, with mean IC50 values below 0.1, indicating distinct predicted therapeutic responses across cohorts (Fig. 6J).
Exploration of Tumor Immune Status and Drug Sensitivity. (A) Comparison of ESTIMATE scores between high- and low-risk groups. (B-D) Correlation analysis between stromal, immune, and ESTIMATE scores with risk scores. (E) Heatmap showing the distribution of immune cell abundances across high- and low-risk groups. (F) Box plots illustrating significant differences in immune cell abundance between high- and low-risk groups. (G) Correlation analysis between risk scores and immune cell infiltration levels. (H-I) Comparison of IPS and TIDE scores between high- and low-risk groups. (J) Differential drug sensitivity analysis between high- and low-risk groups. (***p < 0.001, ****p < 0.0001)
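The Spearman coefficients relating risk scores to stromal, immune, and ESTIMATE scores are Pearson correlations computed on ranks. The following Python sketch shows the calculation with average ranks for ties; the helper names and the toy values are hypothetical and serve only to illustrate the statistic, not to reproduce the reported coefficients.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data (hypothetical): risk scores and stromal scores for five samples.
toy_risk = [0.2, 0.5, 0.9, 1.4, 2.0]
toy_stromal = [-300, 150, -50, 400, 900]

rho = spearman(toy_risk, toy_stromal)
print(f"rho = {rho:.2f}")  # rho = 0.90
```

Because only ranks enter the calculation, the coefficient is insensitive to the scale of the ESTIMATE-derived scores and to monotone transformations of the risk score.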
The co-expression and mRNA-miRNA-lncRNA networks
To explore the potential regulatory mechanisms of prognostic genes in CRC, the 20 co-expressed genes most strongly associated with the prognostic genes were integrated with the three prognostic genes to construct a co-expression network using the GeneMANIA database (Fig. 7A). A total of 102 miRNAs linked to the prognostic genes were identified, and 57 lncRNAs were retrieved based on these miRNAs. Ultimately, 58 miRNAs were found to form a complete prognostic gene (mRNA)-miRNA-lncRNA regulatory axis. Thus, the mRNA-miRNA-lncRNA network comprised three prognostic genes, 58 miRNAs, and 57 lncRNAs (Fig. 7B), including axes such as TAF1B-hsa-miR-1304-3p-LINC00632, BCL10-hsa-miR-4743-5p-LINC01551, and WWTR1-hsa-miR-335-5p-LINC00324. The expression of LINC00632 has been associated with various cancers43. LINC01551 has been implicated in the proliferation, invasion, and metastasis of nasopharyngeal carcinoma (NPC) cells44. hsa-miR-335-5p has been proposed as a prognostic marker for gastric cancer45. LINC00324 regulates the proliferation, migration, and invasion of CRC cells and may be a potential therapeutic target for CRC46. These observations suggest that the prognostic genes may form complex regulatory axes through miRNAs and lncRNAs in the ceRNA network, potentially affecting the proliferation, migration, and invasion of CRC cells.
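The reduction from 102 candidate miRNAs to 58 miRNAs in complete axes follows from a join: an miRNA is retained only if it is linked both to a prognostic gene and to at least one lncRNA. A minimal Python sketch of that join is given below. The three axes are those named in the text, whereas the function name and the unpaired miRNA "hsa-miR-EXAMPLE" are hypothetical and included only to show the filtering step.

```python
from collections import defaultdict


def build_cerna_axes(mrna_mirna_pairs, mirna_lncrna_pairs):
    """Join mRNA-miRNA and miRNA-lncRNA pairs into complete (mRNA, miRNA, lncRNA) axes."""
    lncrnas_by_mirna = defaultdict(set)
    for mirna, lncrna in mirna_lncrna_pairs:
        lncrnas_by_mirna[mirna].add(lncrna)

    axes = set()
    for mrna, mirna in mrna_mirna_pairs:
        # miRNAs without any lncRNA partner contribute no axis and drop out here.
        for lncrna in lncrnas_by_mirna.get(mirna, ()):
            axes.add((mrna, mirna, lncrna))
    return sorted(axes)


mrna_mirna = [
    ("TAF1B", "hsa-miR-1304-3p"),
    ("BCL10", "hsa-miR-4743-5p"),
    ("WWTR1", "hsa-miR-335-5p"),
    ("BCL10", "hsa-miR-EXAMPLE"),  # hypothetical miRNA with no lncRNA partner
]
mirna_lncrna = [
    ("hsa-miR-1304-3p", "LINC00632"),
    ("hsa-miR-4743-5p", "LINC01551"),
    ("hsa-miR-335-5p", "LINC00324"),
]

axes = build_cerna_axes(mrna_mirna, mirna_lncrna)
retained_mirnas = {mirna for _, mirna, _ in axes}
for mrna, mirna, lncrna in axes:
    print(f"{mrna}-{mirna}-{lncrna}")
```

In this toy input, four candidate miRNAs yield three complete axes, mirroring on a small scale how unpaired miRNAs are excluded from the final network.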
Expression patterns of prognostic genes in cell pseudotime trajectories
In T cells, pseudo-temporal trajectory analysis based on highly variable genes revealed distinct differentiation stages (Fig. 8A-C). All three prognostic genes exhibited relatively high expression at different time points within the trajectory (Fig. 8D), suggesting that BCL10, TAF1B, and WWTR1 may play a role in the development, differentiation, or functional regulation of T cells. Similarly, pseudo-temporal analysis in epithelial cells identified multiple differentiation states (Fig. 8E-G). BCL10 expression was elevated in states 1, 2, 4, 5, and 6 but reduced in states 3 and 7 (Fig. 8H), indicating that BCL10 expression changes dynamically across the differentiation or functional states of epithelial cells. TAF1B expression was notably higher at the terminal differentiation stages (states 5 and 6). WWTR1 displayed relatively low overall expression that increased towards the end of differentiation. These patterns suggest that TAF1B is associated with the late differentiation or functional maturation of epithelial cells, while WWTR1 may play a specific role in the later stages of epithelial cell differentiation.
Pseudo-Temporal Trajectory Analysis. (A) Highly variable genes in T cells. (B) Pseudo-temporal dynamics of T cells. (C) Pseudo-temporal staging of T cells, with three distinct developmental stages indicated by color-coded dots. (D) Pseudo-temporal expression patterns of prognostic genes in T cells. (E) Highly variable genes in epithelial cells. (F) Pseudo-temporal dynamics of epithelial cells. (G) Pseudo-temporal staging of epithelial cells, with three distinct developmental stages indicated by color-coded dots. (H) Pseudo-temporal expression patterns of prognostic genes in epithelial cells.
Differential expression patterns of BCL10, TAF1B, and WWTR1
To examine the expression levels of BCL10, TAF1B, and WWTR1 in CRC and normal tissues, their transcriptomic profiles were retrieved from the TCGA database and subjected to comparative analysis. qRT-PCR was then performed to validate these differential expression patterns. As shown in Fig. 9A-F, BCL10 and WWTR1 were expressed at higher levels in normal colorectal tissues, whereas TAF1B expression was elevated in CRC tumor tissues.
Expression Profiles of PRGs in CRC and Normal Tissues. (A-C) Expression levels of BCL10, TAF1B, and WWTR1 in CRC tumor and normal tissues from the TCGA database. (D-F) qRT-PCR validation of BCL10, TAF1B, and WWTR1 expression in CRC tumor and normal tissues. (G, I, K) IHC staining for BCL10, TAF1B, and WWTR1 in CRC tumor and normal tissues. (H, J, L) Quantitative analysis of IHC staining intensity for BCL10, TAF1B, and WWTR1 in CRC and normal tissues, with statistical significance assessed using independent t-tests (*p < 0.05, **p < 0.01, and ***p < 0.001).
IHC analysis was conducted to assess the protein expression of BCL10, TAF1B, and WWTR1 (Fig. 9G-L). Consistent with the transcriptomic findings, BCL10 and WWTR1 exhibited significantly higher expression in normal colorectal tissues, as indicated by enhanced staining intensity and a higher proportion of positively stained cells. In contrast, TAF1B expression was markedly elevated in tumor tissues, with intense nuclear staining predominating in cancerous regions.
Knockdown of BCL10 promotes proliferation, migration, and invasion of colorectal cancer cells
To further investigate the functional role of BCL10 in CRC, we first examined its protein expression in a normal human colonic epithelial cell line (NCM460) and four CRC cell lines (SW620, HT29, HCT116, and SW480). WB analysis showed that BCL10 expression was markedly reduced in all four CRC cell lines compared with NCM460 (Fig. 10A–B). SW480 cells were selected for subsequent experiments. Transfection with si-BCL10 efficiently suppressed BCL10 expression at both the mRNA and protein levels relative to the si-NC, as confirmed by qRT-PCR and WB (Fig. 10C–D). CCK-8 assays demonstrated that BCL10 knockdown significantly enhanced the proliferative capacity of SW480 cells over time, particularly at later time points after transfection (Fig. 10E). Wound-healing assays showed that the migration distance at 24 h was significantly greater in the si-BCL10 group than in the si-NC group, indicating accelerated migratory ability following BCL10 silencing (Fig. 10F–G). Consistently, Transwell assays revealed that the numbers of migrated and invaded cells were markedly increased in the si-BCL10 group compared with controls (Fig. 10H–I).
BCL10 expression in CRC cell lines and its effects on proliferation, migration, and invasion. (A) WB analysis of BCL10 protein levels in the normal human colonic epithelial cell line NCM460 and four CRC cell lines (SW620, HT29, HCT116, and SW480). (B) Quantification of relative BCL10 protein expression in the indicated cell lines. (C) qRT-PCR analysis of BCL10 mRNA expression in SW480 cells transfected with si-NC or si-BCL10. (D) WB confirmation of BCL10 knockdown efficiency in SW480 cells. (E) CCK-8 assay showing the proliferative capacity of SW480 cells at 24, 48, 72, and 96 h after transfection with si-NC or si-BCL10. (F,G) Representative images and quantitative analysis of wound-healing assays in SW480 cells following BCL10 knockdown. (H,I) Representative images and quantitative analysis of Transwell migration and invasion assays in SW480 cells transfected with si-NC or si-BCL10. Data are presented as mean ± SD (n = 3). *p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001.
Discussion
Recent advancements in early detection techniques for CRC have significantly improved patient survival rates47. Despite these developments, CRC remains the second leading cause of cancer-related mortality worldwide48. Identifying effective prognostic biomarkers and therapeutic targets at earlier disease stages is critical for improving clinical outcomes49. Publicly available databases such as TCGA, GEO, and MSigDB have facilitated the development of various risk-scoring models for CRC prognosis prediction. However, no prognostic models incorporating pseudouridine modification as a predictive factor have been established to date.
In this study, bioinformatics approaches were utilized to identify 116 potential PRGs in CRC. Through co-expression network analysis and statistical modeling, including univariate and multivariate Cox regression analyses, three key prognostic genes—BCL10, TAF1B, and WWTR1—were identified. TAF1B plays a pivotal role in the RNA polymerase I preinitiation complex (PIC), which is essential for ribosomal biogenesis and cellular proliferation50. Its knockdown has been shown to regulate the p53-miR-101 axis, underscoring its influence on RNA polymerase I activity and its role in tumor progression, particularly in HCC51. Elevated TAF1B expression has been observed in gastric cancer, where it correlates with increased tumor growth and poor prognosis. Functional studies indicate that TAF1B silencing inhibits tumor proliferation and reduces viability in gastric cancer cells, as well as tumor burden in xenograft models52. In HCC, overexpression of TAF1B is linked to unfavorable clinical outcomes, whereas its deletion induces nucleolar stress, apoptosis, and activation of the p53-miR-101 pathway51. Given its involvement in survival pathway regulation and apoptosis induction, TAF1B is being explored as a potential therapeutic target in CRC, particularly in tumors exhibiting microsatellite instability53. BCL10 is implicated in RNA modifications and post-transcriptional dysregulation, often undergoing RNA-level mutations that evade genomic detection54. Research by Qin et al. highlights its role in ubiquitination and oncogenic signaling, facilitating tumor growth and immune evasion55. BCL10 is a key component of the NF-κB signaling cascade, a critical pathway governing cancer cell survival and proliferation56. In breast cancer, BCL10 enhances NF-κB activation, promoting tumor cell viability and resistance to chemotherapy57. Similarly, in CRC, aberrant BCL10 expression is associated with aggressive tumor phenotypes and poor clinical prognosis58. 
Beyond its direct impact on tumor progression, BCL10 contributes to an inflammatory TME, modulating immune responses and supporting immune evasion. These findings position BCL10 as a promising target for cancer immunotherapy59. WWTR1 (also known as TAZ) has been implicated in cell cycle regulation and tumor progression. In Merkel cell carcinoma (MCC), WWTR1 influences tumor development through TEA domain (TEAD)-dependent transcriptional repression mediated by MCPyV LT60. In CRC and NSCLC, WWTR1 activation enhances oncogenic pathways that drive tumor cell proliferation, migration, and invasion. Moreover, elevated TAZ expression in lung cancer has been associated with chemotherapy resistance, highlighting its potential as a therapeutic target61. Collectively, these findings suggest that BCL10, TAF1B, and WWTR1—genes strongly linked to pseudouridine modification—play critical roles in shaping the tumor immune microenvironment through RNA modifications. Their involvement in immune regulation and tumor progression underscores their potential as therapeutic targets for immunotherapy in CRC, offering promising avenues for improving patient prognosis and treatment outcomes.
A prognostic risk model was developed based on the three differentially expressed genes (BCL10, TAF1B, and WWTR1), which demonstrated high predictive accuracy in both the training and validation datasets. The model effectively stratified patients with CRC into high-risk and low-risk groups based on calculated risk scores, with KM survival analysis confirming its prognostic utility.
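A three-gene risk score of this kind is typically a linear combination of expression values weighted by Cox regression coefficients, followed by a cutoff that separates high- and low-risk patients. The Python sketch below illustrates the calculation with a median split. The coefficients, the expression values, and the choice of the median as cutoff are hypothetical placeholders and are not the fitted values or the cutoff used in this study.

```python
from statistics import median

# Hypothetical coefficients for illustration only (not the fitted model).
HYPOTHETICAL_COEFS = {"BCL10": -0.30, "TAF1B": 0.25, "WWTR1": 0.40}


def risk_score(expression, coefs=HYPOTHETICAL_COEFS):
    """Linear predictor: sum of coefficient times expression over the signature genes."""
    return sum(coef * expression[gene] for gene, coef in coefs.items())


def stratify(samples, coefs=HYPOTHETICAL_COEFS):
    """Assign each sample to 'high' or 'low' risk using the median score as cutoff."""
    scores = {sample_id: risk_score(expr, coefs) for sample_id, expr in samples.items()}
    cutoff = median(scores.values())
    groups = {sid: ("high" if score > cutoff else "low") for sid, score in scores.items()}
    return scores, cutoff, groups


# Toy expression values (hypothetical).
toy_samples = {
    "S1": {"BCL10": 2.0, "TAF1B": 1.0, "WWTR1": 0.5},
    "S2": {"BCL10": 1.0, "TAF1B": 2.0, "WWTR1": 1.5},
    "S3": {"BCL10": 3.0, "TAF1B": 0.5, "WWTR1": 0.2},
    "S4": {"BCL10": 0.5, "TAF1B": 1.5, "WWTR1": 2.0},
}

scores, cutoff, groups = stratify(toy_samples)
for sample_id in toy_samples:
    print(sample_id, round(scores[sample_id], 3), groups[sample_id])
```

The resulting group labels are what a KM analysis would then compare; other cutoffs (for example, one optimized on the training cohort) would change group sizes without changing the score itself.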
Clinical characteristics not only influence variable selection in prognostic models but also impact their predictive accuracy and interpretability. For instance, Aleix Prat and colleagues successfully integrated clinical data to predict survival and treatment response in early HER2-positive breast cancer, reinforcing the value of incorporating clinical parameters into prognostic models to enhance reliability and applicability62. When analyzed together with clinical parameters, the risk score was confirmed as an independent prognostic factor, and a prognostic nomogram was constructed to support clinical use. This nomogram may serve as a practical tool for predicting patient outcomes and informing personalized treatment decisions. Calibration analysis further indicated that predicted survival agreed closely with observed survival, underscoring the model’s potential value in targeted therapy and its broader clinical applicability.
In ovarian cancer, regulatory T cells (Tregs) exert their suppressive activity through the release of inhibitory cytokines such as transforming growth factor-β (TGF-β) and interleukin-10 (IL-10). Additionally, Tregs directly interact with effector T cells and other immune components via surface molecules, including CTLA-4 and PD-1, facilitating immune evasion by tumor cells63. In CRC, a significant correlation has been established between dense Treg infiltration within tumor tissues and both increased recurrence rates and reduced overall survival, highlighting Treg infiltration as a potential prognostic biomarker64. Our findings further support an association between CRC risk scores and immune cell infiltration. Specifically, risk scores correlated positively with Treg infiltration and negatively with activated CD4 memory T cells. Differences in the infiltration levels of eleven immune cell types between high- and low-risk groups are consistent with Treg infiltration being a risk-associated feature in CRC. Moreover, genes related to T-cell activation exhibit significant differences between high-risk and low-risk groups, suggesting that these groups may be associated with distinct T-cell activation characteristics. In the TME, T cells are suppressed through various pathways, particularly within the tumor-associated immunosuppressive milieu, where they are induced to differentiate into different subtypes65. Tregs promote tumor immune evasion by inhibiting the function of effector T cells66. Additionally, T-cell immune responses are modulated by myeloid cells, which exert crucial immunosuppressive functions in the TME through differentiation into myeloid-derived suppressor cells (MDSCs)67. MDSCs suppress anti-tumor immune responses via multiple mechanisms, thereby facilitating tumor growth and metastasis68.
Furthermore, M2 macrophages, as key myeloid cells, dampen pro-inflammatory responses by weakening tumor antigen presentation and secreting inhibitory factors (such as IL-10), ultimately suppressing immune responses69. Single-cell analysis, incorporating quality control and dimensionality reduction clustering, identified seven major cell types within CRC samples, with T cells and epithelial cells being the most abundant. Pseudo-temporal trajectory analysis of BCL10, TAF1B, and WWTR1 in T cells and epithelial cells revealed dynamic expression patterns during different stages of differentiation. Notably, BCL10 exhibited significant variation in expression across epithelial cell states70, while TAF1B expression increased towards the terminal differentiation stages, suggesting its involvement in cellular maturation or tumor-induced signaling responses71. The CBM signaling complex, composed of CARD11, BCL10, and MALT1, regulates T-cell receptor-induced gene expression by modulating NF-κB activation and mRNA stability72. Dysregulated BCL10 expression in murine models has been shown to aberrantly activate NF-κB signaling, driving T-cell activation and malignant tumor progression73, positioning BCL10 as a promising immunotherapeutic target. Furthermore, research by Markus Casper et al. identified TAF1B as a regulator of coding microsatellite instability (cMSI) in HCC, implicating it in tumor progression through its effects on stem cell proliferation and apoptosis74. Similarly, elevated WWTR1 expression aligns with its role in the Hippo signaling pathway, where it modulates cellular proliferation and contact inhibition61. These findings provide deeper insights into CRC cell dynamics while highlighting BCL10, TAF1B, and WWTR1 as potential molecular targets for novel therapeutic interventions aimed at improving the outcomes of patients with CRC.
Through an integrative analysis of single-cell and bulk transcriptomic data, this study investigated the role of PRGs in CRC and their prognostic implications. Beyond elucidating the involvement of PRG-associated transcriptional programs in CRC progression, scRNA-seq technology was leveraged to dissect the functional heterogeneity of distinct cell populations within the TME. The PRG-based prognostic risk model incorporating BCL10, TAF1B, and WWTR1 may improve CRC prognosis prediction and provides a theoretical framework for future targeted therapies and precision medicine strategies. Furthermore, the potential mechanisms by which these PRG-associated signatures influence CRC biology were explored through immune infiltration analysis, drug sensitivity assessments, mRNA–miRNA–lncRNA regulatory network construction, and pseudotime trajectory analysis. Collectively, this study advances the molecular understanding of CRC while providing valuable insights for future research and clinical applications. However, certain limitations should be acknowledged. First, our analyses are primarily correlative and do not demonstrate that BCL10, TAF1B, and WWTR1 are direct writers, erasers, readers, or substrates of pseudouridine; rather, they should currently be regarded as genes whose expression is closely linked to pseudouridine-related gene signatures and pathways. Second, the relatively small sample size introduced variability in data partitioning, contributing to performance inconsistencies, including an AUC in the validation set that exceeded that of the training set, and heterogeneity within the experimental cohort may have influenced analytical robustness. Third, the treatment response in this study represents a computational prediction rather than an outcome derived from in vivo or in vitro experiments, and the precise molecular mechanisms and biological functions of the identified prognostic genes require further validation.
Future studies should address these limitations by increasing cohort sizes, enriching subgroup stratifications, and integrating data from diverse populations to validate prognostic gene expression patterns across distinct tissue samples. In particular, the three-gene signature should be validated in large, prospective, multicenter cohorts with standardized treatment and follow-up, and further in vitro and in vivo functional experiments will be essential to clarify the mechanistic links with pseudouridylation and to corroborate these findings, thereby ensuring their translational applicability in CRC diagnosis, prognosis, and therapeutic decision-making.
Data availability
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: https://portal.gdc.cancer.gov, The Cancer Genome Atlas; https://www.ncbi.nlm.nih.gov/geo, The Gene Expression Omnibus; https://www.gsea-msigdb.org/gsea/msigdb/index.jsp, The Molecular Signatures Database.
References
Arnold, M. et al. Global patterns and trends in colorectal cancer incidence and mortality. Gut 66 (4), 683–691 (2017).
Puccini, A. et al. Colorectal cancer: epigenetic alterations and their clinical implications. Biochim. Biophys. Acta Rev. Cancer. 1868 (2), 439–448 (2017).
Cavalli, G. & Heard, E. Advances in epigenetics link genetics to the environment and disease. Nature 571 (7766), 489–499 (2019).
Chi, P., Allis, C. D. & Wang, G. G. Covalent histone modifications–miswritten, misinterpreted and mis-erased in human cancers. Nat. Rev. Cancer. 10 (7), 457–469 (2010).
Zhao, Z. & Shilatifard, A. Epigenetic modifications of histones in cancer. Genome Biol. 20 (1), 245 (2019).
Brenner, H., Kloor, M. & Pox, C. P. Colorectal cancer. Lancet 383 (9927), 1490–1502 (2014).
Vera, R. et al. Multidisciplinary management of liver metastases in patients with colorectal cancer: a consensus of SEOM, AEC, SEOR, SERVEI, and SEMNIM. Clin. Transl Oncol. 22 (5), 647–662 (2020).
Hon, K. W., Ab-Mutalib, N. S., Abdullah, N. M. A., Jamal, R. & Abu, N. Extracellular Vesicle-derived circular RNAs confers chemoresistance in colorectal cancer. Sci. Rep. 9 (1), 16497 (2019).
Siegel, R. L., Miller, K. D. & Jemal, A. Cancer statistics, 2015. CA Cancer J. Clin. 65 (1), 5–29 (2015).
Boccaletto, P. et al. MODOMICS: a database of RNA modification pathways. 2017 update. Nucleic Acids Res. 46 (D1), D303–D307 (2018).
Li, F. & Li, W. Readers of RNA modification in cancer and their anticancer inhibitors. Biomolecules 14 (7), 881 (2024).
Orsolic, I., Carrier, A. & Esteller, M. Genetic and epigenetic defects of the RNA modification machinery in cancer. Trends Genet. 39 (1), 74–88 (2023).
Lin, S. & Kuang, M. RNA modification-mediated mRNA translation regulation in liver cancer: mechanisms and clinical perspectives. Nat. Rev. Gastroenterol. Hepatol. 21 (4), 267–281 (2024).
Zhang, X., Zhu, W-Y., Shen, S-Y., Shen, J-H. & Chen, X-D. Biological roles of RNA m7G modification and its implications in cancer. Biol. Direct. 18 (1), 58 (2023).
Li, D., Fu, Z., Dong, C. & Song, Y. Methyltransferase 3, N6-adenosine-methyltransferase complex catalytic subunit-induced long intergenic non-protein coding RNA 1833 N6-methyladenosine methylation promotes the non-small cell lung cancer progression via regulating heterogeneous nuclear ribonucleoprotein A2/B1 expression. Bioengineered 13 (4), 10493–10503 (2022).
Zimna, M., Dolata, J., Szweykowska-Kulinska, Z. & Jarmolowski, A. The expanding role of RNA modifications in plant RNA polymerase II transcripts: highlights and perspectives. J. Exp. Bot. 74 (14), 3975–3986 (2023).
Zou, J. et al. Dynamic regulation and key roles of ribonucleic acid methylation. Front. Cell. Neurosci. 16, 1058083 (2022).
Bernick, D. L., Dennis, P. P., Höchsmann, M. & Lowe, T. M. Discovery of Pyrobaculum small RNA families with atypical pseudouridine guide RNA features. RNA 18 (3), 402–411 (2012).
Krstulja, A. et al. Tailor-Made molecularly imprinted polymer for selective recognition of the urinary tumor marker Pseudouridine. Macromol. Biosci. 17 (12). https://doi.org/10.1002/mabi.201700250 (2017).
Barbieri, I. & Kouzarides, T. Role of RNA modifications in cancer. Nat. Rev. Cancer. 20 (6), 303–322 (2020).
Boriack-Sjodin, P. A., Ribich, S. & Copeland, R. A. RNA-modifying proteins as anticancer drug targets. Nat. Rev. Drug Discov. 17 (6), 435–453 (2018).
Kan, G. et al. Dual inhibition of DKC1 and MEK1/2 synergistically restrains the growth of colorectal cancer cells. Adv. Sci. (Weinh). 8 (10), 2004344 (2021).
Alawi, F. & Lin, P. Dyskerin is required for tumor cell growth through mechanisms that are independent of its role in telomerase and only partially related to its function in precursor rRNA processing. Mol. Carcinog. 50 (5), 334–345 (2011).
Wu, X. et al. SUMO specific peptidase 3 halts pancreatic ductal adenocarcinoma metastasis via desumoylating DKC1. Cell. Death Differ. 30 (7), 1742–1756 (2023).
Liu, X-Y., Tan, Q. & Li, L-X. A pan-cancer analysis of dyskeratosis congenita 1 (DKC1) as a prognostic biomarker. Hereditas 160 (1), 38 (2023).
Haque, A., Engel, J., Teichmann, S. A. & Lönnberg, T. A practical guide to single-cell RNA-sequencing for biomedical research and clinical applications. Genome Med. 9 (1), 75 (2017).
Navin, N. E. The first five years of single-cell cancer genomics and beyond. Genome Res. 25 (10), 1499–1507 (2015).
Tang, F. et al. mRNA-Seq whole-transcriptome analysis of a single cell. Nat. Methods. 6 (5), 377–382 (2009).
Patel, A. P. et al. Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma. Science 344 (6190), 1396–1401 (2014).
D’Avola, D. et al. High-density single cell mRNA sequencing to characterize circulating tumor cells in hepatocellular carcinoma. Sci. Rep. 8 (1), 11570 (2018).
Kim, K-T. et al. Application of single-cell RNA sequencing in optimizing a combinatorial therapeutic strategy in metastatic renal cell carcinoma. Genome Biol. 17, 80 (2016).
Chung, W. et al. Single-cell RNA-seq enables comprehensive tumour and immune cell profiling in primary breast cancer. Nat. Commun. 8, 15081 (2017).
Min, J-W. et al. Identification of distinct tumor subpopulations in lung adenocarcinoma via Single-Cell RNA-seq. PLoS One. 10 (8), e0135817 (2015).
Hao, Y. et al. Integrated analysis of multimodal single-cell data. Cell 184 (13), 3573–3587.e29 (2021).
Khaliq, A. M. et al. Refining colorectal cancer classification and clinical stratification through a single-cell atlas. Genome Biol. 23 (1), 113 (2022).
Langfelder, P. & Horvath, S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinform. 9, 559 (2008).
Hänzelmann, S., Castelo, R. & Guinney, J. GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinform. 14, 7 (2013).
Simon, N., Friedman, J., Hastie, T. & Tibshirani, R. Regularization paths for Cox’s proportional hazards model via coordinate descent. J. Stat. Softw. 39 (5), 1–13 (2011).
Kanehisa, M., Furumichi, M., Sato, Y., Matsuura, Y. & Ishiguro-Watanabe, M. KEGG: biological systems database as a model of the real world. Nucleic Acids Res. 53 (D1), D672–D677 (2025).
Kanehisa, M. Toward Understanding the origin and evolution of cellular organisms. Protein Sci. 28 (11), 1947–1951 (2019).
Kanehisa, M. & Goto, S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28 (1), 27–30 (2000).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15 (12), 550 (2014).
Yerukala Sathipati, S., Sahu, D., Huang, H-C., Lin, Y. & Ho, S-Y. Identification and characterization of the LncRNA signature associated with overall survival in patients with neuroblastoma. Sci. Rep. 9 (1), 5125 (2019).
Xue, M. Y. & Cao, H. X. LINC01551 promotes metastasis of nasopharyngeal carcinoma through targeting microRNA-132-5p. Eur. Rev. Med. Pharmacol. Sci. 24 (7), 3724–3733 (2020).
Ramírez-Vidal, L. et al. Peripherical blood hsa-miR-335-5p quantification as a Prognostic, but not Diagnostic, marker of gastric cancer. Diagnostics (Basel). 14 (15), 1614 (2024).
Ni, X., Xie, J. K., Wang, H. & Song, H. R. Knockdown of long non-coding RNA LINC00324 inhibits proliferation, migration and invasion of colorectal cancer cell via targeting miR-214-3p. Eur. Rev. Med. Pharmacol. Sci. 23 (24), 10740–10750 (2019).
Hajebi Khaniki, S., Shokoohi, F., Esmaily, H. & Kerachian, M. A. Analyzing aberrant DNA methylation in colorectal cancer uncovered intangible heterogeneity of gene effects in the survival time of patients. Sci. Rep. 13 (1), 22104 (2023).
Christodoulou, S. et al. MicroRNA-675-5p overexpression is an independent prognostic molecular biomarker of short-term relapse and poor overall survival in colorectal cancer. Int. J. Mol. Sci. 24 (12), 9990 (2023).
Eagle, S. R. et al. Evaluating targeted therapeutic response with predictive blood-based biomarkers in patients with chronic mild traumatic brain injury. Neurotrauma Rep. 4 (1), 404–409 (2023).
Tremblay, M. G. et al. Bidirectional cooperation between Ubtf1 and SL1 determines RNA polymerase I promoter recognition in cell and is negatively affected in the UBTF-E210K neuroregression syndrome. bioRxiv 2021.06.07.447350 (2021).
Chen, H-F. et al. TAF1B depletion leads to apoptotic cell death by inducing nucleolar stress and activating p53-miR-101 circuit in hepatocellular carcinoma. Front. Oncol. 13, 1203775 (2023).
Tang, L., Guo, C., Li, X., Zhang, B. & Huang, L. TAF15 promotes cell proliferation, migration and invasion of gastric cancer via activation of the RAF1/MEK/ERK signalling pathway. Sci. Rep. 13 (1), 5846 (2023).
Rocha, M. L., Schmid, K. W. & Czapiewski, P. The prevalence of DNA microsatellite instability in anaplastic thyroid carcinoma - systematic review and discussion of current therapeutic options. Contemp. Oncol. (Pozn). 25 (3), 213–223 (2021).
Apostolou, S., Murthy, S. S., Kolachana, P., Jhanwar, S. C. & Testa, J. R. Absence of post-transcriptional RNA modifications of BCL10 in human malignant mesothelioma and colorectal cancer. Genes Chromosomes Cancer. 30 (1), 96–98 (2001).
Qin, H. et al. Integration of ubiquitination-related genes in predictive signatures for prognosis and immunotherapy response in sarcoma. Front. Oncol. 14, 1446522 (2024).
Perkins, N. Abstract SY06-04: regulation of cancer cell proliferation and survival by NF-κB. Cancer Res. 74 (19_Supplement), SY06–SY04 (2014).
Kumar, S. et al. Author correction: Dll1+ quiescent tumor stem cells drive chemoresistance in breast cancer through NF-κB survival pathway. Nat. Commun. 13 (1), 3927 (2022).
Xu, J. et al. Biochanin A suppresses tumor progression and PD-L1 expression via inhibiting ZEB1 expression in colorectal cancer. J. Oncol. 2022, 3224373 (2022).
Zhu, X. et al. Tumor-associated macrophage-specific CD155 contributes to M2-phenotype transition, immunosuppression, and tumor progression in colorectal cancer. J. Immunother Cancer. 10 (9), e004219 (2022).
Frost, T. C. et al. YAP1 and WWTR1 expression inversely correlates with neuroendocrine Mar Kers in Merkel cell carcinoma. J. Clin. Invest. 133 (5), e157171 (2023).
Thompson, B. J. YAP/TAZ: drivers of tumor Growth, Metastasis, and resistance to therap y. Bioessays 42 (5), e1900162 (2020).
Prat, A. et al. Development and validation of the new HER2DX assay for predicting path ological response and survival outcome in early-stage HER2-positive Br East cancer. EBioMedicine 75, 103801 (2022).
Chen, Y-L. et al. Depletion of regulatory T lymphocytes reverses the Imbalance between p ro- and anti-tumor Immunities via enhancing antigen-specific T cell Im mune responses. PLoS One. 7 (10), e47190 (2012).
Lam, J. H. et al. CD30+ OX40+ Treg is associated with improved over all survival in colorectal cancer. Cancer Immunol. Immunother. 70 (8), 2353–2365 (2021).
Oliveira, G. & Wu, C. J. Dynamics and specificities of T cells in cancer immunotherapy. Nat. Rev. Cancer. 23 (5), 295–316 (2023).
Fehérvari, Z. & Sakaguchi, S. CD4+ Tregs and immune control. J. Clin. Invest. 114 (9), 1209–1217 (2004).
Gabrilovich, D. I. & Nagaraj, S. Myeloid-derived suppressor cells as regulators of the immune system. Nat. Rev. Immunol. 9 (3), 162–174 (2009).
Gabrilovich, D. I., Ostrand-Rosenberg, S. & Bronte, V. Coordinated regulation of myeloid cells by tumours. Nat. Rev. Immunol. 12 (4), 253–268 (2012).
Zhou, D. et al. Promising landscape for regulating macrophage polarization: epigenetic viewpoint. Oncotarget 8 (34), 57693–57706 (2017).
Yang, G. et al. Quantitative analysis of differential proteome expression in epithelial-to-mesenchymal transition of bladder epithelial cells using SILAC method. Molecules 21 (1), 84 (2016).
Aguilera, R. et al. Heat-shock induction of tumor-derived danger signals mediates rapid monocyte differentiation into clinically effective dendritic cells. Clin. Cancer Res. 17 (8), 2474–2483 (2011).
Lork, M., Staal, J. & Beyaert, R. Ubiquitination and phosphorylation of the CARD11-BCL10-MALT1 signalosome in T cells. Cell. Immunol. 340, 103877 (2019).
Yang, D., Zhao, X. & Lin, X. Bcl10 is required for the development and suppressive function of Foxp3+ regulatory T cells. Cell. Mol. Immunol. 18 (1), 206–218 (2021).
Casper, M. et al. Hepatocellular carcinoma as extracolonic manifestation of Lynch syndrome indicates SEC63 as potential target gene in hepatocarcinogenesis. Scand. J. Gastroenterol. 48 (3), 344–351 (2013).
Funding
This project was supported by the National Natural Science Foundation of China (No. 82260543), the Natural Science Foundation of Ningxia (Nos. 2023AAC05058 and 2024AAC03557), and the Scientific Research Foundation of Fujian Provincial Hospital, China (No. 2020YJ04).
Author information
Contributions
The research was planned by TJ. The data was analyzed and the original article was written by ZJW and LYM. JZC and STL were responsible for the in vitro experimental validation and contributed to data analysis and interpretation. RXM and JW gathered references and reviewed the paper. The data was gathered by HYL and ZXZ. The final manuscript was reviewed and approved by all authors.
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethical approval and consent to participate
Written informed consent was obtained from each patient before surgery, and all study protocols were approved by the Ethics Committee for Clinical Research of General Hospital of Ningxia Medical University (Reference Number: KYLL-2022-0800). All methods were carried out in accordance with relevant guidelines and regulations and with the Declaration of Helsinki.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Wang, Z., Ma, L., Cao, J. et al. A pseudouridine-related prognostic model of colorectal cancer based on single-cell sequencing analysis and transcriptome analysis. Sci Rep 16, 4735 (2026). https://doi.org/10.1038/s41598-025-34933-0
DOI: https://doi.org/10.1038/s41598-025-34933-0









