Abstract
Alternative polyadenylation (APA) is a crucial mechanism for regulating gene expression during pre-mRNA 3′ processing, and pre-mRNA 3′ end processing factors are the main factors involved in this process. However, the expression profiles of pre-mRNA 3′ end processing factors in different cancers, and the relationship between these factors and the tumor microenvironment and patient prognosis, remain unclear. In this study, we conducted a comprehensive exploration of the core pre-mRNA 3′ end processing factors across various cancer types using common cancer databases, revealing a robust correlation between the expression of these core factors and tumor characteristics. Leveraging bioinformatics databases, we evaluated the expression levels and prognostic relevance of pre-mRNA 3′ end processing factors across pan-cancer tissues. Our pan-cancer analysis revealed distinct expression patterns of pre-mRNA 3′ end processing factors in tumor and adjacent non-tumorous tissues. Notably, we found a significant correlation between the expression levels of pre-mRNA 3′ end processing factors and patient prognosis. Furthermore, we identified strong associations between the expression of pre-mRNA 3′ end processing factors and stromal, immune, RNA stemness, and DNA stemness scores across pan-cancer tissues. Our data also highlighted a link between the expression of pre-mRNA 3′ end processing factors and sensitivity to specific drugs, including pyrazoloacridine, amonafide, and chelerythrine, among others. We identified four key pre-mRNA 3′ end processing factors, CPSF2, CPSF3, CSTF2, and SYMPK, that play a crucial role in pre-mRNA processing. Our study illuminates the potential promoting and inhibitory roles of pre-mRNA 3′ end processing regulators in cancer progression, offering valuable insights for future research on these regulators as diagnostic markers and therapeutic targets across pan-cancer.
Introduction
Alternative polyadenylation (APA) is an important regulatory mechanism during pre-mRNA 3′ processing in eukaryotes, which determines the usage and frequency of polyadenylation sites (PAS). This mechanism occurs in more than 60% of human genes and plays a pivotal role in the mRNA stability, export, and translation of key cancer genes, thus affecting the occurrence and development of tumors and the tumor microenvironment1,2,3. Recent research has pinpointed several key genes that orchestrate APA events. Notable among these are CPSF1, CPSF2, CPSF3, CPSF4, NUDT21, CPSF6, CSTF1, CSTF2, CSTF3, CSTF2T, FIP1L1 (hFip1), SYMPK (Symplekin), WDR33, CLP1 (hClp1), PCF11 (hPcf11), and RBBP6 (ref. 4). The proteins encoded by these genes form distinct complexes, thereby facilitating pre-mRNA cleavage and polyadenylation (CP) activity and APA. This leads to the emergence of diverse transcript variants5, which in turn influence cellular gene expression. Eukaryotic cells can therefore employ the pre-mRNA APA mechanism to precisely modulate gene expression and function, consequently influencing stem cell fate determination and orchestrating tissue and organ maturation. Additionally, in numerous pathological conditions, the APA profile of pre-mRNA undergoes significant alterations. For instance, in most tumor types, APA regulation consistently favors the usage of the proximal PAS of the mRNA, leading to transcripts with a shorter 3′UTR6,7,8. These findings suggest that pre-mRNA 3′ end processing may play a crucial role in cancer development and in modulating the expression of cancer-specific genes.
The pre-mRNA 3′UTR processing machinery consists mainly of four core complexes: CPSF (cleavage and polyadenylation specificity factor), CSTF (cleavage stimulation factor), CFIm (mammalian cleavage factor I) and CFIIm (mammalian cleavage factor II)9. The CPSF complex includes CPSF1, CPSF2, CPSF3, CPSF4, WDR33 and FIP1L1. Among these, CPSF3 has endonuclease activity and is the core protein that performs pre-mRNA 3′ UTR cleavage. The CSTF complex (including CSTF1, CSTF2, CSTF3) and the CFIm complex (including NUDT21, CPSF6, and CPSF7) interact with the CPSF complex by binding to conserved motifs in the 3′ UTR region. Both facilitate pre-mRNA 3′ UTR processing activity. CFIIm, which includes PCF11 and CLP1, can interact with RNA polymerase II to induce a pause of the transcription complex at the 3′ UTR processing site, thereby facilitating the 3′ UTR processing of pre-mRNA. Furthermore, additional 3′ processing factors, such as PAP and RBBP6, were discovered early on in the purification and identification of pre-mRNA 3′ UTR complexes, and their known functions have continuously expanded10,11. Prior research has underscored the role of CPSF1 in modulating the malignant trajectory of both breast and gastric cancers12,13. Recent investigations have shown that WDR33 is overexpressed in lung cancer tissue, which suggests that it may affect the occurrence and development of lung cancer14. Together with CPSF4, WDR33 recognizes the conserved AAUAAA sequence in the 3′ UTR region and recruits the entire CPSF complex to the PAS for cleavage15,16. However, their molecular mechanisms in tumors remain mostly unknown. In specific tumor types, such as chronic eosinophilic leukemia (CEL) and systemic mastocytosis (SM), FIP1L1 undergoes fusion with genes like PDGFRA, giving rise to the FIP1L1-PDGFRA fusion gene17,18. This chimeric gene is regarded as a pivotal driver of these tumor types and has profound implications for therapeutic stratification and prognostic assessment.
Insight into the roles of pre-mRNA 3′ end processing factors in driving tumorigenesis and cancer progression can significantly enrich our understanding of tumor biology. In this study, using public data from The Cancer Genome Atlas (TCGA), we comprehensively analyzed a set of pre-mRNA 3′ end processing factors, including CPSF1, CPSF2, CPSF3, CPSF4, NUDT21, CPSF6, CSTF1, CSTF2, CSTF3, CSTF2T, FIP1L1 (hFip1), SYMPK (Symplekin), WDR33, CLP1 (hClp1), PCF11 (hPcf11), and RBBP6, among others, in different cancers for their prognostic significance. Our primary objective was to discern the correlation between the expression profiles of pre-mRNA 3′ end processing factors and tumor microenvironment (TME), immune subtypes, drug responsiveness, and immunotherapeutic outcomes in cancer patients. Additionally, we specifically explored the correlation of tumor mutation burden (TMB) and microsatellite instability (MSI) with four pre-mRNA 3′ end processing factors: CPSF2, CPSF3, CSTF2, and SYMPK.
This in-depth analysis sheds light on the pivotal role of pre-mRNA 3′ end processing factors in oncogenesis and their prospective clinical ramifications. In particular, the results for CPSF2, CPSF3, CSTF2, and SYMPK inform future studies of these pre-mRNA 3′ end processing factors as diagnostic markers and therapeutic targets across pan-cancer. Our findings provide new insights into the role of pre-mRNA 3′ end processing factors in cancer, suggesting avenues for further research.
Materials and methods
Identification of differential expression of pre-mRNA 3′ end processing factors in human pan-cancer tissues
We sourced RNA-seq (FPKM) gene expression data, along with clinical, pathological, and immune subtype information, from the UCSC Xena database. Survival data were also obtained from this database19. For our pan-cancer TCGA analysis, we extracted and merged the expression levels of 17 pre-mRNA 3′ end processing factors using Perl. The Wilcoxon test was employed to identify significant differences between tumor and adjacent non-tumorous tissues. We generated boxplots and heatmaps with the R packages “ggpubr” and “pheatmap”, respectively. The correlations among pre-mRNA 3′ end processing factors were explored via the R package “corrplot”. It is important to note that the TCGA database contains few normal control samples for certain malignancies, sometimes fewer than five. Such a small sample pool can introduce systematic error into the conclusions. Consequently, data from these specific cancers were omitted in our R code analysis when comparing tumor and normal tissues.
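The tumor versus normal comparison described above can be illustrated with a minimal sketch. It is written in Python using only the standard library rather than the R code used in the study, and it assumes a two-sided rank-sum test with a normal approximation and no tie or continuity correction, so p values may differ slightly from those of the R implementation. The function names and the constant `MIN_NORMAL_SAMPLES` are hypothetical; only the threshold of five normal samples comes from the text.

```python
from math import erfc, sqrt

MIN_NORMAL_SAMPLES = 5  # threshold taken from the text; the constant name is an assumption


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(tumor, normal):
    """Two-sided Wilcoxon rank-sum test using a normal approximation.

    Returns (U statistic of the tumor group, approximate p value), or None
    when the cancer type has too few normal samples to be compared.
    """
    if len(normal) < MIN_NORMAL_SAMPLES:
        return None
    n1, n2 = len(tumor), len(normal)
    ranks = average_ranks(list(tumor) + list(normal))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = erfc(abs(z) / sqrt(2))  # two-sided tail area of the standard normal
    return u, p
```

Calling `rank_sum_test(tumor_values, normal_values)` once per gene and cancer type, and skipping the `None` results, mirrors the exclusion of cancers with fewer than five normal samples.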
For the expression and gene alteration data of pre-mRNA 3′ end processing factors across diverse cancer cell lines, we turned to the CCLE database (https://portals.broadinstitute.org/ccle)20. The CCLE database contains a large amount of data on human cancer cell lines derived from various types of cancer. Therefore, using CCLE data for pan-cancer analysis provides more comprehensive data support and helps clarify the commonalities and differences between cancer types. The Kruskal–Wallis rank test was used to compare the expression of pre-mRNA 3′ end processing factors, with differences considered statistically significant at p < 0.05. Boxplots and heatmaps were generated using the R packages “ggpubr” and “ComplexHeatmap”, respectively.
Survival analyses based on the expression level of pre-mRNA 3′ end processing factors in human cancer
To delve into the association between pre-mRNA 3′ end processing factors expression and clinical outcomes, we sourced survival data for each sample from the TCGA database21. We meticulously assessed the overall survival rate (OS)22. For survival analysis, Kaplan–Meier (KM) survival curves and log-rank tests were utilized, setting a significance threshold at p < 0.05. The median expression level of pre-mRNA 3′ end processing factors for each specific cancer type served as the cutoff, bifurcating patients into high- or low-risk cohorts. Using the "survminer" and "survival" R packages, we crafted survival curves, elucidating the survival disparities between the delineated risk groups. Furthermore, a Cox analysis was undertaken to probe the nexus between pre-mRNA 3′ end processing factors′ expression and the overarching prognosis of cancer. Conclusively, the R packages "survival" and "forestplot" facilitated the creation of a forest plot, offering a holistic snapshot of our findings.
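As an illustration of the median split and the Kaplan–Meier estimate described above, the following is a minimal Python sketch using only the standard library. It is not the "survival"/"survminer" code used in the study. The function names are hypothetical, and assigning patients whose expression equals the median to the low group is an assumption, since the text does not state how ties were handled.

```python
from statistics import median


def split_by_median(expression):
    """Label each patient 'high' if expression exceeds the cohort median, else 'low'."""
    cutoff = median(expression)
    return ["high" if value > cutoff else "low" for value in expression]


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (time, survival probability) pairs.

    times  -- follow-up time of each patient
    events -- 1 if death was observed at that time, 0 if the patient was censored
    """
    curve = []
    survival = 1.0
    for t in sorted(set(times)):
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        if deaths:
            # Multiply by the conditional probability of surviving past time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
    return curve
```

Applying `kaplan_meier` separately to the patients labelled "high" and "low" for a given gene and cancer type yields the two curves that are then compared with the log-rank test.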
Correlation analysis of the expression of pre-mRNA 3′ processing factors with TME and stemness score in pan-cancer tissues
The Estimation of Stromal and Immune Cells in Malignant Tumor Tissues using Expression Data (ESTIMATE) is a method derived from single-sample Gene Set Enrichment Analysis (ssGSEA)23. In our research, we used the "estimate" and “limma” R packages to derive stromal and immune cell scores. These scores served to estimate the degree of stromal and immune cell infiltration across various cancer tissues. To complement the database information, we calculated RNA stemness scores (RNAss) and DNA stemness scores (DNAss), as these were not directly available from the UCSC Xena database24. To assess the relationships between the expression patterns of pre-mRNA 3′ end processing factors and both RNAss and DNAss, we applied the Spearman correlation method, using the “cor” function in R.
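The Spearman correlation used here is the Pearson correlation of the ranked values. The following minimal sketch shows this in Python with the standard library only. It is an illustration rather than the R code used in the study, and the function names are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    ordered = sorted(values)
    # first position + (number of ties - 1) / 2 + 1 is the average rank of a tied run
    return [ordered.index(v) + (ordered.count(v) - 1) / 2 + 1 for v in values]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

For one gene in one cancer type, `x` would hold the expression values and `y` the matching RNAss or DNAss values, in the same sample order. In R, the equivalent call is `cor(x, y, method = "spearman")`.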
Correlation analysis of pre-mRNA 3′ end processing factors with drug sensitivity and immune subtypes
We retrieved the processed drug sensitivity data from the CellMiner database (accessed 19 June 2022). Data analysis and visualization were carried out using RStudio (version 4.2.1), with the impute, limma, and ggplot2 R packages. For the drug sensitivity analysis, the expression levels of the 17 genes were extracted from the UCSC Xena database. Subsequently, a correlation analysis between the gene expression levels and drug sensitivity in the CellMiner database was performed using Pearson’s method. Results with p < 0.05 were filtered and exported into Excel tables. Of these, results with a p value < 0.01 were selected for visualization. For the immune subtype analysis, we sourced the pertinent data from UCSC Xena. To probe the relationship between pre-mRNA 3′ end processing factors and immune subtypes, we predominantly employed the limma and reshape2 packages within the R environment.
Correlation analysis of CPSF2, CPSF3, CSTF2, and SYMPK expression with TMB and MSI
A high tumor mutational burden (TMB) in cancer cells has been shown to enhance immune recognition and is strongly correlated with the efficacy of immunotherapy25,26,27. Tumors with high microsatellite instability (MSI-H) are distinguished by an increased frequency of insertions/deletions (indels). When these indels occur within coding regions, they can give rise to neoantigens28. In our research, we derived TMB scores from somatic mutation data sourced from The Cancer Genome Atlas (TCGA). To depict the relationship between the expression of CPSF2, CPSF3, CSTF2, and SYMPK and both TMB and MSI, we constructed four radar charts. For this analysis, we employed Spearman's rank correlation method.
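The text does not specify how TMB was computed from the somatic mutation data. One common convention, sketched below in Python as an assumption rather than the authors' procedure, is to divide the number of somatic mutations per sample by the size of the sequenced coding region in megabases. The value of 38 Mb and all names in the block are illustrative assumptions.

```python
from collections import Counter

EXOME_SIZE_MB = 38.0  # assumed size of the captured coding region; not given in the text


def tumor_mutation_burden(mutation_samples, exome_size_mb=EXOME_SIZE_MB):
    """Map each sample ID to its somatic mutations per megabase.

    mutation_samples -- iterable holding one sample ID per somatic mutation record
    """
    counts = Counter(mutation_samples)
    return {sample: n / exome_size_mb for sample, n in counts.items()}
```

The resulting per-sample TMB values can then be correlated with the expression of each gene within a cancer type using Spearman's method.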
Statistical analyses
In our research, we harnessed a spectrum of statistical techniques tailored to the data's characteristics and our investigative aims. For contrasting means between a pair of groups, the t-test was our tool of choice to ascertain statistical significance. In scenarios with more than two groups, we leaned on one-way ANOVA or the Kruskal–Wallis tests. To dissect survival rates, Kaplan–Meier (KM) survival curves were crafted, with the log-rank test pinpointing notable disparities among these curves. We employed Spearman’s correlation analysis to gauge correlation coefficients, granting us insights into the magnitude and trajectory of variable interrelations. Additionally, univariate Cox proportional hazard models were deployed to deduce hazard ratios for variables, thereby assessing their merit as standalone prognostic indicators. A p value threshold of 0.05 was set as our benchmark for statistical significance, meaning outcomes with p values beneath this marker were recognized as statistically pertinent.
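To make the log-rank comparison concrete, the following is a minimal Python sketch of the two-group log-rank statistic, using only the standard library. It illustrates the standard formula and is not the code used in the study. The function names are hypothetical. The p value relies on the chi-square distribution with one degree of freedom, whose upper tail equals erfc(sqrt(x / 2)).

```python
from math import erfc, sqrt


def logrank_chi2(times_a, events_a, times_b, events_b):
    """Log-rank chi-square statistic (1 degree of freedom) for two groups.

    events are 1 for an observed death and 0 for censoring. At least one
    death must occur while both groups are still at risk, otherwise the
    variance is zero and the statistic is undefined.
    """
    times = list(times_a) + list(times_b)
    events = list(events_a) + list(events_b)
    in_a = [True] * len(times_a) + [False] * len(times_b)
    observed_a = expected_a = variance = 0.0
    for t in sorted({x for x, e in zip(times, events) if e == 1}):
        n = sum(1 for x in times if x >= t)                        # at risk, both groups
        n_a = sum(1 for x, g in zip(times, in_a) if g and x >= t)  # at risk, group A
        d = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        d_a = sum(1 for x, e, g in zip(times, events, in_a)
                  if g and x == t and e == 1)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    return (observed_a - expected_a) ** 2 / variance


def chi2_p_value_1df(chi2):
    """Upper-tail p value of a chi-square statistic with 1 degree of freedom."""
    return erfc(sqrt(chi2 / 2))
```

With the 0.05 threshold stated above, two survival curves would be regarded as significantly different when `chi2_p_value_1df(logrank_chi2(...))` falls below 0.05, which corresponds to a statistic above roughly 3.84.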
Results
Expression and correlation of pre-mRNA 3′ end processing factors in pan-cancer tissues
The expression patterns of pre-mRNA 3′ end processing factors were examined across 33 cancer types (Fig. 1A); the underlying expression data are provided in Table S2. Several genes within this family, including CPSF1, CPSF3, CPSF4, NUDT21, PAPOLA, SYMPK, CSTF2, CSTF3, CSTF2T, and CSTF1, showed elevated expression across these cancers. We then explored the correlations among the genes encoding the pre-mRNA 3′ end processing factors (Fig. 1B). Our data revealed a predominantly positive correlation among the expression profiles of most pre-mRNA 3′ end processing factors. To delve deeper, we examined the expression patterns of all pre-mRNA 3′ end processing factors within these 33 cancer types (Fig. 1C). The p values for the differences in gene expression between tumor and normal groups, for the cancer types in which this comparison could be made, are shown in Table S3. Notably, CPSF6 was highly expressed in CHOL, while CSTF2T expression was low across pan-cancer tissues, especially in KICH (Fig. 1C).
Expression levels and correlations between pre-mRNA 3′ end processing factors in various cancers from TCGA. (A) Overall expression of pre-mRNA 3′ end processing factors in 33 types of cancers. (B) Correlations between pre-mRNA 3′ end processing factors. Blue and red dots represent positive and negative correlations, respectively. (C) Expression data from TCGA database showing the expression of pre-mRNA 3′ end processing factors in 18 types of cancers. The color of each small rectangle represents high or low expression of pre-mRNA 3′ end processing factors in each cancer. Red and green indicate high and low expression, respectively.
Further, we used RNA sequencing data from the TCGA database, processed with R, to identify the differential expression of pre-mRNA 3′ end processing factors across cancer types. Our analysis revealed that CPSF2 expression was elevated in a variety of cancers, such as BRCA, BLCA, and CHOL, while it was reduced in KIRC (Fig. 2A). CPSF3 expression was elevated in numerous cancers, yet it was reduced in kidney chromophobe (KICH) (Fig. 2B). CSTF2 and SYMPK displayed elevated expression levels in a host of cancer types, including but not limited to bladder urothelial carcinoma (BLCA), breast invasive carcinoma (BRCA), and cholangiocarcinoma (CHOL) (Fig. 2C,D). The remaining genes, from CPSF4 to PCF11, exhibited varied expression patterns across different cancers, as detailed in Fig. S1A–M.
Pre-mRNA 3′ end processing factors expression levels in different cancer types and normal tissue. (A) CPSF2, (B) CPSF3, (C) CSTF2, (D) SYMPK. The red rectangle box represents gene expression levels in tumor tissue and the blue rectangle box represents normal tissue. *p < 0.05; **p < 0.01; ***p < 0.001. Red- and blue-colored names indicate high and low expressions of the corresponding pre-mRNA 3′ end processing factors, respectively. BLCA Bladder urothelial carcinoma, BRCA Breast invasive carcinoma, CHOL Cholangiocarcinoma, COAD Colon adenocarcinoma, ESCA Esophageal carcinoma, GBM Glioblastoma multiforme, HNSC Head and neck squamous cell carcinoma, KICH Kidney chromophobe, KIRC Kidney renal clear cell carcinoma, KIRP Kidney renal papillary cell carcinoma, LIHC Liver hepatocellular carcinoma, LUAD Lung adenocarcinoma, LUSC Lung squamous cell carcinoma, PRAD Prostate adenocarcinoma, READ Rectum adenocarcinoma, STAD Stomach adenocarcinoma, THCA Thyroid carcinoma, UCEC Uterine corpus endometrial carcinoma. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article).
To further dissect the expression profiles of pre-mRNA 3′ end processing factors across diverse cancer cell lines, we sourced data from the CCLE database, embarking on an exhaustive statistical analysis. The gene expression patterns of NUDT21, CPSF1, and PAPOLA were positively correlated with a plethora of cancer cells. In contrast, the gene expression of PCF11, WDR33, CLP1, and CSTF2T was negatively associated with these malignancies (Fig. 3A). In specific cellular contexts, genes like CPSF1, CPSF6, FIP1L1, and NUDT21 displayed elevated expression levels in cell lines derived from breast and other sources. On the flip side, genes like CLP1, CSTF2T, PCF11, and WDR33 manifested reduced expression within these cancer cell lines (Fig. 3B). Additionally, we observed significant mutation events in specific genes across distinct cancer types, as illustrated in (Fig. 3C).
(A) The pre-mRNA 3′ end processing factors expression in different cancer cell lines (breast, central nervous system, kidney, large intestine, liver, urinary). Red represents high level expression, blue represents low level expression. (B) The pre-mRNA 3′ end processing factors expression in different cancer cell lines (breast, central nervous system, kidney, large intestine, liver, urinary). (C) Mutation frequency of pre-mRNA 3′ end processing factors in different cancer cell lines (LIHC, BRCA, GBM, KIRC, COAD, BLCA) from CCLE database. Red color represents high mutation frequency whereas blue color represents low mutation frequency.
Prognostic value of pre-mRNA 3′ end processing factors in pan-cancer
We explored the prognostic implications of pre-mRNA 3′ end processing factors across a spectrum of cancers. Using Cox analysis (Fig. 4), we assessed the prognostic risk of these genes in a pan-cancer setting. Cox analysis results for the other pre-mRNA 3′ end processing factors are depicted in Fig. S2. The detailed results of the Cox regression analysis are shown in Table S4.
Correlation analysis of the expression of pre-mRNA 3′ end processing factors CPSF2, CPSF3, SYMPK, and CSTF2 with survival by the Cox method in different types of cancers. Different colored lines indicate the risk value of different genes in tumors; a hazard ratio < 1 represents low risk and a hazard ratio > 1 represents high risk.
To gauge the prognostic significance of differentially expressed pre-mRNA 3′ end processing factors in various tumor patients, Kaplan–Meier survival curves were employed. These curves highlighted the relationships between specific pre-mRNA 3′ end processing factors and clinical outcomes. Intriguingly, elevated expression of pre-mRNA 3′ end processing factors correlated with enhanced patient survival rates, whereas diminished expression was linked to decreased survival rates (Fig. 5).
For instance, CPSF2 had adverse implications in ACC, LIHC, and UVM but was protective in LGG (Fig. 5A). CPSF3 was detrimental in ACC, KIRC, LIHC, and MESO but beneficial in THYM (Fig. 5B). Similarly, CSTF2 was associated with unfavorable outcomes in LAML and LIHC (Fig. 5C). Meanwhile, SYMPK was detrimental in ACC and KIRC and protective in UVM and PAAD (Fig. 5D). The patterns continued with CPSF1, CPSF4, NUDT21, CPSF6, CSTF1, CSTF3, CSTF2T, CLP1, WDR33, FIP1L1, RBBP6, PAPOLA and PCF11, each showing varied prognostic implications across different cancer types, as detailed in Fig. S3A–M.
To further understand the expression profiles of pre-mRNA 3′ end processing factors in various cancer cell lines, we sourced data from the CCLE database. Genes like NUDT21, CPSF1, and PAPOLA showed positive correlations with several cancer cells, while genes like PCF11, WDR33, CLP1, and CSTF2T had negative associations. In specific cellular contexts, certain genes displayed elevated or reduced expression levels, influencing the prognosis in various ways. In summary, our findings provide a comprehensive overview of the prognostic implications of pre-mRNA 3′ end processing factors across a range of cancers, offering valuable insights for future research and potential therapeutic interventions.
Association of pre-mRNA 3′ end processing factors with TME and stemness score in pan-cancer tissues
The tumor microenvironment (TME) plays a crucial role in driving cancer cell diversity, enhancing drug resistance, and steering cancer progression and metastasis. Our previous research confirmed the predictive potential of pre-mRNA 3′ end processing factors across various cancers. Understanding the relationship between pre-mRNA 3′ end processing factors expression and the TME in pan-cancer tissues is therefore essential. Table S5 lists the p values and correlation coefficients for the RNA and DNA stemness scores, as well as the correlation coefficients for the ESTIMATE scores.
Using the ESTIMATE algorithm, we calculated immune and stromal scores across pan-cancer tissues, as shown in Fig. 6. Notably, there was a strong positive correlation between the scores and the expression of CSTF2T and CLP1 (Fig. 6A,B). Additionally, significant positive or negative correlations were observed between pre-mRNA 3′ end processing factors expression and RNA signatures (Fig. 6C) and DNA signatures (Fig. 6D). We further explored the relationship between pre-mRNA 3′ end processing factors expression and scores related to the immune system, stroma, estimate, and stemness in specific cancers, including KIRC (top) and LIHC (bottom) (Fig. 8). In BLCA and COAD, pre-mRNA 3′ processing factors showed significant correlations with TME, as well as with DNAss and RNAss (Figs. S4–S6). In essence, our findings highlight the profound connection between pre-mRNA 3′ end processing factors and the TME, offering valuable insights for future cancer research.
Correlation of pre-mRNA 3′ end processing factors expression with tumor microenvironment and stemness score in pan-cancer. (A, B) Pre-mRNA 3′ end processing factors expression associated with stromal score and immune score in different cancers. Red dots indicate a positive correlation between gene expression in the tumor and the stromal or immune score, and blue dots indicate a negative correlation. (C, D) Pre-mRNA 3′ end processing factors expression associated with RNAss and DNAss in different cancers. Red dots indicate a positive correlation between gene expression in the tumor and the stemness score, and blue dots indicate a negative correlation. The color of each circle represents the correlation coefficient and the circle size is related to the strength of the correlation. RNAss RNA stemness score, DNAss DNA stemness score. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article).
Association of pre-mRNA 3′ end processing factors with immune subtypes in pan-cancer tissues
Previous research identified six unique immune subtypes, labeled C1–C6 (ref. 29), through an in-depth immunogenomic analysis. The expression of pre-mRNA 3′ end processing factors differs significantly among these subtypes. Building on this, we explored the relationship between pre-mRNA 3′ end processing factors and these immune subtypes.
Distinct expression patterns of pre-mRNA 3′ end processing factors were observed across various pan-cancers (Fig. 7A). In particular, CPSF4, CPSF2, CPSF3, PCF11, CLP1, and CSTF2 showed marked differential expression in bladder urothelial carcinoma (BLCA) (Fig. 7B). In breast invasive carcinoma (BRCA), a range of pre-mRNA 3′ end processing factors, including CPSF1, WDR33, FIP1L1, and others, displayed significant variations in expression (Fig. 7C). In liver hepatocellular carcinoma (LIHC), genes such as CPSF1, WDR33, FIP1L1, and CPSF4, among others, showed notable differences in expression, with CPSF1 being especially elevated (Fig. 7D). Lastly, in kidney renal clear cell carcinoma (KIRC), distinct expression levels were observed for genes like WDR33, CPSF3, NUDT21, and several others (Fig. 7E). These correlations indicate potential areas for further investigation in the context of cancer therapy, contributing to the broader field of precision medicine (Fig. 8).
Correlation between the expression of pre-mRNA 3′ end processing factors and immune subtypes in BLCA, BRCA, KIRC, LIHC. (A) Correlation between the expression of pre-mRNA 3′ end processing factors and immune subtypes in pan cancers. (B) Correlation between the expression of pre-mRNA 3′ end processing factors and immune subtypes in BLCA. (C) Correlation between the expression of pre-mRNA 3′ end processing factors and immune subtypes in BRCA. (D) Correlation between the expression of pre-mRNA 3′ end processing factors and immune subtypes in LIHC. (E) Correlation between the expression of pre-mRNA 3′ end processing factors and immune subtypes in KIRC. X-axis represents immune subtype, and y-axis represents gene expression. C1, Wound healing; C2, IFN-γ dominant; C3, Inflammatory; C4, Lymphocyte depleted; C5, Immunologically quiet; C6, TGF-β dominant. *p < 0.05; **p < 0.01; ***p < 0.001.
Association of pre-mRNA 3′ end processing factors with pan-cancer drug sensitivity gene therapy treatments
To investigate the potential relationship between the expression of pre-mRNA 3′ end processing factors and the susceptibility of various human cancer cell lines to different drugs, as recorded in the CellMiner™ database, we conducted an in-depth correlation analysis. All relevant data, including the expression profiles of these cell lines and their associated drug sensitivities, are detailed in Table S6. The associations of the 17 pre-mRNA 3′ end processing factors with individual drugs are shown in Fig. 9 and Table S6. Our analysis highlighted significant associations, such as NUDT21 having a positive correlation with susceptibility to pyrazoloacridine, amonafide, chelerythrine, and fludarabine (Fig. 9A,C,F,M), while it negatively correlated with sensitivity to okadaic acid and hydrastinine HCl (Fig. 9G,N). Additionally, FIP1L1 showed a positive correlation with chelerythrine sensitivity (Fig. 9B). Notably, CPSF6 showed a positive correlation with sensitivity to chelerythrine (Fig. 9D), while CSTF3 was positively associated with ifosfamide (Fig. 9L) and CPSF1 was positively associated with fludarabine (Fig. 9E). Similarly, CPSF3 was positively correlated with chelerythrine sensitivity (Fig. 9F), and PCF11 was positively associated with susceptibility to chelerythrine and PX-316 (Fig. 9J,K). Finally, RBBP6 showed positive correlations with sensitivity to both chelerythrine and PX-316 (Fig. 9H,I). Considering the varied expression patterns of pre-mRNA 3′ end processing factors in tumor tissues compared to adjacent non-tumor tissues, along with the unique RNAss and DNAss profiles and the prognostic significance of pre-mRNA 3′ end processing factors, we identified CPSF2, CPSF3, CSTF2, and SYMPK as standout members of the pre-mRNA 3′ end processing factors.
Association of pre-mRNA 3′ end processing factors with pan-cancer immune microenvironment
As components of the pre-mRNA 3′ end processing machinery, CPSF2, CPSF3, SYMPK and CSTF2 cooperate to modulate mRNA 3′UTR processing activity. Although recent studies have implicated them in certain tumors, a comprehensive analysis is still lacking30,31,32,33. TMB has been recognized as a valuable biomarker for predicting the outcomes of immunotherapy. Importantly, a higher TMB level suggests increased effectiveness of tumor immunotherapies34,35. Moreover, microsatellite instability (MSI) has been linked to tumor progression36,37. Given these insights, we sourced TMB and MSI data from the TCGA database to explore the relationship between TMB/MSI and the expression of CPSF2, CPSF3, CSTF2, and SYMPK (Fig. 10A,B). Tables S7 and S8 present the MSI and TMB scores of CPSF2, respectively. A significant correlation was observed between CPSF2 expression and TMB in various cancers, including LUAD, LUSC, STAD, THYM, and UCEC (Fig. 10A). Similarly, a pronounced association was found between CPSF3 expression and TMB in cancers such as BLCA, BRCA, and HNSC. A significant relationship was also noted between CSTF2 expression and TMB in cancers such as BLCA, LGG, LUAD, SARC, SKCM, STAD, and UCEC. Likewise, CPSF2 expression showed a strong correlation with MSI in several cancers, including BRCA, COAD, DLBC, HNSC, OV, PRAD, SKCM, THCA, and UCEC. CSTF2 expression also correlated significantly with MSI in cancers such as ACC, BRCA, COAD, KIRC, and UCEC (Fig. 10B). Similarly, SYMPK expression had a significant association with MSI in cancers such as BRCA, HNSC, LIHC, LUAD, LUSC, PRAD, and READ.
Discussion
Cancer remains a major global health concern, with rising morbidity and mortality rates, and over the past few decades it has become an increasingly pressing issue in the medical community. In our current study, we leveraged data from the TCGA database to evaluate the expression levels of pre-mRNA 3′ end processing factors across 33 distinct cancer types. We identified and documented the differentially expressed pre-mRNA 3′ end processing factors in these cancers. To assess their prognostic potential, we employed KM survival curves and Cox regression analysis to determine the relationship between each pre-mRNA 3′ end processing factor and patient outcomes. Additionally, we examined the association between the pre-mRNA 3′ end processing factors and the tumor microenvironment (TME), as well as stemness scores in each cancer type. Notably, our analysis revealed that several pre-mRNA 3′ end processing factors, including CPSF1, CPSF2, CPSF3, CPSF4, NUDT21, CPSF6, CSTF1, CSTF2, CSTF3, CSTF2T, FIP1L1, SYMPK, WDR33, CLP1, PCF11, and RBBP6, show promise in predicting immune subtypes. We further analyzed the relationship of CPSF2, CPSF3, CSTF2, and SYMPK with drug sensitivity, immunotherapy response, tumor mutational burden (TMB), microsatellite instability (MSI), and immune activation-related genes.
Through extensive research, including experimental models, preclinical studies, and clinical trials, a link has been established between traditional clinicopathological tumor markers and the expression of specific pre-mRNA 3′ end processing factors in pan-cancer tissues. Our prognostic analysis further examined the relationship between pre-mRNA 3′ end processing factors and clinicopathological diagnostic markers, indicating that these factors, especially CPSF2, CPSF3, CSTF2, and SYMPK, can serve as prognostic targets in cancer. This association indicates the potential of pre-mRNA 3′ end processing factors as valuable diagnostic and prognostic markers38. Furthermore, multiple lines of evidence suggest that manipulating various pre-mRNA 3′ end processing factors can yield promising anti-tumor effects both in vitro and in vivo. Thus, studying pre-mRNA 3′ end processing factors is crucial for developing targeted cancer therapies. As understanding of their role in pan-cancer tissues grows, these factors are increasingly recognized as potential therapeutic targets, especially in relation to tumor progression and metastasis. For example, NUDT21 (CFIm25), a specific APA regulator during pre-mRNA 3′ end processing, has been observed to induce 3′ UTR shortening in several genes within glioblastoma cells, promoting tumor suppression39. Additionally, CPSF7, another large subunit of the CFIm complex, has been found to play a regulatory role in liver cancer growth and metastasis by targeting the WWP2/PTEN/AKT signaling pathway40.
The tumor microenvironment (TME) has garnered significant attention in recent cancer research41,42,43. The focus has shifted from merely studying metastatic tumor cells to understanding the core cancer cells and their surrounding environments. Immune and stromal cells, the main non-tumor components of the TME, are now recognized as crucial factors in cancer diagnosis and prognosis. The composition of cells within the TME and the extent of immune and stromal cell infiltration can significantly influence patient outcomes. The ESTIMATE algorithm, which is designed to quantify the immune and stromal components within tumors, calculates immune and stromal scores using specific gene expression signatures. To better understand the prognostic relationship between immune and stromal cells and pre-mRNA 3′ end processing factors, we conducted a systematic analysis comparing the expression of pre-mRNA 3′ end processing factors with the calculated immune and stromal scores. Our findings suggest an inverse relationship between the expression levels of pre-mRNA 3′ end processing factors and the infiltration of immune and stromal cells within the TME. Additional research into the functional implications of this relationship could offer insights into the role of pre-mRNA 3′ end processing factors, especially CPSF2, CPSF3, CSTF2, and SYMPK, in modulating the TME and their potential impact on cancer prognosis.
In our research, we assessed the tumor stemness scores for pre-mRNA 3′ end processing factors across various cancers using DNAss and RNAss. We consistently observed a negative association between the pre-mRNA 3′ end processing factors and cancer stem cells, suggesting that the pre-mRNA 3′ end processing factors might suppress cancer stemness characteristics. Moreover, our correlation analysis revealed significant negative correlations between the expression levels of certain pre-mRNA 3′ end processing factors and stromal, immune, and ESTIMATE scores in hepatocellular carcinoma (LIHC). This suggests that higher expression levels of these genes are associated with lower stromal and immune infiltration levels within the LIHC TME. In LIHC, however, we found positive correlations between the expression levels of pre-mRNA 3′ end processing factors and the DNAss and RNAss scores, supporting the idea that these factors might play a role in modulating tumor stemness in LIHC. These findings underscore the potential significance of pre-mRNA 3′ end processing factors in characterizing the TME and regulating stemness in hepatocellular carcinoma, warranting further investigation.
While our study offers a comprehensive analysis of pre-mRNA 3′ end processing factors across multiple cancers, some limitations remain. This research centered primarily on bioinformatic analysis of the expression of pre-mRNA 3′ end processing factors and patient survival prognosis in pan-cancer, and it lacked in vivo or in vitro validation experiments. Examining the cellular and molecular mechanisms of pre-mRNA 3′ end processing factors in greater depth would offer more detailed insights into their functions and potential interactions with other biological processes. Addressing these limitations through further experimental and prospective studies will contribute to a more complete understanding of the role of pre-mRNA 3′ end processing factors in cancer, potentially leading to the development of novel therapeutic strategies or biomarkers.
Conclusions
We evaluated 33 types of cancer and summarized the differential expression of pre-mRNA 3′ end processing factors across them. We also determined the prognostic and survival significance of the pre-mRNA 3′ end processing factors through prognostic analysis, and evaluated their connections with drug sensitivity, immune subtypes, the immune microenvironment, tumor mutational burden, and microsatellite instability. In particular, we found that CPSF2, CPSF3, CSTF2, and SYMPK are especially important and may play a crucial role in the regulation of key tumor genes, making them valuable candidate tumor targets.
In summary, our investigation has outlined the expression landscape of pre-mRNA 3′ end processing factors and its links with disease prognosis, the tumor microenvironment, stemness scores, and therapeutic response across diverse malignancies. Our research sheds light on the possible promoting and inhibitory roles of pre-mRNA 3′ end processing factors in the development of cancer, providing useful information for future studies on these regulators as diagnostic indicators and therapeutic targets across cancer types.
Data availability
The data that support the findings of this study are openly available in The Cancer Genome Atlas (TCGA) at UCSC Xena (http://xena.ucsc.edu/), CCLE database (https://portals.broadinstitute.org/ccle) and CellMiner™ database (http://discover.nci.nih.gov/cellminer/).
References
Zhang, Y. et al. Alternative polyadenylation: Methods, mechanism, function, and role in cancer. J. Exp. Clin. Cancer. Res. 40, 1–19 (2021).
Wickens, M., Anderson, P. & Jackson, R. J. Life and death in the cytoplasm: Messages from the 3′ end. Curr. Opin. Genetics. Develop. 7, 220–232 (1997).
Wilusz, C. J., Wormington, M. & Peltz, S. W. The cap-to-tail guide to mRNA turnover. Nat. Rev. Mol. Cell. Biol. 2, 237–246 (2001).
Tian, B. & Manley, J. L. Alternative polyadenylation of mRNA precursors. Nat. Rev. Mol. Cell. Biol. 18, 18–30 (2017).
Tian, B., Hu, J., Zhang, H. & Lutz, C. S. A large-scale analysis of mRNA polyadenylation of human and mouse genes. Nucleic Acids Res. 33, 201–212 (2005).
Guo, S. & Lin, S. mRNA alternative polyadenylation (APA) in regulation of gene expression and diseases. Genes Dis. 10, 165–174 (2023).
Yuan, F., Hankey, W., Wagner, E. J., Li, W. & Wang, Q. Alternative polyadenylation of mRNA and its role in cancer. Genes Dis. 8, 61–72 (2021).
Zheng, D. & Tian, B. RNA-binding proteins in regulation of alternative cleavage and polyadenylation. Adv. Exp. Med. Biol. 825, 97–127 (2014).
Chan, S., Choi, E. A. & Shi, Y. Pre-mRNA 3′-end processing complex assembly and function. Wiley Interdiscip Rev. RNA. 2, 321–335 (2011).
Boreikaite, V., Elliott, T. S., Chin, J. W. & Passmore, L. A. RBBP6 activates the pre-mRNA 3′ end processing machinery in humans. Genes Dev. 36, 210–224 (2022).
Schmidt, M. et al. Reconstitution of 3′ end processing of mammalian pre-mRNA reveals a central role of RBBP6. Genes Dev. 36, 195–209 (2022).
Bai, Y. et al. Circulating essential metals and lung cancer: Risk assessment and potential molecular effects. Environ. Int. 127, 685–693 (2019).
Schönemann, L. et al. Reconstitution of CPSF active in polyadenylation: Recognition of the polyadenylation signal by WDR33. Genes Dev. 28, 2381–2393 (2014).
Chan, S. L. et al. CPSF30 and Wdr33 directly bind to AAUAAA in mammalian mRNA 3′ processing. Genes Dev. 28, 2370–2380 (2014).
Guo, Q. et al. An alternatively spliced p62 isoform confers resistance to chemotherapy in breast cancer. Cancer Res. 82, 4001–4015 (2022).
Kang, W., Yang, Y., Chen, C. & Yu, C. CPSF1 positively regulates NSDHL by alternative polyadenylation and promotes gastric cancer progression. Am. J. Cancer. Res. 12, 4566–4583 (2022).
Cools, J., Stover, E. H., Wlodarska, I., Marynen, P. & Gilliland, D. G. The FIP1L1-PDGFRalpha kinase in hypereosinophilic syndrome and chronic eosinophilic leukemia. Curr. Opin. Hematol. 11, 51–57 (2004).
Frickhofen, N. et al. Complete molecular remission of chronic eosinophilic leukemia complicated by CNS disease after targeted therapy with imatinib. Ann. Hematol. 83, 477–480 (2004).
Zhang, Y. et al. A signature for pan-cancer prognosis based on neutrophil extracellular traps. J. Immunother. Cancer. 10, e004210 (2022).
Li, H. et al. The landscape of cancer cell line metabolism. Nat. Med. 25, 850–860 (2019).
Chen, C. et al. The genetic, pharmacogenomic, and immune landscapes associated with protein expression across human cancers. Cancer Res. 83, 3673–3680 (2023).
Liu, J. et al. An integrated TCGA pan-cancer clinical data resource to drive high-quality survival outcome analytics. Cell. 173, 400–416 (2018).
Linxweiler, M. et al. The immune microenvironment and neoantigen landscape of aggressive salivary gland carcinomas differ by subtype. Clin. Cancer. Res. 26, 2859–2870 (2020).
Hajimorad, M., Gray, P. R. & Keasling, J. D. Framework and model system to investigate linear system behavior in Escherichia coli. J. Biol. Eng. 5, 1–15 (2011).
Picard, E., Verschoor, C. P., Ma, G. W. & Pawelec, G. A. Relationships between immune landscapes, genetic subtypes and responses to immunotherapy in colorectal cancer. Front. Immunol. 11, 369 (2020).
Rizzo, A., Ricci, A. D. & Brandi, G. PD-L1, TMB, MSI, and other predictors of response to immune checkpoint inhibitors in biliary tract cancer. Cancers 13, 558 (2021).
Jardim, D. L., Goodman, A., de Melo Gagliato, D. & Kurzrock, R. The challenges of tumor mutational burden as an immunotherapy biomarker. Cancer Cell. 39, 154–173 (2021).
Mestrallet, G., Brown, M., Bozkus, C. C. & Bhardwaj, N. Immune escape and resistance to immunotherapy in mismatch repair deficient tumors. Front. Immunol. 14, 1210164 (2023).
Thorsson, V. et al. The immune landscape of cancer. Immunity 48, 812–830 (2018).
Nilubol, N., Boufraqech, M., Zhang, L. & Kebebew, E. Loss of CPSF2 expression is associated with increased thyroid cancer cellular invasion and cancer stem cell population, and more aggressive disease. J. Clin. Endocrinol. Metab. 99, E1173–E1182 (2014).
Shen, P. et al. Therapeutic targeting of CPSF3-dependent transcriptional termination in ovarian cancer. Sci. Adv. 9, eadj0123 (2023).
Buchert, M. et al. Symplekin promotes tumorigenicity by up-regulating claudin-2 expression. Proc. Natl. Acad. Sci. U. S. A. 107, 2628–2633 (2010).
Xu, Y. et al. The RNA-binding protein CSTF2 regulates BAD to inhibit apoptosis in glioblastoma. Int. J. Biol. Macromol. 226, 915–926 (2023).
Sha, D. et al. Tumor mutational burden as a predictive biomarker in solid tumors. Cancer Discov. 10, 1808–1825 (2020).
Büttner, R. et al. Implementing TMB measurement in clinical practice: Considerations on assay requirements. ESMO Open 4, e000442 (2019).
Berardinelli, G. N. et al. Association of microsatellite instability (MSI) status with the 5-year outcome and genetic ancestry in a large Brazilian cohort of colorectal cancer. Eur. J. Hum. Genet. 30, 824–832 (2022).
Jasmine, F. et al. Interaction between microsatellite instability (MSI) and tumor DNA methylation in the pathogenesis of colorectal carcinoma. Cancers (Basel) 13, 4956 (2021).
Lembo, A., Di Cunto, F. & Provero, P. Shortening of 3′ UTRs correlates with poor prognosis in breast and lung cancer. PLoS One 7, e31129 (2012).
Masamha, C. P. et al. CFIm25 links alternative polyadenylation to glioblastoma tumour suppression. Nature 510, 412–416 (2014).
Fang, S. et al. CPSF7 regulates liver cancer growth and metastasis by facilitating WWP2-FL and targeting the WWP2/PTEN/AKT signaling pathway. Biochim. Biophys. Acta. Mol. Cell. Res. 1867, 118624 (2020).
Hinshaw, D. C. & Shevde, L. A. The tumor microenvironment innately modulates cancer progression. Cancer Res. 79, 4557–4566 (2019).
Wu, T. & Dai, Y. Tumor microenvironment and therapeutic response. Cancer Lett. 387, 61–68 (2017).
Quail, D. F. & Joyce, J. A. Microenvironmental regulation of tumor progression and metastasis. Nat. Med. 19, 1423–1437 (2013).
Acknowledgements
We sincerely thank the researchers who collected, managed, and maintained TCGA and CCLE data. Their high-quality work and efforts provide great help for our research.
Contributions
X.L. performed the data analysis and writing. Y.C. performed the visualization. X.W. performed the data curation. Y.Z. reviewed and edited the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Li, X., Che, Y., Wang, X. et al. A pan-cancer analysis of the core pre-mRNA 3′ end processing factors, and their association with prognosis, tumor microenvironment, and potential targets. Sci Rep 14, 17428 (2024). https://doi.org/10.1038/s41598-024-57402-6