Introduction

Over half a million women are diagnosed with cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC) each year, resulting in more than 300,000 deaths globally1. Most CESC cases are caused by infection with high-risk subtypes of human papillomavirus (HPV), and HPV types 16 and 18 are detected in 70% of CESC-related deaths2,3. In 2020, the World Health Organization (WHO) classified CESC as HPV-associated (HPVA) or HPV-independent (HPVI). Novel and more accurate biomarkers are needed to understand how HPV infection and integration affect CESC progression and to suggest new treatment strategies accordingly4. Moreover, CESC is a highly heterogeneous tumor, and evolving viral subtypes and acquired multidrug resistance lead to failure of oncological management. Investigating cellular heterogeneity in CESC development could be a breakthrough for understanding disease pathogenesis and exploring targeted drugs5, yet relevant studies remain sparse. Conventional genomic profiling often overlooks tumor heterogeneity. In contrast, profiling the entire mRNA transcriptome at the cellular level appears able to reveal potential molecular subtypes of CESC6.

Single-cell RNA sequencing (scRNA-seq) enables amplification and sequencing of the entire transcriptome at the single-cell level. The technique amplifies trace amounts of whole-transcriptome RNA extracted from isolated single cells and then subjects it to high-throughput sequencing. This makes it possible to determine the gene expression status of individual cells, which accurately reflects cellular heterogeneity7,8,9, to identify the diverse immune cell populations in different tissues10, and to establish interaction networks between diverse cell populations7. Researchers have performed scRNA-seq of CESC tissues to construct a map of human CESC cells11,12. In doing so, they characterized the heterogeneity of the CESC tumor microenvironment (TME) and identified cancer stem cell (CSC) subpopulations12. For example, Li X et al.13 found that the tumor-associated macrophage (TAM) marker genes C1QC and SPP2 were better suited than the traditional M1/M2 TAM phenotypes for clustering clinical outcomes of CESC patients, suggesting the existence of novel molecular subtypes within CESC that warrant further investigation. Wei E et al.14 found differences in heterogeneity between HPV+ and HPV- CESC.

RNA binding proteins (RBPs), which are essential regulators of gene expression15, play a vital role in various diseases, including CESC16,17. Changes in RBP expression in cervical lesions are likely a consequence of HPV infection18. However, the expression heterogeneity and function of RBPs in HPV+ and HPV- CESC tissues have not yet been reported. In this study, we aimed to characterize the differential expression of RBPs in HPV+ and HPV- CESC and to predict their possible oncogenic roles associated with HPV infection. We first revealed cell heterogeneity and differential RBP expression between the two groups of samples. We then clustered and subgrouped the epithelial cells, which showed the most robust heterogeneity. Finally, we combined bulk RNA sequencing and scRNA-seq data to explore the functions of RBPs and identified CSTB, TIPARP, NDRG1, and NDRG2 as key regulators of RNA alternative splicing in CESC. Our investigations thus provide a landscape of RBPs in CESC, and the related molecules could be targets for future diagnostics and therapeutics.

Materials and methods

Retrieval and processing of scRNA-seq data

We retrieved the single-cell RNA-seq data matrix with unique molecular identifiers (UMIs) for 2 HPV+ and 2 HPV- CESC samples from GSE171894 (https://www.ncbi.nlm.nih.gov/bioproject/GSE171894). HPV status was determined by PCR or RNA-seq: HPV+ samples were defined as those positive for high-risk HPV DNA (e.g., HPV16/18), and HPV- samples as those with no detectable HPV DNA or RNA. The UMI count matrix was converted into a Seurat object using the R package Seurat19 (version 4.0.4). Low-quality cells were excluded, namely those with UMI counts < 1000, detected gene counts < 500, or more than 15% mitochondrial-derived UMIs. Genes detected in fewer than five cells were removed before further analysis.
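The quality-control thresholds above can be illustrated with a minimal Python sketch. It is not the Seurat workflow used in the study; the `CellQC` record, the function names, and the toy values are hypothetical and only mirror the stated cut-offs (UMI counts < 1000, detected genes < 500, mitochondrial UMIs over 15%, genes present in fewer than five cells).

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    """Per-cell summary statistics (hypothetical record, for illustration only)."""
    barcode: str
    n_umi: int        # total UMI count of the cell
    n_genes: int      # number of genes detected in the cell
    n_mito_umi: int   # UMIs assigned to mitochondrial genes


def passes_qc(cell, min_umi=1000, min_genes=500, max_mito_frac=0.15):
    """Return True if the cell survives all three cell-level filters."""
    if cell.n_umi == 0 or cell.n_umi < min_umi or cell.n_genes < min_genes:
        return False
    # Exclude cells whose mitochondrial UMI fraction exceeds 15%.
    return cell.n_mito_umi / cell.n_umi <= max_mito_frac


def genes_to_keep(cells_per_gene, min_cells=5):
    """Keep genes detected in at least `min_cells` cells."""
    return {gene for gene, n in cells_per_gene.items() if n >= min_cells}


# Toy data, not taken from GSE171894.
cells = [
    CellQC("cell_a", 5200, 1800, 300),  # passes all filters
    CellQC("cell_b", 800, 600, 20),     # too few UMIs
    CellQC("cell_c", 3000, 1200, 600),  # 20% mitochondrial UMIs
    CellQC("cell_d", 1500, 450, 30),    # too few detected genes
]
kept = [c.barcode for c in cells if passes_qc(c)]
```

In this toy example only `cell_a` is retained; each of the other cells fails exactly one filter.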

Clinical data and tissue sample collection

Tissue samples and clinical data were collected at the First Affiliated Hospital of Guangxi Medical University. We included 74 patients with a pathological diagnosis of cervical squamous cell carcinoma or adenocarcinoma (CESC) made at our hospital between 2012 and 2017. Complete clinical case data, follow-up information for every patient, and paraffin-embedded pathology sections of matched CESC tissues were gathered. The Ethics Committee of the First Affiliated Hospital of Guangxi Medical University approved the use of CESC samples in this study (No.: KT-2022-Guike-009), informed consent was obtained from all participants or their legal guardians, and all methods were performed in accordance with the relevant guidelines and regulations.

ScRNA-seq data preprocessing and quality control

The UMI count matrix was log-normalized. Seurat’s FindIntegrationAnchors function was then used to identify integration anchors from the top 2,000 highly variable genes. To reduce the dimensionality of the scRNA-seq dataset, principal component analysis (PCA) was performed on the integrated data matrix. Based on Seurat’s ElbowPlot function, the top 50 principal components (PCs) were selected for downstream analysis. Cell clusters were identified using Seurat’s FindClusters function at the default resolution (res = 0.8), yielding a total of 17 main cell clusters, which were visualized with UMAP. To determine cell types, we used the “FindMarkers” function of the Seurat package (v4.0.4) to identify marker genes unique to each cell cluster. We then used the ScType tool20, incorporating previously published extended marker genes14, to annotate cell types.

ScRNA-seq differential gene expression analysis

Differentially expressed genes (DEGs) were identified with the FindMarkers/FindAllMarkers functions of the Seurat package (one-tailed Wilcoxon rank sum test; p-values were adjusted for multiple testing using the Bonferroni correction). A gene was considered differentially expressed when the fraction of cells in which it was detected differed by at least 0.15 between groups, its expression differed by at least 0.5 on a natural logarithmic scale, and the adjusted p-value was below 0.05.
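The three DEG criteria can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Seurat’s implementation: the dictionary keys (`p_value`, `avg_logfc`, `pct_1`, `pct_2`) and gene labels are hypothetical, the thresholds are applied to absolute differences, and the Bonferroni factor here is the number of genes passed in, whereas the factor used in the actual analysis depends on the software settings.

```python
def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def select_degs(stats, min_pct_diff=0.15, min_abs_logfc=0.5, alpha=0.05):
    """Return names of genes meeting all three criteria.

    `stats` is a list of dicts with hypothetical keys:
    gene, p_value, avg_logfc (natural log scale), pct_1, pct_2.
    """
    adjusted = bonferroni([s["p_value"] for s in stats])
    selected = []
    for s, p_adj in zip(stats, adjusted):
        pct_diff = abs(s["pct_1"] - s["pct_2"])      # difference in detection fraction
        if (pct_diff >= min_pct_diff
                and abs(s["avg_logfc"]) >= min_abs_logfc
                and p_adj < alpha):
            selected.append(s["gene"])
    return selected


# Toy statistics, not results from the study.
toy_stats = [
    {"gene": "A", "p_value": 1e-6, "avg_logfc": 1.2, "pct_1": 0.80, "pct_2": 0.20},
    {"gene": "B", "p_value": 0.02, "avg_logfc": 0.9, "pct_1": 0.70, "pct_2": 0.30},
    {"gene": "C", "p_value": 1e-8, "avg_logfc": 0.3, "pct_1": 0.90, "pct_2": 0.40},
    {"gene": "D", "p_value": 1e-9, "avg_logfc": -0.9, "pct_1": 0.50, "pct_2": 0.45},
]
degs = select_degs(toy_stats)
```

Gene B fails after adjustment (0.02 × 4 = 0.08), gene C fails the log fold-change cut-off, and gene D fails the detection-fraction cut-off, so only gene A is selected.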

RBP genes analysis

First, a catalog of 2,141 RNA-binding proteins (RBPs) was compiled from four previously published studies21,22,23,24. This list includes both canonical RBPs (with well-defined RNA-binding domains) and non-canonical RBPs (with secondary RNA-binding capabilities). The UMI count matrices of these RBPs were extracted and used as input for cell clustering in Seurat, and the “FindAllMarkers” function of the Seurat package was used to select differentially expressed RBPs.

Retrieval and processing of bulk RNA-seq data from TCGA

For alternative splicing identification, splice junction data in BED format were downloaded from GDC25 for 142 HPV+ and 8 HPV- tumor samples from TCGA CESC. For downstream analysis, we obtained the TCGA CESC project data, comprising gene expression profiles and clinical data, from the UCSC XENA database (https://xenabrowser.net/datapages/).

Identification of differentially expressed RBPs between HPV+ and HPV- samples

DESeq2 software26, which tests for differential expression using the raw count data of expressed genes, was used to identify DEGs. Differential expression was assessed based on the fold change (FC ≥ 2 or ≤ 0.5) and the false discovery rate (FDR ≤ 0.05). Differentially expressed RBPs were then selected from all DEGs.
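As a rough illustration of the fold-change and FDR cut-offs, the sketch below applies a Benjamini-Hochberg adjustment to raw p-values and then filters on FC ≥ 2 or ≤ 0.5 with FDR ≤ 0.05. It does not reproduce DESeq2’s statistical model; the function names, gene labels, and numbers are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def filter_degs(results, min_fc=2.0, max_fdr=0.05):
    """`results` is a list of (gene, fold_change, raw_p_value) tuples."""
    fdr = benjamini_hochberg([p for _, _, p in results])
    return [
        gene
        for (gene, fc, _), q in zip(results, fdr)
        if (fc >= min_fc or fc <= 1.0 / min_fc) and q <= max_fdr
    ]


# Toy values, not results from the TCGA analysis.
toy_results = [
    ("G1", 4.00, 0.005),  # up-regulated, significant
    ("G2", 1.50, 0.010),  # significant but fold change too small
    ("G3", 0.25, 0.030),  # down-regulated, significant
    ("G4", 3.00, 0.200),  # large fold change but not significant
]
selected = filter_degs(toy_results)
```

With these toy values, G1 and G3 pass both criteria.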

Alternative splicing analysis

We used the SUVA software to detect and quantify regulatory alternative splicing events (RASEs)27. Splicing events were analyzed in each group, and the read proportion of each SUVA AS event (pSAR) was computed. In brief, SUVA detects five types of alternative splicing events (ASEs) based on splice junction reads: the 5’ splice site is shared and the 3’ splice site is alternative (alt3p); the 3’ splice site is shared and the 5’ splice site is alternative (alt5p); both splice sites differ but the splice junctions partly overlap (olp); both splice sites differ but one splice junction is contained in the other (contain); and a pair of splice sites is either used or not used, which is equivalent to intron retention (ir). We calculated the ratio of alternatively spliced to constitutively spliced reads as the RASE ratio for the compared samples. Student’s t-test was used to assess changes in the ASE ratio between the replicated sample groups, and ASEs that differed at a p-value cutoff of 0.05 were considered RASEs.

Co-expression analysis

We performed co-expression analysis between all differentially expressed RBPs (DERBPs) and RASs (pSAR ≥ 50%). We computed correlation coefficients between the DERBPs and RASs, and only DERBP-RAS pairs meeting the thresholds (|correlation coefficient| ≥ 0.4, p-value ≤ 0.01) were retained for further analysis.
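The pair-filtering step can be sketched as follows, assuming Pearson correlation between an RBP’s expression and a RAS event’s splicing ratio across samples. The p-value here is a normal approximation based on the Fisher z-transformation, chosen only to keep the example dependency-free; it is not necessarily the test used in the study, and all function names and data are hypothetical.

```python
import math
from statistics import NormalDist, mean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # requires non-constant x and y


def approx_p_value(r, n):
    """Two-sided p-value via the Fisher z-transformation (approximation, n > 3)."""
    if abs(r) >= 1.0:
        return 0.0
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def keep_pair(rbp_expression, splicing_ratio, min_abs_r=0.4, max_p=0.01):
    """Apply the |r| >= 0.4 and p <= 0.01 thresholds to one RBP-RAS pair."""
    r = pearson_r(rbp_expression, splicing_ratio)
    p = approx_p_value(r, len(rbp_expression))
    return abs(r) >= min_abs_r and p <= max_p


# Toy vectors across 20 hypothetical samples.
expression = [float(i) for i in range(20)]
ratio_linked = [2.0 * v + 1.0 for v in expression]         # strongly correlated
ratio_unlinked = [1.0 if i % 2 == 0 else -1.0 for i in range(20)]  # essentially uncorrelated
```

Here `keep_pair(expression, ratio_linked)` retains the pair, while `keep_pair(expression, ratio_unlinked)` discards it because the correlation is close to zero.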

Functional enrichment analysis

To assign functional categories to the selected gene sets, Gene Ontology (GO) terms and KEGG pathways29 were identified using the KOBAS 2.0 server28. The enrichment of each term or pathway was assessed with the hypergeometric test, and p-values were corrected with the Benjamini-Hochberg FDR controlling procedure.
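For readers unfamiliar with the hypergeometric enrichment test, a minimal sketch is given below. It computes the upper-tail probability of observing at least `k` annotated genes in a gene set; the function and parameter names are hypothetical, and this is not the KOBAS implementation.

```python
from math import comb


def hypergeom_enrichment_p(k, background_size, annotated_in_background, set_size):
    """P(X >= k) for a hypergeometric draw.

    background_size:         number of genes in the background (N)
    annotated_in_background: background genes annotated to the term (K)
    set_size:                number of genes in the tested set (n)
    k:                       tested genes annotated to the term
    """
    N, K, n = background_size, annotated_in_background, set_size
    upper = min(K, n)
    total = comb(N, n)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible overlaps contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Toy example: 10 background genes, 4 annotated, 3 genes tested, 2 of them annotated.
p_example = hypergeom_enrichment_p(2, 10, 4, 3)
```

For the toy numbers, the tail probability is (36 + 4) / 120 = 1/3. In practice the resulting p-values across all terms would then be adjusted with the Benjamini-Hochberg procedure.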

Immunohistochemical staining

A cohort of 74 randomly sampled HPV+ and HPV- CESC patients was recruited for immunohistochemical staining. The clinical profile of the selected samples is listed in Table 1. The tissues were embedded in paraffin and sectioned. The slides were dewaxed in xylene and rehydrated through graded ethanol. Antigen retrieval was performed by boiling the slides in citrate solution. The slides were then blocked with bovine serum albumin and incubated overnight with primary antibody, followed by incubation with horseradish peroxidase-conjugated secondary antibody. They were then developed in DAB solution and counterstained with hematoxylin. A semi-quantitative score was assigned based on the degree of staining (0, unstained; 1, weak; 2, moderate; 3, strong).

Table 1 Clinical characteristics and IHC level of patients with CESC.

Other statistical analysis

Clustering based on the Euclidean distance was performed using the pheatmap package (https://cran.r-project.org/web/packages/pheatmap/index.html) in R. To compare the two groups of replicate samples, the speckle R package (version 0.0.3) was used30. T-tests were used to compare quantitative data, and chi-square tests, with Yates’ correction where appropriate, were used to compare qualitative data; P < 0.05 was considered statistically significant. Overall survival (OS) was estimated with Kaplan–Meier curves and compared with the log–rank test. The association between RBP expression and OS of cervical cancer in the TCGA database was investigated using the online tool GEPIA2 (http://gepia2.cancer-pku.cn/). Univariate and multivariate Cox regression analyses were used to identify independent clinical factors affecting the prognosis of patients with CESC. Independent risk factors were then selected as variables for prognostic model construction.
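As an example of the qualitative comparison, the sketch below computes a chi-square statistic with Yates’ continuity correction for a 2 × 2 table, such as high versus low IHC score by HPV status. The table layout and counts are hypothetical. For one degree of freedom the p-value can be written with the complementary error function, so no external library is needed.

```python
import math


def yates_chi_square(a, b, c, d):
    """Chi-square test with Yates' correction for the 2 x 2 table [[a, b], [c, d]].

    Returns (chi2, p_value); all four margins must be non-zero.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # The continuity correction shrinks |ad - bc| by n/2, never below zero.
    corrected = max(0.0, abs(a * d - b * c) - n / 2)
    chi2 = n * corrected ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Toy counts, not data from Table 1.
chi2_example, p_example_yates = yates_chi_square(10, 20, 30, 40)
```

For these toy counts the corrected statistic is about 0.446, which is far from significant at the 0.05 level.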

Results

ScRNA-seq analysis of tumors from HPV+ and HPV- CESC patients identified different cell types

We collected published scRNA-seq data from 2 HPV+ and 2 HPV- CESC samples. After stringent quality control, we obtained transcriptome profiles for 14,453 unique cells. Following normalization and principal component analysis, the top 50 principal components were retained for dimensionality reduction and UMAP visualization, and unbiased clustering yielded 17 cell clusters (Fig. 1A). Using the recently developed ScType software together with the cell markers documented in the original CESC literature, we identified 9 major cell types (Fig. 1B, Figure S1B), which were distributed significantly differently across the HPV+ and HPV- groups (Figure S1A). An examination of the cell cluster proportions within each sample then revealed noticeable changes in cell composition between the HPV+ and HPV- sample groups, particularly in the epithelial cell population. Compared with HPV- samples, the proportions of epithelial clusters such as C1, C8, C11, and C12: Epithelial increased dramatically in HPV+ samples, whereas the proportions of C4 and C16: Epithelial, C6: NK, C12: DC, C10: Macrophage, and C5: CD8 T cells decreased (Fig. 1C-D, Figure S1C). These results revealed distinct differences in cell composition between HPV+ and HPV- CESC tissues.

Fig. 1

The analysis of tumor cells from HPV+ and HPV- CESC patients using scRNA-seq revealed distinct cell types. (A-B) UMAP plots illustrating the composite single-cell transcriptomic profile of tumor samples from HPV+ and HPV- CESC patients. Colors represent cell clusters (A) and cell annotations (B). (C) Bar plot of the proportions of different cell types in each sample. (D) Ranking by decreasing relative frequency ratio between the two sample groups. (E) Unsupervised clustering heatmap showing relative expression (z score, column scaled) levels of the top 3 marker genes in each cell cluster. (F) Number of differentially expressed genes and RBPs between HPV+ and HPV- sample groups within each cell type. (G) Bar plot of the top 10 enriched GO BP terms for genes upregulated in epithelial cells in HPV+ vs. HPV- groups. (H) Bar plot of the top 10 enriched GO BP terms for genes down-regulated in epithelial cells in HPV+ vs. HPV- groups.

Further cluster analysis revealed that different cell clusters and cell types in CESC have specific RBP marker gene expression (Fig. 1E). Differential expression analysis of HPV+ vs. HPV- samples was carried out for each cell type separately. Figure 1F displays the numbers of differentially expressed genes and differentially expressed RBPs between the two groups. The difference between HPV+ and HPV- samples was most evident in epithelial cells. Gene Ontology enrichment analysis showed that genes up-regulated in epithelial cells were significantly enriched in protein folding, cellular differentiation, and various other pathways, whereas down-regulated genes were predominantly enriched in pathways associated with translation (Fig. 1G-H). In macrophages, the up-regulated genes were primarily related to immune recognition and immunotaxis pathways, while the down-regulated genes were associated with IL17 and other signaling pathways (Figure S1D-E). Together, these results provide important insights into the cellular heterogeneity and the heterogeneity of RBP expression in HPV+ and HPV- CESC, as illustrated by the top marker RBP genes of C11:Epithelial, MALAT1 and NEAT1, which were significantly heterogeneous in HPV+ CESC (Figure S1F-G).

Identifying heterogeneity and regulatory module of cell-specific RBP expression patterns in HPV+ and HPV- CESC patients

Using Seurat, unsupervised clustering was performed on the gene expression data of the 2,141 RBPs (Fig. 2A). Figure S2A-B shows how the RBP-expressing cell clusters are distributed across the 17 cell clusters and 9 cell types characterized in the previous section. The RBP-expressing cell clusters were highly cell type-specific in CESC: lymphocytes had a relatively similar composition, consisting primarily of the R0 and R1 clusters; DCs and macrophages were somewhat similar in composition, consisting mainly of the R6 cluster; endothelial cells consisted predominantly of the R9 cluster; and epithelial cells were the most heterogeneous cell type, comprising a variety of RBP-expressing cell clusters (Fig. 2B, Figure S2C). At the single-cell level, RBP expression was heterogeneous across cell types, RBP-expressing cell clusters, and sample groups (Fig. 2C), with particularly pronounced differences between the HPV+ and HPV- groups within the R2-R8 clusters (Figure S2D). We specifically focused on VIM, which was mainly expressed in immune cells (Fig. 2D), and S100A9, which was expressed primarily in epithelial cells (Fig. 2E).

Fig. 2

Identifying heterogeneity and regulatory module of cell-specific RBP expression patterns in HPV+ and HPV- CESC patients. (A) UMAP plot of the scRNA-seq profile, with cells clustered by RBP expression module. (B) Bar plot comparing the proportions of cell populations within each RBP expression module. (C) Clustering heatmap showing the relative expression (column scaled, z score) levels of RBP markers for each RBP expression pattern in the single-cell dataset, annotated by cell type and sample group. (D-E) Expression levels of VIM and S100A9 shown on the UMAP plot, highlighting their association with the HPV- and HPV+ sample groups.

Functional RBPs are primarily regulated in epithelial cells between HPV+ and HPV- samples from CESC patients

We next focused on secondary clustering of epithelial cells in cervical cancer and obtained 9 subpopulations (E0-E8) (Fig. 3A). There was significant heterogeneity between the HPV+ and HPV- sample groups (Figure S3A). Six RBP-expressing cell clusters (eR0-eR5) were derived based on RBP expression in epithelial cells (Fig. 3B). The clusters found mainly in HPV+ samples were E0 (mainly expressing eR0), E1 (mainly expressing eR0), E4 (mainly expressing eR4), E5 (mainly expressing eR2), and E7 (mainly expressing eR5). The clusters found mainly in HPV- samples were E2 (mainly expressing eR1), E3 (mainly expressing eR3), E6 (mainly expressing eR1), and E8 (mainly expressing eR4) (Fig. 3C-D). RBP expression differed significantly between HPV+ and HPV- samples, with distinct distributions across epithelial subpopulations and RBP-expressing cell clusters (Fig. 3E, Figure S3B). Co-expression analysis showed that RBPs up-regulated in HPV+ samples were co-expressed with genes in several cancer-related pathways, such as endoplasmic reticulum stress, apoptosis, endocytosis, and hypoxia response (Fig. 3F-G). These RBPs include CNN3, which correlates strongly with genes related to the hypoxia response and has also been reported to be an oncogene that promotes the invasion and migration of CESC cells (Fig. 3H-I). Genes co-expressed with RBPs down-regulated in HPV+ samples were enriched in pathways linked to transcription and translation (Figure S3C); we specifically focused on PPIA and PKM, which are highly correlated with the response to hypoxic stress pathway and the glycolysis pathway, respectively (Figure S3D-E).

Fig. 3

Functional RBPs are regulated mainly in epithelial cells between HPV+ and HPV- samples from CESC patients. (A) UMAP plot showing the distribution of the 9 epithelial cell clusters. Colors denote the sample groups (left) and cell clusters (right). (B) UMAP plot of scRNA-seq profiles in which cells are grouped by RBP expression pattern. Cells are colored by their respective cell clusters. (C) Stacked bar plot illustrating the relative proportions of cell populations. (D) Stacked bar plot showing the relative proportions of epithelial cell populations, based on the RBP expression module, across different cell clusters. (E) Unsupervised clustering heatmap showing the expression levels of up- and down-regulated RBPs between HPV+ and HPV- samples, annotated by sample group and RBP cluster. (F) Co-expression networks of RBPs upregulated in HPV+ samples, visualized with Cytoscape. Nodes represent genes, and edges connect RBP-target gene pairs. RBPs are highlighted in a larger font size and red color. The “GRN” algorithm from SCENIC was used to construct co-expression associations between RBPs and target genes in epithelial cells. (G) Heatmap showing the top enriched GO biological process terms of the genes co-expressed with each RBP in Fig. 3F. (H) UMAP plot showing the gene expression level of CNN3 in HPV+ and HPV- samples. (I) Violin plot showing the gene expression level of CNN3 in the different sample groups.

Characterizing HPV+-related AS events that are well conserved in CESC patients and co-disrupted with RBP

To mitigate the limitations of our single-cell data, we validated key findings using bulk RNA-seq data from the TCGA-CESC cohort (142 HPV+ and 8 HPV- samples). Through SUVA analyses, we identified a total of 818 regulatory alternative splicing (RAS) events in the TCGA database (Fig. 4A). When the splicing events identified by SUVA were matched to classical splicing event types, exon skipping/cassette exon (ES), alternative 3’ splice site (A3SS), and alternative 5’ splice site (A5SS), which are the most common alternative splicing events in humans, accounted for the majority (Fig. 4B). We then selected 454 splicing events (pSAR ≥ 50%) that represent the predominant transcripts (Fig. 4C). PCA based on the splicing ratios of these RASs separated the two groups of samples, suggesting that the RNA splicing landscape is closely related to the development of cervical cancer (Fig. 4D). The heatmap in Fig. 4E shows the different splicing ratio patterns of HPV+ and HPV- samples. The genes harboring these 454 RAS events were enriched in cancer-related pathways, such as Endocytosis, Rap1 signaling pathway, Axon guidance, and Tight junction (Fig. 4F), while GO enrichment analysis showed enrichment in endosome to lysosome transport and negative regulation of double-strand break repair (Figure S4A). Figure S4B is a heatmap illustrating the landscape of RBPs differentially expressed (DE) between HPV+ and HPV- samples. After filtering (|correlation| ≥ 0.4 and p-value ≤ 0.01), a total of 4 RBPs were found to covary with 25 RAS events (Fig. 4G-H, Figure S4C). CSTB sits at the center of the covariation network, and the alternative splicing events that covary with it deserve particular attention (Fig. 4H): the corresponding genes are enriched in the PPAR signaling pathway, phosphonate metabolism, tight junction, and other cancer-related pathways (Fig. 4I), and CSTB is expressed differently in HPV+ and HPV- CESC (Fig. 4J).

Fig. 4

Characterizing HPV+-related AS events that are well conserved in CESC patients and that are co-disrupted with RBP. (A) Bar plot showing the number of regulated alternative splicing (RAS) events identified between HPV+ and HPV- tumor samples from the TCGA CESC dataset. (B) Detected RAS events were annotated to classical AS event types based on their splice junctions; the box plot shows the distribution of counts for each classical AS event type. (C) Bar plot showing the number of RAS events at different abundance levels (pSAR). Only RAS events with pSAR ≥ 50%, highlighted in blue, were considered for further analysis. (D) Principal component analysis (PCA) of the splicing ratios of RAS events with pSAR ≥ 50%. The confidence ellipse for each group is plotted. (E) Heatmap of the splicing ratios of RAS events (pSAR ≥ 50%) and corresponding genes across all tumor samples. (F) Top 10 enriched KEGG pathways of genes involved in RAS events with pSAR ≥ 50%. (G) Venn diagram showing the intersection of DE RBPs upregulated in HPV+ samples as identified by bulk RNA-seq and in epithelial cells by scRNA-seq. (H) Covariation network showing the relationship between the expression of the co-regulated RBPs from (G) and the splicing ratios of RAS events (pSAR ≥ 50%). Only correlations with |Pearson’s correlation| ≥ 0.4 and p-value ≤ 0.01 were retained. Ellipses represent RBPs, and squares indicate RAS events. (I) Top 10 enriched KEGG pathways of genes involved in RAS events co-disturbed with CSTB. (J) UMAP plot of the CSTB expression level in epithelial cells from scRNA-seq, separated by HPV+ and HPV- samples.

RBPs co-varying with AS events in CESC patients: immunohistochemical validation and prognostic impact

We performed immunohistochemistry to further verify the differentially expressed RBPs (Fig. 5A). TIPARP expression was significantly down-regulated in HPV+ cervical cancer (p = 0.033), whereas no significant differences were found for CSTB, NDRG1, or NDRG2 (Table 1). However, TIPARP expression was not significantly associated with survival in either the TCGA cohort or our own cohort (Fig. 5B-C), suggesting that the role of TIPARP may be influenced by other factors, such as the tumour microenvironment or treatment response. This result needs to be confirmed by studies with a larger sample size.

Fig. 5

RBPs co-varying with AS events in CESC patients in our samples: immunohistochemical validation and prognostic impact. According to the immunohistochemistry, TIPARP was the factor most significantly linked to HPV infection (p < 0.05). The survival curves revealed no significant association between the expression level of TIPARP and overall survival (OS) in CESC. (A) Immunohistochemical patterns of RBPs. (B) Overall survival of CESC patients in the TCGA database. (C) Overall survival of CESC patients in our sample. (D) Kaplan-Meier curves for high-risk and low-risk groups. (E) Forest plot of the univariate analysis.

Prognostic value

Univariate Cox regression analysis showed that CESC prognosis was significantly associated with age, stage, and treatment (p < 0.05), whereas the associations with the 4 RBPs were not significant (Table 2). The variables significantly associated with prognosis in the univariate Cox regression model were then included sequentially, along with the RBPs, to construct multivariate Cox regression models. The model incorporating all RBPs, staging, and treatment yielded the highest c-index (concordance = 0.936, p < 0.001). Based on this model, we calculated risk scores for the 74 CESC patients, centered the scores so that the median risk score was 0, and divided the patients into high-risk and low-risk groups at the median. The prognosis was worse in the high-risk group (Fig. 5D-E).
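The risk stratification step can be sketched as follows, assuming the risk score is the linear predictor of a fitted Cox model (the sum of coefficient × covariate). The coefficients, covariate names, and patient values below are hypothetical placeholders, not the fitted model from this study.

```python
from statistics import median


def risk_scores(coefficients, patients):
    """Linear predictor of a Cox model: sum of coefficient * covariate value."""
    return [sum(coefficients[name] * p[name] for name in coefficients) for p in patients]


def median_center(scores):
    """Shift scores so that their median becomes 0."""
    m = median(scores)
    return [s - m for s in scores]


def assign_groups(centered_scores):
    """Patients above the median (score > 0) are labeled high risk."""
    return ["high" if s > 0 else "low" for s in centered_scores]


# Hypothetical coefficients and covariates for illustration only.
toy_coefficients = {"stage": 0.8, "treatment": -0.5}
toy_patients = [
    {"stage": 1, "treatment": 1},
    {"stage": 3, "treatment": 0},
    {"stage": 2, "treatment": 1},
    {"stage": 4, "treatment": 0},
]
centered = median_center(risk_scores(toy_coefficients, toy_patients))
groups = assign_groups(centered)
```

The two groups produced this way would then be compared with Kaplan-Meier curves and a log-rank test, as described in the statistical analysis section.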

Table 2 Cox regression analyses of the 5-year survival of CESC.

Discussion

RNA binding proteins (RBPs) interact with RNAs to form ribonucleoprotein (RNP) complexes and typically bind to sequence or structural motifs in RNA through one or more structurally defined RNA binding domains (RBDs). However, recent studies have identified a variety of non-canonical RBPs that lack classical RBDs and whose roles in cancer are equally essential31. RBPs dictate the development and fate of target RNA substrates and govern gene expression through various mechanisms, including mRNA splicing, RNA stability, RNA editing and translation, cleavage and polyadenylation, and RNA localization21. RBPs are crucial for maintaining cell homeostasis, and their dysregulation is associated with various clinical diseases, including cancer32,33. Studies suggest that RBPs may be required for the development of HPV infection-induced cervical carcinogenesis. In the early stages of HPV gene expression, RBPs maintain the oncogenic cellular state by affecting alternative splicing to optimize the E6/E7 ratio. In late expression, RBPs inhibit the immunogenic L2 protein, promoting immune evasion of cancer cells32. Our findings reveal the heterogeneity of RBP expression in HPV+ and HPV- CESC cells at the single-cell level, supporting the involvement of RBPs in HPV-induced cervical carcinogenesis, particularly in epithelial cells.

Our analysis of CESC epithelial cell clustering and RBP co-variation shows that genes co-expressed with RBPs up-regulated in HPV+ samples were enriched in endoplasmic reticulum stress, apoptosis, endocytosis, and hypoxia response, all of which are known cancer-associated pathways. For example, CNN3 (calponin 3) is expressed in 6 of 8 breast cancer cell lines, 9 of 11 colon cancer cell lines, and HeLa cells. It can affect the epithelial-mesenchymal transition (EMT) of colon cancer cells, thereby promoting invasion and reducing sensitivity to treatment34. In a recent study, Xia L. et al. showed that CNN3 acts as an oncogene that promotes invasion and migration in CESC, with RPLP1 as a potential downstream candidate gene35. Our findings, which identified CNN3 as upregulated in HPV+ CESC cells and associated with the hypoxia pathway, are consistent with this previous study. Further investigation of the hypoxia pathway, including experimental validation, is needed.

Tumor occurrence and development are frequently accompanied by alternative splicing (AS) events. Studies have demonstrated that AS plays a significant role in the metastasis, invasion, and proliferation of CESC and is strongly associated with HPV infection36. In addition, RBPs play an important role in the regulation of tumor-associated AS. Our study focused on this regulation and identified CSTB as highly correlated with alternative splicing events in CESC. CSTB (cystatin B), which has been implicated in neurological disease37, is also associated with carcinogenesis38,39. For example, Guan et al.38 demonstrated that CSTB modulates alternative splicing in ovarian cancer, promoting tumor progression through dysregulation of key oncogenic pathways. Similarly, Zhang et al.39 reported that CSTB downregulation in gastric cancer suppresses apoptosis and enhances cell proliferation, potentially through splicing-mediated mechanisms, suggesting its potential as a therapeutic target. Our study found that CSTB, affected by HPV infection, could alter the alternative splicing isoforms of critical genes, including CD44, PPARD, and USO1, contributing to cervical carcinogenesis. Multiple studies have previously reported that abnormal alternative splicing of CD44 occurs in patients with CESC and is associated with an unfavorable prognosis and reduced survival rates40,41. These studies collectively support the notion that CSTB is a critical splicing regulator across multiple cancer types, including cervical cancer, and that this process may be closely linked to HPV infection. Although CSTB did not show significant differences in IHC staining, its roles in alternative splicing and tumor progression are supported by prior studies. These findings suggest that its functional significance may extend beyond protein expression levels, warranting further investigation into its RNA-binding activities and downstream effects.

TIPARP, also known as PARP7, is a member of the ADP-ribosyltransferase diphtheria toxin-like (ARTD) family and a suppressor of type I interferon (IFN) signaling. TIPARP has previously been reported to be associated with breast and ovarian cancers42,43. In breast cancer, loss of TIPARP has been shown to increase type I interferon signaling and enhance antitumor immunity, which points to a possible direction for future immunotherapy research. An IARC multicenter case-control study by James McKay et al. revealed a significant association of TIPARP polymorphism with CESC44. In the present study, we observed increased TIPARP expression in HPV infection-associated cervical cancer (p < 0.05). This alteration might be connected to the participation of TIPARP in the regulation of AS linked to cancer development following viral infection. Although we found no significant association between elevated TIPARP expression and the prognosis of CESC, TIPARP inhibitors may still merit evaluation as a targeted treatment for CESC.

To summarize, our investigation reveals for the first time the specific expression features of RBPs in HPV+ and HPV- CESC cells at the single-cell level. Our systematic search found few studies directly investigating the roles of CSTB, TIPARP, NDRG1, and NDRG2 in cervical cancer, which highlights a gap in the literature and underscores the novelty of our findings. These RBPs are emerging as potential diagnostic and therapeutic targets. Our findings underscore the translational potential of targeting RBPs in cervical cancer, particularly in HPV+ tumors, and future studies should prioritize the development and testing of RBP-targeted therapies in preclinical models of cervical cancer. However, our study has limitations. First, a history of HPV infection in HPVI cervical cancer patients cannot be completely ruled out. Second, we did not explore the association between RBPs and HPV oncoproteins. More importantly, the small sample size of our single-cell data (2 HPV+ and 2 HPV- samples) limits the generalizability of our findings. While we validated key results in larger cohorts, future studies should use larger single-cell datasets to confirm our observations and explore subtype-specific differences. Furthermore, although our study identifies CSTB and TIPARP as key regulators of HPV-associated splicing events, it lacks direct experimental validation of RBP interactions with HPV oncoproteins or of splicing reporters. Future work will prioritize these mechanistic studies to establish causality.