Abstract
Platinum-based therapy is an integral part of the standard treatment for ovarian cancer. However, despite extensive research spanning several decades, identifying dependable predictive biomarkers of platinum response for clinical practice has proven to be a formidable challenge. Recently, single-cell technology has enabled more precise investigation of cancer heterogeneity. In this study, we isolated cancer cells from single-cell transcriptomic data of platinum-sensitive and platinum-resistant patients with ovarian cancer. Differential gene expression analysis of platinum-sensitive and platinum-resistant cancer cells revealed that several of the differentially expressed genes had previously been reported to be associated with platinum resistance. Gene set enrichment analysis revealed up-regulation of pathways involved in processes such as autophagy, cell cycle regulation, and DNA damage repair, which are known to promote platinum resistance in ovarian cancer. Based on these findings, we hypothesized that these differentially expressed genes could be used to predict the response of ovarian cancer patients to platinum-based chemotherapy. To test this hypothesis, we explored 7 machine learning models for predicting platinum chemotherapy response at varying numbers of feature genes. The random forest model performed best: with 5 genes (PAX2, TFPI2, APOA1, ADIRF and CRISP3), it achieved an AUC of 0.993 in the test cohort and 0.989 in the independent GSE63885 validation cohort. We named this model GPPS (Genes to Predict Platinum response Signature). Furthermore, we found that the GPPS model can also predict patient prognosis.
Introduction
Ovarian cancer is the most lethal gynecological malignancy, with a five-year survival rate below 50%1. Platinum-based therapy is an integral part of the standard treatment for ovarian cancer2. According to previous studies, platinum-based chemotherapy can increase the 10-year survival rate of ovarian cancer patients by more than 10% and double the objective response rate3. However, most ovarian cancer patients eventually develop platinum resistance, leading to refractory disease that is almost invariably fatal4. Anti-angiogenic therapies have emerged as a critical strategy in the treatment of ovarian cancer; they target the vascular endothelial growth factor (VEGF) pathway to inhibit tumor-associated angiogenesis. Angiogenesis, the formation of new blood vessels, is fundamental to tumor growth and metastasis. In ovarian cancer, VEGF plays a pivotal role in promoting tumor vascularization, increasing vascular permeability, and facilitating peritoneal dissemination5. By blocking VEGF signaling, anti-angiogenic agents can reduce tumor vascularization, normalize abnormal vasculature, and enhance the efficacy of chemotherapy by improving drug delivery. Bevacizumab, a monoclonal antibody targeting VEGF-A, was the first anti-angiogenic agent approved for ovarian cancer treatment. Clinical trials such as GOG-218 and ICON7 demonstrated that adding bevacizumab to standard chemotherapy significantly prolonged progression-free survival (PFS) in both the first-line and recurrent settings5. Beyond bevacizumab, tyrosine kinase inhibitors (TKIs) such as pazopanib and cediranib have also shown promise in targeting VEGF receptors and other angiogenic pathways. The PAOLA-1 trial further highlighted the potential of combining anti-angiogenic agents with other targeted therapies, such as poly(ADP-ribose) polymerase (PARP) inhibitors, to enhance treatment efficacy6.
Despite these clinical benefits, resistance to anti-angiogenic therapy remains a major challenge. Tumors can activate alternative angiogenic pathways or adapt by increasing their invasiveness and metastatic potential. Additionally, prolonged VEGF inhibition may lead to hypoxia-induced changes that promote immune evasion and resistance7. Moreover, adverse effects such as hypertension, proteinuria, and gastrointestinal perforation can limit the long-term use of these agents8. Deficiency in the homologous recombination repair pathway, commonly known as homologous recombination deficiency (HRD), may indicate sensitivity to both platinum-based chemotherapy and poly(ADP-ribose) polymerase inhibitors (PARPi) in various cancers, especially ovarian cancer9. PARPi have emerged as a transformative class of targeted therapies in ovarian cancer. These inhibitors exploit synthetic lethality by targeting DNA repair in cancer cells, particularly those with BRCA1/2 mutations and HRD. By blocking the enzymatic activity of PARP, these agents prevent the repair of single-strand DNA breaks, leading to the accumulation of double-strand breaks that ultimately result in genomic instability and cell death10. The clinical development of PARPi has significantly altered the landscape of ovarian cancer treatment, particularly in the maintenance setting after first-line platinum-based chemotherapy. Multiple phase III clinical trials, including SOLO-1, PRIMA, and PAOLA-1, have demonstrated that PARPi maintenance therapy substantially improves PFS, not only in BRCA-mutated patients but also in those with HRD-positive tumors11. Furthermore, studies have shown that the benefits of PARPi extend beyond BRCA-mutant populations, with patients exhibiting some degree of HRD still experiencing improved outcomes12.
Despite these advances, challenges remain in optimizing the use of PARPi in ovarian cancer. Acquired resistance mechanisms, such as restoration of BRCA function, increased drug efflux, and alternative DNA repair pathways, can limit their long-term efficacy13. Additionally, although PARPi are generally well tolerated, hematologic toxicities such as anemia and thrombocytopenia can affect treatment adherence and patient quality of life14. Notably, the ARIEL4 phase III trial revealed a paradoxical survival outcome: although rucaparib (a PARP inhibitor) yielded longer progression-free survival than chemotherapy (7.4 vs. 5.7 months; HR 0.64), overall survival was 6 months shorter (19.4 vs. 25.4 months; HR 1.3)15. Moreover, approximately half of patients with high-grade serous ovarian cancer do not have HRD tumors and thus face very few treatment options once their disease becomes chemotherapy-resistant. Given this dearth of effective treatments, a deeper understanding of the biological mechanisms of disease progression is required to discover novel biomarkers or prediction methods for platinum resistance and to curtail unnecessary therapeutic intervention. Current clinical investigations of platinum resistance often focus on genomic alterations such as copy number variations (CNVs) and BRCA gene mutations16. However, chemotherapy also affects the transcriptional programs of cancer cells, providing a unique opportunity to comprehensively decode the most relevant chemotherapy-induced processes. Because of tumor heterogeneity, detecting platinum resistance mechanisms in clinical settings has long been challenging. Bulk RNA sequencing captures only sample-level differences and has difficulty revealing subtle variation between individual cells.
However, single-cell RNA sequencing (scRNA-seq) overcomes this limitation by allowing molecular mechanisms to be investigated at single-cell resolution. In this study, we used scRNA-seq to explore differentially expressed genes between platinum-sensitive and platinum-resistant cancer cells from patients. The main objective was to establish a model that predicts the response of ovarian cancer patients to platinum.
Materials and methods
Quality control of single cell RNA sequencing data
Single-cell RNA sequencing data must be filtered before analysis. The number of total unique molecular identifiers (nUMI), the number of expressed genes (nGene), and the percentage of transcripts from mitochondrial (per.mito) or ribosomal (per.ribo) genes are widely used for cell quality control (QC)17,18,19,20. Because solid tumor cells are particularly prone to stress responses during sample dissociation, we used the findings of van den Brink et al.21 to compile a list of dissociation-associated genes and used the percentage of transcripts from this list as an additional QC index (per.diss). To account for potentially different distributions of QC metrics across sample sources and experimental conditions, we determined filtering thresholds adaptively, by detecting outlier data points in the distribution of each QC metric, rather than relying on routine fixed cutoffs. First, we obtained the raw single-cell count matrix of ovarian cancer patients who had undergone platinum-based chemotherapy from the public dataset GSE16589722. Next, we calculated the threshold for each QC index and filtered cells using the following thresholds: nGene (200 < nGene < 4622), nUMI (nUMI < 21,977), ribo.percent (< 0.524), diss.percent (< 0.087) and mito.percent (< 0.093) (Figure S1 A-H). After this basic filtering, scDblFinder23 was used with default parameters to remove doublets (Figure S1 I).
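The adaptive thresholds above come from outlier detection on each QC metric, but the exact outlier rule is not stated. The Python sketch below assumes the median-absolute-deviation (MAD) rule commonly used in scRNA-seq QC: a cell is flagged when a metric lies more than `nmads` scaled MADs above the median, with count metrics evaluated on a log scale. The value `nmads = 3` and the helper names are illustrative assumptions, not the settings that produced the cutoffs reported here; the fixed lower bound of 200 genes mirrors the nGene criterion above.

```python
import numpy as np


def mad_upper_cutoff(values, nmads=3.0, log=False):
    """Return median + nmads * MAD (MAD scaled by 1.4826 to approximate the SD).

    Assumption: a MAD-based outlier rule; the study's exact rule is unspecified.
    """
    x = np.asarray(values, dtype=float)
    if log:
        x = np.log1p(x)  # count metrics are right-skewed, so work on a log scale
    med = np.median(x)
    mad = 1.4826 * np.median(np.abs(x - med))
    cut = med + nmads * mad
    return float(np.expm1(cut)) if log else float(cut)


def qc_keep_mask(n_gene, n_umi, pct_mito, pct_ribo, pct_diss,
                 min_genes=200, nmads=3.0):
    """Boolean mask of cells passing all QC filters (hypothetical helper)."""
    n_gene = np.asarray(n_gene, dtype=float)
    n_umi = np.asarray(n_umi, dtype=float)
    keep = n_gene > min_genes  # fixed lower bound, as in the text
    keep &= n_gene < mad_upper_cutoff(n_gene, nmads, log=True)
    keep &= n_umi < mad_upper_cutoff(n_umi, nmads, log=True)
    # Fractions of mitochondrial, ribosomal and dissociation-associated reads
    for pct in (pct_mito, pct_ribo, pct_diss):
        pct = np.asarray(pct, dtype=float)
        keep &= pct < mad_upper_cutoff(pct, nmads)
    return keep


if __name__ == "__main__":
    mask = qc_keep_mask(
        n_gene=[1000, 1100, 1200, 1050, 1150, 1000, 150, 1080],
        n_umi=[3000, 3200, 3100, 3050, 3300, 3000, 2900, 3100],
        pct_mito=[0.02, 0.03, 0.025, 0.02, 0.03, 0.02, 0.02, 0.5],
        pct_ribo=[0.10, 0.11, 0.12, 0.10, 0.11, 0.10, 0.10, 0.11],
        pct_diss=[0.01, 0.02, 0.015, 0.01, 0.02, 0.01, 0.01, 0.02],
    )
    print(mask)  # the low-gene cell and the high-mitochondrial cell are removed
```

Because the cutoffs are recomputed from each dataset's own distribution, the same rule yields different numeric thresholds for samples processed under different conditions, which is the motivation given above for avoiding fixed cutoffs.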
General clustering
We performed standard procedures, including filtering, variable gene selection, dimensionality reduction, and clustering, using Seurat v424. Counts were log-normalized and then scaled by linear regression against the number of reads. Variable genes (Ngenes = 4000) were selected using a dispersion threshold, with dispersion z-scores normalized by expression level. The variable genes were projected onto a low-dimensional subspace using principal component analysis. The number of principal components (Npcs = 50) was selected by inspecting the plot of variance explained. A shared-nearest-neighbor (SNN) graph was constructed based on Euclidean distance in the low-dimensional subspace, and cells were clustered on this graph (Res = 0.5, Kparam = 30). Cells were visualized using two-dimensional uniform manifold approximation and projection (UMAP) on the same distance metric. Cell types were assigned to each cluster using the abundance of known marker genes.
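For readers more familiar with Python than with Seurat, the NumPy sketch below illustrates the same sequence of steps: library-size log-normalization (scale factor 10,000, Seurat's default), regression of library size out of each gene, selection of high-dispersion genes, PCA, and an SNN graph whose edge weights are the Jaccard overlap of k-nearest-neighbor sets. It is a simplified illustration under stated assumptions, not Seurat's implementation: dispersion is a plain variance-to-mean ratio without Seurat's binned z-scoring, the total count per cell stands in for the number of reads, the kNN search is exact and quadratic in the number of cells, and community detection and UMAP are omitted.

```python
import numpy as np


def log_normalize(counts, scale_factor=1e4):
    """Library-size normalization followed by log1p (cells x genes)."""
    lib = counts.sum(axis=1, keepdims=True)
    return np.log1p(counts / lib * scale_factor)


def regress_out(X, covariate):
    """Residuals of each gene after a least-squares fit on [1, covariate]."""
    design = np.column_stack([np.ones(len(covariate)), covariate])
    beta, *_ = np.linalg.lstsq(design, X, rcond=None)
    return X - design @ beta


def top_dispersion_genes(X, n):
    """Indices of the n genes with the highest variance-to-mean ratio."""
    disp = X.var(axis=0) / np.maximum(X.mean(axis=0), 1e-12)
    return np.argsort(disp)[::-1][:n]


def pca_embed(X, n_pcs):
    """Cell coordinates on the first n_pcs principal components."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_pcs] * S[:n_pcs]


def snn_graph(emb, k):
    """Jaccard overlap of k-nearest-neighbor sets (each cell counts itself)."""
    n = emb.shape[0]
    d2 = ((emb[:, None, :] - emb[None, :, :]) ** 2).sum(axis=-1)  # squared Euclidean
    knn = np.argsort(d2, axis=1)[:, :k]
    A = np.zeros((n, n), dtype=int)
    A[np.repeat(np.arange(n), k), knn.ravel()] = 1
    shared = A @ A.T  # number of neighbors shared by each pair of cells
    return shared / (2 * k - shared)


def preprocess(counts, n_var=4000, n_pcs=50, k=30):
    """Normalize, select variable genes, regress, scale, embed, build SNN."""
    X = log_normalize(counts)
    hv = top_dispersion_genes(X, min(n_var, X.shape[1]))
    Xv = regress_out(X[:, hv], np.log(counts.sum(axis=1)))
    Xv = (Xv - Xv.mean(axis=0)) / np.maximum(Xv.std(axis=0), 1e-12)
    emb = pca_embed(Xv, min(n_pcs, *Xv.shape))
    return emb, snn_graph(emb, min(k, len(emb)))
```

In a full pipeline, the SNN graph would then be passed to a community-detection step (Seurat's `FindClusters` uses the Louvain algorithm by default) at the chosen resolution.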
Identification of cancer cells
Cells previously annotated as epithelial cells were subset and re-clustered using the methods described above. Malignant epithelial cells were identified using inferCNV25, a widely used tool developed by the Broad Institute that identifies cancer cells from copy number variation (CNV) inferred from single-cell transcriptomic data. Although inferCNV has not been published as a standalone methodological paper, it has been extensively validated and used in many peer-reviewed studies26,27,28,29, and a benchmarking study demonstrated that inferCNV has superior sensitivity for tumor cell identification30. inferCNV detects cells with large-scale copy number variations by sorting genes according to their chromosomal location and then applying a moving average (a sliding window of 100 genes) to the relative expression levels of the genes on each chromosome. All epithelial cells were used as input, 1000 T cells were used as reference controls, and a further 1000 T cells were added as spike-in controls. Non-malignant epithelial cells are expected to cluster with the spike-in T cells, whereas cancer cells separate from both normal cells and T cells.
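The moving-average idea behind inferCNV can be illustrated with the short NumPy sketch below. It is a simplified stand-in, not inferCNV itself: inferCNV includes further steps (for example, capping extreme values and optional HMM-based CNV calling) that are omitted here, and the per-cell score (mean squared smoothed deviation) is an assumption made for illustration. `ref_mask` plays the role of the reference T cells.

```python
import numpy as np


def cnv_profile(expr, chrom, pos, ref_mask, window=100):
    """Smoothed relative expression along the genome (cells x genes).

    Simplified inferCNV-style sketch: genes are ordered by chromosome and
    position, expression is centered on the mean of the reference cells,
    a moving average of `window` genes is applied within each chromosome,
    and each cell is finally re-centered on its own median.
    """
    order = np.lexsort((pos, chrom))  # sort by chromosome, then position
    X = np.asarray(expr, dtype=float)[:, order]
    chrom_sorted = np.asarray(chrom)[order]
    rel = X - X[ref_mask].mean(axis=0)  # relative to reference (e.g. T) cells
    smoothed = np.empty_like(rel)
    for c in np.unique(chrom_sorted):
        cols = np.flatnonzero(chrom_sorted == c)
        w = min(window, len(cols))  # never smooth across chromosome boundaries
        kernel = np.ones(w) / w
        for i in range(rel.shape[0]):
            smoothed[i, cols] = np.convolve(rel[i, cols], kernel, mode="same")
    return smoothed - np.median(smoothed, axis=1, keepdims=True)


def cnv_score(profile):
    """Mean squared deviation per cell: larger values suggest more CNV."""
    return (profile ** 2).mean(axis=1)
```

Smoothing over 100 neighboring genes averages away gene-level noise, so only deviations shared by long chromosomal stretches (as expected for arm-level gains or losses) survive in the profile.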
Differential gene expression analysis of cancer cells
We used the Seurat31 function FindMarkers with the MAST test, including sample name as a covariate, to identify gene expression differences between sensitive and resistant cancer cells.
Integration of transcriptome data from different platforms
Transcriptome data are often generated in batches because of logistical or practical constraints, which can introduce technical variation between batches. These variations, commonly referred to as batch effects, can substantially compromise the comparability of data across batches32. We extracted data for patients who received platinum chemotherapy from publicly available datasets (GEO: GSE5137333, n = 28; GSE6388534, n = 75; GSE1562235, n = 14; and the TCGA ovarian cancer cohort) and used Rank-in36 to integrate gene expression data from different platforms. Rank-in is designed to integrate microarray and RNA-seq data for cancer. Its core idea is to transform the original expression intensities into relative rankings within each sample and then reduce nonbiological effects through weighting and singular value decomposition (SVD), which allows data from different technologies to be merged for further analysis. The authors of Rank-in provide a web application (http://www.badd-cao.net/rank-in/submission.html) to which users can upload RNA-seq and microarray data and, after backend analysis, obtain the integrated data. The datasets GSE15622, GSE51373, and TCGA OV were first merged and uploaded to the Rank-in web platform to generate integrated data. The dataset GSE63885 was then uploaded separately to the same platform to obtain its processed data.
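The two ideas named above, within-sample ranking and SVD-based removal of non-biological variation, can be illustrated with the NumPy sketch below. It is not the Rank-in algorithm: Rank-in's weighting scheme is not reproduced, and discarding the leading singular component is a crude, hypothetical stand-in that could also remove biological signal. The sketch mainly shows why ranking helps: any monotone, platform-specific transformation of the intensities leaves the within-sample ranks unchanged.

```python
import numpy as np


def within_sample_ranks(expr):
    """Rank genes within each sample (genes x samples), scaled to (0, 1]."""
    expr = np.asarray(expr, dtype=float)
    ranks = expr.argsort(axis=0).argsort(axis=0) + 1  # 1 = lowest expression
    return ranks / expr.shape[0]


def drop_leading_components(R, n_drop=1):
    """Remove the first n_drop singular components of the gene-centered matrix.

    Hypothetical stand-in for Rank-in's weighted SVD step: here the leading
    component is simply assumed to be technical.
    """
    gene_means = R.mean(axis=1, keepdims=True)
    U, S, Vt = np.linalg.svd(R - gene_means, full_matrices=False)
    S = S.copy()
    S[:n_drop] = 0.0
    return (U * S) @ Vt + gene_means
```

For example, log2 microarray intensities and an exponentially rescaled copy of the same values produce identical rank matrices, so differences in dynamic range between platforms vanish before the SVD step.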
Gene Set Enrichment Analysis (GSEA)
GSEA was performed with the R package clusterProfiler (v2.1.6)37 on the Reactome pathway database38 to explore pathway enrichment between the sensitive and resistant groups. Genes were ranked by the fold change in expression between the two groups. A gene set was considered significantly enriched when its adjusted P value was < 0.05 (Benjamini–Hochberg method) and its normalized enrichment score (NES) was > 1.
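The significance rule above combines a Benjamini–Hochberg correction with an NES cutoff. The sketch below implements the standard BH adjustment (the same procedure as R's `p.adjust(method = "BH")`) and the filter; the pathway names and values in the usage line are made up. Note that requiring NES > 1 keeps only gene sets enriched at the top of the ranked gene list, so sets enriched in the opposite direction (negative NES) are not reported by this criterion.

```python
import numpy as np


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity: each adjusted value is the min over larger ranks
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(n)
    out[order] = np.minimum(adjusted, 1.0)
    return out


def significant_sets(names, pvals, nes, alpha=0.05, nes_min=1.0):
    """Gene sets with adjusted P < alpha and NES > nes_min."""
    padj = benjamini_hochberg(pvals)
    return [name for name, q, e in zip(names, padj, nes)
            if q < alpha and e > nes_min]


if __name__ == "__main__":
    print(significant_sets(["A", "B", "C", "D"],
                           [0.01, 0.04, 0.03, 0.005],
                           [1.5, 2.0, -1.2, 0.8]))
```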
Statistical analysis
Statistical analyses were performed using R (v4.1.1). The Kruskal–Wallis rank-sum test was used to compare variables between two groups using the ggsignif package (v0.6.3) in R. For ROC curve analyses, the area under the curve (AUC), sensitivity, and specificity were calculated using the pROC package (v1.18.0) in R. The best cutoff was defined as the one maximizing the sum of sensitivity and specificity. Kaplan–Meier survival curves were compared with the log-rank test using the survival (v3.2-11) and survminer (v0.4.9) packages. P < 0.05 was considered statistically significant.
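The cutoff rule above (maximizing sensitivity plus specificity, that is, Youden's J) can be written out explicitly. In this sketch the candidate thresholds are the observed scores and a sample is called positive when its score is at or above the threshold; pROC evaluates midpoints between consecutive scores instead, so the reported cutoff value may differ slightly while the maximum J is the same. The function name is hypothetical.

```python
import numpy as np


def youden_cutoff(y_true, scores):
    """Return (threshold, J) maximizing sensitivity + specificity - 1."""
    y = np.asarray(y_true).astype(bool)
    s = np.asarray(scores, dtype=float)
    best_t, best_j = None, -np.inf
    for t in np.unique(s):
        pred = s >= t  # positive call at or above the threshold
        sensitivity = (pred & y).sum() / y.sum()
        specificity = (~pred & ~y).sum() / (~y).sum()
        j = sensitivity + specificity - 1
        if j > best_j:
            best_t, best_j = float(t), float(j)
    return best_t, best_j


if __name__ == "__main__":
    print(youden_cutoff([0, 0, 0, 1, 1, 1], [0.1, 0.2, 0.35, 0.4, 0.8, 0.9]))
```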
Results
Resolving cancer and non-cancer epithelial cells through clustering-based copy number variation analysis
In recent years, there has been growing interest in distinguishing cancer cells from non-malignant epithelial cells. Discriminating between these cell types is essential for understanding the mechanisms and pathogenesis of cancer. Single-cell RNA sequencing (scRNA-seq), a breakthrough technology, allows cancer cells to be studied directly in patient tissues without the need for cell lines39.
The public dataset GSE16589722, which includes single-cell transcriptomic data from ovarian cancer patients with progression-free survival information, was used to analyze chemotherapy resistance in high-grade serous ovarian cancer (HGSOC). The analysis used patient-derived, prospectively collected tissue sample pairs obtained before and after treatment and profiled at single-cell resolution. We applied stringent quality-control criteria (details in Methods) to ensure that the selected data originated from single, live cells (Figure S1 A-H). Following the initial cell filtering, the scDoublets algorithm40 was used to identify doublets (Figure S1 I). After this step, 39,408 cells were retained, including 5219 epithelial cells (markers: EPCAM, CDH1, KRT8, DSP, KRT19; clusters: 4, 10, 19, 32, 35, 37, 39), 6559 fibroblasts (markers: PDPN, DCN, COL1A1; clusters: 8, 14, 20, 24, 29, 30, 43), 17,709 T cells (markers: CD3D, CD3E; clusters: 0, 2, 3, 5, 6, 7, 11, 13, 15, 17, 25, 36), 2507 B cells (markers: CD79A, CD79B, CD19, MS4A1; clusters: 9, 16, 33), 4271 macrophages (markers: CD14, CD68, CD163, TREM2, NRC1; clusters: 5, 12, 18, 22, 27, 42), 2002 NK cells (markers: CD3D−, CD3E−, CD4−, CD8A−, CD8B−, KLRB1+, KLRD1+, GZMB+, GZMA+, PRF1+, NKG7+; clusters: 6, 21, 40) and 1141 dendritic cells (DCs; markers: CLEC10A, FCER1A, CD1C, LAMP3, CLEC9A; clusters: 23, 41) (Fig. 1A, B, D). In contrast to fibroblasts and immune cells, which clustered by cell type regardless of patient origin, cancer cells displayed a distinct, patient-specific expression pattern (Fig. 1C), consistent with previous studies41,42. Distinguishing cancer cells from non-cancer epithelial cells is a crucial task in single-cell RNA sequencing analysis. Because cancer is often associated with large-scale chromosomal alterations, we used copy-number variation (CNV) inferred from RNA expression with inferCNV to differentiate between cancer and non-cancer epithelial cells.
In our analysis, we used 500 T cells from cluster 0 and 500 T cells from cluster 2 as the reference, and another 1000 T cells from clusters 3 and 5 as the internal standard. Non-cancerous cells were expected to cluster with the internal-standard T cells, whereas cancer cells should cluster separately. Compared with the reference T cells, cancer cells displayed larger deviations in relative expression intensity across the genome (Figure S5), and all epithelial cells were identified as cancer cells.
Identification of cancer cells by scRNA-seq. The scRNA-seq dataset GSE165897 was analyzed. (A) Uniform manifold approximation and projection (UMAP) plot of all cells, which were classified into 25 clusters. Different colors represent different clusters. (B) UMAP plot of all cells according to cell annotation. Different colors represent different cell types. (C) UMAP plot across multiple patients, highlighting interpatient tumor cell heterogeneity. Different colors represent different patients. (D) Dot plot of marker genes for the different cell populations.
Differential gene expression analysis of cancer cells between platinum-sensitive and platinum-resistant patients
The platinum-free interval (PFI) is used in clinical practice to assess the duration of response to platinum-based chemotherapy. It is defined as the time elapsed between the last dose of first-line platinum-based chemotherapy and the date of tumor progression. Ovarian cancer patients with a PFI of less than 6 months are typically categorized as platinum-resistant, indicating that their tumors progressed relatively quickly after completing platinum-based chemotherapy. Conversely, patients with a PFI greater than 6 months are categorized as platinum-sensitive, indicating a longer period of response before progression.
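The PFI-based grouping can be expressed as a small helper. Two details are assumptions not fixed by the text: months are approximated as 365.25/12 days, and a PFI of exactly 6 months, which the definitions above leave unassigned, is treated here as sensitive. The function names are hypothetical.

```python
from datetime import date

DAYS_PER_MONTH = 365.25 / 12  # assumption: average calendar month


def pfi_months(last_platinum_dose: date, progression: date) -> float:
    """Platinum-free interval in months between the last dose and progression."""
    days = (progression - last_platinum_dose).days
    if days < 0:
        raise ValueError("progression date precedes the last platinum dose")
    return days / DAYS_PER_MONTH


def platinum_status(pfi: float, cutoff_months: float = 6.0) -> str:
    """'resistant' if PFI < cutoff, otherwise 'sensitive' (6.0 -> sensitive)."""
    return "resistant" if pfi < cutoff_months else "sensitive"


if __name__ == "__main__":
    # 91 days (about 3 months) after the last dose -> resistant
    print(platinum_status(pfi_months(date(2020, 1, 1), date(2020, 4, 1))))
```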
After re-clustering of all cancer cells, cancer cells from platinum-sensitive patients separated completely from those of platinum-resistant patients, indicating that cancer cells in platinum-resistant patients have a different expression pattern from those in platinum-sensitive patients (Fig. 2A). We used the Seurat function FindMarkers with default parameters to identify differentially expressed genes. The top 50 genes from the single-cell differential expression analysis comparing sensitive and resistant cancer cells are shown in Fig. 2B, and several of these genes have been reported to be associated with platinum resistance in ovarian cancer. Marinho et al.43 reported that increased expression of APOA1 could increase the sensitivity of cancer cells to platinum by affecting AKT signaling. PAX2 inactivation has been shown to enhance cisplatin-induced apoptosis in renal carcinoma cells44, and PAX2 has also been found to promote the progression of epithelial ovarian cancer through fatty acid metabolic reprogramming45. Identifying significantly altered molecular pathways between sensitive and resistant cancer cells with Gene Set Enrichment Analysis (GSEA) helps elucidate the mechanisms underlying cancer progression and treatment resistance. We applied GSEA with statistical thresholds of an adjusted P value < 0.05 (Benjamini–Hochberg method) and a normalized enrichment score (NES) > 1. Compared with sensitive cancer cells, resistant cancer cells showed significant up-regulation of cell proliferation-related pathways, including cellular response to hypoxia, MAPK signaling, and NF-κB signaling, as well as cell cycle-related pathways such as G1 phase, M phase, S phase, G1/S Transition, Cell Cycle Mitotic, and Cell Cycle Checkpoints (Fig. 2C).
Additionally, pathways associated with PI3K/Akt signaling and autophagy were significantly up-regulated in resistant cancer cells. The PI3K/Akt signaling pathway not only contributes to ovarian cancer development and tumorigenesis but also plays a significant role in the chemoresistance of ovarian cancer cells to platinum-based drugs46,47. Targeting Akt signaling has therefore become an appealing strategy for overcoming chemoresistance in ovarian malignancies. The induction of autophagy has also been identified as a contributing factor to cisplatin resistance in human ovarian cancer cells48.
Differential gene expression analysis and Gene Set Enrichment Analysis (GSEA) between sensitive and resistant cancer cells. (A) UMAP plot of all cancer cells in the two groups (sensitive and resistant). (B) Heatmap of z-scored expression of the top 25 up-regulated and 25 down-regulated genes in sensitive and resistant cancer cells. (C) Significantly altered molecular pathways (adjusted P value < 0.05 and normalized enrichment score (NES) > 1) between sensitive and resistant cancer cells. Pathways are from the Reactome database.
Identification of optimal gene combination: high performance of the GPPS model in predicting platinum response
Because these differentially expressed signature genes reflect the states of resistant and sensitive cancer cells, we hypothesized that their expression pattern might predict platinum response in ovarian cancer. To test this hypothesis, we collected a total of 404 samples (TCGA ovarian cancer: 287; GEO: 117) from four datasets (TCGA ovarian cancer, GSE15622, GSE51373 and GSE63885), of which 71 were classified as sensitive and 333 as resistant to platinum treatment. Given the potential for batch effects in datasets from different sources, we performed principal component analysis (PCA) across the four datasets. The PCA plot showed distinct clustering of samples by dataset, indicating the presence of batch effects (Figure S2A). To mitigate these effects, we used Rank-in, which is designed to integrate microarray and RNA-seq data for cancer, to remove the batch effect (Figure S2B). The integrated GSE15622, GSE51373, and TCGA OV data were then randomly divided into training and test sets at an 8:2 ratio, with GSE63885 reserved as an independent validation set. To train the machine learning models, we used fivefold cross-validation in scikit-learn49 to identify optimal hyperparameters. This method partitions the training set into five equally sized subsets; in each iteration, four subsets are used for training and the remaining subset serves as the validation set. The process is repeated five times so that each subset is used for validation exactly once. By evaluating the model across different partitions, this approach gives a more comprehensive assessment of its generalization ability and mitigates biases that may arise from any single data split.
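The split-and-tune procedure above maps directly onto scikit-learn. The sketch below is illustrative only: the hyperparameter grid, stratified splitting (which keeps the sensitive/resistant ratio similar across splits given the 71:333 imbalance), `random_state`, and ROC AUC as the tuning metric are assumptions, since the text does not report them, and the data are synthetic. With `refit=True`, `GridSearchCV` retrains the best configuration on the whole training set, which corresponds to the refitting step described in the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split


def tune_random_forest(X, y, seed=0):
    """8:2 train/test split, then 5-fold CV grid search on the training set."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)
    param_grid = {  # hypothetical grid; the study's grid is not reported
        "n_estimators": [50, 100],
        "max_depth": [None, 3],
    }
    search = GridSearchCV(
        RandomForestClassifier(random_state=seed),
        param_grid,
        scoring="roc_auc",
        cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=seed),
        refit=True,  # retrain the best setting on the full training set
    )
    search.fit(X_train, y_train)
    test_auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])
    return search, test_auc


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 5))  # synthetic stand-in for expression data
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
    search, auc = tune_random_forest(X, y)
    print(search.best_params_, round(auc, 3))
```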
Once the optimal hyperparameters were determined, the model was reinitialized with these parameters and trained on the entire training set, making full use of the available data. To identify the model that predicts platinum resistance best with the fewest genes, we followed a stepwise approach. We first added feature genes 5 at a time, up to a total of 50 genes. For each number of feature genes, we then evaluated the performance of 7 machine learning models, namely Decision Tree (DT), Gradient Boosted Decision Trees (GBDT), K-Nearest Neighbors (KNN), Logistic Regression (LR), Multi-layer Perceptron (MLP), Random Forest (RF), and Support Vector Machine (SVM), using the area under the ROC curve (AUC). The RF model consistently showed the highest performance across feature-set sizes and outperformed the other models in predicting platinum resistance, followed by the KNN model. Conversely, LR performed poorly across all numbers of features. The AUC values of the remaining models (DT, GBDT, MLP and SVM) varied considerably with the number of feature genes (Fig. 3A). The specificity and sensitivity curves of these models support this conclusion (Fig. 3B, C). With 5 feature genes, the RF model achieved an AUC of 0.993.
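The stepwise comparison can be sketched as a loop over feature-set sizes and model families. Assumptions: `ranked_idx` is a hypothetical ordering of candidate genes (for example by differential-expression significance, which the text does not specify), the models use scikit-learn defaults except where noted, and scale-sensitive models (KNN, LR, MLP, SVM) are preceded by standardization. In the study, hyperparameters were tuned by fivefold cross-validation; tuning is omitted here for brevity.

```python
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Untuned templates for the seven model families compared in the text
MODELS = {
    "DT": DecisionTreeClassifier(random_state=0),
    "GBDT": GradientBoostingClassifier(random_state=0),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "MLP": make_pipeline(StandardScaler(),
                         MLPClassifier(max_iter=2000, random_state=0)),
    "RF": RandomForestClassifier(random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC(probability=True, random_state=0)),
}


def auc_by_gene_count(X_train, y_train, X_test, y_test, ranked_idx,
                      step=5, max_genes=50):
    """Test-set AUC for each model using the top-k ranked genes, k = step..max."""
    results = {name: {} for name in MODELS}
    for k in range(step, max_genes + 1, step):
        cols = np.asarray(ranked_idx[:k])
        for name, template in MODELS.items():
            model = clone(template).fit(X_train[:, cols], y_train)
            prob = model.predict_proba(X_test[:, cols])[:, 1]
            results[name][k] = roc_auc_score(y_test, prob)
    return results
```

Plotting `results[name]` against `k` for each model gives curves of the kind summarized in Fig. 3A; sensitivity and specificity curves would follow the same loop with thresholded predictions.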
However, further increasing the number of genes yielded only limited improvement in the performance of the RF model, particularly in AUC (Fig. 3A). To determine the optimal gene combination, we sequentially added genes to the RF model and found that the model performed best with 5 genes (Fig. 3D–F). Of these 5 genes, 3 were up-regulated (PAX2: P < 0.001; CRISP3: P < 0.001; ADIRF: P < 0.01; rank-sum test) and 2 were down-regulated (TFPI2: P < 0.001; APOA1: P < 0.001; rank-sum test) in resistant compared with sensitive patients (Figure S3 A-E). Importantly, this trend is consistent with the single-cell sequencing data comparing resistant with sensitive cancer cells. Based on the strong performance of the 5-gene model in predicting platinum resistance, we designated this model GPPS (Genes of Predicted Platinum Response).
Performance comparison of different models for predicting response to platinum chemotherapy using varying numbers of feature genes. (A–C) AUC, specificity, and sensitivity of each model as feature genes are added five at a time. (D–F) Identification of the optimal gene combination: AUC, specificity, and sensitivity evaluated among the 5 genes.
Evaluation of the final model
To evaluate the GPPS model, we examined the GPPS scores of samples in the training, test and validation sets. During fivefold cross-validation, each fold was held out in turn and GPPS scores were calculated for the samples in that fold. The scores from all 5 folds were then combined to obtain GPPS scores for the entire training set. Patients classified as resistant to platinum treatment had significantly higher scores than sensitive patients (training set: P < 0.001, Fig. 4A). Similarly, in the test and independent validation sets, GPPS scores were significantly higher in resistant than in sensitive patients (test set: P < 0.001; validation set: P < 0.001; rank-sum test; Fig. 4B, C). Based on the optimal threshold of 0.5, determined by ROC curve analysis on the training set, patients with scores above 0.5 were assigned to the GPPS-positive group and the remaining patients to the GPPS-negative group. This threshold maximized the sum of sensitivity and specificity and thus best distinguished the two groups. Fisher's exact test showed that GPPS-positive patients had a significantly higher rate of resistance to platinum chemotherapy than GPPS-negative patients in the training, test and independent validation sets (Fig. 4D–F). The GPPS model showed excellent discrimination, with a mean AUC of 1 (fivefold cross-validation) in the training set and AUCs of 0.993 in the test set and 0.989 in the validation set (Fig. 4G–I). Additionally, using the GSE63885 dataset, which includes BRCA1 mutation data, we evaluated the predictive value of BRCA1 mutations for platinum chemotherapy response.
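The out-of-fold scoring and the 0.5 cutoff can be sketched with scikit-learn's `cross_val_predict`. Here the GPPS score is assumed to be the random forest's predicted probability of resistance (class 1), which the text implies but does not state, and the helper names are hypothetical. One subtlety: pooling out-of-fold scores and computing a single AUC is not identical to averaging per-fold AUCs (the "mean AUC" reported above), although both are common. The resulting 2 × 2 table can be passed to `scipy.stats.fisher_exact` for Fisher's exact test.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_predict


def out_of_fold_scores(model, X, y, seed=0):
    """Each sample is scored by a model trained on the other four folds."""
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    return cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]


def gpps_positive(scores, threshold=0.5):
    """GPPS-positive when the score exceeds the threshold (strictly greater)."""
    return np.asarray(scores) > threshold


def contingency_table(positive, resistant):
    """Rows: GPPS-positive / GPPS-negative; columns: resistant / sensitive."""
    positive = np.asarray(positive, dtype=bool)
    resistant = np.asarray(resistant, dtype=bool)
    return np.array([
        [np.sum(positive & resistant), np.sum(positive & ~resistant)],
        [np.sum(~positive & resistant), np.sum(~positive & ~resistant)],
    ])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))  # synthetic stand-in for the 5 GPPS genes
    y = (X[:, 0] > 0).astype(int)  # 1 = resistant
    scores = out_of_fold_scores(RandomForestClassifier(random_state=0), X, y)
    print(contingency_table(gpps_positive(scores), y == 1))
```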
Our analysis revealed no significant difference in the proportion of platinum-sensitive patients, prognosis, or survival between patients with BRCA1 mutations and those with wild-type BRCA1 (Figure S4). We also compared the predictive performance of the GPPS score with that of other molecular signature models. The GPPS score showed superior accuracy in predicting platinum chemotherapy response, achieving an AUC of 0.989 (Fig. 4I), compared with AUC values of 0.74 to 0.92 reported for other models validated on the GSE63885 dataset50,51,52. These findings consistently demonstrate that the GPPS model effectively distinguishes between patients who are likely to be resistant or sensitive to platinum-based therapies.
Verification of the final model's performance. (A–C) Differences in GPPS scores between resistant and sensitive patients in the training, test and independent validation sets. (D–F) Fisher's exact test plots showing the association between response to platinum chemotherapy and GPPS status in the training, test and independent validation sets. (G–I) ROC curves of the GPPS model in the training, test and independent validation sets. ***P < 0.001, ****P < 0.0001, rank-sum test.
GPPS can predict the prognosis of ovarian cancer patients treated with platinum
Patients were classified into GPPS-positive and GPPS-negative groups based on the GPPS score. Overall survival differed significantly between these two subgroups in the TCGA cohort (Fig. 5A, P < 0.0001) and the GSE63885 validation cohort (Fig. 5C, P < 0.0001). In both the TCGA cohort (Fig. 5B) and GSE63885 (Fig. 5D), the difference remained statistically significant after adjustment for age, clinical stage and tumor grade, indicating that the GPPS score can serve as an independent prognostic factor. Therefore, GPPS may be a useful model for predicting the prognosis of ovarian cancer patients treated with platinum.
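The survival comparison relies on the Kaplan–Meier estimator and the log-rank test, computed in the study with the R survival and survminer packages. The NumPy sketch below writes both out for two groups to make explicit what the reported P values test; the function names are hypothetical, and the multivariable Cox adjustment for age, stage and grade is not reproduced. For one degree of freedom, the chi-square tail probability equals `erfc(sqrt(x / 2))`, which avoids a SciPy dependency.

```python
import math

import numpy as np


def kaplan_meier(time, event):
    """Return event times and the Kaplan-Meier survival estimate at each."""
    t = np.asarray(time, dtype=float)
    e = np.asarray(event, dtype=bool)
    event_times = np.unique(t[e])
    surv, s = [], 1.0
    for u in event_times:
        at_risk = np.sum(t >= u)
        deaths = np.sum((t == u) & e)
        s *= 1.0 - deaths / at_risk  # conditional survival past time u
        surv.append(s)
    return event_times, np.array(surv)


def logrank_test(time, event, group):
    """Two-group log-rank test; returns (chi-square statistic, P value)."""
    t = np.asarray(time, dtype=float)
    e = np.asarray(event, dtype=bool)
    g = np.asarray(group, dtype=bool)
    o_minus_e, var = 0.0, 0.0
    for u in np.unique(t[e]):
        at_risk = t >= u
        n, n1 = at_risk.sum(), (at_risk & g).sum()
        d = ((t == u) & e).sum()
        d1 = ((t == u) & e & g).sum()
        o_minus_e += d1 - d * n1 / n  # observed minus expected in group 1
        if n > 1:  # hypergeometric variance of d1
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # chi-square (1 df) tail
```

Censored patients (event = 0) contribute to the number at risk until their censoring time but never to the death counts, which is how both functions use follow-up from patients who had not progressed or died by the end of the study.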
GPPS score can function as a prognostic index for ovarian cancer patients. (A) Kaplan–Meier plots of survival probability for the GPPS-negative and GPPS-positive groups in the TCGA ovarian cancer cohort. (B) Forest plot of the multivariate Cox model depicting the association between overall survival and GPPS subgroups, with other clinical factors considered, in the TCGA ovarian cancer cohort. (C) Kaplan–Meier plots of survival probability for the GPPS-negative and GPPS-positive groups in the GSE63885 ovarian cancer cohort. (D) Forest plot of the multivariate Cox model depicting the association between overall survival and GPPS subgroups, with other clinical factors considered, in the GSE63885 ovarian cancer cohort.
Discussion
Platinum-based chemotherapy remains a fundamental component of cancer treatment across various malignancies. Despite extensive research efforts spanning several decades, the identification of reliable predictive biomarkers for platinum response in clinical practice has remained challenging. The availability of such biomarkers would have substantial implications for cancer therapy, as it would enable the selection of patients who are more likely to benefit from platinum-based therapy. Moreover, it would facilitate the design of clinical trials aimed at evaluating novel therapeutic strategies and ultimately improving patient outcomes53.
In this study, we applied dimensionality reduction and clustering to single-cell transcriptome data to identify ovarian cancer epithelial cells. Cancer cells were then isolated from the epithelial cell population on the basis of their higher copy number variation relative to normal epithelial cells. A second round of dimensionality reduction and clustering separated the cancer cells from sensitive and resistant patients into distinct clusters, indicating that sensitive and resistant cancer cells have distinct gene expression patterns. Comparing differentially expressed genes in cancer cells from resistant and sensitive patients, we identified several feature genes previously associated with platinum chemotherapy. Gene set enrichment analysis showed upregulation of cell cycle and cell proliferation-related pathways in resistant cancer cells, suggesting that they proliferate more actively and are more malignant than sensitive cancer cells. In addition, pathways known to contribute to platinum resistance in ovarian cancer, such as autophagy and PI3K/AKT signaling, were upregulated in resistant cells. To exploit the ability of these feature genes to distinguish sensitive from resistant cancer cell states, we explored several machine learning models and different combinations of gene sets. Ultimately, we developed a random forest model, GPPS, based on five characteristic genes (APOA1, TFPI2, PAX2, CRISP3, ADIRF). APOA1, a key regulator of lipid metabolism, has been implicated in platinum resistance in squamous cervical cancer. In vitro cell line studies have shown that APOA1 enhances chemoresistance by activating the P38 MAPK and PI3K pathways, promoting tumor cell survival54.
By contrast, in ovarian cancer, increased APOA1 expression may enhance the sensitivity of cancer cells to platinum through effects on the AKT signaling pathway, although AKT was significantly activated in only one of the cell lines tested55. Tissue factor pathway inhibitor 2 (TFPI2) has been widely studied in various cancers, including cervical, gastric, breast, and colorectal cancers. Research suggests that TFPI2 acts as a tumor suppressor but is frequently silenced by promoter hypermethylation, which contributes to cancer progression56,57,58,59. PAX2, a transcription factor involved in tissue development, has been shown to influence platinum sensitivity in renal carcinoma and ovarian cancer45, and its inactivation enhances cisplatin-induced apoptosis in renal carcinoma cells44. Cysteine-rich secretory protein 3 (CRISP3) has been implicated in the progression of multiple cancers, including prostate and cervical cancer, and studies suggest that it plays a role in tumor cell invasion, adhesion, and modulation of the immune response. In prostate cancer, CRISP3 is one of the most highly upregulated proteins during the transition from normal epithelium to malignancy. It has been shown to enhance tumor invasion and progression by regulating cell–cell adhesion proteins such as LASP1 and TJP1, thereby facilitating cancer cell migration60. CRISP3 expression in prostate cancer is also strongly associated with PTEN deletion and ERG fusion status, defining a molecular subtype with poorer prognosis61. Notably, these conclusions derive primarily from animal models, and further validation in human studies is needed to confirm their clinical applicability. Finally, ADIRF, although less studied in the context of platinum resistance, has been linked to fatty acid metabolism and tumor survival, suggesting a potential role in chemotherapy response, particularly in metabolically active cancers62,63.
The genes ADIRF, TFPI2, APOA1, PAX2, and CRISP3 have thus been linked to the response to platinum-based chemotherapy or to cancer development. A better understanding of their roles in chemotherapy resistance and sensitivity may lead to better biomarker identification and improved treatment strategies.
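The cancer-cell isolation step summarized at the start of this Discussion relies on inferred copy number variation. The reference list includes inferCNV, a tool commonly used for this purpose. The sketch below is a simplified CNV-score heuristic and does not implement inferCNV's algorithm. It makes the following assumptions:

- expression is log-normalized;
- genes are already ordered by genomic position on a single chromosome;
- normal epithelial cells serve as the reference.

Each cell's expression is centered on the reference mean and clipped, then smoothed with a moving window. The cell is scored by the mean squared smoothed signal. The window size, clipping value and reference-quantile cutoff are hypothetical parameters chosen for illustration.

```python
from statistics import mean
from typing import Sequence


def cnv_profile(cell: Sequence[float], ref_mean: Sequence[float],
                window: int = 3, clip: float = 3.0) -> list[float]:
    """Moving average of clipped expression relative to the reference mean.

    Assumes genes are ordered by genomic position (one chromosome, for simplicity).
    """
    rel = [max(-clip, min(clip, x - r)) for x, r in zip(cell, ref_mean)]
    half = window // 2
    return [mean(rel[max(0, i - half): i + half + 1]) for i in range(len(rel))]


def cnv_score(cell: Sequence[float], ref_mean: Sequence[float], window: int = 3) -> float:
    """Mean squared smoothed CNV signal; larger values suggest more copy number change."""
    return mean(v * v for v in cnv_profile(cell, ref_mean, window))


def call_malignant(cells, reference, window: int = 3, quantile: float = 0.95) -> list[bool]:
    """Flag cells whose CNV score exceeds a quantile of the reference-cell scores."""
    n_genes = len(reference[0])
    ref_mean = [mean(c[g] for c in reference) for g in range(n_genes)]
    ref_scores = sorted(cnv_score(c, ref_mean, window) for c in reference)
    cutoff = ref_scores[min(len(ref_scores) - 1, int(quantile * len(ref_scores)))]
    return [cnv_score(c, ref_mean, window) > cutoff for c in cells]
```

Smoothing across neighboring genes is what makes the score sensitive to CNV. A gain or loss shifts a contiguous block of genes in the same direction, whereas expression noise in individual genes tends to average out.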
Accumulating evidence indicates that BRCA mutations function as predictive biomarkers of sensitivity to platinum-based chemotherapy in ovarian cancer64,65,66. The proteins encoded by BRCA1/2 are essential mediators of homologous recombination repair (HRR), the primary pathway for resolving DNA double-strand breaks (DSBs). Pathogenic mutations in these genes cause homologous recombination deficiency (HRD), compromising the cell's ability to repair the DNA interstrand crosslinks (ICLs) generated by platinum agents such as cisplatin and carboplatin. Tumor cells harboring BRCA mutations are therefore highly sensitive to platinum-based chemotherapy, because unresolved ICLs lead to lethal genomic instability and apoptosis67,68,69. However, emerging evidence challenges the assumption that BRCA1/2 mutations reliably predict platinum sensitivity in all ovarian cancer patients70. We used the GSE63885 dataset, which contains BRCA1 mutation information, to assess the predictive value of BRCA1 mutations for response to platinum chemotherapy. Our analysis likewise challenges the notion that BRCA1 mutations alone can reliably predict platinum sensitivity across all ovarian cancer patients. This discrepancy may arise partly because GSE63885 provides only BRCA1 mutation data and lacks BRCA2 mutation information. One study reported that BRCA2 mutations were significantly more strongly associated with platinum sensitivity than BRCA1 mutations (p < 0.01). In that study, patients with BRCA2 mutations tended to have longer platinum-free survival and greater sensitivity to chemotherapy, whereas patients with BRCA1 mutations generally showed a lower response rate71.
These findings suggest that different types of BRCA mutation may differ in their capacity to predict platinum chemotherapy outcomes. In our analyses, the GPPS score showed strong predictive accuracy for both platinum chemotherapy response and prognosis. Integrating BRCA mutation data with the GPPS score may therefore further improve prediction of platinum response in ovarian cancer. Furthermore, the GPPS score outperformed other molecular combination models in predicting platinum chemotherapy response and prognosis, suggesting that it may provide a more comprehensive and reliable tool for guiding treatment decisions in ovarian cancer. Taken together, these findings indicate that GPPS may offer better predictive capability for platinum response in ovarian cancer than currently available biomarkers.
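Testing whether BRCA1 mutation status predicts platinum response, as described above for GSE63885, amounts to evaluating a binary marker against a binary outcome. The minimal sketch below assumes per-patient 0/1 vectors for mutation status and platinum sensitivity. The function name and input layout are hypothetical, and the example values in the check are made up rather than taken from GSE63885.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class MarkerMetrics:
    sensitivity: float  # P(marker+ | platinum-sensitive)
    specificity: float  # P(marker- | platinum-resistant)
    ppv: float          # P(sensitive | marker+)
    npv: float          # P(resistant | marker-)
    accuracy: float


def _ratio(num: int, den: int) -> float:
    """Safe division: returns NaN when a table margin is empty."""
    return num / den if den else float("nan")


def evaluate_marker(marker: Sequence[int], sensitive: Sequence[int]) -> MarkerMetrics:
    """Compare a binary marker (1 = mutated) with binary response (1 = platinum-sensitive)."""
    pairs = list(zip(marker, sensitive))
    tp = sum(1 for m, s in pairs if m and s)
    fp = sum(1 for m, s in pairs if m and not s)
    fn = sum(1 for m, s in pairs if not m and s)
    tn = sum(1 for m, s in pairs if not m and not s)
    return MarkerMetrics(
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        ppv=_ratio(tp, tp + fp),
        npv=_ratio(tn, tn + fn),
        accuracy=_ratio(tp + tn, len(pairs)),
    )
```

A marker can be strongly associated with sensitivity in carriers yet still perform poorly as a population-wide predictor. If most sensitive patients are non-carriers, sensitivity stays low even when PPV is high, which is one way a single-gene marker can fall short across all patients.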
An important limitation of this study is that our samples were derived from existing public datasets. Although these datasets provide valuable insights, they may not fully represent the broader patient population or other specific cohorts, so the generalizability of our models requires further investigation. Future research will incorporate larger and more diverse datasets to better establish the models' applicability across different populations. We also welcome other researchers to validate our models in independent cohorts, which would provide more comprehensive evidence of their robustness and potential for broader application.
Data availability
The datasets supporting the conclusions of this article are available in the Gene Expression Omnibus (GEO; GSE165897, GSE51373, GSE63885, GSE15622) and, for the TCGA ovarian cancer cohort, in the UCSC Xena browser (xenabrowser).
References
Siegel, R. L., Giaquinto, A. N. & Jemal, A. Cancer statistics, 2024. CA Cancer J. Clin. 74(1), 12–49 (2024).
Lheureux, S., Braunstein, M. & Oza, A. M. Epithelial ovarian cancer: Evolution of management in the era of precision medicine. CA Cancer J. Clin. 69(4), 280–304 (2019).
Luo, S. et al. Clonal tumor mutations in homologous recombination genes predict favorable clinical outcome in ovarian cancer treated with platinum-based chemotherapy. Gynecol. Oncol. 158(1), 66–76 (2020).
Richardson, D. L., Eskander, R. N. & O’Malley, D. M. Advances in ovarian cancer care and unmet treatment needs for patients with platinum resistance: A narrative review. JAMA Oncol. 9(6), 851–859 (2023).
Monk, B. J., Minion, L. & Coleman, R. Anti-angiogenic agents in ovarian cancer: past, present, and future. Ann. Oncol. 27, i33–i39 (2016).
Nero, C. et al. Ovarian cancer treatments strategy: focus on PARP inhibitors and immune check point inhibitors. Cancers (Basel) 13(6), 1298 (2021).
Yue, H. & Lu, X. Metabolic reprogramming of the ovarian cancer microenvironment in the development of antiangiogenic resistance: Metabolic reprogramming of the OC microenvironment. Acta Biochim. Biophys. Sin. (Shanghai) 55(6), 938 (2023).
Zhang, W. et al. The benefits and side effects of bevacizumab for the treatment of recurrent ovarian cancer. Curr. Drug Targets 18(10), 1125–1131 (2017).
Mirza, M. R. et al. The forefront of ovarian cancer therapy: update on PARP inhibitors. Ann. Oncol. 31(9), 1148–1159 (2020).
Li, Z. & Li, H. Ovarian cancer treatment strategies: focus on PARP inhibitors. In Proc. International Conference on Modern Medicine and Global Health (ICMMGH 2023) 245–251 (SPIE, 2023).
Mirza, M. R. et al. The forefront of ovarian cancer therapy: update on PARP inhibitors. Ann. Oncol. 31(9), 1148–1159 (2020).
Hao, J. et al. Efficacy and safety of PARP inhibitors in the treatment of advanced ovarian cancer: An updated systematic review and meta-analysis of randomized controlled trials. Crit. Rev. Oncol. Hematol. 157, 103145 (2021).
Hirschl, N. et al. PARP inhibitors: Strategic use and optimal management in ovarian cancer. Cancers (Basel) 16(5), 932 (2024).
Yang, Y. et al. The efficacy and safety of the addition of poly ADP-ribose polymerase (PARP) inhibitors to therapy for ovarian cancer: A systematic review and meta-analysis. World J. Surg. Oncol. 18, 1–11 (2020).
Oza, A. M. et al. Rucaparib versus chemotherapy for treatment of relapsed ovarian cancer with deleterious BRCA1 or BRCA2 mutation (ARIEL4): final results of an international, open-label, randomised, phase 3 trial. Lancet Oncol. 26(2), 249–264 (2025).
Norquist, B. et al. Secondary somatic mutations restoring BRCA1/2 predict chemotherapy resistance in hereditary ovarian carcinomas. J. Clin. Oncol. 29(22), 3008–3015 (2011).
Butler, A., Hoffman, P., Smibert, P., Papalexi, E. & Satija, R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 36(5), 411–420 (2018).
McCarthy, D. J., Campbell, K. R., Lun, A. T. & Wills, Q. F. Scater: pre-processing, quality control, normalization and visualization of single-cell RNA-seq data in R. Bioinformatics 33(8), 1179–1186 (2017).
Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: Large-scale single-cell gene expression data analysis. Genome Biol. 19(1), 15 (2018).
Lun, A. T., McCarthy, D. J. & Marioni, J. C. A step-by-step workflow for low-level analysis of single-cell RNA-seq data with bioconductor. F1000Res 5, 2122 (2016).
van den Brink, S. C. et al. Single-cell sequencing reveals dissociation-induced gene expression in tissue subpopulations. Nat. Methods 14(10), 935–936 (2017).
Zhang, K. et al. Longitudinal single-cell RNA-seq analysis reveals stress-promoted chemoresistance in metastatic ovarian cancer. Sci. Adv. 8(8), eabm1831 (2022).
Xi, N. M. & Li, J. Benchmarking computational doublet-detection methods for single-cell RNA sequencing data. Cell Syst. 12(2), 176–194.e176 (2021).
Hao, Y. et al. Integrated analysis of multimodal single-cell data. Cell 184(13), 3573–3587.e3529 (2021).
inferCNV of the Trinity CTAT Project. https://github.com/broadinstitute/inferCNV.
Maynard, A. et al. Therapy-induced evolution of human lung cancer revealed by single-cell RNA sequencing. Cell 182(5), 1232–1251.e1222 (2020).
Kim, N. et al. Single-cell RNA sequencing demonstrates the molecular and cellular reprogramming of metastatic lung adenocarcinoma. Nat. Commun. 11(1), 2285 (2020).
Chen, K. et al. Single cell RNA-seq reveals the CCL5/SDC1 receptor-ligand interaction between T cells and tumor cells in pancreatic cancer. Cancer Lett. 545, 215834 (2022).
Liu, Y. et al. Tumour heterogeneity and intercellular networks of nasopharyngeal carcinoma at single cell resolution. Nat. Commun. 12(1), 741 (2021).
Oketch, D. J., Giulietti, M. & Piva, F. A comparison of tools that identify tumor cells by inferring copy number variations from single-cell experiments in pancreatic ductal adenocarcinoma. Biomedicines 12(8), 1759 (2024).
Hao, Y. et al. Dictionary learning for integrative, multimodal and scalable single-cell analysis. Nat. Biotechnol. 42(2), 293–304 (2024).
Leek, J. T. et al. Tackling the widespread and critical impact of batch effects in high-throughput data. Nat. Rev. Genet. 11(10), 733–739 (2010).
Koti, M. et al. Identification of the IGF1/PI3K/NF kappaB/ERK gene signalling networks associated with chemotherapy resistance and treatment response in high-grade serous epithelial ovarian cancer. BMC Cancer 13, 549 (2013).
Lisowska, K. M. et al. Unsupervised analysis reveals two molecular subgroups of serous ovarian cancer with distinct gene expression profiles and survival. J. Cancer Res. Clin. Oncol. 142(6), 1239–1252 (2016).
Ahmed, A. A. et al. The extracellular matrix protein TGFBI induces microtubule stabilization and sensitizes ovarian cancers to paclitaxel. Cancer Cell 12(6), 514–527 (2007).
Tang, K. et al. Rank-in: enabling integrative analysis across microarray and RNA-seq for cancer. Nucleic Acids Res. 49(17), e99 (2021).
Xu, S. et al. Using clusterProfiler to characterize multiomics data. Nat. Protoc. 19(11), 3292–3320 (2024).
Jassal, B. et al. The reactome pathway knowledgebase. Nucleic Acids Res. 48(D1), D498–D503 (2020).
Jovic, D. et al. Single-cell RNA sequencing technologies and applications: A brief overview. Clin. Transl. Med. 12(3), e694 (2022).
Germain, P.-L., Lun, A., Meixide, C. G., Macnair, W. & Robinson, M. Doublet identification in single-cell sequencing data using scDblFinder. F1000Res 10, 979 (2022).
Maynard, A. et al. Therapy-induced evolution of human lung cancer revealed by single-cell RNA sequencing. Cell 182(5), 1232–1251.e1222 (2020).
Neftel, C. et al. An integrative model of cellular states, plasticity, and genetics for glioblastoma. Cell 178(4), 835–849.e821 (2019).
Marinho, A. T. et al. Anti-tumorigenic and platinum-sensitizing effects of apolipoprotein A1 and apolipoprotein A1 mimetic peptides in ovarian cancer. Front Pharmacol. 9, 1524 (2018).
Hueber, P.-A., Waters, P., Clarke, P., Eccles, M. & Goodyer, P. PAX2 inactivation enhances cisplatin-induced apoptosis in renal carcinoma cells. Kidney Int. 69(7), 1139–1145 (2006).
Feng, Y. et al. PAX2 promotes epithelial ovarian cancer progression involving fatty acid metabolic reprogramming. Int. J. Oncol. 56(3), 697–708 (2020).
Blagden, S. & Gabra, H. Promising molecular targets in ovarian cancer. Curr. Opin. Oncol. 21(5), 412–419 (2009).
Brasseur, K., Gevry, N. & Asselin, E. Chemoresistance and targeted therapies in ovarian and endometrial cancers. Oncotarget 8(3), 4008–4042 (2017).
Bao, L. et al. Induction of autophagy contributes to cisplatin resistance in human ovarian cancer cells. Mol. Med. Rep. 11(1), 91–98 (2015).
Pedregosa, F. et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Zhao, H. et al. High expression levels of AGGF1 and MFAP4 predict primary platinum-based chemoresistance and are associated with adverse prognosis in patients with serous ovarian cancer. J. Cancer 10(2), 397 (2019).
Xi, Y. et al. A chemotherapy response prediction model derived from tumor-promoting B and Tregs and proinflammatory macrophages in HGSOC. Front Oncol. 13, 1171582 (2023).
Chen, S. et al. A risk model of gene signatures for predicting platinum response and survival in ovarian cancer. J. Ovarian Res. 15(1), 39 (2022).
Huang, D. et al. A highly annotated database of genes associated with platinum resistance in cancer. Oncogene 40(46), 6395–6405 (2021).
He, Y., Han, S.-B., Liu, Y., Zhang, J.-J. & Wu, Y.-M. Role of APOA1 in the resistance to platinum-based chemotherapy in squamous cervical cancer. BMC Cancer 22(1), 411 (2022).
Marinho, A. T. et al. Anti-tumorigenic and platinum-sensitizing effects of apolipoprotein A1 and apolipoprotein A1 mimetic peptides in ovarian cancer. Front Pharmacol. 9, 1524 (2019).
Hibi, K. et al. Methylation of TFPI2 gene is frequently detected in advanced well-differentiated colorectal cancer. Anticancer Res. 30(4), 1205–1207 (2010).
Takada, H. et al. Tissue factor pathway inhibitor 2 (TFPI2) is frequently silenced by aberrant promoter hypermethylation in gastric cancer. Cancer Genet. Cytogenet. 197(1), 16–24 (2010).
Fullar, A. et al. Two ways of epigenetic silencing of TFPI2 in cervical cancer. PLoS ONE 15(6), e0234873 (2020).
Stavik, B. et al. TFPIα and TFPIβ are expressed at the surface of breast cancer cells and inhibit TF-FVIIa activity. J. Hematol. Oncol. 6, 1–14 (2013).
Volpert, M. et al. CRISP3 expression drives prostate cancer invasion and progression. Endocr. Relat. Cancer 27(7), 415–430 (2020).
Al Bashir, S. et al. Cysteine-rich secretory protein 3 (CRISP3), ERG and PTEN define a molecular subtype of prostate cancer with implication to patients’ prognosis. J. Hematol. Oncol. 7, 1–11 (2014).
Teng, Y., Zhao, X., Xi, Y. & Fu, N. N6-methyladenosine-regulated ADIRF impairs lung adenocarcinoma metastasis and serves as a potential prognostic biomarker. Cancer Biol. Ther. 24(1), 2249173 (2023).
Guo, M. et al. ADIRF expression reversely correlates with stage progression and involves keratinocyte differentiation in esophageal squamous cell carcinoma (2022).
Jiménez Labaig, P. et al. The influence of carrying BRCA pathogenic mutations in ovarian cancer and its platinum-sensitivity: 5-year experience of a tertiary center. J. Clin. Oncol. 40, e17572 (2022).
Zikan, M., Vecerova, L., Dubova, O., Sehnal, B. & Soukupova, J. BRCA mutation carriers suffering from ovarian cancer as a model for treatment decision in higher lines–Place for platinum reinduction. J. Cancer Res. Ther. 19(3), 684–687 (2023).
Han, G. H., Cho, H., Yun, H. & Kim, J.-H. PACSIN3 is a novel biomarker for platinum resistance BRCA mutated platinum resistance epithelial ovarian cancer. J. Gene Med. 24, e3452 (2022).
Ceppi, I. et al. Mechanism of BRCA1–BARD1 function in DNA end resection and DNA protection. Nature 634(8033), 492–500 (2024).
Pan, Z. & Xie, X. BRCA mutations in the manifestation and treatment of ovarian cancer. Oncotarget 8(57), 97657 (2017).
Rottenberg, S., Disler, C. & Perego, P. The rediscovery of platinum-based cancer therapy. Nat. Rev. Cancer 21(1), 37–50 (2021).
Akashi, H. et al. SLFN11 is a BRCA independent biomarker for the response to platinum-based chemotherapy in high-grade serous ovarian cancer and clear cell ovarian carcinoma. Mol. Cancer Ther. 23(1), 106–116 (2024).
Yang, D. et al. Association of BRCA1 and BRCA2 mutations with survival, chemotherapy sensitivity, and gene mutator phenotype in patients with ovarian cancer. JAMA 306(14), 1557–1565 (2011).
Author information
Contributions
Suxia Han designed the study, Tingting Gao analyzed and interpreted the results, and Peng Zhao collected the public data.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Gao, T., Zhao, P. & Han, S. Integrating bulk RNA-seq and scRNA-seq analyses with machine learning to predict platinum response and prognosis in ovarian cancer. Sci Rep 15, 19123 (2025). https://doi.org/10.1038/s41598-025-99930-9
DOI: https://doi.org/10.1038/s41598-025-99930-9







