Introduction

High-grade serous ovarian cancer (HGSOC) is the most common and aggressive subtype of epithelial ovarian cancer, accounting for approximately 70% of all cases1. This disease is often referred to as the “silent killer” due to its complex pathogenesis and asymptomatic nature in early stages. Because over 75% of patients are diagnosed at an advanced stage with widespread peritoneal metastasis and malignant ascites, the prognosis is extremely poor2,3. There is an urgent need for methods to diagnose this disease at an earlier stage. Macroscopically, HGSOC typically manifests as expansive, bilateral lesions with a composite architecture comprising solid, papillary, and cystic components4. The molecular hallmark of HGSOC is near-universal mutation of the TP53 gene, which is associated with chromosomal instability and defective DNA repair mechanisms. Additionally, HGSOC often exhibits homologous recombination deficiency (HRD), making it responsive to platinum-based chemotherapy. Despite these therapeutic options, the overall survival (OS) of HGSOC patients has not significantly improved over the past two decades5.

Paclitaxel (PTX), originally extracted from the bark of the Pacific yew tree, is renowned for its ability to stabilize microtubules, thereby arresting cell division and inducing apoptosis. However, these cytotoxic effects are not selective, targeting both cancerous and normal tissue cells, leading to frequent adverse reactions such as myelosuppression, alopecia, and hypersensitivity. The development of resistance to PTX is also an inevitable challenge6. To address these issues, various combination therapies have emerged. For instance, PTX-loaded mesenchymal stem cells (MSCs) leverage the homing properties of MSCs to deliver PTX directly to tumor sites. However, the drug loading capacity and efficiency of this delivery system still need improvement7. One study encapsulated PTX in poly(lactic-co-glycolic acid) (PLGA) and combined it with MSCs to significantly enhance drug loading8. Another innovative approach used genetic engineering to enable MSCs to express tumor necrosis factor-related apoptosis-inducing ligand (TRAIL) while carrying PTX. This combination restored sensitivity to TRAIL in resistant pancreatic cancer cells, suggesting that PTX may act on specific targets within the drug resistance mechanisms of cancer cells9. Therefore, elucidating the targets of PTX in cancer cells is crucial for optimizing treatment strategies and improving patient prognosis.

In recent years, bioinformatics, machine learning, network pharmacology, and molecular docking have emerged as powerful tools for identifying therapeutic targets in cancer research. These approaches have been successfully applied in various cancers to elucidate drug mechanisms and identify potential therapeutic targets. For example, integrating proteomics and histopathology using machine learning models has demonstrated improved predictive accuracy for platinum response in HGSOC10. Molecular docking and machine learning have also been used to identify novel EGFR-targeting compounds in cervical cancer11. In osteosarcoma, a five-gene natural killer (NK) cell-based prognostic signature was constructed using LASSO Cox regression analysis, which helped predict the response of patients to immunotherapy12. In colon cancer, studies have identified hypoxia-related subtypes and established prognostic models based on hypoxia-related genes, demonstrating the potential of bioinformatics in characterizing tumor microenvironment infiltration and identifying prognostic biomarkers13. These studies underscore the potential of integrating bioinformatics and machine learning to identify therapeutic targets and improve prognostic accuracy in various cancers.

Despite these advances, a comprehensive analysis of the mechanisms and targets of PTX in HGSOC from the perspectives of bioinformatics, machine learning, and network pharmacology is lacking. This study addresses that gap and provides a novel perspective on the mechanisms of PTX in HGSOC. Investigating the therapeutic targets and primary diagnostic biomarkers of PTX in HGSOC is crucial for diagnosis and treatment, and offers a foundation and reference for future research.

Materials and methods

Data collection

The chemical structure of PTX was obtained from the PubChem database (https://pubchem.ncbi.nlm.nih.gov/) using the search term “Paclitaxel”. This structure was subsequently uploaded to the Swiss Target Prediction database (http://www.swisstargetprediction.ch/), with the search species specified as “human,” to identify potential targets of PTX.

The expression matrix and platform files for GSE54388 were manually downloaded from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/)14. This dataset contains 6 normal tissue samples and 16 tumor tissue samples. The “GEOquery” package in R was used to extract clinical information and the expression matrix15. The expression matrix was normalized, and probe IDs were converted to standard gene names. For genes with multiple probes, the mean expression value was calculated. Batch effects were removed using the ComBat function from the “sva” package16.
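The probe-to-gene step described above can be illustrated with a minimal Python sketch. It is not the authors' R code: the probe IDs, gene symbols, and expression values are hypothetical, and the helper name `collapse_probes` is introduced here only for illustration. It shows how several probes that map to one gene are averaged sample by sample.

```python
from collections import defaultdict
from statistics import mean


def collapse_probes(probe_expr, probe_to_gene):
    """Average probe-level rows into one row per gene symbol."""
    grouped = defaultdict(list)
    for probe, values in probe_expr.items():
        gene = probe_to_gene.get(probe)
        if gene:  # probes without a gene annotation are dropped
            grouped[gene].append(values)
    # zip(*rows) walks the samples column by column
    return {gene: [mean(col) for col in zip(*rows)]
            for gene, rows in grouped.items()}


# Hypothetical toy data: four probes measured in two samples
probe_expr = {
    "p1": [2.0, 4.0],
    "p2": [4.0, 6.0],
    "p3": [1.0, 1.5],
    "p4": [9.0, 9.0],  # no annotation, so it is excluded
}
probe_to_gene = {"p1": "GENE_A", "p2": "GENE_A", "p3": "GENE_B"}

gene_expr = collapse_probes(probe_expr, probe_to_gene)
print(gene_expr)
```

Averaging is only one possible rule for multi-probe genes; it is used here because it is the rule stated in the text.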

To further evaluate the accuracy and generalizability of the combined diagnostic model, two additional independent external validation datasets, GSE26712 and GSE12470, were employed17,18,19. The GSE26712 dataset includes 10 normal tissue samples and 185 tumor tissue samples, while GSE12470 comprises 10 normal tissue samples and 43 tumor tissue samples. These datasets were also manually downloaded from the GEO database. The same preprocessing steps used for GSE54388 were applied to these datasets, including normalization of the expression matrix, conversion of probe IDs to standard gene names, calculation of mean expression values for genes with multiple probes, and removal of batch effects using the ComBat function from the “sva” package.

Identification and visualization of differentially expressed genes

Differentially expressed genes (DEGs) between normal ovarian tissue and HGSOC samples were identified using the “limma” package in R20. The analysis included 6 normal and 16 HGSOC samples. DEGs were defined by |log2 fold change (log2FC)| > 1 and adjusted P value (adj. P value) < 0.05. Volcano plots and heatmaps were generated using the “ggplot2” and “pheatmap” packages, respectively, with the top 10 upregulated and downregulated genes annotated21,22.
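The thresholding rule above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the limma workflow itself: the text does not name the adjustment method, so Benjamini-Hochberg (the default in limma's `topTable`) is assumed, and the fold changes and raw P values are invented toy numbers. The labels mirror the Up / Down / Not Significant scheme used in Supplementary Table S1.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted P values (assumed adjustment method)."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def label_deg(log2fc, adj_p, fc_cut=1.0, p_cut=0.05):
    """Apply |log2FC| > 1 and adj. P < 0.05 to one gene."""
    if adj_p < p_cut and log2fc > fc_cut:
        return "Up"
    if adj_p < p_cut and log2fc < -fc_cut:
        return "Down"
    return "Not Significant"


# Hypothetical toy statistics for four genes
log2fc = [2.3, -1.8, 0.4, 1.5]
raw_p = [0.001, 0.004, 0.03, 0.2]

adj_p = bh_adjust(raw_p)
labels = [label_deg(fc, p) for fc, p in zip(log2fc, adj_p)]
print(labels)
```

The third toy gene passes the P value filter but not the fold-change filter, and the fourth does the opposite, which is why both criteria are applied jointly.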

Intersection analysis and PPI network construction

The intersection of PTX targets and differentially expressed genes was identified using the “VennDiagram” package in R and visualized with a Venn diagram23. The intersecting genes were uploaded to the STRING database (https://string-db.org/), and genes without interaction relationships were excluded24. The interaction network was saved and downloaded in tsv format, then imported into Cytoscape 3.9.0. Core targets were identified using the CytoHubba plugin to construct the PPI network25,26.
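As a rough illustration of this step, the sketch below intersects two gene sets and ranks the nodes of an interaction edge list by degree. Several points are assumptions: CytoHubba offers multiple ranking methods and the text does not say which one was used, so simple node degree is shown; the identifiers `T1`, `T2`, and so on are placeholders, not genes from this study; and the edge list is invented.

```python
from collections import Counter


def rank_by_degree(edges):
    """Rank nodes by number of interaction partners (assumed hub metric)."""
    degree = Counter()
    for a, b in edges:
        if a != b:  # ignore self-interactions
            degree[a] += 1
            degree[b] += 1
    # Highest degree first; ties broken alphabetically for stable output
    return sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))


# Placeholder identifiers, not real results
predicted_targets = {"T1", "T2", "T3", "T4", "T9"}
degs = {"T1", "T2", "T3", "T4", "T7"}
overlap = predicted_targets & degs

# Hypothetical edge list, as might be parsed from a STRING tsv export
edges = [("T1", "T2"), ("T1", "T3"), ("T2", "T3"), ("T1", "T4"), ("T1", "T9")]
edges = [(a, b) for a, b in edges if a in overlap and b in overlap]

ranking = rank_by_degree(edges)
print(ranking)
```

Genes that never appear in an edge receive no entry in the ranking, which corresponds to excluding genes without interaction relationships.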

Functional analysis and protein distribution

The “clusterProfiler” package was used to perform Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses on the intersection genes, with results visualized using the “ggplot2” package27,28,29,30 (www.kegg.jp/kegg/kegg1.html). Survival analysis was conducted using the Kaplan–Meier Plotter website (https://www.kmplot.com/analysis/)31,32. The expression levels and distribution of proteins encoded by intersection genes in normal and cancerous ovarian tissues were investigated using the Human Protein Atlas (HPA) database (https://www.proteinatlas.org/)33,34,35.

Development and validation of a combined diagnostic model

To construct a diagnostic model for HGSOC, we employed LASSO regression analysis using the R package glmnet36. The optimal regularization parameter (λ) was determined through cross-validation, and the minimum λ value (λ = 0.0003988178) was selected to balance model complexity and predictive performance. This approach ensured that the LASSO regression coefficients for the selected DEGs were stable and reproducible. The mRNA expression data from the training dataset, GSE54388, which includes 6 normal tissue samples and 16 tumor tissue samples, were used to derive the LASSO regression coefficients. The LASSO risk score (RiskScore) was calculated using the following formula:

$${\text{RiskScore}} = {\text{Intercept}} + \sum\limits_{{\text{i}}} {\left[ {{\text{mRNA Expression}}\left( {{\text{gene}}_{{\text{i}}} } \right) \times {\text{Coefficient}}\left( {{\text{gene}}_{{\text{i}}} } \right)} \right]}$$

To ensure reproducibility, the random seed was set using set.seed(500) before model training, and the logistic regression model was specified with family = “binomial” to accommodate the binary classification of HGSOC and normal tissues.
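The RiskScore formula can be made concrete with a short Python sketch. The coefficients, intercept, and expression values below are hypothetical and chosen only so the arithmetic is easy to follow; the fitted values are those in Supplementary Table S2 and are not reproduced here. Only the signs follow the text (negative for CBX7, positive for the other four genes). Because the model is a binomial (logistic) regression, the linear score is a log-odds, so a logistic transform gives a predicted probability; that conversion is a standard property of the model family and not a step described by the authors.

```python
import math


def risk_score(expr, coefs, intercept):
    """Intercept plus the coefficient-weighted sum of gene expression."""
    return intercept + sum(coefs[gene] * expr[gene] for gene in coefs)


def tumor_probability(score):
    """Logistic transform of the linear score (log-odds to probability)."""
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical coefficients; see Supplementary Table S2 for the real ones
coefs = {"AURKA": 0.5, "CBX7": -0.25, "CCNA2": 0.5,
         "HSP90AA1": 0.25, "TUBB3": 0.5}
intercept = -8.0  # hypothetical

# Hypothetical expression values for one sample
sample = {"AURKA": 8.0, "CBX7": 4.0, "CCNA2": 6.0,
          "HSP90AA1": 8.0, "TUBB3": 4.0}

score = risk_score(sample, coefs, intercept)
prob = tumor_probability(score)
print(score, prob)
```

With these toy numbers the weighted sum is 10.0 and the score is 2.0; raising CBX7 alone lowers the score, matching the direction reported for that gene.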

The robustness and generalizability of the model were evaluated using two independent external validation datasets: GSE26712, which includes 10 normal tissue samples and 185 tumor tissue samples, and GSE12470, which includes 10 normal tissue samples and 43 tumor tissue samples. Model performance was assessed by ROC curve analysis and AUC calculation using GraphPad Prism 10 software. Additionally, violin plots were generated to visualize the distribution of risk scores, revealing significantly higher scores in tumor samples compared to normal tissues. These analyses collectively ensured a thorough evaluation of the model’s predictive accuracy across diverse cohorts37.
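For readers who want to reproduce this kind of evaluation outside GraphPad Prism, the following Python sketch computes an AUC and a candidate cut-off from two lists of risk scores. It rests on assumptions: the AUC is computed with the rank-based (Mann-Whitney) formulation, and the cut-off is chosen by maximizing Youden's J (sensitivity + specificity - 1), since the text does not state the selection criterion. The scores are toy values.

```python
def auc_rank(normal, tumor):
    """AUC as the probability that a tumor score exceeds a normal score."""
    wins = 0.0
    for t in tumor:
        for n in normal:
            if t > n:
                wins += 1.0
            elif t == n:
                wins += 0.5  # ties count as half
    return wins / (len(tumor) * len(normal))


def youden_cutoff(normal, tumor):
    """Return (J, cutoff, sensitivity, specificity) maximizing Youden's J.

    A sample is called tumor when its score is >= cutoff.
    """
    best = None
    for c in sorted(set(normal) | set(tumor)):
        sens = sum(t >= c for t in tumor) / len(tumor)
        spec = sum(n < c for n in normal) / len(normal)
        j = sens + spec - 1.0
        if best is None or j > best[0]:
            best = (j, c, sens, spec)
    return best


# Hypothetical risk scores
normal_scores = [1.0, 2.0, 3.5, 4.0]
tumor_scores = [3.0, 5.0, 6.0, 7.0, 8.0]

auc = auc_rank(normal_scores, tumor_scores)
j, cutoff, sens, spec = youden_cutoff(normal_scores, tumor_scores)
print(auc, cutoff, sens, spec)
```

Other criteria, such as fixing specificity at a required level, would generally select a different threshold from the same ROC curve.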

Molecular docking validation and visualization

The chemical structure of PTX was obtained from PubChem (https://pubchem.ncbi.nlm.nih.gov/). The target protein IDs of key genes were identified using the UniProt database (https://www.uniprot.org/)38, and their corresponding 3D structures were retrieved from the PDB database (https://www.rcsb.org/)39,40. Molecular docking experiments were conducted using AutoDockTools 1.5.6 to investigate the binding interactions between PTX and its target proteins.

To ensure the robustness and reproducibility of our docking experiments, we carefully defined the docking parameters. Each protein was subjected to a unique grid box size and coordinates, tailored to its specific binding site. These parameters were optimized to ensure that the entire active site was adequately covered. Detailed grid box sizes and coordinates for each protein are provided in Supplementary Table S4. The default scoring function integrated within AutoDockTools 1.5.6 was employed to assess binding affinities. This scoring function is based on a semi-empirical free energy force field, incorporating terms for van der Waals interactions, hydrogen bonding, and electrostatic interactions, thereby providing a comprehensive evaluation of the binding energy. To achieve a thorough exploration of the conformational space, we set the exhaustiveness parameter to 20. This parameter controls the precision of the search algorithm, ensuring that multiple potential binding orientations were evaluated.

The docking outcomes were visualized using PyMOL and LigPlot+ software to elucidate the molecular interactions involved. These tools allowed us to analyze the binding modes and interactions between PTX and its target proteins in detail.

Immune cell infiltration analysis

Immune cell infiltration was analyzed in normal and ovarian cancer groups using the “CIBERSORT” package in R41. The infiltration levels of stromal and immune cells were calculated using the “Estimate” package42, with ssGSEA scoring stromal and immune cells in each sample. Scores included the ESTIMATE Score, Immune Score, and Stromal Score.

Statistical analysis

All statistical analyses and graph generation were performed using R software (version 4.4.1). P < 0.05 was considered as statistically significant.

Results

Identification and analysis of PTX targets and differentially expressed genes

The 2D structure and SMILES code of PTX were retrieved from PubChem and used to identify 100 potential targets via the Swiss Target Prediction database. Differential expression analysis between normal ovarian tissue and HGSOC samples revealed 2267 differentially expressed genes (1316 upregulated and 951 downregulated), visualized through volcano plots and heatmaps (Fig. 1A,B). The intersection of PTX targets and differentially expressed genes identified 15 overlapping genes (Fig. 1C). These genes were used to construct a protein–protein interaction (PPI) network using the STRING database, with interactions visualized in Cytoscape 3.9.0 using the cytoHubba plugin.

Fig. 1

Identification and interaction analysis of PTX targets and differentially expressed genes. (A) Volcano plot of differentially expressed genes. (B) Heatmap of differentially expressed genes. (C) Venn diagram of overlapping genes. (D) PPI network of target proteins. (E) Association degree of target proteins.

Figure 1D presents the regulatory network of the intersecting genes, highlighting their protein–protein interactions. This network provides insights into the functional relationships and potential pathways involving these genes. In Fig. 1E, the color intensity reflects the degree of interaction relevance, with darker red indicating stronger interactions. The connecting lines represent direct protein–protein interactions, revealing the key nodes and their connections within the network. These visualizations elucidate the central roles of these genes in the cellular context and their potential contributions to the pathogenesis of HGSOC. The network analysis identified several hub genes with high connectivity, including Aurora kinase A (AURKA), Cyclin A2 (CCNA2), Heat shock protein 90α (HSP90AA1), Aurora kinase B (AURKB), and Proteasome subunit beta type 5 (PSMB5), which ranked as the top 5 proteins.

Supplementary Table S1 provides detailed results of the differential expression analysis, including gene symbols, log2 fold changes, average expression levels, t-values, P values, adjusted P values, and change labels (Up, Down, Not Significant).

Functional enrichment of overlapping genes

The 15 overlapping genes were analyzed using GO and KEGG enrichment analyses with the “clusterProfiler” package. GO analysis identified significant enrichment in biological processes related to the G2/M transition of the mitotic cell cycle, cellular components such as the spindle and mitotic spindle, and molecular functions including protein kinase activity (Fig. 2A). KEGG analysis highlighted enrichment in pathways such as progesterone-mediated oocyte maturation, the cell cycle, and neuroactive ligand–receptor interaction (Fig. 2B). The KEGG network analysis further revealed that genes with higher log2 fold change (log2 FC) values, such as AURKA, AURKB, CBX7, and CCNE2, were more deeply colored, indicating their significant roles in these pathways. Notably, larger nodes in the network, including progesterone-mediated oocyte maturation, the cell cycle, and neuroactive ligand–receptor interaction, suggest that PTX may regulate these pathways to influence the progression of HGSOC.

Fig. 2

Functional enrichment of overlapping genes. (A) GO enrichment analysis of overlapping genes. (B) KEGG pathway enrichment analysis of overlapping genes, highlighting significant pathways such as progesterone-mediated oocyte maturation, cell cycle, and neuroactive ligand–receptor interaction.

Survival analysis and protein distribution of overlapping genes in HGSOC and normal tissues

Survival analysis using the Kaplan–Meier Plotter website revealed that high expression of AURKA, CCNA2, HSP90AA1, PSMB5, and Tubulin beta-3 chain (TUBB3), and low expression of Angiotensin II receptor type 1 (AGTR1), Chromobox homolog 7 (CBX7), and Complement factor D (CFD) were associated with poor overall survival in HGSOC patients (Fig. 3A). These findings suggest that the expression levels of these genes may serve as potential prognostic biomarkers for HGSOC.

Fig. 3

Survival analysis and protein distribution of overlapping genes in HGSOC and normal tissues. (A) Survival analysis using the Kaplan–Meier Plotter website, indicating that high expression of AURKA, CCNA2, HSP90AA1, PSMB5, and TUBB3, and low expression of AGTR1, CBX7, and CFD were associated with poor overall survival in HGSOC patients. (B) Protein levels of AURKA in HGSOC tissue (staining: high; intensity: strong; quantity: 75–25%) and in normal follicle cells (staining: not detected; intensity: negative; quantity: none). (C) Protein levels of CBX7 in HGSOC tissue (staining: high; intensity: strong; quantity: > 75%) and in normal follicle cells (staining: medium; intensity: moderate; quantity: > 75%), as well as in normal ovarian stroma cells (staining: low; intensity: moderate; quantity: < 25%). (D) Protein levels of CCNA2 in HGSOC tissue (staining: medium; intensity: strong; quantity: < 25%) and in normal ovarian stroma cells (staining: not detected; intensity: negative; quantity: none). (E) Protein levels of HSP90AA1 in HGSOC tissue (staining: medium; intensity: moderate; quantity: 75–25%) and in normal follicle cells (staining: not detected; intensity: negative; quantity: none), as well as in normal ovarian stroma cells (staining: not detected; intensity: negative; quantity: none). (F) Protein levels of TUBB3 in HGSOC tissue (staining: medium; intensity: moderate; quantity: 75–25%) and in normal ovarian stroma cells (staining: low; intensity: weak; quantity: > 75%). (G) Protein levels of AGTR1 in HGSOC tissue (staining: not detected; intensity: negative; quantity: none) and in normal follicle cells (staining: not detected; intensity: negative; quantity: none), as well as in normal ovarian stroma cells (staining: not detected; intensity: negative; quantity: none). (H) Protein levels of PSMB5 in HGSOC tissue (staining: not detected; intensity: negative; quantity: none) and in normal ovarian stroma cells (staining: not detected; intensity: weak; quantity: < 25%).

To further validate the expression patterns of these genes, we utilized immunohistochemistry data from the Human Protein Atlas (HPA) database43. Detailed immunohistochemical data, including staining intensity, quantity, and localization, are provided in Supplementary Fig. S1. AURKA exhibited high expression in HGSOC tissues (Fig. 3B) but was undetectable in normal follicle cells. CBX7 showed high expression in HGSOC tissues (Fig. 3C), moderate expression in normal follicle cells, and low expression in normal ovarian stroma cells. CCNA2 demonstrated moderate expression in HGSOC tissues (Fig. 3D) and was undetectable in normal ovarian stroma cells. HSP90AA1 exhibited moderate expression in HGSOC tissues (Fig. 3E) but was undetectable in both normal follicle and stroma cells. TUBB3 showed moderate expression in HGSOC tissues (Fig. 3F) and low expression in normal ovarian stroma cells. In contrast, AGTR1 and PSMB5 were generally undetectable in both HGSOC and normal tissues (Fig. 3G,H), suggesting a potentially limited role for these proteins in the pathogenesis of HGSOC.

Collectively, the survival analysis and immunohistochemical validation provide a strong basis for selecting AURKA, CBX7, CCNA2, HSP90AA1, and TUBB3 as key components in the construction of a combined diagnostic model. These genes were chosen for their significant differential expression and potential as biomarkers for disease progression and prognosis.

Construction and validation of a combined diagnostic model

We constructed a combined diagnostic model for HGSOC using LASSO regression, which integrates the expression levels of five key genes: AURKA, CBX7, CCNA2, HSP90AA1, and TUBB3. LASSO regression, a technique that penalizes coefficients to reduce model complexity while enhancing predictive performance, was optimized via cross-validation using the training dataset GSE54388, which includes 6 normal tissue samples and 16 tumor tissue samples. The minimum λ value was determined to be 0.0003988178, and this optimal λ was selected based on its predictive accuracy (Fig. 4A,B). Our model revealed that elevated expression levels of TUBB3, AURKA, CCNA2, and HSP90AA1 are associated with an increased risk of ovarian cancer, whereas higher expression of CBX7 is linked to a reduced risk. The detailed regression coefficients for these five key genes are provided in Supplementary Table S2.

Fig. 4

Construction and validation of the combined diagnostic model. (A) LASSO regression path plot showing the relationship between log(λ) and the coefficients of the five key genes. (B) Cross-validation results indicating the optimal λ value (0.0003988178) selected for model construction. (C) ROC curve analysis for the GSE26712 validation dataset. (D) Violin plot of risk scores for the GSE26712 dataset. (E) ROC curve analysis for the GSE12470 validation dataset. (F) Violin plot of risk scores for the GSE12470 dataset.

To assess the model’s predictive capability, we employed two independent external validation datasets: GSE26712 and GSE12470. ROC curve analysis performed with GraphPad Prism 10 software showed that the model distinguished normal from tumor groups with high accuracy. For the GSE26712 dataset, which includes 10 normal tissue samples and 185 tumor tissue samples, the model achieved an AUC of 0.9892 (P < 0.0001), indicating excellent diagnostic performance (Fig. 4C). Because binary diagnostic decisions in clinical practice require a clear threshold, we next identified an optimal cut-off value for separating normal from tumor tissues. A cut-off value of 18.78 corresponded to a sensitivity of 95.14% and a specificity of 100% (Fig. 4C), which supports the model’s potential for clinical application. Violin plots likewise showed significantly higher risk scores in the tumor group than in normal tissues (P < 0.0001), with the cut-off value of 18.78 depicted by a dashed line (Fig. 4D).

For the GSE12470 dataset, which includes 10 normal tissue samples and 43 tumor tissue samples, the model achieved an AUC of 0.9465 (P < 0.0001), further confirming its robustness (Fig. 4E). Similarly, we identified a cut-off value of 11.76, with a sensitivity of 95.35% and specificity of 90%, underscoring the model’s high diagnostic precision (Fig. 4E). Violin plots again visually confirmed significantly higher risk scores in the tumor group (P < 0.0001) (Fig. 4F), with the cut-off value of 11.76 indicated by a dashed line in the violin plot (Fig. 4F).
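The reported sensitivities and specificities can be checked against the cohort sizes with simple arithmetic. In the Python sketch below, the counts of correctly classified samples are inferred from the reported percentages and are not stated in the text: 176 of 185 tumors and 10 of 10 normals for GSE26712, and 41 of 43 tumors and 9 of 10 normals for GSE12470. They are shown as an assumption that is consistent with the published figures.

```python
def pct(part, whole):
    """Percentage rounded to two decimals, as reported in the text."""
    return round(100.0 * part / whole, 2)


# Correctly classified counts are INFERRED from the reported percentages
cohorts = {
    "GSE26712": {"true_pos": 176, "tumor": 185, "true_neg": 10, "normal": 10},
    "GSE12470": {"true_pos": 41, "tumor": 43, "true_neg": 9, "normal": 10},
}

summary = {
    name: (pct(c["true_pos"], c["tumor"]), pct(c["true_neg"], c["normal"]))
    for name, c in cohorts.items()
}
for name, (sensitivity, specificity) in summary.items():
    print(name, sensitivity, specificity)
```

With only 10 normal samples in each validation cohort, specificity can change only in steps of 10 percentage points, so a single misclassified normal sample accounts for the difference between 100% and 90%.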

Overall, these results highlight the significant application value of our combined diagnostic model in the diagnosis of HGSOC, emphasizing the crucial role of the five key genes in accurately identifying ovarian cancer. Detailed statistical analysis and validation results for both datasets are provided in Supplementary Table S3, which includes specific P values, mean differences, and confidence intervals, further supporting the robustness and reliability of our model.

Molecular docking and immune infiltration analysis

Molecular docking experiments were conducted between PTX and key gene target proteins, including AURKA, CBX7, CCNA2, HSP90AA1, and TUBB3. The binding energies were all less than − 5.0 kcal/mol, indicating good binding stability, with the binding energy between PTX and CCNA2 being − 8.4 kcal/mol, the most stable among all interactions (Table 1). Visualization of these interactions revealed that PTX can effectively bind to these key proteins, potentially modulating their activity and contributing to its therapeutic effects (Fig. 5A–E).
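To give the binding energies a more intuitive scale, they can be converted to an approximate dissociation constant through the thermodynamic relation ΔG = RT ln(Kd). The Python sketch below applies it to the two values quoted above. This is an interpretive aid rather than part of the authors' analysis: the temperature of 298.15 K is an assumption, and docking scores are estimates of binding free energy, so the resulting Kd values indicate an order of magnitude at best.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def kd_from_binding_energy(dg_kcal_per_mol, temp_k=298.15):
    """Approximate Kd (mol/L) from a binding free energy via dG = RT ln(Kd)."""
    return math.exp(dg_kcal_per_mol / (R_KCAL * temp_k))


kd_ccna2 = kd_from_binding_energy(-8.4)      # PTX-CCNA2 value from the text
kd_threshold = kd_from_binding_energy(-5.0)  # stability threshold from the text

print(f"{kd_ccna2:.2e} M, {kd_threshold:.2e} M")
```

Under these assumptions, −8.4 kcal/mol corresponds to a sub-micromolar Kd, whereas the −5.0 kcal/mol threshold corresponds to a Kd in the range of hundreds of micromolar.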

Table 1 Binding energies of PTX with key gene target proteins.
Fig. 5

Molecular docking and immune infiltration analysis. (A–E) Molecular docking visualizations of PTX binding to key target proteins (AURKA, CBX7, CCNA2, HSP90AA1, TUBB3). 3D visualizations depict PTX in orange, hydrogen bonds in blue–green, and the protein backbone in grey. 2D visualizations show PTX in purple, hydrogen bonds as green dashed lines, and the protein in orange. (F) Stacked histograms showing immune cell subtype abundance in each sample. (G) Box plots showing differences in immune cell subtype abundance between normal and tumor groups. (H) Box plots showing differences in Immune Score, Stromal Score, and ESTIMATE Score between normal and tumor groups.

Immune cell infiltration analysis using the CIBERSORT algorithm revealed significant differences between normal and tumor tissues. Stacked histograms showed the abundance of immune cell subtypes in each sample (Fig. 5F). Box plots demonstrated that monocytes and neutrophils were more abundant in normal tissues (P < 0.001 and P < 0.01, respectively), while follicular helper T cells (TFH), activated dendritic cells (DCs), and eosinophils were more abundant in tumor tissues (all P < 0.05) (Fig. 5G). These findings suggest enhanced immune responses and microenvironment remodeling in tumors, which may influence disease progression and treatment response.

The ESTIMATE algorithm was used to analyze gene expression data, calculating Immune Score, Stromal Score, and ESTIMATE Score. Box plots showed that normal tissues exhibited higher Immune Score and ESTIMATE Score, while tumor tissues had significantly lower Stromal Score (P = 0.00048) (Fig. 5H). This indicates lower stromal cell abundance and potential immune evasion in the tumor microenvironment, which may contribute to the immunosuppressive nature of ovarian cancer.

Discussion

Ovarian cancer is the most lethal gynecologic malignancy, with HGSOC being the most common and aggressive subtype, accounting for 70–80% of ovarian cancer-related deaths44. Due to the unique anatomical structure of the ovaries, early symptoms of HGSOC are often occult. Patients with HGSOC are typically diagnosed at an advanced stage (III or IV), with rapid disease progression and poor prognosis. The 5-year survival rate for patients with advanced HGSOC is only approximately 25%45. Currently, there is a lack of effective screening methods for ovarian cancer, highlighting the significance of developing combined diagnostic models for clinical diagnosis.

PTX, a natural anticancer drug originally extracted from the Pacific yew tree, has been a standard treatment for various cancers since its approval by the US FDA for the treatment of advanced ovarian cancer in 1992. Its classic mechanism of action involves inhibiting microtubule depolymerization to block cancer cell mitosis, thereby suppressing tumor growth46. However, with advancements in immunology, oncology, and biotechnology, increasing evidence suggests that PTX exerts its effects through multiple targets and pathways47,48. These targets and pathways are not only involved in cancer cell proliferation and apoptosis but also influence the regulation of the tumor microenvironment49. Therefore, identifying the relevant targets and pathways of PTX in the treatment of HGSOC is of great importance.

AURKA is a serine/threonine protein kinase primarily responsible for regulating mitosis and the cell cycle to ensure accurate genetic information transmission. It plays a crucial role in various stages of the cell cycle, including centrosome maturation, spindle assembly, and cell entry into mitosis50. In this study, high expression of AURKA was found to be associated with an increased risk of HGSOC, a finding corroborated by Wang et al.51. Their study showed that AURKA knockout significantly inhibited ovarian cancer cell-induced angiogenesis (P < 0.001) and tumor formation in nude mice (P < 0.001). AURKA also contributes to drug resistance in cancer cells. It can form a transcriptional coactivator complex with DEAD-box helicase 5 (DDX5), inducing the transcription and upregulation of the oncogenic long noncoding RNA (lncRNA) TMEM147-AS1. TMEM147-AS1 binds to hsa-let-7b/7c-5p (a miRNA) via a sponge mechanism, leading to increased AURKA expression. This feedback loop maintains AURKA overexpression and contributes to cisplatin resistance in ovarian cancer cells through lipophagy activation. These findings demonstrate the close association of AURKA with the occurrence, progression, and drug resistance of ovarian cancer.

PTX exerts its microtubule-stabilizing effects by binding to β-tubulin, a member of the cytoskeletal protein family (α- and β-tubulin) that forms heterodimers and assembles into microtubules. PTX inhibits microtubule depolymerization, blocking tumor cell progression at the G2/M phase of mitosis and suppressing cell proliferation52. This study found that overexpression of TUBB3 in HGSOC is associated with poor prognosis. Similarly, one study used ELISA to detect anti-TUBB3 antibodies in serum efficiently and accurately as a means of predicting disease prognosis.

CCNA2 regulates the G1/S and G2/M phase transitions of the cell cycle by binding to and activating cyclin-dependent kinase 2 (CDK2). Overexpression of CCNA2 in ovarian cancer has been reported in this study and in several others53. One study demonstrated that miR-508-3p directly regulates the downstream molecules CCNA2 and MMP7, inhibiting ovarian cancer cell proliferation, migration, and invasion, with CCNA2 serving as an independent prognostic biomarker for overall survival in ovarian cancer patients.

HSP90AA1 is a molecular chaperone protein induced under cellular stress conditions and is involved in protein folding, stabilization, and degradation. Overexpression of HSP90AA1 in certain cancers has been documented. To further investigate the relationship between HSP90AA1 and cancer cells, one study used siRNA to silence HSP90AA1, which significantly reduced the expression of IL-6, IL-7, CXCL1, and CXCL2 and inhibited colorectal cancer cell growth in the Caco-2 and Colo-205 cell lines54. In ovarian cancer-related research, one study elucidated the role of HSP90AA1 in drug resistance: overexpression of HSP90AA1 decreased the sensitivity of SKOV3 cells to cisplatin, that is, it enhanced their cisplatin resistance55.

CBX7, a member of the Polycomb repressive complex, is a potent epigenetic regulator and gene silencer. This study found that CBX7 overexpression is associated with a decreased risk of ovarian cancer. In numerous studies, CBX7 has been considered a tumor suppressor, with its downregulation linked to the aggressiveness and poor prognosis of cervical cancer, breast cancer, and osteosarcoma56,57,58. Zhang et al. used adipose-derived exosomes to deliver miR-421, which downregulated CBX7 and enhanced the migration potential of ovarian cancer cells, thereby promoting disease progression.

Together, these studies support the finding that overexpression of TUBB3, AURKA, CCNA2, and HSP90AA1 is associated with an increased risk of ovarian cancer, while CBX7 overexpression is associated with a decreased risk.

Our study demonstrates that PTX primarily affects the progression of HGSOC by regulating the G2/M transition of the mitotic cell cycle via the spindle. This effect is also observed in adjacent normal tissues. One study has shown that PTX induces senescence in normal peritoneal cells, including peritoneal mesothelial cells (PMCs) and peritoneal fibroblasts (PFBs), which is associated with the G2/M transition of the mitotic cell cycle. These senescent cells can promote the proliferation, migration, and invasion of primary epithelial ovarian cancer59. Another study confirmed the critical role of AURKA in the G2/M transition of the mitotic cell cycle in HGSOC and its association with drug resistance. The specific mechanism involves using the AURKA inhibitor alisertib (AL) to suppress AURKA expression, thereby overcoming resistance to PTX or cisplatin60.

Significant differences in immune cell infiltration were observed between normal ovarian tissues and HGSOC tissues. These differences highlight the complexity of the tumor microenvironment and appear to be closely related to the immunomodulatory effects of PTX. PTX, a classic chemotherapeutic agent, is well known for directly inhibiting cancer cell proliferation by stabilizing microtubules. However, recent evidence suggests that PTX can also exert indirect antitumor effects by modulating the infiltration and function of immune cells within the tumor microenvironment61.

Monocytes were more abundant in normal tissues, but within the tumor microenvironment they may differentiate into tumor-promoting macrophages. PTX has been shown to activate the Toll-like receptor 4 (TLR4) signaling pathway, inhibiting the differentiation of monocytes into pro-tumorigenic M2 macrophages and promoting polarization toward anti-tumorigenic M1 macrophages, thereby enhancing antitumor immune responses62. Neutrophils were likewise more abundant in normal tissues than in tumor tissues. This may be due to the formation of neutrophil extracellular traps (NETs) in the tumor microenvironment, which facilitate tumor progression. NETs not only promote tumor cell adhesion and metastasis but also hinder lymphocyte infiltration, thereby enhancing treatment resistance63. This notion is supported by recent studies showing that tumor cells expressing Delta-like 4 (DLL4) can secrete DLL4 protein, which activates the Notch signaling pathway in neutrophils, inducing NET formation and release and thereby enhancing tumor drug resistance64.

In HGSOC tissues, increased infiltration of follicular helper T cells (TFH) and activated dendritic cells was observed, which may be associated with immune activation in the tumor microenvironment. PTX has been shown to promote dendritic cell maturation and enhance their antigen-presenting capacity, thereby activating T cell-mediated immune responses65. Moreover, low-dose PTX can activate the cGAS/STING signaling pathway, further promoting dendritic cell maturation and increasing the infiltration of cytotoxic T lymphocytes and natural killer cells, ultimately inhibiting tumor progression66. These studies indicate that PTX can substantially influence immune responses within the tumor microenvironment by modulating dendritic cell maturation and function, thereby enhancing antitumor effects.

We identified 2267 DEGs from the HGSOC dataset GSE54388, 15 of which overlapped with PTX targets. Further analysis suggested that PTX may exert effects beyond its classic cytotoxic mechanism, highlighting the importance of exploring its broader mechanisms of action in HGSOC. By constructing a protein–protein interaction (PPI) network, conducting survival analysis with the Kaplan–Meier Plotter, and validating protein expression levels in the Human Protein Atlas (HPA) database, we identified five key genes: AURKA, CBX7, CCNA2, HSP90AA1, and TUBB3. Based on these genes, we developed a combined diagnostic model for HGSOC using LASSO regression. The model demonstrated good diagnostic performance in two independent external validation datasets (GSE26712 and GSE12470), highlighting its potential for accurately identifying ovarian cancer. To our knowledge, most prior studies have focused on identifying single diagnostic biomarkers and developing single-marker clinical diagnostic models. In contrast, this study combines five key target proteins into one diagnostic model that retained high diagnostic accuracy under independent external validation, which may contribute to the clinical diagnosis of HGSOC. In this way, the study links bioinformatics findings to a candidate clinical diagnostic tool.
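The logic of fitting a LASSO-penalized model on five genes and then scoring an independent cohort can be illustrated with a minimal sketch. This is not the pipeline used in this study: the expression values are simulated, the effect sizes in simulate are hypothetical, and the penalty strength, learning rate, and iteration count are arbitrary assumptions. The sketch fits an L1-penalized logistic regression by proximal gradient descent and evaluates it with a rank-based AUC on a separately simulated "validation" set.

```python
import math
import random

GENES = ["AURKA", "CBX7", "CCNA2", "HSP90AA1", "TUBB3"]

def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink a weight toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def sigmoid(z):
    # Clamp the input so math.exp cannot overflow on extreme scores.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def fit_l1_logistic(X, y, lam=0.01, lr=0.1, n_iter=500):
    """L1-penalized logistic regression via proximal gradient descent.

    lam, lr and n_iter are illustrative defaults, not tuned values.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b  # the intercept is not penalized
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b

def predict_score(w, b, x):
    """Predicted probability that a sample is a tumor sample."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

def auc(scores, labels):
    """Rank-based AUC: chance that a tumor sample outscores a normal one."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum((sp > sn) + 0.5 * (sp == sn) for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))

def simulate(n_per_class, rng):
    """Simulated standardized expression; the shifts are hypothetical."""
    shift = {"AURKA": 1.5, "CBX7": -1.5, "CCNA2": 1.5,
             "HSP90AA1": 1.5, "TUBB3": 1.5}
    X, y = [], []
    for label in (0, 1):  # 0 = normal, 1 = tumor
        for _ in range(n_per_class):
            X.append([rng.gauss(shift[g] * label, 1.0) for g in GENES])
            y.append(label)
    return X, y

rng = random.Random(0)
X_train, y_train = simulate(40, rng)   # stands in for a training cohort
X_valid, y_valid = simulate(40, rng)   # stands in for an external cohort
w, b = fit_l1_logistic(X_train, y_train)
valid_auc = auc([predict_score(w, b, x) for x in X_valid], y_valid)
print(dict(zip(GENES, (round(wj, 3) for wj in w))), round(valid_auc, 3))
```

In this simulated setting the sign of each fitted weight reflects the direction built into the data (negative for CBX7, positive for the other four genes). With real expression data, the penalty strength would normally be chosen by cross-validation rather than fixed in advance.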

A combined diagnostic model could advance HGSOC diagnostics. Current diagnostic methods, including imaging techniques such as CT and MRI, have limited sensitivity and specificity, especially in the early stages of the disease. The model we developed, based on the expression levels of five key genes (AURKA, CBX7, CCNA2, HSP90AA1, and TUBB3), offers a novel approach to improving diagnostic accuracy. Integrating it with existing methods such as CT and MRI could provide a more comprehensive assessment of disease presence and progression, potentially leading to earlier detection and intervention. Moreover, its high diagnostic accuracy in independent validation datasets suggests that it could serve as a valuable adjunct to existing diagnostic protocols, particularly when traditional methods yield inconclusive results and clinicians need additional information to guide treatment decisions.

This study offers novel perspectives on the molecular mechanisms underlying PTX’s effects in HGSOC, thereby supporting the development of innovative therapeutic approaches and diagnostic tools. However, several limitations should be acknowledged. First, the datasets used in this study, including GSE54388, GSE26712, and GSE12470, may have inherent biases that could affect the generalizability of our findings. These biases could arise from differences in patient populations, sample collection methods, and experimental designs. Future studies should aim to validate our findings in larger and more diverse cohorts to ensure robustness and applicability across different clinical settings. Second, while the combined diagnostic model demonstrated good diagnostic performance in two independent external validation datasets (GSE26712 and GSE12470), its accuracy and reliability may still be influenced by the specific characteristics of these datasets. The model’s assumptions and the potential for overfitting need to be carefully evaluated. Future work should focus on refining the model and assessing its performance in additional independent datasets to further validate its clinical utility. Additionally, the model’s integration with existing diagnostic methods, such as CT and MRI, should be explored to enhance its practical application in clinical settings.

Future research should integrate multi-omics data, including transcriptomics, proteomics, metabolomics, and microbiomics, to provide a more comprehensive understanding of HGSOC. This integrative approach could reveal additional biomarkers and therapeutic targets, enhancing our ability to diagnose and treat this aggressive form of ovarian cancer. Furthermore, additional experimental studies are needed to elucidate the direct effects of PTX on the tumor microenvironment and to confirm the roles of the five key genes in cancer cell killing, tumor microenvironment regulation, and drug resistance. A better understanding of the interplay between PTX, HGSOC, and immune infiltration will be crucial for developing more effective treatment strategies.

Conclusions

In summary, this study identified five key diagnostic biomarkers for HGSOC through an integrated approach combining bioinformatics, machine learning, network pharmacology, and molecular docking. These biomarkers—AURKA, CBX7, CCNA2, HSP90AA1, and TUBB3—were used to construct a combined diagnostic model with high accuracy, validated in two independent external datasets. The mechanisms of action of these biomarkers in the tumor microenvironment were comprehensively analyzed, highlighting their potential roles in cancer cell proliferation, apoptosis, and immune regulation. These findings provide valuable insights for future scientific research and clinical diagnosis, supporting the development of new therapeutic strategies and diagnostic tools for HGSOC.