Unbiased discovery of cancer pathways and therapeutics using Pathway Ensemble Tool and Benchmark

Wang, Luopin; Pattnaik, Aryamav; Sahoo, Subhransu Sekhar; Stone, Ella G.; Zhuang, Yuxin; Benton, Annaleigh; Tajmul, Md; Chakravorty, Srishti; Dhawan, Deepika; Nguyen, My An; Sirit, Isabella; Mundy, Kyle; Ricketts, Christopher J.; Hadisurya, Marco; Baral, Garima; Tinsley, Samantha L.; Anderson, Nicole L.; Hoda, Smriti; Briggs, Scott D.; Kaimakliotis, Hristos Z.; Allen-Petersen, Brittany L.; Tao, W. Andy; Linehan, W. Marston; Knapp, Deborah W.; Hanna, Jason A.; Olson, Matthew R.; Afzali, Behdad; Kazemian, Majid

doi:10.1038/s41467-024-51859-9

Article
Open access
Published: 24 August 2024

Luopin Wang^1,2^na1,
Aryamav Pattnaik^2,3^na1,
Subhransu Sekhar Sahoo^2,3,
Ella G. Stone^2,4,
Yuxin Zhuang^2,3,
Annaleigh Benton^2,4,
Md Tajmul^3,5,
Srishti Chakravorty^2,3,
Deepika Dhawan⁶,
My An Nguyen^2,7,
Isabella Sirit^2,4,
Kyle Mundy⁷,
Christopher J. Ricketts ORCID: orcid.org/0000-0003-4814-7207⁸,
Marco Hadisurya ORCID: orcid.org/0000-0002-4453-5854^2,3,
Garima Baral ORCID: orcid.org/0000-0002-0074-921X^2,4,
Samantha L. Tinsley^2,4,
Nicole L. Anderson^2,4,
Smriti Hoda³,
Scott D. Briggs ORCID: orcid.org/0000-0003-2852-3594^2,3,
Hristos Z. Kaimakliotis⁹,
Brittany L. Allen-Petersen^2,4,
W. Andy Tao ORCID: orcid.org/0000-0002-5535-5517^2,3,10,
W. Marston Linehan ORCID: orcid.org/0000-0001-7983-3109⁸,
Deborah W. Knapp^2,6,
Jason A. Hanna ORCID: orcid.org/0000-0001-7253-7964^2,4,
Matthew R. Olson^2,4,
Behdad Afzali⁵^na2 &
Majid Kazemian ORCID: orcid.org/0000-0001-7080-8820^1,2,3^na2

Nature Communications volume 15, Article number: 7288 (2024)

Abstract

Correctly identifying perturbed biological pathways is a critical step in uncovering basic disease mechanisms and developing much-needed therapeutic strategies. However, whether current tools are optimal for unbiased discovery of relevant pathways remains unclear. Here, we create “Benchmark” to critically evaluate existing tools and find that most function sub-optimally. We thus develop the “Pathway Ensemble Tool” (PET), which outperforms existing methods. Deploying PET, we identify prognostic pathways across 12 cancer types. PET-identified prognostic pathways offer additional insights, with genes within these pathways serving as reliable biomarkers for clinical outcomes. Additionally, normalizing these pathways using drug repurposing strategies represents therapeutic opportunities. For example, the top predicted repurposed drug for bladder cancer, a CDK2/9 inhibitor, represses cell growth in vitro and in vivo. We anticipate that using Benchmark and PET for unbiased pathway discovery will offer additional insights into disease mechanisms across a spectrum of diseases, enabling biomarker discovery and therapeutic strategies.

Introduction

The pathogenesis of complex diseases is often underpinned by dysregulation of several cellular pathways. This is well exemplified by neoplastic diseases, where abnormalities in multiple pathways contribute to cancerous transformation or in which cancer-disposing events perturb multiple cellular pathways. Despite notable successes in some cancers, limited therapeutic gains have been made by identification and targeting of single contributory genes. Precisely pinpointing key dysregulated cellular pathways in cancer and ranking them by order of importance is, therefore, of major biological and clinical significance. Doing so would enable molecular subtyping and determination of diagnostic/prognostic biomarkers, and would inform the choice of effective and cancer-specific drug regimens. This is especially important at the present time because the incidence of cancer is increasing¹.

The contemporary approach to elucidate aberrations in biological pathways is dominated by omics methods, such as RNA-sequencing (RNA-seq). As expected, vast amounts of omics data have been generated for most cancers. However, unlocking their full potential has remained challenging. Biological pathway analysis, a computational search to identify and rank by importance the key relevant and/or dysregulated pathways among the thousands of known potential biological pathways, is a critical means for deciphering these data. Numerous in silico methods (reviewed here^2,3), for example GSEA⁴ and Enrichr⁵, have been developed and broadly applied to conduct pathway analyses. One common application of these methods has been to test whether specific pathways hypothesized by researchers are, in fact, enriched in a dataset. In this approach, pathway enrichment analyses have generally proven successful, but pre-suppose a priori knowledge or hypothesis. An alternative, highly desired application, especially in the context of exponentially growing omics datasets, is the adoption of unbiased discovery-based approaches. Here, the objective is to rank the “correct pathways” above all other pathways in the reported results without imparting bias from a priori knowledge. However, the performance and suitability of existing pathway analysis methods for this form of blinded or unbiased discovery remains largely unknown. In other words, it is not known how often the top pathway(s) returned by an analysis method represent(s) the most important perturbation. This is likely due to the fact that most methods were developed and evaluated (“benchmarked”) using computer simulated pathways⁶ or verified only in very small scale biological datasets from historical, low-resolution microarray technology^2,7,8,9. 
Therefore, a key limitation is lack of proper benchmarking and assessment of the capacity of these tools for pathway discovery under experimental settings, which often contain substantial “noise” due to stochastic events and biological variations. Such lack of a suitable evaluation platform predicated on large-scale experimental data significantly impedes informed deployment of pathway analysis tools for blinded discovery of perturbations in cancer.

Here, we designed an extensive “Benchmark” to explicitly measure the performance of many commonly used pathway analysis tools for unbiased discovery under experimental settings. Benchmark was constructed from large and diverse high throughput experimental data generated by ENCODE¹⁰, such that expected pathways are well-defined. Using Benchmark, we found that commonly used pathway analysis tools consistently underperformed in unbiased pathway discovery, even after extensive optimization. Given these findings, we next developed another method for biological pathway discovery, termed Pathway Ensemble Tool (PET), which statistically combines the rank metrics from multiple input methods. PET significantly outperformed all existing tools for unbiased identification of dysregulated pathways with high accuracy and was resistant to biological noise. Importantly, we applied PET to systematically determine biological pathways associated with favorable or unfavorable prognosis in 12 distinct cancer types. Combinations of leading genes from PET-identified prognostic pathways offered superior prognostication power compared to existing biomarkers, dividing patients into highly resilient and highly vulnerable categories. As a way of validation, these findings were confirmed in independent cancer datasets and found to be consistent with established aggressive molecular subtypes. We next carried out computational screening for repurposing existing drugs to normalize genes from PET-derived prognostic pathways for each cancer type. This was done by specifically looking for drugs that can down-regulate the leading genes of pathways associated with unfavorable prognosis or up-regulate the leading genes of pathways associated with favorable prognosis. This resulted in the identification of both established effective drugs and additional drug candidates. 
To explore these, we validated the top predicted drug, CCT068127, a CDK2/9 inhibitor, for bladder and cervical cancer in both in vitro and in vivo settings. We confirmed significant repression of cancer growth, which was superior to that seen with known clinically efficacious drugs in this category. We demonstrated that the predicted drug indeed restricts cancer cells by normalizing the expression of genes belonging to predicted prognostic pathways. We anticipate that Benchmark will enable development of highly accurate, unbiased pathway discovery methods such as PET, which can lead to tangible insights into molecular mechanisms and identify additional diagnostic/prognostic biomarkers and therapeutics.

Results

Benchmark enables identification of optimal settings and best practices for systematic pathway discovery in experimental datasets

We developed Benchmark as an evaluation platform to explicitly evaluate performance of pathway/geneset analysis tools (Fig. 1a, b; please note that the terms ‘pathway’ and ‘geneset’ are used interchangeably throughout the manuscript). Briefly, Benchmark comprises three components: 1) “input genesets” (IGS), representing genesets to be investigated – these are most commonly user lists of genes derived from high throughput assays, such as RNA-seq or Chromatin Immunoprecipitation sequencing (ChIP-seq) performed during projects of interest; 2) “target genesets” (TGS), representing curated biological pathways (e.g. in the KEGG and Gene Ontology databases) to be examined for their relationship to the IGS; 3) pathway enrichment analysis tools which test statistical relationships between IGS and TGS (Fig. 1a). The genesets used in Benchmark were extracted from large numbers (~1000) of high-throughput sequencing experimental assays from ENCODE¹⁰ (Supplementary Data 1a). Every geneset in Benchmark was composed of genes bound in ChIP-seq by a defined transcription factor (TF), in eCLIP-seq by a defined RNA binding protein (RBP), or in RNA-seq as genes differentially expressed after small interfering or hairpin RNA-mediated knockdown of a specified target (gKD). For each TF, RBP or gKD, at least two genesets from distinct cell lines (e.g. K562 and HepG2) or species (e.g. human and mouse) were represented. Importantly, Benchmark was designed to simulate a common scenario where the investigated genesets (i.e. IGSs) are derived from different sources (e.g. cell lines, species, conditions) than established biological pathways (i.e. TGSs). To achieve this, Benchmark was created such that genesets from the same TF, RBP or gKD but from distinct cell lines are better matched to each other than any other genesets. As an example, the IGS that contains STAT1-bound genes in K562 cells is more closely related to the TGS that contains STAT1-bound genes in HepG2 cells than any other TGS.

**Fig. 1: Benchmark for evaluating and optimizing existing pathway analysis methods.**

Consequently, for each IGS, an ideal pathway analysis method should identify the most closely matched (i.e. “correct”) TGS and rank it as number one compared to all other TGSs. Thus, in Benchmark, the enrichment of each TGS (TGS_1 to TGS_n) is calculated for each IGS using the user’s pathway analysis method of choice to generate a score matrix, from which the rank of correct pathways can be determined for that method (Fig. 1a). By evaluating how accurately pathway analysis methods rank matched IGSs and TGSs compared to all other TGSs, Benchmark can assess the performance of different methods in identifying and ranking correct biological pathways in experimental settings.
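The ranking step described above can be illustrated with a minimal sketch. It assumes a score matrix in which higher values mean stronger enrichment and in which the matched TGS of each IGS is known; the function name `correct_pathway_ranks` and the toy numbers are illustrative and do not come from the Benchmark code.

```python
import numpy as np

def correct_pathway_ranks(scores, correct_idx):
    """Return the 1-based rank of the matched ("correct") TGS for each IGS.

    scores[i, j] is the enrichment score of TGS j for IGS i
    (assumption: higher means more enriched).
    correct_idx[i] is the column index of the TGS matched to IGS i.
    """
    scores = np.asarray(scores, dtype=float)
    correct_idx = np.asarray(correct_idx)
    correct_scores = scores[np.arange(scores.shape[0]), correct_idx]
    # Rank = 1 + number of TGSs that score strictly higher than the correct one
    # (ties are therefore resolved in favour of the correct TGS).
    return 1 + (scores > correct_scores[:, None]).sum(axis=1)

# Toy score matrix: 3 IGSs (rows) x 3 TGSs (columns).
scores = np.array([
    [9.1, 2.0, 0.5],
    [1.2, 0.3, 4.4],
    [0.7, 0.9, 3.0],
])
ranks = correct_pathway_ranks(scores, [0, 1, 2])
median_rank = float(np.median(ranks))
```

The median of these per-IGS ranks corresponds to the "median rank of the correct pathway" statistic used to compare methods.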

Using Benchmark, we first evaluated the performance of different pathway analysis tools, namely, decoupler¹¹, piano¹², egsea⁶, ssgsea⁴, camera¹³, ora⁶, enrichment browser¹⁴ (e.browser), zscore¹⁵, gsva¹⁶, padog¹⁷, roast¹⁸, plage¹⁹, gage²⁰ and safe²¹. We compared three statistics for all pathway analysis methods. First, we determined the median rank of the correct pathway returned by each method. As experimental biologists typically focus on the top 10 reported pathways, we also determined the frequency by which the correct pathway is among the top 10 reported pathways, termed “Precision@10” (P@10). We also adopted another reliable performance metric commonly used in information retrieval, known as Average Precision at 10 (AP@10). AP@10 is the average (mean) of the precision scores at each of the first 10 positions, i.e. P@1, P@2,…, P@10. We tested four different settings for how the genes can be ranked and genesets can be generated in these methods (i.e. ratio of classes, signal2noise, or edgeR- or DESeq2-based ranking; Fig. 1b, Supplementary Fig. S1a). The genes generated from matched IGSs and TGSs using these settings exhibited a significantly higher degree of overlap compared to randomly generated control IGSs and TGSs (Supplementary Fig. S1b). The top three methods, decoupler¹¹, piano¹² and egsea⁶, notably all ensemble approaches, achieved a median correct pathway rank of 1–8, P@10 of 52–76%, and AP@10 of 44–69%, under the best setting (Fig. 1b). This creates a scenario where pathways with significant biological importance are regularly not among the top 10 reported pathways, hindering the utility of these methods for unbiased discovery.
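As a rough illustration of the two precision statistics defined above, the sketch below computes them from a list of correct-pathway ranks (one rank per IGS). It follows the definitions as worded in the text: P@k is the fraction of IGSs whose correct TGS falls within the top k, and AP@10 is the mean of P@1 through P@10. The function names and the example ranks are hypothetical.

```python
def precision_at_k(ranks, k):
    """Fraction of IGSs whose correct TGS is ranked within the top k."""
    return sum(r <= k for r in ranks) / len(ranks)

def average_precision_at_10(ranks):
    """Mean of P@1, P@2, ..., P@10, as AP@10 is described in the text."""
    return sum(precision_at_k(ranks, k) for k in range(1, 11)) / 10

# Hypothetical ranks of the correct pathway for four IGSs.
ranks = [1, 3, 12, 1]
p10 = precision_at_k(ranks, 10)        # 3 of 4 correct pathways are in the top 10
ap10 = average_precision_at_10(ranks)  # rewards correct pathways ranked nearer the top
```

Because AP@10 averages over every cut-off from 1 to 10, a method that places the correct pathway at rank 1 scores higher than one that places it at rank 9, even though both count equally towards P@10.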

We next focused on evaluating the performance of the three most used individual pathway analysis methods ora⁶, GSEA⁴ and Enrichr⁵ in more detail. We tested the four different settings described above (Supplementary Fig. S1c) as well as 3 different options for sizes of genesets (i.e. 200, 500 and 1000 genes) (Supplementary Fig. S1d). Specifically, under the default settings (i.e. signal2noise ranking), the median ranks of the correct pathway (7–14) and P@10 (45–54%) values were similar to the top three performing methods above (Fig. 1c, Supplementary Fig. S1c), indicating that ora, GSEA and Enrichr are not ideal for unbiased pathway discovery under default settings. We found that optimization of input parameters did significantly improve pathway discovery - a geneset size of 200 genes was near optimal for both GSEA and Enrichr (Supplementary Fig. S1d) and using DESeq2-ranked lists of differentially expressed genes returned the correct pathway at a median rank of 3 and 63–68% of the time among the top 10 reported pathways (Fig. 1c, Supplementary Fig. S1c). Similar improvement could be achieved when selecting the optimal binding intensity statistics in ChIP-seq or eCLIP-seq. For example, using the signal value of binding in ChIP-seq data to rank genes and create genesets usually leads to an improved pathway discovery performance compared to utilizing adjusted p-values (or q-values) of binding signals (Supplementary Fig. S1e). It is worth noting that the overlap size of matched IGSs and TGSs generated from ChIP-seq and eCLIP-seq data was significantly larger compared to that of shRNA-seq data (Supplementary Fig. S1f), which could explain the overall trend of better performance observed in pathway analysis methods on these genesets.

Collectively, these data indicate that Benchmark can be used for evaluating existing and future pathway analysis methods for discovery. Notably, although the most widely used individual methods can be optimized, they remain sub-optimal for unbiased pathway discovery, and ensemble approaches typically outperform these individual methods.

Pathway Ensemble Tool (PET) enables unbiased pathway discovery

We next sought to develop a high-precision method to enable unbiased systematic pathway discovery. Leveraging the accuracy and robustness of ensemble approaches, we created Pathway Ensemble Tool (PET). PET works by calculating the unweighted mean of the rank statistics from GSEA, ora, and Enrichr, all operating under the optimal settings determined above (see Methods). We tested the accuracy of PET using Benchmark. PET significantly improved the median rank of the correct genesets to 1, indicating that PET tends to rank the correct pathway as number 1 (Fig. 1d). PET also significantly enhanced P@10 and AP@10 statistics, with the correct pathways being among the top 10 pathways reported by PET 73–82% of the time (Fig. 1d).
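The ensemble idea can be sketched as follows. This is an illustration under stated assumptions rather than the PET implementation: it assumes that each input method yields one score per pathway (higher meaning more enriched), that the "rank statistic" is the ordinal rank of each pathway within a method, and that ties are broken by input order. The function names, pathway labels and scores are hypothetical; the exact procedure is described in the Methods.

```python
import numpy as np

def ranks_desc(scores):
    """Ordinal ranks (1 = highest score); ties are broken by input order."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    ranks = np.empty(len(order), dtype=float)
    ranks[order] = np.arange(1, len(order) + 1)
    return ranks

def mean_rank_ensemble(method_scores):
    """Unweighted mean of per-method ranks for each pathway."""
    per_method = np.vstack([ranks_desc(s) for s in method_scores.values()])
    return per_method.mean(axis=0)

pathways = ["A", "B", "C", "D"]
method_scores = {              # hypothetical enrichment scores per method
    "gsea":    [2.1, 0.4, 1.5, 0.1],
    "ora":     [3.0, 5.0, 1.0, 0.2],
    "enrichr": [8.0, 2.0, 6.0, 1.0],
}
mean_rank = mean_rank_ensemble(method_scores)
top_pathway = pathways[int(np.argmin(mean_rank))]
```

In this toy example, pathway A is ranked first by two methods and second by the third, so it has the lowest mean rank; averaging ranks in this way dampens the influence of any single method's outlier ranking.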

Pathway analysis tools are expected to conform to the false discovery rate (FDR) estimates, meaning that at 5% FDR levels, 5% of reported pathways are expected to be false. To determine the actual false discovery rates of pathway analysis methods, we randomly sampled and divided replicated RNA-seq datasets into two groups and ran pathway analysis methods comparing the two groups for enrichment of biological pathways (see Methods). Since the samples were randomly selected biological replicates, any reported pathways significantly different between the groups represent false discoveries. Surprisingly, we found that some of the pathway analysis methods had a much higher false discovery rate than expected, in some cases as high as 25% (Supplementary Fig. S1g, h). In fact, specific pathways, such as the KEGG Ribosome pathway, were common false discoveries (Supplementary Fig. S1i and Supplementary Data 1b), suggesting that caution should be exercised when these pathways are identified as significant during unbiased discovery approaches. However, we found that PET operated at the expected 5% false discovery rate (Supplementary Fig. S1g, h).

To further validate PET reported results and ensure its ability to discern pathways associated with alterations that drive differences in transcriptional programs within groups, we applied PET to compare fibroblast growth factor receptor 3 (FGFR3) mutant vs. wild-type (WT) bladder cancer cell lines, estrogen receptor (ER)⁺ vs. ER^– breast cancer cell lines, and epidermal growth factor receptor (EGFR) mutant vs. WT lung cancer cell lines, available in CCLE²² (Supplementary Data 1c), assessing all canonical pathways. In all three comparisons, the top reported pathways were supported by the literature (Supplementary Fig. S2). For instance, the top two enriched pathways in ER⁺ breast cancer cell lines were “Estrogen dependent gene expression” and “Estrogen receptors (ESR)-mediated signaling”, which corresponds with the established understanding of estrogen receptor signaling in ER⁺ breast cancer^23,24. In EGFR mutant lung cancer cell lines, the top two enriched pathways were “Surfactant metabolism” and “Integrin5 pathway”, which are both recognized for their cross-talk with EGF signaling in lung cancer^{25,26,27,28,29}. Moreover, the “MYC activity pathway” and its regulated “Ribosomal RNA processing” pathway were among the top enriched pathways in FGFR3 mutant bladder cancer cell lines, aligning with known interactions between FGFR3 and MYC in bladder cancer, as supported by existing literature^{30,31,32,33,34} (Supplementary Fig. S2). These data demonstrate that PET is able to discern subtle yet significant variations in pathway activities across different genetic backgrounds.

Because most pathways are defined under real experimental conditions, there are multiple variables, such as cell type, species and stimulation, that contribute a degree of difference or inaccuracy in genesets. This “noise” manifests as missing or unrelated genes among matched IGSs and TGSs. To test the robustness of common pathway analysis tools, we evaluated their performance under conditions of simulated noise by replacing genes from the target genesets with increasing proportions of unrelated genes selected randomly from the genome. We found most available pathway analysis methods to be highly sensitive to such differences, as their performance dropped significantly after >30% of unrelated random genes were introduced into existing genesets (Fig. 1e). Importantly, however, PET was highly robust to even as high as 60% added noise (Fig. 1e). Collectively, these results indicate that PET is robust, has a controlled false discovery rate and high precision, and can thus enable reliable and systematic identification of key dysregulated pathways in omics datasets.
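The noise simulation described above can be sketched in a few lines. The sketch assumes that replacement genes are drawn uniformly from genes not already in the target geneset and that the geneset size is kept constant; the function name `add_noise`, the seed handling and the toy genome are illustrative choices, not details taken from the study.

```python
import random

def add_noise(geneset, genome, fraction, seed=0):
    """Replace a given fraction of a geneset with random, unrelated genes.

    Assumption: replacements are sampled from genome genes that are not
    already members of the geneset, so the geneset size stays constant.
    """
    rng = random.Random(seed)
    genes = sorted(geneset)
    n_replace = round(len(genes) * fraction)
    kept = rng.sample(genes, len(genes) - n_replace)
    pool = sorted(set(genome) - set(geneset))
    return set(kept) | set(rng.sample(pool, n_replace))

genome = [f"G{i}" for i in range(1000)]   # toy stand-in for all genes
target = set(genome[:100])                # toy target geneset of 100 genes
noisy = add_noise(target, genome, 0.3)    # 30% of genes replaced
```

Repeating this for increasing fractions and re-running a pathway analysis method on each noisy geneset gives a performance-versus-noise curve of the kind summarized in Fig. 1e.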

PET identifies prognostic pathways in 12 distinct cancer types

We next sought to deploy PET for unbiased pathway discovery in cancer. We selected 12 distinct cancer types from The Cancer Genome Atlas (TCGA) that had cancer tissue RNA-seq data from the earliest stages of disease (based on the pathological stage or primary tumor size/extent) in at least 20 subjects that had unfavorable prognosis (i.e. succumbed to death) during the follow-up period (Fig. 2a; Supplementary Data 2a, b). We used PET to identify the pathways enriched in samples from patients with favorable (survivors) or unfavorable (deceased) prognosis for each cancer type, since these pathways could act as potential biomarkers of outcomes. To ensure that age, sex and cancer stage were not confounders, patients with favorable prognosis were randomly matched with those with unfavorable prognosis (see Methods). We confirmed that the groups were matched in age and sex and that age and sex were not determinants of outcome (Supplementary Fig. S3). Since tumor infiltrating lymphocytes, classically depicted by CD8-expression, are known determinants of outcome, we also noted that CD8 expression did not differ between subjects with favorable and unfavorable outcomes and that CD8 expression could not distinguish survivors from deceased subjects (Supplementary Fig. S4a). We next used PET to examine >1000 canonical biological pathways curated in MSigDB⁴ that could potentially differentiate cancer patients that survived from those that did not. Out of these pathways, PET identified tens of biological pathways significantly associated with favorable or unfavorable prognosis for each cancer type (Fig. 2b, Supplementary Data 2c). Examples of the top favorable and unfavorable prognostic pathways in each cancer type are shown in Supplementary Fig. S4b and Fig. 2c, respectively. As proof-of-principle, we validated the reported pathways associated with unfavorable outcome in bladder cancer in three independent cohorts^35,36,37 (Supplementary Fig. S4c). Some pathways were significantly dysregulated in multiple different cancers (e.g. cell cycle in bladder carcinoma (BLCA), cervical squamous cell carcinoma (CESC), kidney renal papillary cell carcinoma (KIRP) and pancreatic adenocarcinoma (PAAD)) whereas others were cancer-specific (e.g. glycerolipid metabolism in breast invasive carcinoma (BRCA) and arginine and proline metabolism in KIRP). Some of these pathways have been shown to have biological significance in previous studies but have not yet been formally recognized as prognostic markers. For example, we identified triacylglyceride synthesis as a significant poor prognostic pathway in breast carcinoma (Fig. 2c). This is consistent with fatty acid synthesis as a requisite for breast cancer metastasis to brain³⁸ and association of upregulated cellular triacylglycerol with enhanced survival of human breast cancer cells³⁹. Another identified poor prognostic pathway that was shared between multiple cancer types including bladder and pancreatic cancers was extracellular matrix (ECM) organization (Fig. 2c). This is in line with prior knowledge about the role of ECM components, such as laminin, collagen and fibronectins in invasion, progression, and metastasis of bladder and pancreatic cancers and restriction of antitumor drug delivery^40,41. Similarly, we found the Vitamin D (VitD) receptor pathway to be a strong favorable prognostic pathway in stomach cancer (Supplementary Fig. S4b). This is congruent with the fact that in gastric cancer patients, VitD levels serve as an independent prognostic factor and VitD supplementation can reduce the risk of developing gastric cancer^42,43.

**Fig. 2: PET identifies pathways and gene combinations associated with unfavorable prognosis.**

PET-identified biological pathways guide combination biomarker discovery

We next explored the genes in significant prognostic pathways to identify a subset of genes with high prognostic power that might have clinical utility. Single gene expression values have provided some level of prognostic power for different cancers^44,45. However, expression of multiple biomarkers is consistently a more accurate diagnostic/prognostic tool to stratify patients and improve cancer management^46,47. This has fueled a growth in studies identifying gene signatures associated with response to therapy and/or survival. However, these signatures are typically derived from lists of differentially expressed genes and characteristically consist of hundreds of genes, making clinical application of these large signatures challenging⁴⁸. The alternative brute force strategy, which involves iteratively uncovering combinations of genes with prognostic power among ~20,000 protein-coding genes in the genome, is computationally challenging (or impossible) and statistically unfeasible. For example, an exhaustive search space for a 3-gene combination has 8 trillion permutations, far more than can be tested reliably given the number of cancer samples available for study. We postulated that limiting the search space of combinatorial biomarker discovery to leading genes from prognostic pathways might alleviate this problem. We therefore selected up to 20 of the most significant prognostic pathways reported by PET and extracted their leading genes from the corresponding GSEA results. We then generated all possible 1–5 gene combinations from the leading genes and quantified prognostic power by calculating the area under the curve (AUC) statistic of the receiver operating characteristic (ROC) curve for the ability of a gene or gene combinations to distinguish survivors from deceased subjects (see Methods). Combinations of 3–5 leading genes from prognostic pathways were found to be the best predictor of prognosis. Notably, none of the individual gene biomarkers displayed a higher AUC than the combinations identified. These are shown in Fig. 2d, e, Supplementary Fig. S4d, e and Supplementary Data 2d, e, together with their corresponding overall survival plots (Fig. 2f, Supplementary Fig. S4f). As an example, high expression of two collagen genes (COL5A1, COL3A1) and vitronectin (VTN) provides a significantly unfavorable overall survival outcome (<20% 10-year survival; HR 9.5; p < 0.0001) in renal papillary cancers (KIRP; Fig. 2e, f). The ability of this 3-gene combination to distinguish survivors from non-survivors over a prolonged period of follow-up is superior to any other biomarkers of renal cancer in current use⁴⁹. Previous studies have linked the expression of individual collagen genes such as COL5A1 and COL23A1 to renal cancer^50,51. Collagen expression has also been associated with tumor grade⁵² and plays a key role in promoting tumor cell invasion⁵³ in renal cancers. Furthermore, vitronectin is linked to inducing the differentiation of cancer stem cells and the formation of tumors⁵⁴. Thus, it is biologically conceivable that high expression of this 3-gene combination biomarker provides the observed dramatically unfavorable overall survival outcome, surpassing the prognostic power of individual collagen genes alone.
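The size of the search space discussed above can be checked with a short calculation. The sketch below uses the ~20,000 protein-coding genes quoted in the text; the figure of 50 leading genes is a purely hypothetical example of a restricted search space, and the helper names are illustrative.

```python
import math
from itertools import combinations

n_genes = 20_000  # approximate number of protein-coding genes quoted in the text

# Ordered 3-gene selections (permutations): roughly 8 trillion.
ordered_triples = math.perm(n_genes, 3)
# Unordered 3-gene sets (combinations): roughly 1.3 trillion.
unordered_triples = math.comb(n_genes, 3)

def n_combinations(n_leading, max_size=5):
    """Number of 1..max_size gene combinations from n_leading genes."""
    return sum(math.comb(n_leading, k) for k in range(1, max_size + 1))

def gene_combinations(genes, max_size=5):
    """Yield every combination of 1..max_size genes from a list of genes."""
    for k in range(1, max_size + 1):
        yield from combinations(genes, k)

# Hypothetical restricted search space of 50 leading genes.
restricted = n_combinations(50)

# Enumerating combinations for three of the genes named in the text.
example = list(gene_combinations(["COL5A1", "COL3A1", "VTN"]))
```

Even when all combinations of up to five genes are considered, a restricted set of a few dozen leading genes yields a search space several orders of magnitude smaller than the genome-wide 3-gene space.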

Moreover, the combination biomarkers outperformed any individual gene including differentially expressed genes (DEGs; Supplementary Data 2f) in all cancers, and no individual gene was found to be the most effective on its own (Fig. 2d, e, Supplementary Fig. S4d, e). Illustrative examples are indicated in Fig. 2f and Supplementary Fig. S4f, showing Kaplan–Meier overall survival curves of the top combination biomarker compared to the top DEG in each cancer type. As further illustration, we compared the performance of the PET-derived combination biomarkers (Fig. 2f) to that of PAI-1 (Plasminogen activator inhibitor 1) and uPA (Urokinase plasminogen activator) (Fig. 2g), because these two genes are frequently investigated as potential cancer prognostic markers⁵⁵. PAI-1 is associated with poor prognosis in some cancer types, such as breast, colon, and lung cancer. uPA is a protease involved in the degradation of the extracellular matrix, and linked to tumor invasiveness, angiogenesis, and poor prognosis in some cancers, including breast, prostate, and ovarian cancers. We found that although elevated expression of PAI-1 in CESC, LUSC, PAAD, and STAD, and uPA in PAAD may have prognostic value (Fig. 2g), they are both inferior to the identified top combination biomarkers (Fig. 2f). Moreover, the identified combination biomarkers accurately predicted outcomes across independent cohorts of patients. For example, >70% of biomarkers in TCGA cohorts exhibited consistent favorable or unfavorable overall survival outcomes in several independent cohorts of bladder^35,36,37, breast⁵⁶, pancreatic⁵⁷, and stomach^58,59 cancers (Supplementary Fig. S4g; Supplementary Data 2g). We also investigated the percentage of combination biomarkers that consistently exhibited predictive power in later-stage samples. We found that in certain cancer types (e.g. HNSC, PAAD), the early-stage biomarkers consistently predict outcomes even in later stages of cancer. However, in other cancer types (e.g. BLCA, KIRP), only a small fraction of the early-stage biomarkers demonstrate consistency in their predictive ability (Supplementary Data 2h, i).

Mutations play an important role as biomarkers in clinical settings. We therefore sought to compare common cancer mutations with gene expression-based biomarkers for their power in determining overall survival outcome. To this end, we focused on the three most frequently mutated genes in cancer samples, BRAF, TP53 and KRAS, that often confer poor prognosis^60,61,62. For each TCGA cancer type with a sufficient number of samples exhibiting mutations (n > 15), we split the samples into two groups based on their mutational status: wild-type samples without the mutation and samples containing mutant oncogenes. To assess whether these mutations could serve as indicators of survival, we conducted log-rank tests between the wild-type (WT) samples and mutated (mut) samples for each cancer type. Among all the cancer types examined, pancreatic cancer (PAAD) was the only one in which the presence of KRAS mutation served as a significant indicator of poorer overall survival (hazard ratio = 1.8; log-rank p-value = 0.02; Supplementary Fig. S4h). Nevertheless, the KRAS mutation exhibited less discriminatory power compared to gene expression-based biomarkers in distinguishing between favorable (Supplementary Fig. S4f) and unfavorable (Fig. 2f) survival outcomes.

We also assessed the susceptibility of the identified biomarkers or conventional DEGs and their combinations to overfitting, a commonly encountered pitfall in the search for trustworthy biomarkers that leads to poor performance when tested on different samples. We found that combination biomarkers were significantly less prone to overfitting compared to those derived from conventional DEGs (Supplementary Fig. S4g). This indicates that combination biomarkers have a better balance between complexity and robustness, which makes them more suitable for real-world applications. Collectively these data indicate that combination biomarkers identified using PET have great potential to stratify patient outcomes and perform better than existing known biomarkers.

Identifying resilient and vulnerable cancer patients is crucial as it allows healthcare providers to personalize treatment plans, improve prognosis predictions, prioritize resources, and match patients with appropriate clinical trials. This leads to more effective and efficient care, better outcomes, and improved quality of life for patients. We thus considered whether combining the top biomarkers of favorable with the top biomarkers of unfavorable outcome would provide added prognostication power. To do this, we divided the samples into two groups based on expression of the top favorable combination biomarker and two groups based on expression of the top unfavorable combination biomarker. By overlapping these groups, patients can be further divided into subgroups according to simultaneous low or high expression of the top biomarkers of favorable and unfavorable survival. The resilient category was defined as having high levels of the top favorable combination biomarker and low levels of the top unfavorable combination biomarker, while the vulnerable category was defined as having low levels of the top favorable combination biomarker and high levels of the top unfavorable combination biomarker. Indeed, this approach identified highly resilient and highly vulnerable cancer patients (Fig. 3a, Supplementary Fig. S5a, Supplementary Data 3a). For example, resilient patients with papillary or clear cell kidney cancer (KIRP and KIRC) had nearly 90–100% survival, while vulnerable patients had only 10–20% overall survival over a prolonged period of follow-up (~4000 days) (Fig. 3b). This prognostication strategy was also consistent with established independent prognostic biomarkers in these patients⁶³. Specifically, vulnerable patients have more than two-fold higher frequency of repressed CDKN2A expression via copy number loss or DNA methylation when compared to resilient patients (Fig. 3c, Supplementary Data 3b).
The same approach also yielded added benefit in other cancers: in almost all of them, combining the top biomarker of favorable outcome with the top biomarker of unfavorable outcome provided highly accurate prognostication (Supplementary Fig. S5a, Supplementary Data 3a). For instance, the 10-year overall survival rate for resilient patients with bladder cancer is nearly 9 times higher than that of vulnerable patients (HR: 4.6; p < 0.0001; Supplementary Fig. S5a). Such accurate prognostication can have a major influence on clinical decision making and treatment course and can potentially help rationalize less or more aggressive individualized treatment for patients with different levels of vulnerability.
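The grouping described above can be illustrated with a minimal sketch. It assumes that "high" and "low" are defined relative to the cohort median, which is an assumption made here for illustration because the text does not state the cut-off; the function name and the toy scores are hypothetical.

```python
import numpy as np

def stratify(favorable, unfavorable):
    """Label patients from two combination-biomarker scores.

    Assumption: 'high' means above the cohort median. The cut-off
    actually used in the study is not stated in the text.
    """
    favorable = np.asarray(favorable, dtype=float)
    unfavorable = np.asarray(unfavorable, dtype=float)
    fav_high = favorable > np.median(favorable)
    unfav_high = unfavorable > np.median(unfavorable)

    # Patients with concordant scores (high/high or low/low) stay intermediate
    labels = np.full(favorable.shape, "intermediate", dtype=object)
    labels[fav_high & ~unfav_high] = "resilient"
    labels[~fav_high & unfav_high] = "vulnerable"
    return labels

# Toy scores for six hypothetical patients
fav = [2.0, 1.5, -1.0, -2.0, 0.5, -0.5]
unfav = [-1.0, 2.0, 1.0, 2.5, -2.0, -0.2]
labels = stratify(fav, unfav)
```

Patients whose two scores agree in direction fall into neither extreme group, which is why the sketch keeps a third "intermediate" label.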

**Fig. 3: Juxtaposition of favorable and unfavorable prognostic biomarkers offers precise overall survival stratification.**

The molecular subtype of tumors, particularly basal and luminal subtypes, is one of the factors implicated in predicting the outcome of bladder cancer. The basal subtype is associated with muscle invasiveness and worse overall prognosis but is more responsive to platinum-based chemotherapy⁶⁴. To verify whether the identified prognostic markers are consistent with this known prognostic factor, we sourced three independent cohorts of bladder cancer samples^35,64,65 that were annotated with the two major molecular subtypes. Indeed, we found that combination biomarkers of unfavorable prognosis were expressed at high levels in the basal molecular subtype across all three cohorts (Supplementary Fig. S6a, Supplementary Data 3c).

We had previously demonstrated using Benchmark that PET is highly robust to noise (Fig. 1e). To formally test this under real-world settings, we obtained RNA-seq datasets from a different species of mammal. We selected canine bladder cancers that included annotations for luminal and basal subtypes⁶⁶ as these subtypes behave similarly to those in humans and are considered as having good and poor prognosis, respectively. The genes associated with poor prognosis in human bladder cancer effectively differentiated luminal and basal subtypes of canine bladder cancer as well (Supplementary Fig. S6b), demonstrating the robustness of these biomarkers even across different species. Collectively, these data indicate that our biomarker discovery approach can identify combination prognostic biomarkers with significant clinical value and can stratify molecular subtypes of cancer with differential prognosis.

PET-identified biological pathways guide drug repurposing

Since prognostic markers are derived from dysregulated pathways, we postulated that manipulating gene expression to promote the biomarkers of favorable prognosis and demote the biomarkers of poor prognosis by re-purposing known drugs might provide clinical benefit. Major consortium efforts have created a valuable library of integrated cellular signatures (LINC) that includes cellular transcriptional responses to large numbers of pharmacologic perturbagens⁶⁷. Similar signatures have also been derived from mining large numbers of publicly available sequencing read archive (SRA) data⁶⁸. Recently we showed that computational screening for drugs that normalize expression of genes in highly prognostic and pathogenic pathways using cellular signatures of those drugs provides significant therapeutic potential^69,70. Thus, we computationally screened >1600 drugs (mostly FDA-approved) for which the up- or down-regulated drug targets are known⁶⁸ (Supplementary Data 4a). Drugs were ranked by their ability to suppress unfavorable and to promote favorable prognostic biomarkers. The rank was calculated based on the significance of overlap between genes up/down regulated by a drug and genes from favorable/unfavorable prognostic pathways in cancer (Fig. 4a; see Methods). This in silico screening predicted several drugs already in clinical use for each cancer type, as well as many drugs that have not yet been tested (Fig. 4b and Supplementary Data 4b). Key interactions between highly significant unfavorable prognostic pathways and drugs that can normalize these are summarized for each cancer type in Fig. 4c. By way of validation, the top predictions included drugs currently in clinical practice or under trial for some of these cancers. For example, doxorubicin, a commonly used chemotherapy agent for the management of metastatic breast cancer^71,72, was the second top prediction for breast cancer. 
Celastrol, the top prediction for colon cancer, was recently shown to inhibit proliferation of colorectal cancer cells and migration via the PI3K/AKT pathway⁷³. Ruxolitinib, the top prediction for liver cancer, is known to have marked tumoricidal effects on hepatocellular carcinoma by inhibiting JAK/STAT signaling⁷⁴. Interestingly, irinotecan, a topoisomerase inhibitor, which regulates the cell cycle, was the top prediction for pancreatic cancer (PAAD), a tumor type in which “cell cycle” is among the top pathways associated with unfavorable prognosis (Fig. 2c). Last year, the FDA approved investigational use of liposomal irinotecan for first line treatment of pancreatic cancer patients^75,76,77. Two other pathways strongly associated with unfavorable outcome from pancreatic adenocarcinoma are the Erbb1 (also known as EGFR) and MYC pathways; both of these have previously been associated with worse outcomes from this cancer type^78,79 and are also inhibited by irinotecan⁸⁰. These data indicate that matching drug signatures and prognostic markers can be used for repurposing existing therapeutics.
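The ranking idea described above (scoring each drug by how significantly its up- and down-regulated genes overlap the favorable and unfavorable prognostic genes) can be sketched as follows. This is an illustrative approximation only: the hypergeometric test, the use of Fisher's method to merge the two overlap p-values, and all gene and drug names are assumptions made here, not the procedure given in the Methods.

```python
from scipy.stats import hypergeom, combine_pvalues

def overlap_pvalue(set_a, set_b, n_background):
    """One-sided hypergeometric p-value for an overlap at least this large."""
    k = len(set_a & set_b)
    # sf(k - 1) is P(X >= k); clip guards against rounding above 1
    return min(1.0, float(hypergeom.sf(k - 1, n_background, len(set_a), len(set_b))))

def score_drug(drug_up, drug_down, favorable, unfavorable, n_background):
    """Smaller is better: drug raises favorable and lowers unfavorable genes."""
    p_up = overlap_pvalue(drug_up, favorable, n_background)
    p_down = overlap_pvalue(drug_down, unfavorable, n_background)
    # Assumption: the two p-values are merged with Fisher's method
    return float(combine_pvalues([p_up, p_down], method="fisher")[1])

# Hypothetical gene sets and drug signatures
favorable = {"g1", "g2", "g3", "g4", "g5"}
unfavorable = {"g6", "g7", "g8", "g9", "g10"}
drugs = {
    "drugA": ({"g1", "g2", "g3", "g50"}, {"g6", "g7", "g60"}),
    "drugB": ({"g70", "g71"}, {"g1", "g80"}),
}
scores = {name: score_drug(up, down, favorable, unfavorable, 1000)
          for name, (up, down) in drugs.items()}
ranked = sorted(scores, key=scores.get)
```

In this toy example only the first hypothetical drug pushes genes in the desired directions, so it is ranked first.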

**Fig. 4: PET-derived prognostic pathways guide drug repurposing.**

CDK2/9 inhibition restricts cervical and bladder cancer cell growth in vitro and/or in vivo

We next sought to test the efficacy of one of our predicted drugs. The top predicted drug for bladder and cervical cancer was a cyclin-dependent kinase (CDK)2/9 inhibitor named 0175029 (Fig. 4b). This was somewhat surprising because inhibitors of other CDKs, specifically CDK4/6 inhibitors, have previously been proposed as monotherapy for bladder cancer. The use of CDK4/6 inhibitors has been justified because elevated expression of either CDK4 or CDK6 is associated with poor survival in bladder cancer, whereas high expression of CDK2/9 has been associated with better survival (Supplementary Fig. S7a). However, CDK4/6 inhibitors including Palbociclib did not meet the primary endpoint during phase II clinical trials for bladder cancer^81,82,83,84. Moreover, taking the conventional DEG-based approach, where one normalizes the most differentially expressed genes between cancer survivors and non-survivors, does not identify the CDK2/9 inhibitor as a significant lead in either bladder or cervical cancer (Supplementary Fig. S7b, Supplementary Data 4c). It is worth noting that although the CDK2/9 inhibitor emerges as the top prediction for bladder cancer using the DEG-based approach, it does not reach the statistical significance threshold (Supplementary Fig. S7b). We therefore scrutinized whether PET-based drug prediction specifically advocates CDK2/9 or CDK4/6 inhibition in bladder cancer. Since the drug database used for screening above⁶⁸ did not include any CDK4/6 inhibitors, we computationally screened an additional 5425 compounds from the LINC1000 database⁸⁵ that included the CDK4/6 inhibitor Palbociclib. PET-based drug predictions did not advocate Palbociclib (p-value = 1; rank 1601) but returned 0175029, a CDK2/9 inhibitor, as the number 1 scoring drug that can significantly normalize genes of prognostic pathways (Supplementary Fig. S7c, Supplementary Data 4d).

For experimental validation, we then tested the ability of a panel of CDK inhibitors to restrict the growth of bladder and cervical cancers in vitro, including Palbociclib (a CDK4/6 inhibitor), GW8510 (a CDK2 inhibitor), and Seliciclib (a CDK2/7/9 inhibitor) (Supplementary Fig. S7d). Because 0175029 is not commercially available we used its recently identified analog CCT068127 which is also a potent CDK2/9 inhibitor⁸⁶. Consistent with PET-based drug prediction, CCT068127, but not Palbociclib, significantly inhibited the growth of both cervical and bladder cancer cell lines, whereas the others either did not, or had minimal effect (Fig. 5a). We also found that in two out of three independent cohorts of bladder cancer samples, the expression of CDK9 (but not CDK2) was higher in the basal subtypes of bladder cancers compared to luminal subtypes (Supplementary Fig. S7e), suggesting that CDK9 inhibition might be more beneficial for the basal subtype of bladder cancer. CDK9 expression also showed a significant correlation (Pearson r = 0.71; p-value = 0.006) with unfavorability scores across bladder cancer cell lines available in the CCLE database²² (see Fig. 5b and Supplementary Data 5a). The unfavorability score for each sample was calculated as the mean (z-score) expression of genes associated with an unfavorable outcome (Supplementary Data 2e). To expand on this, we referenced publicly available data from the GDSC database⁸⁷ where a range of bladder cancer cell lines were treated with a different CDK9 specific inhibitor (i.e. CDK9_5038) and IC50 values were calculated. Despite the difference in inhibitors, we observed a trend where cell lines with higher unfavorability scores required higher concentrations of the inhibitor to achieve similar inhibition of proliferation (Pearson r = 0.54, p < 0.06) (Fig. 5b). These associations suggest a broader efficacy of CDK9 inhibitors across bladder cancer cell lines, with efficacy correlating inversely with unfavorability scores.
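The unfavorability score defined above (mean z-score expression of the unfavorable-outcome genes) can be sketched in a few lines. The sketch assumes a genes-by-samples matrix restricted to the unfavorable genes and a population standard deviation when z-scoring; both are illustrative choices, and the function name is hypothetical.

```python
import numpy as np

def unfavorability_scores(expr):
    """Mean z-score per sample over a genes x samples expression matrix.

    Assumptions: rows are the unfavorable-outcome genes, z-scores are
    computed per gene across samples, and the population standard
    deviation (ddof=0) is used.
    """
    expr = np.asarray(expr, dtype=float)
    mean = expr.mean(axis=1, keepdims=True)
    std = expr.std(axis=1, keepdims=True)
    z = (expr - mean) / std
    return z.mean(axis=0)

# Two hypothetical genes measured in three cell lines
toy_expr = [[1.0, 2.0, 3.0],
            [10.0, 20.0, 30.0]]
toy_scores = unfavorability_scores(toy_expr)
```

The resulting per-sample scores could then be correlated with CDK9 expression or IC50 values, for example with `scipy.stats.pearsonr`.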

**Fig. 5: Predicted CDK2/9 inhibition restricts the growth of cervical and bladder cancers in vitro and in vivo.**

At present, the first line of treatment for bladder cancer consists of chemotherapy regimens that feature cisplatin. As expected, cisplatin also significantly inhibited the growth of bladder cancer cell lines (Fig. 5a). It is worth noting that our approach also predicted that cisplatin desirably normalizes prognostic genes (Supplementary Fig. S7c). To further investigate the impact of CCT068127 on bladder cancer cells, we performed RNA-sequencing on T24 bladder cancer cells treated with carrier or CCT068127 (see Methods). 822 genes were found to be up-regulated and 490 genes down-regulated (RPKM > 2, fold change ≥ 3 in either direction at adjusted p-value ≤ 0.05; Fig. 5c; Supplementary Data 5b). CCT068127 significantly reduced the expression of genes associated with CDK9 inhibition, while showing no significant effect on genes associated with CDK4/6 inhibition (Fig. 5c). To test whether CCT068127 can normalize the predicted prognostic signature, we gathered all the genes linked to favorable or unfavorable prognosis from the prognostic pathways identified by PET and conducted a gene set enrichment analysis. As anticipated from the drug predictions, genes that were upregulated by CCT068127 were significantly enriched in those with PET-predicted favorable prognosis. Conversely, genes that were downregulated by CCT068127 were significantly enriched in those with PET-predicted unfavorable prognosis (Fig. 5c, d). The up-regulated genes included PRDX4 and DIO3, which were determined by PET to be key markers of a favorable prognosis (Supplementary Fig. S4e). Conversely, the down-regulated genes included XAF1, KPNB1 and PAS1, which were found to be significant markers of an unfavorable prognosis (Fig. 2e). These data corroborated the expectation that the top predicted drug normalized identified prognostic genes.

We next tested the in vivo efficacy of CCT068127 in mice. Pharmacokinetic studies identified daily administration of up to 30 mg/kg of CCT068127 for two weeks to be well-tolerated by mice, as evidenced by stable weight during that time (Supplementary Fig. S7e) and lack of gross pathology on necropsy. We then implanted bladder cancer cells subcutaneously in immunodeficient mice and after two weeks, when tumors were approximately 100 mm³ in size, randomly assigned mice to control or treatment groups, injecting them with 30 mg/kg/day of DMSO or CCT068127 i.p. for 10 days, respectively (Fig. 5e). CCT068127-treated mice demonstrated a significant reduction in both tumor volumes and weights (Fig. 5f, g), indicating that CCT068127 was effective at restricting tumor growth in vivo.

Taken together, these data indicate that, despite being unexpected from traditional methods, CDK9 inhibition predicated on favorable and unfavorable prognostic biomarkers suppresses tumor cell growth both in vitro and in vivo. Furthermore, they corroborate the notion that a PET-based approach can guide repurposing drugs that prove effective in the treatment of cancer.

Discussion

The contemporary investigation of the underlying mechanisms in cancer relies disproportionately on obtaining genome-wide omics data. Biological pathway analysis is a crucial tool in interpreting these data. As such, several pathway analysis methods have been developed, with the implicit assumption that they identify and prioritize the most important pathways among their top-ranked outputs (reviewed here^2,3). Indeed, these methods have been successful in confirming some of the speculated and/or dysregulated pathways. Many of these methods have also been tested on simulated data⁶ or anecdotally verified in small scale biological datasets^2,8,9. However, the performance and limitations of these methods for unbiased discovery of related biological pathways have been largely unknown.

Here, we developed a platform named Benchmark to explicitly measure performance of pathway analysis tools for discovery under experimental settings. It is worth noting that Benchmark has several key advantageous features, making it superior to existing gold-standard evaluation platforms⁷. First, IGSs and TGSs that are used for evaluation in Benchmark are derived from large numbers (~1000) of experimental datasets, where expected pathways are well-defined. This is in clear contrast to the few existing evaluation platforms that utilize small experimental datasets^2,7, computer-simulated pathways⁸⁸, and/or subjective measures for defining pathways². Second, since IGSs and TGSs in Benchmark are derived from experimental data, they eliminate systematic biases introduced by computer simulations. For example, IGSs and TGSs in Benchmark are free from biases introduced by the assumption, made in most simulations, that genes within each pathway are independent of each other. Third, IGSs and TGSs recapitulate a realistic setting where genesets under investigation (i.e., IGSs) are often derived under conditions (e.g., cell lines, species, conditions) that are different from those of known biological pathways (TGSs). Fourth, because Benchmark uses experimental data, it contains noise inherent to biological data and therefore provides a platform for a faithful readout of the performance of pathway analysis tools under real-life settings. Lastly, we present two evaluation measures, namely P@10 and AP@10, which accurately reflect the practical application of pathway analysis methods for biologists.

Our findings from Benchmark show that while common pathway analysis methods are effective for confirming speculated pathways, they are not optimal for unbiased and systematic discovery of the most disrupted pathways. Our study found that PET, which integrates the ranking statistics of several pathway analysis tools, is significantly more effective for discovery and has high accuracy and noise resistance. It is worth noting that ensemble approaches, such as decoupler¹¹, piano¹², and egsea⁸⁹, have been previously developed for pathway analysis and have generally resulted in improved outcomes compared to single method approaches. In our evaluation using Benchmark, we found that these ensemble methods, when employed with default settings and when the underlying methods and input parameters were chosen ‘blindly’ without careful consideration, either performed marginally better or were comparable to their underlying methods. Our results demonstrated that optimizing the input parameters of individual pathway analysis methods not only significantly improves their performance but also enhances the results of the ensemble approach overall. Here we have optimized and combined results from the three most common pathway analysis methods. However, additional research is required to assess how to effectively integrate statistics from other pathway analysis methods to improve pathway discovery. The number of replicates in each group is an important consideration in controlling for false discovery rates⁹⁰. Our Benchmark is generally constructed from 2 replicates per group available in ENCODE¹⁰. However, the average number of overlapping genes among related genesets (n = 16) compared to unrelated genesets (n = 1.3) across cell lines, obtained from 2 replicates (Supplementary Fig. S1b, f), suggests that this sample size is sufficient for accurately distinguishing related from unrelated genesets. 
Future studies will extend the Benchmark to include additional genesets and replicates, and to evaluate the suitability of pathway analysis methods for increasingly available single cell data.

We next applied PET to systematically determine pathways associated with prognosis in 12 different well-characterized cancer types. The predicted pathways from PET were correlated with prognostic features hypothesized from biological experiments and were confirmed in independent datasets. We further leveraged genes from the prognostic pathways to identify single gene and combination prognostic biomarkers. It is important to note that exhaustive genome-wide searches to identify combinations of a few genes as biomarkers (even as few as 3) are computationally and statistically unfeasible. Thus, many studies extract large gene signatures (>50 genes) associated with favorable or unfavorable prognosis by identifying differentially expressed genes (DEGs). Although useful, these signature sets are challenging to apply in a clinical setting⁴⁸. Our data indicate that a search space of only 3–5 leading edge genes from prognostic pathways yields gene combinations with superior prognostic power when compared to single genes or DEGs. Importantly, combining leading biomarkers of favorable and unfavorable prognoses resulted in the categorization of patients into highly resilient and highly vulnerable groups, providing guidance for determining the level of treatment intensity. Moreover, our data show that genes from prognostic pathways distinguish known molecular subtypes of cancer with high precision. Thus it is likely that these genes will also identify previously unknown molecular subtypes, but further investigations are needed to provide proof. It is important to recognize that the selection of samples can influence the identification of prognostic pathways and biomarkers. In our analyses, we classified patients with unfavorable outcomes as those in early cancer stages who passed away, and age and sex matched patients with favorable outcomes as those in later stages who were marked as alive in the TCGA cohort.
A more precise alternative selection criterion would have been to classify deceased patients with survival times less than the median, and age and sex matched alive individuals with days to last follow-up more than the median of all samples as unfavorable and favorable selections, respectively.

Pathways with prognostic potential could provide a means for identification of effective therapeutics. Indeed, our data indicate that a computational search for therapeutics that can normalize the expression of genes from prognostic pathways can be used for drug repurposing. Proof-of-concept in vitro and in vivo validations of the top predicted repurposed drug in bladder and cervical cancer cells indicate the potential of our approach. The prediction of CDK9 but not CDK4/6 inhibition by our approach was somewhat unexpected but aligned with the observation that Palbociclib (a CDK4/6 inhibitor) did not meet the primary endpoint during phase II clinical trials in bladder cancer^81,82,83,84. Previous studies have indicated that alterations in the CDK4/6 pathway could potentially drive bladder cancer^91,92,93 but the cell lines and assays used in these studies are slightly different from ours. Consistent with those studies, our cell viability measurements, conducted using the CCK-8 assay, also demonstrated a modest yet significant effect of Palbociclib on T24 cells. Further in vivo investigation using patient-derived bladder cancer xenograft (PDX) models is warranted⁹⁴ to compare the effects of CDK9 inhibitors and CDK4/6 inhibitors on primary bladder tumors. Of note, our data-driven approach will also provide the mechanistic rationale (i.e., the involved pathways) for identified therapeutics and offer insights into basic mechanisms that can be exploited for drug development. However, whether predicted prognostic pathways contribute directly to the pathophysiology (as opposed to merely representing biomarkers) can only be determined via experimentation. It is also important to note that prognostic biomarkers, which predict patient outcomes, and predictive biomarkers, which determine the efficacy of individual drugs, are often distinct from each other.
With increasing omics data from responders and non-responders, future research will be needed to apply pathway enrichment analysis to identify not only prognostic pathways but also predictive pathways.

Methods

Ethics statement

All procedures involving mice were conducted in accordance with the guidelines established in the Guide for the Care and Use of Laboratory Animals of the National Institutes of Health, USA. The protocol was approved by the Institutional Animal Care and Use Committee at Purdue University (PACUC Protocol #1908001941). Animals were housed in the Hansen mouse facility at Purdue University. All mice were fed and watered ad libitum and housed in a facility maintained at ambient temperature and humidity with 12-h light/12-h dark cycles. All mice were sacrificed prior to the tumors reaching a mean diameter of 2.0 cm or in the event of ulceration with necrosis/infection as outlined in the PACUC policy on “Humane endpoint criteria for rodent tumor studies”.

Cell culture

T-24 and UMUC3 human urinary bladder cancer cells were acquired from ATCC (HTB-4™ and CRL-1749™, respectively). Both cell lines were maintained in Roswell Park Memorial Institute medium (RPMI 1640; Gibco) supplemented with 10% fetal bovine serum (FBS; Hyclone Lab, Logan, Utah) and 1% penicillin/streptomycin (PS) (Invitrogen, Carlsbad, CA, USA) in a humidified chamber with 5% CO₂ at 37 °C. HeLa human cervical carcinoma cells were acquired from ATCC (CCL-2™) and maintained in complete Eagle’s Minimum Essential Medium (MEM, Gibco) under similar conditions. The cell lines used here were not independently authenticated but were regularly tested for mycoplasma contamination using the Mycoplasma Detection Kit (abm, cat#G238).

Cell viability assay

CDK inhibitors GW8510 (CDK2), Palbociclib (CDK4/6), Seliciclib (CDK2/7/9) and CCT068127 (CDK2/9) were purchased from AOBIOUS, INC (Gloucester, MA, USA), dissolved in dimethyl sulfoxide (DMSO) to make stock solutions of 10 mM, and stored at −20 °C. Working concentrations, as indicated, were further diluted with either cell culture media or saline. HeLa, UMUC3 and T-24 cells (approximately 20,000 per well) were seeded in a 96-well plate 24 h prior to treatment with various concentrations of the drugs for 72 h at 37 °C. Cell viability was measured using Cell Counting Kit-8 (APExBIO; Houston, USA) reagent following the manufacturer’s recommendations. Briefly, 10 µl of CCK-8 reagent per 100 µl media was added to each well and kept at 37 °C for 2 h. The absorbance of WST-8 formazan was recorded using a Veritas Microplate Luminometer at 450 nm (BioTec; Epoch2). The viability (%) was calculated using the formula: Cell viability (also called survival rate) (%) = [(Absorbance of tested compound − Absorbance of blank)/(Absorbance of control − Absorbance of blank)] × 100.
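As a worked instance of the viability formula above, the following small sketch computes percent viability from three absorbance readings. The function name and the example absorbances are hypothetical.

```python
def cell_viability(abs_treated, abs_control, abs_blank):
    """Percent viability from blank-corrected absorbance at 450 nm."""
    return (abs_treated - abs_blank) / (abs_control - abs_blank) * 100.0

# Hypothetical readings: treated well 0.6, vehicle control 1.1, blank 0.1
viability = cell_viability(0.6, 1.1, 0.1)
```

With these illustrative readings the treated well retains half of the blank-corrected signal of the control.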

RNA-sequencing and data analysis

T-24 cells (approximately 1 × 10⁶ per well) were seeded in a 6-well plate. The next day, cells were treated with either vehicle (DMSO) control or CCT068127 at 1 µM concentration for 48 h. Cells were stored in TRI Reagent (ZYMO Research International, Tustin, CA) at −80 °C. Total RNA was extracted from cells using a Direct-zol RNA MiniPrep Plus (ZYMO Research International; Tustin, CA). An Agilent 4200 TapeStation system was used to determine quality and quantity of RNA. Samples having an RNA integrity number (RIN) value above 8.5 were used for library preparation. 500 ng total RNA for each sample was subjected to NEBNext Poly(A) mRNA Magnetic Isolation (E7490) and the resulting mRNA libraries were prepared for RNA-seq by NEB Ultra II kit (New England Biolabs) following the manufacturer’s protocols and sequenced by NovaSeq (Illumina). All sequenced reads were aligned to the hg38 genome using RSEM⁹⁵ with parameters “rsem-calculate-expression --bowtie-n 1 --bowtie-m 100 --seed-length 28 --bowtie-chunkmbs 1000”. Differential expression analysis was performed using edgeR. Average RPKM ≥2 in at least one condition, fold change ≥3 and adjusted p-value ≤0.05 were used to determine DEGs.
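The three DEG criteria above can be expressed as a simple filter. This is a minimal sketch, assuming that the fold-change threshold applies in either direction and that fold change is supplied on a log2 scale; the function and argument names are hypothetical.

```python
import math

def is_deg(mean_rpkm_control, mean_rpkm_treated, log2_fold_change, adj_p,
           min_rpkm=2.0, min_fold=3.0, max_adj_p=0.05):
    """Apply the expression, fold-change and significance cut-offs."""
    # Average RPKM must reach the threshold in at least one condition
    expressed = max(mean_rpkm_control, mean_rpkm_treated) >= min_rpkm
    # Assumption: a 3-fold change counts whether it is up or down
    changed = abs(log2_fold_change) >= math.log2(min_fold)
    return expressed and changed and adj_p <= max_adj_p

# Hypothetical genes: (control RPKM, treated RPKM, log2 fold change, adjusted p)
up_gene = is_deg(1.0, 5.0, 2.0, 0.01)
low_expression = is_deg(1.0, 1.5, 3.0, 0.001)
small_change = is_deg(10.0, 12.0, 0.3, 0.001)
```

Only the first hypothetical gene passes all three cut-offs; the second fails the expression filter and the third fails the fold-change filter.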

The TCGA data analysis used read count, normalized expression profiles, FPKM, and clinical information from samples of 12 cancer types (as shown in Fig. 2a). Stage selection was based on ajcc_pathologic_stage (NCI Thesaurus Code: C28257) and ajcc_pathologic_t (NCI Thesaurus Code: C48881 and C48739) from clinical data. In cases where both indicators were present but contradictory, the indicator corresponding to the earlier tumor stage was prioritized. To account for age as a potential confounder, all deceased samples and those annotated as alive with an age higher than the first quantile of all deceased patients in the dataset were included. The selected samples were tested to confirm that age and sex were not significant confounders in any cancer type.

For canine data analysis, we employed PET to discover prognostic pathways in later stages of human Bladder Urothelial Carcinoma (BLCA) patients (annotated as T3 and T4). The ability to predict outcomes for single genes and combinations of genes from the identified pathways was determined (see below). Finally, the genes with the strongest prognostic abilities were used to distinguish molecular subtypes of canine tumors.

Xenograft mouse model

All mice used in this study were purchased from Jackson Laboratory (Strain# 025216, NOD.Cg-Prkdc^scid H2-K1^b-tm1Bpe H2-Ab1^g7-em1Mvw H2-D1^b-tm1Bpe Il2rg^tm1Wjl/SzJ) and were acclimatized and maintained in the Hansen mouse facility at Purdue University. Due to availability, 10–11-week-old female mice were used in all experiments. Tumors were implanted into the flanks of these mice. Briefly, cultured UMUC3 cells were resuspended at 1 × 10⁶ cells/100 µl in PBS. Mice were anesthetized with isoflurane. Cells were injected subcutaneously into the left and right flank of each mouse using a 29G1/2 needle. Animals were monitored and weighed every day. Tumor volumes were measured using a digital caliper and calculated by the formula: Tumor volume = (length × width × height) × 0.52. On the 14th day post inoculation, mice had developed tumors of about 100 mm³. Mice were randomized into two groups and treated for the next 10 days with vehicle (DMSO in 100 µl saline) or CCT068127 in saline (30 mg/kg). All treatments were administered by intra-peritoneal injections.

Pharmacokinetic study

Groups of NSG female mice (n = 4) were injected intraperitoneally every day for two weeks with doses of 10 mg/kg, 20 mg/kg or 30 mg/kg of body weight of CCT068127 in saline. Mice were monitored daily for weight changes and clinical signs of toxicity. After completion of the experiment, mice were euthanized by CO₂ inhalation. A necropsy was performed in a systematic manner to identify any changes in anatomical features due to drug toxicity. We selected a dose of 30 mg/kg of CCT068127 for further experiments.

Benchmark data

We curated raw and processed data from the ENCODE project¹⁰, as listed in Supplementary Data 1a. The expected counts and normalized gene expression values, FPKM, and statistics from differential expression analysis were extracted from HepG2 and K562 shRNA-seq datasets. For ChIP-seq, we annotated the processed peaks to the nearest gene using HOMER annotatePeaks.pl⁹⁶ for the A549, CH12.LX, GM12878, HEK293, HepG2, K562 and MEL cell lines. RNA protein binding profiles (i.e. eCLIP-seq) were processed similarly to ChIP-seq for HepG2 and K562.

Pathway ensemble tool

PET combines the outputs from three underlying pathway analysis methods: GSEA, ora and Enrichr. Reported pathways were ordered by normalized enrichment score (NES), ora p-values by Fisher exact test, Enrichr “combined scores”, respectively, and the ranks were determined. For each pathway, an average rank between the three methods was calculated and a combined p-value was estimated by Stouffer’s Method using scipy.stats.combine_pvalues with method = “stouffer”. Finally, all pathways were ordered based on the average rank.
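The rank aggregation described above can be sketched as follows. This is an illustrative reconstruction, not the PET source code: it assumes that each method contributes one p-value per pathway to the Stouffer combination and that ties receive average ranks; the function name, argument layout and toy values are hypothetical. `scipy.stats.rankdata` and `scipy.stats.combine_pvalues` are existing SciPy functions.

```python
import numpy as np
from scipy.stats import rankdata, combine_pvalues

def pet_combine(pathways, gsea_nes, ora_p, enrichr_score, method_pvalues):
    """Order pathways by average rank across GSEA, ora and Enrichr.

    method_pvalues: one list of three p-values per pathway (assumption:
    one p-value from each underlying method).
    """
    ranks = np.vstack([
        rankdata(-np.asarray(gsea_nes, dtype=float)),       # higher NES first
        rankdata(np.asarray(ora_p, dtype=float)),            # smaller p first
        rankdata(-np.asarray(enrichr_score, dtype=float)),   # higher score first
    ])
    avg_rank = ranks.mean(axis=0)
    combined_p = [float(combine_pvalues(p, method="stouffer")[1])
                  for p in method_pvalues]
    order = np.argsort(avg_rank, kind="stable")
    return [(pathways[i], float(avg_rank[i]), combined_p[i]) for i in order]

# Three hypothetical pathways
pet_result = pet_combine(
    ["A", "B", "C"],
    gsea_nes=[2.5, 1.0, 1.8],
    ora_p=[0.001, 0.2, 0.01],
    enrichr_score=[50.0, 5.0, 30.0],
    method_pvalues=[[0.001, 0.001, 0.002], [0.3, 0.2, 0.4], [0.02, 0.01, 0.03]],
)
```

Each returned tuple holds the pathway name, its average rank and its combined p-value, with the final ordering determined by average rank as in the description above.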

Ora was re-implemented using the scipy.stats.fisher_exact function. Enrichr was re-implemented in Python as described in the “Computing enrichment” section of the original manuscript⁵ using N = 1000 permutations. GSEA was invoked using the different ranking metrics indicated in the manuscript. For “signal2noise” and “ratio of classes”, the normalized expression values were used with GSEA parameters set as “-collapse No_Collapse -set_max 500 -set_min 20”. For other modes (e.g. DESeq2, edgeR), a pre-ranked gene list was generated from statistics of differential expression analysis or peak calling results, where -log10(p-value)*sign(log(fold change)) was used as the gene weight. The “prerank” mode parameters were set as “-collapse No_Collapse -set_max 500 -set_min 20”. All the other parameters were kept as default. The PET implementation in Python incorporates the option to use DESeq2⁹⁷ or PyDESeq2⁹⁸, as well as GSEA⁴ or GSEApy⁹⁹, giving users flexibility in selecting the platform for their analysis.
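Two of the steps above lend themselves to a short sketch: an over-representation test built on `scipy.stats.fisher_exact`, and the signed gene weight used for pre-ranked lists. The one-sided alternative, the construction of the 2×2 table and all names are assumptions made for illustration and may differ from the PET implementation.

```python
import math
from scipy.stats import fisher_exact

def ora_pvalue(query_genes, pathway_genes, background_genes):
    """Fisher exact test for over-representation of a pathway in a query set."""
    background = set(background_genes)
    query = set(query_genes) & background
    pathway = set(pathway_genes) & background
    a = len(query & pathway)           # in query and in pathway
    b = len(query - pathway)           # in query only
    c = len(pathway - query)           # in pathway only
    d = len(background) - a - b - c    # in neither
    # Assumption: enrichment is tested one-sided
    _, p = fisher_exact([[a, b], [c, d]], alternative="greater")
    return float(p)

def prerank_weight(p_value, log_fold_change):
    """Gene weight: -log10(p-value) * sign(log(fold change))."""
    sign = (log_fold_change > 0) - (log_fold_change < 0)
    return -math.log10(p_value) * sign

# Hypothetical 100-gene background with a 10-gene pathway
background = [f"g{i}" for i in range(100)]
pathway = [f"g{i}" for i in range(10)]
p_hit = ora_pvalue([f"g{i}" for i in (0, 1, 2, 3, 4, 50, 51, 52, 53, 54)],
                   pathway, background)
p_miss = ora_pvalue([f"g{i}" for i in range(50, 60)], pathway, background)
weight = prerank_weight(0.01, -1.5)
```

A query sharing half of its genes with the pathway yields a small p-value, whereas a query with no shared genes yields a p-value of 1.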

Running ensemble methods

To conduct pathway analyses using e.browser¹⁴, we utilized the built-in function deAna (expr_mat, de.method = “DESeq2”) to make the differential gene expression analyses consistent with PET. Every underlying pathway analysis method supported by e.browser was executed (except padog which had technical errors), and their results were combined using “combResults” and then ranked by “gsRanking” function.

To conduct pathway analyses using piano¹², the geneLevelStats were provided as the p-values from DESeq2 analyses for all the methods allowed or log2 fold change as required by the remaining methods. Every underlying pathway analysis method supported by piano was executed, and their results were combined using consensusScores (res_list, class = ‘distinct’, direction = ‘up’).

To conduct pathway analyses using decoupler¹¹, we utilized the function run_consensus. The DESeq2 statistics, defined as -log10(p-value) for each gene were provided as the required ‘mat’ argument. All Benchmark genesets were provided as the required ‘net’ argument, with each geneset ‘connected’ to its genes. Then, all genesets were ranked based on ‘consensus_estimate’ score returned by run_consensus function.

All other indicated pathway analysis methods were executed using egsea Bioconductor toolkit⁸⁹ provided with read count matrix. The “average rank” setting for egsea resulted in the best overall median rank, precision and average precision scores among all its other ranking metrics and thus was chosen to be reported here.

Simulated noise and false-discovery rate calculation

To simulate inaccuracies and noise in biological pathways, a portion of genes in each pathway were substituted with an equal number of randomly chosen genes from the entire human genome. To calculate false discovery rates, we first collected raw read counts and normalized gene expression data from TCGA breast invasive carcinoma (BRCA), TCGA lung adenocarcinoma (LUAD), TCGA prostate adenocarcinoma (PRAD), and GTEx breast, lung and prostate tissues. We randomly selected 2 N samples (i.e. N = 2–10, 15, 20, 25, 30) from each tissue/cancer and split them into two groups, and ran pathway analysis methods on the expression matrix of the two groups against all canonical pathways (c2.cp.v7.1.symbols.gmt), where the permutations are repeated 100 times for each N. For methods requiring a pre-ranked gene list or set of genes, we utilized DESeq2 to perform the differential expression analysis and used –log10(p-value) or top 200 DEGs as input for those methods. Since the samples were randomly selected biological replicates, the expectation is that any reported pathway significantly different between the groups represents false discovery, where significance is defined as the adjusted p-value ≤ 0.05. Thus, the proportion of significant pathways among all the pathways is defined as the false-discovery rate.
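The noise simulation and the false-discovery rate defined above can be sketched with the standard library alone. The sketch assumes that replacement genes are drawn from genes outside the original pathway, so that the requested fraction is actually replaced; the text only says they are drawn from the entire genome. All names are hypothetical.

```python
import random

def add_noise(pathway_genes, genome_genes, fraction, rng):
    """Replace a fraction of pathway genes with random genome genes."""
    genes = list(pathway_genes)
    n_swap = round(len(genes) * fraction)
    kept = rng.sample(genes, len(genes) - n_swap)
    # Assumption: replacements come from outside the original pathway
    original = set(genes)
    pool = [g for g in genome_genes if g not in original]
    return kept + rng.sample(pool, n_swap)

def false_discovery_rate(adjusted_pvalues, alpha=0.05):
    """Proportion of pathways called significant between random groups."""
    significant = sum(p <= alpha for p in adjusted_pvalues)
    return significant / len(adjusted_pvalues)

# Hypothetical genome of 1000 genes and a 20-gene pathway with 25% noise
genome = [f"g{i}" for i in range(1000)]
pathway = genome[:20]
noisy = add_noise(pathway, genome, 0.25, random.Random(0))
fdr = false_discovery_rate([0.01, 0.04, 0.2, 0.9])
```

Because both groups in the false-discovery experiment are random draws of biological replicates, every pathway counted by `false_discovery_rate` is treated as a false positive.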

Prognostic power of individual genes, gene combinations and independent validation

The prognostic power of each biomarker was quantified using the area under the curve (AUC), based on the ability of a gene or gene combination biomarker to distinguish different sample groups, e.g., alive vs. deceased or progression vs. no progression to the next tumor stage. The gene expression values were first transformed to z-scores (across samples). For gene combination biomarkers, the average z-score of the participating genes was calculated. The z-scores were then used to calculate AUC values with sklearn.metrics.roc_auc_score. To assess the significance of individual genes or combination biomarkers, p-values for the logrank test of survival were generated with LIFELINES [https://doi.org/10.21105/joss.01317] and FDR-adjusted for multiple testing with statsmodels.stats.multitest.fdrcorrection. Adjusted p-values below 0.05 were considered significant.
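As a rough illustration of the scoring step, the sketch below z-scores each gene across samples, averages the z-scores for a gene combination, and computes the AUC. It is a stdlib stand-in rather than the authors' code: roc_auc uses the pairwise (Mann-Whitney) formulation of the AUC instead of calling sklearn.metrics.roc_auc_score, and all names and expression values are hypothetical.

```python
from statistics import mean, pstdev


def zscores(values):
    """Z-score one gene's expression values across samples."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def combination_score(expression_by_gene):
    """Average the per-gene z-scores, sample by sample."""
    per_gene = [zscores(vals) for vals in expression_by_gene.values()]
    return [mean(sample) for sample in zip(*per_gene)]


def roc_auc(labels, scores):
    """AUC as the chance a positive sample outscores a negative one."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy data: two genes measured in four samples.
expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 1.0, 4.0, 3.0],
}
deceased = [0, 0, 1, 1]  # 1 = deceased, 0 = alive
scores = combination_score(expression)
auc = roc_auc(deceased, scores)
```

The logrank test and FDR adjustment described in the text are omitted here and would be applied separately to decide which biomarkers are retained.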

To identify gene combination biomarkers from prognostic pathways identified by PET or from differentially expressed genes (DEGs) in each cancer type in the TCGA dataset, the following steps were taken. For PET, we identified the top 20 pathways associated with unfavorable prognoses and the top 20 pathways associated with favorable prognoses according to PET p-values. From each pathway, we extracted up to 10 leading genes based on optimized GSEA results and generated all possible combinations of 1–5 leading genes (n = 637). We then calculated the AUC values for all the combinations as described above. For individual DEGs, we obtained the top 100 upregulated and top 100 downregulated DEGs (sorted by DESeq2 adjusted p-value) between alive vs. deceased groups and calculated the AUC values for these DEGs as described above (Fig. 2d, Supplementary Fig. S4d). Since generating combinations of 1–5 genes from 100 DEGs would be impractical, for combination biomarkers derived from DEGs we focused on the top 10 most upregulated and top 10 most downregulated genes separately. We enumerated all possible combinations of 1–5 genes from the upregulated genes (n = 637) and another set of combinations from the downregulated genes (n = 637), and calculated the AUC values for these combination biomarkers in the same way. Only combination biomarkers that were significant based on the FDR-adjusted logrank test of survival (see above) were kept for visualization (e.g. Fig. 2e, Supplementary Fig. S4e) and further analysis.
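The count of 637 follows from summing the number of ways to choose 1 to 5 genes from 10, i.e. 10 + 45 + 120 + 210 + 252. A short sketch of the enumeration is given below; the function name and the placeholder gene labels are hypothetical.

```python
from itertools import combinations
from math import comb


def enumerate_combinations(genes, max_size=5):
    """List every combination of 1..max_size genes."""
    combos = []
    for k in range(1, max_size + 1):
        combos.extend(combinations(genes, k))
    return combos


# Ten placeholder leading genes from a single pathway.
leading_genes = [f"GENE{i}" for i in range(1, 11)]
combos = enumerate_combinations(leading_genes)

# Closed-form check: C(10,1) + C(10,2) + ... + C(10,5).
expected = sum(comb(10, k) for k in range(1, 6))
```

The same enumeration over 100 DEGs would yield roughly 79 million combinations, which illustrates why the DEG-based analysis was restricted to the top 10 genes in each direction.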

To independently validate the combination biomarkers derived from either DEGs or PET analysis, we extracted the top 200 significant combination biomarkers based on their AUC values. We then tested the performance of these biomarkers on several independent datasets (Supplementary Fig. S4g and Supplementary Data 2g). Specifically, for each cancer type, we calculated the proportion of favorable or unfavorable markers identified from the corresponding TCGA cancer dataset that remained favorable or unfavorable markers, respectively, in the indicated independent dataset.
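The validation proportion can be sketched as a simple concordance calculation. The sketch below is illustrative only: it assumes each marker has already been labeled "favorable" or "unfavorable" in both datasets, and that markers that cannot be evaluated in the independent dataset are excluded from the denominator. Neither detail is specified in the text, and all names are hypothetical.

```python
def retained_fraction(discovery_direction, validation_direction):
    """Fraction of discovery markers with the same direction on validation."""
    # Assumption: only markers evaluable in both datasets are counted.
    shared = [m for m in discovery_direction if m in validation_direction]
    if not shared:
        return 0.0
    same = sum(
        1 for m in shared
        if discovery_direction[m] == validation_direction[m]
    )
    return same / len(shared)


# Toy labels for three combination biomarkers.
tcga = {
    "GENE_A+GENE_B": "favorable",
    "GENE_C": "unfavorable",
    "GENE_D+GENE_E": "favorable",
}
independent = {
    "GENE_A+GENE_B": "favorable",
    "GENE_C": "favorable",
    "GENE_D+GENE_E": "favorable",
}
fraction = retained_fraction(tcga, independent)
```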

Computational drug prediction

The drug-target dataset was downloaded from DSigDB V1⁶⁸. For each drug, the down-regulated and up-regulated target genesets were designated as drug_repressed_geneset and drug_induced_geneset, respectively. The gene biomarkers that are intended to be upregulated or downregulated were designated as favorable_geneset and unfavorable_geneset, respectively. To identify the “favorable” or “unfavorable” geneset in each type of cancer from PET analysis, we extracted the top 20 pathways associated with either favorable or unfavorable prognoses. From these pathways, we selected up to 10 leading genes based on optimized GSEA results and compiled them as the respective “favorable” or “unfavorable” geneset. For DEG-based drug prediction, we extracted the top 200 upregulated and the top 200 downregulated DEGs according to DESeq2 adjusted p-value between the alive vs. deceased patient groups as the corresponding “favorable” or “unfavorable” geneset. For each pair of drug and cancer, we then performed the following four Fisher exact tests with respect to the gene universe and calculated the corresponding p-values using scipy.stats.fisher_exact: a) drug_induced_geneset and unfavorable_geneset, b) drug_repressed_geneset and favorable_geneset, c) drug_induced_geneset and favorable_geneset, d) drug_repressed_geneset and unfavorable_geneset. The drug effect on a prognostic marker was then calculated as E = −log10(c) − log10(d) + log10(a) + log10(b), where a, b, c and d denote the p-values of the corresponding tests. Finally, all the drugs were ranked by the predicted effect E.
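The scoring scheme can be illustrated with a small, dependency-free sketch. Note the substitution: the text uses scipy.stats.fisher_exact, whose default alternative is two-sided, whereas this sketch computes a one-sided over-representation p-value (hypergeometric upper tail) with the standard library. The function names, the toy gene sets, and the assumption that a higher E is more desirable are illustrative and not taken from the authors' code.

```python
from math import comb, log10


def overlap_pvalue(set_a, set_b, universe_size):
    """One-sided Fisher exact p-value for the overlap of two gene sets."""
    k = len(set_a & set_b)
    n_a, n_b = len(set_a), len(set_b)
    total = comb(universe_size, n_b)
    upper = min(n_a, n_b)
    # Probability of observing an overlap of k or more genes by chance.
    tail = sum(
        comb(n_a, i) * comb(universe_size - n_a, n_b - i)
        for i in range(k, upper + 1)
    )
    return tail / total


def drug_effect(induced, repressed, favorable, unfavorable, universe_size):
    """E = -log10(c) - log10(d) + log10(a) + log10(b) for tests a-d."""
    p_a = overlap_pvalue(induced, unfavorable, universe_size)
    p_b = overlap_pvalue(repressed, favorable, universe_size)
    p_c = overlap_pvalue(induced, favorable, universe_size)
    p_d = overlap_pvalue(repressed, unfavorable, universe_size)
    return -log10(p_c) - log10(p_d) + log10(p_a) + log10(p_b)


# Toy universe of 100 genes with disjoint favorable/unfavorable sets.
genes = [f"GENE{i}" for i in range(100)]
favorable = set(genes[0:10])
unfavorable = set(genes[10:20])

# drug_good induces favorable genes and represses unfavorable ones;
# drug_bad does the opposite.
drugs = {
    "drug_good": (set(genes[0:5]), set(genes[10:15])),
    "drug_bad": (set(genes[10:15]), set(genes[0:5])),
}
effects = {
    name: drug_effect(ind, rep, favorable, unfavorable, len(genes))
    for name, (ind, rep) in drugs.items()
}
# Assumption: drugs are ranked from highest to lowest E.
ranked = sorted(effects, key=effects.get, reverse=True)
```

With disjoint sets, the tests that find no overlap return p = 1 and contribute zero, so E is driven entirely by the enriched overlaps. A real implementation would also need to guard against p-values that underflow to zero before taking logarithms.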

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

Source data are provided with this paper. The RNA-seq datasets generated from three independent biological replicates of DMSO- or CCT068127-treated T24 bladder cancer cells are available in the GEO repository under accession number GSE222567. All data generated or analyzed during this study are included in this published article and its supplementary information files. Benchmark data have also been deposited in the Figshare database [https://doi.org/10.6084/m9.figshare.c.7252324]¹⁰⁰. Some of the results are based in part upon data generated by The Cancer Genome Atlas (TCGA) Research Network (https://www.cancer.gov/tcga), the Genotype-Tissue Expression project (GTEx; https://gtexportal.org/), the Encyclopedia of DNA Elements (ENCODE; https://www.encodeproject.org/) and the Cancer Cell Line Encyclopedia (CCLE; https://depmap.org/portal/ccle/).

Code availability

The source code and stable releases of Benchmark [https://github.com/hedgehug/Benchmark] and PET are available on GitHub and have also been deposited in the Figshare database [https://doi.org/10.6084/m9.figshare.c.7252324]¹⁰⁰ along with all the data files.

References

Howlader, N. et al. SEER Cancer Statistics Review, 1975–2018. National Cancer Institute. Bethesda, MD, https://seer.cancer.gov/csr/1975_2018/, based on November 2020 SEER data submission posted to the SEER web site (2021).
Nguyen, T. M., Shafi, A., Nguyen, T. & Draghici, S. Identifying significantly impacted pathways: a comprehensive review and assessment. Genome Biol. 20, 203 (2019).
Khatri, P., Sirota, M. & Butte, A. J. Ten years of pathway analysis: current approaches and outstanding challenges. PLoS Comput. Biol. 8, e1002375 (2012).
Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).
Chen, E. Y. et al. Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool. BMC Bioinforma. 14, 128 (2013).
Alhamdoosh, M. et al. Combining multiple tools outperforms individual methods in gene set enrichment analyses. Bioinformatics 33, 414–424 (2017).
Geistlinger, L. et al. Toward a gold standard for benchmarking gene set enrichment analysis. Brief. Bioinform. 22, 545–556 (2021).
Yu, C. et al. A strategy for evaluating pathway analysis methods. BMC Bioinforma. 18, 453 (2017).
Paczkowska, M. et al. Integrative pathway enrichment analysis of multivariate omics data. Nat. Commun. 11, 735 (2020).
Consortium, E. P. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
Badia, I. M. P. et al. decoupleR: ensemble of computational methods to infer biological activities from omics data. Bioinform. Adv. 2, vbac016 (2022).
Varemo, L., Nielsen, J. & Nookaew, I. Enriching the gene set analysis of genome-wide data by incorporating directionality of gene expression and combining statistical hypotheses and methods. Nucleic Acids Res. 41, 4378–4391 (2013).
Wu, D. & Smyth, G. K. Camera: a competitive gene set test accounting for inter-gene correlation. Nucleic Acids Res. 40, e133 (2012).
Geistlinger, L., Csaba, G. & Zimmer, R. Bioconductor’s EnrichmentBrowser: seamless navigation through combined results of set- & network-based enrichment analysis. BMC Bioinforma. 17, 45 (2016).
Lee, E., Chuang, H. Y., Kim, J. W., Ideker, T. & Lee, D. Inferring pathway activity toward precise disease classification. PLoS Comput Biol. 4, e1000217 (2008).
Hanzelmann, S., Castelo, R. & Guinney, J. GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinforma. 14, 7 (2013).
Tarca, A. L., Draghici, S., Bhatti, G. & Romero, R. Down-weighting overlapping genes improves gene set analysis. BMC Bioinforma. 13, 136 (2012).
Wu, D. et al. ROAST: rotation gene set tests for complex microarray experiments. Bioinformatics 26, 2176–2182 (2010).
Tomfohr, J., Lu, J. & Kepler, T. B. Pathway level analysis of gene expression using singular value decomposition. BMC Bioinforma. 6, 225 (2005).
Luo, W., Friedman, M. S., Shedden, K., Hankenson, K. D. & Woolf, P. J. GAGE: generally applicable gene set enrichment for pathway analysis. BMC Bioinforma. 10, 161 (2009).
Barry, W. T., Nobel, A. B. & Wright, F. A. Significance analysis of functional categories in gene expression studies: a structured permutation approach. Bioinformatics 21, 1943–1949 (2005).
Garraway, L. A. et al. Integrative genomic analyses identify MITF as a lineage survival oncogene amplified in malignant melanoma. Nature 436, 117–122 (2005).
Saha Roy, S. & Vadlamudi, R. K. Role of estrogen receptor signaling in breast cancer metastasis. Int J. Breast Cancer 2012, 654698 (2012).
Clusan, L., Ferriere, F., Flouriot, G. & Pakdel, F. A Basic Review on Estrogen Receptor Signaling Pathways in Breast Cancer. Int J. Mol. Sci. 24, 6834 (2023).
Whitsett, J. A., Weaver, T. E., Lieberman, M. A., Clark, J. C. & Daugherty, C. Differential effects of epidermal growth factor and transforming growth factor-beta on synthesis of Mr = 35,000 surfactant-associated protein in fetal lung. J. Biol. Chem. 262, 7908–7913 (1987).
Inoue, A. et al. Suppression of surfactant protein A by an epidermal growth factor receptor tyrosine kinase inhibitor exacerbates lung inflammation. Cancer Sci. 99, 1679–1684 (2008).
Hasegawa, Y. et al. Surfactant protein D suppresses lung cancer progression by downregulation of epidermal growth factor signaling. Oncogene 34, 4285–4286 (2015).
Ricono, J. M. et al. Specific cross-talk between epidermal growth factor receptor and integrin alphavbeta5 promotes carcinoma cell invasion and metastasis. Cancer Res. 69, 1383–1391 (2009).
Rubio, K. et al. Non-canonical integrin signaling activates EGFR and RAS-MAPK-ERK signaling in small cell lung cancer. Theranostics 13, 2384–2407 (2023).
Mahe, M. et al. An FGFR3/MYC positive feedback loop provides new opportunities for targeted therapies in bladder cancers. EMBO Mol. Med. 10, e8163 (2018).
Grandori, C. et al. c-Myc binds to human ribosomal DNA and stimulates transcription of rRNA genes by RNA polymerase I. Nat. Cell Biol. 7, 311–318 (2005).
van Riggelen, J., Yetil, A. & Felsher, D. W. MYC as a regulator of ribosome biogenesis and protein synthesis. Nat. Rev. Cancer 10, 301–309 (2010).
Arabi, A. et al. c-Myc associates with ribosomal DNA and activates RNA polymerase I transcription. Nat. Cell Biol. 7, 303–310 (2005).
Grewal, S. S., Li, L., Orian, A., Eisenman, R. N. & Edgar, B. A. Myc-dependent regulation of ribosomal RNA synthesis during Drosophila development. Nat. Cell Biol. 7, 295–302 (2005).
Kardos, J. et al. Development and validation of a NanoString BASE47 bladder cancer gene classifier. PLoS One 15, e0243935 (2020).
Lee, J. S. et al. Expression signature of E2F1 and its associated genes predict superficial to invasive progression of bladder tumors. J. Clin. Oncol. 28, 2660–2667 (2010).
Choi, W. et al. Identification of distinct basal and luminal subtypes of muscle-invasive bladder cancer with different sensitivities to frontline chemotherapy. Cancer Cell 25, 152–165 (2014).
Ferraro, G. B. et al. Fatty Acid Synthesis Is Required for Breast Cancer Brain Metastasis. Nat. Cancer 2, 414–428 (2021).
Przybytkowski, E. et al. Upregulation of cellular triacylglycerol - free fatty acid cycling by oleate is associated with long-term serum-free survival of human breast cancer cells. Biochem. Cell Biol. 85, 301–310 (2007).
Brunner, A. & Tzankov, A. The role of structural extracellular matrix proteins in urothelial bladder cancer (review). Biomark. Insights 2, 418–427 (2007).
Weniger, M., Honselmann, K. C. & Liss, A. S. The Extracellular Matrix and Pancreatic Cancer: A Complex Relationship. Cancers 10, 316 (2018).
Mahendra, A. et al. Vitamin D and gastrointestinal cancer. J. Lab Physicians 10, 1–5 (2018).
Ren, C. et al. Prognostic effects of 25-hydroxyvitamin D levels in gastric cancer. J. Transl. Med. 10, 16 (2012).
Kamel, H. F. M. & Al-Amodi, H. Exploitation of Gene Expression and Cancer Biomarkers in Paving the Path to Era of Personalized Medicine. Genomics Proteom. Bioinforma. 15, 220–235 (2017).
Willis, S. et al. Single Gene Prognostic Biomarkers in Ovarian Cancer: A Meta-Analysis. PLoS One 11, e0149183 (2016).
Voronova, V. et al. Diagnostic Value of Combinatorial Markers in Colorectal Carcinoma. Front. Oncol. 10, 832 (2020).
Rakha, E. A., Reis-Filho, J. S. & Ellis, I. O. Combinatorial biomarker expression in breast cancer. Breast Cancer Res. Treat. 120, 293–308 (2010).
Qian, Y. et al. Prognostic Cancer Gene Expression Signatures: Current Status and Challenges. Cells 10, 648 (2021).
Farber, N. J. et al. Renal cell carcinoma: the search for a reliable biomarker. Transl. Cancer Res. 6, 620–632 (2017).
Xu, F. et al. The Oncogenic Role of COL23A1 in Clear Cell Renal Cell Carcinoma. Sci. Rep. 7, 9846 (2017).
Guo, L., An, T., Huang, Z., Wan, Z. & Chong, T. Comprehensive analysis of the collagen family members as prognostic markers in clear cell renal cell carcinoma. Transl. Cancer Res. 11, 1954–1969 (2022).
Best, S. L. et al. Collagen organization of renal cell carcinoma differs between low and high grade tumors. BMC Cancer 19, 490 (2019).
Majo, S., Courtois, S., Souleyreau, W., Bikfalvi, A. & Auguste, P. Impact of Extracellular Matrix Components to Renal Cell Carcinoma Behavior. Front. Oncol. 10, 625 (2020).
Hurt, E. M. et al. Identification of vitronectin as an extrinsic inducer of cancer stem cell differentiation and tumor formation. Stem Cells 28, 390–398 (2010).
Li, S. et al. Plasminogen activator inhibitor-1 in cancer research. Biomed. Pharmacother. 105, 83–94 (2018).
Gyorffy, B. Survival analysis across the entire transcriptome identifies biomarkers with the highest prognostic power in breast cancer. Comput Struct. Biotechnol. J. 19, 4101–4109 (2021).
Zhang, J. et al. The International Cancer Genome Consortium Data Portal. Nat. Biotechnol. 37, 367–369 (2019).
Yoon, S. J. et al. Deconvolution of diffuse gastric cancer and the suppression of CD34 on the BALB/c nude mice model. BMC Cancer 20, 314 (2020).
Cristescu, R. et al. Molecular analysis of gastric cancer identifies subtypes associated with distinct clinical outcomes. Nat. Med. 21, 449–456 (2015).
Alkader, M. S. et al. Impact of KRAS Mutation on Survival Outcome of Patients With Metastatic Colorectal Cancer in Jordan. Cureus 15, e33736 (2023).
Tan, E. et al. BRAF Mutations Are Associated with Poor Survival Outcomes in Advanced-stage Mismatch Repair-deficient/Microsatellite High Colorectal Cancer. Oncologist 27, 191–197 (2022).
Robles, A. I. & Harris, C. C. Clinical outcomes and correlates of TP53 mutations and cancer. Cold Spring Harb. Perspect. Biol. 2, a001016 (2010).
Cancer Genome Atlas Research, N. et al. Comprehensive Molecular Characterization of Papillary Renal-Cell Carcinoma. N. Engl. J. Med. 374, 135–145 (2016).
Seiler, R. et al. Impact of Molecular Subtypes in Muscle-invasive Bladder Cancer on Predicting Response and Survival after Neoadjuvant Chemotherapy. Eur. Urol. 72, 544–554 (2017).
Robertson, A. G. et al. Comprehensive Molecular Characterization of Muscle-Invasive Bladder Cancer. Cell 171, 540–556 e525 (2017).
Dhawan, D., Hahn, N. M., Ramos-Vara, J. A. & Knapp, D. W. Naturally-occurring canine invasive urothelial carcinoma harbors luminal and basal transcriptional subtypes found in human muscle invasive bladder cancer. PLoS Genet 14, e1007571 (2018).
Evangelista, J. E. et al. SigCom LINCS: data and metadata search engine for a million gene expression signatures. Nucleic Acids Res. 50, W697–W709 (2022).
Yoo, M. et al. DSigDB: drug signatures database for gene set analysis. Bioinformatics 31, 3069–3071 (2015).
Yan, B. et al. SARS-CoV-2 drives JAK1/2-dependent local complement hyperactivation. Sci. Immunol. 6, eabg0833 (2021).
Chauss, D. et al. Autocrine vitamin D signaling switches off pro-inflammatory programs of TH1 cells. Nat. Immunol. 23, 62–74 (2022).
Zhao, M. et al. Use of liposomal doxorubicin for adjuvant chemotherapy of breast cancer in clinical practice. J. Zhejiang Univ. Sci. B 18, 15–26 (2017).
Barrett-Lee, P. J. et al. Expert opinion on the use of anthracyclines in patients with advanced breast cancer at cardiac risk. Ann. Oncol. 20, 816–827 (2009).
Bufu, T. et al. Celastrol inhibits colorectal cancer cell proliferation and migration through suppression of MMP3 and MMP7 by the PI3K/AKT signaling pathway. Anticancer Drugs 29, 530–538 (2018).
Wilson, G. S. et al. Tumoricidal effects of the JAK inhibitor Ruxolitinib (INC424) on hepatocellular carcinoma in vitro. Cancer Lett. 341, 224–230 (2013).
Wainberg, Z. A. et al. First-line liposomal irinotecan with oxaliplatin, 5-fluorouracil and leucovorin (NALIRIFOX) in pancreatic ductal adenocarcinoma: A phase I/II study. Eur. J. Cancer 151, 14–24 (2021).
Hubner, R. A. et al. Quality of life in metastatic pancreatic cancer patients receiving liposomal irinotecan plus 5-fluorouracil and leucovorin. Eur. J. Cancer 106, 24–33 (2019).
Woo, W., Carey, E. T. & Choi, M. Spotlight on liposomal irinotecan for metastatic pancreatic cancer: patient selection and perspectives. Onco Targets Ther. 12, 1455–1463 (2019).
Hessmann, E., Schneider, G., Ellenrieder, V. & Siveke, J. T. MYC in pancreatic cancer: novel mechanistic insights and their translation into therapeutic strategies. Oncogene 35, 1609–1618 (2016).
Oliveira-Cunha, M., Newman, W. G. & Siriwardena, A. K. Epidermal growth factor receptor in pancreatic cancer. Cancers 3, 1513–1526 (2011).
Stewart, C. F. et al. Gefitinib enhances the antitumor activity and oral bioavailability of irinotecan in mice. Cancer Res. 64, 7491–7499 (2004).
Rubin, S. M., Sage, J. & Skotheim, J. M. Integrating Old and New Paradigms of G1/S Control. Mol. Cell 80, 183–192 (2020).
Klein, M. E., Kovatcheva, M., Davis, L. E., Tap, W. D. & Koff, A. CDK4/6 Inhibitors: The Mechanism of Action May Not Be as Simple as Once Thought. Cancer Cell 34, 9–20 (2018).
Du, Q. et al. The application and prospect of CDK4/6 inhibitors in malignant solid tumors. J. Hematol. Oncol. 13, 41 (2020).
Rose, T. L. et al. Phase II trial of palbociclib in patients with metastatic urothelial cancer after failure of first-line chemotherapy. Br. J. Cancer 119, 801–807 (2018).
Subramanian, A. et al. A Next Generation Connectivity Map: L1000 Platform and the First 1,000,000 Profiles. Cell 171, 1437–1452 e1417 (2017).
Whittaker, S. R. et al. Molecular profiling and combinatorial activity of CCT068127: a potent CDK2 and CDK9 inhibitor. Mol. Oncol. 12, 287–304 (2018).
Yang, W. et al. Genomics of Drug Sensitivity in Cancer (GDSC): a resource for therapeutic biomarker discovery in cancer cells. Nucleic Acids Res. 41, D955–D961 (2013).
Salviato, E., Djordjilovic, V., Chiogna, M. & Romualdi, C. simPATHy: a new method for simulating data from perturbed biological PATHways. Bioinformatics 33, 456–457 (2017).
Alhamdoosh, M. et al. Easy and efficient ensemble gene set testing with EGSEA. F1000Res 6, 2010 (2017).
Schurch, N. J. et al. How many biological replicates are needed in an RNA-seq experiment and which differential expression tool should you use? RNA 22, 839–851 (2016).
Rubio, C. et al. CDK4/6 Inhibitor as a Novel Therapeutic Approach for Advanced Bladder Cancer Independently of RB1 Status. Clin. Cancer Res. 25, 390–402 (2019).
Long, Q. et al. Combination of cyclin-dependent kinase and immune checkpoint inhibitors for the treatment of bladder cancer. Cancer Immunol. Immunother. 69, 2305–2317 (2020).
Tong, Z. et al. Functional genomics identifies predictive markers and clinically actionable resistance mechanisms to CDK4/6 inhibition in bladder cancer. J. Exp. Clin. Cancer Res. 38, 322 (2019).
Pan, C. X. et al. Development and Characterization of Bladder Cancer Patient-Derived Xenografts for Molecularly Guided Targeted Therapy. PLoS One 10, e0134346 (2015).
Li, B. & Dewey, C. N. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinforma. 12, 323 (2011).
Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
Muzellec, B., Telenczuk, M., Cabeli, V. & Andreux, M. PyDESeq2: a python package for bulk RNA-seq differential expression analysis. Bioinformatics 39, btad547 (2023).
Fang, Z., Liu, X. & Peltz, G. GSEApy: a comprehensive package for performing gene set enrichment analysis in Python. Bioinformatics 39, btac757 (2023).
Wang, L. et al. Unbiased discovery of cancer pathways and therapeutics using Pathway Ensemble Tool and Benchmark-Source data and code. Figshare https://doi.org/10.6084/m9.figshare.c.7252324 (2024).
Sher, S. et al. VIP152 is a selective CDK9 inhibitor with pre-clinical in vitro and in vivo efficacy in chronic lymphocytic leukemia. Leukemia 37, 326–338 (2023).

Acknowledgements

This work was supported by extramural research programs of the NIH (R35GM138283 to M.K., AI136995 to S.D.B., 3RF1AG064250 to W.A.T.) and intramural research programs of the National Institute of Diabetes and Digestive and Kidney Diseases (ZIA/DK075149 to B.A.). The authors also gratefully acknowledge the SIRG Graduate Research Assistantships Award to S.C. and support from the Purdue University Center for Cancer Research, P30CA023168. The results here are in part based upon data generated by the TCGA Research Network: https://www.cancer.gov/tcga.

Author information

These authors contributed equally: Luopin Wang, Aryamav Pattnaik
These authors jointly supervised this work: Behdad Afzali, Majid Kazemian

Authors and Affiliations

Department of Computer Science, Purdue University, West Lafayette, IN, USA
Luopin Wang & Majid Kazemian
Purdue Institute for Cancer Research, Purdue University, West Lafayette, IN, USA
Luopin Wang, Aryamav Pattnaik, Subhransu Sekhar Sahoo, Ella G. Stone, Yuxin Zhuang, Annaleigh Benton, Srishti Chakravorty, My An Nguyen, Isabella Sirit, Marco Hadisurya, Garima Baral, Samantha L. Tinsley, Nicole L. Anderson, Scott D. Briggs, Brittany L. Allen-Petersen, W. Andy Tao, Deborah W. Knapp, Jason A. Hanna, Matthew R. Olson & Majid Kazemian
Department of Biochemistry, Purdue University, West Lafayette, IN, USA
Aryamav Pattnaik, Subhransu Sekhar Sahoo, Yuxin Zhuang, Md Tajmul, Srishti Chakravorty, Marco Hadisurya, Smriti Hoda, Scott D. Briggs, W. Andy Tao & Majid Kazemian
Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
Ella G. Stone, Annaleigh Benton, Isabella Sirit, Garima Baral, Samantha L. Tinsley, Nicole L. Anderson, Brittany L. Allen-Petersen, Jason A. Hanna & Matthew R. Olson
Immunoregulation Section, National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK), NIH, Bethesda, MD, USA
Md Tajmul & Behdad Afzali
Department of Veterinary Clinical Sciences, College of Veterinary Medicine, Purdue University, West Lafayette, IN, USA
Deepika Dhawan & Deborah W. Knapp
Department of Biomedical Engineering, Purdue University, West Lafayette, IN, USA
My An Nguyen & Kyle Mundy
Urologic Oncology Branch of Center for Cancer Research, National Cancer Institute (NCI), NIH, Bethesda, MD, USA
Christopher J. Ricketts & W. Marston Linehan
Department of Urology, School of Medicine, Indiana University, Indianapolis, IN, USA
Hristos Z. Kaimakliotis
Department of Chemistry, Purdue University, West Lafayette, IN, USA
W. Andy Tao

Contributions

All the authors contributed to the preparation of the manuscript. L.W., A.P., B.A., and M.K. conceptualized the manuscript and wrote the main body. S.S.S., E.G.S., Y.Z., A.B., S.C., M.A.N., I.S., G.B., S.L.T., N.L.A., S.H., S.D.B., H.Z.K., B.L.A., J.A.H., and M.R.O. contributed to the in vitro and in vivo experimental studies. M.T. contributed to RNA-seq library preparation and sequencing. M.H. and W.A.T. contributed to pancreatic cancer biomarker analyses. C.J.R. and W.M.L. contributed to kidney cancer genetic alteration analyses. K.M. contributed to software testing and documentation. D.D. and D.W.K. contributed to the canine studies.

Corresponding authors

Correspondence to Behdad Afzali or Majid Kazemian.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information (PDF)

Peer Review File (PDF)

Description of Additional Supplementary Files (PDF)

Supplementary Data 1 (XLSX)

Supplementary Data 2 (XLSX)

Supplementary Data 3 (XLSX)

Supplementary Data 4 (XLSX)

Supplementary Data 5 (XLSX)

Reporting summary (PDF)

Source data

Source Data (XLSX)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Wang, L., Pattnaik, A., Sahoo, S.S. et al. Unbiased discovery of cancer pathways and therapeutics using Pathway Ensemble Tool and Benchmark. Nat Commun 15, 7288 (2024). https://doi.org/10.1038/s41467-024-51859-9

Received: 06 March 2023
Accepted: 19 August 2024
Published: 24 August 2024
Version of record: 24 August 2024
DOI: https://doi.org/10.1038/s41467-024-51859-9
