CanSeer: a translational methodology for developing personalized cancer models and therapeutics

Butt, Rida Nasir; Amina, Bibi; Sultan, Muhammad Umer; Tanveer, Zain Bin; Gondal, Mahnoor Naseer; Hussain, Risham; Khan, Salaar; Akbar, Rida; Nasir, Zainab; Khalid, Muhammad Farhan; Channan-Khan, Asher Alban; Faisal, Amir; Shoaib, Muhammad; Chaudhary, Safee Ullah

doi:10.1038/s41598-025-99219-x


Article
Open access
Published: 29 April 2025


Rida Nasir Butt¹,
Bibi Amina¹,
Muhammad Umer Sultan¹,
Zain Bin Tanveer¹,
Mahnoor Naseer Gondal^1,2,3,
Risham Hussain^1,4,
Salaar Khan^1,5,
Rida Akbar¹,
Zainab Nasir¹,
Muhammad Farhan Khalid¹,
Asher Alban Channan-Khan⁶,
Amir Faisal⁷,
Muhammad Shoaib⁸ &
Safee Ullah Chaudhary ORCID: orcid.org/0000-0002-3758-6581¹

Scientific Reports volume 15, Article number: 15080 (2025)



A Correction to this article was published on 26 May 2025

This article has been updated

Abstract

Computational modeling and analysis of biomolecular network models annotated with omics data are emerging as versatile tools for designing personalized therapies. Current efforts to employ in silico models for personalized cancer therapeutics remain limited in providing an all-in-one approach that ascertains actionable targets, repositions FDA (Food and Drug Administration) approved drugs, furnishes quantitative cues on therapy responses such as efficacy and cytotoxic effect, and identifies novel drug combinations. Here we propose “CanSeer”, a methodology for developing personalized therapeutics. CanSeer employs patient-specific genetic alterations and RNA-seq data to annotate in silico models, followed by dynamical network analyses to assess treatment responses. To exemplify, three use cases involving paired samples, unpaired samples, and cancer samples only, of lung squamous cell carcinoma (LUSC) patients are provided. CanSeer reveals the effectiveness of repositioned drugs along with the identification of several novel LUSC treatment combinations including Afuresertib + Palbociclib, Dinaciclib + Trametinib, Afatinib + Oxaliplatin, and Ulixertinib + Olaparib.


Introduction

Cancer remains a leading cause of death worldwide despite an ever-increasing repertoire of treatment modalities^1,2,3,4. Amongst the salient impediments faced by anticancer regimens are drug resistance⁵, patient-to-patient variability of therapeutic response⁶, and cytotoxicity^7,8. Towards overcoming these issues, molecular targeting-based therapies, seminally exemplified by imatinib’s success in treating chronic myeloid leukemia in 2001⁹, have paved the way for subsequent breakthroughs. The ensuing research impetus led to the development of several targeted therapies including vemurafenib^10,11,12, gilteritinib^13,14, and temsirolimus^15,16. These breakthrough drugs further stimulated efforts to identify molecular targets in a wider variety of cancers and to employ them in devising targeted therapies^{17,18,19,20,21}.

Towards designing targeted therapies, advancements in high throughput sequencing and multiplex mutational screening have been instrumental in elucidating novel molecular signatures^{22,23,24,25,26}, thus giving rise to the domain of personalized therapeutics^27,28,29,30. In an early-harvest project, MD Anderson Cancer Center embarked on a program to screen patient-specific genetic alterations with “matched targeted therapies”^31,32,33, leading to momentum in genome-informed precision medicine. However, subsequent clinical trials remained limited to monotherapies (one target, one drug), which eventually lost efficacy as resistance developed. As an example, about 65% of FLT3 mutated refractory acute myeloid leukemia (AML) patients showed resistance to gilteritinib¹⁴. Similarly, BRAF targeting single-agent vemurafenib failed to offer benefits during phase-II clinical trials³⁴. This was partially attributable to drug resistance caused by underlying genetic heterogeneity³⁵ and molecular cross-talk between interconnected signaling pathways³⁶, which together enable treatment escape^37,38. Endeavors at overcoming these undesired outcomes now employ combinations of monotherapies such as trametinib + fluvastatin for treating lung cancer³⁹ and trametinib + zoledronate for KRAS-mutated patients of metastatic colorectal cancer⁴⁰. Results from such combinatorial regimens have demonstrated lowered rates of treatment escape and resistance to therapy^41,42.

To assist in predicting effective combinatorial therapies, as well as to better understand the mechanistic underpinnings, dynamical simulations of omics-informed biomolecular network models have gained prominence^{43,44,45,46,47,48}. In particular, mathematical modeling coupled with computational dynamical analysis has advanced our understanding of human biomolecular signaling, along with the incorporation of patient-specific multi-omics data. Recent studies, such as those by Beal et al., have used omics profiles to stratify patients and analyze individualized survival⁴⁹, while Eduati et al. constructed continuous logic models that revealed heterogeneity amongst pancreatic cancer patients⁵⁰. However, the latter study remained limited to a single “apoptotic” readout. Ianevski et al. leveraged machine learning for identifying patient-specific drug combinations, reducing toxicity in leukemic cells⁵¹, but lacked mechanistic insights and posed a risk of off-label drug use. In a recent study, we reported a personalized in silico “drosophila patient model” that employed patient-specific somatic mutations and identified personalized combinatorial treatment for colorectal cancer⁵². However, annotating network models with somatic mutations alone fell short of capturing a comprehensive molecular signature of individual patients. Moreover, druggable and clinically actionable targets were not accounted for during the exhaustive screening for efficacious targets. Later in 2022, Montagud et al. employed PROFILE and enriched the Boolean model of cancer signaling with more interactions, followed by customization for prostate cancer patients⁵³. For developing individualized models, the study considered the presence/absence of somatic mutations and copy number alterations. However, the strategy did not consider prior knowledge of mutation type, nor did it map copy number altered genes to RNA-seq based gene expressions. Also, the in silico drug response approach could not differentiate between the effects of two independent drugs that target the same gene in a patient when applied one at a time.

To date, a multi-factorial, patient data-integrative approach that simulates drug effects using drug scores and then quantifies therapy response has not been assessed. In addition, the lack of a systematic all-in-one approach impedes devising personalized therapies. This calls for an overarching approach that involves the identification of suitable treatment options, determination of actionable targets, repositioning of FDA (Food and Drug Administration) approved drugs, evaluation of efficacy and cytotoxic effect, and exploration of novel drug combinations. Also, the advantages offered by in silico studies towards personalized cancer therapeutics have yet to be fully translated⁵⁴, necessitating the development of a computational method for devising translational personalized anticancer therapies.

Towards this aim, we propose a method, “CanSeer”, to develop in silico models for clinical translation of personalized treatments. The proposed method begins with the development of literature-derived biomolecular regulatory network models followed by their validation. The validated model is then integrated with patient-specific genomic and transcriptomic data for dynamical analysis, selection of druggable targets, and curation of drugs to inform personalized treatment decisions based on molecular and clinical considerations. For this purpose, drug activity scores are computed to capture the differing effects of drugs that target the same gene. Moreover, to decode the varying effect of a drug from patient to patient, CanSeer computes “drug scores” (DS) for patients by employing the drug activity score and normalized gene expressions of each individual cancer patient. DS are subsequently used to annotate the personalized models, followed by the re-analysis of each model. From the network model’s cell fate outcomes, treatment efficacy and cytotoxic effect are computed, which in turn elucidate treatment suitability. The method concludes with the comparison of cell fate predictions under tailored treatment of patient cancer models, demonstrating personalized treatment response. Three use cases of CanSeer, exemplified with a lung squamous cell carcinoma (LUSC) case study, demonstrate its utility in predicting optimal personalized treatments, suggesting potential repositioned drugs, and identifying novel drug combinations such as Afuresertib + Palbociclib and Dinaciclib + Trametinib, amongst others.

Taken together, CanSeer provides a next-generation clinical translation framework for personalized cancer therapeutics by drawing mechanistic insights from computational modeling of “multi-omics” patient data. As a result, the method (i) elicits optimal tailored treatments, (ii) provides mechanistic insights into patient-to-patient variability of therapeutic response, (iii) repositions FDA-approved drugs, (iv) determines and evaluates actionable targets, (v) elucidates treatment efficacy, (vi) reveals cytotoxic effect, and (vii) facilitates the discovery of novel drug combinations. In conclusion, the work provides a translational approach to precision oncology for formulating and assessing clinically deployable personalized treatment plans.

Results

CanSeer—a personalized cancer therapeutics framework

“CanSeer” is a novel methodology for developing in silico biomolecular network models towards designing personalized cancer therapeutics and their clinical evaluation. The proposed method comprises four salient steps: (i) development of literature-derived Boolean models of biomolecular regulations, and their validation, (ii) acquisition, pre-processing, and normalization of genetic alteration and expression data, (iii) model annotation with patient-specific omics data comprising genomic and transcriptomic profiles, followed by dynamical analyses of the personalized models, and (iv) therapeutic screening and identification of optimal regimens for cancer patients (Fig. 1).

Below, we exemplify the proposed methodology through a case study on lung squamous cell carcinoma.

Step 1—Literature and database-driven development of biomolecular regulatory network models and validation

To exemplify Step 1 in CanSeer (Fig. 2A), a large-scale human signaling network was adopted from Cho et al.⁵⁵ and analyzed (Fig. 2B). The analysis results along with the input file, cell fate file, network rules, and parameters are given in Supplementary Information 1, Supplementary Data 1, while the detailed description of the network is provided in Supplementary Information 2 (Supplementary Fig. S1 and Supplementary Table S1). The network model comprised 197 nodes and 744 edges, organized in a mesh topology with 13 input nodes, 8 output nodes, and 176 processing nodes. Topologically, the network followed a scale-free structure, wherein 114 nodes had a high degree (connectivity), playing a critical role in maintaining network robustness. The network contained distinct groups of nodes and edges that programmed five cellular processes: normal proliferation, abnormal proliferation, apoptosis, quiescence, and metastasis. Lastly, in terms of structure, the network model had a hierarchical topology with input nodes triggering the downstream processing nodes, which then interact with each other and set the output nodes. Alongside, cell fates involved in oncogenesis, such as senescence and cell cycle arrest, were programmed and mapped by updating the published network. Our results from deterministic analysis (DA) of the updated network, under normal input conditions, corroborate the published outcomes (Fig. 2C; Supplementary Information 1, Supplementary Data 2). To evaluate network robustness, perturbations were introduced as input signals and the network response was scrutinized using the DA pipeline^56,57. The results exhibited the highest variation in normal proliferation (standard error of the mean, SEM, 0.0025) followed by apoptosis (SEM 0.0021) in the network model (Fig. 2D; Supplementary Information 1, Supplementary Data 3). 
Further, to inspect distinct input–output relationships, parameter sensitivity analysis was carried out by screening input nodes individually and in combinations (Supplementary Information 1, Supplementary Data 4). Specifically, the analysis assessed how changes in the states of Cho et al.’s input nodes (e.g., switching a node from active to inactive, i.e. from 1 to 0) influenced the overall system behavior, i.e. the model’s output. Briefly, we varied each input node between 0 (inactive) and 1 (active), in either direction, in incremental steps of 0.1. This gradual increase or decrease in input signal allowed us to observe the incremental effects on the model output, providing insights into how each node contributes to the overall dynamics of the network. The overall process is illustrated as a flowchart in Supplementary Information 2, Supplementary Fig. S2. The results showed that the network was particularly sensitive to the input levels of the following nodes: “alphailig” (α_i ligands), “DNA damage”, “EGF”, “IL1/TNF”, “TGF-β”, and “Wnt”. Notably, an increase in the DNA damage signal led to an increase in apoptosis, along with a decrease in normal proliferation, abnormal proliferation, and metastasis (Fig. 2E). Moreover, the effects of specific input perturbations on cell fate outcomes were classified into 4 categories: (i) “Apoptosis” increased with higher DNA damage signals, whereas elevated levels of EGF signaling reduced apoptosis by supporting cell growth and survival. (ii) “Normal cell proliferation” increased with Wnt and EGF signaling, both individually and in combination, promoting cell growth. In contrast, DNA damage, IL-1/TNF, and TGF-β inhibited normal proliferation, as these signals typically induce stress responses, inflammation, or growth suppression. (iii) “Abnormal cell proliferation” and “Metastasis” decreased with an increase in DNA damage signals or α_i ligand expression. Both stimuli act to suppress unchecked cell growth and inhibit the spread of metastatic cells. (iv) Increased IL-1/TNF and TGF-β signaling promoted “Metastasis”, due to their role in inflammation and tumor progression. In summary, EGF and Wnt signaling promote cell survival by enhancing normal cell proliferation and suppressing apoptosis, thereby supporting growth. On the other hand, DNA damage, IL-1/TNF, TGF-β, and α_i ligands lead to anti-survival outcomes by inhibiting proliferation, promoting apoptosis, or suppressing metastasis.

In addition, we also explored the combined effects of perturbing multiple input nodes, simultaneously. Through this combinatorial screening, we again assessed the system’s sensitivity to complex perturbations. The detailed results are provided in Supplementary Information 1—Supplementary Data 4.
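The input screening described above can be sketched in a few lines of Python. This is a minimal illustration only: `toy_model` and its coefficients are invented stand-ins for the deterministic analysis of the full network, and the node names `DNA_damage` and `EGF` are used purely as labels. What the sketch shows is the sweep itself, in which one or more input nodes are varied from 0 to 1 in steps of 0.1 while the remaining inputs stay at their baseline.

```python
from itertools import product


def toy_model(inputs):
    """Hypothetical stand-in for the network analysis (NOT the published model).

    Maps input levels in [0, 1] to two illustrative cell-fate scores.
    """
    dna = inputs.get("DNA_damage", 0.0)
    egf = inputs.get("EGF", 0.0)
    apoptosis = max(0.0, min(1.0, 0.2 + 0.6 * dna - 0.3 * egf))
    proliferation = max(0.0, min(1.0, 0.3 + 0.5 * egf - 0.4 * dna))
    return {"apoptosis": apoptosis, "proliferation": proliferation}


def levels(step=0.1):
    """Grid of input levels from 0 to 1 inclusive."""
    n = round(1 / step)
    return [round(i * step, 10) for i in range(n + 1)]


def sweep(model, baseline, nodes, step=0.1):
    """Vary the chosen input nodes over the grid, holding all others at baseline."""
    results = []
    for combo in product(levels(step), repeat=len(nodes)):
        inputs = dict(baseline)
        inputs.update(zip(nodes, combo))
        results.append((combo, model(inputs)))
    return results


baseline = {"DNA_damage": 0.0, "EGF": 0.0}
single = sweep(toy_model, baseline, ["DNA_damage"])        # one node at a time
pair = sweep(toy_model, baseline, ["DNA_damage", "EGF"])   # combinatorial screen
```

Screening a pair of nodes multiplies the grid size (11 levels per node), which is why combinatorial screens over many inputs grow quickly.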

Step 2—Omics data collection, pre-processing, and normalization

In CanSeer’s second step, patient-specific omics data is acquired, pre-processed, and normalized for later use in network model annotation. The detailed workflow of CanSeer’s Step 2 is presented in Fig. 3.

Patient data acquisition

To demonstrate Step 2A of CanSeer, we selected normal solid tissue and primary tumor samples of The Cancer Genome Atlas (TCGA) project “Lung Squamous Cell Carcinoma (TCGA-LUSC)” at the Genomic Data Commons (GDC) portal. The samples were then filtered for transcriptomic profiling, mRNA expression quantification, and RNA-seq data in FPKM (Fragments Per Kilobase of transcript per Million mapped reads) format. All openly accessible files (551) with RNA-seq based expression data for lung cancer samples were downloaded along with the sample sheet (Supplementary Information 1, Supplementary Video 1). In addition, genetic variations of the selected samples, including Copy Number Variations (CNVs), Somatic Mutations (SMs), and Genomic Structural Variations (SVs), were obtained from the TCGA PanCancer Atlas study using cBioPortal (Supplementary Information 1, Supplementary Video 2). Patient dataset matching was ensured by comparing sample IDs between GDC and cBioPortal.
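The ID comparison between the two sources can be sketched as follows. The sketch assumes that GDC sample sheets carry full TCGA barcodes while cBioPortal uses the shortened sample-level form, and that the two-digit sample codes 01 and 11 denote primary tumor and solid tissue normal. The helper names and the example barcodes are hypothetical.

```python
def sample_key(barcode):
    """Reduce a TCGA barcode to its sample-level key, e.g. TCGA-AA-0001-01."""
    parts = barcode.split("-")
    # Keep project, site, participant, and the two-digit sample type code
    return "-".join(parts[:3] + [parts[3][:2]])


def sample_type(barcode):
    """Classify a barcode by its sample type code (assumed TCGA convention)."""
    code = barcode.split("-")[3][:2]
    return {"01": "Primary Tumor", "11": "Solid Tissue Normal"}.get(code, "Other")


def match_samples(gdc_barcodes, cbioportal_ids):
    """Return the sample keys present in both sources, sorted."""
    cbio = {sample_key(s) for s in cbioportal_ids}
    return sorted({sample_key(b) for b in gdc_barcodes} & cbio)


# Made-up barcodes for illustration only
gdc = [
    "TCGA-AA-0001-01A-01R-0000-07",
    "TCGA-AA-0001-11A-01R-0000-07",
    "TCGA-BB-0002-01A-01R-0000-07",
]
cbio = ["TCGA-AA-0001-01", "TCGA-CC-0003-01"]
matched = match_samples(gdc, cbio)
```

Only samples found in both lists are retained, so expression data and genetic alterations always refer to the same tumor sample.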

Data pre-processing

To describe the data pre-processing step of CanSeer, Ensembl IDs were mapped onto gene symbols (Supplementary Information 1—Supplementary Data 5), gene symbols list was filtered, and aliases were obtained (Supplementary Information 1—Supplementary Data 6). The network nodes list (from Step 1) was converted into gene symbols and the respective RNA-seq based gene expression values were extracted for the designed use-cases (described in Proposed Methodology—Step 2B) in CanSeer (Supplementary Information 1—Supplementary Data 7). On selecting Case 1 and choosing whole dataset, 49 paired normal and cancer samples (Case 1) were obtained. On choosing Case 2 and picking whole dataset, 50 normal and 453 cancer samples were attained. Opting for Case 3 and whole dataset, 453 cancer samples were acquired. Next, to remove the skewed data from normal samples, outliers were removed using interquartile range (IQR). To remove outliers from expression data of cancer samples mapped with CNVs, SMs, and SVs, again IQR was chosen (Supplementary Information 1—Supplementary Data 8). After the removal of outliers from normal and cancer samples, the cleaned data was combined (N + C_s) in Cases 1 and 2. However, the processing remains limited to cancer samples only in Case 3.
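As an illustration of IQR-based outlier handling, the sketch below flags values outside the conventional fences of 1.5 times the interquartile range. The multiplier of 1.5 and the quartile method are assumptions, since the text does not state them, and the expression values are invented. Flagged values are returned rather than discarded so that outliers in genetically altered genes can be kept aside for later use.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Lower and upper fences at k * IQR beyond the quartiles (k is an assumption)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def split_outliers(values, k=1.5):
    """Separate values inside the fences from those flagged as outliers."""
    lo, hi = iqr_bounds(values, k)
    kept = [v for v in values if lo <= v <= hi]
    flagged = [v for v in values if v < lo or v > hi]
    return kept, flagged


# Invented expression values for one gene across six samples
expression = [1.0, 1.2, 1.1, 0.9, 1.3, 9.5]
kept, flagged = split_outliers(expression)
```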

Normalization

Subsequent to the data pre-processing step, the process of normalization is undertaken. The dataset pertaining to lung squamous cell carcinoma (LUSC) samples, which had undergone processing in previous step (Step 2B) for all three different cases, was then subjected to normalization (Supplementary Information 1—Supplementary Data 9).
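The normalization scheme is not specified in this section. Purely as an assumption, the sketch below uses min-max scaling, one common way to bring expression values into the [0, 1] range used for node activities. The function name and sample labels are hypothetical, and the scheme actually used may differ.

```python
def min_max_normalize(expression):
    """Scale one gene's {sample: value} mapping into [0, 1] (assumed scheme)."""
    lo, hi = min(expression.values()), max(expression.values())
    if hi == lo:
        # A constant gene carries no contrast between samples; map everything to 0
        return {sample: 0.0 for sample in expression}
    return {sample: (value - lo) / (hi - lo) for sample, value in expression.items()}


# Invented FPKM-like values for one gene in one normal and two cancer samples
normalized = min_max_normalize({"normal_1": 2.0, "cancer_1": 4.0, "cancer_2": 6.0})
```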

Step 3—Model annotation with patient-specific omics data and dynamical analysis

In the third step, CanSeer employs the normalized gene expression values of cancer samples along with their CNVs, SMs and SVs, and incorporates them into the Boolean logic model for personalizing rules-based biomolecular network models (Fig. 4). A step-by-step description of model annotation with normalized omics data in case of paired (Case 1), unpaired (Case 2), and cancer samples only (Case 3) is given in “Proposed Methodology—Step 3”.

To exemplify model annotation with patient-specific omics data for Case 1, 7 samples of LUSC were randomly selected from 49 paired normal and cancer samples obtained in Case 1 of Step 2. The network model was annotated for each patient’s normal and cancer samples as described in Case 1 of Proposed Methodology—Step 3. Notably, some input nodes are not directly associated with the gene symbols, and hence were not assigned gene expressions directly. Moreover, input nodes representing biomolecules that contain multiple sub-units were assigned abstracted values computed from downstream nodes’ associated genes. The criteria to assign representative values to network nodes in light of patient’s gene expressions were based on the following: patient mutation (PM), network connectivity (NC), frequency (F), exact match (EM), etc. (Supplementary Information 1—Supplementary Video 3, Supplementary Table S2). The DA of patient-specific cancer models exhibited an overall decrease in apoptosis and senescence with an increase in proliferation as compared to corresponding patient-specific normal models. The other cell fates including cell cycle arrest, quiescence, and metastasis varied from patient to patient (Supplementary Information 1—Supplementary Data 10A). For Case 2, median value of normalized RNA-seq based gene expressions (all 50 normal samples) was assigned to the network model, and DA was performed. The cell fate propensities of proliferation, apoptosis, quiescence, senescence and cell cycle arrest obtained from DA were 0.1370, 0.2709, 0.3608, 0.1214 and 0.0911, respectively. From 453 unpaired cancer samples obtained in Step 2—Case 2, 10 cancer samples were randomly picked for developing personalized cancer models. The 10 personalized cancer models were developed by annotating the network model with patient-specific gene expressions along with SMs, CNVs, and SVs. 
Next, DA was performed, and the cell fate outcome of each individualized cancer model was compared with the cell fate propensities of the median-assigned normal model. The comparison revealed a decrease in apoptosis and senescence with an increase in quiescence. The patient-to-patient variation in cell fates is shown in Supplementary Information 1, Supplementary Data 10B. For Case 3, the model annotation for cancer samples only was similar to that of cancer samples in Case 2 of Step 3 (Supplementary Information 1, Supplementary Data 10C).
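The Case 2 comparison can be sketched as two small steps: building a reference profile from the per-gene median of the normal samples, and subtracting the normal cell fate propensities from those of a personalized cancer model. The normal propensities below are the values reported above. The gene values and the cancer propensities are invented for illustration, and the function names are hypothetical.

```python
from statistics import median


def median_profile(samples):
    """Per-gene median across samples, each given as a {gene: value} mapping."""
    genes = samples[0].keys()
    return {gene: median(s[gene] for s in samples) for gene in genes}


def fate_shift(cancer_fates, normal_fates):
    """Change in each cell fate propensity relative to the normal reference."""
    return {fate: round(cancer_fates[fate] - normal_fates[fate], 4)
            for fate in normal_fates}


# Cell fate propensities of the median-assigned normal model (from the text)
NORMAL_FATES = {
    "proliferation": 0.1370,
    "apoptosis": 0.2709,
    "quiescence": 0.3608,
    "senescence": 0.1214,
    "cell cycle arrest": 0.0911,
}

# Invented normalized expressions for three normal samples
normals = [
    {"EGFR": 0.2, "TP53": 0.5},
    {"EGFR": 0.4, "TP53": 0.1},
    {"EGFR": 0.9, "TP53": 0.3},
]
reference = median_profile(normals)

# Invented cell fates for one personalized cancer model
cancer_fates = {
    "proliferation": 0.15,
    "apoptosis": 0.20,
    "quiescence": 0.45,
    "senescence": 0.10,
    "cell cycle arrest": 0.10,
}
shift = fate_shift(cancer_fates, NORMAL_FATES)
```

A negative entry in `shift` means the fate is less likely in the cancer model than in the normal reference.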

Step 4—Therapeutic evaluation for personalized cancer treatment

The last step of CanSeer involves the personalization of cancer therapeutics. To demonstrate Step 4, all targetable nodes of the network (either druggable or clinically actionable) were identified (Supplementary Information 1—Supplementary Data 11) and their oncogenic roles i.e. oncogene and tumor suppressor were acquired (Supplementary Table S3). The list of targetable nodes and their corresponding drugs including EGFR—Afatinib, AKT—Ipatasertib, BRAF—Dabrafenib, etc. are listed in Supplementary Information 1—Supplementary Data 12. Next, the genetic alterations of LUSC patients (mentioned in exemplification of Step 3) were mapped with the cancer driver genes of TCGA-LUSC project determined by OncoVar and IntOGen (Supplementary Table S4). The drugs identified to target druggable, or clinically actionable nodes were then queried in GDSC2 to obtain IC₅₀ values. The IC₅₀ values of candidate drugs were extracted for “Lung NSCLC squamous cell carcinoma” specific cell lines. The average IC₅₀ values computed after the removal of outliers for afatinib, afuresertib, APR-246, carmustine, dabrafenib, dinaciclib, ibrutinib, ipatasertib, nutlin-3a, olaparib, osimertinib, oxaliplatin, palbociclib, ribociclib, selumetinib, trametinib, ulixertinib and KU-55933 were 5.06, 17.52, 707.84, 748.64, 203.12, 0.08, 113.09, 74.53, 265.12, 98.49, 4.42, 60.17, 65.97, 63.55, 28.35, 1.23, 18.80 and 208.66, respectively (Supplementary Information 1—Supplementary Data 13). These average IC₅₀ values were then utilized for computing normalized drug activity score (NDAS) e.g., ipatasertib 0.0474, afuresertib 0.0715, osimertinib 0.1377, etc. (Supplementary Information 1—Supplementary Data 13). Next, we computed drug score (DS) for both patient-specific normal and cancer samples (Case 1) whose annotation was exemplified in Step 3. With DS, the personalized models developed in Step 3 were re-analyzed (Supplementary Information 1—Supplementary Data 14A). 
The cell fates obtained on screening the personalized models with DS revealed several efficacious drug combinations along with patient-specific responses to various treatment options (Fig. 5A, Supplementary Information 1, Supplementary Data 14A). The cell fate outcomes were employed to compute efficacies and evaluate cytotoxic effects, detailed in the individual patient files in Supplementary Information 1, Supplementary Data 14A. Further, a patient-specific therapeutic response index (TRI) for each drug and drug combination was calculated as the difference between efficacy and cytotoxic effect (Supplementary Information 1, Supplementary Data 14A). The drug combination with the highest TRI was ranked first. The top ranked treatment options exhibited patient-centric variability (Table 1, Fig. 5B) and their cell fate outcomes were compared against their respective normal and cancer model cell fates (Supplementary Information 1, Supplementary Data 14A). Additionally, the drug combinations were listed in ascending order of proliferative indices and in descending order of apoptotic indices (Fig. 5C and D, Supplementary Information 1, Supplementary Data 14A). Therapeutic evaluations for Cases 2 and 3 are shown in Supplementary Information 1, Supplementary Data 14B and C, respectively.
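The ranking step can be illustrated with a short sketch. It relies only on the definition given above, TRI as efficacy minus cytotoxic effect, with the highest TRI ranked first. The treatment names and the efficacy and cytotoxicity values are placeholders, not results from the study, and the function names are hypothetical.

```python
def therapeutic_response_index(efficacy, cytotoxicity):
    """TRI as defined in the text: efficacy minus cytotoxic effect."""
    return efficacy - cytotoxicity


def rank_treatments(treatments):
    """Rank {name: (efficacy, cytotoxicity)} by TRI, highest first.

    Returns a list of (rank, name, tri) tuples with ranks starting at 1.
    """
    scored = [(name, therapeutic_response_index(eff, tox))
              for name, (eff, tox) in treatments.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [(rank, name, tri) for rank, (name, tri) in enumerate(scored, start=1)]


# Placeholder treatments with invented efficacy and cytotoxicity values
candidates = {
    "Drug A": (0.60, 0.10),
    "Drug A + Drug B": (0.75, 0.15),
    "Drug C": (0.40, 0.30),
}
ranking = rank_treatments(candidates)
top_choice = ranking[0][1]
```

Because the ranking is computed per patient, the same set of candidates can produce a different top choice for each personalized model.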

Table 1 The top ranked personalized treatment options identified by employing Case 1 of CanSeer.


Overall, the LUSC case study revealed four mono-targeted drugs (Afatinib, Osimertinib, Dabrafenib, and Trametinib) and two drug combinations (Osimertinib + Oxaliplatin and Dabrafenib + Trametinib) which have already received FDA (Food and Drug Administration) approval for non-small cell lung cancer (Fig. 6, Supplementary Information 1, Supplementary Data 15). Additionally, our predictions of 26 drugs/drug combinations for LUSC, such as Osimertinib + Selumetinib, Palbociclib + Selumetinib, and APR-246 + Olaparib, are corroborated by the scientific literature (Fig. 6, Supplementary Information 1, Supplementary Data 15). Moreover, 20 drug combinations predicted by CanSeer, including APR-246 + Ibrutinib, Olaparib + Ibrutinib, and Ulixertinib + Trametinib, require further evaluation in LUSC, as they are either under investigation or approved for various other cancer types, suggesting potential for repositioning (Fig. 6, Supplementary Information 1, Supplementary Data 15). Finally, CanSeer also identified several novel treatment combinations for LUSC including Afuresertib + Palbociclib, Dinaciclib + Trametinib, Afatinib + Oxaliplatin, and Ulixertinib + Olaparib (Fig. 6, Supplementary Information 1, Supplementary Data 15).

Discussion

Computational analysis of biomolecular regulatory networks in cancer has become an indispensable tool for elucidating the mechanistic underpinnings of tumorigenesis and tumor progression^58,59,60,61. Dynamical simulations of such models have further helped classify patients into clinical sub-groups as well as predict survival^46,48,49,62. In this regard, network modeling has also been used to evaluate efficacious drug targets^52,63, but such studies lack integration of comprehensive omics data and information on druggable and clinically actionable targets, impeding clinical translation. Incorporating RNA-seq based gene expressions, copy number variations (CNVs), and somatic mutations (SMs) into network models improves mapping of molecular profiles, yet qualitative approaches fall short in developing clinically employable combinatorial treatments. Additionally, the use of online drug databases, which also contain drugs that lack clinical approval, can lead to off-label in silico drug targeting. Consequently, the assistance offered by in silico studies towards personalized cancer care remains rudimentary in the context of clinical adoption. Moreover, the evaluation of drug effects using drug scores coupled with quantitative patient omics data to assess therapeutic response remains unexplored. Lastly, integrative modeling frameworks that facilitate computational investigations into effective treatments and cytotoxicity towards designing personalized therapies remain lacking. Taken together, there is a need for a systematic method to develop in silico models that provide comprehensive coverage of a patient’s molecular profile, as well as facilitate clinical translation of modeling-based personalized therapeutics.

To address this need, we have proposed “CanSeer”, a method to develop in silico models for clinical translation of personalized therapeutics. CanSeer couples dynamical analysis of biomolecular network models with patient-specific genomic and transcriptomic data to assess the individualized therapeutic responses to targeted drugs, chemotherapeutic agents, and their combinations. CanSeer is network model agnostic and supports all major types of biomolecular network models. Once a network model is defined and converted into a computable form (e.g. Boolean rules, weight-based, or differential equations-based models), it can be readily ported for onward dynamical analysis.
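To make "computable form" concrete, the sketch below encodes a tiny three-node Boolean rules model and follows its trajectory until it settles into an attractor. It is a generic illustration of logical modeling under a synchronous update assumption, not the deterministic analysis pipeline used by CanSeer. The nodes A, B, and C and their rules are invented.

```python
# Hypothetical Boolean rules: each node's next state as a function of the current state
RULES = {
    "A": lambda s: s["A"],                 # input node keeps its value
    "B": lambda s: s["A"] and not s["C"],  # activated by A, inhibited by C
    "C": lambda s: s["B"],                 # follows B with a one-step delay
}


def step(state):
    """Synchronous update: every node is recomputed from the same current state."""
    return {node: bool(rule(state)) for node, rule in RULES.items()}


def attractor(state):
    """Iterate until a state repeats and return the repeating cycle of states.

    Terminates because the state space of a Boolean network is finite.
    """
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state)
    return seen[seen.index(state):]


# With the input on, the negative feedback between B and C produces an oscillation
cycle_on = attractor({"A": True, "B": False, "C": False})
# With the input off, the network settles into a single steady state
cycle_off = attractor({"A": False, "B": True, "C": True})
```

The attractors reached from different inputs play the role that cell fates play in a full-scale model: an input condition is mapped to the long-term behavior of the output nodes.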

Towards a quantitative evaluation of personalized in silico models, CanSeer employs RNA-seq based gene expression data to annotate the network models. Moreover, the patient’s CNVs, SMs, and genomic structural variants (SVs) are incorporated to enable the elucidation of molecular signatures. CanSeer envisages quantitative personalized models in comparison to recent approaches that encoded CNVs (amplified genes and homozygous deletions), and SMs (oncogenes and tumor suppressors) as 0 or 1. For an integrative investigation into patient tumors, statistical outliers detected in gene expression data having CNVs, SMs, and SVs are tagged and stored. Considering the heterogeneous impact of genetic alterations, network nodes are annotated with normalized expressions such that the node activity remains conserved throughout the analysis for genetically altered genes.

To ensure translational pertinence and prioritize repositioning opportunities, cancer driver genes in patients are identified, and druggable and/or clinically actionable target nodes are selected for therapeutic interventions. As a result, CanSeer can elicit optimal tailored treatment through therapeutic screening of druggable and clinically actionable targets, thus enabling clinical translation. The proposed formulation of drug scores helped capture drug-to-drug variability; for example, Ipatasertib and Afuresertib are both AKT inhibitors, yet their average IC₅₀ values are 74.5 and 17.5, respectively, for lung NSCLC (non-small-cell lung cancer) squamous cell carcinoma cell lines. Computing drug scores for patients and performing targeted in silico screening on cancer samples allowed the identification of patient-specific responses to targeted drugs and their combinations. Likewise, administering chemotherapeutic agents to both normal and cancer samples helped reveal patient-centric responses to chemotherapy drugs and their combinations with targeted drugs. Finally, the patient-specific treatment efficacy, together with the evaluation of cytotoxic effect, helped identify and rank optimal tailored treatments.

In terms of modelling approach, CanSeer takes Kauffman’s logical modeling^64,65,66 paradigm and employs deterministic analysis pipeline^56,57, to enable customization and evaluation of large personalized networks. Earlier, Beal et al. devised personalized models by integrating CNVs, and SMs as discrete data, and expressions (gene/protein) as continuous data into Boolean models⁴⁹. However, the resultant model’s integrative investigation remained limited as CNVs, and SMs remained unmapped with expression data. CanSeer addresses this by integrating RNA-seq based gene expressions with CNVs, SMs, and SVs for a consolidative investigation into patient’s tumor. The mapping between gene expression and genetic alterations is based on the positive correlation of CNVs, SMs, and SVs with gene expression levels^{67,68,69,70,71,72,73}. In 2022, Montagud et al. applied Beal et al.’s approach on Fumia’s Boolean model to personalize 488 prostate cancer patients and eight cell lines⁵³. Despite having multiple readouts including proliferation, invasion, migration, metastasis, DNA repair, and apoptosis, the study remained limited to eliciting proliferation and apoptosis in patient-centric models. Further, to identify potential treatments, knockouts were performed to increase apoptosis and deplete proliferation. A similar approach was applied in our previous study⁵² to predict combinatorial treatment for colorectal cancer. However, a caveat in employing discrete knock-out perturbations is that they show maximum effect, and hence cannot be employed as direct targets in wet-lab experiments and clinical trials. In contrast, CanSeer considers druggable and clinically actionable targets and investigates the quantitative patient treatment response by incorporating drugs’ IC₅₀ values to establish clinical relevancy and applicability. 
Moreover, in our colorectal cancer study, cell-type specific gene expression data from the fly gut was integrated with patient-specific mutations to identify personalized drug combinations. In contrast, CanSeer incorporates a comprehensive molecular signature comprising patient-specific RNA-seq based gene expressions, CNVs, SMs, and SVs. Towards improving clinical translation of computational precision oncology approaches, Ianevski et al. used a machine learning approach to select patient-specific drug combinations with a synergy-efficacy-toxicity balance⁵¹. However, some of the identified combinations were not clinically applicable because the public drug databases used contain incomplete information. CanSeer combines an extensive literature review with drug database queries to avoid devising inoperable combinations. In addition, CanSeer applies a computational modeling approach that provides mechanistic insights into patient tumors and supports identification of optimal personalized therapy.

CanSeer’s methodology, exemplified through three use cases, suggests that the proposed approach is effective in identifying personalized anticancer therapies. However, the published model used to demonstrate CanSeer lacked coverage for certain mutations in some patients, resulting in inconclusive outcomes. Moreover, the study remains limited in prioritizing synergy-based drug combinations. In addition, the patient-specific sensitivity to drug cocktails revealed by the in silico personalized models requires further evaluation using tumor-on-chip models, which have the potential to reproduce a patient’s cancer cells and microenvironment^74,75,76. Prospectively, the proposed methodology can be extended by integrating in silico personalized models with microfluidics-based tumor-on-chip models to expedite testing of predicted personalized treatment options and assessment of response. CanSeer provides a roadmap for developing tailored therapeutic combinations in light of patient genomic and transcriptomic profiles; however, wet-lab validation of the predicted efficacious drug combinations and repositioned drugs would enhance its clinical acceptability. Likewise, CanSeer provides a direction for model annotation with patient data, but it requires expansion to accommodate panomics data so as to avoid under- or overfitting of the patient’s profile. With regard to the patient’s tumor profile, it is also important to discover patient-specific cancer driver mutations, for which existing algorithms^77,78,79 can be combined with CanSeer to identify potential driver genes in a patient. Similarly, various algorithms that identify CNVs, SMs, and SVs can be integrated to extract the missing information for a particular patient from whole-exome sequencing data^{80,81,82,83,84,85,86,87,88}.

Furthermore, in view of future extension, CanSeer cues the development of in silico organoids⁸⁹ and multi-scale modeling pipelines⁹⁰ that can simulate the spatiotemporal evolution of patient tumors and the temporal effect of therapy on personalized in silico organoid models. Such automated multi-scale modeling pipelines, backed by high-performance computing that leverages massively parallel graphics processing units (GPUs), can enable real-time analyses for precision medicine.

In terms of clinical feasibility, CanSeer has the potential to assist oncologists in making informed treatment decisions by integrating patient genomic and transcriptomic data, potentiating effective cancer therapies and improved patient survival. The direct use of molecular data enhances the accuracy of individualized treatment options, while the capability to reposition FDA-approved drugs accelerates the development of novel therapeutic avenues. In addition, CanSeer’s capacity to furnish quantitative treatment responses empowers clinicians to tailor therapies with precise risk–benefit assessments. Its scalability across different cancer types broadens the framework’s applicability in clinical oncology, and its identification of novel drug combinations may improve efficacy and outcomes in end-stage cancer cases.

Navigating lengthy regulatory processes for novel drug combinations can delay clinical deployment, impeding the translation of in silico predictions into practice. This could be compounded by the limited availability of high-quality omics datasets. Additionally, the complexity of personalized dynamic network modeling demands high-performance computing, posing scalability challenges in clinical settings. Furthermore, accurately predicting drug synergies and toxicity is complicated by intricate drug interactions, while the successful integration of CanSeer into clinical workflows requires the development of intuitive software solutions.

In conclusion, the proposed methodology demonstrates assessment of response and cytotoxicity of personalized cancer therapeutics. The novel framework thus fills a crucial void in clinical provision of personalized cancer treatments through a comprehensive incorporation of patient’s molecular profile coupled with evaluation of clinically actionable targets.

Materials and methods

Biomolecular network model construction

The process of network model development comprises two sub-steps, i.e. (a) model assembly, and (b) implementation of the assembled model within the modeling and simulation platform, TISON, for onward analysis. The model assembly process is typically manual and starts with a review of published literature to gather data on gene regulatory interactions, protein–protein interactions, signaling pathways, etc. Once an exhaustive list of interacting partners and the types of interactions is collected, individual Boolean rules are defined for each gene or protein (node) using logic operators. The process of defining Boolean rules is detailed in Supplementary Information 2—Supplementary Method 1. Note that users can also choose to construct networks by retrieving interactions from resources such as PyBEL, Reactome, and BioPAX, or from literature using natural language processing. To implement the assembled model, users can employ TISON’s rules editor (Supplementary Information 2—Supplementary Method 2).

To develop a biomolecular regulatory network model, pathways and interactions were retrieved from existing databases including Kyoto Encyclopedia of Genes and Genomes (KEGG), PathBank, Pathway Interaction Database (PID), and Reactome. These pathways were then integrated, and their crosstalk was incorporated into the network architecture. Rules-based and weight-based formalisms were used to model the resultant network architecture for network analyses. Case study exemplars were developed around the human signaling network comprising 197 nodes and their interactions. Boolean logic rules were defined to translate the adopted network into a rules-based model. To develop a weight-based version of the rules-based biomolecular regulations, the basal value of each network node was set at 1. Next, interaction weights were computed based on the number of adjacent nodes such that the output of the weight-based model becomes comparable to that of the rules-based model⁹¹. The interaction weights were then adjusted iteratively until the results closely matched those of a deterministic analysis of the rules-based model. To compare the results from both approaches, the steady state cell fate propensities of the output nodes were compared.

Input and output node setup using fixed node states and cell fate programming

To analyze the human signaling network under normal conditions, values for input nodes were abstracted from published literature (Supplementary Table S5). Each input value was “fixed” so that the node’s state remained unchanged during dynamical analysis. To personalize the network model, the normalized patient gene expression was assigned to each input node. In the personalized cancer model, the patient’s genetic alterations were also set as “fixed node states”. A fixed node state refers to a node in a biological network that is “locked” in a specific state, either ON (1) or OFF (0), and remains unchanged throughout the simulation, regardless of the rules governing the node’s behavior (Supplementary Information 2—Supplementary Method 3). To program the cell fates, associated biomarkers and the corresponding network node states were acquired from the published literature (Supplementary Table S6). Output nodes and their associated node states defining cellular functions such as apoptosis, senescence, cell cycle arrest, etc. were abstracted from literature to further expand the set of cell fates (Supplementary Table S7). The resultant cell fate classification program was tuned in light of the normalized RNA-seq based gene expression data of patients to personalize the models (Supplementary Table S8). Combinations of node states were created to provide exhaustive coverage for cell fate classification.
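The idea of a cell fate classification program with exhaustive coverage of output node states can be illustrated with a minimal Python sketch. The node names (`Casp3`, `CycD`, `p21`) and the fate rules below are hypothetical placeholders chosen for illustration; the actual mappings used in this work are those listed in Supplementary Tables S6 to S8.

```python
from itertools import product

# Hypothetical output nodes; the real ones are given in the supplementary tables.
OUTPUT_NODES = ("Casp3", "CycD", "p21")


def classify_fate(state):
    """Map a dict of output node states (0 or 1) to a cell fate label.

    The priority order used here is an illustrative assumption.
    """
    if state["Casp3"] == 1:
        return "apoptosis"
    if state["p21"] == 1:
        return "cell cycle arrest"
    if state["CycD"] == 1:
        return "proliferation"
    return "quiescence"


def fate_table():
    """Enumerate every combination of output node states.

    Building the table over all 2^n combinations guarantees that no
    combination is left without a fate label.
    """
    table = {}
    for bits in product((0, 1), repeat=len(OUTPUT_NODES)):
        table[bits] = classify_fate(dict(zip(OUTPUT_NODES, bits)))
    return table


if __name__ == "__main__":
    for bits, fate in fate_table().items():
        print(dict(zip(OUTPUT_NODES, bits)), "->", fate)
```

Enumerating the full product of node states is what makes the coverage exhaustive: three Boolean output nodes yield eight rows, each with exactly one label.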

Dynamical deterministic analyses of biomolecular networks

Dynamical assessment of networks was undertaken by performing Deterministic Analysis (DA) using an in-house web-based modelling and simulation platform, Theatre for in silico Systems Oncology (TISON)⁵⁷. TISON provides a user-friendly, web-based implementation of both deterministic^56,57 and probabilistic^92,93 analysis pipelines within its network editor (Supplementary Information 2—Supplementary Method 4). To derive the Boolean rules from the truth tables provided by Cho et al.⁵⁵ (Supplementary Information 2—Supplementary Method 5), we employed Kadelka et al.’s logical rules generation toolbox⁹⁴. Next, for deterministic analysis of the network in TISON, two additional files, the fixed node states file (inputs) and the cell fate classification file, were adopted from Cho et al. The fixed node states file contained the initial input conditions to cue the DA. The maximum number of iterations to search for the steady state was set to 500, and a bootstrap of 256 network states was applied. Both values were estimated through iterative network analysis and simulations (Supplementary Information 2—Supplementary Method 6). The node update policy in DA updates all nodes in the system simultaneously at each time step following predefined rules (Supplementary Information 2—Supplementary Method 7). Finally, output node propensities from DA were used to arrive at the cell fates.
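As a rough illustration of the procedure described above (this is not TISON's implementation), the following Python sketch applies a synchronous update to a toy Boolean network, holds fixed nodes constant, searches for a steady state within 500 iterations, and averages node values over 256 randomly sampled initial states. The three-node network, the definition of propensity as the mean steady-state value, and the decision to discard runs that do not reach a fixed point are all assumptions made for this example.

```python
import random

# Hypothetical toy network: every rule reads the state of the previous time step.
RULES = {
    "GF": lambda s: s["GF"],              # input node, held fixed below
    "AKT": lambda s: s["GF"],             # activated by the growth factor
    "Casp": lambda s: int(not s["AKT"]),  # inhibited by AKT
}


def synchronous_step(state, fixed):
    """Update all nodes simultaneously, then re-impose the fixed node states."""
    nxt = {node: int(rule(state)) for node, rule in RULES.items()}
    nxt.update(fixed)
    return nxt


def find_steady_state(state, fixed, max_iter=500):
    """Iterate until the state stops changing; return None if no fixed point is found."""
    state = {**state, **fixed}
    for _ in range(max_iter):
        nxt = synchronous_step(state, fixed)
        if nxt == state:
            return state
        state = nxt
    return None


def node_propensities(fixed, n_states=256, max_iter=500, seed=0):
    """Average each node's steady-state value over randomly sampled initial states."""
    rng = random.Random(seed)
    finals = []
    for _ in range(n_states):
        init = {node: rng.randint(0, 1) for node in RULES}
        steady = find_steady_state(init, fixed, max_iter)
        if steady is not None:
            finals.append(steady)
    if not finals:
        raise ValueError("no sampled state reached a steady state")
    return {node: sum(s[node] for s in finals) / len(finals) for node in RULES}


if __name__ == "__main__":
    print(node_propensities({"GF": 1}))
    print(node_propensities({"GF": 0}))
```

In this toy example, fixing the input `GF` to 1 drives `AKT` on and `Casp` off from every initial state, whereas fixing it to 0 does the opposite, which mirrors how fixed inputs steer output node propensities.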

For the input nodes, normalized gene expression data of patients, under normal and cancer conditions, were defined in the fixed node states file. Additionally, the normalized gene expressions of genetically altered genes were set as fixed node states for the patient-specific cancer network. The fixed node states file of the cancer patient was used as-is for therapy. An additional input “exhaustive screening file” was defined for performing DA-based therapeutic evaluation. The patient drug scores were integrated to target the specific nodes using the exhaustive screening file.

Robustness analysis and sensitivity analyses

To determine the susceptibility of the biomolecular network to minor variations, the network inputs were perturbed by up to 10%. The input conditions were taken in combination for network analyses. A random sample of 256 network states was selected along with 10% of fixed node combinations. DA was performed to compute the average node propensities, standard deviation, and standard error of the mean (SEM) for each mapped cell fate. The average propensities and SEMs of emergent cellular phenotypes were plotted to evaluate the stability of the network. An SEM threshold of 0.05 was used to declare the network robust.
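The robustness criterion can be sketched in Python as follows. This is a simplified reading of the procedure: `toy_model` is a hypothetical stand-in for a full DA run that returns a single cell fate propensity, each input is scaled by a random factor within ±10%, and treating an SEM at or below 0.05 as robust (rather than strictly below) is an assumption.

```python
import random
import statistics


def perturb_inputs(inputs, max_fraction=0.10, rng=random):
    """Scale each input by a random factor within +/- max_fraction, clipped to [0, 1]."""
    return {
        name: min(1.0, max(0.0, value * (1 + rng.uniform(-max_fraction, max_fraction))))
        for name, value in inputs.items()
    }


def robustness_summary(model, inputs, n_samples=256, sem_threshold=0.05, seed=0):
    """Run the model on perturbed inputs and summarize the spread of its output."""
    rng = random.Random(seed)
    values = [model(perturb_inputs(inputs, rng=rng)) for _ in range(n_samples)]
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    sem = sd / len(values) ** 0.5  # standard error of the mean
    return {"mean": mean, "sd": sd, "sem": sem, "robust": sem <= sem_threshold}


def toy_model(inputs):
    """Hypothetical propensity function standing in for a deterministic analysis run."""
    return 0.5 * inputs["GF"] + 0.5 * (1 - inputs["Stress"])


if __name__ == "__main__":
    print(robustness_summary(toy_model, {"GF": 0.8, "Stress": 0.2}))
```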

To analyze the network behavior under large input perturbations, the adopted network with cell fate expansion was exposed to all possible input values (parameters), sequentially. Each input was assessed at uniform increments of 0.1 from 0 to 1. To isolate the influence of each parameter, all other inputs were kept at normal levels, and DA was performed for each input with a bootstrap of 256 network states. Results from each input node perturbation were compiled to see the overall effect of each input on the network. Biological plausibility was established by validating the interpretation of each input-cell fate relationship against literature (Supplementary Information 2—Supplementary Method 8). Additionally, multiple input perturbations were performed to investigate the synergistic or antagonistic relationship of the input stimuli in the human signaling network. For combinatorial parameter sensitivity assessment, the levels of co-occurring inputs were varied, and the effect on associated cell fates was recorded. DA was performed using TISON.
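The one-at-a-time sweep described above can be sketched in Python. As before, `toy_model` and the input names are hypothetical stand-ins for a DA run on the real network; only the sweep logic (levels from 0 to 1 in steps of 0.1, all other inputs held at their normal values) follows the text.

```python
def sweep_levels(step=0.1):
    """Return evenly spaced levels from 0 to 1 inclusive (11 levels for step 0.1)."""
    n = round(1 / step)
    return [i / n for i in range(n + 1)]


def one_at_a_time_sensitivity(model, normal_inputs, step=0.1):
    """Vary each input over the sweep while all other inputs stay at normal levels."""
    results = {}
    for name in normal_inputs:
        curve = []
        for level in sweep_levels(step):
            inputs = dict(normal_inputs)  # copy so other inputs keep normal values
            inputs[name] = level
            curve.append((level, model(inputs)))
        results[name] = curve
    return results


def toy_model(inputs):
    """Hypothetical propensity function standing in for a deterministic analysis run."""
    return 0.5 * inputs["GF"] + 0.5 * (1 - inputs["Stress"])


if __name__ == "__main__":
    curves = one_at_a_time_sensitivity(toy_model, {"GF": 0.5, "Stress": 0.5})
    for name, curve in curves.items():
        print(name, curve)
```

A combinatorial assessment would replace the single loop over one input with a loop over pairs of co-occurring inputs, at the cost of a quadratic increase in the number of runs.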

Genomic and transcriptomic data assembly, pre-processing, normalization and model personalization

To acquire patients’ transcriptomic profiles (RNA-seq based gene expression data), The Cancer Genome Atlas (TCGA) PanCancer Atlas studies were accessed using the Genomic Data Commons (GDC) portal. To obtain the genomic data, including copy number variations (CNVs), somatic mutations (SMs), and genomic structural variants (SVs), the same TCGA project was queried in cBioPortal. Patient data matching was ensured by comparing sample IDs between GDC and cBioPortal. Patients’ genomic and transcriptomic data were obtained for the TCGA-LUSC project to demonstrate the three cases of the proposed method, “CanSeer”. MATLAB 2020b⁹⁵ was used to implement the pre-processing and normalization of genomic and transcriptomic data.
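A minimal Python sketch of sample ID matching is given below. It assumes the usual TCGA barcode layout, in which GDC sample IDs carry a trailing vial letter (e.g. `-01A`) while cBioPortal sample IDs end at the two-digit sample type code (e.g. `-01`); the barcodes shown are made up, and the published MATLAB script may perform the comparison differently.

```python
def patient_id(barcode):
    """Return the patient-level part of a TCGA barcode (first three fields)."""
    return "-".join(barcode.split("-")[:3])


def sample_key(barcode):
    """Return the first 15 characters: patient ID plus two-digit sample type code."""
    return barcode[:15]


def match_samples(gdc_ids, cbioportal_ids):
    """Keep the GDC sample IDs whose 15-character key also appears in cBioPortal."""
    portal = set(cbioportal_ids)
    return sorted(b for b in gdc_ids if sample_key(b) in portal)


if __name__ == "__main__":
    # Made-up barcodes for illustration only.
    gdc = ["TCGA-AA-0001-01A", "TCGA-AA-0001-11A", "TCGA-BB-0002-01A"]
    cbio = ["TCGA-AA-0001-01", "TCGA-CC-0003-01"]
    print(match_samples(gdc, cbio))
```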

Implementation of genomic and transcriptomic data pre-processing and normalization

The CanSeer genomic data pre-processing and RNA-seq based gene expression data normalization algorithm was implemented in MATLAB 2020b. The MATLAB script is available in Supplementary Information 1—Supplementary Data 7 as well as on GitHub (https://github.com/BIRL/CanSeer). The process begins by filtering the sample sheet obtained from the TCGA program using the keywords “normal”, “tumor”, “primary”, and “metastatic” to include all types of normal and tumor samples except ‘recurrent tumors’. Next, one of three cases is selected: Case 1 (“paired”), Case 2 (“unpaired”), or Case 3 (“cancer samples only”). For Case 1, a new sample sheet is assembled for both normal and cancer samples that includes patient/sample IDs along with the RNA-seq gene expression file names mapped against normal and cancer samples. For Case 2, two separate sample sheets are formed, one for the normal samples and one for the cancer samples. Each sample sheet contains patient/sample IDs with their corresponding RNA-seq gene expression file names. For Case 3, the sample sheet is assembled for cancer samples only. Subsequently, either a random sample or the complete dataset of RNA-seq gene expressions can be selected. Next, the network’s node list, comprising all network nodes, is used to extract and align RNA-seq gene expressions of patients with the respective genes and patient IDs. In the next step, outliers are detected in the dataset using the MAD (Median Absolute Deviation) or IQR (interquartile range) method. After detection of statistical outliers, Copy Number Variations (CNVs), Somatic Mutations (SMs) and Genomic Structural Variations (SVs) are selected and processed. From the CNVs, deep deletions (-2) and amplifications (+2) (based on GISTIC processing) are saved. CNVs having low-confidence values of -1 and +1, obtained using GISTIC processing, are removed. The MATLAB script encodes deep deletions of genes as -2, amplifications as +2, and the remaining data as 0.
Next, the filtered SMs (for selected patients and network genes) are transformed into logical arrays, where 0 and 1 represent the absence and presence of a mutation, respectively. Similarly, patient-specific SVs are also transformed into logical arrays representing the absence (0) and presence (1) of genomic SVs. Lastly, the detected outliers are superimposed on the CNVs, SMs, and SVs to retain the highly altered RNA-seq gene expressions resulting from CNVs, SMs, and SVs. The remaining outlier gene expressions are removed, after which the normal and cancer samples are combined and normalized between 0 and 1 using the highest gene expression across patients. The normalization strategy for Case 3 is similar to that for Cases 1 and 2, except that RNA-seq gene expression data are available for cancer samples only; the gene expressions are therefore normalized by the maximum gene expression among the cancer samples.
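The core of this pre-processing can be sketched in Python for a single gene measured across patients. This is an illustrative re-implementation under stated assumptions, not a port of the published MATLAB script: the MAD rule uses the common convention of flagging values more than three scaled MADs from the median, removed values are represented as `None`, and normalization is applied per gene. The expression values in the usage example are made up.

```python
import statistics


def mad_outliers(values, n_mads=3.0):
    """Flag values lying more than n_mads scaled MADs from the median."""
    med = statistics.median(values)
    # 1.4826 scales the MAD to be comparable to a standard deviation.
    mad = 1.4826 * statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return [False] * len(values)
    return [abs(v - med) > n_mads * mad for v in values]


def encode_cnv(gistic_values):
    """Keep deep deletions (-2) and amplifications (+2); set everything else to 0."""
    return [v if v in (-2, 2) else 0 for v in gistic_values]


def filter_and_normalize(expression, altered):
    """Drop outliers not explained by an alteration, then divide by the maximum.

    `altered` marks patients carrying a CNV, SM, or SV in this gene.
    """
    flags = mad_outliers(expression)
    kept = [
        None if (is_outlier and not has_alteration) else value
        for value, is_outlier, has_alteration in zip(expression, flags, altered)
    ]
    top = max(v for v in kept if v is not None)
    return [None if v is None else v / top for v in kept]


if __name__ == "__main__":
    # Made-up expression values for one gene across six patients.
    expression = [10, 12, 11, 13, 100, 90]
    altered = [False, False, False, False, True, False]
    print(encode_cnv([-2, -1, 0, 1, 2]))
    print(filter_and_normalize(expression, altered))
```

In the usage example, both 100 and 90 are flagged as outliers, but only the value from the patient carrying an alteration is retained and becomes the normalization maximum.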

The normalized RNA-seq based gene expression data was used to annotate the network model for personalization.

Target identification using literature and drug-target databases and evaluation

Drugs targeting each node (biomolecule) in the network were obtained from published literature and drug-target databases including PanDrugs, DrugBank, and The Drug Gene Interaction Database (DGIdb). Only druggable and clinically actionable nodes were selected to enable the clinical translation of resultant models. The drugs identified were then queried in Genomics of Drug Sensitivity in Cancer (GDSC2) to obtain half-maximal inhibitory concentrations (IC₅₀). The IC₅₀ values of candidate drugs were acquired for “Lung NSCLC squamous cell carcinoma” cell lines, and their mean was computed after removing outliers. Next, the log of the mean IC₅₀ value of each candidate drug was used to formulate the drug activity score (DAS). DAS was then normalized and utilized to compute a “drug score for patient”, which was then employed to annotate the personalized model. The cell fate outcomes from the resultant model’s DA were then used to calculate the patient-specific responses including treatment efficacy, cytotoxic effect, therapeutic response index (TRI), proliferative index and apoptotic index.
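The first part of this computation (outlier removal, mean, log transform, normalization) can be sketched in Python. The exact DAS and drug-score-for-patient formulas are not reproduced here; the IQR-based outlier rule, the natural logarithm, and the min-max scaling are assumptions made for illustration, and the drug names and IC₅₀ values are made up.

```python
import math
import statistics


def mean_without_outliers(values, k=1.5):
    """Mean of the values lying within k * IQR of the quartiles (assumed outlier rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    kept = [v for v in values if low <= v <= high]
    return statistics.fmean(kept)


def log_mean_ic50(ic50_by_drug):
    """Natural log of the outlier-filtered mean IC50 for each drug."""
    return {drug: math.log(mean_without_outliers(vals)) for drug, vals in ic50_by_drug.items()}


def min_max_normalize(scores):
    """Rescale scores to [0, 1]; an assumed stand-in for the paper's normalization."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {drug: 0.0 for drug in scores}
    return {drug: (s - lo) / (hi - lo) for drug, s in scores.items()}


if __name__ == "__main__":
    # Made-up per-cell-line IC50 values for two hypothetical drugs.
    ic50 = {"drug_A": [10, 12, 11, 13, 400], "drug_B": [2, 3, 2.5, 3.5]}
    print(min_max_normalize(log_mean_ic50(ic50)))
```

Working on the log scale compresses the wide dynamic range of IC₅₀ values, so that a single very insensitive drug does not dominate the normalized scores.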

Proposed methodology

The proposed methodology of CanSeer comprises four steps, which are detailed below:

Step 1—Literature and database-driven development of biomolecular regulatory network models and validation

To abstract biomolecular regulation and develop a cancer-type specific biomolecular network architecture, published literature as well as online databases including KEGG⁹⁶, PathBank⁹⁷, PID⁹⁸, and Reactome⁹⁹ are employed. The resultant network topology includes input, output, and processing nodes along with their interactions. The input nodes in the network cue the downstream processing nodes, which then crosstalk and signal the output nodes. For dynamical analysis of the model, this topology is translated into rules-based^100,101,102 or weight-based^103,104 network models. For the analysis of the rules-based or weight-based model, normal conditions are abstracted from the literature and assigned to the input nodes. Next, the dynamical analysis of the network is performed using either deterministic (DA)^56,57, probabilistic (PA)^92,93, or ordinary differential equation (ODE)¹⁰⁵ modalities under the influence of input conditions. The DA, PA and ODE modalities are detailed in Supplementary Information 2—Supplementary Method 9. Note that the case study provided in this work used only the deterministic analysis modality from TISON, and the resulting outcomes are detailed in the results section. Results from dynamical analyses include output node propensities, which are then used to program cell fate outcomes such as quiescence, proliferation, cell cycle arrest, apoptosis, etc. To validate the cell fate outcomes, the trends in node-specific cell fate propensities are compared with the published literature. To further evaluate the biological plausibility and sensitivity of the validated model, random and systematic perturbations are introduced by varying input conditions; these are termed robustness¹⁰⁶ and parameter sensitivity^107,108 analyses, respectively. Here again, the resulting cell fate outcomes are matched and validated against the published literature. The workflow of network model development and dynamical evaluation is shown in Fig. 2A.

Step 2—Omics data collection, pre-processing, and normalization

In CanSeer’s second step, patient-specific omics data is acquired, pre-processed, and normalized for later use in network model annotation. The detailed workflow of CanSeer’s Step 2 is presented in Fig. 3.