Exploring weighting schemes for the discovery of informative generalized between pathway models to uncover pathways in genetic interaction networks

Kevin M. Yu; Lenore J. Cowen

doi:10.1038/s41598-025-16353-2

Open access
Published: 18 August 2025


Scientific Reports volume 15, Article number: 30169 (2025)


Abstract

In S. cerevisiae, a large and rich collection of epistasis data has been generated. When these data come from double knockouts, they have a natural representation as a signed, weighted graph, where the weight on an edge is computed from the deviation of the double-deletion mutant's sickness or health from what is expected given its constituent single-deletion mutants. Mani et al. empirically studied different probabilistic null models (minimum, multiplicative, and logarithmic) for setting edge weights, with the goal of determining the best weighting scheme for detecting the presence or absence of an epistatic effect in an individual double knockout in isolation. In contrast, approaches such as the LocalCut algorithm of Leiserson et al. look at the entire network and search for graph-theoretic structure indicative of compensatory pathways. The effect of different edge weighting schemes on the biological pathways returned by algorithms such as LocalCut has not been previously studied. We compare the generalized Between Pathway Models produced by LocalCut under several ways of calculating edge weights, and analyze the resulting collections of putative redundant pathways. We recover some known pathways and find some interesting new ones, and we give broad recommendations for setting the parameters of LocalCut to produce the most biologically relevant gene sets.

Introduction

It is estimated that only about 18% of individual yeast genes are essential under normal growth conditions in rich media, meaning that a mutant with that gene deleted or suppressed is not viable¹. For non-essential genes, the S. cerevisiae genetic interaction network comes from high-throughput epistasis experiments, where edge weights represent the surprise in the growth rates of double deletion mutants compared to their associated single deletion mutants. In particular, a negative edge weight indicates a growth defect (or, in the worst case, synthetic lethality, meaning the double deletion mutant is not viable), and a positive edge weight indicates that the double deletion mutant is not sicker than the constituent single deletion mutants (or, in the best case, synthetic rescue, where the single deletion mutants display reduced growth but the double deletion mutant behaves like wild type)². The pattern and organization of this signed, weighted genetic interaction network has been shown³ to contain interesting motifs that can indicate redundancy and the presence of compensatory pathways. These alternative pathways can be mechanisms of global resilience: if one pathway is not functioning, the cell can still accomplish an essential function through another.

The genetic interaction network can be partitioned into clusters that broadly recapture gene function, searched for genes that are globally highly pleiotropic, or, as in the main focus of the current work, mined for subgraphs that witness mechanisms of compensation and redundant pathways. These subgraphs are generalizations of the so-called Between Pathway Model (BPM), first defined for unweighted synthetic lethality data by Kelley and Ideker³.

Kelley and Ideker’s BPM is a network motif found in the superimposition of synthetic lethality genetic interaction (GI) edges and physical protein-protein interaction (PI) edges. Consider a model consisting of a pair of protein pathways, each of which serves as a redundant backup for the other. Within each pathway, there will be many physical interactions between nodes (protein-protein binding, direct transcriptional regulation, etc.), reflecting each pathway’s existence as a coherent functional unit. Synthetic lethality interactions, on the other hand, will be few or nonexistent within each pathway, since the other pathway provides a failsafe mechanism for its partner. Between the two pathways, there will be more observed synthetic lethality interactions: if corresponding components are deleted or suppressed in both pathways at once, the fault tolerance of the system is defeated, and the strain dies. The network motif corresponding to this situation, in which two groups of genes, each edge-dense within the PI network, are connected by many synthetic lethality edges in the GI network, defines the BPM (see Fig. 1: panel A). In Brady et al.⁴, we observed that the dense bipartite subgraphs formed by the synthetic lethality edges in BPMs were analogous to 2-edge cuts, and used the theory of max cut problems to design algorithms to find putative BPM subgraphs in this setting, using the locations of the known physical interaction edges only for validation⁴. Thus our method required only the GI network as input, and found BPMs without considering the PI network. Other early papers also considered the task of finding BPMs in this unweighted setting^5,6.

Advances in technology led to high-throughput experiments using E-MAP⁷ and SGA⁸ technologies that could assign a scalar weight to genetic interaction edges (instead of just a binary viable/non-viable label). The weights on genetic interaction edges were calculated from a null model of the expected sickness of the double mutant as compared with its constituent single deletion mutants. The earliest high-throughput experiments used a null model based on the fitness of the sicker of the two single deletion mutants⁸. We created the LocalCut algorithm^9,10, which generalized the notion of a BPM to this setting in the natural way (see the definition of gBPM below).

However, it is still an open question how best to calculate the weights on the GI edges from the measured sickness and health of the double deletion mutant as compared to its constituent single deletion mutants. A study by Mani et al.¹¹ showed that a log or multiplicative model behaved in a more informative manner than a model based on the minimum single-mutant fitness, particularly for uncovering epistatic relationships involving genes whose single deletion mutant was highly deleterious (whereas for single deletion mutants with low-magnitude effects, the models were largely equivalent)¹¹. Thus most subsequent experiments^12,13 moved to a multiplicative model for determining epistasis weights. However, the objective of Mani et al.¹¹ was to determine which individual pairs of genes showed epistasis, in isolation, rather than to search for sets of genes that comprise generalized BPMs. There has not been a systematic study of how these weighting schemes influence the computational search for generalized BPMs; this is the subject of the present paper. We find that regardless of whether a minimum, multiplicative or logarithmic epistasis base model is employed, squaring the edge weights (to separate strong signals from weak ones near zero and denoise the signal, as recommended in Gallant et al.¹⁰) improves the discovery of generalized BPMs. We also find that each weighting scheme uniquely discovers some interesting generalized BPMs.

Related work

The LocalCut algorithm of Leiserson et al.⁹ was shown to outperform a similar algorithm by Kelley and Kingsford¹⁴. More recently, a new ILP method of Liany et al.¹⁵ was shown to outperform LocalCut when searching for weighted BPMs in two human cancer synthetic lethality networks. However, since their method uses a definition of a high-scoring weighted BPM similar to LocalCut's and only improves the search heuristic, any improvement in edge weighting for LocalCut should also transfer to their method. Other recent related work (reviewed in Wang et al.¹⁶) is not directly comparable, because it primarily focuses on predicting individual synthetic lethal interactions by integrating other sources of biological data such as physical protein-protein interaction networks, co-expression data, mutual exclusivity in cancer, or metabolomics data^17,18,19. It is nonetheless worth mentioning the work of Amar et al.²⁰ in particular, which uses combined physical PPI and genetic interaction data to search for network modules, a more general concept than BPMs.

Background

gBPM collections

In the unweighted case, a GI BPM corresponded to a set of genes partitioned into two pathways (A, B), where the majority of GI edges (in this case, synthetic lethality edges) fell between A and B, with one endpoint in each (hence the name “Between Pathway Model”). The natural generalization to weighted genetic interaction networks instead searches for pairs of sets of genes, or pathways, in which the inter-pathway edges predominantly have large negative weights and the intra-pathway edges predominantly have positive weights (Fig. 1: panel B). We call these pairs of gene sets gBPMs, for generalized between pathway models. More formally, we introduce the following definition:

Definition 1

Let G be a graph with positive and negative real edge weights, and let \(W > 0\). A pair of disjoint subsets \(A, B \subseteq V(G)\) is a (k, W)-gBPM if \(3 \le |A|, |B| \le k\) and \(M = \sum _{x \in A, y \in B} w(x,y) - \sum _{x \in A, y \in A} w(x,y) - \sum _{x \in B, y \in B} w(x,y) < -W.\) We refer to the positive number \(-M\) as the score of the gBPM.

In this definition, A and B are the putative compensatory pathways, and edges between genes in opposite pathways should have large negative weights, while edges in the same pathway should have positive weights. As noted in past studies, there is an asymmetry between the positive and negative edge weights, where the negative edge weights are typically of greater magnitude than the positive edge weights. This does not affect the definition, of course, but it does mean that the between pathway edges tend to be more important than the within pathway edges when a subset (A, B) qualifies to be labeled a gBPM.
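As an illustration, the following Python sketch computes the score of a candidate pair (A, B) using the sign convention described above, so that negative between-pathway weights and positive within-pathway weights both raise the positive score −M. These are hypothetical helpers, not code from LocalCut; they assume edge weights are stored in a dictionary keyed by unordered gene pairs, treat missing pairs as weight 0, and count each unordered within-pathway pair once.

```python
from itertools import combinations


def gbpm_score(weights, A, B):
    """Return the positive score -M of a candidate gBPM (A, B).

    weights: dict mapping frozenset({x, y}) -> signed edge weight
    (hypothetical representation; absent pairs count as weight 0).
    """
    def w(x, y):
        return weights.get(frozenset((x, y)), 0.0)

    # Within-pathway edges: each unordered pair counted once (an assumption).
    within = sum(w(x, y) for S in (A, B) for x, y in combinations(sorted(S), 2))
    # Between-pathway edges: every pair with one endpoint in A and one in B.
    between = sum(w(x, y) for x in A for y in B)
    M = between - within  # very negative for a good gBPM
    return -M


def is_gbpm(weights, A, B, k, W):
    """Check the (k, W)-gBPM conditions: disjoint, sizes in [3, k], score > W."""
    A, B = set(A), set(B)
    if A & B:
        return False
    if not (3 <= len(A) <= k and 3 <= len(B) <= k):
        return False
    return gbpm_score(weights, A, B) > W
```

For example, with three genes per pathway, all nine between-pathway weights equal to −1 and all six within-pathway weights equal to 0.2, the score is 9 + 1.2 = 10.2, so the pair qualifies as a (25, W)-gBPM for any W below 10.2.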

Note, however, that whenever (A, B) is a gBPM, many sets of genes that are highly overlapping with (A, B) will also be gBPMs. For each specific gene g, we can ask whether there exists a (k, W)-gBPM that includes it. It is also interesting to construct collections of “unique” gBPMs, filtered by a threshold on how similar two gBPMs (A, B) and (C, D) can be for both to be placed in the same collection (following past work, we use the Jaccard index²¹ to set this threshold).

Definition 2

A collection \({\mathcal {C}}\) of gBPMs is j-filtered if, for any \((A_s,B_s)\) and \((A_t,B_t)\) in \({\mathcal {C}}\) with \(s \ne t\), we have \(Jaccard(A_s \cup B_s, A_t \cup B_t) \le j\).

Definition 3

For collection \({\mathcal {C}}\) of gBPMs, \((A_i, B_i)\), with scores \(M_i\), Score \(({\mathcal {C}}) = \sum _{i \in {\mathcal {C}}} M_i\).
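Definitions 2 and 3 can be illustrated together. Many procedures yield a j-filtered collection; the sketch below shows one simple greedy option, which visits candidates from highest to lowest score and keeps a candidate only if its gene set A ∪ B has Jaccard index at most j with every gene set already kept. The greedy order and the (A, B, score) tuple format are assumptions for illustration, not a description of LocalCut's internals.

```python
def jaccard(s, t):
    """Jaccard index |s & t| / |s | t| (taken as 0.0 for two empty sets)."""
    union = s | t
    return len(s & t) / len(union) if union else 0.0


def j_filter(candidates, j=0.66):
    """Greedily select a j-filtered collection from (A, B, score) triples."""
    kept = []
    for A, B, score in sorted(candidates, key=lambda c: c[2], reverse=True):
        genes = set(A) | set(B)
        # Keep the candidate only if it is not too similar to anything kept.
        if all(jaccard(genes, set(KA) | set(KB)) <= j for KA, KB, _ in kept):
            kept.append((A, B, score))
    return kept


def collection_score(collection):
    """Score of a collection (Definition 3): the sum of its members' scores."""
    return sum(score for _, _, score in collection)
```

Two candidates that share five of their seven distinct genes have Jaccard index 5/7 ≈ 0.71, so with j = 0.66 only the higher-scoring one survives.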

We note that we can also search for j-filtered collections of gBPMs in the unweighted setting by simply including edges of weight −1 for each synthetic lethal interaction.

The LocalCut method searches for j-filtered collections of gBPMs^9,10 in a weighted GI network, and was shown to find many meaningful compensatory pathways, including some involved in DNA repair, using the data of Tong et al.⁸. However, much more comprehensive collections of yeast epistasis data have been generated since that work. Here we examine how the scheme for generating edge weights (log, multiplicative or min) affects the number and quality of gBPMs found by LocalCut on these newer data.

Methods

Weighting schemes

Here are the alternative weighting schemes we test. Let \(S_a\) and \(S_b\) denote the fitness scores of the single deletion mutants for genes a and b, respectively, let \(D_{a, b}\) denote the fitness score of the double deletion mutant for genes a and b, and let w(a, b) be the weight given to the edge between genes a and b.

Definition 4

Minimum weighting. \(w(a, b) = D_{a, b} - \min(S_a, S_b)\)

Definition 5

Multiplicative weighting. \(w(a, b) = D_{a, b} - S_a \cdot S_b\)

Definition 6

Logarithmic weighting. \(w(a, b) = \log_2(D_{a, b}) - (\log_2(S_a) + \log_2(S_b))\)

In the remainder of the paper, we abbreviate these weighting schemes as “min”, “mult” and “log” weighting, respectively.
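The three definitions translate directly into code. In this minimal sketch, fitness values are plain floats; the logarithmic scheme is undefined when any fitness is 0 (for example, a synthetic lethal double mutant), and the sketch does not attempt to handle that case.

```python
import math


def w_min(D_ab, S_a, S_b):
    """Minimum weighting (Definition 4): deviation from the sicker single mutant."""
    return D_ab - min(S_a, S_b)


def w_mult(D_ab, S_a, S_b):
    """Multiplicative weighting (Definition 5): deviation from S_a * S_b."""
    return D_ab - S_a * S_b


def w_log(D_ab, S_a, S_b):
    """Logarithmic weighting (Definition 6); all fitness values must be > 0."""
    return math.log2(D_ab) - (math.log2(S_a) + math.log2(S_b))
```

With \(S_a = 0.8\), \(S_b = 0.5\) and \(D_{a,b} = 0.45\), min weighting gives −0.05 (a mild negative interaction) while mult weighting gives +0.05, because the multiplicative expectation 0.4 lies below the minimum 0.5; the schemes can therefore disagree even on the sign of an edge. For positive fitness values, log weighting always agrees in sign with mult weighting, since \(\log_2\) is monotonic.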

In addition, because this was suggested in the original LocalCut papers, we also considered a version where the edge weight magnitudes were squared (retaining the positive or negative sign). Squaring the weights is a way of denoising the networks: weights of magnitude less than 1 are pulled toward 0, and weights of magnitude greater than 1 are increased. Other weighting schemes, such as a cubed version of minimum weighting, were also considered but were ultimately discarded due to poor performance or high levels of noise.
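A minimal sketch of the sign-preserving squaring described above; the dictionary-of-edges representation is an assumption made for illustration.

```python
import math


def signed_square(w):
    """Square an edge weight's magnitude while keeping its sign."""
    return math.copysign(w * w, w)


def square_weights(weights):
    """Apply signed squaring to every edge of an {edge: weight} dict."""
    return {edge: signed_square(w) for edge, w in weights.items()}
```

Squaring keeps the ordering of magnitudes but widens the gap between strong and weak edges: weights of −0.4 and −0.2 become −0.16 and −0.04, so a 2:1 ratio becomes 4:1.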

LocalCut parameters

Once a method for choosing edge weights is fixed, LocalCut still has parameters that need to be set: m, the number of partitions computed by LocalCut’s randomized max cut algorithm; c, the percentage threshold of matching partitions needed for a gene to be added to a gBPM; j, the threshold for Jaccard index filtering; and minimumsize and maximumsize, the size requirements for a gBPM’s inclusion in the final output.

We kept LocalCut’s default values for m, j, minimumsize, and maximumsize (250, 0.66, 3, and 25, respectively), because the changes in scoring methods did not affect what these thresholds were designed to accomplish. More specifically, the minimum and maximum gBPM sizes and j, which controls how much two gBPMs returned by the algorithm are allowed to overlap, are general properties of the collection of gBPMs we are trying to find, so it was important to hold them fixed so that the collections returned under different weighting schemes were comparable. m is a robustness parameter that ensures the stochastic choices of the randomized algorithm consistently find the same gene sets. The value of c, however, corresponds roughly to the sensitivity of the algorithm, and the changes in scoring methods necessitated a closer look at this threshold. We did test different values of m for our different weighting schemes and values of c, but \(m=250\) (the LocalCut default) remained easily large enough to produce stable results in every case (results not shown).

Increasing c leads to fewer genes being incorporated into our gBPMs, and fewer gBPMs overall, but a larger proportion of true positives. Decreasing c can lead to more genes being incorporated into our gBPMs, but also more false positives. A scoring metric that pushes most scores close to zero benefits from a greater c value, since there tends to be less low-scoring noise; a greater c value increases the stringency of the algorithm, reducing noise from its inherent randomness. Meanwhile, a scoring metric with a wider spread of scores (greater standard deviation) benefits from a lower c value, since many of the meaningful genes in a BPM may not pass a stricter threshold.
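To make the role of c concrete, here is a hypothetical sketch of a consensus step. It assumes, for illustration only, that each of the m local-search runs assigns genes to side "A" or "B" (with sides oriented consistently across runs) and that a gene is kept on a side if it lands there in at least c percent of the runs; the exact bookkeeping inside LocalCut may differ. With c above 50, no gene can be kept on both sides.

```python
from collections import Counter


def consensus_sides(runs, c):
    """Return (A, B): genes placed on the same side in at least c% of runs.

    runs: one dict per local-search run, mapping gene -> "A" or "B"
    (hypothetical representation; sides assumed consistently oriented).
    """
    m = len(runs)
    counts = Counter((gene, side) for run in runs for gene, side in run.items())
    # Integer comparison avoids rounding issues: n / m >= c / 100.
    A = {g for (g, side), n in counts.items() if side == "A" and 100 * n >= c * m}
    B = {g for (g, side), n in counts.items() if side == "B" and 100 * n >= c * m}
    return A, B
```

In this toy model, a gene that lands on side A in 7 of 10 runs is kept at c = 70 but dropped at c = 80, mirroring the observation that raising c yields fewer genes per gBPM.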

Enrichment validation

To validate that our gBPMs represent biologically meaningful pathways, we first examine the Gene Ontology (GO) functional enrichment of our pathways. Although each gBPM has two distinct pathways, because they represent compensatory pathways we expect them to be enriched for the same or similar terms. Enrichment analysis was performed using the g:Profiler API²². We used the 2,727 genes in our input as the background, corrected for multiple testing using the Benjamini-Hochberg procedure²³, and kept terms with adjusted \(p < 0.05\). A pathway was considered enriched if g:Profiler found it enriched for a GO term annotated to fewer than 500 genes in total. Each gBPM can then be placed in one of four categories: No Pathways Enriched, meaning neither pathway is enriched under our definition; One Pathway Enriched, meaning exactly one of the two pathways is enriched; Enriched for Different, meaning both pathways are enriched but share no enriched terms (with fewer than 500 total genes); and Enriched for Same, meaning both pathways are enriched for at least one identical term.
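The four-way categorization can be written as a small function. This sketch assumes the enrichment step (g:Profiler with Benjamini-Hochberg correction and the 500-gene size cap) has already produced, for each pathway, a set of significant GO term identifiers; it does not call g:Profiler itself.

```python
def enrichment_category(terms_a, terms_b):
    """Assign a gBPM to one of the four enrichment categories.

    terms_a, terms_b: sets of significant GO term IDs for each pathway,
    assumed already filtered to terms annotated to fewer than 500 genes.
    """
    if not terms_a and not terms_b:
        return "No Pathways Enriched"
    if not terms_a or not terms_b:
        return "One Pathway Enriched"
    if terms_a & terms_b:
        return "Enriched for Same"
    return "Enriched for Different"
```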

Gene expression validation

Finally, the genes in each of our pathways should generally be correlated with each other in gene expression. The SPELL search engine²⁴ (version 2.0.3), integrated with the Saccharomyces Genome Database (SGD)²⁵, searches yeast microarray expression datasets that contain a given set of genes for expression correlation. We scored each pathway (gBPM module) by taking the average correlation between each gene and every other gene in the pathway across each study. To create a null distribution, we replaced the genes in each gBPM module with genes drawn at random from the total pool of genes and calculated the same scores; this removes any bias from the different numbers of genes in each pathway. To account for potential bias in the genes chosen for a module, we also tested a null distribution in which we shuffled the genes in each module (ensuring that no gene was repeated within any given module). We preferred edge weighting schemes for which the gene expression correlation of individual gBPM pathways is well separated from random.
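A minimal sketch of the module score and the random-gene null for a single expression dataset, assuming expression profiles are already available as a dictionary from gene to values (it does not query SPELL). Averaging per-dataset scores over studies and the shuffled-gene null are omitted; gene profiles are assumed non-constant so that Pearson correlation is defined.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def module_coexpression(expr, genes):
    """Mean pairwise correlation among a module's genes in one dataset.

    expr: dict gene -> list of expression values over the same conditions.
    """
    pairs = list(combinations(sorted(genes), 2))
    return sum(pearson(expr[g], expr[h]) for g, h in pairs) / len(pairs)


def random_null(expr, size, n_draws=1000, seed=0):
    """Scores of random same-size modules drawn from all genes in expr."""
    rng = random.Random(seed)
    pool = sorted(expr)
    return [module_coexpression(expr, rng.sample(pool, size)) for _ in range(n_draws)]


def empirical_p(observed, null_scores):
    """Fraction of null scores at least as large as the observed score."""
    return sum(s >= observed for s in null_scores) / len(null_scores)
```

Drawing the null modules at the same size as the observed module is what removes the dependence on module size described above.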

Experimental setup

We used the raw sickness and health scores of single and double mutant yeast strains from Costanzo et al.¹². Because of computational constraints in running LocalCut, we did not include all yeast genes, but first used a PPI network (downloaded from STRING, version 11.5²⁶) to cluster yeast genes as in Kolawole and Cowen²⁷. Using physical interaction edges only, with the unmodified “combined” confidence weights provided by STRING, we ran cDSD²⁸ followed by spectral clustering²⁹ to cluster the genes in the PPI network, using the glidetools³⁰ package and the scikit-learn³¹ implementation of spectral clustering. We asked spectral clustering for four clusters and merged the largest and smallest clusters (since the smallest cluster contained only 2 genes). This gave us a subset of 2,727 of the 5,848 total yeast genes in the dataset of Costanzo et al.¹², from which we constructed a complete genetic interaction network, weighting edges as described above, in which we search for gBPMs. The list of these 2,727 genes appears in the supplement.
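The post-processing step described above, merging the smallest of the four clusters into the largest, can be sketched on a plain list of cluster labels. The cDSD and spectral clustering steps themselves (done with glidetools and scikit-learn in this work) are not reproduced; the label list is hypothetical input, and ties in cluster size go to the label seen first.

```python
from collections import Counter


def merge_smallest_into_largest(labels):
    """Relabel the smallest cluster's members with the largest cluster's label.

    labels: one cluster label per gene (hypothetical input, e.g. from a
    four-cluster spectral clustering run).
    """
    sizes = Counter(labels)
    largest = max(sizes, key=sizes.get)
    smallest = min(sizes, key=sizes.get)
    return [largest if lab == smallest else lab for lab in labels]
```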

Results

gBPM collections

We first explore the number of distinct gBPMs found (LocalCut only includes a new gBPM if its Jaccard similarity to every existing gBPM is at most \(j = 0.66\), matching previous studies), the pathway sizes, and the GO enrichment for each of the six edge weighting schemes we test. We find that each of these values depends strongly on how the LocalCut consistency threshold c is set, and that this dependence in turn varies by weighting scheme.

Our first result is that squaring the edge weights improves performance for all three ways of calculating the weights (mult, min and log) and almost all settings of c. There is no consistent trend in the number of modules produced by the squared versus unsquared weights (see Fig. 2), except for the min weighting scheme, where the unsquared weights yield so few modules that we ignore min with unsquared weights for the rest of this discussion. When we examine the percentage of BPMs with both modules functionally enriched (the sum of the blue and orange bars in Fig. 3), as well as the proportion of enriched modules across all returned BPMs, squaring the weights increases these values, usually substantially (see Fig. 4). On the other hand, the average number of genes in the modules goes down somewhat in almost all settings when the weights are squared; this is expected, as some of the extra genes are most likely noise, particularly as c is relaxed. The average module size across all weighting schemes lies between 4 and 12 genes (see Fig. 5), with smaller average module sizes at stricter c values. As c is decreased, it becomes more likely that genes that do not belong are included in a gBPM by chance; however, if gBPMs are post-checked for enrichment, a more permissive c value may be preferable. For the rest of the discussion, we consider the squared versions of the edge weights.

In general, with squared weights, the 48 modules found by the min weighting scheme with \(c=70\) give the best values for the proportion of enriched modules, as well as the largest proportion of gBPMs with both pathways enriched for the same GO function (Fig. 3). On the other hand, mult with \(c=70\) and log with \(c=70\) gave, by an order of magnitude, the largest numbers of candidate gBPMs (681 and 559, respectively); however, over one third of these candidate gBPMs display no known enrichment in either pathway, suggesting that perhaps more than a third of them are just noise. Even when such a gBPM is enriched, we expect that this setting of c is too permissive and will cause many spurious extra genes to be included in the returned gBPM sets.

Co-expression validation

We next compared the co-expression of the component pathways in our gBPMs with that of random gene sets of the same size. Results for mult (squared) with \(c=90\) and \(c=80\) appear in Figs. 6 and 7. Full results for all weighting schemes appear in the supplement, against both random sets of genes (Figures S1-S12) and randomly shuffled sets of genes (Figures S13-S27). As can be seen in Figs. 6 and 7, when weights are squared, a substantial number of gBPM modules have expression correlation that differs from that of random gene collections. On the other hand, with unsquared weights, while the mult weighting scheme still produces gBPM modules that are more correlated than random gene sets, the mult weighting scheme is nearly indistinguishable from random sets of genes with \(c=80\) (see Fig. 8). In general, the log and min weighting schemes also produce gBPM modules whose correlations are skewed away from those of random gene sets, especially with squared weights (see the graphs in the supplement).

Some example interesting gBPMs and pathways

gBPMs found across multiple weighting schemes

Some of the gBPMs and pathways we find recur across multiple weighting schemes. These gBPMs are very robust to the weighting method, and include genes that are part of redundant pathways identified in previous work. Because the Jaccard threshold above which we discard overlapping gBPMs is only 2/3, some weighting schemes discover different parts of the same redundant pathways multiple times. One of the most robust gBPMs, found consistently across weighting schemes, has 4 components of the COG complex as one of its pathways. The COG complex consists of 8 genes (COG1-COG8), but deleting COG5, COG6, COG7 and COG8 simultaneously yields yeast strains that look like wild type in laboratory conditions³². These 4 COG genes were also highlighted as a compensatory pathway in previous work⁹. The compensatory pathway opposite them involves genes such as VPS29 and VPS35, involved in retromer transport³³, as well as the cargo receptor protein ERV14³⁴. Other proteins appearing in the pathway opposite the COG genes include IRS4, SYS1 and RAV1, identified by Bonangelino et al.³⁵ as vacuolar protein sorting genes. Since the COG complex is well known to be involved in Golgi trafficking³⁶, it is possible that the retromer transport, cargo receptor, and vacuolar protein sorting genes identified here can partially compensate in protein routing.

Another fault-tolerance mechanism recapitulated across multiple weighting schemes involves the well-known parallel pathways for DNA double-strand break repair, also highlighted in Kolawole and Cowen²⁷. The error-free RAD52 pathway and the error-prone zeta DNA polymerase complex (containing REV1, REV3, and REV7) pathway for rescuing replication fork arrest are components of two alternative pathways for repairing DNA breaks³⁷. Portions of these pathways are recovered under most of the weighting schemes we examined. Another gBPM that appeared across multiple weighting schemes involves two sets of genes involved in mitotic spindle positioning: for example, Rizk et al.³⁸ demonstrate that KIP3 plays a major role in regulating spindle length during anaphase, and KIP3 is found opposite genes critical to the dynein pathway^39,40.

Interestingly, different weighting schemes also found multiple gBPMs in which one pathway includes the peroxisomal genes PEX1, PEX2, PEX3, PEX4, PEX8, PEX10, PEX13, and PEX14, whose functions in peroxisomal matrix protein import in S. cerevisiae are reviewed in Akşit and van der Klei⁴¹. The genes in the other pathway vary, but always include IML1. There is no supporting literature explaining any role for IML1 in the peroxisomal matrix, so the consistent appearance of this gene in the opposite pathway is unexpected and interesting.

gBPMs for particular weighting schemes

Using the multiplicative weighting scheme (squared) with \(c=80\), we find an interesting gBPM that is not found under the min weighting scheme and only overlaps pathways of the log weighting scheme at \(c=70\): Pathway 1 consists of the genes UBC7, EMP24, LAS21, SPC2 and OST3, whereas Pathway 2 consists of the genes HAC1, OST6, and IRE1. The synthetic lethal relationship between the paralogs OST3 and OST6 has been studied previously⁴². A relationship between HAC1 and IRE1 is also known: the accumulation of unfolded ER proteins activates the transmembrane kinase/nuclease Ire1p, triggering alternative splicing of HAC1 mRNA and initiating the unfolded protein response pathway⁴³. As noted by Jonikas et al.⁴⁴, deletion of either UBC7 or SPC2 also initiates the unfolded protein response pathway. We therefore hypothesize that the two pathways in this gBPM are redundant mechanisms for triggering the unfolded protein response pathway.

Conversely, with the logarithmic weighting scheme (squared) and \(c=90\), a set of genes functionally enriched for “anaerobic respiration” appears as one gBPM module opposite multiple different sets of genes with no clear known functional coherence. The genes involved include BTS1, known to be essential for anaerobic growth⁴⁵, and MSN4, known to be involved in regulating key genes in the anaerobic response⁴⁶. Additional genes in the module include DIA2, KIN1, NUP120, GUT2, MDM31, YFH7, and DOA1; deletion mutants of DIA2 and YFH7 appear in large-scale surveys as having decreased anaerobic growth.

A list of all gBPMs, organized into their 2 component pathways, found under each of the 18 parameter choices we test (3 values of c; the mult, min and log weighting schemes; and squared or unsquared weights) is provided as supplementary files.
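The 18 settings can be enumerated as a simple grid. The sketch below is illustrative only: the dictionary keys are hypothetical names rather than genecentric's actual option names, the c values 70, 80 and 90 are those reported in the Results, and m, j, minimumsize and maximumsize are held at the LocalCut defaults given in the Methods.

```python
from itertools import product

# Fixed LocalCut settings (defaults held constant across all runs).
FIXED = {"m": 250, "j": 0.66, "minimumsize": 3, "maximumsize": 25}

# Varied settings: consistency threshold, base weighting scheme, squaring.
C_VALUES = (70, 80, 90)
SCHEMES = ("min", "mult", "log")
SQUARED = (False, True)


def parameter_grid():
    """Yield one settings dict per combination (3 x 3 x 2 = 18)."""
    for c, scheme, squared in product(C_VALUES, SCHEMES, SQUARED):
        yield {**FIXED, "c": c, "scheme": scheme, "squared": squared}
```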

Discussion

We revisited the question of how best to compute pairwise genetic interaction weights to find putative compensatory gene sets in yeast. When looking at a single pair of genes in isolation, Mani et al.¹¹ showed convincingly that a multiplicative weighting scheme was more sensitive in signaling which pairs of genes show any epistatic effect at all. However, we find that when using a method such as LocalCut⁹, which searches for consistent epistatic patterns across multiple gene pairs, focusing on the stronger epistasis signals (by squaring weights) uncovers more meaningful gene sets.

We also found c to be an important hyperparameter whose optimal value can differ depending on the weighting scheme. For mult and log, we recommend setting c above 70 to reduce noise, while for min, setting c to 70 allows it to find a wider range of gBPMs.

Our results also demonstrated that different schemes can find unique BPMs not found by any other schemes, indicating that a full search for redundant pathways may benefit from running LocalCut multiple times with different weighting schemes. We suggest looking at multiple different weighting schemes to uncover interesting redundant pathways in future work.

Data availability

Data is provided within the manuscript and supplementary information files. All gBPMs generated using all weighting schemes we test are also available at https://bcb.cs.tufts.edu/bpmweight. The LocalCut code we used is from https://github.com/BurntSushi/genecentric, with further documentation at https://bcb.cs.tufts.edu/genecentric/

References

Winzeler, E. A. et al. Functional characterization of the S. cerevisiae genome by gene deletion and parallel analysis. Science 285, 901–906 (1999).
Tong, A. H. Y. et al. Systematic genetic analysis with ordered arrays of yeast deletion mutants. Science 294, 2364–2368 (2001).
Kelley, R. & Ideker, T. Systematic interpretation of genetic interactions using protein networks. Nat. Biotechnol. 23, 561–566 (2005).
Brady, A., Maxwell, K., Daniels, N. & Cowen, L. J. Fault tolerance in protein interaction networks: stable bipartite subgraphs and redundant pathways. PLoS ONE 4, e5364 (2009).
Ulitsky, I. & Shamir, R. Pathway redundancy and protein essentiality revealed in the Saccharomyces cerevisiae interaction networks. Mol. Syst. Biol. 3, 104 (2007).
Ma, X., Tarone, A. M. & Li, W. Mapping genetically compensatory pathways from synthetic lethal interactions in yeast. PLoS ONE 3, e1922 (2008).
Schuldiner, M. et al. Exploration of the function and organization of the yeast early secretory pathway through an epistatic miniarray profile. Cell 123, 507–519 (2005).
Tong, A. H. Y. et al. Global mapping of the yeast genetic interaction network. Science 303, 808–813 (2004).
Leiserson, M. D., Tatar, D., Cowen, L. J. & Hescott, B. J. Inferring mechanisms of compensation from E-MAP and SGA data using local search algorithms for max cut. J. Comput. Biol. 18, 1399–1409 (2011).
Gallant, A., Leiserson, M. D., Kachalov, M., Cowen, L. J. & Hescott, B. J. Genecentric: a package to uncover graph-theoretic structure in high-throughput epistasis data. BMC Bioinformatics 14, 1–7 (2013).
Mani, R., St. Onge, R. P., Hartman IV, J. L., Giaever, G. & Roth, F. P. Defining genetic interaction. Proc. Natl. Acad. Sci. 105, 3461–3466 (2008).
Costanzo, M. et al. A global genetic interaction network maps a wiring diagram of cellular function. Science 353, aaf1420 (2016).
Costanzo, M. et al. Global genetic networks and the genotype-to-phenotype relationship. Cell 177, 85–100 (2019).
Kelley, D. R. & Kingsford, C. Extracting between-pathway models from E-MAP interactions using expected graph compression. J. Comput. Biol. 18, 379–390 (2011).
Liany, H., Lin, Y., Jeyasekharan, A. & Rajan, V. An algorithm to mine therapeutic motifs for cancer from networks of genetic interactions. IEEE J. Biomed. Health Inform. (2022).
Wang, J. et al. Computational methods, databases and tools for synthetic lethality prediction. Brief. Bioinform. 23, bbac106 (2022).
Apaolaza, I. et al. An in-silico approach to predict and exploit synthetic lethality in cancer metabolism. Nat. Commun. 8, 459 (2017).
Dey, A., Mudunuri, S. & Kiran, M. Magical: A multi-class classifier to predict synthetic lethal and viable interactions using protein-protein interaction network. PLoS Comput. Biol. 20, e1012336 (2024).
Liany, H. et al. ASTER: A method to predict clinically relevant synthetic lethal genetic interactions. IEEE J. Biomed. Health Inform. 28, 1785–1796 (2024).
Amar, D. & Shamir, R. Constructing module maps for integrated analysis of heterogeneous biological networks. Nucleic Acids Res. 42, 4208–4219 (2014).
Jaccard, P. Étude comparative de la distribution florale dans une portion des Alpes et des Jura. Bull Soc Vaudoise Sci Nat 37, 547–579 (1901).
Kolberg, L. et al. g:Profiler: interoperable web service for functional enrichment analysis and gene identifier mapping (2023 update). Nucleic Acids Res. 51, W207–W212 (2023).
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. Roy. Stat. Soc.: Ser. B (Methodol.) 57, 289–300 (1995).
Hibbs, M. A. et al. Exploring the functional landscape of gene expression: Directed search of large microarray compendia. Bioinformatics 23, 2692–2699 (2007).
Wong, E. D. et al. Saccharomyces genome database update: server architecture, pan-genome nomenclature, and external resources. Genetics 224, iyac191 (2023).
Szklarczyk, D. et al. The STRING database in 2021: customizable protein-protein networks, and functional characterization of user-uploaded gene/measurement sets. Nucleic Acids Res. 49, D605–D612 (2021).
Kolawole, B. & Cowen, L. J. Combining spectral clustering and large cut algorithms to find compensatory functional modules from yeast physical and genetic interaction data with GLASS. In Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, 1–4 (2022).
Cao, M. et al. New directions for diffusion-based network prediction of protein function: Incorporating pathways with confidence. Bioinformatics 30, i219–i227 (2014).
Ng, A., Jordan, M. & Weiss, Y. On spectral clustering: Analysis and an algorithm. Adv. Neural Inf. Process. Syst. 14 (2001).
Devkota, K. et al. GLIDER: Function prediction from GLIDE-based neighborhoods. Bioinformatics 38, 3395–3406 (2022).
Pedregosa, F. et al. Scikit-learn: Machine learning in Python. JMLR 12, 2825–2830 (2011).
Ishii, M., Lupashin, V. V. & Nakano, A. Detailed analysis of the interaction of yeast COG complex. Cell Struct. Funct. 43, 119–127 (2018).
Fuse, A. et al. VPS29-VPS35 intermediate of retromer is stable and may be involved in the retromer complex assembly process. FEBS Lett. 589, 1430–1436 (2015).
Herzig, Y., Sharpe, H. J., Elbaz, Y., Munro, S. & Schuldiner, M. A systematic approach to pair secretory cargo receptors with their cargo suggests a mechanism for cargo selection by Erv14. PLoS Biol. 10, e1001329 (2012).
Bonangelino, C. J., Chavez, E. M. & Bonifacino, J. S. Genomic screen for vacuolar protein sorting genes in Saccharomyces cerevisiae. Mol. Biol. Cell 13, 2486–2501 (2002).
Blackburn, J. B., D’Souza, Z. & Lupashin, V. V. Maintaining order: COG complex controls Golgi trafficking, processing, and sorting. FEBS Lett. 593, 2466–2487 (2019).
Endo, K., Tago, Y.-I., Daigaku, Y. & Yamamoto, K. Error-free RAD52 pathway and error-prone REV3 pathway determines spontaneous mutagenesis in Saccharomyces cerevisiae. Genes & Genet. Syst. 82, 35–42 (2007).
Rizk, R. S., DiScipio, K. A., Proudfoot, K. G. & Gupta, M. L. Jr. The kinesin-8 Kip3 scales anaphase spindle length by suppression of midzone microtubule polymerization. J. Cell Biol. 204, 965–975 (2014).
Lammers, L. G. & Markus, S. M. The dynein cortical anchor Num1 activates dynein motility by relieving Pac1/LIS1-mediated inhibition. J. Cell Biol. 211, 309–322 (2015).
Stuchell-Brereton, M. D., Moore, J. K. & Cooper, J. A. The role of dynein in yeast nuclear segregation. Handb. Dynein 325 (2012).
Akşit, A. & van der Klei, I. J. Yeast peroxisomes: How are they formed and how do they grow? Int. J. Biochem. Cell Biol. 105, 24–34 (2018).
Knauer, R. & Lehle, L. The oligosaccharyltransferase complex from Saccharomyces cerevisiae: Isolation of the OST6 gene, its synthetic interaction with OST3, and analysis of the native complex. J. Biol. Chem. 274, 17249–17256 (1999).
Travers, K. J. et al. Functional and genomic analyses reveal an essential coordination between the unfolded protein response and ER-associated degradation. Cell 101, 249–258 (2000).
Jonikas, M. C. et al. Comprehensive characterization of genes required for protein folding in the endoplasmic reticulum. Science 323, 1693–1697 (2009).
Ishtar Snoek, I. & Yde Steensma, H. Why does Kluyveromyces lactis not grow under anaerobic conditions? Comparison of essential anaerobic genes of Saccharomyces cerevisiae with the Kluyveromyces lactis genome. FEMS Yeast Res. 6, 393–403 (2006).
Lai, L.-C., Kosorukoff, A. L., Burke, P. V. & Kwast, K. E. Dynamical remodeling of the transcriptome during short-term anaerobiosis in Saccharomyces cerevisiae: differential response and role of Msn2 and/or Msn4 and other factors in galactose and glucose media. Mol. Cell. Biol. (2005).


Acknowledgements

This research was partially supported by the DIAMONDS REU under NSF grant 2149871. We thank Mia Taylor and the Tufts BCB group for helpful discussions.

Author information

Authors and Affiliations

Department of Computer Science, Tufts University, Medford, MA, 02155, USA
Kevin M. Yu & Lenore J. Cowen


Contributions

K.Y. and L.C. conceived the experiments; K.Y. conducted the experiments; K.Y. and L.C. analyzed the results, wrote the paper, and reviewed the final manuscript.

Corresponding author

Correspondence to Lenore J. Cowen.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article

Cite this article

Yu, K. M. & Cowen, L. J. Exploring weighting schemes for the discovery of informative generalized between pathway models to uncover pathways in genetic interaction networks. Sci Rep 15, 30169 (2025). https://doi.org/10.1038/s41598-025-16353-2


Received: 04 February 2025
Accepted: 14 August 2025
Published: 18 August 2025
DOI: https://doi.org/10.1038/s41598-025-16353-2
