Fig. 1: Overview of the KSTAR algorithm.
From: KSTAR: An algorithm to predict patient-specific kinase activities from phosphoproteomic data

First, we heuristically prune the dense, highly overlapping, weighted kinase-substrate prediction graphs from NetworKIN (ref. 24) into many sparse, binary graphs. For an experiment with a defined set of phosphorylation sites, statistical enrichment is calculated for every kinase across all networks using a hypergeometric distribution. We generate 150 random experiments and calculate enrichment in each using the same approach. Next, we use the Mann–Whitney U test to measure the likelihood that the enrichment p-values in the real experiment are more significant than those in the random experiments. This gives a final p-value that accounts for the underlying enrichment of substrates in a network, aggregates that information across the different network configurations, and controls for the kinase- and experiment-specific enrichment that occurs by random chance. We measure the false positive rate (FPR) by applying the same distribution-based test to one random experiment against the remaining 149 random experiments and repeating this 100 times. Finally, the numerical KSTAR “score” (the -log10 transformation of the Mann–Whitney U test p-value) is presented in graphical format, where the dot size is larger when there is more evidence that phosphorylation sites are coordinately sampled from a kinase network. “Significance” indicates that the empirical FPR falls below a specified threshold. Source data are provided with this paper.
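
The enrichment and comparison steps described in this legend can be illustrated with a minimal sketch for a single kinase. It is not the KSTAR implementation. The toy background of 1000 sites, the 50 synthetic networks, the uniform sampling of random experiments, and the helper names (`hypergeom_sf`, `mann_whitney_less`, `enrichment_pvalues`) are all assumptions made for illustration. The Mann–Whitney p-value uses a normal approximation without tie correction, and the FPR step is omitted.

```python
import math
import random

def hypergeom_sf(k, M, n, N):
    """P(X >= k): chance of seeing at least k network substrates among
    N observed sites, given n substrates in a background of M sites."""
    total = math.comb(M, N)
    hits = sum(math.comb(n, i) * math.comb(M - n, N - i)
               for i in range(k, min(n, N) + 1))
    return hits / total

def mann_whitney_less(x, y):
    """One-sided p-value that values in x tend to be smaller than in y
    (normal approximation, average ranks for ties, no tie correction)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x, i = 0.0, 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for t in range(i, j) if pooled[t][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - n1 * n2 / 2 + 0.5) / sigma  # continuity correction
    return 0.5 * math.erfc(-z / math.sqrt(2))  # standard normal CDF at z

# Toy setup (assumed): 1000 background sites; one kinase whose 50 pruned,
# binary networks each hold 30 substrates drawn from a pool of 60 sites.
rng = random.Random(0)
BACKGROUND = 1000
networks = [set(rng.sample(range(60), 30)) for _ in range(50)]

def enrichment_pvalues(experiment):
    """Hypergeometric enrichment p-value of the experiment in each network."""
    return [hypergeom_sf(len(experiment & net), BACKGROUND, len(net), len(experiment))
            for net in networks]

# "Real" experiment: 100 sites, 25 of them from the kinase's substrate pool.
real_experiment = set(rng.sample(range(60), 25)) | set(rng.sample(range(60, BACKGROUND), 75))
real_p = enrichment_pvalues(real_experiment)

# 150 random experiments of the same size, sampled uniformly (an assumption).
random_p = []
for _ in range(150):
    random_p.extend(enrichment_pvalues(set(rng.sample(range(BACKGROUND), 100))))

# Are the real enrichment p-values smaller than the random ones?
final_p = mann_whitney_less(real_p, random_p)
score = -math.log10(final_p)
print(f"KSTAR-style score: {score:.1f}")
```

Because the toy experiment is deliberately enriched for the kinase's substrates, the real p-values sit far below the random ones and the score is large. An experiment sampled uniformly from the background would be expected to give a score near zero.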