Introduction

Protein enzymes are crucial in regulating cellular function, orchestrating activities through intricate signal transduction pathways1. Dysregulation of enzyme activity has been implicated in numerous complex human diseases, such as cancer, diabetes, and neurodegeneration2,3,4. Given their central role in disease pathways, several groups of enzymes, including protein kinases, have emerged as primary targets for cancer drugs5,6,7. However, responses to these drugs can vary significantly among individual patients, underscoring the necessity for accurate measurement of enzyme activities. Traditionally, enzyme activities have been primarily characterized through substrate events and assessed via small-scale in vitro experiments8,9. With advancements in high-throughput mass spectrometry (MS)-based proteomics, large-scale proteome profiling has become a powerful tool for measuring post-translational modification (PTM) events and understanding signal transduction pathways in diseases10,11. For instance, these studies identified 19,947 relationships of kinase-phosphorylated substrates in human, mouse, and rat12, 3556 relationships of ligase-ubiquitinated substrates in human, mouse, and rat13, and 503 human relationships of histone acetyltransferase (HAT)-acetylated substrates14 on a large scale. Despite these advancements, the analysis of PTM profiling data remains challenging due to the lack of effective tools for inferring the activities of enzymes, such as protein kinases, E3 ubiquitin ligases, and HATs.

To address these challenges, several tools have been developed to derive protein kinase activities from phosphoproteomics data, although no tools are currently available for E3 ubiquitin ligases and HATs. The kinase tools can be broadly categorized into three main types based on their underlying algorithms: enrichment-based methods, clustering algorithms, and machine-learning methods. Enrichment-based methods, including GSEA15, KSEA16, KEA17, and VIPER18, predict kinase activity by leveraging kinase-substrate interactions obtained from public databases (e.g., PhosphoSitePlus). A major limitation of these methods is their reliance on prior knowledge of kinase-substrate relationships in databases, although some extensions use unbiased, fully data-driven approaches, including the reverse-engineering of regulatory models from experimental proteomics data19 and even from expression profiles using ARACNe and VIPER18,20. Clustering algorithms (e.g., CLUE) group phosphoproteomics data into clusters based on similar spatio-temporal kinetics of the phosphosites, followed by an enrichment test. However, these methods often incorporate potential ‘noise’, such as low-abundance phosphopeptides and weak correlation between kinase-substrate relationships. Machine-learning methods (e.g., IKAP) attempt to globally fit a model of kinase activity and affinity to the phosphoproteomics data. However, these methods may fail to reach a global optimum, and the activities derived from the model can vary with each computation.

Structural equation modeling (SEM) has proven to be a powerful method and is extensively employed for analyzing biological networks in genomics21. However, its application in proteomics, particularly for enzyme activity inference, remains largely unexplored. In contrast to existing tools for inferring protein enzyme activities, SEM offers several advantages. Firstly, it enables the inference of latent variables, such as protein enzyme activities, which cannot be directly measured. Secondly, SEM accounts for the interactions among enzymes, aligning closely with the inherent complexity of biological systems. Lastly, it incorporates measurement errors in all observed variables, such as the abundance of phosphopeptides in phosphoproteome experiments. This is especially relevant for MS-based PTM profiling data, where measurements inherently include errors and imperfections arising from biological and experimental variations.

In this study, we aim to develop a computational tool, JUMPsem, for inferring the activities of protein enzymes. The program infers enzyme activities based on enzyme-substrate relationships and PTM profiling data. We apply JUMPsem to analyze kinase activities using phosphoproteome data, E3 ligase activities using ubiquitinome data, and HAT activities using acetylome data. To assess its performance, we compare JUMPsem with two widely used programs for protein kinase activity inference, IKAP and KSEA. Another feature of the JUMPsem program is its ability to incorporate sequence motifs, which significantly enhances the identification of enzyme-substrate relationships. The JUMPsem program is made publicly available as an open-source R package, along with a user-friendly web-based R/Shiny application.

Results

Design and implementation of the JUMPsem program

We developed JUMPsem, a software tool specifically designed to infer protein enzyme activity using MS-based PTM profiling data (Fig. 1). The JUMPsem program comprises three major components: (i) construction of enzyme-substrate relationships, (ii) inference of enzyme activity through a structural equation modeling algorithm, and (iii) output and comparison of enzyme activity. The program takes a quantitative table of modified peptides from PTM profiling studies as input, which can include analyses of phosphoproteome, ubiquitinome, or acetylome. Outputs from JUMPsem consist of a table containing inferred enzyme activities and a second table containing the affinities between protein enzymes and their substrates (Supplementary Fig. S1).

Fig. 1: Schematic diagram of the JUMPsem pipeline.

a Construction of enzyme-substrate relationships and discovery of novel enzyme-substrate relationships through motif search. Two approaches are employed: (1) generation of enzyme-substrate relationships from an existing database, indicated by solid lines in the diagram, and (2) motif searches to predict new enzyme-substrate interactions, denoted by dashed lines. The adjacency matrix and quantitative proteomic data are used to infer kinase activity. b Inferring enzyme activity using a structural equation modeling (SEM) algorithm. c The output of enzyme activities generated by the JUMPsem pipeline.

The quantitative table of PTM peptides can be generated by identifying peptides via database search, filtering PTM peptides at a user-specified false discovery rate (FDR), and quantifying PTM peptides from either labeled or label-free PTM profiling data. If the corresponding whole proteome data are available, JUMPsem provides an option for normalizing the measured levels of PTM peptides relative to the corresponding changes in protein abundance. This normalization allows the determination of changes in PTM state independent of variations at the protein level (Supplementary Fig. S2). In the case of phosphoproteome analysis, the JUMPsem program begins by extracting kinase-substrate relationship tables from public databases, such as the PhosphoSitePlus (PSP) database12. Subsequently, the program constructs an adjacency matrix for those kinase-substrate relationships whose phosphopeptides were detected in the quantitative phosphoproteomics data. The adjacency matrix can be expanded to include kinase-substrate relationships derived from motif searches, which increases the number of kinases that can be analyzed (Supplementary Fig. S3).
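
The two data-preparation steps described above can be illustrated with a minimal Python sketch. It is not the JUMPsem implementation (which is an R package): the function names, the toy enzyme and site identifiers, and the choice to normalize by subtracting log2 protein abundance from log2 PTM abundance are all assumptions made for illustration.

```python
def normalize_to_protein(ptm_log2, protein_log2, site_to_protein):
    """Subtract log2 protein abundance from log2 PTM-site abundance per sample.

    Assumption: both tables are on a log2 scale and share the same sample order.
    Sites without matching protein data are returned unchanged.
    """
    normalized = {}
    for site, values in ptm_log2.items():
        protein = site_to_protein.get(site)
        if protein in protein_log2:
            normalized[site] = [s - p for s, p in zip(values, protein_log2[protein])]
        else:
            normalized[site] = list(values)
    return normalized


def build_adjacency(relationships, quantified_sites):
    """Build a binary enzyme-by-site adjacency matrix.

    Only enzyme-substrate pairs whose site was actually quantified are kept,
    so enzymes with no detected substrate drop out of the matrix.
    """
    kept = [(e, s) for e, s in relationships if s in quantified_sites]
    enzymes = sorted({e for e, _ in kept})
    sites = sorted({s for _, s in kept})
    row = {e: i for i, e in enumerate(enzymes)}
    col = {s: j for j, s in enumerate(sites)}
    matrix = [[0] * len(sites) for _ in enzymes]
    for e, s in kept:
        matrix[row[e]][col[s]] = 1
    return enzymes, sites, matrix


# Hypothetical toy inputs: database-derived edges plus one motif-predicted edge.
database_edges = [("KIN_A", "SUB1_S10"), ("KIN_A", "SUB2_T45"), ("KIN_B", "SUB3_Y7")]
motif_edges = [("KIN_A", "SUB4_S88")]
quantified = {"SUB1_S10", "SUB2_T45", "SUB4_S88"}

enzymes, sites, matrix = build_adjacency(database_edges + motif_edges, quantified)

normalized = normalize_to_protein(
    {"SUB1_S10": [1.0, 2.0]},   # log2 PTM abundance in two samples
    {"SUB1": [0.5, 0.5]},       # log2 protein abundance in the same samples
    {"SUB1_S10": "SUB1"},
)
```

In this toy example KIN_B is dropped because its only substrate site was not quantified, while the motif-predicted edge adds a third column for KIN_A.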

The JUMPsem program uses the SEM algorithm to calculate enzyme activity, implemented through the lavaan R package22. The core of its computation lies in the “sem” function, which fits the model by applying maximum likelihood estimation to determine the parameters. Both the adjacency matrix and the PTM abundance table are used for the estimation. The outputs generated by JUMPsem include the enzyme activity and the affinity coefficient (or connection strength) of each PTM site, representing the estimated effective connectivity with the corresponding enzyme.
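
As a rough illustration of how an adjacency matrix can be turned into a model for lavaan, the Python sketch below writes one latent-variable definition per enzyme using lavaan's `=~` ("is measured by") operator. How JUMPsem actually specifies its model is not described here, so the one-latent-variable-per-enzyme layout, the function name, and the `min_substrates` parameter are assumptions; the resulting string would be passed to lavaan's `sem` function in R together with the PTM abundance table.

```python
def lavaan_model_syntax(enzymes, sites, matrix, min_substrates=2):
    """Write one lavaan measurement-model line per enzyme.

    Each enzyme is treated as a latent variable measured by its substrate
    sites (the columns flagged 1 in its adjacency-matrix row). Enzymes with
    fewer than `min_substrates` sites are skipped (a hypothetical filter).
    """
    lines = []
    for enzyme, row in zip(enzymes, matrix):
        targets = [site for site, flag in zip(sites, row) if flag]
        if len(targets) >= min_substrates:
            lines.append(f"{enzyme} =~ " + " + ".join(targets))
    return "\n".join(lines)


# Hypothetical toy adjacency matrix: two enzymes, three quantified sites.
toy_enzymes = ["KIN_A", "KIN_B"]
toy_sites = ["SUB1_S10", "SUB2_T45", "SUB3_Y7"]
toy_matrix = [
    [1, 1, 0],  # KIN_A targets two sites
    [0, 0, 1],  # KIN_B targets one site and is skipped by the filter
]

model = lavaan_model_syntax(toy_enzymes, toy_sites, toy_matrix)
print(model)
```

Under this layout, the factor loadings estimated for each site would play the role of the affinity coefficients, and the latent variable scores would play the role of the enzyme activities.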

Applications of JUMPsem to estimate kinase activity using mouse phosphoproteomic data

We first applied the JUMPsem program to estimate protein kinase activity using the phosphoproteomic data derived from two mouse high-grade glioma (HGG) xenograft models driven by mutated receptor tyrosine kinase (RTK) oncogenes, platelet-derived growth factor receptor alpha (PDGFRA) and fusion genes of the neurotrophic tyrosine receptor kinase 1 (NTRK1), as well as from normal controls (CTRL)23. Using 45,574 unique phosphopeptides quantified at a peptide FDR of <1%, the JUMPsem program detected a total of 324 substrates with phosphosites that can be found in the PSP database. The JUMPsem program estimated the activity of 67 protein kinases using 177 substrates. Compared with the analysis without motif-assisted prediction, the motif prediction strategy increased the number of kinase-substrate relationships by 14.7% (572/3896) on average (Fig. 2a). For example, SIK1 kinase has two substrates (CRTC3 and PDE4D) with three sites found in the PhosphoSitePlus database. With the motif-assisted function, we found two additional substrates (HMGA2 and NEFL). As expected, these two additional substrates showed a consistent trend, indicating that the motif-assisted function selects biologically meaningful substrates rather than assigning them at random (Fig. 2b).

Fig. 2: Kinase motif-assisted enhancement of kinase-substrate relationship identification.

a Heatmaps display the kinase activity profiles derived from JUMPsem (with and without motif assistance) and IKAP in two mouse high-grade glioma (HGG) xenograft models, PDGFRA and NTRK1, contrasted with normal cortex controls. b Heatmap illustrating the abundance of example SIK1 substrates from JUMPsem with and without motif assistance. The substrate names marked in red were additionally identified with the assistance of the motif function. c Venn diagram illustrating the overlap of differential kinase identifications for PDGFRA vs CTRL and NTRK1 vs CTRL between JUMPsem with motif assistance and IKAP analyses. d Heatmap delineating kinases exclusively detected by JUMPsem with motif assistance, highlighting the method’s sensitivity. e Circos plot visualizing the pathway enrichment attributable to the motif-assisted JUMPsem, linking kinases to biological processes. f Four exemplar protein kinases showing differential activity among NTRK1- and PDGFRA-driven glioma models, compared to cortex samples.

Our analysis revealed that 33 protein kinases with significantly altered activity between PDGFRA mice and CTRL were shared between JUMPsem without motif assistance and IKAP, and 23 kinases altered between NTRK1 mice and CTRL were shared between the two tools (Fig. 2c). JUMPsem with motif assistance additionally inferred 28 kinases that IKAP did not detect (Fig. 2d). These altered kinases were significantly enriched (FDR < 0.01) in cancer-related pathways (Supplementary Data S1a–d), including checkpoint-associated pathways, pathways in cancer, and glioma. Compared to our previous analysis conducted using the IKAP program, we found a high degree of consistency in the majority of identified pathways between IKAP and JUMPsem. The motif-assisted strategy uncovers previously unknown kinase-substrate relationships, thereby enabling the estimation of the activity of additional kinases. By applying this strategy to the HGG phosphoproteomic data, we further revealed several kinases, such as MAPK7, STK4, DAPK1, CAMK4, MYLK, PAK2, PAK3, and ILK, which were not identified in the previous IKAP analysis. Many of these kinases were implicated in cancer-related pathways, including MAPK and ErbB pathways (Fig. 2e), and exhibited activity alterations without corresponding changes at the protein level (Fig. 2f). This motif-assisted prediction strategy thus broadens the scope of kinase activity analysis.

Comparison of JUMPsem with other programs applied to human phosphoproteomic data

To assess its performance, we compared JUMPsem with IKAP and KSEA, two widely used programs developed for kinase activity estimation, using a human phosphoproteomic dataset. The phosphoproteomic data were generated from the P31/Fuj and Kasumi-1 acute myeloid leukemia (AML) cell lines24. P31/Fuj cells are deficient in PTEN (phosphatase and tensin homolog), whereas Kasumi-1 cells have wild-type PTEN, which is resistant to multiple drugs25. The dataset was originally used to estimate the activity of protein kinases using the KSEA program. Utilizing a total of 4129 phosphorylated peptides quantified across the cell lines, we estimated the activity of protein kinases using JUMPsem, IKAP, and KSEA (Fig. 3a). Compared with the IKAP results, JUMPsem revealed one cluster with a similar pattern of kinase activity between the P31/Fuj and Kasumi-1 cell lines. In addition, JUMPsem identified two unique clusters of kinase activity (Clusters 1 and 2). With a significance threshold set at a p-value < 0.05 and an absolute fold change greater than 1, we identified 75 kinases with up-regulated activity and 31 with down-regulated activity between the P31/Fuj and Kasumi-1 cells (Fig. 3b; Supplementary Data S2). For instance, PRKCA exhibited a notable 7.9-fold decrease in activity in Kasumi-1 compared to P31/Fuj cells. These kinases with differential activity were enriched in pathways such as glioma (e.g., MAP2K2), Toll-like receptor signaling (e.g., MAP2K2 and MAP2K4), and acute myeloid leukemia (e.g., MAP2K2) (Fig. 3c, d and Supplementary Data S3a, b).
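
The thresholding step above can be sketched in a few lines of Python. This is an illustrative sketch only: the kinase names and values are invented, and it assumes that the fold-change cutoff is applied on the log2 scale (as on the x-axis of the volcano plot) and that positive values mean higher activity in P31/Fuj.

```python
def classify_kinases(results, p_cutoff=0.05, log2fc_cutoff=1.0):
    """Split kinases into up- and down-regulated sets.

    `results` maps kinase name -> (log2 fold change, p-value). A kinase is
    called differential when p < p_cutoff and |log2FC| > log2fc_cutoff.
    """
    up, down = [], []
    for kinase, (log2fc, p_value) in results.items():
        if p_value < p_cutoff and abs(log2fc) > log2fc_cutoff:
            (up if log2fc > 0 else down).append(kinase)
    return sorted(up), sorted(down)


# Hypothetical toy results, not values from the study.
toy_results = {
    "KIN_A": (2.3, 0.001),   # significant and large increase -> up
    "KIN_B": (-1.8, 0.020),  # significant and large decrease -> down
    "KIN_C": (0.4, 0.003),   # significant but small change -> not called
    "KIN_D": (3.0, 0.200),   # large change but not significant -> not called
}

up_regulated, down_regulated = classify_kinases(toy_results)
```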

Fig. 3: Comparison of kinase activity inference methods.

a Heatmap depicting the z-score normalized kinase activities in two conditions, the P31/Fuj and Kasumi-1 cell lines, as determined by JUMPsem without motif-assisted search, JUMPsem with motif-assisted search, IKAP, and KSEA. Colors correspond to activity levels: red indicates high kinase activity, and blue indicates low kinase activity. Three clusters (Cluster 1, Cluster 2, and Cluster 3) are annotated to the right of each heatmap, indicating groupings of kinases based on activity patterns. The waterfall plot on the right shows the KSEA result, which we included for comparison with IKAP and JUMPsem. b Volcano plot illustrating the variation in kinase activity between P31/Fuj and Kasumi-1 cell lines, as inferred by JUMPsem with motif search. The x-axis denotes the log2 fold change, where positive values suggest increased activity in P31/Fuj relative to Kasumi-1, and negative values denote decreased activity. The y-axis corresponds to the negative logarithm (base 10) of the p-value, with higher values indicating greater statistical significance. c Bar chart representing pathways enriched in protein kinases with heightened activity in P31/Fuj compared to Kasumi-1, inferred from IKAP. The x-axis indicates the positions of the pathway members in the ranked list of proteins, while the y-axis lists the pathways. The values in parentheses indicate the enrichment score, quantifying the overrepresentation of each pathway. Proteins are ranked by the fold change of kinase activity. d Bar chart detailing pathways enriched in protein kinases with heightened activity in P31/Fuj relative to Kasumi-1, inferred from JUMPsem with motif assistance, using the same axes format as in (c).

Performance evaluation of JUMPsem using phosphoproteomic benchmark datasets

To evaluate the performance of JUMPsem, we conducted a comprehensive analysis using benchmark data from two previously published studies26,27, which included 16 phosphoproteomic datasets covering a total of 31 conditions (Supplementary Data S4). We assessed the performance by comparing the precision of JUMPsem and IKAP across various threshold cutoffs. Briefly, for each dataset and condition, we first processed the phosphoproteome, generated a kinase-substrate relationship table, and identified the target kinase (Supplementary Fig. S5a). The average kinase activity was then calculated across all replicates, and the rank of the target kinase was recorded among all inferred kinases for each dataset (Supplementary Fig. S5b). A true positive was defined as a target kinase whose rank was below the threshold cutoff. Precision was finally calculated for performance comparison, which was defined as the ratio of true positives to the total number of true and false positives. Overall, our analysis demonstrated that JUMPsem slightly outperformed IKAP in terms of precision (Supplementary Fig. S5c).
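
The evaluation procedure above can be sketched as follows. Several details are assumptions rather than statements from the text: kinases are ranked by the absolute value of their replicate-averaged activity, "below the threshold cutoff" is read as rank ≤ cutoff, and every benchmark condition whose target kinase misses the cutoff is counted as a false positive. All names and numbers are illustrative.

```python
from statistics import mean


def target_rank(replicate_activities, target):
    """Rank of the target kinase (1 = top) by absolute mean activity.

    `replicate_activities` maps kinase name -> list of activities, one per
    replicate. Ranking by absolute mean activity is an assumption.
    """
    mean_activity = {k: mean(v) for k, v in replicate_activities.items()}
    ordered = sorted(mean_activity, key=lambda k: abs(mean_activity[k]), reverse=True)
    return ordered.index(target) + 1


def precision_at_cutoff(ranks, cutoff):
    """Precision = TP / (TP + FP) over all benchmark conditions.

    A condition is a true positive when its target kinase ranks within the
    cutoff, and a false positive otherwise (assumed definition).
    """
    if not ranks:
        return 0.0
    true_positives = sum(1 for r in ranks if r <= cutoff)
    false_positives = len(ranks) - true_positives
    return true_positives / (true_positives + false_positives)


# Hypothetical toy condition: mean activities are 2.5, 0.0, and 1.0.
toy_condition = {"KIN_A": [2.0, 3.0], "KIN_B": [-0.5, 0.5], "KIN_C": [1.0, 1.0]}
rank_c = target_rank(toy_condition, "KIN_C")

# Hypothetical target-kinase ranks collected over four conditions.
toy_ranks = [1, 2, 5, 12]
precision = precision_at_cutoff(toy_ranks, cutoff=5)
```

Repeating `precision_at_cutoff` over a range of cutoffs for each tool gives the kind of precision-versus-threshold comparison described in the text.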

Inference of activities of E3 ubiquitin ligases and histone acetyltransferases

We next sought to infer the activities of E3 ubiquitin ligases from ubiquitinomic data using the JUMPsem program. The ubiquitinomic data were previously generated from cells under various conditions: unstressed, heat-stressed, or arsenite-stressed, in the presence or absence of 0.5 μM bortezomib, a proteasome inhibitor28. The ubiquitinomic data were generated by the tandem mass tag (TMT) strategy from samples in response to a 60-min heat stress and following 2 h of recovery, quantifying 16,525 unique ubiquitin-modified peptides, corresponding to 4892 ubiquitinated proteins. The UbiBrowser database29, which contains a total of 1599 ubiquitinated substrate proteins, corresponding to 408 human E3 enzymes, was used to construct the relationship table of E3 ubiquitin ligases and substrates. The JUMPsem program found 161 E3 ubiquitin ligases and 4892 unique substrate proteins in the database (Fig. 4a). Our findings revealed that 124 unique ubiquitin ligases exhibited significant alterations in their activity between the heat-shock conditions (HS) and the control condition (Ctrl), as determined by a t-test (p-value < 0.05), with 80 showing differences between HS and Ctrl and 105 between HS and Ctrl under bortezomib treatment (Supplementary Data S5). For example, three ligases (TRIM28, TRIM13, and TRIP12) were among those upregulated in both the HS and Recovery stress conditions compared to the bortezomib-treated control (Fig. 4b). In contrast, two ligases, MKRN1 and TRIM25, displayed a decrease in activity under heat shock with the proteasome inhibitor bortezomib (i.e., HS_Bortezomib and Recovery_Bortezomib) compared to the control (Fig. 4b). The ligase activity was consistent with substrate abundance. For example, TRIP12 showed a consistent increase in both substrate abundance and ligase activity in the HS and Recovery_Bortezomib groups compared to the control condition, especially after bortezomib treatment (Fig. 4c).

To estimate the activity of histone acetyltransferases (HATs), we used a global acetylation profiling dataset in which luminal and basal subtype breast cancer xenograft tissue samples were homogenized, lysed, digested, and quantified with TMT10 labeling30. A total of 4180 Kac peptides, corresponding to 3653 Kac sites and 1804 Kac proteins, were quantified. These acetylated proteins were mapped to 6 HATs in the compendium of protein lysine acetylation (CPLA) database, which is curated for lysine-acetylated substrates with their sites14 (Fig. 4d). JUMPsem detected three HATs, KAT2A, GTF3C4, and MGEA5, which showed inhibited activity between luminal and basal xenograft samples.

Fig. 4: Activity profiles of E3 ubiquitin ligases and histone acetyltransferases from ubiquitinome and acetylome data.

a Heatmap illustrating the activity profiles of E3 ubiquitin ligases across cells subjected to no stress, heat stress, or arsenite stress, with and without proteasome inhibition (0.5 μM bortezomib), as indicated by previously generated ubiquitinomic data. b Heatmap of five E3 ligases exhibiting altered activities across five stress conditions. c Heatmap depicting consistent expression level changes in three substrates of the TRIP12 ligase. d Heatmap representing the activities of histone acetyltransferases in various conditions.

Modular and scalable R package with shiny web application

JUMPsem is a modular and scalable package, consisting of distinct components designed to process specific types of PTM profiling data. This modular design allows for the future expansion of JUMPsem to incorporate other PTM data, such as methylation profiles. To streamline analysis and visualize results, we have developed a companion R/Shiny application (Fig. 5) that integrates seamlessly with the JUMPsem R package. The workflow of this web-based JUMPsem R/Shiny application comprises three primary steps: loading the raw files, optionally configuring group parameters, and exploring and visualizing the inferred enzyme activities. For more comprehensive information and step-by-step guidance, the package vignettes and documentation are available online at the application website (https://jumpsem.shinyapps.io/JUMPsem/).

Fig. 5: JUMPsem R/Shiny web-based platform.

The JUMPsem platform workflow encompasses three primary steps: importing data and configuring group parameters, performing JUMPsem analysis, and visualizing enzyme activities.

Discussion

Recent advancements in mass spectrometry-based proteomics technologies have enabled us to profile large-scale PTM data. The resulting PTM profiling data are instrumental in unraveling complex signal transduction processes and enzyme activities. Despite the wealth of PTM data generated, the interpretation remains challenging due to a dearth of tools specifically designed to handle and make sense of this complexity. Addressing this critical gap, our study introduces JUMPsem, a computational tool that analyzes large-scale PTM datasets by inferring enzyme activity. The JUMPsem program outperforms existing widely used tools, such as IKAP and KSEA, in terms of the number of enzymes it can analyze and its computational speed. JUMPsem is an effective tool for enzyme activity inference, available as an open-source R package and as an R/Shiny web tool.

One significant advantage of the JUMPsem program over existing tools for predicting enzyme activity is its integrated motif search strategy. This strategy substantially broadens the potential kinase-substrate interactions that can be identified, going beyond the recorded entries in the PSP database. Although proteomics technology is able to detect thousands of modified peptides in a single experiment, only a fraction of the modified peptides is cataloged in public databases. For example, less than 10% of the quantified phosphopeptides in an MS-based experiment are represented in the PSP database. Consequently, a vast portion of the data remains underutilized for activity estimation. The JUMPsem program tackles this limitation by incorporating a motif search strategy, thereby extending the detectable range of kinase-substrate relationships beyond the confines of existing databases. This strategy permits JUMPsem to leverage a larger portion of the phosphoproteomics data, thus facilitating a more comprehensive analysis of cellular signaling pathways.

Another advantage of the JUMPsem program lies in its use of the SEM algorithm, which handles potential measurement errors present in all observed variables. This is particularly important in MS-based phosphoproteomics, where the measurement of modified peptides from large-scale PTM profiling data contains inherent uncertainties and is susceptible to various errors, including quantification errors. In addition, a modified peptide could be incorrectly assigned to a spectrum, and a modified site can be misassigned to an adjacent site due to missing informative ions in the spectrum. By accounting for these uncertainties, JUMPsem provides a more robust and reliable estimation of enzyme activities, reflecting the complex reality of biological data.

One limitation of the current JUMPsem program is its design for inferring the activity of individual kinases without considering the interaction of different enzymes within complex signaling networks. These interactions could be incorporated into the SEM model. Most existing computational tools, including the current version of JUMPsem, are tailored to single kinases, which can introduce biases in activity estimation. This bias arises because changes in substrate abundance may be due to the influence of co-active enzyme partners. For instance, fluctuations in the substrate levels of a kinase might be the result of actions by other kinases. Furthermore, the PTM landscape can be influenced by a variety of factors, including kinases relocating within the cell, interactions with other proteins, or competition with de-activation enzymes (e.g., the dynamic between phosphatases and kinases). Such complexities could lead to incorrect estimation of the enzyme activity.

In summary, we introduce the JUMPsem program, a tool designed for inferring protein enzyme activity from PTM profiling data. The modular and scalable JUMPsem package, coupled with its user-friendly web application, makes it a valuable tool for inferring enzyme activity. We have demonstrated the application of JUMPsem to three types of proteomic data (phosphoproteomic, ubiquitinomic, and acetylomic), identifying and quantifying enzyme activities as well as substrate affinities. Its utility in estimating enzyme activity and elucidating cellular signaling pathways makes JUMPsem a valuable resource for the field.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.