Introduction

Phenotype-based drug screening is a classical and commonly used approach in modern drug development. Such screening characterizes the biological activities of compounds but often fails to identify their drug targets. Identified drug targets support drug development in several ways, including revealing the molecular mechanisms of drug action, facilitating rational drug design, reducing side effects, and enabling drug repositioning1. Moreover, target deconvolution is indispensable for improving drug potency and safety profiles, especially given that the direct targets of many clinical drugs, or of drugs with established effects, remain unclear2.

With the urgent demand for target deconvolution in drug discovery, advanced biotechnologies, such as proteomics, single-cell sequencing, molecular docking and molecular simulation, were adopted soon after they first appeared3. Despite their diversity, these approaches are time-consuming and expensive and rely heavily on experienced researchers. As a result, target deconvolution remains a significant challenge for the pharmaceutical industry. For example, aspirin (acetylsalicylic acid, ASA), a traditional anti-inflammatory and pain-relieving drug, is commonly used to treat cardiovascular conditions and other diseases4. However, its underlying mechanism remains unclear because aspirin dynamically regulates many distinct molecules and biological pathways. A cost-effective method for drug target prediction could therefore substantially benefit drug discovery and the pharmaceutical industry.

Drug discovery hinges on collaborative efforts across diverse disciplines. Growing exploration in the life sciences yields valuable information and creates the need for a systematic understanding of the intricate crosstalk among drugs, their target proteins, and the roles of those targets in both healthy and diseased signaling pathways, drawing on the vast knowledge held in various databases. The lack of such a comprehensive understanding of biological processes has markedly hindered drug discovery, including drug target deconvolution.

For example, p53 is a 393-amino-acid protein encoded by the TP53 gene that plays crucial roles in various diseases, including cancer, dysplasia, neurodegenerative diseases, autoimmune inflammatory diseases and cardiovascular disease, primarily by regulating the p53 signaling pathway5. Dysfunction of p53, caused by p53 mutation or by dysregulation of p53 regulators, frequently results in disease. Therefore, an increasing number of drugs are being developed to restore p53 function and reactivate the p53 signaling pathway. However, because p53 function in tumor cells is regulated by a variety of stress signaling pathways and numerous regulatory elements, identifying the direct target of a p53 signaling activator remains a challenge, which partly hinders the clinical application of these drugs6.

Recent insights into p53 activation have led to two primary screening strategies for p53 activators. The target-based approach specifically targets p53 and its regulators (such as USP7 (ubiquitin-specific protease 7, also known as UBP7), MDM2, MDMX, and Sirt proteins), which requires a separate system for each target and may overlook multitarget compounds. Recent advances in pocket-based methods address this limitation by leveraging structural similarities in protein binding pockets, enabling the identification of compounds that may act across multiple targets7. However, pocket-based methods for multitarget drug discovery face issues such as binding specificity, off-target effects, and protein flexibility. Conversely, the phenotype-based approach identifies drugs that modify p53-related phenotypes but struggles to elucidate mechanisms. Although this strategy can unveil new targets, the lengthy process of clarifying mechanisms and targets complicates drug development. For example, the mechanism of PRIMA-1, discovered in 2002 for mutant p53 tumors, was revealed only in 20098. Many p53-restoring compounds still lack known mechanisms. Hence, efficient target deconvolution is critical in phenotype-based screening.

In this study, we propose a novel method based on a protein-protein interaction knowledge graph (PPIKG) to predict direct targets, which are subsequently verified by biological assays. Knowledge graphs are suitable for knowledge-intensive scenarios with few labeled samples, such as target prediction and ancient character recognition9. The well-studied but challenging p53 signaling pathway was selected to evaluate this method. Using the screening of p53 agonists as an example, the potential p53 pathway activator UNBS5162 was screened with a biological phenotype (p53-transcriptional-activity)-based high-throughput luciferase reporter drug screening system. Furthermore, the signaling pathway and node molecules related to p53 activity and stability were analyzed with the p53_HUMAN PPIKG system. By integrating these two systems with a p53 protein target-based computerized drug virtual screening system, USP7 was identified as a possible direct target of the p53 signaling activator UNBS5162, and further experimental verification was performed.

Continued research on drug-target interactions, using knowledge graphs and multidisciplinary cross-examination for target deconvolution from phenotype-based screening, will help address the time-consuming and costly reverse targeting of drugs obtained through phenotype screening. In addition, it will help alleviate limitations of target-based drug screening and design, such as the difficulty of screening and designing drugs, thereby promoting drug screening and pharmacological research.

Our contributions can be summarized as follows:

  • Using the screening of p53 agonists as an example, the potential p53 pathway activator UNBS5162 was screened with a biological phenotype (p53-transcriptional-activity)-based high-throughput luciferase reporter drug screening system.

  • The signaling pathway and node molecules related to p53 activity and stability were analyzed via the p53_HUMAN PPIKG system.

  • The two systems were combined with a p53-protein-target-based computerized drug virtual screening system, leading to the exploration of USP7 as a possible direct target of the p53 signaling activator UNBS5162. Further experimental verification was performed.

  • This novel method, which utilizes PPIKG in conjunction with molecular docking, effectively narrows the screening range and facilitates target discovery by pharmaceutical experts. This approach saves computing resources, enhances the efficiency of drug screening, and significantly improves the interpretability of molecular docking.

The paper is organized as described below. Section 2 provides a brief overview of related methods, and Section 3 presents our detailed methodology. In Section 4, we describe the qualitative and quantitative evaluations of our model. Section 5 concludes this paper.

Related work

The accumulation of biomedical data has shifted the pharmaceutical industry from phenotype-based discovery toward a target-based approach. Phenotype-based methods are effective but costly and time-consuming, and they focus more on desired outcomes than on mechanisms of action, which often leads to inefficiencies in early clinical trials. This has given rise to computational methods, which reduce the cost and time of drug development10. These methods help researchers understand complex biological systems and develop targeted treatments. Drawing on Che’s research11, computational methods can be classified into the following categories.

  • Deep Learning Methods. In recent decades, deep learning methods have shown significant power in identifying and repurposing drugs. Notable efforts include Himmelstein’s Hetionet12 knowledge graph and Che’s graph convolutional network model for predicting COVID-19 treatments11. Farah Jabeen’s term frequency-inverse document frequency (TF-IDF) and long short-term memory (LSTM) models have been applied to predict butyrylcholinesterase inhibitors for the treatment of Alzheimer’s disease13. Although powerful, deep learning methods suffer from the black box problem, making results less interpretable.

  • Knowledge Graph Embedding (KGE) Methods. These methods predict potential targets by mapping entities and relationships to a vector space, retaining knowledge graph characteristics while addressing feature sparsity. Sameh K Mohamed et al.14 introduced TriModel, a novel knowledge graph embedding model designed for predicting candidate drug targets. Vít Nováček et al.15 developed a new embedding technique employing multipart vectors to forecast polypharmacy side effects. However, these methods are unsuitable for drug repositioning for emerging diseases because of incomplete disease-related information in the knowledge graph.

  • Text Mining Methods. These methods extract useful information from the literature to identify new drug-disease interactions. Rita T Sousa et al.16 explained protein-protein interactions with a knowledge graph on the basis of semantic similarity. S M Shamimul Hasan et al.17 integrated Louisiana Tumor Multiple Cancer Registry data into a knowledge graph, enabling complex queries and facilitating iterative analyses to better understand cancer registry data. However, text mining is limited by inconsistent and contradictory information in the literature.

  • Biological Feature-Based Methods. These methods use machine learning to extract drug and target features for repositioning, such as SimBoost18 and NRLMF19, which predict drug-protein interactions. However, they often overlook drug-drug or protein-protein interactions20.

  • Network-Based Methods. These methods construct networks to predict interactions by calculating similarities in network topology. Notable examples include DAJLENet21, DDR22, DTI-NET23, MSCMF24, HNM25, and Neo-DTI26, which improve prediction accuracy by integrating heterogeneous data. However, the existing methods are limited in heterogeneous network mining and suffer from data loss11.

For p53 activator screening, two strategies exist:

  • Target-Based Screening. This strategy focuses on specific targets, such as MDM2, MDMX, USP7, and Sirt proteins, requiring multiple systems for each p53 regulator. This method often misses compounds affecting multiple targets and fails to predict their true efficacy and toxicity.

  • Phenotype-Based Screening. This strategy identifies drugs that alter p53-related phenotypes but struggles with target identification and mechanism exploration.

This study developed a p53 transcriptional activity-based high-throughput luciferase reporter drug screening system enhanced with PPIKG for target deconvolution, which combines phenotype-based and target-based approaches. This interdisciplinary method can be extended to other diseases, improving the understanding of drug-target interactions and drug development.

Methods

In this study, UNBS5162 (Cas#13018-10-5) was purchased from TargetMol (Shanghai, P. R. China). The primary antibody against p53 (Cat#2524) was purchased from Cell Signaling Technology (CST). The anti-GAPDH antibody was purchased from KANGCHEN (Cat#KC-5G4). The peroxidase-conjugated AffiniPure goat anti-mouse (H+L) secondary antibody (Cat#115-035-003) was purchased from Jackson ImmunoResearch.

The datasets primarily originated from UniProtKB/SwissProt and BioGRID. SwissProt, a high-quality, manually annotated, and reviewed section of UniProtKB27, serves as a nonredundant protein sequence database28. Currently, UniProtKB contains more than 227 million sequences. BioGRID, an open-access database, is dedicated to the curation and archival storage of protein, genetic and chemical interactions for all major model organism species and humans29. In BioGRID version 4.4.225, data from 82,951 publications encompassing 2,639,546 protein and genetic interactions, 30,725 chemical interactions and 1,128,339 posttranslational modifications from major model organism species are included.

Proteins predominantly exert their functions through interactions with other proteins. A specific genetic abnormality is not restricted to the activity of the protein that carries it but can spread along the links of the biomolecular network. Therefore, systematically studying proteins at the global and dynamic levels could significantly advance research into sophisticated biological activities. This paper proposed an approach for target deconvolution from phenotype-based screening via a knowledge graph, as shown in Fig. 1.

Fig. 1 Illustration of the method used for target deconvolution.

Identification of activators via high-throughput chemical screening

  • Cell culture. HCT116, a human colorectal carcinoma cell line, was cultured in McCoy’s 5A medium (Gibco) supplemented with 10% fetal bovine serum (FBS). The HCT116 cells were obtained from the Cell Bank of the Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences (Shanghai, China). The cells were maintained in an incubator at 37 °C with 5% CO\(_2\).

  • Luciferase activity assay. A luciferase activity assay was performed as described previously30. Briefly, cells containing the p53-driven luciferase reporter gene were plated in 96-well plates at a density of \(1 \times 10^{4}\) cells/well. Twelve hours after the initial incubation, the test compounds (a library containing 4000 compounds) were added to each well at specific concentrations, while an equal volume of the solvent DMSO was used as a blank control. Luciferase activity was measured via the Luciferase Assay System (Cat#E2920, Promega) on a Molecular Devices/SpectraMaxL instrument with a 562 nm emission filter set.

Construction of the PPIKG

Knowledge graphs organize intricate biomedical data by identifying potential relationships within the biomedical community, thus providing valuable insights into drug mechanisms. As shown in Fig. 2, protein data originating from databases such as BioGRID and UniProt were integrated to construct the PPIKG. The nodes of the PPIKG represent proteins, and the edges represent protein-protein interactions. A detailed description of the PPIKG building process is provided below.

Fig. 2 Construction of the PPIKG.

Extraction of entities, attributes and relationships

The data sources we obtained were all XML files. First, we analyzed the XML tree structure to locate the useful attribute values within it. Second, the XML tree was parsed. Finally, the extracted entities were stored as CSV files for subsequent import into Neo4j. Three different types of relationships were extracted:

  1) Reference information described by hasDbXref, Xref, etc. This information is an important basis for establishing relationships between entities from various data sources, as shown in Fig. 3.

  2) Internal relationships within a single data source. These relationships are established only within one data source, such as protein interaction relationships in BioGRID.

  3) Relationships extracted from additional relationship files. These files are used only to create or add relationships between entities, especially entities from different data sources, while adding little or nothing to the attributes of the entities.

Fig. 3 Cross-references are the basis for entity alignment (the pentagram marks the IDs of the same protein that are cross-referenced between UniProt and BioGRID, and the flag marks multiple names of the same protein).

The extracted relationships were also saved as CSV files labeled “Source”, “Rel”, and “Target” for the head entity, the type of relationship, and the tail entity, respectively.
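
The extraction described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the XML element and attribute names (`interaction`, `participant`, `type`, `id`) are hypothetical stand-ins rather than the real BioGRID or UniProt schema, and the accessions are included only as sample values. Only the "Source", "Rel", "Target" column layout is taken from the text.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Toy input: element and attribute names are hypothetical, NOT the real
# BioGRID or UniProt XML schema. The IDs are sample accession-style values.
TOY_XML = """<entrySet>
  <interaction type="direct interaction">
    <participant id="P04637"/>
    <participant id="Q93009"/>
  </interaction>
  <interaction type="physical association">
    <participant id="P04637"/>
    <participant id="Q00987"/>
  </interaction>
</entrySet>"""


def extract_triples(xml_text):
    """Parse the XML tree and return (source, relation, target) triples."""
    root = ET.fromstring(xml_text)
    triples = []
    for interaction in root.iter("interaction"):
        ids = [p.get("id") for p in interaction.findall("participant")]
        if len(ids) == 2:  # keep only binary interactions in this sketch
            triples.append((ids[0], interaction.get("type"), ids[1]))
    return triples


def triples_to_csv(triples):
    """Serialize triples as CSV text with the Source/Rel/Target header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Rel", "Target"])
    writer.writerows(triples)
    return buffer.getvalue()


triples = extract_triples(TOY_XML)
csv_text = triples_to_csv(triples)
print(csv_text)
```

In practice the CSV text would be written to a file and loaded into the graph database; the in-memory buffer is used here only to keep the sketch self-contained.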

Entity alignment and relation fusion

After entity extraction, we obtained several entities and instances with different schemas. Entity alignment was applied so that the same entity or instance would not be represented by distinct nodes. First, we used ontology learning and manual compilation to construct a protein ontology according to UniProtKB. Second, semantic mappings were created as follows31:

  1) Semantic mappings were created between UniProt and the protein ontology. As shown in Fig. 4, the UniProt tags <recommendedName> and <primaryName> were mapped to “Name” in the protein ontology. Tags <alternativeName> and <synonymName> were mapped to “Synonym” in the protein ontology. Tag <NCBI TaxonomyId> was mapped to “TaxId” in the protein ontology. In addition, proteins and associated properties were saved as instances of the protein ontology.

  2) Semantic mappings were created between BioGRID and the protein ontology. As shown in Fig. 5, the BioGRID tags <shortLabel>, <ncbiTaxId> and <alias> were mapped to the protein ontology classes “Name”, “TaxId”, and “Synonym”, respectively. Once the semantic mappings were created, the data in the BioGRID tags could be integrated automatically. Moreover, proteins, their related attributes and interaction data were saved as instances of the protein ontology.
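
The tag-to-ontology mappings above can be expressed as simple lookup tables. The sketch below is illustrative only: the mapping pairs come from the text, but the record layout, the sample values, and the name-overlap alignment test are assumptions made for the example (the study aligns entities through cross-references, as in Fig. 3).

```python
# Tag-to-ontology mappings as stated in the text.
UNIPROT_MAP = {
    "recommendedName": "Name",
    "primaryName": "Name",
    "alternativeName": "Synonym",
    "synonymName": "Synonym",
    "NCBI TaxonomyId": "TaxId",
}
BIOGRID_MAP = {"shortLabel": "Name", "ncbiTaxId": "TaxId", "alias": "Synonym"}


def to_ontology(record, mapping):
    """Rename source tags to ontology properties.

    `record` is a list of (tag, value) pairs; this layout is hypothetical.
    "Synonym" collects every mapped value; "Name" and "TaxId" keep the first.
    """
    instance = {"Name": None, "TaxId": None, "Synonym": []}
    for tag, value in record:
        prop = mapping.get(tag)
        if prop is None:
            continue  # tags without a mapping are ignored
        if prop == "Synonym":
            instance["Synonym"].append(value)
        elif instance[prop] is None:
            instance[prop] = value
    return instance


def same_entity(a, b):
    """Naive alignment check (an assumption for this sketch): same taxon
    and at least one shared name or synonym, compared case-insensitively."""
    names_a = {n.lower() for n in [a["Name"], *a["Synonym"]] if n}
    names_b = {n.lower() for n in [b["Name"], *b["Synonym"]] if n}
    return a["TaxId"] == b["TaxId"] and bool(names_a & names_b)


# Sample values for illustration only.
uniprot_record = [
    ("recommendedName", "Cellular tumor antigen p53"),
    ("alternativeName", "Tumor suppressor p53"),
    ("alternativeName", "p53"),
    ("NCBI TaxonomyId", "9606"),
]
biogrid_record = [
    ("shortLabel", "TP53"),
    ("ncbiTaxId", "9606"),
    ("alias", "P53"),
    ("alias", "LFS1"),
]

uniprot_instance = to_ontology(uniprot_record, UNIPROT_MAP)
biogrid_instance = to_ontology(biogrid_record, BIOGRID_MAP)
print(same_entity(uniprot_instance, biogrid_instance))
```

Keeping the mappings as data rather than code means that a further source could, in principle, be integrated by adding one more table.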

Fig. 4 Semantic mappings between UniProt and the protein ontology.

Fig. 5 Semantic mappings between BioGRID and the protein ontology.

P53_HUMAN PPIKG-guided screening of candidate proteins

In this study, a method for target prediction based on PPIKG is proposed, which consists of five steps. First, p53_HUMAN PPIKG was constructed for screening candidate proteins. Second, the candidate proteins were further narrowed via metrics such as degree centrality and closeness centrality. Third, candidate proteins were screened on the basis of “direct interaction” relationships, as viewed from a biological perspective. Fourth, the vector similarity between two proteins was applied to quantify the likelihood of a protein interaction, incorporating a priori knowledge gleaned from the literature. Finally, the extent of direct protein interactions was assessed via a k-hop reachable path statistical algorithm for further screening of candidate proteins. After this process, the screened candidate proteins were subsequently subjected to molecular docking and experimental verification.

  • Step 1. Construction of p53_HUMAN PPIKG for screening candidate proteins

    The p53_HUMAN PPIKG was constructed by filtering the PPIKG for the p53_HUMAN protein. The p53_HUMAN PPIKG contains 1,088 proteins and 27,259 protein interactions, as shown in Fig. 6. The interactions within the p53_HUMAN PPIKG are diverse, encompassing categories such as “direct interaction”, “suppressive genetic interaction defined by inequality”, “synthetic genetic interaction defined by inequality”, “colocalization”, “additive genetic interaction defined by inequality”, “association”, and “physical association”.

  • Step 2. The screening of candidate proteins relies on a centrality measurement

    Centrality analysis has been widely applied to identify key players in biological processes32. Highly connected vertices in protein interaction networks are often functionally important. Therefore, we adopted metrics such as degree centrality and closeness centrality to predict potential proteins that interact with p53_HUMAN.

    • Screening the top 50% by degree centrality

      Nieminen33 quantified degree centrality through the number of adjacencies, as described by the following formula:

      $$\begin{aligned} C_{D}(p_{k} )=\sum _{i=1}^{n}a(p_{i} , p_{k} ) \end{aligned}$$
      (1)

      where n represents the number of proteins in the network, and \(a(p_{i} , p_{k} )=1\) if \(p_{i}\) and \(p_{k}\) are connected; otherwise, it equals 0. The degree centrality \(C_{D}(p_{k} )\) is thus the “count of nodes to which a given node \(p_{k}\) is connected or adjacent”34. Degree centrality measures a node’s influence: a high degree value indicates a well-connected node that participates in numerous interactions35. Nodes with higher degree centrality often act as “hubs” of activity, corresponding here to more “active” proteins. We retained the top 50% of proteins ranked by degree centrality, as shown in Table 1. For the complete data, please refer to https://github.com/Xiong-Jing/PPIKG.

    • Screening the top 50% by closeness centrality

      Closeness centrality, which has been studied by many researchers, measures the proximity of a node to all other nodes within a network. It is derived from the shortest path lengths from the node to every other node in the network36. The most widely used measure was given by Sabidussi37, who defined a node’s centrality as the inverse of the sum of geodesic distances from the node to all other nodes in the network. This measure can be expressed as follows:

      $$\begin{aligned} C_{c}(p_{k} )^{-1}=\sum _{i=1}^{n}d(p_{i} ,p_{k} ) \end{aligned}$$
      (2)

      where \(d(p_{i} ,p_{k} )\) represents the number of edges in the geodesic between nodes \(p_{i}\) and \(p_{k}\), and n represents the number of proteins in the network. The value of \(C_{c}(p_{k} )^{-1}\) increases as the distance between \(p_{k}\) and other nodes in the network increases, indicating a reduction in the closeness of \(p_{k}\). Conversely, this value decreases as the distance decreases and closeness improves. \(C_{c}(p_{k} )^{-1}\) depends on the graph size and is therefore an absolute rather than a normalized measure. Closeness centrality intuitively measures how rapidly information can disseminate from node \(p_{k}\) and is based on the node’s proximity to all other nodes in the protein interaction network rather than just its immediate neighbors. A node with high closeness centrality is strategically positioned to spread information efficiently38. Similarly, a protein with high closeness centrality is more central within the protein-protein interaction knowledge graph and can serve as a significant influencer.

      We utilized closeness centrality to identify influential proteins within p53_HUMAN PPIKG. For ease of calculation and sorting, Formula 2 was modified as follows:

      $$\begin{aligned} C_{c}(p_{k} )=\frac{1}{\sum _{i=1}^{n}d(p_{i} ,p_{k} )} \end{aligned}$$
      (3)

      We selected the top 50% of proteins on the basis of their closeness centrality. The distribution of closeness centrality is displayed in Table 2. For the complete data, please refer to https://github.com/Xiong-Jing/PPIKG.

  • Step 3. Screening of candidate proteins by direct interactions within PPIKG

    A direct interaction refers to a molecular interaction wherein molecules are in direct contact with each other39. It encompasses covalent binding, enzymatic reactions and so on. In this study, candidate proteins were selected on the basis of direct interactions. A total of 157 candidate targets were obtained, as shown in Table 3 (please refer to https://github.com/Xiong-Jing/PPIKG for the complete data).

  • Step 4. Screening of candidate proteins by vector similarity

    Semantic relatedness refers to the human assessment of how closely connected a given pair of concepts is. This study evaluates the semantic relatedness of two protein names in a biological corpus by calculating the similarity of their vectors. We assume that the smaller the vector distance is, the greater the semantic relatedness, and hence the more likely it is that the two proteins interact with each other.

    Various models can be used to generate the vectors from which similarity is calculated, such as Word2vec40, ELMo41, GPT42, and BERT43. Since our research focuses on term-level similarity and does not require extensive training on biological texts, we opted to use Word2vec. This decision is further supported by the fact that EVEX44, a text mining resource, provides a series of pretrained Word2vec models that already encompass our dataset.

    The training data contain more than 40 million biomolecular events and more than 76 million gene/protein names automatically extracted from PubMed abstracts and PubMed Central (PMC) full-text articles. Stop words and symbols, such as “+” and “-”, were retained because they may carry biological significance. First, from the 157 candidate proteins identified in the previous step, we excluded those that were unimportant or of low frequency, yielding a curated set of 138 key proteins that served as the input for the word vector analysis. The vector distances between these 138 candidate proteins and p53_HUMAN were then calculated and sorted in ascending order. The vector distance indicates the likelihood of interaction between each protein and p53_HUMAN. Finally, the 69 proteins ranked in the top 50% of the 138 proteins were selected as candidates, as shown in Table 4 (please refer to https://github.com/Xiong-Jing/PPIKG for the complete data).

  • Step 5. Screening of candidate proteins via the reachable path statistical algorithm based on k-hop

    In Step 3, we applied the concept of “direct interaction” to screen proteins, even though the exact nature of these interactions remained unclear. Source proteins may interact through various pathways, with the interaction depth labeled k. Our goal was to predict the best candidate proteins closely linked to the source protein. To achieve this, we used a k-hop-based reachable path statistical algorithm to measure the degree of direct interaction between the target and source proteins. We then ranked the target proteins according to the number of k-hop reachable pathways. A greater number of k-hop reachable paths indicates a stronger level of direct interaction.

    The formula for this calculation is as follows:

    $$\begin{aligned} count(s,k,t)=\sum _{i=1}^{n}count(s_{i},k-1,t) \end{aligned}$$
    (4)

    where \(count(s,k,t)\) represents the number of paths from the source protein s to the destination protein t through k hops, \(s_{i}\) denotes the proteins adjacent to s, and n indicates the number of neighbors of protein s.

    The link between protein A and protein B strengthens when both proteins are associated with protein C; hence, on the basis of triadic closure theory45, we selected a value of 2 for k in this investigation. Using the p53_HUMAN protein as the source, we applied a 2-hop reachable path statistical algorithm to predict target proteins according to Formula (4). We hypothesized that proteins with strong interactions with p53_HUMAN would have more 2-hop paths connecting them to p53_HUMAN.

    For our analysis, we concentrated on the top 50% of candidate proteins, which were selected based on vector similarity. Using the 2-hop reachable path statistical algorithm, we further refined the number of candidate proteins to 35. The detailed results of this selection are presented in Table 5.
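
The recursion in Formula (4) can be sketched in a few lines. The example below is a minimal illustration on a toy undirected graph whose edges are invented for the example and are not curated interactions. The base case count(s, 0, t) = 1 if s = t and 0 otherwise is an assumption needed to close the recursion. Strictly, the recursion counts k-step walks; for k = 2 with distinct source and target in a graph without self-loops, every such walk is a simple path through a common neighbor.

```python
# Toy undirected edges, invented for illustration (not curated interactions).
EDGES = [
    ("P53", "MDM2"),
    ("P53", "USP7"),
    ("P53", "EP300"),
    ("MDM2", "USP7"),
    ("MDM2", "MDM4"),
    ("USP7", "MDM4"),
]


def build_adjacency(edges):
    """Return a dict mapping each node to the set of its neighbors."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def count_paths(adjacency, s, k, t):
    """count(s, k, t) = sum over neighbors s_i of s of count(s_i, k-1, t)."""
    if k == 0:
        return 1 if s == t else 0  # assumed base case
    return sum(count_paths(adjacency, n, k - 1, t) for n in adjacency.get(s, ()))


def rank_targets(adjacency, source, k=2):
    """Rank all other nodes by their number of k-hop paths from `source`."""
    counts = {t: count_paths(adjacency, source, k, t)
              for t in adjacency if t != source}
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


adjacency = build_adjacency(EDGES)
ranking = rank_targets(adjacency, "P53", k=2)
top_half = [name for name, _ in ranking[: len(ranking) // 2]]
print(ranking)
print(top_half)
```

The plain recursion is exponential in k; for k = 2 it amounts to counting common neighbors, which stays cheap for a graph of the size reported here.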

Fig. 6 p53_HUMAN PPIKG.

Table 1 Degree centrality of the top 50% proteins (544) in p53_HUMAN PPIKG.
Table 2 Closeness centrality distribution of the top 50% of proteins (544 proteins) in p53_HUMAN PPIKG (we performed normalization by dividing the result by the raw maximum closeness centrality).

According to the degree centrality and closeness centrality, we filtered 525 candidate proteins from the intersection, as shown in Fig. 7.
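
The centrality-based filtering can be sketched as follows, using degree centrality as in Formula (1) and closeness centrality as in Formula (3). The graph is a toy example with invented edges, the graph is assumed to be connected, and the alphabetical tie-breaking rule is an assumption added to make the ranking deterministic.

```python
from collections import deque

# Toy undirected graph with invented edges (not curated interactions).
ADJACENCY = {
    "P53": {"MDM2", "USP7", "EP300"},
    "MDM2": {"P53", "USP7"},
    "USP7": {"P53", "MDM2", "MDM4"},
    "EP300": {"P53"},
    "MDM4": {"USP7", "CHEK2"},
    "CHEK2": {"MDM4"},
}


def degree_centrality(adjacency):
    """Formula (1): the number of nodes adjacent to each node."""
    return {node: len(neighbors) for node, neighbors in adjacency.items()}


def closeness_centrality(adjacency):
    """Formula (3): 1 / (sum of geodesic distances to all other nodes).

    Distances are found by breadth-first search; the graph is assumed
    to be connected.
    """
    scores = {}
    for source in adjacency:
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in distance:
                    distance[neighbor] = distance[node] + 1
                    queue.append(neighbor)
        total = sum(distance.values())
        scores[source] = 1.0 / total if total else 0.0
    return scores


def top_half(scores):
    """Keep the top 50% of nodes by score (ties broken alphabetically)."""
    ranked = sorted(scores, key=lambda node: (-scores[node], node))
    return set(ranked[: len(ranked) // 2])


degree = degree_centrality(ADJACENCY)
closeness = closeness_centrality(ADJACENCY)
candidates = top_half(degree) & top_half(closeness)
print(sorted(candidates))
```

Taking the intersection of the two top halves mirrors the filtering step described above, in which the proteins retained are those ranked highly by both measures.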

Fig. 7 525 proteins filtered by centrality measurement.

Table 3 Candidate targets with direct interactions.
Table 4 Candidate proteins filtered by vector distance.
Table 5 Candidate proteins filtered by the 2-hop-based reachable path statistical algorithm.
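
The vector-distance ranking behind Table 4 can be illustrated with a small sketch. The text does not state which distance metric was used, so cosine distance is an assumption here; the two-dimensional vectors and the protein labels are invented, whereas real vectors would come from a pretrained Word2vec model.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def rank_by_distance(query, vectors):
    """Sort candidates by ascending distance to the query vector."""
    distances = {name: cosine_distance(query, vec)
                 for name, vec in vectors.items()}
    return sorted(distances.items(), key=lambda item: (item[1], item[0]))


# Invented toy vectors; the names are placeholders, not real proteins.
QUERY_VECTOR = (1.0, 0.0)
CANDIDATE_VECTORS = {
    "protein_a": (0.8, 0.6),
    "protein_b": (0.6, 0.8),
    "protein_c": (0.0, 1.0),
    "protein_d": (-1.0, 0.0),
}

ranked = rank_by_distance(QUERY_VECTOR, CANDIDATE_VECTORS)
selected = [name for name, _ in ranked[: len(ranked) // 2]]
print(ranked)
print(selected)
```

Sorting in ascending order places the most closely related names first, so keeping the first half of the list corresponds to selecting the top 50% of candidates.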

Molecular docking

AutoDock Vina46 has demonstrated the best binding affinity estimation (scoring power) among ten docking programs, comprising five commercial programs (LigandFit, Glide, GOLD, MOE Dock, and Surflex-Dock) and five academic programs (AutoDock, AutoDock Vina, LeDock, rDock, and UCSF DOCK)47. Hence, AutoDock Vina was used for molecular docking, with num_modes set to 20 and exhaustiveness set to 8. UNBS5162 was prepared with LigPrep (Schrödinger) and converted into PDBQT format with the MGLTools Python script. The protein structures were optimized with the Protein Preparation Wizard (Schrödinger) using default parameters and converted into PDBQT format with the MGLTools Python script. The center of the reference ligand in the complex, with a range of approximately 6 Å, was set as the binding site. Finally, proteins with a Vina score below -5.0 kcal/mol were selected for the activity assay.
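
As an illustration of these settings, the sketch below writes an AutoDock Vina configuration text and applies the -5.0 kcal/mol cutoff. Only num_modes = 20, exhaustiveness = 8 and the cutoff come from the text; the file names, box center, box size and scores are hypothetical placeholders, not values from the study.

```python
def build_vina_config(receptor, ligand, center, box_size,
                      num_modes=20, exhaustiveness=8):
    """Return the text of an AutoDock Vina configuration file."""
    cx, cy, cz = center
    sx, sy, sz = box_size
    lines = [
        f"receptor = {receptor}",
        f"ligand = {ligand}",
        f"center_x = {cx}",
        f"center_y = {cy}",
        f"center_z = {cz}",
        f"size_x = {sx}",
        f"size_y = {sy}",
        f"size_z = {sz}",
        f"num_modes = {num_modes}",
        f"exhaustiveness = {exhaustiveness}",
    ]
    return "\n".join(lines) + "\n"


def select_hits(scores, threshold=-5.0):
    """Keep proteins whose best Vina score is below the threshold,
    ordered from most to least favorable score."""
    hits = {name: score for name, score in scores.items() if score < threshold}
    return sorted(hits, key=hits.get)


# Hypothetical inputs: file names, coordinates and box size are placeholders.
config_text = build_vina_config(
    receptor="target.pdbqt",
    ligand="UNBS5162.pdbqt",
    center=(10.0, 12.5, -3.0),
    box_size=(20.0, 20.0, 20.0),
)

# Invented scores in kcal/mol, NOT results from the study.
toy_scores = {"protein_A": -7.2, "protein_B": -4.9,
              "protein_C": -5.0, "protein_D": -6.1}
hits = select_hits(toy_scores)
print(config_text)
print(hits)
```

A file containing this text could be passed to the `vina` command-line program through its `--config` option; a score exactly equal to the threshold is excluded because the criterion is a strict "less than".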

Experimental verification

Western blot

For western blotting, the cells were washed with ice-cold PBS three times and lysed in cell lysis buffer (Cat#9803, CST) for 30 minutes on ice, after which the protein was harvested. The cell lysates were centrifuged at 13,000 × g for 10 minutes at 4 °C, after which the supernatants were collected. The protein concentration was measured with a BCA protein assay kit (Cat#PQ0012, MULTI SCIENCE) according to the manufacturer’s instructions. Immunoblotting was performed as described previously32. Briefly, cell lysates were separated on SDS-PAGE gels and transferred to NC membranes. Proteins were detected with the antibodies indicated in the figure using Immobilon™ Western Chemiluminescent HRP Substrate (Cat#WBKLS0100, MILLIPORE).

In vitro USP7 activity assay

The deubiquitination activity of USP7 was measured by performing continuous kinetic assays using the fluorogenic substrate Ub-Rho110. The fluorescence intensity was monitored with a multimode microplate reader (SpectraMax\(^\circledR\) i3x, Molecular Devices, U.S.A.) using wavelengths of 425 and 590 nm for excitation and emission, respectively. The experiments were performed in a 100 \(\upmu\)L reaction system with a buffer consisting of 50 mM Tris-HCl (pH 7.4), 150 mM NaCl and 0.5% Triton X-100. UNBS5162 was dissolved and diluted in DMSO to the desired concentrations to measure its activity. One microliter of diluted compound was added to 50 \(\upmu\)L of solution containing 30 nM USP7, and the solutions were then incubated at room temperature for 10 min. The reaction was initiated by adding 50 \(\upmu\)L of substrate (at a final concentration of 400 nM). The fluorescence intensity was monitored once every 30 seconds. The initial reaction velocities were calculated by fitting the linear portion of the progression curves (within the first 3 min) to a straight line with the SoftMax Pro program and were converted to enzyme activity (amount of substrate cleaved)/second.
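
The initial-velocity calculation can be sketched as an ordinary least-squares fit over the first 3 min of readings taken every 30 s. The fluorescence trace below is synthetic, and the calibration factor `RFU_PER_NM` used to convert signal to substrate cleaved is a hypothetical value; the study performed this fit in SoftMax Pro.

```python
def initial_velocity(times_s, signal, window_s=180.0):
    """Least-squares slope of signal versus time within the first `window_s` seconds."""
    points = [(t, y) for t, y in zip(times_s, signal) if t <= window_s]
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in points)
    return sxy / sxx


# Synthetic trace: one reading every 30 s; linear for 3 min, then flattening.
times_s = [30.0 * i for i in range(11)]  # 0 s to 300 s
signal = [100.0 + 2.0 * t if t <= 180.0 else 460.0 + 0.5 * (t - 180.0)
          for t in times_s]

# Hypothetical calibration: fluorescence units per nM of cleaved substrate.
RFU_PER_NM = 4.0

velocity_rfu_per_s = initial_velocity(times_s, signal)
velocity_nm_per_s = velocity_rfu_per_s / RFU_PER_NM
print(velocity_rfu_per_s, velocity_nm_per_s)
```

Restricting the fit to the early window avoids the later part of the curve, where substrate depletion makes the trace nonlinear and would bias the slope downward.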

Results and discussion

Identification of UNBS5162 as a p53 signaling activator

High-throughput chemical screening based on a luciferase reporter was performed to identify small molecules that activate p53 signaling. For the screening, a stable cell line with a p53-driven luciferase reporter carrying the p53 DNA-binding site was established in the p53 wild-type HCT116 cell line. High-throughput chemical screening was performed with a chemical library containing 4000 compounds to identify compounds that increased p53-driven luciferase activity. The effectiveness of this system was confirmed with SAR405838, which inhibits MDM2 (mouse double minute 2)-driven p53 degradation and thus increases p53 levels. Then, the top five hits that consistently showed more than a 4-fold increase in luciferase activity compared with the average luminescence value were chosen (Fig. 8A), and UNBS5162, which had the strongest effect, was confirmed to be effective in a dose-dependent manner (Fig. 8B). These findings suggested that UNBS5162 can enhance the transcriptional activity of the p53 protein.
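
The hit-selection rule can be sketched as follows. This is an interpretation rather than the exact procedure: the baseline is assumed to be the mean signal of all wells on a plate, "consistently" is assumed to mean that every replicate exceeds the 4-fold cutoff, and the compound names and readings are invented.

```python
from statistics import mean


def fold_changes(plate):
    """Signal of each compound divided by the plate-wide mean signal."""
    baseline = mean(plate.values())
    return {compound: value / baseline for compound, value in plate.items()}


def select_hits(replicates, min_fold=4.0, top_n=5):
    """Return up to `top_n` compounds exceeding `min_fold` in every replicate,
    ordered by mean fold change (highest first)."""
    folds = [fold_changes(plate) for plate in replicates]
    consistent = {
        compound: mean(f[compound] for f in folds)
        for compound in folds[0]
        if all(f[compound] > min_fold for f in folds)
    }
    ranked = sorted(consistent, key=lambda c: (-consistent[c], c))
    return ranked[:top_n]


def toy_plate(hit_a, hit_b):
    """Build an invented plate: 18 inactive wells plus two test compounds."""
    plate = {f"cmpd_{i:03d}": 100.0 for i in range(18)}
    plate["cmpd_A"] = hit_a
    plate["cmpd_B"] = hit_b
    return plate


# Two invented replicate plates; cmpd_B passes the cutoff in only one of them.
replicates = [toy_plate(1500.0, 700.0), toy_plate(1300.0, 900.0)]
hits = select_hits(replicates)
print(hits)
```

Because strong hits raise the plate mean, a screen with many actives might instead normalize to vehicle-control wells; that alternative is not described in the text and is mentioned only as a design consideration.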

P53 transcriptional activity is regulated by multiple mechanisms, such as posttranslational modification and protein complex formation48. Identifying the target of a compound is crucial for explaining the mechanism of drug action and for drug development, but the complex regulatory network underlying biological phenotypes makes target determination highly challenging. Revealing the p53-related protein network may provide an approach to deconvolute the direct target of UNBS5162.

Fig. 8 P53 signaling-based drug screening. (A) The effects of SAR405838 and UNBS5162, which are p53 signaling activators, were confirmed; (B) UNBS5162 improved p53 transcriptional activity in a dose-dependent manner.

P53_HUMAN PPIKG

A p53_HUMAN PPIKG was constructed to reveal the biological significance of disease-associated mutations identified by protein interaction studies and to identify drug targets for complex diseases. Protein information was collected from the UniProt database, and protein interaction information was collected from the BioGRID database. The P53_HUMAN PPIKG includes 1,088 proteins and 27,259 direct interactions within 2 hops, as shown in Fig. 9.
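As a minimal sketch of how a 2-hop neighborhood around p53_HUMAN can be extracted from pairwise interaction records, the following breadth-first search works on a plain edge list. It is an illustration only, not the pipeline used to build the PPIKG: the edge list is a toy example rather than BioGRID data, X1_HUMAN and X2_HUMAN are placeholder identifiers, and the helper names are hypothetical.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Undirected adjacency sets from (protein_a, protein_b) pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:                      # ignore self-interactions
            adj[a].add(b)
            adj[b].add(a)
    return adj


def nodes_within(adj, source, max_hops):
    """Map each protein reachable in <= max_hops to its hop distance."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:      # do not expand beyond the hop limit
            continue
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


# Toy edge list (assumption): placeholder proteins X1/X2 are not real entries.
edges = [
    ("P53_HUMAN", "MDM2_HUMAN"),
    ("P53_HUMAN", "UBP7_HUMAN"),
    ("MDM2_HUMAN", "UBP7_HUMAN"),
    ("MDM2_HUMAN", "X1_HUMAN"),
    ("X1_HUMAN", "X2_HUMAN"),
]
hops = nodes_within(build_adjacency(edges), "P53_HUMAN", max_hops=2)
```

In this toy graph X2_HUMAN lies three hops from P53_HUMAN and is therefore left out of the depth-2 subgraph.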

Fig. 9 P53_HUMAN PPIKG at depth 2.

The screening based on p53_HUMAN PPIKG

One purpose of this study is to assist biomedical experts in drug target discovery and reduce their workload, a task that can be framed as a search problem. Inspired by the binary search algorithm, we adopted a strategy of retaining the top 50% of the elements ranked at each step for further analysis. As shown in Fig. 10, first, we obtained 1088 candidate proteins that are directly related to p53_HUMAN. Second, the intersection of the screening results based on degree centrality and closeness centrality was taken to obtain 525 candidate proteins. Third, we further narrowed the candidate proteins on the basis of “direct interaction”, which left 157 candidate proteins. Fourth, 69 candidate proteins were retained via vector similarity. Finally, the likelihood of two proteins interacting with each other was quantified via a k-hop-based reachable path statistical algorithm; with k = 2, this step decreased the number of candidate proteins to 35.
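The centrality step can be sketched as follows: each protein is scored by degree centrality and by closeness centrality, the top half of each ranking is kept, and the two sets are intersected. This is a minimal illustration under stated assumptions rather than the implementation used in the study: the graph is assumed to be connected and unweighted, ties are broken alphabetically, and the six-node graph with nodes A to F is a toy example.

```python
from collections import deque


def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly linked to."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def closeness_centrality(adj):
    """(reachable nodes) / (sum of shortest-path lengths), via BFS per node."""
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


def top_half(scores):
    """Keep the best-scoring 50% of nodes (ties broken by name)."""
    ranked = sorted(scores, key=lambda v: (-scores[v], v))
    return set(ranked[: max(1, len(ranked) // 2)])


# Toy graph (assumption): edges A-B, A-C, A-D, B-C, D-E, E-F.
adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A", "E"},
    "E": {"D", "F"},
    "F": {"E"},
}
by_degree = top_half(degree_centrality(adj))
by_closeness = top_half(closeness_centrality(adj))
candidates = by_degree & by_closeness
```

Taking the intersection rather than the union means that a protein must rank in the top half on both measures, which is consistent with the candidate set shrinking to slightly less than half at this step.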

Fig. 10 Candidate protein outputs in the process of screening.

Identification of USP7 as a potential target of UNBS5162

To evaluate potential targets of UNBS5162, we used AutoDock Vina to predict interactions between the candidate proteins and UNBS5162. Only 23 of the 35 candidate proteins had an available three-dimensional structure, and only 18 of these 23 proteins had a clear binding site for molecular docking. These 18 proteins were ranked by the Vina score, as shown in Table 6.

Table 6 Candidate proteins ranked by the Vina score.

The results in Table 6 were obtained with the following tools and methods: docking software, AutoDock Vina; PDBQT format files, created via MGLTools; and protein optimization, Schrodinger 2020 Prepwizard (missing side chains and loops filled in, hydrogens minimized, and all other parameters left at their defaults).

The selected proteins were then reordered according to the number of 2-hop paths connected to p53_HUMAN in the knowledge graph, as shown in Table 7.

Table 7 Candidate proteins ranked by path statistics.
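The path-based reordering can be illustrated by counting, for each candidate, the simple paths of exactly two edges that connect it to p53_HUMAN. This sketch is one plausible reading of the k-hop reachable path statistic, not a reproduction of the authors' algorithm; the adjacency sets are a toy example, X1_HUMAN is a placeholder identifier, and the resulting counts are not the values reported in Table 7.

```python
def count_paths(adj, source, target, hops):
    """Count simple paths from source to target that use exactly `hops` edges."""
    def walk(node, remaining, visited):
        if remaining == 0:
            return 1 if node == target else 0
        total = 0
        for nb in adj.get(node, ()):
            if nb not in visited:           # simple paths: no repeated proteins
                total += walk(nb, remaining - 1, visited | {nb})
        return total

    return walk(source, hops, {source})


# Toy adjacency sets (assumption, not BioGRID data).
adj = {
    "P53_HUMAN": {"MDM2_HUMAN", "UBP7_HUMAN", "X1_HUMAN"},
    "MDM2_HUMAN": {"P53_HUMAN", "UBP7_HUMAN", "X1_HUMAN"},
    "UBP7_HUMAN": {"P53_HUMAN", "MDM2_HUMAN"},
    "X1_HUMAN": {"P53_HUMAN", "MDM2_HUMAN"},
}
candidates = ["MDM2_HUMAN", "UBP7_HUMAN", "X1_HUMAN"]
counts = {c: count_paths(adj, c, "P53_HUMAN", hops=2) for c in candidates}
ranking = sorted(candidates, key=lambda c: (-counts[c], c))
```

For two hops the count equals the number of neighbors that the candidate shares with p53_HUMAN, so a candidate embedded in many p53-centered triangles rises to the top of the list.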

HCT116 cells were treated with UNBS5162 and CHX (cycloheximide), and the degradation rate of the top candidate protein, MDM2, was measured; UNBS5162 promoted the degradation of MDM2 (Fig. 11A). Because MDM2 degradation was accelerated upon UNBS5162 treatment (Fig. 11B), we presumed that MDM2 may not be a direct binding target of UNBS5162 but rather a substrate of its target, so the direct target of UNBS5162 remained unknown at this stage. We subsequently performed molecular docking between the other candidate proteins and UNBS5162 to predict the potential target of UNBS5162.

Fig. 11 UNBS5162 promotes MDM2 degradation. (A) CHX was added at a concentration of 10 \(\upmu\)M together with 10 \(\upmu\)M UNBS5162 or 1‰ DMSO, total protein was extracted at the indicated time points after treatment, and MDM2 protein levels were subsequently analyzed via western blotting (WB). (B) The bands on the blot were quantified via Tanon GIS, and the decay constants were calculated via Calculator.net (https://www.calculator.net/half-life-calculator.html). The results shown are representative of three independent experiments. The values represent the means ± SDs (n=3).
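The decay constant and half-life of a protein in a CHX chase can also be estimated directly from the quantified band intensities. The sketch below is an alternative to the online calculator cited in the caption, not the procedure used for Fig. 11: it assumes first-order decay, fits a least-squares line to the natural log of the intensities, and uses synthetic intensities with hypothetical time points in hours.

```python
import math


def decay_constant(times, intensities):
    """First-order decay constant k from a log-linear least-squares fit."""
    logs = [math.log(v) for v in intensities]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    return -sxy / sxx                  # slope of ln(intensity) vs time is -k


def half_life(k):
    """Half-life for first-order decay: t_1/2 = ln(2) / k."""
    return math.log(2) / k


# Synthetic chase (assumption): relative band intensity decaying with k = 0.5 per hour.
times_h = [0.0, 1.0, 2.0, 4.0]
bands = [math.exp(-0.5 * t) for t in times_h]
k = decay_constant(times_h, bands)
t_half = half_life(k)
```

A larger decay constant in the UNBS5162-treated lanes than in the DMSO lanes would correspond to a shorter MDM2 half-life, which is how accelerated degradation shows up in this analysis.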

Fig. 12 Binding mode between UNBS5162 and USP7. (A) UNBS5162 chemical structure; (B) USP7 binds to UNBS5162 via Thr511, Asn497 and His464.

The docking results indicated that the second candidate protein, USP7 (also known as UBP7), could bind to UNBS5162 (Fig. 12A) via Thr511, Asn497 and His464 (Fig. 12B), suggesting that USP7 could be a potential target of UNBS5162.

UNBS5162 activates USP7 and increases the p53 protein level

USP7, a deubiquitinating enzyme, removes ubiquitin from its substrates49. P53 is one of the substrates of USP7, and overexpression of USP7 has been reported to stabilize p5350, suggesting a possible mechanism by which UNBS5162 activates p53 signaling. The interaction between USP7 and UNBS5162 was examined via surface plasmon resonance (SPR). The response curve of USP7 differed markedly from that of BSA, which was used as a negative control (Fig. 13A), and the association rate constant (\(K_{on}\)), dissociation rate constant (\(K_{off}\)), and equilibrium dissociation constant (KD) of the interaction were calculated. The KD value between UNBS5162 and USP7 was as low as \(1.92 \times 10^{-8}\) M, as shown in Table 8, indicating that UNBS5162 binds tightly to USP7. Together, these data indicate an interaction between USP7 and UNBS5162 and suggest that UNBS5162 may act directly on USP7 to regulate the stability of p53.

Table 8 Kinetic association constant between USP7 and UNBS5162.
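For orientation, the three SPR parameters are linked by a simple relation. Assuming a 1:1 binding model (an assumption; the fitting model is not stated here), the equilibrium dissociation constant is the ratio of the two rate constants, and the reported value corresponds to a binding affinity of roughly 19 nM:

\[
K_D = \frac{K_{off}}{K_{on}}, \qquad K_D = 1.92 \times 10^{-8}\ \text{M} = 19.2\ \text{nM}
\]

A low KD therefore reflects slow dissociation relative to association, so the comparison with the BSA control indicates specific, tight binding rather than a nonspecific response.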

As shown in Fig. 12, UNBS5162 binds to USP7 via Thr511, Asn497 and His464, which may help to promote the formation of active USP7 and subsequently improve its hydrolytic activity. However, an interaction between a compound and a protein does not necessarily alter the biological activity of the protein. To verify whether UNBS5162 directly affects the activity of USP7, its influence on the deubiquitinating activity of USP7 was monitored via a fluorometric assay using ubiquitin-rhodamine (Ub-Rho) as the substrate. In this assay, USP7 activity is read out as the fluorescence produced by enzyme-mediated cleavage of the rhodamine fluorophore from Ub-Rho. UNBS5162 dose-dependently increased the hydrolytic activity of USP7 (Fig. 13B), leading to increased p53 protein levels (Fig. 13C). These findings suggest that UNBS5162 targets USP7 to improve its deubiquitinating activity and thus activate p53 signaling.

Fig. 13 UNBS5162 activates USP7 and increases p53 protein levels. (A) Curve showing the response of USP7 to UNBS5162. (B) UNBS5162 increases the deubiquitinating activity of USP7 in a dose-dependent manner. (C) UNBS5162 increases the p53 protein level.

Conclusion

In this study, a protein-protein interaction knowledge graph was built to represent molecular interactions, which could integrate relatively heterogeneous or sparse omics datasets. By establishing a high-throughput luciferase screening system based on p53 transcriptional activity, a screening study revealed that UNBS5162 can activate the p53 signaling pathway, which may constitute a new mechanism underlying the antitumor activity of UNBS5162. Furthermore, a series of novel p53_HUMAN PPIKG-based deconvolution methods for screening candidate proteins was designed by combining centrality measurements of knowledge graph topology, molecular interaction mechanisms, and semantic relatedness. On the basis of the candidate proteins predicted by the knowledge graph, molecular docking and experiments were applied to successfully identify USP7 as a direct target of the p53 signaling activator UNBS5162. This method not only offers enhanced efficiency in drug screening but also significantly improves the interpretability of molecular docking, and it is poised to play a pivotal role in promoting drug screening and pharmacological research. In response to sudden outbreaks of diseases caused by emerging or reemerging viruses, we can effectively utilize PPIKG and phenotype-based screening systems to develop antiviral drugs. Ongoing multidisciplinary research on drug-target interactions will help overcome significant time and cost challenges associated with target deconvolution in phenotype-based screening hits, as well as the difficulties in ensuring biological relevance in target-based drug screening and design. This, in turn, will advance drug screening and pharmacological research.