Abstract
Drug–drug interactions (DDIs) present serious risks in clinical settings, especially for patients who are prescribed multiple medications. A major factor contributing to these interactions is the inhibition of cytochrome P450 (CYP450) enzymes, which are vital for drug metabolism. As a result, reliably identifying compounds that may inhibit CYP450 enzymes is a key step in drug development. However, existing machine learning (ML) methods often fall short in terms of prediction accuracy and biological interpretability. To address this challenge, we introduce a Multimodal Encoder Network (MEN) aimed at improving the prediction of CYP450 inhibitors. This model combines three types of molecular data (chemical fingerprints, molecular graphs, and protein sequences) by applying specialized encoders tailored to each format. Specifically, the Fingerprint Encoder Network (FEN) processes molecular fingerprints, the Graph Encoder Network (GEN) extracts structural features from graph-based representations, and the Protein Encoder Network (PEN) captures sequential patterns from protein sequences. By integrating these diverse data types, MEN can extract complementary information that enhances predictive performance. The encoded outputs from FEN, GEN, and PEN are fused to build a comprehensive feature representation. An explainable AI (XAI) module is incorporated into the model to support biological interpretation, using visualization techniques such as heatmaps. The model was trained and validated using two datasets: chemical structures in SMILES format from PubChem and protein sequences of five CYP450 isoforms (1A2, 2C9, 2C19, 2D6, and 3A4) obtained from the Protein Data Bank (PDB). MEN achieved an average accuracy of 93.7% across all isoforms. The individual encoders performed with accuracies of 80.8% (FEN), 82.3% (GEN), and 81.5% (PEN). 
Additional performance results include an AUC of 98.5%, sensitivity of 95.9%, specificity of 97.2%, precision of 80.6%, F1-score of 83.4%, and a Matthews correlation coefficient (MCC) of 88.2%. All data and code are available at https://github.com/GracedAbena/MEN-Leveraging-Explainable-Multimodal-Encoding-Network.
Introduction
Human Cytochrome P450 (CYP450) enzymes are crucial for metabolizing a wide range of substances, including drugs, toxins, and endogenous compounds, as well as for synthesizing lipids such as cholesterol and steroids1. These enzymes exhibit a broad substrate range and high metabolic activity, making them key players in pharmacology and toxicology. Variability in CYP450 activity can alter drug metabolism rates, impacting drug effectiveness, potential toxicity, and the risk of adverse drug reactions2. These critical isoforms are discussed in further detail below.
- CYP3A4: This isoform, the most abundant in the liver and intestine, is involved in the metabolism of nearly half of all marketed drugs3. Known for its broad substrate selectivity, it plays a critical role in the first-pass metabolism of orally administered medications, significantly impacting their bioavailability and therapeutic effectiveness.
- CYP2D6: Despite its lower hepatic concentration, CYP2D6 is responsible for metabolizing roughly 25% of widely prescribed drugs, such as opioids, beta-blockers, antipsychotics, and antidepressants4. Genetic differences result in varied metabolic capacities among individuals, categorizing them as ultra-rapid, extensive, intermediate, or poor metabolizers, each with distinct drug response and dosing implications.
- CYP2C9: CYP2C9 metabolizes an estimated 15–20% of drugs, including phenytoin, warfarin, tolbutamide, and several NSAIDs such as ibuprofen and diclofenac5. Genetic differences in CYP2C9 can lead to variability in individual drug responses, as this enzyme's activity significantly affects the pharmacokinetics and therapeutic effectiveness of these drugs6.
- CYP1A2: Approximately 9–15% of drugs prescribed in clinical practice are metabolized by CYP1A27. This enzyme processes several key substances, including caffeine, the bronchodilator theophylline, certain antidepressants such as fluvoxamine, and the antipsychotic clozapine. CYP1A2 activity can vary with genetic polymorphisms, dietary influences, and environmental exposures such as smoking.
- CYP2C19: CYP2C19 is another essential cytochrome P450 enzyme, responsible for metabolizing 8–10% of drugs commonly used in clinical settings, including the proton pump inhibitor omeprazole, the antiplatelet drug clopidogrel, certain antidepressants such as citalopram, and the anxiolytic diazepam. Genetic differences in CYP2C19 can greatly influence its activity, leading to individual variations in drug safety and efficacy8.
Drug–drug interactions (DDIs) occur when two drugs are taken at the same time and one affects the metabolism of the other. Drug levels may change as a result, which can cause adverse effects or therapy failure. Through processes such as enzyme induction or inhibition, the CYP450 isoforms frequently contribute to DDIs. Inhibitors of a CYP450 isoform can increase the plasma levels of other medicines processed by the same isoform9. Ketoconazole, for example, is a strong CYP3A4 inhibitor that can considerably raise CYP3A4 substrate levels and potentially cause toxicity. Conversely, CYP450 enzyme inducers increase the expression and activity of these enzymes, lowering the plasma concentration of medications processed by the induced enzyme10. For example, the CYP3A4 inducer rifampicin can reduce the efficacy of medications such as immunosuppressants and oral contraceptives. The cycle of DDIs is shown in Fig. 1, which emphasizes the role CYP450 isoforms play in drug metabolism. When Drug A and Drug B are administered together, there may be no effect on the metabolism of either drug. If Drug A's metabolism is affected, it means that Drug B induces or inhibits the CYP450 isoforms that govern Drug A's metabolism, and vice versa. The participation of CYP450 isoforms in drug metabolism and DDIs has significant clinical consequences. Knowledge of these interactions can guide drug dosing, prevent side effects, and improve treatment outcomes9. To prevent side effects and maximize effectiveness, a patient's CYP450 genotype can guide personalized treatment; for instance, CYP2D6 genotyping can be used to tailor antidepressant therapy. CYP450 interaction screening should be a routine component of preclinical review in drug development, since it can improve medication safety profiles and avert expensive late-stage failures.
Furthermore, patients who take various drugs that are processed by the same CYP450 isoforms can benefit from therapeutic monitoring of drug levels, which can aid in the management and mitigation of DDIs11.
Cytochrome P450 inhibitors are compounds that block the activity of CYP450 enzymes. They are important in medicine and pharmacology because they alter drug metabolism, which can change drug safety and efficacy and cause interactions between drugs. The CYP450 enzymes, members of the hemoprotein superfamily with 57 isoforms expressed in hepatic cells, can, when inhibited, disrupt metabolism and cause undesirable medication reactions, including acute side effects and metabolic problems such as drug-food and drug-drug interactions. Modern drug discovery relies heavily on the study and discovery of CYP inhibitors, since understanding how these compounds interact with CYP450 enzymes is one way to predict DDIs, optimize medication design, and safeguard patient safety. Before computational methods, expensive high-throughput screening was employed to evaluate the bioactivities and characteristics of drugs.
Using large chemogenomic and pharmacological databases such as DrugBank, KEGG, STITCH, ChEMBL, and Davis, artificial intelligence (AI) models have recently improved prediction accuracy and efficiency in drug development. AI models consider two varieties of data: protein sequences and ligands. Protein-binding ligands can either activate or inhibit CYP450 enzymes, and these interactions are greatly influenced by the structural and physicochemical characteristics of the ligands. The structural conformation of CYP450 enzymes and their interaction with ligands are dictated by their protein sequence. AI models can more accurately predict a compound's propensity to be a CYP450 inhibitor by combining information from protein sequences and ligands. Advanced cheminformatics approaches are used to extract ligand attributes, including molecular weight, hydrophobicity, electrical properties, and three-dimensional conformation, for use as input to AI models. The three-dimensional structure and functional sites of CYP450 enzymes are determined by their protein sequences, which are made up of amino acids. Advances in bioinformatics and sequence-based characteristics allow for precise predictions of enzyme structure and function. By capturing intricate interactions between ligands and protein sequences, recent advances in machine learning12 and deep learning, such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), and graph neural networks (GNNs), have improved predictions of CYP450 inhibitors. These algorithms, trained on large datasets of known CYP450 inhibitors and non-inhibitors, yield reliable classifiers for predicting the inhibitory potential of new compounds. Model performance and generalizability are improved by methods like transfer learning and multitask learning, while accuracy and dependability are ensured by cross-validation and external validation.
To improve the prediction of CYP450 inhibitors, this study investigates how to incorporate ligand characteristics and protein sequence data into AI models. We present our findings on the enhanced predictive performance attained by this integrative strategy, along with a discussion of the methods for feature extraction, model training, and validation. Our findings show the potential of AI-driven models to greatly advance the fields of drug metabolism and safety evaluation, ultimately leading to safer and more effective drug development procedures. The primary contributions of this study include:
(1) A novel Multiple Encoders Network (MEN) AI model is proposed to predict CYP450 inhibitors by merging and assembling features from multiple pipelines: the Graph, FingerPrint, DDE, AAC, and PseAAC Encoders. All pipelines integrate attention modules to improve the feature extraction of ligands and protein target sequences.
(2) A novel attention mechanism, Residual Multi-Local Attention (ReMLA), is proposed to extract significant characteristics.
(3) Explainable heat maps are produced using RDKit to aid in visualizing the CYP450 inhibitor predictions.
Related work
Hybrid learning
Figure 2a shows the existing AI models by year, identified through a five-page Google Scholar search. Most of these methods are feature-based machine learning approaches that represent data (such as proteins or ligands) as numerical vectors. These vectors capture the distinct physicochemical composition of the machine learning (ML) model's training data13,14,15. Figure 2b shows how the AI techniques discussed are distributed over ML, deep learning (DL), and hybrids of the two: approximately 60% are DL techniques, 30% are ML approaches, and 10% are hybrid models.
A recent survey analyzed AI-based research on CYP450 inhibitor predictions from 2020 to 2024. Part (a) shows a yearly summary of publications, while part (b) categorizes these studies into Machine Learning, Deep Learning, and Hybrid AI models, highlighting trends in methodology preferences. This data was sourced from Google Scholar.
In prediction tasks like CYP inhibition, one of the most used computational techniques is quantitative structure–activity (property) relationship (QSAR/QSPR) modeling16. QSAR/QSPR models are designed using linear, non-linear, or empirical functions of the chemical composition of molecules. Chemical representations include the simplified molecular input line entry system (SMILES)17, fingerprints, 2D/3D structures, and SMARTS. These representations can serve as training data for artificial intelligence (AI) models. Molecular property prediction increasingly uses raw molecular structures as inputs for artificial neural networks (ANNs) and deep learning. Heat capacity, protein structure, bandgaps, toxicity, bioactivity, lipophilicity, water solubility, and odor descriptors have all been successfully predicted18,19,20. Many DL models and feature representations have been explored, including ANN-based fingerprint representations21 and combinations of multiple linear regression (MLR) with deep neural networks (DNNs)22. Techniques such as deep reinforcement learning, long short-term memory networks, convolutional neural networks, and gated recurrent units23,24 use the SMILES string-based format.
Neural networks
For CNN and in silico methods, various proposals have been made to predict CYP isoform inhibition. This section thoroughly examines and compiles the most recent AI-based research studies on predicting CYP450 inhibitors. In 2024, Njimbouom et al.25 presented MuMCyp_Net, a multimodal neural network for CYP450 inhibitor prediction. MuMCyp_Net incorporates many feature descriptors, including RDKit, MACCS, and Morgan fingerprints, to improve prediction accuracy. Nevertheless, this method offers no biological interpretability. Interpreting neural networks can be challenging, particularly when they have intricate topologies and numerous data modalities. Explainability is critical in drug development, where comprehending the logic underlying a model's predictions is especially important for trust and validation. In the same year, Chang et al.26 introduced the DeepP450 framework, which enables the prediction of small-molecule activity on human P450 enzymes. Protein sequences of P450 enzymes were encoded using a pre-trained language model, like those used in natural language processing, making it easier to identify intricate patterns and correlations in enzyme activity. However, DeepP450's pre-trained protein language model may not generalize well to other P450 enzyme types or organisms. The system may exhibit bias towards the training set or fail to incorporate all pertinent properties for a range of enzyme–substrate interactions. In 2023, Chen et al.27 employed the Keras framework and TensorFlow to train models on a large dataset of 85,715 compounds from the PubChem bioassay database. For the various CYP isoforms (CYP1A2, CYP2C9, CYP2C19, CYP2D6, and CYP3A4), they achieved high area under the curve (AUC) values of 0.94–0.97.
Although their research offers structural alerts (SAs) for inhibition, deep learning models, especially neural networks, often function as "black boxes," making it challenging to fully comprehend how various structural aspects influence predictions and how decisions are made. They used single-task models for each CYP isoform, in contrast to multi-task models, which predict several isoforms simultaneously and could potentially increase predictive performance and provide a more comprehensive understanding of inhibition across distinct CYP enzymes. In 2023, another AI model was put forth28 that sought to predict substrates and inhibitors of the cytochrome P450 2B6 enzyme. The study addresses the difficulty of recognizing these substances, since inaccurate predictions can result in negative drug responses and ineffective treatments. From databases and published literature, the researchers assembled a dataset of known CYP2B6 substrates and inhibitors. Chemical compounds were represented by a variety of molecular descriptors, such as structural, electronic, and physicochemical properties. Numerous machine learning methods were utilized, including Random Forest, Support Vector Machines (SVM), Gradient Boosting Machines, and Neural Networks. When trained on a subset of the data, these models achieved different levels of accuracy, sensitivity, and specificity in finding inhibitors and substrates, and were verified using cross-validation procedures to evaluate their prediction performance. The selection of features affects how well ML models work: not all of the chemical descriptors the authors chose may be useful for predicting CYP2B6 interactions, and poor feature selection can impair model performance because biological systems are dynamic and complicated. Interactions between drugs and enzymes can change depending on several circumstances, including concentration and the presence of other drugs.
It is possible that the ML models utilized did not consider this. DeepCYPs, a method for encoding chemical structures that uses graph-based representations and molecular descriptors to capture pertinent aspects of compounds, was proposed by Da et al.29. Using methods like cross-validation to ensure robustness, their model was trained on multiple large datasets with known CYP activities. DeepCYPs may work well on the training data, but its effectiveness may be restricted when dealing with novel substances or other CYP enzyme types, which raises questions about how widely applicable the approach is. By identifying current medications that may inhibit CYP450 enzymes, Ouzounis et al.30 developed an AI framework in 2023 to enable drug repurposing and potentially reduce the time and expense involved in drug development. They used a range of AI algorithms, such as ensemble techniques31. Their approach showed significant improvements in prediction accuracy over existing methods, as evaluated on benchmark datasets. The way molecules are represented for training can have a big impact on how well such models work, and effective feature selection can influence predictive accuracy. Weiser et al.32 sought to enhance conventional docking methods with ML models that assess docking data, to increase the precision of predicting how different drugs will interact with CYP enzymes. To predict the probability that new drugs will be inhibitors, models are trained using a dataset of known CYP inhibitors and non-inhibitors. The quality of the features obtained by docking contributes to the success of the integrated approach. Qiu et al.33 in 2022 presented a model for predicting cytochrome CYP450 enzyme inhibitors using graph convolutional neural networks (GCNNs) with an attention mechanism. The authors used a unified design, which eliminates the requirement for separate models for each isoform, as in27.
Their model outperformed earlier DL techniques and conventional ML techniques in terms of accuracy and robustness after being tested on several datasets. The unified model they developed is effective for several tasks, but it may pose computational difficulties when applied to huge datasets or intricate molecular structures. The proposed attention method improves performance, but it also makes the model more complex, which makes tuning and optimization more difficult. To anticipate inhibitors of the human cytochrome P450 enzymes, a Multitask Learning and Molecular Fingerprint-Embedded Encoding approach34 was introduced in the same year. To lower the likelihood of unfavorable drug interactions, the proposed iCYP-MFE model can be used in drug discovery to screen for possible CYP inhibitors early in the drug development process. Depending on the precise chemical characteristics that are important for CYP inhibition, molecular fingerprints can have varying degrees of efficiency, and the selected fingerprint may miss some relevant interactions. Plonka et al.35 (2021) introduced Cyplebrity, an ML framework intended to predict inhibitors of cytochrome P450 (CYP) enzymes, to enhance drug safety and design. The authors concluded that ensemble ML techniques with strong feature sets are useful instruments for CYP inhibition prediction; such models can be applied in pharmacokinetic research, toxicity assessment, and drug discovery. Although their method yields precise predictions, it provides only a limited understanding of the underlying biochemical pathways causing CYP inhibition. The inability to analyze the results can make it difficult to comprehend how the predicted consequences relate to biology. The main goal of Park et al.'s work, also presented in 2021, was to develop a predictive model that can precisely determine if a small molecule inhibits particular human CYP enzymes essential for drug metabolism36.
Their method is likely to have trouble generalizing to new compounds or situations, especially if they are very different from what it was trained on.
Vision transformers approach
Recent studies on vision transformers (ViTs) for CYP450 inhibitor prediction show that these models can greatly increase the accuracy of in silico predictions, allowing for the early detection of possible DDIs. ViTs, which were first created for image analysis, have been modified for use with molecular data. They are capable of capturing long-range dependencies within molecular structures, which is crucial for precisely predicting interactions with CYP450 isozymes, including CYP1A2, CYP2C9, CYP2C19, CYP2D6, and CYP3A437,38,39. As demonstrated in the GTransCYPs model, integrating ViTs with graph neural networks (GNNs) is one promising strategy. Through attention pooling and molecular feature aggregation, this model achieves great performance in CYP450 inhibition prediction by combining transformer convolution layers with graph-based encoding. Research shows that across the main CYP450 isozymes, GTransCYPs improves the precision of differentiating between CYP450 inhibitors and non-inhibitors38. By providing insights into structure–activity connections that aid in drug discovery and development, these techniques highlight the versatility of ViTs for pharmacological tasks37,40. Furthermore, hybrid frameworks that combine autoencoders and ViTs have been investigated to increase predictive capacity for the classification of CYP450 inhibitors. These models demonstrate strong performance by utilizing deep neural network capabilities in conjunction with the representational strength of ViTs, making them promising instruments for preclinical and clinical drug screening applications39,40,41.
Ensemble learning
The ability of ensemble learning models, such as Random Forests, Gradient Boosting, and Extreme Gradient Boosting (XGBoost), to handle complicated molecular data and precisely capture its complex interactions has made them very successful in identifying CYP450 inhibitors. Drug-enzyme interactions are critical in pharmacology, and ensemble techniques help to avoid overfitting by pooling results from many models, improving predictive accuracy42,43. CYP450 inhibitor prediction has extensively used Random Forest (RF) models. Because RF models aggregate several decision trees, they are especially resistant to overfitting, a problem that often arises when working with various molecular descriptors and fingerprints. Research demonstrates that RF is successful in identifying enzyme inhibitors by improving predictive performance and adjusting model parameters using strategies like grid search42,44. CYP450 inhibitor prediction has also benefited greatly from gradient boosting, especially XGBoost, which can capture intricate nonlinear correlations between features. XGBoost-based models frequently employ molecular descriptors such as MACCS and Morgan fingerprints, which aid in capturing the structural and functional subtleties of CYP450 inhibitors for increased predictive accuracy44. Even greater predictive power than standalone models has been demonstrated by stacking ensemble methods, which combine algorithms such as RF, XGBoost, and deep learning models. For instance, the Ensemble of Extreme Learning Machines (EELM) successfully predicts inhibitory interactions with particular CYP isoforms, exhibiting high sensitivity and specificity characteristics that are necessary for precise drug-drug interaction evaluations45. Ensemble learning is a potential approach for CYP450 inhibitor prediction due to its adaptability and efficiency in processing a variety of chemical compounds, which will help ensure safer and more dependable drug design and development.
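As a minimal illustration of the ensemble idea described above, the sketch below averages the inhibitor probabilities produced by several hypothetical base models (soft voting). This is our own simplified example under stated assumptions; real RF or XGBoost ensembles combine and weight their trees internally during training.

```python
def soft_vote(prob_lists, threshold=0.5):
    """Soft-voting ensemble: average each sample's predicted inhibitor
    probability across base models, then apply a decision threshold.
    `prob_lists` is a list of per-model probability lists, all the same
    length. Illustrative sketch only, not the cited models' code."""
    n_models = len(prob_lists)
    n_samples = len(prob_lists[0])
    averaged = [
        sum(model[i] for model in prob_lists) / n_models
        for i in range(n_samples)
    ]
    return [1 if p >= threshold else 0 for p in averaged]

# Two hypothetical base models scoring the same three compounds:
labels = soft_vote([[0.9, 0.2, 0.6], [0.7, 0.4, 0.8]])  # → [1, 0, 1]
```

Averaging smooths out the idiosyncratic errors of individual models, which is one reason ensembles resist overfitting on diverse molecular descriptors.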
Materials and methodology
As illustrated in Fig. 3, this work introduces an AI-based framework called MEN, specifically designed for predicting CYP450 inhibitors. The MEN framework is structured with multiple processing pipelines, including the Fingerprint Encoder Network (FEN), Graph Encoder Network (GEN), and three distinct Protein Encoder Networks (PENs) tailored to protein sequences. The PEN encoder networks of AAC, PseAAC, and DDE are simultaneously integrated end-to-end within the framework. To enhance feature extraction, various attention modules are strategically embedded across all pipelines, enabling more accurate analysis of both compounds and protein target sequences. The GEN and FEN modules are optimized to process compound data, generating high-level deep features, while the PENs add complementary insights by analyzing protein sequences. The overall prediction performance is then improved by merging the output features derived from all pipelines using a feature-space ensemble technique. Meanwhile, interpretable heat maps are produced using RDKit to give an easily understood visual account of the CYP450 inhibitor decision-making process. The model's components are explained in detail in the subsequent sections.
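The feature-space ensemble step can be pictured, at its simplest, as vector concatenation: each encoder emits a fixed-length feature vector, and the fused representation joins them end to end. The sketch below is our own hedged illustration of that idea, not the framework's actual fusion code, which may apply additional attention or weighting.

```python
def fuse_features(*encoder_outputs):
    """Concatenate the fixed-length feature vectors produced by the
    individual encoders (e.g. FEN, GEN, and the three PENs) into a
    single fused representation. Illustrative only."""
    fused = []
    for vector in encoder_outputs:
        fused.extend(vector)
    return fused

# Hypothetical 3-, 2-, and 2-dimensional encoder outputs:
combined = fuse_features([0.1, 0.2, 0.3], [0.7, 0.9], [0.4, 0.5])
```

The fused vector preserves every encoder's contribution, leaving the downstream classifier to learn which modality matters for each prediction.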
Dataset
We used two input datasets to train, evaluate, and tune the proposed AI-based MEN framework: (1) drug ligand (compound) SMILES strings gathered from the PubChem repository46 and (2) target protein sequences retrieved from the PDB database47. The subsequent sections elaborate on these datasets.
Ligand data from the PubChem database
The SMILES48 strings dataset was obtained from the PubChem database. SMILES strings are text-based notations that give a simple, understandable depiction of molecular structures; they comprise symbols and alphanumeric characters that define the molecular bonds and atoms. Figure 4 shows a few examples of compounds or ligands taken from the PubChem database. The SMILES string for each of these compounds is given below it, and the number above each compound indicates its PubChem ID. Every entry in the SMILES dataset is represented by a SMILES string and refers to a distinct chemical molecule. Millions of SMILES strings covering a wide range of organic and inorganic compounds are freely accessible through PubChem. These strings are commonly used in cheminformatics for structure–activity relationship studies, compound identification, and similarity searches. Twenty-one chemical or ligand datasets, which describe the five CYP isoforms (CYP1A2, CYP2C9, CYP2C19, CYP2D6, and CYP3A4) and their inhibitory activities, were acquired from the PubChem database46. We selected datasets with sample sizes exceeding 100 samples to achieve the goal of the proposed AI prediction system. For each of the 21 datasets, sample statistics and data for training and validation are shown in Table 1. To provide a fair comparison, the dataset configuration used in33 was followed. The test set distribution among the five CYP450 inhibitors is displayed in Table 2. Molecules with activity scores greater than 40 and a curve class of 1.1, 1.2, or 2.1 were classified as inhibitors. Compounds not meeting these criteria were labeled as inconclusive and excluded from further analysis.
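The labeling rule just described can be expressed as a small helper. The function name and field names here are our own illustrative choices, and the full PubChem assays also define non-inhibitor classes that this simplified sketch omits.

```python
def label_compound(activity_score, curve_class):
    """Label a bioassay record using the rule stated above: an inhibitor
    needs an activity score above 40 AND a curve class of 1.1, 1.2, or
    2.1; anything else is treated as inconclusive and excluded."""
    if activity_score > 40 and curve_class in (1.1, 1.2, 2.1):
        return "inhibitor"
    return "inconclusive"
```

For example, a record with score 55 and curve class 1.1 is kept as an inhibitor, while one with score 30 is dropped regardless of its curve class.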
Examples of the chemical compounds. This dataset was sourced from the PubChem database37.
CYP target protein sequence dataset
The CYP protein target sequences were obtained from the Protein Data Bank (PDB)47. The linear sequence of amino acids linked together by peptide bonds is identified as the primary structure of a protein, and PDB data contain the full primary sequence of each protein structure. The PDB includes structures for proteins from various organisms, including humans, bacteria, viruses, yeast, and model organisms such as mice. Every PDB entry has comprehensive details about the sequence, such as its length, the chain identifier(s) that correspond to the protein(s) in the structure, and occasionally annotations concerning domains, functional sites, and mutations. The significant sequence variety observed in CYP450 enzymes is indicative of their adaptability to metabolize a diverse array of substrates. Each CYP isoform in this work was denoted as a sequence consisting of important ligand-binding residues. Figure 5 shows the protein sequences and three-dimensional (3D) structures that we gathered from the PDB for CYP1A2 (PDB ID: 2HI4), CYP2C9 (PDB ID: 6VLT), CYP2C19 (PDB ID: 4GQS), CYP2D6 (PDB ID: 4WNW), and CYP3A4 (PDB ID: 4D7D). The Biovia Discovery Studio tool and PyMOL software were used to perform simulations to determine the essential binding residues for each of the five CYP450 isoforms. Dashed lines show distances within 5 angstroms of the CYP450 protein targets. Below the structures are the corresponding protein sequences for every isoform. The residues are specified on the second row using the three-letter notation of the protein residue name according to IUPAC49 (amino acid or protein codes used in bioinformatics). Methionine, for instance, is represented as MET, and threonine as THR. We used the one-letter IUPAC representation of these protein residue names in our experiments, as per33. The one-letter notations for methionine and threonine are M and T, respectively.
Following33, we exclusively employed the first 42 binding residues of chain A for each CYP450 protein target sequence. The third row provides the 42 binding residues for each isoform under the corresponding protein target.
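The three-letter to one-letter conversion mentioned above is a fixed IUPAC mapping over the 20 standard amino acids. A minimal helper might look like this (the function name is ours, not from the paper):

```python
# IUPAC three-letter → one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

def to_one_letter(residues):
    """Convert a list of three-letter residue names (e.g. parsed from a
    PDB entry) into the compact one-letter string used as model input."""
    return "".join(THREE_TO_ONE[r.upper()] for r in residues)
```

For the example given in the text, `to_one_letter(["MET", "THR"])` yields `"MT"`.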
Examples from the CYP Target Protein Sequence dataset (Section “Examining ReMLA”) are shown. This dataset, sourced from the Protein Data Bank (PDB)39, includes the 3D structures of each of the five CYP450 isoforms along with their ligand inhibitors, depicted as balls in the first row. In the second row, each isoform’s key binding residues within a 5-angstrom radius of the ligands are displayed, with ligand inhibitors highlighted in yellow.
Preprocessing and input matrix handling
Prior to model input, we ensured that all data were consistently preprocessed. Protein sequences were cleaned and standardized before computing descriptors. Compound SMILES strings were canonicalized using RDKit to remove inconsistencies and invalid entries. The molecular graphs were constructed with appropriate atom- and bond-level features, while fingerprint vectors were normalized where necessary. Finally, we implemented custom batching procedures to accommodate the varying input formats across fingerprint-based, graph-based, and sequence-based data, ensuring compatibility with our model architecture during both training and evaluation phases. In addressing missing values in the ligand data, we employed molecular structure similarity, specifically the Tanimoto coefficient to identify structurally similar compounds and impute missing values based on their shared features. Additionally, we occasionally applied SMILES string augmentation by reconstructing incomplete strings using established structural patterns. To standardize the ligand graph representations, Z-score normalization was employed. In a few cases where data was incomplete or missing in the protein sequence, we leveraged sequence motifs such as binding sites illustrated in Fig. 5 along with relevant biological knowledge to infer the missing information based on the surrounding sequence context.
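The Tanimoto coefficient used above for similarity-based imputation compares two binary fingerprints as the ratio of shared "on" bits to total "on" bits. Below is a minimal sketch, assuming fingerprints are given as sets of on-bit indices; RDKit offers an equivalent routine over its native bit-vector types.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) coefficient between two binary fingerprints,
    each given as a collection of 'on' bit positions: |A ∩ B| / |A ∪ B|."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:  # both fingerprints empty
        return 0.0
    return len(a & b) / len(union)
```

Two fingerprints sharing two of four total on bits score 0.5; the most similar neighbors above a chosen cutoff would then donate feature values for imputing a compound's missing entries.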
In our approach, we processed compound and protein sequence data using a combination of descriptor-based and graph-based representations to effectively prepare inputs for our machine learning model. For compound data, we utilized three distinct molecular fingerprints—MACCS keys, Pharmacophore ErG, and PubChem fingerprints—which provided complementary structural and substructural information about each molecule in fixed-length vector formats. In addition to these descriptor-based encodings, we also employed a graph-based representation of the compounds by converting SMILES strings into molecular graphs, where atoms and bonds were represented as nodes and edges respectively, enabling the use of graph neural networks to capture topological and connectivity-based features. For protein sequences, we extracted three types of descriptors: Amino Acid Composition (AAC), Pseudo Amino Acid Composition (PseAAC), and Dipeptide Deviation from Expected Mean (DDE). These feature extraction methods captured both global compositional information and local sequence-order information, offering a diverse and informative representation of protein sequences. All descriptors were computed using fixed-length numerical feature vectors, allowing for seamless integration with the compound representations.
The proposed MEN framework for the prediction of CYP450 inhibitors
As shown in Fig. 3, the presented MEN model is made up of five integrated modules: (1) the Fingerprint Encoder Network (FEN), (2) the Graph Encoder Network (GEN), (3) three Protein Encoder Networks (PEN) based on AAC, PseAAC, and DDE, (4) the Position Awareness Module (PAM), and (5) the Vision Transformer with Dynamic Attention (ViTDA) with an interpreter. FEN and GEN were engineered to process ligands, while the PEN handled protein sequences. Predictions were made using the ViTDA, and the results were interpreted using the interpreter. Three molecular fingerprints (MACCS, Pharmacophore ErG, and PubChem fingerprint) were adopted in FEN to retrieve distinct information from each compound. GEN processes the SMILES as a graph structure, integrating two novel proposals: ReMLA (Residual Multi-Local Attention) and AHGIN (Attentive Hierarchical Graph Isomorphism Network). Each Encoder Network in PEN is constructed by incorporating an embedding layer, a self-attention layer, a hybrid pooling layer, and a convolutional layer. MEN is derived as,
where \(X_{FEN}, X_{GEN}, X_{PEN}\) are the respective inputs of FEN, GEN, and PEN (which takes the sequence data), \(Gm\) is the global merger function, and \(pam\) is the position awareness algorithm that produces the position awareness tokens fed to \(ViTDA\), which produces the prediction result. \(X'_{FEN}, X'_{GEN}, X'_{PEN}\) represent the output features from the three encoders of MEN, respectively.
Fingerprint encoder network (FEN)
The Fingerprint Encoder Network (FEN) is engineered using three different fingerprints: MACCS50, Pharmacophore ErG51, and the PubChem fingerprint52.
-
MACCS The MACCS (Molecular ACCess System) fingerprint is a type of molecular fingerprint used in cheminformatics to represent chemical structures. It uses a binary vector, usually of 166 bits, to encode whether certain substructures or chemical properties are present in a molecule. Each bit represents a predetermined chemical pattern or substructure, such as a particular atom type, ring system, or functional group. The chemical properties encoded by the MACCS fingerprint, including functional groups, atom types, and specific bond patterns, frequently correspond to the biological function of the molecule, especially its inhibitory potential. Structural patterns that are more common in inhibitors can be found by comparing the MACCS fingerprints of known inhibitors and non-inhibitors; if specific bits (representing substructures) are often present in inhibitors but absent in non-inhibitors, these patterns can act as unique markers.
-
Pharmacophore ErG The Pharmacophore Extended-Reduced Graph (ErG) fingerprint is a cheminformatics technique that characterizes chemical structures by their key pharmacophoric properties and the spatial relationships among them. It captures essential molecular characteristics crucial for interactions with biological targets, including hydrophobic areas, aromatic rings, charged groups, and hydrogen bond donors and acceptors. Inhibitors of a given target usually share certain pharmacophoric configurations necessary for binding; ErG fingerprints capture these configurations, facilitating the identification of patterns linked to inhibitory activity. Based on structural and pharmacophoric similarity, the probability that a new compound functions as an inhibitor can be evaluated by comparing its ErG fingerprint with those of established inhibitors.
-
PubChem fingerprint The PubChem fingerprint is a binary vector obtained from the PubChem database that usually has 881 bits. Each bit indicates whether a certain chemical substructure or feature is present in the molecule. These substructures consist of bonding patterns, ring systems, functional groups, and atom categories. The PubChem fingerprints of known inhibitors and non-inhibitors can be compared to see whether structural patterns or characteristics are more common in inhibitors. The similarity scores between molecules can be computed using this fingerprint. Prioritization for additional screening is made possible by the higher likelihood of inhibitory activity in compounds with fingerprints resembling those of known inhibitors.
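The bit-level comparison between inhibitors and non-inhibitors described for these fingerprints can be sketched as a simple per-bit frequency analysis. The following pure-Python snippet is an illustration only (the real fingerprints would be 166-, or 881-bit vectors from RDKit or PubChem; the `margin` threshold is an assumed parameter):

```python
def bit_frequencies(fps):
    """Per-bit frequency across a list of equal-length binary fingerprints."""
    n = len(fps)
    return [sum(fp[i] for fp in fps) / n for i in range(len(fps[0]))]

def enriched_bits(inhibitor_fps, non_inhibitor_fps, margin=0.5):
    """Indices of bits that are markedly more common in inhibitors than in
    non-inhibitors; such bits can act as candidate structural markers."""
    f_in = bit_frequencies(inhibitor_fps)
    f_out = bit_frequencies(non_inhibitor_fps)
    return [i for i, (a, b) in enumerate(zip(f_in, f_out)) if a - b >= margin]
```

Bits flagged this way correspond to substructures that are candidates for driving inhibitory activity and can be prioritized for further inspection.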
After combining these fingerprint features to create the joint fingerprint input feature for ReMLA, the output was processed by a convolution layer with a Mish activation. It was then sent through three fully connected layers to produce the final FEN output vector, which is derived as,
where \(X'_{FEN}\) is the result generated by FEN, and \(X_{J_{f}}\) is the joint fingerprint input. \(X_{J_{f}}\) is written mathematically as
The following paragraphs describe each component of the FEN pipeline, depicted in detail in Fig. 6.
Residual multi-local attention (ReMLA)
The Residual Multi-Local Attention (ReMLA) module consists of two residually connected Multi-Local Attention (MLA) modules. Residual connections are known to mitigate the vanishing gradient issue in deep networks by giving gradients a direct channel during backpropagation; for MLA, where several layers may be applied successively, this is extremely helpful. Without residual connections, effective training may be hampered by gradients that diminish as they propagate back through each attention layer. Each MLA is made up of two localized attention modules with distinct receptive fields. For each input feature, different feature scales are obtained for each receptive field \(R\). By employing multiple kernel sizes with different receptive fields, the size of aggregation can be enhanced. Figure 6 illustrates the detailed ReMLA module. Given the feature \(X \in R\) from the former layer, we express LA as,
where \(\sigma\) denotes the sigmoid function and \(f_{a}(\cdot)\) is the attention function with a kernel weight of the form,
The \(i\)-th weight of \(x_{i}\) is formulated as
where \(\Omega_{k}\) denotes a set of \(k\) features that are neighbors of \(x_{i}\). We implemented this with a 1D convolution, expressed as \(\tilde{X} = \sigma(C1D_{k}(K)) * X\), where \(C1D\) indicates the 1D convolution function with kernel size \(k\). By using different receptive fields and Eq. 1, MLA can be mathematically written as,
where \(r_{1} = 5,r_{2} = 7\) and \(DC() \in R^{C}\). The Dual Conv (DC) is the 1D convolution function, with a learnable weight \(w \in R^{C \times C \times 2}\). Thus, REMLA can be expressed as,
Here, the input \(X\) is processed by the first MLA layer \(MLA_{1}\). The second MLA layer \(MLA_{2}\) then processes the output of the first. \(\lambda\) denotes the scaling factor, and \(Z\) is the final output obtained after applying the residual connection, which adds the original \(X\) to the output of the second MLA layer.
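The residual composition described above, \(Z = X + \lambda \cdot MLA_{2}(MLA_{1}(X))\), can be sketched numerically in plain Python. This is a simplified stand-in, not the actual implementation: the sigmoid gate over a \(k\)-neighborhood average replaces the learned 1D-convolution attention, and summing the two receptive-field branches is an assumption made for illustration.

```python
import math

def local_attention(x, k):
    """Sigmoid gate computed from a k-neighbourhood average, applied
    elementwise; a stand-in for the 1D-convolution attention f_a."""
    half = k // 2
    out = []
    for i, v in enumerate(x):
        window = x[max(0, i - half): i + half + 1]
        gate = 1.0 / (1.0 + math.exp(-sum(window) / len(window)))
        out.append(gate * v)
    return out

def mla(x):
    """Multi-Local Attention: two branches with receptive fields 5 and 7."""
    a5 = local_attention(x, 5)
    a7 = local_attention(x, 7)
    return [u + v for u, v in zip(a5, a7)]

def remla(x, lam=0.1):
    """Residual Multi-Local Attention: Z = X + lam * MLA2(MLA1(X))."""
    return [xi + lam * yi for xi, yi in zip(x, mla(mla(x)))]
```

The residual term keeps the original feature \(X\) intact while the stacked MLA layers contribute a scaled refinement, which is what stabilizes gradients in the deeper configurations.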
Graph encoder network (GEN)
The SMILES string is used to create a molecular graph \(G = (N,E)\), where \(N = \{n_{i}\}\) denotes the set of nodes representing atoms and \(E = \{e_{ij}\}\) is the set of edges representing the bonds between atoms. Each node \(n_{i}\) carries a feature vector \(v_{i}^{(0)} \in R^{d}\) made up of features such as atom type and valence. The input \(V^{(0)} \in R^{|N| \times d}\) is a matrix whose every row is the feature vector \(v_{i}^{(0)}\) for \(n_{i}\); \(V^{(2)}\) denotes the normalized feature matrix. Figure 7 illustrates the process flow for creating a molecular graph from the SMILES of the compound. The RDKit package53 is used to validate these SMILES and convert them into molecular objects. After giving each atom in the molecule a unique identity, the molecular graph was traversed to finalize the graph representation. Figure 8 shows the Graph Encoder Network (GEN). The graph input was first processed by the Residual Multi-Local Attention (ReMLA) module to retrieve the most relevant node features. These node attributes were then normalized prior to being processed by the Attentive Hierarchical Graph Isomorphism Network (AHGIN). The main high-level features from the ligand graph input were then extracted using a hybrid pooling layer, which combines both max and mean pooling, after which the outcome was processed with a linear layer. The output of GEN is given as:
where \(X_{G}\) is the graph input, GEN is the Graph Encoder Network, and \(X^{\prime}_{GEN}\) is the output from GEN. Each component of the GEN network pipeline is described in depth in the paragraphs that follow.
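The graph construction step can be sketched schematically. The snippet below builds a node feature matrix and adjacency matrix from hand-specified atoms and bonds; in the actual pipeline RDKit parses the SMILES and supplies richer atom features, so the toy vocabulary and one-hot encoding here are illustrative assumptions.

```python
def build_graph(atoms, bonds, feature_fn):
    """Build (node_features, adjacency) for a molecular graph G = (N, E).

    atoms:      list of atom symbols (the nodes)
    bonds:      list of (i, j) index pairs (the undirected edges)
    feature_fn: maps an atom symbol to a fixed-length feature vector
    """
    n = len(atoms)
    features = [feature_fn(a) for a in atoms]
    adj = [[0] * n for _ in range(n)]
    for i, j in bonds:
        adj[i][j] = adj[j][i] = 1  # bonds are undirected
    return features, adj

# Toy one-hot atom features over a tiny vocabulary (illustrative only).
VOCAB = ["C", "O", "N"]
one_hot = lambda a: [1 if a == v else 0 for v in VOCAB]
```

Each row of `features` corresponds to one \(v_{i}^{(0)}\), and the adjacency matrix encodes the edge set \(E\) consumed by the graph layers.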
Applying ReMLA to graph input
The ReMLA layer is applied to the node feature matrix \(V^{(0)}\). Let \(MLA(V)\) represent a single MLA layer applied to node features \(V\). The output from ReMLA can be written as,
where \(MLA_{1} (V^{(0)} )\) modifies the input node features with the first MLA layer. The output of \(MLA_{1}\) is further processed by \(MLA_{2}(\cdot)\), the next MLA layer. \(V^{(1)}\) is the feature matrix generated after ReMLA is applied.
Node normalization
Node normalization \(Norm(\cdot)\) is employed after ReMLA to adjust node attributes and stabilize the learning process. For each node \(n_{i}\),
where \(v_{i}^{(l)}\) represents the feature vector of the node \(n_{i}\) after ReMLA, \(\mu_{i}\) and \(\sigma_{i}\) signify the mean and standard deviation of the features in \(v_{i}^{(l)}\), respectively, and \(v_{i}^{(2)}\) represents the normalized feature vector for the node \(n_{i}\). The corresponding normalized feature matrix is \(V^{(2)}\).
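The per-node z-score normalization can be written out directly; the small epsilon guarding against a zero standard deviation is an implementation assumption.

```python
import math

def normalize_node(v, eps=1e-6):
    """Z-score normalize one node's feature vector: (v - mean) / (std + eps)."""
    mu = sum(v) / len(v)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in v) / len(v))
    return [(x - mu) / (sigma + eps) for x in v]

def normalize_nodes(V):
    """Apply per-node normalization to the whole feature matrix V."""
    return [normalize_node(v) for v in V]
```

Each normalized vector has (approximately) zero mean, which keeps the node features on a comparable scale before AHGIN processes them.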
Attentive hierarchical graph isomorphism network (AHGIN)
Following node normalization, an attention technique is used to capture local and global graph structure using the Attentive Hierarchical Graph Isomorphism Network (AHGIN).
Composite pooling layer
Max and mean pooling strategies are combined in the Composite pooling layer. Both local and global ligand graph properties can be captured by applying both methods to the AHGIN result. The mean pooling smooths out individual node changes and captures the ligand’s global structure by providing an average representation of the graph’s nodes. The mean pooling across the last node embeddings from AHGIN can be calculated mathematically as follows:
where \(h_{i}^{(U)}\) signifies the embedding of the node \(i\) after the final GIN layer \(U\) and hierarchical pooling. The total number of nodes (atoms) in the ligand graph is given as \(|N|\). Regardless of the relative importance of each node, mean pooling averages all node embeddings to capture the overall structural information. In a ligand, this gives an appropriate representation of the general features of the molecule, such as average polarity or distribution of atom types, which are necessary for interactions across the whole molecule. Max pooling captures distinctive, high-importance aspects in the graph by highlighting the most noticeable features across all nodes. Max pooling can be stated mathematically as,
where the greatest value in each feature dimension is chosen by applying max element-wise across the final node embeddings. Max pooling draws attention to important atoms in the ligand, such as those that are part of active sites or functional groups (such as an aromatic ring or carboxyl group), which are frequently crucial for binding or interaction. The model’s capacity to identify ligands with functional properties is improved by max pooling, which highlights nodes that might be crucial for ligand action by capturing these distinctive traits. Once both \(V_{mean}\) and \(V_{\max }\) have been calculated, we concatenate them to create a complete graph representation.
This mixed representation \(V_{concat}\) is very expressive because it incorporates both important local elements and global structure.
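The composite pooling step reduces to a few lines: mean-pool and max-pool the node embeddings along each feature dimension, then concatenate the two vectors.

```python
def composite_pool(H):
    """Concatenate mean- and max-pooled node embeddings.

    H: list of node embedding vectors (one per atom) from the final GIN layer.
    Returns a vector of length 2*d: [mean_1..mean_d, max_1..max_d].
    """
    n, d = len(H), len(H[0])
    v_mean = [sum(h[j] for h in H) / n for j in range(d)]
    v_max = [max(h[j] for h in H) for j in range(d)]
    return v_mean + v_max
```

The mean half summarizes the molecule's global character, while the max half preserves the most salient per-dimension signal, matching the motivation given above.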
Protein encoder network (PEN)
Three separate encoders make up the Protein Encoder Network. The three protein descriptors that are employed are Amino Acid Composition (AAC), Pseudo Amino Acid Composition (PseAAC), and Dipeptide Deviation from Expected Mean (DDE). Figure 9 shows the detailed PEN architecture. These descriptors are explained below,
-
Amino acid composition (AAC) Without considering the amino acid order, the AAC descriptor determines the frequency of each amino acid in a protein sequence. It offers a straightforward representation of the overall amino acid composition of the protein. The relative frequency of each type of amino acid in the protein sequence is determined by AAC. The pairwise interactions between amino acids and the sequence order are not considered. The result shows the fraction of each amino acid as a 20-dimensional vector, with one dimension for each amino acid. AAC offers a fundamental compositional profile that captures the general biochemical characteristics of the CYP protein, including charge, polarity, and hydrophobicity.
-
Pseudo amino acid composition (PseAAC) PseAAC incorporates sequence order and other biochemical features in addition to basic frequencies, which may be important for extracting structural as well as functional information. AAC is first calculated as a base to obtain the basic frequency of each amino acid. Correlation functions that take sequence order into account, or other physicochemical characteristics (such as hydrophobicity or charge), are also computed. The PseAAC vector is created by concatenating the AAC values with these extra features. Depending on the extra attributes selected, the output is a variable-length vector that is usually longer than the AAC vector. PseAAC depicts proteins in a way that more accurately reflects structural and functional motifs by incorporating information about both composition and sequence order. This makes it particularly helpful for tasks involving classification and functional prediction.
-
Dipeptide deviation from expected mean (DDE) DDE computes the deviations of dipeptide (amino acid pair) frequencies from the values expected under a random distribution. This draws attention to variations brought about by sequence patterns, which may represent functional motifs. The deviation for each dipeptide is represented in a 400-dimensional output vector, because each of the 20 amino acids can form 20 potential pairs. Due to its ability to highlight dipeptides that occur more or less frequently than anticipated, DDE is especially helpful for finding sequence-specific features like structural motifs or binding areas.
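The AAC and DDE descriptors above can be sketched in a few lines of Python. Note that the DDE here is deliberately simplified: it reports the observed dipeptide frequency minus the frequency expected if residues occurred independently, whereas the full DDE descriptor also normalizes by a theoretical variance.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def aac(seq):
    """20-dim amino-acid composition: relative frequency of each residue."""
    n = len(seq)
    return [seq.count(a) / n for a in AMINO_ACIDS]

def dde_simplified(seq):
    """400-dim dipeptide deviation, simplified: observed dipeptide frequency
    minus the frequency expected under residue independence."""
    comp = dict(zip(AMINO_ACIDS, aac(seq)))
    n_pairs = len(seq) - 1
    out = []
    for a in AMINO_ACIDS:
        for b in AMINO_ACIDS:
            observed = sum(seq[i] == a and seq[i + 1] == b
                           for i in range(n_pairs)) / n_pairs
            out.append(observed - comp[a] * comp[b])
    return out
```

Both functions return fixed-length numerical vectors, which is what allows seamless concatenation with the compound representations later in the pipeline.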
AAC encoder
The AAC Encoder uses multi-layer processing with attention to transform the simple, frequency-based Amino Acid Composition (AAC) vector into a more expressive form. The AAC Encoder algorithm is shown below.
The dimensionality of AAC vector is increased by the initial linear layers and ReLU activations, enabling it to depict more intricate connections in the amino acid composition data. While residual connections preserve stability during training, the REMLA layers draw attention to significant regions of the AAC vector that may indicate crucial amino acids or patterns in frequency distributions. The processed data is condensed into a concise, comprehensible representation by the fully connected network, preparing it for subsequent tasks such as categorization. For various protein research tasks, our AAC encoder efficiently converts the original AAC input into a feature-rich embedding, capturing composition information and local patterns through attention.
PseAAC encoder
The PseAAC Encoder converts the Pseudo Amino Acid Composition (PseAAC) vector into a more expressive feature representation. The PseAAC vector includes information on physicochemical qualities, sequence-order correlations, and amino acid composition. In retrieving important information, each encoder block uses a REMLA mechanism, ReLU activation, and linear transformation. The Pseudo Amino Acid Composition (PseAAC) of a protein is represented by a variable-dimensional vector sent into the PseAAC Encoder. The 20 standard amino acid frequencies are usually included in this vector, together with other features that capture sequence order or metabolic characteristics. PseAAC Encoder’s algorithm is comparable to AAC Encoder’s algorithm.
DDE encoder
A 400-dimensional vector containing a protein’s DDE characteristics is sent into the DDE encoder. Every element covers every potential dipeptide pair consisting of the 20 standard amino acids and represents the departure from the expected mean of a particular dipeptide frequency. The encoder can record intricate, non-linear relationships inside the DDE vector with assistance from the linear layers and ReLU activations. By concentrating on crucial dipeptide deviation patterns, the REMLA mechanism improves the model’s capacity to collect significant biological or functional data. The feature representation is further compressed and improved by the FCN, prepping it for activities that come after.
Local merger
This merger function ensures that each feature type contributes effectively to the final merged representation by integrating the encoded outputs from each of the three encoders (i.e., AAC, PseAAC, and DDE) using concatenation and linear transformations.
\(W_{AAC}, W_{PseAAC}, W_{DDE}\) are the initial transformations that combine each encoder’s output into the same feature space. These converted vectors are stacked in the concatenation phase to maintain the dimensional disparities and the distinct information from each encoder. The concatenated vector is compressed into the ultimate output size by this transformation, creating a single merged representation \(y_{merged}\) for subsequent tasks.
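The merger pattern of project-concatenate-compress can be sketched with plain matrix-vector products; the weight matrices here are placeholders for the learned parameters.

```python
def linear(W, x):
    """One linear transformation: matrix-vector product W @ x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]

def local_merger(y_aac, y_pse, y_dde, W_aac, W_pse, W_dde, W_out):
    """Project each encoder output into a shared space, concatenate,
    then compress with a final linear map into the merged vector."""
    z = linear(W_aac, y_aac) + linear(W_pse, y_pse) + linear(W_dde, y_dde)
    return linear(W_out, z)
```

The same structure applies to the global merger, with FEN, GEN, and PEN outputs taking the place of the three protein encoders.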
Global merger
This technique combines unique information from each encoder, such as protein descriptors, fingerprint features, and graph-based compound data. The global merger concatenates the linear outputs of each network after applying a weighted transformation. The concatenated vector is reduced to a single feature vector via a final linear transformation.
\(W_{FEN}, W_{GEN}, W_{PEN}\) transform each encoder’s output into a unified \(d_{merged}\)-dimensional space. This provides a standard concatenation of features across networks. Stacking the converted vectors preserves information specific to fingerprints, graph-based features, and protein data by combining distinct information from each encoder. The last linear transformation gives the feature representation \(y_{global}\), projected into a final dimension \(d_{final}\) appropriate for downstream tasks such as prediction or classification. The outputs from these three encoders are effectively combined by this global merger, producing a thorough representation that includes important features from every network.
Position awareness module (PAM)
The concept of "position-aware" refers to the ability of a model (like a transformer) to recognize and incorporate the order or location of tokens within a sequence. Without positional embedding, a transformer processes all tokens independently of their order, which is problematic for sequential data like sentences, protein sequences, or other structured data. Positional embedding provides this position-awareness by encoding the order information into the token representations. Figure 10 shows the detailed PAM architecture.
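One standard way to inject this position awareness, shown here as an illustrative sketch rather than the exact PAM implementation of Fig. 10, is the sinusoidal positional embedding, where each position is encoded by sines and cosines of geometrically spaced frequencies.

```python
import math

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional embeddings:
    PE[pos, 2i]   = sin(pos / 10000^(2i/d_model))
    PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model))"""
    pe = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            angle = pos / (10000 ** ((i // 2 * 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe
```

Adding these rows to the token embeddings gives every token a unique, smoothly varying signature of its position, which the attention layers can then exploit.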
Vision transformer with dynamic attention (ViTDA)
A Dynamic Attention mechanism has been added to the transformer-based ViTDA model. In contrast to conventional self-attention, dynamic attention modifies attention weights according to content features and positional embeddings in a manner specific to each token (or data type). This modification enables the ViTDA to dynamically shift its focus and prioritize significant interactions within the data according to particular correlations between inputs, such as ligand features and protein binding sites. Because it can handle various kinds of data (such as chemical structures, molecular fingerprints, and protein sequences) and combine them into a coherent representation, the Vision Transformer with Dynamic Attention is particularly well-suited to the MEN architecture; MEN can leverage ViTDA’s potential, making it ideal for applications involving the interpretation of multi-modal data. Figure 11 shows the detailed ViTDA.
Predicting CYP450 inhibition requires the model to highlight pertinent aspects within each data type, which is made possible by the Vision Transformer’s dynamic attention mechanism. It enables the model to focus on fingerprint, graph, or sequence characteristics that might point to inhibitory relationships. MEN operates dynamically, allowing it to adapt to various kinds of inhibitors and non-inhibitors.
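The idea of content-dependent modulation can be illustrated with a simplified stand-in for ViTDA's mechanism: scaled dot-product attention whose logits are scaled per token by a gate. The gate values here are supplied directly for illustration; in the actual model they would be produced from content features and positional embeddings.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dynamic_attention(q, keys, values, gates):
    """Scaled dot-product attention with per-token gate-modulated logits,
    a simplified sketch of dynamic attention (not the actual ViTDA)."""
    d = len(q)
    logits = [g * sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
              for k, g in zip(keys, gates)]
    w = softmax(logits)
    return [sum(wi * v[j] for wi, v in zip(w, values))
            for j in range(len(values[0]))]
```

When all gates are equal the mechanism reduces to ordinary attention; unequal gates let the model amplify or suppress individual tokens, which is the behavior the text attributes to ViTDA.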
Interpreter
As seen in Fig. 11b, the Interpreter is a model evaluation tool that shows and explains the attention weights produced for a particular ligand–protein interaction by the Vision Transformer with Dynamic Attention (ViTDA). By comparing these attention weights to data from docking experiments, the Interpreter assists in determining whether MEN accurately identifies the atoms or substructures that contributed most to the ligand’s binding affinity or inhibitory potential for the CYP450 enzyme. The docking experiment was simulated using the BIOVIA Discovery Studio software54, while the RDKit tool53 was used to produce the attention weights shown in Fig. 11. Comparing the two provides evidence of MEN’s interpretability and accuracy, revealing how well the internal attention weights correspond to the binding interactions confirmed by simulation. Thus, the Interpreter helps researchers assess and improve model predictions in ligand–protein interaction investigations by acting as an interface between biological docking experiments and machine learning attention observations.
Evaluation metrics
The following metrics were used to evaluate the performance of the proposed model: accuracy (Acc), specificity (Spe), precision (Pre), area under the ROC curve (AUC), F1-score, sensitivity (Sen), and Matthews correlation coefficient (MCC). These are defined below in terms of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN).
Training environment design
In this work, the proposed MEN framework was implemented in PyTorch 2.1.2 and run on an RTX 2070 GPU. We employed the Adam optimizer with a learning rate of 0.0005 and a batch size of 128 for training and evaluation. Early stopping and dropout regularization were employed to control overfitting by monitoring MEN’s performance on the validation dataset during training. Early stopping also improved MEN’s ability to generalize to unfamiliar (test) data by addressing overfitting. Random weight sampling was used to correct the dataset imbalance.
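The early-stopping logic used to monitor validation performance can be sketched framework-independently; the patience and tolerance values here are illustrative assumptions, not the paper's reported settings.

```python
class EarlyStopping:
    """Stop training when the validation loss fails to improve by at least
    `min_delta` for `patience` consecutive epochs."""

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True to stop training."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In the training loop, `step` is called once per epoch with the validation loss, and the loop breaks (typically restoring the best checkpoint) when it returns True.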
Results
Convergence of the proposed models
Figure 12 shows the proposed MEN’s convergence versus its components over 100 training epochs. Line plots are used to display the training performance, and the standard deviation is shown in the shaded area surrounding the line plots to give a comprehensive picture of each plot’s variability. MEN’s convergence is superior to that of its sub-networks. In terms of performance and convergence speed, MEN performs better than the FEN, GEN, and PEN submodules on all metrics. Compared to the other models, MEN shows a lower standard deviation, suggesting greater consistency and less unpredictability.
FEN results
To demonstrate the contribution of each component, the following sections discuss the specific performances of the various FEN components.
Examining ReMLA
The impact of the different localized attentions on the FEN output was then investigated using the ReMLA module. Table 3 displays the findings for the receptive field \(rf = 5\).
The result was also recorded for the receptive field of \(rf=7\) as summarized in Table 4.
The performance of the proposed ReMLA module was also examined in the case of \(rf = 5\) and \(rf = 7\) combined receptive fields. The results for this are presented in Table 5.
GEN results
The results of running GEN alone are presented. First, GEN is run with the complete ReMLA but without the AHGIN proposal; the reverse configuration is then also tested and recorded. Finally, the full GEN comprises all its components, including ReMLA and AHGIN. The results for these various configurations are displayed in Tables 6, 7, and 8, respectively.
PEN evaluation results
To determine the efficacy of the proposed PEN module, its prediction performance is also examined independently. The results of each of the three encoders of PEN are presented respectively in Tables 9, 10, and 11. Table 12 illustrates the results obtained when the outputs of all three encoders are combined. This excludes the results from the AHGIN and FEN.
Clinical validation and verification
To further evaluate the predictive capabilities of MEN, we selected 15 drugs (Fluoxetine (Prozac), Diltiazem, Digoxin, Gentamicin, Warfarin, Verapamil, Ritonavir, Voriconazole, Lithium, Imipenem, Theophylline, Amiodarone, Levothyroxine, Clarithromycin, and Ketoconazole) that are typically administered to patients at West China Hospital and require Therapeutic Drug Monitoring (TDM). None of these drugs were included in the training or validation datasets. Among the 15 drugs, MEN successfully identified Fluoxetine (Prozac), Diltiazem, Verapamil, Ketoconazole, Amiodarone, and Ritonavir as CYP450 inhibitors, with accuracy ranging from 51 to 80%. The remaining drugs were classified as inhibitors by MEN with low accuracy (20–40%). Additionally, 5 of these were later confirmed to be potent substrates (non-inhibitors) of the CYP450 isoforms, while Lithium and Digoxin were found not to be associated with CYP450 metabolism. Table 13 presents the drugs identified by MEN as inhibitors.
Comparison with the state-of-the-art
The results of MEN and SOTA models are presented in Table 13.
Discussion
The evaluation results from each component of the suggested MEN model are discussed in this section. Meanwhile, we express the comparison between the whole MEN architecture and the use of FEN, GEN, and PEN independently. A comparison with other recent research papers is also summarized and discussed.
FEN ablation study
This section elaborates on the results achieved by FEN alone. From Tables 3, 4, and 5, the results obtained when \(rf = 7\) are better than when \(rf = 5\) for AUC and Accuracy; the reverse is true for the other metrics. This might occur because, with the bigger rf of 7, the local attention gains a more thorough view of the data patterns within the fingerprint data. This could result in improved overall discrimination and classification, which would raise metrics like Accuracy and AUC; these metrics place greater emphasis on accurate classification throughout the dataset than on individual positive or negative performance. A smaller rf of 5 would instead concentrate more on finely defined local patterns, which could result in better sensitivity to actual positive or negative cases within those localized areas. Metrics that are more sensitive to the balance of accurately detecting positives and negatives, such as Sensitivity and F1-score, can benefit from this attention. Here, Precision, MCC, and Specificity additionally improve, since precisely tuned pattern recognition tends to balance them. The full FEN proposal outperformed the other FEN setups in all metrics because a dual-rf strategy can increase the model’s robustness to different fingerprint patterns. By processing both local and wider contexts, the local attention module can more effectively generalize to various input types, increasing flexibility and decreasing overfitting.
The convergence curves in Fig. 12 illustrate the performance of all FEN configurations across the metrics. Among the three setups of FEN, the complete FEN model has the lowest standard deviation (the shaded area around the line plots) and the highest scores across all metrics. The larger rf of 7 achieved better convergence curves than the smaller rf of 5. Most metrics show that the larger receptive field (rf = 7) performs better than the smaller receptive field (rf = 5), because the larger receptive field enables the model to handle long-range dependencies, capture more context, and generalize better, which improves the learning of meaningful features and, in turn, prediction accuracy, sensitivity, and specificity. Even though smaller receptive fields can be useful for capturing local patterns, the broader context provided by a larger receptive field improves overall performance across most assessment criteria, especially on more complicated tasks and datasets.
GEN ablation study
The results obtained for the different setups of GEN are given in Table 13 and Fig. 13. It can be inferred that GEN with AHGIN outperformed GEN with the full ReMLA for all metrics except for F1-score and Precision. The hierarchical framework of AHGIN makes it possible to record relationships between nodes at multiple levels, providing a more thorough understanding of the interactions and structure of the graph. Capturing both local and global graph elements, aids in improving overall metrics such as Accuracy and AUC, which gain from a thorough and well-rounded comprehension of the graph. Also, comprehensive structural relationships are captured by AHGIN’s multi-level attention, which results in more robust high-level feature representations throughout the graph. Better classification and generalization may result from this more comprehensive and structured approach. However, local neighborhood patterns in a network are highlighted by ReMLA, which uses local attention mechanisms and residual connections. It is quite good at identifying real positives because of its local concentration, which aids in identifying minor features in smaller neighborhoods. ReMLA is better at filtering pertinent components since it is more localized, which helps metrics sensitive to true positive detection, such as F1-score and Precision. This fine-grained attention improves the ability to identify true instances. AHGIN offers a more comprehensive, hierarchical viewpoint than ReMLA’s local attention. Since balanced classification over the entire dataset is required for overall metrics like Accuracy and AUC, this can result in a gap in generalization and holistic feature capture, which is why it might perform poorly. 
In situations where specificity in identification is critical, AHGIN’s hierarchical structure may generalize too widely, which could introduce noise into the identification of true positives and marginally reduce its effectiveness in capturing exact positive cases, leading to lower Precision and F1-score. Combining ReMLA and AHGIN in the full GEN allows the model to capture interactions between substructures and overall graph structures by efficiently applying attention at both the local and hierarchical levels. Because of this synergy, the model can recognize patterns that could only be noticeable when considering both local and global settings simultaneously. By giving gradients shortcut paths, the residual connections in ReMLA aid training stability. This residual method can help reduce the dangers of vanishing or exploding gradients in deeper models when used in conjunction with AHGIN. This way, the combined architecture can gain from deeper and more complicated hierarchical representations without sacrificing crucial local information. Combining AHGIN and ReMLA in GEN highlights the advantages of each module, resulting in a more comprehensive and balanced representation that improves the model’s overall specificity, generalization, and accuracy across all metrics. Due to this synergy, the combined model is better than either ReMLA or AHGIN alone in capturing the complexity of the graph data. Graph isomorphism and hierarchical attention enable AHGIN to capture both local and global patterns in graph-structured data efficiently. The convergence curves in Fig. 13 illustrate the performance of each GEN configuration across all metrics, with the complete GEN model yielding the highest scores on all metrics and the lowest standard deviation among the three GEN setups. AHGIN outperformed ReMLA with better convergence curves, leading to improved stability, quicker convergence, and more successful generalization to unknown data.
ReMLA, by contrast, may take longer to converge because its multi-local attention and residual connections concentrate on local patterns and provide a less effective gradient flow for complicated data structures.
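To make the GIN building block that AHGIN extends concrete, the following is a minimal sketch of a single GIN aggregation step in plain Python. The function name, toy graph, and feature sizes are illustrative, not the paper's actual implementation, and the learnable MLP that GIN applies after aggregation is omitted.

```python
def gin_aggregate(features, adjacency, eps=0.0):
    """One pre-MLP GIN step: h_v <- (1 + eps) * h_v + sum of neighbor features."""
    dim = len(features[0])
    out = []
    for v in range(len(features)):
        agg = [(1.0 + eps) * x for x in features[v]]  # weighted self-feature
        for u in adjacency[v]:                        # sum over neighbors
            for d in range(dim):
                agg[d] += features[u][d]
        out.append(agg)
    return out

# Toy molecular graph: three atoms in a chain (0-1-2), 2-dim node features.
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adj = {0: [1], 1: [0, 2], 2: [1]}
print(gin_aggregate(feats, adj))  # node 1 sums itself plus both neighbors
```

Stacking such steps (with an MLP after each) lets the network distinguish graph structures up to the Weisfeiler-Lehman test, which is the property AHGIN's hierarchical attention builds on.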
PEN ablation study
PEN (AAC) achieves average performance in Accuracy and Specificity, as shown in Table 9, probably because it focuses mostly on general amino acid frequency patterns. Its lower Sensitivity and AUC can stem from AAC's lack of positional information, which prevents it from recognizing the subtler patterns or structural motifs essential for identifying all positive cases. While AAC can detect positive cases, it lacks the finer discriminatory power to consistently capture true positives, which yields some false positives and lowers the F1-score. The F1-score and Precision indicate that AAC does not capture sequence nuances that may be crucial for identifying specific functional or structural features in proteins. The Accuracy and Specificity of PEN (PseAAC) are higher than those of AAC because the additional sequence context enables it to identify more intricate patterns. By integrating physicochemical features, PseAAC can reduce false positives and identify proteins more accurately. Its lower Sensitivity and AUC can result from its focus on general sequence features, which may cause it to miss some uncommon or subtle patterns essential for identifying all positive cases; geared toward broader sequence-level patterns, it can overlook situations where finer sequence-specific properties matter. By utilizing both frequency and sequence features, PseAAC distinguishes true positives from false positives more effectively, as evidenced by its comparatively high Precision. Of the three, PEN (DDE) has the best Sensitivity and Accuracy, demonstrating that it successfully detects local patterns, which helps identify true positives. By incorporating dipeptide patterns, DDE can detect sequence motifs or structural cues that are crucial for accurately detecting all positive cases.
Despite capturing local patterns, DDE might not generalize as well as PseAAC in distinguishing true negatives, resulting in a more balanced but moderate MCC (40.4%) and Specificity. DDE's Precision is marginally lower than PseAAC's, presumably because its increased sensitivity to positive patterns produces a slight rise in false positives, a modest trade-off that risks misclassifying borderline cases as positives. By merging the AAC, PseAAC, and DDE encoders into a single Protein Encoder Network (PEN), the network can exploit the distinct advantages of each encoding technique and perform better across the board. Protein sequences are represented more richly when the embeddings produced by AAC, PseAAC, and DDE are joined, letting downstream classifiers examine varied and complementary features to produce accurate predictions. Proteins frequently have intricate structures that are difficult for a single feature type to depict fully; by incorporating data from several encoders, the network aligns better with this biological complexity. The full Protein Encoder Network benefits from the sequence and physicochemical awareness of PseAAC, the generalization of AAC, and the knowledge of local interactions provided by DDE. This combination produces a comprehensive representation of protein data that simultaneously captures general, subtle, and specific features of protein sequences, enhancing performance on every metric. PseAAC consistently performs better than DDE and AAC across several metrics, exhibiting greater stability and superior learning progression, as indicated by the convergence curves in Fig. 14. AAC is typically the worst performer, with a slower rate of improvement across all metrics, whereas DDE exhibits satisfactory results.
PseAAC's higher performance implies that it represents protein sequences more successfully in this setting and may therefore be the more appropriate encoding for classification.
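To make the sequence encodings above concrete, the sketch below computes amino-acid composition (AAC) and a raw dipeptide-composition vector (the frequency counts that DDE normalizes against expected dipeptide frequencies) for a toy sequence. The function names and the simplified treatment of DDE are illustrative, not PEN's actual implementation.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aac(seq):
    """Amino-acid composition: frequency of each of the 20 standard residues."""
    n = len(seq)
    return {aa: seq.count(aa) / n for aa in AMINO_ACIDS}

def dipeptide_composition(seq):
    """Frequency of each of the 400 dipeptides (the raw counts DDE builds on)."""
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    total = len(seq) - 1                      # number of overlapping dipeptides
    counts = {p: 0 for p in pairs}
    for i in range(total):
        counts[seq[i:i + 2]] += 1
    return {p: c / total for p, c in counts.items()}

seq = "ACDACD"
print(aac(seq)["A"])                     # 2 of 6 residues are A
print(dipeptide_composition(seq)["CD"])  # "CD" occurs 2 times in 5 windows
```

AAC discards position entirely (which is why it misses structural motifs), while the dipeptide counts retain local order, matching the Sensitivity advantage reported for DDE.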
The proposed MEN versus FEN, GEN and PEN
Although GEN and FEN have similar F1-score and AUC values, GEN outperforms FEN in Sensitivity and Accuracy. The graph-based structure of GEN captures intricate relational patterns, which gives it an advantage in correctly identifying positive cases. Nevertheless, FEN outperforms GEN in Specificity and Precision, indicating that it may be more adept at detecting true negatives because it emphasizes generic composition traits. FEN's AUC is slightly lower than PEN's, but its Sensitivity and Specificity are comparable. Given its multi-feature protein encoding, PEN attains good predictions across all metrics, outperforming FEN in MCC and F1-score. Multi-scale attention and a richer, layered feature representation enable MEN to concentrate on important local and global patterns, improving its performance. FEN's ability to capture complex sequences and structural details may be limited by its simpler feature encoding; MEN, in contrast, excels at handling these nuances through hierarchical attention, yielding superior results on all metrics. GEN's graph-based methodology, which can capture associations between data points, is the reason it exhibits somewhat higher Accuracy and Sensitivity than FEN. FEN, however, does better in MCC and Specificity, indicating that GEN's graph-based encoding occasionally misclassifies negatives as positives. Compared with PEN, which focuses on sequence information, GEN performs better on nearly every metric, showing that GEN's advantage stems from the richer, more nuanced representations offered by graph-based molecular structures; this makes GEN more appropriate for predicting bioactivity from compound data. While PEN's multi-encoding technique produces a more specialized and contextually varied feature set, GEN's emphasis on structure through graph encoding aids general classification.
MEN combines hierarchical attention with graph-like structural elements to capture both relational interdependence and sequence information. With this multi-level, attention-based representation, MEN attains greater F1-score, MCC, and Accuracy.
The proposed MEN performs better on all metrics because GEN's exclusive focus on structure works well but lacks the flexible, nuanced attention that MEN offers. In Accuracy and Precision, PEN outperforms FEN. This implies that, in contrast to FEN, which concentrates only on broad composition, PEN's combination of AAC, PseAAC, and DDE encodings enables it to capture both global composition and local sequential aspects more successfully. GEN outperforms PEN on several measures, implying that although PEN handles protein sequence data very well, it may be weaker on tasks involving ligand/compound data. Because protein sequences are longer and more variable, they can add noise, making precise predictions more difficult. GEN employs graph-based information to classify chemicals more efficiently than PEN, which concentrates on sequence patterns. MEN can dynamically focus on the most pertinent segments of the sequence data, both locally and globally, owing to its multi-scale encoding and hierarchical attention. Although PEN employs multiple encoding techniques to offer extensive sequence coverage, it lacks the nuanced focus that MEN's attention mechanisms provide, which leads to MEN's superior F1-score, Accuracy, Sensitivity, and Precision.
Furthermore, MEN can identify patterns in data at both fine and coarse scales owing to its hierarchical and multi-scale attention mechanisms. By better differentiating between positive and negative cases, MEN can dynamically shift its focus to pertinent data locations, improving both Sensitivity and Specificity. MEN combines these feature types with a flexible attention structure that can prioritize the most informative aspects, in contrast to FEN, which concentrates on overall composition, and GEN, which concentrates on structural connections; this produces a more thorough representation and improves Accuracy, MCC, and F1-score. By adaptively prioritizing pertinent information and context, MEN's attention mechanisms successfully lower false positives and false negatives, earning high Precision and AUC scores, in contrast to PEN, which uses multiple encodings without strong prioritization. MEN's layered feature extraction, which incorporates residual and multi-local attention features, lets it generalize effectively across a wide range of data types and complexities; in view of this, it is adaptable and performs consistently well on every metric.
MEN consistently surpasses GEN, PEN, and FEN across all measures, as seen in the curves of Fig. 15, which converge swiftly and attain higher values in accuracy, AUC, sensitivity, specificity, MCC, and F1-score. GEN is the next best performer, with greater performance and better consistency than PEN and FEN, particularly in accuracy and AUC. PEN and FEN perform similarly but fall short of MEN and GEN, suggesting they are less suited to this task. While GEN might be a decent substitute with respectable performance, the convergence patterns indicate that MEN is the most reliable and efficient encoder for these classification measures. The confusion matrices in Fig. 16 show that FEN has a comparatively large number of false positives (2528), suggesting a propensity to mistakenly identify non-inhibitors as inhibitors. FEN's true positive count indicates only fair sensitivity, and its high false positive rate suggests that it lacks specificity. Compared with FEN, GEN exhibits a notable improvement in both true positives and true negatives, with fewer false positives (1078) and false negatives (848), indicating a better balance between sensitivity and specificity. GEN demonstrates versatility in classification by effectively detecting both inhibitors and non-inhibitors. With a true positive count of 4496 and a false positive count of 2604, PEN performs similarly to FEN, and its very high number of false negatives (1063) indicates lower sensitivity. Like FEN, PEN has a significant false positive rate, implying lower specificity and making it less accurate at differentiating inhibitors from non-inhibitors. As seen in the bar chart of Fig. 17, FEN performs well, with F1-scores typically surpassing accuracy for all classes except 1A2, where accuracy is noticeably lower than for other classes.
This indicates that FEN has relatively lower overall accuracy for class 1A2 but strikes a balance between precision and recall for most targets. GEN models graph-based data better than FEN, as evidenced by its greater accuracy and F1-score across most classes. For most classes, the F1-score and accuracy are the same, indicating consistent classification across these metrics.
In managing protein sequence data, PEN performs similarly to GEN, with somewhat higher accuracy and F1-score than FEN; its performance measures appear balanced across all classes, indicating robust predictions. For all classes, MEN performs better in accuracy and F1-score than the individual networks, suggesting that merging the three encoders (FEN, GEN, and PEN) yields better predictive power. MEN's excellent results across all classes demonstrate the advantage of an integrated strategy in capturing more intricate data connections. Overall, MEN holds a considerable advantage over FEN, GEN, and PEN, which do not apply the same adaptive attention, owing to its capacity to dynamically capture and prioritize local and global features. MEN creates a more nuanced representation, corresponding to high Accuracy, MCC, and F1-score, by merging structural, sequential, and compositional elements in a flexible, layered manner. By concentrating on the most contextually important data, MEN's attention-based structure reduces false positives and negatives. With an extremely low number of false positives (60) and false negatives (51), MEN exhibits exceptional performance: among the four encoders it best distinguishes inhibitors from non-inhibitors, as evidenced by the greatest true positive and true negative counts, and its low misclassification rates reflect high sensitivity and specificity.
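The metrics discussed throughout this section follow directly from the confusion-matrix counts. The sketch below computes them for an illustrative count set (the numbers are hypothetical, not those of Fig. 16):

```python
import math

def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                 # recall / true positive rate
    specificity = tn / (tn + fp)                 # true negative rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    mcc = ((tp * tn - fp * fn) /
           math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return {"Sen": sensitivity, "Spe": specificity, "Pre": precision,
            "Acc": accuracy, "F1": f1, "MCC": mcc}

# Illustrative counts only.
metrics = classification_metrics(tp=900, fp=100, tn=950, fn=50)
print({k: round(v, 3) for k, v in metrics.items()})
```

MCC is the most informative single number here because it uses all four counts, which is why a model with many false positives (like FEN) can post a decent F1-score yet a weak MCC.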
MEN vs the SOTA models
As inferred from Table 13, the MEN model was further validated using various test datasets for each of the five CYP isoforms, and Table 14 shows that MEN produced competitive results compared with four other models: MumCYP_Net25, DeepCYPs29, GCNN33, and iCYP-MFE34. The availability of high-quality training data may limit MumCYP_Net's generalization ability, and its complexity may also lead to interpretability issues; MEN's superior results stem from its multiple self-attention mechanisms, which enhance feature extraction and generalization across different datasets, addressing such data-quality limitations.
DeepCYPs' predictive accuracy is affected by its potential sensitivity to noise in the input data and its inability to scale to large datasets; its architecture can also be challenging to adapt to particular use cases. Because MEN's architecture allows more flexible adjustments and better management of noisy data, it may offer greater robustness and scalability. GCNN sometimes requires large amounts of computing power, which restricts its application in smaller labs; it may perform worse on irregular graph structures and struggle to capture long-range dependencies. By integrating local attention, MEN gathers local and global structural information more effectively, improving performance on irregular graph structures. iCYP-MFE's drawbacks include the possibility of overfitting, especially on smaller datasets, its reliance on preset molecular features that might not capture all relevant information, and potentially limited interpretability. MEN's AHGIN reduced overfitting and enhanced molecular feature representation, leading to more accurate predictions and better interpretability. With accuracy and area under the curve (AUC) values above 90% and 95%, respectively, MEN showed excellent performance in determining the inhibitory potencies of unknown ligands for each of the five CYP isoforms, as shown in Fig. 18.
MEN makes better predictions by capturing complex correlations in the data through its various self-attention techniques, and its ability to focus on relevant features improves accuracy. The combination of local attention and Graph Isomorphism Networks (GIN) allows MEN to distinguish between classes, and together these techniques improve the AUC by representing the trade-off between sensitivity and specificity more accurately. Self-attention may improve feature representation, increasing sensitivity and thus helping MEN detect positive examples more successfully. By modeling interactions in both local and global contexts, MEN differentiates true negatives from false positives, increasing specificity and ensuring that negative samples are identified appropriately. MEN also achieved a superior balance between precision and recall, leading to a higher F1-score, which is particularly useful when reducing false positives and false negatives is essential. Its local attention focuses on the features most relevant for classification, which can improve precision by reducing false positive predictions. Overall, MEN integrates the benefits of graph isomorphism, local attention, and self-attention to learn and generalize from complex data structures, which explains its improved performance across several metrics.
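As background for the attention mechanisms referred to above, the following is a minimal scaled dot-product self-attention sketch in plain Python. The dimensions and inputs are illustrative, and the actual MEN layers add residual, multi-local, and hierarchical components not shown here.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)          # one weight per key, sums to 1
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

# Three tokens with 2-dim embeddings; Q = K = V for plain self-attention.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print([[round(v, 3) for v in row] for row in self_attention(x, x, x)])
```

Each output row is a convex combination of the value vectors, so tokens that score highly against a query dominate its representation; restricting the keys to a neighborhood turns this into the local attention variant.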
Biological deductions from clinical validation, verification, and real-world use
Given that many of the selected drugs have a narrow therapeutic index and complex pharmacokinetics, it is crucial for patients to have their drug concentrations monitored regularly and to watch for potential DDIs that may cause significant side effects; predicting potential DDIs for these drugs is therefore highly beneficial. Among the six drugs MEN identified as CYP450 inhibitors, three (ketoconazole, amiodarone, and ritonavir) appear to inhibit most of the CYP450 enzymes, so we studied them further. Information retrieved from Medscape reveals that ketoconazole has 42 contraindicated drug interactions, 213 serious drug interactions, and 90 minor drug interactions, and 309 of its interactions with other drugs need to be monitored closely. Ketoconazole should not be taken concurrently with dofetilide, quinidine, pimozide, cisapride, methadone, disopyramide, dronedarone, or ranolazine. Through CYP3A4 inhibition, ketoconazole can raise the plasma levels of these medications, which may prolong the QT interval and, in certain situations, cause potentially fatal cardiac arrhythmias such as torsades de pointes. (The QT interval measures the time it takes the ventricles of the heart to contract and then relax, i.e., the ventricular depolarization and repolarization duration.) From the same source, we found 27 contraindicated drug interactions for amiodarone, 208 serious drug interactions, 293 drugs that need to be monitored closely, and 51 minor interactions. Information from Drugs.com indicates that amiodarone, when taken with warfarin, inhibits CYP450 2C9, preventing the metabolism of S-warfarin in the liver; other oral anticoagulants may experience similar effects, increasing the risk of bleeding and severe hypoprothrombinemia.
Adding amiodarone to an anticoagulant regimen can enhance anticoagulant effects, which may become evident one to several weeks after initiation and persist for months after discontinuation of amiodarone. The severity of this interaction varies; some individuals experience no symptoms, while others may suffer from severe, and sometimes fatal, bleeding. Those with impaired CYP450 2C9 metabolism may be more prone to bleeding and experience the interaction more quickly55.
As presented in the Medline repository, there are 36 contraindicated drug interactions for ritonavir, 176 serious drug interactions, 396 drug interactions that need to be monitored closely, and 76 minor interactions. Ritonavir's effect on hepatic metabolism can lead to severe and potentially fatal side effects when it is combined with sedative-hypnotics, antiarrhythmics, or ergot alkaloids. Because ritonavir inhibits CYP450 3A, drugs that are primarily metabolized by CYP3A and undergo significant first-pass metabolism are more likely to reach increased blood concentrations when taken concurrently.
Visual interpretability evaluation
One of the key benefits of MEN over other CYP prediction models is its ability to interpret interactions from both the compound and target perspectives, which can highlight crucial interatomic connections (Fig. 18). Understanding DDI predictions is essential for drug discovery. A case study was conducted for the CYP isoform inhibitor with the highest predicted value in the test set to examine its inhibitory activity: the atoms with the largest attention weights highlight the regions where the molecule exerts its strongest inhibitory impact. Molecular docking simulations were performed with PyMol56, AutoDock57, and BIOVIA Discovery Studio54 to examine these interactions in more detail: AutoDock was used for docking, PyMol for preparing the ligand and protein, and BIOVIA Discovery Studio for visualizing the results. For CYP1A2, MEN assigns large weights to the ligand's nitrogen-containing groups and core ring structure, implying that the model prioritizes nitrogen interactions and the core aromatic area. The docking results confirm hydrophobic and hydrogen-bond interactions with residues such as THR A-391 and SER A-404 around the core ring; together, the docking interactions and the model's emphasis on this area show that the model has identified the key binding characteristics that support ligand stability in 1A2. For the 2C9 isoform, MEN concentrates on the nitrogen and oxygen atoms, indicating that these atoms are crucial for binding. The docking studies demonstrate strong ionic interactions and hydrogen bonding between these oxygen and nitrogen atoms and LYS A-421 and SER A-422; the model's attention weights closely match these key interaction locations, so it appears to prioritize the functional groups involved in binding to 2C9 appropriately. For the 2C19 isoform, MEN identifies oxygen atoms in the ligand's core structure with weights ranging from 0.8 to 0.9, suggesting that these groups may influence binding.
The docking study reveals hydrogen bonds between these oxygen atoms and residues such as GLY A-296 and VAL A-208. This consistency implies that MEN correctly pinpoints the areas necessary for ligand interaction with 2C19, emphasizing atoms that support hydrogen bonds. For 2D6, MEN uses weights between 0.8 and 0.9 to highlight the sulfur and nitrogen atoms as well as adjacent ring structures. Docking results show hydrogen-bond interactions with residues such as ASP A-216 and PHE A-120 near these same areas, where the ligand forms hydrogen bonds. This alignment indicates that the model's attention weights concentrate on areas with established binding interactions, capturing binding characteristics that are crucial for 2D6.
For 3A4, high emphasis is placed on the ligand's large ring structures and nitrogen atoms, indicating that these are essential for binding. These regions engage in hydrogen bonding and hydrophobic interactions with residues such as PHE A-120, LEU A-121, and GLU A-374, as confirmed by the docking interactions. The model's focus aligns with these binding interactions, showing that it successfully prioritizes the areas of the 3A4 binding site that are crucial for ligand affinity. Across all ligands, the docking interactions indicate that MEN's attention weights consistently identify the atoms and structural regions essential for binding, with the highest weights assigned to functional groups involved in hydrogen bonding, ionic interactions, and hydrophobic contacts with protein residues. This alignment shows that the model represents crucial binding properties successfully, and it implies that the model is a helpful tool for understanding and predicting ligand–protein interactions: its predictions are not only precise but also interpretable in terms of conventional binding interactions.
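The mapping from per-atom attention weights to highlighted substructures can be sketched as a simple thresholding step. The function name, weights, and threshold below are illustrative; the paper's actual heatmaps are rendered with RDKit.

```python
def top_attention_atoms(weights, threshold=0.8):
    """Return atom indices whose attention weight meets the threshold,
    i.e. the regions a heatmap would highlight as binding-relevant."""
    return [i for i, w in enumerate(weights) if w >= threshold]

# Hypothetical per-atom weights for a small ligand (atom indices 0..5).
atom_weights = [0.12, 0.85, 0.91, 0.40, 0.83, 0.05]
print(top_attention_atoms(atom_weights))  # [1, 2, 4]
```

In the case studies above, the highlighted indices would correspond to the nitrogen- and oxygen-bearing atoms whose weights fell in the 0.8 to 0.9 range.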
Limitations
It is important to highlight that the key binding residues for different drugs interacting with other CYP isoforms may vary, and the scope of this study could not fully capture all such variants; these variations will need to be investigated in future studies. The same CYP450 protein bound to a different ligand inhibitor would yield a different set of essential binding residues from those in the protein sequences used for training. In addition, MEN could not display attention weights on the protein sequences, even though protein sequences were part of the training data; future research might focus on how to accomplish this. Combining self-attention, local attention, and graph isomorphism networks creates a complex architecture that requires significant time and processing power to train. Hyperparameter choices can have a significant impact on MEN's performance, so careful tuning is required to achieve optimum results. As dataset size increases, the model's processing demands grow as well, making scalability to larger datasets or real-time applications challenging.
Future works
Data is essential to the operation of many ML and DL models, and how data is represented directly affects a model's ability to learn. High-quality, well-represented data enables better predictions and successful generalization to new, unseen data. The proposed MEN uses various fingerprint representations of the chemical compound, its graph representation, and protein embedding layers to extract the required information from ligands, compounds, and sequences. Compound descriptors not used in this study that could be investigated include Quantitative Structure–Activity Relationship (QSAR) descriptors, which cover a variety of features describing chemical properties and their relationships with biological activity, such as hydrophobicity and electronic properties. Other candidates are composite descriptors, derived by combining simpler descriptors to capture more complex features; molecular descriptors combining multiple properties, such as LogP and Topological Polar Surface Area (TPSA); geometric descriptors, such as molecular shape descriptors that combine aspects of 3D shape and size; and hybrid descriptors that merge structural, electronic, and topological information, for example molecular weight together with the numbers of hydrogen-bond donors and acceptors. Additional data-feature strategies for both proteins and chemicals could likewise be investigated to see how they affect the model's output.
Other protein descriptors could also be used in the future, such as topological descriptors (graph-based descriptors, contact maps), structural descriptors (secondary structure content, solvent-accessible surface area, protein volume), physicochemical descriptors (molecular weight, isoelectric point, hydrophobicity, aliphatic index, instability index), functional descriptors (functional domains, active sites), evolutionary descriptors (position-specific scoring matrices, conservation scores), and motif-based descriptors (sequence motifs, protein family signatures).
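As an illustration of the physicochemical protein descriptors mentioned above, the sketch below computes a sequence's grand average of hydropathy (GRAVY) using the Kyte-Doolittle scale. The helper name is ours, and the snippet is a minimal example rather than part of MEN.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def gravy(seq):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)

print(round(gravy("ILKV"), 3))  # (4.5 + 3.8 - 3.9 + 4.2) / 4 = 2.15
```

Descriptors of this kind are cheap to compute per sequence and could be concatenated to PEN's existing AAC/PseAAC/DDE embeddings as an additional input channel.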
Conclusion
The MEN model was designed in this study to predict potential inhibitors of five different CYP isoforms. MEN comprises three encoders, or sub-models: FEN, which processes drug compound fingerprints, representing molecular structures in a compact binary form that highlights key features such as functional groups and bonding patterns; GEN, which focuses on graph-based representations of ligands, where each compound is treated as a graph with atoms as nodes and bonds as edges; and PEN, which is designed for protein sequence data. PEN uses patterns learned from the amino acid sequence to predict properties such as function, structure, and interactions with compounds. With the help of the Position Awareness Module (PAM), the novel ViTDA model was used to make the final prediction. The isoforms are encoded using pseudo sequences of 42 significant substrate-binding residues, and all sub-models were tuned through ablation studies. The effectiveness of MEN was confirmed by comparing its performance to that of its sub-models, highlighting the benefits of their combination. MEN used an embedding technique, ligand/molecular graph featurization, and a variety of fingerprint descriptors to find crucial structural information linked to inhibitory efficacy. The Residual Multi-Local Attention (ReMLA) approach prioritized significant local interactions, while the proposed AHGIN coupled self-attention with Graph Isomorphism Networks (GIN) to enhance molecular graph representations. Our extensive datasets validated MEN's generalizability, and its explainability sets it apart from other machine learning methods and makes it usable by bioinformatics stakeholders. This study also connected interpretability with the module used for the final prediction or classification (ViTDA) via RDKit's heatmap tool. MEN is unique in that it uses data from all sub-models to enhance its predictions, and molecular docking experiments validated its findings.
Overall, MEN outperformed state-of-the-art models, achieving competitive outcomes against the SOTA models proposed for CYP prediction across a variety of metrics, including accuracy, AUC, and F1-score.
Data availability
We used two input datasets to train, evaluate, and tune the proposed AI-based MEN framework: (1) SMILES strings of drug ligands/compounds gathered from the PubChem repository [46] and (2) target protein sequences retrieved from the PDB database [47].
Abbreviations
- AI: Artificial intelligence
- FEN: Fingerprint encoder network
- GEN: Graph encoder network
- PEN: Protein encoder network
- MEN: Multimodal encoding network
- GIN: Graph isomorphism network
- AHGIN: Attentive hierarchical graph isomorphism network
- LA: Local attention
- ReMLA: Residual multi-local attention
- CYP450: Cytochrome P450
- DDI: Drug–drug interaction
- ML: Machine learning
- DL: Deep learning
- PAM: Position awareness module
- ViTDA: Vision transformer with dynamic attention
- SMILES: Simplified molecular input line entry system
- PDB: Protein data bank
- Sen: Sensitivity metric
- Spe: Specificity metric
References
Li, X. et al. Prediction of human cytochrome P450 inhibition using a multitask deep autoencoder neural network. Mol. Pharm. 15(10), 4336–4345. https://doi.org/10.1021/acs.molpharmaceut.8b00110 (2018).
Arimoto, R. Computational models for predicting interactions with cytochrome p450 enzyme. Curr. Top. Med. Chem. 6(15), 1609–1618 (2006).
Guengerich, F. P. Cytochrome P450s and other enzymes in drug metabolism and toxicity. AAPS J. 8(1), E101–E111. https://doi.org/10.1208/aapsj080112 (2006).
Ingelman-Sundberg, M. Pharmacogenetics of cytochrome P450 and its applications in drug therapy: the past, present and future. Trends Pharmacol. Sci. 25(4), 193–200. https://doi.org/10.1016/j.tips.2004.02.007 (2004).
Zhou, S.-F., Liu, J.-P. & Chowbay, B. Polymorphism of human cytochrome P450 enzymes and its clinical impact. Drug Metab. Rev. 41(2), 89–295. https://doi.org/10.1080/03602530902843483 (2009).
Rendic, S. & Guengerich, F. P. Survey of human oxidoreductases and cytochrome P450 enzymes involved in the metabolism of xenobiotic and natural chemicals. Chem. Res. Toxicol. 28(1), 38–42. https://doi.org/10.1021/tx500444e (2015).
Zhou, S.-F. et al. Structure, function, regulation and polymorphism and the clinical significance of human cytochrome P450 1A2. Drug Metab. Rev. 42(2), 268–354. https://doi.org/10.3109/03602530903286476 (2010).
Zhou, Y., Ingelman-Sundberg, M. & Lauschke, V. M. Worldwide distribution of cytochrome P450 alleles: a meta-analysis of population-scale sequencing projects. Clin. Pharmacol. Ther. 102(4), 688–700. https://doi.org/10.1002/cpt.690 (2017).
Deodhar, M. et al. Mechanisms of CYP450 inhibition understanding drug-drug interactions due to mechanism-based inhibition in clinical practice. Pharmaceutics 12(9), 846 (2020).
Cassagnol, B.G.M., Biochemistry, Cytochrome P450. (2023).
Jacob, S. & Nair, A. B. An Updated Overview on Therapeutic Drug Monitoring of Recent Antiepileptic Drugs. Drugs R D. 16(4), 303–316. https://doi.org/10.1007/s40268-016-0148-6 (2016).
Pang, Y., Yao, L., Jhong, J. H., Wang, Z. & Lee, T. Y. AVPIden: a new scheme for identification and functional prediction of antiviral peptides based on machine learning approaches. Brief. Bioinform. 22(6), bbab263 (2021).
Atwereboannah, A. A., Wu, W.-P. & Nanor, E. Prediction of drug permeability to the blood-brain barrier using deep learning. In 4th International Conference on Biometric Engineering and Applications, Taiyuan, China. https://doi.org/10.1145/3476779.3476797 (2021).
Atwereboannah, A. A. et al. Protein-ligand binding affinity prediction using deep learning. https://doi.org/10.1109/ICCWAMTIP53232.2021.9674118 (2021).
Agyemang, B. et al. Multi-view self-attention for interpretable drug-target interaction prediction. J. Biomed. Inform. 110, 103547–103547. https://doi.org/10.1016/j.jbi.2020.103547 (2020).
Muratov, E. N. et al. QSAR without borders. Chem. Soc. Rev. 49(11), 3525–3564. https://doi.org/10.1039/d0cs00098a (2020).
Korotcov, A. et al. Comparison of deep learning with multiple machine learning methods and metrics using diverse drug discovery data sets. Mol. Pharm. 14(12), 4462–4475. https://doi.org/10.1021/acs.molpharmaceut.7b00578 (2017).
Lusci, A., Pollastri, G. & Baldi, P. Deep architectures and deep learning in chemoinformatics: the prediction of aqueous solubility for drug-like molecules. J. Chem. Inf. Model. 53(7), 1563–1575. https://doi.org/10.1021/ci400187y (2013).
Torrisi, M., Pollastri, G. & Le, Q. Deep learning methods in protein structure prediction. Comput. Struct. Biotechnol. J. 18, 1301–1310. https://doi.org/10.1016/j.csbj.2019.12.011 (2020).
Tang, B. et al. A self-attention based message passing neural network for predicting molecular lipophilicity and aqueous solubility. J. Cheminform. 12(1), 15–15. https://doi.org/10.1186/s13321-020-0414-z (2020).
Palmer, D. S. & Mitchell, J. B. O. Is experimental data quality the limiting factor in predicting the aqueous solubility of druglike molecules?. Mol. Pharm. 11(8), 2962–2972. https://doi.org/10.1021/mp500103r (2014).
Schütt, K. T. et al. SchNet-A deep learning architecture for molecules and materials. J. Chem. Phys. 148(24), 241722–241722. https://doi.org/10.1063/1.5019779 (2018).
Hirohara, M. et al. Convolutional neural network based on SMILES representation of compounds for detecting chemical motif. BMC Bioinform. 19(Suppl 19), 526–526. https://doi.org/10.1186/s12859-018-2523-5 (2018).
Arús-Pous, J. et al. Randomized SMILES strings improve the quality of molecular generative models. J. Cheminform. 11(1), 71–71. https://doi.org/10.1186/s13321-019-0393-0 (2019).
Njimbouom, S. N. & Kim, J.-D. MuMCyp_Net: a multimodal neural network for the prediction of CYP450. Expert Syst. Appl. 255, 124703. https://doi.org/10.1016/j.eswa.2024.124703 (2024).
Chang, J., Fan, X. & Tian, B. DeepP450: predicting human P450 activities of small molecules by integrating pretrained protein language model and molecular representation. J. Chem. Inf. Model. 64(8), 3149–3160. https://doi.org/10.1021/acs.jcim.4c00115 (2024).
Chen, Z. et al. Prediction of cytochrome P450 inhibition using a deep learning approach and substructure pattern recognition. J. Chem. Inf. Model. 64(7), 2528–2538. https://doi.org/10.1021/acs.jcim.3c01396 (2024).
Li, L. et al. Machine learning models to predict cytochrome p450 2b6 inhibitors and substrates. Chem. Res. Toxicol. 36(8), 1332–1344. https://doi.org/10.1021/acs.chemrestox.3c00065 (2023).
Ai, D. et al. DEEPCYPs: A deep learning platform for enhanced cytochrome P450 activity prediction. Front. Pharmacol. 14(April), 1–12. https://doi.org/10.3389/fphar.2023.1099093 (2023).
Ouzounis, S. et al. A robust machine learning framework built upon molecular representations predicts CYP450 inhibition: toward precision in drug repurposing. OMICS J. Integr. Biol. 27(7), 305–314. https://doi.org/10.1089/omi.2023.0075 (2023).
Guan, J., Yao, L., Chung, C. R., Chiang, Y. C. & Lee, T. Y. Stackthpred: Identifying tumor-homing peptides through gbdt-based feature selection with stacking ensemble architecture. Int. J. Mol. Sci. 24(12), 10348 (2023).
Weiser, B. et al. Machine learning-augmented docking. 1. CYP inhibition prediction. Digital Discovery 2(6), 1841–1849. https://doi.org/10.1039/d3dd00110e (2023).
Qiu, M. et al. A unified GCNN model for predicting CYP450 inhibitors by using graph convolutional neural networks with attention mechanism. Comput. Biol. Med. 150, 106177–106177. https://doi.org/10.1016/j.compbiomed.2022.106177 (2022).
Nguyen-Vo, T.-H. et al. iCYP-MFE: identifying human cytochrome P450 inhibitors using multitask learning and molecular fingerprint-embedded encoding. J. Chem. Inf. Model. 62(21), 5059–5068. https://doi.org/10.1021/acs.jcim.1c00628 (2022).
Plonka, W. et al. CYPlebrity: machine learning models for the prediction of inhibitors of cytochrome P450 enzymes. Bioorg. Med. Chem. 46, 116388–116388. https://doi.org/10.1016/j.bmc.2021.116388 (2021).
Park, H. et al. Prediction of human cytochrome P450 inhibition using bio-selectivity induced deep neural network. Bull. Korean Chem. Soc. 43(2), 261–269. https://doi.org/10.1002/bkcs.12445 (2022).
Vaswani, A. et al. Attention is all you need. In Advances in Neural Information Processing Systems 30 (2017).
Zonyfar, C., Njimbouom, S. N., Mosalla, S. & Kim, J. D. GTransCYPs: an improved graph transformer neural network with attention pooling for reliably predicting CYP450 inhibitors. J. Cheminform. 16(2), 95–106 (2024).
Lin, F., Tseng, Y. & Yuan, L. Vision transformers in computational pharmacology: Applications and adaptations. Front. Mol. Biosci. 8(3), 210–219 (2021).
Chen, Z. & Wang, Z. Multimodal transformers for drug discovery. J. Cheminform. 15(1), 72–82 (2023).
Zhao, T. & Wang, J. Graph neural networks for drug interaction prediction. Comput. Biol. Chem. 101, 107–117 (2022).
Banerjee, P. et al. SuperCYPsPred-a web server for the prediction of cytochrome activity. Nucleic Acids Res. 48(W1), W580–W585. https://doi.org/10.1093/nar/gkaa166 (2020).
Njimbouom, S. N. et al. Efficient machine learning-based prediction of CYP450 inhibition. In International Conference on Database and Expert Systems Applications, LNCS vol. 14147. https://doi.org/10.1007/978-3-031-39821-6_36 (2023).
Wu, H., Di, Y.-Q., Zheng, C.-H. & Xia, J. Prediction of cytochrome P450 inhibition using an ensemble of extreme learning machines. In 2013 IEEE International Conference on Bioinformatics and Biomedicine, Shanghai, China, pp. 342–344. https://doi.org/10.1109/BIBM.2013.6732515 (2013).
Kim, S. et al. PubChem 2023 update. Nucleic Acids Res. 51(D1), D1373–D1380. https://doi.org/10.1093/nar/gkac956 (2023).
Berman, H. M. et al. The protein data bank. Nucleic Acids Res. 28(1), 235–242. https://doi.org/10.1093/nar/28.1.235 (2000).
Weininger, D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 28(1), 31–36 (1988).
IUPAC-IUB Joint Commission on Biochemical Nomenclature. Nomenclature and symbolism for amino acids and peptides (Recommendations 1983). Pure Appl. Chem. 56(5), 595–624. https://doi.org/10.1351/pac198456050595 (1984).
Durant, J. L., Leland, B. A., Henry, D. R. & Nourse, J. G. Reoptimization of MDL keys for use in drug discovery. J. Chem. Inf. Comput. Sci. 42, 1273–1280 (2002).
Stiefl, N., Watson, I. A., Baumann, K. & Zaliani, A. ErG: 2D pharmacophore descriptions for scaffold hopping. J. Chem. Inf. Model 46, 208–220 (2006).
Bolton, E. E. et al. Chapter 12–PubChem: integrated platform of small molecules and biological activities. Ann. Rep. Comput. Chem. 4, 217–241 (2008).
Landrum, G.A. RDKit: Open-source cheminformatics. Release 2014.03.1.
Pawar, S. S. Review on Discovery Studio: an important tool for molecular docking. https://doi.org/10.5958/0974-4150.2021.00014.6 (2021).
Marcus, F. I. Drug interactions with amiodarone. Am. Heart J. https://doi.org/10.1016/0002-8703(83)90017-0 (1983).
Yuan, S., Chan, H. C. S. & Hu, Z. Using PyMOL as a platform for computational drug design. WIREs Comput. Mol. Sci. 7(2), e1298. https://doi.org/10.1002/wcms.1298 (2017).
Morris, G. M. et al. AutoDock4 and AutoDockTools4: Automated docking with selective receptor flexibility. J. Comput. Chem. 30, 2785–2791 (2009).
Acknowledgements
This work was supported by the IITP (Institute of Information & Communications Technology Planning & Evaluation)-ITRC (Information Technology Research Center) grant funded by the Korea government (Ministry of Science and ICT) (IITP-2025-RS-2024-00437191). This work was also supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (No. RS-2023-00256517).
Funding
This work was supported by the IITP (Institute of Information and Communications Technology Planning and Evaluation)-ITRC (Information Technology Research Center) grant funded by the Korea government (Ministry of Science and ICT), (IITP-2025-RS-2024-00437191).
Author information
Authors and Affiliations
Contributions
A.A.A.: Writing—Original Draft, Methodology, Simulation; W.-P.W.: Supervision, Formal Analysis, Writing—Review & Editing; M.A.A.: Supervision, Writing—Review & Editing, Funding acquisition, Conceptualization; S.B.Y.: Software, Visualization; C.J.E.: Data Curation, Methodology; E.K.T.: Software, Visualization; G.-M.B.K.: Formal analysis, Validation; G.S.A.E.: Data Curation, Methodology, Simulation; Y.H.G.: Writing—Review & Editing, Visualization, Funding acquisition; E.A.: Writing—Review & Editing.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Atwereboannah, A.A., Wu, WP., Al-antari, M.A. et al. MEN: leveraging explainable multimodal encoding network for precision prediction of CYP450 inhibitors. Sci Rep 15, 21820 (2025). https://doi.org/10.1038/s41598-025-04982-6