Introduction

Drug development is a prolonged and resource-intensive endeavor, often taking over ten years and costing approximately USD 1.4 billion, with clinical trials representing 80% of these costs1,2. The challenge of identifying effective compounds stems from the vastness of the chemical space and complexity of biological interactions. Traditional approaches such as high-throughput screening and ligand-based design are limited by inefficiency and scalability3,4. In contrast, recent advances in machine learning (ML) and deep learning (DL) have shown great promise in addressing these limitations, accelerating chemical space exploration, and enhancing drug discovery workflows5,6.

Generative artificial intelligence (AI) models, particularly Generative Adversarial Networks (GANs)7 and Variational Autoencoders (VAEs)8, have emerged as transformative tools for drug discovery. These models generate novel molecular data that replicate the properties of existing compounds, enabling efficient exploration of the chemical space and more accurate drug-target interaction (DTI) prediction9,10. VAEs primarily focus on producing synthetically feasible molecules, whereas GANs generate structurally diverse compounds with desirable pharmacological characteristics11,12.

While VAEs effectively capture latent molecular representations, they may generate overly smooth distributions, limiting structural diversity. GANs complement this by introducing adversarial learning, enhancing molecular variability, mitigating mode collapse, and generating novel, chemically valid molecules. This synergy ensures precise interaction modeling, optimizing both feature extraction and molecular diversity, and ultimately improving DTI prediction accuracy13. Integrating GANs with multilayer perceptrons (MLPs) further improves the accuracy of DTI predictions by leveraging generated features.

Generative AI has the potential to streamline the drug discovery process by reducing costs, accelerating timelines, and improving overall efficiency. For example, it can minimize clinical development costs by up to 50%, shorten the trial duration by over 12 months, and raise the net present value by at least 20% through automation, regulatory optimization, and enhanced quality control. The McKinsey Global Institute reported that generative AI could contribute between USD 60 billion and USD 110 billion annually to the pharmaceutical sector, underscoring its transformative role in drug development and therapeutic advancement14,15. Integrating generative models with simplified molecular input line entry system (SMILES)-based molecular data can optimize drug discovery, improve DTI predictions, and facilitate the development of novel therapies16,17.

Problem identification and research objectives

Drug discovery involves extensive laboratory experiments and clinical trials, which limit the exploration of vast chemical spaces18. This study presents an advanced computational framework for precise DTI prediction designed to streamline drug discovery, while reducing costs and timelines. The primary objectives of this study are as follows:

  • Utilize VAEs to generate latent representations of molecular structures and novel molecules for target protein interactions.

  • Employ GANs to generate realistic molecular structures, enhancing compound efficacy.

  • Integrate MLP to refine the DTI predictions using a labeled dataset.

  • Improve predictive accuracy while ensuring effective molecule-target interactions.

Methodology

This section outlines the procedures and essential aspects of the experiments.

Model development

The proposed framework for DTI prediction is illustrated in Fig. 1, which highlights its architecture and workflow. The subsequent sections discuss in detail the methodologies and components integral to the framework design. The VGAN-DTI framework integrates GANs, VAEs, and MLPs to enhance DTI prediction. Its components include:

  • VAEs for refining small molecular representations

  • GANs for generating diverse drug-like molecules

  • MLPs for predicting binding affinities

Fig. 1

Overview of the VGAN-DTI framework integrating VAEs, GANs, and MLPs for enhanced DTI prediction. VAEs encode molecular features, GANs generate diverse candidates, and MLPs classify interactions and predict binding affinities.

VAE architecture

VAEs use a probabilistic encoder-decoder structure that encodes data into a distribution to generate diverse, smooth samples.

  • Encoder network

    The encoder input layer receives molecular features as fingerprint vectors, whereas the hidden layers consist of fully connected units activated by a Rectified Linear Unit (ReLU). Typical configurations include two to three hidden layers, each with 512 units. The encoder function \(f_{\theta }\) is represented by Eq. (1).

    $$\begin{aligned} z = f_{\theta }(x) \end{aligned}$$
    (1)

    where \(x\) is the input molecular structure, and \(z\) is the latent representation.

    The latent-space layer generates the mean \(\mu\) and log-variance \(\log \sigma ^2\) of the latent-space distribution. This is achieved using an initial dense layer followed by two distinct and separately parameterized dense layers, one for \(\mu\) and one for \(\log \sigma ^2\). This process is described by Eq. (2).

    $$\begin{aligned} q(z|x) = {\mathcal {N}}(z|\mu (x), \sigma ^2(x)) \end{aligned}$$
    (2)

    where \(\mu (x)\) and \(\sigma ^2(x)\) denote the mean and variance output of the encoder, respectively.

  • Decoder network

    The decoder input layer receives a sample \(z\) from the latent space. The hidden layers consist of fully connected layers with ReLU activation and are typically designed to mirror the encoder configuration. The output layer generates molecular representations such as SMILES strings through a final dense layer. The decoder function is given by Eq. (3).

    $$\begin{aligned} p(x|z) = \textrm{Bernoulli}\big (x|\textrm{Decoder}(z)\big ) \end{aligned}$$
    (3)

    where \(\textrm{Decoder}(z)\) is the decoder network output.

The decoder reconstructs the original molecular structure by using the latent representation described in Eq. (4).

$$\begin{aligned} {\hat{x}} = g_{\phi }(z) \end{aligned}$$
(4)

where \({\hat{x}}\) denotes the reconstructed molecular structure and \(g_{\phi }\) is the decoder function parameterized by \(\phi\).

The VAE loss function combines the reconstruction loss with the Kullback-Leibler (KL) divergence19 between the learned latent distribution and prior distribution p(z) as described in Eq. (5).

$$\begin{aligned} {\mathcal {L}}_{\text {VAE}} = -{\mathbb {E}}_{q_{\theta }(z|x)}[\log p_{\phi }(x|z)] + D_{\text {KL}}[q_{\theta }(z|x) || p(z)] \end{aligned}$$
(5)

The reconstruction loss measures the accuracy of the decoder in reconstructing the input from the latent space. The KL divergence penalizes deviations between the learned latent distribution and the prior distribution p(z), which is typically a standard normal distribution.
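As a minimal sketch of how the two terms of the VAE loss can be evaluated numerically, the following NumPy snippet assumes a Bernoulli decoder over binary fingerprint bits, as in Eq. (3), a diagonal Gaussian posterior, as in Eq. (2), and a standard normal prior, for which the KL term has a well-known closed form. The function name `vae_loss`, the toy batch, and the array sizes are illustrative assumptions and not part of the original implementation.

```python
import numpy as np

def vae_loss(x, x_hat, mu, log_var, eps=1e-7):
    """Per-batch VAE loss (to be minimized): reconstruction term plus KL term."""
    x_hat = np.clip(x_hat, eps, 1.0 - eps)  # avoid log(0)
    # -log p(x|z) for a Bernoulli decoder, summed over fingerprint bits
    recon = -np.sum(x * np.log(x_hat) + (1.0 - x) * np.log(1.0 - x_hat), axis=1)
    # KL[N(mu, sigma^2) || N(0, I)] in closed form, summed over latent dimensions
    kl = -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=1)
    return float(np.mean(recon + kl)), float(np.mean(recon)), float(np.mean(kl))

# Toy batch (assumed sizes): 4 binary vectors of length 8, 2 latent dimensions
rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=(4, 8)).astype(float)
x_hat = np.full_like(x, 0.5)   # an uninformative decoder output
mu = np.zeros((4, 2))
log_var = np.zeros((4, 2))     # posterior equals the prior, so the KL term vanishes
total, recon, kl = vae_loss(x, x_hat, mu, log_var)
```

In this toy case the posterior coincides with the prior, so the KL term is zero and the loss reduces to the reconstruction term alone.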

GAN architecture

GANs rank among the most productive and commonly used generative architectures, and deliver notable positive outcomes7. They comprise two neural networks, the generator and discriminator, which are trained adversarially. By utilizing these two modules, GANs can generate realistic molecular structures and predict DTI with high accuracy.

  • Generator network

    The generator input layer receives a random latent vector \(z\) sampled from the prior distribution \(p_z(z)\). The hidden layers are fully connected networks with activation functions such as ReLU, and the output layer produces molecular representations. The generator function is expressed in Eq. (6).

    $$\begin{aligned} x = G(z) \end{aligned}$$
    (6)

    where \(G\) denotes the generator network parameterized by \(\theta _g\).

  • Discriminator network

    The discriminator input layer receives molecular representations, either real or generated. The hidden layers comprise fully connected networks with activation functions such as leaky ReLU. The output layer provides a probability that indicates whether an input molecule is authentic. The discriminator function is given by Eq. (7).

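    $$\begin{aligned} D(x) = \sigma (f_{\theta _d}(x)) \end{aligned}$$
    (7)

    where \(\sigma\) is the sigmoid function, \(f_{\theta _d}(x)\) is the pre-activation output (logit) of the discriminator network with parameters \(\theta _d\), and \(D(x)\) is the resulting probability that \(x\) is a real molecule.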

  • Loss function

    The discriminator loss is expressed as Eq. (8).

    $$\begin{aligned} {\mathcal {L}}_D = -{\mathbb {E}}_{x \sim p_{\text {data}}(x)} \left[ \log D(x) \right] - {\mathbb {E}}_{z \sim p_z(z)} \left[ \log \left( 1 - D(G(z)) \right) \right] \end{aligned}$$
    (8)

    where \(p_{\text {data}}(x)\) represents the distribution of real molecules and \(p_z(z)\) is the prior distribution of the latent vectors.

    Similarly, generator loss is expressed by Eq. (9).

    $$\begin{aligned} {\mathcal {L}}_G = -{\mathbb {E}}_{z \sim p_z(z)} \left[ \log D(G(z)) \right] \end{aligned}$$
    (9)

    The loss function prompts the generator to produce molecules that the discriminator classifies as real.
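To make the two adversarial losses concrete, the sketch below evaluates them from discriminator output probabilities, using the sign convention in which both the discriminator and generator losses are minimized (as in Eqs. (9), (16), and (17)). The function name `gan_losses` and the small constant `eps` are illustrative assumptions.

```python
import math

def gan_losses(d_real, d_fake, eps=1e-12):
    """d_real: D(x) on real molecules; d_fake: D(G(z)) on generated ones (probabilities)."""
    n_real, n_fake = len(d_real), len(d_fake)
    # Discriminator loss: reward high D(x) on real data and low D(G(z)) on generated data
    loss_d = (-sum(math.log(p + eps) for p in d_real) / n_real
              - sum(math.log(1.0 - p + eps) for p in d_fake) / n_fake)
    # Generator loss: reward high D(G(z)), i.e. generated samples judged as real
    loss_g = -sum(math.log(p + eps) for p in d_fake) / n_fake
    return loss_d, loss_g

# A discriminator that cannot tell real from generated outputs 0.5 everywhere
loss_d, loss_g = gan_losses([0.5, 0.5], [0.5, 0.5])
```

When the discriminator outputs 0.5 for every input, the discriminator loss equals 2 ln 2 and the generator loss equals ln 2, which corresponds to the point at which real and generated samples are indistinguishable.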

Multilayer perceptron (MLP)

MLPs are essential for improving DTI predictions. After generating molecules using VAEs and GANs, MLPs predict the interactions between these molecules and target proteins. As universal function approximators, MLPs capture complex nonlinear relationships, enabling accurate predictions from labeled DTI datasets. The MLP DTI prediction model comprises an input layer, several hidden layers, and an output layer. The input layer merges the features of the drug molecule and the target protein into a vector, which is processed through three hidden layers using linear transformations and nonlinear activation functions. The output layer produces a scalar that indicates the probability of interaction. During the forward pass, the model computes the interactions, and the loss function measures the errors to improve predictive accuracy.

  • Forward pass

    The output of each layer is computed during the forward pass through the linear and nonlinear activation functions. The output layer subsequently predicts interactions based on the learned features. Consider the following notation:

    • x is the input feature vector,

    • \(W_i\) is the weight matrix of the i-th layer,

    • \(b_i\) is the bias vector of the i-th layer,

    • \(h_i\) is the output of the i-th hidden layer,

    • \(\sigma\) is the activation function (e.g., ReLU for hidden layers, and sigmoid for the output layer).

    Equations (10) and (11) define the computations for the hidden layers and output layer, respectively.

    $$\begin{aligned} h_i = \sigma (W_i h_{i-1} + b_i), \quad i = 1, 2, 3, \quad h_0 = x \end{aligned}$$
    (10)
    $$\begin{aligned} {\hat{y}} = \sigma (W_4 h_3 + b_4) \end{aligned}$$
    (11)

    where \({\hat{y}}\) represents the predicted interaction between the drug and the target protein.

  • Loss function

    The MLP model is trained using the Mean Squared Error (MSE) loss, which calculates the average squared difference between the predicted interaction (\({\hat{y}}\)) and true interaction (y). The loss function \(L_{MLP}\) is defined by Eq. (12).

    $$\begin{aligned} L_{MLP} = {\mathbb {E}}[(y - {\hat{y}})^2] \end{aligned}$$
    (12)

    where y denotes the true interaction label, \({\hat{y}}\) is the predicted interaction label, and \({\mathbb {E}}\) denotes the expectation (average) for all the training examples.
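The forward pass and loss described above can be sketched in NumPy as follows. The three hidden layers of 512 ReLU units and the sigmoid output follow the text; the He-style weight initialization, the toy feature dimensions, and all function names are assumptions made for illustration only.

```python
import numpy as np

def relu(a):
    return np.maximum(a, 0.0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def init_mlp(in_dim, hidden=(512, 512, 512), seed=0):
    """Create (W_i, b_i) pairs for three hidden layers plus a scalar output layer."""
    rng = np.random.default_rng(seed)
    dims = (in_dim, *hidden, 1)
    return [(rng.normal(0.0, np.sqrt(2.0 / d_in), size=(d_out, d_in)), np.zeros(d_out))
            for d_in, d_out in zip(dims[:-1], dims[1:])]

def forward(params, x):
    """Each hidden layer takes the previous layer's output; the output layer is a sigmoid."""
    h = x
    for W, b in params[:-1]:
        h = relu(W @ h + b)
    W_out, b_out = params[-1]
    return sigmoid(W_out @ h + b_out)[0]

def mse(y_true, y_pred):
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

# Toy drug and protein feature vectors (assumed sizes), merged into one input vector
x_d, x_t = np.ones(16), np.ones(8)
x = np.concatenate([x_d, x_t])
params = init_mlp(x.size)
y_hat = forward(params, x)
loss = mse([1.0], [y_hat])
```

The sigmoid keeps the prediction in the interval (0, 1), so the squared error against a binary label is bounded by 1 for each example.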

GAN algorithm

The GAN framework consists of two networks: a generator, which creates synthetic samples, and a discriminator, which distinguishes real data from generated data. Both networks are improved through adversarial training: the generator learns to produce more realistic samples, while the discriminator improves its detection accuracy. The discriminator loss measures how well real and synthetic data are told apart, whereas the generator is updated to minimize its own loss, which decreases as the discriminator classifies generated samples as real. Both networks are optimized using their respective loss functions. Algorithm 1 presents the execution steps for the GAN model.

Algorithm 1

Pseudo-code for GAN.
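As a rough illustration of the alternating updates described above, and not a reproduction of Algorithm 1, the following toy sketch trains a one-dimensional GAN with hand-derived gradients. The linear generator, the logistic discriminator, the Gaussian stand-in for real data, the plain gradient-descent updates (in place of Adam), and the learning rate are all simplifying assumptions; only the batch size of 64 is taken from the text.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def train_toy_gan(steps=200, batch_size=64, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    a, b = 1.0, 0.0   # generator G(z) = a*z + b
    w, c = 0.0, 0.0   # discriminator D(x) = sigmoid(w*x + c)
    for _ in range(steps):
        # Discriminator step: minimize L_D with the generator held fixed
        x_real = rng.normal(4.0, 1.0, size=batch_size)  # stand-in for real data
        z = rng.normal(size=batch_size)
        x_fake = a * z + b
        d_real, d_fake = sigmoid(w * x_real + c), sigmoid(w * x_fake + c)
        grad_w = np.mean((d_real - 1.0) * x_real) + np.mean(d_fake * x_fake)
        grad_c = np.mean(d_real - 1.0) + np.mean(d_fake)
        w, c = w - lr * grad_w, c - lr * grad_c
        # Generator step: minimize L_G = -E[log D(G(z))] with the discriminator held fixed
        z = rng.normal(size=batch_size)
        d_fake = sigmoid(w * (a * z + b) + c)
        grad_a = np.mean((d_fake - 1.0) * w * z)
        grad_b = np.mean((d_fake - 1.0) * w)
        a, b = a - lr * grad_a, b - lr * grad_b
    return {"a": a, "b": b, "w": w, "c": c}

params = train_toy_gan()
```

The two updates within each iteration mirror the alternating scheme: the discriminator is first adjusted on a batch of real and generated samples, and the generator is then adjusted using the gradient of its own loss through the updated discriminator.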

VAE algorithm

The VAE generates new data using a probabilistic model composed of an encoder and decoder. Initially, the encoder and decoder weights are randomly initialized, and optimization is performed with a fixed learning rate. The encoder projects the input into the latent space, and the decoder recreates the original input from the samples within this space. The VAE loss combines the reconstruction loss and KL divergence, regularizing the latent space to follow a standard normal distribution. Minimizing this loss function enhances reconstruction accuracy and optimizes the latent-space structure. Algorithm 2 outlines the execution steps of the VAEs.

Algorithm 2

Pseudo-code for VAE.
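The encode, sample, and decode sequence described above can be sketched as a single forward pass. This is an illustrative NumPy sketch under stated assumptions rather than Algorithm 2 itself: the layer sizes are toy values, the weights are random and untrained, and sampling uses the standard reparameterization \(z = \mu + \sigma \epsilon\) with \(\epsilon \sim {\mathcal {N}}(0, I)\), which the text does not spell out.

```python
import numpy as np

rng = np.random.default_rng(0)
n_bits, hidden, latent = 32, 16, 4   # toy sizes, far smaller than a real fingerprint

def dense(d_in, d_out):
    """Random (untrained) weights and zero biases for one fully connected layer."""
    return rng.normal(0.0, 0.1, size=(d_in, d_out)), np.zeros(d_out)

W_h, b_h = dense(n_bits, hidden)      # shared initial encoder layer
W_mu, b_mu = dense(hidden, latent)    # separate head for the mean
W_lv, b_lv = dense(hidden, latent)    # separate head for the log-variance
W_d1, b_d1 = dense(latent, hidden)    # decoder hidden layer
W_d2, b_d2 = dense(hidden, n_bits)    # decoder output layer

def encode(x):
    h = np.maximum(x @ W_h + b_h, 0.0)            # ReLU hidden layer
    return h @ W_mu + b_mu, h @ W_lv + b_lv       # mu(x), log sigma^2(x)

def reparameterize(mu, log_var):
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps       # z = mu + sigma * eps

def decode(z):
    h = np.maximum(z @ W_d1 + b_d1, 0.0)
    return 1.0 / (1.0 + np.exp(-(h @ W_d2 + b_d2)))  # per-bit Bernoulli probabilities

x = rng.integers(0, 2, size=(5, n_bits)).astype(float)
mu, log_var = encode(x)
z = reparameterize(mu, log_var)
x_hat = decode(z)
```

In a trained model, the reconstruction `x_hat` and the pair `(mu, log_var)` would be passed to the VAE loss, and the gradients would be propagated through the sampling step via the reparameterization.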

MLP algorithm

MLP training starts by initializing the model parameters and optimizer. In each epoch, the molecular and protein features are concatenated and passed through the MLP. The forward pass computes the activations through the hidden layers to produce the predicted interaction. The mean squared error (MSE) loss is then calculated, and the weights are updated using backpropagation and gradient descent. This process is outlined in Algorithm 3.

Algorithm 3

Pseudo-code for MLP.

Experiments

This section outlines the experimental configurations used for training and evaluating the VGAN-DTI framework.

Dataset

The dataset used in this study was sourced from the BindingDB repository, which contains extensive data on small molecule-protein binding affinities. A subset of approximately 1.3 million records was selected, ensuring that each entry included a PubChem CID, SMILES string, UniProt ID, sequence data, and Gene Ontology annotations. From this collection, only records containing IC50 values were considered, as IC50 is the most consistently reported and widely utilized binding affinity metric in DTI classification tasks20,21.

Entries with only Ki or Kd values were excluded from this study. While Ki (inhibition constant) and Kd (dissociation constant) provide valuable insights into binding affinity, they lack standardized binary classification thresholds and are highly context-dependent, making them unsuitable for consistent label assignment across large-scale datasets.

To ensure high-quality labeling and clear binary classification, we adopted a well-established thresholding strategy commonly used in prior DTI studies22,23, where strong and weak interactions are defined as follows:

  • Strong (positive) interaction: IC50 below 100 nM

  • Weak (negative) interaction: IC50 greater than 10,000 nM

Entries with IC50 values between 100 nM and 10,000 nM or missing any essential fields were removed to maintain binary label precision and eliminate ambiguity. The dataset was further filtered to retain only drug-like small molecules (molecular weight below 1000 Da), based on extended thresholds beyond Lipinski’s Rule of Five, to include structurally diverse bioactive compounds with potential therapeutic relevance24,25.
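The labeling rule above can be expressed as a short function. This is a minimal sketch: the thresholds are those stated in the text, values exactly at a threshold are treated as ambiguous (an assumption, since the text specifies only "below" and "greater than"), and the record identifiers are hypothetical placeholders.

```python
from typing import Optional

STRONG_NM = 100.0     # IC50 below this value: strong (positive) interaction
WEAK_NM = 10_000.0    # IC50 above this value: weak (negative) interaction

def label_ic50(ic50_nm: float) -> Optional[int]:
    """Return 1 (strong), 0 (weak), or None for the excluded intermediate range."""
    if ic50_nm < STRONG_NM:
        return 1
    if ic50_nm > WEAK_NM:
        return 0
    return None

# Hypothetical records: (compound identifier, IC50 in nM)
records = [("CID_A", 12.0), ("CID_B", 850.0), ("CID_C", 25_000.0)]
labeled = [(cid, label_ic50(v)) for cid, v in records if label_ic50(v) is not None]
```

Here the second record falls in the intermediate range and is dropped, leaving one positive and one negative example.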

Data processing

Data quality significantly affects the accuracy of DL-based frameworks26. Data optimization for model training involves converting molecular structures into the SMILES notation, which is a compact and linear representation. Morgan fingerprints were computed to capture essential chemical properties and to identify structural features and substructures27.

The dataset was preprocessed to ensure standardized molecular and target representations. Canonical SMILES strings were used for drug molecules, and UniProt IDs were utilized to identify protein targets. All IC50 values were converted to nanomolar (nM) units to maintain uniformity across the dataset. Records lacking essential information such as SMILES, protein sequences, or annotations were excluded. Additionally, duplicate entries and those with inconsistent or ambiguous activity values were removed to minimize noise and improve label consistency.

A systematic partitioning scheme was applied, following established protocols in the DTI literature23, to evaluate model generalizability. The curated dataset was divided into four experimental settings:

  1. Both seen: Drug-target pairs in which both the drug and protein were present during training.

  2. Drug unseen: Test pairs included novel drugs not encountered during training, with known proteins.

  3. Protein unseen: Test pairs contained novel proteins, with drugs previously seen during training.

  4. Both unseen: Drug-target pairs in which neither the drug nor the protein appeared during training.

This partitioning approach enables comprehensive evaluation across varying levels of difficulty and biological relevance, simulating real-world DTI prediction scenarios where novel compounds or targets may be encountered.
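The four settings can be assigned mechanically once the sets of drugs and proteins present in the training data are known. The sketch below illustrates this; the function name and the identifiers `D1`, `P1`, and so on are hypothetical.

```python
def split_setting(drug, protein, train_drugs, train_proteins):
    """Assign a test pair to one of the four evaluation settings."""
    drug_seen = drug in train_drugs
    protein_seen = protein in train_proteins
    if drug_seen and protein_seen:
        return "both seen"
    if protein_seen:
        return "drug unseen"
    if drug_seen:
        return "protein unseen"
    return "both unseen"

# Hypothetical training pairs and the entity sets derived from them
train_pairs = [("D1", "P1"), ("D2", "P2")]
train_drugs = {d for d, _ in train_pairs}
train_proteins = {p for _, p in train_pairs}

settings = [split_setting(d, p, train_drugs, train_proteins)
            for d, p in [("D1", "P2"), ("D9", "P1"), ("D2", "P9"), ("D9", "P9")]]
```

Note that a pair such as `("D1", "P2")` counts as "both seen" because each entity appeared in training, even though that specific pairing did not.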

Feature extraction

Feature extraction from the BindingDB dataset utilizes techniques that effectively represent the molecular structures. Morgan fingerprints encode structural fragments as vectors, whereas physicochemical properties such as the octanol-water partition coefficient (log P), molecular weight, hydrogen bond acceptor count, and topological polar surface area enrich the chemical and biological profiles of compounds28. The molecular structures in the SMILES notation29 are transformed into numerical formats, such as embeddings, for compatibility with DL models. Graph-based features, representing atoms as nodes and bonds as edges, capture structural details including connectivity, bond lengths, and ring sizes. These descriptors form a robust foundation for DTI prediction and generative drug design30.

Feature representation

A key strength of the VGAN-DTI framework is its emphasis on precise feature representation. Small molecules encoded as SMILES strings were converted into molecular fingerprints using the RDKit module31 to represent structural and chemical properties for model training. Fingerprints were converted into numerical vectors that were compatible with the predictive model. Protein sequences were encoded as numeric vectors by integrating the amino acid composition and physicochemical properties to represent key biochemical features. This comprehensive feature representation enables the model to detect complex DTIs with enhanced predictive accuracy.
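As an illustration of the protein side of this encoding, the sketch below computes the amino acid composition of a sequence as a 20-dimensional frequency vector. It covers only the composition part; the physicochemical descriptors mentioned above are omitted, and the function name, the residue ordering, and the example sequence are assumptions.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, in a fixed order

def amino_acid_composition(sequence: str) -> List[float]:
    """Fraction of each standard amino acid; non-standard symbols are ignored."""
    seq = sequence.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [c / total for c in counts]

vec = amino_acid_composition("MKTAYIAKQR")  # hypothetical short sequence
```

The resulting vector has a fixed length regardless of sequence length, which allows it to be concatenated with a drug fingerprint to form the MLP input.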

Model setup and training

The VGAN-DTI framework incorporates the VAE, GAN, and MLP components to predict drug-target interactions (DTIs). The setup and key training aspects are as follows.

VAE setup

The VAE optimizes the latent representations of molecular structures by minimizing a loss function that combines reconstruction loss and KL divergence. The reconstruction loss is defined in Eq. (13).

$$\begin{aligned} L_{\text {recon}} = -{\mathbb {E}}_{q(z|x)} \left[ \log p(x|z) \right] \end{aligned}$$
(13)

KL divergence, as given in Eq. (14), regularizes the latent space to approximate a standard normal distribution.

$$\begin{aligned} L_{\text {KL}} = D_{KL} \left[ q(z \mid x) \parallel p(z) \right] \end{aligned}$$
(14)

The total VAE loss, which combines both terms, is expressed by Eq. (15).

$$\begin{aligned} L_{\text {VAE}} = L_{\text {recon}} + L_{\text {KL}} \end{aligned}$$
(15)

GAN setup

The GAN framework comprises a generator and a discriminator. The generator loss, expressed in Eq. (16), drives the generation of realistic molecular samples.

$$\begin{aligned} L_G = -{\mathbb {E}}_{z \sim p(z)} \left[ \log D(G(z)) \right] \end{aligned}$$
(16)

The discriminator loss, given in Eq. (17), enables the discriminator to differentiate real from generated data.

$$\begin{aligned} L_D = -{\mathbb {E}}_{x \sim p_{\text {data}}(x)} \left[ \log D(x) \right] - {\mathbb {E}}_{z \sim p(z)} \left[ \log (1 - D(G(z))) \right] \end{aligned}$$
(17)

MLP setup

The MLP predicts DTIs by minimizing the Mean Squared Error (MSE) loss, as shown in Eq. (18).

$$\begin{aligned} L_{\text {MLP}} = {\mathbb {E}} \left[ (y - {\hat{y}})^2 \right] \end{aligned}$$
(18)

where \(y\) denotes the true interaction value, and \({\hat{y}}\) denotes the predicted value. This setup ensures accurate DTI predictions.

By combining these components, the VGAN-DTI framework efficiently learns molecular representations and predicts the DTIs.

Optimizing GAN, VAE, and MLP

The GAN module was trained using the Adam optimizer with a learning rate of 0.0002. The generator and discriminator were updated alternately to minimize their respective losses, with a batch size of 64 and a latent dimension of 100. Similarly, the VAE was trained for 150 epochs using the same optimizer settings to ensure effective latent-space learning and accurate molecular representations.

The MLP model consists of three hidden layers, each with 512 units and ReLU activation. The training utilized the Adam optimizer at a learning rate of 0.0002, with the output layer employing either sigmoid or softmax activation, depending on the task. The key optimization parameters are listed in Table 1.

Table 1 Hyperparameters for the proposed VGAN-DTI model.

DTI prediction

The DTI prediction process integrates the molecules generated by the VAE and GAN models as inputs for the MLP, facilitating accurate predictions based on their learned representations. These models generate feature vectors that capture the structural properties of drug molecules. These vectors are combined with the target protein feature vectors to form inputs for the MLP, which then predicts the likelihood of interaction between drugs and target proteins.

Let \({\textbf{x}}_d\) represent the feature vector of a drug molecule and \({\textbf{x}}_t\) represent the feature vector of the target protein. The input to the MLP is a concatenated vector \({\textbf{x}} = [{\textbf{x}}_d, {\textbf{x}}_t]\).

The forward pass through the MLP is mathematically defined by Eqs. (19)–(22).

$$\begin{aligned} h_1&= \sigma (W_1 {\textbf{x}} + b_1) \end{aligned}$$
(19)
$$\begin{aligned} h_2&= \sigma (W_2 h_1 + b_2) \end{aligned}$$
(20)
$$\begin{aligned}&\vdots \end{aligned}$$
(21)
$$\begin{aligned} {\hat{y}}&= \sigma (W_n h_{n-1} + b_n) \end{aligned}$$
(22)

Here, \(W_i\) and \(b_i\) are the weight and bias for the \(i\)-th layer, respectively, \(\sigma\) is the activation function (ReLU for the hidden layers and sigmoid or softmax for the output layer), and \({\hat{y}}\) is the predicted interaction score.

Evaluation

The proposed model was rigorously evaluated using key performance metrics, with validation loss monitored during training to mitigate overfitting. Early stopping was implemented to optimize performance, and an independent test set was used to ensure unbiased assessment. The evaluation framework included precision, recall, accuracy, and F1 score. Accuracy reflects the proportion of correct predictions; precision indicates the reliability of positive predictions; recall quantifies the model’s ability to identify true positives; and the F1 score, as the harmonic mean of precision and recall, offers a comprehensive measure of predictive performance.
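For reference, the four metrics can be computed from binary labels and predictions as in the sketch below. The function name and the toy label vectors are illustrative and do not correspond to any result reported in this study.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = interaction)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy example: 3 true positives, 3 true negatives, 1 false positive, 1 false negative
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```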

DTI prediction distinguished between strong (IC50 < 100 nM) and weak interactions (IC50 > 10,000 nM). The performance metrics were analyzed across the four configurations by varying the inclusion of drugs and proteins in the training set. This comprehensive approach ensured thorough evaluation and identified areas for potential improvement.

Results

This section presents the experimental results of the VGAN-DTI approach for DTI prediction and drug design, emphasizing the effectiveness of integrating VAEs, GANs, and MLPs to achieve accurate and reliable outcomes.

Model performance

The VGAN-DTI framework achieved excellent performance across key evaluation metrics, as illustrated in Fig. 2. It achieved an AUC-ROC of 0.9523, precision of 0.9542, recall of 0.9412, F1 score of 0.9442, and an accuracy of 0.9635. These results highlight the robustness and precision of the framework, which makes it a valuable tool for drug discovery.

Fig. 2

Performance evaluation of the VGAN-DTI framework across key metrics, demonstrating its predictive accuracy, robustness, and generalization capability.

Comparative assessment with baseline methods

The VGAN-DTI framework advances DTI predictions by leveraging the combined strengths of the VAEs, GANs, and MLPs. VAEs explore the latent chemical space to identify novel drug candidates, whereas GANs generate diverse, high-quality molecules to augment the training data. The MLP trained on this enriched dataset achieved significant improvements in prediction accuracy. Three popular baseline methods are considered to evaluate the overall performance of the framework. As shown in Table 2, VGAN-DTI demonstrated superior predictive performance, outperforming the existing methods.

The ELECTRA-DTA model32 (Wang et al., 2022) reported an AUC-ROC of 0.8745, precision of 0.8321, recall of 0.8154, F1-score of 0.8236, and accuracy of 0.8452. The MDCT-DTA model33 (Zhu et al., 2024) had an AUC-ROC of 0.8543, precision of 0.8215, recall of 0.7987, F1-score of 0.8099, and accuracy of 0.8368. The HoTS model34 (Lee et al., 2022) delivered an AUC-ROC of 0.8891, precision of 0.8454, recall of 0.8327, F1-score of 0.8390, and accuracy of 0.8675. The VGAN-DTI framework delivers promising results, demonstrating significant improvements over the state-of-the-art methods.

Table 2 Comparison of VGAN-DTI approach with state-of-the-art methods.

Figure 3 compares the AUC-ROC scores of the VGAN-DTI, ELECTRA-DTA32, MDCT-DTA33, and HoTS34 models, highlighting the superior performance of VGAN-DTI for DTI predictions.

Fig. 3

Comparison of VGAN-DTI with state-of-the-art methods based on Accuracy, Precision-Recall, ROC AUC, and F1 Score, demonstrating its superior predictive performance and robustness.

Ablation study

Ablation studies were conducted to assess the individual impact of the VAE, GAN, and MLP components in the VGAN-DTI framework, with each module evaluated separately to determine its contribution to the overall predictive performance. Figure 5 illustrates the training loss curves for the VAE, GAN, and MLP components, showcasing their convergence and stability throughout the training process.

Molecule reconstruction by VAE

The VAE model encodes key features related to biological activity in the latent space, as shown in Fig. 4, enabling the generation of diverse molecular structures. The target protein interactions of these novel molecules were evaluated using the MLP model.

Fig. 4

Latent space representation learned by the VAE, illustrating the molecular feature distribution and structural diversity.

The VAE training optimizes the reconstruction loss and KL divergence to ensure high-quality molecule generation, as shown in Fig. 5a. This approach enabled the generated molecules to closely mirror the original molecules while retaining their potential for target-protein interactions. The model demonstrated balanced performance, achieving 92% accuracy with precision, recall, and F1 scores of 90%, 91%, and 91%, respectively, underscoring its ability to generate novel and relevant molecular structures for DTI.

Fig. 5

Training loss curves for (a) VAE, (b) GAN, and (c) MLP models, illustrating convergence trends, optimization performance, and training stability across epochs.

Molecule generation by GAN

The GAN model generates realistic molecular structures under the guidance of a discriminator, and the generated molecules are subsequently employed for DTI prediction, thereby improving accuracy. The steady improvement in the training loss curve, illustrated in Fig. 5b, reflects the optimization process of the model. The GAN achieved 94% accuracy, with precision, recall, and F1 scores of 93%, 92%, and 93%, respectively. These results are attributed to efficient adversarial training, in which the generator produces artificial molecules and the discriminator assesses their authenticity, thereby enhancing the overall molecular quality.

DTI prediction by MLP

The MLP module serves as the final predictor, leveraging the molecular representations generated by the VAE and GAN models. The model training and optimization processes are illustrated in Fig. 5c, which shows a steady improvement. The model achieved 96% accuracy in distinguishing between interacting and non-interacting drug-target pairs. It also achieved a precision of 95%, indicating few false positives, and a recall of 94%, confirming its strength in identifying true positives. An F1 score of 94% indicates a balanced performance between precision and recall. These findings underscore the efficacy of the model in capturing complex relationships and affirm its reliability for DTI predictions.

Discussion

This study highlights the integration of VAEs, GANs, and MLPs as a robust framework to improve DTI predictions. The following section examines the impact of the key parameters on model optimization.

Impact of training strategies

The efficacy of deep generative models depends on optimized training strategies. The VAE reconstruction loss ensures accurate molecule generation, whereas GAN adversarial training pits the generator against an increasingly capable discriminator, pushing the generator to produce realistic molecules. The strong performance of the MLP is attributed to its deep architecture, its dropout layers, which limit overfitting, and the diversity of the training data. Hyperparameter optimization, including learning rate and batch size adjustments, further improved the model efficiency.

Cross-validation and generalization analysis

A 5-fold cross-validation method was used to assess the stability and reliability of the models. The dataset was split into five segments, with four segments used for training and one segment for validation in each fold. The performance metrics were recorded for each fold, and their mean and standard deviation were computed across the five folds. The results in Table 3 highlight the consistency and robustness of the proposed model.
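The fold construction and the mean ± standard deviation summary can be sketched with the Python standard library as follows. The shuffling seed, the interleaved assignment of samples to folds, and the use of the sample (n − 1) standard deviation are assumptions, since the text does not specify them.

```python
import random
import statistics

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint, near-equal segments
    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val

def summarize(scores):
    """Mean and sample standard deviation of a per-fold metric."""
    return statistics.mean(scores), statistics.stdev(scores)

splits = list(k_fold_indices(23, k=5))
```

Each sample appears in exactly one validation segment, so every record contributes to the validation metrics once across the five folds.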

Since VGAN-DTI is the proposed and implemented model, its performance metrics are derived from five-fold cross-validation, allowing us to report mean ± standard deviation. In contrast, baseline models are state-of-the-art methods with performance metrics reported in prior studies or standard implementations, where standard deviations are often unavailable. This distinction ensures a fair comparison without extrapolating unreported variations from existing literature.

Table 3 5-fold cross-validation performance metrics.

The cross-validation results shown in Fig. 6 confirm the robustness and generalizability of the integrated model. Low standard deviations indicate consistent performance across data subsets, highlighting the reliability of the model. This consistency is crucial to ensure the applicability of the model to real-world data.

Fig. 6

Cross-validation performance metrics over five folds, highlighting the model’s robustness and generalization. Mean ± standard deviation is reported only for VGAN-DTI, which was fully implemented and evaluated via five-fold cross-validation.

Error analysis

A detailed error analysis was conducted to examine misclassification patterns in VGAN-DTI. The model exhibited a misclassification rate of 12%, primarily owing to overlapping molecular features, such as molecular weight, hydrophobicity, and polar surface area, between the interacting and non-interacting classes. This feature overlap can lead to ambiguous predictions, impacting the classification performance. Figure 7 illustrates the distribution of key features in the misclassified samples, highlighting the significant overlaps that contribute to the errors. To address these challenges, the framework leverages adversarial training to enhance feature distinctions and reduce misclassifications. Additionally, SHAP-based sensitivity analysis provides insights into feature contributions and improves interpretability and robustness. These findings indicate that the generative framework in VGAN-DTI enhances molecular representation, minimizing error propagation and improving prediction reliability over conventional DTI models.

Fig. 7

Error analysis of the VGAN-DTI model, revealing insights into predictive uncertainties and areas for refinement. By identifying error patterns and misclassified instances, this analysis guided targeted improvements in the molecular representation and model generalization. These insights contribute to enhancing the model’s robustness, optimizing generative learning strategies, and ultimately improving DTI prediction accuracy.
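One simple way to quantify the kind of class overlap discussed in the error analysis is a histogram overlap coefficient for a single feature. The sketch below is an assumption about how such an overlap could be measured, not the procedure used to produce Fig. 7; the function names and the molecular-weight values are hypothetical.

```python
def histogram(values, lo, hi, bins):
    """Normalized histogram (fractions summing to 1) over the range [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        # Clamp so that v == hi falls into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    total = len(values)
    return [c / total for c in counts]


def overlap_coefficient(a, b, bins=10):
    """Histogram overlap in [0, 1]; 1 means identical binned distributions."""
    lo = min(min(a), min(b))
    hi = max(max(a), max(b))
    if lo == hi:
        return 1.0  # all values identical, so the distributions coincide
    ha = histogram(a, lo, hi, bins)
    hb = histogram(b, lo, hi, bins)
    return sum(min(x, y) for x, y in zip(ha, hb))


# Hypothetical molecular weights (g/mol) for the two classes.
interacting = [310, 325, 340, 355, 370, 390]
non_interacting = [300, 330, 345, 360, 380, 410]

if __name__ == "__main__":
    print(overlap_coefficient(interacting, non_interacting, bins=5))
```

A value close to 1 for a feature would suggest that the feature alone separates the two classes poorly, which is consistent with the ambiguity described above.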

Feature importance and interpretability

SHapley Additive exPlanations (SHAP) summary and sensitivity analyses were performed to assess the performance and interpretability of the proposed model, as illustrated in Fig. 8. Table 4 summarizes the observations from the SHAP analysis, highlighting key features, such as molecular weight and hydrophobicity, that influence the predictions. SHAP values offer an interpretable framework that is essential for real-world applications and can inform future improvements. Figure 8a shows the SHAP summary plot, which ranks the features by importance and shows their impact on predictions.

Table 4 Feature importance based on mean SHAP values.
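To make the notion of a mean SHAP value concrete, the sketch below computes exact Shapley values by enumerating feature coalitions and then averages their absolute values over samples. It is a toy illustration under stated assumptions: it does not use the SHAP library or the VGAN-DTI model, the linear scorer `toy_score` and its weights are hypothetical, and features outside a coalition are replaced by a baseline value.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x relative to a baseline input.

    Features outside a coalition take their baseline value. The cost grows
    exponentially with the number of features, so this suits only a handful
    of descriptors.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def mean_abs_importance(predict, samples, baseline):
    """Mean absolute Shapley value per feature over a set of samples."""
    n = len(baseline)
    totals = [0.0] * n
    for x in samples:
        for i, p in enumerate(shapley_values(predict, x, baseline)):
            totals[i] += abs(p)
    return [t / len(samples) for t in totals]


def toy_score(z):
    """Hypothetical linear scorer over three standardized descriptors."""
    return 0.5 * z[0] + 0.3 * z[1] - 0.2 * z[2]


if __name__ == "__main__":
    samples = [[2.0, 1.0, 3.0], [-2.0, 1.0, 1.0]]
    print(mean_abs_importance(toy_score, samples, [0.0, 0.0, 0.0]))
```

For a linear scorer, each Shapley value reduces to the weight times the difference between the feature and its baseline, which gives a quick sanity check on the enumeration.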

Sensitivity analysis showed minimal variation in the model’s predictions under changes in the input data (Fig. 8b), indicating consistent predictions even in the presence of noise or slight data fluctuations. This robustness is crucial for real-world applications, where data may not always be accurate or clean.

Fig. 8

SHAP summary and sensitivity analysis of the VGAN-DTI model, highlighting the contribution and influence of key input features on predictive outcomes.
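A noise-based sensitivity check of the kind described above can be sketched as follows. This is a minimal illustration, assuming Gaussian perturbations of the inputs and the mean absolute change in the output as the sensitivity measure; the stand-in scorer `predict`, its weights, and the sample values are hypothetical and do not represent the VGAN-DTI model.

```python
import math
import random


def predict(features):
    """Hypothetical stand-in scorer: logistic function of a weighted sum."""
    weights = [0.8, -0.5, 0.3]
    z = sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def sensitivity(model, samples, noise_std, trials=200, seed=0):
    """Mean absolute change in the model output under Gaussian input noise."""
    rng = random.Random(seed)  # fixed seed for reproducible perturbations
    total, count = 0.0, 0
    for x in samples:
        base = model(x)
        for _ in range(trials):
            noisy = [v + rng.gauss(0.0, noise_std) for v in x]
            total += abs(model(noisy) - base)
            count += 1
    return total / count


if __name__ == "__main__":
    samples = [[0.2, 1.0, -0.5], [1.5, 0.3, 0.8], [-0.7, -1.2, 0.1]]
    for std in (0.01, 0.1, 0.5):
        print(std, sensitivity(predict, samples, std))
```

A model that is robust in the sense described above would show a small mean output change at noise levels comparable to the expected measurement error in the input features.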

Implications of findings

The optimized integration of VAEs, GANs, and MLPs advances drug discovery by expanding the explored chemical space and improving DTI prediction. The ability of this framework to make precise DTI predictions, even for structurally diverse candidates, underscores its potential to streamline drug discovery, increase the diversity of candidate molecules, and accelerate novel drug identification.

Limitations and future work

Despite these promising results, this study has notable limitations that require further investigation. The performance of VAEs and GANs is heavily influenced by the quality and diversity of the training data, with biases and limited diversity restricting chemical space exploration and reducing the prediction accuracy. The computational complexity of training these models requires significant resources, thereby highlighting the need for more efficient algorithms and hardware accelerators.

Interpretability remains a challenge because the understanding of the biological mechanisms behind predictions is limited. Employing explainable AI methods is essential to enhance model transparency and trust. Incorporating additional data modalities, such as genomic or proteomic data, could further improve DTI predictions by offering a more holistic view of drug-target interactions. Ethical concerns, including data privacy, model bias, and responsible AI deployment in healthcare, must also be addressed. Ensuring transparency, fairness, and safety in AI-driven drug discovery is critical for broader adoption.

Collaboration between computer scientists and biologists is vital to overcome these challenges. Combining domain expertise with technological innovation can enhance model performance, interpretability, and ethical practices, thereby advancing AI-driven drug discovery.

Related work

The integration of generative AI models, particularly GAN-based architectures, VAEs, transformer-based large language models (LLMs) such as generative pretrained transformers (GPT)35,36, and diffusion models37, has gained significant attention in drug discovery. These models present new opportunities for enhancing DTI predictions and optimizing critical drug development processes. Research in this domain has used vast biochemical resources to explore innovative applications and to advance drug discovery38. Generative models, such as GANs and VAEs, learn patterns from chemical databases to design novel molecules from scratch, thereby facilitating key tasks such as quantitative structure-activity relationship (QSAR) modeling and molecular optimization39,40. This section reviews pioneering studies in this field, emphasizing their contributions to optimizing drug discovery pipelines.

Lin et al. explored recent advancements in drug design using GANs and introduced the FL-DISCO framework, which combines GANs with graph neural networks (GNNs) in a federated learning context. This approach demonstrated the ability to generate novel compounds with desirable drug-like properties41. Abbasi et al. developed the Feedback GAN framework, which utilizes an encoder-decoder architecture to convert SMILES strings into latent-space vectors and train a WGAN-GP network. This framework effectively generates realistic, diverse, and unique molecules, exploring new chemical spaces optimized for receptor-binding affinities and achieving 99% reconstruction accuracy42. Xu et al. introduced DeepGAN, a generative model trained on DeepSMILES data that addresses the limitations of traditional SMILES representations. Reinforcement learning principles were used to optimize the reward and adversarial loss, allowing the model to generate diverse molecules while improving validity and related metrics43.

Li et al. presented advanced quantum ML methods for drug molecule generation and protein binding site classification by integrating GANs, CNNs, and VAEs. In this study, a qubit-efficient quantum GAN (QGAN-HG) and an image-based method were used to advance quantum techniques for drug discovery44,45. Surana et al. developed PandoraGAN, a DL approach that uses GANs to accelerate the development of novel antiviral peptides (AVPs). Using a dataset of 130 highly active peptides, PandoraGAN efficiently generated novel peptide backbones with properties similar to those of known active peptides, thereby presenting significant potential for drug development against pathogenic viruses46. Song et al. introduced DNMG, a deep GAN architecture that utilizes transfer learning to enhance de novo molecular design by incorporating 3D spatial information and atomic physicochemical properties. This model generates novel drug-like ligands with enhanced binding affinities and physicochemical properties for specific targets, thereby advancing drug discovery47.

Li et al. addressed the challenges in de novo drug molecule design by developing a scalable quantum generative autoencoder (SQ-VAE) for molecule reconstruction and sampling. This study explored hybrid quantum-classical networks that generated high-dimensional molecular structures with superior drug properties in various dimensions48. Joo et al. proposed a conditional variational autoencoder (CVAE) architecture to address the generation of syntactically invalid molecules by DL-based generative models. The CVAE framework, trained on molecular fingerprints and GI50 results for breast cancer cell lines, generates valid fingerprints with the desired properties and enhances database search capabilities49.

Accurate prediction of protein-ligand binding affinity is essential for optimizing compounds and enhancing their interactions with target proteins50. The present study applies generative AI frameworks to streamline DTI prediction, improving prediction accuracy using benchmark datasets and binding affinity measurements.

Conclusion

The VGAN-DTI framework presented in this study combines variational autoencoders (VAEs), generative adversarial networks (GANs), and multilayer perceptrons (MLPs) to advance drug-target interaction (DTI) prediction. The framework achieved 96% accuracy, 95% precision, 94% recall, and a high F1 score. The focus on data quality and accurate feature representation enables scalable and efficient prediction, which supports the optimization of molecular interaction strategies and the discovery of novel drug candidates. This study also outlined the potential of generative AI models to streamline drug discovery by expanding the chemical space explored for critical tasks. Leveraging generative AI-based computational methods can significantly reduce both the timelines and the costs of drug discovery. Future research should validate this framework across diverse datasets and integrate additional biological data to enhance its applicability and impact on personalized medicine and drug discovery.
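For reference, the F1 score is the harmonic mean of precision and recall. The calculation below plugs in the rounded precision and recall quoted above; it is a back-of-the-envelope estimate from rounded figures, not the fold-averaged F1 score reported in the results, which may differ slightly.

```latex
F_1 = \frac{2 \cdot P \cdot R}{P + R}
    = \frac{2 \times 0.95 \times 0.94}{0.95 + 0.94}
    = \frac{1.786}{1.89}
    \approx 0.945
```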