Table 2 Data processing methods
From: Application of Artificial Intelligence In Drug-target Interactions Prediction: A Review
Type of problem | Methodological characteristics | Author |
---|---|---|
Definition of negative samples is not standardized | High-quality negative samples are selected by screening to reconstruct the dataset, and a spherical search method is used when identifying DTIs to avoid local optima and improve the recognition ability of the extreme learning machine. | Hu et al.69 |
Imbalance between the numbers of positive and negative samples | An ensemble learning framework based on negative stacking, which narrows the gap between positive and negative samples by sampling and splitting the negative samples and then training the ensemble. | Yang et al.70 |
 | The data augmentation method Synthetic Minority Oversampling Technique (SMOTE) was introduced to synthesize new samples from the small number of existing minority-class samples. | Calangian et al.71 |
Samples are insufficient, noisy, and high-dimensional | A lightweight deep convolutional neural network framework, LDCNN-DTI, uses fewer protein descriptors and can convolve amino acid sequences of different lengths. | Wang et al.18 |
Cold start problems | The dataset is re-split: positive samples are divided into 5 groups, negative samples are randomly selected as counterexamples, and the groups are combined into 4 training sets and 1 test set. | Li et al.72 |
 | An unsupervised pre-training approach introduces both intra- and inter-class interaction information of drugs and targets into the prediction network via transfer learning. This pre-training method also performs well on the drug-target affinity (DTA) prediction task. | Nguyen et al.73 |
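
To make the SMOTE row concrete, the following is a minimal sketch of the core interpolation idea behind SMOTE, assuming numeric feature vectors for the minority (positive) class. It is not the implementation used by Calangian et al.; the function name `smote_sketch` and its parameters (`n_new`, `k`, `seed`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def smote_sketch(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating between
    a randomly chosen minority sample and one of its k nearest minority
    neighbours (illustrative sketch, not a reference implementation)."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)  # cannot have more neighbours than other samples

    # Pairwise Euclidean distances among minority samples.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour

    # Indices of the k nearest neighbours of every minority sample.
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # Pick a base sample and one of its neighbours for each new point.
    base = rng.integers(0, n, size=n_new)
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]

    # Place the new point at a random position on the connecting segment.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[chosen] - X_min[base])
```

Each synthetic sample lies on the line segment between two existing minority samples, so the minority class is enlarged without simply duplicating points. In a DTI setting the rows of `X_min` would be feature vectors of known interacting drug-target pairs, which is an assumption about how the features are encoded.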