npj Biomedical Innovations

Table 2 Data processing methods

From: Application of Artificial Intelligence In Drug-target Interactions Prediction: A Review

Type of problem	Methodological characteristics	Author
Definition of negative samples is not standardized	High-quality negative samples are selected through a screening process to reconstruct the sample dataset, while a spherical search method is used when identifying DTIs to avoid falling into local optima and to improve the recognition ability of the extreme learning machine.	Hu et al.⁶⁹
Imbalance between positive and negative sample sizes	A negative-stacking ensemble learning framework that narrows the gap between positive and negative samples by sampling and splitting the negative samples, followed by ensemble training.	Yang et al.⁷⁰
	The data augmentation method Synthetic Minority Oversampling Technique (SMOTE) is introduced to generate new samples from a small number of existing samples.	Calangian et al.⁷¹
Samples for specific cases are scarce, noisy, and high-dimensional	LDCNN-DTI, a lightweight deep convolutional neural network framework, uses fewer protein descriptors and can convolve amino acid sequences of different lengths.	Wang et al.¹⁸
Cold start problems	The dataset is re-split: the positive samples are divided into 5 groups, negative samples are randomly selected as counterexamples, and the groups are combined into 4 training sets and 1 test set.	Li et al.⁷²
	An unsupervised approach is used to introduce both intra- and inter-class interaction information of drugs and targets into the prediction network through transfer learning. This pre-training method also performs well on the drug-target affinity (DTA) prediction task.	Nguyen et al.⁷³
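
Two of the data processing ideas in the table, SMOTE-style oversampling and the five-group re-split, can be illustrated with a minimal Python sketch. It is not the implementation used by Calangian et al. or Li et al. The function names (`smote_like`, `five_group_split`), the neighbour count `k`, the fixed seeds, and the choice to pair each positive group with an equal number of negatives are all assumptions made for illustration.

```python
import math
import random


def smote_like(minority, n_new, k=3, seed=0):
    """SMOTE-style oversampling (illustrative): each synthetic point is a
    random interpolation between a minority sample and one of its k nearest
    minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other minority samples by Euclidean distance to x.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(x, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        u = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + u * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


def five_group_split(positives, negatives, n_groups=5, seed=0):
    """Split positives into n_groups, attach randomly chosen negatives to
    each group, and return (training groups, test group).

    Assumption: each group receives as many negatives as positives, drawn
    without replacement, so len(negatives) >= len(positives) is required."""
    rng = random.Random(seed)
    pos = list(positives)
    rng.shuffle(pos)
    neg = rng.sample(list(negatives), len(pos))
    groups = []
    for g in range(n_groups):
        pos_part = pos[g::n_groups]
        neg_part = neg[g::n_groups]
        groups.append([(p, 1) for p in pos_part] + [(n, 0) for n in neg_part])
    # The last group is held out as the test set; the other four are for training.
    return groups[:-1], groups[-1]


if __name__ == "__main__":
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(len(smote_like(minority, n_new=6, k=2)))
    train, test = five_group_split(range(10), range(100, 120))
    print(len(train), len(test))
```

In practice a maintained library implementation of SMOTE would normally be preferred; the pure-Python version above only shows the interpolation step that generates new minority samples.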
