Fig. 2: sChemNET outperforms other methods at predicting small molecules bioactive against 125 miRNAs in Homo sapiens. | Nature Communications

From: sChemNET: a deep learning framework for predicting small molecules targeting microRNA function

a Training and testing sets built from labeled and unlabeled chemical structure information. Molecules and activity data were selected from the SM2miR and Drug Repurposing Hub databases. We used these to assess the prediction performance of sChemNET and other computational approaches under a leave-one-out cross-validation procedure. (Left) The training set consists of labeled and unlabeled small molecules. The labeled compounds are small molecules known to be bioactive against at least one miRNA from Homo sapiens. The unlabeled compounds are ~2400 randomly selected small molecules without known activity against the set of miRNAs. (Right) The testing set consists of 4000 small molecules, of which only one is known to be bioactive against the specific miRNA under evaluation.

b Prediction performance of eight computational methods under our leave-one-out cross-validation procedure. Notched boxplots show the distribution, across \(n=125\) miRNAs, of the percentage recall (y-axis) for each method when retrieving the top-K (K = 100, 300, 500, 1000) small molecules from the test set. (Left) All 1102 instances. (Right) Only instances that are chemically dissimilar between the training and testing sets. sChemNET was run with and without (\(s_{uv}=1\)) miRNA sequence similarity information in its loss function. FNN stands for feed-forward neural network, and "random" for scores sampled from a uniform distribution. The chemical similarity baseline ranks all small molecules in the test set by their maximum 2D Tanimoto chemical similarity to the bioactive small molecules in the training set. In each boxplot, the center line is the median, the box spans Q1 to Q3, and the whiskers extend from the box to the minimum and maximum values. The notch represents the 95% confidence interval of the median.
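The chemical similarity baseline and the top-K recall in panel b can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes fingerprints are represented as sets of on-bit indices (in practice a cheminformatics toolkit would compute them) and that recall is the percentage of leave-one-out folds in which the held-out bioactive molecule lands in the top K. All function names and the toy data are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def max_similarity_scores(
    test_fps: Dict[str, Set[int]], train_active_fps: List[Set[int]]
) -> Dict[str, float]:
    """Score each test molecule by its maximum similarity to any training active."""
    return {
        name: max((tanimoto(fp, ref) for ref in train_active_fps), default=0.0)
        for name, fp in test_fps.items()
    }


def hit_in_top_k(scores: Dict[str, float], held_out: str, k: int) -> bool:
    """True if the held-out molecule is among the k highest-scoring molecules."""
    # Sort by descending score; break ties by name so the ranking is deterministic.
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    return held_out in ranked[:k]


def recall_at_k(folds: List[Tuple[Dict[str, float], str]], k: int) -> float:
    """Percentage of folds whose held-out molecule is retrieved in the top k."""
    hits = sum(hit_in_top_k(scores, held_out, k) for scores, held_out in folds)
    return 100.0 * hits / len(folds)


# Toy fold 1 (hypothetical data): scores come from the similarity baseline.
train_actives = [{1, 2, 3, 4}, {10, 11, 12}]
test_set = {
    "held_out": {1, 2, 3, 5},
    "decoy_a": {20, 21},
    "decoy_b": {10, 11, 30, 31},
    "decoy_c": {1, 40, 41},
}
fold1 = (max_similarity_scores(test_set, train_actives), "held_out")

# Toy fold 2 (hypothetical data): scores supplied directly, e.g. from any model.
fold2 = ({"x": 0.9, "y": 0.8, "held": 0.1}, "held")

folds = [fold1, fold2]
recall_top1 = recall_at_k(folds, 1)
recall_top3 = recall_at_k(folds, 3)
```

In this toy setup the same `recall_at_k` function works for any scoring method, so the similarity baseline, a random scorer, and a learned model can be compared on identical folds, which mirrors how the methods are compared in panel b.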
