Fig. 5: Accelerated missing link discovery through iterative learning. | Nature Communications


From: Knowledge integration and decision support for accelerated discovery of antibiotic resistance genes


a High correlation between the probability assigned by the hypothesis generator and forward experimental validation (226 and 90 validated hypotheses from the first and second iterations, respectively; R² = 0.94). b Probability distribution of the positively validated hypotheses from the second iteration (i.e., dark blue bar in a) compared with the probability of the same hypotheses from the first iteration (n = 29 positively validated second-iteration hypotheses). Updating the knowledge graph with the hypotheses validated in the first iteration (i.e., light blue and red bars in a) and re-training the hypothesis generator led to a 14-fold probability increase (0.55 vs. 0.04, respectively; p value = 1.1 × 10⁻¹¹), which in turn enabled discoveries that would not have been possible with only one iteration of hypothesis generation. The box represents the interquartile range, the middle line represents the median, the whiskers extend from the minimum to the maximum values, and the diamonds represent outliers. The p value was calculated using a two-sided t-test. c, d Precision-recall (PR) and receiver operating characteristic (ROC) curves of the generated hypotheses compared against our wet-lab validation results. The AUCPR and AUROC of the second-iteration hypotheses increased by 19.4% and 7.3%, respectively, compared with the first-iteration hypotheses. e We predicted and validated 64 CRA hypotheses from iteration 1 and 29 CRA hypotheses from iteration 2 for a total of 83 E. coli genes (blue nodes) that confer resistance (gray arrows) to one or more of 15 antibiotics (purple nodes). Green and red labels mark genes not previously associated with antibiotic resistance in E. coli (9 genes) or in any microbe (6 genes), respectively. The edge thickness is proportional to the KIDS-predicted probability.
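
For readers unfamiliar with the metrics in panels b to d, the following is a minimal sketch of how a fold change, an AUROC, and a step-wise AUCPR (average precision) can be computed from predicted probabilities and binary validation outcomes. It is not the authors' code: the function names and the toy `labels` and `scores` are hypothetical, and the estimator the authors used for AUCPR is not stated in this caption. Only the values 0.55 and 0.04 are taken from the caption.

```python
# Illustrative sketch only; toy data below are made up, not the study's results.

def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision at each positive, ranking by descending score.

    This is one common estimator of the area under the PR curve;
    tied scores are ranked in input order (stable sort), not handled specially.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Hypothetical validation outcomes (1 = confirmed in the wet lab) and predicted probabilities.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]

# Fold change quoted in panel b: second-iteration vs. first-iteration probability.
fold_change = 0.55 / 0.04

print(f"fold change: {fold_change:.2f}")
print(f"AUROC: {auroc(labels, scores):.3f}")
print(f"AUCPR (average precision): {average_precision(labels, scores):.3f}")
```

The fold change of 13.75 is what the caption rounds to "14-fold".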
