Fig. 2: Results of the predictive model. | Nature Communications


From: AI-guided few-shot inverse design of HDP-mimicking polymers against drug-resistant bacteria


Cross-validation results (n = 15 folds) for Gradient Boosting Decision Tree (GBDT, a-c), Random Forest (RF, d-f), Extreme Gradient Boosting (XGB, g-i), and Adaptive Boosting (AdaBoost, j-l), showing the effect of descriptor downselection and data augmentation on predicting the minimum inhibitory concentration (MIC) for S. aureus (MIC_S.aureus) and E. coli (MIC_E.coli) and the minimum concentration that causes 10% hemolysis (HC10), measured by the coefficient of determination (R²). Descriptor_Init to Descriptor_Opt are the successive descriptor sets used during downselection, from the initial set to the optimized set. Red boxes show results for augmented data and blue boxes show results for the original data. The borders of the boxes indicate the first quartile (left) and the third quartile (right) of the results. The line in the box indicates the median. The whiskers extend to the most extreme non-outlier data points, with minima on the left and maxima on the right. m-o Property prediction results on the unseen test set D_test with a deep neural network for MIC_S.aureus, MIC_E.coli, and HC10, using different combinations of polymer representations (n = 10). The borders of the boxes indicate the first quartile (left) and the third quartile (right) of the results. The line in the box indicates the median. The whiskers extend to the most extreme non-outlier data points, with minima on the left and maxima on the right. "Seq" is the abbreviation of "Sequence". Source data are provided as a Source Data file.
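The box-plot conventions described in the caption can be illustrated with a minimal Python sketch. It computes R² for one set of predictions and summarises a list of per-fold scores as quartiles, median, and whiskers. The caption does not state the outlier rule, so the 1.5 × IQR fence used here is an assumption (a common plotting default). The function names `r_squared` and `box_stats` are hypothetical, and the scores are illustrative values, not data from the paper.

```python
import statistics


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def box_stats(scores, whisker_factor=1.5):
    """Summarise scores as a box plot would.

    Assumption: points beyond whisker_factor * IQR from the box are
    treated as outliers; the caption does not specify the rule.
    """
    data = sorted(scores)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker_factor * iqr
    high_fence = q3 + whisker_factor * iqr
    # Whiskers end at the most extreme points that are not outliers.
    inliers = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_min": inliers[0],
        "whisker_max": inliers[-1],
        "outliers": outliers,
    }


# Illustrative per-fold R² scores (made up for this example).
fold_scores = [0.1, 0.5, 0.6, 0.7, 0.8]
summary = box_stats(fold_scores)
print(summary)
```

In this example the lowest score falls below the lower fence, so the left whisker stops at the next-lowest score and the low point is reported separately as an outlier.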
