Fig. 2: Prediction results on B-H dataset by RS-Coreset. | Communications Chemistry

From: An active representation learning method for reaction yield prediction with small-scale data

A The initial representation of the reaction space given by the traditional fingerprint “MorganFP”, visualized with t-SNE. The color of each instance represents the observed yield recorded in the public dataset. High-yield and low-yield combinations are mixed together in this initial representation space.

B The new representation of the reaction space generated by our RS-Coreset. The instances of the coreset are marked by red stars (the coreset accounts for only 5% of the full dataset). In this new representation, most of the high-yield combinations are located on the right and most of the low-yield combinations on the left; that is, yield correlates strongly with spatial location in the new representation space.

C The predicted yields of our model trained on the new representation space of (B). The x and y axes are the two dimensions produced by 2D t-SNE on the reaction space (the same as in (A) and (B)). The z axis is the corresponding predicted reaction yield.

D Absolute errors of the prediction (i.e., the absolute value of the difference between the predicted yield and the observed yield). The proportions of predictions with absolute error below 5%, 10%, and 15% are 38.1%, 62.0%, and 77.4%, respectively.
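The summary statistic reported for panel D can be illustrated with a minimal sketch. It assumes yields are expressed in percent and that "lower than" means a strict inequality; the function name `fraction_within` and the toy numbers are hypothetical and are not taken from the paper or its dataset.

```python
import numpy as np


def fraction_within(observed, predicted, thresholds=(5.0, 10.0, 15.0)):
    """Return {threshold: fraction of predictions with |error| < threshold}.

    Yields and thresholds are assumed to be in percentage points.
    """
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # Absolute error as defined in the caption: |predicted - observed|.
    abs_err = np.abs(predicted - observed)
    return {t: float(np.mean(abs_err < t)) for t in thresholds}


# Toy yields invented for illustration only (not from the B-H dataset).
observed = [10.0, 50.0, 80.0, 30.0]
predicted = [12.0, 58.0, 68.0, 60.0]
result = fraction_within(observed, predicted)
print(result)
```

The thresholds are cumulative, so each reported fraction includes the predictions counted under the smaller thresholds.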
