Fig. 4: A study of pretraining tasks. | Nature Communications

From: A self-conformation-aware pre-training framework for molecular property prediction with substructure interpretability

AUC denotes the area under the receiver operating characteristic curve and RMSE denotes the root mean square error. All ablation experiments use the default parameter settings.

a Finetuning results of various pretraining tasks on the MoleculeNet dataset. Each pretraining task was run \(n = 10\) times with different random seeds. The metric shown in each plot is the AUC or the RMSE. The top and bottom edges of each plot mark the maximum and minimum values, respectively, while the left and right sides show the probability density distribution of the data. The red dot in the center indicates the median. “w/o pretrain” means that the pretrained model was not loaded. “FG” denotes the functional group prediction task. “Finger” denotes the molecular fingerprint prediction task. “SP” denotes the 2D atomic distance prediction task. “Angle” denotes the 3D bond angle prediction task. The red dashed line indicates the average performance when all pretraining tasks are used.

b Finetuning results of different pretraining tasks on the Active Cliffs dataset. Each pretraining task was run \(n = 10\) times with different random seeds. The metric shown in these plots is the RMSE. The top and bottom edges of each plot mark the maximum and minimum values, respectively, while the left and right sides show the probability density distribution of the data. The red dot in the center indicates the median. The red dashed line indicates the average performance when all pretraining tasks are used.

c Analysis of pre-training representation capacity. The representations learned by different pre-training tasks are evaluated using t-Distributed Stochastic Neighbor Embedding (t-SNE) visualization and Davies-Bouldin Index (DBI) scores. The DBI is calculated both on the original high-dimensional representations (‘DBI original’, indicating intrinsic feature separability) and on the 2D t-SNE projections (‘DBI t-SNE’, reflecting clustering quality in the visualization). Samples with the same label have the same color. Red circles indicate areas of sample confusion. The red dashed line divides the samples into two non-overlapping groups.

Source data are provided as a Source Data file.
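To make the DBI scores in panel c concrete, the following is a minimal sketch of the standard Davies-Bouldin Index computed from labeled points. It is not the authors' code. The function name `davies_bouldin` and the toy coordinates are assumptions made for illustration only. Lower values indicate more compact, better separated label groups. The same function can be applied to high-dimensional representations or to 2D projection coordinates, since it only needs Euclidean distances.

```python
import math


def davies_bouldin(points, labels):
    """Standard Davies-Bouldin index (hypothetical helper, not from the paper).

    points: sequence of equal-length numeric tuples (any dimensionality).
    labels: one class label per point.
    Lower values mean tighter and better separated groups.
    """
    # Group the points by label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the index needs at least two distinct labels")

    # Centroid of each group (coordinate-wise mean).
    centroids = {
        label: [sum(coord) / len(members) for coord in zip(*members)]
        for label, members in clusters.items()
    }
    # Scatter of each group: mean Euclidean distance to its centroid.
    scatter = {
        label: sum(math.dist(p, centroids[label]) for p in members) / len(members)
        for label, members in clusters.items()
    }

    # For each group, take the worst (largest) similarity ratio to any other
    # group, then average over groups. Assumes no two centroids coincide.
    worst_ratios = []
    for i in clusters:
        worst_ratios.append(
            max(
                (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
                for j in clusters
                if j != i
            )
        )
    return sum(worst_ratios) / len(worst_ratios)


# Toy 2D data (made up): two labels, far apart versus close together.
toy_labels = [0, 0, 1, 1]
well_separated = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
poorly_separated = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]

dbi_separated = davies_bouldin(well_separated, toy_labels)    # about 0.1
dbi_confused = davies_bouldin(poorly_separated, toy_labels)   # about 1.0
```

In this toy case the well separated groups score about ten times lower than the nearly touching ones, which is the direction of comparison the 'DBI original' and 'DBI t-SNE' values rely on.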
