Fig. 2: Performance of methods on the active cliff dataset.

RMSE is the root mean square error over the entire dataset; \(\mathrm{RMSE}_{\mathrm{cliff}}\) is the root mean square error over the subset of molecules with active cliffs. SCAGE is our proposed method; AFP [37], CNN [38], GAT [39], GCN [40], MPNN [41], ImageMol [29], and GEM [26] are the methods it is compared against.

(a) RMSE and \(\mathrm{RMSE}_{\mathrm{cliff}}\) results across \(n = 30\) datasets for each method. Each result is the average of 10 replicate experiments using different random seeds. The center line shows the median; box limits represent the 25th (Q1) and 75th (Q3) percentiles; whiskers extend to 1.5× the interquartile range (IQR); points are outliers beyond the whiskers.

(b) Violin plots of the distribution of RMSE and \(\mathrm{RMSE}_{\mathrm{cliff}}\) results across \(n = 30\) datasets for each method. Each result is the mean deviation of \(n = 30\) methods in (a). The left and right sides of each violin indicate the probability density distribution of the data. The central bold line indicates the interquartile range (25th to 75th percentile), and the black dot indicates the median (50th percentile).

(c) Errors on active cliff compounds (\(\mathrm{RMSE}_{\mathrm{cliff}}\)) compared with errors on all compounds (RMSE) for every method. Colored dots indicate the results of the highlighted method; gray dots indicate the results of all other methods. The black dashed line indicates RMSE = \(\mathrm{RMSE}_{\mathrm{cliff}}\), and the gray dashed lines indicate a difference of ±0.25 between \(\mathrm{RMSE}_{\mathrm{cliff}}\) and RMSE.
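
The quantities named in this caption can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function names (`rmse`, `rmse_cliff`, `box_stats`), the boolean `is_cliff` mask, and the toy values are all assumptions made for illustration, and the whisker bounds are computed as plain Q1 − 1.5×IQR and Q3 + 1.5×IQR fences, which may differ in detail from what the plotting library draws.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error over all molecules."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def rmse_cliff(y_true, y_pred, is_cliff):
    """RMSE restricted to molecules flagged as active cliffs.

    `is_cliff` is a hypothetical boolean mask; how cliffs are
    identified is outside the scope of this sketch.
    """
    mask = np.asarray(is_cliff, dtype=bool)
    if not mask.any():
        raise ValueError("no active cliff molecules in this dataset")
    return rmse(np.asarray(y_true)[mask], np.asarray(y_pred)[mask])

def box_stats(values):
    """Median, quartiles, 1.5x IQR fences, and points beyond the fences."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = values[(values < lower) | (values > upper)]
    return {"q1": q1, "median": median, "q3": q3,
            "lower": lower, "upper": upper, "outliers": outliers}

# Toy example (made-up values, not results from the paper).
y_true = [5.0, 6.0, 7.0, 8.0]
y_pred = [5.0, 6.0, 8.0, 6.0]
is_cliff = [False, False, True, True]

overall = rmse(y_true, y_pred)
cliff = rmse_cliff(y_true, y_pred, is_cliff)
# Panel (c) marks a +/-0.25 band around the RMSE = RMSEcliff diagonal.
outside_band = abs(cliff - overall) > 0.25
print(overall, cliff, outside_band)
```

In this toy case the cliff-subset error exceeds the overall error by more than 0.25, so the corresponding point would fall outside the gray dashed band in panel (c).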