Fig. 4: Model performance of ligand-based virtual screening on target HTR2A.

a, b Performance of active compound prediction using different perturbational representations in the chemical-blind and cell-blind settings. Box-and-whisker plots show the median (center line) and the 25th and 75th percentiles (lower and upper box boundaries), with whiskers indicating 1.5 × the interquartile range. Colored dots indicate the corresponding data points for seven cell lines. A two-sided t-test was applied between the models, and the exact p values are in source data. c Dimensionality reduction visualization of HTR2A active and inactive compounds based on various inferred perturbational representations. d Performance of active compound prediction when applying early fusion and late fusion to TranSiGen-derived representations from seven different cell lines. All models were run five times with different random seeds. Error bars represent the mean ± standard deviation. A two-sided t-test was applied between the models, and the exact p values are in source data. e Performance of active compound prediction at different thresholds of the maximum similarity of test molecules to the training data. All models were run five times with different random seeds. Black dots indicate the corresponding data points, and error bars represent the mean ± standard deviation. A two-sided t-test was applied between the models, and the exact p values are in source data. Source data are provided as a Source Data file. (****p < 0.0001; ***0.0001 < p ≤ 0.001; **0.001 < p ≤ 0.01; *0.01 < p ≤ 0.05 and ns, 0.05 < p ≤ 1.0).
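
The significance annotations in the legend can be read as a simple threshold mapping from p values to star labels. The following is a minimal sketch of that mapping, not the authors' plotting code; the function name `significance_label` is hypothetical, and the legend does not say how p = 0.0001 exactly is labeled, so assigning it to `***` here is an assumption.

```python
def significance_label(p: float) -> str:
    """Map a p value to the star annotation listed in the figure legend.

    Hypothetical helper for illustration only.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"p value must lie in [0, 1], got {p}")
    if p < 1e-4:
        return "****"
    # Assumption: the legend leaves p == 0.0001 unspecified; it falls here.
    if p <= 1e-3:
        return "***"
    if p <= 1e-2:
        return "**"
    if p <= 5e-2:
        return "*"
    return "ns"


print(significance_label(0.003))  # prints "**"
```

The upper bound of each bin is inclusive, matching the "≤" signs in the legend, so p = 0.05 is labeled `*` rather than `ns`.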