Fig. 5: PrePR-CT operates successfully in the small-data regime.

a,b, R² scores for predicting the mean (a) and standard deviation (b) of post-perturbation gene expression in B cells as a function of training set size on the Kang dataset. Results are shown for all genes in the B cell graph (n = 3,176) and the top 100 DEGs for both CPA and PrePR-CT. Each point represents the mean performance across n = 3 independent runs with different random seeds; error bars denote ± standard deviation across the three runs. PrePR-CT consistently outperforms CPA across all training set sizes, demonstrating robust performance even in the small-data regime.
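
For readers who want to reproduce this kind of summary, the following is a minimal sketch of how the plotted quantities could be computed. It is not the authors' evaluation code. It assumes that R² is the coefficient of determination (many perturbation-prediction papers report the squared Pearson correlation instead), that expression matrices are arrays of shape (cells, genes), and that the error bars use the sample standard deviation. The function names `r2_score`, `mean_and_std_r2` and `summarize_runs` are hypothetical.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination between two 1-D arrays (assumed metric)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot


def mean_and_std_r2(true_expr, pred_expr):
    """R2 of the per-gene mean and per-gene SD.

    Both inputs are (cells, genes) arrays of post-perturbation expression;
    the two arrays may contain different numbers of cells.
    """
    true_expr = np.asarray(true_expr, dtype=float)
    pred_expr = np.asarray(pred_expr, dtype=float)
    r2_mean = r2_score(true_expr.mean(axis=0), pred_expr.mean(axis=0))
    r2_std = r2_score(true_expr.std(axis=0), pred_expr.std(axis=0))
    return r2_mean, r2_std


def summarize_runs(scores):
    """Mean and sample SD across runs: the plotted point and its error bar.

    ddof=1 is an assumption; the caption does not say which estimator is used.
    """
    scores = np.asarray(scores, dtype=float)
    return scores.mean(), scores.std(ddof=1)
```

Restricting the evaluation to the top 100 DEGs would amount to selecting the corresponding gene columns from both arrays before calling `mean_and_std_r2`.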