Fig. 1: The problem that Scouter addresses, the architecture of Scouter and the datasets used.

a, Perturbation experiments provide gene expression data for multiple single-gene perturbations. Methods such as Scouter learn from observed transcriptional responses and predict the responses of unseen perturbations. Single-gene perturbations are used for this illustration, but Scouter is applicable to both single- and two-gene perturbations. The upward and downward arrows indicate up- and down-regulation of gene expression, respectively. n denotes the total number of genes. b, A binary heatmap of the gene embedding matrix derived from GO terms. Each row corresponds to a gene perturbed in the Adamson dataset that has at least one GO term, and each column corresponds to a gene that shares at least one GO term with at least one of the row genes. The matrix contains 82 perturbations × 635 genes; axis ticks are automatically subsampled for visual clarity. White cells indicate zero values and black cells indicate non-zero values. The predominance of zero entries indicates minimal overlap in GO annotations among the genes. c, The Scouter architecture, which is based on a compressor–generator neural network. g1, g2, …, gn represent the observed or predicted expression levels of the n genes. d, A summary of the datasets used and of the computational time and resources required by different methods. Each dataset contains about 5,000 genes. Statistics for biolord on the Dixit dataset are not reported because its predictions did not yield a meaningful improvement over the baseline in our evaluations.
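
The layout of the matrix in panel b can be illustrated with a minimal sketch. The caption does not state how the entries are computed, so the code assumes an entry is non-zero whenever a row gene and a column gene share at least one GO term. The gene names, GO identifiers and the helper binary_go_matrix are hypothetical and are not taken from the paper or its code.

```python
import numpy as np

# Hypothetical toy annotations (gene -> set of GO term IDs); not from the paper.
go_terms = {
    "GENE_A": {"GO:0000001", "GO:0000002"},
    "GENE_B": {"GO:0000002"},
    "GENE_C": {"GO:0000003"},
    "GENE_D": {"GO:0000001", "GO:0000003"},
    "GENE_E": set(),  # no GO annotation
}
# Hypothetical list of perturbed genes.
perturbed = ["GENE_A", "GENE_C", "GENE_E"]


def binary_go_matrix(go_terms, perturbed):
    """Build a rows x columns 0/1 matrix of shared GO annotation.

    Rows: perturbed genes with at least one GO term.
    Columns: genes sharing at least one GO term with a different row gene.
    Assumption: an entry is 1 if the two genes share any GO term, else 0.
    """
    rows = [g for g in perturbed if go_terms.get(g)]
    cols = sorted(
        c for c in go_terms
        if any(c != r and go_terms[c] & go_terms[r] for r in rows)
    )
    mat = np.array(
        [[int(r != c and bool(go_terms[r] & go_terms[c])) for c in cols]
         for r in rows],
        dtype=int,
    )
    return rows, cols, mat


rows, cols, mat = binary_go_matrix(go_terms, perturbed)
# Fraction of zero entries, the quantity the heatmap conveys visually.
zero_fraction = float((mat == 0).mean())
```

In this toy input, GENE_E is dropped from the rows because it has no GO term, and zero_fraction gives the share of white cells that such a heatmap would show.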