Fig. 3: Analysis of Perturbation Response and Reverse Perturbation Predictions.
From: CellFM: a large-scale foundation model pre-trained on transcriptomics of 100 million human cells

a A diagram showcasing the perturbation prediction model leveraging cell-specific gene embeddings derived from CellFM. b The mean squared error (MSE) between predicted and actual post-perturbation gene expression for the top differentially expressed (DE) genes in a zero-shot setting. c Comparison of CellFM with other single-cell foundation models and the perturbation prediction method GEARS. Pearson correlation coefficients between predicted and actual gene expression changes are reported for the top DE genes in a zero-shot setting. d Analysis of gene expression changes following perturbations of AHR+ctrl (n = 464 cells) and BPGM+ZBTB1 (n = 280 cells). The plots compare predicted versus actual expression changes for the top 20 DE genes. Box plot elements represent: center line, median (50th percentile); box limits, upper (75th) and lower (25th) quartiles; whiskers, 1.5 × interquartile range (IQR) from the box; points beyond whiskers are considered outliers. The horizontal dashed line indicates the null effect baseline (0 change). Minimum and maximum values are represented by the whisker endpoints, with all percentiles calculated from the empirical distribution of expression changes. e A graphical representation of potential perturbation combinations across a 20-gene space, differentiated by experiment type (train, valid, test, unseen). Predicted perturbations are indicated by square boxes, with the actual source perturbation marked by a cross. The boxes are colored as follows: dark purple for test data, light purple for validation, medium purple for training, and gray for unseen. f The accuracy of each model in predicting the correct source of perturbation among the top 10 predictions for test cases in a fine-tuning setting. Source data are provided as a Source Data file.
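
The quantities named in panels b, c, d and f can be illustrated with a minimal NumPy sketch. This is not the authors' evaluation code. All function names are hypothetical. Ranking DE genes by absolute mean change is an assumed stand-in for whatever DE test the paper used. Ending the whiskers at the most extreme data point within 1.5 × IQR of the box is an assumed plotting convention.

```python
import numpy as np

def top_de_indices(ctrl_mean, pert_mean, k=20):
    # Assumption: rank genes by absolute mean change as a simple DE proxy.
    change = np.abs(pert_mean - ctrl_mean)
    return np.argsort(change)[::-1][:k]

def mse_on_genes(pred, actual, idx):
    # Panel b: mean squared error restricted to the selected genes.
    return float(np.mean((pred[idx] - actual[idx]) ** 2))

def pearson_on_changes(pred, actual, ctrl_mean, idx):
    # Panel c: Pearson correlation of expression changes relative to control.
    pred_delta = pred[idx] - ctrl_mean[idx]
    true_delta = actual[idx] - ctrl_mean[idx]
    return float(np.corrcoef(pred_delta, true_delta)[0, 1])

def box_stats(values):
    # Panel d: median, quartiles, whisker endpoints and outliers.
    values = np.asarray(values, dtype=float)
    q1, med, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = values[(values >= low) & (values <= high)]
    outliers = values[(values < low) | (values > high)]
    return {"median": med, "q1": q1, "q3": q3,
            "whisker_low": inside.min(), "whisker_high": inside.max(),
            "outliers": outliers}

def top_k_accuracy(ranked_predictions, true_sources, k=10):
    # Panel f: fraction of cases whose true source is among the top-k predictions.
    hits = [true in ranked[:k]
            for ranked, true in zip(ranked_predictions, true_sources)]
    return float(np.mean(hits))
```

Here `pred`, `actual` and `ctrl_mean` are assumed to be one-dimensional arrays of mean expression per gene, and each entry of `ranked_predictions` is a list of candidate perturbations ordered from most to least likely.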