Fig. 3: Diagram of data splitting and performance of inferring DEGs in different scenarios.

a Diagram of chemical-blind splitting and cell-blind splitting. In scenario 1-1, a dataset of 355 compounds on 7 cell lines is split by compound, ensuring that test compounds do not appear in the training set. In scenario 1-2, the complete dataset of 8316 compounds on 164 cell lines is split by compound. In scenario 2-2, the complete dataset of 8316 compounds on 164 cell lines is split by cell line: the model was trained on the profiling data of 10, 50, or 150 cell lines, and prediction performance was evaluated on 7 new cell lines. b Model performance comparison in chemical-blind splitting. c Model performance comparison in cell-blind splitting (scenario 2-1). d Performance of TranSiGen in cell-blind splitting (scenario 2-2) with different numbers of cell lines in the training set. All models were run three times with different random seeds. Black dots indicate the corresponding data points, and error bars represent the mean ± standard deviation. A two-sided t-test was applied between models; exact p values are given in the source data. Source data are provided as a Source Data file. (****p < 0.0001; ***0.0001 < p ≤ 0.001; **0.001 < p ≤ 0.01; *0.01 < p ≤ 0.05; ns, 0.05 < p ≤ 1.0).
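
The chemical-blind and cell-blind splits described in panel a can be illustrated with a minimal sketch. This is not the authors' pipeline: the record layout, the helper name `group_blind_split`, the `test_fraction` parameter, and the toy compound and cell-line names are all hypothetical and chosen only to show the idea that every held-out compound (or cell line) is absent from the training set.

```python
import random


def group_blind_split(records, group_key, test_fraction=0.2, seed=0):
    """Split records so that no value of `group_key` occurs in both sets.

    Hypothetical helper for illustration; `records` is assumed to be a list
    of dicts, each holding at least a "compound" and a "cell_line" field.
    """
    # Collect the unique groups (compounds or cell lines) in a stable order.
    groups = sorted({r[group_key] for r in records})
    # Shuffle with a seeded generator so each seed gives a reproducible split.
    rng = random.Random(seed)
    rng.shuffle(groups)
    # Hold out a fraction of the groups, but always at least one.
    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])
    train = [r for r in records if r[group_key] not in test_groups]
    test = [r for r in records if r[group_key] in test_groups]
    return train, test


# Toy data (assumed layout): 5 compounds profiled on 4 cell lines.
records = [
    {"compound": c, "cell_line": l}
    for c in ["cpd_A", "cpd_B", "cpd_C", "cpd_D", "cpd_E"]
    for l in ["line_1", "line_2", "line_3", "line_4"]
]

# Chemical-blind: held-out compounds never appear in training.
chem_train, chem_test = group_blind_split(records, "compound", 0.2, seed=0)

# Cell-blind: held-out cell lines never appear in training.
cell_train, cell_test = group_blind_split(records, "cell_line", 0.25, seed=0)
```

Splitting on the group identifier rather than on individual profiles is what makes the evaluation "blind": a random split over profiles would let the same compound or cell line appear on both sides.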