Fig. 1: A general framework for training computational models to predict L1000 gene expression profiles and for using them in a downstream application (here, drug repurposing for COVID-19 treatment).

θ is the set of model parameters, f is the function, parameterized by θ, that maps experiment information to gene expression profiles, and l is the loss function of θ that measures the difference between predicted and ground-truth gene expression profiles. The objective of the learning process is to minimize the loss between predicted and ground-truth profiles in the L1000 dataset. After training, the models are used to generate profiles for new chemicals in an external molecular database (DrugBank). These profiles are then used for in silico screening (comparison with patient gene expression) to identify potential drugs for COVID-19 treatment.
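
The learning objective described above can be written compactly. This is only a sketch under assumed notation: the caption does not define N (the number of L1000 training experiments), x_i (the experiment information for experiment i), y_i (the corresponding ground-truth profile), or d (a per-profile discrepancy measure, left unspecified here), and averaging over experiments is likewise an assumption.

\[
\theta^{*} = \arg\min_{\theta} \, l(\theta), \qquad l(\theta) = \frac{1}{N} \sum_{i=1}^{N} d\big(f(x_i; \theta),\, y_i\big)
\]

In this notation, the profile generated for a new DrugBank chemical with experiment information x would be f(x; θ*).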