Fig. 3: Mining FDA-approved drugs by correlating disease and drug signatures using an overparameterized autoencoder embedding.
From: Causal network models of SARS-CoV-2 expression and aging to identify candidates for drug repurposing

a Gene expression (\(\log_{2}\) RPKM + 1) of A549-ACE2 cells infected with SARS-CoV-2 versus normal A549-ACE2 cells. Genes collected as part of the CMap study using the L1000 reduced-representation expression profiling method are highlighted as stars; L1000 genes significantly overlap with SARS-CoV-2-associated genes, which are shown in red (p value = \(7.94 \times 10^{-16}\), one-sided Fisher’s exact test). b Signature of SARS-CoV-2 infection on A549 and A549-ACE2 cells visualized using the first two principal components based on RNA-seq data from ref. 23. The signature of SARS-CoV-2 infection is aligned across normal A549 and A549-ACE2 cells as well as across different levels of infection. Green and orange points indicate data from A549-ACE2 and A549 cells, respectively. Circles and crosses indicate data from two different batches, with multiplicity of infection (MOI) of 0.2 and 2, respectively. c Signatures of a selection of 13 representative FDA-approved drugs (black arrows) compared with the reverse signature of SARS-CoV-2 infection based on A549-ACE2 cells (green arrow), visualized using the first two principal components. Drugs whose signatures maximally align with the direction from SARS-CoV-2-infected cells (red) to normal cells (blue) are considered candidates for treatment. As expected, drug signatures differ from one another and vary in magnitude. d Correlation between drug signatures in A549 and MCF7 cells when using the original L1000 expression space versus the embedding obtained from an overparameterized autoencoder. The overparameterized autoencoder aligns the drug signatures in A549 and MCF7 cells by shifting the correlations towards −1 or 1 while maintaining the sign of the correlation in the original space.
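The one-sided Fisher’s exact test reported for panel a can be illustrated with a minimal sketch. It computes the upper tail of the hypergeometric distribution, that is, the probability of seeing at least the observed overlap between two gene sets by chance. The function name `fisher_one_sided` and all gene counts below are hypothetical placeholders chosen for illustration; they are not the counts used in the study.

```python
from math import comb


def fisher_one_sided(overlap, n_set_a, n_set_b, n_total):
    """Upper-tail p value P(X >= overlap) for the overlap of two gene sets.

    X follows a hypergeometric distribution: n_set_b genes are drawn from
    n_total genes, of which n_set_a belong to the first set.
    """
    max_overlap = min(n_set_a, n_set_b)
    denominator = comb(n_total, n_set_b)
    # Sum the probabilities of every overlap at least as large as observed.
    tail = sum(
        comb(n_set_a, k) * comb(n_total - n_set_a, n_set_b - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / denominator


# Hypothetical counts (assumptions, not values from the study).
n_total_genes = 20000      # genes measured in the RNA-seq experiment
n_profiled_genes = 1000    # genes in the reduced-representation panel
n_disease_genes = 1500     # genes associated with infection
n_overlap = 150            # genes present in both sets

p_value = fisher_one_sided(n_overlap, n_profiled_genes, n_disease_genes, n_total_genes)
print(f"one-sided p value: {p_value:.3g}")
```

Exact integer arithmetic via `math.comb` avoids floating-point underflow in the individual terms; only the final ratio is converted to a float.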
e Histogram of correlations between cell types for a given drug using original L1000 gene expression vectors (blue), overparameterized autoencoder embedding (pink), top 100 principal components (purple), and top 3 principal components (green). The overparameterized autoencoder achieves about the same alignment of drug signatures as using the top three principal components, while at the same time faithfully reconstructing the data (\(10^{-7}\) training error). f A list of drugs whose signatures maximally align with the direction from SARS-CoV-2 infection to normal in A549-ACE2 cells (MOI 2) with respect to correlations using the overparameterized autoencoder embedding, the original L1000 gene expression space, and the top 100 principal components.
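The ranking idea behind panels c and f can be sketched as follows, under stated assumptions: the reverse disease signature is taken as the mean normal profile minus the mean infected profile, each drug signature is a vector in the same space, and alignment is measured by Pearson correlation. The helper names (`pearson`, `rank_drugs`), the drug labels, and the four-gene toy vectors are hypothetical. The sketch works in whatever space the vectors are given in and does not implement the autoencoder embedding itself.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denominator = sqrt(sum(a * a for a in dx)) * sqrt(sum(b * b for b in dy))
    if denominator == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sum(a * b for a, b in zip(dx, dy)) / denominator


def rank_drugs(drug_signatures, infected_mean, normal_mean):
    """Sort drugs by correlation with the infected-to-normal direction."""
    # Assumption: reverse signature = normal profile minus infected profile.
    reverse_signature = [n - i for n, i in zip(normal_mean, infected_mean)]
    scores = {
        name: pearson(signature, reverse_signature)
        for name, signature in drug_signatures.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy four-gene profiles (hypothetical values for illustration only).
infected_mean = [2.0, 0.0, 1.0, 3.0]
normal_mean = [1.0, 1.0, 1.0, 1.0]
drug_signatures = {
    "drug_A": [-1.0, 1.0, 0.0, -2.0],  # points from infected towards normal
    "drug_B": [1.0, -1.0, 0.0, 2.0],   # points in the opposite direction
    "drug_C": [0.5, 0.5, -1.0, 0.0],   # unrelated direction
}

ranking = rank_drugs(drug_signatures, infected_mean, normal_mean)
for name, score in ranking:
    print(f"{name}: {score:+.2f}")
```

Applying the same ranking to vectors from a different representation (for example, an embedding or a principal-component projection) would only require passing the transformed vectors to `rank_drugs`.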