Fig. 3: Gene sampling weights which maximize observability provide machine learned ranking for extraction of genetic sensing elements. | Nature Communications

From: Learning perturbation-inducible cell states from observability analysis of transcriptome dynamics

a The gene sampling weights, w, normalized by the standard deviation of the corresponding gene, are sorted by magnitude and plotted in the upper panel. The weights are grouped into three categories: (i) the third of genes with the highest magnitude of sampling weights (green), (ii) the third of genes with the second highest magnitude of sampling weights (orange), and (iii) the remaining lower third (blue). The lower panel is a histogram of the sampling weights with a kernel density estimate superimposed. b The reconstruction accuracy (\(R^{2}\)) between the true initial condition and the estimated initial condition when sampling 50 genes at random from each of the aforementioned groups for T = 2 time points (top), and the reconstruction accuracy for the high group as a function of T (bottom). c Reconstruction accuracy between the estimated initial condition \(\hat{\mathbf{z}}_{0}\) and the actual initial condition \(\bar{\mathbf{z}}_{0}\), plotted for number of sampled time points T = 1 to T = 10. d The average fold change responses of the 20 genes that contribute most (top) and least (bottom) to the observability of the initial cell state. e The background-subtracted TPM (malathion TPM − negative control TPM) of the 15 biomarker genes selected from the proposed ranking by contribution to observability. The label on each x-axis indicates the percentage rank (out of 624 genes) of the gene with respect to the gene sampling weights, with 100% corresponding to the highest rank. The two biological replicates are shown using solid and dashed lines, respectively. Malathion was introduced to the cultures after collecting the sample at 0 minutes, hence this sample is not used for modeling and cell state inference (shaded in gray). f A Venn diagram comparing 180 differentially expressed genes and the genes with the largest sampling weights identified by our approach (top). DESeq2 (ref. 16) was used to identify differentially expressed genes. The bottom panel shows a histogram of the L2 norm (Euclidean distance from the origin) of the fold change responses for the genes in the unique sets in the Venn diagram.
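
The grouping in panel a and the random sampling in panel b can be illustrated with a minimal sketch. It is not the authors' code: the weights and standard deviations below are synthetic, "normalized by standard deviation" is assumed to mean dividing each weight by its gene's standard deviation, \(R^{2}\) is assumed to be the usual coefficient of determination, and the names `group_by_weight_magnitude` and `r_squared` are hypothetical helpers introduced only for illustration.

```python
import random


def group_by_weight_magnitude(weights, stds):
    """Split gene indices into high / middle / low thirds of |w| / std.

    Assumption: normalization means dividing each weight by the
    standard deviation of the corresponding gene.
    """
    scaled = [abs(w) / s for w, s in zip(weights, stds)]
    # Gene indices sorted from largest to smallest normalized magnitude.
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    third = len(order) // 3
    return {
        "high": order[:third],
        "middle": order[third:2 * third],
        "low": order[2 * third:],  # any remainder falls into the low group
    }


def r_squared(z_true, z_est):
    """Coefficient of determination between a true and an estimated vector."""
    mean_true = sum(z_true) / len(z_true)
    ss_res = sum((t - e) ** 2 for t, e in zip(z_true, z_est))
    ss_tot = sum((t - mean_true) ** 2 for t in z_true)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-ins for the 624 sampling weights and per-gene std devs.
rng = random.Random(0)
n_genes = 624
weights = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
stds = [rng.uniform(0.5, 2.0) for _ in range(n_genes)]

groups = group_by_weight_magnitude(weights, stds)

# Draw 50 genes at random, without replacement, from each group (as in panel b).
sampled = {name: rng.sample(idx, 50) for name, idx in groups.items()}
```

Each entry of `sampled` would then define which genes are measured when estimating the initial condition, and `r_squared` would compare that estimate with the true initial condition; the estimation step itself is outside the scope of this sketch.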
