Figure 2

Comparison of the reproduced gene expression profiles (REPs), which are computed from cell identity codes (CICs), with the original gene expression profiles (GEPs). (a) The mean squared error (MSE) over the test samples after each epoch of training. Each dot represents the mean of the 10 measurements obtained during 10-fold cross-validation, and error bars show standard errors. (b) The rank of each REP's similarity to its original GEP, relative to its similarity to the GEPs of all other test cases. Similarity is measured by the Spearman correlation coefficient. Rank 1 counts the test cases whose REP is more similar to its own GEP than to the GEP of any other test case. (c) The same ranking, but with similarity measured by MSE; the lower the MSE, the higher the similarity. (d) Three example test cases from Colon (left column), Neuron (middle column), and ESC (right column). In each column, the first row (red scatter plots) compares two GEPs of very similar biological samples obtained from the same experiment. The second row shows one of those GEPs plotted against its REP. (e) Boxplots showing the median and quantiles of the distribution of the Spearman correlation coefficient between the GEP of each test sample and the GEP of its closest test sample (GEP-GEP), and between the GEP of each test sample and its REP. (f) The same comparison for MSE on a log10 scale. (g) Comparison of the MSE between the original GEP and the expression profiles reproduced by different algorithms. MSE = Mean squared error, COR = Spearman correlation coefficient, GEP = Gene expression profile, REP = Reproduced gene expression profile, CIC = Cell identity code, ESC = Embryonic stem cells, PCA = Principal components analysis, NMF = Non-negative matrix factorization, RBF = Radial basis function, Cos = Cosine.
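
The ranking in panels (b) and (c) can be illustrated with a minimal sketch. It is not the authors' code: the function names (`average_ranks`, `spearman`, `mse`, `match_ranks`) and the toy profiles are assumptions made for illustration, and ties in similarity are resolved in favour of the test case's own GEP, which the caption does not specify.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def mse(x, y):
    """Mean squared error between two equal-length profiles."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def match_ranks(reps, geps, score, higher_is_better):
    """For each test case i, rank its own GEP among all GEPs by similarity to REP i.

    Rank 1 means no other test-case GEP is more similar to REP i than GEP i is.
    """
    ranks = []
    for i, rep in enumerate(reps):
        scores = [score(rep, gep) for gep in geps]
        own = scores[i]
        if higher_is_better:
            better = sum(s > own for s in scores)
        else:
            better = sum(s < own for s in scores)
        ranks.append(better + 1)
    return ranks


# Toy profiles over four genes (invented values, not data from the study).
geps = [[1, 2, 3, 4], [4, 3, 2, 1], [1, 3, 2, 4]]
reps = [[1.1, 2.0, 2.9, 4.2], [3.9, 3.1, 2.2, 0.8], [1.0, 2.1, 2.9, 4.1]]

cor_ranks = match_ranks(reps, geps, spearman, higher_is_better=True)
mse_ranks = match_ranks(reps, geps, mse, higher_is_better=False)
print(cor_ranks, mse_ranks)
```

In this toy set the third REP resembles the first GEP more than its own, so it receives rank 2 under both measures; a histogram of such ranks over all test cases is what panels (b) and (c) summarize.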