Supplementary Figure 3: Combined principal component and weighted gene co-expression network analyses reduce the number of key spMN maturation and embryonic development markers (see Fig. 3). | Nature Neuroscience

From: ALS disrupts spinal motor neuron maturation and aging pathways within gene co-expression networks

Supplementary Figure 3

(a) mRNA expression values compared between RT-qPCR and microarray for an iPSC line and its iMN derivative, as well as for fetal spinal cord (fSC) samples at gestational days 52 and 53. For each sample, the same RNA preparation was used on both platforms. RT-qPCR expression values are the average of n = 3 technical replicates. (b) Gene expression density plot for the 6,640 genes represented in both the training data set and the validation data sets, totaling 120 samples. Each line represents the gene expression distribution of one sample. Colors denote the study from which each sample was obtained. The black line represents the quantile-normalized distribution of all samples. (c) Principal component analysis of 6,640 mRNA transcripts from training and validation samples (n = 120 samples) illustrates the major features that distinguish the samples from one another. The y-axis depicts each sample’s coordinate along each principal component. The percent contribution of each principal component to the total variance of the data is shown along the x-axis. To reduce overplotting, data points are jittered randomly along the x-axis within each bin. The sample legend is shown on the far right. Colors of data points indicate general sample type, and shapes of data points indicate the study from which the data were obtained. Microarray platforms are also indicated. (d) Principal component analysis performed on 20 gene expression values across the 77 samples from b that remain after excluding the 43 training-set samples. Samples are plotted by their coordinates along PC1 and PC2. The sample legend is the same as for c. (e–g) ROC analysis of four methods for classifying samples in the validation data set (plus two human fibroblast samples, n = 79 samples) as pluripotent stem cells (e), fetal spinal cord-like cells (f), or adult spinal cord cells (g).
Classifications were based either on Pearson’s correlation between the sample and the median expression values of the target cell type in the training data set, using 6,640 genes (orange) or 20 genes (green), or on sample coordinates along the spMN maturation or embryonic development principal components, using 6,640 genes (black) or 20 genes (blue). The area under the curve (AUC), shown next to each like-colored curve, summarizes the overall performance of each classification method. For additional information, see Supplementary Tables 1a (sample metadata), 2c (module assignments and properties), 2f–i (gene scoring properties), 3a (qPCR primer sequences), and 3b (normalized linear expression values used in validation analyses).
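
The quantile normalization in b and the correlation-based classification scored by AUC in e–g can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names (`quantile_normalize`, `correlation_scores`, `roc_auc`) are hypothetical, the data are synthetic, and the AUC is computed with the rank-based (Mann-Whitney) formulation, which is an assumption rather than the method used for the figure.

```python
import numpy as np

def quantile_normalize(expr):
    """Give every sample (column) the same value distribution.

    expr: genes x samples array. Ties are broken by order of appearance.
    """
    order = np.argsort(expr, axis=0, kind="stable")
    ranks = np.argsort(order, axis=0, kind="stable")   # rank of each gene within its sample
    reference = np.sort(expr, axis=0).mean(axis=1)     # mean of the k-th smallest value across samples
    return reference[ranks]

def correlation_scores(samples, training_target):
    """Pearson correlation of each sample (column) with the per-gene median
    of the training samples of the target cell type."""
    centroid = np.median(training_target, axis=1)
    c = centroid - centroid.mean()
    s = samples - samples.mean(axis=0)
    return (s * c[:, None]).sum(axis=0) / (np.linalg.norm(s, axis=0) * np.linalg.norm(c))

def roc_auc(scores, labels):
    """AUC as the probability that a random positive sample outscores a
    random negative sample (ties count as one half)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (pos.size * neg.size)

# Synthetic illustration only: 200 hypothetical genes, 5 training samples of the
# target type, 10 target-like and 10 unrelated validation samples.
rng = np.random.default_rng(0)
profile = rng.normal(size=200)
training_target = profile[:, None] + rng.normal(scale=0.5, size=(200, 5))
target_like = profile[:, None] + rng.normal(scale=0.5, size=(200, 10))
unrelated = rng.normal(size=(200, 10))
validation = quantile_normalize(np.hstack([target_like, unrelated]))
labels = np.array([True] * 10 + [False] * 10)

scores = correlation_scores(validation, training_target)
print("AUC:", roc_auc(scores, labels))
```

Scoring by coordinates along a principal component, the other approach compared in e–g, would replace `correlation_scores` with a projection of each sample onto the component's loadings; the same `roc_auc` function could then be applied to those coordinates.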
