Fig. 2: Identification, scoring and ranking of genes dysregulated in Parkinson’s disease reprogrammed neurons. | npj Parkinson's Disease


From: Druggable transcriptomic pathways revealed in Parkinson’s patient-derived midbrain neurons


A Principal component analysis (PCA) of the transcriptome data from reprogrammed neurons of PD patients and controls collected across six datasets, post-mortem adult human substantia nigra samples (GTEx), and highly pure populations of human floor plate-derived midbrain dopaminergic neurons (iCell® DopaNeurons). In vitro-engineered neuronal tissue clusters separately from post-mortem substantia nigra tissue (GTEx) and groups by method of derivation (iPSC reprogramming or direct iN conversion), irrespective of the laboratory of origin. Each data point represents a bulk or artificial bulk (for single-cell RNA-seq datasets) tissue transcriptome. Artificial bulk samples were generated by summing the gene counts from all cells of the same subject. Color indicates dataset of origin (annotation shown in B).

B Heatmap clustering of the average transcriptomes of the six reprogrammed neuron datasets, GTEx substantia nigra and striatum tissues, and iCell® DopaNeurons using all expressed genes (≥1 transcript per million [TPM] across all samples). Hierarchical clustering is based on average linkage and Euclidean distance-based similarity. A darker shade denotes higher similarity.

C Pipeline for computation of a per-gene dysregulation score (D) based on individual-dataset differential expression (DE) analysis results. DE analysis was performed on each dataset independently, on both read counts and TPM values, and the results were combined using the logitp method (to combine P-values) and the arithmetic mean (to combine log2 fold changes). The combined P-value and log2 fold change were each mapped to a continuous (0.01–1) scale using desirability functions (ref. 48) and integrated by weighted geometric averaging to obtain a dysregulation score (D) for each gene. Whether the gene was expressed in the adult human midbrain was used as a soft filter, at 1% of the total weight, to prioritize relevant genes in the ranking. Dysregulation scores were then integrated across the datasets, weighting for cross-dataset similarity in log2 fold change directionality, to obtain an overall dysregulation score per gene (D_overall; refer to Methods for details).

D Number of genes (P < 0.05 and |fold change| ≥ 1.25) up- and downregulated in each dataset of PD versus healthy control reprogrammed neurons.

E Volcano plot of differentially expressed genes between PD and control reprogrammed neurons (shown for dataset bulk-iN-Mixed). Genes with greater statistical significance and/or greater fold change in expression have a larger dysregulation score (D; color-coded in plot).

F Dataset expression levels of the housekeeping genes GAPDH and ACTB are very similar between patients and controls and are associated with a low overall (multiple-dataset) dysregulation score.

G Relative expression levels of the top 15 differentially expressed genes in PD versus control neurons in each of the six analyzed RNA-seq datasets. Each heatmap cell represents a single cell (single-cell datasets) or a bulk tissue sample (bulk datasets). Disease phenotype and subject ID are annotated horizontally, and gene function is annotated vertically according to the legend. A subset of individuals (N = 17) was selected to generate both iPSC and iN models, and three neuronal lines were profiled using both single-cell and bulk RNA-sequencing (see Supplementary Table 1 for subject details). For each gene, the dataset-specific dysregulation score and rank, as well as the overall dysregulation score calculated across all six datasets, are shown on the right, with darker intensity indicating a stronger score.
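The single-dataset scoring step described for panel C can be illustrated with a minimal Python sketch. It is not the authors' implementation: the caption does not give the shape of the desirability functions, their bounds, or the weights of the P-value and fold change terms, so the linear "larger is better" form, the caps `p_cap` and `fc_cap`, and the equal split of the remaining 99% weight are all assumptions made here for illustration. The logitp combination of P-values is not reimplemented; the combined P-value is taken as an input. Only the 0.01 to 1 scale, the arithmetic mean of log2 fold changes, the weighted geometric mean, and the 1% weight of the midbrain-expression soft filter come from the caption.

```python
import math

D_MIN, D_MAX = 0.01, 1.0  # desirability scale stated in the caption


def desirability(value, low, high):
    """Linear 'larger is better' desirability clipped to [D_MIN, D_MAX].

    The linear shape and the bounds are assumptions, not the paper's choice.
    """
    if high <= low:
        raise ValueError("high must exceed low")
    frac = (value - low) / (high - low)
    frac = min(max(frac, 0.0), 1.0)
    return D_MIN + (D_MAX - D_MIN) * frac


def weighted_geometric_mean(values, weights):
    """Weighted geometric mean of strictly positive values."""
    total = sum(weights)
    log_sum = sum(w * math.log(v) for v, w in zip(values, weights))
    return math.exp(log_sum / total)


def dysregulation_score(combined_p, log2_fcs, expressed_in_midbrain,
                        p_cap=10.0, fc_cap=2.0,
                        w_p=0.495, w_fc=0.495, w_expr=0.01):
    """Hypothetical per-gene score D for one dataset.

    combined_p: P-value already combined across analyses (e.g. by logitp).
    log2_fcs: log2 fold changes to be averaged arithmetically.
    expressed_in_midbrain: soft filter carrying 1% of the total weight.
    p_cap, fc_cap, w_p, w_fc: illustrative assumptions.
    """
    mean_lfc = sum(log2_fcs) / len(log2_fcs)
    # Guard against log10(0) for P-values that underflow to zero.
    neg_log_p = -math.log10(max(combined_p, 1e-300))
    d_p = desirability(neg_log_p, 0.0, p_cap)
    d_fc = desirability(abs(mean_lfc), 0.0, fc_cap)
    d_expr = D_MAX if expressed_in_midbrain else D_MIN
    return weighted_geometric_mean([d_p, d_fc, d_expr], [w_p, w_fc, w_expr])


if __name__ == "__main__":
    strong = dysregulation_score(1e-8, [1.6, 1.4], True)
    weak = dysregulation_score(0.4, [0.1, -0.05], True)
    print(f"strong gene D = {strong:.3f}, weak gene D = {weak:.3f}")
```

Because the components are combined by a geometric rather than an arithmetic mean, a gene scoring poorly on either significance or effect size is pulled toward the bottom of the ranking, while the small weight on midbrain expression only nudges otherwise similar genes apart.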
