Fig. 1: Unbiased weighted gene co-expression network analysis (WGCNA) of the human transcriptome in the healthy adult prefrontal cortex (PFC).
From: Convergent and distributed effects of the 3q29 deletion on the human neural transcriptome

a Schematic of the data analysis workflow underlying WGCNA-derived predictions for functional interrogation of the 3q29Del interval. A reference dataset obtained from the GTEx Project was used to construct a systems-level network representation of coordinated gene expression patterns across 107 non-pathological postmortem samples collected from Brodmann Area 9 (BA9) of male and female adults with no known history of psychiatric or neurological disorder. Modules of highly co-expressed genes were identified based on the topological overlap measure (TOM). The TOM between two genes is high if the genes share many network connections, yielding an interconnectedness measure that is proportional to the number of neighbors the two genes have in common. The resulting network was screened for modules harboring 3q29 interval genes (3q29 modules), which were then interrogated for biological function and hub genes. A test dataset obtained from the BrainSpan Project was used to validate the reproducibility of this network in an independent sample of 30 non-pathological postmortem specimens collected from four subregions of the PFC of adult males and females with no known history of psychiatric or neurological disorder: the orbital frontal cortex (OFC), dorsolateral PFC (DLPFC), ventrolateral PFC (VLPFC), and medial PFC (MPFC).
b Sample-level dendrogram and trait heatmaps of the reference dataset. The dendrogram was produced by hierarchical clustering of the 107 GTEx samples using normalized, outlier-removed, and residualized gene expression values for 18,410 protein-coding genes. Color bars represent trait heatmaps for sex, age group (range = 20–79 years), death classification based on the Hardy scale (range = 0–4), postmortem interval (PMI), and batch ID. Color intensity (from light yellow to red) is proportional to the value of each variable, with categorical values taken in increasing order. For sex, yellow and orange indicate female and male, respectively. Transcriptomic data were corrected for covariance mediated by these variables prior to network construction. The adjusted data show no distribution bias associated with these confounds in sample-level clustering patterns.
c Determination of the soft-thresholding power (β) used for WGCNA. A β of 8 (black arrow) was identified as the lowest power yielding a degree distribution consistent with approximate scale-free network topology (SFT R² fit index = 0.8; red line).
d Clustering dendrogram and module assignments of genes, with dissimilarity based on TOM. The 18,410 protein-coding genes (leaves = genes) clustered into 19 final modules (bottom color bar), detected by the dynamic hybrid tree cut method. Modules with strongly correlated eigengenes (Pearson’s r > 0.8, P < 0.05) were merged to avoid spurious assignment of highly co-expressed genes to separate modules. Color bars reflect module assignments before and after the merging of close modules.
e Composite Zsummary scores for module preservation (how well-defined modules are in an independent test dataset) and module quality (how well-defined modules are in repeated random splits of the reference dataset). Permutation tests were performed to adjust the observed preservation and quality statistics of each module for random chance by defining Z statistics. All modules (labeled by color) identified in the reference network were preserved (reproducible) in the test network (Zsummary > 2; blue line). Overall, 15 of 18 modules, including all 3q29 modules (red arrows), exhibited strong preservation (Zsummary > 10; green line). The remaining 3 of 18 modules exhibited moderate preservation (2 < Zsummary < 10). All modules showed strong evidence of high quality (Zsummary > 10), confirming that the modules identified in the reference network were well-defined and nonrandom.
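The TOM described in panel a can be illustrated with a small pure-Python sketch. It is not the authors' pipeline, which the caption attributes to WGCNA. It assumes an unsigned network (adjacency = |Pearson r| raised to the power β) and the standard unsigned TOM formula; the caption does not state the network type, and the function names and toy matrix are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def adjacency(expr, beta):
    """Soft-thresholded adjacency |r|**beta with a zero diagonal.

    expr: one expression vector per gene (genes x samples).
    Assumption: unsigned network; the caption does not specify this.
    """
    g = len(expr)
    adj = [[0.0] * g for _ in range(g)]
    for i in range(g):
        for j in range(i + 1, g):
            a = abs(pearson(expr[i], expr[j])) ** beta
            adj[i][j] = adj[j][i] = a
    return adj


def topological_overlap(adj):
    """Unsigned TOM: (shared neighbors + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    g = len(adj)
    k = [sum(row) for row in adj]  # connectivity of each gene
    tom = [[1.0] * g for _ in range(g)]  # a gene fully overlaps with itself
    for i in range(g):
        for j in range(i + 1, g):
            shared = sum(adj[i][u] * adj[u][j]
                         for u in range(g) if u != i and u != j)
            t = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
            tom[i][j] = tom[j][i] = t
    return tom


if __name__ == "__main__":
    # Toy adjacency for three genes, invented for illustration only.
    toy = [[0.0, 0.5, 0.5],
           [0.5, 0.0, 0.5],
           [0.5, 0.5, 0.0]]
    print(topological_overlap(toy)[0][1])  # 0.5
```

Clustering as in panel d would then use 1 − TOM as the dissimilarity. The nested loops scale cubically with the number of genes, so this form is only suitable for toy inputs, not for 18,410 genes.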
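The Zsummary cutoffs in panel e can be expressed as a short classifier. This is a minimal sketch using only the thresholds stated in the caption (greater than 10 is strong, between 2 and 10 is moderate). The label for scores of 2 or below, the handling of a score of exactly 10, and the example scores are assumptions made for illustration and are not values from the figure.

```python
from collections import Counter


def preservation_category(z_summary):
    """Map a composite Zsummary score to the evidence level used in panel e."""
    if z_summary > 10:
        return "strong"
    if z_summary > 2:
        # Assumption: a score of exactly 10 is treated as moderate.
        return "moderate"
    # Assumption: scores at or below 2 are labeled as no evidence.
    return "none"


if __name__ == "__main__":
    # Hypothetical module scores, not taken from the paper.
    example_scores = {"moduleA": 35.2, "moduleB": 7.4, "moduleC": 1.1}
    labels = {m: preservation_category(z) for m, z in example_scores.items()}
    print(labels)
    print(Counter(labels.values()))
```

Tallying the labels with `Counter` gives a per-category count like the 15 strong and 3 moderate modules reported in the caption.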