Fig. 1: The overall workflow of the analyses and colon vs rectum comparison. | British Journal of Cancer

From: Identification of unique rectal cancer-specific subtypes

Fig. 1

a Workflow of the data processing steps. Raw files (.CEL) were downloaded from 16 different colorectal studies in the Gene Expression Omnibus (GEO) database. Ten datasets were processed following the same steps. After pre-processing, they were merged and batch effect correction was applied. Hierarchical clustering was performed on the rectal samples (n = 182) in these datasets to define rectal-specific subtypes (RSSs). Then, by generating gene co-expression networks and identifying gene modules, we developed a classifier to predict RSSs in different datasets. Six microarray datasets and the TCGA RNA-Seq samples were kept separate initially because of the different technology platforms they originated from. RSSs were calculated in these datasets separately after the classifier was developed. b, c Volcano plots of the differential expression analysis show that the top differentially expressed genes (DEGs) between the colon and rectum are similar in the microarray (b) and RNA-Seq (c) datasets. HOXB13, HOXC6 and CLDN8 were among the most differentially expressed genes in both cohorts. Different thresholds (p-value and fold-change) were used to demonstrate differential expression in the microarray and RNA-Seq datasets because of the unbalanced colon-to-rectum sample ratio and the lower sensitivity of the microarray datasets.
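The thresholding behind volcano plots such as those in panels b and c can be illustrated with a minimal Python sketch. It is not the authors' code: the `GeneStat` record, the function names, the gene labels and all numeric values are hypothetical placeholders, and the cut-offs are not the thresholds used in the figure. It only shows how a p-value threshold and a fold-change threshold jointly decide whether a gene is called differentially expressed, and why different cut-offs per platform change the calls.

```python
import math
from typing import Dict, List, NamedTuple, Tuple


class GeneStat(NamedTuple):
    """Per-gene summary from a differential expression test (hypothetical record)."""
    gene: str
    log2_fc: float   # log2 fold change; the sign convention is an assumption
    p_value: float   # p-value from the test (raw or adjusted, as chosen upstream)


def volcano_calls(stats: List[GeneStat],
                  p_threshold: float,
                  log2_fc_threshold: float) -> Dict[str, str]:
    """Label each gene 'up', 'down' or 'ns' (not significant).

    A gene is called only if it passes BOTH the p-value cut-off and the
    absolute log2 fold-change cut-off.
    """
    calls = {}
    for s in stats:
        passes = s.p_value < p_threshold and abs(s.log2_fc) >= log2_fc_threshold
        if not passes:
            calls[s.gene] = "ns"
        elif s.log2_fc > 0:
            calls[s.gene] = "up"
        else:
            calls[s.gene] = "down"
    return calls


def volcano_coordinates(s: GeneStat) -> Tuple[float, float]:
    """Return (x, y) for a volcano plot: log2 fold change vs -log10(p)."""
    # Floor the p-value so that an exact zero does not break the logarithm.
    return s.log2_fc, -math.log10(max(s.p_value, 1e-300))


# Toy values for illustration only; they are not results from the study.
toy_stats = [
    GeneStat("GENE_A", 2.0, 1e-6),
    GeneStat("GENE_B", -1.5, 1e-4),
    GeneStat("GENE_C", 0.2, 1e-8),
    GeneStat("GENE_D", 3.0, 0.2),
]

# Placeholder cut-offs: a looser and a stricter p-value threshold.
lenient_calls = volcano_calls(toy_stats, p_threshold=0.01, log2_fc_threshold=1.0)
strict_calls = volcano_calls(toy_stats, p_threshold=1e-5, log2_fc_threshold=1.0)
```

With these toy inputs, GENE_B is called under the looser p-value cut-off but not under the stricter one, which is the kind of platform-dependent difference that motivates choosing thresholds per dataset.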
