Figure 4
From: A Novel Statistical Method to Diagnose, Quantify and Correct Batch Effects in Genomic Studies

Detection of batch effects in CRC and normal gene expression datasets. (A) PCA plot of the samples in the principal subspace spanned by the first two PCs, showing that samples cluster by batch (the two CRC datasets, GSE18088 and GSE23878). Filled squares denote normal samples and filled circles denote tumors. (B) BIC values from findBATCH used to select the optimal number of pPCs for the pooled (merged) dataset; higher BIC values indicate a better model. The red dashed vertical line marks the optimal number of pPCs, nine. (C,D) Forest plots of the pPCs from findBATCH applied to the uncorrected pooled CRC dataset (GSE18088 and GSE23878) to quantify (C) the batch effect and (D) the biological (normal versus tumor) effect. A pPC is considered significant only if its 95% CI excludes zero.
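The two decision rules in panels (B) to (D) are picking the number of pPCs with the largest BIC and calling a pPC significant when its 95% CI excludes zero. Both can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not findBATCH's implementation. It assumes the "higher is better" BIC convention BIC = 2·log L − k·log n, and it uses a normal-approximation (Wald) interval as a stand-in for however findBATCH actually derives its CIs. All function names are hypothetical, and the numbers in the usage section are toy values, not data from the figure.

```python
import math
from typing import Sequence


def bic(log_lik: float, n_params: int, n_obs: int) -> float:
    """BIC on the 'higher is better' scale: 2*logL - k*log(n) (assumed convention)."""
    return 2.0 * log_lik - n_params * math.log(n_obs)


def optimal_q(bic_by_q: dict[int, float]) -> int:
    """Return the number of pPCs whose fitted model has the largest BIC."""
    return max(bic_by_q, key=bic_by_q.get)


def wald_ci(estimate: float, se: float, z: float = 1.959964) -> tuple[float, float]:
    """Approximate 95% CI as estimate +/- z*SE (normal approximation; an assumption)."""
    return estimate - z * se, estimate + z * se


def significant_pcs(cis: Sequence[tuple[float, float]]) -> list[int]:
    """1-based indices of pPCs whose CI lies entirely above or entirely below zero."""
    return [i for i, (lo, hi) in enumerate(cis, start=1) if lo > 0 or hi < 0]


if __name__ == "__main__":
    # Toy BIC values for models with 1, 2 and 3 pPCs (illustrative only).
    print(optimal_q({1: -510.2, 2: -498.7, 3: -503.1}))  # prints 2

    # Toy (estimate, SE) pairs for three pPC coefficients (illustrative only).
    cis = [wald_ci(0.8, 0.2), wald_ci(0.1, 0.3), wald_ci(-0.9, 0.25)]
    print(significant_pcs(cis))  # prints [1, 3]
```

An interval that merely touches zero, such as (0.0, 1.0), counts as including zero here, so that pPC is not flagged. This matches the caption's "only if its 95% CI excludes zero" wording.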