Fig. 2

Comparison of clustering quality across the steps of the TMM + CPM + SVA pre-processing pipeline. (A) 3D PCA plots showing the separation of samples by tissue type before normalization, after normalization, and after batch correction. The tissue clusters of Brain-Cortex, Heart-Atrial Appendages, Heart-Left Ventricle, Liver, Skin-Sun Exposed, and Skin-Not Sun Exposed are shown as representative examples, and the Euclidean distance between tissue clusters in PCA space is given in red text. Interactive versions of the 3D plots are available at https://dhana2403.github.io/3D_plots/3d_pca_plot_all_tissues_tmm.html and https://dhana2403.github.io/3D_plots/3d_pca_plot_all_tissues_sva.html. (B) Bar graph comparing the average Euclidean distance between tissue centroids after the TMM + CPM and TMM + CPM + SVA processing steps across 54 GTEx tissues. The centroid of each tissue is calculated as the mean of the PC1, PC2, and PC3 coordinates of its samples. (C) Bar graph showing the percentage of variance explained by each principal component after TMM + CPM normalization and after SVA batch correction of the gene expression data. The percentage of variance for each principal component on the x-axis is shown at the top of its bar. (D) Bar graph comparing the Davies-Bouldin Index (DBI) of the tissue clusters after TMM + CPM normalization and after SVA batch correction. The DBI is calculated from the first three principal components (PC1, PC2, and PC3). Lower DBI values indicate more compact and better-separated clusters. (E) Expression trajectory of the brain-specific gene SNAP25 across 54 tissue types after the TMM + CPM and TMM + CPM + SVA processing steps. The distribution of expression values in brain tissue samples is highlighted by the purple box.
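The two cluster-quality measures in panels (B) and (D) can be illustrated with a minimal sketch. It assumes that the PCA scores are available as an array with one row per sample and columns PC1 to PC3, and that the average in panel (B) is taken over all pairs of tissue centroids; the caption does not specify the averaging scheme, so this is an assumption. The function names are hypothetical and are not taken from the authors' code.

```python
import numpy as np

def tissue_centroids(pcs, labels):
    """Per-tissue centroid: mean of the PC1-PC3 coordinates of its samples."""
    tissues = np.unique(labels)
    centroids = np.vstack([pcs[labels == t].mean(axis=0) for t in tissues])
    return tissues, centroids

def pairwise_centroid_distances(centroids):
    """Euclidean distance matrix between all centroids."""
    diff = centroids[:, None, :] - centroids[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def mean_centroid_distance(pcs, labels):
    """Average distance over all tissue pairs (assumed averaging scheme)."""
    _, centroids = tissue_centroids(pcs, labels)
    dist = pairwise_centroid_distances(centroids)
    upper = np.triu_indices(len(centroids), k=1)  # each pair counted once
    return dist[upper].mean()

def davies_bouldin_index(pcs, labels):
    """Standard Davies-Bouldin Index; lower means tighter, better-separated clusters."""
    tissues, centroids = tissue_centroids(pcs, labels)
    # Scatter of each cluster: mean distance of its samples to its centroid.
    scatter = np.array([
        np.linalg.norm(pcs[labels == t] - centroids[i], axis=1).mean()
        for i, t in enumerate(tissues)
    ])
    dist = pairwise_centroid_distances(centroids)
    np.fill_diagonal(dist, np.inf)  # exclude self-comparisons
    ratios = (scatter[:, None] + scatter[None, :]) / dist
    # For each cluster take its worst (largest) ratio, then average.
    return ratios.max(axis=1).mean()

# Toy data (not from the paper): two tissues, two samples each, three PCs.
toy_pcs = np.array([[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0],
                    [11.0, 0.0, 0.0], [9.0, 0.0, 0.0]])
toy_labels = np.array(["Liver", "Liver", "Brain-Cortex", "Brain-Cortex"])
print(mean_centroid_distance(toy_pcs, toy_labels))
print(davies_bouldin_index(toy_pcs, toy_labels))
```

Running the same two functions on the PCA scores obtained after TMM + CPM and after TMM + CPM + SVA would give the kind of before-and-after comparison shown in the bar graphs: a larger mean centroid distance and a smaller DBI both point to better tissue separation.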