Figure 4: PCA on subsets of data can reveal additional dimensions.

(a) Two subsets of the Lukk dataset, a brain subset and a cancer subset, were analyzed in more detail. The larger graphics show their location on the first three PCs of the complete dataset, color-coded by cancer primary site or brain region. The insets (small graphics) show the first two PCs (called “residual subset PCs”) of a PCA applied to the residual data matrix of the cancer (left) or brain (right) subset. The different cancers or brain regions separate clearly in the residual space, whereas they overlap more strongly in the three-dimensional PCA space derived from the complete dataset. (b) The dimensions derived in (a) can be projected onto our own dataset, i.e. across microarray platforms, and show a similar separation of cancer types and brain regions in the residual space. This confirms the biological relevance of the additional dimensions in the residual space (insets). Background colors represent the complete Lukk dataset (a) or our own dataset (b), colored according to large-scale groups (red: brain, orange: hematopoietic, green: cell line, blue: incompletely differentiated, magenta: muscle, grey: other).
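
The procedure summarized in this caption can be illustrated with a minimal NumPy sketch. It rests on assumptions not stated in the caption: the residual data matrix is taken to be the centered data minus its projection onto the first three PCs of the complete dataset, and the second dataset is assumed to have already been mapped to the same genes in the same column order. The function names (`fit_pca`, `residual`, `residual_subset_pca`, `project_new`) are hypothetical and chosen for illustration only; cross-platform normalization is not addressed.

```python
import numpy as np

def fit_pca(X, k):
    """Return the column means and the top-k principal axes (as rows) of X (samples x genes)."""
    mean = X.mean(axis=0)
    # SVD of the centered matrix; the rows of Vt are the principal axes
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def residual(X, mean, axes):
    """Center X with `mean` and remove the part explained by `axes` (assumed residual definition)."""
    Xc = X - mean
    return Xc - (Xc @ axes.T) @ axes

def residual_subset_pca(X_full, subset_mask, n_global=3, n_residual=2):
    """PCA on the residuals of a subset after removing the first n_global PCs of the full data."""
    mean, axes = fit_pca(X_full, n_global)
    R = residual(X_full[subset_mask], mean, axes)
    sub_mean, sub_axes = fit_pca(R, n_residual)
    scores = (R - sub_mean) @ sub_axes.T  # coordinates on the residual subset PCs
    return mean, axes, sub_mean, sub_axes, scores

def project_new(X_new, mean, axes, sub_mean, sub_axes):
    """Project samples from another dataset (same genes, same order) onto the residual subset PCs."""
    R_new = residual(X_new, mean, axes)
    return (R_new - sub_mean) @ sub_axes.T
```

In this sketch, `residual_subset_pca` corresponds to the insets in (a) and `project_new` to the projection across datasets in (b). Because the residuals are orthogonal to the first three global PCs by construction, any separation seen in the residual scores reflects variation that those three PCs do not capture.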