Extended Data Fig. 1: Samples with outlier values for at least one QC metric cluster separately from most non-outlier samples.
From: Early prediction of preeclampsia in pregnancy with cell-free RNA

a–c, For discovery (a), validation 1 (b), and validation 2 (c), hierarchical clustering (left) and PCA (right) reveal that most outlier samples cluster with negative control (NC) samples and separately from non-outlier samples. d, e, Visualization of other QC metrics, such as the amount of cfRNA extracted (d) and the percentage of reads that align uniquely to the human genome (e) (n = 209, 106, 89 samples for discovery, validation 1, and validation 2, respectively). For PCA in a–c, sample outliers and poorly detected genes drive the principal components and serve as leverage points. Visualizing the top two principal components from PCA performed using all samples and all genes (leftmost PCA) or only samples that pass QC metrics (middle PCA) reveals that certain samples can act as leverage points. Once sample outliers and lowly detected genes are removed from the cfRNA gene matrix (rightmost PCA), the top two principal components reflect natural variance in the data and are no longer driven by a few leverage points. For box plots, centre line, box limits, whiskers and outliers represent the median, upper and lower quartiles, 1.5× interquartile range and any outliers outside that distribution, respectively.
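
The box plot convention described in the legend can be illustrated with a minimal sketch. It assumes the common Tukey-style reading, in which whiskers extend to the most extreme data point within 1.5× the interquartile range of the box and anything beyond is drawn as an outlier. The function name `box_plot_summary`, the quartile interpolation method and the example values are all hypothetical choices made for illustration. They are not taken from the paper, and plotting libraries may compute quartiles slightly differently.

```python
from statistics import median, quantiles


def box_plot_summary(values, whisker=1.5):
    """Return Tukey-style box plot statistics for a list of numbers.

    Assumption: quartiles use the 'inclusive' interpolation method;
    other tools may use a different quartile definition.
    """
    data = sorted(values)
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Fences sit 1.5x IQR beyond the box limits
    low_fence = q1 - whisker * iqr
    high_fence = q3 + whisker * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "median": median(data),
        "q1": q1,
        "q3": q3,
        # Whiskers end at the most extreme points still inside the fences
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        # Points outside the fences are plotted individually as outliers
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Hypothetical QC values (e.g. percent of uniquely aligned reads),
# invented for illustration only
example = [82.0, 85.0, 86.0, 88.0, 90.0, 91.0, 93.0, 40.0]
summary = box_plot_summary(example)
print(summary)
```

In this made-up example the single low value falls below the lower fence and is reported as an outlier, while the whiskers stop at the smallest and largest remaining values.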