Fig. 2: Overview of ENCODE database swap detection. | Nature Communications

Fig. 2: Overview of ENCODE database swap detection.

From: Detecting sample swaps in diverse NGS data types using linkage disequilibrium

Fig. 2

a Overview of 8851 genotyped datasets from ENCODE, partitioned by cell type (top left), assay type (top right), and by target for ChIP-seq (bottom). Cell types that had less than 100 datasets derived from them were pooled—so all the datasets from them are grouped into those with less than 30 datasets or those with 30-100 datasets. All hg19 aligned reads from total RNA-, polyA RNA-, ChIP-, and DNase-seq experiments performed on samples belonging to donors with at least four datasets in total were included in the analysis. All ChIP-seq targets, including histone modifications (HM), transcription factors (TF), chromatin modifiers (CM), CTCF, and control experiments were included. b Distribution of LOD scores from ENCODE genotyping. Each dataset was compared to three representative datasets from its nominal donor. Any dataset scoring negatively against any of the three representatives was flagged for further review. A comparison resulting in an LOD score between −5 and 5 was deemed inconclusive (insufficient evidence to indicate shared or distinct genetic origin). c Each flagged sample was compared to all other samples from its nominal donor, as well as the representatives for all other donors in our database to nominate true donor identity and identify genetically consistent sub-clusters. Comparisons of flagged samples between two HUVEC donors reveal five genetically distinct clusters.

Back to article page