Extended Data Fig. 2: Targeted NanoSeq study design and quality metrics. | Nature

From: Somatic mutation and selection at population scale

a, Flow diagram describing the selection of the donor cohort used in this study. b, c, Distribution of zygosity, sex, age, smoking (pack-years) and drinking (drink-years) values for (b) buccal swab donors (n = 1,042) and (c) archival blood sample donors (n = 371). d, Identification of samples contaminated with human DNA from another individual, comparing the proportion of mutation calls falling in the ‘SNP+noise’ mask versus the median fraction of reference bases at alternative homozygous SNPs; point size is proportional to the duplex coverage. The vertical dashed line indicates our exclusion criterion for human-contaminated samples (>0.01). e, Identification of samples contaminated with non-human DNA, comparing the dN/dS values in passenger genes versus the percentage of unmapped reads mapping to a set of potential contaminant species. The horizontal dashed line indicates neutral dN/dS = 1. Red points indicate samples for which the upper bound of the 95% CI of the dN/dS ratio is <1. The vertical dashed line shows our exclusion criterion for samples contaminated with non-human DNA (>0.25). f, Histogram of duplex coverage (dx) in the buccal swab cohort, at on-target and near-target regions. The vertical dashed line shows our exclusion criterion for low-coverage samples (<50 dx). g, Distribution of the mean deduplicated coverage (×) in the buccal swab cohort, at on-target and near-target regions. Raw sequencing coverage is ~6.6 times higher due to the average 85% duplicate rate required for duplex consensus calling. h, Estimation of epithelial fraction in buccal swab samples by targeted enzymatic methylation sequencing. The vertical dashed line shows the median epithelial fraction of 0.95. i, Sequencing quality metrics including the on-target capture fractions, estimated excess in strand drop-out (SDO), and the achieved duplicate rates for the buccal swab cohort. Random binomial sampling is expected to cause lack of coverage in one of the DNA strands in a proportion of cases. We estimated the excess in SDO by subtracting the expected SDO from the observed SDO. Box plots show the interquartile range, median, 95% confidence intervals and outliers as dots for the buccal cohort (n = 1,042). j, Relationship between duplicate rates and sequencing efficiency, measured as the number of bases with duplex support divided by the total number of bases sequenced. k, Relationship between duplicate rates and sequencing efficiency after factoring in the on-target fraction (t) and the excess in strand drop-out (SDO). l, Number of duplex calls as a function of the primary alignment score minus secondary alignment score (AS-XS) threshold. m, Substitution burdens calculated within each AS-XS threshold, corrected for trinucleotide context. Error bars for substitution burdens indicate Poisson 95% CIs. n, o, Hotspots covered by different AS-XS thresholds, shown as (n) total number of hotspots and (o) their aggregated frequency in TCGA. The horizontal dashed line indicates the detection of all studied hotspots.
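
The three sample-exclusion criteria quoted in panels d, e and f can be sketched as a simple filter. This is an illustrative sketch only, not the authors' pipeline: the class and field names are hypothetical, and it assumes that the >0.01 threshold in panel d applies to the median fraction of reference bases at alternative homozygous SNPs and that the >0.25 threshold in panel e applies to the percentage of unmapped reads assigned to contaminant species. Only the three threshold values come from the caption.

```python
from dataclasses import dataclass
from typing import List

# Threshold values taken from the caption (panels d, e, f).
# Which metric each threshold applies to is an assumption (see lead-in).
MAX_HUMAN_CONTAMINATION = 0.01      # panel d: exclude if metric > 0.01
MAX_NONHUMAN_CONTAMINATION = 0.25   # panel e: exclude if metric > 0.25
MIN_DUPLEX_COVERAGE = 50.0          # panel f: exclude if coverage < 50 dx


@dataclass
class SampleQC:
    """Per-sample QC metrics; all field names are hypothetical."""
    sample_id: str
    ref_fraction_at_hom_alt_snps: float  # assumed panel d metric
    contaminant_read_pct: float          # assumed panel e metric
    duplex_coverage: float               # panel f metric, in dx


def exclusion_reasons(sample: SampleQC) -> List[str]:
    """Return the list of criteria a sample fails (empty list = keep)."""
    reasons = []
    if sample.ref_fraction_at_hom_alt_snps > MAX_HUMAN_CONTAMINATION:
        reasons.append("human_contamination")
    if sample.contaminant_read_pct > MAX_NONHUMAN_CONTAMINATION:
        reasons.append("non_human_contamination")
    if sample.duplex_coverage < MIN_DUPLEX_COVERAGE:
        reasons.append("low_duplex_coverage")
    return reasons


if __name__ == "__main__":
    # Made-up example values, not data from the study.
    samples = [
        SampleQC("S1", 0.002, 0.05, 420.0),
        SampleQC("S2", 0.030, 0.05, 380.0),
        SampleQC("S3", 0.001, 0.40, 35.0),
    ]
    for s in samples:
        print(s.sample_id, exclusion_reasons(s) or "keep")
```

Returning the list of failed criteria, rather than a single pass/fail flag, makes it possible to count how many samples each criterion removes.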