Extended Data Fig. 4: Purity estimation and ppVAF transformation in normal and premalignant samples. | Nature

From: Polyclonal origins of human premalignant colorectal lesions

a-b. Comparison between sample purities estimated by two copy-number-based algorithms (FACETS and Sequenza) in the HTAN WGS (a) and WES (b) samples. Color indicates the sample type (green: mucosa, orange: benign polyp, blue: dysplastic polyp) and shape indicates the clonal origin of the sample inferred from the bulk sequencing data (circle: monoclonal, triangle: polyclonal).
c-d. Comparison between WES and WGS sample purities estimated by FACETS (c) or Sequenza (d) for samples profiled with both modalities. Colors and shapes as in a-b.
e-f. The distributions of purities inferred by Sequenza (dashed colored lines) or FACETS (solid colored lines) for mucosa (green, e) and for benign (orange) and dysplastic (blue) polyps (f) differ markedly from the epithelial cell fractions measured using scATAC-seq (grey filled distributions).
g-h. Toy examples showing estimation of ppVAFs using uncertain sample purity values, for a mutation with 20 mutant and 80 wild-type reads. When the sample purity is known with no uncertainty (vertical line in the sample purity distributions in g), the posterior distribution for the ppVAF has the narrowest possible width (corresponding posterior distribution in h). As the sample purity distribution gets wider (orange and red distributions in g), the ppVAF posterior distributions for a mutation with the same reference and alternate allele sequencing counts get wider as well (corresponding orange and red distributions in h). The true ppVAF of the mutation given an 80% pure sample is noted by the dashed vertical line in g.
i-l. Raw VAF distributions (grey) and corresponding ppVAF distributions (blue) for four example samples in which purity estimation using copy-number-based algorithms (FACETS and Sequenza) produced poor results. The ppVAFs were computed using the scATAC-seq-measured polyp sample purity distribution (shown in f). In these examples, many mutations, including the bulk of the clonal mutation peak, have substantially higher VAFs than the expected clonal heterozygous VAF calculated from the FACETS purity (vertical solid line) and/or the Sequenza purity (vertical dashed line).
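The toy example in g-h can be approximated with a short Monte Carlo sketch. This is not the authors' implementation. It assumes that the ppVAF is the raw VAF divided by the sample purity (capped at 1, ignoring copy number), that the raw VAF has a Beta posterior under a uniform prior, and that purity uncertainty can be modelled by a Beta distribution with a chosen mean and standard deviation. The function names `ppvaf_samples`, `interval_width` and `expected_clonal_het_vaf` are hypothetical, and the last one assumes a diploid, copy-neutral locus.

```python
import numpy as np


def ppvaf_samples(alt, ref, purity_mean, purity_sd, n=100_000, seed=0):
    """Draw Monte Carlo samples of the ppVAF (assumed here to be VAF / purity)."""
    rng = np.random.default_rng(seed)
    # Posterior of the raw VAF under a uniform Beta(1, 1) prior (assumption).
    vaf = rng.beta(alt + 1, ref + 1, size=n)
    if purity_sd == 0:
        # Purity known exactly: a point mass at purity_mean.
        purity = np.full(n, purity_mean)
    else:
        # Beta distribution matched to the requested mean and sd
        # by the method of moments (an illustrative choice).
        k = purity_mean * (1 - purity_mean) / purity_sd**2 - 1
        purity = rng.beta(purity_mean * k, (1 - purity_mean) * k, size=n)
    # Cap at 1 because an allele fraction cannot exceed 100%.
    return np.clip(vaf / purity, 0.0, 1.0)


def interval_width(samples, level=95):
    """Width of the central credible interval of the sampled distribution."""
    tail = (100 - level) / 2
    lo, hi = np.percentile(samples, [tail, 100 - tail])
    return hi - lo


def expected_clonal_het_vaf(purity):
    """Expected raw VAF of a clonal heterozygous mutation at a diploid locus."""
    return purity / 2


if __name__ == "__main__":
    # 20 mutant and 80 wild-type reads, mean purity 80%, increasing uncertainty.
    for sd in (0.0, 0.05, 0.10):
        s = ppvaf_samples(alt=20, ref=80, purity_mean=0.8, purity_sd=sd)
        print(f"purity sd={sd:.2f}  median ppVAF={np.median(s):.3f}  "
              f"95% width={interval_width(s):.3f}")
```

Under these assumptions the ppVAF interval widens as the purity distribution widens, which is the qualitative behaviour described for g-h. The `expected_clonal_het_vaf` helper corresponds to the vertical reference lines in i-l only if the locus is diploid and copy-neutral.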
