Extended Data Fig. 3: Developing and optimizing PhaVa, an accurate long-read-based inverton caller.
From: Intragenic DNA inversions expand bacterial coding capacity

(A) Schematic of the PhaVa workflow. Putative invertons are identified, and long reads are mapped to both a forward-orientation version (highlighted by the black dashed lines) and a reverse-orientation version (highlighted by the grey dashed lines) of the inverton and its surrounding genomic sequence. Reads that do not map across the entire inverton and into the flanking sequence on both sides, or that have poor mapping characteristics, are removed. See Methods for details. (B-C) Optimizing cutoffs for the minimum number of reverse reads, expressed both as a raw count and as a percentage of all reads, to reduce false-positive inverton calls in simulated reads. Cell color and number represent (B) the false-positive rate per simulated readset and (C) the total number of unique false positives across all simulated datasets. (D) False positives in simulated data, plotted per species. All measurements used a minimum reverse-read count cutoff of three while the minimum reverse-read percentage cutoff was varied. The dashed line indicates the minimum reverse-read percentage cutoff used for isolate and metagenomic datasets. Solid lines indicate the sample mean, and colored bands indicate the 95% confidence interval. (E) Output tables of particular interest are labeled and shown below the diagram with example output.
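The read filter in (A) and the two cutoffs in (B-D) can be illustrated with a minimal sketch. This is not PhaVa's code: the function names, the `min_flank` parameter, the half-open coordinate convention, and the choice of forward plus reverse reads as the denominator for the percentage are all assumptions made for illustration. The percentage threshold is left as a required argument because the caption does not state its value.

```python
def spans_inverton(aln_start, aln_end, inv_start, inv_end, min_flank=1):
    """Return True if an alignment covers the whole inverton plus flank.

    Coordinates are half-open [start, end) on the reference.
    `min_flank` is a hypothetical parameter: the number of flanking bases
    the read must reach into on each side of the inverton.
    """
    return aln_start <= inv_start - min_flank and aln_end >= inv_end + min_flank


def passes_cutoffs(forward_reads, reverse_reads, min_reverse_reads, min_reverse_pct):
    """Return True if an inverton call meets both reverse-read cutoffs.

    Assumption: the percentage is computed over forward + reverse reads
    that survived filtering for this inverton.
    """
    total = forward_reads + reverse_reads
    if total == 0:
        # No supporting reads at all, so no call can be made.
        return False
    reverse_pct = 100.0 * reverse_reads / total
    return reverse_reads >= min_reverse_reads and reverse_pct >= min_reverse_pct
```

For example, with a count cutoff of three (as in panel D) and an arbitrary illustrative percentage cutoff of 2, `passes_cutoffs(97, 3, 3, 2.0)` accepts the call, while `passes_cutoffs(997, 3, 3, 2.0)` rejects it because three reverse reads make up only 0.3% of the total. Requiring both conditions guards against calls driven by a handful of stray reads at high coverage and against high percentages at very low coverage.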