Fig. 5: Overview of isONcorrect.
From: Error correction enables use of Oxford Nanopore technology for reference-free transcriptome analysis

The input to isONcorrect is the set of reads from a single cluster produced by isONclust (or any other software that groups reads by gene family of origin). This figure illustrates a cluster with five reads (r1–r5) from three isoforms. isONcorrect finds all intervals whose length lies between xmin and xmax, delimited by pairs of anchor minimizers (shown as colored blocks), and adds them to a hash table. To correct a single read (e.g. r1), all the anchor minimizer pairs found in r1 are queried in the hash table, and all reads containing a given anchor minimizer pair are retrieved. In this example, r1 has 11 such anchor pairs (shown in Step 1). Each anchor pair is assigned a weight equal to the product of its span and the number of reads containing it, except that anchor pairs spanning dissimilar regions are filtered out (details in “Methods”; Step 1). For example, the anchor pair (p1, p2) occurs in three reads (r1, r2, and r3). The instance is sent to a weighted interval scheduler that finds the set of non-overlapping anchor pairs with the largest total weight (Step 2). In this case, four anchor pairs are selected. All segments between the chosen anchor pairs are sent for correction. A consensus is created (Step 3) using spoa, and one or more trusted variants are identified based on their frequencies and sequence contexts (Step 4). Each read segment in r1 is corrected to the closest trusted context (Step 5). The corrected segments are inserted back into the original read r1, yielding the corrected version of r1 (Step 6). An optional Step 7 corrects the segments of the other reads in the same manner and stores them in a hash table, from which they are retrieved when it is those reads' turn to be corrected. For example, when it is r2’s and r3’s turn to be corrected, the interval spanned by the anchor pair (p1, p2) may be encountered again in the optimal scheduling solution, allowing Steps 3–5 to be skipped at that point.
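The weighting and scheduling of Steps 1 and 2 can be illustrated with a minimal sketch of the standard dynamic program for weighted interval scheduling. This is not the isONcorrect implementation: the function names `anchor_pair_weight` and `select_anchor_pairs`, the tuple layout, and the coordinates are hypothetical, the filtering of dissimilar regions is omitted, and the sketch assumes that two anchor pairs sharing only an endpoint do not overlap.

```python
from bisect import bisect_right


def anchor_pair_weight(start, end, n_reads):
    """Weight of an anchor pair: its span times the number of reads containing it."""
    return (end - start) * n_reads


def select_anchor_pairs(pairs):
    """Pick non-overlapping anchor pairs with the largest total weight.

    pairs: list of (start, end, n_reads) tuples for one read (hypothetical layout).
    Returns (total_weight, chosen), where chosen is a list of
    (start, end, weight) tuples ordered by position.
    Assumption: intervals that only touch (end == next start) do not overlap.
    """
    # Attach weights and sort by end coordinate, as the dynamic program requires.
    weighted = sorted(
        ((s, e, anchor_pair_weight(s, e, n)) for s, e, n in pairs),
        key=lambda iv: iv[1],
    )
    ends = [e for _, e, _ in weighted]
    n = len(weighted)

    best = [0] * (n + 1)  # best[j]: optimum using only the first j intervals
    prev = [0] * (n + 1)  # prev[j]: count of earlier intervals ending at or before start of j
    for j in range(1, n + 1):
        start, _, weight = weighted[j - 1]
        prev[j] = bisect_right(ends, start, 0, j - 1)
        best[j] = max(best[j - 1], weight + best[prev[j]])

    # Trace back through the table to recover the selected intervals.
    chosen = []
    j = n
    while j > 0:
        weight = weighted[j - 1][2]
        if weight + best[prev[j]] > best[j - 1]:
            chosen.append(weighted[j - 1])
            j = prev[j]
        else:
            j -= 1
    chosen.reverse()
    return best[n], chosen
```

For instance, with the made-up input `[(0, 10, 3), (5, 20, 2), (10, 25, 3), (25, 30, 1)]`, the sketch selects the intervals (0, 10), (10, 25), and (25, 30) with a total weight of 80, and drops (5, 20) because it overlaps higher-weight choices. Sorting by end coordinate and using binary search to find compatible predecessors keeps the running time at O(n log n) in the number of anchor pairs.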