Fig. 2: Overview of the KnotFold approach to predicting RNA secondary structure including pseudoknots.

a The main steps of KnotFold, illustrated using CP000097.1_937913-937973 as an example. KnotFold first predicts the base pairing probability for every pair of bases in the target RNA, then constructs a potential function from the predicted probabilities, and finally finds the secondary structure with the lowest potential using a specially designed minimum-cost flow algorithm. For illustration, the flow network shows only four bases (5G, 41U, 50C and 56A) and the 12 edges among them; KnotFold selects the base pairs 5G-50C (in magenta) and 41U-56A (in green) as part of the predicted secondary structure. The final prediction consists of 18 base pairs, of which only one, 15G-20C (in blue), is a false positive.

b The iteration steps of solving for the optimal flow. The algorithm begins with a zero flow that contains no edges and iteratively adds new edges to the current flow, or sometimes removes existing ones. We use KnotFold to construct the secondary structures corresponding to the intermediate flows. The cost decreases as the iterations proceed and finally reaches −351.6 after 36 steps. During this process, some base pairs are newly added (shown as orange lines) while others are removed; this is described in more detail in Supplementary Fig. 5.
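The flow-based selection of base pairs described in this caption can be illustrated with a small sketch. This is not the authors' implementation: it uses a generic successive-shortest-path minimum-cost flow rather than KnotFold's specially designed algorithm, the pairing probabilities in `pair_prob` are invented for the four bases shown in the figure, and the edge cost `-log(p / (1 - p))` is a stand-in assumption for the potential function, which the caption does not specify. The rule of accepting a pair only when both directed edges carry flow is likewise an assumption made for this toy network.

```python
import math

def min_cost_flow(n_nodes, edges, source, sink):
    """Generic successive-shortest-path min-cost flow with unit augmentations.

    edges: list of (u, v, capacity, cost).
    Returns (total_cost, flow_per_edge, steps). Stops as soon as no
    augmenting path with negative cost remains.
    """
    arcs = []  # arcs[2k] is edge k, arcs[2k + 1] is its residual reverse arc
    graph = [[] for _ in range(n_nodes)]
    for u, v, cap, cost in edges:
        graph[u].append(len(arcs)); arcs.append([v, cap, cost])
        graph[v].append(len(arcs)); arcs.append([u, 0, -cost])
    total, steps = 0.0, 0
    while True:
        # Bellman-Ford: edge costs may be negative
        dist = [math.inf] * n_nodes
        prev_arc = [-1] * n_nodes
        dist[source] = 0.0
        for _ in range(n_nodes - 1):
            changed = False
            for u in range(n_nodes):
                if dist[u] == math.inf:
                    continue
                for a in graph[u]:
                    v, cap, cost = arcs[a]
                    if cap > 0 and dist[u] + cost < dist[v] - 1e-12:
                        dist[v], prev_arc[v], changed = dist[u] + cost, a, True
            if not changed:
                break
        if dist[sink] >= -1e-12:  # no path that lowers the cost: done
            break
        v = sink
        while v != source:        # push one unit of flow along the path
            a = prev_arc[v]
            arcs[a][1] -= 1       # residual reverse arcs can undo earlier edges
            arcs[a ^ 1][1] += 1
            v = arcs[a ^ 1][0]
        total += dist[sink]
        steps += 1
    flow = [arcs[2 * k + 1][1] for k in range(len(edges))]
    return total, flow, steps

# Toy data (hypothetical): the four bases from the figure, invented probabilities
bases = [5, 41, 50, 56]
pair_prob = {(5, 50): 0.9, (41, 56): 0.8, (5, 56): 0.1,
             (41, 50): 0.2, (5, 41): 0.05, (50, 56): 0.05}

def pair_cost(p):
    # Assumed cost: negative when pairing is more likely than not
    return -math.log(p / (1.0 - p))

n = len(bases)
source, sink = 0, 2 * n + 1
edges = [(source, 1 + k, 1, 0.0) for k in range(n)]       # source -> left copy
edges += [(1 + n + k, sink, 1, 0.0) for k in range(n)]    # right copy -> sink
pair_edges = {}
for a in range(n):
    for b in range(n):
        if a != b:  # 4 bases give 12 directed edges
            p = pair_prob[tuple(sorted((bases[a], bases[b])))]
            pair_edges[(a, b)] = len(edges)
            edges.append((1 + a, 1 + n + b, 1, pair_cost(p)))

total_cost, flow, steps = min_cost_flow(2 * n + 2, edges, source, sink)
# Assumption: accept a pair only if both directed edges carry flow
pairs = [(bases[a], bases[b]) for a in range(n) for b in range(a + 1, n)
         if flow[pair_edges[(a, b)]] and flow[pair_edges[(b, a)]]]
print(pairs, round(total_cost, 3), steps)
```

With these invented probabilities the sketch selects 5-50 and 41-56, matching the two pairs highlighted in panel a. The reported cost counts each selected pair twice because both directed edges carry flow. The loop starts from a zero flow and lowers the total cost with each augmentation, which parallels the iteration shown in panel b, although the step counts and cost values of the real algorithm are not reproduced here.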