Fig. 4: Workflow of LongBow.
From: Restoring flowcell type and basecaller configuration from FASTQ files of nanopore sequencing data

a The hierarchical framework of LongBow. The three layers of LongBow are shown at the top of the panel, and the boxes at the bottom indicate the features used by each layer. The first layer classifies basecallers based on the maximum QV of each sample. The second layer predicts the flowcell type (R10 or R9) and basecaller version based on the Bhattacharyya distance of the QV distribution. The third layer predicts whether the basecalling mode is FAST, HAC, or SUP based on QV autocorrelation. b The pipeline for LongBow evaluation. Basecalling was performed to convert the raw signals in FAST5/POD5 files into basecalled FASTQ files. During this process, the flowcell types and basecaller configurations were extracted and recorded as the ground truth. LongBow then predicted the flowcell types and basecaller configurations from the FASTQ files alone. These predictions were compared with the ground truth to evaluate LongBow’s accuracy. Three species icons were obtained from PhyloPic (https://www.phylopic.org): Gorilla gorilla by Margot Michaud and Arabidopsis thaliana var. thaliana by Jake Warner are under the Creative Commons Zero 1.0 Public Domain license (https://creativecommons.org/publicdomain/zero/1.0/), and Myoviridae by Ninjatacoshell is under Creative Commons CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). Icons for gDNA, cfDNA, and mtDNA were sourced from Servier Medical Art (https://smart.servier.com) under Creative Commons CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). The cpDNA icon, chloroplast by Torisan, was obtained from Openclipart (https://openclipart.org) under the Creative Commons Zero 1.0 Public Domain license (https://creativecommons.org/publicdomain/zero/1.0/).
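
The three features named in panel a (maximum QV, Bhattacharyya distance between QV distributions, and QV autocorrelation) can be illustrated with a minimal Python sketch. This is not LongBow's implementation. The function names, the Phred+33 quality encoding, the histogram range of 0 to 93, and the normalization of the autocorrelation are all assumptions made for illustration, and the caption does not say which reference distributions or lags LongBow uses.

```python
import math


def phred_scores(quality_line, offset=33):
    """Decode one FASTQ quality string into integer QVs (assumes Phred+33)."""
    return [ord(ch) - offset for ch in quality_line]


def max_qv(quality_lines):
    """Maximum QV observed across all reads of a sample (layer 1 feature)."""
    return max(max(phred_scores(line)) for line in quality_lines if line)


def qv_distribution(quality_lines, highest_qv=93):
    """Normalized histogram of QVs over all bases of a sample."""
    counts = [0] * (highest_qv + 1)
    for line in quality_lines:
        for qv in phred_scores(line):
            counts[qv] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no quality values found")
    return [count / total for count in counts]


def bhattacharyya_distance(p, q):
    """Bhattacharyya distance between two discrete distributions (layer 2 feature)."""
    coefficient = sum(math.sqrt(a * b) for a, b in zip(p, q))
    if coefficient == 0.0:
        return math.inf  # the distributions do not overlap at all
    # Clamp to 1.0 so rounding error cannot produce a negative distance.
    return -math.log(min(coefficient, 1.0))


def autocorrelation(scores, lag):
    """Autocorrelation of one read's QV series at a given lag (layer 3 feature)."""
    n = len(scores)
    if lag <= 0 or lag >= n:
        return 0.0
    mean = sum(scores) / n
    variance_sum = sum((s - mean) ** 2 for s in scores)
    if variance_sum == 0:
        return 0.0  # a constant series carries no correlation signal
    covariance_sum = sum(
        (scores[i] - mean) * (scores[i + lag] - mean) for i in range(n - lag)
    )
    return covariance_sum / variance_sum
```

In a sketch like this, a sample's QV histogram would be compared against a set of reference histograms with known flowcell type and basecaller version, and the closest reference (smallest distance) would give the predicted label. The reference set itself is hypothetical here.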