Fig. 1: The significance of flowcell types and basecaller configurations in ONT data analysis. | Nature Communications

From: Restoring flowcell type and basecaller configuration from FASTQ files of nanopore sequencing data

a Proportion of ONT data with raw FAST5/POD5 files in the SRA database.

b Proportion of ONT data with specific flowcell types or basecaller configurations in the metadata of SRA.

c Proportion of ONT data with specific flowcell types or basecaller configurations in the publications associated with 100 randomly selected SRA records.

d Performance of Clair3 for SNP calling using correct and incorrect flowcell types or basecaller configurations. The color scale indicates the F1-score. The x-axis represents the flowcell type and basecaller configuration used for basecalling the test data. The y-axis represents the parameter configuration or pretrained model of the evaluated algorithm. Clair3 uses the same model for Guppy 3 and Guppy 4, and another shared model for Guppy 5 and Guppy 6. Flowcell type and basecaller configuration are encoded as strings, where the number after ‘R’ denotes the major flowcell version, ‘G’ represents Guppy, ‘D’ represents Dorado, and the number following ‘G’ or ‘D’ is the major basecaller version. E(\(\Delta\mathrm{F1}\)) is the average F1-score loss when using random models.

e Performance of Clair3 for INDEL calling. The x-axis and y-axis are defined as in subfigure d.

f Performance of Shasta for genome assembly. The x-axis and y-axis are defined similarly to those in subfigure d, with an additional suffix that represents the basecalling mode. The color scale indicates the Yak QV of the assembled contigs, and the circle size indicates the decimal logarithm of the contig NG50. M means million bases. E(\(\Delta\mathrm{QV}\)) and E(\(\Delta\mathrm{NG50}\)) are the average Yak QV loss and NG50 loss when using random parameters.

g Performance of Medaka for genome polishing. The x-axis and y-axis are defined as in subfigure f. The color scale indicates the Yak QV shift of the assembled contigs after polishing. The encoding of flowcell type and basecaller configuration is identical to that in subfigure f. E(\(\Delta\mathrm{QV}\) shift) is the average \(\Delta\mathrm{QV}\) shift loss when using random parameters.
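
The label encoding described in the caption (a number after ‘R’ for the major flowcell version, then ‘G’ or ‘D’ for Guppy or Dorado, then the major basecaller version, with an optional basecalling-mode suffix in subfigures f and g) can be illustrated with a small parser. This is only a sketch: the exact label spelling is not given in the caption, so the shapes `R9G4` and `R10D0_sup`, the `_` or `-` separator before the suffix, and the names `Config` and `parse_label` are all assumptions made for illustration.

```python
import re
from typing import NamedTuple, Optional

# Assumed label shape: R<flowcell major><G|D><basecaller major>[_<mode>]
# The separator before the mode suffix is a guess, not taken from the figure.
_LABEL = re.compile(r"R(\d+)([GD])(\d+)(?:[_-]([A-Za-z]+))?")
_BASECALLERS = {"G": "Guppy", "D": "Dorado"}


class Config(NamedTuple):
    """Hypothetical container for one decoded axis label."""
    flowcell_major: int
    basecaller: str
    basecaller_major: int
    mode: Optional[str]  # basecalling-mode suffix, if the label has one


def parse_label(label: str) -> Config:
    """Decode a flowcell/basecaller label such as 'R9G4' or 'R10D0_sup'."""
    match = _LABEL.fullmatch(label)
    if match is None:
        raise ValueError(f"unrecognized label: {label!r}")
    flowcell, caller, version, mode = match.groups()
    return Config(int(flowcell), _BASECALLERS[caller], int(version), mode)
```

For example, under these assumptions `parse_label("R9G4")` would decode to flowcell major version 9 basecalled with Guppy major version 4, with no mode suffix.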
