Fig. 4: Benchmarking performance metrics (accuracy, precision, recall, F1 score, annotation rate) of VITAP compared with vConTACT2, CAT, PhaGCN2, and geNomad on newly released genomes (3705 viral reference genomes released after 2022.01).
From: VITAP: a high precision tool for DNA and RNA viral classification based on meta-omic data

All of these pipelines are based on the VMR-MSL37/NCBI RefSeq209 database. a Performance of taxonomic assignment at the family level. Boxplots show the five performance metrics for five fragment subsets of different lengths (1, 5, 10, 20, and 30 kb), which were generated from the VMR-MSL37/NCBI RefSeq209 database and comprise 191,596, 40,425, 21,857, 13,622, and 8844 fragments, respectively. Each boxplot shows the distribution, across 16 viral phyla, of the averages of the five classification metrics produced by each pipeline. The center line of each box indicates the median value of the taxonomic assignment metric across the 16 viral phyla. The bounds of the box represent the interquartile range, with the lower and upper bounds corresponding to the first and third quartiles, respectively. The whiskers extend to the lowest and highest values within 1.5 times the interquartile range of the box bounds. In each box plot, the boxes from left to right represent vConTACT2, CAT, PhaGCN2, geNomad, and VITAP, respectively; b Performance of taxonomic assignment at the genus level. In each box plot, the boxes from left to right represent vConTACT2, CAT, and VITAP, respectively; c For family-level taxonomic assignments, the F1 score and annotation rate of VITAP and the other four pipelines were compared across sequence lengths; d For genus-level taxonomic assignments, the F1 score and annotation rate of VITAP and the other four pipelines were compared across sequence lengths.
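
The box-plot elements described above (median, first and third quartiles, whiskers at 1.5 times the interquartile range) can be illustrated with a minimal Python sketch. The function name `box_stats` and the example scores are hypothetical and are not taken from the VITAP benchmark; the quartile interpolation method used by the authors' plotting software is not stated in the caption, so the inclusive method here is an assumption.

```python
from statistics import quantiles

def box_stats(values):
    """Return the box-plot summary of a list of per-phylum scores.

    Assumption: quartiles use the 'inclusive' interpolation method;
    the actual figure may use a different convention.
    """
    x = sorted(float(v) for v in values)
    q1, median, q3 = quantiles(x, n=4, method="inclusive")
    iqr = q3 - q1
    # Fences at 1.5 x IQR beyond the box bounds.
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme data points inside the fences.
    inside = [v for v in x if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
    }

# Hypothetical F1 scores for nine phyla (illustrative values only).
example_scores = [0.10, 0.80, 0.82, 0.85, 0.88, 0.90, 0.92, 0.95, 0.97]
stats = box_stats(example_scores)
```

In this illustrative input, the score 0.10 lies below the lower fence, so it would be drawn as an outlier point rather than being covered by the lower whisker.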