Fig. 2: The generalization ability of VITAP compared to vConTACT2 based on the VMR-MSL38 database.
From: VITAP: a high precision tool for DNA and RNA viral classification based on meta-omic data

The VMR-MSL38 database was divided into a training set comprising 70% of the data and a test set comprising 30%. Sequences in the test set were sliced into genome fragments of varying lengths (1, 5, 10, 20, and 30 kb), and taxonomic assignments were performed using VITAP and vConTACT2, which was built on the 70% training set. The dataset splitting, training, and taxonomic assignment steps were independently repeated ten times. Accuracy, precision, recall, F1 score, and annotation rate were used to characterize the tenfold cross-validation. The center lines of the boxes indicate the median values of the taxonomic assignment metrics across 17 viral phyla. The bounds of each box represent the interquartile range, with the lower and upper bounds corresponding to the first and third quartiles, respectively. The whiskers denote the lowest and highest values within 1.5 times the interquartile range. a The F1 scores of VITAP and vConTACT2 across ten independent family- and genus-level taxonomic assignments; b The annotation rates of VITAP and vConTACT2 across ten independent family- and genus-level taxonomic assignments; c Taxonomic assignment performance of VITAP and vConTACT2 at the family level, evaluated by accuracy, precision, recall, F1 score, and annotation rate. The boxplots represent the distribution of the averages of the five classification metrics for 17 different viral phyla produced by the two pipelines; d Taxonomic assignment performance of VITAP and vConTACT2 at the genus level, evaluated by accuracy, precision, recall, F1 score, and annotation rate. 
The boxplots represent the distribution of the averages of the five classification metrics for 17 different viral phyla produced by the two pipelines; e Taxonomic assignment performance of VITAP and vConTACT2 at the family level across 17 viral phyla, evaluated by F1 score and annotation rate; f Taxonomic assignment performance of VITAP and vConTACT2 at the genus level across 17 viral phyla, evaluated by F1 score and annotation rate.
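
The evaluation steps named in this caption (a 70/30 split, fragment slicing, per-taxon metrics, annotation rate, and the boxplot statistics) can be illustrated with a minimal Python sketch. It is not the VITAP or vConTACT2 implementation. All function names are hypothetical, the non-overlapping slicing scheme is an assumption, and the metric formulas are the common textbook definitions, which may differ from those used in the paper.

```python
import random
import statistics


def split_ids(ids, test_fraction=0.3, seed=0):
    """Randomly split sequence IDs into (train, test) lists."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def slice_fragments(sequence, length):
    """Cut a sequence into non-overlapping fragments of `length` bases.

    Assumption: the caption does not say how fragments were drawn.
    """
    return [sequence[i:i + length]
            for i in range(0, len(sequence) - length + 1, length)]


def taxon_metrics(truth, predicted, taxon):
    """Precision, recall and F1 for one taxon (None = unassigned)."""
    pairs = list(zip(truth, predicted))
    tp = sum(t == taxon and p == taxon for t, p in pairs)
    fp = sum(t != taxon and p == taxon for t, p in pairs)
    fn = sum(t == taxon and p != taxon for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def annotation_rate(predicted):
    """Fraction of fragments that received any taxonomic assignment."""
    return sum(p is not None for p in predicted) / len(predicted)


def box_stats(values):
    """Median, quartiles, and whiskers within 1.5 x IQR of the box."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low = min(v for v in values if v >= q1 - 1.5 * iqr)
    high = max(v for v in values if v <= q3 + 1.5 * iqr)
    return {"q1": q1, "median": median, "q3": q3,
            "whisker_low": low, "whisker_high": high}
```

In this sketch, a fragment whose true taxon is the one under evaluation but which receives no assignment counts as a false negative, so a low annotation rate lowers recall. Values outside the whisker range, such as 100 in `box_stats([1, 2, 3, 4, 100])`, would be drawn as outliers.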