Fig. 5: Evaluation of the final classification model’s performance at the order level.

a Decision boundaries of the order-level classifier, plotted in the space of two t-distributed stochastic neighbour embedding (t-SNE) components. Data dimensionality was reduced using PCA and t-SNE. All dots are colored by giant virus order. Training data are shown as circles with a black border. The sizes of the transparent dots (without border) indicate the predicted probability of class membership at each grid point across the feature space. b Normalized confusion matrix for classification at the order level. Rows correspond to the true taxonomic assignments of sequences, and columns to the predicted classifications. Diagonal values give the percentage of sequences in each class whose predicted classification matches the true taxonomy. Values were normalized by class size.
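
The normalization described for panel b can be illustrated with a minimal sketch, assuming that "normalized by class size" means each row (true class) is divided by the number of sequences in that class and expressed as a percentage. The function name, the placeholder class names, and the labels below are hypothetical and are not data or code from the study.

```python
from collections import Counter


def normalized_confusion_matrix(true_labels, predicted_labels, classes):
    """Return a matrix with rows = true class and columns = predicted class.

    Each row is divided by the number of sequences in that true class,
    so every non-empty row sums to 100 (percent).
    """
    # Count how often each (true, predicted) pair occurs.
    counts = Counter(zip(true_labels, predicted_labels))
    matrix = []
    for true_class in classes:
        row_total = sum(counts[(true_class, pred)] for pred in classes)
        row = [
            100.0 * counts[(true_class, pred)] / row_total if row_total else 0.0
            for pred in classes
        ]
        matrix.append(row)
    return matrix


# Hypothetical labels for illustration only (placeholder order names).
classes = ["order_A", "order_B", "order_C"]
true_labels = ["order_A"] * 4 + ["order_B"] * 2 + ["order_C"] * 4
predicted_labels = [
    "order_A", "order_A", "order_A", "order_B",  # one order_A sequence misassigned
    "order_B", "order_B",                        # all order_B sequences correct
    "order_C", "order_C", "order_A", "order_C",  # one order_C sequence misassigned
]

matrix = normalized_confusion_matrix(true_labels, predicted_labels, classes)
for name, row in zip(classes, matrix):
    print(name, row)
```

Under this reading, each diagonal entry is the per-class recall, which keeps small and large orders comparable regardless of how many sequences each contains.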