Table 1 Comparative analysis of ten viral sequence classifiers

From: Maximal viral information recovery from sequence data using VirMAP

| Pipeline | Mapped Reads (%) | Unique Calls | Viral Taxonomies | CCR (% of mapped) | Precision | Recall | F-score |
|---|---|---|---|---|---|---|---|
| VirMAP | 3,099,015 (50.1%) | 8 | 8 | 3,099,007 (99.999%) | 0.88 | 1.00 | 0.94 |
| Read classification | | | | | | | |
| FastViromeExplorer | 2,710,170 (43.85%) | 7 | 4 | 2,710,170 (100%) | 1.00 | 0.57 | 0.73 |
| VirusSeeker^a | 10,750 (0.174%) | 16 | 16 | 1,467 (13.65%) | 0.31 | 0.57 | 0.40 |
| Kaiju | 2,287,962 (37.02%) | 227 | 227 | 433,243 (18.94%) | 0.09 | 1.00 | 0.17 |
| ViromeScan | 663,185 (10.73%) | 427 | 354 | 614,016 (92.586%) | 0.01 | 0.57 | 0.02 |
| Contig classification | | | | | | | |
| drVM^b | 22,404,813 (362.54%) | 673 | 158 | 18,235,876 (81.39%) | 0.35 | 1.00 | 0.52 |
| VirusTAP | NA | 5 | 5 | NA | 0.60 | 0.43 | 0.50 |
| VIPIE^c | ~109,633 (~1.77%) | 13 | 11 | ~23,731 (~21.65%) | 0.30 | 0.71 | 0.42 |
| Standard method^d | 2,319,573 (37.53%) | 8 | 8 | 2,273,193 (98.03%) | 0.75 | 0.86 | 0.80 |
| Marker gene classification | | | | | | | |
| MetaPhlAn2 | NA | 5 | 5 | NA | 0.40 | 0.29 | 0.34 |

  1. The Viral Mock Community (VMC) dataset (6,180,026 trimmed reads) was processed through nine different pipelines for viral taxonomic classification. VMC was generated by combining purified preparations of seven different viruses (human adenovirus B, human adenovirus C, murine gammaherpesvirus 4, coxsackievirus B4 [strain Tuscany], echovirus E13 [strain Del Carmen], human poliovirus type 1 [strain Mahoney], and rotavirus A) in phosphate-buffered saline. Unique calls are the distinct database entries reported, while viral taxonomies are the unique calls reduced to NCBI taxonomic IDs. CCR: correctly classified reads. Precision (P): true positives / (true positives + false positives). Recall (R): true positives / (true positives + false negatives). F-score: harmonic mean of precision and recall, 2 × (P × R) / (P + R).
  2. ^a VirusSeeker applies filtering and clustering techniques to the reads, and final counts are derived from this reduced set.
  3. ^b drVM internally counts identical reads across multiple reported entries, so the total counts can exceed 100%.
  4. ^c VIPIE reports reads as counts per 100,000 reads; the approximate values shown are rescaled against the original read count.
  5. ^d The standard method employs a metagenomic assembly using MEGAHIT and a sequential top-hit mapping classification using BLASTn and BLASTx.
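
The precision, recall, and F-score definitions in the first note can be sketched in a few lines of Python. This is a minimal illustration, not the authors' evaluation code: the function names are hypothetical, and the true-positive and false-positive counts in the usage lines are assumptions chosen to match the VirMAP row (8 viral taxonomies reported against a 7-virus mock community), not values stated in the table.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of reported taxa that are truly present: TP / (TP + FP)."""
    total = tp + fp
    return tp / total if total else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of truly present taxa that were reported: TP / (TP + FN)."""
    total = tp + fn
    return tp / total if total else 0.0


def f_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall: 2 * (P * R) / (P + R)."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


if __name__ == "__main__":
    # Assumed counts for illustration only: 7 expected viruses found,
    # 1 extra taxonomy reported, none missed.
    print(precision(tp=7, fp=1))  # 0.875
    print(recall(tp=7, fn=0))     # 1.0

    # F-score recomputed from the rounded precision and recall in the table.
    print(round(f_score(0.88, 1.00), 2))  # VirMAP row
    print(round(f_score(0.75, 0.86), 2))  # Standard method row
```

Recomputing the F-score from the two-decimal precision and recall values reproduces the tabulated F-scores for these rows, which suggests the column was derived from the rounded values.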