Extended Data Fig. 2: Analysis of palmprint contigs recovered by Serratus.
From: Petabase-scale sequence alignment catalyses viral discovery

a, Length distribution of amino acid sequences in the rdrp1 query (upper histogram) and microassembled contigs (lower histogram; length = nucleotides/3). b, Distribution of Palmscan confidence scores. c, Observations of the 10 most frequent “super-motifs” (six well-conserved residues marked with asterisks) reported by Palmscan. d, Kernel distribution and mean (white cross) of coverage vs. abundance (number of runs in which a given palmprint is observed), showing that palmprints have similar underlying coverage distributions at all abundances. e, Preston plot of distinct palmprints vs. abundance, exhibiting similar, approximately log-log-linear relationships for totals at the end of each year from 2015 to 2019 and for final totals at approximately the end of 2020 (all). f, Preston plot of number of distinct palmprints observed in a given run vs. number of runs, with 95% confidence interval. g, Numbers of singletons and second observations (confirmations) at the end of each year, showing that the growth in singletons is matched by a comparable growth in confirmations. h, Kingdom predicted by VirSorter2 for RdRP+ contigs (by Palmscan) obtained by full assembly of 880 randomly chosen RdRP+ runs. i, Number of palmprints in each phylum assigned by taxonomy (known) or predicted (novel). j, Number of OTUs as a function of clustering identity.
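
The binning behind a Preston plot such as panel e can be illustrated with a minimal sketch. It assumes the input is simply a list of abundances (the number of runs in which each distinct palmprint is observed) and groups them into log2 "octaves". The function name `preston_octaves` and the toy data are hypothetical, and the simple floor-of-log2 binning used here is an assumption for illustration, not necessarily the procedure used to draw the figure.

```python
from collections import Counter


def preston_octaves(runs_per_palmprint):
    """Count distinct palmprints per log2 abundance octave.

    Octave k holds palmprints observed in 2**k to 2**(k + 1) - 1 runs.
    This is a simplified binning (an assumption of this sketch); Preston's
    original method splits boundary values between adjacent octaves.
    """
    counts = Counter()
    for abundance in runs_per_palmprint:
        if abundance < 1:
            raise ValueError("abundance must be a positive run count")
        # For a positive integer n, n.bit_length() - 1 equals floor(log2(n)).
        counts[int(abundance).bit_length() - 1] += 1
    return dict(sorted(counts.items()))


# Hypothetical toy data: number of runs in which each palmprint was observed.
toy_abundances = [1, 1, 1, 1, 2, 2, 3, 4, 5, 8, 17]
octaves = preston_octaves(toy_abundances)
for k, n in octaves.items():
    print(f"octave {k} ({2**k}-{2**(k + 1) - 1} runs): {n} palmprints")
```

Plotting the octave index against the logarithm of the per-octave count gives the kind of view in which an approximately log-log-linear relationship appears as a roughly straight line.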