Fig. 2: RNA-dependent RNA polymerase in the SRA.
From: Petabase-scale sequence alignment catalyses viral discovery

a, The RdRP palmprint is the protein sequence spanning three well-conserved sequence motifs (A, B and C), including the intervening variable regions, exemplified within the full-length poliovirus RdRP structure with essential aspartic acid residues marked (asterisks) (Protein Data Bank code: 1RA6; ref. 49). Conservation was calculated from the RdRP alignment in a previous study (ref. 19), trimmed to the poliovirus sequence; motif sequence logos are shown below. aa, amino acids. b, Per-phylum histogram of amino acid identity of novel sOTUs aligned to the NCBI non-redundant protein database. Extended Data Fig. 3c shows the per-order distribution. Inset, Preston plot and linear regression of palmprint abundances indicate that singleton palmprints (that is, those observed in exactly one run) occur within 95% confidence intervals of the value predicted by extrapolation from high-abundance palmprints (linear regression applied to log-transformed data), and this distribution is consistent through time (Extended Data Fig. 2). NA, not applicable; uncl, unclassified.
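
The singleton extrapolation described for the inset can be illustrated with a minimal sketch. This is not the authors' code. The binning scheme (log2 octaves, with octave k holding abundances from 2^k up to but not including 2^(k+1)), the function names and the `min_octave` cutoff defining "high-abundance" are all assumptions made for illustration, and the sketch returns only a point prediction, not the 95% confidence interval.

```python
import numpy as np

def preston_bins(abundances):
    """Count palmprints per log2 octave (assumed binning, see lead-in).

    Octave 0 holds singletons (abundance exactly 1).
    """
    # bit_length() - 1 equals floor(log2(x)) without floating-point rounding.
    octave = np.array([int(x).bit_length() - 1 for x in abundances])
    counts = np.bincount(octave)
    return np.arange(counts.size), counts

def predict_singletons(abundances, min_octave=1):
    """Fit log2(count) against octave on bins >= min_octave (hypothetical
    cutoff), then extrapolate the fitted line back to octave 0."""
    octaves, counts = preston_bins(abundances)
    keep = (octaves >= min_octave) & (counts > 0)
    slope, intercept = np.polyfit(octaves[keep], np.log2(counts[keep]), 1)
    predicted = 2.0 ** intercept   # value of the fitted line at octave 0
    observed = int(counts[0])      # number of singletons actually seen
    return predicted, observed

# Synthetic abundances (not data from the study): counts halve each octave.
abundances = np.repeat([1, 2, 4, 8, 16, 32], [1024, 512, 256, 128, 64, 32])
predicted, observed = predict_singletons(abundances)
```

Comparing `predicted` with `observed` mirrors the check in the inset: if singletons were dominated by artefacts, the observed count would be expected to exceed the value extrapolated from the more abundant palmprints.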