Figure 2
From: Discovering viral genomes in human metagenomic data by predicting unknown protein families

(a) Summary of the resulting 32 high-confidence families. The scatterplot summarizes basic statistics of the predicted proteins. The x- and y-axes show ORF length and RNAcode p-value, respectively, and the size of each dot is scaled by the number of sequences in the alignment. Protein families are colored according to their hits to the different databases. (b) Example of RNAcode output for predicted ORFan family 457. The multiple alignment for cluster 457 is shown with the RNAcode-predicted peptide sequence on top and the high-scoring segment highlighted in yellow. Codons colored green carry synonymous mutations, suggesting that selective pressure acts on those sites to preserve the amino acid. In contrast, pink or red codons carry non-synonymous mutations, which change the encoded amino acid.
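
The distinction between synonymous and non-synonymous codons in panel (b) can be illustrated with a minimal Python sketch. It is not RNAcode's actual scoring or coloring procedure, which is more elaborate; it only compares two aligned codons after translation. The function name `classify_codon_pair` is hypothetical, and the use of the standard genetic code (NCBI translation table 1) is an assumption that may not hold for every sequence in the data set.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
# Assumption: the standard code applies to the sequences being compared.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_pair(reference: str, aligned: str) -> str:
    """Classify an aligned codon relative to a reference codon.

    Returns "identical", "synonymous", "non-synonymous", or "unknown"
    (for gaps, ambiguous bases, or codons of the wrong length).
    """
    ref = reference.upper().replace("U", "T")
    alt = aligned.upper().replace("U", "T")
    if ref not in CODON_TABLE or alt not in CODON_TABLE:
        return "unknown"
    if ref == alt:
        return "identical"
    if CODON_TABLE[ref] == CODON_TABLE[alt]:
        return "synonymous"      # nucleotides differ, amino acid preserved
    return "non-synonymous"      # nucleotides differ, amino acid changes


if __name__ == "__main__":
    for ref, alt in [("CTT", "CTC"), ("GAT", "GAA"), ("ATG", "ATG"), ("AT-", "ATG")]:
        print(ref, alt, classify_codon_pair(ref, alt))
```

Under these assumptions, a column of the alignment dominated by "synonymous" calls corresponds to the green codons in the figure, while "non-synonymous" calls correspond to the pink or red ones.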