Figure 6: Distribution of lengths of ORFs potentially produced by transcriptional slippage (TS) in single-strand RNA viruses.

A simulation was performed to obtain the amino acid sequences of proteins potentially produced by TS. Genomic sequences and ORF positions of single-strand RNA viruses were obtained from the NCBI website (http://www.ncbi.nlm.nih.gov/). ORF sequences carrying a 1-base insertion or deletion at the G1–2A6+ (a) or G0A6+ (b) motif were generated and translated into amino acid sequences; the results are presented for potyviruses (upper panels) and other viruses (lower panels). Magenta bars, blue bars with horizontal stripes and green bars with vertical stripes represent the length distributions for the –1, +1 and original reading frames, respectively. To normalize for differences in the number of database entries among viral species, only one entry per site was retained for each virus: when multiple entries for a virus shared the same length and the same start and stop codon coordinates of the original ORF containing the motif, the entry with the longest predicted amino acid sequence following the simulated indel was selected. Note that in (b), the G0A6 motif is found only in lupine mosaic virus (GenBank/EMBL/DDBJ Accession No. NC_014898), which does not carry the G1–2A6+ motif (ref. 67).
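The simulation described above can be sketched as follows. This is a minimal illustration and not the script used to produce the figure. It assumes that sequences are given in the DNA alphabet as stored in GenBank, that translation uses the standard genetic code starting at the first base of the annotated ORF, and that the indel is a single A added to or removed from the A run. It also assumes that G1–2A6+ means one or two G residues followed by six or more A residues, and that G0A6+ means a run of six or more A residues not preceded by G. The names `G12_A6`, `G0_A6`, `translate` and `simulate_slippage` are hypothetical.

```python
import re
from itertools import product

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Assumed readings of the two motifs; group 1 captures the A run.
G12_A6 = re.compile(r"G{1,2}(A{6,})")    # G1–2A6+
G0_A6 = re.compile(r"(?<![GA])(A{6,})")  # G0A6+ (A run not preceded by G)


def translate(nt):
    """Translate from the first base and stop at the first stop codon."""
    protein = []
    for i in range(0, len(nt) - 2, 3):
        aa = CODON_TABLE.get(nt[i:i + 3], "X")  # X marks ambiguous codons
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def simulate_slippage(orf, motif):
    """Return the predicted proteins for a 1-base indel at each motif site."""
    results = []
    for m in motif.finditer(orf):
        a_start, a_end = m.span(1)
        inserted = orf[:a_end] + "A" + orf[a_end:]  # one extra A in the run
        deleted = orf[:a_end - 1] + orf[a_end:]     # one A removed from the run
        results.append({
            "a_run_start": a_start,
            "original": translate(orf),
            "insertion": translate(inserted),
            "deletion": translate(deleted),
        })
    return results


if __name__ == "__main__":
    # Toy ORF for illustration only; it is not a viral sequence.
    toy_orf = "ATGGCTGAAAAAAGCTTGACCTAA"
    for site in simulate_slippage(toy_orf, G12_A6):
        for key, value in site.items():
            length = f" ({len(value)} aa)" if isinstance(value, str) else ""
            print(f"{key}: {value}{length}")
```

In a real analysis the input should include the sequence downstream of the annotated stop codon, because a shifted frame can read through the original stop. The lengths of the three products per site are the quantities that a histogram like the one in this figure would summarize.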