Fig. 1: Highly conserved regions identified among multiple sarbecoviruses from bats and pangolins, and human SARS-CoV and SARS-CoV-2.

A multiple alignment of 82 sarbecoviruses was selected to capture diversity across the sarbecovirus proteome. The top part of the graph shows where the ORFs and protein-coding regions are located within the proteome. The most conserved regions are highlighted and labeled Regions 1, 2, and 3; Regions 1a and 1b were exceptionally conserved within Region 1. The Membrane (M) and Nucleocapsid (N) proteins are also labeled. PTE coverage of the aligned proteins by the SARS-CoV-2 Wuhan reference strain (accession NC_045512) is indicated in red; a value near 1 means that nearly all of the proteins in the alignment have a perfect match to the 9-amino-acid peptide (PTE) starting at a given position. Because 17% of the viruses were SARS-CoV-2, a score of <0.2 indicates that the PTE in the reference strain is commonly matched only among SARS-CoV-2 viruses. Entropy scores for each position in the full proteome alignment are shown in blue in the bottom graph; here, lower scores indicate greater conservation. Regions 1, 2, and 3 are indicated by the beige boxes in this plot, with the most highly conserved stretches highlighted in green and blue.
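
The two per-position scores described in this caption can be illustrated with a minimal Python sketch. It is not the pipeline used to produce the figure. It assumes that the alignment is a list of equal-length strings with `-` for gaps, that the reference row is itself part of the alignment, that a 9-mer is read as the next nine non-gap residues from a given column, and that entropy is Shannon entropy in bits with gaps counted as a symbol. The function names `kmer_from`, `pte_coverage`, and `column_entropy` are hypothetical.

```python
import math
from collections import Counter


def kmer_from(row, start, k=9):
    """Return the first k non-gap residues of an aligned row from column
    `start`, or None if fewer than k residues remain."""
    residues = [c for c in row[start:] if c != "-"]
    return "".join(residues[:k]) if len(residues) >= k else None


def pte_coverage(reference, alignment, k=9):
    """For each alignment column, the fraction of rows whose k-mer starting
    there exactly matches the reference k-mer (None where the reference has
    a gap or too few residues remain). Gap handling is an assumption."""
    scores = []
    for i in range(len(reference)):
        ref_kmer = None if reference[i] == "-" else kmer_from(reference, i, k)
        if ref_kmer is None:
            scores.append(None)
            continue
        hits = sum(kmer_from(row, i, k) == ref_kmer for row in alignment)
        scores.append(hits / len(alignment))
    return scores


def column_entropy(alignment):
    """Shannon entropy (bits) of each alignment column; 0.0 means the column
    is fully conserved. Gaps are counted as a symbol (an assumption)."""
    entropies = []
    for j in range(len(alignment[0])):
        counts = Counter(row[j] for row in alignment)
        total = sum(counts.values())
        entropies.append(
            sum(-(c / total) * math.log2(c / total) for c in counts.values())
        )
    return entropies
```

Under these assumptions, a coverage value near 1 at a position means that almost every row carries the reference 9-mer there, while an entropy near 0 means that the column itself is nearly invariant; the two scores are related but not interchangeable, since a single variable column lowers coverage for all nine windows that span it.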