Fig. 3: Mutations associated with chronic-like clades.
From: Using big sequencing data to identify chronic SARS-Coronavirus-2 infections

a Mutations that best explain model predictions of chronic-like clades. Each panel depicts the mutations for a given background variant; only those with an explainability score (y-axis) higher than 0.05 (75th percentile) are shown. The x-axis represents the average LIME R² score, which indicates prediction reliability. Dot size correlates with the number of samples in which the corresponding word (mutation) was observed. For clarity, only mutations with an explainability score higher than 0.1 are labelled, together with those scoring higher than 0.05 that recur across different background variants (see b). The full list of mutations is available in Supplementary Dataset 6. b Recurrent mutations associated with chronic-like clades across different background variants. All mutations with an explainability score higher than 0.05 that appeared in at least two variants were used for the intersection. Fitness is based on inferences derived from mutation abundance across all globally circulating sequences (ref. 43), and antibody escape is based on inferences from deep mutational scans using a variety of antibody types (refs. 44, 70). a, b Synonymous nucleotide substitutions are denoted by their genomic position, whereas amino-acid replacements are denoted by the protein name and the amino-acid replacement. c Distribution of fitness effect values for all n mutations with LIME scores higher than 0.05, shown for all genes in the genome (excluding accessory genes under little selection) in the upper panel and for the spike gene in the lower panel. The dashed line marks a fitness effect of zero (neutrality).
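
The selection rule described for panel b (keep mutations with an explainability score above 0.05, then retain those that pass this threshold in at least two background variants) can be illustrated with a minimal sketch. This is not the authors' code: the function name `recurrent_mutations`, the input layout (a dictionary of per-variant scores), and the variant and mutation labels are hypothetical placeholders, and the scores are invented purely to exercise the logic.

```python
from collections import defaultdict

SCORE_THRESHOLD = 0.05  # explainability-score cutoff stated in the caption
MIN_VARIANTS = 2        # "appeared in at least two variants"


def recurrent_mutations(scores_by_variant, threshold=SCORE_THRESHOLD,
                        min_variants=MIN_VARIANTS):
    """Return mutations exceeding `threshold` in at least `min_variants` variants.

    `scores_by_variant` maps a background-variant label to a dict of
    {mutation label: explainability score}. This layout is an assumption
    made for illustration only.
    """
    variants_per_mutation = defaultdict(set)
    for variant, scores in scores_by_variant.items():
        for mutation, score in scores.items():
            if score > threshold:  # strictly "higher than" the cutoff
                variants_per_mutation[mutation].add(variant)
    return {
        mutation: sorted(variants)
        for mutation, variants in variants_per_mutation.items()
        if len(variants) >= min_variants
    }


# Placeholder data: labels and scores are made up, not taken from the paper.
example = {
    "variant_1": {"mut_A": 0.12, "mut_B": 0.06, "mut_C": 0.01},
    "variant_2": {"mut_A": 0.07, "mut_B": 0.03, "mut_D": 0.20},
    "variant_3": {"mut_A": 0.02, "mut_D": 0.09},
}
recurrent = recurrent_mutations(example)
print(recurrent)
```

In this toy input, `mut_B` passes the threshold in only one variant and is therefore excluded, whereas `mut_A` and `mut_D` each pass in two variants and are retained.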