Fig. 4: Most variable sites cause fewer reversions in the Viridian tree than in the GenBank tree.
From: Addressing pandemic-wide systematic errors in the SARS-CoV-2 phylogeny

a, Plot showing how many positions in the genome have at least N reversions in each tree (Viridian in blue, GenBank in red). The Viridian curve drops faster, with fewer positions that create many reversions. b, Scatter plot comparing the count of reversion mutations found in the GenBank dataset and the Viridian dataset. Note that the point (0, 0) is slightly offset from the origin of the plot. Each point represents a position in the SARS-CoV-2 genome. Three points below the line y = x, where Viridian has particularly high numbers of reversions, are highlighted (labeled by genomic coordinates 22786, 8835 and 15521), as is one point (labeled 21987) where GenBank has a particularly high number. c, Blow-up of the dotted square from b, showing that the vast majority of variable sites in the genome lie above the line y = x.
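
The quantities plotted in panels a and c can be illustrated with a minimal sketch. It assumes per-position reversion counts are already available as dictionaries mapping genomic coordinate to count; the function names and the toy values below are hypothetical and are not taken from the paper. It also assumes that GenBank counts are on the y axis and Viridian counts on the x axis, which is inferred from the caption's statement that points below y = x are those where Viridian has more reversions.

```python
from collections import Counter


def positions_with_at_least(reversions_per_position, max_n=None):
    """Return {N: number of positions with >= N reversions} for N = 1..max_n.

    This is the cumulative count described for panel a.
    """
    counts = list(reversions_per_position.values())
    if max_n is None:
        max_n = max(counts, default=0)
    return {n: sum(1 for c in counts if c >= n) for n in range(1, max_n + 1)}


def side_of_diagonal(genbank, viridian):
    """Tally positions above, on, or below y = x.

    Assumption: y = GenBank count, x = Viridian count, so "above" means
    the position has more reversions in the GenBank tree.
    """
    tally = Counter()
    for pos in set(genbank) | set(viridian):
        g = genbank.get(pos, 0)
        v = viridian.get(pos, 0)
        if g > v:
            tally["above"] += 1
        elif g < v:
            tally["below"] += 1
        else:
            tally["on"] += 1
    return tally


# Toy, made-up counts for four hypothetical positions (not real data).
genbank_toy = {100: 5, 200: 3, 300: 1, 400: 0}
viridian_toy = {100: 2, 200: 1, 300: 1, 400: 4}

genbank_curve = positions_with_at_least(genbank_toy)
viridian_curve = positions_with_at_least(viridian_toy)
diagonal_tally = side_of_diagonal(genbank_toy, viridian_toy)
```

With real data, a curve that falls off faster in `viridian_curve` than in `genbank_curve`, together with a large "above" tally, would correspond to the pattern the caption describes.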