Figure 4
From: Generator based approach to analyze mutations in genomic datasets

Continuous Evolution. A variant or strain is defined by deterministic, stable mutations; evolution, however, is characterized by the randomness of transient mutations. These plots quantify this randomness for different SARS-CoV-2 clades by comparing SARS-CoV-2 sequences from the same clade or variant across different months. Each panel shows the validation accuracy of a logistic regression classifier built using state machine representations of SARS-CoV-2 sequences (with \(k = 4\), \(b = 1\), \(\beta = 0.5\)). (a) Sequences from January 2020 compared with those from February, March and April 2020, within GISAID clade L. (b) Sequences from May 2020 compared with those from June, July, August, September and October 2020, within GISAID clade G. (c) Sequences from April 2021 compared with those from May, June and July 2021, within the Delta variant. In all three panels, higher accuracy indicates greater separation between the sequence classes being compared.
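The comparison behind each panel can be sketched as follows: sequences from the reference month are labeled 0, sequences from a later month are labeled 1, a logistic regression is fit on their feature vectors, and accuracy is reported on a held-out split. This is a minimal sketch under stated assumptions, not the authors' implementation. The state machine representation (and the meaning of its \(b\) and \(\beta\) parameters) is not described in this caption, so the hypothetical `kmer_features` helper substitutes normalized k-mer frequencies, and `k = 4` is reused here only as a k-mer length, which may not match the paper's definition of \(k\). The helper names, the 30% validation split, the learning rate and the epoch count are all illustrative assumptions.

```python
import itertools
import math
import random

ALPHABET = "ACGT"


def kmer_features(seq, k=4):
    """HYPOTHETICAL stand-in for the paper's state machine representation:
    normalized frequencies of all 4**k nucleotide k-mers in `seq`."""
    kmers = ["".join(p) for p in itertools.product(ALPHABET, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    vec = [0.0] * len(kmers)
    total = 0
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])  # skips k-mers containing N or other symbols
        if j is not None:
            vec[j] += 1.0
            total += 1
    return [v / total for v in vec] if total else vec


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logreg(X, y, lr=1.0, epochs=300):
    """Fit logistic regression weights and bias by batch gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        gw = [0.0] * len(w)
        gb = 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log loss w.r.t. the logit is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                if xj:
                    gw[j] += err * xj
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else 0


def separation_accuracy(ref_seqs, other_seqs, k=4, val_frac=0.3, seed=0):
    """Validation accuracy of a classifier separating reference-month
    sequences (label 0) from later-month sequences (label 1)."""
    data = [(kmer_features(s, k), 0) for s in ref_seqs]
    data += [(kmer_features(s, k), 1) for s in other_seqs]
    random.Random(seed).shuffle(data)
    n_val = max(1, int(len(data) * val_frac))
    val, train = data[:n_val], data[n_val:]
    w, b = train_logreg([x for x, _ in train], [y for _, y in train])
    correct = sum(predict(w, b, x) == y for x, y in val)
    return correct / len(val)
```

To mirror one panel, `separation_accuracy(reference_month_seqs, later_month_seqs)` would be called once per later month. Accuracy near chance level means the classifier cannot tell the two months apart, while accuracy near 1.0 means the two sets of sequences are well separated.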