Figure 2 | Scientific Reports

Figure 2

From: Epidemiological associations with genomic variation in SARS-CoV-2

Subclade identification using CoV genome and gene variation in the population of samples in our study. Subclades were identified with omeClust, and the enrichment score of each metadata variable was measured as the overlap between the detected clades and that metadata, using normalized mutual information (NMI). (a) Regions of the CoV genome clustered by the z-scores of enrichment scores for the three metadata variables available for all lineages. Regions such as S, nsp6, N, nsp3, ORF1a, and ORF1ab are more similar to the whole genome in the clustering of scaled enrichment scores. (b) omeClust identifies communities of CoV lineages that are mostly explained by organism (NMI = 0.9). (c) The spike protein, which facilitates binding to and entry into host cells, carries variation among organisms similar to that of the whole CoV genome. (d) The nsp3 protein shows variation similar to that of the S protein and can be targeted as a protein with an important biological function. omeClust detects four communities (point colors) corresponding to the four known organisms (point shapes).
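The NMI-based enrichment score mentioned in the caption can be illustrated with a minimal sketch. This is not omeClust's own code: the function names (`entropy`, `mutual_information`, `nmi`), the choice to normalize by the arithmetic mean of the two entropies, and the toy labels are all assumptions made for illustration. The sketch compares a cluster assignment with a metadata label for the same samples and returns a value between 0 (no shared information) and 1 (the two partitions coincide).

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two labelings of the same samples."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
    return sum(
        (c / n) * math.log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def nmi(a, b):
    """Normalized mutual information.

    Assumption: normalization by the arithmetic mean of the two entropies;
    the normalization used in the study is not stated in the caption.
    """
    if len(a) != len(b):
        raise ValueError("both labelings must cover the same samples")
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0 and h_b == 0:
        return 1.0  # both labelings are a single group, trivially identical
    return mutual_information(a, b) / ((h_a + h_b) / 2)


# Toy labels, invented for illustration only (not data from the study):
# four clusters that line up exactly with four hypothetical host labels.
clusters = [0, 0, 1, 1, 2, 2, 3, 3]
hosts = ["host_A", "host_A", "host_B", "host_B",
         "host_C", "host_C", "host_D", "host_D"]
perfect_score = nmi(clusters, hosts)

# Two labelings that share no information.
independent_score = nmi([0, 0, 1, 1], [0, 1, 0, 1])
```

Under these assumptions, `perfect_score` equals 1 up to floating-point rounding and `independent_score` is 0. A value such as the NMI = 0.9 in panel (b) would then indicate clusters that largely, but not entirely, coincide with the organism label.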