Fig. 2: Relationship between methods for determining genomic clusters for E. faecium isolates from the major multilocus sequence type (ST1421, n = 134).

Core genome multilocus sequence type (cgMLST) clusters are based on single-linkage clustering with a pairwise allelic difference threshold of ≤25 alleles. Clusters for cgMLST core alignment (cgMLSTCA), cgMLST with cluster reference core alignments (cgMLSTCRCA), pairwise comparison using de novo references (PCDR) and split kmer analysis (SKA) are also generated by single-linkage clustering, using pairwise single nucleotide polymorphism (SNP) distance thresholds derived from intrapatient SNP diversity (cgMLSTCA: ≤44 SNPs, cgMLSTCRCA: ≤6 SNPs, PCDR: ≤10 SNPs, SKA: ≤7 SNPs). Node size represents the number of isolates in each cluster and is relative within each multilocus sequence type (MLST); the number of isolates in each cluster is shown in brackets. PCDR was used as the gold standard method, so the clustering patterns of the other methods should match that of PCDR as closely as possible.
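
As an illustration of how single-linkage clustering at a fixed pairwise threshold groups isolates, the following is a minimal Python sketch. It is not the pipeline used in the study: the function name `single_linkage_clusters`, the isolate labels and the SNP distances are hypothetical, and only the thresholds (≤10 SNPs for PCDR, ≤6 SNPs for cgMLSTCRCA) are taken from the caption.

```python
def single_linkage_clusters(isolates, pairwise, threshold):
    """Group isolates so that any pair with distance <= threshold is linked.

    isolates:  list of isolate labels (hypothetical names)
    pairwise:  dict mapping (isolate_a, isolate_b) -> distance (SNPs or alleles)
    threshold: inclusive cut-off, e.g. 10 for the PCDR threshold in the caption
    """
    # Union-find: every isolate starts as its own cluster.
    parent = {isolate: isolate for isolate in isolates}

    def find(x):
        # Follow parent pointers to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Single linkage: one pair within the threshold is enough to merge clusters.
    for (a, b), distance in pairwise.items():
        if distance <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for isolate in isolates:
        clusters.setdefault(find(isolate), []).append(isolate)

    # Largest clusters first, then alphabetical, for stable output.
    return sorted((sorted(c) for c in clusters.values()),
                  key=lambda c: (-len(c), c))


# Hypothetical SNP distances between four isolates (illustrative only).
isolates = ["A", "B", "C", "D"]
snp_distances = {
    ("A", "B"): 5, ("B", "C"): 9, ("A", "C"): 14,
    ("A", "D"): 40, ("B", "D"): 38, ("C", "D"): 45,
}

print(single_linkage_clusters(isolates, snp_distances, threshold=10))
print(single_linkage_clusters(isolates, snp_distances, threshold=6))
```

With these made-up distances, the ≤10 SNP threshold places A, B and C in one cluster even though A and C differ by 14 SNPs, because both are within the threshold of B. This chaining is characteristic of single linkage. At ≤6 SNPs only A and B remain linked.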