Fig. 4: Assessment of allele-based clustering at all possible threshold levels for S. enterica and comparison with traditional MLST and serotype.

a Composition of the S. enterica dataset used in this study in terms of serotype, compared with the datasets of previous studies (INNUENDO (ref. 22) and BioProject PRJEB20997142) and with the Enterobase database as of November 2021 (ref. 64). A GrapeTree (ref. 59) visualization of the MST obtained with the INNUENDO-like-INNUENDO99 pipeline is shown. Nodes (i.e., samples) are collapsed at the threshold with the highest congruence with serotype (1514 ADs for this pipeline) and colored according to the ST classification. b Number of partitions obtained by each pipeline at each possible distance threshold. c Clustering stability regions determined for each pipeline. To better distinguish the regions (represented by separate rectangular blocks), the blocks are vertically staggered, each starting on a different line. Distance thresholds (x axis) are presented on a log2 scale. d Barplot (top) with the number of samples of the most represented serotypes (≥50 samples) in the S. enterica dataset, with a swarmplot (bottom) indicating the AD threshold at which each pipeline clusters together all samples of each serotype. Source data are provided as a Source Data file.
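
As an illustration of the quantity plotted in panel b, the following minimal sketch counts the number of partitions (clusters) at every integer distance threshold. It assumes single-linkage clustering on a pairwise allelic-distance matrix, which may differ from what each pipeline actually implements; the function name `partitions_per_threshold` and the toy matrix are hypothetical and are not taken from the study.

```python
from itertools import combinations


def partitions_per_threshold(dist):
    """Count single-linkage clusters at every integer threshold.

    dist: symmetric matrix (list of lists) of pairwise allelic distances.
    Returns a dict {threshold: number of clusters} for thresholds from 0
    up to the largest pairwise distance. Two samples share a cluster at
    threshold t if a chain of links with distance <= t connects them.
    """
    n = len(dist)
    parent = list(range(n))  # union-find forest, one tree per cluster

    def find(i):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All sample pairs, sorted by increasing distance.
    edges = sorted((dist[i][j], i, j) for i, j in combinations(range(n), 2))
    max_d = edges[-1][0] if edges else 0

    counts = {}
    n_clusters = n
    k = 0
    for t in range(max_d + 1):
        # Merge every pair whose distance is within the current threshold.
        while k < len(edges) and edges[k][0] <= t:
            _, i, j = edges[k]
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj
                n_clusters -= 1
            k += 1
        counts[t] = n_clusters
    return counts


# Toy example with four hypothetical samples (distances in ADs).
toy = [
    [0, 1, 5, 6],
    [1, 0, 4, 6],
    [5, 4, 0, 2],
    [6, 6, 2, 0],
]
print(partitions_per_threshold(toy))
# {0: 4, 1: 3, 2: 2, 3: 2, 4: 1, 5: 1, 6: 1}
```

In this toy output, the partition count is unchanged between thresholds 2 and 3. Plateaus of this kind are the sort of pattern that the stability regions in panel c summarize, although the criterion used in the study to call a region stable is not reproduced here.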