Fig. 6: Overview of the sampling, sequencing, and analytical workflow for whole-genome sequencing (WGS) of isolates and for metagenomic data.

Samples were collected from human, animal, food, water, and environmental sources across four African countries between 2019 and 2023. WGS and metagenomic sequencing were performed separately, on different subsets of samples. For WGS, cultured isolates were subjected to DNA extraction, library preparation, and paired-end sequencing. Reads were quality-checked (QC), trimmed, assembled, and taxonomically classified, followed by multi-locus sequence typing (MLST), serotyping, and functional annotation. For metagenomics, DNA was extracted directly from raw samples, sequenced, and profiled taxonomically to determine community composition. Reads were classified using Kraken2/Bracken and KMA, and the resulting abundance estimates were centered log-ratio (CLR) transformed. Metagenomes were assembled using MEGAHIT and screened for pathogens of interest using QUAST. Contigs were binned into putative genomes using MetaBAT2, and CheckM2 was used to evaluate their completeness and contamination; only high-quality metagenome-assembled genomes (MAGs) were retained for downstream analysis. These MAGs were classified taxonomically using the Genome Taxonomy Database Toolkit (GTDB-Tk) and compared to cultured isolates using average nucleotide identity (ANI) and phylogenetic reconstruction. Together, MAGs and isolates were analyzed to explore pathogen diversity, abundance, and ecological structure across samples.
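
For readers unfamiliar with the CLR step named above, the transformation can be sketched as follows. Each taxon count in a sample is log-transformed and then centered on the mean of the logs, which is the log of the sample's geometric mean. This is a minimal illustration only, not the pipeline used in this study: the function name `clr`, the pseudocount of 0.5 used to handle zero counts, and the example counts are all assumptions introduced here.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's taxon counts.

    The pseudocount is an assumption made for this sketch so that
    zero counts have a defined logarithm; other zero-handling
    strategies exist and may have been used instead.
    """
    if not counts:
        raise ValueError("counts must be non-empty")
    # Log of each (count + pseudocount).
    logs = [math.log(c + pseudocount) for c in counts]
    # Mean of the logs equals the log of the geometric mean.
    mean_log = sum(logs) / len(logs)
    # Subtracting it centers the transformed values on zero.
    return [value - mean_log for value in logs]


# Hypothetical read counts for four taxa in a single sample.
sample = [120, 30, 0, 850]
values = clr(sample)
```

By construction the transformed values of a sample sum to zero, and the rank order of taxa within the sample is preserved, so a positive value indicates a taxon more abundant than the sample's geometric mean.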