Fig. 1: E. coli PanSelect probe design and applications.

a Probe design. i) All available, complete E. coli genomes were downloaded from RefSeq (n = 295) and the NCBI Pathogens database (n = 3141). ii) k-mer similarity was used to identify 1713 unique genome clusters. iii) Orthologous gene groups were constructed from these genome clusters with SynerClust⁷¹, filtered based on prevalence, and further clustered at 80% identity with UCLUST⁷². Probes of 60–75 bp with specificity to the resulting clusters were iv) generated with CATCH²⁵ and v) filtered based on homology to other common gut microbes (i.e., Bacteroidetes and Firmicutes). b E. coli PanSelect workflow. i) Sequencing libraries are constructed from complex communities containing low abundances of E. coli (red). ii) Short, biotinylated oligonucleotide probes are added to the sequencing library, where they bind complementary sequences. iii) Streptavidin pulldown is used to isolate bound target sequences from the library before iv) sequencing. c Applications of E. coli PanSelect. i) Enrichment of a four-strain mock community for initial benchmarking. ii) Analysis of E. coli gene content and transcription in stool from a clinical study of recurrent UTIs (rUTI). Created in BioRender. Young, M. (2024) BioRender.com/v86w986.