Introduction

Clinical microbiology laboratories are essential for exploring microbial traits pertinent to clinical services, including the diagnosis and treatment of infectious disease as well as infection prevention and control1. Traditionally, microbes are first identified by their color and shape under the microscope using staining techniques such as Gram staining2 and are then isolated in culture media3. However, recent advances in high-throughput sequencing have enabled the comparison of genomes from large numbers of bacterial isolates4,5. Whole-genome sequencing (WGS) offers unparalleled resolution compared with classical molecular typing methods such as multilocus sequence typing (MLST) and pulsed-field gel electrophoresis, rendering some of them redundant6,7,8. WGS can be used to effectively uncover molecular characteristics that are relevant to bacterial adaptation, epidemiology, and drug resistance7,9 across a variety of species, including Acinetobacter baumannii6,10,11, Escherichia coli12, Klebsiella pneumoniae13,14,15, Neisseria gonorrhoeae16, Pseudomonas aeruginosa17, Salmonella Typhimurium18, and Staphylococcus aureus19.

Bacterial WGS relies on various de novo assembly strategies that are categorized by the read length employed, namely, short-read only, short- and long-read hybrid, and long-read only assemblies. Short-read technologies, such as Illumina sequencing, are widely used in WGS. However, this approach often yields fragmented genome assemblies20,21. Additionally, the high operational and capital costs associated with Illumina sequencing limit its integration into routine laboratory workflows. Generating a perfect bacterial genome using Oxford Nanopore Technologies (ONT) and Illumina data involves a long-read-based de novo assembly, followed by serial polishing with high-accuracy short reads to correct the error-prone contigs20. This process requires considerable human expertise and comes with the additional cost of Illumina sequencing, making it impractical for laboratory generation of complete bacterial genomes. Recent advancements have made ONT long-read-only sequencing a promising alternative for pure cultures. This approach generates high-quality genomes15,22 through ligation-based library preparation following careful DNA extraction to obtain long fragments. However, the optimal DNA extraction method for WGS can vary depending on the target organism. For bacteria that are difficult to lyse, additional enzymes or sonication may be required to achieve sufficient cell wall disintegration20,23. Furthermore, although ONT rapid barcoding kits facilitate swift library preparation, the impact of this rapid barcoding in combination with mechanical bead beating during DNA extraction on genome quality remains unclear and requires further investigation.

In this study, we propose the RapidONT workflow, a streamlined approach for bacterial WGS. RapidONT employs a universal DNA extraction protocol that uses bead beating for efficient cell disruption, regardless of the organism’s Gram staining characteristics. This protocol aligns with the prioritization of nine pathogens by the World Health Organization (WHO)24. Subsequently, the ONT Rapid Barcoding Kit is used to facilitate rapid and simplified library construction, minimizing wet lab procedures. The de novo assembly process is straightforward, involves minimal manual intervention, and is followed by polishing to improve genome assembly quality. Following assembly, the web-based platform Pathogenwatch is used for species identification, molecular typing, and antimicrobial resistance (AMR) prediction. This platform requires only basic bioinformatics skills. In evaluation experiments, the results from the RapidONT workflow were compared against complete reference genomes.

Materials and methods

Bacterial isolates

The WHO 2017 global priority pathogens list categorizes 12 bacterial groups into three tiers: critical, high, and medium priority24. This study employed isolates representing 9 of these 12 priority groups, sourced from the Taiwan Surveillance of Antimicrobial Resistance (TSAR) program25, which maintains a collection of clinical isolates from hospitals across Taiwan. These isolates were collected as part of routine surveillance efforts to monitor antimicrobial resistance trends, and their use in this research was carried out in accordance with relevant guidelines and regulations. A total of 90 isolates, encompassing both gram-negative and gram-positive pathogens, were included in the analysis. Information on the isolation source and collection date is available in Table S1.

The analysis included the following gram-negative bacterial species: Acinetobacter baumannii complex (ABC) (10 isolates: 7 A. baumannii, 2 Acinetobacter pittii, and 1 Acinetobacter nosocomialis), Enterobacteriaceae (10 isolates: 8 Klebsiella pneumoniae, 1 Klebsiella quasipneumoniae, and 1 Escherichia coli), Haemophilus influenzae (10 isolates), Neisseria gonorrhoeae (10 isolates), Pseudomonas aeruginosa (10 isolates), and Salmonella spp. (10 isolates). The following gram-positive bacterial species were analyzed: Enterococcus faecium (10 isolates), Staphylococcus aureus (10 isolates), and Streptococcus pneumoniae (10 isolates).

Gram stain–dependent and universal bacterial DNA extractions

For the Gram stain–dependent DNA extraction approach, bacterial isolates were categorized based on their Gram stain characteristics. Subsequently, DNA extraction was performed using the DNeasy Blood & Tissue Kit (Qiagen GmbH, Cat. No. 69504, Hilden, Germany) following the manufacturer’s instructions. For gram-positive isolates, in the enzyme treatment step, lysostaphin at a concentration of 0.26 mg/mL was used instead of lysozyme for S. aureus. In the universal DNA extraction approach, DNA from all bacterial isolates, regardless of their Gram stain characteristics, was extracted using the DNeasy UltraClean Microbial Kit (Qiagen GmbH, Cat. No. 12224-50, Hilden, Germany) with automation on the QIAcube Connect instrument (Qiagen GmbH, Cat. No. 9002864, Hilden, Germany) according to the manufacturer’s instructions. Bacterial lysis was achieved using a Precellys 24 tissue homogenizer (Bertin Technologies SAS, France). The homogenizer settings were optimized for efficient lysis: 6800 rpm for 30 s, followed by a 60-s pause, repeated over three cycles.

Illumina shotgun and ONT nanopore sequencing

Illumina shotgun sequencing

Illumina shotgun sequencing was conducted using the iSeq100 platform (Illumina). Library preparation involved the use of the Nextera DNA Flex kit and barcode kit (Illumina), according to established protocols. Sequencing was partially conducted by the Microbial Genomics Core Laboratories at the Institutes of Genomics and Bioinformatics, National Chung Hsing University.

ONT nanopore sequencing

For de novo whole-genome assembly, DNA extracted using the Gram stain-dependent methods was sequenced using various ONT kits, primarily the ligation sequencing kit and native barcoding kit, following previously described protocols11,26.

DNA extracted using the universal method was subjected to library construction with the ONT Rapid Barcoding Kit 96 (SQK-RBK110.96) following the manufacturer’s instructions, except that the DNA input was modified to 200 ng per sample, along with 1.3 µL of rapid barcode (RB01-96). The DNA library containing a maximum of 24 barcoded samples was loaded onto a MinION SpotON flowcell R9.4.1 (FLO-MIN106), and sequencing was executed using MinKNOW v22.08.9 with live basecalling, demultiplexing, and barcode trimming on Guppy v6.2.11 in high accuracy mode (HAC), targeting a minimum duration of 18 h.

Following the sequencing run, the flowcell was flushed using the flowcell wash kit (EXP-WSH004) as per ONT’s instructions. Before a second DNA library was loaded, the flowcell was primed as instructed in the SQK-RBK110.96 manual.

De novo genome assembly

For Nanopore sequencing reads derived from the Gram stain-dependent DNA extraction method, two distinct assembly strategies were implemented: long-read only and the short- and long-read hybrid assembly. The long-read only assemblies were generated using CCBGpipe26, whereas Unicycler v0.4.927 was employed in the hybrid method. To enhance the quality of the assembled genomes, a multistep polishing process was implemented. First, the long-read only assemblies were polished using ONT Medaka v1.4.3 twice and Homopolish v0.3.3 once28. This was followed by short-read polishing with Polypolish v0.5.029. The completeness of the genome assemblies was manually assessed using a combination of tools, including web-based BLASTN30, minimap2 v2.24-4112231, and Tablet v1.21.02.0832, to facilitate the integration of genome assemblies obtained using both methods. The strategy for genome assembly integration is detailed in the Supplementary Methods. The integrated complete genome assemblies were subjected to dual polishing by using the POLCA tool within MaSuRCA-4.1.033. This step generated circular, complete genomes that served as reference sequences for subsequent comparative analyses, including the evaluation of assemblies by using QUAST 5.2.034 and genomic characteristic predictions.

For Nanopore sequencing reads derived from the universal DNA extraction method, de novo assembly was conducted using Flye v2.9.235, resulting in draft genomes. These draft genomes were further polished using Medaka v1.9.1 with the r941_min_hac_g507 model and Homopolish v0.3.3. All software was run using 20 cores (Intel Xeon Gold 6230 2.10-GHz) with 1 TB of memory. By using the complete genomes obtained using the conventional workflow as references, QUAST was employed to assess the four types of assemblies generated during the RapidONT workflow: Flye-assembled (draft) assemblies, Medaka-only polished (Medaka) assemblies, Homopolish-only polished (Homopolish, or homo for short) assemblies, and assemblies polished with Medaka followed by Homopolish (m + h, i.e., polish).
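The assembly and polishing steps described above can be sketched as three command lines per sample. The following is only an illustration: the exact commands used in this study are not listed here, so the flags follow the tools' documented command-line interfaces as best understood, and the file names (barcode01.fastq.gz, bacteria.msh, R9.4.pkl) and the choice of Flye's --nano-raw read type are assumptions.

```python
from pathlib import Path
import shlex


def rapidont_commands(reads, outdir, threads=20,
                      medaka_model="r941_min_hac_g507",
                      sketch="bacteria.msh",          # assumed Homopolish sketch file
                      homopolish_model="R9.4.pkl"):   # assumed Homopolish model name
    """Build (but do not run) Flye, Medaka and Homopolish commands for one sample."""
    out = Path(outdir)
    draft = out / "flye" / "assembly.fasta"        # Flye writes assembly.fasta
    polished = out / "medaka" / "consensus.fasta"  # Medaka writes consensus.fasta
    flye = ["flye", "--nano-raw", str(reads),      # read-type flag is an assumption
            "--out-dir", str(out / "flye"), "--threads", str(threads)]
    medaka = ["medaka_consensus", "-i", str(reads), "-d", str(draft),
              "-o", str(out / "medaka"), "-t", str(threads), "-m", medaka_model]
    homopolish = ["homopolish", "polish", "-a", str(polished),
                  "-s", sketch, "-m", homopolish_model,
                  "-o", str(out / "homopolish")]
    return [flye, medaka, homopolish]


if __name__ == "__main__":
    # Print the commands for a hypothetical demultiplexed sample.
    for cmd in rapidont_commands("barcode01.fastq.gz", "barcode01"):
        print(shlex.join(cmd))
```

Returning argument lists rather than executing them keeps the sketch inspectable; each list could be passed to subprocess.run once the tools are installed and the flags have been checked against the installed versions.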

Species identification, MLST, and AMR prediction

Following assembly and quality assessment, the complete reference genomes along with draft assemblies generated by Flye and the various polished versions (Medaka, Homopolish, and m + h) were uploaded to Pathogenwatch (https://pathogen.watch). This web-based platform facilitates rapid in silico characterization of bacterial genomes, including species identification, MLST, and AMR prediction36,37,38. For A. pittii and A. nosocomialis, mlst 2.23.0 was employed for MLST prediction39. AMR prediction for E. coli, H. influenzae, P. aeruginosa, and Salmonella enterica was performed using ResFinder 4.3.240.

Results

Complete genomes obtained through the conventional workflow, serving as references

To evaluate the completeness of genome assemblies generated by the RapidONT workflow, complete reference genomes were established using a conventional workflow (Fig. 1). One S. aureus sample with a previously published complete genome sequence on NCBI (accession number: CP006838)41 was included, while DNA extraction for the remaining 89 samples was performed based on their Gram staining characteristics. Hybrid assemblies (combining short and long reads) were generated using Unicycler, while long-read only assemblies were processed with CCBGpipe and subsequently polished through multiple steps (Table S2). After manual curation, circular genomes were successfully assembled for all 89 samples and further polished using Illumina reads with the POLCA tool, resulting in an average quality value (QV) of 93.03 (range, 46.42–100, Table S3). Detailed protocols are provided in Supplementary Methods. These high-quality complete genome assemblies represent a robust reference standard, serving as the baseline for evaluating and comparing genome assemblies obtained using the RapidONT workflow.

Fig. 1

Comparison of bacterial whole-genome analysis: conventional versus RapidONT workflows. Conventional workflow: Obtaining complete genomes using the conventional workflow involves the use of different DNA extraction protocols and extensive manual interventions during the de novo assembly process. A total of 89 sequencing datasets, generated by both ONT and Illumina platforms, were de novo assembled using CCBGpipe and Unicycler, respectively. The CCBGpipe assemblies underwent a serial polishing process, and the polished contigs were used as references for completeness evaluation. Unique circular contigs generated by Unicycler, following plasmid validation, were integrated into the CCBGpipe results. The final complete circular genomes were polished twice with POLCA. The RapidONT workflow is a simplified approach. It uses a universal DNA extraction protocol, followed by de novo assembly and a basic assembly polishing process, minimizing the need for manual intervention.

Nanopore sequencing of 90 bacterial isolates by using two MinION flow cells

The RapidONT workflow comprises three key steps: universal DNA extraction, transposase-based rapid library construction, and de novo assembly using Flye (Fig. 1). To mimic real-world practices, DNA samples from various species were sequenced in a single run, eliminating the wait to accumulate enough samples of the same species. The universal DNA extraction method, regardless of Gram staining, consistently yielded DNA that met both the quantity and quality requirements for direct use in ONT sequencing. The quality assessment data of extracted DNA are summarized in Table S1. Four sequencing runs were conducted using two MinION flow cells. The first flow cell generated passed sequencing data of 5.88 Gbp (Run1, barcode1-24) and 4.19 Gbp (Run2, barcode57-80) from 24 samples each, whereas the second flow cell produced 3.96 Gbp (Run3, barcode1-24) and 2.29 Gbp (Run4, barcode57-74) of passed sequences from 24 and 18 samples, respectively. To address potential carryover contamination despite a previous report indicating minimal risk from reusing a washed flow cell42, samples in the reused run were assigned distinct barcodes for efficient filtering of remnant sequences. Following overnight sequencing (~20 h), Flye generated draft assemblies for the targeted nine WHO priority pathogens, averaging 30 min per run (Supplementary Methods). These assemblies were polished once with Medaka and once with Homopolish, without requiring additional short-read data, to produce polished assemblies. The polishing process required less than 30 min per run. Post-assembly analysis was primarily performed using the web-based platform Pathogenwatch (Fig. 1). The streamlined RapidONT workflow produced genome assemblies that were subsequently compared with the complete reference genomes generated by the conventional workflow. This comparison allowed for the evaluation of assembly completeness and accuracy, demonstrating the effectiveness of this simplified process for rapid bacterial genome sequencing and analysis.
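The barcode assignment that separates the first and second use of each flow cell can be illustrated with a small filter. This is a minimal sketch under stated assumptions: the barcode ranges are those reported for Run1 to Run4, whereas the function name, the read identifiers, and the dictionary input format are hypothetical and do not correspond to any ONT software output.

```python
# Barcode ranges per run as reported above (Python ranges exclude the end value).
RUN_BARCODES = {
    "Run1": range(1, 25),   # flow cell 1, first use: barcode1-24
    "Run2": range(57, 81),  # flow cell 1, reused:    barcode57-80
    "Run3": range(1, 25),   # flow cell 2, first use: barcode1-24
    "Run4": range(57, 75),  # flow cell 2, reused:    barcode57-74
}


def keep_expected_reads(read_barcodes, run):
    """Keep only reads whose barcode was actually loaded in this run.

    read_barcodes maps a read identifier to its demultiplexed barcode number
    (None for unclassified reads). On a reused flow cell, reads carrying a
    barcode from the previous library are treated as carryover and dropped.
    """
    expected = set(RUN_BARCODES[run])
    return {read_id: barcode for read_id, barcode in read_barcodes.items()
            if barcode in expected}


if __name__ == "__main__":
    demo = {"read_a": 3, "read_b": 60, "read_c": None}  # hypothetical reads
    print(keep_expected_reads(demo, "Run2"))
```

Because the reused run uses a barcode set disjoint from the first run, any read demultiplexed to barcode1-24 during Run2 or Run4 can be discarded without inspecting its sequence.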

Sequencing performance and quality assessment of the RapidONT workflow

Out of the 90 samples, only one Salmonella spp. isolate with the lowest sequencing yield (8 Mbp) failed to produce a Flye assembly. However, during the polishing process, 12 additional Flye assemblies encountered difficulties because they contained numerous fragmented contigs and Homopolish was unable to identify 20 homologous sequences. These 13 samples (identified as hollow circles in Fig. 2A and crosses in Fig. 2B) included nine Salmonella spp. isolates, three S. aureus strains, and one P. aeruginosa strain. As depicted by the sequence summary in Fig. 2A, gram-negative strains generally yielded a greater number of sequence bases (average yield, 244 Mbp for 50 samples) than gram-positive strains (average yield, 69 Mbp for 20 S. aureus and S. pneumoniae samples), except for Salmonella spp. and E. faecium. The throughput ranged from a high of 497 Mbp in A. pittii to a low of 8 Mbp in Salmonella spp. (Table S1). Among specific species, A. baumannii (average, 356 Mbp) and H. influenzae (average, 264 Mbp) exhibited significantly higher sequence yields (tested by Wilcoxon rank-sum test with p < 0.05), whereas Salmonella spp. (average, 50 Mbp), S. aureus (average, 53 Mbp) and S. pneumoniae (average, 86 Mbp) had significantly lower yields. The average read length across all 90 samples was 4,230 bp. However, significant differences were observed between species. P. aeruginosa (6,668 bp), H. influenzae (5,215 bp), and Enterobacteriaceae (5,069 bp) had significantly longer average read lengths, whereas Salmonella spp. (985 bp) and S. pneumoniae (2,838 bp) had shorter average read lengths than other species (Fig. 2A). These findings indicate that RapidONT effectively generated draft genome assemblies for eight pathogenic groups, except for Salmonella spp. Additionally, a minimum read length of 2,000 bp is recommended to ensure high-quality draft genome assemblies.

Fig. 2

Sequencing and assembly metrics for different pathogen groups. (A) Sequencing data presenting key metrics such as total sequence bases, average read length, sequencing depth, and the percentage of the genome covered by at least fivefold sequencing reads. (B) Assembly metrics, including the number of contigs, length of the largest contig, assembly coverage rate, and assembly accuracy compared with the reference genome. Different colors indicate different species within each group: A. pittii is depicted in red, A. nosocomialis in light blue, K. quasipneumoniae in orange, and E. coli in purple. Hollow circles in (A) and crosses in (B) indicate samples that only have draft assemblies generated using Flye.

Bacterial genomes vary in size (e.g., > 7 Mbp for P. aeruginosa vs. < 2 Mbp for H. influenzae). Therefore, sequence depth is a more reliable metric than sequence bases for determining sequencing needs. Through the normalization of sequence depth to genome size, we determined a minimum coverage of 20-fold to be necessary for genome assemblies with sufficient completeness (assembly coverage > 98.4%) and high accuracy (> 99.5%; Table S1). In all, 23 samples, including the 13 samples with only draft assemblies, fell below this 20-fold cutoff. Nevertheless, seven draft assemblies maintained adequate contiguity, with their N50 being equal to the maximum contig length (Table S1). The fivefold coverage rate reveals isolates with insufficient sequencing data, which is particularly evident in fragmented assemblies (hollow circles in Fig. 2A). As expected, the 77 polished assemblies achieved a high average coverage rate (99.81%) and accuracy (99.94%). Notably, 31 samples achieved complete assembly coverage (100%), and 55 samples had accuracy exceeding 99.98%. However, the accuracy check revealed an outlier among the nine pathogens (Fig. 2B): N. gonorrhoeae had a lower average accuracy of 99.64%. After N. gonorrhoeae was excluded, the average accuracy for the remaining species reached 99.98% (Table S1).
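The two metrics used in this paragraph, sequencing depth normalized to genome size and N50, can be computed as in the following sketch. It assumes depth is simply total sequenced bases divided by genome size; the function names and the example values are illustrative and are not taken from Table S1.

```python
def sequencing_depth(total_bases, genome_size):
    """Average fold coverage: sequenced bases divided by genome size (both in bp)."""
    return total_bases / genome_size


def n50(contig_lengths):
    """Length of the contig at which the running sum of the sorted contig
    lengths (longest first) reaches half of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs supplied")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Hypothetical sample: 100 Mbp of reads for a 5 Mbp genome.
    depth = sequencing_depth(100_000_000, 5_000_000)
    print(f"depth = {depth:.1f}-fold, meets 20-fold cutoff: {depth >= 20}")

    # Hypothetical draft assembly: one chromosome-sized contig plus fragments.
    contigs = [4_000_000, 500_000, 100_000]
    print("N50 equals the largest contig:", n50(contigs) == max(contigs))
```

When a single contig holds at least half of the assembled bases, N50 equals the maximum contig length, which is the contiguity criterion applied to the draft assemblies above.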

The assembly accuracy was also calculated based on the number of mismatches and indels reported in the QUAST results (Table S4). Accuracy was compared across three polishing strategies: Medaka-only polishing (Medaka), Homopolish-only polishing (Homopolish), and Medaka followed by Homopolish polishing (m + h). As shown in Fig. 3, the m + h polishing method was most effective in reducing both mismatches and indels across all nine pathogenic groups compared with the Medaka and Homopolish methods. Medaka effectively mitigated mismatch errors, especially for H. influenzae and P. aeruginosa, whereas Homopolish was better at handling indels, particularly in Enterobacteriaceae and P. aeruginosa assemblies. The m + h approach offered a balanced strategy for correcting both mismatches and indels across different bacterial genomes, resulting in highly accurate contigs. Therefore, the m + h polishing method was adopted in this study for the RapidONT workflow.
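One way to turn mismatch and indel counts into an accuracy percentage, and to relate that percentage to a Phred-scaled quality value, is sketched below. The exact formula used in this study is not stated here, so this is an assumption: each mismatch and each indel is counted as one error over the aligned length, and the function names and example counts are hypothetical.

```python
import math


def accuracy_percent(mismatches, indels, aligned_bases):
    """Percentage of aligned bases free of mismatch or indel errors.

    Assumption: every mismatch and every indel counts as a single error.
    """
    errors = mismatches + indels
    return 100.0 * (1.0 - errors / aligned_bases)


def phred_qv(accuracy_pct):
    """Phred-scaled quality value: QV = -10 * log10(error rate)."""
    error_rate = 1.0 - accuracy_pct / 100.0
    if error_rate <= 0:
        return math.inf  # a perfect match has no finite QV
    return -10.0 * math.log10(error_rate)


if __name__ == "__main__":
    # Hypothetical polished assembly: 30 mismatches and 20 indels over 5 Mbp.
    acc = accuracy_percent(30, 20, 5_000_000)
    print(f"accuracy = {acc:.4f}%, QV = {phred_qv(acc):.1f}")
```

Under this convention, an accuracy of 99.99% corresponds to QV 40 and 99.9% to QV 30, which gives a convenient scale for comparing draft and polished assemblies.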

Fig. 3

Comparison of the effectiveness of various polishing methods in reducing mismatches and indels within draft assemblies for different pathogenic groups. Stacked bar graphs represent the total number of errors identified using QUAST across draft assemblies generated using Flye and polished assemblies processed using different methods, namely Medaka only (medaka), Homopolish only (homopolish), and Medaka followed by Homopolish (m + h), compared with the reference genomes.

Comparisons of species, MLST, and resistance analyses between the conventional and RapidONT workflows

Complete genomes, draft assemblies, and polished assemblies were uploaded to the Pathogenwatch platform for species identification, MLST typing, and predictions of AMR and virulence factors. The platform successfully identified two isolates of A. pittii and one of A. nosocomialis in the A. baumannii group, as well as one isolate each of Klebsiella quasipneumoniae and Escherichia coli in the Enterobacteriaceae group, and Salmonella enterica in the Salmonella spp. group across all genomes obtained from both workflows. While MLST and AMR predictions were available for most species (data in Table S5), the platform returned incomplete results for a few others. Specifically, MLST data were unavailable for A. pittii and A. nosocomialis, and AMR predictions were unavailable for E. coli, H. influenzae, P. aeruginosa, and S. enterica. To address these gaps, manual analysis was conducted to ensure comprehensive characterization. The MLST analysis of the 90 complete genomes detected an undetermined allele each in the following samples: N. gonorrhoeae (abcZ in VP84), S. aureus (aroE in VP77), and P. aeruginosa (acsA in VP27), as shown in Table S5. These findings suggest potential novel allelic variations not yet included in the current PubMLST reference database. For the 77 isolates with polished assemblies, we compared MLST predictions from both draft and polished assemblies against the reference genome with 536 alleles. Figure 4 presents the prediction agreement rates for MLST allele and AMR prediction (excluding one S. enterica sample with seven identical predictions). Perfect predictions (100% agreement) were achieved using polished assemblies (m + h) generated using the RapidONT workflow for four pathogen groups: A. baumannii, H. influenzae, E. faecium, and S. pneumoniae (Fig. 4). However, three alleles in three K. pneumoniae strains (gapA in VP42, tonB in VP43, and gapA in VP66), one allele in P. aeruginosa (guaA in VP28), and four alleles in one S. aureus strain (gmk, pta, tpi, and yqiL in VP77) were not accurately predicted (Table S5), which may be attributed to the low sequence depth. The sequence depths of these five strains in sequential order were 10, 13, 51, 21, and 9 (Table S1). Notably, despite having sufficient sequence depths (60–141), N. gonorrhoeae displayed limited prediction success (2–3 alleles per isolate).
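The allele-level agreement rate compared here can be expressed as the fraction of reference MLST alleles reproduced by an assembly. The sketch below is illustrative only: the function name is hypothetical, the allele numbers are invented for the example, and the seven loci are those of the standard S. aureus MLST scheme.

```python
def allele_agreement(reference, predicted):
    """Count reference loci whose predicted allele matches the reference call.

    Both arguments map locus name to allele identifier. A locus missing from
    the prediction, or called differently, counts as a disagreement.
    Returns (matching loci, total reference loci).
    """
    matches = sum(1 for locus, allele in reference.items()
                  if predicted.get(locus) == allele)
    return matches, len(reference)


if __name__ == "__main__":
    # Invented allele numbers for one isolate (not data from Table S5).
    reference = {"arcC": 1, "aroE": 4, "glpF": 1, "gmk": 4,
                 "pta": 12, "tpi": 1, "yqiL": 10}
    draft = {"arcC": 1, "aroE": 4, "glpF": 1, "gmk": None,
             "pta": 3, "tpi": None, "yqiL": 7}
    matched, total = allele_agreement(reference, draft)
    print(f"{matched}/{total} alleles agree ({100 * matched / total:.1f}%)")
```

Summing the matched and total counts over all isolates in a pathogen group gives a group-level agreement rate of the kind plotted per assembly type.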

Fig. 4

Agreement rates of genomic predictions of multilocus sequence typing alleles and antimicrobial resistance genes between reference genomes and various assemblies across different pathogenic groups.

For AMR predictions, only the number of resistant classes identified by Pathogenwatch or ResFinder was considered. The comparative analysis revealed that A. baumannii, H. influenzae, P. aeruginosa, S. aureus, and S. pneumoniae (in Fig. 4), along with the single S. enterica sample, exhibited perfect agreement (100%) between draft and polished assemblies compared with complete genomes. Although polishing increased the number of AMR predictions for E. faecium (from 107 in Flye-assembled genomes to 119, 123, and 123 in Medaka, Homopolish, and m + h polished assemblies), two AMR predictions remained absent in two K. pneumoniae strains (i.e., VP42 and VP68 in Table S5). Similarly, N. gonorrhoeae exhibited a substantial improvement in AMR predictions after polishing (from 11 to 29), but seven strains still lacked a total of 12 AMR predictions compared with the reference genome. Interestingly, strains VP81, VP86, and VP89 achieved perfect prediction accuracy with polished assemblies. These findings suggest that the RapidONT workflow offers accurate AMR predictions from polished assemblies, comparable to those derived from complete genomes obtained through the conventional workflow.

Discussion

Modern laboratories prioritize the development of rapid, scalable, and cost-effective workflows to minimize labor-intensive tasks. Despite rapid advances in WGS, its routine adoption has been hindered by challenges such as cost, potential inaccuracy, complex library construction processes, the need for a variety of DNA extraction protocols, and the need for expertise in postsequencing data analysis. The high capital expenditure associated with Illumina sequencing systems often restricts WGS to well-funded research institutions and biotechnology companies. Conversely, ONT offers the MinION Mk1B starter pack for a very low price of US$1999, broadening access to whole-genome analysis, particularly in lower income regions43. Moreover, ONT actively invests in the development of rapid sequencing chemistries, simplifying library construction and enabling the generation of multiplexed sequencing libraries within an hour. This study presents a streamlined workflow designed to challenge the traditional “longer-is-better” paradigm in long-read sequencing. Although the universal bead-beating cell lysing protocol for DNA extraction and transposase-based library construction contribute to DNA fragmentation, the resulting average read length of 4,230 bp across 90 samples remains advantageous for de novo assembly compared with the shorter reads generated by Illumina systems. Additionally, the benefits of a universal DNA extraction protocol, coupled with the rapid turnaround time for library construction, outweigh the drawbacks associated with DNA fragmentation. This facilitates the broader adoption of WGS in various laboratory settings. We aimed to simplify experimental protocols by using a universal DNA extraction method and a rapid barcoding kit. To maximize the capacity of a single flow cell while mitigating potential barcode imbalance issues, we selected samples from nine WHO priority pathogen groups, with a maximum of 24 samples processed per run. To further reduce costs (below US$20 per sample), flushed flow cells were reused once, enabling the sequencing of a maximum of 48 samples per flow cell. The costs associated with the RapidONT workflow are outlined in Table S6. Notably, the Illumina website indicates that a price point of approximately US$20 per sample is currently only achievable for targeted gene expression profiling (US$23) or 16S metagenomic sequencing (US$18) (https://sapac.illumina.com/science/technology/next-generation-sequencing/beginners/ngs-cost.html, accessed May 13, 2024). Furthermore, the average cost of bacterial genome sequencing by using Illumina technology typically ranges from US$50 to US$10044. Our findings demonstrate the effectiveness of the RapidONT workflow for a diverse range of human pathogens, with the exception of Salmonella spp., which had problems with short read length and low throughput, and N. gonorrhoeae, which had low assembly accuracy.

Our study findings indicate that a minimum average read length of 2,000 bp and a minimum sequencing depth of 20-fold are crucial to obtaining accurate genomic information necessary for species identification, MLST, and AMR prediction in most bacterial pathogens. The employed universal DNA extraction method was particularly effective for gram-positive E. faecium, yielding a high number of long reads. However, the method was less effective for other gram-positive species, S. aureus and S. pneumoniae, resulting in a reduced quantity and shorter length of extracted DNA. Conversely, this method was successful for all tested gram-negative species, with the exception of Salmonella spp. This disparity highlights the need for future investigation into alternative bead-beating settings or DNA extraction kits to optimize DNA yield and quality across diverse bacterial taxa. For example, we have successfully integrated the DNeasy PowerLyzer Microbial Kit (Qiagen, Hilden, Germany) into the RapidONT workflow, enabling the sequencing of over 250 clinical isolates of the Enterobacter cloacae complex by using only six MinION flow cells. Furthermore, we are currently utilizing R10.4.1 flow cells to sequence over 100 K. pneumoniae isolates with the RapidONT workflow. While each flow cell processes approximately 40 genomes of this size (~5 Mb), the resulting genome quality has shown significant improvement, achieving Q-scores of Q40 or higher.

This study employed Flye, Medaka, and Homopolish for de novo assembly and polishing of bacterial genomes, enabling accurate genomic predictions for MLST and AMR. However, a significant outlier was observed in the sequencing accuracy of N. gonorrhoeae. Although a relatively high sequencing depth (averaging 97) was achieved, the estimated prepolishing sequence accuracy for N. gonorrhoeae contigs assembled with CCBGpipe (ONT long-read only) was notably lower (99.7%) compared with other pathogens (99.9%) assembled using Polypolish (Table S2). This discrepancy persisted even after re-basecalling the N. gonorrhoeae samples in the super accuracy mode (SUP) and subsequent polishing with Medaka and Homopolish, suggesting potential limitations extending beyond the basecalling and polishing methods. It remains unclear whether the observed basecalling errors stem from the higher levels of methylation in N. gonorrhoeae45 or from the inherent variability associated with its diploid nature46,47. Although Illumina short-read alignments indicated a homozygous pattern, this approach might not fully capture the complexity introduced by methylation. Our analysis using the DeepMod2 tool48 revealed that these sequence variations were indeed located at methylated sites. This finding supports the hypothesis that methylation could play a role in the observed basecalling errors. These findings highlight the complex genomic structure of N. gonorrhoeae. The heightened levels of methylation, or the variability in methylation due to its diploid nature, might have influenced the accuracy of basecalling algorithms in this study.

Pathogenwatch provides a user-friendly interface for uploading genome assemblies and conducting various analyses, including species determination, MLST, cgMLST, and genomic predictions of AMR and virulence. While its specialized capabilities for specific bacterial species are a notable strength, some limitations exist. One such limitation is that its functionalities are targeted to particular species. For example, MLST analysis for Acinetobacter species, such as A. pittii and A. nosocomialis, was omitted despite using the same database as A. baumannii. Similarly, although carbapenemase-producing E. coli poses a global public health threat, Pathogenwatch lacks AMR predictions for this specific species. Other bioinformatics resources such as the Center for Genomic Epidemiology offer noncommercial, free online tools such as ResFinder for users with limited bioinformatics expertise. These resources bridge the gap between wet- and dry-lab practices in WGS applications. The development of a cohesive and user-friendly informatics system could enable public health organizations to more effectively share data, processes, and analyses, promoting reproducibility, accessibility, and verifiability in pathogen genomic studies while maintaining individual autonomy49.

Conclusion

This study introduces the RapidONT workflow, a streamlined approach for WGS of various bacterial pathogens that combines a universal DNA extraction protocol, ONT transposase-based rapid barcoding library construction, de novo assembly, and basic assembly polishing. This workflow is coupled with a user-friendly web-based platform for species identification, molecular typing, and AMR prediction. As a minimum benchmark, accurate genomic characterization requires a 20-fold sequencing depth with an average read length of at least 2,000 bp. This threshold allows for the termination of a sequencing run and flow cell reuse once sufficient data have been acquired, thereby reducing sequencing costs. The proposed RapidONT workflow streamlines experimental protocols and minimizes costs associated with WGS. The effectiveness of this approach in generating accurate genomic characteristics for crucial pathogen groups has been validated.