Introduction

Pseudomonas aeruginosa is an opportunistic Gram-negative pathogen and a common cause of hospital-acquired infections, accounting for nearly 10% of all nosocomial infections1. This pathogen is particularly alarming due to its high mortality and morbidity rates, reaching up to 40% in immunocompromised patients2. P. aeruginosa’s adaptability to adverse environments and production of various virulence factors enable it to cause diverse infections, including in patients with cystic fibrosis, pulmonary disease, sepsis, trauma, and burn wounds1,3. Among its virulence factors, biofilm production is notably problematic as it acts as a barrier, enhancing the bacterium’s survival and persistence in harsh environments4. Additionally, P. aeruginosa’s rapid mutation and adaptation confer resistance to a broad range of antibiotics, posing a significant challenge for therapeutic treatments5. In 2017, the World Health Organization (WHO) listed P. aeruginosa as a high-threat antibiotic-resistant pathogen, underscoring the need for global attention6.

This situation demands innovative and effective methods for early epidemiological identification and genotyping, which can facilitate more precise antibiotic therapy and prevent subsequent colonization and chronic infection by P. aeruginosa in medical settings. Despite advances in molecular techniques, current methods for identifying and classifying P. aeruginosa are limited. The gold standard techniques for genotyping P. aeruginosa include pulsed-field gel electrophoresis of SpeI-restricted genomic DNA (PFGE-SpeI)7, single nucleotide polymorphism (SNP) analysis8, and core genome multilocus sequence typing (cg-MLST)9. Although these methods provide valuable information, their widespread application is hampered by high costs, labor intensity, technical complexity, and insufficient discriminatory power at the strain level10,11. In addition, the high genomic similarity among P. aeruginosa strains, even those from different niches, further complicates strain-level genotyping12,13.

Recently, whole-genome sequencing combined with bioinformatics analysis has become increasingly prevalent for analyzing microbial diversity and tracing pathogen origins14,15,16. As of April 5, 2024, 30,192 whole-genome sequences of P. aeruginosa were registered in GenBank. Nevertheless, to the best of our knowledge, no study has yet utilized the publicly accessible genome sequences of P. aeruginosa for genomic, evolutionary, and diversity analysis. As a result, many deposited P. aeruginosa genome sequences remain as text files without comprehensive genomic analysis.

In this context, our study aims to conduct a comparative genomic analysis of multiple P. aeruginosa strains available in the GenBank database, with the goal of improving conventional methods for measuring genetic diversity without relying on whole-genome sequencing (WGS) techniques. Initially, we downloaded and compared the genome data of P. aeruginosa strains registered in GenBank to identify intraspecific genes. We discovered that some protein-encoding genes differ in their single-amino-acid repeats (SARs) of histidine (H). SARs, or homopolymeric amino acid tracts, are more abundant in eukaryotes than in prokaryotes and are present in nearly one-fifth of all human gene products17. Despite their wide distribution, the main function and evolution of SARs remain unclear. Generally, SARs are dynamic elements present in various patterns and locations throughout the genome, varying from strain to strain, making them unique to each organism12,18. Accordingly, several studies have demonstrated the use of these unique SARs for bacterial strain identification19,20.

Following this paradigm, we aim to differentiate strains of P. aeruginosa by analyzing variations in their SAR patterns. We also created gene maps for genomic segments of strains exhibiting SAR variations. Notably, strains with similar gene map structures showed consistent SAR patterns, leading to uniform genetic profiles. The findings from this research will offer valuable insights for the development of novel, highly discriminative, and easy-to-manage genetic markers based on SARs and gene cluster patterns. This approach has the potential to reduce both the cost and time required for conventional strain typing methods.

Materials and methods

Sample collection

The in-house P. aeruginosa strains used in the study were isolated from agricultural produce distributed in South Korea. The contaminated produce samples were collected and used in accordance with the “Detection methods of foodborne pathogens in agricultural produces” of the Rural Development Administration (RDA, Jeonju, South Korea, 2021), which is responsible for the management of foodborne pathogen-contaminated crops. The sources of the produce samples are listed in Table 1.

Table 1 Detailed information on the in-house Pseudomonas aeruginosa strains.

Bacterial cultures and DNA extraction

The P. aeruginosa strains used were isolated from peppers, carrots, radishes, and Chinese cabbage obtained from markets in South Korea. The surface of the agricultural produce was cut into 25 g pieces, which were placed into sterilized bags containing 250 mL of buffered peptone water. The samples were then incubated at 37 ℃ for 24 h. After incubation, 100 µl of each macerated sample was streaked onto Pseudomonas Isolation Agar (PIA) and incubated at 37 ℃ for a further 24 h21. Next, a single colony of P. aeruginosa was selected by comparing the obtained colonies with the positive control and then streaked repeatedly to obtain a pure culture. To confirm that the colonies were P. aeruginosa, specific DNA regions were amplified in a C1000 Touch Thermal Cycler (Bio-Rad, Inc., Germany) with PA431CF/R primers22.

The bacterial cultures were stored with 15% glycerol in a 1:1 ratio at -80 ℃ for subsequent DNA extraction. P. aeruginosa isolates were cultured in Luria-Bertani Broth at 37 ℃ and 180 rpm for 24 h, and the cell pellets of the culture were used to extract genomic DNA using a DNA extraction Kit (IncloneTM Genomic Plus DNA Prep Kit, Inclone Biotech, Inc., Korea), according to the manufacturer’s instructions.

Primer design

Two sets of primers were designed to directly analyze amino acid tandem repeats in the P. aeruginosa isolates using PrimerSelect software (Version 15.1.0 (155); DNASTAR Inc., Madison, WI, USA). From the nucleotide sequences of the genes encoding the CDF family iron/cobalt efflux transporter AitP and the protease modulator HflC, forward and reverse primers were designed more than 40 bp outside the target region. The primers cdfa_F/R (5′–GGCCGGGCTGATGCTCTACCAAT–3′, 5′–CCACCGGCGCGTCCATCTGC–3′) for the CDF family iron/cobalt efflux transporter AitP protein and hflc_F/R (5′–CACTCCCACTCGCACAGCCACCAC–3′, 5′–TCCAGCGCCGAGCCGACGAA–3′) for the protease modulator HflC were selected. The PCR mixture was as follows: 25 ng of genomic DNA, 10 pM of each primer, 1.25 units of Taq Polymerase (Takara Bio, Inc., Japan), 1X Ex Taq Buffer, and 0.25 mM of dNTPs in a total volume of 50 µl. PCR was performed according to a previous study with slight modifications, as follows: predenaturation at 95 ℃ for 3 min; 40 cycles of 95 ℃ for 60 s, 68 ℃ for 30 s, and 72 ℃ for 30 s; and final extension at 72 ℃ for 10 min (Choi et al., 2013). The final products were 443 bp and 488 bp for the AitP and HflC genes, respectively. The amplicons of strains BS0003PA, BS0009PA, BS0018PA, BS0022PA, and BS0028PA were cloned and sequenced (Macrogen, Daejeon, South Korea) to determine amino acid repeats.
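As a quick sanity check on the primer sequences listed above, their length, GC content, and reverse complement can be tabulated in a few lines. The sketch below is only illustrative and is not the PrimerSelect workflow used in this study; the helper names (`reverse_complement`, `gc_fraction`) are hypothetical.

```python
# Primer sequences copied from the text (5' -> 3').
PRIMERS = {
    "cdfa_F": "GGCCGGGCTGATGCTCTACCAAT",
    "cdfa_R": "CCACCGGCGCGTCCATCTGC",
    "hflc_F": "CACTCCCACTCGCACAGCCACCAC",
    "hflc_R": "TCCAGCGCCGAGCCGACGAA",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in the sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


# Print one summary line per primer.
for name, seq in PRIMERS.items():
    print(f"{name}: {len(seq)} nt, GC {gc_fraction(seq):.0%}, "
          f"revcomp {reverse_complement(seq)}")
```

The reverse complement of each reverse primer is the sequence one would search for on the forward strand when locating the amplicon in a genome assembly.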

Targeted region sequencing

The PCR amplification products obtained using cdfa_F/R and hflc_F/R primers were cloned and then sequenced. Following the sequencing on the Illumina platform, the reads were assembled into contiguous sequences using SPAdes version 3.1.3.0. Gene annotation and prediction were processed using Prokka version 1.13. The annotation process included the prediction of coding sequences (CDS), ribosomal RNA (rRNA) genes, transfer RNA (tRNA) genes, and other regulatory elements in the genome23.

Specific gene genomic analyses

We collected genomic FASTA files of the coding DNA sequences (CDSs) of the 816 P. aeruginosa strains from the NCBI bacterial genome database (https://www.ncbi.nlm.nih.gov/genome/). We then confirmed the identity of the obtained sequences by verifying their taxonomy and Average Nucleotide Identity results against the NCBI Genome Assembly database. All collected sequences were compared to mine for species-specific genes, focusing on those with more than five amino acid differences per gene. The nucleotide and amino acid sequences of these genes were compared across P. aeruginosa strains using the MUSCLE module of the Megalign Pro software (Version 15.1.0 (155); DNASTAR Inc., Madison, WI, USA). As a result, we identified variations in amino acid repeats within these genes among the P. aeruginosa strains. For the comparative analysis, the MLST sequence types of strains exhibiting variations were also retrieved using the software developed by Torsten Seemann that relies on PubMLST.
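The screening criterion of more than five amino acid differences per gene can be illustrated with a minimal sketch. It assumes the two protein sequences have already been aligned (for example with MUSCLE) and that a gap column counts as one difference; that counting rule, the function names, and the toy sequences are assumptions for illustration and are not taken from the study.

```python
def count_differences(aligned_a: str, aligned_b: str) -> int:
    """Count alignment columns where the two sequences differ.

    Assumption: gap characters ('-') count as differences.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for x, y in zip(aligned_a, aligned_b) if x != y)


def is_candidate_gene(aligned_a: str, aligned_b: str, threshold: int = 5) -> bool:
    """Flag a gene whose alleles differ at more than `threshold` positions."""
    return count_differences(aligned_a, aligned_b) > threshold


# Toy aligned fragments (hypothetical): a 12-unit and a 6-unit histidine tract.
ref = "MS" + "H" * 12 + "AL"
alt = "MS" + "H" * 6 + "-" * 6 + "AL"

print(count_differences(ref, alt), is_candidate_gene(ref, alt))
```

Because a change in repeat length shows up as a block of gap columns, genes carrying variable histidine tracts pass such a filter easily.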

Structural analysis of the genome mapping

To identify and compare nucleotide sequences, a BLAST (Basic Local Alignment Search Tool) search was conducted using the NCBI (National Center for Biotechnology Information) online platform. We compared and analyzed the CDS regions located from the TAXI family TRAP transporter solute-binding subunit to the DNA polymerase III subunit chi and Rsx family gene regions in P. aeruginosa strains. The following specific settings were used to adapt the search according to our research requirements. The “Nucleotide collection (nr/nt)” was selected from the Standard databases available on the BLAST site. We excluded sequences from uncultured or environmental samples. The search was optimized for “somewhat similar sequences (blastn)” to improve the specificity and relevance of our query results.

Results and discussion

Understanding the genetic variability among different strains of P. aeruginosa is crucial for identifying distinct strains and gaining deeper insights into their local and global transmission patterns and distribution. This knowledge aids in the development and implementation of new, targeted (strain-specific) treatment strategies to combat this pathogen in clinical settings. Various methods, including PFGE-SpeI, SNP analysis, and cg-MLST, have been employed to study the phylogeny and genomic traits of different P. aeruginosa strains7,8,9. However, these methods face challenges in examining the complete genome in a single, reliable experiment. Thus, establishing a trustworthy and high-resolution protocol for differentiating closely related strains used in commercial or scientific applications is essential. In this study, we introduce a novel genotyping technique that enables the differentiation of P. aeruginosa at the strain level by analyzing variations in single-histidine repeats (SHRs) and gene structure (gene mapping).

Initially, we obtained a total of 816 distinct P. aeruginosa sequences from NCBI and examined them thoroughly by comparative genomic analysis to identify genotypic markers. Detailed information about these strains is available in the NCBI database (https://www.ncbi.nlm.nih.gov/genome/browse#!/prokaryotes/Pseudomonas%20aeruginosa). Our results revealed significant variations in two protein-encoding genes, the CDF family iron/cobalt efflux transporter AitP and protease modulator HflC, in a subset of the strains (44 of 816). These variations were primarily due to a trinucleotide tandem repeat (i.e., 5′-CAC or CAT-3′) that encodes histidine residues (H).

Classification based on single amino acid repeat (SAR) patterns

Single amino acid repeats (SARs) are short sequences in proteins that consist of one amino acid repeated several times in a row12,18. Due to high variability and genetic diversity among bacterial populations, these repeats are commonly used in bacterial identification, allowing unique genetic fingerprints to be generated24. Hence, a multitude of researchers has employed these SARs for the purpose of bacterial identification25,26,27. For instance, Subirana et al. (2021)26 found that tandem repeats in Bacillus exhibit distinct characteristics, including length, sequence composition, and distribution throughout the genome. These features can be used for taxonomic classification and molecular typing of Bacillus species. Another study demonstrated the utility of internet-based resources for developing and analyzing tandem repeats-based bacterial strain typing in E. coli25. Following this paradigm, our study presents unique genotyping characteristics based on SHRs to facilitate the identification of P. aeruginosa strains at the strain level.

CDF family iron/cobalt efflux transporter AitP protein

The gene encoding the CDF family iron/cobalt efflux transporter AitP (WP_023088961.1 of strain DHS01) displayed notable differences in size in 44 of the 816 strains. Information on these 44 strains is listed in Table 2. The variations were primarily attributed to a trinucleotide tandem repeat (i.e., 5′–CAC or CAT–3′) that encodes histidine residues (H), resulting in a single-histidine repeat (SHR) spanning from 6 to 19 units. Based on the histidine repeat number, the strains were categorized and named CDF 6, 8, 9, 10, 12, 14, 15, and 19 (Table 2). The largest group, comprising 11 strains, contained 6 units of SHRs, suggesting a high prevalence. On the other hand, only one strain each had 9 and 19 units of SHRs, showing a low prevalence. Interestingly, histidine (H) was randomly substituted with aspartic acid (D) in most strains, regardless of the SHR length. These changes resulted from the substitution of G for C at the first position of the histidine codons (CAU, CAC), which converts them to the aspartic acid codons GAU and GAC.
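The repeat counting and the codon change described above can be sketched as follows. This is a minimal illustration under stated assumptions: the coding sequence is read in frame from its first base, the DNA alphabet is used (CAT rather than CAU), and only uninterrupted histidine runs are counted. How the study scored tracts interrupted by aspartic acid is not specified here, and all function names and toy sequences are hypothetical.

```python
import re

HIS_CODONS = {"CAC", "CAT"}  # histidine codons (DNA alphabet)
ASP_CODONS = {"GAC", "GAT"}  # aspartic acid codons (DNA alphabet)


def longest_his_run(protein: str) -> int:
    """Length of the longest uninterrupted run of 'H' in a protein sequence."""
    return max((len(run) for run in re.findall(r"H+", protein)), default=0)


def longest_his_codon_run(cds: str) -> int:
    """Longest run of consecutive in-frame CAC/CAT codons in a CDS."""
    cds = cds.upper()
    best = current = 0
    for i in range(0, len(cds) - len(cds) % 3, 3):
        current = current + 1 if cds[i:i + 3] in HIS_CODONS else 0
        best = max(best, current)
    return best


def first_base_c_to_g(codon: str) -> str:
    """Apply the C->G change at codon position 1 (CAC->GAC, CAT->GAT)."""
    if codon not in HIS_CODONS:
        raise ValueError("expected a histidine codon")
    return "G" + codon[1:]


# Toy CDS fragment (hypothetical): start codon, six His codons, one Ala codon.
toy_cds = "ATG" + "CAC" * 4 + "CAT" * 2 + "GCC"
print(longest_his_codon_run(toy_cds))          # repeat units in the toy fragment
print(first_base_c_to_g("CAC") in ASP_CODONS)  # His codon becomes an Asp codon
```

Counting at the codon level rather than on the translated protein keeps the CAC/CAT composition of each tract available for further comparison.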

Table 2 Detailed information on the Pseudomonas aeruginosa strains displaying variation in single-histidine repeats (SHRs).

In addition, to directly assess variation, the SHR gene from five different in-house P. aeruginosa strains (viz., BS0003PA, BS0009PA, BS0018PA, BS0022PA, and BS0028PA) was amplified using the PCR primers cdfa_F/R, resulting in 443 bp amplicons. Following purification and sequencing of the amplicons, the strains were found to have SHRs of 6, 12, and 13 units (Table 1). Interestingly, histidine residues were replaced with aspartic acid in the in-house strains as well. However, we could not determine the exact reason for this substitution in the SHRs of the CDF gene.

Protease modulator HflC protein

To gain further insights, the strains showing SHR variation in the AitP protein were subjected to additional analysis. Interestingly, all of these strains also exhibited another distinguishable SHR in the gene encoding the HflC protein. The SHRs ranged from 5 to 21 units, and the strains were categorized by repeat number into groups 5, 9, 11, 13, 15, 19, and 21 (Table 2). Notably, the SHR patterns observed in the CDF gene correlated strongly with the SHR patterns of the HflC protein. This led to the hypothesis that the two genes, CDF and HflC, might have similar or shared functions in the respective P. aeruginosa strains.

Origin of SHRs groups

The origin of each SHR group was analyzed. The results revealed that the isolates of the SHR groups originated from 12 countries, as listed in Table 2. Notably, the highest number of SHR groups, totaling 12, was found among isolates from China. Conversely, the isolates from India, the Netherlands, Japan, and Sweden had the lowest number of SHR groups, with only 1 group each.

The isolates from South Korea fell within a narrow range of SHR groups, namely CDF 6, CDF 11, HflC 19, and HflC 21 (Table 2), whereas the strains from Brazil belonged only to groups CDF 10 and HflC 5. This suggests a low level of genetic diversity among P. aeruginosa strains in both South Korea and Brazil. In contrast, there was significant diversity in SHR groups among strains isolated from the United Kingdom, USA, China, and Mexico City, indicating a high level of genetic diversity. Nevertheless, there was no apparent connection between the origin of isolation and the SHR groups of CDF and HflC.

Correlation of SHRs pattern with MLST

For comparison, we analyzed the MLST sequence types of the strains that exhibited SHR variation. Notably, most of the SHR patterns for both CDF and HflC showed a strong correlation with the MLST sequence types, underscoring the reliability of these patterns for sub-typing P. aeruginosa strains (Table 2). Interestingly, this method also has the potential to differentiate strains that are indistinguishable by MLST, including those lacking an MLST number, highlighting the novelty and significance of the current study (Table 2).
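The idea that a combined (CDF, HflC) repeat profile can separate strains sharing one sequence type, or lacking one, can be expressed as a simple grouping step. The sketch below uses placeholder records only: the strain names, the sequence-type labels "ST_X" and "ST_Y", and the repeat numbers are invented for illustration and are not values from Table 2.

```python
from collections import defaultdict

# (strain, MLST sequence type or None, CDF repeat units, HflC repeat units)
# All values are hypothetical placeholders, not data from this study.
records = [
    ("strain_A", "ST_X", 6, 19),
    ("strain_B", "ST_X", 6, 19),
    ("strain_C", "ST_X", 10, 5),
    ("strain_D", "ST_Y", 12, 13),
    ("strain_E", None, 8, 9),
    ("strain_F", None, 14, 15),
]


def profiles_by_st(rows):
    """Map each sequence type to the set of (CDF, HflC) profiles seen in it."""
    grouped = defaultdict(set)
    for _strain, st, cdf, hflc in rows:
        grouped[st].add((cdf, hflc))
    return dict(grouped)


def sts_resolved_further(rows):
    """Sequence types (or None for untyped strains) split by SHR profiles."""
    return {st for st, profiles in profiles_by_st(rows).items()
            if len(profiles) > 1}


print(profiles_by_st(records))
print(sts_resolved_further(records))
```

A sequence type that maps to a single profile is consistent with MLST, while one that maps to several profiles marks strains that the SHR markers resolve further.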

Genomic mapping and comparison of DNA segments in P. aeruginosa strains

Although typing P. aeruginosa strains by SHRs revealed significant genetic diversity, the resolution of this method is relatively low. Therefore, we conducted a genomic mapping analysis to further explore the genetic diversity among different P. aeruginosa strains. We compared the genetic composition of a specific genomic segment in 16 selected P. aeruginosa strains. Information on the strains selected for gene map analysis is listed in Table 3. Pseudomonas otitidis MrB4 and Pseudomonas nitroreducens L4 were used as reference sequences for the comparison. According to the gene map, most of the strains exhibited variations in two segments, viz., the Rsx family gene and the TAXI-TRAP gene.

Table 3 Basic information of the Pseudomonas aeruginosa strains used for gene map analysis.

Grouping of P. aeruginosa strains based on the variation in TAXI-TRAP

The genetic maps were anchored to a stable backbone structure, extending from the TAXI family TRAP transporter solute-binding subunit to the DNA polymerase III subunit chi. This structural alignment allowed for the classification of strains into six distinct types from TAXI 1 to TAXI 6 (Fig. 1).

Fig. 1

Genome mapping from the ‘TAXI family TRAP transporter solute-binding subunit’ gene to the ‘DNA polymerase III subunit chi’ gene among the different Pseudomonas aeruginosa strains. DUF2165 family proteins are colored in purple; T3SS effector bifunctional cytotoxin exoenzyme S, grey; ABC transporter family gene, yellow; and DUF2007 domain-containing protein, green.

The DUF2165 gene (highlighted in purple in Fig. 1) was found in all P. aeruginosa strain types. TAXI types 2, 3, 4, 5, and 6 exhibited distinct clustering patterns downstream of the ABC transporter gene, while TAXI type 1 showed variation upstream of the ABC transporter gene.

For TAXI Type 1 strains, genetic differences were associated with proteins such as the zinc-dependent alcohol dehydrogenase family protein (WP_033939636.1) and the metalloregulator ArsR/SmtB family transcription factor (WP_03393963.1) (Fig. 2). In TAXI type 3 strains, variations were noted in the intergenic regions between the ABC transporter protein (WP_003092876.1), an FG-GAP-like repeat-containing protein (WP_23553742.1), and a hypothetical protein (WP_123823007.1). For TAXI type 6 strains, the variation was observed in the intergenic region between WP_153519859.1 and WP_023081550.1. However, in TAXI types 2, 4, and 5, the genetic variation was associated with a hypothetical protein, which did not provide sufficient distinction between these types.

Fig. 2

Genome mapping of distinctive genetic variations in the intergenic regions between annotated genes among three types of TAXI Pseudomonas aeruginosa strains.

Grouping of P. aeruginosa strains based on the variation in Rsx family operon

The genetic maps were aligned to a stable structural framework, ranging from the electron transport complex subunit E to the methionine-tRNA ligase. Following the gene mapping, all the strains were categorized into two groups, Rsx A and Rsx B (Fig. 3). The key difference between these groups was the presence of three specific genes: the T6SS immunity protein Tli4 family protein, phospholipase D, and the type VI secretion system tip protein TssI/VgrG. Strains containing these three genes were categorized as Rsx B, while those without them were categorized as Rsx A.
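The Rsx A/B rule amounts to a presence/absence test on three annotated gene products. A minimal sketch is given below; it assumes that the product names in the annotation match the strings used here exactly, and it labels strains carrying only some of the three genes as "unclassified", a case the text does not describe. The function name and the toy gene list are hypothetical.

```python
# Marker gene products that distinguish Rsx B from Rsx A (names as in the text).
RSX_B_MARKERS = frozenset({
    "T6SS immunity protein Tli4 family protein",
    "phospholipase D",
    "type VI secretion system tip protein TssI/VgrG",
})


def classify_rsx(products) -> str:
    """Classify a strain from the product names annotated in its Rsx segment."""
    present = RSX_B_MARKERS & set(products)
    if present == RSX_B_MARKERS:
        return "Rsx B"        # all three marker genes present
    if not present:
        return "Rsx A"        # none of the marker genes present
    return "unclassified"     # assumption: partial presence is left undecided


# Toy segment annotation (hypothetical) flanked by the backbone genes.
segment = [
    "electron transport complex subunit E",
    "T6SS immunity protein Tli4 family protein",
    "phospholipase D",
    "type VI secretion system tip protein TssI/VgrG",
    "methionine-tRNA ligase",
]
print(classify_rsx(segment))
```

In practice, product names differ between annotation pipelines, so matching on protein accessions or on sequence similarity would likely be more robust than exact string comparison.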

Fig. 3

Genome mapping of the Rsx family gene and its neighboring genes in Pseudomonas aeruginosa strains. The T6SS immunity protein Tli4 family protein is colored in pink; phospholipase D, orange; type VI secretion system tip protein TssI/VgrG, blue.

Comparing the pattern of SHRs with gene mapping patterns

We also examined whether the SHR patterns of the CDF and HflC groups correlated with the genetic patterns identified in the TAXI and Rsx groups. Interestingly, the gene clusters of the TAXI groups were somewhat correlated with the SHR patterns of the CDF and HflC groups, resulting in consistent genetic patterns (Table 2). However, the patterns of the SHRs and CDF groups did not match completely.

Conclusion

Overall, the present study proposes a novel combination of genotyping markers based on SHRs (CDF and HflC genes) and gene structure (TAXI and Rsx genes) to enhance the robustness of strain typing by identifying specific variations within the same species. Comparison of the current approach with the MLST technique demonstrated that it produces reproducible results and can also distinguish strains that lack an MLST type number. This approach could serve as a useful genetic marker and reliable metric for genotyping P. aeruginosa and could contribute to the understanding of its evolution and genetic diversity. However, elucidating the function of amino acid repeats and the biological significance of the strains remains imperative. Further research and validation studies are needed to implement this approach effectively in practical applications.