Introduction

Porphyromonas gingivalis is an archetypal Gram-negative oral pathobiont that is the most commonly employed model organism in studies of periodontitis and related systemic sequelae, such as Alzheimer’s Disease1, rheumatoid arthritis2 and specific cancers3. There are many advantages to the use of P. gingivalis, including an extensive clinical and experimental literature, the applicability of multiple genetic tools, and the availability of animal models of disease. Multiple complete genome sequences of different P. gingivalis strains are also available, such as ATCC 33277, 381, TDC60, W50, and W83.

Within the last decade or so, the advent of transposon sequencing has facilitated the identification of those genes that are absolutely essential for the growth of P. gingivalis ATCC 33277, the type strain4,5. The 281 genes comprising the minimal genome of ATCC 33277 predominantly encode proteins involved in fundamental physiological processes, such as protein synthesis and export and LPS and peptidoglycan biosynthesis4,5. Homologues of many of these inherently essential genes are common among bacteria in the Database of Essential Genes (DEG: tubic.org/deg_bak/). The classical P. gingivalis virulence factors, for example, fimbriae, gingipains and capsule, are underrepresented in the minimal genome4,5.

More recently, those P. gingivalis ATCC 33277 genes that are conditionally essential for surviving tobacco-induced stress6, for abscess formation7, for epithelial invasion7, for iron acquisition8 and for a functional type IX secretion system9 have been identified. Genes encoding key protein catabolism enzymes have also been elucidated in the type strain10.

Periodontitis is induced by the immune response to dysbiotic plaque, with tobacco smoking adversely influencing both these phenomena and accounting for the majority of periodontitis cases in developed countries11,12,13. Indeed, P. gingivalis is particularly well-adapted to this key environmental factor6,14,15. We have elucidated 256 genes that are conditionally essential for survival of P. gingivalis ATCC 33277 in a tobacco-rich environment6,16. Genes that encode nicotinate and nicotinamide metabolism enzymes, e.g., a nicotinamidase (PGN_0534) and a nicotinate-nucleotide pyrophosphorylase (PGN_0533), oxidative stress protectors, e.g., rubrerythrin and thioredoxin, and several proteolytic enzymes, e.g., PepO and PrtT, were among the tobacco-essential gene subset.

Conditional essentiality has been evaluated during murine abscess formation, used as a screen for P. gingivalis-specific virulence factors required for surviving an aggressive immune response in a complex in vivo niche7. An epithelial colonization model (telomerase-immortalized gingival keratinocytes, TIGK)17 has also been employed to screen for as yet unrecognized genes contributing to epithelial invasion and intracellular survival7. P. gingivalis ATCC 33277 genes that are essential under these specific conditions include an aminoacyl-histidine dipeptidase (PGN_1434) and a putative cytidine deaminase (PGN_0026), for abscess formation, and clpC (a Clp protease) and a putative Fe-S oxidoreductase (PGN_1013), for TIGK association. Genes conditionally essential for both abscess formation and epithelial colonization include those encoding adhesins, such as mfa1 and mfa4, iron acquisition proteins, e.g., hmuR and feoB, and proteases, e.g., rgpA and rgpB.

We set out, herein, to generate an expandable global pangenome of multiple laboratory and low-passage clinical strains of P. gingivalis and to map absolutely and conditionally essential genes across this genetic atlas. As P. gingivalis is an asaccharolytic species that uses a wide range of proteases for nutrition and immune system evasion10,18, we also delineated universal conditionally essential proteolytic genes containing putative signal peptides for transmembrane transportation. We hypothesize that this composite analysis will facilitate improved planning for P. gingivalis gene mutation strategies, improve knowledge of existing and novel P. gingivalis virulence mechanisms, and help prioritize the identification of new therapeutic targets for the prevention of P. gingivalis-associated diseases.

Results

We have previously elucidated essential genes6,7,10,19 of the P. gingivalis type strain, ATCC 33277. We now expand these findings to a global pangenome of multiple laboratory and low-passage clinical isolates of P. gingivalis and map absolutely and conditionally essential genes across this genetic atlas.

Phylogenetic tree of P. gingivalis

A phylogenetic tree, derived from the genomes of P. gingivalis 11029, 10208c, 6404, 4612, 7303, 10512, 5607, 8012, EC324, 33277, 381, TDC60, W50 and W83 is shown in Fig. 1. Branch nodes, phylogenetic distances, and the measure of support for each downstream branch node are presented. Interestingly, the greatest genetic divergence was found between particular laboratory strains (381 and ATCC 33277 vs W50 and W83), with the clinical isolates and TDC60 being intermediary.

Fig. 1

Phylogenetic trees derived from the pangenome of the Porphyromonas gingivalis type strain, ATCC 33277, and various laboratory and low-passage clinical strains. The scale bar above the tree indicates the phylogenetic distance, measured in the number of substitutions per site. The numbers below each branch segment represent the bootstrapped measure of support for each downstream branch node as a proportion, where 1 (100%) represents maximal support. Branch nodes without a bootstrapped measure of support represent ambiguous branching points, where not enough information is present in the sequences to determine a branching point between taxa accurately. Any ambiguous branching points were resolved using midpoint rooting.

Presence-absence matrices—complete pangenome

Gene presence-absence matrices, generated from the combined genomes of all sequenced P. gingivalis strains, are presented in Fig. 2A. The vertical blue lines indicate the presence of a specific gene in each strain analyzed, whereas white vertical lines indicate the absence of a gene in the various strains of interest.

Fig. 2

The complete pangenome of P. gingivalis. Gene presence-absence matrices and core and accessory genome sizes are presented. (A) Gene presence-absence matrices were generated from the complete pangenome comprising the Porphyromonas gingivalis type strain, ATCC 33277, and various laboratory and low-passage clinical strains. Unrooted phylogenetic trees are placed on the left of each matrix to orient which row corresponds to which strain. Vertical blue lines indicate the presence of pangenome genes for each strain analyzed, whereas white vertical lines indicate the absence of pangenome genes. (B) The number of core versus accessory genes in the pangenome of the P. gingivalis ATCC 33277 strain and various laboratory and low-passage clinical strains is also presented. Core genes are those present in all 14 strain genomes. Accessory genes are divided into common, shell genes (present in 3–13 genomes) and rarer, cloud genes (present in 1–2 genomes). A total of 4216 different genes were identified in the 14 strains representing the entire P. gingivalis pangenome.

Core and accessory genes of the global P. gingivalis pangenome

For a given gene to be considered part of the core pangenome, its encoded protein cluster must have exhibited a corresponding sequence that is ≥ 95% similar in each input strain examined (> 99%, 14 strains). The remaining accessory genes were considered common, shell genes (15–99% of genomes; 3–13 strains) or rare, cloud genes (< 15% of genomes; 1–2 strains only). A listing of the complete set of core, shell and cloud genes is presented in Supplemental Table 1, with a unique locus tag provided for each individual gene sequenced. This dataset can be added to, in order to expand the P. gingivalis pangenomes provided herein, as additional relevant data become available. The solid blue regions in Fig. 2A indicate the ubiquitous, core genome. The blue and white patterned region represents the accessory pangenome, which comprises shell and cloud genes. A simplified view of the quantitative distribution of core (46%), shell (20%) and cloud (34%) genes of the complete 14-strain P. gingivalis pangenome is presented in Fig. 2B.

Absolute and conditionally essential P. gingivalis pangenomes

The distribution of absolutely (minimal genome) and conditionally (tobacco survival, abscess formation, epithelial colonization) essential genes across the P. gingivalis pangenome is provided in Supplemental Table 2. Core, shell and cloud genes for each pangenome are noted. Again, these datasets can be readily added to as our understanding of P. gingivalis fitness determinants grows. Selected inherently essential genes and selected core genes of the P. gingivalis pangenome that are essential under various disease-relevant conditions are presented in Table 1A–D. While the complete, complex data sets are provided as supplementary data, Table 1A–D presents, in a digestible format, specific genes whose annotated functions have clear relevance to the disease-related model of interest, i.e., tobacco survival, epithelial invasion or abscess formation. The complete absolutely essential, minimal pangenome of P. gingivalis is visualized in Fig. 3A,B. The complete tobacco-essential pangenome of P. gingivalis is presented in Fig. 3C,D.

Table 1 Ubiquitous inherently and conditionally essential genes of the P. gingivalis pangenome.
Fig. 3

Absolutely (A, B) and tobacco-essential (C, D) pangenome maps. Absolutely essential (A, B) and tobacco-essential (C, D) genes were mapped to the P. gingivalis chromosome (A, C) and presented as gene presence-absence matrices (B, D). For the circular chromosomal mapping, (A, C), the P. gingivalis ATCC 33277 genome serves as the template for the overlay of the core gene sets (i.e. ubiquitously present across all 14 strains). For the gene presence-absence matrices (B, D), vertical blue lines indicate the presence of pangenome genes for each strain analyzed, whereas white vertical lines indicate gene absence.

Distribution of protease-encoding P. gingivalis genes across the pangenome

The essential protein catabolism genes that contain putative signal peptide-encoding regions and are found in the core of the combined conditionally (tobacco survival, abscess formation, epithelial invasion) essential pangenome of P. gingivalis are presented in Table 2.

Table 2 Ubiquitous conditionally essential P. gingivalis genes encoding protein catabolism products with signal peptides.

Functional enrichment of strain-specific cloud genes

The results from the functional enrichment analysis for each set of strain-specific cloud genes are collated in Supplemental Table 3.

Discussion

We have generated a 14 strain P. gingivalis global pangenome. Interestingly, the greatest genetic divergence was found between laboratory strains 381 and 33277 at one end of the phylogenetic tree and W50 and W83 at the other. All of the newly sequenced clinical isolates are intermediary, along with TDC60. As expected, most genes represented in the overall pangenome are common core and shell genes (66%). However, a large number of rarer, cloud genes (34%) were also noted, establishing that substantial genetic heterogeneity exists between the strains examined. This is in keeping with prior studies that have shown considerable genetic plasticity in this important oral pathobiont, likely contributing to variations in the pathogenicity of different P. gingivalis strains20,21,22,23,24.

We have previously elucidated and characterized the minimal genome of P. gingivalis ATCC 33277 that is required for growth in rich medium in a low-stress environment. It is of particular interest, then, that the number of these absolutely essential genes that are conserved across the P. gingivalis pangenome is high. A range from 96.5% (P. gingivalis 4612) up to 98.9% (P. gingivalis 381) is apparent. The minimal pangenome is largely comprised of housekeeping genes involved in fundamental metabolic pathways, including, but not limited to, lipid A synthesis (e.g., lpx genes), micronutrient supply (e.g., riboflavin biosynthesis protein, ribD), biosynthetic enzyme cofactor generation (e.g., bifunctional folate synthesis protein, sulD), protein trafficking (e.g., signal peptide peptidase A, sppA), peptidoglycan manufacture (e.g., mur genes), protein anabolism (multiple 30S and 50S ribosomal proteins) and transcription (e.g., σA, σ54). While relatively barren in established P. gingivalis virulence factors, the core absolutely essential pangenome sheds light on primary physiological processes and suggests a number of potential novel therapeutic targets for the control of this microbe. Several of the central biological processes encoded by genes of the core minimal essential genome are solely prokaryotic and explain the broad range of antibiotics to which P. gingivalis is susceptible. With increasing insight into the mechanisms of action of classic antimicrobials and enhanced knowledge of the underlying metabolic pathways, there is reinvigorated interest in developing novel therapeutics for several of these traditional targets. For example, there is a contemporary interest in the development of a new generation of stable, clinically relevant and bacterial-specific folate generation inhibitors, including anti-SulD agents25,26.
Sulfonamides, which block para-aminobenzoic acid to folate conversion, are among the earliest developed efficacious antimicrobials, although their mechanism of action was not fully elucidated until the twenty-first century. Polymyxins target lipid A and destabilize the Gram-negative outer membrane but are normally used only as last-line antibiotics. However, several novel compounds targeting Lpx proteins, e.g., UDP-(3-O-(R-3-hydroxymyristoyl))-N-acetylglucosamine deacetylase, are under investigation as potential lipid A biosynthesis-inhibiting antimicrobials27. During bacterial riboflavin biosynthesis, the critical pyrimidine deaminase and reductase steps are catalyzed by a bifunctional enzyme encoded by ribD, which our data show is essential across the minimal P. gingivalis pangenome28. In some other bacteria, these functions are performed by the products of separate ribD and ribG genes28. Several effective inhibitors of riboflavin pathway enzymes, including RibD, have been identified and are under investigation as potential new antibacterial agents29,30. BamA folds and inserts transmembrane β-barrel proteins into the outer membrane of Gram-negative bacteria while LptD inserts LPS31. These two highly conserved proteins have been suggested to be the sole outer membrane proteins that are absolutely essential for Gram-negative viability in general31,32,33. In concordance, our data show that bamA and lptD are absolutely essential in all P. gingivalis strains examined herein. Multipronged approaches to the development of a new class of Gram-negative antibiotics, targeting BamA, are also currently underway, including bacteriocins, small molecule inhibitors and darobactin, a Photorhabdus symbiont-derived antibiotic31,34. The Tol/Pal system, initially characterized as a colicin uptake mechanism, is recognized as enhancing the pathogenic potential of several microbes and is essential for selected pathogens35. Our tolQ essentiality data suggest that P. gingivalis may be among them. Inhibitors of Tol/Pal proteins, including TolQ, have been posited as novel antimicrobials, particularly for drug-resistant pathogens35. Efforts are also underway to develop Cmk-targeting antimicrobials for drug-resistant Gram-negative infectious agents36. While Cmk is essential for teichoic acid synthesis and cell wall production in Gram-positive bacteria, it is also required for bacterial DNA synthesis and has also emerged as a focus for novel antibacterial agents36. In bacteria, FtsY interacts with a protein-RNA complex, the signal recognition particle, to facilitate protein translocation to the cell membrane. Efforts to develop a novel class of antibacterials that target this essential prokaryotic system function are also in progress37.

As mutations in core essential genes are, by definition, lethal, the data provided herein should facilitate improved planning for single gene mutation strategies. For example, we have recently investigated the atypical cyclic di-AMP signaling system that is essential for P. gingivalis growth38. While we were successful in creating single gene knockouts in the P. gingivalis c-di-AMP signaling genes, pde and cdaR, a knockout of a further c-di-AMP signaling gene, dac, determined to be essential by TnSeq, could not be generated38.

Greater variation than in the core absolutely essential pangenome appears amongst those genes conditionally essential for abscess formation and epithelial invasion (63.6–76.0% in P. gingivalis W83 and P. gingivalis 381, respectively); for surviving tobacco smoke exposure (62.0–74.0% in P. gingivalis W83 and P. gingivalis 381, respectively); and amongst the inherently and conditionally essential protein catabolism genes (71.0% in P. gingivalis W83 and 5607 to 88.0% in P. gingivalis 8012).

As tobacco smokers exhibit greater risk of developing disease more rapidly and developing periodontitis that is more difficult to treat than non-smokers11,12,13,15,39, we have a particular interest in the delineation of genes requisite for P. gingivalis survival in tobacco-rich environments. We hypothesized that the tobacco-essential pangenome of P. gingivalis would shed light on survival mechanisms relevant to the subgingival niche in smokers and potentially inform as to novel preventive and therapeutic strategies specific to this over-represented, under-studied high-risk sub-population. The core conditionally essential pangenome of tobacco-essential P. gingivalis genes includes trx, rbr, tpx, sodB and sufS. The trx gene encodes a putative thioredoxin. Rubrerythrin, encoded by rbr, protects against oxygen-induced stress40. As P. gingivalis is catalase negative, such antioxidant gene products may be particularly important in safeguarding P. gingivalis against the oxidant-rich insult that is cigarette smoke. Indeed, cigarette smoke has been estimated to contain > 10^15 oxidants and free radicals per inhalation41. We have previously established that deletion of the thiol peroxidase-encoding tpx gene in P. gingivalis ATCC 33277 confers reduced fitness, relative to the parental strain, in cigarette smoke-exposed bacteria6. The sufS gene product is a cysteine desulfurase, a class of enzymes generally involved in Fe–S cluster biogenesis under stress conditions, including oxidative stress42. Interestingly, drugs that target SufS are under development as novel antimicrobials43. The importance of oxidant evasion enzymes for tobacco survival in P. gingivalis is again emphasized by the finding that sodB, encoding a superoxide dismutase, is also part of the core tobacco-essential pangenome.

Protein catabolism is central to P. gingivalis virulence, for example, by providing peptides as carbon sources, assisting the acquisition of iron and facilitating evasion of facets of the immune system10. We have identified a set of genes encoding membrane- or extracellular transport-targeted proteolytic enzymes that comprise the core conditionally essential genome for all conditions tested. These include porU, hrtA, rgpB, pepO and prtT. The porU gene product is a cysteine proteinase that is essential for the efficient function of the type IX secretion system (TIXSS) which transports multiple important P. gingivalis factors, including the classic virulence determinants, the gingipains10,44. Heat-shock-related protein A (HrtA) is another established P. gingivalis virulence mediator with pleiomorphic functions that include roles in quorum sensing, the TIXSS, oxidant resistance and murine lethality10,45,46,47. The arginine-specific gingipain, RgpB, is among the best characterized P. gingivalis virulence factors, playing critical roles in complement protein and defensin degradation10,48,49,50. As we have discussed previously10, PepO endopeptidase activity is enhanced during epithelial invasion51, while prtT encodes a trypsin-like protease. Therapeutic targeting of core, conditionally essential, surface-exposed or secreted P. gingivalis components may be worthy of further exploration.

Recently, Veith et al. have reviewed the Tn-Seq data relevant to the TIXSS of P. gingivalis52 and delineated inherently and conditionally essential genes relevant to this secretion system. The authors noted that 65% of TIXSS components are required for growth and, while only a minority of TIXSS cargo proteins were inherently essential, 77% of TIXSS cargo proteins were conditionally essential for epithelial colonization, abscess formation, or surviving cigarette smoke extract exposure52.

STRING analysis revealed that a common function of genes found within a single strain only is DNA-related, e.g., transposase activity and DNA integration, again consistent with genomic plasticity. Another common theme amongst the strain-specific functional enrichments is coiled coils, a protein secondary structure providing cell wall rigidity and influencing adhesion and biofilm formation53. P. gingivalis W50 was the only strain that did not exhibit any significant functional enrichments amongst the strain-specific genes.

Clearly, the pangenomic datasets generated herein are based on data originally derived from P. gingivalis ATCC 33277, the reference genome employed due to its prevalence in the literature and the availability of a transposon sequencing library (TnSeq) that has now been widely exploited. In this context, it is important to note that our attempts to generate a TnSeq mutant library in P. gingivalis W83 proved unsuccessful, perhaps due to the surface ultrastructure. The generation of additional TnSeq libraries in other P. gingivalis strains may assist in the identification of further important accessory genes, but the currently available ATCC 33277 library is expected to be efficient in the identification of ubiquitous genes across the various pangenomes created.

In summary, we have generated a pangenome based on the complete genomic sequences of 14 different P. gingivalis strains; a pangenome of common absolutely essential genes; and pangenomes of common genes conditionally requisite for survival under several disease-related conditions. Those genes that are highly pervasive in the P. gingivalis absolutely essential pangenome, or are highly prevalent and essential for fitness in disease-relevant models, may represent particularly attractive therapeutic targets worthy of further investigation. In particular, it will be interesting to confirm the expected common phenotype of ubiquitous conditionally essential genes, as we have shown previously for multiple genes in the context of P. gingivalis ATCC 33277 only, such as PGN_0088 (sinR1), PGN_0287 (fimA), PGN_0388 (tpx), PGN_0491 (ltp1), PGN_0770 (rnz), PGN_1200 (a DNA-dependent ATPase encoding gene), PGN_1300 (nitrosative stress transcriptional regulator, hcpR), PGN_1444 (carbamoyl phosphate synthetase) and PGN_1524 (ptk1)6,7. As mutations in absolutely essential genes are expected to be lethal, the data provided herein should also facilitate improved planning for P. gingivalis gene mutation strategies.

Materials and methods

Bacteria and genome sequencing

The chromosomes of several low-passage clinical strains of P. gingivalis from our collection (11029, 10208c, 6404, 4612, 7303, 10512, 5607, 8012, EC324) were sequenced using the fee-for-service facility of MicrobesNG (Birmingham, UK) according to their sample preparation and submission, DNA processing, Illumina sequencing, assembly and annotation protocols54. These strains were kindly provided by Dr. Richard Lamont (University of Louisville) and many have been reported on previously55,56,57,58. The genomes were compared with several publicly available complete P. gingivalis strain sequences (ATCC 33277, 381, TDC60, W50, W83).

Pangenome assembly and analysis

Assembled genome FASTA files from sequencing were used as input in Prokka (version 1.14.6) in order to generate a General Feature Format (GFF) version 3 file for each assembled genome59. Prokka was used in Bacteria mode with the genus and species set to Porphyromonas and gingivalis, respectively. GFF files were then used as input into the pangenome analysis platform, Roary60 (version 3.13.0), with the -e and -n parameters utilized to perform gene alignment using MAFFT. To be considered a part of the core pangenome, a given gene must be present in at least 99% of all strains. The core gene alignments generated from Roary were used as input into FastTree, and the resulting tree was used as input into Interactive Tree of Life (version 6.7.2)61. Default parameters were used for each software listed unless otherwise noted. The following reference Porphyromonas gingivalis strain sequences were used in the pangenome analysis: 33277 (GenBank Accession GCA_000010505.1), W83 (GenBank Accession GCA_000007585.1), W50 (GenBank Accession GCA_000271945.1), TDC60 (GenBank Accession GCA_000270225.1), and 381 (GenBank Accession GCA_001314265.1). For a given gene to be considered part of the core pangenome, its encoded protein cluster must have exhibited a corresponding sequence that is ≥ 95% similar in each input variant. The remaining genes were considered accessory shell and cloud genes.
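The annotation, pangenome and tree-building steps described above can be sketched as a set of command lines. The following is a minimal illustration only, not the exact invocation used in this study: the file and directory names are hypothetical, the FastTree -nt flag is our assumption (the Roary core gene alignment is taken to be a nucleotide alignment), and the commands are assembled but not executed.

```python
import shlex
from pathlib import Path


def prokka_cmd(fasta: Path, outdir: Path) -> list[str]:
    """Annotate one assembly in Bacteria mode with genus and species set."""
    return [
        "prokka",
        "--kingdom", "Bacteria",
        "--genus", "Porphyromonas",
        "--species", "gingivalis",
        "--outdir", str(outdir / fasta.stem),
        "--prefix", fasta.stem,
        str(fasta),
    ]


def roary_cmd(gffs: list[Path], outdir: Path) -> list[str]:
    """Build the pangenome; -e together with -n requests a MAFFT core alignment."""
    return ["roary", "-e", "-n", "-f", str(outdir), *map(str, gffs)]


def fasttree_cmd(core_alignment: Path) -> list[str]:
    """Infer a tree from the core alignment (-nt is an assumption, see lead-in)."""
    return ["FastTree", "-nt", str(core_alignment)]


if __name__ == "__main__":
    # Hypothetical input names, used for illustration only.
    assemblies = [Path("assemblies/strainA.fasta"), Path("assemblies/strainB.fasta")]
    for fasta in assemblies:
        print(shlex.join(prokka_cmd(fasta, Path("annotation"))))
    gffs = [Path("annotation") / f.stem / f"{f.stem}.gff" for f in assemblies]
    print(shlex.join(roary_cmd(gffs, Path("roary_out"))))
    print(shlex.join(fasttree_cmd(Path("roary_out/core_gene_alignment.aln"))))
```

FastTree writes the Newick tree to standard output, so in practice the last command would be redirected to a file before upload to a tree viewer.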

Definition of core and accessory genes

For a given gene to be considered part of the core pangenome, its encoded protein cluster must have exhibited a corresponding sequence that is ≥ 95% similar in each input strain examined (> 99%, 14 strains). The remaining accessory genes were considered common, shell genes (15–99% of genomes; 3–13 strains) or rare, cloud genes (< 15% of genomes; 1–2 strains only).
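The frequency thresholds above translate directly into a classification rule. A minimal sketch follows, assuming that presence or absence of each gene cluster per strain has already been determined; the function names, gene names and strain labels are hypothetical and serve only as illustration.

```python
from collections import Counter


def classify_gene(n_present: int, n_strains: int = 14) -> str:
    """Assign a gene cluster to core, shell or cloud from its strain count."""
    if not 1 <= n_present <= n_strains:
        raise ValueError("n_present must lie between 1 and n_strains")
    fraction = n_present / n_strains
    if fraction > 0.99:    # all 14 strains when n_strains = 14
        return "core"
    if fraction >= 0.15:   # 3 to 13 of 14 strains
        return "shell"
    return "cloud"         # 1 or 2 of 14 strains


def summarise(presence: dict[str, set[str]], n_strains: int = 14) -> Counter:
    """Count core, shell and cloud genes in a gene -> strains mapping."""
    return Counter(classify_gene(len(s), n_strains) for s in presence.values())


# Toy data: hypothetical gene and strain names.
strains = [f"strain_{i}" for i in range(1, 15)]
toy = {
    "geneA": set(strains),      # present everywhere
    "geneB": set(strains[:5]),  # present in 5 strains
    "geneC": {"strain_1"},      # strain-specific
}
counts = summarise(toy)
```

With 14 genomes, the 15% boundary falls between two strains (14.3%) and three strains (21.4%), which is why the shell category begins at three strains.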

Mapping essential P. gingivalis genes across the pangenome

Previously generated databases of essential genes in Porphyromonas gingivalis6,7,10,19 were used to generate a custom BLAST database consisting of the protein sequences of all known essential genes in P. gingivalis using the “blastn” function of BLAST+ (v2.14.0)62. The proteome from each of the sequenced clinical strains as well as the reference strains generated from Prokka, except for strain ATCC 33277, were queried against this custom BLAST database using an E-value cutoff of 1 × 10^−50. Results were limited to the single top-scoring hit and compiled into an *.xlsx file also containing the equivalent ATCC 33277 strain gene locus tags and the publication in which the essential gene was originally discovered. Mapping of universally essential genes to the circular P. gingivalis chromosome was facilitated by Proksee (PubMed ID: 37140037) accessed at https://proksee.ca/ using P. gingivalis ATCC 33277 strain genome accession AP009380.1 and adding additional feature tracks through the “Features” tool with text files containing the essential gene feature data. The annotated circular P. gingivalis genomes were then downloaded as static PNG images.
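The filtering step described above (E-value cutoff, then a single top-scoring hit per query) can be illustrated as follows. This is a sketch under stated assumptions rather than the script used here: it assumes BLAST+ results exported in the standard 12-column tabular layout, takes "top-scoring" to mean highest bit score, and uses hypothetical query identifiers and alignment values.

```python
import csv
import io

# Standard column order of BLAST+ tabular output (assumed export format).
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def top_hits(tabular_text: str, max_evalue: float = 1e-50) -> dict[str, dict]:
    """Keep, per query, the highest bit-score hit with E-value <= max_evalue."""
    best: dict[str, dict] = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        hit = dict(zip(COLUMNS, row))
        evalue = float(hit["evalue"])
        bitscore = float(hit["bitscore"])
        if evalue > max_evalue:
            continue
        current = best.get(hit["qseqid"])
        if current is None or bitscore > current["bitscore"]:
            best[hit["qseqid"]] = {
                "sseqid": hit["sseqid"],
                "evalue": evalue,
                "bitscore": bitscore,
            }
    return best


# Hypothetical query IDs and alignment values, for illustration only.
example = "\n".join([
    "strainX_00001\tPGN_0533\t99.1\t280\t2\t0\t1\t280\t1\t280\t1e-180\t520",
    "strainX_00001\tPGN_0534\t45.0\t200\t100\t5\t1\t200\t1\t200\t1e-60\t210",
    "strainX_00002\tPGN_1434\t30.0\t90\t60\t3\t1\t90\t1\t90\t1e-10\t55",
])
hits = top_hits(example)
```

In this toy input the first query retains only its stronger match, and the second query is dropped because its only hit exceeds the E-value cutoff.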

Identification of membrane-targeted ubiquitous conditionally essential proteolytic genes

For ubiquitous conditionally essential genes that were derived from Miller et al.10, the protein sequences for each gene were uploaded to SignalP version 6 (accessed at https://services.healthtech.dtu.dk/services/SignalP-6.0/) (PubMed ID: 34980915) with the following parameters: “Organism” set to “Other”, “Output format” set to “Long output”, and “Model mode” set to “Slow”. Output includes the likelihood of various signal peptides within each submitted protein and the most probable cleavage site location within each submitted amino acid sequence. Images of the output were collated into a PowerPoint file (Supplemental Fig. 1).

Functional enrichment of cloud genes

To better characterize the overall pattern of genes present in the cloud pangenome, STRING (v12.0) accessed at https://string-db.org was used to perform functional enrichment analysis on each set of strain-specific genes63. To facilitate this, each strain proteome (.faa file from Prokka) was uploaded to STRING to generate a custom protein–protein interaction (PPI) database. During the upload process, the name of the strain was entered, and the taxon set as “Porphyromonas gingivalis”. Using the custom PPI databases, the lists of protein locus tags were used as input into the “Multiple proteins” search function with the FDR stringency set to “lenient (25 percent)”. On the generated PPI network page, the “Analysis” tab was selected and the “All enriched terms” file was downloaded. The output of these analyses can be seen in the Excel file presented as Supplemental Table 3.
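For readers unfamiliar with false discovery rate (FDR) thresholds, the sketch below shows how a 25% cutoff could be applied to a set of enrichment p-values. It assumes a Benjamini-Hochberg correction, which is our understanding of how STRING adjusts enrichment p-values; the p-values shown are hypothetical and the code does not reproduce STRING's internal implementation.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four enrichment terms.
raw = [0.01, 0.04, 0.03, 0.5]
fdr = benjamini_hochberg(raw)
passes_lenient = [q <= 0.25 for q in fdr]  # "lenient (25 percent)" cutoff
```

A lenient cutoff of this kind tolerates more false positives in exchange for sensitivity, which suits exploratory analysis of small strain-specific gene sets.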