Abstract
Among dozens of microbial DNA modifications regulating gene expression and host defense, phosphorothioation (PT) is the only known backbone modification, with sulfur inserted at a non-bridging oxygen by dnd and ssp gene families. Here we explored the distribution of PT genes in 13,663 human gut microbiome genomes, finding that 6.3% possessed dnd or ssp genes predominantly in Bacillota, Bacteroidota, and Pseudomonadota. This analysis revealed several previously undescribed PT synthesis systems, including Bacteriophage Exclusion (BREX) type 4 brx genes, which we genetically validated in Bacteroides salyersiae. Mass spectrometric analysis of DNA from 226 gut microbiome isolates possessing dnd, ssp, and brx genes revealed 8 PT dinucleotide settings confirmed in 10 consensus sequences by PT-specific DNA sequencing. Genomic analysis showed PT enrichment in rRNA genes and depletion at gene boundaries. These results illustrate the power of the microbiome for discovering prokaryotic epigenetics and the widespread distribution of oxidation-sensitive PTs in gut microbes.
Introduction
Epigenetic modifications have been found in DNA from all domains of life. Among the variations in canonical A, T, C and G structure, such as N6-methyl-adenine (6mA), N4-methyl-cytosine (4mC), C5-methyl-cytosine (5mC), and 7-deazaguanine derivatives1,2, phosphorothioates (PTs) are the only known modification of the sugar-phosphate backbone, with a non-bridging oxygen replaced by sulfur in an RP-specific configuration3,4,5 (Fig. 1A). Two PT-based restriction and modification (R-M) systems, the dnd4,6 and ssp7,8 gene clusters, have been observed in ~10% of bacteria and archaea9,10, with a third tdp system recently discovered in extremophilic bacteria11. Typically, the modification components are organized as a four-gene operon for dndBCDE and a three-gene operon for sspBCD, with an additional modification gene for dndA or sspA located adjacent to the operon, dispersed elsewhere in the genome, or replaced by a gene for a homolog such as iscS. However, the recently discovered tdpABC cluster requires only a single gene, tdpC, for PT synthesis11. Like methylation-based R-M systems8,10,12, DndACDE and SspABCD proteins catalyze PT modification on one or both strands of specific consensus sequences. For example, DndACDE confer double-stranded PTs at 5′-GPSAAC-3′/5′-GPSTTC-3′ sequences in Escherichia coli B7A and Salmonella enterica serovar Cerro 87, 5′-GPSGCC-3′/5′-GPSGCC-3′ in Pseudomonas fluorescens pf0-1 and Streptomyces lividans 1326, and 5′-GPSATC-3′/5′-GPSATC-3′ in Hahella chejuensis KCTC 239612,13,14. SspABCD proteins, on the other hand, catalyze single-stranded 5′-CPSCA-3′ in Vibrio cyclitrophicus FF757,13. The dnd and ssp systems share some similarities, such as encoding a homolog of phosphoadenosine phosphosulfate (PAPS) reductase (DndC, SspD)7,15, a homolog of cysteine desulfurase (DndA, SspA)7,15, and a P-loop containing ATPase (DndD, SspC)7,16 (Fig. 1B).
The restriction counterparts DndFGH and SspFGH sense PT modifications by poorly understood mechanisms, with only 10–15% of consensus sequences modified with PTs7,13.
A Sulfur replaces a non-bridging phosphate oxygen in the DNA backbone in PT modifications. A limit nuclease digest of PT-containing DNA leaves PT-linked dinucleotides that can be identified and quantified by LC–MS. B Synthesis of PTs generally follows the biochemical steps performed by Dnd proteins.
Both the metabolic pathways enabling sulfur incorporation into PTs and the altered chemical properties of sulfur-modified DNA have implications for interactions of PT-containing microbes with human hosts. For example, the sulfur in PTs is both readily oxidized and nucleophilic, which is proposed to provide epigenetic regulation of transcription of redox homeostasis genes12. PTs also provide weak to modest protective effects in cells exposed to reactive oxygen and nitrogen species, such as peroxides12,17 and peroxynitrite18. Contrasting with this protection, PT-containing bacteria are 5-fold more sensitive to neutrophil-derived hypochlorous acid (HOCl) due to extensive DNA breaks at PTs19. These unusual chemical properties of PTs raise questions about how PT-containing microbes might behave in the healthy gut microbiome or be altered by inflammatory bowel disease (IBD) or other chronic inflammatory conditions20,21,22. Evidence for the presence of PT genes in bacterial strains associated with the human gut microbiome and PT dinucleotides in fecal DNA23,24 thus motivated us to systematically analyze PT genomics in the human gut microbiome.
Here, we defined the landscape of PT-containing microbes in the human gut by performing a genomic analysis of 13,663 human microbiome genomes from the Broad Institute-OpenBiome Microbiome Library (BIO-ML)25, the Global Microbiome Conservancy (GMbC)26,27,28, and the Unified Human Gastrointestinal Genome (UHGG) collection29. Mass spectrometric analysis of PT dinucleotides in 226 of these isolates, coupled with PT-specific next-generation sequencing (PT-seq)30, led to the discovery of a previously undescribed PT system involving type 4 Bacteriophage Exclusion (BREX) genes brxPCZL. These results expand our knowledge about the diversity of PT epigenetics and lay the foundations for understanding the role of PT-containing microbes in human health and disease.
Results
Discovery of previously uncharacterized PT modification systems by analyzing genome neighborhoods of dndC and sspD genes
Given the power of physical clustering analyses to identify gene functions in bacteria, we first performed a comprehensive gene neighborhood analysis31 of dnd and ssp genes to find new PT modification systems. Here, we used Enzyme Function Initiative (EFI) tools31 to first search UniProt for DndC and SspD homologs in sequence similarity networks (SSN), followed by the EFI Genome Neighborhood Tool to identify the genomic contexts of the SSNs. DndC and SspD possess a PAPS reductase domain with pyrophosphatase activity essential for PT biosynthesis (Fig. 1B), an activity shared by the sulfur-inserting RNA modification enzymes ThiI32, MnmA33, Ncs633, and TtcA34. As a second criterion for PT synthesis, we required that DndC and SspD neighborhoods also possess a P-loop-containing NTPase gene. Both DndD and SspC possess P-loop-containing ATPase activity essential for PT synthesis (Fig. 1B). Using this approach, we retrieved 3120 DndC homologs and 2132 SspD homologs from UniProt by BLAST (E-value cutoff 10⁻⁵) and searched genome neighborhoods encoding both a PAPS reductase domain and a P-loop NTPase (Supplementary Data 1 and 2).
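The two-criterion neighborhood filter described above can be illustrated with a minimal sketch. The domain labels, the `Gene` record, and the toy neighborhoods are hypothetical and are not the EFI tool output format; the sketch only shows the logic of requiring both a PAPS reductase domain and a P-loop NTPase in the same neighborhood.

```python
from dataclasses import dataclass

# Hypothetical domain labels; real annotations (e.g. Pfam accessions) would differ.
PAPS_REDUCTASE = "PAPS_reductase"
P_LOOP_NTPASE = "P-loop_NTPase"


@dataclass(frozen=True)
class Gene:
    locus: str
    domains: frozenset


def is_candidate_pt_neighborhood(genes):
    """Return True if the neighborhood encodes both required domains."""
    found = set()
    for gene in genes:
        found |= gene.domains
    return PAPS_REDUCTASE in found and P_LOOP_NTPASE in found


# Toy neighborhoods (invented for illustration only).
neighborhoods = {
    "hood_1": [
        Gene("g1", frozenset({PAPS_REDUCTASE})),
        Gene("g2", frozenset({P_LOOP_NTPASE})),
    ],
    "hood_2": [
        Gene("g3", frozenset({PAPS_REDUCTASE})),
        Gene("g4", frozenset({"MTase"})),
    ],
}

candidates = sorted(
    name for name, genes in neighborhoods.items()
    if is_candidate_pt_neighborhood(genes)
)
print(candidates)
```

Only the first toy neighborhood passes, because the second lacks a P-loop NTPase.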
The resulting gene neighborhoods revealed several previously undescribed putative PT-modifying gene candidates. Each circle or node in Fig. 2 depicts a neighborhood containing dndC (Fig. 2A) or sspD (Fig. 2B), with clustering of nodes based on SSNs. Here, it is clear that most of the dndC-containing gene neighborhoods fall within the largest cluster and possess additional dnd genes consistent with canonical Dnd-based PT synthesis (Fig. 2A, nodes with red outlines). However, several of the smaller clusters contained neighborhoods with a dndC gene, a P-loop NTPase, and other known defense genes, suggesting PT-based R-M systems. In one instance, dndC and the NTPase gene lie near genes for a Bacteriophage Exclusion (BREX) type 4 system35 (Fig. 2A, yellow nodes). Five of six BREX systems possess a BrxC/PglY ATPase, a PglX DNA methyltransferase, and BrxZ (PglZ) phosphatase. In BREX type 4 clusters containing brxP, the PglX adenine methyltransferase is replaced with the DndC S-inserting PAPS reductase domain protein35. For example, in Methanothermobacter sp. (Fig. 2A, yellow node, black outline), brxP (dndC homolog) is adjacent to a brxC ATPase gene, a brxZ phosphatase gene, and a Lon-like protease domain-containing brxL gene typically found in BREX type 1 and 4 systems. In Fischerella thermalis (Fig. 2A, yellow node, red outline), the full set of dndBCDE genes cluster with brxL, brxZ phosphatase, sspB nickase, and a methyltransferase. It is common to find several different defense systems in the same region due to horizontal gene transfer and genetic decay. The systems in these islands can function independently, so the methyltransferases located next to the candidate PT genes may function independently of the dnd or BREX type 4 systems. The synthesis of PTs in the DndC-containing BREX type 4 system was subsequently validated genetically and by LC–MS, as discussed shortly.
Sequence similarity network (SSN) analysis was performed for the 3120 closest homologues of DndC (A) and 2132 homologues of SspD (B) in the UniProt database. Each node (circle) in the network represents one DndC or SspD protein. An edge is drawn between two nodes that have a BLAST E-value of ≤10⁻¹⁰⁰ (alignment score of 100) in the DndC SSN or ≤10⁻⁵⁰ (alignment score of 50) in the SspD SSN. Some nodes are filled with colors according to their genome neighborhood structure, with a representative species shown and indicated in the SSNs with a two-letter label. The node outlines are colored red when dndD is present in the genome neighborhood of dndC, and similarly for sspD when sspBC are present in the neighborhood. For better visualization, single nodes and clusters with only a few nodes were hidden. HEPN higher eukaryotes and prokaryotes nucleotide-binding, MTase methyltransferase, NTPase nucleoside triphosphatase, NTP transf nucleotidyltransferase, top topoisomerase.
In a second putative PT defense system, a small cluster including Bacillus cereus (Fig. 2A, light green node) pairs dndC with a putative minimal nucleotidyltransferase (MNT) and a higher eukaryotes and prokaryotes nucleotide-binding (HEPN) protein. This gene neighborhood resembles a Class II MNT-HEPN toxin–antitoxin (TA) system36, in which the HEPN protein is an RNase toxin, but its activity is neutralized by adenylylation by the MNT antitoxin37,38. The HEPNs in the dndC clusters lack the RNase domain, but the neighboring MNT, ThiF, has the conserved motif GSX10DXD of an adenylyl transferase. Coupled with the adjacent NTPase, the DndC encoded in this Class II MNT-HEPN neighborhood could confer a three-component PT-based stress response system.
Finally, dndC genes clustered with P-loop protein genes in the genome neighborhoods of several small clusters, such as those containing Sulfurimonas sp. (Fig. 2A, blue node), Anaerolineales bacterium (Fig. 2A, pink node), and Burkholderia lata (Fig. 2A, brown node). The former genome neighborhood also contained a DUF262 gene; the DUF262 domain is found in the SspE restriction component of Ssp-mediated PT systems and is involved in PT-sensing anti-phage activity7.
The sspD gene neighborhoods were less complicated than those involving dndC (Fig. 2B). Again, sspD genes were associated with BREX defense system genes, as in the small cluster containing Cyanobacterium stanieri (Fig. 2B, yellow node). This reinforces the idea that BREX type 4 genes represent a PT-based defense system.
To compare the distribution of brx genes to dnd and ssp families in prokaryotes, we performed a BLASTp search with protein sequences for DndABCDE, SspABCDE, and BrxPCZL as queries in 6616 representative genomes from the Bacterial and Viral Bioinformatics Resource Center (BV-BRC; as of January 2021)39. Here, we defined the minimal genes necessary for a functional PT synthesis system as dndCD, sspBCD, brxPCZL based on the following observations: (1) DndA/SspA are often replaced by cysteine desulfurase such as IscS; (2) DndB is a non-essential regulator; (3) DndE protein is too short to search rigorously; (4) SspE is not required for PT modification; and (5) brxPCZL are the core 4 genes that define a BREX type 4 system, with the brxR regulator not essential. Based on gene locus information for each protein hit, we found the dndCD, sspBCD, and brxPCZL gene clusters present in 4.3%, 3.0%, and 0.6%, respectively, of the BV-BRC genomes (Supplementary Data 3). Notably, both dndCD and sspBCD gene clusters co-occurred in the same genomes of some Cyanobacteria, such as Gloeocapsa sp., Coleofasciculus chthonoplastes, and Scytonema hofmanni, among others (Supplementary Data 3). This complements the gene neighborhood analysis, which revealed that dndC clusters containing brx genes, MNT-HEPN genes, and DUF3696 genes are located near dndBCD operons in Cyanobacteria such as Fischerella, Nodosilinea, and Pseudanabaena (Fig. 2A, red-outlined yellow, green, and blue nodes in the main cluster). Similarly, 7 of 40 genomes with sspBCD operons located near brxCZL genes occurred in some Cyanobacteria genomes (Fig. 2B, red-outlined yellow nodes in the main cluster). These observations complement the observations of Lin et al.9 and suggest the co-evolution of the variety of PT-based epigenetic systems in the ancient photosynthetic Cyanobacteriota phylum40,41.
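As a rough illustration of how the minimal gene sets defined above (dndCD, sspBCD, brxPCZL) could be applied to BLASTp hits, the sketch below classifies a genome from the positions of its hits. The input format (gene name mapped to an ordinal gene position on a single contig) and the `max_span` proximity threshold are assumptions made for this example; the text states only that gene locus information was used to call clusters.

```python
# Minimal gene sets for a functional PT synthesis system, as defined in the text.
MINIMAL_SETS = {
    "dnd": {"dndC", "dndD"},
    "ssp": {"sspB", "sspC", "sspD"},
    "brx": {"brxP", "brxC", "brxZ", "brxL"},
}


def pt_systems(hit_positions, max_span=10):
    """Return PT systems whose minimal gene set is present and clustered.

    hit_positions: dict of gene name -> ordinal gene position on one contig
                   (hypothetical input format).
    max_span:      largest allowed distance, in gene positions, between the
                   first and last gene of a set (hypothetical threshold).
    """
    found = []
    for system, required in MINIMAL_SETS.items():
        if not required.issubset(hit_positions):
            continue
        positions = [hit_positions[gene] for gene in required]
        if max(positions) - min(positions) <= max_span:
            found.append(system)
    return sorted(found)


# Toy genomes (invented for illustration only).
genome_a = {"dndC": 100, "dndD": 101, "iscS": 900}
genome_b = {"brxP": 5, "brxC": 6, "brxZ": 8, "brxL": 9}
genome_c = {"sspB": 10, "sspC": 11}        # sspD missing
genome_d = {"dndC": 1, "dndD": 500}        # present but not clustered

for name, genome in [("a", genome_a), ("b", genome_b),
                     ("c", genome_c), ("d", genome_d)]:
    print(name, pt_systems(genome))
```

The prevalence of each system in a collection would then be the number of genomes returning that system divided by the number of genomes searched.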
BREX type 4 systems are homologous to Ssp proteins and catalyze PT synthesis
The genome neighborhood analyses revealed strong associations between the PAPS domain and NTPase genes essential for PT synthesis and brx genes from the BREX type 4 family. This association was strengthened by the homology analysis (HHpred42) of BREX family and Ssp proteins shown in Fig. 3A, which reiterates the hallmark brxPCZL genes in BREX type 4 and the lack of the PT-defining PAPS domain and NTPase genes in BREX type 1. For example, BrxP possesses both the PAPS reductase domain found in SspD and the DUF4007 domain found in SspB (Fig. 3A), which effectively renders BrxP an SspD-SspB fusion protein. Similarly, BrxC is essentially an SspC homolog, both possessing the same DUF6079 domain (Fig. 3A).
A The genomic organization of the BREX type 4 gene systems in Bacteroides salyersiae and putative Butyricimonas faecalis. Vibrio cyclitrophicus FF75 is included to demonstrate a typical ssp gene system and Escherichia coli HS to demonstrate a typical BREX type 1 system. Protein domains are color-coded and labeled. B The levels of PT dinucleotides in engineered B. salyersiae and Bacteroides thetaiotaomicron strains. ΔbrxC, ΔbrxC brxCBf, ΔbrxC sspCBo, vector, and brxCBs all differ significantly from wild-type (WT) based on a two-tailed Student’s t-test with three biological replicates (p < 0.05). Data are presented as mean values ± SD. Bs: B. salyersiae, Bf: Butyricimonas faecalis, Bo: Bacteroides ovatus. Statistical data are provided in Supplementary Data 15 and 16.
Based on these similarities, we tested the PT synthesis activity of proteins encoded by genes in the BREX type 4 family. The first step was to identify PT modifications in bacterial strains known to possess BREX type 4 genes. Here, we quantified PTs as PT-linked dinucleotides by liquid chromatography-coupled triple-quadrupole mass spectrometry (LC–MS/MS) in a limited digest with nuclease P1 and phosphatase dephosphorylation43. In Bacteroides salyersiae DSM18765, which harbors a 7-gene brx operon but no dnd or ssp genes (Fig. 3A), we detected the PT dinucleotides APSC, CPSC, and TPSC at 56, 228, and 98 per 10⁶ nt, respectively (Fig. 3B, Supplementary Fig. S1A, Supplementary Data 4). As shown in Supplementary Data 4, several other bacterial isolates with a 4–5 gene complement of brx genes (brxCLPRZ, brxCLPZ) and insufficient sets of dnd or ssp genes also yielded CPSC (Prevotella sp.), APSA and APSC (putative Butyricimonas), TPSC (Parabacteroides), and APSA (Bacteroides). These results clearly distinguished the BREX type 4 gene systems from the dnd and ssp families.
Building on this associative evidence, we validated PT synthesis activity by creating in-frame deletion mutants of brx genes in B. salyersiae. As shown in Fig. 3B, loss of brxC abolished PTs, with PTs restored by complementation with in trans expression of B. salyersiae brxC (ΔbrxC brxCBs). Based on the similarity between BrxP and SspD, we attempted to create a brxP deletion mutant in B. salyersiae but were unable to obtain this mutant, suggesting that its deletion may be deleterious. As an alternative, we reconstructed the PT pathway from B. salyersiae by cloning brxPBs, mcrABs (a PT-dependent restriction enzyme in Streptomyces coelicolor)44, and brxCBs together in an expression plasmid or brxCBs alone in the plasmid. These plasmids were transferred into Bacteroides thetaiotaomicron VPI, which lacks PT genes but possesses a SufS cysteine desulfurase homolog of DndA to provide a presumably complete system for PT modification (SufS, BrxP, BrxC) and restriction (McrA). The combination of brxPBs, mcrABs, and brxCBs resulted in the TPSC, CPSC, and APSC dinucleotides in the same proportion as wild-type B. salyersiae, whereas brxCBs alone was not able to confer PTs, which is consistent with the essentiality of BrxP in PT synthesis. Furthermore, mcrA, brxZ, brxL, and an uncharacterized Bs02795 gene were not required for PT synthesis, as shown in the corresponding mutants (Fig. 3B). To test the generality of PT synthesis by BrxCP, determine if brxC and sspC are interchangeable, and, since B. salyersiae and B. faecalis have different PT motifs, assess if BrxC is the component that determines the target motif, we heterologously expressed brxCBf from B. faecalis or sspCBo (brxC equivalent) from the human gut isolate B. ovatus CL03T12C18 in trans in the B. salyersiae brxC mutant. PTs were not detected in either strain (Fig. 3B). These results suggest species-specific complexities of the BREX system components requiring further investigation.
We conclude that, in the presence of a cysteine desulfurase gene (sufS), brxP and brxC are the minimal set of BREX genes needed for PT synthesis.
The distribution of PT-modifying systems in the human gut microbiome
Based on the observation of PT-containing microbes in the mouse and human gut microbiome23,24, we wondered about the presence of brx-based PT systems in gut bacteria and their relationship to dnd and ssp systems. To quantify the distribution of the PT systems in the human gut microbiome, we searched for PT gene clusters in 13,663 human gut microbial genomes from the BIO-ML25 and the GMbC26,27,28 collection, and from isolate sequences of human gut microbes and metagenome-assembled genomes (MAGs) in the Unified Human Gastrointestinal Genome (UHGG) collection29. As with the BV-BRC searches noted earlier, we performed a BLASTp search of proteins DndABCDE, SspABCDE, and BrxPCZL as queries in these 13,663 genomes (Supplementary Data 5). The essential gene sets dndCD, sspBCD, and brxPCZL were found to be present in 2.7%, 3.6%, and 1.4% of the human gut microbiome genomes, respectively (Fig. 4). Compared to the general BV-BRC genomes (4.3%, 3.0%, and 0.6%, respectively), this represents a 1.6-fold reduction in dnd systems, a 1.2-fold enrichment in ssp systems, and a 2.3-fold enrichment in brx systems in the gut microbiome. The distributions of the three gene clusters at the genome level were nearly exclusive, with few exceptions (Supplementary Data 5). Among the most prevalent phyla and orders in the human gut microbiome, the PT-modifying gene clusters were mainly found in Bacteroidota (Bacteroidales), Bacillota (Clostridiales), Pseudomonadota (Enterobacterales), and, to a lesser extent, Actinomycetota (Coriobacteriales) (Fig. 4). These observations raised the question of the predictive power of the genomic analyses.
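The fold changes between the two genome collections follow directly from the reported percentages. The short sketch below recomputes them; the dictionary and function names are illustrative, and the only inputs are the prevalences stated in the text for the BV-BRC and gut microbiome genomes.

```python
# Prevalence (%) of each minimal gene set, as reported in the text.
bvbrc = {"dndCD": 4.3, "sspBCD": 3.0, "brxPCZL": 0.6}
gut = {"dndCD": 2.7, "sspBCD": 3.6, "brxPCZL": 1.4}


def fold_change(reference, sample):
    """Return (direction, fold) for sample relative to reference."""
    ratio = sample / reference
    if ratio >= 1:
        return "enrichment", ratio
    return "reduction", 1 / ratio


changes = {name: fold_change(bvbrc[name], gut[name]) for name in bvbrc}
for name, (direction, fold) in changes.items():
    print(f"{name}: {fold:.1f}-fold {direction} in the gut microbiome")

# Combined prevalence of the three gene sets in the gut collection.
total_gut = sum(gut.values())
print(f"total: {total_gut:.1f}%")
```

Summing the three prevalences assumes that the gene sets rarely co-occur in one genome, which matches the statement that their distributions were nearly exclusive.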
The reference phylogeny was reconstructed from the concatenated alignment of 10 ribosomal proteins. The colors of the triangles in the tree show the taxa at the order level. The occurrence of gene clusters was quantitatively presented by the colors and size of circles. For better visualization, orders containing <5 genomes were hidden. Source data are provided in Supplementary Table S5.
Validating the predicted PT modifications in gut microbiome isolates
Given the presence of genes essential for PT synthesis in ~1000 out of 13,663 gut microbiome isolates, we next used LC–MS to validate the presence of PT dinucleotides in DNA in a collection of 226 bacterial isolates possessing dndCD, sspBCD, brxPCZL gene neighborhoods or individual PT synthesis genes not predicted to lead to PTs (Supplementary Data 4). In total, we identified 8 PT dinucleotides, including APSA, APSC, CPSA, CPSC, GPSA, GPSC, GPST, and TPSC, and did not detect an additional two dinucleotides (CPST, GPSG) that we found in human fecal DNA samples (Supplementary Data 4)24. Previous studies in three bacterial species with dnd and ssp genes4,43,45 revealed GPSA and GPST at >500 per 10⁶ nt and CPSC at >2000 per 10⁶ nt, with CPSA, APSA, APSC, and TPSC detected at much lower levels of 1–6 per 10⁶ nt4,43,45. The minor PT dinucleotides represent low-affinity binding sites for the DNA shape-selective Dnd proteins45. In sharp contrast, however, these minor sites in gut microbiome bacteria represent major PT modification sites, as high as 1800 per 10⁶ nt in bacteria with brxPCZL genes (Supplementary Data 4).
The APSA, APSC, CPSA, CPSC, GPSA, GPSC, GPST, and TPSC dinucleotides were distributed unevenly among the isolates based on the type of PT modification system, with four groups emerging from the analyses (Fig. 5). The first group of 39 isolates was characterized by GPSA and the predominance of dnd genes. The largest portion of Group 1 (32 isolates) possessed both GPSA and GPST and harbored dndCD genes with or without dndBE, including isolates from the Bacteroidales and Clostridiales orders. Five isolates from both Bacteroidales and Clostridiales harboring the minimal dndCD gene set showed GPSA and GPSC. One isolate possessed a solitary brxP but no dnd genes and yet showed both GPSA and GPST. Given the presence of different dinucleotides for the ssp and brx gene families, this latter observation could be explained by genome sequencing errors or a sample mix-up during regrowth, either of which could be resolved by re-sequencing. A single Blautia species harboring dnd genes showed only GPSA (Supplementary Data 4).
The second group of 34 isolates was characterized by the CPSC dinucleotide and ssp genes, with all but 2 harboring sspBCD ±sspE (Fig. 5). One isolate harbored dndCD genes that are not proximal to the sspBCD operon and showed LC–MS evidence of low levels of GPSA and GPST at the detection limit of ~2 PTs per 10⁶ nt (Supplementary Fig. S2), though these signals await high-resolution mass spectrometric validation. One isolate in this group lacks sspBC genes, and one lacks all essential sspBCD genes.
The numbers of isolates analyzed are listed on the left. The circles represent PT dinucleotides quantified by LC–MS (left) and the presence of corresponding genes (right). The PT-modified consensus motifs were characterized in one representative isolate from each group using PT-seq. t.b.d., to be determined. *The CPSAG motif was characterized using metagenomics PT-seq in another ongoing study. **GPSA and GPST were detected using LC–MS QQQ, but the exact mass was not verified by high-resolution mass spectrometry.
The third group of 30 isolates carried the BREX type 4 genes brxPCZL and the most diverse sets of PT dinucleotides. Four isolates of Parabacteroides contain CPSA, 12 isolates of Prevotella sp. contain CPSC, 3 isolates of Parabacteroides contain TPSC, and 9 isolates of putative Butyricimonas faecalis contain APSA and APSC (Supplementary Fig. S1B). The B. salyersiae strain examined earlier contains CPSC, TPSC, and APSC (Fig. 5, Supplementary Data 4).
The fourth group of 122 bacteria lacked reliable LC–MS signals for PT dinucleotides. The majority (83) lacked the minimal sets of dndCD, sspBCD, or brxPC gene clusters. For example, Bacteroides dorei CL03T12C01 harbors a PAPS reductase-encoding gene next to an ATPase without an adjacent dndD (Supplementary Fig. S3). Several isolates harbor dndD and dptH next to a methylase and a restriction enzyme but lack dndC (Supplementary Fig. S3). The absence of PT dinucleotides in these isolates agrees with the requirement for a PAPS reductase domain-containing protein and an NTPase. However, several (10) carry the minimal set of dndCD or sspBCD genes (Fig. 4), so PT dinucleotides were expected. It is possible but unlikely that the lack of PT dinucleotides detected in the isolates is due to a symbiont phenotype in the gut microbiome, with loss of one of the PT synthesis genes accounted for by uptake of PT intermediates from other bacteria. The PT modification pathway does not require unique small-molecule intermediates, and the modification genes are generally not interchangeable among different bacteria, so it is unlikely that gut microbes can lose one or more PT modification genes and exist as a “PT symbiont” in the gut community. Again, we cannot rule out errors in the reference genomes due to genome sequencing and sample handling.
Although based on a limited number of samples, these observations suggest the hypothesis that (1) Dnd proteins produce GPSA, GPSC, and GPST, (2) Ssp proteins produce CPSC, and (3) Brx proteins produce everything except GPSA, GPSC, and GPST. Understanding the basis for these differences requires knowledge about the longer consensus sequences containing the PT dinucleotides in the gut microbes.
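The working hypothesis above can be written as a simple lookup from an observed dinucleotide profile to the PT systems compatible with it. This is only a sketch of the hypothesis as stated, using the flattened dinucleotide labels from this text; the repertoires are taken from the isolates described here and may be incomplete, and the function name is illustrative.

```python
# Dinucleotide repertoires per system, following the hypothesis in the text.
REPERTOIRES = {
    "dnd": {"GPSA", "GPSC", "GPST"},
    "ssp": {"CPSC"},
    "brx": {"APSA", "APSC", "CPSA", "CPSC", "TPSC"},
}


def compatible_systems(observed):
    """Return systems whose repertoire covers every observed dinucleotide."""
    observed = set(observed)
    if not observed:
        return []
    return sorted(
        system for system, repertoire in REPERTOIRES.items()
        if observed <= repertoire
    )


print(compatible_systems({"GPSA", "GPST"}))
print(compatible_systems({"APSC", "CPSC", "TPSC"}))
print(compatible_systems({"CPSC"}))
print(compatible_systems({"GPSA", "CPSC"}))
```

A profile containing only CPSC is compatible with both ssp and brx systems, so the gene content is still needed to distinguish them, and a mixed profile such as GPSA with CPSC matches no single system under this hypothesis.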
Novel PT consensus sequences in gut microbiome isolates
Here, we defined novel larger PT consensus sequences, especially for the BREX type 4 system, in gut microbiome isolates using an innovative and highly sensitive NGS technique, PT-seq30. The idea here is that the PT dinucleotides, such as GPSA and GPST, are found in the larger consensus sequences of GPSATC and 5′-GPSAAC-3′/5′-GPSTTC-3′ in several types of bacteria5. Application of PT-seq (Supplementary Fig. S4) to the BREX type 4-containing human gut microbiome isolate B. salyersiae DSM18765, earlier found by LC–MS/MS to possess APSC, CPSC, and TPSC at 62, 228, and 98 per 10⁶ nt (Supplementary Data 4), showed 5459 PT sites: 56 at APSCTC, 3881 at CPSCTC, and 1513 at TPSCTC (Fig. 6A, Supplementary Data 6). As observed with other dnd and ssp systems, the Brx proteins modified only a portion of the 27,274 ACTC (0.2%), 22,698 CCTC (17%), and 36,290 TCTC (4%) total sites available, respectively. The observation of PTs at 3 NCTC sites, with a strong preference for CCTC, is consistent with previous observations that Dnd proteins select their target sequences based on DNA shape rather than precise binding contacts45. We appreciate the discrepancy between LC–MS/MS and PT-seq data for the quantity of APSC in the APSCTC context in B. salyersiae DSM18765. We previously observed that the efficiency of iodine-induced cleavage of PT sites to strand breaks in the PT-seq method varies by 2-fold for different dinucleotides30 and that the extension efficiency of the terminal transferase used for T-tailing depends on the nucleobase sequence of the initiators, where A-ending initiators are generally less favored46,47,48. Together, these could account for some portion of the discrepancy between the LC–MS and PT-seq data for APSC. However, in another BREX type 4-containing putative B. faecalis (GMbC ID 5893AJ_0218_015_F2), PT-seq revealed 9832 GAPSAG and 5536 GAPSCG (Supplementary Data 7).
These sites agreed with the APSA and APSC dinucleotides detected by LC–MS/MS (Supplementary Data 4). It thus remains to be determined if the APSCTC motif is uniquely problematic for PT-seq. Finally, PT-seq also detected sites with bistranded PTs: 1080 GAPSAG/CTPSTC and 667 GAPSCG/GCPSTC. However, we could not detect reliable signals for the corresponding TPST and TPSG dinucleotides by LC–MS/MS, which may be due to their low abundance and to insensitive MS detection of T-containing nucleotides.
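The fraction of available motif sites carrying a PT can be recomputed from the site counts reported for B. salyersiae. The sketch below also includes a small helper for counting a motif on both strands of a sequence; the helper, its names, and the toy sequence are assumptions for illustration and are not the PT-seq analysis pipeline, and whether the reported totals were counted on one or both strands is not stated here.

```python
def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_motif_both_strands(genome, motif):
    """Count overlapping occurrences of motif on both strands."""
    def count(seq):
        k = len(motif)
        return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)
    return count(genome) + count(revcomp(genome))


# PT-modified and total motif sites for B. salyersiae, as reported in the text.
modified = {"ACTC": 56, "CCTC": 3881, "TCTC": 1513}
available = {"ACTC": 27274, "CCTC": 22698, "TCTC": 36290}

percent_modified = {
    motif: 100 * modified[motif] / available[motif] for motif in modified
}
for motif, pct in percent_modified.items():
    print(f"{motif}: {pct:.1f}% of sites modified")

# Toy sequence (invented) to show the strand-aware motif count.
toy = "CCTCAAGAGG"
print(count_motif_both_strands(toy, "CCTC"))
```

In the toy sequence, one CCTC lies on the forward strand and a second is found on the reverse strand, where it appears as GAGG in the forward sequence.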
A Pie charts depict the number of single- or bi-stranded PT-modified consensus sequences. The structural diagrams depict these motifs. B Analysis of the distribution of PT consensus sequences and modified sites in 1 kb upstream and downstream regions in B. salyersiae (left) and putative B. faecalis (right). The number of total motif sites in the sense strand (upper) or both strands (lower) is represented in pink. The PT-modified sites are represented in purple. The fraction of PT-modified motif sites is represented by the black line.
Application of the original PT-seq protocol without biotin labeling and capturing to a dnd system-containing human gut microbiome isolate, Lachnospiraceae sp. (GMbC ID 2807EA_1118_063_H5), revealed 1474 GPSAGC and GPSCTC sites occurring among the 16,683 possible GAGC/GCTC sites, with 1154 modified on one strand and 160 modified on both strands (Fig. 6A, Supplementary Data 8). A large fraction of pileup sites occurring at NTTT and other motifs represents artifacts of poly(dT) tailing at single-strand nicks by terminal transferase.
The uneven genomic distribution of PTs
In addition to defining the consensus sequences for BREX type 4 and other PT synthesis systems, PT-seq also revealed that PTs are nonrandomly distributed across bacterial genomes, which agrees with single-molecule real-time (SMRT) sequencing maps in Escherichia coli13. For example, the percentage of gene classes and intergenic regions containing PTs in Lachnospiraceae sp., B. salyersiae, and B. faecalis varied significantly: 6% of intergenic regions, 13% of tRNA genes, and 58% of coding sequences (CDS) (Supplementary Data 9). The percentage of consensus sequences modified with PTs, which is roughly 10% in genomes studied to date12,13, also varied in individual bacterial genomes, ranging from 1% in tRNA genes in Lachnospiraceae sp. to 40% in rRNA genes in B. faecalis (Supplementary Data 9). Interestingly, the consensus sequences recognized by PT synthesis proteins were underrepresented at the boundaries of CDS (Supplementary Fig. S5). We further assessed the distribution of PTs within and between CDSs by quantifying PT-modified and unmodified consensus sequences in the sense strand of gene bodies and in regions 1 kbp upstream and downstream of all CDSs. For both B. faecalis and B. salyersiae, this analysis revealed that the percentage of PT-modified consensus sequences was lower at the boundaries of coding regions (Fig. 6B, black line; Supplementary Fig. S6). Published PT-seq data for E. coli BW25113, which has a bistranded GPSAAC/GPSTTC PT consensus, also showed this biased distribution of PTs (Supplementary Fig. S7A). Thus, both the consensus sequences for PTs and the proportion modified with PT were underrepresented on either side of CDSs. While PTs have been proposed to interfere with transcription13, the percentage of PT-modified consensus sequences did not correlate with the level of transcription activity (Supplementary Fig. S7B), nor did the number of PTs in possible promoter regions (Supplementary Fig. S7C).
The basis for these biased distributions remains to be defined.
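A boundary analysis of this kind can be sketched by binning motif sites by their offset from CDS start positions and computing the PT-modified fraction per bin. The function below is a simplified illustration under stated assumptions: all genes are treated as forward-strand, only the region around the CDS start is considered, and the `flank` and `bin_size` defaults and the toy coordinates are hypothetical rather than the parameters used in this study.

```python
from collections import Counter


def modified_fraction_by_bin(motif_sites, pt_sites, cds_starts,
                             flank=1000, bin_size=200):
    """Fraction of motif sites that are PT-modified, binned by offset.

    motif_sites: genome coordinates of all consensus motif sites
    pt_sites:    coordinates of PT-modified motif sites (subset of motif_sites)
    cds_starts:  coordinates of CDS start positions (forward strand assumed)
    Returns a dict mapping the left edge of each offset bin to the fraction.
    """
    pt = set(pt_sites)
    total, modified = Counter(), Counter()
    for start in cds_starts:
        for pos in motif_sites:
            offset = pos - start
            if -flank <= offset < flank:
                # Floor division keeps upstream (negative) bins aligned.
                left_edge = (offset // bin_size) * bin_size
                total[left_edge] += 1
                if pos in pt:
                    modified[left_edge] += 1
    return {edge: modified[edge] / total[edge] for edge in sorted(total)}


# Toy coordinates (invented for illustration only).
fractions = modified_fraction_by_bin(
    motif_sites=[900, 950, 1050, 1500],
    pt_sites=[950, 1500],
    cds_starts=[1000],
)
print(fractions)
```

A dip in these fractions in the bins nearest offset zero would correspond to the lower percentage of PT-modified consensus sequences at coding region boundaries described above.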
Discussion
Here, we applied systems-level informatics, mass spectrometric, and sequencing tools to discover and characterize new PT modification systems in bacteria, with a focus on gut microbes. A rigorous neighborhood analysis of the key PT synthesis genes in dnd and ssp systems led to the discovery of several potential previously undescribed PT modification systems, including MNT-HEPN, DUF262, and BREX type 4 gene families. Genetic and bioanalytical validation of the BREX type 4 system revealed the essentiality of brxPC genes for PT synthesis in bacteria with brxPCZL clusters. With the discovery of the dnd system in 20053, the ssp system in 20207, and the tdp system in 202511, the brxPCZL cluster represents the fourth established PT modification system. While the BREX type 4 system has been shown to have anti-phage activity35, such activity for the variant type 4 systems observed here awaits investigation.
The extent of distribution of the dnd, ssp, and brx gene clusters was assessed first among 6616 representative prokaryotic genomes and then among 13,663 gut microbiome isolates. The difference in the frequency of dndCD, sspBCD, and brxPCZL gene clusters in the BV-BRC general set of bacteria (4.3%, 3.0%, and 0.6%, respectively) and the gut microbiome isolates (2.7%, 3.6%, and 1.4%, respectively) suggests a bias for the BREX type 4 systems in the gut environment. The total of 7.7% of gut microbiome isolates possessing PT-synthesis genes is consistent with the estimate of 5–10% of stool microbes possessing PT modifications, as we observed using mass spectrometric measurement of PT dinucleotides in human fecal DNA24.
With the discovery of BREX type 4 as the fourth PT synthesis gene family, one feature of PT modifications appears to be universal for PT epigenetics: partial modification of available consensus sequences in individual genomes. In agreement with previous sequencing studies with bacteria containing Dnd proteins13,49, PT-seq analysis revealed that the BREX type 4 proteins also only partially modify their four-nucleotide consensus sequences, ranging from 0.2% at ACTC to 17% at CCTC in B. salyersiae (Fig. 5A, Supplementary Data 6). Partial modification of all available sequence motifs, hemi-modification within a sequence motif, cell replication-dependent shifts in the locations of modified motifs, and active maintenance of a constant density of PT modifications have all been observed in PT systems identified to date5,13,19,45, despite the presence of active restriction components13. While some portion of the hemi-modification and partial modification could result from naturally occurring oxidation and repair or replacement19, the unusual behavior of all PT systems to date appears to be inherent to the mechanisms of modification target selection and surveillance for restriction. The BREX type 4 system also highlights another feature of PT systems: sequence-selectivity of the dnd, ssp, tdp, and brx gene families. Previous7,8,50 and present studies (Fig. 4) consistently show that ssp systems catalyze CPSC dinucleotides mainly in CCA motifs, while the dnd systems insert PTs mainly in GN motifs: GPSA, GPSC, GPSG43, and GPST dinucleotides. The latter occurs mainly as bistranded modifications of GNNC motifs. To date, a tdp consensus of GPSATC has been observed in a few bacteria11. We previously showed that Dnd proteins select modification sites based on DNA shape, with GAAC, GTTC, and GATC all sharing similar shapes and all being modified with PTs at GPSA and GPST in S. enterica45. The fact that ssp is selective for CCA suggests a similar shape selectivity.
BREX type 4-dependent PT systems are more complicated, which may reflect the hybrid nature of the PT-synthesizing gene families in different bacteria. The five types of PT nucleotides produced by Brx proteins (CPSA, CPSC, APSC, APSA, TPSC) share CPSC with ssp systems but lack the GPSN unique to dnd systems (Fig. 5A). This may be partly explained by the sequence similarities between SspBCD and BrxPC proteins. The shape selectivity argument is again supported with brxPCZL in B. salyersiae DSM18765, with the ACTC, CCTC, and TCTC sites in an NCTC motif all modified to differing extents (Fig. 5A) and in B. faecalis with GAAG and GACG motifs. However, the limited number of strains analyzed in the present studies and published work may have missed dnd, ssp, or brx systems that confer other dinucleotide patterns.
Our studies revealed the complexity of interactions among the three PT systems, with widespread combinations of the various components of all three systems and coexistence of two or more gene clusters in the same organism (Fig. 1, Supplementary Data 3). These evolutionary co-occurrences likely reflect a fitness advantage, such as that observed for the presence of Dnd and Ssp together, which provides complementary and synergistic protection against temperate and lytic phages as well as phage induction50. The co-occurrence of combinations of all three established PT systems and two putative systems in Cyanobacteria supports the hypothesis of an evolutionary divergence that may originate from a sulfur-based metabolism in an ancient cyanobacterial ancestor9. We found 7 strains harboring the dndBCD-brxCZL type gene islands, while 7 of the 40 strains harboring the sspBCDB-brxCZL type (or similar) gene islands are Cyanobacteria and 3 of the 40 are Deinococcota. Phylogenetic analyses suggest that Deinococcota and Cyanobacteria are both relatively close to the root of the bacterial tree of life51, which agrees with the argument that dnd systems originated in ancient Cyanobacteria after the Great Oxygenation Event9. Similarly, BREX systems have also been found in branches near the root of the bacterial tree and deep branches35. All four PT systems contain a gene encoding a PAPS reductase domain protein that has been proposed to be involved in the initial sulfur mobilization step5. The observation of a bias toward brx-based PT systems and the unique biochemical properties of PTs raise questions about the role of PT epigenetics in the human gut microbiome in health and disease.
Methods
Bacterial strains and growth conditions
Bacteria revived from BIO-ML and GMbC were cultured as previously described25. Parabacteroides spp. and Bacteroides spp., E. faecalis, and BIO-ML isolates were grown on ASF plates (Becton Dickinson) and BHIS plates52, BHI plates (Becton Dickinson), and Brucella agar (Catalog #AS-141, Anaerobe Systems, Morgan Hill, CA), respectively. Culture manipulations were performed at 37 °C in an anaerobic chamber with an atmosphere of 2.5–3.5% hydrogen, 10% carbon dioxide, and the remainder nitrogen. Growth on agar plates was harvested with phosphate-buffered saline (Catalog #10010-023, Gibco, Paisley, PA). The bacterial suspensions were pelleted by centrifugation at 4000×g for 15 min at ambient temperature. The supernatant was removed, and the pellets were stored at −20 °C.
Bacterial strains and plasmids used and generated for BREX type 4 system deletion and reconstruction are listed in Supplementary Table S1. Bacteroidales strains were grown in basal liquid medium53 or on BHIS plates52. Antibiotics used for selection include erythromycin (10 µg/mL), gentamicin (200 µg/mL), and anhydrotetracycline (50 ng/mL). E. coli S17 λ pir was grown in LB broth or on LB plates with carbenicillin (100 µg/mL) added for selection.
Creation of deletion mutants and complementing clones
Internal non-polar deletion mutants of brx genes, including mcrA (HMPREF1532_02792), brxC (HMPREF1532_02793), brxZ (HMPREF1532_02794), brxL (HMPREF1532_02795), and Bs02796 (HMPREF1532_02796), were constructed by amplifying DNA upstream and downstream of each gene using the synthetic primers listed in Supplementary Table S2. These flanking pieces were cloned into BamHI-digested pLBG1352 using NEBuilder (New England BioLabs) and transformed into E. coli S17 λ pir. PCR-confirmed plasmids were sequenced (Plasmidsaurus) to verify error-free amplification. The correct construct was conjugally transferred from E. coli into B. salyersiae, and cointegrates were selected on gentamicin/erythromycin plates. Double recombination cross-outs were selected on BHIS plates with anhydrotetracycline (aTC, 50 ng/mL) and screened by PCR for the mutant genotype.
Genes expressed in trans in B. thetaiotaomicron and B. salyersiae ΔbrxC were PCR amplified and cloned into BamHI-digested pFD34054 using NEBuilder v2.5. Transformants were PCR screened, and the plasmid was sequenced. Plasmids were conjugally transferred from E. coli to B. salyersiae, and transconjugants were selected on gentamicin/erythromycin plates.
Sequence similarity networks
Sequence similarity networks (SSNs) were generated by submitting the sequences of E. coli B7A DndC (GenBank AIF62362.1) and V. cyclitrophicus FF75 SspD (NCBI RefSeq WP_016789110.1) to the EFI-EST webtool v2021_0331 using the BLAST option (E-value cutoff 10⁻⁵, maximum number of sequences retrieved 5000). The initial SSN was generated with an alignment score cutoff set such that each connection (edge) represents a sequence identity of ~40%. Sequences that share 100% sequence identity were grouped into a single node. More stringent SSNs were created by increasing the alignment score cutoff in small increments (usually by 5–10). This process was continued until each cluster was estimated to be isofunctional, meaning that its nodes represent enzymes that catalyze the same reaction. Isofunctionality was determined by mapping the known dnd clusters and following their movements as the alignment cutoff increased until the dnd clusters fell into subclusters. The network was visualized in alignment score weighted Prefuse Force-Directed Layout using Cytoscape v3.955. The resulting SSNs were submitted to EFI-EST webtool31 for genome neighborhood analysis with default settings. Nodes were highlighted in SSNs if the neighboring Pfam families, with a maximal median distance of no more than 4 and a minimal co-occurrence rate larger than 0.2, have NTPase activity, NTP binding, or DNA binding activities.
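The effect of raising the alignment score cutoff can be sketched as follows. This is an illustrative toy only, assuming a network given as scored edges. It is not the EFI-EST implementation, and the node names and scores are hypothetical.

```python
def clusters(nodes, edges, cutoff):
    """Connected components of the network keeping only edges with score >= cutoff."""
    parent = {n: n for n in nodes}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score in edges:
        if score >= cutoff:
            parent[find(a)] = find(b)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical nodes and alignment scores.
nodes = ["dndC_1", "dndC_2", "sspD_1", "sspD_2"]
edges = [("dndC_1", "dndC_2", 80), ("sspD_1", "sspD_2", 75), ("dndC_2", "sspD_1", 40)]

loose = clusters(nodes, edges, cutoff=35)   # one merged cluster
strict = clusters(nodes, edges, cutoff=45)  # DndC-like and SspD-like nodes separate
```

In this toy network the weak edge between the two families is dropped once the cutoff exceeds its score, which mirrors how known dnd clusters were followed until they fell into subclusters.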
Sequence analyses
For sequence analyses, the BLAST tools v2.956 (with E-value cutoff of 10⁻¹⁰ and query coverage of 25%) and HHpred v2020071742, and the resources of BV-BRC vbeta39 were routinely used. In total, the data from 20,279 bacterial and archaeal genomes were retrieved from the BIO-ML25, GMbC26,27,28, Unified Human Gastrointestinal Genome (UHGG)29, and BV-BRC representative bacteria as collected in January 2021. The queries used in this study are listed in Supplementary Data 10, including IscS (GenBank AIF64277.1), DndB (GenBank AIF62361.1), DndC (GenBank AIF62362.1), DndD (GenBank AIF62363.1), and DndE (GenBank AIF62364.1) in E. coli B7A; SspA (RefSeq WP_016789103.1), SspB (RefSeq WP_022570853.1), SspC (RefSeq WP_016789109.1), SspD (RefSeq WP_016789110.1), and SspE (RefSeq WP_016789111.1); and BrxP (GMbC tag OIFBNFKG_02855), BrxC (GMbC tag OIFBNFKG_02856), BrxZ (GMbC tag OIFBNFKG_02857), and BrxL (GMbC tag OIFBNFKG_02858). The genes dndCD, sspBCD, and brxPCZL were considered to be present when all genes were adjacent.
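The adjacency criterion can be sketched as below, assuming each BLAST hit has already been reduced to a contig name, the gene's ordinal position on that contig, and the matched protein family. The input layout and the function name `cluster_present` are hypothetical. Gene order within the window is ignored here, which is an assumption rather than a stated rule.

```python
def cluster_present(genes, required):
    """Return True if the required families occupy consecutive gene positions.

    genes: iterable of (contig, ordinal, family) tuples, one per BLAST hit.
    required: collection of family names, e.g. ["brxP", "brxC", "brxZ", "brxL"].
    """
    by_contig = {}
    for contig, ordinal, family in genes:
        by_contig.setdefault(contig, {})[ordinal] = family

    wanted = set(required)
    k = len(wanted)
    for families in by_contig.values():
        for start in families:
            # Families found at k consecutive positions; missing positions give None.
            window = {families.get(start + i) for i in range(k)}
            if window == wanted:
                return True
    return False

# Hypothetical hits: a contiguous brxPCZL cluster and one interrupted by a gap.
contiguous = [("c1", 10, "brxP"), ("c1", 11, "brxC"), ("c1", 12, "brxZ"), ("c1", 13, "brxL")]
interrupted = [("c1", 10, "brxP"), ("c1", 11, "brxC"), ("c1", 12, "brxZ"), ("c1", 14, "brxL")]
brx = ["brxP", "brxC", "brxZ", "brxL"]
```

With these inputs, `cluster_present(contiguous, brx)` is true and `cluster_present(interrupted, brx)` is false, since an unmatched gene sits between brxZ and brxL in the second case.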
Phylogeny tree
We first built a concatenated alignment of 10 nearly universal and single-copy ribosomal protein families. We used DIAMOND v0.8.2257 (with parameters blastx --more-sensitive -e 0.000001 --id 35 --query-cover 80) to search all proteomes in our collection against the RiboDB database v1.4.158 of bacterial ribosomal protein genes. We excluded proteins bL17, bS16, bS21, uL22, uS3, and uS4, as they were not sufficiently distributed across all genomes. In each RiboDB gene family, we excluded genomes that contained gene duplicates. Then, we aligned all protein families individually with MUSCLE v5.159 (with default settings). We filtered out misaligned sites using BMGE v1.1260 (with parameters -t AA -g 0.95 -m BLOSUM30) and concatenated all individual alignments using Seaview v4.761. The phylogenomic tree was reconstructed using FastTree v2.1.1062 (with parameters -lg -gamma) and visualized and modified in iTOL v563.
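The concatenation step can be sketched in a few lines, assuming each filtered protein-family alignment is held as a mapping from genome name to aligned sequence. Padding genomes that lack a family with gap characters is an assumption made for illustration; the study performed this step in Seaview, and the names below are hypothetical.

```python
def concatenate(alignments):
    """Join per-family alignments into one supermatrix, one row per genome.

    alignments: list of dicts mapping genome name -> aligned sequence.
    Genomes absent from a family are padded with '-' for that block.
    """
    genomes = sorted({g for aln in alignments for g in aln})
    rows = {g: [] for g in genomes}
    for aln in alignments:
        lengths = {len(s) for s in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must share a length")
        width = lengths.pop()
        for g in genomes:
            rows[g].append(aln.get(g, "-" * width))
    return {g: "".join(parts) for g, parts in rows.items()}

# Hypothetical toy alignments for two ribosomal protein families.
family_a = {"genome1": "MK-L", "genome2": "MKAL"}
family_b = {"genome1": "GG"}  # genome2 excluded here, e.g. because of a duplicate
supermatrix = concatenate([family_a, family_b])
```

Every row of the resulting supermatrix has the same length, which is what tree-building programs such as FastTree expect as input.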
DNA isolation
The cells were resuspended in 500 μL of phosphate-buffered saline upon receiving and re-pelleted by centrifugation at 6000×g for 5 min at 4 °C. Genomic DNA was isolated using E.Z.N.A Bacterial DNA kit (Catalog # D3350-02), with the bead beating option (speed of 4 m/s, 45 s “on”, 1 min rest, 3 cycles). DNA was eluted with 150–200 μL of RNase-free water and stored at −80 °C.
Digestion of DNA for LC-MS/MS analysis of PT dinucleotides
DNA (20 μg, 78 μL) was incubated with Nuclease P1 (1.5 U, 3 μL, US Biological) in 30 mM ammonium acetate pH 5.3 and 0.5 mM ZnCl2 (90 μL total reaction volume) for 2 h at 55 °C. The reaction mixture was diluted with Tris–HCl (100 mM final concentration, pH 8.0, 9 μL) and incubated with calf intestinal alkaline phosphatase (51 U, 3 μL, Sigma) for 2 h at 37 °C. Enzymes were removed by passing the mixture through a VWR 10 kDa spin filter with centrifugation at 12,000×g for 12 min. The solution was lyophilized to dryness and resuspended in H2O (50 μL).
LC–MS/MS analysis of DNA PT dinucleotides
Synthetic PT DNA dinucleotides or nuclease P1 hydrolyzed DNA were analyzed by LC–MS/MS on an Agilent 1290 series HPLC system equipped with a Synergi Fusion RP column (2.5 μm particle size, 100 Å pore size, 100 mm length, 2 mm inner diameter) and a DAD. The HPLC was coupled to an Agilent 6490 triple quadrupole mass spectrometer. The column was eluted at 0.35 mL/min at 35 °C with a linear gradient of 3–9% acetonitrile in 97% solvent A (5 mM ammonium acetate, pH 5.3) over 15 min. The column was rinsed with 95% acetonitrile in solvent A for 1 min, and then the initial conditions were regenerated by rinsing the column with 97% solvent A for 3 min. Canonical deoxyribonucleosides that eluted from the column were quantified by their 260 nm absorbance with the DAD. PT-containing dinucleotides were identified and quantified by tandem quadrupole mass spectrometry with electrospray ionization operated with the following parameters: N2 temperature, 200 °C; N2 flow rate, 14 L/min; nebulizer pressure, 20 psi; capillary voltage, 1800 V; and fragmentor voltage, 380 V. For product identification, the mass spectrometer was operated in positive ion multiple reaction monitoring mode using the conditions tabulated in Supplementary Table S3. No statistical method was used to predetermine sample size. Comparisons between Bacteroides WT and engineered strains were based on three biological replicates analyzed using a t-test. For the remaining species, only one sample was analyzed owing to limited material availability, and the most recent technical replicate was included in this study.
High-resolution mass spectrometry
Synthetic PT dinucleotides (2 pmol per 10 μL injection) or nuclease P1-hydrolyzed DNA (4 μg per 10 μL injection) were analyzed on a Dionex Ultimate 3000 UHPLC system equipped with a Synergi Fusion RP column (2.5 μm particle size, 100 Å pore size, 100 mm length, 2 mm inner diameter). The HPLC was coupled to a Thermo Fisher Q Exactive Hybrid Quadrupole-Orbitrap mass spectrometer. The column was eluted at 0.35 mL/min at 35 °C with a linear gradient of 3–9% acetonitrile in 97% solvent A (5 mM ammonium acetate, pH 5.3) over 15 min. The column was rinsed with 95% acetonitrile in solvent A for 1 min, and then the initial conditions were regenerated by rinsing the column with 97% solvent A for 3 min. High resolution mass spectra for the PT-containing dinucleotides were obtained by hybrid quadrupole-Orbitrap mass spectrometry with the following parameters: sheath gas flow rate, 50 L/min; aux gas flow rate, 15 L/min; sweep gas flow rate, 3 L/min; spray voltage, 4.20 kV; and capillary temperature, 275 °C. For product identification, the mass spectrometer was operated in positive ion targeted single ion monitoring mode using the conditions tabulated in Supplementary Table S4.
PT-seq library preparation
PT modifications were mapped in the genomes of Lachnospiraceae sp., B. faecalis, and B. salyersiae by PT-seq as described elsewhere30 (Supplementary Fig. S4). DNA (10 µg) was diluted to 30.5 µL in filtered H2O and subjected to four blocking cycles, each consisting of the following: (a) DNA was heated at 94 °C for 2 min to denature and immediately cooled on ice for 2 min. (b) rSAP (1 µL) and rCutSmart buffer (3.5 µL) were added, the reaction was incubated for 30 min at 37 °C, and the phosphatase was heat deactivated for 10 min at 65 °C. (c) Four ddNTPs (1 µL, 2 mM each), TdT reaction buffer (1.5 µL), CoCl2 (5 µL, 0.25 mM), terminal transferase (1 µL, 20 units), and filtered H2O (3.5 µL) were added and the reaction was incubated for 60 min at 37 °C. Fresh reagents were added to each subsequent cycle at specific steps: rSAP (1 µL) and rCutSmart buffer (0.3 µL) were added at Step b, while ddNTPs (0.3 µL each), TdT reaction buffer (0.3 µL), CoCl2 (0.3 µL), and terminal transferase (1 µL) were added at Step c. After all cycles were complete, excess unincorporated ddNTPs were removed by treatment with a DNA Clean & Concentrator kit (Zymo #11-304C). For iodine cleavage of PTs, the blocked DNA (32 µL) was incubated with 500 mM Tris–HCl pH 9.0 (4 µL) and iodine solution (4 µL, 5 mM) (Fluka #318981-100) for 10 min at 65 °C and cooled to 4 °C at a rate of 0.1 °C/s. The reaction was then split into two 20 µL aliquots, each of which was filtered using a single DyeEX column (QIAGEN #63206) to remove salts and iodine, and the eluents were recombined. Filtered DNA was again denatured by heating at 94 °C for 2 min and then cooling on ice for 2 min. Thereafter, the reaction was adjusted with rCutSmart buffer (5 µL), rSAP (1 µL), and filtered H2O to a final volume of 50 µL, and incubated for 30 min at 37 °C to remove 3ʹ-terminal phosphates arising from iodine cleavage. The phosphatase enzyme was then inactivated for 10 min at 65 °C.
The reaction was further adjusted with dTTP (1 µL, 1 mM) (NEB #N0443S), TdT reaction buffer (1 µL), CoCl2 (6 µL, 0.25 mM), filtered H2O (1 µL), and terminal transferase (1 µL, 20 units), and incubated for 45 min at 37 °C to add dT-tails to the 3ʹ-terminal ends created specifically by iodine at PT sites. The reaction was split into three 20 µL aliquots, excess dTTP was removed using three DyeEX columns, and then eluents were recombined. The reaction (60 µL) was then adjusted with TdT reaction buffer (7.8 µL), CoCl2 (7.8 µL), ddUTP-biotin (1 µL, 1 mM) (Jena Bioscience #NU-1619-BIOX-S), filtered H2O (0.4 µL), and terminal transferase (1 µL) and incubated for 60 min at 37 °C to terminate the dT-tails with a conjugated biotin moiety. After cleaning with four DyeEX columns and recombining eluents, the processed DNA was diluted in filtered H2O (500 µL) and fragmented by probe sonication as described above. Thereafter, DNA fragments were mixed with Hydrophilic Streptavidin Magnetic Beads (5 µL) (NEB #S1421S) and binding buffer (500 µL) (5 mM Tris–HCl, pH 7.5, 1 M NaCl, 0.5 mM EDTA) and incubated for 60 min on a Nutator at room temperature. The beads were subjected to a pull-down process using a magnetic separation rack (Dynal #MPC-S) and washed three times with binding buffer (100 µL). After discarding the supernatant, beads were resuspended in filtered H2O (20 µL). Finally, the enriched DNA that was captured on the beads was directly subjected to Illumina library preparation using a SMART ChIP-seq kit (Takara #634865) following the manufacturer’s protocol. The final PCR step was performed using Illumina primers provided in the kit, and 12 cycles were run for amplification. The PCR product, with unique sequencing barcodes, was submitted to the MIT BioMicro Center for next-generation sequencing.
Paired-end sequencing (150-bp) was performed on an Illumina MiSeq instrument, using a Custom Read2 sequencing primer that was supplied in the kit (poly(dA) followed by universal read 2 primer).
Data analysis
As illustrated in the workflow (Supplementary Fig. S4B), the data analysis started with trimming adapters. Adapters were removed using bbduk from BBtools v35.85 (sourceforge.net/projects/bbmap/) with parameters ktrim=r k=18 hdist=2 hdist2=1 rcomp=f mink=8 qtrim=r trimq=30 for R1 and ktrim=r k=18 mink=8 hdist=1 rcomp=f qtrim=r trimq=30 for R2. T-tails were removed using bbduk with parameters ktrim=r k=15 hdist=1 rcomp=f mink=8 for R1 and ktrim=l k=15 hdist=1 rcomp=f mink=8 for R2. PhiX reads were removed using bbduk with parameters k=31 hdist=1. Trimmed reads were aligned to the corresponding genome using Bowtie2 v2.4.5 with the setting “sensitive”. The bam files were cleaned using SMARTcleaner v1.064 and split into two strands using samtools v1.19.265. The coverage of each strand was calculated separately using bedtools v2.30.066 with the genomecov -d -5 option. Then, custom scripts were used for pileup calling. Briefly, all read start positions were recorded for both strands separately. The read pileup depth at each position was represented by the number of reads starting (5ʹ-end) at each position. The 13 nt sequences centered at positions with depth ≥1 were retrieved using samtools65. The consensus motifs were analyzed with incrementing depth, typically from 1 to 100 with a step of 10, using MEME v5.3.367 with parameters -dna -objfun classic -nmotifs 5 -mod zoops -evt 0.05 -minw 3 -maxw 6 -markov_order 0 -nostatus -oc. Read start sites mapping within <3 bp of one another were collapsed into the centermost consensus motif site. Regions where starting positions were mapped within 4 bp on opposite strands and within reverse complementary sequences were considered as double-stranded modifications.
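The pileup-calling logic described above can be sketched as follows. This is a simplified illustration and not the authors' custom scripts. It assumes read 5ʹ-end positions are already available as (strand, 0-based position) pairs, and the pairing step checks only the 4-bp distance rule, omitting the reverse-complement test. All names and the toy data are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def pileup(read_starts):
    """Number of reads whose 5' end falls at each (strand, position)."""
    return Counter(read_starts)

def windows(genome, depth, min_depth=1, flank=6):
    """13-nt sequences centered on sites with depth >= min_depth (flank=6),
    reported in the orientation of the read strand."""
    out = []
    for (strand, pos), n in sorted(depth.items()):
        if n < min_depth or pos - flank < 0 or pos + flank >= len(genome):
            continue  # skip low-depth sites and sites too close to the ends
        seq = genome[pos - flank: pos + flank + 1]
        out.append((strand, pos, revcomp(seq) if strand == "-" else seq))
    return out

def opposite_strand_pairs(depth, max_gap=4):
    """Plus/minus start positions lying within max_gap bp of each other."""
    plus = sorted(p for (s, p) in depth if s == "+")
    minus = sorted(p for (s, p) in depth if s == "-")
    return [(p, m) for p in plus for m in minus if abs(p - m) <= max_gap]

# Hypothetical toy data.
genome = "ACGTACGTACGTACGTACGT"
depth = pileup([("+", 8), ("+", 8), ("-", 10), ("+", 1)])
site_windows = windows(genome, depth)
pairs = opposite_strand_pairs(depth)
```

The windows returned at each depth threshold would then be written to FASTA and passed to MEME for motif discovery, as described in the text.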
Data availability
Sequencing data have been deposited in the NCBI SRA database under BioProject ID PRJNA1006039. Raw LC–MS data have been deposited in the PRIDE archive with accession PXD059148.
Code availability
Custom scripts for processing the sequencing data are described in the “Methods” section and are available at https://github.com/dedonlab/PTseq_data_analysis.git (ref. 68).
References
Sanchez-Romero, M. A. & Casadesus, J. The bacterial epigenome. Nat. Rev. Microbiol. 18, 7–20 (2020).
Thiaville, J. J. et al. Novel genomic island modifies DNA with 7-deazaguanine derivatives. Proc. Natl. Acad. Sci. USA 113, E1452–E1459 (2016).
Zhou, X. et al. A novel DNA modification by sulphur. Mol. Microbiol. 57, 1428–1438 (2005).
Wang, L. et al. Phosphorothioation of DNA in bacteria by dnd genes. Nat. Chem. Biol. 3, 709–710 (2007).
Wang, L., Jiang, S., Deng, Z., Dedon, P. C. & Chen, S. DNA phosphorothioate modification—a new multi-functional epigenetic system in bacteria. FEMS Microbiol. Rev. 43, 109–122 (2019).
Xu, T., Yao, F., Zhou, X., Deng, Z. & You, D. A novel host-specific restriction system associated with DNA backbone S-modification in Salmonella. Nucleic Acids Res. 38, 7133–7141 (2010).
Xiong, X. et al. SspABCD-SspE is a phosphorothioation-sensing bacterial defence system with broad anti-phage activities. Nat. Microbiol. 5, 917–928 (2020).
Wang, S. et al. SspABCD-SspFGH constitutes a new type of DNA phosphorothioate-based bacterial defense system. mBio 12, e00613–e00621 (2021).
Jian, H. et al. The origin and impeded dissemination of the DNA phosphorothioation system in prokaryotes. Nat. Commun. 12, 6382 (2021).
Xiong, L. et al. A new type of DNA phosphorothioation-based antiviral system in archaea. Nat. Commun. 10, 1688 (2019).
An, T. et al. A DNA phosphorothioation pathway via adenylated intermediate modulates Tdp machinery. Nat. Chem. Biol. 21, 1160–1170 (2025).
Tong, T. et al. Occurrence, evolution, and functions of DNA phosphorothioate epigenetics in bacteria. Proc. Natl. Acad. Sci. USA 115, E2988–E2996 (2018).
Cao, B. et al. Genomic mapping of phosphorothioates reveals partial modification of short consensus sequences. Nat. Commun. 5, 3951 (2014).
Chen, C. et al. Convergence of DNA methylation and phosphorothioation epigenetics in bacterial genomes. Proc. Natl. Acad. Sci. USA 114, 4501–4506 (2017).
You, D. L., Wang, L. R., Yao, F., Zhou, X. F. & Deng, Z. X. A novel DNA modification by sulfur: DndA is a NifS-like cysteine desulfurase capable of assembling DndC as an iron–sulfur cluster protein in Streptomyces lividans. Biochemistry 46, 6126–6133 (2007).
Yao, F., Xu, T., Zhou, X., Deng, Z. & You, D. Functional analysis of spfD gene involved in DNA phosphorothioation in Pseudomonas fluorescens Pf0-1. FEBS Lett. 583, 729–733 (2009).
Dai, D. et al. DNA phosphorothioate modification plays a role in peroxides resistance in Streptomyces lividans. Front. Microbiol. 7, 1–13 (2016).
Huang, Q. et al. Defense mechanism of phosphorothioated DNA under peroxynitrite-mediated oxidative stress. ACS Chem. Biol. 15, 2558–2567 (2020).
Kellner, S. et al. Oxidation of phosphorothioate DNA modifications leads to lethal genomic instability. Nat. Chem. Biol. 13, 888–894 (2017).
Bhattacharyya, A., Chattopadhyay, R., Mitra, S. & Crowe, S. E. Oxidative stress: an essential factor in the pathogenesis of gastrointestinal mucosal diseases. Physiol. Rev. 94, 329–354 (2014).
Zhu, S. et al. Development of methods derived from iodine-induced specific cleavage for identification and quantitation of DNA phosphorothioate modifications. Biomolecules 10, 1491 (2020).
Mangerich, A. et al. Infection-induced colitis in mice causes dynamic and tissue-specific changes in stress response and DNA damage leading to colon cancer. Proc. Natl. Acad. Sci. USA 109, E1820–E1829 (2012).
Sun, Y. et al. DNA phosphorothioate modifications are widely distributed in the human microbiome. Biomolecules 10, 1175 (2020).
Byrne, S. R. et al. Temporal dynamics and metagenomics of phosphorothioate epigenomes in the human gut microbiome. Microbiome 13, 81 (2025).
Poyet, M. et al. A library of human gut bacterial isolates paired with longitudinal multiomics data enables mechanistic microbiome research. Nat. Med. 25, 1442–1452 (2019).
Groussin, M. et al. Elevated rates of horizontal gene transfer in the industrialized human microbiome. Cell 184, 2053–2067 e2018 (2021).
Poyet, M. et al. Industrialization drives convergent microbial and physiological shifts in the human metaorganism. Preprint at bioRxiv https://doi.org/10.1101/2025.10.20.683358 (2025).
Rühlemann, M. et al. Convergent genomic responses of human gut bacteria to variations in industrialization. Preprint at bioRxiv https://doi.org/10.1101/2025.10.20.683395 (2025).
Almeida, A. et al. A unified catalog of 204,938 reference genomes from the human gut microbiome. Nat. Biotechnol. 39, 105–114 (2021).
Yuan, Y., DeMott, M., Byrne, S. & Dedon, P. PT-seq for highly sensitive metagenomic mapping of phosphorothioate DNA modifications. Preprint at bioRxiv https://doi.org/10.1101/2024.06.03.597111 (2024).
Oberg, N., Zallot, R. & Gerlt, J. A. EFI-EST, EFI-GNT, and EFI-CGFP: enzyme function initiative (EFI) web resource for genomic enzymology tools. J. Mol. Biol. 435, 168018 (2023).
Kambampati, R. & Lauhon, C. T. Evidence for the transfer of sulfane sulfur from IscS to ThiI during the in vitro biosynthesis of 4-thiouridine in Escherichia coli tRNA. J. Biol. Chem. 275, 10727–10730 (2000).
Shigi, N. Biosynthesis and functions of sulfur modifications in tRNA. Front. Genet. 5, 67 (2014).
Bouvier, D. et al. TtcA a new tRNA-thioltransferase with an Fe–S cluster. Nucleic Acids Res. 42, 7960–7970 (2014).
Goldfarb, T. et al. BREX is a novel phage resistance system widespread in microbial genomes. EMBO J. 34, 169–183 (2015).
Yao, J. et al. Identification and characterization of a HEPN-MNT family type II toxin-antitoxin in Shewanella oneidensis. Microb. Biotechnol. 8, 961–973 (2015).
Songailiene, I. et al. HEPN-MNT toxin-antitoxin system: the HEPN ribonuclease is neutralized by OligoAMPylation. Mol. Cell 80, 955–970 e957 (2020).
Yao, J. et al. Novel polyadenylylation-dependent neutralization mechanism of the HEPN/MNT toxin/antitoxin system. Nucleic Acids Res. 48, 11054–11067 (2020).
Olson, R. D. et al. Introducing the Bacterial and Viral Bioinformatics Resource Center (BV-BRC): a resource combining PATRIC, IRD and ViPR. Nucleic Acids Res. 51, D678–D689 (2023).
Schirrmeister, B. E., Gugger, M. & Donoghue, P. C. Cyanobacteria and the Great Oxidation Event: evidence from genes and fossils. Palaeontology 58, 769–785 (2015).
Luo, G. et al. Rapid oxygenation of Earth’s atmosphere 2.33 billion years ago. Sci. Adv. 2, e1600134 (2016).
Zimmermann, L. et al. A completely reimplemented MPI Bioinformatics Toolkit with a new HHpred Server at its core. J. Mol. Biol. 430, 2237–2243 (2018).
Wang, L. et al. DNA phosphorothioation is widespread and quantized in bacterial genomes. Proc. Natl. Acad. Sci. USA 108, 2963–2968 (2011).
Liu, G. et al. Cleavage of phosphorothioated DNA and methylated DNA by the type IV restriction endonuclease ScoMcrA. PLoS Genet. 6, e1001253 (2010).
Wu, X. et al. Epigenetic competition reveals density-dependent regulation and target site plasticity of phosphorothioate epigenetics in bacteria. Proc. Natl. Acad. Sci. USA 117, 14322–14330 (2020).
Schaudy, E., Lietard, J. & Somoza, M. M. Sequence preference and initiator promiscuity for de novo DNA synthesis by terminal deoxynucleotidyl transferase. ACS Synth. Biol. 10, 1750–1760 (2021).
Zhang, A. et al. Solid-phase enzyme catalysis of DNA end repair and 3’ A-tailing reduces GC-bias in next-generation sequencing of human genomic DNA. Sci. Rep. 8, 15887 (2018).
Forget, S. M. et al. Evolving a terminal deoxynucleotidyl transferase for commercial enzymatic DNA synthesis. Nucleic Acids Res. 53, gkaf115 (2025).
Cao, B. et al. Nick-seq for single-nucleotide resolution genomic maps of DNA modifications and damage. Nucleic Acids Res. 48, 6715–6725 (2020).
Jiang, S. et al. A DNA phosphorothioation-based Dnd defense system provides resistance against various phages and is compatible with the Ssp defense system. mBio 14, e0093323 (2023).
Zhu, Q. et al. Phylogenomics of 10,575 genomes reveals evolutionary proximity between domains Bacteria and Archaea. Nat. Commun. 10, 5477 (2019).
Garcia-Bayona, L. et al. Nanaerobic growth enables direct visualization of dynamic cellular processes in human gut symbionts. Proc. Natl. Acad. Sci. USA 117, 24484–24493 (2020).
Pantosti, A., Tzianabos, A. O., Onderdonk, A. B. & Kasper, D. L. Immunochemical characterization of two surface polysaccharides of Bacteroides fragilis. Infect. Immun. 59, 2075–2082 (1991).
Smith, C. J. & Callihan, D. R. Analysis of rRNA restriction fragment length polymorphisms from Bacteroides spp. and Bacteroides fragilis isolates associated with diarrhea in humans and animals. J. Clin. Microbiol. 30, 806–812 (1992).
Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).
Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
Buchfink, B., Reuter, K. & Drost, H. G. Sensitive protein alignments at tree-of-life scale using DIAMOND. Nat. Methods 18, 366–368 (2021).
Jauffrit, F. et al. RiboDB Database: a comprehensive resource for prokaryotic systematics. Mol. Biol. Evol. 33, 2170–2172 (2016).
Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004).
Criscuolo, A. & Gribaldo, S. BMGE (Block Mapping and Gathering with Entropy): a new software for selection of phylogenetic informative regions from multiple sequence alignments. BMC Evol. Biol. 10, 210 (2010).
Gouy, M., Guindon, S. & Gascuel, O. SeaView version 4: a multiplatform graphical user interface for sequence alignment and phylogenetic tree building. Mol. Biol. Evol. 27, 221–224 (2010).
Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree: computing large minimum evolution trees with profiles instead of a distance matrix. Mol. Biol. Evol. 26, 1641–1650 (2009).
Letunic, I. & Bork, P. Interactive Tree Of Life (iTOL) v5: an online tool for phylogenetic tree display and annotation. Nucleic Acids Res. 49, W293–W296 (2021).
Zhao, D. & Zheng, D. SMARTcleaner: identify and clean off-target signals in SMART ChIP-seq analysis. BMC Bioinform. 19, 544 (2018).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Bailey, T. L., Johnson, J., Grant, C. E. & Noble, W. S. The MEME Suite. Nucleic Acids Res. 43, W39–W49 (2015).
Yuan, Y., DeMott, M.S. & Dedon, P.C. Phosphorothioate DNA Modification by BREX Type 4 Systems in the Human Gut Microbiome, PTseq_data_analysis https://doi.org/10.5281/zenodo.17890169 (2025).
Acknowledgements
We thank Susan Weir and Katya Moniz at the Openbiome for providing us with bacterial isolates. This work was supported by National Institutes of Health (R01 ES031576, P.C.D., E.J.A.; R01AI093771, L.C.), by a NIEHS Training Grant in Environmental Toxicology T32-ES007020 (S.R.B.), and the Duchossois Family Institute (L.C.), and by funding for the GMbC from the MIT Center for Microbiome Therapeutics and the Neil and Anna Rasmussen Family Foundation.
Author information
Contributions
Conceptualization: P.C.D. and E.J.A.; methodology: P.C.D., E.J.A., Y.Y., and L.E.C.; data acquisition: S.B., Y.Y., L.E.C., M.P., M.G., K.F., Y.Y., J.R.B., C.G., J.L., A.Z.P.M., I.E.M., Y.A.N., L.T.T.N., C.A.O., L.R.R., J.S., T.V.; data analysis: P.C.D., E.J.A., Y.Y., and M.S.D.; writing original draft: Y.Y., M.S.D., P.C.D., and S.R.B. GMbC represents authors affiliated with the Global Microbiome Conservancy.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Guang Liu, Richard Morgan, and Peter Weigele for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Yuan, Y., DeMott, M.S., Byrne, S.R. et al. Phosphorothioate DNA modification by BREX type 4 systems in the human gut microbiome. Nat Commun 17, 1717 (2026). https://doi.org/10.1038/s41467-026-68412-5