Intragenic DNA inversions expand bacterial coding capacity

Chanin, Rachael B.; West, Patrick T.; Wirbel, Jakob; Gill, Matthew O.; Green, Gabriella Z. M.; Park, Ryan M.; Enright, Nora; Miklos, Arjun M.; Hickey, Angela S.; Brooks, Erin F.; Lum, Krystal K.; Cristea, Ileana M.; Bhatt, Ami S.

doi:10.1038/s41586-024-07970-4

Article
Published: 25 September 2024

Nature volume 634, pages 234–242 (2024)

Abstract

Bacterial populations that originate from a single bacterium are not strictly clonal and often contain subgroups with distinct phenotypes¹. Bacteria can generate heterogeneity through phase variation, a preprogrammed, reversible mechanism that alters gene expression levels across a population¹. One well-studied type of phase variation involves enzyme-mediated inversion of specific regions of genomic DNA². Frequently, these DNA inversions flip the orientation of promoters, turning transcription of adjacent coding regions on or off². Through this mechanism, inversion can affect fitness, survival or group dynamics³,⁴. Here, we describe the development of PhaVa, a computational tool that identifies DNA inversions using long-read datasets. We also identify 372 ‘intragenic invertons’, a novel class of DNA inversions found entirely within genes, in genomes of bacterial and archaeal isolates. Intragenic invertons allow a gene to encode two or more versions of a protein by flipping a DNA sequence within the coding region, thereby increasing coding capacity without increasing genome size. We validate ten intragenic invertons in the gut commensal Bacteroides thetaiotaomicron, and experimentally characterize an intragenic inverton in the thiamine biosynthesis gene thiC.
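
The recoding idea can be illustrated with a toy example. The sketch below is not taken from the paper: the gene sequence and the inversion coordinates are invented, and the flanking inverted repeats that real invertons carry are omitted. It only shows that replacing an in-frame segment with its reverse complement changes the encoded amino acids without changing the gene length.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate a DNA string in frame 0, one letter per codon."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def invert(seq, start, end):
    """Replace seq[start:end] with its reverse complement (a DNA inversion)."""
    return seq[:start] + reverse_complement(seq[start:end]) + seq[end:]


# Hypothetical 7-codon gene; positions 6-15 stand in for the invertible region.
gene = "ATGGCTCATGGACTGAAATGA"
flipped = invert(gene, 6, 15)

print(translate(gene))     # MAHGLK*
print(translate(flipped))  # MAQSMK*
```

Applying `invert` twice with the same coordinates restores the original sequence, which mirrors the reversibility of phase variation described above.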

**Fig. 1: Short-read metagenomic datasets reveal intragenic invertons in BTh.**

**Fig. 2: PhaVa analysis of long-read sequencing data from isolates reveals that intragenic inversions are prevalent across the bacterial tree of life.**

**Fig. 3: PhaVa analysis of 210 long-read metagenomes from human stool.**

**Fig. 4: Consequences of inversion in thiamine biosynthesis protein.**

Data availability

Short-read adult stool sequencing data were previously published and are available under NCBI BioProject ID PRJNA707487. Short-read paediatric stool sequencing data were previously published and are available under NCBI BioProject ID PRJNA787952. Long-read metagenomic sequencing data were previously published and are available under NCBI BioProject IDs PRJNA820119 and PRJNA940499. Assembled metagenomic contigs are available at https://doi.org/10.5281/zenodo.7662825. A list of accession numbers for long-read isolate sequencing data is available in Supplementary Table 3. Mass spectrometry raw files (.d) generated in this study have been deposited to the ProteomeXchange Consortium through the PRIDE partner repository⁷⁷ (project accession PXD054577). Long-read sequencing data for the locked thiC intragenic inverton strains and RNA-sequencing data are available under NCBI BioProject ID PRJNA1118344. Accession codes for long-read datasets are listed in Supplementary Table 3. The reference genome for B. thetaiotaomicron VPI-5482 is the NCBI reference sequence AE015928.1. The reference genome for B. fragilis FDAARGOS_1225 is the NCBI reference sequence NZ_CP069563.1. Source data are provided with this paper.

Code availability

PhaVa is available at https://github.com/patrickwest/PhaVa. Long-read datasets were analysed with PhaVa (v0.1.0) with default parameters.

References

van der Woude, M. W. & Bäumler, A. J. Phase and antigenic variation in bacteria. Clin. Microbiol. Rev. 17, 581–611 (2004).
Trzilova, D. & Tamayo, R. Site-specific recombination—how simple DNA inversions produce complex phenotypic heterogeneity in bacterial populations. Trends Genet. 37, 59–72 (2021).
Zieg, J., Silverman, M., Hilmen, M. & Simon, M. Recombinational switch for gene expression. Science 196, 170–172 (1977).
Stocker, B. A. Measurements of rate of mutation of flagellar antigenic phase in Salmonella typhimurium. J. Hyg. 47, 398–413 (1949).
Meydan, S., Vázquez-Laslop, N. & Mankin, A. S. Genes within genes in bacterial genomes. Microbiol. Spectr. 6, rwr-0020-2018 (2018).
Zhong, A. et al. Toxic antiphage defense proteins inhibited by intragenic antitoxin proteins. Proc. Natl Acad. Sci. USA 120, e2307382120 (2023).
Moxon, R., Bayliss, C. & Hood, D. Bacterial contingency loci: the role of simple sequence DNA repeats in bacterial adaptation. Annu. Rev. Genet. 40, 307–333 (2006).
Sberro, H. et al. Large-scale analyses of human microbiomes reveal thousands of small, novel genes. Cell 178, 1245–1259.e14 (2019).
Schlub, T. E. & Holmes, E. C. Properties and abundance of overlapping genes in viruses. Virus Evol. 6, veaa009 (2020).
Medhekar, B. & Miller, J. F. Diversity-generating retroelements. Curr. Opin. Microbiol. 10, 388–395 (2007).
Andrewes, F. W. Studies in group-agglutination I. The Salmonella group and its antigenic structure. J. Pathol. Bacteriol. 25, 505–521 (1922).
Goldberg, A., Fridman, O., Ronin, I. & Balaban, N. Q. Systematic identification and quantification of phase variation in commensal and pathogenic Escherichia coli. Genome Med. 6, 112 (2014).
Sekulovic, O. et al. Genome-wide detection of conservative site-specific recombination in bacteria. PLoS Genet. 14, e1007332 (2018).
Jiang, X. et al. Invertible promoters mediate bacterial phase variation, antibiotic resistance, and host adaptation in the gut. Science 363, 181–187 (2019).
Milman, O., Yelin, I. & Kishony, R. Systematic identification of gene-altering programmed inversions across the bacterial domain. Nucleic Acids Res. 51, 553–573 (2023).
Komano, T. Shufflons: multiple inversion systems and integrons. Annu. Rev. Genet. 33, 171–191 (1999).
Atack, J. M., Guo, C., Yang, L., Zhou, Y. & Jennings, M. P. DNA sequence repeats identify numerous type I restriction-modification systems that are potential epigenetic regulators controlling phase-variable regulons; phasevarions. FASEB J. 34, 1038–1051 (2020).
Chatzidaki-Livanis, M., Coyne, M. J., Roche-Hakansson, H. & Comstock, L. E. Expression of a uniquely regulated extracellular polysaccharide confers a large-capsule phenotype to Bacteroides fragilis. J. Bacteriol. 190, 1020–1026 (2008).
Taketani, M., Donia, M. S., Jacobson, A. N., Lambris, J. D. & Fischbach, M. A. A phase-variable surface layer from the gut symbiont Bacteroides thetaiotaomicron. mBio 6, e01339-15 (2015).
Troy, E. B., Carey, V. J., Kasper, D. L. & Comstock, L. E. Orientations of the Bacteroides fragilis capsular polysaccharide biosynthesis locus promoters during symbiosis and infection. J. Bacteriol. 192, 5832–5836 (2010).
Severyn, C. J. et al. Microbiota dynamics in a randomized trial of gut decontamination during allogeneic hematopoietic cell transplantation. JCI Insight 7, e154344 (2022).
Siranosian, B. A. et al. Rare transmission of commensal and pathogenic bacteria in the gut microbiome of hospitalized adults. Nat. Commun. 13, 586 (2022).
Martens, E. C., Chiang, H. C. & Gordon, J. I. Mucosal glycan foraging enhances fitness and transmission of a saccharolytic human gut bacterial symbiont. Cell Host Microbe 4, 447–457 (2008).
Martens, E. C., Roth, R., Heuser, J. E. & Gordon, J. I. Coordinate regulation of glycan degradation and polysaccharide capsule biosynthesis by a prominent human gut symbiont. J. Biol. Chem. 284, 18445–18457 (2009).
Krinos, C. M. et al. Extensive surface diversity of a commensal microorganism by multiple DNA inversions. Nature 414, 555–558 (2001).
Porter, N. T. et al. Phase-variable capsular polysaccharides and lipoproteins modify bacteriophage susceptibility in Bacteroides thetaiotaomicron. Nat. Microbiol. 5, 1170–1181 (2020).
Round, J. L. et al. The Toll-like receptor 2 pathway establishes colonization by a commensal of the human microbiota. Science 332, 974–977 (2011).
Neff, C. P. et al. Diverse intestinal bacteria contain putative zwitterionic capsular polysaccharides with anti-inflammatory properties. Cell Host Microbe 20, 535–547 (2016).
Mazmanian, S. K., Liu, C. H., Tzianabos, A. O. & Kasper, D. L. An immunomodulatory molecule of symbiotic bacteria directs maturation of the host immune system. Cell 122, 107–118 (2005).
Porter, N. T., Canales, P., Peterson, D. A. & Martens, E. C. A Subset of polysaccharide capsules in the human symbiont Bacteroides thetaiotaomicron promote increased competitive fitness in the mouse gut. Cell Host Microbe 22, 494–506.e8 (2017).
Musumeci, O. et al. Intragenic inversion of mtDNA: a new type of pathogenic mutation in a patient with mitochondrial myopathy. Am. J. Hum. Genet. 66, 1900–1904 (2000).
Smyshlyaev, G., Bateman, A. & Barabas, O. Sequence analysis of tyrosine recombinases allows annotation of mobile genetic elements in prokaryotic genomes. Mol. Syst. Biol. 17, e9880 (2021).
West, P. T., Chanin, R. B. & Bhatt, A. S. From genome structure to function: insights into structural variation in microbiology. Curr. Opin. Microbiol. 69, 102192 (2022).
van Kempen, M. et al. Fast and accurate protein structure search with Foldseek. Nat. Biotechnol. 42, 243–246 (2024).
Casino, P., Rubio, V. & Marina, A. The mechanism of signal transduction by two-component systems. Curr. Opin. Struct. Biol. 20, 763–771 (2010).
Loenen, W. A. M., Dryden, D. T. F., Raleigh, E. A. & Wilson, G. G. Type I restriction enzymes and their relatives. Nucleic Acids Res. 42, 20–44 (2014).
De Ste Croix, M. et al. Phase-variable methylation and epigenetic regulation by type I restriction-modification systems. FEMS Microbiol. Rev. 41, S3–S15 (2017).
Chen, L. et al. Short- and long-read metagenomics expand individualized structural variations in gut microbiomes. Nat. Commun. 13, 3175 (2022).
Maghini, D. G. et al. Quantifying bias introduced by sample collection in relative and absolute microbiome measurements. Nat. Biotechnol. 42, 328–338 (2024).
Rodionov, D. A. et al. Micronutrient requirements and sharing capabilities of the human gut microbiome. Front. Microbiol. 10, 1316 (2019).
Sharma, V. et al. B-vitamin sharing promotes stability of gut microbial communities. Front. Microbiol. 10, 1485 (2019).
Yatsunenko, T. et al. Human gut microbiome viewed across age and geography. Nature 486, 222–227 (2012).
Costliow, Z. A. & Degnan, P. H. Thiamine acquisition strategies impact metabolism and competition in the gut microbe Bacteroides thetaiotaomicron. mSystems 2, e00116–17 (2017).
Martinez-Gomez, N. C. & Downs, D. M. ThiC is an [Fe-S] cluster protein that requires AdoMet to generate the 4-amino-5-hydroxymethyl-2-methylpyrimidine moiety in thiamin synthesis. Biochemistry 47, 9054–9056 (2008).
Said, H. M. Intestinal absorption of water-soluble vitamins in health and disease. Biochem. J. 437, 357–372 (2011).
D’Souza, G. et al. Less is more: selective advantages can explain the prevalent loss of biosynthetic genes in bacteria. Evolution 68, 2559–2570 (2014).
Jurgenson, C. T., Ealick, S. E. & Begley, T. P. Biosynthesis of thiamin pyrophosphate. EcoSal Plus https://doi.org/10.1128/ecosalplus.3.6.3.7 (2009).
Rodionov, D. A., Vitreschak, A. G., Mironov, A. A. & Gelfand, M. S. Comparative genomics of thiamin biosynthesis in prokaryotes. J. Biol. Chem. 277, 48949–48959 (2002).
Bacic, M. K. & Smith, C. J. Laboratory maintenance and cultivation of bacteroides species. Curr. Protoc. Microbiol. https://doi.org/10.1002/9780471729259.mc13c01s9 (2008).
Zhu, W. et al. Xenosiderophore utilization promotes Bacteroides thetaiotaomicron resilience during colitis. Cell Host Microbe 27, 376–388.e8 (2020).
Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at https://arxiv.org/abs/1303.3997 (2013).
Liao, Y., Smyth, G. K. & Shi, W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923–930 (2014).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
Rice, P., Longden, I. & Bleasby, A. EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 16, 276–277 (2000).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Yang, C., Chu, J., Warren, R. L. & Birol, I. NanoSim: nanopore sequence read simulator based on statistical characterization. Gigascience 6, gix010 (2017).
Ono, Y., Asai, K. & Hamada, M. PBSIM2: a simulator for long-read sequencers with a novel generative model of quality scores. Bioinformatics 37, 589–595 (2021).
Olm, M. R., Brown, C. T., Brooks, B. & Banfield, J. F. dRep: a tool for fast and accurate genomic comparisons that enables improved genome recovery from metagenomes through de-replication. ISME J. 11, 2864–2868 (2017).
De Coster, W., D’Hert, S., Schultz, D. T., Cruts, M. & Van Broeckhoven, C. NanoPack: visualizing and processing long-read sequencing data. Bioinformatics 34, 2666–2669 (2018).
Mirdita, M. et al. ColabFold: making protein folding accessible to all. Nat. Methods 19, 679–682 (2022).
Meng, E. C. et al. UCSF ChimeraX: tools for structure building and analysis. Protein Sci. 32, e4792 (2023).
Jones, P. et al. InterProScan 5: genome-scale protein function classification. Bioinformatics 30, 1236–1240 (2014).
Aramaki, T. et al. KofamKOALA: KEGG ortholog assignment based on profile HMM and adaptive score threshold. Bioinformatics 36, 2251–2252 (2020).
Eddy, S. R. Accelerated profile HMM searches. PLoS Comput. Biol. 7, e1002195 (2011).
Paysan-Lafosse, T. et al. InterPro in 2022. Nucleic Acids Res. 51, D418–D427 (2023).
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. 57, 289–300 (1995).
Prjibelski, A., Antipov, D., Meleshko, D., Lapidus, A. & Korobeynikov, A. Using SPAdes de novo assembler. Curr. Protoc. Bioinformatics 70, e102 (2020).
Lin, Y. et al. Assembly of long error-prone reads using de Bruijn graphs. Proc. Natl Acad. Sci. USA 113, E8396–E8405 (2016).
Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11, 119 (2010).
Wood, D. E., Lu, J. & Langmead, B. Improved metagenomic analysis with Kraken 2. Genome Biol. 20, 257 (2019).
Li, W. & Godzik, A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658–1659 (2006).
Skowronek, P. et al. Rapid and in-depth coverage of the (phospho-)proteome with deep libraries and optimal window design for dia-PASEF. Mol. Cell. Proteomics 21, 100279 (2022).
Kong, A. T., Leprevost, F. V., Avtonomov, D. M., Mellacheruvu, D. & Nesvizhskii, A. I. MSFragger: ultrafast and comprehensive peptide identification in mass spectrometry-based proteomics. Nat. Methods 14, 513–520 (2017).
Demichev, V., Messner, C. B., Vernardis, S. I., Lilley, K. S. & Ralser, M. DIA-NN: neural networks and interference correction enable deep proteome coverage in high throughput. Nat. Methods 17, 41–44 (2020).
MacLean, B. et al. Skyline: an open source document editor for creating and analyzing targeted proteomics experiments. Bioinformatics 26, 966–968 (2010).
Pino, L. K. et al. The Skyline ecosystem: informatics for quantitative mass spectrometry proteomics. Mass Spectrom. Rev. 39, 229–244 (2020).
Perez-Riverol, Y. et al. The PRIDE database resources in 2022: a hub for mass spectrometry-based proteomics evidences. Nucleic Acids Res. 50, D543–D552 (2022).

Acknowledgements

The authors thank D. Schmidtke, A. Natarajan, J. D. Shanahan, D. Maghini, M. Dvorak, A. Han, M. Chakraborty, X. Jin, E. Martens and Bhatt laboratory members for helpful conversations and scientific advice regarding this project; W. Zhu for plasmids (pNBU2_tet and pNBU2_erm); S. Winter for the DH5α strain; and D. Haft and F. Thibaud-Nissen for helpful discussion about accessing SRA long-read datasets. Funding was provided as follows: National Institutes of Health R01 AI148623 (A.S.B.), National Institutes of Health R01 AI143757 (A.S.B.), Stand Up 2 Cancer Foundation (A.S.B. and I.M.C.), National Institutes of Health R01 AI174515 (I.M.C.), National Institutes of Health T32 training Grant HG000044 (R.B.C.), The AP Giannini Foundation (R.B.C.), National Institutes of Health T32 training Grant HL120824 (P.T.W.), National Science Foundation Graduate Research Fellowship (M.O.G., A.S.H. and N.E.), Stanford DARE fellowship (N.E.), and National Institutes of Health TL1 training Award TL1TR003019 (K.K.L.). Computing costs were supported, in part, by an NIH S10 Shared Instrumentation grant (1S10OD02014101).

Author information

These authors contributed equally: Rachael B. Chanin, Patrick T. West

Authors and Affiliations

Department of Medicine, Division of Hematology, Stanford University, Stanford, CA, USA
Rachael B. Chanin, Patrick T. West, Jakob Wirbel, Gabriella Z. M. Green, Arjun M. Miklos, Erin F. Brooks & Ami S. Bhatt
Department of Genetics, Stanford University, Stanford, CA, USA
Matthew O. Gill, Ryan M. Park, Angela S. Hickey & Ami S. Bhatt
Department of Bioengineering, Stanford University, Stanford, CA, USA
Nora Enright
Department of Molecular Biology, Princeton University, Princeton, NJ, USA
Krystal K. Lum & Ileana M. Cristea

Contributions

R.B.C., P.T.W. and A.S.B. conceived and designed the study. Data acquisition and processing were performed by R.B.C., P.T.W., J.W., R.M.P., G.Z.M.G., A.S.H., M.O.G., E.F.B., A.M.M., N.E. and K.K.L. Data were visualized by R.B.C., P.T.W., J.W., R.M.P., K.K.L. and M.O.G. Data interpretation was done by R.B.C., P.T.W., J.W., M.O.G., K.K.L. and A.S.B. Funding acquisition was done by A.S.B. and I.M.C. Writing of the original draft was performed by R.B.C., P.T.W. and A.S.B. All authors contributed to the review and editing of this manuscript.

Corresponding author

Correspondence to Ami S. Bhatt.

Ethics declarations

Competing interests

P.T.W. is a contract bioinformatician at Oxford Nanopore Technologies; this position started during the review process. The other authors declare no competing interests.

Peer review

Peer review information

Nature thanks the anonymous reviewer(s) for their contribution to the peer review of this work. Peer review reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Inverton detection and confirmation in BTh.

(A) Inversion proportions of CPS loci in metagenomic samples measured with PhaseFinder (Top) and PhaVa (Bottom). Samples with no inversions in the five CPS invertons were removed. (B) Schematic of PCR confirmation. Forward and Reverse primers bind to regions of the genome upstream and downstream of the inverton on opposite strands. The Common primer binds the DNA inside of the inverton, between the inverted repeats. When the DNA is in the forward orientation (left), the Common and Forward primer will generate a PCR product. When the inverton flips, the Common and Reverse primer will generate a PCR product (right). HCT, hematopoietic cell transplantation.

Extended Data Fig. 2 Taxonomic composition of short-read samples from Siranosian et al. 2022.

(A-B) Taxonomic distribution for samples at the genus level. Individual reads were taxonomically classified with Kraken2 using a Genbank reference set. Relative abundances were estimated with Bracken. Genera that represented less than 2% estimated relative abundance in a given sample were collapsed into ‘other’ for plotting. (A) Samples without detected BTh intragenic inversions and (B) samples with detected BTh intragenic inversions are shown. (C) The distribution of genus level Shannon diversity calculated for individual samples. Samples are grouped by presence or absence of BTh intragenic inversions. Center line represents the mean Shannon diversity of grouped samples, boxes represent quartiles, and whiskers extend to the furthest datapoint in either direction within 1.5 times the interquartile range.
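
The two summary steps in this legend, collapsing genera below 2% relative abundance into ‘other’ and computing Shannon diversity per sample, can be sketched as follows. This is a minimal illustration and not the authors' code: the function names and the example counts are hypothetical, and the natural logarithm is an assumption because the legend does not state the log base. Diversity is computed on the uncollapsed table, since the legend describes the collapse as a plotting step.

```python
import math


def collapse_rare(abundances, threshold=0.02):
    """Convert counts to fractions and pool genera below `threshold` into 'other'."""
    total = sum(abundances.values())
    collapsed = {}
    for genus, count in abundances.items():
        fraction = count / total
        key = genus if fraction >= threshold else "other"
        collapsed[key] = collapsed.get(key, 0.0) + fraction
    return collapsed


def shannon(abundances):
    """Shannon diversity H = -sum(p * ln p) over taxa with non-zero abundance."""
    total = sum(abundances.values())
    return -sum(
        (c / total) * math.log(c / total) for c in abundances.values() if c > 0
    )


# Hypothetical genus-level estimates for a single sample.
sample = {"Bacteroides": 70, "Blautia": 29, "RareGenus": 1}
print(collapse_rare(sample))
print(shannon(sample))
```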

Extended Data Fig. 3 Developing and optimizing PhaVa, a long-read based, accurate inverton caller.

(A) Schematic of the PhaVa workflow. Putative invertons are identified, and long reads are mapped to both a forward-orientation (highlighted by the black dashed lines) and a reverse-orientation (highlighted by the grey dashed lines) version of the inverton and surrounding genomic sequence. Reads that do not map across the entire inverton and into the flanking sequence on either side, or that have poor mapping characteristics, are removed. See Methods for details. (B-C) Optimizing cutoffs for the minimum number of reverse reads, as both a raw number and percentage of all reads, to reduce false positive inverton calls with simulated reads. Cell color and number represent (B) the false positive rate per simulated readset and (C) the total number of unique false positives across all simulated datasets. (D) False positives in simulated data plotted per species. All measurements were made with a cutoff of at least three reverse reads while varying the minimum percentage of reverse reads. The dashed line indicates the minimum reverse-read percentage cutoff used for isolate and metagenomic datasets. Solid lines indicate the sample mean, and colored bands indicate the 95% confidence interval. (E) Output tables of particular interest are labeled and shown below the diagram with example output.
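
The two-part cutoff described for panels B-D, a minimum raw count of reverse-orientation reads combined with a minimum share of all spanning reads, can be written as a small predicate. This is a sketch under stated assumptions and not PhaVa's actual implementation: the function and parameter names are hypothetical, the default of three reads comes from the legend, and the fraction cutoff is left as a required argument because its value is not given in this text.

```python
def passes_cutoffs(forward_reads, reverse_reads, min_fraction, min_reverse=3):
    """Return True if an inverton call clears both reverse-read cutoffs.

    forward_reads / reverse_reads: counts of reads that span the inverton in
    each orientation (after read filtering, which is not modelled here).
    min_fraction: hypothetical minimum share of reverse reads, e.g. 0.05.
    min_reverse: minimum raw number of reverse reads (three in panel D).
    """
    total = forward_reads + reverse_reads
    if total == 0:
        return False
    enough_reads = reverse_reads >= min_reverse
    enough_share = reverse_reads / total >= min_fraction
    return enough_reads and enough_share


# Illustrative values only.
print(passes_cutoffs(90, 10, min_fraction=0.05))   # True
print(passes_cutoffs(998, 2, min_fraction=0.001))  # False: fewer than 3 reads
print(passes_cutoffs(1000, 3, min_fraction=0.01))  # False: share below cutoff
```

Requiring both conditions guards against two different failure modes: a handful of stray reads at low coverage, and a fixed small count that becomes meaningless at very high coverage.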

Extended Data Fig. 4 Benchmarking PhaVa with pbsim2 simulated reads.

(A) Density plot of false positive rate vs false negative rate for 100 bacterial species. Three 100x replicates were generated per species. (B) Scatterplot of simulated reads from 8 different E. coli genomes, with varying ANI to a singular reference E. coli genome, showing reduced coverage (and thus reduced detection) of invertons when mapping to a distant reference. Three 100x replicates were generated per genome. (C) False positive rate for read sets simulated for both E. coli and BTh and varying coverage levels. Three replicates were generated per coverage level. Error bars represent standard deviation between replicates. (D) False positive rates in readsets simulated from E. coli with varying mean read length. Three 100x replicates were generated per mean read length. Error bars represent standard deviation between replicates.

Extended Data Fig. 5 Very long (>750 bp), near perfect, inverted repeats can lead to false positives.

(A) Alignment of inverton NZ_CP025371.1:2124719-2124870-2125316-2125467, with its invertible sequence flipped, against the B. pertussis genome leads to perfect alignment of flanking and IR regions as expected. ‘Reference genome’ refers to the B. pertussis reference genome sequence. ‘Inverton reversed’ refers to the putative inverton sequence and flanking sequence, with the invertible sequence inverted. Red dashed lines indicate boundaries of the invertible sequence, black dashed lines indicate boundaries of the inverted repeats as detected by einverted, and purple dashed lines indicate the true boundary of inverted repeats. (B) Alignment of the reverse complement of the entire inverton NZ_CP025371.1:2124719-2124870-2125316-2125467 with its invertible sequence inverted and flanking sequence, against the B. pertussis genome leads to near perfect alignment (6 mismatches) spanning far into the flanking sequence to the true boundary of the inverted repeats, allowing for reads to map regardless of inverton orientation. (C) Example with toy nucleotide sequences. Red nucleotides indicate mismatches.

Extended Data Fig. 6 Overview of SRA long-read isolate sequencing samples analyzed with PhaVa.

(A) The number of unique species represented in the dataset, grouped by phylum. (B) The raw number of sequencing samples, grouped by phylum. (C) Histogram of sequencing samples per species. Species with large numbers of samples are labeled. (D) A histogram of sequencing depths for all long-read isolate sequencing samples.

Extended Data Fig. 7 Intragenic invertons that recode proteins are identified in long-read isolate datasets.

(A-C) Genome diagrams for recoding intragenic invertons are shown. Grey boxes indicate annotated protein domains. Black lines indicate the region contained within the inverton. AlphaFold structures of the forward (dark blue) and reverse (light blue) orientations are shown. Amino acids affected by the inverton are shown in pink. (A) slmA nucleoid occlusion factor in Bordetella bronchiseptica. RMSD 26.287 angstroms across all pairs. pLDDT forward: 94.5. pLDDT reverse: 51.3. (B) barA two-component sensor histidine kinase in Aeromonas hydrophila. The Receiver and HPt domains are shown. RMSD 5.492 angstroms across all pairs. pLDDT forward: 83.4. pLDDT reverse: 76.2. (C) Type I restriction enzyme S subunits hsdS₁ and hsdS₂ in Mycoplasma hominis. RMSD 1.809 and 4.167 angstroms across all pairs, respectively. pLDDT hsdS₁ forward 90.8, reverse 87. pLDDT hsdS₂ forward 92.4, reverse 84.2. HTH, helix-turn-helix; HK, histidine kinase; HPt, histidine phosphotransfer domain; TRD, target recognition domains.

Extended Data Fig. 8 Intragenic invertons are rare across genomes yet consistently enriched in some Pfam clans.

(A) Histograms showing the number of clades (genomes, species, or genera) at various numbers of invertons indicate that invertons are rare, as only one to three invertons can be detected in the majority of clades. Only clades with at least five invertons (red line; number of clades is indicated in the top-right corner of each subplot) were included for the subsequent enrichment analysis. (B) KEGG pathways and Pfam clans were tested for enrichment of intragenic (or partial intergenic) invertons in included clades, using a one-sided Fisher’s exact test per clade (see Methods). Enrichment was only calculated for sets with at least five invertons associated with genes in the set. Histograms show the number of sets with enrichment score at the number of included clades, showing that most enrichments could be calculated for single clades only. For example, all KEGG pathways associated with enough intragenic invertons for an enrichment analysis on genome-level were specific for each genome. Sets with enrichment scores across at least five clades (red line) are labeled with their corresponding identifiers. (C) Heatmap showing the log-odds ratio (effect size for the enrichment of intragenic invertons) across included clades for the six Pfam clans that have enrichment scores on genus-level (see panel B). Stars indicate significance of the enrichment as calculated by Fisher’s exact test and corrected for multiple hypothesis testing using the Benjamini-Hochberg procedure.
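
The statistics named in this legend, a one-sided Fisher's exact test per clade followed by Benjamini-Hochberg correction, can be sketched with the standard library alone. This is an illustration and not the authors' pipeline: the 2×2 table layout (rows: genes inside or outside the gene set; columns: genes with or without an inverton) and the example counts are assumptions made for the sketch.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """Enrichment p-value P(X >= a) for the 2x2 table [[a, b], [c, d]].

    Assumed layout: a = in set with inverton, b = in set without,
    c = outside set with inverton, d = outside set without.
    """
    in_set, with_inverton, n = a + b, a + c, a + b + c + d
    upper = min(in_set, with_inverton)
    favourable = sum(
        comb(with_inverton, k) * comb(n - with_inverton, in_set - k)
        for k in range(a, upper + 1)
    )
    return favourable / comb(n, in_set)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts for three gene sets within one clade.
tables = [(5, 20, 10, 965), (1, 40, 14, 945), (6, 10, 9, 975)]
pvals = [fisher_one_sided(*t) for t in tables]
print(benjamini_hochberg(pvals))
```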

Extended Data Fig. 9 Locked thiC intragenic inverton construction and growth competition.

(A) Generation of locked intragenic invertons. The forward and locked forward thiC inverted repeat (IR) nucleotide sequences are shown. When possible, the wobble position of each codon corresponding to the IR was mutated to increase mismatches between the two palindromic sequences while maintaining the amino acid sequence. (B) Mutated nucleotides are highlighted in grey. (C) Locked thiC strains were competed against each other in thiamine-containing media in a 1:1 ratio. After 40 h, the abundance of each strain was enumerated using selective agar. Black bars indicate the locked forward strain and white bars indicate the locked reverse strain. Recovered abundances shown here correspond with the competitive index shown in Fig. 4d. Left - the locked forward strain is marked with an erythromycin resistant cassette and the locked reverse strain is marked with a tetracycline resistant cassette. Right - the locked forward strain is marked with a tetracycline resistant cassette and the locked reverse strain is marked with an erythromycin resistant cassette. Geometric mean and geometric standard deviation are shown. Each dot represents an individual replicate. Experiments were done in biological duplicate or triplicate and repeated 4 or 6 times. A two-tailed ratio paired t test was performed on the locked forward and locked reverse abundances. ***, p < 0.001; **, p < 0.01; *, p < 0.05.
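
The locking strategy in panel A, changing the wobble position of codons in an inverted repeat while preserving the amino acid sequence, can be sketched as a synonymous-substitution pass. This is a toy illustration, not the authors' design procedure: the function name and input sequence are hypothetical, the input is assumed to be in frame with a length divisible by three, and the sketch simply takes the first synonymous third-base alternative it finds, which may differ from the substitutions actually chosen for thiC.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate an in-frame DNA string, one letter per codon."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def lock_repeat(coding_seq):
    """Swap the third base of each codon for a synonymous one when possible."""
    out = []
    for i in range(0, len(coding_seq), 3):
        codon = coding_seq[i:i + 3]
        amino_acid = CODON_TABLE[codon]
        alternatives = [
            codon[:2] + base
            for base in BASES
            if base != codon[2] and CODON_TABLE[codon[:2] + base] == amino_acid
        ]
        # Codons without a synonymous wobble variant (e.g. ATG, TGG) stay as-is.
        out.append(alternatives[0] if alternatives else codon)
    return "".join(out)


# Hypothetical in-frame repeat segment.
repeat = "ATGGCTTGG"
locked = lock_repeat(repeat)
print(locked)                                   # ATGGCCTGG
print(translate(repeat) == translate(locked))   # True
```

Mutating only one of the two repeats introduces mismatches between them, which is the stated purpose of the wobble changes, while the protein sequence is unchanged.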

Source Data

Extended Data Fig. 10 Detection of ThiC unique peptides in the forward and reverse orientation.

(A) Schematic showing the ThiC protein sequence in the forward and predicted reverse open reading frames (ORF1, ORF2, ORF3). Arrows indicate the direction of transcription. Shared colors between the forward and reverse ORFs indicate identical amino acids. The dark orange box in ORF1 is the ThiC reverse-specific peptide. (B) Mass spectrometry quantification, by data-independent acquisition, of ThiC tryptic peptides aligned with each ORF. NCPVPVGTVPIYQALEK includes cysteine carbamidomethylation. (C) Quantification of the unique ThiC tryptic peptides that align exclusively to the forward or reverse sequences. Representative extracted ion chromatograms show the identified fragment ions used to quantify the forward and reverse peptides, as detected in WT and LR, respectively. Representative MS/MS spectra are shown for the unique ThiC forward peptide GDVEQLPEITSEYGQMR detected in WT and the unique ThiC predicted inverton peptide GDVEQLPEITSEYGQIR detected in LR. Spectra include the MS1 precursor ion (orange), as well as b- and y-fragment ions (blue and red, respectively). ND, not detected; WT, wild-type; LF, locked forward thiC intragenic inverton; UF, unlocked forward thiC intragenic inverton; LR, locked reverse thiC intragenic inverton; UR, unlocked reverse thiC intragenic inverton.
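How a peptide comes to be unique to one orientation can be sketched with an in silico tryptic digest. The two peptides below are the ones named in the legend, but the flanking residues, the function names and the simplified cleavage rule (cut after K or R unless the next residue is P) are assumptions of this sketch, and the toy proteins are not the real ThiC sequences.

```python
def tryptic_digest(protein):
    """Split a protein after every K or R that is not followed by P
    (a simplified trypsin rule with no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        following = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and following != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides

def unique_peptides(forward, reverse):
    """Return (forward-only, reverse-only) tryptic peptide sets."""
    fwd, rev = set(tryptic_digest(forward)), set(tryptic_digest(reverse))
    return fwd - rev, rev - fwd

# Toy proteins: hypothetical flanks around the peptides from the legend.
forward_orf = "MK" + "GDVEQLPEITSEYGQMR" + "AAK"
reverse_orf = "MK" + "GDVEQLPEITSEYGQIR" + "AAK"

forward_only, reverse_only = unique_peptides(forward_orf, reverse_orf)
print("forward-specific:", forward_only)
print("reverse-specific:", reverse_only)
```

The two peptides differ at a single residue (M versus I), so peptides shared by both orientations drop out of the set difference and only the orientation-specific pair remains.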

Source Data

Extended Data Fig. 11 Effect of thiamine on BTh invertases and thiamine biosynthesis and uptake loci.

(A) Log₂RPKM values for all annotated invertases in BTh are shown across the thiC mutant backgrounds and thiamine concentrations. The axis is clustered using pheatmap’s default parameters (Euclidean distance). (B) Log₂RPKM values for transcript levels of all genes in the thiamine biosynthesis pathway (BT0647-0653) and genes involved in thiamine uptake (BT2390, BT2396) are shown. (C) Intensity of ThiH protein as determined by mass spectrometry using data-independent acquisition. UR, unlocked thiC reverse; LR, locked thiC reverse.
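For readers unfamiliar with the unit, RPKM (reads per kilobase of transcript per million mapped reads) and its log₂ transform can be computed as in the sketch below. The function names, the example counts and the pseudocount of 1 are assumptions made for illustration; the legend does not state whether or how a pseudocount was applied.

```python
import math

def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)

def log2_rpkm(read_count, gene_length_bp, total_mapped_reads, pseudocount=1.0):
    """log2 of RPKM; the pseudocount (an assumption of this sketch)
    keeps genes with zero reads finite."""
    return math.log2(rpkm(read_count, gene_length_bp, total_mapped_reads) + pseudocount)

# Hypothetical gene: 500 reads, 1,000 bp, in a library of 1,000,000 mapped reads.
print(rpkm(500, 1000, 1_000_000))
print(log2_rpkm(500, 1000, 1_000_000))
```

Normalizing by both gene length and library size is what makes values comparable across genes and across the strain and thiamine conditions shown in the heatmap.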

Source Data

Extended Data Fig. 12 Thiamine is not a strong driver of thiC intragenic inverton flipping.

(A) Schematic denoting the generation of unlocked thiC strains and the experimental setup for the long-term thiamine exposure assay. (B) Results of the long-term thiamine exposure assay. Top – two replicates of the unlocked reverse thiC strain that were serially cultured in low thiamine (0.001 μM). Bottom – two replicates of the unlocked forward thiC strain that were serially cultured in high thiamine (10 μM). LOD, limit of detection; O/N, overnight culture; ND, not detected.

Source Data

Supplementary information

Supplementary Figure 1

Full image of the gel from Fig. 4b.

Reporting Summary

Peer Review File

Supplementary Table 1

B. fragilis intragenic invertons identified from short-read metagenomic sequencing samples.

Supplementary Table 2

Sheet 1: NanoSim simulated read datasets. Sheet 2: pbsim2 simulated read datasets.

Supplementary Table 3

Sheet 1: Invertons identified from long-read isolate sequencing samples. Sheet 2: Intragenic invertons identified from long-read isolate sequencing samples. Sheet 3: List of accession numbers and associated metadata for long-read isolate sequencing samples. Sheet 4: Archaeal invertons identified from long-read isolate sequencing samples.

Supplementary Table 4

Dereplicated invertons identified from long-read metagenomic sequencing samples.

Supplementary Table 5

Differentially expressed genes across thiamine concentrations.

Supplementary Table 6

Sheet 1: Additional sequences. Sheet 2: Primers used in this study. Sheet 3: Strains used in this study. Sheet 4: Recombinant DNA.

Supplementary Table 7

Variable isolation windows used in dia-PASEF method.

Source data

Source Data Fig. 1

Source Data Fig. 2

Source Data Fig. 3

Source Data Fig. 4

Source Data Extended Data Fig. 1

Source Data Extended Data Fig. 2

Source Data Extended Data Fig. 3

Source Data Extended Data Fig. 4

Source Data Extended Data Fig. 6

Source Data Extended Data Fig. 8

Source Data Extended Data Fig. 9

Source Data Extended Data Fig. 10

Source Data Extended Data Fig. 11

Source Data Extended Data Fig. 12

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chanin, R.B., West, P.T., Wirbel, J. et al. Intragenic DNA inversions expand bacterial coding capacity. Nature 634, 234–242 (2024). https://doi.org/10.1038/s41586-024-07970-4

Received: 05 September 2023
Accepted: 20 August 2024
Published: 25 September 2024
Version of record: 25 September 2024
Issue date: 03 October 2024
DOI: https://doi.org/10.1038/s41586-024-07970-4

This article is cited by

Comprehensive profiling of genomic invertons in defined gut microbial community reveals associations with intestinal colonization and surface adhesion
- Xiaofan Jin
- Alice G. Cheng
- Katherine S. Pollard
Microbiome (2025)
Dietary carbohydrates alter immune-modulatory functionalities and DNA inversions in Bacteroides thetaiotaomicron
- Noa Gal-Mandelbaum
- Shaqed Carasso
- Naama Geva-Zatorsky
Nature Communications (2025)
How and when organisms edit their own genomes
- Vincent C. T. Hanlon
- Alex Cagan
- Sebastian Eves-van den Akker
Nature Genetics (2025)
Jekyll and Hyde flip of the script when bacteria invert gene sequences
- Chia-Chi Chang
- Robert R. Jenq
Nature (2024)
